SpeedFan 4.52
Copyright 2000-2020 by Alfredo Milani Comparetti
Last edited on 09 Feb 2005.

What is S.M.A.R.T.?
S.M.A.R.T. stands for Self-Monitoring, Analysis and Reporting Technology. First developed by Compaq, Hitachi, IBM, Maxtor, Quantum, Seagate, Toshiba and Western Digital, it has been adopted by almost every manufacturer. While a hard disk is running, its internal logic encounters events and reacts to them to fix unusual or unwanted situations. By keeping track of these events, we can know that something in our hard disk did not always work as expected. S.M.A.R.T. extends this philosophy by analyzing several parameters and reporting them. Data reported by S.M.A.R.T. can include the number of retries when transmitting data to the computer, the number of spare sectors that were used to replace bad ones, the number of times the hard disk has been started and stopped, the internal temperature and much more. S.M.A.R.T. relies on attributes, values and thresholds. Based on them, we can tell whether a hard disk might be about to fail.

What are attributes, values and thresholds?
An attribute is something that a specific hard disk's logic is able to analyze and report about. Every hard disk can include a different set of attributes. Every device manufacturer can publish an attribute based on its ability to report about it and on its knowledge that such an attribute is useful for judging hard disk reliability over time. Every attribute can assume a value. Such values change over time. Higher values indicate better health, while lower ones should be considered symptoms of something that either has degraded or is degrading. Every attribute has a corresponding threshold. When an attribute value is the same as or lower than its threshold, the drive's S.M.A.R.T. status is considered to be failing. A threshold of 255 means something that will always fail and should only be used for test purposes. A threshold can only be set to values from 1 to 253. 254 is forbidden, and 0 means that the attribute the threshold is associated with should be considered only informational and that it has no direct influence over reliability. Every attribute stores the worst value it ever assumed and some raw data. Raw data is highly vendor specific and only specific tools find it really useful. It should be noted that a kind of silent agreement exists over raw values, so some degree of standardization might be assumed. According to the S.M.A.R.T. specifications, no linearity is implied in attribute values, but since lower values mean worse conditions, linearity can be assumed as a first guess even though it is not mandatory. It should be noted that a lot of things in the S.M.A.R.T. specifications are left up to the device manufacturer. At the lowest level, we might even know nothing about any attribute as long as no value has reached its threshold. Any device that is not failing any attribute should be considered fully functional.
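The threshold rules above can be summarized in a few lines of Python. This is only a minimal sketch of the rules as described in this article; the function names `classify_threshold` and `is_failing` are made up for illustration and do not come from any S.M.A.R.T. tool or library.

```python
def classify_threshold(threshold: int) -> str:
    """Describe what a threshold byte means, following the rules in the text."""
    if not 0 <= threshold <= 255:
        raise ValueError("a threshold must fit in one byte (0-255)")
    if threshold == 0:
        return "informational"  # no direct influence over reliability
    if threshold == 254:
        return "forbidden"
    if threshold == 255:
        return "always-fail"    # meant for test purposes only
    return "reliability"        # regular thresholds: 1 to 253


def is_failing(value: int, threshold: int) -> bool:
    """An attribute fails when its value is the same as or lower than its threshold."""
    if threshold == 0:
        return False  # informational attributes never fail
    return value <= threshold


print(classify_threshold(63), is_failing(85, 63))
print(classify_threshold(0), is_failing(253, 0))
```

Note that with a threshold of 255 every valid value (1 to 253) compares as failing, which matches the "will always fail" behavior described above.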

How to interpret S.M.A.R.T. data
Some attributes are flagged as performance related, while others are related to the actual fitness of the drive. Some other attributes have no special relationship. It's up to the manufacturer to set flags and thresholds accordingly. Attribute values can range from 1 to 253. 0, 254 and 255 are invalid and should not be used. 253 is the highest value an attribute can assume and 100 is the initial value for any attribute prior to any data collection. Let's have a look at a sample report from a S.M.A.R.T. enabled hard disk:
  • Attribute id is 4 ("Start/stop count")
  • Value is 253
  • Worst value is 253
  • Threshold is 0
  • Raw value is 1324
Since the threshold was set by the device manufacturer to 0, it means this is an informational attribute. The raw value indicates how many times the hard disk was started and stopped. The value is set to 253, which means that the health related to this attribute is at its best, and the worst value set to 253 states that the drive was always reported to be healthy.

Now let's look at another attribute:
  • Attribute id is 5 ("Reallocated sector count")
  • Value is 253
  • Worst value is 253
  • Threshold is 63
  • Raw value is 0
This time, the threshold value shows that this attribute is strictly related to device reliability. If the value for this attribute drops to 63 or lower, the drive is expected to fail soon. This makes sense, as modern hard disks include an area of spare sectors that are normally unused, but to which bad sectors can be transparently remapped when found. The number of spare sectors is fixed, and this attribute is updated as new bad sectors are detected and fewer spares remain. The raw value shows the number of reallocated sectors. When it is 0, no bad sector was found and needed remapping. When it is higher, some bad sectors were discovered. While the raw value is still low, there is no real threat to the hard disk's reliability, but when that number grows, we should seriously consider replacing the drive. All of this is reflected by the synthetic value associated with this attribute. In this example, its value is 253, which means that everything is working perfectly when it comes to reallocated sectors.

Now let's look at another sample for the same attribute, but from a different drive:
  • Attribute id is 5 ("Reallocated sector count")
  • Value is 85
  • Worst value is 85
  • Threshold is 63
  • Raw value is 37
This time the value is 85, which is lower than both 100 and 253. This means that this attribute is not in perfect shape. Since the manufacturer set the threshold to 63, we can still assume the drive is working properly and will keep working properly in the (near) future. Because of the nature of this attribute, the raw value is easy to decipher and we can try to infer something by reading it too. Keep in mind that this manufacturer decided that a raw value of 37 corresponds to a value of 85. Other manufacturers might use different numbers. Since most IDE drives include 512 spare sectors, we can try to figure out how bad the situation is, but we should remember that this is not directly stated by the S.M.A.R.T. data. To be sure that the raw value actually represents the number of reallocated sectors, we should read the product manual for the drive, and we should do the same to confirm the 512 figure. What can be read from the S.M.A.R.T. data is that the attribute whose id is 5 (we need not know that it actually represents "Reallocated sector count") has a direct influence over reliability (we understand this because the threshold value is higher than 0), that it is somewhat degraded or not at its best (because its current value is lower than both 100 and 253) and that it is not failing. This example helps us understand that the threshold is not meant to show something that has already failed, but something that is about to fail. If we assume that there are 512 spare sectors and that the raw value represents the number of spare sectors currently used to remap bad sectors, we might expect to read a value that equals the threshold when the raw value reports, say, 300. This would mean that several bad sectors were spotted and that the drive manufacturer considers this significant evidence of a hard disk that is about to fail.
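The reasoning applied to the three sample reports above can be sketched in Python. This is an illustration under stated assumptions, not the logic of any real tool: the `Reading` class and the `interpret` and `spare_used_fraction` functions are hypothetical names, the "lower than 100 means degraded" cut-off is only a first guess that does not hold for every manufacturer, and the 512 spare sectors figure is the unverified assumption discussed in the text.

```python
from dataclasses import dataclass

# Assumption from the text: most IDE drives include 512 spare sectors.
# Only the product manual can confirm this for a given drive.
ASSUMED_SPARE_SECTORS = 512


@dataclass
class Reading:
    attr_id: int
    value: int
    worst: int
    threshold: int
    raw: int


def interpret(r: Reading) -> str:
    """Classify one attribute using only value and threshold."""
    if not 1 <= r.value <= 253:
        return "invalid value"          # 0, 254 and 255 should not be used
    if r.threshold == 0:
        return "informational"
    if r.value <= r.threshold:
        return "failing"
    if r.value < 100:                   # first-guess cut-off, vendor dependent
        return "degraded, not failing"
    return "healthy"


def spare_used_fraction(r: Reading) -> float:
    """Share of the assumed spare pool in use, if raw counts reallocated sectors."""
    return r.raw / ASSUMED_SPARE_SECTORS


samples = [
    Reading(4, 253, 253, 0, 1324),   # "Start/stop count"
    Reading(5, 253, 253, 63, 0),     # "Reallocated sector count", first drive
    Reading(5, 85, 85, 63, 37),      # "Reallocated sector count", second drive
]

for s in samples:
    print(s.attr_id, s.value, interpret(s))

print(f"{spare_used_fraction(samples[2]):.1%} of the assumed spare pool is in use")
```

For the second drive, 37 out of an assumed 512 spare sectors is about 7.2% of the pool, which is consistent with a value that is degraded but still well above its threshold.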

How does SpeedFan interpret S.M.A.R.T. data?
SpeedFan shows the data collected from the hard disk and publishes two additional synthetic indicators that try to represent the status in a more complete way than the simple OK/FAIL status reported by S.M.A.R.T. This should be considered an attempt at achieving a higher degree of knowledge, but it should not be considered the final word. SpeedFan publishes the fitness and the performance indicators. They are the result of the analysis of the retrieved attributes. Some of them are flagged, by the manufacturer, as performance related, while some others are flagged as health related. The fitness indicator considers only those attributes that are health related. The performance indicator considers the ones related to performance. Starting with SpeedFan 4.28, there is no longer the assumption that attributes' values are linear. This is the result of a deep statistical analysis performed on about 100,000 reports. The synthetic indicators shown inside SpeedFan are a good indication of your current hard disk status, but the IN-DEPTH ONLINE ANALYSIS tool available on the S.M.A.R.T. tab is far more powerful. That tool requires you to be connected to the internet. Your hard disk status will be compared to an updated database and relevant annotations will be issued, just like an expert analyzing your drive would do.
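How attributes are routed to the two indicators can be sketched as follows. This is not SpeedFan's actual algorithm, which is not published here; it only shows the partitioning step described above. The `Attribute` class, its boolean flag fields, the `split_by_flag` function and the flag assignments and numbers in the sample list are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    attr_id: int
    value: int
    threshold: int
    health_related: bool        # hypothetical field: flagged by the manufacturer
    performance_related: bool   # hypothetical field: flagged by the manufacturer


def split_by_flag(attrs):
    """Return (inputs for the fitness indicator, inputs for the performance indicator)."""
    fitness_inputs = [a for a in attrs if a.health_related]
    performance_inputs = [a for a in attrs if a.performance_related]
    return fitness_inputs, performance_inputs


# Illustrative data only; real flags are set by each manufacturer.
attrs = [
    Attribute(4, 253, 0, health_related=False, performance_related=False),
    Attribute(5, 85, 63, health_related=True, performance_related=False),
    Attribute(8, 100, 20, health_related=False, performance_related=True),
]

fitness_inputs, performance_inputs = split_by_flag(attrs)
print([a.attr_id for a in fitness_inputs])
print([a.attr_id for a in performance_inputs])
```

Attributes that carry neither flag, like id 4 in this made-up list, feed neither indicator. How the selected values are then combined into a single figure is left out on purpose, since the text only states that the mapping is no longer assumed to be linear.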

Why is my brand new drive showing poor fitness values?
As shown in the paragraph above, SpeedFan tries to build two synthetic indicators, but these are based on heuristics and on some basic assumptions. Most manufacturers do not use values lower than 100 when an attribute needs no attention. Some of them use 253 as the starting value and then decrease it down to zero. Some others use 253 until the value needs to be updated and, when this happens, suddenly lower it to 100. Some others use values in the range 45 to 65 for perfectly running hard disks. The in-depth analysis tool can handle all such situations and report highly relevant information.

What to do when S.M.A.R.T. reports a failing hard disk?
The answer is easy: you need to back up your data as soon as possible. If spotted early, this warning condition can be quite easy to handle, as a full drive copy should be possible. This means that the downtime for our system will be extremely short and there will be no need for any reinstall. What if we didn't have S.M.A.R.T. and our drive failed all of a sudden?...

By Alfredo Milani Comparetti (alfredo [at] almico.com)

