Thursday, 6 June 2013

How to check for hard disk failure in Linux

smartctl is a command-line utility for SMART (Self-Monitoring, Analysis and Reporting Technology) tasks such as printing the SMART self-test and error logs, enabling and disabling SMART automatic testing, and initiating device self-tests. First, make sure SMART support is enabled in the BIOS. Then run the following command to check whether your hard disk supports SMART:

[root@vellore ~]# smartctl -i /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-279.el6.x86_64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.12
Device Model:     ST3250318AS
Serial Number:    9VM8EC2E
LU WWN Device Id: 5 000c50 01a4a684d
Firmware Version: CC38
User Capacity:    250,058,268,160 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Thu Jun  6 10:19:02 2013 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
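
The two "SMART support is:" lines can also be checked from a script. The following is a minimal sketch, not part of smartmontools: smart_state is a hypothetical helper that reads smartctl -i output on standard input and prints enabled, disabled, or unsupported, assuming the output uses the line format shown above.

```shell
#!/bin/sh
# smart_state: hypothetical helper (not a smartctl feature).
# Reads `smartctl -i` output on stdin and classifies SMART support,
# assuming the "SMART support is:" lines look like the ones shown above.
smart_state() {
    awk '
        /^SMART support is: Enabled/   { enabled = 1 }
        /^SMART support is: Available/ { available = 1 }
        END {
            if (enabled)        print "enabled"      # supported and switched on
            else if (available) print "disabled"     # supported but switched off
            else                print "unsupported"  # no usable SMART capability reported
        }
    '
}

# Usage (as root):
#   smartctl -i /dev/sda | smart_state
```

A result of disabled means the drive has the capability but SMART still needs to be switched on.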

If SMART is not enabled by default, enable it with:

[root@vellore ~]# smartctl -s on -d ata /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-279.el6.x86_64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.

To run the overall-health self-assessment check, enter:

[root@vellore ~]# smartctl -d ata -H /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-279.el6.x86_64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
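
For scripting, the exit status of smartctl is usually easier to work with than the printed text. The smartctl man page documents the status as a bit mask; the sketch below decodes those bits. explain_smartctl_status is a hypothetical helper name, and which bits can actually be set depends on the options passed to smartctl.

```shell
#!/bin/sh
# explain_smartctl_status: hypothetical helper that decodes the exit
# status bit mask described in the smartctl man page.
explain_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no problems reported"
        return 0
    fi
    if [ $((status & 1)) -ne 0 ];   then echo "command line did not parse"; fi
    if [ $((status & 2)) -ne 0 ];   then echo "device open failed"; fi
    if [ $((status & 4)) -ne 0 ];   then echo "a SMART command failed or a checksum error occurred"; fi
    if [ $((status & 8)) -ne 0 ];   then echo "SMART status check returned DISK FAILING"; fi
    if [ $((status & 16)) -ne 0 ];  then echo "prefail attributes at or below threshold"; fi
    if [ $((status & 32)) -ne 0 ];  then echo "attributes were at or below threshold in the past"; fi
    if [ $((status & 64)) -ne 0 ];  then echo "device error log contains errors"; fi
    if [ $((status & 128)) -ne 0 ]; then echo "self-test log contains errors"; fi
    return 0
}

# Usage (as root):
#   smartctl -H /dev/sda >/dev/null
#   explain_smartctl_status $?
```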

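Sample output from a drive with a failing attribute. Note that the overall result can still read PASSED while an individual attribute is flagged FAILING_NOW: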

smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022   044   033   045    Old_age   Always   FAILING_NOW 56 (96 110 58 25)
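
The WHEN_FAILED column is what marks an attribute as marginal. As a rough sketch, the attribute table can be filtered for rows whose WHEN_FAILED field is not "-". failed_attributes is a hypothetical helper, and it assumes the ten-column table layout shown above.

```shell
#!/bin/sh
# failed_attributes: hypothetical helper that reads smartctl attribute
# output on stdin and prints ID, name and WHEN_FAILED for flagged rows.
# A row is treated as an attribute row when column 1 is a number and
# column 3 (FLAG) looks like a hex value such as 0x0022.
failed_attributes() {
    awk '$1 ~ /^[0-9]+$/ && $3 ~ /^0x[0-9a-fA-F]+$/ && $9 != "-" {
        print $1, $2, $9
    }'
}

# Usage (as root):
#   smartctl -A /dev/sda | failed_attributes
```

Empty output means no attribute is currently flagged, or has been flagged in the past.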
 
The following command prints the attribute table and the self-test log, which give more detail about a failing hard disk:

[root@vellore ~]# smartctl --attributes --log=selftest /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-279.el6.x86_64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       221288423
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   093   093   020    Old_age   Always       -       7332
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   083   060   030    Pre-fail  Always       -       205719784
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       8550
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   097   097   020    Old_age   Always       -       3611
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   046   046   000    Old_age   Always       -       54
188 Command_Timeout         0x0032   100   092   000    Old_age   Always       -       68720526384
189 High_Fly_Writes         0x003a   095   095   000    Old_age   Always       -       5
190 Airflow_Temperature_Cel 0x0022   066   048   045    Old_age   Always       -       34 (Min/Max 20/37)
194 Temperature_Celsius     0x0022   034   052   000    Old_age   Always       -       34 (0 18 0 0 0)
195 Hardware_ECC_Recovered  0x001a   038   032   000    Old_age   Always       -       221288423
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   192   000    Old_age   Always       -       2109
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       269139830654254
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1152744560
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       177103330

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      8550         -
# 2  Short offline       Completed without error       00%      8550         -
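
Some raw counters in this table deserve a look even though no attribute is flagged: Reported_Uncorrect has a raw value of 54 and UDMA_CRC_Error_Count has 2109. The sketch below prints a few commonly watched counters when their raw value is non-zero. The choice of IDs (5, 187, 197, 198, 199) is an assumption made for illustration, raw values are vendor-specific, and nonzero_counters is a hypothetical helper name.

```shell
#!/bin/sh
# nonzero_counters: hypothetical helper that reads the smartctl attribute
# table on stdin and prints NAME=RAW_VALUE for selected counters whose
# raw value is greater than zero. The ID list is an illustrative choice.
nonzero_counters() {
    awk '
        $3 ~ /^0x[0-9a-fA-F]+$/ &&
        ($1 == 5 || $1 == 187 || $1 == 197 || $1 == 198 || $1 == 199) {
            if ($10 + 0 > 0) print $2 "=" $10
        }
    '
}

# Usage (as root):
#   smartctl -A /dev/sda | nonzero_counters
```

A non-zero UDMA_CRC_Error_Count often points to cabling or connector problems rather than the disk surface, so it is worth reading separately from the sector-related counters.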



To print all SMART information for the disk, including the error log, use the -a option:

[root@vellore ~]# smartctl -d ata -a /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-279.el6.x86_64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.12
Device Model:     ST3250318AS
Serial Number:    9VM8EC2E
LU WWN Device Id: 5 000c50 01a4a684d
Firmware Version: CC38
User Capacity:    250,058,268,160 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Thu Jun  6 10:25:54 2013 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (  634) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      (  49) minutes.
Conveyance self-test routine
recommended polling time:      (   2) minutes.
SCT capabilities:            (0x103f)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       221291036
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   093   093   020    Old_age   Always       -       7332
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   083   060   030    Pre-fail  Always       -       205719881
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       8550
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   097   097   020    Old_age   Always       -       3611
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   046   046   000    Old_age   Always       -       54
188 Command_Timeout         0x0032   100   092   000    Old_age   Always       -       68720526384
189 High_Fly_Writes         0x003a   095   095   000    Old_age   Always       -       5
190 Airflow_Temperature_Cel 0x0022   067   048   045    Old_age   Always       -       33 (Min/Max 20/37)
194 Temperature_Celsius     0x0022   033   052   000    Old_age   Always       -       33 (0 18 0 0 0)
195 Hardware_ECC_Recovered  0x001a   038   032   000    Old_age   Always       -       221291036
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   192   000    Old_age   Always       -       2109
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       251745213105454
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1152748352
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       177103394

SMART Error Log Version: 1
ATA Error Count: 53 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 53 occurred at disk power-on lifetime: 2271 hours (94 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 2a 68 f4 00  Error: UNC at LBA = 0x00f4682a = 16017450

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 18 18 68 f4 e0 00      00:05:05.081  READ DMA EXT
  25 00 1d 7a 5f 21 e0 00      00:05:05.072  READ DMA EXT
  25 00 12 27 ba 0b e0 00      00:05:05.065  READ DMA EXT
  25 00 40 fb 2c 1b e0 00      00:05:05.058  READ DMA EXT
  35 00 08 a8 84 6f e1 00      00:05:05.053  WRITE DMA EXT

Error 52 occurred at disk power-on lifetime: 2271 hours (94 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 2a 68 f4 00  Error: UNC at LBA = 0x00f4682a = 16017450

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 18 18 68 f4 e0 00      00:04:59.057  READ DMA EXT
  35 00 08 c8 b2 63 e0 00      00:04:59.057  WRITE DMA EXT
  35 00 08 d8 1b 63 e0 00      00:04:59.056  WRITE DMA EXT
  35 00 08 90 1b 5f e0 00      00:04:59.056  WRITE DMA EXT
  35 00 05 48 dd 2a e0 00      00:04:59.055  WRITE DMA EXT

Error 51 occurred at disk power-on lifetime: 2271 hours (94 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 2a 68 f4 00  Error: UNC at LBA = 0x00f4682a = 16017450

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 28 68 f4 e0 00      00:04:55.718  READ DMA EXT
  35 00 00 70 8e c7 e0 00      00:04:55.717  WRITE DMA EXT
  35 00 00 70 8d c7 e0 00      00:04:55.716  WRITE DMA EXT
  35 00 00 70 8c c7 e0 00      00:04:55.716  WRITE DMA EXT
  35 00 00 70 8b c7 e0 00      00:04:55.715  WRITE DMA EXT

Error 50 occurred at disk power-on lifetime: 2271 hours (94 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 2a 68 f4 00  Error: UNC at LBA = 0x00f4682a = 16017450

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 80 28 68 f4 e0 00      00:04:52.682  READ DMA EXT
  35 00 10 e8 3d 63 e0 00      00:04:52.681  WRITE DMA EXT
  35 00 08 48 22 63 e0 00      00:04:52.681  WRITE DMA EXT
  35 00 08 68 2c 16 e0 00      00:04:52.681  WRITE DMA EXT
  35 00 08 e0 36 04 e0 00      00:04:52.680  WRITE DMA EXT

Error 49 occurred at disk power-on lifetime: 2271 hours (94 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 2a 68 f4 00  Error: UNC at LBA = 0x00f4682a = 16017450

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 80 20 68 f4 e0 00      00:04:49.433  READ DMA EXT
  35 00 08 88 1b 5f e0 00      00:04:49.433  WRITE DMA EXT
  35 00 08 40 29 16 e0 00      00:04:49.433  WRITE DMA EXT
  35 00 08 90 3b 07 e0 00      00:04:49.433  WRITE DMA EXT
  35 00 06 88 83 f4 e0 00      00:04:49.432  WRITE DMA EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      8550         -
# 2  Short offline       Completed without error       00%      8550         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
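
In the error log above, all five logged errors are uncorrectable read errors (UNC) at the same address, LBA 16017450, which suggests one bad sector being hit repeatedly rather than errors spread across the disk. As a sketch, the error log can be summarized by counting errors per LBA. errors_by_lba is a hypothetical helper that assumes the "Error: ... at LBA = 0x... = N" line format shown above.

```shell
#!/bin/sh
# errors_by_lba: hypothetical helper that reads smartctl error-log output
# on stdin and prints "<decimal LBA> <number of logged errors>" per LBA,
# sorted by LBA. It relies on the decimal LBA being the last field.
errors_by_lba() {
    awk '
        /Error: .* at LBA = / { count[$NF]++ }
        END { for (lba in count) print lba, count[lba] }
    ' | sort -n
}

# Usage (as root):
#   smartctl -l error /dev/sda | errors_by_lba
```

Keep in mind that the device log holds only the five most recent errors, so the counts describe recent history and not the full total of 53.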


Say hello to GSmartControl

GSmartControl is a hard disk drive health inspection tool and a graphical user interface for the smartctl command. It has the following features:
  1. Automatically reports and highlights any anomalies;
  2. Allows enabling/disabling SMART;
  3. Allows enabling/disabling Automatic Offline Data Collection - a short self-check that the drive will perform automatically every four hours with no impact on performance;
  4. Supports configuration of global and per-drive options for smartctl;
  5. Performs SMART self-tests;
  6. Displays drive identity information, capabilities, attributes, and self-test/error logs;
  7. Can read in smartctl output from a saved file, interpreting it as a read-only virtual device;
  8. Works on most smartctl-supported operating systems like *BSD and various Linux distros;
  9. Has extensive help information.

On yum-based distributions, install it with:

[root@vellore ~]# yum install gsmartcontrol


