MegaRAID, media errors

The application (Hadoop) logs I/O errors:

2016-02-15 02:48:04,911 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage for block pool: \
BP-2136893094-Server_IP-1400619662809 : BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used \
block storage: /path
ExitCodeException exitCode=1: du: cannot access `/path': Input/output error
du: cannot access `/path': Input/output error
du: cannot access `/path': Input/output error

The kernel SCSI layer reports “Medium Error” in /var/log/messages:

Feb 15 02:47:04 hostname kernel: EXT4-fs error (device sdi): __ext4_get_inode_loc: unable to read inode \
block - inode=50331696, block=201326626
Feb 15 02:47:33 hostname kernel: sd 0:2:8:0: [sdi] Unhandled sense code
Feb 15 02:47:33 hostname kernel: sd 0:2:8:0: [sdi] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Feb 15 02:47:33 hostname kernel: sd 0:2:8:0: [sdi] Sense Key : Medium Error [current]
Feb 15 02:47:33 hostname kernel: sd 0:2:8:0: [sdi] Add. Sense: No additional sense information
Feb 15 02:47:33 hostname kernel: sd 0:2:8:0: [sdi] CDB: Read(10): 28 00 60 00 01 10 00 00 08 00
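
The CDB bytes in the last log line can be decoded to see which sectors the failed read was aimed at. The snippet below is a rough sketch, not an official tool: decode_read10 is a name made up for this post, and it only understands the 10-byte Read(10) layout (opcode 28, LBA in bytes 2-5, transfer length in bytes 7-8).

```shell
#!/bin/sh
# Decode a SCSI Read(10) CDB as printed by the kernel.
# decode_read10 is a hypothetical helper name, not part of any package.
decode_read10() {
    # Split the space-separated hex bytes into positional parameters.
    set -- $1
    if [ "$#" -ne 10 ] || [ "$1" != "28" ]; then
        echo "not a Read(10) CDB" >&2
        return 1
    fi
    # Bytes 2-5 (big-endian) are the starting LBA, bytes 7-8 the block count.
    lba=$(( 0x$3$4$5$6 ))
    blocks=$(( 0x$8$9 ))
    echo "lba=$lba blocks=$blocks"
}

decode_read10 "28 00 60 00 01 10 00 00 08 00"
# prints: lba=1610613008 blocks=8
```

Assuming a 4 KiB ext4 block size, 512-byte sectors, and the filesystem sitting directly on sdi with no partition offset, ext4 block 201326626 starts at sector 201326626 * 8 = 1610613008, so the medium error and the EXT4 inode read failure point at the same place on the disk.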

MegaCli shows that the same disk has media errors:

# ./MegaCli64 -PDList -aALL | grep 'Media Error Count'
Media Error Count: 0
Media Error Count: 0
Media Error Count: 0
Media Error Count: 51
Media Error Count: 0
Media Error Count: 0
Media Error Count: 0
Media Error Count: 0
Media Error Count: 0
Media Error Count: 347279
Media Error Count: 0
Media Error Count: 14
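
The grep above gives the counts but not which drive each one belongs to. A small awk filter can keep the identifying fields next to the count. This is only a sketch: media_errors_by_slot is a made-up name, and it assumes the PDList output prints "Enclosure Device ID", "Slot Number" and "Device Id" before "Media Error Count" for each drive, which may differ between MegaCli versions.

```shell
#!/bin/sh
# Reads `MegaCli64 -PDList -aALL` output on stdin and prints one line for
# each drive whose Media Error Count is non-zero.
# media_errors_by_slot is a hypothetical helper name.
media_errors_by_slot() {
    awk -F': *' '
        /^Enclosure Device ID/ { enc  = $2 }   # remember the current drive
        /^Slot Number/         { slot = $2 }
        /^Device Id/           { dev  = $2 }
        /^Media Error Count/ && $2 + 0 > 0 {
            printf "enclosure=%s slot=%s device_id=%s media_errors=%d\n", enc, slot, dev, $2
        }
    '
}
```

Usage would be something like `./MegaCli64 -PDList -aALL | media_errors_by_slot`. The device_id it prints is the number to pass to smartctl as `-d megaraid,N`.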

smartctl reports the same figure (347279) for that disk:

# smartctl -a -d megaraid,31 /dev/sdi | grep -A10 'Error'
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0     5481    347279         0          0     847283.137      347279
write:         0        0         0         0          0      30548.932           0
verify:        0        0         5         0          0      98019.557           5
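
The last column of that table (total uncorrected errors) is probably the one to watch over time. As a rough sketch, the filter below pulls just that column out of the smartctl output so it can be logged and compared between runs. uncorrected_errors is a made-up name, and it assumes the SCSI "Error counter log" layout shown above.

```shell
#!/bin/sh
# Reads `smartctl -a` output for a SCSI/SAS disk on stdin and prints the
# operation name and its "total uncorrected errors" value (last column).
# uncorrected_errors is a hypothetical helper name.
uncorrected_errors() {
    awk '/^(read|write|verify):/ {
        sub(":", "", $1)    # "read:" -> "read"
        print $1, $NF
    }'
}
```

For example, `smartctl -a -d megaraid,31 /dev/sdi | uncorrected_errors` run from cron and appended to a file with a timestamp would show whether the numbers keep climbing, which would be a hint (not proof) that the disk is getting worse.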

I am told that rebooting the server will clear the errors, which I don’t find surprising. But how do I know whether the disk is actually failing? I will update this post when I have an answer.

MegaCli cheat sheet
