How to check memory error count in Linux

We have 8 DIMMs, 4 on each controller in this system.

sudo dmidecode -t memory  |grep -A10 'Locator: DIMM' | grep Serial.Number | grep -v NO.DIMM
    Serial Number: E22C60A2
    Serial Number: E12C4EA2
    Serial Number: DD2C61A2
    Serial Number: E32C4EA2
    Serial Number: E32C53A2
    Serial Number: DE2C5DA2
    Serial Number: E02C50A2
    Serial Number: E22C65A2
sudo dmidecode -t memory  |grep -A10 'Locator: DIMM' | grep Serial.Number | grep -v NO.DIMM | wc -l
8

Correctable error count

ls  -l /sys/devices/system/edac/mc/mc*/csrow*/*ce_count
-r--r--r-- 1 root root 4096 Apr 21 01:48 /sys/devices/system/edac/mc/mc0/csrow0/ce_count       - Correctable error count for this row
-r--r--r-- 1 root root 4096 Apr 21 01:48 /sys/devices/system/edac/mc/mc0/csrow0/ch0_ce_count   - Correctable error count for this channel
-r--r--r-- 1 root root 4096 Apr 21 01:48 /sys/devices/system/edac/mc/mc0/csrow0/ch1_ce_count
-r--r--r-- 1 root root 4096 Apr 21 01:48 /sys/devices/system/edac/mc/mc0/csrow0/ch2_ce_count
-r--r--r-- 1 root root 4096 Apr 21 01:48 /sys/devices/system/edac/mc/mc0/csrow0/ch3_ce_count
-r--r--r-- 1 root root 4096 Apr 21 01:48 /sys/devices/system/edac/mc/mc1/csrow0/ce_count
-r--r--r-- 1 root root 4096 Apr 21 01:48 /sys/devices/system/edac/mc/mc1/csrow0/ch0_ce_count
-r--r--r-- 1 root root 4096 Apr 21 01:48 /sys/devices/system/edac/mc/mc1/csrow0/ch1_ce_count
-r--r--r-- 1 root root 4096 Apr 21 01:48 /sys/devices/system/edac/mc/mc1/csrow0/ch2_ce_count
-r--r--r-- 1 root root 4096 Apr 21 01:48 /sys/devices/system/edac/mc/mc1/csrow0/ch3_ce_count

There are no correctable errors.

cat /sys/devices/system/edac/mc/mc*/csrow*/*ce_count
0
0
0
0
0
0
0
0
0
0
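
A plain cat prints the values without saying which counter each one belongs to. The sketch below prints every counter next to its path. The function name show_ce_counts and its optional directory argument are my own additions for illustration, not part of EDAC; the default path is the one used above.

```shell
#!/bin/sh
# Print each correctable-error counter together with its sysfs path.
# show_ce_counts is a hypothetical helper; the optional first argument
# lets it be pointed at another directory (for example a test tree).
show_ce_counts() {
    ce_root="${1:-/sys/devices/system/edac/mc}"
    for ce_file in "$ce_root"/mc*/csrow*/*ce_count; do
        # Skip the unexpanded pattern when the glob matched nothing.
        [ -r "$ce_file" ] || continue
        printf '%s: %s\n' "$ce_file" "$(cat "$ce_file")"
    done
}

# With no argument this reads the real EDAC tree; it prints nothing
# if no EDAC driver is loaded.
show_ce_counts
```

On the system above this should list ten lines, one per ce_count file.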

Uncorrectable error count

cat  /sys/devices/system/edac/mc/mc*/csrow*/ue_count
0
0
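
For monitoring, the same counters can be summed and turned into an exit status. This is only a sketch: the function name check_ue, the messages, and the exit codes (0 OK, 2 critical, 3 unknown) are my own choices, loosely following the usual monitoring-plugin convention.

```shell
#!/bin/sh
# Sum all uncorrectable-error counters and report a status.
# check_ue is a hypothetical helper; the optional first argument
# overrides the default sysfs directory.
check_ue() {
    ue_root="${1:-/sys/devices/system/edac/mc}"
    ue_total=0
    ue_files=0
    for ue_file in "$ue_root"/mc*/csrow*/ue_count; do
        # Skip the unexpanded pattern when the glob matched nothing.
        [ -r "$ue_file" ] || continue
        ue_files=$((ue_files + 1))
        ue_total=$((ue_total + $(cat "$ue_file")))
    done
    if [ "$ue_files" -eq 0 ]; then
        echo "UNKNOWN: no ue_count files under $ue_root"
        return 3
    elif [ "$ue_total" -gt 0 ]; then
        echo "CRITICAL: $ue_total uncorrectable memory errors"
        return 2
    fi
    echo "OK: 0 uncorrectable memory errors in $ue_files csrow counters"
    return 0
}

# Run against the real EDAC tree and show the status if it is not 0.
check_ue || echo "exit status: $?"
```

Reporting a separate "unknown" state matters here because a machine without a loaded EDAC driver has no counters at all, which is not the same as having zero errors.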

These counters are provided by the EDAC (Error Detection and Correction) kernel modules.

lsmod  | grep edac
sb_edac                27005  0
edac_core              57973  1 sb_edac
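
lsmod gets its list from /proc/modules, so a script can do the same check without lsmod. A minimal sketch, where the function name edac_modules and the optional file argument are hypothetical and only there for illustration:

```shell
#!/bin/sh
# List loaded kernel modules whose name contains "edac".
# Expects /proc/modules-style input: the module name is the first field.
edac_modules() {
    awk '$1 ~ /edac/ { print $1 }' "${1:-/proc/modules}"
}

# Only read /proc/modules where it exists (Linux).
if [ -r /proc/modules ]; then
    edac_modules
fi
```

An empty result suggests that no EDAC driver is loaded as a module, in which case the counter files above are probably missing too.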

There is also a utility called edac-util that reports these EDAC error counts.
http://fibrevillage.com/sysadmin/240-how-to-identify-defective-dimm-from-edac-error-on-linux-2

Good reference
http://www.admin-magazine.com/Articles/Monitoring-Memory-Errors
