How to check memory error count in Linux

This system has 8 DIMMs, 4 on each memory controller. List the serial numbers of the populated slots, then count them:

sudo dmidecode -t memory  |grep -A10 'Locator: DIMM' | grep Serial.Number | grep -v NO.DIMM
    Serial Number: E22C60A2
    Serial Number: E12C4EA2
    Serial Number: DD2C61A2
    Serial Number: E32C4EA2
    Serial Number: E32C53A2
    Serial Number: DE2C5DA2
    Serial Number: E02C50A2
    Serial Number: E22C65A2
sudo dmidecode -t memory  |grep -A10 'Locator: DIMM' | grep Serial.Number | grep -v NO.DIMM | wc -l
8

Correctable error count

ls  -l /sys/devices/system/edac/mc/mc*/csrow*/*ce_count
-r--r--r-- 1 root root 4096 Apr 21 01:48 /sys/devices/system/edac/mc/mc0/csrow0/ce_count       - Correctable error count for this row
-r--r--r-- 1 root root 4096 Apr 21 01:48 /sys/devices/system/edac/mc/mc0/csrow0/ch0_ce_count   - Correctable error count for this channel
-r--r--r-- 1 root root 4096 Apr 21 01:48 /sys/devices/system/edac/mc/mc0/csrow0/ch1_ce_count
-r--r--r-- 1 root root 4096 Apr 21 01:48 /sys/devices/system/edac/mc/mc0/csrow0/ch2_ce_count
-r--r--r-- 1 root root 4096 Apr 21 01:48 /sys/devices/system/edac/mc/mc0/csrow0/ch3_ce_count
-r--r--r-- 1 root root 4096 Apr 21 01:48 /sys/devices/system/edac/mc/mc1/csrow0/ce_count
-r--r--r-- 1 root root 4096 Apr 21 01:48 /sys/devices/system/edac/mc/mc1/csrow0/ch0_ce_count
-r--r--r-- 1 root root 4096 Apr 21 01:48 /sys/devices/system/edac/mc/mc1/csrow0/ch1_ce_count
-r--r--r-- 1 root root 4096 Apr 21 01:48 /sys/devices/system/edac/mc/mc1/csrow0/ch2_ce_count
-r--r--r-- 1 root root 4096 Apr 21 01:48 /sys/devices/system/edac/mc/mc1/csrow0/ch3_ce_count

There are no correctable errors.

cat /sys/devices/system/edac/mc/mc*/csrow*/*ce_count
0
0
0
0
0
0
0
0
0
0

Uncorrectable errors

cat  /sys/devices/system/edac/mc/mc*/csrow*/ue_count
0
0
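
Rather than eyeballing a column of zeros, the same counters can be summed with a small shell function. This is only a sketch: the function name edac_total and its optional base-directory argument are made up for illustration, and it assumes the mc*/csrow* layout shown above. It reads only the per-row ce_count and ue_count files, so the per-channel ch*_ce_count values are not counted twice.

```shell
# Sum EDAC error counters across all memory controllers and rows.
# Usage: edac_total <ce|ue> [base_dir]
# base_dir defaults to the sysfs path used above; the argument exists only
# so the function can be pointed at a different tree (an assumption of this sketch).
edac_total() {
    kind=$1
    base=${2:-/sys/devices/system/edac/mc}
    total=0
    for f in "$base"/mc*/csrow*/"${kind}_count"; do
        [ -r "$f" ] || continue              # glob matched nothing, or file unreadable
        n=$(cat "$f" 2>/dev/null)
        case $n in
            ''|*[!0-9]*) continue ;;         # skip empty or non-numeric content
        esac
        total=$((total + n))
    done
    echo "$total"
}
```

Running `edac_total ce` and `edac_total ue` prints one number each. Anything other than 0 means it is time to look at the individual files to find the affected row and channel.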

This is provided by the edac (Error Detection and Correction) kernel module.

lsmod  | grep edac
sb_edac                27005  0
edac_core              57973  1 sb_edac

There is a utility called edac-util.
http://fibrevillage.com/sysadmin/240-how-to-identify-defective-dimm-from-edac-error-on-linux-2

Good reference
http://www.admin-magazine.com/Articles/Monitoring-Memory-Errors


How to check if Linux is swapping memory in and out?

My Linux server has lots of free RAM but it is still using swap. Should I be worried? Not really. What you should be more worried about is when your server is constantly swapping in and out.

How do I check that?

[root@rtfmp ~]# vmstat 5 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 1589660 182012 996316    0    0     5     5   56   71  0  0 99  0  0
 0  0      0 1589628 182012 996316    0    0     0     4   46   54  0  0 100  0  0
 0  0      0 1589628 182012 996316    0    0     0     0   48   58  0  0 100  0  0
 0  0      0 1589628 182012 996316    0    0     0     0   45   55  0  0 100  0  0
 0  0      0 1589628 182012 996316    0    0     0     1   51   58  0  0 100  0  0
 0  0      0 1589628 182012 996316    0    0     0     0   50   55  0  0 100  0  0
 0  0      0 1589628 182012 996316    0    0     0     0   49   58  0  0 100  0  0
 0  0      0 1589628 182012 996316    0    0     0     0   48   53  0  0 100  0  0
 0  0      0 1589628 182012 996316    0    0     0     0   49   58  0  0 100  0  0
 0  0      0 1589628 182012 996316    0    0     0     0   48   54  0  0 100  0  0

I ran vmstat every 5 seconds, 10 times. The si (swap in) and so (swap out) columns are 0 in every sample, so there is no swapping activity.
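
The same check can be scripted instead of read by eye. The sketch below is an illustration only: swap_activity is a made-up helper name. It reads vmstat output on stdin, finds the si and so columns from the header row, and reports whether any sample has a non-zero value. The first data row is skipped because vmstat's first report gives averages since boot rather than current activity.

```shell
# Report whether vmstat output (on stdin) shows any swap-in/swap-out activity.
swap_activity() {
    awk '
        $1 == "r" && $2 == "b" {             # header row: locate the si and so columns
            for (i = 1; i <= NF; i++) {
                if ($i == "si") si = i
                if ($i == "so") so = i
            }
            next
        }
        si && so && $1 ~ /^[0-9]+$/ {        # data rows, once the header has been seen
            n++
            if (n == 1) next                 # first sample is the since-boot average
            if ($si + 0 > 0 || $so + 0 > 0) active++
        }
        END { print (active ? "swapping" : "no swapping") }
    '
}
```

For example, `vmstat 5 10 | swap_activity` prints a single line once all ten samples have been collected.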

Even with a large amount of RAM, you still need swap.

The casual reader may think that with a sufficient amount of memory, swap is unnecessary but this brings us to the second reason. A significant number of the pages referenced by a process early in its life may only be used for initialisation and then never used again. It is better to swap out those pages and create more disk buffers than leave them resident and unused.
via Swap Management [kernel.org]

Another good read on swap from Linux.com: All about Linux swap space