Hardware Error memory fault: kernel: [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB

Published: 2018-05-18 08:54:02    Source: 企业网
The server used by 企业网 suddenly started reporting memory errors. The messages kept coming and could not be stopped. The /var/log/messages log showed:
May 17 12:39:44 oldweb kernel: [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
May 17 12:39:44 oldweb kernel: EDAC amd64 MC2: CE ERROR_ADDRESS= 0xa97802b50
May 17 12:39:44 oldweb kernel: EDAC MC2: CE page 0xa97802, offset 0xb50, grain 0, syndrome 0x11c1, row 0, channel 1, label "": amd64_edac
May 17 12:39:44 oldweb kernel: [Hardware Error]: Error Status: Corrected error, no action required.
May 17 12:39:44 oldweb kernel: [Hardware Error]: CPU:8 (10:2:3) MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc60c00011080a13
May 17 12:39:44 oldweb kernel: [Hardware Error]: MC4_ADDR: 0x0000000a97802b50
May 17 12:39:44 oldweb kernel: [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
May 17 12:42:14 oldweb kernel: [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
May 17 12:42:14 oldweb kernel: EDAC amd64 MC2: CE ERROR_ADDRESS= 0xaa880c350
May 17 12:42:14 oldweb kernel: EDAC MC2: CE page 0xaa880c, offset 0x350, grain 0, syndrome 0x11c1, row 0, channel 1, label "": amd64_edac
May 17 12:42:14 oldweb kernel: [Hardware Error]: Error Status: Corrected error, no action required.
May 17 12:42:14 oldweb kernel: [Hardware Error]: CPU:8 (10:2:3) MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc60c00011080a13
May 17 12:42:14 oldweb kernel: [Hardware Error]: MC4_ADDR: 0x0000000aa880c350
May 17 12:42:14 oldweb kernel: [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
May 17 12:43:29 oldweb kernel: [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
May 17 12:43:29 oldweb kernel: EDAC amd64 MC2: CE ERROR_ADDRESS= 0xbe1283db0
May 17 12:43:29 oldweb kernel: EDAC MC2: CE page 0xbe1283, offset 0xdb0, grain 0, syndrome 0x11c1, row 4, channel 1, label "": amd64_edac
May 17 12:43:29 oldweb kernel: [Hardware Error]: Error Status: Corrected error, no action required.
May 17 12:43:29 oldweb kernel: [Hardware Error]: CPU:8 (10:2:3) MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc60c00011080a13
May 17 12:43:29 oldweb kernel: [Hardware Error]: MC4_ADDR: 0x0000000be1283db0
May 17 12:43:29 oldweb kernel: [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
May 17 12:44:06 oldweb kernel: [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
May 17 12:44:06 oldweb kernel: EDAC amd64 MC2: CE ERROR_ADDRESS= 0x97eeb47a0
May 17 12:44:06 oldweb kernel: EDAC MC2: CE page 0x97eeb4, offset 0x7a0, grain 0, syndrome 0x11c1, row 5, channel 1, label "": amd64_edac
May 17 12:44:06 oldweb kernel: [Hardware Error]: Error Status: Corrected error, no action required.
May 17 12:44:06 oldweb kernel: [Hardware Error]: CPU:8 (10:2:3) MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc60c00011080a13
May 17 12:44:06 oldweb kernel: [Hardware Error]: MC4_ADDR: 0x000000097eeb47a0
May 17 12:44:06 oldweb kernel: [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
May 17 12:44:25 oldweb kernel: [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
May 17 12:44:25 oldweb kernel: EDAC amd64 MC2: CE ERROR_ADDRESS= 0x8fe32c0c0
May 17 12:44:25 oldweb kernel: EDAC MC2: CE page 0x8fe32c, offset 0xc0, grain 0, syndrome 0x2242, row 1, channel 1, label "": amd64_edac
May 17 12:44:25 oldweb kernel: [Hardware Error]: Error Status: Corrected error, no action required.
May 17 12:44:25 oldweb kernel: [Hardware Error]: CPU:8 (10:2:3) MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc21400022080a13
May 17 12:44:25 oldweb kernel: [Hardware Error]: MC4_ADDR: 0x00000008fe32c0c0
May 17 12:44:25 oldweb kernel: [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
May 17 12:44:34 oldweb kernel: [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
May 17 12:44:34 oldweb kernel: EDAC amd64 MC2: CE ERROR_ADDRESS= 0x8e2f2b0c0
May 17 12:44:34 oldweb kernel: EDAC MC2: CE page 0x8e2f2b, offset 0xc0, grain 0, syndrome 0x11c1, row 1, channel 1, label "": amd64_edac
May 17 12:44:34 oldweb kernel: [Hardware Error]: Error Status: Corrected error, no action required.
May 17 12:44:34 oldweb kernel: [Hardware Error]: CPU:8 (10:2:3) MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc60c00011080813
May 17 12:44:34 oldweb kernel: [Hardware Error]: MC4_ADDR: 0x00000008e2f2b0c0
May 17 12:44:34 oldweb kernel: [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: SRC (no timeout)
May 17 12:44:39 oldweb kernel: [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
May 17 12:44:39 oldweb kernel: EDAC amd64 MC2: CE ERROR_ADDRESS= 0x8d1504000
May 17 12:44:39 oldweb kernel: EDAC MC2: CE page 0x8d1504, offset 0x0, grain 0, syndrome 0x2242, row 0, channel 1, label "": amd64_edac
May 17 12:44:39 oldweb kernel: [Hardware Error]: Error Status: Corrected error, no action required.
May 17 12:44:39 oldweb kernel: [Hardware Error]: CPU:8 (10:2:3) MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc21400022080a13
May 17 12:44:39 oldweb kernel: [Hardware Error]: MC4_ADDR: 0x00000008d1504000
May 17 12:44:39 oldweb kernel: [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
May 17 12:44:41 oldweb kernel: [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
May 17 12:44:41 oldweb kernel: EDAC amd64 MC2: CE ERROR_ADDRESS= 0x8b1012c80
May 17 12:44:41 oldweb kernel: EDAC MC2: CE page 0x8b1012, offset 0xc80, grain 0, syndrome 0x3383, row 0, channel 1, label "": amd64_edac
May 17 12:44:41 oldweb kernel: [Hardware Error]: Error Status: Corrected error, no action required.
May 17 12:44:41 oldweb kernel: [Hardware Error]: CPU:8 (10:2:3) MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc41c00033080a13
May 17 12:44:41 oldweb kernel: [Hardware Error]: MC4_ADDR: 0x00000008b1012c80
May 17 12:44:41 oldweb kernel: [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
May 17 12:44:42 oldweb kernel: [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
May 17 12:44:42 oldweb kernel: EDAC amd64 MC2: CE ERROR_ADDRESS= 0xb66a9caf0
May 17 12:44:42 oldweb kernel: EDAC MC2: CE page 0xb66a9c, offset 0xaf0, grain 0, syndrome 0x11c1, row 4, channel 1, label "": amd64_edac
May 17 12:44:42 oldweb kernel: [Hardware Error]: Error Status: Corrected error, no action required.
May 17 12:44:42 oldweb kernel: [Hardware Error]: CPU:8 (10:2:3) MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc60c00011080a13
May 17 12:44:42 oldweb kernel: [Hardware Error]: MC4_ADDR: 0x0000000b66a9caf0
May 17 12:44:42 oldweb kernel: [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
May 17 12:44:43 oldweb kernel: [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
May 17 12:44:43 oldweb kernel: EDAC amd64 MC2: CE ERROR_ADDRESS= 0x8923a3f90
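
The fields in these log lines are related to each other, and a short sketch makes that visible. The Python below splits an ERROR_ADDRESS into the page and offset that EDAC prints, and pulls the flags and syndrome out of an MC4_STATUS value. The bit positions and the 4 KiB page size are assumptions based on the AMD family 10h status register layout as decoded by the Linux amd64_edac driver. The function names are hypothetical, and the values are taken from the log above.

```python
# Assumed bit positions in MC4_STATUS (AMD family 10h layout).
FLAG_BITS = {
    "Val": 63, "Over": 62, "UC": 61, "En": 60,
    "MiscV": 59, "AddrV": 58, "PCC": 57,
    "CECC": 46, "UECC": 45,
}


def decode_status(status):
    """Return the flag bits and ECC syndrome packed into a status value."""
    flags = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    # Assumed syndrome layout: low byte in bits 54:47, high byte in bits 31:24.
    syndrome = ((status >> 47) & 0xFF) | ((status >> 16) & 0xFF00)
    return {"flags": flags, "syndrome": syndrome}


def split_address(addr, page_shift=12):
    """Split a physical address into (page, offset), assuming 4 KiB pages."""
    return addr >> page_shift, addr & ((1 << page_shift) - 1)


if __name__ == "__main__":
    decoded = decode_status(0xDC60C00011080A13)
    set_flags = [name for name, on in decoded["flags"].items() if on]
    print("flags set:", set_flags)
    print("syndrome:", hex(decoded["syndrome"]))
    page, offset = split_address(0xA97802B50)
    print("page:", hex(page), "offset:", hex(offset))
```

With UC clear and CECC set, the event is a corrected ECC error, which agrees with the "Corrected error, no action required" line in the log.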



According to sources online, (node 2) refers to CPU2.
So the error counters were checked with this command:
grep "[0-9]" /sys/devices/system/edac/mc/mc*/csrow*/ch*_ce_count

(other output omitted)
/sys/devices/system/edac/mc/mc2/csrow1/ch1_ce_count:4414           (this counter is no longer zero, so this is most likely the faulty module)

Checking each csrow separately:

[root@oldweb ~]# grep [0-9] /sys/devices/system/edac/mc/mc2/csrow0/ch*_ce_count
/sys/devices/system/edac/mc/mc2/csrow0/ch0_ce_count:0
/sys/devices/system/edac/mc/mc2/csrow0/ch1_ce_count:4976


[root@oldweb ~]# grep [0-9] /sys/devices/system/edac/mc/mc2/csrow1/ch*_ce_count
/sys/devices/system/edac/mc/mc2/csrow1/ch0_ce_count:0
/sys/devices/system/edac/mc/mc2/csrow1/ch1_ce_count:4414

Any line whose count is not 0 indicates that memory errors have been recorded.
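
The same check can be scripted so that only the nonzero counters are listed. This is a minimal sketch and not part of the original troubleshooting: it assumes the sysfs layout shown in the grep output above (mc*/csrow*/ch*_ce_count), and the function name and its `edac_root` parameter are hypothetical.

```python
import re
from pathlib import Path

# Matches the counter paths shown in the grep output above.
COUNTER_RE = re.compile(r"mc(\d+)/csrow(\d+)/ch(\d+)_ce_count$")


def nonzero_ce_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return (mc, csrow, ch, count) tuples for counters greater than zero."""
    results = []
    for path in sorted(Path(edac_root).glob("mc*/csrow*/ch*_ce_count")):
        match = COUNTER_RE.search(path.as_posix())
        if not match:
            continue
        try:
            count = int(path.read_text().strip())
        except (OSError, ValueError):
            # Skip counters that cannot be read or parsed.
            continue
        if count > 0:
            mc, csrow, ch = (int(g) for g in match.groups())
            results.append((mc, csrow, ch, count))
    return results


if __name__ == "__main__":
    for mc, csrow, ch, count in nonzero_ce_counts():
        print(f"mc{mc} csrow{csrow} ch{ch}: {count} corrected errors")
```

On a machine without EDAC sysfs entries the function simply returns an empty list.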

 

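mc*: which CPU (memory controller) the counter belongs to. Be sure to check whether the motherboard labels it CPU0 or CPU2. Do not assume that CPU1 is the second processor; go by the actual labels on the board.

csrow*: treated here as the memory channel. Strictly, the kernel's EDAC naming defines csrow as a chip-select row, typically one rank of a DIMM.

ch*: treated here as the module within the channel. In EDAC terms this is the channel number, matching the "channel 1" field in the log lines.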

From this analysis, the fault was traced to DIMM 1 on the first channel of CPU2, so that module was removed.
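
The log excerpt can serve as a cross-check, since every "EDAC MC2: CE page" line in it reports channel 1, with several different row values. The sketch below tallies such lines by controller, row, and channel. The regular expression is an assumption fitted to the log format shown in this article, and the function name is hypothetical.

```python
import re
from collections import Counter

# Fitted to lines such as:
# "EDAC MC2: CE page 0xa97802, offset 0xb50, ..., row 0, channel 1, ..."
CE_LINE_RE = re.compile(
    r"EDAC MC(\d+): CE page 0x[0-9a-f]+, offset 0x[0-9a-f]+, "
    r".*?row (\d+), channel (\d+)"
)


def tally_ce_events(lines):
    """Count corrected-error events per (mc, row, channel)."""
    tally = Counter()
    for line in lines:
        match = CE_LINE_RE.search(line)
        if match:
            mc, row, channel = (int(g) for g in match.groups())
            tally[(mc, row, channel)] += 1
    return tally


if __name__ == "__main__":
    # Reading /var/log/messages usually requires root privileges.
    with open("/var/log/messages", errors="replace") as log:
        for (mc, row, channel), n in sorted(tally_ce_events(log).items()):
            print(f"MC{mc} row {row} channel {channel}: {n} events")
```

If nearly all events fall on a single controller and channel, that supports replacing the module in that position rather than suspecting the whole bank.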




 
(By: 小编, the site editor)
Original source: http://news.shangjiaku.cn/show-186826.html