Machine-check exception
This article aims to help users implement services to actively monitor, log, and report hardware errors. A machine check exception (MCE) is an error generated by the CPU when the CPU detects that a hardware error or failure has occurred.
Machine check exceptions (MCEs) can occur for a variety of reasons ranging from undesired or out-of-spec voltages from the power supply, from cosmic radiation flipping bits in memory DIMMs or the CPU, or from other miscellaneous faults, including faulty software triggering hardware errors.
Installation
Install the mcelog package. mcelog written by Andi Kleen is one of the tools to gather MCE information.
Configuration
mcelog's configuration file is located at /etc/mcelog/mcelog.conf
. See man mcelog
, man mcelog.conf
and man mcelog.triggers
for more information.
Start and enable mcelog.service
. By default, the service runs mcelog as a daemon.
See also
- Wikipedia:Machine_Check_Exception
- Wikipedia:Machine_check_architecture
- mcelog Home
- mcelog References