RHEL6 Kernel Panic and System Crash – Troubleshooting Reference

by 서진우 · November 5, 2015

Crash is a generic term usually used to say that the system has come to a halt and no progress is observed. The system seems unresponsive or has already rebooted.

Kernel panic: a voluntary halt of all system activity when the kernel detects an abnormal situation. A kernel panic is the action an operating system takes upon detecting an internal fatal error from which it cannot safely recover. In Linux, kernel panics can have several causes:

Hardware: Machine Check Exceptions
Error Detection and Correction (EDAC)
Non-Maskable Interrupts (NMIs)
- Hardware NMI Button
- NMI Watch Dog
- unknown_nmi_panic
- panic_on_unrecovered_nmi
- panic_on_io_nmi
Software: The BUG() macro
Software: Bad pointer handling
Software: Pseudo-hangs
Software: Out-of-Memory killer

Hardware: Machine Check Exceptions

Machine Check Exceptions are normally caused by component failures that the hardware detects and reports via an exception. They typically look like:

kernel: CPU 0: Machine Check Exception: 4
Bank 0: b278c00000000175
kernel: TSC 4d9eab664a9a60
kernel: Kernel panic - not syncing: Machine check
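
The Bank value in a message like this is the raw status register of the reporting machine-check bank. As a rough illustration, the sketch below splits such a value into the flag bits defined for the x86 MCi_STATUS register (VAL, OVER, UC, EN, MISCV, ADDRV, PCC) and the low 16-bit MCA error code. It assumes the sample follows that architectural layout and does not attempt any vendor-specific decoding; the function name is made up for this example, and a dedicated tool such as mcelog remains the proper way to interpret a real event.

```python
# Flag bits in the upper part of an x86 MCi_STATUS register.
STATUS_FLAGS = {
    "VAL": 63,    # register contains valid error information
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # error reporting was enabled for this bank
    "MISCV": 59,  # the MISC register holds extra information
    "ADDRV": 58,  # the ADDR register holds a failing address
    "PCC": 57,    # processor context may be corrupt
}


def decode_mce_status(status_hex):
    """Split a bank status value (hex string) into flags and error code."""
    value = int(status_hex, 16)
    decoded = {}
    for name, bit in STATUS_FLAGS.items():
        decoded[name] = bool((value >> bit) & 1)
    decoded["mca_error_code"] = value & 0xFFFF  # low 16 bits
    return decoded


if __name__ == "__main__":
    # Bank value taken from the sample message above.
    for name, val in decode_mce_status("b278c00000000175").items():
        print(name, val)
```

For the sample value, the UC and PCC bits are set, which is consistent with the kernel deciding it cannot continue.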

Sample Scenario 1 :

The system hangs or panics with an MCE (Machine Check Exception) logged in /var/log/messages.
The system was not responding; the messages on the netdump server include "Kernel panic - not syncing: Machine check".
The system crashes under load.
The system crashed and rebooted.
A Machine Check Exception panic is reported.

The troubleshooting procedure is posted in "Redhat Enterprise Linux – Troubleshooting Kernel Panic issues – Part 2".

Error Detection and Correction (EDAC)

EDAC is a hardware mechanism to detect and report memory chip and PCI transfer errors. The errors are reported in /sys/devices/system/edac/{mc/,pci} and logged by the kernel as:

EDAC MC0: CE page 0x283, offset 0xce0, grain 8,
syndrome 0x6ec3, row 0, channel 1 "DIMM_B1":
amd76x_edac

Informational EDAC messages (such as a corrected ECC error) are printed to the system log, whereas critical EDAC messages (such as exceeding a hardware-defined temperature threshold) trigger a kernel panic.
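
Besides the log messages, the per-controller error counters can be read from the sysfs tree mentioned above. The following is a minimal sketch, assuming each memory controller directory (mc0, mc1, ...) exposes ce_count and ue_count files as the EDAC sysfs interface normally does; the function name is only illustrative.

```python
import glob
import os


def read_edac_counts(base="/sys/devices/system/edac/mc"):
    """Return {controller: (ce_count, ue_count)} for each mc* directory."""
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(base, "mc*"))):
        values = []
        for name in ("ce_count", "ue_count"):
            try:
                with open(os.path.join(mc_dir, name)) as fh:
                    values.append(int(fh.read().strip()))
            except (IOError, OSError, ValueError):
                values.append(None)  # counter missing or unreadable
        counts[os.path.basename(mc_dir)] = tuple(values)
    return counts
```

A steadily growing corrected-error count (ce_count) on one controller is usually a hint to look at the DIMM named in the log message before it turns into uncorrected errors.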

Sample Scenario 2 :

The console screen shows messages like the following:

Northbridge Error, node 1, core: -1
K8 ECC error.
EDAC amd64 MC1: CE ERROR_ADDRESS= 0x101a793400
EDAC MC1: INTERNAL ERROR: row out of range (-22 >= 8)
EDAC MC1: CE - no information available: INTERNAL ERROR
EDAC MC1: CE - no information available: amd64_edacError Overflow

The troubleshooting procedure is posted in "Redhat Enterprise Linux – Troubleshooting Kernel Panic issues – Part 2".

Non-Maskable Interrupts (NMIs)

A non-maskable interrupt (NMI) is an interrupt that cannot be ignored or masked out by standard operating system mechanisms. It is generally used only for critical hardware errors; however, more recent changes in behavior have added the following functionality:

1) NMI button

The NMI button can be used to signal the operating system when other standard input mechanisms (keyboard, ssh, network) have ceased to function.
It can be used to create an intentional panic for additional debugging. It may not always be a physical button.
It may be presented through an iLO or DRAC interface.

Unknown NMIs – The kernel has mechanisms to handle certain known NMIs appropriately; unknown ones typically result in kernel log warnings such as:

Uhhuh. NMI received.
Dazed and confused, but trying to continue
You probably have a hardware problem with your RAM chips
Uhhuh. NMI received for unknown reason 32.
Dazed and confused, but trying to continue.
Do you have a strange power saving mode enabled?
These unknown NMI messages can be produced by ECC and other hardware problems. The kernel can be configured, through this sysctl, to panic when they are received:

kernel.unknown_nmi_panic=1

This is generally only enabled for troubleshooting.

Sample Scenario 3:

The following error messages appear in /var/log/messages:

kernel: Dazed and confused, but trying to continue
kernel: Do you have a strange power saving mode enabled?
kernel: Uhhuh. NMI received for unknown reason 21 on CPU 0
kernel: Dazed and confused, but trying to continue
kernel: Do you have a strange power saving mode enabled?
kernel: Uhhuh. NMI received for unknown reason 31 on CPU 0.
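
When these messages repeat, it helps to pull out the reason code and CPU from each one to see whether a pattern emerges. The sketch below does that with a regular expression written against the sample lines in this post; the function name is hypothetical, and the reason code is kept as the string the kernel printed rather than interpreted.

```python
import re

# Matches both "unknown reason 32." and "unknown reason 21 on CPU 0".
NMI_RE = re.compile(
    r"NMI received for unknown reason ([0-9a-fA-F]+)(?: on CPU (\d+))?"
)


def unknown_nmi_events(lines):
    """Return a list of (reason, cpu) tuples; cpu is None if not logged."""
    events = []
    for line in lines:
        match = NMI_RE.search(line)
        if match:
            cpu = int(match.group(2)) if match.group(2) is not None else None
            events.append((match.group(1), cpu))
    return events
```

Feeding it the lines of /var/log/messages gives one tuple per unknown NMI, which can then be counted per reason or per CPU.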

The troubleshooting procedure is posted in "Redhat Enterprise Linux – Troubleshooting Kernel Panic issues – Part 2".

2) NMI watchdog

Watchdog-like software on the system monitors for perceived system hangs. The NMI watchdog monitors system interrupts and sends an NMI if the system appears to have hung.
On a normal system, hundreds of device and timer interrupts are received per second. If there are no interrupts in a 30-second interval, the NMI watchdog assumes that the system has hung and sends an NMI to the system to trigger a kernel panic or restart.

How an NMI watchdog works

A standard system level watchdog waits for regular events to fire and reboots the machine if no event is received within a designated timeframe. The NMI watchdog is no different. When using the NMI watchdog the system generates periodic NMI interrupts, and the kernel can monitor whether any CPU has locked up and print out debugging messages if so.

Enabling NMI Watchdog

The Red Hat Enterprise Linux 6 kernel is built with NMI watchdog support on currently supported x86 and x86-64 platforms.

Ensure NMI is being used:

For SMP machines and Single processor systems with an IO-APIC use nmi_watchdog=1.

For Single processor systems without an IO-APIC use nmi_watchdog=2.

Verifying that the NMI watchdog is working

Boot the system with the parameter as stated above and check the /proc/interrupts file for the NMI count line. This value should be non-zero and increase over time. If the value is zero and does not increase over time, the wrong NMI watchdog parameter has been used; change the nmi_watchdog value, restart the system, and check again.

If it is still zero, log a problem report; you probably have a processor that needs to be added to the NMI code.

Here is an example from /etc/grub.conf for systems which utilize the GRUB boot loader:

title Red Hat Enterprise Linux Server (2.6.32-358.6.1.el6.x86_64)
root (hd0,0)
kernel /vmlinuz-2.6.32-358.6.1.el6.x86_64 ro root=/dev/mapper/vg_worklaptop-lv_root crashkernel=auto rd_LVM_LV=vg_worklaptop/lv_root rhgb quiet nmi_watchdog=1
initrd /initramfs-2.6.32-358.6.1.el6.x86_64.img
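
After rebooting, it is worth confirming that the running kernel actually received the parameter. A small sketch, assuming the booted command line is read from /proc/cmdline and passed in as a string (the helper name is made up for this example):

```python
def nmi_watchdog_param(cmdline):
    """Return the value of nmi_watchdog= from a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        if token.startswith("nmi_watchdog="):
            value = token.split("=", 1)[1]  # keep the last occurrence seen
    return value


# Typical use on a live system:
#   with open("/proc/cmdline") as fh:
#       print(nmi_watchdog_param(fh.read()))
```

If this returns None, the boot loader entry that was edited is probably not the one that was booted.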

To determine if the NMI watchdog was properly activated, check the /proc/interrupts file. The NMI interrupt should display a non-zero value. If the NMI interrupt displays a zero, alter the nmi_watchdog value, restart the system, and examine this file again. If a zero is still displayed, then the processor in the test system is not supported by the NMI watchdog code.

The output, when functioning correctly, should look similar to the following:

[root@work-laptop wmealing]# cat /proc/interrupts | grep ^NMI
NMI: 861 636 377 357 Non-maskable interrupts

Each processor core has an NMI count. These should all be increasing over time. The above example is a quad core system.
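
The "increasing over time" check can be automated by reading the NMI line twice and comparing the per-CPU counts. The following is a minimal sketch; the function names and the five-second default interval are choices made for this example only.

```python
import time


def parse_nmi_counts(text):
    """Return the per-CPU NMI counts from /proc/interrupts content."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            # Numeric fields are the counters; the rest is the description.
            return [int(f) for f in fields[1:] if f.isdigit()]
    return []


def nmi_counts_increasing(path="/proc/interrupts", interval=5):
    """Sample the NMI line twice and report, per CPU, whether it grew."""
    with open(path) as fh:
        before = parse_nmi_counts(fh.read())
    time.sleep(interval)
    with open(path) as fh:
        after = parse_nmi_counts(fh.read())
    return [b > a for a, b in zip(before, after)]
```

A result of all True values means every core is receiving watchdog NMIs; any False entry is worth a second look with a longer interval before concluding that the watchdog is inactive.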

System wide NMI settings

The NMI settings can be configured at runtime by using the sysctl interface.

In the /etc/sysctl.conf, to enable, set:

kernel.nmi_watchdog = 1

To disable, set:

kernel.nmi_watchdog = 0

Note that this alone does not enable the functionality; the kernel boot parameter is still required to correctly enable the NMI watchdog.

unknown_nmi_panic

A feature was introduced in kernel 2.6.9 that makes it easier to diagnose system hangs on specific hardware.
The feature relies on the kernel's behavior when dealing with unknown NMI sources: the kernel is allowed to panic rather than handle the unknown NMI source. This feature cannot be used on systems that also use the NMI watchdog or oprofile (or other tools that use performance metric features), as these also make use of the undefined NMI interrupt. If unknown_nmi_panic is activated with one of these features present, it will not work.

Note that this is a user-initiated interrupt which is really most useful for helping to diagnose a system that is experiencing system hangs for unknown reasons.

To enable this feature, set the following system control parameter in the /etc/sysctl.conf file as follows:

kernel.unknown_nmi_panic = 1

To disable, set:

kernel.unknown_nmi_panic = 0

Once this change has taken effect, a panic can be forced by pushing the system’s NMI switch. Systems that do not have an NMI switch can still use the NMI Watchdog feature which will automatically generate an NMI if a system hang is detected.

panic_on_unrecovered_nmi

Some systems may generate an NMI based on vendor configuration, such as power management or low battery. It may be important to set this if your system is generating NMIs in a known-working environment.

To enable this feature, set the following system control parameter in the /etc/sysctl.conf file as follows:

kernel.panic_on_unrecovered_nmi = 1

To disable, set:

kernel.panic_on_unrecovered_nmi = 0

panic_on_io_nmi

This setting was only available in Red Hat Enterprise Linux 6. When set, this will cause a kernel panic when the kernel receives an NMI caused by an Input/Output error.
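
Because several NMI-related settings interact, it can be handy to print the current value of all of them in one go. The sketch below relies on the usual mapping of a sysctl name to a file under /proc/sys (dots become path separators); the list of names is taken from this post, and a setting that does not exist on the running kernel is reported as None. The function names are illustrative.

```python
import os

# NMI-related sysctls discussed in this post.
PANIC_SYSCTLS = [
    "kernel.nmi_watchdog",
    "kernel.unknown_nmi_panic",
    "kernel.panic_on_unrecovered_nmi",
    "kernel.panic_on_io_nmi",
]


def read_sysctl(name, root="/proc/sys"):
    """Return the current value of a sysctl as a string, or None if absent."""
    path = os.path.join(root, *name.split("."))
    try:
        with open(path) as fh:
            return fh.read().strip()
    except (IOError, OSError):
        return None


def panic_sysctl_report(root="/proc/sys"):
    """Return {sysctl name: current value} for the settings listed above."""
    return {name: read_sysctl(name, root) for name in PANIC_SYSCTLS}
```

Comparing this report against /etc/sysctl.conf shows quickly whether a change was written to the file but never applied to the running system.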

Sample Scenario 4 :

The console shows the following error message:

NMI: IOCK error (debug interrupt?)
CPU 0
Modules linked in: ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge mptctl mptbase bonding be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi dm_round_robin dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev sr_mod cdrom hpilo bnx2 serio_raw shpchp pcspkr sg dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage qla2xxx scsi_transport_fc ata_piix libata cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-194.17.4.el5 #1
RIP: 0010:[<ffffffff8019d550>] [<ffffffff8019d550>] acpi_processor_idle_simple+0x14c/0x30e
RSP: 0018:ffffffff803fbf58 EFLAGS: 00000046
RAX: 0000000000d4d87e RBX: ffff81061e10a160 RCX: 0000000000000908
RDX: 0000000000000915 RSI: 0000000000000003 RDI: 0000000000000000
RBP: 0000000000d4d87e R08: ffffffff803fa000 R09: 0000000000000039
R10: ffff810001005710 R11: 0000000000000000 R12: 0000000000000000
R13: ffff81061e10a000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffffff803ca000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000009013954 CR3: 000000060799d000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff803fa000, task ffffffff80308b60)
Stack: ffff81061e10a000 ffffffff8019d404 0000000000000000 ffffffff8019d404
0000000000090000 0000000000000000 0000000000000000 ffffffff8004923a
0000000000200800 ffffffff80405807 0000000000090000 0000000000000000
Call Trace:
[<ffffffff8019d404>] acpi_processor_idle_simple+0x0/0x30e
[<ffffffff8019d404>] acpi_processor_idle_simple+0x0/0x30e
[<ffffffff8004923a>] cpu_idle+0x95/0xb8
[<ffffffff80405807>] start_kernel+0x220/0x225
[<ffffffff8040522f>] _sinittext+0x22f/0x236

Code: 89 ca ed ed 41 89 c4 41 8a 45 1c 83 e0 30 3c 30 75 15 f0 ff

The troubleshooting procedure is posted in "Redhat Enterprise Linux – Troubleshooting Kernel Panic issues – Part 2".

Software: The BUG() macro

This kind of kernel panic is normally raised by kernel code when it sees an abnormal situation that indicates a programming error. The output normally looks like:

Kernel BUG at spinlock:118
invalid operand: 0000 [1] SMP
CPU 0

Sample Scenario 5:

An NFS client kernel crashes because an async task is already queued, hitting BUG_ON(RPC_IS_QUEUED(task)); in __rpc_execute:
kernel BUG at net/sunrpc/sched.c:616!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
CPU 8
Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss pcc_cpufreq sunrpc power_meter hpilo
hpwdt igb mlx4_ib(U) mlx4_en(U) raid0 mlx4_core(U) sg microcode serio_raw iTCO_wdt
iTCO_vendor_support ioatdma dca shpchp ext4 mbcache jbd2 raid1 sd_mod crc_t10dif mpt2sas
scsi_transport_sas raid_class ahci dm_mirror dm_region_hash dm_log dm_mod
[last unloaded: scsi_wait_scan]

Pid: 2256, comm: rpciod/8 Not tainted 2.6.32-220.el6.x86_64 #1 HP ProLiant SL250s Gen8/
RIP: 0010:[<ffffffffa01fe458>] [<ffffffffa01fe458>] __rpc_execute+0x278/0x2a0 [sunrpc]
…
Process rpciod/8 (pid: 2256, threadinfo ffff882016152000, task ffff8820162e80c0)
…
Call Trace:
[<ffffffffa01fe4d0>] ? rpc_async_schedule+0x0/0x20 [sunrpc]
[<ffffffffa01fe4e5>] rpc_async_schedule+0x15/0x20 [sunrpc]
[<ffffffff8108b2b0>] worker_thread+0x170/0x2a0
[<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8108b140>] ? worker_thread+0x0/0x2a0
[<ffffffff81090886>] kthread+0x96/0xa0
[<ffffffff8100c14a>] child_rip+0xa/0x20
Code: db df 2e e1 f6 05 e0 26 02 00 40 0f 84 48 fe ff ff 0f b7 b3 d4 00 00 00 48 c7
c7 94 39 21 a0 31 c0 e8 b9 df 2e e1 e9 2e fe ff ff <0f> 0b eb fe 0f b7 b7 d4 00 00 00
31 c0 48 c7 c7 60 63 21 a0 e8
RIP [<ffffffffa01fe458>] __rpc_execute+0x278/0x2a0 [sunrpc]
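
The most useful line in a BUG() report is the one naming the source file and line number, since it points straight at the failed check. A small sketch for extracting it from a log line follows; the regular expression is written against the two samples in this section and the function name is hypothetical.

```python
import re

# Accepts "kernel BUG at net/sunrpc/sched.c:616!" and "Kernel BUG at spinlock:118".
BUG_RE = re.compile(r"kernel BUG at (\S+?):(\d+)", re.IGNORECASE)


def bug_location(line):
    """Return (source_file, line_number) from a BUG line, or None."""
    match = BUG_RE.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

With the file and line in hand, the matching kernel source for the exact kernel version shown in the Pid line can be checked to see which condition was violated.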

The troubleshooting procedure is posted in "Redhat Enterprise Linux – Troubleshooting Kernel Panic issues – Part 2".

Software: Bad pointer handling

This kind of kernel panic typically indicates a programming error and normally appears as below:

NULL pointer dereference at 0x1122334455667788 ..
or
Unable to handle kernel paging request at virtual address 0x11223344

One of the most common reasons for this kind of error is memory corruption.

Sample Scenario 6 :

NFS client kernel panics when doing an ls in the directory of a snapshot that has already been removed.

NFS client kernel panics under certain conditions when connected to NFS server either NetApp or Solaris ZFS

The kernel crashes with the message:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [<ffffffff81192957>] commit_tree+0x77/0x100
PGD 7ff2e69067 PUD 7feaf59067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:07:00.0/vendor
CPU 64
Modules linked in: nls_utf8 fuse mptctl mptbase autofs4 nfs lockd fscache(T) nfs_acl auth_rpcgss bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc scsi_tgt 8021q garp stp llc smbus(U) ipmi_devintf ipmi_si ipmi_msghandler sunrpc cpufreq_ondemand acpi_cpufreq freq_table nf_conntrack_ftp ipt_REJECT ipt_LOG iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vfat fat dm_mirror dm_region_hash dm_log microcode sg i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma i7core_edac edac_core ixgbe mdio igb dca ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif ata_generic pata_acpi ata_piix megaraid_sas dm_mod [last unloaded: scsi_wait_scan]

Modules linked in: nls_utf8 fuse mptctl mptbase autofs4 nfs lockd fscache(T) nfs_acl auth_rpcgss bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc scsi_tgt 8021q garp stp llc smbus(U) ipmi_devintf ipmi_si ipmi_msghandler sunrpc cpufreq_ondemand acpi_cpufreq freq_table nf_conntrack_ftp ipt_REJECT ipt_LOG iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vfat fat dm_mirror dm_region_hash dm_log microcode sg i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma i7core_edac edac_core ixgbe mdio igb dca ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif ata_generic pata_acpi ata_piix megaraid_sas dm_mod [last unloaded: scsi_wait_scan]
Pid: 79910, comm: ls Tainted: G ---------------- T 2.6.32-131.6.1.el6.x86_64 #1 PRIMERGY RX900 S1
RIP: 0010:[<ffffffff81192957>] [<ffffffff81192957>] commit_tree+0x77/0x100
RSP: 0018:ffff885f1484dab8 EFLAGS: 00010246
RAX: ffff881f5f43d3e8 RBX: ffff885f1484dab8 RCX: ffff885f1484dab8
RDX: ffff881f5f43d3e8 RSI: ffff881f5f43d3e8 RDI: ffff885f1484dab8
RBP: ffff885f1484dae8 R08: ffff881f5f43d3e8 R09: 0000000000000000
R10: ffff882080440a40 R11: 0000000000000000 R12: 0000000000000000
R13: ffff881f5f43d380 R14: ffff881f5fcba2c0 R15: 0000000000000000
FS: 00007f9b188177a0(0000) GS:ffff88011c700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 0000007fecaf5000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ls (pid: 79910, threadinfo ffff885f1484c000, task ffff881fc4164b00)
Stack:
ffff881f5f43d3e8 ffff881f5f43d3e8 ffff881f5f43d380 ffff885f1484db08
<0> ffff881f5fcba2c0 ffff885f1484ddd8 ffff885f1484db48 ffffffff81192c6f
<0> ffff881c94a4d200 000000001484dbf8 ffff885f1484db08 ffff885f1484db08
Call Trace:
[<ffffffff81192c6f>] attach_recursive_mnt+0x28f/0x2a0
[<ffffffff81192d80>] graft_tree+0x100/0x140
[<ffffffff814dc686>] ? down_write+0x16/0x40
[<ffffffff81192e5f>] do_add_mount+0x9f/0x160
[<ffffffffa045ce2f>] nfs_follow_mountpoint+0x1bf/0x570 [nfs]
[<ffffffff811810a0>] do_follow_link+0x120/0x440
[<ffffffffa03112e0>] ? put_rpccred+0x50/0x150 [sunrpc]
[<ffffffff81180eeb>] __link_path_walk+0x78b/0x820
[<ffffffff8118164a>] path_walk+0x6a/0xe0
[<ffffffff8118181b>] do_path_lookup+0x5b/0xa0
[<ffffffff811819a7>] user_path_at+0x57/0xa0
[<ffffffff81041594>] ? __do_page_fault+0x1e4/0x480
[<ffffffff810ce97d>] ? audit_filter_rules+0x2d/0xa10
[<ffffffff81177cac>] vfs_fstatat+0x3c/0x80
[<ffffffff81177d5e>] vfs_lstat+0x1e/0x20
[<ffffffff81177d84>] sys_newlstat+0x24/0x50
[<ffffffff810d1ad2>] ? audit_syscall_entry+0x272/0x2a0
[<ffffffff814e054e>] ? do_page_fault+0x3e/0xa0
[<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
Code: 83 e8 68 eb 12 0f 1f 80 00 00 00 00 4c 89 a0 c0 00 00 00 48 8d 42 98 48 8b 50 68 48 8d 48 68 48 39 cb 0f 18 0a 75 e5 48 8b 45 d0 <49> 8b 54 24 18 48 39 d8 74 15 48 8b 0a 48 8b 5d d8 48 89 50 08
RIP [<ffffffff81192957>] commit_tree+0x77/0x100
RSP <ffff885f1484dab8>
CR2: 0000000000000018
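
When comparing several oopses, it is easier to work with the call trace as structured data than as raw text. The sketch below parses trace lines in the format shown above into symbol, offset and module, and flags frames printed with a leading "?", which the kernel could not verify while walking the stack. The regular expression only targets this RHEL 6 style output, and the names are made up for the example.

```python
import re

FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def parse_frames(lines):
    """Return one dict per call trace frame found in the given lines."""
    frames = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match:
            frames.append({
                "symbol": match.group("sym"),
                "offset": int(match.group("off"), 16),
                "module": match.group("mod"),  # None for core kernel symbols
                "reliable": match.group("guess") is None,
            })
    return frames
```

Filtering on the module field makes it easy to see, for example, that the trace above enters the mount code from nfs_follow_mountpoint in the nfs module.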

The troubleshooting procedure is posted in "Redhat Enterprise Linux – Troubleshooting Kernel Panic issues – Part 2".

Software: Pseudo-hangs

These are common situations in which the system appears to be hung but some progress is still being made. There are several reasons for this kind of behaviour:

Livelock – when running a realtime kernel, application load could be too high, leading the system into a state where it becomes effectively unresponsive in a "live lock / busy wait" state. The system is not actually hung, but is moving so slowly that it appears to be hung.
Thrashing – continuous swapping with close to no useful processing done
Lower zone starvation – on i386 the low memory has a special significance and the system may “hang” even when there’s plenty of free memory
Memory starvation in one node in a NUMA system

Hangs that are not detected by the hardware are normally trickier to debug:

Use [sysrq + t] to collect process stack traces when possible
Enable the NMI watchdog which should detect those situations
Run hardware diagnostics when it’s a hard hang: memtest86, HP diagnostics
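
If the keyboard is not reachable but a shell still responds, the same sysrq action can be requested by writing the key to /proc/sysrq-trigger as root. A minimal sketch of that, with a hypothetical helper name; the resulting stack traces go to the kernel log, not to the caller.

```python
def trigger_sysrq(key, path="/proc/sysrq-trigger"):
    """Write a single sysrq command key (for example "t") to the trigger file."""
    if len(key) != 1:
        raise ValueError("sysrq key must be a single character")
    with open(path, "w") as fh:
        fh.write(key)


# Typical use as root on a sluggish system, then read the traces with dmesg:
#   trigger_sysrq("t")
```

Only the task dump key is shown here on purpose; other sysrq keys can reboot or crash the machine, so they should not be scripted casually.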

Sample Scenario 7:

The system frequently hangs, and the following error messages are logged in /var/log/messages while performing I/O operations on the /dev/cciss/xx devices:

INFO: task cmaperfd:5628 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
cmaperfd D ffff810009025e20 0 5628 1 5655 5577 (NOTLB)
ffff81081bdc9d18 0000000000000082 0000000000000000 0000000000000000
0000000000000000 0000000000000007 ffff81082250f040 ffff81043e100040
0000d75ba65246a4 0000000001f4db40 ffff81082250f228 0000000828e5ac68
Call Trace:
[<ffffffff8803bccc>] :jbd2:start_this_handle+0x2ed/0x3b7
[<ffffffff800a3c28>] autoremove_wake_function+0x0/0x2e
[<ffffffff8002d0f4>] mntput_no_expire+0x19/0x89
[<ffffffff8803be39>] :jbd2:jbd2_journal_start+0xa3/0xda
[<ffffffff8805e7b0>] :ext4:ext4_dirty_inode+0x1a/0x46
[<ffffffff80013deb>] __mark_inode_dirty+0x29/0x16e
[<ffffffff80041bf5>] inode_setattr+0xfd/0x104
[<ffffffff8805e70c>] :ext4:ext4_setattr+0x2db/0x365
[<ffffffff88055abc>] :ext4:ext4_file_open+0x0/0xf5
[<ffffffff8002cf2b>] notify_change+0x145/0x2f5
[<ffffffff800e45fe>] sys_fchmod+0xb3/0xd7
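
On a system that hangs repeatedly, summarising which tasks are reported as blocked often narrows the problem down to one filesystem or device. The sketch below extracts the task name, PID and timeout from "blocked for more than" lines; the pattern is based on the sample message above and the function name is only for illustration.

```python
import re

HUNG_RE = re.compile(r"INFO: task (.+):(\d+) blocked for more than (\d+) seconds")


def hung_tasks(lines):
    """Return a list of (task_name, pid, seconds) for hung-task warnings."""
    found = []
    for line in lines:
        match = HUNG_RE.search(line)
        if match:
            found.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return found
```

Counting the task names returned for a whole log file shows whether one process or many are stuck behind the same journal or device.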

The troubleshooting procedure is posted in "Redhat Enterprise Linux – Troubleshooting Kernel Panic issues – Part 2".

Software: Out-of-Memory killer

In certain memory starvation cases, the OOM killer is triggered to force the release of some memory by killing a “suitable” process. In severe starvation cases, the OOM killer may have to panic the system when no killable processes are found:

Kernel panic - not syncing:
Out of memory and no killable processes...

The kernel can also be configured to always panic during an OOM by setting the vm.panic_on_oom = 1 sysctl.
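
To see which processes the OOM killer currently considers the most "suitable", the per-process oom_score files under /proc can be compared; a higher score means the process is more likely to be chosen. The following is a rough sketch with a hypothetical function name that lists the highest scores and skips processes that exit while being read.

```python
import os


def top_oom_candidates(proc="/proc", limit=5):
    """Return up to `limit` (oom_score, pid) tuples, highest score first."""
    scores = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "oom_score")) as fh:
                score = int(fh.read().strip())
        except (IOError, OSError, ValueError):
            continue  # process exited or score not readable
        scores.append((score, int(entry)))
    return sorted(scores, reverse=True)[:limit]
```

Checking this list on a healthy system before enabling vm.panic_on_oom gives an idea of what would otherwise have been killed.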

Sample Scenario 8 :

When the system panics, kdump starts, but it hangs and does not output a vmcore. I see the following error message on the console:

Kernel panic - not syncing: Out of memory and no killable processes...

The troubleshooting procedure is posted in "Redhat Enterprise Linux – Troubleshooting Kernel Panic issues – Part 2".

I am preparing my lab systems to give a demo of the kernel crash utility for analysing kernel panic issues, including the diagnosis and root causes of the kernel panic scenarios discussed in this post.

Please let me know if you have experienced any other kind of kernel panic incident that I have missed here, so that it will be useful for others.
