Kexec/Kdump HOWTO – linux kernel crash dump

Introduction

Kexec and kdump are features introduced in the mainline 2.6 kernel and
included in Red Hat Enterprise Linux 5. Their purpose is to provide faster
reboots and reliable kernel vmcores for diagnostic purposes.

Overview

Kexec

Kexec is a fastboot mechanism that allows booting a Linux kernel from the
context of an already running kernel without going through the BIOS. BIOS
initialization can be very time-consuming, especially on large servers with
many peripherals, so kexec can save a lot of time for developers who end up
rebooting a machine numerous times.

Kdump

Kdump is a kernel crash dumping mechanism. It is very reliable because the
crash dump is captured from the context of a freshly booted kernel rather
than from the context of the crashed kernel. Kdump uses kexec to boot into
a second kernel whenever the system crashes. This second kernel, often called
a capture kernel, boots with very little memory and captures the dump image.

The first kernel reserves a section of memory that the second kernel uses
to boot. Because kexec boots the capture kernel without going through the
BIOS, the contents of the first kernel's memory are preserved, and that
memory is essentially the kernel crash dump.

Kdump is supported on the i686, x86_64, ia64 and ppc64 platforms. The
standard kernel and the capture kernel are one and the same on i686, x86_64
and ia64, while ppc64 currently requires a separate capture kernel (provided
by the kernel-kdump package).

If you're reading this document, you should already have kexec-tools
installed. If not, install it with the following command:

    # yum install kexec-tools

Now load a kernel with kexec:

    # kver=`uname -r`
    # kexec -l /boot/vmlinuz-$kver --initrd=/boot/initrd-$kver.img \
        --command-line="`cat /proc/cmdline`"

NOTE: The above will boot you back into the kernel you're currently running.
To load a different kernel, substitute its version for `uname -r`.

Now reboot your system, taking note that it should bypass the BIOS:

    # reboot
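
The load step above can be wrapped in a small helper. The following is a
minimal sketch, not part of kexec-tools: the function name kexec_load_cmd is
hypothetical, and it assumes the /boot/vmlinuz-<version> and
/boot/initrd-<version>.img naming used above. It only prints the command so
that it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper: print the kexec load command for a kernel version.
# Arguments: $1 = kernel version (e.g. the output of `uname -r`)
#            $2 = kernel command line (e.g. the contents of /proc/cmdline)
kexec_load_cmd() {
    kver=$1
    cmdline=$2
    printf 'kexec -l /boot/vmlinuz-%s --initrd=/boot/initrd-%s.img --command-line="%s"\n' \
        "$kver" "$kver" "$cmdline"
}

# Example: show the command for the running kernel (it is not executed).
kexec_load_cmd "$(uname -r)" "$(cat /proc/cmdline 2>/dev/null)"
```

To load a different kernel, pass its version string as the first argument
instead of the output of `uname -r`.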

How to configure kdump:

Again, we assume that if you're reading this document, you already have
kexec-tools installed. If not, install it with the following command:

    # yum install kexec-tools

If you’re on ppc64, you’ll first need to install the kernel-kdump package:

    # yum install kernel-kdump

To be able to do much of anything interesting in the way of debug analysis,

you’ll also need to install the kernel-debuginfo package, of the same arch

as your running kernel, and the crash utility:

    # yum --enablerepo=\*debuginfo install kernel-debuginfo.$(uname -m) crash

Next, modify the boot parameters to reserve a chunk of memory for the
capture kernel. For i686 and x86_64, edit /etc/grub.conf and append
"crashkernel=128M@16M" to the end of your kernel line. For ppc64, append
the same to the append line in /etc/yaboot.conf, then run /sbin/ybin to
load the new configuration (this step is not needed for grub). On ia64,
edit /etc/elilo.conf and add "crashkernel=256M@256M" to the append line for
your kernel. In the X@Y notation, X is the amount of memory to reserve for
the capture kernel and Y is the offset into memory at which the reservation
starts.

Examples:

    # cat /etc/grub.conf
    # grub.conf generated by anaconda
    #
    # Note that you do not have to rerun grub after making changes to this file
    # NOTICE:  You have a /boot partition.  This means that
    #          all kernel and initrd paths are relative to /boot/, eg.
    #          root (hd0,0)
    #          kernel /vmlinuz-version ro root=/dev/VolGroup00/root
    #          initrd /initrd-version.img
    #boot=/dev/hda
    default=0
    timeout=5
    splashimage=(hd0,0)/grub/splash.xpm.gz
    hiddenmenu
    title Red Hat Enterprise Linux (2.6.17-1.2621.el5)
            root (hd0,0)
            kernel /vmlinuz-2.6.17-1.2621.el5 ro root=/dev/VolGroup00/root crashkernel=128M@16M
            initrd /initrd-2.6.17-1.2621.el5.img

    # cat /etc/yaboot.conf
    # yaboot.conf generated by anaconda

    boot=/dev/sda1
    init-message=Welcome to Red Hat Enterprise Linux!\nHit <TAB> for boot options

    partition=2
    timeout=80
    install=/usr/lib/yaboot/yaboot
    delay=5
    enablecdboot
    enableofboot
    enablenetboot
    nonvram
    fstype=raw

    image=/vmlinuz-2.6.17-1.2621.el5
            label=linux
            read-only
            initrd=/initrd-2.6.17-1.2621.el5.img
            append="root=LABEL=/ crashkernel=128M@16M"

    # cat /etc/elilo.conf
    prompt
    timeout=20
    default=2.6.17-1.2621.el5
    relocatable

    image=vmlinuz-2.6.17-1.2621.el5
            label=2.6.17-1.2621.el5
            initrd=initrd-2.6.17-1.2621.el5.img
            read-only
            append="-- root=LABEL=/ crashkernel=256M@256M"

After making these changes, reboot your system so that the X MB of memory
starting Y MB into physical memory is left untouched by the normal system
and reserved for the capture kernel. Note that the output of 'free -m' will
show X MB less memory than it would without this parameter, which is
expected. You may be able to get by with less than 128M, but testing with
only 64M has proven unreliable of late. On ia64, as much as 512M may be
required.
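
After the reboot, it can be useful to confirm that the crashkernel parameter
actually reached the kernel. The following is a minimal sketch that splits an
X@Y value into its size and offset; the function name crashkernel_info is
hypothetical and is not part of kexec-tools.

```shell
#!/bin/sh
# Hypothetical helper: extract the crashkernel=X@Y reservation from a
# kernel command line and report its size (X) and offset (Y).
# Argument: $1 = kernel command line string
crashkernel_info() {
    for word in $1; do
        case $word in
            crashkernel=*@*)
                value=${word#crashkernel=}
                echo "size=${value%@*} offset=${value#*@}"
                return 0
                ;;
        esac
    done
    echo "no crashkernel reservation found" >&2
    return 1
}

# Example: inspect the running kernel's command line, if it is readable.
if [ -r /proc/cmdline ]; then
    crashkernel_info "$(cat /proc/cmdline)" || true
fi
```

For the grub.conf example above, the reported values would be a size of 128M
and an offset of 16M.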

Now that you’ve got that reserved memory region set up, you want to turn on

the kdump init script:

    # chkconfig kdump on

Then, start up kdump as well:

    # service kdump start

This should load your kernel-kdump image via kexec, leaving the system ready

to capture a vmcore upon crashing. To test this out, you can force-crash

your system by echo’ing a c into /proc/sysrq-trigger:

    # echo c > /proc/sysrq-trigger

You should see some panic output, followed by the system restarting into
the kdump kernel. When the boot process reaches the point where it starts
the kdump service, your vmcore should be copied out to disk (by default,
to /var/crash/<YYYY-MM-DD-HH:MM>/vmcore), and the system then reboots back
into your normal kernel.

Once back in your normal kernel, you can use the previously installed crash
utility in conjunction with the previously installed kernel-debuginfo to
perform postmortem analysis:

    # crash /usr/lib/debug/lib/modules/2.6.17-1.2621.el5/vmlinux \
        /var/crash/2006-08-23-15:34/vmcore
    crash> bt

and so on...

Advanced Setups:

In addition to being able to capture a vmcore to your system’s local file

system, kdump can be configured to capture a vmcore to a number of other

locations, including a raw disk partition, a dedicated file system, an NFS

mounted file system, or a remote system via ssh/scp. Additional options

exist for specifying the relative path under which the dump is captured,

what to do if the capture fails, and for compressing and filtering the dump

(so as to produce smaller, more manageable, vmcore files).

In theory, dumping to a location other than the local file system should be
safer than kdump's default setup, as it's possible the default setup will
try dumping to a file system that has become corrupted. The raw disk
partition and dedicated file system options still let you dump to the local
system, but without having to remount your possibly corrupted file
system(s), thereby decreasing the chance that a vmcore won't be captured.
Dumping to an NFS server or a remote system via ssh/scp has the same
advantage, and also lets you centralize vmcore files if you have several
systems from which you'd like to collect them. Note, however, that these
configurations could present problems if your network is unreliable.

Advanced setups are configured by modifying /etc/kdump.conf, which is fairly
well documented out of the box. Any change to /etc/kdump.conf should be
followed by a restart of the kdump service, so that the change is
incorporated into the kdump initrd. Restarting the kdump service is as
simple as '/sbin/service kdump restart'.

Note that kdump.conf configures how dump files are captured from the
initramfs (in the interests of safety). The root file system is mounted and
the init process is started only as a last resort, if the initramfs fails to
capture the vmcore. As such, configuration made in /etc/kdump.conf applies
only to captures performed in the initramfs. If for any reason the init
process is started on the root file system, only a simple copy of the vmcore
from /proc/vmcore to /var/crash/$DATE/vmcore will be performed.

Raw partition

Raw partition dumping requires that a disk partition in the system, at least
as large as the amount of memory in the system, be left unformatted. Assuming
/dev/sda5 is left unformatted, kdump.conf can be configured with
'raw /dev/sda5', and the vmcore file will be copied via dd directly onto
partition /dev/sda5. Restart the kdump service via
'/sbin/service kdump restart' to commit this change to your kdump initrd.
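
The size requirement can be checked before editing kdump.conf. The following
is a minimal sketch that compares MemTotal from /proc/meminfo with the
partition size from /proc/partitions, both of which are reported in KiB. The
function names are hypothetical, and /dev/sda5 is the example partition
assumed above.

```shell
#!/bin/sh
# Hypothetical check: is a partition at least as large as system memory?
# Both sizes are in KiB, the unit used by /proc/meminfo and /proc/partitions.

# Print MemTotal in KiB from meminfo-formatted text on stdin.
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }'
}

# Print the size in KiB of partition $1 from /proc/partitions-formatted
# text on stdin.
partition_kb() {
    awk -v dev="$1" '$4 == dev { print $3 }'
}

# Succeed if the partition size ($2) is at least the memory size ($1).
large_enough() {
    [ "$2" -ge "$1" ]
}

# Example with the sda5 partition assumed above.
if [ -r /proc/meminfo ] && [ -r /proc/partitions ]; then
    mem=$(mem_total_kb < /proc/meminfo)
    part=$(partition_kb sda5 < /proc/partitions)
    if [ -n "$mem" ] && [ -n "$part" ] && large_enough "$mem" "$part"; then
        echo "sda5 can hold a full memory image"
    else
        echo "sda5 is missing or smaller than system memory"
    fi
fi
```

The same check applies to the dedicated file system target, which has the
same size requirement.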

Dedicated file system

Similar to raw partition dumping, you can format a partition with the file
system of your choice, leaving it unmounted during normal operation. Again,
it should be at least as large as the amount of memory in the system.
Assuming /dev/sda3 has been formatted ext3, specify 'ext3 /dev/sda3' in
kdump.conf, and a vmcore file will be copied onto the file system after it
has been mounted. Dumping to a dedicated file system has the advantage that
you can dump multiple vmcores to it, space permitting, without overwriting
previous ones, as would be the case in a raw partition setup. Restart the
kdump service via '/sbin/service kdump restart' to commit this change to
your kdump initrd.

Note that for local file systems, ext3 and ext2 are supported as dump
targets. Kdump will not prevent you from specifying other file systems, and
they will most likely work, but their operation cannot be guaranteed. For
instance, specifying a vfat or msdos file system will result in a successful
load of the kdump service, but during crash recovery the dump will fail if
the system has more than 2GB of memory (since vfat and msdos file systems do
not support files larger than 2GB). Be careful with your file system
selection when using this target.

NFS mount

Dumping over NFS requires an NFS server configured to export a file system
with full read/write access for the root user. All operations within the
kdump initial ramdisk are done as root, and to write out a vmcore file, we
must be able to write to the NFS mount. Configuring an NFS server is outside
the scope of this document, but the no_root_squash and anonuid options on
the NFS server side are likely of interest for permitting the kdump initrd
to write to the NFS mount as root.

Assuming you're exporting /dump on the machine nfs-server.example.com,
once the mount is properly configured, specify it in kdump.conf via
'net nfs-server.example.com:/dump'. The server portion can be specified
either by host name or by IP address. Following a system crash, the kdump
initrd will mount the NFS export and copy the vmcore out to your NFS server.
Restart the kdump service via '/sbin/service kdump restart' to commit this
change to your kdump initrd.

Remote system via ssh/scp

Dumping over ssh/scp requires setting up passwordless ssh keys for every

machine you wish to have dump via this method. First up, configure kdump.conf

for ssh/scp dumping, adding a config line of ‘net user@server’, where ‘user’

can be any user on the target system you choose, and ‘server’ is the host

name or IP address of the target system. Using a dedicated, restricted user

account on the target system is recommended, as there will be keyless ssh

access to this account.

Once kdump.conf is appropriately configured, issue the command
'/sbin/service kdump propagate' to automatically set up the ssh host keys
and transmit the necessary bits to the target server. You'll have to type
'yes' to accept the host key for your target server if this is the first
time you've connected to it, and then enter the target system user's
password to send over the necessary ssh key file. Restart the kdump service
via '/sbin/service kdump restart' to commit this change to your kdump initrd.

Path

By default, local file system vmcore files are written to /var/crash/%DATE
on the local system, ssh/scp dumps to /var/crash/%HOST-%DATE on the target
system, dedicated file system partition dumps to ./var/crash/%DATE, and
NFS dumps to ./var/crash/%HOST-%DATE, the latter two both relative to
their respective mount points within the kdump initrd (usually /mnt). The
'/var/crash' portion of the path can be overridden using kdump.conf's 'path'
variable, should you wish to write the vmcore out to a different location.
For example, 'path /data/coredumps' would lead to vmcore files being written
to /data/coredumps/%DATE if you were dumping to your local file system. Note
that the path option is ignored if your kdump configuration results in the
core being saved from the initscripts in the root file system.
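
The default locations described above can be summarized in a few lines of
shell. The following is a minimal sketch that only restates those rules: the
function name dump_dir and the target labels (local, fs, nfs, ssh) are
hypothetical, and the exact values kdump substitutes for %HOST and %DATE are
passed in as arguments rather than derived.

```shell
#!/bin/sh
# Hypothetical helper: print the directory a vmcore would be written to,
# following the defaults described above.
# Arguments: $1 = target type: local, fs, nfs or ssh
#            $2 = value substituted for %HOST (used for nfs and ssh)
#            $3 = value substituted for %DATE, e.g. 2006-08-23-15:34
#            $4 = optional 'path' override (defaults to /var/crash)
dump_dir() {
    target=$1
    host=$2
    stamp=$3
    base=${4:-/var/crash}
    case $target in
        local) echo "$base/$stamp" ;;
        ssh)   echo "$base/$host-$stamp" ;;
        # The next two are relative to the mount point inside the initrd.
        fs)    echo ".$base/$stamp" ;;
        nfs)   echo ".$base/$host-$stamp" ;;
        *)     echo "unknown target: $target" >&2; return 1 ;;
    esac
}

# Example: local dump with 'path /data/coredumps' set in kdump.conf.
dump_dir local myhost 2006-08-23-15:34 /data/coredumps
```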

Default action

By default, if a configured dump method fails, the kdump initrd falls back

to trying to dump to the local file system (i.e., into the file system(s)

you would have mounted under normal system operation). The system always

reboots following an attempted dump to your local file system, regardless

of success or failure.

However, for any of the advanced methods, if the dump fails, you can
configure the kdump initrd to skip trying to dump to the local file system,
either rebooting immediately ('default reboot') or dropping you to a shell
within the initrd ('default shell'), from which you could try to capture the
vmcore manually. Again, if the 'default' parameter is unset, a local file
system dump will be attempted, and then the system will reboot.

Compression and filtering

The ‘core_collector’ parameter in kdump.conf allows you to specify a custom

dump capture method. The most common alternate method is makedumpfile, which

is a dump filtering and compression utility provided with kexec-tools. On

some architectures, it can drastically reduce the size of your vmcore files,

which becomes very useful on systems with large amounts of memory.

A typical setup is 'core_collector makedumpfile -c', but check the output of
'/sbin/makedumpfile --help' for a list of all available options (-i and -g
don't need to be specified; they're taken care of automatically). Note that
use of makedumpfile requires that the kernel-debuginfo package corresponding
to your running kernel be installed. Also note that, for technical reasons,
makedumpfile cannot be used with ssh/scp or raw dumps.

Finally, makedumpfile is only used from the initramfs. Saving a core from
the initscript in the root file system is considered a last-ditch effort,
used only when the initramfs has failed to save the core properly. As such,
only the cp utility is used in the initscripts. The implication is that in
order to use makedumpfile as your core collector, you must specify a dump
target in /etc/kdump.conf.
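
The restrictions above can be checked mechanically before restarting the
kdump service. The following is a minimal sketch, not a tool shipped with
kexec-tools: the function name check_collector is hypothetical, and it knows
only the directives mentioned in this document (raw, ext2, ext3, net and
core_collector).

```shell
#!/bin/sh
# Hypothetical sanity check for kdump.conf-formatted text on stdin,
# based only on the restrictions described above:
#   - makedumpfile cannot be combined with raw or ssh/scp targets
#   - makedumpfile needs an explicit dump target to take effect
# Prints "ok" or a short reason, and returns non-zero on a conflict.
check_collector() {
    awk '
        /^[ \t]*#/ { next }    # skip comment lines
        $1 == "core_collector" && $2 ~ /makedumpfile$/ { mdf = 1 }
        $1 == "raw"                  { target = "raw" }
        $1 == "ext2" || $1 == "ext3" { target = "fs" }
        $1 == "net" && $2 ~ /@/      { target = "ssh" }
        $1 == "net" && $2 ~ /:/      { target = "nfs" }
        END {
            if (!mdf) { print "ok"; exit 0 }
            if (target == "") {
                print "makedumpfile set but no dump target"
                exit 1
            }
            if (target == "raw" || target == "ssh") {
                print "makedumpfile cannot be used with target type: " target
                exit 1
            }
            print "ok"
        }
    '
}

# Example: check the live configuration if it exists.
if [ -r /etc/kdump.conf ]; then
    check_collector < /etc/kdump.conf || true
fi
```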

Caveats:

Console frame-buffers and X are not properly supported. If you typically run

with something along the lines of “vga=791” in your kernel config line or

have X running, console video will be garbled when a kernel is booted via

kexec. Note that the kdump kernel should still be able to create a dump,

and when the system reboots, video should be restored to normal.

Notes on RHEL5 configuration:

The RHEL5 kexec-tools package contains two extra configuration files:

/etc/sysconfig/kdump

This file allows you to specify an alternate kernel to boot in the event of
a panic (other than the kernel running at the moment), and allows you to
override or append options on the kernel command line. It also allows you
to pass extra options to the kexec utility when the kdump service is
starting. See the documentation in the template kdump sysconfig file for
exact usage.

/etc/kdump.conf

This file allows you to configure how kdump will record your core file.
Unlike the stock version of kdump, the RHEL5 version of kdump attempts to
record your vmcore file from the initramfs, so that it still functions
properly if your root file system is corrupted and unmountable. This file is
read when the kdump service starts and is used to populate the initramfs for
the kdump kernel with the data and utilities needed to copy your core file
to the desired location. See the documentation in /etc/kdump.conf for the
available config directives and targets.

Note especially the ifc option. Kdump will attempt to determine which
network interface to use when dumping to a remote server, but because of
interface renaming or alternate module load strategies, the interface name
may change in the kdump kernel. This option overrides that guess, so that
the appropriate interface is activated in the kdump kernel.

서진우

Executive Director (CTO) at Clunix, a supercomputing company / Information Systems Auditor / operator of the Syszone blog
