Building a RHEL5-Based Diskless Cluster

HOWTO – Building a Diskless Linux Cluster

Author: Mohamad Sindi


This is a detailed procedure for installing a cluster of diskless Linux nodes. The procedure is done entirely with command-line tools, so it is easy to script and use for large-scale clusters. My operating system of choice was RedHat EL 5.3, but it should work with other distributions as well. RedHat's documentation doesn't seem to describe a command-line procedure for this; it only illustrates it with a GUI tool, which is not at all convenient when you have a large number of nodes in a diskless cluster in an enterprise environment. So I thought it would be useful to share this with everyone. I hope you find it useful.


1. Introduction
2. Setting up the Server Node
3. Setting the Diskless Environment via Command Line
4. Booting the Diskless Clients
5. Modifying the Golden Image Node

1. Introduction

This is a step-by-step procedure for setting up a cluster of diskless compute nodes, along with a node acting as the server for the diskless environment. The operating system of choice was RedHat EL 5.3 on an x86_64 IBM InfiniBand cluster. We cover it in detail using RedHat's netboot tools, specifically their command-line versions, since the GUI version is not an efficient way to set up a large number of diskless nodes.

Nodes used in this demonstration:
• node002 (Image server – IP
• node003 (Diskless client – IP
• node004 (Golden image node installed on disk – IP

2. Setting up the Server Node

First, install the server node, node002, with all RedHat packages for simplicity (use “@Everything” in the kickstart %packages section), and disable the firewall and SELinux.
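If you would rather script these two changes than rely on kickstart (whose “firewall --disabled” and “selinux --disabled” directives do the same job at install time), here is a minimal sketch. It assumes the stock RHEL5 SELinux config file /etc/selinux/config; the function name disable_selinux_config is hypothetical. The change only fully applies after a reboot (“setenforce 0” switches the running system to permissive mode in the meantime).

```shell
# Hypothetical helper: set SELINUX=disabled in an SELinux config file.
# Defaults to the stock RHEL5 path; pass another path to try it on a copy.
disable_selinux_config() {
    local cfg="${1:-/etc/selinux/config}"
    if grep -q '^SELINUX=' "$cfg"; then
        # Replace the existing SELINUX= line (SELINUXTYPE= does not match)
        sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
    else
        # No SELINUX= line yet: append one
        echo 'SELINUX=disabled' >> "$cfg"
    fi
}

# Firewall on RHEL5 (init scripts):
#   service iptables stop && chkconfig iptables off
```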

Install one of the client nodes (e.g. node004) normally on disk, with whatever settings and software you want the diskless nodes to have. This node will be the golden source for our diskless image.

It is very important to make sure that the “busybox-anaconda” package is installed on both the golden image node and the server node, since it is needed for diskless environments:
[root@node002 ~]# rpm -qa | grep busybox-anaconda
[root@node004 ~]# rpm -qa | grep busybox-anaconda
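With more than two hosts to check, a small loop saves typing. This sketch assumes passwordless root ssh from the server (the same assumption the rsync step below makes), and the function name check_pkg_on_hosts is hypothetical. It relies on “rpm -q” exiting non-zero when a package is not installed.

```shell
# Hypothetical helper: report whether a package is installed on each host.
# Returns non-zero if the package is missing on any host.
check_pkg_on_hosts() {
    local pkg="$1" missing=0 h
    shift
    for h in "$@"; do
        # rpm -q exits non-zero when the package is not installed
        if ssh "$h" rpm -q "$pkg" >/dev/null 2>&1; then
            echo "$h: $pkg installed"
        else
            echo "$h: $pkg MISSING"
            missing=1
        fi
    done
    return $missing
}

# Usage: check_pkg_on_hosts busybox-anaconda node002 node004
```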

Create the root diskless directory (it will be shared read-only with all diskless clients):
[root@node002 ~]# mkdir -p /diskless/i386/RHEL5-AS/root

Copy your golden image (node004) to the server (make sure to exclude /proc and /sys):
[root@node002 ~]# rsync -a -e ssh --exclude='/proc/*' --exclude='/sys/*' node004:/ /diskless/i386/RHEL5-AS/root/

Verify it has been copied:
[root@node002 ~]# du -hs /diskless/i386/RHEL5-AS/root
1.6G /diskless/i386/RHEL5-AS/root

Make sure tftp is running on the server (on RHEL5, tftp runs under xinetd):
[root@node002 ~]# chkconfig xinetd on
[root@node002 ~]# service xinetd start
[root@node002 ~]# chkconfig tftp on
[root@node002 ~]# chkconfig --list | grep tftp
tftp: on
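On RHEL5, tftp is an xinetd service, and “chkconfig tftp on” works by setting “disable = no” in its xinetd file. The hypothetical helper below checks that line directly, which can be handy in a setup script; it assumes the default file /etc/xinetd.d/tftp.

```shell
# Hypothetical helper: succeed if an xinetd service file has "disable = no".
# Defaults to the RHEL5 tftp service file; whitespace around "=" is ignored.
tftp_enabled() {
    local f="${1:-/etc/xinetd.d/tftp}"
    grep -Eq '^[[:space:]]*disable[[:space:]]*=[[:space:]]*no[[:space:]]*$' "$f"
}

# Usage: tftp_enabled && echo "tftp enabled" || echo "run: chkconfig tftp on"
```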

Set up your DHCP configuration file. I used the settings below (obviously, modify them with your specific IPs, MACs, PXE-booting NIC, hostnames, DNS, etc.):

[root@node002 ~]# cat /etc/dhcpd.conf

ddns-update-style none;

allow bootp;
allow booting;

shared-network eth0 {

    subnet <YOUR-SUBNET> netmask <YOUR-NETMASK> {
        option routers              <YOUR-GATEWAY-IP>;  # --- default gateway
        option subnet-mask          <YOUR-NETMASK>;
        option nis-domain           "<YOUR-DOMAIN>";
        option domain-name          "<YOUR-SEARCH-DOMAIN>";
        option domain-name-servers  <YOUR-DNS-IP>;
        option time-offset          -18000;  # Eastern Standard Time
        default-lease-time          21600;
        max-lease-time              43200;
        next-server                 <YOUR-TFTP-SERVER-IP>;  #<--- TFTP server IP here

        host node003 {
            hardware ethernet 00:21:5E:40:64:5C;
            filename "linux-install/pxelinux.0";
        }

        host node004 {
            hardware ethernet 00:21:5E:40:63:C8;
            filename "linux-install/pxelinux.0";
        }
    }
}
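With 128 nodes you would not want to type every host stanza by hand. The sketch below prints them from a hypothetical three-column file (IP, hostname, MAC), which is not the two-column “nodes” file used later. It also emits a “fixed-address” line, an ISC dhcpd directive that ties each MAC to the IP listed in the file; drop it if you assign addresses some other way. Paste the output inside the subnet block.

```shell
# Hypothetical helper: print dhcpd.conf "host" stanzas from a file whose lines
# are "IP HOSTNAME MAC" (an assumed three-column variant of the nodes file).
gen_dhcp_hosts() {
    local ip host mac
    while read -r ip host mac; do
        # Skip blank lines and comment lines
        case "$ip" in ''|\#*) continue ;; esac
        printf 'host %s {\n' "$host"
        printf '\thardware ethernet %s;\n' "$mac"
        printf '\tfixed-address %s;\n' "$ip"
        printf '\tfilename "linux-install/pxelinux.0";\n'
        printf '}\n'
    done < "$1"
}

# Usage: gen_dhcp_hosts nodes.mac > host-stanzas.txt
```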

Make sure the DHCP server is running:
[root@node002 ~]# chkconfig dhcpd on
[root@node002 ~]# service dhcpd start
[root@node002 ~]# service dhcpd status
dhcpd (pid 28250) is running…

Set up the NFS exports (note that the root directory is exported read-only and the snapshot directory read-write):
[root@node002 ~]# cat /etc/exports
/diskless/i386/RHEL5-AS/root *(ro,sync,no_root_squash)
/diskless/i386/RHEL5-AS/snapshot *(rw,sync,no_root_squash)
[root@node002 ~]# chkconfig nfs on
[root@node002 ~]# service nfs start
[root@node002 ~]# showmount -e node002

Export list for node002:
/diskless/i386/RHEL5-AS/root *
/diskless/i386/RHEL5-AS/snapshot *
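If you build several diskless trees, the two export lines can be generated from the base directory. A minimal sketch with a hypothetical function name; the optional second argument replaces the “*” client spec (for example with a subnet). After appending to /etc/exports, “exportfs -ra” re-exports everything without restarting NFS.

```shell
# Hypothetical helper: print the /etc/exports lines for a diskless tree
# (root read-only, snapshot read-write), matching the entries above.
print_diskless_exports() {
    local base="$1" clients="${2:-*}"
    echo "$base/root $clients(ro,sync,no_root_squash)"
    echo "$base/snapshot $clients(rw,sync,no_root_squash)"
}

# Usage: print_diskless_exports /diskless/i386/RHEL5-AS >> /etc/exports && exportfs -ra
```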

3. Setting the Diskless Environment via Command Line

Now we need to add clients to the diskless environment and create a snapshot image for each diskless client. RedHat's GUI utility for this is system-config-netboot, but if you have a cluster with hundreds of nodes to add to the diskless environment, using the GUI would be a hassle. The system-config-netboot package also ships two command-line tools, “pxeos” and “pxeboot”. “pxeos” creates a new operating system entry under /tftpboot and places the vmlinuz and initrd images there. “pxeboot” adds new hosts to the diskless environment. The snapshot image for each diskless client contains the data that the client has read-write access to, for example /var/log/messages.

Below is how to add a new O.S. called “rhel5compute” using “pxeos”:
[root@node002 ~]# pxeos -a -i rhel5compute -p NFS -D 1 -s <YOUR-NFS-SERVER-IP> -L /diskless/i386/RHEL5-AS rhel5compute
Kernel not specified, using 2.6.18-128.el5

To verify that the O.S. was created:
[root@node002 ~]# pxeos -l
Description: rhel5compute
Protocol: NFS
isDiskless: True
Location: /diskless/i386/RHEL5-AS

[root@node002 ~]# ls /tftpboot/linux-install/rhel5compute/
initrd.img vmlinuz

Options for “pxeos”:
-a: Adds new O.S.
-i: Description of O.S.
-p: Protocol to use
-D: 1 marks the O.S. as diskless
-s: NFS server
-L: Location of your O.S. tree; it should contain the vmlinuz and initrd somewhere inside it.
The last argument (rhel5compute) is the name of the new O.S. entry.

The “pxeboot” command adds clients to the diskless environment: it creates the HEX file for each host under /tftpboot/linux-install/pxelinux.cfg/, and it creates the snapshot directory for each node being added. This is easy to script; below is an example for a 128-node cluster.
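The “HEX file” is the per-client pxelinux configuration. pxelinux tries several file names under pxelinux.cfg/ (one of them based on the client's MAC address), and one of those names is the client's IPv4 address written as eight uppercase hex digits; it then falls back to shorter prefixes and finally to “default”. To tell which file belongs to which node, a pair of conversion helpers is enough. The function names are hypothetical; the syslinux package's “gethostip” utility should also be able to do this conversion.

```shell
# Hypothetical helpers: convert between a dotted-quad IPv4 address and the
# 8-digit uppercase hex file name pxelinux looks for under pxelinux.cfg/.
ip_to_hex() {
    local IFS=.
    # Split the address on dots into the positional parameters
    set -- $1
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

hex_to_ip() {
    local h="$1"
    # Each pair of hex digits is one octet; printf %d accepts 0x-prefixed input
    printf '%d.%d.%d.%d\n' "0x$(echo "$h" | cut -c1-2)" "0x$(echo "$h" | cut -c3-4)" \
        "0x$(echo "$h" | cut -c5-6)" "0x$(echo "$h" | cut -c7-8)"
}
```

For example, “ip_to_hex 192.168.0.10” prints C0A8000A, and “hex_to_ip C0A8000A” prints 192.168.0.10.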

First, create a file with the IPs and hostnames that need to be added (one IP and hostname pair per line). You can use the entries in your /etc/hosts file if they exist:
[root@node128 ~]# cat nodes

<node001-IP>	node001
<node002-IP>	node002
<node003-IP>	node003
…
<node128-IP>	node128
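If your /etc/hosts already lists the compute nodes, the nodes file can be generated from it. A minimal sketch, assuming hostnames follow the nodeNNN pattern used here and that each node's short name is the first name on its line; the function name is hypothetical.

```shell
# Hypothetical helper: print "IP HOSTNAME" pairs from a hosts file, keeping
# only non-comment entries whose first hostname is nodeNNN.
nodes_from_hosts() {
    awk '$1 !~ /^#/ && $2 ~ /^node[0-9][0-9][0-9]$/ { print $1, $2 }' "${1:-/etc/hosts}"
}

# Usage: nodes_from_hosts /etc/hosts > nodes
```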

Now use the script below to add all 128 nodes to your diskless environment:
[root@node128 ~]# cat add-pxe-hosts


#!/bin/bash
# This script generates the HEX files in /tftpboot as well as the individual
# snapshot directories for the nodes. The "nodes" file has 2 columns: the
# first is the IP and the second is the hostname.

while read -r ip host; do
    # Skip blank lines
    [ -z "$ip" ] && continue
    # Redirect stdin so pxeboot cannot consume the rest of the nodes file
    pxeboot -a -O rhel5compute -r 28753 -S "$host" -e eth0 -N my_nis_domain \
        -s console=ttyS0,115200n8r "$ip" </dev/null
done < nodes

Options for “pxeboot”:
-a: Adds new host
-O: Name of the O.S. (you should have already created it with system-config-netboot or pxeos)
-r: Size of the RAM disk, in KB (passed to the kernel as ramdisk_size)
-S: Snapshot name of the diskless client.
-e: The interface through which the PXE boot will take place.
-N: NIS domain
-s: Serial console settings to view remote consoles during boot up
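Since the script runs pxeboot once per line, it can be worth previewing the exact commands first. The hypothetical helper below prints them with the same options as the script, skipping blank and comment lines; nothing is executed until you pipe the output to a shell.

```shell
# Hypothetical dry-run helper: print (do not run) the pxeboot command for each
# "IP HOSTNAME" line of a nodes file, using the same options as the script.
print_pxeboot_cmds() {
    local ip host
    while read -r ip host; do
        case "$ip" in ''|\#*) continue ;; esac
        echo "pxeboot -a -O rhel5compute -r 28753 -S $host -e eth0 -N my_nis_domain -s console=ttyS0,115200n8r $ip"
    done < "${1:-nodes}"
}

# Review first, then run:  print_pxeboot_cmds nodes | sh
```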

Now verify that all 128 hex files for tftpboot were created:
[root@node128 ~]# ls /tftpboot/linux-install/pxelinux.cfg/ | grep 0A | wc -l

Example contents of the HEX file for node001:
[root@node128 ~]# cat /tftpboot/linux-install/pxelinux.cfg/0A088801

default rhel5compute

label rhel5compute
    kernel rhel5compute/vmlinuz
    append console=ttyS0,9600n8 initrd=rhel5compute/initrd.img root=/dev/ram0 init=disklessrc NFSROOT= ramdisk_size=28753 ETHERNET=eth0 SNAPSHOT=node001 NISDOMAIN=my_nis_domain

Verify that all 128 snapshots were created:
[root@node128 ~]# ls /diskless/i386/RHEL5-AS/snapshot/ | grep node | wc -l
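Counting entries with “wc -l” tells you whether the totals match, but not which node is missing. The sketch below checks each line of the nodes file individually: the HEX file (the IP in uppercase hex) under pxelinux.cfg/ and the snapshot directory named after the host. The function name is hypothetical, and the default paths are the ones used in this HOWTO.

```shell
# Hypothetical check: for each "IP HOSTNAME" line, confirm the pxelinux HEX
# file and the snapshot directory exist; print what is missing.
check_pxe_hosts() {
    local nodes="${1:-nodes}"
    local cfgdir="${2:-/tftpboot/linux-install/pxelinux.cfg}"
    local snapdir="${3:-/diskless/i386/RHEL5-AS/snapshot}"
    local rc=0 ip host hex
    while read -r ip host; do
        case "$ip" in ''|\#*) continue ;; esac
        # Dotted quad -> 8 uppercase hex digits (the HEX file name)
        hex=$(echo "$ip" | awk -F. '{ printf "%02X%02X%02X%02X", $1, $2, $3, $4 }')
        [ -f "$cfgdir/$hex" ] || { echo "$host: missing $cfgdir/$hex"; rc=1; }
        [ -d "$snapdir/$host" ] || { echo "$host: missing $snapdir/$host"; rc=1; }
    done < "$nodes"
    return $rc
}

# Usage: check_pxe_hosts nodes && echo "all nodes OK"
```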

Each snapshot image should have a unique name since its data is specific to that client and the client has read-write access to it.
[root@node002 ~]# du -hs /diskless/i386/RHEL5-AS/snapshot/node003
80M /diskless/i386/RHEL5-AS/snapshot/node003
[root@node002 ~]# ls /diskless/i386/RHEL5-AS/snapshot/node003
boot etc home lib root var

Verify that the snapshot directory contains a text file called “files”, which lists the files that are mounted read-write. Do not modify “files” manually; if you need more read-write files in your diskless system, create a new file called “files.custom” in the same format as “files” and list the desired files in it:

[root@node002 snapshot]# cat /diskless/i386/RHEL5-AS/snapshot/files

# This file contains the list of files/directories to be stored in the 
# snapshot directory for each diskless client.  Please do not edit this file,
# as Red Hat will be updating it with each release. If you wish to 
# add files please create a files.custom in this directory and add entries to it.
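Appending to files.custom by hand is easy, but a helper that avoids duplicate entries is convenient in scripts. This sketch assumes “files.custom” takes one path per line like “files” (check your “files” for the exact format), and the path in the usage line is only an example; it never touches “files” itself. The function name is hypothetical.

```shell
# Hypothetical helper: append a path to files.custom unless already listed.
# Assumes one path per line; never edits the Red Hat-managed "files".
add_rw_path() {
    local path="$1" dir="${2:-/diskless/i386/RHEL5-AS/snapshot}"
    local custom="$dir/files.custom"
    touch "$custom"
    # -x: whole-line match, -F: treat the path literally
    grep -qxF "$path" "$custom" || echo "$path" >> "$custom"
}

# Usage (example path only): add_rw_path /var/spool/cron
```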

Verify that the kernel images have been created:
[root@node002 ~]# ls /tftpboot/linux-install/rhel5compute
initrd.img vmlinuz

4. Booting the Diskless Clients

At this stage the server is ready. Now we need to boot the clients into their diskless environment. Make sure that your clients' BIOS and NICs support PXE boot, and that the BIOS is set up to PXE boot via eth0, since our DHCP server was configured with the eth0 MAC addresses.

You can verify that the HEX file for a diskless client is ready by checking it under /tftpboot/linux-install/pxelinux.cfg/; for example, for node003:
[root@node002 ~]# cat /tftpboot/linux-install/pxelinux.cfg/0A08E249

default rhel5compute

label rhel5compute
    kernel rhel5compute/vmlinuz
    append  initrd=rhel5compute/initrd.img root=/dev/ram0 init=disklessrc NFSROOT= ramdisk_size=28760 ETHERNET=eth0 SNAPSHOT=node003 NISDOMAIN=my_nis_domain

When a diskless client boots, the DHCP server assigns it an IP, the vmlinuz and initrd.img images are loaded via TFTP into the client's memory, and the client boots up to a login prompt. After you log in, you can verify that it is not running from the local disk (/dev/sda) with commands such as “df” and “fdisk”, and by checking the /etc/fstab file, which describes a diskless file system.
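Besides df and fdisk, the kernel command line shows how the node was booted: the pxelinux configuration passes “init=disklessrc”. Here is a minimal sketch of a check that could go into a node health script; the function name is hypothetical, and it reads /proc/cmdline by default.

```shell
# Hypothetical check, run on a client: succeed if the kernel was booted with
# the diskless init script (init=disklessrc) on its command line.
is_diskless() {
    # -w: match init=disklessrc as a whole word, not as part of a longer token
    grep -qw 'init=disklessrc' "${1:-/proc/cmdline}"
}

# Usage: is_diskless && echo "running diskless" || echo "booted from disk"
```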

#The fstab file doesn’t mount the physical hard disk partitions:
[root@node003 ~]# cat /etc/fstab
# /etc/fstab for diskless clients, written by system-config-netboot
none /dev/pts devpts gid=5,mode=620 0 0
none /dev/shm tmpfs defaults 0 0
none /tmp tmpfs defaults 0 0
/dev/cdrom /media/cdrom iso9660 noauto,owner,kudzu,ro 0 0
/dev/fd0 /media/floppy auto noauto,owner,kudzu 0 0

#The df command shows that the local disk /dev/sda is not used
[root@node003 ~]# df -hl
Filesystem            Size  Used Avail Use% Mounted on
rootfs                 44G   13G   29G  32% /
/dev                  5.9G  212K  5.9G   1% /dev
/dev                  5.9G  212K  5.9G   1% /dev
none                  5.9G     0  5.9G   0% /dev/shm
none                  5.9G  796K  5.9G   1% /tmp

#The fdisk command shows that the physical hard disk is there, but not used
[root@node003 ~]# fdisk -l
Disk /dev/sda: 73.4 GB, 73407820800 bytes
255 heads, 63 sectors/track, 8924 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          25      200781   83  Linux
/dev/sda2              26        1069     8385930   82  Linux swap / Solaris
/dev/sda3            1070        6871    46604565   83  Linux
/dev/sda4            6872        8924    16490722+   5  Extended
/dev/sda5            6872        8401    12289693+  83  Linux
/dev/sda6            8402        8923     4192933+  83  Linux

5. Modifying the Golden Image Node

If you later need to install more packages on these nodes, you only need to install them on the golden image node and then re-sync the root directory on the server. Similarly, to change the root password on all nodes, change it on the golden image node and sync.

For example, let's say you want to install the “dstat” package on all nodes.

Go to the golden image node node004 which is running on normal disk and install it there:
[root@node004 ~]# yum install dstat

Now on your NFS server sync again with the golden image:
[root@node002 ~]# rsync -a -e ssh --exclude='/proc/*' --exclude='/sys/*' node004:/ /diskless/i386/RHEL5-AS/root/
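Because this rsync is repeated after every change to the golden image, it may be worth wrapping it so the excludes are never mistyped. A sketch with a hypothetical function name; extra arguments go straight to rsync, so “-n -v” (dry run, verbose) lists what would change before you apply it.

```shell
# Hypothetical wrapper around the golden-image rsync used in this HOWTO.
# Usage: sync_golden SOURCE_HOST DEST_DIR [extra rsync options...]
sync_golden() {
    if [ $# -lt 2 ]; then
        echo "usage: sync_golden SOURCE_HOST DEST_DIR [rsync options]" >&2
        return 2
    fi
    local src="$1" dest="$2"
    shift 2
    # Same excludes as the manual command: skip /proc and /sys contents
    rsync -a -e ssh --exclude='/proc/*' --exclude='/sys/*' "$@" "$src:/" "$dest/"
}

# Preview:  sync_golden node004 /diskless/i386/RHEL5-AS/root -n -v
# Apply:    sync_golden node004 /diskless/i386/RHEL5-AS/root
```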

The diskless client (node003) should now pick up the change on the fly, with no need for a reboot. This node has 12GB of physical RAM, and the diskless environment appears to use only around 250MB of it:
[root@node003 ~]# dstat -m
used buff cach free
131M 256k 116M 11G

I hope you find this useful.

Mohamad Sindi


Clunix, a supercomputing company / Executive Director (Technical Director) / Information Systems Auditor / Operator of the Syszone blog
