DRBD Primary/Primary using GFS
My goal in running DRBD as Primary/Primary with GFS is to load balance an HTTP service. My servers are set up as follows:
I use the GFS partition as the document root for my web server (Apache).
A SAN might be better for storage, but it is expensive. Other options are iSCSI or GNBD, but they need more servers, which costs extra money.
Maybe in the future I will move to SAN, iSCSI or GNBD, but for now DRBD and GFS on two nodes behind a load balancer works well and is fast enough.
For testing and preparing this quick howto I used Xen to create 2 virtual machines with CentOS 5 as the OS. The partition that I want to use for GFS is named xvdb1. Make sure your partition doesn't contain any data you want to keep (it will be destroyed).
To wipe the partition I used this command on both nodes:
dd if=/dev/zero of=/dev/xvdb1
Change /dev/xvdb1 to your partition (make sure it doesn't contain any needed data).
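Since dd will happily overwrite a partition that is in use, it may be worth checking first that the partition is not mounted. Here is a minimal sketch; the helper name is_mounted and its optional second argument are my own additions for illustration, not part of any tool used here:

```shell
# is_mounted DEVICE [MOUNTS_FILE]
# Succeeds if DEVICE appears as a mounted source in MOUNTS_FILE
# (defaults to /proc/mounts). Hypothetical helper, not part of DRBD.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Example use before wiping (commented out so nothing is destroyed by accident):
# if is_mounted /dev/xvdb1; then echo "refusing to wipe a mounted partition"; exit 1; fi
# dd if=/dev/zero of=/dev/xvdb1 bs=1M
```

The bs=1M in the commented example is only a suggestion to speed up the wipe; the plain command above works too.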
The following commands have to be run on both nodes; for simplicity I show the output of one machine only.
* download DRBD on node1 & node2:
[root@node1 ~]# mkdir downloads
[root@node1 ~]# cd downloads/
[root@node1 downloads]# wget -c http://oss.linbit.com/drbd/8.2/drbd-8.2.5.tar.gz
* untar it:
[root@node1 downloads]# time tar -xzpf drbd-8.2.5.tar.gz -C /usr/src/
real 0m0.162s
user 0m0.016s
sys 0m0.028s
[root@node1 downloads]# ls /usr/src/
drbd-8.2.5 redhat
* before building DRBD:
Before you start, make sure the following are installed on your system:
– make, gcc, the glibc development libraries, and the flex scanner generator
– kernel-headers and kernel-devel:
[root@node1 downloads]# yum list kernel-*
Loading "installonlyn" plugin
Setting up repositories
Reading repository metadata in from local files
Installed Packages
kernel.i686 2.6.18-8.el5 installed
kernel-headers.i386 2.6.18-8.el5 installed
kernel-xen.i686 2.6.18-8.el5 installed
kernel-xen-devel.i686 2.6.18-8.el5 installed
Available Packages
kernel-PAE.i686 2.6.18-8.el5 local
kernel-PAE-devel.i686 2.6.18-8.el5 local
kernel-devel.i686 2.6.18-8.el5 local
kernel-doc.noarch 2.6.18-8.el5 local
Remember that I use the Xen kernel, so on my machines the matching devel package is kernel-xen-devel.
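To check quickly whether the build tools are there, something like the following could be used. It is only a sketch: the function name missing_tools is made up for this howto, and it checks commands in PATH, not packages such as the glibc development libraries or the kernel devel package:

```shell
# missing_tools NAME...
# Prints each NAME that is not found in PATH, one per line.
# Hypothetical helper for this howto.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The build tools listed above; empty output means all of them were found:
# missing_tools make gcc flex
```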
* building DRBD:
– building DRBD kernel module:
[root@node1 downloads]# cd /usr/src/drbd-8.2.5/drbd
[root@node1 drbd]# make clean all
.
.
.
mv .drbd_kernelrelease.new .drbd_kernelrelease
Memorizing module configuration ... done.
[root@node1 drbd]#
– checking the new kernel module:
[root@node1 drbd]# modinfo drbd.ko
filename: drbd.ko
alias: block-major-147-*
license: GPL
description: drbd - Distributed Replicated Block Device v8.2.5
author: Philipp Reisner <phil@linbit.com>, Lars Ellenberg <lars@linbit.com>
srcversion: E325FBFE020C804C4FABA31
depends:
vermagic: 2.6.18-8.el5xen SMP mod_unload 686 REGPARM 4KSTACKS gcc-4.1
parm: minor_count:Maximum number of drbd devices (1-255) (int)
parm: allow_oos:DONT USE! (bool)
parm: enable_faults:int
parm: fault_rate:int
parm: fault_count:int
parm: fault_devs:int
parm: trace_level:int
parm: trace_type:int
parm: trace_devs:int
parm: usermode_helper:string
[root@node1 drbd]#
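The vermagic line in the modinfo output is the part worth looking at: its first word should be the kernel you are actually running (2.6.18-8.el5xen here). A small sketch of that comparison, assuming a helper name of my own choosing:

```shell
# module_matches_kernel VERMAGIC KERNEL_RELEASE
# Succeeds if the first word of the vermagic string equals the kernel release.
# Hypothetical helper for this howto.
module_matches_kernel() {
    [ "${1%% *}" = "$2" ]
}

# On a real system (modinfo -F prints a single field of the module info):
# module_matches_kernel "$(modinfo -F vermagic drbd.ko)" "$(uname -r)" \
#     || echo "drbd.ko was built for a different kernel"
```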
– Building a DRBD RPM package
[root@node1 drbd]# cd /usr/src/drbd-8.2.5/
[root@node1 drbd-8.2.5]# make rpm
.
.
.
You have now:
-rw-r--r-- 1 root root 142722 May 23 11:45 dist/RPMS/i386/drbd-8.2.5-3.i386.rpm
-rw-r--r-- 1 root root 232238 May 23 11:45 dist/RPMS/i386/drbd-debuginfo-8.2.5-3.i386.rpm
-rw-r--r-- 1 root root 851602 May 23 11:45 dist/RPMS/i386/drbd-km-2.6.18_8.el5xen-8.2.5-3.i386.rpm
[root@node1 drbd-8.2.5]#
– installing DRBD:
[root@node1 drbd-8.2.5]# cd dist/RPMS/i386/
[root@node1 i386]# rpm -ihv drbd-8.2.5-3.i386.rpm drbd-km-2.6.18_8.el5xen-8.2.5-3.i386.rpm
Preparing... ########################################### [100%]
1:drbd ########################################### [ 50%]
2:drbd-km-2.6.18_8.el5xen########################################### [100%]
* Configuring DRBD:
– For lower-level storage I use a simple setup: both hosts have a free (currently unused) partition named /dev/xvdb1, and I use internal meta data.
– For /etc/drbd.conf I use this configuration:
resource r0 {
protocol C;
startup {
become-primary-on both;
}
net {
allow-two-primaries;
cram-hmac-alg "sha1";
shared-secret "123456";
after-sb-0pri discard-least-changes;
after-sb-1pri violently-as0p;
after-sb-2pri violently-as0p;
rr-conflict violently;
}
syncer {
rate 44M;
}
on node1.test.lab {
device /dev/drbd0;
disk /dev/xvdb1;
address 192.168.1.1:7789;
meta-disk internal;
}
on node2.test.lab {
device /dev/drbd0;
disk /dev/xvdb1;
address 192.168.1.2:7789;
meta-disk internal;
}
}
Note that the "allow-two-primaries" net option is what permits a Primary/Primary configuration, and the "become-primary-on both" startup option promotes both nodes when DRBD starts.
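If you manage several machines, a quick sanity check that the config file really carries both dual-primary options can save some head scratching. This is just a sketch using grep; the function name dual_primary_ready is my own and the patterns assume each option sits on its own line, as in my config:

```shell
# dual_primary_ready CONF
# Succeeds if CONF contains both options the Primary/Primary setup relies on.
# Hypothetical helper; assumes one option per line.
dual_primary_ready() {
    grep -q '^[[:space:]]*allow-two-primaries;' "$1" &&
    grep -q '^[[:space:]]*become-primary-on[[:space:]]*both;' "$1"
}

# dual_primary_ready /etc/drbd.conf || echo "not configured for two primaries"
```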
* starting DRBD for the first time:
the following steps must be performed on the two nodes:
– Create device metadata
[root@node1 i386]# drbdadm create-md r0
v08 Magic number not found
v07 Magic number not found
v07 Magic number not found
v08 Magic number not found
Writing meta data...
initialising activity log
NOT initialized bitmap
New drbd meta data block sucessfully created.
--== Creating metadata ==--
As with nodes we count the total number of devices mirrored by DRBD at
at http://usage.drbd.org.
The counter works completely anonymous. A random number gets created for
this device, and that randomer number and the devices size will be sent.
http://usage.drbd.org/cgi-bin/insert_usage.pl?nu=18231616900827588600&ru=15113975333795790860&rs=2147483648
Enter 'no' to opt out, or just press [return] to continue:
success
– Attach. This step associates the DRBD resource with its backing device:
[root@node1 i386]# modprobe drbd
[root@node1 i386]# drbdadm attach r0
– verify running DRBD:
on node1:
[root@node1 i386]# cat /proc/drbd
version: 8.2.5 (api:88/proto:86-88)
GIT-hash: 9faf052fdae5ef0c61b4d03890e2d2eab550610c build by root@node1.test.lab, 2008-05-23 11:45:23
0: cs:StandAlone st:Secondary/Unknown ds:Inconsistent/Outdated r---
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0
on node2:
[root@node2 i386]# cat /proc/drbd
version: 8.2.5 (api:88/proto:86-88)
GIT-hash: 9faf052fdae5ef0c61b4d03890e2d2eab550610c build by root@node2.test.lab, 2008-05-23 12:58:18
0: cs:StandAlone st:Secondary/Unknown ds:Inconsistent/Outdated r---
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0
– Connect. This step connects the DRBD resource with its counterpart on the peer node:
[root@node1 i386]# drbdadm connect r0
[root@node1 i386]# cat /proc/drbd
version: 8.2.5 (api:88/proto:86-88)
GIT-hash: 9faf052fdae5ef0c61b4d03890e2d2eab550610c build by root@node1.test.lab, 2008-05-23 11:45:23
0: cs:WFConnection st:Secondary/Unknown ds:Inconsistent/Outdated C r---
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0
– Initial device synchronization:
This step must be done on one node only; I used node1:
[root@node1 i386]# drbdadm -- --overwrite-data-of-peer primary r0
– verify:
[root@node1 i386]# cat /proc/drbd
version: 8.2.5 (api:88/proto:86-88)
GIT-hash: 9faf052fdae5ef0c61b4d03890e2d2eab550610c build by root@node1.test.lab, 2008-05-23 11:45:23
0: cs:SyncSource st:Primary/Secondary ds:UpToDate/Inconsistent C r---
ns:792 nr:0 dw:0 dr:792 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
[>....................] sync'ed: 0.2% (2096260/2097052)K
finish: 2:11:00 speed: 264 (264) K/sec
resync: used:0/31 hits:395 misses:1 starving:0 dirty:0 changed:1
act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0
[root@node2 i386]# cat /proc/drbd
version: 8.2.5 (api:88/proto:86-88)
GIT-hash: 9faf052fdae5ef0c61b4d03890e2d2eab550610c build by root@node2.test.lab, 2008-05-23 12:58:18
0: cs:SyncTarget st:Secondary/Primary ds:Inconsistent/UpToDate C r---
ns:0 nr:1896 dw:1896 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
[>....................] sync'ed: 0.2% (2095156/2097052)K
finish: 2:02:12 speed: 268 (268) K/sec
resync: used:0/31 hits:947 misses:1 starving:0 dirty:0 changed:1
act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0
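The three fields to watch in /proc/drbd are cs (connection state), st (roles, local/peer) and ds (disk states, local/peer). If you want to pull them out in a script, here is a rough sketch with awk; the function name drbd_field is my own, and it assumes a single DRBD device like in this setup:

```shell
# drbd_field NAME [STATUS_FILE]
# Prints the value of a "NAME:value" token (cs, st or ds) from the device
# line of a /proc/drbd style file (defaults to /proc/drbd).
# Hypothetical helper; only the first device line is used.
drbd_field() {
    awk -v key="$1" '
        $1 ~ /^[0-9]+:$/ {
            for (i = 2; i <= NF; i++)
                if (index($i, key ":") == 1) {
                    print substr($i, length(key) + 2)
                    exit
                }
        }' "${2:-/proc/drbd}"
}

# drbd_field cs    # connection state, e.g. SyncSource
# drbd_field st    # roles as local/peer
# drbd_field ds    # disk states as local/peer
```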
By now, our DRBD device is fully operational, even before the initial synchronization has completed. We can now continue to configure GFS.
* Configuring your nodes to support GFS:
Before we can configure GFS we need a little help from RHCS. The following packages need to be installed on both systems:
– "cman" (Red Hat Cluster Manager)
– "lvm2-cluster" (LVM with cluster support)
– "gfs-utils" or "gfs2-utils" (GFS1 or GFS2 utilities; as of this writing I prefer GFS1)
– "kmod-gfs", or "kmod-gfs-xen" for Xen (GFS kernel module)
* We must enable and start the following system services on both nodes:
– cman: it runs ccsd, fenced, dlm and openais.
– clvmd.
– gfs.
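The "enable" half of that (making the services come up at boot) can be done with chkconfig on CentOS 5. A small sketch, where the function name and the DRY_RUN switch are my own additions so you can see what would be run before running it; starting the services is covered step by step in what follows:

```shell
# enable_cluster_services
# Enables cman, clvmd and gfs at boot, in that order.
# With DRY_RUN=1 the commands are only printed (my own addition).
enable_cluster_services() {
    for svc in cman clvmd gfs; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "chkconfig $svc on"
        else
            chkconfig "$svc" on
        fi
    done
}

# DRY_RUN=1 enable_cluster_services   # show the commands only
# enable_cluster_services             # actually enable the services
```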
Starting cman:
Before we can start cman we have to configure /etc/cluster/cluster.conf. I use the following configuration:
<?xml version="1.0"?>
<cluster name="my-cluster" config_version="1">
<cman two_node="1" expected_votes="1">
</cman>
<clusternodes>
<clusternode name="node1.test.lab" votes="1" nodeid="1">
<fence>
<method name="single">
<device name="human" ipaddr="192.168.1.1"/>
</method>
</fence>
</clusternode>
<clusternode name="node2.test.lab" votes="1" nodeid="2">
<fence>
<method name="single">
<device name="human" ipaddr="192.168.1.2"/>
</method>
</fence>
</clusternode>
</clusternodes>
<fence_devices>
<fence_device name="human" agent="fence_manual"/>
</fence_devices>
</cluster>
After editing /etc/cluster/cluster.conf we have to start cman on both nodes at the same time:
On node1:
[root@node1 i386]# /etc/init.d/cman start
Starting cluster:
Loading modules... done
Mounting configfs... done
Starting ccsd... done
Starting cman... done
Starting daemons... done
Starting fencing... done
[ OK ]
On node2:
[root@node2 i386]# /etc/init.d/cman start
Starting cluster:
Loading modules... done
Mounting configfs... done
Starting ccsd... done
Starting cman... done
Starting daemons... done
Starting fencing... done
[ OK ]
check nodes:
[root@node1 i386]# cman_tool nodes
Node Sts Inc Joined Name
1 M 4 2008-05-23 14:33:25 node1.test.lab
2 M 316 2008-05-23 14:41:34 node2.test.lab
In the 'Sts' column, 'M' means the node is a cluster member and everything is fine; 'X' means there is a problem with that node.
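For a script that should only continue once both nodes have joined, counting the 'M' rows is enough. A minimal sketch; the function name cluster_members is my own:

```shell
# cluster_members
# Reads `cman_tool nodes` output on stdin and prints how many nodes
# have status M (member). Hypothetical helper for this howto.
cluster_members() {
    awk '$2 == "M" { n++ } END { print n + 0 }'
}

# Expect 2 in this two-node cluster:
# cman_tool nodes | cluster_members
```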
– starting CLVMD:
First we need to change the locking type in /etc/lvm/lvm.conf to 3 on both nodes:
vi /etc/lvm/lvm.conf
Change locking_type = 1 to locking_type = 3.
We also need to change the filter option so that vgscan doesn't see a duplicate PV (the duplicate appears because xvdb1 is the backing device for drbd0). I changed the filter like this:
#filter = [ "a/.*/" ]
filter = [ "a|xvda.*|", "a|drbd.*|", "r|xvdb.*|" ]
In my filter option, "a|xvda.*|" means accept all xvda partitions, "a|drbd.*|" means accept all drbd devices, and "r|xvdb.*|" means reject (ignore) all xvdb partitions (one of them is our partition, xvdb1).
Save and exit.
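If you would rather script the locking_type change than edit the file by hand on each node, something like the following should do. It is a sketch under my own assumptions: the function name is made up, it only touches an uncommented locking_type line, and you should keep a backup of lvm.conf before trying it:

```shell
# set_locking_type FILE TYPE
# Rewrites the uncommented "locking_type = N" line in FILE to use TYPE.
# Hypothetical helper; commented-out lines are left alone.
set_locking_type() {
    _tmp=$(mktemp) || return 1
    sed "s/^\([[:space:]]*locking_type[[:space:]]*=[[:space:]]*\)[0-9][0-9]*/\1$2/" "$1" > "$_tmp" &&
        cat "$_tmp" > "$1"
    _rc=$?
    rm -f "$_tmp"
    return "$_rc"
}

# cp /etc/lvm/lvm.conf /etc/lvm/lvm.conf.bak
# set_locking_type /etc/lvm/lvm.conf 3
```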
The first thing to do is run vgscan so that it reads the new configuration:
[root@node1 i386]# vgscan
Reading all physical volumes. This may take a while...
Found volume group "VolGroup00" using metadata type lvm2
The following commands must be run on one node only; I used node1.
Now create our PV:
[root@node1 i386]# pvcreate /dev/drbd0
Physical volume "/dev/drbd0" successfully created
Creating our volume group:
[root@node1 i386]# vgcreate my-vol /dev/drbd0
Volume group "my-vol" successfully created
[root@node1 i386]# vgdisplay
--- Volume group ---
VG Name my-vol
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 1
VG Access read/write
VG Status resizable
Clustered yes
Open LV 0
Max PV 0
Cur PV 1
Act PV 1
VG Size 2.00 GB
PE Size 4.00 MB
Total PE 511
Alloc PE / Size 0 / 0
Free PE / Size 511 / 2.00 GB
VG UUID UaUK5v-P3aX-nmCn-Oj3F-XQox-AgxB-UsM0xS
Did you notice "Clustered yes"?
Creating our LV:
[root@node1 i386]# lvcreate -L1.9G --name my-lv my-vol
Rounding up size to full physical extent 1.90 GB
Error locking on node node2.test.lab: device-mapper: reload ioctl failed: Invalid argument
Failed to activate new LV.
Creating the GFS filesystem:
[root@node1 i386]# gfs_mkfs -p lock_dlm -t my-cluster:www -j 2 /dev/my-vol/my-lv
This will destroy any data on /dev/my-vol/my-lv.
Are you sure you want to proceed? [y/n] y
Device: /dev/my-vol/my-lv
Blocksize: 4096
Filesystem Size: 433092
Journals: 2
Resource Groups: 8
Locking Protocol: lock_dlm
Lock Table: my-cluster:www
Syncing…
All Done
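One thing to keep in mind with gfs_mkfs -t: the part before the colon in the lock table name (my-cluster in my-cluster:www) has to be the cluster name from /etc/cluster/cluster.conf, otherwise the filesystem will not mount in this cluster. A small sketch to compare the two; both function names are my own, and the sed pattern assumes the name attribute uses plain double quotes on the same line as the cluster tag:

```shell
# cluster_name CONF
# Prints the name attribute of the <cluster> element in a cluster.conf file.
# Hypothetical helper; assumes the attribute is on the same line as the tag.
cluster_name() {
    sed -n 's/.*<cluster[[:space:]][^>]*name="\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# lock_table_matches CONF TABLE
# TABLE is the value passed to gfs_mkfs -t, in the form cluster:fsname.
lock_table_matches() {
    [ "$(cluster_name "$1")" = "${2%%:*}" ]
}

# lock_table_matches /etc/cluster/cluster.conf my-cluster:www \
#     || echo "lock table does not match the cluster name"
```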
Start the gfs service:
[root@node1 i386]# /etc/init.d/gfs start
Mount it on the first node:
[root@node1 i386]# mount -t gfs /dev/my-vol/my-lv /www
[root@node1 i386]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
9.1G 3.4G 5.3G 40% /
/dev/xvda1 99M 17M 78M 18% /boot
tmpfs 129M 0 129M 0% /dev/shm
/dev/my-vol/my-lv 1.7G 20K 1.7G 1% /www
[root@node1 i386]# ls -lth /www/
total 0
Mount it on the second node:
First you have to wait until the initial device synchronization finishes. To check:
[root@node2 i386]# cat /proc/drbd
version: 8.2.5 (api:88/proto:86-88)
GIT-hash: 9faf052fdae5ef0c61b4d03890e2d2eab550610c build by root@node2, 2008-05-23 12:58:18
0: cs:SyncTarget st:Secondary/Primary ds:Inconsistent/UpToDate C r---
ns:0 nr:1970404 dw:1970404 dr:0 al:0 bm:119 lo:0 pe:0 ua:0 ap:0
[=================>..] sync'ed: 93.4% (143276/2097052)K
finish: 0:08:57 speed: 252 (232) K/sec
resync: used:0/31 hits:976756 misses:120 starving:0 dirty:0 changed:120
act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0
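Instead of re-running cat by hand, the waiting can be left to a small loop that polls until both disks report UpToDate. Again only a sketch: the function name drbd_in_sync is my own, and with a single resource like here a plain grep is enough:

```shell
# drbd_in_sync [STATUS_FILE]
# Succeeds once the status file (default /proc/drbd) shows both the local
# and the peer disk as UpToDate. Hypothetical helper; assumes one resource.
drbd_in_sync() {
    grep -q 'ds:UpToDate/UpToDate' "${1:-/proc/drbd}"
}

# Poll every 10 seconds on node2, then continue with the promotion:
# until drbd_in_sync; do sleep 10; done
```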
After it finishes we need to promote node2 to primary before we can mount the filesystem:
[root@node2 i386]# drbdadm primary r0
[root@node2 i386]# cat /proc/drbd
version: 8.2.5 (api:88/proto:86-88)
GIT-hash: 9faf052fdae5ef0c61b4d03890e2d2eab550610c build by root@node2, 2008-05-23 12:58:18
0: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate C r---
ns:0 nr:2113680 dw:2113680 dr:0 al:0 bm:128 lo:0 pe:0 ua:0 ap:0
resync: used:0/31 hits:1048386 misses:128 starving:0 dirty:0 changed:128
act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0
Notice "st:Primary/Primary"; it's what we want!
Now check the volume group:
[root@node2 ~]# vgscan
Reading all physical volumes. This may take a while...
Found volume group "VolGroup00" using metadata type lvm2
Found volume group "my-vol" using metadata type lvm2
Mount it!
[root@node2 i386]# /etc/init.d/gfs start
[root@node2 i386]# mkdir /www
[root@node2 i386]# mount -t gfs /dev/my-vol/my-lv /www
/sbin/mount.gfs: can't open /dev/my-vol/my-lv: No such file or directory
Oops! Do you remember the error "Error locking on node node2.test.lab: device-mapper: reload ioctl failed: Invalid argument" when we created our LV on the first node? The fix is easy: restart clvmd on node2 and try mounting again:
[root@node2 i386]# /etc/init.d/clvmd restart
Deactivating VG my-vol: 0 logical volume(s) in volume group "my-vol" now active
[ OK ]
Stopping clvm: [ OK ]
Starting clvmd: [ OK ]
Activating VGs 2 logical volume(s) in volume group "VolGroup00" now active
1 logical volume(s) in volume group "my-vol" now active
[ OK ]
[root@node2 i386]# mount -t gfs /dev/my-vol/my-lv /www
Aha! Let's create a file:
[root@node2 i386]# touch /www/hi
[root@node2 i386]# ls -lth /www/
total 8.0K
-rw-r--r-- 1 root root 0 May 23 16:35 hi
And from node1:
[root@node1 i386]# ls -lth /www/
total 8.0K
-rw-r--r-- 1 root root 0 May 23 16:35 hi
Cool, right? Try it yourself...