[Cluster][HPC] Teragon High Performance Computing Technical Doc
*** Teragon High Performance Computing Technical Doc (Standard) ***
=================================================================
Date : December 25, 2005
Author : Seo Jin-woo (서 진우, alang@syszone.co.kr)
=================================================================
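– Table of Contents –
1. rsh, rlogin configuration
2. ensh installation and configuration
3. time sync configuration
4. intel compiler installation and configuration
5. PGI Compiler installation and configuration
6. ATLAS, Intel Math Library installation and configuration
7. mpich installation and configuration
8. lammpi installation and configuration
9. nfs, nis, automount configuration
10. apache, ganglia installation and configuration
11. dhcp, tftp, PXE installation and configuration
12. hpc benchmark tool installation and configuration
1. rsh, rlogin configuration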
– /etc/hosts define –
# vi /etc/hosts
—————————————————————–
127.0.0.1 localhost.localdomain localhost
192.168.1.254 node00.cluster.bj node00
192.168.1.1 node01.cluster.bj node01
—————————————————————–
– rsh, rlogin config –
[root@node00 ~]# chkconfig rsh on
[root@node00 ~]# chkconfig rlogin on
[root@node00 ~]# vi /etc/securetty
—————————————————————–
..append at the very bottom..
rsh
rlogin
—————————————————————–
– allow rsh, rlogin for regular users –
[root@node00 ~]# vi /etc/hosts.equiv
—————————————————————–
node00
node01
—————————————————————–
– allow rsh, rlogin for the root user –
[root@node00 ~]# vi /root/.rhosts
—————————————————————–
node00
node01
—————————————————————–
– rsh, rlogin test –
[root@node00 ~]# rsh node01
Last login: Sat Feb 4 11:48:53 from node00.cluster.bj
[root@node01 ~]#
[root@node00 ~]# rsh node01 uname -a
Linux node01.cluster.bj 2.6.9-22.EL #1 Mon Sep 19 17:49:49 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
[root@node00 ~]#
[root@node00 ~]# rcp pvm.host node01:/root
;; copy the file "pvm.host" from node00 to /root on node01
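The node lists in /etc/hosts.equiv and /root/.rhosts repeat the short names already defined in /etc/hosts, so they can be derived rather than typed twice. A minimal sketch, assuming the three-column hosts layout shown above (IP, FQDN, short alias); the function name list_nodes is hypothetical and not part of any tool used in this document:

```shell
#!/bin/sh
# list_nodes FILE: print the short alias (3rd column) of each cluster entry
# in a hosts-format file, skipping comments and the loopback line.
list_nodes() {
    awk '$1 !~ /^#/ && $1 != "127.0.0.1" && NF >= 3 { print $3 }' "$1"
}

# Demo on a temporary copy of the sample entries, not the live /etc/hosts.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
127.0.0.1 localhost.localdomain localhost
192.168.1.254 node00.cluster.bj node00
192.168.1.1 node01.cluster.bj node01
EOF
list_nodes "$tmp"
rm -f "$tmp"
```

Redirecting the output (for example, list_nodes /etc/hosts > /etc/hosts.equiv) would reproduce the file edited by hand above.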
2. ensh installation and configuration
– installation –
[root@node00 src]# rpm -Uvh ensh-1.0.0-3.x86_64.rpm
– configuration –
[root@node00 src]# vi /usr/clx/ensh/etc/nodelist
—————————————————————-
node00
node01
—————————————————————-
[root@node00 src]# ensh --init
—————————————————————
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): <- Enter
Enter passphrase (empty for no passphrase): <- Enter
Enter same passphrase again: <- Enter
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
7e:06:6a:32:06:ac:4e:05:f1:d3:9c:36:50:8f:b2:4c root@node00.cluster.bj
root@node00's password: <- node00's root password
root@node00's password: <-
id_rsa.pub 100% 232 0.2KB/s 00:00
root@node01's password: <- node01's root password
root@node01's password:
id_rsa.pub 100% 232 0.2KB/s 00:00
—————————————————————–
– ensh test –
[root@node00 ~]# ensh uname -a
——————————————————————
### executing in node00
node00 Linux node00.cluster.bj 2.6.9-22.ELsmp #1 SMP Mon Sep 19
### executing in node01
node01 Linux node01.cluster.bj 2.6.9-22.EL #1 Mon Sep 19 17:49:49
;; run a command on all nodes at once
[root@node00 ~]# ensync anaconda-ks.cfg
——————————————————————
### synchronizing node00
building file list … done
sent 61 bytes received 20 bytes 54.00 bytes/sec
total size is 3042 speedup is 37.56
### synchronizing node01
building file list … done
anaconda-ks.cfg
sent 908 bytes received 58 bytes 644.00 bytes/sec
total size is 3042 speedup is 3.15
;; synchronize a file to all nodes
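The internals of ensh are not shown in this document. As a rough illustration of what a batch command runner of this kind does, the following sketch loops over a node list and runs the same command on each node through a remote shell. The function name run_all and the REMOTE_SHELL variable are assumptions made for this example, and the header line only imitates the ensh output above:

```shell
#!/bin/sh
# run_all NODELIST CMD...: run CMD on every node listed in NODELIST (one host
# per line), printing a header per node in the style of the ensh output.
# REMOTE_SHELL is a hypothetical knob: "ssh" or "rsh" for real use,
# "echo" for a dry run that only prints what would be executed.
REMOTE_SHELL=${REMOTE_SHELL:-ssh}

run_all() {
    nodelist=$1
    shift
    while read -r node; do
        # Skip blank lines and comments.
        case $node in ''|'#'*) continue ;; esac
        echo "### executing in $node"
        # </dev/null keeps the remote shell from consuming the node list.
        $REMOTE_SHELL "$node" "$@" </dev/null
    done < "$nodelist"
}

# Dry run against a temporary node list.
tmp=$(mktemp)
printf 'node00\nnode01\n' > "$tmp"
REMOTE_SHELL=echo
run_all "$tmp" uname -a
rm -f "$tmp"
```

With REMOTE_SHELL=ssh and the key distribution performed by the init step, the same loop would run the command on each node without a password prompt.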
3. time sync configuration
– on the time server –
[root@node00 ~]# chkconfig time on
– on the client –
[root@node01 ~]# rdate -s node00
– synchronize the time on all cluster nodes using ensh –
[root@node00 ~]# timesync
* HPC system time synced ..
– check the time on all cluster nodes using ensh –
[root@node00 ~]# timeview
node00 2006. 02. 04. (토) 12:57:25 KST
node01 2006. 02. 04. (토) 12:57:25 KST
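Comparing formatted dates by eye works for two nodes but not for many. A small sketch, assuming each node reports its clock as epoch seconds (date +%s) on a line prefixed with the node name, as ensh prefixes its output; the function name max_skew is hypothetical:

```shell
#!/bin/sh
# max_skew: read "<node> <epoch-seconds>" lines on stdin and print the
# difference between the newest and oldest clock, in seconds.
# Lines whose second field is not a number (such as headers) are ignored.
max_skew() {
    awk '$2 ~ /^[0-9]+$/ {
             v = $2 + 0
             if (n == 0 || v < min) min = v
             if (n == 0 || v > max) max = v
             n++
         }
         END { if (n > 0) print max - min }'
}

# Sample input with a two-second difference between the nodes.
printf 'node00 1139025445\nnode01 1139025447\n' | max_skew
```

A result of 0 or 1 right after a sync is what one would hope to see; larger values suggest a node missed the sync.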
4. intel compiler installation and configuration
Intel Compiler download site : http://www.intel.com
# tar xzvf l_cc_p_9.0.021.tar.gz
# cd l_cc_p_9.0.021
# ./install.sh
——————————————————————-
**********************************************************************
“Welcome to Installation”
Please make your selection by entering an option:
1. “Intel(R) C++ Compiler 9.0 for Linux*” – install
1a. Readme
1b. Release Notes
1c. Installation Guide
1d. Product Web Site URL
1e. Intel(R) Support Web Site URL
x. Exit.
Please type a selection : 1 -----> Input 1
======================================================================
Please select an option to continue:
1. Proceed with Serial Number to install and register. [Recommended]
2. Provide name of an existing license file.
x. Exit.
Please type your selection : 2 -----> Input 2
======================================================================
Please provide the license file name with full path (*.lic)
x.Exit
License file path : -----> Input 3
/usr/local/src/intel/noncommercial_cpp_l_N4R8-76ZGSDKV.lic
Checking RPM version …
Checking Dependencies …
Checking Kernel and glibc dependencies …
Which of the following would you like to do?
1. Typical Install (Recommended – Installs All Components).
2. Custom Install (Advanced Users Only).
x. Exit.
Please type a selection: 1 ----> Input 4
.
.
'accept' to continue, 'reject' to return to the main menu : accept -> Input 5
Values in […] are the default values.
You can just hit the Enter key where you want to use the default values.
Where do you want to install to? Specify directory starting with ‘/’.
[/opt/intel/cc/9.0] :
..
Installation successful.
“Installation is complete ”
Thank you for using Intel(R) Software Development Products, tools for
improving application performance.
Please make your selection by entering an option:
1. “Intel(R) C++ Compiler 9.0 for Linux*” – install
(v9.0 install detected)
1a. Readme
1b. Release Notes
1c. Installation Guide
1d. Product Web Site URL
1e. Intel(R) Support Web Site URL
x. Exit.
Please type a selection : x
2. compiler config
– C/C++ config
# cd /opt/intel/cc/9.0/bin/
# cp iccvars.sh /etc/profile.d/
# source /etc/profile.d/iccvars.sh
– Intel Fortran compiler config
# cd /opt/intel/fc/9.0/bin/
# cp ifortvars.sh /etc/profile.d/
# source /etc/profile.d/ifortvars.sh
– Intel Math Library install
# cd /usr/local/src/intel
# tar xzvf l_mkl_p_7.2.1.003.tar.gz
# cd l_mkl_p_7.2.1.003
# ./install
– Intel Math Library config
# vi /etc/ld.so.conf
——————————————————
.
.
/opt/intel/mkl721/lib/32
#/opt/intel/mkl721/lib/em64t -> EM64T (64-bit) config
——————————————————
# ldconfig
– compiler test (latticeeasy)
# tar xzvf latticeeasy2.0.tar.gz
# cd latticeeasy2.0
# vi makefile
———————————————————
.
#COMPILER = g++
COMPILER = icpc
FLAGS = -O3 -Wall
———————————————————-
# make
# ./latticeeasy
5. PGI Compiler installation and configuration
* Installing the PGI Compiler
Move the PGI Compiler source archive to /usr/local/src.
1. PGI Compiler installation
# cd /usr/local/src/pgi
# tar xzvf linux86[1]-64.tar.gz
# ./install
————————————————————————
.
YOU ACKNOWLEDGE THAT YOU HAVE READ THIS AGREEMENT AND AGREE TO BE BOUND
BY ITS TERMS. YOU FURTHER AGREE THAT IT IS THE COMPLETE AND EXCLUSIVE
STATEMENT OF AGREEMENT BETWEEN YOU AND ST THAT SUPERSEDE ANY PRIOR
AGREEMENT, ORAL OR WRITTEN, ANY PROPOSAL AND ANY OTHER COMMUNICATIONS
BETWEEN YOU AND STUS RELATING TO THE SUBJECT MATTER OF THIS AGREEMENT.
Address:
The Portland Group
STMicroelectronics, Inc.
9150 SW Pioneer Ct. Suite H
Wilsonville, OR, USA 97070
Do you accept these terms? [accept,decline]
<- accept
This release of PGI software includes the ACML, which is a tuned
math library designed for high performance on AMD64 machines,
including Opteron(TM) and Athlon(TM) 64, and includes both 32-bit
and 64-bit library versions.
More information about the ACML can be found at the ACML web site:
http://www.developwithamd.com/acml
Install the ACML? [y/n]
<- y
If you agree to abide by the terms and conditions of this Agreement,
please click “Accept.” IF YOU DO NOT AGREE TO ABIDE BY THE TERMS
AND CONDITIONS OF THIS AGREEMENT AND CLICK “DECLINE,” YOU MAY NOT
USE THE LICENSED MATERIALS AND MUST DESTROY THEM OR RETURN THEM
TO AMD IMMEDIATELY.
Do you accept these terms? [accept,decline]
<- accept
Installation directory? [/usr/pgi]
Specify the installation directory. (default)
If you don’t already have permanent keys for this product/release, a
fifteen-day evaluation license can be created now.
Create an evaluation license? [y/n]
<- y
PGI Software: PGI Fortran/C/C++ compilers and tools for 32-bit x86
and 64-bit AMD64 processor-based computer systems.
Do you accept these terms? [accept,decline]
<- accept
Creating temporary license.
Please enter your name: <- root
Please enter your user name: <- root
Please enter your E-mail address: <- root@localhost
You have entered the following information:
name root
user name root
E-mail address root@localhost
Do you wish to change anything? [yes/no]:
<- no
License acquired
The above information was saved to /usr/pgi/license.info.
Do you want the files in the install directory to be read-only? [y,n]
<- y
*** installation complete ***
2. PGI Compiler environment configuration
# vi /etc/profile.d/pgi.sh
———————————————————————-
#!/bin/sh
export PGI=/usr/pgi
export PATH=$PGI/linux86-64/6.0/bin:$PATH
export MANPATH=$MANPATH:$PGI/linux86-64/6.0/man
export LM_LICENSE_FILE=$PGI/license.dat
———————————————————————-
# source /etc/profile.d/pgi.sh
*** Caution : check the actual installation path and adjust the paths above to match it.
3. Verifying the installation
# pgf77 test.f
———————————————————————–
NOTE: your evaluation license will expire in 14 days, 23.5 hours.
For a permanent license, please read the order acknowledgement
that you received. Connect to https://www.pgroup.com/License with
the username and password in the order acknowledgement.
Name: root
User: root
Email: root@localhost
Hostid: PGI=0013D4E0BA225055DE54B8
PGFTN-F-0002-Unable to open source input file: test.f
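Output like the above means the compiler is installed correctly ...
*** Note : PGI is a commercial compiler, and the evaluation license is valid for 15 days.
After 15 days a new license must be obtained.
*** PGI Compiler test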
# cd /usr/pgi/linux86-64/6.0/EXAMPLES/linpack/UNIX
# make
# ./linpkrd
————————————————————————–
norm. resid resid machep x(1)-1 x(n)-1
1.67117300E+00 7.41628980E-14 2.22044605E-16 -1.49880108E-14 -1.89848137E-14
times are reported for matrices of order 100
sgefa sgesl total Kflops unit ratio
times for array with leading dimension of 201
0.00076 0.00003 0.00079 872490. 0.00229 0.01405
0.00076 0.00003 0.00078 874875. 0.00229 0.01402
0.00075 0.00003 0.00078 879148. 0.00227 0.01395
0.00077 0.00000 0.00077 886997. 0.00225 0.01382
times for array with leading dimension of 200
0.00075 0.00003 0.00078 880223. 0.00227 0.01393
0.00074 0.00003 0.00078 882651. 0.00227 0.01389
0.00075 0.00003 0.00078 884820. 0.00226 0.01386
0.00077 0.00000 0.00077 888831. 0.00225 0.01380
ROLLED DOUBLE PRECISION LINPACK PERFORMANCE 886997 KFLOPS
FORTRAN STOP
6. ATLAS, Intel Math Library installation and configuration
Software download : http://math-atlas.sourceforge.net
# tar xzvf atlas3.7.8.tar.gz
# cd ATLAS/
**************************************************************
make config CC=<ANSI C compiler>
( Use this form to select another compiler. The default compiler is gcc. )
***************************************************************
# make config CC=gcc
============================================================================
.
.
011
010
009
008
007
006
005
004
003
002
001
Enter number at top left of screen [0]: < enter >
=============================================================================
IMPORTANT
=============================================================================
Before going any further, check
http://math-atlas.sourceforge.net/errata.html
This is the ATLAS errata file, which keeps a running count of all known
ATLAS bugs and system problems, with associated workarounds or fixes.
IF YOU DO NOT CHECK THIS FILE, YOU MAY BE COMPILING A LIBRARY WITH KNOWN BUGS.
Have you scoped the errata file? [y]:
.
.
Configuration completed successfully. You may want to examine the make
make install arch=Linux_P4SSE3
# make install arch=Linux_P4SSE3
Then ..
# cd ..
# cp -a ATLAS /usr/local/atlas
ATLAS Library Path : /usr/local/atlas/lib/Linux_P4SSE3
7. mpich installation and configuration
– mpich basic installation –
# tar xzvf mpich.tar.gz
# cd mpich-1.2.6/
# ./configure --prefix=/usr/local/mpich-gcc --with-device=ch_p4 \
--with-arch=LINUX
# make
# make install
– mpich config
# cd /usr/local/mpich-gcc
# cd share
# vi machines.LINUX
——————————————————————-
node00:1
node01:1
node02:1
——————————————————————-
# vi /etc/profile.d/mpich-gcc.sh
——————————————————————–
#!/bin/sh
MPICH_HOME=/usr/local/mpich-gcc
PATH=$MPICH_HOME/bin:$PATH
export MPICH_HOME PATH
———————————————————————
# source /etc/profile.d/mpich-gcc.sh
– mpich test
* cpi test
# cd /usr/local/mpich-gcc/examples
# make
# mpirun -np 3 cpi
* parall_add test
*** parall_add.c file check ..
# mpicc -o parall_add parall_add.c -lmpich
# mpirun -np 3 parall_add
———————————————————————–
*****************************************************
Notice !!
If input is not enough large,
Parallel method is not efficient.
This program will add from 1 to your input.
*****************************************************
Input integer number : 10000000000
Parallel SUM = 3221225472, Wall clock time = 4.481901
Serial Sum = 3221225472, Wall clcok time = 11.650580
SPEED UP = 2.599473
Goodbye! : )
————————————————————————–
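The SPEED UP figure printed by the program is the serial wall-clock time divided by the parallel wall-clock time. The sketch below redoes that arithmetic and also derives the parallel efficiency (speedup divided by the number of processes), which the program itself does not print. The function names are hypothetical; awk is used because POSIX shell arithmetic is integer-only:

```shell
#!/bin/sh
# speedup SERIAL PARALLEL: ratio of serial to parallel wall-clock time.
speedup() {
    awk -v s="$1" -v p="$2" 'BEGIN { printf "%.6f\n", s / p }'
}

# efficiency SERIAL PARALLEL NPROCS: speedup divided by the process count.
efficiency() {
    awk -v s="$1" -v p="$2" -v n="$3" 'BEGIN { printf "%.3f\n", s / p / n }'
}

speedup 11.650580 4.481901        # times from the run above -> 2.599473
efficiency 11.650580 4.481901 3   # three processes (-np 3)  -> 0.866
```

An efficiency of about 0.87 means the three processes together deliver roughly 87% of ideal linear scaling for this input.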
– mpich advanced installation –
* check the intel compiler installation paths
cc : /opt/intel/cce/9.0/bin/icc
fc : /opt/intel/fce/9.0/bin/ifort
c++ : /opt/intel/cce/9.0/bin/icpc
* building the mpich + intel compiler environment
# cd /usr/local/src
# tar xzvf mpich.tar.gz
# cd mpich-1.2.6
# ./configure --prefix=/usr/local/mpich-intel -fc=/opt/intel/fce/9.0/bin/ifort -cc=/opt/intel/cce/9.0/bin/icc -c++=/opt/intel/cce/9.0/bin/icpc --with-device=ch_p4 --with-arch=LINUX
# make && make install
* building the mpich + pgi compiler environment
cc : /usr/pgi/linux86-64/6.0/bin/pgcc
fc : /usr/pgi/linux86-64/6.0/bin/pgf77
c++ : /usr/pgi/linux86-64/6.0/bin/pgCC
# cd /usr/local/src
# tar xzvf mpich.tar.gz
# cd mpich-1.2.7
# ./configure --prefix=/usr/local/mpich-pgi -fc=/usr/pgi/linux86-64/6.0/bin/pgf77 -cc=/usr/pgi/linux86-64/6.0/bin/pgcc -c++=/usr/pgi/linux86-64/6.0/bin/pgCC -f90=/usr/pgi/linux86-64/6.0/bin/pgf90 -f90linker=/usr/pgi/linux86-64/6.0/bin/pgf90 --with-device=ch_p4 --with-arch=LINUX --enable-f77 --enable-f90modules
# make && make install
– mpich configuration –
[root@node00 ~]# vi /usr/local/mpich-gcc/share/machines.LINUX
—————————————————————-
# hostname:processor_num
node00:2
node01:1
—————————————————————-
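The machines file is a plain list of hostname:processor_num pairs, so it can be generated from an inventory instead of edited by hand on every change. A minimal sketch, assuming the inventory is given as "hostname count" lines; the function name make_machines is hypothetical:

```shell
#!/bin/sh
# make_machines: read "<hostname> <processor count>" pairs on stdin and print
# them in the hostname:processor_num form used by machines.LINUX.
# A missing count defaults to 1; comment lines are skipped.
make_machines() {
    awk 'NF >= 1 && $1 !~ /^#/ { printf "%s:%d\n", $1, (NF >= 2 ? $2 : 1) }'
}

# Reproduces the two entries shown above.
printf 'node00 2\nnode01 1\n' | make_machines
```

The output can be redirected into share/machines.LINUX under each mpich installation directory.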
**** Note ***********
mpich is installed on node00 under /usr/local, in directories named mpich-gcc, mpich-intel,
and mpich-pgi.
Use ensync to synchronize these directories to every node in the cluster.
8. lammpi installation and configuration
Software download : http://lammpi.org/download
– basic installation
# tar xzvf lam-7.1.1.tar.gz
# cd lam-7.1.1
# ./configure --prefix=/usr/local/lam-gcc
# make && make install
– advanced installation
When building with the pgi compiler ..
# CC=/usr/local/pgi/linux86-64/5.2/bin/pgcc
# CXX=/usr/local/pgi/linux86-64/5.2/bin/pgCC
# FC=/usr/local/pgi/linux86-64/5.2/bin/pgf90
# CFLAGS=-fast
# FFLAGS=-fast
# CXXFLAGS=-fast
# export CC CXX FC CFLAGS FFLAGS CXXFLAGS
# ./configure --prefix=/usr/local/lam-pgi
# make && make install
Or ..
# ./configure --prefix=/usr/local/lam CC=/usr/local/pgi/linux86-64/5.2/bin/pgcc CXX=/usr/local/pgi/linux86-64/5.2/bin/pgCC FC=/usr/local/pgi/linux86-64/5.2/bin/pgf90 CFLAGS=-fast FFLAGS=-fast
When building with the intel compiler ..
# CC=/usr/local/intel/cc/bin/icc
# CXX=/usr/local/intel/cc/bin/icpc
# FC=/usr/local/intel/fc/bin/ifc
# export CC CXX FC
# ./configure --prefix=/usr/local/lam-intel CFLAGS='-O3 -fast -unroll -axW -align' FFLAGS='-O3 -fast -unroll -axW -align'
# make && make install
Note :
If lamboot fails with an SSI boot module error, or if you want to change the default SSI boot
module from rsh to ssh, add the "--with-rsh=ssh" option to the configure options.
– lammpi configuration
# vi /etc/profile.d/lam-gcc.sh
————————————————————-
#!/bin/sh
#LAMHOME=/usr/local/lam-<compiler>
LAMHOME=/usr/local/lam-gcc
PATH=$PATH:/usr/local/lam-gcc/bin
export LAMHOME PATH
————————————————————
# source /etc/profile.d/lam-gcc.sh
# vi /etc/lamhosts
————————————————————-
# with 1 CPU per node
node00
node01
node02
.
# with 2 CPUs per node
node00 cpu=2
node01 cpu=2
node02 cpu=2
————————————————————
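The LAM host file carries the same information as the mpich machines file, only in "host cpu=N" form instead of "host:N". A small sketch that converts one format to the other so both stay in step; the function name machines_to_lamhosts is hypothetical:

```shell
#!/bin/sh
# machines_to_lamhosts: convert "host:N" lines (machines.LINUX style) on stdin
# into LAM host file lines ("host cpu=N"); a bare "host" passes through.
# Blank lines and comment lines are dropped.
machines_to_lamhosts() {
    awk -F: '/^[ \t]*$/ { next }
             /^[ \t]*#/ { next }
             NF >= 2    { print $1 " cpu=" $2; next }
                        { print $1 }'
}

printf 'node00:2\nnode01:2\nnode02:2\n' | machines_to_lamhosts
```

For the sample input this prints the three cpu=2 lines shown in the second half of the file above.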
– lammpi test ( parall_add )
Note :
Unlike mpich, lammpi can only be run from a regular (non-root) user account.
$ lamboot -v -b /etc/lamhosts ;; run lamboot
—————————————————————-
LAM 7.1.1/MPI 2 C++/ROMIO – Indiana University
n-1<31167> ssi:boot:base:linear: booting n0 (node00)
n-1<31167> ssi:boot:base:linear: booting n1 (node01)
.
—————————————————————-
$ lamnodes ;; check the lammpi node configuration
$ mpicc -o parall_add parall_add.c -lmpi
$ mpirun -np 3 parall_add
———————————————————————–
*****************************************************
Notice !!
If input is not enough large,
Parallel method is not efficient.
This program will add from 1 to your input.
*****************************************************
Input integer number : 10000000000
Parallel SUM = 3221225472, Wall clock time = 4.481901
Serial Sum = 3221225472, Wall clcok time = 11.650580
SPEED UP = 2.599473
Goodbye! : )
————————————————————————–
9. nfs, nis, automount configuration
;; set up shared user home directories and a unified authentication system
– NFS configuration (sharing home directories) –
* server configuration *
[root@node00 ~]# vi /etc/exports
———————————————————————-
/home *(rw,no_root_squash)
———————————————————————-
[root@node00 ~]# /etc/rc.d/init.d/portmap restart
[root@node00 ~]# /etc/rc.d/init.d/nfs restart
[root@node00 ~]# chkconfig --level 345 portmap on
[root@node00 ~]# chkconfig --level 345 nfs on
* client configuration *
[root@node01 ~]# vi /etc/auto.master
——————————————————————
/home /etc/auto.home --timeout=60
——————————————————————
[root@node01 ~]# vi /etc/auto.home
—————————————————————–
* -rw,soft,intr node00:/home/&
—————————————————————–
[root@node01 ~]# /etc/rc.d/init.d/portmap restart
[root@node01 ~]# /etc/rc.d/init.d/autofs restart
[root@node01 ~]# chkconfig --level 345 portmap on
[root@node01 ~]# chkconfig --level 345 autofs on
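In the auto.home map, the "*" key matches any directory name requested under /home, and each "&" in the location is replaced by that name. The sketch below only imitates this substitution to show which NFS path a given user resolves to; it does not call autofs, and the function name expand_map_location is hypothetical:

```shell
#!/bin/sh
# expand_map_location LOCATION KEY: substitute every "&" in an autofs map
# location with the lookup key, as happens for a wildcard ("*") entry.
# Assumes KEY is a plain user name (no "/", "&" or "\" characters).
expand_map_location() {
    printf '%s\n' "$1" | sed "s/&/$2/g"
}

# A first access to /home/alang on the client would mount this export:
expand_map_location 'node00:/home/&' alang
```

So with the map above, accessing /home/alang on node01 mounts node00:/home/alang with the options rw,soft,intr, and the mount is dropped again after the 60-second idle timeout.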
– NIS configuration (unified accounts) –
Add the multi on setting to /etc/host.conf.
[root@node00 ~]# vi /etc/host.conf
——————————————————————–
order hosts,bind
multi on
——————————————————————–
Choose an NIS domain name. ( The DNS domain and the NIS domain are separate, unrelated namespaces. )
[root@node00 ~]# nisdomainname cluster.bj
[root@node00 ~]# vi /etc/sysconfig/network
——————————————————————–
NETWORKING=yes
HOSTNAME=node00.cluster.bj
GATEWAY=192.168.1.254
NISDOMAIN=cluster.bj
——————————————————————-
– server configuration
Next, change the configuration so that NIS also serves the shadow file.
Open /var/yp/Makefile and search for the string all: to find the following lines.
[root@node00 ~]# vi /var/yp/Makefile
——————————————————————–
.
.
all: passwd group hosts rpc services netid protocols mail \
# publickey shadow netgrp networks ethers bootparams printcap \
# amd.home auto.master auto.home auto.local passwd.adjunct \
# timezone locale netmasks
Move shadow out of the commented part and into the active list:
all: passwd group hosts rpc services netid protocols mail shadow \
# publickey netgrp networks ethers bootparams printcap \
# amd.home auto.master auto.home auto.local passwd.adjunct \
# timezone locale netmasks
———————————————————————
Next, start ypserv and yppasswdd. Because yp is an RPC-based service,
portmap must be started first.
[root@node00 ~]# /etc/rc.d/init.d/portmap start
[root@node00 ~]# /etc/rc.d/init.d/ypserv start
[root@node00 ~]# /etc/rc.d/init.d/yppasswdd start
[root@node00 ~]# chkconfig --level 345 portmap on
[root@node00 ~]# chkconfig --level 345 ypserv on
[root@node00 ~]# chkconfig --level 345 yppasswdd on
[root@node00 ~]# make -C /var/yp
– client configuration
[root@node01 ~]# vi /etc/yp.conf
————————————————————————–
ypserver node00.cluster.bj
domain cluster.bj server node00.cluster.bj
————————————————————————–
Then start ypbind. As with ypserv, portmap must be running before ypbind is started.
# /etc/rc.d/init.d/portmap restart
# /etc/rc.d/init.d/ypbind restart
Next, entries must be added to /etc/passwd and /etc/group so that accounts are looked up through NIS.
Append < +:*:0:0::: > to the end of /etc/passwd.
Append < +:*:0:0: > to the end of /etc/group.
[root@node01 ~]# vi /etc/passwd
————————————————————————-
.
.
gujo:x:500:501::/home/gujo:/bin/bash
+:*:0:0:::
————————————————————————-
[root@node01 ~]# vi /etc/group
————————————————————————-
.
.
gujo:x:501:
+:*:0:0:
————————————————————————-
[root@node01 ~]# vi /etc/nsswitch.conf
————————————————————————–
.
.
passwd: files
shadow: files
group: files
change the lines above .. to the lines below ..
passwd: files nisplus nis
shadow: files nisplus nis
group: files nisplus nis
————————————————————————–
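Before running yptest it can help to confirm that the edit to nsswitch.conf actually took effect for each database. A minimal sketch that checks whether a given database line lists nis as a source; the function name uses_nis is hypothetical, and the demo runs on a temporary file rather than the live /etc/nsswitch.conf:

```shell
#!/bin/sh
# uses_nis FILE DATABASE: succeed if the DATABASE line (passwd, shadow,
# group, ...) of an nsswitch.conf-format FILE lists "nis" as a source.
uses_nis() {
    awk -v db="$2:" '$1 == db { for (i = 2; i <= NF; i++) if ($i == "nis") found = 1 }
                     END { exit (found ? 0 : 1) }' "$1"
}

# Demo: group was deliberately left unchanged to show the failing case.
tmp=$(mktemp)
printf 'passwd: files nisplus nis\nshadow: files nisplus nis\ngroup: files\n' > "$tmp"
for db in passwd shadow group; do
    if uses_nis "$tmp" "$db"; then
        echo "$db: nis enabled"
    else
        echo "$db: nis missing"
    fi
done
rm -f "$tmp"
```

On a client, uses_nis /etc/nsswitch.conf passwd would be the corresponding check against the real file.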
– NIS test –
[root@node01 ~]# yptest
————————————————————————–
.
.
Test 9: yp_all
gujo gujo:$1$sdwZyE3g$C0KQ5zLQB02F7G1NkCPy0.:500:501::/home/gujo:/bin/bash
test01 test01:$1$v4bmrjYa$OrF2IKtzWJTUiooXO/MD40:505:506::/home/test01:/bin/bash
alang alang:$1$/e6fwjFu$2/Hn6ujxAbBwj3LZKx/wn0:501:502::/home/alang:/bin/bash
user03 user03:$1$7S75akgI$qj5E2SmuvTeh46It2g39Q/:504:505::/home/user03:/bin/bash
user02 user02:$1$UyZf0tLw$oRAVYxO3bwz4NyjR8k1eb0:503:504::/home/user02:/bin/bash
user01 user01:$1$uemX02H0$wMhe1lzVqDL7XOc4AMm1u/:502:503::/home/user01:/bin/bash
————————————————————————-
[root@node01 ~]# ypcat passwd
————————————————————————-
gujo:$1$sdwZyE3g$C0KQ5zLQB02F7G1NkCPy0.:500:501::/home/gujo:/bin/bash
test01:$1$v4bmrjYa$OrF2IKtzWJTUiooXO/MD40:505:506::/home/test01:/bin/bash
alang:$1$/e6fwjFu$2/Hn6ujxAbBwj3LZKx/wn0:501:502::/home/alang:/bin/bash
user03:$1$7S75akgI$qj5E2SmuvTeh46It2g39Q/:504:505::/home/user03:/bin/bash
user02:$1$UyZf0tLw$oRAVYxO3bwz4NyjR8k1eb0:503:504::/home/user02:/bin/bash
user01:$1$uemX02H0$wMhe1lzVqDL7XOc4AMm1u/:502:503::/home/user01:/bin/bash
————————————————————————-
10. apache, ganglia installation and configuration
– apache : select the "Web Server" item during OS installation.
– ganglia installation
* ganglia server installation *
[root@node00 ganglia]# rpm -Uvh ganglia-clx-gmetad-3.0.1-1.x86_64.rpm
[root@node00 ganglia]# rpm -Uvh ganglia-clx-web-3.0.1-1.noarch.rpm
[root@node00 ganglia]# rpm -Uvh rrdtool-clx-1.2.10-1.x86_64.rpm
* ganglia client installation *
[root@node01 ganglia]# rpm -Uvh ganglia-clx-gmond-3.0.1-1.x86_64.rpm
* ganglia server configuration *
[root@node00 clx]# vi /etc/gmetad.conf
————————————————————————–
.
.
data_source "cluster.bj" localhost
.
.
————————————————————————–
[root@node00 clx]# vi /usr/clx/html/ganglia/conf.php
————————————————————————–
.
$gmetad_root = "/var/lib/ganglia";
$rrds = "$gmetad_root/rrds";
define("RRDTOOL", "/usr/bin/rrdtool");
.
—————————————————————————
[root@node00 clx]# /etc/rc.d/init.d/gmetad restart
* ganglia client configuration
[root@node00 clx]# vi /etc/gmond.conf
—————————————————————————
.
cluster {
name = "bj hpc"
}
—————————————————————————-
[root@node00 clx]# ensync /etc/gmond.conf
;; synchronize the gmond.conf settings to all nodes
[root@node00 clx]# /etc/rc.d/init.d/gmond restart
[root@node01 ~]# /etc/rc.d/init.d/gmond restart
11. dhcp, tftp, PXE installation and configuration
######## Automated remote network OS installation using
DHCP + NFS + Tftp + Kickstart + PXE #####################
11.1. dhcp installation and configuration
# cd /data1/os/rh4_x64_up2/RedHat/RPMS
# rpm -Uvh dhcp-3.0.1-12_EL.x86_64.rpm
# rpm -Uvh dhcp-devel-3.0.1-12_EL.x86_64.rpm
# vi /etc/dhcpd.conf
————————————————————-
### DHCP Server configuration ##########################
#
# Adjust the network information to match the installation environment
#
########################################################
ddns-update-style interim;
ignore client-updates;
default-lease-time 600;
max-lease-time 7200;
option subnet-mask 255.255.255.0;
option broadcast-address 192.168.1.255;
option routers 192.168.1.254;
#option domain-name-servers 192.168.1.254;
#option domain-name “cluster.bj”;
# Add the following when configuring PXE ########################
allow booting;
allow bootp;
class "pxeclients" {
match if substring (option vendor-class-identifier, 0, 9) = "PXEClient";
next-server 192.168.1.254;
filename "linux-install/pxelinux.0";
}
#############################################################################
subnet 192.168.1.0 netmask 255.255.255.0 {
range 192.168.1.100 192.168.1.253;
}
———————————————————————-
# /etc/rc.d/init.d/dhcpd restart
11.2. NFS configuration
The directory that holds the OS packages must be exported over NFS.
# vi /etc/exports
———————————————————————-
/data1/os/rh4_x64_up2 *(rw)
———————————————————————-
11.3. Tftp + PXE installation and configuration
* package installation
# cd /data1/os/rh4_x64_up2/RedHat/RPMS/
# rpm -Uvh tftp-0.39-1.x86_64.rpm
# rpm -Uvh tftp-server-0.39-1.x86_64.rpm
# rpm -Uvh system-config-netboot-0.1.32-1_EL4.x86_64.rpm
* tftp configuration
# vi /etc/xinetd.d/tftp
————————————————————————-
service tftp
{
disable = no # disable = yes -> no
socket_type = dgram
protocol = udp
wait = yes
user = root
server = /usr/sbin/in.tftpd
server_args = -s /tftpboot
per_source = 11
cps = 100 2
flags = IPv4
}
————————————————————————–
# /etc/rc.d/init.d/xinetd restart
* pxe configuration
# mkdir /tftpboot/linux-install/rhes4
# cd /data1/os/rh4_x64_up2/images/pxeboot/
# cp vmlinuz /tftpboot/linux-install/rhes4/
# cp initrd.img /tftpboot/linux-install/rhes4/
# vi /tftpboot/linux-install/pxelinux.cfg/default
—————————————————————————
default local
timeout 10
prompt 1
display msgs/boot.msg
F1 msgs/boot.msg
F2 msgs/general.msg
F3 msgs/expert.msg
F4 msgs/param.msg
F5 msgs/rescue.msg
F7 msgs/snake.msg
LABEL local
localboot 1
LABEL node01
KERNEL rhes4/vmlinuz
APPEND initrd=rhes4/initrd.img ramdisk_size=10000 ks=nfs:192.168.1.254:/data1/os/rh4_x64_up2/ks1.cfg ksdevice=eth0
LABEL node02
KERNEL rhes4/vmlinuz
APPEND initrd=rhes4/initrd.img ramdisk_size=10000 ks=nfs:192.168.1.254:/data1/os/rh4_x64_up2/ks2.cfg ksdevice=eth0
———————————————————————————
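Each node needs its own LABEL entry that differs only in the node number and the kickstart file name, so further entries can be generated from the pattern of the two above. A minimal sketch, assuming node N keeps using ksN.cfg in the same NFS directory; the function name pxe_label and the node numbers in the demo are chosen for illustration only:

```shell
#!/bin/sh
# pxe_label N: print a pxelinux.cfg entry for node N (1 -> node01, ks1.cfg),
# following the pattern of the node01 and node02 entries.
# N must be a plain decimal number without leading zeros.
pxe_label() {
    printf 'LABEL node%02d\n' "$1"
    printf 'KERNEL rhes4/vmlinuz\n'
    printf 'APPEND initrd=rhes4/initrd.img ramdisk_size=10000 ks=nfs:192.168.1.254:/data1/os/rh4_x64_up2/ks%d.cfg ksdevice=eth0\n' "$1"
}

# Entries for node03 to node05.
for n in 3 4 5; do
    pxe_label "$n"
done
```

The generated entries can be appended to /tftpboot/linux-install/pxelinux.cfg/default once the matching kickstart files exist.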
* kickstart configuration
# vi /data1/os/rh4_x64_up2/ks1.cfg
———————————————————————————
# kickstart automated installation
install
# fetch the installation image over nfs
nfs --server=192.168.1.254 --dir=/data1/os/rh4_x64_up2
# installation language
lang en_US.UTF-8
# supported languages
langsupport --default=ko_KR.UTF-8 en_US.UTF-8 ko_KR.UTF-8
# keyboard
keyboard us
# network settings; edit these values for each node
network --device eth0 --bootproto static --ip 192.168.1.1 --netmask 255.255.255.0 --gateway 192.168.1.254 --hostname node01.cluster.bj
# root password ( do not modify )
rootpw --iscrypted $1$P.9z.LGA$MOrwcO86rCh2IOt71tqIq1
# firewall settings
firewall --disabled
# security settings
selinux --disabled
authconfig --enableshadow --enablemd5
# time zone
timezone Asia/Seoul
# installation mode ( to install in graphical mode, delete the "text" line below )
text
skipx
bootloader --location=mbr --append="rhgb quiet"
# partition settings
clearpart --all
part /boot --fstype ext3 --size=200
part /usr --fstype ext3 --size=10000
part swap --size=4000
part / --fstype ext3 --size=3000 --asprimary
part /var --fstype ext3 --size=2000
# package selection
%packages
@ compat-arch-development
@ engineering-and-scientific
@ admin-tools
@ editors
@ emacs
@ system-tools
@ korean-support
@ gnome-software-development
@ text-internet
@ x-software-development
@ legacy-network-server
@ dns-server
@ gnome-desktop
@ dialup
@ ftp-server
@ compat-arch-support
@ legacy-software-development
@ smb-server
@ base-x
@ server-cfg
@ sound-and-video
@ development-tools
@ graphical-internet
-evolution
-rusers
tetex-xdvi
e2fsprogs
pvm
sysstat
gftp
-rwho
rsh-server
vnc
iptraf
-evolution-webcal
kernel-devel
kernel-smp-devel
mc
nmap-frontend
thunderbird
%post
—————————————————————————————-
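The PXE menu refers to one kickstart file per node (ks1.cfg, ks2.cfg), and only the network line differs between them. The sketch below derives the file for node N from the node01 file. It assumes node N uses the address 192.168.1.N, in line with the /etc/hosts entries earlier in this document; the function name make_ks is hypothetical:

```shell
#!/bin/sh
# make_ks TEMPLATE N: print a copy of the node01 kickstart file with the
# static IP and hostname on the "network" line rewritten for node N.
# Assumes node N uses 192.168.1.N and is named nodeNN (two digits).
make_ks() {
    name=$(printf 'node%02d' "$2")
    sed -e "/^network /s/ip 192\.168\.1\.1 /ip 192.168.1.$2 /" \
        -e "/^network /s/hostname node01\./hostname $name./" "$1"
}

# Demo on a one-line template; only the network line is changed.
tmp=$(mktemp)
echo 'network --device eth0 --bootproto static --ip 192.168.1.1 --netmask 255.255.255.0 --gateway 192.168.1.254 --hostname node01.cluster.bj' > "$tmp"
make_ks "$tmp" 2
rm -f "$tmp"
```

In practice the call would look like make_ks /data1/os/rh4_x64_up2/ks1.cfg 2 > /data1/os/rh4_x64_up2/ks2.cfg.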
Now reboot the node01 server and ..
Welcome to Gujo Cluster Auto Installer!
.-=-. .–.
__ .’ ‘. / ” )
_ .’ ‘. / .-. \\ / .-‘
( \\ / .-. \\ / / \\ \\ / /
\\ `-` / \\ `-‘ / \\ `-` /
`-.-` ‘.____.’ `.____.’
Input Hostname of the Operation System you wish to install:
*** Example > boot : node01
a screen like the one above appears.
Type "node01" at the boot : prompt and the operating system is installed on node01 automatically.
12. hpc benchmark tool installation and configuration
[root@node00 ~]# cd /home/gujo/
[root@node00 gujo]# tar xzvf hpl.tgz
[root@node00 gujo]# cd hpl
[root@node00 hpl]# cp setup/Make.Linux_ATHLON_CBLAS Make.Linux_PIV
[root@node00 hpl]# vi Make.Linux_PIV
—————————————————————————
.
.
ARCH = Linux_PIV # enter the arch name used for this build (Make.XXXXXXX)
.
.
TOPdir = /home/gujo/hpl
.
MPdir = /usr/local/mpich-gcc
.
LAdir = /usr/local/ATLAS/lib/Linux_P4E64SSE3_2
.
CC = /usr/local/mpich-gcc/bin/mpicc
CCNOOPT = $(HPL_DEFS)
CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall
#
LINKER = /usr/local/mpich-gcc/bin/mpicc
LINKFLAGS = $(CCFLAGS)
—————————————————————————-
[root@node00 hpl]# vi Makefile
—————————————————————————-
.
arch = Linux_PIV
.
—————————————————————————-
[root@node00 hpl]# vi Make.top
—————————————————————————-
.
arch = Linux_PIV
.
—————————————————————————-
** compiling xhpl **
[root@node00 hpl]# make build arch=Linux_PIV
[root@node00 hpl]# cd bin/Linux_PIV
[root@node00 Linux_PIV]# ls xhpl
xhpl
** editing HPL.dat **
[root@node00 Linux_PIV]# vi HPL.dat
—————————————————————————-
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
3 # of problems sizes (N)
10000 15000 16500 Ns
3 # of NBs
90 100 128 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
1 # of process grids (P x Q)
1 Ps
3 Qs
16.0 threshold
1 # of panel fact
1 PFACTs (0=left, 1=Crout, 2=Right)
1 # of recursive stopping criterium
4 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
3 # of recursive panel fact.
1 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
0 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
0 DEPTHs (>=0)
2 SWAP (0=bin-exch,1=long,2=mix)
64 swapping threshold
0 L1 in (0=transposed,1=no-transposed) form
0 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)
—————————————————————————-
;; The HPL.dat above is configured for 3 processors. As nodes are added, HPL.dat must be
updated to match the new processor count.
Ps x Qs = total number of processors
Ns x Ns x 8 = memory (in bytes) used to solve the problem
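The two rules above are simple arithmetic and are easy to check before a long run. A minimal sketch that computes the process count for a P x Q grid and the matrix size for each N in the HPL.dat above; the function names are hypothetical, and the size is reported in MiB, rounded down:

```shell
#!/bin/sh
# hpl_procs P Q: number of MPI processes required by a P x Q process grid.
hpl_procs() {
    echo $(( $1 * $2 ))
}

# hpl_matrix_mib N: size of the N x N double-precision matrix in MiB
# (N * N * 8 bytes), rounded down.
hpl_matrix_mib() {
    echo $(( $1 * $1 * 8 / 1048576 ))
}

echo "processes: $(hpl_procs 1 3)"      # grid from the HPL.dat above
for n in 10000 15000 16500; do
    echo "N=$n: $(hpl_matrix_mib "$n") MiB"
done
```

For N=16500 this gives 2077 MiB in total. The matrix is distributed over all processes, so each of the 3 processes holds roughly a third of it, and the value passed to mpirun -np has to equal the process count.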
[root@node00 Linux_PIV]# mpirun -np 3 xhpl
—————————————————————————–
.
—————————————————————————-
WR00L2C4 16500 128 1 3 539.04 5.556e+00
—————————————————————————-
||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0329561 …… PASSED
||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0227421 …… PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0046330 …… PASSED
============================================================================
T/V N NB P Q Time Gflops
—————————————————————————-
WR00L2C4 16500 128 1 3 524.76 5.708e+00
—————————————————————————-
||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0329561 …… PASSED
||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0227421 …… PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0046330 …… PASSED
============================================================================
Finished 27 tests with the following results:
27 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
—————————————————————————-
In the benchmark above, the peak performance of the bj cluster system was measured at 5.7 Gflops.