[Cluster][HPC] Teragon High Performance Computing Technical Doc

*** Teragon High Performance Computing Technical Doc (Standard) ***

=================================================================

                        Date   : December 25, 2005 ..T.T

                        Author : Seo Jinwoo (alang@syszone.co.kr)

=================================================================

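– Table of Contents –

1. rsh, rlogin configuration

2. ensh installation and configuration

3. time sync configuration

4. intel compiler installation and configuration

5. PGI Compiler installation and configuration

6. ATLAS, Intel Math Library installation and configuration

7. mpich installation and configuration

8. lammpi installation and configuration

9. nfs, nis, automount configuration

10. apache, ganglia installation and configuration

11. dhcp, tftp, Pxe installation and configuration

12. hpc benchmark tool installation and configuration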

1. rsh, rlogin configuration

– /etc/hosts define –

# vi /etc/hosts

—————————————————————–

127.0.0.1       localhost.localdomain localhost

192.168.1.254   node00.cluster.bj       node00

192.168.1.1     node01.cluster.bj       node01

—————————————————————–
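The /etc/hosts entries above follow a regular pattern, so on a larger cluster they can be generated instead of typed. A minimal sketch, assuming compute node N is named nodeNN and uses the address 192.168.1.N as in the example; `gen_hosts_entries` is a made-up helper name, not an existing tool:

```shell
#!/bin/sh
# Print /etc/hosts lines for compute nodes node01..nodeNN.
# Assumption: node N has address 192.168.1.N and the domain cluster.bj,
# matching the example above. gen_hosts_entries is a hypothetical helper.
gen_hosts_entries() {
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        name=$(printf 'node%02d' "$i")
        printf '192.168.1.%d\t%s.cluster.bj\t%s\n' "$i" "$name" "$name"
        i=$((i + 1))
    done
}
```

Review the output before appending it to /etc/hosts, for example with `gen_hosts_entries 2`.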

– rsh, rlogin config –

[root@node00 ~]# chkconfig rsh on

[root@node00 ~]# chkconfig rlogin on

[root@node00 ~]# vi /etc/securetty

—————————————————————–

..append at the very bottom..

rsh

rlogin

—————————————————————–

– Allowing rsh and rlogin for regular users –

[root@node00 ~]# vi /etc/hosts.equiv

—————————————————————–

node00

node01

—————————————————————–

– Allowing rsh and rlogin for the root user –

[root@node00 ~]# vi /root/.rhosts

—————————————————————–

node00

node01

—————————————————————–

– rsh, rlogin test –

[root@node00 ~]# rsh node01

Last login: Sat Feb  4 11:48:53 from node00.cluster.bj

[root@node01 ~]#

[root@node00 ~]# rsh node01 uname -a

Linux node01.cluster.bj 2.6.9-22.EL #1 Mon Sep 19 17:49:49 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux

[root@node00 ~]#

[root@node00 ~]# rcp pvm.host node01:/root

;; copies the file "pvm.host" from node00 into /root on node01
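The manual rsh test above can be repeated for every node in one pass. The following is only a sketch: `check_nodes` is a hypothetical helper, and the remote shell command is taken from a `REMOTE_SH` variable (an assumption of this example) so the same loop can be reused with ssh.

```shell
#!/bin/sh
# Run a trivial command on each node and report which nodes answer.
# REMOTE_SH defaults to rsh; set REMOTE_SH=ssh to test ssh instead.
# check_nodes is a hypothetical helper, not part of rsh or ensh.
REMOTE_SH=${REMOTE_SH:-rsh}

check_nodes() {
    failed=0
    for node in "$@"; do
        # stdin is redirected so the remote shell cannot swallow the caller's input
        if "$REMOTE_SH" "$node" true </dev/null >/dev/null 2>&1; then
            echo "$node ok"
        else
            echo "$node FAILED"
            failed=1
        fi
    done
    return "$failed"
}
```

Calling `check_nodes node00 node01` prints one line per node and returns a non-zero status if any node did not respond.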

2. ensh installation and configuration

– Installation –

[root@node00 src]# rpm -Uvh ensh-1.0.0-3.x86_64.rpm

– Configuration –

[root@node00 src]# vi /usr/clx/ensh/etc/nodelist

—————————————————————-

node00

node01

—————————————————————-

[root@node00 src]# ensh --init

—————————————————————

Generating public/private rsa key pair.

Enter file in which to save the key (/root/.ssh/id_rsa): <- Enter

Enter passphrase (empty for no passphrase): <- Enter

Enter same passphrase again: <- Enter

Your identification has been saved in /root/.ssh/id_rsa.

Your public key has been saved in /root/.ssh/id_rsa.pub.

The key fingerprint is:

7e:06:6a:32:06:ac:4e:05:f1:d3:9c:36:50:8f:b2:4c root@node00.cluster.bj

root@node00’s password: <- node00’s root password

root@node00’s password: <-

id_rsa.pub                                                             100%  232     0.2KB/s   00:00

root@node01’s password: <- node01’s root password

root@node01’s password:

id_rsa.pub                                                             100%  232     0.2KB/s   00:00

—————————————————————–

– ensh test –

[root@node00 ~]# ensh uname -a

——————————————————————

### executing in node00

node00 Linux node00.cluster.bj 2.6.9-22.ELsmp #1 SMP Mon Sep 19

### executing in node01

node01 Linux node01.cluster.bj 2.6.9-22.EL #1 Mon Sep 19 17:49:49

;; runs a command on all nodes at once

[root@node00 ~]# ensync anaconda-ks.cfg

——————————————————————

### synchronizing node00

building file list … done

sent 61 bytes  received 20 bytes  54.00 bytes/sec

total size is 3042  speedup is 37.56

### synchronizing node01

building file list … done

anaconda-ks.cfg

sent 908 bytes  received 58 bytes  644.00 bytes/sec

total size is 3042  speedup is 3.15

;; file synchronization

3. time sync configuration

– On the time server –

[root@node00 ~]# chkconfig time on

– On the client –

[root@node01 ~]# rdate -s node00

– Synchronizing the time on every cluster node with ensh –

[root@node00 ~]# timesync

* HPC system time synced ..

– Checking the time on every cluster node with ensh –

[root@node00 ~]# timeview

node00 2006. 02. 04. (토) 12:57:25 KST

node01 2006. 02. 04. (토) 12:57:25 KST

4. intel compiler installation and configuration

Intel Compiler download site : http://www.intel.com

# tar xzvf l_cc_p_9.0.021.tar.gz

# cd l_cc_p_9.0.021

# ./install.sh

——————————————————————-

**********************************************************************

“Welcome to Installation”

Please make your selection by entering an option:

1. “Intel(R) C++ Compiler 9.0 for Linux*” – install

        1a. Readme

        1b. Release Notes

        1c. Installation Guide

        1d. Product Web Site URL

        1e. Intel(R) Support Web Site URL

x. Exit.

Please type a selection  :   1      -----> Input 1

======================================================================

Please select an option to continue:

        1. Proceed with Serial Number to install and register. [Recommended]

        2. Provide name of an existing license file.

        x. Exit.

Please type your selection  :   2    -----> Input 2

======================================================================

Please provide the license file name with full path (*.lic)

        x.Exit

License file path  :  -----> Input 3

/usr/local/src/intel/noncommercial_cpp_l_N4R8-76ZGSDKV.lic

Checking RPM version …

Checking Dependencies …

Checking Kernel and glibc dependencies …

Which of the following would you like to do?

1.    Typical Install (Recommended – Installs All Components).

2.    Custom Install (Advanced Users Only).

x.    Exit.

Please type a selection:   1    ----> Input 4

.

.

'accept' to continue, 'reject' to return to the main menu : accept -> Input 5

Values in […] are the default values.

You can just hit the Enter key where you want to use the default values.

Where do you want to install to?  Specify directory starting with ‘/’.

[/opt/intel/cc/9.0]   :

..

Installation successful.

“Installation is complete ”

Thank you for using Intel(R) Software Development Products, tools for

improving application performance.

Please make your selection by entering an option:

1. “Intel(R) C++ Compiler 9.0 for Linux*” – install

        (v9.0 install detected)

        1a. Readme

        1b. Release Notes

        1c. Installation Guide

        1d. Product Web Site URL

        1e. Intel(R) Support Web Site URL

x. Exit.

Please type a selection  :   x

2. compiler config

– C/C++ config

# cd /opt/intel/cc/9.0/bin/

# cp iccvars.sh /etc/profile.d/

# source /etc/profile.d/iccvars.sh

– Intel Fortran compiler config

# cd /opt/intel/fc/9.0/bin/

# cp ifortvars.sh /etc/profile.d/

# source /etc/profile.d/ifortvars.sh

– Intel Math Library install

# cd /usr/local/src/intel

# tar xzvf l_mkl_p_7.2.1.003.tar.gz

# cd l_mkl_p_7.2.1.003

# ./install

– Intel Math Library config

# vi /etc/ld.so.conf

——————————————————

.

.

/opt/intel/mkl721/lib/32

#/opt/intel/mkl721/lib/em64t   -> use this line instead on EM64T (64-bit) systems

——————————————————

# ldconfig
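The ld.so.conf fragment above enables the 32-bit MKL directory and keeps the EM64T one commented out. A small sketch of picking between the two by machine type; `mkl_libdir` is a hypothetical helper, and it assumes only the two directories shown above are relevant:

```shell
#!/bin/sh
# Print the MKL 7.2.1 library directory for a machine type.
# Assumption: only the two directories shown above (lib/32, lib/em64t) matter.
# mkl_libdir is a hypothetical helper; with no argument it asks uname -m.
mkl_libdir() {
    arch=${1:-$(uname -m)}
    case "$arch" in
        x86_64) echo /opt/intel/mkl721/lib/em64t ;;
        *)      echo /opt/intel/mkl721/lib/32 ;;
    esac
}
```

The printed path is the line that would go into /etc/ld.so.conf before running ldconfig.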

– compiler test (latticeeasy)

# tar xzvf latticeeasy2.0.tar.gz

# cd latticeeasy2.0

# vi makefile

———————————————————

.

#COMPILER = g++

COMPILER = icpc

FLAGS = -O3 -Wall

———————————————————-

# make

# ./latticeeasy

5. PGI Compiler installation and configuration

* Installing the PGI Compiler

Place the PGI Compiler distribution under /usr/local/src.

1. PGI Compiler installation

# cd /usr/local/src/pgi

# tar xzvf linux86[1]-64.tar.gz

# ./install

————————————————————————

.

YOU ACKNOWLEDGE THAT YOU HAVE READ THIS AGREEMENT AND AGREE TO BE BOUND

BY ITS TERMS.  YOU FURTHER AGREE THAT IT IS THE COMPLETE AND EXCLUSIVE

STATEMENT OF AGREEMENT BETWEEN YOU AND ST THAT SUPERSEDE ANY PRIOR

AGREEMENT, ORAL OR WRITTEN, ANY PROPOSAL AND ANY OTHER COMMUNICATIONS

BETWEEN YOU AND STUS RELATING TO THE SUBJECT MATTER OF THIS AGREEMENT.

Address:

        The Portland Group

        STMicroelectronics, Inc.

        9150 SW Pioneer Ct.  Suite H

        Wilsonville, OR, USA 97070

Do you accept these terms? [accept,decline]

<- accept

This release of PGI software includes the ACML, which is a tuned

math library designed for high performance on AMD64 machines,

including Opteron(TM) and Athlon(TM) 64, and includes both 32-bit

and 64-bit library versions.

More information about the ACML can be found at the ACML web site:

http://www.developwithamd.com/acml

Install the ACML? [y/n]

<- y

If you agree to abide by the terms and conditions of this Agreement,

please click “Accept.”  IF YOU DO NOT AGREE TO ABIDE BY THE TERMS

AND CONDITIONS OF THIS AGREEMENT AND CLICK “DECLINE,” YOU MAY NOT

USE THE LICENSED MATERIALS AND MUST DESTROY THEM OR RETURN THEM

TO AMD IMMEDIATELY.

Do you accept these terms? [accept,decline]

<- accept

Installation directory? [/usr/pgi]

Specify the installation directory. (default)

If you don’t already have permanent keys for this product/release, a

fifteen-day evaluation license can be created now.

Create an evaluation license? [y/n]

<- y

PGI Software: PGI Fortran/C/C++ compilers and tools for 32-bit x86

and 64-bit AMD64 processor-based computer systems.

Do you accept these terms? [accept,decline]

<- accept

Creating temporary license.

Please enter your name: <- root

Please enter your user name: <- root

Please enter your E-mail address: <- root@localhost

You have entered the following information:

        name                 root

        user name            root

        E-mail address       root@localhost

Do you wish to change anything? [yes/no]:

<- no

License acquired

The above information was saved to /usr/pgi/license.info.

Do you want the files in the install directory to be read-only? [y,n]

<- y

*** Installation complete ***

2. PGI Compiler environment configuration

# vi /etc/profile.d/pgi.sh

———————————————————————-

#!/bin/sh

export PGI=/usr/pgi

export PATH=$PGI/linux86-64/6.0/bin:$PATH

export MANPATH=$MANPATH:$PGI/linux86-64/6.0/man

export LM_LICENSE_FILE=$PGI/license.dat

———————————————————————-

# source /etc/profile.d/pgi.sh

*** Caution: check the actual installation path and adjust the settings to match it.

3. Verifying the installation

# pgf77 test.f

———————————————————————–

NOTE: your evaluation license will expire in 14 days, 23.5 hours.

For a permanent license, please read the order acknowledgement

that you received.  Connect to https://www.pgroup.com/License with

the username and password in the order acknowledgement.

        Name:   root

        User:   root

        Email:  root@localhost

        Hostid: PGI=0013D4E0BA225055DE54B8

PGFTN-F-0002-Unable to open source input file: test.f

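If you see output like this, the installation is working ...

*** Note: PGI is a commercial compiler; the evaluation license is valid for 15 days.

After 15 days a new license must be obtained.

*** PGI Compiler test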

# cd /usr/pgi/linux86-64/6.0/EXAMPLES/linpack/UNIX

# make

# ./linpkrd

————————————————————————–

     norm. resid      resid           machep         x(1)-1        x(n)-1

  1.67117300E+00  7.41628980E-14  2.22044605E-16 -1.49880108E-14 -1.89848137E-14

    times are reported for matrices of order   100

      sgefa      sgesl      total     Kflops       unit      ratio

times for array with leading dimension of 201

    0.00076    0.00003    0.00079    872490.    0.00229    0.01405

    0.00076    0.00003    0.00078    874875.    0.00229    0.01402

    0.00075    0.00003    0.00078    879148.    0.00227    0.01395

    0.00077    0.00000    0.00077    886997.    0.00225    0.01382

times for array with leading dimension of 200

    0.00075    0.00003    0.00078    880223.    0.00227    0.01393

    0.00074    0.00003    0.00078    882651.    0.00227    0.01389

    0.00075    0.00003    0.00078    884820.    0.00226    0.01386

    0.00077    0.00000    0.00077    888831.    0.00225    0.01380

ROLLED DOUBLE  PRECISION LINPACK PERFORMANCE       886997 KFLOPS

FORTRAN STOP

6. ATLAS, Intel Math Library installation and configuration

Software download : http://math-atlas.sourceforge.net

# tar xzvf atlas3.7.8.tar.gz

# cd ATLAS/

**************************************************************

make config CC=<ANSI C compiler>

( use this form to select another compiler; the default is gcc )

***************************************************************

# make config CC=gcc

============================================================================

.

.

011

010

009

008

007

006

005

004

003

002

001

Enter number at top left of screen [0]: < enter >

=============================================================================

                                  IMPORTANT

=============================================================================

Before going any further, check

   http://math-atlas.sourceforge.net/errata.html

This is the ATLAS errata file, which keeps a running count of all known

ATLAS bugs and system problems, with associated workarounds or fixes.

IF YOU DO NOT CHECK THIS FILE, YOU MAY BE COMPILING A LIBRARY WITH KNOWN BUGS.

Have you scoped the errata file? [y]:

.

.

Configuration completed successfully.  You may want to examine the make

   make install arch=Linux_P4SSE3

# make install arch=Linux_P4SSE3

Then ..

# cd ..

# cp -a ATLAS /usr/local/atlas

ATLAS Library Path : /usr/local/atlas/lib/Linux_P4SSE3

7. mpich installation and configuration

– mpich basic installation –

# tar xzvf mpich.tar.gz

# cd mpich-1.2.6/

# ./configure --prefix=/usr/local/mpich-gcc --with-device=ch_p4 --with-arch=LINUX

# make

# make install

– mpich config

# cd /usr/local/mpich-gcc

# cd share

# vi machines.LINUX

——————————————————————-

node00:1

node01:1

node02:1

——————————————————————-

# vi /etc/profile.d/mpich-gcc.sh

——————————————————————–

#!/bin/sh

MPICH_HOME=/usr/local/mpich-gcc

PATH=$MPICH_HOME/bin:$PATH

export MPICH_HOME PATH

———————————————————————

# source /etc/profile.d/mpich-gcc.sh

– mpich test

* cpi test

# cd /usr/local/mpich-gcc/examples

# make

# mpirun -np 3 cpi

* parall_add test

*** parall_add.c file check ..

# mpicc -o parall_add parall_add.c -lmpich

# mpirun -np 3 parall_add

———————————————————————–

*****************************************************

                 Notice !!

If input is not enough large,

   Parallel method is not efficient.

This program will add from 1 to your input.

*****************************************************

Input integer number : 10000000000

   Parallel SUM = 3221225472,  Wall clock time = 4.481901

   Serial   Sum = 3221225472,  Wall clcok time = 11.650580

   SPEED UP = 2.599473

Goodbye!  : )

————————————————————————–

– mpich advanced installation –

* check the intel compiler installation paths

cc : /opt/intel/cce/9.0/bin/icc

fc : /opt/intel/fce/9.0/bin/ifort

c++ : /opt/intel/cce/9.0/bin/icpc

* building mpich with the intel compiler

# cd /usr/local/src

# tar xzvf mpich.tar.gz

# cd mpich-1.2.6

# ./configure --prefix=/usr/local/mpich-intel -fc=/opt/intel/fce/9.0/bin/ifort -cc=/opt/intel/cce/9.0/bin/icc -c++=/opt/intel/cce/9.0/bin/icpc --with-device=ch_p4 --with-arch=LINUX

# make && make install

– building mpich with the pgi compiler

cc : /usr/pgi/linux86-64/6.0/bin/pgcc

fc : /usr/pgi/linux86-64/6.0/bin/pgf77

c++ : /usr/pgi/linux86-64/6.0/bin/pgCC

# cd /usr/local/src

# tar xzvf mpich.tar.gz

# cd mpich-1.2.7

# ./configure --prefix=/usr/local/mpich-pgi -fc=/usr/pgi/linux86-64/6.0/bin/pgf77 -cc=/usr/pgi/linux86-64/6.0/bin/pgcc -c++=/usr/pgi/linux86-64/6.0/bin/pgCC -f90=/usr/pgi/linux86-64/6.0/bin/pgf90 -f90linker=/usr/pgi/linux86-64/6.0/bin/pgf90 --with-device=ch_p4 --with-arch=LINUX --enable-f77 --enable-f90modules

# make && make install

– mpich configuration –

[root@node00 ~]# vi /usr/local/mpich-gcc/share/machines.LINUX

—————————————————————-

# hostname:processor_num

node00:2

node01:1

—————————————————————-
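Each line of machines.LINUX has the form hostname:processor_num, so the total number of processes the cluster can host is the sum of the second fields. A minimal sketch of that sum; `count_slots` is a hypothetical helper that reads the file on standard input and treats a bare hostname as one processor:

```shell
#!/bin/sh
# Sum the processor counts in a machines.LINUX-style file read from stdin.
# Lines look like "hostname:processor_num"; a bare hostname counts as 1.
# Blank lines and lines starting with '#' are skipped.
# count_slots is a hypothetical helper, not part of mpich.
count_slots() {
    awk -F: '
        /^[[:space:]]*(#|$)/ { next }
        { total += (NF > 1 ? $2 : 1) }
        END { print total + 0 }
    '
}
```

The result can be fed to mpirun, for example `mpirun -np "$(count_slots < /usr/local/mpich-gcc/share/machines.LINUX)" cpi`.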

**** Caution ***********

mpich is installed on node00 under /usr/local as the directories mpich-gcc, mpich-intel, and mpich-pgi.

Use ensync to synchronize them to every node in the cluster.

8. lammpi installation and configuration

Software download : http://lammpi.org/download

– Basic installation

# tar xzvf lam-7.1.1.tar.gz

# cd lam-7.1.1

# ./configure --prefix=/usr/local/lam-gcc

# make && make install

– Advanced installation

When building with the pgi compiler ..

# CC=/usr/local/pgi/linux86-64/5.2/bin/pgcc

# CXX=/usr/local/pgi/linux86-64/5.2/bin/pgCC

# FC=/usr/local/pgi/linux86-64/5.2/bin/pgf90

# CFLAGS=-fast

# FFLAGS=-fast

# CXXFLAGS=-fast

# export CC CXX FC CFLAGS FFLAGS CXXFLAGS

# ./configure --prefix=/usr/local/lam-pgi

# make && make install

Or ..

./configure --prefix=/usr/local/lam CC=/usr/local/pgi/linux86-64/5.2/bin/pgcc CXX=/usr/local/pgi/linux86-64/5.2/bin/pgCC FC=/usr/local/pgi/linux86-64/5.2/bin/pgf90 CFLAGS=-fast FFLAGS=-fast

When building with the intel compiler ..

# CC=/usr/local/intel/cc/bin/icc

# CXX=/usr/local/intel/cc/bin/icpc

# FC=/usr/local/intel/fc/bin/ifc

# export CC CXX FC

./configure --prefix=/usr/local/lam-intel CFLAGS='-O3 -fast -unroll -axW -align' FFLAGS='-O3 -fast -unroll -axW -align'

# make && make install

Note:

If lamboot fails with an SSI boot module error, or if you want the default SSI boot module to use ssh instead of rsh,

add the "--with-rsh=ssh" option to the configure options.

– lammpi configuration

# vi /etc/profile.d/lam-gcc.sh

————————————————————-

#!/bin/sh

#LAMHOME=/usr/local/lam-<compiler>

LAMHOME=/usr/local/lam-gcc

PATH=$PATH:/usr/local/lam-gcc/bin

export LAMHOME PATH

————————————————————

# source /etc/profile.d/lam-gcc.sh

# vi /etc/lamhosts

————————————————————-

# one CPU per node

node00

node01

node02

.

# two CPUs per node

node00 cpu=2

node01 cpu=2

node02 cpu=2

————————————————————
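The lamhosts file carries the same information as the mpich machines.LINUX file, only written as "hostname cpu=N" instead of "hostname:N". A sketch of converting one into the other; `machines_to_lamhosts` is a hypothetical helper and is not shipped with either MPI package:

```shell
#!/bin/sh
# Convert "hostname:N" lines (machines.LINUX style) on stdin
# into "hostname cpu=N" lines (lamhosts style) on stdout.
# A bare hostname is passed through unchanged (one CPU).
# Blank lines and '#' comment lines are dropped.
machines_to_lamhosts() {
    awk -F: '
        /^[[:space:]]*(#|$)/ { next }
        NF > 1 { print $1 " cpu=" $2; next }
        { print $1 }
    '
}
```

For example, `machines_to_lamhosts < /usr/local/mpich-gcc/share/machines.LINUX` prints lines that can be reviewed and saved as /etc/lamhosts.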

– lammpi test ( parall_add )

Note:

Unlike mpich, lammpi can only be run from a regular (non-root) account.

$ lamboot -v -b /etc/lamhosts     ;; run lamboot

—————————————————————-

LAM 7.1.1/MPI 2 C++/ROMIO – Indiana University

n-1<31167> ssi:boot:base:linear: booting n0 (node00)

n-1<31167> ssi:boot:base:linear: booting n1 (node01)

.

—————————————————————-

$ lamnodes                        ;; check the lammpi node configuration

$ mpicc -o parall_add parall_add.c -lmpi

$ mpirun -np 3 parall_add

———————————————————————–

*****************************************************

                 Notice !!

If input is not enough large,

   Parallel method is not efficient.

This program will add from 1 to your input.

*****************************************************

Input integer number : 10000000000

   Parallel SUM = 3221225472,  Wall clock time = 4.481901

   Serial   Sum = 3221225472,  Wall clcok time = 11.650580

   SPEED UP = 2.599473

Goodbye!  : )

————————————————————————–

9. nfs, nis, automount configuration

;; Set up shared user home directories and a unified authentication system

– NFS configuration (shared home directories) –

* Server configuration *

[root@node00 ~]# vi /etc/exports

———————————————————————-

/home                   *(rw,no_root_squash)

———————————————————————-

[root@node00 ~]# /etc/rc.d/init.d/portmap restart

[root@node00 ~]# /etc/rc.d/init.d/nfs restart

[root@node00 ~]# chkconfig --level 345 portmap on

[root@node00 ~]# chkconfig --level 345 nfs on

* Client configuration *

[root@node01 ~]# vi /etc/auto.master

——————————————————————

/home   /etc/auto.home --timeout=60

——————————————————————

[root@node01 ~]# vi /etc/auto.home

—————————————————————–

*               -rw,soft,intr   node00:/home/&

—————————————————————–
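In the auto.home map above, the `*` key matches whatever directory name is requested under /home, and `&` in the location is replaced by that name. The sketch below only illustrates the substitution; `automount_target` is a hypothetical helper and plays no part in autofs itself:

```shell
#!/bin/sh
# Show what the wildcard map entry "* -rw,soft,intr node00:/home/&"
# resolves to for one key: '&' is replaced by the looked-up name.
# automount_target is a hypothetical helper for illustration only.
automount_target() {
    key=$1
    location='node00:/home/&'
    printf '%s\n' "$location" | sed "s|&|$key|"
}
```

So accessing /home/gujo on a client mounts the location printed by `automount_target gujo`.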

[root@node01 ~]# /etc/rc.d/init.d/portmap restart

[root@node01 ~]# /etc/rc.d/init.d/autofs restart

[root@node01 ~]# chkconfig --level 345 portmap on

[root@node01 ~]# chkconfig --level 345 autofs on

– NIS configuration (unified accounts) –

Add the "multi on" setting to /etc/host.conf.

[root@node00 ~]# vi /etc/host.conf

——————————————————————–

order hosts,bind

multi on

——————————————————————–

Choose the NIS domain name. (The DNS domain and the NIS domain are separate concepts.)

[root@node00 ~]# nisdomainname cluster.bj

[root@node00 ~]# vi /etc/sysconfig/network

——————————————————————–

NETWORKING=yes

HOSTNAME=node00.cluster.bj

GATEWAY=192.168.1.254

NISDOMAIN=cluster.bj

——————————————————————-

– Server configuration

Next, change the configuration so that NIS can serve the shadow file.

Open /var/yp/Makefile and search for the string "all:".

[root@node00 ~]# vi /var/yp/Makefile

——————————————————————–

.

.

all: passwd group hosts rpc services netid protocols mail \
    # publickey shadow netgrp networks ethers bootparams printcap \
    # amd.home auto.master auto.home auto.local passwd.adjunct \
    # timezone locale netmasks

This is the default rule. Move shadow out of the commented part:

all: passwd group hosts rpc services netid protocols mail shadow \
    # publickey netgrp networks ethers bootparams printcap \
    # amd.home auto.master auto.home auto.local passwd.adjunct \
    # timezone locale netmasks

———————————————————————

Then start ypserv and yppasswdd. Because yp is an RPC-based service,

portmap must be started first.

[root@node00 ~]# /etc/rc.d/init.d/portmap start

[root@node00 ~]# /etc/rc.d/init.d/ypserv start

[root@node00 ~]# /etc/rc.d/init.d/yppasswdd start

[root@node00 ~]# chkconfig --level 345 portmap on

[root@node00 ~]# chkconfig --level 345 ypserv on

[root@node00 ~]# chkconfig --level 345 yppasswdd on

[root@node00 ~]# make -C /var/yp

– Client configuration

[root@node01 ~]# vi /etc/yp.conf

————————————————————————–

ypserver node00.cluster.bj

domain cluster.bj server node00.cluster.bj

————————————————————————–

Then start ypbind. As with the server, portmap must be running before ypbind.

# /etc/rc.d/init.d/portmap restart

# /etc/rc.d/init.d/ypbind restart

Next, /etc/passwd and /etc/group need an entry stating that accounts are authenticated through NIS.

Append < +:*:0:0::: > as the last line of /etc/passwd.

Append < +:*:0:0: > as the last line of /etc/group.

[root@node01 ~]# vi /etc/passwd

————————————————————————-

.

.

gujo:x:500:501::/home/gujo:/bin/bash

+:*:0:0:::

————————————————————————-

[root@node01 ~]# vi /etc/group

————————————————————————-

.

.

gujo:x:501:

+:*:0:0:

————————————————————————-

[root@node01 ~]# vi /etc/nsswitch.conf

————————————————————————–

.

.

passwd:     files

shadow:     files

group:      files

Change the lines above to the following ..

passwd:     files nisplus nis

shadow:     files nisplus nis

group:      files nisplus nis

————————————————————————–

– NIS test –

[root@node01 ~]# yptest

————————————————————————–

.

.

Test 9: yp_all

gujo gujo:$1$sdwZyE3g$C0KQ5zLQB02F7G1NkCPy0.:500:501::/home/gujo:/bin/bash

test01 test01:$1$v4bmrjYa$OrF2IKtzWJTUiooXO/MD40:505:506::/home/test01:/bin/bash

alang alang:$1$/e6fwjFu$2/Hn6ujxAbBwj3LZKx/wn0:501:502::/home/alang:/bin/bash

user03 user03:$1$7S75akgI$qj5E2SmuvTeh46It2g39Q/:504:505::/home/user03:/bin/bash

user02 user02:$1$UyZf0tLw$oRAVYxO3bwz4NyjR8k1eb0:503:504::/home/user02:/bin/bash

user01 user01:$1$uemX02H0$wMhe1lzVqDL7XOc4AMm1u/:502:503::/home/user01:/bin/bash

————————————————————————-

[root@node01 ~]# ypcat passwd

————————————————————————-

gujo:$1$sdwZyE3g$C0KQ5zLQB02F7G1NkCPy0.:500:501::/home/gujo:/bin/bash

test01:$1$v4bmrjYa$OrF2IKtzWJTUiooXO/MD40:505:506::/home/test01:/bin/bash

alang:$1$/e6fwjFu$2/Hn6ujxAbBwj3LZKx/wn0:501:502::/home/alang:/bin/bash

user03:$1$7S75akgI$qj5E2SmuvTeh46It2g39Q/:504:505::/home/user03:/bin/bash

user02:$1$UyZf0tLw$oRAVYxO3bwz4NyjR8k1eb0:503:504::/home/user02:/bin/bash

user01:$1$uemX02H0$wMhe1lzVqDL7XOc4AMm1u/:502:503::/home/user01:/bin/bash

————————————————————————-
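The ypcat output above is in ordinary passwd format, so the accounts served by NIS can be listed with a one-line filter. A sketch, assuming regular accounts start at UID 500 as they do in the output above; `list_regular_users` is a hypothetical helper:

```shell
#!/bin/sh
# List login names with UID >= 500 from passwd-format lines on stdin.
# Assumption: regular accounts start at UID 500, as in the output above.
# list_regular_users is a hypothetical helper.
list_regular_users() {
    awk -F: '$3 >= 500 { print $1 }'
}
```

On a client, `ypcat passwd | list_regular_users` should print every cluster account once NIS is working.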

10. apache, ganglia installation and configuration

– apache: select the "Web Server" item during OS installation.

– ganglia installation

* ganglia server installation *

[root@node00 ganglia]# rpm -Uvh ganglia-clx-gmetad-3.0.1-1.x86_64.rpm

[root@node00 ganglia]# rpm -Uvh ganglia-clx-web-3.0.1-1.noarch.rpm

[root@node00 ganglia]# rpm -Uvh rrdtool-clx-1.2.10-1.x86_64.rpm

* ganglia client installation *

[root@node01 ganglia]# rpm -Uvh ganglia-clx-gmond-3.0.1-1.x86_64.rpm

* ganglia server configuration *

[root@node00 clx]# vi /etc/gmetad.conf

————————————————————————–

.

.

data_source "cluster.bj" localhost

.

.

————————————————————————–

[root@node00 clx]# vi /usr/clx/html/ganglia/conf.php

————————————————————————–

.

$gmetad_root = "/var/lib/ganglia";

$rrds = "$gmetad_root/rrds";

define("RRDTOOL", "/usr/bin/rrdtool");

.

—————————————————————————

[root@node00 clx]# /etc/rc.d/init.d/gmetad restart

* ganglia client configuration

[root@node00 clx]# vi /etc/gmond.conf

—————————————————————————

.

cluster {

  name = "bj hpc"

}

—————————————————————————-

[root@node00 clx]# ensync /etc/gmond.conf

;; synchronize the gmond.conf settings

[root@node00 clx]# /etc/rc.d/init.d/gmond restart

[root@node01 ~]# /etc/rc.d/init.d/gmond restart

11. dhcp, tftp, Pxe installation and configuration

######## Automated remote OS installation over the network using

DHCP + NFS + Tftp + Kickstart + PXE #####################

11.1. dhcp installation and configuration

# cd /data1/os/rh4_x64_up2/RedHat/RPMS

# rpm -Uvh dhcp-3.0.1-12_EL.x86_64.rpm

# rpm -Uvh dhcp-devel-3.0.1-12_EL.x86_64.rpm

# vi /etc/dhcpd.conf

————————————————————-

### DHCP Server configuration ###################################

#

#  Adjust the network information to match your environment

#

########################################################

ddns-update-style interim;

ignore client-updates;

default-lease-time 600;

max-lease-time 7200;

option subnet-mask 255.255.255.0;

option broadcast-address 192.168.1.255;

option routers 192.168.1.254;

#option domain-name-servers 192.168.1.254;

#option domain-name “cluster.bj”;

# Add the following when configuring PXE ########################

allow booting;

allow bootp;

class "pxeclients" {

    match if substring (option vendor-class-identifier, 0, 9) = "PXEClient";

    next-server 192.168.1.254;

    filename "linux-install/pxelinux.0";

}

#############################################################################

subnet 192.168.1.0 netmask 255.255.255.0 {

   range 192.168.1.100 192.168.1.253;

}

———————————————————————-

# /etc/rc.d/init.d/dhcpd restart

11.2. NFS configuration

The directory that holds the OS packages must be exported over NFS.

# vi /etc/exports

———————————————————————-

/data1/os/rh4_x64_up2                *(rw)

———————————————————————-

11.3. Tftp + PXE installation and configuration

* Package installation

# cd /data1/os/rh4_x64_up2/RedHat/RPMS/

# rpm -Uvh tftp-0.39-1.x86_64.rpm

# rpm -Uvh tftp-server-0.39-1.x86_64.rpm

# rpm -Uvh system-config-netboot-0.1.32-1_EL4.x86_64.rpm

* tftp configuration

# vi /etc/xinetd.d/tftp

————————————————————————-

service tftp

{

        disable = no    # disable = yes -> no

        socket_type             = dgram

        protocol                = udp

        wait                    = yes

        user                    = root

        server                  = /usr/sbin/in.tftpd

        server_args             = -s /tftpboot

        per_source              = 11

        cps                     = 100 2

        flags                   = IPv4

}

————————————————————————–

# /etc/rc.d/init.d/xinetd restart

* pxe configuration

# mkdir /tftpboot/linux-install/rhes4

# cd /data1/os/rh4_x64_up2/images/pxeboot/

# cp vmlinuz /tftpboot/linux-install/rhes4/

# cp initrd.img /tftpboot/linux-install/rhes4/

# vi /tftpboot/linux-install/pxelinux.cfg/default

—————————————————————————

default local

timeout 10

prompt 1

display msgs/boot.msg

F1 msgs/boot.msg

F2 msgs/general.msg

F3 msgs/expert.msg

F4 msgs/param.msg

F5 msgs/rescue.msg

F7 msgs/snake.msg

LABEL local

  localboot 1

LABEL node01

KERNEL rhes4/vmlinuz

APPEND initrd=rhes4/initrd.img ramdisk_size=10000 ks=nfs:192.168.1.254:/data1/os/rh4_x64_up2/ks1.cfg ksdevice=eth0

LABEL node02

KERNEL rhes4/vmlinuz

APPEND initrd=rhes4/initrd.img ramdisk_size=10000 ks=nfs:192.168.1.254:/data1/os/rh4_x64_up2/ks2.cfg ksdevice=eth0

———————————————————————————
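The per-node LABEL stanzas above differ only in the label name and the kickstart file, so further nodes can be added with a small generator. A sketch, assuming the same NFS server address and paths as above; `pxe_label` is a hypothetical helper:

```shell
#!/bin/sh
# Print a pxelinux LABEL stanza for one node.
# Arguments: label name, kickstart file name.
# Assumption: the NFS server and paths are the ones used above.
# pxe_label is a hypothetical helper.
pxe_label() {
    label=$1
    ksfile=$2
    printf 'LABEL %s\n' "$label"
    printf 'KERNEL rhes4/vmlinuz\n'
    printf 'APPEND initrd=rhes4/initrd.img ramdisk_size=10000 ks=nfs:192.168.1.254:/data1/os/rh4_x64_up2/%s ksdevice=eth0\n' "$ksfile"
}
```

For example, the output of `pxe_label node03 ks3.cfg` can be appended to /tftpboot/linux-install/pxelinux.cfg/default after checking it.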

* kickstart configuration

# vi /data1/os/rh4_x64_up2/ks1.cfg

———————————————————————————

# Automated kickstart installation

install

# Fetch the installation image over NFS

nfs --server=192.168.1.254  --dir=/data1/os/rh4_x64_up2

# Language

lang en_US.UTF-8

# Supported languages

langsupport --default=ko_KR.UTF-8 en_US.UTF-8 ko_KR.UTF-8

# Keyboard

keyboard us

# Network settings; edit these values for each node.

network --device eth0 --bootproto static --ip 192.168.1.1 --netmask 255.255.255.0 --gateway 192.168.1.254  --hostname node01.cluster.bj

# root password ( do not modify )

rootpw --iscrypted $1$P.9z.LGA$MOrwcO86rCh2IOt71tqIq1

# Firewall settings

firewall --disabled

# Security settings

selinux --disabled

authconfig --enableshadow --enablemd5

# Time zone

timezone Asia/Seoul

# Installation mode (to install in graphical mode, delete the "text" line below)

text

skipx

bootloader --location=mbr --append="rhgb quiet"

# Partitioning

clearpart --all

part /boot --fstype ext3 --size=200

part /usr --fstype ext3 --size=10000

part swap --size=4000

part / --fstype ext3 --size=3000 --asprimary

part /var --fstype ext3 --size=2000

# Package selection

%packages

@ compat-arch-development

@ engineering-and-scientific

@ admin-tools

@ editors

@ emacs

@ system-tools

@ korean-support

@ gnome-software-development

@ text-internet

@ x-software-development

@ legacy-network-server

@ dns-server

@ gnome-desktop

@ dialup

@ ftp-server

@ compat-arch-support

@ legacy-software-development

@ smb-server

@ base-x

@ server-cfg

@ sound-and-video

@ development-tools

@ graphical-internet

-evolution

-rusers

tetex-xdvi

e2fsprogs

pvm

sysstat

gftp

-rwho

rsh-server

vnc

iptraf

-evolution-webcal

kernel-devel

kernel-smp-devel

mc

nmap-frontend

thunderbird

%post

—————————————————————————————-
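The PXE menu refers to a separate kickstart file per node (ks1.cfg, ks2.cfg), and the only node-specific values in the file above are the IP address and the hostname on the network line. A sketch of deriving the next file from ks1.cfg; `ks_for_node` is a hypothetical helper and assumes the node01 values shown above:

```shell
#!/bin/sh
# Derive a kickstart file for another node from ks1.cfg (read on stdin)
# by rewriting the node01 IP address and hostname on the network line.
# Arguments: new IP address, new short hostname.
# ks_for_node is a hypothetical helper; only the network line is changed.
ks_for_node() {
    new_ip=$1
    new_host=$2
    sed -e "/^network /s/ip 192\.168\.1\.1 /ip $new_ip /" \
        -e "/^network /s/hostname node01\.cluster\.bj/hostname $new_host.cluster.bj/"
}
```

For example, `ks_for_node 192.168.1.2 node02 < ks1.cfg > ks2.cfg` writes the file used by the node02 label; compare the two files with diff before booting a node from it.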

Now, when you reboot the node01 server ..

       Welcome to Gujo Cluster Auto Installer!

                          .-=-.          .--.

              __        .'     '.       /  " )

      _     .'  '.     /   .-.   \     /  .-'

     ( \   / .-.  \   /   /   \   \   /  /

      \ `-` /   \  `-'   /     \   `-`  /

       `-.-`     '.____.'       `.____.'

Input Hostname of the Operation System you wish to install:

*** Example > boot : node01

A screen like the one above will appear.

Type "node01" at the boot : prompt and the operating system is installed on node01 automatically.

12. hpc benchmark tool installation and configuration

[root@node00 ~]# cd /home/gujo/

[root@node00 gujo]# tar xzvf hpl.tgz

[root@node00 gujo]# cd hpl

[root@node00 hpl]# cp setup/Make.Linux_ATHLON_CBLAS Make.Linux_PIV

[root@node00 hpl]# vi Make.Linux_PIV

—————————————————————————

.

.

ARCH         = Linux_PIV    # enter the arch name here (the XXXXXXX part of Make.XXXXXXX)

.

.

TOPdir       = /home/gujo/hpl

.

MPdir        = /usr/local/mpich-gcc

.

LAdir        = /usr/local/ATLAS/lib/Linux_P4E64SSE3_2

.

CC           = /usr/local/mpich-gcc/bin/mpicc

CCNOOPT      = $(HPL_DEFS)

CCFLAGS      = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall

#

LINKER       = /usr/local/mpich-gcc/bin/mpicc

LINKFLAGS    = $(CCFLAGS)

—————————————————————————-

[root@node00 hpl]# vi Makefile

—————————————————————————-

.

arch             = Linux_PIV

.

—————————————————————————-

[root@node00 hpl]# vi Make.top

—————————————————————————-

.

arch                = Linux_PIV

.

—————————————————————————-

** Compiling xhpl **

[root@node00 hpl]# make build arch=Linux_PIV

[root@node00 hpl]# cd bin/Linux_PIV

[root@node00 Linux_PIV]# ls xhpl

xhpl

** Editing HPL.dat **

[root@node00 Linux_PIV]# vi HPL.dat

—————————————————————————-

HPLinpack benchmark input file

Innovative Computing Laboratory, University of Tennessee

HPL.out      output file name (if any)

6            device out (6=stdout,7=stderr,file)

3            # of problems sizes (N)

10000 15000 16500            Ns

3            # of NBs

90 100 128         NBs

0            PMAP process mapping (0=Row-,1=Column-major)

1            # of process grids (P x Q)

1          Ps

3          Qs

16.0         threshold

1            # of panel fact

1         PFACTs (0=left, 1=Crout, 2=Right)

1            # of recursive stopping criterium

4          NBMINs (>= 1)

1            # of panels in recursion

2            NDIVs

3            # of recursive panel fact.

0 1 2     RFACTs (0=left, 1=Crout, 2=Right)

1            # of broadcast

0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)

1            # of lookahead depth

0            DEPTHs (>=0)

2            SWAP (0=bin-exch,1=long,2=mix)

64           swapping threshold

0            L1 in (0=transposed,1=no-transposed) form

0            U  in (0=transposed,1=no-transposed) form

1            Equilibration (0=no,1=yes)

8            memory alignment in double (> 0)

—————————————————————————-

;; The HPL.dat above is set up for 3 processors. As servers are added, HPL.dat

must be edited to match the new processor count.

Ps x Qs = total number of processors

Ns x Ns x 8 = memory (in bytes) needed to solve the problem
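The two rules of thumb above are easy to check before starting a long run. A minimal sketch in plain shell arithmetic; `hpl_matrix_bytes` and `hpl_grid_ok` are hypothetical helper names:

```shell
#!/bin/sh
# Memory needed for an N x N matrix of 8-byte doubles, following the
# rule above (Ns x Ns x 8). The result is in bytes.
hpl_matrix_bytes() {
    n=$1
    echo $(( n * n * 8 ))
}

# Succeeds when the process grid matches the mpirun process count (Ps x Qs = np).
hpl_grid_ok() {
    p=$1
    q=$2
    np=$3
    [ $(( p * q )) -eq "$np" ]
}
```

For the largest problem size in the HPL.dat above, `hpl_matrix_bytes 16500` gives 2178000000 bytes (roughly 2 GB spread over all nodes), and `hpl_grid_ok 1 3 3` succeeds for the 1 x 3 grid run with `mpirun -np 3`.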

[root@node00 Linux_PIV]# mpirun -np 3 xhpl

—————————————————————————–

.

—————————————————————————-

WR00L2C4       16500   128     1     3             539.04          5.556e+00

—————————————————————————-

||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0329561 …… PASSED

||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0227421 …… PASSED

||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0046330 …… PASSED

============================================================================

T/V                N    NB     P     Q               Time             Gflops

—————————————————————————-

WR00L2C4       16500   128     1     3             524.76          5.708e+00

—————————————————————————-

||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0329561 …… PASSED

||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0227421 …… PASSED

||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0046330 …… PASSED

============================================================================

Finished     27 tests with the following results:

             27 tests completed and passed residual checks,

              0 tests completed and failed residual checks,

              0 tests skipped because of illegal input values.

—————————————————————————-

In the benchmark above, the maximum performance of the bj cluster system was measured at 5.7 Gflops.
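With 27 test results in the output, picking out the best figure by eye is tedious. A sketch that scans HPL result lines for the highest Gflops value; it assumes result lines begin with the "WR" test tag and carry Gflops in the last column, as in the output above, and `hpl_max_gflops` is a hypothetical helper:

```shell
#!/bin/sh
# Print the highest Gflops value among HPL result lines read from stdin.
# Result lines start with the test tag (e.g. WR00L2C4) and carry
# Gflops in the last column. Prints 0.000 when no result line is found.
hpl_max_gflops() {
    awk '/^WR/ { v = $NF + 0; if (v > max) max = v }
         END   { printf "%.3f\n", max }'
}
```

Since HPL.dat sends the report to stdout, the run can be piped straight in: `mpirun -np 3 xhpl | hpl_max_gflops`.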

Seo Jinwoo

Executive Director (CTO) at Clunix, a supercomputing company / Information Systems Auditor / operator of the Syszone blog
