[Cluster] Installing Sun Grid Engine (SGE) 5.3

Installing SGE

1.1. Common steps

1) Create the admin account

Add an account named sge on every node of the cluster.

# adduser sge

# passwd sge

2) Add the service port

On every node of the cluster, assign a port for sge_commd in /etc/services. Use a port below 1024 if possible.

All nodes must use the same port number.

# vi /etc/services



sge_commd 536/tcp # Sun Grid Engine

:wq
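Because every node must register the same port, it can help to read the entry back after editing. The following is a minimal sketch; the function name commd_port is hypothetical and not part of SGE, and it assumes the entry follows the "name port/protocol" layout shown above.

```shell
#!/bin/sh
# Print the port registered for sge_commd in a services file.
# commd_port is a hypothetical helper name, not an SGE command.
commd_port() {
    # $1: services file to inspect (defaults to /etc/services)
    awk '$1 == "sge_commd" { split($2, a, "/"); print a[1]; exit }' \
        "${1:-/etc/services}"
}
```

Running commd_port on each node should print the same number everywhere (536 in this guide). Empty output means the entry is missing.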

1.2. Installing on the master node

Caution>>

SGE must be installed on a file system that is NFS-mounted and shared by all nodes.

Clusters usually share the /home partition, so this guide installs SGE in /home/sge/SGE.

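1) Download and unpack

The installation files can be downloaded from either of these sites:

http://wwws.sun.com/software/gridware/

http://gridengine.sunsource.net/

For x86 Linux with kernel 2.4 and glibc 2.1 or later, download sge-5.3p2-common.tar.gz and sge-5.3p2-bin-glinux.tar.gz.

Log in as the sge account and unpack the archives. The tar commands assume that both archives have been copied into /home/sge/SGE.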

# su sge

$ mkdir /home/sge/SGE

$ cd /home/sge/SGE

$ tar zxvf sge-5.3p2-common.tar.gz

$ tar zxvf sge-5.3p2-bin-glinux.tar.gz

$ exit

2) Run install_qmaster

Set the SGE_ROOT environment variable.

# export SGE_ROOT=/home/sge/SGE

SGE is installed with its installer scripts: run install_qmaster on the master node and install_execd on the compute nodes.

Run install_qmaster as root from the $SGE_ROOT directory.

# ./install_qmaster

Welcome to the Grid Engine installation

—————————————

Grid Engine qmaster host installation

————————————-

Before you continue with the installation please read these hints:

– Your terminal window should have a size of at least

80×24 characters

– The INTR character is often bound to the key Ctrl-C.

The term >Ctrl-C< is used during the installation if you

have the possibility to abort the installation

The qmaster installation procedure will take approximately 5-10 minutes.

Hit <RETURN> to continue >> Enter

Confirm Grid Engine default installation settings

————————————————-

The following default settings can be used for an accelerated

installation procedure:

$SGE_ROOT = /home/sge/SGE

service = sge_commd

admin user account = sge

Do you want to use these configuration parameters (y/n) [y] >> Enter

Verifying and setting file permissions

————————————–

Did you install this version with >pkgadd< or did you already

verify and set the file permissions of your distribution (y/n) [y] >> Enter

We do not verify file permissions. Hit <RETURN> to continue >>

Verifying and setting file permissions and owner in >3rd_party<

Verifying and setting file permissions and owner in >bin<

Verifying and setting file permissions and owner in >ckpt<

Verifying and setting file permissions and owner in >examples<

Verifying and setting file permissions and owner in >install_execd<

Verifying and setting file permissions and owner in >install_qmaster<

Verifying and setting file permissions and owner in >mpi<

Verifying and setting file permissions and owner in >pvm<

Verifying and setting file permissions and owner in >qmon<

Verifying and setting file permissions and owner in >util<

Verifying and setting file permissions and owner in >utilbin<

Verifying and setting file permissions and owner in >catman<

Verifying and setting file permissions and owner in >doc<

Verifying and setting file permissions and owner in >man<

Verifying and setting file permissions and owner in >inst_sge<

Verifying and setting file permissions and owner in >inst_sgeee<

Verifying and setting file permissions and owner in >bin<

Verifying and setting file permissions and owner in >lib<

Verifying and setting file permissions and owner in >utilbin<

Your file permissions were set

Hit <RETURN> to continue >> Enter

Making directories

——————

creating directory: default

creating directory: default/common

creating directory: default/common/history

creating directory: default/common/local_conf

creating directory: /home/sge/SGE/default/spool/qmaster

creating directory: /home/sge/SGE/default/spool/qmaster/admin_hosts

creating directory: /home/sge/SGE/default/spool/qmaster/ckpt

creating directory: /home/sge/SGE/default/spool/qmaster/complexes

creating directory: /home/sge/SGE/default/spool/qmaster/exec_hosts

creating directory: /home/sge/SGE/default/spool/qmaster/job_scripts

creating directory: /home/sge/SGE/default/spool/qmaster/jobs

creating directory: /home/sge/SGE/default/spool/qmaster/pe

creating directory: /home/sge/SGE/default/spool/qmaster/queues

creating directory: /home/sge/SGE/default/spool/qmaster/submit_hosts

creating directory: /home/sge/SGE/default/spool/qmaster/usersets

Hit <RETURN> to continue >> Enter

Select default Grid Engine hostname resolving method

—————————————————-

Are all hosts of your cluster in one DNS domain? If this is

the case the hostnames

>hostA< and >hostA.foo.com<

would be treated as equal, because the DNS domain name >foo.com<

is ignored when comparing hostnames.

Are all hosts of your cluster in a single DNS domain (y/n) [y] >> Enter

Ignoring domainname when comparing hostnames.

Hit <RETURN> to continue >> Enter

Grid Engine group id range

————————–

When jobs are started under the control of Grid Engine an additional group id

is set on platforms which do not support jobs.

This additional UNIX group id range must be unused group id’s in your system.

The range must be big enough to provide enough numbers for the maximum number

of Grid Engine jobs running at a single moment on a single host. E.g. a range

like >20000-20100< means, that Grid Engine will use the group id’s from

20000-20100 and thus provides a range for 101 jobs running at the same time

on a single host.

You can change at any time the group id range in your cluster configuration.

Please enter a range >> 20000-20100

Using >20000-20100< as gid range. Hit <RETURN> to continue >> Enter

Creating local configuration

—————————-

Creating >act_qmaster< file

Adding default complexes >host< and >queue<

Adding default parallel environment (PE) for >qmake<

Adding >sge_aliases< path aliases file

Adding >qtask< qtcsh sample default request file

Adding >sge_request< default submit options file

Creating settings files for >.profile/.cshrc<

Hit <RETURN> to continue >> Enter

Grid Engine startup script

————————–

Your Grid Engine cluster wide startup script is installed as:

>/home/sge/SGE/default/common/rcsge<

Hit <RETURN> to continue >> Enter

Grid Engine startup script

————————–

We can install the startup script that

Grid Engine is started at machine boot (y/n) [y] >> Enter

Installing startup script /etc/rc.d/rc3.d/S95rcsge

Hit <RETURN> to continue >> Enter

Grid Engine qmaster and scheduler startup

—————————————–

Starting qmaster and scheduler daemon. Please wait …

starting sge_qmaster

starting program: /home/sge/SGE/bin/glinux/sge_commd

using service “sge_commd”

bound to port 536

Reading in complexes:

Complex “host”.

Complex “queue”.

Reading in parallel environments:

PE “make”.

Reading in scheduler configuration

starting sge_schedd

Hit <RETURN> to continue >> Enter

Adding Grid Engine hosts

————————

Please now add the list of hosts, where you will later install your execution

daemons. These hosts will be also added as valid submit hosts.

Please enter a blank separated list of your execution hosts. You may

press <RETURN> if the line is getting too long. Once you are finished

simply press <RETURN> without entering a name.

You also may prepare a file with the hostnames of the machines where you plan

to install Grid Engine. This may be convenient if you are installing Grid

Engine on many hosts.

Do you want to use a file which contains the list of hosts (y/n) [n] >> Enter

Adding admin and submit hosts

—————————–

Please enter a blank seperated list of hosts.

Stop by entering <RETURN>. You may repeat this step until you are

entering an empty list. You will see messages from Grid Engine

when the hosts are added.

Host(s): sdd114

adminhost “sdd114.foo.bar.com” already exists

sdd114.foo.bar.com added to submit host list

Hit <RETURN> to continue >> Enter

Host(s): sdd105

sdd105.foo.bar.com added to administrative host list

sdd105.foo.bar.com added to submit host list

Hit <RETURN> to continue >> Enter

Host(s): sdd106

sdd106.foo.bar.com added to administrative host list

sdd106.foo.bar.com added to submit host list

Hit <RETURN> to continue >> Enter

Host(s): sdd107

sdd107.foo.bar.com added to administrative host list

sdd107.foo.bar.com added to submit host list

Hit <RETURN> to continue >> Enter

Host(s): Enter

Finished adding hosts. Hit <RETURN> to continue >> Enter

Using Grid Engine

—————–

You should now enter the command:

source /home/sge/SGE/default/common/settings.csh

if you are a csh/tcsh user or

# . /home/sge/SGE/default/common/settings.sh

if you are a sh/ksh user.

This will set or expand the following environment variables:

– $SGE_ROOT (always necessary)

– $SGE_CELL (if you are using a cell other than >default<)

– $COMMD_PORT (if you haven’t added the service >sge_commd<)

– $PATH/$path (to find the Grid Engine binaries)

– $MANPATH (to access the manual pages)

Hit <RETURN> to see where Grid Engine logs messages >> Enter

Grid Engine messages

——————–

Grid Engine messages can be found at:

/tmp/qmaster_messages (during qmaster startup)

/tmp/execd_messages (during execution daemon startup)

After startup the daemons log thier messages in their spool directories.

Qmaster: /home/sge/SGE/default/spool/qmaster/messages

Exec daemon: //messages

Do you want to see previous screen about using Grid Engine again (y/n) [n] >> Enter

Your Grid Engine qmaster installation is now completed

——————————————————

Please now login to all hosts where you want to run an execution daemon

and start the execution host installation procedure.

If you want to run an execution daemon on this host, please do not forget

to make the execution host installation in this host as well.

All execution hosts must be administrative hosts during the installation.

All hosts which you added to the list of administrative hosts during this

installation procedure can now be installed.

You may verify your administrative hosts with the command

# qconf -sh

and you may add new administrative hosts with the command

# qconf -ah
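The group id range prompt in the transcript above states that 20000-20100 allows 101 jobs to run at once on a single host. As a small sketch of that arithmetic (gid_range_size is a hypothetical helper name, and the input is assumed to be in the form first-last):

```shell
#!/bin/sh
# Count how many group ids a range such as 20000-20100 provides.
# gid_range_size is a hypothetical helper name, not an SGE command.
gid_range_size() {
    first=${1%-*}   # text before the dash
    last=${1#*-}    # text after the dash
    echo $(( $last - $first + 1 ))
}
```

For example, gid_range_size 20000-20100 prints 101, which matches the installer message.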

Check the running daemons

# ps -aux --cols=120 | grep sge

root 22398 0.0 0.0 1708 848 ? S 15:26 0:00 /home/sge/SGE/bin/glinux/sge_commd

sge 22402 0.0 0.1 3192 1756 ? S 15:27 0:00 /home/sge/SGE/bin/glinux/sge_qmaster

sge 22406 0.0 0.1 2716 1372 ? S 15:27 0:00 /home/sge/SGE/bin/glinux/sge_schedd

sge_commd runs as root, while sge_qmaster and sge_schedd run as the sge user.
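The same check can be scripted so that a missing daemon stands out. This is only a sketch: missing_daemons is a hypothetical helper that reads a process listing on standard input and assumes each daemon appears as the last component of its command path, as in the ps output above.

```shell
#!/bin/sh
# Report which expected daemons are absent from a process listing.
# missing_daemons is a hypothetical helper: it reads `ps` output on stdin
# and takes the daemon names to look for as arguments.
missing_daemons() {
    listing=$(cat)
    for name in "$@"; do
        # Match the daemon name as the final component of a command path.
        printf '%s\n' "$listing" | grep -Eq "/$name( |\$)" || echo "$name"
    done
}
```

On the master node, `ps -e -o args | missing_daemons sge_commd sge_qmaster sge_schedd` should print nothing when all three daemons are running. On a compute node the names to pass would be sge_commd and sge_execd.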

1.3. Installing on compute nodes

Repeat the following steps identically on every compute node.

# rlogin sdd105

# cd /home/sge/SGE

Set the SGE_ROOT environment variable.

# export SGE_ROOT=/home/sge/SGE

Check that the port number has been added to /etc/services.

# vi /etc/services



sge_commd 536/tcp # Sun Grid Engine

:wq

Run install_execd.

# ./install_execd

Welcome to the Grid Engine execution host installation

——————————————————

If you haven’t installed the Grid Engine qmaster host yet, you must execute

this step (with >install_qmaster<) prior the execution host installation.

For a sucessfull installation you need a running Grid Engine qmaster. It is

also neccesary that this host is an administrative host.

You can verify your current list of administrative hosts with

the command:

# qconf -sh

You can add an administrative host with the command:

# qconf -ah

The execution host installation will take approximately 5 minutes.

Hit <RETURN> to continue >> Enter

Grid Engine admin user account

——————————

The current directory

/home/sge/SGE

is owned by user

sge

If user >root< does not have write permissions in this directory on *all*

of the machines where Grid Engine will be installed (NFS partitions not

exported for user >root< with read/write permissions) it is recommended to

install Grid Engine that all spool files will be created under the user id

of user >sge<.

IMPORTANT NOTE: The daemons still have to be started by user >root<.

Do you want to install Grid Engine as admin user >sge< (y/n) [y] >> Enter

Installing Grid Engine as admin user >sge<

Hit <RETURN> to continue >> Enter

Checking $SGE_ROOT directory

—————————-

Your $SGE_ROOT directory: /home/sge/SGE

Hit <RETURN> to continue >> Enter

Grid Engine cells

—————–

Please enter cell name which you used for the qmaster

installation or press <RETURN> to use default cell >default< >> Enter

Using cell: >default<

Hit <RETURN> to continue >> Enter

Confirm Grid Engine default installation settings

————————————————-

The following default settings can be used for an accelerated

installation procedure:

$SGE_ROOT = /home/sge/SGE

service = sge_commd

admin user account = sge

Do you want to use these configuration parameters (y/n) [y] >> Enter

Creating local configuration

—————————-

Creating local configuration for host >sdd105.foo.bar.com<

root@sdd105.foo.bar.com modified “sdd105.foo.bar.com” in configuration list

Local configuration for host >sdd105.foo.bar.com< created.

Hit <RETURN> to continue >> Enter

Grid Engine startup script

————————–

We can install the startup script that

Grid Engine is started at machine boot (y/n) [y] >> Enter

Installing startup script /etc/rc.d/rc3.d/S95rcsge

Hit <RETURN> to continue >> Enter

Grid Engine execution daemon startup

————————————

Starting execution daemon daemon. Please wait …

starting sge_execd

starting program: /home/sge/SGE/bin/glinux/sge_commd

using service “sge_commd”

bound to port 536

Hit <RETURN> to continue >> Enter

Adding a default Grid Engine queue for this host

————————————————

We can now add a sample queue for this host with following attributes:

– the queue has the name >sdd105.q<

– the queue provides 1 slot(s) for jobs

– the queue provides access for any user with an account on this machine

– the queue has no Unix resource limits

You do not need to add a queue now, but before running jobs on this host

need to add a queue with >qconf< or the GUI >qmon<.

Do you want to add a default queue for this host (y/n) [y] >> Enter

root@sdd105.foo.bar.com added “sdd105.q” to queue list

Hit <RETURN> to continue >> Enter

Using Grid Engine

—————–

You should now enter the command:

source /home/sge/SGE/default/common/settings.csh

if you are a csh/tcsh user or

# . /home/sge/SGE/default/common/settings.sh

if you are a sh/ksh user.

This will set or expand the following environment variables:

– $SGE_ROOT (always necessary)

– $SGE_CELL (if you are using a cell other than >default<)

– $COMMD_PORT (if you haven’t added the service >sge_commd<)

– $PATH/$path (to find the Grid Engine binaries)

– $MANPATH (to access the manual pages)

Hit <RETURN> to see where Grid Engine logs messages >> Enter

Grid Engine messages

——————–

Grid Engine messages can be found at:

/tmp/qmaster_messages (during qmaster startup)

/tmp/execd_messages (during execution daemon startup)

After startup the daemons log thier messages in their spool directories.

Qmaster: /home/sge/SGE/default/spool/qmaster/messages

Exec daemon: //messages

Do you want to see previous screen about using Grid Engine again (y/n) [n] >> Enter

Your execution daemon installation is now completed.

Check the running daemons

# ps -aux --cols=120 | grep sge

root 10531 0.0 0.0 1688 824 ? S 15:28 0:00 /home/sge/SGE/bin/glinux/sge_commd

sge 10533 0.0 0.1 2728 1376 ? S< 15:28 0:00 /home/sge/SGE/bin/glinux/sge_execd

sge_commd runs as root, while sge_execd runs as the sge user.
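Since the same steps are repeated on each compute node, non-interactive checks (such as confirming the /etc/services entry) can be looped over a list of hosts before install_execd is run by hand. The sketch below makes several assumptions: on_each_node and the REMOTE_SH variable are hypothetical names, the host list is a plain file with one host name per line, and password-less rsh (or whatever REMOTE_SH names) works between the nodes.

```shell
#!/bin/sh
# Run one command on every node listed in a file, stopping at the first failure.
# on_each_node and REMOTE_SH are hypothetical names; REMOTE_SH defaults to rsh.
on_each_node() {
    nodes_file=$1
    shift
    while read -r node; do
        [ -n "$node" ] || continue            # skip empty lines
        # </dev/null keeps the remote shell from consuming the host list.
        ${REMOTE_SH:-rsh} "$node" "$@" </dev/null || return 1
    done < "$nodes_file"
}
```

For example, `on_each_node nodes.txt grep sge_commd /etc/services` would show the port entry on each host. install_execd itself asks questions interactively, so it is still run by logging in to each node.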

1.4. Environment setup

You can either set the environment variables yourself or source settings.sh.

Note>> csh users must use settings.csh instead.

settings.sh sets the following variables.

– $SGE_ROOT (always necessary)

– $SGE_CELL (if you are using a cell other than >default<)

– $COMMD_PORT (if you haven’t added the service >sge_commd<)

– $PATH/$path (to find the Grid Engine binaries)

– $MANPATH (to access the manual pages)

$ vi ~/.bashrc



# SGE

export SGE_ROOT=/home/sge/SGE

export PATH=/home/sge/SGE/bin/glinux:$PATH

export MANPATH=`man --path`:/home/sge/SGE/man

# or

# . /home/sge/SGE/default/common/settings.sh

:wq
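After opening a new shell, it is worth confirming that the SGE binary directory really ended up on PATH. A minimal sketch follows; in_path is a hypothetical helper name and it only inspects the current value of $PATH.

```shell
#!/bin/sh
# Succeed if a directory is one of the colon-separated entries of $PATH.
# in_path is a hypothetical helper name, not an SGE command.
in_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

A possible use is `in_path "$SGE_ROOT/bin/glinux" || echo "SGE binaries are not on PATH"`.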

1.5. Verifying the installation

Use qconf, the queue configuration command, to verify the installation.

Note>> man qconf

$ qconf -sel # execution host list

sdd105.foo.bar.com

sdd106.foo.bar.com

sdd107.foo.bar.com

$ qconf -se sdd105 # execution host definition

hostname sdd105.foo.bar.com

load_scaling NONE

complex_list NONE

complex_values NONE

load_values load_avg=0.000000,load_short=0.000000,load_medium=0.000000,load_long=0.000000,arch=glinux,num_proc=1,mem_free=915.554688M,swap_free=1004.023438M,virtual_free=1919.578125M,mem_total=1004.511719M,swap_total=1004.023438M,virtual_total=2008.535156M,mem_used=88.957031M,swap_used=0.000000M,virtual_used=88.957031M,cpu=0.000000,np_load_avg=0.000000,np_load_short=0.000000,np_load_medium=0.000000,np_load_long=0.000000

processors 1

user_lists NONE

xuser_lists NONE

$ qconf -secl # event client list

ID NAME HOST

————————————————–

1 scheduler sdd114.foo.bar.com

$ qconf -sep

# the number of licenced processors per execution host and in total

HOST PROCESSOR ARCH

===============================================

sdd105.foo.bar.com 1 glinux

sdd106.foo.bar.com 1 glinux

sdd107.foo.bar.com 1 glinux

===============================================

SUM 3

$ qconf -sh # administrative host

sdd114.foo.bar.com

sdd105.foo.bar.com

sdd106.foo.bar.com

sdd107.foo.bar.com

$ qconf -ss # submit host list

sdd114.foo.bar.com

sdd105.foo.bar.com

sdd106.foo.bar.com

sdd107.foo.bar.com

$ qconf -sm # managers list

sge

root

$ qconf -so # operator list

root

$ qconf -sql # list of all currently defined queues

sdd105.q

sdd106.q

sdd107.q
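The host list and the queue list can also be cross-checked. The sketch below assumes every execution host accepted the installer's sample queue, which is named after the short host name plus ".q" (for example sdd105.q). hosts_without_queue is a hypothetical helper that works on saved output of qconf -sel and qconf -sql.

```shell
#!/bin/sh
# List execution hosts that have no "<short hostname>.q" queue.
# hosts_without_queue is a hypothetical helper name.
#   $1: file holding the output of `qconf -sel`
#   $2: file holding the output of `qconf -sql`
hosts_without_queue() {
    while read -r host; do
        [ -n "$host" ] || continue
        short=${host%%.*}                      # sdd105.foo.bar.com -> sdd105
        grep -qxF "$short.q" "$2" || echo "$host"
    done < "$1"
}
```

With the output shown above saved to two files, the function prints nothing, because each of the three hosts has its own queue.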

1.6. Restarting the daemons

To restart the SGE daemons, use the rcsge script that was placed in /etc/rc.d/init.d/ during installation.

The procedure is the same on the master node and on the compute nodes.

# /etc/rc.d/init.d/rcsge stop

Shutting down Grid Engine execution daemon

Shutting down Grid Engine communication daemon

# /etc/rc.d/init.d/rcsge start

starting sge_execd

starting program: /home/sge/SGE/bin/glinux/sge_commd

using service “sge_commd”

bound to port 536
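The stop and start calls above can be wrapped so that start only runs when stop succeeded. This is a sketch under assumptions: restart_sge is a hypothetical wrapper name, and the default script path is the one used in this section.

```shell
#!/bin/sh
# Stop and then start the SGE daemons through the rc script.
# restart_sge is a hypothetical wrapper; $1 optionally overrides the script path.
restart_sge() {
    rc=${1:-/etc/rc.d/init.d/rcsge}
    [ -x "$rc" ] || { echo "rc script not found: $rc" >&2; return 1; }
    "$rc" stop && "$rc" start
}
```

Called without arguments as root, restart_sge runs the same two commands shown above.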

1.7. A simple job example

Let's run a simple example from the examples directory.

$ cd /home/sge/SGE/examples/jobs/

$ cat simple.sh

#!/bin/sh



# request Bourne shell as shell for job

#$ -S /bin/sh

# print date and time

date

# Sleep for 20 seconds

sleep 20

# print date and time again

date

$ qsub simple.sh

your job 1 (“simple.sh”) has been submitted

$ qstat

job-ID prior name user state submit/start at queue master ja-task-ID

---------------------------------------------------------------------------------------------

1 0 simple.sh sge t 10/18/2002 17:25:12 sdd105.q MASTER

Note>> The job state is one of d(eletion), t(ransferring), r(unning), R(estarted), s(uspended), S(uspended), T(hreshold), w(aiting), or h(old).
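For scripts that read the state column of qstat, the letters in the note above can be mapped to words. This is a minimal sketch: job_state_name is a hypothetical helper, and it handles a single state letter only.

```shell
#!/bin/sh
# Translate one qstat state letter into the word it stands for.
# job_state_name is a hypothetical helper; it handles one letter at a time.
job_state_name() {
    case $1 in
        d)   echo deletion ;;
        t)   echo transferring ;;
        r)   echo running ;;
        R)   echo restarted ;;
        s|S) echo suspended ;;
        T)   echo threshold ;;
        w)   echo waiting ;;
        h)   echo hold ;;
        *)   echo unknown; return 1 ;;
    esac
}
```

For the job listed above, job_state_name t prints transferring.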

Output files are created in the home directory.

Note>> To set the location of the output files, add these directives to the job script:

#$ -e path/to/err.file

#$ -o path/to/out.file

$ cd ~/

$ ls -la simple*

-rw-r--r-- 1 sge sge 0 Oct 18 17:25 simple.sh.e1

-rw-r--r-- 1 sge sge 58 Oct 18 17:25 simple.sh.o1

$ cat simple.sh.o1

Fri Oct 18 17:25:10 KST 2002

Fri Oct 18 17:25:30 KST 2002
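The default file names seen here follow the pattern job name, then ".o" or ".e", then the job id. As a sketch of that pattern (job_output_files is a hypothetical helper, and it assumes no -o or -e directive was given so the files land in the home directory):

```shell
#!/bin/sh
# Print the default stdout and stderr file paths for a job.
# job_output_files is a hypothetical helper name.
#   $1: job name, $2: job id, $3: directory (defaults to $HOME)
job_output_files() {
    dir=${3:-$HOME}
    echo "$dir/$1.o$2"
    echo "$dir/$1.e$2"
}
```

For job 1 submitted as simple.sh by the sge user, job_output_files simple.sh 1 /home/sge prints /home/sge/simple.sh.o1 and /home/sge/simple.sh.e1.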

The SGE-related environment variables can be seen by running the following job.

See man qsub for a detailed description.

$ cat env.sh

#!/bin/sh

hostname

pwd

echo PID=$$

echo SGE_O_HOME=$SGE_O_HOME

echo SGE_O_HOST=$SGE_O_HOST

echo SGE_O_LOGNAME=$SGE_O_LOGNAME

echo SGE_O_MAIL=$SGE_O_MAIL

echo SGE_O_PATH=$SGE_O_PATH

echo SGE_O_SHELL=$SGE_O_SHELL

echo SGE_O_WORKDIR=$SGE_O_WORKDIR

echo ARC=$ARC

echo SGE_STDERR_PATH=$SGE_STDERR_PATH

echo SGE_STDOUT_PATH=$SGE_STDOUT_PATH

echo SGE_JOB_SPOOL_DIR=$SGE_JOB_SPOOL_DIR

echo SGE_TASK_ID=$SGE_TASK_ID

echo ENVIRONMENT=$ENVIRONMENT

echo HOME=$HOME

echo HOSTNAME=$HOSTNAME

echo JOB_ID=$JOB_ID

echo JOB_NAME=$JOB_NAME

echo LOGNAME=$LOGNAME

echo NHOSTS=$NHOSTS

echo NQUEUES=$NQUEUES

echo NSLOTS=$NSLOTS

echo PATH=$PATH

echo QUEUE=$QUEUE

echo REQUEST=$REQUEST

echo RESTARTED=$RESTARTED

echo SHELL=$SHELL

echo TMPDIR=$TMPDIR

echo TMP=$TMP

echo USER=$USER

$ qsub env.sh

your job 1 (“env.sh”) has been submitted

$ cat ~/env.sh.o1

sdd107.foo.bar.com

/home/sangwan

PID=10346

SGE_O_HOME=/home/sangwan

SGE_O_HOST=sdd114

SGE_O_LOGNAME=sangwan

SGE_O_MAIL=/var/spool/mail/sangwan

SGE_O_PATH=/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/home/sge/SGE/bin/glinux

SGE_O_SHELL=/bin/bash

SGE_O_WORKDIR=/home/sangwan/sge

ARC=glinux

SGE_STDERR_PATH=/home/sangwan/env.sh.e1

SGE_STDOUT_PATH=/home/sangwan/env.sh.o1

SGE_JOB_SPOOL_DIR=/home/sge/SGE/default/spool/sdd107/active_jobs/1.1

SGE_TASK_ID=undefined

ENVIRONMENT=BATCH

HOME=/home/sangwan

HOSTNAME=sdd107.foo.bar.com

JOB_ID=1

JOB_NAME=env.sh

LOGNAME=sangwan

NHOSTS=1

NQUEUES=1

NSLOTS=1

PATH=/tmp/1.1.sdd107.q:/usr/local/bin:/usr/ucb:/bin:/usr/bin:/usr/X11R6/bin

QUEUE=sdd107.q

REQUEST=env.sh

RESTARTED=0

SHELL=/bin/csh

TMPDIR=/tmp/1.1.sdd107.q

TMP=/tmp/1.1.sdd107.q

USER=sangwan
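Individual values can be pulled out of an output file like the one above. The sketch below assumes the NAME=value layout produced by env.sh; job_env_value is a hypothetical helper name.

```shell
#!/bin/sh
# Print the value recorded for one variable in an env.sh output file.
# job_env_value is a hypothetical helper name.
#   $1: output file (for example ~/env.sh.o1), $2: variable name
job_env_value() {
    # The ^ anchor keeps PATH from also matching SGE_O_PATH.
    sed -n "s/^$2=//p" "$1" | head -n 1
}
```

For the output above, job_env_value ~/env.sh.o1 QUEUE prints sdd107.q.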

References

1. Sun Grid Engine 5.3 Administration and User's Guide

http://wwws.sun.com/software/gridware/

2. Sun Grid Engine 5.3p2 man pages

서진우

Clunix (a supercomputing company) / Managing Director (CTO) / Certified Information Systems Auditor / operator of the Syszone blog
