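Installing Sun Grid Engine

Author: 김상완 (sangwan@kisti.re.kr)

First written: 2002-10-18

Revised by: 김상완 (sangwan@kisti.re.kr)

Revised: 2003-08-23

Changes: added instructions for installing from source

This document explains how to install and use Sun Grid Engine (SGE).

The installation was performed on RedHat 7.1, and the SGE version installed here is sge-5.3p2. The instructions cover installation on Linux (x86, glibc).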


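1. Installing SGE

  1.1. Common steps

  1.2. Installing on the master node

  1.3. Installing on compute nodes

  1.4. Environment setup

  1.5. Verifying the installation

  1.6. Restarting the daemons

  1.7. A simple job example

2. Installing from source

  2.1. Common steps

  2.2. Preparing to compile and creating the installation directory

  2.3. Compiling and copying files

  2.4. Installation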



1. Installing SGE  [[Contents]]

1.1. Common steps  [[Contents]]

1) Create the administrative account

Add an account named sge on every node of the cluster.

# adduser sge

# passwd sge

2) Add the service port

On every node of the cluster, register a port for sge_commd in /etc/services. If possible, choose a port number below 1024.

All nodes must use the same port number.

# vi /etc/services

sge_commd 536/tcp # Sun grid engine
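
The /etc/services edit above has to be repeated on every node, so it can help to script it in a way that is safe to run more than once. The following is a minimal sketch, not part of the SGE distribution: the function name add_sge_service is hypothetical, and the demonstration works on a scratch file rather than the real /etc/services.

```shell
#!/bin/sh
# Hypothetical helper: append an sge_commd entry to a services file
# only if no sge_commd entry is present yet.
# usage: add_sge_service FILE PORT
add_sge_service() {
    file=$1
    port=$2
    if grep -q '^sge_commd[[:space:]]' "$file"; then
        return 0    # already registered, leave the file untouched
    fi
    printf 'sge_commd %s/tcp # Sun grid engine\n' "$port" >> "$file"
}

# Demonstration on a scratch file (on a real node, run as root
# against /etc/services instead).
services=$(mktemp)
add_sge_service "$services" 536
add_sge_service "$services" 536    # second call is a no-op
cat "$services"
rm -f "$services"
```

Because the second call changes nothing, the same script can be run on a node that already has the entry.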


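1.2. Installing on the master node  [[Contents]]


SGE must be installed on a file system that is shared over NFS.

Clusters usually share the /home partition, so this document installs SGE in /home/sge/SGE.

1) Download and unpack

The distribution archives are available for download.

For x86 Linux with kernel 2.4 and glibc 2.1 or later, download sge-5.3p2-common.tar.gz and sge-5.3p2-bin-glinux.tar.gz.

Log in as the sge account and unpack the archives.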

# su sge

$ mkdir /home/sge/SGE

$ cd /home/sge/SGE

$ tar zxvf sge-5.3p2-common.tar.gz

$ tar zxvf sge-5.3p2-bin-glinux.tar.gz

$ exit

2) Run install_qmaster

Set the SGE_ROOT environment variable.

# export SGE_ROOT=/home/sge/SGE

SGE is installed with its installer scripts: run install_qmaster on the master node and install_execd on the compute nodes.

On the master node, run install_qmaster.

# ./install_qmaster

Welcome to the grid engine installation


grid engine qmaster host installation


Before you continue with the installation please read these hints:

   – Your terminal window should have a size of at least

     80×24 characters

   – The INTR character is often bound to the key Ctrl-C.

     The term >Ctrl-C< is used during the installation if you

     have the possibility to abort the installation

The qmaster installation procedure will take approximately 5-10 minutes.

Hit <RETURN> to continue >> Enter

Confirm grid engine default installation settings


The following default settings can be used for an accelerated

installation procedure:

      $SGE_ROOT          = /home/sge/SGE

      service            = sge_commd

      admin user account = sge

Do you want to use these configuration parameters (y/n) [y] >> Enter

Verifying and setting file permissions


Did you install this version with >pkgadd< or did you already

verify and set the file permissions of your distribution (y/n) [y] >> Enter

We do not verify file permissions. Hit <RETURN> to continue >>

Verifying and setting file permissions and owner in >3rd_party<

Verifying and setting file permissions and owner in >bin<

Verifying and setting file permissions and owner in >ckpt<

Verifying and setting file permissions and owner in >examples<

Verifying and setting file permissions and owner in >install_execd<

Verifying and setting file permissions and owner in >install_qmaster<

Verifying and setting file permissions and owner in >mpi<

Verifying and setting file permissions and owner in >pvm<

Verifying and setting file permissions and owner in >qmon<

Verifying and setting file permissions and owner in >util<

Verifying and setting file permissions and owner in >utilbin<

Verifying and setting file permissions and owner in >catman<

Verifying and setting file permissions and owner in >doc<

Verifying and setting file permissions and owner in >man<

Verifying and setting file permissions and owner in >inst_sge<

Verifying and setting file permissions and owner in >inst_sgeee<

Verifying and setting file permissions and owner in >bin<

Verifying and setting file permissions and owner in >lib<

Verifying and setting file permissions and owner in >utilbin<

Your file permissions were set

Hit <RETURN> to continue >> Enter

Making directories


creating directory: default

creating directory: default/common

creating directory: default/common/history

creating directory: default/common/local_conf

creating directory: /home/sge/SGE/default/spool/qmaster

creating directory: /home/sge/SGE/default/spool/qmaster/admin_hosts

creating directory: /home/sge/SGE/default/spool/qmaster/ckpt

creating directory: /home/sge/SGE/default/spool/qmaster/complexes

creating directory: /home/sge/SGE/default/spool/qmaster/exec_hosts

creating directory: /home/sge/SGE/default/spool/qmaster/job_scripts

creating directory: /home/sge/SGE/default/spool/qmaster/jobs

creating directory: /home/sge/SGE/default/spool/qmaster/pe

creating directory: /home/sge/SGE/default/spool/qmaster/queues

creating directory: /home/sge/SGE/default/spool/qmaster/submit_hosts

creating directory: /home/sge/SGE/default/spool/qmaster/usersets

Hit <RETURN> to continue >> Enter

Select default grid engine hostname resolving method


Are all hosts of your cluster in one DNS domain? If this is

the case the hostnames

   >hostA< and >hostA.foo.com<

would be treated as equal, because the DNS domain name >foo.com<

is ignored when comparing hostnames.

Are all hosts of your cluster in a single DNS domain (y/n) [y] >> Enter

Ignoring domainname when comparing hostnames.

Hit <RETURN> to continue >> Enter

grid engine group id range


When jobs are started under the control of grid engine an additional group id

is set on platforms which do not support jobs.

This additional UNIX group id range must be unused group id’s in your system.

The range must be big enough to provide enough numbers for the maximum number

of grid engine jobs running at a single moment on a single host. E.g. a range

like >20000-20100< means, that grid engine will use the group id’s from

20000-20100 and thus provides a range for 101 jobs running at the same time

on a single host.

You can change at any time the group id range in your cluster configuration.

Please enter a range >> 20000-20100

Using >20000-20100< as gid range. Hit <RETURN> to continue >> Enter

Creating local configuration


Creating >act_qmaster< file

Adding default complexes >host< and >queue<

Adding default parallel environment (PE) for >qmake<

Adding >sge_aliases< path aliases file

Adding >qtask< qtcsh sample default request file

Adding >sge_request< default submit options file

Creating settings files for >.profile/.cshrc<

Hit <RETURN> to continue >> Enter

grid engine startup script


Your grid engine cluster wide startup script is installed as:


Hit <RETURN> to continue >> Enter

grid engine startup script


We can install the startup script that

grid engine is started at machine boot (y/n) [y] >> Enter

Installing startup script /etc/rc.d/rc3.d/S95rcsge

Hit <RETURN> to continue >> Enter

grid engine qmaster and scheduler startup


Starting qmaster and scheduler daemon. Please wait …

   starting sge_qmaster

starting program: /home/sge/SGE/bin/glinux/sge_commd

using service “sge_commd”

bound to port 536

Reading in complexes:

        Complex “host”.

        Complex “queue”.

Reading in parallel environments:

        PE “make”.

Reading in scheduler configuration

   starting sge_schedd

Hit <RETURN> to continue >> Enter

Adding grid engine hosts


Please now add the list of hosts, where you will later install your execution

daemons. These hosts will be also added as valid submit hosts.

Please enter a blank separated list of your execution hosts. You may

press <RETURN> if the line is getting too long. Once you are finished

simply press <RETURN> without entering a name.

You also may prepare a file with the hostnames of the machines where you plan

to install grid engine. This may be convenient if you are installing grid

engine on many hosts.

Do you want to use a file which contains the list of hosts (y/n) [n] >> Enter

Adding admin and submit hosts


Please enter a blank seperated list of hosts.

Stop by entering <RETURN>. You may repeat this step until you are

entering an empty list. You will see messages from grid engine

when the hosts are added.

Host(s): sdd114

adminhost “sdd114.foo.bar.com” already exists

sdd114.foo.bar.com added to submit host list

Hit <RETURN> to continue >> Enter

Host(s): sdd105

sdd105.foo.bar.com added to administrative host list

sdd105.foo.bar.com added to submit host list

Hit <RETURN> to continue >> Enter

Host(s): sdd106

sdd106.foo.bar.com added to administrative host list

sdd106.foo.bar.com added to submit host list

Hit <RETURN> to continue >> Enter

Host(s): sdd107

sdd107.foo.bar.com added to administrative host list

sdd107.foo.bar.com added to submit host list

Hit <RETURN> to continue >> Enter

Host(s): Enter

Finished adding hosts. Hit <RETURN> to continue >> Enter

Using grid engine


You should now enter the command:

   source /home/sge/SGE/default/common/settings.csh

if you are a csh/tcsh user or

   # . /home/sge/SGE/default/common/settings.sh

if you are a sh/ksh user.

This will set or expand the following environment variables:

   – $SGE_ROOT    (always necessary)

   – $SGE_CELL    (if you are using a cell other than >default<)

   – $COMMD_PORT   (if you haven’t added the service >sge_commd<)

   – $PATH/$path (to find the grid engine binaries)

   – $MANPATH     (to access the manual pages)

Hit <RETURN> to see where grid engine logs messages >> Enter

grid engine messages


grid engine messages can be found at:

   /tmp/qmaster_messages (during qmaster startup)

   /tmp/execd_messages   (during execution daemon startup)

After startup the daemons log thier messages in their spool directories.

   Qmaster:     /home/sge/SGE/default/spool/qmaster/messages

   Exec daemon: <execd_spool_dir>/<hostname>/messages

Do you want to see previous screen about using grid engine again (y/n) [n] >> Enter

Your grid engine qmaster installation is now completed


Please now login to all hosts where you want to run an execution daemon

and start the execution host installation procedure.

If you want to run an execution daemon on this host, please do not forget

to make the execution host installation in this host as well.

All execution hosts must be administrative hosts during the installation.

All hosts which you added to the list of administrative hosts during this

installation procedure can now be installed.

You may verify your administrative hosts with the command

   # qconf -sh

and you may add new administrative hosts with the command

   # qconf -ah

Check the running daemons

# ps -aux --cols=120 | grep sge

root     22398  0.0  0.0  1708  848 ?        S    15:26   0:00 /home/sge/SGE/bin


sge      22402  0.0  0.1  3192 1756 ?        S    15:27   0:00 /home/sge/SGE/bin


sge      22406  0.0  0.1  2716 1372 ?        S    15:27   0:00 /home/sge/SGE/bin


The listing shows that sge_commd runs with root privileges, while sge_qmaster and sge_schedd run as the sge user.
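
Reading the ps listing by eye works for one node; a small filter makes the same check repeatable. This is a minimal sketch under stated assumptions: the function name missing_daemons is hypothetical, and the demonstration feeds it a canned listing instead of live ps output.

```shell
#!/bin/sh
# Hypothetical helper: read a process listing on stdin and print the
# name of every expected daemon that does not appear in it.
# usage: ps -e -o comm | missing_daemons sge_commd sge_qmaster sge_schedd
missing_daemons() {
    listing=$(cat)
    for daemon in "$@"; do
        # -w matches whole words only; -F treats the name as a fixed string
        if ! printf '%s\n' "$listing" | grep -qwF "$daemon"; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Demonstration with a canned listing in which sge_schedd is absent.
printf 'sge_commd\nsge_qmaster\n' |
    missing_daemons sge_commd sge_qmaster sge_schedd
```

On the master node the expected names are sge_commd, sge_qmaster and sge_schedd; empty output means all of them were found.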

1.3. Installing on compute nodes  [[Contents]]

Repeat the following steps on every compute node.

# rlogin sdd105

# cd /home/sge/SGE

Set the SGE_ROOT environment variable.

# export SGE_ROOT=/home/sge/SGE

Check that the port number has been added to /etc/services.

# vi /etc/services

sge_commd 536/tcp # Sun grid engine


Run install_execd.

# ./install_execd

Welcome to the grid engine execution host installation


Note: if the qmaster has not been installed yet, install it first.

If you haven’t installed the grid engine qmaster host yet, you must execute

this step (with >install_qmaster<) prior the execution host installation.

Note: the host on which the qmaster runs becomes an administrative host.

For a sucessfull installation you need a running grid engine qmaster. It is

also neccesary that this host is an administrative host.

You can verify your current list of administrative hosts with

the command:

Note: to list the administrative hosts, run

   # qconf -sh

You can add an administrative host with the command:

Note: to add an administrative host, run

   # qconf -ah

The execution host installation will take approximately 5 minutes.

Hit <RETURN> to continue >> Enter

grid engine admin user account


The current directory


is owned by user


If user >root< does not have write permissions in this directory on *all*

of the machines where grid engine will be installed (NFS partitions not

exported for user >root< with read/write permissions) it is recommended to

install grid engine that all spool files will be created under the user id

of user >sge<.

IMPORTANT NOTE: The daemons still have to be started by user >root<.

Do you want to install grid engine as admin user >sge< (y/n) [y] >> Enter

Installing grid engine as admin user >sge<

Hit <RETURN> to continue >> Enter

Checking $SGE_ROOT directory


Your $SGE_ROOT directory: /home/sge/SGE

Hit <RETURN> to continue >> Enter

grid engine cells


Please enter cell name which you used for the qmaster

installation or press <RETURN> to use default cell >default< >> Enter

Using cell: >default<

Hit <RETURN> to continue >> Enter

Confirm grid engine default installation settings


The following default settings can be used for an accelerated

installation procedure:

      $SGE_ROOT          = /home/sge/SGE

      service            = sge_commd

      admin user account = sge

Do you want to use these configuration parameters (y/n) [y] >> Enter

Creating local configuration


Creating local configuration for host >sdd105.foo.bar.com<

root@sdd105.foo.bar.com modified “sdd105.foo.bar.com” in configuration list

Local configuration for host >sdd105.foo.bar.com< created.

Hit <RETURN> to continue >> Enter

grid engine startup script


We can install the startup script that

grid engine is started at machine boot (y/n) [y] >> Enter

Installing startup script /etc/rc.d/rc3.d/S95rcsge

Hit <RETURN> to continue >> Enter

grid engine execution daemon startup


Starting execution daemon daemon. Please wait …

   starting sge_execd

starting program: /home/sge/SGE/bin/glinux/sge_commd

using service “sge_commd”

bound to port 536

Hit <RETURN> to continue >> Enter

Adding a default grid engine queue for this host


We can now add a sample queue for this host with following attributes:

   – the queue has the name >sdd105.q<

   – the queue provides 1 slot(s) for jobs

   – the queue provides access for any user with an account on this machine

   – the queue has no Unix resource limits

You do not need to add a queue now, but before running jobs on this host

need to add a queue with >qconf< or the GUI >qmon<.

Do you want to add a default queue for this host (y/n) [y] >> Enter

root@sdd105.foo.bar.com added “sdd105.q” to queue list

Hit <RETURN> to continue >> Enter

Using grid engine


You should now enter the command:

   source /home/sge/SGE/default/common/settings.csh

if you are a csh/tcsh user or

   # . /home/sge/SGE/default/common/settings.sh

if you are a sh/ksh user.

This will set or expand the following environment variables:

   – $SGE_ROOT    (always necessary)

   – $SGE_CELL    (if you are using a cell other than >default<)

   – $COMMD_PORT   (if you haven’t added the service >sge_commd<)

   – $PATH/$path (to find the grid engine binaries)

   – $MANPATH     (to access the manual pages)

Hit <RETURN> to see where grid engine logs messages >> Enter

grid engine messages


grid engine messages can be found at:

   /tmp/qmaster_messages (during qmaster startup)

   /tmp/execd_messages   (during execution daemon startup)

After startup the daemons log thier messages in their spool directories.

   Qmaster:     /home/sge/SGE/default/spool/qmaster/messages

   Exec daemon: <execd_spool_dir>/<hostname>/messages

Do you want to see previous screen about using grid engine again (y/n) [n] >> Enter

Your execution daemon installation is now completed.

Check the running daemons

# ps -aux --cols=120 | grep sge

root     10531  0.0  0.0  1688  824 ?        S    15:28   0:00 /home/sge/SGE/bin


sge      10533  0.0  0.1  2728 1376 ?        S<   15:28   0:00 /home/sge/SGE/bin


The listing shows that sge_commd runs with root privileges and sge_execd runs as the sge user.

1.4. Environment setup  [[Contents]]

Either set the environment variables by hand or source settings.sh.

Note: csh users must use settings.csh instead.

settings.sh sets the following variables.

– $SGE_ROOT    (always necessary)

– $SGE_CELL    (if you are using a cell other than >default<)

– $COMMD_PORT   (if you haven’t added the service >sge_commd<)

– $PATH/$path (to find the grid engine binaries)

– $MANPATH     (to access the manual pages)

$ vi ~/.bashrc


export SGE_ROOT=/home/sge/SGE

export PATH=/home/sge/SGE/bin/glinux:$PATH

export MANPATH=`man --path`:/home/sge/SGE/man

# or

# . /home/sge/SGE/default/common/settings.sh
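
The two alternatives above (manual exports or sourcing settings.sh) can be combined into one guarded snippet for ~/.bashrc. This is only a sketch: the function name load_sge_env is hypothetical, and the fallback branch sets just SGE_ROOT and PATH, using the paths from this document.

```shell
#!/bin/sh
# Hypothetical helper for ~/.bashrc: prefer the generated settings.sh,
# fall back to setting the variables by hand if it is not readable.
# usage: load_sge_env SGE_ROOT_DIR
load_sge_env() {
    root=$1
    if [ -r "$root/default/common/settings.sh" ]; then
        . "$root/default/common/settings.sh"
    else
        SGE_ROOT=$root
        PATH=$root/bin/glinux:$PATH
        export SGE_ROOT PATH
    fi
}

load_sge_env /home/sge/SGE
echo "SGE_ROOT=$SGE_ROOT"
```

The guard keeps the login shell usable on a machine where the NFS-mounted SGE directory is not available yet.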


1.5. Verifying the installation  [[Contents]]

Verify the installation with qconf, the queue configuration command.

Note: see man qconf.

$ qconf -sel # execution host list




$ qconf -se sdd105 # execution host definition

hostname                   sdd105.foo.bar.com

load_scaling               NONE

complex_list               NONE

complex_values             NONE

load_values                load_avg=0.000000,load_short=0.000000,load_medium=0.0






processors                 1

user_lists                 NONE

xuser_lists                NONE

$ qconf -secl # event client list

      ID NAME            HOST


       1 scheduler       sdd114.foo.bar.com

$ qconf -sep

       # the number of licenced processors per execution host and in total

HOST                      PROCESSOR        ARCH


sdd105.foo.bar.com                1      glinux

sdd106.foo.bar.com                1      glinux

sdd107.foo.bar.com                1      glinux


SUM                               3

$ qconf -sh # administrative host





$ qconf -ss # submit host list





$ qconf -sm # managers list



$ qconf -so # operator list


$ qconf -sql # list of all currently defined queues
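
The qconf listings can also be checked against the list of hosts that were supposed to be installed. The sketch below is an illustration only: the function name unlisted_hosts is hypothetical, and it assumes the listing has one host name per line, as produced by a command such as qconf -sel.

```shell
#!/bin/sh
# Hypothetical helper: read a host listing on stdin (one name per line)
# and print every expected host that is not in it.
# usage: qconf -sel | unlisted_hosts sdd105 sdd106 sdd107
unlisted_hosts() {
    listing=$(cat)
    for host in "$@"; do
        # accept either the short name or the fully qualified name
        if ! printf '%s\n' "$listing" |
                grep -q -e "^$host\$" -e "^$host\\."; then
            printf '%s\n' "$host"
        fi
    done
}

# Demonstration with a canned listing in which sdd107 is absent.
printf 'sdd105.foo.bar.com\nsdd106.foo.bar.com\n' |
    unlisted_hosts sdd105 sdd106 sdd107
```

Empty output means every expected host is registered.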




1.6. Restarting the daemons  [[Contents]]

To restart the SGE daemons, use the rcsge script that the installer placed in /etc/rc.d/init.d/.

The procedure is the same on the master node and on the compute nodes.

# /etc/rc.d/init.d/rcsge stop

   Shutting down grid engine execution daemon

   Shutting down grid engine communication daemon

# /etc/rc.d/init.d/rcsge start

   starting sge_execd

starting program: /home/sge/SGE/bin/glinux/sge_commd

using service “sge_commd”

bound to port 536

1.7. A simple job example  [[Contents]]

Run one of the simple examples in the examples directory.

$ cd /home/sge/SGE/examples/jobs/

$ cat simple.sh


# request Bourne shell as shell for job

#$ -S /bin/sh

# print date and time

date

# Sleep for 20 seconds

sleep 20

# print date and time again

date

$ qsub simple.sh

your job 1 (“simple.sh”) has been submitted

$ qstat

job-ID  prior name       user         state submit/start at     queue      master  ja-task-ID



      1     0 simple.sh  sge          t     10/18/2002 17:25:12 sdd105.q   MASTE


Note: the job state is one of

d(eletion),  t(ransfering), r(unning), R(estarted), s(uspended), S(uspended),

T(hreshold), w(aiting) or h(old).
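
When qstat output is processed by a script, the one-letter states from the note above can be expanded with a simple lookup. This is a minimal sketch; the function name describe_state is hypothetical, and the words are taken directly from the list above (which gives the same word for s and S).

```shell
#!/bin/sh
# Hypothetical helper: expand a single qstat state letter into the
# word given in the state list above.
describe_state() {
    case $1 in
        d) echo deletion ;;
        t) echo transferring ;;
        r) echo running ;;
        R) echo restarted ;;
        s|S) echo suspended ;;
        T) echo threshold ;;
        w) echo waiting ;;
        h) echo hold ;;
        *) echo unknown ;;
    esac
}

# Demonstration for a few letters.
for s in t r w; do
    printf '%s = %s\n' "$s" "$(describe_state "$s")"
done
```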

The output files are created in the home directory.

Note: to choose where the output files are written, add the directives

  #$ -e path/to/err.file

  #$ -o path/to/out.file

  to the job script.
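
A job script that uses these directives might look like the sketch below. The paths are the same placeholders as in the note and must be replaced with real locations; the function name job_body and the echoed messages are made up for illustration.

```shell
#!/bin/sh
# request Bourne shell as shell for job
#$ -S /bin/sh
# placeholder paths from the note above; replace with real locations
#$ -o path/to/out.file
#$ -e path/to/err.file

# To the shell, the "#$" lines are ordinary comments, so the script
# also runs unchanged outside grid engine.
job_body() {
    echo "job started"                      # standard output (-o file)
    echo "this line goes to stderr" >&2     # standard error (-e file)
}

job_body
```

Submitted with qsub, the first message should end up in the -o file and the second in the -e file.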

$ cd ~/

$ ls -la simple*

-rw-r--r--    1 sge      sge             0 Oct 18 17:25 simple.sh.e1

-rw-r--r--    1 sge      sge            58 Oct 18 17:25 simple.sh.o1

$ cat simple.sh.o1

Fri Oct 18 17:25:10 KST 2002

Fri Oct 18 17:25:30 KST 2002

Submitting the following job shows the environment variables related to SGE.

See man qsub for details.

$ cat env.sh




echo PID=$$








echo ARC=$ARC




















echo TMP=$TMP


$ qsub env.sh

your job 1 (“env.sh”) has been submitted

$ cat ~/env.sh.o1

































2. Installing from source  [[Contents]]

Add the administrative account and the service port on every node of the cluster.

2.1. Common steps  [[Contents]]

# adduser sge

# passwd sge

# cat /etc/services

sge_commd 536/tcp # Sun grid engine

For ease of administration, it is best to add the account and the service on the first node and then copy the relevant files to the other nodes. Keep /etc/passwd, /etc/shadow, /etc/group and /etc/services identical on all nodes.
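
Whether the copies really are identical can be checked by comparing them file by file. The sketch below assumes the files from a node have already been fetched into a scratch directory; the function name diff_node_files and the sample file contents are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report which files in a reference directory
# differ from (or are missing in) a directory holding a node's copies.
# usage: diff_node_files REF_DIR NODE_DIR FILE...
diff_node_files() {
    ref=$1
    node=$2
    shift 2
    for name in "$@"; do
        if ! cmp -s "$ref/$name" "$node/$name"; then
            printf '%s differs\n' "$name"
        fi
    done
}

# Demonstration with scratch directories and made-up contents.
ref=$(mktemp -d)
node=$(mktemp -d)
echo 'sge_commd 536/tcp' > "$ref/services"
echo 'sge_commd 537/tcp' > "$node/services"
echo 'sge:x:500:' > "$ref/group"
cp "$ref/group" "$node/group"
diff_node_files "$ref" "$node" services group
rm -rf "$ref" "$node"
```

In the demonstration only the services file is reported, because the port numbers differ.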

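2.2. Preparing to compile and creating the installation directory  [[Contents]]

Create the installation directory.

# su sge

$ mkdir /home/sge/SGE

Unpack the source archive and change to the source directory.

$ cd /home/sge

$ tar zxvf sge-V53p2_TAG-src.tar.gz

$ cd gridengine/source/

Set the SGE_ROOT environment variable.

$ export SGE_ROOT=/home/sge/SGE

Refer to the following files for the installation.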






2.3. Compiling and copying files  [[Contents]]

Compilation and installation use aimk, a script included in the package.

Passing options to aimk builds only the selected parts.

aimk stands for Architecture Independent MaKe.

First build sge_depend, the tool that checks header file dependencies.

For details on sge_depend, see 3rdparty/sge_depend/sge_depend.html.

Build sge_depend

$ ./aimk -only-depend

This creates 3rdparty/sge_depend/LINUX6/sge_depend.

Create the dependency files

Create zero-length dependency files.

$ ./scripts/zerodepend

Create the dependencies files.

$ ./aimk depend

To see the aimk help, run the following.

$ ./aimk -help

Build the grid engine core system (daemons and command-line clients).

$ ./aimk -only-core

Compile the man pages and the qmon help

$ ./aimk -man

$ ./aimk -mankv

$ ./aimk -only-qmon
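
The build steps above have to run in this order, so they can be collected in a small wrapper that stops at the first failure. This is a sketch only: DRY_RUN and the function names run and build_sge are inventions of this example, not aimk features. By default it only prints the commands; set DRY_RUN=0 in the source directory to execute them.

```shell
#!/bin/sh
# Print (default) or execute the build steps from this section, in order.
# DRY_RUN is a variable of this sketch, not an aimk option.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        # stop the whole build as soon as one step fails
        "$@" || { echo "step failed: $*" >&2; exit 1; }
    fi
}

build_sge() {
    run ./aimk -only-depend     # build sge_depend
    run ./scripts/zerodepend    # zero-length dependency files
    run ./aimk depend           # dependencies files
    run ./aimk -only-core       # daemons and command-line clients
    run ./aimk -man             # man pages
    run ./aimk -mankv
    run ./aimk -only-qmon       # qmon
}

build_sge
```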

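Compilation is now finished; what remains is to move the generated binaries to their proper locations. This is done with the scripts/distinst script: when it is run under the name myinst, it copies the required files into the $SGE_ROOT directory.

See scripts/README.distinst for detailed usage.

Create a symbolic link named myinst and run it with options.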

$ ln -s scripts/distinst myinst

$ ./myinst -allall glinux

    Installing: sge_qmaster sge_execd sge_shadowd sge_commd sge_schedd sge_shepherd sge_coshepherd qstat qsub qalter qconf qdel qacct qmod qsh commdcntl utilbin jobs qmon qhold qrls qhost qmake qtcsh distcommon

Architectures: glinux

Base directory: /home/sge/SGE

   OK [Y/N][Y]:


2.4. Installation  [[Contents]]

The files needed for installation have now been copied into the $SGE_ROOT directory.

As described in section 1 (Installing SGE), run install_qmaster and install_execd with root privileges.

On the master host

# cd $SGE_ROOT

# ./install_qmaster

On the execution hosts

# cd $SGE_ROOT

# ./install_execd
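
Before starting the installers it can be worth confirming that the copy step actually put them into $SGE_ROOT. The following is a minimal sketch; the function name check_sge_root is hypothetical, and it only checks for the two installer scripts used in section 1 (install_qmaster and install_execd).

```shell
#!/bin/sh
# Hypothetical helper: succeed only if both installer scripts exist
# and are executable in the given SGE root directory.
# usage: check_sge_root DIR
check_sge_root() {
    root=$1
    rc=0
    for script in install_qmaster install_execd; do
        if [ ! -x "$root/$script" ]; then
            echo "missing or not executable: $root/$script" >&2
            rc=1
        fi
    done
    return $rc
}

if check_sge_root "${SGE_ROOT:-/home/sge/SGE}"; then
    echo "ready to install"
else
    echo "not ready"
fi
```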

References  [[Contents]]

1. Sun Grid Engine 5.3 Administration and User's Guide


2. Sun grid engine 5.3p2 man pages


