[Cluster] Tight MPICH integration with SGE – another doc

Sun GridEngine

From Rocks Clusters

Contents

•        1 General Information

•        2 Impressing Your Friends With SGE

•        3 Tight MPICH integration with SGE

•        4 Submitting Jobs From One Cluster To Another

•        5 Prologue/Epilogue

General Information

SGE home page

Sample SGE Submission Scripts

Impressing Your Friends With SGE

This is a place for tricks that make SGE easier to deal with.

(Title is a nod to the great old redhat page IYFW RPM.)

Tight MPICH integration with SGE

Rocks comes with a default parallel environment for SGE that facilitates tight integration. Unfortunately, it's not quite complete. For the ch_p4 mpich device, the environment variable MPICH_PROCESS_GROUP must be set to no on both the frontend and the compute nodes so that SGE can remain the process group leader. These are the steps I took to get it working on Rocks 4.1 (I opted for the second solution described here). Nmichaud@jhu.edu 14:11, 15 February 2006 (EST)

1. Edit /opt/gridengine/default/common/sge_request and add the following line at the end:

-v MPICH_PROCESS_GROUP=no
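To confirm that the new default request actually reaches job environments, you can submit a throwaway test job. This is a hypothetical check, not part of the original instructions; the job name and the use of a stdin script are just illustrative.

echo 'echo MPICH_PROCESS_GROUP=$MPICH_PROCESS_GROUP' | qsub -N envcheck -j y

# Once the job runs, its output file (envcheck.o<jobid> in your home directory)
# should contain: MPICH_PROCESS_GROUP=no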

2. In a default Rocks setup, SGE runs /opt/gridengine/mpi/startmpi.sh when starting an MPI job, which in turn uses /opt/gridengine/mpi/rsh as its rsh wrapper. Both of these files must be changed. However, each compute node has its own copy of them, so instead of editing them on the frontend and copying them out to every compute node, I found it easier to place my own copies in a subdirectory of /share/apps called mpi and then point the mpich parallel environment at my copies of startmpi.sh and stopmpi.sh (and, by extension, rsh). This way my one copy is exported to all the nodes and I don't have to worry about keeping them in sync (the copy step is sketched below).
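Here is a sketch of that copy step, assuming the default Rocks 4.1 paths used throughout this page (run once on the frontend):

mkdir -p /share/apps/mpi
cp /opt/gridengine/mpi/startmpi.sh /share/apps/mpi/
cp /opt/gridengine/mpi/stopmpi.sh /share/apps/mpi/
cp /opt/gridengine/mpi/rsh /share/apps/mpi/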

3. Edit /share/apps/mpi/startmpi.sh. Change the line

rsh_wrapper=$SGE_ROOT/mpi/rsh

to

rsh_wrapper=/share/apps/mpi/rsh

4. Edit /share/apps/mpi/rsh. Change the following lines:

echo $SGE_ROOT/bin/$ARC/qrsh -inherit -nostdin $rhost $cmd

exec $SGE_ROOT/bin/$ARC/qrsh -inherit -nostdin $rhost $cmd

else

echo $SGE_ROOT/bin/$ARC/qrsh -inherit $rhost $cmd

exec $SGE_ROOT/bin/$ARC/qrsh -inherit $rhost $cmd

to:

echo $SGE_ROOT/bin/$ARC/qrsh -V -inherit -nostdin $rhost $cmd

exec $SGE_ROOT/bin/$ARC/qrsh -V -inherit -nostdin $rhost $cmd

else

echo $SGE_ROOT/bin/$ARC/qrsh -V -inherit $rhost $cmd

exec $SGE_ROOT/bin/$ARC/qrsh -V -inherit $rhost $cmd

5. Finally, run qconf -mp mpich. Change it from:

pe_name          mpich

slots            9999

user_lists       NONE

xuser_lists      NONE

start_proc_args  /opt/gridengine/mpi/startmpi.sh -catch_rsh $pe_hostfile

stop_proc_args   /opt/gridengine/mpi/stopmpi.sh

allocation_rule  $fill_up

control_slaves   TRUE

job_is_first_task FALSE

urgency_slots     min

to

pe_name          mpich

slots            9999

user_lists       NONE

xuser_lists      NONE

start_proc_args  /share/apps/mpi/startmpi.sh -catch_rsh $pe_hostfile

stop_proc_args   /share/apps/mpi/stopmpi.sh

allocation_rule  $fill_up

control_slaves   TRUE

job_is_first_task FALSE

urgency_slots     min
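For reference, here is a minimal submission script that uses the reconfigured mpich parallel environment. This is only a sketch, not from the original page: the mpirun path (/opt/mpich/gnu/bin/mpirun) and the program name (./my_mpi_prog) are assumptions for a default Rocks MPICH install, while $NSLOTS and the $TMPDIR/machines machine file are provided by startmpi.sh when the job starts.

#!/bin/bash
#$ -N mpich_test
#$ -cwd
#$ -j y
#$ -pe mpich 8
# NSLOTS and TMPDIR/machines are set up by the mpich PE (startmpi.sh)
/opt/mpich/gnu/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./my_mpi_prog

Submit it with qsub mpich_test.sh; the -pe mpich 8 line requests 8 slots, placed according to the $fill_up allocation rule.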

Submitting Jobs From One Cluster To Another

It is currently unknown how to do this in SGE 6. Use Globus (instructions here.)

Prologue/Epilogue

This is a place for information about the prologue & epilogue scripts that run before and after a job, respectively.

I have found that SGE is not particularly good at cleaning up after MPI jobs: it does not keep track of which other nodes a job is using, and it does not kill leftover user processes on those nodes. If anyone has a good solution for this, I'd love to see it.
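One possible direction is a site epilogue that walks the job's PE host list and kills anything the job owner left behind. The sketch below is hypothetical and heavily hedged: it assumes $PE_HOSTFILE is visible in the epilogue environment, that passwordless ssh to the compute nodes works, and that users run at most one job per node (otherwise pkill -u is too aggressive).

#!/bin/bash
# Hypothetical epilogue sketch (not from the original page).
# Assumes $PE_HOSTFILE points at the PE host list and that ssh to the
# compute nodes works without a password. WARNING: pkill -u kills *all*
# of the user's processes on each node, not just this job's leftovers.
if [ -n "$PE_HOSTFILE" ] && [ -r "$PE_HOSTFILE" ]; then
    for node in $(awk '{print $1}' "$PE_HOSTFILE"); do
        ssh "$node" "pkill -u $USER" < /dev/null
    done
fi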

Retrieved from “https://wiki.rocksclusters.org/wiki/index.php/Sun_GridEngine”
