Parallel Jobs and SGE Parallel Environments

1.

Parallel Jobs and SGE Parallel Environments

When a parallel job starts running under any batch system, there must be a mechanism through which the batch system tells the job not only the compute node on which it starts, but also the nodes on which further processes are started (and the number of processes on each node); i.e., tight integration must exist. Within SGE, this mechanism is part of the parallel environment, aka the PE. The PE may also set up other parts of the environment that parallel software requires.

Thus, any qsub script defining a parallel job must specify not only an SGE queue but also an SGE PE.

Example: OpenMPI under SGE

This is a qsub script suitable for running an OpenMPI job on Redqueen:

  #!/bin/bash

  #$ -q parallel.q
  #$ -pe orte.pe 16
      #
      # ...specify the SGE "parallel.q" queue and **also** the "orte.pe" PE...
      #

  #$ -cwd
  #$ -S /bin/bash

  export PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/openmpi-1.3-gcc-gfortran/bin
  export LD_LIBRARY_PATH=/usr/local/openmpi-1.3-gcc-gfortran/lib
      #
      # ...ensure the OpenMPI-related executables and libraries can be found by
      #    the job...
      #

  mpirun -np $NSLOTS ./mynameis
      #
      # ...start the job!  "$NSLOTS" is the number of cores/processes/slots the
      #    job will use --- the value is set by the orte.pe PE.  
      #
      #    *** N.B. We have not specified a host/machinefile here --- OpenMPI picks
      #             this up automagically from the PE. ****
      #

We have specified the orte.pe PE, which works with OpenMPI, and requested a 16-process job (-pe orte.pe 16). The PE:

  • sets the value of the environment variable NSLOTS; this should be used to specify the number of processes (mpirun -np $NSLOTS);
  • builds a host/machine file for the job, which determines on which compute nodes the job runs in the usual way. It is not necessary to pass this file to mpirun, however: OpenMPI recognises that it is running under SGE and automagically picks up the host list.

(N.B. MPICH has no such automagic: the host/machine file must be specified explicitly; see the example in Section 2.1.)
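To see what the PE actually hands to a job, a script can inspect the variables SGE sets. The sketch below assumes the standard SGE PE_HOSTFILE layout (one line per host: host name, number of slots, queue instance, processor range) and checks that the per-host slot counts add up to NSLOTS. When run outside SGE it falls back to a sample file; the host names node01/node02 are hypothetical.

```shell
#!/bin/bash
# Sketch: compare NSLOTS with the slot counts listed in PE_HOSTFILE.
# Each PE_HOSTFILE line reads: <host> <slots> <queue> <processor-range>.
if [ -z "$PE_HOSTFILE" ]; then
    # Not running under SGE: use a hypothetical two-node sample instead.
    PE_HOSTFILE=$(mktemp)
    printf 'node01 8 parallel.q@node01 UNDEFINED\nnode02 8 parallel.q@node02 UNDEFINED\n' > "$PE_HOSTFILE"
    NSLOTS=16
fi

# Sum the second column (slots per host).
total=$(awk '{ s += $2 } END { print s }' "$PE_HOSTFILE")
echo "NSLOTS=$NSLOTS; slots listed in PE_HOSTFILE: $total"

if [ "$total" -ne "$NSLOTS" ]; then
    echo "Warning: slot counts disagree" >&2
fi
```

Dropping these lines into a qsub script just before mpirun is a quick way to confirm the PE has allocated what was requested.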

More About SGE PEs

Each parallel application needs its own dedicated SGE PE. For example, MPICH, OpenMPI, OpenMP, Star-CD and Fluent each have their own PE on Man2e, Mace01 and Redqueen.

To determine which PEs are available and to find out the details of any particular PE, use qconf. qconf -spl lists the PEs

  prompt> qconf -spl

  fluent-16.pe
  fluent.pe
  openmp.pe
  orte.pe
  starcd.pe

while qconf -sp <pe-name> lists details

  prompt> qconf -sp fluent.pe

  pe_name            fluent.pe
  .                  .
  .                  .
  start_proc_args    /software/Fluent.Inc/setup_env
  stop_proc_args     /software/Fluent.Inc/addons/sge1.0/kill-fluent
  allocation_rule    $fill_up
  .                  .
  .                  .
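When comparing several PEs it can be handy to pull a single attribute, such as allocation_rule, out of the qconf -sp output. The sketch below relies only on the "attribute value" line layout shown above; the helper name pe_attr is hypothetical, and the offline demo feeds it two lines of the fluent.pe excerpt so it can be tried without a cluster.

```shell
#!/bin/bash
# Sketch: print the value of one attribute from "qconf -sp <pe-name>" output.
# Each line of that output has the form "<attribute> <value>".
pe_attr() {
    # $1 = attribute name; the PE description is read from stdin.
    # Blank the first field, then strip the leading space awk leaves behind.
    awk -v key="$1" '$1 == key { $1 = ""; sub(/^ +/, ""); print }'
}

# On a cluster, to list the allocation rule of every PE:
#   for pe in $(qconf -spl); do
#       echo "$pe: $(qconf -sp "$pe" | pe_attr allocation_rule)"
#   done

# Offline demo using lines from the fluent.pe excerpt:
rule=$(printf '%s\n' 'pe_name            fluent.pe' \
                     'allocation_rule    $fill_up' | pe_attr allocation_rule)
echo "fluent.pe allocation_rule: $rule"
```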

2.

Example Qsub Scripts

2.1.

MPICH v1.2.7 on Mace01

  #!/bin/bash

  #$ -pe mpich 8
  #$ -q mpich.q

  #$ -cwd
  #$ -S /bin/bash

  export PATH=/bin:/usr/bin:/usr/local/mpich-ch_p4_gfortran/bin

  mpirun -v -np $NSLOTS -machinefile $TMPDIR/machines mynameis-mpich
    #
    # -- use SGE PE-generated information:
    #     -- NSLOTS for number of processes
    #     -- TMPDIR/machines for MPI host/machinefile
    #
    # -- do **NOT** use "-nolocal"...
    #
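The $TMPDIR/machines file used above is written by the PE's start-up script from SGE's PE_HOSTFILE. The following is a minimal sketch of that kind of conversion, not the actual script behind the mpich PE on Mace01: it writes each host name once per slot allocated on that host, a layout MPICH's -machinefile option accepts. Outside SGE it uses a sample host file; the host names are hypothetical.

```shell
#!/bin/bash
# Sketch: expand SGE's PE_HOSTFILE into an MPICH-style machinefile,
# writing each host name once per slot allocated on that host.
if [ -z "$PE_HOSTFILE" ]; then
    # Not running under SGE: use a hypothetical two-node sample instead.
    PE_HOSTFILE=$(mktemp)
    printf 'node01 4 mpich.q@node01 UNDEFINED\nnode02 4 mpich.q@node02 UNDEFINED\n' > "$PE_HOSTFILE"
fi

machines="${TMPDIR:-/tmp}/machines"
: > "$machines"                       # start with an empty file
while read -r host slots _rest; do
    for ((i = 0; i < slots; i++)); do
        echo "$host" >> "$machines"
    done
done < "$PE_HOSTFILE"

cat "$machines"
```

With one line per slot, the number of lines in the file matches NSLOTS, so mpirun -np $NSLOTS places exactly one process per granted slot.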

2.2.

Star-CD on Man2e

  #$ -S /bin/bash 
  #$ -cwd

  #$ -q parallel.q
  #$ -pe starcd.pe 8

  export LM_LICENSE_FILE=1999@130.88.124.202

  STARINI=Default; export STARINI
  . /software/starcd_402_001_lam/etc/setstar

  echo " "
  echo "Command line is:"
  echo "star -dp $PNP_JOBNODES"
  star -dp $PNP_JOBNODES 
  echo " "

  exit_on_error $?

or

  #$ -S /bin/bash 
  #$ -cwd

  #$ -q parallel.q
  #$ -pe starcd.pe 8

  export LM_LICENSE_FILE=1999@130.88.124.202

  STARINI=Default; export STARINI
  . /software/starcd_402_001_lam/etc/setstar

  #
  # -- use the starcd.pe-generated machinefile (from SGE's PE_HOSTFILE):
  #
  echo " "
  echo "Command line is:"
  echo "star -dp -nodefile=$MACHINEFILE"
  star -dp -nodefile=$MACHINEFILE 
  echo " "

  exit_on_error $?
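Both Star-CD scripts end with exit_on_error $?. exit_on_error is not a bash builtin, so it must be defined somewhere the script can see it; presumably it comes from the Star-CD environment (for example the sourced setstar script), but that is an assumption. If it is not available, a minimal hypothetical definition, placed before its first use, might look like this:

```shell
#!/bin/bash
# Hypothetical helper: exit_on_error is not a bash builtin. If nothing
# sourced earlier defines it, a minimal version could be:
exit_on_error() {
    # $1 = exit status of the previous command
    local status=$1
    if [ "$status" -ne 0 ]; then
        echo "Error: command failed with exit status $status" >&2
        exit "$status"
    fi
}
```

Passing the status through to exit means SGE records the solver's own failure code for the job.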

2.3.

OpenMPI on Redqueen

    #!/bin/bash

    #$ -pe orte.pe 16
    #$ -q parallel.q
        # ...or "parallel-fat.q"...
    #$ -cwd
    #$ -S /bin/bash

    export PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/openmpi-1.3-gcc-gfortran/bin
    export LD_LIBRARY_PATH=/usr/local/openmpi-1.3-gcc-gfortran/lib

    mpirun -np $NSLOTS ./mynameis
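Omitting the machinefile only works if the OpenMPI build includes Grid Engine support; per the Open MPI FAQ, from version 1.3 this must be enabled at configure time (--with-sge), and ompi_info lists a gridengine component when it is present. A small sketch of that check, best run on a login node with the same PATH as the job:

```shell
#!/bin/bash
# Sketch: check whether this OpenMPI build has Grid Engine support, which is
# what lets mpirun read the PE host list without a -machinefile argument.
if ! command -v ompi_info > /dev/null 2>&1; then
    status="ompi_info not found (is the OpenMPI bin directory on PATH?)"
elif ompi_info | grep -q gridengine; then
    status="gridengine support present"
else
    status="no gridengine support: give mpirun a host file explicitly"
fi
echo "$status"
```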

2.4.

OpenMP on Man2e

  #$ -S /bin/bash
  #$ -cwd

  #$ -q openmp.q
  #$ -pe openmp.pe 2

  export OMP_NUM_THREADS=$NSLOTS

  ./hello
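OpenMP threads share memory, so all the slots an OpenMP job is granted need to be on the host where the script runs. Assuming openmp.pe is configured to allocate slots on a single host (for example with allocation_rule $pe_slots; this is an assumption about the local configuration), the following sketch sets OMP_NUM_THREADS as above and warns if PE_HOSTFILE nevertheless lists more than one host:

```shell
#!/bin/bash
# Sketch: set the OpenMP thread count from the PE-granted slots and warn if
# those slots are spread over several hosts (threads cannot use remote cores).
export OMP_NUM_THREADS=${NSLOTS:-1}   # fall back to one thread outside SGE

if [ -n "$PE_HOSTFILE" ] && [ "$(wc -l < "$PE_HOSTFILE")" -gt 1 ]; then
    echo "Warning: slots span more than one host; only this host's cores are used." >&2
fi
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```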

서진우

Clunix (supercomputing specialist company) / Managing Director (Technical Director) / Information Systems Auditor / SysZone blog operator

