Sun Grid Engine parallel environmen configuration man page

NAME
     sge_pe – Sun Grid Engine parallel environment  configuration
     file format

DESCRIPTION
     Parallel environments are parallel programming  and  runtime
     environments  allowing for the execution of shared memory or
     distributed  memory  parallelized   applications.   Parallel
     environments usually require some kind of setup to be opera-
     tional before starting parallel applications.  Examples  for
     common  parallel  environments  are  shared  memory parallel
     operating systems and the  distributed  memory  environments
     Parallel  Virtual Machine (PVM) or Message Passing Interface
     (MPI).

     sge_pe allows for the definition of interfaces to  arbitrary
     parallel  environments.   Once  a  parallel  environment  is
     defined or modified with the -ap or -mp options to  qconf(1)
     and   linked   with  one  or  more  queues  via  pe_list  in
     queue_conf(5) the environment can be requested for a job via
     the -pe switch to qsub(1) together with a request of a range
     for the number of parallel processes to be allocated by  the
     job.  Additional  -l  options may be used to specify the job
     requirement to further detail.

     Note, Sun Grid Engine allows  backslashes  (\)  be  used  to
     escape  newline (\newline) characters. The backslash and the
     newline are replaced with a space (” “) character before any
     interpretation.

FORMAT
     The format of a sge_pe file is defined as follows:

  pe_name
     The name of the parallel environment as defined for  pe_name
     in sge_types(1).  To be used in the qsub(1) -pe switch.

  slots
     The number of parallel processes being  allowed  to  run  in
     total  under the parallel environment concurrently.  Type is
     number, valid values are 0 to 9999999.

  user_lists
     A comma separated  list  of  user  access  list  names  (see
     access_list(5)).  Each user contained in at least one of the
     enlisted access lists has access to  the  parallel  environ-
     ment.  If  the  user_lists  parameter  is  set  to NONE (the
     default) any user has access being not  explicitly  excluded
     via the xuser_lists parameter described below.  If a user is
     contained both in an access list enlisted in xuser_lists and
     user_lists  the  user  is  denied  access  to  the  parallel
     environment.

  xuser_lists
     The xuser_lists parameter contains a comma separated list of
     so  called user access lists as described in access_list(5).
     Each user contained in at least one of the  enlisted  access
     lists  is not allowed to access the parallel environment. If
     the xuser_lists parameter is set to NONE (the  default)  any
     user  has  access.  If a user is contained both in an access
     list enlisted in xuser_lists  and  user_lists  the  user  is
     denied access to the parallel environment.

  start_proc_args
     The invocation command line of a start-up procedure for  the
     parallel  environment.  The start-up procedure is invoked by
     sge_shepherd(8) prior to executing the job script. Its  pur-
     pose is to setup the parallel environment correspondingly to
     its needs.  An optional prefix “user@”  specifies  the  user
     under  which  this procedure is to be started.  The standard
     output of the start-up procedure is redirected to  the  file
     REQNAME.poJID  in the job’s working directory (see qsub(1)),
     with REQNAME being the name  of  the  job  as  displayed  by
     qstat(1)  and  JID  being  the  job’s identification number.
     Likewise,  the  standard  error  output  is  redirected   to
     REQNAME.peJID
     The following special variables being  expanded  at  runtime
     can  be  used  (besides  any  other strings which have to be
     interpreted by the start and stop procedures) to  constitute
     a command line:

     $pe_hostfile
          The pathname of a file containing a  detailed  descrip-
          tion  of  the  layout of the parallel environment to be
          setup by the start-up procedure. Each line of the  file
          refers  to a host on which parallel processes are to be
          run. The first entry of each line denotes the hostname,
          the second entry the number of parallel processes to be
          run on the host, the third entry the name of the queue,
          and  the  fourth  entry a processor range to be used in
          case of a multiprocessor machine.

     $host
          The name of the host on which the start-up or stop pro-
          cedures are started.

     $job_owner
          The user name of the job owner.

     $job_id
          Sun Grid Engine’s unique job identification number.

     $job_name
          The name of the job.

     $pe  The name of the parallel environment in use.

     $pe_slots
          Number of slots granted for the job.

     $processors
          The processors string as contained in the queue  confi-
          guration  (see  queue_conf(5)) of the master queue (the
          queue in which the start-up  and  stop  procedures  are
          started).

     $queue
          The cluster queue of the master queue instance.

  stop_proc_args
     The invocation command line of a shutdown procedure for  the
     parallel  environment.  The shutdown procedure is invoked by
     sge_shepherd(8) after the job script has finished. Its  pur-
     pose  is  to  stop the parallel environment and to remove it
     from all participating systems.  An optional prefix  “user@”
     specifies  the  user  under  which  this  procedure is to be
     started.  The standard output of the stop procedure is  also
     redirected  to  the  file REQNAME.poJID in the job’s working
     directory (see qsub(1)), with REQNAME being the name of  the
     job  as  displayed by qstat(1) and JID being the job’s iden-
     tification number.  Likewise, the standard error  output  is
     redirected to REQNAME.peJID
     The same special variables as  for  start_proc_args  can  be
     used to constitute a command line.

  allocation_rule
     The allocation rule is interpreted by the  scheduler  thread
     and helps the scheduler to decide how to distribute parallel
     processes among the available machines. If, for instance,  a
     parallel environment is built for shared memory applications
     only, all parallel processes have to be assigned to a single
     machine, no matter how much suitable machines are available.
     If, however, the parallel environment  follows  the  distri-
     buted  memory  paradigm,  an  even distribution of processes
     among machines may be favorable.
     The current version of the scheduler  only  understands  the
     following allocation rules:

     <int>:    An integer number fixing the number  of  processes
               per  host.  If the number is 1, all processes have
               to reside  on  different  hosts.  If  the  special
               denominator  $pe_slots  is used, the full range of
               processes as specified with the qsub(1) -pe switch
               has  to  be  allocated on a single host (no matter
               which value belonging  to  the  range  is  finally
               chosen for the job to be allocated).

     $fill_up: Starting from the best  suitable  host/queue,  all
               available  slots  are allocated. Further hosts and
               queues are “filled up” as  long  as  a  job  still
               requires slots for parallel tasks.

     $round_robin:
               From all suitable hosts a single slot is allocated
               until  all tasks requested by the parallel job are
               dispatched. If more tasks are requested than suit-
               able hosts are found, allocation starts again from
               the  first  host.  The  allocation  scheme   walks
               through  suitable  hosts  in a best-suitable-first
               order.

  control_slaves
     This parameter can be set to TRUE or FALSE (the default). It
     indicates  whether  Sun  Grid  Engine  is the creator of the
     slave tasks of a parallel application via  sge_execd(8)  and
     sge_shepherd(8) and thus has full control over all processes
     in a parallel application, which enables  capabilities  such
     as  resource  limitation and correct accounting. However, to
     gain control over the slave tasks of a parallel application,
     a  sophisticated  PE  interface  is  required,  which  works
     closely together with Sun Grid Engine  facilities.  Such  PE
     interfaces  are available through your local Sun Grid Engine
     support office.

     Please set the control_slaves parameter  to  false  for  all
     other PE interfaces.

  job_is_first_task
     The job_is_first_task parameter can be set to TRUE or FALSE.
     A  value  of  TRUE  indicates  that  the Sun Grid Engine job
     script already contains one of the  tasks  of  the  parallel
     application (the number of slots reserved for the job is the
     number of slots requested with  the  -pe  switch),  while  a
     value  of FALSE indicates that the job script (and its child
     processes) is not part of the parallel program  (the  number
     of  slots  reserved  for  the  job  is  the  number of slots
     requested with the -pe switch + 1).

     If    wallclock    accounting    is    used    (execd_params
     ACCT_RESERVED_USAGE  and/or  SHARETREE_RESERVED_USAGE set to
     TRUE)   and   control_slaves   is   set   to   FALSE,    the
     job_is_first_task  parameter  influences  the accounting for
     the job:  A value of TRUE means that accounting for cpu  and
     requested  memory  gets  multiplied  by  the number of slots
     requested with the -pe switch, if job_is_first_task  is  set
     to  FALSE,  the  accounting  information  gets multiplied by
     number of slots + 1.

  urgency_slots
     For pending jobs with a slot range PE request the number  of
     slots  is  not determined. This setting specifies the method
     to be used by Sun Grid Engine to assess the number of  slots
     such jobs might finally get.

     The assumed slot allocation has a meaning  when  determining
     the resource-request-based priority contribution for numeric
     resources as described in sge_priority(5) and  is  displayed
     when qstat(1) is run without -g t option.

     The following methods are supported:

     <int>:    The specified integer number is directly  used  as
               prospective slot amount.

     min:      The slot range minimum is used as prospective slot
               amount.  If  no  lower bound is specified with the
               range 1 is assumed.

     max:      The of the slot range maximum is used as  prospec-
               tive  slot  amount. If no upper bound is specified
               with the range the absolute maximum  possible  due
               to the PE’s slots setting is assumed.

     avg:      The average of all numbers  occurring  within  the
               job’s PE range request is assumed.

  accounting_summary
     This parameter is only checked if control_slaves (see above)
     is  set  to  TRUE and thus Sun Grid Engine is the creator of
     the slave tasks of a parallel application  via  sge_execd(8)
     and  sge_shepherd(8).   In this case, accounting information
     is available for every single slave task started by Sun Grid
     Engine.

     The accounting_summary parameter  can  be  set  to  TRUE  or
     FALSE. A value of TRUE indicates that only a single account-
     ing record is written to the accounting(5) file,  containing
     the  accounting summary of the whole job including all slave
     tasks, while  a  value  of  FALSE  indicates  an  individual
     accounting(5)  record  is  written  for every slave task, as
     well as for the master task.
     Note:   When   running   tightly   integrated   jobs    with
     SHARETREE_RESERVED_USAGE     set,     and     with    having
     accounting_summary  enabled  in  the  parallel  environment,
     reserved  usage  will only be reported by the master task of
     the parallel job.  No per parallel task usage  records  will
     be  sent  from  execd  to  qmaster,  which can significantly
     reduce load on qmaster when running large tightly integrated
     parallel jobs.

RESTRICTIONS
     Note, that the functionality of the start-up,  shutdown  and
     signaling  procedures remains the full responsibility of the
     administrator configuring  the  parallel  environment.   Sun
     Grid  Engine  will just invoke these procedures and evaluate
     their exit status. If the procedures do  not  perform  their
     tasks  properly or if the parallel environment or the paral-
     lel application behave unexpectedly, Sun Grid Engine has  no
     means to detect this.

SEE ALSO
     sge_intro(1)sge__types(1)qconf(1)qdel(1)qmod(1),
     qsub(1), access_list(5), sge_qmaster(8), sge_shepherd(8).

COPYRIGHT
     See sge_intro(1) for a full statement of rights and  permis-
     sions.

서진우

슈퍼컴퓨팅 전문 기업 클루닉스/ 상무(기술이사)/ 정보시스템감리사/ 시스존 블로그 운영자

You may also like...

3 Responses

  1. бінанс 말해보세요:

    I don’t think the title of your article matches the content lol. Just kidding, mainly because I had some doubts after reading the article.

  2. Your point of view caught my eye and was very interesting. Thanks. I have a question for you.

  1. 2024년 4월 10일

    … [Trackback]

    […] Find More to that Topic: nblog.syszone.co.kr/archives/3706 […]

페이스북/트위트/구글 계정으로 댓글 가능합니다.