How to upgrade Sun Grid Engine

How to Upgrade Sun Grid Engine (SGE) and Migrate to New Server

Sometimes you want to upgrade software and migrate hardware at the same time. If you want to do that with SGE, then you’re looking in the right place.

My architecture: old server SGE 6.1u2 on CentOS 5, migrating to SGE 6.2u3 on CentOS 5.3.

Upgrade Procedure

1) Download SGE onto the new server. If you’re feeling farsighted, fill out the Planning Checklist.

2) Unzip, untar, and set $SGE_ROOT to your untar’ed folder.

NewCentOS# export SGE_ROOT=/directory/to/sge/

Note: You may consider putting SGE root in an NFS directory in case you want to create a “high availability” fail-over environment. NFS may very well slow you down.

3) Find save_sge_config.sh and copy it over to the old host.

4) Create a copy of your configuration using save_sge_config.sh

OldCentOS# mkdir sge_config_folder
OldCentOS# /path/to/save_sge_config.sh sge_config_folder

5) Copy over your config folder

6) Edit sge_config_folder/cell/qmaster and change the old hostname (OldCentOS) to the new hostname (NewCentOS). Otherwise you’ll get an error like this:

Upgrade must be started on a qmaster host!
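The hostname swap in step 6 can also be scripted. The following is only a sketch, assuming the saved file stores the master hostname as plain text. `replace_host` is a hypothetical helper, not something shipped with SGE, and it keeps the original file as a `.bak` copy.

```shell
# Hypothetical helper (not shipped with SGE): replace every occurrence of
# the old qmaster hostname in a saved configuration file.
# Usage: replace_host FILE OLD_HOST NEW_HOST
replace_host() {
    file=$1
    # Escape dots so a name like "old.example.com" is matched literally.
    old_re=$(printf '%s' "$2" | sed 's/[.]/\\./g')
    new=$3
    cp "$file" "$file.bak" || return 1      # keep the original as FILE.bak
    sed "s/$old_re/$new/g" "$file.bak" > "$file"
}

# Example with the hostnames used in this post:
# replace_host /path/to/config_folder/cell/qmaster OldCentOS NewCentOS
```

Compare the file with its `.bak` copy before running the upgrade, in case the old hostname appeared somewhere you did not expect.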

7) Run upgrade

NewCentOS# $SGE_ROOT/inst_sge -upd

8) Follow prompts. This is when you should reach for that planning checklist from step 1.

9) Be patient. Depending on the size of your configuration, certain portions might take a long time.

10) Check your install

NewCentOS# ps -ef | grep sge
NewCentOS# qstat -f


—————————————————————————————


Upgrading From a Previous Version of Sun Grid Engine Software


About Upgrading the Software

Note

The upgrade procedure is now partly destructive. See the constraints.

The LD_LIBRARY_PATH variable is not set in Grid Engine 6.2 software. Remove the existing LD_LIBRARY_PATH settings from 6.0 before you start a 6.2 installation.

Before you begin the upgrade process, make sure that you source the existing $SGE_ROOT/$SGE_CELL/common/settings.sh or $SGE_ROOT/$SGE_CELL/common/settings.csh file.

The upgrade procedure uses the cluster configuration information from the older version of the software to install the Grid Engine 6.2 software on the master host. Beginning with the Sun Grid Engine 6.2 release, you can install 6.2 to a different $SGE_ROOT or $SGE_CELL and transfer the old configuration to this cluster. This method is called cloned cluster configuration. You might want to use this method to accomplish the following:


To test the upgrade before making the real upgrade.

To keep the old cluster running.

Before You Upgrade

Choose one of the following methods to upgrade to 6.2:


New 6.2 installation (different $SGE_ROOT or $SGE_CELL) using the same configuration as was used for the old cluster (cloned cluster configuration).

If you use the cloned cluster configuration, you do not have to stop or in any way affect the original cluster. You simply install a new qmaster and transfer the configuration from the old cluster to the new one. Then, you manually restart the new execution daemons on all the original execution hosts.

The disadvantage of the cloned configuration method is that you have to install the new qmaster and might lose some of the configuration information during the upgrade (see the constraints). Another disadvantage is that the original execution host will now have twice as many slots – one set for the old cluster and one for the new one.

Real upgrade of the existing cluster (same $SGE_ROOT and $SGE_CELL.)

Constraints

The following constraints apply to both upgrade methods:


Dynamic and static load values will be lost (only static values will be recreated).

The sharetree usage will be lost.

Neither jobs nor advanced reservations (ARs) will be replicated.

There might be running or pending jobs in the cluster when the configuration is saved. If you decide to install the new Sun Grid Engine version in the same $SGE_ROOT and $SGE_CELL, then you must remove all jobs from the old cluster before the old cluster is shut down and the new software is installed.

The previous state of a disabled queue will be lost if the queue config initial_state is set to default.

Additional Constraints for the New 6.2 Installation with Cloned Configuration

For the cloned cluster configuration, you must also define several new variables and directories that must be different from the original settings:


$SGE_ROOT

$SGE_CELL

$SGE_CLUSTER_NAME

$SGE_QMASTER_PORT

$SGE_EXECD_PORT

Master daemon spooling directory (qmaster_spool_dir)

Execution daemon spooling directory (execd_spool_dir)

Group ID range for the jobs (gid_range)
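Before starting a cloned installation, it can help to compare the planned values against the original cluster's. The sketch below is only an illustration: `must_differ` is a hypothetical helper, and the port numbers and cluster names are made-up example values, not taken from a real cluster.

```shell
# Hypothetical pre-flight check: a setting of the cloned cluster must
# differ from the original cluster's value.
# Usage: must_differ NAME OLD_VALUE NEW_VALUE
must_differ() {
    if [ "$2" = "$3" ]; then
        echo "ERROR: $1 is '$3' in both clusters" >&2
        return 1
    fi
    echo "ok: $1 ($2 -> $3)"
}

# Illustrative values only:
must_differ SGE_QMASTER_PORT 6444 6446
must_differ SGE_EXECD_PORT 6445 6447
must_differ SGE_CLUSTER_NAME p6444 p6446
```

The same call works for the spool directories and gid_range, as long as you supply the old and new values yourself.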

Caution

Only one SGE_Helper_Service.exe can run on an execution host. You cannot use the same Windows execution host for a 6.0 or 6.1 cluster and a 6.2 cluster.

Note

Because there have been significant changes in the Grid Engine 6.2 software, loading the configuration adds and removes some configuration attributes. Adding and removing configuration attributes might affect the operation of the cluster.

To ensure stability, you should always follow this process:

Upgrade to the new $SGE_ROOT or $SGE_CELL (cloned cluster configuration).

Test that the original cluster configuration did not change and that the functionality of the cluster remains intact.

Perform the real upgrade of the original cluster, if desired.

Back Up the Configuration of the Old Cluster

You can create this backup at any time before you start the upgrade procedure. The backup is the same for both types of upgrade procedure. To create the backup, at least the qmaster daemon must be running.


What the Backup Contains

The backup saves the following files:


arseqnum

jobseqnum

act_qmaster

bootstrap

cluster_name

host_aliases

qtask

sge_aliases

sge_ar_request

sge_request

sge_qstat

sge_qquota

shadow_masters

accounting

dbwriter.conf

jmx directory

Caution

During the upgrade procedure, you can select the next job ID. Do not select a job ID that is less than the last job ID in the accounting file in the backup. If you do, the accounting file will contain some job IDs twice. This leads to unexpected behaviors.

To avoid the problem, accept the suggested default for the next job ID. The upgrade procedure calculates a safe value for the default.
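If you do want to choose the next job ID yourself, you can first look up the highest job ID already recorded. This is a minimal sketch, assuming the colon-separated layout described in the accounting(5) man page, where the job number is the sixth field. `last_job_id` is a hypothetical helper name.

```shell
# Hypothetical helper: print the highest job ID recorded in an SGE
# accounting file, so the next job ID can be chosen above it.
# Assumes the job number is the sixth colon-separated field and that
# lines starting with "#" are comments.
last_job_id() {
    awk -F: '!/^#/ && ($6 + 0) > max { max = $6 + 0 } END { print max + 0 }' "$1"
}

# Example: last_job_id "$SGE_ROOT/$SGE_CELL/common/accounting"
```

Any value above the number printed here avoids duplicate job IDs in the accounting file.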

The backup process creates the following files:


sge_root – old $SGE_ROOT

sge_cell – old $SGE_CELL

ports – old $SGE_QMASTER_PORT and $SGE_EXECD_PORT

win_hosts – A list of registered Windows execution hosts at the time of the backup

The standard qconf client is used to save the complete cluster configuration.


How to Back Up the Cluster

Either download the backup script or get the backup script from the Sun Grid Engine 6.2 common package (util/upgrade_modules/save_sge_config.sh).

(Optional) Verify that the script is executable.

Source the $SGE_ROOT/$SGE_CELL/common/settings.sh (or .csh) file of the original cluster.

Run the backup script.

The backup script has one argument, which is the path to the directory in which to store the backup. The directory must not already exist, but the user must have permission to create it.

Note

You must run the backup script on an admin host (qconf -sh) as a manager or operator user (typically sgeadmin).

# ./save_sge_config.sh /backups/sge_6.1_June10_2008

The backup process displays a message confirming that the backup succeeded.
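The rule about the target directory is easy to trip over when you rerun a backup. The following sketch checks it up front. `backup_target_ok` is a hypothetical helper and not part of save_sge_config.sh.

```shell
# Hypothetical check: the backup script expects a target directory that
# does not exist yet, inside a parent the current user can write to.
# Usage: backup_target_ok DIR
backup_target_ok() {
    if [ -e "$1" ]; then
        echo "ERROR: $1 already exists" >&2
        return 1
    fi
    parent=$(dirname "$1")
    if [ ! -d "$parent" ] || [ ! -w "$parent" ]; then
        echo "ERROR: cannot create a directory in $parent" >&2
        return 1
    fi
}

# Example, reusing the path from the command above:
# backup_target_ok /backups/sge_6.1_June10_2008 && ./save_sge_config.sh /backups/sge_6.1_June10_2008
```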

How to Install the 6.2 Software Using the Cloned Cluster Configuration Method

Caution

Do not make both the new cluster and the old cluster available to your users. If you do, execution hosts would offer the original number of slots to both clusters and might become overloaded.

Back up the original cluster settings as described in How to Back Up the Cluster.

(Optional) ARCo Upgrade Prerequisites

If you use ARCo and want the data from the old and new clusters in the same ARCo database, do not install dbwriter on the new cluster with the old dbwriter’s database parameters until the old cluster’s dbwriter has been stopped and all data from the old cluster has been inserted into the database. After you install dbwriter (with the same database parameters) on the new cluster, never start the dbwriter on the old cluster again; otherwise your database will be compromised.

Wait to install ARCo on the new cluster until all the jobs are drained from the old cluster, the cluster is stopped and the old reporting file is processed completely.

There should be no reporting or reporting.processing file in the $SGE_ROOT/$SGE_CELL/common directory of the old cluster.
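A quick way to confirm that condition is a small file check. This is a sketch only. `reporting_drained` is a hypothetical helper that takes the common directory of the old cluster as its argument.

```shell
# Hypothetical check: succeed only when neither "reporting" nor
# "reporting.processing" is left in the given common directory.
# Usage: reporting_drained COMMON_DIR
reporting_drained() {
    for f in reporting reporting.processing; do
        if [ -e "$1/$f" ]; then
            echo "still present: $1/$f" >&2
            return 1
        fi
    done
    return 0
}

# Example: reporting_drained "$SGE_ROOT/$SGE_CELL/common" && echo "reporting file processed"
```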

Note

Jobs can be submitted and the reporting file generated on the new cluster, as long as there is no dbwriter installed on the new cluster.

Caution

There cannot be more than one dbwriter process writing into the same ARCo database and schema.

If you create a new ARCo database for the new cluster, you cannot later merge it with the old ARCo database, due to the primary key constraints.

Once the reporting file on the old cluster is processed, on the dbwriter host:

Source the cluster settings.sh (or .csh) file.

Stop the dbwriter:

# $SGE_ROOT/$SGE_CELL/common/sgedbwriter stop



Extract the new 6.2 binaries and common files to the new $SGE_ROOT directory.


Start the new upgrade installation of the qmaster from the new $SGE_ROOT directory.

# ./inst_sge -upd

This starts the upgrade procedure. See the Example Upgrade for Cloned Cluster Configuration.

Tip

To enable or disable additional features such as JMX or CSP, or to use the old IJS, pass additional flags to the upgrade script in the same way that you would for a qmaster installation. For example, to upgrade a cluster, enable the JMX thread in qmaster, and use CSP mode, run:

./inst_sge -upd -jmx -csp


Accept the displayed license.


Enter the complete path to the backup directory.

For example, /backups/sge_6.1_June10_2008. See Step 6 in the example.


Enter the new $SGE_ROOT directory.

The default is the current directory. For more information, see SGE_ROOT. See Step 7 in the example.


Select a new $SGE_CELL directory.

The default is the $SGE_CELL directory from the backup. For more information, see SGE_CELL. See Step 8 in the example.


Select a new SGE_QMASTER_PORT number.

The default is the $SGE_QMASTER_PORT number from the backup + 2. See Step 9 in the example.


Select a new SGE_EXECD_PORT number.

The default is the $SGE_EXECD_PORT number from the backup + 2. See Step 10 in the example.


Select a new qmaster spooling directory

The default is $SGE_ROOT/$SGE_CELL/spool/qmaster. See Step 11 in the example.


Select a new $SGE_CLUSTER_NAME.

The default is p$SGE_QMASTER_PORT. For more information, see SGE_CLUSTER_NAME. See Step 12 in the example.


(Optional) Choose the JMX configuration.

For more information about JMX, see JMX guide.

If you started the upgrade using the -jmx option, one of the following choices appears:

Choose if you want to use JMX settings from the backup or use new settings.

This question appears when JMX exists in the backup.

Choose a JMX port.

This question appears when JMX does not exist in the backup.


Select a spooling method.

For more information on choosing a spooling mechanism, see Choosing Between Classic Spooling and Database Spooling. See Step 14 in the example.


Choose if you want to use interactive jobs support (IJS) settings from the backup or use the new defaults for 6.2.

In most cases, you should use the new defaults which enable the new interactive jobs support. Step 15 in the example shows the new defaults.

Caution

If you changed the QLOGIN_DAEMON, QLOGIN_COMMAND, RLOGIN_DAEMON, RLOGIN_COMMAND, RSH_DAEMON, or RSH_COMMAND configuration attributes, you should verify that the new IJS will not break your site-specific settings.



Choose the group id range

The default is the last group id from the backup + 100 and same range. See Step 16 in the example.


Select the next job ID.

The default is old jobseqnum + 1000, rounded up to the nearest 1000. See Step 17 in the example.
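The arithmetic behind that default can be sketched in shell. The example upgrade later in this post shows a last job ID of 1 producing a suggestion of 2000, which the function below reproduces. `suggest_next_id` is a hypothetical name, and how the installer treats a value that already sits on a multiple of 1000 is an assumption here.

```shell
# Sketch of the default described above: add 1000 to the last ID,
# then round up to the next multiple of 1000.
# Assumption: a sum that is already a multiple of 1000 is left unchanged.
suggest_next_id() {
    echo $(( ($1 + 1000 + 999) / 1000 * 1000 ))
}

suggest_next_id 1      # prints 2000
suggest_next_id 4321   # prints 6000
```

The same calculation applies to the AR ID default.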


(Optional) Select the next AR ID.

This question appears only if arseqnum is in the backup. The default is old arseqnum + 1000, rounded up to the nearest 1000. See Step 18 in the example.


Select automatic startup options.

See Step 19 in the example.

One of the following choices appears:

Choose whether to run qmaster as an SMF service.

This question appears only on systems that run at least version 10 of the Solaris OS.

Choose whether to use RC scripts for qmaster.

This question appears on platforms that are not running at least version 10 of the Solaris OS or if you started the upgrade using the -nosmf option.


Load the old configuration.

See Step 20 in the example.

If this step fails with a critical error:

Check the log file /tmp/sge_backup_date.log.

Try to reload the configuration through the $SGE_ROOT/util/upgrade_modules/load_sge_config.sh script and the arguments displayed in the previous step.

If the preceding steps do not resolve the problem, stop the upgrade process.
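When several upgrade attempts have left logs in /tmp, finding the newest one by hand gets tedious. The sketch below picks the most recently modified file. `latest_upgrade_log` is a hypothetical helper, and the `sge_backup*.log` pattern is an assumption based on the log names that appear in this post.

```shell
# Hypothetical helper: print the most recently modified upgrade log.
# Assumption: log names start with "sge_backup" and end in ".log";
# adjust the pattern if your logs are named differently.
# Usage: latest_upgrade_log [DIR]   (DIR defaults to /tmp)
latest_upgrade_log() {
    ls -t "${1:-/tmp}"/sge_backup*.log 2>/dev/null | head -n 1
}

# Example: tail -n 50 "$(latest_upgrade_log)"
```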


(Optional) Upgrade ARCo.

If you use ARCo, you need to upgrade it. If you want to use the same ARCo database, copy $SGE_ROOT/$SGE_CELL/common/dbwriter.conf from the old cluster into the same directory on the new cluster. The file is sourced, and you are prompted only for missing information during the installation of dbwriter. See Upgrading ARCo step 6.


Run the post upgrade procedures

Info

The post-upgrade procedures are easier when you have root access to all machines through ssh or rsh without having to enter a password. To use rsh instead of the default ssh, run the ./inst_sge command with -rsh argument. Example:

# ./inst_sge -upd-execd -rsh

Initialize the local execd spool directories

This step creates the local execd spool directories on the execd hosts with the correct permissions. Run the following command as root from the master host in $SGE_ROOT directory:

# ./inst_sge -upd-execd

(Optional) Create new RC scripts for the whole cluster.

Caution

This command removes old RC scripts. To keep the old RC scripts, do not run this command.

To start the services automatically after a reboot, run the following command as root from the master host in $SGE_ROOT directory:

# ./inst_sge -upd-rc

(Optional) Install or update the Windows helper service.

Perform this step to use the Windows execution hosts with the 6.2 cluster. When connecting to each Windows execution host, you are prompted for an administrator user to connect to the Windows host. If all your Windows hosts share the same administrative user, set the environment variable SGE_WIN_ADMIN to that user to access all Windows hosts without additional user intervention. Example:

(sh, bash)# export SGE_WIN_ADMIN=Administrator

(csh,tcsh)# setenv SGE_WIN_ADMIN Administrator

To install or update the Windows helper service, run the following command as root from the master host in $SGE_ROOT directory:

# ./inst_sge -upd-win

Caution

Only one SGE_Helper_Service.exe can run on an execution host. You cannot use the same Windows execution host for a 6.0 or 6.1 cluster and a 6.2 cluster.



Start the new execution daemons.

Optionally, if you can log in without typing a password, you can start the whole cluster as the root user from the $SGE_ROOT directory with a single command:

# ./inst_sge -start-all

This command starts the master daemon, shadow daemons, and all execution daemons.

Upgrade is complete.


How to Upgrade the Original Cluster to 6.2 Software (Real Upgrade)

(Optional) Test the cloned cluster, if you used the cloned cluster configuration method to transfer the configuration to a new 6.2 cluster.


Back up the original cluster settings as described in How to Back Up the Cluster.

Stop the scheduler:

# qconf -ks


Verify that no jobs are running on the cluster.
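One way to do that check is to count the job rows that qstat reports for all users. This is a sketch under two assumptions: that the default `qstat -u '*'` listing prints two header lines before the job rows, and that it prints nothing when no jobs exist. `count_jobs` is a hypothetical helper.

```shell
# Hypothetical helper: count the job rows that qstat reports for all users.
# Assumption: two header lines precede the job rows, and the output is
# empty when there are no jobs at all.
count_jobs() {
    qstat -u '*' | awk 'NR > 2 { n++ } END { print n + 0 }'
}

# Example: [ "$(count_jobs)" -eq 0 ] && echo "no jobs left"
```

Note that this counts pending jobs as well as running ones, which is what you want before shutting the old cluster down.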


Stop the old cluster:

# qconf -ke all

# $SGE_ROOT/$SGE_CELL/common/sgemaster stop


(Optional) Stop the Berkeley DB server, if your cluster uses Berkeley DB server spooling.

On the BDB server host:

Source the cluster settings.sh (or .csh) file.

Type the following command:

# $SGE_ROOT/$SGE_CELL/common/sgebdb stop


(Optional) If you use ARCo, ensure that the reporting file has been completely processed by the dbwriter.

There should be no reporting or reporting.processing file in the $SGE_ROOT/$SGE_CELL/common directory.

Once the reporting file is processed, on the dbwriter host:

Source the cluster settings.sh (or .csh) file.

Stop the dbwriter:

# $SGE_ROOT/$SGE_CELL/common/sgedbwriter stop

Warning

If you use ARCo, you must completely process the reporting file and stop the dbwriter before you continue.


Extract the new 6.2 binaries and common files to the $SGE_ROOT directory.

Caution

Do not remove any of the $SGE_ROOT directory contents, except for the case where the new Sun Grid Engine 6.2 binaries differ from the existing installation. For example, you might have used your custom lx26-amd64 binaries, but Sun Grid Engine 6.2 uses lx24-amd64 even for 2.6 kernels. In that case you must remove the old binaries manually!

You must ensure that all binaries for all used architectures were updated and no architecture with the old version remains in the $SGE_ROOT directory.



Start the new upgrade on the original qmaster host from the $SGE_ROOT directory.

# ./inst_sge -upd

Tip

To enable or disable some additional features like JMX, CSP, or to use the old IJS, you must provide additional flags to the upgrade script in the same way that you would for qmaster installation. For example, to upgrade a cluster and enable the JMX thread in qmaster and use CSP mode, run the following command: ./inst_sge -upd -jmx -csp


Accept the displayed license.


Enter the complete path to the backup directory.

For example, /backups/sge_6.1_June10_2008.


Caution

If you don’t specify the original $SGE_ROOT and $SGE_CELL in the next two steps, the installer will not attempt a real upgrade. It will use the cloned cluster configuration method instead.

Enter the $SGE_ROOT directory.

The default is the current directory. For more information, see SGE_ROOT.


Enter the $SGE_CELL directory.

The default is default. For more information, see SGE_CELL.


Select a new $SGE_CLUSTER_NAME.

The default value is one of the following, depending on which is found first:

The existing SGE_CLUSTER_NAME ($SGE_ROOT/$SGE_CELL/common/cluster_name)

The SGE_CLUSTER_NAME from the backup

p$SGE_QMASTER_PORT

For more information, see SGE_CLUSTER_NAME.


(Optional) Select the JMX configuration.

For more information about JMX, see JMX guide.

If you started the upgrade using the -jmx option, one of the following choices appears:

Choose if you want to use JMX settings from the backup or use new settings.

This question appears when JMX exists in the backup.

Choose a JMX port.

This question appears when JMX does not exist in the backup.


Choose if you want to keep the spooling method from the backup.


(Optional) Select a spooling method.

This is displayed if you chose not to use backup in the previous screen. See example. For more information on choosing a spooling mechanism, see Choosing Between Classic Spooling and Database Spooling.


Choose if you want to use interactive jobs support (IJS) settings from the backup or use the new defaults for 6.2.

In most cases, you should use the new defaults which enable the new interactive jobs support.

Caution

If you changed the QLOGIN_DAEMON, QLOGIN_COMMAND, RLOGIN_DAEMON, RLOGIN_COMMAND, RSH_DAEMON, or RSH_COMMAND configuration attributes, you should verify that the new IJS will not break your site-specific settings.


Select the next job ID.

The default is old jobseqnum + 1000, rounded up to the nearest 1000.


(Optional) Select the next AR ID.

This question appears only if arseqnum is in the backup. The default is old arseqnum + 1000, rounded up to the nearest 1000.


Choose automatic startup options.

One of the following choices appears:

Choose whether to run qmaster as an SMF service.

This question appears only on systems that run at least version 10 of the Solaris OS.

Choose whether to use RC scripts for qmaster.

This question appears on platforms that are not running at least version 10 of the Solaris OS or if you started the upgrade using the -nosmf option.


Load the old configuration.

If this step fails with a critical error:

Check the log file /tmp/sge_backup_date.log.

Try to reload the configuration through the $SGE_ROOT/util/upgrade_modules/load_sge_config.sh script and the arguments displayed in the previous step.

If the preceding steps do not resolve the problem, stop the upgrade process.


(Optional) Copy the binaries and the common directory to all the hosts in the cluster, if not on a shared file system

If you use local binaries or a local common directory for each host, you must copy all the new binaries and the common directory locally to each host. Ensure that all binaries are updated and no architecture with the old version remains in the $SGE_ROOT directory.

Note

If you do not perform this operation the qmaster host will have Sun Grid Engine 6.2 binaries, while the rest of the cluster will still have the old version and will not work as desired!



(Optional) Upgrade ARCo.

If you use ARCo, you need to upgrade it. See Upgrading ARCo step 6.


Run the post upgrade procedures

Info

The post-upgrade procedures are easier when you have root access to all machines through ssh or rsh without having to enter a password. To use rsh instead of the default ssh, run the ./inst_sge command with -rsh argument. Example:

# ./inst_sge -upd-execd -rsh

Initialize the local execd spool directories

This step creates the local execd spool directories on the execd hosts with the correct permissions. Run the following command as root from the master host in $SGE_ROOT directory:

# ./inst_sge -upd-execd

(Optional) Create new RC scripts for the whole cluster.

Caution

This command removes old RC scripts. To keep the old RC scripts, do not run this command.

To start the services automatically after a reboot, run the following command as root from the master host in $SGE_ROOT directory:

# ./inst_sge -upd-rc

(Optional) Install or update the Windows helper service.

Perform this step to use the Windows execution hosts with the 6.2 cluster. When connecting to each Windows execution host, you are prompted for an administrator user to connect to the Windows host. If all your Windows hosts share the same administrative user, set the environment variable SGE_WIN_ADMIN to that user to access all Windows hosts without additional user intervention. Example:

(sh, bash)# export SGE_WIN_ADMIN=Administrator

(csh,tcsh)# setenv SGE_WIN_ADMIN Administrator

To install or update the Windows helper service, run the following command as root from the master host in $SGE_ROOT directory:

# ./inst_sge -upd-win

Caution

Only one SGE_Helper_Service.exe can run on an execution host. You cannot use the same Windows execution host for a 6.0 or 6.1 cluster and a 6.2 cluster.



Start the new execution daemons.

Optionally, if you can log in without typing a password, you can start the whole cluster as the root user from the $SGE_ROOT directory with a single command:

# ./inst_sge -start-all

This command starts the master daemon, shadow daemons, and all execution daemons.


—————————————————————————————-


Example Upgrade for Cloned Cluster Configuration

The following upgrade example uses a copy of the existing cluster configuration with a different $SGE_CELL. This example does not use JMX and there are no Service Tags. The steps in this example are referred to from the software upgrade description at How to Install the 6.2 Software Using the Cloned Cluster Configuration Method.



Steps 4 and 5

# ./inst_sge -upd


Welcome to the Grid Engine Upgrade Procedure

——————————————–


Before you continue with the upgrade, read these hints:


   – Your terminal window should have a size of at least

     80×24 characters


   – At any time during the upgrade process, use your standard

     interrupt key to abort the upgrade. Typically, the interrupt

     key combination is Ctrl-C.


The upgrade procedure will take approximately 1-2 minutes.


Hit <RETURN> to continue >>


Step 6

Type the complete path to the Grid Engine configuration backup directory.

————————————————————————-

Backup directory  >> /tmp/bck


Found backup from GE 6.1u4 version created on 2008-06-10_10:56:29

Continue with this backup directory (y/n) [y] >>


Step 7

The Grid Engine root directory is:


   $SGE_ROOT = /sge


If this directory is not correct (e.g. it may contain an automounter

prefix) enter the correct path to this directory or hit <RETURN>

to use default [/sge] >>


Your $SGE_ROOT directory: /sge


Hit <RETURN> to continue >>


Step 8

Grid Engine cells

—————–


Grid Engine supports multiple cells.


If you are not planning to run multiple Grid Engine clusters or if you don’t

know yet what is a Grid Engine cell it is safe to keep the default cell name


   default


If you want to install multiple cells you can enter a cell name now.


The environment variable


   $SGE_CELL=<your_cell_name>


will be set for all further Grid Engine commands.


Enter cell name [default] >> new_cell


Using cell >new_cell<.

Hit <RETURN> to continue >>


Step 9

Grid Engine TCP/IP communication service

—————————————-


The port for sge_qmaster is currently set by the shell environment.


   SGE_QMASTER_PORT = 21640


Now you have the possibility to set/change the communication ports by

using the

 >shell environment< or you may configure it via a network service,

configured

in local >/etc/service<, >NIS< or >NIS+<, adding an entry in the form


    sge_qmaster <port_number>/tcp


to your services database and make sure to use an unused port number.


How do you want to configure the Grid Engine communication ports?


Using the >shell environment<:                           [1]


Using a network service like >/etc/service<, >NIS/NIS+<: [2]


(default: 1) >>


Grid Engine TCP/IP communication service

—————————————-


Using the environment variable


   $SGE_QMASTER_PORT=21640


as port for communication.


Do you want to change the port number? (y/n) [n] >>


Step 10

Grid Engine TCP/IP communication service

—————————————-


The port for sge_execd is currently set by the shell environment.


   SGE_EXECD_PORT = 21641


Now you have the possibility to set/change the communication ports by

using the

 >shell environment< or you may configure it via a network service,

configured

in local >/etc/service<, >NIS< or >NIS+<, adding an entry in the form


    sge_execd <port_number>/tcp


to your services database and make sure to use an unused port number.


How do you want to configure the Grid Engine communication ports?


Using the >shell environment<:                           [1]


Using a network service like >/etc/service<, >NIS/NIS+<: [2]


(default: 1) >>


Grid Engine TCP/IP communication service

—————————————-


Using the environment variable


   $SGE_EXECD_PORT=21641


as port for communication.


Do you want to change the port number? (y/n) [n] >>


Step 11

Grid Engine qmaster spool directory

———————————–


The qmaster spool directory is the place where the qmaster daemon stores

the configuration and the state of the queuing system.


The admin user >sgeadmin< must have read/write access

to the qmaster spool directory.


If you will install shadow master hosts or if you want to be able to start

the qmaster daemon on other hosts (see the corresponding section in the

Grid Engine Installation and Administration Manual for details) the account

on the shadow master hosts also needs read/write access to this directory.


The following directory


[/sge/new_cell/spool/qmaster]


will be used as qmaster spool directory by default!


Do you want to select another qmaster spool directory (y/n) [n] >>


Step 12

Unique cluster name

——————-


The cluster name uniquely identifies a specific Sun Grid Engine cluster.

The cluster name must be unique throughout your organization. The name

is not related to the SGE cell.


The cluster name must start with a letter ([A-Za-z]), followed by letters,

digits ([0-9]), dashes (-) or underscores (_).


Enter new cluster name or hit <RETURN>

to use default [p21640] >>


Your $SGE_CLUSTER_NAME: p21640


Hit <RETURN> to continue >>


Step 14

creating directory: /sge/new_cell/spool/qmaster/job_scripts


Setup spooling

————–

Your SGE binaries are compiled to link the spooling libraries

during runtime (dynamically). So you can choose between Berkeley DB

spooling and Classic spooling method.

Please choose a spooling method (berkeleydb|classic) [berkeleydb] >> classic


Initializing spooling database


Hit <RETURN> to continue >>


Step 15

Interactive Job Support (IJS) Selection

—————————————


The backup configuration includes information for running

interactive jobs. Do you want to use the IJS information from

the backup (‘y’) or use new default values (‘n’) (y/n) [y] >> n



Using new interactive job support default setting for a new installation.

Hit <RETURN> to continue >>


Creating >act_qmaster< file


Step 16

Grid Engine group id range

————————–


When jobs are started under the control of Grid Engine an additional

group id is set on platforms which do not support jobs. This is done

to provide maximum control for Grid Engine jobs.


This additional UNIX group id range must be unused group id’s in your

system. Each job will be assigned a unique id during the time it is

running. Therefore you need to provide a range of id’s which will

be assigned dynamically for jobs.


The range must be big enough to provide enough numbers for the

maximum number of Grid Engine jobs running at a single moment on

a single host. E.g. a range like >20000-20100< means, that Grid Engine

will use the group ids from 20000-20100 and provides a range for

100 Grid Engine jobs at the same time on a single host.


You can change at any time the group id range in your cluster configuration.


Please enter a range [34299-34498] >>


Using >34299-34498< as gid range. Hit <RETURN> to continue >>


Grid Engine cluster configuration

———————————


Please give the basic configuration parameters of your Grid Engine

installation:


   <execd_spool_dir>


The pathname of the spool directory of the execution hosts. User >sgeadmin<

must have the right to create this directory and to write into it.


Default: [/sge/new_cell/spool] >>


Grid Engine cluster configuration (continued)

———————————————


<administrator_mail>


The email address of the administrator to whom problem reports are sent.


It is recommended to configure this parameter. You may use >none<

if you do not wish to receive administrator mail.


Please enter an email address in the form >user@foo.com<.


Default: [sgeadmin@qmaster.com] >>


The following parameters for the cluster configuration were configured:


   execd_spool_dir        /sge/new_cell/spool

   administrator_mail     sgeadmin@qmaster.com


Do you want to change the configuration parameters (y/n) [n] >>


Step 17

Provide a value to use for the next job ID.

——————————————-


Backup contains last job ID 1. As a suggested value, we added 1000

to that number and rounded it up to the nearest 1000.

Increase the value, if appropriate.

Choose the new next job ID [2000] >>


Hit <RETURN> to continue >>


Step 18

Provide a value to use for the next AR ID.

——————————————


Backup contains last AR ID 1. As a suggested value, we added 1000

to that number and rounded it to the nearest 1000.

Increase the value, if appropriate.

Choose the new next AR ID [2000] >>


Hit <RETURN> to continue >>


Step 19

Creating >sgemaster< script

Creating >sgeexecd< script

Creating settings files for >.profile/.cshrc<


Hit <RETURN> to continue >>


qmaster startup script

———————-


Do you want to start qmaster automatically at machine boot?

NOTE: If you select “n” SMF will be not used at all! (y/n) [y] >> n



Grid Engine qmaster startup

—————————


Starting qmaster daemon. Please wait …

   starting sge_qmaster

Hit <RETURN> to continue >>


Step 20

Last step – load configuration from the backup

———————————————-


load command: /sge/util/upgrade_modules/load_sge_config.sh /tmp/bck -mode "copy" -log C -newijs "false" -gid_range "34299-34498" -admin_mail "sgeadmin@qmaster.com" -execd_spool_dir "/sge/new_cell/spool"



Hit <RETURN> to continue >>




Loading saved cluster configuration from /tmp/bck (log in /tmp/sge_backup_load_2008-06-13_17:42:28.log)…

Done


If loading the configuration succeeded run these additional commands:

REQUIRED:

inst_sge -upd-execd

   This command initializes all execd spool directories.


inst_sge -upd-win

   This command connects to all Windows execution hosts and installs

   the new Windows helper service on each host.

   WARNING: If a helper service from a previous release is running

            on this host, the new helper service overwrites it. The

            host will run only in a 6.2 cluster.

   TIP: This action requires to enter a windows administrator user for each

        host interactively. If all your systems share the same administrator you

        can set the environment variable SGE_WIN_ADMIN to that user name.

        E.g.: (sh, bash) export SGE_WIN_ADMIN=Administrator

              (csh,tcsh) setenv SGE_WIN_ADMIN Administrator


OPTIONAL:

inst_sge -upd-rc

   This command creates new autostart scripts for the new cluster

   and removes any conflicting files.

   TIP: To disable SMF on Solaris systems, use the command

        inst_sge -upd-rc -nosmf


TIP: Use inst_sge -post-upd to do all above actions


서진우

Managing Director (CTO) at Clunix, a supercomputing specialist company / Information Systems Auditor / Operator of the Syszone blog
