How to upgrade Sun Grid Engine
How to Upgrade Sun Grid Engine (SGE) and Migrate to New Server
Sometimes you want to upgrade software and migrate hardware at the same time. If you want to do that with SGE, then you’re looking in the right place.
My architecture: old server SGE 6.1u2 on CentOS 5, migrating to SGE 6.2u3 on CentOS 5.3.
Upgrade Procedure
1) Download SGE onto the new server. If you’re feeling farsighted, fill out the Planning Checklist.
2) Unzip, untar, and set $SGE_ROOT to your untar’ed folder.
NewCentOS# export SGE_ROOT=/directory/to/sge/
Note: You may consider putting SGE root in an NFS directory in case you want to create a “high availability” fail-over environment. NFS may very well slow you down.
3) Find save_sge_config.sh and copy it over to the old host.
4) Create a copy of your configuration using save_sge_config.sh
OldCentOS# mkdir sge_config_folder
OldCentOS# /path/to/save_sge_config.sh sge_config_folder
5) Copy your config folder over to the new server.
6) Edit sge_config_folder/cell/qmaster and change the old hostname (OldCentOS) to the new hostname (NewCentOS). Otherwise you’ll get an error like this:
Upgrade must be started on a qmaster host!
7) Run upgrade
NewCentOS# $SGE_ROOT/inst_sge -upd
8) Follow prompts. This is when you should reach for that planning checklist from step 1.
9) Be patient. Depending on the size of your configuration, certain portions might take a long time.
10) Check your install
NewCentOS# ps -ef | grep sge
NewCentOS# qstat -f
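Steps 5 and 6 can be sketched in shell roughly as follows. This is only an illustration, not part of the SGE tooling: the function name rewrite_qmaster_host is made up, and the sketch assumes the file named in step 6 holds the hostname on a line of its own. Copying the folder between hosts is shown only as a comment because the host names and paths depend on your site.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with SGE): swap the old qmaster hostname
# for the new one in a saved configuration file.
# Assumes the hostname sits on a line of its own in that file.
# Usage: rewrite_qmaster_host FILE OLD_HOST NEW_HOST
rewrite_qmaster_host() {
    rq_file=$1
    rq_old=$2
    rq_new=$3
    # Keep the untouched original next to the edited file.
    cp "$rq_file" "$rq_file.orig" || return 1
    # Replace only lines that match the old hostname exactly.
    awk -v old="$rq_old" -v new="$rq_new" \
        '{ if ($0 == old) print new; else print $0 }' \
        "$rq_file.orig" > "$rq_file"
}

# Step 5 (site-specific, shown only as a comment):
#   scp -r <config folder> NewCentOS:/some/path/
# Step 6, run on the new server:
#   rewrite_qmaster_host /some/path/<config folder>/cell/qmaster OldCentOS NewCentOS
```

The original file is kept with an .orig suffix, so a wrong hostname can be undone by copying it back.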
—————————————————————————————
Upgrading From a Previous Version of Sun Grid Engine Software
About Upgrading the Software
Note
The upgrade procedure is now partly destructive. See the constraints.
The LD_LIBRARY_PATH variable is not set in Grid Engine 6.2 software. Remove the existing LD_LIBRARY_PATH settings from 6.0 before you start a 6.2 installation.
Before you begin the upgrade process, make sure that you source the existing $SGE_ROOT/$SGE_CELL/common/settings.sh or $SGE_ROOT/$SGE_CELL/common/settings.csh file.
The upgrade procedure uses the cluster configuration information from the older version of the software to install the Grid Engine 6.2 software on the master host. Beginning with the Sun Grid Engine 6.2 release, you can install 6.2 to a different $SGE_ROOT or $SGE_CELL and transfer the old configuration to this cluster. This method is called cloned cluster configuration. You might want to use this method to accomplish the following:
To test the upgrade before making the real upgrade.
To keep the old cluster running.
Before You Upgrade
Choose one of the following methods to upgrade to 6.2:
New 6.2 installation (different $SGE_ROOT or $SGE_CELL) using the same configuration as was used for the old cluster (cloned cluster configuration).
If you use the cloned cluster configuration, you do not have to stop or in any way affect the original cluster. You simply install a new qmaster and transfer the configuration from the old cluster to the new one. Then, you manually restart the new execution daemons on all the original execution hosts.
The disadvantage of the cloned configuration method is that you have to install the new qmaster and might lose some of the configuration information during the upgrade (see the constraints). Another disadvantage is that the original execution host will now have twice as many slots – one set for the old cluster and one for the new one.
Real upgrade of the existing cluster (same $SGE_ROOT and $SGE_CELL).
Constraints
The following constraints apply to both upgrade methods:
Dynamic and static load values will be lost (only static values will be recreated).
The sharetree usage will be lost.
Neither jobs nor advanced reservations (ARs) will be replicated.
There might be running or pending jobs in the cluster when the configuration is saved. If you decide to install the new Sun Grid Engine version in the same $SGE_ROOT and $SGE_CELL, then you must remove all jobs from the old cluster before the old cluster is shut down and the new software is installed.
The previous state of a disabled queue will be lost if the queue config initial_state is set to default.
Additional Constraints for the New 6.2 Installation with Cloned Configuration
For the cloned cluster configuration, you must also define several new variables and directories that must be different from the original settings:
$SGE_ROOT
$SGE_CELL
$SGE_CLUSTER_NAME
$SGE_QMASTER_PORT
$SGE_EXECD_PORT
Master daemon spooling directory (qmaster_spool_dir)
Execution daemon spooling directory (execd_spool_dir)
Group ID range for the jobs (gid_range)
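Before starting a cloned installation, it can help to compare the planned values against the old ones. The sketch below is a hypothetical pre-flight check (check_distinct is not an SGE command) that reports every setting whose old and new values are identical. Because the method description above only requires a different $SGE_ROOT or $SGE_CELL, treat a clash on just one of those two as a prompt to double-check rather than a hard error.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of SGE): report every setting whose
# new value equals the old value.
# Arguments are triples: NAME OLD_VALUE NEW_VALUE ...
# Prints one line per clash and returns 1 if any clash was found.
check_distinct() {
    cd_status=0
    while [ "$#" -ge 3 ]; do
        if [ "$2" = "$3" ]; then
            echo "clash: $1 is '$2' in both clusters"
            cd_status=1
        fi
        shift 3
    done
    return "$cd_status"
}

# Example with made-up values:
#   check_distinct SGE_CELL default new_cell \
#                  SGE_QMASTER_PORT 21638 21640 \
#                  SGE_EXECD_PORT 21639 21641
```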
Caution
Only one SGE_Helper_Service.exe can run on an execution host. You cannot use the same Windows execution host for a 6.0 or 6.1 cluster and a 6.2 cluster.
Note
Because there have been significant changes in the Grid Engine 6.2 software, loading the configuration adds and removes some configuration attributes. Adding and removing configuration attributes might affect the operation of the cluster.
To ensure stability, you should always follow this process:
Upgrade to the new $SGE_ROOT or $SGE_CELL (cloned cluster configuration).
Test that the original cluster configuration did not change and that the functionality of the cluster remains intact.
Perform the real upgrade of the original cluster, if desired.
Back Up the Configuration of the Old Cluster
You can create this backup at any time before you start the upgrade procedure. The backup is the same for both types of upgrade procedure. To create the backup, at least the qmaster daemon must be running.
What the Backup Contains
The backup saves the following files:
arseqnum
jobseqnum
act_qmaster
bootstrap
cluster_name
host_aliases
qtask
sge_aliases
sge_ar_request
sge_request
sge_qstat
sge_qquota
shadow_masters
accounting
dbwriter.conf
jmx directory
Caution
During the upgrade procedure, you can select the next job ID. Do not select a job ID that is less than the last job ID in the accounting file in the backup. If you do, the accounting file will contain some job IDs twice. This leads to unexpected behaviors.
To avoid the problem, accept the suggested default for the next job ID. The upgrade procedure calculates a safe value for the default.
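The installation steps later in this document describe the default as the old jobseqnum + 1000, rounded up to the nearest 1000. The arithmetic can be sketched as below. The function name suggest_next_id is made up, and how the installer itself treats a value that is already a multiple of 1000 is not stated in the text, so this sketch simply leaves such a value unchanged after adding 1000.

```shell
#!/bin/sh
# Hypothetical helper: add 1000 to the last ID, then round up to the next
# multiple of 1000 (values already on a multiple stay where they are).
# Usage: suggest_next_id LAST_ID
suggest_next_id() {
    echo $(( ($1 + 1000 + 999) / 1000 * 1000 ))
}

suggest_next_id 1      # prints 2000
suggest_next_id 4321   # prints 6000
```

A last job ID of 1 gives 2000, which matches the suggested value shown in the example upgrade transcript.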
The backup process creates the following files:
sge_root – old $SGE_ROOT
sge_cell – old $SGE_CELL
ports – old $SGE_QMASTER_PORT and $SGE_EXECD_PORT
win_hosts – A list of registered Windows execution hosts at the time of the backup
The standard qconf client is used to save the complete cluster configuration.
How to Back Up the Cluster
Either download the backup script or get the backup script from the Sun Grid Engine 6.2 common package (util/upgrade_modules/save_sge_config.sh).
(Optional) Verify that the script is executable.
Source the $SGE_ROOT/$SGE_CELL/common/settings.sh (or .csh) file of the original cluster.
Run the backup script.
The backup script has one argument, which is the path to the directory in which to store the backup. The directory must not already exist, but the user must have permission to create it.
Note
You must run the backup script on an admin host (qconf -sh) as a manager or operator user (typically sgeadmin).
# ./save_sge_config.sh /backups/sge_6.1_June10_2008
The backup process displays a message confirming that the backup succeeded.
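Since the backup script expects a target directory that does not exist yet, a small pre-check can save a failed run. The following is a sketch under that assumption only; backup_target_ok is a made-up helper, not something shipped with Grid Engine.

```shell
#!/bin/sh
# Hypothetical pre-check for the backup target directory.
# Succeeds only if DIR does not exist yet and its parent is a writable directory.
# Usage: backup_target_ok DIR
backup_target_ok() {
    bt_dir=$1
    if [ -e "$bt_dir" ]; then
        echo "refusing: $bt_dir already exists" >&2
        return 1
    fi
    bt_parent=$(dirname "$bt_dir")
    if [ ! -d "$bt_parent" ] || [ ! -w "$bt_parent" ]; then
        echo "refusing: cannot create a directory in $bt_parent" >&2
        return 1
    fi
    return 0
}

# Typical use, after sourcing the old cluster's settings file:
#   . "$SGE_ROOT/$SGE_CELL/common/settings.sh"
#   backup_target_ok /backups/sge_6.1_June10_2008 && \
#       ./save_sge_config.sh /backups/sge_6.1_June10_2008
```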
How to Install the 6.2 Software Using the Cloned Cluster Configuration Method
Caution
Do not make both the new cluster and the old cluster available to your users. If you do, execution hosts would offer the original amount of slots for both clusters and might become overloaded.
Back up the original cluster settings as described in How to Back Up the Cluster.
(Optional) ARCo Upgrade Prerequisites
If you use ARCo and want the data from the old and new clusters in the same ARCo database, do not install the dbwriter on the new cluster with the old dbwriter’s database parameters until the dbwriter on the old cluster is stopped and all data from the old cluster has been inserted into the database. After you install the dbwriter on the new cluster with the same database parameters, never start the dbwriter on the old cluster again; otherwise your database will be compromised.
Wait to install ARCo on the new cluster until all the jobs are drained from the old cluster, the cluster is stopped and the old reporting file is processed completely.
There should be no reporting or reporting.processing file in the $SGE_ROOT/$SGE_CELL/common directory of the old cluster.
Note
Jobs can be submitted and the reporting file generated on the new cluster, as long as there is no dbwriter installed on the new cluster.
Caution
There cannot be more than one dbwriter process writing into the same ARCo database and schema.
If you create a new ARCo database for the new cluster, you cannot later merge it with the old ARCo database, due to the primary key constraints.
Once the reporting file on the old cluster is processed, on the dbwriter host:
Source the cluster settings.sh (or .csh) file.
Stop the dbwriter:
# $SGE_ROOT/$SGE_CELL/common/sgedbwriter stop
Extract the new 6.2 binaries and common files to the new $SGE_ROOT directory.
Start the new upgrade installation of the qmaster from the new $SGE_ROOT directory.
# ./inst_sge -upd
This starts the upgrade procedure. See the Example Upgrade for Cloned Cluster Configuration.
Tip
To enable or disable additional features such as JMX or CSP, or to use the old IJS, pass additional flags to the upgrade script the same way you would for a qmaster installation. For example, to upgrade a cluster and enable the JMX thread in qmaster together with CSP mode, run:
./inst_sge -upd -jmx -csp
Accept the displayed license.
Enter the complete path to the backup directory.
For example, /backups/sge_6.1_June10_2008. See Step 6 in the example.
Enter the new $SGE_ROOT directory.
The default is the current directory. For more information, see SGE_ROOT. See Step 7 in the example.
Select a new $SGE_CELL directory.
The default is the $SGE_CELL directory from the backup. For more information, see SGE_CELL. See Step 8 in the example.
Select a new SGE_QMASTER_PORT number.
The default is the $SGE_QMASTER_PORT number from the backup + 2. See Step 9 in the example.
Select a new SGE_EXECD_PORT number.
The default is the $SGE_EXECD_PORT number from the backup + 2. See Step 10 in the example.
Select a new qmaster spooling directory
The default is $SGE_ROOT/$SGE_CELL/spool/qmaster. See Step 11 in the example.
Select a new $SGE_CLUSTER_NAME.
The default is p$SGE_QMASTER_PORT. For more information, see SGE_CLUSTER_NAME. See Step 12 in the example.
(Optional) Choose the JMX configuration.
For more information about JMX, see JMX guide.
If you started the upgrade using the -jmx option, one of the following choices appears:
Choose if you want to use JMX settings from the backup or use new settings.
This question appears when JMX exists in the backup.
Choose a JMX port.
This question appears when JMX does not exist in the backup.
Select a spooling method.
For more information on choosing a spooling mechanism, see Choosing Between Classic Spooling and Database Spooling. See Step 14 in the example.
Choose if you want to use interactive jobs support (IJS) settings from the backup or use the new defaults for 6.2.
In most cases, you should use the new defaults which enable the new interactive jobs support. Step 15 in the example shows the new defaults.
Caution
If you changed QLOGIN_DAEMON, QLOGIN_COMMAND, RLOGIN_DAEMON, RLOGIN_COMMAND, RSH_DAEMON, or RSH_COMMAND configuration attributes, you should verify that the new IJS will not break your site-specific settings.
Choose the group id range
The default is the last group id from the backup + 100 and same range. See Step 16 in the example.
Select the next job ID.
The default is old jobseqnum + 1000, rounded up to the nearest 1000. See Step 17 in the example.
(Optional) Select the next AR ID.
This question appears only if arseqnum is in the backup. The default is old arseqnum + 1000, rounded up to the nearest 1000. See Step 18 in the example.
Select automatic startup options.
See Step 19 in the example.
One of the following choices appears:
Choose whether to run qmaster as an SMF service.
This question appears only on systems that run at least version 10 of the Solaris OS.
Choose whether to use RC scripts for qmaster.
This question appears on platforms that are not running at least version 10 of the Solaris OS or if you started the upgrade using the -nosmf option.
Load the old configuration.
See Step 20 in the example.
If this step fails with a critical error:
Check the log file /tmp/sge_backup_date.log.
Try to reload the configuration through the $SGE_ROOT/util/upgrade_modules/load_sge_config.sh script and the arguments displayed in the previous step.
If the preceding steps do not resolve the problem, stop the upgrade process.
(Optional) Upgrade ARCo.
If you use ARCo, you need to upgrade it. To keep using the same ARCo database, copy $SGE_ROOT/$SGE_CELL/common/dbwriter.conf from the old cluster into the same directory on the new cluster. The dbwriter installation sources this file and prompts you only for any missing information. See Upgrading ARCo step 6.
Run the post upgrade procedures
Info
The post-upgrade procedures are easier when you have root access to all machines through ssh or rsh without having to enter a password. To use rsh instead of the default ssh, run the ./inst_sge command with -rsh argument. Example:
# ./inst_sge -upd-execd -rsh
Initialize the local execd spool directories
This step creates the local execd spool directories on the execd hosts with the correct permissions. Run the following command as root from the master host in $SGE_ROOT directory:
# ./inst_sge -upd-execd
(Optional) Create new RC scripts for the whole cluster.
Caution
This command removes old RC scripts. To keep the old RC scripts, do not run this command.
To start the services automatically after a reboot, run the following command as root from the master host in $SGE_ROOT directory:
# ./inst_sge -upd-rc
(Optional) Install or update the Windows helper service.
Perform this step to use the Windows execution hosts with the 6.2 cluster. When connecting to each Windows execution host, you are prompted for an administrator user to connect to the Windows host. If all your Windows hosts share the same administrative user, set the environment variable SGE_WIN_ADMIN to that user to access all Windows hosts without additional user intervention. Example:
(sh, bash)# export SGE_WIN_ADMIN=Administrator
(csh,tcsh)# setenv SGE_WIN_ADMIN Administrator
To install or update the Windows helper service, run the following command as root from the master host in $SGE_ROOT directory:
# ./inst_sge -upd-win
Caution
Only one SGE_Helper_Service.exe can run on an execution host. You cannot use the same Windows execution host for a 6.0 or 6.1 cluster and a 6.2 cluster.
Start the new execution daemons.
Optionally, if you can log in without typing a password, you can start the whole cluster as root user from the $SGE_ROOT directory with a single command:
# ./inst_sge -start-all
This command starts the master daemon, shadow daemons, and all execution daemons.
Upgrade is complete.
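The post-upgrade commands above can be chained in a small wrapper. This is a sketch, not an official script: run_step and post_upgrade are made-up names, the inst_sge flags are the ones listed above, and the wrapper defaults to a dry run that only prints what it would execute. The optional steps are left commented out because one of them removes old RC scripts and the other applies only to Windows execution hosts.

```shell
#!/bin/sh
# Hypothetical wrapper around the post-upgrade commands listed above.
# With DRY_RUN=1 (the default here) it only prints what it would run.
# Intended to be run as root from the $SGE_ROOT directory on the master host.
DRY_RUN=${DRY_RUN:-1}

run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

post_upgrade() {
    # Initialize the local execd spool directories.
    run_step ./inst_sge -upd-execd || return 1
    # Optional steps; uncomment the ones that apply to your site.
    # run_step ./inst_sge -upd-rc  || return 1   # removes old RC scripts
    # run_step ./inst_sge -upd-win || return 1   # Windows helper service
    # Start the master daemon, shadow daemons, and all execution daemons.
    run_step ./inst_sge -start-all
}
```

Calling post_upgrade with the default settings prints the two commands; setting DRY_RUN=0 beforehand executes them.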
How to Upgrade the Original Cluster to 6.2 Software (Real Upgrade)
(Optional) Test the cloned cluster, if you used the cloned cluster configuration method to transfer the configuration to a new 6.2 cluster.
Back up the original cluster settings as described in How to Back Up the Cluster.
Stop the scheduler:
# qconf -ks
Verify that no jobs are running on the cluster.
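One way to check this is to count the job rows that qstat reports. The sketch below assumes the standard qstat listing, in which every job row starts with a numeric job ID; count_job_rows is a made-up helper name. It reads the listing from standard input so that it can be tried without a live cluster.

```shell
#!/bin/sh
# Hypothetical helper: count job rows in qstat output read from stdin.
# A job row is taken to be any line whose first field is a number (the job ID).
count_job_rows() {
    awk '$1 ~ /^[0-9]+$/ { n++ } END { print n + 0 }'
}

# Typical use against a live cluster (jobs of all users):
#   qstat -u '*' | count_job_rows
# Add "-s r" to the qstat call to look at running jobs only.
```

A result of 0 means the listing contained no job rows.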
Stop the old cluster:
# qconf -ke all
# $SGE_ROOT/$SGE_CELL/common/sgemaster stop
(Optional) Stop the Berkeley DB server, if your cluster uses Berkeley DB server spooling.
On the BDB server host:
Source the cluster settings.sh (or .csh) file.
Type the following command:
# $SGE_ROOT/$SGE_CELL/common/sgebdb stop
(Optional) If you use ARCo, ensure that the reporting file has been completely processed by the dbwriter.
There should be no reporting or reporting.processing file in the $SGE_ROOT/$SGE_CELL/common directory.
Once the reporting file is processed, on the dbwriter host:
Source the cluster settings.sh (or .csh) file.
Stop the dbwriter:
# $SGE_ROOT/$SGE_CELL/common/sgedbwriter stop
Warning
If you use ARCo, you must completely process the reporting file and stop the dbwriter before you continue.
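The check for leftover reporting files can be scripted. The sketch below relies only on the two file names mentioned above; reporting_drained is a made-up function name, and the decision to stop the dbwriter right after a successful check is shown as a comment rather than executed.

```shell
#!/bin/sh
# Hypothetical check: succeed only if neither reporting file is left in COMMON_DIR.
# Usage: reporting_drained COMMON_DIR
reporting_drained() {
    rd_dir=$1
    for rd_name in reporting reporting.processing; do
        if [ -e "$rd_dir/$rd_name" ]; then
            echo "still present: $rd_dir/$rd_name" >&2
            return 1
        fi
    done
    return 0
}

# Typical use on the dbwriter host, with the cluster settings sourced:
#   reporting_drained "$SGE_ROOT/$SGE_CELL/common" && \
#       "$SGE_ROOT/$SGE_CELL/common/sgedbwriter" stop
```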
Extract the new 6.2 binaries and common files to the $SGE_ROOT directory.
Caution
Do not remove any of the $SGE_ROOT directory contents, except for the case where the new Sun Grid Engine 6.2 binaries differ from the existing installation. For example, you might have used your custom lx26-amd64 binaries, but Sun Grid Engine 6.2 uses lx24-amd64 even for 2.6 kernels. In that case you must remove the old binaries manually!
You must ensure that all binaries for all used architectures were updated and no architecture with the old version remains in the $SGE_ROOT directory.
Start the new upgrade on the original qmaster host from the $SGE_ROOT directory.
# ./inst_sge -upd
Tip
To enable or disable some additional features like JMX, CSP, or to use the old IJS, you must provide additional flags to the upgrade script in the same way that you would for qmaster installation. For example, to upgrade a cluster and enable the JMX thread in qmaster and use CSP mode, run the following command: ./inst_sge -upd -jmx -csp
Accept the displayed license.
Enter the complete path to the backup directory.
For example, /backups/sge_6.1_June10_2008.
Caution
If you don’t specify the original $SGE_ROOT and $SGE_CELL in the next two steps, the upgrade will not be a real upgrade. The cloned cluster configuration method will be used instead.
Enter the $SGE_ROOT directory.
The default is the current directory. For more information, see SGE_ROOT.
Enter the $SGE_CELL directory.
The default is default. For more information, see SGE_CELL.
Select a new $SGE_CLUSTER_NAME.
The default value is one of the following, depending on which is found first:
The existing SGE_CLUSTER_NAME ($SGE_ROOT/$SGE_CELL/common/cluster_name)
The SGE_CLUSTER_NAME from the backup
p$SGE_QMASTER_PORT
For more information, see SGE_CLUSTER_NAME.
(Optional) Select the JMX configuration.
For more information about JMX, see JMX guide.
If you started the upgrade using the -jmx option, one of the following choices appears:
Choose if you want to use JMX settings from the backup or use new settings.
This question appears when JMX exists in the backup.
Choose a JMX port.
This question appears when JMX does not exist in the backup.
Choose if you want to keep the spooling method from the backup.
(Optional) Select a spooling method.
This is displayed if you chose not to use backup in the previous screen. See example. For more information on choosing a spooling mechanism, see Choosing Between Classic Spooling and Database Spooling.
Choose if you want to use interactive jobs support (IJS) settings from the backup or use the new defaults for 6.2.
In most cases, you should use the new defaults which enable the new interactive jobs support.
Caution
If you changed QLOGIN_DAEMON, QLOGIN_COMMAND, RLOGIN_DAEMON, RLOGIN_COMMAND, RSH_DAEMON, or RSH_COMMAND configuration attributes, you should verify that the new IJS will not break your site-specific settings.
Select the next job ID.
The default is old jobseqnum + 1000, rounded up to the nearest 1000.
(Optional) Select the next AR ID.
This question appears only if arseqnum is in the backup. The default is old arseqnum + 1000, rounded up to the nearest 1000.
Choose automatic startup options.
One of the following choices appears:
Choose whether to run qmaster as an SMF service.
This question appears only on systems that run at least version 10 of the Solaris OS.
Choose whether to use RC scripts for qmaster.
This question appears on platforms that are not running at least version 10 of the Solaris OS or if you started the upgrade using the -nosmf option.
Load the old configuration.
If this step fails with a critical error:
Check the log file /tmp/sge_backup_date.log.
Try to reload the configuration through the $SGE_ROOT/util/upgrade_modules/load_sge_config.sh script and the arguments displayed in the previous step.
If the preceding steps do not resolve the problem, stop the upgrade process.
(Optional) Copy the binaries and the common directory to all the hosts in the cluster, if not on a shared file system
If you use local binaries or a local common directory for each host, you must copy all the new binaries and the common directory locally to each host. Ensure that all binaries are updated and no architecture with the old version remains in the $SGE_ROOT directory.
Note
If you do not perform this operation the qmaster host will have Sun Grid Engine 6.2 binaries, while the rest of the cluster will still have the old version and will not work as desired!
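For clusters without a shared file system, the copy can be planned with a loop over the execution hosts. The sketch below only prints rsync commands so they can be reviewed first. plan_copy is a made-up name, and the directory list (bin plus the cell's common directory) is an assumption that should be extended to whatever else your installation keeps locally, for example utilbin or lib.

```shell
#!/bin/sh
# Hypothetical planner: print (do not run) one rsync command per host and directory.
# Usage: plan_copy SGE_ROOT SGE_CELL HOST...
plan_copy() {
    pc_root=$1
    pc_cell=$2
    shift 2
    for pc_host in "$@"; do
        # Assumed directory list; adjust it to match your installation.
        for pc_dir in bin "$pc_cell/common"; do
            echo "rsync -a $pc_root/$pc_dir/ $pc_host:$pc_root/$pc_dir/"
        done
    done
}

# Typical use on the qmaster host; qconf -sel lists the execution hosts:
#   plan_copy "$SGE_ROOT" "$SGE_CELL" $(qconf -sel)
```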
(Optional) Upgrade ARCo.
If you use ARCo, you need to upgrade it. See Upgrading ARCo step 6.
Run the post upgrade procedures
Info
The post-upgrade procedures are easier when you have root access to all machines through ssh or rsh without having to enter a password. To use rsh instead of the default ssh, run the ./inst_sge command with -rsh argument. Example:
# ./inst_sge -upd-execd -rsh
Initialize the local execd spool directories
This step creates the local execd spool directories on the execd hosts with the correct permissions. Run the following command as root from the master host in $SGE_ROOT directory:
# ./inst_sge -upd-execd
(Optional) Create new RC scripts for the whole cluster.
Caution
This command removes old RC scripts. To keep the old RC scripts, do not run this command.
To start the services automatically after a reboot, run the following command as root from the master host in $SGE_ROOT directory:
# ./inst_sge -upd-rc
(Optional) Install or update the Windows helper service.
Perform this step to use the Windows execution hosts with the 6.2 cluster. When connecting to each Windows execution host, you are prompted for an administrator user to connect to the Windows host. If all your Windows hosts share the same administrative user, set the environment variable SGE_WIN_ADMIN to that user to access all Windows hosts without additional user intervention. Example:
(sh, bash)# export SGE_WIN_ADMIN=Administrator
(csh,tcsh)# setenv SGE_WIN_ADMIN Administrator
To install or update the Windows helper service, run the following command as root from the master host in $SGE_ROOT directory:
# ./inst_sge -upd-win
Caution
Only one SGE_Helper_Service.exe can run on an execution host. You cannot use the same Windows execution host for a 6.0 or 6.1 cluster and a 6.2 cluster.
Start the new execution daemons.
Optionally, if you can log in without typing a password, you can start the whole cluster as root user from the $SGE_ROOT directory with a single command:
# ./inst_sge -start-all
This command starts the master daemon, shadow daemons, and all execution daemons.
—————————————————————————————-
Example Upgrade for Cloned Cluster Configuration
The following upgrade example uses a copy of the existing cluster configuration with a different $SGE_CELL. This example does not use JMX and there are no Service Tags. The steps in this example are referred to from the software upgrade description at How to Install the 6.2 Software Using the Cloned Cluster Configuration Method.
Steps 4 and 5
# ./inst_sge -upd
Welcome to the Grid Engine Upgrade Procedure
——————————————–
Before you continue with the upgrade, read these hints:
– Your terminal window should have a size of at least
80×24 characters
– At any time during the upgrade process, use your standard
interrupt key to abort the upgrade. Typically, the interrupt
key combination is Ctrl-C.
The upgrade procedure will take approximately 1-2 minutes.
Hit <RETURN> to continue >>
Step 6
Type the complete path to the Grid Engine configuration backup directory.
————————————————————————-
Backup directory >> /tmp/bck
Found backup from GE 6.1u4 version created on 2008-06-10_10:56:29
Continue with this backup directory (y/n) [y] >>
Step 7
The Grid Engine root directory is:
$SGE_ROOT = /sge
If this directory is not correct (e.g. it may contain an automounter
prefix) enter the correct path to this directory or hit <RETURN>
to use default [/sge] >>
Your $SGE_ROOT directory: /sge
Hit <RETURN> to continue >>
Step 8
Grid Engine cells
—————–
Grid Engine supports multiple cells.
If you are not planning to run multiple Grid Engine clusters or if you don’t
know yet what is a Grid Engine cell it is safe to keep the default cell name
default
If you want to install multiple cells you can enter a cell name now.
The environment variable
$SGE_CELL=<your_cell_name>
will be set for all further Grid Engine commands.
Enter cell name [default] >> new_cell
Using cell >new_cell<.
Hit <RETURN> to continue >>
Step 9
Grid Engine TCP/IP communication service
—————————————-
The port for sge_qmaster is currently set by the shell environment.
SGE_QMASTER_PORT = 21640
Now you have the possibility to set/change the communication ports by
using the
>shell environment< or you may configure it via a network service,
configured
in local >/etc/service<, >NIS< or >NIS+<, adding an entry in the form
sge_qmaster <port_number>/tcp
to your services database and make sure to use an unused port number.
How do you want to configure the Grid Engine communication ports?
Using the >shell environment<: [1]
Using a network service like >/etc/service<, >NIS/NIS+<: [2]
(default: 1) >>
Grid Engine TCP/IP communication service
—————————————-
Using the environment variable
$SGE_QMASTER_PORT=21640
as port for communication.
Do you want to change the port number? (y/n) [n] >>
Step 10
Grid Engine TCP/IP communication service
—————————————-
The port for sge_execd is currently set by the shell environment.
SGE_EXECD_PORT = 21641
Now you have the possibility to set/change the communication ports by
using the
>shell environment< or you may configure it via a network service,
configured
in local >/etc/service<, >NIS< or >NIS+<, adding an entry in the form
sge_execd <port_number>/tcp
to your services database and make sure to use an unused port number.
How do you want to configure the Grid Engine communication ports?
Using the >shell environment<: [1]
Using a network service like >/etc/service<, >NIS/NIS+<: [2]
(default: 1) >>
Grid Engine TCP/IP communication service
—————————————-
Using the environment variable
$SGE_EXECD_PORT=21641
as port for communication.
Do you want to change the port number? (y/n) [n] >>
Step 11
Grid Engine qmaster spool directory
———————————–
The qmaster spool directory is the place where the qmaster daemon stores
the configuration and the state of the queuing system.
The admin user >sgeadmin< must have read/write access
to the qmaster spool directory.
If you will install shadow master hosts or if you want to be able to start
the qmaster daemon on other hosts (see the corresponding section in the
Grid Engine Installation and Administration Manual for details) the account
on the shadow master hosts also needs read/write access to this directory.
The following directory
[/sge/new_cell/spool/qmaster]
will be used as qmaster spool directory by default!
Do you want to select another qmaster spool directory (y/n) [n] >>
Step 12
Unique cluster name
——————-
The cluster name uniquely identifies a specific Sun Grid Engine cluster.
The cluster name must be unique throughout your organization. The name
is not related to the SGE cell.
The cluster name must start with a letter ([A-Za-z]), followed by letters,
digits ([0-9]), dashes (-) or underscores (_).
Enter new cluster name or hit <RETURN>
to use default [p21640] >>
Your $SGE_CLUSTER_NAME: p21640
Hit <RETURN> to continue >>
Step 14
creating directory: /sge/new_cell/spool/qmaster/job_scripts
Setup spooling
————–
Your SGE binaries are compiled to link the spooling libraries
during runtime (dynamically). So you can choose between Berkeley DB
spooling and Classic spooling method.
Please choose a spooling method (berkeleydb|classic) [berkeleydb] >> classic
Initializing spooling database
Hit <RETURN> to continue >>
Step 15
Interactive Job Support (IJS) Selection
—————————————
The backup configuration includes information for running
interactive jobs. Do you want to use the IJS information from
the backup (‘y’) or use new default values (‘n’) (y/n) [y] >> n
Using new interactive job support default setting for a new installation.
Hit <RETURN> to continue >>
Creating >act_qmaster< file
Step 16
Grid Engine group id range
————————–
When jobs are started under the control of Grid Engine an additional
group id is set on platforms which do not support jobs. This is done
to provide maximum control for Grid Engine jobs.
This additional UNIX group id range must be unused group id’s in your
system. Each job will be assigned a unique id during the time it is
running. Therefore you need to provide a range of id’s which will
be assigned dynamically for jobs.
The range must be big enough to provide enough numbers for the
maximum number of Grid Engine jobs running at a single moment on
a single host. E.g. a range like >20000-20100< means, that Grid Engine
will use the group ids from 20000-20100 and provides a range for
100 Grid Engine jobs at the same time on a single host.
You can change at any time the group id range in your cluster configuration.
Please enter a range [34299-34498] >>
Using >34299-34498< as gid range. Hit <RETURN> to continue >>
Grid Engine cluster configuration
———————————
Please give the basic configuration parameters of your Grid Engine
installation:
<execd_spool_dir>
The pathname of the spool directory of the execution hosts. User >sgeadmin<
must have the right to create this directory and to write into it.
Default: [/sge/new_cell/spool] >>
Grid Engine cluster configuration (continued)
———————————————
<administrator_mail>
The email address of the administrator to whom problem reports are sent.
It is recommended to configure this parameter. You may use >none<
if you do not wish to receive administrator mail.
Please enter an email address in the form >user@foo.com<.
Default: [sgeadmin@qmaster.com] >>
The following parameters for the cluster configuration were configured:
execd_spool_dir /sge/new_cell/spool
administrator_mail sgeadmin@qmaster.com
Do you want to change the configuration parameters (y/n) [n] >>
Step 17
Provide a value to use for the next job ID.
——————————————-
Backup contains last job ID 1. As a suggested value, we added 1000
to that number and rounded it up to the nearest 1000.
Increase the value, if appropriate.
Choose the new next job ID [2000] >>
Hit <RETURN> to continue >>
Step 18
Provide a value to use for the next AR ID.
——————————————
Backup contains last AR ID 1. As a suggested value, we added 1000
to that number and rounded it to the nearest 1000.
Increase the value, if appropriate.
Choose the new next AR ID [2000] >>
Hit <RETURN> to continue >>
Step 19
Creating >sgemaster< script
Creating >sgeexecd< script
Creating settings files for >.profile/.cshrc<
Hit <RETURN> to continue >>
qmaster startup script
———————-
Do you want to start qmaster automatically at machine boot?
NOTE: If you select “n” SMF will be not used at all! (y/n) [y] >> n
Grid Engine qmaster startup
—————————
Starting qmaster daemon. Please wait …
starting sge_qmaster
Hit <RETURN> to continue >>
Step 20
Last step – load configuration from the backup
———————————————-
load command: /sge/util/upgrade_modules/load_sge_config.sh /tmp/bck -mode "copy" -log C -newijs "false" -gid_range "34299-34498" -admin_mail "sgeadmin@qmaster.com" -execd_spool_dir "/sge/new_cell/spool"
Hit <RETURN> to continue >>
Loading saved cluster configuration from /tmp/bck (log in
/tmp/sge_backup_load_2008-06-13_17:42:28.log)…
Done
If loading the configuration succeeded run these additional commands:
REQUIRED:
inst_sge -upd-execd
This command initializes all execd spool directories.
inst_sge -upd-win
This command connects to all Windows execution hosts and installs
the new Windows helper service on each host.
WARNING: If a helper service from a previous release is running
on this host, the new helper service overwrites it. The
host will run only in a 6.2 cluster.
TIP: This action requires to enter a windows administrator user for each
host interactively. If all your systems share the same administrator you
can set the environment variable SGE_WIN_ADMIN to that user name.
E.g.: (sh, bash) export SGE_WIN_ADMIN=Administrator
(csh,tcsh) setenv SGE_WIN_ADMIN Administrator
OPTIONAL:
inst_sge -upd-rc
This command creates new autostart scripts for the new cluster
and removes any conflicting files.
TIP: To disable SMF on Solaris systems, use the command
inst_sge -upd-rc -nosmf
TIP: Use inst_sge -post-upd to do all above actions