HP-MPI Troubleshooting

General  
 
 Where can I get the latest version of HP-MPI?

HP-MPI can be obtained from http://www.hp.com/go/mpi, by following the “Download” link.

What is MPI_ROOT that I see referenced all over the place?

MPI_ROOT is an environment variable that HP-MPI (mpirun) uses to determine where HP-MPI is installed and therefore which executables and libraries to use. If you only have one copy of HP-MPI installed on the system and it is in /opt/hpmpi or /opt/mpi, then you do not need to set MPI_ROOT. To set MPI_ROOT:

For sh, bash, or ksh shells:

% export MPI_ROOT=/opt/hpmpi
% $MPI_ROOT/bin/mpirun …

For csh or tcsh shells:

% setenv MPI_ROOT /opt/hpmpi
% $MPI_ROOT/bin/mpirun …

It is particularly helpful when you have multiple versions of HP-MPI installed on a system.

Where can I get a license for HP-MPI?

The first thing you must determine is if you need a license. If you do need a license, then proceed as follows:
Customers should get their license by contacting HP Licensing, using the information on the License Certificate.
A 30-day demo license can be downloaded, along with the current version of HP-MPI, at http://www.hp.com/go/mpi. Follow the “Download” link.
To request a permanent license you will need to supply the hostname of the license server, and the ‘hostid’ of the license server. The hostid can be found by running this command:

$MPI_ROOT/bin/licensing/<arch>/lmutil hostid

The value of <arch> should be selected from the available directories in $MPI_ROOT/bin/licensing.

How can I tell what version of HP-MPI is installed on my system?

The currently installed version of HP-MPI can be determined in several ways. Try one of the following:

% $MPI_ROOT/bin/mpirun -version
(on HP-UX) % swlist -l product | grep "HP MPI"
(on Linux) % rpm -qa | grep "hpmpi"

What Linux distributions is HP-MPI supported on?

This information is tracked in the HP-MPI release notes. See the release notes for your product. We generally test with the current distributions of RedHat and SuSE, but HP-MPI is generally compatible with all distributions.

You can also check the “What versions of interconnect software stacks are supported with HP-MPI” questions for some additional information about which versions of Linux are qualified with HP-MPI.

Does HP-MPI work with applications that use MPICH2?

HP-MPI provides an “MPICH compatibility mode” that supports applications and libraries that use the MPICH implementation. It is important to remember that MPICH2 is not a standard, but rather a specific implementation of the MPI-2 standard.

HP-MPI provides MPICH compatibility with mpirun, and with all the compiler wrappers: mpirun.mpich, mpicc.mpich, mpiCC.mpich, mpif77.mpich, and mpif90.mpich.

In general, object files built with HP-MPI’s MPICH compiler wrappers can be used by an application that uses the MPICH implementation. Applications built using MPICH-compliant libraries should be re-linked to use HP-MPI. Using MPICH compatibility mode to produce a single executable to run under both MPICH and HP-MPI is not advised.

More information on MPICH compatibility can be found in the HP-MPI 11th Edition Users Guide (http://www.hp.com/go/mpi, click on Documentation).

What version of HP-MPI works with my version of HP-UX?

HP-UX Version | Itanium | PA-RISC
HP-UX 11.31 (HP-UX 11i V3) | MPI V2.2.5, MPI V2.2 | MPI V2.2.5, MPI V2.2
HP-UX 11.23 (HP-UX 11i V2) | MPI V2.2.5, MPI V2.2, MPI V2.1.1, MPI V2.0 | MPI V2.2.5, MPI V2.2, MPI V2.1.1, MPI V2.0
HP-UX 11.11 (HP-UX 11i V1) | N/A | MPI V2.2.5, MPI V2.2, MPI V2.1.1, MPI V2.0
 
 
 Installation/setup  
 
 Do I need a license to run HP-MPI?

You do not need a license on HP-UX or an XC4000 or XC6000 system. You also do not need a license if you are running a supported ISV application. The list of currently supported ISV applications is:
Acusim (Acusolve)
Adina (Adina)
Ansys (Ansys, CFX)
Abaqus (Abaqus)
Accelrys (Castep, DMol3, Discover, MesoDyn, ONETEP, GULP)
Allied Engineering (ADVC)
AVL (Fire, Swift)
CD-Adapco (Star-CD)
COSMOlogic GmbH&Co.KG (TURBOMOLE)
ESI-Group (Pamcrash, PamFlow)
EXA (PowerFlow)
Fluent (Fluent)
LSTC (LS-Dyna)
Magmasoft (Magmasoft)
Mecalog (Radioss)
Metacomp (Metacomp)
MSC (Nastran, Marc)
Parallel Geoscience Corporation (SPW)
Remcom (various)
Ricardo (Vectis)
Roxar (Roxar)
TNO (Tass)
UGS (Nx Nastran)
Univ. of Birmingham (Molpro)
Univ. of Texas (AMLS)
In all other cases, you need to acquire a license.

Where do I install the license file?

The license file should be copied to:

$MPI_ROOT/licenses/mpi.lic

The license file should be readable by owner, group, and world. The license file only needs to exist on the host where rank0 will run, but it is recommended that the license file be copied to every host that will run HP-MPI jobs.

How do I start the license server?

The license server can be started using the command:

% $MPI_ROOT/bin/licensing/<arch>/lmgrd -c mpi.lic

Where <arch> is the architecture of the host that is running the license server.

How are the ranks launched? (or why do I get the message “remshd: Login incorrect.” or “Permission denied”)

There are a number of ways that HP-MPI can launch the ranks, but at least one of them must be made available:
Allow passwordless rsh access by setting up hosts.equiv and/or .rhosts files to allow the mpirun machine to rsh into the execution nodes.
Allow passwordless ssh access from the mpirun machine to the execution nodes and set the environment variable MPI_REMSH to the full path of ssh. (See IQ3)
Use SLURM (srun) by using the -srun option to mpirun.
Under Quadrics, use RMS (prun) by using the -prun option to mpirun.

How can I verify that HP-MPI is installed and functioning optimally on my system?

A simple hello_world test is available in $MPI_ROOT/help/hello_world.c that can validate basic launching and connectivity. Other more involved tests are also included. To test the bandwidth and latency of your system use the ping_pong.c test.

Can I have multiple versions of HP-MPI installed and how can I switch between them?

You can install multiple copies of HP-MPI, and they can be installed anywhere as long as each copy is in the same place on every host you plan to run on. You can switch between them by setting the environment variable MPI_ROOT.
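
As a minimal sketch of switching between installations, a small shell function can set MPI_ROOT and put the matching bin directory first on PATH. The function name use_hpmpi and the install path in the usage comment are hypothetical; only MPI_ROOT and the $MPI_ROOT/bin layout come from this FAQ.

```shell
# Hypothetical helper: select one HP-MPI installation for this shell session.
# Usage: use_hpmpi /opt/hpmpi-2.3   (example path, not a real default)
use_hpmpi() {
    if [ ! -d "$1" ]; then
        echo "use_hpmpi: no such directory: $1" >&2
        return 1
    fi
    MPI_ROOT=$1
    PATH=$MPI_ROOT/bin:$PATH
    export MPI_ROOT PATH
}
```

The function refuses a directory that does not exist and leaves the current MPI_ROOT untouched in that case, so a mistyped path does not silently select nothing.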

How do I install HP-MPI in a non-standard location on Linux?

There are two ways:

% rpm --prefix=/wherever/you/want -ivh hpmpi-XXXXX.XXX.rpm

Alternatively, “untar” the rpm using:

% rpm2cpio hpmpi-XXXXX.XXX.rpm | cpio -id

How can I determine what the environment settings are during the application runtime?

HP-MPI does not propagate the full environment of the shell that started the mpirun command. To determine what variables are set during the application run, replace the application with the /bin/env tool, while keeping the remainder of the mpirun command the same:

% $MPI_ROOT/bin/mpirun -stdio=p … /bin/env

The “-stdio=p” command line option will cause all the output to be tagged with the rank ID that produced that output.

If your application is launched using a script to set up the environment, you can temporarily replace the executable with a symlink to /bin/env and still determine the runtime environment.

How can I pass an environment variable to my application?

An environment variable that is required by the application can be set on the mpirun command line using the -e flag.

% $MPI_ROOT/bin/mpirun -e <variable>=<value> …

Is there a way to automatically set an environment variable for all future HP-MPI runs?

The hpmpi.conf file can be used to set options for all future HP-MPI runs. The hpmpi.conf file exists in three places on each system. The file is read in this order, and the last entry for any particular setting is what is used.
$MPI_ROOT/etc/hpmpi.conf
/etc/hpmpi.conf
~/.hpmpi.conf
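
The "last entry wins" rule can be sketched as a small shell function that walks the files in order and remembers the most recent value it saw. This is only an illustration: the function name effective_setting is hypothetical, and the assumption that settings appear as plain NAME=value lines is mine; the actual hpmpi.conf syntax may differ.

```shell
# Print the effective value of one setting. Files are read in the order given,
# and the last NAME=value line found wins. Unreadable files are skipped.
# ASSUMPTION: plain NAME=value lines; the real hpmpi.conf syntax may differ.
effective_setting() {
    name=$1
    shift
    value=
    for conf in "$@"; do
        [ -r "$conf" ] || continue
        while IFS= read -r line || [ -n "$line" ]; do
            case $line in
                "$name="*) value=${line#"$name="} ;;
            esac
        done < "$conf"
    done
    printf '%s\n' "$value"
}

# Example call, using the three locations listed above:
# effective_setting MPI_IC_ORDER "$MPI_ROOT/etc/hpmpi.conf" /etc/hpmpi.conf "$HOME/.hpmpi.conf"
```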
 
 
 Host environment setup  
 
 How do I setup passwordless ssh access?

First from some arbitrary machine run

host1% ssh-keygen -t dsa

When it asks for a passphrase just hit enter. This creates a file at ~/.ssh/id_dsa, which is supposed to be private and local to that machine. It also creates ~/.ssh/id_dsa.pub, which will become part of the remote machines’ ~/.ssh/authorized_keys2 files and allows you to ssh from host1 to those other hosts.

You can cheat a little and use the same ~/.ssh/id_dsa everywhere.

Next:

host1% cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys2

Add the following lines to the ~/.ssh/config file:

StrictHostKeyChecking=no
ConnectionAttempts=15

On any new host,

newhost% mkdir ~/.ssh
newhost% cp /backup/copy/for/ssh/* ~/.ssh
newhost% chmod 700 ~/.ssh
newhost% chmod 755 ~

How can I increase the shmmax on a host?

To increase the shmmax on a host, without a reboot:

sysctl -w kernel.shmmax=<value>
sysctl -p /etc/sysctl.conf

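The first command changes kernel.shmmax in the running kernel only; it does not edit /etc/sysctl.conf. To keep the setting across reboots, add or update the kernel.shmmax line in /etc/sysctl.conf. The second command forces the system to re-read /etc/sysctl.conf and apply the values stored there.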
 
 Performance problems 
 
 Are there any flags I can use to get better performance?

Better performance is a difficult thing to address in a general FAQ. In general, HP-MPI is built to provide good performance over a very wide range of input, system, application size, and other variables. In some cases, it is possible to improve the performance of an application by using specific tunable flags and settings.

The HP-MPI Users Guide includes a section on “Tuning” and a detailed description of all the HP-MPI flags and settings. The HP-MPI Users Guide can be found at http://www.hp.com/go/mpi by following the “Documentation” link.

Can the polling time of a rank be tuned? Can a rank be forced to yield without spinning?

Yes. The amount of time that a rank will poll waiting for new messages is tunable using the MPI_FLAGS environment variable.

MPI_FLAGS=y#

The “#” sign should be replaced with the number of milliseconds that a rank will spin before yielding. The default value is 10000. A value of 0 will cause the rank to yield immediately and not spin waiting for another message.

How do I turn on MPI collection of message lengths? I want an overview of MPI message lengths being sent around within the application.

This information is available through HP-MPI’s “instrumentation” feature. Details are in the user’s guide, but basically including “-i ” on the mpirun command line will create a report that includes the number and sizes of messages sent between ranks.

How does MPI_Init scale for large rank counts?

We have not done extensive profiling of the startup time for MPI_Init, but we believe that MPI_Init time is proportional to the number of ranks plus some overhead time. The overhead is likely a function of the amount of shared memory that is required on each node, plus the number of off host connections that must be established. 
 
 Interconnect and networking 
 
 What version of HP-MPI supports ConnectX?

HP-MPI v2.2.5.1 is the first release which supports ConnectX.

A failure/crash occurs when running HP-MPI applications on InfiniBand clusters using ConnectX Infiniband Host Channel Adapters when using HP-MPI versions older than v2.2.5.1. There is no specific signature to the failure. In one instance an application did not produce its results file. In another, a SEGV violation occurred. An example of the SEGV violation on a compute node named hadesn2 follows.

Terminated at Timestep -1: Segmentation Fault
srun: error: hadesn2: task[4,6]: Segmentation fault (core dumped)
srun: Terminating job

What versions of HP MPI are qualified with OFED?
HP-MPI 2.2.5 works with OFED 1.0 and 1.1.
HP-MPI 2.2.5.1 works with OFED 1.0, 1.1, 1.2, 1.2.5 and 1.3, but it does not support XRC in 1.3.
HP-MPI 2.2.7 works with OFED 1.0, 1.1, 1.2, 1.2.5 and 1.3 (including XRC).
HP-MPI 2.3 works with OFED 1.0, 1.1, 1.2, 1.2.5 and 1.3 (including XRC).

What InfiniBand HCA firmware is supported by OFED 1.3?

OFED 1.3 supports the following Mellanox Technologies HCAs (SDR and DDR modes are supported):
InfiniHost (fw-23108 Rev 3.5.000)
InfiniHost III Ex (MemFree: fw-25218 Rev 5.3.000) (With memory: fw-25208 Rev 4.8.200)
InfiniHost III Lx (fw-25204 Rev 1.2.000)
ConnectX IB (fw-25408 Rev 2.3.000)
HP-MPI may not work properly when used with OFED 1.3 and an unsupported version of the HCA firmware.

How much memory is consumed when InfiniBand is used?

The amount of memory used varies with the message sizes that the application sends. For short message envelopes, 64 MB of memory is pinned. If long messages (>16K) are sent, up to 20% of the physical memory may be pinned during the application runtime.

When long messages (>16K) are sent, HP-MPI will pin the user provided buffer on both the sender and receiver side. Up to 20% of physical memory will be pinned. This memory is used by all the ranks on the node. If free() is called on memory that is in the 20%, the memory will be unpinned.

How do I control which network protocol is used by my application?

HP-MPI will test each protocol in the order listed in MPI_IC_ORDER, and use the first available option for the application. The test is done at runtime by the mpirun command. The default network protocol search order is defined by the MPI_IC_ORDER environment variable. The default value in HP-MPI v2.2.7 is:

MPI_IC_ORDER="ibv:vapi:udapl:itapi:psm:mx:gm:elan:TCP"

To force the selection of a specific network protocol, an option can be provided to mpirun. Currently, valid options are -IBV, -VAPI, -UDAPL, -ITAPI, -PSM, -MX, -GM, -ELAN, -TCP. Options can be either upper case or lower case. Upper case options will cause an error if the required network is not available. Lower case options will issue a warning if the requested network is not available, and will then use MPI_IC_ORDER to select a network.

To change the value for all future runs, the MPI_IC_ORDER variable can be added to the hpmpi.conf file.
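
The "first available protocol in MPI_IC_ORDER" rule can be imitated with a short shell function. This is a sketch only: the function name first_available is hypothetical, the list of available networks is passed in by hand rather than probed from hardware, and comparing names case-insensitively is a simplification of mine.

```shell
# Return the first protocol in ORDER (colon-separated) that also appears in
# AVAILABLE (space-separated). Names are compared case-insensitively.
# This only imitates the selection rule; it does not probe any hardware.
first_available() {
    order=$1
    available=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    old_ifs=$IFS
    IFS=:
    for proto in $order; do
        lower=$(printf '%s' "$proto" | tr '[:upper:]' '[:lower:]')
        case " $available " in
            *" $lower "*) IFS=$old_ifs; printf '%s\n' "$proto"; return 0 ;;
        esac
    done
    IFS=$old_ifs
    return 1
}

# Example: on a host where only Myrinet GM and TCP/IP are usable,
# first_available "ibv:vapi:udapl:itapi:psm:mx:gm:elan:TCP" "gm tcp"   prints gm
```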

What versions of Interconnect software stacks are supported with HP-MPI 2.3?

Protocol | Option | Supported OS | Supported platforms | NIC version | InfiniBand driver version
Shared memory on SMP | N/A | Linux 2.4, 2.6 Kernels | All Platforms | N/A | N/A
OpenFabrics | -IBV | Linux 2.6 Kernels | IA64, i386, x86_64 | Any IB card | OFED 1.0, 1.1, 1.2, 1.3
uDAPL Standard | -UDAPL | Linux 2.4, 2.6 Kernels | IA64, i386, x86_64 | IB vendor specific; 10GbE vendor | uDAPL 1.1, 1.2, 2.0
QLogic PSM | -PSM | Linux 2.6 Kernels | x86_64 | QHT7140, QLE7140 | PSM 1.0
Myrinet MX | -MX | Linux 2.4, 2.6 Kernels | IA64, i386, x86_64 | Rev D,E,F,10G | MX 2g, 10g, V1.2.x
Myrinet GM | -GM | Linux 2.4, 2.6 Kernels | IA64, i386, x86_64 | Rev D,E,F | GM 2.0 and later
Quadrics ELAN | -ELAN | Linux 2.4, 2.6 Kernels | IA64, i386, x86_64 | 1.4.20; 1.4.22 | ELAN4
TCP/IP | -TCP | Linux 2.4, 2.6 Kernels | All Platforms | All cards that support IP | Ethernet Driver, IP

What versions of interconnect software stacks are supported with HP-MPI 2.2.5.1?

Protocol | Option | Supported OS | Supported platforms | NIC version | InfiniBand driver version
Shared memory on SMP | N/A | Linux 2.4, 2.6 Kernels | All Platforms | N/A | N/A
OpenFabrics | -IBV | Linux 2.6 Kernels | IA64, i386, x86_64 | Any IB card | OFED 1.1, 1.2
Mellanox VAPI | -VAPI | Linux 2.4, 2.6 Kernels | IA64, i386, x86_64 | Mellanox Card | VAPI 3.2, 4.1
uDAPL Standard | -UDAPL | Linux 2.4, 2.6 Kernels | IA64, i386, x86_64 | vendor specific | uDAPL 1.1, 1.2
QLogic PSM | -PSM | Linux 2.6 Kernels | x86_64 | QHT7140, QLE7140 | PSM 1.0
Myrinet MX | -MX | Linux 2.4, 2.6 Kernels | IA64, i386, x86_64 | Rev D,E,F,10G | MX 2g, 10g
Myrinet GM | -GM | Linux 2.4, 2.6 Kernels | IA64, i386, x86_64 | Rev D,E,F | GM 2.0 and later
Quadrics ELAN | -ELAN | Linux 2.4, 2.6 Kernels | IA64, i386, x86_64 | 1.4.20; 1.4.22 | Elan4
TCP/IP | -TCP | Linux 2.4, 2.6 Kernels | All Platforms | All cards that support IP | Ethernet Driver, IP

How is HP-MPI IB message striping (the ability to send a single message using multiple IB cards) functionality accessed?

HP-MPI supports multi-rail on OpenFabrics via the environment variable MPI_IB_MULTIRAIL. This environment variable is ignored by all other interconnects. In multi-rail mode, a rank can use up to all the cards on its node, but is limited to the number of cards on the node to which it is connecting. For example, if rank A has three cards, rank B has two cards, and rank C has three cards, then connection A–B uses two cards, connection B–C uses two cards, and connection A–C uses three cards. Long messages are striped among all the cards on that connection to improve bandwidth.

By default, multi-card message striping is off. To turn it on, specify ‘-e MPI_IB_MULTIRAIL=N’, where N is the number of cards used by a rank. If N <= 1, then message striping is not used. If N is greater than the maximum number of cards M on that node, then all M cards are used. If 1 < N <= M, message striping is used on N cards or fewer. If ‘-e MPI_IB_MULTIRAIL’ is specified without a value, the maximum possible number of cards is used.

On a host, all the ranks select all the cards in a series. For example: given 4 cards and 4 ranks per host, rank 0 will use cards ‘0, 1, 2, 3’, rank 1 will use ‘1, 2, 3, 0’, rank 2 will use ‘2, 3, 0, 1’, and rank 3 will use ‘3, 0, 1, 2’. The order is important in SRQ mode because only the first card is used for short messages. The selection approach allows short RDMA messages to use all the cards in a balanced way.
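
The two rules above (a connection uses as many cards as the side with fewer cards, and each rank starts its card series one position later) can be sketched as shell arithmetic. The function names are hypothetical, and wrapping around with "rank mod number of cards" when there are more ranks than cards is an assumption; the text only describes the 4-card, 4-rank case.

```shell
# Cards used on a connection: limited by the side with fewer cards.
cards_on_connection() {
    if [ "$1" -le "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Card order for a local rank: start at (rank mod ncards) and wrap around.
# ASSUMPTION: the modulo wrap for rank >= ncards is not stated in the FAQ.
rail_order() {
    rank=$1
    ncards=$2
    i=0
    out=
    while [ "$i" -lt "$ncards" ]; do
        out="$out${out:+, }$(( (rank + i) % ncards ))"
        i=$((i + 1))
    done
    echo "$out"
}

# cards_on_connection 3 2   prints 2            (connection A-B in the example)
# rail_order 1 4            prints 1, 2, 3, 0   (rank 1 with 4 cards)
```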

For HP-MPI 2.2.5.1 and older, all cards must be on the same fabric.

What extra software do I need to allow HP-MPI to run on my InfiniBand hardware?

On HP-UX you do not need anything (unless you want to run IP over IB (IPoIB)). On HP XC Linux, you do not need any additional software. On other Linux distributions, you will need to use a software stack that supports the specific InfiniBand cards you are using. Please contact the InfiniBand card vendor to determine what software you should install.

Why is the IB network not being used for communication?
The problem is generally a result of not finding the modules or libraries that HP-MPI looks for. To see what modules and libraries HP-MPI expects, see $MPI_ROOT/etc/hpmpi.conf. For example, there was a change in the vapi module name from mod_vapi to mod_vip. In HP-MPI 2.2.5, this file was updated to reflect this change, but if you are running with 2.2 you may need to make this change yourself. After 2.2.5 you can always ask the HP-MPI team for the latest hpmpi.conf file. If you are not getting the network you want selected, look at what is expected in hpmpi.conf and compare to the output of lsmod and the name/location of library files on the systems you are running on.

I get a problem when I run my 32-bit executable on my AMD64 or EM64T system:

ping_pong: Rank 0:1: MPI_Init: Fatal: dlopen(libitapi.so) failed! (/usr/voltaire/lib/libitapi.so: cannot open shared object file: No such file or directory)
a.out: Rank 0:1: MPI_Init: Can’t initialize RDMA device

Note that not all messages that say “Can’t initialize RDMA device” are caused by this problem, but this message can show up when running a 32-bit executable on a 64-bit Linux machine. At present, the 64-bit daemon used by HP-MPI cannot determine the bitness of the executable and therefore uses incomplete information to determine the availability of high performance interconnects. To work around the problem for now, be sure to use flags (-TCP, -ITAPI, etc.) to explicitly specify the network to use.

Where does HP-MPI look for the shared-libraries for the high-performance networks it supports?
On HP-MPI 2.2.5 and higher, these can be changed or extended globally using /etc/hpmpi.conf or $MPI_ROOT/etc/hpmpi.conf and can be set individually using ~/.hpmpi.conf. Environment variables can be set using the ‘-e’ option to mpirun. For older versions of HP-MPI, this chart explains the selection process:

Protocol | 1st attempt | 2nd attempt | 3rd attempt
IB | Environment variable MPI_ICLIB_ITAPI | libitapi.so | /usr/voltaire/lib/libitapi.so
GM | Environment variable MPI_ICLIB_GM | libgm.so OR libgm32.so | /opt/gm/lib/libgm.so OR /opt/gm/lib/libgm32.so
ELAN | Environment variable MPI_ICLIB_ELAN | libelan.so |
uDAPL | Environment variable MPI_ICLIB_UDAPL | libdat.so |
VAPI | Environment variable MPI_ICLIB_VAPI | Environment variable MPI_ICLIB_VAPIDIR | libmtl_common.so, libmpga.so, libmosal.so, libvapi.so
MX | Environment variable MPI_ICLIB_MX | libmyriexpress.so |

 

When multiple TCP/IP interconnects are present, how does HP-MPI choose the network that will be used?

By default, HP-MPI uses the system call gethostbyname() to determine the ‘primary’ IP address of a host. That is the IP address that will be used to establish the initial connection with all the hosts that will be used.

With TCP/IP networks, a specific subnet can be selected by using the “-netaddr” flag to mpirun:

% $MPI_ROOT/bin/mpirun -netaddr <ip_address> …

QLogic card with -IBV flag dumps core

QLogic encourages the use of the -PSM protocol with their adapters.
 
 
 Building applications 
 
 What compilers does HP-MPI work with?

HP-MPI has gone to great lengths to avoid introducing compiler dependencies. HP-MPI 2.2.7 for Linux was tested with the following list of compilers:
GNU 3.2, 3.4, 4.1
glibc 2.3, 2.4, 2.5
Intel® 9.0, 9.1, 10.0, 10.1
PathScale 2.3, 2.4, 2.5, 3.1, 3.2
Portland Group 6.2, 7.0

What MPI libraries do I need to link when I build?

We strongly encourage you to use the mpicc, mpif90 and mpif77 scripts in $MPI_ROOT/bin to build. If you absolutely do not want to build with the scripts, then at least run them with the -show option to see what they are doing, and use that as a starting point for your build. The -show option just prints the command the script would use to build. These scripts are readable, so you can always look through them to understand when and what gets linked in.

How do I specifically build a 32-bit application on a 64-bit architecture?

On Linux, HP-MPI contains additional libraries in a 32 bit directory for 32-bit builds: $MPI_ROOT/lib/linux_ia32/

Use the -mpi32 flag to mpicc to ensure that the 32-bit libraries are used. In addition, your specific compiler may require a flag to indicate a 32-bit compilation.

EXAMPLE: On an Opteron system, using gcc, you need to tell gcc to generate 32 bit via the flag -m32. In addition, the -mpi32 is used to ensure 32-bit libraries are selected:

% setenv MPI_ROOT /opt/hpmpi
% setenv MPI_CC gcc
% $MPI_ROOT/bin/mpicc hello_world.c -mpi32 -m32
% file a.out

a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), not stripped

Can I use HP-MPI in my C++ application?

Yes, HP-MPI 2.2 and above contains C++ bindings.

Can I use HP-MPI in my Java application?

Yes. Here is a brief description on how to use the Open Source Java Interface to MPI for Linux (mpiJava):
Get the Java RPM for Linux

Go to http://java.sun.com

Follow the “Downloads” link, and select the Linux RPM that is appropriate to your version of Linux.

Install Java on Linux server

Copy the Linux RPM to your Linux server. Install the software, following the instructions provided by Sun.

Download mpiJava to desktop system

Go to http://aspen.ucs.indiana.edu/pss/HPJava/mpiJava.html. Click “Software registration and download”. Follow the instructions to build HPJava on your local machine.
 
 
 Known Problems and Error Messages 
 
 My application crashes in HP-MPI

If you have an application that crashes with a segv, and the backtrace shows that it was in HP-MPI when it segv’ed, try passing -e MPI_FLAGS=Eon to mpirun. This requests that HP-MPI do argument checking on what is passed into HP-MPI from user code. Normally HP-MPI does no argument checking, which can in some cases cause a segv or other undefined behavior.

HP-MPI 2.2.5.1 Rank 0:0: MPI_Comm_spawn_multiple: spawn failed

This is a known problem with HP-MPI 2.2.5.1. The C++ bindings contained an error that caused calls to MPI_Comm_spawn to fail. The problem is fixed in HP-MPI 2.2.7. If for some reason you are unable to upgrade, please contact HP-MPI support at mpi@rsn.hp.com. We can provide a new set of C++ headers that can be used to recompile your application to enable spawn support.

ScaLAPACK does not work with HP-MPI mpirun

ScaLAPACK requires the use of the HP-MPI MPICH compatibility mode. You will need to recompile your application code using the MPICH compatibility compiler wrappers:

mpicc.mpich, mpiCC.mpich, mpif77.mpich, mpif90.mpich

When launching the application, you will need to use the

$MPI_ROOT/bin/mpirun.mpich

command.

HP-MPI 2.2.7 applications exit if -stdio=bnone is set

There is no workaround to allow -stdio=bnone to work in HP-MPI 2.2.7. This problem has been resolved in HP-MPI 2.3.

HP-MPI 2.2.7 will not work if LD_BIND_NOW is set

HP-MPI 2.2.7 introduced a new architecture to make collectives easier to upgrade and extend. That new architecture uses a “lazy” dlopen to inspect internal libraries. The LD_BIND_NOW environment variable forces all dlopen calls to ignore the “lazy” guidance. As a result, an error message about “unresolved symbols” will be displayed. At this time, there are two possible workarounds:

Unset the environment variable LD_BIND_NOW
Set the environment variable MPI_DISABLE_PLUGINS=1
This will be fixed in a future release of HP-MPI.

libibverbs: Fatal: couldn’t open sysfs class ‘infiniband_verbs’

This error message happens on RedHat 4 systems that do not have InfiniBand hardware installed. RedHat 4 pre-installed some InfiniBand libraries. HP-MPI attempts to auto detect the available interconnects. If InfiniBand libraries are installed, HP-MPI will open these libraries and attempt to detect if InfiniBand is in a usable state. The detection process will fail if no InfiniBand hardware is installed on the host. This causes the error message to be displayed.

The workaround is to remove IBV and UDAPL from the default interconnect search order. This should only be done on RedHat 4 systems that do not have InfiniBand hardware installed. The default MPI_IC_ORDER is:

MPI_IC_ORDER="ibv:vapi:udapl:itapi:psm:mx:gm:elan:TCP"

Create or edit the /etc/hpmpi.conf file to remove IBV and UDAPL from the MPI_IC_ORDER environment variable. The $MPI_ROOT/etc/hpmpi.conf file can be used as a template.

MPI_IC_ORDER="vapi:itapi:psm:mx:gm:elan:TCP"

Alternatively, an interconnect selection flag can be given on the mpirun command line. For example, to select TCP as the interconnect:

% $MPI_ROOT/bin/mpirun -TCP …
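
Editing the search order by hand is easy to get wrong, so the removal of IBV and UDAPL described above can also be sketched as a small shell function that filters a colon-separated order. The function name drop_from_order is hypothetical; it only produces the new string and does not write hpmpi.conf.

```shell
# Remove the named protocols (case-insensitive) from a colon-separated order.
# Usage: drop_from_order "ibv:vapi:udapl:TCP" ibv udapl   ->   vapi:TCP
drop_from_order() {
    order=$1
    shift
    drop=$(printf ' %s ' "$*" | tr '[:upper:]' '[:lower:]')
    result=
    old_ifs=$IFS
    IFS=:
    for proto in $order; do
        lower=$(printf '%s' "$proto" | tr '[:upper:]' '[:lower:]')
        case $drop in
            *" $lower "*) ;;                          # listed: leave it out
            *) result="$result${result:+:}$proto" ;;  # keep original spelling
        esac
    done
    IFS=$old_ifs
    printf '%s\n' "$result"
}
```

Applied to the default order with ibv and udapl as arguments, this prints vapi:itapi:psm:mx:gm:elan:TCP, which can then be assigned to MPI_IC_ORDER.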

libibverbs: Warning: fork()-safety requested but init failed

This is a benign warning message that is issued when using HP-MPI v2.2.5.1 or later on Linux kernels older than v2.6.12, and OFED (OpenFabrics Enterprise Distribution) version 1.2 or later.

The application will run successfully.

The message is repeated for each rank used by the application. So multiple instances of this message may be found in the output file.

The message can be avoided by any one of the following:
Run on a kernel of version 2.6.12 or newer.
Set HP-MPI environment variable MPI_IBV_NO_FORK_SAFE to 1. For example:

/opt/hpmpi/bin/mpirun -np 8 -e MPI_IBV_NO_FORK_SAFE=1 -srun ./a.out

Application that uses fork() crashes

Applications using fork() may crash on configurations with InfiniBand using OFED (OpenFabrics Enterprise Distribution) on kernels older than v2.6.18. This includes indirect use of fork() such as calls to system() and popen(), which invoke fork().

Known problems with fork() and OFED can be avoided in any of the following ways:
Run on XC V3.2.1 or greater, where all known OFED fork() fixes have been made.
Run on a system with kernel 2.6.18 or greater
Run InfiniBand with non-OFED drivers. This is not an option on configurations with ConnectX InfiniBand Host Channel Adapters, where OFED is required.

Set MPI_REMSH to a local command, got error: Cannot execvp “<command>”: No such file or directory

By default, HP-MPI uses a tree structure to start large jobs. When the number of hosts is greater than 20, the mpid’s use MPI_REMSH to create other mpid’s. So, the MPI_REMSH command must be accessible on each host. This error can be avoided in two ways:
Make MPI_REMSH available on each host you plan to run on.
Force mpirun to start all the remote mpid’s by setting MPI_MAX_REMSH=99999999.

MPI_Init: MPI BUG: IBV requested but not available

This message may be displayed by multi-threaded applications that are using HP-MPI 2.2.7. When the pthread library is included with the application, the expected library linking order is altered. This will be fixed in a future release of HP-MPI. The workaround is to add the MPI_ICSELECT_FORK environment variable to the mpirun command line:

% $MPI_ROOT/bin/mpirun -e MPI_ICSELECT_FORK=0 …

MPI_Init: Could not pin pre-pinned rdma region 0

The pinnable memory limit is set too low for the user to allocate and pre-pin memory for use with RDMA message transfers. This limit can be adjusted in the /etc/security/limits.conf file on RedHat, and /etc/sysctl.conf on SUSE. On our systems, we set the limits to:

soft memlock 524288
hard memlock 524288

The sshd service will need to be restarted after the change is made.
% service sshd restart

uDAPL dat_ia_openv shows udapl not available

Starting with RHEL 4 update 5, the dapl RPM delivers the dat.conf configuration files into the /etc/ofed directory. The configuration files are named dat32.conf and/or dat64.conf, depending on the bitness of the OS. Beginning with RHEL 5 update 2 the file is named dat.conf. The libdat library expects the configuration file to be called /etc/dat.conf. This means that an out-of-the-box dapl installation will never be able to locate the configuration file. If the uDAPL protocol is used by mpirun, the application will fail with errors like:
uDAPL dat_ia_openv shows udapl not available via UDAPL_MAIN_20 setting.
uDAPL dat_ia_openv shows udapl not available via UDAPL_MAIN_20b setting.
uDAPL dat_ia_openv shows udapl not available via UDAPL_MAIN_1X setting.
mpid: MPI BUG: uDAPL requested but not available

The solution is to copy the configuration file to /etc/dat.conf.

uDAPL runs hang on QLogic clusters
If an application using uDAPL running on QLogic hangs during the initialization process the problem may be lack of a module “rdma_ucm”. To load that module by hand you can use the command:

# modprobe rdma_ucm

To have the module loaded automatically, add rdma_cm and rdma_ucm to the OPENFABRICS_MODULES assignment in /etc/sysconfig/infinipath.

MPI_Init: dat_evd_wait()1 unexpected event number 16392

Users may see the following error during launch of HP-MPI applications on Chelsio iWARP hardware:

Rank 0:0: MPI_Init: dat_evd_wait()1 unexpected event number 16392
Rank 0:0: MPI_Init: MPI BUG: Processes cannot connect to rdma device
MPI Application rank 0 exited before MPI_Finalize() with status 1

To prevent these errors, Chelsio recommends passing the peer2peer=1 parameter to the iw_cxgb3 kernel module. This can be accomplished by running the following commands as root on all nodes:
# echo "1" > /sys/module/iw_cxgb3/parameters/peer2peer
# echo "options iw_cxgb3 peer2peer=1" >> /etc/modprobe.conf

The second command is optional and makes the setting persist across a system reboot.

HP-MPI 2.3 problems with -e MPI_UDAPL_MSG1=1
Users of iWARP hardware may see errors resembling the following:

dapl async_event QP (0x2b27fdc10d30) ERR 1
dapl_evd_qp_async_error_callback() IB async QP err – ctx=0x2b27fdc10d30

Previous versions of HP-MPI documented that passing “-e MPI_UDAPL_MSG1=1” was necessary on some iWARP hardware. As of HP-MPI 2.3, no iWARP implementations are known to need this setting, and it should be removed from all run scripts unless otherwise instructed.

-prot causes segv in MPI_Finalize

This is a known issue with “-prot” at high rank counts. The problem occurs most frequently at more than 512 ranks. The workaround is to remove “-prot” from the command line of any application run that experiences this problem.

mpid: MPI BUG: execvp failed: Cannot execute ./a.out: No such file or directory

mpid cannot find the executable to spawn in the following case:

The executable specified via the 2nd argument of MPI_Comm_spawn(_multiple) is a relative path, or has no path information at all. Specifying a full path to MPI_Comm_spawn(_multiple) will always work.
The executable specified is NOT in the user’s home directory (having the executable in a subdirectory of the home directory will demonstrate this problem).
mpirun uses -hostlist (or appfile mode) to launch the initial ranks.

There are several possible workarounds for this problem:
Setting -e MPI_WORKDIR=`pwd` works if the executable to spawn is in the current working directory.
Specifying a full path as MPI_Comm_spawn’s 2nd argument is a valid workaround.
Placing the executable in the user’s home directory will also work.
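
The first two workarounds can be sketched from the launching shell as follows. This is only an illustration: abs_path is a hypothetical helper, the items in angle brackets are placeholders, and passing the path to the parent program on its command line is an assumption about the application, not something HP-MPI requires.

```shell
#!/bin/sh
# Hypothetical helper: print an absolute form of a path.
# A path that already starts with "/" is printed unchanged;
# anything else is prefixed with the current working directory.
abs_path() {
    case $1 in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s/%s\n' "$(pwd)" "$1" ;;
    esac
}

# Workaround 1: start the ranks in the current working directory.
#   $MPI_ROOT/bin/mpirun -e MPI_WORKDIR="$(pwd)" -hostlist <hosts> <program>
#
# Workaround 2: compute a full path for the executable to spawn, then
# give it to the program that calls MPI_Comm_spawn (assumed here to
# accept it as an argument).
#   child=$(abs_path ./a.out)
#   $MPI_ROOT/bin/mpirun -hostlist <hosts> <program> "$child"
```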

mpid: An mpid exiting before connecting back to mpirun

In some cases, a batch job launcher is not able to launch jobs that are distributed across more than 20 hosts using the default HP-MPI settings. For such jobs, increasing MPI_MAX_REMSH may allow the job to launch normally. The default value of MPI_MAX_REMSH is 20. In most cases, setting the value equal to the number of nodes allows the job to launch successfully.

MPI_MAX_REMSH specifies the number of remote hosts that the mpirun command should connect to while launching the job. If the number of hosts in the job exceeds the value of MPI_MAX_REMSH, then the remaining hosts are contacted in a tree structure by the hosts in the first level.
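
As a worked illustration of the paragraph above, the sketch below computes how many hosts mpirun would contact directly and how many would be left to the tree, for a given host count and MPI_MAX_REMSH value. The function name remsh_split is hypothetical; the default of 20 is the value stated above.

```shell
#!/bin/sh
# Hypothetical helper: split a host count into hosts contacted directly
# by mpirun and hosts reached through the tree.
#   $1 = number of hosts in the job
#   $2 = MPI_MAX_REMSH value (defaults to 20, the documented default)
remsh_split() {
    nhosts=$1
    max=${2:-20}
    if [ "$nhosts" -le "$max" ]; then
        direct=$nhosts
    else
        direct=$max
    fi
    echo "direct=$direct tree=$((nhosts - direct))"
}

remsh_split 64       # prints: direct=20 tree=44
remsh_split 64 64    # prints: direct=64 tree=0
```

The second call corresponds to raising MPI_MAX_REMSH to the number of nodes, for example by exporting it in the shell that runs mpirun, so that no host is left to the tree.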
 General  
 
 Where can I get the latest version of HP-MPI?

HP-MPI can be obtained from http://www.hp.com/go/mpi, by following the “Download” link.

What is MPI_ROOT that I see referenced all over the place?

MPI_ROOT is an environment variable that hp-mpi (mpirun) uses to determine where HP-MPI is installed and therefore which executables and libraries to use. If you only have one copy of HP-MPI installed on the system and it is in /opt/hpmpi or /opt/mpi, then you do not need to set MPI_ROOT. To set MPI_ROOT:

For sh, bash, or ksh shells:

% export MPI_ROOT=/opt/hpmpi
% $MPI_ROOT/bin/mpirun …

For csh or tcsh shells:

% setenv MPI_ROOT /opt/hpmpi
% $MPI_ROOT/bin/mpirun …

It is particularly helpful when you have multiple versions of HP-MPI installed on a system.

Where can I get a license for HP-MPI?

The first thing you must determine is if you need a license. If you do need a license, then proceed as follows:
Customers should get their license by contacting HP Licensing, using the information on the License Certificate.
A 30 day demo license can be downloaded, along with the current version of HP-MPI at
http://www.hp.com/go/mpi. Follow the “Download” link.
To request a permanent license you will need to supply the hostname of the license server, and the ‘hostid’ of the license server. The hostid can be found by running this command:

$MPI_ROOT/bin/licensing/<arch>/lmutil hostid

The value of <arch> should be selected from the available directories in $MPI_ROOT/bin/licensing.
How can I tell what version of HP-MPI is installed on my system?

The currently installed version of HP-MPI can be determined in several ways. Try one of the following:
% $MPI_ROOT/bin/mpirun -version
(on HP-UX) % swlist –l product | grep “HP MPI”
(on Linux) % rpm –qa | grep “hpmpi”
What Linux distributions is HP-MPI supported on?

We have started tracking this information in the HP-MPI release notes. See the release note for your product for this information. Generally we test with the current distributions of RedHat and SuSE, but are generally compatible with all distributions.

You can also check the “What versions of interconnect software stacks are supported with HP-MPI” questions for some additional information about which versions of Linux are qualified with HP-MPI.

Does HP-MPI work with applications that use MPICH2?

HP-MPI provides an “MPICH compatibility mode” that supports applications and libraries that use the MPICH implementation. It is important to remember that MPICH2 is not a standard, but rather a specific implementation of the MPI-2 standard.

HP-MPI provides MPICH compatibility with mpirun, and with all the compiler wrappers: mpirun.mpich, mpicc.mpich, mpiCC.mpich, mpif77.mpich, and mpif90.mpich.

In general, object files built with HP-MPI’s MPICH compiler wrappers can be used by an application that uses the MPICH implementation. In general, applications built using MPICH complaint libraries should be re-linked to use HP-MPI. Using MPICH compatibility mode to produce a single executable to run under both MPICH and HP-MPI is not advised.

More information on MPICH-compatibility can be found in the HP-MPI 11th Edition Users Guide (http://www.hp.com/go/mpi, click on Documentation.

What version of HP-MPI works with my version of HPUX?

HP-UX Version Intanium PA-RISC
HP-UX 11.31
HP-UX 11i V3 MPI V2.2.5
MPI V2.2 MPI V2.2.5
MPI V2.2 
HP-UX 11.23
HP-UX 11i V2 MPI V2.2.5
MPI V2.2
MPI V2.1.1
MPI V2.0 MPI V2.2.5
MPI V2.2
MPI V2.1.1
MPI V2.0
HP-UX 11.11
HP-UX 11i V1 N/A MPI V2.2.5
MPI V2.2
MPI V2.1.1
MPI V2.0
 
 
 Installation/setup  
 
 Do I need a license to run HP-MPI?

You do not need a license on HP-UX or an XC4000 or XC6000 system. You also do not need a license if you are running a supported ISV application. The list of currently supported ISV applications is:
Acusim (Acusolve)
Adina (Adina)
Ansys (Ansys, CFX)
Abaqus (Abaqus)
Accelrys (Castep, DMol3, Discover, MesoDyn, ONETEP, GULP)
Allied Engineering (ADVC)
AVL (Fire, Swift)
CD-Adapco (Star-CD)
COSMOlogic GmbH&Co.KG (TURBOMOLE)
ESI-Group (Pamcrash, PamFlow)
EXA (PowerFlow)
Fluent (Fluent)
LSTC (LS-Dyna)
Magmasoft (Magmasoft)
Mecalog (Radioss)
Metacomp (Metacomp)
MSC (Nastran, Marc)
Parallel Geoscience Corporation (SPW)
Remcom (various)
Ricardo (Vectis)
Roxar (Roxar)
TNO (Tass)
UGS (Nx Nastran)
Univ. of Birmingham (Molpro)
Univ. of Texas (AMLS)
In all other cases, you need to acquire a license.

Where do I install the license file?

The license file should be copied to:

$MPI_ROOT/licenses/mpi.lic

The license file should be readable by owner, group, and world. The license file only needs to exist on the host where rank0 will run, but it is recommended that the license file be copied to every host that will run HP-MPI jobs.

How do I start the license server?

The license server can be started using the command:

% $MPI_ROOT/bin/licensing/<arch>/lmgrd -c mpi.lic

Where <arch> is the architecture of the host that is running the license server.
How are the ranks launched? (or why do I get the message “remshd: Login incorrect.” or “Permission denied” )

There are a number of ways that HP-MPI can launch the ranks, but some way must be made available:
allow passwordless rsh access by setting up hosts.equiv and/or .rhost files to allow the mpirun machine to rsh into the execution nodes.
allow passwordless ssh access from the mpirun machine to the execution nodes and set the environment variable MPI_REMSH to the full path of ssh. (See IQ3)
use SLURM (srun) by using the –srun option to mpirun
under Quadrics, use RMS (prun) by using –prun option to mpirun
How can I verify that HP-MPI is installed and functioning optimally on my system?

A simple hello_world test is available in $MPI_ROOT/help/hello_world.c that can validate basic launching and connectivity. Other more involved tests are also included. To test the bandwidth and latency of your system use the ping_pong.c test.

Can I have multiple version of HP-MPI installed and how can I switch between them?

You can install multiple HP-MPI’s and they can be installed anywhere as long as they are in the same place on each host you plan to run on. You can switch between them by setting the environment variable MPI_ROOT.

How do I install HP-MPI in a non-standard location on Linux?

There are two ways:

% rpm –prefix=/wherever/you/want -ivh hpmpi-XXXXX.XXX.rpm

Alternatively, “untar” the rpm using:

% rpm2cpio hpmpi-XXXXX.XXX.rpm | cpio -id

How can I determine what the environment settings are during the application runtime?

HP-MPI does not propagate the full environment of the shell that started the mpirun command. To determine what variables are set during the application run, replace the application with /bin/env tool, while keeping the remainder of the mpirun command the same:

% $MPI_ROOT/bin/mpirun -stdio=p … /bin/env

The “-sdtio=p” command line option will cause all the output to be tagged with the rank ID that produced that output.

If you application is launched using a script to setup the environment, you can temporarily replace (i.e. symlink) the executable to /bin/env and still determine the runtime environment.

How can I pass an environment variable to my application?

An environment variable that is required by the application can be set on the mpirun command line using the -e flag.

% $MPI_ROOT/bin/mpirun -e <variable>=<value> …

Is there a way to automatically set an environment variable for all future HP-MPI runs?

The hpmpi.conf file can be used to set options for all future HP-MPI runs. The hpmpi.conf file exists in three places on each system. The file is read in this order, and the last entry for any particular setting is what is used.
$MPI_ROOT/etc/hpmpi.conf
/etc/hpmpi.conf
~/.hpmpi.conf
 
 
 Host environment setup  
 
 How do I setup passwordless ssh access?

First from some arbitrary machine run

host1% ssh-keygen -t dsa

When it asks for a passphrase just hit enter. This creates a file at ~/.ssh/id_dsa which is supposed to be private and local to that machine. And it creates an ~/.ssh/id_dsa.pub that will become part of the remote machines ~/.ssh/authorized_keys2 files and allows you to ssh from ost1 to those other hosts.

You can cheat a little and use the same ~/.ssh/id_dsa everywhere.

ext:

host1% cat ~/.ssh/id_dsa.pub
~/.ssh/authorized_keys2

Add the collowing lines to the ~/.ssh/config file:

StrictHostKeyChecking=no
ConnectionAttempts=15

On any new host,

newhost% mkdir ~/.ssh
newhost% cp /backup/copy/for/ssh/* ~/.ssh
newhost% chmod 700 ~/.ssh
newhost% chmod 755 ~

How can I increase the shmmax on a host?

To increase the shmmax on a host, without a reboot:

sysctl -w kernel.shmmax=<value>
sysctl -p /etc/sysctl.conf

This will modify the kernel.shmmax value in the /etc/sysctl.conf file, and then force the /etc/sysctl.conf file to be re-read by the system. 
 
 Performance problems 
 
 Are there any flags I can use to get better performance?

Better performance is a difficult thing to address in a general FAQ. In general, HP-MPI is built to provide good performance over a very wide range of input, system, application size, and other variables. In some cases, it is possible to improve the performance of an application by using specific tunable flags and settings.

The HP-MPI Users Guide includes a section on “Tuning” and a detailed description of all the HP-MPI flags and settings. The HP-MPI Users Guide can be found at http://www.hp.com/go/mpi by following the “Documentation” link.

Can the polling time of a rank be tuned? Can a rank be forced to yield without spinning?

Yes. The amount of time that a rank will poll waiting for new messages is tunable using the MPI_FLAGS environment variable.

MPI_FLAGS=y#

The “#” sign should be replaced with the number of miliseconds that a rank will spin before yielding. The default value is 10000. A value of 0 will cause the rank to yield immediately and not spin waiting for another message.

How do I turn on MPI collection of message lengths? I want an overview of MPI message lengths being send around within the application.

This information is available through HP-MPI’s “instrumentation” feature. Information is available in the user’s guide, but basically including “-i ” on the mpirun command line will create with a report that includes number and sizes of messages sent between ranks.

How does MPI_Init scale for large rank counts?

We have not done extensive profiling of the startup time for MPI_Init, but we believe that MPI_Init time is proportional to the number of ranks plus some overhead time. The overhead is likely a function of the amount of shared memory that is required on each node, plus the number of off host connections that must be established. 
 
 Interconnect and networking 
 
 What version of HP-MPI supports ConnectX?

HP-MPI v2.2.5.1 is the first release which supports ConnectX.

A failure/crash occurs when running HP-MPI applications on InfiniBand clusters using ConnectX Infiniband Host Channel Adapters when using HP-MPI versions older than v2.2.5.1. There is no specific signature to the failure. In one instance an application did not produce its results file. In another, a SEGV violation occurred. An example of the SEGV violation on a compute node named hadesn2 follows.

Terminated at Timestep -1: Segmentation Fault
srun: error: hadesn2: task[4,6]: Segmentation fault (core dumped)
srun: Terminating job

What versions of HP MPI are qualified with OFED?
HP-MPI 2.2.5 works with OFED 1.0 and 1.1.
HP-MPI 2.2.5.1 works with OFED 1.0, 1.1, 1.2, 1.2.5 and 1.3, but it does not support XRC in 1.3.
HP-MPI 2.2.7 works with OFED 1.0, 1.1, 1.2, 1.2.5 and 1.3 (including XRC).
HP-MPI 2.3 works with OFED 1.0, 1.1, 1.2, 1.2.5 and 1.3 (including XRC).

What InfiniBand HCA firmware is supported by OFED 1.3?

IOFED 1.3 supports the following Mellanox Technologies HCAs (SDR and DDR Modes are Supported):
InfiniHost (fw-23108 Rev 3.5.000)
InfiniHost III Ex (MemFree: fw-25218 Rev 5.3.000) (With memory: fw-25208 Rev 4.8.200)
InfiniHost III Lx (fw-25204 Rev 1.2.000)
ConnectX IB (fw-25408 Rev 2.3.000)
HP-MPI may not work properly when used with OFED 1.3 and an unsupported version of the HCA firmware.

How much memory is consumed when InfiniBand is used?

The amount of memory used varies with the message sizes that the application sends. For short message envelopes. 64M of memory is pinned. If long messages (>16K) are sent, up to 20% of the physical memory may be pinned during the application runtime.

When long messages (>16K) are sent, HP-MPI will pin the user provided buffer on both the sender and receiver side. Up to 20% of physical memory will be pinned. This memory is used by all the ranks on the node. If free() is called on memory that is in the 20%, the memory will be unpinned.

How do I control which network protocol is used by my application?

HP-MPI will test each protocol in the order listed in MPI_IC_ORDER, and use the first available option for the application. The test is done at runtime by the mpirun command. The default network protocol search is defined by the MPI_IC_ORDER environement variable. The defuault value in HP-MPI v2.2.7 is:

MPI_IC_ORDER=”ibv:vapi:udapl:itapi:psm:mx:gm:elan:TCP”

To force the selection of a specific network protocol, an option can be provided to mpirun. Currently, valid options are -IBV, -VAPI, -UDAPL, -ITAPI, -PSM, -MX, -GM, -ELAN, -TCP. Options can be either upper case or lower case. Upper case options will cause an error if the required network is not available. Lower case options will issue a warning if the requested network is not available, and will then use MPI_IC_ORDER to select a network.

To change the value for all future runs, the MPI_IC_ORDER variable can be added to the hpmpi.conf file.

What versions of Interconnect software stacks are supported with HP-MPI 2.3?

Protocol Option Supported OS Supported platforms NIC version !InfiniBand driver version 
Shared memory on SMP N/A Linux 2.4, 2.6 Kernels All Platforms N/A N/A
OpenFabrics -IBV Linux 2.6 Kernels IA64, i386, x86_64 Any IB card OFED 1.0, 1.1, 1.2, 1.3
uDAPL Standard -UDAPL Linux 2.4, 2.6 Kernels IA64, i386, x86_64 IB vendor specific 10GbE vendor  uDAPL 1.1, 1.2, 2.0
QLogic PSM -PSM Linux 2.6 Kernels x86_64 QHT7140,QLE7140 PSM 1.0
Myrinet MX -MX Linux 2.4, 2.6 Kernels IA64, i386, x86_64 Rev D,E,F,10G MX 2g, 10g, V1.2.x
Myrinet GM -GM Linux 2.4, 2.6 Kernels IA64, i386, x86_64 Rev D,E,F GM 2.0 and later
Quadrics ELAN -ELAN Linux 2.4, 2.6 Kernels IA64, i386, x86_64 1.4.20; 1.4.22 ELAN4
TCP/IP -TCP Linux 2.4, 2.6 Kernels All Platforms All cards that support IP Ethernet Driver, IP

What versions of interconnect software stacks are supported with HP-MPI 2.2.5.1?

Protocol Option Supported OS Supported platforms NIC version !InfiniBand driver version
Shared memory on SMP N/A Linux 2.4, 2.6 Kernels All Platforms N/A N/A
OpenFabrics -IBV Linux 2.6 Kernel IA64, i386, x86_64 Any IB card OFED 1.1, 1.2
Mellanox VAPI  -VAPI Linux 2.4, 2.6 Kernels IA64, i386, x86_64 Mellanox Card VAPI 3.2, 4.1 
uDAPL Standard -UDAPL Linux 2.4, 2.6 Kernels IA64, i386, x86_64 vendor specific uDAPL 1.1, 1.2
QLogic PSM -PSM Linux 2.6 Kernels x86_64 QHT7140,QLE7140 PSM 1.0
Myrinet MX -MX Linux 2.4, 2.6 Kernels IA64, i386, x86_64 Rev D,E,F,10G MX 2g, 10g
Myrinet GM -GM Linux 2.4, 2.6 Kernels IA64, i386, x86_64 Rev D,E,F GM 2.0 and later
Quadrics ELAN -ELAN Linux 2.4, 2.6 Kernels IA64, i386, x86_64 1.4.20; 1.4.22 Elan4
TCP/IP -TCP Linux 2.4, 2.6 Kernels All Platforms All cards that support IP Ethernet Driver, IP

How is HP-MPI IB message striping (the ability to send a single message using multiple IB cards) functionality accessed?

HP-MPI supports multi-rail on OpenFabric via the environment variable MPI_IB_MULTIRAIL. This environment variable is ignored by all other interconnects. In multi-rail mode, a rank can use up to all the cards on its node, but limited to the number of cards on the node to which it is connecting. For example, if rank A has three cards, rank B has two cards, rank C has three cards, then connection A–B uses two cards, connection B–C uses two cards, connection A–C uses three cards. Long messages are striped among all the cards on that connection to improve bandwidth.

By default, multi-card message striping is off. To turn on it, specify ‘-e MPI_IB_MULTIRAIL=N’, where N is the number of cards used by a rank. If N <= 1, then message striping is not used. If N is greater than the max number of cards M on that node, then all M cards are used. If 1 < N <= M, message striping is used on N cards or less. If ‘-e MPI_IB_MULTIRAIL’ is specified, the max possible cards are used.

On a host, all the ranks select all the cards in a series. For example: Given 4 cards and 4 ranks per host, rank 0 will use cards ‘0, 1, 2, 3’, rank 1 will use ‘1, 2, 3, 0’, rank 2 will use ‘2, 3, 0, 1’, and rank 4 will use ‘3, 0, 1, 2’. The order is important in SRQ mode because only the first card is used for short messages. The selection approach allows short RDMA messages to use all the cards in a balanced way.

For HP-MPI 2.2.5.1 and older, all cards must be on the same fabric.

What extra software do I need to allow HP-MPI to run on my Infiniband hardware?
On HP-UX you do not need anything (unless you want to run IP over IB (IPoIB)). On HP XC Linux, you do not need any additional software. On other Linux distributions, you will need to use a software stack that supports the specific InfiniBand cards you are using. Please contact the InfiniBand card vendor to determine what software you should install.

Why is the IB network not being used for communication?
The problem is generally a result of not finding the modules or libraries that HP-MPI looks for. To see what modules and libraries HP-MPI expects, see $MPI_ROOT/etc/hpmpi.conf. For example, there was a change in the vapi module name from mod_vapi to mod_vip. In HP-MPI 2.2.5, this file was updated to reflect this change, but if you are running with 2.2 you may need to make this change yourself. After 2.2.5 you can always ask the HP-MPI team for the latest hpmpi.conf file. If you are not getting the network you want selected, look at what is expected in hpmpi.conf and compare to the output of lsmod and the name/location of library files on the systems you are running on.

I get a problem when I run my 32-bit executable on my AMD64 or EM64T system:

ping_pong: Rank 0:1: MPI_Init: Fatal: dlopen(libitapi.so) failed! > (/usr/voltaire/lib/libitapi.so: cannot open shared object file: No > such file or directory) > a.out: Rank 0:1: MPI_Init: Can’t initialize RDMA device
Note that not all messages that say “Can’t initialize RDMA device” are caused by this problem, but this message can show up when running a 32-bit executable on a 64-bit linux machine. At present, the 64-bit deamon used by HP-MPI cannot determine the bitness of the executable and thereby uses incomplete information to determine the availability of high performance interconnects. To work around the problem for now, be sure to use flags (-TCP, -ITAPI, etc) to explicitly specify the network to use.

Where does HP-MPI look for the shared-libraries for the high-performance networks it supports?
On HP-MPI 2.2.5 and higher, these can be changed or extended globally using /etc/hpmpi.conf or $MPI_ROOT/etc/hpmpi.conf and can be set individually using ~/.hpmpi.conf. Environment variables can be set using the ‘-e’ option to mpirun. For older versions of HP-MPI, this chart explains the selection process:

Protocol 1st attempt 2nd attempt 3rd attempt
IB Environment variable MPI_ICLIB_ITAPI libitapi.so /usr/voltaire/lib/libitapi.so
GM Environment variable MPI_ICLIB_GM libgm.so OR libgm32.so /opt/gm/lib/libgm.so OR /opt/gm/lib/libgm32.so
ELAN Environment variable MPI_ICLIB_ELAN libelan.so
uDAPL Environment variable MPI_ICLIB_UDAPL libdat.so
VAPI Environment variable MPI_ICLIB_VAPI Environment variable MPI_ICLIB_VAPIDIR libmtl_common.so, libmpga.so, libmosal.so, libvapi.so
MX Environment variable MPI_ICLIB_MX libmyriexpress.so

 

When multiple TCP/IP interconnects are present, how does HP-MPI choose the network that will be used?

By default, HP-MPI uses the system call gethostbyname() to determine the ‘primary’ IP address of a host. That is the IP address that will be used to establish the initial connection will all the hosts that will be used.

With TCP/IP networks, a specific subnet can be selected by using the “-netaddr” flag to mpirun:

% $MPI_ROOT/bin/mpirun -netaddr <ip_address> …

QLogic card with -IBV flag dumps core

QLogic encourages the use of the -PSM protocol with their adapters.
 
 
 Building applications 
 
 What compilers does HP-MPI work with?

HP-MPI has gone through great measures to not introduce compiler dependencies. HP-MPI 2.2.7 for Linux was tested with the following list of compilers:
GNU 3.2, 3.4, 4.1
glibc 2.3, 2.4, 2.5
Intel® 9.0, 9.1, 10.0, 10.1
PathScale 2.3, 2.4, 2.5, 3.1, 3.2
Portland Group 6.2, 7.0
What MPI libraries do I need to link when I build.

We strongly encourage you to use the mpicc, mpif90 and mpi77 scripts in $MPI_ROOT/bin to build. If you absolutely do not want to build with the scripts, then I would at least use them with the –show option to see what they are doing and use that as a starting point for doing your build. The -show option will just print out the command it would use to build with. These scripts are readable, so you can always poke around in them to understand when and what gets linked in.

How do I specifically build a 32-bit application on a 64-bit architecture?

On Linux, HP-MPI contains additional libraries in a 32 bit directory for 32-bit builds: $MPI_ROOT/lib/linux_ia32/

Use the –mpi32 flag to mpicc to ensure that the 32-bit libraries are used. In addition, your specific compiler may require a flag to indicate a 32-bit compilation.

EXAMPLE: On an Opteron system, using gcc, you need to tell gcc to generate 32 bit via the flag -m32. In addition, the -mpi32 is used to ensure 32-bit libraries are selected:

% setenv MPI_ROOT /opt/hpmpi
% setenv MPI_CC gcc
% $MPI_ROOT/bin/mpicc hello_world.c -mpi32 -m32
% file a.out

a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), not stripped

Can I use HP-MPI in my C++ application?

Yes, HP-MPI 2.2 and above contains C++ bindings.

Can I use HP-MPI in my Java application?

Yes. Here is a brief description on how to use the Open Source Java Interface to MPI for Linux (mpiJava):
Get the Java RPM for Linux

Goto http://java.sun.com

Follow the “Downloads” link, and select the Linux RPM that is appropriate to your version of Linux.

Install Java on Linux server

Copy the Linux RPM to your Linux server. Install the software, following the instructions provided by Sun.

Download mpiJava to desktop system

Goto http://aspen.ucs.indiana.edu/pss/HPJava/mpiJava.html Click “Software registration and download” Follow the instructions to build HPJava on your local machine.
 
 
 Known Problems and Error Messages 
 
 My application crashes in HP-MPI

If you have an application that crashes with a segv, and the backtrace shows that it was in HP-MPI when it segv’ed, try passing -e MPI_FLAGS=Eon to mpirun. This requests that HP-MPI do argument checking on what is passed into HP-MPI from user code. Normally HP-MPI does no argument checking, which can in some cases cause a segv or other undefined behavior.

HP-MPI 2.2.5.1 Rank 0:0: MPI_Comm_spawn_multiple: spawn failed

This is a known problem with HP-MPI 2.2.5.1. The C++ bindings contained an error that caused calls to MPI_Comm_spawn to fail. The problem is fixed in HP-MPI 2.2.7. If for some reason you are unable to upgrade, please contact HP-MPI support at mpi@rsn.hp.com. We can provide a new set of C++ headers that can be used to recompile your application to enable spawn support.

ScaLAPACK does not work with HP-MPI mpirun

ScaLAPACK requires the use of the HP-MPI MPICH compatibility mode. You will need to recompile your application code using the MPICH compatibility compiler wrappers:

mpicc.mpich, mpiCC.mpich, mpif77.mpich, mpif90.mpich

When launching the application, you will need to use the

$MPI_ROOT/bin/mpirun.mpich

command.

HP-MPI 2.2.7 applications exit if -sdtio=bnone is set

There is no work around to allow -stdio=bnone to work in HP-MPI 2.2.7. This problem has been resolved in HP-MPI 2.3.

HP-MPI 2.2.7 will not work if LD_BIND_NOW is set

HP-MPI 2.2.7 introduced a new architecture to make collectives easier to upgrade and extend. That new architecture uses a “lazy” dlopen to inspect internal libraries. The LD_BIND_NOW environment variable forces all dlopen calls to ignore the “lazy” guidance. As a result, an error message about “unresolved symbols” will be displayed. At this time, there are two possible workarounds:

Unset the environment variable LD_BIND_NOW
Set the environment variable MPI_DISABLE_PLUGINS=1
This will be fixed in a future release of HP-MPI.

libibverbs: Fatal: couldn’t open sysfs class ‘infiniband_verbs’

This error message happens on RedHat 4 systems that do not have InfiniBand hardware installed. RedHat 4 pre-installed some InfiniBand libraries. HP-MPI attempts to auto detect the available interconnects. If InfiniBand libraries are installed, HP-MPI will open these libraries and attempt to detect if InfiniBand is in a usable state. The detection process will fail if no InfiniBand hardware is installed on the host. This causes the error message to be displayed.

The workaround is to remove IBV and UDAPL from the default interconnect search order. This should only be done on RedHat 4 systems that do not have InfiniBand hardware installed. The default MPI_IC_ORDER is:

MPI_IC_ORDER=”ibv:vapi:udapl:itapi:psm:mx:gm:elan:TCP”

Create or Edit the /etc/hpmpi.conf file to remove IBV and UDAPL from the MPI_IC_ORDER environment variable. The $MPI_ROOT/etc/hpmpi.conf file can be used as a template.

MPI_IC_ORDER=”vapi:udapl:itapi:psm:mx:gm:elan:TCP”

Alternatively, an interconnect selection flag can be given on the mpirun command line. For example, to select TCP as the interconnect: % $MPI_ROOT/bin/mpirun -TCP …

libibverbs: Warning: fork()-safety requested but init failed

This is a benign warning message that is issued when using HP-MPI v2.2.5.1 or later on Linux kernels older than v2.6.12, and OFED (OpenFabrics Enterprise Distribution) version 1.2 or later.

The application will run successfully.

The message is repeated for each rank used by the application. So multiple instances of this message may be found in the output file.

The message can be avoided by any one of the following:
Run on a kernel of version 2.6.12 or newer.
Set HP-MPI environment variable MPI_IBV_NO_FORK_SAFE to 1. For Example:

/opt/hpmpi/bin/mpirun -np 8 -e MPI_IBV_NO_FORK_SAFE=1 -srun ./a.out
Application that uses fork() crashes

Applications using fork() may crash on configurations with InfiniBand using OFED (OpenFabrics Enterprise Distribution) on kernels older than v2.6.18. This includes indirect use of fork() such as calls to system() and popen(), which invoke fork().

Known problems with fork() and OFED can be avoided in any of the following ways:
Run on XC V3.2.1 or greater, where all known OFED fork() fixes have been made.
Run on a system with kernel 2.6.18 or greater
Run InfiniBand with non-OFED drivers. This is not an option on configurations with ConnectX InfiniBand Host Channel Adapters, where OFED is required.
Set MPI_REMSH to a local command, got error: Cannot execvp “<command>”: No such file or directory

By default, HP-MPI uses a tree structure to start large jobs. When the number of hosts is greater than 20, the mpid’s use MPI_REMSH to create other mpid’s. So, the MPI_REMSH command must be accessible on each host. This error can be avoided in two ways:
make MPI_REMSH available on each host you plan to run on
Force the mpirun to start all the remote mpid’s. MPI_MAX_REMSH=99999999
MPI_Init: MPI BUG: IBV requested but not available

This message may be displayed by multi-threaded applications that are using HP-MPI 2.2.7. When the pthread library is included with the application, the expected library linking order is altered. This will be fixed in a future release of HP-MPI. The workaround is to add the MPI_ICSELECT_FORK environment variable to the mpirun command line:

% $MPI_ROOT/bin/mpirun -e MPI_ICSELECT_FORK=0 …

MPI_Init: Could not pin pre-pinned rdma region 0

The pinnable memory limit is set too low for the user to allocate and pre-pin memory for use with RDMA message transfers. This limit can be adjusted in the /etc/security/limits.conf file on Redhat, and /etc/syscntl.conf on SUSE. On our systems, we set the limits to:

soft memlock 524288
hard memlock 524288

The sshd service will need to be restarted after the change is made.
% service sshd restart

uDAPL dat_ia_openv shows udapl not available
Starting with RHEL 4 update 5, the dapl RPM delivers the dat.conf configuration files into the /etc/ofed directory. The configuration files are named dat32.conf and/or dat64.conf, depending on the bitness of the OS. Beginning eith RHEL 5 update 2 the file is named dat.conf. The libdat expects the configuration file to be called /etc/dat.conf. This means that an out-of-the-box dapl installation will never be able to locate the configuration file. If the uDAPL protocol is used by mpirun, the application will fail with errors like:
uDAPL dat_ia_openv shows udapl not available via UDAPL_MAIN_20 setting.
uDAPL dat_ia_openv shows udapl not available via UDAPL_MAIN_20b setting.
uDAPL dat_ia_openv shows udapl not available via UDAPL_MAIN_1X setting.
mpid: MPI BUG: uDAPL requested but not available

The solution is to copy the configuration file to /etc/dat.conf.

uDAPL runs hang on QLogic clusters
If an application using uDAPL running on QLogic hangs during the initialization process the problem may be lack of a module “rdma_ucm”. To load that module by hand you can use the command:

# modprobe rdma_ucm

To have the module loaded automatically, add rdma_cm and rdma_ucm to the OPENFABRICS_MODULES assignment in /etc/sysconfig/infinipath.

MPI_Init: dat_evd_wait()1 unexpected event number 16392

Users may see the following error during launch of HP-MPI applications on Chelsio iWARP hardware:

Rank 0:0: MPI_Init: dat_evd_wait()1 unexpected event number 16392
Rank 0:0: MPI_Init: MPI BUG: Processes cannot connect to rdma device
MPI Application rank 0 exited before MPI_Finalize() with status 1

To prevent these errors, Chelsio recommends passing the peer2peer=1 parameter to the iw_cxgb3 kernel module. This can be accomplished by running the following commands as root on all nodes:
# echo "1" > /sys/module/iw_cxgb3/parameters/peer2peer
# echo "options iw_cxgb3 peer2peer=1" >> /etc/modprobe.conf

The second command is optional and makes the setting persist across a system reboot.
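Because >> appends unconditionally, rerunning the second command adds a duplicate options line each time. A minimal sketch of a guard against that, assuming a hypothetical helper named append_once (not part of HP-MPI or the Chelsio tools):

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present.
append_once() {
    line="$1"
    file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (as root): append_once "options iw_cxgb3 peer2peer=1" /etc/modprobe.conf
```

This keeps the persistent setting safe to apply from a node setup script that may run more than once.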

HP-MPI 2.3 problems with -e MPI_UDAPL_MSG1=1
Users of iWARP hardware may see errors resembling the following:

dapl async_event QP (0x2b27fdc10d30) ERR 1
dapl_evd_qp_async_error_callback() IB async QP err – ctx=0x2b27fdc10d30

Previous versions of HP-MPI documented that passing “-e MPI_UDAPL_MSG1=1” was necessary on some iWARP hardware. As of HP-MPI 2.3, no iWARP implementations are known to need this setting, and it should be removed from all run scripts unless otherwise instructed.
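To locate run scripts that still carry the setting, a recursive search is usually enough. This is a sketch under assumptions: the helper name find_udapl_msg1 and the directory in the usage line are hypothetical, and it relies on a grep that supports -r (as GNU grep does).

```shell
# Hypothetical helper: list files under a directory that still mention
# MPI_UDAPL_MSG1 so the setting can be reviewed and removed.
find_udapl_msg1() {
    grep -rl -- 'MPI_UDAPL_MSG1' "${1:-.}"
}

# Usage: find_udapl_msg1 ~/run_scripts
```

The function only reports file names; editing the scripts is left as a manual step so that any deliberately retained setting is not removed by accident.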

-prot causes segv in MPI_Finalize
This is a known issue with “-prot” at high rank counts. The problem occurs most frequently with more than 512 ranks. The workaround is to remove “-prot” from the command line of any application run that experiences this problem.

mpid: MPI BUG: execvp failed: Cannot execute ./a.out: No such file or directory

MPID cannot find the executable to spawn when all of the following are true:

The executable specified via the 2nd argument of MPI_Comm_spawn(_multiple) is a relative path, or has no path information at all. (Specifying a full path to MPI_Comm_spawn(_multiple) will always work.)
The executable specified is NOT in the user’s home directory. (An executable in a subdirectory of the home directory will still show this problem.)
mpirun uses -hostlist (or appfile mode) to launch the initial ranks.

There are several possible workarounds for this problem:

Setting -e MPI_WORKDIR=`pwd` works if the executable to spawn is in the current working directory.
Specifying a full path as the 2nd argument of MPI_Comm_spawn is a valid workaround.
Placing the executable in the user’s home directory will also work.

mpid: An mpid exiting before connecting back to mpirun

In some cases, a batch job launcher is not able to launch jobs that are distributed across more than 20 hosts using the default HP-MPI settings. For jobs that will run on more than 20 hosts, increasing MPI_MAX_REMSH may allow the job to launch normally. The default value of MPI_MAX_REMSH is 20. In most cases, increasing the value to equal the number of nodes allows the job to launch successfully.

MPI_MAX_REMSH specifies the number of remote hosts that the mpirun command should connect to while launching the job. If the number of hosts in the job exceeds the value of MPI_MAX_REMSH, then the remaining hosts are contacted in a tree structure by the hosts in the first level.
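One way to set the value to the number of nodes is to count the distinct hosts in the job's host list. The sketch below is illustrative only: the helper name count_hosts and the ./hosts file (one host name per line) are hypothetical, and exporting the variable in the shell that runs mpirun is an assumption about how your job script passes it along.

```shell
# Hypothetical helper: count the distinct, non-empty host names in a file
# that lists one host per line (the file format is an assumption).
count_hosts() {
    grep -v '^[[:space:]]*$' "$1" | sort -u | wc -l | tr -d ' '
}

# Usage, assuming a hypothetical host list in ./hosts:
#   export MPI_MAX_REMSH=$(count_hosts ./hosts)
#   $MPI_ROOT/bin/mpirun ...
```

With the value equal to the host count, mpirun contacts every host directly instead of relying on the first-level hosts to reach the rest.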
 

서진우

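Managing Director (Chief Technology Officer) at Clunix, a supercomputing company / Certified Information Systems Auditor / operator of the Syszone blog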
