Forcing a solver that uses HP-MPI to communicate over TCP in an InfiniBand environment

On some Linux machines, InfiniBand libraries are installed (for example along with OpenMPI) without the corresponding kernel drivers and/or hardware. This can cause a CFD-ACE+ parallel run to stop with InfiniBand-related errors. Depending on your configuration, the error messages may be any of the following or something similar:

libibverbs: Fatal: Couldn't read uverbs ABI version
CFD-ACE-SOLVER-MPM-MPI: Rank 0:0: MPI_Init: didn't find active interface/port
CFD-ACE-SOLVER-MPM-MPI: Rank 0:0: MPI_Init: Can't initialize RDMA device
CFD-ACE-SOLVER-MPM-MPI: Rank 0:0: MPI_Init: MPI BUG: Cannot initialize RDMA protoc

In such cases, HP-MPI must be forced to use TCP connections by setting an environment variable. The MPI_IC_ORDER environment variable can be used to make HP-MPI ignore all interconnects other than TCP.

Variable name: MPI_IC_ORDER
Variable value: TCP

For bash/sh/ksh: export MPI_IC_ORDER="TCP"
For csh/tcsh: setenv MPI_IC_ORDER "TCP"
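
A minimal sketch, assuming bash and that the solver is launched from the master node's login shell: set the variable for the current session and, optionally, append it to ~/.bashrc so it persists across logins (the ~/.bashrc path is an assumption; use the startup file appropriate for your shell):

    # set for the current shell session only
    export MPI_IC_ORDER="TCP"

    # optional: persist for future logins (assumes bash reads ~/.bashrc)
    echo 'export MPI_IC_ORDER="TCP"' >> ~/.bashrc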

This needs to be set only on the master node.  
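
As a quick sanity check on the master node before launching the run (the command below is only illustrative; it simply confirms the variable is visible to the shell that starts the solver):

    # should print: TCP
    echo $MPI_IC_ORDER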

MPI_IC_ORDER is an environment variable whose default contents are:    

ibv:vapi:udapl:psm:mx:gm:elan:itapi:TCP

It instructs HP-MPI to search for interconnects in a specific order. A lowercase entry means "use this interconnect if detected, otherwise keep searching"; an uppercase entry demands that the interconnect be used, and if it cannot be selected the application terminates with an error. The same variable can therefore be used to force a different interconnect if one is available, as illustrated below.
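
For illustration, a few example settings that follow these rules (the interconnect keywords are taken from the default list above; whether a given interconnect is actually usable depends on your HP-MPI version and installed hardware):

    # prefer InfiniBand verbs if detected, otherwise keep searching and fall back to TCP
    export MPI_IC_ORDER="ibv:TCP"

    # demand InfiniBand verbs; the run aborts with an error if it cannot be used
    export MPI_IC_ORDER="IBV"

    # demand TCP and ignore all other interconnects (the workaround described above)
    export MPI_IC_ORDER="TCP"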
