Speeding up ssh connections in large MPI configurations

Of critical interest to us is how long it takes to start up and shut down large jobs. I have been experimenting on a 64-node, 2-CPU-per-node SPARC cluster. I ran a variety of timing tests and found that starting jobs on the remote nodes via rsh is much faster than via ssh. However, rsh is limited to the reserved ports 512-1023, so if one runs several large jobs at the same time, or in quick succession, one can quickly run out of available ports. Therefore, we suggest using ssh instead (or a resource manager, but that is another story).
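To make the port limit concrete, here is a rough back-of-the-envelope calculation in shell. It counts the reserved ports 512 through 1023 and assumes one rsh launch per node, with each launch holding up to two reserved ports (one for the command channel and one for the stderr channel). Both assumptions are mine, and ports lingering in TIME_WAIT are ignored, so treat the result as an optimistic upper bound.

```shell
#!/bin/sh
# Rough estimate only. ports_per_launch=2 and one launch per node are
# assumptions for illustration, not measurements.
low=512
high=1023
ports_per_launch=2
nodes=64            # size of the cluster described above

reserved_ports=$((high - low + 1))
max_launches=$((reserved_ports / ports_per_launch))
max_jobs=$((max_launches / nodes))

echo "reserved ports:           $reserved_ports"
echo "concurrent rsh launches:  $max_launches"
echo "concurrent ${nodes}-node jobs: $max_jobs"
```

With these numbers the estimate comes out at four concurrent 64-node jobs, which is why a few large jobs in quick succession can already exhaust the range.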

I have therefore looked into how to speed up ssh. I am not interested in the security features of ssh since I am running in a safe environment, so where I can trade security for speed, I will. I discovered that one way to speed things up is to configure ssh and sshd to use sshv1 instead of sshv2. The instructions for doing this follow.

  1. Generate the host key needed by sshv1. Do this on the node that you want to ssh to, and leave the passphrase blank.
    • # cd /etc/ssh
    • # ssh-keygen -t rsa1 -f ssh_host_key
  2. Make the following changes to the /etc/ssh/sshd_config file. The first change tells sshd to use Protocol 1. The second change points sshd at the Protocol 1 host key generated in step 1, since that key is not created by default.
      Protocol 1
      HostKey /etc/ssh/ssh_host_key
  3. Restart the sshd.
    • # svcadm restart network/ssh
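If the same change has to be rolled out to many nodes, it may help to script step 2 so that it can be rerun safely. The following is only a sketch: apply_sshv1_config is a hypothetical helper name, it assumes a POSIX shell and a grep that supports -q, and the demo runs against a scratch file rather than the real /etc/ssh/sshd_config.

```shell
#!/bin/sh
# apply_sshv1_config FILE   (hypothetical helper, not part of OpenSSH)
# Makes FILE contain "Protocol 1" and the Protocol 1 HostKey line.
# Running it twice gives the same result as running it once.
apply_sshv1_config() {
    conf=$1
    tmp="$conf.tmp.$$"

    # Drop any active (uncommented) Protocol line, keep everything else.
    grep -v '^[[:space:]]*Protocol[[:space:]]' "$conf" > "$tmp"
    echo 'Protocol 1' >> "$tmp"

    # Add the Protocol 1 host key line only if it is not already there.
    if ! grep -q '^[[:space:]]*HostKey[[:space:]]*/etc/ssh/ssh_host_key[[:space:]]*$' "$tmp"; then
        echo 'HostKey /etc/ssh/ssh_host_key' >> "$tmp"
    fi

    # Write back in place so the original file's permissions are kept.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Demo on a scratch file so nothing under /etc/ssh is touched.
scratch=$(mktemp)
printf '%s\n' '# sample config' 'Protocol 2' \
    'HostKey /etc/ssh/ssh_host_rsa_key' > "$scratch"
apply_sshv1_config "$scratch"
apply_sshv1_config "$scratch"   # second run must not duplicate lines
cat "$scratch"
```

On a real node you would point the function at /etc/ssh/sshd_config as root (after taking a backup) and then restart sshd as in step 3.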

Now you need to create user keys so that you are not asked for a password. Do this for every node. I suggest removing the existing identity files before running ssh-keygen, because it did not appear to overwrite them (even though it should).

  • > cd /home/rolfv/.ssh
  • > /bin/rm -rf identity identity.pub
  • > ssh-keygen -t rsa1 -f identity
  • > cat identity.pub >> authorized_keys
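One thing to watch with the last command: appending with >> adds another copy of the key every time the steps are repeated. A small guard avoids that. This is a sketch under my own assumptions: add_key_once is a hypothetical helper, the key line in the demo is a dummy string rather than a real key, and the chmod 600 is a common precaution rather than something the steps above require.

```shell
#!/bin/sh
# add_key_once PUBKEY_FILE AUTH_FILE   (hypothetical helper)
# Appends the public key to AUTH_FILE unless an identical line already exists.
add_key_once() {
    pub=$1
    auth=$2
    touch "$auth"
    # -F fixed strings, -x whole-line match, -q quiet, -f patterns from file
    if ! grep -Fxq -f "$pub" "$auth"; then
        cat "$pub" >> "$auth"
    fi
    chmod 600 "$auth"
}

# Demo in a scratch directory with a dummy key line (not a real key).
demo_dir=$(mktemp -d)
echo '1024 35 1234567890 demo@example' > "$demo_dir/identity.pub"
add_key_once "$demo_dir/identity.pub" "$demo_dir/authorized_keys"
add_key_once "$demo_dir/identity.pub" "$demo_dir/authorized_keys"
wc -l < "$demo_dir/authorized_keys"
```

In the real setup the call would be add_key_once identity.pub authorized_keys from inside the .ssh directory.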


To verify that you are really using sshv1, run ssh with the -1 switch (that is the number one, not the letter ell).

  • > ssh -1 allegany 

If you try it with sshv2, you should get an error.

  • > ssh -2 allegany

 Protocol major versions differ: 2 vs. 1
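Checking one node by hand is fine, but on a large cluster it is easy to miss a node where the key or the sshd change did not take. The sketch below loops over a list of hosts and tries a non-interactive Protocol 1 login on each. check_sshv1 and the SSH_CMD override are hypothetical names of mine; the ssh options used (-1, -n, -o BatchMode=yes) assume an ssh client that still supports Protocol 1.

```shell
#!/bin/sh
# check_sshv1 HOST...   (hypothetical helper)
# Tries a non-interactive Protocol 1 login to each host and reports the result.
# SSH_CMD can be set to substitute another command for ssh (e.g. for testing).
check_sshv1() {
    failed=0
    for host in "$@"; do
        # -1 forces Protocol 1, -n keeps ssh from reading stdin, and
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if ${SSH_CMD:-ssh} -1 -n -o BatchMode=yes "$host" true >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every host accepted the login.
    [ "$failed" -eq 0 ]
}
```

For example, check_sshv1 allegany would print "allegany: ok" if the passwordless sshv1 login works, and "allegany: FAILED" if ssh would have prompted for a password or the protocol versions differ.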

UPDATE – Here are the results of comparing rsh, ssh, and sshv1 run times of a simple MPI job.

[Image: rsh, ssh, and sshv1 run-time comparison]
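For anyone who wants to repeat this kind of comparison, a minimal timing wrapper could look like the sketch below. It is not the script used for the results above. time_cmd is a hypothetical helper, it only has whole-second resolution, and it assumes a date command that supports +%s (GNU and BSD date do; some older systems do not).

```shell
#!/bin/sh
# time_cmd LABEL COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND, discards its output, and prints the elapsed wall-clock
# time in whole seconds together with the command's exit status.
time_cmd() {
    label=$1
    shift
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "$label: $((end - start)) s (exit $status)"
}

# Placeholder command so the sketch runs anywhere.
time_cmd "baseline" true
```

To compare launchers, replace the placeholder with the actual MPI launch line, once per launcher (rsh, ssh, sshv1). How the MPI implementation is told which remote shell to use depends on the implementation and is deliberately not shown here.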

서진우

Clunix, a supercomputing company / Executive Director (Technical Director) / Information Systems Auditor / Operator of the Syszone blog
