Implementing a XenServer 6 Virtualization Environment and vGPU Passthrough

by 서진우 · Published August 16, 2016 · Updated December 17, 2024

Building a Citrix XenServer 6.2 + GPU Passthrough Environment

Date written: 2012-12-01

Author: Clunix / 서진우 (alang@clunix.com)

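1. Installing XenServer and basic operation

Around July 2013, with the release of XenServer 6.2, Citrix made XenServer, which had
previously been distributed under a commercial license, available as open source.

A paid license is still required for Citrix technical support and for automatic
updates and automatic hotfix installation through XenCenter, but no license is needed
if an engineer performs those tasks manually.

Update packages and hotfixes can be downloaded from Citrix.com or xenserver.org, and
an administrator can apply them manually with the xe command.

Starting with XenServer 6.2, the license no longer unlocks additional features; it
covers the cost of technical support. The complete source code of XenServer 6.2 is
published at http://xenserver.org.

First, install XenServer 6.2. Then install XenCenter, the XenServer management
program, on a separate Windows PC.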

– Applying hotfixes

Once installation is complete, apply the following two hotfixes to the XenServer host.

Hotfix XS62E004 (common)
Hotfix XS62ETP001 (required for vGPU passthrough)

To find hotfixes, start XenCenter and click "Check for Updates.." in the "Tools" menu
at the top; the applicable hotfixes are listed. Selecting the "Web Page" link takes you
to the download page.

: http://support.citrix.com/article/CTX138833

XS62ETP001 is the hotfix required to assign vGPUs on NVIDIA GRID K GPU devices.

The download location for this hotfix does not show up by default. Download it from the path below.

# wget http://downloadns.citrix.com.edgesuite.net/8174/XS62ETP001.zip

Log in to the XenServer host to apply the hotfixes.
First, unzip the hotfix archives.

# unzip XS62E004.zip
# unzip XS62ETP001.zip

Upload the hotfix file to be applied.
# xe patch-upload file-name=./XS62E004.xsupdate
5579f1f0-ff83-11e2-b778-0800200c9a66 (the uuid of the uploaded hotfix is printed)

Apply the hotfix to the host pool.
# xe patch-pool-apply uuid=5579f1f0-ff83-11e2-b778-0800200c9a66

To apply it to an individual host rather than to a pool, use the following form instead.

# xe patch-apply host-uuid=<host uuid> uuid=<patch uuid>

Check the result of the patch.
# xe patch-list name-label=XS62E004
——————————————————————————————
uuid ( RO) : 5579f1f0-ff83-11e2-b778-0800200c9a66
name-label ( RO): XS62E004
name-description ( RO): Public Availability: Fixes for Dom0 kernel issues
size ( RO): 35766769
hosts (SRO): c9ee4819-0646-4ec0-a45e-7f4195d8da58
after-apply-guidance (SRO): restartHost
——————————————————————————————

Reboot the XenServer host.

Hotfix XS62ETP001 must be applied after hotfix XS62E004.
The procedure is the same.
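
The upload-then-apply sequence can be wrapped in a small helper. This is only a minimal sketch: it assumes that xe patch-upload prints nothing but the patch uuid on success (as in the output shown earlier), and apply_hotfix is a hypothetical function name, not a XenServer command.

```shell
# apply_hotfix is a hypothetical helper name, not a XenServer command.
# $1: path to an extracted .xsupdate file
apply_hotfix() {
    local file uuid
    file=$1
    # xe patch-upload prints the uuid of the uploaded patch on success.
    uuid=$(xe patch-upload file-name="$file") || return 1
    # Apply the uploaded patch to every host in the pool.
    xe patch-pool-apply uuid="$uuid"
}

# Example (run on the XenServer host, rebooting after each hotfix):
#   apply_hotfix ./XS62E004.xsupdate
#   apply_hotfix ./XS62ETP001.xsupdate   # file name assumed to match the zip name
```

If a patch has already been uploaded, xe patch-upload is expected to fail and the helper returns without applying anything; in that case look up the uuid with xe patch-list instead.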

To remove an uploaded patch, use the commands below.

# xe patch-list
# xe patch-destroy uuid=<patch_uuid>

# xe patch-clean uuid=<patch_uuid>

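– Changing Dom0 memory

For tuning reasons it can be necessary to increase the default memory of Dom0. The
default is 752MB, and when many DomUs are running, too little Dom0 memory can affect
stability. If the XenServer host has plenty of memory, it is advisable to give Dom0 a
comfortable share relative to the total.

In XenServer 6.2, an appropriate amount of Dom0 memory is assigned automatically (an
improvement in this release). If the administrator wants to set it manually, the method
below can be used.

Boot from the XenServer installation CD; the initial boot screen appears.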
boot :

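Press F2 here and type the keyword "shell".

boot : shell

A bash# prompt appears.

# vi /opt/xensource/installer/constants.py

Find the DOM0_MEM keyword, remove the leading comment marker, and set the desired amount of memory (in MB).
# DOM0_MEM=752
.
DOM0_MEM=4092
Save the file, quit the editor, and type exit at the prompt.

bash-3.2 # exit

The normal installation process then continues.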

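– Changing the default local disk size

When XenServer is installed, the root partition is about 4GB by default. Over a long
period of operation, logs accumulate under /var and the disk can fill up. To prevent
this, set it to about 20GB for production services.

The local disk size is hard to extend after installation, so plan the capacity in
advance and apply it at installation time. The method is the same as for changing Dom0 memory.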

# vi /opt/xensource/installer/constants.py

#root_SIZE = 4096
.
root_SIZE = 20480

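Change the value as shown above.

– Creating a new local storage

By default, XenServer treats the disk on which the hypervisor is installed as a single
local storage.

If a disk is added later, or several disks were installed from the start, VMs can be
spread across multiple local storages to distribute the I/O load that would otherwise
concentrate on one local disk.

Note that when a VM is created on local storage, XenServer by default creates a virtual
volume through LVM, and the VM image is installed on that LVM volume.

Several other storage types are also available; refer to separate documentation for those.

First, check the default local storage.
Start by listing the HDDs and volumes currently recognized by XenServer.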

# fdisk -l
# cat /proc/partitions
# ls -al /dev/disk/by-id

# xe sr-probe type=lvm device-config:device=/dev/sda3
——————————————————————————–
<?xml version=”1.0″ ?>
<SRlist>
<SR>
<UUID>
1f35c9a0-a55d-37a0-6ac1-2c8ddc619c4f
</UUID>
<Devlist>
/dev/sda3
</Devlist>
<size>
991600574464
</size>
</SR>
</SRlist>
———————————————————————————

When a new HDD or storage volume is added, register it as a new local storage.

Creating an LVM SR

# xe sr-create host-uuid=<host_uuid> content-type=user \
name-label="Local LVM SR" shared=false device-config:device=<disk device> \
type=lvm

Running this against a real system prints the uuid of the new SR, as shown below.

# xe sr-create host-uuid=c9ee4819-0646-4ec0-a45e-7f4195d8da58 content-type=user \
device-config:device=/dev/mapper/ddf1_XENSRp1 name-label="Local Storage 2" \
shared=false type=lvm
———————————————————————————-
054b7cb9-30cf-93d1-55a7-f565ba6511f0

Check that the local storage was added correctly.

# xe sr-probe type=lvm device-config:device=/dev/mapper/ddf1_XENSRp1

Creating an EXT3 SR

# xe sr-create host-uuid=<host_uuid> content-type=user \
name-label="Local EXT SR" shared=false device-config:device=<disk device> \
type=ext

Notes

VMs created on XenServer local storage are mostly managed through LVM. With LVM,
operations such as dynamically resizing (extending) a VM disk, VM copy, and snapshot
perform well.

The actual LVM layout can be inspected with the LVM commands.

pvdisplay -> shows how the local storage is applied on the system
lvdisplay -> shows how the storage assigned to VMs is applied on the system

– Creating an SR from multiple HDDs

Do not partition the disks; use each device as a whole.

# ls -al /dev/disk/by-id/*

Build a single SR from the id values of the two disks sdb and sdc.

# xe sr-create host-uuid=<host-uuid> content-type=user name-label="Local LVM SR" shared=false device-config:device=<sdb_disk_id>,<sdc_disk_id> type=lvm

Add the other-config values.

# xe sr-list params=all uuid=<sr-uuid>
# xe sr-param-set uuid=<sr-uuid> other-config:i18n-key=local-storage
# xe sr-param-set uuid=<sr-uuid> other-config:i18n-original-value-name_label="Local storage"
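
The SR creation and the two other-config settings can be combined into one helper. A minimal sketch, assuming xe sr-create prints only the new SR uuid (as in the earlier output); create_local_lvm_sr is a hypothetical name. Creating an SR initializes the device, so any existing data on it should be considered lost.

```shell
# create_local_lvm_sr is a hypothetical helper name.
# $1: host uuid, $2: device (or comma-separated device list), $3: SR name label
create_local_lvm_sr() {
    local host_uuid device label sr
    host_uuid=$1
    device=$2
    label=$3
    # sr-create prints the uuid of the new SR on success.
    sr=$(xe sr-create host-uuid="$host_uuid" content-type=user \
        name-label="$label" shared=false \
        device-config:device="$device" type=lvm) || return 1
    # Same other-config keys as in the manual steps above.
    xe sr-param-set uuid="$sr" other-config:i18n-key=local-storage || return 1
    xe sr-param-set uuid="$sr" \
        other-config:i18n-original-value-name_label="Local storage" || return 1
    echo "$sr"
}

# Example: create_local_lvm_sr <host-uuid> <sdb_disk_id>,<sdc_disk_id> "Local Storage 2"
```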

– Booting into safe mode

Occasionally XenServer fails to boot normally after a hotfix is applied incorrectly or
because of some other fault. It may stop at the splash screen, make no further progress,
and respond to no keys.

In that case, boot into XenServer safe mode (Linux rescue mode).

After powering on the server, press ESC several times before the XenServer splash screen appears.
A boot: prompt appears. Type menu.c32.

boot : menu.c32

Several options are then shown.

Select the safe entry. To change kernel options, use the Tab key.

Booting in safe mode lets you watch the boot process. When a hotfix touches the kernel
(a kernel upgrade), a kernel panic or similar condition can occur on reboot.

In that case, boot in fallback mode from the menu.c32 screen and remove the newly
installed kernel.

Remove the kernel installed by the hotfix with rpm -e, and in /boot relink (ln -sf) the
initrd-2.6-xen.img and vmlinuz-2.6-xen links to the previous working kernel files.

– Backup

There are three main backup targets in XenServer.

VM (virtual machine), HOST (virtualization server), POOL (set of host resources)

There are several ways to back up a VM. VM copy creates a VM identical to the original,
and a snapshot backs up the state of a VM at a specific point in time.
A VM can also be turned into a template and redeployed with the same configuration.

However, all of these methods only work on the same host, or within the same pool
sharing the same storage.

To use a VM backup on a different host or pool, use VM export.

# xe vm-export vm=<vm_name> filename=<export_vm.xva>

To back up only the VM metadata, append the --metadata option to the command above.

# xe vm-export vm=<vm_name> filename=<export_vm.xva> --metadata

The exported backup file is applied on another host as follows.

# xe vm-import filename=export_vm.xva [sr-uuid=<sr_uuid>] [preserve=true]

Specify sr-uuid to import into a particular SR.
Add preserve=true to keep the MAC address (for example, when a license server depends on it).

Note that a large export file can affect the XenServer host if it is written locally
(the default root partition of a XenServer host is 4GB).
For that reason, it is recommended to run VM export from a remote machine in most cases.

This is done by sending the xe command to the host remotely.

# xe <command> -s <master_ip> -u root -pw <password>

The remote connection settings can also be supplied through an environment variable.

export XE_EXTRA_ARGS="server=${POOL},port=${PORT},username=${USER},password=${PASSWORD}"
After that:
xe vm-list
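
Putting the remote options together, a VM export run from a separate Linux machine might look like the sketch below. It assumes the xe CLI is installed on that machine; export_vm_remote and its argument order are hypothetical, and the password is passed on the command line exactly as in the examples above.

```shell
# export_vm_remote is a hypothetical helper name.
# $1: VM name, $2: destination directory, $3: pool master IP, $4: root password
export_vm_remote() {
    local vm dest master pw
    vm=$1
    dest=$2
    master=$3
    pw=$4
    # The .xva file is written on the machine running xe, not on the XenServer host.
    xe vm-export vm="$vm" filename="$dest/$vm.xva" -s "$master" -u root -pw "$pw"
}

# Example: export_vm_remote win7-01 /backup 192.168.201.151 <password>
```

Because the export file is written where xe runs, the 4GB root partition of the XenServer host is not touched.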

The other main backup targets in XenServer are the host and the pool.

First, the pool backup.

The command below backs up the pool configuration (metadata).

# xe pool-dump-database file-name=pool.backup

Verify that the backup is valid.

# xe pool-restore-database file-name=pool.backup dry-run=true

Restoring the backed-up resource pool data

# xe pool-restore-database file-name=pool.backup
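
The dump and the dry-run check can be chained so that a backup is only reported when it also passes verification. A minimal sketch; backup_pool_db and the date-stamped default file name are assumptions for illustration.

```shell
# backup_pool_db is a hypothetical helper name.
# $1 (optional): output file; defaults to pool-YYYYMMDD.backup in the current directory
backup_pool_db() {
    local out
    out=${1:-pool-$(date +%Y%m%d).backup}
    # Dump the pool metadata; xe output goes to stderr so stdout carries only the file name.
    xe pool-dump-database file-name="$out" >&2 || return 1
    # dry-run=true checks the dump without actually restoring it.
    xe pool-restore-database file-name="$out" dry-run=true >&2 || return 1
    echo "$out"
}

# Example: backup_pool_db /mnt/backup/pool.backup
```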

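Next, the host backup.

# xe host-backup host=<hostname> file-name=host.backup \
-s <master_ip> -u root -pw <password>

A host backup compresses all of the XenServer OS files installed on /dev/sda1 into an
image file. It is best to store the backup file on external storage.
Backing up a XenServer in production typically produces a file of about 4~6GB.
If the XenServer OS becomes unrecoverable, it can be restored with the command below.
If the host cannot be reached at all, install XenServer afresh and then restore the
previous XenServer state from the backup image on the newly installed system.

Note, however, that if the free space on sda (the disk holding the XenServer OS) was
used as an SR, that SR is reinitialized.

The restore procedure follows.

Because a host backup is also a large file, run the command through the remote CLI.

# xe host-restore file-name=host.backup -s <master_ip> -u root -pw <password>
After this restore completes, reboot and boot from the install CD; a
"Restore from backup" entry appears during installation. Selecting it restores the
system automatically, and after another reboot the host returns to its previous state.

As mentioned above, when XenServer is reinstalled and then restored, the SR created on
sda3 is reinitialized. Delete the old SR record and create the SR again before using
it. For the actual procedure, see "Removing an SR".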

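– Setting up a remote CLI environment

Once XenServer is installed, the host can be managed locally with the xe command.
Sometimes, however, xe must be run remotely from an ordinary Linux system on which
XenServer is not installed (for backup and restore, for example).

In that case, install the package that provides the xe command (xapi-xe; the RPM on
the install media is named xe-cli) on the Linux system.

# yum install stunnel

Mount the Citrix XenServer install CD or ISO file; the Client_install directory
contains the xe-cli package. Install it.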

# rpm -Uvh xe-cli-6.2.0-70442c.i686.rpm

Now run a command against a remote XenServer with the xe CLI.

[root@alang00 ~]# xe host-list -s 192.168.201.151 -u root -pw <password>
————————————————————————————–
uuid ( RO) : c9ee4819-0646-4ec0-a45e-7f4195d8da58
name-label ( RW): xen-alang
name-description ( RW): Default install of XenServer
————————————————————————————-

2. pGPU passthrough

This section describes how to pass a physical graphics board through to a VM on XenServer 6.2.

The overall procedure is as follows.

Install XenServer 6.2 > apply hotfix XS62E004 > apply hotfix XS62ETP001 (GRID K only) >
install NVIDIA-vgx-xenserver (GRID K only) > create a Windows 7 VM (without starting it automatically) >
create a vGPU and map it to the VM > install Windows 7 in the VM > install HP RGS > install the NVIDIA GPU driver >
install XenTools

There are two main ways to do GPU passthrough on XenServer 6.2.

One is to assign an ordinary pGPU, such as a Quadro or Kepler card, directly to a VM
through passthrough. The other is to assign a vGPU based on NVIDIA VGX technology, as
with the GRID K2 and K1.

First, the method for providing a pGPU to a VM.

Enable Intel VT-d in the server BIOS.

Check that Intel VT-d is active on XenServer.

# xl dmesg | grep VT
————————————————————————
(XEN) Intel VT-d iommu 0 supported page sizes: 4kB.
(XEN) Intel VT-d iommu 1 supported page sizes: 4kB.
(XEN) Intel VT-d Snoop Control enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Shared EPT tables not enabled.

————————————————————————
List the GPU devices recognized by XenServer.

# lspci | grep VGA
————————————————————————
02:00.0 VGA compatible controller: NVIDIA Corporation GF110GL [Tesla C2050 / C2075] (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation GF110GL [Tesla C2050 / C2075] (rev a1)
07:01.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200eW WPCM450 (rev 0a)
83:00.0 VGA compatible controller: NVIDIA Corporation GK104GL [Quadro K5000] (rev a1)
84:00.0 VGA compatible controller: NVIDIA Corporation GF110GL [Tesla C2050 / C2075] (rev a1)
————————————————————————
# xe pgpu-list
————————————————————————
uuid ( RO) : 5c89f6b2-3921-d8b5-a847-d85e882ef22f
vendor-name ( RO): Matrox Electronics Systems Ltd.
device-name ( RO): MGA G200eW WPCM450
gpu-group-uuid ( RO): bc73dd4e-35aa-14ad-94a5-35c7b37b3253
uuid ( RO) : db3dbbdd-dd90-7000-79f6-fc10d23b8859
vendor-name ( RO): NVIDIA Corporation
device-name ( RO): GF110GL [Tesla C2050 / C2075]
gpu-group-uuid ( RO): f7197a87-ea53-556c-274c-c12c09fb20b4
uuid ( RO) : 14648feb-5603-7020-3ed1-5a866222d035
vendor-name ( RO): NVIDIA Corporation
device-name ( RO): GF110GL [Tesla C2050 / C2075]
gpu-group-uuid ( RO): f7197a87-ea53-556c-274c-c12c09fb20b4
uuid ( RO) : 4dfacb43-d8a8-6aa1-382b-06b97be79fbf
vendor-name ( RO): NVIDIA Corporation
device-name ( RO): GK104GL [Quadro K5000]
gpu-group-uuid ( RO): 74aa44ad-ac92-d451-c490-9c89bb5c0c59
uuid ( RO) : dda1942f-7edd-90fb-0b6d-4bb612ad1836
vendor-name ( RO): NVIDIA Corporation
device-name ( RO): GF110GL [Tesla C2050 / C2075]
gpu-group-uuid ( RO): f7197a87-ea53-556c-274c-c12c09fb20b4
————————————————————————

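Next, check the GPU groups known to XenServer.
Unlike open-source Xen, Citrix XenServer treats multiple GPUs of the same model as a
single group. When a GPU group is assigned to several VMs, the GPUs in the group are
allocated to the individual VMs automatically, so there is no need to look up the PCI id
of each GPU and assign it to each VM by hand.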

# xe gpu-group-list
————————————————————————
uuid ( RO) : f7197a87-ea53-556c-274c-c12c09fb20b4
name-label ( RW): Group of NVIDIA Corporation GF110GL [Tesla C2050 / C2075] GPUs
name-description ( RW):
uuid ( RO) : 74aa44ad-ac92-d451-c490-9c89bb5c0c59
name-label ( RW): Group of NVIDIA Corporation GK104GL [Quadro K5000] GPUs
name-description ( RW):
uuid ( RO) : bc73dd4e-35aa-14ad-94a5-35c7b37b3253
name-label ( RW): Group of Matrox Electronics Systems Ltd. MGA G200eW WPCM450 GPUs
name-description ( RW):
————————————————————————

In XenCenter, create only the metadata of the Windows 7 VM that will receive the GPU.
By default the VM is started automatically after creation and the OS installation begins
immediately; create the VM without starting it.

Now check the VM that was created.

# xe vm-list
————————————————————————
uuid ( RO) : 755b8431-066d-b42d-260e-3f7e8a38c132
name-label ( RW): win7-01
power-state ( RO): halted
uuid ( RO) : 41072988-71d3-7f20-b781-47235fff93fc
name-label ( RW): win7-02
power-state ( RO): halted
uuid ( RO) : 1828ec5a-42bf-48ca-ab85-382288282df1
name-label ( RW): Control domain on host: xen-alang
power-state ( RO): running
uuid ( RO) : b627fb12-ef0e-4347-8588-48353d961c33
name-label ( RW): win7-01s
power-state ( RO): halted
uuid ( RO) : ea006743-5185-6130-4dc7-85d2535a3c4e
name-label ( RW): win7-03
power-state ( RO): halted

# xe vm-list name-label=win7-01
uuid ( RO) : 755b8431-066d-b42d-260e-3f7e8a38c132
name-label ( RW): win7-01
power-state ( RO): halted
————————————————————————

Now assign a GPU group found with xe gpu-group-list to the win7-03 VM.

First, check that no GPU group is already assigned.

# xe vgpu-list vm-uuid=ea006743-5185-6130-4dc7-85d2535a3c4e

Nothing should be printed. If a GPU is already assigned, the assigned gpu-group-uuid is
printed as shown below.

# xe vgpu-list vm-uuid=755b8431-066d-b42d-260e-3f7e8a38c132
uuid ( RO) : 3dab6b01-c202-f723-8474-4529e030a00e
vm-uuid ( RO): 755b8431-066d-b42d-260e-3f7e8a38c132
gpu-group-uuid ( RO): f7197a87-ea53-556c-274c-c12c09fb20b4

To assign a different GPU group, first remove the existing assignment as follows.

# xe vgpu-destroy uuid=3dab6b01-c202-f723-8474-4529e030a00e
# xe vgpu-list vm-uuid=755b8431-066d-b42d-260e-3f7e8a38c132

Now assign a Quadro K5000 pGPU to the win7-03 VM.

This requires the uuid of win7-03 and the uuid of the gpu group that contains the
Quadro K5000. Both can be found with xe vm-list and xe gpu-group-list.

The pGPU is assigned with the xe vgpu-create command.

# xe vgpu-create vm-uuid=ea006743-5185-6130-4dc7-85d2535a3c4e gpu-group-uuid=74aa44ad-ac92-d451-c490-9c89bb5c0c59
————————————————————————
7bdd7660-28da-31b2-b538-53c8bea68ec5
————————————————————————

When xe vgpu-create assigns the pGPU to the VM successfully, it prints the vgpu uuid
used at the XenServer level.

Now check that the GPU was assigned to the win7-03 VM.

# xe vgpu-list vm-uuid=ea006743-5185-6130-4dc7-85d2535a3c4e
————————————————————————-
uuid ( RO) : 7bdd7660-28da-31b2-b538-53c8bea68ec5
vm-uuid ( RO): ea006743-5185-6130-4dc7-85d2535a3c4e
gpu-group-uuid ( RO): 74aa44ad-ac92-d451-c490-9c89bb5c0c59
————————————————————————-

If the assignment succeeded, the vgpu uuid is printed as shown above.
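
The check, destroy, and create steps can be collected into one function. This is a sketch under assumptions: assign_gpu_group is a hypothetical name, and it relies on the xe --minimal option printing matching uuids as a comma-separated list. As in the manual steps, the VM is expected to be halted.

```shell
# assign_gpu_group is a hypothetical helper name.
# $1: VM uuid, $2: GPU group uuid
assign_gpu_group() {
    local vm_uuid group_uuid old u
    vm_uuid=$1
    group_uuid=$2
    # --minimal prints the uuids of existing vGPUs as a comma-separated list.
    old=$(xe vgpu-list vm-uuid="$vm_uuid" --minimal)
    # Remove every existing assignment before creating the new one.
    for u in $(echo "$old" | tr ',' ' '); do
        xe vgpu-destroy uuid="$u" || return 1
    done
    # Prints the uuid of the new vgpu on success.
    xe vgpu-create vm-uuid="$vm_uuid" gpu-group-uuid="$group_uuid"
}

# Example:
#   assign_gpu_group ea006743-5185-6130-4dc7-85d2535a3c4e 74aa44ad-ac92-d451-c490-9c89bb5c0c59
```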

In XenCenter, the GPU section of the VM properties shows that the Quadro K5000 GPUs
group is assigned.

Now, as described above, install Windows 7, a remote access tool (RGS, VNC), and the
NVIDIA driver for the K5000.

After the NVIDIA driver is installed and the VM is rebooted, the VM uses the K5000 as
its primary VGA device. The XenCenter VNC console may still show a screen at this
point, because a virtual VGA device is created to display graphics until the NVIDIA GPU
is recognized. Once GPU passthrough is working, it is better to remove the VGA device
used for the virtual console.

The virtual VGA device is removed as follows.

# xe vm-param-set uuid=755b8431-066d-b42d-260e-3f7e8a38c132 platform:vgpu_vnc_enabled=false

This corresponds roughly to setting gfx_passthru=1 in the HVM config on open-source Xen.

Once the GPU is assigned to the VM and working, the XenCenter console shows only a
black screen.

A fast remote 3D environment can be built on the VM with a protocol capable of remote 3D
graphics delivery (Citrix HDX3Dpro, HP RGS, TightVNC+DF Mirage, RealVNC).

3. vGPU passthrough with GRID VGX

NVIDIA created the VGX technology so that a Kepler pGPU can be shared among several VMs
as vGPUs. The GPU boards that support it are the GRID K series. At the time of writing,
the K2 series for high-end 3D graphics users and the K1 series for general graphics
users have been released.

Grid K1 board
pGPU : 4
vGPU Type : K140Q (power user, 1GB), K100 (general user, 256MB)
K140Q : up to 16 (4 per pGPU)
K100 : up to 32 (8 per pGPU)

Grid K2 board (8GB)
pGPU : 2 (K5000 class, 4GB)
vGPU Type : K260Q (professional, 2GB), K240Q (power user, 1GB), K200 (general user, 256MB)
K260Q : up to 4 (K2000 class, 2GB)
K240Q : up to 8 (K600 class, 1GB)
K200 : up to 16 (GT class, 256MB)
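
The totals in this list follow from pGPUs per board times vGPUs per pGPU. The sketch below works through that arithmetic; max_vgpus is a hypothetical helper, and the per-pGPU figures for the K2 (2, 4, 8) are derived here by dividing the listed totals by the two pGPUs on the board.

```shell
# max_vgpus is a hypothetical helper: pGPUs per board * vGPUs per pGPU * boards.
# $1: pGPUs per board, $2: vGPUs per pGPU, $3 (optional): number of boards, default 1
max_vgpus() {
    local pgpus per_pgpu boards
    pgpus=$1
    per_pgpu=$2
    boards=${3:-1}
    echo $(( pgpus * per_pgpu * boards ))
}

max_vgpus 4 4    # GRID K1, K140Q: prints 16
max_vgpus 4 8    # GRID K1, K100:  prints 32
max_vgpus 2 2    # GRID K2, K260Q: prints 4
max_vgpus 2 4    # GRID K2, K240Q: prints 8
max_vgpus 2 8    # GRID K2, K200:  prints 16
```

A server with two K2 boards would, by the same arithmetic, offer up to max_vgpus 2 4 2 = 16 K240Q vGPUs.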

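The steps before and after GPU passthrough are almost the same as for a pGPU. The
differences are that hotfix XS62ETP001 must be applied after installing XenServer and
that the NVIDIA vgx driver must be installed on XenServer.

In addition, a new concept, vgpu-type, comes into play when assigning a GPU to a VM.

The following shows how to apply vGPU passthrough.

First install XenServer 6.2 and apply hotfix XS62ETP001.

reboot

Install the NVIDIA-vgx-xenserver driver provided by NVIDIA on the XenServer host.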

# rpm -Uvh NVIDIA-vgx-xenserver-6.2-312.53.i386.rpm

reboot

Check that the vgx driver has been loaded.

# lsmod | grep nvidia
nvidia 8524886 666
i2c_core 20294 2 nvidia,i2c_i801

Now check the pGPUs.

# xe pgpu-list
———————————————————————————
uuid ( RO) : 7bd8ebb9-9f69-e9ef-9e68-2be71bddd172
vendor-name ( RO): NVIDIA Corporation
device-name ( RO): GK104GL [GRID K2]
gpu-group-uuid ( RW): 56a9d370-df44-64ab-31c3-b3649cc03247
uuid ( RO) : 62ce0ffc-aa26-9f03-ac95-406ee9a4e8f6
vendor-name ( RO): NVIDIA Corporation
device-name ( RO): GK104GL [GRID K2]
gpu-group-uuid ( RW): 56a9d370-df44-64ab-31c3-b3649cc03247
———————————————————————————
Two GRID K2 pGPUs are listed.

Check the details of each pGPU.

# xe pgpu-param-list uuid=7bd8ebb9-9f69-e9ef-9e68-2be71bddd172
———————————————————————————
uuid ( RO) : 7bd8ebb9-9f69-e9ef-9e68-2be71bddd172
vendor-name ( RO): NVIDIA Corporation
device-name ( RO): GK104GL [GRID K2]
gpu-group-uuid ( RW): 56a9d370-df44-64ab-31c3-b3649cc03247
gpu-group-name-label ( RO): Group of NVIDIA Corporation GK104GL [GRID K2] GPUs
host-uuid ( RO): 63ffb0d6-cfea-405f-84ec-9feedacfbdc1
host-name-label ( RO): xen-alang2
pci-id ( RO): 0000:85:00.0
dependencies (SRO):
other-config (MRW):
supported-VGPU-types ( RO): b6d016bf-7475-f064-45e8-6bbc3c2f61fe; c8a09c89-e48c-cbfd-c76d-e493fbc2f98d; 06b6d0dc-70d9-3134-b569-f521c4967617; 217b6cc8-8470-6e43-6038-98adf2497676
enabled-VGPU-types (SRW): b6d016bf-7475-f064-45e8-6bbc3c2f61fe; c8a09c89-e48c-cbfd-c76d-e493fbc2f98d; 06b6d0dc-70d9-3134-b569-f521c4967617; 217b6cc8-8470-6e43-6038-98adf2497676
resident-VGPUs ( RO):
———————————————————————————

# xe pgpu-param-list uuid=62ce0ffc-aa26-9f03-ac95-406ee9a4e8f6
———————————————————————————
uuid ( RO) : 62ce0ffc-aa26-9f03-ac95-406ee9a4e8f6
vendor-name ( RO): NVIDIA Corporation
device-name ( RO): GK104GL [GRID K2]
gpu-group-uuid ( RW): 56a9d370-df44-64ab-31c3-b3649cc03247
gpu-group-name-label ( RO): Group of NVIDIA Corporation GK104GL [GRID K2] GPUs
host-uuid ( RO): 63ffb0d6-cfea-405f-84ec-9feedacfbdc1
host-name-label ( RO): xen-alang2
pci-id ( RO): 0000:86:00.0
dependencies (SRO):
other-config (MRW):
supported-VGPU-types ( RO): b6d016bf-7475-f064-45e8-6bbc3c2f61fe; c8a09c89-e48c-cbfd-c76d-e493fbc2f98d; 06b6d0dc-70d9-3134-b569-f521c4967617; 217b6cc8-8470-6e43-6038-98adf2497676
enabled-VGPU-types (SRW): b6d016bf-7475-f064-45e8-6bbc3c2f61fe; c8a09c89-e48c-cbfd-c76d-e493fbc2f98d; 06b6d0dc-70d9-3134-b569-f521c4967617; 217b6cc8-8470-6e43-6038-98adf2497676
resident-VGPUs ( RO):
———————————————————————————

Now check the GPU groups present on the XenServer host.

# xe gpu-group-list
———————————————————————————
uuid ( RO) : 56a9d370-df44-64ab-31c3-b3649cc03247
name-label ( RW): Group of NVIDIA Corporation GK104GL [GRID K2] GPUs
name-description ( RW):
uuid ( RO) : 1f677c02-c0b6-ded5-2606-0bbaa91ff409
name-label ( RW): Group of Matrox Electronics Systems Ltd. G200eR2 GPUs
name-description ( RW):
———————————————————————————

Check the supported vGPU types.
The toolstack must be restarted first for the vGPU types to be listed.

# xe-toolstack-restart

# xe vgpu-type-list
———————————————————————————
uuid ( RO) : b6d016bf-7475-f064-45e8-6bbc3c2f61fe
vendor-name ( RO):
model-name ( RO): passthrough
framebuffer-size ( RO): 0
uuid ( RO) : 217b6cc8-8470-6e43-6038-98adf2497676
vendor-name ( RO): NVIDIA Corporation
model-name ( RO): GRID K240Q
framebuffer-size ( RO): 1006632960
uuid ( RO) : 06b6d0dc-70d9-3134-b569-f521c4967617
vendor-name ( RO): NVIDIA Corporation
model-name ( RO): GRID K200
framebuffer-size ( RO): 268435456
uuid ( RO) : c8a09c89-e48c-cbfd-c76d-e493fbc2f98d
vendor-name ( RO): NVIDIA Corporation
model-name ( RO): GRID K260Q
framebuffer-size ( RO): 2013265920
———————————————————————————

The GRID K2 offers four kinds of GPU in total.

pGPU (K5000), vGPU-K260Q (K2000), vGPU-K240Q (K600), vGPU-K200 (GT)
The passthrough vgpu-type assigns the pGPU directly.
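
The framebuffer-size values in the vgpu-type listing appear to be in bytes (the K200 value, 268435456, is exactly 256 MiB). A short conversion, using a hypothetical to_mib helper, makes the sizes easier to compare with the nominal figures:

```shell
# to_mib is a hypothetical helper: converts a byte count to whole MiB.
to_mib() {
    echo $(( $1 / 1048576 ))
}

to_mib 268435456     # GRID K200:  prints 256
to_mib 1006632960    # GRID K240Q: prints 960
to_mib 2013265920    # GRID K260Q: prints 1920
```

The K240Q and K260Q framebuffers therefore come out slightly below their nominal 1GB and 2GB.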

Look at the details of a vgpu-type. The K240Q type is shown below.

# xe vgpu-type-list uuid=217b6cc8-8470-6e43-6038-98adf2497676 params=all
———————————————————————————
uuid ( RO) : 217b6cc8-8470-6e43-6038-98adf2497676
vendor-name ( RO): NVIDIA Corporation
model-name ( RO): GRID K240Q
framebuffer-size ( RO): 1006632960
max-heads ( RO): 2
supported-on-PGPUs ( RO): 62ce0ffc-aa26-9f03-ac95-406ee9a4e8f6; 7bd8ebb9-9f69-e9ef-9e68-2be71bddd172
enabled-on-PGPUs ( RO): 62ce0ffc-aa26-9f03-ac95-406ee9a4e8f6; 7bd8ebb9-9f69-e9ef-9e68-2be71bddd172
VGPU-uuids ( RO):

# xe gpu-group-param-list uuid=56a9d370-df44-64ab-31c3-b3649cc03247
uuid ( RO) : 56a9d370-df44-64ab-31c3-b3649cc03247
name-label ( RW): Group of NVIDIA Corporation GK104GL [GRID K2] GPUs
name-description ( RW):
VGPU-uuids (SRO):
PGPU-uuids (SRO): 62ce0ffc-aa26-9f03-ac95-406ee9a4e8f6; 7bd8ebb9-9f69-e9ef-9e68-2be71bddd172
other-config (MRW):
enabled-VGPU-types ( RO): c8a09c89-e48c-cbfd-c76d-e493fbc2f98d; 06b6d0dc-70d9-3134-b569-f521c4967617; 217b6cc8-8470-6e43-6038-98adf2497676; b6d016bf-7475-f064-45e8-6bbc3c2f61fe
supported-VGPU-types ( RO): c8a09c89-e48c-cbfd-c76d-e493fbc2f98d; 06b6d0dc-70d9-3134-b569-f521c4967617; 217b6cc8-8470-6e43-6038-98adf2497676; b6d016bf-7475-f064-45e8-6bbc3c2f61fe
allocation-algorithm ( RW): depth-first
———————————————————————————

Now assign a vGPU supported by the GRID K2 to a VM.
The following information is needed.

vm-uuid : uuid of the VM -> xe vm-list
gpu-group-uuid : uuid of the GPU group the pGPU belongs to -> xe gpu-group-list
vgpu-type-uuid : vGPU class to assign to the VM (passthrough, K260Q, K240Q, K200) -> xe vgpu-type-list

The command below assigns a K240Q vGPU to a VM.

# xe vgpu-create vm-uuid=a623c349-6b7f-709e-4e05-38dffc1c5bb8 gpu-group-uuid=56a9d370-df44-64ab-31c3-b3649cc03247 vgpu-type-uuid=217b6cc8-8470-6e43-6038-98adf2497676
ae8d116b-0d2c-07c3-af43-3f35fe1fdedf

After the assignment, the VGPU-uuids field in the vgpu-type-list output includes the
newly assigned vgpu.

# xe vgpu-type-list uuid=217b6cc8-8470-6e43-6038-98adf2497676 params=all
uuid ( RO) : 217b6cc8-8470-6e43-6038-98adf2497676
vendor-name ( RO): NVIDIA Corporation
model-name ( RO): GRID K240Q
framebuffer-size ( RO): 1006632960
max-heads ( RO): 2
supported-on-PGPUs ( RO): 62ce0ffc-aa26-9f03-ac95-406ee9a4e8f6; 7bd8ebb9-9f69-e9ef-9e68-2be71bddd172
enabled-on-PGPUs ( RO): 62ce0ffc-aa26-9f03-ac95-406ee9a4e8f6; 7bd8ebb9-9f69-e9ef-9e68-2be71bddd172
VGPU-uuids ( RO): 7af519ba-bf7c-ac73-2032-56c93dd7770b; 78458e7a-6c21-69cd-e192-da71c8a4313c; 7dfe501e-cd1f-9998-4d3e-cbc5f30f36fb; ae8d116b-0d2c-07c3-af43-3f35fe1fdedf
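
Looking up the vgpu-type uuid by hand can be avoided by selecting the type by model name. A minimal sketch: create_vgpu_by_model is a hypothetical name, and it assumes that xe vgpu-type-list accepts model-name as a filter together with --minimal, in the same way other xe list commands accept field filters.

```shell
# create_vgpu_by_model is a hypothetical helper name.
# $1: VM uuid, $2: GPU group uuid, $3: vGPU model name, e.g. "GRID K240Q"
create_vgpu_by_model() {
    local vm_uuid group_uuid model type_uuid
    vm_uuid=$1
    group_uuid=$2
    model=$3
    # Assumption: model-name works as a list filter; --minimal prints only the uuid.
    type_uuid=$(xe vgpu-type-list model-name="$model" --minimal)
    if [ -z "$type_uuid" ]; then
        echo "no vGPU type named '$model'" >&2
        return 1
    fi
    xe vgpu-create vm-uuid="$vm_uuid" gpu-group-uuid="$group_uuid" \
        vgpu-type-uuid="$type_uuid"
}

# Example:
#   create_vgpu_by_model a623c349-6b7f-709e-4e05-38dffc1c5bb8 \
#       56a9d370-df44-64ab-31c3-b3649cc03247 "GRID K240Q"
```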

Finally, after installing the OS in the VM and finishing the basic setup, remove the VM's virtual console.

# xe vm-param-set platform:vgpu_vnc_enabled=false uuid=755b8431-066d-b42d-260e-3f7e8a38c132

Note: the following alternative caused the VM to hang at boot when tested.
# xe vm-param-set uuid=826ac06c-4bee-1c9b-0ef3-b70a68853991 platform:vgpu_extra_args="disable_vnc=1"

A VM with a vGPU also needs an NVIDIA graphics driver, and it must be the GRID vGPU
driver (312.56-grid-win7-64bit-english-whql.exe).

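– Creating vGPUs on a specific pGPU (experimental)

The content below records verification tests based on guesswork.
The conclusion so far is that a gpu-group only serves to separate pGPUs;
vGPUs cannot be distributed through gpu-groups.

However, if groups are maintained per vGPU type as described below, the per-VM GPU group
shown in XenCenter at least tells you which vGPU a VM has.

Also, xe pgpu-list params=all shows all the information needed when running vgpu-create.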

—————————————————————————————-

When XenServer is set up on a server with several pGPUs of the same model, those pGPUs
are placed in a single gpu group. To tie the vGPU of a particular VM to a particular
pGPU, the GPU group must be split and a different pGPU assigned to each group.

The reason is that XenCenter only lets you choose a VM's GPU by gpu group.
Allocating fixed GPU resources to a department or project is therefore not possible
with the default allocation policy.

Splitting groups per pGPU is one way to manage the GPU resources of many VMs
intuitively from XenCenter.

Note that assigning a GPU group from XenCenter in its default state always yields a
passthrough-type vgpu. To get a vGPU (K260Q, K240Q, K200), the vgpu-type must be set
for the VM directly on the command line.

To classify GPU groups by vGPU type in the XenCenter UI, so that assigning a group
automatically yields the corresponding vGPU, the following GPU classification work is needed.
1. Create a gpu group for each vGPU type.
2. Assign a specific physical GPU device to the gpu group, identified by its PCI id.
3. Set enabled-VGPU-types in the pGPU properties.

To split gpu-groups by vGPU type, new gpu-groups must be created, and a specific pGPU
must be assigned to each new gpu-group.

First, create a new GPU group.

# xe gpu-group-create name-label="<New GPU Group name>"

# xe gpu-group-create name-label="Group of ALANG VMs's GPU Resource"

Next, add a specific pGPU to the new gpu group.
This requires the uuid of the new gpu group and the uuid of the pGPU.

The steps below add the pGPU with pci_id 86:00.0 to the new "Group of ALANG VMs's GPU Resource" group.

Find the uuid of the new gpu group

# xe gpu-group-list name-label="Group of ALANG VMs's GPU Resource"
——————————————————————————-
uuid ( RO) : 6a417759-635c-cdf4-8345-f2f2e986d432
name-label ( RW): Group of ALANG VMs’s GPU Resource
name-description ( RW):
——————————————————————————

Find the uuid of the pGPU to add to the new gpu group

# xe pgpu-list pci-id=0000:86:00.0
——————————————————————————
uuid ( RO) : 62ce0ffc-aa26-9f03-ac95-406ee9a4e8f6
vendor-name ( RO): NVIDIA Corporation
device-name ( RO): GK104GL [GRID K2]
gpu-group-uuid ( RW): a116cfbe-ddce-f08a-3d41-7a84640d6fc7
——————————————————————————
Add the pGPU to the new gpu group

# xe pgpu-param-set uuid=62ce0ffc-aa26-9f03-ac95-406ee9a4e8f6 gpu-group-uuid=6a417759-635c-cdf4-8345-f2f2e986d432

Finally, set the vgpu-type that the pGPU is allowed to provide.

# xe vgpu-type-list : find the vgpu-type-uuid
# xe pgpu-list params=all : check pgpu-uuid, gpu-group-name-label, enabled-VGPU-types
# xe pgpu-param-set uuid=<pgpu-uuid> enabled-VGPU-types=<vgpu-type-uuid>
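
The two pgpu-param-set calls (move the pGPU into a group, then restrict its enabled vGPU type) can be run together. A minimal sketch using only the parameters shown above; pin_pgpu_to_group is a hypothetical name.

```shell
# pin_pgpu_to_group is a hypothetical helper name.
# $1: pGPU uuid, $2: target GPU group uuid, $3: vgpu-type uuid to enable
pin_pgpu_to_group() {
    local pgpu group vtype
    pgpu=$1
    group=$2
    vtype=$3
    # Move the pGPU into the target group.
    xe pgpu-param-set uuid="$pgpu" gpu-group-uuid="$group" || return 1
    # Allow only the given vGPU type on this pGPU.
    xe pgpu-param-set uuid="$pgpu" enabled-VGPU-types="$vtype"
}

# Example: pin_pgpu_to_group <pgpu-uuid> <gpu-group-uuid> <vgpu-type-uuid>
```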

Now check that pGPUs and vGPU types have been separated by group.

# xe pgpu-list params=all
———————————————————————————————
uuid ( RO) : 502e7398-4756-3cdf-8560-44beb6a62433
vendor-name ( RO): NVIDIA Corporation
device-name ( RO): GK106GL [Quadro K4000]
gpu-group-uuid ( RW): e20bbb59-5236-2d75-10c2-63b7543b30c1
gpu-group-name-label ( RO): Group of NVIDIA Corporation GK106GL [Quadro K4000] GPUs
host-uuid ( RO): 58688cf5-f2e6-40c3-bb22-5f91b945789a
host-name-label ( RO): alang20
pci-id ( RO): 0000:03:00.0
dependencies (SRO): 0000:03:00.1
other-config (MRW):
supported-VGPU-types ( RO): b607c3e6-1948-637b-6cbf-b5175b122919
enabled-VGPU-types (SRW): b607c3e6-1948-637b-6cbf-b5175b122919
resident-VGPUs ( RO):
uuid ( RO) : 3522804a-e851-3d19-fd3c-c6714981d95f
vendor-name ( RO): NVIDIA Corporation
device-name ( RO): GK104GL [GRID K2]
gpu-group-uuid ( RW): c55f9161-a28d-8f20-1522-f95dfbc88f4b
gpu-group-name-label ( RO): NVIDIA GRID K2 [K240Q] vGPUs
host-uuid ( RO): 58688cf5-f2e6-40c3-bb22-5f91b945789a
host-name-label ( RO): alang20
pci-id ( RO): 0000:86:00.0
dependencies (SRO):
other-config (MRW):
supported-VGPU-types ( RO): b607c3e6-1948-637b-6cbf-b5175b122919; 889c4df2-5f7c-78f7-acb9-59f66742c93d; e1b8131a-6ba5-e90f-9d77-d53b93861a3b; 6b5852a5-0499-0129-8bd9-2c8aa15cd1bb
enabled-VGPU-types (SRW): 889c4df2-5f7c-78f7-acb9-59f66742c93d
resident-VGPUs ( RO):
uuid ( RO) : c2a5ab86-9a3f-4d5c-0c12-43f37f01e77c
vendor-name ( RO): NVIDIA Corporation
device-name ( RO): GK104GL [GRID K2]
gpu-group-uuid ( RW): 9bada0e2-f576-8eb7-e282-ba7bfde9e552
gpu-group-name-label ( RO): NVIDIA GRID K2 [passthrough] pGPUs
host-uuid ( RO): 58688cf5-f2e6-40c3-bb22-5f91b945789a
host-name-label ( RO): alang20
pci-id ( RO): 0000:85:00.0
dependencies (SRO):
other-config (MRW):
supported-VGPU-types ( RO): b607c3e6-1948-637b-6cbf-b5175b122919; 889c4df2-5f7c-78f7-acb9-59f66742c93d; e1b8131a-6ba5-e90f-9d77-d53b93861a3b; 6b5852a5-0499-0129-8bd9-2c8aa15cd1bb
enabled-VGPU-types (SRW): b607c3e6-1948-637b-6cbf-b5175b122919
resident-VGPUs ( RO): 94701c64-9262-824c-5c14-2f2abd722205
—————————————————————————————
# xe gpu-group-list params=all
—————————————————————————————–
uuid ( RO) : 72462076-8107-8f35-e09d-ef1ef40e0e5d
name-label ( RW): NVIDIA GRID K2 [K200] vGPUs
name-description ( RW):
VGPU-uuids (SRO):
PGPU-uuids (SRO):
other-config (MRW):
enabled-VGPU-types ( RO): <expensive field>
supported-VGPU-types ( RO): <expensive field>
allocation-algorithm ( RW): depth-first
uuid ( RO) : c55f9161-a28d-8f20-1522-f95dfbc88f4b
name-label ( RW): NVIDIA GRID K2 [K240Q] vGPUs
name-description ( RW):
VGPU-uuids (SRO):
PGPU-uuids (SRO): 3522804a-e851-3d19-fd3c-c6714981d95f
other-config (MRW):
enabled-VGPU-types ( RO): <expensive field>
supported-VGPU-types ( RO): <expensive field>
allocation-algorithm ( RW): depth-first
uuid ( RO) : 9bada0e2-f576-8eb7-e282-ba7bfde9e552
name-label ( RW): NVIDIA GRID K2 [passthrough] pGPUs
name-description ( RW):
VGPU-uuids (SRO): 94701c64-9262-824c-5c14-2f2abd722205
PGPU-uuids (SRO): c2a5ab86-9a3f-4d5c-0c12-43f37f01e77c
other-config (MRW):
enabled-VGPU-types ( RO): <expensive field>
supported-VGPU-types ( RO): <expensive field>
allocation-algorithm ( RW): depth-first
uuid ( RO) : e20bbb59-5236-2d75-10c2-63b7543b30c1
name-label ( RW): Group of NVIDIA Corporation GK106GL [Quadro K4000] GPUs
name-description ( RW):
VGPU-uuids (SRO):
PGPU-uuids (SRO): 502e7398-4756-3cdf-8560-44beb6a62433; 4d6dc185-d40c-048d-b13e-e53fb1b02f1f
other-config (MRW):
enabled-VGPU-types ( RO): <expensive field>
supported-VGPU-types ( RO): <expensive field>
allocation-algorithm ( RW): depth-first
uuid ( RO) : 6620b20d-6fe1-421e-20e8-0c408b1394f0
name-label ( RW): NVIDIA GRID K2 [K260Q] vGPUs
name-description ( RW):
VGPU-uuids (SRO):
PGPU-uuids (SRO):
other-config (MRW):
enabled-VGPU-types ( RO): <expensive field>
supported-VGPU-types ( RO): <expensive field>
allocation-algorithm ( RW): depth-first
———————————————————————————————

The pGPU now appears in the newly added gpu-group, but no vGPU has yet been allocated
from it.

Connect with XenCenter and assign the new GPU group to a few VMs by changing their GPU
properties.

Check gpu-group-list again.

# xe gpu-group-list params=all
——————————————————————————–
uuid ( RO) : 56a9d370-df44-64ab-31c3-b3649cc03247
name-label ( RW): Group of NVIDIA Corporation GK104GL [GRID K2] GPUs
name-description ( RW):
VGPU-uuids (SRO): 3ba20760-c18c-e0f2-aeaa-40716072f6a9; c90014f4-c924-ac69-f329-ab670e0900a8; 7af519ba-bf7c-ac73-2032-56c93dd7770b; ea09461b-b189-8d23-b4f8-112ec576f521; 78458e7a-6c21-69cd-e192-da71c8a4313c; b45b9706-84d5-413e-4189-0f75d8c3d2ea; 459145b3-4fb1-87a5-9256-6832a2ae7f18
PGPU-uuids (SRO): 7bd8ebb9-9f69-e9ef-9e68-2be71bddd172
other-config (MRW):
enabled-VGPU-types ( RO): <expensive field>
supported-VGPU-types ( RO): <expensive field>
allocation-algorithm ( RW): depth-first
uuid ( RO) : 6a417759-635c-cdf4-8345-f2f2e986d432
name-label ( RW): Group of ALANG VMs’s GPU Resource
name-description ( RW):
VGPU-uuids (SRO): 20e3821b-cdc1-dede-dedd-ab662cd6b996; a8403082-2ad3-2ee5-ee48-4848c1421abe; 851e6198-d87b-92b1-b811-e00d2a91fc30; acdf6db6-3e9b-9ea5-c949-0b83ee594efd
PGPU-uuids (SRO): 62ce0ffc-aa26-9f03-ac95-406ee9a4e8f6
other-config (MRW):
enabled-VGPU-types ( RO): <expensive field>
supported-VGPU-types ( RO): <expensive field>
allocation-algorithm ( RW): depth-first
——————————————————————————–

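The new gpu group now lists one vGPU for each VM whose GPU group was changed in XenCenter.

– Deleting a GPU group

If the server contains a VGA device that is not a passthrough target, it is still
registered in a GPU group, which clutters command output and makes management awkward.

Delete GPU groups that are not needed.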

# xe gpu-group-list
uuid ( RO) : a116cfbe-ddce-f08a-3d41-7a84640d6fc7
name-label ( RW): Group of ALANG VMs’s GPU Resource
name-description ( RW):
uuid ( RO) : 56a9d370-df44-64ab-31c3-b3649cc03247
name-label ( RW): Group of NVIDIA Corporation GK104GL [GRID K2] GPUs
name-description ( RW):
uuid ( RO) : 1f677c02-c0b6-ded5-2606-0bbaa91ff409
name-label ( RW): Group of Matrox Electronics Systems Ltd. G200eR2 GPUs
name-description ( RW):

The Matrox G200eR2 is the onboard VGA commonly used for the server console.
It is not a GPU passthrough target, so remove its group.

# xe gpu-group-destroy uuid=1f677c02-c0b6-ded5-2606-0bbaa91ff409

– Identifying the pGPU behind a VM's vGPU

As noted above, when groups are not split per pGPU it is hard to tell which pGPU a given
VM's vGPU belongs to. This is mostly needed when tracking down a faulty pGPU.

First check which vGPU each VM is using.

# xe vgpu-list params=all

# xe vgpu-list vm-name-label=Win7k-01 params=all
—————————————————————————–
uuid ( RO) : c90014f4-c924-ac69-f329-ab670e0900a8
vm-uuid ( RO): a623c349-6b7f-709e-4e05-38dffc1c5bb8
vm-name-label ( RO): Win7k-01
gpu-group-uuid ( RO): 56a9d370-df44-64ab-31c3-b3649cc03247
gpu-group-name-label ( RO): Group of NVIDIA Corporation GK104GL [GRID K2] GPUs
currently-attached ( RO): true
other-config (MRW):
type-uuid ( RO): b6d016bf-7475-f064-45e8-6bbc3c2f61fe
type-model-name ( RO): passthrough
resident-on ( RO): 7bd8ebb9-9f69-e9ef-9e68-2be71bddd172
—————————————————————————–

Among the parameters above, resident-on holds the UUID of the pGPU.

# xe vgpu-list vm-name-label=Win7k-01 params=resident-on
—————————————————————————–
resident-on ( RO) : 7bd8ebb9-9f69-e9ef-9e68-2be71bddd172

Now look up the details of that pGPU.

# xe pgpu-param-list uuid=7bd8ebb9-9f69-e9ef-9e68-2be71bddd172
—————————————————————————–
uuid ( RO) : 7bd8ebb9-9f69-e9ef-9e68-2be71bddd172
vendor-name ( RO): NVIDIA Corporation
device-name ( RO): GK104GL [GRID K2]
gpu-group-uuid ( RW): 56a9d370-df44-64ab-31c3-b3649cc03247
gpu-group-name-label ( RO): Group of NVIDIA Corporation GK104GL [GRID K2] GPUs
host-uuid ( RO): 63ffb0d6-cfea-405f-84ec-9feedacfbdc1
host-name-label ( RO): xen-alang2
pci-id ( RO): 0000:85:00.0
dependencies (SRO):
other-config (MRW):
supported-VGPU-types ( RO): b6d016bf-7475-f064-45e8-6bbc3c2f61fe; c8a09c89-e48c-cbfd-c76d-e493fbc2f98d; 06b6d0dc-70d9-3134-b569-f521c4967617; 217b6cc8-8470-6e43-6038-98adf2497676
enabled-VGPU-types (SRW): b6d016bf-7475-f064-45e8-6bbc3c2f61fe; c8a09c89-e48c-cbfd-c76d-e493fbc2f98d; 06b6d0dc-70d9-3134-b569-f521c4967617; 217b6cc8-8470-6e43-6038-98adf2497676
resident-VGPUs ( RO): c90014f4-c924-ac69-f329-ab670e0900a8
—————————————————————————–

This shows that the vGPU is running on the pGPU at pci-id 0000:85:00.0.
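The two lookups above can be chained into a single step. The following is a minimal sketch rather than part of the original procedure: vm_pgpu_pci_id is a hypothetical helper name, and it assumes the VM is running and has exactly one vGPU (otherwise resident-on does not hold a single pGPU UUID). It uses the --minimal option of xe, which prints only the value without the field label.

```shell
# Sketch: print the PCI address of the pGPU that backs a VM's vGPU.
# vm_pgpu_pci_id is a hypothetical helper; it assumes one running vGPU per VM.
vm_pgpu_pci_id() {
    vm_name="$1"
    # --minimal prints only the value, without the "resident-on ( RO) :" label
    pgpu_uuid=$(xe vgpu-list vm-name-label="$vm_name" params=resident-on --minimal)
    case "$pgpu_uuid" in
        ""|"<not in database>")
            echo "no resident vGPU found for VM: $vm_name" >&2
            return 1
            ;;
    esac
    # Look up the PCI address of that pGPU
    xe pgpu-param-get uuid="$pgpu_uuid" param-name=pci-id
}
```

With the values shown above, calling vm_pgpu_pci_id Win7k-01 on the host would be expected to print the pci-id of pGPU 7bd8ebb9-9f69-e9ef-9e68-2be71bddd172.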
– Setting the GPU allocation policy

A GPU group offers two allocation policies: depth-first and breadth-first.
depth-first packs as many vGPUs as possible onto each pGPU. vGPUs are served from one pGPU
until it reaches the number of vGPUs it can provide, and only then from the next pGPU.
(This is like the fill_up policy of the SGE scheduler.)

breadth-first spreads vGPUs across the available pGPUs in turn each time a VM starts.
(This is like the round_robin policy of the SGE scheduler.)

The allocation rule is changed as follows.

First check the group's current allocation algorithm.
# xe gpu-group-param-get uuid=be825ba2-01d7-8d51-9780-f82cfaa64924 \
param-name=allocation-algorithm
———————————————————————————–
depth-first

Change depth-first to breadth-first.

# xe gpu-group-param-set uuid=be825ba2-01d7-8d51-9780-f82cfaa64924 \
allocation-algorithm=breadth-first
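To make the difference between the two policies concrete, here is a toy model, not XenServer's actual scheduler. It assumes every pGPU in the group can host the same number of vGPUs, and it treats breadth-first as "pick the pGPU with the fewest vGPUs so far". simulate_alloc and its arguments are hypothetical names used only for this illustration.

```shell
# Toy model of vGPU placement (an illustration, not XenServer's code).
# Usage: simulate_alloc POLICY NUM_PGPUS CAPACITY_PER_PGPU NUM_VMS
# Prints the number of vGPUs placed on each pGPU, separated by spaces.
simulate_alloc() {
    awk -v policy="$1" -v n="$2" -v cap="$3" -v vms="$4" 'BEGIN {
        for (i = 1; i <= n; i++) used[i] = 0
        for (v = 1; v <= vms; v++) {
            pick = 0
            for (i = 1; i <= n; i++) {
                if (used[i] >= cap) continue        # this pGPU is full
                if (pick == 0) {
                    pick = i
                    # depth-first: take the first pGPU that still has room
                    if (policy == "depth-first") break
                    continue
                }
                # breadth-first: prefer the pGPU with the fewest vGPUs so far
                if (used[i] < used[pick]) pick = i
            }
            if (pick == 0) {
                print "no capacity left for VM " v > "/dev/stderr"
                exit 1
            }
            used[pick]++
        }
        out = ""
        for (i = 1; i <= n; i++) out = out (i > 1 ? " " : "") used[i]
        print out
    }'
}
```

For two pGPUs with room for four vGPUs each and five VMs, depth-first yields "4 1" while breadth-first yields "3 2". depth-first keeps whole pGPUs free for later use, while breadth-first spreads the load across boards.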
– Monitoring vGPU status

Finally, once vGPUs have been allocated to all VMs, usage can be monitored with the
nvidia-smi command.

In this setup, nvidia-smi showed a single GRID K2 board providing four K240Q (1GB) vGPUs and
two K260Q (2GB) vGPUs, one to each of six VMs.

The GPU-Util column also shows the overall utilization of each pGPU.
– References

GPUs supported for GPU passthrough on XenServer:
http://hcl.vmd.citrix.com/GPUPass-throughDeviceList.aspx

4. Other XenServer operations

– Changing the number of Dom0 vCPUs

# /opt/xensource/libexec/xen-cmdline --set-xen dom0_max_vcpus=4

– Managing VM vCPUs

First check the CPU information of the host.

# xe host-cpu-info
——————————————————————————–
cpu_count : 12
socket_count: 2
vendor: GenuineIntel
speed: 2000.024
modelname: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
family: 6
model: 45
stepping: 7
flags: fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht nx constant_tsc nonstop_tsc aperfmperf pni pclmulqdq vmx est ssse3 sse4_1 sse4_2 x2apic popcnt aes hypervisor ida arat tpr_shadow vnmi flexpriority ept vpid
features: 17bee3ff-bfebfbff-00000001-2c100800
features_after_reboot: 17bee3ff-bfebfbff-00000001-2c100800
physical_features: 17bee3ff-bfebfbff-00000001-2c100800
maskable: full
——————————————————————————-

Assign cores to each VM.

# xe vm-param-set platform:cores-per-socket=6 VCPUs-max=6 VCPUs-at-startup=6 uuid=<VM-UUID>

The order matters:
# xe vm-param-set platform:cores-per-socket=6 uuid=<VM-UUID>
# xe vm-param-set VCPUs-max=6 uuid=<VM-UUID>
# xe vm-param-set VCPUs-at-startup=6 uuid=<VM-UUID>

Example:

# xe vm-param-set platform:cores-per-socket=6 VCPUs-max=6 VCPUs-at-startup=6 uuid=ff15652c-85b6-6226-a92c-92cae4f96a27

To pin a VM to specific physical CPUs, use the following.

# xe vm-param-set uuid=<vm-uuid> VCPUs-params:mask=0,1,2,3,4,5,6,7,8,9,10,11

Example:

# xe vm-param-set uuid=ff15652c-85b6-6226-a92c-92cae4f96a27 VCPUs-params:mask=0,1,2,3,4,5
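The vCPU settings and the CPU mask can be applied together. The sketch below is only an illustration under assumptions: cpu_mask and set_vm_vcpus are hypothetical helper names, the VM is assumed to be halted while the values are changed, and the VM is given a contiguous range of physical CPUs starting at FIRST_CPU.

```shell
# cpu_mask FIRST LAST -> "FIRST,...,LAST" (value for VCPUs-params:mask)
cpu_mask() {
    mask=""
    i="$1"
    while [ "$i" -le "$2" ]; do
        mask="${mask:+$mask,}$i"
        i=$((i + 1))
    done
    echo "$mask"
}

# set_vm_vcpus VM_UUID CORES FIRST_CPU
# Applies the three vCPU parameters in the order given above, then pins the VM
# to CORES consecutive physical CPUs starting at FIRST_CPU.
set_vm_vcpus() {
    vm_uuid="$1"; cores="$2"; first_cpu="$3"
    last_cpu=$((first_cpu + cores - 1))
    xe vm-param-set platform:cores-per-socket="$cores" uuid="$vm_uuid" &&
    xe vm-param-set VCPUs-max="$cores" uuid="$vm_uuid" &&
    xe vm-param-set VCPUs-at-startup="$cores" uuid="$vm_uuid" &&
    xe vm-param-set VCPUs-params:mask="$(cpu_mask "$first_cpu" "$last_cpu")" uuid="$vm_uuid"
}
```

For example, set_vm_vcpus ff15652c-85b6-6226-a92c-92cae4f96a27 6 0 would reproduce the six-core configuration with mask 0,1,2,3,4,5 used in the example above.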

The assigned vCPUs can be monitored as follows.

# xl vcpu-list
————————————————————————————-
Name        ID  VCPU  CPU  State  Time(s)  CPU Affinity
Domain-0     0     0    7  r--    13619.9  any cpu
Domain-0     0     1    9  r--    15863.2  any cpu
Domain-0     0     2   10  r--    11772.1  any cpu
Domain-0     0     3    6  -b-    12907.0  any cpu
Win7k-09    63     0    5  r--       18.1  0-5
Win7k-09    63     1    0  -b-        4.0  0-5
Win7k-09    63     2    1  -b-        6.3  0-5
Win7k-09    63     3    2  -b-        6.4  0-5
Win7k-09    63     4    3  -b-        7.9  0-5
Win7k-09    63     5    4  -b-        5.9  0-5
Win7k-01    64     0   11  r--        0.6  any cpu
Win7k-01    64     1    -  --p        0.0  any cpu
————————————————————————————-

Issues:
– When VCPUs-params:mask was not set, the guest OS recognized only two vCPUs no matter how
many were assigned.
– On a DELL R720 with a GRID K2 installed, the XenServer host failed to recognize the NVIDIA
driver while the "mapping of MMIO above 4GB" BIOS setting was left at its default (enabled).
After it was set to disabled, the driver was recognized normally.

On Supermicro systems, the "Above 4G decoding" BIOS settings must be adjusted according to
the number of K1 boards installed.

advanced > pcie/pci/pnp configuration > above 4G decoding
advanced > chipset configuration > integrated IO configuration – MMCFG base

One K1: Above 4G decode – disabled
Two K1: Above 4G decode – disabled, MMCFG – 0x60000000
Three K1: Above 4G decode – disabled, MMCFG – 0x30000000
Four K1: not supported

K2: Above 4G decode may be enabled, MMCFG – 0x80000000 (default)
For multiple K2 boards, note the following recommendation:

Please also remember to disable “Above 4G decoding” under PCI configuration menu in the bios.
And if using multiple cards in a system, please try setting MMCFG Base under CPU configuration
to either 0x30000000 or 0x40000000

The current BIOS settings on alang20 are:

Above 4G decode – enabled
MMCFG – 0x40000000

– vGPU utilization can be monitored with nvidia-smi. With pGPU passthrough, however,
nvidia-smi cannot be used, and in some cases the XenServer host kernel-panics.
(An NVIDIA bug, according to Kim Do-young, technical manager at NVIDIA.)
– After large changes to the GPU configuration (gpu_group, pgpu, vgpu-type, ...), assigning a
new vGPU to a VM and booting it can fail with an error such as:

Error: Internal error:xenopsd internal error: Device Ioemu_failed("vgpu exited unexpectedly")

In that case, reboot the host.

– Dependency between the VBIOS and the NVIDIA driver

default VBIOS : 80:04:60:00:30

nvidia manager driver 312.53, nvidia driver 312.56
Both XenServer and the VMs boot and use vGPUs without problems, but dual monitor is not available.

nvidia manager driver 331.24, nvidia driver 331.82
With pGPU assignment, once the NVIDIA driver is installed the VM hangs at the Windows startup
screen on boot (performance monitoring shows CPU utilization stuck at 100%, and the VM cannot
be reached over the network).

With vGPU assignment, booting the VM makes the load on the XenServer host surge, and shortly
afterwards the host is forcibly rebooted.

updated VBIOS : 80:04:D4:00:09/0A

nvidia manager driver 312.53, nvidia driver 312.56
With pGPU assignment, both XenServer and the VMs boot and use the GPU without problems, and
dual monitor works.
With vGPU assignment, the first vGPU loaded on a pGPU works. From the second one on, the VM
detects the device but Windows Device Manager reports a problem with the VGX driver, so the
vGPU does not work properly.
(With K260Q, K240Q and K200 alike, only the first vGPU works.)

nvidia manager driver 331.24, nvidia driver 331.82
Both pGPU and vGPU work. However, when the NVIDIA vGPU driver installation in the VM finishes,
the NVIDIA installer prompts for a reboot. If you reboot from that prompt, the VM hangs at the
Windows startup screen on boot.

After installing the NVIDIA driver, always shut the VM down from the Windows Start menu,
and then start it again with Power on in XenCenter.

– When DirectX is checked with dxdiag, Direct3D is enabled, but DirectDraw and AGP acceleration
are disabled.
– With vGPU assignment, the NVIDIA control panel has no topology feature, so an arbitrary EDID
cannot be loaded. Wide-screen resolution support is limited and dual monitor cannot be used.

Tips:
– Copying a VM: full copy

xe sr-list : check the UUID of the SR
xe vm-copy new-name-label=WINX86_01 vm=RCAWIN00_X86 sr-uuid=7f09882f-a7ce-06b9-f21f-d61ea083f39f

A full copy from the command line does not show copy progress the way XenCenter does.
If you interrupt it midway and retry, VM disk space is allocated a second time on the SR.
That is hard to clean up afterwards, so wait patiently for the copy to finish.

– Cloning a VM: fast copy

xe vm-clone new-name-label=WINX86_01 vm=RCAWIN00_X86
A fast copy is only possible within the same SR as the original.

– Removing an SR

An SR created on a local disk cannot be removed by running xe sr-destroy or sr-forget directly.
An SR is the storage object unit inside XenServer, and it can only be removed after it has
been detached from the PBD (physical device) it was allocated on.

First check the UUID of the SR to delete.
# xe sr-list
.
uuid ( RO) : 019b941c-bc22-8b4f-a978-a63fc3d25e8c
name-label ( RW): SDC Storage
.
Check the UUID of the PBD for the physical disk device behind that SR.
# xe pbd-list sr-uuid=019b941c-bc22-8b4f-a978-a63fc3d25e8c
uuid ( RO) : e24dfffc-f57d-e1e6-b3a1-65a3f29cfe66
host-uuid ( RO): 375725aa-a48f-4f75-8439-83fd7a4fa202
sr-uuid ( RO): 019b941c-bc22-8b4f-a978-a63fc3d25e8c
device-config (MRO): device: /dev/sdc1
currently-attached ( RO): true
Unplug the PBD and destroy it.
# xe pbd-unplug uuid=e24dfffc-f57d-e1e6-b3a1-65a3f29cfe66
# xe pbd-destroy uuid=e24dfffc-f57d-e1e6-b3a1-65a3f29cfe66
Remove the SR.
# xe sr-forget uuid=019b941c-bc22-8b4f-a978-a63fc3d25e8c
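The unplug, destroy, forget sequence above can be wrapped in one function. This is a hedged sketch, not a tested production script: remove_local_sr is a hypothetical helper name, and it assumes no VDI on the SR is still in use. It relies on xe --minimal, which returns the matching UUIDs as one comma-separated line.

```shell
# remove_local_sr SR_UUID
# Unplugs and destroys every PBD attached to the SR, then forgets the SR.
# Hypothetical helper; assumes nothing on the SR is still in use.
remove_local_sr() {
    sr_uuid="$1"
    # --minimal returns the PBD UUIDs as one comma-separated line
    pbds=$(xe pbd-list sr-uuid="$sr_uuid" params=uuid --minimal)
    for pbd in $(echo "$pbds" | tr ',' ' '); do
        xe pbd-unplug uuid="$pbd" || return 1
        xe pbd-destroy uuid="$pbd" || return 1
    done
    xe sr-forget uuid="$sr_uuid"
}
```

Stopping at the first failed step (|| return 1) avoids forgetting an SR whose PBD could not be unplugged.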

– Enabling dual monitor

Multi Monitor Support on GRID

Multi-monitor support on GRID boards K1 and K2 Requires the following VBIOS
versions :

* GRID K1: 80.07.AF.00.00 or later
* GRID K2: 80:04:BA:00:00 or later

1. Prepare a USB drive. A small one is fine (1GB or more).
2. Connect the USB drive to a PC and format it as FAT32.
3. Rename the supplied default.zi to default.zip and copy the contents of the archive to the USB drive.
4. Open a Command Prompt and change to the USB drive (e.g. e: or f:).
5. Run: syslinux -m -a e: (where e: is the USB drive letter)
6. Copy the supplied FW file to the USB drive (the root directory is fine).
7. Connect the USB drive to the server with the GRID K1/K2 installed and boot from it. The server boots into a Tiny Linux environment. Run one of the update files in /mnt/nv:
./gridpro-update.run (asks about the BIOS update for each GPU) or
./gridpro-update-auto.run (updates automatically)

After the update and a reboot, check the BIOS with nvidia-smi -q. The update succeeded if the version is 80.04.BE.00.02 (K1) or 80.04.D4.09 (K2), depending on the model.

Prefer the automatic update (./gridpro-update-auto.run). A GRID K2 has two pGPUs, and with the
manual update it is easy to update only one of them by mistake.

After the update, one pGPU reports 80.04.D4.00.09 and the other 80.04.D4.00.0A.

VBIOS Version : 80.04.D4.00.09
VBIOS Version : 80.04.D4.00.0A

If a GRID K2 is installed alongside another Quadro GPU, there is no particular need to remove the other GPU.
– Applying NIC passthrough

Check the NIC's PCI ID with lspci.
# lspci
06:00.0

Edit /boot/extlinux.conf and add pciback.hide=(06:00.0) to the append line.
# vi /boot/extlinux.conf
label xe
# XenServer
kernel mboot.c32
append /boot/xen.gz mem=4096G dom0_max_vcpus=6 dom0_mem=4096M,max:4096M watchdog_timeout=300 lowmem_emergency_pool=1M crashkernel=64M@32M cpuid_mask_xsave_eax=0 console=vga vga=mode-0x0311 pciback.hide=(06:00.0) --- /boot/vmlinuz-2.6-xen root=LABEL=root-spyxbkgu ro xencons=hvc console=hvc0 console=tty0 quiet vga=785 splash vmalloc=300M --- /boot/initrd-2.6-xen.img

Apply the change and reboot.
# extlinux -i /boot
# reboot

After the reboot, list the devices that can be assigned.
# xl pci-list-assignable-devices

Use xe vm-list to find the UUID of the VM that will receive the NIC.
# xe vm-list

# xe vm-param-set other-config:pci=0/0000:06:00.0 uuid=<vm_uuid>

If several devices are to be passed through, list them all in extlinux.conf as
pciback.hide=(06:00.0)(01:00.0).. and assign them as follows.

# xe vm-param-set other-config:pci=0/0000:06:00.0,0/0000:01:00.0 uuid=.
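With several devices, the two strings are easy to mistype. The sketch below builds both from a list of bus:device.function IDs. pciback_hide and vm_pci_config are hypothetical helper names, and the code assumes every device is in PCI domain 0000, as in the examples above.

```shell
# pciback_hide 06:00.0 01:00.0 -> pciback.hide=(06:00.0)(01:00.0)
# Builds the boot parameter for extlinux.conf.
pciback_hide() {
    out="pciback.hide="
    for bdf in "$@"; do
        out="${out}(${bdf})"
    done
    echo "$out"
}

# vm_pci_config 06:00.0 01:00.0 -> other-config:pci=0/0000:06:00.0,0/0000:01:00.0
# Builds the argument for xe vm-param-set (PCI domain 0000 is assumed).
vm_pci_config() {
    list=""
    for bdf in "$@"; do
        list="${list:+$list,}0/0000:${bdf}"
    done
    echo "other-config:pci=${list}"
}
```

A possible use on the host would be: xe vm-param-set "$(vm_pci_config 06:00.0 01:00.0)" uuid=<vm_uuid>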

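– Using USB devices
Using usb-redirector
http://blogs.citrix.com/2012/02/29/usb-over-network-with-xenserver-6/

Using usb-passthrough
http://hardforum.com/showthread.php?t=1663213

– sysprep verification: OK

Run sysprep with the GPU assigned
Check Generalize, and make sure to choose Shutdown
Remove the GPU assignment and boot
Enter the initial sysprep information
Enable the administrator account
Configure the network IP settings
Shut down the system
Assign the GPU
Boot the system
For the first connection use the XenCenter console, log in, then restart once more
RGS / RDP remote access: no need to reinstall the graphics driver

– Review the administrator's manual

– RCA plan
Dom0 CPU pin -> 4 CPUs, 0-3
Dom0 CPUs shared with the Windows AD VM vCPUs
Dom0 CPUs shared with the RNTier management vCPUs
CPUs 4~15 or 4-19 dedicated to the WIN7 VMs

– Upgrading the NVIDIA packages (under verification)

First upgrade NVIDIA-manager on the hosts:
remove the existing rpm, reboot, then install the new rpm.

Set the GPU assigned to each existing VM to none in XenCenter, and boot the VM
Uninstall the NVIDIA vGPU driver and shut down
Assign the GPU as pGPU passthrough and boot the VM
Install the new NVIDIA driver, then always shut down (do not reboot)

– Upgrading with XenServer 3D Graphic Pack 1 (XS62ESP1)

1. Install XenServer 6.2.0 from scratch.
2. Apply the XS62ESP1 patch. This also applies XS62E001~012. Reboot.
When installing XS62ESP1, do not install the earlier XS62ETP001 (important).
XS62ETP001 is a preview release of XenServer with the vGPU feature added. If it is
installed, its feature set differs from that of the latest XenCenter.
3. Install NVIDIA Manager Driver 331.30 (rpm -ivh .), then reboot.
4. Check with lsmod | grep nvidia and nvidia-smi.
There is no need to run xe-toolstack-restart.
6. Install the XenServer 6.2.2 XenCenter on the administrator's Windows PC.
7. Upgrade the NVIDIA vGPU driver in the VMs to 332.07.

Patch state after the upgrade: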

[root@alang20 Patch]# xe patch-list | grep name-label
name-label ( RO): XS62E008
name-label ( RO): XS62E009
name-label ( RO): XS62E002
name-label ( RO): XS62E005
name-label ( RO): XS62E001
name-label ( RO): XS62E011
name-label ( RO): XS62ESP1
name-label ( RO): XS62E010
name-label ( RO): XS62E004
name-label ( RO): XS62E013
name-label ( RO): XS62E012
name-label ( RO): XS62E007

# uname -r
2.6.32.43-0.4.1.xs1.8.0.847.170785xen

– Installing RHEL 6.4

For a PXE install, add to the ks options:
clocksource=jiffies clocksource_failover=hpet

xe vm-param-set uuid=<vm_uuid> platform:viridian=false

On the first boot, add to the kernel options in grub edit mode ..

After logging in, edit /boot/grub.conf ..

Enter the gateway in /etc/sysconfig/network
Enter the DNS servers in /etc/resolv.conf

# mount -o loop xs-tools-6.2.0-3.iso /mnt
# /mnt/Linux/install.sh

# vi /etc/ntp.conf
server time.bora.net
server time.nuri.net
multicastclient 224.0.1.1

# /etc/rc.d/init.d/ntpdate start
# chkconfig --level 345 ntpdate on

reboot

– Converting an open-source Xen VM to a XenServer VM

Note: remove the PV drivers before converting.

# wget http://www-archive.xenproject.org/files/xva/xva.py
# ./xva.py -c /etc/xen/WIN8X64.hvm -s 192.168.123.71 --username=root --password=root///
# ./xva.py -n HYNTEL01 --is-hvm --disk /dev/sdb6 --filename=/APP/HYNTEL01.xva
# ./xva.py --disk=/dev/system/windows --is-hvm --name=Windows --memory=256 --vcpus=4 --filename=windows.xva

Option reference:
# ./xva.py -h
Usage: xva.py [options]

Options:
  -h, --help            show this help message and exit
  -c FILE, --config=FILE
                        Specify the OSS Xen config file to process(all other
                        options output options are ignored)
  --sparse              Attempt sparse mode(detecting chunks that are zero)

  Virtual Machine Parameters:
    These options are only read when you dont specify a config file with
    -c

    -d DISK, --disk=DISK
                        Add disk in file/block device DISK, make sure first
                        disk given is the boot disk
    -m MEM, --memory=MEM
                        Set memory to MEM(Megabytes), default 256
    -n NAME, --name=NAME
                        Set VM name to NAME(default unnamed)
    -v NUM, --vcpus=NUM
                        Set VCPUS to NUM(default 1)
    --no-acpi           ACPI Disabled
    --no-apic           APIC Disabled
    --no-viridian       Viridian Disabled
    --no-pae            PAE Disabled
    --nx                NX enabled(default no)
    --is-hvm            Is HVM VM(defaults to HVM)
    --is-pv             Is PV VM
    -k KERNEL, --kernel=KERNEL
                        Supply VM kernel KERNEL
    -r RAMDISK, --ramdisk=RAMDISK
                        Supply VM ramdisk RAMDISK
    -a ARGUMENTS, --args=ARGUMENTS
                        Supply VM kernel arguments ARGUMENTS

  Output Options:
    These are the options that dictates where the VM should be saved or
    streamed to a server. You can either save to a file or stream to a
    server, not both. One of either -f or -s have to be specified

    -f FILE, --filename=FILE
                        Save XVA to file FILE
    -s HOSTNAME, --server=HOSTNAME
                        Stream VM to host HOSTNAME
    --username=USERNAME
                        Use username USERNAME when streaming to remote host
    --password=PASSWORD
                        Use password PASSWORD when streaming to remote host
    --no-ssl            SSL disabled with streaming
    --sftp              SFTP the kernel/ramdisk to the server(requires
                        paramiko)

– vm operation scripts

vm-uuid
# vm-uuid [vn_name]

vm-vnc-close
# vm-vnc-close [open|close] [vm_uuid]

vm-core-set
# vm-core-set [core_number] [vm_uuid]

# sr-uuid

– Removing a NIC

# xe pif-list
# xe pif-forget uuid=…

– Forcing a VM's memory allocation

Occasionally a VM's memory cannot be changed in XenCenter. In that case the memory can be
set from the command line as follows.
# xe vm-memory-limits-set uuid=<vm uuid> static-min=64GiB static-max=64GiB dynamic-min=64GiB dynamic-max=64GiB

/opt/xensource/libexec/xen-cmdline --set-dom0 blkbk.reqs=256

xenpm get-cpu-topology
/opt/xensource/libexec/xen-cmdline --set-xen dom0_max_vcpus=1-6
/opt/xensource/libexec/xen-cmdline --set-xen dom0_vcpus_pin
Controlling pinning from the command line

Usage: xl [-v] vcpu-pin <Domain> <VCPU|all> <CPUs|all>

xm vcpu-pin ID VCPU CPU
xm vcpu-pin 14 0 7

xm vcpu-pin 0 0 0
xm vcpu-pin 0 1 0
xm vcpu-pin 0 2 0
xm vcpu-pin 0 3 0
xm vcpu-pin 0 4 1
xm vcpu-pin 0 5 1
xm vcpu-pin 0 6 1
xm vcpu-pin 0 7 1
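The repetitive pin commands above follow a simple pattern (vCPUs 0-3 on CPU 0, vCPUs 4-7 on CPU 1), so they can be generated instead of typed. The sketch below only prints the commands and uses the xl vcpu-pin syntax from the usage line above. gen_pin_cmds is a hypothetical helper name, and the mapping "vCPU number divided by vCPUs-per-CPU" is an assumption that happens to reproduce this particular list.

```shell
# gen_pin_cmds DOMAIN VCPUS_PER_CPU NUM_VCPUS
# Prints one "xl vcpu-pin" command per vCPU, pinning vCPU v to physical CPU
# v / VCPUS_PER_CPU (integer division). Nothing is executed.
gen_pin_cmds() {
    domain="$1"; per_cpu="$2"; total="$3"
    v=0
    while [ "$v" -lt "$total" ]; do
        echo "xl vcpu-pin $domain $v $((v / per_cpu))"
        v=$((v + 1))
    done
}
```

gen_pin_cmds 0 4 8 prints the eight Dom0 pin commands listed above in xl form. Piping the output to sh would apply them, after reviewing the list.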

[root@host ~]# /usr/lib/xen/bin/host-cpu-tune
Usage: /usr/lib/xen/bin/host-cpu-tune { show | advise | set <dom0_vcpus> <pinning> [--force] }
         show     Shows current running configuration
         advise   Advise on a configuration for current host
         set      Set host's configuration for next reboot
          <dom0_vcpus> specifies how many vCPUs to give dom0
          <pinning>    specifies the host's pinning strategy
                       allowed values are 'nopin' or 'xpin'
          [--force]    forces xpin even if VMs conflict

Examples: /usr/lib/xen/bin/host-cpu-tune show
          /usr/lib/xen/bin/host-cpu-tune advise
          /usr/lib/xen/bin/host-cpu-tune set 4 nopin
          /usr/lib/xen/bin/host-cpu-tune set 8 xpin
          /usr/lib/xen/bin/host-cpu-tune set 8 xpin --force

xl sched-credit
xl sched-credit -d <domain> -w <weight>
xl sched-credit -d <domain> -c CAP : CAP is int

– Unstable analysis performance in VMs

Check that BIOS CPU Power Management is set to Max Performance.
Performance drops roughly 20~30% below the best case. dom0_cpu_pin, vcpu_pin, NIC passthrough
and similar measures bring it briefly close to the best case, but it degrades again after
some time. The drop is more severe in network-attached environments (AD profile, shared storage).

On a DELL R720, the system profile must be set to performance.
The default is performance per watt (DAPC).

– Increasing the resolution of a non-GPU VM

A Windows VM without GPU passthrough supports only 1024×768 by default,
and only 4M of video memory is allocated. To raise the resolution by increasing the video memory:
# ps aux | grep qemu-dm
qemu-dm-46 -d 46 -m 3000 -boot dc -serial pty -vcpus 2 -videoram 4 -vncunused -k en-us -vnc 127.0.0.1:1 -usb -usbdevice tablet -net nic,vlan=0,macaddr=22:10:b9:7d:12:e7,model=rtl8139 -net tap,vlan=0,bridge=xenbr0,ifname=tap46.0 -acpi -monitor pty

# vi /opt/xensource/libexec/qemu-dm-wrapper
————————————————————————————–
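def main(argv):
    .
    .
    qemu_args = ['qemu-dm-%d'%domid] + argv[2:]
Add the following three lines below that line:

    qemu_args.append('-std-vga')
    qemu_args.append('-videoram')
    qemu_args.append('32')
----------------------------------------------------------------------------------------

Then reboot.

In Screen Resolution > Advanced > Monitor > Colors, change 256 colors to 32-bit.
Resolutions up to 2560×1600 then become available. The video memory actually detected is 16M.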
– Network performance benchmark

Use iperf.

# on receiver
iperf -s -f m # add "-w 256K -l 256K" when sender or receiver is a Windows VM

# on sender
iperf -c <receiver-IP> -f m -t 20 # add "-w 256K -l 256K" when sender or receiver is a Windows VM

– Moving a VM to another XenServer

Moving a VM between XenServer hosts in the same pool
# xe vm-migrate host=<other xenserver> vm=<vm_name> live=true force=true

Moving a VM between XenServer hosts in different pools (live)

# xe vm-migrate remote-master=192.168.123.72 remote-username=root remote-password=root/// vm=RNTIER27-04b destination-sr-uuid=1839d06d-f608-c75c-d616-f09d487b6e67 live=true force=true

Using the migration tool (VM halted)
# yum install glibc.i686
# wget http://djlab.com/stuff/migratevm-1.0.1.tar.gz
# tar zxf migratevm-1.0.1.tar.gz && cd migratevm-1.0.1
# ./migratevm
migratevm 1.0.1 started
Enter source host name/IP (blank = localhost):
Enter username for localhost (blank = root):
Enter password for localhost: *******
Enter source vm name or uuid on localhost: RNTIER27-04b
Enter destination host name/IP (blank = localhost): 192.168.123.72
Enter username for 192.168.123.72 (blank = root): root
Enter password for 192.168.123.72: *******
Destination SR on 192.168.123.72 (blank for default):
Connecting to source host
Connecting to destination host and Starting transfer
…… 1.9%, 7440.59 (KB/sec))

Main options
-sh : source host
-su : source user (usually root)
-sp : source pass
-sv : source VM label or UUID
-dh : destination host
-du : destination user
-dp : destination pass
-ds : destination SR (optional)

# ./migratevm -sh localhost -su root -sp xxxxx -sv WIN7X02 -dh 192.168.123.72 -du root -dp xxxxx -ds f956e313-d403-9c21-62eb-b5ada58f7a87

Notes on migration

The destination XenServer must have as much free memory as the migrating VM requires.
This means unallocated free memory, not installed memory.

For a GPU passthrough VM, always set its GPU to none before migrating.

Check the free capacity of the destination SR.

– Turning ECC mode off

nvidia-smi -q > check the ECC mode. With ECC enabled, memory is reported as 3583MiB.

nvidia-smi -i <GPU num> --ecc-config=0

– Passing through the boot video device as a GPU

Remove the NVIDIA-vgx-xenserver rpm

xe vm-param-set other-config:pci=0/0000:06:00.0 uuid=.

– Changing XenServer network settings with xe commands

xe host-set-hostname-live host-uuid=${HT_UUID} host-name=${1}
xe pif-reconfigure-ip IP=$2 netmask=$3 gateway=$4 DNS=$5 mode=static uuid=${PIF_UUID}

– Setting a Fujitsu server to maximum power performance

Set the CPU Power Technology option to disabled.
Note that some people choose custom and enable Turbo or the Intel Speed settings,
but that is not the maximum-performance state.

– Notes on vm-export

There are a few things to watch for when backing up a VM to a file with export.
vm-export backs up the entire VM configuration as it is at export time.
If the VM includes components that depend on its host, importing it on another host can fail.

This applies when a GPU or another PCI device (such as a NIC) is assigned by PCI passthrough.
Remove such devices before exporting.

Exporting with media loaded in the CD-ROM drive, and in particular with xen-tool.iso
inserted, causes serious problems on import.

For example, if a VM is backed up with xen-tool inserted and then imported on a host whose
xen-tool version has been upgraded, the import fails because the sr-uuid for xen-tool differs.

In addition, check the vbd information of the VM before exporting it.

# xe vbd-list vm-name-label=RNTIER27-SP2 params=all
.
vdi-uuid ( RO): <not in database>
vdi-name-label ( RO): <EMPTY>
allowed-operations (SRO): insert; attach
empty ( RO): true
device ( RO):

The case above caused a problem.

# xe vbd-list vm-name-label=RNTIER27-SP2 params=all
.
vdi-uuid ( RO): 2db5e034-4efc-43aa-a837-296d34c19401
vdi-name-label ( RO): RHEL64-00
allowed-operations (SRO): attach
current-operations (SRO):
empty ( RO): false
device ( RO): hda

The case above imported normally.

This problem appeared after the XS6.2 SP1008 patch.
XS62ESP1008 updates the xcp (xapi) packages involved in export/import.
The xapi-xe package is updated as well, but the client side probably does not need the update.
For reference, the error that occurs on import is:

There was an SR backend failure.
status: non-zero exit
stdout:

– Recovering when a VM shutdown fails

If a VM stops partway through shutdown without powering off completely, run:

xe vm-reset-powerstate vm=<vm_name> force=true

– Importing and running an image exported with a GPU on a XenServer without a GPU

xe vgpu-list
xe vgpu-destroy uuid=
xe vm-param-remove param-name=other-config param-key=vgpu_pci uuid=<vm_uuid>

– SW-IOMMU problem with bnx 10G bonding

/opt/xensource/libexec/xen-cmdline --set-dom0 swiotlb=128

PIF_UUID=`xe pif-list device=${NDEV} VLAN=-1 params=uuid | grep uuid | awk '{ print $5 }'`
xe pif-param-set other-config:ethtool-gro="on" uuid=${PIF_UUID}

Check:

ethtool -k <nic_dev>

To enable GRO immediately without rebooting the OS:

ethtool -K xapi0 gro on
ethtool -K eth4 gro on
ethtool -K eth6 gro on

– Reinstalling the Xen tools

1) Uninstalling all Citrix/Xen stuff via add/remove programs
2) Removing the C:\Program Files(x86)\Citrix folder
3) Removing all xen* entries from the registry in HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services
4) Emptying the temp folders (Windows and CurrentUser)
5) Copy the complete XenTools cd to a map on the local harddrive
6) Install via the normal way (so no legacy)
7) If the installer seems stuck wait 10 more minutes and if still at the same point restart the Citrix XenTools Installer Service
