Background
Siemens NX is an advanced, integrated Computer-Aided Design (CAD), Computer-Aided Manufacturing (CAM), and Computer-Aided Engineering (CAE) software developed by Siemens Digital Industries Software. It is widely used in product design, engineering analysis, and manufacturing. NX is the successor to Unigraphics, a well-known 3D design and simulation software in the industry.
The company’s design department relies on resource-intensive CAD/CAM software for its design work, running applications such as Siemens NX, ZWCAD, and SolidWorks.
High-end workstations are equipped with Intel Core i9-13900KF CPUs and NVIDIA RTX A5000 graphics cards, standard workstations with A2000 cards, and some older workstations with Q2000/Q2200 cards. In practice, designers’ workloads vary with the project stage: some tasks require a high-end workstation, while others can be handled by a mid-range or entry-level machine.
The following issues exist with the use of these graphic workstations:
1. Low Resource Utilization:
For example, the highest-spec workstations are assigned to specific employees, so their capabilities are rarely fully used, while other employees who temporarily need a high-performance machine have no easy way to get one.
2. Information Security Issues:
Design drawings are key core assets for the company. Losing or leaking drawings can result in significant losses. Allowing employees to directly use graphic workstations poses a risk of data leakage and loss.
3. Remote Design Requirements:
The company hopes to enable employees to work remotely, whether at home or while traveling, without hindering their design tasks. Furthermore, design drawings may need to be shared with suppliers or clients for review or collaborative design. Running CAD/CAM software locally on workstations cannot meet the demands for remote design.
Workstation Virtualization Introduction
ShareStation workstation virtualization utilizes the latest GPU virtualization technology to allow multiple users to share a single workstation remotely. This solution satisfies both remote design and information security needs.
We selected a high-end workstation for virtualization, with the following configuration:
CPU: Intel i9-13900KF
Memory: 64GB
Graphics Card 1: NVIDIA RTX A5000 (24GB)
Graphics Card 2: NVIDIA RTX A2000
Storage: 1TB NVMe SSD
The cost of this workstation is approximately 30,000 RMB.
The software stack combines the enterprise-grade open-source virtualization platform Proxmox VE with DoraCloud, a multi-platform, distributed, integrated cloud desktop system.
Hardware Installation and BIOS Configuration
To enable GPU virtualization, note the following points during hardware installation and BIOS configuration:
1. Do not connect a monitor to the A5000. The A5000 serves as the virtualization GPU and does not drive a local display, so another GPU must be used for the display output.
2. In the BIOS, enable Intel VT-d (the IOMMU), SR-IOV support, and Above 4G Decoding (Above 4G MMIO Assignment).
Workstation Virtualization Software Installation
1. Install Proxmox VE
First, install Proxmox VE, a popular virtualization platform.
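After the installation finishes, you can confirm the release from the shell and reach the web management interface at https://<host-ip>:8006 (replace <host-ip> with the address assigned during setup; the version shown on your host will differ):
pveversion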
2. Install and Configure NVIDIA GRID vGPU
The RTX A5000 GPU has several operational modes. By default, it supports display output. To enable vGPU, you need to disable the display output function. This can be configured using the `displaymodeselector` tool.
If the A5000 is currently connected to a display and serving as the workstation’s default display output, switching the GPU mode will leave the workstation unable to drive that display or reach the local operating system from the console. Hence, as mentioned earlier, another GPU (either discrete or integrated) must be used as the workstation’s default display output.
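The displaymodeselector tool is typically obtained from NVIDIA’s enterprise/licensing download portal. As a rough sketch, switching the GPU out of display mode looks like the following; the exact flags and mode names vary between tool versions and are assumptions here, so check the tool’s built-in help before running it:
chmod +x displaymodeselector
./displaymodeselector --listgpumodes                        # list current GPU modes (flag name assumed)
./displaymodeselector --gpumode physical_display_disabled   # mode name assumed; pick the vGPU/compute mode the tool reports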
To install the NVIDIA vGPU driver, use this script: [Proxmox vGPU Script](https://gitee.com/deskpool/proxmox-vgpu).
The installation steps are as follows:
Log into the Proxmox VE command line.
Run the following commands to download the scripts, update the Proxmox VE package sources, and prepare the host for the vGPU driver:
apt install git-core -y
git clone https://gitee.com/deskpool/proxmox-vgpu
./proxmox-vgpu/nvidia/gpu01.sh
./proxmox-vgpu/nvidia/gpu02.sh
Proxmox VE will reboot. After rebooting, check whether IOMMU is enabled:
root@pvehost:~# dmesg |grep IOMMU
[ 0.046588] DMAR: IOMMU enabled
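If the IOMMU line does not appear, a quick sanity check is to confirm that the kernel command line carries the IOMMU parameter and that IOMMU groups were created (standard locations on Proxmox VE):
cat /proc/cmdline                      # typically contains intel_iommu=on when enabled via GRUB
ls /sys/kernel/iommu_groups/ | wc -l   # non-zero when the IOMMU is active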
Next, run `gpu03.sh` to install the GRID 16.4 driver:
./proxmox-vgpu/nvidia/gpu03.sh
After the system reboots and you log back into Proxmox VE, verify that the graphics driver is installed by running the nvidia-smi command.
root@pvehost:~# nvidia-smi
Fri May 24 16:20:22 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.05 Driver Version: 535.161.05 CUDA Version: N/A |
|-----------------------------------------+----------------------+----------------------|
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA RTX A5000 On | 00000000:01:00.0 Off | 0 |
| 30% 46C P8 29W / 230W | 22272MiB / 23028MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA RTX A2000 12GB On | 00000000:04:00.0 Off | 0 |
| 30% 42C P8 12W / 70W | 0MiB / 11514MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 657157 C+G vgpu 7424MiB |
| 0 N/A N/A 657163 C+G vgpu 7424MiB |
| 0 N/A N/A 657623 C+G vgpu 7424MiB |
+---------------------------------------------------------------------------------------+
Next, use the mdevctl command to check the available vGPU types (vGPU profiles). On GPUs from generations before Ampere, the profiles should already be listed at this point. For the Ampere-based A5000, however, an additional step is needed: enable SR-IOV with the following command:
/usr/lib/nvidia/sriov-manage -e 0000:01:00.0
After executing this command, running mdevctl types will list the vGPU types, indicating that the vGPU driver is configured successfully. Note that SR-IOV enabled this way does not persist across a reboot. To enable it automatically at startup, create a systemd service that runs on boot:
cat >/etc/systemd/system/sriov.service <<EOF
[Unit]
Description=Script to enable SR-IOV on boot
[Service]
Type=simple
#start SR-IOV
ExecStart=/usr/lib/nvidia/sriov-manage -e 0000:01:00.0
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable sriov.service
systemctl start sriov.service
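To confirm that SR-IOV is active, check the service and look for the virtual functions the A5000 now exposes on the PCI bus (the PCI address matches the one used in the service):
systemctl status sriov.service --no-pager
ls /sys/bus/pci/devices/0000:01:00.0/ | grep virtfn   # one virtfn* link per virtual function
lspci | grep -i nvidia                                # the VFs appear as additional NVIDIA PCI devices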
After that, reboot the server. If the vGPU types are still listed when you run mdevctl types, the server is properly configured for vGPU.
root@pvehost:~# mdevctl types |more
0000:01:00.4
nvidia-657
Available instances: 0
Device API: vfio-pci
Name: NVIDIA RTXA5000-1B
Description: num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
nvidia-658
Available instances: 0
Device API: vfio-pci
Name: NVIDIA RTXA5000-2B
Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
nvidia-659
Available instances: 0
Device API: vfio-pci
Name: NVIDIA RTXA5000-1Q
Description: num_heads=4, frl_config=60, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
nvidia-660
Available instances: 0
Device API: vfio-pci
Name: NVIDIA RTXA5000-2Q
Description: num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=7680x4320, max_instance=12
nvidia-661
Available instances: 0
Device API: vfio-pci
Name: NVIDIA RTXA5000-3Q
Description: num_heads=4, frl_config=60, framebuffer=3072M, max_resolution=7680x4320, max_instance=8
nvidia-662
Available instances: 0
Device API: vfio-pci
Name: NVIDIA RTXA5000-4Q
Description: num_heads=4, frl_config=60, framebuffer=4096M, max_resolution=7680x4320, max_instance=6
nvidia-663
Available instances: 0
Device API: vfio-pci
Name: NVIDIA RTXA5000-6Q
Description: num_heads=4, frl_config=60, framebuffer=6144M, max_resolution=7680x4320, max_instance=4
....................................
....................................
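Before handing the card over to DoraCloud, you can optionally create a vGPU instance by hand to confirm that mediated devices work. A minimal sketch, assuming the 6Q profile (nvidia-663) and the virtual function 0000:01:00.4 from the listing above; the UUID is arbitrary:
mdevctl start -u 11111111-2222-3333-4444-555555555555 -p 0000:01:00.4 -t nvidia-663   # create a test vGPU
mdevctl list                                                                          # the new mediated device should be listed
mdevctl stop -u 11111111-2222-3333-4444-555555555555                                  # remove the test instance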
3. Install and Configure DoraCloud
DoraCloud for Proxmox VE can be installed and configured using tutorials available on the official website.
4. Install CAD Software
Edit the DoraCloud desktop template to install Siemens NX and the other commonly used office software. For a better visual experience, also install the DoraCloud desktop protocol (DDP Server) in the template.
Once the template is complete, create a desktop pool and set its vGPU type to 8Q. With the A5000’s 24GB of VRAM, this lets the workstation host three virtual desktops, each with 8GB of VRAM. Either RDP or DDP can be selected as the desktop pool protocol.
From this pool, three desktops are provisioned, each with 8GB of VRAM.
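Under the hood, Proxmox VE attaches a vGPU to a virtual machine as a mediated device on one of the A5000’s virtual functions. A manual equivalent of what the desktop pool configures would look roughly like this; the VM ID 101 and the profile ID nvidia-664 for the RTXA5000-8Q type are assumptions, so use the IDs that mdevctl types reports on your host:
qm set 101 -hostpci0 0000:01:00.4,mdev=nvidia-664   # attach an 8Q vGPU profile to VM 101 (IDs assumed)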
Application Testing and Results
Virtual workstations can be accessed using various client options:
1. DoraClient Application
Available for both Windows and Linux versions.
2. DoraCloud Cloud Terminal Products
For example, the JC36 Cloud Terminal and DC20 Cloud Terminal.
3. x86 Machines with DoraOS Thin Client Software
These can be transformed into cloud terminals.
For the best experience, it is recommended to use cloud terminals that support the DDP protocol when accessing DoraCloud. Currently, only DoraOS or x86-based cloud terminals support the DDP protocol.
By using the DoraCloud-based workstation virtualization solution, the professional GPUs of graphic workstations can be flexibly partitioned. For example, during complex projects, a single workstation can be divided into three parts, each with 8GB of VRAM to meet the requirements of large applications. For simpler projects, the workstation can be divided into eight parts, with each part having 3GB of VRAM, fulfilling the design needs of multiple team members. This not only avoids resource wastage but also solves the performance issues of lower-end workstations.