Keeping Services Uninterrupted While Adding New Hardware (Disks) to a Proxmox VE Hyper-Converged Cluster


For a Proxmox VE hyper-converged cluster with five nodes, two Ceph Pools have been created: a high-speed NVMe storage pool and a large-capacity SATA storage pool. The task now is to replace all of the existing SATA disks with high-speed NVMe disks.

First, destroy the “hdd_pool” composed of SATA mechanical disks; select it, and then click the “Destroy” button.
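
If you prefer the shell, the pool can also be removed from any cluster node with pveceph; the pool name here is the one used in this setup:

pveceph pool destroy hdd_pool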

Note that it is essential to destroy the Ceph Pool first and only then the OSDs (Object Storage Daemons) that back it. If the order is reversed, the remaining OSDs will keep rebalancing data while each OSD is destroyed, and if the cluster drops below the minimum number of disks it needs, Ceph will start reporting errors, which is troublesome to clean up.

Next, destroy the Ceph OSDs; this step is also mandatory, otherwise leftover OSD entries will still cause problems after the hard drives are removed and the node is powered back on. Destroying the OSDs that back the “hdd_pool” Ceph Pool involves three sub-steps: marking the OSD out (Out), stopping it (Down), and finally destroying it (Destroy).

Step one: Mark the OSD disk as out. Select the OSD disk you wish to take offline, and click the “Out” button in the upper right corner of the Proxmox VE cluster web management interface.
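
The same step can be done from the shell of any cluster node; “12” below is a hypothetical OSD ID standing in for the disk being retired:

ceph osd out 12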

Step two: Stop the OSD disk. Select the OSD disk that is in the “out” state, and click the “Stop” button in the upper right corner of the Proxmox VE cluster management interface.
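
On the node that hosts the OSD, the equivalent command (again with the hypothetical ID 12) is to stop its daemon:

systemctl stop ceph-osd@12.service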

To be safe, confirm whether the Ceph cluster is still rebalancing OSD data before moving on. This can be checked in the Proxmox VE cluster web management backend or by running “ceph health detail” on any cluster node. In the web graphical interface, the Ceph status should show all green under normal circumstances.
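
From the command line, these two commands give the same picture; recovery or backfill messages mean data is still being rebalanced:

ceph health detail
ceph -s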

Step three: Destroy the OSD disk. Select an OSD that is both “down” and “out”, click the “More” button in the upper right corner, and then click the “Destroy” submenu item.
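
The same can be done from the shell of the node that hosts the OSD; the ID is again hypothetical, and --cleanup additionally wipes the disk’s leftover volumes:

pveceph osd destroy 12 --cleanup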

Follow these three steps to take all of the mechanical-disk OSDs offline and destroy them. Besides the graphical method, the same work can be done from the command line, as sketched below.
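
As a rough sketch, the three sub-steps can be batched per node; the OSD IDs 0 1 2 are placeholders for the SATA-backed OSDs hosted on that node, and the commands are run from that node’s shell:

# Step one: mark the SATA-backed OSDs out (IDs are placeholders)
for id in 0 1 2; do ceph osd out "$id"; done

# Wait until Ceph has finished moving data off the out-marked OSDs
while ! ceph health | grep -q HEALTH_OK; do sleep 60; done

# Steps two and three: stop each OSD daemon, then destroy the OSD
for id in 0 1 2; do
    systemctl stop "ceph-osd@${id}.service"
    pveceph osd destroy "$id" --cleanup
done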

Shut down one physical server in the cluster, remove all of its SATA hard drives, and insert the new high-speed NVMe disks. Once the node shuts down, the virtual machines running on it will automatically migrate to other nodes.
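
Before powering a node off, it helps to check what is still running on it; a quick check from that node’s shell:

qm list              # virtual machines still present on this node
ha-manager status    # confirm HA is healthy and will relocate them
shutdown -h now      # power the node off for the disk swap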

When the server with the newly installed NVMe disks is powered back on, the new NVMe disks are recognized by Proxmox VE. Then, back in the web management backend, create an OSD on each individual disk.
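
A quick way to confirm the new disks are visible is lsblk on that node:

lsblk -d -o NAME,SIZE,MODEL | grep -i nvme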

The two newly added NVMe disks already carry partitions, so use a command like “wipefs -a /dev/nvme3n1” to wipe the old signatures.

If this step is skipped, the OSD creation dialog in the next step will complain that there are no available hard disks.
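
For example, assuming the two new disks showed up as /dev/nvme2n1 and /dev/nvme3n1 (confirm the names with lsblk first):

wipefs -a /dev/nvme2n1
wipefs -a /dev/nvme3n1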

Switch to the Proxmox VE hyper-converged cluster web management backend, select the node where the new disks were just inserted, and click the “Create OSD” button in the upper left corner; a small window for creating an OSD pops up. Select the new blank device, choose “nvme” as the device class from the dropdown list, and click the create button.
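
The same can also be done per disk with pveceph, assuming the device name below and that your pveceph version accepts setting the device class at creation time:

pveceph osd create /dev/nvme3n1 --crush-device-class nvme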

Repeat this process to create OSDs on the remaining new disks, then compare the overall capacity of the Ceph Pool with its previous size.
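
Capacity can also be checked on the command line:

ceph df           # overall and per-pool capacity
ceph osd df tree  # per-OSD usage grouped by host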

Cycle through all of the steps above to work through the cluster’s five nodes. With careful planning up front, the whole process goes very smoothly.

