Enhance vSAN Performance by enabling RDMA

Remote Direct Memory Access (RDMA) can significantly boost VMware vSAN's performance by enabling low-latency, high-throughput data transfers directly between the memory of different hosts. This reduces CPU overhead and enhances overall efficiency. By leveraging RoCE v2 (RDMA over Converged Ethernet version 2) specifically, VMware vSAN can achieve higher IOPS and lower latency, making it ideal for demanding workloads.

For VDI workloads, such as those managed by Omnissa/VMware Horizon, RDMA can significantly improve the user experience. By reducing the latency and increasing the throughput of data transfers, RDMA ensures that virtual desktops respond more quickly and smoothly, even under heavy load. This is particularly beneficial for environments with many users, as it helps maintain consistent performance and responsiveness.

However, not all hardware combinations support RDMA out of the box; specific configuration is often required. The purpose of this article is to describe the different configurations depending on the NIC hardware you are using.


To check and enable RDMA, the following prerequisites are recommended:

1. Check NIC RDMA / RoCE HW compatibility.

2. Check if your Layer 2 switches can process RDMA packets.

3. Check if RDMA is enabled in the BIOS.

4. Configure the NIC driver in ESXi to enable RoCE.

 

Check NIC RDMA / RoCE HW compatibility.

Go to the VMware Compatibility Guide at https://www.vmware.com/resources/compatibility and check if your NIC is supported. RDMA should be listed in the Additional Features section.

Example: Intel E810
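
To find out which NIC models and drivers your hosts are actually running (useful when searching the compatibility guide), you can list the adapters of each host via SSH with a standard esxcli call:

esxcli network nic list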

 

Check if your Layer 2 switches can process RDMA packets

Most enterprise-grade switches can handle RDMA traffic, but it may be necessary to enable RDMA/RoCE v2 and Priority Flow Control in the switch configuration. Here is a config example for Dell S5212 switches running OS10:

priority-flow-control on
priority-flow-control priority 3 no-drop
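
Whether DCB/PFC settings are actually visible to the host can also be checked from the ESXi side. The following is a standard esxcli call; vmnic0 is a placeholder for your uplink, and the amount of detail shown depends on the NIC and driver:

esxcli network nic dcb status get -n vmnic0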

 

Check if RDMA is enabled in the BIOS

Most NICs have their own configuration section in the server's BIOS. Check if RDMA is enabled there. Here is an example of a Dell PowerEdge R640:






Configure the NIC driver in ESXi to enable RoCE

It might be required to configure the driver of each NIC to enable RoCE v2 and Data Center Bridging (DCB) mode. All steps are performed in the ESXi console via SSH:

To view the current parameters, use the following command (replace NICdrivername with the name of your NIC driver module):
esxcli system module parameters list -m NICdrivername

Broadcom NICs:

esxcli system module parameters list -m bnxtnet
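
If you are unsure which driver module a particular uplink uses, the driver and firmware details of a single NIC can be displayed as well (vmnic0 is a placeholder for your actual uplink):

esxcli network nic get -n vmnic0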

To set the parameters, each NIC vendor uses its own command. Depending on the NIC, the parameters can vary slightly, but the following list should cover a broad part of the NIC hardware landscape.

Broadcom
esxcli system module parameters set -m bnxtnet -p "disable_roce=0 disable_dcb=0"

Intel

esxcli system module parameters set -m icen -p "RDMA=1,1"
esxcli system module parameters set -m irdman -p "ROCE=1,1"

Marvell QLOGIC

esxcli system module parameters set -m qedentv -p "enable_roce=1"

NVIDIA / Mellanox

esxcli system module parameters set -m nmlx5_core -p dcbx=3
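
Changes to driver module parameters only take effect after a host reboot. After rebooting, you can check whether ESXi exposes an RDMA device for the uplink; this is a standard esxcli call, and device names such as vmrdma0 in the output depend on your hardware:

esxcli rdma device list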

 

Enable vSAN RDMA

Finally, when all requirements are met, you can enable RDMA for vSAN in the cluster-level vSAN services configuration in the vCenter console.
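
As a cross-check from the host side, you can list the vSAN network configuration via SSH; depending on the ESXi version, the output shows the VMkernel interfaces and traffic types used by vSAN. Whether RDMA is actually in use can additionally be verified with the vSAN health checks in vCenter:

esxcli vsan network list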

 

 


