ManivelR
Hot Shot

Mellanox RDMA performance with vSAN 8

 

Hi All,

I still have a few follow-up questions about vSAN.

Can someone help me in this area?

RDMA NIC:

Mellanox Technologies ConnectX-6 Dx EN NIC; 100GbE; dual-port QSFP56; PCIe4.0 x16; (MCX623106AN-CDA)

We are using two of these 100 G NICs per host for vSAN traffic, with RDMA enabled.
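For context, a quick way to confirm the RDMA-capable uplinks and their vmkernel bindings are visible to ESXi is:

esxcli rdma device list
esxcli rdma device vmknic list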

The vSAN version is 8, and it is a 3-node cluster using OSA.

Per the vSAN health check, DCBX should be in IEEE mode and PFC should be set to priority 3. Currently the NICs report CEE mode and PFC 2.

Where do we need to set IEEE mode?

I assume PFC=3 and the lossless (no-drop) configuration are done from the switch side. Is that correct?
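For reference, one way to see what the NIC is currently negotiating (DCB mode and PFC priorities) from the ESXi side is the DCB status output; vmnic2 below is just a placeholder for whichever uplink carries vSAN:

esxcli network nic dcb status get -n vmnic2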

 

vSAN performance (network performance test under the Monitor tab): when I run it, it pushes a maximum of only about 30 Gbps on all 3 nodes. Is this a hard limit, or is some other configuration missing?

 

Note: we are using a Cisco 100 G switch for these vSAN connections.
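For context, my rough understanding of what the switch side needs is below (an NX-OS-style sketch only; the port and class names are illustrative and the exact syntax varies by Nexus model, so Cisco's RoCE configuration guide for the specific platform is the authority). The vSAN-facing ports need PFC enabled and RoCE traffic classified into the priority-3 group:

class-map type qos match-all ROCE
  match cos 3
policy-map type qos ROCE-MARKING
  class ROCE
    set qos-group 3
interface Ethernet1/1
  priority-flow-control mode on
  service-policy type qos input ROCE-MARKING

On top of that, a system-level network-qos policy with a pause (no-drop) class for CoS 3 is needed; that part is very platform-specific, so I have not sketched it here.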

 

Thanks,

Manivel R

 

 

ManivelR
Hot Shot

I found most of the answers. 

1) Before enabling IEEE mode and setting PFC to 3 from the ESXi command line, we need to install these two VIBs on each ESXi host (install sketch after the list):

Mellanox Firmware Tools (MFT)

NVIDIA Mellanox software tools (NMST)
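A minimal install sketch, assuming the two offline bundles have already been copied to the host (file names are placeholders; use whichever MFT/NMST build matches your ESXi release), followed by a reboot:

esxcli software vib install -d /tmp/MFT-offline_bundle.zip
esxcli software vib install -d /tmp/NMST-offline_bundle.zip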

Then we need to run four commands on ESXi. They are documented in a Reddit post; search for vSAN with RDMA there.

I ran those commands and rebooted the ESXi servers.

PFC is now set to 3 successfully.

However, the RDMA DCBX mode is still showing as unknown instead of IEEE.

I will need to check with the vendors.

TravisHTX
Contributor

Hi Manivel,

 

I just went through the same process of getting IEEE mode working on Mellanox dual port 100 Gb adapters. I am using Arista switches though.

The default firmware config on my Mellanox adapters already had DCBX_IEEE enabled, so this was the command I needed:

/opt/mellanox/bin/mlxconfig -d mt4119_pciconf0 set LLDP_NB_DCBX_P1=1 LLDP_NB_RX_MODE_P1=2 LLDP_NB_TX_MODE_P1=2 LLDP_NB_DCBX_P2=1 LLDP_NB_RX_MODE_P2=2 LLDP_NB_TX_MODE_P2=2 DCBX_CEE_P1=0 DCBX_CEE_P2=0
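In case it helps anyone following along: mt4119_pciconf0 is specific to my adapters, and querying the same device after the reboot (the grep pattern is just to narrow the output) is an easy way to confirm the new values stuck:

/opt/mellanox/bin/mlxconfig -d mt4119_pciconf0 query | grep -E "DCBX|LLDP_NB"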

 

Here are the driver parameters I used:

esxcli system module parameters set -m nmlx5_core -p "RSS=8 GEN_RSS=3 DRSS=4 dcbx=1 pfctx=8 pfcrx=8 trust_state=2"
esxcli system module parameters set -m nmlx5_rdma -p "dscp_force=26"
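As I understand it, pfctx=8 and pfcrx=8 are 8-bit priority bitmasks (0x08 = bit 3), which is what puts PFC on priority 3; trust_state=2 sets DSCP trust, and dscp_force=26 also maps to priority 3 (26 >> 3 = 3). The values can be read back with:

esxcli system module parameters list -m nmlx5_core
esxcli system module parameters list -m nmlx5_rdma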

 

I followed the steps here for the switch/port configs:

https://enterprise-support.nvidia.com/s/article/roce-configuration-for-arista-switches

 

 

Something that might help with performance: change these advanced settings on the hosts (esxcli sketch below):

Net.TcpipRxDispatchQueues = 8
Migrate.VMotionStreamHelpers = 8
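If you prefer the command line over the Advanced System Settings UI, a quick sketch for setting both on each host:

esxcli system settings advanced set -o /Net/TcpipRxDispatchQueues -i 8
esxcli system settings advanced set -o /Migrate/VMotionStreamHelpers -i 8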

 

Once both the switch and host configs were done, I rebooted and voilà, IEEE mode.

 

Hopefully that helps.