Johnson2g
Contributor

RDMA mode shows invalid in vSAN Skyline Health check


Hello there,

I am wondering whether the RDMA health check fails because my switch lacks ETS support, or for some other reason.

 

I followed this guide to set up RDMA on my vSAN cluster:

https://www.reddit.com/r/vmware/comments/ozhq6j/vsan_rdma_with_mellanox_nic/

I configured the NICs for IEEE mode. However, the vSAN Skyline Health check shows "RDMA mode: invalid. Issues: The RDMA mode is not configured as IEEE."


It is a four-node environment.

The switch is a Mellanox SN2700 running SONiC.

The NIC is a Mellanox MCX455A.

 

The commands I ran are as follows:

/opt/mellanox/bin/mlxconfig -d mt4115_pciconf0 set LLDP_NB_DCBX_P1=1 LLDP_NB_RX_MODE_P1=2 LLDP_NB_TX_MODE_P1=2 DCBX_WILLING_P1=1 DCBX_IEEE_P1=1 DCBX_CEE_P1=0

esxcli system module parameters set -m nmlx5_core -p dcbx=1

esxcli system module parameters set -m nmlx5_core -p "pfctx=0x08 pfcrx=0x08 trust_state=2 max_vfs=4"

esxcli system module parameters set -m nmlx5_rdma -p "pcp_force=3 dscp_force=26"

On the switch, I ran 'sudo config qos reload' to enable PFC on priorities 3 and 4.
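
A note on the pfctx/pfcrx values above: as far as I understand the nmlx5_core driver, these are per-priority bitmasks, so 0x08 enables PFC on priority 3 only, which lines up with pcp_force=3. The sketch below just shows that arithmetic; the helper names are made up for illustration and are not part of any Mellanox or VMware tool.

```python
def pfc_mask_to_priorities(mask: int) -> list[int]:
    """Return the 802.1p priorities (0-7) whose bit is set in a PFC bitmask."""
    return [prio for prio in range(8) if mask & (1 << prio)]


def priorities_to_pfc_mask(priorities: list[int]) -> int:
    """Build the bitmask that would enable PFC on the given priorities."""
    mask = 0
    for prio in priorities:
        if not 0 <= prio <= 7:
            raise ValueError(f"priority out of range: {prio}")
        mask |= 1 << prio
    return mask


if __name__ == "__main__":
    # Host side, as set in the post: pfctx=0x08 pfcrx=0x08
    print(pfc_mask_to_priorities(0x08))         # [3]
    # Mask that would cover both priorities the switch is said to use
    print(hex(priorities_to_pfc_mask([3, 4])))  # 0x18
```

Whether the host mask needs to cover priority 4 as well depends on which priority the RDMA traffic actually uses; with pcp_force=3, a mask of 0x08 already covers it.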

 

I then ran 'esxcli network nic dcb status get -n vmnic4', and the result is as follows:

Nic Name: vmnic4
Mode: 0 - Unknown
Enabled: true
Capabilities:
Priority Group: true
Priority Flow Control: true
PG Traffic Classes: 8
PFC Traffic Classes: 8
PFC Enabled: true
PFC Configuration: 0 0 0 1 0 0 0 0
IEEE ETS Configuration:
Willing Bit In ETS Config TLV: 1
Supported Capacity: 8
Credit Based Shaper ETS Algorithm Supported: 0x0
TX Bandwidth Per TC: 13 13 13 13 12 12 12 12
RX Bandwidth Per TC: 13 13 13 13 12 12 12 12
TSA Assignment Table Per TC: 2 2 2 2 2 2 2 2
Priority Assignment Per TC: 1 0 2 3 4 5 6 7
Recommended TC Bandwidth Per TC: 13 13 13 13 12 12 12 12
Recommended TSA Assignment Per TC: 2 2 2 2 2 2 2 2
Recommended Priority Assignment Per TC: 1 0 2 3 4 5 6 7
IEEE PFC Configuration:
Number Of Traffic Classes: 8
PFC Configuration: 0 0 0 0 0 0 0 0
Macsec Bypass Capability Is Enabled: 0
Round Trip Propagation Delay Of Link: 0
Sent PFC Frames: 0 0 0 4 0 0 0 0
Received PFC Frames: 0 0 0 0 0 0 0 0
DCB Apps:
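
Two things stand out in this output: the mode is reported as "0 - Unknown", and the PFC Configuration under "IEEE PFC Configuration" is all zeros even though the earlier PFC Configuration line shows priority 3 enabled. Below is a minimal Python sketch that pulls those two fields out of a saved copy of the output. It assumes the flattened "Key: value" layout pasted here, treats any line with an empty value as a section header, and assumes a healthy host reports a mode string containing "IEEE"; the function names are hypothetical.

```python
SAMPLE = """\
Nic Name: vmnic4
Mode: 0 - Unknown
Enabled: true
Capabilities:
Priority Flow Control: true
PFC Enabled: true
PFC Configuration: 0 0 0 1 0 0 0 0
IEEE PFC Configuration:
Number Of Traffic Classes: 8
PFC Configuration: 0 0 0 0 0 0 0 0
Sent PFC Frames: 0 0 0 4 0 0 0 0
DCB Apps:
"""


def parse_dcb_status(text: str) -> dict[str, dict[str, str]]:
    """Group 'Key: value' lines under the most recent 'Section:' header."""
    sections: dict[str, dict[str, str]] = {"": {}}
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not value:
            # A line such as "IEEE PFC Configuration:" starts a new section.
            current = key
            sections.setdefault(current, {})
        else:
            sections[current][key] = value
    return sections


def find_issues(sections: dict[str, dict[str, str]]) -> list[str]:
    """Flag the two fields that look wrong in the output from the post."""
    issues = []
    mode = sections[""].get("Mode", "")
    if "IEEE" not in mode:
        issues.append(f"DCB mode is reported as {mode!r}, not IEEE")
    ieee_pfc = sections.get("IEEE PFC Configuration", {}).get("PFC Configuration", "")
    if ieee_pfc and "1" not in ieee_pfc.split():
        issues.append("IEEE PFC Configuration enables no priority")
    return issues


if __name__ == "__main__":
    for issue in find_issues(parse_dcb_status(SAMPLE)):
        print(issue)
```

This only reports what the NIC driver exposes; it does not show whether the health check reads the same fields.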

 

Running '/opt/mellanox/bin/mlxconfig query' shows the following NIC configuration:


Configurations: Next Boot
MEMIC_BAR_SIZE 0
MEMIC_SIZE_LIMIT _256KB(1)
FLEX_PARSER_PROFILE_ENABLE 0
FLEX_IPV4_OVER_VXLAN_PORT 0
ROCE_NEXT_PROTOCOL 254
NON_PREFETCHABLE_PF_BAR False(0)
VF_VPD_ENABLE False(0)
STRICT_VF_MSIX_NUM False(0)
VF_NODNIC_ENABLE False(0)
NUM_OF_VFS 1
FPP_EN True(1)
SRIOV_EN False(0)
PF_LOG_BAR_SIZE 5
VF_LOG_BAR_SIZE 1
NUM_PF_MSIX 63
NUM_VF_MSIX 11
INT_LOG_MAX_PAYLOAD_SIZE AUTOMATIC(0)
PCIE_CREDIT_TOKEN_TIMEOUT 0
PARTIAL_RESET_EN False(0)
SW_RECOVERY_ON_ERRORS False(0)
RESET_WITH_HOST_ON_ERRORS False(0)
CQE_COMPRESSION BALANCED(0)
IP_OVER_VXLAN_EN False(0)
MKEY_BY_NAME False(0)
UCTX_EN True(1)
PCI_ATOMIC_MODE PCI_ATOMIC_DISABLED_EXT_ATOMIC_ENABLED(0)
TUNNEL_ECN_COPY_DISABLE False(0)
LRO_LOG_TIMEOUT0 6
LRO_LOG_TIMEOUT1 7
LRO_LOG_TIMEOUT2 8
LRO_LOG_TIMEOUT3 13
LOG_DCR_HASH_TABLE_SIZE 14
DCR_LIFO_SIZE 16384
LINK_TYPE_P1 ETH(2)
ROCE_CC_PRIO_MASK_P1 255
ROCE_CC_ALGORITHM_P1 ECN(0)
CLAMP_TGT_RATE_AFTER_TIME_INC_P1 True(1)
CLAMP_TGT_RATE_P1 False(0)
RPG_TIME_RESET_P1 300
RPG_BYTE_RESET_P1 32767
RPG_THRESHOLD_P1 1
RPG_MAX_RATE_P1 0
RPG_AI_RATE_P1 5
RPG_HAI_RATE_P1 50
RPG_GD_P1 11
RPG_MIN_DEC_FAC_P1 50
RPG_MIN_RATE_P1 1
RATE_TO_SET_ON_FIRST_CNP_P1 0
DCE_TCP_G_P1 1019
DCE_TCP_RTT_P1 1
RATE_REDUCE_MONITOR_PERIOD_P1 4
INITIAL_ALPHA_VALUE_P1 1023
MIN_TIME_BETWEEN_CNPS_P1 0
CNP_802P_PRIO_P1 6
CNP_DSCP_P1 48
LLDP_NB_DCBX_P1 True(1)
LLDP_NB_RX_MODE_P1 ALL(2)
LLDP_NB_TX_MODE_P1 ALL(2)
DCBX_IEEE_P1 True(1)
DCBX_CEE_P1 False(0)
DCBX_WILLING_P1 True(1)
KEEP_ETH_LINK_UP_P1 True(1)
KEEP_IB_LINK_UP_P1 False(0)
KEEP_LINK_UP_ON_BOOT_P1 False(0)
KEEP_LINK_UP_ON_STANDBY_P1 False(0)
DO_NOT_CLEAR_PORT_STATS_P1 False(0)
AUTO_POWER_SAVE_LINK_DOWN_P1 False(0)
NUM_OF_VL_P1 _4_VLs(3)
NUM_OF_TC_P1 _8_TCs(0)
NUM_OF_PFC_P1 8
DUP_MAC_ACTION_P1 LAST_CFG(0)
SRIOV_IB_ROUTING_MODE_P1 LID(1)
IB_ROUTING_MODE_P1 LID(1)
PCI_WR_ORDERING per_mkey(0)
MULTI_PORT_VHCA_EN False(0)
PORT_OWNER True(1)
ALLOW_RD_COUNTERS True(1)
RENEG_ON_CHANGE True(1)
TRACER_ENABLE True(1)
IP_VER IPv4(0)
BOOT_UNDI_NETWORK_WAIT 0
UEFI_HII_EN False(0)
BOOT_DBG_LOG False(0)
UEFI_LOGS DISABLED(0)
BOOT_VLAN 1
LEGACY_BOOT_PROTOCOL PXE(1)
BOOT_INTERRUPT_DIS False(0)
BOOT_LACP_DIS False(0)
BOOT_VLAN_EN False(0)
BOOT_PKEY 0
DYNAMIC_VF_MSIX_TABLE False(0)
EXP_ROM_UEFI_ARM_ENABLE False(0)
EXP_ROM_UEFI_x86_ENABLE False(0)
EXP_ROM_PXE_ENABLE True(1)
ADVANCED_PCI_SETTINGS False(0)
SAFE_MODE_THRESHOLD 10
SAFE_MODE_ENABLE True(1)
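
To check a long query dump like this against the mlxconfig set command from earlier in the post, a short Python sketch can help. It assumes each setting is printed as "NAME Label(number)" or "NAME number", as in the paste above; EXPECTED simply repeats the values from the set command, and the function names are hypothetical.

```python
import re

# Values taken from the mlxconfig set command in the post.
EXPECTED = {
    "LLDP_NB_DCBX_P1": 1,
    "LLDP_NB_RX_MODE_P1": 2,
    "LLDP_NB_TX_MODE_P1": 2,
    "DCBX_WILLING_P1": 1,
    "DCBX_IEEE_P1": 1,
    "DCBX_CEE_P1": 0,
}

SAMPLE = """\
Configurations: Next Boot
ROCE_NEXT_PROTOCOL 254
LLDP_NB_DCBX_P1 True(1)
LLDP_NB_RX_MODE_P1 ALL(2)
LLDP_NB_TX_MODE_P1 ALL(2)
DCBX_IEEE_P1 True(1)
DCBX_CEE_P1 False(0)
DCBX_WILLING_P1 True(1)
"""

# Matches "NAME Label(3)" (captures 3) or "NAME 254" (captures 254).
LINE_RE = re.compile(r"^\s*(\w+)\s+(?:\S*\((\d+)\)|(\d+))\s*$")


def parse_mlxconfig(text: str) -> dict[str, int]:
    """Map each setting name to its numeric value; skip non-matching lines."""
    values = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            values[match.group(1)] = int(match.group(2) or match.group(3))
    return values


def mismatches(values: dict[str, int], expected: dict[str, int]) -> dict:
    """Return {name: (found, wanted)} for every setting that differs."""
    return {
        name: (values.get(name), wanted)
        for name, wanted in expected.items()
        if values.get(name) != wanted
    }


if __name__ == "__main__":
    print(mismatches(parse_mlxconfig(SAMPLE), EXPECTED))
```

For the values pasted above, the six DCBX and LLDP settings match the set command. Note that the column is labelled "Next Boot", so these are the values the firmware will apply after a reboot, not necessarily what is active right now.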


Thanks

 

5 Replies
HUNTER0125
Contributor

Hello, has the problem been solved?
doubled83
Contributor

Hello Johnson2g,

I have the same issue. Have you solved it? If so, what did you do?

THX

TryggveKnutsson
Contributor

Hello!

Same problem here. It started after the latest vCenter 7.0.3 update (build 20845200).

"esxcli network nic dcb status get -n vmnicX" shows IEEE mode.

ManivelR
Enthusiast

Hi All,

 

I'm also having the same issue.

 

Has anyone fixed this issue?

JasonNash
Enthusiast

Same here. What are the correct commands to set the RDMA mode to IEEE on the ConnectX-5 adapters? I have been able to enable and set the PFC value to 3 but not amend the mode. I still have the Skyline Health warning "The RDMA mode is not configured as IEEE."

 

We have two vSAN clusters, both with the same adapters. We previously enabled RDMA on the other cluster without issue, and I don't recall having to change the mode there, so these adapters may have later firmware.
