Counterdoc
Contributor

Unable to connect ESXi host in vCenter after adding vmk for iSCSI

Hi!

In my company we have 3 ESXi hosts running with a vCenter 6.7 that were configured some time ago. We found out that the configuration does not follow "best practice" for the iSCSI connection (separate management and storage traffic), so we wanted to get this right.

To test the new best-practice setup, we changed the configuration on only one host. The old configuration used one vmnic, one vmk and one vSwitch for both management and iSCSI storage traffic.

Now we have the following configuration:

vmnic0 -> vmk0 (10.176.20.22, Services: Management) -> vSwitch0
vmnic1 -> vmk1 (10.176.21.122, Services: none checked!) -> vSwitch1

vCenter IP 10.176.21.9

So there are two physical uplinks with different vmks and different vSwitches. vmk0 and vmk1 are also on different networks (vmk0: 10.176.20.0/24, vmk1: 10.176.21.0/24). vmk1 is bound to a newly created port group called iSCSI-0.

I already configured the Software iSCSI Adapter to use this new vmk, and it is working fine. The problem is that the ESXi host is now shown as "disconnected" in vCenter, and reconnecting it fails. I can ping the host from vCenter without issues. I tried connecting the host by both hostname and IP.

But now comes the odd part: when I try to add the host in vCenter with the IP address of the new vmk1, it is found instantly.
When I try to add an arbitrary wrong address, I get an instant error message (cannot contact host xxx.xxx.xxx.xxx). So the host does seem to be reachable from vCenter, and, as I said, I can ping it from there.

I already restarted the management network on the host, restarted the host itself, and checked the vCenter licensing just in case.

I would really appreciate any ideas on how to get rid of this problem; I am stuck at the moment. It seems that vmk1 is recognized as the management interface instead of vmk0, even though I only enabled the Management service checkbox on vmk0.

Some details from the current configuration:

[root@esxi-2:~] esxcli network ip interface ipv4 get
Name  IPv4 Address   IPv4 Netmask   IPv4 Broadcast  Address Type  Gateway      DHCP DNS
----  -------------  -------------  --------------  ------------  -----------  --------
vmk1  10.176.21.122  255.255.255.0  10.176.21.255   STATIC        0.0.0.0      false
vmk0  10.176.20.22   255.255.255.0  10.176.20.255   STATIC        10.176.20.1  false

[root@esxi-2:~] esxcli network ip route ipv4 list
Network      Netmask        Gateway      Interface  Source
-----------  -------------  -----------  ---------  ------
default      0.0.0.0        10.176.20.1  vmk0       MANUAL
10.176.20.0  255.255.255.0  0.0.0.0      vmk0       MANUAL
10.176.21.0  255.255.255.0  0.0.0.0      vmk1       MANUAL

[root@esxi-2:~] esxcli network vswitch standard list
vSwitch0
   Name: vSwitch0
   Class: cswitch
   Num Ports: 7936
   Used Ports: 17
   Configured Ports: 128
   MTU: 1500
   CDP Status: listen
   Beacon Enabled: false
   Beacon Interval: 1
   Beacon Threshold: 3
   Beacon Required By:
   Uplinks: vmnic0
   Portgroups: VM Network, LAN_SRV, Management Network

vSwitch1
   Name: vSwitch1
   Class: cswitch
   Num Ports: 7936
   Used Ports: 4
   Configured Ports: 1024
   MTU: 1500
   CDP Status: listen
   Beacon Enabled: false
   Beacon Interval: 1
   Beacon Threshold: 3
   Beacon Required By:
   Uplinks: vmnic1
   Portgroups: iSCSI-0

[root@esxi-2:~] esxcli network ip interface list
vmk1
   Name: vmk1
   MAC Address: 00:50:56:6f:91:62
   Enabled: true
   Portset: vSwitch1
   Portgroup: iSCSI-0
   Netstack Instance: defaultTcpipStack
   VDS Name: N/A
   VDS UUID: N/A
   VDS Port: N/A
   VDS Connection: -1
   Opaque Network ID: N/A
   Opaque Network Type: N/A
   External ID: N/A
   MTU: 1500
   TSO MSS: 65535
   RXDispQueue Size: 1
   Port ID: 50331652

vmk0
   Name: vmk0
   MAC Address: ac:1f:6b:45:47:c8
   Enabled: true
   Portset: vSwitch0
   Portgroup: Management Network
   Netstack Instance: defaultTcpipStack
   VDS Name: N/A
   VDS UUID: N/A
   VDS Port: N/A
   VDS Connection: -1
   Opaque Network ID: N/A
   Opaque Network Type: N/A
   External ID: N/A
   MTU: 1500
   TSO MSS: 65535
   RXDispQueue Size: 1
   Port ID: 33554437

Thank you!

Marius

MikeStoica
Expert

Your management vmk (vmk0) is on a different subnet than vCenter.

You have vmk0 on 10.176.20.22/24 and vCenter on 10.176.21.9/24.

Counterdoc
Contributor

Okay, but this had been working for years before I added the new vmk1.

So does this mean that as soon as I add any vmk in the same subnet that vCenter uses, it is automatically preferred and used as the management interface, even though it is not tagged for the Management service?
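The behavior asked about here can be illustrated with a minimal sketch. It assumes that on the ESXi default TCP/IP stack, outgoing traffic picks its vmk by longest-prefix match on the routing table. This is how VMkernel multihoming is commonly described. It also assumes the Management service tag plays no part in that choice. The routes are copied from the esxcli network ip route ipv4 list output in the original post. The function name egress_interface is hypothetical, and this is a simplified model, not ESXi's actual implementation.

```python
import ipaddress

# Simplified model of esxi-2's IPv4 routing table, copied from the
# "esxcli network ip route ipv4 list" output in the original post.
# Each entry: (destination network, gateway, vmkernel interface).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "10.176.20.1", "vmk0"),   # default route
    (ipaddress.ip_network("10.176.20.0/24"), "0.0.0.0", "vmk0"),  # directly connected
    (ipaddress.ip_network("10.176.21.0/24"), "0.0.0.0", "vmk1"),  # directly connected
]


def egress_interface(dest: str) -> tuple[str, str]:
    """Return (interface, gateway) for dest using longest-prefix match.

    Nothing here looks at which vmk carries the Management tag:
    only the destination address and the route prefixes matter.
    """
    addr = ipaddress.ip_address(dest)
    matches = [route for route in ROUTES if addr in route[0]]
    # The default route (0.0.0.0/0) matches everything, so matches is never
    # empty; the most specific prefix wins.
    _network, gateway, iface = max(matches, key=lambda route: route[0].prefixlen)
    return iface, gateway


if __name__ == "__main__":
    print(egress_interface("10.176.21.9"))  # vCenter -> ('vmk1', '0.0.0.0')
    print(egress_interface("10.176.20.1"))  # vmk0 gateway -> ('vmk0', '0.0.0.0')
```

In this model, the directly connected 10.176.21.0/24 route on vmk1 is more specific than the default route on vmk0. Traffic addressed to vCenter (10.176.21.9) would therefore leave through vmk1, even though vCenter is talking to vmk0.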

MikeStoica
Expert

Do you have routing configured between the two subnets? From the ESXi host, can you ping the vCenter IP through the vmk0 interface?

Lalegre
Virtuoso

The error is entirely a connectivity problem; test ping connectivity from both sides. You will probably see that the packets are not reaching the destination.

For a deeper test, run curl -v telnet://hostIP:443 from the vCenter Server and nc -zv vcenterip 443 from the ESXi host.

This checks connectivity to port 443 in both directions; if the result is "Host not reachable", you clearly have a connection issue.
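The port checks suggested above can also be scripted. The following is a minimal Python sketch, roughly equivalent to nc -zv host port. It only reports whether a TCP handshake completes within a timeout, not whether a service actually answers on the port. The function name tcp_port_open and the 3-second default timeout are assumptions for illustration.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout.

    Roughly what "nc -zv host port" reports: it says nothing about whether
    the service behind the port actually answers.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, no route to host, ...
        return False
```

For example, tcp_port_open("10.176.20.22", 443), run from the vCenter appliance, would mirror the curl -v telnet:// test against the host's vmk0 address.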

Counterdoc
Contributor

Yes, routing is working.

This is a cluster of three ESXi hosts, and I only configured one of them with the additional vmk1. The other two hosts still have their old configuration, in the same subnet as vmk0 of the third ESXi host.

I can ping in both directions (vmk0 -> vCenter -> vmk0).

[root@esxi-2:~] esxcli network diag ping -H 10.176.21.9 -I vmk0
   Trace:
         Received Bytes: 64
         Host: 10.176.21.9
         ICMP Seq: 0
         TTL: 63
         Round-trip Time: 525 us
         Dup: false
         Detail:

         Received Bytes: 64
         Host: 10.176.21.9
         ICMP Seq: 1
         TTL: 63
         Round-trip Time: 461 us
         Dup: false
         Detail:

         Received Bytes: 64
         Host: 10.176.21.9
         ICMP Seq: 2
         TTL: 63
         Round-trip Time: 354 us
         Dup: false
         Detail:

   Summary:
         Host Addr: 10.176.21.9
         Transmitted: 3
         Received: 3
         Duplicated: 0
         Packet Lost: 0
         Round-trip Min: 354 us
         Round-trip Avg: 446 us
         Round-trip Max: 525 us

Second test with telnet and netcat:

root@vcenter [ ~ ]# curl -v telnet://10.176.20.22:443
* Rebuilt URL to: telnet://10.176.20.22:443/
*   Trying 10.176.20.22...
* TCP_NODELAY set
* Connected to 10.176.20.22 (10.176.20.22) port 443 (#0)

root@vcenter [ ~ ]# nc -zv 10.176.21.9 443
10.176.21.9 443 (https) open

Lalegre
Virtuoso

What about the new VMkernel adapter: is it also tagged for Management? The iSCSI one should not be tagged for Management.

Counterdoc
Contributor

The new vmk1 does not have the Management service enabled.

Lalegre
Virtuoso

Try to add the ESXi host again, pointing to the vmk0 IP, and paste the errors found in vpxd.log.

Counterdoc
Contributor

This is a tail of vpxd.log after trying to add the host with the vmk0 IP in vCenter.

Lalegre
Virtuoso

Please paste the last lines of the log so we can help you from here. The file seems to be damaged.

Counterdoc
Contributor

Edited by moderator: The log attached previously is readable, and is preferable to pasting long log dumps - thanks for using the attach function on your previous post.

Lalegre
Virtuoso

I can see different errors in your logs:

  • "Host not compatible with version 6.7.2"
  • "Connection refused with SSL errors"
  • "Component Manager service not running"

Did you update the ESXi host?

Try running services.sh restart on the ESXi host.

Review the status of the Component Manager service in vCenter: service-control --status vmware-cm

Counterdoc
Contributor

There was no update to either the ESXi host or vCenter. By the way, I can connect the host using the vmk1 address, so this should not be a version issue.

root@vcenter [ /var/log/vmware/vpxd ]# service-control --status vmware-cm
Running:
vmware-cm

[root@esxi-2:~] /sbin/services.sh restart
Errors:
Invalid operation requested: This ruleset is required and connot be disabled

No change so far. I still cannot add the host with the vmk0 IP.

Counterdoc
Contributor

Today I ran a test against port 902.

root@vcenter [ ~ ]# curl -v 10.176.20.22:902
* Rebuilt URL to: 10.176.20.22:902/
*   Trying 10.176.20.22...
* TCP_NODELAY set
* Connected to 10.176.20.22 (10.176.20.22) port 902 (#0)
> GET / HTTP/1.1
> Host: 10.176.20.22:902
> User-Agent: curl/7.59.0
> Accept: */*
>
^C

root@vcenter [ ~ ]# curl -v 10.176.21.122:902
* Rebuilt URL to: 10.176.21.122:902/
*   Trying 10.176.21.122...
* TCP_NODELAY set
* Connected to 10.176.21.122 (10.176.21.122) port 902 (#0)
> GET / HTTP/1.1
> Host: 10.176.21.122:902
> User-Agent: curl/7.59.0
> Accept: */*
>
220 VMware Authentication Daemon Version 1.10: SSL Required, ServerDaemonProtocol:SOAP, MKSDisplayProtocol:VNC , VMXARGS supported, NFCSSL supported/t
* Connection #0 to host 10.176.21.122 left intact

Only vmk1 responds with the VMware Authentication Daemon banner. Port 902 on vmk0 accepts the connection, but no service seems to respond.
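The distinction drawn here is between a port that accepts a connection and a service that actually greets the client. It can be checked by reading the first line the server sends. The sketch below assumes, based on the curl output above, that authd on port 902 sends a line starting with "220 VMware Authentication Daemon" right after the connection opens. read_banner and is_vmware_authd are hypothetical helper names, and the timeout values are arbitrary.

```python
import socket
from typing import Optional


def read_banner(host: str, port: int = 902, timeout: float = 5.0) -> Optional[str]:
    """Connect to host:port and return the first line the server sends.

    Returns None if the connection fails, or if the port accepts the
    connection but no greeting arrives before the timeout.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = b""
            # Read until the end of the first line, the peer closes, or 4 KiB.
            while b"\n" not in data and len(data) < 4096:
                chunk = sock.recv(1024)
                if not chunk:
                    break
                data += chunk
    except OSError:  # refused, unreachable, or timed out waiting for data
        return None
    first_line = data.split(b"\n", 1)[0]
    text = first_line.decode("ascii", errors="replace").strip()
    return text or None


def is_vmware_authd(banner: Optional[str]) -> bool:
    """True if the greeting looks like the authd banner seen on vmk1."""
    return banner is not None and banner.startswith("220 VMware Authentication Daemon")
```

Suppose it is run from vCenter against both vmk addresses. An authd banner for 10.176.21.122 and None for 10.176.20.22 would reproduce the observation above: the handshake on vmk0 succeeds, but no greeting comes back.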

gibarra94
Enthusiast

I think you may have a duplicate IP on another VM or host. Have you tried disabling vmk1 (iSCSI) to see if the problem is solved? That way we can confirm whether this vmk is the one causing the issue.

Counterdoc
Contributor

I just talked to our networking and ESXi specialist from another department, and he said this is related to subnetting: vCenter is in the same subnet as vmk1, but vmk0 is in a different one. He said the solution is to put the iSCSI traffic in a dedicated subnet.

I will hopefully do it this week and will let you know.

MikeStoica
Expert

It's what I already told you in my previous comment.

Counterdoc
Contributor

You said that my management vmnic is on a different subnet than vCenter, yes. But I did not know that this was a problem; it was working before, when no other vmnic was active.

I also asked about that:

So does this mean that as soon as I add any vmk in the same subnet that vCenter uses, it is automatically preferred and used as the management interface, even though it is not tagged for the Management service?

I will try to fix it this evening.

Counterdoc
Contributor

Okay, it is fixed now. I was able to re-connect the host to the vCenter. Thanks for all your input and help guys!

The final solution was to put vmk1 and the NAS into a dedicated subnet.

vmk0 (10.176.20.22/24) is still in a different subnet than vCenter (10.176.21.9/24), but it works fine. Reconnecting the ESXi host to vCenter worked instantly after changing the subnet for vmk1 and the NAS.
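The rule of thumb that came out of this thread is that no non-management vmk should sit in vCenter's subnet. It can be expressed as a small configuration check. This is a sketch under the thread's own observations, not an official VMware validation. The Vmk record and the conflicting_vmks function are hypothetical, and so is the 192.168.50.0/24 iSCSI subnet in AFTER, because the thread does not say which dedicated subnet was used.

```python
import ipaddress
from typing import NamedTuple


class Vmk(NamedTuple):
    name: str
    cidr: str         # interface address with prefix length, e.g. "10.176.20.22/24"
    management: bool  # whether the Management service is enabled on this vmk


def conflicting_vmks(vmks: list[Vmk], vcenter_ip: str) -> list[str]:
    """Return names of non-management vmks whose subnet contains vcenter_ip.

    Such a vmk gives the host a directly connected route to vCenter, which is
    more specific than the default route through the management vmk.
    """
    vcenter = ipaddress.ip_address(vcenter_ip)
    return [
        vmk.name
        for vmk in vmks
        if not vmk.management and vcenter in ipaddress.ip_interface(vmk.cidr).network
    ]


# Layout from this thread before the fix.
BEFORE = [Vmk("vmk0", "10.176.20.22/24", True), Vmk("vmk1", "10.176.21.122/24", False)]
# After the fix; 192.168.50.0/24 is a hypothetical iSCSI subnet, not from the thread.
AFTER = [Vmk("vmk0", "10.176.20.22/24", True), Vmk("vmk1", "192.168.50.10/24", False)]
```

With vCenter at 10.176.21.9, conflicting_vmks(BEFORE, "10.176.21.9") flags vmk1, while the AFTER layout returns an empty list. The check deliberately allows the management vmk to sit in a different subnet than vCenter, since that setup worked in this thread once vmk1 was moved.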

In the future I will change the whole setup and use physically dedicated switches for iSCSI only.
