Hi!
In my company we have three ESXi hosts managed by a vCenter 6.7 that were configured some time ago. We found out that the iSCSI configuration does not follow best practice, which calls for separating management and storage traffic, so we wanted to get this right.
To test the new best-practice setup we changed the config on only one host. The old config used one vmnic, one vmk and one vSwitch for both management and iSCSI storage traffic.
Now we have the following configuration:
vmnic0 -> vmk0 (10.176.20.22, Services: Management) -> vSwitch0
vmnic1 -> vmk1 (10.176.21.122, Services: none checked!) -> vSwitch1
vCenter IP 10.176.21.9
So there are two physical uplinks with different vmk adapters and different vSwitches. The networks of vmk0 and vmk1 are also different (vmk0 - 10.176.20.0/24, vmk1 - 10.176.21.0/24). vmk1 is bound to a newly created port group called iSCSI-0.
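The addressing above can be checked quickly for overlap with the vCenter network. The following is a minimal sketch using the addresses from this post; the helper name `shares_subnet` is made up for illustration.

```python
import ipaddress

# Addresses as given in the post; /24 follows from the stated networks.
vcenter = ipaddress.ip_address("10.176.21.9")
vmk_interfaces = {
    "vmk0": ipaddress.ip_interface("10.176.20.22/24"),
    "vmk1": ipaddress.ip_interface("10.176.21.122/24"),
}


def shares_subnet(interface, address):
    """Return True if address lies inside the interface's connected network."""
    return address in interface.network


for name, iface in vmk_interfaces.items():
    # Prints the interface name, its network, and whether vCenter is inside it.
    print(name, iface.network, shares_subnet(iface, vcenter))
```

With these values only vmk1, the iSCSI interface, sits in the same /24 as vCenter.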
I already configured the Software iSCSI Adapter to use this new vmk and it is working fine. The problem we now have is that the ESXi host is shown as "disconnected" in vCenter. When I try to connect it, it fails. I can ping the host from the vCenter without issues. I tried to connect the host by hostname and also by IP.
But now comes the odd part: when I try to add the host in vCenter with the IP address of the new vmk1 it gets found instantly.
When I try to add just any wrong address I get an instant error message (cannot contact host xxx.xxx.xxx.xxx). So it seems like the host is reachable from vCenter and as I said I can ping it from there.
I already restarted the management network on the host, restarted the host itself, checked the vCenter licensing just in case.
I really appreciate any ideas how to get rid of this problem. I am stuck at the moment. It seems that vmk1 gets recognized as the management interface instead of vmk0, though I only set the management service checkbox on vmk0.
Some details from the current configuration:
[root@esxi-2:~] esxcli network ip interface ipv4 get
Name  IPv4 Address   IPv4 Netmask   IPv4 Broadcast  Address Type  Gateway      DHCP DNS
----  -------------  -------------  --------------  ------------  -----------  --------
vmk1  10.176.21.122  255.255.255.0  10.176.21.255   STATIC        0.0.0.0         false
vmk0  10.176.20.22   255.255.255.0  10.176.20.255   STATIC        10.176.20.1     false
[root@esxi-2:~] esxcli network ip route ipv4 list
Network      Netmask        Gateway      Interface  Source
-----------  -------------  -----------  ---------  ------
default      0.0.0.0        10.176.20.1  vmk0       MANUAL
10.176.20.0  255.255.255.0  0.0.0.0      vmk0       MANUAL
10.176.21.0  255.255.255.0  0.0.0.0      vmk1       MANUAL
[root@esxi-2:~] esxcli network vswitch standard list
vSwitch0
   Name: vSwitch0
   Class: cswitch
   Num Ports: 7936
   Used Ports: 17
   Configured Ports: 128
   MTU: 1500
   CDP Status: listen
   Beacon Enabled: false
   Beacon Interval: 1
   Beacon Threshold: 3
   Beacon Required By:
   Uplinks: vmnic0
   Portgroups: VM Network, LAN_SRV, Management Network
vSwitch1
   Name: vSwitch1
   Class: cswitch
   Num Ports: 7936
   Used Ports: 4
   Configured Ports: 1024
   MTU: 1500
   CDP Status: listen
   Beacon Enabled: false
   Beacon Interval: 1
   Beacon Threshold: 3
   Beacon Required By:
   Uplinks: vmnic1
   Portgroups: iSCSI-0
[root@esxi-2:~] esxcli network ip interface list
vmk1
   Name: vmk1
   MAC Address: 00:50:56:6f:91:62
   Enabled: true
   Portset: vSwitch1
   Portgroup: iSCSI-0
   Netstack Instance: defaultTcpipStack
   VDS Name: N/A
   VDS UUID: N/A
   VDS Port: N/A
   VDS Connection: -1
   Opaque Network ID: N/A
   Opaque Network Type: N/A
   External ID: N/A
   MTU: 1500
   TSO MSS: 65535
   RXDispQueue Size: 1
   Port ID: 50331652
vmk0
   Name: vmk0
   MAC Address: ac:1f:6b:45:47:c8
   Enabled: true
   Portset: vSwitch0
   Portgroup: Management Network
   Netstack Instance: defaultTcpipStack
   VDS Name: N/A
   VDS UUID: N/A
   VDS Port: N/A
   VDS Connection: -1
   Opaque Network ID: N/A
   Opaque Network Type: N/A
   External ID: N/A
   MTU: 1500
   TSO MSS: 65535
   RXDispQueue Size: 1
   Port ID: 33554437
Thank you!
Marius
Your management vmnic is on a different subnet than vCenter.
You have vmk0 on 10.176.20.22/24 and vCenter on 10.176.21.9/24.
Okay but this was already working for years, before I added the new vmk1.
So this means as soon as I add any vmk, with the same subnet that the vCenter uses, it gets automatically preferred and used as the management interface although it is not marked for the management service?
Do you have routing configured between the two subnets? From the ESXi host, can you ping the vCenter IP from the vmk0 interface?
The error is entirely connectivity-related. Test ping from both sides; you will probably see that the packets are not reaching the destination.
For a deeper test, run curl -v telnet://hostIP:443 from the vCenter server and nc -zv vcenterip 443 from the ESXi host.
That checks connectivity to port 443 from both sides; if the result is "Host not reachable", you clearly have a connection issue.
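Where curl or nc is not at hand, the same port check can be approximated with a few lines of Python. This is a minimal sketch, assuming a Python 3 interpreter is available on the machine running the test; the function name `tcp_port_open` is made up for illustration.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the full TCP handshake, like `nc -zv`.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

For example, `tcp_port_open("10.176.20.22", 443)` run from the vCenter side would test the vmk0 address from this thread. Note that a successful handshake only proves the port accepts connections, not that the service behind it answers.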
Yes routing is working.
This is a cluster of three ESXi hosts and I only configured one with this additional vmk1. The other two hosts still have their old configuration, in the same subnet as vmk0 on the third host.
I can ping in both directions (vmk0 -> vCenter -> vmk0).
[root@esxi-2:~] esxcli network diag ping -H 10.176.21.9 -I vmk0
Trace:
Received Bytes: 64
Host: 10.176.21.9
ICMP Seq: 0
TTL: 63
Round-trip Time: 525 us
Dup: false
Detail:
Received Bytes: 64
Host: 10.176.21.9
ICMP Seq: 1
TTL: 63
Round-trip Time: 461 us
Dup: false
Detail:
Received Bytes: 64
Host: 10.176.21.9
ICMP Seq: 2
TTL: 63
Round-trip Time: 354 us
Dup: false
Detail:
Summary:
Host Addr: 10.176.21.9
Transmitted: 3
Received: 3
Duplicated: 0
Packet Lost: 0
Round-trip Min: 354 us
Round-trip Avg: 446 us
Round-trip Max: 525 us
Second test with telnet and netcat:
root@vcenter [ ~ ]# curl -v telnet://10.176.20.22:443
* Rebuilt URL to: telnet://10.176.20.22:443/
* Trying 10.176.20.22...
* TCP_NODELAY set
* Connected to 10.176.20.22 (10.176.20.22) port 443 (#0)
root@vcenter [ ~ ]# nc -zv 10.176.21.9 443
10.176.21.9 443 (https) open
What about the new VMkernel adapter? Is it tagged for Management? The iSCSI one should not be tagged as Management as well.
The new vmk1 does not have the Management Service activated.
Try to add the ESXi again pointing to the vmk0 IP and paste the errors found on the vpxd.log.
Please paste an extract of the last lines so we can help you from here. The file seems to be damaged.
Edited by moderator: The log attached previously is readable, and is preferable to pasting long log dumps - thanks for using the attach function on your previous post.
I can see different errors in your logs:
Did you update the ESXi?
Try to run services.sh restart on the ESXi host.
Review the status of the Component Manager service in vCenter: service-control --status vmware-cm
There was no update to either the ESXi host or the vCenter. I can connect the host with the vmk1 address, by the way, so this should not be a version issue.
root@vcenter [ /var/log/vmware/vpxd ]# service-control --status vmware-cm
Running:
vmware-cm
[root@esxi-2:~] /sbin/services.sh restart
Errors:
Invalid operation requested: This ruleset is required and connot be disabled
No changes yet. Still no way to add the vmk0 IP.
I did a test today for the port 902.
root@vcenter [ ~ ]# curl -v 10.176.20.22:902
* Rebuilt URL to: 10.176.20.22:902/
* Trying 10.176.20.22...
* TCP_NODELAY set
* Connected to 10.176.20.22 (10.176.20.22) port 902 (#0)
> GET / HTTP/1.1
> Host: 10.176.20.22:902
> User-Agent: curl/7.59.0
> Accept: */*
>
^C
root@vcenter [ ~ ]# curl -v 10.176.21.122:902
* Rebuilt URL to: 10.176.21.122:902/
* Trying 10.176.21.122...
* TCP_NODELAY set
* Connected to 10.176.21.122 (10.176.21.122) port 902 (#0)
> GET / HTTP/1.1
> Host: 10.176.21.122:902
> User-Agent: curl/7.59.0
> Accept: */*
>
220 VMware Authentication Daemon Version 1.10: SSL Required, ServerDaemonProtocol:SOAP, MKSDisplayProtocol:VNC , VMXARGS supported, NFCSSL supported/t
* Connection #0 to host 10.176.21.122 left intact
Only vmk1 responds with the VMware Authentication Daemon banner. Port 902 on vmk0 is open, but it looks like no service is responding.
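The difference between "port open" and "service responding" can be made explicit by waiting for the greeting after connecting. This is a minimal sketch, assuming the daemon sends its 220 line without needing a request first (the curl output above does not settle that, since curl sent a GET before the banner appeared); `read_banner` is a made-up helper name.

```python
import socket


def read_banner(host, port, timeout=3.0, max_bytes=256):
    """Connect and return the first text the server sends, or None if it stays silent.

    Raises OSError if the TCP connection itself cannot be established.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(max_bytes)
        except socket.timeout:
            # Handshake succeeded, but nothing arrived before the timeout.
            return None
    # An empty read means the peer closed without sending anything.
    return data.decode("ascii", errors="replace").strip() or None
```

Under that assumption, the vmk1 address should return the "220 VMware Authentication Daemon ..." line, while a `None` result for the vmk0 address would match the behaviour seen with curl.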
I think you may have a duplicate IP on another VM or host. Have you already tried disabling vmk1 (iSCSI) to see if the problem goes away? That way we can confirm whether this vmk is the one causing the issue.
I just talked to our networking + ESXi specialist from another department and he said this is related to subnetting. The vCenter is in the same subnet as vmk1, but vmk0 is in a different one. He said the solution is to put the iSCSI part in a dedicated subnet.
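One plausible reading of the subnetting explanation: vmk0 and vmk1 share the default TCP/IP stack and therefore one routing table, so traffic from the host to vCenter follows the most specific matching route regardless of which vmk carries the Management tag. The sketch below applies longest-prefix matching to the route table posted earlier. The 10.176.30.0/24 network in the second table is purely hypothetical (the thread never states the new iSCSI subnet), and `egress_interface` is a made-up helper.

```python
import ipaddress

# Routes copied from `esxcli network ip route ipv4 list` in the first post.
ROUTES = [
    ("0.0.0.0/0", "10.176.20.1", "vmk0"),   # default route
    ("10.176.20.0/24", None, "vmk0"),       # directly connected
    ("10.176.21.0/24", None, "vmk1"),       # directly connected (iSCSI)
]

# Same table with vmk1 moved to a dedicated subnet.
# ASSUMPTION: 10.176.30.0/24 is a placeholder, not taken from the thread.
ROUTES_AFTER_FIX = [
    ("0.0.0.0/0", "10.176.20.1", "vmk0"),
    ("10.176.20.0/24", None, "vmk0"),
    ("10.176.30.0/24", None, "vmk1"),
]


def egress_interface(destination, routes):
    """Return (interface, gateway) of the longest-prefix route matching destination."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(net), gateway, ifname)
        for net, gateway, ifname in routes
        if dest in ipaddress.ip_network(net)
    ]
    # The default route always matches here, so matches is never empty.
    _net, gateway, ifname = max(matches, key=lambda m: m[0].prefixlen)
    return ifname, gateway


print(egress_interface("10.176.21.9", ROUTES))            # ('vmk1', None)
print(egress_interface("10.176.21.9", ROUTES_AFTER_FIX))  # ('vmk0', '10.176.20.1')
```

In the first table the host reaches vCenter directly over vmk1, while vCenter reaches the vmk0 address through the gateway, so the two directions take different paths. Once vmk1 no longer has a connected route to the vCenter network, traffic to vCenter falls back to the default route over vmk0.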
I will try to do that this week and let you guys know.
It's what I already told you in my previous comment.
You said that my management interface is on a different subnet than vCenter, yes. But I did not know that this was a problem; it was working before, when no other vmk was active.
I also asked about that:
So this means as soon as I add any vmk, with the same subnet that the vCenter uses, it gets automatically preferred and used as the management interface although it is not marked for the management service?
I will try to fix it this evening.
Okay, it is fixed now. I was able to re-connect the host to the vCenter. Thanks for all your input and help guys!
The final solution was to put vmk1 and the NAS into a dedicated subnet.
vmk0 (10.176.20.22/24) is still in another subnet than vCenter (10.176.21.9/24) but it works fine. And the re-connect of the ESXi to the vCenter worked instantly after changing the subnet for vmk1 and the NAS.
I will change the whole setup in future and use physically dedicated switches, only for iSCSI purposes.