What exactly does it show when you select a host from the list? It should normally show you the VLAN ID that is not supported. The error usually occurs if you have configured a dvPortgroup with a VLAN ID that does not exist on the physical switch port of an uplink. This usually also triggers the MTU warning for the same VLAN ID.
From the list when you select a host (vDS --> Monitor --> Health --> VLAN Tab), it shows under the "VLAN Trunk" column a value of "0" and under the "VLAN Status" column a value of "Not supported" for 4 of the 6 vmnics listed. The other 2 show "Supported". The vDS port groups are not assigned Vlan IDs/numbers.
For the MTU tab, it shows "Not supported" for the same 4/6 vmnics.
Is this because the vDS is still running version 5.5 in the 6.5.0 environment, and would upgrading it first correct this?
Okay. Normally a VLAN ID is shown instead of 0, namely the one that is missing on the physical switch port. For example: you have created a port group with VLAN ID 10, but VLAN 10 is not configured on the switchports of some uplinks. In that case, the number 10 would be shown there.
If it is 0, it usually means that you have port groups where VLAN is set to "none". The dvSwitch then sends the packets from these port groups untagged to the physical switch port. The health check verifies this, too, but if no Native VLAN is configured on the trunk port of the physical switch, these frames are dropped and the health check warns about it.
I therefore suspect that the switchport configuration of some uplinks is different. Especially the Native VLAN configuration.
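To make the logic above concrete, here is a small conceptual sketch of how the health check decides "Supported" vs. "Not supported" for one uplink. This is an illustration of the reasoning, not VMware's actual implementation; the function name and parameters are hypothetical.

```python
def vlan_check(portgroup_vlan, trunk_allowed_vlans, native_vlan):
    """Conceptual sketch of the vDS VLAN health check for a single uplink.

    portgroup_vlan == 0 models a port group with VLAN set to "none":
    frames leave the host untagged, so the physical trunk port needs a
    Native VLAN configured to accept them.
    """
    if portgroup_vlan == 0:
        # Untagged traffic only works if the switchport has a Native VLAN
        return "Supported" if native_vlan is not None else "Not supported"
    # Tagged traffic: the VLAN must be allowed on the trunk port
    return "Supported" if portgroup_vlan in trunk_allowed_vlans else "Not supported"

# Matches the symptoms in this thread: VLAN 0 plus no Native VLAN on some uplinks
print(vlan_check(0, {100, 200, 300}, None))   # Not supported
print(vlan_check(0, {100, 200, 300}, 100))    # Supported
print(vlan_check(10, {100, 200, 300}, 100))   # Not supported (VLAN 10 missing)
```

This also explains why the "VLAN Trunk" column shows "0": that is the (untagged) VLAN the check failed for, not a real VLAN ID.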
The Native VLAN on the physical switch (HP Flex10) shows as VLAN 1, which I suppose is the default setting. There are 3 Vlans, including VLAN 1, going to this environment. So what is the fix for this if that's the case?
Unfortunately, I'm not familiar with HP switches, especially the Virtual Connect modules. With our Cisco network infrastructure, I simply created a VLAN as a Native VLAN on each switchport where an ESXi uplink is connected, and the problem was solved. We have no untagged traffic in our infrastructure, so using a "dummy" Native VLAN was an acceptable workaround.
With Cisco it would be:
switch# conf t
switch(config)# vlan 123
switch(config-vlan)# name VMWARE-NATIVE-DUMMY
switch(config-vlan)# exit
switch(config)# int Ethernet1/35
switch(config-if)# switchport trunk allowed vlan add 123
switch(config-if)# switchport trunk native vlan 123
(Adjust the interface name to match the switchport your uplink is connected to, and repeat for each uplink.)
So, I haven't gotten to the network settings changes or analysis yet since it's a blade server and I'll have to investigate how that was and should be set up vs. how it currently is. I also wanted to upgrade the vDS to see if that made any difference...
So I upgraded the vDS from version 5.5.0 to 6.5.0, hoping that might clear something up compatibility-wise. It did not. Of the 5-host cluster, 3 of the other hosts now show the same Critical Alerts as the original one I had the issue with, but oddly, one of them does not. That host is running the same OS as the others (minus the one host I had patched) and appears to be configured the same way as well. So now 4 of the 5 hosts show the Alert. I'll have to review the network settings on both sides, but if you or anyone else has any input or recommendations beyond what's already been recommended here, I'm all ears. Thanks.
Hi. I'm still working on this since I haven't touched it in a while. I still have the critical alerts since I wanted to get to the bottom of this before acknowledging them. Traffic seems to be working fine, despite the alerts persisting.
There are 2 physical switches going to the vDS: HP ProCurve --> Flex-10 pair switch --> vDS
The ProCurve pair is trunking 3 Vlans to the Flex-10s:
2 ports in a trunk (x2, 4 total, 2 per switch)
Vlan 100 Untagged
Vlan 200 Tagged
Vlan 300 Tagged
The Flex-10s configuration shows the same:
6 nics per host x 5 hosts (30 uplinks to vDS)
Vlan 100 (Native)
On the vDS side:
dvUplink Group 1 (6 links x 5 hosts = 30 total)
Port Group A (Vlan ID = 0)
Port Group B (Vlan ID = 0)
Port Group C (Vlan ID = 0)
These links from the Flex-10s are all trunked to a single dvUplink group on the vDS, and then there are a few vDistributed Port Groups, each of which has no Vlan ID assigned, as mentioned (so, Vlan ID = 0).
For some reason, all 5 of the hosts appear to be configured the same but only one of them now shows no critical alerts. I don't recall acknowledging the alerts.
I'm thinking of testing out just assigning the matching Vlan IDs to the Port Groups as recommended, but I'd like more info before I break something.
I read at the link below that if tagging is done on the physical switch, the Port Groups' Vlan ID on the virtual switch should be zero, but I'm not sure whether this applies here, or whether they mean a situation where the vDS connects to an ACCESS port in a Vlan on the physical switch, or something else. Any clarification or additional help would be great, based on the detail I've added. Thanks.
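The distinction in question can be sketched in a few lines: VLAN ID 0 on a port group fits the case where the physical port is an ACCESS port and tags the traffic itself (External Switch Tagging), while on a TRUNK port the port groups normally carry their own VLAN IDs (Virtual Switch Tagging), with untagged frames falling into the trunk's Native VLAN. This is a hypothetical model for illustration only, not a VMware or switch API.

```python
def frame_vlan_on_wire(portgroup_vlan, switchport_mode, access_or_native_vlan):
    """Return the VLAN a frame ends up in on the physical network.

    portgroup_vlan == 0 models a port group with no VLAN ID assigned
    (the host sends the frame untagged).
    """
    if switchport_mode == "access":
        # EST: host sends untagged; the switch tags with its access VLAN
        return access_or_native_vlan
    if switchport_mode == "trunk":
        if portgroup_vlan == 0:
            # Untagged on a trunk lands in the Native VLAN (None = dropped)
            return access_or_native_vlan
        # VST: the vDS tags the frame; the trunk passes it through
        return portgroup_vlan

print(frame_vlan_on_wire(0, "access", 100))   # 100 (EST: switch tags)
print(frame_vlan_on_wire(0, "trunk", 100))    # 100 (Native VLAN)
print(frame_vlan_on_wire(200, "trunk", 100))  # 200 (VST: vDS tags)
print(frame_vlan_on_wire(0, "trunk", None))   # None (no Native VLAN: dropped)
```

Under this model, the environment described here (trunk ports carrying Vlans 100/200/300, with 100 as Native) would be the VST case, which is why assigning the matching Vlan IDs to the port groups is the recommendation being tested.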
Hi. I still haven't figured out why only one host did not show any alerts like the others when it appears to be configured the exact same.
We're now migrating to a set of new hosts & decommissioning the old ones (hardware aging), so what I'm doing now is configuring the new host cluster in the same datacenter the same way as the original cluster, except with new port groups on the vDS that actually have the Vlan IDs assigned (with similar names to the originals).
Though we still have some datastore connectivity to complete before we begin the VM migration to these new hosts, the networking for each new host was set up on the same vDS with these new port groups, and I don't get the alerts that the other hosts are still getting. I actually did see the alerts at one point on one or two of them after adding the host & configuring networking. But after entering maintenance mode & rebooting the hosts to ensure everything comes up ok, and to clear up some of the lingering alerts that new installations tend to display, I don't see those alerts anymore, just the HA alerts since we haven't configured the datastores yet.
So, I hope my theory proves true that the Vlan ID = 0 on the original port groups is what the issue was. I'll report back once it's fully functional.