MattGagliardi
Enthusiast

dvSwitch problem on reboot

Not my first rodeo when it comes to dvSwitches, but this is a problem I've never seen before and it's a nasty one.  I'm installing vSphere 5 into my test environment before upgrading my production 4.1.  Initial installation, configuration and reboots while the hosts are on a standard vSwitch are A-OK: the hosts/guests are all able to communicate and the reboots are quick.  However, the minute I migrate over to the dvSwitch everything goes to Hell.  On a reboot the host takes forever to come back up (~10 minutes) and it always seems to be stuck at the point where it's scanning for the iSCSI datastores (different places depending on the host and whether it's using hardware or software iSCSI).  When the system eventually comes up to the console I have zero network connectivity.  The correct IP for vmk0 is listed, but I can't ping it from any other entity.  When I go into the shell I can't ping or vmkping anything external to the box (it can ping its own vmk0 IP, but that's it).

If I restart the management network via the console, or use esxcli to "modify" vmk0/vmk1 (something as simple as setting the MTU to the same value it already has), my networking comes back and the host connects to vCenter again.
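
For reference, the esxcli "modify" I'm doing looks roughly like this from the shell (vmk0 shown; 1500 is just whatever MTU the interface already has, so substitute yours, and the enable/disable toggle is effectively the same as the console's Restart Management Network, so only do that from the console or you'll cut off your own SSH session):

# re-apply the existing MTU - enough of a "change" to kick the interface back to life
esxcli network ip interface set -i vmk0 -m 1500

# or bounce the interface entirely (console only - this drops remote sessions)
esxcli network ip interface set -i vmk0 -e false
esxcli network ip interface set -i vmk0 -e true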

Obvious assumption: there is a problem with the networking.  But what?  And why only after I move to the dvSwitch?  I've been impatient and haven't simply sat and waited to see if it comes back by itself, but there's obviously a limit to how long I could wait.

Any ideas/solutions?  Thanks for your time!

PS - I've been running dvSwitches in my 4.1 environment for a long time, never any problems like this.  Same hardware in both environments.

PPS - it's definitely not a storage problem.  I rebooted a host that had been migrated to the dvSwitch but hadn't yet had its iSCSI configuration put in place.  The reboot was "normal" speed (very quick, not hung up scanning for the iSCSI storage) but the networking was still gone and required a management network restart to function again.  The slow boot would seem to be a downstream effect of the networking not coming up.

MattGagliardi
Enthusiast

I'm glad the patch went in smoothly and reduced your boot time.  I think once the connectivity issue is resolved you'll see it drop even further, as my experience leads me to believe the delay is the host attempting to enumerate the iSCSI targets that it knows should be there...but that it can't find due to the connectivity issue.

At this point I'd suggest you take the host down to the simplest configuration that should work: a single vSwitch that has the VM network (for your guests) and the management network, operating off of vmk0 and vmnic0 (presumably).  This would be the vSwitch with connectivity to your pSwitch.

I'd then take the remaining vSwitch (the one with 2 pNICs/vmnics and 2 vmks) and remove the second pNIC/vmnic and the second vmk.  I'd even yank the second network cable (knowing myself, it'd probably be in a fit of rage).  Get that "storage" vSwitch into the simplest configuration possible, then reboot the host and see what you get.  My experience from earlier in this thread suggests that things will come up fine (my issue seemed to be centered around multiple vmk MACs being fed down the same pNIC/vmnic with no VLAN setup).
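
If it's quicker from the shell than the vSphere Client, stripping that storage vSwitch down would look something like the following.  I'm guessing at the names (vmk2 for the second VMkernel port, vmnic3 for the second pNIC, vSwitch1 for the storage vSwitch, vmhba33 for the iSCSI adapter), so substitute whatever your host actually calls them:

# if the second vmk is bound to the iSCSI adapter, unbind it first
esxcli iscsi networkportal remove -A vmhba33 -n vmk2

# remove the second VMkernel interface
esxcli network ip interface remove -i vmk2

# pull the second uplink out of the storage vSwitch
esxcli network vswitch standard uplink remove -u vmnic3 -v vSwitch1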

I think ultimately the shortest path to what you're looking for may be that third switch...it's also entirely possible there's a way to do it with just two switches and some setting(s) in the vSwitch that I've not had to use personally (I'm thinking the routing or perhaps failover order).  I got mine to go with VLANs...but since you're plugging directly into the Drobo I don't think that's going to work for you.

I'll continue to help as much as I can but you may need someone with more depth of experience with the redundant NICs.

DJCoder
Contributor

Yes, I have cut it down to the bare minimum. Here's all I'm doing:

1. Create a vSphere standard switch, select the NIC already wired to the Drobo, and set the switch's VMkernel IP address and subnet (192.168.1.7, 255.255.255.0). The gateway is pre-set, of course, to the gateway of the management network, which is 192.168.1.254. (BTW, that management adapter, which is on its own vSphere standard switch, is 192.168.1.2.) The rough esxcli equivalents of these steps are sketched after this list.

2. Bind the storage adapter that corresponds to the Drobo NIC in step 1 to that NIC.

3. Set the Dynamic Discovery settings of that storage adapter, listing the IP of the Drobo port that's connected (192.168.1.4, same subnet and gw as above).

4. I was trying it with CHAP, but for now I have that off everywhere.
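
For completeness, I'm doing all of the above in the vSphere Client, but as far as I can tell the esxcli equivalents would be roughly this (vSwitch1, vmk1, vmnic1 and vmhba33 are just example names, yours will differ, and I haven't run it exactly this way, so treat it as a sketch):

# step 1: standard vSwitch, uplink, port group and VMkernel interface
esxcli network vswitch standard add -v vSwitch1
esxcli network vswitch standard uplink add -u vmnic1 -v vSwitch1
esxcli network vswitch standard portgroup add -p iSCSI -v vSwitch1
esxcli network ip interface add -i vmk1 -p iSCSI
esxcli network ip interface ipv4 set -i vmk1 -I 192.168.1.7 -N 255.255.255.0 -t static

# step 2: bind the Broadcom iSCSI vmhba to that VMkernel interface
esxcli iscsi networkportal add -A vmhba33 -n vmk1

# step 3: dynamic discovery pointing at the Drobo port, then rescan
esxcli iscsi adapter discovery sendtarget add -A vmhba33 -a 192.168.1.4:3260
esxcli storage core adapter rescan -A vmhba33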

What I find is that the host either does not find the Drobo at all, or it finds it and reports a status of "mounted" only to go into a weird cycling problem a few seconds later, where the datastore drops to standby and the host reports the status as "Dead or Error". It cycles in and out of those two states on an approximately 10-14 second interval.

That's the network/storage side of my problem. Then, if I reboot after doing the above, my host still becomes totally inaccessible via that management interface.

What am I doing wrong? Something with the IP addresses? Again, I need to stress that the Drobo is directly connected to the server (with a single cable), not through a physical switch. The only connection to a physical switch is the management interface.

Thanks,

DJ

MattGagliardi
Enthusiast

DJ,

This may just be my opinion, but I'm not wild about having both vmks on the same subnet.  I'd leave the management switch/vmk/etc. as-is (presumably that's the actual subnet required for you to get around your network), then change the "storage" vSwitch/vmk/etc. over to 192.168.2.x, along with changing the Drobo IP as well.  The GW will be inconsequential (it'll probably still show the mgmt. GW IP) since the iSCSI is wired with a crossover cable and not going off-network.
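
If you want to do the re-addressing from the shell, it would be something along these lines (vmk1 and vmhba33 are placeholders for your storage vmk and iSCSI adapter, I've kept your existing host numbers on the new subnet as an example, and the Drobo side gets changed through the Drobo Dashboard):

# move the storage VMkernel interface onto the new subnet
esxcli network ip interface ipv4 set -i vmk1 -I 192.168.2.7 -N 255.255.255.0 -t static

# point dynamic discovery at the Drobo's new address
esxcli iscsi adapter discovery sendtarget remove -A vmhba33 -a 192.168.1.4:3260
esxcli iscsi adapter discovery sendtarget add -A vmhba33 -a 192.168.2.4:3260

# sanity check that the host can reach the Drobo on the new subnet
vmkping 192.168.2.4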

DJCoder
Contributor

Hi, Matt,

Yes, I had the same thought and had tried that. To be sure, I tried it again, but no better results. However, I did make some progress...

I narrowed the problem down to the dependent-hardware iSCSI adapters (i.e., the chips on the Broadcom 5709s). I'm sure you know, but just for the benefit of others who might be reading: these are the vmhbaXX devices listed under "Broadcom iSCSI Adapter" in the "Storage Adapters" area. If instead I create a software iSCSI adapter and bind that to the NIC, the Drobo is happy and rebooting causes no issues. Everything else is the same as far as network addresses, my vSwitch, etc.
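
For anyone else trying to tell the two kinds of adapter apart, the quickest way I've found from the shell is the following (adapter names will obviously vary per host):

# dependent Broadcom adapters list with the bnx2i driver, the software initiator as iscsi_vmk
esxcli iscsi adapter list

# shows which VMkernel interface each iSCSI vmhba is bound to
esxcli iscsi networkportal list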

So why in the world would those cause the iSCSI storage to not work properly, as well as make the server totally unreachable after a reboot?! Is there a way inside vSphere to check for the latest drivers for those cards? I'll start with that tomorrow. Any other ideas?

Thanks,

DJ

MattGagliardi
Enthusiast

DJ,

First, let me just put out there that I've not used 5709s (so I have limited familiarity with them) and I'm not suggesting that they won't work, but my understanding is that they're actually fairly limited devices when it comes to iSCSI.  The first thing I'd do would be to check the VMware HCL regarding their compatibility/supportability in the iSCSI role.  There do appear to be some driver updates for the Broadcom devices (you can find these in the vSphere download section of the VMware site), so you might try updating the drivers to see if that gets you any further.
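
Before hunting for downloads it's worth checking what you're currently running.  From the shell, something like this should show it (vmnic2 is a placeholder for one of the 5709 ports, and the bundle filename is just an example, use whatever you actually download and give it the full datastore path):

# driver and firmware versions for the Broadcom NIC
esxcli network nic get -n vmnic2

# installed VIBs for the bnx2/bnx2i drivers
esxcli software vib list | grep -i bnx2

# installing an updated offline bundle (example filename; maintenance mode first, reboot afterwards)
esxcli software vib install -d /vmfs/volumes/datastore1/BCM-bnx2-offline_bundle.zip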

I think something to keep in mind is that there's a difference between the ability to do some iSCSI offload (which I think the 5709 can do) and being a full-fledged iSCSI HBA (I don't think the 5709 fits that description).  It may be that what's required of the 5709 during boot isn't something it's capable of due to these limitations (I'm just speculating here).  There are a lot of Google results around "VMware Broadcom 5709", so I'd start there.  FWIW, in my experience the software iSCSI initiator in VMware really does an excellent job.  It's probably time to start weighing just how much more time you want to invest in making the 5709 work vs. just going to the software initiator and being done with it, particularly if this system won't require mission-critical performance.

DJCoder
Contributor

I'm not that familiar with them either, but VMware has 2 entries for this card in their I/O compatibility list, one for the network part and one for the iSCSI part ( http://www.vmware.com/resources/compatibility/detail.php?deviceCategory=io&productid=18683&deviceCat... ). So they are definitely certified.

Also, the following paragraph from the vSphere Storage Guide sure makes it sound like it is fully supported:

"An example of a dependent iSCSI adapter is a Broadcom 5709 NIC. When installed on a host, it presents its two components, a standard network adapter and an iSCSI engine, to the same port. The iSCSI engine appears on the list of storage adapters as an iSCSI adapter (vmhba). Although the iSCSI adapter is enabled by default, to make it functional, you must first connect it, through a virtual VMkernel interface, to a physical network adapter (vmnic) associated with it. You can then configure the iSCSI adapter. After you configure the dependent hardware iSCSI adapter, the discovery and authentication data are passed through the network connection, while the iSCSI traffic goes through the iSCSI engine, bypassing the network."

I did find out at least part of why my storage configuration is not working for me. I mentioned the info in my last post to the Drobo agent I was working with, and he said, "What, you're trying to use hardware adapters with the Drobo -- we don't support that at all!" After I got up off the floor, I started asking myself the same questions you mentioned. Is the loss of main CPU power significant enough for this to be a deal-breaker for me? Any CPU consumed by iSCSI processing is CPU I won't have down the road as the number of my VMs grows... I feel sort of cheated, if you know what I mean... not to mention I paid extra for that iSCSI TOE.

In my mind I'm asking, "Is it possible that, even if I use a software iSCSI adapter with my Drobo, the iSCSI TOE chip on my BCM5709C will somehow still offload some of the iSCSI processing?" I don't understand the hardware well enough to know. It seems like a 'no', but I'm hoping the answer is that it will still offload some, just not as much as a true hardware HBA. I think I could live with that. But I read that truly software-only iSCSI can consume up to 500MHz of CPU ( http://www.sanstor.info/5iSCSI%20software%20initiators%20vs.pdf ), and that's a bit scary when I only have 3GHz to work with.
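
I suppose the honest way to answer that for my own workload is to measure it: kick off a big copy to the iSCSI datastore and watch the host CPU in esxtop. Something like this (the output path is just an example) should capture it for later review:

# interactive: run esxtop and watch the CPU view while the copy runs
esxtop

# or batch mode: sample every 5 seconds for 5 minutes into a CSV
esxtop -b -d 5 -n 60 > /vmfs/volumes/datastore1/cpu-during-iscsi-copy.csv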

Let me know what your thoughts are. I don't really know how to define "mission-critical performance." This will be our production server environment, not just backup. I'm not running a datacenter, but I plan to have 3-4 VMs and up to 80 people on them for various apps (no huge DBs or Exchange, but domain control, file serving, remote desktop hosting, a financial app, and various other smaller apps). We are a non-profit, so the Drobos fit our budget, but I don't want to have to replace them later due to poor performance.

Thanks for all the help.

DJ

DJCoder
Contributor

I have no idea if this is valid at all... I just bound the BCM5709 iSCSI HBA to the same vSwitch the software adapter is bound to (and to which my Drobo is connected), and it didn't seem to cause a problem. Do you think the hardware is just being ignored in favor of the software initiator, or is it possible that this is how I can use that hardware to offload the iSCSI processing? Anyone know?
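
I suppose one way to tell would be to look at which adapter the sessions and paths actually land on. From the shell, something like this should show it (if everything lists under the software vmhba and nothing under the Broadcom ones, then the hardware is presumably just sitting idle):

# active iSCSI sessions, grouped by adapter
esxcli iscsi session list

# storage paths also name the owning adapter
esxcli storage core path list | grep -i "Adapter:"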

MattGagliardi
Enthusiast

I'm actually a bit lost inside your configuration at this point, DJ, in terms of not really understanding how you've got everything put together.  Some screen shots might help.  I'm thinking:

Storage adapters:

1. Overall view.

2. The properties pages of the vmhba you're trying to use (General and Network Configuration in particular).

Networking:

1. Overall view.

2. Properties of the "Storage" vSwitch (including Ports and Network Adapters pages).

Also, have you broken the switches into separate subnets yet?

DJCoder
Contributor

Understandable. Although it might not seem like it, it's actually still very simple. One vSwitch for the management network. I created a second vSwitch and associated a second network card with it. Then I created a software iSCSI adapter and bound it to that second vSwitch. At that point I could access my iSCSI storage (Drobo B800i), and all was happy... except that I'm using a software iSCSI adapter instead of the hardware ones that are installed and available. Just to try it, I then bound the associated hardware adapter to the same vSwitch alongside the software adapter, and vSphere did not complain; I can still transfer files to and from the datastore on that target. What I don't know is whether that last step actually did anything functionally or if it is just being ignored.

I understand your question about having to use software adapters, because I was confused about the same thing. After reading and re-reading the VMware storage guide, I am just about convinced that what I was trying earlier, by NOT creating a software adapter but instead using one of the hardware adapters present in the list, is valid and should have worked. Drobo doesn't support it, so that's part of the problem. But even without the Drobo attached, the host still had the problem whereby it would lose all connectivity after a reboot once that config was in place. Subnets were irrelevant. I believe that could be a bug, as Frank was possibly finding out, and as someone in this thread apparently believes: http://communities.vmware.com/message/1583579

I just noticed another patch was released yesterday. Will definitely try that. But really, now my question is about adapters and performance.

I have to run but I'll try to add screen shots Monday if it's still unclear. I did use different subnets, but that did not make a difference... it was all about the software vs. hardware iSCSI adapters.

Thanks,

DJ
