EB_123
Contributor

Verify a newly added host to a DRS cluster

Hi all,

Is there any tool/script/KB article that can help verify whether a new ESXi host added to a DRS cluster is actually able to run all of the VMs in that cluster properly?

Basically, the issue I'm most interested in is networking. We had some problems in the past where a VLAN was misconfigured on the Flex/switch side, so even though that VLAN was properly configured on the VMware side, when I tried to move a VM using it to the new host, the VM lost its network.

I thought about a script that would identify all the networks configured in the cluster and somehow check connectivity to each of them, maybe by switching a test NIC to every one of them in turn and then pinging the default gateway, but that method forces me to have an IP in every subnet, which is kind of complicated.
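The first half of that idea (enumerating every VLAN the cluster uses) can be sketched like this. In practice the port group/VLAN pairs would be collected from vCenter, e.g. via pyvmomi from each dvPortgroup's VLAN spec; that collection step is assumed here and the sketch only shows the dedup/grouping and a gateway-ping helper. All names and IPs are placeholders.

```python
import subprocess


def vlans_to_verify(portgroups):
    """Group a cluster's port groups by VLAN ID.

    `portgroups` is a list of (portgroup_name, vlan_id) tuples (assumed
    to be gathered from vCenter yourself, e.g. with pyvmomi).
    Returns {vlan_id: sorted list of portgroup names} -- the distinct
    VLANs you would need to test, each with the port groups using it.
    """
    by_vlan = {}
    for name, vlan in portgroups:
        by_vlan.setdefault(vlan, set()).add(name)
    return {vlan: sorted(names) for vlan, names in sorted(by_vlan.items())}


def gateway_reachable(ip, count=2, timeout=2):
    """Ping a gateway via the system ping (Linux-style flags; adjust for
    other platforms). True if at least `count` probes got an answer."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout), ip],
        stdout=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Placeholder inventory; real data would come from vCenter.
    pgs = [("prod-web", 100), ("prod-db", 200), ("prod-web-b", 100)]
    for vlan, names in vlans_to_verify(pgs).items():
        print(f"VLAN {vlan}: {', '.join(names)}")
```

The "have an IP in every subnet" problem the post mentions is exactly the gap this sketch doesn't solve; it only tells you *which* VLANs need a test, not how to source a test address per subnet.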

So, is there some sort of procedure that can help me make sure a newly added host can properly run all the VMs in that cluster?

Thanks!

8 Replies
ExpletiveDelete
Enthusiast

I take it that the cluster is set up with all hosts using their own virtual standard switch, and not a distributed switch? If that's the case, there is not going to be a specific way to check other than verifying visually (I guess you could run a script to do this too), or maybe setting up host profiles. You could also move to a distributed switch, so you know every host will have the same exact switch config (outside of the individually defined vmkernel adapters).
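A script version of that visual check could just diff each host's port group names against the union across the cluster. With pyvmomi the per-host list would come from `host.config.network.portgroup` (standard switches); here that collection step is assumed as input and only the comparison is shown.

```python
def missing_portgroups(host_portgroups):
    """Report the port groups each host lacks relative to the cluster union.

    `host_portgroups` maps host name -> set of port group names (assumed
    gathered per host, e.g. from host.config.network.portgroup via
    pyvmomi). Returns {host: sorted missing names}; hosts with no gaps
    are omitted, so an empty dict means the hosts match.
    """
    union = set().union(*host_portgroups.values())
    gaps = {host: sorted(union - pgs) for host, pgs in host_portgroups.items()}
    return {host: missing for host, missing in gaps.items() if missing}


if __name__ == "__main__":
    # Placeholder data; host names and port groups are illustrative only.
    hosts = {
        "esx-01": {"Management", "vMotion", "VM-VLAN100"},
        "esx-02": {"Management", "vMotion"},
    }
    print(missing_portgroups(hosts))
```

This only proves the hosts agree with each other, not that the physical side is right, which is the distinction the rest of the thread turns on.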

If you know the Flex module was misconfigured, why are you looking at the VMware side of it? Do you have the profiles/templates set up properly in the OA/ICM? (Also, if you have the means, you may want to consider FEX modules instead.)

EB_123
Contributor

Thanks for the reply! :)

Almost all of our switches are distributed; only vMotion and vShield are on standard switches (and of course those are not the ones I'm worried about).

Regarding the Flex configuration: in some environments we use HP OneView, so we have an ESXi profile that we apply to each bay. In other environments we use Virtual Connect to manage the Flex; there, every time I add a new host I just copy a profile from one of the existing ESXi hosts and apply it to the new one.

Considering all of the above, can I feel "safe" adding new hosts to the cluster?

By the way, one thing I forgot to mention that might be important: the problems we had in the past happened while moving VMs between clusters. This is not a process that happens regularly, only when we upgrade the servers (we form a new cluster and move VMs from the old one to the new one; both are connected to the same dvSwitches, but they are not located in the same cage, which means they are not connected to the same Flex).

ExpletiveDelete
Enthusiast

vMotion between clusters shouldn't be a problem at all. If you're using vDSes, none of the other hosts are having the issue, and the HP profile is the same on the back end, then there should be no difference (other than IP configs) and you should be safe. If you want to be "super duper" safe, you can disable DRS or set up an affinity rule before adding a host to the cluster, so you can manually move things over as a test.

But - if the backend profiles are the same, and the vDS is solid, there should be no problems (at least from the hypervisor through the chassis).

EB_123
Contributor

So, in a case where the back-end profiles are the same and all the vDSes are configured properly, would you add hosts to an existing cluster and move VMs to the new host even in the middle of the day? Consider that some VLANs contain only VMs that are not allowed to lose network even for a few minutes (vMotion to the new host, realizing there's a problem, and vMotion back to one of the other hosts). Or even with this configuration, is the level of certainty not high enough?

ExpletiveDelete
Enthusiast

Would I? Sure. But I would also test it out before moving critical VMs. Grab a VM that is in the same port group and test that first. If you have mission-critical VMs that you don't want moved automatically, then either set an affinity rule or disable DRS (and move VMs manually). Run a continuous ping while you do it to catch any network issues.
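For the continuous ping, a small wrapper that records drops can be more useful than eyeballing a terminal. This sketch shells out to the system `ping` (Linux-style flags, one probe per second) and reports the longest run of consecutive misses; the hostname in the usage comment is a placeholder.

```python
import subprocess
import time


def probe(ip, timeout=1):
    """One ICMP echo via the system ping (Linux flags); True if answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout), ip],
        stdout=subprocess.DEVNULL,
    )
    return result.returncode == 0


def longest_outage(results):
    """Longest run of consecutive False values in a sequence of probe results.

    With one probe per second, the return value approximates the worst
    continuous loss of connectivity in seconds.
    """
    worst = run = 0
    for ok in results:
        run = 0 if ok else run + 1
        worst = max(worst, run)
    return worst


def watch(ip, seconds):
    """Probe `ip` once a second for `seconds` and report the worst outage."""
    samples = []
    for _ in range(seconds):
        samples.append(probe(ip))
        time.sleep(1)
    return longest_outage(samples)

# Example (not run here): start this just before the vMotion, e.g.
#   watch("vm-under-test.example.com", 120)
```

A result of 0-1 seconds across the vMotion window is what you'd hope to see; anything longer on the new host but not on the old one points at the host's uplinks.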

EB_123
Contributor

Sorry if I'm nagging, but in some clusters we have about 40-50 VLANs, and almost all of them are critical. Testing each VLAN by moving a less-critical VM to the new host is a lot of work, and that's exactly what I'm trying to avoid. My goal is to create some sort of checklist, so that after I verify the ESXi host was properly configured I can move VMs onto it WITHOUT monitoring each VM during the vMotion process. Is this level of certainty even possible?
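One checklist item that covers many VLANs without any vMotion: ESXi records the VLAN IDs it has actually observed on each physical uplink (shown in the client as "observed IP ranges", and available in the vSphere API via `QueryNetworkHint()` on the host's network system). Comparing those observed VLANs against the VLANs the cluster's port groups require can flag anything the upstream Flex isn't passing. The collection step is assumed here; this sketch shows only the comparison.

```python
def unobserved_vlans(required_vlans, observed_by_nic):
    """VLANs required by the cluster that no uplink has seen traffic on.

    `required_vlans`: set of VLAN IDs used by the cluster's port groups.
    `observed_by_nic`: {vmnic name: set of observed VLAN IDs}, assumed
    gathered from each uplink's network hints (QueryNetworkHint()).
    Returns the sorted list of required VLANs absent from every uplink --
    candidates for a misconfigured switch/Flex port on this host.
    """
    observed = set().union(*observed_by_nic.values()) if observed_by_nic else set()
    return sorted(required_vlans - observed)


if __name__ == "__main__":
    # Placeholder data; in practice both inputs come from vCenter/the host.
    required = {100, 200, 300}
    hints = {"vmnic0": {100, 200}, "vmnic1": {200}}
    print(unobserved_vlans(required, hints))
```

One caveat: a VLAN only shows up as observed if some traffic was seen on it recently, so a genuinely quiet VLAN can appear here without being misconfigured. Treat hits as items to investigate, not as proof of a problem.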

ExpletiveDelete
Enthusiast

No problem, but that's why you have a vDS. Your vDS is what keeps everything standardized across the hosts, assuming the back-end profile is legit. When you add a host to the vDS it goes through its own checks on the uplinks; that's all the vDS really cares about. The vDS will get any/all VLANs from the hardware-layer modules on the back of the chassis, so assuming the VLANs are passed through THAT, the vDS is good to go. If you're that juiced about it, then call for an outage time/date and do it then, but I think you're psyching yourself out.

If you're positive the back-end OA/modules and profile are configured correctly, and no other issues are seen on the vDS with the other hosts, then I would be confident.

EB_123
Contributor

I understand.

Ok, Thank you very much for all your help!
