VMware Cloud Community
FLeiXiuS
Contributor

ESX Becoming Unreachable - DRS

Hello, this is a pretty serious problem and it's evolving quickly. I have a script that populates an ESX server with virtual machines based on templates that I created earlier. The script registers the VMs on the server it is run from. About 500 virtual machines are being registered in total.

Equipment overview:

5 - ESX 3.05 cluster with DRS enabled

MD3000i iSCSI Storage Array

Problem:

The script will be running, and after a few virtual machines the ESX server goes to an "unreachable" state in VIC. After reviewing the logs I can't find anything that gives any lead as to why the ESX server is unreachable. VPX is running and listening on port 902; I can see this for sure and can grab the service's banner from a remote machine with no problem. I can't log in with VIC using either the VC IP or the IP of the ESX server itself. I'm out of ideas as to what's causing this. I have tried multiple reboots, cleaned up a couple of files, and even removed the virtual machines that were initially deployed by the script. Nothing, not even a log file is generated.
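For anyone wanting to repeat the port 902 check from a remote machine, here is a minimal sketch in Python 3. The function name and the defaults are made up for illustration, and it only shows that something is listening and talking on the port, not that logins will work.

```python
import socket

def grab_banner(host, port=902, timeout=5.0, max_bytes=256):
    """Connect to host:port and return the first data the service sends.

    Returns None if the connection fails, times out, or nothing is sent.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            data = sock.recv(max_bytes)
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return None
    return data.decode("ascii", errors="replace").strip() or None
```

Calling `grab_banner("esx-1.example.local")` (a hypothetical hostname) returns the banner text if the service answers, or `None` if it does not.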

Does anyone have the slightest clue what is happening? I have all of the resources available to give you any information you need.

8 Replies
weinstein5
Immortal

Welcome to the forums - I assume when you say the host becomes unreachable you are referring to the VI Client connected to Virtual Center - this is an indication that VC has lost network connectivity to the service console. Now on to the script: are you running this script on a single host, and is that host the one that becomes unreachable? If this is the case I am betting that you are overloading the service console - to resolve this, if you have not done so already, increase the memory in the SC.

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful
FLeiXiuS
Contributor

Sorry, let me make myself clearer. When I say the host is unreachable, I mean the ESX server is unreachable both by VC and by connecting directly to the SC with VIC. As for memory, the service console is getting as much of the 32 GB as it wants. I don't think it's out of resources. The SC should come back up after a reboot if it was overloaded... but it doesn't.

Let me also make it clear that during the script, templates are being deployed, so there are a good 2-3 minutes between virtual machine registrations.

The script worked fine in 3.05 without DRS. But I need the load balancing because I have different specs per server. As for trial runs, I ran the script on ESX-1 and ESX-1 is now no longer reachable via VIC. On ESX-2 I ran the script and it was unreachable until I rebooted.

The script does nothing more than generate a vmx file and clone from a template. Nothing fancy there.

weinstein5
Immortal

OK - both VC and the VI Client connect to the service console - the SC by default will only be assigned 272 MB and can be set to a maximum of 800 MB - but I do not think that is the issue if the script ran in 3.0.x without DRS - if you disable DRS does the problem disappear?
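One way to see how much memory the service console really has is to look at `/proc/meminfo` over SSH. Below is a rough Python 3 sketch that parses that text, assuming the SC exposes the usual Linux "Name: value kB" lines. The function names and the slack value are invented for the example; the 272 MB default is the figure quoted above.

```python
import re

# Matches lines such as "MemTotal:   262144 kB"; anything else is skipped.
_LINE = re.compile(r"^(\w+):\s+(\d+)\s+kB\s*$")

def parse_meminfo(text):
    """Return {field name: size in MiB} for every 'Name: <n> kB' line."""
    result = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            result[match.group(1)] = int(match.group(2)) / 1024.0
    return result

def looks_default_sized(meminfo, default_mb=272, slack_mb=32):
    """Guess whether the console is still at (or near) the default size.

    slack_mb is an arbitrary allowance chosen for this sketch.
    """
    total = meminfo.get("MemTotal")
    return total is not None and total <= default_mb + slack_mb
```

Feed it the output of `cat /proc/meminfo` captured from the host. If `looks_default_sized` comes back true, the SC memory was probably never raised.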

FLeiXiuS
Contributor

I thought the SC would allocate whatever resources the server had available. I've delegated 75% of my overall cluster performance to virtual machines leaving 25% to the COS. Is that not a good practice or...?

I'll have to disable DRS tomorrow, but I'm almost certain it'll do the same thing. It's rendered my first ESX host unreachable by all means except SSH, which is why I'm sort of stuck as to what has happened.

Thanks for keeping an interest :)

weinstein5
Immortal

Like I said, the COS will take up to whatever its memory is set to, so if you have not changed it the COS will only use 272 MB - also the COS only runs on CPU0 and can take the entire core. So in terms of your percentages, is it a bad thing? Probably not, because it will always ensure you have some headroom to handle extra loads like those caused by VMotion (there is a slight load placed on both the originating and receiving ESX host) or as you start up VMs - does your script start the VMs as well? Post what happens when you try it without DRS running.

FLeiXiuS
Contributor

Well, with the DRS option both on and off the results varied. I'm going to try bumping up the memory allocated to the SC, then re-run the script to see if that makes any difference!

EDIT:

Yes, the script will power on the deployed VMs. It'll also place them in the appropriate resource pool.

weinstein5
Immortal

Sounds good - remember that you will also need to increase the SC swap file size - check this thread http://communities.vmware.com/message/685697#685697m for information on how to increase the swap size.

FLeiXiuS
Contributor

Hey, well, here's an update. It still freezes with or without DRS, and I have no idea why. After about 10 virtual machines it'll go to register the next VM and then it won't return a 1 or a 0 signifying whether or not the VM has been registered. At this point the SSH session is stuck waiting for the machine to register, and the ESX host's SC becomes unavailable in VC. I'm stuck....
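Since the register step never returns, one option is to put a time limit around it so the script can at least log which VM it stalled on instead of hanging the SSH session. This is only a sketch, assuming the registration is an external command and that Python 3 is available on whatever machine drives the script (the ESX 3 service console may ship an older Python, so it might need to run from a management box or be adapted). The function name and the timeout value are made up.

```python
import subprocess

def run_with_timeout(cmd, timeout_s=120):
    """Run cmd (a list of arguments) and return (status, output).

    status is "ok" (exit code 0), "failed" (non-zero exit code),
    or "timeout" (the command was killed after timeout_s seconds).
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.stdout.strip()
```

You would pass in whatever command the script uses to register a VM (for example `vmware-cmd -s register` followed by the path to the .vmx, if that is what it calls) and write the status to a log for each VM. Note that a process blocked inside the kernel may not die when killed, so this helps pinpoint the stall rather than cure it.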
