Contributor

DRS and HA Problems!!!!!

I have started the evaluation process of VMware ESX 3.0.2. The evaluation is going very smoothly.

My current test environment consists of 2 ESX Servers in a cluster managed by VirtualCenter (loaded on an MSDE workstation), with 2 Win2003 VMs and 1 template.

As it stands, I have both VMs on one ESX Server. The other ESX Server in the cluster is idle.

I have installed the license server, applied the licenses, and enabled HA and DRS. I configured rules, set the automation level to Fully Automated, set the DRS threshold to apply all recommendations, and configured a VMkernel port on each ESX Server.

It seems I cannot get the HA and DRS add-ons to perform as designed. I have tested both a CPU load and a server/hardware failure, and neither DRS nor HA is working.

I have attached a link to a document full of screenshots of my test environment. I need to pick up the pace on this evaluation while I can! I just need to get the DRS and HA functions working and stable so that I can give a presentation to management.

Thanks,

Dale

http://www.mediamax.com/txskibum/Links/624D54E5F2

(The doc is about 10 MB, and the link is virus and spam free!)

Enthusiast

Hi,

I notice from the screenshots that you've set your HA Host Failures allowed to 2. Maybe try setting this to 1 and rerun the tests?

Cheers

DB

Immortal

For DRS to start working, you really need to load up lots of VMs and start to constrain the resources.

HA will really only work if you've got your DNS totally straight. Every host name must be resolvable for it to work.

Contributor

Thank you. I switched it from 2 to 1 and reran the test. Looks like it's still not working... :(

Contributor

I've put a major strain on one of my VMs, while the other one (on the same ESX Server) is idle. I have a CPU script with 5 strings running, plus Symantec Anti-Virus scans, etc. The CPU is maxed out, but there is still no DRS. My thinking was that the idle VM would move to the other ESX Server, leaving the busy VM the full resources of its ESX Server...?

Enthusiast

What test were you performing? I should have said that you should try testing HA with this setting changed. Basically, pull the plug on the ESX host that's running your VMs and make sure that they start up on the other ESX host.

Enthusiast

Also, for HA to work you need DNS to be working correctly on the ESX hosts and VC. Can you resolve the hostname of both ESX hosts and the VC server from each of those servers? You may also find it useful to add host entries to /etc/hosts on the ESX servers to make sure this is working as it should.
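
As a rough sketch of that check, the snippet below tries to resolve a list of names and reports which ones fail. The hostnames are made up, and it assumes a machine with Python 3 (which the ESX service console won't have), so treat it as an illustration of the idea. On the hosts themselves, ping or nslookup by name does the same job.

```python
import socket

# Hypothetical names; substitute your own ESX hosts and VirtualCenter server.
HOSTS = ["esx1.example.local", "esx2.example.local", "vc.example.local"]


def check_resolution(names, resolve=socket.gethostbyname):
    """Return {name: ip_or_None}; None means the lookup failed."""
    results = {}
    for name in names:
        try:
            results[name] = resolve(name)
        except OSError:
            # socket.gaierror (failed lookup) is a subclass of OSError.
            results[name] = None
    return results
```

Calling check_resolution(HOSTS) from each machine in turn should give an IP for every name. Any None entry is a name that machine cannot resolve.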

Contributor

OK. I was trying to test DRS. Now I will (again) focus on HA. I will pull the plug on the ESX host that both VMs are on. I have HA configured with Number of Host Failures Allowed = 1.

Stay tuned!

Contributor

Both ESX hosts and the VMs are all in my DNS servers. I can ping them all by name, and the IP resolves back to the name (ping -a ###.##.##.##). Stay tuned!

Enthusiast

I'd also add the IP, FQDN, and common names of your ESX hosts to your /etc/hosts file. It's a pretty common thing to do since DNS is a bit flaky.
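
For what it's worth, each /etc/hosts line is just the IP, then the FQDN, then the short name, separated by whitespace. Here is a small sketch that prints lines in that shape. The addresses and names are invented for illustration, so swap in your own.

```python
def hosts_lines(entries):
    """Format (ip, fqdn, short_name) tuples as /etc/hosts lines."""
    return [f"{ip}\t{fqdn}\t{short}" for ip, fqdn, short in entries]


# Hypothetical addresses and names for two ESX hosts.
ENTRIES = [
    ("192.168.1.11", "esx1.example.local", "esx1"),
    ("192.168.1.12", "esx2.example.local", "esx2"),
]

print("\n".join(hosts_lines(ENTRIES)))
```

The same lines would go into /etc/hosts on every ESX host so that each one can resolve the others without relying on DNS.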

Can you manually migrate your VMs between the two hosts?

Contributor

It's still not working. I got a message on the downed ESX Server that says "Possible failure has been detected by HA.....". It seems to know the host went down, but the VMs did not move over. There's got to be something obvious that I am missing (lack of experience).

Contributor

Yes, I can manually migrate between the two hosts just fine. It takes a little while, but I can. OK, I will try adding the ESX hosts to my VMs' /etc/hosts file.

Enthusiast

Just noticed from your doc that you don't seem to have shared storage between the two ESX hosts, at least that's how it looks. You have one VMFS LUN called VM1 and another called VM2. Without shared storage, HA and DRS won't work, unfortunately.
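
To make the check concrete, here is a minimal sketch of the logic: a VM is only a candidate for HA or DRS if its datastore is visible to every host. The inventory below is hypothetical and only loosely modelled on the screenshots (which host owns which volume, and the VM names, are my assumptions). Matching datastores by name is also a simplification.

```python
def shared_datastores(host_datastores):
    """Return the set of datastore names visible to every host."""
    sets = [set(names) for names in host_datastores.values()]
    return set.intersection(*sets) if sets else set()


def vms_without_shared_storage(vm_datastore, host_datastores):
    """Return the sorted names of VMs whose datastore is not on every host."""
    shared = shared_datastores(host_datastores)
    return sorted(vm for vm, ds in vm_datastore.items() if ds not in shared)


# Hypothetical inventory: each host sees only its own local VMFS volume.
HOST_DATASTORES = {"esx1": ["VM1"], "esx2": ["VM2"]}
VM_DATASTORE = {"win2003-a": "VM1", "win2003-b": "VM1"}

print(shared_datastores(HOST_DATASTORES))                         # set()
print(vms_without_shared_storage(VM_DATASTORE, HOST_DATASTORES))  # ['win2003-a', 'win2003-b']
```

With no datastore common to both hosts, every VM ends up in the second list, which matches what you are seeing.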

Contributor

Hmmm... Shared storage!?!? Well, I think that's it. I don't have a SAN attached to either ESX host server. So, can I use a workstation or another Windows server as storage, and just map a share between the ESX hosts? Then will I boot my VMs from the shared storage? That doesn't sound very efficient...

Enthusiast

Yep, that's it then. You need shared Fibre Channel or iSCSI storage for vMotion, DRS and HA to work. They all operate on the premise that every ESX host can see the VM's files (the vmdk and so on).

With vMotion or DRS, all that is copied across the vMotion network is the VM's active memory and state information. Otherwise it would take too long to copy a 10 GB vmdk from one host to another, and in the meantime the guest OS in the VM would be unresponsive.

HA needs shared storage because it is intended for hardware failure scenarios. If an ESX host fails because of a hardware fault (RAM, CPU, etc.), the server just crashes. ESX gets no time to copy VMs off to another host; it simply goes down with the hardware and takes the VMs on its local disk with it. With shared storage the host still goes down, but the other ESX host can see the VMs' files on the shared storage and start them back up.
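
To put a rough number on the 10 GB copy mentioned above, here is a back-of-the-envelope calculation. The link speeds are my assumptions (the thread doesn't say what the vMotion network runs at), and it ignores protocol overhead and disk speed, so real copies would be slower.

```python
def copy_seconds(size_gb, link_mbit_per_s, efficiency=1.0):
    """Best-case seconds to move size_gb (decimal GB) over a link.

    efficiency is the fraction of the nominal link rate actually achieved.
    """
    size_bits = size_gb * 1e9 * 8
    return size_bits / (link_mbit_per_s * 1e6 * efficiency)


# Assumed link speeds: Fast Ethernet and Gigabit Ethernet.
for link in (100, 1000):
    print(f"{link} Mbit/s: {copy_seconds(10, link):.0f} s")
# 100 Mbit/s: 800 s
# 1000 Mbit/s: 80 s
```

Even in the best case that is more than a minute per 10 GB disk, which is why only memory and state go over the wire.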

Contributor

Fibre or iSCSI... Got it! Thank you.

Enthusiast

You could look at using a virtual appliance from http://www.vmware.com/appliances/. There are one or two iSCSI appliances there that I've tested with before. It'd be possible to put together something that would work for testing DRS and HA, but the OS in the VM may not run very well.