ESX, 3.0.1, 32039
VC, 2.0.1, 32042
600+ VDI deployment, XP, 512 MB, spread across 21 servers. Storage is spread across 4 LUNs on HP MSA 1500s, single fiber pathed.
Each ESX box has 4 NICs dedicated to VDIs and access to all MSA LUNs.
We are experiencing a complete RDP 'freeze' at exactly the same time every day. There are no batch jobs running, no SMS queries, no anti-virus running, and nothing in any of the event logs to indicate errors. The RDP stream stays alive during the freeze and eventually all the users can continue working, but the freeze lasts anywhere from 4 minutes to as long as an hour. There is nothing in the ESX perf logs to indicate a problem, other than the heartbeat indicator going crazy. Perfmon on the XP boxes shows a huge spike in disk I/O, with nothing significant in CPU or memory.
Has anyone experienced this RDP freezing? It happens at exactly 11:15pm every night, which would suggest a scheduled job. Is there anything on the ESX server side that could be causing this?
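For anyone trying to confirm this kind of "same time every day" pattern from Perfmon data, here is a minimal sketch, not something taken from our environment. It assumes the disk I/O counter has already been parsed into (timestamp, value) pairs; the function name, the 5x-median threshold and the sample values are all hypothetical and only there for illustration.

```python
from collections import Counter
from datetime import datetime
from statistics import median


def recurring_spike_minutes(samples, factor=5.0):
    """Count, per minute of the day, on how many distinct days a spike occurred.

    samples: iterable of (datetime, float) pairs, e.g. a disk I/O counter.
    A spike is any value above factor * median of all values (an assumed,
    arbitrary threshold, not a Perfmon or ESX setting).
    """
    samples = list(samples)
    threshold = factor * median(value for _, value in samples)

    # Use a set so several spiky samples in the same minute of the same day
    # are only counted once.
    spike_days = set()
    for ts, value in samples:
        if value > threshold:
            spike_days.add((ts.date(), ts.strftime("%H:%M")))

    return Counter(minute for _, minute in spike_days)


# Hypothetical sample data: three nights, three readings each.
SAMPLES = [
    (datetime(2007, 1, 8, 23, 0), 40.0),
    (datetime(2007, 1, 8, 23, 15), 900.0),
    (datetime(2007, 1, 8, 23, 30), 45.0),
    (datetime(2007, 1, 9, 23, 0), 38.0),
    (datetime(2007, 1, 9, 23, 15), 1100.0),
    (datetime(2007, 1, 9, 23, 30), 50.0),
    (datetime(2007, 1, 10, 23, 0), 42.0),
    (datetime(2007, 1, 10, 23, 15), 950.0),
    (datetime(2007, 1, 10, 23, 30), 41.0),
]

if __name__ == "__main__":
    # Minutes that spike on the most days are the likeliest scheduled jobs.
    for minute, days in recurring_spike_minutes(SAMPLES).most_common():
        print(minute, days)
```

A minute that shows up on every day in the log points at something scheduled rather than a random storage hiccup, which narrows the search to whatever kicks off at that time on the guests or the hosts.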
Have you applied the following patch?
http://support.microsoft.com/default.aspx/kb/925876
Patch already installed. Plus, that doesn't explain the freeze happening at the same time every day. They operate all day long with no issues at all.
Regards,
Frank
Have you checked your MSA units? How many LUNs do you have, and how many hosts are attached to each?
There are known issues with MSAs and more than 4 hosts.
Yes, 4 LUNs, 2 TB each, fiber attached. Running 600+ desktops across all of them. No issues with them except for this one window of time. BTW, we are running many more across another MSA with no issues at all... I read about the MSA1000 issues, but we haven't had the issue with the 1500s to date.
So you are talking about 150 VMs per LUN. How many disks are in each LUN (7?), and have you checked the SAN's logs?
On your other MSA, are you also running XP desktops, or is that serving different VM guests? There could be a usage pattern causing the issue.
Hi Frank,
You've said you are using XP; which version? In SP1 there was a bug that can cause this kind of behavior:
http://support.microsoft.com/?kbid=811080
- Dirk
SP2.
We have now determined that it is happening in our 2.5.x environment as well, so it is not specific to the MSA; it is also occurring on an HP 5000 EVA.
The root cause ended up being a hardware and software scan initiated by a corrupt SMS client.
Hey Frank,
Thanks for posting back the solution (very good practice). So was it an SMS client (loaded on a single VM) that caused all this mess?
Massimo.