VMware Cloud Community
juanaf
Contributor

ESXi 3.5 Hangs on HP SmartArray E200i Servers

Hi,

There's a thread in this forum related to this one, but since that one began as a question about unkillable processes, I've decided to start a new one about the core issue.

We have three HP BL460c G5 servers with SmartArray E200i controllers, installed with ESXi 3.5 Update 4. They have been running since June 2009, and during this period one or another of them has suffered the problem every three or four months.

The initial symptom is that the virtual machines stop responding. Curiously, the first ones to hang are the Windows 2003 guests, while the SLES 10 Linux servers continue to work for some time but end up hanging anyway. Access to the server with the VI Client is first slow, then impossible. Access to the console is slow but still works.

On the console we find many ash processes running; some of them can be killed, but most can't. These processes are trying to run the scheduled backup script. We've found that the script hangs when executing an esxcfg-info command. Running esxcfg-info by hand hangs right after showing the Diagnostic Partition:

\==+Storage Info :
   \==+Diagnostic Partition :
      |----Is Active.............................................true
      \==+Disk Lun Partition :
         |----Name...............................................vmhba0:0:0:7
         |----Partition Number...................................7
         |----Start Sector.......................................204832
         |----End Sector.........................................430080
         |----Partition Type.....................................252
         |----Console Device...................................../vmfs/devices/disks/vmhba0:0:0:7
         |----Size...............................................115326976
         |----Type...............................................0x000000fc
         \==+Scsi Stats :
            |----Commands........................................0
            |----Blocks Read.....................................0
            |----Blocks Written..................................0
            |----Aborts..........................................0
            |----Resets..........................................0
            |----Read Operations.................................0
            |----Write Operations................................0
            |----PAE commands....................................0
            |----PAE copies......................................0
            |----Split commands..................................0
            |----Split copies....................................0
            |----Issue Time......................................0
            |----Issue Time Reads................................0
            |----Issue Time Writes...............................0
            |----Total Time......................................0
            |----Total Time Reads................................0
            |----Total Time Writes...............................0
            |----Queue Time......................................0
            |----Queue Time Reads................................0
            |----Queue Time Writes...............................0
            |----Layer Time......................................0
            |----Layer Time Reads................................0
            |----Layer Time Writes...............................0

On another server it may hang at a different point in the output, but always right before the output for the local datastore filesystem. Commands such as ls issued against the local disk filesystems respond slowly or hang.
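Since the hang point differs from server to server, one hypothetical way to record exactly where esxcfg-info stalls is to redirect its output to a file (for example `esxcfg-info > /tmp/esxcfg-info.txt`), copy the partial file off the host, and check which tree nodes were still open when the output stopped. The Python sketch below assumes the capture keeps the `\==+Name :` node / `|----Key....value` leaf layout with its indentation intact, and that it runs on a separate workstation (Python on the ESXi 3.5 console is not assumed). The function name `node_path_at_end` is illustrative only.

```python
import re
import sys
from typing import List, Tuple

# Assumed esxcfg-info layout: nodes look like "\==+Name :", leaves like "|----Key....value",
# and child nodes are indented more deeply than their parent.
NODE_RE = re.compile(r"^(\s*)\\==\+(.*?)\s*:\s*$")


def node_path_at_end(output: str) -> List[str]:
    """Return the chain of node headers still open where the (possibly truncated) output ends."""
    stack: List[Tuple[int, str]] = []  # (indent width, node name)
    for line in output.splitlines():
        match = NODE_RE.match(line)
        if not match:
            continue  # leaf lines and anything else are ignored
        indent = len(match.group(1))
        # A new header closes every open node at the same or a deeper indent.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, match.group(2)))
    return [name for _, name in stack]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python node_path.py esxcfg-info.txt
    with open(sys.argv[1]) as capture:
        print(" > ".join(node_path_at_end(capture.read())))
```

Run against a capture from the first server, this would report the node chain ending in Scsi Stats; comparing the chains from each host shows whether they all stop just before the same datastore section.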

Searching the web, we found that other thread in this forum, dated September 2009, and a blog post stating that this is a recognized VMware bug, that there are 40+ tickets against it, and that it is registered with VMware support under issue number 420010.

We can't find that issue number or anything about this problem on the VMware support site.

This is a very critical and painful issue, as the only solution so far is a hard reboot of the server. The HP-VMware partnership should have dealt with this already, but we have no way to check.

Can anyone provide more information about the status of this issue?

Regards

geddam
Expert

Have you tried the next build of ESXi?

What firmware/BIOS version are you currently using on the BL460c?

Thanks,

Ramesh Geddam

VCP 3&4, MCTS(Hyper-V), SNIA SCP.

Please award points, if helpful

PaulDoom
Contributor

I just encountered this on an ML350 G5 running 4.0 Update 1. I had to kick it over, but I hope to have some time to trace what is launching the shells next time. (It may just be a side effect.)

Has anyone found a solution to this? If not, I will probably just update the controller firmware (slightly out of date) and upgrade to 4.1.

-Paul
