VMware Cloud Community
chavez243
Contributor

Windows guests become unresponsive

Had an incident yesterday where all the Windows guests became unresponsive, both within the Infrastructure Client and from the external network. The BSD guests were unaffected. The event log had no useful entries, but the syslog has been spewing out loads of interesting, albeit cryptic, details.

Nov 13 10:47:16 xx.xx.xx.xx sfcb[243308]: storelib Physical Device Device ID : 0x5

Nov 13 10:47:16 xx.xx.xx.xx last message repeated 9 times <27>sfcb[243310]: storelib Physical Device Device ID : 0x5

Nov 13 10:48:41 xx.xx.xx.xx last message repeated 4 times <27>sfcb[3918]: storelib Physical Device Device ID : 0x5

Nov 13 10:48:47 xx.xx.xx.xx last message repeated 29 times <27>sfcb[243475]: storelib Physical Device Device ID : 0x5

Nov 13 10:48:47 xx.xx.xx.xx last message repeated 9 times <27>sfcb[243477]: storelib Physical Device Device ID : 0x5

This is either a PE1950 or 2950 with a PERC6 SAS array in it.
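For anyone trying to read these entries, a few things can be decoded from the format alone. `sfcb` is the Small Footprint CIM Broker that ESXi uses for hardware monitoring, and `storelib` appears to be the RAID management library it queries for the PERC6 (that part is an assumption). The `<27>` in the middle of each "last message repeated" line looks like a second record's syslog PRI header glued onto the first, which suggests the receiving syslog server did not split the records. Per RFC 3164, PRI = facility * 8 + severity, so 27 decodes to facility 3 (daemon), severity 3 (err). The minimal Python sketch below decodes that header and adds the collapsed repeats back in, so you can see how often the message actually fired. Its regular expressions are written for these exact lines, not for ESXi logs in general, and the function names are hypothetical.

```python
import re
from collections import Counter

# Sample lines copied from the post above (host address masked as in the original).
LOG = """\
Nov 13 10:47:16 xx.xx.xx.xx sfcb[243308]: storelib Physical Device Device ID : 0x5
Nov 13 10:47:16 xx.xx.xx.xx last message repeated 9 times <27>sfcb[243310]: storelib Physical Device Device ID : 0x5
Nov 13 10:48:41 xx.xx.xx.xx last message repeated 4 times <27>sfcb[3918]: storelib Physical Device Device ID : 0x5
Nov 13 10:48:47 xx.xx.xx.xx last message repeated 29 times <27>sfcb[243475]: storelib Physical Device Device ID : 0x5
Nov 13 10:48:47 xx.xx.xx.xx last message repeated 9 times <27>sfcb[243477]: storelib Physical Device Device ID : 0x5
"""

# RFC 3164: PRI = facility * 8 + severity. Only facility 3 is named here.
FACILITIES = {3: "daemon"}
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

REPEAT = re.compile(r"last message repeated (\d+) times")
PRI = re.compile(r"<(\d+)>")
MESSAGE = re.compile(r"sfcb\[\d+\]: (.*)$")


def decode_pri(pri):
    """Split a syslog PRI value into (facility name, severity name)."""
    facility, severity = divmod(pri, 8)
    return FACILITIES.get(facility, str(facility)), SEVERITIES[severity]


def tally(log_text):
    """Count sfcb messages, adding back the copies syslogd collapsed."""
    counts = Counter()      # message text -> total occurrences
    severities = Counter()  # decoded embedded <PRI> headers -> occurrences
    last = None             # text of the most recent sfcb message
    for line in log_text.splitlines():
        repeat = REPEAT.search(line)
        if repeat and last is not None:
            # "repeated N times" refers to the message *before* this line.
            counts[last] += int(repeat.group(1))
        pri = PRI.search(line)
        if pri:
            severities[decode_pri(int(pri.group(1)))] += 1
        message = MESSAGE.search(line)
        if message:
            # A new record (possibly glued onto the repeat notice) starts here.
            last = message.group(1)
            counts[last] += 1
    return counts, severities


if __name__ == "__main__":
    counts, severities = tally(LOG)
    for text, n in counts.items():
        print(f"{n:4d}  {text}")
    for (facility, severity), n in severities.items():
        print(f"{n:4d}  embedded <PRI>: facility={facility}, severity={severity}")
```

Run against the excerpt, it reports 56 occurrences of the same storelib message between 10:47:16 and 10:48:47, four of them carrying the embedded err-level header. The count alone does not say whether the controller is at fault, but it is a concrete number to bring to Dell or VMware support.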

chavez243
Contributor

Is it just me, or do questions largely go unanswered in these forums? I have 3 or 4 distinct issues with ESXi, most of which seem to have been asked about here before, but they are either a lonely post with no response or a nice long thread of folks commiserating about the same problem without ever getting a solution. That makes evaluating a product more difficult: why would we go whole hog with VMware Infrastructure when we have these apparently unresolvable issues with the free product?

The ONLY thing that keeps me in the VMware camp at this point is that it is the only product I have found, to date, that will support BSD guests. It's getting to the point, though, where I might just switch to Debian and XenServer 5.

SuryaVMware
Expert

No, usually a lot of people answer a lot of questions here. This could be one of those cases where people simply have no clue.

The symptoms look to me like the underlying disk didn't respond for a long time, causing the Windows VMs to freeze. More details about your environment, plus a few logs from the affected ESX server covering the duration of the issue, should help us figure out what happened.

Questions:

1) Storage? (SAN, local disk, or some other type of datastore)

2) How many VMs?

3) Did you see a BSOD?

4) Can you attach the vmkernel log for the duration of this issue?

-Surya

khughes
Virtuoso

Personally, I don't have the knowledge to answer this question, so I'm not going to try and point you in the wrong direction. That being said, yes, some questions do get left unanswered. Sometimes you have to post them a couple of times, since the volume of threads pushes them out of sight of those who do have the information to help you. For some of the more advanced problems, VMware support is your best bet (even though you're evaluating); the forums here aren't filled with people on VMware's payroll. They're here to offer what knowledge they have, when they have time.

Like the previous poster said, providing as much detail about your setup as possible helps quite a bit: hardware, disks, SAN if you have one, type of connection to the SAN, etc.

  • Kyle

-- Kyle "RParker wrote: I guess I was wrong, everything CAN be virtualized "
chavez243
Contributor

1) As mentioned in the OP - storage is PERC6 SAS array - that would be local.

2) 10 VMs, mixed Win32 and BSD

3) The VMs were unresponsive; we could not access them via RDP or the VMiC console. There were no screens to see and no stops in the logs. The problem affected only the Win32 VMs, not the BSD VMs.

4) Today's log attached (since it is full of the kind of entries we saw when we had the problem originally)
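Since Surya asked for logs covering just the duration of the incident, here is a minimal Python sketch for cutting a time window out of a BSD-style syslog file like the one attached. Assumptions: every record starts with the 15-character `Mmm dd hh:mm:ss` stamp seen in the excerpts in this thread, month names are English, and the year (which these stamps do not record) is supplied by the caller. `YEAR = 2000` is an arbitrary placeholder, and the function names are hypothetical.

```python
from datetime import datetime

# Sample lines from this thread; the attached log file's name is not given,
# so a real run would iterate over an open file object instead.
SAMPLE = [
    "Nov 13 10:47:16 xx.xx.xx.xx sfcb[243308]: storelib Physical Device Device ID : 0x5",
    "",
    "Nov 13 10:48:41 xx.xx.xx.xx last message repeated 4 times <27>sfcb[3918]: storelib Physical Device Device ID : 0x5",
    "Nov 13 10:48:47 xx.xx.xx.xx last message repeated 29 times <27>sfcb[243475]: storelib Physical Device Device ID : 0x5",
]

# Placeholder: BSD syslog timestamps carry no year, so one must be supplied.
YEAR = 2000


def parse_stamp(line, year=YEAR):
    """Parse the fixed-width 'Mmm dd hh:mm:ss' prefix of a syslog line."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def in_window(lines, start, end, year=YEAR):
    """Yield lines stamped between start and end, inclusive."""
    for line in lines:
        try:
            stamp = parse_stamp(line, year)
        except ValueError:
            continue  # blank line or not a syslog record
        if start <= stamp <= end:
            yield line


if __name__ == "__main__":
    start = datetime(YEAR, 11, 13, 10, 48, 0)
    end = datetime(YEAR, 11, 13, 10, 49, 0)
    for line in in_window(SAMPLE, start, end):
        print(line)
```

With the sample lines and a 10:48 to 10:49 window, it yields the two records stamped 10:48:41 and 10:48:47. For a real log, pass an open file object in place of `SAMPLE`; supplying the year explicitly keeps `strptime` from guessing it.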

RParker
Immortal

In addition to what khughes said, we do have jobs. This is a self-help forum, and we don't get paid to answer questions, so sometimes we may have time to answer a question or two, and sometimes not. It's random. And this time of year it's difficult with the holidays: people are on vacation or spending time with family, and very few people have the time to monitor every single problem, so the pool of resources is diminished.

And this looks less like a VMware issue and more like a hardware issue, so tech support for this type of question may be hard to find on this forum.

So in the future, keep in mind that this forum is a community. We do try to get to everyone's question, but it's really hard to tell which posters will stick around for the answer, and we don't want to waste our time on questions where people never follow up with more information.

It works both ways: some questions do get missed, but some of the questions we spend time answering go unacknowledged, and the information given isn't heeded. So if a question looks like someone is fishing, or is extremely wild, people may think twice before answering. And some questions can be answered by Google.

And some problems have no definitive answer, like "which box should I buy for this ESX server?" or "I want 100 VMs on one laptop hard drive; how do I get the best performance?" Now, how are we REALLY supposed to answer that? You don't really know whether they are seriously looking for an answer or just want opinions. We do strive to give the best answer for the majority, and we're sorry if we can't answer every question in a timely way.

RParker
Immortal

Also, FYI, Dell offers support for both the hardware AND VMware, so they can troubleshoot not only the system but also take the ESX OS into account. Have you tried approaching Dell with this problem? They may have seen it before.

chavez243
Contributor

While I appreciate the lesson in Community Forums 101, and I understand the "busy" aspect, we have invested serious time and money in our evaluation, with two decked-out 2950s and three 1950s. We aren't some dude in his basement loading ESXi on a Commodore 64. We are experiencing some issues across the board on all testing platforms. The various logs are verbose but cryptic, so we're just trying to get a handle on whether it's just a log-happy application or whether something is legitimately amiss.

Telling someone "it's hardware" is not very useful, no more so than saying "it's software"; you might as well be blaming the flux capacitor at this point!

We aren't so delusional that we expect to consolidate our data center onto a free product; we realize that, just like with everyone else offering a free hypervisor, if you want the tools to manage the project you're going to have to pay. That said, we have to get through the evaluation first. There are a number of us working on this, all of us have googled, and we have not come up with answers, so I volunteered to spend some time in the VMware forums. Which brings us to the current state of affairs.

A while ago now, we had a host with several guests on it exhibit strange behaviour: all the Windows VMs stopped responding, while the BSD VMs kept running (either one of those selective hardware failures, or BSD is just so much better than Windows that it ignores hardware errors). We pored over all the logs and nothing looked different from any other day; we also compared them with the logs from the other ESXi hosts we had running. The host console had an entry about "out of resources", yet this is a dual quad-core, 32GB box running 8 hosts, none of which were resource-heavy. The IT director didn't want it down for long, so troubleshooting gave way to a reboot. The host has been running OK since, but we are watching the logs more closely now and want to understand more about what we see in them.

As for giving back: as soon as I figured out the syslog logging issue I posted about a while back, I immediately posted the fix in your forums to help the next guy.
