Re: ESX server VM unresponsive

sc_2111 · 05-28-2007

We have a server in a two-node cluster that randomly has a problem with the VMs running on it.

All at once the VMs become unresponsive and cannot be taken offline in any way. Not only can they not be reached remotely, but the VMware console doesn't work either.

The "stop" button doesn't work.

Our questions are:

- What could the reason be?

- How can we force the VM offline from the command line?

On the ESX console we found this error, but we don't know whether it is related to the VM problem:

hda: cdrom_pc_intr: the drive appears confused (ireason = 0x 1)

hda: lost_interrupt

Thanks

krishnaprasad · 05-28-2007

You can stop or shut down a VM from the command line using the vmware-cmd command.

vmware-cmd stop

.....

You can check the VM's status with the status parameter of the vmware-cmd command. Is any particular application running on these VMs?
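For the command-line route, here is a minimal Python sketch of wrapping vmware-cmd. It assumes the ESX 3.x syntax `vmware-cmd <path-to-.vmx> stop <mode>` with the modes soft, hard and trysoft, and `getstate` as the state query (what the post above calls the status parameter); check both against the usage text vmware-cmd prints on your own host. The function names are hypothetical, and the code targets Python 3.7+, which may not be what your service console ships.

```python
import subprocess
from typing import Callable, List

# Power-op modes for "vmware-cmd <vmx> stop" on ESX 3.x (assumed; verify locally).
STOP_MODES = ("soft", "hard", "trysoft")


def build_stop_command(vmx_path: str, mode: str = "trysoft") -> List[str]:
    """Return the argv that stops the VM whose config file is vmx_path."""
    if mode not in STOP_MODES:
        raise ValueError(f"unknown stop mode: {mode!r}")
    return ["vmware-cmd", vmx_path, "stop", mode]


def build_state_command(vmx_path: str) -> List[str]:
    """Return the argv that queries the VM's power state."""
    return ["vmware-cmd", vmx_path, "getstate"]


def run(argv: List[str], runner: Callable = subprocess.run) -> str:
    """Run argv and return its trimmed stdout; raises if the command fails."""
    result = runner(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, run(build_stop_command("/vmfs/volumes/datastore1/web01/web01.vmx", "hard")) would request a hard stop (the path is made up). trysoft is the default here because it tries a clean shutdown first and only falls back to a hard stop if that fails.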

christianZ · 05-28-2007

We disconnect the CD-ROMs on all our VMs; maybe that's your problem.

When a CD-ROM is connected (at startup), the VM polls it; with many VMs doing this at once, it could become a problem, I think.

sc_2111 · 05-29-2007

No particular application, as far as I know.

Thanks for the command line. We'll try it the next time this occurs.

jebarber · 12-06-2007

I know this is an old thread, but it is the only one I could find with this rather informative error message:

hda: cdrom_pc_intr: the drive appears confused (ireason = 0x 1)

I was getting this every 10 seconds in /var/log/messages.
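To confirm that cadence on your own host, here is a small Python sketch that pulls the hda lines out of /var/log/messages and measures the gaps between them. It assumes the classic syslog layout seen later in this thread ("Jan 6 23:47:56 host kernel: ..."), which has no year in the timestamp, so gaps that span New Year will come out wrong. The function names are my own.

```python
from datetime import datetime
from typing import Iterable, List, Tuple

# Substrings of the hda kernel messages quoted in this thread.
PATTERNS = ("cdrom_pc_intr", "lost interrupt")


def find_hda_errors(lines: Iterable[str]) -> List[Tuple[datetime, str]]:
    """Return (timestamp, rest-of-line) for each syslog line with an hda error."""
    hits = []
    for line in lines:
        if "hda:" not in line or not any(p in line for p in PATTERNS):
            continue
        month, day, clock, rest = line.split(None, 3)
        # Classic syslog stamps carry no year; pin a leap year so that
        # strptime accepts every date, including Feb 29.
        stamp = datetime.strptime(f"2000 {month} {day} {clock}",
                                  "%Y %b %d %H:%M:%S")
        hits.append((stamp, rest.strip()))
    return hits


def intervals_seconds(hits: List[Tuple[datetime, str]]) -> List[float]:
    """Seconds between consecutive matching lines (wrong across New Year)."""
    return [(later[0] - earlier[0]).total_seconds()
            for earlier, later in zip(hits, hits[1:])]
```

To use it, pass an open file, e.g. find_hda_errors(open("/var/log/messages")), and feed the result to intervals_seconds; a steady run of 10.0 values would match what is described above.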

It seems to have something to do with installing VMware Tools and the CD image not disconnecting properly when the install is done.

Rebooting the guest should fix the problem.

I had deleted the guest OS, but although it was gone from VC it was still running on the host; running esxtop on the host confirmed this. The only way I was able to kill the VM was to reboot the host.

After doing that, the drive apparently figured out what was going on and was no longer confused.

sunvmman · 09-03-2008

This has been a constant problem for me also.

I constantly get the error message "hda: cdrom_pc_intr: the drive appears confused", and the hostd daemon goes into the "D" state and stops responding.

I can still access my VMs but cannot manage my ESX server.

This has happened three times in the last week.

Any help is appreciated.

Texiwill · 09-04-2008

Hello,

hostd in the D state (uninterruptible sleep, typically waiting on I/O) is a bad thing. However, it may not be related to the CD-ROM. I had a similar issue that looked network-related but turned out to be caused by Emulex HBAnywhere. Pinning down the cause will take some research, plus knowledge of what else is installed on the service console (SC).

Best regards,

Edward L. Haletky

VMware Communities User Moderator

====

CIO Virtualization Blog: http://www.cio.com/blog/index/topic/168354

As well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization


bellym · 12-31-2008

Just to add a little to this old thread....

I experienced a similar problem. I have a single ESX server (3.0.2) that runs 6 VMs. For the second time in a couple of months, 3 of the 6 VMs locked up.

I restarted the mgmt-vmware service, and all three of them switched to the 'off' state, while the 3 working VMs continued to function normally.

I couldn't power any of the 3 failed VMs back on, and on further inspection the processes of all 3 failed VMs were still present. No amount of kill -9 or vm-support -X would get rid of these processes, and VMware support confirmed that the only course of action was to restart the host.
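If you want to see which VM processes are still hanging around before reaching for kill -9, here is a Python sketch that filters `ps -ef` output for vmware-vmx processes and extracts each PID and .vmx path. It assumes each VM shows up in the service console as a process whose command ends in vmware-vmx, with its .vmx file somewhere on the command line; the exact argument layout may differ between ESX releases, and paths containing spaces will not be picked up correctly. As described above, a process stuck in the kernel may survive kill -9 anyway, so this only helps identify the PIDs.

```python
from typing import List, Optional, Tuple


def vmx_processes(ps_output: str) -> List[Tuple[int, Optional[str]]]:
    """Return (pid, vmx_path) for every vmware-vmx process in `ps -ef` output.

    vmx_path is None when no argument ending in .vmx is found.
    """
    found = []
    for line in ps_output.splitlines():
        fields = line.split()
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD [args...]
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header or malformed line
        cmd = fields[7:]
        if not cmd[0].endswith("vmware-vmx"):
            continue  # skips other processes, including "grep vmware-vmx"
        vmx = next((arg for arg in cmd if arg.endswith(".vmx")), None)
        found.append((int(fields[1]), vmx))
    return found
```

Feed it the text of subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout, then compare the .vmx paths with the VMs that VC shows as powered off.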

In the /var/log/messages log file there were endless 'kernel: hda: lost interrupt' and 'hda: cdrom_pc_intr: The drive appears confused (ireason = 0x 1)' errors, starting at exactly the same time the 3 failed VMs hung.

All three of these VMs had a CD-ROM device mapped to /dev/cdrom, whereas the working VMs didn't.

I can only assume that there must be a link of some kind between the cdrom mapping and the locking up.

I have removed the mappings as they were not needed, so fingers crossed it won't occur again.
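To check which VMs still have this kind of mapping, here is a Python sketch that parses .vmx text and reports CD-ROM devices pointed at /dev/cdrom that are present and set to connect at power-on. The key names (deviceType, fileName, present, startConnected) follow the usual .vmx conventions, but treat them, and the choice to count a missing startConnected as TRUE, as assumptions to verify against your own files.

```python
import re
from typing import Dict, List

# One key = "value" assignment per line, as in a .vmx file.
LINE_RE = re.compile(r'^\s*([\w:.\-]+)\s*=\s*"(.*)"\s*$')


def parse_vmx(text: str) -> Dict[str, str]:
    """Parse .vmx text into a dict with lower-cased keys."""
    settings = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            settings[m.group(1).lower()] = m.group(2)
    return settings


def host_cdrom_mappings(settings: Dict[str, str]) -> List[str]:
    """Return device prefixes (e.g. "ide0:0") whose CD-ROM is the host's
    /dev/cdrom, is present, and connects at power-on."""
    hits = []
    for key, value in settings.items():
        if not key.endswith(".devicetype") or "cdrom" not in value.lower():
            continue
        dev = key[: -len(".devicetype")]
        present = settings.get(dev + ".present", "FALSE").upper() == "TRUE"
        # Assumption: a missing startConnected is treated as TRUE so the
        # device is reported rather than silently skipped.
        starts = settings.get(dev + ".startconnected", "TRUE").upper() == "TRUE"
        if present and starts and settings.get(dev + ".filename") == "/dev/cdrom":
            hits.append(dev)
    return sorted(hits)
```

For example, run host_cdrom_mappings(parse_vmx(open(path).read())) for each .vmx file under /vmfs/volumes; an empty list means no device in that VM matched.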

Matt_B1 · 01-08-2009

I am hoping someone has figured this out because this issue is killing us on a few hosts. This is the issue...

Jan 6 23:47:56 usri-pvrt-e08 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x 1)

Jan 6 23:48:06 usri-pvrt-e08 kernel: hda: lost interrupt

Jan 6 23:48:06 usri-pvrt-e08 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x 1)

Jan 6 23:48:16 usri-pvrt-e08 kernel: hda: lost interrupt

Once this happens, the host is unresponsive in VC.

As suggested by VMware, I disconnected the CD-ROM, and that removed this error. However, I now get these errors:

Jan 8 18:58:36 usri-pvrt-e08 kernel: end_request: I/O error, dev 03:00 (hda), sector 0

Jan 8 18:58:36 usri-pvrt-e08 kernel: hda: status error: status=0x20

Jan 8 18:58:36 usri-pvrt-e08 kernel: hda: status error: error=0x20LastFailedSense 0x02

Jan 8 18:58:36 usri-pvrt-e08 kernel: hda: ATAPI reset complete

Jan 8 18:58:36 usri-pvrt-e08 kernel: hda: status error: status=0x20

Jan 8 18:58:36 usri-pvrt-e08 kernel: hda: status error: error=0x20LastFailedSense 0x02

Disconnecting the CD-ROM does help and makes the server responsive in VC, but I would like to keep the CD-ROM connected and find the real cause.

This error only occurs on HP ProLiant DL585 G2s in our environment; our G1s and G5s don't see this issue. I also run the HP v8.1 agents, but I have stopped the hpasm service to rule them out, since they are always problematic. I have verified that every VM on these hosts has its CD-ROM set to Client Device and disconnected. I can't imagine that the CD-ROM drives in all 3 of these DL585 G2s need to be replaced. We are on VC 2.5 Update 3, and I also saw this issue on VC 2.5 Update 1.
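The second set of errors can be partly decoded. Assuming the ATAPI error-register layout, in which the upper four bits carry the SCSI sense key and the low bits are flags, error=0x20 gives sense key 0x02, which matches the LastFailedSense 0x02 in the log. The Python sketch below does that split. Sense key 2 is NOT READY, which usually means no medium is loaded; that would fit a drive being polled with an empty tray, but it is my reading, not something confirmed in this thread.

```python
# SCSI sense-key names; the tuple index is the sense key value (0x0-0xF).
SENSE_KEYS = (
    "NO SENSE", "RECOVERED ERROR", "NOT READY", "MEDIUM ERROR",
    "HARDWARE ERROR", "ILLEGAL REQUEST", "UNIT ATTENTION", "DATA PROTECT",
    "BLANK CHECK", "VENDOR SPECIFIC", "COPY ABORTED", "ABORTED COMMAND",
    "EQUAL (obsolete)", "VOLUME OVERFLOW", "MISCOMPARE", "RESERVED",
)


def decode_atapi_error(error: int) -> dict:
    """Split an ATAPI error-register byte into its sense key and flag bits."""
    if not 0 <= error <= 0xFF:
        raise ValueError("the error register is a single byte")
    key = error >> 4  # upper nibble: sense key
    return {
        "sense_key": key,
        "sense_name": SENSE_KEYS[key],
        "mcr": bool(error & 0x08),   # media change requested
        "abrt": bool(error & 0x04),  # command aborted
        "eom": bool(error & 0x02),   # end of media
        "ili": bool(error & 0x01),   # illegal length indication
    }
```

decode_atapi_error(0x20)["sense_name"] returns "NOT READY", with all four flag bits clear.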

halston4d4 · 06-03-2009

Matt.B

I'm having the same issue on my DL585 G2s and, like you, am not seeing it on any of the G1 or G5 servers. Did you ever find a resolution to this issue?

kernel: hda: lost interrupt

Matt_B1 · 11-15-2009

No, and it is extremely annoying. I don't understand why I have to physically pull the entire CD-ROM unit from the server to stop this recurring message. I have to go down to the data center again today to pull another drive. I am surprised more people haven't run into this. I still see the issue after our upgrade to 3.5 U4 build 176894.