VMware Cloud Community
sc_2111
Enthusiast

ESX server VM unresponsive

We have a server in a cluster of two that randomly has a problem with the VMs running on it.

All at once the VMs become unresponsive and cannot be taken offline in any way. Not only can they not be reached remotely, but the VMware console doesn't work either.

The "stop" button doesn't work.

The questions are:

- What could the reason be?

- How can we force the VM offline from the command line?

On the ESX console we found this error, but we don't know whether it is related to the VM problem:

hda: cdrom_pc_intr: the drive appears confused (ireason = 0x 1)

hda: lost_interrupt

Thanks

10 Replies
krishnaprasad
Hot Shot

You can stop or shut down a VM from the command line with the vmware-cmd command.

vmware-cmd <path to the VM's .vmx file> stop

.....

Check the state of the VM with the getstate operation of vmware-cmd. Is any particular application running on the affected VMs?
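
A minimal sketch of the two steps above (check the state, then stop) as they might be run from the service console, assuming the classic ESX 3.x `vmware-cmd <cfg> <operation>` syntax. The function name `force_stop_vm` and the `VMWARE_CMD` variable are made up for illustration, and the `.vmx` path is a placeholder for whatever `vmware-cmd -l` lists for the hung VM.

```shell
#!/bin/sh
# Sketch only: check a VM's power state, then request a hard power-off.
# VMWARE_CMD and force_stop_vm are illustrative names, not part of ESX.
VMWARE_CMD="${VMWARE_CMD:-vmware-cmd}"

force_stop_vm() {
    cfg="$1"                          # full path to the VM's .vmx file
    if [ -z "$cfg" ]; then
        echo "usage: force_stop_vm /path/to/vm.vmx" >&2
        return 2
    fi
    "$VMWARE_CMD" "$cfg" getstate     # reports the current power state
    "$VMWARE_CMD" "$cfg" stop hard    # hard power-off, skips guest shutdown
}

# Example (run on the ESX host; the path is a placeholder):
#   vmware-cmd -l
#   force_stop_vm /vmfs/volumes/<datastore>/<vm>/<vm>.vmx
```

As later replies in this thread report, a VM whose process is stuck may not respond to this either, in which case a host reboot may be the only option.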

christianZ
Champion

On all our VMs we disconnect the CD-ROMs; maybe that's your problem.

The VMs poll the CD-ROM when it is connected (at startup). With many VMs doing this, it could be problematic, I think.

sc_2111
Enthusiast

No particular application, as far as I know.

Thanks for the command line. We'll try it the next time it occurs.

jebarber
Enthusiast

I know this is an old thread, but it is the only one I could find with this rather informative error message:

hda: cdrom_pc_intr: the drive appears confused (ireason = 0x 1)

I was getting this every 10 seconds in /var/log/messages.

It seems it has something to do with installing VMware tools and the CD image not disconnecting properly when it's done.

Rebooting the guest should fix the problem.

I had deleted the guest OS, but although it was gone from VC it was still running on the host; running esxtop on the host confirmed this. The only way I was able to kill the VM was to reboot the host.

After doing that, the drive apparently figured out what was going on and was no longer confused.

sunvmman
Enthusiast

This has been a constant problem for me also.

I constantly get the error message

"hda: cdrom_pc_intr the drive appears confused" and the hostd daemon goes into the "D" state with no response.

I can still access my VMs but cannot manage my ESX server.

This has happened 3 times in the last week.

Any help appreciated.

Texiwill
Leadership

Hello,

hostd in D state is a bad thing. However, it may not be related to the CD-ROM. I had a similar issue that looked to be related to networking but was instead related to Emulex HBAnyware. It will take some research to determine the problem, plus some knowledge of what else is running on the SC.


Best regards,

Edward L. Haletky

VMware Communities User Moderator

====

Author of the book 'VMWare ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education.

CIO Virtualization Blog: http://www.cio.com/blog/index/topic/168354

As well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization

--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill
bellym
Contributor

Just to add a little to this old thread....

I experienced a similar problem. I have a single ESX server (3.0.2) that runs 6 VMs. For the second time in the space of a couple of months, 3 of the 6 VMs locked up.

I restarted the mgmt-vmware service and all three of them switched to the 'off' state, while the 3 working VM's continued to function ok.

I couldn't power any of the 3 failed VMs back on, and on further inspection the processes of all 3 failed VMs were still present. No amount of kill -9 or vm-support -X would get rid of these processes, and VMware support confirmed that the only course of action was to restart the host.

In the /var/log/messages log file there were endless 'kernel: hda: lost interrupt' and 'hda: cdrom_pc_intr: The drive appears confused (ireason = 0x 1)' errors, starting at the exact same time the 3 failed VMs hung.

All three of these VMs had a CD-ROM device mapped to /dev/cdrom, whereas the working VMs didn't.

I can only assume that there must be a link of some kind between the cdrom mapping and the locking up.

I have removed the mappings as they were not needed, so fingers crossed it won't occur again.
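
For anyone wanting to check which VMs carry such a mapping, here is a rough sketch that lists the `.vmx` files mentioning `/dev/cdrom`. It assumes the mapping shows up in the `.vmx` file as a line containing that device path and that the VM directories live under `/vmfs/volumes`; the function name `find_host_cdrom_vms` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: list .vmx files that reference the host's physical /dev/cdrom.
# find_host_cdrom_vms is an illustrative name. Assumes the mapping appears
# in the .vmx file as a line containing "/dev/cdrom".
find_host_cdrom_vms() {
    root="${1:-/vmfs/volumes}"        # directory tree to search
    find "$root" -name '*.vmx' -type f 2>/dev/null |
    while IFS= read -r vmx; do
        if grep -q '/dev/cdrom' "$vmx"; then
            echo "$vmx"               # this VM maps the host CD-ROM device
        fi
    done
}

# Example (run on the ESX host):
#   find_host_cdrom_vms /vmfs/volumes
```

Any file it prints belongs to a VM worth reviewing in the VI client before changing anything by hand.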

Matt_B1
Enthusiast

I am hoping someone has figured this out because this issue is killing us on a few hosts. This is the issue...

Jan 6 23:47:56 usri-pvrt-e08 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x 1)

Jan 6 23:48:06 usri-pvrt-e08 kernel: hda: lost interrupt

Jan 6 23:48:06 usri-pvrt-e08 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x 1)

Jan 6 23:48:16 usri-pvrt-e08 kernel: hda: lost interrupt

Once this happens, the host is unresponsive in VC.
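
To get a feel for how often the two messages recur on a host, something like the following could be run in the service console. It is only a sketch: `count_hda_errors` is a made-up name, and the default path is simply the /var/log/messages location mentioned earlier in the thread.

```shell
#!/bin/sh
# Sketch only: count the two hda messages quoted in this thread in a syslog file.
# count_hda_errors is an illustrative name, not an ESX tool.
count_hda_errors() {
    log="${1:-/var/log/messages}"
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true"
    confused=$(grep -c 'hda: cdrom_pc_intr' "$log" || true)
    lost=$(grep -c 'hda: lost interrupt' "$log" || true)
    echo "cdrom_pc_intr=$confused lost_interrupt=$lost"
}

# Example (run on the ESX host):
#   count_hda_errors /var/log/messages
```

Comparing the counts before and after a change, such as disconnecting the drive, gives a quick check of whether the change had any effect.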

As suggested by VMware, I disconnected the CD-ROM, and that removed this error. However, I now get these errors:

Jan 8 18:58:36 usri-pvrt-e08 kernel: end_request: I/O error, dev 03:00 (hda), sector 0

Jan 8 18:58:36 usri-pvrt-e08 kernel: hda: status error: status=0x20

Jan 8 18:58:36 usri-pvrt-e08 kernel: hda: status error: error=0x20LastFailedSense 0x02

Jan 8 18:58:36 usri-pvrt-e08 kernel: hda: ATAPI reset complete

Jan 8 18:58:36 usri-pvrt-e08 kernel: hda: status error: status=0x20

Jan 8 18:58:36 usri-pvrt-e08 kernel: hda: status error: error=0x20LastFailedSense 0x02

This does help and makes the server responsive in VC, but I would like to have the CD-ROM connected and figure out the real issue.

This error only occurs on HP ProLiant DL585G2s in our environment. We have G1s and G5s that don't see this issue. I also run the HP v8.1 agents. I have stopped the hpasm service to be sure the HP agents were not causing issues, since they are always problematic. I have verified that every VM on these hosts has the CD-ROM set to Client Device and disconnected. I can't imagine that the CD-ROM drives in all 3 of these DL585G2s need to be replaced. We are on VC 2.5 Update 3, and I also saw this issue on VC 2.5 Update 1.

halston4d4
Contributor

Matt.B

I'm having the same issue on my DL585G2s and, like you, am not seeing it on any of the G1 or G5 servers. Did you ever get a resolution to this issue?

kernel: hda: lost interrupt

Matt_B1
Enthusiast

No, it is extremely annoying. I don't understand why I have to manually eject the entire CD-ROM unit from the server to prevent this recurring message. I have to go down to the data center again today to pull another drive. I am surprised more people haven't run into this. I still see the issue after our upgrade to 3.5U4 build 176894.
