VMware Cloud Community
bellocarico
Enthusiast

problem with VM mapping raw disks

Hi all

I'm having some problems with a VM that has 4 physical disks physically mapped (independent).

The problem: if I reboot the host, and with it the VM, the VM often fails to start with an error saying that it can't find one of the disks.

I initially thought of a hardware fault, but the missing disk (when it happens!) is random.

So when the VM boots with the host, it might either boot fine or report a random mapped disk as missing.

Not sure how to start troubleshooting this...

Thanks!

9 Replies
rlund
Enthusiast

So four RDMs? Do you have multiple paths showing? Click Manage Paths. How are these connected: iSCSI, FC, NFS, etc.? How is multipathing set up?

It almost sounds like it's losing its connection on reboot.

Roger

Roger Lund Minnesota VMUG leader Blogger VMware and IT Evangelist My Blog: http://itblog.rogerlund.net & http://www.vbrainstorm.com
bellocarico
Enthusiast

Yes, 4 RDMs and 4 paths. When a disk is not found, the path is not found either. Specifically: when a disk goes missing, so does the "Physical LUN or datastore mapping file" field in the VM properties. Please see the attached image.

One thing I can confirm: this is not a cold boot problem! Even with ESXi and the VM running fine, I can easily hit the problem after a normal reboot.

rlund
Enthusiast

Connection type? Any messages under Events for the VM? Or for the host?

Any storage side logs?

Roger

bellocarico
Enthusiast

When a disk is not found, neither the .vmdk file nor even the device itself is present in the ESXi filesystem! (Checked via SSH by browsing both the /vmfs/volumes/datastore and /vmfs/devices/disks directories.)
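
The manual check described above (browsing /vmfs/devices/disks over SSH after each reboot) could be scripted roughly as follows. This is only a minimal sketch: the function name `missing_devices` and the device names `disk_a` to `disk_d` are hypothetical placeholders, not the real identifiers from this host, and the demo runs against a temporary directory rather than a real ESXi filesystem.

```python
import os
import tempfile


def missing_devices(expected, disks_dir="/vmfs/devices/disks"):
    """Return the expected device names that have no entry in disks_dir."""
    try:
        present = set(os.listdir(disks_dir))
    except FileNotFoundError:
        # The directory itself is absent (e.g. not running on an ESXi host),
        # so every expected device counts as missing.
        present = set()
    return sorted(name for name in expected if name not in present)


if __name__ == "__main__":
    # Demo against a temporary directory so the sketch runs anywhere.
    # On a real host, pass the actual device identifiers instead.
    with tempfile.TemporaryDirectory() as demo_dir:
        for name in ("disk_a", "disk_b", "disk_c"):
            open(os.path.join(demo_dir, name), "w").close()
        expected = ["disk_a", "disk_b", "disk_c", "disk_d"]
        print("missing:", missing_devices(expected, demo_dir))
```

Logging the result after every host reboot would show whether the same device or a different one drops out each time, which is the "random disk" pattern reported earlier in the thread.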

I'm starting to think this is more of a hard disk firmware problem... or ESXi at most, but I can't see anything wrong with the VM itself.

The error message says something like: impossible to boot up the VM as one of the disks with path xxxxxxxxxxx is missing.

I'll paste a screenshot of the error the next time this happens.

At this point I wish it was "just" a hard disk failure. Now I live in fear that my VM might not survive a host reboot and that manual intervention (additional reboot/s) is likely to be needed.

bellocarico
Enthusiast

P.S. The disks are SATA, attached to the motherboard controller. According to the POST, the disks are always found by the BIOS. Having said that, I have in the past seen disks connected only by the SATA cable (no power) still being detected by the BIOS.

There's also a SATA DVD drive attached to the same controller; perhaps I'd better try disconnecting it to see whether it is causing the problem.

bellocarico
Enthusiast

This is the error I get when one disk is missing:

rlund
Enthusiast

Try updating the firmware. Are the server, the device, and the RAID controller on the HCL?

bellocarico
Enthusiast

I have a similar server with very similar HW specs and it works fine.

I *think* I've found the problem though...

The disks are WD20EARS, which have a well-known problem with their spin idle timer. I wasn't able to modify the hard disk firmware as suggested on the WD website, but I've noticed that the problem disappears (3 consecutive host reboots with no problems so far) if I cut down the POST time. So I've skipped things like the RAM test and other checks in the BIOS to boot faster.

If that's really the problem, I guess ESXi was finding "lazy" drives (due to the idle timer kicking in) and identifying them as "not available".

bellocarico
Enthusiast

Actually... it's better now, but not sorted! Before, I was getting 1 good reboot out of 4 or 5; now it's only one bad reboot every 3 or 4.

Better but the problem is still there 😞
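
To put the reboot counts from this thread in perspective, here is a rough worked calculation. It assumes each reboot succeeds or fails independently, which the thread does not establish, and the helper names `success_rate` and `streak_probability` are made up for illustration. It converts "1 good out of 4 or 5" and "1 bad every 3 or 4" into success rates, then estimates how likely 3 clean reboots in a row would be at each rate.

```python
def success_rate(good, total):
    """Fraction of reboots in which all mapped disks were found."""
    return good / total


def streak_probability(p_good, n):
    """Probability of n clean reboots in a row, assuming independent reboots."""
    return p_good ** n


# Before shortening POST: 1 good reboot out of 4 or 5.
before = [success_rate(1, 5), success_rate(1, 4)]
# After shortening POST: 1 bad reboot every 3 or 4, i.e. 2 of 3 or 3 of 4 good.
after = [success_rate(2, 3), success_rate(3, 4)]

for label, rates in (("before", before), ("after", after)):
    for p in rates:
        streak = streak_probability(p, 3)
        print(f"{label}: success rate {p:.0%}, 3 clean reboots in a row {streak:.1%}")
```

Under that independence assumption, three clean reboots in a row would be a roughly 1 to 2 percent event at the old rate and a roughly 30 to 42 percent event at the new one. So the earlier run of 3 good reboots fits a real improvement, but it was too short a run to show that the problem was gone.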
