Yesterday our datastores became invisible to every ESX 3.5 host in our farm, although the LUNs were still visible from all of the hosts' HBAs.
I searched the Internet and did not find a fix. The trick of enabling LVM.Resignature did not help, and no errors about snapshots appeared in /var/log/vmkernel.
The solution came from the VMware support team, and I'm posting it here for other people who hit this problem (it involves a small trick that you are unlikely to work out on your own).
Here is the magic sequence of commands (x346-10 is one of our ESX 3.5 hosts and /dev/sdn is one of the disks that disappeared, which appeared to contain no partitions). The transcript covers two fdisk sessions on /dev/sdn: the first recreates partition 1 with type fb, and the second moves its start to sector 128:
-
The number of cylinders for this disk is set to 109222.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): p
Disk /dev/sdn: 898.3 GB, 898388459520 bytes
255 heads, 63 sectors/track, 109222 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-109222, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-109222, default 109222):
Using default value 109222
Command (m for help): t
Selected partition 1
Hex code (type L to list codes): fb
Changed system type of partition 1 to fb (Unknown)
Command (m for help): p
Disk /dev/sdn: 898.3 GB, 898388459520 bytes
255 heads, 63 sectors/track, 109222 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdn1               1      109222   877325683+  fb  Unknown
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
The number of cylinders for this disk is set to 109222.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): x
Expert command (m for help): b
Partition number (1-4): 1
New beginning of data (63-1754651429, default 63): 128
Expert command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
-
After that, rescan the storage adapters using the VMware Infrastructure Client, and the lost datastore reappears!
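
For anyone who wants to double-check the result before rescanning, here is a minimal sketch (not part of what VMware support gave us) that reads the partition table from the first sector of a disk and checks that a partition has type fb and starts at sector 128, i.e. 128 * 512 = 65536 bytes into the disk. It assumes Python 3, which the ESX 3.5 service console may not provide, so it may be easier to run it elsewhere against a copy of the first sector (for example one taken with dd if=/dev/sdn of=mbr.bin bs=512 count=1). The names parse_mbr, read_mbr and looks_like_restored_vmfs are made up for this example.

```python
import struct
from typing import List, NamedTuple

SECTOR_SIZE = 512      # bytes per sector, as reported by fdisk in the transcript
VMFS_TYPE = 0xFB       # partition ID set with the 't' command
EXPECTED_START = 128   # starting sector set with the expert 'b' command


class Partition(NamedTuple):
    number: int        # 1-4, position in the primary partition table
    type_id: int       # partition ID byte (0xFB in the transcript)
    start_lba: int     # first sector of the partition
    sector_count: int  # length of the partition in sectors


def parse_mbr(mbr: bytes) -> List[Partition]:
    """Return the non-empty primary partition entries of a 512-byte MBR."""
    if len(mbr) < SECTOR_SIZE:
        raise ValueError("need at least one full sector")
    if mbr[510:512] != b"\x55\xaa":
        raise ValueError("missing MBR boot signature 0x55AA")
    partitions = []
    for i in range(4):
        # The table starts at byte 446 and holds four 16-byte entries.
        entry = mbr[446 + 16 * i: 446 + 16 * (i + 1)]
        # Entry layout: status(1) CHS start(3) type(1) CHS end(3)
        #               LBA start(4, little-endian) sector count(4, little-endian)
        type_id = entry[4]
        start_lba, sector_count = struct.unpack_from("<II", entry, 8)
        if type_id != 0:  # type 0 marks an unused slot
            partitions.append(Partition(i + 1, type_id, start_lba, sector_count))
    return partitions


def looks_like_restored_vmfs(p: Partition) -> bool:
    """True if the entry matches what the fdisk sessions above produce."""
    return p.type_id == VMFS_TYPE and p.start_lba == EXPECTED_START


def read_mbr(path: str) -> bytes:
    """Read the first sector of a device or of a file holding a copy of it."""
    with open(path, "rb") as f:
        return f.read(SECTOR_SIZE)
```

Usage would be something like `for p in parse_mbr(read_mbr("mbr.bin")): print(p, looks_like_restored_vmfs(p))`. The script only reads; it never writes to the disk.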
Hello,
Look for SCSI and related errors in the file /var/log/vmkernel. You may be experiencing a LUN failover.
Best regards,
Edward L. Haletky
VMware Communities User Moderator
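
As a rough illustration of the log check suggested above, the following sketch pulls out every line of /var/log/vmkernel that mentions SCSI. It is only a case-insensitive text filter and assumes nothing about the actual vmkernel message format; the function names scsi_lines and scan_log are hypothetical, and Python 3 is assumed.

```python
import re
from typing import Iterable, Iterator, List

# Case-insensitive match on the word "scsi"; this is a plain text filter,
# not a parser for any particular vmkernel message layout.
SCSI_PATTERN = re.compile(r"scsi", re.IGNORECASE)


def scsi_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield the lines that mention SCSI, without the trailing newline."""
    for line in lines:
        if SCSI_PATTERN.search(line):
            yield line.rstrip("\n")


def scan_log(path: str = "/var/log/vmkernel") -> List[str]:
    """Return all SCSI-related lines from the given log file."""
    with open(path, "r", errors="replace") as f:
        return list(scsi_lines(f))
```

Calling `scan_log()` on the host, or on a copy of the log, returns the matching lines for manual review; what counts as an error is still for the reader to judge.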
No, there is no problem with failover or multipathing; both were and are working fine. This was just a VMware bug, which is now resolved, and the solution is given above. The aim of this discussion is to make that solution public.