The problem:
When adding a new datastore, I am able to view all of the LUNs that I am preparing to add. When I add the storage, I receive an error that the object reference is invalid, and the operation fails.
Events leading up to the problem:
This iSCSI storage is non-production; I'm experimenting with VDR (really cool, by the way; I look forward to future releases). Because this was unimportant storage, when I decided to make some radical changes to it, I simply went into the SAN and told it to reset itself to factory defaults. I made no changes to the vCenter or ESX storage configurations.
Other information:
---When adding the datastore via the vSphere Client's "Add Storage" wizard, the first LUN in my new volume shows as having a VMFS Label of "VDR (head)". Yes, the old datastore that I didn't remove correctly was called VDR.
---The LUNs in the SAN were deleted, and new LUNs were created.
Does anyone have any ideas on where to begin? To me, the logical place to start looking is wherever the now-defunct VDR datastore information is held, so that I can clean it out. I just don't know where that would be.
You're running ESX4?
esxcfg-scsidevs -m
will give you the /dev/name for each VMFS volume name
and
esxcfg-scsidevs -l
gives you the vml-to-/dev/name mapping so you can verify.
dd if=/dev/zero of=/dev/name count=2048 bs=512
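blows it away by writing 1MB of zeros (2048 blocks of 512 bytes) over the start of the disk, which wipes the partition table in the first sector (back it up first!)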
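For anyone wondering what those dd commands actually do, here is a minimal Python sketch of the same idea, assuming you point it at the /dev/name you verified with esxcfg-scsidevs -l. It first copies the first count x bs bytes (2048 x 512 = 1 MiB, as in the dd line above) to a backup file, then overwrites that same range with zeros. The function name backup_and_zero_head and the backup path are hypothetical; this is an illustration of the technique, not a tested ESX tool, and it is exactly as destructive as dd if aimed at the wrong device.

```python
import os

def backup_and_zero_head(device_path, backup_path, count=2048, bs=512):
    """Copy, then zero, the first count*bs bytes of device_path.

    Same effect as running
        dd if=<device> of=<backup> count=2048 bs=512
        dd if=/dev/zero of=<device> count=2048 bs=512
    against a block device (1 MiB with the defaults).
    """
    length = count * bs
    # "r+b" opens for read/write without truncating, like writing to a device.
    with open(device_path, "r+b") as dev:
        head = dev.read(length)
        # Save the original bytes (and force them to disk) before zeroing anything.
        with open(backup_path, "wb") as backup:
            backup.write(head)
            backup.flush()
            os.fsync(backup.fileno())
        # Rewind and overwrite exactly the bytes that were backed up.
        dev.seek(0)
        dev.write(bytes(len(head)))
        dev.flush()
        os.fsync(dev.fileno())
    return len(head)
```

To undo it, write the backup back over the same range; the dd equivalent is dd if=backup_file of=/dev/name bs=512 count=2048.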
vExpert 2009
Thank you for the answer.
I'm reading through the article now, but while I'm reading I'll check a few things here as well. I grew the LUN with the VMFS label back up to 1TB and attempted to re-add it (choosing to keep the existing signature); however, the wizard complains that it cannot read the partition information and will not let me proceed any further.
It sounds like the VMFS header/metadata is destroyed or missing parts, and that's not good. There really is no way to bring it back without the full VMFS header image and metadata. The blog entry talks about header protection.
Recreating the LU on the SAN may have blown that part away or allocated a different raw block assignment.
If the metadata is missing, then your only option is to zero the partition and start fresh.
Excellent article. Parts of it were a bit above my technical knowledge, but I think I understood most of it.
Would you have suggestions on how I can zero out the partition?
fdisk shows approximately 156 different partitions (production datastores and whatnot). I do know which disk it is under /vmfs/devices/disks; do you know of a way I can find out which device that actually is?
Your article mentions the esxcfg-vmhbadevs -m command (whose output I could then just grep for my iSCSI adapter, and run fdisk on each match); however, when I run that command, I am told it does not exist.
Tab-completing 'esxcfg-vm', the only match offered is the vmknic command.
Yes, we're on ESX4.
Using esxcfg-scsidevs -l, I was able to find the disk in question and which actual /dev/name it is.
Specifically, it is /dev/sdam, and there is a partition on it: /dev/sdam1.
Check this out:
Disk /dev/sdam: 549.7 GB, 549756861952 bytes
255 heads, 63 sectors/track, 66837 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdam1               1      133674  1073736341   fb  VMware VMFS
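This partition is a bit over 1TB in size, roughly twice the size of the 549.7 GB disk it sits on (its End cylinder, 133674, is exactly double the disk's 66837 cylinders), and it is exactly the same size as the now-defunct datastore.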
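That mismatch can be spotted mechanically. The following is a hedged sketch, assuming the output of fdisk -l for a single disk in the format shown above, and fdisk's usual 1 KiB units for the Blocks column; oversized_partitions is a hypothetical helper, not part of any ESX tooling. It flags any partition whose End cylinder or size exceeds the disk it sits on.

```python
import re

def oversized_partitions(fdisk_text):
    """Flag partitions in single-disk `fdisk -l` output that run past the disk.

    Returns a list of (device, end_cylinder, disk_cylinders, part_bytes, disk_bytes).
    """
    size = re.search(r"^Disk \S+: .*?, (\d+) bytes", fdisk_text, re.M)
    geom = re.search(r"(\d+) cylinders", fdisk_text)
    if not size or not geom:
        raise ValueError("not recognisable fdisk -l output")
    disk_bytes, disk_cyls = int(size.group(1)), int(geom.group(1))

    flagged = []
    for line in fdisk_text.splitlines():
        fields = line.split()
        # Partition rows start with the device path; header/geometry rows don't.
        if not fields or not fields[0].startswith("/dev/"):
            continue
        cols = fields[1:]
        if cols[0] == "*":                 # optional Boot flag column
            cols = cols[1:]
        end_cyl = int(cols[1])
        blocks = int(cols[2].rstrip("+"))  # a trailing "+" marks an odd sector count
        part_bytes = blocks * 1024         # Blocks are 1 KiB units in fdisk -l
        if end_cyl > disk_cyls or part_bytes > disk_bytes:
            flagged.append((fields[0], end_cyl, disk_cyls, part_bytes, disk_bytes))
    return flagged
```

Fed the output above, it reports /dev/sdam1 ending at cylinder 133674 on a 66837-cylinder disk, with 1,099,506,013,184 bytes of partition against 549,756,861,952 bytes of disk.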
Following your advice, I have resolved the problem with commands that I'm more familiar with (as much as I trust someone with your level of expertise, I do like to know what the commands I'm entering do). I'll post the solution in a separate post and award you full points for guiding me right to the root of the problem.
Solution, which was found via the advice above:
esxcfg-scsidevs -l allowed me to find the /dev/name of the disk that had a bogus partition on it.
The device name in my case was /dev/sdam. Running fdisk -l /dev/sdam showed that a partition was present and that it was 1TB in size (as it would have been before I deleted and rebuilt the LUN), even though /dev/sdam itself was correctly showing up as a 512GB disk. A 1TB partition on a 512GB disk would obviously cause some confusion.
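To fix it, I ran fdisk /dev/sdam and used the following commands to delete the bogus partition (d to delete partition 1, p to print the table and confirm it is gone, w to write the change):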
The number of cylinders for this disk is set to 66837.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): d
Selected partition 1
Command (m for help): p
Disk /dev/sdam: 549.7 GB, 549756861952 bytes
255 heads, 63 sectors/track, 66837 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
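Under the hood, d followed by w on a disk like this edits the classic MBR partition table in sector 0: four 16-byte entries starting at byte 446, followed by the 0x55AA signature at bytes 510-511. The sketch below assumes an msdos/MBR label (the fb Id above is the VMware VMFS type byte); it decodes those entries and zeroes one of them, which is roughly the effect of deleting the only primary partition, though it makes no claim to reproduce fdisk byte for byte. read_mbr_entries and clear_entry are hypothetical names, and the sketch works on a bytes copy of the sector (for example, the first 512 bytes of a dd backup) rather than on a live disk.

```python
import struct

SECTOR = 512
MBR_TABLE_OFFSET = 446   # four 16-byte primary partition entries start here
ENTRY_SIZE = 16

def read_mbr_entries(sector0):
    """Decode the four primary entries of a 512-byte MBR sector.

    Returns [(number, type_byte, start_lba, num_sectors), ...].
    """
    if len(sector0) < SECTOR or sector0[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature (0x55AA) found")
    entries = []
    for i in range(4):
        off = MBR_TABLE_OFFSET + i * ENTRY_SIZE
        raw = sector0[off:off + ENTRY_SIZE]
        ptype = raw[4]  # partition type byte, e.g. 0xFB for VMware VMFS
        # Bytes 8-11: first LBA, bytes 12-15: sector count (little-endian).
        start_lba, num_sectors = struct.unpack_from("<II", raw, 8)
        entries.append((i + 1, ptype, start_lba, num_sectors))
    return entries

def clear_entry(sector0, number):
    """Return a copy of sector0 with primary entry `number` (1-4) zeroed."""
    buf = bytearray(sector0)
    off = MBR_TABLE_OFFSET + (number - 1) * ENTRY_SIZE
    buf[off:off + ENTRY_SIZE] = bytes(ENTRY_SIZE)
    return bytes(buf)
```

An entry of type 0xfb whose start plus sector count runs past 1,073,743,871 sectors (549756861952 / 512) would be the raw-byte form of the oversized partition above.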