Question for the masses - I'm building a 2-node hybrid vSAN cluster. In each host I have 6x900GB HDDs for capacity and 4x480GB SSDs for cache. These are being split into 2 disk groups (3x900GB + 1x480GB) per server. When I was creating the cluster, one of the 900GB disks on one host was not showing up in the vSAN configuration window. Just to get by, I set one 900GB disk on the other host to "Do not claim" so that I could complete the configuration and move forward. Coming back to look at this missing disk, I've learned the following:
[:~] esxcli ssacli cmd -q "ctrl slot=0 pd all show"
Smart Array P440ar in Slot 0 (Embedded) (HBA Mode)
HBA Drives
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS HDD, 900 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS HDD, 900 GB, OK)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS HDD, 900 GB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS HDD, 900 GB, OK)
physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS HDD, 900 GB, OK)
physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS HDD, 900 GB, OK)
physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SATA SSD, 480 GB, OK)
physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SATA SSD, 480 GB, OK)
[:~] esxcli ssacli cmd -q "ctrl slot=0 pd 2I:1:6 show detail"
Smart Array P440ar in Slot 0 (Embedded) (HBA Mode)
HBA Drives
physicaldrive 2I:1:6
Port: 2I
Box: 1
Bay: 6
Status: OK
Drive Type: HBA Mode Drive
Interface Type: SAS
Size: 900 GB
Drive exposed to OS: True
Logical/Physical Block Size: 512/512
Rotational Speed: 10000
Firmware Revision: HPDC
Serial Number: [rem]
WWID: 5000CCA05739F1BD
Model: HP EG0900FBVFQ
Current Temperature (C): 38
Maximum Temperature (C): 50
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
PHY Physical Link Rate: Unknown, Unknown
PHY Maximum Link Rate: Unknown, Unknown
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None
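The controller itself lists all six 900GB HDDs. If you want to tally that list by drive type and size (for example to compare against what the vSAN claim screen offers), here is a small sketch. It assumes the one-line-per-drive format shown above; `tally_drives` is a hypothetical helper name, not an ssacli feature.

```shell
# Hypothetical helper: count drives by "interface type, size" from
# `ssacli ... pd all show` output. Assumes each drive is one line like:
#   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS HDD, 900 GB, OK)
tally_drives() {
  # Split on ", " so $2 is the type (e.g. "SAS HDD") and $3 the size.
  # Only summary lines contain "(port", so detail output is ignored.
  awk -F', ' '/^ *physicaldrive .*\(port/ { n[$2 ", " $3]++ }
              END { for (k in n) print n[k] " x " k }' | sort
}

# Usage on the host:
# esxcli ssacli cmd -q "ctrl slot=0 pd all show" | tally_drives
```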
There's no existing RAID configuration or partitions on the disk, and the issue follows the disk if I move it to another slot in the server. Looking for additional thoughts while I dig deeper on my end.
@pdirmann01, can you check whether you can write *anything* to it, e.g. temporarily make a VMFS datastore with it?
If you can't, then the device is potentially read-only or otherwise not being detected correctly (one example being NMP failing READ CAPACITY); /var/log/vmkernel.log should show the reason.
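A minimal sketch of that log check, assuming a POSIX shell on the host. The helper name `device_log_lines` is hypothetical, and because the exact vmkernel.log wording for a failed READ CAPACITY or a read-only device can vary, it only filters on the device ID rather than matching a specific message.

```shell
# Hypothetical helper: show the most recent vmkernel.log lines that
# mention a given device ID. Message text varies, so match the ID only.
device_log_lines() {
  dev="$1"                             # e.g. naa.5000cca05739f1bc
  log="${2:-/var/log/vmkernel.log}"    # default ESXi kernel log location
  count="${3:-20}"                     # number of trailing matches to print
  # -F: fixed string (the "." in "naa." is not a wildcard)
  # -i: the ID may be logged in upper or lower case
  grep -iF "$dev" "$log" | tail -n "$count"
}

# Usage on the host:
# device_log_lines naa.5000cca05739f1bc
```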
Valid thought. I'll attempt to format it with VMFS and let you know. I assumed this wouldn't be an issue because it had a previous partition table that I wiped. Stand by!
@pdirmann01, How were you validating that wiping partitions off that disk was successful or not?
You can check it in the UI but also from the CLI: 'vdq -q' will show the disk as ineligible for vSAN because it has partitions, and 'ls /dev/disks/' will show naa.xxxxxxxxxxxxxxx:1 (or however many partitions it has).
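The `ls /dev/disks/` check can be scripted. This is a sketch that relies only on partitions appearing as `<device>:<n>` entries, as described above; `list_partitions` and `has_partitions` are hypothetical helper names, and the directory is a parameter so the logic can be tried against a scratch folder first.

```shell
# Hypothetical helper: print the partition entries (<device>:<n>)
# that exist for one device under /dev/disks (or another directory).
list_partitions() {
  dev="$1"
  dir="${2:-/dev/disks}"
  # Quoted parts are literal; only the trailing * is a glob.
  for path in "$dir"/"$dev":*; do
    [ -e "$path" ] || continue   # glob matched nothing
    echo "${path##*/}"           # strip the directory part
  done
}

# Succeeds (exit 0) when the device still has at least one partition.
has_partitions() {
  [ -n "$(list_partitions "$@")" ]
}

# Usage on the host:
# has_partitions naa.5000cca05739f1bc && echo "still partitioned"
```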
@TheBobkin - it was via GUI, but good call!
{
"Name" : "naa.5000cca05739f1bc",
"VSANUUID" : "",
"State" : "Ineligible for use by VSAN",
"Reason" : "Has partitions",
"IsSSD" : "0",
"IsCapacityFlash": "0",
"IsPDL" : "0",
"Size(MB)" : "858483",
"FormatType" : "512n",
"IsVsanDirectDisk" : "0"
},
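On a host with many disks, the `vdq -q` output can be filtered down to just the ineligible ones. A sketch with awk, assuming the one-field-per-line layout shown above with "Name" appearing before "State" and "Reason" in each record; `ineligible_disks` is a hypothetical helper name.

```shell
# Hypothetical filter: print "<name>: <reason>" for every disk whose
# State starts with "Ineligible". Splitting on double quotes makes
# $2 the key and $4 the value on lines like:  "Reason" : "Has partitions",
ineligible_disks() {
  awk -F'"' '
    $2 == "Name"   { name = $4 }
    $2 == "State"  { state = $4 }
    $2 == "Reason" { if (state ~ /^Ineligible/) print name ": " $4 }
  '
}

# Usage on the host:
# vdq -q | ineligible_disks
```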
[:~] partedUtil delete /dev/disks/naa.5000cca05739f1bc 3
Error: Function not implemented during read on /dev/disks/naa.5000cca05739f1bc
Error: The primary GPT table states that the backup GPT is located beyond the end of disk. This may happen if the disk has shrunk or partition table is corrupted. Fix, by writing backup table at the end? This will also fix the last usable sector appropriately as per the new reduced size. diskPath (/dev/disks/naa.5000cca05739f1bc) diskSize (1758174768) AlternateLBA (8790539823) LastUsableLBA (8790539790)
Warning: The available space to /dev/disks/naa.5000cca05739f1bc appears to have shrunk. This may happen if the disk size has reduced. The space has been reduced by (7032365056 blocks). You can fix the GPT to correct the available space or continue with the current settings ? This will also move the backup table at the end if it is not at the end already. diskSize (1758174768) AlternateLBA (8790539823) LastUsableLBA (8790539790) NewLastUsableLBA (1758174734)
Error: Can't have a partition outside the disk!
Unable to construct disk from device /dev/disks/naa.5000cca05739f1bc
[:~] partedUtil fix /dev/disks/naa.5000cca05739f1bc
Error: Function not implemented during read on /dev/disks/naa.5000cca05739f1bc
Error: The primary GPT table states that the backup GPT is located beyond the end of disk. This may happen if the disk has shrunk or partition table is corrupted. Fix, by writing backup table at the end? This will also fix the last usable sector appropriately as per the new reduced size. diskPath (/dev/disks/naa.5000cca05739f1bc) diskSize (1758174768) AlternateLBA (8790539823) LastUsableLBA (8790539790)
Warning: The available space to /dev/disks/naa.5000cca05739f1bc appears to have shrunk. This may happen if the disk size has reduced. The space has been reduced by (7032365056 blocks). You can fix the GPT to correct the available space or continue with the current settings ? This will also move the backup table at the end if it is not at the end already. diskSize (1758174768) AlternateLBA (8790539823) LastUsableLBA (8790539790) NewLastUsableLBA (1758174734)
Error: Can't have a partition outside the disk!
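The numbers in those messages are consistent with a GPT that was written for a much larger device. A sketch of the arithmetic, assuming 512-byte sectors (the drive reports 512/512 logical/physical) and the standard GPT layout, where the backup header sits on the last LBA of the device and is preceded by 32 sectors of partition entries (128 entries of 128 bytes). The variable names are just for illustration.

```shell
# Values copied from the partedUtil messages above.
sector=512               # logical block size reported by ssacli (512/512)
disk_size=1758174768     # diskSize: sectors the host actually sees
alt_lba=8790539823       # AlternateLBA: where the stale GPT puts its backup header
last_usable=8790539790   # LastUsableLBA recorded in the stale GPT

# The backup header lives on the last LBA of the device it was written for.
old_size=$((alt_lba + 1))
# Distance from the real last LBA (disk_size - 1) to the recorded backup header.
shrink=$((alt_lba - (disk_size - 1)))
# 1 backup header + 32 entry sectors at the end: last usable = size - 34.
new_last_usable=$((disk_size - 34))

echo "disk the host sees:   $((disk_size * sector / 1000000000)) GB"
echo "disk the GPT expects: $((old_size * sector / 1000000000)) GB"
echo "shrink:               $shrink blocks"
echo "stale LastUsableLBA consistent: $([ $((alt_lba - 33)) -eq "$last_usable" ] && echo yes || echo no)"
echo "NewLastUsableLBA:     $new_last_usable"
```

Run as-is, it prints 900 GB for the disk the host sees and 4500 GB for the device the stale table describes, and it reproduces both the 7032365056-block shrink and the NewLastUsableLBA of 1758174734 from the messages. Where that larger device came from is not something the thread establishes.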
@pdirmann01, so it looks like the partitions weren't (or couldn't be) removed. Can you try this with dd?
This should also tell you whether it is read-only. If it is, see whether it is the same after a cold power-cycle of the host; if it is still read-only or otherwise problematic after that, then you potentially have a bad device:
# dd if=/dev/zero of=/vmfs/devices/disks/naa.5000cca05739f1bc bs=1M count=50 conv=notrunc
# partedUtil mklabel /vmfs/devices/disks/naa.5000cca05739f1bc msdos
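To see what that `dd` does without touching a real disk, here is a sketch that runs the same command against a 100 MiB sparse scratch file standing in for the device (the scratch file and its size are arbitrary assumptions). It shows that `bs=1M count=50` zeroes only the first 50 MiB, which covers the primary GPT (protective MBR at LBA 0, header at LBA 1, entries normally at LBA 2 to 33), and that `conv=notrunc` leaves the rest of the "device" and its size alone. `partedUtil mklabel ... msdos` then writes a fresh, empty msdos partition table on the real disk.

```shell
img=$(mktemp)   # scratch file standing in for /vmfs/devices/disks/naa...
# Make it a 100 MiB sparse file (204800 sectors of 512 bytes).
dd if=/dev/zero of="$img" bs=1M count=0 seek=100 2>/dev/null

# Plant a GPT-style signature at LBA 1 and another on the last sector.
printf 'EFI PART' | dd of="$img" bs=512 seek=1 conv=notrunc 2>/dev/null
printf 'EFI PART' | dd of="$img" bs=512 seek=204799 conv=notrunc 2>/dev/null

# The command from the thread, pointed at the scratch file instead.
dd if=/dev/zero of="$img" bs=1M count=50 conv=notrunc 2>/dev/null

# Count non-zero bytes in <count> blocks of <bs> starting at block <skip>.
nonzero() {
  echo $(( $(dd if="$img" bs="$1" skip="$2" count="$3" 2>/dev/null | tr -d '\000' | wc -c) ))
}

head_left=$(nonzero 1M 0 50)       # first 50 MiB: wiped
tail_left=$(nonzero 512 204799 1)  # last sector: untouched
size=$(( $(wc -c < "$img") ))      # notrunc keeps the original size

echo "non-zero bytes in first 50 MiB: $head_left"
echo "non-zero bytes in last sector:  $tail_left"
echo "image size in bytes:            $size"
rm -f "$img"
```

It should report 0 non-zero bytes at the start, the 8-byte signature still present at the end, and an unchanged size of 104857600 bytes. On the real host, a failure of the same `dd` is what would point to a read-only or otherwise problematic device.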
Good to go! Thanks for the assist. I think I was severely overthinking this one. 🙂