VMware Cloud Community
pdirmann01
Enthusiast

Hybrid vSAN 7.0U2 2-Node Cluster HDD Disk Detection

Question for the masses: I'm building a 2-node hybrid vSAN cluster. In each host I have 6x900GB HDDs for capacity and 4x480GB SSDs for cache. These are broken down into 2 disk groups (3x900GB + 1x480GB) per server. When I was creating the cluster, one of the 900GB disks on one host was not showing up in the vSAN configuration window. Just to get by, I set one 900GB disk on the other host to "Do not claim" so that I could complete the configuration and move forward. Coming back to look at this missing disk, I've learned the following:

[:~] esxcli ssacli cmd -q "ctrl slot=0 pd all show"

Smart Array P440ar in Slot 0 (Embedded) (HBA Mode)

HBA Drives

physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS HDD, 900 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS HDD, 900 GB, OK)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS HDD, 900 GB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS HDD, 900 GB, OK)
physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS HDD, 900 GB, OK)
physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS HDD, 900 GB, OK)
physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SATA SSD, 480 GB, OK)
physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SATA SSD, 480 GB, OK)

[:~] esxcli ssacli cmd -q "ctrl slot=0 pd 2I:1:6 show detail"

Smart Array P440ar in Slot 0 (Embedded) (HBA Mode)

HBA Drives

physicaldrive 2I:1:6
Port: 2I
Box: 1
Bay: 6
Status: OK
Drive Type: HBA Mode Drive
Interface Type: SAS
Size: 900 GB
Drive exposed to OS: True
Logical/Physical Block Size: 512/512
Rotational Speed: 10000
Firmware Revision: HPDC
Serial Number: [rem]
WWID: 5000CCA05739F1BD
Model: HP EG0900FBVFQ
Current Temperature (C): 38
Maximum Temperature (C): 50
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
PHY Physical Link Rate: Unknown, Unknown
PHY Maximum Link Rate: Unknown, Unknown
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None

There's no existing RAID configuration or partitions on the disk, and the issue follows the disk if I move it to another slot in the server. Looking for additional thoughts while I dig deeper on my end.

7 Replies
TheBobkin
Champion

@pdirmann01, can you check whether you can write *anything* to it, e.g. by temporarily creating a VMFS datastore on it?

If you can't, then potentially the device is read-only or otherwise not being detected correctly (one example being NMP failing READ CAPACITY); /var/log/vmkernel.log should show the reason.
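A quick way to pull those entries for a single device is to filter the log on its identifier. This is a minimal sketch, not an ESXi tool: the function name `device_log_lines` is hypothetical, and it assumes only a POSIX shell with `grep -F` and `tail -n` available.

```shell
#!/bin/sh
# Hypothetical helper (not part of ESXi): print the most recent log lines
# that mention a given device identifier.
#   $1 = log file (e.g. /var/log/vmkernel.log)
#   $2 = device identifier (e.g. naa.5000cca05739f1bc)
#   $3 = how many matching lines to keep (default 20)
device_log_lines() {
    log=$1
    dev=$2
    n=${3:-20}
    # -F matches the identifier as a fixed string, so the "." in "naa." is
    # taken literally instead of as a regex wildcard.
    grep -F -- "$dev" "$log" | tail -n "$n"
}
```

For example, `device_log_lines /var/log/vmkernel.log naa.5000cca05739f1bc 50` shows the last 50 log lines for the disk in question.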

pdirmann01
Enthusiast

Valid thought. I'll attempt to format it with VMFS and let you know. I assumed this wouldn't be an issue because it had a previous partition table that I wiped. Stand by!

TheBobkin
Champion

@pdirmann01, how were you validating whether wiping the partitions off that disk was successful?

You can check it in the UI but also from the CLI: 'vdq -q' will show the disk as ineligible for vSAN because it has partitions, and 'ls /dev/disks/' will show naa.xxxxxxxxxxxxxxx:1 (or however many partitions it has).
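The `ls /dev/disks/` check can be scripted too. This is a minimal sketch with a hypothetical function name (`list_partitions` is not an ESXi command); it assumes partition nodes are named `<device>:<number>` as described above, and it prints nothing when the device has no partitions.

```shell
#!/bin/sh
# Hypothetical helper: list the partition nodes of one device in a directory
# such as /dev/disks. Partitions are assumed to be named "<device>:<number>".
#   $1 = directory to inspect (e.g. /dev/disks)
#   $2 = device name (e.g. naa.5000cca05739f1bc)
list_partitions() {
    dir=$1
    dev=$2
    for entry in "$dir"/"$dev":*; do
        [ -e "$entry" ] || continue     # the glob matched nothing
        name=${entry##*/}               # strip the directory part
        suffix=${name#"$dev":}          # keep only what follows "<device>:"
        case $suffix in
            ''|*[!0-9]*) ;;             # skip anything that is not a plain number
            *) echo "$name" ;;
        esac
    done
}
```

For example, `list_partitions /dev/disks naa.5000cca05739f1bc` should come back empty once the disk has been cleaned.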

pdirmann01
Enthusiast

@TheBobkin, it was via the GUI, but good call!

{
"Name" : "naa.5000cca05739f1bc",
"VSANUUID" : "",
"State" : "Ineligible for use by VSAN",
"Reason" : "Has partitions",
"IsSSD" : "0",
"IsCapacityFlash": "0",
"IsPDL" : "0",
"Size(MB)" : "858483",
"FormatType" : "512n",
"IsVsanDirectDisk" : "0"
},
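To pick the verdict for one disk out of the full `vdq -q` listing, a small awk filter is enough. This sketch assumes the layout shown above (one `"Key" : "Value"` pair per line, with `"Name"` first in each disk's block); `vdq_reason` is a hypothetical name, not part of vdq.

```shell
#!/bin/sh
# Hypothetical helper: read "vdq -q"-style text on stdin and print the State
# and Reason recorded for one disk.
#   $1 = disk name (e.g. naa.5000cca05739f1bc)
vdq_reason() {
    awk -v want="$1" '
        # Split each line on double quotes: $2 is the key, $4 is the value.
        BEGIN { FS = "\"" }
        $2 == "Name"                      { current = $4 }
        current == want && $2 == "State"  { print "State: " $4 }
        current == want && $2 == "Reason" { print "Reason: " $4 }
    '
}
```

With output laid out like the block above, `vdq -q | vdq_reason naa.5000cca05739f1bc` would print `State: Ineligible for use by VSAN` and `Reason: Has partitions`.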

pdirmann01
Enthusiast

[:~] partedUtil delete /dev/disks/naa.5000cca05739f1bc 3
Error: Function not implemented during read on /dev/disks/naa.5000cca05739f1bc
Error: The primary GPT table states that the backup GPT is located beyond the end of disk. This may happen if the disk has shrunk or partition table is corrupted. Fix, by writing backup table at the end? This will also fix the last usable sector appropriately as per the new reduced size. diskPath (/dev/disks/naa.5000cca05739f1bc) diskSize (1758174768) AlternateLBA (8790539823) LastUsableLBA (8790539790)
Warning: The available space to /dev/disks/naa.5000cca05739f1bc appears to have shrunk. This may happen if the disk size has reduced. The space has been reduced by (7032365056 blocks). You can fix the GPT to correct the available space or continue with the current settings ? This will also move the backup table at the end if it is not at the end already. diskSize (1758174768) AlternateLBA (8790539823) LastUsableLBA (8790539790) NewLastUsableLBA (1758174734)
Error: Can't have a partition outside the disk!
Unable to construct disk from device /dev/disks/naa.5000cca05739f1bc

[:~] partedUtil fix /dev/disks/naa.5000cca05739f1bc
Error: Function not implemented during read on /dev/disks/naa.5000cca05739f1bc
Error: The primary GPT table states that the backup GPT is located beyond the end of disk. This may happen if the disk has shrunk or partition table is corrupted. Fix, by writing backup table at the end? This will also fix the last usable sector appropriately as per the new reduced size. diskPath (/dev/disks/naa.5000cca05739f1bc) diskSize (1758174768) AlternateLBA (8790539823) LastUsableLBA (8790539790)
Warning: The available space to /dev/disks/naa.5000cca05739f1bc appears to have shrunk. This may happen if the disk size has reduced. The space has been reduced by (7032365056 blocks). You can fix the GPT to correct the available space or continue with the current settings ? This will also move the backup table at the end if it is not at the end already. diskSize (1758174768) AlternateLBA (8790539823) LastUsableLBA (8790539790) NewLastUsableLBA (1758174734)
Error: Can't have a partition outside the disk!
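The numbers in these messages explain the failure. partedUtil reports diskSize and AlternateLBA in sectors, and this drive uses 512-byte sectors (Logical/Physical Block Size: 512/512 in the ssacli detail). The sketch below converts them using plain POSIX shell arithmetic; the function name is hypothetical.

```shell
#!/bin/sh
# Compare the size the device reports with the size the on-disk GPT describes.
# Inputs are sector counts from the partedUtil messages; 512-byte sectors assumed.
#   $1 = diskSize      (sectors the device actually has)
#   $2 = AlternateLBA  (where the primary GPT says the backup header lives)
gpt_size_report() {
    disk_size=$1
    alternate_lba=$2
    # The backup GPT header occupies the last sector and LBAs start at 0,
    # so the GPT describes a device of AlternateLBA + 1 sectors.
    gpt_sectors=$((alternate_lba + 1))
    echo "device bytes:      $((disk_size * 512))"
    echo "GPT-implied bytes: $((gpt_sectors * 512))"
    echo "missing sectors:   $((gpt_sectors - disk_size))"
}
```

Called as `gpt_size_report 1758174768 8790539823`, it reports 900185481216 bytes for the device (the 900 GB drive), 4500756389888 bytes (about 4.5 TB) for the size the GPT describes, and 7032365056 missing sectors, the same figure partedUtil gives as "reduced by". In other words, the partition table on the disk describes a device roughly five times larger than the drive, which is why partedUtil finds a partition outside the disk and refuses to edit it.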

TheBobkin
Champion (accepted solution)

@pdirmann01, so it looks like the partitions weren't (or couldn't be) removed. Can you try this with dd?
This should also tell you whether the device is read-only. If it is, check whether it is still the same after a cold power-cycle of the host; if it is still read-only or otherwise problematic after that, then you potentially have a bad device:

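# Zero the first 50 MiB of the device (wipes the MBR and primary GPT; destroys any data there)
dd if=/dev/zero of=/vmfs/devices/disks/naa.5000cca05739f1bc bs=1M count=50 conv=notrunc
# Write a new, empty msdos (MBR) partition table
partedUtil mklabel /vmfs/devices/disks/naa.5000cca05739f1bc msdos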
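To see what the dd step does without touching a real disk, it can be tried on a scratch image file. This is a sketch with hypothetical helper names that relies on the primary GPT header starting with the 8-byte signature `EFI PART` at LBA 1 (byte offset 512 with 512-byte sectors). Note that zeroing the first 50 MiB removes only the MBR and primary GPT; dd does not touch the backup GPT in the last sectors of the disk.

```shell
#!/bin/sh
# Hypothetical helpers for experimenting on a disk IMAGE FILE, not a live device.

# Succeeds if the primary GPT signature "EFI PART" is present at byte 512.
# NUL bytes are stripped so a zeroed region compares cleanly as empty.
has_gpt_header() {
    [ "$(dd if="$1" bs=1 skip=512 count=8 2>/dev/null | tr -d '\000')" = "EFI PART" ]
}

# Zero the first $2 MiB (default 50) of $1, like the dd command above.
# conv=notrunc keeps a regular file at full size; without it dd would truncate
# the image to the amount written. A block device has nothing to truncate.
wipe_start() {
    dd if=/dev/zero of="$1" bs=1M count="${2:-50}" conv=notrunc 2>/dev/null
}
```

On a 60 MiB scratch image carrying a GPT signature, `has_gpt_header` succeeds before `wipe_start` and fails after it, and the image stays 60 MiB because of `conv=notrunc`.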

pdirmann01
Enthusiast

Good to go! Thanks for the assist. I think I was severely overthinking this one. 🙂
