VMware Cloud Community
rbsenske
Contributor

vSAN re-build disk group problem on HPE Synergy

Sorry; it's a long story but I'll try to keep it short and succinct. I'm running vCenter 7.0 on an HPE Synergy (OneView SDDC) stack with a number of blades and in-rack storage provided by a D3940. Each blade is configured with 1 SSD cache drive and 4 HDD capacity drives.

I don't think this is relevant to the issue, but it's how the problem got started: OneView reported a "Drive not authenticated" problem for an SSD (cache tier) drive on one of the blades. Apparently that is not a drive problem but an inability of the controller to communicate with the fancy drive caddy, with the winky lights and switches, that holds the drive for swapping in and out of the enclosure. If re-seating the drive doesn't fix the problem, HPE's solution is to swap the drives around and see if it goes away, before replacing the drive and the caddy with new ones.

Since I didn't have an equivalent spare SSD, I would have to swap the SSDs between two blades.

Since, as I understand it, you can't remove a cache tier drive without removing the entire disk group, I migrated all the VMs off two blades in preparation for the swap of the SSDs and a rebuild of the disk groups. I removed the whole disk groups (1x SSD cache + 4x HDD capacity each) and swapped the SSDs - all green lights. Going to plan, I thought. Then...
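(For what it's worth, I did the removal through the vSphere client. I believe the CLI equivalent is roughly the below, removing the whole disk group by its cache device with a full data evacuation - I'm going from memory on the exact flags, so treat it as a sketch rather than exactly what I ran:

# esxcli vsan storage list
# esxcli vsan storage remove -s <naa.###> -m evacuateAllData

Either way, the disk groups were gone and the host showed no vSAN storage claimed.)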

When I went to rebuild the disk groups, I saw the capacity tier HDDs in the available list to rebuild the two disk groups but NO SSDs for the cache tier... and you can't rebuild a disk group without a cache drive.

I don't know whether vSAN didn't properly/completely remove the cache drives, or whether Synergy/OneView grabbed or didn't release the SSDs, but either way they don't reappear in the available list to rebuild the disk groups.

I've gone through many futile exercises trying to get the SSDs to reappear so I can rebuild the disk groups on the blades, but to no avail.

Has anyone else encountered a problem like this? Is it a VMware vSAN problem or an HPE Synergy problem? Or is it just a case of an "easy wrench thrown into a too-smart SDDC technology"?

(I've left out a lot of details and some steps to keep this short... and to get some ideas, so let me know what additional information you might need.)

Any of your thoughts on this matter would be much appreciated.


Thanks

Randy

TheBobkin
Champion

@rbsenske, when it comes to disks being ineligible for vSAN (or any other file system), this can be initially approached from either end of the stack but I tend to advise starting in the middle:
Are the devices seen by ESXi? This is dependent on PSA/HPP layers being able to successfully interact with the device and establish path(s) to it etc.
This is easily validated via vSphere client - Host > Configure > Storage Devices > select the device
While you are at this location, you can also check whether it sees partitions on the device or not and of what type (though note, some partially/unsuccessfully removed or test partitions won't be detected here).
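If you have SSH access to the host, the same check can be done from the shell - something along these lines (both are read-only queries):

# esxcli storage core device list
# vdq -q

The first lists every device ESXi can see (and its operational state), the second is the vSAN disk-query tool and should tell you whether each device is considered eligible for vSAN and, if not, the reason it gives (e.g. existing partitions).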

 

If the devices are seen then check if they have partitions as per above - if they do then these will need to be removed before these will show as eligible for claim.
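You can also list (and, if needed, delete) partitions from the CLI - a quick sketch, using the same naa.### placeholder for your device:

# partedUtil getptbl /vmfs/devices/disks/<naa.###>
# partedUtil delete /vmfs/devices/disks/<naa.###> 1

getptbl shows the partition table, delete removes the numbered partition (1 in this example) - obviously only do this once you are certain the data on the old Disk-Group is no longer needed.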
If they don't have partitions (or it sees none but still can't write a new one) then validate you can write to them, either by putting a temporary VMFS-6 partition on them or via CLI by overwriting the existing partition table, e.g.:
(NOTE: this is permanently destructive to any partition data on this device, the assumption here is that you evacuated all data off these Disk-Groups, that all of your data is currently healthy and the original Disk-Groups are not needed)
# dd if=/dev/zero of=/vmfs/devices/disks/<naa.###> bs=1M count=50 conv=notrunc
# partedUtil mklabel /vmfs/devices/disks/<naa.###> msdos

 

If you cannot write to the devices OR cannot even see the devices from the ESXi perspective, then you have an issue on the controller/hardware side. This could be numerous things, including the devices not properly supporting hot-add/swap and the servers needing a reboot to see them properly, or the devices losing their controller mapping and sitting in 'unconfigured-good' or another unusable state.
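For a quick look at the adapter/path side from ESXi itself (again just queries, plus a rescan), something like:

# esxcli storage core adapter list
# esxcli storage core adapter rescan --all
# esxcli storage core path list -d <naa.###>

If the devices still don't show up after a rescan, then the problem is below ESXi - i.e. in the controller/D3940 drive mapping that OneView manages - and that is where I would look next.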

 

rbsenske
Contributor

Thanks very much for your response. I believe the answer is yes, ESXi can see the disks, but I can confirm that tomorrow as I plan to go into the office. (I can only access the systems from on site.)

The disks were originally part of a healthy vSAN configuration until I removed the disk group to swap a cache tier drive, then I think there was some unexpected interaction with the HPE OneView S/W, taking control of the SSDs. The HDDs seemed to be released fine and are available for a disk group rebuild.

I will verify and try the things you suggest tomorrow and post an update.

I very much appreciate your input on this.

Thanks

Randy
