lspin's Posts

Resolved with the help of VMware support. They stated there is a Dell VxRail KB out there somewhere with the resolution for cases where "Disconnect" and "Reconnect" do not work. The steps we took are below. Hope this helps someone.
STEP 1: Take a snapshot of the vCenter.
STEP 2: SSH to the vCenter and run these commands:
cat /etc/vmware-vpx/vcdb.properties (you'll need the password it outputs for the "psql -d VCDB vc" command below)
cd /opt/vmware/vpostgres/current
psql -d VCDB vc
select ID,DNS_NAME,endorsement_key,attestation_identity_key from VPX_HOST; (you'll need the ID# listed at the start of each host line item)
STEP 3: Ctrl + Z to back out.
STEP 4: Back up the table: /opt/vmware/vpostgres/current/bin/pg_dump -U postgres -t VPX_HOST VCDB > /tmp/VPX_HOST.sql
STEP 5: Go back into the DB and make one change: update VPX_HOST set endorsement_key=' ', attestation_identity_key=' ' where id =ID#; [ID# is the ID of the affected host from the "select ID,DNS_NAME..." list. Example: 726 | 10.10.10.10 | AToAAQALAAMAsgAgg3GXZ0SEs/g....]
STEP 6: Lastly, from the vSphere Client, disconnect and reconnect the host.
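Because the update in the steps above edits the vCenter database directly, it may help to generate the statement rather than type it by hand. The following is only a sketch: `build_tpm_reset_sql` is a hypothetical helper name, not part of any VMware tooling. It prints the same statement used in the steps, and refuses any host ID that is not a plain number.

```shell
#!/bin/sh
# Hypothetical helper (not a VMware tool): print the UPDATE statement from
# the steps above for exactly one host ID.
build_tpm_reset_sql() {
    host_id="$1"
    case "$host_id" in
        ''|*[!0-9]*)
            # Reject empty or non-numeric input so a typo cannot change
            # the WHERE clause.
            echo "error: host ID must be numeric, got '$host_id'" >&2
            return 1
            ;;
    esac
    printf "update VPX_HOST set endorsement_key=' ', attestation_identity_key=' ' where id = %s;\n" "$host_id"
}
```

Running `build_tpm_reset_sql 726` prints the statement for review; it can then be pasted into the psql session. The function only builds text and never touches the database itself.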
I've tried disconnecting and reconnecting the host, but the alert is still present.
We recently had the system board in one of our hosts replaced by HP. However, when they replaced the system board they did not install a new TPM chip. The old board had a TPM chip that was already managed by vSphere. They recently came out again, replaced the system board and installed a new TPM chip. However, now we're getting the following error in vSphere: "The new host TPM endorsement key doesn't match the one stored in the DB." We have a vDS configured on the cluster and the host is also part of a vSAN cluster. However, the host has been placed in maintenance mode with full data migration. Any suggestions on how to resolve?
@TheBobkin cold reboot worked. Thanks for your help!
It was hot swapped with the host in full data migration m-mode. I will run the test on one of the hosts later today and let you know how it goes. I will do a full data migration first. 
Hi @TheBobkin, I tried the # esxcli vsan storage remove -u UUIDHere command on all 3 hosts to remove the offending UUID. I then restarted the vmware-vsan-health service on the vCenter and also restarted the vsanmgmtd daemon on the ESXi hosts (# /etc/init.d/vsanmgmtd restart). However, the alarm is still present in the vSAN Skyline Health. Also, the Unknown UUIDs are still displayed when I run the esxcli vsan storage list command on the hosts. They still have no naa. disks associated with them.
@TheBobkin I see. The # vdq -Hi command outputted all the disks without the disk group UUID. I was able to match up the listed naa. disks with their respective disk group UUIDs. The # esxcli vsan storage list command does list the one UUID as follows (it displays the same for all affected hosts, with different UUIDs):
Unknown
   Device: Unknown
   Display Name: Unknown
   Is SSD: false
   VSAN UUID: 527ce5f5-5716-1a0b-d15b-3abbba6cc027
   VSAN Disk Group UUID:
   VSAN Disk Group Name:
   Used by this host: false
   In CMMDS: false
   On-disk format version: -1
   Deduplication: false
   Compression: false
   Checksum:
   Checksum OK: false
   Is Capacity Tier: false
   Encryption Metadata Checksum OK: true
   Encryption: false
   DiskKeyLoaded: false
   Is Mounted: false
   Creation Time: Unknown
If I understand correctly, " # esxcli vsan storage remove -u UUIDHere " should remove the UUID from the host? I will have to run this command against the UUID displayed as unknown in the vSAN Operation Health for each identified host, correct? Once I remove the UUID on the hosts, do I need to do both suggested restarts or just the ESXi " # /etc/init.d/vsanmgmtd restart " on the host?
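To collect the phantom UUIDs on each host without reading the listing by eye, the output can be filtered with awk. This is a rough sketch that assumes the layout shown in the post above, where each entry's "Device:" line comes before its "VSAN UUID:" line; `list_unknown_vsan_uuids` is a made-up function name.

```shell
#!/bin/sh
# Hypothetical helper: read `esxcli vsan storage list` text on stdin and
# print the VSAN UUID of every entry whose Device field is "Unknown".
list_unknown_vsan_uuids() {
    awk '
        # Remember whether the current entry is an Unknown device.
        /^[[:space:]]*Device:/    { unknown = ($2 == "Unknown") }
        # "VSAN Disk Group UUID:" does not match this pattern, so only the
        # per-disk UUID line is printed.
        /^[[:space:]]*VSAN UUID:/ { if (unknown) print $3 }
    '
}
```

Usage would be along the lines of `esxcli vsan storage list | list_unknown_vsan_uuids`. The function only reports UUIDs; it does not remove anything.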
We recently had a few vSAN cache disks fail in 3 of our ESXi hosts (at separate times). To replace them, we placed the hosts in maintenance mode with "Full data migration", removed the capacity disks from the disk group, then removed the disk group. Once we replaced the failed cache SSD, we recreated the disk group with the remaining 7 capacity SSDs. However, now we are seeing the vSAN "Operation Health" alarm showing the following for each host:
Host = (host IP address) line for each ESXi host where we replaced the cache SSD and recreated the disk group.
Disk = "Unknown"
Overall health = "Red !"
Metadata health = "Red !"
Operational health = "Red !"
In CMMDS/VSI = "No/No"
Operational State Description = "Unknown disk health state"
UUID = UUIDs of the old disk groups we removed when replacing the cache SSD disks.
Attached is the rvc output when we try to identify the UUID locations. None of our VMs utilizing the vSAN cluster show as inaccessible or orphaned. Any idea how we can get rid of what seems to be phantom vSAN disk groups?
Hi, We recently had an issue with the Legacy Lookup Service certificate expiring in 10 days. To resolve this issue, support had us use the vCert tool to renew the Machine SSL cert and Solution user certs, and use lsdoctor -t for the SSL Trust Anchors. After doing so, we had to reconfigure our vSphere Replication appliance and SRM appliance to reregister them with the new vCSA Machine SSL cert. However, now we are not able to get the vmware-vapi-endpoint service to start. We've tried manually starting it using the VAMI and receive the "Service crashed while starting" error. After doing some research, I ran across article KB-59555 https://kb.vmware.com/s/article/59555
Command: # for i in `grep -l "BEGIN X509 CRL" *`;do openssl crl -inform PEM -text -noout -in $i | grep -A 1 " Authority Key Identifier";done
- This command outputs several "grep: ########.0: No such file or directory"
Command: # for i in `grep -l "BEGIN CERTIFICATE" *`;do openssl x509 -in $i -noout -text | grep -A 1 "Subject Key Identifier";done
- This command outputs several (more than 20) "X509v3 Subject Key Identifier:  "
I am wondering if this is something that can be corrected with lsdoctor.py or the vCert tool. Any help would be greatly appreciated.
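One possible reason for the "No such file or directory" messages from grep in the post above is a dangling symlink in the directory being scanned; that is an assumption, since the post does not show the directory contents. A sketch of a file selection step that skips such links is below. `pem_files_with_marker` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the files in directory $1 that contain the
# text marker $2, skipping anything that is not a readable regular file.
pem_files_with_marker() {
    dir="$1"
    marker="$2"
    for f in "$dir"/*; do
        # -f follows symlinks, so a link whose target is missing fails
        # this test and is skipped instead of producing a grep error.
        [ -f "$f" ] || continue
        grep -l -- "$marker" "$f" 2>/dev/null || true
    done
}
```

The result could feed the same openssl call used in the post, for example `for i in $(pem_files_with_marker . "BEGIN CERTIFICATE"); do openssl x509 -in "$i" -noout -text | grep -A 1 "Subject Key Identifier"; done`. Like the original loop, this form assumes file names without spaces.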
Thanks @TheBobkin. I ran the commands as suggested; output is attached. Looks like it's a vmdk for one of our placeholder VMs. This is a DR cluster that holds all the replica VMs from the primary site using vSphere Replication and SRM. This VM is not powered on but syncs periodically. I ran the "vsish -e set" command against the object's UUID. vSAN object health now lists the object with the "Reduced availability with no rebuild - delay timer" state. Resync objects shows this is scheduled to be resynced in the next 60 minutes.
I've checked disk management for any disks that may have failed on the host, but none show as "Evacuated" or failed. I also checked the iLO's storage to see if there were any disks in a predictive failure or degraded state, and all show as healthy. I then ran the RVC commands to locate the object which reports as "Unknown disks" and generated the output in the attached "RFC_outputs.txt" file. I'm not sure where to go from here. It seems like it could be left over from a VM that has since been deleted, but I want to be sure before deleting the object.
Hi,  I'm looking for assistance resolving an unhealthy object in our vSAN. We are in the process of upgrading our environment from 6.7 to vSphere 7.0 and will need to correct this issue before upgrading. Unfortunately, we cannot open a VMware support request until we complete the upgrade to 7. Skyline health displays the following for the object:   When I check the physical placement of the object it shows the following: I haven't been able to find a whole lot of vSAN troubleshooting KBs to help resolve so I'm reaching out here. If anyone can help, it would be greatly appreciated. Thanks.
Resolved by running the ./vCert tool and choosing the following options:
vCenter 6.7 Certificate Management Utility (4.8.0)
-----------------------------------------------------------------
1. Check current certificates status
2. View Certificate Info
3. Manage Certificates
4. Manage SSL Trust Anchors
5. Check configurations
6. Reset all certificates with VMCA-signed certificates
7. ESXi certificate operations
8. Restart services
9. Generate certificate report
E. Exit
Then choosing the Option 18. Clear Machine SSL CSR in VECS
We recently had to go through the process of renewing the STS certs on our vCSA 6.7u3r appliance. After several issues with this process we've finally got all the certs to show as valid. However, when running the check using the vCert tool against the vCSA, we now get the following expired certificate displayed:
Checking Certificate Status
-----------------------------------------------------------------
Checking Machine SSL certificate VALID
Checking Machine SSL CSR EXPIRED
Checking Solution User certificates:
   machine VALID
   vsphere-webclient VALID
   vpxd VALID
   vpxd-extension VALID
Checking SMS certificate VALID
Checking data-encipherment certificate VALID
Checking Authentication Proxy certificate VALID
Checking Auto Deploy CA certificate VALID
Checking BACKUP_STORE entries:
Checking VMDir certificate VALID
Checking VMCA certificate VALID
How do we correct the "Machine SSL CSR" certificate? We've not been able to find a KB for this specific certificate.
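When the status report is long, the entries that need attention can be pulled out with a small filter. This is a sketch only, assuming the report has been saved as text with one check per line and the status as the last word on the line, as in the output above; `list_non_valid_checks` is a hypothetical name and is not part of vCert.

```shell
#!/bin/sh
# Hypothetical helper: read a saved vCert status report on stdin and print
# every line whose final word is an all-caps status other than VALID.
list_non_valid_checks() {
    awk '
        $NF ~ /^[A-Z]+$/ && $NF != "VALID" {
            sub(/^[[:space:]]+/, "")   # drop indentation on sub-entries
            print
        }
    '
}
```

Section headers such as "Checking Solution User certificates:" end in a colon, so they do not match and are not printed.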
Greetings,  We are in the process of patching our current VMware Infrastructure and wanted to get clarity on best practice upgrade path for the vSphere Replication Appliance and Site Recovery Manager Appliance.  We are currently running vSphere Replication appliance 8.4.0.4 build 19074777. We are also running Site Recovery Manager appliance 8.4.0.4 build 19073435. Our vCSA will be patched to v6.7u3r. I checked the interoperability update matrix and confirmed we can go from 8.4 to 8.5, but I would like to know if I can go ahead and apply the most recent 8.5.0.x patch without having to apply incremental 8.5 patches.  I also noticed the SRM 8.5.0.4 is no longer available for download. vSphere Replication download only goes up to 8.5.0.4. Will it be okay to have vSphere Replication running 8.5.0.4 and SRM running 8.5.0.5?
Update: We've figured it out by resetting the machine account password using the KB below. https://kb.vmware.com/s/article/214280 Also, I found the KB below, which provides the 'reset_machine_pw.sh' shell script to complete the same password reset action. https://kb.vmware.com/s/article/70756 Hope this helps anyone with the same issue.
Trying to help out my new employer with an issue accessing one of their vCenters. From what I gather, they've been running vCSA 6.5 with an external PSC 6.5. At some point they lost connectivity to the vSphere Web Client and now receive the following error: "[400] an error occurred while sending an authorization request to the vcenter SSO Server. AFD Native Error occurred 9234" I haven't been able to find this specific error anywhere on the net. We've tried shutting down the vCSA, rebooting the PSC, and bringing the vCSA back up. No luck.