Hello everyone,
at a customer's site we are running a vCenter Server Appliance (version 6.7.0.41000, build 14836122). Since this morning I have the problem that my monitoring solution can no longer query information about the connected ESXi hosts. After logging into the vCenter Web UI I received the error message: "Could not connect to one or more vCenter Server systems https://v-center:443/sdk".
I found this thread, in which user bleuze had the same problem: SOLVED: Could not connect to one or more vCenter Server systems.
I followed the instructions and found out that vmware-vpxd was not running. After restarting the service everything seems to work fine for a few seconds, until vmware-vpxd crashes again.
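For anyone following along: on the VCSA shell (SSH as root), the stock `service-control` tooling can confirm which service is down and attempt a clean start while you watch the log for the next crash. A minimal sketch:

```shell
# List all appliance services and their current state
service-control --status --all

# Check and (re)start vpxd specifically
service-control --status vmware-vpxd
service-control --start vmware-vpxd

# Watch vpxd.log for the reason it goes down again
tail -f /storage/log/vmware/vpxd/vpxd.log
```

If vpxd keeps dying right after a successful start, the cause is usually in the last few hundred lines of vpxd.log before the crash, as it turned out to be here.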
I did some further investigation in /storage/log/vmware/vpxd/vpxd.log (attached in the next posting). I am not really clear on what is causing the crash. Maybe there is a faulty package installed or downloaded by Update Manager? See the errors starting at line 3292 in the log file.
2019-10-29T10:37:00.981+01:00 error vpxd[05017] [Originator@6876 sub=[SSO] opID=a048760] [UserDirectorySso] GetUserInfo exception: N7Vmacore9Authorize25AuthUserNotFoundExceptionE(User localos\com.vmware.vim.eam)
--> [context]zKq7AVECAAAAAGC34QAVdnB4ZAAA4AArbGlidm1hY29yZS5zbwAAWCUbAIyfGAHcNvV2cHhkAAFLaPUBemv1AQhx9QGiXPUBY131AZKdngGaoZ+CIbEBAWxpYnZpbS10eXBlcy5zbwABA9JyAUzWcQH943EBpD5yAHFvIwA6ciMAnVYrA9RzAGxpYnB0aHJlYWQuc28uMAAE3Y4ObGliYy5zby42AA==[/context]
2019-10-29T10:37:00.983+01:00 error vpxd[05017] [Originator@6876 sub=[SSO] opID=a048760] [UserDirectorySso] NormalizeUserName(com.vmware.vim.eam, false) exception: N7Vmacore9Authorize25AuthUserNotFoundExceptionE(User localos\com.vmware.vim.eam)
--> [context]zKq7AVECAAAAAGC34QAVdnB4ZAAA4AArbGlidm1hY29yZS5zbwAAWCUbAIyfGAHcNvV2cHhkAAFLaPUBemv1AQhx9QGiXPUBY131AZKdngGaoZ+CIbEBAWxpYnZpbS10eXBlcy5zbwABA9JyAUzWcQH943EBpD5yAHFvIwA6ciMAnVYrA9RzAGxpYnB0aHJlYWQuc28uMAAE3Y4ObGliYy5zby42AA==[/context]
2019-10-29T10:37:00.998+01:00 info vpxd[05017] [Originator@6876 sub=vpxLro opID=a048760] [VpxLRO] -- FINISH lro-233
2019-10-29T10:37:02.244+01:00 info vpxd[05007] [Originator@6876 sub=DAS] [FdmManager::MonitorVmHostStateCallback] All VMs and hosts have been protected.
2019-10-29T10:37:02.665+01:00 info vpxd[04985] [Originator@6876 sub=vpxLro opID=5c421eb3] [VpxLRO] -- BEGIN lro-238 -- CustomFieldsManager -- vim.CustomFieldsManager.addFieldDefinition -- 52c75802-dce2-4af1-9514-2c6831594611(5232e022-9e06-f146-e2fb-3654948be4fa)
2019-10-29T10:37:02.666+01:00 info vpxd[04985] [Originator@6876 sub=vpxLro opID=5c421eb3] [VpxLRO] -- FINISH lro-238
2019-10-29T10:37:02.666+01:00 info vpxd[04985] [Originator@6876 sub=Default opID=5c421eb3] [VpxLRO] -- ERROR lro-238 -- CustomFieldsManager -- vim.CustomFieldsManager.addFieldDefinition: vim.fault.DuplicateName:
--> Result:
--> (vim.fault.DuplicateName) {
--> faultCause = (vmodl.MethodFault) null,
--> faultMessage = <unset>,
--> name = "com.vmware.vsan.clusterstate",
--> object = 'vim.CustomFieldsManager:9334614a-3cb3-4a53-b975-2fce444bafc1:CustomFieldsManager'
--> msg = ""
--> }
--> Args:
-->
--> Arg name:
--> "com.vmware.vsan.clusterstate"
--> Arg moType:
--> "vim.ClusterComputeResource"
--> Arg fieldDefPolicy:
-->
--> Arg fieldPolicy:
-->
Opening Update Manager in the vSphere Web Client displays "An unexpected error occurred".
I am a little confused about how to continue solving the problem. I'd be very grateful if somebody had an idea on how to fix this.
Thank you in advance!
I have found this in the log:
2019-10-29T10:36:22.729+01:00 error vpxd[05056] [Originator@6876 sub=vpxdVdb] Shutting down the VC as there is not enough free space for the Database(used: 95%; threshold: 95%).
This should be helpful:
https://kb.vmware.com/s/article/67017
and this:
vCenter automatically shuts down the vCenter service when the vPostgres DB exceeds 95% usage (I don't remember exactly; it could be a different value).
Looks like you are hitting a known issue with 6.7 U3.
Correct. I just found out a few minutes ago that we are affected by the excessive hardware health alarms described here: VMware Knowledge Base.
vCenter got tons of events that filled up the SEAT partition:
Thank you, KocPawel, for the hint. I was able to resize the vCenter partition with the help of this article: https://vm.knutsson.it/2018/07/10fb-does-not-support-flow-control-autoneg/#more-749 .
As a first step I'll limit the event log to 14 days, which should be enough to keep the SEAT partition from filling up.
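For readers in the same situation: if old rows still need to be purged after lowering the retention setting, the cleanup is of the following shape. This is a sketch only, assuming the VCDB table names `vpx_event`, `vpx_event_arg`, and `vpx_task` and a 14-day window; the truncation script attached to the VMware KB article should be preferred over this hand-rolled version, and a backup or snapshot should exist before touching the database.

```shell
# Sketch, not the KB's official script. Stop vpxd first, back up the VCDB,
# then purge SEAT rows older than the retention window via the embedded psql.
service-control --stop vmware-vpxd
/opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres <<'SQL'
-- 14-day window, matching the retention limit chosen above (assumed schema)
DELETE FROM vpx_event_arg WHERE event_id IN
  (SELECT event_id FROM vpx_event WHERE create_time < now() - interval '14 days');
DELETE FROM vpx_event WHERE create_time < now() - interval '14 days';
DELETE FROM vpx_task  WHERE complete_time < now() - interval '14 days';
SQL
service-control --start vmware-vpxd
```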
Resizing /storage/seat is not a good option. During the next upgrade you might be limited to the large and x-large deployment sizes.
This would be unfortunate. What would you recommend doing?
I have a vCenter database backup that was created automatically tonight, and a Veeam backup, also created tonight. The error occurred this morning, so I could restore vCenter from the Veeam backup, run it, and limit the event retention time so that it does not fill up the disk completely.
I have never had to restore vCenter from a Veeam VM backup. Is there something I should consider before doing this?
From the KB (VMware Knowledge Base), follow this section:
"To work around the excessive events filling the VCDB, follow the steps below:"
Thank you for your response. I have read this KB article, but I already resized the SEAT partition. So my question would be how to revert this, to avoid running into the upgrade problems you mentioned before.
Do you know if there is anything I should consider before restoring the VCSA from my Veeam backup? I would do this to get the VCSA back to the state before resizing it, and then follow the truncation solution mentioned in the KB article.
Please share the output of below command:
df -Th
root@VCSA001 [ ~ ]# df -Th
Filesystem Type Size Used Avail Use% Mounted on
devtmpfs devtmpfs 4.9G 0 4.9G 0% /dev
tmpfs tmpfs 4.9G 872K 4.9G 1% /dev/shm
tmpfs tmpfs 4.9G 684K 4.9G 1% /run
tmpfs tmpfs 4.9G 0 4.9G 0% /sys/fs/cgroup
/dev/sda3 ext4 11G 6.3G 3.8G 63% /
tmpfs tmpfs 4.9G 1.5M 4.9G 1% /tmp
/dev/mapper/log_vg-log ext4 9.8G 3.5G 5.8G 38% /storage/log
/dev/mapper/updatemgr_vg-updatemgr ext4 99G 1.3G 93G 2% /storage/updatemgr
/dev/mapper/db_vg-db ext4 9.8G 389M 8.9G 5% /storage/db
/dev/mapper/dblog_vg-dblog ext4 15G 310M 14G 3% /storage/dblog
/dev/mapper/netdump_vg-netdump ext4 985M 1.3M 916M 1% /storage/netdump
/dev/mapper/autodeploy_vg-autodeploy ext4 9.8G 34M 9.2G 1% /storage/autodeploy
/dev/mapper/seat_vg-seat ext4 15G 4.6G 9.4G 33% /storage/seat
/dev/mapper/imagebuilder_vg-imagebuilder ext4 9.8G 23M 9.2G 1% /storage/imagebuilder
/dev/sda1 ext4 120M 34M 78M 31% /boot
/dev/mapper/core_vg-core ext4 25G 180M 24G 1% /storage/core
/dev/mapper/archive_vg-archive ext4 50G 30G 18G 63% /storage/archive
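As a side note, the Use% column above can be checked mechanically rather than by eye. A minimal sketch that reads `df -Th` output from stdin (so it also works against captured output like the above); the 80% warning limit is my own choice for early warning, not a VMware default:

```shell
# Report any volume at or above a warning threshold, given `df -Th` output.
awk -v limit=80 '
  NR > 1 {                          # skip the header line
    use = $6; sub(/%/, "", use)     # "Use%" is the sixth column of df -Th
    if (use + 0 >= limit) print $7, use "%"
  }
' <<'EOF'
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/seat_vg-seat ext4 15G 4.6G 9.4G 33% /storage/seat
/dev/mapper/log_vg-log ext4 9.8G 9.0G 0.3G 97% /storage/log
EOF
# → /storage/log 97%
```

On a live appliance the heredoc would simply be replaced by piping in `df -Th`.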
Did you try to truncate the event IDs and sundry housekeeping entries from the embedded Postgres DB?
No, I did not, but I guess that would have been successful. Sadly, I already resized the SEAT partition and am now afraid of future problems because of Vijay2027's hint about the large and x-large upgrade sizes. I thought resizing would not be a problem. I am now thinking of restoring the VCSA to a state before it filled up and truncating the logs afterwards.
I would still suggest truncating the DB to avoid getting into trouble during an upgrade. If this is not production-critical, it might be a good idea to build a new appliance and do a selective restore.
Thank you for your help! What exactly do you mean by a selective restore?
The vCSA is well below the default 300 GB storage capacity. Looks good for now.