pkomurka
Contributor

Could not start vCenter Server appliance

Hello,

we have been using vCenter Server (5.0 and 5.1) on Windows Server 2003 for a long time. Now vCenter Server 5.5 needs 64-bit Windows Server 2008, with the accompanying database requirements, so we decided to test the vCenter Linux appliance and get rid of the Microsoft stuff. So far we've tried the vCenter Server Appliance 5.1 and 5.5 on vSphere Hypervisor 5.0, 5.1 and 5.5, with no luck. The problem is always the same: vpxd is killed by a segmentation fault, and the fault occurs in libvmomi.so. I've tried the suggested network settings (correct IP, FQDN, correct forward/reverse DNS resolution, etc.) and everything else I could find on the internet.

The problem appears on a fresh start, when running the installation wizard:

Failed to execute '/usr/sbin/vpxd_servicecfg 'db' 'write' 'embedded' '' '' '' '' CENSORED':
/usr/sbin/vpxd_commonutils: line 910:  5898 Segmentation fault      (core dumped) $VPXD "$@"
/usr/sbin/vpxd_commonutils: line 910:  5907 Segmentation fault      (core dumped) $VPXD "$@"
/usr/sbin/vpxd_commonutils: line 910:  6276 Segmentation fault      (core dumped) $VPXD "$@"
VC_DB_SCHEMA_VERSION=
VC_DB_SCHEMA_INITIALIZED=1
VC_CFG_RESULT=407(Error: Failed to initialize schema.)
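If you want to script retries of this configuration step, the output above appears to end in KEY=VALUE lines, with VC_CFG_RESULT holding a numeric code followed by a message in parentheses. A minimal Python sketch for pulling those values out; the format is inferred only from this one log, and the helper names are hypothetical, not part of any VMware tool:

```python
import re

def parse_servicecfg_output(text):
    """Collect VC_*=VALUE lines into a dict, ignoring all other output lines."""
    values = {}
    for line in text.splitlines():
        m = re.match(r"^(VC_[A-Z_]+)=(.*)$", line.strip())
        if m:
            values[m.group(1)] = m.group(2)
    return values

def cfg_result(values):
    """Split VC_CFG_RESULT, e.g. '407(Error: ...)', into (code, message).

    Returns None if the key is missing or does not look like 'digits(optional message)'.
    """
    raw = values.get("VC_CFG_RESULT", "")
    m = re.match(r"^(\d+)(?:\((.*)\))?$", raw)
    if not m:
        return None
    return int(m.group(1)), m.group(2) or ""
```

Fed the output quoted above, `cfg_result(parse_servicecfg_output(output))` returns `(407, "Error: Failed to initialize schema.")`, and `VC_DB_SCHEMA_VERSION` comes back as an empty string.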

The system log shows the corresponding crash:

vpxd[4422] general protection ip:7f3146a9d5df sp:7fff38f1bd88 error:0 in libvmomi.so[7f31467b5000+41f000]
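The bracketed pair at the end of a kernel "general protection" line is the start address and size of the mapping the faulting instruction belongs to, so ip minus the start address gives the offset of that instruction from the start of the libvmomi.so mapping. Unlike the raw ip, that offset stays comparable between runs even when the library is loaded at a different address. A minimal Python sketch, assuming the log format shown in this thread (the helper name is hypothetical):

```python
import re

# Matches lines such as:
#   vpxd[4422] general protection ip:7f3146a9d5df sp:7fff38f1bd88 error:0 in libvmomi.so[7f31467b5000+41f000]
# The bracketed pair is the mapping's start address and its size, both in hex.
GP_RE = re.compile(
    r"[\w.-]+\[(?P<pid>\d+)\] general protection "
    r"ip:(?P<ip>[0-9a-f]+) sp:[0-9a-f]+ error:\d+ "
    r"in (?P<lib>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def fault_offsets(log_text):
    """Return (pid, library, offset) for every general protection line in log_text."""
    results = []
    for m in GP_RE.finditer(log_text):
        ip = int(m.group("ip"), 16)
        base = int(m.group("base"), 16)
        size = int(m.group("size"), 16)
        offset = ip - base
        # Sanity check: the faulting ip must fall inside the reported mapping.
        if not 0 <= offset < size:
            raise ValueError(f"ip {ip:#x} is outside the mapping of {m.group('lib')}")
        results.append((int(m.group("pid")), m.group("lib"), offset))
    return results

line = ("vpxd[4422] general protection ip:7f3146a9d5df sp:7fff38f1bd88 "
        "error:0 in libvmomi.so[7f31467b5000+41f000]")
for pid, lib, off in fault_offsets(line):
    print(pid, lib, hex(off))  # 4422 libvmomi.so 0x2e85df
```

Applied to every general protection line quoted in this thread, including the ones in the replies, the offset is always 0x2e85df, which suggests vpxd dies on the same instruction each time rather than at a random address.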

So the fault is inside libvmomi.so. I ran vpxd under strace; this is the end of the output:

futex(0x7f9247442f90, FUTEX_WAKE_PRIVATE, 2147483647) = 0
brk(0)                                  = 0x7f925274f000
brk(0x7f9252770000)                     = 0x7f9252770000
futex(0x7f924e594018, FUTEX_WAKE_PRIVATE, 2147483647) = 0
open("/usr/lib/locale/locale-archive", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/locale.alias", O_RDONLY) = 3
fstat(3, {st_dev=makedev(8, 3), st_ino=339728, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2512, st_atime=2013/10/17-12:08:59, st_mtime=2013/03/18-21:30:01, st_ctime=2013/09/07-01:07:16}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f924e7ae000
read(3, "# Locale name alias data base.\n#"..., 4096) = 2512
read(3, "", 4096)                       = 0
close(3)                                = 0
munmap(0x7f924e7ae000, 4096)            = 0
open("/usr/lib/locale/en_US.UTF-8/LC_CTYPE", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/locale/en_US.utf8/LC_CTYPE", O_RDONLY) = 3
fstat(3, {st_dev=makedev(8, 3), st_ino=402782, st_mode=S_IFREG|0644, st_nlink=176, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=512, st_size=256324, st_atime=2013/10/17-12:09:34, st_mtime=2013/03/18-21:30:39, st_ctime=2013/09/07-01:07:16}) = 0
mmap(NULL, 256324, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f924e5f5000
close(3)                                = 0
open("/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = 3
fstat(3, {st_dev=makedev(8, 3), st_ino=402784, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=56, st_size=26050, st_atime=2013/10/17-12:08:59, st_mtime=2013/09/07-01:07:28, st_ctime=2013/09/07-01:07:28}) = 0
mmap(NULL, 26050, PROT_READ, MAP_SHARED, 3, 0) = 0x7f924e5ee000
close(3)                                = 0
futex(0x7f9246897f80, FUTEX_WAKE_PRIVATE, 2147483647) = 0
mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9244b83000
brk(0x7f9252793000)                     = 0x7f9252793000
brk(0x7f92527b4000)                     = 0x7f92527b4000
brk(0x7f92527d5000)                     = 0x7f92527d5000
brk(0x7f92527f6000)                     = 0x7f92527f6000
brk(0x7f9252817000)                     = 0x7f9252817000
brk(0x7f9252838000)                     = 0x7f9252838000
brk(0x7f9252859000)                     = 0x7f9252859000
brk(0x7f925287a000)                     = 0x7f925287a000
brk(0x7f925289c000)                     = 0x7f925289c000
brk(0x7f92528bd000)                     = 0x7f92528bd000
brk(0x7f92528de000)                     = 0x7f92528de000
brk(0x7f92528ff000)                     = 0x7f92528ff000
brk(0x7f9252920000)                     = 0x7f9252920000
brk(0x7f9252941000)                     = 0x7f9252941000
brk(0x7f9252962000)                     = 0x7f9252962000
brk(0x7f9252986000)                     = 0x7f9252986000
brk(0x7f92529a8000)                     = 0x7f92529a8000
brk(0x7f92529c9000)                     = 0x7f92529c9000
brk(0x7f92529ea000)                     = 0x7f92529ea000
brk(0x7f9252a0b000)                     = 0x7f9252a0b000
brk(0x7f9252a2c000)                     = 0x7f9252a2c000
brk(0x7f9252a4d000)                     = 0x7f9252a4d000
brk(0x7f9252a6e000)                     = 0x7f9252a6e000
brk(0x7f9252a8f000)                     = 0x7f9252a8f000
brk(0x7f9252ab0000)                     = 0x7f9252ab0000
brk(0x7f9252ad1000)                     = 0x7f9252ad1000
brk(0x7f9252af2000)                     = 0x7f9252af2000
brk(0x7f9252b13000)                     = 0x7f9252b13000
brk(0x7f9252b34000)                     = 0x7f9252b34000
brk(0x7f9252b55000)                     = 0x7f9252b55000
brk(0x7f9252b76000)                     = 0x7f9252b76000
brk(0x7f9252b97000)                     = 0x7f9252b97000
brk(0x7f9252bb8000)                     = 0x7f9252bb8000
brk(0x7f9252bd9000)                     = 0x7f9252bd9000
brk(0x7f9252bfa000)                     = 0x7f9252bfa000
brk(0x7f9252c1b000)                     = 0x7f9252c1b000
brk(0x7f9252c3c000)                     = 0x7f9252c3c000
brk(0x7f9252c5d000)                     = 0x7f9252c5d000
brk(0x7f9252c7e000)                     = 0x7f9252c7e000
brk(0x7f9252c9f000)                     = 0x7f9252c9f000
brk(0x7f9252cc0000)                     = 0x7f9252cc0000
brk(0x7f9252ce1000)                     = 0x7f9252ce1000
brk(0x7f9252d02000)                     = 0x7f9252d02000
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV (core dumped) +++

I also ran vpxd under GDB and took a backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff27a25df in Vmomi::DynamicData::DynamicData() () from /usr/lib/vmware-vpx/libvmomi.so
(gdb) backtrace
#0  0x00007ffff27a25df in Vmomi::DynamicData::DynamicData() () from /usr/lib/vmware-vpx/libvmomi.so
#1  0x00007ffff5889e92 in Vim::HistoricalInterval::HistoricalInterval() () from /usr/lib/vmware-vpx/libtypes.so
#2  0x00005555574f2834 in ?? ()
#3  0x000055555833a3d6 in ?? ()
#4  0x00007ffff7dc95d8 in ?? () from /usr/lib/vmware-vpx/libtypes.so
#5  0x000055555833a320 in ?? ()
#6  0x0000000000000000 in ?? ()
(gdb)

So in the end, the crash is in Vmomi::DynamicData::DynamicData() in libvmomi.so, called from Vim::HistoricalInterval::HistoricalInterval() in libtypes.so.

Is there any resolution for this error in the vCenter Server Appliance? Every other operating system or appliance has run on these hosts for months without a problem. I ran hardware checks on the servers and tried everything I could to diagnose the problem. Only one thing never works on any host I have: vpxd, which segfaults in libvmomi.so.

Thank you for any possible hint.

Best regards,

Pavel Komurka

6 Replies
lhromadka
Contributor

I am trying for the first time to run a newly deployed vCenter Appliance 5.5 (VMware-vCenter-Server-Appliance-5.5.0.5101-1398493_OVF10.ova).

/var/log/boot.msg contains:

/usr/sbin/vpxd_commonutils: line 910:  3176 Segmentation fault      (core dumped) $VPXD "$@"

Configuring the database fails with: "Failed to initialize schema".

Starting vCenter fails with: "vCenter Server failed to start".

/var/log/messages contains:

kernel: [   42.774111] vpxd[3176] general protection ip:7f0c227205df sp:7ffffde3eca8 error:0 in libvmomi.so[7f0c22438000+41f000]
kernel: [ 1124.503110] postgres (4994): /proc/4994/oom_adj is deprecated, please use /proc/4994/oom_score_adj instead.
kernel: [ 1125.594151] vpxd[5140] general protection ip:7f91d0bad5df sp:7fff157bcbe8 error:0 in libvmomi.so[7f91d08c5000+41f000]
kernel: [ 1238.269736] vpxd[5512] general protection ip:7fd7872e55df sp:7fff876db318 error:0 in libvmomi.so[7fd786ffd000+41f000]
kernel: [ 1238.664438] vpxd[5529] general protection ip:7fa54f43c5df sp:7fff38792508 error:0 in libvmomi.so[7fa54f154000+41f000]
kernel: [ 1239.070201] vpxd[5536] general protection ip:7f2f2d4d05df sp:7fff40d1c8c8 error:0 in libvmomi.so[7f2f2d1e8000+41f000]
kernel: [ 1255.594201] vpxd[5903] general protection ip:7fdd572775df sp:7fff924b1a28 error:0 in libvmomi.so[7fdd56f8f000+41f000]

...

watchdog-vpxd: '/usr/sbin/vpxd' exited after 0 seconds (quick failure 6) 139
watchdog-vpxd: End '/usr/sbin/vpxd', failure limit reached

This product seems completely unusable.
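The trailing 139 on the watchdog line is most likely a shell-style exit status (an assumption; the watchdog script's output format isn't documented in this thread). POSIX shells report a child killed by signal N as 128 + N, and 139 - 128 = 11, which is SIGSEGV on Linux, consistent with the segfaults above. A minimal Python sketch of that decoding (the function name is hypothetical):

```python
import signal

def describe_exit_status(status):
    """Interpret a shell-style exit status ($?), such as the 139 logged by watchdog-vpxd.

    Shells report a child killed by signal N as 128 + N. A program may also call
    exit() with a value above 128 on its own, so this is a heuristic, not proof.
    """
    if 128 < status < 256:
        signum = status - 128
        try:
            return signal.Signals(signum).name
        except ValueError:
            # Not a signal number known to this platform.
            return f"signal {signum}"
    return f"exit code {status}"

print(describe_exit_status(139))  # SIGSEGV
```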

runesan
Contributor

After several months of working on this, I wish I could add something substantive, but having followed this thread for a few months, I just want to chime in and say I have the same problem. All sorts of tweaks, attempts, and changes have come up empty.

The funny thing is that I can deploy this image on VMware Fusion 6 on my Mac and on VMware Workstation on Windows with *no* trouble. It really is plug and play, as it's supposed to be (on Fusion I have to run /opt/vmware/share/vami/vami_config_net after the first startup to set the network interface).

I am completely stumped as to why this product seems to work or fail at random. The only place the VCSA *doesn't* work is the primary VM host I need it to work on. Keep in mind that we're talking about identical environments: they are all on the same network, in the same internal domain, using the same NTP server and the same internal DNS servers. It works on VMware Fusion and VMware Workstation, and I was even able to shoehorn it onto my XenServer 6.2 host. But my single ESXi 5.5 host refuses to run it, with the same database schema failure every single time. Keep in mind I've completely reinstalled the hypervisor several times over; in fact, the past few days' tests have been on a host with no customization at all apart from the network configuration. No external storage, read caches, sites, VLANs, or anything. I've reset and tested just about everything on this host, and absolutely nothing shows an issue *except* this appliance.

This has been a profoundly frustrating experience for months now, as I essentially get none of the benefits of 5.5, which VMware delivers through the vCenter Web Client. I really wish I could get a breakthrough on this, but after over a month of trying back when 5.5a was out, and with the same issue in 5.5b, I'm not holding out much hope. If I get any sort of resolution, I will certainly update this post.

snowzach
Contributor

I finally figured it out... I was having the same issue. It has something to do with my RAID controller or the local datastore (I have an Areca controller). I created a VM that shares its filesystem over NFS, installed the appliance on that NFS share, and it worked fine. It must be something to do with that driver. Downgrading the driver didn't seem to fix it. It's certainly not ideal, but it's something I can work around. It does make me wonder what other RAID or disk-related issues I might be having, though.

UPDATE: Yep, that was definitely the issue. I contacted Areca technical support and they sent me a beta driver to try that resolved the issue.
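Several replies here trace the crash to the storage path (an Areca RAID driver, iSCSI, an SSD-backed datastore). One way to test whether that path is altering data is to compare checksums of /usr/lib/vmware-vpx/libvmomi.so copied out of a failing deployment and a working one. A minimal Python sketch, assuming you can get both copies onto one machine; the function names are hypothetical, and a match does not by itself rule out the storage path:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks (1 MiB by default) so it never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_contents(path_a, path_b):
    """True if both files have identical SHA-256 digests."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For example, `same_contents("libvmomi-failing.so", "libvmomi-working.so")`, where the two file names are placeholders for wherever you copied the libraries.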

burnedwire
Contributor

Moving my vCenter Appliance's drive from iSCSI to NFS storage fixed this issue for me as well.

kgottleib
Enthusiast

I ran into this as well, and ouch, it hurt! I resolved it simply by giving my vCenter appliance internet access... that's it.

I had given up for a while and deployed a Windows server with vCenter on SQL Express, but the installation kept failing without any useful error message. So I connected my lab, which until then had been completely isolated, to the internet, and then reinstalled vCenter on Windows without issues.

I then went back to the VCSA, deleted it, and redeployed it from the same image, but this time let it get a DHCP address from our production network (I wired an extra NIC in ESXi to this network and created a new vSwitch/port group for vCenter). I went ahead with the configuration wizard, and it did not choke on the database step as it had many times before.

Ironically, though not surprisingly, once the VCSA was up I moved the VM off the internet-connected network and back onto the isolated one. As soon as I did, I lost the DHCP address and all communication with the VCSA. I logged in locally, changed the IP to static (keeping the same address) with the /opt/vmware/share/vami/vami_config_net utility, and rebooted, and nothing would start up. At that point I already had a working vCenter in place, so I didn't see a need to troubleshoot the VCSA any further.

I have been having issues with vCenter, both Windows/SQL and the VCSA, when it is not connected to the internet, but I have not seen any clear statement that internet access is a requirement for vCenter. Or is there one?

dan13476
Contributor
Contributor

I ran into the OP's problem today while trying to install the VCSA onto a VMFS datastore backed by a Crucial SSD.

It didn't like it for some reason, so after reading this thread I redeployed the same OVA onto a different VMFS datastore backed by a standard HDD, and it worked on the first try.

The SSD-backed VMFS datastore has been running other VMs fine; it just doesn't like the VCSA.
