I am running OpenStack Newton as an HA deployment on Ubuntu 16.04 vSphere VMs in conjunction with Ceph storage (Jewel). The deployment comprises:
Everything is up and running. There are no known configuration issues in the OpenStack environment. The same setup works properly on "real" hardware-based machines.
Unfortunately, segfaults occasionally occur at startup of my OpenStack VMs (tested with Ubuntu 14.04 and 16.04). They appear randomly in different Ubuntu services, and it is unpredictable where and when. These faults are definitely not software-related and appear only when an OpenStack VM is configured with more than 1 vCPU. The probability of creating a broken OpenStack VM rises with the vCPU count: segfaults occur more frequently in a VM with 4 vCPUs than in one with 2 vCPUs, and never in VMs with only 1 vCPU. I was able to spawn and destroy 500 VMs in series successfully using only 1 vCPU.
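To compare failure rates across vCPU counts, one option is to count kernel segfault reports in each guest's console log. The following is only a minimal sketch: it assumes the guest kernel writes lines containing `segfault at` to the console, and `count_segfaults` is a hypothetical helper name, not part of any existing tool.

```shell
#!/bin/sh
# Count kernel segfault reports in a log read from stdin.
# Assumption: the kernel reports each fault on a line containing "segfault at".
count_segfaults() {
    # grep -c prints 0 but exits non-zero when nothing matches; mask that status.
    grep -c 'segfault at' || true
}

# Possible usage (SERVER is a placeholder for a real instance name or ID):
#   openstack console log show SERVER | count_segfaults
```

Running this against VMs booted with 1, 2 and 4 vCPUs would give a rough per-flavor fault count instead of a pass/fail impression.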
ESX/ESXi-Version: VMware ESXi, 6.0.0, 3825889
I am using KVM/QEMU as the hypervisor on my compute nodes, so there must be a problem with running KVM on ESXi-based nodes. Hardware virtualization support is enabled for my vSphere VMs:
$ egrep -c '(vmx|svm)' /proc/cpuinfo
INFO: /dev/kvm exists
KVM acceleration can be used
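For anyone who wants to repeat the check on their own compute nodes, here is a small sketch of the same two tests. The function names and the optional path arguments are hypothetical conveniences I added for illustration; the only things taken from the system are the `flags` lines in `/proc/cpuinfo` and the `/dev/kvm` device node.

```shell
#!/bin/sh
# Count logical CPUs whose "flags" line lists vmx (Intel VT-x) or svm (AMD-V).
# Anchoring on "^flags" avoids also counting the separate "vmx flags" line
# that newer kernels print. The file argument is only a testing convenience.
count_virt_cpus() {
    grep -E -c '^flags.*[[:space:]](vmx|svm)([[:space:]]|$)' "${1:-/proc/cpuinfo}" || true
}

# Succeed if the KVM device node exists as a character device.
kvm_device_present() {
    [ -c "${1:-/dev/kvm}" ]
}

if [ -r /proc/cpuinfo ]; then
    echo "CPUs with vmx/svm: $(count_virt_cpus)"
fi
if kvm_device_present; then
    echo "/dev/kvm exists"
else
    echo "/dev/kvm missing"
fi
```

A count equal to the number of vCPUs of the vSphere VM, together with an existing `/dev/kvm`, only shows that nested virtualization is exposed; it says nothing about whether it behaves correctly under SMP load.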
I have also tested a different clock source for my vSphere VMs, switching from tsc to acpi_pm, but the issue still occurs when more than 1 vCPU is configured in my OpenStack VMs. All OpenStack guest VMs use kvm-clock as their clock source.
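The clock source in use can be read from sysfs on both the vSphere VMs and the OpenStack guests. This is a minimal sketch assuming the standard Linux sysfs location; `show_clocksource` and its optional directory argument are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Standard sysfs directory for the system clock source on Linux.
CLOCKSOURCE_DIR=/sys/devices/system/clocksource/clocksource0

# Print the active and the available clock sources.
# The directory argument is only a testing convenience.
show_clocksource() {
    dir="${1:-$CLOCKSOURCE_DIR}"
    echo "current:   $(cat "$dir/current_clocksource")"
    echo "available: $(cat "$dir/available_clocksource")"
}

if [ -r "$CLOCKSOURCE_DIR/current_clocksource" ]; then
    show_clocksource
fi

# To switch at runtime (as root), write one of the available names, e.g.:
#   echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource
```

A runtime switch like this does not survive a reboot; making it permanent would need the `clocksource=` kernel boot parameter.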
The problem must be related to ESXi, because KVM on real hardware works without any issues regardless of how many vCPUs are configured for an OpenStack VM.
Any hints on what to do?