VMware Cloud Community
Lukehe
Contributor

Unexpected VM reboots: ESXi 6.0, RHEL 8.3

One VM running RHEL 8.3 on ESXi 6.0 is unexpectedly rebooting every 6 hours. I can't find any clue in the RHEL logs: nothing in messages or the other logs, just a sudden fresh start. As far as I can see, the same goes for ESXi. On Linux I see:

last -Fxn6 shutdown reboot
reboot system boot 4.18.0-240.22.1. Thu May 20 10:33:41 2021 still running
reboot system boot 4.18.0-240.22.1. Thu May 20 04:28:48 2021 still running
This shows that the OS was not shut down properly. There is nothing in messages; the OS just starts all of a sudden.
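To put a number on the interval, the boot lines from `last -F` can be parsed and subtracted. This is only a minimal sketch: the function names are made up for illustration, and it assumes the default English (C locale) date format shown in the output above.

```python
from datetime import datetime

# Boot lines copied from the `last -Fxn6 shutdown reboot` output above.
LAST_OUTPUT = """\
reboot system boot 4.18.0-240.22.1. Thu May 20 10:33:41 2021 still running
reboot system boot 4.18.0-240.22.1. Thu May 20 04:28:48 2021 still running
"""

def boot_times(last_output):
    """Return boot timestamps (oldest first) from `last -F` reboot lines."""
    times = []
    for line in last_output.splitlines():
        fields = line.split()
        if len(fields) < 9 or fields[0] != "reboot":
            continue
        # Fields 4..8 hold the full timestamp, e.g. "Thu May 20 10:33:41 2021".
        stamp = " ".join(fields[4:9])
        times.append(datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"))
    return sorted(times)

def boot_intervals(times):
    """Return the gaps between consecutive boots as timedelta objects."""
    return [later - earlier for earlier, later in zip(times, times[1:])]

if __name__ == "__main__":
    for gap in boot_intervals(boot_times(LAST_OUTPUT)):
        print(gap)
```

For the two boots listed, the gap comes out at a little over six hours, which matches the "every 6 hours" pattern.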

On ESXi I have the following vmkernel log entries for every reboot:

2021-05-19T20:28:29.484Z cpu12:37145264)VSCSI: 6784: handle 16266(vscsi0:0):Destroying Device for world 37145259 (pendCom 0)
2021-05-19T20:28:29.484Z cpu12:37145264)NetPort: 1782: disabled port 0x200001d
2021-05-19T20:28:29.484Z cpu12:37145264)NetPort: 1782: disabled port 0x5000013
2021-05-19T20:28:29.507Z cpu8:37145264)WARNING: CBT: 2080: Unsupported ioctl 44
2021-05-19T20:28:29.507Z cpu8:37145264)VSCSI: 4010: handle 16279(vscsi0:0):Using sync mode due to sparse disks
2021-05-19T20:28:29.507Z cpu8:37145264)VSCSI: 4052: handle 16279(vscsi0:0):Creating Virtual Device for world 37145259 (FSS handle 624668247) numBlocks=83886080 (bs=512)
2021-05-19T20:28:29.507Z cpu8:37145264)VSCSI: 273: handle 16279(vscsi0:0):Input values: res=0 limit=-1 bw=-1 Shares=-1
2021-05-19T20:28:29.507Z cpu8:37145264)Vmxnet3: 15430: Disable Rx queuing; queue size 1024 is larger than Vmxnet3RxQueueLimit limit of 128.
2021-05-19T20:28:29.507Z cpu8:37145264)Vmxnet3: 15680: Using default queue delivery for vmxnet3 for port 0x200001d
2021-05-19T20:28:29.507Z cpu8:37145264)NetPort: 1575: enabled port 0x200001d with mac 00:0c:29:4c:a9:68
2021-05-19T20:28:29.507Z cpu8:37145264)Vmxnet3: 15430: Disable Rx queuing; queue size 1024 is larger than Vmxnet3RxQueueLimit limit of 128.
2021-05-19T20:28:29.507Z cpu8:37145264)Vmxnet3: 15680: Using default queue delivery for vmxnet3 for port 0x5000013
2021-05-19T20:28:29.507Z cpu8:37145264)NetPort: 1575: enabled port 0x5000013 with mac 00:0c:29:4c:a9:72
2021-05-19T20:28:29.517Z cpu8:37145264)NetPort: 1782: disabled port 0x200001d
2021-05-19T20:28:29.524Z cpu8:37145264)NetPort: 1782: disabled port 0x5000013
2021-05-19T20:28:30.933Z cpu10:37145259)PVSCSI: 2546: Failed to issue sync i/o : Busy (btstat=0x0 sdstat=0x8)
2021-05-19T20:29:01.387Z cpu15:37145264)Vmxnet3: 15430: Disable Rx queuing; queue size 1024 is larger than Vmxnet3RxQueueLimit limit of 128.
2021-05-19T20:29:01.387Z cpu15:37145264)Vmxnet3: 15680: Using default queue delivery for vmxnet3 for port 0x200001d
2021-05-19T20:29:01.387Z cpu15:37145264)NetPort: 1575: enabled port 0x200001d with mac 00:0c:29:4c:a9:68
2021-05-19T20:29:01.529Z cpu15:37145264)Vmxnet3: 15430: Disable Rx queuing; queue size 1024 is larger than Vmxnet3RxQueueLimit limit of 128.
2021-05-19T20:29:01.529Z cpu15:37145264)Vmxnet3: 15680: Using default queue delivery for vmxnet3 for port 0x5000013
2021-05-19T20:29:01.529Z cpu15:37145264)NetPort: 1575: enabled port 0x5000013 with mac 00:0c:29:4c:a9:72

hostd.log looks like this (related events start at 2021-05-20T08:33:15.852Z):

2021-05-20T08:30:58.697Z warning hostd[7BD28B70] [Originator@6876 sub=VigorStatsProvider(2045777248)] AddVirtualMachine: VM '14' already registered
2021-05-20T08:30:58.697Z warning hostd[7BD28B70] [Originator@6876 sub=VigorStatsProvider(2045777248)] AddVirtualMachine: VM '17' already registered
2021-05-20T08:30:58.697Z warning hostd[7BD28B70] [Originator@6876 sub=VigorStatsProvider(2045777248)] AddVirtualMachine: VM '18' already registered
2021-05-20T08:30:58.697Z warning hostd[7BD28B70] [Originator@6876 sub=VigorStatsProvider(2045777248)] AddVirtualMachine: VM '6' already registered
[LikewiseGetDomainJoinInfo:355] QueryInformation(): ERROR_FILE_NOT_FOUND (2/0):
[LikewiseGetDomainJoinInfo:355] QueryInformation(): ERROR_FILE_NOT_FOUND (2/0):
2021-05-20T08:32:58.701Z warning hostd[7C132B70] [Originator@6876 sub=VigorStatsProvider(2045777248)] AddVirtualMachine: VM '14' already registered
2021-05-20T08:32:58.701Z warning hostd[7C132B70] [Originator@6876 sub=VigorStatsProvider(2045777248)] AddVirtualMachine: VM '17' already registered
2021-05-20T08:32:58.701Z warning hostd[7C132B70] [Originator@6876 sub=VigorStatsProvider(2045777248)] AddVirtualMachine: VM '18' already registered
2021-05-20T08:32:58.701Z warning hostd[7C132B70] [Originator@6876 sub=VigorStatsProvider(2045777248)] AddVirtualMachine: VM '6' already registered

2021-05-20T08:33:15.852Z error hostd[7C030B70] [Originator@6876 sub=Default opID=591ef603] Unable to convert Vigor value 'rhel8-64' of type 'char const*' to VIM type 'Vim::Vm::GuestOsDescriptor::GuestOsIdentifier'
2021-05-20T08:33:15.864Z error hostd[7B452B70] [Originator@6876 sub=Default opID=591ef60f] Unable to convert Vigor value 'rhel8-64' of type 'char const*' to VIM type 'Vim::Vm::GuestOsDescriptor::GuestOsIdentifier'
2021-05-20T08:33:15.871Z error hostd[7BDAAB70] [Originator@6876 sub=Default opID=591ef611] Unable to convert Vigor value 'rhel8-64' of type 'char const*' to VIM type 'Vim::Vm::GuestOsDescriptor::GuestOsIdentifier'
2021-05-20T08:33:15.928Z error hostd[7C030B70] [Originator@6876 sub=Default opID=591ef613] Unable to convert Vigor value 'rhel8-64' of type 'char const*' to VIM type 'Vim::Vm::GuestOsDescriptor::GuestOsIdentifier'
2021-05-20T08:33:20.706Z error hostd[7BDAAB70] [Originator@6876 sub=Default opID=591ef615] Unable to convert Vigor value 'rhel8-64' of type 'char const*' to VIM type 'Vim::Vm::GuestOsDescriptor::GuestOsIdentifier'
2021-05-20T08:33:24.545Z info hostd[7BD28B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/5c2e8503-5e531d4a-1040-b499bab3168e/FirewallRH/FirewallRH.vmx] Tools manifest version status changed from guestToolsUnmanaged to guestToolsUnmanaged, on install is TRUE
2021-05-20T08:33:24.581Z info hostd[7BD28B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/5c2e8503-5e531d4a-1040-b499bab3168e/FirewallRH/FirewallRH.vmx] Send config update invoked
2021-05-20T08:33:24.621Z info hostd[7BD28B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/5c2e8503-5e531d4a-1040-b499bab3168e/FirewallRH/FirewallRH.vmx] Send config update invoked
2021-05-20T08:33:28.986Z info hostd[7BE2BB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/5c2e8503-5e531d4a-1040-b499bab3168e/FirewallRH/FirewallRH.vmx] Turning off heartbeat checker
2021-05-20T08:33:38.287Z warning hostd[7BDAAB70] [Originator@6876 sub=VigorStatsProvider(2045777248)] AddVirtualMachine: VM '18' already registered
2021-05-20T08:33:40.007Z warning hostd[7BDAAB70] [Originator@6876 sub=Statssvc.vim.PerformanceManager] Calculating read OIO for scsi0:0 - delta is negative, prevTime = 1621499600 curTime = 1621499620 previIOTime = 30773757 curIOTime = 1209294
2021-05-20T08:33:40.007Z warning hostd[7BDAAB70] [Originator@6876 sub=Statssvc.vim.PerformanceManager] Calculating read I/O size for scsi0:0 -- commands delta is negative,prevBytes = 550266880 curBytes = 50629632 prevCommands = 14514curCommands = 2735
2021-05-20T08:33:40.007Z warning hostd[7BDAAB70] [Originator@6876 sub=Statssvc.vim.PerformanceManager] Calculating write OIO for scsi0:0 - delta is negative, prevTime = 1621499600 curTime = 1621499620 previIOTime = 5894206 curIOTime = 72179
2021-05-20T08:33:40.007Z warning hostd[7BDAAB70] [Originator@6876 sub=Statssvc.vim.PerformanceManager] Calculating write I/O size for scsi0:0 -- commands delta is negative,prevBytes = 143067136 curBytes = 2099200 prevCommands = 7408curCommands = 1027
2021-05-20T08:33:51.235Z error hostd[7BE2BB70] [Originator@6876 sub=Default opID=591ef627] Unable to convert Vigor value 'rhel8-64' of type 'char const*' to VIM type 'Vim::Vm::GuestOsDescriptor::GuestOsIdentifier'
2021-05-20T08:33:51.237Z info hostd[7BDAAB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/5c2e8503-5e531d4a-1040-b499bab3168e/FirewallRH/FirewallRH.vmx] Send config update invoked
2021-05-20T08:33:51.280Z error hostd[7B452B70] [Originator@6876 sub=Default opID=591ef629] Unable to convert Vigor value 'rhel8-64' of type 'char const*' to VIM type 'Vim::Vm::GuestOsDescriptor::GuestOsIdentifier'
2021-05-20T08:33:51.286Z error hostd[7C132B70] [Originator@6876 sub=Default opID=591ef62b] Unable to convert Vigor value 'rhel8-64' of type 'char const*' to VIM type 'Vim::Vm::GuestOsDescriptor::GuestOsIdentifier'

2021-05-20T08:33:51.348Z info hostd[7C132B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/5c2e8503-5e531d4a-1040-b499bab3168e/FirewallRH/FirewallRH.vmx] Setting the tools properties cache.
[LikewiseGetDomainJoinInfo:355] QueryInformation(): ERROR_FILE_NOT_FOUND (2/0):
2021-05-20T08:34:21.211Z error hostd[7B452B70] [Originator@6876 sub=Default opID=591ef62d] Unable to convert Vigor value 'rhel8-64' of type 'char const*' to VIM type 'Vim::Vm::GuestOsDescriptor::GuestOsIdentifier'
2021-05-20T08:34:21.218Z error hostd[7B452B70] [Originator@6876 sub=Default opID=591ef62f] Unable to convert Vigor value 'rhel8-64' of type 'char const*' to VIM type 'Vim::Vm::GuestOsDescriptor::GuestOsIdentifier'
2021-05-20T08:34:21.223Z error hostd[7BEACB70] [Originator@6876 sub=Default opID=591ef631] Unable to convert Vigor value 'rhel8-64' of type 'char const*' to VIM type 'Vim::Vm::GuestOsDescriptor::GuestOsIdentifier'
2021-05-20T08:34:21.321Z error hostd[7BDAAB70] [Originator@6876 sub=Default opID=591ef633] Unable to convert Vigor value 'rhel8-64' of type 'char const*' to VIM type 'Vim::Vm::GuestOsDescriptor::GuestOsIdentifier'
2021-05-20T08:34:21.322Z info hostd[7B452B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/5c2e8503-5e531d4a-1040-b499bab3168e/FirewallRH/FirewallRH.vmx] Send config update invoked
2021-05-20T08:34:40.006Z warning hostd[7B452B70] [Originator@6876 sub=Statssvc.vim.PerformanceManager] Calculated read I/O size 582844 for scsi0:0 is out of range -- 582844,prevBytes = 5090915328 curBytes = 5341538304 prevCommands = 219084curCommands = 219514
[LikewiseGetDomainJoinInfo:355] QueryInformation(): ERROR_FILE_NOT_FOUND (2/0):
2021-05-20T08:34:58.704Z warning hostd[7BD28B70] [Originator@6876 sub=VigorStatsProvider(2045777248)] AddVirtualMachine: VM '14' already registered
2021-05-20T08:34:58.704Z warning hostd[7BD28B70] [Originator@6876 sub=VigorStatsProvider(2045777248)] AddVirtualMachine: VM '17' already registered
2021-05-20T08:34:58.704Z warning hostd[7BD28B70] [Originator@6876 sub=VigorStatsProvider(2045777248)] AddVirtualMachine: VM '18' already registered
2021-05-20T08:34:58.705Z warning hostd[7BD28B70] [Originator@6876 sub=VigorStatsProvider(2045777248)] AddVirtualMachine: VM '6' already registered
[LikewiseGetDomainJoinInfo:355] QueryInformation(): ERROR_FILE_NOT_FOUND (2/0):
[LikewiseGetDomainJoinInfo:355] QueryInformation(): ERROR_FILE_NOT_FOUND (2/0):

It happens at almost exactly the same times every day. All other VMs are running fine.
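When lining up the ESXi logs with the guest, note that the ESXi timestamps are in UTC (the trailing Z) while `last` reports guest local time. Below is a rough sketch of the conversion. The UTC+2 offset is an assumption, inferred only from the 08:33Z hostd events sitting next to the 10:33 boot entry; the post does not state the guest's time zone. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the guest clock is UTC+2. This is inferred, not stated in the post.
GUEST_TZ = timezone(timedelta(hours=2))

def esxi_to_guest_local(stamp):
    """Convert an ESXi log timestamp like '2021-05-20T08:33:15.852Z' to guest local time."""
    utc = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    return utc.astimezone(GUEST_TZ)

def seconds_before_boot(esxi_stamp, guest_boot):
    """Seconds from an ESXi log event to a guest boot time given in guest local time."""
    boot = datetime.strptime(guest_boot, "%a %b %d %H:%M:%S %Y").replace(tzinfo=GUEST_TZ)
    return (boot - esxi_to_guest_local(esxi_stamp)).total_seconds()

if __name__ == "__main__":
    # First hostd error of the excerpt versus the newest boot reported by `last`.
    print(esxi_to_guest_local("2021-05-20T08:33:15.852Z"))
    print(seconds_before_boot("2021-05-20T08:33:15.852Z", "Thu May 20 10:33:41 2021"))
```

Under that assumption, the hostd events fall within the half minute before the 10:33:41 boot, while the vmkernel excerpt (20:28Z on May 19) would belong to an earlier reboot in the same six-hour series.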


Accepted Solutions
Lukehe
Contributor

It was an issue internal to the RHEL guest OS.

I found reports related to the reboots in /var/crash on the RHEL machine.

I began to suspect a connection to IPsec, so I moved the connections one by one to the other server (RHEL 7). A single connection to the old Cisco ASA, configured with the option "sha2_truncbug=yes", was the one crashing the RHEL 8 server. After I moved it away, there were no more crashes.

As I am writing this, I noticed that both libreswan and strongswan are installed on the problem machine, while on the other RHEL 7 machine, where I moved the IPsec connections, only libreswan is present and it works fine. The crashes could possibly be related to this.
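For anyone with the same symptoms, a quick first check is whether the guest left anything in /var/crash, which is where kdump writes by default on RHEL. The sketch below just lists what is there, newest first; the function name is made up for illustration, and reading the directory may require root.

```python
from datetime import datetime
from pathlib import Path

def list_crash_reports(crash_dir="/var/crash"):
    """Return (modification time, path) pairs for entries in crash_dir, newest first."""
    root = Path(crash_dir)
    if not root.is_dir():
        return []
    entries = [(datetime.fromtimestamp(p.stat().st_mtime), p) for p in root.iterdir()]
    return sorted(entries, key=lambda entry: entry[0], reverse=True)

if __name__ == "__main__":
    for mtime, path in list_crash_reports():
        print(mtime.isoformat(sep=" ", timespec="seconds"), path)
```

If the timestamps of the entries line up with the boot times from `last`, the reboots are most likely guest kernel crashes rather than something the hypervisor did.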

 

mossbrbr
Contributor

I'm seeing something very, very similar on one of my VMs running on ESXi 6.7: random reboots, with nothing with any real teeth in the logs for me to explore further.

e_espinel
Virtuoso

Hello,
RHEL 8.x is only supported on ESXi 6.5 and higher, according to the VMware compatibility matrix.
An unsupported operating system on a VMware VM can behave unpredictably or, on the contrary, run without problems, but it is Russian roulette.

https://www.vmware.com/resources/compatibility/search.php?deviceCategory=software&testConfig=16


Enrique Espinel
Senior Technical Support on IBM, Lenovo, Veeam Backup and VMware vSphere.
VSP-SV, VTSP-SV, VTSP-HCI, VTSP
mark_a_wang
Contributor

I saw a very similar issue on ESXi 6.7 with RHEL 8.5. I have 16 VMs running, and some of them reboot randomly. If I keep them running long enough, I think all of them will go through a reboot. When I first create all 16 VMs and bring them up, almost every time 1 to 3 of them reboot just after the first boot (cloud-init was terminated). I couldn't see anything in the guest VMs' logs, and this can happen on different VMs or hosts. I have the environment ready, and it is easy to recreate the problem. I can provide anything that helps with this issue when needed. Thanks.

mark_a_wang
Contributor

Greetings,

I saw a very similar issue with RHEL 8 VMs running on ESXi 6.7: I have VMs running RHEL 8.5 on ESXi 6.7, and the VMs reboot randomly, with no obvious errors to be found in the guest VM.

It seems you ran into the same issue. Did you solve it?

Thanks,

Mark

mark_a_wang
Contributor

Finally, I found that the reboot is actually expected in my case. Guest VM customization is used here, and the sequence seems to be: pre-customization script -> reboot -> post-customization script. Cloud-init is also used to apply some other configuration, and that takes a little longer. Cloud-init may or may not get interrupted by the reboot, which is why I sometimes noticed the reboot and sometimes did not. The VMware folks suggest using either traditional guest customization or cloud-init, not both during the same boot.