VMware Cloud Community
lrmurali
Enthusiast

Kernel driver C program with emergency_restart not rebooting the VMs on ESXi 6.0

Hi VMware enthusiasts,

We need your expert opinion on a reboot problem we are facing on our test systems.

Our VMware VMs do not come back up after being reset by a kernel driver program.

We have written a simple C program that runs in kernel mode and calls "emergency_restart()" to reset a running VM. After this program runs, the VMs do not reboot and do not respond to any console commands from vCenter. The VMs stay down until we reset them again with vCenter power->reset.

Our environment:

-------------------

4 ESXi Servers Model: HP Proliant BL460c Gen9

fullName = "VMware ESXi 6.0.0 build-2494585",

There are 7 VMs on each host, running different operating systems (RH6, RH7, SLES11, SLES12).

With this configuration we tried resetting different VMs by executing the C program. VMs on all of these operating systems show the same problem: they do not reboot after the kernel driver program runs.

I would like to know what could be causing this problem. Is there any configuration needed to resolve it?

We initially thought it could be due to the BIOS boot order, so I set "bios.hddOrder = "scsi0:0"" in the vmx file of every VM. The problem still occurs with this setting.

On all of these VMs the OS vmdk resides on SCSI controller 0. This controller is set to "VMware paravirtual" with SCSI bus sharing set to "none". On a few VMs we set the controller to "LSI Logic parallel" with SCSI bus sharing set to "none". The reboot problem is seen on all of these VMs.

Any hint on why the issue arises on all of these VMs would be really helpful.

If you need any further logs or information on the existing setup, let us know.

Many thanks in advance.

 

ESXi vmkernel log during the issue:

>>>>>>>>

2017-11-16T13:23:22.120Z cpu16:12730886)VSCSI: 6699: handle 13587(vscsi0:0):Destroying Device for world 12730875 (pendCom 0)

2017-11-16T13:23:22.120Z cpu16:12730886)VSCSI: 6699: handle 13588(vscsi1:4):Destroying Device for world 12730875 (pendCom 0)

2017-11-16T13:23:22.120Z cpu16:12730886)VSCSI: 6699: handle 13589(vscsi1:5):Destroying Device for world 12730875 (pendCom 0)

2017-11-16T13:23:22.120Z cpu16:12730886)VSCSI: 6699: handle 13590(vscsi1:6):Destroying Device for world 12730875 (pendCom 0)

2017-11-16T13:23:22.120Z cpu16:12730886)VSCSI: 6699: handle 13591(vscsi1:8):Destroying Device for world 12730875 (pendCom 0)

2017-11-16T13:23:22.120Z cpu16:12730886)VSCSI: 6699: handle 13592(vscsi1:9):Destroying Device for world 12730875 (pendCom 0)

2017-11-16T13:23:22.121Z cpu16:12730886)VSCSI: 6699: handle 13593(vscsi2:12):Destroying Device for world 12730875 (pendCom 0)

2017-11-16T13:23:22.121Z cpu16:12730886)NetPort: 1780: disabled port 0x200008e

2017-11-16T13:23:22.121Z cpu16:12730886)NetPort: 1780: disabled port 0x3000111

2017-11-16T13:23:22.121Z cpu16:12730886)NetPort: 1780: disabled port 0x3000110

2017-11-16T13:23:22.121Z cpu16:12730886)NetPort: 1780: disabled port 0x4000089

2017-11-16T13:23:22.121Z cpu16:12730886)NetPort: 1780: disabled port 0x5000088

2017-11-16T13:23:22.171Z cpu11:12730886)VSCSI: 4011: handle 13594(vscsi0:0):Creating Virtual Device for world 12730875 (FSS handle 255599114) numBlocks=104857600 (bs=512)

2017-11-16T13:23:22.171Z cpu11:12730886)VSCSI: 273: handle 13594(vscsi0:0):Input values: res=0 limit=-1 bw=-1 Shares=-1

2017-11-16T13:23:22.175Z cpu11:12730886)VSCSI: 4011: handle 13595(vscsi1:4):Creating Virtual Device for world 12730875 (FSS handle 261235195) numBlocks=4194304 (bs=512)

2017-11-16T13:23:22.175Z cpu11:12730886)VSCSI: 273: handle 13595(vscsi1:4):Input values: res=0 limit=-1 bw=-1 Shares=-1

2017-11-16T13:23:22.176Z cpu11:12730886)VSCSI: 4011: handle 13596(vscsi1:5):Creating Virtual Device for world 12730875 (FSS handle 241312250) numBlocks=4194304 (bs=512)

2017-11-16T13:23:22.176Z cpu11:12730886)VSCSI: 273: handle 13596(vscsi1:5):Input values: res=0 limit=-1 bw=-1 Shares=-1

2017-11-16T13:23:22.178Z cpu11:12730886)VSCSI: 4011: handle 13597(vscsi1:6):Creating Virtual Device for world 12730875 (FSS handle 230040056) numBlocks=4194304 (bs=512)

2017-11-16T13:23:22.178Z cpu11:12730886)VSCSI: 273: handle 13597(vscsi1:6):Input values: res=0 limit=-1 bw=-1 Shares=-1

2017-11-16T13:23:22.180Z cpu11:12730886)VSCSI: 4011: handle 13598(vscsi1:8):Creating Virtual Device for world 12730875 (FSS handle 211755513) numBlocks=4194304 (bs=512)

2017-11-16T13:23:22.180Z cpu11:12730886)VSCSI: 273: handle 13598(vscsi1:8):Input values: res=0 limit=-1 bw=-1 Shares=-1

2017-11-16T13:23:22.182Z cpu11:12730886)VSCSI: 4011: handle 13599(vscsi1:9):Creating Virtual Device for world 12730875 (FSS handle 212541943) numBlocks=4194304 (bs=512)

2017-11-16T13:23:22.182Z cpu11:12730886)VSCSI: 273: handle 13599(vscsi1:9):Input values: res=0 limit=-1 bw=-1 Shares=-1

2017-11-16T13:23:22.182Z cpu11:12730886)VSCSI: 4011: handle 13600(vscsi2:12):Creating Virtual Device for world 12730875 (FSS handle 233447925) numBlocks=2096 (bs=512)

2017-11-16T13:23:22.182Z cpu11:12730886)VSCSI: 273: handle 13600(vscsi2:12):Input values: res=0 limit=-1 bw=-1 Shares=-1

2017-11-16T13:23:22.182Z cpu11:12730886)Vmxnet3: 15108: Using default queue delivery for vmxnet3 for port 0x200008e

2017-11-16T13:23:22.183Z cpu11:12730886)NetPort: 1573: enabled port 0x200008e with mac 00:50:56:95:7e:de

2017-11-16T13:23:22.183Z cpu11:12730886)Vmxnet3: 15108: Using default queue delivery for vmxnet3 for port 0x3000111

2017-11-16T13:23:22.184Z cpu11:12730886)NetPort: 1573: enabled port 0x3000111 with mac 00:50:56:95:48:3a

2017-11-16T13:23:22.184Z cpu11:12730886)Vmxnet3: 15108: Using default queue delivery for vmxnet3 for port 0x3000110

2017-11-16T13:23:22.184Z cpu11:12730886)NetPort: 1573: enabled port 0x3000110 with mac 00:50:56:95:48:3a

2017-11-16T13:23:22.184Z cpu11:12730886)Vmxnet3: 15108: Using default queue delivery for vmxnet3 for port 0x4000089

2017-11-16T13:23:22.185Z cpu11:12730886)NetPort: 1573: enabled port 0x4000089 with mac 00:50:56:95:6c:cb

2017-11-16T13:23:22.185Z cpu11:12730886)Vmxnet3: 15108: Using default queue delivery for vmxnet3 for port 0x5000088

2017-11-16T13:23:22.185Z cpu11:12730886)NetPort: 1573: enabled port 0x5000088 with mac 00:50:56:95:4e:b9

2017-11-16T13:23:22.262Z cpu12:33199)P2MCache: 383: vm 12730875: GetPhysMemRange failed for PPN 0x1e589485 canBlock 0 count 95 status Bad parameter

2017-11-16T13:23:22.262Z cpu12:33199)WARNING: P2MCache: vm 12730875: 772: failed to Inc PinCount for PPN 0x1e589485

2017-11-16T13:23:22.473Z cpu11:12730886)NetPort: 1780: disabled port 0x4000089

2017-11-16T13:23:22.524Z cpu11:12730886)NetPort: 1780: disabled port 0x200008e

2017-11-16T13:23:22.534Z cpu11:12730886)NetPort: 1780: disabled port 0x5000088

2017-11-16T13:23:22.584Z cpu11:12730886)NetPort: 1780: disabled port 0x3000111

2017-11-16T13:23:22.643Z cpu0:12730886)NetPort: 1780: disabled port 0x3000110

>>>>>>>>

I can see one error in the above log, "GetPhysMemRange". Searching for this error on the VMware site and on Google didn't give any hint.
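To pick lines like that one out of the device churn, the log can be filtered mechanically. The following is only a rough user-space sketch: the struct and function names (vmk_event, parse_vmk_line) are made up for this example, and it assumes every line has the same "timestamp cpuN:worldID)Subsystem: ..." layout as the excerpt above, with an optional "WARNING: " prefix before the subsystem name.

```c
#include <stdio.h>
#include <string.h>

/* One parsed vmkernel log line (hypothetical helper type). */
struct vmk_event {
    char timestamp[32];    /* e.g. 2017-11-16T13:23:22.121Z */
    unsigned long world;   /* the ID printed after "cpuN:" */
    char subsystem[32];    /* e.g. NetPort, VSCSI, P2MCache */
    int is_warning;        /* 1 if the line carries "WARNING: " */
};

/*
 * Parse a single log line into ev.
 * Returns 1 on success, 0 if the line does not match the assumed layout.
 */
int parse_vmk_line(const char *line, struct vmk_event *ev)
{
    unsigned int cpu;
    int consumed = 0;

    /* %n records how far we got, so we can tell whether ")" matched. */
    if (sscanf(line, "%31s cpu%u:%lu)%n",
               ev->timestamp, &cpu, &ev->world, &consumed) != 3
        || consumed == 0)
        return 0;

    const char *rest = line + consumed;

    ev->is_warning = 0;
    if (strncmp(rest, "WARNING: ", 9) == 0) {
        ev->is_warning = 1;
        rest += 9;
    }

    /* The subsystem name runs up to the next colon. */
    size_t n = strcspn(rest, ":");
    if (n == 0 || rest[n] != ':' || n >= sizeof ev->subsystem)
        return 0;
    memcpy(ev->subsystem, rest, n);
    ev->subsystem[n] = '\0';
    return 1;
}
```

Feeding each line of vmkernel.log through parse_vmk_line() and printing only the entries with is_warning set should leave just the P2MCache warning from the excerpt above. This only narrows down where to look; it says nothing about whether that warning is the cause.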

Is there any issue with the setup? Has anyone seen a similar problem? Do I need to update my current ESXi or BIOS?

Could it have anything to do with the OS vmdk/SCSI controller settings?

NOTE: We have not seen any issue when the VM is reset via vCenter->power->reset. The issue is seen only when the kernel driver program executes "emergency_restart"!

Regards,

Murali

8 Replies
daphnissov
Immortal

If you run a reboot now command from bash, does the VM restart and come up successfully?

lrmurali
Enthusiast

Yes, the bash 'reboot now' command brings the VM back up.

daphnissov
Immortal

If it's coming up with a soft reboot request, then there may be a problem with your C because this doesn't appear to be an issue with ESXi.

lrmurali
Enthusiast

>>If it's coming up with a soft reboot request, then there may be a problem with your C because this doesn't appear to be an issue with ESXi.

Thanks for your reply. A soft reboot only reboots the operating system, right? Our C program will send a reset request to ESXi/vCenter to reset the VM. Will it have any impact compared to a reboot?

Our C program has been used on many other configurations, where it works fine, and it has been in use for a while with no issues resetting VMs.

I suspect there is some issue with the current setup where I am seeing this problem. Do you have any hints on where I should look for configuration issues? Are there any boot logs I can check?

Could it have anything to do with the ESXi version or build, or with the OS hard disk setup?

Thanks in advance.

-Murali

daphnissov
Immortal

Soft reboot only sends an ACPI request to the OS requesting a reboot, yes.

>>Our C program will send a reset request to ESXi/vCenter to reset the VM. Will it have any impact compared to a reboot?

I don't understand what you mean. I thought your program rebooted the VM from the guest's perspective. Also a "reset" is different from a "reboot" from the perspective of vSphere. A reset is a hard power operation whereas reboot is a soft operation.

If you have specific programming interface questions, you should probably open a case with VMware to see if they can answer it, because you're in territory that isn't seen by many.

lrmurali
Enthusiast

Thanks for your reply.

>>I don't understand what you mean. I thought your program rebooted the VM from the guest's perspective

Our program sends the reset request through emergency_restart. It does not reboot from the guest's perspective.

When the program is executed, I can see a "Reset virtual machine" task on the virtual machine's "Task&Events" tab, and its status shows as completed. However, the VM never reaches the OS boot prompt and does not boot. I will check for any other VM/host configuration that could lead to this problem.

>>If you have specific programming interface questions, you should probably open a case with VMware to see if they can answer it, because you're in territory that isn't seen by many.

Thanks for your suggestion. Will try to open a CR with VMware and check.

-Murali

daphnissov
Immortal

Ok, if you're seeing a task show up, that means you're running the command against vCenter or the host directly. You may have an incorrect method you're calling. Without seeing the code it's hard to tell.

lrmurali
Enthusiast

Hi,

Here is the source of our kernel module, in case you want to look into it.

cat test_reboot.c

==================

#include <linux/module.h>    // included for all kernel modules
#include <linux/kernel.h>    // included for KERN_INFO
#include <linux/init.h>      // included for __init and __exit macros
#include <linux/kthread.h>
#include <linux/reboot.h>    // included for emergency_restart()
#include <linux/version.h>
#include <linux/jiffies.h>   // included for jiffies and HZ
#include <linux/timer.h>     // included for struct timer_list

static int sleep = 0;        // delay in seconds before the reset
static struct timer_list timer_root_disk;

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Test Reboot module");
module_param(sleep, int, 0644);
MODULE_PARM_DESC(sleep, "sleep value to be passed along with insmod");

// Timer callback. With the init_timer() API used here, the callback
// receives the timer's .data value as an unsigned long argument.
static void
root_disk_timer_routine(unsigned long data)
{
    printk(KERN_INFO " Test Reboot: Rebooting node\n");
    emergency_restart();
}

static int __init hello_init(void)
{
    printk(KERN_INFO " Test Reboot: Entering Reboot Test module!\n");
    printk(KERN_INFO " Test Reboot: Rebooting node after %d seconds\n", sleep);

    init_timer(&timer_root_disk);
    timer_root_disk.function = root_disk_timer_routine;
    timer_root_disk.data = 1;
    timer_root_disk.expires = jiffies + sleep * HZ;
    add_timer(&timer_root_disk); /* Starting the timer */

    return 0;    // Non-zero return means that the module couldn't be loaded.
}

static void __exit hello_cleanup(void)
{
    // Cancel a pending timer so it cannot fire after the module is unloaded.
    del_timer_sync(&timer_root_disk);
    printk(KERN_INFO " Test Reboot: Exiting module.\n");
}

module_init(hello_init);
module_exit(hello_cleanup);

Makefile for this kernel module:

================================

$cat Makefile

KERNEL_SOURCE := /lib/modules/`uname -r`/source
KERNEL_BUILD  := /lib/modules/`uname -r`/build

# Module will be installed into /lib/modules/`uname -r`/kernel/extra
obj-m           := test_reboot.o

# Do not print "Entering directory ..."
MAKEFLAGS += --no-print-directory

# Targets for running make directly in the external module directory:
modules modules_install clean:
	@$(MAKE) -C $(KERNEL_SOURCE) $@ O=$(KERNEL_BUILD) M=$(CURDIR)

       

Execution step:

===============

make modules

insmod test_reboot.ko sleep=10 (This will reset the node after 10 seconds)

>>> Executing this command will not bring up the node (most of the time)

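One more angle that may be worth checking, offered as a sketch rather than a diagnosis: on x86 Linux, emergency_restart() skips the normal shutdown path (no filesystem sync, no reboot notifiers) and goes straight to the machine reset code. The reset method that code tries can be overridden with the reboot= kernel boot parameter (for example reboot=acpi, reboot=kbd or reboot=triple). Comparing that parameter between the guests where the module works and the guests where it fails is cheap. The helper below is user-space C, not part of the module, and its function names (find_reboot_param, report_reboot_param) are made up for this example. It only extracts the reboot= value from a command line string such as the contents of /proc/cmdline.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy the value of the first "reboot=" token in a kernel command line
 * into out (truncated to fit). Returns 1 if found, 0 otherwise.
 */
int find_reboot_param(const char *cmdline, char *out, size_t outlen)
{
    static const char key[] = "reboot=";
    const size_t keylen = sizeof(key) - 1;
    const char *p = cmdline;

    if (outlen == 0)
        return 0;

    while (*p != '\0') {
        /* Skip the whitespace that separates parameters. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        if (*p == '\0')
            break;

        /* Length of the current whitespace-delimited token. */
        size_t toklen = strcspn(p, " \t\n");

        if (toklen >= keylen && strncmp(p, key, keylen) == 0) {
            size_t vallen = toklen - keylen;
            if (vallen >= outlen)
                vallen = outlen - 1;
            memcpy(out, p + keylen, vallen);
            out[vallen] = '\0';
            return 1;
        }
        p += toklen;
    }
    return 0;
}

/*
 * Read the first line of the given file (normally /proc/cmdline) and
 * print the reboot= setting. Returns 1 if set, 0 if not, -1 on I/O error.
 */
int report_reboot_param(const char *path)
{
    char line[4096];
    char value[64];
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    if (fgets(line, sizeof line, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    if (find_reboot_param(line, value, sizeof value)) {
        printf("reboot=%s\n", value);
        return 1;
    }
    printf("no reboot= parameter set, kernel default reset method in use\n");
    return 0;
}
```

Calling report_reboot_param("/proc/cmdline") inside a guest prints the value if one is set. Whether a different reboot= value changes the behaviour on this ESXi build has not been tested here.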