VMware Cloud Community
Alpha_
Contributor

ESXi Hypervisor host not accessible after ~3 days uptime

Hello guys,

I bought a Dell PowerEdge T30 server a couple of weeks ago. The goal was to put ESXi on it and run one VM with pfSense and one VM with Win 2012 server.

The setup is as follows:

* ESXi ((Updated) Dell-ESXi-6.5.0-4564106-A01 (Dell)) - running on a USB stick (SanDisk Ultra Fit 16GB)

* 1x SSD drive divided in two partitions; one partition with pfSense and the second partition as the system disk of the Win 2012 server operating system.

* 3x Western Digital drives used for storage

* One I350-T4 Lenovo Ethernet adapter; 2 ports assigned to the pfSense VM (WAN and LAN), 1 port assigned to Win 2012 server VM

* Integrated NIC is used for administration of ESXi host

Everything has worked flawlessly so far except for one thing: when the machine has been running for approx. 3 days, it becomes impossible to connect to the ESXi host and the VMs. It is as if everything has frozen.

* pfSense is dead, since the whole internal network is offline (all my devices at home are disconnected and no IP addresses are handed out to them)

* The Win 2012 server VM does not respond to remote desktop connections, and neither does the web server installed on it.

* The ESXi console cannot be accessed; the TV connected to the server does not receive any signal. Num Lock and Caps Lock still work on the keyboard connected to the machine (the LEDs switch on and off when I press the keys), but the TV stays black, even if I try any of the combinations Alt+F1, Alt+F2...

* When I try to access ESXi over SSH using PuTTY, I am asked for the user name, BUT when I enter my password correctly the SSH session closes?!
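
To pin down when a freeze like the one described above begins, one option is to probe the affected services from another machine and log the result with a timestamp. The following is only a minimal sketch: the IP addresses and the names in `TARGETS` are hypothetical placeholders, and the ports are the usual defaults for SSH, HTTPS and RDP. Note that a TCP check only shows that a port still accepts connections; in the case described above SSH still prompted for a user name, so port 22 might keep reporting "up" during the freeze.

```python
import socket
import time
from datetime import datetime, timezone

# Hypothetical addresses; replace them with the real ones on your network.
TARGETS = {
    "esxi-ssh": ("192.168.1.10", 22),
    "esxi-web": ("192.168.1.10", 443),
    "win2012-rdp": ("192.168.1.20", 3389),
}


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def probe_once(targets, timeout=3.0):
    """Probe every target once and return one log line per target."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = []
    for name, (host, port) in targets.items():
        state = "up" if is_reachable(host, port, timeout) else "DOWN"
        lines.append(f"{now} {name} {host}:{port} {state}")
    return lines


def monitor(targets, interval_seconds=60, rounds=1):
    """Print probe results for a fixed number of rounds."""
    for i in range(rounds):
        for line in probe_once(targets):
            print(line, flush=True)
        if i + 1 < rounds:
            time.sleep(interval_seconds)
```

Calling `monitor(TARGETS, interval_seconds=60, rounds=5000)` and redirecting the output to a file would cover roughly three and a half days; the first "DOWN" line then gives an approximate time to look for in the host logs.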

So the question is, why does my host freeze after 3 days? I have tried to find any information in the logs but I can't find anything useful. But I am kind of new to ESXi so I might have missed something.

What I have done so far to restart everything is to hold the power button for 5 seconds so the server powers off (hard reset), then push the power button again; the server boots, the VMs boot, and everything is back to normal. I know that this is not a good way to restart the machine, but what else can I do? I cannot access anything on that machine during this "freeze period". I am worried about my data being corrupted by doing this.

So do you have any suggestions on how to proceed? Have you heard about this before?

See my logs below (the time when it got frozen is 2017-12-31 13:41 I think):

dhclient: http://txt.do/dmtzn

esxupdate: http://txt.do/dmtz3

hostd: http://txt.do/dmtzi

shell: http://txt.do/dmtz8

sysboot: http://txt.do/dmt7s

syslog: http://txt.do/dmt7r

vmauthd: http://txt.do/dmt7k

vmkeventd: http://txt.do/dmt7o

vmkwarning: http://txt.do/dmt76

vpxa: http://txt.do/dmt7l
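
For logs like those listed above, it can help to cut each file down to the minutes around the suspected freeze before reading it. The sketch below assumes each log entry starts with an ISO 8601 UTC timestamp such as `2017-12-31T13:41:02.123Z`; that is an assumption about the log format, and the suspected freeze time may need converting from local time to UTC first. The function names are made up for this example.

```python
from datetime import datetime, timedelta

# Timestamp formats assumed at the start of each log entry (an assumption).
_FORMATS = ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ")


def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    token = line.split(" ", 1)[0].strip()
    for fmt in _FORMATS:
        try:
            return datetime.strptime(token, fmt)
        except ValueError:
            continue
    return None


def lines_near(lines, center, minutes=10):
    """Keep only the lines stamped within +/- `minutes` of `center`."""
    window = timedelta(minutes=minutes)
    selected = []
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is not None and abs(stamp - center) <= window:
            selected.append(line)
    return selected


def filter_log_file(path, center, minutes=10):
    """Read a log file and return the entries close to `center`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return lines_near(handle, center, minutes)
```

For example, `filter_log_file("vmkwarning.log", datetime(2017, 12, 31, 13, 41), minutes=15)` would return the entries from a locally saved copy of that log that fall within 15 minutes of the suspected freeze. Lines without a leading timestamp, such as wrapped continuation lines, are dropped by this sketch.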

Cheers!


Accepted Solutions
daphnissov
Immortal

Ok, so a few things to point out here.

  1. As you're probably aware, the T30 does not appear on VMware's HCL, meaning it isn't officially supported. And although it sounds like this is for a personal lab and not for running a business, not being on the HCL also means you take your chances with compatibility. Some hardware out there just will not run ESXi correctly or with any degree of stability. You basically roll the dice and take it as it comes.
  2. The ISO you mentioned you used is for a -very- early build of ESXi 6.5. Before doing any more troubleshooting, I'd strongly recommend you re-install with the latest build of ESXi that Dell has customized. The latest customized image from Dell is VMware-VMvisor-Installer-6.5.0.update01-7388607.x86_64-DellEMC_Customized-A07.iso and here's the direct download link.
  3. You say you have one SSD "divided in two partitions". Does this mean you have two datastores you've created in ESXi that correspond to each partition?
  4. You also say you have three WD drives "used for storage". How are these being presented? As JBOD? Is there RAID abstraction?
  5. As for possible data corruption: yes, you do run that risk when you kill a host, especially with local storage, so you should make sure you have some backup procedure in place.
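
As a side note on point 2, the build number embedded in the image name is a quick way to see how old an installed image is. The sketch below pulls the build number out of the image names quoted in this thread; the regular expression and the function name are assumptions made for this example, and it relies on the general rule that a higher build number within the same release line is newer.

```python
import re

# A run of 6 to 8 digits between a dash and a dash, dot, or end of string.
_BUILD_RE = re.compile(r"-(\d{6,8})(?=[-.]|$)")


def build_number(image_name):
    """Return the build number found in an ESXi image name, or None."""
    match = _BUILD_RE.search(image_name)
    return int(match.group(1)) if match else None


installed = "Dell-ESXi-6.5.0-4564106-A01"
latest = ("VMware-VMvisor-Installer-6.5.0.update01-7388607"
          ".x86_64-DellEMC_Customized-A07.iso")

if build_number(installed) < build_number(latest):
    print("Installed image is older than the recommended one.")
```

Here the installed image reports build 4564106 and the recommended one 7388607, so the check prints the message.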

Alpha_
Contributor

Thank you for your reply!

1. Yes I am aware of that. I guess I have to live with the fact that I roll the dice and have to take it as it comes. Next time I will have a look in the HCL. And yes you are correct, this is for personal/lab use, not business.

2. Oh, I see. Yes, I have certainly encountered some minor bugs in the software during this time. Whether they are related to my hardware not being on the HCL is hard to say, but I have noticed that many people have run into the same issues (on different hardware than mine). Thanks again for the download link!

Is it possible to just re-mount my currently used VMFS6 datastores after re-installing ESXi with the latest build? Or do I have to re-create the datastores after re-installing ESXi? I only want this for the three WD drives, which contain one datastore per drive.

3. Yes, you are correct. Since it is not possible to create two datastores on one physical drive using the ESXi web interface, I used "partedUtil" to achieve this. I chose to split one drive in two because I want the Win 2012 server system disk and the pfSense VM to be on an SSD drive. You would probably not recommend this setup, and I can see why, but I didn't have enough SATA ports available. I might look for a SATA controller board to add 2 more SATA ports to my system.

4. No there is no RAID configuration in my system. I changed the configuration in BIOS to make the SATA ports act like AHCI and not RAID. I do not want RAID in this setup.

5. Actually, I have no backup. But it wouldn't be a problem if I lost everything as it is right now. Since the system is kind of unstable at the moment, there is no important data stored in the system/VMs.
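
For readers wondering what the partedUtil split mentioned in point 3 involves: the sketch below only computes two aligned partition ranges and builds the corresponding `partedUtil setptbl` command line as a string. It is not the command the poster actually ran. The device path and the last usable sector are hypothetical values (on a real host the latter would come from `partedUtil getUsableSectors`), and writing a new partition table destroys whatever is on the disk, so treat it purely as an illustration of the arithmetic.

```python
VMFS_GUID = "AA31E02A400F11DB9590000C2911D1B8"  # GPT type GUID for VMFS
ALIGN = 2048  # sectors; 1 MiB alignment with 512-byte sectors


def split_in_two(last_usable_sector, first_fraction=0.5):
    """Return two (number, start, end) tuples covering the usable range."""
    start1 = ALIGN
    usable = last_usable_sector - start1 + 1
    # Round the boundary down so the second partition starts aligned.
    offset = (int(usable * first_fraction) // ALIGN) * ALIGN
    start2 = start1 + offset
    return [(1, start1, start2 - 1), (2, start2, last_usable_sector)]


def setptbl_command(device, parts):
    """Build (but do not run) a partedUtil command for the given layout."""
    specs = " ".join(
        f'"{num} {start} {end} {VMFS_GUID} 0"' for num, start, end in parts
    )
    return f'partedUtil setptbl "{device}" gpt {specs}'


# Hypothetical device path and sector count, for illustration only.
DEVICE = "/vmfs/devices/disks/EXAMPLE_DEVICE_ID"
layout = split_in_two(last_usable_sector=488397134)
print(setptbl_command(DEVICE, layout))
```

Each partition would still need to be formatted as a datastore afterwards; that step is not shown here.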

daphnissov
Immortal

"Is it possible to just re-mount my currently used VMFS6 datastores after re-installing ESXi with the latest build?"

Yes, just re-install ESXi using the latest ISO and choose NOT to overwrite VMFS datastores. They'll get re-mounted when it comes up.

Re #3, I would change that setup and use the entirety of the SSD for a single VMFS datastore and then create individual datastores from your other drives. You can easily split up the virtual disks that make up a virtual machine and place them on different datastores, allowing you to reserve that SSD for high-IO needs only.

So, again, re-install using the latest build available, fix your SSD to wipe out the partition map, and see what you get then.

Alpha_
Contributor

Hello again.

I did update the ESXi installation to "DellEMC-ESXi-6.5U1-7388607-A07". No problems were encountered.

Now, one week later, the host and the VMs are still up and running. No interruptions, no system freeze.

I have not separated the datastores on the SSD drive yet, but this will be done at a later stage.

Thanks for your help daphnissov.

/Simon
