VMware Cloud Community
Fttz
Enthusiast

VMware ESXi 5.5 crashing

I have a server running ESXi 5.5 with two pfSense virtual machines. The problem is that almost every morning the server crashes: I cannot reach the virtual machines or the host itself through the vSphere Client. What could be causing this? Is it a hardware problem, or can virtual machines make the host crash? Thanks

Accepted Solutions
Fttz
Enthusiast

31 days of uptime!

The problem really was overheating.

Solution: an 8 cm cooling fan plus an opening in the case cover for airflow.

Thanks to all who helped!

45 Replies
Dee006
Hot Shot

Fttz, there can be several reasons for a crash; check out this KB article. Can you post the PSOD screenshot and the vmkernel logs?

Fttz
Enthusiast

Dee006,

The server has now frozen again, five hours after the last crash. It does not display the purple screen error (PSOD); it shows the default console screen (with system information, etc.).

I have attached the log files.

Dee006
Hot Shot

vpxa

2015-01-02T14:09:57.129Z [FFD261A0 info 'Default'] [VpxVmomi] SOAP adapter started on named pipe /var/run/vmware/proxy-vpxa

2015-01-02T14:09:57.129Z [FFD261A0 verbose 'vpxavpxaMain'] [VpxaModulesStart] DONE

2015-01-02T14:09:57.129Z [FFD261A0 info 'Memory checker'] Check resources every 30 secs, soft limit 307200, hard limit 358400.

2015-01-02T14:09:57.129Z [FFD261A0 info 'Handle checker'] Setting system limit of 1024

2015-01-02T14:09:57.129Z [FFD261A0 info 'Handle checker'] Set system limit to 1024

2015-01-02T14:10:46.526Z [FFDABB70 verbose 'HttpConnectionPool-000000'] [RemoveConnection] Connection removed; cnx: <SSL(<io_obj p:0x1f3b5320, h:-1, <TCP '0.0.0.0:0'>, <TCP '127.0.0.1:443'>>)>; pooled: 0

2015-01-02T14:10:46.526Z [FFD261A0 verbose 'HttpConnectionPool-000001'] [RemoveConnection] Connection removed; cnx: <SSL(<io_obj p:0x1f3c9848, h:-1, <TCP '0.0.0.0:0'>, <TCP '127.0.0.1:443'>>)>; pooled: 0

vmkwarning

0:00:00:00.000 cpu0:1)WARNING: Serial: 806: Serial port com2 failed during initialization.

0:00:00:00.000 cpu0:1)WARNING: Serial: 807: Serial port com2 will be disabled.

0:00:00:05.187 cpu0:32768)WARNING: VMKAcpi: 780: No IPMI PNP id found

2015-01-02T14:09:03.923Z cpu2:33294)WARNING: LinuxSignal: 538: ignored unexpected signal flags 0x2 (sig 17)

2015-01-02T14:09:04.290Z cpu1:33177)WARNING: Team.etherswitch: TeamES_Activate:668: Failed to initialize beaconing on portset 'pps': Not implemented.

2015-01-02T14:09:15.764Z cpu1:33177)WARNING: NetDVS: 547: portAlias is NULL

2015-01-02T14:09:24.310Z cpu0:33509)WARNING: Supported VMs 25, Max VSAN VMs 100, SystemMemoryInGB 6

2015-01-02T14:09:24.310Z cpu0:33509)WARNING: MaxFileHandles: 750, Prealloc 1, Prealloc limit: 32 GB, Host scaling factor: 6

2015-01-02T14:09:24.313Z cpu0:33509)WARNING: DOM memory will be preallocated.

2015-01-02T14:09:46.092Z cpu0:34150)WARNING: UserEpoll: 542: UNSUPPORTED events 0x40

2015-01-02T14:09:46.606Z cpu2:34150)WARNING: LinuxSocket: 1854: UNKNOWN/UNSUPPORTED socketcall op (whichCall=0x12, args@0xff9ffd8c)

2015-01-02T14:09:59.169Z cpu2:35152)WARNING: NetDVS: 547: portAlias is NULL

2015-01-02T14:09:59.171Z cpu2:35152)WARNING: NetDVS: 547: portAlias is NULL

2015-01-02T14:09:59.173Z cpu2:35152)WARNING: NetDVS: 547: portAlias is NULL

2015-01-02T14:09:59.176Z cpu2:35152)WARNING: NetDVS: 547: portAlias is NULL

2015-01-02T14:11:59.234Z cpu2:35537)WARNING: NetDVS: 547: portAlias is NULL

2015-01-02T14:11:59.236Z cpu2:35537)WARNING: NetDVS: 547: portAlias is NULL

vmkernel logs

2015-01-02T14:17:54.472Z cpu0:33981)World: 14299: VC opID hostd-5f33 maps to vmkernel opID fad8d528

2015-01-02T14:18:11.007Z cpu0:33988)World: 14299: VC opID 70A6ABB7-00000024 maps to vmkernel opID 6a43f816

2015-01-02T14:18:13.948Z cpu3:37018)Hardware: 3124: Assuming TPM is not present because trusted boot is not supported.

2015-01-02T14:18:14.005Z cpu3:37019)Hardware: 3124: Assuming TPM is not present because trusted boot is not supported.

2015-01-02T14:18:14.637Z cpu0:33973 opID=1ff8b7f7)World: 14299: VC opID hostd-159b maps to vmkernel opID 1ff8b7f7

2015-01-02T14:18:17.670Z cpu0:33974 opID=bc2ceb41)World: 14299: VC opID hostd-05c5 maps to vmkernel opID bc2ceb41

2015-01-02T14:18:20.003Z cpu2:33993)World: 14299: VC opID hostd-de94 maps to vmkernel opID 3cf9045e

2015-01-02T14:18:53.810Z cpu3:32788)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x1a (0x412e80896440, 0) to dev "mpx.vmhba34:C0:T0:L0" on path "vmhba34:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2015-01-02T14:18:53.810Z cpu3:32788)ScsiDeviceIO: 2338: Cmd(0x412e80896440) 0x1a, CmdSN 0x711 from world 0 to dev "mpx.vmhba34:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2015-01-02T14:19:15.510Z cpu1:37261)Config: 346: "NetTraceEnable" = 1, Old Value: 0, (Status: 0x0)

2015-01-02T14:19:15.510Z cpu1:37261)Config: 346: "NetTraceEnable" = 0, Old Value: 1, (Status: 0x0)

2015-01-02T14:19:15.510Z cpu1:37261)Config: 346: "NetTraceEnable" = 0, Old Value: 0, (Status: 0x0)

2015-01-02T14:19:20.004Z cpu1:33981)World: 14299: VC opID hostd-70a5 maps to vmkernel opID 7ea5898e

These are the logs I could obtain from the dump file.

From the vmkernel logs, I believe this could be a storage-level issue. Let me do some research on it.
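
For reference, the status fields in the NMP line above can be decoded by hand. The sketch below does this with small lookup tables of standard SCSI values that cover only a few codes, including the ones seen in this thread; the function name `decode_nmp_line` and the table names are made up for illustration. Under these tables the failed command reads as MODE SENSE(6) rejected with CHECK CONDITION / ILLEGAL REQUEST / INVALID COMMAND OPERATION CODE, which usually means the device does not implement that command rather than that the disk is failing, so it may be unrelated to the crashes.

```python
import re

# Standard SCSI values; only a handful of entries, enough for this thread's logs.
OPCODES = {0x12: "INQUIRY", 0x1A: "MODE SENSE(6)", 0x28: "READ(10)", 0x2A: "WRITE(10)"}
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION", 0x8: "BUSY"}
SENSE_KEYS = {0x2: "NOT READY", 0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR",
              0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION"}
ASC_ASCQ = {(0x20, 0x00): "INVALID COMMAND OPERATION CODE",
            (0x24, 0x00): "INVALID FIELD IN CDB"}

# Matches the "NMP: nmp_ThrottleLogForDevice" line format shown in the thread.
NMP_LINE = re.compile(
    r"Cmd (?P<op>0x[0-9a-fA-F]+) .*?"
    r"H:(?P<host>0x[0-9a-fA-F]+) D:(?P<dev>0x[0-9a-fA-F]+) P:(?P<plugin>0x[0-9a-fA-F]+) "
    r"Valid sense data: (?P<key>0x[0-9a-fA-F]+) (?P<asc>0x[0-9a-fA-F]+) (?P<ascq>0x[0-9a-fA-F]+)"
)

SAMPLE = (
    '2015-01-02T14:18:53.810Z cpu3:32788)NMP: nmp_ThrottleLogForDevice:2321: '
    'Cmd 0x1a (0x412e80896440, 0) to dev "mpx.vmhba34:C0:T0:L0" on path '
    '"vmhba34:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE'
)


def decode_nmp_line(line):
    """Translate the numeric SCSI fields of one NMP log line into names.

    Returns None if the line does not match the expected format.
    Codes missing from the tables are returned as hex strings.
    """
    match = NMP_LINE.search(line)
    if match is None:
        return None
    num = {name: int(value, 16) for name, value in match.groupdict().items()}
    return {
        "command": OPCODES.get(num["op"], hex(num["op"])),
        "host_status": hex(num["host"]),
        "device_status": DEVICE_STATUS.get(num["dev"], hex(num["dev"])),
        "sense_key": SENSE_KEYS.get(num["key"], hex(num["key"])),
        "additional_sense": ASC_ASCQ.get(
            (num["asc"], num["ascq"]), f'{num["asc"]:#x}/{num["ascq"]:#x}'
        ),
    }


if __name__ == "__main__":
    print(decode_nmp_line(SAMPLE))
```

The tables are deliberately tiny; any code they do not know is passed through as hex so that it can be looked up in the SCSI specification instead of being guessed.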

A couple of questions for you:

Are you trying to vMotion your VM?

Do you have enough resources on your ESXi host and datastore?

Do you have enough ports on the vSwitch?

Fttz
Enthusiast

What do you mean by "storage level"? I do not understand.

I'm not trying to vMotion any VM.

I believe that my resources are sufficient.

Attached are a few screenshots of my host.

I have four vSwitches, each with one NIC. One VM uses all four vSwitches and the other uses two. The ports used by the two VMs have IP addresses on the same network.

thanks

Alistar
Expert

I have analyzed the logs for some time, but it is quite hard to pinpoint the exact issue. There are a lot of storage-related messages, though, and the loaded modules suggest that you may be using VSAN or flash cache. Is that the case? Can you describe your hardware setup, mainly how the storage is connected?

The best option would be to post the PSOD screenshot so we can trace the thread stack just before the panic. Nothing in the logs points clearly at any one component :(

Edit: You may want to update the Realtek driver; the one in use dates from 2009 and is quite outdated:

r8168  Copyright (C) 2009  Realtek NIC software team <nicfae@realtek.com>

This program comes with ABSOLUTELY NO WARRANTY; for details, please see <http://www.gnu.org/licenses/>.

This is free software, and you are welcome to redistribute it under cer$

Stop by my blog if you'd like 🙂 I dabble in vSphere troubleshooting, PowerCLI scripting and NetApp storage - and I share my journeys at http://vmxp.wordpress.com/
FritzBrause
Enthusiast

There is nothing in the logs pointing to an unexpected reboot or shutdown.

I checked syslog.log, vmkernel.log, vmkwarning.log and hostd.log.

The server did not write a crash dump.

It just (cold) booted:

2014-12-31T17:47:42Z bootstop: Host has booted

2014-12-31T18:01:02Z bootstop: Host has booted

2015-01-01T10:24:54Z bootstop: Host has booted

2015-01-02T08:56:04Z bootstop: Host has booted

2015-01-02T09:14:46Z bootstop: Host has booted

2015-01-02T11:35:03Z bootstop: Host has booted

2015-01-02T14:09:57Z bootstop: Host has booted
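
The boot timestamps above can be turned into gaps between consecutive boots, which gives an upper bound on how long the host stayed up each time (a gap also includes any time the host sat frozen before it was reset). A minimal sketch, assuming the lines have been copied out of syslog.log as plain text; the function names are illustrative only:

```python
from datetime import datetime, timedelta

BOOT_LOG = """\
2014-12-31T17:47:42Z bootstop: Host has booted
2014-12-31T18:01:02Z bootstop: Host has booted
2015-01-01T10:24:54Z bootstop: Host has booted
2015-01-02T08:56:04Z bootstop: Host has booted
2015-01-02T09:14:46Z bootstop: Host has booted
2015-01-02T11:35:03Z bootstop: Host has booted
2015-01-02T14:09:57Z bootstop: Host has booted
"""


def boot_times(text):
    """Return the timestamps of all 'Host has booted' lines, in file order."""
    times = []
    for line in text.splitlines():
        if "bootstop: Host has booted" in line:
            stamp = line.split()[0]
            times.append(datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ"))
    return times


def gaps_between_boots(times):
    """Return the time elapsed between each boot and the next one."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


def short_gaps(gaps, limit=timedelta(hours=1)):
    """Return the gaps shorter than `limit` (reboots in quick succession)."""
    return [gap for gap in gaps if gap < limit]


if __name__ == "__main__":
    times = boot_times(BOOT_LOG)
    for start, gap in zip(times, gaps_between_boots(times)):
        print(f"booted {start:%Y-%m-%d %H:%M:%S}, next boot after {gap}")
```

For the seven boots listed, the gaps range from under 15 minutes to more than 22 hours, so the restarts do not follow a fixed schedule.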

Suggestions:

1. Make sure latest BIOS is installed.

2. Make sure system and components are on VMware HCL.

3. Check BIOS for power saving functions (disable all and set power to max).

4. As a test, install 5.1 and see if that works.

Fttz
Enthusiast

I left the server powered on with no VM running. When I checked it in the morning, it was frozen.

On earlier occasions no PSOD was shown. But today, after I rebooted the server and tried to extract the logs, a PSOD appeared. I tried four times, and each attempt ended in a PSOD, so I could not get the logs.

I have attached the screenshots.

I'm not using VSAN or flash cache.

I use a single SATA hard disk for the entire server (ESXi and VMs).

Where can I find updated Realtek drivers?

Thanks

Fttz
Enthusiast

On the earlier occasions the system did not show a PSOD. Today the system had already crashed when I checked it (with no VM running), so I reset it and tried to extract the logs, but a PSOD was displayed four times in a row and I could not get the logs, because the system crashed every time I tried.

I restored the BIOS default settings, but the problem remained.

Before installing the system, I customized the ESXi 5.5 ISO with third-party drivers for my Realtek and D-Link 528T NICs. Could this be the problem?

I'll check the power-saving functions.

Thanks

Dee006
Hot Shot

Nice catch, Alistar. I believe there is a driver problem. Have you enabled jumbo frames on any of your NICs? If so, can you disable them for a while and see how it goes?

Are you using a branded server, such as HP or Dell?

Dee006
Hot Shot

I found something in the V-Front VIBSDepot. Check this: Net51-drivers - V-Front VIBSDepot Wiki

Check whether your PCI ID 10ec:8167 is listed there.

Fttz
Enthusiast

All NICs have the MTU set to 1500.

I use a custom-built server: an ASUS motherboard (onboard Realtek NIC), three D-Link 528T PCI NICs, a 500 GB SATA disk, and 6 GB of RAM.

Fttz
Enthusiast

Are you recommending that I rebuild the ISO with the Net51 driver package (instead of the Realtek package I used) together with the D-Link driver package I used before? Thanks

Fttz
Enthusiast

I do not use a Broadcom NIC. Could this still be the problem?

Alistar
Expert

PCI: 1269: 0000:03:00.0 10ec:8168 1043:8505 discovered
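
The PCI line above can be split into bus address, vendor:device ID, and subsystem vendor:device ID, which makes it easy to compare against the ID quoted earlier in the thread (10ec:8167). A minimal sketch, assuming the field order is address, vendor:device, subvendor:subdevice as the line suggests; the regular expression and the helper name `parse_pci_line` are illustrative and not part of any VMware tool. Vendor ID 10ec is Realtek and 1043 is ASUSTeK.

```python
import re

# Assumed layout of the vmkernel "PCI: ... discovered" line:
#   PCI: <code line>: <domain:bus:slot.func> <vendor:device> <subvendor:subdevice> discovered
PCI_LINE = re.compile(
    r"PCI: \d+: (?P<address>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4}) "
    r"(?P<subvendor>[0-9a-f]{4}):(?P<subdevice>[0-9a-f]{4}) discovered"
)

LOG_LINE = "PCI: 1269: 0000:03:00.0 10ec:8168 1043:8505 discovered"
ID_QUOTED_IN_THREAD = "10ec:8167"


def parse_pci_line(line):
    """Return the PCI address and IDs from one log line, or None if it does not match."""
    match = PCI_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "address": fields["address"],
        "device_id": f'{fields["vendor"]}:{fields["device"]}',
        "subsystem_id": f'{fields["subvendor"]}:{fields["subdevice"]}',
    }


if __name__ == "__main__":
    info = parse_pci_line(LOG_LINE)
    print(info)
    print("matches ID quoted in thread:", info["device_id"] == ID_QUOTED_IN_THREAD)
```

The device ID in the log (10ec:8168) differs from the one quoted earlier (10ec:8167), so the ID to look up in the driver package's supported-device list is the one from the log.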

Yes, this package ought to help with your issue.

There is no need to reimage the host, just:

  1. Download the zip package from the site mentioned above
  2. Upload it to your local datastore via the vSphere Client
  3. Put your host into maintenance mode
  4. Connect to your ESXi host via SSH (e.g. with PuTTY) and enter the commands given in the steps below
  5. In the console, assume root with "su -" (the dash is needed)
  6. Change to your local datastore with cd /vmfs/volumes/datastore1
  7. Set the acceptance level to Community Supported (if not already set): esxcli software acceptance set --level=CommunitySupported
  8. Install the VIB package: esxcli software vib install -d "/vmfs/volumes/datastore1/net51-drivers-1.0.0-1vft.510.0.0.799733-offline_bundle.zip"
  9. Confirm, wait a minute, and then reboot your ESXi host
  10. Congratulations! Your troubles should be remedied now :)

Fttz
Enthusiast

I tried to update and got this error:

/vmfs/volumes/54a29525-e7f3b84c-0515-60a44ccd6fa6 # esxcli software vib install -d "/vmfs/volumes/datastore1/net51-drivers-1.0.0-1vft.510.0.0.799733.x86_64.vib"

[MetadataDownloadError]

Could not download from depot at /vmfs/volumes/datastore1/net51-drivers-1.0.0-1vft.510.0.0.799733.x86_64.vib/index.xml, skipping (('/vmfs/volumes/datastore1/net51-drivers-1.0.0-1vft.510.0.0.799733.x86_64.vib/index.xml', '', "[Errno 4] IOError: <urlopen error [Errno 20] Not a directory: '/vmfs/volumes/datastore1/net51-drivers-1.0.0-1vft.510.0.0.799733.x86_64.vib/index.xml'>"))

        url = /vmfs/volumes/datastore1/net51-drivers-1.0.0-1vft.510.0.0.799733.x86_64.vib/index.xml

Please refer to the log file for more details.

Thanks

Alistar
Expert

Ah, sorry, I should double-check my links more carefully. The command was supposed to be: esxcli software vib install -d "/vmfs/volumes/datastore1/net51-drivers-1.0.0-1vft.510.0.0.799733-offline_bundle.zip", not the .vib file, as that would be used with the -v switch. Also, the path on your host might be different, so double-check it before you try to install again, just to be sure :)
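
The difference between the two file types can be captured in a small helper: in esxcli software vib install, -d (--depot) expects an offline bundle .zip, while -v (--viburl) expects a single .vib file. The sketch below only builds the command line from the file extension and does not run anything; the helper name `vib_install_command` is made up for illustration, and the path is the one quoted in this thread, which may differ on another host.

```python
from pathlib import PurePosixPath

BUNDLE = "/vmfs/volumes/datastore1/net51-drivers-1.0.0-1vft.510.0.0.799733-offline_bundle.zip"


def vib_install_command(path):
    """Build an 'esxcli software vib install' argument list for a local file.

    .zip is treated as an offline bundle (depot, -d); .vib as a single VIB (-v).
    The path should be absolute, as in the examples in this thread.
    """
    suffix = PurePosixPath(path).suffix.lower()
    if suffix == ".zip":
        flag = "-d"
    elif suffix == ".vib":
        flag = "-v"
    else:
        raise ValueError(f"expected a .zip offline bundle or a .vib file, got: {path}")
    return ["esxcli", "software", "vib", "install", flag, path]


if __name__ == "__main__":
    # Print the command for review; running it is left to the administrator.
    print(" ".join(vib_install_command(BUNDLE)))
```

Passing a .vib file with -d is what produced the "Not a directory ... index.xml" error earlier in the thread: esxcli treated the file as a depot and looked for an index.xml inside it.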

For more info refer here: VMware KB: Installing patches on an ESXi 5.x host from the command line

Fttz
Enthusiast

It is installed now! I'll leave the system running to see whether this fixes it, and I'll report back.

Thanks to all

Fttz
Enthusiast

The system froze again (again and again). When this happens, the hard disk LED does not blink and the server's IP address is unreachable on the network. No PSOD appears, though; the system stays locked on the standard ESXi screen. I'm close to giving up on this server :(

I extracted the logs; they follow in 7 parts.
