I have a server running ESXi 5.5 with two virtual machines running pfSense. The problem is that almost every morning the server has crashed: I cannot access the virtual machines, or even the host itself, through the vSphere Client. What could it be? A hardware problem? Or can the virtual machines cause the host to crash? Thanks
31 days of uptime!
The problem really was overheating.
Solution: 8cm FAN Cooler + opening for air flow in the system cover.
Thanks to all who helped!
Fttz, there can be a number of reasons for the crash; check out this KB article. Can you post the PSOD screenshot and the vmkernel logs?
vpxa
2015-01-02T14:09:57.129Z [FFD261A0 info 'Default'] [VpxVmomi] SOAP adapter started on named pipe /var/run/vmware/proxy-vpxa
2015-01-02T14:09:57.129Z [FFD261A0 verbose 'vpxavpxaMain'] [VpxaModulesStart] DONE
2015-01-02T14:09:57.129Z [FFD261A0 info 'Memory checker'] Check resources every 30 secs, soft limit 307200, hard limit 358400.
2015-01-02T14:09:57.129Z [FFD261A0 info 'Handle checker'] Setting system limit of 1024
2015-01-02T14:09:57.129Z [FFD261A0 info 'Handle checker'] Set system limit to 1024
2015-01-02T14:10:46.526Z [FFDABB70 verbose 'HttpConnectionPool-000000'] [RemoveConnection] Connection removed; cnx: <SSL(<io_obj p:0x1f3b5320, h:-1, <TCP '0.0.0.0:0'>, <TCP '127.0.0.1:443'>>)>; pooled: 0
2015-01-02T14:10:46.526Z [FFD261A0 verbose 'HttpConnectionPool-000001'] [RemoveConnection] Connection removed; cnx: <SSL(<io_obj p:0x1f3c9848, h:-1, <TCP '0.0.0.0:0'>, <TCP '127.0.0.1:443'>>)>; pooled: 0
vmwarning
0:00:00:00.000 cpu0:1)WARNING: Serial: 806: Serial port com2 failed during initialization.
0:00:00:00.000 cpu0:1)WARNING: Serial: 807: Serial port com2 will be disabled.
0:00:00:05.187 cpu0:32768)WARNING: VMKAcpi: 780: No IPMI PNP id found
2015-01-02T14:09:03.923Z cpu2:33294)WARNING: LinuxSignal: 538: ignored unexpected signal flags 0x2 (sig 17)
2015-01-02T14:09:04.290Z cpu1:33177)WARNING: Team.etherswitch: TeamES_Activate:668: Failed to initialize beaconing on portset 'pps': Not implemented.
2015-01-02T14:09:15.764Z cpu1:33177)WARNING: NetDVS: 547: portAlias is NULL
2015-01-02T14:09:24.310Z cpu0:33509)WARNING: Supported VMs 25, Max VSAN VMs 100, SystemMemoryInGB 6
2015-01-02T14:09:24.310Z cpu0:33509)WARNING: MaxFileHandles: 750, Prealloc 1, Prealloc limit: 32 GB, Host scaling factor: 6
2015-01-02T14:09:24.313Z cpu0:33509)WARNING: DOM memory will be preallocated.
2015-01-02T14:09:46.092Z cpu0:34150)WARNING: UserEpoll: 542: UNSUPPORTED events 0x40
2015-01-02T14:09:46.606Z cpu2:34150)WARNING: LinuxSocket: 1854: UNKNOWN/UNSUPPORTED socketcall op (whichCall=0x12, args@0xff9ffd8c)
2015-01-02T14:09:59.169Z cpu2:35152)WARNING: NetDVS: 547: portAlias is NULL
2015-01-02T14:09:59.171Z cpu2:35152)WARNING: NetDVS: 547: portAlias is NULL
2015-01-02T14:09:59.173Z cpu2:35152)WARNING: NetDVS: 547: portAlias is NULL
2015-01-02T14:09:59.176Z cpu2:35152)WARNING: NetDVS: 547: portAlias is NULL
2015-01-02T14:11:59.234Z cpu2:35537)WARNING: NetDVS: 547: portAlias is NULL
2015-01-02T14:11:59.236Z cpu2:35537)WARNING: NetDVS: 547: portAlias is NULL
vmkernel logs
2015-01-02T14:17:54.472Z cpu0:33981)World: 14299: VC opID hostd-5f33 maps to vmkernel opID fad8d528
2015-01-02T14:18:11.007Z cpu0:33988)World: 14299: VC opID 70A6ABB7-00000024 maps to vmkernel opID 6a43f816
2015-01-02T14:18:13.948Z cpu3:37018)Hardware: 3124: Assuming TPM is not present because trusted boot is not supported.
2015-01-02T14:18:14.005Z cpu3:37019)Hardware: 3124: Assuming TPM is not present because trusted boot is not supported.
2015-01-02T14:18:14.637Z cpu0:33973 opID=1ff8b7f7)World: 14299: VC opID hostd-159b maps to vmkernel opID 1ff8b7f7
2015-01-02T14:18:17.670Z cpu0:33974 opID=bc2ceb41)World: 14299: VC opID hostd-05c5 maps to vmkernel opID bc2ceb41
2015-01-02T14:18:20.003Z cpu2:33993)World: 14299: VC opID hostd-de94 maps to vmkernel opID 3cf9045e
2015-01-02T14:18:53.810Z cpu3:32788)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x1a (0x412e80896440, 0) to dev "mpx.vmhba34:C0:T0:L0" on path "vmhba34:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2015-01-02T14:18:53.810Z cpu3:32788)ScsiDeviceIO: 2338: Cmd(0x412e80896440) 0x1a, CmdSN 0x711 from world 0 to dev "mpx.vmhba34:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2015-01-02T14:19:15.510Z cpu1:37261)Config: 346: "NetTraceEnable" = 1, Old Value: 0, (Status: 0x0)
2015-01-02T14:19:15.510Z cpu1:37261)Config: 346: "NetTraceEnable" = 0, Old Value: 1, (Status: 0x0)
2015-01-02T14:19:15.510Z cpu1:37261)Config: 346: "NetTraceEnable" = 0, Old Value: 0, (Status: 0x0)
2015-01-02T14:19:20.004Z cpu1:33981)World: 14299: VC opID hostd-70a5 maps to vmkernel opID 7ea5898e
These are the logs which I can obtain from the dump file.
From the vmkernel logs I believe it could be due to a storage-level issue. Let me do some research on it.
A couple of questions for you:
Are you trying to vMotion your VM?
Do you have enough resources on your ESXi host and datastore?
Do you have enough ports on the vSwitch?
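To make the storage suspicion concrete, the failed command in the vmkernel log can be decoded field by field. The sketch below is only an illustration: the lookup tables are a hand-picked subset of the standard SCSI codes covering the values in this thread, and the function name `decode_nmp_line` is made up for the example. For the line in question it yields MODE SENSE(6) rejected with CHECK CONDITION / ILLEGAL REQUEST / INVALID COMMAND OPERATION CODE, i.e. the device reports that it does not implement that command.

```python
import re

# Partial lookup tables: only the values seen in this thread's log,
# not a complete SCSI decoder.
OPCODES = {0x1A: "MODE SENSE(6)"}
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION"}
SENSE_KEYS = {0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {(0x20, 0x00): "INVALID COMMAND OPERATION CODE"}

# Matches the NMP-style line: "Cmd 0x1a (...) ... H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0"
NMP_PATTERN = re.compile(
    r"Cmd (0x[0-9a-f]+) .*?"
    r"H:(0x[0-9a-f]+) D:(0x[0-9a-f]+) P:(0x[0-9a-f]+) "
    r"Valid sense data: (0x[0-9a-f]+) (0x[0-9a-f]+) (0x[0-9a-f]+)"
)


def decode_nmp_line(line):
    """Return a dict describing the SCSI fields of one NMP log line, or None."""
    match = NMP_PATTERN.search(line)
    if match is None:
        return None
    opcode, host, device, plugin, key, asc, ascq = (int(v, 16) for v in match.groups())
    return {
        "command": OPCODES.get(opcode, hex(opcode)),
        "host_status": host,
        "device_status": DEVICE_STATUS.get(device, hex(device)),
        "plugin_status": plugin,
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "additional_sense": ASC_ASCQ.get((asc, ascq), f"{asc:#x}/{ascq:#x}"),
    }


LINE = (
    '2015-01-02T14:18:53.810Z cpu3:32788)NMP: nmp_ThrottleLogForDevice:2321: '
    'Cmd 0x1a (0x412e80896440, 0) to dev "mpx.vmhba34:C0:T0:L0" on path '
    '"vmhba34:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE'
)

if __name__ == "__main__":
    for field, value in decode_nmp_line(LINE).items():
        print(f"{field}: {value}")
```

A rejected command of this kind is not a sign of a failing disk on its own, so it is worth decoding before blaming storage.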
What do you mean by "storage level"? I do not understand...
I'm not trying to vMotion the VM.
I believe that my resources are sufficient.
Attached are a few screenshots of my host.
I have four vSwitches, each with one NIC. One VM uses four vSwitches and the other uses two. The ports used by the two VMs have IPs on the same network.
thanks
I have analyzed the logs for some time, but it's quite hard to pinpoint the exact issue. There are a lot of storage-related messages, though, and I noticed in the loaded modules that you may be using VSAN or flash cache. Can you describe the hardware setup you are using, mainly how the storage is connected?
The best shot would be to post the PSOD screenshot so we can backtrace the thread stack just before the panic. Nothing in the logs points clearly at any one component.
Edit: You may want to update the Realtek drivers, I see the one from 2009 is pretty outdated:
r8168 Copyright (C) 2009 Realtek NIC software team <nicfae@realtek.com>
This program comes with ABSOLUTELY NO WARRANTY; for details, please see <http://www.gnu.org/licenses/>.
This is free software, and you are welcome to redistribute it under cer$
There is nothing in the logs pointing to an unexpected reboot/shutdown.
Checked syslog.log, vmkernel.log, vmkwarning.log and hostd.log.
The server did not write a crashdump.
It just (cold) booted:
2014-12-31T17:47:42Z bootstop: Host has booted
2014-12-31T18:01:02Z bootstop: Host has booted
2015-01-01T10:24:54Z bootstop: Host has booted
2015-01-02T08:56:04Z bootstop: Host has booted
2015-01-02T09:14:46Z bootstop: Host has booted
2015-01-02T11:35:03Z bootstop: Host has booted
2015-01-02T14:09:57Z bootstop: Host has booted
Suggestions:
1. Make sure latest BIOS is installed.
2. Make sure system and components are on VMware HCL.
3. Check BIOS for power saving functions (disable all and set power to max).
4. As a test, install 5.1 and see if that works.
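The boot timestamps listed above can be turned into intervals to see how long the host lasted between cold boots. This is a minimal sketch that assumes the lines look exactly like the `bootstop` entries quoted in this post; the function names are made up for the example. Note that each interval is only an upper bound on uptime, because it also includes the time the host sat hung before someone reset it.

```python
from datetime import datetime

# The bootstop entries quoted in this thread.
BOOT_LINES = """\
2014-12-31T17:47:42Z bootstop: Host has booted
2014-12-31T18:01:02Z bootstop: Host has booted
2015-01-01T10:24:54Z bootstop: Host has booted
2015-01-02T08:56:04Z bootstop: Host has booted
2015-01-02T09:14:46Z bootstop: Host has booted
2015-01-02T11:35:03Z bootstop: Host has booted
2015-01-02T14:09:57Z bootstop: Host has booted
"""


def boot_times(text):
    """Extract the timestamp of every 'Host has booted' line."""
    times = []
    for line in text.splitlines():
        stamp, _, message = line.strip().partition(" ")
        if message.startswith("bootstop: Host has booted"):
            times.append(datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ"))
    return times


def intervals_between_boots(times):
    """Return the time elapsed between each pair of consecutive boots."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    times = boot_times(BOOT_LINES)
    for start, gap in zip(times, intervals_between_boots(times)):
        print(f"booted {start.isoformat()}  next boot after {gap}")
```

Several short intervals in a row usually mean repeated reboots during troubleshooting, while the long ones cover the overnight runs that ended in a hang.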
I left the server powered on with no VM running. When I tried to access it in the morning, it was locked up.
On previous occasions no PSOD was shown. But today I rebooted the server, and when I tried to extract the logs a PSOD appeared. I tried to extract them four times, and each time a PSOD occurred, making it impossible to get the logs.
Attached are the screenshots.
I'm not using VSAN or flash cache.
I use a single SATA HD for the entire server (ESXi and VMs).
Where can I find updated Realtek drivers?
Thanks
The other times the system did not show a PSOD. Today the system had crashed when I checked (without any VM running), so I reset it and tried to extract the logs, but the PSOD was displayed four times in a row and I could not get the logs because the system crashed every time I tried.
I restored the BIOS settings, but the same problem occurred.
Before installing the system, I customized the ESXi 5.5 ISO with third-party drivers for my Realtek and D-Link 528T NICs. Could this be the problem?
I'll check the power-saving functions.
Thanks
Nice catch, Alistar. I believe there is a driver problem. Have you enabled jumbo frames on any of your NICs? If yes, can you try disabling them for a while and see how it goes?
Are you using a branded server, such as HP or Dell?
I found something in the V-Front VIBSDepot. Check this: Net51-drivers - V-Front VIBSDepot Wiki
Check whether your PCI ID 10ec:8167 matches one of those listed in the blog.
All NICs have an MTU of 1500.
I use a self-built server with an ASUS motherboard (onboard Realtek NIC), 3 D-Link 528T PCI NICs, a 500 GB SATA disk, and 6 GB RAM.
Are you recommending that I rebuild the ISO with the NET51 driver package (instead of the Realtek package I used) plus the D-Link driver package I used before? Thanks
I do not use a Broadcom NIC; could it still be the problem?
PCI: 1269: 0000:03:00.0 10ec:8168 1043:8505 discovered
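Checking a NIC against a driver's supported list comes down to comparing the vendor:device pair from the discovery line. The sketch below assumes the line format shown in this post; the helper name `parse_pci_discovery` is made up for the example. For this host it shows that the onboard NIC reports 10ec:8168 (10ec is Realtek's vendor ID), which is not the same as the 10ec:8167 mentioned earlier in the thread.

```python
import re

# Matches vmkernel lines such as:
#   PCI: 1269: 0000:03:00.0 10ec:8168 1043:8505 discovered
PCI_LINE = re.compile(
    r"PCI: \d+: (?P<address>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"(?P<device>[0-9a-f]{4}:[0-9a-f]{4}) "
    r"(?P<subsystem>[0-9a-f]{4}:[0-9a-f]{4}) discovered"
)


def parse_pci_discovery(line):
    """Return (bus address, vendor:device, subsystem vendor:device) or None."""
    match = PCI_LINE.search(line)
    if match is None:
        return None
    return match.group("address"), match.group("device"), match.group("subsystem")


if __name__ == "__main__":
    line = "PCI: 1269: 0000:03:00.0 10ec:8168 1043:8505 discovered"
    address, device_id, subsystem_id = parse_pci_discovery(line)
    asked_about = "10ec:8167"  # the ID mentioned earlier in the thread
    print(f"{address}: device {device_id}, subsystem {subsystem_id}")
    print(f"matches {asked_about}: {device_id == asked_about}")
```

The ID to look for in the driver's supported list is therefore the one the host actually reports, not the one quoted from memory.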
Yes, this package ought to help you out with your issue.
There is no need to reimage the host, just:
I tried to update and got this error:
/vmfs/volumes/54a29525-e7f3b84c-0515-60a44ccd6fa6 # esxcli software vib install -d "/vmfs/volumes/datastore1/net51-drivers-1.0.0-1vft.510.0.0.799733.x86_64.vib"
[MetadataDownloadError]
Could not download from depot at /vmfs/volumes/datastore1/net51-drivers-1.0.0-1vft.510.0.0.799733.x86_64.vib/index.xml, skipping (('/vmfs/volumes/datastore1/net51-drivers-1.0.0-1vft.510.0.0.799733.x86_64.vib/index.xml', '', "[Errno 4] IOError: <urlopen error [Errno 20] Not a directory: '/vmfs/volumes/datastore1/net51-drivers-1.0.0-1vft.510.0.0.799733.x86_64.vib/index.xml'>"))
url = /vmfs/volumes/datastore1/net51-drivers-1.0.0-1vft.510.0.0.799733.x86_64.vib/index.xml
Please refer to the log file for more details.
Thanks
Ah damn, sorry, I should double-check my links more. It was supposed to be: esxcli software vib install -d "/vmfs/volumes/datastore1/net51-drivers-1.0.0-1vft.510.0.0.799733-offline_bundle.zip", not the .vib file, as a single .vib would be installed with the -v switch. Also, the path might be different on your system, so double-check it before you try to install again, just to be sure.
For more info refer here: VMware KB: Installing patches on an ESXi 5.x host from the command line
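The error above came from passing a .vib file to the depot option. `esxcli software vib install` takes an offline bundle (.zip) with `-d` (`--depot`) and a single .vib file with `-v` (`--viburl`). The helper below is a hypothetical convenience, not part of any VMware tool: it only picks the flag from the file extension and builds the argument list, so the command can be checked before it is typed on the host.

```python
def vib_install_command(path):
    """Build the esxcli argument list for installing a VIB package.

    Offline bundles (.zip) are depots and use -d; single .vib files use -v.
    The path should be absolute, e.g. under /vmfs/volumes/<datastore>/.
    """
    lower = path.lower()
    if lower.endswith(".zip"):
        flag = "-d"
    elif lower.endswith(".vib"):
        flag = "-v"
    else:
        raise ValueError(f"expected a .zip offline bundle or a .vib file: {path}")
    return ["esxcli", "software", "vib", "install", flag, path]


if __name__ == "__main__":
    bundle = (
        "/vmfs/volumes/datastore1/"
        "net51-drivers-1.0.0-1vft.510.0.0.799733-offline_bundle.zip"
    )
    print(" ".join(vib_install_command(bundle)))
```

With the offline bundle path from this thread, the printed command uses `-d`, which is the form that installed successfully.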
Now installed! I'll leave the system running to see if it is fixed and report back.
Thanks to all
The system froze again (again and again). The HD LED does not blink when this occurs, and the server's IP is not reachable on the network. No PSOD occurs, though; the system is locked on the standard ESXi screen. I'm close to giving up on this server.
I extracted the logs; they follow in 7 parts.