Hello,
if I copy files to or from the local storage of my ESXi host via the datastore manager or via scp, the host crashes with a PSoD (exception 14) after somewhere between 30 MB and 100 MB have been transferred. Sometimes the PSoD does not appear, but the host freezes anyway. There are no VMs running, because it is a fresh install.
Hardware info:
Board: Supermicro X8SIL-F
CPU: Intel i3-550
Memory: Kingston 4*2GB DDR3 @ 1333 MHz
NIC: 2 x Intel 82574L Gigabit Ethernet
HDD: 2 * 1 TB Hitachi HDS721010CLA332
ESXi:
uname -a
VMkernel hekvmhost2 5.1.0 #1 SMP Release build-1021289 Feb 17 2013 21:52:53 x86_64 GNU/Linux
ethtool -i vmnic0
driver: e1000e
version: 1.1.2-NAPI
firmware-version: 1.9-0
This looks like the same issue described here: http://communities.vmware.com/thread/256853 but on ESXi 5.0 through 5.1 with different build numbers.
Has anybody had a similar problem, or does anybody know whether VMware is already working on a fix?
Thanks, Matthias
PSODs are usually related to memory or a bad driver.
Is this a new setup or has it been working reliably before?
Are all components on the VMware HCL?
// Linjo
It worked with some old build version of the ESXi 5.0 branch, but this was the one with the auto start error (see: http://blogs.vmware.com/vsphere/2012/07/vsphere-hypervisor-auto-start-bug-fixed.html)
I do not believe that the hardware is broken, because I did some tests with different OSes.
1) MemTest86 and MemTest86+ do not report errors
2) SMART short test and extended test without errors under linux 2.6
3) With linux I tried
dd if=/dev/urandom of=/dev/sda
to exclude any hdd controller issues. No errors.
4) Tried iperf to stress the NICs, no errors and good results
5) Then I booted a basic linux, mounted both hdd and copied >500GB via network from and to another server all night. No errors.
Hence, I do not believe this is a hardware problem, but rather a problem with the e1000e driver delivered with ESXi. As you can see in the photo, the page fault occurs somewhere in the polling routine of the e1000e driver, and this sounds similar to the error I linked to.
Matthias
Is this driver later than the one you are running?
// Linjo
Hi,
maybe. How do I find the version of the driver I am running? If I type
ethtool -i vmnic0
on the CLI, then I get
driver: e1000e
version: 1.1.2-NAPI
firmware-version: 1.9-0
If 1.1.2 is the correct version and 4.0.17 is the version of the driver you pointed me to, then I am not running the new driver. But I do not know whether these numbers can be compared.
Second, how do I install that new driver? Is the zip file a depot file and can be installed via
esxcli software vib install --depot=/path/to/zip/file
?
Remember, I use the free version of the ESXi.
Thank you, Matthias
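For reference, the question of which driver and version a NIC is using can be answered by pulling two fields out of the `ethtool -i` output. The following is only a minimal sketch: `driver_info` is a hypothetical helper name, and the sample text is the output quoted in the post above, so the block runs without an ESXi host.

```shell
#!/bin/sh
# Hypothetical helper (not part of ESXi): reads `ethtool -i` output on stdin
# and prints "<driver> <version>".
driver_info() {
    awk -F': ' '
        $1 == "driver"  { drv = $2 }   # e.g. e1000e
        $1 == "version" { ver = $2 }   # "firmware-version" does not match here
        END { print drv, ver }
    '
}

# Sample copied from the post above. On the host itself one would run:
#   ethtool -i vmnic0 | driver_info
sample='driver: e1000e
version: 1.1.2-NAPI
firmware-version: 1.9-0'

printf '%s\n' "$sample" | driver_info
```

Note that a version number is only meaningful next to its driver name: 1.1.2 here belongs to e1000e, so it cannot be compared directly with a 4.0.17 that belongs to a different driver.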
Update to former post. The results of
esxcli software vib list
are
net-e1000 8.0.3.1-2vmw.510.0.0.799733 VMware VMwareCertified 2013-03-15
net-e1000e 1.1.2-3vmw.510.0.0.799733 VMware VMwareCertified 2013-03-15
net-igb 2.1.11.1-3vmw.510.0.0.799733 VMware VMwareCertified 2013-03-15
Then I had a look into the zip archive, there I find a file called
intel_bootbank_net-igb_4.0.17.vib
I assume that the latter is a replacement for the net-igb package that is already installed on my system. But according to ethtool, that driver is not in use at all; the NIC uses the net-e1000e package in version 1.1.2. Are my assumptions correct? If yes, why should it help to replace a package that is not used anyway?
Matthias
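The check described above (which installed package actually backs the driver that ethtool reports) can be sketched as a lookup in the `esxcli software vib list` output. This assumes the package is named `net-<driver>`, which matches the listing in the post but is not guaranteed in general; `vib_for_driver` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: given a driver name as reported by `ethtool -i`,
# print name and version of the VIB called net-<driver>.
# The VIB list is read from stdin.
vib_for_driver() {
    awk -v pkg="net-$1" '$1 == pkg { print $1, $2 }'
}

# Sample copied from the post above. On the host itself one would run:
#   esxcli software vib list | vib_for_driver e1000e
sample='net-e1000 8.0.3.1-2vmw.510.0.0.799733 VMware VMwareCertified 2013-03-15
net-e1000e 1.1.2-3vmw.510.0.0.799733 VMware VMwareCertified 2013-03-15
net-igb 2.1.11.1-3vmw.510.0.0.799733 VMware VMwareCertified 2013-03-15'

printf '%s\n' "$sample" | vib_for_driver e1000e
```

The match is on the exact package name, so `e1000e` does not also pick up the `net-e1000` line.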
I extracted the archive, copied the vib file to the local storage of the ESXi host and installed the vib via the esxcli command. The vib was updated, but it did not change anything. The e1000e driver is still in use. The problem remains the same.
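As a rough sketch of the install step just described: `esxcli software vib install` takes `-v` (`--viburl`) for a single .vib file and `-d` (`--depot`) for an offline bundle zip. The helper below only prints the command instead of running it, so it can be tried anywhere. `vib_install_cmd` is a hypothetical name, the datastore path is an assumption, and whether a given downloaded zip is itself a usable depot depends on how the vendor packaged it.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the esxcli command that would
# install the given file, chosen by file extension.
vib_install_cmd() {
    case "$1" in
        *.vib) printf 'esxcli software vib install -v %s\n' "$1" ;;
        *.zip) printf 'esxcli software vib install -d %s\n' "$1" ;;
        *)     echo "unsupported file type: $1" >&2; return 1 ;;
    esac
}

# The datastore path is an assumption; use the absolute path of the copied file.
vib_install_cmd /vmfs/volumes/datastore1/intel_bootbank_net-igb_4.0.17.vib
```

Installing a VIB only helps if it replaces the driver the NIC is actually bound to, which is why the igb package made no difference here.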
I can almost guarantee you are having the same problem I did: http://www.supermicro.com/support/faqs/faq.cfm?faq=11049
You probably have Active State Power Management (ASPM) enabled in your BIOS. I had the exact same problem with a Supermicro board and an onboard Intel 82576 NIC.
Disable ASPM and those issues will be gone.
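One way to confirm the BIOS change took effect, assuming the machine can be booted into a Linux live system as in the earlier tests, is to look at the `LnkCtl:` lines of `lspci -vv`. The sketch below lists devices whose link control still reports ASPM as enabled. `aspm_enabled_devices` is a hypothetical helper name, and the sample text is made up for illustration; it is not output from the poster's host.

```shell
#!/bin/sh
# Hypothetical helper: reads `lspci -vv` output on stdin and prints the
# header line of every device whose LnkCtl line shows ASPM not disabled.
aspm_enabled_devices() {
    awk '
        /^[0-9a-f]/ { dev = $0 }   # device header lines are not indented
        /LnkCtl:/ && /ASPM/ && !/ASPM Disabled/ { print dev }
    '
}

# Illustrative sample only. On a Linux system one would run (as root):
#   lspci -vv | aspm_enabled_devices
sample='02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
        LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+'

printf '%s\n' "$sample" | aspm_enabled_devices
```

With ASPM disabled in the BIOS, the expectation is that no NIC shows up in this list.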