VMware Cloud Community
HEKnet
Contributor

ESXi 5.1 NIC e1000e driver triggers PSoD with exception 14

Hello,

If I copy files to or from local storage on my ESXi host, via the datastore manager or via scp, the host crashes with a PSoD and exception 14 after somewhere between 30 MB and 100 MB. Sometimes the PSoD does not appear, but the host freezes anyway. There are no VMs running, because it is a fresh install.

Hardware info:

Board: Supermicro X8SIL-F

CPU: Intel i3-550

Memory: Kingston 4*2GB DDR3 @ 1333 MHz

NIC: 2 x Intel 82574L Gigabit Ethernet

HDD: 2 * 1 TB Hitachi HDS721010CLA332

ESXi:

uname -a

VMkernel hekvmhost2 5.1.0 #1 SMP Release build-1021289 Feb 17 2013 21:52:53 x86_64 GNU/Linux

ethtool -i vmnic0
driver: e1000e
version: 1.1.2-NAPI
firmware-version: 1.9-0

It looks like the same problem described at http://communities.vmware.com/thread/256853 but on ESXi 5.0 through 5.1 with different build numbers.

Has anybody seen a similar problem, or does anyone know whether VMware is already working on a fix?

Thanks, Matthias

7 Replies
Linjo
Leadership

PSoDs are usually caused by faulty memory or a bad driver.

Is this a new setup or has it been working reliably before?

Are all components on the VMware HCL?

// Linjo

Best regards, Linjo Please follow me on twitter: @viewgeek If you find this information useful, please award points for "correct" or "helpful".
HEKnet
Contributor

It worked with an old build of the ESXi 5.0 branch, but that was the one with the auto-start bug (see: http://blogs.vmware.com/vsphere/2012/07/vsphere-hypervisor-auto-start-bug-fixed.html).

I do not believe that the hardware is broken, because I did some tests with different OSes.

1) MemTest86 and MemTest86+ do not report errors

2) SMART short and extended tests run without errors under Linux 2.6

3) With Linux I tried

dd if=/dev/urandom of=/dev/sda

dd if=/dev/sda of=/dev/sdb
diff /dev/sda /dev/sdb

to rule out any HDD controller issues. No errors.

4) Tried iperf to stress the NICs: no errors and good results.

5) Then I booted a basic Linux, mounted both HDDs and copied >500 GB over the network from and to another server all night. No errors.
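
The write, copy and compare check from step 3 can also be tried non-destructively on scratch files before pointing it at whole disks. What follows is only a rough sketch: the function name copy_and_compare and the temporary file names are made up for illustration, and it uses cmp instead of diff because cmp stops at the first differing byte.

```shell
#!/bin/sh
# Non-destructive variant of the write/copy/compare check from step 3.
# It works on scratch files in a temporary directory, not on raw disks.
# copy_and_compare is a hypothetical helper name.
copy_and_compare() {
    size_mb="$1"
    workdir="$(mktemp -d)" || return 2
    # Fill a source file with random data (1 MiB blocks).
    dd if=/dev/urandom of="$workdir/src.bin" bs=1048576 count="$size_mb" 2>/dev/null
    # Copy it with dd, the same way the disks were copied.
    dd if="$workdir/src.bin" of="$workdir/dst.bin" bs=1048576 2>/dev/null
    # cmp -s exits 0 only if both files are byte-identical.
    cmp -s "$workdir/src.bin" "$workdir/dst.bin"
    result=$?
    rm -rf "$workdir"
    return "$result"
}

if copy_and_compare 4; then
    echo "copy verified"
else
    echo "copy MISMATCH"
fi
```

A clean result here only says that the copy path used for the scratch files is fine; it does not replace the full-disk run above.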

Hence, I do not believe this is a hardware problem, but rather a problem with the e1000e driver delivered with ESXi. As you can see in the photo, the page fault occurs somewhere in the polling routine of the e1000e driver, which sounds similar to the error I linked to.

Matthias

Linjo
Leadership

Is this driver later than the one you are running?

https://my.vmware.com/web/vmware/details?downloadGroup=DT-ESXI5X-INTEL-IGB-4017&productId=285#dt_ver...

// Linjo

HEKnet
Contributor

Hi,

Maybe. How do I find the version of the driver I am running? If I type

ethtool -i vmnic0

on the CLI, then I get


driver: e1000e
version: 1.1.2-NAPI
firmware-version: 1.9-0

If 1.1.2 is the correct version and 4.0.17 is the version of the driver you pointed me to, then I am not running the new driver. But I do not know whether these numbers can be compared.
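
To make that comparison less error-prone, the driver name and version can be pulled out of the ethtool output mechanically. This is a minimal sketch under assumptions: ethtool_field is a made-up helper name, and the sample text is just the output quoted above; on the host the function would read from `ethtool -i vmnic0` instead.

```shell
#!/bin/sh
# Extract one field ("driver", "version", ...) from `ethtool -i` style
# output on stdin. ethtool_field is a hypothetical helper name.
ethtool_field() {
    awk -F': ' -v key="$1" '$1 == key { print $2; exit }'
}

# Sample text copied from the output quoted above.
sample='driver: e1000e
version: 1.1.2-NAPI
firmware-version: 1.9-0'

driver="$(printf '%s\n' "$sample" | ethtool_field driver)"
version="$(printf '%s\n' "$sample" | ethtool_field version)"
echo "vmnic0 uses $driver $version"
```

The driver name matters as much as the version: a version number is only meaningful when both numbers belong to the same driver.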

Second, how do I install that new driver? Is the zip file a depot file that can be installed via

esxcli software vib install --depot=/path/to/zip/file
?

Remember, I use the free version of ESXi.

Thank you, Matthias
HEKnet
Contributor

Update to my previous post. The results of

esxcli software vib list

are

net-e1000                      8.0.3.1-2vmw.510.0.0.799733         VMware  VMwareCertified   2013-03-15 
net-e1000e                     1.1.2-3vmw.510.0.0.799733          VMware  VMwareCertified   2013-03-15 
net-igb                        2.1.11.1-3vmw.510.0.0.799733        VMware  VMwareCertified   2013-03-15

Then I had a look into the zip archive, there I find a file called

intel_bootbank_net-igb_4.0.17.vib

I assume that the latter is a replacement for the net-igb package that is already installed on my system. But according to ethtool, that driver is not in use anyway; the NIC uses the net-e1000e package in version 1.1.2. Are my assumptions correct? If yes, why should it help to replace a package that is not used?
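
The mapping from the driver ethtool reports to the installed package can be checked with a small filter over the VIB list. This is only a sketch: vib_for_driver is a made-up helper name, and it assumes that the VIB for a NIC driver is named "net-" plus the driver name, which is what the listing above suggests. On the host the input would come from `esxcli software vib list`.

```shell
#!/bin/sh
# Print name and version of the installed VIB called "net-<driver>".
# Assumption: NIC driver VIBs are named "net-" plus the driver name,
# as in the listing above. vib_for_driver is a hypothetical helper name.
vib_for_driver() {
    awk -v name="net-$1" '$1 == name { print $1, $2 }'
}

# Lines copied from the `esxcli software vib list` output quoted above.
vib_list='net-e1000                      8.0.3.1-2vmw.510.0.0.799733         VMware  VMwareCertified   2013-03-15
net-e1000e                     1.1.2-3vmw.510.0.0.799733          VMware  VMwareCertified   2013-03-15
net-igb                        2.1.11.1-3vmw.510.0.0.799733        VMware  VMwareCertified   2013-03-15'

# The driver name comes from `ethtool -i vmnic0` (here: e1000e).
printf '%s\n' "$vib_list" | vib_for_driver e1000e
```

With the driver name from ethtool as the argument, only the net-e1000e line is printed, so an update to net-igb would not touch the package behind vmnic0.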

Matthias

HEKnet
Contributor

I extracted the archive, copied the vib file to the local storage of the ESXi host and installed the vib via the esxcli command. The vib was updated, but it did not change anything. The e1000e driver is still in use. The problem remains the same.

TMX1
Contributor

I can almost guarantee you are having the same problem I did. http://www.supermicro.com/support/faqs/faq.cfm?faq=11049

You probably have Active State Power Management enabled in your BIOS. I had the exact same problem with a Supermicro board and an onboard Intel 82576 NIC.

Disable ASPM and those issues will be gone.
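
If you want to see whether ASPM is active before going into the BIOS, one option is to boot a Linux live system and look at the LnkCtl lines that `lspci -vv` prints (run as root). The following is only a sketch: aspm_enabled is a made-up helper name, and the two sample lines are illustrative lines in the LnkCtl format, not output from either of our machines.

```shell
#!/bin/sh
# Read `lspci -vv` style text on stdin and succeed if any PCIe link
# control line reports ASPM as enabled.
# aspm_enabled is a hypothetical helper name.
aspm_enabled() {
    grep 'LnkCtl:' | grep -Eq 'ASPM [^;]*Enabled'
}

# Illustrative lines only, written in the LnkCtl format; they are not
# output from the host discussed in this thread.
on_line='LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+'
off_line='LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+'

for line in "$on_line" "$off_line"; do
    if printf '%s\n' "$line" | aspm_enabled; then
        echo "ASPM enabled:  $line"
    else
        echo "ASPM disabled: $line"
    fi
done
```

On a live system the call would look like `lspci -vv | aspm_enabled && echo "ASPM is on"`. The BIOS setting is still the place to turn it off.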
