Hello there,
I've recently received what I hope will be my new lab rig, this Supermicro 1U Twin server: http://www.supermicro.nl/products/system/1U/5016/SYS-5016TI-TF.cfm
It's not on the HCL, but it's still server-grade hardware:
- Xeon X3440
- ECC RAM
- Dual Intel 82574L NICs onboard
Most, if not all, of these components are on the HCL, and in truth ESXi installs without complaint and works out of the box.
The problem is that when I use the NICs (for example, I'm using an iSCSI SAN), the NIC "crashes", while ESXi itself keeps running.
I've attached an extract of the vmkernel log covering the following sequence:
- Boot a VM
- Start a Debian installation inside the VM
At some point during the Debian installation, the NIC used for iSCSI traffic just crashes, and the following lines appear in the log:
2012-02-27T20:15:31.595Z cpu2:2050)WARNING: NetSched: 1713: Scheduler [0x4100045c5b80] lock up [stopped=0] for vmnic3:
2012-02-27T20:15:31.595Z cpu2:2050)WARNING: NetSched: 1723: detected at 866999 while last xmit at 861775 and 2/37092 packets/bytes in flight [window full 1] and binary heap size 1 [stress 0]
2012-02-27T20:15:31.595Z cpu2:2050)WARNING: NetSched: 1732: Packets completion seem stuck, issuing reset on vmnic3 [stress 0]
The lines that follow are the iSCSI software initiator complaining that it can't write to the LUN.
I've also run the same test on the other onboard NIC with no iSCSI traffic (i.e. with VM Network traffic), with the same result.
Any ideas on how to fix the problem?
I have another dual-port card in the server that I intend to use in the meantime, but losing two onboard NICs that are on the HCL makes me sick...
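If you need to check how often this happens across a longer vmkernel log, a small script can pull out the three NetSched warnings and group them per vmnic. This is only a sketch: the regexes are written against the exact wording quoted in this thread, other ESXi builds may phrase the messages differently, and the "gap" is simply the "detected at" value minus the "last xmit at" value in whatever time units NetSched uses (the log doesn't say). The helper name summarize_netsched is hypothetical.

```python
import re
from collections import defaultdict

# Patterns assume the NetSched wording quoted above; adjust for other builds.
LOCKUP_RE = re.compile(r"lock up \[stopped=\d+\] for (?P<nic>vmnic\d+)")
DETECT_RE = re.compile(r"detected at (?P<detected>\d+) while last xmit at (?P<last_xmit>\d+)")
RESET_RE = re.compile(r"issuing reset on (?P<nic>vmnic\d+)")


def summarize_netsched(lines):
    """Return {vmnic: {"lockups": n, "resets": n, "gaps": [...]}} for NetSched warnings."""
    summary = defaultdict(lambda: {"lockups": 0, "resets": 0, "gaps": []})
    current_nic = None  # the "detected at" line carries no NIC name, so reuse the last lock-up's NIC
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            current_nic = m.group("nic")
            summary[current_nic]["lockups"] += 1
        m = DETECT_RE.search(line)
        if m and current_nic:
            # Gap between watchdog detection and the last transmit, in NetSched's own units.
            gap = int(m.group("detected")) - int(m.group("last_xmit"))
            summary[current_nic]["gaps"].append(gap)
        m = RESET_RE.search(line)
        if m:
            summary[m.group("nic")]["resets"] += 1
    return dict(summary)


SAMPLE_LOG = [
    "2012-02-27T20:15:31.595Z cpu2:2050)WARNING: NetSched: 1713: Scheduler [0x4100045c5b80] lock up [stopped=0] for vmnic3:",
    "2012-02-27T20:15:31.595Z cpu2:2050)WARNING: NetSched: 1723: detected at 866999 while last xmit at 861775 and 2/37092 packets/bytes in flight [window full 1] and binary heap size 1 [stress 0]",
    "2012-02-27T20:15:31.595Z cpu2:2050)WARNING: NetSched: 1732: Packets completion seem stuck, issuing reset on vmnic3 [stress 0]",
]

if __name__ == "__main__":
    print(summarize_netsched(SAMPLE_LOG))
    # -> {'vmnic3': {'lockups': 1, 'resets': 1, 'gaps': [5224]}}
```

Because each pattern is searched independently on every line, the script also copes with excerpts where the three warnings got pasted together on a single line.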
Sylvain.
Can't you try upgrading the motherboard BIOS and/or the NIC firmware? Maybe a fix for this issue is already available...
Hello,
In fact I've already checked, and I'm at the latest revision available for the BIOS/NIC/IPMI firmware.
The output of ethtool -i vmnic2 indicates that I'm running the latest version of the Intel driver supported on the HCL.
Sylvain.
No ideas? Anyone?
I can give any additional details that may be necessary.
Sylvain.
We are experiencing the same issue on hardware we just purchased. I have just started digging into it...
I saw your other post at: http://communities.vmware.com/message/1997632#1997632. FYI, we are using a Dell PowerEdge R710 (http://www.dell.com/downloads/global/products/pedge/en/server-poweredge-r710-specs-en.pdf) with two Broadcom 5709 Dual Port 1GbE NIC w/TOE iSCSI, PCIe-4 (430-3260).
If you find out anything about your issue, I'd appreciate it if you could update this thread; I'll do the same. BTW, I saw this: http://www.vmug.nl/phpbb/viewtopic.php?t=5680. It's in Dutch, but I think they are saying there is some incompatibility with the NICs he was using.
Still no luck getting this to work.
I just applied a bunch of patches, at least three of them related to the e1000e driver, and the problem is still not fixed.
2012-04-01T13:03:33.645Z cpu6:9136)WARNING: NetSched: 1817: Scheduler [0x4100188b79c0] lock up [stopped=0] for vmnic3:
2012-04-01T13:03:33.645Z cpu6:9136)WARNING: NetSched: 1827: detected at 4035995 while last xmit at 4030121 and 1/60 packets/bytes in flight [window full 0] and binary heap size 0 [stress 0]
2012-04-01T13:03:33.645Z cpu6:9136)WARNING: NetSched: 1836: Packets completion seem stuck, issuing reset on vmnic3 [stress 0]
Any ideas are welcome...
Sylvain.
Don't know if this relates directly, but you might want to look at these two links:
http://www.yellow-bricks.com/2010/02/02/e1000-and-dropped-rx-packets/
Datto
I don't think it's related, because these links apply to a similar problem at the virtual machine level (vNIC), whereas my problem is at the physical NIC level (vmnic).
Thanks for the info anyway; it was worth looking into.
Sylvain.
You might also look at this thread and see if it relates to what you're experiencing:
http://communities.vmware.com/message/1430032
Datto
I tried all three IntMode values (0, 1, 2) for the e1000e driver, with no luck.
With IntMode=0,0 (there are two NICs in this box), the problem was worse: the NIC failed just from loading a simple web page on the only VM using this vmnic.
The other two modes didn't help much either; the NIC kept failing after somewhere between 100 Mb and 400 Mb of traffic had passed through the vmnic.
At least I have things to try!
Keep them coming if you have any other thoughts on the matter.
Sylvain.
You might run a network cable tester or swap the network cable to see if there's any positive effect.
Datto
I've got the very same problem on two boards, on all the onboard NICs of both boards.
No problems whatsoever with the additional (non-onboard) NICs.
Besides, changing/checking the cables was one of my first steps in troubleshooting the problem.
I'll see if I can get someone from Supermicro to look into it, but as they do not officially support this board with ESXi, I'm thinking it's a lost battle anyway.
Sylvain.
I once had some Dell servers whose NICs wouldn't work unless I disabled the USB ports in the system BIOS, and they exhibited a symptom similar to yours. If disabling the USB ports does help, you may then be able to reassign the interrupts to find an arrangement that works with USB enabled as well.
Datto
Hi Sylvain,
Did you get any further with this issue? I'm tearing my hair out over this exact problem too, in my case with a Supermicro X8S16-F, and I too bought it specifically because everything was supported in ESXi...
I've purchased another Intel NIC and it works perfectly, but having two onboard NICs just sitting there is very frustrating (especially as I had plans for them). (Rant over)
In any case, does anyone have any further suggestions?
Hi thomasq,
As a matter of fact, yes I have it working fine now.
After trying to fix the problem myself, I opened a case with Supermicro sometime in June.
They were able to reproduce the problem and got in touch with Intel, and they had me try a couple of firmware versions until the last one, which they sent me at the end of July, fixed my problem.
I didn't get the chance to try it until three days ago, but I can report that it is working fine so far, even under heavy stress (I switched my iSCSI network onto it and stressed it with a steady 90+ Mb/s of throughput for 12 hours straight).
I recommend that you check with Supermicro, as their support staff has really been helpful on this one.
I hope you'll get your NICs working!
Sylvain.
Sylvain,
Thank you so much for the reply - I had given up hope!
regards,
Thomas.
-- Thomas B. Quillinan
Hi Sylvain,
Did you mean that Supermicro gave you an updated BIOS version that solved the problem? Any further details? I ran into a similar problem several months ago, with the following log in vmkwarning.log, and it really drove me mad. Any suggestions? Thanks in advance.
2016-04-15T09:43:22.985Z cpu26:33758)WARNING: LinNet: netdev_watchdog:3474: NETDEV WATCHDOG: vmnic9: transmit timed out
2016-04-15T09:43:27.036Z cpu48:1887628)WARNING: netsched: NetSchedMClkWatchdogSysWorld:3921: vmnic8: failed to push the coalescing settings cranking upinflight window to infinite: Failure
2016-04-15T09:43:28.037Z cpu14:1887636)WARNING: netsched: NetSchedMClkWatchdogSysWorld:3921: vmnic9: failed to push the coalescing settings cranking upinflight window to infinite: Failure
2016-04-15T09:43:34.000Z cpu24:33758)WARNING: LinNet: netdev_watchdog:3474: NETDEV WATCHDOG: vmnic9: transmit timed out
2016-04-15T09:43:38.037Z cpu22:1887666)WARNING: netsched: NetSchedMClkWatchdogSysWorld:3921: vmnic8: failed to push the coalescing settings cranking upinflight window to infinite: Failure
2016-04-15T09:43:38.041Z cpu80:1887668)WARNING: netsched: NetSchedMClkWatchdogSysWorld:3921: vmnic9: failed to push the coalescing settings cranking upinflight window to infinite: Failure
2016-04-15T09:43:44.012Z cpu58:33751)WARNING: LinNet: netdev_watchdog:3474: NETDEV WATCHDOG: vmnic9: transmit timed out
2016-04-15T09:43:44.046Z cpu72:1887668)WARNING: netsched: NetSchedMClkWatchdogSysWorld:3863: vmnic9 : scheduler(0x410c9034c2f0)/device(0x410b5da37940) 0/1 lock up [stopped=0]:
2016-04-15T09:43:44.046Z cpu72:1887668)WARNING: netsched: NetSchedMClkWatchdogSysWorld:3874: detected at 682017011 while last xmit at 682011996 and 4401 bytes in flight [window 1500000 bytes] and last enqueued/dequeued at 682016995/682011996 [st$
2016-04-15T09:43:44.046Z cpu72:1887668)WARNING: netsched: NetSchedMClkWatchdogSysWorld:3890: vmnic9: packets completion seems stuck, issuing reset
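The newer watchdog messages are worded differently from the 2012 NetSched ones, so a separate tally helps to see which uplink is actually hanging. The sketch below counts transmit timeouts, coalescing-push failures and resets per vmnic. Its regexes are assumptions based only on the lines pasted here (they deliberately skip the truncated lock-up detail line), and the function name tally_watchdog_events is hypothetical.

```python
import re
from collections import Counter

# Patterns assume the message wording in the excerpt above; other ESXi builds may differ.
EVENT_PATTERNS = {
    "tx_timeout": re.compile(r"NETDEV WATCHDOG: (vmnic\d+): transmit timed out"),
    "coalesce_fail": re.compile(r"(vmnic\d+): failed to push the coalescing settings"),
    "reset": re.compile(r"(vmnic\d+): packets completion seems stuck, issuing reset"),
}


def tally_watchdog_events(lines):
    """Count (vmnic, event) pairs across vmkwarning.log lines."""
    counts = Counter()
    for line in lines:
        for event, pattern in EVENT_PATTERNS.items():
            m = pattern.search(line)
            if m:
                counts[(m.group(1), event)] += 1
    return counts


# Message parts of the excerpt above, in log order (timestamps trimmed).
SAMPLE_LOG = [
    "WARNING: LinNet: netdev_watchdog:3474: NETDEV WATCHDOG: vmnic9: transmit timed out",
    "WARNING: netsched: NetSchedMClkWatchdogSysWorld:3921: vmnic8: failed to push the coalescing settings cranking upinflight window to infinite: Failure",
    "WARNING: netsched: NetSchedMClkWatchdogSysWorld:3921: vmnic9: failed to push the coalescing settings cranking upinflight window to infinite: Failure",
    "WARNING: LinNet: netdev_watchdog:3474: NETDEV WATCHDOG: vmnic9: transmit timed out",
    "WARNING: netsched: NetSchedMClkWatchdogSysWorld:3921: vmnic8: failed to push the coalescing settings cranking upinflight window to infinite: Failure",
    "WARNING: netsched: NetSchedMClkWatchdogSysWorld:3921: vmnic9: failed to push the coalescing settings cranking upinflight window to infinite: Failure",
    "WARNING: LinNet: netdev_watchdog:3474: NETDEV WATCHDOG: vmnic9: transmit timed out",
    "WARNING: netsched: NetSchedMClkWatchdogSysWorld:3890: vmnic9: packets completion seems stuck, issuing reset",
]

if __name__ == "__main__":
    for (nic, event), n in sorted(tally_watchdog_events(SAMPLE_LOG).items()):
        print(nic, event, n)
```

On this excerpt, the tally shows that only vmnic9 logged transmit timeouts and a reset, while vmnic8 logged only coalescing-push failures, which may help narrow down which port to look at first.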
Regards,
Felix