VMware Cloud Community
ITaaP
Enthusiast

PSOD Caused by Intel X710 NICs?

Anyone have issues with Intel X710 NICs? I have four servers I built over a year ago with the X710 NIC and no issues. I created a customized ESXi ISO with the i40e v1.4.28 drivers. Those servers are still running ESXi 6.0.0, 4510822. I used the same customized ISO recently on slightly different Supermicro servers. I left the i40e drivers at v1.4.28 but upgraded ESXi to 6.0.0, 6921384. Two out of four servers crashed. I then upgraded i40e to v2.0.6, whose release notes state "Fix PSOD caused by small TSO segmentation", but yet another server hit a PSOD. Mind you, both the original servers and the ones I am working on now have TSO and LRO enabled.

I just changed to the native i40en 1.3.1 drivers, but haven't had a chance to test yet. My biggest concern is why these servers are crashing when the others are not.

Attached is one of the PSOD messages and below is part of the dump log.

2017-11-30T18:03:12.273Z cpu83:33607)<6>i40e 0000:82:00.0: TX driver issue detected, PF reset issued

2017-11-30T18:03:12.747Z cpu11:33598)<6>i40e 0000:82:00.0: i40e_open: Registering netqueue ops

2017-11-30T18:03:12.756Z cpu11:33598)IntrCookie: 1935: cookie 0x53 moduleID 4111 <i40e-vmnic6-TxRx-0> exclusive, flags 0x25

2017-11-30T18:03:12.757Z cpu10:33025)World: 9757: PRDA 0x418042800000 ss 0x0 ds 0x10b es 0x10b fs 0x0 gs 0x13b

2017-11-30T18:03:12.757Z cpu10:33025)World: 9759: TR 0x4020 GDT 0x43916b621000 (0x402f) IDT 0x4180118ca000 (0xfff)

2017-11-30T18:03:12.757Z cpu10:33025)World: 9760: CR0 0x80010031 CR3 0x30daaf5000 CR4 0x42768

2017-11-30T18:03:12.794Z cpu10:33025)Backtrace for current CPU #10, worldID=33025, rbp=0x43910809b0c0

2017-11-30T18:03:12.794Z cpu10:33025)0x43910809b010:[0x41801209b9b6]i40e_lan_xmit_frame@<None>#<None>+0x4da stack: 0x43b627e4b480, 0x430

2017-11-30T18:03:12.794Z cpu10:33025)0x43910809b0d0:[0x418011f44120]netdev_tx@com.vmware.driverAPI#9.2+0xf4 stack: 0x0, 0x0, 0x0, 0x4309

2017-11-30T18:03:12.794Z cpu10:33025)0x43910809b1a0:[0x41801193e8f6]UplinkDevTransmit@vmkernel#nover+0x3b2 stack: 0x43b62effc840, 0x9953

2017-11-30T18:03:12.794Z cpu10:33025)0x43910809b280:[0x41801202961e]NetSchedFIFORunLocked@<None>#<None>+0x126 stack: 0x0, 0x43b626bede00

2017-11-30T18:03:12.794Z cpu10:33025)0x43910809b2f0:[0x41801202993a]NetSchedFIFOInput@<None>#<None>+0x192 stack: 0x1, 0x43009742e780, 0x

2017-11-30T18:03:12.794Z cpu10:33025)0x43910809b3a0:[0x418011950980]IOChain_Resume@vmkernel#nover+0x270 stack: 0x4303512e0218, 0x4391080

2017-11-30T18:03:12.794Z cpu10:33025)0x43910809b440:[0x418011933c9e]PortOutput@vmkernel#nover+0xae stack: 0x2000, 0x4303512dfc80, 0x4391

2017-11-30T18:03:12.794Z cpu10:33025)0x43910809b480:[0x418011ff905e]TeamES_Output@<None>#<None>+0x27a stack: 0x4303511c7958, 0x43910809b

2017-11-30T18:03:12.794Z cpu10:33025)0x43910809b500:[0x418011fe6b4d]EtherswitchPortDispatch@<None>#<None>+0x985 stack: 0x0, 0x0, 0x0, 0x

2017-11-30T18:03:12.794Z cpu10:33025)0x43910809b6f0:[0x418011933f03]Port_InputResume@vmkernel#nover+0x17b stack: 0x418011fe61c8, 0x41801

2017-11-30T18:03:12.794Z cpu10:33025)0x43910809b750:[0x418011934051]Port_Input_Committed@vmkernel#nover+0x29 stack: 0x4303e35e7480, 0x52

2017-11-30T18:03:12.794Z cpu10:33025)0x43910809b790:[0x418011988eac]Vmxnet3VMKDevTQDoTx@vmkernel#nover+0x1754 stack: 0x0, 0x418011934051

2017-11-30T18:03:12.794Z cpu10:33025)0x43910809b920:[0x41801198ae21]Vmxnet3VMKDev_AsyncTx@vmkernel#nover+0x95 stack: 0x50fc6b670ab0a, 0x

2017-11-30T18:03:12.794Z cpu10:33025)0x43910809ba80:[0x41801196d7b4]NetWorldletPerVMCB@vmkernel#nover+0x164 stack: 0x4300973ab540, 0x439

2017-11-30T18:03:12.794Z cpu10:33025)0x43910809baf0:[0x4180118c0934]WorldletBHHandler@vmkernel#nover+0xe0 stack: 0x1000000ef, 0x417fd18f

2017-11-30T18:03:12.794Z cpu10:33025)0x43910809bc50:[0x41801183336d]BH_Check@vmkernel#nover+0xe1 stack: 0x417fd18f4a88, 0x0, 0x0, 0x1418

2017-11-30T18:03:12.794Z cpu10:33025)0x43910809bcc0:[0x418011a123d2]CpuSchedIdleLoopInt@vmkernel#nover+0x182 stack: 0x400, 0x10000000000

2017-11-30T18:03:12.794Z cpu10:33025)0x43910809bd40:[0x418011a15bee]CpuSchedDispatch@vmkernel#nover+0x15fe stack: 0x439119f27100, 0x0, 0

2017-11-30T18:03:12.794Z cpu10:33025)0x43910809be60:[0x418011a167d4]CpuSchedWait@vmkernel#nover+0x240 stack: 0x0, 0x0, 0x80009bf08, 0x0,

2017-11-30T18:03:12.794Z cpu10:33025)0x43910809bee0:[0x418011a16d0f]CpuSched_SleepUntilTC@vmkernel#nover+0x8f stack: 0x2001, 0x0, 0x50fc

2017-11-30T18:03:12.794Z cpu10:33025)0x43910809bf40:[0x418011858aa4]IntrCookieRetireLoop@vmkernel#nover+0x214 stack: 0x4180428002c0, 0xb

2017-11-30T18:03:12.794Z cpu10:33025)0x43910809bfd0:[0x418011a1746e]CpuSched_StartWorld@vmkernel#nover+0xa2 stack: 0x0, 0x0, 0x0, 0x0, 0

2017-11-30T18:03:12.811Z cpu10:33025)VMware ESXi 6.0.0 [Releasebuild-6921384 x86_64]

#PF Exception 14 in world 33025:retireWld.00 IP 0x41801209b9b6 addr 0x14

PTEs:0x30de5f3027;0x30de5f1027;0x0;

2017-11-30T18:03:12.812Z cpu10:33025)cr0=0x8001003d cr2=0x14 cr3=0x1807000 cr4=0x216c

2017-11-30T18:03:12.812Z cpu10:33025)frame=0x43910809af50 ip=0x41801209b9b6 err=2 rflags=0x10202

2017-11-30T18:03:12.812Z cpu10:33025)rax=0x0 rbx=0x43b6089a0728 rcx=0x1

2017-11-30T18:03:12.812Z cpu10:33025)rdx=0x1 rbp=0x43910809b0c0 rsi=0x51

2017-11-30T18:03:12.812Z cpu10:33025)rdi=0x51 r8=0x34c r9=0x0

2017-11-30T18:03:12.812Z cpu10:33025)r10=0x51 r11=0x74 r12=0x43b6089a0648

2017-11-30T18:03:12.813Z cpu10:33025)r13=0x430770707830 r14=0x740013 r15=0x0

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:0 world:586767 name:"vmm2:LGA11KINGVBP01" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:1 world:33016 name:"retireWld.0001" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:2 world:33294 name:"memMapKernel-2" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:3 world:586711 name:"vmm0:LGA11KINGVBP01" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:4 world:573959 name:"vmm0:LGA11COLLECT02" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:5 world:33297 name:"memMapKernel-5" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:6 world:33103 name:"itRebalance" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:7 world:586758 name:"vmm6:cust1831-web10" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:8 world:586756 name:"vmm4:cust1831-web10" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:9 world:33108 name:"memsched-periodic" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:10 world:33025 name:"retireWld.0010" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:11 world:33598 name:"helper39-12" (SH)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:12 world:36381 name:"hostd-worker" (U)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:13 world:584319 name:"vmm0:LGA11DNS01" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:14 world:33029 name:"retireWld.0014" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:15 world:572888 name:"vmm3:LGA11VBR01" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:16 world:586762 name:"vmm10:cust1831-web10" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:17 world:586709 name:"vmx" (U)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:18 world:586769 name:"vmm4:LGA11KINGVBP01" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:19 world:33474 name:"userMemTouchEst-430512e13000" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:20 world:33703 name:"vmnic6-pollWorld-0" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:21 world:581694 name:"vmm0:LGA11LABPSC01" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:22 world:572884 name:"vmm0:LGA11VBR01" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:23 world:573960 name:"vmast.573959" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:24 world:33667 name:"vmnic0-pollWorld-0" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:25 world:34000 name:"nfsgssd" (U)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:26 world:586710 name:"vmm0:cust1831-web10" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:27 world:586768 name:"vmm3:LGA11KINGVBP01" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:28 world:33043 name:"retireWld.0028" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:29 world:581695 name:"vmast.581694" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:30 world:33518 name:"vmsyslogd" (U)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:31 world:33323 name:"memMapKernel-31" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:32 world:581696 name:"vmm1:LGA11LABPSC01" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:33 world:33325 name:"memMapKernel-33" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:34 world:572886 name:"vmm1:LGA11VBR01" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:35 world:33766 name:"L2Echo Rx" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:36 world:572887 name:"vmm2:LGA11VBR01" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:37 world:33329 name:"memMapKernel-37" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:38 world:33330 name:"memMapKernel-38" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:39 world:586831 name:"vmx-mks:LGA11KINGVBP01" (U)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:40 world:33055 name:"retireWld.0040" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:41 world:586763 name:"vmm11:cust1831-web10" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:42 world:33334 name:"memMapKernel-42" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:43 world:586770 name:"vmm5:LGA11KINGVBP01" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:44 world:33231 name:"helper30-0" (SH)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:45 world:33780 name:"tq:tcpip4" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:46 world:583589 name:"vmm4:cust1831-db1" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:47 world:33339 name:"memMapKernel-47" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:48 world:583598 name:"vmm13:cust1831-db1" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:49 world:33429 name:"memMap-49" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:50 world:583586 name:"vmm1:cust1831-db1" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:51 world:33343 name:"memMapKernel-51" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:52 world:33432 name:"memMap-52" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:53 world:583600 name:"vmm15:cust1831-db1" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:54 world:583584 name:"vmm0:cust1831-db1" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:55 world:33347 name:"memMapKernel-55" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:56 world:583597 name:"vmm12:cust1831-db1" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:57 world:33437 name:"memMap-57" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:58 world:32865 name:"coalesceWorld-0" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:59 world:33439 name:"memMap-59" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:60 world:583593 name:"vmm8:cust1831-db1" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:61 world:33441 name:"memMap-61" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:62 world:35399 name:"vpxa-worker" (U)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:63 world:572368 name:"vmm1:LGA11VBP01" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:64 world:33444 name:"memMap-64" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:65 world:583590 name:"vmm5:cust1831-db1" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:66 world:572366 name:"vmm0:LGA11VBP01" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:67 world:583592 name:"vmm7:cust1831-db1" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:68 world:33360 name:"memMapKernel-68" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:69 world:33361 name:"memMapKernel-69" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:70 world:33450 name:"memMap-70" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:71 world:583591 name:"vmm6:cust1831-db1" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:72 world:583599 name:"vmm14:cust1831-db1" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:73 world:33453 name:"memMap-73" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:74 world:33366 name:"memMapKernel-74" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:75 world:33367 name:"memMapKernel-75" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:76 world:33368 name:"memMapKernel-76" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:77 world:583595 name:"vmm10:cust1831-db1" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:78 world:33458 name:"memMap-78" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:79 world:583588 name:"vmm3:cust1831-db1" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:80 world:32866 name:"netCoalesce2World" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:81 world:33461 name:"memMap-81" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:82 world:33462 name:"memMap-82" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:83 world:33463 name:"memMap-83" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:84 world:583587 name:"vmm2:cust1831-db1" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:85 world:33465 name:"memMap-85" (S)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:86 world:583596 name:"vmm11:cust1831-db1" (V)

2017-11-30T18:03:12.813Z cpu10:33025)pcpu:87 world:33379 name:"memMapKernel-87" (S)

2017-11-30T18:03:12.813Z cpu10:33025)@BlueScreen: #PF Exception 14 in world 33025:retireWld.00 IP 0x41801209b9b6 addr 0x14

PTEs:0x30de5f3027;0x30de5f1027;0x0;

2017-11-30T18:03:12.813Z cpu10:33025)Code start: 0x418011800000 VMK uptime: 7:11:53:21.277

2017-11-30T18:03:12.814Z cpu10:33025)0x43910809b010:[0x41801209b9b6]i40e_lan_xmit_frame@<None>#<None>+0x4da stack: 0x43b627e4b480

2017-11-30T18:03:12.814Z cpu10:33025)0x43910809b0d0:[0x418011f44120]netdev_tx@com.vmware.driverAPI#9.2+0xf4 stack: 0x0

2017-11-30T18:03:12.814Z cpu10:33025)0x43910809b1a0:[0x41801193e8f6]UplinkDevTransmit@vmkernel#nover+0x3b2 stack: 0x43b62effc840

2017-11-30T18:03:12.815Z cpu10:33025)0x43910809b280:[0x41801202961e]NetSchedFIFORunLocked@<None>#<None>+0x126 stack: 0x0

2017-11-30T18:03:12.815Z cpu10:33025)0x43910809b2f0:[0x41801202993a]NetSchedFIFOInput@<None>#<None>+0x192 stack: 0x1

2017-11-30T18:03:12.815Z cpu10:33025)0x43910809b3a0:[0x418011950980]IOChain_Resume@vmkernel#nover+0x270 stack: 0x4303512e0218

2017-11-30T18:03:12.815Z cpu10:33025)0x43910809b440:[0x418011933c9e]PortOutput@vmkernel#nover+0xae stack: 0x2000

2017-11-30T18:03:12.816Z cpu10:33025)0x43910809b480:[0x418011ff905e]TeamES_Output@<None>#<None>+0x27a stack: 0x4303511c7958

2017-11-30T18:03:12.816Z cpu10:33025)0x43910809b500:[0x418011fe6b4d]EtherswitchPortDispatch@<None>#<None>+0x985 stack: 0x0

2017-11-30T18:03:12.816Z cpu10:33025)0x43910809b6f0:[0x418011933f03]Port_InputResume@vmkernel#nover+0x17b stack: 0x418011fe61c8

2017-11-30T18:03:12.817Z cpu10:33025)0x43910809b750:[0x418011934051]Port_Input_Committed@vmkernel#nover+0x29 stack: 0x4303e35e7480

2017-11-30T18:03:12.817Z cpu10:33025)0x43910809b790:[0x418011988eac]Vmxnet3VMKDevTQDoTx@vmkernel#nover+0x1754 stack: 0x0

2017-11-30T18:03:12.817Z cpu10:33025)0x43910809b920:[0x41801198ae21]Vmxnet3VMKDev_AsyncTx@vmkernel#nover+0x95 stack: 0x50fc6b670ab0a

2017-11-30T18:03:12.818Z cpu10:33025)0x43910809ba80:[0x41801196d7b4]NetWorldletPerVMCB@vmkernel#nover+0x164 stack: 0x4300973ab540

2017-11-30T18:03:12.818Z cpu10:33025)0x43910809baf0:[0x4180118c0934]WorldletBHHandler@vmkernel#nover+0xe0 stack: 0x1000000ef

2017-11-30T18:03:12.818Z cpu10:33025)0x43910809bc50:[0x41801183336d]BH_Check@vmkernel#nover+0xe1 stack: 0x417fd18f4a88

2017-11-30T18:03:12.818Z cpu10:33025)0x43910809bcc0:[0x418011a123d2]CpuSchedIdleLoopInt@vmkernel#nover+0x182 stack: 0x400

2017-11-30T18:03:12.819Z cpu10:33025)0x43910809bd40:[0x418011a15bee]CpuSchedDispatch@vmkernel#nover+0x15fe stack: 0x439119f27100

2017-11-30T18:03:12.819Z cpu10:33025)0x43910809be60:[0x418011a167d4]CpuSchedWait@vmkernel#nover+0x240 stack: 0x0

2017-11-30T18:03:12.819Z cpu10:33025)0x43910809bee0:[0x418011a16d0f]CpuSched_SleepUntilTC@vmkernel#nover+0x8f stack: 0x2001

2017-11-30T18:03:12.820Z cpu10:33025)0x43910809bf40:[0x418011858aa4]IntrCookieRetireLoop@vmkernel#nover+0x214 stack: 0x4180428002c0

2017-11-30T18:03:12.820Z cpu10:33025)0x43910809bfd0:[0x418011a1746e]CpuSched_StartWorld@vmkernel#nover+0xa2 stack: 0x0

2017-11-30T18:03:12.823Z cpu10:33025)base fs=0x0 gs=0x418042800000 Kgs=0x0

2017-11-30T18:03:12.823Z cpu10:33025)vmkernel             0x0 .data 0x0 .bss 0x0

2017-11-30T18:03:12.823Z cpu10:33025)chardevs             0x418011dbd000 .data 0x417fc0000000 .bss 0x417fc00003c0

2017-11-30T18:03:12.823Z cpu10:33025)user                 0x418011dc4000 .data 0x417fc0400000 .bss 0x417fc040f900

2017-11-30T18:03:12.823Z cpu10:33025)vsanapi              0x418011e91000 .data 0x417fc0800000 .bss 0x417fc08024c0

2017-11-30T18:03:12.823Z cpu10:33025)vsanbase             0x418011e99000 .data 0x417fc0c00000 .bss 0x417fc0c08f40

2017-11-30T18:03:12.823Z cpu10:33025)vprobe               0x418011ea7000 .data 0x417fc1000000 .bss 0x417fc100e540

2017-11-30T18:03:12.823Z cpu10:33025)vmkapi_mgmt          0x418011ef0000 .data 0x417fc1400000 .bss 0x417fc1400180

2017-11-30T18:03:12.823Z cpu10:33025)iodm                 0x418011ef5000 .data 0x417fc1800000 .bss 0x417fc1800138

2017-11-30T18:03:12.823Z cpu10:33025)procfs               0x418011ef9000 .data 0x417fc1c00000 .bss 0x417fc1c00240

2017-11-30T18:03:12.823Z cpu10:33025)vmkapi_v2_2_0_0_mgmt_shim 0x418011efc000 .data 0x417fc2000000 .bss 0x417fc20001a0

2017-11-30T18:03:12.823Z cpu10:33025)dma_mapper_iommu     0x418011efd000 .data 0x417fc2400000 .bss 0x417fc2400080

2017-11-30T18:03:12.823Z cpu10:33025)vmkapi_v2_2_0_0_vmkernel_shim 0x418011f00000 .data 0x417fc2800000 .bss 0x417fc280c800

2017-11-30T18:03:12.823Z cpu10:33025)vmkapi_v2_0_0_0_vmkernel_shim 0x418011f06000 .data 0x417fc2c00000 .bss 0x417fc2c08100

2017-11-30T18:03:12.823Z cpu10:33025)vmkapi_v2_1_0_0_vmkernel_shim 0x418011f0c000 .data 0x417fc3000000 .bss 0x417fc3008a80

2017-11-30T18:03:12.823Z cpu10:33025)vmkplexer            0x418011f12000 .data 0x417fc3400000 .bss 0x417fc3400260

2017-11-30T18:03:12.823Z cpu10:33025)vmklinux_9           0x418011f16000 .data 0x417fc3800000 .bss 0x417fc3808ec0

2017-11-30T18:03:12.823Z cpu10:33025)vmklinux_9_2_0_0     0x418011fab000 .data 0x417fc3c00000 .bss 0x417fc3c07e84

2017-11-30T18:03:12.823Z cpu10:33025)vmklinux_9_2_1_0     0x418011fae000 .data 0x417fc4000000 .bss 0x417fc4007f98

2017-11-30T18:03:12.823Z cpu10:33025)vmklinux_9_2_2_0     0x418011fb1000 .data 0x417fc4400000 .bss 0x417fc4408798

2017-11-30T18:03:12.823Z cpu10:33025)vmklinux_9_2_3_0     0x418011fb4000 .data 0x417fc4800000 .bss 0x417fc4808ad8

2017-11-30T18:03:12.823Z cpu10:33025)lsi_mr3              0x418011fb7000 .data 0x417fc4c00000 .bss 0x417fc4c002c0

2017-11-30T18:03:12.823Z cpu10:33025)iscsi_trans          0x418011fd6000 .data 0x417fc5000000 .bss 0x417fc5001800

2017-11-30T18:03:12.823Z cpu10:33025)iscsi_trans_compat_shim 0x418011fe2000 .data 0x417fc5400000 .bss 0x417fc540096c

2017-11-30T18:03:12.823Z cpu10:33025)iscsi_trans_incompat_shim 0x418011fe3000 .data 0x417fc5800000 .bss 0x417fc58007e4

2017-11-30T18:03:12.823Z cpu10:33025)etherswitch          0x418011fe4000 .data 0x417fc5c00000 .bss 0x417fc5c14f00

2017-11-30T18:03:12.823Z cpu10:33025)netsched             0x418012029000 .data 0x417fc6000000 .bss 0x417fc6003d40

2017-11-30T18:03:12.823Z cpu10:33025)netioc               0x418012038000 .data 0x417fc6400000 .bss 0x417fc64000a0

2017-11-30T18:03:12.823Z cpu10:33025)random               0x41801203e000 .data 0x417fc6800000 .bss 0x417fc6800600

2017-11-30T18:03:12.823Z cpu10:33025)cnic_register        0x418012042000 .data 0x417fc6c00000 .bss 0x417fc6c001e0

2017-11-30T18:03:12.823Z cpu10:33025)ixgbe                0x418012044000 .data 0x417fc7000000 .bss 0x417fc7002240

2017-11-30T18:03:12.823Z cpu10:33025)i40e                 0x41801207d000 .data 0x417fc7400000 .bss 0x417fc74014e0

2017-11-30T18:03:12.823Z cpu10:33025)usb                  0x4180120b7000 .data 0x417fc7800000 .bss 0x417fc7801680

2017-11-30T18:03:12.823Z cpu10:33025)ehci-hcd             0x4180120dd000 .data 0x417fc7c00000 .bss 0x417fc7c002a0

2017-11-30T18:03:12.823Z cpu10:33025)xhci                 0x4180120e9000 .data 0x417fc8000000 .bss 0x417fc80003a0

2017-11-30T18:03:12.823Z cpu10:33025)hid                  0x418012108000 .data 0x417fc8400000 .bss 0x417fc84004e0

2017-11-30T18:03:12.823Z cpu10:33025)dm                   0x41801210e000 .data 0x417fc8800000 .bss 0x417fc8800000

2017-11-30T18:03:12.823Z cpu10:33025)nmp                  0x418012111000 .data 0x417fc8c00000 .bss 0x417fc8c04010

2017-11-30T18:03:12.823Z cpu10:33025)vmw_satp_local       0x41801213c000 .data 0x417fc9000000 .bss 0x417fc9000028

2017-11-30T18:03:12.823Z cpu10:33025)vmw_satp_default_aa  0x41801213e000 .data 0x417fc9400000 .bss 0x417fc9400000

2017-11-30T18:03:12.823Z cpu10:33025)vmw_psp_lib          0x418012140000 .data 0x417fc9800000 .bss 0x417fc9800290

2017-11-30T18:03:12.823Z cpu10:33025)vmw_psp_fixed        0x418012142000 .data 0x417fc9c00000 .bss 0x417fc9c00000

2017-11-30T18:03:12.823Z cpu10:33025)vmw_psp_rr           0x418012145000 .data 0x417fca000000 .bss 0x417fca000068

2017-11-30T18:03:12.823Z cpu10:33025)vmw_psp_mru          0x418012148000 .data 0x417fca400000 .bss 0x417fca400000

2017-11-30T18:03:12.823Z cpu10:33025)libata_92            0x41801214a000 .data 0x417fca800000 .bss 0x417fca802660

2017-11-30T18:03:12.823Z cpu10:33025)libata_9_2_0_0       0x41801216f000 .data 0x417fcac00000 .bss 0x417fcac01750

2017-11-30T18:03:12.823Z cpu10:33025)libata_9_2_1_0       0x418012170000 .data 0x417fcb000000 .bss 0x417fcb001750

2017-11-30T18:03:12.823Z cpu10:33025)libata_9_2_2_0       0x418012171000 .data 0x417fcb400000 .bss 0x417fcb401750

2017-11-30T18:03:12.823Z cpu10:33025)usb-storage          0x418012172000 .data 0x417fcb800000 .bss 0x417fcb804b00

2017-11-30T18:03:12.823Z cpu10:33025)vmci                 0x41801217f000 .data 0x417fcbc00000 .bss 0x417fcbc059c0

2017-11-30T18:03:12.823Z cpu10:33025)healthchk            0x4180121a4000 .data 0x417fcc000000 .bss 0x417fcc012bc0

2017-11-30T18:03:12.823Z cpu10:33025)teamcheck            0x4180121ba000 .data 0x417fcc400000 .bss 0x417fcc413100

2017-11-30T18:03:12.823Z cpu10:33025)vlanmtucheck         0x4180121cd000 .data 0x417fcc800000 .bss 0x417fcc812e00

2017-11-30T18:03:12.823Z cpu10:33025)heartbeat            0x4180121e2000 .data 0x417fccc00000 .bss 0x417fccc13000

2017-11-30T18:03:12.823Z cpu10:33025)shaper               0x4180121f7000 .data 0x417fcd000000 .bss 0x417fcd014d40

2017-11-30T18:03:12.823Z cpu10:33025)lldp                 0x41801220c000 .data 0x417fcd400000 .bss 0x417fcd400040

2017-11-30T18:03:12.823Z cpu10:33025)cdp                  0x418012211000 .data 0x417fcd800000 .bss 0x417fcd814240

2017-11-30T18:03:12.823Z cpu10:33025)ipfix                0x41801222b000 .data 0x417fcdc00000 .bss 0x417fcdc13440

2017-11-30T18:03:12.823Z cpu10:33025)tcpip4               0x418012242000 .data 0x417fce000000 .bss 0x417fce018600

2017-11-30T18:03:12.823Z cpu10:33025)dvsdev               0x4180123a1000 .data 0x417fce400000 .bss 0x417fce400040

2017-11-30T18:03:12.823Z cpu10:33025)dvfilter             0x4180123a4000 .data 0x417fce800000 .bss 0x417fce800b00

2017-11-30T18:03:12.823Z cpu10:33025)lacp                 0x4180123c6000 .data 0x417fcec00000 .bss 0x417fcec00180

2017-11-30T18:03:12.823Z cpu10:33025)vmkapi_v2_1_0_0_dvfilter_shim 0x4180123d4000 .data 0x417fcf000000 .bss 0x417fcf0009f0

2017-11-30T18:03:12.823Z cpu10:33025)vmkapi_v2_2_0_0_dvfilter_shim 0x4180123d5000 .data 0x417fcf400000 .bss 0x417fcf4009f0

2017-11-30T18:03:12.823Z cpu10:33025)vmkapi_v2_0_0_0_dvfilter_shim 0x4180123d6000 .data 0x417fcf800000 .bss 0x417fcf800930

2017-11-30T18:03:12.823Z cpu10:33025)libfc_92             0x4180123d7000 .data 0x417fcfc00000 .bss 0x417fcfc00540

2017-11-30T18:03:12.823Z cpu10:33025)libfcoe_92           0x4180123f2000 .data 0x417fd0000000 .bss 0x417fd00001e0

2017-11-30T18:03:12.823Z cpu10:33025)libfc_9_2_0_0        0x4180123fb000 .data 0x417fd0400000 .bss 0x417fd0400868

2017-11-30T18:03:12.823Z cpu10:33025)libfcoe_9_2_0_0      0x4180123fc000 .data 0x417fd0800000 .bss 0x417fd08001f4

2017-11-30T18:03:12.823Z cpu10:33025)libfc_9_2_1_0        0x4180123fd000 .data 0x417fd0c00000 .bss 0x417fd0c00868

2017-11-30T18:03:12.823Z cpu10:33025)libfcoe_9_2_1_0      0x4180123fe000 .data 0x417fd1000000 .bss 0x417fd10001f4

2017-11-30T18:03:12.823Z cpu10:33025)ahci                 0x4180123ff000 .data 0x417fd1400000 .bss 0x417fd1400420

2017-11-30T18:03:12.823Z cpu10:33025)esxfw                0x418012407000 .data 0x417fd2600000 .bss 0x417fd2613b00

2017-11-30T18:03:12.823Z cpu10:33025)dvfilter-generic-fastpath 0x41801241f000 .data 0x417fd2a00000 .bss 0x417fd2a132c0

2017-11-30T18:03:12.823Z cpu10:33025)vmkibft              0x41801243b000 .data 0x417fd2e00000 .bss 0x417fd2e03960

2017-11-30T18:03:12.823Z cpu10:33025)vmkfbft              0x41801243f000 .data 0x417fd3200000 .bss 0x417fd3202b20

2017-11-30T18:03:12.823Z cpu10:33025)lvmdriver            0x418012442000 .data 0x417fd3600000 .bss 0x417fd3603500

2017-11-30T18:03:12.823Z cpu10:33025)deltadisk            0x41801245c000 .data 0x417fd3a00000 .bss 0x417fd3a07e40

2017-11-30T18:03:12.823Z cpu10:33025)vdfm                 0x418012493000 .data 0x417fd3e00000 .bss 0x417fd3e001c0

2017-11-30T18:03:12.823Z cpu10:33025)tracing              0x418012498000 .data 0x417fd4200000 .bss 0x417fd4206380

2017-11-30T18:03:12.823Z cpu10:33025)rdt                  0x4180124a1000 .data 0x417fd4600000 .bss 0x417fd4605840

2017-11-30T18:03:12.823Z cpu10:33025)vsanutil             0x4180124dd000 .data 0x417fd4a00000 .bss 0x417fd4a0a680

2017-11-30T18:03:12.823Z cpu10:33025)lsomcommon           0x418012506000 .data 0x417fd4e00000 .bss 0x417fd4e01940

2017-11-30T18:03:12.823Z cpu10:33025)plog                 0x418012547000 .data 0x417fd5200000 .bss 0x417fd52085d0

2017-11-30T18:03:12.823Z cpu10:33025)gss                  0x4180125f8000 .data 0x417fd5600000 .bss 0x417fd5602ad8

2017-11-30T18:03:12.823Z cpu10:33025)vmfs3                0x41801261e000 .data 0x417fd5a00000 .bss 0x417fd5a03b00

2017-11-30T18:03:12.823Z cpu10:33025)sunrpc               0x41801269c000 .data 0x417fd5e00000 .bss 0x417fd5e03880

2017-11-30T18:03:12.823Z cpu10:33025)virsto               0x4180126b6000 .data 0x417fd6200000 .bss 0x417fd6200a40

2017-11-30T18:03:12.823Z cpu10:33025)lsom                 0x41801272d000 .data 0x417fd6600000 .bss 0x417fd660b400

2017-11-30T18:03:12.823Z cpu10:33025)vfat                 0x418012811000 .data 0x417fd6a00000 .bss 0x417fd6a02800

2017-11-30T18:03:12.823Z cpu10:33025)ufs                  0x41801281c000 .data 0x417fd6e00000 .bss 0x417fd6e008c0

2017-11-30T18:03:12.823Z cpu10:33025)dvfg-igmp            0x41801282f000 .data 0x417fd7200000 .bss 0x417fd7200208

2017-11-30T18:03:12.823Z cpu10:33025)cmmds_net            0x418012835000 .data 0x417fd7600000 .bss 0x417fd7603740

2017-11-30T18:03:12.823Z cpu10:33025)cmmds                0x418012849000 .data 0x417fd7a00000 .bss 0x417fd7a05ec0

2017-11-30T18:03:12.823Z cpu10:33025)cmmds_resolver       0x4180128b9000 .data 0x417fd7e00000 .bss 0x417fd7e00110

2017-11-30T18:03:12.823Z cpu10:33025)vsan                 0x4180128ca000 .data 0x417fd8200000 .bss 0x417fd8217b00

2017-11-30T18:03:12.823Z cpu10:33025)vmklink_mpi          0x418012a30000 .data 0x417fd8600000 .bss 0x417fd86025c0

2017-11-30T18:03:12.823Z cpu10:33025)swapobj              0x418012a36000 .data 0x417fd8a00000 .bss 0x417fd8a03268

2017-11-30T18:03:12.823Z cpu10:33025)nfsclient            0x418012a3f000 .data 0x417fd8e00000 .bss 0x417fd8e03ca0

2017-11-30T18:03:12.823Z cpu10:33025)nfs41client          0x418012a5a000 .data 0x417fd9200000 .bss 0x417fd9205600

2017-11-30T18:03:12.823Z cpu10:33025)vflash               0x418012abe000 .data 0x417fd9600000 .bss 0x417fd9603700

2017-11-30T18:03:12.823Z cpu10:33025)vmkapei              0x418012ac9000 .data 0x417fd9a00000 .bss 0x417fd9a00b20

2017-11-30T18:03:12.823Z cpu10:33025)procMisc             0x418012ad2000 .data 0x417fd9e00000 .bss 0x417fd9e00000

2017-11-30T18:03:12.823Z cpu10:33025)ipmi_msghandler      0x418012ad3000 .data 0x417fda200000 .bss 0x417fda2005e0

2017-11-30T18:03:12.823Z cpu10:33025)ipmi_si_drv          0x418012adc000 .data 0x417fda600000 .bss 0x417fda600660

2017-11-30T18:03:12.823Z cpu10:33025)ipmi_devintf         0x418012ae7000 .data 0x417fdaa00000 .bss 0x417fdaa00180

2017-11-30T18:03:12.823Z cpu10:33025)vmkapi_v2_1_0_0_nmp_shim 0x418012aea000 .data 0x417fdae00000 .bss 0x417fdae00ce8

2017-11-30T18:03:12.823Z cpu10:33025)vmkapi_v2_0_0_0_nmp_shim 0x418012aeb000 .data 0x417fdb200000 .bss 0x417fdb200ce8

2017-11-30T18:03:12.823Z cpu10:33025)vmkapi_v2_2_0_0_nmp_shim 0x418012aec000 .data 0x417fdb600000 .bss 0x417fdb600d68

2017-11-30T18:03:12.823Z cpu10:33025)vmkapi_v2_0_0_0_iscsi_shim 0x418012aed000 .data 0x417fdba00000 .bss 0x417fdba00970

2017-11-30T18:03:12.823Z cpu10:33025)vmkapi_v2_2_0_0_iscsi_shim 0x418012aee000 .data 0x417fdbe00000 .bss 0x417fdbe00970

2017-11-30T18:03:12.823Z cpu10:33025)vmkapi_v2_1_0_0_iscsi_shim 0x418012aef000 .data 0x417fdc200000 .bss 0x417fdc200970

2017-11-30T18:03:12.823Z cpu10:33025)hbr_filter           0x418012af0000 .data 0x417fdc600000 .bss 0x417fdc6002c0

2017-11-30T18:03:12.823Z cpu10:33025)vmkstatelogger       0x418012b1c000 .data 0x417fdca00000 .bss 0x417fdca03840

2017-11-30T18:03:12.823Z cpu10:33025)ftcpt                0x418012b43000 .data 0x417fdce00000 .bss 0x417fdce03000

2017-11-30T18:03:12.823Z cpu10:33025)svmmirror            0x418012b81000 .data 0x417fdd200000 .bss 0x417fdd200100

2017-11-30T18:03:12.823Z cpu10:33025)cbt                  0x418012b8e000 .data 0x417fdd600000 .bss 0x417fdd600080

2017-11-30T18:03:12.823Z cpu10:33025)migrate              0x418012b92000 .data 0x417fdda00000 .bss 0x417fdda05100

2017-11-30T18:03:12.823Z cpu10:33025)filtmod              0x418012bfd000 .data 0x417fdde00000 .bss 0x417fdde03fc0

2017-11-30T18:03:12.823Z cpu10:33025)vfc                  0x418012c0d000 .data 0x417fde200000 .bss 0x417fde202c80

Coredump to disk.

2017-11-30T18:03:12.874Z cpu10:33025)Slot 1 of 1.

2017-11-30T18:03:12.874Z cpu10:33025)Dump: 2352: Using dump slot size 2684354560.
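For anyone reading a dump like this, a quick cross-check is to map the faulting instruction pointer onto the module list at the end of the dump. The sketch below is only an illustration in Python. It assumes the first address on each module line is that module's load address, and that the module with the nearest base at or below the IP owns it. `MODULE_BASES` and `resolve` are names made up for this example; the addresses are copied from the log above.

```python
from bisect import bisect_right

# Load addresses copied from the module list in the dump above.
# Only the neighbours of the faulting address are included (assumption:
# the first address on each module line is the module's load address).
MODULE_BASES = {
    "cnic_register": 0x418012042000,
    "ixgbe": 0x418012044000,
    "i40e": 0x41801207D000,
    "usb": 0x4180120B7000,
    "ehci-hcd": 0x4180120DD000,
}


def resolve(ip, bases):
    """Return (module, offset) for the nearest module base at or below ip."""
    ordered = sorted((addr, name) for name, addr in bases.items())
    idx = bisect_right([addr for addr, _ in ordered], ip) - 1
    if idx < 0:
        raise ValueError("address is below every known module base")
    addr, name = ordered[idx]
    return name, ip - addr


# From the line "#PF Exception 14 in world 33025:retireWld.00 IP 0x41801209b9b6 addr 0x14"
FAULT_IP = 0x41801209B9B6

module, offset = resolve(FAULT_IP, MODULE_BASES)
print(f"{module}+{offset:#x}")
```

Under those assumptions it prints `i40e+0x1e9b6`, which agrees with the top backtrace frame, `i40e_lan_xmit_frame`.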

https://tactsol.com https://vmware.solutions
12 Replies
CJR_SR
Contributor

Make sure you are using the VMware-certified drivers and firmware for your NICs. I had the same problem with 2 servers in our farm. Hope that helps.

ITaaP
Enthusiast

All drivers are direct from VMware's website.

TheBobkin
Champion

Hello ITaaP,

Yes, there are known issues with this, and driver/TSO recommendations accordingly:

https://kb.vmware.com/s/article/2126909

Hope this helps.

Bob

ITaaP
Enthusiast

Yes, that is the KB I saw. The issue was supposed to be resolved with v2.0.6, which is the version I was running when the latest server crashed.

TheBobkin
Champion

Hello ITaaP,

I'm not aware of the exact specifics and have no internal access at the moment to research further, but I do recall others mentioning in some of the other threads here that with some server vendors they couldn't get PSOD-free without disabling TSO.

Bob

ITaaP
Enthusiast

I guess I can try that as a last resort. Will see how the native 1.3.1 drivers work first. Just want to be sure it is just a driver issue with the X710, since it is not affecting our other Supermicro servers.

TheBobkin
Champion

Hello ITaaP,

Aye, that seems to be the crux of the issue though - unfortunately not all drivers/firmware play equally well with all hardware.

Also, are you changing the firmware here to match the drivers you have tried?

I ask as the HCL listings for these devices (may vary for your specific model/vendor) appear to be rather varied with respect to what matches with what e.g:

https://www.vmware.com/resources/compatibility/detail.php?deviceCategory=io&productid=40583&vcl=true

Bob

ITaaP
Enthusiast

Firmware is actually what I first suspected, and it is an interesting story. Let's start with the servers that have zero issues. They are running firmware 5.02 and driver 1.4.28, a combination that is not listed as a match on VMware's website.

Two of the servers that crashed have two X710 NICs. I checked the firmware version and one was running 4.42 and the other 5.05. I thought maybe running different firmware versions in the same server could be an issue. Since one was already running 5.05, I upgraded the other to 5.05 along with driver 2.0.6. Go figure: driver 2.0.7 was not available for download when I was troubleshooting, but now it is.

Both sets of servers are running the same chipset, Broadwell processors, and NICs, so I can't see why the servers that have been running for a year have never had an issue. One difference is the ESXi release. The good servers are running 6.0.0, 4510822 and the crashing servers are on 6.0.0, 6921384. Thinking maybe it is a combination of the ESXi release and/or NIC firmware/driver, I opened a case with VMware Support. Their response?

This is not a VMware issue.  There isn't anything that we can suggest changing from the VMware perspective.

I am downloading drivers from their HCL, but they can't help... What is the point of having an HCL if, when something doesn't work, it is not their problem?

I also came across this article which I thought was interesting. http://www.i-1.nl/blog/?p=58057  I am running microcode revision 0x0b00001b, but so are the servers that haven't crashed, and revision 0x0b00001b is supposed to be good according to VMware's website.

So nothing really makes sense. I can only hope that i40en v1.3.1 works. But if it does, I would still like to understand the difference between the servers I have that is causing the issue. I also have the same servers running the latest 6.5 version, but haven't had a chance to test yet.
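One way to keep the comparison straight is to write the two configurations side by side and let a script list what differs. This is a minimal Python sketch using only the versions mentioned in this thread. The dictionary keys are made-up labels (not esxcli field names), and the "crashing" entry reflects the state after the firmware and driver upgrade described above.

```python
# Versions as reported in this thread; key names are hypothetical labels.
stable = {
    "esxi_build": "4510822",
    "i40e_driver": "1.4.28",
    "nic_firmware": "5.02",
    "cpu_microcode": "0x0b00001b",
}
crashing = {
    "esxi_build": "6921384",
    "i40e_driver": "2.0.6",
    "nic_firmware": "5.05",
    "cpu_microcode": "0x0b00001b",
}


def diff(a, b):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    keys = sorted(a.keys() | b.keys())
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


for key, (good, bad) in diff(stable, crashing).items():
    print(f"{key}: stable={good} crashing={bad}")
```

With these inputs the ESXi build, the i40e driver, and the NIC firmware all differ while the microcode matches, so three variables changed at once and no single one can be blamed from this comparison alone.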

TheHevy
Contributor

ITaaP
Enthusiast

I don't see anything in the release notes addressing this issue. However, I haven't had the issue again since upgrading to i40en 1.3.1.

Halvsvenskeren
Contributor

Can you run the i40en driver on 5.5 U3?

I can't find the drivers anywhere.

Our clusters PSOD every few days and it's effing annoying.

berndweyand
Expert

Note that this thread is 3 years old!

I had these issues with 5.5 a few years ago, and the only option for 5.5 was to disable TSO: https://kb.vmware.com/s/article/2126909
Intel was investigating, but I don't know if they released a native driver.

After I updated to 6.5 the problem went away. Keep in mind that 5.5 is not supported anymore.
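For reference, here is a rough sketch of what disabling TSO looks like when scripted. As far as I know, hardware TSO is controlled by the advanced settings `/Net/UseHwTSO` (IPv4) and `/Net/UseHwTSO6` (IPv6), set through `esxcli system settings advanced set`; treat that as an assumption and follow the KB for the exact steps on your release. The helper names `tso_commands` and `run_all` are made up for this example, and it only prints the commands unless `dry_run` is turned off on the host itself.

```python
import subprocess

# Advanced options assumed to control hardware TSO for IPv4 and IPv6.
TSO_OPTIONS = ("/Net/UseHwTSO", "/Net/UseHwTSO6")


def tso_commands(enable):
    """Build the esxcli invocations that switch hardware TSO on (1) or off (0)."""
    value = "1" if enable else "0"
    return [
        ["esxcli", "system", "settings", "advanced", "set", "-o", opt, "-i", value]
        for opt in TSO_OPTIONS
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Dry run: shows what would be executed to disable TSO, changes nothing.
run_all(tso_commands(enable=False))
```

Keeping the dry run as the default makes it easy to review the exact commands before touching a production host.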

 
