VMware Cloud Community
Kelvin0431
Contributor

eth0: tx hang

One node hit a tx hang while running MapReduce jobs. The node is one of the datanodes in our Hadoop cluster, and we were importing databases with Sqoop when we got the errors below. Any ideas?

Jun  1 14:01:19 kernel: ------------[ cut here ]------------
Jun  1 14:01:19 kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26b/0x280() (Not tainted)
Jun  1 14:01:19 kernel: Hardware name: VMware Virtual Platform
Jun  1 14:01:19 kernel: NETDEV WATCHDOG: eth0 (vmxnet3): transmit queue 2 timed out
Jun  1 14:01:19 kernel: Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc autofs4 8021q garp stp llc vsock(U) ipv6 microcode vmware_balloon sg vmci(U) i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom vmxnet3 mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ip6t_REJECT]
Jun  1 14:01:19 kernel: Pid: 18580, comm: java Not tainted 2.6.32-431.el6.x86_64 #1
Jun  1 14:01:19 kernel: Call Trace:
Jun  1 14:01:19 kernel: <IRQ>  [<ffffffff81071e27>] ? warn_slowpath_common+0x87/0xc0
Jun  1 14:01:19 kernel: [<ffffffff81071f16>] ? warn_slowpath_fmt+0x46/0x50
Jun  1 14:01:19 kernel: [<ffffffff8147b74b>] ? dev_watchdog+0x26b/0x280
Jun  1 14:01:19 kernel: [<ffffffff81083e75>] ? internal_add_timer+0xb5/0x110
Jun  1 14:01:19 kernel: [<ffffffff8147b4e0>] ? dev_watchdog+0x0/0x280
Jun  1 14:01:19 kernel: [<ffffffff81084b07>] ? run_timer_softirq+0x197/0x340
Jun  1 14:01:19 kernel: [<ffffffff810ac8f5>] ? tick_dev_program_event+0x65/0xc0
Jun  1 14:01:19 kernel: [<ffffffff8107a8e1>] ? __do_softirq+0xc1/0x1e0
Jun  1 14:01:19 kernel: [<ffffffff810ac9ca>] ? tick_program_event+0x2a/0x30
Jun  1 14:01:19 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun  1 14:01:19 kernel: [<ffffffff8100fa75>] ? do_softirq+0x65/0xa0
Jun  1 14:01:19 kernel: [<ffffffff8107a795>] ? irq_exit+0x85/0x90
Jun  1 14:01:19 kernel: [<ffffffff815310aa>] ? smp_apic_timer_interrupt+0x4a/0x60
Jun  1 14:01:19 kernel: [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
Jun  1 14:01:19 kernel: <EOI>
Jun  1 14:01:19 kernel: ---[ end trace 808af6e00c97548a ]---
Jun  1 14:01:19 kernel: vmxnet3 0000:03:00.0: eth0: tx hang
Jun  1 14:01:24 kernel: vmxnet3 0000:03:00.0: eth0: resetting
Jun  1 14:01:24 kernel: vmxnet3 0000:03:00.0: eth0: intr type 3, mode 0, 9 vectors allocated
Jun  1 14:01:24 kernel: vmxnet3 0000:03:00.0: eth0: NIC Link is Up 10000 Mbps

4 Replies
DavoudTeimouri
Virtuoso

Read this KB: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=20551...

Then disable TSO in your guest and check again. If the problem is not resolved, change your NIC to E1000.

If the problem still exists, you can disable TSO on ESXi as a test, but I don't recommend it.
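
For the guest-side step, a minimal sketch of turning TSO off with ethtool from Python is shown here. It assumes a Linux guest with ethtool installed and the interface name eth0 from the log above. The helper names are hypothetical, and the change made this way only applies to the running system, so it would need to be reapplied after a reboot.

```python
import subprocess


def tso_off_command(interface="eth0"):
    """Build the ethtool argv that turns TCP segmentation offload off."""
    return ["ethtool", "-K", interface, "tso", "off"]


def disable_tso(interface="eth0", dry_run=True):
    """Return the command as a string; run it only when dry_run is False.

    Running it needs root privileges inside the guest (assumption: ethtool
    is on PATH and the driver allows toggling TSO).
    """
    cmd = tso_off_command(interface)
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return " ".join(cmd)


if __name__ == "__main__":
    # Dry run by default: only print what would be executed.
    print(disable_tso("eth0"))
```

Keeping the dry run as the default makes it easy to review the exact command before applying it to a production datanode.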

-------------------------------------------------------------------------------------
Davoud Teimouri - https://www.teimouri.net - Twitter: @davoud_teimouri Facebook: https://www.facebook.com/teimouri.net/
Kelvin0431
Contributor

Thanks, Davoud. I will try your suggestion and continue monitoring this guest. :)
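
For the monitoring part, a rough sketch of a log scan that counts watchdog timeouts and tx hang events per interface could look like the following. The patterns are based only on the message formats in the log I posted. The path /var/log/messages is an assumption about where this guest writes kernel messages, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Patterns taken from the kernel messages quoted in this thread.
WATCHDOG = re.compile(
    r"NETDEV WATCHDOG: (\S+) \((\S+)\): transmit queue (\d+) timed out"
)
TX_HANG = re.compile(r"(\S+): tx hang")


def summarize(lines):
    """Count watchdog timeouts per (interface, queue) and tx hangs per interface."""
    timeouts = Counter()
    hangs = Counter()
    for line in lines:
        match = WATCHDOG.search(line)
        if match:
            timeouts[(match.group(1), int(match.group(3)))] += 1
            continue
        match = TX_HANG.search(line)
        if match:
            hangs[match.group(1)] += 1
    return timeouts, hangs


if __name__ == "__main__":
    # Assumed log location; adjust for the guest's syslog configuration.
    with open("/var/log/messages", errors="replace") as log:
        timeouts, hangs = summarize(log)
    print("watchdog timeouts:", dict(timeouts))
    print("tx hangs:", dict(hangs))
```

Comparing the counts before and after disabling TSO should show whether the change makes any difference.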

cesprov
Enthusiast

rembertv
Contributor

Hi,

We are using VMware ESXi 5.5.0 build-1331820 and are facing the same issue (tx hang) with the same backtrace in the vmxnet3 driver.

Please let us know whether a patch is available.

As mentioned, TSO is disabled by default in our guest VM.

Thanks in advance.
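
For anyone who wants to confirm the TSO state inside a guest, here is a small sketch that parses the output of `ethtool -k`. It assumes a Linux guest with ethtool installed. The function names are hypothetical, and the parser relies only on the `tcp-segmentation-offload:` line that ethtool prints in its feature list.

```python
import subprocess


def tso_enabled(ethtool_output):
    """Return True/False for the TSO state, or None if the line is missing."""
    for line in ethtool_output.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "tcp-segmentation-offload":
            words = value.split()  # value may look like "off" or "off [fixed]"
            if not words:
                return None
            return words[0] == "on"
    return None


def read_features(interface="eth0"):
    """Run `ethtool -k <interface>` and return its text output."""
    result = subprocess.run(
        ["ethtool", "-k", interface],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print("TSO enabled:", tso_enabled(read_features("eth0")))
```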
