VMware Cloud Community
dimba
Contributor

vmxnet3 causes kernel panic on CentOS 6.3

Hi,

I'm running on ESX 5.1. The guest is CentOS 6.3 with vmxnet3 1.1.32.0.

Under load, when the kernel runs short of skbs, the machine crashes with the following stack:

[77782.262000] ------------[ cut here ]------------
[77782.262000] kernel BUG at /tmp/build_rpm_sandbox.vy6fT4/BUILD/vmware-open-vm-tools-9.0.0/vmxnet3/vmxnet3_drv.c:1467!
[77782.262000] invalid opcode: 0000 [#2] SMP
[77782.262000] last sysfs file: /sys/module/ipv6/initstate
[77782.262000] CPU 0
[77782.262000] Modules linked in: ip_vs libcrc32c u2p(P)(U) nls_cp950 netconsole hades(P)(U) imp_iTCO_wdt(U) hades_wd(U) ip6table_filter ip6_tables ip6t_REJECT configfs vmblock(U) vsock(U) fuse vmware_balloon ipt_REJECT iptable_filter ip_tables ppdev parport_pc parport vmxnet3(U) ipv6 sg i2c_piix4 i2c_core vmci(U) shpchp ext3 jbd mbcache sd_mod crc_t10dif sr_mod cdrom mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: netconsole]
[77782.262000]
[77782.262000] Pid: 14178, comm: gw.x Tainted: P           ---------------    2.6.32-279.el6.imp4.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
[77782.262000] RIP: 0010:[<ffffffffa01b0a1f>]  [<ffffffffa01b0a1f>] vmxnet3_rq_rx_complete+0x9bf/0xbc0 [vmxnet3]
[77782.262000] RSP: 0018:ffff8800282037c8  EFLAGS: 00010046
[77782.262000] RAX: 0000000000000040 RBX: ffff880239dd0ee0 RCX: 0000000000001000
[77782.262000] RDX: 0000000000000004 RSI: 00000000000006c8 RDI: 0000000000000000
[77782.262000] RBP: ffff880028203838 R08: ffff880238629800 R09: 00000000000006c8
[77782.262000] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88023a94aea0
[77782.262000] R13: ffff8802365a4c10 R14: ffff880239dd06e0 R15: ffff88023862aa18
[77782.262000] FS:  00007f61f2aea700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
[77782.262000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[77782.262000] CR2: 0000000000a9f270 CR3: 000000008beaa000 CR4: 00000000000006f0
[77782.262000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[77782.262000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[77782.262000] Process gw.x (pid: 14178, threadinfo ffff880080828000, task ffff880080820970)
[77782.262000] Stack:
[77782.262000]  ffff88023b9c0c40 ffff88023b9c0c80 0000000000000001 ffff880239dd0ee8
[77782.262000] <d> ffff880028203868 0000001081151947 0000000000000001 000000c100000000
[77782.262000] <d> ffff88023b9c0c80 ffff880239dd06e0 ffff880239dd0ee8 0000000000000010
[77782.262000] Call Trace:
[77782.262000]  <IRQ>
[77782.262000]  [<ffffffffa01b0da3>] vmxnet3_poll_rx_only+0x53/0x110 [vmxnet3]
[77782.262000]  [<ffffffff8143c835>] netpoll_poll_dev+0xd5/0x490
[77782.262000]  [<ffffffff8143cc9a>] find_skb+0x8a/0x90
[77782.262000]  [<ffffffff8143cf96>] netpoll_send_udp+0x36/0x230
[77782.262000]  [<ffffffffa001633b>] write_msg+0xbb/0x110 [netconsole]
[77782.262000]  [<ffffffff81064a65>] __call_console_drivers+0x75/0x90
[77782.262000]  [<ffffffff81064aca>] _call_console_drivers+0x4a/0x80
[77782.262000]  [<ffffffff81064fde>] release_console_sem+0x4e/0x220
[77782.262000]  [<ffffffff81065797>] vprintk+0x247/0x560
[77782.262000]  [<ffffffff814e81ce>] printk+0x41/0x43
[77782.262000]  [<ffffffff810a3ed7>] print_modules+0x97/0xf0
[77782.262000]  [<ffffffff8100e066>] show_registers+0x46/0x280
[77782.262000]  [<ffffffff814ee23a>] ? atomic_notifier_call_chain+0x1a/0x20
[77782.262000]  [<ffffffff814ec053>] __die+0xb3/0xf0
[77782.262000]  [<ffffffff8100f178>] die+0x48/0x90
[77782.262000]  [<ffffffff814eba44>] do_trap+0xc4/0x160
[77782.262000]  [<ffffffff8100cd95>] do_invalid_op+0x95/0xb0
[77782.262000]  [<ffffffffa01b0a1f>] ? vmxnet3_rq_rx_complete+0x9bf/0xbc0 [vmxnet3]
[77782.262000]  [<ffffffff81151947>] ? cache_alloc_refill+0x2f7/0x580
[77782.262000]  [<ffffffff8100be3b>] invalid_op+0x1b/0x20
[77782.262000]  [<ffffffffa01b0a1f>] ? vmxnet3_rq_rx_complete+0x9bf/0xbc0 [vmxnet3]
[77782.262000]  [<ffffffffa01b01fe>] ? vmxnet3_rq_rx_complete+0x19e/0xbc0 [vmxnet3]
[77782.262000]  [<ffffffff81427c9c>] ? dev_hard_start_xmit+0x2bc/0x3f0
[77782.262000]  [<ffffffff81276950>] ? swiotlb_map_page+0x0/0x100
[77782.262000]  [<ffffffffa01b0da3>] vmxnet3_poll_rx_only+0x53/0x110 [vmxnet3]
[77782.262000]  [<ffffffff8142bd62>] net_rx_action+0x102/0x2f0
[77782.262000]  [<ffffffff8106cde1>] __do_softirq+0xc1/0x1e0
[77782.262000]  [<ffffffff8100c1ac>] call_softirq+0x1c/0x30
[77782.262000]  <EOI>
[77782.262000]  [<ffffffff8100ddb5>] ? do_softirq+0x65/0xa0
[77782.262000]  [<ffffffff8106cc7a>] local_bh_enable+0x9a/0xb0
[77782.262000]  [<ffffffff8142c0eb>] dev_queue_xmit+0x19b/0x6f0
[77782.262000]  [<ffffffff814639b0>] ? ip_finish_output+0x0/0x310
[77782.262000]  [<ffffffff81463aec>] ip_finish_output+0x13c/0x310
[77782.262000]  [<ffffffff81463d78>] ip_output+0xb8/0xc0
[77782.262000]  [<ffffffff8146303f>] ? __ip_local_out+0x9f/0xb0
[77782.262000]  [<ffffffff81463075>] ip_local_out+0x25/0x30
[77782.262000]  [<ffffffff81463550>] ip_queue_xmit+0x190/0x420
[77782.262000]  [<ffffffff8147821e>] tcp_transmit_skb+0x3fe/0x7b0
[77782.262000]  [<ffffffff8147a5db>] tcp_write_xmit+0x1fb/0xa20
[77782.262000]  [<ffffffff8147af90>] __tcp_push_pending_frames+0x30/0xe0
[77782.262000]  [<ffffffff8146a5de>] tcp_push+0x6e/0x90
[77782.262000]  [<ffffffff8146b5ba>] tcp_sendmsg+0x64a/0xa20
[77782.262000]  [<ffffffff81417147>] sock_aio_write+0x167/0x180
[77782.262000]  [<ffffffffa020969d>] ? hook_function+0x1d/0x30 [hades_wd]
[77782.262000]  [<ffffffff81167a2a>] do_sync_write+0xfa/0x140
[77782.262000]  [<ffffffff8108aa50>] ? autoremove_wake_function+0x0/0x40
[77782.262000]  [<ffffffff8105e1f0>] ? pick_next_task_fair+0xd0/0x130
[77782.262000]  [<ffffffff811fe706>] ? security_file_permission+0x16/0x20
[77782.262000]  [<ffffffff81167df4>] vfs_write+0x184/0x1a0
[77782.262000]  [<ffffffff81168711>] sys_write+0x51/0x90
[77782.262000]  [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[77782.262000] Code: 00 00 00 c7 83 b4 00 00 00 00 00 00 00 31 c0 83 f1 01 88 8b b8 00 00 00 e9 1c fb ff ff 48 83 bb c8 00 00 00 00 0f 85 5a fb ff ff <0f> 0b eb fe 0f 0b eb fe 0f 0b 0f 1f 80 00 00 00 00 eb f7 8b 83
[77782.262000] RIP  [<ffffffffa01b0a1f>] vmxnet3_rq_rx_complete+0x9bf/0xbc0 [vmxnet3]
[77782.262000]  RSP <ffff8800282037c8>

crash-x86_64> mod -s vmxnet3 vmxnet.ko
mod: vmxnet3: cannot find or load object file: vmxnet.ko
crash-x86_64> mod -s vmxnet3 vmxnet3.ko
     MODULE       NAME                   SIZE  OBJECT FILE
ffffffffa01b7ac0  vmxnet3               60947  vmxnet3.ko
crash-x86_64> gdb list * vmxnet3_rq_rx_complete+0x9bf
0xffffffffa01b0a1f is in vmxnet3_rq_rx_complete (/tmp/build_rpm_sandbox.vy6fT4/BUILD/vmware-open-vm-tools-9.0.0/vmxnet3/vmxnet3_drv.c:1467).
1462                                            PCI_DMA_FROMDEVICE);
1463                            rxd->addr = cpu_to_le64(rbi->dma_addr);
1464                            rxd->len = rbi->len;
1465
1466                    } else {
1467                            BUG_ON(ctx->skb == NULL && !skip_page_frags);
1468                            /* non SOP buffer must be type 1 in most cases */
1469                            BUG_ON(rbi->buf_type != VMXNET3_RX_BUF_PAGE);
1470                            BUG_ON(rxd->btype != VMXNET3_RXD_BTYPE_BODY);
1471

The line was added by the following commit: https://github.com/torvalds/linux/commit/5318d809d7b4975ce5e5303e8508f89a5458c2b6.

Any ideas?

Accepted Solutions
JarryG
Expert

This seems to me more like an open-vm-tools problem. Do you really need it? The vmxnet3 driver has been part of the mainline kernel sources since 2.6.32, and if you run incompatible versions of the kernel and open-vm-tools, you may get problems like these...

BTW, open-vm-tools is NOT the same as vmware-tools! Moreover, your open-vm-tools version is quite old (9.0.0). The current stable branch is 9.4.0, so it may be time to update (vmware-tools is even further ahead, 9.6.x if I recall correctly)...

4 Replies
dimba
Contributor

Forgot to mention that I also applied the netpoll race condition patch ([PATCH] vmxnet3: fix netpoll race condition - Linux Network Development).

As far as I can tell, that fix is not related to the problem I'm seeing.

hostasaurus
Enthusiast

Thanks for mentioning the VM tools. We forgot to update our Linux repositories to point at 5.5 after upgrading, and VMware hasn't deployed the newer tools to the 5.1 repo. I just updated and tested, and their 5.5 repo is indeed serving the 9.4 branch.

dimba
Contributor

@JarryG Indeed, upgrading vmware-tools to the latest tools from ESX 5.5 solved the problem, even though we run on ESXi 5.1.
