Thank you very much for reporting this! I'll see that it gets fixed.
Hi again, swordfeng!
Linux kernel 3.11 changed the usage of the 3rd argument to the notifier function from a struct net_device * to a struct netdev_notifier_info *. Your patch will fix it neatly for newer Linux kernels, but the official fix we have queued up ends up being a little bit more complicated because we still support host OSes older than kernel 3.11.
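For anyone following along, here is a minimal userspace sketch of the kind of version check such a fix needs. It is not the actual vmnet code: the struct layouts are simplified stand-ins, the fallback LINUX_VERSION_CODE value and the helper name notifier_arg_to_dev are made up for illustration. In a real module, the kernel's own netdev_notifier_info_to_dev() helper (available from 3.11) would normally do the unwrapping.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; the layouts are illustrative only. */
struct net_device { char name[16]; };
struct netdev_notifier_info { struct net_device *dev; };

/* Same encoding the kernel uses for version comparisons. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#ifndef LINUX_VERSION_CODE
/* Assumption for this sketch: pretend we are building against 4.10.13. */
#define LINUX_VERSION_CODE KERNEL_VERSION(4, 10, 13)
#endif

/* Hypothetical helper: turn the notifier's third argument into a net_device. */
static struct net_device *notifier_arg_to_dev(void *data)
{
    assert(data != NULL);
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 11, 0)
    /* 3.11 and newer: the argument wraps the device in an info struct. */
    return ((struct netdev_notifier_info *)data)->dev;
#else
    /* Older kernels: the argument is the device itself. */
    return (struct net_device *)data;
#endif
}
```

The point of keeping both branches is that the same source can still build against hosts older than 3.11, where the cast is the correct interpretation.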
We haven't seen any other reports of a stack overflow though... The misuse of the pointer is very nasty, but I haven't yet seen how a stack overflow could result. Is there any chance you could share more details of the failure? Information on which Linux distribution you're running might be useful, and/or a photo of the panic message on your screen.
Nice to hear that this is going to be fixed soon.
I'm currently using Arch Linux, kernel 4.10.13, running patched modules (https://aur.archlinux.org/cgit/aur.git/tree/?h=vmware-modules-dkms), which seem unrelated to this issue, however.
I have taken a picture and have kept the kernel module which produced the 'stack overflow'. And here's a piece of disassembly:
4390: e8 00 00 00 00 callq 4395 <VNetBridgeNotify+0x5>
4395: 55 push %rbp
4396: 48 89 e5 mov %rsp,%rbp
4399: 41 54 push %r12
439b: 53 push %rbx
439c: 48 89 fb mov %rdi,%rbx
439f: 48 83 ec 08 sub $0x8,%rsp
43a3: 48 83 fe 02 cmp $0x2,%rsi
43a7: 0f 84 bf 00 00 00 je 446c <VNetBridgeNotify+0xdc>
43ad: 48 83 fe 06 cmp $0x6,%rsi
43b1: 0f 84 8c 00 00 00 je 4443 <VNetBridgeNotify+0xb3>
43b7: 48 83 fe 01 cmp $0x1,%rsi
43bb: 74 0b je 43c8 <VNetBridgeNotify+0x38>
43bd: 48 83 c4 08 add $0x8,%rsp
43c1: 31 c0 xor %eax,%eax
43c3: 5b pop %rbx
43c4: 41 5c pop %r12
43c6: 5d pop %rbp
43c7: c3 retq
43c8: 48 83 7f 28 00 cmpq $0x0,0x28(%rdi)
43cd: 75 ee jne 43bd <VNetBridgeNotify+0x2d>
43cf: 48 81 ba 00 05 00 00 cmpq $0x0,0x500(%rdx)
43d6: 00 00 00 00
43da: 75 e1 jne 43bd <VNetBridgeNotify+0x2d>
43dc: 4c 8d 67 18 lea 0x18(%rdi),%r12
43e0: 48 89 d7 mov %rdx,%rdi
43e3: 48 89 55 e8 mov %rdx,-0x18(%rbp)
43e7: 4c 89 e6 mov %r12,%rsi
43ea: e8 00 00 00 00 callq 43ef <VNetBridgeNotify+0x5f>
This convinces me that the invalid access through *data (the 0x500(%rdx) read above) causes this issue.
Ahhhhh yes, that'd do it! Not really a stack overflow in the most common/traditional sense (e.g. too many function calls or stack frames too large) but instead it's a wild pointer dereference that's "too far" past the current stack pointer and causes a page fault. I agree that your patch will take care of it.
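To put a rough number on "too far", here is a small sketch of the arithmetic. It assumes the info structure is a single pointer (8 bytes on x86-64) and takes the 0x500 offset from the cmpq instruction in the disassembly. The helper name bytes_past_object is hypothetical, and which net_device field the code meant to read at that offset is not something this sketch claims to know.

```c
#include <assert.h>
#include <stddef.h>

/* Offset read through the third argument, from: cmpq $0x0,0x500(%rdx) */
enum { FIELD_OFFSET = 0x500 };

/*
 * Hypothetical helper: how many bytes past the end of the real object
 * does a read of field_size bytes at field_offset reach?
 * A positive result means the read runs off the end of the object.
 */
static long bytes_past_object(size_t object_size, size_t field_offset,
                              size_t field_size)
{
    assert(object_size > 0);
    return (long)(field_offset + field_size) - (long)object_size;
}
```

With an 8-byte object and an 8-byte read at FIELD_OFFSET, the read ends 0x500 bytes past the end of the structure that was actually passed in, so whatever memory happens to sit there gets interpreted as a field of a net_device.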
Thanks for the awesome help, swordfeng. We greatly appreciate it!
I wasn't able to squeeze this fix into the Workstation 12.5.6 update which was released just now. It's still in the queue for evaluation for the subsequent patch release. No specific timeline to share, I'm afraid. In the meantime, your patch should continue to work against Workstation 12.5.6.
Workstation 12.5.7 has been released and contains a fix for this issue, which could cause host kernel panics or unreliable network bridging on Linux hosts running kernel version 3.11 or newer.