VMware Cloud Community
GeorgeHS
Contributor

Random packet loss, but only on Windows guests

Hello all,

I've been working on this problem for a couple of days now. I'm experiencing intermittent packet loss, but only on Windows guest OSs. Here are some more details on the problem, the network, and what I've already done.

The problem:

Random intermittent packet loss on Windows guest OSs. Sometimes the connection is stable, sometimes there is up to 4% packet loss for short periods of time. NO problems on the FreeBSD, Oracle Linux 6 or Red Hat 5.6 guests.

The platform:

ESXi 4.1 on a home-built server with an Intel motherboard.

8 GB of RAM

1 Gigabit Intel NIC

Intel E8200 processor

4 high-use guests

  1. pfSense (FreeBSD) 256 MB RAM, 4 GB, 1 CPU, 1 NIC Flexible 3, VMware Tools installed

  2. SBS 2008 4 GB RAM, 100 GB vdisk, 1 CPU, 1 NIC VMXNET 3, VMware Tools full install

  3. Windows Server 2008 R2 (Web Server) 2 GB RAM, 100 GB vdisk, 1 CPU, 1 NIC VMXNET 3, VMware Tools full install

  4. Windows Server 2008 R2 (SQL Host) 2 GB RAM, 100 GB vdisk, 1 CPU, 1 NIC VMXNET 3, VMware Tools full install

2 test guests

  1. Oracle Linux 6 (Wanted to play with the new release) 2 GB RAM, 100 GB, 1 CPU (Usually powered off) 1 NIC Flexible 3, VMware Tools installed

  2. Red Hat 5.6 (My sandbox) 2 GB RAM, 100 GB, 1 CPU (Usually powered off) 1 NIC Flexible 3, VMware Tools installed

The Network:

  ESXi is fed via a Cisco managed switch. VLAN 10 is the management VLAN, 20 is WAN, 30 is Guest, and 40 is DVR. The pfSense firewall controls traffic across all VLANs. All guest OSs are on VLAN 10.

What I've done to try to resolve the issue:

  Hardware Changes:

     1. Replaced the old Cisco Catalyst switch with a newer SG200 switch. (Nice little switch, by the way. I recommend it.)

     2. Replaced ALL patch cables, including the patch from the switch to the ESX box, and all cables to the patch panel that feeds the physical domain-joined machines.

     3. Replaced the physical NIC in the server (3 times, 3 different NICs, all Intel)

     4. Changed from individual HDDs to a RAID 10 array running on an Intel SAS RAID controller (six 450 GB 15k RPM drives)

  Software Changes:

     1. Changed all NICs in all Windows boxes from E1000 to VMXNET 3

     2. Disabled TCP Chimney on all Windows boxes.

     3. Reinstalled ALL Windows systems from fresh instead of being conversions of physical boxes. Reconfigured them all from scratch, reinstalled all software, applied all patches.

At this point I don't know where else to go to try and fix this problem. Any help would be appreciated.

Thanks.


Accepted Solutions
ABDJBR
Enthusiast

Hi,

I have checked your packet capture file. There are some checksum errors, which seem to be caused by task offload being enabled on your Windows box. Try disabling it; this may solve your problem.

9 Replies
ABDJBR
Enthusiast

Hi,

It seems your pfSense firewall is dropping these packets, since it is able to detect the OS fingerprint. Check its rule set for any rules that filter traffic based on the OS.

GeorgeHS
Contributor

I took pfSense completely out of the picture. The issue is still there. The packet loss is occurring between machines on the same LAN.

ABDJBR
Enthusiast

Hi again,

How are you detecting the packet loss? Wireshark? Can you post the packet capture?

GeorgeHS
Contributor

I first detected the problem a while back when trying to save to network shares (the saves wouldn't go through).

Right now I'm just pinging the box and getting a lot of timeouts. I'll throw Wireshark on it in a minute and post the results.

Thanks for helping me :)
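
A long ping run is easier to judge with numbers than by eye. The following is a minimal Python sketch, not something posted in the thread, that reads saved output from the Windows `ping -t` command and reports the overall loss plus the worst loss in any sliding window, since the loss described here comes in short bursts. The function names, the window size and the address 10.0.0.5 are hypothetical.

```python
from collections import deque


def classify(line):
    """Return True for a reply, False for a lost probe, None for other lines."""
    text = line.strip()
    if text.startswith("Reply from") and "TTL=" in text:
        return True
    if text == "Request timed out." or "Destination host unreachable" in text:
        return False
    return None  # banner, statistics, or blank line


def loss_stats(lines, window=100):
    """Return (overall_loss_percent, worst_window_loss_percent)."""
    results = [r for r in map(classify, lines) if r is not None]
    if not results:
        return 0.0, 0.0
    overall = 100.0 * results.count(False) / len(results)
    if len(results) < window:
        # Too few probes to fill one window: fall back to the overall figure.
        return overall, overall
    recent = deque(maxlen=window)  # holds the most recent `window` probes
    worst = 0.0
    for ok in results:
        recent.append(ok)
        if len(recent) == window:
            worst = max(worst, 100.0 * recent.count(False) / window)
    return overall, worst


if __name__ == "__main__":
    # Hypothetical sample: 96 replies followed by a burst of 4 timeouts.
    sample = ["Reply from 10.0.0.5: bytes=32 time<1ms TTL=128"] * 96
    sample += ["Request timed out."] * 4
    print(loss_stats(sample, window=50))
```

The windowed figure matters because a 4% average over a whole day can hide a much higher loss rate during the bad periods.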

GeorgeHS
Contributor

Here is the Wireshark dump.

bilalhashmi
Expert

Do you also see dropped packets in esxtop? They would show in the %DRPTX and %DRPRX fields of the network view.

Blog: www.Cloud-Buddy.com | Follow me @hashmibilal

ABDJBR
Enthusiast

Hi,

I have checked your packet capture file. There are some checksum errors, which seem to be caused by task offload being enabled on your Windows box. Try disabling it; this may solve your problem.
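
For anyone wondering what a "checksum error" in a capture means: IPv4, TCP and UDP use the 16-bit ones' complement Internet checksum (RFC 1071). With checksum offload enabled, the OS hands the packet to the NIC with the checksum field not yet filled in, so a capture taken on the sending machine can show bad checksums. The Python sketch below is only an illustration of the arithmetic, not taken from the capture in this thread. The sample header is a commonly used textbook example, and the zeroed field standing in for an "offloaded" packet is an assumption.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def checksum_ok(header: bytes) -> bool:
    """A header that includes a correct checksum sums to zero."""
    return internet_checksum(header) == 0


# Sample 20-byte IPv4 header; bytes 10-11 hold the checksum (0xb861).
good = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")

# Assumption for illustration: the field is still zero because the NIC
# was expected to fill it in after the capture point.
not_filled_in = good[:10] + b"\x00\x00" + good[12:]

if __name__ == "__main__":
    print(checksum_ok(good), checksum_ok(not_filled_in))
    print(hex(internet_checksum(not_filled_in)))
```

Recomputing over the header with the field zeroed gives the value the NIC would have written, which is why the same packet looks fine when captured on the receiving side.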

GeorgeHS
Contributor

I had only disabled chimney offload using the netsh command.

After your response, I went to the actual NIC in Device Manager and disabled pretty much anything that said "Offload". I let a ping run all day on it, as well as running Wireshark.

Rock solid.

Thank you very much for your help.
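
For reference, here is a small Python sketch of the global netsh settings related to this fix. It is an assumption about how one might script it, not what was done above: the change that worked here was made per adapter in Device Manager, and these netsh commands only cover the global TCP Chimney and task offload switches on Windows Server 2008. The function name is hypothetical, and by default the script only returns the commands without running them. Running them needs an elevated prompt on the Windows guest.

```python
import subprocess

# Global offload switches exposed by netsh on Windows Server 2008.
OFFLOAD_COMMANDS = [
    ["netsh", "int", "tcp", "set", "global", "chimney=disabled"],
    ["netsh", "int", "ip", "set", "global", "taskoffload=disabled"],
]


def disable_offloads(apply=False):
    """Return the planned commands as strings; run them only if apply=True."""
    planned = [" ".join(cmd) for cmd in OFFLOAD_COMMANDS]
    if apply:
        for cmd in OFFLOAD_COMMANDS:
            # check=True raises CalledProcessError if netsh reports a failure.
            subprocess.run(cmd, check=True)
    return planned


if __name__ == "__main__":
    # Dry run: list the commands without changing anything.
    for line in disable_offloads():
        print(line)
```

The per-adapter offload options (checksum offload, large send offload and so on) still have to be checked in the NIC's advanced properties, since their names vary by driver.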

ABDJBR
Enthusiast

You are welcome.

If you need anything with Linux or Cisco, I will be glad to offer some help.
