    Network performance issues with ESX 3.5 on Sun Fire X4200

    apapadop


      Dear all,


      We recently installed ESX 3.5 on a Sun Fire X4200 server and set up a couple of VMs (RHEL5 installed from scratch and an Alfresco appliance). It soon became apparent that something was wrong with the network. Large transfers in particular (several GB) would start at a decent rate (about 9 MB/s on a 100 Mbit network) but then slow down dramatically, sometimes stalling, sometimes dropping the connection, and sometimes picking up speed again.






      To rule out the hardware involved (switches, cables, NICs, etc.) we booted the server with SystemRescueCD, a simple bootable distribution. We mounted the local filesystem, started the SSH server and then initiated transfers to the server, using the exact same files from the exact same sender over the same network equipment. The result was a steady flow of data at 10 MB/s. We repeated this test dozens of times to make sure.



      Then we rebooted into ESX 3.5 and, sure enough, the exact same transfer over the same protocol (scp), even when writing directly to the ESX server's datastore rather than to a hosted VM, was slower from the start (about 9 MB/s) and then began slowing down and stalling as described above.
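
      To make the slowdown easier to document, here is a minimal sketch of how the transfer rate could be logged over time by polling the size of the destination file while scp writes to it. This is an illustration only, not something we ran for the tests above: the function names, the one-second sampling interval and the 0.5 MB/s stall threshold are all assumptions, and it would need to run wherever a Python 3 interpreter can see the destination file.

```python
import os
import time

MB = 1_000_000  # decimal megabytes, to match the MB/s figures quoted above


def sample_sizes(path, interval_s=1.0, count=10):
    """Poll the size of a growing file `count` times, `interval_s` seconds apart."""
    sizes = []
    for _ in range(count):
        # A file that does not exist yet is treated as having size 0.
        sizes.append(os.path.getsize(path) if os.path.exists(path) else 0)
        time.sleep(interval_s)
    return sizes


def rates_mb_per_s(sizes, interval_s):
    """Turn consecutive size samples into a transfer rate (MB/s) per interval."""
    return [(b - a) / MB / interval_s for a, b in zip(sizes, sizes[1:])]


def find_stalls(rates, threshold=0.5):
    """Return the indices of intervals whose rate fell below `threshold` MB/s."""
    return [i for i, rate in enumerate(rates) if rate < threshold]
```

      For example, passing the output of sample_sizes() for the file being copied to rates_mb_per_s() and then to find_stalls() would show at which points in the transfer the rate collapses, which may help when comparing the SystemRescueCD and ESX runs.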



      At this point we're fairly confident that it's a driver issue (the e1000 driver is in use, and we've set it to 100 Mbit full duplex according to ESX's tuning guidelines), but we don't know what to do next. The hardware is on the official compatibility list, so I would expect it to work flawlessly under ESX 3.5.



      Any ideas as to how we can further troubleshoot this are very welcome!