gwomick
Contributor

ESXi, 4.1.0 260247


After upgrading our hosts to ESXi 4.1.0, I have noticed a severe hit to overall network performance during a vMotion of a virtual machine, and for several minutes afterward.

I observe the overall bandwidth on our network switch rising from 100,000 kbit/s to almost 2,000,000 kbit/s for a 3-5 minute period, then slowly dropping back to the 100,000 level. The drop takes 5-10 minutes, even when the vMotion itself finishes in 3.
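To put those counter readings in perspective, here is some rough arithmetic (a sketch only; the kbit/s figures are from the switch counters above, and treating them as aggregate traffic across 1 Gb ports is an assumption):

```python
# Back-of-the-envelope conversion of the switch counters quoted above.
# Assumes the counters are aggregate kbit/s and the ports are 1 Gb each.

def kbits_to_gbits(kbits_per_sec: float) -> float:
    """Convert kbit/s to Gbit/s (1 Gbit/s = 1,000,000 kbit/s)."""
    return kbits_per_sec / 1_000_000

baseline = kbits_to_gbits(100_000)   # normal aggregate load -> 0.1 Gbit/s
spike = kbits_to_gbits(2_000_000)    # load during/after vMotion -> 2.0 Gbit/s

# A 2 Gbit/s aggregate is twice what a single 1 Gb vMotion uplink can carry,
# which hints the extra traffic is being replicated across other ports
# (e.g. flooding) rather than riding the vMotion link alone.
link_equivalents = spike / 1.0       # number of fully saturated 1 Gb links
print(baseline, spike, link_equivalents)
```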

Any ideas?

1 Solution

Accepted Solutions
AntonVZhbankov
Immortal

That's a very strange situation. It looks like something is wrong in your network configuration: a lot of people are using ESXi 4.1, and you're the first to report a problem like this.

Have you contacted VMware support about this issue?


---

MCSA, MCTS Hyper-V, VCP 3/4, VMware vExpert

http://blog.vadmin.ru

9 Replies
gwomick
Contributor

I've added an additional NIC to each of the servers, so we now have two NICs dedicated to vMotion and three for production traffic. We are not seeing any improvement.

We will be restarting our Cisco 4507R-E switch late this evening to see whether the problem lies with that device. If not, it appears to be an issue with the 4.1 upgrade.

gwomick
Contributor

The switch has been restarted and its IOS upgraded. We are still seeing a heavy traffic storm on the switch during a vMotion: the switch reaches 2,000,000 kbit/s and stops passing traffic.

dilpreet
VMware Employee

Hi,

Is this a 10 Gb link? The drop-off seems strange if the vMotion task finishes in 3-5 minutes but the traffic lingers. Is the vCenter progress bar showing the migration as complete, or are you determining completion some other way?

regards,

-dilpreet

gwomick
Contributor

I'm inclined to agree it is most likely a network-related issue. However, this issue did not exist prior to the upgrade to 4.1. My only thought is that the new vMotion settings actually push more data through at a faster pace, which is overloading our switch. We are running 5 hosts and approximately 100 VMs. Each host has 3 vSwitches:

- vSwitch0: production network traffic (four 1 Gb NICs)
- vSwitch1: vMotion (one 1 Gb NIC)
- vSwitch2: iSCSI traffic (two 1 Gb NICs)

None of these are distributed switches. We currently have open tickets with Cisco and VMware.
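For anyone tallying switch port counts, the layout described above works out like this (a quick sketch; the NIC counts are from this post, the dict layout is just illustrative):

```python
# Port budget for the cluster described above (5 hosts, 3 vSwitches each).
hosts = 5
uplinks_per_vswitch = {
    "vSwitch0 (production)": 4,  # four 1 Gb NICs
    "vSwitch1 (vMotion)":    1,  # one 1 Gb NIC
    "vSwitch2 (iSCSI)":      2,  # two 1 Gb NICs
}

nics_per_host = sum(uplinks_per_vswitch.values())   # 7 uplinks per host
total_switch_ports = hosts * nics_per_host          # 35 one-gig switch ports
print(nics_per_host, total_switch_ports)
```

That is 35 gigabit ports' worth of potential demand landing on a single 4507R-E, which matters for the oversubscription point raised later in the thread.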

I posted here because there were no other reports of a similar problem, so this is both a request for feedback from others such as yourself and documentation for anyone else going forward.

gwomick
Contributor

1 Gb links. The vMotion is finishing in the expected 3-5 minutes, and I monitor the progress through vCenter.

dilpreet
VMware Employee

From your description, the traffic doesn't seem to be stemming from the vMotion itself. I'm inclined to agree with the earlier reply: it sounds like you have some networking issues. There may be congestion that causes other traffic to behave differently, but 4.0 would also fill up a 1 Gb network, so the difference in what you see on the switch should be negligible. (vMotion's efficiency was improved in 4.1, so more of the traffic is data rather than headers, along with various other improvements, but that shouldn't affect the network much.)

As part of this upgrade did you change something in your network as well?

gwomick
Contributor

It ended up being a failing switch. We replaced the Cisco 4507R-E with a 4948E and have seen no further issues. It was just bad timing that it coincided with the VMware upgrade.

wsniegowski
Contributor

I see the issue was resolved, but I thought I'd add some numbers. Most of the 4500-series Cisco gigabit modules suffer from ASIC oversubscription, where eight gigabit ports all share a single 1 Gb ASIC. So for true gigabit performance you can only use six ports on the module (1, 8, 16, 24, 32, 40). The backplane on the 4500 is also limited to 6 Gbps on most Sups.
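The oversubscription math above can be sketched as follows (the 8-ports-per-ASIC figure is from this post; the 48-port card size is an assumption for illustration):

```python
# ASIC oversubscription on a 4500-series gigabit line card, per the
# numbers above: eight 1 Gb ports share a single 1 Gb ASIC.
ports_per_asic = 8
port_speed_gbps = 1.0
asic_capacity_gbps = 1.0

oversub_ratio = (ports_per_asic * port_speed_gbps) / asic_capacity_gbps
print(f"oversubscription: {oversub_ratio:.0f}:1")    # 8:1

# Assuming a 48-port card, one non-blocking port per 8-port ASIC group
# leaves only 6 ports that can run at full line rate simultaneously.
line_rate_ports = 48 // ports_per_asic
print(f"full-rate ports per card: {line_rate_ports}")
```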
