Howdy all,
Does anyone have any experience troubleshooting poor ESX server performance? We have one host in a cluster of 4 that is under-performing; when we migrate a VM to the under-performing host, file transfer and backup throughput to/from that VM drops by 50% or more.
We've looked at the KB documents, and whilst there are some good general guidelines there, I'm looking for some more specific detail.
So far we've assumed it's network-based, but ifconfig doesn't show any errors on the vmnics, nor does the physical networking, at least on the physical switch.
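One classic cause of a roughly 50% throughput drop that ifconfig won't flag is a speed/duplex mismatch on an uplink. Worth verifying from the service console (`esxcfg-nics` is standard on ESX 3.5):

```shell
# List physical NICs with their negotiated speed and duplex.
# Anything showing Half duplex, or a speed that disagrees with the
# physical switch port config, is a prime suspect.
esxcfg-nics -l
```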
vSwitch config (in esx.conf) looks exactly the same as on the normally performing hosts.
Performance stats in vCenter certainly show some differences in CPU, disk and NIC utilisation, but nothing really exceptional; most of it appears, at least, to be explainable by the differing VM loads on the hosts.
I think we've ruled out disk as an issue, as the hosts are all connected to a fabric SAN, and we can reproduce the slowdown with just a TCP traffic generator (iperf.exe) rather than copying a file etc.
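For what it's worth, the iperf comparison can be made repeatable so exactly the same test runs against a VM on the bad host and on a known-good one (assumes iperf is installed in the test VMs; the flags below are standard iperf options):

```shell
# In a VM on the suspect host, start an iperf server:
iperf -s

# From a VM elsewhere, run the identical test against each target VM in turn
# (30-second run, 4 parallel streams to rule out a single-stream artefact):
iperf -c <server-vm-ip> -t 30 -P 4
```

Repeating the same pair of commands after vMotioning the server VM to a good host gives a clean A/B comparison with everything else held constant.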
Hosts are Dell PowerEdge 2900s, with 2x additional dual-port Broadcom NICs in each host, connected to an EMC CX500 via McData fabric switches. ESX 3.5 has been patched up to March with critical updates.
Any clues would be greatly appreciated,
Justin.
Hi Justin
You seem to have covered most bases trying to resolve this issue... so I guess the next place to go would be the host logs, to see if anything in there identifies the problem.
As the host is otherwise running OK, you might find it easier to generate a full diagnostic log bundle for the host; at least that way you'll have all the logs and info in one location to look at.
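If it helps, on ESX 3.5 the bundle can be generated straight from the service console with the standard `vm-support` script:

```shell
# Generate a full diagnostic bundle (logs, config, system state).
# Run from a directory with some free space; it can take several minutes.
cd /tmp
vm-support
# The script prints the name of the resulting esx-*.tgz archive;
# copy it off the host (e.g. with scp) for analysis.
```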
Hope this helps 😃
Adam
Cheers Adam, will give the diag bundle a look.
Don't suppose you know anything about NIC teaming? We have multiple NICs attached to the main Virtual Machine vSwitch, but two of the four do almost no work at all.
I was trying to confirm whether you need to enable NIC teaming in both the vSwitch entry and the Virtual Machine Network entry?
Hi,
If you team a NIC at the switch level then all port groups will inherit that configuration.
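To double-check the layout, the vSwitches, port groups and attached uplinks can be listed from the service console; `esxcfg-vswitch -l` is standard on ESX 3.5:

```shell
# List all vSwitches with their port groups and uplink (vmnic) assignments,
# so you can confirm all four NICs are actually attached as active uplinks.
esxcfg-vswitch -l
```

The teaming policy itself is set per vSwitch (and can be overridden per port group) in the VI Client under Configuration > Networking > Properties > NIC Teaming.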
I suggest you verify your device IRQs are not shared with a slower device like a USB interface.
use:
cat /proc/vmware/interrupts
Vector PCPU 0 PCPU 1 PCPU 2 PCPU 3
0x21: 0 0 0 0 VMK ACPI Interrupt
0x29: 2 0 0 0 COS irq 1 (ISA edge)
0x31: 1 0 0 0 <COS irq 6 (ISA edge)>
0x39: 0 0 0 0 <COS irq 8 (ISA edge)>
0x41: 14 0 0 0 COS irq 12 (ISA edge)
0x49: 0 0 0 0 <COS irq 13 (ISA edge)>
0x51: 0 0 0 0 <COS irq 14 (ISA edge)>
0x59: 40 0 0 0 COS irq 15 (ISA edge)
0x61: 0 0 0 0 COS irq 16 (PCI level)
0x69: 1447683 5188742 8819947 8681993 <COS irq 18 (PCI level)>, VMK ioc0
0x71: 1531 11222 18710 52277 <COS irq 17 (PCI level)>, VMK qla2300
0x79: 581145 1250520 1671525 1729260 <COS irq 19 (PCI level)>, VMK vmnic0
0x81: 20691576 94268900 153193164 157074040 <COS irq 20 (PCI level)>, VMK vmnic1
When you see more than one device per vector, there can be a sharing problem if the shared devices differ greatly in speed, e.g. UART + HBA = slow HBA.
Also look for IRQs that are only processed by one CPU.
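For anyone who wants to eyeball a bigger interrupts table quickly, both conditions can be flagged with a small awk pass over the output. This is just a convenience sketch of my own (not a VMware tool), demonstrated against sample lines from the table above:

```shell
# Toy helper: scan /proc/vmware/interrupts-style output and flag
# (a) vectors shared by more than one device and
# (b) vectors whose interrupts are serviced by a single PCPU.
parse_interrupts() {
    awk -F: '/^0x/ {
        n = split($2, f, ",")          # devices after the vector are comma-separated
        split($0, w, /[ \t]+/)         # w[2]..w[5] are the per-PCPU counts
        active = 0
        for (i = 2; i <= 5; i++) if (w[i] + 0 > 0) active++
        if (n > 1)       printf "%s shared by %d devices\n", $1, n
        if (active == 1) printf "%s serviced by a single PCPU\n", $1
    }'
}

# Demo against sample lines from the table above:
parse_interrupts <<'EOF'
0x69: 1447683 5188742 8819947 8681993 <COS irq 18 (PCI level)>, VMK ioc0
0x71: 1531 11222 18710 52277 <COS irq 17 (PCI level)>, VMK qla2300
0x29: 2 0 0 0 COS irq 1 (ISA edge)
EOF
# Prints:
#   0x69 shared by 2 devices
#   0x71 shared by 2 devices
#   0x29 serviced by a single PCPU
```

On a real host you'd feed it the live file: `parse_interrupts < /proc/vmware/interrupts`.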
vExpert 2009
Hi Mike,
Yeah, I was just looking at that article by Tom Sightler.
IRQs look OK - yes, one is being shared by one of the NICs and the FC HBAs, but it's spread across all the processors, right? See attached.
Hi Justin
Hope you manage to find something in the logs to sort out your initial problem...
Re the NIC teaming, what teaming policy are you using and how many VMs are we talking about?
Cheers
Adam
Hi Adam,
We're just using "route based on originating virtual port ID" on the vSwitch. No trunking on the physical switches etc.
Approx 10 VMs in the Virtual Machine Network, with 4 physical adapters in the vSwitch, which includes the service console.
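On the idle NICs: with the port-ID policy, each virtual port is pinned to exactly one uplink, so traffic balances across NICs only at the granularity of whole VMs, never per packet. A toy illustration of why two of four uplinks can sit idle (this is NOT VMware's actual mapping, just a round-robin sketch):

```shell
# Toy illustration: assign 10 virtual ports across 4 uplinks round-robin
# and show the resulting pinning. Each port (i.e. each VM vNIC) stays on
# its one uplink until a failover or reconnection event.
uplinks=4
for port in $(seq 0 9); do
    echo "port $port -> vmnic$(( port % uplinks ))"
done
# Each uplink carries 2-3 whole VMs; if the chatty VMs happen to land on
# the same uplinks, the remaining NICs do almost no work.
```

So with only ~10 mostly quiet VMs, two near-idle NICs isn't necessarily a misconfiguration.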