NickHorton
Contributor

Puzzling svmotion latency issue

We're having a puzzling issue with the iSCSI LUNs on our ESX 4.0u1 hosts. If I Storage vMotion a VM between iSCSI LUNs, the DAVG/cmd value in esxtop can hit as high as 3000. If I svmotion a VM from local storage to any of the LUNs, or from the LUNs to local storage, I don't have the same issue, and throughput is wonderful, around 800 Mb/sec. When I check esxtop statistics on the other hosts in the cluster, I am not seeing the same DAVG/cmd latency. I suppose that's to be expected somewhat, since the other hosts are not orchestrating the vMotion. However, I wonder if the issue is storage related or has something to do with the iSCSI connection.
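
For context, the latency numbers above come straight from the standard esxtop disk views, roughly like this (the batch-mode output path is just an example):

    # From the ESX service console while the svmotion is running:
    esxtop
    #   press 'd' for the disk adapter view (per-vmhba DAVG/cmd)
    #   press 'u' for the disk device view (per-LUN DAVG/cmd)
    #   press 'v' for the disk VM view
    # Capture a batch-mode sample for later review (5s interval, 60 samples):
    esxtop -b -d 5 -n 60 > /tmp/esxtop-svmotion.csv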

The storage array is an EMC Celerra/Clariion unified solution, an NS-120FC/CX4-120 running the most recent DART/FLARE 29. I have two iSCSI vmkernel ports on each host, but the paths are set to Most Recently Used. Both ports are on the same VLAN, and the multipathing bug that affects iSCSI initiators on the same VLAN connecting to an EMC array won't be fixed until FLARE 30.
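
In case it helps, this is the sort of thing I can pull from the service console to double-check the SATP and path selection policy per LUN (the device ID below is just a placeholder, and switching to Round Robin isn't an option for me until FLARE 30 because of the bug above):

    # List claimed devices with their SATP, current PSP, and working paths:
    esxcli nmp device list
    # Example of switching a single LUN to Round Robin, where the array supports it:
    esxcli nmp device setpolicy --device naa.6006016xxxxxxxxxxxxxxxxxxxxxxxxx --psp VMW_PSP_RR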

I'm sure I left a lot out but I'm a bit fuzzy on what to check at this point. Ideas?

Thanks,

Nick

LairdBedore
Contributor

You're not alone on this one.  Although I've found almost no mention of these problems from anyone else, they definitely exist.  There appear to be either bugs or undocumented restrictions on how multipathing can be configured.  Below are examples from a few sites: two that experienced these problems (and how I fixed them), and three identical sites that never had any trouble.

I ran into this issue because template deployments were going slowly, but Storage VMotion uses the same mechanism and behaves in exactly the same way.

Symptoms:

  • If deploying the template from iSCSI to a non-iSCSI LUN such as the local block storage, it deploys normally and very quickly (>1 Gb/s)
  • If deploying the template from a non-iSCSI LUN such as the local block storage to iSCSI, it deploys normally and very quickly (>1 Gb/s)
  • If deploying the template from one iSCSI LUN to another iSCSI LUN, disk latency goes through the roof (>500 ms) and throughput is absurdly slow.  However, there is near zero I/O or CPU usage on the host or the storage array, and no errors on either one.  It will take 2-3 hours to deploy a 30 GB template.

Site #1:

  • Three HP DL380 G6
  • ESX 4.0
  • Two gigabit uplinks (active/active) to vSwitch0, one service console and one vmk iSCSI port on the vSwitch
  • Software iSCSI adapter
  • HP MSA 2312, 2 gigabit iSCSI ports
  • VMW_SATP_ALUA, Most Recently Used
  • 2 paths per LUN, both active, one I/O

Site #1, FIXED:

  • Three HP DL380 G6
  • ESX 4.0
  • Two gigabit uplinks (active/active) to vSwitch0, one service console and one vmk iSCSI port on the vSwitch
  • On the service console port, change NIC teaming to active/standby
  • On the vmk iSCSI port, change NIC teaming to active/standby, using the opposite NIC as active from the one the service console uses (layout check sketched below)
  • Software iSCSI adapter
  • HP MSA 2312, 2 gigabit iSCSI ports
  • VMW_SATP_ALUA, Most Recently Used
  • 2 paths per LUN, both active, one I/O
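
A quick way to sanity-check the resulting layout from the service console (the teaming order itself still gets set per port group in the vSphere Client, under the port group's NIC Teaming tab; vmnic names will vary per host):

    # Show every vSwitch, its port groups, and its uplinks:
    esxcfg-vswitch -l
    # Show the vmkernel ports (vmk number, port group, IP):
    esxcfg-vmknic -l
    # Show which physical NICs are up and at what speed (e.g. vmnic0/vmnic1):
    esxcfg-nics -l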

Sites #2, 3, 4 (do NOT experience this problem):

  • Three HP DL380 G5
  • ESX 4.1
  • Two gigabit uplinks (active/active) to vSwitch1, one vmk iSCSI port on the vSwitch, nothing else!
  • Software iSCSI adapter
  • HP MSA 2312, 2 gigabit iSCSI ports
  • VMW_SATP_ALUA, Most Recently Used
  • 2 paths per LUN, both active, one I/O

Site #5:

  • Three HP DL360 G7
  • ESX 4.1
  • Two GigE uplinks, each to its own vSwitch and vmk port
  • Hardware iSCSI offload using Broadcom onboard NICs
  • EMC CX4-240, 4 gigabit iSCSI ports
  • iSCSI Delayed ACK disabled
  • VMW_SATP_ALUA_CX, Round-Robin
  • 4 paths per LUN, all active, two I/O (one per SP & vmhba)

Site #5, FIXED:

  • Three HP DL360 G7
  • ESX 4.1
  • Two GigE uplinks, each to its own vSwitch and vmk port
  • Both vmk ports bound to a single software iSCSI adapter (command sketch after this list)
  • EMC CX4-240, 4 gigabit iSCSI ports
  • iSCSI Delayed ACK disabled
  • VMW_SATP_ALUA_CX, Round-Robin
  • 4 paths per LUN, all active, two I/O (one per SP & vmhba)
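
For reference, the binding step on ESX 4.1 is done from the service console roughly like this; vmhba33, vmk1 and vmk2 are placeholders for whatever your software iSCSI adapter and vmkernel ports are actually named:

    # Bind both iSCSI vmkernel ports to the software iSCSI adapter:
    esxcli swiscsi nic add -n vmk1 -d vmhba33
    esxcli swiscsi nic add -n vmk2 -d vmhba33
    # Confirm both NICs are bound to the adapter:
    esxcli swiscsi nic list -d vmhba33
    # Rescan the software iSCSI adapter to pick up the extra paths:
    esxcfg-rescan vmhba33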

Notes:

  • This problem would NOT occur when the template and the destination VM were on the same iSCSI LUN.
  • This problem would NOT occur when the only active paths were across the same vmhba, only when the active paths were split among two (or more?) vmhbas (see the path check after these notes).
  • If you're teaming multiple NICs for one vSwitch and vmk iSCSI port, you're not going to get aggregated throughput, but you can get resiliency.  This is due to how VMware does MPIO.  If you want aggregated throughput, assign each NIC to its own vSwitch and vmk iSCSI port, then bind them both to the SW iSCSI adapter.
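
If you want to see how the active/I/O paths are spread across your vmhbas, something like this shows it from the service console (the device ID is a placeholder):

    # Brief path map: each device with its paths; the runtime name
    # (vmhbaX:C0:Tn:Ln) shows which vmhba carries each path:
    esxcfg-mpath -b
    # Detailed view for a single LUN, including path state:
    esxcfg-mpath -l -d naa.6006016xxxxxxxxxxxxxxxxxxxxxxxxx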

Summary:

  • The vmk iSCSI port does NOT like to share a vSwitch with anything else.
  • I don't think the server model makes any difference, but SW vs. HW iSCSI does matter.
  • Having multiple physical adapters map back to a single SW iSCSI adapter seems to work better than mapping them separately (via HW iSCSI).

I hope this helps.

Laird Bedore
