VMware Cloud Community
clubbing80s
Contributor

iSCSI high latency over direct cable connections?

I'm seeing higher latencies than I would have expected on the access/write times between the ESXi 4.1 U1 server and the SAN (Infortrend DS S16E-G1140-4). The value consistently floats around 15 ms for the LUN that is in active use. There are 2 virtual machines currently on the ESXi host: one is idle, the other is a monitoring server. The machines are on separate LUNs. When the backup runs it clones the machines to another LUN, and the latencies jump up further during this.

I find the latencies high given that the SAN is directly attached to the ESXi server via 2 cables, plugged from the interfaces on the storage into the interfaces on the ESXi server. Jumbo frames have been enabled, and the ESXi server itself is idle.

Before I put more guest machines onto this environment I would like to be sure that there are no issues with the storage configuration.

I have attached the exported san configuration.

The SAN is configured as follows:

14 disks in RAID 60 giving 9 TB usable space. This is presented as 4 x 2 TB LUNs and 1 x 1 TB LUN.

Drives are Hitachi HUA72201.

The ESXi server configuration:

ESXi 4.1 update 1

Manufacturer: Supermicro

Model: X8DT3

Processors: 12 CPU x 2.666 GHz

Processor Type: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz

Hyperthreading: Active

Total Memory: 23.99 GB

Number of NICs: 4

State: Connected

Network:

Name    PCI           Driver      Link Speed     Duplex MAC Address       MTU    Description

vmnic0  0000:02:00.00 igb         Up   1000Mbps  Full   00:1b:21:8c:72:60 9000   Intel Corporation 82576 Gigabit Network Connection

vmnic1  0000:02:00.01 igb         Up   1000Mbps  Full   00:1b:21:8c:72:61 9000   Intel Corporation 82576 Gigabit Network Connection

vmnic2  0000:08:00.00 igb         Up   1000Mbps  Full   00:25:90:29:cc:5e 1500   Intel Corporation 82576 Gigabit Network Connection

vmnic3  0000:08:00.01 igb         Up   1000Mbps  Full   00:25:90:29:cc:5f 1500   Intel Corporation 82576 Gigabit Network Connection

/vmfs/volumes/4e4e6cb7-3b8e90fc-a964-001b218c7261/scripts/ghettoVCB # esxcfg-vswitch  -l

Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks

vSwitch0         128         3           128               1500    vmnic2

  PortGroup Name        VLAN ID  Used Ports  Uplinks

  VM Network            0        0           vmnic2

  Management Network    0        1           vmnic2

Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks

vSwitch1         128         4           128               1500    vmnic3

  PortGroup Name        VLAN ID  Used Ports  Uplinks

  Guest Network         0        2           vmnic3

Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks

vSwitch2         128         3           128               9000    vmnic0

  PortGroup Name        VLAN ID  Used Ports  Uplinks

  iSCSI_0               0        1           vmnic0

Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks

vSwitch3         128         3           128               9000    vmnic1

  PortGroup Name        VLAN ID  Used Ports  Uplinks

  iSCSI_1               0        1           vmnic1

Interface  Port Group/DVPort   IP Family IP Address                              Netmask         Broadcast       MAC Address       MTU     TSO MSS   Enabled Type

vmk0       Management Network  IPv4      192.168.XX.XX                           255.255.255.0   192.168.XX.XX  00:25:90:29:cc:5e 1500    65535     true    STATIC

vmk1       iSCSI_0             IPv4      10.10.10.21                             255.255.255.0   10.10.10.255    00:50:56:72:13:82 9000    65535     true    STATIC

vmk2       iSCSI_1             IPv4      10.10.11.21                             255.255.255.0   10.10.11.255    00:50:56:7c:8b:76 9000    65535     true    STATIC

Could you please advise what we can tweak to get better performance?

Thank you.

8 Replies
VeyronMick
Enthusiast

Are the system logs on the ESX host showing any kind of storage errors or warnings?

Your iSCSI configuration seems to be in line with best practice.
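It may also be worth confirming that jumbo frames actually pass end to end and that both vmkernel ports are bound to the software iSCSI adapter. On 4.1 something along these lines should do it (the vmhba number is only a placeholder for your software iSCSI adapter, and the target would be one of the array's iSCSI port IPs):

vmkping -d -s 8972 <array iSCSI IP>    (8972 bytes = 9000 MTU minus IP/ICMP headers; -d sets don't-fragment)

esxcli swiscsi nic list -d vmhba33

If the 8972-byte vmkping fails while a plain vmkping works, jumbo frames are not clean end to end.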

When you check esxtop and break the latency down by LUN, are you seeing it on all LUNs or just some specific ones?

(esxtop -> d -> u -> L 40)

Sometimes a specific LUN or management LUN may be reporting the high latency.

In that case you need to check which VMs are running on that LUN, as they may be contributing to it.

You also need to see what tier (RAID level) the LUN is running on, in case it is contributing to the issue.

Also, is the latency DAVG or KAVG?

clubbing80s
Contributor

Hi.

Thanks for taking time to advise.

I have attached a file with some captures of the esxtop output as specified. Is there a way to capture these results in this format at regular intervals?

The KAVG is very low, mostly 0.00, whereas the DAVG moves from 17 to as high as 55 for the one LUN, and for the other from as low as 0.28 up to 623.65.

The storage is 1 x RAID 60 array that I have divided up into 5 LUNs (4 x 2 TB, 1 x 1 TB).

I have tried the following as well:

I have removed the jumbo frames and tried each of the possible path configurations; the results were all the same.
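In case it helps, the per-LUN path state and path selection policy can also be listed and changed from the CLI with something along these lines (the naa ID is just a placeholder for one of the LUNs, and the syntax is from memory so it may need adjusting):

esxcli nmp device list

esxcli nmp device setpolicy --device naa.xxxxxxxxxxxxxxxx --psp VMW_PSP_RR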

Given that the DAVG is so high, would this indicate that the issue is most likely with the storage?

Thanks

f10
Expert

Hi,

Yes, DAVG relates to latency from the storage array. (If you experience high latency times, investigate current performance metrics and the running configuration for the switches and the SAN targets.)

Refer to http://kb.vmware.com/kb/1008205 for more info
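On your question about capturing the esxtop output at regular intervals: esxtop batch mode can write the counters to a CSV at a fixed sample rate, for example (10-second samples for an hour; the output path is just an example):

esxtop -b -d 10 -n 360 > /tmp/esxtop-capture.csv

The resulting CSV can be opened in Windows perfmon or parsed for the per-device DAVG/KAVG columns.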

Regards, Arun Pandey VCP 3,4,5 | VCAP-DCA | NCDA | HPUX-CSA | http://highoncloud.blogspot.in/ If you found this or other information useful, please consider awarding points for "Correct" or "Helpful".
VeyronMick
Enthusiast

It certainly looks like the storage array is the source of the high latency.

As a rule of thumb, an average DAVG below 10 ms indicates a well performing SAN.

10-30 ms means serious performance issues (you will see slowdowns in VMs).

At 30+ ms you are in trouble.

Occasional high peaks that disappear after a few seconds are nothing to worry about.

Your lowest latency on busy LUNs is still 10+ ms, which is too high.

I would recommend focusing your attention on the array as the source of the issue.

clubbing80s
Contributor

I have trashed the configuration that was on the SAN when I got it.

The original configuration was RAID 60: 2 RAID 6 arrays of 7 disks + 1 hot spare.

As a test I set up 1 x RAID 1 array of 4 disks + 1 hot spare. The result of this was write latencies generally not going past 14 ms, with an average of 7 ms, and this was while a rebuild of the new array was running in the background.

I would have expected the performance of the RAID 60 array to be better than that of the smaller RAID 1 array, given the vast difference in the number of spindles.

What would cause this?

What tool can I use under Linux to benchmark the storage so that I have better comparable statistics?
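I was thinking of running something like fio from a Linux guest, along these lines, unless there is a better option (the filename, size, and runtime are just examples):

fio --name=randwrite-test --filename=/tmp/fio.test --rw=randwrite --bs=4k --size=1g --direct=1 --ioengine=libaio --iodepth=32 --runtime=60 --time_based --group_reporting

fio reports average and percentile latencies per run, which should make the RAID 60 vs RAID 1 comparison easier to quantify.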

Thanks

JimPeluso
Enthusiast

I thought RAID 60 requires 8 disks to work correctly: 4 in one RAID 6 array and another 4 in a second RAID 6 array, with the two striped together. Have you tried configuring the RAID 60 with 8 disks?

Good Luck

"The only thing that interferes with my learning is my education." If you found this information useful, please consider awarding points for "Correct" or "Helpful"
clubbing80s
Contributor

True, 8 disks is the minimum requirement for RAID 60 (http://en.wikipedia.org/wiki/Nested_RAID_levels and the same here http://www.thinkmate.com/Storage/What_is_Raid).

The array was configured as 2 x RAID 6 with 2 hot spares, so the device was misconfigured.

It is a pity that the interface will allow one to misconfigure it.

Once I have tested the RAID 5 and RAID 1 configurations I will test RAID 6 with 8 disks and see if it works.

Thanks

G

clubbing80s
Contributor

Hi.

As a last resort I downgraded to ESXi 4.0 U1, and the performance doubled.

I originally started out with ESXi 4.1 U1, then upgraded to ESXi 5.0 when I thought the problem was the SAN and I needed support for LUNs larger than 2 TB.

I validated this by keeping the SAN configuration and reinstalling ESXi 5.0 again, and the performance reverted to a degraded state. I kept the same configuration on the SAN for both the ESXi 4.0 U1 and the new ESXi 5.0 installs.

What would cause ESXi 4.0 U1 to work correctly but ESXi 4.1 U1 and ESXi 5.0 to experience degraded performance?

Thanks

G
