VMware Cloud Community
zlajoie
Contributor

AX4-5i and high disk latency > 20ms constantly

I am repurposing an AX4-5i for a small VDI implementation, but the disk latency stays around 20ms and spikes to 40ms without any VMs running on it. Does anybody have ideas to help me troubleshoot this problem?

I currently have 2 disk pools on this SAN: the first is (4) 750GB SATA disks in RAID5 that the SAN OS runs on, and the second is (8) 300GB 15K disks in RAID10 that I want to run the VDI project on. I chose RAID10 because I needed performance over capacity. If I should have chosen a different RAID type, I am open to suggestions.
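As a rough way to compare the two layouts described above, here is a minimal sketch of the usual capacity and write-penalty arithmetic. The per-disk IOPS figure and the 50% write mix are assumptions for illustration, not measured AX4-5i numbers, and the model ignores the array's cache entirely.

```python
def usable_capacity_gb(disks, size_gb, raid):
    """Usable capacity before formatting overhead."""
    if raid == "raid10":
        return disks // 2 * size_gb   # half the disks hold mirror copies
    if raid == "raid5":
        return (disks - 1) * size_gb  # one disk's worth of parity
    raise ValueError(f"unsupported RAID level: {raid}")

# Back-end I/Os generated by one small random host write (textbook values).
WRITE_PENALTY = {"raid10": 2, "raid5": 4}

def host_iops(disks, iops_per_disk, raid, write_fraction):
    """Host-visible IOPS for a given read/write mix, ignoring cache."""
    raw = disks * iops_per_disk
    penalty = WRITE_PENALTY[raid]
    return raw / ((1 - write_fraction) + write_fraction * penalty)

ASSUMED_15K_IOPS = 180   # assumption: rule-of-thumb figure for one 15K disk
ASSUMED_WRITE_MIX = 0.5  # assumption: half of the host I/O is writes

print(usable_capacity_gb(8, 300, "raid10"))  # 1200
print(usable_capacity_gb(4, 750, "raid5"))   # 2250
print(round(host_iops(8, ASSUMED_15K_IOPS, "raid10", ASSUMED_WRITE_MIX)))  # 960
print(round(host_iops(8, ASSUMED_15K_IOPS, "raid5", ASSUMED_WRITE_MIX)))   # 576
```

Under these assumptions, the same eight disks deliver noticeably more host IOPS in RAID10 than in RAID5 on a write-heavy mix, which is consistent with choosing RAID10 for performance over capacity.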

The SAN has two storage processors, and each SP has two 1Gb network ports. Each SP is on a different subnet (SP A: 172.16.252.0/24 and SP B: 172.16.253.0/24), and each is connected to two separate Dell 6224 switches with two separate VLANs. See attached network diagram.

The hosts are (2) Dell R515 servers with dual 2.6GHz AMD 4180 processors and 32GB of RAM. Both servers have (2) Intel dual-port 1Gb NICs, and I am running one port of each card to each Dell switch.

I have also tried eliminating the switch and connecting the servers directly to the SAN, but the disk latency stays around 20ms.  I ran the iometer tests from http://vmktree.org/iometer/ and I have attached the results.  I really don't know if they are good or bad since I haven't seen any other benchmarks for the AX4-5i, but I wanted to work out the high latency issue before I moved forward with the virtual desktop project.

I have jumbo frames enabled on the switches and the VM hosts. I've also tried Round Robin and MRU for the paths, but neither changes the performance. If I need to start over and blow everything away, I can do that too.
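Jumbo frames only help if every hop agrees on the MTU, and a common check is a don't-fragment ping with the largest payload that still fits in one frame. The sketch below works out that payload size. It assumes IPv4 without header options, and the 9000-byte MTU is an assumption, since the post does not state the configured value.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header

def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    overhead = IPV4_HEADER_BYTES + ICMP_HEADER_BYTES
    if mtu <= overhead:
        raise ValueError("MTU too small to carry an ICMP echo")
    return mtu - overhead

ASSUMED_JUMBO_MTU = 9000  # assumption: typical jumbo-frame setting

for mtu in (1500, ASSUMED_JUMBO_MTU):
    print(mtu, max_ping_payload(mtu))
# 1500 1472
# 9000 8972
```

On an ESX/ESXi host the resulting size would typically be passed to something like `vmkping -d -s 8972 <SP iSCSI port IP>`. If that ping fails while a 1472-byte one succeeds, some device in the path is probably not passing jumbo frames.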

I have an EqualLogic PS4000, and the disk latency on that SAN stays around 3 to 5ms with about 40 VMs on it. I don't know what I did differently on that one to keep the disk latency low.

Thanks for any help that you can offer.

5 Replies
idle-jam
Immortal

Can you check that the cache and the controllers are all working fine?

AndreTheGiant
Immortal

Disk latency is too high. Is it the same on the two different datastores (RAID5 and RAID10)?

Have you tried enabling jumbo frames?

Are the switches isolated and configured with flow control?

Andre

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
zlajoie
Contributor

Thanks for your replies. I only have the RAID10 available to the VMs, so I'm not monitoring anything from the RAID5 that the SAN OS is running on. Also, I do have jumbo frames enabled, and the switches are physically separated from the data network. I'll have to check the flow control settings on the switches.

Everything on the SAN appears to be operating normally.  Here is the diagnostic report:

Collecting Storage Processor agent information...
.
SPA agent information  -  IP: 10.100.0.82

Agent Rev:           6.23.8 (1.2)
Name:                K10
Desc:               
Node:                A-APM00082201608
Physical Node:       K10
Signature:           2179101
Peer Signature:      2179148
Revision:            2.23.50.5.710
SCSI Id:             N/A
Model:               AX4-5i
Model Type:          Rackmount
Prom Rev:            5.40.00
SP Memory:           1023
Serial No:           APM00082201608
SP Identifier:       A
Cabinet:             DPE4AX


SPB agent information  -  IP: 10.100.0.83

Agent Rev:           6.23.8 (1.2)
Name:                K10
Desc:               
Node:                B-APM00082201608
Physical Node:       K10
Signature:           2179148
Peer Signature:      2179101
Revision:            2.23.50.5.710
SCSI Id:             N/A
Model:               AX4-5i
Model Type:          Rackmount
Prom Rev:            5.40.00
SP Memory:           1023
Serial No:           APM00082201608
SP Identifier:       B
Cabinet:             DPE4AX


.
.

WARNING: One or more LUNs are bound on APM00082201608 storage system!
.
*******************************************************************************
******** Verify that the Read and Write did enable ****************************
*******************************************************************************

Read and write cache Status during the test

SP Read Cache State                 Enabled
SP Write Cache State                Enabled
Cache Page size:                    16
Write Cache Mirrored:               YES
Low Watermark:                      30
High Watermark:                     50
SPA Cache pages:                    3341
SPB Cache pages:                    18930
Unassigned Cache Pages:             0
Read Hit Ratio:                     N/A
Write Hit Ratio:                    N/A
Prct Dirty Cache Pages =            0
Prct Cache Pages Owned =            0
SPA Read Cache State                Enabled
SPB Read Cache State                Enabled
SPA Write Cache State               Enabled
SPB Write Cache State               Enabled
System Buffer (spA):                592 MB
System Buffer (spB):                592 MB
SPS Test Day:                       Sunday
SPS Test Time:                      01:00
SPA Physical Memory Size =          1023
SPB Physical Memory Size =          1023
Physical memory size of Front-End = Switch not supported
Physical memory size of Back-End =  Switch not supported
SPA Free Memory Size =              0
SPB Free Memory Size =              0
Free Memory Size of Front-End =     Switch not supported
Free Memory Size of Back-End =      Switch not supported
SPA Read Cache Size =               75
SPB Read Cache Size =               75
SPA Write Cache Size =              356
SPB Write Cache Size =              356
SPA Optimized Raid 3 Memory Size =  0
SPB Optimized Raid 3 Memory Size =  0

Parsing SP logs for specific errors     SPA     SPB

Drive is going bad (803)               : 0    0
Uncorrectable sector (953)             : 0    0
Uncorrectable sector (840)             : 0    0
Uncorrectable sector (956)             : 0    0
Uncorrectable sector (957)             : 0    0
Invalid CRU signature(951)             : 0    0
Cache Dirty (90a)                      : 0    0
Port Glitch (63e)                      : 0    0
Drive causing loop failure (a18)       : 0    0
Soft media error (820)                 : 0    0
Soft SCSI bus error (801)              : 0    0
Unit shutdown (906)                    : 0    0
Lun Trespass (606)                     : 0    0
Rebuild started (603)                  : 0    0
Rebuild completed (604)                : 0    0
Equalize started (613)                 : 0    0
Equalize completed (614)               : 0    0
Background verify started (621)        : 0    0
Background verify completed (622)      : 0    0
CRU powered down (a07)                 : 0    0
VSC shutdown (904)                     : 0    0
Enclosure State change (850)           : 0    0
Fibre unknown event (6c7)              : 0    0
BE fibre loop down (6c1)               : 0    0
Loop down (71170008)                   : 0    0
Loop up (71170009)                     : 0    0
Drive removed (78b)                    : 0    0
Drive inserted (78c)                   : 0    0
Drive login retry (65c)                : 0    0
Bad drive or LCC (69d)                 : 0    0
Bad drive or LCC (69e)                 : 0    0


Event log entries*********************************


Event Log entries more than 200KB length cannot be reported

Physical disks state               *************************

               SPA                    SPB                    
Bus 0 Enclosure 0  Disk 0        Bus 0 Enclosure 0  Disk 0
State:                   Unbound        State:                   Unbound
       
Bus 0 Enclosure 0  Disk 1        Bus 0 Enclosure 0  Disk 1
State:                   Unbound        State:                   Unbound
       
Bus 0 Enclosure 0  Disk 2        Bus 0 Enclosure 0  Disk 2
State:                   Unbound        State:                   Unbound
       
Bus 0 Enclosure 0  Disk 3        Bus 0 Enclosure 0  Disk 3
State:                   Unbound        State:                   Unbound
       
Bus 0 Enclosure 0  Disk 4        Bus 0 Enclosure 0  Disk 4
State:                   Enabled        State:                   Enabled
       
Bus 0 Enclosure 0  Disk 5        Bus 0 Enclosure 0  Disk 5
State:                   Enabled        State:                   Enabled
       
Bus 0 Enclosure 0  Disk 6        Bus 0 Enclosure 0  Disk 6
State:                   Enabled        State:                   Enabled
       
Bus 0 Enclosure 0  Disk 7        Bus 0 Enclosure 0  Disk 7
State:                   Enabled        State:                   Enabled
       
Bus 0 Enclosure 0  Disk 8        Bus 0 Enclosure 0  Disk 8
State:                   Enabled        State:                   Enabled
       
Bus 0 Enclosure 0  Disk 9        Bus 0 Enclosure 0  Disk 9
State:                   Enabled        State:                   Enabled
       
Bus 0 Enclosure 0  Disk 10        Bus 0 Enclosure 0  Disk 10
State:                   Enabled        State:                   Enabled
       
Bus 0 Enclosure 0  Disk 11        Bus 0 Enclosure 0  Disk 11
State:                   Enabled        State:                   Enabled
       

Physical disks error count listing *************************

               SPA                    SPB                    
Bus 0 Enclosure 0  Disk 0        Bus 0 Enclosure 0  Disk 0
Soft Write Errors:       0        Soft Write Errors:       0
Soft Read Errors:        0        Soft Read Errors:        0
Hard Write Errors:       0        Hard Write Errors:       0
Hard Read Errors:        0        Hard Read Errors:        0
       
Bus 0 Enclosure 0  Disk 1        Bus 0 Enclosure 0  Disk 1
Soft Write Errors:       0        Soft Write Errors:       0
Soft Read Errors:        0        Soft Read Errors:        0
Hard Write Errors:       0        Hard Write Errors:       0
Hard Read Errors:        0        Hard Read Errors:        0
       
Bus 0 Enclosure 0  Disk 2        Bus 0 Enclosure 0  Disk 2
Soft Write Errors:       0        Soft Write Errors:       0
Soft Read Errors:        0        Soft Read Errors:        0
Hard Write Errors:       0        Hard Write Errors:       0
Hard Read Errors:        0        Hard Read Errors:        0
       
Bus 0 Enclosure 0  Disk 3        Bus 0 Enclosure 0  Disk 3
Soft Write Errors:       0        Soft Write Errors:       0
Soft Read Errors:        0        Soft Read Errors:        0
Hard Write Errors:       0        Hard Write Errors:       0
Hard Read Errors:        0        Hard Read Errors:        0
       
Bus 0 Enclosure 0  Disk 4        Bus 0 Enclosure 0  Disk 4
Soft Write Errors:       0        Soft Write Errors:       0
Soft Read Errors:        0        Soft Read Errors:        0
Hard Write Errors:       0        Hard Write Errors:       0
Hard Read Errors:        0        Hard Read Errors:        0
       
Bus 0 Enclosure 0  Disk 5        Bus 0 Enclosure 0  Disk 5
Soft Write Errors:       0        Soft Write Errors:       0
Soft Read Errors:        0        Soft Read Errors:        0
Hard Write Errors:       0        Hard Write Errors:       0
Hard Read Errors:        0        Hard Read Errors:        0
       
Bus 0 Enclosure 0  Disk 6        Bus 0 Enclosure 0  Disk 6
Soft Write Errors:       0        Soft Write Errors:       0
Soft Read Errors:        0        Soft Read Errors:        0
Hard Write Errors:       0        Hard Write Errors:       0
Hard Read Errors:        0        Hard Read Errors:        0
       
Bus 0 Enclosure 0  Disk 7        Bus 0 Enclosure 0  Disk 7
Soft Write Errors:       0        Soft Write Errors:       0
Soft Read Errors:        0        Soft Read Errors:        0
Hard Write Errors:       0        Hard Write Errors:       0
Hard Read Errors:        0        Hard Read Errors:        0
       
Bus 0 Enclosure 0  Disk 8        Bus 0 Enclosure 0  Disk 8
Soft Write Errors:       0        Soft Write Errors:       0
Soft Read Errors:        0        Soft Read Errors:        0
Hard Write Errors:       0        Hard Write Errors:       0
Hard Read Errors:        0        Hard Read Errors:        0
       
Bus 0 Enclosure 0  Disk 9        Bus 0 Enclosure 0  Disk 9
Soft Write Errors:       0        Soft Write Errors:       0
Soft Read Errors:        0        Soft Read Errors:        0
Hard Write Errors:       0        Hard Write Errors:       0
Hard Read Errors:        0        Hard Read Errors:        0
       
Bus 0 Enclosure 0  Disk 10        Bus 0 Enclosure 0  Disk 10
Soft Write Errors:       0        Soft Write Errors:       0
Soft Read Errors:        0        Soft Read Errors:        0
Hard Write Errors:       0        Hard Write Errors:       0
Hard Read Errors:        0        Hard Read Errors:        0
       
Bus 0 Enclosure 0  Disk 11        Bus 0 Enclosure 0  Disk 11
Soft Write Errors:       0        Soft Write Errors:       0
Soft Read Errors:        0        Soft Read Errors:        0
Hard Write Errors:       0        Hard Write Errors:       0
Hard Read Errors:        0        Hard Read Errors:        0

zlajoie
Contributor

I found some new information on the performance page that indicates this is not a problem with the SAN after all. The advanced performance view shows that the latency is coming from vmhba1. That is the local disk RAID card, a Dell PERC H200. After searching the forums, it turns out that this card has no write-back cache, and you really need that to get good performance in a VMware environment. Please look at the attached screenshot and tell me if I'm reading this correctly.
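The same per-adapter comparison can be done on exported latency samples instead of reading the chart. This is only a sketch: the readings below are made up for illustration, `iscsi-adapter` is a placeholder name, and only `vmhba1` and the 20ms figure come from this thread.

```python
from statistics import mean

def slow_adapters(samples_ms, threshold_ms=20.0):
    """Return {adapter: average latency} for adapters averaging >= threshold."""
    averages = {name: mean(values) for name, values in samples_ms.items()}
    return {name: avg for name, avg in averages.items() if avg >= threshold_ms}

# Hypothetical latency readings in milliseconds, not taken from the screenshot.
readings = {
    "vmhba1": [20.0, 40.0, 21.0],     # local RAID controller
    "iscsi-adapter": [3.0, 5.0, 4.0], # placeholder name for the iSCSI adapter
}
print(slow_adapters(readings))  # {'vmhba1': 27.0}
```

Splitting the numbers by adapter is what separates a slow local controller from a slow SAN path, since a host-wide average hides which device is responsible.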

So my new question is: how much of a performance impact will I see if I keep this card installed but don't run any VMs off this controller? The VMware software itself would still run from it, since the host boots from that controller. If I need to replace it, I can do that too, but it will probably cost a lot and I'm not looking forward to asking my boss for more money. I won't make this mistake again...

chriswahl
Virtuoso

I've encountered this RAID controller card before on a proof-of-concept install of ESX 4.0 on a Dell PE 1950 using local storage. The hypervisor and guests ran "okay" but write speeds were terrible (5MB/s). I ended up going with the PERC6/i to improve local performance.

Even though you would only have the hypervisor running on local disk, I personally wouldn't advocate going with something that is already showing latency values over 15ms without any VMs running. It just seems like a recipe for issues, although you may never have any. Just my opinion.

VCDX #104 (DCV, NV) ஃ WahlNetwork.com ஃ @ChrisWahl ஃ Author, Networking for VMware Administrators