Re: Single Threaded Application on HP DL 360 G3 fa... - Page 2

meistermn · ‎10-20-2008

The application Gupta SQL 9.0.1 is now running in a VM. The datastores are on a NetApp 6070 (RAID-DP, 41 FC disks of 300 GB each).

The VM uses 5 LUNs: C: Windows 2003, D: data, E: Gupta SQL, F: Gupta SQL logs, G: pagefile.

Before that, it ran on an HP DL360 G3 (Smart 5i RAID controller, 64 MB cache) with 6 x 36 GB disks in 3 logical volumes, all RAID 5.

On the HP DL360 G3 the database update statistics job ran in 1 hour 3 minutes.

The VM on the NetApp needs 2 hours 14 minutes.

I analyzed the Gupta SQL application with perfmon: the average transfer size is 64 KB for reads and writes on the SQL database (E:) and 4 KB for reads and writes on the SQL log (F:).

Performance Tweaks:

1.) Datacore Uptempo: only 5 minutes faster. I think the DB, which is 30 GB, is too big.

2.) Partition alignment for the E: and F: partitions: 1 hour 45 minutes, an improvement of 29 minutes.

3.) OS file system block size (cluster size) changed to 64 KB for E: (default 4 KB): no improvement.

4.) OS file system block size changed to 32 KB for F: (default 4 KB): no improvement.

5.) Separate LUN for the pagefile (G:): no improvement.

6.) Queue depth changed to 128 (VMware VC): no improvement.
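Item 2 was the only tweak that helped. A minimal sketch of the underlying alignment check, assuming 512-byte sectors, the Windows 2003 default starting sector of 63 (32,256 bytes) and NetApp WAFL's 4 KB block as the boundary; the function names are hypothetical:

```python
SECTOR_SIZE = 512   # bytes per sector (assumed)
WAFL_BLOCK = 4096   # NetApp WAFL block size in bytes


def is_aligned(start_sector: int, boundary: int = WAFL_BLOCK,
               sector_size: int = SECTOR_SIZE) -> bool:
    """True if the partition's starting byte offset is a multiple of boundary."""
    return (start_sector * sector_size) % boundary == 0


def next_aligned_sector(start_sector: int, boundary: int = WAFL_BLOCK,
                        sector_size: int = SECTOR_SIZE) -> int:
    """Smallest sector at or after start_sector whose byte offset is aligned."""
    offset = start_sector * sector_size
    aligned = -(-offset // boundary) * boundary  # round up to a multiple of boundary
    return aligned // sector_size


if __name__ == "__main__":
    legacy = 63  # Windows 2003 default start: sector 63 = 32,256 bytes
    print(is_aligned(legacy), next_aligned_sector(legacy),
          next_aligned_sector(legacy, 65536))  # False 64 128
```

With a start at sector 63, a 4 KB guest I/O can straddle two WAFL blocks, so the array does extra work; starting at sector 64 (4 KB) or 128 (64 KB) avoids that. On Windows 2003 SP1 and later, diskpart's `create partition primary align=64` should create such an offset.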

Arrgh!!!! What the hell is the problem? Is it the cache for random I/O?

Craig_Baltzer · ‎10-20-2008

Two other quick things:

The general rule of thumb is not to increase queue depth above 64 on ESX. That is not particularly relevant here, since you had performance problems at the defaults, but in a shared SAN environment there is always a concern about overrunning the SAN storage controller.
Have another look at the "storage analysis" section of the referenced document, particularly using esxtop to look at %USD (the amount of queue depth being used), DAVG/cmd (latency in the SAN), KAVG/cmd (latency in the kernel) and ABRTS/s (the number of commands being aborted, basically timeouts).
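To make the counters above concrete, here is a small sketch that classifies one esxtop sample. It relies on the esxtop relation GAVG/cmd = DAVG/cmd + KAVG/cmd; the 20 ms and 2 ms thresholds are commonly quoted rules of thumb, not VMware-mandated limits, and `DiskSample`/`diagnose` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    davg_ms: float      # DAVG/cmd: device latency (HBA, fabric, array)
    kavg_ms: float      # KAVG/cmd: time spent in the VMkernel, mostly queuing
    usd_pct: float      # %USD: percentage of the queue depth in use
    abrts_per_s: float  # ABRTS/s: commands aborted per second


def diagnose(s: DiskSample, davg_limit: float = 20.0, kavg_limit: float = 2.0):
    """Return (GAVG/cmd, findings) for one sample; thresholds are rules of thumb."""
    gavg_ms = s.davg_ms + s.kavg_ms  # latency as seen by the guest
    findings = []
    if s.davg_ms > davg_limit:
        findings.append("high device latency: look at the SAN/array")
    if s.kavg_ms > kavg_limit or s.usd_pct >= 100:
        findings.append("queuing in the VMkernel: queue depth saturated")
    if s.abrts_per_s > 0:
        findings.append("aborted commands: I/Os are timing out")
    return gavg_ms, findings
```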

You can import esxtop data into perfmon to make it more readable. Another document goes through how to set up esxtop to capture the data to a CSV file, and http://communities.vmware.com/docs/DOC-5100 goes through how to get it into perfmon. Looking at things in perfmon, with all the features available to you (especially with the Vista/Windows 2008 perfmon), is much better than trying to figure anything out from the esxtop "table" display...
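Before importing into perfmon, the batch output (for example from `esxtop -b -d 5 -n 120 > esxtop.csv`) can also be summarised directly. A minimal sketch, assuming the counter of interest can be picked out by a substring of its CSV column name such as "MilliSec/Command"; check the actual header row of your capture, since column naming may differ between ESX versions. `column_means` is a hypothetical helper:

```python
import csv
from statistics import mean
from typing import Dict, Iterable


def column_means(lines: Iterable[str], match: str) -> Dict[str, float]:
    """Average every column whose header contains `match` in esxtop batch CSV."""
    reader = csv.reader(lines)
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if match in name]
    values = {header[i]: [] for i in wanted}
    for row in reader:
        for i in wanted:
            try:
                values[header[i]].append(float(row[i]))
            except (IndexError, ValueError):
                pass  # short row or non-numeric cell: skip it
    return {name: mean(v) for name, v in values.items() if v}
```

Usage: `with open("esxtop.csv", newline="") as f: print(column_means(f, "MilliSec/Command"))` prints one average latency per matching column.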

meistermn · ‎10-21-2008

There is a huge difference between single-threaded and multithreaded applications.

With multithreaded Iometer testing (25 outstanding I/Os) I can get 180 MB/s of 100% random reads from the NetApp 6070.

But for a single-threaded application (1 outstanding I/O) it is only 7 MB/s or less for random reads.
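The two Iometer numbers fit Little's law (outstanding I/Os = IOPS x latency). A sketch of the arithmetic; the 8 KB block size is a hypothetical value, since the post does not state the Iometer block size, and 1 MB = 1024 KB is assumed:

```python
MB = 1024  # KB per MB (assumed binary units)


def iops(mb_per_s: float, block_kb: float) -> float:
    """I/Os per second implied by a throughput at a given block size."""
    return mb_per_s * MB / block_kb


def implied_latency_ms(mb_per_s: float, outstanding: int, block_kb: float) -> float:
    """Little's law: outstanding = IOPS x latency, so latency = outstanding / IOPS."""
    return 1000.0 * outstanding / iops(mb_per_s, block_kb)


if __name__ == "__main__":
    block_kb = 8  # hypothetical Iometer block size
    print(implied_latency_ms(180, 25, block_kb))  # multithreaded run
    print(implied_latency_ms(7, 1, block_kb))     # single-threaded run
```

Both runs come out at roughly 1.1 ms per I/O, so the per-I/O latency barely changes; the multithreaded run is faster only because 25 I/Os are in flight at once. The block size cancels out of that comparison: 180 / 25 = 7.2 MB/s per outstanding I/O, versus 7 MB/s with one.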

Read the following document, pages 20 and 21.

There are two paging files: one on C: and one on G:.

meistermn · ‎10-21-2008

We use QLogic cards (PCI-X). Look at the SNIA document regarding single-threaded applications.

I ran the following tests, all with Datacore Uptempo:

HP DL360 with Smart 5i RAID controller without BBWC (64 MB cache, default 100% read-ahead), 6 x 36 GB disks, 3 logical volumes, all RAID 5.

Iometer result: 20 MB/s for 100% random read with 1 outstanding I/O.

IBM DS8000 with RAID 5 (7+1).

Iometer result: 16 MB/s for 100% random read with 1 outstanding I/O.

NetApp 6070 with RAID-DP, 41 disks.

Iometer result: 7 MB/s for 100% random read with 1 outstanding I/O.

I read about RAID levels 1, 10, 5 and 6 in SQL and Oracle documents, and they all say that RAID 10 should be used for SQL logs and also for the SQL database. Also, a Sun document says that RAID 6 (RAID-DP) can only achieve 66% of RAID 5 performance.
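The 66% figure matches the textbook RAID write penalties (2 back-end I/Os per host write for RAID 10, 4 for RAID 5, 6 for RAID 6): for pure writes, 4/6 is about 0.67. A sketch of that arithmetic; the per-disk IOPS value is a hypothetical input, and NetApp RAID-DP on WAFL writes full stripes, so it need not pay the textbook RAID 6 penalty:

```python
# Back-end disk I/Os per host write (textbook values)
WRITE_PENALTY = {"RAID 0": 1, "RAID 1": 2, "RAID 10": 2, "RAID 5": 4, "RAID 6": 6}


def host_iops(disks: int, iops_per_disk: float, raid: str, write_fraction: float) -> float:
    """Host-visible IOPS for a random workload with the given write fraction."""
    raw = disks * iops_per_disk
    penalty = WRITE_PENALTY[raid]
    # each read costs 1 back-end I/O, each write costs `penalty`
    return raw / ((1 - write_fraction) + write_fraction * penalty)


if __name__ == "__main__":
    per_disk = 180  # hypothetical IOPS for one 15k rpm FC disk
    for raid in ("RAID 10", "RAID 5", "RAID 6"):
        print(raid, round(host_iops(8, per_disk, raid, 0.5)))
```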

I am also astonished by the new NetApp Performance Acceleration Module.

They say the following:

"For read intensive random I/O applications that are latency sensitive this requires the configuration of a high number disks in order to provide user and application acceptable latencies."

The note on page 20 of the SNIA document, "Single thread applications are extremely sensitive to latency ...", points in the same direction.

In my view, applications that need high random read/write I/O should be placed on solid-state disks or use a lot of cache.

I am very interested in the new Intel solid-state disks. Look at the iozone results (the source is in German, but the random values speak for themselves).

iozone (KB/s)	Intel SSD 80 GB SATA II	Seagate Cheetah 73.4 GB 15k rpm SAS	2 x WD Raptor 36 GB 10k rpm RAID 0 SATA	Maxtor DiamondMax 9 Plus 160 GB 7.2k rpm SATA

sequential write 4 KB	75'471	74'730	47'840	44'720
sequential write 16 KB	76'178	87'147	50'381	45'463
sequential write 32 KB	77'325	94'126	66'401	44'700
sequential write 64 KB	76'484	104'642	68'319	41'458
sequential write 128 KB	77'204	78'664	84'472	43'977
sequential write 256 KB	77'257	87'703	89'999	49'007

sequential read 4 KB	240'626	61'252	20'062	39'566
sequential read 16 KB	240'505	69'315	21'810	43'703
sequential read 32 KB	239'662	74'521	32'627	37'019
sequential read 64 KB	239'235	81'718	39'331	39'904
sequential read 128 KB	242'603	36'893	53'627	40'527
sequential read 256 KB	240'404	71'301	49'105	43'344

random write 4 KB	48'194	4'521	3'005	1'964
random write 16 KB	75'127	7'356	7'573	5'640
random write 32 KB	79'788	12'493	13'012	8'853
random write 64 KB	81'474	22'158	20'132	12'756
random write 128 KB	79'147	33'757	24'931	17'808
random write 256 KB	80'460	47'147	35'806	23'862

random read 4 KB	30'287	2'617	1'203	362
random read 16 KB	95'552	4'504	4'331	1'451
random read 32 KB	149'029	7'291	7'057	3'094
random read 64 KB	197'591	12'840	11'432	5'775
random read 128 KB	267'814	24'534	14'976	10'169
random read 256 KB	205'981	26'387	18'218	12'364
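Converting the 4 KB random-read row to operations per second (KB/s divided by the 4 KB record size) makes the gap easier to read. A sketch using only the numbers from the table:

```python
# Random-read 4 KB results from the iozone table, in KB/s
RANDOM_READ_4K = {
    "Intel SSD 80 GB": 30287,
    "Seagate Cheetah 15k SAS": 2617,
    "2 x WD Raptor RAID 0": 1203,
    "Maxtor DiamondMax 9 Plus": 362,
}


def to_iops(kb_per_s: float, record_kb: float) -> float:
    """Throughput divided by record size gives operations per second."""
    return kb_per_s / record_kb


if __name__ == "__main__":
    base = RANDOM_READ_4K["Seagate Cheetah 15k SAS"]
    for disk, kbps in RANDOM_READ_4K.items():
        print(f"{disk:26} {to_iops(kbps, 4):8.0f} IOPS  {kbps / base:5.1f}x Cheetah")
```

The SSD works out to roughly 7,570 random 4 KB reads per second, about 11.6 times the 15k rpm Cheetah.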

meistermn · ‎10-21-2008

I think 3.583 means "three thousand, five hundred and eighty three".

I don't think that Spotlight, perfmon and Sysinternals Process Monitor report memory usage incorrectly, but I am not sure.

What if only 2 GB of the 4 GB can be used?

In 32-bit Windows, the 4 GB address space is split into 2 GB for user mode and 2 GB for kernel mode.

To me this means that in 32-bit Windows Standard Edition an application cannot use more than 2 GB in user mode.

Kernel mode can access user-mode memory, but user mode cannot access kernel memory!!!

The only way to give an application more user-mode memory is the /3GB switch in boot.ini. I do not know if this works for the 32-bit Standard Edition of Windows.
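Note that /3GB only enlarges the user address space of executables linked as large-address-aware (the IMAGE_FILE_LARGE_ADDRESS_AWARE flag, 0x0020, in the PE file header); other 32-bit processes still see 2 GB. A sketch that checks the flag in an executable such as dbntsrv.exe (whether Gupta sets it is not known here; the function names are hypothetical):

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020


def is_large_address_aware(pe_bytes: bytes) -> bool:
    """Return True if the PE image has the large-address-aware flag set."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    # e_lfanew at offset 0x3C holds the file offset of the "PE\0\0" signature
    (pe_offset,) = struct.unpack_from("<I", pe_bytes, 0x3C)
    if pe_bytes[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The 20-byte COFF header follows the signature; Characteristics is its
    # last field, at offset 18 within that header
    (characteristics,) = struct.unpack_from("<H", pe_bytes, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)


def check_file(path: str) -> bool:
    """Read an executable from disk and test its large-address-aware flag."""
    with open(path, "rb") as f:
        return is_large_address_aware(f.read())
```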

Craig_Baltzer · ‎10-21-2008

SAN controllers typically already have a lot of cache on them; however, "more is better" holds true when it comes to cache.

Single-threaded, sequential I/O is definitely latency sensitive, and direct-attached disk will typically outperform anything else just because of the physics (the signal has to travel less than 1 meter and go through 1 controller, rather than travelling tens of meters and going through multiple controllers). That being said, it is extremely rare to find any multi-user application (or any application at all, for that matter) that is single-threaded nowadays.

The thing that sticks out for me in your perfmon disk results is that the reads look reasonable, but the writes are horrible. That usually points to some kind of write cache issue (cache disabled, cache not being used, or the application requesting a synchronous "write-through" that is being honoured).
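One way to see whether synchronous writes are the problem is to compare buffered writes with writes forced to stable storage one by one. A minimal sketch; it uses `os.fsync` after each write as a stand-in for a write-through request, which approximates but is not the exact Windows FILE_FLAG_WRITE_THROUGH path, and `time_writes` is a hypothetical helper:

```python
import os
import tempfile
import time


def time_writes(n_writes=200, block_size=4096, sync_each=False, directory=None):
    """Write n_writes blocks to a temporary file and return the elapsed seconds.

    With sync_each=True, os.fsync() runs after every write, so each block must
    reach stable storage before the next one (roughly write-through behaviour).
    """
    block = b"\0" * block_size
    fd, path = tempfile.mkstemp(dir=directory)  # directory=None: system temp dir
    try:
        start = time.perf_counter()
        for _ in range(n_writes):
            os.write(fd, block)
            if sync_each:
                os.fsync(fd)
        os.fsync(fd)  # both variants end with the data on disk
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    buffered = time_writes()
    synced = time_writes(sync_each=True)
    ratio = synced / max(buffered, 1e-9)
    print(f"buffered {buffered:.3f}s, fsync per write {synced:.3f}s, ratio {ratio:.1f}x")
```

Pass `directory=` pointing at the drive under test (for example the log LUN). A large ratio suggests writes are not being absorbed by a write cache; with a working battery-backed cache the two timings should be much closer.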

Craig_Baltzer · ‎10-21-2008

I don't think that perfmon and sysinternals are "wrong" per se, I just think they're looking at counters that don't correspond exactly to the ESX/VC counters.

Spotlight's estimate of "hard" page rates just isn't supported by the perfmon disk numbers. If you're hard paging 3582 pages of memory per second, then it has to be going to the page file, which is either on C: or G:. The whole G: drive is showing no I/O at all, so it's not going there, and the C: drive is showing 4000 bytes/sec. The OS is certainly not paging 1 byte at a time, so the 3582 pages/sec number, as well as the "cache hit ratio" that Spotlight is coming up with, is suspect.
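The sanity check above can be written out as arithmetic: each hard-faulted page is at least one 4 KB transfer to or from the pagefile, so 3582 pages/s needs about 14 MB/s of disk traffic, against the 4000 bytes/sec perfmon reports on C:. A sketch; the 50% tolerance is an arbitrary assumption and the function names are hypothetical:

```python
PAGE_SIZE = 4096  # bytes per page on 32-bit x86 Windows


def expected_paging_bytes(pages_per_sec: float, page_size: int = PAGE_SIZE) -> float:
    """Minimum disk traffic needed to move pages_per_sec pages to or from the pagefile."""
    return pages_per_sec * page_size


def paging_plausible(pages_per_sec: float, observed_bytes_per_sec: float,
                     min_fraction: float = 0.5) -> bool:
    """True if observed disk traffic could account for the reported paging rate."""
    return observed_bytes_per_sec >= min_fraction * expected_paging_bytes(pages_per_sec)


if __name__ == "__main__":
    print(expected_paging_bytes(3582))   # 14671872, about 14 MB/s
    print(paging_plausible(3582, 4000))  # False
```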

meistermn · ‎10-23-2008

I removed the pagefile that was on the G: partition and also removed the G: partition itself.

Now there is only one pagefile, on the C: drive.

Then I changed the registry value DisablePagingExecutive under HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management.


During the day, only 10 MB of the pagefile was used (previously 90 MB).

Then, when the nightly Gupta DB run started, Spotlight showed 2,000-3,000 pages/s being read from the 10 MB pagefile.

So I asked myself what was using the pagefile, and I started Sysinternals Process Monitor while Gupta was running, with a filter on the path c:\pagefile.sys.

Boom!!! Part of the dbntsrv process (Gupta SQL) had been paged out to the pagefile. So is it badly programmed?

Only physical power will help: Fusion-io.
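This result fits what DisablePagingExecutive controls: it keeps kernel-mode drivers and system code resident, but it does not stop the memory manager from paging out user-mode processes such as dbntsrv. A small sketch to read the current setting (Windows only; the function name is hypothetical):

```python
import sys

KEY_PATH = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"


def read_disable_paging_executive():
    """Return the DisablePagingExecutive DWORD, or None if it cannot be read."""
    if sys.platform != "win32":
        return None  # winreg only exists on Windows
    import winreg
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, "DisablePagingExecutive")
            return value
    except FileNotFoundError:
        return None  # key or value not present
```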

mreferre · ‎10-23-2008

So, assuming the same behaviour applied on the physical servers you used to have, are you saying that this usage pattern (well designed or badly designed, whatever it is) is heavily penalized when running on ESX?

Massimo.

Massimo Re Ferre' VMware vCloud Architect twitter.com/mreferre www.it20.info

meistermn · ‎12-25-2008

We tried a new solution for the Gupta application because the customer was not satisfied.

It was implemented on a physical server.

We used an HP DL360 G5 with 8 GB memory, a P400i with 256 MB battery-backed cache and 3 x 36 GB disks in RAID 1 (one spare), and a second RAID controller, a P800i with 512 MB cache, connected to external DAS with 10 x 36 GB 15K disks in RAID 5.

Now we get 130 MB/s on the DAS!!! So at the moment DAS is faster than any SAN!!!

I think it definitely has to do with the cache on the P800i controller and the different RAID levels (10, 5, 6).

So in January 2009 we will test with the 4 x 16 GB NetApp Performance Acceleration Module cards to see if we can get the same or better performance on a SAN!!!

CWedge · ‎06-02-2009

I've had problems with HP gear and ESX for years...

I have proven over and over again that if you take ESX connected to a NetApp NFS store and create a VM, you get meager performance.

Remove ESX from the same machine, install Ubuntu, and you get blazing speed. Same hardware, same NFS store.

The answer: ESX binds the USB IRQs to the service console, which only uses CPU 0. Because the IRQs are shared with your storage controller and NICs, that also restricts all of those to CPU 0.

We disabled USB and made sure the driver didn't start in ESX. We got a 5x throughput increase.
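Shared IRQs are easy to spot from a Linux install on the same hardware (as in the Ubuntu comparison): each row of /proc/interrupts lists the per-CPU counts, the interrupt controller type and every device on that line. A sketch that flags IRQ lines shared with a USB controller, assuming the older single-token type format (e.g. IO-APIC-level); newer kernels print the type differently, and the function name is hypothetical:

```python
from typing import Dict, List


def shared_irqs(proc_interrupts: str, keyword: str = "usb") -> Dict[int, List[str]]:
    """Map IRQ number -> devices for IRQ lines shared with a device matching keyword."""
    lines = proc_interrupts.strip("\n").splitlines()
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        irq, sep, rest = line.partition(":")
        if not sep or not irq.strip().isdigit():
            continue  # skip NMI, LOC, ERR and other summary rows
        tokens = rest.split()
        # per-CPU counters first, then the controller type, then device names
        devices = [d.strip() for d in " ".join(tokens[n_cpus + 1:]).split(",") if d.strip()]
        if len(devices) > 1 and any(keyword in d.lower() for d in devices):
            result[int(irq)] = devices
    return result
```

Usage: `shared_irqs(open("/proc/interrupts").read())` returns something like `{16: ["usb-uhci", "qla2300"]}` when the USB controller shares a line with the FC HBA.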

We went from 70 MB/s on local VMFS storage to 333 MB/s, and were able to saturate 1 GbE at 100 MB/s to the NetApp 3070.

This has been true on every HP server I've used, from roughly the G3 generation onward.

CWedge · ‎06-03-2009

See this thread, where my suggestion was a success:

http://communities.vmware.com/message/1271112#1271112
