Hi,
I have two hosts here with the same configuration: one with a 10Gbit iSCSI-attached Exomium box and one with a 10Gbit iSCSI-attached QNAP TS-879 box (direct attached).
I have installed 6 Hitachi 2TB HDDs in the QNAP and did a lot of testing with the QNAP before moving the box into production (FW: 3.7.1).
Both installations ran fine and performance was very good for a couple of months.
This is still true for the Exomium-attached host, but after a firmware upgrade of the QNAP box (an important fix came with that version),
the attached vSphere 5.1 host has now frozen 3 times in 3 weeks. The datastore disk access latency is bad compared to the other box.
I ran "iostat -x 5" on the QNAP box; the HDD busy% is less than 30% and the service time is 3-5 ms, which seems normal.
Here are some logs. There was no message on the hardware console; according to the SNMP manager, the vSphere host stopped at 11/23/12 21:02:05.
1.) Any idea how to prove whether this performance behavior causes the freeze? Any way to do better diagnostics?
2.) Anybody else running QNAPs with vSphere 5.1? I have run "iostat", "top", etc. and checked the logs; is there anything else I can do to get more information?
3.) I still have a free 10Gbit port on both sides; has anybody set up multipathing with QNAPs? (Some diagnostic and port-binding sketches follow after the logs below.)
Thanks
Henri
2012-10-23T20:55:01Z crond[4569]: crond: USER root pid 901826 cmd /sbin/hostd-probe
2012-10-23T20:55:01Z syslog[901827]: starting hostd probing.
2012-10-23T20:55:16Z syslog[901827]: hostd probing is done.
2012-10-23T21:00:01Z crond[4569]: crond: USER root pid 902032 cmd /usr/lib/vmware/vmksummary/log-heartbeat.py
2012-10-23T21:00:01Z crond[4569]: crond: USER root pid 902033 cmd /sbin/hostd-probe
2012-10-23T21:00:01Z syslog[902035]: starting hostd probing.
2012-10-23T21:00:16Z syslog[902035]: hostd probing is done.
2012-10-23T21:01:01Z crond[4569]: crond: USER root pid 902117 cmd /sbin/auto-backup.sh
2012-10-24T05:57:12Z watchdog-vobd: [4514] Begin '/usr/lib/vmware/vob/bin/vobd ++min=0,max=100,group=uwdaemons', min-uptime = 60, max-quick-failures = 1, max-total-failures = 1000000, bg_pid_file = ''
2012-10-24T05:57:12Z watchdog-vobd: Executing '/usr/lib/vmware/vob/bin/vobd ++min=0,max=100,group=host/vim/vmvisor/uwdaemons'
2012-10-24T05:57:14Z jumpstart: Using policy dir /etc/vmware/secpolicy
2012-10-24T05:57:14Z jumpstart: Parsed all objects
2012-10-24T05:57:14Z jumpstart: Objects defined and obsolete objects removed
2012-10-24T05:57:14Z jumpstart: Parsed all domain names
2012-10-23T21:01:40.070Z [2B09BB90 verbose 'vim.PerformanceManager'] HostCtl Exception in stats collection: Sysinfo error on operation returned status : Not found. Please see the VMkernel log for detailed error information
2012-10-23T21:01:40.070Z [2B09BB90 verbose 'vim.PerformanceManager'] HostCtl Exception in stats collection. Turn on 'trivia' log for details
2012-10-23T21:02:00.026Z [2B05AB90 warning 'vim.PerformanceManager'] Calculated read I/O size 602112 for scsi0:0 is out of range -- 602112,prevBytes = 1577965419008 curBytes = 1577995524608 prevCommands = 56798322curCommands = 56798372
2012-10-23T21:02:00.063Z [2B05AB90 verbose 'vim.PerformanceManager'] HostCtl Exception in stats collection: Sysinfo error on operation returned status : Not found. Please see the VMkernel log for detailed error information
2012-10-23T21:02:00.063Z [2B05AB90 verbose 'vim.PerformanceManager'] HostCtl Exception in stats collection. Turn on 'trivia' log for details
2012-10-23T21:02:03.585Z [2AD81B90 verbose 'SoapAdapter'] Responded to service state request
2012-10-23T21:02:04.477Z [2A981B90 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
2012-10-23T21:02:19.680Z [2AD81B90 verbose 'DvsManager'] PersistAllDvsInfo called
2012-10-23T21:02:20.023Z [2B019B90 warning 'vim.PerformanceManager'] Calculated read I/O size 577308 for scsi0:0 is out of range -- 577308,prevBytes = 1577995524608 curBytes = 1578026699264 prevCommands = 56798372curCommands = 56798426
2012-10-23T21:02:20.057Z [2B019B90 verbose 'vim.PerformanceManager'] HostCtl Exception in stats collection: Sysinfo error on operation returned status : Not found. Please see the VMkernel log for detailed error information
2012-10-23T21:02:20.057Z [2B019B90 verbose 'vim.PerformanceManager'] HostCtl Exception in stats collection. Turn on 'trivia' log for details
2012-10-23T21:02:24.707Z [2B09BB90 info 'Snmpsvc'] UpdateStats: report cimom converter stats started
2012-10-23T21:02:24.707Z [2B09BB90 info 'Snmpsvc'] DumpStats: cimom stats file /tmp/.cvtcimsnmp.xml generated, size 298 bytes.
2012-10-23T21:02:24.707Z [2B09BB90 info 'Snmpsvc'] PublishReport: file /tmp/.cvtcimsnmp.xml published as /tmp/cvtcimsnmp.xml
2012-10-23T21:02:24.707Z [2B09BB90 info 'Snmpsvc'] DumpStats: cimom stats file published
2012-10-23T21:02:24.707Z [2B09BB90 info 'Snmpsvc'] NotifyAgent: write(51, /var/run/snmp.ctl, N) 1 bytes to snmpd
2012-10-23T21:02:24.707Z [2B09BB90 info 'Snmpsvc'] UpdateStats: report cimom converter stats completed
Section for VMware ESX, pid=5084, version=5.1.0, build=799733, option=Release
------ In-memory logs start --------
mem> 2012-10-24T06:00:56.807Z [FFEA8D20 verbose 'Default'] No update forwarding configured
mem> 2012-10-24T06:00:56.807Z [FFEA8D20 info 'Default'] Supported VMs 87
mem> 2012-10-24T06:00:56.807Z [FFEA8D20 info 'Handle checker'] Setting system limit of 2222
mem> 2012-10-24T06:00:56.807Z [FFEA8D20 info 'Handle checker'] Set system limit to 2222
mem> 2012-10-24T06:00:56.807Z [FFEA8D20 info 'Default'] Setting malloc mmap threshold to 32 k
mem> 2012-10-24T06:00:56.807Z [FFEA8D20 info 'Default'] getrlimit(RLIMIT_NPROC): curr=64 max=128, return code = Success
mem> 2012-10-24T06:00:56.807Z [FFEA8D20 info 'Default'] setrlimit(RLIMIT_NPROC): curr=128 max=128, return code = Success
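Regarding questions 1 and 2: one way to tell whether storage latency is behind the freezes is to capture esxtop on the host around the times it happens, so the latency coming from the QNAP can be separated from queuing inside the VMkernel. A rough sketch of what could be run over SSH; the interval, sample count and datastore path are only examples:

# Batch capture: all counters, 5 s interval, 720 samples (~1 hour), saved to a datastore
esxtop -b -a -d 5 -n 720 > /vmfs/volumes/datastore1/esxtop-capture.csv
# Interactive: run esxtop, press 'u' for the disk device view and watch
#   DAVG/cmd - latency coming from the device/array (the QNAP)
#   KAVG/cmd - latency added by the VMkernel (host-side queuing)
#   GAVG/cmd - total latency the guest sees (roughly DAVG + KAVG)
esxtop
# The hostd messages above point at the VMkernel log; check it around the freeze
tail -n 200 /var/log/vmkernel.log

Regarding question 3: iSCSI multipathing on vSphere 5.1 means binding a second VMkernel port to the software iSCSI adapter and switching the LUN to round robin. Whether the TS-879 can present the same target on both 10Gbit ports is something the QNAP side would have to confirm, so treat this as a sketch only; the vmk, vmhba and naa IDs are placeholders:

# Bind the VMkernel port on the second 10Gbit NIC to the software iSCSI adapter
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk2
# Rescan, then set the QNAP LUN to round robin and list the paths
esxcli storage core adapter rescan --adapter=vmhba33
esxcli storage nmp device set --device=naa.xxxxxxxx --psp=VMW_PSP_RR
esxcli storage core path list --device=naa.xxxxxxxx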
I've done quite a bit of work with the QNAP; I'm currently running an EC1279U-RP and I gave up on iSCSI. I found NFS to work much, much better.
You should do very well with NFS and an actual 10-gig link.
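If you want to give it a try, mounting a QNAP NFS export as a datastore is a single command on the host; the IP, share path and datastore name here are only placeholders:

# Mount an NFS export from the QNAP as an ESXi 5.x datastore, then verify
esxcli storage nfs add --host=192.168.1.50 --share=/VMware --volume-name=qnap-nfs
esxcli storage nfs list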
Hi,
I have downgraded the firmware and now run 50% of the VMs on an NFS datastore. At first it seemed to help, but as soon as there is more I/O activity (20-30 MBytes/sec.) it is as bad as iSCSI. Migrating a 300 GB VM from the iSCSI to the NFS datastore took a very long time and the disk latency got extremely bad (up to 500 ms).
Anybody else?
Thanks
Henri
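For reference, one way to see whether the array itself or the network/NFS path is the limit would be a local write test on the QNAP while iostat is running. A rough sketch only: /share/MD0_DATA is where the data volume is usually mounted on these boxes, and the dd flags depend on the busybox build, so adjust as needed.

# On the QNAP via SSH: local sequential write straight to the RAID volume
dd if=/dev/zero of=/share/MD0_DATA/ddtest.bin bs=1M count=4096 conv=fsync
# In a second session, watch per-disk utilisation and service times while it runs
iostat -x 5
# Clean up afterwards
rm /share/MD0_DATA/ddtest.bin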
I'm running firmware version 3.7.1 build 20120614, if that helps.
Also, what RAID configuration are you running on your box?
Hi,
I have a RAID 6 here with 6 Hitachi 2TB drives.
Firmware is now 3.7.2.
Could you post a latency diagram from vCenter while running some I/O?
Thanks
Henri
Try switching to RAID 5; you will see much better results.
Hi,
The freeze was mainboard related: the memory controller for slots 1-4 is defective. The server currently runs on slots 5-8 without further issues.
The performance problem still exists.
Is anybody out there successfully running vSphere 5.1 with a QNAP TS-879 or better and ordinary HDDs?
I have additionally installed 2 SSDs where the VMDKs live; this works fine and the performance is good. As soon as I move some VMs to the RAID 6 (6 Hitachi 2TB drives), the latency averages 120 ms and goes up to 600 ms per I/O. I had a lot of trouble with VEEAM at this performance level.
Thanks
Henri
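For what it's worth, those latencies are roughly what a 6-spindle RAID 6 of 7.2k drives can deliver under random write load. A back-of-envelope estimate with rule-of-thumb figures, not measurements:

# One 7.2k SATA drive does roughly 80 random IOPS; RAID 6 costs ~6 back-end I/Os per host write
echo $(( 6 * 80 ))    # ~480 back-end random IOPS for the whole array
echo $(( 480 / 6 ))   # ~80 host random-write IOPS
# With ~32 commands outstanding, 32 / 80 IOPS is about 0.4 s average latency,
# which lines up with the 120-600 ms reported above.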
I have major issues with QNAP NFS performance on ESXi 5 and ESXi 4:
TS-459U+
TS-469U-RP
TS-859U
All patched up to the latest QNAP firmware, direct attached to a 10G switch; the QNAPs are connected to 1G ports. I use the advanced load-balance bonding mode, and they are loaded with server HDDs.
What I have found is that with low I/O, running a few VMs (5-10), they are okay. I have tried iSCSI and NFS, BUT (I found this out when I moved my VDP VM onto the QNAP) when you have a lot of IOPS (also noticed when I moved my Zabbix monitoring box onto the QNAP datastore), the QNAPs fail badly: I start to get 500 ms to 1.5 s delays. I also notice the throughput maxes out at around 9-10 MB/s.
I have raised SRs with VMware; of course they blame QNAP. I have raised 2 or 3 support requests with QNAP and tried their forums. The best I got was "join us on Skype and we can diagnose the problem"... yeah right. We tried a few things, but they don't seem to have any idea and basically don't really care.
Now I am stuck with 3 QNAP devices that are basically pieces of shit... BUT! I did some experimenting and found that if I mount the space directly in the VM, i.e. an NFS mount inside a Linux VM or an iSCSI mount directly into Windows, I actually get some performance, a lot better than what an iSCSI- or NFS-backed VMware datastore gives me.
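For anyone wanting to try the same workaround, the in-guest mount is just a normal Linux NFS mount against the QNAP export; the IP, share name and mount options below are only examples:

# Inside the Linux VM: mount the QNAP export directly, bypassing the ESXi NFS client
mkdir -p /mnt/qnap
mount -t nfs -o rw,hard,intr,rsize=65536,wsize=65536 192.168.1.50:/Backups /mnt/qnap
# Make it persistent across reboots
echo "192.168.1.50:/Backups /mnt/qnap nfs rw,hard,intr,rsize=65536,wsize=65536 0 0" >> /etc/fstab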
I also found a problem when the RAID array rebuilds... performance is a dog, the rebuild rate is around 8-10 MB/s... for a 10 TB array that takes forever... My home-built system does 110 MB/s easily and still serves multimedia. The response I got from QNAP support: "Oh, wait till it's finished and then we can look for any faults"... Not sure how you diagnose a problem once it's finished... But it leads me to think they have gone ultra cheap on the SATA/SAS bus.
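On the rebuild speed: as far as I can tell the QNAP firmware builds its arrays on Linux md, so the standard md rebuild throttles should apply. A sketch of what could be checked over SSH; the value at the end is only an example:

# Current rebuild progress and speed
cat /proc/mdstat
# md throttles rebuilds; the minimum is often set very low by default
cat /proc/sys/dev/raid/speed_limit_min
cat /proc/sys/dev/raid/speed_limit_max
# Raise the floor (KB/s per device) and see whether the rebuild speeds up
echo 50000 > /proc/sys/dev/raid/speed_limit_min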
SO I WOULD STAY AWAY FROM QNAP IF YOU PLAN TO DO ANYTHING SERIOUS.
Hi Alex,
I'm interested in what you posted about load balancing. What I understand is that you run 3 QNAPs and use load balancing to back up VMs using NFS, correct me if I'm wrong. If that's correct, is the load balancing for the backup process? How did you achieve that?