Henri
Contributor

vSphere 5.1 and Qnap TS-879-RPU FW:3.7.3 and Freeze

Hi,

I have two hosts here with the same configuration: one with a 10 Gbit iSCSI-attached Exomium box and one with a 10 Gbit iSCSI-attached QNAP TS-879 box (direct attached).

I installed 6 Hitachi 2 TB HDDs in the QNAP and did a lot of testing with it before moving the box into production (FW 3.7.1).

Both installations ran fine and the performance was very good for a couple of months.

This is still true for the Exomium-attached host, but after a firmware upgrade of the QNAP box (an important fix came with that version), the attached vSphere 5.1 host has now frozen 3 times in 3 weeks. The datastore disk access latency is bad compared to the other box.

I ran "iostat -x 5" on the QNAP box, the HDD busy% is less than 30%, service time 3-5 ms, this seams normal. 

Here are some logs. There is no message on the hardware console; according to the SNMP manager the vSphere host stopped at 11/23/12 21:02:05.

1.) Any idea how to prove whether this performance behavior causes the freeze? Any way to do a better diagnosis?

2.) Anybody else running QNAPs with vSphere 5.1? I have run "iostat", "top" etc. and checked the logs; anything else to get more information?

3.) I still have a free 10 Gbit port on both sides; anybody doing multipathing with QNAPs? (A rough sketch of what I have in mind follows below.)
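
For reference, this is roughly the multipathing setup I have in mind for the free ports - only a sketch, assuming the software iSCSI adapter and a second target IP configured on the QNAP side. The vmhba and vmk names below are placeholders that would need to be verified first; the naa ID is the one from the latency warnings in the log:

esxcli iscsi adapter list     # find the software iSCSI adapter name (vmhbaNN below is a placeholder)
esxcfg-vmknic -l              # find the vmkernel port on the second 10 Gbit NIC (vmk2 below is a placeholder)
esxcli iscsi networkportal add --adapter=vmhbaNN --nic=vmk2
esxcli storage nmp device set --device=naa.60014051352449ed68f3d4d08d914dd2 --psp=VMW_PSP_RR    # switch the LUN to round robin
esxcli storage nmp psp roundrobin deviceconfig set --device=naa.60014051352449ed68f3d4d08d914dd2 --type=iops --iops=1    # optional: change path after every I/O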

Thanks

Henri

2012-10-23T18:46:31.487Z cpu5:4101)WARNING: ScsiDeviceIO: 1211: Device naa.60014051352449ed68f3d4d08d914dd2 performance has deteriorated. I/O latency increased from average value of 10349 microseconds to 246308 microseconds.
2012-10-23T18:46:52.671Z cpu13:4109)WARNING: ScsiDeviceIO: 1211: Device naa.60014051352449ed68f3d4d08d914dd2 performance has deteriorated. I/O latency increased from average value of 10350 microseconds to 314209 microseconds.
2012-10-23T18:46:52.709Z cpu13:4109)WARNING: ScsiDeviceIO: 1211: Device naa.60014051352449ed68f3d4d08d914dd2 performance has deteriorated. I/O latency increased from average value of 10350 microseconds to 721348 microseconds.
2012-10-23T18:47:23.655Z cpu7:4103)WARNING: ScsiDeviceIO: 1211: Device naa.60014051352449ed68f3d4d08d914dd2 performance has deteriorated. I/O latency increased from average value of 10350 microseconds to 214506 microseconds.
2012-10-23T18:48:15.083Z cpu3:5924)WARNING: ScsiDeviceIO: 1211: Device naa.60014051352449ed68f3d4d08d914dd2 performance has deteriorated. I/O latency increased from average value of 10351 microseconds to 341996 microseconds.
2012-10-23T18:49:08.619Z cpu15:5920)WARNING: ScsiDeviceIO: 1211: Device naa.60014051352449ed68f3d4d08d914dd2 performance has deteriorated. I/O latency increased from average value of 10352 microseconds to 218986 microseconds.
2012-10-23T18:49:29.211Z cpu5:6252)WARNING: ScsiDeviceIO: 1211: Device naa.60014051352449ed68f3d4d08d914dd2 performance has deteriorated. I/O latency increased from average value of 10352 microseconds to 313394 microseconds.
2012-10-23T18:52:18.714Z cpu15:5924)WARNING: ScsiDeviceIO: 1211: Device naa.60014051352449ed68f3d4d08d914dd2 performance has deteriorated. I/O latency increased from average value of 10354 microseconds to 211610 microseconds.
2012-10-23T20:10:05.175Z cpu3:4099)WARNING: ScsiDeviceIO: 1211: Device naa.60014051352449ed68f3d4d08d914dd2 performance has deteriorated. I/O latency increased from average value of 10336 microseconds to 217475 microseconds.
2012-10-23T20:10:10.557Z cpu5:5926)WARNING: ScsiDeviceIO: 1211: Device naa.60014051352449ed68f3d4d08d914dd2 performance has deteriorated. I/O latency increased from average value of 10337 microseconds to 207602 microseconds.
2012-10-23T20:10:23.458Z cpu9:4105)WARNING: ScsiDeviceIO: 1211: Device naa.60014051352449ed68f3d4d08d914dd2 performance has deteriorated. I/O latency increased from average value of 10338 microseconds to 222609 microseconds.
2012-10-23T20:10:26.726Z cpu5:4101)WARNING: ScsiDeviceIO: 1211: Device naa.60014051352449ed68f3d4d08d914dd2 performance has deteriorated. I/O latency increased from average value of 10339 microseconds to 208840 microseconds.
2012-10-23T20:10:28.230Z cpu9:5920)WARNING: ScsiDeviceIO: 1211: Device naa.60014051352449ed68f3d4d08d914dd2 performance has deteriorated. I/O latency increased from average value of 10339 microseconds to 429868 microseconds.
2012-10-23T20:13:50.852Z cpu15:5924)WARNING: LinScsi: SCSILinuxQueueCommand:1193:queuecommand failed with status = 0x1056 Unknown status vmhba36:0:0:0 (driver name: ahci) - Message repeated 1 time
0:00:00:04.437 cpu0:4096)WARNING: CacheSched: 803: Already disabled : Cache aware scheduling already disabled
0:00:00:04.566 cpu0:4096)WARNING: VMKAcpi: 495: No IPMI PNP id found
2012-10-24T05:57:13.809Z cpu11:4555)WARNING: LinuxSignal: 761: ignored unexpected signal flags 0x2 (sig 17)
2012-10-24T05:57:19.550Z cpu3:4571)WARNING: VMK_PCI: 1170: device 00:00:16.0 has no legacy interrupt(s)
2012-10-24T05:57:19.550Z cpu3:4571)WARNING: LinPCI: LinuxPCILegacyIntrVectorSet:80:Could not allocate legacy PCI interrupt for device 0000:00:16.0
2012-10-24T05:57:19.550Z cpu3:4571)WARNING: VMK_PCI: 1170: device 00:00:16.1 has no legacy interrupt(s)
2012-10-24T05:57:19.550Z cpu3:4571)WARNING: LinPCI: LinuxPCILegacyIntrVectorSet:80:Could not allocate legacy PCI interrupt for device 0000:00:16.1
2012-10-24T05:57:19.550Z cpu3:4571)WARNING: VMK_PCI: 1170: device 00:00:16.2 has no legacy interrupt(s)
2012-10-24T05:57:19.550Z cpu3:4571)WARNING: LinPCI: LinuxPCILegacyIntrVectorSet:80:Could not allocate legacy PCI interrupt for device 0000:00:16.2

2012-10-23T20:55:01Z crond[4569]: crond: USER root pid 901826 cmd /sbin/hostd-probe

2012-10-23T20:55:01Z syslog[901827]: starting hostd probing.

2012-10-23T20:55:16Z syslog[901827]: hostd probing is done.

2012-10-23T21:00:01Z crond[4569]: crond: USER root pid 902032 cmd /usr/lib/vmware/vmksummary/log-heartbeat.py

2012-10-23T21:00:01Z crond[4569]: crond: USER root pid 902033 cmd /sbin/hostd-probe

2012-10-23T21:00:01Z syslog[902035]: starting hostd probing.

2012-10-23T21:00:16Z syslog[902035]: hostd probing is done.

2012-10-23T21:01:01Z crond[4569]: crond: USER root pid 902117 cmd /sbin/auto-backup.sh

2012-10-24T05:57:12Z watchdog-vobd: [4514] Begin '/usr/lib/vmware/vob/bin/vobd ++min=0,max=100,group=uwdaemons', min-uptime = 60, max-quick-failures = 1, max-total-failures = 1000000, bg_pid_file = ''

2012-10-24T05:57:12Z watchdog-vobd: Executing '/usr/lib/vmware/vob/bin/vobd ++min=0,max=100,group=host/vim/vmvisor/uwdaemons'

2012-10-24T05:57:14Z jumpstart: Using policy dir /etc/vmware/secpolicy

2012-10-24T05:57:14Z jumpstart: Parsed all objects

2012-10-24T05:57:14Z jumpstart: Objects defined and obsolete objects removed

2012-10-24T05:57:14Z jumpstart: Parsed all domain names

2012-10-23T21:01:40.070Z [2B09BB90 verbose 'vim.PerformanceManager'] HostCtl Exception in stats collection: Sysinfo error on operation returned status : Not found. Please see the VMkernel log for detailed error information

2012-10-23T21:01:40.070Z [2B09BB90 verbose 'vim.PerformanceManager'] HostCtl Exception in stats collection.  Turn on 'trivia' log for details

2012-10-23T21:02:00.026Z [2B05AB90 warning 'vim.PerformanceManager'] Calculated read I/O size 602112 for scsi0:0 is out of range -- 602112,prevBytes = 1577965419008 curBytes = 1577995524608 prevCommands = 56798322curCommands = 56798372

2012-10-23T21:02:00.063Z [2B05AB90 verbose 'vim.PerformanceManager'] HostCtl Exception in stats collection: Sysinfo error on operation returned status : Not found. Please see the VMkernel log for detailed error information

2012-10-23T21:02:00.063Z [2B05AB90 verbose 'vim.PerformanceManager'] HostCtl Exception in stats collection.  Turn on 'trivia' log for details

2012-10-23T21:02:03.585Z [2AD81B90 verbose 'SoapAdapter'] Responded to service state request

2012-10-23T21:02:04.477Z [2A981B90 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root

2012-10-23T21:02:19.680Z [2AD81B90 verbose 'DvsManager'] PersistAllDvsInfo called

2012-10-23T21:02:20.023Z [2B019B90 warning 'vim.PerformanceManager'] Calculated read I/O size 577308 for scsi0:0 is out of range -- 577308,prevBytes = 1577995524608 curBytes = 1578026699264 prevCommands = 56798372curCommands = 56798426

2012-10-23T21:02:20.057Z [2B019B90 verbose 'vim.PerformanceManager'] HostCtl Exception in stats collection: Sysinfo error on operation returned status : Not found. Please see the VMkernel log for detailed error information

2012-10-23T21:02:20.057Z [2B019B90 verbose 'vim.PerformanceManager'] HostCtl Exception in stats collection.  Turn on 'trivia' log for details

2012-10-23T21:02:24.707Z [2B09BB90 info 'Snmpsvc'] UpdateStats: report cimom converter stats started

2012-10-23T21:02:24.707Z [2B09BB90 info 'Snmpsvc'] DumpStats: cimom stats file /tmp/.cvtcimsnmp.xml generated, size 298 bytes.

2012-10-23T21:02:24.707Z [2B09BB90 info 'Snmpsvc'] PublishReport: file /tmp/.cvtcimsnmp.xml published as /tmp/cvtcimsnmp.xml

2012-10-23T21:02:24.707Z [2B09BB90 info 'Snmpsvc'] DumpStats: cimom stats file published

2012-10-23T21:02:24.707Z [2B09BB90 info 'Snmpsvc'] NotifyAgent: write(51, /var/run/snmp.ctl, N) 1 bytes to snmpd

2012-10-23T21:02:24.707Z [2B09BB90 info 'Snmpsvc'] UpdateStats: report cimom converter stats completed

Section for VMware ESX, pid=5084, version=5.1.0, build=799733, option=Release

------ In-memory logs start --------

mem> 2012-10-24T06:00:56.807Z [FFEA8D20 verbose 'Default'] No update forwarding configured

mem> 2012-10-24T06:00:56.807Z [FFEA8D20 info 'Default'] Supported VMs 87

mem> 2012-10-24T06:00:56.807Z [FFEA8D20 info 'Handle checker'] Setting system limit of 2222

mem> 2012-10-24T06:00:56.807Z [FFEA8D20 info 'Handle checker'] Set system limit to 2222

mem> 2012-10-24T06:00:56.807Z [FFEA8D20 info 'Default'] Setting malloc mmap threshold to 32 k

mem> 2012-10-24T06:00:56.807Z [FFEA8D20 info 'Default'] getrlimit(RLIMIT_NPROC): curr=64 max=128, return code = Success

mem> 2012-10-24T06:00:56.807Z [FFEA8D20 info 'Default'] setrlimit(RLIMIT_NPROC): curr=128 max=128, return code = Success

kspare
Enthusiast

I've done quite a bit of work with QNAP; I'm currently running an EC1279U-RP and I gave up on iSCSI. I found NFS to work much, much better.

You should do very well with NFS and an actual 10 Gig link.
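
Mounting the export as a datastore is a one-liner on the host if you want to try it - the IP, export path and volume name below are just examples, not your actual values:

esxcli storage nfs add --host=192.168.10.50 --share=/VMware --volume-name=qnap-nfs
esxcli storage nfs list     # verify the datastore is mounted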

Henri
Contributor

Hi,

I have downgraded the firmware and now run 50% of the VMs on an NFS datastore. At first this seemed to help, but as soon as there is more I/O activity (20-30 MBytes/sec) it is as bad as iSCSI. Migrating a 300 GB VM from the iSCSI to the NFS datastore took a very long time and the disk latency got very bad (up to 500 ms).

Anybody else?

Thanks

Henri

kspare
Enthusiast

I'm running firmware version 3.7.1 build 20120614, if that helps.

Also, what RAID configuration are you running on your box?

Henri
Contributor

Hi,

I have a RAID 6 here with 6 Hitachi 2 TB drives.

Firmware is now 3.7.2.

Could you post a latency diagram from vCenter while running some I/O?
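
(If a vCenter chart is awkward to export, a batch esxtop capture shows the same latency figures, DAVG/KAVG/GAVG per device. Just a sketch; the sample interval, count and output path are examples:

esxtop -b -d 5 -n 360 > /vmfs/volumes/datastore1/esxtop-latency.csv     # 5-second samples for 30 minutes
Or interactively: run esxtop, press 'u' for the device view and watch DAVG/cmd vs. KAVG/cmd.)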

Thanks

Henri

kspare
Enthusiast

Try switching to RAID 5; you will see much better results.

Henri
Contributor

Hi,

The freeze was mainboard related: the memory controller for slots 1-4 is defective. The server currently runs with slots 5-8 without further issues.

The performance problem still exists.

Is anybody out there who successfully runs vSphere 5.1 with the QNAP TS-879 (or better) and ordinary HDDs?

I have additionally installed 2 SSDs where the VMDKs live; this works fine and the performance is good. As soon as I move some VMs to the RAID 6 (6 Hitachi 2 TB drives), the latency is 120 ms on average and up to 600 ms per I/O. I had a lot of trouble with Veeam at this performance.

Thanks

Henri

AlexsYB
Enthusiast

I have major issues with QNAP NFS performance on ESXi 5 and ESXi 4:

TS-459U+

TS-469U-RP

TS-859U


All are patched up to the latest QNAP firmware and attached directly to a 10G switch; the QNAPs are connected to 1G ports. I use the advanced load-balancing bonding mode, and they are loaded with server HDDs.

What I have found is that at low I/O, running a few VMs (5-10), they are okay. I have tried iSCSI and NFS, BUT (I found this out when I moved my VDP VM onto the QNAP) when you have a lot of IOPS (also noticed when I moved my Zabbix monitoring box onto the QNAP datastore), the QNAPs fail badly: I start to get 500 ms - 1.5 s delays. I also notice the throughput maxes out at around 9-10 MB/s.


I have raised SRs with VMware; of course they blame QNAP. I have raised 2 or 3 support requests with QNAP and tried their forums. The best I got was "join us on Skype and we can diagnose the problem"... yeah right, we tried a few things, but they don't seem to have an idea and basically don't really care.


Now I am stuck with 3 QNAP devices that are basically pieces of shit... BUT! I did some experimenting and found that if I mount the space directly in the VM, i.e. an NFS mount inside a Linux VM or an iSCSI mount directly into Windows, I actually get some performance, a lot better than what an iSCSI- or NFS-backed VMware datastore gives me.
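
Roughly what the in-guest mount looks like on the Linux side - the IP and export path are placeholders for whatever your setup uses:

mount -t nfs -o vers=3,hard,intr 192.168.10.50:/share /mnt/qnap     # NFSv3 straight from the guest, bypassing the VMware datastore layer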


I also found a problem when the RAID array rebuilds... performance is a dog, the rebuild rate is around 8-10 MB/s... for a 10 TB array that takes forever... My home-built system does 110 MB/s easily and still serves multimedia. The response I got from QNAP support: "oh, wait till it's finished and then we can look for any faults"... Not sure how you diagnose a problem once it's finished... But it leads me to think they have gone ultra cheap on the SATA/SAS bus.
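
If the firmware exposes the standard Linux mdraid controls over SSH (I have not confirmed this on every model), the rebuild floor can in principle be raised like this - the value is an example, in KB/s per device:

cat /proc/mdstat                                   # rebuild progress and current speed
cat /proc/sys/dev/raid/speed_limit_min             # current minimum rebuild rate
echo 50000 > /proc/sys/dev/raid/speed_limit_min    # raise the floor to ~50 MB/s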


SO I WOULD STAY AWAY FROM QNAP IF YOU PLAN TO DO ANYTHING SERIOUS.


asus77x
Contributor

Hi Alex,

I'm interested in what you posted about load balancing. What I understand is that you run 3 QNAPs and use load balancing to back up VMs using NFS, correct me if I'm wrong. If that is correct, is the load balancing for the backup process? How do you achieve that?
