Hi Community,
I need some help getting rid of a performance problem.
If I create a snapshot of a VM on my NetApp array with the "snapshot the VM's memory" option, the VMkernel port is not saturated, so the creation process takes a long time.
As a benchmark I started a Storage vMotion, which does saturate my array.
PORT-ID USED-BY TEAM-PNIC DNAME PKTTX/s MbTX/s PKTRX/s MbRX/s %DRPTX %DRPRX ACTN/s
50331652 vmk2 vmnic5 vSwitch1 26589.29 1997.36 33060.58 2011.38 0.00 0.00 17132.35 svmotion
50331652 vmk2 vmnic5 vSwitch1 1634.88 173.42 2201.95 38.39 0.00 0.00 1846.65 snap
DEVICE PATH/WORLD/PARTITION DQLEN WQLEN ACTV QUED %USD LOAD CMDS/s READS/s WRITES/s MBREAD/s MBWRTN/s DAVG/cmd KAVG/cmd GAVG/cmd QAVG/cmd
{NFS}Test - - - 6 - - - 4016.83 0.58 4016.25 0.01 250.71 1.39 - 6.74 - svmotion from sas array to the sata test array
{NFS}Test - - - 0 - - - 213.62 30.52 183.11 0.48 19.73 0.38 - 0.59 - create snapshot on the array
Network path for NFS:
Intel 82599 10 Gbit NIC -> vmnic5 -> vmk2 -> MTU 9000
Settings are as described in the NetApp manual.
I isolated a single VM on the ESX host and datastore to avoid any concurrency.
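To rule out the jumbo frame path, it can be verified end to end from the ESXi shell with something like this (the filer IP below is a placeholder; 8972 bytes of payload leaves room for the IP/ICMP headers within an MTU of 9000):
esxcfg-vmknic -l                 # confirm vmk2 really has MTU 9000
vmkping -d -s 8972 <filer-ip>    # -d sets don't-fragment, so this fails if any hop drops jumbo frames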
It would be awesome if anyone could give me a hint.
Hi,
I think that's normal behavior and not a bottleneck.
I tried a snapshot with memory today on a really fast array and a host without any bottlenecks.
The VM had 4 GB of memory in an idle state.
The task took about 2.5 minutes.
If the snapshot includes the memory option, the ESX host writes the memory of the virtual machine to disk.
Note: The virtual machine is stunned throughout the duration of time the memory is being written. The length of time of the stun cannot be pre-calculated and depends on the performance of the disk and the amount of memory being written. ESXi/ESX 4.x and later have shorter stun times when memory is being written. For more information, see Taking a snapshot with virtual machine memory stuns the virtual machine while the memory is written ...
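If you want to time it outside of vCenter, a memory snapshot can also be triggered from the ESXi shell, roughly like this (the VM id and snapshot name are placeholders; double-check the argument order of vim-cmd vmsvc/snapshot.create on your build first):
vim-cmd vmsvc/getallvms                                             # look up the Vmid of the VM
time vim-cmd vmsvc/snapshot.create <vmid> memsnap "timing test" 1 0 # 1 = include memory, 0 = no quiesce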
Can you run and post a "sysstat -x 2" from your array during this operation?
Oh, which ONTAP version are you using?
Is jumbo frame support enabled end to end (VMkernel, vSwitch, physical switch, NetApp vif)?
Can you also check this NetApp option: options nfs.tcp.recvwindowsize
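All of that can be pulled from the 7-Mode CLI, e.g.:
version                            # ONTAP release
options nfs.tcp.recvwindowsize     # current NFS TCP receive window size
sysstat -x 2                       # extended stats every 2 seconds while the snapshot runs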
Hi,
ONTAP is 8.2 7-Mode. Jumbo frames are enabled on the vmk, vSwitch, Juniper switch, and NetApp. Flow control is disabled.
sysstat result:
CPU NFS CIFS HTTP Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk OTHER FCP iSCSI FCP kB/s iSCSI kB/s
in out read write read write age hit time ty util in out in out
7% 327 0 0 327 21105 170 1546 46706 0 0 10s 99% 36% H 20% 0 0 0 0 0 0 0
26% 1812 0 0 1819 118894 979 3904 149920 0 0 10s 97% 42% Hs 34% 7 0 0 0 0 0 0
30% 2169 0 0 2173 142749 1152 4777 151289 0 0 10s 98% 43% H 39% 4 0 0 0 0 0 0
7% 339 0 0 341 21437 173 1974 46820 0 0 10s 98% 41% H 19% 2 0 0 0 0 0 0
6% 413 0 0 418 26667 255 2736 11036 0 0 10s 99% 16% Hf 7% 5 0 0 0 0 0 0
5% 339 0 0 341 21778 175 150 35670 0 0 10s 99% 58% : 13% 2 0 0 0 0 0 0
8% 434 0 0 578 27055 263 2080 43976 0 0 10s 98% 58% Hf 13% 144 0 0 0 0 0 0
4% 386 0 0 391 24954 217 42 5880 0 0 10s 99% 18% : 11% 5 0 0 0 0 0 0
7% 435 0 0 438 28070 233 2896 47290 0 0 10s 99% 69% H 21% 3 0 0 0 0 0 0
The window size is set to options nfs.tcp.recvwindowsize 65535.
I have no clue why the snapshot is slow while dd or a migration is fast.
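(For a rough sequential-write comparison from the ESXi shell, something along these lines can be used; the datastore name is just an example, and the result is only an upper bound since it is a single sequential stream:)
time dd if=/dev/zero of=/vmfs/volumes/Test/ddtest.bin bs=1048576 count=8192
rm /vmfs/volumes/Test/ddtest.bin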
OK,
that also looks fine.
Only the CP type is interesting: Hf.
CP ty: the Consistency Point (CP) type is the reason that a CP started in that interval.
The type character is followed by a second character which indicates the phase of the CP at the end of the sampling interval. If the CP completed during the sampling interval, this second character is blank. (See the sysstat man page for the full list of type and phase codes.)
And in a 10G environment, set options nfs.tcp.recvwindowsize to 64240.
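On the filer that is simply:
options nfs.tcp.recvwindowsize 64240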
Is the snapshot process fast without memory?
How large is the memory of the VM?
Can you check the esxtop memory counter %ACTV during the snapshot and in normal operation?
Interpreting esxtop Statistics
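To watch it live, run esxtop and press m for the memory view (the %ACTV counters may need to be enabled via the f field selector), or capture a few minutes in batch mode, for example:
esxtop -b -d 2 -n 60 > snapshot_mem.csv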
Hello and thanks for the quick reply.
I have changed the window size option nfs.tcp.recvwindowsize to 64240.
And yes, a snapshot without memory takes 1 second.
Good case: if I move the VM with Storage vMotion from the SAS filer to the SATA filer, I can write to the test datastore at roughly 2xx MB/s, which saturates the array:
CPU NFS CIFS HTTP Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk OTHER FCP iSCSI FCP kB/s iSCSI kB/s
in out read write read write age hit time ty util in out in out
63% 4484 0 0 4485 297134 2339 14301 348175 0 0 0s 97% 86% Hn 74% 1 0 0 0 0 0 0
67% 4410 0 0 4508 291567 2292 9632 363466 0 0 0s 98% 91% H 80% 98 0 0 0 0 0 0
63% 4484 0 0 4485 297134 2339 14301 348175 0 0 0s 97% 86% Hn 74% 1 0 0 0 0 0 0
67% 4410 0 0 4508 291567 2292 9632 363466 0 0 0s 98% 91% H 80% 98 0 0 0 0 0 0
64% 4390 0 0 4392 291666 2300 12804 373094 0 0 0s 97% 80% Hs 70% 2 0 0 0 0 0 0
63% 4512 0 0 4621 298393 2346 11718 356983 0 0 0s 97% 82% Hs 61% 109 0 0 0 0 0 0
59% 4335 0 0 4377 287452 2259 10293 325769 0 0 0s 97% 79% Hn 66% 42 0 0 0 0 0 0
62% 4497 0 0 4497 297907 2341 11590 357236 0 0 0s 97% 87% H 72% 0 0 0 0 0 0 0
63% 4481 0 0 4484 296841 2333 10870 361800 0 0 0s 97% 83% Hv 71% 3 0 0 0 0 0 0
63% 4447 0 0 4451 294783 2323 9622 349846 0 0 0s 97% 79% Hv 64% 4 0 0 0 0 0 0
62% 4530 0 0 4530 300362 2349 12162 343260 0 0 0s 97% 84% Hn 72% 0 0 0 0 0 0 0
63% 4492 0 0 4495 297670 2325 10587 365699 0 0 0s 97% 81% H 66% 3 0 0 0 0 0 0
60% 4212 0 0 4214 278982 2188 9732 349126 0 0 0s 97% 79% Hv 63% 2 0 0 0 0 0 0
62% 4426 0 0 4426 293402 2314 10519 351468 0 0 0s 97% 80% Hf 66% 0 0 0 0 0 0 0
63% 4470 0 0 4474 296371 2351 13604 357776 0 0 0s 97% 84% Hv 67% 4 0 0 0 0 0 0
61% 4474 0 0 4476 296520 2333 10526 334506 0 0 0s 97% 87% H 65% 2 0 0 0 0 0 0
62% 4356 0 0 4358 288532 2284 12276 360280 0 0 0s 97% 93% H 76% 2 0 0 0 0 0 0
61% 4223 0 0 4224 280289 2211 13394 356961 0 0 0s 97% 78% Hs 65% 1 0 0 0 0 0 0
61% 4371 0 0 4372 289224 2290 9811 348589 0 0 0s 97% 79% Hs 67% 1 0 0 0 0 0 0
63% 4472 0 0 4474 296450 2336 13253 355076 0 0 0s 97% 83% Hf 68% 2 0 0 0 0 0 0
62% 4391 0 0 4394 291168 2298 12442 354838 0 0 0s 97% 87% Hs 71% 3 0 0 0 0 0 0
62% 4492 0 0 4500 297652 2357 10366 343166 0 0 0s 97% 84% Hn 72% 8 0 0 0 0 0 0
3% 296 0 0 296 19360 152 4 12 0 0 0s 98% 0% - 2% 0 0 0 0 0 0 0
Bad case: a snapshot with memory on the test datastore:
6% 343 0 0 345 22572 177 3618 43658 0 0 0s 97% 68% Hf 18% 2 0 0 0 0 0 0
7% 658 0 0 662 42141 336 1712 22736 0 0 0s 94% 25% Hf 11% 4 0 0 0 0 0 0
6% 311 0 0 315 20044 158 1190 30324 0 0 0s 99% 23% Hf 11% 4 0 0 0 0 0 0
8% 420 0 0 427 27845 217 3332 44034 0 0 0s 98% 70% : 23% 7 0 0 0 0 0 0
6% 350 0 0 350 23204 181 1508 36868 0 0 0s 96% 48% Hf 11% 0 0 0 0 0 0 0
4% 312 0 0 314 20685 162 26 10238 0 0 0s 97% 21% : 11% 2 0 0 0 0 0 0
8% 328 0 0 333 21248 168 4180 46686 0 0 0s 96% 69% H 20% 5 0 0 0 0 0 0
8% 330 0 0 477 21878 170 2942 46850 0 0 0s 99% 78% H 16% 147 0 0 0 0 0 0
ESXTOP during snapshot
GID NAME MEMSZ GRANT SZTGT TCHD TCHD_W %ACTV %ACTVS %ACTVF %ACTVN SWCUR SWTGT SWR/s SWW/s LLSWR/s LLSWW/s OVHDUW OVHD OVHDMAX
22883 VM1 8192.00 7988.00 7379.71 1310.72 81.92 18 7 16 5 0.00 0.00 0.00 0.00 0.00 0.00 10.30 76.36 77.53
ESXTOP after snap
22883 VM1 8192.00 7988.00 7069.84 409.60 81.92 1 0 0 0 0.00 0.00 0.00 0.00 0.00 0.00 10.30 66.54 73.78
That also looks good...
How long does a "long-running" snapshot creation take for you?
I don't think it's a NetApp bottleneck. But the last thing we can check during the snapshot is this:
priv set advanced # Set permissions mode for advanced admin rights
statit -b # Start statit data capture in background
statit -e # Stop statit capture and dump data
priv set # Return permissions mode to default
After "statit -b" a large output is done. Can you post it?
And then we need to dig into esxtop:
esxtop -b -a -d 2 -n 300 | gzip -9c > esxtopoutput.csv.gz
You need to check that 300 iterations is long enough (300 samples at a 2-second delay is about 10 minutes). Then we can dig in with VisualEsxtop.
Hi Markus, thank you for your time.
a "long" snapshot takes 7 minutes. Its not that it is that bad for me but i am interested if this is a default limit of vmware or misconfig on my side.
Here the output of the statit -b and esxtop attached
5.59 blocks read 0.76 blocks read-ahead
0.12 chains read-ahead 0.00 dummy reads
0.35 blocks speculative read-ahead 28249.63 blocks written
52.85 stripes written 192.07 blocks page flipped
0.00 blocks over-written 0.00 wafl_timer generated CP
0.00 snapshot generated CP 0.00 wafl_avail_bufs generated CP
1.40 dirty_blk_cnt generated CP 0.00 full NV-log generated CP
0.00 back-to-back CP 0.00 flush generated CP
0.00 sync generated CP 0.00 deferred back-to-back CP
0.00 low mbufs generated CP 0.00 low datavecs generated CP
0.00 nvlog replay takeover time limit CP 3130.75 non-restart messages
0.00 IOWAIT suspends 0.00 next nvlog nearly full msecs
0.00 dirty buffer susp msecs 0.00 nvlog full susp msecs
0.00 nvlh susp msecs 611640 buffers
RAID Statistics (per second)
3454.59 xors 0.00 long dispatches [0]
0.00 long consumed [0] 0.00 long consumed hipri [0]
0.00 long low priority [0] 0.00 long high priority [0]
99.99 long monitor tics [0] 0.00 long monitor clears [0]
0.00 long dispatches [1] 0.00 long consumed [1]
0.00 long consumed hipri [1] 0.00 long low priority [1]
99.99 long high priority [1] 99.99 long monitor tics [1]
0.00 long monitor clears [1] 18 max batch
11.58 blocked mode xor 483.14 timed mode xor
4.19 fast adjustments 3.61 slow adjustments
0 avg batch start 0 avg stripe/msec
0.00 checksum dispatches 0.00 checksum consumed
53.90 tetrises written 0.00 master tetrises
0.00 slave tetrises 3168.41 stripes written
239.62 partial stripes 2928.79 full stripes
27414.30 blocks written 1040.67 blocks read
16.06 1 blocks per stripe size 8 16.12 2 blocks per stripe size 8
14.61 3 blocks per stripe size 8 14.67 4 blocks per stripe size 8
14.96 5 blocks per stripe size 8 16.82 6 blocks per stripe size 8
16.82 7 blocks per stripe size 8 1475.98 8 blocks per stripe size 8
18.92 1 blocks per stripe size 10 20.37 2 blocks per stripe size 10
11.87 3 blocks per stripe size 10 12.57 4 blocks per stripe size 10
6.64 5 blocks per stripe size 10 11.58 6 blocks per stripe size 10
11.12 7 blocks per stripe size 10 20.37 8 blocks per stripe size 10
16.12 9 blocks per stripe size 10 1452.81 10 blocks per stripe size 10
Network Interface Statistics (per second)
iface side bytes packets multicasts errors collisions pkt drops
e0a recv 952.84 9.72 0.00 0.00 0.00
xmit 113.90 0.70 0.00 0.00 0.00
e0b recv 0.00 0.00 0.00 0.00 0.00
xmit 0.00 0.00 0.00 0.00 0.00
e5a recv 0.00 0.00 0.00 0.00 0.00
xmit 0.00 0.00 0.00 0.00 0.00
e5b recv 120023224.89 63367.24 0.00 0.00 0.00
xmit 2061420.05 34818.91 0.00 0.00 0.00
c0a recv 408.59 15.37 1.98 0.00 0.00
xmit 389.73 2.10 1.98 0.00 0.00
c0b recv 408.59 2.10 1.98 0.00 0.00
xmit 389.73 2.10 1.98 0.00 0.00
e0M recv 0.00 0.00 0.00 0.00 0.00
xmit 0.00 0.00 0.00 0.00 0.00
e0P recv 126.42 1.75 0.00 0.00 0.00
xmit 127.81 1.75 0.00 0.00 0.00
vif0 recv 921.76 9.37 8.09 0.00 0.00
xmit 113.90 0.70 0.00 0.00 0.00
STORAGE recv 116012630.10 61321.05 0.35 0.00 0.00
xmit 1993618.25 33690.35 0.00 0.00 0.00
Disk Statistics (per second)
ut% is the percent of time the disk was busy.
xfers is the number of data-transfer commands issued per second.
xfers = ureads + writes + cpreads + greads + gwrites
chain is the average number of 4K blocks per command.
usecs is the average disk round-trip time per 4K block.
disk ut% xfers ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
/aggr0/plex0/rg0:
3d.20.4 34 38.30 0.70 1.00 1083 30.80 52.07 386 6.81 22.38 637 0.00 .... . 0.00 .... .
3a.20.1 33 38.30 0.70 1.00 917 30.80 52.07 382 6.81 21.85 655 0.00 .... . 0.00 .... .
3d.20.2 31 35.63 0.35 1.00 13333 27.59 55.69 369 7.68 6.11 1492 0.00 .... . 0.00 .... .
3d.20.0 23 28.12 0.35 1.00 21500 24.97 61.40 306 2.79 18.71 703 0.00 .... . 0.00 .... .
3a.20.3 24 28.06 0.47 1.00 7875 25.15 61.15 328 2.44 16.52 901 0.00 .... . 0.00 .... .
3a.20.5 23 27.88 0.29 2.00 2400 25.03 61.33 305 2.56 21.09 582 0.00 .... . 0.00 .... .
3d.20.6 23 27.83 0.06 1.00 28000 25.15 61.04 290 2.62 19.02 589 0.00 .... . 0.00 .... .
3a.20.7 23 28.12 0.17 1.00 39333 25.09 61.18 308 2.85 19.78 711 0.00 .... . 0.00 .... .
3d.20.8 23 27.71 0.23 1.00 27750 25.03 61.21 293 2.44 16.57 721 0.00 .... . 0.00 .... .
3a.20.9 24 28.47 0.81 1.00 22000 25.09 61.18 331 2.56 17.45 743 0.00 .... . 0.00 .... .
3d.20.10 23 27.36 0.12 1.00 23000 25.03 61.44 317 2.21 18.03 797 0.00 .... . 0.00 .... .
3a.20.11 23 27.94 0.29 1.00 26000 25.09 61.09 319 2.56 15.02 834 0.00 .... . 0.00 .... .
/aggr0/plex0/rg1:
3d.20.12 21 29.81 0.00 .... . 25.96 61.82 241 3.84 31.86 249 0.00 .... . 0.00 .... .
3a.20.13 22 29.81 0.00 .... . 25.96 61.68 268 3.84 32.12 274 0.00 .... . 0.00 .... .
3d.20.14 21 27.59 0.17 1.00 27333 25.26 61.31 262 2.15 18.59 452 0.00 .... . 0.00 .... .
3a.20.15 21 27.88 0.12 1.00 22000 25.26 61.39 249 2.50 20.35 479 0.00 .... . 0.00 .... .
3a.20.17 21 27.36 0.00 .... . 25.09 61.82 248 2.27 21.95 501 0.00 .... . 0.00 .... .
3d.20.18 20 27.71 0.29 1.00 5200 25.21 61.52 240 2.21 15.05 517 0.00 .... . 0.00 .... .
3a.20.19 21 28.87 0.64 1.00 4182 25.26 61.34 250 2.97 21.22 425 0.00 .... . 0.00 .... .
3d.20.20 21 27.94 0.47 1.00 16250 25.21 61.52 284 2.27 19.79 473 0.00 .... . 0.00 .... .
3a.20.21 21 27.83 0.29 1.00 40200 25.26 61.41 254 2.27 16.13 585 0.00 .... . 0.00 .... .
3a.20.23 22 27.77 0.41 1.00 29429 25.03 61.90 306 2.33 20.00 565 0.00 .... . 0.00 .... .
Aggregate statistics:
Minimum 20 27.36 0.00 24.97 2.15 0.00 0.00
Mean 24 29.34 0.29 25.79 3.20 0.00 0.00
Maximum 34 38.30 0.81 30.80 7.68 0.00 0.00
Spares and other disks:
3d.20.16 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.0 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.1 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.3 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.5 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.7 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.2 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.9 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.4 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.12 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.6 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.10 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.14 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.15 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.11 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.8 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.13 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.17 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.19 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.16 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.20 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.18 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.21 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.22 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.0 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.1 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.2 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.4 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.3 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
0a.10.23 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.6 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.8 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.14 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.10 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.7 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.5 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.13 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.16 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.9 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.11 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.12 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.15 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.18 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.17 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.20 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.19 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.21 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.23 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
3b.11.22 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
FCP Statistics (per second)
0.00 FCP Bytes recv 0.00 FCP Bytes sent
0.00 FCP ops
iSCSI Statistics (per second)
0.00 iSCSI Bytes recv 0.00 iSCSI Bytes xmit
0.00 iSCSI ops
Interrupt Statistics (per second)
11.87 int_1 0.64 Flash Cache DMA (IRQ 2)
5498.81 int_3 285.54 int_4
339.50 int_5 0.35 int_6
0.35 int_7 0.35 int_8
0.35 int_9 0.35 int_11
19.27 int_12 9.60 Gigabit Ethernet (IRQ 13)
3.49 int_16 0.00 RTC
0.00 IPI 999.93 Msec Clock
7170.40 total
I can't post the esxtop output, sorry, but there is no wait, no swap, and no counter that I can imagine explaining the 10% issue.
If you give me your email address and want to spend some more time on it, I'll send the CSV by mail.
Additionally, if you have access to a NetApp system with ESXi and NFS, can you check whether a snapshot with memory really uses only 10% of the storage migration throughput?
Maybe it's just an API limit.
Hey Markus,
thanks for the benchmark.
I did some experiments too, and I still think there is an NFS bottleneck; in my opinion your benchmark proves it.
Today I copied my VM to local SAS disks and saw that creating a snapshot of the VM with 8 GB RAM took less than 4 minutes.
After this test I copied the VM back onto NFS, now with just a Gbit link, and again it took 8 minutes.
It's not a real bottleneck, but it does not make sense that a snapshot on two local RAID 1 disks takes less than half the time it takes on an NFS datastore with 20 disks behind a 10 Gbit link on an otherwise idle host.
Especially since it makes no difference whether the NFS backend is SATA or SAS, and 10 Gbit vs. 1 Gbit does not change anything either.
It is, as you say, not a bottleneck; it must be something deep in the NFS stack or the snapshot API. Because with Storage vMotion I can copy 100 GB of data to the same store in roughly 10 minutes, while on the other hand 8 GB of memory content of an idle machine takes nearly the same amount of time.
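Rough numbers from the tests above: 100 GB in about 10 minutes is roughly 170 MB/s, while 8 GB of memory in 7-8 minutes is only about 17-20 MB/s, so roughly a tenth of what the same datastore sustains during Storage vMotion, which matches the ~10% I see in esxtop.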
Maybe I'll open a support case, now that I have talked it through with you and checked for misconfiguration, just to clarify what performance is expected and why NFS is so slow compared to local disks.
Or I'll just try another NFS target or iSCSI as a cross-check.
Thanks for your awesome support. I will let you know if I can finally figure out whether this is normal behavior or just weird NFS stuff.