7 Replies Latest reply on Jun 2, 2010 11:45 PM by richardt

    Storage Vmotion stops  at 18%

    richardt Novice

      Hi all,

      Got a problem with Storage vMotion in a cluster of 4 ESX hosts (vSphere 4.0.0, build 208167).

      vMotion works fine: there are no errors or warnings in the hosts' vmkernel or vmkwarning logs, and a check of the storage and paths found no problems. In a test, a VM could be Storage vMotioned on only two of the four ESX hosts in the cluster. The same error is logged in the VM's vmware.log each time.

      Host capacity: memory 77%, CPU 34%

       

       

      The VM's vmware.log reports the following error:

      May 06 15:40:07.831: Worker#0| DISKLIB-VMFS :CopyData '/vmfs/volumes/4ae53235-da7ef348-3f56-001e4f1dd095/ns1/ns1-flat.vmdk' : failed to move data (Cannot allocate memory:0xc0009).

       

       

      Any ideas?

      May 06 15:40:02.104: vmx| VMXVmdbCbVmVmxMigrate: Got SET callback for /vm/#_VMX/vmx/migrateState/cmd/##9_20cc2/op/=to

      May 06 15:40:02.105: vmx| VmxMigrateGetParam: type: 2 srcIp=<10.1.2.7> dstIp=<10.1.2.7> mid=485e65f566323 uuid=44454c4c-3600-1033-8058-b9c04f323253 priority=none checksumMemory=no maxDowntime=0 encrypted=0 resumeDuringPageIn=no

      May 06 15:40:02.107: vmx| VmxMigrateGetParam: type 2 unsharedSwap 1 memMinToTransfer 0 cpuMinToTransfer 0 numDisks 1

      May 06 15:40:02.111: vmx| Received migrate 'to' request for mid id 1273124395311907, src ip <10.1.2.7>, dst ip <10.1.2.7>.

      May 06 15:40:02.142: vmx| MigrateSetInfo: state=1 srcIp=<10.1.2.7> dstIp=<10.1.2.7> mid=1273124395311907 uuid=44454c4c-3600-1033-8058-b9c04f323253 priority=low

      May 06 15:40:02.143: vmx| MigrateStateUpdate: Transitioning from state 0 to 1.

      May 06 15:40:06.707: vmx| VMXVmdbCbVmVmxMigrate: Got SET callback for /vm/#_VMX/vmx/migrateState/cmd/##9_20ccc/op/=start

      May 06 15:40:06.718: vmx| VmxMigrateGetStartParam: mid=485e65f566323 dstwid=165341

      May 06 15:40:06.718: vmx| Received migrate 'start' request for mig id 1273124395311907, dest world id 165341.

      May 06 15:40:06.720: vmx| WORKER: Creating new group with numThreads=1 (4)

      May 06 15:40:06.723: vmx| CPT current = 0, requesting 1

      May 06 15:40:06.724: vmx| MigrateStateUpdate: Transitioning from state 1 to 2.

      May 06 15:40:06.725: vmx| Set thread 5 stack size to 1048576: Success

      May 06 15:40:06.726: vcpu-0| Sync monModules(1).

      May 06 15:40:06.729: vcpu-0| Done Sync monModules(1).

      May 06 15:40:06.729: vmx| CPT: monitor ACKing mode 1

      May 06 15:40:06.730: vmx| Destroying virtual dev for scsi0:0 vscsi=44625

      May 06 15:40:06.730: vmx| VMMon_VSCSIStopVports: No such target on adapter

      May 06 15:40:06.731: vmx| DISKLIB-VMFS  : "/vmfs/volumes/46f9b29c-8959b018-376f-001aa025a31a/ns1/ns1-flat.vmdk" : closed.

      May 06 15:40:06.732: vmx| CPT current = 1, requesting 2

      May 06 15:40:06.733: vmx| CPT: monitor ACKing mode 2

      May 06 15:40:07.553: vmx| CPT current = 2, requesting 6

      May 06 15:40:07.563: vmx| Checkpoint_Unstun: vm stopped for 840003 us

      May 06 15:40:07.564: vcpu-0| Sync monModules(6).

      May 06 15:40:07.564: vcpu-0| Done Sync monModules(6).

      May 06 15:40:07.564: vmx| CPT: monitor ACKing mode 6

      May 06 15:40:07.569: vmx| DISK: OPEN scsi0:0 '/vmfs/volumes/46f9b29c-8959b018-376f-001aa025a31a/ns1/ns1.vmdk' persistent R[]

      May 06 15:40:07.590: vmx| DISKLIB-VMFS  : "/vmfs/volumes/46f9b29c-8959b018-376f-001aa025a31a/ns1/ns1-flat.vmdk" : open successful (10) size = 18376324096, hd = 101089547. Type 3

      May 06 15:40:07.590: vmx| DISKLIB-DSCPTR: Opened : "ns1-flat.vmdk" (0xa)

      May 06 15:40:07.590: vmx| DISKLIB-LINK  : Opened '/vmfs/volumes/46f9b29c-8959b018-376f-001aa025a31a/ns1/ns1.vmdk' (0xa): vmfs, 35891258 sectors / 17.1 GB.

      May 06 15:40:07.590: vmx| DISKLIB-LIB   : Opened "/vmfs/volumes/46f9b29c-8959b018-376f-001aa025a31a/ns1/ns1.vmdk" (flags 0xa).

      May 06 15:40:07.591: vmx| DISK: OPEN '/vmfs/volumes/46f9b29c-8959b018-376f-001aa025a31a/ns1/ns1.vmdk' Geo (2234/255/63) BIOS Geo (0/0/0)

      May 06 15:40:07.592: vmx| DISKLIB-CTK   : Auto blocksize for size 35891258 is 128.

      May 06 15:40:07.603: vmx| DISKLIB-CBT   : Initializing ESX kernel change tracking for fid 101089547.

      May 06 15:40:07.604: vmx| DISKLIB-CBT   : Successfuly created cbt node 606810b-cbt.

      May 06 15:40:07.604: vmx| DISKLIB-CBT   : Opening cbt node /vmfs/devices/cbt/606810b-cbt

      May 06 15:40:07.605: vmx| Creating virtual dev for scsi0:0

      May 06 15:40:07.605: vmx| DumpDiskInfo: scsi0:0 createType=11, capacity = 35891258, numLinks = 1, allocationType = 0

      May 06 15:40:07.606: vmx| SCSIDiskESXPopulateVDevDesc: Using FS backend

      May 06 15:40:07.653: Worker#0| DISKLIB-LIB   : CREATE: "/vmfs/volumes/4ae53235-da7ef348-3f56-001e4f1dd095/ns1/ns1.vmdk" -- vmfs capacity=0 (0 bytes) adapter=buslogic cowGran=0

      May 06 15:40:07.655: Worker#0| DISKLIB-LIB   : DiskLib_CreateChildCreateParam: grain size set to : 1

      May 06 15:40:07.656: Worker#0| DISKLIB-LIB   : CREATE: "/vmfs/volumes/4ae53235-da7ef348-3f56-001e4f1dd095/ns1/ns1.vmdk" -- vmfs capacity=35891258 (17.1 GB) adapter=buslogic cowGran=1

      May 06 15:40:07.730: Worker#0| DISKLIB-LINK  : "/vmfs/volumes/4ae53235-da7ef348-3f56-001e4f1dd095/ns1/ns1.vmdk" : creation successful.

      May 06 15:40:07.731: Worker#0| DISKLIB-VMFS  : "/vmfs/volumes/4ae53235-da7ef348-3f56-001e4f1dd095/ns1/ns1-flat.vmdk" : open successful (17) size = 18376324096, hd = 0. Type 3

      May 06 15:40:07.770: Worker#0| DISKLIB-VMFS  : "/vmfs/volumes/4ae53235-da7ef348-3f56-001e4f1dd095/ns1/ns1-flat.vmdk" : closed.

      May 06 15:40:07.773: Worker#0| DISKLIB-VMFS  : "/vmfs/volumes/4ae53235-da7ef348-3f56-001e4f1dd095/ns1/ns1-flat.vmdk" : open successful (528) size = 18376324096, hd = 151765263. Type 3

      May 06 15:40:07.773: Worker#0| DISKLIB-DSCPTR: Opened : "ns1-flat.vmdk" (0x210)

      May 06 15:40:07.774: Worker#0| DISKLIB-LINK  : Opened '/vmfs/volumes/4ae53235-da7ef348-3f56-001e4f1dd095/ns1/ns1.vmdk' (0x210): vmfs, 35891258 sectors / 17.1 GB.

      May 06 15:40:07.774: Worker#0| DISKLIB-LIB   : Opened "/vmfs/volumes/4ae53235-da7ef348-3f56-001e4f1dd095/ns1/ns1.vmdk" (flags 0x210).

      May 06 15:40:07.817: Worker#0| DISKLIB-DDB   : "longContentID" = "2594bb631353b27093f8f350dd11abc3" (was "329c79216ae7078d034051b559a20051")

      May 06 15:40:07.831: Worker#0| DISKLIB-VMFS :CopyData '/vmfs/volumes/4ae53235-da7ef348-3f56-001e4f1dd095/ns1/ns1-flat.vmdk' : failed to move data (Cannot allocate memory:0xc0009).

      May 06 15:40:07.831: Worker#0| DISKLIB-LIB   : DiskLib_Clone: Could not perform clone using vmkernel data mover. Falling back to non-accelerated clone. Cannot allocate memory

      May 06 15:40:11.880: vcpu-0| HBACommon: First write on scsi0:0.fileName='/vmfs/volumes/46f9b29c-8959b018-376f-001aa025a31a/ns1/ns1.vmdk'

      May 06 15:40:11.898: vcpu-0| DISKLIB-CHAIN : UpdateContentID: old = 0xdd11abc3, new = 0x8ae3b0a2

      May 06 15:40:11.899: vcpu-0| DISKLIB-DDB   : "longContentID" = "b5392b788526fa1622fe17f58ae3b0a2" (was "2594bb631353b27093f8f350dd11abc3")

      May 06 15:40:11.904: vcpu-0| DISKLIB-CHAIN : UpdateContentID: updated long content id

      May 06 15:44:47.816: Worker#0| DISKLIB-LIB   : numIOs = 50000 numMergedIOs = 0 numSplitIOs = 0

      May 06 15:44:47.824: Worker#0| DISKLIB-LIB   : numIOs = 50000 numMergedIOs = 0 numSplitIOs = 0

      May 06 15:49:52.996: Worker#0| DISKLIB-LIB   : numIOs = 100000 numMergedIOs = 0 numSplitIOs = 0

      May 06 15:49:53.031: Worker#0| DISKLIB-LIB   : numIOs = 100000 numMergedIOs = 0 numSplitIOs = 0

      May 06 15:55:42.037: Worker#0| DISKLIB-LIB   : numIOs = 150000 numMergedIOs = 0 numSplitIOs = 0

      May 06 15:55:42.047: Worker#0| DISKLIB-LIB   : numIOs = 150000 numMergedIOs = 0 numSplitIOs = 0
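      The "Falling back to non-accelerated clone" message in the log above reflects a generic try-fast-then-fall-back pattern: attempt the offloaded (kernel data mover) copy first, and if it cannot allocate resources, do a slower host-side chunked copy instead. A minimal Python sketch of that pattern (all names here are hypothetical illustrations, not VMware APIs):

```python
# Illustrative only: a generic "accelerated path with fallback" pattern,
# loosely analogous to what DiskLib_Clone reports in the log above.
# Function and exception names are hypothetical, not VMware APIs.

class AcceleratedCopyError(Exception):
    """Raised when the fast (offloaded) copy path cannot allocate resources."""

def accelerated_copy(src, datamover_ok):
    # Models the data mover path; fails like "Cannot allocate memory"
    # when the kernel cannot satisfy its allocation.
    if not datamover_ok:
        raise AcceleratedCopyError("Cannot allocate memory")
    return bytes(src)

def chunked_copy(src, chunk_size=4):
    # Slow path: copy in small chunks, as a non-accelerated clone would.
    out = bytearray()
    for i in range(0, len(src), chunk_size):
        out += src[i:i + chunk_size]
    return bytes(out)

def clone(src, datamover_ok):
    # Try the accelerated path; on allocation failure, fall back,
    # as the log reports: "Falling back to non-accelerated clone."
    try:
        return accelerated_copy(src, datamover_ok), "accelerated"
    except AcceleratedCopyError:
        return chunked_copy(src), "non-accelerated"
```

      Note that the fallback still produces a correct copy; it is just slower, which is consistent with the migration continuing (the numIOs counters above) after the error is logged.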

       

      Message was edited by: richardt

       


        • 1. Re: Storage Vmotion stops  at 18%
          golddiggie Virtuoso

          Are all four hosts identical? What are the specs of the hosts? Are you actually using local datastores for the VMs (on each host)? How are the networks configured for each host (vSwitch, pNIC, and vNIC configurations and specifications)?

           

          Storage vMotion is more for moving VMs from one LUN to another on shared storage/SANs... While you can (probably) use it to shift between hosts' local storage (as you can with half your hosts), it's not recommended, nor really designed with that in mind.

           

           

          VMware VCP4

           

          Consider awarding points for "helpful" and/or "correct" answers.

          • 2. Re: Storage Vmotion stops  at 18%
            richardt Novice

            More information:

             

            All hosts are the same hardware, Dell R710. Storage is EMC CX3-20 and AX4-5.

             

             

            Storage vMotion is between SAN volumes; all LUNs are visible to all hosts.

             

             

            Run a test with a VM:

             

            • VM on datastore "A": vMotion between all 4 hosts OK.
            • VM is Storage vMotioned to datastore "B" using host "8".
            • VM on datastore "B": vMotion between all 4 hosts OK.
            • The VM fails Storage vMotion between the same datastores as used above on 2 out of the 4 hosts.
            • The error is the same on the 2 failing hosts.

             

            The VM can run on all hosts using both datastores, but cannot Storage vMotion between datastores when on 2 of the hosts.

             

             

            The vmkernel log from a failed Storage vMotion:

            May  7 06:37:00 vm5 vmkernel: 144:14:21:37.944 cpu7:217926)FSR: 1136: 1273178212912056 S: Swap pre-copy iteration 0: 0 swapped pages copied

            May  7 06:37:00 vm5 vmkernel: 144:14:21:37.944 cpu7:217926)FSR: 1148: 1273178212912056 S: Completed pre-copy iteration, with 0 changed pages.  Changed page count less than threshold (4096), ready to stop.

            May  7 06:37:00 vm5 vmkernel: 144:14:21:37.951 cpu1:217915)VSCSI: 6011: handle 69388(vscsi0:0):Destroying Device for world 217916 (pendCom 0)

            May  7 06:37:01 vm5 vmkernel: 144:14:21:38.749 cpu4:217915)CBT: 1031: Created device 3b78f1e-cbt for cbt driver with filehandle 62361374

            May  7 06:37:01 vm5 vmkernel: 144:14:21:38.750 cpu3:217915)VSCSI: 3472: handle 69389(vscsi0:0):Creating Virtual Device for world 217916 (FSS handle 53825311)

            May  7 06:37:01 vm5 vmkernel: 144:14:21:38.963 cpu4:217927)FS3DM: 1595: status Out of memory copying 16777216 bytes from file offset 0 to file offset 0, bytesTransferred = 0 extentsTransferred: 0

            • 3. Re: Storage Vmotion stops  at 18%
              esexon Enthusiast

              Found this link which may help.

               

              http://communities.vmware.com/thread/196142

               

              It talks about the Service Console memory. Also, are all your hosts at the same revision?

               

              Evan

              • 4. Re: Storage Vmotion stops  at 18%
                richardt Novice

                 Got an answer from VMware support:

                 "This problem is caused by the VMFS3 DM (Data Mover) having to use contiguous memory space on the host.

                 Apparently, when the host's kernel memory usage is >50% and memory has become fragmented, the DM cannot allocate more space and throws these errors.

                 Note from the example above, the host has been up for 161 days.

                 Resolution: until the fix is published (scheduled in upcoming releases), migrate sufficient VMs off the host to reduce memory consumption and attempt the sVMotion again."
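                 The support explanation hinges on fragmentation: a request for contiguous memory can fail even when total free memory is ample, because no single free run is large enough. A purely illustrative Python sketch of that effect, using a toy first-fit allocator (not VMware code):

```python
# Illustrative toy model of contiguous allocation under fragmentation.
# Each entry in free_runs is the size of one contiguous free region.

def first_fit(free_runs, request):
    """Return the index of the first free run that can hold `request`
    units, or None if no single contiguous run is large enough."""
    for i, run in enumerate(free_runs):
        if run >= request:
            return i
    return None

# A fragmented heap: 48 units free in total, scattered in small runs.
fragmented = [8, 4, 8, 4, 8, 4, 8, 4]
assert sum(fragmented) == 48
assert first_fit(fragmented, 16) is None   # no contiguous 16-unit run

# The same 48 free units in one run satisfies the request easily.
assert first_fit([48], 16) == 0
```

                 This matches the observed behavior: the host is not out of memory overall, but after a long uptime its kernel heap is fragmented, so the Data Mover's contiguous allocation fails; migrating VMs off frees larger runs.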

                • 5. Re: Storage Vmotion stops  at 18%
                  jhanekom Master

                  Thanks for posting the workaround - saved me quite a bit of time today. Had this once before and ended up cancelling the migration and forcing it to start on another host (shutting down the "faulty" host).

                   

                  Bit of extra information: in my case today, the host was only up for 13 days, so don't feel too bad about leaving your hosts running for longer periods!

                  • 6. Re: Storage Vmotion stops  at 18%
                    GarethAW Lurker

                    Thanks Richard, you have saved us a few hours of digging around by posting your response.

                     

                    Did VMware allocate you a bug reference that we can watch out for in the release notes?

                     

                    Many Thanks,

                     

                    Gareth

                    • 7. Re: Storage Vmotion stops  at 18%
                      richardt Novice

                      Hi Gareth,

                      I didn't get a bug ref  :(.

                       

                       

                      Richard Taylor

                       

                       
