P2V Repeatedly Fails at 98%

Scruffy_Nerfher · ‎02-11-2013

We have a 2003 SBS SP2 system that is our last unit to convert to a virtual machine. We are running ESXi 5.0.1. All other machine migrations have gone seamlessly; however, each time I attempt to P2V this system, it invariably errors out at 98% with an uninformative error message: A General System Error Occurred: Unknown Exception. The log files (attached) don't state anything that I can see as a concrete issue. One person did have a somewhat similar issue, though not exactly the same, which was solved by doing a THIN conversion instead of a THICK conversion (see http://communities.vmware.com/thread/435092). I can't believe it'd be that simple, but...perhaps? Regardless, the primary issue is that, at this point, I need something more concrete, as this is the third weekend in a row I've had to take our primary server offline for almost 48 hours, only to have the P2V turn around and fail.

PROCESS\HISTORY: In a nutshell, when the first P2V failed (with all system services running), I wasn't surprised. I was just hoping we could keep plugging along without any real downtime and then I could synchronize in the wee hours. So I did a conversion of just the OS (C:\) volume, with ALL services killed, which went off without a hitch. Building on that, I tried another P2V of both the OS and data volumes (approximately 350GB), with the same settings as when I did just the C:\ volume. As with the previous successful attempt, it was executed locally (conversion of the local machine) directly over a CAT6 crossover cable to the ESXi server, with all services killed. Just as with the initial P2V with all services running, this one died at 98%.

I'm wondering if it's an available drive space issue, as we only have about 650GB of free space on the ESXi server. Does the P2V process create a temp file of the job at hand, then do a final conversion to THICK\THIN\ETC. at about the 98% mark? That would explain why it's dying with less than twice the space of the job at hand available but succeeding with smaller jobs.
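The arithmetic behind that guess can be sketched in a few lines of Python. This is only an illustration of the hypothesis as stated in the post (that the job might temporarily need about two full copies of the data); whether Converter actually behaves this way is not confirmed anywhere in this thread, and the function name `fits_with_temp_copy` is made up for the example.

```python
def fits_with_temp_copy(job_gb, free_gb, copies=2):
    """Return True if free_gb can hold `copies` full copies of the job.

    Assumption (unverified): the conversion needs `copies` times the
    job size in free datastore space at its peak.
    """
    return free_gb >= job_gb * copies


# Figures from the post: roughly 350 GB to clone, roughly 650 GB free.
needed_gb = 350 * 2
print(needed_gb)                      # 700
print(fits_with_temp_copy(350, 650))  # False
```

Under that assumption the two-volume job would come up about 50GB short, which is consistent with the guess but does not prove it.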

Anyone have any ideas what's going on?

Thanks!

vmroyale · ‎02-11-2013

Hello and welcome to the communities.

Note: Discussion successfully moved from VMware ESXi 5 to Converter

a_nut_in · ‎02-11-2013

Logs show the following

2013-01-28T13:47:23.265-05:00 [09668 info 'Default'] [,0] Unloaded hive mntApi389241279630276999
2013-01-28T13:47:42.515-05:00 [07596 warning 'Default'] [,0] [NFC ] NfcNetTcpRead: bRead: -1
2013-01-28T13:47:42.515-05:00 [07596 warning 'Default'] [,0] [NFC ERROR] NfcNet_Recv: requested 264, recevied only 0 bytes
2013-01-28T13:47:42.515-05:00 [07596 warning 'Default'] [,0] [NFC ERROR] NfcGetMessage: recv failed:
2013-01-28T13:47:42.515-05:00 [07596 warning 'Default'] [,0] [NFC ERROR] NfcFssrvr_IO: failed to receive io reply
2013-01-28T13:47:42.515-05:00 [07596 info 'Default'] [,0] Sysimgbase_DiskLib_Write failed with 'NBD_ERR_NETWORK_CONNECT' (error code:2338)
2013-01-28T13:47:42.515-05:00 [07596 warning 'Default'] [,0] [NFC ERROR] NfcNetTcpWrite: bWritten: -1
2013-01-28T13:47:42.515-05:00 [07596 warning 'Default'] [,0] [NFC ERROR] NfcSendMessage: send failed: NFC_NETWORK_ERROR
2013-01-28T13:47:42.515-05:00 [07596 warning 'Default'] [,0] [NFC ERROR] NfcFssrvr_IO: failed to send io message
2013-01-28T13:47:42.515-05:00 [07596 info 'Default'] [,0] Sysimgbase_DiskLib_Write failed with 'NBD_ERR_NETWORK_CONNECT' (error code:2338)
2013-01-28T13:47:42.515-05:00 [07596 warning 'Default'] [,0] [NFC ERROR] NfcNetTcpWrite: bWritten: -1
2013-01-28T13:47:42.515-05:00 [07596 warning 'Default'] [,0] [NFC ERROR] NfcSendMessage: send failed: NFC_NETWORK_ERROR
2013-01-28T13:47:42.515-05:00 [07596 warning 'Default'] [,0] [NFC ERROR] NfcFssrvr_IO: failed to send io message
2013-01-28T13:47:42.515-05:00 [07596 info 'Default'] [,0] Sysimgbase_DiskLib_Write failed with 'NBD_ERR_NETWORK_CONNECT' (error code:2338)
2013-01-28T13:47:42.515-05:00 [07596 warning 'Default'] [,0] [NFC ERROR] NfcNetTcpWrite: bWritten: -1

How many drives or partitions does the VM have?

Can this be tried with one drive/partition at a time?

http://kb.vmware.com/kb/2018582
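For anyone wading through a long Converter log, a rough way to pull out lines like the ones quoted above is a small Python script. This is a minimal sketch that assumes the log lines have the same shape as the excerpt in this post; the function name `summarize` and the two regular expressions are made up for the example and are not part of any VMware tool.

```python
import re
from collections import Counter

# "[NFC ] Func: ..." or "[NFC ERROR] Func: ..." -> capture the function name.
NFC_RE = re.compile(r"\[NFC[^\]]*\]\s*(\w+):")
# "... failed with 'NAME' (error code:NNNN)" -> capture name and code.
FAIL_RE = re.compile(r"failed with '(\w+)' \(error code:(\d+)\)")


def summarize(lines):
    """Count NFC messages per function and failure codes per (name, code)."""
    nfc_calls = Counter()
    failures = Counter()
    for line in lines:
        m = NFC_RE.search(line)
        if m:
            nfc_calls[m.group(1)] += 1
        m = FAIL_RE.search(line)
        if m:
            failures[(m.group(1), int(m.group(2)))] += 1
    return nfc_calls, failures


# Three lines copied from the excerpt above.
sample = [
    "2013-01-28T13:47:42.515-05:00 [07596 warning 'Default'] [,0] [NFC ] NfcNetTcpRead: bRead: -1",
    "2013-01-28T13:47:42.515-05:00 [07596 warning 'Default'] [,0] [NFC ERROR] NfcSendMessage: send failed: NFC_NETWORK_ERROR",
    "2013-01-28T13:47:42.515-05:00 [07596 info 'Default'] [,0] Sysimgbase_DiskLib_Write failed with 'NBD_ERR_NETWORK_CONNECT' (error code:2338)",
]
nfc_calls, failures = summarize(sample)
print(nfc_calls)
print(failures)
```

To run it against a real file, pass the open file object instead of `sample`, e.g. `summarize(open(path))`, where `path` is wherever your Converter log lives.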

continuum · ‎02-12-2013

At 98% the VM has probably already arrived.

Can you start it?

________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

Scruffy_Nerfher · ‎02-12-2013

Thank you (all) for replying. To answer your queries:

There are two volumes (on one RAID5 array) being cloned: a C:\ and an E:\ volume. Yes, I can boot the 98% drive, but it always BSODs...always. I did successfully get a clone of just the C:\ volume, which worked fine considering Exchange, SEP and several other critical applications are on the E:\ volume, which wasn't cloned (obviously I inherited several "treats" when I took on my role in IT).

That's what prompted me to think that the issue could be drive space. The clone of C:\ was only 19.5GB onto a datastore that had just over 650GB of available space. The clone of C:\ & E:\ was just over 350GB. So I'm left wondering if ESXi needs more space to complete the process, as the clones do complete but it always seems to die during the reconfiguration process. I'd suspected the reconfiguration was a final conversion process, turning the cloned drive(s) into a usable .VMDK, so I attempted a drive conversion, changing from the default THICK to THIN (e.g. vmkfstools -i SERVERNAME.vmdk -d thin /vmfs/volumes/datastore1/NEW_LOCATION/SERVERNAME.vmdk), with no change in performance or results.

I did find this http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=201858... (which one of you referenced in your reply) where someone simply changed the conversion type from the default THICK to THIN before they kicked the job off, and the conversion succeeded. I can't imagine it'd be that simple, but possibly? The problem is, I can't keep taking down a production machine, hoping this time it'll work. I need more certainty about the problem, and a solution, before I tell our users, again, that there's no server access for the next 36-48 hours.
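For reference, the vmkfstools command quoted above can be assembled programmatically so the source and destination paths are not retyped by hand. This is only a sketch: the helper name `thin_clone_command` is made up, the paths are the placeholders from this post, and the script just prints the command rather than running it.

```python
import shlex


def thin_clone_command(src_vmdk, dst_vmdk):
    """Build the argument list for a thin-format clone with vmkfstools.

    Uses only the flags that appear in the post: -i (source disk)
    and -d thin (destination disk format).
    """
    return ["vmkfstools", "-i", src_vmdk, "-d", "thin", dst_vmdk]


cmd = thin_clone_command(
    "SERVERNAME.vmdk",
    "/vmfs/volumes/datastore1/NEW_LOCATION/SERVERNAME.vmdk",
)
print(shlex.join(cmd))
```

Keeping the command as a list means it could later be handed to `subprocess.run(cmd)` on a host where vmkfstools is available, without shell quoting problems.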

As far as trying to make the failed 98% attempts viable, I tried doing a recovery of the OS, which was somewhat successful, but not nearly to the point that I felt the least bit confident at putting this into a production environment.

continuum · ‎02-12-2013

A BSOD is very likely if you used Converter with default options.
Your VM should use a BusLogic or LSI Logic SCSI controller, but Converter uses IDE by default.

That can be fixed easily: either rerun Converter and assign the BusLogic controller manually, or just rewrite the vmx file and run Converter's "configure machine" again.

Scruffy_Nerfher · ‎02-12-2013

Thanks but I already had the same thought and verified that the conversion did take place with it set as LSI Logic Parallel (and this is still what it's currently set to - never changed it).

However, when the error occurred, after the first failed conversion, I also tried other controller types with no success.

continuum · ‎02-12-2013

Did you try to run the configure job again?

Scruffy_Nerfher · ‎02-12-2013

No. The controller type is set to what I would have expected and, as draconian as it sounds, when I start getting BSODs, I lose faith in the image very quickly.

I'm going to go for another clone this weekend, although it'll be a "live" clone as I don't want to take the server down yet again. I'll be choosing a THIN disk type, instead of a THICK (as this seemed to have solved the issue in one of the other forums - http://communities.vmware.com/thread/435092), and I'll probably deselect the RECONFIGURE checkbox before I submit the job as that seems to be when\where it invariably dies. I'll update this thread with whether it finishes successfully or dies.

Thanks for your input!

continuum · ‎02-12-2013

You judge BSODs a bit harshly 😉
A 7B bluescreen is the expected state of the import after the imaging part is done at 98%.

Scruffy_Nerfher · ‎02-12-2013

True. But considering I have a Roman nose, it's probably hereditary.

Yes, the 7B is generally a disk driver issue. I guess I'm just spoiled, wanting a perfect migration, especially considering the others went so seamlessly...including our 2003 Terminal Server. I thought, for sure, that one would blow up but I killed the services, ran P2V and, bada-bing, there it was.

I can also try doing individual drive clones, on the 98% SBS server, and then simply join the two back together, after the fact - that might work. I'm suspecting I'll have better success this weekend. My only other remaining issue is that bloody physical parallel port output\mapping that ESXi 5.x seems to no longer support (http://communities.vmware.com/message/2193057#2193057). I don't suppose you'd care to chime in on that one? 😀

continuum · ‎02-13-2013

> I guess I'm just spoiled, wanting a perfect migration ...

That would be one where you do not see the BSOD because you do not start the VM between stage 1 (imaging) and stage 2 (patching).
But that says nothing about the quality of the patching process.
Personally, I would use Coldclone to import an SBS.

About your parallel port problem - if possible I would avoid everything that needs to be directly attached to an ESXi host, other than a USB boot stick.

Using devices that serve ports over the network may be more expensive, but in the long run it makes things so much easier.

Scruffy_Nerfher · ‎02-13-2013

I'll take a look at ColdClone.

As for the parallel port deal, it all comes down to money. I work for a manufacturing company - they have absolutely no idea of the truly indispensable nature of IT. All they know is that IT costs them money and they don't like that. The funny thing is that $9,000 for the software, which would last (perhaps?) in perpetuity, is still $1,000 less than a one-day crane rental. It's all about priorities. I've talked, preached, pleaded, presented and demanded, but what's heard is that a non-critical department (IT) needs money. The only reason this VMware project is happening is because my boss went to the COO and told him there is no way to avoid this migration. Without going into too much detail, we've already had the budget meeting. For me to now go back and say I need another $9,000 - talk about it hitting the fan.

Scruffy_Nerfher · ‎02-18-2013

Well...it looks like changing the option from THICK to THIN did it. In fact, unlike before, where I had killed all the services and run it as THICK, I left all services as-is, with the exception of antivirus and the APC battery backup software (service). Everything I've read, and everyone I've talked to, has said that killing the services is the silver bullet, but switching to THIN seems to have actually been what was needed.

So, at 4:44 AM, when I checked in on my patient, I was very pleasantly surprised to find that the P2V had completed successfully. It still leaves me wondering if the issue isn't free drive space related as it always died during the reconfiguration process. Morbid curiosity has me wondering.

Regardless, that was the proverbial thorn in my side. And it is no mas. 😀