VMware Cloud Community
Veei
Contributor

Win2k8 AD server blue screen after hardware v7 upgrade

Hey all,

I'm at a major loss trying to recover from this problem and could use any advice I can get. The gist is that I have a Windows Server 2008 x64 Enterprise VM acting as the only DC for a child domain in our AD tree.

We have just upgraded from VI3 to vSphere on both vCenter and the hosts. The ESX upgrade went fine. Update Manager didn't want to automatically update the 2k8 VM's VMware Tools, so I updated them manually. After a reboot, I ran Update Manager's VM hardware upgrade baseline. As this is a DC, VMware strongly recommends never taking snapshots. So, before the hardware upgrade, I browsed online for a while looking for any problems anyone had encountered with 2k8 and hardware v7, and saw no issues. As we do weekly system state backups, and since the 2k3 upgrades went so well, I went ahead with the h/w upgrade.

Upon reboot, the VM blue screens, basically saying it cannot find the AD database (which is not on the same disk as the OS). The VM attaches to 3 different LUNs. The disks attached are:

Disk 1 - OS - 50GB - SAS iSCSI - LUN 1

Disk 2 - User Data - 2TB - SATA iSCSI - LUN 2

Disk 3 - Group Data - 1.6TB - SATA iSCSI - LUN 3

Disk 4 - Misc Backup Data - 500GB - SAS iSCSI - LUN 1

All disks in the VM are set on the same LSI Parallel controller.

The VM will not boot in any safe mode except Directory Services Restore Mode (DSRM). Once in DSRM, it can only see the OS drive. It starts trying to install drivers, finds a driver for "Unidentified Device", installs it, and asks for a reboot. It also detects 32 "PCI to PCI Bridge" devices as well as the "SCSI Controller" and a "Base System Device", none of which it knows what to do with, and it prompts me to point it to where those drivers are. I go into Device Manager, update the SCSI driver, and have it autodetect. It finds the LSI driver and attempts to install it, but comes back with the error "The system cannot find the file specified". If I point it to the driver manually, it finds the driver but says it's not digitally signed. I select install anyway and it says "Access Denied".

Upon the next reboot (with only the "Unidentified Device" driver installed), it blue screens again with an error saying that Windows has halted to protect from damage, and that if any hard drive controllers had recently been added, they should be removed. Once I get this error, it will no longer boot into any mode, including DSRM, until I load Last Known Good Configuration... and then I'm back to square one.

I would just do a system state restore but, just my luck, the backups are faulty.

So, I've spent a total of about 6 hours on the phone with VMware and they've come to the conclusion that something in the OS got corrupted. I'm still confused about how the OS can see the C drive but can't see the D, E, and F drives, since they are all on the same virtual SCSI controller. Here's what we/I have tried so far:

**Created a new VM and attached the existing hard drives to it

**Removed VM from inventory, edited VMX file and changed h/w version from 7 to 4 and re-added to inventory

**Deployed a 2k3 VM from a template and added the drives to it. The 2k3 VM saw the drives and the data on them fine.

**Ran SFC.EXE on the OS in DSRM. All checked out fine

**Followed instructions in a MS KB article on how to clear INF cache

**Tried installing VMware Tools 3.5 and loading drivers again
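
For anyone repeating the VMX step above, here is a minimal Python sketch for checking what a .vmx file actually declares before and after an edit. It assumes the hardware version lives in the `virtualHW.version` key and the controller type in `scsiN.virtualDev`; the sample contents (including the `dc01.vmdk` file name) are made up for illustration and are not taken from the VM in this thread.

```python
import re

# Matches simple `key = "value"` lines as found in .vmx files.
_VMX_LINE = re.compile(r'^\s*([^#\s=]+)\s*=\s*"(.*)"\s*$')


def parse_vmx(text):
    """Return a dict of key -> value for every `key = "value"` line."""
    settings = {}
    for line in text.splitlines():
        match = _VMX_LINE.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def summarize_storage(settings):
    """Pick out the hardware version and each SCSI controller's type."""
    controllers = {
        key.split(".")[0]: value
        for key, value in settings.items()
        if re.fullmatch(r"scsi\d+\.virtualDev", key, flags=re.IGNORECASE)
    }
    version = next(
        (v for k, v in settings.items() if k.lower() == "virtualhw.version"),
        None,
    )
    return {"hw_version": version, "controllers": controllers}


# Hypothetical sample; a real .vmx has many more keys.
SAMPLE_VMX = '''
virtualHW.version = "7"
scsi0.present = "TRUE"
scsi0.virtualDev = "lsilogic"
scsi0:0.fileName = "dc01.vmdk"
'''

summary = summarize_storage(parse_vmx(SAMPLE_VMX))
print(summary)
```

The sketch only reads the file. Comparing the summary before and after the upgrade would show whether the controller type changed underneath the guest, which matters here because the guest needs a matching driver.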

I could really use any advice or thoughts anyone might have on this.

6 Replies
Veei
Contributor

I have, for the most part, resolved my problem. Just thought I'd update the thread in case someone else encounters this.

This link led me towards the solution:

Win2k8 doesn't really like the SCSI controller changing. In fact, it gets really, really angry seeing as it doesn't have a driver for it.

I followed the procedure in that link to install the new SAS controller (which is the default for 2k8 in vSphere). I'm not sure whether the local Admin user is supposed to have full admin rights to the machine while in DSRM, but it doesn't for me. I basically had to copy a few files from C:\windows\system32\drivers (disk.sys, lsi_scsi.sys, lsi_sas.sys, e1g6032e.sys, and partmgr.sys) to the C:\Windows\INF folder. I also had to add the Administrators group with full rights to the infcache.1, infpub.dat, infstor.dat, and infstrng.dat files. The system keeps resetting the permissions (only SYSTEM - Full and Users - Read have security entries on those files).

The best thing to do is copy those files and not force an install, but reboot and let the system try to detect the hardware. VERY important: before rebooting, uninstall the "VMware Virtual disk SCSI Disk Device" in Device Manager if it shows up under "Other Devices", as those were the drivers causing the secondary BSOD (Windows has halted to protect from damage).

I am still not sure whether copying those sys files in there and modifying the permissions was a sound procedure, so do it at your own risk if you decide to. But it was the only way I was able to get my VM up and running again, so I was willing to try anything.
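
If you end up doing the same file copy, a small read-only check like the one below can tell you which of the files are still missing from the target folder. This is only a sketch: the file names come from the list above, the helper name `missing_files` is made up, and the demo runs against a throwaway directory rather than a real C:\Windows\INF.

```python
import tempfile
from pathlib import Path

# File names taken from the post above.
DRIVER_FILES = ["disk.sys", "lsi_scsi.sys", "lsi_sas.sys", "e1g6032e.sys", "partmgr.sys"]


def missing_files(directory, names):
    """Return the names not found in directory (compared case-insensitively)."""
    present = {entry.name.lower() for entry in Path(directory).iterdir()}
    return [name for name in names if name.lower() not in present]


# Demo against a temporary directory standing in for the INF folder.
with tempfile.TemporaryDirectory() as stand_in:
    for name in ("disk.sys", "LSI_SAS.sys"):
        (Path(stand_in) / name).touch()
    still_missing = missing_files(stand_in, DRIVER_FILES)

print(still_missing)
```

On the real machine you would pass the actual folder path instead of the temporary one. The check changes nothing on disk, so it does not carry the risk that the copy and permission changes do.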

Once all that was done (SCSI controller and disk drives installed correctly), the disks still did not show up in My Computer. I had to go into Disk Management and set them to online. I did one more reboot back into DSRM to confirm. After that, I let it boot up normally and got in! I am still having trouble with Base System Device and a few Unidentified Devices showing up with no drivers detected, as well as the multiple PCI to PCI Bridge entries. That still has to be figured out, but at least the drives are back up now, AD and DNS are working, and all shares are up.
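
The "set them to online" step can also be scripted rather than clicked through in Disk Management. Below is a rough Python sketch that only generates the text of a diskpart script; it assumes the data disks are numbered 1 through 3 (check with `list disk` first, your numbering may differ) and that the result would be saved to a file and run with `diskpart /s`.

```python
def build_diskpart_script(disk_numbers):
    """Return diskpart script text that brings each listed disk online."""
    lines = []
    for number in disk_numbers:
        lines.append(f"select disk {number}")
        lines.append("online disk")
        # Disks that come up offline are sometimes also flagged read-only.
        lines.append("attributes disk clear readonly")
    return "\n".join(lines) + "\n"


# Assumed disk numbers: 0 is taken to be the OS disk, 1-3 the data disks.
script = build_diskpart_script([1, 2, 3])
print(script)
```

Generating the script separately from running it lets you read it over before anything touches the disks.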

The moral of the story:

**Be extra careful when upgrading VMware Tools and Hardware with 2k8 (especially x64). I am NOT going to update the hardware on any of my other 2k8 servers.

**Don't snapshot AD VMs

**Make sure your System State backup is solid before going ahead!

DSTAVERT
Immortal

If you had an SR open for this it would be useful to have this passed on.

-- David -- VMware Communities Moderator
burdweiser
Enthusiast

Ya, I'm having this same issue right now. I'm going to work on it after I get some sleep.

Rumple
Virtuoso

While Microsoft and VMware say not to snapshot AD controllers, I completely disregard that advice, as I want to have a "bare metal" restore point to get back to in case something like this happens.

What I do instead is:

#1 - Make sure the Windows backups are working. Optimally I have other backups as well (Symantec, etc.)

#2 - If, as a last resort, I must roll back a change, I shut down the machine, revert to the pre-change state, and immediately boot into Directory Services Restore Mode. I then take the latest system state backup from before the snapshot and restore it in NON-authoritative mode.

#3 - Boot up the VM and start doing AD checks to make sure AD is picking up the most recent changes from other DCs. If it's the only DC, I then run netdiag and dcdiag to validate that everything is OK.

Personally I'd like to have the option of at least getting my DC back from a backup that had a snapshot on it vs. reinstalling from scratch and restoring system state. You just need to be very careful about the steps you follow, and don't roll back without a hell of a lot of thought behind it.

burdweiser
Enthusiast

I went through this same process, but I could not get the VM to survive on SCSI drives. The OS will now only run on IDE drives. Will I take a big performance hit?

I went through the process of adding my DC's C drive VMDK file to another VM and transferring files that way, instead of using DSRM (I kept getting permissions issues).

s1xth
VMware Employee

So I have about 6 W2K8 servers running on ESXi 3.5 hosts, and I will be upgrading to 4.0 shortly. Will I have these same issues with all my W2K8 VMs?? I haven't read anything about this anywhere. Either not many people are upgrading to 4.0 yet or it's not affecting everyone, but I have a hard time believing that.

I am, though, running W2K8 servers on a fresh ESXi 4 host with no issues. Is this an upgrade-only issue?

Thanks!

http://www.virtualizationimpact.com http://www.handsonvirtualization.com Twitter: @jfranconi