VMware Communities
nixvirgin
Contributor

Viability of using VMware Workstation 6 to host production VMs

Evening All,

Some advice required, if you would be kind enough...

I used to have my Exchange server outsourced and, due to costs, decided to bring it in-house. As I am looking to use VM technology more and more, I figured I would put VMware on a workstation and run an Exchange VM on that.

I started with Server 1.0.4 and began having issues: the NTDS database and the Exchange DB started to show checksum mismatches.

So I figured I would move the VM into Workstation (trial at this stage), assuming that the free product might not be as good or as fast. Long story short, the Active Directory and Exchange databases could not be repaired (eseutil found them too damaged), so I started again from a vanilla VM and restored from a backup.

To my horror, the same thing has started again: one Exchange DB error and the same NTDS error. See below:

Event Type: Error

Event Source: ESE

Event Category: Logging/Recovery

Event ID: 478

Date: 30/11/2007

Time: 00:03:54

User: N/A

Computer: NLS2K301

Description:

Information Store (3308) The streaming page read from the file "C:\Program Files\Exchsrvr\mdbdata\priv1.stm" at offset 3792896 (0x000000000039e000) for 4096 (0x00001000) bytes failed verification due to a page checksum mismatch. The expected checksum was 3313873814 (0x00000000c585b396) and the actual checksum was 3313873878 (0x00000000c585b3d6). The read operation will fail with error -613 (0xfffffd9b). If this condition persists then please restore the database from a previous backup.

For more information, click .

Event Type: Error

Event Source: NTDS ISAM

Event Category: Database Page Cache

Event ID: 474

Date: 30/11/2007

Time: 21:04:37

User: N/A

Computer: NLS2K301

Description:

NTDS (420) NTDSA: The database page read from the file "C:\WINDOWS\ntds\ntds.dit" at offset 5316608 (0x0000000000512000) for 8192 (0x00002000) bytes failed verification due to a page checksum mismatch. The expected checksum was 3114843424 (0xb9a8bd20) and the actual checksum was 3114843624 (0xb9a8bde8). The read operation will fail with error -1018 (0xfffffc06). If this condition persists then please restore the database from a previous backup. This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.

For more information, see Help and Support Center at .
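For anyone comparing the numbers in these two events, here is a small Python sketch that works out which block of each file was affected and how many bits differ between the expected and actual checksums. It only does arithmetic on values copied from the log above. The block index is simply the offset divided by the read size, not necessarily the database engine's own page number, and what a small bit difference implies depends on how the checksum is calculated, which is an open question here.

```python
# Values copied from the two events above; nothing else is assumed.
EVENTS = [
    # (file, offset, read_size, expected_checksum, actual_checksum)
    ("priv1.stm", 3792896, 4096, 0xC585B396, 0xC585B3D6),
    ("ntds.dit", 5316608, 8192, 0xB9A8BD20, 0xB9A8BDE8),
]


def describe(offset, read_size, expected, actual):
    """Return (block_index, xor_of_checksums, number_of_differing_bits)."""
    diff = expected ^ actual
    return offset // read_size, diff, bin(diff).count("1")


for name, offset, size, expected, actual in EVENTS:
    block, diff, bits = describe(offset, size, expected, actual)
    print(f"{name}: block {block}, checksum XOR 0x{diff:08x}, {bits} bit(s) differ")
```

With the figures above this reports a 1-bit difference for priv1.stm and a 3-bit difference for ntds.dit, so in both cases the stored and computed checksums differ only in their lowest byte.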

Having spent a good deal of time on this, I see MS suggest that it is more than likely hardware and should be corrected, but how do I make sure the hardware is optimised, or up to the task, for running a VM?

Host machine is a new HP Compaq dc7700p with an Intel Core 2 (E6400) @ 2.13GHz, 3GB RAM (1GB failed, so left at 3GB) and a 250GB HDD (SATA, non-mirrored), running Windows XP SP2 with the bare minimum installed (drivers, Office 2003, no anti-virus). The host machine is barely used and is devoted to the VM.

I would have thought this would have been fine for running Exchange (<10 mailboxes), but I really don't know what else to look at or check now...

Hoping someone can help.

Thanks

14 Replies
asatoran
Immortal

I've been running Exchange 2003 virtualized on Server1 almost since Server1 was in general release. Exchange size is similar to yours, around a dozen mailboxes. Host is much older, an old Compaq Proliant, PIII/733MHz, 1.25GB RAM. SCSI HDs instead of SATA. Win2k3 host OS. Currently on Server 1.0.3 because I haven't had time to update it to 1.0.4. Performance is a tad slow only because the host is slow, but otherwise, I've had no problems with the Exchange VM on this host. Host also runs one other Win2k VM.


While it could be an issue with v1.0.4, I'm not inclined to believe so unless someone has other evidence. This was going to be a production server? You do realize that XP is not a supported host OS for Server, although there really isn't a problem running it this way IMO. But no mirrored HD on the host? Like MS is saying, are you sure it's not hardware? Have you checked the host's HD?


Edit: Although you say this is a new machine, you mentioned that 1GB of RAM "failed." That's not giving me assurance on your hardware. :(

nixvirgin
Contributor

Thanks for the reply.

I was on Server v1.0.4 but am currently running Workstation v6... I think I will stay with Workstation, as the background snapshots and the snapshot manager have been useful to me recently given the issues I have been having. It's been good peace of mind taking snapshots at given points.

The RAM was non-HP that I bought, and a stick failed. The machine had Vista on it, and the failed RAM caused issues with the stability of the OS, so I removed it and installed XP, and it's been running fine since.

Other than running a chkdsk, are there any other tools that would check the host hardware out more fully? For what it's worth, the host's event logs are clean...

Thanks again

edit: Reason for not mirroring the HDDs is that I figured with regular backups of the Virtual Machines directory I could just use another machine if this one failed (go buy one and install VMware on it). Mirrored drives would not increase performance/read-write speeds though, would they?

asatoran
Immortal

> Thanks for the reply.
>
> I was on Server v1.0.4 but am currently running Workstation v6... I think I will stay with Workstation, as the background snapshots and the snapshot manager have been useful to me recently given the issues I have been having. It's been good peace of mind taking snapshots at given points.

WS6 is fine. I prefer Server for a production situation like this because Server can autostart a VM as a service. No manually starting the VM or creating a script to start the VM. As for the snapshots, I went "low-tech." Shut down VM and copy the entire VM directory. Yes, it would be nice to have multiple snapshots in Server, but there are issues with trying to move a VM to another host if you have snapshots. So for simplicity, I don't use snapshots much.
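The "low-tech" copy can be scripted. Below is a minimal Python sketch, assuming the VM is already shut down and that the directory paths are your own (the ones in the comment are hypothetical). It copies the whole VM directory to a timestamped folder and then compares a SHA-256 hash of every file against the original, which seems worthwhile when the host itself is under suspicion. It is an illustration of the idea, not a tested backup tool.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MB chunks so large virtual disks do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cold_backup(vm_dir: Path, backup_root: Path) -> Path:
    """Copy a powered-off VM directory and verify every copied file by hash."""
    dest = backup_root / f"{vm_dir.name}-{datetime.now():%Y%m%d-%H%M%S}"
    shutil.copytree(vm_dir, dest)
    for src in vm_dir.rglob("*"):
        if src.is_file():
            copy = dest / src.relative_to(vm_dir)
            if sha256_of(src) != sha256_of(copy):
                raise IOError(f"copy differs from source: {copy}")
    return dest


# Hypothetical paths, shown for illustration only:
# cold_backup(Path(r"D:\Virtual Machines\Exchange"), Path(r"E:\VM Backups"))
```

Note that a matching hash only proves the copy equals the source at copy time. It says nothing about whether the databases inside the virtual disk were already damaged.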

> The RAM was non-HP that I bought, and a stick failed. The machine had Vista on it, and the failed RAM caused issues with the stability of the OS, so I removed it and installed XP, and it's been running fine since.

Ok

> Other than running a chkdsk, are there any other tools that would check the host hardware out more fully? For what it's worth, the host's event logs are clean...

Well, that'd be a place to start. Rule out the obvious. Make sure it's not hardware, like a broken fan or something. If your Exchange store is already so corrupt that you can't repair it, and possibly the corruption had been happening for a while so your backups are also corrupt, how about just starting with a new information store? Yes, it's a pain to recreate everything, but with only a few mailboxes, I'd try to get everyone's mailbox backed up to a PST, then reload the PSTs when you create the new information store. (I've done that in the past, as it was a known amount of time to get everything back running rather than keep trying to fix the store and praying that it's not still corrupt.)


Also, since you changed from Server to Workstation and you still get corruption, how about installing Server or Workstation on another machine to see if you still get corruption? For so few mailboxes, you don't need a lot of horsepower on your host, as my setup illustrates. :)

> Thanks again
>
> edit: Reason for not mirroring the HDDs is that I figured with regular backups of the Virtual Machines directory I could just use another machine if this one failed (go buy one and install VMware on it). Mirrored drives would not increase performance/read-write speeds though, would they?

Well, this is how you test your backup, I guess. :P Without RAID or at least mirrored HDs, you are more dependent on your backups. And how much downtime are you willing to accept while you move the VM to the new HD or new machine? Also, like I said above, snapshots can cause issues when moving the VM to a new machine. (Specifically absolute paths. You have to configure the new machine with the same path to the VMs.) In a production environment, I never have anything less than RAID1. "Cheap insurance vs. cheap" is how I sell it to the clients. Development machines I'll live with no RAID, but not a production machine. This is not about performance. This is about reliability.


And if you didn't notice the corruption right away, how long do you archive your backups? (Since you're more reliant on them without RAID1.) To be blunt, here you have a good example of how your DR procedures need improving. If your Exchange is not working now, your users aren't screaming at you? My clients are mostly small offices, usually with fewer than a dozen people each. All of the ones using Exchange or another internal mail or collaboration app would be coming after me with pitchforks if the server were down for more than a day. ;)


(Sorry to be preaching on this last point, but I am always having to sit the client down and discuss the dangers and ramifications when the client doesn't think reliability and thinks cost. Or isn't paranoid enough about their data. As the saying goes, there are two kinds of people: those that have lost data, and those that will lose data. I've had clients redo a month's worth of work because they thought that they had good backups but later found out that the person who was checking the logs or changing the tapes...wasn't, yada, yada, yada. Ok, I'm off the soapbox now.)


Perhaps someone else has other ideas....

Peter_vm
Immortal

I would run prolonged stress and memory tests on your hardware. Here are some tools:

Prime95 (CPU stress)

http://www.mersenne.org/freesoft.htm

HDTune (HD S.M.A.R.T)

http://www.hdtune.com/

Windows Memory Diagnostic (RAM test)

http://oca.microsoft.com/en/windiag.asp

Memtest86 (RAM test)

http://www.memtest86.com/
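As a rough supplement to the tools listed above (not a replacement for them), a write-and-read-back check can be sketched in a few lines of Python. It writes random data to a file on the suspect disk, records a SHA-256 hash, then re-reads the file several times and counts any pass whose hash differs. The file name, size and pass count are arbitrary assumptions. Be aware that the operating system may serve the re-reads from its cache, so a clean result exercises memory at least as much as the disk and proves little on its own.

```python
import hashlib
import os
from pathlib import Path

BLOCK = 1024 * 1024  # 1 MB


def write_test_file(path: Path, size_mb: int) -> str:
    """Write size_mb of random data, flush it to disk, and return its SHA-256."""
    digest = hashlib.sha256()
    with path.open("wb") as handle:
        for _ in range(size_mb):
            block = os.urandom(BLOCK)
            digest.update(block)
            handle.write(block)
        handle.flush()
        os.fsync(handle.fileno())
    return digest.hexdigest()


def read_hash(path: Path) -> str:
    """Re-read the file from start to end and return its SHA-256."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(BLOCK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def readback_test(path: Path, size_mb: int = 64, passes: int = 5) -> int:
    """Return how many read passes produced a hash different from what was written."""
    expected = write_test_file(path, size_mb)
    mismatches = sum(1 for _ in range(passes) if read_hash(path) != expected)
    path.unlink()  # remove the temporary test file
    return mismatches


# Hypothetical usage on the host drive that holds the VMs:
# print(readback_test(Path(r"C:\readback-test.bin"), size_mb=512, passes=10))
```

Any non-zero result would point at the host rather than at the virtualization product, since no VM is involved in the test.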

nixvirgin
Contributor

OK, so I ran HD Tune and the health of the disk is showing near threshold on "Spin Retry Count".

I will run the CPU and memory tests, but is it safe to assume then that the host disk is not in great shape?

thanks

Peter_vm
Immortal

> OK, so I ran HD Tune and the health of the disk is showing near threshold on "Spin Retry Count".

What is the value in the Data column?

> I will run the CPU and memory tests, but is it safe to assume then that the host disk is not in great shape?

I would not trust that disk.

nixvirgin
Contributor

Thanks for the reply - the data is as below:

I don't fully understand the data, so would this suggest that the disk should be replaced?

thanks again

Peter_vm
Immortal

Oh no, from the attached picture the disk seems to be fine. No need to replace it. A value larger than zero in the Data column for that counter would mean there had been a failure. Yours shows zero there.
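To make the reading of those columns concrete, here is a small Python sketch. The numbers in it are hypothetical, since the screenshot's actual figures are not reproduced in the thread. It assumes the usual S.M.A.R.T. convention that the normalized current value counts down towards the threshold and the attribute is considered failed once it is at or below it, while the Data column holds the raw event count. A healthy attribute can therefore sit numerically "near" its threshold and still be fine.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    name: str
    current: int    # normalized value, higher is healthier
    worst: int      # lowest normalized value seen so far
    threshold: int  # attribute counts as failed at or below this
    data: int       # raw counter (the "Data" column)


def assess(attr: SmartAttribute) -> str:
    """Classify one attribute using the convention described above."""
    if attr.current <= attr.threshold:
        return "failed"
    if attr.data > 0:
        return "events recorded"
    return "ok"


# Hypothetical readings, NOT the values from the attached picture.
spin_retry = SmartAttribute("Spin Retry Count", current=100, worst=100, threshold=97, data=0)
print(spin_retry.name, "->", assess(spin_retry))
```

With these example readings the result is "ok": the current value of 100 is close to the threshold of 97, but it has never dropped, and the raw count is zero.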

nixvirgin
Contributor

OK, that's good then that the HDD is not showing errors. I have also been running the MS memory diagnostic from a CD and have had no errors so far. Is it still worth running Memtest as well?

So if the HDD looks OK, the memory looks OK and the stress test on the CPU gave no problems, where else can I go with this?

thanks

Peter_vm
Immortal

Were there any unexpected guest power-offs?

Is there a chance that the guest virtual drive(s) might be located in a compressed NTFS folder on the host?
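One way to check the compression question without clicking through folder properties is a short Python sketch like the one below. It walks a VM directory and lists any entry whose NTFS "compressed" attribute is set. It relies on `st_file_attributes`, which Python only exposes on Windows, so on other systems it simply returns an empty list. The directory path in the comment is a hypothetical example.

```python
import stat
from pathlib import Path


def has_compressed_flag(attributes: int) -> bool:
    """True if the Windows file-attribute bitmask includes the compressed bit."""
    return bool(attributes & stat.FILE_ATTRIBUTE_COMPRESSED)


def compressed_entries(vm_dir: Path) -> list:
    """List files and folders under vm_dir that are marked NTFS-compressed."""
    found = []
    for entry in [vm_dir, *vm_dir.rglob("*")]:
        # st_file_attributes exists only on Windows; elsewhere this is None.
        attributes = getattr(entry.stat(), "st_file_attributes", None)
        if attributes is not None and has_compressed_flag(attributes):
            found.append(entry)
    return found


# Hypothetical path, for illustration only:
# for entry in compressed_entries(Path(r"D:\Virtual Machines\Exchange")):
#     print("compressed:", entry)
```

An empty result means neither the folder nor any of the virtual disk files carry the compressed attribute.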

scottwarren
Contributor

I recently had disk corruption issues with VMware 6 on Vista. It 'seems' that the power-saving features of my laptop caused this. I had run VMware for a while, but one day when I was working offline I lowered the power consumption, and then I started getting corruptions again. I have no definitive test, but I am wondering: is this your issue also?

Regards

Scott Warren (http://www.ocom.com.au)

nixvirgin
Contributor

No power-offs that I am aware of. This AD and Exchange install was built last week, so I would have noticed something like that since then, especially given the issues that I had with the last installation...

The v-drives are not on a compressed disk nor in a compressed folder... The host has 150GB of free disk space, so I have not needed to use compression, and having checked, it's not compressed.

I really cannot think of anything that would cause this issue, and I cannot think of anything else to try. Do you think it would be worth calling VMware support?

Thanks

scottwarren
Contributor

I just thought I would ask. I guess you should try VMware support. I am not sure about the costs, etc. It looks to me that your hardware is fine from what I read in the forum.

Good Luck!

nixvirgin
Contributor

Hi scottwarren, sorry, I just noticed your post.

All power management features are off and have been for a while now... thanks for the tip though.
