I have spent the last few days trying to find a free backup solution for the newly free ESXi in Windows-only environments (in particular Windows XP). The solution for me was the following:
1. Installing Windows Services for UNIX (WSFU)
2. Copying the ESXi Server password and group files to Windows
3. Configuring WSFU to accept ESXi Server connections
4. Sharing the Windows folder for NFS compatibility
5. Configuring the ESXi Server to mount the Windows NFS share as a datastore
6. Setting up the backup script
Attached are the complete steps.
I take NO credit for any of this. This is just a compilation of others' work, formatted to suit my needs; I felt others could benefit from it as I have.
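Steps 4 and 5 above boil down to a couple of commands on the ESXi side once the Windows share is exported. A hedged sketch — the host name, share path, and datastore label here are placeholders, not from the attached document:

```shell
# Run on the ESXi host (unsupported console or remote shell).
# Register the Windows NFS export as a datastore named "backup-ds":
esxcfg-nas -a -o winbackup.example.com -s /esxi-backup backup-ds

# Confirm the mount shows up:
esxcfg-nas -l
```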
Has anybody spent any amount of time working on the restore functionality of each of these scripts, especially in a disaster recovery type environment?
Personally, I believe the restore process is going to be up to the end users to implement. There is no one-size-fits-all solution for restoring your VM(s), whether it's a recovery from backup or a disaster recovery. Even the backup process itself will vary from environment to environment, as you can see from the extensive feature requests within this thread on the various scripts. However you choose to restore your VM(s), if you're able to manually walk through that process following a document (which you should), then it can be automated with a script, and it should be designed/implemented by the end users. If you're looking for a more streamlined approach with the ability to test your DR/recovery, take a look at VMware SRM. It's a great product if you're looking to do advanced testing and planning for recovery/DR.
Instead of creating a new VM, you can browse the backup datastore and select the .vmx file of the VM you want
to run, right-click, and select "Add to Inventory"; make sure to leave the name field blank so it uses the same name.
Then, after your production box is back up, reverse the source/destination datastores in your backup script to move the VMs back, and re-add them to inventory again.
This is also how I do file-level restores. I just add the backed-up VM to inventory, power it on on another network, and restore the files.
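For what it's worth, the same add-to-inventory step can also be done from the ESXi console with vim-cmd; a sketch with placeholder datastore and VM names:

```shell
# Register the backed-up VM straight from the backup datastore:
vim-cmd solo/registervm /vmfs/volumes/backup-ds/myvm/myvm.vmx

# Later, note its Vmid and unregister it once the restore is done:
vim-cmd vmsvc/getallvms
vim-cmd vmsvc/unregister <Vmid>
```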
Wow kchawk, that's an awesome solution that I was unaware of. We might change our file level backup and recovery strategies with this information.
Obviously, it could cause major problems if you were to bring up two identical guests on your LAN, especially if they are AD servers. If you bring up the guest with the NICs disabled, is there an easy way to grab a file through the VIC and transfer it to your local machine? I am starting to research shared folders in VMware tools. Is this how you are accomplishing this?
Thanks in advance!
I attach the VM I am restoring from to a blank network, power it on, and then change the IP. My admin VM is connected to the same blank network with two NICs, which allows me to copy to wherever I need. Another way I have done it is to power down my admin VM and attach the vmdk of the VM I want to restore from. For me the first option is better, because I am mostly working with Linux VMs, but the admin PC is XP for the VI Client.
I'm testing ghettoVCB as a backup solution for our ESXi production servers. We currently have a two-node ESXi cluster with iSCSI SAN shared storage. There are two primary datastores on the SAN mounted into each ESXi host, one store for the VM O/S vmdk, the other (larger) store for the VM data vmdk. We also have an Openfiler 2.3 host between the two, and each ESXi host mounts a NFS share for temp storage. We use the temp storage for additional disks that are mounted into each VM specifically for swap files. We also use the NFS store for vswp files and also for snapshot creations. This is accomplished by using the 'WorkingDir' variable in the VM .vmx file. This works fine, and means unnecessary temp/swap data is kept off the SAN.
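For reference, the 'WorkingDir' override described above is a single .vmx entry; a sketch with a placeholder path (snapshot redo logs, and on some builds the .vmswp, follow this directory):

```
workingDir = "/vmfs/volumes/nfs-temp/myvm"
```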
When ghettoVCB runs, the snapshot is created in the host folder on the NFS share OK, and the backup is created on another NFS mount (vmbackup) on another Openfiler 2.3 host, but I still get the following error when ghettoVCB starts backing up each host:
DiskLib_Check() failed for source disk The system cannot find the file specified (25)
The snapshots are being deleted ok after ghettoVCB finishes, so it's using the external volume ok. As we have the vmdk's split over multiple datastores, could this still be the problem?
I'm in search of a good backup solution for ESXi, as well as a high-availability strategy for a few VMs across two servers that are hosting a few web server VMs. I can administer them remotely via a VPN. I've read through every post, which took the better part of a day to assimilate. Here is how my thoughts are gelling. I'd like feedback on any "flies in the ointment" you may see in my thought process.
Two web servers where I need good backups, rapid recovery, and near-HA on some of the VMs.
1. Two servers, each with dual 146 GB SCSI drives in a mirror to run the VMs from and to boot ESXi. Install a 250 GB SATA II drive on each server for backup; they each back up to the other's SATA. I have 2 gigabit network interfaces: a private one that can access a private network and all resources, and a public one. If one server has a problem, the other can mount the VM backup of the first. I can reverse the source and destination in lamw's script until I restore the other machine. I have gigabit NFS and iSCSI available if they are somehow needed to play a role in the HA scheme; I would still need the 250 GB SATAs, since I would still need a backup as a plan B if something goes wrong with their array. The only thing that would change if I used the hoster's NFS or iSCSI is that I would store the backups there instead of on the opposite server.
2. For an NFS host, the best pick would be Windows Server 2008 Web Edition (32-bit). It would inhabit the 250 GB SATA drive. It could run Windows Services for UNIX (WSFU) to present an NFS datastore to ESXi. Windows is the only OS that can run the VI Client, and a lot of the client's services would be much more useful since they wouldn't need to traverse a WAN. I could connect to it using RDP over the VPN to run the client, or simply map a drive to it via the VPN and drag files to and from it. It has an easy-to-use, easy-to-audit scheduler. All other VMware tools are available for both Windows and *NIX. People report the 32-bit version runs much faster under ESXi than the 64-bit. Windows Server is required when I upgrade to other VMware services. Windows Server 2008 Web Edition is free where I'm hosting. It also allows me to run Windows-based web sites for applications that require MS technologies; these technologies, such as MSSQL, could also be shared with *NIX hosts.
3. Use Windows to kick off lamw’s script to backup to the opposite server across the private network.
I'd appreciate your thoughts about this strategy.
I think I've actually resolved my own problem. I inserted the snippet of code from a couple of pages back that searches for all vmdk's linked to the VM. This actually seems to work really well; in my testing, it does indeed back up all the vmdk files, even if they are spread across multiple datastores. This has removed the error (25) I was getting, and the script now runs clean. It does, though, introduce another issue that I'd like to be considered.
A VM will often have a small O/S vmdk and a larger DATA vmdk (at least that's how I configure them). This allows me to locate vmdk's for the VM in different datastores that are optimized for that particular type of storage. The issue is that the actual files in the DATA vmdk's are often backed up using traditional methods (i.e. tape) at the filesystem level, and I'd rather not have these cloned via the script. These data vmdk's are usually larger (>20GB). I also have separate vmdk's to hold VM host swap files, which are generally smaller (<4GB), and I'd rather not have these backed up either, as they can be easily re-created.
What I thought was: would it be possible to include upper and lower vmdk size limits so we can control which vmdk's get cloned by the script? I'm thinking that if you set both the upper and lower variables to 0, it would back up everything, while setting a lower limit would skip vmdk's smaller than the value; likewise, the upper limit would skip vmdk's >= the value. This would allow granular control of which vmdk's are cloned, even if they are spread across multiple stores. If I went back to the original script, it would only back up the vmdk's in the primary VM folder, but we'd start getting the error (25)'s again when the script fails to find linked vmdk's.
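The proposed limit check could look something like this minimal sketch (the variable and function names are hypothetical, not part of ghettoVCB; both limits set to 0 mean back up everything):

```shell
#!/bin/sh
# Per-vmdk size filter: a non-zero LOWER skips vmdk's smaller than it,
# a non-zero UPPER skips vmdk's >= it. Both 0 = back up everything.
VMDK_SIZE_LOWER=0   # bytes; 0 disables
VMDK_SIZE_UPPER=0   # bytes; 0 disables

should_backup_vmdk() {
    # In the real script the size would come from ls -l on the -flat file;
    # wc -c is used here just to keep the sketch self-contained.
    SIZE=$(wc -c < "$1")
    if [ "$VMDK_SIZE_LOWER" -ne 0 ] && [ "$SIZE" -lt "$VMDK_SIZE_LOWER" ]; then
        return 1    # too small: skip
    fi
    if [ "$VMDK_SIZE_UPPER" -ne 0 ] && [ "$SIZE" -ge "$VMDK_SIZE_UPPER" ]; then
        return 1    # too large: skip
    fi
    return 0        # clone this one
}
```

The clone loop would then call `should_backup_vmdk` on each discovered vmdk and skip those where it returns non-zero.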
Glad you took some time to go over the thread. The majority of the time, questions can be answered by going over past comments, though this is a pretty hefty thread to have to go through.
So in my opinion, this is not a bad way to go about a recovery solution, and if I understand you correctly, you're looking at a 2-phase backup strategy?
Both ESX-1/2 have mirrored 146 GB drives, and each has an additional SATA drive that will be local backup storage? Then you're also looking at implementing an NFS server running on Windows as an additional backup target, which can be mounted as a datastore on either host as an extra precaution?
The additional local storage backup is a good idea, but it could also be redundant. If you have your boot/VM drive mirrored, then that should protect against failure of one of those drives, but having an additional drive within that same chassis will not protect against hardware failure of that system (CPU, memory, networking, PSU, etc.). Depending on the number of VM(s) you're looking to manage in this environment, and if you plan for growth, what you could do is leverage the NFS server for backups while also using the fact that it can be mounted on either ESX-1/2 to create a replica copy of each host's VM(s) on the other server's local storage. If you run into hardware failure, you automatically have the other set of VM(s) residing on the opposite host that can be booted up relatively quickly, assuming all configurations match exactly, and you still have all backups stored on the NFS server.
Again, this strategy will only be as useful as the redundancy you have on external dependencies, such as redundant network configurations to external switches, power supplies, etc.
I've not used Windows NFS Services before, but as you probably noticed from looking at this thread and at the other threads regarding NFS backup on either Linux/Windows, speeds will vary depending on configurations/etc.
Hopefully this answered some of your questions. There are many solutions to the problem, so by all means post in the overall ESXi forum if you want additional feedback about your backup/recovery design.
Thanks TONS for responding so rapidly.
So in my opinion, this is not a bad way to go about a recovery solution, and if I understand you correctly, you're looking at a 2-phase backup strategy? Both ESX-1/2 have mirrored 146 GB drives, and each has an additional SATA drive that will be local backup storage? Then you're also looking at implementing an NFS server running on Windows as an additional backup target, which can be mounted as a datastore on either host as an extra precaution?
No, my thoughts were to use only local storage. One of the VMs would be a Windows VM presenting an NFS volume backed by the local 250 GB SATA. However, the backups for each server would be stored on the opposite server. If one server goes offline, the other could run the VMs for both servers temporarily. I could also restore a new server from a backup on the opposite server, since it would contain the backups.
I've not used Windows NFS Services before, but as you probably noticed from looking at this thread and at the other threads regarding NFS backup on either Linux/Windows, speeds will vary depending on configurations/etc.
Reading the thread, speeds are all over the map; I don't know what is fast and what isn't. I haven't yet read all of the side threads, including the one you started specifically for NFS and another related one. The private network where I'm hosted has a reputation such that people are ditching their local drives because they say the iSCSI and NFS are faster than local storage, plus they're redundant. In my mind, that statement doesn't fly, because the math doesn't work for a gigabit interface compared to local storage. However, it does say something about the quality of their private network. Thus, backing up from one ESXi server to the opposite server's NFS VM should be about as good as it gets across a network, whatever that may be. I can use these 250s any way I want, or ditch them. If it makes more sense to back up to the local 250 and then ferry it across, I'm open to that too. 500 GB on their NAS would cost me about $250/mo., whereas 250 GB on each server would cost me about $40/mo. I'd like to hear what you would do.
Well, if you're planning on running an NFS server, whether Windows or Linux, just be warned if you don't get the speeds you're expecting. We've done tests in the past, and a physical NFS server simply gave us better performance. That's not to say you can't or won't see decent speeds, but there have been plenty of users trying NFS on a VM who saw less-than-par performance. Cost is always a factor when deciding on a solution, but before you look at that, give some of these scenarios a run before committing to any purchases. I know it's hard if you're talking about local storage, but see if you can get some loaners; I know HP has such programs for their servers, though I'm not sure what they cover when talking about disks. If you happen to have spare hardware around, try this in a lab to see what type of speeds you're getting and what you're willing to deal with. I assume that may also be a driving factor in balancing cost and performance.
It's nice to get a design/plan out, but a good part of the implementation is also verifying and testing that the solution will in fact work as stated, and in the timely manner expected.
there have been plenty of users trying NFS on a VM who saw less-than-par performance
Hmm... I had to take time to reflect on that. So to summarize, from your experience and anything you have seen, when a VM runs the NFS server there will be a performance problem, whether that VM is local or remote? I assumed that going across the network to a VM running NFS would be as fast as going to a dedicated NFS host, and that the network would be the limiting factor.
What if I ran Xtravirt Virtual SAN Appliance on both machines, and did the cross backups that way? The backup would be going from one data store to another except it would be across the network. I know you mentioned it, but I don't know if you have experience with it or not. What's your best guess as to whether or not that might eliminate all or most of the VM performance penalty?
see if you can get some loaners
The servers are dedicated servers that we rent by the month. We can add or remove anything we want, including drives, memory, and CPUs. iSCSI costs $.75/GB and NFS is $.50/GB per month. If it ends up making more sense in the overall scheme of things, we will do this.
It's nice to get a design/plan out, but a good part of the implementation is also verifying and testing that the solution will in fact work as stated, and in the timely manner expected.
True, but human progress depends on learning from the experiences of those who have gone before. I'm trying to narrow down the list of things to try to those that have the highest probability of success.
My Perlese is definitely not at the level where I can understand all the greps, awks, and regular expressions the way lamw wrote them. I've spent some time trying to do what I need, and I just can't get there.
So I'm hoping someone already did this and they can share the code.
I need to setup an exclusion list rather than inclusion list as it is now. So my concept is this:
Get all the VM names from the exclusionlist.txt into an array, call it $Exclusions (or @Exclusions)
Get the list of all VMs on the system by using vmsvc/getallvms (as done already, captured in /tmp/vms_list)
Now compare the two, remove any VMs that appear in exclusions from /tmp/vms_list
Proceed with backup based on /tmp/vms_list, one by one.
I could code this in AutoIT but it's useless in shell / ESX environment.
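The steps above don't need Perl (or AutoIT) at all; here's a sketch in plain shell, assuming /tmp/vms_list holds one VM name per line and that the ESXi Busybox grep supports the -x/-F/-f flags (the function name is my own):

```shell
#!/bin/sh
# Drop every VM named in the exclusion file from the candidate list.
# $1 = exclusion file (one VM name per line), $2 = VM list; rewrites $2.
filter_exclusions() {
    # -v invert match, -x whole-line match, -F literal strings,
    # -f read patterns from file. "|| true" keeps us going even when
    # every VM is excluded (grep exits 1 on zero matches).
    grep -v -x -F -f "$1" "$2" > "$2.tmp" || true
    mv "$2.tmp" "$2"
}

# Usage in the backup flow described above:
#   filter_exclusions exclusionlist.txt /tmp/vms_list
# ...then proceed with the backup based on /tmp/vms_list, one by one.
```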
I've been using Sanbe's ESXi-Backup.pl script modified by Khaliss (ESXi-Backup_mod_by_Khaliss.pl) to copy a running VM to an openfiler NAS unit as stage 1 of my backup strategy. Unfortunately, I can't figure out how to output the "printed" lines of that script to a log file. I have been searching the web to see if there is an easy way to do this, but can't seem to make any of the suggestions work. Has anybody here been doing the same and outputting to a logfile? Thanks in advance,
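One generic shell approach, sketched here with an assumed wrapper name and log path (not tested against that exact script), is to merge stderr into stdout and pipe through tee:

```shell
#!/bin/sh
# Send everything a command prints (stdout and stderr) to a log file
# while still showing it on the console.
LOG="/tmp/esxi-backup.log"   # the log path is an assumption; pick your own

run_logged() {
    # 2>&1 folds stderr into stdout; tee -a appends to the log.
    "$@" 2>&1 | tee -a "$LOG"
}

# e.g.:  run_logged perl ESXi-Backup_mod_by_Khaliss.pl
```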
Here is a script that I use to backup my ESXi servers with the VMs running on local DAS.
Basically, we have 2 separate RAID-5 volumes on the local server, and each is mounted as a datastore. We also have a NFS mounted datastore (Celerra NS20).
The script will snap the image, make the first copy to local storage, then make a second copy to the NFS storage. The script has the logic to also create an "archive"...which is the previous image backup.
In short, the script will keep two recent image backups on local storage and two on the remote NFS storage.
The script will also generate a timed log file with a lot of output...but you can modify the output to minimize to what you want. The "runbackup.bat" calls the "backupscript2Celerra.bat". We have these batch files installed on a server with RCLI and putty/plink to run as a scheduled task. So far, it has run for nearly 1 month without problem.
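The archive rotation described above might be sketched like this (the function and path names are hypothetical; the actual batch files may do it differently):

```shell
#!/bin/sh
# Keep two generations: before a new clone is written, the current backup
# directory becomes the archive and the oldest generation is dropped.
rotate_backup() {
    DEST="$1"    # e.g. /vmfs/volumes/local-ds/backups/myvm (placeholder)
    if [ -d "$DEST.archive" ]; then
        rm -rf "$DEST.archive"       # drop the oldest generation
    fi
    if [ -d "$DEST" ]; then
        mv "$DEST" "$DEST.archive"   # current backup becomes the archive
    fi
    mkdir -p "$DEST"                 # fresh target for the new clone
}
```

Running `rotate_backup` on both the local and the NFS destination before each clone yields the two-generation layout described.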
This script is derived from another script I found... but I don't remember where, so I apologize for not being able to "give credit". I'm quite certain that it was contributed to the communities.vmware.com forum. SO >>> Thanks to whoever gave me the ideas to build on!
We have no trouble running an older version of your script lamw.
Recently we tried to update the script and ran into this error:
"hs-vsbackup1 ~ # plink -pw ********* email@example.com /vmfs/volumes/Backups1/backupvs1.sh
/vmfs/volumes/Backups1/backupvs1.sh: /vmfs/volumes/Backups1/backupvs1.sh: 2: /vmfs/volumes/Backups1/ghettoVCB.sh: not found"
So I logged into the ESXi server directly and ran it:
" /vmfs/volumes/63cf7fd9-fab2ac18 # ./ghettoVCB.sh
-ash: ./ghettoVCB.sh: not found"
What are we missing? The file is there:
" /vmfs/volumes/63cf7fd9-fab2ac18 # ls -la ghettoVCB.sh
-rwxrwxrwx 1 nfsnobod nfsnobod 11536 Feb 10 21:31 ghettoVCB.sh"
Found the problem:
The version of ghettoVCB.sh from this link seems to have some Windows "^M" characters in it.
I had gone to a Linux machine and used wget to retrieve the script; we fixed it by running it through a text editor that could get rid of them.
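For anyone hitting the same thing: the carriage returns can also be stripped on any Linux box (or on the ESXi host itself, since Busybox ships tr) without a special editor. A sketch, with an assumed helper name:

```shell
#!/bin/sh
# Remove Windows CR characters ("^M") from a script in place.
strip_crlf() {
    tr -d '\r' < "$1" > "$1.unix"
    mv "$1.unix" "$1"
}

# Usage:  strip_crlf ghettoVCB.sh
# (re-apply the execute bit afterwards:  chmod +x ghettoVCB.sh)
```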
Yes, my problem was the SATA HDDs in the storage. I changed to SAS and it's running.
But since Friday I have had a new, strange problem.
If I try to start ghettoVCB on ESXi 3.5.0 Build 143129, I get the message
ash: Cannot fork
This problem now occurs on all 3 ESXi servers. The VMs are all running, and I can access ESXi via the Infrastructure Client, but I can reach only 2 of the ESXi servers over SSH; the third is not accepting SSH connections anymore.
I shut down all VMs and tried to reboot the ESXi host. The reboot did not work; I had to reboot directly from the console. After that it's running fine.
Maybe someone knows what the problem could be?
With esxtop I could not see any problems.