This script performs backups of virtual machines residing on ESX(i) 3.5/4.x/5.x/6.x/7.x servers using methodology similar to VMware's VCB tool. The script takes snapshots of live running virtual machines, backs up the master VMDK(s) and then upon completion, deletes the snapshot until the next backup. The only caveat is that it utilizes resources available to the Service Console of the ESX server or Busybox Console (Tech Support Mode) of the ESXi server running the backups as opposed to following the traditional method of offloading virtual machine backups through a VCB proxy.
This script has been tested on ESX 3.5/4.x/5.x and ESXi 3.5/4.x/5.x/6.x/7.x and supports the following backup mediums: LOCAL STORAGE, SAN and NFS. The script is non-interactive and can be setup to run via cron. Currently, this script accepts a text file that lists the display names of virtual machine(s) that are to be backed up. Additionally, one can specify a folder containing configuration files on a per VM basis for granular control over backup policies.
Additionally, for ESX(i) environments that don't have persistent NFS datastores designated for backups, the script offers the ability to automatically connect the ESX(i) server to an NFS exported folder and then, upon backup completion, disconnect it from the ESX(i) server. The connection is established by creating an NFS datastore link, which enables monolithic (thick) VMDK backups, as opposed to using the usual *nix mount command, which necessitates breaking VMDK files into the 2gbsparse format for backup. Enabling this mode will be self-explanatory when editing the script (Note: the VM_BACKUP_VOLUME variable is ignored if ENABLE_NON_PERSISTENT_NFS=1).
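What the script automates here is conceptually similar to attaching and detaching an NFS datastore by hand with esxcli; a rough sketch on ESXi 5.x or later might look like the following (the server, export path and datastore name are placeholders, and on older hosts the equivalent tool is esxcfg-nas):
~ # esxcli storage nfs add -H nfs-backup-server -s /exports/vm-backups -v backup
(backups are then written under /vmfs/volumes/backup)
~ # esxcli storage nfs remove -v backup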
In its current configuration, the script will allow up to 3 unique backups of the Virtual Machine before it will overwrite the previous backups; this however, can be modified to fit procedures if need be. Please be diligent in running the script in a test or staging environment before using it on production live Virtual Machines; this script functions well within our environment but there is a chance that it may not fit well into other environments.
If you have any questions, you may post in the dedicated ghettoVCB VMTN community group.
If you have found this script to be useful and would like to contribute back, please click here to donate.
Please read ALL of the documentation and FAQs before posting a question about an issue or problem. Thank you
1) Download ghettoVCB from github by clicking on the ZIP button at the top and upload to either your ESX or ESXi system (use scp or WinSCP to transfer the file)
2) Extract the contents of the zip file (filename will vary):
# unzip ghettoVCB-master.zip
Archive: ghettoVCB-master.zip
creating: ghettoVCB-master/
inflating: ghettoVCB-master/README
inflating: ghettoVCB-master/ghettoVCB-restore.sh
inflating: ghettoVCB-master/ghettoVCB-restore_vm_restore_configuration_template
inflating: ghettoVCB-master/ghettoVCB-vm_backup_configuration_template
inflating: ghettoVCB-master/ghettoVCB.conf
inflating: ghettoVCB-master/ghettoVCB.sh
3) The script is now ready to be used and is located in a directory named ghettoVCB-master
# ls -l
-rw-r--r-- 1 root root 281 Jan 6 03:58 README
-rw-r--r-- 1 root root 16024 Jan 6 03:58 ghettoVCB-restore.sh
-rw-r--r-- 1 root root 309 Jan 6 03:58 ghettoVCB-restore_vm_restore_configuration_template
-rw-r--r-- 1 root root 356 Jan 6 03:58 ghettoVCB-vm_backup_configuration_template
-rw-r--r-- 1 root root 631 Jan 6 03:58 ghettoVCB.conf
-rw-r--r-- 1 root root 49375 Jan 6 03:58 ghettoVCB.sh
4) Before using the scripts, you will need to enable the execute permission on both ghettoVCB.sh and ghettoVCB-restore.sh by running the following:
chmod +x ghettoVCB.sh
chmod +x ghettoVCB-restore.sh
The following variables need to be defined within the script or in VM backup policy prior to execution.
Defining the backup datastore and folder in which the backups are stored (if folder does not exist, it will automatically be created):
VM_BACKUP_VOLUME=/vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/WILLIAM_BACKUPS
Defining the backup disk format (zeroedthick, eagerzeroedthick, thin, and 2gbsparse are available):
DISK_BACKUP_FORMAT=thin
Note: If you are using the 2gbsparse format on an ESXi 5.1 host, backups may fail. Please download the latest version of the ghettoVCB script, which automatically resolves this, or take a look at this article for the details.
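If you are stuck on an older copy of the script, the workaround discussed in that article essentially amounts to loading the multiextent VMkernel module before cloning to or from 2gbsparse disks (listed here as a convenience; please verify against the article for your particular build):
~ # vmkload_mod multiextent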
Defining the backup rotation per VM:
VM_BACKUP_ROTATION_COUNT=3
Defining whether the VM is powered down or not prior to backup (1 = enable, 0 = disable):
Note: VM(s) that are powered off will not require snapshotting
POWER_VM_DOWN_BEFORE_BACKUP=0
Defining whether the VM can be hard powered off when "POWER_VM_DOWN_BEFORE_BACKUP" is enabled and the VM does not have VMware Tools installed:
ENABLE_HARD_POWER_OFF=0
If "ENABLE_HARD_POWER_OFF" is enabled, then this defines the number of (60sec) iterations the script will before executing a hard power off when:
ITER_TO_WAIT_SHUTDOWN=3
The number of (60-second) iterations the script will wait when powering off the VM before giving up and ignoring that particular VM for backup:
POWER_DOWN_TIMEOUT=5
The number of (60-second) iterations the script will wait when taking a snapshot of a VM before giving up and ignoring that particular VM for backup:
Note: Default value should suffice
SNAPSHOT_TIMEOUT=15
Defining whether or not to enable compression (1 = enable, 0 = disable):
ENABLE_COMPRESSION=0
NOTE: With ESXi 3.x/4.x/5.x, there is a limitation on the maximum size of a VM for compression within the unsupported Busybox Console, which does not affect backups running on classic ESX 3.x, 4.x or 5.x. On ESXi 3.x the largest supported VM is 4GB for compression and on ESXi 4.x the largest supported VM is 8GB. If you try to compress a larger VM, you may run into issues when trying to extract upon a restore. PLEASE TEST THE RESTORE PROCESS BEFORE MOVING TO PRODUCTION SYSTEMS!
Defining the adapter type for the backed up VMDK (DEPRECATED - NO LONGER NEEDED):
ADAPTER_FORMAT=buslogic
Defining whether the virtual machine's memory is snapshotted and whether quiescing is enabled (1 = enable, 0 = disable):
Note: By default both are disabled
VM_SNAPSHOT_MEMORY=0
VM_SNAPSHOT_QUIESCE=0
NOTE: VM_SNAPSHOT_MEMORY is only used to ensure that when the snapshot is taken, its memory contents are also captured. This is only relevant to the actual snapshot and is not used in any way for the backup itself. All backups, whether the VM is running or offline, will result in an offline VM backup when you restore. This was originally added for debugging purposes and in general should be left disabled.
Defining the VMDK(s) to back up for a particular VM, either a comma-separated list of VMDKs or "all":
VMDK_FILES_TO_BACKUP="myvmdk.vmdk"
Defining whether or not VM(s) with existing snapshots can be backed up. This flag means it will CONSOLIDATE ALL EXISTING SNAPSHOTS for a VM prior to starting the backup (1 = yes, 0 = no):
ALLOW_VMS_WITH_SNAPSHOTS_TO_BE_BACKEDUP=0
Defining the order in which VM(s) should be shut down first, especially if there is a dependency between multiple VM(s). This should be a comma-separated list of VM(s):
VM_SHUTDOWN_ORDER=vm1,vm2,vm3
Defining the order in which VM(s) should be started up after backups have completed, especially if there is a dependency between multiple VM(s). This should be a comma-separated list of VM(s):
VM_STARTUP_ORDER=vm3,vm2,vm1
Defining NON-PERSISTENT NFS Backup Volume (1 = yes, 0 = no):
ENABLE_NON_PERSISTENT_NFS=0
NOTE: This is meant for environments that do not want a persistent connection to their NFS backup volume and only want the NFS volume mounted during backups. The script expects the following 5 variables to be defined if this is to be used: UNMOUNT_NFS, NFS_SERVER, NFS_MOUNT, NFS_LOCAL_NAME and NFS_VM_BACKUP_DIR
Defining whether or not to unmount the NFS backup volume (1 = yes, 0 = no):
UNMOUNT_NFS=0
Defining the NFS server address (IP/hostname):
NFS_SERVER=172.51.0.192
Defining the NFS export path:
NFS_MOUNT=/upload
Defining the NFS datastore name:
NFS_LOCAL_NAME=backup
Defining the NFS backup directory for VMs:
NFS_VM_BACKUP_DIR=mybackups
NOTE: Email support is only available if you are running vSphere 4.1 or greater and this feature is experimental. If you are having issues with sending mail, please take a look at the Email Backup Log section
Defining whether or not to email backup logs (1 = yes, 0 = no):
EMAIL_LOG=1
Defining whether or not the email message will be deleted off the host regardless of whether it was sent successfully; this is used for debugging purposes (1 = yes, 0 = no):
EMAIL_DEBUG=1
Defining email server:
EMAIL_SERVER=auroa.primp-industries.com
Defining email server port:
EMAIL_SERVER_PORT=25
Defining the email delay interval (useful if you have a slow SMTP server and would like to include a delay in netcat using the -i param; default is 1 second):
EMAIL_DELAY_INTERVAL=1
Defining recipient of the email:
EMAIL_TO=auroa@primp-industries.com
Defining the from address, which may require a specific domain entry depending on the email server configuration:
EMAIL_FROM=root@ghettoVCB
Defining whether to create an RSYNC-friendly symbolic link (1 = yes, 0 = no):
RSYNC_LINK=0
Note: This enables the automatic creation of a generic symbolic link (both a relative & absolute path) which users can refer to when running replication backups using rsync from a remote host. This does not actually make ghettoVCB perform rsync backups. Please take a look at the Rsync Section of the documentation for more details.
# cat ghettoVCB.conf
VM_BACKUP_VOLUME=/vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/WILLIAM_BACKUPS
DISK_BACKUP_FORMAT=thin
VM_BACKUP_ROTATION_COUNT=3
POWER_VM_DOWN_BEFORE_BACKUP=0
ENABLE_HARD_POWER_OFF=0
ITER_TO_WAIT_SHUTDOWN=3
POWER_DOWN_TIMEOUT=5
ENABLE_COMPRESSION=0
VM_SNAPSHOT_MEMORY=0
VM_SNAPSHOT_QUIESCE=0
ALLOW_VMS_WITH_SNAPSHOTS_TO_BE_BACKEDUP=0
ENABLE_NON_PERSISTENT_NFS=0
UNMOUNT_NFS=0
NFS_SERVER=172.30.0.195
NFS_MOUNT=/nfsshare
NFS_LOCAL_NAME=nfs_storage_backup
NFS_VM_BACKUP_DIR=mybackups
SNAPSHOT_TIMEOUT=15
EMAIL_LOG=0
EMAIL_SERVER=auroa.primp-industries.com
EMAIL_SERVER_PORT=25
EMAIL_DELAY_INTERVAL=1
EMAIL_TO=auroa@primp-industries.com
EMAIL_FROM=root@ghettoVCB
WORKDIR_DEBUG=0
VM_SHUTDOWN_ORDER=
VM_STARTUP_ORDER=
To override any existing configurations within the ghettoVCB.sh script and to use a global configuration file, the user just needs to specify the -g flag and the path to the global configuration file (for an example, please refer to the sample execution section of the documentation).
Running multiple instances of ghettoVCB is now supported with the latest release by specifying the working directory (-w) flag.
By default, the working directory of the ghettoVCB instance is /tmp/ghettoVCB.work and you can run another instance by providing an alternate working directory. You should try to minimize the number of ghettoVCB instances running on your ESXi host, as each one consumes some amount of resources when running in the ESXi Shell. This is considered an experimental feature, so please test in a development environment to ensure everything is working prior to moving to production systems.
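For example, two instances could be run against different VM lists by giving the second one its own working directory via the -w flag (the list names below are placeholders):
# ./ghettoVCB.sh -f vms_to_backup_group1
# ./ghettoVCB.sh -f vms_to_backup_group2 -w /tmp/ghettoVCB.work2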
Ensure that you do not edit past this section:
########################## DO NOT MODIFY PAST THIS LINE ##########################
# ./ghettoVCB.sh
###############################################################################
#
# ghettoVCB for ESX/ESXi 3.5, 4.x+ and 5.x
# Author: William Lam
# http://www.virtuallyghetto.com/
# Documentation: http://communities.vmware.com/docs/DOC-8760
# Created: 11/17/2008
# Last modified: 2012_12_17 Version 0
#
###############################################################################
Usage: ghettoVCB.sh [options]
OPTIONS:
-a Backup all VMs on host
-f List of VMs to backup
-m Name of VM to backup (overrides -f)
-c VM configuration directory for VM backups
-g Path to global ghettoVCB configuration file
-l File to output logging
-w ghettoVCB work directory (default: )
-d Debug level [info|debug|dryrun] (default: info)
(e.g.)
Backup VMs stored in a list
./ghettoVCB.sh -f vms_to_backup
Backup a single VM
./ghettoVCB.sh -m vm_to_backup
Backup all VMs residing on this host
./ghettoVCB.sh -a
Backup all VMs residing on this host except for the VMs in the exclusion list
./ghettoVCB.sh -a -e vm_exclusion_list
Backup VMs based on specific configuration located in directory
./ghettoVCB.sh -f vms_to_backup -c vm_backup_configs
Backup VMs using global ghettoVCB configuration file
./ghettoVCB.sh -f vms_to_backup -g /global/ghettoVCB.conf
Output will log to /tmp/ghettoVCB.log (consider logging to local or remote datastore to persist logs)
./ghettoVCB.sh -f vms_to_backup -l /vmfs/volume/local-storage/ghettoVCB.log
Dry run (no backup will take place)
./ghettoVCB.sh -f vms_to_backup -d dryrun
The input to this script is a file that contains the display name of the virtual machine(s) separated by newlines. When creating this file on a non-Linux/UNIX system, you may introduce ^M characters which can cause the script to misbehave. To ensure this does not occur, please create the file on the ESX/ESXi host (a quick check is sketched after the sample below).
Here is a sample of what the file would look like:
[root@himalaya ~]# cat vms_to_backup
vCOPS
vMA
vCloudConnector
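If the list might have been created or edited on a Windows system, a quick way to check for and strip carriage returns directly from the ESX(i) shell is sketched below (assuming the busybox od and tr applets are available, as they are on typical installs):
# od -c vms_to_backup | grep '\\r'
# tr -d '\r' < vms_to_backup > vms_to_backup.clean && mv vms_to_backup.clean vms_to_backup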
Debug Mode
Note: This execution mode provides a quick summary of whether a given set of VM(s)/VMDK(s) will be backed up. It provides additional information such as VMs that may have snapshots, VMDK(s) that are configured as independent disks, or other issues that may cause a VM or VMDK to not be backed up.
[root@himalaya ghettoVCB]# ./ghettoVCB.sh -f vms_to_backup -d dryrun
Logging output to "/tmp/ghettoVCB-2011-03-13_15-19-57.log" ...
2011-03-13 15:19:57 -- info: ============================== ghettoVCB LOG START ==============================
2011-03-13 15:19:57 -- info: CONFIG - VERSION = 2011_03_13_1
2011-03-13 15:19:57 -- info: CONFIG - GHETTOVCB_PID = 30157
2011-03-13 15:19:57 -- info: CONFIG - VM_BACKUP_VOLUME = /vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/WILLIAM_BACKUPS
2011-03-13 15:19:57 -- info: CONFIG - VM_BACKUP_ROTATION_COUNT = 3
2011-03-13 15:19:57 -- info: CONFIG - VM_BACKUP_DIR_NAMING_CONVENTION = 2011-03-13_15-19-57
2011-03-13 15:19:57 -- info: CONFIG - DISK_BACKUP_FORMAT = thin
2011-03-13 15:19:57 -- info: CONFIG - POWER_VM_DOWN_BEFORE_BACKUP = 0
2011-03-13 15:19:57 -- info: CONFIG - ENABLE_HARD_POWER_OFF = 0
2011-03-13 15:19:57 -- info: CONFIG - ITER_TO_WAIT_SHUTDOWN = 3
2011-03-13 15:19:57 -- info: CONFIG - POWER_DOWN_TIMEOUT = 5
2011-03-13 15:19:57 -- info: CONFIG - SNAPSHOT_TIMEOUT = 15
2011-03-13 15:19:57 -- info: CONFIG - LOG_LEVEL = dryrun
2011-03-13 15:19:57 -- info: CONFIG - BACKUP_LOG_OUTPUT = /tmp/ghettoVCB-2011-03-13_15-19-57.log
2011-03-13 15:19:57 -- info: CONFIG - VM_SNAPSHOT_MEMORY = 0
2011-03-13 15:19:57 -- info: CONFIG - VM_SNAPSHOT_QUIESCE = 0
2011-03-13 15:19:57 -- info: CONFIG - VMDK_FILES_TO_BACKUP = all
2011-03-13 15:19:57 -- info: CONFIG - EMAIL_LOG = 0
2011-03-13 15:19:57 -- info:
2011-03-13 15:19:57 -- dryrun: ###############################################
2011-03-13 15:19:57 -- dryrun: Virtual Machine: scofield
2011-03-13 15:19:57 -- dryrun: VM_ID: 704
2011-03-13 15:19:57 -- dryrun: VMX_PATH: /vmfs/volumes/himalaya-local-SATA.RE4-GP:Storage/scofield/scofield.vmx
2011-03-13 15:19:57 -- dryrun: VMX_DIR: /vmfs/volumes/himalaya-local-SATA.RE4-GP:Storage/scofield
2011-03-13 15:19:57 -- dryrun: VMX_CONF: scofield/scofield.vmx
2011-03-13 15:19:57 -- dryrun: VMFS_VOLUME: himalaya-local-SATA.RE4-GP:Storage
2011-03-13 15:19:57 -- dryrun: VMDK(s):
2011-03-13 15:19:58 -- dryrun: scofield_3.vmdk 3 GB
2011-03-13 15:19:58 -- dryrun: scofield_2.vmdk 2 GB
2011-03-13 15:19:58 -- dryrun: scofield_1.vmdk 1 GB
2011-03-13 15:19:58 -- dryrun: scofield.vmdk 5 GB
2011-03-13 15:19:58 -- dryrun: INDEPENDENT VMDK(s):
2011-03-13 15:19:58 -- dryrun: TOTAL_VM_SIZE_TO_BACKUP: 11 GB
2011-03-13 15:19:58 -- dryrun: ###############################################
2011-03-13 15:19:58 -- dryrun: ###############################################
2011-03-13 15:19:58 -- dryrun: Virtual Machine: vMA
2011-03-13 15:19:58 -- dryrun: VM_ID: 1440
2011-03-13 15:19:58 -- dryrun: VMX_PATH: /vmfs/volumes/himalaya-local-SATA.RE4-GP:Storage/vMA/vMA.vmx
2011-03-13 15:19:58 -- dryrun: VMX_DIR: /vmfs/volumes/himalaya-local-SATA.RE4-GP:Storage/vMA
2011-03-13 15:19:58 -- dryrun: VMX_CONF: vMA/vMA.vmx
2011-03-13 15:19:58 -- dryrun: VMFS_VOLUME: himalaya-local-SATA.RE4-GP:Storage
2011-03-13 15:19:58 -- dryrun: VMDK(s):
2011-03-13 15:19:58 -- dryrun: vMA-000002.vmdk 5 GB
2011-03-13 15:19:58 -- dryrun: INDEPENDENT VMDK(s):
2011-03-13 15:19:58 -- dryrun: TOTAL_VM_SIZE_TO_BACKUP: 5 GB
2011-03-13 15:19:58 -- dryrun: Snapshots found for this VM, please commit all snapshots before continuing!
2011-03-13 15:19:58 -- dryrun: THIS VIRTUAL MACHINE WILL NOT BE BACKED UP DUE TO EXISTING SNAPSHOTS!
2011-03-13 15:19:58 -- dryrun: ###############################################
2011-03-13 15:19:58 -- dryrun: ###############################################
2011-03-13 15:19:58 -- dryrun: Virtual Machine: vCloudConnector
2011-03-13 15:19:58 -- dryrun: VM_ID: 2064
2011-03-13 15:19:58 -- dryrun: VMX_PATH: /vmfs/volumes/himalaya-local-SATA.RE4-GP:Storage/vCloudConnector/vCloudConnector.vmx
2011-03-13 15:19:58 -- dryrun: VMX_DIR: /vmfs/volumes/himalaya-local-SATA.RE4-GP:Storage/vCloudConnector
2011-03-13 15:19:58 -- dryrun: VMX_CONF: vCloudConnector/vCloudConnector.vmx
2011-03-13 15:19:58 -- dryrun: VMFS_VOLUME: himalaya-local-SATA.RE4-GP:Storage
2011-03-13 15:19:58 -- dryrun: VMDK(s):
2011-03-13 15:19:59 -- dryrun: vCloudConnector.vmdk 3 GB
2011-03-13 15:19:59 -- dryrun: INDEPENDENT VMDK(s):
2011-03-13 15:19:59 -- dryrun: vCloudConnector_1.vmdk 40 GB
2011-03-13 15:19:59 -- dryrun: TOTAL_VM_SIZE_TO_BACKUP: 3 GB
2011-03-13 15:19:59 -- dryrun: Snapshots can not be taken for indepdenent disks!
2011-03-13 15:19:59 -- dryrun: THIS VIRTUAL MACHINE WILL NOT HAVE ALL ITS VMDKS BACKED UP!
2011-03-13 15:19:59 -- dryrun: ###############################################
2011-03-13 15:19:59 -- info: ###### Final status: OK, only a dryrun. ######
2011-03-13 15:19:59 -- info: ============================== ghettoVCB LOG END ================================
In the example above, three VMs were evaluated: scofield will have all of its VMDKs backed up, vMA will be skipped because it has an existing snapshot, and vCloudConnector will have its regular VMDK backed up while its independent VMDK is skipped.
Note: This execution mode provides more in-depth information about the environment/backup process, including additional storage debugging information about both the source and destination datastores before and after backups. This can be very useful in troubleshooting backups.
[root@himalaya ghettoVCB]# ./ghettoVCB.sh -f vms_to_backup -d debug
Logging output to "/tmp/ghettoVCB-2011-03-13_15-27-59.log" ...
2011-03-13 15:27:59 -- info: ============================== ghettoVCB LOG START ==============================
2011-03-13 15:27:59 -- debug: Succesfully acquired lock directory - /tmp/ghettoVCB.lock
2011-03-13 15:27:59 -- debug: HOST VERSION: VMware ESX 4.1.0 build-260247
2011-03-13 15:27:59 -- debug: HOST LEVEL: VMware ESX 4.1.0 GA
2011-03-13 15:27:59 -- debug: HOSTNAME: himalaya.primp-industries.com
2011-03-13 15:27:59 -- info: CONFIG - VERSION = 2011_03_13_1
2011-03-13 15:27:59 -- info: CONFIG - GHETTOVCB_PID = 31074
2011-03-13 15:27:59 -- info: CONFIG - VM_BACKUP_VOLUME = /vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/WILLIAM_BACKUPS
2011-03-13 15:27:59 -- info: CONFIG - VM_BACKUP_ROTATION_COUNT = 3
2011-03-13 15:27:59 -- info: CONFIG - VM_BACKUP_DIR_NAMING_CONVENTION = 2011-03-13_15-27-59
2011-03-13 15:27:59 -- info: CONFIG - DISK_BACKUP_FORMAT = thin
2011-03-13 15:27:59 -- info: CONFIG - POWER_VM_DOWN_BEFORE_BACKUP = 0
2011-03-13 15:27:59 -- info: CONFIG - ENABLE_HARD_POWER_OFF = 0
2011-03-13 15:27:59 -- info: CONFIG - ITER_TO_WAIT_SHUTDOWN = 3
2011-03-13 15:27:59 -- info: CONFIG - POWER_DOWN_TIMEOUT = 5
2011-03-13 15:27:59 -- info: CONFIG - SNAPSHOT_TIMEOUT = 15
2011-03-13 15:27:59 -- info: CONFIG - LOG_LEVEL = debug
2011-03-13 15:27:59 -- info: CONFIG - BACKUP_LOG_OUTPUT = /tmp/ghettoVCB-2011-03-13_15-27-59.log
2011-03-13 15:27:59 -- info: CONFIG - VM_SNAPSHOT_MEMORY = 0
2011-03-13 15:27:59 -- info: CONFIG - VM_SNAPSHOT_QUIESCE = 0
2011-03-13 15:27:59 -- info: CONFIG - VMDK_FILES_TO_BACKUP = all
2011-03-13 15:27:59 -- info: CONFIG - EMAIL_LOG = 0
2011-03-13 15:27:59 -- info:
2011-03-13 15:28:01 -- debug: Storage Information before backup:
2011-03-13 15:28:01 -- debug: SRC_DATASTORE: himalaya-local-SATA.RE4-GP:Storage
2011-03-13 15:28:01 -- debug: SRC_DATASTORE_CAPACITY: 1830.5 GB
2011-03-13 15:28:01 -- debug: SRC_DATASTORE_FREE: 539.4 GB
2011-03-13 15:28:01 -- debug: SRC_DATASTORE_BLOCKSIZE: 4
2011-03-13 15:28:01 -- debug: SRC_DATASTORE_MAX_FILE_SIZE: 1024 GB
2011-03-13 15:28:01 -- debug:
2011-03-13 15:28:01 -- debug: DST_DATASTORE: dlgCore-NFS-bigboi.VM-Backups
2011-03-13 15:28:01 -- debug: DST_DATASTORE_CAPACITY: 1348.4 GB
2011-03-13 15:28:01 -- debug: DST_DATASTORE_FREE: 296.8 GB
2011-03-13 15:28:01 -- debug: DST_DATASTORE_BLOCKSIZE: NA
2011-03-13 15:28:01 -- debug: DST_DATASTORE_MAX_FILE_SIZE: NA
2011-03-13 15:28:01 -- debug:
2011-03-13 15:28:02 -- info: Initiate backup for scofield
2011-03-13 15:28:02 -- debug: /usr/sbin/vmkfstools -i "/vmfs/volumes/himalaya-local-SATA.RE4-GP:Storage/scofield/scofield_3.vmdk" -a "buslogic" -d "thin" "/vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/WILLIAM_BACKUPS/scofield/scofield-2011-03-13_15-27-59/scofield_3.vmdk"
Destination disk format: VMFS thin-provisioned
Cloning disk '/vmfs/volumes/himalaya-local-SATA.RE4-GP:Storage/scofield/scofield_3.vmdk'...
Clone: 37% done.
2011-03-13 15:28:04 -- debug: /usr/sbin/vmkfstools -i "/vmfs/volumes/himalaya-local-SATA.RE4-GP:Storage/scofield/scofield_2.vmdk" -a "buslogic" -d "thin" "/vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/WILLIAM_BACKUPS/scofield/scofield-2011-03-13_15-27-59/scofield_2.vmdk"
Destination disk format: VMFS thin-provisioned
Cloning disk '/vmfs/volumes/himalaya-local-SATA.RE4-GP:Storage/scofield/scofield_2.vmdk'...
Clone: 85% done.
2011-03-13 15:28:05 -- debug: /usr/sbin/vmkfstools -i "/vmfs/volumes/himalaya-local-SATA.RE4-GP:Storage/scofield/scofield_1.vmdk" -a "buslogic" -d "thin" "/vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/WILLIAM_BACKUPS/scofield/scofield-2011-03-13_15-27-59/scofield_1.vmdk"
2011-03-13 15:28:06 -- debug: /usr/sbin/vmkfstools -i "/vmfs/volumes/himalaya-local-SATA.RE4-GP:Storage/scofield/scofield.vmdk" -a "buslogic" -d "thin" "/vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/WILLIAM_BACKUPS/scofield/scofield-2011-03-13_15-27-59/scofield.vmdk"
Destination disk format: VMFS thin-provisioned
Cloning disk '/vmfs/volumes/himalaya-local-SATA.RE4-GP:Storage/scofield/scofield.vmdk'...
Clone: 78% done.
2011-03-13 15:29:52 -- info: Backup Duration: 1.83 Minutes
2011-03-13 15:29:52 -- info: Successfully completed backup for scofield!
2011-03-13 15:29:54 -- debug: Storage Information after backup:
2011-03-13 15:29:54 -- debug: SRC_DATASTORE: himalaya-local-SATA.RE4-GP:Storage
2011-03-13 15:29:54 -- debug: SRC_DATASTORE_CAPACITY: 1830.5 GB
2011-03-13 15:29:54 -- debug: SRC_DATASTORE_FREE: 539.4 GB
2011-03-13 15:29:54 -- debug: SRC_DATASTORE_BLOCKSIZE: 4
2011-03-13 15:29:54 -- debug: SRC_DATASTORE_MAX_FILE_SIZE: 1024 GB
2011-03-13 15:29:54 -- debug:
2011-03-13 15:29:54 -- debug: DST_DATASTORE: dlgCore-NFS-bigboi.VM-Backups
2011-03-13 15:29:54 -- debug: DST_DATASTORE_CAPACITY: 1348.4 GB
2011-03-13 15:29:54 -- debug: DST_DATASTORE_FREE: 296.8 GB
2011-03-13 15:29:54 -- debug: DST_DATASTORE_BLOCKSIZE: NA
2011-03-13 15:29:54 -- debug: DST_DATASTORE_MAX_FILE_SIZE: NA
2011-03-13 15:29:54 -- debug:
2011-03-13 15:29:55 -- debug: Storage Information before backup:
2011-03-13 15:29:55 -- debug: SRC_DATASTORE: himalaya-local-SATA.RE4-GP:Storage
2011-03-13 15:29:55 -- debug: SRC_DATASTORE_CAPACITY: 1830.5 GB
2011-03-13 15:29:55 -- debug: SRC_DATASTORE_FREE: 539.4 GB
2011-03-13 15:29:55 -- debug: SRC_DATASTORE_BLOCKSIZE: 4
2011-03-13 15:29:55 -- debug: SRC_DATASTORE_MAX_FILE_SIZE: 1024 GB
2011-03-13 15:29:55 -- debug:
2011-03-13 15:29:55 -- debug: DST_DATASTORE: dlgCore-NFS-bigboi.VM-Backups
2011-03-13 15:29:55 -- debug: DST_DATASTORE_CAPACITY: 1348.4 GB
2011-03-13 15:29:55 -- debug: DST_DATASTORE_FREE: 296.8 GB
2011-03-13 15:29:55 -- debug: DST_DATASTORE_BLOCKSIZE: NA
2011-03-13 15:29:55 -- debug: DST_DATASTORE_MAX_FILE_SIZE: NA
2011-03-13 15:29:55 -- debug:
2011-03-13 15:29:55 -- info: Snapshot found for vMA, backup will not take place
2011-03-13 15:29:57 -- debug: Storage Information before backup:
2011-03-13 15:29:57 -- debug: SRC_DATASTORE: himalaya-local-SATA.RE4-GP:Storage
2011-03-13 15:29:57 -- debug: SRC_DATASTORE_CAPACITY: 1830.5 GB
2011-03-13 15:29:57 -- debug: SRC_DATASTORE_FREE: 539.4 GB
2011-03-13 15:29:57 -- debug: SRC_DATASTORE_BLOCKSIZE: 4
2011-03-13 15:29:57 -- debug: SRC_DATASTORE_MAX_FILE_SIZE: 1024 GB
2011-03-13 15:29:57 -- debug:
2011-03-13 15:29:57 -- debug: DST_DATASTORE: dlgCore-NFS-bigboi.VM-Backups
2011-03-13 15:29:57 -- debug: DST_DATASTORE_CAPACITY: 1348.4 GB
2011-03-13 15:29:57 -- debug: DST_DATASTORE_FREE: 296.8 GB
2011-03-13 15:29:57 -- debug: DST_DATASTORE_BLOCKSIZE: NA
2011-03-13 15:29:57 -- debug: DST_DATASTORE_MAX_FILE_SIZE: NA
2011-03-13 15:29:57 -- debug:
2011-03-13 15:29:58 -- info: Initiate backup for vCloudConnector
2011-03-13 15:29:58 -- debug: /usr/sbin/vmkfstools -i "/vmfs/volumes/himalaya-local-SATA.RE4-GP:Storage/vCloudConnector/vCloudConnector.vmdk" -a "buslogic" -d "thin" "/vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/WILLIAM_BACKUPS/vCloudConnector/vCloudConnector-2011-03-13_15-27-59/vCloudConnector.vmdk"
Destination disk format: VMFS thin-provisioned
Cloning disk '/vmfs/volumes/himalaya-local-SATA.RE4-GP:Storage/vCloudConnector/vCloudConnector.vmdk'...
Clone: 97% done.
2011-03-13 15:30:45 -- info: Backup Duration: 47 Seconds
2011-03-13 15:30:45 -- info: WARN: vCloudConnector has some Independent VMDKs that can not be backed up!
2011-03-13 15:30:45 -- info: ###### Final status: ERROR: Only some of the VMs backed up, and some disk(s) failed! ######
2011-03-13 15:30:45 -- debug: Succesfully removed lock directory - /tmp/ghettoVCB.lock
2011-03-13 15:30:45 -- info: ============================== ghettoVCB LOG END ================================
Backup VMs stored in a list
[root@himalaya ~]# ./ghettoVCB.sh -f vms_to_backup
Backup a single VM
# ./ghettoVCB.sh -m MyVM
Backup all VMs residing on this host
/ghettoVCB # ./ghettoVCB.sh -a
Backup all VMs residing on this host except for the VMs in the exclusion list
/ghettoVCB # ./ghettoVCB.sh -a -e vm_exclusion_list
1. Create folder to hold individual VM backup policies (can be named anything):
[root@himalaya ~]# mkdir backup_config
2. Create an individual VM backup policy for each VM to be backed up, ensuring each file is named exactly as the display name of the VM (use the provided template to create duplicates):
[root@himalaya backup_config]# cp ghettoVCB-vm_backup_configuration_template scofield
[root@himalaya backup_config]# cp ghettoVCB-vm_backup_configuration_template vCloudConnector
Listing of VM backup policies within the backup configuration directory
[root@himalaya backup_config]# ls
scofield vCloudConnector
ghettoVCB-vm_backup_configuration_template
Backup policy for "scofield" (backup only 2 specific VMDKs)
[root@himalaya backup_config]# cat scofield
VM_BACKUP_VOLUME=/vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/WILLIAM_BACKUPS
DISK_BACKUP_FORMAT=thin
VM_BACKUP_ROTATION_COUNT=3
POWER_VM_DOWN_BEFORE_BACKUP=0
ENABLE_HARD_POWER_OFF=0
ITER_TO_WAIT_SHUTDOWN=4
POWER_DOWN_TIMEOUT=5
SNAPSHOT_TIMEOUT=15
ENABLE_COMPRESSION=0
VM_SNAPSHOT_MEMORY=0
VM_SNAPSHOT_QUIESCE=0
VMDK_FILES_TO_BACKUP="scofield_2.vmdk,scofield_1.vmdk"
Backup policy for VM "vCloudConnector" (backup all VMDKs found)
[root@himalaya backup_config]# cat vCloudConnector
VM_BACKUP_VOLUME=/vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/WILLIAM_BACKUPS
DISK_BACKUP_FORMAT=thin
VM_BACKUP_ROTATION_COUNT=3
POWER_VM_DOWN_BEFORE_BACKUP=0
ENABLE_HARD_POWER_OFF=0
ITER_TO_WAIT_SHUTDOWN=4
POWER_DOWN_TIMEOUT=5
SNAPSHOT_TIMEOUT=15
ENABLE_COMPRESSION=0
VM_SNAPSHOT_MEMORY=0
VM_SNAPSHOT_QUIESCE=0
VMDK_FILES_TO_BACKUP="vCloudConnector.vmdk"
Note: When specifying the -c option (individual VM backup policy mode), if a VM is listed in the backup list but DOES NOT have a corresponding backup policy, the VM will be backed up using the default configuration found within the ghettoVCB.sh script.
Execution of backup
[root@himalaya ~]# ./ghettoVCB.sh -f vms_to_backup -c backup_config -l /tmp/ghettoVCB.log
2011-03-13 15:40:50 -- info: ============================== ghettoVCB LOG START ==============================
2011-03-13 15:40:51 -- info: CONFIG - USING CONFIGURATION FILE = backup_config//scofield
2011-03-13 15:40:51 -- info: CONFIG - VERSION = 2011_03_13_1
2011-03-13 15:40:51 -- info: CONFIG - GHETTOVCB_PID = 2967
2011-03-13 15:40:51 -- info: CONFIG - VM_BACKUP_VOLUME = /vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/WILLIAM_BACKUPS
2011-03-13 15:40:51 -- info: CONFIG - VM_BACKUP_ROTATION_COUNT = 3
2011-03-13 15:40:51 -- info: CONFIG - VM_BACKUP_DIR_NAMING_CONVENTION = 2011-03-13_15-40-50
2011-03-13 15:40:51 -- info: CONFIG - DISK_BACKUP_FORMAT = thin
2011-03-13 15:40:51 -- info: CONFIG - POWER_VM_DOWN_BEFORE_BACKUP = 0
2011-03-13 15:40:51 -- info: CONFIG - ENABLE_HARD_POWER_OFF = 0
2011-03-13 15:40:51 -- info: CONFIG - ITER_TO_WAIT_SHUTDOWN = 4
2011-03-13 15:40:51 -- info: CONFIG - POWER_DOWN_TIMEOUT = 5
2011-03-13 15:40:51 -- info: CONFIG - SNAPSHOT_TIMEOUT = 15
2011-03-13 15:40:51 -- info: CONFIG - LOG_LEVEL = info
2011-03-13 15:40:51 -- info: CONFIG - BACKUP_LOG_OUTPUT = /tmp/ghettoVCB.log
2011-03-13 15:40:51 -- info: CONFIG - VM_SNAPSHOT_MEMORY = 0
2011-03-13 15:40:51 -- info: CONFIG - VM_SNAPSHOT_QUIESCE = 0
2011-03-13 15:40:51 -- info: CONFIG - VMDK_FILES_TO_BACKUP = scofield_2.vmdk,scofield_1.vmdk
2011-03-13 15:40:51 -- info: CONFIG - EMAIL_LOG = 0
2011-03-13 15:40:51 -- info:
2011-03-13 15:40:53 -- info: Initiate backup for scofield
Destination disk format: VMFS thin-provisioned
Cloning disk '/vmfs/volumes/himalaya-local-SATA.RE4-GP:Storage/scofield/scofield_2.vmdk'...
Clone: 100% done.
Destination disk format: VMFS thin-provisioned
Cloning disk '/vmfs/volumes/himalaya-local-SATA.RE4-GP:Storage/scofield/scofield_1.vmdk'...
Clone: 100% done.
2011-03-13 15:40:55 -- info: Backup Duration: 2 Seconds
2011-03-13 15:40:55 -- info: Successfully completed backup for scofield!
2011-03-13 15:40:57 -- info: CONFIG - VERSION = 2011_03_13_1
2011-03-13 15:40:57 -- info: CONFIG - GHETTOVCB_PID = 2967
2011-03-13 15:40:57 -- info: CONFIG - VM_BACKUP_VOLUME = /vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/WILLIAM_BACKUPS
2011-03-13 15:40:57 -- info: CONFIG - VM_BACKUP_ROTATION_COUNT = 3
2011-03-13 15:40:57 -- info: CONFIG - VM_BACKUP_DIR_NAMING_CONVENTION = 2011-03-13_15-40-50
2011-03-13 15:40:57 -- info: CONFIG - DISK_BACKUP_FORMAT = thin
2011-03-13 15:40:57 -- info: CONFIG - POWER_VM_DOWN_BEFORE_BACKUP = 0
2011-03-13 15:40:57 -- info: CONFIG - ENABLE_HARD_POWER_OFF = 0
2011-03-13 15:40:57 -- info: CONFIG - ITER_TO_WAIT_SHUTDOWN = 3
2011-03-13 15:40:57 -- info: CONFIG - POWER_DOWN_TIMEOUT = 5
2011-03-13 15:40:57 -- info: CONFIG - SNAPSHOT_TIMEOUT = 15
2011-03-13 15:40:57 -- info: CONFIG - LOG_LEVEL = info
2011-03-13 15:40:57 -- info: CONFIG - BACKUP_LOG_OUTPUT = /tmp/ghettoVCB.log
2011-03-13 15:40:57 -- info: CONFIG - VM_SNAPSHOT_MEMORY = 0
2011-03-13 15:40:57 -- info: CONFIG - VM_SNAPSHOT_QUIESCE = 0
2011-03-13 15:40:57 -- info: CONFIG - VMDK_FILES_TO_BACKUP = all
2011-03-13 15:40:57 -- info: CONFIG - EMAIL_LOG = 0
2011-03-13 15:40:57 -- info:
2011-03-13 15:40:59 -- info: Snapshot found for vMA, backup will not take place
2011-03-13 15:40:59 -- info: CONFIG - USING CONFIGURATION FILE = backup_config//vCloudConnector
2011-03-13 15:40:59 -- info: CONFIG - VERSION = 2011_03_13_1
2011-03-13 15:40:59 -- info: CONFIG - GHETTOVCB_PID = 2967
2011-03-13 15:40:59 -- info: CONFIG - VM_BACKUP_VOLUME = /vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/WILLIAM_BACKUPS
2011-03-13 15:40:59 -- info: CONFIG - VM_BACKUP_ROTATION_COUNT = 3
2011-03-13 15:40:59 -- info: CONFIG - VM_BACKUP_DIR_NAMING_CONVENTION = 2011-03-13_15-40-50
2011-03-13 15:40:59 -- info: CONFIG - DISK_BACKUP_FORMAT = thin
2011-03-13 15:40:59 -- info: CONFIG - POWER_VM_DOWN_BEFORE_BACKUP = 0
2011-03-13 15:40:59 -- info: CONFIG - ENABLE_HARD_POWER_OFF = 0
2011-03-13 15:40:59 -- info: CONFIG - ITER_TO_WAIT_SHUTDOWN = 4
2011-03-13 15:40:59 -- info: CONFIG - POWER_DOWN_TIMEOUT = 5
2011-03-13 15:40:59 -- info: CONFIG - SNAPSHOT_TIMEOUT = 15
2011-03-13 15:40:59 -- info: CONFIG - LOG_LEVEL = info
2011-03-13 15:40:59 -- info: CONFIG - BACKUP_LOG_OUTPUT = /tmp/ghettoVCB.log
2011-03-13 15:40:59 -- info: CONFIG - VM_SNAPSHOT_MEMORY = 0
2011-03-13 15:40:59 -- info: CONFIG - VM_SNAPSHOT_QUIESCE = 0
2011-03-13 15:40:59 -- info: CONFIG - VMDK_FILES_TO_BACKUP = vCloudConnector.vmdk
2011-03-13 15:40:59 -- info: CONFIG - EMAIL_LOG = 0
2011-03-13 15:40:59 -- info:
2011-03-13 15:41:01 -- info: Initiate backup for vCloudConnector
Destination disk format: VMFS thin-provisioned
Cloning disk '/vmfs/volumes/himalaya-local-SATA.RE4-GP:Storage/vCloudConnector/vCloudConnector.vmdk'...
Clone: 100% done.
2011-03-13 15:41:51 -- info: Backup Duration: 50 Seconds
2011-03-13 15:41:51 -- info: WARN: vCloudConnector has some Independent VMDKs that can not be backed up!
2011-03-13 15:41:51 -- info: ###### Final status: ERROR: Only some of the VMs backed up, and some disk(s) failed! ######
2011-03-13 15:41:51 -- info: ============================== ghettoVCB LOG END ================================
Please take a look at FAQ #25 for more details before continuing
To make use of this feature, modify the variable ENABLE_COMPRESSION from 0 to 1. Please note, do not mix uncompressed backups with compressed backups. Ensure that directories selected for backups do not contain any backups with previous versions of ghettoVCB before enabling and implementing the compressed backups feature.
The nc (netcat) utility must be present for email support to function. This utility is now included by default with the release of vSphere 4.1 or greater; previous releases of VI 3.5 and/or vSphere 4.0 do not contain this utility. The reason this is listed as experimental is that it may not be compatible with all email servers, as the script utilizes the nc (netcat) utility to communicate with the email server. This feature is provided as-is with no guarantees. If you enable this feature, a separate log will be generated alongside the normal logging and used to email the recipient. If for whatever reason the email fails to send, an entry will appear in the normal log.
Users should also note that due to the limited functionality of netcat, the script uses SMTP pipelining, which is not the most ideal method of communicating with an SMTP server. Email from ghettoVCB may not work if your email server does not support this feature.
You can define an email recipient in the following two ways:
EMAIL_TO=william@virtuallyghetto.com
OR
EMAIL_TO=william@virtuallyghetto.com,tuan@virtuallyghetto.com
If you are running ESXi 5.1, you will need to create a custom firewall rule to allow your email traffic to go out which I will assume is default port 25. Here are the steps for creating a custom email rule.
Step 1 - Create a file called /etc/vmware/firewall/email.xml which contains the following:
<ConfigRoot>
<service>
<id>email</id>
<rule id="0000">
<direction>outbound</direction>
<protocol>tcp</protocol>
<porttype>dst</porttype>
<port>25</port>
</rule>
<enabled>true</enabled>
<required>false</required>
</service>
</ConfigRoot>
Step 2 - Reload the ESXi firewall by running the following ESXCLI command:
~ # esxcli network firewall refresh
Step 3 - Confirm that your email rule has been loaded by running the following ESXCLI command:
~ # esxcli network firewall ruleset list | grep email
email true
Step 4 - Connect to your email server using nc (netcat) by running the following command and specifying the IP Address/Port of your email server:
~ # nc 172.30.0.107 25
220 mail.primp-industries.com ESMTP Postfix
You should receive a response from your email server and you can enter Ctrl+C to exit. This custom ESXi firewall rule will not persist after a reboot, so you should create a custom VIB to ensure it persists after a system reboot. Please take a look at this article for the details.
To make use of this feature, modify the variable RSYNC_LINK from 0 to 1. Please note, this is an experimental feature request from users that rely on rsync to replicate changes from one datastore volume to another datastore volume. The premise of this feature is to have a standardized folder that rsync can monitor for changes to replicate to another backup datastore. When this feature is enabled, a symbolic link will be generated with the format of "<VMNAME>-symlink" and will reference the latest successful VM backup. You can then rely on this symbolic link to watch for changes and replicate to your backup datastore.
Here is an example of what this would look like:
[root@himalaya ghettoVCB]# ls -la /vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/WILLIAM_BACKUPS/vcma/
total 0
drwxr-xr-x 1 nobody nobody 110 Sep 27 08:08 .
drwxr-xr-x 1 nobody nobody 17 Sep 16 14:01 ..
lrwxrwxrwx 1 nobody nobody 89 Sep 27 08:08 vcma-symlink -> /vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/WILLIAM_BACKUPS/vcma/vcma-2010-09-27_08-07-37
drwxr-xr-x 1 nobody nobody 58 Sep 27 08:04 vcma-2010-09-27_08-04-26
drwxr-xr-x 1 nobody nobody 58 Sep 27 08:06 vcma-2010-09-27_08-05-55
drwxr-xr-x 1 nobody nobody 58 Sep 27 08:08 vcma-2010-09-27_08-07-37
FYI - This feature has not been tested, please provide feedback if this does not work as expected.
To recover a VM that has been processed by ghettoVCB, please take a look at this document: Ghetto Tech Preview - ghettoVCB-restore.sh - Restoring VM's backed up from ghettoVCB to ESX(i) 3.5, ...
There may be a situation where you need to stop the ghettoVCB process. Entering Ctrl+C will only kill off the main ghettoVCB process; there may still be other spawned processes that you need to identify and stop. Below are two scenarios you may encounter and the process to completely stop all processes related to ghettoVCB.
Step 1 - Press Ctrl+C which will kill off the main ghettoVCB instance
Step 2 - Search for any existing ghettoVCB process by running the following:
# ps -c | grep ghettoVCB | grep -v grep
3360136 3360136 tail tail -f /tmp/ghettoVCB.work/ghettovcb.Cs1M1x
Step 3 - Here we can see there is a tail command that was used by the script. We need to stop this process by using the kill command, which accepts the PID (Process ID) identified by the first value on the far left-hand side of the output. In this example, it is 3360136.
# kill -9 3360136
Note: Make sure you identify the correct PID, else you could accidentally impact a running VM or, worse, your ESXi host.
Step 4 - Depending on where you stopped the ghettoVCB process, you may need to consolidate or remove any existing snapshots that may exist on the VM that was being backed up. You can easily do so by using the vSphere Client.
Step 1 - Search for the ghettoVCB process (you can also validate the PID from the logs)
~ # ps -c | grep ghettoVCB | grep -v grep
3360393 3360393 busybox ash ./ghettoVCB.sh -f list -d debug
3360790 3360790 tail tail -f /tmp/ghettoVCB.work/ghettovcb.deGeB7
Step 2 - Stop both the main ghettoVCB instance & tail command by using the kill command and specifying their respective PID IDs:
kill -9 3360393
kill -9 3360790
Step 3 - If a VM was in the process of being backed up, there is an additional process for the actual vmkfstools copy. You will need to identify that process and kill it as well. We will again use the ps -c command and search for any vmkfstools that are running:
# ps -c | grep vmkfstools | grep -v grep
3360796 3360796 vmkfstools /sbin/vmkfstools -i /vmfs/volumes/himalaya-temporary/VC-Windows/VC-Windows.vmdk -a lsilogic -d thin /vmfs/volumes/test-dont-use-this-volume/backups/VC-Windows/VC-Windows-2013-01-26_16-45-35/VC-Windows.vmdk
Step 4 - In case someone is manually running vmkfstools, make sure you look at the command itself and confirm it maps back to the VM that was being backed up before killing the process. Once you have identified the proper PID, go ahead and use the kill command:
# kill -9 3360796
Step 5 - Depending on where you stopped the ghettoVCB process, you may need to consolidate or remove any existing snapshots that may exist on the VM that was being backed up. You can easily do so by using the vSphere Client.
Please take a moment to read over what a cronjob is and how to set one up before continuing.
The task of configuring cronjobs on classic ESX servers (with Service Console) is no different than traditional cronjobs on *nix operating systems (this procedure is outlined in the link above). With ESXi on the other hand, additional factors need to be taken into account when setting up cronjobs in the limited shell console called Busybox because changes made do not persist through a system reboot. The following document will outline steps to ensure that cronjob configurations are saved and present upon a reboot.
Important Note: Always redirect the ghettoVCB output to /dev/null and/or to a log when automating via cron. This is very important, as one user identified a limited amount of buffer capacity which, once filled, may cause ghettoVCB to stop in the middle of a backup. This primarily affects users on ESXi, but it is good practice to always redirect the output. Also ensure you specify the FULL PATH when referencing the ghettoVCB script, input or log files.
e.g.
0 0 * * 1-5 /vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/ghettoVCB.sh -f /vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/backuplist > /dev/null
or
0 0 * * 1-5 /vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/ghettoVCB.sh -f /vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/backuplist > /tmp/ghettoVCB.log
Task: Configure ghettoVCB.sh to execute a backup five days a week (M-F) at 12AM (midnight) everyday and send output to a unique log file
Configure on ESX:
1. As root, you'll install your cronjob by issuing:
[root@himalaya ~]# crontab -e
2. Append the following entry:
0 0 * * 1-5 /vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/ghettoVCB.sh -f /vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/backuplist > /vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/ghettoVCB-backup-$(date +\%s).log
3. Save and exit
[root@himalaya dlgCore-NFS-bigboi.VM-Backups]# crontab -e
no crontab for root - using an empty one
crontab: installing new crontab
4. List out and verify the cronjob that was just created:
[root@himalaya dlgCore-NFS-bigboi.VM-Backups]# crontab -l
0 0 * * 1-5 /vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/ghettoVCB.sh -f /vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/backuplist > /vmfs/volumes/dlgCore-NFS-bigboi.VM-Backups/ghettoVCB-backup-$(date +\%s).log
You're ready to go!
Configure on ESXi:
1. Setup the cronjob by appending the following line to /var/spool/cron/crontabs/root:
0 0 * * 1-5 /vmfs/volumes/simplejack-local-storage/ghettoVCB.sh -f /vmfs/volumes/simplejack-local-storage/backuplist > /vmfs/volumes/simplejack-local-storage/ghettoVCB-backup-$(date +\%s).log
If you are unable to edit/modify /var/spool/cron/crontabs/root, please make a copy and then edit the copy with the changes
cp /var/spool/cron/crontabs/root /var/spool/cron/crontabs/root.backup
Once your changes have been made, then "mv" the backup to the original file. This may occur on ESXi 4.x or 5.x hosts
mv /var/spool/cron/crontabs/root.backup /var/spool/cron/crontabs/root
You can now verify that the crontab entry has been updated by using the "cat" utility.
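For example:
~ # cat /var/spool/cron/crontabs/root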
2. Kill the current crond (cron daemon) and then restart crond for the changes to take effect:
On ESXi < 3.5u3
kill $(ps | grep crond | cut -f 1 -d ' ')
On ESXi 3.5u3+
~ # kill $(pidof crond)
~ # crond
On ESXi 4.x/5.0
~ # kill $(cat /var/run/crond.pid)
~ # busybox crond
On ESXi 5.1 to 6.x
~ # kill $(cat /var/run/crond.pid)
~ # crond
On ESXi 7.x
~ # kill $(cat /var/run/crond.pid)
~ # /usr/lib/vmware/busybox/bin/busybox crond
3. Now that the cronjob is ready to go, you need to ensure that this cronjob will persist through a reboot. You'll need to add the following lines to /etc/rc.local (ensure that the cron entry matches what was defined above). In ESXi 5.1 and later, you will need to edit /etc/rc.local.d/local.sh instead of /etc/rc.local, as the latter is no longer valid.
On ESXi 3.5
/bin/kill $(pidof crond)
/bin/echo "0 0 * * 1-5 /vmfs/volumes/simplejack-local-storage/ghettoVCB.sh -f /vmfs/volumes/simplejack-local-storage/backuplist > /vmfs/volumes/simplejack-local-storage/ghettoVCB-backup-\$(date +\\%s).log" >> /var/spool/cron/crontabs/root
crond
On ESXi 4.x/5.0
/bin/kill $(cat /var/run/crond.pid)
/bin/echo "0 0 * * 1-5 /vmfs/volumes/simplejack-local-storage/ghettoVCB.sh -f /vmfs/volumes/simplejack-local-storage/backuplist > /vmfs/volumes/simplejack-local-storage/ghettoVCB-backup-\$(date +\\%s).log" >> /var/spool/cron/crontabs/root
/bin/busybox crond
On ESXi 5.1 to 6.x
/bin/kill $(cat /var/run/crond.pid)
/bin/echo "0 0 * * 1-5 /vmfs/volumes/simplejack-local-storage/ghettoVCB.sh -f /vmfs/volumes/simplejack-local-storage/backuplist > /vmfs/volumes/simplejack-local-storage/ghettoVCB-backup-\$(date +\\%s).log" >> /var/spool/cron/crontabs/root
crond
On ESXi 7.x
/bin/kill $(cat /var/run/crond.pid) > /dev/null 2>&1
/bin/echo "0 0 * * 1-5 /vmfs/volumes/simplejack-local-storage/ghettoVCB.sh -f /vmfs/volumes/simplejack-local-storage/backuplist > /vmfs/volumes/simplejack-local-storage/ghettoVCB-backup-\$(date +\\%s).log" >> /var/spool/cron/crontabs/root
/usr/lib/vmware/busybox/bin/busybox crond
Afterwards the file should look like the following:
~ # cat /etc/rc.local
#! /bin/ash
export PATH=/sbin:/bin
log() {
echo "$1"
logger init "$1"
}
#execute all service retgistered in /etc/rc.local.d
if [ -d /etc/rc.local.d ]; then
for filename in `find /etc/rc.local.d/ | sort`
do
if [ -f $filename ] && [ -x $filename ]; then
log "running $filename"
$filename
fi
done
fi
/bin/kill $(cat /var/run/crond.pid)
/bin/echo "0 0 * * 1-5 /vmfs/volumes/simplejack-local-storage/ghettoVCB.sh -f /vmfs/volumes/simplejack-local-storage/backuplist > /vmfs/volumes/simplejack-local-storage/ghettoVCB-backup-\$(date +\\%s).log" >> /var/spool/cron/crontabs/root
/bin/busybox crond
This will ensure that the cronjob is re-created upon a reboot of the system through a startup script
4. To ensure that this is saved in the ESXi configuration, we need to manually initiate an ESXi backup by running:
~ # /sbin/auto-backup.sh
config implicitly loaded
local.tgz
etc/vmware/vmkiscsid/vmkiscsid.db
etc/dropbear/dropbear_dss_host_key
etc/dropbear/dropbear_rsa_host_key
etc/opt/vmware/vpxa/vpxa.cfg
etc/opt/vmware/vpxa/dasConfig.xml
etc/sysconfig/network
etc/vmware/hostd/authorization.xml
etc/vmware/hostd/hostsvc.xml
etc/vmware/hostd/pools.xml
etc/vmware/hostd/vmAutoStart.xml
etc/vmware/hostd/vmInventory.xml
etc/vmware/hostd/proxy.xml
etc/vmware/ssl/rui.crt
etc/vmware/ssl/rui.key
etc/vmware/vmkiscsid/initiatorname.iscsi
etc/vmware/vmkiscsid/iscsid.conf
etc/vmware/vmware.lic
etc/vmware/config
etc/vmware/dvsdata.db
etc/vmware/esx.conf
etc/vmware/license.cfg
etc/vmware/locker.conf
etc/vmware/snmp.xml
etc/group
etc/hosts
etc/inetd.conf
etc/rc.local
etc/chkconfig.db
etc/ntp.conf
etc/passwd
etc/random-seed
etc/resolv.conf
etc/shadow
etc/sfcb/repository/root/interop/cim_indicationfilter.idx
etc/sfcb/repository/root/interop/cim_indicationhandlercimxml.idx
etc/sfcb/repository/root/interop/cim_listenerdestinationcimxml.idx
etc/sfcb/repository/root/interop/cim_indicationsubscription.idx
Binary files /etc/vmware/dvsdata.db and /tmp/auto-backup.31345.dir/etc/vmware/dvsdata.db differ
config implicitly loaded
Saving current state in /bootbank
Clock updated.
Time: 20:40:36 Date: 08/14/2009 UTC
Now you're really done!
If you're still having trouble getting the cronjob to work, ensure that you've specified the correct parameters and there aren’t any typos in any part of the syntax.
Ensure crond (cron daemon) is running:
ESX 3.x/4.0:
[root@himalaya dlgCore-NFS-bigboi.VM-Backups]# ps -ef | grep crond | grep -v grep
root 2625 1 0 Aug13 ? 00:00:00 crond
ESXi 3.x/4.x/5.x:
~ # ps | grep crond | grep -v grep
5196 5196 busybox crond
Ensure that the date/time on your ESX(i) host is setup correctly:
ESX(i):
[root@himalaya dlgCore-NFS-bigboi.VM-Backups]# date
Fri Aug 14 23:44:47 PDT 2009
Note: Careful attention must be noted if more than one backup is performed per day. Backup windows should be staggered to avoid contention or saturation of resources during these periods.
0Q: I'm getting error X when using the script, or I'm not getting any errors but the backup didn't even take place. What can I do?
0A: First off, before posting a comment/question, please thoroughly read through the ENTIRE documentation including the FAQs to see if your question has already been answered.
1Q: I've read through the entire documentation + FAQs and still have not found my answer to the problem I'm seeing. What can I do?
1A: Please join the ghettoVCB Group to post your question/comment.
2Q: I've sent you private message or email but I haven't received a response? What gives?
2A: I do not accept issues/bugs reported via PM or email; I will reply back directing you to post on the appropriate VMTN forum (that's what it's for). If the data/results you're providing are truly sensitive to your environment I will hear you out, but 99.99% of the time they are not, so please do not message/email me directly. I do monitor all forums that contain my script, including the normal VMTN forums, and will try to get back to your question as soon as I can and as time permits. Please do be patient as you're not the only person using the script (600,000+ views), thank you.
3Q: Can I schedule backups to take place hourly, daily, monthly, yearly?
3A: Yes, do a search online for crontab.
4Q: I would like to setup cronjob for ESX(i) 3.5 or 4.0?
4A: Take a look at the Cronjob FAQ section in this document.
5Q: I want to schedule my backup on Windows, how do I do this?
5A: Do a search for plink. Make sure you have paired SSH keys setup between your Windows system and ESX/ESXi host.
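A Windows scheduled task might invoke something along these lines (purely illustrative; the key file, hostname and script paths are assumptions that must be adapted to your environment):
plink.exe -i C:\keys\esxi_key.ppk root@esxi-host "/vmfs/volumes/datastore1/ghettoVCB/ghettoVCB.sh -f /vmfs/volumes/datastore1/ghettoVCB/backuplist > /tmp/ghettoVCB.log"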
6Q: I only have a single ESXi host. I want to take backups and store them somewhere else. The problem is: I don't have NFS, iSCSI nor FC SAN. What can I do?
6A: You can use local storage to store your backups assuming that you have enough space on the destination datastore. Afterwards, you can use scp (WinSCP/FastSCP) to transfer the backups from the ESXi host to your local desktop.
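For example, from a desktop with an scp client (the hostname, datastore and backup folder names below are placeholders):
scp -r root@esxi-host:/vmfs/volumes/datastore1/backups/MyVM/MyVM-2011-03-13_15-27-59 /local/backups/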
7Q: I’m pissed; the backup is taking too long. My datastore is of type X?
7A: YMMV, take a look at your storage configuration and make sure it is optimized.
8Q: I noticed that the backup rotation is occurring after a backup. I don't have enough local storage space, can the process be changed?
8A: This is primarily done to ensure that you have at least one good backup in case the new backup fails. If you would like to modify the script, you're more than welcome to do so.
9Q: What is the best storage configuration for datastore type X?
9A: Search the VMTN forums; there are various configurations for the different type of storage/etc.
10Q: I want to setup an NFS server to run my backups. Which is the best and should it be virtual or physical?
10A: Please refer to answer 7A. From experience, we’ve seen physical instances of NFS servers to be faster than their virtual counterparts. As always, YMMV.
11Q: I have VMs that have snapshots. I want to back these things up but the script doesn’t let me do it. How do I fix that?
11A: VM snapshots are not meant to be kept for long durations. When backing up a VM that contains a snapshot, you should ensure all snapshots have been committed prior to running a backup. No exceptions will be made…ever.
12Q: I would like to restore from backup, what is the best method?
12A: The restore process will be unique for each environment and should be determined by your backup/recovery plans. At a high level you have the option of mounting the backup datastore and registering the VM in question or copy the VM from the backup datastore to the ESX/ESXi host. The latter is recommended so that you're not running a VM living on the backup datastore or inadvertently modifying your backup VM(s). You can also take a look at ghettoVCB-restore which is experimentally supported.
13Q: When I try to run the script I get: "-bash: ./ghettoVCB.sh: Permission denied", what is wrong?
13A: You need to change the permission on the script to be executable, chmod +x ghettoVCB.sh
14Q: Where can I download the latest version of the script?
14A: The latest version is available on github - https://github.com/lamw/ghettoVCB/downloads
15Q: I would like to suggest/recommend feature X, can I get it? When can I get it? Why isn't it here, what gives?
15A: The general purpose of this script is to provide a backup solution around VMware VMs. Any additional features outside of that process will be taken into consideration depending on the amount of time, number of requests and actual usefulness as a whole to the community rather than to an individual.
16Q: I have found this script to be very useful and would like to contribute back, what can I do?
16A: To continue to develop and share new scripts and resources with the community, we need your support. You can donate here Thank You!
17Q: What are the different type of backup uses cases that are supported with ghettoVCB?
17A: 1) Live backup of VM with the use of a snapshot and 2) Offline backup of a VM without a snapshot. These are the only two use cases supported by the script.
18Q: When I execute the script on ESX(i) I get some funky errors such as ": not found.sh" or "command not found". What is this?
18A: Most likely you have some ^M characters within the script, which may have come from editing the script using a Windows editor, uploading the script using the datastore browser, OR using wget. The best option is to either use WinSCP on Windows to upload the script and edit it using the vi editor on the ESX(i) host, OR use Linux/UNIX scp to copy the script onto the host. If you still continue to have the issue, search online for the various methods of removing this Windows carriage return from the script.
19Q: My backup works fine OR it works for a single backup but I get an error message "Input/output error" or "-ash: YYYY-MM-DD: not found" during the snapshot removal process. What is this?
19A: The issue has been identified by a few users as a problem with the user's NFS server, which reports an error when deleting large files that take longer than 10 seconds. VMware has released a KB article http://kb.vmware.com/kb/1035332 explaining the details; starting with vSphere 4.1 Update 2 or vSphere 5.0, a new advanced ESX(i) parameter has been introduced to increase the timeout. This has resolved the problem for several users and may be something to consider if you are running into this issue, specifically with NFS based backups.
20Q: Will this script function with vCenter and DRS enabled?
20A: No, if the ESX(i) hosts are in a DRS enabled cluster, VMs that are to be backed up could potentially be backed up twice or never get backed up. The script is executed on a per host basis and one would need to come up with a way of tracking backups on all hosts, perhaps writing out to an external file, to ensure that all VMs are backed up. The main use case for this script is standalone ESX(i) hosts.
21Q: I'm trying to use WinSCP to manually copy VM files but it's very slow or never completes on huge files, why is that?
21A: WinSCP was not designed for copying VM files out of your ESX(i) host, take a look at Veeam's FastSCP which is designed for moving VM files and is a free utility.
22Q: Can I set up an NFS server using Windows Services for UNIX (WSFU) and will it work?
22A: I've only heard of a handful of users that have successfully implemented WSFU and got it working, YMMV. VMware also has a KB article describing the setup process here: http://kb.vmware.com/kb/1004490 for those that are interested. Here is a thread on a user's experience between Windows vs. Linux NFS that may be helpful.
23Q: How do VMware Snapshots work?
23A: http://kb.vmware.com/kb/1015180
24Q: What files make up a Virtual Machine?
24A: http://virtualisedreality.wordpress.com/2009/09/16/quick-reminder-of-what-files-make-up-a-virtual-ma...
25Q: I'm having some issues restoring a compressed VM backup?
25A: There is a limitation on the size of the VM for compression under ESXi 3.x & 4.x; this limitation is in the unsupported Busybox console and does not affect classic ESX 3.x/4.x. On ESXi 3.x, the largest supported VM is 4GB for compression and on ESXi 4.x the largest supported VM is 8GB. If you try to compress a larger VM, you may run into issues when trying to extract upon a restore. PLEASE TEST THE RESTORE PROCESS BEFORE MOVING TO PRODUCTION SYSTEMS!
26Q: I'm backing up my VM as "thin" format but I'm still not noticing any size reduction in the backup? What gives?
26A: Please refer to this blog post which explains what's going on: http://www.yellow-bricks.com/2009/07/31/storage-vmotion-and-moving-to-a-thin-provisioned-disk/
27Q: I've enabled VM_SNAPSHOT_MEMORY and when I restore my VM it's still offline, I thought this would keep its memory state?
27A: VM_SNAPSHOT_MEMORY is only used to ensure that when the snapshot is taken, its memory contents are also captured. This is only relevant to the actual snapshot itself; it's not used in any shape/way/form in regards to the backup. All backups, whether your VM is running or offline, will result in an offline VM when you restore. This was originally added for debugging purposes and in general should be left disabled.
28Q: Can I rename the directories and the VMs after a VM has been backed up?
28A: The answer is yes, you can ... but you may run into all sorts of issues which may break the backup process. The script expects a certain layout and specific naming scheme for it to maintain the proper rotation count. If you need to move or rename a backed-up VM, please take it out of the backup directory and place it in another location.
29Q: Can ghettoVCB support CBT (Change Block Tracking)?
29A: No, that is functionality of the vSphere API + VDDK (Virtual Disk Development Kit). You will need to look at paid solutions such as VMware vDR, Veeam Backup & Replication, PHD Virtual Backup, etc. to leverage that functionality.
30Q: Does ghettoVCB support rsync backups?
30A: Currently ghettoVCB does not support rsync backups. You can either obtain or compile your own static rsync binary and run it on ESXi, but this is an unsupported configuration. You may take a look at this blog post for some details.
31Q: How can I contribute back?
31A: You can provide feedback/comments on the ghettoVCB Group. If you have found this script to be useful and would like to contribute back, please click here to donate.
32Q: How can I select individual VMDKs to backup from a VM?
32A: Ideally you would use the "-c" option, which requires you to create an individual VM configuration file; this is where you would select specific VMDKs to backup. Note that you do not need to define all properties; anything not defined will inherit the default global properties, whether you're editing the ghettoVCB.sh script or using the ghettoVCB global configuration file. It is not recommended that you edit the ghettoVCB.sh script and modify the VMDK_FILES_TO_BACKUP variable, but if you would like to keep everything in one script, you may add the full list of VMDKs to backup there. Do know this can get error prone, as the script may be edited frequently, and you lose some flexibility to support multiple environments.
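For illustration only, a per-VM configuration file (named after the VM's display name and placed in the directory passed to -c) might look roughly like the following; the path and VMDK name are placeholders, and any property left out falls back to the global defaults:
VM_BACKUP_VOLUME=/vmfs/volumes/backup-datastore/backups
DISK_BACKUP_FORMAT=thin
VM_BACKUP_ROTATION_COUNT=3
VMDK_FILES_TO_BACKUP="myvm.vmdk"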
33Q: Why is email not working when I'm using ESXi 5.x but it worked in ESXi 4.x?
33A: ESXi 5.x has implemented a new firewall which requires the email port that is being used to be opened. Please refer to the following articles on creating a custom firewall rule for email (a rough sketch follows the links below):
http://www.virtuallyghetto.com/2012/09/creating-custom-vibs-for-esxi-50-51.html
How to Create Custom Firewall Rules in ESXi 5.0
How to Persist Configuration Changes in ESXi 4.x/5.x Part 1
How to Persist Configuration Changes in ESXi 4.x/5.x Part 2
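As a rough sketch of what such a rule can look like on ESXi 5.x (the service name, rule id and port are placeholders; see the articles above for the full procedure and for making the change persist across reboots):
cat > /etc/vmware/firewall/email.xml << 'EOF'
<ConfigRoot>
  <service id='0100'>
    <id>email</id>
    <rule id='0000'>
      <direction>outbound</direction>
      <protocol>tcp</protocol>
      <porttype>dst</porttype>
      <port>25</port>
    </rule>
    <enabled>true</enabled>
    <required>false</required>
  </service>
</ConfigRoot>
EOF
esxcli network firewall refresh
esxcli network firewall ruleset list | grep email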
34Q: How do I stop the ghettoVCB process?
34A: Take a look at the Stopping ghettoVCB Process section of the documentation for more details.
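For reference, a rough sketch of finding and stopping a running instance from the console (the PID is a placeholder; also check for an in-flight vmkfstools clone and any leftover ghettoVCB snapshot afterwards):
ps | grep -i ghettoVCB
kill <PID_of_ghettoVCB.sh>
ps | grep -i vmkfstools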
Many have asked what the best configuration and recommendation is for setting up a cheap NFS server to run backups for VMs. This is a question we've tried to stay away from, just because the possibilities and solutions are endless. One can go physical vs. virtual, use a VSA (Virtual Storage Appliance) such as OpenFiler or Lefthand Networks, Windows vs. Linux/UNIX. We've not personally tested and verified all these solutions, and it all comes down to an "it depends" type of answer. From our experience, though, we've had much better success with a physical server than a virtual one.
It is also well known that some users are experiencing backup issues when running specifically against NFS, primarily around the rotation and purging of previous backups. The theory, from what we can tell by talking to various users, is that when the rotation is occurring, the request to delete the file(s) may take a while and does not return within a certain time frame, which causes the script to error out with unexpected messages. Though the backups were successful, it will cause unexpected results with the directory structures on the NFS target. We've not been able to isolate why this is occurring; it may be due to the NFS configuration/exports, the hardware, or the connection not being able to support this process.
We'll continue to help where we can in diagnosing this issue, but we wanted to share our current NFS configuration; perhaps it may help some users who are new or trying to set up their system. (Disclaimer: these configurations are not recommendations nor an endorsement of any of the components being used.)
UPDATE: Please also read FAQ #19 for details + resolution
Server Type: Physical
Model: HP DL320 G2
OS: Arch linux 2.6.28
Disks: 2 x 1.5TB
RAID: Software RAID1
Source Host Backups: ESX 3.5u4 and ESX 4.0u1 (We don't run any ESXi hosts)
uname -a output
Linux XXXXX.XXXXX.ucsb.edu 2.6.28-ARCH #1 SMP PREEMPT Sun Jan 18 20:17:17 UTC 2009 i686 Intel(R) Pentium(R) 4 CPU 3.06GHz GenuineIntel GNU/Linux
NICs:
00:05.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5702X Gigabit Ethernet (rev 02)
00:06.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5702X Gigabit Ethernet (rev 02)
NFS Export Options:
/exports/vm-backups XXX.XXX.XXX.XXX/24(rw,async,all_squash,anonuid=99,anongid=99)
*One important thing is to verify that your NFS export options are set up correctly: "async" allows the server to reply to the client as soon as an IO request has been processed, without waiting for the data to be written to the storage.
*VMware has also released a KB article describing the various "Advanced NFS Options", their meanings and recommendations: http://kb.vmware.com/kb/1007909 We've not personally had to touch any of these, but for vendors such as EMC and NetApp there are some best practices around configuring some of these values depending on the number of NFS volumes or the number of ESX(i) hosts connecting to a volume. You may want to take a look to see if any of these options help with the NFS issue that some are seeing.
*Users should also try to look at their ESX(i) host logs during the time interval when they're noticing these issues and see if they can find any correlation along with monitoring the performance on their NFS Server.
*Lastly, there are probably other things that can be done to improve NFS performance or further optimization, a simple search online will also yield many resources.
Windows utility to email ghettoVCB Backup Logs - http://www.waldrondigital.com/2010/05/11/ghettovcb-e-mail-rotate-logs-batch-file-for-vmware/
Windows front-end utility to ghettoVCB - http://www.magikmon.com/mkbackup/ghettovcb.en.html
Note: Neither of these tools are supported, for questions or comments regarding these utilities please refer to the author's pages.
Big thanks to Alain Spineux and his contributions to the ghettoVCB script and helping with debugging and testing.
Big thanks goes out to the community for the suggested features and to those that submitted snippets of their modifications.
Updated FAQ #20-24 for common issues/questions. Also included a new section about our "personal" NFS configuration and setup.
Fix the crontab section to reflect the correct syntax + updated FAQ #17,#18 and #19 for common issues.
A number of enhancements and fixes have been implemented in this release of ghettoVCB. Special thanks goes out to all the ghettoVCB BETA testers for providing their time and their environments to test features/fixes of the new script!
Hello,
I am using your script successfully on a server with an attached iSCSI storage.
On another server I tried to use the script for backup to an NFS storage. The NFS server is built on openSUSE 11. I can mount the NFS on the ESXi server, but the script tells me that the file system is read only.
What can be wrong with the NFS?
The NFS export is built with four 256GB VMDKs which together form a 1TB logical volume with ext3 (I also tried ext4). The logical volume is mounted to /mnt/nfs and the NFS export points to that mountpoint. These are the options for the NFS export: "fsid=0,crossmnt,ro,root_squash,sync,no_subtree_check".
Is there anything I am doing wrong?
It's your NFS config which contains "ro" - meaning read-only. See William's section "Our NFS Server Configuration", under "NFS Export Options:".
-gaute
Hi gaute,
thanks for your reply. I have now set the options to "rw,async,all_squash,anonuid=99,anongid=99" but now I get the error "permission denied". Is there any additional parameter I need?
Are you mounting the NFS volume as root and using the vSphere Client or esxcfg-nas?
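For reference, mounting an export as an NFS datastore from the console looks roughly like this (server name, export path and datastore label are placeholders):
esxcfg-nas -a -o nfsserver.example.com -s /exports/vm-backups backup-nfs
esxcfg-nas -l    # list the configured NFS datastores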
=========================================================================
William Lam
VMware vExpert 2009
VMware ESX/ESXi scripts and resources at: http://engineering.ucsb.edu/~duonglt/vmware/
VMware Code Central - Scripts/Sample code for Developers and Administrators
If you find this information useful, please award points for "correct" or "helpful".
I recently ran ghetto backup to backup a system with quiesce and memory switches turned on (backup memory). When I restored the system I expected it to come up where it had left off (during the backup glxgears were spinning) and I was logged on to the system via a console. After the backup we shutdown the original system and powered up the restored one. Shouldn't it have brought me into the console sessions with the glxgears running? If not, then what does the memory backup do? many thanks...
bando,
Sorry about the last reply, I must be having a brain freeze this morning (still recovering from a cold).
So this option was initially added for testing and in case users wanted the snapshot to also capture the state of the system in case something happened. This state is stored in the .vmsn file which corresponds to a VM snapshot being taken. The snapshot is used so that the primary disks would be unlocked and that we could back up the VM, once the backup completed, the snapshot would then be committed and the delta would be merged back and .vmsn would be removed. The backup would contain all changes up to the snapshot, all changes after would be in the "live" running VM and would not be captured as part of the backup. The backups will always contain an offline VM as it does not capture the snapshots/etc.
So the behavior is expected; only if you captured the specific snapshot + .vmsn and reverted to it would you keep the same state upon restore. Again, snapshots are not meant to be kept for long durations or backed up.
Another issue around the .vmsn is if you change hardware, you may not be able to power on the VM.
I thought I had this documented but perhaps I missed it; this may be something I just need to remove altogether in a future release so it doesn't cause any confusion.
Hopefully that answered your question.
Cool... this now sets me to thinking, and that leads to other questions:
1. The memory file size issue is mainly determining how many Gbytes of memory you have and that adds to the overall size of the backup & snapshot space... is that the concern?
2. While we're at it, I figure that the minimum snapshot free space for an average 20GB VM client being backed up should be a couple of GB. Most clients I've backed up so far are on VM servers with plenty of free space so no issues, EXCEPT for one that failed because it had only 1/2 GB free space. I test backed it up with the POWER_DOWN switch and it worked perfectly as expected. Is there a rule of thumb formula for determining snapshot space?
3. Our database backups are performed frequently and separate from the ghetto VM client backups. That way I perform a quiesced monthly backup of those mysql servers for the purpose of a "system" restore should we have a major failure. The ghetto client backup is a monthly thing mainly to restore the system, should we have a VM server or client failure; then the database would be restored separately. It seems like a prudent use of the tools... any comments?
Many thanks...
1) That is correct; when you capture the memory, you now have to account for the additional .vmsn, which is roughly equal to the configured memory size of the VM. Also, the script doesn't back up snapshots, so capturing the .vmsn would not be helpful. It's only useful when you revert/go to a specific snapshot (point in time) and want to preserve memory state; remember that all backups of the VM are offline whether or not the source was running.
2) No rule of thumb or formula unfortunately; it really depends on the VM, the application that is running, and how much change is going on during the duration of the backup. The longer the backup takes to transfer, the larger the snapshot can potentially get. I guess if you really want to be safe, the maximum size a single snapshot could ever reach (assuming no funny glitches) is the size of the largest VM that you have in your environment. Remember that a single snapshot can grow up to the original size of the VM, and that multiple snapshots can surpass the size of your original VM. Though in general, I think 20-30% free space would be more than enough.
3) Yea, I've been asked about this and unfortunately I can't provide too much insight into it since it really depends on your database and the number of transactions occurring. In general, it's a best practice to use tools such as VSS or other quiescing tools within the OS to ensure consistency, hence the quiesce option when taking a snapshot. I know on the db side there are things you could do, though I'm not a dba, so I can't confirm. It's good that you're at least capturing a system image in case of a failure. I know some users have played it safe and shut down their database to take a backup, though the majority of the environments out there can't take downtime for backups and must do it live. I would also recommend looking at backups at the database layer; perhaps that might be better for things like SQL Server or Oracle.
I'd advise you drop the "all_squash,anonuid=99,anongid=99" config and try "no_root_squash". At least at first. If that works, you know the cause of your permissions problems.
I'd just use no_root_squash if I was you, but if you absolutely have to, you can then try to get it working with other permissions.
Thank you very much, the hint with "no_root_squash" was perfect. I have set my options now to "fsid=0,crossmnt,rw,no_root_squash,async,no_subtree_check,anonuid=99,anongid=99".
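For anyone following along: after changing the options in /etc/exports on the Linux NFS server, the export table needs to be re-read, along these lines (the path and subnet are placeholders):
/exports/vm-backups 192.168.1.0/24(rw,no_root_squash,async,no_subtree_check)
exportfs -ra    # re-export everything defined in /etc/exports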
Can you re-run your backup using dryrun mode? Unfortunately, from these logs there isn't much I can go off of.
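For reference, a dry run can be invoked like this (the file names are just examples, matching the -f/-d/-l flags used elsewhere in this thread):
./ghettoVCB.sh -f vms_to_backup -d dryrun -l /tmp/ghettoVCB-dryrun.log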
Thanks
Dry run looks nice.... It finds the vmdk file.
So I don't recall if this is a use case I had tested extensively, where your "source" VM is not located directly at the root of the datastore but in a sub-folder.
I would need to run a few tests to see if this would cause any issues.
Hi,
I've got an issue which is described here :
http://communities.vmware.com/message/1499246#1499246
More information :
- My datastore has 1.3T available.
- I use a local datastore for my backup (the VMs are on the same datastore)
- I want to compress my backup
I'll erase the snapshot which prevents me from doing the backup again, and I'll try to reproduce the issue in debug mode.
Careful with compression on ESXi: with ESXi 3.5 the maximum size is 4GB and with ESXi 4.0 the maximum size is 8GB. This is a limitation of the unsupported ESXi Busybox console; above these sizes you won't be able to properly extract the contents upon a restore.
Hi Lamw.... one thing I see... if I choose thin disk to backup, everything goes OK.
What could be the problem using zeroedthick disk type?
>Careful with compression on ESXi, with ESXi 3.5 the maximum size is 4GB and with ESXi 4.0 the maximum size is 8GB.
Interesting... Does this limitation flow through to OVFTool as well? I had some enormous problems migrating a VM on ESXi 3.5 to another ESXi 3.5 box on the weekend due to what appeared to be a max size of 8GB per 'chunk'. But even creating an OVF with 7GB chunksize failed as well, so I'm not sure if this was related. Surprisingly migration using VM Converter worked flawlessly.
Anyway not trying to go too 'off-topic' with this - but am curious if the compression issues stated here would affect any app or script on these boxes and not just ghettoVCB?
That's pretty interesting, there shouldn't be any issues using any of the disk formats. Are you seeing this problem with all your VMs? or just this specific one? Can you verify this?
What disk format is your source VM and how big is it?
I've not done any testing with ovftool, though I would hope not; I can't say for sure. The actual compression limitation was identified by a user, I forget which comment it was noted in. This is a limitation of the utilities within the Busybox console; it should not affect classic ESX.
I've thought about it in the past to use ovftool to generate a backup but haven't spent much time and it might actually be pretty slow and on larger VMs, it might not be too effective.
Curious, have you ran into the same issue if you manually exported using the vSphere Client out to OVF format?
Regarding your general question, this probably affects any type of compression with the use of tar on any files within the unsupported console. Technically there shouldn't be anything that would need to compress a file larger than 8GB; even the backups taken of the ESXi configuration are only kilobytes in size. So there should not be a need. Remember that this console is only for troubleshooting purposes; it's not meant to be a general console for you to install or store files. Hence 8GB is probably more than enough.
I still need to update some documentation around the compressed limitation + XFS resolution to NFS problem that some of the users have seen. Hopefully get to that sometime this week and update the FAQ's.
Thanks
>I've thought about it in the past to use ovftool to generate a backup but haven't spent much time and it might actually be pretty slow and on larger VMs, it might not be too effective.
No, actually its pretty fast - mainly because the compression is very good. The problem, however, is that it forces you to power down the source VM before you start. That doesn't work for us in a 24/7 environment.
>Curious, have you ran into the same issue if you manually exported using the vSphere Client out to OVF format?
Strange thing is that using VM Converter (converting the source VM to a target VM across ESXi hosts) worked without an issue. Mind you, it took 6 hours to run on an 80GB drive. But it did maintain all the correct host configuration settings so I could restart the Linux VM afterwards without issue. This VM had grown over time and had multiple VMDKs on it, with LVM, and the problem was that doing a clone with ghettoVCB and trying to restore from it never let me maintain drive names, etc., making the LVM die on boot each time.
I haven't tried using any other tool to create the OVF file though. Since an OVF file would be (in essence) generic and could be restarted on any host, it certainly would be a desirable format for the backup to be saved in. I could see great benefits of backup to OVF and restore on a Xen Server, or VirtualBox, etc. if needed.
Myles
>Curious, have you ran into the same issue if you manually exported using the vSphere Client out to OVF format?
One other thing comes to mind here, though....
if the goal is to create a rapid backup AND rapid restore process, you could clone the VMDKs to other files, but possibly integrate OVFTool into the script to create the OVF and MF (Manifest) files for those VMDKs. I tried doing this with the VM not being in powered down state and it did work. Mind you, I could get the target system to boot, but I still ran into the LVM issue. That might be something separate but having both the VMDK files AND OVF files, etc. would mean you could restore on any other ESXi or OVF supported host quickly and have a full configuration and manifest of the original server if needed.
M
No, actually its pretty fast - mainly because the compression is very good. The problem, however, is that it forces you to power down the source VM before you start. That doesn't work for us in a 24/7 environment.
Cool, yea I don't use OVF much nor the export functionality, good to know. Yea one thing about that tool is that the source must be offline.
Strange thing is that using VM Converter (converting source VM to target VM across ESXi hosts) worked without an issue.
Converter actually creates a new VM on the destination and keeps the source running if it's coming from a live source, so you don't have any downtime, though it can take a while as it transfers the data over.
ovftool might be something to investigate in, though now you'll include a new dependency that was not once there, since ovftool needs to run on a management system (Windows or Linux). The other problem, I believe ovftool uses the APIs and if it's regulated like all other vSphere API/SDK's, then it won't work on free licensed version of ESXi
If I get some free cycles from my other projects, I'll go ahead and do POC and see if it's something worth looking into deeper.
Thanks again for all the comments
>The other problem, I believe ovftool uses the APIs and if it's regulated like all other vSphere API/SDK's, then it won't work on free licensed version of ESXi
It did work just fine on 7 of the 8 VMs on my ESXi 3.5 box. Both in creating the OVF file and in restoring from OVF on the target box. Didn't look like anything stopped it from working even though it was working with ESXi
>ovftool might be something to investigate in, though now you'll include a new dependency that was not once there, since ovftool needs to run on a management system (Windows or Linux). The other problem, I believe ovftool uses the APIs and if it's regulated like all other vSphere API/SDK's, then it won't work on free licensed version of ESXi
When I tried to get OVFTool to work and failed, I did a bit of research on it (what it was, where it came from, etc.). What I found was that it looks like it came more from a consortium group rather than something proprietary to VMware. I might be wrong on this, but as it is used across multiple virtualization technologies, I wouldn't be surprised if you find that it is actually completely free and open technology.
Myles
You mentioned ESXi 3.5, what version or Update? If you're on ESXi 3.5u2/u3, then you might be bypassing the license check due to VMware API bug which was fixed in U4+.
To really test this, you need to verify on ESXi 3.5u4+ or on ESXi 4.0. If you're able to export/import VMs using ovftool, then YES, this can be a solution for backups. Though I have a funny feeling that the APIs will block you, else this would be a hole they've not fixed and potentially can be exploited
I actually just did a quick test on the latest version of ESXi 4.0u1 using ovftool on Windows with the free licensed version. I was able to successfully export/import a dummy VM w/o hitting any "restrictions". It looks like ovftool isn't checking the licensing information. Also note that the whole vApp import/export uses something called NFC to actually perform the operations, which is basically a simple http/https transfer and does bypass their licensing.
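For those curious, the kind of quick test described above can be run from a Windows or Linux management station along these lines (host name, credentials and paths are placeholders, and the VM must be powered off for the export):
ovftool vi://root@esxi-host.example.com/DummyVM /backups/DummyVM.ovf    # export
ovftool /backups/DummyVM.ovf vi://root@esxi-host.example.com            # import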
I'll think about how this might be incorporated into a backup solution, as you can see now, not only can the tool be platform independent (Windows or Linux), you might not even have to go into the Service Console for ESX or unsupported Busybox console for ESXi. All the work can be performed from a remote management host, and so long as it can talk to the storage in which you want to store the backup, it would work.
Thanks for this information, I'll have to ponder a bit for a solution
Situation: I have a Windows 2003 Server Std. that has two virtual hard drives on different physical hard drives. E.G. Separate data stores / vmdks. One, the C: drive, is the OS, and the other, the T: drive, Windows exports as an NFS volume which serves as a backup target. I noticed that I can use a config file now to alter the backup behavior of specific VMs, and in particular the ability to select which vmdks will be backed up.
Mission: Backup the Windows server C: drive, but not the T: drive NFS volume.
Problem: I set up the config file to backup only one of the two vmdks, the C: drive. Referencing the log below, when the script runs, it acknowledges the request to backup only one of the vmdks, but it also shows in the execution area that it will backup both.
What am I missing? Thanks!
==============================================
A listing of /vmfs/volumes/datastore1/ghettoVCB looks like this:
admin2
admin2_list
admin2-backup.sh
ghettoVCB.sh
The script that calls it, admin2-backup.sh, contains this:
/vmfs/volumes/datastore1/ghettoVCB/ghettoVCB.sh -f /vmfs/volumes/datastore1/ghettoVCB/admin2_list -c /vmfs/volumes/datastore2/ghettoVCB -d dryrun -l /vmfs/volumes/nas-2/log/admin2-backup.log
The config, admin2 contains this:
VM_BACKUP_VOLUME=//vmfs/volumes/nas-1/backups/server2
DISK_BACKUP_FORMAT=thin
VM_BACKUP_ROTATION_COUNT=1
POWER_VM_DOWN_BEFORE_BACKUP=0
ENABLE_HARD_POWER_OFF=0
ITER_TO_WAIT_SHUTDOWN=4
POWER_DOWN_TIMEOUT=5
SNAPSHOT_TIMEOUT=15
ENABLE_COMPRESSION=0
ADAPTER_FORMAT=lsilogic
VM_SNAPSHOT_MEMORY=0
VM_SNAPSHOT_QUIESCE=0
VMDK_FILES_TO_BACKUP="admin2.vmdk"
The log, /vmfs/volumes/nas-2/log/admin2-backup.log, contains this:
2010-03-24 21:01:19 -- info: ============================== ghettoVCB LOG START ==============================
2010-03-24 21:01:20 -- info: CONFIG - USING CONFIGURATION FILE = /vmfs/volumes/datastore1/ghettoVCB/admin2
2010-03-24 21:01:20 -- info: CONFIG - VM_BACKUP_VOLUME = //vmfs/volumes/nas-1/backups/server2
2010-03-24 21:01:20 -- info: CONFIG - VM_BACKUP_ROTATION_COUNT = 1
2010-03-24 21:01:20 -- info: CONFIG - DISK_BACKUP_FORMAT = thin
2010-03-24 21:01:20 -- info: CONFIG - ADAPTER_FORMAT = lsilogic
2010-03-24 21:01:20 -- info: CONFIG - POWER_VM_DOWN_BEFORE_BACKUP = 0
2010-03-24 21:01:20 -- info: CONFIG - ENABLE_HARD_POWER_OFF = 0
2010-03-24 21:01:20 -- info: CONFIG - ITER_TO_WAIT_SHUTDOWN = 4
2010-03-24 21:01:20 -- info: CONFIG - POWER_DOWN_TIMEOUT = 5
2010-03-24 21:01:20 -- info: CONFIG - SNAPSHOT_TIMEOUT = 15
2010-03-24 21:01:20 -- info: CONFIG - LOG_LEVEL = dryrun
2010-03-24 21:01:20 -- info: CONFIG - BACKUP_LOG_OUTPUT = /vmfs/volumes/nas-2/log/admin2-backup.log
2010-03-24 21:01:20 -- info: CONFIG - VM_SNAPSHOT_MEMORY = 0
2010-03-24 21:01:20 -- info: CONFIG - VM_SNAPSHOT_QUIESCE = 0
2010-03-24 21:01:20 -- info: CONFIG - VMDK_FILES_TO_BACKUP = admin2.vmdk
2010-03-24 21:01:20 -- dryrun: ###############################################
2010-03-24 21:01:20 -- dryrun: Virtual Machine: admin2
2010-03-24 21:01:20 -- dryrun: VM_ID: 304
2010-03-24 21:01:20 -- dryrun: VMX_PATH: /vmfs/volumes/datastore1/admin2/admin2.vmx
2010-03-24 21:01:20 -- dryrun: VMX_DIR: /vmfs/volumes/datastore1/admin2
2010-03-24 21:01:20 -- dryrun: VMX_CONF: admin2/admin2.vmx
2010-03-24 21:01:20 -- dryrun: VMFS_VOLUME: datastore1
2010-03-24 21:01:20 -- dryrun: VMDK(s):
2010-03-24 21:01:20 -- dryrun: /vmfs/volumes/4a3b4c4e-2d38530e-0c2c-003048d937e7/admin2/admin2_1.vmdk
2010-03-24 21:01:20 -- dryrun: admin2.vmdk
2010-03-24 21:01:20 -- dryrun: ###############################################
2010-03-24 21:01:20 -- info: ============================== ghettoVCB LOG END ================================
The output is expected based on your setup. You're running dryrun, which does not go through the logic of validating a VM for backup or its VMDK(s), etc. The verbiage may be misleading, but this was put in place to help me in troubleshooting user issues.
When you go through the actual backup process, all valid disks that are found will be checked against the VMDK_FILES_TO_BACKUP variable: if it's set to "all", all valid VMDK(s) will be backed up; otherwise it'll compare the VMDK(s) to see which it should allow through.
Perfect! Thank you for answering, and so quickly. What an incredible script.
THANK YOU, THANK YOU, THANK YOU
np.
Hi
this my problem: on my redhat it works but not on the other servers!
Could the size of these servers, which are very big, perhaps be the reason?
/vmfs/volumes/4b7ecd5f-26a47d05-5b67-0026b975af67 # cat ghetto_VCB.log
2010-03-23 21:00:01 -- info: ============================== ghettoVCB LOG START ==============================
2010-03-23 21:00:01 -- info: CONFIG - VM_BACKUP_VOLUME = /vmfs/volumes/NFS/SAUVEGARDE/VM
2010-03-23 21:00:01 -- info: CONFIG - VM_BACKUP_ROTATION_COUNT = 3
2010-03-23 21:00:01 -- info: CONFIG - DISK_BACKUP_FORMAT = thin
2010-03-23 21:00:01 -- info: CONFIG - ADAPTER_FORMAT = buslogic
2010-03-23 21:00:01 -- info: CONFIG - POWER_VM_DOWN_BEFORE_BACKUP = 0
2010-03-23 21:00:01 -- info: CONFIG - ENABLE_HARD_POWER_OFF = 0
2010-03-23 21:00:01 -- info: CONFIG - ITER_TO_WAIT_SHUTDOWN = 3
2010-03-23 21:00:01 -- info: CONFIG - POWER_DOWN_TIMEOUT = 5
2010-03-23 21:00:01 -- info: CONFIG - SNAPSHOT_TIMEOUT = 15
2010-03-23 21:00:01 -- info: CONFIG - LOG_LEVEL = info
2010-03-23 21:00:01 -- info: CONFIG - BACKUP_LOG_OUTPUT = stdout
2010-03-23 21:00:01 -- info: CONFIG - VM_SNAPSHOT_MEMORY = 0
2010-03-23 21:00:01 -- info: CONFIG - VM_SNAPSHOT_QUIESCE = 0
2010-03-23 21:00:01 -- info: CONFIG - VMDK_FILES_TO_BACKUP = all
2010-03-23 21:00:03 -- info: Snapshot found for DNS-GLPI-OCS, backup will not take place
2010-03-23 21:00:03 -- info: Snapshot found for Toshiba-Efms, backup will not take place
2010-03-23 21:00:04 -- info: Initiate backup for RedHat
2010-03-23 21:00:04 -- info: Creating Snapshot "ghettoVCB-snapshot-2010-03-23" for RedHat
Destination disk format: VMFS thin-provisioned
Cloning disk '/vmfs/volumes/datastore2/RedHat/RedHat.vmdk'...
Clone: 100% done.
2010-03-23 21:07:47 -- info: Removing snapshot from RedHat ...
2010-03-23 21:07:52 -- info: Backup Duration: 7.80 Minutes
2010-03-23 21:07:52 -- info: Successfully completed backup for RedHat!
2010-03-23 21:07:54 -- info: Initiate backup for Samba Debian
2010-03-23 21:07:54 -- info: Creating Snapshot "ghettoVCB-snapshot-2010-03-23" for Samba Debian
2010-03-23 21:23:08 -- info: Snapshot timed out, failed to create snapshot: "ghettoVCB-snapshot-2010-03-23" for Samba Debian
2010-03-23 21:23:08 -- info: Backup Duration: 15.23 Minutes
2010-03-23 21:23:08 -- info: Error: Unable to backup Samba Debian due to snapshot creation!
2010-03-23 21:23:08 -- info: Snapshot found for zimbra, backup will not take place
2010-03-23 21:23:08 -- info: ============================== ghettoVCB LOG END ================================
2010-03-24 21:00:01 -- info: ============================== ghettoVCB LOG START ==============================
2010-03-24 21:00:01 -- info: CONFIG - VM_BACKUP_VOLUME = /vmfs/volumes/NFS/SAUVEGARDE/VM
2010-03-24 21:00:01 -- info: CONFIG - VM_BACKUP_ROTATION_COUNT = 3
2010-03-24 21:00:01 -- info: CONFIG - DISK_BACKUP_FORMAT = thin
2010-03-24 21:00:01 -- info: CONFIG - ADAPTER_FORMAT = buslogic
2010-03-24 21:00:01 -- info: CONFIG - POWER_VM_DOWN_BEFORE_BACKUP = 0
2010-03-24 21:00:01 -- info: CONFIG - ENABLE_HARD_POWER_OFF = 0
2010-03-24 21:00:01 -- info: CONFIG - ITER_TO_WAIT_SHUTDOWN = 3
2010-03-24 21:00:01 -- info: CONFIG - POWER_DOWN_TIMEOUT = 5
2010-03-24 21:00:01 -- info: CONFIG - SNAPSHOT_TIMEOUT = 60
2010-03-24 21:00:01 -- info: CONFIG - LOG_LEVEL = info
2010-03-24 21:00:01 -- info: CONFIG - BACKUP_LOG_OUTPUT = stdout
2010-03-24 21:00:01 -- info: CONFIG - VM_SNAPSHOT_MEMORY = 0
2010-03-24 21:00:01 -- info: CONFIG - VM_SNAPSHOT_QUIESCE = 0
2010-03-24 21:00:01 -- info: CONFIG - VMDK_FILES_TO_BACKUP = all
2010-03-24 21:00:02 -- info: Snapshot found for DNS-GLPI-OCS, backup will not take place
2010-03-24 21:00:02 -- info: Snapshot found for Toshiba-Efms, backup will not take place
2010-03-24 21:00:03 -- info: Initiate backup for RedHat
2010-03-24 21:00:03 -- info: Creating Snapshot "ghettoVCB-snapshot-2010-03-24" for RedHat
Destination disk format: VMFS thin-provisioned
Cloning disk '/vmfs/volumes/datastore2/RedHat/RedHat.vmdk'...
Clone: 100% done.
2010-03-24 21:07:44 -- info: Removing snapshot from RedHat ...
2010-03-24 21:07:49 -- info: Backup Duration: 7.77 Minutes
2010-03-24 21:07:49 -- info: Successfully completed backup for RedHat!
2010-03-24 21:07:50 -- info: Initiate backup for Samba Debian
2010-03-24 21:07:50 -- info: Creating Snapshot "ghettoVCB-snapshot-2010-03-24" for Samba Debian
2010-03-24 22:08:42 -- info: Snapshot timed out, failed to create snapshot: "ghettoVCB-snapshot-2010-03-24" for Samba Debian
2010-03-24 22:08:42 -- info: Backup Duration: 60.87 Minutes
2010-03-24 22:08:42 -- info: Error: Unable to backup Samba Debian due to snapshot creation!
2010-03-24 22:08:42 -- info: Snapshot found for zimbra, backup will not take place
2010-03-24 22:08:42 -- info: ============================== ghettoVCB LOG END ================================
Thank
francois
So you probably know why the backups aren't taking place: the snapshots were not removed, and the backup script does not support VMs with existing snapshots.
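For reference, existing snapshots can be listed and removed from the console roughly like this (use vmware-vim-cmd on classic ESX; the Vmid is whatever getallvms reports for the affected VM):
vim-cmd vmsvc/getallvms
vim-cmd vmsvc/snapshot.get <Vmid>
vim-cmd vmsvc/snapshot.removeall <Vmid>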
Now, the reason for this problem is not the script. From what I've seen reported, it primarily pertains to the "NFS" configuration (hardware/software), potentially an issue with using NFS on ext3 vs. NFS on an XFS filesystem.
Please take a look at my comment here for some details: http://communities.vmware.com/docs/DOC-8760#comments-14679
Regarding why this occurs, I'm still not 100% sure. I've not been able to reproduce this and haven't found much commonality between the various reports; from what I can tell it generally occurs on larger VMs, but I believe there have been reports of this on smaller VMs as well.
Not enough users have gone through further troubleshooting to isolate it further, and there just hasn't been much feedback from the community after my comment above.
I'm pretty confident that you're affected by this, and if you implement the fix, the issue should go away.
I'm looking to update the documentation with some of these findings, if anyone would like to share more information about the issue, please let me know.
Thanks
I seem to be on here every few months with a new problem. I'm getting an error after the script backs up the first VM. Here is a screen capture:
/vmfs/volumes/48f62e13-c6958ca2-6bc2-0022198dec59 # ./ghettoVCB.sh -f vmbackups
(vim.fault.DuplicateName) {
dynamicType = ,
name = "NFS",
object = 'vim.Datastore:192.168.111.65:/mnt/vg1/data1/backup',
msg = "The name 'NFS' already exists."
}
2010-03-26 12:50:21 -- info: ============================== ghettoVCB LOG START ==============================
2010-03-26 12:50:21 -- info: CONFIG - VM_BACKUP_VOLUME =
2010-03-26 12:50:21 -- info: CONFIG - VM_BACKUP_ROTATION_COUNT = 5
2010-03-26 12:50:21 -- info: CONFIG - DISK_BACKUP_FORMAT = zeroedthick
2010-03-26 12:50:21 -- info: CONFIG - ADAPTER_FORMAT = lsilogic
2010-03-26 12:50:21 -- info: CONFIG - POWER_VM_DOWN_BEFORE_BACKUP = 0
2010-03-26 12:50:21 -- info: CONFIG - ENABLE_HARD_POWER_OFF = 0
2010-03-26 12:50:21 -- info: CONFIG - ITER_TO_WAIT_SHUTDOWN = 3
2010-03-26 12:50:21 -- info: CONFIG - POWER_DOWN_TIMEOUT = 5
2010-03-26 12:50:21 -- info: CONFIG - SNAPSHOT_TIMEOUT = 15
2010-03-26 12:50:21 -- info: CONFIG - LOG_LEVEL = info
2010-03-26 12:50:21 -- info: CONFIG - BACKUP_LOG_OUTPUT = stdout
2010-03-26 12:50:21 -- info: CONFIG - VM_SNAPSHOT_MEMORY = 0
2010-03-26 12:50:21 -- info: CONFIG - VM_SNAPSHOT_QUIESCE = 1
2010-03-26 12:50:21 -- info: CONFIG - VMDK_FILES_TO_BACKUP = all
2010-03-26 12:50:22 -- info: Initiate backup for LegalFiles
2010-03-26 12:50:22 -- info: Creating Snapshot "ghettoVCB-snapshot-2010-03-26" for LegalFiles
Destination disk format: VMFS thick
Cloning disk '/vmfs/volumes/BackupsStore/LegalFiles/LegalFiles_1.vmdk'...
Clone: 100% done.
Destination disk format: VMFS thick
Cloning disk '/vmfs/volumes/BackupsStore/LegalFiles/LegalFiles.vmdk'...
Clone: 100% done.
2010-03-26 13:58:07 -- info: Removing snapshot from LegalFiles ...
ash: 5.gz: bad number
./ghettoVCB.sh: ./ghettoVCB.sh: 735: Syntax error: 5.gz+1
Please take a look at my comment here: http://communities.vmware.com/docs/DOC-8760#comments-14997
I suspect you may be running into the lovely "nfs" issue that some of the other users are having. This has nothing to do with the script.
<b>"I suspect you may be running into the lovely "nfs" issue"</b>
I don't know about that. I set some debugging flags around the code as shown here.
checkVMBackupRotation() {
    set -x
    set -v
    ...
    set +x
    set +v
}
I called the script with debug logging and redirected stdout and stderr to a file. The first backup made it, but the second one you can see ran into trouble.
Logging output to "/vmfs/volumes/nas-2/log/server2-backup.log" ... Destination disk format: VMFS thin-provisioned Cloning disk '/vmfs/volumes/datastore1/monitor/monitor.vmdk'... Clone: 0% done. Clone: 1% done. .... Clone: 99% done. Clone: 100% done. + set -v + local BACKUP_DIR_PATH=//vmfs/volumes/nas-1/backups/server2/monitor + local BACKUP_VM_NAMING_CONVENTION=//vmfs/volumes/nas-1/backups/server2/monitor/monitor-2010-03-26 + ls -tr //vmfs/volumes/nas-1/backups/server2/monitor + LIST_BACKUPS=monitor-2010-03-22--1 monitor-2010-03-26 + [ -z 1 ] + IFS= + TMP_DIR=//vmfs/volumes/nas-1/backups/server2/monitor/monitor-2010-03-22--1 + echo 1 + TMP=1 + [ 1 = //vmfs/volumes/nas-1/backups/server2/monitor/monitor-2010-03-26 ] + [ 1 -ge 1 ] + rm -rf //vmfs/volumes/nas-1/backups/server2/monitor/monitor-2010-03-22--1 + TMP_DIR=//vmfs/volumes/nas-1/backups/server2/monitor/monitor-2010-03-26 + echo //vmfs/volumes/nas-1/backups/server2/monitor/monitor-2010-03-26 + TMP=//vmfs/volumes/nas-1/backups/server2/monitor/monitor-2010-03-26 + [ //vmfs/volumes/nas-1/backups/server2/monitor/monitor-2010-03-26 = //vmfs/volumes/nas-1/backups/server2/monitor/monitor-2010-03-26 ] + NEW=//vmfs/volumes/nas-1/backups/server2/monitor/monitor-2010-03-26--1 + mv //vmfs/volumes/nas-1/backups/server2/monitor/monitor-2010-03-26 //vmfs/volumes/nas-1/backups/server2/monitor/monitor-2010-03-26--1 + unset IFS + set +x Destination disk format: VMFS thin-provisioned Cloning disk '/vmfs/volumes/datastore1/findlocw/findlocw.vmdk'... Clone: 0% done. Clone: 1% done. .... Clone: 99% done. Clone: 100% done. + set -v + local BACKUP_DIR_PATH=//vmfs/volumes/nas-1/backups/server2/findlocw + local BACKUP_VM_NAMING_CONVENTION=//vmfs/volumes/nas-1/backups/server2/findlocw/findlocw-2010-03-26 + ls -tr //vmfs/volumes/nas-1/backups/server2/findlocw + LIST_BACKUPS=findlocw-2010-03-23 findlocw-2010-03-24 findlocw-2010-03-25 findlocw-2010-03-26 + [ -z 1 ] + IFS= + TMP_DIR=//vmfs/volumes/nas-1/backups/server2/findlocw/findlocw-2010-03-23 + echo //vmfs/volumes/nas-1/backups/server2/findlocw/findlocw-2010-03-23 + TMP=//vmfs/volumes/nas-1/backups/server2/findlocw/findlocw-2010-03-23 + [ //vmfs/volumes/nas-1/backups/server2/findlocw/findlocw-2010-03-23 = //vmfs/volumes/nas-1/backups/server2/findlocw/findlocw-2010-03-26 ] + [ //vmfs/volumes/nas-1/backups/server2/findlocw/findlocw-2010-03-23 -ge 1 ] sh: //vmfs/volumes/nas-1/backups/server2/findlocw/findlocw-2010-03-23: bad number + echo //vmfs/volumes/nas-1/backups/server2/findlocw/findlocw-2010-03-23 + BASE=//vmfs/volumes/nas-1/backups/server2/findlocw/findlocw-2010-03-23 /vmfs/volumes/datastore1/ghettoVCB/ghettoVCB.sh: line 739: syntax error: //vmfs/volumes/nas-1/backups/server2/findlocw/findlocw-2010-03-23+1
Line 739 is the call to ghettoVCB().
The following is the log file for the same run:
2010-03-26 21:32:08 -- info: ============================== ghettoVCB LOG START ============================== 2010-03-26 21:32:09 -- info: CONFIG - VM_BACKUP_VOLUME = //vmfs/volumes/nas-1/backups/server2 2010-03-26 21:32:09 -- info: CONFIG - VM_BACKUP_ROTATION_COUNT = 1 2010-03-26 21:32:09 -- info: CONFIG - DISK_BACKUP_FORMAT = thin 2010-03-26 21:32:09 -- info: CONFIG - ADAPTER_FORMAT = lsilogic 2010-03-26 21:32:09 -- info: CONFIG - POWER_VM_DOWN_BEFORE_BACKUP = 0 2010-03-26 21:32:09 -- info: CONFIG - ENABLE_HARD_POWER_OFF = 0 2010-03-26 21:32:09 -- info: CONFIG - ITER_TO_WAIT_SHUTDOWN = 3 2010-03-26 21:32:09 -- info: CONFIG - POWER_DOWN_TIMEOUT = 5 2010-03-26 21:32:09 -- info: CONFIG - SNAPSHOT_TIMEOUT = 15 2010-03-26 21:32:09 -- info: CONFIG - LOG_LEVEL = info 2010-03-26 21:32:09 -- info: CONFIG - BACKUP_LOG_OUTPUT = /vmfs/volumes/nas-2/log/server2-backup.log 2010-03-26 21:32:09 -- info: CONFIG - VM_SNAPSHOT_MEMORY = 0 2010-03-26 21:32:09 -- info: CONFIG - VM_SNAPSHOT_QUIESCE = 0 2010-03-26 21:32:09 -- info: CONFIG - VMDK_FILES_TO_BACKUP = all 2010-03-26 21:32:10 -- info: Initiate backup for monitor 2010-03-26 21:32:10 -- info: Creating Snapshot "ghettoVCB-snapshot-2010-03-26" for monitor Destination disk format: VMFS thin-provisioned Cloning disk '/vmfs/volumes/datastore1/monitor/monitor.vmdk'... Clone: 0% done. Clone: 1% done. .... Clone: 99% done. Clone: 100% done. 2010-03-26 21:56:59 -- info: Removing snapshot from monitor ... 2010-03-26 21:57:05 -- info: Backup Duration: 24.92 Minutes 2010-03-26 21:57:05 -- info: Successfully completed backup for monitor! 2010-03-26 21:57:05 -- info: CONFIG - VM_BACKUP_VOLUME = //vmfs/volumes/nas-1/backups/server2 2010-03-26 21:57:05 -- info: CONFIG - VM_BACKUP_ROTATION_COUNT = 1 2010-03-26 21:57:05 -- info: CONFIG - DISK_BACKUP_FORMAT = thin 2010-03-26 21:57:05 -- info: CONFIG - ADAPTER_FORMAT = lsilogic 2010-03-26 21:57:05 -- info: CONFIG - POWER_VM_DOWN_BEFORE_BACKUP = 0 2010-03-26 21:57:05 -- info: CONFIG - ENABLE_HARD_POWER_OFF = 0 2010-03-26 21:57:05 -- info: CONFIG - ITER_TO_WAIT_SHUTDOWN = 3 2010-03-26 21:57:05 -- info: CONFIG - POWER_DOWN_TIMEOUT = 5 2010-03-26 21:57:05 -- info: CONFIG - SNAPSHOT_TIMEOUT = 15 2010-03-26 21:57:05 -- info: CONFIG - LOG_LEVEL = info 2010-03-26 21:57:05 -- info: CONFIG - BACKUP_LOG_OUTPUT = /vmfs/volumes/nas-2/log/server2-backup.log 2010-03-26 21:57:05 -- info: CONFIG - VM_SNAPSHOT_MEMORY = 0 2010-03-26 21:57:05 -- info: CONFIG - VM_SNAPSHOT_QUIESCE = 0 2010-03-26 21:57:05 -- info: CONFIG - VMDK_FILES_TO_BACKUP = all 2010-03-26 21:57:06 -- info: Initiate backup for findlocw Destination disk format: VMFS thin-provisioned Cloning disk '/vmfs/volumes/datastore1/findlocw/findlocw.vmdk'... Clone: 0% done. Clone: 1% done. .... Clone: 99% done. Clone: 100% done.
Let me try to explain what's going on so you understand the problem, as I've said before, the issue is not with the script.
+ [ //vmfs/volumes/nas-1/backups/server2/findlocw/findlocw-2010-03-23 -ge 1 ] sh: //vmfs/volumes/nas-1/backups/server2/findlocw/findlocw-2010-03-23: bad number
The reason this is failing is that when it extracts the sub-numbers by which the individual backups are renamed for rotation, it's unable to find a value and hence the comparison fails. The reason this is occurring is that the source folder the actual backup is in never got renamed as it was supposed to be, and it fails the check condition. The whole "NFS" issue has shown itself in various ways and forms, but the majority of the problems have been around the rotation and deletion of the backups. For some reason, as already explained in a few previous posts, the I/O operation that's sent is either taking too long or never completes in a timely manner, which causes all sorts of "funny" things: either no rotating of the directories, or left-behind snapshots that were not able to be committed.
Again, these issues have always been reported around "NFS" and large VMs. I suspect that smaller VMs don't have this problem but once you get into larger VMs, this "I/O" issue with either the host or NFS server will start to show up. One resolution that a user has found was by changing their NFS server volume from ext3 to XFS which helped in resolving the problem and could be due to the overhead that ext3 brings in the way it does its update when an I/O request is sent compared to XFS.
So my question to you is, what is difference between VM1 and VM2 in terms of size? If you want to test this theory out it's pretty simple, take the NFS server out of the picture. If you have sufficient space on local volume, do a backup and see if you run into the issue? I know others have ran these tests and found that backups always complete. I know plenty of users that are able to backup across all 3 types of datastores perfectly fine, including NFS and it's only a small set of users that are seeing this problem.
Hopefully this makes a little more sense; that's why I suggested that you're most likely seeing the "NFS" issue as others are.
> So my question to you is, what is difference between VM1 and VM2 in terms of size?
When I did this test, I switched them around to make the smaller one go first so I didn't have to wait so long for the results. The first one did make it, and the second, larger one, did not. But I had done that before to get a backup of the smaller one, and it failed.
> The reason this is failing is because when it extracts the sub-numbers in which the individual backups are being re-named for rotation, it's unable to find a value and hence the comparison fails... Hopefully this makes a little more sense
I read the previous posts and links earlier but that sentence clarifies your position on what is going on better than all of the posts put together. I gather from what you are saying that the command completes, and when the next one executes, either the rename hasn't completed, or the call to the directory returns old information. I would guess the latter.
> ...that's why I suggested that you're most likely seeing the "NFS" issue as others are.
Your explanation is logical, and I don't understand the code right there well enough to be sure. However, there are a couple of details that don't fit the profile.
1. I'm not using EXT3 or XFS, I'm using NTFS on BOTH servers that cross-backup VMs to each other at night.
2. This has been working fine for a long time with large VMs on the same two NFS volumes. Now they both fail.
Possible Solution:
In order to make my situation fit the NFS theory, I initially reasoned that the problem may have crept in as the result of a Microsoft update. That would make both servers having the same problem at the same time fit. However, I kept coming back to why the small VM backs up without error now, while it didn't before. While reflecting on that, I noticed that I had renamed the parent directory to preserve my known-good backups while experimenting. I had done that with none of the others. I then did that with the others. Last night, all of the backups executed flawlessly, including proper rotation of the small VM that had started working earlier.
Possible cause:
Earlier, I mentioned that one of the things I liked about your new script is the ability to specify which VMDKs get backed up. In the case of the two Windows 2003 R2 Servers that export the NFS volumes for the backups, it is important that I don't replicate their NFS volumes, not only due to their size, but because it would be senseless to get backups of backups and then store them back on the same server as the VMs they were backing up. I used to have a separate hacked script to backup the main VMDKs for each of those servers. However, I also had to schedule the Windows machines separately, and make sure there was enough time between them so they would not overlap. If they were to overlap, their snapshots would snowball as they record changes of the changes of the changes.... After they complete, the nightly free-for-all backup begins where they cross-backup VMs to each other without issue. One time they did overlap, and while they were snowballing, the nightly backups kicked in. I came back to packed NFS drives on both servers. I rebooted, but then none of the VMs would start. However, I always build in an "air-bubble" for things like this, so I logged into the ESXi servers and deleted them to make space. I booted the servers, started the VMs, and cleaned house. That's the only thing that I can THINK of that could have triggered this.
Until last night, I've been adjusting the list orders to get backups and deleting the old ones by hand. This is the first time in 6 weeks this has worked right. One good day in a row does not equal a track record. I'll update this in a few days with the results.
PS: Thanks again for the awesome script!
Thanks for the updates. As you can see, I get a few questions around these issues per week, and in general the NFS issues have been the culprit from what I can tell. In your situation, it sounds like directories that were expected were modified, which potentially broke the rotation. I'll need to think about this and see if there's a way I can help protect against or better log this event. The other thing you mentioned was "overlap"; one thing I could do is set up a lock which would be generated upon the start of the script and removed at the end of the script. If any "unexpected" errors were to happen and the lock were not removed, then some logic would need to be written to ensure the script isn't running, remove the lock and re-lock.
Let me know if you have any ideas around this, and thanks for the patience. I've been quite fortunate to have such a great and huge community (the script has gotten over 200,000+ views) to help QA this script. That's why I've been pretty confident in stating that it's probably not the script and the issue lies in either "manual" manipulation of some sort OR environmental issues. If there was a real bug within the script, we would have more users reporting problems. There are definitely some things I'm thinking about adding into a future release, and there have been suggestions around some of those features. Let me ponder it some more; perhaps I'll provide an update on the "suggested" features, take any additional ones, and see what makes sense.
What I will do this weekend is update some of the documentation and FAQ's around some of the things I've learned over the last few weeks that users may have ran into.
Thanks
"Let me know if you have any ideas around this"
This is how I'm handling the situation now:
Other:
For now, I left the set -vx / set -xv pair around the code in question in ghettoVCB. I maintain two calls in the admin?-backup.sh scripts that call ghettoVCB, one normal and one debug. The debug option gives me a second log file that contains the commands in the order they executed, the contents of the variables for the tagged area, and any exit errors:

# Normal
#/vmfs/volumes/datastore1/ghettoVCB/ghettoVCB.sh -f /vmfs/volumes/datastore1/ghettoVCB/server2_list -c /vmfs/volumes/datastore1/ghettoVCB -l /vmfs/volumes/nas-2/log/server2-backup.log

# Debug
/vmfs/volumes/datastore1/ghettoVCB/ghettoVCB.sh -f /vmfs/volumes/datastore1/ghettoVCB/server2_list -c /vmfs/volumes/datastore1/ghettoVCB -d debug -l /vmfs/volumes/nas-2/log/server2-backup.log > /vmfs/volumes/nas-2/log/server2-debug.log 2>&1
Thank you for your very valuable time with this.
My comment around ideas was about features to be added to the script, not specific to your backup process.
e.g. Add a locking mechanism to the script to ensure that only one instance can run at any given time.
=========================================================================
William Lam
VMware vExpert 2009
VMware ESX/ESXi scripts and resources at: http://engineering.ucsb.edu/~duonglt/vmware/
VMware Code Central - Scripts/Sample code for Developers and Administrators
If you find this information useful, please award points for "correct" or "helpful".
Hello everyone,
I've just updated the documentation, for those that are affected by the "NFS" issue, please take a look at FAQ #29 for some details.
I will also be accepting any new feature requests and taking a look at which are the most popular and how they might be integrated into a future release of the script. Keep in mind, feature additions will be based on feasibility and the amount of free time I have.
Thanks for everyone's support!
=========================================================================
William Lam
VMware vExpert 2009
VMware ESX/ESXi scripts and resources at: http://engineering.ucsb.edu/~duonglt/vmware/
VMware Code Central - Scripts/Sample code for Developers and Administrators
If you find this information useful, please award points for "correct" or "helpful".
"My comment around ideas was about features to be added to the script, not specific to your backup process."
Your script serializes the backups in the list. What I wrote demonstrates how to serialize backups among servers by placing them in the same calling script, which ensures only one instance of ghettoVCB is running at any given time. Moreover, by adding the scripts that perform the concurrent work to that same calling script, I can force them to wait until the serialized backups are complete before starting them as separate threads to run concurrently. Each of those threads can then serialize its own operations where necessary, in this case mailing their logs. This method eliminates the complexities associated with semaphores.
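In case it helps anyone copy the approach, here's a stripped-down sketch of such a calling script (the paths, list names and mail helper are placeholders, not my actual files): the ghettoVCB runs happen one after another, then the follow-up jobs start in the background and the wrapper waits for all of them.

#!/bin/sh
# Sketch of a wrapper: serialize the ghettoVCB runs, then run the
# per-server follow-up work (e.g. mailing logs) concurrently.
GVCB=/vmfs/volumes/datastore1/ghettoVCB/ghettoVCB.sh
CFG=/vmfs/volumes/datastore1/ghettoVCB

# Serial section: only one ghettoVCB instance is ever running.
"${GVCB}" -f "${CFG}/server1_list" -c "${CFG}" -l /vmfs/volumes/nas-2/log/server1-backup.log
"${GVCB}" -f "${CFG}/server2_list" -c "${CFG}" -l /vmfs/volumes/nas-2/log/server2-backup.log

# Concurrent section: each "thread" serializes its own follow-up work.
mail_log() {
    # Placeholder for whatever actually mails or copies the log.
    echo "would mail $1 here"
}
mail_log /vmfs/volumes/nas-2/log/server1-backup.log &
mail_log /vmfs/volumes/nas-2/log/server2-backup.log &

# Wait for both background jobs before the wrapper exits.
wait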
"Add a locking mechanism to the script to ensure that only one instance can run at any given time."
Disclaimer: I'm a language and OS programmer, not a script programmer. To enforce that no more than one instance of ghettoVCB.sh can run on a host at any given time, what comes to mind is a method similar to what an operating system uses with pid files. The best place to store the "pid" file is somewhere you can write to and that gets cleared when VMware reboots. The "pid" file simply has a known file name that you overwrite periodically in order to keep its timestamp current. What follows depends on the script processor being able to register a timer and process events. It would work something like this:
When ghettoVCB launches, it checks for its assigned "pid" file. If it finds the "pid" file and it's up to date, it exits, because that means a copy of ghettoVCB is active. If it doesn't find it, or it's out of date, it writes a new "pid" file, continues starting up, and registers a timer to generate a timer event, say, every 5 seconds. When the event fires, it runs code that overwrites the "pid" file to update the timestamp, then goes back to what it was doing. When ghettoVCB exits, it removes its "pid" file.
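Since the busybox shell doesn't really offer timer events, the closest approximation I can think of is a background loop that keeps refreshing the "pid" file's timestamp while the main work runs. A rough sketch (the location, the timings and the find-based staleness test are all assumptions, not something tested against ghettoVCB):

#!/bin/sh
# Sketch of the timestamp-refreshed "pid" file idea -- illustrative only.
PIDFILE=/tmp/ghettoVCB.pid    # assumed writable and cleared on reboot

# If the pid file exists and was refreshed within the last minute,
# assume another instance is alive and bail out; otherwise it is stale.
if [ -f "${PIDFILE}" ] && [ -z "$(find "${PIDFILE}" -mmin +1 2>/dev/null)" ]; then
    echo "ghettoVCB appears to be running already, exiting."
    exit 1
fi

# Claim the pid file and refresh its timestamp every 5 seconds in the
# background; the loop stops on its own once the parent is gone.
echo $$ > "${PIDFILE}"
PARENT=$$
( while kill -0 "${PARENT}" 2>/dev/null; do touch "${PIDFILE}"; sleep 5; done ) &
REFRESH_PID=$!

# On a clean exit, stop the refresher and remove the pid file.
trap 'kill "${REFRESH_PID}" 2>/dev/null; rm -f "${PIDFILE}"' EXIT

# ... the actual backup work would go here ...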
PS: This was longer, and I explained how to make your own timer if the script cannot register a timer, but it got blown away when I posted it.
Hi
The problem is resolved: the block size of my datastore was 1 MB, so the snapshot could not be created because there was not enough available disk space. I changed the block size to 8 MB and the backup succeeded.
Thanks
francois
Hi
I use the script to back up one server on an ESXi 4.0 Free Edition host.
A test backup of another VM finished successfully, but when I try to back up the one VM that is important (a build server), there are several errors:
- First I wanted the VM to be backed up while it was running. At backup time there were no users logged in, no build task was running, and there were no existing snapshots. The script started to take a snapshot, tried to clone the disks, failed, and then, and this is really annoying, did not remove the snapshot. The next run therefore couldn't work properly.
- OK, so I then configured the script to shut down the VM so that no snapshot would be needed.
- Now the problem is that the script apparently couldn't find the VMDKs????
Here are the log entries:
2010-03-26 07:10:02 -- info: ============================== ghettoVCB LOG START
2010-03-26 07:10:02 -- info: CONFIG - VM_BACKUP_VOLUME = /vmfs/volumes/4ae0b8d4-74b64288-137e-001cc0f2d1dd/Backup
2010-03-26 07:10:02 -- info: CONFIG - VM_BACKUP_ROTATION_COUNT = 2
2010-03-26 07:10:02 -- info: CONFIG - DISK_BACKUP_FORMAT = zeroedthick
2010-03-26 07:10:02 -- info: CONFIG - ADAPTER_FORMAT = buslogic
2010-03-26 07:10:02 -- info: CONFIG - POWER_VM_DOWN_BEFORE_BACKUP = 1
2010-03-26 07:10:02 -- info: CONFIG - ENABLE_HARD_POWER_OFF = 1
2010-03-26 07:10:02 -- info: CONFIG - ITER_TO_WAIT_SHUTDOWN = 3
2010-03-26 07:10:02 -- info: CONFIG - POWER_DOWN_TIMEOUT = 6
2010-03-26 07:10:02 -- info: CONFIG - SNAPSHOT_TIMEOUT = 15
2010-03-26 07:10:02 -- info: CONFIG - LOG_LEVEL = info
2010-03-26 07:10:02 -- info: CONFIG - BACKUP_LOG_OUTPUT = stdout
2010-03-26 07:10:02 -- info: CONFIG - VM_SNAPSHOT_MEMORY = 1
2010-03-26 07:10:02 -- info: CONFIG - VM_SNAPSHOT_QUIESCE = 0
2010-03-26 07:10:02 -- info: CONFIG - VMDK_FILES_TO_BACKUP = all
2010-03-26 07:10:12 -- info: Powering off initiated for Buildserver, backup will not begin until VM is off...
2010-03-26 07:10:18 -- info: VM is still on - Iteration: 0 - sleeping for 60secs (Duration: 0 seconds)
2010-03-26 07:11:25 -- info: VM is powerdOff
2010-03-26 07:11:25 -- info: Initiate backup for Buildserver
Failed to clone disk : The file already exists (39).
Destination disk format: VMFS zeroedthick
Cloning disk '/vmfs/volumes/datastore2/Windows XP Pro SP3 Ger 05/
Clone: 1% done.
...
Clone: 100% done.
Failed to clone disk : The system cannot find the file specified (25).
Destination disk format: VMFS zeroedthick
Cloning disk '/vmfs/volumes/datastore2/Windows XP Pro SP3 Ger 05/Windows XP Pro SP3 Ger 05.vmdk'...
2010-03-26 07:14:05 -- info: Powering back on Buildserver
2010-03-26 07:14:05 -- info: Compressing VM backup "/vmfs/volumes/4ae0b8d4-74b64288-137e-001cc0f2d1dd/Backup/Buildserver/Buildserver-2010-03-26.gz"...
2010-03-26 07:14:06 -- info: Backup Duration: 2.78 Minutes
2010-03-26 07:14:06 -- info: Successfully completed backup for Buildserver!
2010-03-26 07:14:06 -- info: ============================== ghettoVCB LOG END
What does this mean?
I would appreciate an answer!
Regards
Matthias
Yes, not having the right block size can affect the backups. I'm going to assume users have a properly configured environment when using the script.
=========================================================================
William Lam
VMware vExpert 2009
VMware ESX/ESXi scripts and resources at: http://engineering.ucsb.edu/~duonglt/vmware/
VMware Code Central - Scripts/Sample code for Developers and Administrators
If you find this information useful, please award points for "correct" or "helpful".
Please read FAQ #0
=========================================================================
William Lam
VMware vExpert 2009
VMware ESX/ESXi scripts and resources at: http://engineering.ucsb.edu/~duonglt/vmware/
VMware Code Central - Scripts/Sample code for Developers and Administrators
If you find this information useful, please award points for "correct" or "helpful".
Hi
OK, what is the right block size?
It is set to 1 MB and the largest file could be 265 GB.
Regards
Hi
By default, a block size of 1 MB only allows a maximum file size of 256 GB.
To use the full disk, you need to move to a block size of 2 MB for a maximum file size of 512 GB, 4 MB for a maximum of 1 TB, and 8 MB for a maximum of 2 TB.
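To check which block size a datastore is currently using, vmkfstools can report it (the datastore path below is only an example):

# vmkfstools -Ph /vmfs/volumes/datastore1

The output includes the capacity and the file block size the volume was formatted with.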
francois
"ok what is the right blocksize?"
The answer is, it depends. With vSphere, there's probably not a whole lot of reason not to use 8MB by default, especially with the sub-block allocation.
Give these two links a read for further details:
http://www.yellow-bricks.com/2009/05/14/block-sizes-and-growing-your-vmfs/
http://www.yellow-bricks.com/2009/03/24/an-8mb-vmfs-blocksize-doesnt-increase-performance/
Another issue that we've personally run into, and which others could as well, is when a VM has its VMDK(s) spread across multiple datastores. If the datastore in which the VM's .vmx is located has a smaller block size than the other datastore hosting the VM's disks, it will cause an issue: when you try to snapshot the VM, it will fail.
Take a look at this article for further details:
http://www.yellow-bricks.com/2009/08/24/vsphere-vm-snapshots-and-block-size/
Regarding your problem, this may or may not be the cause. This is not a beginner's script, though, so I expect users to have a properly configured environment before using it.
Please go through the documentation + FAQ and see if you're falling into any of those "known" issues. If you still can't find a resolution, please take a look at FAQ #1 before re-posting.
Thanks
=========================================================================
William Lam
VMware vExpert 2009
VMware ESX/ESXi scripts and resources at: http://engineering.ucsb.edu/~duonglt/vmware/
VMware Code Central - Scripts/Sample code for Developers and Administrators
If you find this information useful, please award points for "correct" or "helpful".