Bruticusmaximus's Posts

I'm really starting to get into SRM now. Is there a way to have a recovery plan take a snapshot of a VM before it powers it on?

Here's the scenario. We have an app with a primary and a secondary node. They fail back and forth all the time. For recovery, the primary needs to be brought up first; otherwise, there could be data loss. There's no way to tell which node is primary until you power one on. If you power on the wrong one first, there's data loss.

I want SRM to take a snapshot before powering them on. That way, if we go into the app and find the wrong node was powered on first, we can just revert to the snapshot and power on the other node first. I know we can just clean up and run the recovery plan again, but there are a couple of other apps being recovered with the same plan.
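SRM doesn't have a built-in "snapshot before power-on" option that I know of, but a pre-power-on step can call out to a script if you have somewhere to run PowerCLI from. A rough sketch of what that script could look like (server and VM names are placeholders, and this assumes the step fires before SRM powers the VMs on):

```powershell
# Hypothetical pre-power-on step: snapshot the recovered VMs while they are
# still powered off, so a revert gets us back to the pre-power-on state.
# Assumes PowerCLI is installed; names below are placeholders.
Connect-VIServer -Server recovery-vcenter.example.com

foreach ($vmName in @('app-node1', 'app-node2')) {
    $vm = Get-VM -Name $vmName
    New-Snapshot -VM $vm -Name "pre-recovery-$(Get-Date -Format yyyyMMdd-HHmm)" `
        -Description 'Taken before SRM power-on; revert if the wrong node came up first'
}

Disconnect-VIServer -Confirm:$false
```

If the wrong node comes up first, you would revert both VMs to these snapshots and power them on in the other order, without cleaning up the whole plan.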
Before we run a disaster recovery test, there are a few housekeeping things we need to do. There are a couple of production domain controllers at the DR site that we need to clone. During the test we use the cloned versions of these DCs. After the test, we delete the clones and power up the original DCs. Why we do this is a long story.

Years ago, when SRM was still Windows based, we had a PowerShell script on the SRM server that would do the clone. We'd call it from SRM at the beginning of the recovery plan. Now that SRM is an appliance, I manually run the script before I run the SRM recovery plan. I would love to somehow incorporate this script at the beginning of the recovery plan again.

Also, is there a way to call a PowerShell script during the cleanup process in SRM?
Other things I have tried: removing the VM from inventory and re-adding it, and connecting directly to the host to do the snapshot.
Getting "An error occurred while taking a snapshot: A digest operation has failed" on a few VMs when trying to create a snapshot. These VMs have nothing in common except that they're all on ESXi 7 hosts. Different OSes, different hosts, different clusters, different storage, different vCenters. We were made aware of this because our NetBackup backup failed on these VMs last night. There haven't been any changes to the environment recently. We upgraded to v7 about a month ago.

Here's what I've tried:
- Powered off the VM
- Moved it to a different host
- Moved it to different storage
- Moved it to a different cluster

I'm stumped at this point.
Prior to testing a recovery plan, I need to power off and clone a couple of VMs. I have a PowerCLI script that does this. When our SRM servers were Windows, it was no big deal; the script was right on the SRM server. Now that we have Linux appliances, how do I run the PowerCLI script?
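For what it's worth, the power-off-and-clone step itself can run from any box with PowerCLI installed, not just the SRM server. A sketch of the idea (server names, VM names, and placement are placeholders; cloning an existing VM with `New-VM -VM` requires going through vCenter):

```powershell
# Power off and clone a couple of VMs before a recovery test.
# Assumes PowerCLI; all names below are placeholders.
Connect-VIServer -Server dr-vcenter.example.com

foreach ($vmName in @('DC1', 'DC2')) {
    $vm = Get-VM -Name $vmName
    if ($vm.PowerState -eq 'PoweredOn') {
        # Graceful guest shutdown, then wait for the VM to actually power off
        Stop-VMGuest -VM $vm -Confirm:$false
        while ((Get-VM -Name $vmName).PowerState -ne 'PoweredOff') {
            Start-Sleep -Seconds 10
        }
    }
    # Clone the powered-off VM (New-VM -VM clones an existing VM via vCenter)
    New-VM -VM $vm -Name "$vmName-clone" `
        -VMHost (Get-VMHost -Name 'dr-esx01.example.com') `
        -Datastore (Get-Datastore -Name 'dr-datastore1')
}

Disconnect-VIServer -Confirm:$false
```

The open question is still how to trigger it from the appliance-based SRM at the start of the plan rather than running it by hand.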
I'm trying to do the same thing. I have a report that they want sent out daily about the performance of 2 VMs. It looks horrible with just a single character wrapping to the next line. If I could shrink the font down just one notch, I think it would look great. I don't want everybody to have to widen the columns after they open the spreadsheet.
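If the report is built in PowerShell, one approach is the ImportExcel module (an assumption about how the spreadsheet is generated): export with auto-sized columns, then knock the font down a notch on the used range. A sketch, with made-up report data:

```powershell
# Sketch using the ImportExcel module (assumed to be installed);
# the report object below is placeholder data standing in for the real report.
Import-Module ImportExcel

$report = @(
    [pscustomobject]@{ VM = 'vm1'; CpuAvgMhz = 523; MemAvgGB = 3.1 }
    [pscustomobject]@{ VM = 'vm2'; CpuAvgMhz = 618; MemAvgGB = 2.7 }
)

# -AutoSize fits the columns; -PassThru returns the package so we can keep editing
$xlsx  = $report | Export-Excel -Path .\vm-report.xlsx -AutoSize -PassThru
$sheet = $xlsx.Workbook.Worksheets['Sheet1']

# Shrink the font on the whole used range so columns fit without manual widening
Set-ExcelRange -Worksheet $sheet -Range $sheet.Dimension.Address -FontSize 9
Close-ExcelPackage $xlsx
```

That way the recipients never have to touch the column widths.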
das.respectvmvmantiaffinityrules really is a great find. I found this article: vSphere HA Advanced Options. It says:

"das.respectvmvmantiaffinityrules - Determines whether vSphere HA enforces VM-VM anti-affinity rules. Default value is 'false', whereby the rules are not enforced. Can also be set to 'true' and rules are enforced (even if vSphere DRS is not enabled). In this case, vSphere HA does not fail over a virtual machine if doing so violates a rule, but it issues an event reporting there are insufficient resources to perform the failover."

I read this as "in an HA event, your anti-affinity rules go out the window". Does anybody else read it like that? I wish I had somewhere to test this.
So, if I just use a regular anti-affinity rule that says keep VMs on separate hosts, then add "das.respectVmVmAntiAffinityRules = false", that will allow HA to restart a VM on any host? Because that would be exactly what I'm looking for: keep the VMs on separate hosts unless something more important (an HA event) prevents you from doing that.
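For reference, the advanced option can be set on the cluster with PowerCLI. A sketch (cluster and vCenter names are placeholders; this is the commonly documented `New-AdvancedSetting` pattern, not something I've verified on every vSphere version):

```powershell
# Sketch: set the HA advanced option on a cluster with PowerCLI.
# Server and cluster names below are placeholders.
Connect-VIServer -Server vcenter.example.com

$cluster = Get-Cluster -Name 'Prod-Cluster'

# With the option set to 'false', HA ignores VM-VM anti-affinity rules
# during failover; DRS still separates the VMs during normal operation.
New-AdvancedSetting -Entity $cluster -Type ClusterHA `
    -Name 'das.respectVmVmAntiAffinityRules' -Value 'false' -Confirm:$false
```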
Thank you. That's kind of what I thought. So, if I have to use a "should" rule in order to not break HA, I wish I could also do a "separate" rule the same way.

Seems like DRS affinity rules need a "best effort" option: run these VMs on separate hosts if you can. If there are 6 VMs but only 5 hosts, well, 2 VMs will have to run on one host. If a host goes down, then 3 VMs may have to run on one host.
I have a situation where I have 5 hosts in a cluster and 10 VMs I need to separate. I don't want any more than 2 of these VMs running on the same host. I created 2 "Separate Virtual Machines" rules: one with the odd VMs and one with the even VMs. In theory, this should put one even VM and one odd VM on each host. If we end up with 11 VMs, it might be an issue, but for now it's fine.

What happens if we have an HA event? Am I going to get a "violates affinity rule" error when it powers up the VM on another host, or do HA events override DRS rules?

If I use groups instead and set up a "Should run on these hosts" rule, does that change things? If I set up group A with the odd VMs, group B with the even VMs, and a host group with all the hosts in it, there is nothing that says all the VMs in group A should run on different hosts.
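The odd/even split above can be scripted so both rules stay in sync as VMs are added. A sketch in PowerCLI (cluster and VM names are placeholders; `-KeepTogether:$false` is what makes these "Separate Virtual Machines" rules):

```powershell
# Sketch: two anti-affinity rules (odd/even) via PowerCLI.
# Server, cluster, and VM names below are placeholders.
Connect-VIServer -Server vcenter.example.com
$cluster = Get-Cluster -Name 'Prod-Cluster'

$odd  = Get-VM -Name 'app01','app03','app05','app07','app09'
$even = Get-VM -Name 'app02','app04','app06','app08','app10'

# -KeepTogether:$false creates "Separate Virtual Machines" rules,
# so at most one odd and one even VM should land on each host.
New-DrsRule -Cluster $cluster -Name 'separate-odd'  -KeepTogether:$false -VM $odd
New-DrsRule -Cluster $cluster -Name 'separate-even' -KeepTogether:$false -VM $even
```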
And by "we" I mean "me".
Is there a way to force VMs from one host to come up on another host the next time the VM has a scheduled reboot?

We had a cluster with 10 blade hosts. It was not set up with EVC enabled; it's a long story. About 6 months ago, we added 2 new hosts. We'll call them hosts 11 and 12. Things always chugged along fine. We even upgraded all the hosts in the cluster along the way. Put any host in maintenance mode and the VMs migrate without issue. What we didn't notice is that a bunch of the VMs on host 11 would only migrate to host 12 and vice versa. If we try to move the VMs on hosts 11 and 12 to hosts 1 through 10, they fail. These VMs must have rebooted while on host 11 or 12 at some point and picked up the newer CPU instruction set. Not a huge problem; at least VMware knows enough not to move a VM to an older host.

We have 3 new hosts coming in for this cluster, so I need to fix this without rebooting any VMs if possible. The least impactful way to do it, I think, would be to somehow force the VMs on hosts 11 and 12 to only come up on hosts 1 through 10 the next time they reboot. The next time each VM reboots for patching, the problem will solve itself. Then pull hosts 11 and 12 out, turn on EVC for the cluster, and add hosts 11 and 12 back in.

Is there a better way to solve this problem? Thanks in advance.
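One way to steer those VMs is a DRS "should run on" host group rule. A sketch in PowerCLI (all names are placeholders). One caveat worth flagging: a guest OS reboot doesn't change a VM's host or its CPU feature set; the VM would need a full power off/power on to be re-placed by DRS, at which point the rule below should land it on hosts 1 through 10:

```powershell
# Sketch: steer the VMs currently on hosts 11/12 toward hosts 1-10.
# Server, cluster, and host names below are placeholders.
Connect-VIServer -Server vcenter.example.com
$cluster = Get-Cluster -Name 'Blade-Cluster'

# VMs currently on the two newer hosts
$vms = Get-VMHost -Name 'esx11.example.com','esx12.example.com' | Get-VM

# The ten older hosts
$hosts = 1..10 | ForEach-Object {
    Get-VMHost -Name ('esx{0:d2}.example.com' -f $_)
}

New-DrsClusterGroup -Cluster $cluster -Name 'vms-from-11-12' -VM $vms
New-DrsClusterGroup -Cluster $cluster -Name 'hosts-1-10' -VMHost $hosts

# 'ShouldRunOn' is a soft rule, so HA can still restart these VMs anywhere
New-DrsVMHostRule -Cluster $cluster -Name 'prefer-old-hosts' `
    -VMGroup 'vms-from-11-12' -VMHostGroup 'hosts-1-10' -Type ShouldRunOn
```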
Good point about the info I posted. Thanks. I may manually kick off a rebalance.

Usually, to go from thick provisioned to thin provisioned, I'd do a Storage vMotion. With only 1 datastore, how can I easily get a thick provisioned VM to thin provisioned?
VMware did the calculations and said "Yup, there's enough space to power on the VM". That being said, they first tried to tell me that the space issue was within the OS of the VM, until I said "How would ESX know if the OS has a full disk? The VM isn't even running VMware Tools". While they did get back to me very quickly and the tech was very nice, I'm not sure their heart was in it once they saw it was 5.5.

1. Failure to create the .vswp (as per above).
   The VM is 4GB with no reservation.
2. Failure to write to any of the VM's VMDKs due to the capacity-tier disks they reside on being out of space.
   VMware checked this out and calculated that there should be enough space.
3. Unsupported configurations with -flat.vmdk or other 'file' data residing in the VM's namespace.
   The error message specifically calls out the one thin provisioned disk.
4. Cluster is partitioned at the time of attempted power on (but this would likely be obvious from other symptoms, e.g. VMs becoming inaccessible).
   We haven't had any other issues that I'm aware of.
5. Component limit has been reached (the max for this was far lower in 5.5 compared to modern versions of vSAN).
   I'm not sure what this means, but there are only 8 VMs running in this cluster, all very low workload: file server, DNS, DHCP, print server.

Thanks for your input. I'm going to run those commands and post the output here.
I have a 3-node vSAN cluster. Total capacity is 22TB, with 4.9TB free. I'm using an FTT of 1. I'm not able to power on a 500GB thin provisioned VM because of a lack of disk space. The VM is using 405GB on disk. VMware tells me that there should be enough free space to power on the VM; however, it's v5.5, so they won't troubleshoot it.

- There are no bad disks
- There are no snapshots on any VMs
- There are no orphaned VMs
- I deleted old log files

Any ideas as to what could be going on? Any ways for me to free up more space? If I let the VM sit for a few hours, it will power on, even though the free space is still 4.9TB.

Thank you in advance.
We're running vSAN with ESXi v5.5. I know, it's out of support. We had an issue with a power outage a while back and lost our 4TB file server VM. We're at the point where we're restoring files from a USB drive.

We had an issue where the VM powered off and got "Not enough disk space...." when powering on. Support said that it was because a resync was running and using all the free space in the background. Once the resync finished, I could power on the VM. All was good .... until .... we started the restore again. Then a resync started again, we ran out of space, the VM powered off, etc.

Why would a resync keep starting up? Could it be because we're restoring a lot of data all at once? Could it be because the drives on this file server are thin provisioned?
Wow, that was longer than I thought it would be.  Sorry for that.
Our non-SRM DR solution is simple. If the main office goes away, present the latest data from RecoverPoint to the ESX hosts at the DR site, bring everything up, and plug into our WAN to present everything to remote offices and clients working from home.

Our DR test is also simple. Cut the network connection between the main office and the DR site to simulate the office going down. Present a snapshot copy of the data on RecoverPoint to the ESX hosts at the DR site. Bring everything up and DON'T connect it to the WAN. No crazy re-IPing, no changes to networks at DR. Very simple.

Along comes SRM. Our recovery plan in SRM is just as simple: cut the network connection, fail stuff over. We set up a recovery plan with a couple of test VMs so we could kick the tires on this SRM thing. We hit the "Test Plan" button and it does everything flawlessly. The cleanup process is great. Everything works as it should until .......... we try it with the network connection to the DR site cut. Then we get "Failed to create snapshots of replica devices" when trying to do a snapshot on the RecoverPoint.

My first call to VMware had me in a conference call with VMware and EMC. It was discovered that there was a bug in the RecoverPoint software that could cause this, and we needed to upgrade to the latest release. We did that and no dice. Still the same error.

The next tech I spoke to at VMware tells me that this is the way it's supposed to work: to do a test, you have to select "Run Recovery" and then do a "Planned Migration". If this is the way you have to do a test, why even have a test button? For a planned migration, it tells you right on the screen that "the process will permanently alter virtual machines and infrastructure of both the protected and recovery datacenters". I don't want to alter ANYTHING for a test. VMware says that after the planned migration and testing, you run a reprotect. This reverses the replication and copies the data from DR back to the main office.
How is this ever a good thing for a test? When we test, we have application owners beat their systems to death with bogus transactions and run test scripts. We DON'T want that stuff replicated back to our production site EVER.

VMware went down the line of "You can use VRF at the DR site to create a bubble to recover everything into". Well ok, how do I add the physical machines we have to recover into the bubble? "Well, the network team can make changes on the switch ports." What about VPN? "They would have to make changes there too. And also, you would have to create jump box VMs for people to access the applications in the bubble." But in a real DR situation, you wouldn't do any of that. For a real DR, we just let it rip. For a test, we create a bubble and make all kinds of network changes.

So, with my manual process, our test is exactly the same as our real DR except for one step. With SRM, our test looks NOTHING like a real DR. How is that a good thing?

Here's what I think is happening with the error message: SRM is connecting to the protected site RecoverPoint appliance to do the snapshot. When the link is down, it can't get there. If it just pointed at the recovery site appliance to do the snapshot, it would work.

So here are my questions. Am I the only one that tests the DR plan by just cutting the link to the main data center and bringing up VMs at the DR site? I see a lot of people posting here that they get the same error. Do these people have the link to their DR site up or down?
Was the network connection between your protected site and your recovery site up at the time?

For me, testing always works fine. When we cut the link to the DR site to do a real DR test, I get the same error. VMware says this is the way it is supposed to work.
We had RSS turned on at the hardware level and on the NIC in the OS. We shut it off, and it hasn't happened in about 3 weeks now. Having said that, I'm sure it will happen tonight.

RSS is on by default. We have it on for all 1400 of our VMs, and a lot are configured just like these four. So .... it makes no sense, but I'm happy it fixed it for now.