Are you sure your cabling is good? Also eliminate possible patch panel problems by cabling directly to the switch to test. I had a similar problem with a dual-port NIC that turned out to be bad patch panel terminations. Make sure the card is good too by installing it in another server to test.
That's a very good question: how low is too low? Going from 256 to 150 for me amounted to 20 additional VMs that I can power on. I've posed this question to our VMW account team to see what they can dig up. Interesting that changing 150 to 128 nets only 1 additional VM...
Check my math here, but is this what support is telling you? (Assuming no das customizations:)

CPU slot size = 256MHz (default) * max number of assigned vCPUs (i.e. the highest # of vCPUs assigned to a single VM)
RAM slot size = the largest assigned memory reservation or 256MB (whichever is larger) + the largest assigned RAM value of any VM
Usable memory slots = total RAM available to VMs in the cluster (minus service console) / RAM slot size
Usable CPU slots = total MHz available to VMs in the cluster / CPU slot size
Supportable VMs = the lower of the two slot counts

So for my environment with my 150 das customizations I get:

CPU slot size = 300 (150 * 2)
RAM slot size = 1174 (150 + 1024)
Usable memory slots = 55 ((65536 - 512) / 1174)
Usable CPU slots = 60 (18000 / 300)
Total VMs = 55

That look right??
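The math above can be sketched in a few lines. This is just my reading of the thread, not an official VMware formula — the variable names are mine, and the numbers are the ones from my cluster (150 das customizations for both CPU and RAM):

```python
# Slot-size math as described above (hypothetical names, poster's numbers).
das_slot_mhz = 150           # custom das CPU value (default would be 256)
das_slot_mb = 150            # custom das memory value
max_vcpus = 2                # highest vCPU count assigned to any single VM
largest_vm_ram_mb = 1024     # largest RAM value assigned to any single VM

# Slot sizes
cpu_slot_mhz = das_slot_mhz * max_vcpus          # 150 * 2 = 300
ram_slot_mb = das_slot_mb + largest_vm_ram_mb    # 150 + 1024 = 1174

# Cluster capacity available to VMs
total_vm_mhz = 18000                 # MHz available to VMs in the cluster
total_vm_ram_mb = 65536 - 512       # cluster RAM minus service console

# Slot counts; supportable VMs is the lower of the two
cpu_slots = total_vm_mhz // cpu_slot_mhz     # 18000 // 300 = 60
ram_slots = total_vm_ram_mb // ram_slot_mb   # 65024 // 1174 = 55
supportable_vms = min(cpu_slots, ram_slots)  # 55

print(cpu_slots, ram_slots, supportable_vms)
```

With these numbers memory is the limiting slot, which matches the total of 55 VMs above.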
Well, the cluster I'm testing on is Update 1 and I was hit with the same restrictions. I too found the das.x advanced settings and set a custom value of 150 for both CPU and RAM. This allowed me to re-enable strict admission checking with no flags. I didn't mention it before, but I only have 19 VMs powered on in this cluster. My largest VM is 2 vCPU / 1GB RAM. I'll be curious to see how many more VMs this will allow me to power on here. I really wish VMW could give us a definitive answer on the calculation used (like you said). It makes architecture and capacity planning very difficult, plus it wastes a tremendous amount of hardware given the current scheme.
Proden, did you ever get anywhere with this? The vmwolf site is now down and I'm having the same problem in my clusters. 2 hosts, 2 dual-core CPUs each, 32GB per host, 30 VMs spread across both; half have 1GB RAM assigned, none with any more. Memory unreserved: 57472MB. CPU unreserved: 18000MHz. I go to power on VM #31 with 1 vCPU and 512MB RAM = insufficient resource error. WTF??
What kind of SAN storage and what problem exactly? I had something very similar to this happen. My volumes came back after a reboot of the SP that was active. http://communities.vmware.com/message/930792
Environment:
- 2 x DL585 G1 hosts with QLA HBAs
- 2 x Brocade 4100 FC switches (A/B fabrics)
- 1 x EVA 8100 (2 SPs, 4 ports per SP per fabric)
- 9 LUNs presented to each host, zoned across all SP ports in each fabric (8 total paths per LUN per server)
- Fixed pathing policy for LUN access, each LUN on a different SP port (the backups LUN shares an SP port with another tier-2 LUN)

So last week the FC switch for the A fabric panicked and rebooted itself in the middle of the day. We're running an older firmware that was required for the HDS array we just migrated off of. All active LUNs being accessed on SP-A went dead, and those VMs died. All VMs living on LUNs accessed via SP-B (fabric B) remained running. We didn't know what the problem was yet, and in an effort to get the servers back up the admins started bouncing the ESX hosts; the critical VMs were down anyway. On reboot the hosts would hang on the HBA scans for several minutes, unable to find the previously attached LUNs. Eventually the EVA SP rebooted itself, the LUNs came back with no corruption, and the VMs could be powered on.

We have cases open with VMware, HP, and Brocade on this. Apparently the EVA SP created zombie paths that were not marked down properly, so ESX did not re-establish the connections on an alternate path/HBA. These paths showed as dead in ESX, and it eventually removed these LUNs from the config before the SP rebooted. After the SP reboot everything was fine again.

This exact scenario happened in my DR environment after an attempted FC switch firmware upgrade. The upgrade failed due to a bad image, but the switch was rebooted in a controlled fashion and the EXACT same thing happened there too with the 8100. Only 2 LUNs were presented to my DR cluster, and the one being accessed via fabric A disappeared just like at our primary site. After a while the SP rebooted itself and the LUN was available again. All VMs living on that volume were down hard for the duration.

Anyone else seen any strangeness like this?
We don't know yet if this is a VMware, switch, or array problem, but it is causing many here to start questioning the reliability of ESX (they're unaware of the backend infrastructure, of course). If ESX is causing the EVA SP to melt down and force a reboot, then HP has a serious problem. Either way this is no good, and it may cause us to move production off of ESX.
Sorry, yes: if HA doesn't move them, there are no locks on the LUN, and the surviving host can see the LUN/VM, then you can VMotion to repoint the VM to the surviving host.
I'm about to post a situation along these lines that happened in my environment last week. An FC switch panicked and rebooted itself, which created a zombie state on LUNs connected to SP-A of our EVA8100 array and brought ESX DOWN!! HA did nothing. All of our Windows hosts were able to recover fine, but ESX was completely hosed. If you're not in prod yet, test these scenarios: reset an FC switch port, reboot a switch, disconnect an array controller (if you can), etc. To answer your question though: the VMs should never go down; if ESX can find an alternate path to the LUNs, you should be fine. If you have redundant HBAs in each host, there would be no need to fail over to another host.