jonretting's Posts

Yes, this seems to be a bug in Converter 6.1 as well. I've only seen it so far with Windows XP conversions, since I don't do much converting. There is some strange DNS dependency in the chain somewhere. Go ahead and edit the *hosts* file on the machine you are converting (the "source"), if you can't update your DNS servers. You will want to add an entry for the ESXi host you are using as a destination, in the form <ip-address> <FQDN>, e.g. 10.10.10.2 host.domain.com. If you are using a vCenter server to connect with, I would add that in as well.
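As a sketch, the entries on the source machine's hosts file might look like this (the IPs and names below are placeholders for your own environment):

```
# C:\Windows\System32\drivers\etc\hosts on the source machine
10.10.10.2    esxi01.domain.com     esxi01     # destination ESXi host
10.10.10.5    vcenter.domain.com    vcenter    # vCenter, if connecting through one
```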
"I'm curious anyone's thoughts on just disabling failover and forcing it to each kernel to stick to its switch (and accepting loss of communication) on that vKernel in the event of a switch failure.  I'd like to do some lab tests with both and test switch/path failover between both of these configurations (vs. a single vkernel configuration)." If I have this right, you are suggesting relying on failover at the interface team level, where a single vKernel is connected to that team. Instinct tells me there might be ARP cache issues, and a very unpredictable time to a solid link. The added complexity of the multiple unique vKernel networks is required in order to make a failure event predictable. Oh yeah, and great post! Perfect.
First try doing a rolling removal/re-adding of hosts to the cluster. Put the first host in maintenance mode, and take it out of the cluster. Make sure your Storage Provider no longer shows that host (try refreshing providers). Put the host back into the cluster, and take it out of maintenance. See if that host now shows "online" in the Storage Providers (refresh if need be). Continue the same steps for each host. You didn't mention what version of ESXi/VSAN you are running, but if you are using 5.5 there are other troubleshooting steps. I have yet to experience a Storage Provider failure on VSAN 6, none that weren't self-inflicted anyway. Best, -Retting
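The rolling procedure above can be sketched in Python just to make the ordering explicit. The function names here are stand-ins, not a real VMware API — in practice each step is done in the Web Client or PowerCLI:

```python
# Illustrative sketch of the rolling remove/re-add order of operations.
# All of this is a stand-in model -- nothing here calls a real VMware API.

def roll_host(host, cluster, providers):
    """Cycle one host out of and back into the cluster, noting the
    storage-provider check points along the way."""
    steps = []
    steps.append(f"enter maintenance mode: {host}")
    cluster.remove(host)
    steps.append(f"remove from cluster: {host}")
    providers.discard(host)              # refresh providers; host should be gone
    steps.append(f"verify provider gone: {host}")
    cluster.append(host)
    providers.add(host)                  # refresh again; host should show "online"
    steps.append(f"exit maintenance mode: {host}")
    steps.append(f"verify provider online: {host}")
    return steps

cluster = ["esx01", "esx02", "esx03"]
providers = set(cluster)
for h in list(cluster):                  # one host at a time, never two at once
    roll_host(h, cluster, providers)
```

The point of the model is simply that each host finishes the full cycle (and its provider shows online again) before the next host starts.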
Just out of curiosity, were you within a terabyte or so of free space on your VSAN datastore before experiencing issues?

Last night I deliberately ballooned the storage on my VSAN, attaching a 2TB VMDK to a VM and filling it up. The point was to simulate something a client might do, and to freshen up some automation tools in the process of fixing it. However, the event did produce a very similar outcome to yours, so I thought I should share. Maybe you experienced some of the events I describe before/after your problem.

At the time VSAN had 4TB reported free capacity. The point of failure came when my VSAN hit around 1.2TB free space, and each disk per group was at 800-900GB used (1TB each). The final VMDK to see mass data writes became completely unstable. After forcing a VM power-down, that virtual disk, and in turn the VM, got marked "inaccessible" (i/o errors). Further inspection revealed a laundry list of absent/failed disk components.

In order to locate the UUID for the problematic object I opened up RVC and ran "vsan.obj_status_report ~<host> --print-table --filter-table 18/32". This printed out a table showing only the objects which are inaccessible. "vsan.object_info ~0 <UUID>" reported 1.4TB of addressed space. VSAN re-sync was frozen attempting to sync one 82GB component; however, that didn't prevent it from carrying out other sync/policy related events. Since I was using the default storage policy for that disk, that meant 1.4TB logical and 2.8TB physical.

Wanting to salvage the file server, I removed the VM from inventory, then added it back in. I also removed the affected disk, but did not delete it (doing so would obviously result in an i/o error, and the VM becoming marked "inaccessible" again). However, I was still left with 90% usage on my VSAN datastore, and the health check plugin was unable to repair the related object. So I jumped on one of the hosts and ran "/usr/lib/vmware/osfs/bin/objtool delete -f -u <UUID>".
There was also the side effect of some other objects on other disks losing some of their redundancy, probably a result of a scramble to re-arrange objects due to the ballooning use. The Health plugin quickly remedied that problem. I conducted the test with a max component size of 255GB, and plan on reducing it to 180GB for the next run. Best, -Retting
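For context on the 1.4TB logical / 2.8TB physical figure above: the default VSAN storage policy (failures to tolerate = 1) mirrors every object RAID-1 style, so the back-of-the-envelope math is just:

```python
# Rough VSAN capacity math under the default storage policy.
# With FTT=1 (RAID-1 mirroring) every object is written twice,
# so physical consumption is roughly (FTT + 1) x logical size.
# Witness components add a little more on top; ignored here.

def physical_usage_tb(logical_tb, failures_to_tolerate=1):
    """Mirrored copies under RAID-1: logical size x (FTT + 1)."""
    return logical_tb * (failures_to_tolerate + 1)

print(physical_usage_tb(1.4))  # the 1.4TB object consumes ~2.8TB on disk
```

This is why a datastore that still "looks" half-free on logical numbers can run out of physical headroom fast.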
Since this is a lab you might want to try upgrading to VSAN 6, installing the Health Plugin, and running the data object health process. However, I would normally subscribe to the scripted RVC method, like Elerium describes above. There are also new PowerCLI tools for VSAN 6, which could make it easier to systematically check/delete.  -Retting
If it's possible I would sync up with him and find out what options can be done to troubleshoot. I can't really say what his options will be, especially since I can only assume you have a significant amount of multicast traffic on the network. In my opinion the ball is in your net admin's court. -Jon Retting
Wanted to add...  This setup employs two VMK (VLAN A/B) adapters for VSAN per node, in different port groups (VLAN A/B), each assigned its proper switch uplink in a failover configuration. Offhand I think that's correct.  Do you know if beacon probing is supported for a VSAN uplink now? Best, -Jon
Are you using a Windows server? The following should be applicable only to the Appliance (VCSA). I recall running into this issue a couple of times, and the solution was to run the post-install script, passing the "uninstall" option, then manually check the VCSA plugin folder for any remainders. Also do a quick loop through the ESXi nodes in the cluster, making sure the plug-in VIB is gone ("esxcli software vib list"). I went ahead and rebooted the VCSA, and to be thorough did a rolling maintenance/reboot of the nodes. After the all clear, with the plug-in gone from the web client, I would then install the plug-in RPM, run the post-install script, and reboot the VCSA. To be completely honest I haven't encountered this issue since two VCSA updates ago; at that point I don't recall whether the plug-in had actually been officially released. One time I fixed this problem by just uninstalling the plugin and updating the VCSA to the latest version. Cheers, -Jon
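As a sketch, checking the VIB listing programmatically could look like the following. The sample output and the "vsan-health-plugin" VIB name are made up for illustration — match whatever name your plug-in actually installs when you run "esxcli software vib list" for real:

```python
# Parse "esxcli software vib list"-style output and check for a VIB by name.
# SAMPLE_VIB_LIST is fabricated sample output, not from a real host.

SAMPLE_VIB_LIST = """\
Name                 Version        Vendor  Acceptance Level  Install Date
-------------------  -------------  ------  ----------------  ------------
esx-base             6.0.0-2494585  VMware  VMwareCertified   2015-04-01
vsan-health-plugin   1.0.0-1        VMware  VMwareCertified   2015-04-02
"""

def vib_installed(vib_list_output, name):
    """Return True if a VIB with the given name appears in the listing.
    Skips the two header lines (column names and dashes)."""
    return any(line.split()[0] == name
               for line in vib_list_output.splitlines()[2:]
               if line.strip())
```

Run this per node (e.g. over SSH) and any host where it still returns True needs another pass at removal.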
What does your multicast/snooping/querier VLAN situation look like on your network?  Thanks, -Jon
If a data disk fails it will be marked absent, and the components on it will eventually be rebuilt on another disk. Thanks, -Jon
Personally I do not recommend LAG/LACP for VSAN traffic. After playing around extensively with multiple environments, including the lab, LACP seems untenable on VSAN. I have yet to see proper aggregation of traffic in any form. VSAN seems to just use whatever the link speed is, regardless of hash policy or LACP config. Moreover, in these cases the VSAN network was unstable. I nearly always opt for explicit failover for VSAN; more than one active uplink is not possible without LACP, so the various "Route based on..." policies are not applicable, or so I have come to think. Last I read, beacon probing is not functional on VSAN, but that may have changed. I cannot comment on using redundant switching and how it applies to VSAN and LACP. Granted, this is a 2013 post (http://www.yellow-bricks.com/2013/10/29/virtual-san-network-io-control/), but it has served me well for best practices in production with 2x 10GbE ports. Thanks, -Jon
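For what it's worth, the way NIOC shares carve up a single 10GbE uplink under contention can be sketched like this. The share values below are purely illustrative, not a recommendation — see the linked post for real guidance:

```python
# Minimal model of NIOC share-based allocation on one uplink.
# Shares only matter under contention; an idle link lets any
# traffic type burst to full line rate.

def nioc_bandwidth(shares, link_gbit=10.0):
    """Each traffic type gets link * (its shares / total shares)
    when every type is actively contending."""
    total = sum(shares.values())
    return {name: link_gbit * s / total for name, s in shares.items()}

# Hypothetical share values for a 2x 10GbE host (explicit failover,
# so each traffic type rides one active uplink at a time):
alloc = nioc_bandwidth({"vsan": 100, "vmotion": 50, "vm": 50})
# vsan gets half the pipe: 10 * 100/200 = 5.0 Gbit/s
```

The takeaway is that shares are relative, not absolute caps, which is why NIOC plus explicit failover tends to behave more predictably for VSAN than trying to aggregate with LACP.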
Hmm... Just another idea, what about attaching an iSCSI disk or PXE into ESXi? Thanks, -Jon
You wouldn't happen to have a spare AHCI disk controller you could plop in for testing? Tall order, I know. So that your ESXi/scratch SATA is on that, hopefully eliminating something from the equation. Thanks, -Jon
That's interesting... Obviously you don't get the PSOD/hangs when running just the SD card and no scratch?  Is there an onboard SATA controller you could use for both ESXi/scratch? My reasoning here is to rule an assortment of things out. Also, I have occasionally run into issues when booting certain machines into ESXi via UEFI. Personally I ditched the USB/SD-card method a while ago in favor of onboard high-temp SLC SATA DOMs. Thanks, -Jon
Sorry if I don't remember but you are talking about SATA for your storage tier correct? Thanks, -Jon
Which firmware/driver combination are you using now, and/or with the benchmarks you did? That is a serious latency improvement, sub-ms it looks like... and your max latency is magnitudes faster. Yay SAS. Was anything else going on in the VSAN while you were benchmarking? Try a benchmark while objects are being synced or policies changed. Keep up the good work. Thanks, -Jon
This and your last post are outstanding. Couldn't agree with you more, and I know how to try.  Your post regarding ticket escalation is beyond spot on. Thank you. The Supermicro Twins have been stellar performers. For my lab build I went with zero-ready abstracted nodes. The only caveat would be the four 1U Supermicro chassis. Everything down to using HDPLEX mini 160W power supplies, load-sharing internal AC adapters, and the disk storage. Continuing that line of thought, I find myself seriously pondering removing the rack configuration and building a skeletal structure to fit the purpose. Hyper-convergence really brings interesting things to the table when designing self-contained cabinets. Best, -Jon
Your hunch was correct -- all disk/controller/array caching should be disabled in pass-through mode. I don't recall if you are running RAID0; if you are, some have enabled caching and seen performance benefits. However, I lean heavily toward them not having a SAS/PCIe performance tier, with cache mitigating the effects of a single symptom. As you begin testing various things, be sure to run a vSphere Data Protection perf test, re-sync some large objects, and get some client-end benchmarks during. You should also be safe running the Multicast Perf test in VSAN health. VMware recently confirmed they throttle the test, probably to avoid contention. I am especially interested in your results.  Best, -Jon
If you are seeing inconsistencies in the file system versus what you know has been deleted, that would be indicative of a greater issue with your VSAN. The only problems I've experienced like you describe, in lab or production, are with the Content Library being stored on the VSAN, or objects associated with VM Templates. Not being able to quickly bring templates under storage profiles is a separate issue. Thanks, -Jon
Well, the tests initially started on the LSI 2308, and I went through all the firmwares and all the drivers. As you say, the controllers may just not have liked it, so I then borrowed four Dell H330's and had the same result. The only thing missing from my tests was DP hosts. Yet I can't see how that would have improved things. You are totally correct that PCIe/NVMe allows your HBAs to do what they do best. Cheers, -Jon