I'm looking to create a health check document for NSX. I've been running into clients that, for one reason or another, do not have the correct VLANs established, have incorrect MTU sizes, and so on. I'm looking for a document that lists all of the main items needed for NSX to traverse the physical network. I understand that we tell them in advance what is needed, but something that actually pinpoints what is required would be grand!
Something else you might want to consider is wrapping that in a larger offering and including a trial version of vROps. The NSX management pack (which is free) enables alerts that can detect some of these misconfigurations and fire when, for example, MTU sizes are mismatched. It's not the same thing as a document, obviously, but I've found it's a good way to catch configuration problems without depending on people getting everything right by hand.
Your solution works after NSX has been installed. I am looking for something to give a client before we install for them. We spend a good bit of time sorting out the physical side while installing NSX. Having something precise to give the client ahead of time, spelling out exactly what they need, would be beneficial.
With NSX, we get alerts that tell us when there is a mismatch. But tracking that down on the physical side is tricky, and it takes a lot of time.
Most of the time, the client feels that they have everything they need prepped. In reality, they don't understand NSX as well as they think. It doesn't take much for NSX to work, and the client believes it's as easy as the physical side. It is, but there is confusion. I'm trying to limit that confusion and get the client up and running faster.
It may not solve all your problems, but have you looked at NSX-PowerOps (www.nsx-powerops.com)? The health check part of the tool won't be useful before NSX is deployed, but it becomes useful as soon as you have the Manager deployed and the basic configuration, such as host prep and VXLAN configuration, done. Then, instead of relying on manual MTU pings and manually checking VIB versions and routing tables on the Edges, DLRs, controllers, and hosts, you can automate all of that with this open source tool.
For example, in your case, the VTEP-to-VTEP test will automatically connect to each ESXi host (no need for SSH) and ping from all the VTEPs on that host to all the other VTEP IPs in the environment with the MTU you specify. It produces an Excel file that tells you, for instance, that from Host_1, VMK_2, IP_3 to IP_X, VMK_Y, Host_Z you have a ping failure at this MTU. That takes you to the exact paths in your environment with MTU issues, cutting down on the time you spend troubleshooting.
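To make the shape of that test concrete, here is a minimal Python sketch of the same idea done by hand. It is not how NSX-PowerOps works internally (the tool needs no SSH); it only builds the list of VTEP-to-VTEP pings and the manual vmkping command you would otherwise type on each host. The host names, vmk names, and IPs are made up for illustration, the helper names are hypothetical, and the ICMP payload is assumed to be the MTU minus 28 bytes (20-byte IPv4 header plus 8-byte ICMP header).

```python
from itertools import permutations
from typing import NamedTuple

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


class Vtep(NamedTuple):
    host: str  # ESXi host name (hypothetical)
    vmk: str   # VTEP vmkernel interface on that host
    ip: str    # VTEP IP address


def icmp_payload(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented packet at this MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def vtep_ping_plan(vteps, mtu):
    """Return (source, destination, command) for every ordered pair of VTEPs."""
    size = icmp_payload(mtu)
    plan = []
    for src, dst in permutations(vteps, 2):
        # -d sets "don't fragment", -s sets the payload size, and
        # ++netstack=vxlan sends the ping from the VXLAN TCP/IP stack.
        cmd = f"vmkping ++netstack=vxlan -d -s {size} -I {src.vmk} {dst.ip}"
        plan.append((src, dst, cmd))
    return plan


if __name__ == "__main__":
    # Illustrative inventory only; replace with the real VTEPs.
    inventory = [
        Vtep("esx-01", "vmk3", "192.0.2.11"),
        Vtep("esx-02", "vmk3", "192.0.2.12"),
        Vtep("esx-03", "vmk3", "192.0.2.13"),
    ]
    for src, dst, cmd in vtep_ping_plan(inventory, mtu=1600):
        print(f"{src.host} {src.vmk} {src.ip} -> {dst.ip} {dst.vmk} {dst.host}: {cmd}")
```

Even with three single-VTEP hosts that is already six pings to run and record, which is why automating it and getting the failures in one report saves so much time.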
NSX-PowerOps has two pieces: documentation and health checks. Take a look at the blog and let me know if you have any questions.