I would like to ask for your advice on my 3-DC design. My intention is DC 1 (primary for sub 1,2,3) + DC 2 (primary for sub 4,5,6) + DR. If DC 1 fails, DC 2 takes over sub 1,2,3. If both DCs fail, DR takes over all subs. Is this possible?
I think the use case you are trying to accomplish can be achieved using stretched networks and Global Federation in NSX-T. To be honest, though, I do not know how Federation handles more than two locations in an Active-Standby deployment.
I could not find anything in the documentation about this behavior. You can have up to 4 locations with NSX-T Federation, one as Primary and the others as Secondary, but the challenge is how NSX-T chooses which one will act as the actual Secondary region. I could not find any kind of "Weight" or "Priority" that can be defined.
There may be a couple of options here. Keep in mind that Federation is not yet fit for production use.
You could look at running multisite in active-active so that both sites are actively passing data plane traffic. However, this may come unstuck when you throw in the DR site, as the edge node placement would be, for example:
Now, should either of the DCs go dark, the standby edge will become active, but you still need a method for workload migration: think vSphere Replication + SRM, or something that provides similar functionality. Your requirement to fail everything over to a DR site in the worst case could potentially be achieved too, provided you again have a way to orchestrate the management plane and data plane failover. That is, if you lose your final DC after one has already failed (hypothetical, but it sounds really bad!), the remaining components are failed over to the DR site. Keep in mind you can also have up to 10 edge nodes and 8 ECMP paths in an edge cluster.
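To make the intended failover order concrete, here is a minimal Python sketch of the behaviour being asked for. It is only a model of the requirement, not NSX-T configuration or an NSX-T API: the names `FAILOVER_ORDER`, `active_site` and `placement` are made up for illustration, and the assumption that DC 1 takes over sub 4,5,6 when DC 2 fails is inferred by symmetry rather than stated in the original question.

```python
from typing import Dict, List, Optional, Set

# Hypothetical model of the requested behaviour: each group of subs has an
# ordered list of sites, most preferred first. This is NOT an NSX-T setting.
FAILOVER_ORDER: Dict[str, List[str]] = {
    "sub 1,2,3": ["DC1", "DC2", "DR"],
    # Assumption: DC1 backs up DC2 the same way DC2 backs up DC1.
    "sub 4,5,6": ["DC2", "DC1", "DR"],
}


def active_site(group: str, failed: Set[str]) -> Optional[str]:
    """Return the first site in the group's preference list that is still up."""
    for site in FAILOVER_ORDER[group]:
        if site not in failed:
            return site
    return None  # every site that could host this group is down


def placement(failed: Set[str]) -> Dict[str, Optional[str]]:
    """Return the active site for every group, given the set of failed sites."""
    return {group: active_site(group, failed) for group in FAILOVER_ORDER}


if __name__ == "__main__":
    for failed in (set(), {"DC1"}, {"DC1", "DC2"}):
        print(sorted(failed), "->", placement(failed))
```

Whatever does the orchestration (SRM recovery plans, scripts, or a manual runbook) would effectively have to implement an ordered preference like this per group of subs, since no built-in "Weight" or "Priority" setting has turned up so far.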
My blog post here describes a few different multisite topologies: Multisite Deployment of NSX-T Data Center | NSX-T 3.0 | LAB2PROD.
You would be looking at a version of option 2. It'll definitely require a bit more thought, but my first hunch is that it is probably doable. What sort of requirements do you have for SLAs / downtime in the worst-case scenario?
Federation known issues: VMware NSX-T Data Center 3.0.2 Release Notes