I'm assuming we are talking about VMs on VLAN-backed networks, with hosts whose trunked interfaces carry all the relevant VLANs. The ESXi host knows which MAC addresses are local to its virtual switch, so if the destination MAC isn't local, the frame is forwarded to the physical network via one of the VSS/VDS uplinks. For traffic between VLANs, the destination MAC is the VM's default gateway, which is never local in this setup, so the frame always leaves the host. The packet is then routed by an L3 switch or firewall and put on the destination VLAN, where the destination VM's ESXi host receives it on its uplink(s) and delivers it to the VM.
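To make the forwarding decision above concrete, here is a rough Python sketch of the logic. It is only a toy model, not how ESXi is implemented: the `Frame` class, the `forward` function, the port names, and all MAC addresses are made up for illustration, and broadcast/multicast handling is ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Frame:
    dst_mac: str
    vlan: int


def forward(frame: Frame, local_ports: dict, from_uplink: bool = False) -> Optional[str]:
    """Toy model of a virtual switch's unicast forwarding decision.

    local_ports maps (vlan, mac) -> port name for the VMs on this host.
    Returns the local port name, "uplink", or None if the frame is dropped.
    """
    port = local_ports.get((frame.vlan, frame.dst_mac))
    if port is not None:
        return port          # destination VM is on this host, same VLAN
    if from_uplink:
        return None          # not for us; never sent back out an uplink
    return "uplink"          # unknown MAC: hand it to the physical network


# Hypothetical host with two VMs on VLAN 10 and one on VLAN 20.
local_ports = {
    (10, "00:50:56:aa:00:01"): "vm-a",
    (10, "00:50:56:aa:00:02"): "vm-b",
    (20, "00:50:56:aa:00:03"): "vm-c",
}
GATEWAY_MAC_VLAN10 = "00:00:5e:00:01:0a"  # assumed gateway MAC on VLAN 10

# Same VLAN, same host: stays inside the virtual switch.
same_vlan = forward(Frame("00:50:56:aa:00:02", 10), local_ports)

# vm-a to vm-c (different VLAN): frame is addressed to the gateway, so it leaves.
to_gateway = forward(Frame(GATEWAY_MAC_VLAN10, 10), local_ports)

# After routing, the frame comes back on VLAN 20 addressed to vm-c.
routed_back = forward(Frame("00:50:56:aa:00:03", 20), local_ports, from_uplink=True)

print(same_vlan, to_gateway, routed_back)
```

Note that in this model vm-a and vm-c sit on the same host, yet their traffic still goes out to the physical router and back in, because the routing happens outside the host.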
You can find an overview of vSphere networking here. The document is very old, but the concepts still apply.
If you are interested in packet walks with NSX and distributed routing, there is an excellent series of blog posts here by John Kozej.