VMware Cloud Community
TheDL1
Contributor

SQL query network traffic seems capped

Hello, I'm new to the forums and somewhat of a VMware beginner, so please excuse any basic questions. I expect this request will prompt questions from the community about our environment in order to narrow down the cause, so I will do my best to answer promptly. It will be a bit of a discovery exercise for me as well, as I'm new to this organization and its VM environment.


One of our DB admins recently asked me to investigate latency issues with a SQL query. He has an application (Forefront Identity Manager) that queries a SQL database, and his DB monitoring shows latency issues. Both the application server and the SQL server are VMs (ESX 5), and there are numerous instances of each (production, test, and QA environments). The latency issue affects all instances and seems to be network related. I read some articles on the subject and looked at our environment at both the VM and guest OS level, but I can't figure out what the problem is. It is worth noting that when the application and the SQL database run on the same guest machine, there is no problem. After looking at the disk, memory, and CPU performance metrics, I don't believe any of these is the cause, but perhaps I am missing something.

Below is the email from our DB admin with details:

"The issue I was mentioning earlier is network performance between the FIM Synchronization service and the FIM database server:

1. With the DPA (Database Perf Analyzer) I can see that the predominant waits are Async_Network_IO

Here is an example:

[Image: radu.jpg]

  • They make up over 90% of the waits for this kind of FIM run and about 60% of the overall waits on the FIM database server
  • DPA might help with other networking details (e.g. counts of packets sent/received etc.)

2. Resource Monitor shows a total network IO of max 8 – 10 Mbps during the same time (when this FIM Sync process runs)

3. I can easily reproduce this behavior in any FIM environment

4. We have many FIM instances (FIM, FIM R2, QA, Prod etc.)

    • All are running in VMs
    • This issue shows up in all of them, as long as they are not collocated (FIM Sync on the same VM as the FIM DB server)
    • Historically, collocation was not accepted as an option when these systems were built. As much as I would like to collocate them now, it would be quite a tedious task, especially on FIM R2, where more FIM services use the same database server

5. Using other methods (file transfer, iPerf), I see that the achievable network throughput is much higher.

6. The issue seems to be specific to the connection between the FIM Synchronization service and the FIM database server.

7. My conclusion, from all the above observations (especially #5 and 6), is that:

      • I’m probably wrong to focus on NIC settings
      • It might help to focus closer on FIM-specific network connectivity (these are next on my list):
        • Either the database connection string
        • Or the authentication method (test performance when SQL authentication is used instead of AD integrated auth)

Disk is not the bottleneck in this scenario. DPA does not show much disk IO pressure.

 

The total network IO being “capped”, as mentioned earlier, at 8 – 10 Mbps (roughly 1 MB/s) while the FIM Sync process runs seems to limit the entire FIM Sync process before the disks come under any stress."
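As a quick check of the unit conversion quoted above, here is a minimal sketch. It assumes decimal megabits and megabytes and 8 bits per byte; the function name is only for illustration.

```python
def mbps_to_mbytes_per_s(mbps: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8.0


# The 8 - 10 Mbps range observed in Resource Monitor.
for rate in (8, 10):
    print(f"{rate} Mbps = {mbps_to_mbytes_per_s(rate):.2f} MB/s")
```

So the observed range works out to between 1 and 1.25 MB/s, which matches the "about 1 MB/s" figure.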



TheDL1
Contributor

Update:

More testing results from the DB administrator:

FYI

I’ve tried the following:

  • Ran iPerf under the account used by the FIM Sync service for Windows Integrated Auth to the FIM SQL DB
  • Also pointed the above iPerf at the SPN name (si69.xxxx.xxxx.xx.ca) instead of the machine name of the FIM SQL DB
  • Ran a SQL query stress tool (both under my account and as the account used by the FIM Sync service) and stress tested the FIM SQL DB from the FIM App server

All of the above tests showed much higher network throughput (> 300 Mbps), over 30 times higher than the ~10 Mbps I get from any FIM instance.

I’d say that this:

  • Excludes both the SQL Server side and the Windows Integrated Auth (SPN)
  • And leaves the FIM application as not playing nicely as a client over the network.

FIM Sync is not “returning” the acknowledgment to SQL for the initial queries until it finishes processing the individual RBAR (row-by-agonizing-row) updates.

Here it is for a [Full Import and Full Sync]:

[Image: radu2.jpg]


  • If that is the case, would there be a simple way to reduce network latency between these VMs?
    • That would be the last thing to try in the “over-the-network” setup, before I look into collocation.
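To illustrate why a chatty, serialized client can sit near 10 Mbps on a link that iPerf drives at over 300 Mbps, here is a rough model. It assumes the client fetches one batch of rows, processes it, and only then requests the next batch, so throughput is bounded by the bytes per batch divided by the time per cycle. The 4 KB batch size, 0.5 ms round trip, and 3 ms processing time are illustrative assumptions, not measurements from this environment.

```python
def effective_mbps(batch_bytes: int, rtt_s: float,
                   processing_s: float, link_mbps: float) -> float:
    """Throughput (Mbps) of a strictly serialized fetch/process loop.

    Each cycle costs one network round trip, the client-side processing
    time, and the time the batch spends on the wire at link speed.
    """
    bits = batch_bytes * 8
    wire_s = bits / (link_mbps * 1e6)
    cycle_s = rtt_s + processing_s + wire_s
    return bits / cycle_s / 1e6


# Illustrative (assumed) numbers: 4 KB per batch on a 300 Mbps link.
baseline = effective_mbps(4096, rtt_s=0.0005, processing_s=0.003, link_mbps=300)
no_latency = effective_mbps(4096, rtt_s=0.0, processing_s=0.003, link_mbps=300)
no_processing = effective_mbps(4096, rtt_s=0.0005, processing_s=0.0, link_mbps=300)

print(f"baseline:           {baseline:.1f} Mbps")
print(f"zero network RTT:   {no_latency:.1f} Mbps")
print(f"zero client work:   {no_processing:.1f} Mbps")
```

Under these assumptions the per-batch processing time dominates the cycle, so removing the network round trip entirely changes the result only slightly. If the real workload behaves like this, lowering latency between the VMs would help less than reducing the per-row work or fetching larger batches, but that would need to be confirmed with measurements.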

