VMware Cloud Community
vijay_ayar
Contributor

High latency every day from 12:30 AM to 12:50 AM EST (US time)

Hello All,

I am running VMware ESXi 6.7.0 on four servers, all connected directly to a Dell ME4 storage array over iSCSI, without a switch. Every day, each server starts showing high latency at 12:30 AM EST, and it lasts until 12:50 AM EST. I have attached a few screenshots of esxtop output. If anyone has any idea what could cause this, could you please help me out?

Here are the VMware details:

VMware ESXi version: 6.7.0
ESXi build number: 15160138
Client version: 1.33.4
Client build number: 14093553

 

Thanks,

Vijay

19 Replies
depping
Leadership

Could be a couple of things:

  • backup job?
  • anti-virus?
  • database dump?
  • any other OS-initiated cleaning job?

vijay_ayar
Contributor

So far, we don't have any specific script that runs at this time, yet latency suddenly spikes during this period only. I want to know whether there is any script or scheduled task in VMware 6.7.0 itself that could cause the issue. I have not set up any schedule on the ESXi hosts.

I have also checked the VMkernel log, but I could not find anything during this time.

ICTSTC
Contributor

Might it be network related? A task that runs at the same time every day and uses a lot of network bandwidth, or a lot of the resources of one VM?

vijay_ayar
Contributor

During this time the network is running at normal speed, but according to esxtop every VM shows high latency, as shown in the attached file. So I want to know how to find the cause on VMware 6.7.

depping
Leadership

Are you using Storage DRS and/or Storage IO Control?

vijay_ayar
Contributor

I am using a Dell ME4 storage device, and it is directly connected to all 4 ESXi hosts through the iSCSI controller.

vFouad
Leadership

Are any of your VMs databases? Is there an internal automated database cleanup happening? MSSQL will auto-schedule something like that...

In the question above, depping was asking whether you have vSphere Storage DRS or vSphere Storage I/O Control enabled:
Storage DRS:
From vCenter -> Cluster -> Configure -> Services -> Storage DRS (is it on or off here?)

Storage I/O Control:
From the Datastore view in vCenter -> <datastore> -> Configure -> Settings -> General -> Datastore Capabilities section (is Storage I/O Control enabled?)

vijay_ayar
Contributor

We are using a Sybase database for one of the applications, and it runs on a few servers. I will have to log in to the database and verify its scheduled jobs once again.

Also, I am using individual (standalone) ESXi hosts as of now, so I do not have the options mentioned above.

Also, is there any command on the ESXi host with which I can find the VM that is causing the latency, so that I can look into that server in depth?

depping
Leadership

You could run esxtop and press "v", as that view shows you the I/O for the VMs.
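
If the spike has to be caught unattended (it happens after midnight), esxtop can also be run in batch mode, for example `esxtop -b -d <seconds> -n <samples> > capture.csv`, and the CSV examined afterwards. Below is a minimal sketch that ranks columns by their peak value. It assumes the capture has a header row and that the per-VM columns contain `Virtual Disk(` and end in `Commands/sec`; those column labels are an assumption and should be checked against your own capture file.

```python
import csv
import io


def peak_per_column(csv_text, counter="Commands/sec", group="Virtual Disk("):
    """Return {column_name: peak_value}, highest peak first.

    Only columns whose header contains `group` and ends with `counter`
    are considered. Both strings are assumptions about the CSV header.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [
        i for i, name in enumerate(header)
        if group in name and name.endswith(counter)
    ]
    peaks = {header[i]: 0.0 for i in wanted}
    for row in reader:
        for i in wanted:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # skip empty or malformed cells
            peaks[header[i]] = max(peaks[header[i]], value)
    return dict(sorted(peaks.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Hypothetical two-VM capture, only to show the call.
    sample = (
        "time,Virtual Disk(vm1)/Commands/sec,Virtual Disk(vm2)/Commands/sec\n"
        "00:30,10,1000\n"
        "00:35,20,5\n"
    )
    for column, peak in peak_per_column(sample).items():
        print(column, peak)
```

The VM at the top of the list during the 12:30 AM window is the first one worth inspecting from inside the guest.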

vijay_ayar
Contributor

Thanks for your support.

Is there any way to cap the queue depth per virtual machine? A few virtual machines spike to 1000 commands during the night, and sometimes during the day as well.

depping
Leadership

You can set IOPS limits on a VM:

https://kb.vmware.com/s/article/1038241

 

Screenshot 2023-06-08 at 13.44.30.png

vijay_ayar
Contributor

I have tried that, but we still sometimes see high DAVG/cmd, and because of this the virtual machines experience latency.

Is there any known bug in VMware 6.7 Patch 3 so far?

vFouad
Leadership

The last build of 6.7 is build number: 20497097; it was released on 06 October, 2022. 
You can check that directly on the host either from the splash page on ESXi or connect via ssh and run vmware -vl
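
To compare a host against the final 6.7 build quoted above, the build number can be pulled out of the `vmware -vl` output. A small sketch, assuming the output contains a `build-<number>` token; the sample string is illustrative and is built from the version and build number given earlier in this thread.

```python
import re

LAST_67_BUILD = 20497097  # final 6.7 build number quoted in this post


def parse_build(vmware_vl_output):
    """Return the build number as an int, or None if no token is found."""
    match = re.search(r"build-(\d+)", vmware_vl_output)
    return int(match.group(1)) if match else None


def is_behind_last_build(vmware_vl_output, last_build=LAST_67_BUILD):
    """True if the host build is older than `last_build`."""
    build = parse_build(vmware_vl_output)
    if build is None:
        raise ValueError("no build-<number> token found")
    return build < last_build


if __name__ == "__main__":
    sample = "VMware ESXi 6.7.0 build-15160138"  # assumed output format
    print(parse_build(sample), is_behind_last_build(sample))
```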


vSphere 6.5/6.7 End of General Support was 15 October 2022.
Technical Guidance for vSphere 6.5 and vSphere 6.7 is available until November 15, 2023
For detail on what those terms mean see: https://www.vmware.com/support/lifecycle-policies.html

At a high level that means there will be no new patches for 6.5/6.7.
Updating to a newer release would probably serve you well here; you could then open an SR and have the global support team work with you to pinpoint the cause of your issues. They would want a full log bundle.

Have you checked the array side to make sure it isn't encountering issues on its storage processors at the times you see the latency, and that it isn't scheduled to auto-tier or otherwise optimize storage during this window?
You may be asking for more burst IOPS than your storage can deliver.

Thanks,

vFouad

vijay_ayar
Contributor

Hello vFouad,

I am already running VMware ESXi 6.7 Patch 3 on 4 hosts, with a Dell ME4024 in an ADAPT RAID setup and 2.4 SAS x 24 HDDs at the moment. So far, during the latency window, I could not find any latency or any IOPS requests on the storage side.

But if you have any command or tool to verify this in more detail, could you please share the command for the Dell ME4 storage device, so that I can dig deeper?

 

kastlr
Expert

Hi,

You mentioned that your 4 servers are directly attached to the storage array, but you didn't say how many paths each host uses.

If you followed best practices (BP), each host should have at least 2 links to the array (one per storage processor).
As a result, each host should have 2 paths to each LUN, and you should check for each LUN that all nodes use the same path selection policy.

Because each LUN is "owned" by a single storage processor, you should also check that all nodes send their I/O for that LUN via the same storage processor.

For example, if one node sends I/O to a datastore that resides on LUN 0 via storage processor A, and another node talking to the same datastore uses a path via storage processor B, the LUN might frequently trespass between the two storage processors.

This would have a massive impact on IO performance. 
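
The check described above can be sketched as a comparison of which storage processor each host is currently using per LUN. This is only an illustration: the host names, LUN labels, and the input dictionary are hypothetical and would have to be filled in by hand from each host's path listing.

```python
from collections import defaultdict


def find_split_luns(active_paths):
    """Return LUNs that different hosts reach through different controllers.

    active_paths: {host_name: {lun_name: controller_name}}
    Result:       {lun_name: [controller_name, ...]} for LUNs seen on
                  more than one controller.
    """
    controllers_by_lun = defaultdict(set)
    for luns in active_paths.values():
        for lun, controller in luns.items():
            controllers_by_lun[lun].add(controller)
    return {
        lun: sorted(controllers)
        for lun, controllers in controllers_by_lun.items()
        if len(controllers) > 1
    }


if __name__ == "__main__":
    # Hypothetical data: esx2 reaches LUN0 through the other controller.
    observed = {
        "esx1": {"LUN0": "A", "LUN1": "B"},
        "esx2": {"LUN0": "B", "LUN1": "B"},
    }
    print(find_split_luns(observed))
```

Any LUN reported here is a candidate for the trespass behaviour described above.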


Hope this helps a bit.
Greetings from Germany. (CEST)

vijay_ayar
Contributor

Hey Kastlr,

I am using two paths for each host.

Also, I have created two datastores (volumes), one running on each controller.

For example: Controller A = Volume1 and Controller B = Volume2

Here is a diagram of the ESXi hosts and the Dell ME4 storage device.

depping
Leadership

Just to be clear, high DAVG/CMD indicates it is not on the VMware side, but rather the path between the hosts and the storage, or the storage system itself. Unless you also see high GAVG and high KAVG of course.
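
As a rough illustration of that reasoning: GAVG is approximately DAVG plus KAVG, so comparing the two components shows where the time is being spent. A minimal sketch; the 25 ms and 2 ms thresholds are commonly quoted rules of thumb and are assumptions here, not values from this thread.

```python
def classify_latency(davg_ms, kavg_ms, davg_limit_ms=25.0, kavg_limit_ms=2.0):
    """Return (gavg_ms, verdict) for one esxtop sample.

    The default limits are rule-of-thumb assumptions; adjust as needed.
    """
    gavg_ms = davg_ms + kavg_ms  # guest latency ~ device + kernel
    high_device = davg_ms > davg_limit_ms
    high_kernel = kavg_ms > kavg_limit_ms
    if high_device and high_kernel:
        verdict = "both"
    elif high_device:
        verdict = "storage path or array"
    elif high_kernel:
        verdict = "hypervisor (kernel/queuing)"
    else:
        verdict = "none"
    return gavg_ms, verdict


if __name__ == "__main__":
    print(classify_latency(40.0, 0.5))   # high DAVG only
    print(classify_latency(5.0, 6.0))    # high KAVG only
```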

vijay_ayar
Contributor

I totally agree with your answer. But what could cause this on the SAN, given that I am following VMware and SAN best practices? Also, I am unable to find any subnetting guidance for a directly attached SAN (without physical switches) anywhere.

I can't use the same subnet for both paths on the same host, as traffic would then be routed over a single path only.

Example here:

On the SAN: Controller A: 10.10.23.101/24, Controller B: 10.10.27.101/24

On VM ESXi 8.0, I have configured a separate vSwitch, port, and VMkernel adapter for each physical NIC (I have two physical 10 G NICs).

iSCSI VMkernel 1 IP: 10.10.23.100/24 and iSCSI VMkernel 2 IP: 10.10.27.100/24
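
The addressing above can be sanity-checked with a short script: each VMkernel interface should share a subnet with exactly one controller port. A minimal sketch using the addresses from this post; the labels `vmk1`, `vmk2`, `Controller A`, and `Controller B` are just names chosen for the example.

```python
import ipaddress


def match_ports(vmk_ports, array_ports):
    """Map each VMkernel interface to the array ports inside its subnet.

    Both arguments are {name: "address/prefix"} dictionaries.
    """
    result = {}
    for vmk_name, vmk_cidr in vmk_ports.items():
        vmk_net = ipaddress.ip_interface(vmk_cidr).network
        result[vmk_name] = [
            port_name
            for port_name, port_cidr in array_ports.items()
            if ipaddress.ip_interface(port_cidr).ip in vmk_net
        ]
    return result


if __name__ == "__main__":
    vmk = {"vmk1": "10.10.23.100/24", "vmk2": "10.10.27.100/24"}
    array = {"Controller A": "10.10.23.101/24", "Controller B": "10.10.27.101/24"}
    for name, ports in match_ports(vmk, array).items():
        status = "ok" if len(ports) == 1 else "check"
        print(name, ports, status)
```

With these addresses each VMkernel interface pairs with exactly one controller port, which is the one-subnet-per-path layout described above.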

 

Let me know if I did anything wrong here, or if you have any guidance for a direct-attached SAN setup.

 

depping
Leadership

Is it only DAVG, or also KAVG at the same time? If it is DAVG only, it is 99% certain that something on the storage side is causing it. Snapshots, backups, etc.? As I mentioned, use esxtop to identify where it is coming from on the VMware side. If you only see DAVG, you need to look at the storage system. I am not a Dell expert, so you would probably need to ask on their forum.
