We're working on a disk latency spike issue in VMware. The problem we're seeing is that vCenter reports a latency warning for one of our LUNs at a certain time. If I check the latency graphs for our SAN at that time, they are different from what is reported in vCenter. Any reason for this?
We're using SolarWinds Storage Manager to monitor our SAN / VMware storage.
Are you looking at real-time stats in both cases, vCenter and the SAN?
-KjB
Looking at disk performance in the past 24 hours on both.
The problem with historical reporting through vCenter is that the further back you go, the more diluted your numbers get, because older data is rolled up into longer sample intervals to generate that report.
When you said the reports don't match up, what exactly does that mean with respect to what you're seeing in SolarWinds? What is your stats collection level set to? Only level 4 will actually maintain max and min rollup values.
-KjB
1 Day - Level 1
2 Hours - Level 1
30 Mins - Level 1
5 Mins - Level 2
Here is the event that's logged:
Device naa.6006016033802016fbcac95161e111 performance has deteriorated. I/O latency increased from average value of 7851 microseconds to 329494 microseconds.
warning
4/2/2012 12:18:53 PM
server.domain.com
I did just notice something in the perflogs on the specific host. It seems that at the time of the spikes, a few Fibre Channel paths also show spikes in "Disk SCSI Reservation Conflicts". Not sure if this has anything to do with the issue we're seeing?
The performance polling frequency in SolarWinds is set to every 5 minutes.
What does SolarWinds show for that timeframe? The event is actually more reliable than the graphs. I tend to ignore spikes in the graph unless they're persistent. If it's a consistently recurring spike, you can consider raising the collection level temporarily, but that will increase your database size quite a bit.
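To illustrate why an averaged graph can hide a spike that the event catches, here is a rough sketch using the two latency values from the event quoted above. It assumes 20-second real-time samples and that only a single sample hit the peak while everything else sat at the baseline. Both are assumptions for illustration, not measured data, and the function name is made up for this example.

```python
# Illustrative only: how one short latency spike is diluted by averaging.
BASELINE_US = 7851      # average latency from the event, in microseconds
SPIKE_US = 329494       # peak latency from the event, in microseconds
SAMPLE_SECONDS = 20     # assumed real-time sample interval


def rolled_up_average(window_seconds, spike_samples=1):
    """Average latency (microseconds) over a window in which
    `spike_samples` samples hit SPIKE_US and all other samples
    sit at BASELINE_US. This distribution is an assumption."""
    total_samples = window_seconds // SAMPLE_SECONDS
    quiet_samples = total_samples - spike_samples
    total = quiet_samples * BASELINE_US + spike_samples * SPIKE_US
    return total / total_samples


# Compare the rollup intervals mentioned in this thread.
for label, seconds in [("5 min", 300), ("30 min", 1800), ("2 hours", 7200)]:
    avg_ms = rolled_up_average(seconds) / 1000
    print(f"{label:>8}: {avg_ms:.1f} ms average (peak was {SPIKE_US / 1000:.1f} ms)")
```

Under these assumptions, the longer the rollup interval, the closer the reported average falls back toward the baseline, which is one plausible reason a 5-minute graph and the vCenter event disagree.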
-KjB
