I've got an issue with this error popping up in the system logs on the guest OS. We're running Windows Server 2008 on ESXi 4.0. Searching around, I've seen that updating the LSI drivers might resolve this issue; however, running the update driver utility yields nothing newer than the installed driver set. Looking at my event log, it would seem this problem comes and goes on its own. Our storage device is a Netgear ReadyNAS Pro NAS connected with teamed gigabit ports. Does anyone have any ideas as to what would cause this and what I might look for to resolve it?
If the guest VMs are showing that error, then it's possible you have too much physical disk latency and the VM operating systems are suffering timeouts while reading or writing.
I would check the ESX host logs to see if there are any errors about SCSI resets or errors with the VMFS volumes disconnecting...
Well, I don't see anything in the host logs that looks like a reset or disconnect. I pretty much have two sets of messages. The first one happens every 20 minutes or so and usually picks 1 or 2 machines to report on (a different set every 20 minutes), and it reads like this:
Sep 30 20:15:01 Hostd: 2009-09-30 20:15:01.741 1133BB90 verbose 'vm:/vmfs/volumes/69b1838e-2e8094b5/FECOMPLEO/FECOMPLEO.vmx' Actual VM overhead: 183300096 bytes
Sep 30 20:15:01 Hostd: 2009-09-30 20:15:01.742 1133BB90 verbose 'vm:/vmfs/volumes/69b1838e-2e8094b5/FEDOMCON/FEDOMCON.vmx' Actual VM overhead: 252280832 bytes
Sep 30 20:15:01 Hostd: 2009-09-30 20:15:01.744 1133BB90 verbose 'Vmsvc' RefreshVms updated overhead for 2 VMs
The other one happens rarely and is the closest thing I can find to something that might resemble an error; however, I'm uncertain. When it happens, it always follows the message above:
Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.454 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 config.annotation
Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.455 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 config.files.vmPathName
Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.455 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 config.guestFullName
Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.455 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 config.uuid
Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.455 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 guest.hostName
Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.455 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 guest.ipAddress
Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.455 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 name
Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.455 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 recentTask
Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.455 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 runtime.connectionState
Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.455 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 runtime.host
Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.455 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 runtime.powerState
Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.456 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 runtime.recordReplayState
I've tried matching these messages to my errors, but the time stamps are way off. For example, the final entry in the host log is stamped Sep 30 22:14:34, which would be about 5 hours in the future. So I'm not entirely sure when any of this actually happened.
ESXi logs in UTC, the atomic-clock-based successor to GMT. See if that helps you line up the logs.
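If it helps with matching things up, here is a rough Python sketch that shifts a host log stamp from UTC to local time. The 5-hour offset is only an assumption taken from the "about 5 hours in the future" remark above, and the function name and the year default are made up for illustration, since the syslog stamp carries no year.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the local zone is 5 hours behind UTC, inferred from the
# "about 5 hours in the future" remark. Adjust for your own zone and DST.
LOCAL_OFFSET = timezone(timedelta(hours=-5))

def host_log_to_local(stamp: str, year: int = 2009) -> datetime:
    """Convert an ESXi syslog stamp like 'Sep 30 22:14:34' (UTC) to local time."""
    # The syslog stamp has no year, so one has to be supplied.
    utc_time = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return utc_time.replace(tzinfo=timezone.utc).astimezone(LOCAL_OFFSET)

if __name__ == "__main__":
    print(host_log_to_local("Sep 30 22:14:34"))
```

With that assumed offset, the 22:14:34 host entry lands at 17:14:34 local time, which can then be compared directly with the guest's event log.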
This is happening to me on 2 of my 2008 R2 VMs.
One of the VMs uses LSI_SAS and the other uses LSI Parallel. The errors are for LSI_SCSI and LSI_SAS, but the error detail is the same. I have another 2008 R2 VM on the same host with LSI SAS that isn't having any issues.
The host is an HP DL580 G5 with freshly built ESXi 4.1. I have installed the latest HP NMI patch.
The storage is an HP MSA1000 via FC. No other issues with any of the VMs.
I don't see anything off the bat in the logs that indicates what is going on....
Any ideas?
Anybody have any input?
I have the same problem.
I posted the message below yesterday on both the QNAP forum and the VMware forum. Did you find the cause? As someone said above, it could have something to do with latency. I have a ping running against the NAS and it shows 1 ms or less, but an occasional ping goes up to 15 ms and then back down to 1 ms. I'll keep a closer eye on the pings today.
Hey,
We are having some issues with a virtual environment.
At least once a day we get error messages on our virtual servers and on our LUNs in vCenter.
Info msg in vCenter is:
Datastore LUNX increased in capacity from 0 bytes to xxxxxxxxx bytes 13:34
As I said, it does this one or more times per day. At the same time, on the virtual machines connected to that server, we get:
Log Name: System
Source: LSI_SCSI
Date: 2010-10-27 13:33:33
Event ID: 129
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: serverX
Description:
Reset to device, \Device\RaidPort0, was issued.
Event Xml:
We also get LSI_SAS on newer servers with newer controllers.
The problem is that it hangs the server for a few seconds, and this breaks all applications that have connections to databases, so they need to be restarted. Our customers don't like this at all, as you can imagine.
We have two QNAP RP-809U NAS servers using NFS towards VMware vSphere 4. Each server has two LUNs, and this happens on both LUNs at the same time, so it seems to have something to do with the NAS. I guess the connection goes down for a few seconds for some reason and then reconnects.
Has anyone had this problem before, and do you have any suggestions? It happens at random times:
NAS1:
Lun1
2010-10-14 04:26:28
2010-10-14 04:23:24
2010-10-14 04:21:24
Lun2
2010-10-14 04:26:33
2010-10-14 04:23:24
2010-10-14 04:21:24
2010-08-16 20:20:30
2010-08-16 20:20:19
2010-08-16 19:41:04
NAS2:
Lun1:
2010-10-27 13:34:37
2010-10-27 08:15:40
2010-10-27 08:14:33
2010-10-27 01:12:02
2010-10-26 16:36:45
2010-10-26 11:16:59
2010-10-25 21:15:11
2010-10-25 21:13:45
2010-10-25 15:19:43
2010-10-25 15:18:18
2010-10-25 09:37:07
2010-10-25 09:36:43
2010-10-25 09:36:35
2010-10-25 01:20:28
etc. etc. There are a lot more on this one.
LUN2:
same as above
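To get a feel for how often this really happens, the timestamps can be grouped into incidents and compared against the guest event. Below is a rough Python sketch using a few of the NAS2 Lun1 entries and the LSI_SCSI event time from above. The 5-minute window and the function names are just assumptions for illustration.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# A few of the datastore event times listed for NAS2 / Lun1 above.
STAMPS = [
    "2010-10-27 13:34:37", "2010-10-27 08:15:40", "2010-10-27 08:14:33",
    "2010-10-27 01:12:02", "2010-10-26 16:36:45", "2010-10-26 11:16:59",
    "2010-10-25 21:15:11", "2010-10-25 21:13:45",
]

def group_incidents(stamps, window=timedelta(minutes=5)):
    """Group timestamps that fall within `window` of the previous one."""
    times = sorted(datetime.strptime(s, FMT) for s in stamps)
    incidents = []
    for t in times:
        if incidents and t - incidents[-1][-1] <= window:
            incidents[-1].append(t)   # same incident as the previous event
        else:
            incidents.append([t])     # start a new incident
    return incidents

def nearest_gap(event, stamps):
    """Smallest absolute time difference between `event` and any stamp."""
    e = datetime.strptime(event, FMT)
    return min(abs(e - datetime.strptime(s, FMT)) for s in stamps)

if __name__ == "__main__":
    for inc in group_incidents(STAMPS):
        print(inc[0], "->", inc[-1], f"({len(inc)} events)")
    # Guest-side LSI_SCSI event 129 from the log excerpt above.
    print("gap to nearest datastore event:",
          nearest_gap("2010-10-27 13:33:33", STAMPS))
```

If the guest resets consistently fall within a minute or so of a datastore event, that would support the idea that the NAS connection is dropping rather than the guest driver misbehaving, though it would not prove it.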
The official reply from VMware was that my MSA1000 is not supported on ESXi 4.1. So, I no longer have my VMs on the MSA1000.