LSI_SAS Event ID 129 - Reset to device, \Device\Ra...

yoink · 09-29-2009

I've got an issue with this error popping up in the system logs on the guest OS. We're running Windows Server 2008 on ESXi 4.0. Searching around, I've seen that updating the LSI drivers might resolve this issue; however, running the update driver utility yields nothing newer than the installed driver set. Looking at my event log, it would seem this problem comes and goes on its own. Our storage device is a Netgear ReadyNAS Pro NAS connected with teamed gigabit ports. Does anyone have any ideas as to what would cause this and what I might look for to resolve it?

Rumple · 09-29-2009

If the guest VMs are showing that error, then it's possible you have too much physical disk latency and the VM operating systems are suffering from timeouts while reading or writing.

I would check the ESX host logs to see if there are any errors about SCSI resets or errors with the VMFS volumes disconnecting...
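As a rough way to do that check, here is a minimal Python sketch that filters log lines for words that usually accompany storage trouble. The keyword list is only a guess at what might be worth searching for, not an official list of ESX messages, and the second sample line is made up purely to show a match.

```python
import re

# Hypothetical keywords; adjust them to whatever your host actually logs.
KEYWORDS = ("reset", "abort", "timeout", "timed out", "disconnect")
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)

def suspicious_lines(lines):
    """Return the lines that mention any of the keywords, case-insensitively."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]

sample = [
    "Sep 30 20:15:01 Hostd: RefreshVms updated overhead for 2 VMs",
    "made-up example line: SCSI bus RESET issued",
]

for line in suspicious_lines(sample):
    print(line)
```

To run it against a real log, pass an open file object instead of `sample`, since a file iterates line by line.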

yoink · 09-30-2009

Well, I don't see anything in the host logs that looks like a reset or disconnect. I pretty much have two sets of messages. The first one happens every 20 minutes or so and usually picks one or two machines to report on (a different set every 20 minutes), and it reads like this:

Sep 30 20:15:01 Hostd: 2009-09-30 20:15:01.741 1133BB90 verbose 'vm:/vmfs/volumes/69b1838e-2e8094b5/FECOMPLEO/FECOMPLEO.vmx' Actual VM overhead: 183300096 bytes

Sep 30 20:15:01 Hostd: 2009-09-30 20:15:01.742 1133BB90 verbose 'vm:/vmfs/volumes/69b1838e-2e8094b5/FEDOMCON/FEDOMCON.vmx' Actual VM overhead: 252280832 bytes

Sep 30 20:15:01 Hostd: 2009-09-30 20:15:01.744 1133BB90 verbose 'Vmsvc' RefreshVms updated overhead for 2 VMs

The other one happens rarely and is the closest thing I can find to something that might resemble an error; however, I'm uncertain. If it happens, it always follows the message above:

Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.454 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 config.annotation

Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.455 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 config.files.vmPathName

Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.455 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 config.guestFullName

Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.455 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 config.uuid

Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.455 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 guest.hostName

Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.455 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 guest.ipAddress

Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.455 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 name

Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.455 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 recentTask

Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.455 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 runtime.connectionState

Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.455 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 runtime.host

Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.455 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 runtime.powerState

Sep 30 17:09:36 Hostd: 2009-09-30 17:09:36.456 57C03B90 verbose 'PropertyJournal' ERProviderImpl<BaseT>::_GetChanges: Aggregate version Overflow 128 runtime.recordReplayState

I've tried matching these messages to my errors, but the time stamps are way off. For example, the final entry in the host log is stamped Sep 30 22:14:34, which would be about 5 hours in the future. So I'm not entirely sure when any of this actually happened.

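DSTAVERT · 09-30-2009

ESXi logs in UTC, the atomic-clock-based successor to GMT, so host log time stamps will differ from the guest's local time by your time zone offset. See if that helps to correlate the logs.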
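To line the two logs up, the host time stamp can be shifted into the guest's local time. The following is a minimal Python sketch, assuming the host log is in UTC and the guest sits at UTC-5 (a guess based on the "about 5 hours" mentioned above). The function name, the year argument, and the fixed offset are illustrative only; the host log lines carry no year, so it has to be supplied.

```python
from datetime import datetime, timedelta, timezone

def host_log_to_local(stamp, year, utc_offset_hours):
    """Convert a 'Sep 30 22:14:34' style UTC stamp to a fixed local offset."""
    utc_time = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    utc_time = utc_time.replace(tzinfo=timezone.utc)
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return utc_time.astimezone(local_zone)

# Assumed values: year 2009 and a UTC-5 guest clock.
local = host_log_to_local("Sep 30 22:14:34", 2009, -5)
print(local.strftime("%Y-%m-%d %H:%M:%S"))  # prints 2009-09-30 17:14:34
```

A fixed offset ignores daylight saving changes, so for logs that span a clock change a proper time zone database would be the safer choice.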

-- David -- VMware Communities Moderator

digitlman77 · 09-07-2010

This is happening to me on 2 of my 2008 R2 VMs.

One of the VMs uses LSI SAS and the other uses LSI Parallel, and the errors are for LSI_SCSI and LSI_SAS, but the error detail is the same. I have another 2008 R2 VM on the same host with LSI SAS that isn't having any issues.

The host is an HP DL580 G5 with freshly built ESXi 4.1. I have installed the latest HP NMI patch.

The storage is an HP MSA1000 via FC. No other issues with any of the VMs.

I don't see anything off the bat in the logs that indicates what is going on...

Any ideas?

digitlman77 · 09-09-2010

Anybody have any input?

Sp33do · 10-27-2010

I have the same problem.

I posted this message yesterday on both the QNAP forum and the VMware forum. Did you find the reason? As someone above said, it could have something to do with latency. I have a ping running against the NAS and it shows 1 ms or less, but occasionally a single ping goes up to 15 ms and then back down to 1 ms. I will keep a closer eye on the pings today. The message I posted:

Hey,

We are having some issues with a virtual environment.

At least once a day we get error messages on our virtual servers and on our LUNs in vCenter.

The info message in vCenter is:

Datastore LUNX increased in capacity from 0 bytes to xxxxxxxxx bytes 13:34

And it does this, as I said, one or more times per day. At the same time, on the virtual machines that are connected to that server, we get:

Log Name: System

Source: LSI_SCSI

Date: 2010-10-27 13:33:33

Event ID: 129

Task Category: None

Level: Warning

Keywords: Classic

User: N/A

Computer: serverX

Description:

Reset to device, \Device\RaidPort0, was issued.

Event Xml:

We also get LSI_SAS on newer servers with newer controllers.

The problem is that it hangs the server for a few seconds, and this breaks all applications that have connections to databases, so they need to be restarted. Our customers don't like this at all, as you can imagine.

We have two QNap RP-809U NAS servers using NFS towards VMware vSphere 4, and each server has two LUNs. This happens on both LUNs at the same time, so it seems to have something to do with the NAS. I guess the connection goes down for a few seconds for some reason and then reconnects.

Has anyone had this problem before, and do you have any suggestions? It happens at random times:

NAS1:

Lun1

2010-10-14 04:26:28

2010-10-14 04:23:24

2010-10-14 04:21:24

Lun2

2010-10-14 04:26:33

2010-10-14 04:23:24

2010-10-14 04:21:24

2010-08-16 20:20:30

2010-08-16 20:20:19

2010-08-16 19:41:04

NAS2:

Lun1:

2010-10-27 13:34:37

2010-10-27 08:15:40

2010-10-27 08:14:33

2010-10-27 01:12:02

2010-10-26 16:36:45

2010-10-26 11:16:59

2010-10-25 21:15:11

2010-10-25 21:13:45

2010-10-25 15:19:43

2010-10-25 15:18:18

2010-10-25 09:37:07

2010-10-25 09:36:43

2010-10-25 09:36:35

2010-10-25 01:20:28

etc. etc.; a lot more on this one.

LUN2:

same as above
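To check the "both LUNs at the same time" impression rather than eyeballing it, the time stamps can be paired up programmatically. This is a minimal Python sketch using the NAS1 time stamps from 2010-10-14 listed above. The 10-second window and the function names are arbitrary choices for illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def parse(stamps):
    """Turn 'YYYY-MM-DD HH:MM:SS' strings into datetime objects."""
    return [datetime.strptime(s, FMT) for s in stamps]

def paired_events(a, b, window_seconds=10):
    """Return (ta, tb) pairs whose time stamps differ by at most the window."""
    pairs = []
    for ta in parse(a):
        for tb in parse(b):
            if abs((ta - tb).total_seconds()) <= window_seconds:
                pairs.append((ta, tb))
    return pairs

# NAS1 time stamps from 2010-10-14, copied from the lists above.
lun1 = ["2010-10-14 04:26:28", "2010-10-14 04:23:24", "2010-10-14 04:21:24"]
lun2 = ["2010-10-14 04:26:33", "2010-10-14 04:23:24", "2010-10-14 04:21:24"]

for ta, tb in paired_events(lun1, lun2):
    print(ta, tb, abs((ta - tb).total_seconds()), "s apart")
```

With these six time stamps every Lun1 event has a Lun2 partner within a few seconds, which fits the idea of a shared cause such as the NAS or the network path. The same function could be fed the guest Event ID 129 times and the vCenter datastore messages to see whether they line up too.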

digitlman77 · 10-28-2010

The official reply from VMware was that my MSA1000 was not supported on ESXi 4.1. So, I no longer have my VMs on the MSA1000.
