VMware Cloud Community
DrNickT
Contributor

After upgrading to Cisco UCS 2.2(1d), my VMware VMs are logging LSI errors in Event Viewer

I upgraded my virtual environments to Cisco UCS Release 2.2(1d) over the past couple of weeks. We have had a few VMs freeze, and one error I have noticed in all the event viewers is:

LSI_SAS

Reset to device, \Device\RaidPort0, was issued.

Some VMs log it more than others. Going back through the history, this only started in each environment after the upgrade to UCS 2.2. Is this a known issue? I am running ESXi 5.1 Update 2 on my hosts, and each host is a B200 M3.

Thanks

joeboyd
Contributor

Sorry for the late response, but I was out of the office for the past week.

I have seen those messages too, but I'm not sure what they mean.

I'm starting to hear of more situations like this from local friends of mine.  It's starting to look more and more like some kind of bug in the code.

I just found this link to a similar discussion: https://supportforums.cisco.com/discussion/12234611/ucs-221d-leads-vmware-network-outtages

I'm supposed to have a call with a local Cisco engineer tomorrow to look at it, but I'm not sure he will find anything. Probably like you, I've had EMC, VMware and Cisco all look at it. None of them find anything on their end, so they point fingers at the others. I'll let you know if I find out anything.

joeboyd
Contributor

Nick

Are you going FC or FCoE from the FIs into your fibre switch? Are the FI ports you use for FC or FCoE on an expansion module?

I'm starting to think my problem is a hardware issue with the FIs or the FI expansion modules, unless there is an unknown bug, since all the firmware has been checked.

DrNickT
Contributor

FC, and yes, on a 16-port expansion module. What about yours?

WessexFan
Hot Shot

Let us know what the resolution is; I'm interested to hear what they find. I have exactly the same setup as yours, but with Compellent storage. I haven't had any issues, and I'm on exactly the same ESXi version and UCS firmware as you. Weird.

VCP5-DCV, CCNA Data Center
joeboyd
Contributor

FCoE on ports 13, 14, 15 and 16 on the expansion module. Could having FC or FCoE on the expansion modules be a problem?

I'm working on getting one FI and the expansion module replaced today. Once they are in place, I plan on forcing all storage traffic through that one FI to see if the problem persists. If it does, then I'm going to move FCoE from the expansion module to ports that are not on the expansion module and test that.

We're kind of at the end of our rope here. VMware, EMC and Cisco have checked everything (configs, firmware) but cannot find anything, so we are asking Cisco to start replacing hardware; maybe it was a bad batch of FIs or expansion modules. I'm also starting to see odd issues that cannot be explained, like one FCoE port discarding a lot of packets, and service profile configuration failures saying the blade does not fulfill the local disk requirements. These things have been happening randomly for a while, and I think they may point to a bigger problem with the FIs.

WessexFan
Hot Shot

From what you are saying, I think the fabric interconnects may be to blame as well.

joeboyd
Contributor

I'm glad you agree. Some weird things are occurring, especially the storage issue, that none of EMC, VMware or Cisco can explain, and they have all just been pointing fingers at each other. I'm using the same storage I had before UCS, and nothing has changed in that regard. Cisco has confirmed the firmware and configuration on both UCS and the Nexus 5548, so I have to wonder whether it is a physical hardware problem.

I feel like I am losing my mind and just going in circles.

WessexFan
Hot Shot

It's got to be a hardware issue if all three vendors are pointing fingers. Hardware fails.

Let us know if those new FIs are good. That's the only thing I can think of, sorry. Been there too :(

joeboyd
Contributor

Nick

Do you have MirrorView enabled on your array?

I have MirrorView on my CLARiiON, and I think the MirrorView ports are causing the issue. I am going to remove MirrorView, reboot the SPs and see if that fixes it.

henber
Contributor

Have you found a solution to your problem?

We are having exactly the same issues in our ESXi logs: performance is very bad from time to time, Storage vMotion times out, and VMs log LSI SCSI errors in their event logs.

Hardware:

- Cisco UCS 2.2(1c)
- B200 M3
- VIC 1240 and 1280
- Nexus 5548s
- IBM XIV Gen2
- ESXi 5.1 U2

We have been in contact with IBM, Cisco and VMware, and they cannot find out what is wrong.

We have tried:

- Changing FC cables
- Beta FNIC drivers from Cisco
- An external HP server with a QLogic HBA (same problem as the UCS blades)
- Swapping SFPs
- Different versions of the FNIC driver
- And so on and so on.

The problem still exists and none of the vendors can say what it is, even after several WebEx sessions and multiple logs from our side. It seems like one of the vendors is keeping something from the public. Did you manage to solve it?

Br

Henrik

kilmanjaro
Contributor

We had this problem in the past, and it turned out to be a bad cable that was getting CRC errors. You can check for them from the CLI by running 'connect nxos' and then 'show interface counter error' to see whether any ports have a lot of CRC errors. Your problem may turn out to be something completely different, but ours was a bad twinax cable from the chassis to the FI. Hope that helps.
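One thing worth adding: those error counters are cumulative, so a single reading does not tell you whether a port is still taking errors today. Running the command twice a few minutes apart and comparing is more telling. Below is a minimal Python sketch of that comparison, assuming you have already copied the per-port CRC counts from each run into a dictionary by hand. It does not parse real NX-OS output, and the function name, port names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def rising_crc_ports(before: Dict[str, int],
                     after: Dict[str, int],
                     min_delta: int = 1) -> List[Tuple[str, int]]:
    """Hypothetical helper: return (port, increase) for ports whose CRC
    count grew by at least min_delta between two snapshots, largest first."""
    rising = []
    for port, new_count in after.items():
        old_count = before.get(port, 0)  # port missing earlier: assume 0
        if new_count < old_count:
            # Counter went down, most likely because counters were cleared
            # between snapshots; count everything since the clear.
            increase = new_count
        else:
            increase = new_count - old_count
        if increase >= min_delta:
            rising.append((port, increase))
    # Largest increase first; ties broken by port name for stable output.
    rising.sort(key=lambda item: (-item[1], item[0]))
    return rising


if __name__ == "__main__":
    # Hypothetical CRC counts typed in from two runs a few minutes apart.
    first = {"Eth1/1": 10, "Eth1/2": 0, "Eth1/3": 5}
    second = {"Eth1/1": 10, "Eth1/2": 250, "Eth1/3": 7}
    print(rising_crc_ports(first, second))  # [('Eth1/2', 250), ('Eth1/3', 2)]
```

A port whose count is high but not moving probably had a past problem; a port that keeps climbing between runs is the one whose cable or SFP is worth swapping first.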

Guic3
Contributor

Did anyone ever find another solution? I am having the same issue, but only on one of my FCS. We switched our VMs to a fixed path through our B-side FCS and have no issues; all paths through FCS A show the LSI errors. We have checked both switch configs, the VNX (including changing SP ports) and our UCS. Did all of your issues come down to the chassis-to-FI cables?

lane0550
Contributor

Hi, were you able to find a resolution to this issue? We are experiencing the same errors on our physical VMware, Cisco UCS and EMC VNX 7500 setup, and so far there is no resolution. The vendors have just been pointing fingers at each other. I would really appreciate an update if anything you tried made a difference.
