Solved: Re: Dell EqualLogic array stopped working during o...

mechgt · ‎03-01-2015

During/after a planned maintenance outage, ESXi cannot mount datastores on our Dell EqualLogic Array and I have no idea why.

We're running ESXi 5.0. We shutdown everything over the weekend, replaced some network cables and when we came back up our ESXi hosts could not see the iSCSI EqualLogic array, however it could see several other iSCSI arrays on the same storage network. In digging into log files, I see the ESXi host doing CHAP authentication to the EqualLogic array, and the array says it logs in successfully, then loses connection about 10 seconds later. 10 more seconds ESXi attempts reconnection to the EqualLogic and the same thing happens over again.

I see several logs in ESXi hostd.log that indicate this array going ONLINE, then OFFLINE, and another line stating 'Possible sense data' leading me to http://kb.vmware.com/kb/289902 . I feel really handicapped at the moment as this is the storage array hosting our vCenter Server, which puts us in a mess right now.

Also, If I try to 'Add Storage', ESXi sees the EqualLogic stores exist (shows proper volume size etc.). However as I'm trying to add them (through vSphere Client), step 1 takes a long time to complete, step 2 indicates the 'disks are empty' (which I don't believe) and then after another long timeout presents a popup error.

Are there any commands or utilities that are good for diagnosing iSCSI specifically?

I can post more specific log details if that's helpful.

P.S. - The new network cables are communicating ok with the other storage targets, and we did not replace the EqualLogic cables.

mechgt · ‎03-01-2015

Fixed; the MTU setting on my storage switch was clipping some of the storage traffic. Below are the troubleshooting steps that I went through, maybe it'll help someone else out. The solution didn't exactly have anything specific to do with ESXi or EqualLogic. Basic troubleshooting and digging.

Inspection of the storage array logs/events indicated it saw and accepted the ESXi host connection appempts. I saw messages to the effect of:

- Connection from ESXi iqn bla bla bla established; CHAP authentication successful.

- 10 secs later: Connection reset/aborted by peer.

- 5 secs later: Connection from ESXi established successfully....

(rinse and repeat several times over)

Inspection of the ESXi hostd.log file shown several 'connection reset by peer' messages. Both the storage array and ESXi were pointing at each other.... and the switch is the only thing in the middle.

Troubleshooting steps now:

- using vmkping to ping the array from the ESXi - successful: vmkping 192.168.1.20

- using vmkping to ping the array from the ESXi with a size of 9000 (jumbo frame size): NO RESPONSE!: vmkping -s 9000 192.168.100.20 (<-- this is where the money was for finding the issue)

MTU was set to 9000 everywhere. Turns out the switch needed to be set a little higher (to account for packet headers??? who knew :smileyconfused: )

So, this setup ran for over a year untouched working and just stopped when we took it all down and brought it back up. :smileydevil:

View solution in original post

mechgt · ‎03-01-2015