VMware Cloud Community
MichaelLove
Contributor

Problem with Promise vTrak e610f (NMP errors)

I've been running an ESXi 4 server connected to a Promise vTrak E610f SAN via Fibre Channel, and it was running fine.

After the upgrade to ESXi 5 I've been having problems with my logs filling up with this:

2011-10-11T22:37:29.013Z cpu1:3624)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x12 (0x4124003d4840) to dev "eui.2262000155e8917c" on path "vmhba3:C0:T0:L0" Failed: H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.Act:EVAL
2011-10-11T22:37:29.013Z cpu1:3624)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "eui.2262000155e8917c" state in doubt; requested fast path state update...
2011-10-11T22:37:29.013Z cpu1:3624)ScsiDeviceIO: 2305: Cmd(0x4124003d4840) 0x12, CmdSN 0x3fa4 to dev "eui.2262000155e8917c" failed H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2011-10-11T22:37:30.021Z cpu1:2049)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "eui.2262000155e8917c" state in doubt; requested fast path state update...
2011-10-11T22:37:30.021Z cpu1:2049)ScsiDeviceIO: 2316: Cmd(0x4124003d4840) 0x12, CmdSN 0x3fa4 to dev "eui.2262000155e8917c" failed H:0x7 D:0x2 P:0x0 Possible sense data: 0x5 0x24 0x0.
2011-10-11T22:37:30.222Z cpu1:2049)ScsiDeviceIO: 2305: Cmd(0x4124003d4840) 0x12, CmdSN 0x3fa4 to dev "eui.2262000155e8917c" failed H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2011-10-11T22:37:30.626Z cpu1:3624)ScsiDeviceIO: 2316: Cmd(0x4124003d4840) 0x12, CmdSN 0x3fa4 to dev "eui.2262000155e8917c" failed H:0x7 D:0x2 P:0x0 Possible sense data: 0x5 0x24 0x0.
2011-10-11T22:37:30.827Z cpu1:2049)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "eui.2262000155e8917c" state in doubt; requested fast path state update...
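
For reference, the status fields in these lines can be decoded mechanically. Below is a minimal Python sketch; the function name `decode` and the lookup tables are illustrative and not part of any VMware tool. The tables only cover the values that show up in this thread. They use the standard SCSI meanings (0x12 is INQUIRY, 0x93 is WRITE SAME(16), device status 0x2 is CHECK CONDITION, sense key 0x5 with ASC/ASCQ 0x24/0x0 is ILLEGAL REQUEST / invalid field in CDB) and the host status names VMware documents for ESXi (0x2 bus busy, 0x7 error). Anything else is returned as raw hex.

```python
import re

# Lookup tables limited to the codes seen in this thread (not exhaustive).
SCSI_OPCODES = {0x12: "INQUIRY", 0x93: "WRITE SAME(16)"}
HOST_STATUS = {0x0: "OK", 0x2: "BUS_BUSY", 0x7: "ERROR"}
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION"}
SENSE_KEYS = {0x0: "NO SENSE", 0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {
    (0x00, 0x00): "NO ADDITIONAL SENSE INFORMATION",
    (0x24, 0x00): "INVALID FIELD IN CDB",
}

# NMP lines read 'Cmd 0x12 (0x4124...)'; ScsiDeviceIO lines read 'Cmd(0x4124...) 0x12,'.
OPCODE_RE = re.compile(r"Cmd (0x[0-9a-f]+) \(|Cmd\(0x[0-9a-f]+\) (0x[0-9a-f]+),")
STATUS_RE = re.compile(
    r"H:(0x[0-9a-f]+) D:(0x[0-9a-f]+) P:(0x[0-9a-f]+) "
    r"Possible sense data: (0x[0-9a-f]+) (0x[0-9a-f]+) (0x[0-9a-f]+)"
)


def decode(line):
    """Return a dict of decoded fields, or None if the line has no SCSI status."""
    op = OPCODE_RE.search(line)
    st = STATUS_RE.search(line)
    if not op or not st:
        return None
    opcode = int(op.group(1) or op.group(2), 16)
    host, dev, _plugin, key, asc, ascq = (int(x, 16) for x in st.groups())
    return {
        "opcode": SCSI_OPCODES.get(opcode, hex(opcode)),
        "host": HOST_STATUS.get(host, hex(host)),
        "device": DEVICE_STATUS.get(dev, hex(dev)),
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "additional": ASC_ASCQ.get((asc, ascq), f"{asc:#x}/{ascq:#x}"),
    }


if __name__ == "__main__":
    sample = (
        '2011-10-11T22:37:30.021Z cpu1:2049)ScsiDeviceIO: 2316: '
        'Cmd(0x4124003d4840) 0x12, CmdSN 0x3fa4 to dev "eui.2262000155e8917c" '
        'failed H:0x7 D:0x2 P:0x0 Possible sense data: 0x5 0x24 0x0.'
    )
    print(decode(sample))
```

Read this way, the log shows the same INQUIRY alternating between a host-side bus-busy result and a check condition from the array rejecting a field in the command.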

I've been experimenting with it for the last few days, and this is the only storage server I'm having a problem with (I also tried two Sun x4500 servers that are serving drives over FC with COMSTAR). Conversely, no other computer is having a problem with the vTrak, only VMware.

I only get the above errors when writing to the datastore, not reading.

If I pass the disk through to a virtual machine as a raw disk, I can read and write to it without any errors, and nothing shows up in my vmkernel.log file.

I've tried combinations of different controllers and cards and pathing options. I've changed several settings on the vTrak with no effect. I've updated the firmware in my FC cards with no change.

I've read posts from others with similar error messages, but haven't been able to resolve it.

The vTrak is running the latest firmware available, too.

Any thoughts?

29 Replies
Knightro
Contributor

Hi Michael,

I have opened a support ticket with Promise about the vTrak E610fD. The Promise sales team had told us that a firmware release supporting vSphere 5 would be completed by the end of September, but they have not released any firmware for us.

Have you had any luck with your setup?

MichaelLove
Contributor

No, I haven't been able to fix it on my own. I have all of the latest patches installed, too.

I'm not having a problem with the new HP P2000 G3 disk array, but that's currently on the HCL.

I haven't opened a case yet because I've been waiting for our licenses to go through; we've been running the free version of VMware for the last two years, but we're expanding our setup and purchasing licenses so we can use larger servers and get support. I should have our licenses sometime next week, and then I'll open a support ticket.

I should probably contact Promise, too. I had been assuming it was purely a VMware problem, since the same array worked fine under ESXi 4.

I'll let you know if I find out anything.

zkucera
Contributor

Hi guys,

Have you made any progress with your problems? I have the same configuration. This week I upgraded the Promise to the new Service Release 2.7, which was released on Nov 21st, but the problems still persist.

Thanks, Zdenek

MichaelLove
Contributor

Unfortunately, no. I spoke with VMware tech support last week and they were totally unhelpful. Since the unit isn't currently listed in the HCL, the rep wouldn't even talk about it. When I tried to explain that I thought it was a regression, since it worked fine under 4, he continued to repeat the above.

The only thing he finally told me, when he deviated from repeating the same phrase over and over, was that I would have to get Promise to make the fixes.

However, the warranty period for our SAN was up in February. I'm also doubly messed-over because our unit was the one purchased through Apple for the Xserve, and Promise stopped providing firmware for it after SR2.5.

So, I pretty much have no one to contact and no one to listen to me.

teroles
Contributor

I have the same configuration (e610f with Apple FW SR2.5) in test with ESXi 5: dual controllers and a QLogic dual-port 8Gb HBA, direct attached, one port to each controller.

I found that with factory defaults and the Promise best-practice configuration, performance is very poor.

After struggling with settings, I found a configuration that seems to work quite well:

Subsystem configuration:

- disable cache mirroring (this seems to be the main problem)

- redundancy type: active-active

Adapter configuration:

- enable adaptive writeback cache

- disable host cache flushing

- disable forced read ahead

- LUN affinity can be enabled if you want to load balance the traffic across both controllers

ESXi configuration:

- for each LUN, manage paths, change path selection to Round Robin
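
The Round Robin change can also be made from the ESXi shell instead of clicking through Manage Paths for every LUN. A minimal sketch, assuming the ESXi 5.x `esxcli storage nmp device set` command with the `VMW_PSP_RR` policy name; the helper `round_robin_commands` is made up for illustration, and the device ID is simply the one from the logs earlier in this thread. The script only prints the command lines, so they can be reviewed before being run on a host.

```python
import shlex


def round_robin_commands(device_ids):
    """Return one esxcli command line per device that sets its PSP to Round Robin."""
    return [
        shlex.join([
            "esxcli", "storage", "nmp", "device", "set",
            "--device", dev,
            "--psp", "VMW_PSP_RR",
        ])
        for dev in device_ids
    ]


if __name__ == "__main__":
    # Device ID copied from the vmkernel.log excerpts above; substitute your own.
    for cmd in round_robin_commands(["eui.2262000155e8917c"]):
        print(cmd)
```

`esxcli storage nmp device list` shows which SATP and path selection policy each device currently uses, which is worth checking before and after the change.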

When monitoring with WebPAM PROe and ESXi, the performance seems to be reasonable and the array/disk speed is now the limiting factor, not the controller or the subsystem.

Next week I will run some stress tests on the test bench and also test a switched SAN setup.

MichaelLove
Contributor

Thanks!

I'm going to try that tomorrow. I never thought to mess with the cache mirroring settings.

MichaelLove
Contributor

I tried it, and while it looked promising for the first few minutes, once I started copying a large file (1GB), I started getting these errors again, and the performance tanked:

2012-01-26T19:29:07.053Z cpu24:8216)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x12 (0x4125000f3580) to dev "eui.22f900015559b56a" on path "vmhba2:C0:T4:L0" Failed: H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.Act:EVAL
2012-01-26T19:29:07.053Z cpu24:8216)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "eui.22f900015559b56a" state in doubt; requested fast path state update...
2012-01-26T19:29:07.053Z cpu24:8216)ScsiDeviceIO: 2305: Cmd(0x4125000f3580) 0x12, CmdSN 0xfc39 to dev "eui.22f900015559b56a" failed H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

Transfer rates went down to 50-60 Kbps.

Once again, this only happens if I use a Promise logical device as a VMware datastore. If I give a virtual machine direct access to the drive, I get full performance at around 320 Mbps and no errors in the log.

teroles
Contributor

That's interesting.

Which SATP and path selection policy are you using?

I have been running both IOPS and throughput tests with two VMs for a few days now without problems.

tmeader
Contributor

Has there been any update on this issue at all? We just recently upgraded to vSphere 5 and we've discovered this problem the hard way.

To the person who posted the original answer: you stated that you had this successfully working with the Apple version (shows up in the firmware as e610f as well), using the config you mentioned, along with RR for the path selection. I assume this means that you have dual controllers in the RAID? If so, could you possibly try your working config with just a single controller active in the RAID (our config)? We're fine with ordering a second controller upgrade if that is guaranteed to actually fix this issue for us, but if it's just going to be a waste of 3K, then we'll pursue another route.

Also, do you know if you're using QLogic FC adapters? I see that there's some mention of Emulex adapters being the only "officially" supported config from VMware (even though this array worked perfectly for a couple of years with ESX 4.x), so if you could also clarify whether or not your working config was with QLogic adapters as well, that would really help us out.

Even when you had it in a usable state, were you still seeing the NMP errors regardless?

And to MichaelLove... did you ever get it working properly?

Thanks all.

teroles
Contributor

Sorry I forgot to update on this.

It did work quite well on the testbench, but not in real life.

I/O just hangs, usually during some write operations, and the only way to recover was to restart the storage (surprisingly, ESXi didn't hang or time out, but continued to work after the restart).

Anyway, we abandoned the Promise array and replaced it with an HP P2000.

I would suggest not putting any effort or money into this configuration; it will not work reliably enough in vSphere 5. Rolling back to version 4 would probably be the easiest way.

pcsdallas
Contributor

I fought our e310fD for quite a few days with similar errors. Are you guys not seeing nmp_ThrottleLogForDevice:2318: Cmd 0x93 thrown in there with your 0x12?

In any case, I managed to resolve the issue. We're still seeing the 0x12 errors, but they don't seem to affect performance. We also see odd latency bursts in the performance charts at exact 5-minute intervals (when there are no VMs or other traffic traversing the FC links to those datastores). This also does not affect performance. We're quite far behind on firmware and have downtime scheduled this weekend to update. We have an open ticket with Promise and I'll update this thread if I receive any more news.

As to our fix?  Set DataMover.HardwareAcceleratedInit to 0 under Advanced Settings > DataMover
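
The same setting can be changed from the ESXi shell rather than the vSphere Client. A minimal sketch, assuming the ESXi 5.x `esxcli system settings advanced` namespace, where the option shows up as `/DataMover/HardwareAcceleratedInit`; the two helper functions are made up for illustration and only print the command lines. This option governs the VAAI block-zeroing offload, which is issued as WRITE SAME, so it plausibly ties in with the 0x93 commands mentioned above.

```python
import shlex

# Advanced option path as it appears to esxcli (assumed ESXi 5.x naming).
VAAI_INIT_OPTION = "/DataMover/HardwareAcceleratedInit"


def show_option_command(option):
    """Command line that displays the current value of an advanced option."""
    return shlex.join([
        "esxcli", "system", "settings", "advanced", "list",
        "--option", option,
    ])


def set_option_command(option, value):
    """Command line that sets an integer advanced option (0 = disabled)."""
    return shlex.join([
        "esxcli", "system", "settings", "advanced", "set",
        "--option", option,
        "--int-value", str(int(value)),
    ])


if __name__ == "__main__":
    print(show_option_command(VAAI_INIT_OPTION))     # check the current value first
    print(set_option_command(VAAI_INIT_OPTION, 0))   # then disable the offload
```

Setting the value back to 1 re-enables the offload, so the change is easy to reverse after a firmware update.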

I look forward to any feedback you guys might have...

Regards,

-Patrick

teroles
Contributor

Thanks for your message!

I am on a business trip and will return June 19th. In the meantime I will read my email only occasionally.

For support matters, please email apua at doneit.fi.

Regards,

Tero Leskinen

Done IT

pcsdallas
Contributor

By the way, we updated all the way up to 3.36.0000.02, released on 5/11 (non-Apple version of the Promise E310fD), and it corrected all issues. We even re-enabled all the VAAI settings without a problem. I went back and disabled them, however, since none of our arrays are VAAI capable, just to be safe.

Good luck... Hopefully this helps some of you guys.

-Patrick

zkucera
Contributor

Hello,

I tried the same thing, but after five days the problems are back (cannot clear reservation, device retry timeout, repeated login/logout of the hosts' ports into the SAN). Could anyone check my settings, please?

- ALUA on Promise 310fD is ON

- 3 hosts, 2 FC ports each

- FC cards on host are set to point to point

- 2x Sanbox 5600 (two SANs) - hosts are connected to both SANs (1st FC to SAN1, second FC to SAN2)

- both ctrls on Promise are also connected to both SANs (ctrl1/1 and ctrl 2/1 to SAN1 and ctrl1/2 and ctrl 2/2 to SAN 2)

- every port-to-port connection is in its own zone, so I have six zones on each SAN (3 servers x 2 ctrls), 12 in total

- storage IO control in vcenter is disabled
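
As a sanity check on the zone count, the layout described above can be enumerated. This is a small Python sketch; the host and port labels (`host1/fc1` and so on) are placeholders, not the real names. It assumes each host's first FC port and each controller's port 1 sit on SAN1, and the second ports sit on SAN2, as listed.

```python
from itertools import product

# Placeholder names for the three hosts, two controllers, and two fabrics.
HOSTS = ["host1", "host2", "host3"]
CONTROLLERS = ["ctrl1", "ctrl2"]
FABRICS = ["SAN1", "SAN2"]


def build_zones(hosts, controllers, fabrics):
    """Map each fabric to its single-initiator/single-target zones.

    Port N of every host and port N of every controller attach to fabric N,
    so each fabric gets one zone per (host, controller) pair.
    """
    zones = {}
    for port, fabric in enumerate(fabrics, start=1):
        zones[fabric] = [
            (f"{host}/fc{port}", f"{ctrl}/{port}")
            for host, ctrl in product(hosts, controllers)
        ]
    return zones


if __name__ == "__main__":
    zones = build_zones(HOSTS, CONTROLLERS, FABRICS)
    for fabric, members in zones.items():
        print(fabric, len(members), "zones")
        for initiator, target in members:
            print("  ", initiator, "<->", target)
```

With three hosts and two controllers this yields six zones per fabric and twelve overall, matching the count above.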

Thanks,

Zdenek

pcsdallas
Contributor

Honestly, we have ALUA off and have not transitioned these over to VMFS5 yet... However, they have been running like a top since my last post... Not one hiccup or issue.

dexterous
Contributor

Guys,

I am a Promise partner and reseller as well as a VMware partner, and I have sold and installed the Promise vTrak E610fD arrays for years. I have successfully used the "channel sku" version of the x10 series (vendor slang for the non-Apple version) on several dozen VMware installs over the years and have not had any issues. The Apple sku version of the x10 was compatible with vSphere 4.1 and earlier but has compatibility issues with vSphere 5 and later. The Apple sku version of the x10 lacks ALUA support, which has implications for VMware multipathing, and it is also incompatible with ATS (Atomic Test and Set, the VAAI hardware-assisted locking primitive). I have found ways around these issues for customers who are not willing to upgrade, which you guys may find useful.


Let's start with the units with Apple firmware:


First off, there is a way out of the Apple firmware if you're willing to make a small investment. It's a 100% fact that the hardware on all the Promise x10 series units is identical. For example, there is zero difference between the controllers running inside an Apple version E610fD and a channel E610fD (the non-Apple version). The chassis are also identical in every way, BUT Promise included code in their firmware to prevent the controllers from booting if you place a controller running Apple firmware into a channel chassis and vice versa. You can, however, purchase an empty channel version chassis from any Promise vendor (me) or someplace like eBay and re-use all your controllers, drives, power supplies, etc. in that unit. At first, the controllers won't boot in the non-Apple chassis due to the firmware mismatch (Apple firmware on the controllers but a channel serial number on the chassis).
To get the unit up and running again, you simply have to console in using the built-in serial port (an RJ11 jack on the back) and flash each controller, one at a time, with the correct, non-Apple firmware version. It takes about 20 minutes to "convert" the chassis and get everything going again. I have done about 15 of these "upgrades" with 100% success (see instructions below).
To convert your controllers to channel firmware and use them in a channel chassis:


PRIOR TO DOING THIS, MIGRATE ALL DATA TO SOME OTHER LOCATION. THESE INSTRUCTIONS ASSUME A TOTALLY BLANK CHASSIS, HARD DRIVES, ETC. YOU WILL NEED TO REBUILD EVERYTHING ON THE CHASSIS INCLUDING DATA AND THE CONFIG!


ALSO, JUST TO CONFIRM, THESE STEPS REQUIRE YOU TO PURCHASE A NON-APPLE CHASSIS SO THAT YOU CAN RE-USE YOUR CONTROLLERS. IF YOU PERFORM THESE STEPS TO FLASH CHANNEL FIRMWARE TO YOUR CONTROLLERS AND THEN PLUG THEM INTO AN APPLE CHASSIS, THEY WON'T BOOT (recoverable if needed).

Promise Command Line Tips:
-To disable the buzzer, issue: buzz -a disable
-To show network settings, issue: net -a list -m
-To shut down the system, issue: shutdown -a shutdown


Phase 1 (preparation):
-Starting with both controllers in the old "Apple chassis", verify that the controllers are running Vtrak E-Class Service Release 2.5 version 10.06.2270.00 which was published 9/24/2009. (Upgrade if needed)
-Verify that you clearly understand that the conversion process is to be completed one controller at a time with the second controller sitting nearby, NOT in the chassis.
-Move both power supplies and battery modules over to the new non-Apple chassis (unless you got power supplies with the new chassis)
-Verify that the battery modules are installed in the new chassis (they sit inside of the blower modules)
-Remove any hard drives from the new chassis (all drive bays empty)
-Verify both power supplies are connected to power on new chassis
-Insert ONE controller with Apple firmware into the new channel chassis (verify that the other slot on the chassis is 100% empty)
-Connect console cable to the controller to be flashed  ( 115200 - N/8/1 ) and use your favorite terminal emulator
-Connect an Ethernet cable between the PC and the controller's Ethernet management port
-Power on the chassis (if it's not already running), log into the controller, and reset the controller to factory defaults
-Shut down the chassis and turn the power supplies off

Phase 2 (flash to channel firmware):
-Verify that you understand that the alarm on the controller will sound during this process. Do not panic when the alarm starts sounding; it's totally harmless and can be disabled using this command on the terminal emulator: buzz -a disable
-Apply power to the new chassis while watching the terminal emulator screen and pressing "ctrl-c" several times during the boot process to abort the standard boot script.
-When the "PBL_RAM>" prompt appears you are ready to begin the flashing process
-The IP address of the controller should appear during the boot process, verify your network card is on the same subnet. (To show network settings issue: net -a list -m on the terminal emulator)
-Download channel firmware version ex10_fw_multi_3_33_0000_00.img (DO NOT USE THE LATEST VERSION, START WITH THIS OLD VERSION)
-Use this command to flash the controller:

ptiflash -t -h 192.168.90.1 -f ex10_fw_multi_3_33_0000_00.img

Note: You may have an easier time typing the first piece and pasting the image name as it seems to want to enter strange characters in the center if you paste the entire command.

-When you're returned to the prompt, reboot via this command: reset
-Allow the system to fully boot, login via the CLI and verify that the controller works
-Shutdown the controller via this command: shutdown -a shutdown
-Boot the chassis up one last time, perform a factory defaults reset via the web GUI
-Shut down the chassis and turn both power supplies off.
-Remove the controller you just flashed with channel code
-Insert the other controller (again, there should never be two controllers in the chassis while performing these steps)
-Flash the second controller using the same process detailed above
-When done with the second controller, install both newly flashed controllers into the new chassis and boot.
-Verify that both controllers work
-Perform a factory defaults reset
-Apply any additional firmware updates via GUI with both controllers booted on the new chassis
-If using expansion chassis, re-apply the very latest firmware (a second time) with the entire stack connected to ensure that all the JBOD I/O modules get updated.
-Perform one final factory defaults reset.

You may now install your hard drives and configure the subsystem as if it were a new install.

MichaelLove
Contributor

This could be very helpful.

I tried sending you a private message asking for your contact information, but I haven't gotten a response.

If you could send me a PM, I would appreciate it.

VDBG
Contributor

Hi Dexterous,

I have the Apple version of the vTrak e610f and would like to provision it to use it with ESXi 5.1.

Promise tech support mentioned that they have a beta firmware release for the vTrak e610f that's not on their website; it's version 10.07. Have you tried that one with ESXi 5.1 by chance?

Since you're a reseller, can you give me your contact info? I'd like to acquire the vTrak e610f chassis for the channel sku firmware so I can switch it over and use it with ESXi 5.1.

Thanks, Jonathan

dexterous
Contributor

Jonathan,

I have not tried the beta code, but if it's anything like the beta code they provided us for the x30 series a few years ago, I would just wait until it's in production. I will contact them to see if I can get a copy of the release notes to see what's been changed.

Tom
