VMware Cloud Community
motorad
Contributor
Contributor

Dell PERC H730p / LSI 3108 / Invader implementations

Hello, would anybody happen to have any guidance or a proven config for the PERC H730p/LSI 3108/Invader controller (FW 25.2.1.0037) in pass-thru with VSAN (ESXi 5.5 build 2143827)? We are seeing stability issues, in the form of PSODs and intermittent permanent disk failures, on a VSAN platform built on the above in Dell R730 chassis, with disk groups of Seagate 10k v7 ST1200MM0007 drives fronted by Fusion-io ioScale.

 

Common log events include “firmware in fault state” for the HBA, plus resets and aborts for the individual disks. The error counters on the individual drives increment in step with these events.
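
A minimal sketch, in Python, of one way to tally these events from a saved log so they can be lined up against the drive error counters. Only the phrase "firmware in fault state" comes from the post; the `reset` and `abort` keywords, the sample lines, and the function name are illustrative assumptions, and real vmkernel log lines will look different.

```python
import re
from collections import Counter

# Patterns to count. Only "firmware in fault state" is quoted in the post;
# the "reset" and "abort" keywords are assumed stand-ins for the disk events.
PATTERNS = {
    "fw_fault": re.compile(r"firmware in fault state", re.IGNORECASE),
    "reset": re.compile(r"\breset", re.IGNORECASE),
    "abort": re.compile(r"\babort", re.IGNORECASE),
}

def tally_events(lines):
    """Count how many lines match each pattern (a line may match several)."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts

# Hypothetical sample lines, not real ESXi output.
sample = [
    "adapter 0: firmware in fault state",
    "disk 3: task abort issued",
    "disk 3: device reset requested",
    "disk 5: command completed",
]
print(dict(tally_events(sample)))
```

To run it against a real file, pass an open file object instead of the sample list, since iterating a file yields its lines.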

 

We have tried different HBA drivers, from the inbox mr3 (0.255.03.01-2) to the latest known PERC9 driver (6.901.55.00.1 - currently evaluating), including some of the mr3/megaraid drivers in between (6.605.10.00-1, 06.803.52.00, 06.803.73.00). The fallback of RAID0 has passed tests so far, but we all know what that means.

 

We know this configuration is not currently listed on the HCL. We do have cases currently open with VMware and Dell, and are in communication with LSI.

 

 

Any guidance would be greatly appreciated.

77 Replies
ezequielcarson
Enthusiast
Enthusiast

Do you have SATA or SAS disks?

0 Kudos
motorad
Contributor
Contributor

Sorry for the multiple threads, all. This was originally posted via the developer forum, and I received a message stating the thread was deleted. I didn't realize the posts were appearing here. This one can be deleted or combined with the other 2 similar threads.

0 Kudos
motorad
Contributor
Contributor

The Seagate 10k v7 ST1200MM0007 drives are 1.2TB SAS.

0 Kudos
ezequielcarson
Enthusiast
Enthusiast

What is the normal write and read latency on those disks in VSAN using pass-thru?

0 Kudos
DrewDeM
Enthusiast
Enthusiast

I haven't had any luck setting up H730p controllers in pass-through mode at all.  Everything looks fine and seems to run well on initial setup, but it always ends up with PSODs, high latency, and false permanent failures on disks.  I spent several weeks tearing down and rebuilding the vSAN cluster, setting the controller to HBA mode, setting the controller to RAID mode with each drive configured as Non-RAID, etc.  I finally gave up, set up each drive individually as RAID 0, and specified the SSDs in ESXi.  I've been running that setup for a couple of weeks now without issue.

I know LSI is having problems with pass through mode even with their supported controllers so I wouldn't be surprised if it's tied to that in some way.  When they fix those issues or the H730p I'm going to revisit trying Pass-through mode again.

0 Kudos
motorad
Contributor
Contributor

Thanks for the reply! The description of what you have tried helps validate what we're going through. While unlikely a fix or temporary workaround, have you also attempted to run pass-thru with the 6.901.55.00.1 driver or something other than the inbox mr3 driver? By default the inbox mr3 drivers will take precedence; I missed that initially.

0 Kudos
DrewDeM
Enthusiast
Enthusiast

I tried falling back on the old linux shim driver with http://www.virtuallyghetto.com/2013/11/esxi-55-introduces-new-native-device.html

esxcli system module set --enabled=false --module=lsi_mr3

esxcli system module set --enabled=false --module=lsi_msgpt3

but it was too old to recognize these newer cards.

Other than that I haven't tried any other drivers.

0 Kudos
hill0795
Contributor
Contributor

I was running into similar issues with the PERC H730p. For me it turned out to be the way ESXi was trying to reset the controller. When the VM that owns the controller via passthru resets abruptly, the host needs to send a reset to the device, and apparently the default 'd3d0' method puts the PERC into a bad, unrecoverable state (short of a host reboot, anyhow).  So in short, I told ESXi to use a different reset method.  Take a peek in /etc/vmware/passthru.map.  Make an entry for the controller and use the 'link' method for reset.  After making the modification, go back to the shell, run 'auto-backup.sh', and reboot the host.

Snippet from /etc/vmware/passthru.map

# passthrough attributes for devices
# file format: vendor-id device-id resetMethod fptShareable
# vendor/device id: xxxx (in hex) (ffff can be used for wildchar match)
# reset methods: flr, d3d0, link, bridge, default
# fptShareable: true/default, false
.
.
.
# LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] (rev 02)
1000  005d  link     default
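
To check which reset method an entry like the one above would select for a given device, the file format described in the header comments can be parsed with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the first-match lookup order is an assumption of the sketch, not documented ESXi behavior.

```python
def parse_passthru_map(text):
    """Parse passthru.map-style text into (vendor, device, reset, shareable) tuples."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4:
            continue  # skip anything that is not a four-column entry
        vendor, device, reset, shareable = fields
        entries.append((vendor.lower(), device.lower(), reset, shareable))
    return entries

def reset_method_for(entries, vendor, device):
    """Return the reset method of the first matching entry, or None.

    "ffff" is treated as a wildcard, as the file header describes.
    First-match order is an assumption of this sketch.
    """
    vendor, device = vendor.lower(), device.lower()
    for e_vendor, e_device, reset, _ in entries:
        if e_vendor in (vendor, "ffff") and e_device in (device, "ffff"):
            return reset
    return None

sample = """
# LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] (rev 02)
1000  005d  link     default
"""
entries = parse_passthru_map(sample)
print(reset_method_for(entries, "1000", "005D"))  # prints: link
```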

hill0795
Contributor
Contributor

I'm also using the PERC in HBA mode, with the controller in RAID mode with each drive configured as Non-RAID so I can control the disks independently.

0 Kudos
prmstar
Contributor
Contributor

Hi

Btw, I guess you know that VMware doesn't yet support this card for VSAN.

/P

0 Kudos
Sup3rFly
Contributor
Contributor

Any updates on this?

Will the H730P ever be on the HCL?

0 Kudos
JohnNicholson
Enthusiast
Enthusiast

LSI will be in my office Monday and I'll ask, but I wouldn't hold your breath.

Here's a few reasons.

1. VSAN HCL testing is a lot more rigorous now.

2. A LOT of controllers that you can enable pass through mode on are NOT supported by LSI in this mode.  Expect firmware crashes and data loss if you try.

Here is LSI's statement on this (SuperMicro's 2308 despite being on the HCL for pass through never should be used).

The LSI controllers available through distribution channels which support Pass-Through (JBOD) include the following (with those in BOLD RED indicating presence on the VMWare VSAN HCL).  Note that there are other “LSI” Branded controllers listed on the HCL supporting Pass-through that are not available through distribution channels, meaning they are OEM only despite the “LSI” name and should be addressed to the OEM marketing it for support related questions:

· 9211-4i (on VSAN HCL)
· 9207-4i4e (on VSAN HCL)
· 9212-4i4e (on VSAN HCL)
· 9207-8i (on VSAN HCL)
· 9211-8i (on VSAN HCL) (I understand the Dell H200 is closely aligned with this).
· 9200-8e
· 9207-8e
· 9201-16i (on VSAN HCL)
· 9201-16e
· 9206-16e

Try it on anything that isn't on this list and you can expect data loss, crashing, and a desire to beg Adaptec to make a decent pass through HBA.

0 Kudos
ezequielcarson
Enthusiast
Enthusiast

Hi,

I would like to know why you want to use pass-thru instead of raid0.

Do you have SAS disks?

Txs

Ezequiel

0 Kudos
DrewDeM
Enthusiast
Enthusiast

Pass-through mode allows ESXi to communicate directly with the disks, without the controller interpreting the I/O in between.

There are management benefits, such as not having to configure SSDs manually, and with a drive failure a simple swap of drives is easily done.  Whereas with RAID-0 you will have to tag your SSDs manually, and if there is a failure, manual interaction with the RAID controller to create a new RAID-0 set may be required.

Depending on your server configuration, with RAID-0 you may be able to make these changes through a DRAC \ iLO \ etc., or it may require a reboot to get into the controller options.  You may want to instruct another employee to swap the hard drive with the orange light while you're away, and not want to worry about them getting into the controller interface.

Performance wise there shouldn't be much or any difference but the management benefits can be understandably important to some people.

0 Kudos
ezequielcarson
Enthusiast
Enthusiast

Got it,

We have both scenarios: LSI 3008 in pass-thru and 3108 in raid0.

We are using SATA disks, so we are getting a QLEN of 32 per disk versus a QLEN of 128 on the raid0.
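
As a rough way to see why queue depth matters, Little's law says the number of outstanding I/Os equals throughput times latency, so a queue of depth Q at a per-I/O latency of L seconds can sustain at most about Q / L IOPS. A minimal Python sketch of that arithmetic; the 5 ms latency is a made-up figure for illustration, not a measurement from either setup, and real devices will not scale linearly with queue depth.

```python
def max_iops(queue_depth, latency_seconds):
    """Upper bound on IOPS from Little's law: outstanding I/Os = IOPS * latency."""
    if latency_seconds <= 0:
        raise ValueError("latency must be positive")
    return queue_depth / latency_seconds

ASSUMED_LATENCY = 0.005  # 5 ms per I/O, hypothetical

for qlen in (32, 128):  # the two QLEN values mentioned above
    print(qlen, round(max_iops(qlen, ASSUMED_LATENCY)))
```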

For management in the raid0 setup we use STORCLI, which allows us to configure physical disks on the fly with no need to restart servers.

Txs

Ezequiel

0 Kudos
DrewDeM
Enthusiast
Enthusiast

Good news: the PERC H730 has been added to the HCL. I'm going to start testing with the new firmware.

Release: ESXi 5.5 U2
Driver: megaraid_perc9 version 6.901.55.00.1vmw
Firmware Version: 25.2.1.0036

VMware Compatibility Guide: vsanio

0 Kudos
asoroudi
Contributor
Contributor

Drewdem,

When you say new firmware for the PERC H730, can you clarify? Is there a beta firmware that you are using that can be downloaded?

Also, any luck with the passthrough?

Thx.

0 Kudos
DrewDeM
Enthusiast
Enthusiast

Didn't see this when it was posted unfortunately.

What I was referring to at the time was actually the driver linked in my post.

0 Kudos
elerium
Hot Shot
Hot Shot

Has anybody out there been able to get H730 series RAID cards working well in vSphere 6.0 under passthrough? I have a development cluster that ran 20 days with no issues under a raid0 config, but under pass-through/HBA mode, hosts randomly PSOD after about 7-10 days. The PSOD errors come back with "Megaraid_SAS hardware critical error returning failed". The VMware HCL recommends firmware 25.2.1.0037 and the Inbox driver, but that firmware/driver combo isn't even detecting my disks.

I've tried the firmware and drivers below. After a PSOD, a restart fixes it, but it is still obviously a problem to have systems PSOD.

Firmware: 25.2.1.0037, 25.2.2.004

Drivers: Inbox, megaraid_perc9 version 6.901.55.00.1vmw, megaraid-perc9 version 6.901.57.00-1OEM, lsi-mr3 version 6.606.12.00-1OEM

I've seen at least one PSOD on each of the driver versions above except for Inbox. I can't get the Inbox driver working because it doesn't even detect the drives I have plugged in, in HBA mode. I may have to rebuild with raid0; I'm just seeing if anyone else has a success story with Dell H730 series RAID controllers and HBA/pass-through mode.

0 Kudos