VMware Cloud Community
motorad
Contributor

Dell PERC H730p / LSI 3108 /Invader implementations

Hello, would anybody happen to have any guidance or a proven config using the PERC H730p/LSI 3108/Invader controller (FW 25.2.1.0037) in pass-through with VSAN (ESXi 5.5 build 2143827)? We are having stability issues that manifest as PSODs and intermittent permanent disk failures on a VSAN platform built on the above in Dell R730 chassis, with disk groups of Fusion-io ioScale cache devices fronting Seagate 10K v7 ST1200MM0007 drives.

 

Common log events include “firmware in fault state” for the HBA, along with resets and aborts for the individual disks. Error counters on the individual drives increment in correlation with these events.
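In case it helps anyone compare notes, this is roughly how we have been pulling those events out of the logs on an affected host (standard ESXi 5.5 log locations, nothing controller-specific assumed):

grep -i "fault state" /var/log/vmkernel.log

grep -iE "abort|reset" /var/log/vmkernel.log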

 

We have tried different HBA drivers, from the inbox mr3 (0.255.03.01-2) to the latest known PERC9 driver (6.901.55.00.1, which we are currently evaluating), including some of the mr3/megaraid drivers in between (6.605.10.00-1, 06.803.52.00, 06.803.73.00). The fallback of RAID 0 has passed tests so far, but we all know what that means.
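For reference, this is how we confirm which driver VIB is installed and which module each adapter is bound to before and after a swap (plain esxcli, nothing vendor-specific):

esxcli software vib list | grep -iE "mr3|megaraid|perc"

esxcli storage core adapter list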

 

We know this configuration is not currently listed on the HCL. We have cases open with both VMware and Dell, and are in communication with LSI.

 

 

Any guidance would be greatly appreciated.

77 Replies
madnote
Contributor

I too am having VSAN issues and thought I would share my experiences.

PowerEdge R730xd

PERC H730 Mini (Embedded) Running in RAID 0

    Firmware: 25.3.0.0016

    Driver: 6.605.08.00

Dell 400Gb SATA SSD drives (Intel S3610)

Dell 10k RPM SAS Drives as capacity drives

vSphere 6.0

Under heavy load (rebuilding an OLAP cube) our SSD drives would report permanent failures across all disk groups on a host.  This was happening nightly until we stopped processing the data cube.  I have spent numerous hours with both Dell and VMware support.  We are in the process of swapping the SATA drives for SAS drives; Dell stated that under heavy write load the PCB on the SSD issues a reset command, which then causes VSAN to mark the drives as permanently failed.
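For what it's worth, when a disk group goes into that state I have been checking it from the host with the stock VSAN tooling, to see which devices VSAN has claimed and what state they are in (nothing exotic here):

esxcli vsan storage list

vdq -q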

VMware wants me to update the driver in ESXi on the hosts to: lsi-mr3 version 6.606.12.00-1OEM
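If I do go ahead with it, my understanding is that it is the usual offline-bundle procedure, roughly as below; the bundle path is just a placeholder for wherever the lsi-mr3 download ends up (maintenance mode first, reboot afterwards):

esxcli software vib install -d /vmfs/volumes/datastore1/lsi-mr3-6.606.12.00-1OEM-offline_bundle.zip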

I asked for specific reasons as to why they think that will fix this issue, as well as any release notes, and cited this thread; I have heard nothing back on the support ticket, now going on day 2.  I just don't want to cause more harm at this point. VMware hasn't been very responsive throughout this whole process.  Dell hasn't been great either, but better than VMware, I must say.

I will try and post back after I replace those SATA drives with the SAS drives.

elerium
Hot Shot

I would update the driver, since VMware's standard (non-VSAN) HCL matches the driver to the RAID firmware version.


You can see at this link, VMware Compatibility Guide: I/O Device Search, that the corresponding driver for firmware 25.3.0.0016 is lsi-mr3 6.606.12.00-1OEM.

I updated to the firmware/driver combo shown above on my R730xds, and it has been working well for the last two weeks. I am not using SATA SSDs, though, so that may be a whole other issue.

jonretting
Enthusiast

"We are in the process of swapping the SATA drives with SAS drives as Dell stated it was an issue with heavy write load with those drives that the PCB on the SSD drive was issuing a reset command which would then cause VSAN to list drives as permanent failure." -madnote

Yes! SATA in any disk controller form is inadequate for VSAN use. SATA has no knowledge of previously issued commands, so there is no way to cancel an active or queued command. The moment the system calls for a command to be cancelled, SATA has no way to deal with it, and in order to fulfill the request it resets the disk. During the reset the drive is inaccessible, and crazy latencies can be seen in the event viewer. SATA really is a "legacy" interface and has no business whatsoever fronting very low-latency flash (especially in a read/write cache use case). You can never go wrong if you use PCIe NVMe for your performance tier; in my experience it improves storage stability many times over SAS SSDs. There is just no way to cheap out on your VSAN performance tier.

Thanks,

-Jon

JohnNicholson
Enthusiast

Uhhhhhh, if SATA doesn't track commands that have been queued, then what does Tagged Command Queuing do?

Not going to argue that NCQ and NVMe don't have deeper queues, but we use Intel S3700s in production with VSAN just fine.  The issue with the H730P is that there are firmware/driver problems that are about to be resolved (an HCL update is pending).  Dell has been working on this for months, and if they told you to swap drives because of this, it's likely because you had drives that were not on the HCL (like some of the cheap LiteON drives they will sell, which IMHO are grossly inadequate for any server usage, with terrible performance consistency and endurance well below the 10 DWPD that the VSAN HCL mandates).

tehkuhnz
Contributor

Madnote:

Just got confirmation from Dell, VMware, and our internal validation that the following driver/firmware combinations now seem to be stable when using SATA SSDs as capacity drives. Specifically, we saw massive issues when we started stressing our systems with heavy IO patterns (drives falsely reported as offline, PSODs, and sporadic latency issues).

The stable config for us is below:

We are using Intel DC S3610s in HBA mode for capacity and Intel DC P3700 NVMe AICs for flash cache.

Have you updated your backplanes? Going to backplane version 3.03 and PERC firmware 25.3.0.00016 seemed to clear things up for us.

I know it's not the same setup as yours, but I thought I would update you with our latest findings.


JohnNicholson
Enthusiast

For the H730 Series I'm hearing that this is the pending update...

H730 controller series with ESXi 5.5u2
New recommended firmware version: 25.3.0.00016
New recommended driver: megaraid_perc9 version 6.902.73.00

H730 controller series with ESXi 6.0
New recommended firmware version: 25.3.0.00016
Recommended driver: continue using lsi_mr3 version 6.605.08.00-6vmw.600.0.0.2494585
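Once that lands, it should be easy to sanity-check what a host is actually loading with standard commands, nothing Dell-specific (swap in megaraid_perc9 for lsi_mr3 on the 5.5u2 combination):

esxcli storage core adapter list

esxcli system module get -m lsi_mr3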

tehkuhnz
Contributor

Have you heard any word on the backplane firmware? 3.03 is working stably for us.

I will push through my channels to try to make sure that is added; it is important to note that the backplanes need to be flashed as well.

madnote
Contributor

Just confirmed we are using the following driver and firmware versions:

H730 controller series with ESXi 6.0

New recommended firmware version: 25.3.0.00016

Recommended driver: continue using lsi_mr3 version 6.605.08.00-6vmw.600.0.0.2494585

BackPlane: 3.03

We are using the Intel DC S3610 SATA SSDs as our flash tier.  We just had another instance where a permanent failure showed up on the SSDs and I needed to reboot the host.  Dell had better get those SAS drives here soon.

tehkuhnz
Contributor

You might want to try changing these settings and doing a host reboot.

VMware support had us try this first - and these settings have been applied throughout all of our testing.

(So it might be that, in addition to the firmware levels, you also need to apply these settings.)

esxcfg-advcfg -s 40000 /LSOM/diskIoTimeout

esxcfg-advcfg -s 5 /LSOM/diskIoRetryFactor
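The same options can also be set and read back through esxcli if you prefer that front end; these should be the same underlying advanced settings:

esxcli system settings advanced set -o /LSOM/diskIoTimeout -i 40000

esxcli system settings advanced set -o /LSOM/diskIoRetryFactor -i 5

esxcli system settings advanced list -o /LSOM/diskIoTimeout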


Bleeder
Hot Shot

Interesting, there are similar disk IO timeout settings mentioned in the following VMware document for the HP P440/P440ar/H240/H240ar controllers.

https://partnerweb.vmware.com/programs/vsan/KB_P440_H240_Controller_Advanced_Settings.pdf

madnote
Contributor

Interesting.  I ran:

esxcfg-advcfg -g /LSOM/diskIoTimeout

Result was:

Value of diskIoTimeout is 20000  (guessing this is a time threshold before a retry happens)

esxcfg-advcfg -g /LSOM/diskIoRetryFactor

Result was:

Value of diskIoRetryFactor is 3  (guessing once this hits a value of 3 my disks report permanent failure)

It would perhaps make sense to bump these up if that is the case, but seeing as I can find no documentation on these other than that link, I will probably hold off.

My drives did arrive, so I am in the process of evacuating data off the disk groups before I swap the drives on a host-by-host basis.  The Dell rep I spoke with this morning also mentioned that with firmware v25.3.0.0016 I should be able to run the disks in HBA mode.  I am currently running driver version 6.605.08.00 and am trying to confirm with Dell whether I should update the driver or not.

JohnNicholson
Enthusiast

Do you know which SATA drives you're replacing?

jonretting
Enthusiast

When the command is running, there are no take-backs (as I recall). NVMe has a theoretical maximum of 65,536 queues with 65,536 commands per queue. I have personally tested the S3700 in my VSAN extensively over a couple of months, and I can say without a shadow of a doubt they should not be on the HCL. They have the same problems as other AHCI drives, just far less frequently, and they are extremely slow compared to SAS/PCIe. Moreover, they are a huge bottleneck all around; just run a Data Protection performance test and watch your infrastructure crumble. If you have a demanding environment where you need to be certain about latency and throughput, queue depth matters. My consumer Intel 750 series NVMe drives perform many times faster than my SAS SSDs. Obviously throughput is higher, given the larger 20GB two-way PCIe x4 link, but where things really shine is latency and queues. Re-syncing all your VM storage policies and running a VDA perf test while simultaneously running benchmarks on the client, one never sees a degradation of the user experience or >2ms latencies. In your case it would seem the hardware is meeting the needs of the tasks, IMHO.  Thanks, -Jon

JohnNicholson
Enthusiast

Not disagreeing that our needs are modest for the most part (a few thousand IOPS internally), but I've seen some steady bursts without issues on faster drives.

Why would you constantly be trying to cancel commands, though?  That just seems like an odd thing to do; generally, once the guest VM has sent a SCSI packet, if it wants it replaced it just waits on the ACK and sends another one.

We did some benchmarks (as did others) with the drives and got pretty good numbers (20-30K with SCSI vtraces of our screwball workload).

For midsized deployments where you're replacing a traditional spinning-disk array (AMS2500/EqualLogic/EVA), the S3700s are quite good in a hybrid configuration, and "32" is often good enough.  The giant enterprise HDS array I'm working on today (G400) has a maximum LUN queue depth of 32 as it is, and if I want to send more than that to a VM I would need to stripe across VMDKs or RDMs (although this is FC, so I do get the ability to cancel commands, I guess). You can shove a lot of IOPS down a 32-command pathway.

Now if you're in all-flash-array territory, or this is replacing a modern-era hybrid array or something, you're right: NVMe/PCIe is likely warranted (and with the newest-generation servers it's actually an option; a lot of our clusters are over a year old).

I've got a friend who has an Intel 750 drive, and it was fun to point out to him that it's actually fighting for PCIe throughput with his pair of Titans.  That said, it still played games very well and made him happy, so mileage will always vary.  For the mid-market it's not about having the fastest setup; once you hit acceptable performance, product selection becomes about TCO.

jonretting
Enthusiast

We are on the same page, no worries. To answer the command question I would have to dive into AHCI controller internals, and I'm not into this subject enough to do that again for exacting details 😉

I got those same numbers ("20-30k"); however, they varied drastically when object/component operations were taking place. I also had a hard time with stability when, say, applying a real workload to a client file server. Granted, mine could have been a fluke, but I made sure to spend a couple of months tinkering with it. The only other AHCI drive that performed better was an M.2 Samsung, but it didn't have the enterprise queue resilience of the S3700s (I couldn't rule out drivers for the M.2). Personally I always try to go overboard, especially when implementing a VSAN production system. With more capabilities come varying degrees of new workloads.

Utilizing NVMe and/or PCIe really isn't just applicable to, say, all-flash or VDI. It is a godsend with magnetics; the difference was much like getting my first SSD in '07. NVMe costs have come down considerably this year, and as you mentioned, near-hot-swap NVMe backplanes and servers are everywhere.

Thanks,

-Jon

JohnNicholson
Enthusiast

We had issues initially, but they were all tied to LSI/Avago code (the LSI 2008s in the private beta proved hilariously unreliable on writes, and the LSI 2208s had stability issues in pass-through, which we worked to get revoked from the HCL, and we had to move to RAID 0).

I'm curious whether your problems were actually related to LSI, and whether moving to PCI-Express/NVMe just freed you from trying to get their silicon to do something it didn't want to do.

jonretting
Enthusiast

Well, the tests initially started on the LSI 2308, and I went through all the firmwares and all the drivers. As you say, the controllers may just not have liked it, so I then borrowed four Dell H330s and had the same result. The only thing missing from my tests was DP hosts, yet I can't see how that would have improved things. You are totally correct that PCIe/NVMe allows your HBAs to do what they do best. Cheers, -Jon

madnote
Contributor

Got confirmation that for the H730 controller we should run driver lsi_mr3 v6.606.12 with firmware v25.3.0.0016.  VSAN Health Check flags the driver version, but I'm guessing it is the best option, since the HCL doesn't seem to be keeping up very well with all this.  The Dell tech said he ran it in a lab just to confirm.

I have migrated one host so far with the new SanDisk SAS SSD caching drives, and throughput appears to have doubled using HBA mode as well. I also disabled the caching on the card, just on a hunch, since these cards seem problematic. I would be interested to hear how other people have that setting configured.

madnote
Contributor

The SATA drives we are replacing were Intel SSD DC S3610s, and they were used as flash-tier drives.  The new ones are Dell-branded but show up as SanDisk LT0400MO in iDRAC.

jonretting
Enthusiast

Your hunch was correct: all disk/controller/array caching should be disabled in pass-through mode. I don't recall whether you are on RAID 0; if you are, some people have enabled caching and seen performance benefits. However, I lean heavily toward those setups not having a SAS/PCIe performance tier, with the cache merely masking the effects of a single symptom. As you begin testing various things, be sure to run a vSphere Data Protection perf test, re-sync some large objects, and get some client-end benchmarks while that is running. You should also be safe running the Multicast Performance test in VSAN Health; VMware recently confirmed they throttle that test, probably to avoid contention. I am especially interested in your results.  Best, -Jon
