Hello,
Can someone tell me if the LSI SAS 9207-8i (which is certified for 6.0 U2) still works without problems in 6.5.0a? (It's not on the 6.5 vSAN HCL yet.)
Kind regards,
Steven Rodenburg
Hi Steven
I just installed vSAN 6.5a using LSI SAS 9207-8i Controllers (2-Node Robo, All flash, direct attached) and so far I haven't experienced any issues at all using IT Firmware 19 and the drivers delivered with ESXi 6.5a. Performance has improved over 6.0u2.
Cheers,
Brian
I think it will work just fine; its branded brother (the HP H220) is vSAN 6.5 certified.
VMware Compatibility Guide - I/O Device Search
^^ open this
It is not yet certified by VMware for vSAN 6.5, but it should work fine. You will, however, see a warning on the cluster for this.
I will keep you posted here in case I get some more details on this.
Cheers!
-Shivam
Thank you Brian, that's good to know. Thanks for sharing your experience with us.
Cheers!
-Shivam
Thanks everybody for your feedback. I bit the bullet and did the upgrade. Everything works fine so far. I disabled the "Controller Firmware and driver" alerts to make the system shut up about not being HCL compliant... 😉
I also use IT Firmware 19 (as 20 is a disaster) and the drivers that come with 6.5.0a.
That's great Steven. Keep the good vibes flowing, and the happiness growing.
Cheers!
-Shivam
I have been running with the warnings for the last 4 months with no issues. Four-node cluster, 115 TB.
When is VMware coming out with the updated driver? It would be nice to get rid of the warning.
Now running vSAN 6.6 with no issues. Still have all the warnings.
A few months have passed since I wrote my opening post.
Situation today: with the exception of one or two models (the 9361-8i and 9362-8i), the LSI controllers have, after all these months, still not made it onto the 6.5 HCL and probably never will.
It seems VMware and/or LSI are not interested in certifying the rest of the LSI adapter portfolio, which was fully supported up until version 6.5. And those adapters work fine too, by the way.
Customers with LSI controllers from the pre-6.5 era are essentially stuck with the warning in vCenter that their environment is not, and never will be, HCL compliant. One cannot expect customers to rip all their LSI controllers out of all their nodes and replace them with newer models.
I find this strategy disappointing. If this is how vSAN hardware compatibility works now and in the future, it leaves a bad taste in my mouth...
Has anyone tested this card with vSAN 6.7.0 U1 ("LSI SAS 9207-8i and vSAN 6.7.0 U1")?
Is performance good?
Thanks,
Manivel R
Hello Manivel,
We have a cluster that started out with this card and vSphere 6.0 and the card was certified back then. Since then, we kept upgrading it through 6.5 and now to 6.7 U2 and it still works fine. We run 8 drives on each card (2 diskgroups, each with 1x SAS SSD and 3x SAS 10k HDD) and performance is ok.
Important: LSI card firmware v19. Not v20 or higher, as you will get stability issues; downgrade cards to FW v19 if needed. v19 is also the only certified FW version, and for good reason.
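For reference, this is roughly how you can check and downgrade the firmware with LSI's sas2flash utility. This is a sketch, assuming sas2flash is installed on the host and you have downloaded the Phase 19 IT firmware package; the image filename below is a placeholder, not the actual file name.

```shell
# List all LSI SAS2 adapters with their current firmware and BIOS versions:
sas2flash -listall

# Downgrading (e.g. from v20 back to v19) requires advanced mode (-o).
# "9207-8i_IT_FW_P19.bin" is a placeholder for the real P19 IT image:
sas2flash -o -f 9207-8i_IT_FW_P19.bin
```

Reboot the host afterwards and run `-listall` again to confirm the card reports the v19 firmware.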
We also have an HP ProLiant DL380 Gen8 cluster with the HP version of this exact card (it simply has an HP sticker on it). Dell and Lenovo etc. also used the exact same card, branded as their own.
Kind regards,
Steven Rodenburg
"I have 6 Dell PowerEdge R820 rack servers.
Each server has 2 disk groups, the same as your design, except ours is an all-flash vSAN.
The SSD vendors are Micron (capacity disks) and ADATA (cache disks).
1st DG: 1x 1 TB cache tier & 3x 2 TB capacity tier disks
2nd DG: 1x 1 TB cache tier & 3x 2 TB capacity tier disks
The disks are SSD and the bus protocol is SATA (I see this info in the DRAC under storage)."
I think your problem could (also) be SATA. The SATA protocol is half-duplex (it can only read OR write per turn, not simultaneously like SAS, which is a full-duplex protocol). So that is one issue with SATA. SATA devices also have a queue depth of 32, versus 256 for SAS.
SATA is acceptable if one does not push them too hard.
We have large all-flash clusters with SATA flash devices (enterprise grade and vSAN HCL certified) and they perform OK, as long as we don't hammer them too hard. If we go bonkers on those hosts, we run into performance issues. That's expected, and these clusters were never intended to be pushed that hard, so it's fine for us.
In other words, SATA flash devices do have a valid use case (which is also why some devices are good enough to make it onto the HCL). Just don't expect wonders from them. SAS devices are faster in higher performance envelopes, and NVMe devices even more so.
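If you want to see the queue depth difference for yourself, you can read it straight off the host. A quick sketch, assuming SSH access to the ESXi host (the grep pattern just trims the output down to the device names and their queue depth lines):

```shell
# List all storage devices and filter for their names and queue depths.
# SATA devices typically report a max queue depth of 32; SAS devices
# report considerably more.
esxcli storage core device list | grep -E "^naa|Queue Depth"
```

Compare the "Device Max Queue Depth" values against what your workload actually needs per device.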
In your case, it's more likely the controller being saturated. But to make a truly informed decision, run a SexiGraf virtual appliance. It shows you exactly where the bottleneck is. If it's the controller, then you empty your wallet on new controllers. That is what I would do: measure first, then decide and invest.
The 9207 was the top-of-the-line SAS 6g card back then. No LSI card from that era (6g) was all-flash certified. Hybrid only.
Anything SAS all-flash certified is 12g SAS, like the LSI 9300-8i or 9305-16i, which are good cards and certified for 6.7 U2 as well. You just have to make sure they work with the 6g SAS backplane in your servers. Your devices are 6g SATA and again, I hope that in your use case you are not beating the crap out of SATA devices in the first place (for the reasons I spoke of above).
What you might lose when you go away from DELL controllers, is backplane control. SES (the SCSI Enclosure Services protocol) will likely not be able to get info from the backplane anymore like device-position, thermal status, firmware functions, all that stuff (it depends). Maybe get the equivalent card from DELL instead of LSI. It's the same chip, just with DELL modifications to be able to talk to the backplane.
Thanks so much Steven for the detailed response. I will take a look at the SexiGraf virtual appliance to understand where the bottleneck is.
On the Dell R820, as I said, the H710P is very bad and there are no other RAID controller options from Dell. I thought about going with the PERC H730 RAID controller, but it is not compatible with my Dell R820 servers.
If I go with either the LSI 9300-8i or 9305-16i, I don't know whether my servers support them or not ("Anything SAS all-flash certified is 12g SAS like the LSI 9300-8i or 9305-16i... You just have to make sure they work with the 6g SAS backplane in your servers").
I need to take a look with the help of a DC technician.
Thanks,
Manivel R
Hi Steven,
I'm trying to dig into my PERC H710 RAID controller with this SexiGraf tool, to see whether my RAID controller is saturated or not.
Where can I find that info in SexiGraf? Any ideas, please let me know.
Thanks,
Manivel R
Hello Manivel,
Concerning SexiGraf: you must use the "VMware VSAN Monitor" entries. Then, in the "layer" dropdown-box, select "client" to see if the vSAN Layer attached to the VM has issues. From the same dropdown-box, select "disk" to see what happens on the disk-layer. Look for Congestion and Outstanding IO as key indicators.
Don't be alarmed when Read-cache-hits read 0. That's normal as all-flash does not have read-cache in that way (hybrid does).
Also, use "esxtop" to fetch the queue depth of the controller and see if the number of read and write entries hits the adapter's limit during periods when you have performance problems. If that happens, the controller definitely cannot handle the sheer amount of traffic.
There is a lot of information on the net on "what to look for" besides the things I wrote above. Pasting 1000 screenshots here, asking "is this good, is this bad?" is not optimal. I also don't have a lot of time, but maybe others have. Teach yourself the key indicators of vSAN performance to make an informed decision as to which component is the culprit.
If the controller can handle it easily (the queue depth limit is not or rarely reached) but the disks themselves struggle (huge individual disk latencies), then you are pushing the performance envelope of what the disks can handle (the bottleneck is most likely SATA, not the flash itself) and upgrading the HBA controllers will not bring you anything.
If the controller has smoke coming out of its ears under load while the disks are just waiting to be fed, then that's your bad guy, etc. etc.
Something else: if you change controllers in a node, make sure you completely evacuate all disk groups in that node and empty out the drives (reboot, then select "delete all partitions" to make them empty). The new controller sometimes (especially if it's a very different model) gives different names to the same disks, which screws up the local host. So empty everything out completely, then change controllers and reclaim the disks (which only works when they are empty). Then let it rebuild the disk groups, resync, etc. before you go on to the next host. Repair object problems afterwards if needed, and always make sure vSAN health is completely green before you move to the next node.
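The per-host evacuation flow above can be sketched with esxcli, assuming SSH access to the host; the naa.* device names below are placeholders you would replace with your own cache and capacity device identifiers:

```shell
# 1. List the vSAN disk groups and their member devices on this host:
esxcli vsan storage list

# 2. Remove a disk group by its cache-tier SSD, fully evacuating the data
#    first (this can take a long time on large disk groups):
esxcli vsan storage remove -s naa.5000000000000001 -m evacuateAllData

# 3. After swapping the controller and clearing the drives, recreate the
#    disk group: the cache SSD (-s) plus the capacity disks (-d, repeatable):
esxcli vsan storage add -s naa.5000000000000001 -d naa.5000000000000002
```

Let the resync finish and confirm vSAN health is green before moving on to the next host.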
Thanks very much for the detailed explanation on this topic. Much appreciated.