VMware Cloud Community
freejak04
Contributor

Multihost with DroboElite disconnecting during high i/o

We purchased a DroboElite for our QA environment, which currently has three ESXi 4 hosts. Under heavy load, iSCSI performance degrades rapidly, usually to the point where the LUN gets disconnected from the host. Sometimes the host reconnects automatically; other times, a reboot of the DroboElite is required. I've been back and forth with Data Robotics for weeks troubleshooting the issue without any success. I've made the changes to the HB timeout settings in the 'hidden' console as suggested by DR and also tried connecting to two different gigabit switches (Dell PowerConnect). Nothing has helped thus far.

Does anyone have experience with these units? Any suggested configuration changes I can make?

Thanks!

0 Kudos
73 Replies
BradMDRI
Contributor

Hi All,

I wanted to let everyone know that we have been testing a fix to the disconnect problem on the DroboElite and have posted the new DroboElite firmware, version 1.0.3, on our website at www.Drobo.com. If you have Drobo Dashboard running and monitoring the DroboElite you should get prompted to update the firmware automatically. If you are not running Drobo Dashboard on a regular basis then you should either run Drobo Dashboard to get the automatic update for your DroboElite or manually download the new 1.0.3 firmware and follow the manual update procedure that you will find on our website.

I want to thank everyone on this thread for helping us work through this issue and beta testing the new firmware. We think we have solved this issue and we're looking forward to your feedback on this new release. Please feel free to post your results here or send me a direct email at bmeyer@datarobotics.com with your comments.

Brad Meyer

DroboElite Product Marketing Manager

0 Kudos
eswing
Contributor

We have updated to firmware 1.0.3 and are still experiencing disconnects during high usage. This has happened several times now and is getting frustrating. Is anyone else having this problem with the new firmware?

0 Kudos
jmvirtual
Enthusiast

I have been reading this seemingly hopeless thread. Firmware 1.0.3 has not fixed any of my disconnects during high I/O. I have a Dell R815 server and the Drobo is directly connected to it, so I have removed the switch from the list of possibilities. I have one VM on my local datastore and it is smoking fast and I never have issues with it. I have replaced one Drobo entirely and the new one is producing the same issues. I applied all of the settings recommended in the BP guide and enabled jumbo frame support on all devices.

I will state that I do have two misaligned VMs, but there is no free solution that fixes this issue for existing VMs or P2V machines. I am hesitant to buy something only to find out that it doesn't help my situation.
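For anyone wanting to check whether a guest partition is misaligned, the arithmetic is simple: the partition's starting byte offset has to be a whole multiple of the storage block boundary. Below is a minimal sketch of that check. The 4096-byte boundary and the 512-byte sector size are illustrative assumptions, not values published for the DroboElite, and `is_aligned` is a hypothetical helper name.

```python
SECTOR_BYTES = 512  # assumed logical sector size


def is_aligned(start_sector: int,
               boundary_bytes: int = 4096,  # assumed boundary, adjust for your array
               sector_bytes: int = SECTOR_BYTES) -> bool:
    """Return True if the partition's starting byte offset is a multiple of the boundary."""
    if start_sector < 0 or boundary_bytes <= 0 or sector_bytes <= 0:
        raise ValueError("sector, boundary and sector size must be positive")
    return (start_sector * sector_bytes) % boundary_bytes == 0


if __name__ == "__main__":
    # 63 is the legacy default start sector used by older partitioning tools;
    # 128 and 2048 are commonly used aligned start sectors.
    for start in (63, 128, 2048):
        state = "aligned" if is_aligned(start) else "misaligned"
        print(f"start sector {start}: offset {start * SECTOR_BYTES} bytes, {state}")
```

A partition starting at sector 63 sits at byte offset 32256, which is not a multiple of 4096, which is why VMs built or converted from older OS installs tend to show up as misaligned.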

One other note: DRI support is M-F 6-6 PST and there is no after-hours or weekend support available. When the first Drobo tanked on me Sunday at 9 PM CST, I ended up calling the EU at 4 AM to get help and still had to wait another 4 hours for Drobo US to wake up, issue us an RMA, and ship it next day.

0 Kudos
eswing
Contributor

I worked with Data Robotics support and seem to have the problem fixed. The issue was jumbo frames; once I disabled them, everything worked.

0 Kudos
jmvirtual
Enthusiast

As in setting the MTU back to 1500?

0 Kudos
eswing
Contributor

Yes, setting everything back to an MTU of 1500. Have not had a problem since.
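If you want to confirm that every hop really agrees on the frame size after a change like this, a don't-fragment ping with the largest payload that fits the MTU is a common test. Here is a rough sketch of the payload arithmetic, assuming plain IPv4 with no header options (20-byte IP header plus 8-byte ICMP header); `max_ping_payload` is just an illustrative name.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption: no options in use)
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in a single unfragmented IPv4 packet."""
    overhead = IPV4_HEADER_BYTES + ICMP_HEADER_BYTES
    if mtu <= overhead:
        raise ValueError(f"MTU must be larger than {overhead} bytes")
    return mtu - overhead


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: ping with a payload of {max_ping_payload(mtu)} bytes")
```

That works out to 1472 bytes for an MTU of 1500 and 8972 bytes for an MTU of 9000. If a don't-fragment ping of that size from the host's VMkernel interface to the Drobo's iSCSI address fails (for example with vmkping, on ESXi versions where it supports the size and don't-fragment options), some device in the path is using a smaller MTU.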

0 Kudos
eswing
Contributor

I stand corrected: the timeout problem came back even with the MTU set to 1500, so the unit seems to have a problem with both frame sizes. I have been working with Data Robotics support, but so far do not have a final answer on this. Is anyone else experiencing this?

0 Kudos
arsprod
Contributor

I just wanted to add an update to this conversation. I just packed up my evaluation DroboElite unit. As someone said in another thread, this should not have been VM certified. Drobo is still dealing with the high I/O disconnecting problems. I was using the firmware released after 1.0.3, which apparently is worse. Their fix was to put me back on 1.0.3 - and the unit wouldn't accept a downgrade of the firmware. Their support was very good and the techs were attentive, but I think they're dealing with some inherited engineering issues. It's too bad - I had high hopes for this solution.

0 Kudos
DSTAVERT
Immortal

There is almost no published information about the internal details of this device: RAID cache, processor, RAM, etc. I would opt for a single-processor server from Dell, HP, or IBM and add a software solution. There are several VMware-approved software iSCSI solutions: Open-E, StarWind, etc.

-- David -- VMware Communities Moderator
0 Kudos
eswing
Contributor

I am also in the process of returning the DroboElite for the same reason: it disconnects under high usage, such as a clone. We are replacing the unit with CORAID, which is at a similar price point and works. I am very surprised the Drobo is certified given the problems.

0 Kudos
arsprod
Contributor

Interesting - I had not heard of CORAID; it looks really cool. How long have you been using it?

0 Kudos
eswing
Contributor

We have been using CORAID for a few years and are a reseller. We were looking at expanding our storage offerings with the DroboElite, but now we will not be doing that.

0 Kudos
KevinEpstein
Contributor

hi All --

I've read through this thread with great interest, and would like to extend an open invitation to the VMware community.

First, some brief context:

o I'm actually a VMware alum ('02-'06), big VMware fan, run the alumni list, etc -- you can check me out on LinkedIn -- so please realize I have a bias for VMware.

o 90 days ago, I joined Drobo as VP Marketing and Products.  I have to say I'm impressed -- the team reminds me of VMware in those early days.

o That said, like those early days, there are some glitches (anyone remember ESX 1? <grin>)

o Specifically, the original DroboElite had both firmware and some positioning issues.

With that said...

1. Positioning:  the Elite (and the next incarnation, with some nice hardware upgrades, the B800i) are designed for SMBs -- that is, small and medium businesses.  This means companies of up to about 200 people, with the accompanying load.  Do not attempt to run a 500-mailbox Exchange server on these Drobos -- or a Visa transaction-processing VM, or 50 databases in VMs, etc... please.

2. VMware product:  any storage unit (not just a Drobo) will benefit from VMware 4.1 and higher.  There were substantive changes made to how the product handles storage / clustering / etc -- so again, if you're using iSCSI storage, please please please use VMware 4.1 and higher.

3. Drobo product:  that first release wasn't as stable as we'd have liked with certain VMware configurations.  Y'all are absolutely correct in your observations -- under certain circumstances, an internal process would stagnate, so the box would not reply fast enough and disconnects would happen. Not so great!

The good news:  we released not only the B800i (stronger, better, faster), but ALSO released new firmware v 2.0.2 for the Drobo Elite -- which, our customers (including some hard-core VMware engineers) tell us, seems to have resolved the disconnect issue (as well as improving performance).

SO: what's the ask?

I'd ask two favors of all of you.

Please download and try the firmware.  I think you'll find it improves things (on our site -- home page, support menu item, downloads).

And please, if you're still encountering issues, email me directly at kevin(underscore)epstein(at)drobo(dot)com

Many thanks, and best regards to you all.

-K

(PS - we do have 24x7 live US-based support... so please, do ping me if you ever get told otherwise!)

0 Kudos
unsigned1138
Enthusiast

I'll do the update this afternoon and report back....

edit: I'll have to schedule the reboot. I can't get it in this afternoon.

0 Kudos