VMware
62 Replies. Last post: Oct 19, 2006 6:40 AM by elebel

Re: Problem migrating a VM

15. Sep 6, 2006 8:52 AM in response to: steverding
kitcolbert (Expert, VMware Employees), 310 posts since Feb 15, 2006
I think it is a general problem; you're just getting lucky sometimes. For iSCSI, how busy is the network? I wonder whether storage updates are sometimes delayed by IP traffic, so that the source's file close or lock release is not propagated to the iSCSI server in time, which causes the destination to think the lock is still held. That's just a theory, of course. But from all the error messages I've seen, there seem to be problems at times when no VMotion is occurring. In the non-VMotion case, instead of failures you may just be seeing reduced performance.

Re: Problem migrating a VM

18. Sep 6, 2006 9:17 AM in response to: steverding
kitcolbert (Expert, VMware Employees), 310 posts since Feb 15, 2006
Yeah, I'm less concerned with throughput than with latency.

Hmm, so the Linux iSCSI target works fine? What are you using as your default target? This makes it seem like there's some problem with the default target (or am I missing something?).

Re: Problem migrating a VM

20. Sep 6, 2006 9:37 AM in response to: steverding
kitcolbert (Expert, VMware Employees), 310 posts since Feb 15, 2006
So the Linux one works but the hardware one doesn't. Are they on the same VLAN? What other differences in setup are there? Also (stupid question, but I have to ask), is the hardware target on our supported hardware list?

Re: Problem migrating a VM

22. Sep 6, 2006 1:51 PM in response to: steverding
slkiran (Enthusiast, VMware Employees), 55 posts since May 26, 2005
iSCSI is a standard, but we have seen different vendors interpret it differently, and the implementations also differ between vendors.

If you don't mind, what is the hardware target that you are using here?

Re: Problem migrating a VM

24. Sep 7, 2006 2:50 AM in response to: steverding
donbaek (Enthusiast, VMware Employees), 107 posts since Jul 25, 2005
Unfortunately, iSCSI is not just iSCSI: both the iSCSI initiator and all target systems have bugs, and some of them only show up with ESX and not with Linux, because ESX issues SCSI commands that Linux never would.

For instance, ESX relies heavily on the 6-byte RESERVE/RELEASE commands for atomic file operations on a shared VMFS. All the open-source iSCSI targets I have looked at (including Linux IET, OpenFiler, the Intel iSCSI target and the NetBSD iSCSI target) and a number of commercial ones (Open-E iSCSI and Wasabi Storage Builder) fail to implement RESERVE/RELEASE correctly, so you are really playing Russian roulette with your data if you run a shared VMFS off such a target on a production system. If you are not using a shared VMFS, most of them should work fine.
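To make the reservation semantics concrete, here is a toy Python model of how a target is expected to treat the 6-byte SCSI-2 RESERVE (opcode 0x16) and RELEASE (opcode 0x17) commands coming from two initiators. It is a simplified sketch, not how ESX or any real target is implemented: the `Lun` class, its methods and the initiator names are hypothetical, and third-party reservations and the other CDB fields are ignored.

```python
from dataclasses import dataclass
from typing import Optional

# Opcodes of the 6-byte SCSI-2 reservation commands.
RESERVE_6 = 0x16
RELEASE_6 = 0x17

# SCSI status codes a target returns to the initiator.
GOOD = 0x00
RESERVATION_CONFLICT = 0x18


def build_cdb(opcode: int) -> bytes:
    """Return a 6-byte CDB with every field except the opcode zeroed."""
    return bytes([opcode, 0, 0, 0, 0, 0])


@dataclass
class Lun:
    """Hypothetical model of one logical unit's reservation state."""

    holder: Optional[str] = None  # initiator that currently holds the reservation

    def reserve(self, initiator: str) -> int:
        # A reservation held by a different initiator must be refused.
        if self.holder is not None and self.holder != initiator:
            return RESERVATION_CONFLICT
        self.holder = initiator
        return GOOD

    def release(self, initiator: str) -> int:
        # Only the holder can clear the reservation; a RELEASE from anyone
        # else completes with GOOD status but changes nothing.
        if self.holder == initiator:
            self.holder = None
        return GOOD

    def write(self, initiator: str) -> int:
        # While the LUN is reserved, other initiators get a conflict.
        if self.holder is not None and self.holder != initiator:
            return RESERVATION_CONFLICT
        return GOOD


if __name__ == "__main__":
    lun = Lun()
    print(build_cdb(RESERVE_6).hex())  # 160000000000
    print(lun.reserve("esx-a"))  # 0 (GOOD)
    print(lun.reserve("esx-b"))  # 24 (RESERVATION CONFLICT, 0x18)
    lun.release("esx-a")
    print(lun.reserve("esx-b"))  # 0 (GOOD)
```

A target that answers the second RESERVE with GOOD, or lets esx-b write while esx-a holds the reservation, breaks the mutual exclusion that the shared-VMFS locking described above depends on.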

If you have a NetApp filer, I would recommend using it for production until more storage arrays get on the HCL. For testing and the like, I welcome any testing with unsupported arrays, since it sometimes helps us find issues we didn't know about.

As for your problem, the iSCSI part does seem to be working most of the way, but perhaps not all the way. First, are you running this on 100 Mbit NICs or through a 100 Mbit network? If so, there are known issues, and we require Gigabit NICs for this. The problem is that everything takes much longer at 100 Mbit when regular network traffic and iSCSI traffic have to share the same NIC, and this can sometimes push us past timeout limits you would normally never hit. If you are running at 100 Mbit, please try Gigabit and see whether the problem goes away.
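A rough back-of-the-envelope calculation shows why link speed matters here. The sketch below computes only the raw serialization time of a transfer at a given link speed, optionally with only a fraction of the link left for iSCSI; it ignores TCP/IP and iSCSI overhead, latency and retransmissions, and the 1 MiB transfer size and 50% share are illustrative assumptions, not figures from this thread.

```python
def transfer_ms(num_bytes: int, link_mbit_per_s: float, share: float = 1.0) -> float:
    """Raw serialization time in milliseconds for num_bytes on a link.

    share is the fraction of the link's bandwidth left for iSCSI when
    other traffic uses the same NIC (1.0 means the link is all yours).
    """
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    bits = num_bytes * 8
    usable_bits_per_s = link_mbit_per_s * 1_000_000 * share
    return bits / usable_bits_per_s * 1000


if __name__ == "__main__":
    one_mib = 1024 * 1024
    for speed in (100, 1000):
        print(f"{speed:>4} Mbit, dedicated:   {transfer_ms(one_mib, speed):7.1f} ms")
        print(f"{speed:>4} Mbit, half shared: {transfer_ms(one_mib, speed, 0.5):7.1f} ms")
```

Under these assumptions 1 MiB takes about 84 ms at 100 Mbit (about 168 ms if half the link is busy) versus about 8.4 ms at Gigabit, and every such delay counts against the same fixed timeouts.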

Second, could you please check the vmkernel log for warnings about reservations failing? We have seen some arrays fail with ESX because of a bug in the iSCSI initiator that prevents RESERVE/RELEASE from working, and that could be the cause of your trouble.
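If you want to script this check, a minimal Python filter could look like the sketch below. The exact wording of the vmkernel reservation warnings is not given here, so both regular expressions are assumptions to adjust after looking at your own log; the `/var/log/vmkernel` path in the usage comment is likewise an assumption (the usual location on an ESX 3.x service console).

```python
import re
import sys
from typing import Iterable, List

# Assumed patterns: a line must mention a reservation AND look like a
# problem report. Adjust both to the wording in your own vmkernel log.
RESERVATION_RE = re.compile(r"reserv", re.IGNORECASE)
PROBLEM_RE = re.compile(r"warn|fail|conflict|error", re.IGNORECASE)


def reservation_warnings(lines: Iterable[str]) -> List[str]:
    """Return the lines that look like reservation-related warnings."""
    return [
        line.rstrip("\n")
        for line in lines
        if RESERVATION_RE.search(line) and PROBLEM_RE.search(line)
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (path is an assumption): python scan.py /var/log/vmkernel
    with open(sys.argv[1], errors="replace") as log:
        for hit in reservation_warnings(log):
            print(hit)
```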

If none of the above applies, you should probably file a support incident with us, but let's get answers to the questions above first and take it from there.

Regards,
Thor

NB: I work for VMware, but opinions expressed above are my own and not necessarily those of my employer.

Re: Problem migrating a VM

26. Sep 7, 2006 7:01 AM in response to: steverding
donbaek (Enthusiast, VMware Employees), 107 posts since Jul 25, 2005
No, I saw that you used a linux-iscsi target only for a quick test and otherwise use an unsupported hardware target. I was just making the point that iSCSI is not simply iSCSI just because there is a standard; I was merely using the reservation issue on IET as an example of why that is not always the case. Also, since the iSCSI initiator used by ESX comes from the Linux world (linux-iscsi), I wanted to point out that you can still see errors with ESX that do not occur on Linux, even with the same initiator source code. The reason is that we sometimes issue different commands through the initiator (Linux never uses RESERVE, for instance, but we use it a lot).

Back to your problem: I would be interested in knowing whether sessions are dropped and re-established a lot. If you grep the vmkernel log for "iSCSI", do you see many messages indicating that a session was dropped and later re-established? This can happen if either side (initiator or target) does something the other side does not like. It takes a little time to re-establish a session, and if it happens often this delay grows, so relogins may actually take many seconds.
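Beyond a plain grep, a small script can tally how often sessions drop and come back. This is a sketch only: the keyword patterns below are hypothetical, since the actual wording of the initiator's messages is not quoted in this thread, so replace them with the phrases your log really uses.

```python
import re
import sys
from collections import Counter
from typing import Iterable

ISCSI_RE = re.compile(r"iscsi", re.IGNORECASE)
# Hypothetical keywords for "session dropped" and "session re-established".
DROP_RE = re.compile(r"drop|lost|closed", re.IGNORECASE)
RELOGIN_RE = re.compile(r"re-?establish|login|logged in", re.IGNORECASE)


def session_churn(lines: Iterable[str]) -> Counter:
    """Count iSCSI lines and how many of them look like drops or relogins."""
    counts: Counter = Counter()
    for line in lines:
        if not ISCSI_RE.search(line):
            continue  # only iSCSI messages are of interest here
        counts["iscsi"] += 1
        if DROP_RE.search(line):
            counts["dropped"] += 1
        if RELOGIN_RE.search(line):
            counts["relogin"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        print(dict(session_churn(log)))
```

Many drops paired with many relogins over a short period would point to the session churn described above.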

If you do not find any such warnings, I have one last thing for you to check before filing an incident: please make sure that each ESX server uses a different initiator name. If they use the same name, bad things will happen.
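The initiator-name check is easy to automate once you have collected each host's iSCSI initiator name in one place. The sketch below assumes you have already gathered a host-to-name mapping by hand; the host names and IQN values in the example are made up, and names are compared case-insensitively as a precaution.

```python
from collections import defaultdict
from typing import Dict, List


def duplicate_initiator_names(hosts: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each initiator name used by more than one host to those hosts."""
    by_name: Dict[str, List[str]] = defaultdict(list)
    for host, name in hosts.items():
        by_name[name.strip().lower()].append(host)
    return {name: sorted(group) for name, group in by_name.items() if len(group) > 1}


if __name__ == "__main__":
    # Hypothetical inventory in which two hosts share one initiator name.
    inventory = {
        "esx01": "iqn.1998-01.com.vmware:esx01-1a2b",
        "esx02": "iqn.1998-01.com.vmware:esx01-1a2b",
        "esx03": "iqn.1998-01.com.vmware:esx03-9f8e",
    }
    print(duplicate_initiator_names(inventory))
```

An empty result means every host has its own name; any entry lists the hosts that need a new one.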

Lastly, to answer your question:
"The incident: Does VMware support me when I use unsupported hardware?"

We make no promises other than that we will look at the logs.

You can tell the support folks that Thor from iSCSI dev asked you to file it, and please have them reference this as a possible case of bug 121440. The reason I want the logs in your case is that I am not 100% sure the problem is related to the target; it might be a generic problem that you could also hit on other (supported) targets, and that's primarily why I want to have a look.

If you want to make my job of going through the incident easier (and the chances of figuring out the problem higher), please reproduce the problem and run vm-support on both systems immediately afterwards. Also note the time on both systems before and after, so I can narrow my search to the interesting parts of the logs.
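Once you have noted the times, trimming each log to the interesting window takes only a few lines of Python. This sketch assumes syslog-style timestamps at the start of each line (for example `Sep  7 07:01:00`), which carry no year, so the year is passed in; because the two hosts' clocks may differ, run it separately per host with the start and end times you noted on that host. The window in the usage block is hypothetical.

```python
import sys
from datetime import datetime
from typing import Iterable, List


def lines_in_window(
    lines: Iterable[str], start: datetime, end: datetime, year: int
) -> List[str]:
    """Keep lines whose leading syslog timestamp falls within [start, end]."""
    selected = []
    for line in lines:
        # Collapse the padding in "Sep  7 07:01:00" before parsing.
        stamp = " ".join(line[:15].split())
        try:
            ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # no parsable timestamp (e.g. a continuation line)
        if start <= ts <= end:
            selected.append(line.rstrip("\n"))
    return selected


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical window; use the times you noted on this particular host.
    begin = datetime(2006, 9, 7, 7, 0, 0)
    finish = datetime(2006, 9, 7, 7, 10, 0)
    with open(sys.argv[1], errors="replace") as log:
        for hit in lines_in_window(log, begin, finish, year=2006):
            print(hit)
```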

