Re: HBA failure and multipathing

kinsden · ‎07-28-2008

Hello all,

This one is more of a query than a problem!

I am trying to understand what happens behind the scenes when one of the two HBAs in a host fails and the path fails over to the second one. What happens to ongoing VM activity while the path is failing over? For example, what if I am copying something from a physical or virtual machine to the VM on the SAN in question, or what happens to in-flight SQL transactions?

Any help to enlighten would be highly appreciated!!

Cheers!

kastlr · ‎07-28-2008

Hi,

Usually the VM has some kind of disk timeout; for Windows, for example, this timeout value should be set to 60 seconds.

This is usually done when you install VMware Tools.

So when your VM is unable to access a disk, it will retry for 60 seconds before it reports an error.

During this time frame, the ESX server must handle the error condition so that the VM stays alive.

Any IO operation from an affected VM will be frozen until the error condition is resolved.
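
As a rough illustration of the timeout rule described above, here is a minimal Python sketch. It is only a model: the function name and the 60 second default are assumptions for illustration (on Windows the real setting is the Disk TimeOutValue registry value), and it ignores retries inside the HBA driver.

```python
def guest_io_outcome(failover_seconds, disk_timeout_seconds=60.0):
    """Return what the guest sees for an IO issued just before the path failed.

    Hypothetical model: the IO is held while ESX fails the path over, and the
    guest only reports an error if the hold lasts longer than its disk timeout.
    """
    if failover_seconds < disk_timeout_seconds:
        # The IO is simply held; it completes late, once the new path is up.
        return "completed after %.0f s delay" % failover_seconds
    # The guest gave up waiting and reports a disk error to the application.
    return "disk error after %.0f s" % disk_timeout_seconds


for seconds in (5, 30, 90):
    print(seconds, "->", guest_io_outcome(seconds))
```

With these assumed numbers, a 5 or 30 second failover is only a delay for the guest, while a 90 second one turns into a disk error.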

Hope this helps a bit.
Greetings from Germany. (CEST)

mikepodoherty · ‎07-28-2008

Assuming that multipathing works as designed, there will be a delay while the OS attempts to use the existing path; the path then fails over and the transaction completes using the new path. If the failover doesn't take place within the specified timeout period, the system will normally crash. My experience with Oracle on Solaris is that the system appears to be alive but is completely hung, and you need to reboot.

I've seen backups run without problems during HBA failovers, and database transactions complete with a short delay.

If you are on good terms with your SAN team, have them disable the HBA port at the switch or SAN so you can verify that failover works properly. (Note: HBA on an FC SAN.)

Mike

kinsden · ‎07-29-2008

Thanks for your reply, Ghost (ooh..scary!!)

When you say IO operations will be frozen, what do you mean? Are the changes (IO) written to some kind of transaction log file, or does it take a snapshot of the memory?

kinsden · ‎07-29-2008

Thanks for your reply, Mike

I simulated an HBA failure by disabling the switch port. Before doing so, I started a batch file copy process. During the failover, the VM seemed to be in a 'sleep' state with no activity, and as soon as the second path became available, the copy process resumed. So it shows that it's working. But I am still unclear whether there will be any data loss, crashes etc.
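
To put a number on that 'sleep' state, one option is to time small synchronous writes while the port is disabled. The following Python sketch is an assumption about how such a test could look, not a VMware tool; the function name, block size and write count are arbitrary.

```python
import os
import time


def measure_write_stalls(path, writes=50, block=b"x" * 4096, pause=0.0):
    """Write small blocks with fsync and return the longest single write time (s)."""
    longest = 0.0
    with open(path, "wb") as handle:
        for _ in range(writes):
            started = time.monotonic()
            handle.write(block)
            handle.flush()
            os.fsync(handle.fileno())  # push the write down to the disk
            elapsed = time.monotonic() - started
            longest = max(longest, elapsed)
            time.sleep(pause)  # spacing between writes, 0 for back to back
    return longest
```

Run inside the VM against a file on the SAN-backed disk, for example `measure_write_stalls("testfile.bin", writes=600, pause=0.1)`, and disable the switch port while it runs. The longest write time should roughly match how long IO was held during the failover.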

postfixreload · ‎07-29-2008

The SP (storage processor) should send a confirmation to the node once the data has been received. The connection will not be closed until the confirmation is sent. You should not see any data loss if the failover is configured correctly.

kastlr · ‎07-29-2008

Hi,

To answer your question, let me provide some additional information about the IO flow in general.

As already posted, for any write IO the host expects an ACK (acknowledgement) from the array.

Without this ACK, the IO isn't completed.

The host usually waits a defined time and then tries to resend that IO.

Most of the time a read IO is answered immediately, meaning the array holds the IO flow to the LUN until the read request is answered.

This usually happens when the requested information is already available in the storage array's cache. (Read Hit)

Sometimes the array asks the host to release the SCSI bus for additional requests because it can't answer the read request immediately.

This happens when the requested information isn't available in the array's cache and therefore some backend operations are required. (Read Miss)

The IO will be queued in the HBA driver, and in the meantime the host can send additional IO requests to that LUN.
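
The ACK-and-resend behaviour for writes can be sketched in a few lines of Python. This is a toy model only: `send_write` and `FlakyArray` are hypothetical names, and a real HBA driver works with timers and SCSI status codes rather than a simple True/False.

```python
def send_write(array, data, max_attempts=3):
    """Send a write and resend it until the array acknowledges it.

    `array` is any callable that returns True for an ACK and False when
    no ACK arrived before the (hypothetical) timeout.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        if array(data):
            return attempt  # the write only counts as complete once ACKed
    raise IOError("no ACK after %d attempts" % max_attempts)


class FlakyArray:
    """Hypothetical array that loses the first `drops` requests."""

    def __init__(self, drops):
        self.drops = drops
        self.stored = []

    def __call__(self, data):
        if self.drops > 0:
            self.drops -= 1
            return False  # request lost, so no ACK is sent
        self.stored.append(data)
        return True
```

For example, `send_write(FlakyArray(2), b"block")` needs three attempts, and the data is stored exactly once because only the acknowledged attempt reached the array.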

Let's talk about your SAN now.

If something changes in your SAN, all affected zone members are informed by an RSCN (registered state change notification).

When the link is broken, the switch informs all remaining zone members about this event.

With this notification, your HBA driver informs the OS so that it can take proper action.

This usually causes the LVM or multipathing driver to transmit IOs over another available path.

All IOs already sent over the broken link but not yet answered are automatically retransmitted over the new path.

Such an event can be found in the ESX logs, such as vmkernel or vmkwarning.

This is completely transparent to a VM; the VM isn't even aware that there are multiple paths to the disk it uses.
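
The failover step described above, where unanswered IOs are resent over the surviving path, can be modelled like this in Python. It is a sketch under assumptions: the `Multipath` class and the path names are made up for illustration and do not reflect how the ESX multipathing code is actually structured.

```python
class Multipath:
    """Toy model: IOs stay in `outstanding` until the array ACKs them."""

    def __init__(self, paths):
        self.paths = list(paths)   # e.g. ["hba0", "hba1"]
        self.active = self.paths[0]
        self.outstanding = {}      # io_id -> path it was last sent on
        self.log = []              # (io_id, path) for every transmission

    def send(self, io_id):
        self.outstanding[io_id] = self.active
        self.log.append((io_id, self.active))

    def ack(self, io_id):
        self.outstanding.pop(io_id, None)

    def path_failed(self, path):
        """Called when a link-down notification (RSCN) arrives for `path`."""
        self.paths.remove(path)
        if not self.paths:
            raise IOError("all paths down")
        if self.active == path:
            self.active = self.paths[0]
        # Resend everything that went out on the dead path and was never ACKed.
        for io_id, sent_on in sorted(self.outstanding.items()):
            if sent_on == path:
                self.send(io_id)
```

If IOs 1 and 2 are sent on hba0, IO 1 is acknowledged and hba0 then fails, only IO 2 is resent on hba1. The caller that issued IO 2 just sees a late completion, which matches the point that the VM is unaware of the path change.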

Let's talk about "frozen" IO.

When a host has unanswered IO requests, it simply waits for an answer.

There's no kind of transaction log or snapshot area for such a condition.

Depending on the application, either some RAM is used to queue the IOs, or the application can't continue without the requested information.

A database relies on consistent data, and the vendors have implemented mechanisms to guarantee this (transaction logs, redo logs etc.).

When such an event occurs on a database server, the DB may crash if it takes too long to answer an IO request.

For example, transaction logs are really critical and produce nearly 100% write IOs.

When the DB is unable to write these IOs for a time frame defined by the vendor, the DB crashes.

As long as your outage doesn't hit this limit, the DB will simply freeze (sleep) and won't answer any SQL statement.

A file server is less restrictive; as you already saw during your tests, the IO flow is held during the outage.

After the outage, the IO flow resumes normally.

From a user's point of view, the server didn't respond for some seconds (maybe the damned slow network again).

Hope this helps a bit.
Greetings from Germany. (CEST)

kinsden · ‎07-31-2008

Wow..that's impressive..you are casper, man, the friendly Ghost !! hehe ... Thanks a ton for this helpful knowledge sharing.