VMware Cloud Community
jeromecrea
Contributor

Warning on extents and possible datastore corruption via iSCSI on ESX 3.0.2

Just a quick FYI on a critical issue we ran into today.

We have a single templates datastore sitting on a pair of NetApp filers. Two hosts attach to it via iSCSI, the others via FCP.

Last month, space on our templates store was getting low, so we grew both the templates volume and the templates LUN by 100 GB. It was also a good test to see how this would work if we ever needed to grow a production datastore. Anyhow, after rescanning from the ESX hosts, the new LUN size appeared, and we added the extra free space as an extent. The VMFS volume then showed the new space, and all was good until a few days ago. We started noticing that new ISOs we copied to the templates datastore via the iSCSI hosts were corrupted. When we copied the same ISO to the same datastore via an FCP-attached host, the files were fine.
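For anyone who wants to check their own copies for this kind of silent corruption, comparing a checksum of the source ISO against the copy on the datastore is one way to do it. The following is only a minimal sketch: the function names `sha256_of` and `copies_match` and the paths in the usage note are made up for illustration and are not anything VMware or NetApp provides.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large ISOs never sit in memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source, copy):
    """Return True when both files hash to the same SHA-256 digest."""
    return sha256_of(source) == sha256_of(copy)
```

Usage would look something like `copies_match("/tmp/install.iso", "/vmfs/volumes/templates/install.iso")`, where both paths are hypothetical. Running the check from a host on each protocol (one iSCSI, one FCP) would show whether only one path produces bad copies.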

After speaking with VMware tech support today, their recommendation was that we should never, ever use extents unless the situation is extremely dire, and then only if you roll a 20 on a 1d20. It seems that the extended portion of the LUN did not align on cylinder boundaries, and due to the small differences between the FCP and iSCSI protocols, FCP was able to handle it while iSCSI was not. Also, using extents makes recovering data much more difficult, and with the bug we ran into, the convenience is just not worth the risk. Thank god this was only our templates store!

Time to move that 3.5 upgrade up the list so we can take advantage of Storage VMotion. Then we can add space by creating a new volume and using Storage VMotion to move onto it, without incurring any downtime.

jeromecrea
Contributor

VMware clarified today that the corruption was very probably not due to anything on the NetApp filers. The issue is specifically related to a bug in the ESX software iSCSI initiator and the way it handles LUNs that have been extended from a single LUN. The issue does not appear when you create a second LUN and then extend the first LUN with storage from the second. Again, FCP seems to be free of this issue, so if you're in that boat you're safe :)
