VMware Cloud Community
vanree
Enthusiast

ESX 4.1 and QNAP iSCSI connections lost at reboot

Hello,

Summary: We have a recurring boot problem: after an ESX server reboots, we have to manually reconnect active datastores on iSCSI targets.

Environment:

  • 2 ESX 4.1 servers (on IBM X servers) all with latest firmware and updates.
  • 1 QNAP TS-809U-RP NAS, also with the latest firmware, with 8 x 2 TB disks in RAID 6. It exposes 6 separate iSCSI targets, each with one 1.9 TB LUN.
  • Connection between NAS and 2 servers is via dedicated Cisco/LinkSys gigabit SRW2016 switch, each box has 2 links active.
  • 1 vCenter server controlling the 2 ESX servers.

Problem: Once everything has started, all works fine. However, when we reboot one of the ESX servers, the NAS (iSCSI) datastores that are not in use by a virtual machine on the other, still-active ESX server do not come online on the rebooted server (5 datastores at the moment).

I looked at the logs and it seems the datastores are detected as snapshots. Reconnecting these datastores via the vSphere Client does not work, because the only option presented is to format the datastore.

Workaround: Our workaround, which is unfortunately manual, is to connect to the console of the recently rebooted ESX server and run:

esxcfg-volume -M <VMFS UUID|label>

The interesting thing is that all 5 datastores reconnect within a few seconds of running this command. The -M option should make the mount persistent, but at the next reboot it happens again, and it does not matter which ESX server reboots. The datastores reconnect automatically only when both ESX servers are rebooted at the same time and the datastores are not in use by any VMs.
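
For anyone who wants to script the workaround instead of typing it per datastore, here is a minimal sketch. It assumes that `esxcfg-volume -l` prints one line of the form `VMFS3 UUID/label: <uuid>/<label>` for each volume detected as a snapshot; that output format is an assumption, so check it on your own host first. The function names `snapshot_uuids` and `mount_snapshot_volumes` are made up for this example.

```shell
#!/bin/sh
# Sketch: persistently mount every VMFS volume that ESX flags as a snapshot.
# Assumption: `esxcfg-volume -l` prints "VMFS3 UUID/label: <uuid>/<label>"
# once per detected volume. Verify this on your host before relying on it.

# Read `esxcfg-volume -l` output on stdin and print one UUID per line.
snapshot_uuids() {
    sed -n 's|^VMFS[0-9]* UUID/label: *\([^/]*\)/.*$|\1|p'
}

# Mount each detected volume with -M (persistent, keeps the existing signature).
mount_snapshot_volumes() {
    esxcfg-volume -l | snapshot_uuids | while read -r uuid; do
        echo "Mounting $uuid"
        esxcfg-volume -M "$uuid"
    done
}

# Do nothing unless the tool exists, i.e. we are on the ESX console.
if command -v esxcfg-volume >/dev/null 2>&1; then
    mount_snapshot_volumes
fi
```

Since the persistent mount did not survive a reboot in our case, a script like this would still have to be run after each reboot; it only saves the typing.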

This problem has been with us for a long time. It also happened on ESX 3.5 and 4.0, and with every upgrade or update we hope it is solved, but not yet. (Maybe the problem is inherited somehow, although we have recreated the datastores 3 times now.)

Can you please help us solve this nagging issue?

Thanks, Edwin

vanree
Enthusiast

My problem is solved and this is how:

We have many customers using QNAP NAS systems with ESX and they all work fine, so I took a new unit that we had ordered for a customer and ran some reboot tests. Everything worked, and faster than on our NAS. To solve our issue, I bought a second QNAP NAS, set it up identically with the latest firmware, and transferred all datastores to it. That took 4 days (9.5 TB), and afterwards everything worked fine and much faster.

I then factory reset the original QNAP NAS, and even deleted and recreated the entire RAID 6 array. I added the iSCSI targets and LUNs again and moved all datastores back, which took only around 3 days. Every reboot now works, and it is as fast as the new NAS.

I guess an old firmware version left something behind in the NAS, and the factory reset cleared it. Very happy now!

Cheers, Edwin
