Hi
This is meant as a warning to other VMware users (more than a question) before they run into the same kind of problems that we did when upgrading to vSphere 6.
We have been running a setup like the one in the picture below for a long time, without any problems. The setup is built on three DELL R620 servers and two Synology RackStations - one RS3412RPxs and one RS3614RPxs.
But after we upgraded from vSphere 5.5U2 to vSphere 6.0, our SANs and LUNs started crashing. We even had five disks die during the 14 days we have been fighting the problem. Both Synos had high CPU usage and high memory load, and very often did not respond on either the web interface or SSH. Twice during the past 14 days, the LUNs crashed so hard that we had to perform a disaster recovery.
What we see is this:
When the ESXi host starts up, we see two paths to each LUN on the Syno with LUN IDs of 0, 1, 2, etc. But after a while, the host has quadrupled the paths to each LUN, with LUN IDs like 0/256/512/768, 1/257/513/769, etc. And when this happens, all the trouble starts. We now have five dead disks (three HDDs and two SSDs) and one dead RS3614RPxs as a result of this problem. I don't know how it's possible for the Syno to destroy a disk in this situation, but this is what actually happens. And on one of our RS3614RPxs, even the internal flash card has crashed!
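For anyone who wants to check whether their hosts show the same symptom, the paths can be inspected from the ESXi shell with the standard esxcli storage commands (the device identifier below is just a placeholder; substitute your own naa. ID):

```shell
# List all storage paths on the host together with their LUN numbers;
# an affected host will show the same device repeated with LUN IDs
# like 0, 256, 512 and 768.
esxcli storage core path list | grep -E "Runtime Name|LUN:"

# Count the paths to one specific device (placeholder device ID);
# a healthy setup in this thread shows 2 paths, an affected one 8.
esxcli storage core path list -d naa.6001405exampleplaceholder | grep -c "Runtime Name"
```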
One theory we have is that ESXi 6.0 has a more aggressive iSCSI policy, and if the Syno is not responding fast enough, ESXi tries to create a new "ghost" path with LUN ID n + 256, and so on.
During the weekend we downgraded the three hosts to ESXi 5.5U2, and everything is working stably again. No "ghost" LUNs get created at any time.
So I am NOT saying that this is a bug in either vSphere or DSM, only that vSphere 6 seems to be incompatible with DSM 5.2 update 2.
Update 2015-06-25:
While I was still wondering what could possibly be the reason for this problem, I took a look at the Configuration Maximums for vSphere 5.5 and vSphere 6.0. And guess what? The maximum LUN ID has been raised from 255 to 1023 (8 bits vs. 10 bits respectively). My guess now is that Synology DSM does not support LUN IDs higher than 255.
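The aliasing pattern we saw fits that guess: if the target only handles an 8-bit LUN field, the 10-bit IDs 256, 512 and 768 all collapse back onto LUN 0, which is exactly the "ghost" pattern above. A quick sanity check of the arithmetic (plain shell, nothing ESXi-specific):

```shell
# With an 8-bit LUN field, IDs that differ by a multiple of 256
# map to the same LUN. These are the ghost IDs we saw for LUN 0 and LUN 1:
for id in 0 256 512 768 1 257 513 769; do
  echo "LUN ID $id -> 8-bit LUN $((id & 0xFF))"
done
```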
Please feel free to share this information in any media.
Best regards
Ernst Mikkelsen (VCP5)
Trifork A/S
Hello, we have the same issue.
We had to disable the paths to the "ghost" LUNs, but they became active again after some time. So we changed the path management policy from Round Robin to Fixed. Now it's fine. Waiting for a fix from VMware or Synology.
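For reference, the policy change described here can also be made from the ESXi shell with the standard esxcli NMP commands (VMW_PSP_FIXED is the built-in name of the Fixed policy; the device ID below is a placeholder for your own naa. identifier):

```shell
# Show the current path selection policy for one device (placeholder ID)
esxcli storage nmp device list -d naa.6001405exampleplaceholder

# Switch that device from Round Robin to Fixed
esxcli storage nmp device set -d naa.6001405exampleplaceholder --psp VMW_PSP_FIXED
```

Note that this sets the policy per device, so it has to be repeated for each LUN (or scripted over the device list).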
Hello,
I posted in the Synology forum thread as well but did not see the workaround by cheplyaevav.
I wanted to ask whether it has been safe and stable for you in production?
I want to upgrade to VMware 6.0 and then migrate my Synology into a backup/dev target instead. That means there would be a period of production overlap, though, before I get my new SAN online.
I do still want to keep the Syno around afterwards.
Hi Everyone,
We have identified this as a known issue in the current official release of our software. This issue has been fixed in DSM 6.0 beta, while a fix for DSM 5.2 is being planned.
In the meantime, you can edit the advanced settings on the ESXi host and change Disk.MaxLUN to 256 to resolve the issue.
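For anyone who prefers the command line to the vSphere Client, the same Disk.MaxLUN change can be made from the ESXi shell with the standard advanced-settings commands (a rescan is needed for it to take effect):

```shell
# Check the current value of Disk.MaxLUN
esxcli system settings advanced list -o /Disk/MaxLUN

# Limit LUN scanning to IDs below 256, as suggested above
esxcli system settings advanced set -o /Disk/MaxLUN -i 256

# Rescan the storage adapters so the new limit is applied
esxcli storage core adapter rescan --all
```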
If you still encounter issues when connecting ESXi 6.0 to Synology iSCSI, please feel free to submit a support ticket to us.
