So it appears the data in my Web Client was stale despite clicking refresh. The C# client shows the tasks timed out after exactly 30 minutes. Tried it again in manual mode (I had tried Auto prior) and the management agents stopped responding. Had to restart them via SSH.
I'm guessing something in my lab isn't right if it's not working correctly. But the odd thing is it's the same symptoms across all 3 hosts. Maybe it's vCenter?
OK, I may have found the solution for my issue.
I was attempting to enable vSAN on an existing cluster with hosts that had load. In the beta I am pretty sure I was able to enable vSAN on an existing cluster containing hosts with load that were NOT in maintenance mode, but maybe my memory fails me.
Either way, the solution to my issue was this...
1. Restarted the management agents on all 3 hosts in my cluster (hostd & vpxa) as they were not responding; SSH was still responding, however.
2. Create a new cluster with DRS, HA and vSAN enabled.
3. Place one Host into maint mode.
4. Move her to the new cluster, cluster config worked just fine in under 30 sec.
5. Removed her from Maint mode
6. Moved some VMs to the first host in the new cluster
7. Put the second into maintenance mode
8. Move her to the new cluster
9. Rinse, lather, repeat until all hosts and VMs were in the new vSAN-enabled cluster.
Now I'm getting an error on all host disk groups and they show "Unhealthy" with "Dead or Error" on all magnetic disks in the details.
The mag disks don't even show up under any of the 3 hosts' storage devices. Weird stuff for sure...
I remember this being much easier in the beta for some reason...
Thanks for this. I can confirm exactly the same symptoms from my side.
I have followed your instructions above and confirm I have gotten to the same scenario as yourself.
I'm getting an error on all host disk groups and they show "Unhealthy" with "Dead or Error".
I'm using the AHCI driver in my setup and this exact same setup in terms of hardware worked fine through the beta period. I'm using the vCSA.
Have you had any luck in getting your hosts to recognize your magnetic drives? I'm going to test a different number of magnetic drives and a different drive to see if I can get any other behaviour.
Not really sure what's going on here, as this worked very well in the beta, and since rolling the GA bits I have had nothing but issues in the few days since its release.
Any ideas would be greatly appreciated!
Nope, zero luck thus far. It's actually gotten worse.
I tried rebooting a single host to see if the HDD would come back, and now the host can't even be managed via SSH. It's pingable, but no management of any kind. My lab is at my office so I haven't been able to check the monitor for a PSOD.
So don't reboot your hosts!
I'm starting to follow the lead that I hit earlier, where things began to work when I had zero VM load on the hosts. I'm dusting off my old ML110 G6 to move all VMs off these vSAN hosts (or hopefully future vSAN hosts) and trying again. I may rebuild the hosts yet again for a vanilla build (GParting all the disks too).
If your hosts are all still communicating with vCenter, I would recommend disabling vSAN, and hopefully you would run into the same issues I did with hosts going management-dark. I'd be interested in knowing whether you have any issues after disabling vSAN and rebooting a host.
FYI, I am rolling a similar lab to Erik Bussink's, but with i5s and no mSATA. I am booting from USB.
Thanks for the quick response! At the moment I have now disabled vSAN and I'm back to running over an iSCSI setup. I would really like to get vSAN running and test out the GA build.
When I first tried to set up vSAN on the GA bits I lost all connectivity to the hosts, as you described. I then tried rebooting the hosts as I was unsure which management agents to restart. None of the hosts managed to boot successfully within an hour and a half. I hopped on the remote management of the hosts and they were all stuck on "usbarbitrator start".
I was unable to do anything with the hosts other than rebuild them with the GA bits again. I would be interested to know if you have the same issue when you are able to see your hosts again.
I thought I had messed up with the vSAN configuration as everything worked as expected in the beta setup. It's good to know that I'm not alone!
Looking at the link you pinged across it looks like you are running vSAN on the AHCI driver as well.
Is anyone else having luck with running vSAN GA build on the AHCI driver? If so, any tricks or tips?
Ya, I'll take a look at what the screen shows a bit later today and let you know.
Out of curiosity, did you have any VMs running on your hosts when you enabled vSAN?
So, looking at the monitor of my ESXi host that didn't come back up, it appears it never shut down completely.
It is stuck at "Shutting down VSAN IO layer...", "Running vsantraced stop".
She would not respond to any keyboard commands. Had to hard power her down.
I will be rebuilding my hosts and trying to enable vSAN all over again with no load at all on my hosts. I'll see where that gets me.
Prior to testing my luck with the vSAN setup again, I investigated what SolidCactus was talking about with AHCI.
I did a little investigation and found this gentleman's thread. VMware Front Experience: How to make your unsupported SATA AHCI Controller work with ESXi 5.5
After researching my AHCI controller using Mr. Peetz's command, I found I was using an "Intel Cougar Point 6 port SATA AHCI" controller, class 0106: 8086:1c02.
I searched the ahci.map file referenced in his article and found my controller listed.
Not sure what that means but I hope it's a positive!
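In case it saves someone else the manual grep, the ahci.map lookup above can be scripted: you're just checking whether your controller's PCI vendor:device ID appears on a non-comment line of the map file. A rough sketch, assuming the entries contain the ID in plain text (the sample content below is made up for illustration, not a real ahci.map):

```python
# Check whether a PCI vendor:device ID appears in an ahci.map-style file.
# The sample text below is illustrative only; on a real host you would read
# the actual ahci.map file referenced in the linked article.

def controller_listed(map_text: str, pci_id: str) -> bool:
    """Return True if any non-comment line mentions the given vendor:device ID."""
    pci_id = pci_id.lower()
    for line in map_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if pci_id in line.lower():
            return True
    return False

# Made-up sample resembling a driver map entry for the controller above
# (Intel Cougar Point 6 port SATA AHCI, 8086:1c02):
sample = """
# driver map sample (illustrative)
regtype=linux,bus=pci,id=8086:1c02 0000:0000,driver=ahci,class=storage
"""
print(controller_listed(sample, "8086:1c02"))  # True -> controller is listed
```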
Thanks for the response.
My AHCI controller is supported out of the box in the GA build, so I don't think I really need to do anything further from the article, but at least it shows that your controller is recognized and ready for use.
When creating the disk groups I only had one host with any virtual machines running on it. Unfortunately, this is exactly the same setup as I had in the beta builds, and it worked without error.
Any other ideas?
Well last night I tried again. Here is the path I took and the conclusion. Hint: it didn't go well...
Started with freshly built ESXi 5.5 U1 hosts.
1. Created a cluster with the following enabled
DRS - Full Auto
HA - Admission Control Disabled
vSAN - Manual
2. Checked all 3 hosts for required settings and networking
3. Placed all 3 hosts into maintenance mode
4. Added hosts to vSAN cluster one at a time waiting for the "Update VSAN Config" to fully complete.
5. Verified all 3 hosts still saw both their SSD and HDD.
6. Exited maintenance mode one host at a time, waiting for the process to complete before removing the next from maintenance mode.
7. Double checked all settings once again. (Figured treating this like a rocket launch would help).
8. Before adding any disks to vSAN I reviewed the Cluster Props > vSAN > General page and it showed
0 of 3 eligible SSDs,
0 of 3 data disks,
Total capacity 0.00 B,
Free capacity 0.00 B
9. Selected esx01, clicked "Create a new disk group"
10. Selected the one SSD and one HDD, clicked OK and waited for "Create a New disk group" task to complete...
Viewed the C# client and it stated "Initialize disks to be used by VSAN".
I probably will put in a feature request for the Web Client to be more specific.
11. The task timed out after 30 minutes and the HDD had disappeared. There was a spike in traffic to the HDD but then it quickly died out. (See attached image)
I think the vCenter vpxd timeout might possibly need to be increased. But I still don't think it's going to solve anything.
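On the vpxd timeout theory: the task timeout lives in vpxd.cfg, which is plain XML (on the vCSA it's typically under /etc/vmware-vpx/vpxd.cfg). The exact element path below is an assumption from memory, so verify it against your own vpxd.cfg before editing anything; this sketch just shows the mechanical edit against an illustrative fragment:

```python
# Bump a task timeout value in vpxd.cfg-style XML.
# The <task><timeout> element path and the sample document are assumptions --
# confirm the real key names in your own vpxd.cfg before applying this.
import xml.etree.ElementTree as ET

def set_task_timeout(cfg_xml: str, seconds: int) -> str:
    """Return the config XML with the task timeout set to `seconds`."""
    root = ET.fromstring(cfg_xml)
    node = root.find("./task/timeout")
    if node is None:
        # Create the <task><timeout> branch if the config doesn't have one yet.
        task = root.find("./task")
        if task is None:
            task = ET.SubElement(root, "task")
        node = ET.SubElement(task, "timeout")
    node.text = str(seconds)
    return ET.tostring(root, encoding="unicode")

# Illustrative fragment with a 30-minute (1800 s) timeout, bumped to an hour:
sample_cfg = "<config><task><timeout>1800</timeout></task></config>"
print(set_task_timeout(sample_cfg, 3600))
# <config><task><timeout>3600</timeout></task></config>
```

Remember vpxd needs a restart after any vpxd.cfg change for it to take effect.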
It doesn't appear the GA vSAN is lab/enthusiast friendly at this point. If you are going to test for prod, you will probably need to pony up the cash for HW on the HCL.
Still I am going to export logs and open a case with VMware and see if they can lend a hand from purely a software POV.
This is obviously not 100% hardware related; there is probably a bug or misconfiguration somewhere. The fact that the host loses its disk (a fully functional disk and controller without vSAN enabled) and has issues during reboots ONLY when vSAN is enabled means there is more going on under the hood.
So for now it appears vSAN is a no go for me unless VMware support is willing to lend a hand on unsupported HW and has some ideas.
Maybe someone reading this has other ideas.
Wow thanks for the update and the great detail you have supplied. Exactly the same scenario my end I'm afraid.
Please open the case and loop myself in as I'm happy to help provide logs etc to help get this resolved.
My AHCI driver is supported out of the box with 5.5 U1, so I might be able to lend a hand in getting this looked at?
Did the beta builds work for you at all? Do you know where the logs for vSAN are located?
Anyways let me know and happy to help out however possible!
Cool I'll keep you posted on any findings.
I'm not sure of any dedicated logs for vSAN. I know you can start a trace, but I think (and I'm hoping I'll be corrected if wrong) that vSAN uses the host logs, since it's really a host service. That would mean gathering logs through vCenter or the Support Assistant should capture the important stuff.
On a host though all logs are in the /var/log directory.
After clicking submit, I did find a log called vsanvpd.log under /var/log
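If it helps anyone else hunting for logs, the quick way to spot candidates like vsanvpd.log is just to list files under /var/log whose names mention vSAN. The sketch below runs against a temporary directory so it's runnable anywhere; on a host you would point it at /var/log instead:

```python
# List log files whose names mention vSAN in a given log directory.
# Demonstrated against a temp directory; on an ESXi host, pass "/var/log".
import pathlib
import tempfile

def find_vsan_logs(log_dir: str) -> list:
    """Return sorted names of files in log_dir containing 'vsan' (case-insensitive)."""
    d = pathlib.Path(log_dir)
    return sorted(p.name for p in d.iterdir()
                  if p.is_file() and "vsan" in p.name.lower())

with tempfile.TemporaryDirectory() as tmp:
    # Fake a few log files, including the vsanvpd.log mentioned above.
    for name in ("vsanvpd.log", "hostd.log", "vmkernel.log", "vsantraces.log"):
        (pathlib.Path(tmp) / name).write_text("")
    print(find_vsan_logs(tmp))  # ['vsantraces.log', 'vsanvpd.log']
```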
Ok great. If you need any help with the case please let me know what you need and I will be glad to help out.
In the meantime, if anyone else has this issue or any ideas on troubleshooting this please chime in!
Also, are you using a Windows based vCenter or the vCSA?
Ok thanks for letting me know. I might roll a Windows based vCenter and see if the behavior is any different. I can't imagine it will be but still worth a shot to try and get it working!