Auto Deploy Host Profile

billdossett · 01-26-2014

Hi,

Continued post on this subject as I am getting nowhere.

Autodeploy in my lab, 2 hosts, vSphere 5.5b.

The first host, the one I created my host profile from, reboots and gets the profile applied with no problem. The second host reboots and comes up, does the disconnect and reconnect, but the profile is not applied on the reconnect. It stays in maintenance mode and does not have the distributed virtual switch; it has a temporary standard virtual switch instead, and it shows as non-compliant. If I apply the profile manually, it reports as compliant, the networking is sorted and the dVS is present... sweet, so there seems to be nothing preventing the application of the host profile.

I've changed the reference host to the second host and updated the profile, same thing.

Checked syslog.log... the two hosts' syslogs during reboot look the same.

Basically, the first host boots, then it says disconnect host, reconnect, apply host profile, and it exits maintenance mode.

The second host boots, says disconnect host, reconnects, and that is it: it stays in maintenance mode. There is no error message, the logs don't tell me anything, and it doesn't exit maintenance mode.

Tearing out what little hair I have left. I'm proposing to roll this out in our SaaS division and our development data centers, but not if I can't figure this out.

Bill Dossett

rmtilson1 · 01-27-2014

Did you create the autodeploy rule for the host profile to be applied? An autodeploy host will boot into maintenance mode if no host profile is defined in the deploy ruleset. If a profile is provided it will be applied during the provisioning process of the host.

billdossett · 01-27-2014

Hi, thanks for your input.

It makes sense; however, I do have a host profile rule for both hosts... it's an IP range of two IP addresses. Just looking at it now, though, the host profile rule only covers 2 IP addresses while the cluster and image profile rules cover a range of 6 hosts. The host's IP address is within that 2-address range, but this is looking pretty suspicious, so I am trying the -AllHosts switch just to see if it makes any difference.
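For anyone wanting to sanity-check which rule patterns actually cover a given host, here is a minimal Python sketch. It is only an illustration of the range comparison, not how Auto Deploy evaluates rules internally; the IP addresses, the rule names, and the helper functions `ip_in_range` and `rules_matching` are all made up for the example.

```python
import ipaddress

def ip_in_range(ip, first, last):
    """Return True if ip lies within the inclusive range first..last."""
    addr = ipaddress.ip_address(ip)
    return ipaddress.ip_address(first) <= addr <= ipaddress.ip_address(last)

# Hypothetical rule patterns: rule item type -> (first IP, last IP).
# These addresses are placeholders, not taken from the lab in this thread.
rules = {
    "hostprofile": ("192.168.1.50", "192.168.1.51"),
    "cluster": ("192.168.1.50", "192.168.1.55"),
    "imageprofile": ("192.168.1.50", "192.168.1.55"),
}

def rules_matching(ip, rules):
    """List the rule types whose IP range covers the given host IP."""
    return sorted(name for name, (lo, hi) in rules.items()
                  if ip_in_range(ip, lo, hi))

print(rules_matching("192.168.1.51", rules))  # covered by all three ranges
print(rules_matching("192.168.1.53", rules))  # outside the host profile range
```

A host that falls inside the cluster and image profile ranges but outside the host profile range would be deployed without a profile, which is the situation being ruled out here.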

That had finally occurred to me yesterday and I was just thinking I would look at that this morning and then I saw your email as well... great minds thinking alike perhaps;-)

Bill Dossett

billdossett · 01-27-2014

Well, unfortunately, changing the host profile rule did nothing... back to square one.

Looking at the syslog the difference between the two hosts is apparent. The successfully autodeployed host finishes like this:

2014-01-27T13:09:22Z ComplianceManager: [2014-01-27 13:09:22,868 vmware.runcommand INFO] runcommand called with: args = '/bin/ticket --generate', outfile = 'None', returnoutput = 'True', timeout = '0.0'.

2014-01-27T13:09:22Z 2014-01-27 13:09:22,880 Host Profiles[37521]: INFO: Created CIM ticket a48629d2-00b3-47d1-b473-159ef87a5e65

2014-01-27T13:09:23Z 2014-01-27 13:09:23,112 Host Profiles[37521]: INFO: Calling GatherData() for profile type MotdProfile

2014-01-27T13:09:23Z 2014-01-27 13:09:23,113 Host Profiles[37521]: INFO: Calling GatherData() for profile type PAMLoginMapProfile

2014-01-27T13:10:01Z crond[34338]: crond: USER root pid 37691 cmd /sbin/hostd-probe ++group=host/vim/vmvisor/hostd-probe

2014-01-27T13:10:01Z syslog[37692]: starting hostd probing.

2014-01-27T13:10:02Z syslog[37692]: hostd probing is done.

2014-01-27T13:15:01Z crond[34338]: crond: USER root pid 38443 cmd /sbin/hostd-probe ++group=host/vim/vmvisor/hostd-probe

2014-01-27T13:15:01Z syslog[38444]: starting hostd probing.

2014-01-27T13:15:02Z syslog[38444]: hostd probing is done.

while the unsuccessful host has messages about the watchdog-fdm PID file not existing and not being able to terminate the fdm watchdog process...

2014-01-27T12:57:27Z 2014-01-27 12:57:27,388 Host Profiles[36777]: INFO: Calling GatherData() for profile type MotdProfile
2014-01-27T12:57:27Z 2014-01-27 12:57:27,389 Host Profiles[36777]: INFO: Calling GatherData() for profile type PAMLoginMapProfile
2014-01-27T12:57:28Z watchdog-fdm: Watchdog for fdm is now 35691
2014-01-27T12:57:28Z watchdog-fdm: Terminating watchdog process with PID 35691
2014-01-27T12:57:28Z watchdog-fdm: [35691] Signal received: exiting the watchdog
2014-01-27T12:57:30Z watchdog-fdm: PID file /var/run/vmware/watchdog-fdm.PID does not exist
2014-01-27T12:57:30Z watchdog-fdm: Unable to terminate watchdog: No running watchdog process for fdm
2014-01-27T12:57:30Z python: autodeploy notify response -- 200 OK
2014-01-27T12:57:30Z python: autodeploy has been notified
2014-01-27T12:59:03Z DCUI: GetManagementInterface: Tagging vmk0 as Management
2014-01-27T12:59:03Z DCUI: SetTaggedManagementInterface: Writing vmk0 to the ManagementIface node
2014-01-27T12:59:03Z DCUI: NotifyDCUI: Notifying the DCUI of configuration change
2014-01-27T12:59:03Z DCUI: NotifyDCUI: Skipping DCUI notify, since we are in DCUI
2014-01-27T12:59:03Z DCUI: GetManagementInterface: Tagging vmk0 as Management
2014-01-27T12:59:03Z DCUI: SetTaggedManagementInterface: Writing vmk0 to the ManagementIface node
2014-01-27T13:00:01Z crond[34343]: crond: USER root pid 37291 cmd /usr/lib/vmware/vmksummary/log-heartbeat.py
2014-01-27T13:00:01Z crond[34343]: crond: USER root pid 37292 cmd /sbin/hostd-probe ++group=host/vim/vmvisor/hostd-probe
2014-01-27T13:00:01Z syslog[37294]: starting hostd probing.
2014-01-27T13:00:02Z syslog[37294]: hostd probing is done.
2014-01-27T13:01:01Z crond[34343]: crond: USER root pid 37447 cmd /sbin/auto-backup.sh
2014-01-27T13:05:01Z crond[34343]: crond: USER root pid 38232 cmd /sbin/hostd-probe ++group=host/vim/vmvisor/hostd-probe
2014-01-27T13:05:01Z syslog[38233]: starting hostd probing.
2014-01-27T13:05:02Z syslog[38233]: hostd probing is done.
2014-01-27T13:10:01Z crond[34343]: crond: USER root pid 39012 cmd /sbin/hostd-probe ++group=host/vim/vmvisor/hostd-probe
2014-01-27T13:10:02Z syslog[39013]: starting hostd probing.
2014-01-27T13:10:02Z syslog[39013]: hostd probing is done.

It is also notifying autodeploy, which I don't see in the syslog from the good host, but I don't know what it means.

I have looked at the autodeploy logs and I can't see any errors there, and FDM was added to the image profile. So I am guessing that I am getting closer to what the problem might be... but so far I can't seem to find anything that makes this change its behaviour.

Thanks for taking the time to read and think about this! It's a stubborn problem.
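Comparing two syslogs by eye is error-prone, so a rough way to spot which components log on one host but not the other is to extract the message source from each line and diff the sets. The sketch below is a minimal Python illustration under that assumption; the regular expression and the `sources` helper are hypothetical, the sample lines are copied from the excerpts above, and lines whose source field is a second timestamp (the Host Profiles entries) are simply skipped.

```python
import re

# Matches "<ISO timestamp>Z <source>[pid]: <message>"; the [pid] part is optional.
# Lines where a second timestamp follows the first do not match and are ignored.
LINE_RE = re.compile(r"^\S+Z\s+(?P<source>[A-Za-z][\w-]*)(?:\[\d+\])?:\s")

def sources(log_text):
    """Return the set of message sources (program names) seen in a syslog excerpt."""
    found = set()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            found.add(match.group("source"))
    return found

good_host = """
2014-01-27T13:10:01Z crond[34338]: crond: USER root pid 37691 cmd /sbin/hostd-probe ++group=host/vim/vmvisor/hostd-probe
2014-01-27T13:10:01Z syslog[37692]: starting hostd probing.
"""

bad_host = """
2014-01-27T12:57:28Z watchdog-fdm: Watchdog for fdm is now 35691
2014-01-27T12:57:30Z python: autodeploy has been notified
2014-01-27T12:59:03Z DCUI: NotifyDCUI: Notifying the DCUI of configuration change
2014-01-27T13:00:01Z crond[34343]: crond: USER root pid 37292 cmd /sbin/hostd-probe ++group=host/vim/vmvisor/hostd-probe
2014-01-27T13:00:01Z syslog[37294]: starting hostd probing.
"""

# Sources that appear only in the problem host's excerpt.
print(sorted(sources(bad_host) - sources(good_host)))
```

This only compares the excerpts it is given, so a source missing from one side may just mean the excerpt was cut at a different point in the boot.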

Bill Dossett

billdossett · 01-27-2014

The onion problem... I keep peeling bits off, but still no answer....

I was trawling the autodeploy log again and found something. There were a couple of lines about a duplicate request:

2014-01-27 16:10:51,609 [6300]INFO:addhost:a2fd4cd36991edeb468bb5cd2996cb2e : host booted up at : 2014-01-27 16:05:41.224058+00:00

2014-01-27 16:10:51,611 [6300]INFO:addhost:a2fd4cd36991edeb468bb5cd2996cb2e : Checking for a duplicate addHost request

2014-01-27 16:10:51,611 [6300]INFO:addhost:a2fd4cd36991edeb468bb5cd2996cb2e : The request seems to be a duplicate request

2014-01-27 16:10:51,622 [6300]INFO:addhost:the addhost task has not yet finished for a2fd4cd36991edeb468bb5cd2996cb2e

So then on the KB troubleshooting page from VMware, I found that you can connect to the autodeploy server with a browser and see information about hosts that have been autodeployed... and when I open that up, I can see that yes, my problem host is there twice. I think I changed network cards in it at some point, or am possibly using a different port, so the two host entries have the same name but different MAC addresses and GUIDs.
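If the host list is long, the same check can be done mechanically once the entries are copied out of the browser page. This is a minimal Python sketch assuming the entries have been typed up by hand as dictionaries; the hostnames, MAC addresses, and the `duplicate_hostnames` helper are invented for illustration and do not reflect any format the autodeploy server itself produces.

```python
from collections import defaultdict

def duplicate_hostnames(entries):
    """Map each hostname registered with more than one MAC to its sorted MACs."""
    macs_by_name = defaultdict(set)
    for entry in entries:
        # Normalise case so the same MAC written two ways is not counted twice.
        macs_by_name[entry["hostname"]].add(entry["mac"].lower())
    return {name: sorted(macs)
            for name, macs in macs_by_name.items()
            if len(macs) > 1}

# Hypothetical entries, transcribed by hand for the example.
entries = [
    {"hostname": "esx01.lab.local", "mac": "00:50:56:aa:bb:01"},
    {"hostname": "esx02.lab.local", "mac": "00:50:56:aa:bb:02"},
    {"hostname": "esx02.lab.local", "mac": "00:50:56:AA:BB:03"},
]

print(duplicate_hostnames(entries))
```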

So, I am thinking that probably isn't good. Is it causing my problem? I don't know, but I guess I would prefer to get rid of this duplicate registration. I checked the autodeploy database for corruption using sqlite, reindexed it, the duplicate still exists.

Having said that, I guess I could change the IP address to see if that gets things working again anyway... but I am just wondering if there is a way of deleting hosts that have registered with autodeploy?

Bill Dossett

billdossett · 02-05-2014

Well, this has certainly been a long, hard slog, but I finally figured it out. I'm not sure if I should kick myself or what.

I have been doing autodeploy in my lab and I basically rip it all apart and rebuild it... frequently. What I did not do these last few times was actually remove the hosts from the cluster when I built new host profiles... and I kind of wondered a bit why hosts were booting up and immediately going into the cluster. Well, that was the problem: the cluster thought they belonged and added them, and that for some reason stops processing of the host profile... the host profile does not get applied.

Fair enough, but it sure would have been helpful if somewhere, in the autodeploy log, or syslog, or any log file, it had said "that's it, I'm not processing the host profile because of..." but I don't believe that happens.

Oh well, I figured it out myself, patting myself on the back for perseverance and learning some more about autodeploy along the way!

Bill Dossett
