VMware Cloud Community
cmutchle
Enthusiast

BDE deployment error: "Task execution failed: Bootstrapping VM Failed"

I am running BDE 1.0.0 Build 332. I get this error each time I try to deploy a BDE cluster -- regardless of size (small|medium|large) or distribution selected. Is there someplace I can find additional information on what that error means and how I can go about correcting it?

Thanks.

--

Chris Mutchler

Compute Platform Engineer

Adobe


Accepted Solutions
jessehuvmw
Enthusiast

After analyzing /opt/serengeti/logs/ironfan.log, we found that when the node tries to access the URL https://10.27.17.4/yum/repos/centos/serengeti-base.repo, the web URL filter in the customer's corporate network redirects it to the URL shown below.

[Thu, 12 Dec 2013 03:03:40 +0000] INFO: Add yum repo https://urldefense.proofpoint.com/v1/url?u=https://10.27.17.4/yum/repos/centos/serengeti-base.repo&k...

After a network resource was added on a known, unfiltered VLAN, the cluster deployment completed successfully.
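
A rewritten repo URL like the one in the log line above can be spotted mechanically. The following is a minimal Python sketch, not part of BDE. It assumes that ironfan.log lines contain the text "Add yum repo" followed by a URL, as in the excerpt above; the function name find_rewritten_repo_urls and the expected_host parameter are hypothetical.

```python
import re
from urllib.parse import urlparse

# Matches the "Add yum repo <url>" fragment seen in the ironfan.log excerpt.
REPO_LINE = re.compile(r"Add yum repo\s+(\S+)")


def find_rewritten_repo_urls(log_lines, expected_host):
    """Return repo URLs whose host is not the expected yum server.

    expected_host is supplied by the caller (an assumption of this sketch),
    for example the IP address of the server that hosts the yum repo.
    """
    suspicious = []
    for line in log_lines:
        match = REPO_LINE.search(line)
        if not match:
            continue  # not a yum repo line
        url = match.group(1)
        if urlparse(url).hostname != expected_host:
            suspicious.append(url)
    return suspicious
```

Passing the open log file (or any iterable of lines) together with "10.27.17.4" as expected_host would list every repo URL that points somewhere other than that address.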

@Chris, if your problem is solved, could you kindly mark this thread as answered?

Thanks

Jesse

Cheers, Jesse Hu

5 Replies
yufeim
Contributor

Would you please supply your serengeti.log? Alternatively, you can run 'grep ERROR serengeti.log' in the log directory.

jessehuvmw
Enthusiast

Here are some tips for quickly debugging the 'Bootstrap Failed' error.

If some nodes report 'Bootstrap Failed' while you are creating, starting, or configuring a cluster, follow these steps to find out the reason.

  ssh serengeti@node_ip

  sudo cat /var/chef/cache/chef-stacktrace.out

Note: if both master nodes (e.g. zookeeper, namenode, jobtracker, hbase_master) and non-master nodes failed, checking the file on the master node is enough.

This shows the error log of the 'chef-client' process started by Serengeti.

Here are some typical errors we have encountered:

1)

Generated at 2013-05-26 23:53:44 -0400

Errno::EHOSTUNREACH: remote_file[/etc/yum.repos.d/cloudera-cdh4.repo] (hadoop_common::add_repo line 45) had an error: Errno::EHOSTUNREACH: No route to host - connect(2)

/usr/lib/ruby/1.9.1/net/http.rb:644:in `initialize'

/usr/lib/ruby/1.9.1/net/http.rb:644:in `open'

/usr/lib/ruby/1.9.1/net/http.rb:644:in `block in connect'

/usr/lib/ruby/1.9.1/timeout.rb:44:in `timeout'

/usr/lib/ruby/1.9.1/timeout.rb:89:in `timeout'

/usr/lib/ruby/1.9.1/net/http.rb:644:in `connect'

This happens because the yum server is not reachable. Make sure the yum server is running and that the yum repo URL http://.../cloudera-cdh4.repo can be reached.
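
As an illustration of the reachability check, here is a minimal Python sketch. It is not a BDE tool; the function name repo_url_reachable is hypothetical, and the repo URL has to be supplied by the caller. One caveat: for an https URL served with a self-signed certificate, this sketch reports False because certificate verification fails.

```python
import urllib.request


def repo_url_reachable(url, timeout=5.0):
    """Return True if an HTTP GET of url succeeds within timeout seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers URLError and HTTPError, refused connections, timeouts,
        # and "No route to host" (EHOSTUNREACH) as in the stack trace above.
        return False
```

Running this from the failed node, rather than from the management server, tests the same network path that chef-client uses.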

2) ERROR: package[hadoop] (/var/chef/cache/cookbooks/hadoop_cluster/libraries/hadoop_cluster.rb:329:in `block in hadoop_package') had an error:

package[hadoop] (hadoop_cluster::default line 329) had an error: Chef::Exceptions::Exec:  returned 1, expected 0

This means 'yum install hadoop' failed. The root cause might be that the rpm or its dependent rpms are not on the yum server (which means the OVA or the code has a problem), or that the yum server was not created correctly (in which case it needs to be recreated).

You can run 'sudo yum install hadoop' on the node to get the detailed error message.

3) ERROR: service[start-hadoop-hdfs-datanode] (/var/chef/cache/cookbooks/hadoop_cluster/recipes/datanode.rb:43:in `from_file') had an error:

10.136.29.45 service[start-hadoop-hdfs-datanode] (hadoop_cluster::datanode line 43) had an error: Chef::Exceptions::Exec: /sbin/service hadoop-hdfs-datanode start returned 1, expected 0

This means 'sudo service hadoop-hdfs-datanode start' failed. Check the logs in /var/log/hadoop/ to find out why the datanode service can't start.

This solution also applies to other similar errors of the form ERROR: service[start-<service-name>].

4) ERROR: Net::HTTPServerException: 401 "Unauthorized"

This means the clocks on your nodes and the Serengeti service are not synchronized. Please change the configuration in the vSphere client to use NTP to synchronize the clocks on all ESXi hosts. The clock difference between hosts should be less than 20 seconds. Once the setting is applied, it takes several minutes for all the VMs to pick up the new time. Then you can run 'cluster ... --resume'.
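
The 20-second rule can be expressed as a small check. This is a minimal Python sketch under stated assumptions: the clock readings are collected by some other means and passed in as Unix epoch seconds keyed by host name, and the names max_clock_skew and clocks_in_sync are hypothetical.

```python
def max_clock_skew(clock_readings):
    """Return the largest difference, in seconds, between any two clocks.

    clock_readings maps a host name to its clock as Unix epoch seconds.
    """
    values = list(clock_readings.values())
    if not values:
        return 0.0
    return max(values) - min(values)


def clocks_in_sync(clock_readings, tolerance_seconds=20.0):
    """Return True if all clocks differ by less than tolerance_seconds."""
    return max_clock_skew(clock_readings) < tolerance_seconds
```

The default tolerance of 20 seconds comes from the advice above; the readings themselves must be taken at roughly the same moment for the comparison to be meaningful.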

 

5) For other errors, please run 'sudo chef-client' and send us its output for debugging.
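
The five cases above can be summarized as a lookup from error text to next step. This is a rough Python sketch, not something shipped with Serengeti: the marker substrings are taken from the example errors above, and the names KNOWN_ERRORS and suggest_next_step are hypothetical.

```python
# Each entry pairs a substring from the example errors above with the
# suggested next step. The first matching entry wins.
KNOWN_ERRORS = [
    ("Errno::EHOSTUNREACH",
     "yum server unreachable: check that it is running and the repo URL can be reached"),
    ("package[",
     "package install failed: run the yum install on the node for the detailed error"),
    ("service[start-",
     "service failed to start: check the service logs on the node"),
    ('401 "Unauthorized"',
     "clocks out of sync: configure NTP on the ESXi hosts, then resume the cluster"),
]

FALLBACK = "unknown error: run 'sudo chef-client' on the node and share the output"


def suggest_next_step(stacktrace_text):
    """Map the text of chef-stacktrace.out to one of the hints above."""
    for marker, hint in KNOWN_ERRORS:
        if marker in stacktrace_text:
            return hint
    return FALLBACK
```

The input would be the contents of /var/chef/cache/chef-stacktrace.out from the failed node; anything that matches none of the markers falls through to case 5.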

-Jesse

Cheers, Jesse Hu
cmutchle
Enthusiast

I took the time this morning to deploy and configure an entirely new BDE vApp and again tried deploying a cluster. The result was the very same error. Looking through the serengeti.log file prior to the error being thrown, I did not see anything conclusive as to why it is failing.

Attached are the serengeti.log and vhm_detail.log.0 files from my most recent attempt. I also do not appear to have the chef output file specified in the last response on the BDE management server. However, here is the output from the sudo chef-client command:

[root@cptlab4 logs]# sudo chef-client

[Tue, 10 Dec 2013 17:58:48 +0000] WARN: *****************************************

[Tue, 10 Dec 2013 17:58:48 +0000] WARN: Can not find config file: /etc/chef/client.rb, using defaults.

[Tue, 10 Dec 2013 17:58:48 +0000] WARN: No such file or directory - /etc/chef/client.rb

[Tue, 10 Dec 2013 17:58:48 +0000] WARN: *****************************************

[Tue, 10 Dec 2013 17:58:48 +0000] INFO: *** Chef 0.10.8 ***

[Tue, 10 Dec 2013 17:58:50 +0000] INFO: Run List is []

[Tue, 10 Dec 2013 17:58:50 +0000] INFO: Run List expands to []

[Tue, 10 Dec 2013 17:58:50 +0000] INFO: Starting Chef Run for cptlab4.ut1.omniture.com

[Tue, 10 Dec 2013 17:58:50 +0000] INFO: Running start handlers

[Tue, 10 Dec 2013 17:58:50 +0000] INFO: Start handlers complete.

[Tue, 10 Dec 2013 17:58:50 +0000] INFO: Loading cookbooks []

[Tue, 10 Dec 2013 17:58:50 +0000] WARN: Node cptlab4.ut1.omniture.com has an empty run list.

[Tue, 10 Dec 2013 17:58:50 +0000] INFO: Chef Run complete in 0.466015 seconds

[Tue, 10 Dec 2013 17:58:50 +0000] INFO: Running report handlers

[Tue, 10 Dec 2013 17:58:50 +0000] INFO: Report handlers complete

jessehuvmw
Enthusiast

Could you attach /opt/serengeti/logs/ironfan.log, please? That is the log file for the 'Bootstrap Failed' error, while /opt/serengeti/logs/serengeti.log covers VM provisioning errors. Also, you need to run 'sudo chef-client' on the failed master nodes (not on the BDE server). A master node is a node with one of the roles zookeeper, hadoop_namenode, hadoop_jobtracker, hadoop_resourcemanager, or hbase_master. Please send me the output of 'sudo chef-client'.

Cheers, Jesse Hu