VMware Cloud Community
ztwy
Contributor

BDE 2.2 Bootstrap failed

Hello,

I've successfully installed BDE 2.2 in my vSphere 6 environment, but I'm running into a bootstrap error while deploying a MapR 4.1 cluster. The error occurs on all nodes except the mysql node. The error message shown in the vSphere Web Client is as follows:

[2015-07-29T13:33:59.093+0000] Cannot bootstrap node test-Master-0.

Can't find any nodes which provide mapr_historyserver. Did any node provide mapr_historyserver? Or is the Chef Solr Server down?

SSH to this node and view the log file /var/chef/cache/chef-stacktrace.out, or run the command 'sudo chef-client' to view error messages.

PS: The Serengeti server has no Internet access. I configured a local Yum repository on a physical CentOS server in the same VLAN as the cluster nodes.

I attached the related log files for analysis.

Thanks for your help!

18 Replies
jessehuvmw
Enthusiast

Are you using the CLI or the GUI to create the MapR 4 cluster? If you are using the CLI, you need to add the role 'mapr_historyserver' to the roles of the master node group in the cluster spec, then create a new cluster. If you are using the GUI, I will check whether it's a bug that mapr_historyserver is not added to the GUI cluster spec.


BTW, did you use 'config-distro.rb' to add the MapR distro, and what was the full command? I want to know whether the distro version is something like 4, 4.1 or 4.1.0.

Cheers, Jesse Hu
ztwy
Contributor

Hi jessehuvmw,

Thanks for your reply.

I'm using the GUI to create a new MapR cluster.

Yes, I used 'config-distro.rb' to add the MapR distro. The full command is as follows:

config-distro.rb --name mapr --vendor MAPR --version 4.1 --repos http://local_yum_repo_server_ip/mapr/4/mapr.repo

What is the difference between version 4.1 and 4.1.0? In the command I used "4.1" as the version, as described in the BDE documentation, but my local repo is built from MapR 4.1.0:

[maprtech]
name=MapR Technologies
baseurl=http://package.mapr.com/releases/v4.1.0/redhat/
enabled=1
gpgcheck=0
protect=1

[maprecosystem]
name=MapR Technologies
baseurl=http://package.mapr.com/releases/ecosystem/redhat
enabled=1
gpgcheck=0
protect=1

Could that be the cause?

Thanks

jessehuvmw
Enthusiast

Hi ztwy,

I can confirm this is a BDE bug. You can log in to the BDE Server as user serengeti, then run this command to fix it:

find /opt/serengeti/www/specs/Ironfan/mapr -name '*.json' | xargs sed -i '/mapr_resourcemanager/ a "mapr_historyserver",'
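The one-liner edits every spec file unconditionally, so running it a second time would insert a second "mapr_historyserver" entry. Below is a minimal sketch of an idempotent variant. It assumes GNU sed and that the spec files list one role per line, which the original sed command relies on too. The function name add_historyserver_role is hypothetical, and a file that already mentions the role anywhere is treated as already patched.

```shell
# Sketch only: insert "mapr_historyserver" after each "mapr_resourcemanager"
# line of the MapR spec files, skipping files that already mention the role
# so that a second run does not add a duplicate entry.
# Assumes GNU sed (one-line "a text" syntax) and one role per line in the JSON.
add_historyserver_role() {
  local spec_dir=${1:-/opt/serengeti/www/specs/Ironfan/mapr}
  # '*.json' is quoted so the shell cannot expand it in the current directory.
  find "$spec_dir" -name '*.json' | while IFS= read -r f; do
    if grep -q 'mapr_historyserver' "$f"; then
      echo "already has mapr_historyserver: $f"
    elif grep -q 'mapr_resourcemanager' "$f"; then
      sed -i '/mapr_resourcemanager/ a "mapr_historyserver",' "$f"
      echo "patched: $f"
    fi
  done
}

# Usage on the BDE server, as user serengeti (hypothetical helper):
#   add_historyserver_role
```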

And for the MapR 4 distro, both 4.1 and 4.1.0 are fine when adding the distro, but not 3.

Cheers, Jesse Hu
gguanglu
VMware Employee

Please file a bug and track it in the v2.2 and master branches. Thanks,

ztwy
Contributor

Hi jessehuvmw,

Thanks to your fix, I got past the "mapr_historyserver" issue, but I'm still stuck on a Bootstrap failure with the following error:

[2015-07-30T11:08:11.360+0000] Cannot bootstrap node test-Master-0.

yum_package[mapr-core] (mapr::prereqs line 105) had an error: Timeout::Error: execution expired

SSH to this node and view the log file /var/chef/cache/chef-stacktrace.out, or run the command 'sudo chef-client' to view error messages.

Here are the output files:

Thanks for your help.

jessehuvmw
Enthusiast

The mapr-core package is about 246 MB in total (almost all of it in mapr-core-internal), which might make 'yum install mapr-core' time out on the cluster nodes. Could you try resuming the cluster creation by clicking 'Resume Deployment'? If this doesn't help, I will send a patch to increase the default timeout.

mapr-core-4.1.0.31175.GA-1.x86_64.rpm 26-Mar-2015 19:21 2.4K
mapr-core-internal-4.1.0.31175.GA-1.x86_64.rpm 26-Mar-2015 19:21 246M

Cheers, Jesse Hu
ztwy
Contributor

After the failure, I noticed that only the mapr-core-internal package is installed on each node:

[root@bde-809447-test-mysql-0-mapred ~]# rpm -qa | grep mapr
mapr-core-internal-4.1.0.31175.GA-1.x86_64

I tried to resume the cluster creation. On the zookeeper nodes, I got:

[2015-07-30T14:28:32.196+0000] Unable to run command 'execute[config MapR]' on node test-zookeeper-0. SSH to this node and run the command 'sudo chef-client' to view error messages.

On the other nodes I got:

[2015-07-30T14:28:39.891+0000] Cannot bootstrap node test-Master-0.

ruby_block[wait_for_mysql_server] (mapr::config_metrics line 235) had an error: RuntimeError: The abort signal is detected. Some key nodes failed to bootstrap, so abort bootstrapping node test-Master-0.

SSH to this node and view the log file /var/chef/cache/chef-stacktrace.out, or run the command 'sudo chef-client' to view error messages.

Regards

jessehuvmw
Enthusiast

The mysql node should not install mapr-core. Could you create the MapR cluster with the BDE CLI? SSH to the BDE server as serengeti, run 'serengeti', then run 'connect' and enter the vCenter user name and password, then run 'cluster create --name mapr4 --distro mapr'.

Cheers, Jesse Hu
ztwy
Contributor

On the mysql node, the following packages were installed:

[root@bde-809447-test-mysql-0-mapred ~]# rpm -qa | grep mapr
mapr-core-internal-4.1.0.31175.GA-1.x86_64
mapr-core-4.1.0.31175.GA-1.x86_64
mapr-hadoop-core-2.5.1.31175.GA-1.x86_64
mapr-mapreduce1-0.20.2.31175.GA-1.x86_64
mapr-mapreduce2-2.5.1.31175.GA-1.x86_64

I will try the CLI and let you know the result.

ztwy
Contributor

Hi Jesse Hu,


I just tried creating the cluster with the CLI. I created a MapR cluster with 3 zookeeper, 1 master, 1 mysql, 2 worker and 2 client nodes. During creation, 5 of the 9 VMs ended up with a Bootstrap error (1/1 master, 1/1 mysql, 1/2 worker, 2/2 client). The other 4 VMs ended up in "Service Ready" status. Here is the full output of the cluster creation:


FAILED 80%

node group: mysql,  instance number: 1
roles:[mapr_mysql_server]
  NAME          IP              STATUS            TASK
  ----------------------------------------------------
  test-mysql-0  10.192.200.159  Bootstrap Failed

node group: zookeeper,  instance number: 3
roles:[mapr_zookeeper]
  NAME              IP              STATUS         TASK
  -----------------------------------------------------
  test-zookeeper-0  10.192.200.151  Service Ready
  test-zookeeper-1  10.192.200.154  Service Ready
  test-zookeeper-2  10.192.200.156  Service Ready

node group: master,  instance number: 1
roles:[mapr_cldb, mapr_resourcemanager, mapr_nfs, mapr_webserver, mapr_fileserver, mapr_historyserver, mapr_metrics]
  NAME           IP              STATUS            TASK
  -----------------------------------------------------
  test-master-0  10.192.200.158  Bootstrap Failed

node group: worker,  instance number: 2
roles:[mapr_nfs, mapr_fileserver, mapr_nodemanager]
  NAME           IP              STATUS            TASK
  -----------------------------------------------------
  test-worker-0  10.192.200.155  Bootstrap Failed
  test-worker-1  10.192.200.157  Service Ready

node group: client,  instance number: 2
roles:[mapr_pig, mapr_hive, mapr_client]
  NAME           IP              STATUS            TASK
  -----------------------------------------------------
  test-client-0  10.192.200.152  Bootstrap Failed
  test-client-1  10.192.200.153  Bootstrap Failed

The failed nodes: 5
  ----------------------------------------------------------------------------
[NAME] test-mysql-0
[STATUS] Bootstrap Failed
[Error Message] [2015-07-31T10:28:32.764+0000] Cannot bootstrap node test-mysql-0.
yum_package[mapr-core] (mapr::prereqs line 105) had an error: Chef::Exceptions::Exec:  returned 1, expected 0
SSH to this node and view the log file /var/chef/cache/chef-stacktrace.out, or run the command 'sudo chef-client' to view error messages.
  ----------------------------------------------------------------------------
[NAME] test-master-0
[STATUS] Bootstrap Failed
[Error Message] [2015-07-31T10:48:21.599+0000] Cannot bootstrap node test-master-0.
ruby_block[wait_for_mysql_server] (mapr::config_metrics line 235) had an error: RuntimeError: The abort signal is detected. Some key nodes failed to bootstrap, so abort bootstrapping node test-master-0.
SSH to this node and view the log file /var/chef/cache/chef-stacktrace.out, or run the command 'sudo chef-client' to view error messages.
  ----------------------------------------------------------------------------
[NAME] test-worker-0
[STATUS] Bootstrap Failed
[Error Message] [2015-07-31T10:42:19.916+0000] Cannot bootstrap node test-worker-0.
ruby_block[wait_for_zookeeper_nodes] (mapr::startup line 235) had an error: RuntimeError: The abort signal is detected. Some key nodes failed to bootstrap, so abort bootstrapping node test-worker-0.
SSH to this node and view the log file /var/chef/cache/chef-stacktrace.out, or run the command 'sudo chef-client' to view error messages.
  ----------------------------------------------------------------------------
[NAME] test-client-0
[STATUS] Bootstrap Failed
[Error Message] [2015-07-31T10:32:34.070+0000] Cannot bootstrap node test-client-0.
ruby_block[wait_for_zookeeper_nodes] (mapr::startup line 235) had an error: RuntimeError: The abort signal is detected. Some key nodes failed to bootstrap, so abort bootstrapping node test-client-0.
SSH to this node and view the log file /var/chef/cache/chef-stacktrace.out, or run the command 'sudo chef-client' to view error messages.
  ----------------------------------------------------------------------------
[NAME] test-client-1
[STATUS] Bootstrap Failed
[Error Message] [2015-07-31T10:31:12.517+0000] Cannot bootstrap node test-client-1.
ruby_block[wait_for_zookeeper_nodes] (mapr::startup line 235) had an error: RuntimeError: The abort signal is detected. Some key nodes failed to bootstrap, so abort bootstrapping node test-client-1.
SSH to this node and view the log file /var/chef/cache/chef-stacktrace.out, or run the command 'sudo chef-client' to view error messages.
  ----------------------------------------------------------------------------

cluster create failed: Task execution failed: Bootstrapping cluster test failed.
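In output like this, most failures carry the cascading "The abort signal is detected" message, which only says that some other key node failed to bootstrap; the node whose error is something else is the one to investigate first. A minimal sketch, assuming the CLI output has been saved to a text file (the file name cluster-create.log and the function name root_cause_nodes are hypothetical):

```shell
# Sketch: read saved "cluster create" output on stdin and print each failed node
# whose error is NOT the cascading "abort signal is detected" message, i.e. the
# node(s) that most likely failed first.
root_cause_nodes() {
  awk '
    /^\[NAME\]/ { node = $2 }
    /had an error:/ && !/abort signal is detected/ { print node ": " $0 }
  '
}

# Usage:
#   root_cause_nodes < cluster-create.log
```

Run against the output above, it reports only test-mysql-0 and its yum_package[mapr-core] error.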

It seems the error comes from the mysql node?

jessehuvmw
Enthusiast

Yes. The mysql node failed with "yum_package[mapr-core] (mapr::prereqs line 105) had an error". Could you SSH to the mysql node (as user serengeti) and run 'sudo yum install mapr-core'? If it fails with a 'Timeout Error', that probably means the network between the mysql node and the Yum server is not fast enough.
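To see how close the install gets to a timeout, it helps to measure how long it actually takes on the node. A minimal sketch; elapsed_seconds is a hypothetical helper that simply brackets the command with two `date +%s` readings (the shell's `time` keyword gives the same information interactively):

```shell
# Sketch: run a command, send its own output to stderr, print the elapsed
# wall-clock time in whole seconds on stdout, and return the command's status.
elapsed_seconds() {
  local start end rc
  start=$(date +%s)
  "$@" >&2
  rc=$?
  end=$(date +%s)
  echo $((end - start))
  return "$rc"
}

# Usage on the mysql node, as user serengeti:
#   secs=$(elapsed_seconds sudo yum -y install mapr-core) && echo "mapr-core install took ${secs}s"
```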

Cheers, Jesse Hu
ztwy
Contributor

I ran 'sudo yum install mapr-core' without any issue (see the attached screenshot). It took about 3-4 minutes to download and install all the packages.

BTW, all the nodes, including the mysql node, and my local Yum server are on the same VLAN (1 Gb).

bde_yum.jpg

jessehuvmw
Enthusiast

3-4 minutes is a little long and might cause the yum timeout error. I haven't hit the timeout issue on my 10 Gb VLAN.

You can try to resolve this issue like this:

Log in to the BDE server as user serengeti and run these commands:

  sed -i  's|^yum_package name|  yum_package name do retries 8; retry_delay 10; end|' /opt/serengeti/chef/cookbooks/mapr/recipes/prereqs.rb

  knife cookbook upload -a -V

Then retry creating the failed MapR cluster with 'cluster create --name <cluster_name> --resume'.

This tells chef-client to retry up to 8 times (with a 10-second interval) when installing the mapr-core package.
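The sed pattern is anchored at the start of the line (`^yum_package name`), so it silently changes nothing if that line in prereqs.rb is indented, and `sed -i` keeps no copy of the original. A minimal sketch that saves a backup outside the cookbook directory and checks that the edit actually took effect before you run knife; the function name add_yum_retries and the default backup path $HOME/prereqs.rb.orig are hypothetical:

```shell
# Sketch: apply the retries/retry_delay edit to prereqs.rb, keeping a backup,
# and fail loudly if the column-0 pattern did not match anything.
add_yum_retries() {
  local recipe=${1:-/opt/serengeti/chef/cookbooks/mapr/recipes/prereqs.rb}
  local backup=${2:-$HOME/prereqs.rb.orig}
  # Keep a pristine copy outside the cookbook directory; don't overwrite it on a re-run.
  [ -e "$backup" ] || cp -p "$recipe" "$backup" || return 1
  sed -i 's|^yum_package name|  yum_package name do retries 8; retry_delay 10; end|' "$recipe"
  # A second run is harmless: the patched line starts with spaces, so ^yum_package no longer matches.
  if grep -q 'retries 8; retry_delay 10' "$recipe"; then
    echo "retries set in $recipe; now run: knife cookbook upload -a -V"
  else
    echo "no unindented 'yum_package name' line in $recipe; nothing changed" >&2
    return 1
  fi
}
```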

Cheers, Jesse Hu
ztwy
Contributor

The packages were already installed during the previous test. Should I remove them from the mysql node before resuming the cluster creation?

jessehuvmw
Enthusiast

No need to remove them. The mysql node does install the mapr-core package, because it uses a SQL file contained in that package. Sorry for the confusion.

Cheers, Jesse Hu
ztwy
Contributor

Hi Jesse Hu,

With your fix below, the cluster creation succeeded:

  sed -i  's|^yum_package name|  yum_package name do retries 8; retry_delay 10; end|' /opt/serengeti/chef/cookbooks/mapr/recipes/prereqs.rb

  knife cookbook upload -V (I had to add the -a parameter for this command to run)

Could we get a status update on all the "Bootstrap failed" issues I have hit in this thread? If I reinstall BDE 2.2 from scratch, which fixes should I apply? Do these fixes also work with the GUI? Will an official patch be available soon?

Thanks

jessehuvmw
Enthusiast

Hi ztwy,


Please follow these steps to apply the patch on a fresh installation of BDE 2.2:


Log in to the BDE Server as user serengeti, then run these commands:

   find /opt/serengeti/www/specs/Ironfan/mapr -name '*.json' | xargs sed -i '/mapr_resourcemanager/ a "mapr_historyserver",'

   sed -i  's|^yum_package name|  yum_package name do retries 8; retry_delay 10; end|' /opt/serengeti/chef/cookbooks/mapr/recipes/prereqs.rb

   knife cookbook upload -a -V


These fixes work for both the CLI and the GUI.


You can contact the BDE support team (.hadoop-support@vmware.com) or file an SR for a formal patch. For this specific bug, I think the above solution is enough.

Cheers, Jesse Hu
ztwy
Contributor

OK, I just tried creating the cluster with the GUI, and it works. Thank you for the information.
