VMware Cloud Community
stevehoward2020
Enthusiast

How to install BDE cluster without software

In BDE 2.1, it is documented that you can create the guests for a cluster and let the distro management software (such as Ambari) install the cluster.

We are looking to upgrade to 2.1, but we are currently on 2.0. Is it possible to simply create the VMs and then install the cluster from Ambari? We tried installing the cluster, deleting the software (rpm --erase), and restarting the guests. It looks like BDE reinstalls the software when this happens.

It isn't the end of the world if we can't, as I said, we will upgrade to 2.1 this coming week.  However, is what I have above possible?

1 Solution

Accepted Solutions
jessehuvmw
Enthusiast

     Create a Basic Cluster with the Serengeti Command-Line Interface

Yes, you can use BDE 2.0 or 2.1 to create a basic cluster in your Serengeti environment. A basic cluster is a group of virtual machines provisioned and managed by Serengeti. Serengeti helps you plan and provision the virtual machines to your specifications. You can then use the basic cluster's virtual machines to install Big Data applications.

The basic cluster does not install the Big Data application packages used when creating a Hadoop or HBase cluster. Instead, you install and manage Big Data applications with third-party application management tools such as Apache Ambari or Cloudera Manager within your Big Data Extensions environment, and integrate them with your Hadoop software. Because the basic cluster does not deploy a Hadoop or HBase cluster itself, you must deploy software into the basic cluster's virtual machines using an external third-party application management tool.


The Serengeti package includes an annotated sample cluster specification file that you can use as an example when you create your basic cluster specification file. In the Serengeti Management Server, the sample specification file is located at /opt/serengeti/samples/basic_cluster.json. You can modify the configuration values in the sample cluster specification file to meet your requirements. The only value you cannot change is the value assigned to the role for each node group, which must always be basic.

You can deploy a basic cluster with the Big Data Extension plug-in using a customized cluster specification file.


To deploy software within the basic cluster virtual machines, use the cluster list --detail command, or run serengeti-ssh.sh cluster_name, to obtain the IP addresses of the virtual machines. You can then use the IP addresses with management applications such as Apache Ambari or Cloudera Manager to provision the virtual machines with software of your choosing. When the management tool needs a user name and password to connect to the virtual machines, configure it to use the user name serengeti and the password you specified when creating the basic cluster within Big Data Extensions.

Cheers, Jesse Hu


5 Replies
charliejllewell
Enthusiast

Jesse is totally correct about deploying a BDE basic cluster; however, a word of warning. BDE does not present the RACK topology information, so you will need an alternative method to discover which VMs are on which hosts, to allow you to correctly specify the topology and make sure block replication happens safely.

Not doing this could cause you a serious amount of pain.

Charlie

jessehuvmw
Enthusiast

Thanks Charlie.  You are correct that when deploying a basic cluster in BDE and using Ambari or Cloudera Manager to install Hadoop on the VMs of that cluster, the user needs to find the rack topology for all VMs themselves.

In BDE 2.0, you can use one of the approaches below to get the rack topology:

  • run the BDE CLI 'cluster list --name <name> --detail' to get each VM's IP and its ESX host IP
  • call the corresponding REST API related to 'cluster list --name <name> --detail'
  • extract the ESX host IP and rack from the database:  [serengeti]$ echo "select vm_name, nic.ipv4_address, host_name, rack from node, nic where node.id = nic.node_id and vm_name like '<cluster_name>-%'" | psql

In BDE 2.1, you can run this BDE CLI command to get the rack topology:  cluster export --name <name> --type RACK .  The output is  <vm1_ip> <rack_name> \n <vm2_ip> <rack_name> ...   The REST API is https://bde_server_ip:8443/serengeti/api/cluster/<name>/rack, and you can refer to Charlie's code Hadoop/rack-topology.py at master · charliejllewellyn/Hadoop · GitHub
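For anyone wiring this into Hadoop's rack awareness, here is a minimal sketch of how the export output described above could be parsed into the kind of lookup a net.topology.script would use (in the spirit of Charlie's rack-topology.py). The sample lines and the default-rack fallback are assumptions for illustration, not output from a real cluster:

```python
# Sketch: turn `cluster export --name <name> --type RACK` output
# (one "<vm_ip> <rack_name>" pair per line, as described above)
# into an {ip: rack} lookup for a Hadoop topology script.

def parse_rack_export(text):
    """Parse "<ip> <rack>" lines into an {ip: rack} mapping."""
    mapping = {}
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) == 2:
            ip, rack = parts
            mapping[ip] = rack
    return mapping

def resolve(mapping, ip, default="/default-rack"):
    """What a net.topology.script.file.name script would print for one IP."""
    return mapping.get(ip, default)

# Assumed sample output for illustration:
export_output = """\
192.168.2.11 /rack1
192.168.2.12 /rack2
"""

racks = parse_rack_export(export_output)
print(resolve(racks, "192.168.2.11"))  # -> /rack1
print(resolve(racks, "10.0.0.99"))     # -> /default-rack (unknown IP)
```

A real topology script would read the IPs from its command-line arguments and print one rack per argument; the parsing above is the only BDE-specific part.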

Cheers, Jesse Hu
charliejllewell
Enthusiast

Thanks for pointing this out Jesse. The only problem is that when calling the methods you describe against a basic cluster, BDE does not provide the host details in the rack location; instead it lists "/default-rack".

For example:

serengeti>cluster export --name cust2 --type RACK

192.168.2.11 /default-rack

192.168.2.12 /default-rack

Same result via the API:

{

192.168.2.11: "/default-rack"

192.168.2.13: "/default-rack"

}

The only method I have found to work around this is to extract the VM information from BDE by querying the cluster spec, and then query vCenter directly to find the host each VM runs on. This can then be used to build the topology. It would be nice if VMware could patch this so the information on VM location is presented regardless of whether it is an Ironfan, app manager, or basic cluster.
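The workaround above boils down to a small transformation once you have the VM-to-host mapping out of vCenter. A minimal sketch, assuming you already obtained that mapping (e.g. via pyVmomi; the IPs and host names below are made-up stand-ins), treating each ESX host as its own rack so HDFS places replicas on distinct hosts:

```python
# Sketch of the workaround: given a VM-IP -> ESX-host mapping pulled from
# vCenter (hard-coded here as an assumption), build a host-as-rack topology
# so block replication is spread across physical hosts.

def host_as_rack_topology(vm_to_host):
    """Map each VM IP to a rack path derived from its ESX host name."""
    return {ip: "/" + host for ip, host in vm_to_host.items()}

# Assumed values for illustration only:
vm_to_host = {
    "192.168.2.11": "esx-host-01",
    "192.168.2.12": "esx-host-02",
    "192.168.2.13": "esx-host-01",
}

topology = host_as_rack_topology(vm_to_host)
for ip, rack in sorted(topology.items()):
    print(ip, rack)   # e.g. 192.168.2.11 /esx-host-01
```

The resulting mapping can be fed to the same kind of topology script a normal rack file would drive.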

jessehuvmw
Enthusiast

Hi Charlie,

If you create the basic cluster with "cluster create --name ... --topology ...", you will get some rack info. Even if you did not specify a topology at creation time, you can run "cluster export --name cust2 --type RACK --topology ..." to get the rack topology for the specified type. The --topology value can be RACK_AS_RACK, HOST_AS_RACK, or HVE. You need to run 'topology upload' first to provide the ESX HOST to RACK mapping.
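To make the moving parts concrete, here is a small sketch of how an uploaded host-to-rack mapping combines with a VM-to-host mapping to give per-VM racks. The "rack: host1,host2" file layout is an assumption about the topology-upload format (check the BDE documentation for the exact syntax), and all names below are illustrative:

```python
# Sketch: combine an assumed "rack_name: host1,host2" host-to-rack file
# (the kind of mapping `topology upload` consumes) with a VM -> host
# mapping to produce per-VM rack info, as a RACK_AS_RACK export would.

def parse_topology_file(text):
    """Parse "rack: host1,host2" lines into a {host: rack} mapping."""
    host_to_rack = {}
    for line in text.strip().splitlines():
        rack, _, hosts = line.partition(":")
        for host in hosts.split(","):
            host_to_rack[host.strip()] = rack.strip()
    return host_to_rack

def vm_racks(vm_to_host, host_to_rack, default="/default-rack"):
    """Resolve each VM's rack via the host it runs on."""
    return {ip: host_to_rack.get(host, default)
            for ip, host in vm_to_host.items()}

# Assumed file contents and placement, for illustration:
topology_file = """\
rack1: esx-host-01, esx-host-02
rack2: esx-host-03
"""

host_to_rack = parse_topology_file(topology_file)
print(vm_racks({"192.168.2.11": "esx-host-03"}, host_to_rack))
# -> {'192.168.2.11': 'rack2'}
```

VMs on hosts missing from the uploaded mapping fall back to the default rack, which matches the "/default-rack" behaviour Charlie observed when no mapping is available.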

Cheers, Jesse Hu