VMware Cloud Community
splintereddy
Contributor

Cannot create a big data cluster using Cloudera Manager as the application manager in Big Data Extensions 2.3

Hi guys,

     Here is my lab environment:

          vCenter Server 6.5 U1 with Big Data Extensions 2.3 integrated

          Cloudera Manager 5.13.0 as the application manager, installed on CentOS 6.8, the same OS as the node template of Big Data Extensions

          enough datastore room, and enough addresses in the IP ranges

          a local DNS server; all forward and reverse lookups are OK (a quick check sketch follows this list)

          a local Cloudera Manager yum repository and parcel repository. Besides, I installed the Cloudera Manager agent, daemon, and Oracle j2sdk in the node template already (after installing, I removed the snapshot of the node template and restarted the management server).
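Here is the quick sanity check I run against the DNS server; a minimal sketch, and the server address, domain, and node names are placeholders for my lab:

#!/bin/bash
# Verify forward and reverse lookups for each cluster node
# against the local DNS server (placeholder names/IPs).
DNS_SERVER=10.0.0.2
DOMAIN=lab.local
for host in bde-master-0 bde-worker-0 bde-worker-1; do
  ip=$(dig +short @"$DNS_SERVER" "${host}.${DOMAIN}" A)
  echo "forward: ${host}.${DOMAIN} -> ${ip:-MISSING}"
  if [ -n "$ip" ]; then
    name=$(dig +short @"$DNS_SERVER" -x "$ip")
    echo "reverse: ${ip} -> ${name:-MISSING}"
  fi
done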

[screenshot attachment: QQ截图20180211100851.png]

     Here is the problem that I encountered:

          When I try to create a big data cluster using Cloudera Manager as the application manager, I can see that the CDH offered is exactly the version I put in my local parcel repository.

          When I finish the cluster-creation process, the VMs are cloned with the proper IPs and hostnames that I wrote in unbound.conf on my DNS server. The Cloudera Manager agent is started and the host agent installation is successful.
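For reference, the records in unbound.conf look roughly like this (the zone, names, and addresses are placeholders for my lab):

server:
    local-zone: "lab.local." static
    # forward record
    local-data: "bde-worker-0.lab.local. IN A 10.0.0.21"
    # reverse record
    local-data-ptr: "10.0.0.21 bde-worker-0.lab.local"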

          But it ends up with the error: an exception happens when the application manager creates the cluster. Creation fails.

It seems that Cloudera Manager couldn't install the parcels on the hosts.
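When it fails, I check the agent log on one of the nodes; the path below is the CM 5.x default, so adjust it if yours differs:

tail -n 50 /var/log/cloudera-scm-agent/cloudera-scm-agent.log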

Anyone know why?

Thank you.

[screenshot attachment: QQ截图20180211102123.png]

[screenshot attachment: QQ截图20180211105303.png]

[screenshot attachment: QQ截图20180211095125.png]

10 Replies
Qing_chi
VMware Employee

Hi,

Could you provide the serengeti log file under the directory /opt/serengeti/logs?

Thanks,

-qing

splintereddy
Contributor

Thanks for your help. It might be a problem of OS package dependencies.

I found a way to create the big data cluster successfully, but I'm still confused about how it works.

First of all, I deployed Big Data Extensions, added the DNS records to the DNS server, and added Cloudera Manager as the application manager. Then I tried to create the big data cluster using the local repository (including the Cloudera Manager agent, daemon, JDK, etc.) and the default CentOS repos; it failed at installing the Cloudera Manager agent, and the logs showed a package dependency error.

Then I installed the agent in the node template before creating the cluster, and it failed again. I guess the agent ID should be unique on every node, but if the agent is installed in the template beforehand, every cloned node carries the same ID (a reset sketch follows the screenshot below).

[screenshot attachment: QQ拼音截图20180227113232.png]
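If the duplicate ID really is the cause, one way to avoid it might be resetting the agent identity in the template before converting it back; a minimal sketch, assuming CM 5.x keeps the host identity in /var/lib/cloudera-scm-agent/uuid (worth verifying on your version):

service cloudera-scm-agent stop
# Remove the identity file so each clone generates a fresh uuid on
# first agent start (path assumed from CM 5.x defaults).
rm -f /var/lib/cloudera-scm-agent/uuid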

Then I modified the template: I renamed all the CentOS-*.repo files except CentOS-Media.repo to bak_CentOS-*.repo, and modified CentOS-Media.repo to use my local CentOS yum repository (roughly as sketched below).

Finally it succeeded.
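Roughly what the template change looks like; the repo id and baseurl are placeholders for my local mirror:

cd /etc/yum.repos.d
# Prefix-rename every CentOS repo file except the Media one.
for f in CentOS-*.repo; do
  [ "$f" = "CentOS-Media.repo" ] && continue
  mv "$f" "bak_$f"
done
# Point CentOS-Media.repo at the local CentOS mirror.
cat > CentOS-Media.repo <<'EOF'
[c6-media]
name=Local CentOS 6.8 mirror
baseurl=http://repo.lab.local/centos68/
gpgcheck=0
enabled=1
EOF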

Here is my question:

     All the master and worker nodes can access the Internet, so the CentOS package dependencies should be satisfiable in theory, but it seems they are not.

     There is a shell script, set-local-repo, in the /opt/serengeti/sbin directory of the node template. It creates a "backup" directory and moves all the CentOS*.repo files into it when I use the local Cloudera Manager yum repo. What is the purpose of this function? I commented out part of the code (lines 04-22) and it seems to have no effect. I know that the VM downloads and installs packages according to the repo files in the directory /etc/yum.repos.d/, but what happens if all the repo files are moved to the subfolder? What is the purpose of moving all the OS repos to the "backup" subfolder?

chmod 777 /etc/yum.repos.d
cd /etc/yum.repos.d
if [ ! -f /opt/serengeti/etc/keep_default_repo ]; then
  # create a backup folder first
  if [ ! -d "backup" ]; then
    mkdir backup
  fi
  # move all os repos to backup folder
  if ls /etc/yum.repos.d/CentOS*.repo 1>/dev/null 2>&1; then
    mv -f CentOS*.repo backup
  fi
  if ls /etc/yum.repos.d/rhel*.repo 1>/dev/null 2>&1; then
    mv -f rhel*.repo backup
  fi
  if ls /etc/yum.repos.d/fedora*.repo 1>/dev/null 2>&1; then
    mv -f fedora*.repo backup
  fi
fi

# for ambari we just return now
if [ $1 = "ambari" ]; then
  if rpm -q mysql-libs-5.1.73
  then
    yum remove -y mysql-libs-5.1.73
  fi
  exit 0
fi

# for cloudera-manager, we create a new local repo file
cat > aaa-local-app-manager.repo <<HERE
[$1]
name = local app manager yum server
baseurl = $2
gpgcheck = 0
enabled = 1
priority = 1
HERE

Besides, there is another little problem:

     Every time I reconnect to the vCenter Server or reboot the BDE server, there is an error saying that getting the big data clusters failed because the SSL certificate does not exist or is not trusted. Then I need to disconnect from and reconnect to the BDE management server to work around this error temporarily.

     How can I fix this error?

Qing_chi
VMware Employee

Hi,

It is an issue of the BDE GUI; we are fixing it. Could you file an SR with us? We can follow the status through the SR.

Thanks,

-qing

splintereddy
Contributor

I don't know how to file an SR because I'm trialing BDE with a temporary vCenter Server license in my demo environment.

Besides, is the BDE developer team in China, or is there any technical support in China?

I've heard that BDE will not receive any further updates; is that right? Is anyone using it in a production environment?

Qing_chi
VMware Employee

Hi,

You can file an SR on my.vmware.com if you have a production license.

The developer team of BDE is in China.

BDE is in maintenance mode right now. Many overseas customers are still using BDE in production environments.

Thanks,

-qing

splintereddy
Contributor

Well, I don't have any production license bought from VMware, and my test environment will expire on 2018-04-20.

I'm curious about the operating mechanism of BDE: how it works when cloning and customizing a VM, deploying the Hadoop parcels, configuring services, etc. I've read some of the scripts in it, but it's still not very clear to me.

I also find that creating a small cluster using Cloudera Manager is fine, but creation fails with a medium or larger cluster, which is odd.

Anyway, thanks for your help.

Qing_chi
VMware Employee

Hi,

1. The BDE server can manage vCenter resources, like datastores, resource pools, networking, and so on.

2. BDE can also create many types of Hadoop clusters, like CDH, HDP, and so on.

3. BDE can balance the Hadoop clusters' resources according to the vCenter resources, like datastores and racks.

Anyway, you can prepare a Hadoop cluster using just one command, roughly like the example below.
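A minimal sketch of what that one command looks like in the Serengeti CLI; the cluster name and spec file path are placeholders, so check cluster create --help for the exact flags on your build:

serengeti> cluster create --name myCdhCluster --specFile /home/serengeti/cdh_cluster_spec.json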

Thanks,

-qing

splintereddy
Contributor

Thanks for your reply.

Well, I might not have expressed it clearly enough. What I really want to figure out is the function of each shell, Python, or Ruby script: how BDE triggers the deployment of Hadoop components automatically, how it configures services, and so on.

I mean the principles and the details inside it. Are there any technical papers on it?

Besides, can you read Chinese?

I've been studying BDE recently. I think this kind of product can simplify the preparation work for creating a Hadoop cluster, but right now it may still have some small problems and isn't quite suitable for a production environment. I'm also researching how to make the creation and configuration of Hadoop clusters simpler and more automated, so I want to understand BDE's internal workflow and underlying principles in depth, rather than just clicking the mouse a few times in the GUI to roll out a cluster.

Qing_chi
VMware Employee

Hi,

I'm Chinese as well.

BDE mainly provides one-click creation of Hadoop clusters on vSphere infrastructure. You can define the Hadoop cluster's configuration through a JSON spec file.

Below is an example of such a JSON file. Under nodeGroups, "master" defines a Hadoop master node, "worker" defines the data nodes, and "client" defines a client for the needed services. Within nodeGroups you can also define the resources Hadoop uses, such as CPU, memory, and storage. haFlag enables the vSphere HA feature: when a node has a problem, it is automatically restarted. The configuration section provides an entry point for modifying the Hadoop cluster's configuration.

{
  "nodeGroups":[
    {
      "name": "master",
      "roles": [
        "hadoop_namenode",
        "hadoop_resourcemanager"
      ],
      "instanceNum": 1,
      "cpuNum": 2,
      "memCapacityMB": 7500,
      "storage": {
        "type": "SHARED",
        "sizeGB": 50
      },
      "haFlag": "on",
      "configuration": {
        "hadoop": {
        }
      }
    },
    {
      "name": "worker",
      "roles": [
        "hadoop_datanode",
        "hadoop_nodemanager"
      ],
      "instanceNum": 3,
      "cpuNum": 2,
      "memCapacityMB": 7500,
      "storage": {
        "type": "LOCAL",
        "sizeGB": 50
      },
      "haFlag": "off",
      "configuration": {
        "hadoop": {
        }
      }
    },
    {
      "name": "client",
      "roles": [
        "hadoop_client",
        "hive",
        "hive_server",
        "pig"
      ],
      "instanceNum": 1,
      "cpuNum": 1,
      "memCapacityMB": 3748,
      "storage": {
        "type": "LOCAL",
        "sizeGB": 50
      },
      "haFlag": "off",
      "configuration": {
        "hadoop": {
        }
      }
    }
  ],
  // we suggest running convert-hadoop-conf.rb to generate "configuration" section and paste the output here
  "configuration": {
    "hadoop": {
      "core-site.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml
        // note: any value (int, float, boolean, string) must be enclosed in double quotes and here is a sample:
        // "io.file.buffer.size": "4096"
      },
      "hdfs-site.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
      },
      "mapred-site.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-def...
      },
      "hadoop-env.sh": {
        // "HADOOP_HEAPSIZE": "",
        // "HADOOP_NAMENODE_OPTS": "",
        // "HADOOP_DATANODE_OPTS": "",
        // "HADOOP_SECONDARYNAMENODE_OPTS": "",
        // "HADOOP_JOBTRACKER_OPTS": "",
        // "HADOOP_TASKTRACKER_OPTS": "",
        // "HADOOP_CLASSPATH": "",
        // "JAVA_HOME": "",
        // "PATH": ""
      },
      "yarn-site.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
      },
      "yarn-env.sh": {
        // "YARN_OPTS": "",
        // "YARN_HEAPSIZE": "",
        // "JAVA_HEAP_MAX": "",
        // "YARN_RESOURCEMANAGER_OPTS": "",
        // "YARN_RESOURCEMANAGER_HEAPSIZE": "",
        // "YARN_NODEMANAGER_OPTS": "",
        // "YARN_NODEMANAGER_HEAPSIZE": "",
        // "YARN_PROXYSERVER_OPTS": "",
        // "YARN_PROXYSERVER_HEAPSIZE": "",
        // "YARN_CLIENT_OPTS": "",
        // "YARN_ROOT_LOGGER": "",
        // "YARN_CLASSPATH": ""
      },
      "log4j.properties": {
        // "hadoop.root.logger": "INFO,RFA",
        // "log4j.appender.RFA.MaxBackupIndex": "10",
        // "log4j.appender.RFA.MaxFileSize": "100MB",
        // "hadoop.security.logger": "DEBUG,DRFA"
      },
      "fair-scheduler.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
        // "text": "the full content of fair-scheduler.xml in one line"
      },
      "capacity-scheduler.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
      }
    }
  }
}

BDE allocates resources according to the JSON file the user provides. The basic steps are as follows:

1. Based on the available resources, calculate where each Hadoop node should be placed, e.g. on which host and which datastore.

2. Clone the required Hadoop nodes from the BDE template VM and place them on the calculated hosts.

3. Power on the Hadoop nodes and initialize their configuration (networking, storage). Once all nodes have an IP and FQDN, the infrastructure the Hadoop cluster needs is ready.

4. According to the app manager the user chose, BDE performs the automated deployment of the Hadoop services and starts them as needed.

The advantage of BDE is that users can create and delete Hadoop clusters at any time as needed, without preparing a lot of infrastructure (hosts, network, storage) for every cluster, which greatly reduces the IT workload. As far as I know, the largest cluster among current BDE users has roughly 256 Hadoop data nodes, and it runs very stably.

mogleygull
Contributor

Try using this big database management service support. The solutions performed by this company are designed for any kind of business and performance-problem diagnostics.
