VMware Cloud Community
Mohsin_Kamal
Contributor

Cluster Creation Failed with Ambari AppManager

Hi,

I get the error shown below when creating a cluster with this command:

(cluster create --name hdp --distro HDP-1.3.2 --appManager Ambari --networkName Hadoop_NW)

serengeti>appmanager list

  NAME     DESCRIPTION                  TYPE     URL

  ----------------------------------------------------------------------

  Default  Default application manager  Default

  ambari   AmbariServer                 Ambari   http://10.6.55.239:8080

==========================

It seems that the agent on the host cannot connect to the server. The problem is that the Ambari Server is not located at localhost:8080. How can I change this to the Ambari server's address?

Running setup agent script...

==========================

{'exitstatus': 1, 'log': "Host registration aborted. Ambari Agent host cannot reach Ambari Server 'localhost:8080'. Please check the network connectivity between the Ambari Agent host and the Ambari Server"}

Connection to node1.hadooptest.com closed.

SSH command execution finished

host=node1.hadooptest.com, exitcode=1

ERROR: Bootstrap of host node1.hadooptest.com fails because previous action finished with non-zero exit code (1)

ERROR MESSAGE: tcgetattr: Invalid argument

Connection to node1.hadooptest.com closed.

STDOUT: {'exitstatus': 1, 'log': "Host registration aborted. Ambari Agent host cannot reach Ambari Server 'localhost:8080'. Please check the network connectivity between the Ambari Agent host and the Ambari Server"}

Connection to node1.hadooptest.com closed.

10 Replies
charliejllewell
Enthusiast

Hi,

It sounds like the FQDN of the Ambari server is not properly set. DNS is a requirement of the setup. Make sure that "hostname -f" returns the correct FQDN on every server in the setup and can be resolved correctly by all other hosts.

Cheers

Charlie
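The FQDN check described above can be sketched in Python. This is only an illustration, not part of Serengeti or Ambari: `fqdn_problems` and `check_local_host` are hypothetical helper names, and `socket.getfqdn()` is used as a rough stand-in for `hostname -f` (the two can differ on some systems).

```python
import ipaddress
import socket

# Names that never work as a server address for remote agents.
LOOPBACK_NAMES = {"localhost", "localhost.localdomain"}


def fqdn_problems(fqdn, addresses):
    """Return a list of problems found; an empty list means the name looks usable."""
    problems = []
    if fqdn.lower() in LOOPBACK_NAMES or "." not in fqdn:
        problems.append(f"{fqdn!r} is not a fully qualified name")
    if not addresses:
        problems.append(f"{fqdn!r} does not resolve to any address")
    for addr in addresses:
        if ipaddress.ip_address(addr).is_loopback:
            problems.append(f"{fqdn!r} resolves to loopback address {addr}")
    return problems


def check_local_host():
    """Look up this machine's own name and report any problems with it."""
    fqdn = socket.getfqdn()
    try:
        addresses = socket.gethostbyname_ex(fqdn)[2]
    except OSError:
        # Resolution failed; treat it as "no addresses".
        addresses = []
    return fqdn, fqdn_problems(fqdn, addresses)
```

Running `check_local_host()` on the Ambari server and on each node gives the name the machine reports for itself plus any problems found. A result naming `localhost` would match the `localhost:8080` seen in the error.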

Mohsin_Kamal
Contributor

Hi,

hostname -f gives:

[root@localhost conf]# hostname -f

ambari.hadooptest.com

and all the hosts are able to ping ambari.hadooptest.com:

ping ambari.hadooptest.com

PING ambari.hadooptest.com (10.6.55.239) 56(84) bytes of data.

64 bytes from ambari.hadooptest.com (10.6.55.239): icmp_seq=1 ttl=64 time=12.9 ms

I am able to create default clusters, but not clusters that use the Ambari app manager.

Mohsin
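A successful ping only shows that ICMP traffic gets through. The agent needs a TCP connection to the server port, so a check like the following, run from a node, may be more telling. This is a minimal sketch: `can_connect` is a hypothetical helper, and the host and port are taken from the appmanager URL shown earlier in the thread.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `can_connect("ambari.hadooptest.com", 8080)` on node1 should return True if the server is reachable under its real name. It does not explain why the agent was told to use `localhost` in the first place.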

charliejllewell
Enthusiast

Hi Mohsin,

That is strange. Are you able to post the serengeti, Ambari server and agent logs?

Cheers

Charlie

Qing_chi
VMware Employee

Hi Mohsin,

Could you post the Serengeti log (/opt/serengeti/log/serengeti.log) and the Ambari server log (/var/log/ambari-server/)? I will take a look at them and try to find the root cause.

Thanks,

-qing

Mohsin_Kamal
Contributor

Hi Charlie & Qing,

Please find the requested logs attached.

Mohsin

charliejllewell
Enthusiast

Hi Mohsin,

Could you post the contents of /etc/hosts from the Ambari server too, please?

Thanks

Charlie

Mohsin_Kamal
Contributor

Hi Charlie,

The following are the entries in /etc/hosts:

[root@ambari ~]# more /etc/hosts

127.0.0.1 localhost

10.6.55.241 node1.hadooptest.com

10.6.55.242 node2.hadooptest.com

10.6.55.243 node3.hadooptest.com

10.6.55.244 node4.hadooptest.com

10.6.55.245 node5.hadooptest.com

10.6.55.246 node6.hadooptest.com

10.6.55.247 node7.hadooptest.com

10.6.55.248 node8.hadooptest.com

10.6.55.249 node9.hadooptest.com

10.6.55.250 node10.hadooptest.com

10.6.55.251 node11.hadooptest.com

10.6.55.252 node12.hadooptest.com

10.6.55.239 ambari.hadooptest.com

Mohsin
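One reason to inspect /etc/hosts is to see whether the server's FQDN has been placed on a loopback line, which can make a machine identify itself as localhost. A minimal sketch of that check, assuming the file is passed in as text; `parse_hosts` and `names_on_loopback` are hypothetical helper names.

```python
import ipaddress


def parse_hosts(text):
    """Map each hostname in hosts-file text to the addresses it is listed under."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name.lower(), []).append(address)
    return mapping


def names_on_loopback(mapping):
    """Return the sorted hostnames that map to at least one loopback address."""
    found = []
    for name, addresses in mapping.items():
        for addr in addresses:
            try:
                if ipaddress.ip_address(addr).is_loopback:
                    found.append(name)
                    break
            except ValueError:
                continue  # not a valid IP address; ignore the entry
    return sorted(found)
```

With the entries posted above, only `localhost` is on a loopback address and `ambari.hadooptest.com` maps to 10.6.55.239, so the hosts file itself looks consistent.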

Qing_chi
VMware Employee

Hi Mohsin,

I have tested it on my BDE server with Ambari.

Steps:

1. Set the hostname to 'localhost' on the Ambari server ('hostname localhost').

2. Restart the Ambari server service ('service ambari-server restart').

3. Create a cluster using this Ambari server. I then got the same error message as you:

      {'exitstatus': 1, 'log': "Host registration aborted. Ambari Agent host cannot reach Ambari Server 'localhost:8080'. Please check the network  connectivity between the Ambari Agent host and the Ambari Server"}

4. Set the hostname back to the correct value using the command 'hostname FQDN'.

5. Restart the Ambari server service ('service ambari-server restart').

6. The cluster resumed successfully on the BDE server.

So, could you try steps 4 to 6? Let me know if you have any questions. If it still fails, we will need to ask the Hortonworks engineers in their community.

Thanks,

-qing

Mohsin_Kamal
Contributor

Hi Qing,

Thanks for the solution. It solved the earlier problem, but now I have a new one. The error says "Failed to start ping port listener of:[Errno 98] Address already in use". This is the only address on the LAN. What could be causing this issue?

Mohsin

The failed nodes: 1

  ----------------------------------------------------------------------------

[NAME] hdp2-worker-0

[STATUS] VM Ready

[Error Message] ==========================

Copying common functions script...

==========================

scp /usr/lib/python2.6/site-packages/common_functions

host=node5.hadooptest.com, exitcode=0

==========================

Copying OS type check script...

==========================

scp /usr/lib/python2.6/site-packages/ambari_server/os_check_type.py

host=node5.hadooptest.com, exitcode=0

==========================

Running OS type check...

==========================

Cluster primary/cluster OS type is redhat6 and local/current OS type is redhat6

Connection to node5.hadooptest.com closed.

SSH command execution finished

host=node5.hadooptest.com, exitcode=0

==========================

Checking 'sudo' package on remote host...

==========================

sudo-1.8.6p3-12.el6.x86_64

Connection to node5.hadooptest.com closed.

SSH command execution finished

host=node5.hadooptest.com, exitcode=0

==========================

Copying repo file to 'tmp' folder...

==========================

scp /etc/yum.repos.d/ambari.repo

host=node5.hadooptest.com, exitcode=0

==========================

Moving file to repo dir...

==========================

Connection to node5.hadooptest.com closed.

SSH command execution finished

host=node5.hadooptest.com, exitcode=0

==========================

Copying setup script file...

==========================

scp /usr/lib/python2.6/site-packages/ambari_server/setupAgent.py

host=node5.hadooptest.com, exitcode=0

==========================

Running setup agent script...

==========================

Restarting ambari-agent

Verifying Python version compatibility...

Using python  /usr/bin/python2.6

Found ambari-agent PID: 1682

Stopping ambari-agent

Removing PID file at /var/run/ambari-agent/ambari-agent.pid

ambari-agent successfully stopped

Verifying Python version compatibility...

Using python  /usr/bin/python2.6

Checking for previously running Ambari Agent...

Starting ambari-agent

Verifying ambari-agent process status...

ERROR: ambari-agent start failed

Agent out at: /var/log/ambari-agent/ambari-agent.out

Agent log at: /var/log/ambari-agent/ambari-agent.log

('INFO 2015-04-01 06:19:59,137 HostCheckReportFileHandler.py:109 - Creating host check file at /var/lib/ambari-agent/data/hostcheck.result

INFO 2015-04-01 06:19:59,205 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:09,207 Heartbeat.py:76 - Sending heartbeat with response id: 1 and timestamp: 1427869209207. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:20:09,251 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:19,252 Heartbeat.py:76 - Sending heartbeat with response id: 2 and timestamp: 1427869219252. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:20:19,296 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:29,296 Heartbeat.py:76 - Sending heartbeat with response id: 3 and timestamp: 1427869229296. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:20:29,340 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:39,340 Heartbeat.py:76 - Sending heartbeat with response id: 4 and timestamp: 1427869239340. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:20:39,384 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:49,384 Heartbeat.py:76 - Sending heartbeat with response id: 5 and timestamp: 1427869249384. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:20:49,428 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:59,429 Heartbeat.py:76 - Sending heartbeat with response id: 6 and timestamp: 1427869259429. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:21:05,061 main.py:83 - loglevel=logging.INFO

INFO 2015-04-01 06:21:10,870 main.py:83 - loglevel=logging.INFO

INFO 2015-04-01 06:21:10,871 DataCleaner.py:36 - Data cleanup thread started

INFO 2015-04-01 06:21:10,875 DataCleaner.py:71 - Data cleanup started

INFO 2015-04-01 06:21:10,876 DataCleaner.py:73 - Data cleanup finished

ERROR 2015-04-01 06:21:10,877 PingPortListener.py:44 - Failed to start ping port listener of:[Errno 98] Address already in use

INFO 2015-04-01 06:21:10,877 PingPortListener.py:52 - Ping port listener killed

', None)

Connection to node5.hadooptest.com closed.

SSH command execution finished

host=node5.hadooptest.com, exitcode=255

ERROR: Bootstrap of host node5.hadooptest.com fails because previous action finished with non-zero exit code (255)

ERROR MESSAGE: tcgetattr: Invalid argument

Connection to node5.hadooptest.com closed.

STDOUT: Restarting ambari-agent

Verifying Python version compatibility...

Using python  /usr/bin/python2.6

Found ambari-agent PID: 1682

Stopping ambari-agent

Removing PID file at /var/run/ambari-agent/ambari-agent.pid

ambari-agent successfully stopped

Verifying Python version compatibility...

Using python  /usr/bin/python2.6

Checking for previously running Ambari Agent...

Starting ambari-agent

Verifying ambari-agent process status...

ERROR: ambari-agent start failed

Agent out at: /var/log/ambari-agent/ambari-agent.out

Agent log at: /var/log/ambari-agent/ambari-agent.log

('INFO 2015-04-01 06:19:59,137 HostCheckReportFileHandler.py:109 - Creating host check file at /var/lib/ambari-agent/data/hostcheck.result

INFO 2015-04-01 06:19:59,205 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:09,207 Heartbeat.py:76 - Sending heartbeat with response id: 1 and timestamp: 1427869209207. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:20:09,251 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:19,252 Heartbeat.py:76 - Sending heartbeat with response id: 2 and timestamp: 1427869219252. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:20:19,296 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:29,296 Heartbeat.py:76 - Sending heartbeat with response id: 3 and timestamp: 1427869229296. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:20:29,340 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:39,340 Heartbeat.py:76 - Sending heartbeat with response id: 4 and timestamp: 1427869239340. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:20:39,384 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:49,384 Heartbeat.py:76 - Sending heartbeat with response id: 5 and timestamp: 1427869249384. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:20:49,428 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:59,429 Heartbeat.py:76 - Sending heartbeat with response id: 6 and timestamp: 1427869259429. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:21:05,061 main.py:83 - loglevel=logging.INFO

INFO 2015-04-01 06:21:10,870 main.py:83 - loglevel=logging.INFO

INFO 2015-04-01 06:21:10,871 DataCleaner.py:36 - Data cleanup thread started

INFO 2015-04-01 06:21:10,875 DataCleaner.py:71 - Data cleanup started

INFO 2015-04-01 06:21:10,876 DataCleaner.py:73 - Data cleanup finished

ERROR 2015-04-01 06:21:10,877 PingPortListener.py:44 - Failed to start ping port listener of:[Errno 98] Address already in use

INFO 2015-04-01 06:21:10,877 PingPortListener.py:52 - Ping port listener killed

', None)

Connection to node5.hadooptest.com closed.

  ----------------------------------------------------------------------------

cluster hdp2 resume failed: Task execution failed: An exception happens when App_Manager (Ambari) creates the cluster: (hdp2). Creation fails..

Qing_chi
VMware Employee

Hi Mohsin,

You need to kill the Ambari processes that are still running.

Log in to each Hadoop node and run the following commands:

ps -ef | grep ambari

kill -9 <process_id>

Then run cluster resume on the BDE server.

Thanks,

-qing
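On Linux, errno 98 is EADDRINUSE. It means the TCP port the agent wants to listen on is already bound by another process on the same node, not that the node's IP address is duplicated on the LAN. A leftover agent process would explain it. The sketch below shows how to test whether a port is free. `port_is_free` is a hypothetical helper, and 8670 is assumed to be the agent's ping port (the usual default in ambari-agent.ini), so check your own configuration.

```python
import errno
import socket


def port_is_free(port, host="0.0.0.0"):
    """Try to bind host:port; return False if the bind fails with EADDRINUSE."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return False
        raise  # some other problem, e.g. permission denied
    finally:
        sock.close()
```

If `port_is_free(8670)` returns False on a node after the agent has supposedly stopped, some process still holds the port. Killing the stale Ambari processes as described above should release it.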
