Mohsin_Kamal
Contributor

Cluster Creation Failed with Ambari AppManager

Hi,

I am getting the error below when creating a cluster with this command:

cluster create --name hdp --distro HDP-1.3.2 --appManager Ambari --networkName Hadoop_NW

serengeti>appmanager list

  NAME     DESCRIPTION                  TYPE     URL

  ----------------------------------------------------------------------

  Default  Default application manager  Default

  ambari   AmbariServer                 Ambari   http://10.6.55.239:8080

==========================

It seems that the agent on the host is not able to connect to the server. The problem is that the Ambari Server is not located at localhost:8080. How can I change it to the Ambari server's address?

Running setup agent script...

==========================

{'exitstatus': 1, 'log': "Host registration aborted. Ambari Agent host cannot reach Ambari Server 'localhost:8080'. Please check the network connectivity between the Ambari Agent host and the Ambari Server"}

Connection to node1.hadooptest.com closed.

SSH command execution finished

host=node1.hadooptest.com, exitcode=1

ERROR: Bootstrap of host node1.hadooptest.com fails because previous action finished with non-zero exit code (1)

ERROR MESSAGE: tcgetattr: Invalid argument

Connection to node1.hadooptest.com closed.

STDOUT: {'exitstatus': 1, 'log': "Host registration aborted. Ambari Agent host cannot reach Ambari Server 'localhost:8080'. Please check the network connectivity between the Ambari Agent host and the Ambari Server"}

Connection to node1.hadooptest.com closed.

charliejllewell
Enthusiast

Hi,

It sounds like the FQDN of the Ambari server is not properly set. DNS is a requirement of the setup. Make sure that "hostname -f" returns the correct FQDN on every server in the setup and can be resolved correctly by all other hosts.
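As a rough illustration of that check, here is a minimal Python sketch that could be run on each host. It is not part of Ambari or BDE; the function name fqdn_problems and the injectable resolve parameter are made up for this example. It flags a name that is not fully qualified, does not resolve, or resolves to a loopback address.

```python
import socket

# Names that should never be reported as a server's FQDN.
LOOPBACK_NAMES = {"localhost", "localhost.localdomain"}


def fqdn_problems(fqdn, resolve=socket.gethostbyname):
    """Return a list of human-readable problems found for this host name.

    `resolve` is injectable so the check can be exercised without real DNS.
    """
    problems = []
    if fqdn in LOOPBACK_NAMES or "." not in fqdn:
        problems.append("%r does not look like a fully qualified name" % fqdn)
    try:
        address = resolve(fqdn)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        problems.append("cannot resolve %r: %s" % (fqdn, exc))
    else:
        if address.startswith("127."):
            problems.append("%r resolves to loopback address %s" % (fqdn, address))
    return problems
```

On a real host, calling fqdn_problems(socket.getfqdn()) and printing the result should give an empty list when the name is set up as described above.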

Cheers

Charlie

Mohsin_Kamal
Contributor

Hi,

hostname -f gives:

[root@localhost conf]# hostname -f

ambari.hadooptest.com

and all the hosts are able to ping ambari.hadooptest.com:

ping ambari.hadooptest.com

PING ambari.hadooptest.com (10.6.55.239) 56(84) bytes of data.

64 bytes from ambari.hadooptest.com (10.6.55.239): icmp_seq=1 ttl=64 time=12.9 ms

I am able to create default clusters, but not clusters that use the Ambari app manager.

Mohsin

charliejllewell
Enthusiast

Hi Mohsin,

That is strange. Are you able to post the serengeti, Ambari server and agent logs?

Cheers

Charlie

Qing_chi
VMware Employee

Hi Mohsin,

Could you post the Serengeti log (/opt/serengeti/log/serengeti.log) and the Ambari server log (/var/log/ambari-server/)? I will take a look at them and find the root cause.

Thanks,

-qing

Mohsin_Kamal
Contributor

Hi Charlie & Qing

Please find the required log attached.

Mohsin

charliejllewell
Enthusiast

Hi Mohsin,

Could you post the contents of /etc/hosts from the Ambari server too, please?

Thanks

Charlie

Mohsin_Kamal
Contributor

Hi Charlie,

The following are the entries in /etc/hosts:

[root@ambari ~]# more /etc/hosts

127.0.0.1 localhost

10.6.55.241 node1.hadooptest.com

10.6.55.242 node2.hadooptest.com

10.6.55.243 node3.hadooptest.com

10.6.55.244 node4.hadooptest.com

10.6.55.245 node5.hadooptest.com

10.6.55.246 node6.hadooptest.com

10.6.55.247 node7.hadooptest.com

10.6.55.248 node8.hadooptest.com

10.6.55.249 node9.hadooptest.com

10.6.55.250 node10.hadooptest.com

10.6.55.251 node11.hadooptest.com

10.6.55.252 node12.hadooptest.com

10.6.55.239 ambari.hadooptest.com
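For readers who want to sanity-check a listing like this one, here is a small Python sketch (an illustration only; parse_hosts and loopback_names are hypothetical helper names, not an Ambari tool). It parses /etc/hosts-style text and reports which names point at a loopback address. In the listing above, only localhost does, and ambari.hadooptest.com maps to 10.6.55.239.

```python
def parse_hosts(text):
    """Map each host name in /etc/hosts-style text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name, address)  # keep the first entry seen
    return mapping


def loopback_names(mapping):
    """Return the names that map to a loopback address, sorted."""
    return sorted(
        name
        for name, address in mapping.items()
        if address.startswith("127.") or address == "::1"
    )
```

Usage would be along the lines of loopback_names(parse_hosts(open("/etc/hosts").read())). If the Ambari server's FQDN shows up in that result, the hosts file is a likely culprit.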

Mohsin

Qing_chi
VMware Employee

Hi Mohsin,

I have tested it on my BDE server with Ambari.

Steps:

1. Set the hostname to 'localhost' on the Ambari server ('hostname localhost').

2. Restart the Ambari server service ('service ambari-server restart').

3. Create a cluster using this Ambari server. I then got the same error message as you:

      {'exitstatus': 1, 'log': "Host registration aborted. Ambari Agent host cannot reach Ambari Server 'localhost:8080'. Please check the network connectivity between the Ambari Agent host and the Ambari Server"}

4. Set the hostname back to the correct FQDN using the command 'hostname FQDN'.

5. Restart the Ambari server service ('service ambari-server restart').

6. The cluster then resumed successfully on the BDE server.

So, could you try steps 4 to 6? Let me know if you have any questions. If it still fails, we will need to ask the Hortonworks engineers in their community.
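The symptom in step 3 can also be spotted mechanically in a bootstrap log. The following Python sketch is only an illustration (the function names are made up here, and the pattern is based solely on the message text quoted above). It extracts the server address from the "cannot reach Ambari Server" message and reports whether it is a loopback name, which would suggest the server hostname is wrong rather than the network.

```python
import re

# Matches the address quoted in the bootstrap message shown in step 3.
UNREACHABLE = re.compile(r"cannot reach Ambari Server '([^':]+):(\d+)'")


def unreachable_server(log_text):
    """Return (host, port) from the first 'cannot reach' message, or None."""
    match = UNREACHABLE.search(log_text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def advertises_loopback(log_text):
    """True if the log says the agent was pointed at a loopback server name."""
    server = unreachable_server(log_text)
    if server is None:
        return False
    host = server[0]
    return host in ("localhost", "localhost.localdomain") or host.startswith("127.")
```

If advertises_loopback returns True for a bootstrap log, steps 4 to 6 are the first thing to try.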

Thanks,

-qing

Mohsin_Kamal
Contributor

Hi Qing,

Thanks for the solution. It solved the earlier problem, but now I have a new one. The error says "Failed to start ping port listener of:[Errno 98] Address already in use". This is the only address on the LAN, so what could be causing this issue?

Mohsin

The failed nodes: 1

  ----------------------------------------------------------------------------

[NAME] hdp2-worker-0

[STATUS] VM Ready

[Error Message] ==========================

Copying common functions script...

==========================

scp /usr/lib/python2.6/site-packages/common_functions

host=node5.hadooptest.com, exitcode=0

==========================

Copying OS type check script...

==========================

scp /usr/lib/python2.6/site-packages/ambari_server/os_check_type.py

host=node5.hadooptest.com, exitcode=0

==========================

Running OS type check...

==========================

Cluster primary/cluster OS type is redhat6 and local/current OS type is redhat6

Connection to node5.hadooptest.com closed.

SSH command execution finished

host=node5.hadooptest.com, exitcode=0

==========================

Checking 'sudo' package on remote host...

==========================

sudo-1.8.6p3-12.el6.x86_64

Connection to node5.hadooptest.com closed.

SSH command execution finished

host=node5.hadooptest.com, exitcode=0

==========================

Copying repo file to 'tmp' folder...

==========================

scp /etc/yum.repos.d/ambari.repo

host=node5.hadooptest.com, exitcode=0

==========================

Moving file to repo dir...

==========================

Connection to node5.hadooptest.com closed.

SSH command execution finished

host=node5.hadooptest.com, exitcode=0

==========================

Copying setup script file...

==========================

scp /usr/lib/python2.6/site-packages/ambari_server/setupAgent.py

host=node5.hadooptest.com, exitcode=0

==========================

Running setup agent script...

==========================

Restarting ambari-agent

Verifying Python version compatibility...

Using python  /usr/bin/python2.6

Found ambari-agent PID: 1682

Stopping ambari-agent

Removing PID file at /var/run/ambari-agent/ambari-agent.pid

ambari-agent successfully stopped

Verifying Python version compatibility...

Using python  /usr/bin/python2.6

Checking for previously running Ambari Agent...

Starting ambari-agent

Verifying ambari-agent process status...

ERROR: ambari-agent start failed

Agent out at: /var/log/ambari-agent/ambari-agent.out

Agent log at: /var/log/ambari-agent/ambari-agent.log

('INFO 2015-04-01 06:19:59,137 HostCheckReportFileHandler.py:109 - Creating host check file at /var/lib/ambari-agent/data/hostcheck.result

INFO 2015-04-01 06:19:59,205 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:09,207 Heartbeat.py:76 - Sending heartbeat with response id: 1 and timestamp: 1427869209207. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:20:09,251 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:19,252 Heartbeat.py:76 - Sending heartbeat with response id: 2 and timestamp: 1427869219252. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:20:19,296 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:29,296 Heartbeat.py:76 - Sending heartbeat with response id: 3 and timestamp: 1427869229296. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:20:29,340 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:39,340 Heartbeat.py:76 - Sending heartbeat with response id: 4 and timestamp: 1427869239340. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:20:39,384 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:49,384 Heartbeat.py:76 - Sending heartbeat with response id: 5 and timestamp: 1427869249384. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:20:49,428 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:59,429 Heartbeat.py:76 - Sending heartbeat with response id: 6 and timestamp: 1427869259429. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:21:05,061 main.py:83 - loglevel=logging.INFO

INFO 2015-04-01 06:21:10,870 main.py:83 - loglevel=logging.INFO

INFO 2015-04-01 06:21:10,871 DataCleaner.py:36 - Data cleanup thread started

INFO 2015-04-01 06:21:10,875 DataCleaner.py:71 - Data cleanup started

INFO 2015-04-01 06:21:10,876 DataCleaner.py:73 - Data cleanup finished

ERROR 2015-04-01 06:21:10,877 PingPortListener.py:44 - Failed to start ping port listener of:[Errno 98] Address already in use

INFO 2015-04-01 06:21:10,877 PingPortListener.py:52 - Ping port listener killed

', None)

Connection to node5.hadooptest.com closed.

SSH command execution finished

host=node5.hadooptest.com, exitcode=255

ERROR: Bootstrap of host node5.hadooptest.com fails because previous action finished with non-zero exit code (255)

ERROR MESSAGE: tcgetattr: Invalid argument

Connection to node5.hadooptest.com closed.

STDOUT: Restarting ambari-agent

Verifying Python version compatibility...

Using python  /usr/bin/python2.6

Found ambari-agent PID: 1682

Stopping ambari-agent

Removing PID file at /var/run/ambari-agent/ambari-agent.pid

ambari-agent successfully stopped

Verifying Python version compatibility...

Using python  /usr/bin/python2.6

Checking for previously running Ambari Agent...

Starting ambari-agent

Verifying ambari-agent process status...

ERROR: ambari-agent start failed

Agent out at: /var/log/ambari-agent/ambari-agent.out

Agent log at: /var/log/ambari-agent/ambari-agent.log

('INFO 2015-04-01 06:19:59,137 HostCheckReportFileHandler.py:109 - Creating host check file at /var/lib/ambari-agent/data/hostcheck.result

INFO 2015-04-01 06:19:59,205 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:09,207 Heartbeat.py:76 - Sending heartbeat with response id: 1 and timestamp: 1427869209207. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:20:09,251 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:19,252 Heartbeat.py:76 - Sending heartbeat with response id: 2 and timestamp: 1427869219252. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:20:19,296 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:29,296 Heartbeat.py:76 - Sending heartbeat with response id: 3 and timestamp: 1427869229296. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:20:29,340 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:39,340 Heartbeat.py:76 - Sending heartbeat with response id: 4 and timestamp: 1427869239340. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:20:39,384 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:49,384 Heartbeat.py:76 - Sending heartbeat with response id: 5 and timestamp: 1427869249384. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:20:49,428 Controller.py:211 - No commands sent from the Server.

INFO 2015-04-01 06:20:59,429 Heartbeat.py:76 - Sending heartbeat with response id: 6 and timestamp: 1427869259429. Command(s) in progress: False. Components mapped: False

INFO 2015-04-01 06:21:05,061 main.py:83 - loglevel=logging.INFO

INFO 2015-04-01 06:21:10,870 main.py:83 - loglevel=logging.INFO

INFO 2015-04-01 06:21:10,871 DataCleaner.py:36 - Data cleanup thread started

INFO 2015-04-01 06:21:10,875 DataCleaner.py:71 - Data cleanup started

INFO 2015-04-01 06:21:10,876 DataCleaner.py:73 - Data cleanup finished

ERROR 2015-04-01 06:21:10,877 PingPortListener.py:44 - Failed to start ping port listener of:[Errno 98] Address already in use

INFO 2015-04-01 06:21:10,877 PingPortListener.py:52 - Ping port listener killed

', None)

Connection to node5.hadooptest.com closed.

  ----------------------------------------------------------------------------

cluster hdp2 resume failed: Task execution failed: An exception happens when App_Manager (Ambari) creates the cluster: (hdp2). Creation fails..

Qing_chi
VMware Employee

Hi Mohsin,

You need to kill the Ambari processes that are still running.

Log in to each Hadoop node and run the following commands:

ps -ef | grep ambari

kill -9 <process_id>

Then run cluster resume on the BDE server.
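For background, "[Errno 98] Address already in use" refers to an IP address and port pair that another process has already bound, not to the host's IP address on the LAN. A leftover ambari-agent process still holding the ping port would produce exactly this error. The Python sketch below is a hedged illustration (port_in_use is a hypothetical helper, not part of Ambari). It tests whether a TCP port is free by attempting the same kind of bind.

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding a TCP socket to (host, port) fails with EADDRINUSE."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # some other problem, e.g. permission denied
    finally:
        probe.close()
    return False
```

The agent's ping port is configured in its ambari-agent.ini file. I believe the default is 8670, but treat that number as an assumption and check your own configuration. If port_in_use(8670) still returns True after the kill commands above, some process is still holding the port.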

Thanks,

-qing
