Installing the agent for CDH6 fails with a message that the detection signal (heartbeat) sent by the Agent cannot be received.

<h1>Problem description</h1>

I installed CDH6 and used the web interface to install the agent on three virtual machines. The step that waits for the detection signal from the newly installed Agent ran for about 1 minute and then showed the following error:

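  Installation failed. Failed to receive the detection signal (heartbeat) from the Agent.

  Ensure that the host's hostname is configured correctly.
  Ensure that port 7182 is accessible on the Cloudera Manager Server.
  Ensure that ports 9000 and 9001 are free on the host being added.
  Check the agent logs in /var/log/cloudera-scm-agent/ on the host being added.
  If "Use TLS Encryption for Agents" is enabled in Cloudera Manager (Administration -> Settings -> Security), ensure that /etc/cloudera-scm-agent/config.ini has use_tls=1 on the host being added.
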
<h1>Deployment environment:</h1>

Three virtual machines run on a host with 32GB of memory. Each is configured with a 2-core CPU, 8GB memory, and a 40GB disk. In each virtual machine, the enp0s3 network card connects to the external network through NAT, and the enp0s8 network card forms a local area network with the host through bridging, so the host and the 3 virtual machines can all reach each other. The operating system is CentOS 7.2 with GNOME installed.

<h1>Deployment planning:</h1>

The hostnames of the three hosts are cdh102, cdh103 and cdh104. The plan is to install Manager on cdh102 and install the agent on cdh102/103/104.

<h1>Installation process:</h1>

I downloaded the offline installation packages of CDH6 and Manager on cdh102, placed them in the http service directory, and configured the yum source on 102/103/104 to point to cdh102 so that the installation runs offline. The database uses the MySQL version recommended by the official website, and Auto-TLS authentication is enabled.

<h1>Debugging process:</h1>

Following the hints in the agent installation error, I did the following checks:

1) check the hostnames of the three virtual machines

cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.56.101 vm101
192.168.56.102 cdh102.pcicdh.com cdh102
192.168.56.103 cdh103.pcicdh.com cdh103
192.168.56.104 cdh104.pcicdh.com cdh104

cat /etc/hostname 
cdh103.pcicdh.com

cat  /etc/sysconfig/network
HOSTNAME=cdh102.pcicdh.com

I did not find any mistakes in these files.
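
In case it helps anyone repeating step 1 on several machines, the /etc/hosts check can be scripted. This is only a minimal Python sketch: the function names parse_hosts and check_fqdn are my own, and the sample text is copied from the output above.

```python
import ipaddress

# Sample copied from the /etc/hosts output shown above.
HOSTS = """\
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.56.101 vm101
192.168.56.102 cdh102.pcicdh.com cdh102
192.168.56.103 cdh103.pcicdh.com cdh103
192.168.56.104 cdh104.pcicdh.com cdh104
"""


def parse_hosts(text):
    """Map every hostname or alias in /etc/hosts-style text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        try:
            ipaddress.ip_address(ip)  # accepts both IPv4 and IPv6
        except ValueError:
            continue  # skip lines that do not start with an IP address
        for name in names:
            mapping.setdefault(name, ip)  # keep the first entry seen for a name
    return mapping


def check_fqdn(text, fqdn, expected_ip):
    """Return True if fqdn is listed in the hosts text with expected_ip."""
    return parse_hosts(text).get(fqdn) == expected_ip
```

For a real check, read the file with open("/etc/hosts").read() on each machine and call check_fqdn for cdh102, cdh103 and cdh104.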

2) check whether port 7182 of Manager is accessible

nc -w 1 192.168.56.102 7182

When run, it shows a blank line and nothing else happens. Trying other ports gives the error "Ncat: Connection refused". Does the silent blank line mean that the port is accessible?
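
On what the silent nc session means: as far as I understand, nc stays quiet when the TCP connection is established and it is waiting for input, while "Connection refused" means nothing is listening on that port. The same check can be written without nc. A minimal Python sketch, where the function name port_accepts_connections is my own:

```python
import socket


def port_accepts_connections(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError on refusal, timeout, or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_accepts_connections("192.168.56.102", 7182) from each agent host gives a yes/no answer. Note that this only proves the TCP port is open; it says nothing about whether a TLS handshake on that port will succeed.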

3) check whether ports 9000 and 9001 are free on the hosts where the agent is installed

cdh104 shows the following:

[root@cdh104 ~]# lsof -i:9000
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
python2 2505 root    4u  IPv4  22121      0t0  TCP cdh104.pcicdh.com:cslistener (LISTEN)

[root@cdh104 ~]# ps -ef | grep 2505
root      2505  1068  1 09:24 ?        00:00:58 /usr/bin/python2 /opt/cloudera/cm-agent/bin/../bin/cm status_server
root      4905  2572  0 10:24 pts/0    00:00:00 grep --color=auto 2505

[root@cdh104 ~]# lsof -i:9001
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
python2 4618 root   10u  IPv4  31088      0t0  TCP localhost:etlservicemgr (LISTEN)

[root@cdh104 ~]# ps -ef | grep 4618
root      4618     1  0 10:01 ?        00:00:13 /usr/bin/python2 /opt/cloudera/cm-agent/bin/cm agent
root      4927  2572  0 10:25 pts/0    00:00:00 grep --color=auto 4618

The same is true for the other two machines, indicating that ports 9000 and 9001 are already in use by the agent itself.
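
To test whether a port is free before the agent is started, one option is to try binding to it. This is a minimal Python sketch with a function name (port_is_free) of my own. In the output above the ports are held by the agent's own processes (cm status_server and cm agent), so the function would report them as not free here.

```python
import errno
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to host:port right now."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return False  # another process already holds the port
        raise  # e.g. permission problems are reported, not hidden
    finally:
        sock.close()
```

For example, port_is_free(9000) and port_is_free(9001) could be run on each host before installing the agent.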

4) check /var/log/cloudera-scm-agent/cloudera-scm-agent.log on cdh104 for ERROR entries

[09/Oct/2018 10:27:24 +0000] 4618 MainThread agent        ERROR    Heartbeating to cdh102.pcicdh.com:7182 failed.
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1362, in _send_heartbeat
    self.cfg.max_cert_depth)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/https.py", line 139, in __init__
    self.conn.connect()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/httpslib.py", line 80, in connect
    sock.connect((self.host, self.port))
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 304, in connect
    ret = self.connect_ssl()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 291, in connect_ssl
    return m2.ssl_connect(self.ssl, self._timeout)
SSLError: sslv3 alert bad certificate
[09/Oct/2018 10:27:29 +0000] 4618 MainThread agent        ERROR    Heartbeating to cdh102.pcicdh.com:7182 failed.
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1362, in _send_heartbeat
    self.cfg.max_cert_depth)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/https.py", line 139, in __init__
    self.conn.connect()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/httpslib.py", line 80, in connect
    sock.connect((self.host, self.port))
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 304, in connect
    ret = self.connect_ssl()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 291, in connect_ssl
    return m2.ssl_connect(self.ssl, self._timeout)
SSLError: sslv3 alert bad certificate

According to the log, cdh104 cannot send a heartbeat to port 7182 of cdh102, even though the nc command in step 2 can connect to that port. So this problem is still unsolved.
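
One thing the traceback does show: the failure is an SSLError raised during the TLS handshake (ssl_connect), not a failure to open the TCP connection, which would explain why nc connects while the heartbeat fails. To see at a glance which exceptions sit behind the ERROR records, the log can be summarized. A minimal Python sketch, assuming the record layout seen in the excerpt above; the regular expressions and the name summarize_errors are my own:

```python
import re
from collections import Counter

# Header of a log record, e.g.
# [09/Oct/2018 10:27:24 +0000] 4618 MainThread agent        ERROR    Heartbeating ...
RECORD = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+\d+\s+\S+\s+\S+\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)
# Final line of a Python traceback, e.g. "SSLError: sslv3 alert bad certificate"
EXC = re.compile(r"^(?P<exc>[A-Za-z_][\w.]*(?:Error|Exception)): (?P<detail>.*)$")

# Two records shortened from the log excerpt above.
SAMPLE_LOG = """\
[09/Oct/2018 10:27:24 +0000] 4618 MainThread agent        ERROR    Heartbeating to cdh102.pcicdh.com:7182 failed.
Traceback (most recent call last):
    return m2.ssl_connect(self.ssl, self._timeout)
SSLError: sslv3 alert bad certificate
[09/Oct/2018 10:27:29 +0000] 4618 MainThread agent        ERROR    Heartbeating to cdh102.pcicdh.com:7182 failed.
Traceback (most recent call last):
    return m2.ssl_connect(self.ssl, self._timeout)
SSLError: sslv3 alert bad certificate
"""


def summarize_errors(log_text):
    """Count (ERROR message, exception line) pairs found in the log text."""
    counts = Counter()
    current = None  # message of the ERROR record being read, if any
    for line in log_text.splitlines():
        record = RECORD.match(line)
        if record:
            is_error = record.group("level") == "ERROR"
            current = record.group("msg") if is_error else None
            continue
        exc = EXC.match(line)
        if exc and current is not None:
            counts[(current, exc.group("exc") + ": " + exc.group("detail"))] += 1
            current = None
    return counts
```

Running summarize_errors on the contents of /var/log/cloudera-scm-agent/cloudera-scm-agent.log shows whether every failed heartbeat ends in the same certificate error or whether other exceptions are mixed in.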

5) check /etc/cloudera-scm-agent/config.ini on cdh104

Auto-TLS was enabled during installation, and "Use TLS Encryption for Agents" is also enabled on the Manager Administration -> Settings -> Security page. The use_tls setting in config.ini on cdh104 was 0, so I changed it to 1, saved the file, and restarted the agent on cdh104 with the systemctl restart cloudera-scm-agent.service command. The configuration on the other two virtual machines was left unmodified. I then clicked retry for cdh104 in the web interface, and it still reported the same error.
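
To compare the TLS setting across the three hosts without opening each file by hand, config.ini can be read with Python's configparser. A minimal sketch: the sample text below is a hypothetical, heavily trimmed config.ini (the section names and the other keys are assumptions on my part, not copied from my machines), and find_option is my own helper name. It searches every section, so it does not depend on which section use_tls actually lives in.

```python
import configparser

# Hypothetical, trimmed-down config.ini used only for illustration.
SAMPLE = """\
[General]
server_host=cdh102.pcicdh.com
server_port=7182

[Security]
use_tls=0
"""


def find_option(ini_text, option):
    """Return {section: value} for every section that defines the option."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    return {
        section: parser.get(section, option)
        for section in parser.sections()
        if parser.has_option(section, option)
    }
```

On a host, something like find_option(open("/etc/cloudera-scm-agent/config.ini").read(), "use_tls") shows the value the agent will read after the next restart.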

Has anyone run into this situation? I would appreciate any advice. Thank you.

Aug.10,2021

After testing, the cause is that the detection signal sent by the Agent cannot be received when Auto-TLS is enabled. If Auto-TLS is not enabled during reinstallation, the problem does not occur. However, I only turned on Auto-TLS after installing Manager and starting the server, and I don't know whether that has anything to do with it.
