Problem encountered
I recently set up HA on the company's experimental Hadoop cluster and found that if I kill the active NameNode process directly, the cluster automatically fails over to the standby NameNode. However, if the active NameNode host itself goes down (init 0), the automatic failover to the standby NameNode does not happen.
Checking the ZKFC log on the standby node shows that it is trying to connect over SSH to the original active NameNode host (in order to fence it), but because that host is down, the SSH connection cannot be established. The ZKFC therefore keeps reporting the same error in a loop. Isn't Hadoop HA supposed to be designed for exactly this scenario? The relevant part of the standby's ZKFC log:
2018-12-03 19:11:47,484 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
2018-12-03 19:11:47,484 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
2018-12-03 19:11:47,484 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to shell04...
2018-12-03 19:11:47,484 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to shell04 port 22
2018-12-03 19:11:50,488 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to connect to shell04 as user root
com.jcraft.jsch.JSchException: java.net.NoRouteToHostException: No route to host
	at com.jcraft.jsch.Util.createSocket(Util.java:394)
	at com.jcraft.jsch.Session.connect(Session.java:215)
	at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
	at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
	at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:532)
	at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
	at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
	at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
	at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:921)
	at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:820)
	at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.net.NoRouteToHostException: No route to host
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:579)
	at java.net.Socket.connect(Socket.java:528)
	at java.net.Socket.<init>(Socket.java:425)
	at java.net.Socket.<init>(Socket.java:208)
	at com.jcraft.jsch.Util$1.run(Util.java:362)
	at java.lang.Thread.run(Thread.java:745)
2018-12-03 19:11:50,490 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
2018-12-03 19:11:50,490 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
2018-12-03 19:11:50,490 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.RuntimeException: Unable to fence NameNode at shell04/192.168.254.143:9000
	at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:533)
	at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
	at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
	at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
	at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:921)
	at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:820)
	at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
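The log suggests that the standby's ZKFC tries each configured fencing method in order and only lets the standby become active if one of them succeeds. Here only one method is configured ("Trying method 1/1", SshFenceByTcpPort), and an SSH connection to a powered-off host cannot succeed, so fencing fails and the election win is abandoned. This presumably also explains why killing only the NameNode process works: the host is still up, so SSH can still reach it. The Java sketch below is a simplified, hypothetical model of that ordered-fencing behavior, not Hadoop's NodeFencer source. The class FencingModel, the Method type and the BooleanSupplier stand-ins are invented for illustration, and the second method list models an assumed fallback such as shell(/bin/true) listed after sshfence in dfs.ha.fencing.methods.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical, simplified model of ordered fencing in the ZKFC.
// It is NOT Hadoop's NodeFencer code; it only mirrors the behavior visible
// in the log: try each configured method in order, stop at the first
// success, and allow failover only if some method succeeded.
public class FencingModel {

    // One entry of a (hypothetical) fencing-method list.
    static final class Method {
        final String name;
        final BooleanSupplier attempt; // stands in for the real fencing action

        Method(String name, BooleanSupplier attempt) {
            this.name = name;
            this.attempt = attempt;
        }
    }

    // Returns true if any method fenced the old active NameNode.
    static boolean fence(List<Method> methods) {
        for (int i = 0; i < methods.size(); i++) {
            Method m = methods.get(i);
            System.out.printf("Trying method %d/%d: %s%n", i + 1, methods.size(), m.name);
            if (m.attempt.getAsBoolean()) {
                System.out.println("Fencing succeeded with " + m.name);
                return true;
            }
            System.out.println("Fencing method " + m.name + " was unsuccessful.");
        }
        System.out.println("Unable to fence service by any configured method.");
        return false;
    }

    public static void main(String[] args) {
        // The old active host is powered off (init 0), so SSH cannot connect.
        Method sshfence = new Method("sshfence", () -> false);
        // Assumed fallback that always reports success, like shell(/bin/true).
        Method alwaysTrue = new Method("shell(/bin/true)", () -> true);

        boolean onlySsh = fence(Arrays.asList(sshfence));
        boolean withFallback = fence(Arrays.asList(sshfence, alwaysTrue));

        // In this model, only a successful fence lets the standby become active.
        System.out.println("sshfence only, failover proceeds: " + onlySsh);
        System.out.println("sshfence + shell(/bin/true), failover proceeds: " + withFallback);
    }
}
```

In this model the first call returns false, matching the log, and the second returns true. Whether a fallback that always succeeds is acceptable on a real cluster depends on how the shared edit log is protected against two writers (for example, whether the Quorum Journal Manager is in use), which the post does not say; treat it as an assumption to check against the HDFS HA documentation.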