Setup :
- Hadoop 1.2.1
- 2 Virtual Machines (VM) running Fedora 19 - KDE Live
- Host-Only Networking (Works on Bridged as well)
- VirtualBox 4.3.4
- Host Machine - Windows 8
Problem #1 - No Route to Host
Error Message : ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to master/192.168.56.101:54310 failed on local exception: java.net.NoRouteToHostException: No route to host
Initial Review of the Situation :
- SSH : Bidirectional
  - At this point, I had added the slave's SSH key to the master, so SSH worked in both directions without a password.
- /etc/hosts : Correct
  - (hduser@master) & (hduser@slave)
    - 192.168.56.101 master
    - 192.168.56.102 slave
- Ping : Bidirectional
  - hduser@master $ ping slave # good
  - hduser@slave $ ping master # good
- $HADOOP_HOME/conf settings : Correct
  - core-site.xml (Same on both machines)
    - (hduser@master) <value>hdfs://master:54310</value> # Good
    - (hduser@slave) <value>hdfs://master:54310</value> # Good
  - mapred-site.xml
    - (hduser@master) <value>master:54311</value> # Good
    - (hduser@slave) <value>master:54311</value> # Good
Suggestions Online :
There were many suggestions online to resolve this issue, but to save space I have not included all of them here.
- Hadoop cluster setup : Firewall issues
  - Check Firewall configurations
  - Turn off iptables
    - # service iptables save
    - # service iptables stop
    - # chkconfig iptables off
- Error on starting HDFS daemons on hadoop Multinode cluster
  - Confirm NameNode is running fine.
  - Try Telnet to the IP and Port
  - IPTables are misconfigured
  - DNS has misconfigured IP addresses
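The "Telnet to the IP and Port" suggestion can also be scripted when telnet is not installed. This is a minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils `timeout` command are available. The function name `port_open` is made up for this example.

```shell
# Hypothetical helper: succeeds (exit 0) if a TCP connection to host:port
# can be opened within 3 seconds, and fails otherwise.
port_open() {
    # bash treats /dev/tcp/HOST/PORT as a request to open a TCP socket.
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}
```

For example, running `port_open master 54310 && echo reachable || echo "not reachable"` on the slave tests the NameNode port from core-site.xml. A failure here with a working ping points at a firewall or at the daemon not listening, not at Hadoop's configuration.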
Resolution Process :
My plan for resolving this issue was to try the least intrusive ideas first, and then gradually work towards the more extreme ones.
- Telnet
  - Received No Route Error!
- Disable/Configure Firewall - Resolution
  - Disabled IPTables - Nothing Changed
  - Looked for other firewall software.
    - Firewalld, a Fedora project service, was running!
    - Disabled Firewalld
    - Connected!
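The firewalld steps above can be sketched as follows. This assumes a systemd-based Fedora where `systemctl` manages the firewalld service and the commands are run as root. The `run` wrapper and the `DRY_RUN` variable are hypothetical, added only so the commands can be previewed before anything is changed.

```shell
# Hypothetical wrapper: print each command, and execute it only when DRY_RUN=0.
run() {
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Check for firewalld, stop it, and keep it from starting at boot.
disable_firewalld() {
    run systemctl status firewalld     # is the service running?
    run systemctl stop firewalld       # stop it for the current session
    run systemctl disable firewalld    # do not start it on the next boot
}
```

Calling `disable_firewalld` prints the three commands. Calling it as `DRY_RUN=0 disable_firewalld` executes them. Turning the firewall off entirely is only reasonable on an isolated test network such as the host-only setup described here.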
Conclusion :
As many people have already suggested in other forums, the issue was not a problem with Hadoop, but a network configuration problem. I would strongly suggest that the next person review their firewall configuration and, as crazy as it sounds, check whether any unknown firewall software is installed on the system.
Problem #2 - Incompatible NamespaceIDs
Error Message : ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID = 1550889903; datanode namespaceID = 8951322
Initial Review of the Situation :
- Format Namenode
  - I formatted the NameNode on both the slave and the master in hopes of resolving this issue. Both were properly formatted, but the problem persisted.
  - $HADOOP_HOME/bin/hadoop namenode -format
Suggestions Online :
The suggested fix for this problem was posted in one of the main tutorials for setting up Hadoop, which is also the one I was following.
- Running Hadoop on Ubuntu Linux (Multi-Node Cluster)
  - Start from scratch (Only if necessary)
  - Update the NamespaceIDs to match.
    - $dfs.data.dir/current/VERSION
Resolution Process :
My plan for resolving this issue was to try the least intrusive ideas first, and then gradually work towards the more extreme ones.
- Start from Scratch
  - I followed the instructions and removed the data node information from /app/tmp/hadoop/dfs/...
  - Reformatted the namenode
  - Started Hadoop again
  - Fail!
- Update the NamespaceIDs
  - Initially I tried updating the Namenode (dfs.name.dir/current/VERSION) to match the Datanode NamespaceID.
    - Fail, Namenode reverted back to its original NamespaceID.
  - Attempted to fix the Datanode NamespaceID to match the Namenode NamespaceID
    - Also updated the info on the slave node(s)
    - Success!
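The edit that worked, making the Datanode's NamespaceID match the Namenode's, can be sketched like this. It assumes the VERSION files are plain key=value text with a line beginning `namespaceID=`. The helper names are hypothetical.

```shell
# Hypothetical helper: print the namespaceID stored in a VERSION file.
read_namespace_id() {
    sed -n 's/^namespaceID=//p' "$1"
}

# Hypothetical helper: overwrite the namespaceID in a VERSION file.
# The original file is kept next to it with a .bak suffix.
set_namespace_id() {
    sed -i.bak "s/^namespaceID=.*/namespaceID=$2/" "$1"
}
```

With the Datanode stopped, something like `set_namespace_id /app/hadoop/tmp/dfs/data/current/VERSION 1550889903` would apply the Namenode's ID from the error message above. The data directory path is taken from that same error message and may differ on other setups. On a multi-node cluster the Namenode's ID has to be read on the master and then applied on each slave.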
Conclusion :
I am unsure how the NamespaceIDs went out of sync, but updating the Datanode to match the Namenode worked.
Problem #3 - Unregistered Datanode Exception
Error Message : WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is shutting down: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node 10.0.0.14:50010 is attempting to report storage ID DS-720171542-192.168.56.102-50010-1388219833238. Node 10.0.0.16:50010 is expected to serve this storage.
Cause : I attempted to expand my cluster to 3 VM nodes by cloning the original slave VM. As a full clone, it already contained the Master SSH key, etc. The only issue was that both slaves now reported the same Datanode storage ID, which confused the Master.
Initial Review/Research :
- Hadoop: How do datanodes register with the namenode?
  - From this posting it is apparent that the Datanodes are registered with the Namenode. It's more than likely that by cloning the first VM, I brought over all of the registration information, causing the Namenode to believe both are the same datanode.
Resolution of the ERROR :
- Format Namenode
  - I formatted the NameNode on both the slave and the master in hopes of resolving this issue. Both were properly formatted, but the problem persisted.
  - $HADOOP_HOME/bin/hadoop namenode -format
- Start from scratch (See Problem #2 above)
  - Stopped Name/Datanodes
  - Removed all the Datanode/Namenode/Secondary folders from the master and slaves.
  - Formatted the namenode (hadoop namenode -format)
  - Started the cluster again (start-dfs.sh)
  - Success!
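The start-from-scratch steps can be sketched as below. Several things here are assumptions rather than details from this setup: the storage folders are taken to be `name`, `namesecondary` and `data` under /app/hadoop/tmp/dfs (the path from the Problem #2 error message), and the `run` wrapper, the `DRY_RUN` and `DFS_DIR` variables and the function names are all hypothetical. This wipes everything stored in HDFS, so by default it only prints the commands.

```shell
# Hypothetical wrapper: print each command, and execute it only when DRY_RUN=0.
run() {
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Assumed storage location; adjust to match dfs.name.dir and dfs.data.dir.
DFS_DIR="${DFS_DIR:-/app/hadoop/tmp/dfs}"

# Step 1, on the master: stop the Namenode and Datanodes.
stop_cluster() {
    run "$HADOOP_HOME/bin/stop-dfs.sh"
}

# Step 2, on the master AND on every slave: remove the storage folders.
clear_dfs_dirs() {
    run rm -rf "$DFS_DIR/name" "$DFS_DIR/namesecondary" "$DFS_DIR/data"
}

# Step 3, on the master: format the Namenode and start the cluster again.
format_and_start() {
    run "$HADOOP_HOME/bin/hadoop" namenode -format
    run "$HADOOP_HOME/bin/start-dfs.sh"
}
```

The order matters: the folders on the slaves have to be cleared while the cluster is stopped, before `format_and_start` runs on the master. Presumably this is what lets the cloned slave come up with a fresh storage ID.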