Friday, December 27, 2013

Errors Setting up Hadoop

Setup :

  • Hadoop 1.2.1
  • 2 Virtual Machines (VM) running Fedora 19 - KDE Live
  • Host-Only Networking (Works on Bridged as well)
  • VirtualBox 4.3.4
  • Host Machine - Windows 8


Problem #1 - No Route to Host

Error Message : 
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to master/192.168.56.101:54310 failed on local exception: java.net.NoRouteToHostException: No route to host

Initial Review of the Situation :

  • SSH : Bidirectional
    • At this point, I had added the slave's SSH key to the master (and vice versa), so each machine could SSH to the other without a password. A sketch of the setup follows this list.
  • /etc/hosts : Correct
    • (hduser@master) & (hduser@slave)
      • 192.168.56.101 master
      • 192.168.56.102 slave
  • Ping : Bidirectional
    • hduser@master $ ping slave # good
    • hduser@slave $ ping master # good
  • $HADOOP_HOME/conf settings : Correct
    • core-site.xml (Same on both machines)
      • (hduser@master) <value>hdfs://master:54310</value> # Good
      • (hduser@slave) <value>hdfs://master:54310</value> #Good
    • mapred-site.xml
      • (hduser@master) <value>master:54311</value> #Good
      • (hduser@slave) <value>master:54311</value> #Good
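
For reference, here is a minimal sketch of the passwordless SSH setup between the two nodes. It assumes the hduser account and default key locations, and uses ssh-copy-id (appending the public key to authorized_keys by hand works just as well):

  # On the master, as hduser: create a key (if one does not exist) and push it to the slave
  ssh-keygen -t rsa -P ""
  ssh-copy-id hduser@slave

  # Repeat in the other direction, on the slave
  ssh-keygen -t rsa -P ""
  ssh-copy-id hduser@master

  # Verify: both of these should log in without prompting for a password
  ssh hduser@slave exit
  ssh hduser@master exit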

Suggestions Online :

There were many suggestions online to resolve this issue, but to save space I have not included them here.

Resolution Process :

My plan for resolving this issue was to try the non-intrusive ideas first, and then gradually work toward the more drastic ones; the commands I ran are sketched after this list.
  • Telnet to the NameNode port
    • Received the same "No route to host" error, so the problem was at the network level rather than in Hadoop.
  • Disable/Configure Firewall - Resolution
    • Disabled IPTables - Nothing changed
  • Looked for other firewall software.
    • Firewalld, a Fedora project service, was running!
    • Disabled Firewalld
  • Connected!
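
The checks and changes above, sketched as commands (Fedora 19 uses systemd, so firewalld and iptables are managed with systemctl; disabling the firewall outright is only reasonable on an isolated host-only lab network, on a real cluster open the Hadoop ports instead):

  # From the slave, probe the NameNode port directly
  telnet master 54310

  # See which firewall services are actually running
  systemctl status iptables.service
  systemctl status firewalld.service

  # Stop firewalld and keep it from coming back on the next boot
  sudo systemctl stop firewalld.service
  sudo systemctl disable firewalld.service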

Conclusion :

As many people have already suggested in other forums, the issue was not a problem with Hadoop but a network configuration problem. I would strongly suggest that anyone hitting this error review their firewall configuration and, as odd as it sounds, check whether any additional firewall software (such as firewalld) is installed and running on the system.


Problem #2 - Incompatible NamespaceIDs

Error Message :
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID = 1550889903; datanode namespaceID = 8951322

Initial Review of the Situation :

  • Format Namenode
    • I ran the format command on both the slave and the master in hopes of resolving this issue. Both completed without errors, but the problem persisted.
    • $HADOOP_HOME/bin/hadoop namenode -format

Suggestions Online :

The suggested fix for this problem is posted in one of the main tutorials for setting up Hadoop, namely the one I was following.

Resolution Process :

My plan for resolving this issue was to try the non-intrusive ideas first, and then gradually work toward the more drastic ones.
  • Start from Scratch
    • I followed the instructions and removed the DataNode information from /app/hadoop/tmp/dfs/...
    • Re-formatted the NameNode
    • Started Hadoop again
    • Fail!
  • Update the NamespaceIDs
    • Initially I tried updating the NameNode's namespaceID (dfs.name.dir/current/VERSION) to match the DataNode's.
      • Failed; the NameNode reverted to its original namespaceID.
    • Then updated the DataNode's namespaceID to match the NameNode's namespaceID (see the sketch after this list).
      • Also updated the info on the slave node(s)
    • Success!
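
A minimal sketch of the fix that worked, using the namespaceID from the error message above (1550889903) and the dfs.data.dir location under /app/hadoop/tmp from this setup; check the actual ID and path on your own nodes before editing anything:

  # Run on each DataNode machine, with the DataNode stopped
  sed -i 's/namespaceID=.*/namespaceID=1550889903/' /app/hadoop/tmp/dfs/data/current/VERSION

  # Confirm the change, then restart HDFS from the master
  grep namespaceID /app/hadoop/tmp/dfs/data/current/VERSION
  $HADOOP_HOME/bin/start-dfs.sh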

Conclusion :

I am still not sure how the namespaceIDs fell out of sync, but updating the DataNodes' VERSION files resolved the problem.


Problem #3 - Unregistered Datanode Exception

Error Message :
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is shutting down: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node 10.0.0.14:50010 is attempting to report storage ID DS-720171542-192.168.56.102-50010-1388219833238. Node 10.0.0.16:50010 is expected to serve this storage.

Cause : I attempted to expand my cluster to 3 VM nodes by cloning the original slave VM. As a full clone, the new VM already contained the master's SSH key, etc. The only issue was that the master was confused because both clones reported the same DataNode storage ID.

Initial Review/Research 

  • Hadoop: How do datanodes register with the namenode?
    • From this posting it is apparent that DataNodes register themselves with the NameNode. It is more than likely that by cloning the first VM, I brought over all of its registration information, causing the NameNode to believe both machines are the same DataNode. A quick way to confirm this is sketched below.
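
To confirm, compare the DataNode VERSION files on the slaves; on cloned VMs the storageID (and namespaceID) will be identical. The path assumes the dfs.data.dir under /app/hadoop/tmp used throughout this setup:

  # Run on each slave; cloned nodes report the same storageID
  cat /app/hadoop/tmp/dfs/data/current/VERSION
  # e.g. storageID=DS-720171542-192.168.56.102-50010-1388219833238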

Resolution of the ERROR :

  • Format Namenode
    • I ran the format command on both the slave and the master in hopes of resolving this issue. Both completed without errors, but the problem persisted.
    • $HADOOP_HOME/bin/hadoop namenode -format
  • Start from scratch (See Problem #2 above); the full command sequence is sketched after this list.
    • Stopped Name/Datanodes
    • Removed all the Datanode/Namenode/Secondary folders from the master and slaves.
    • Formatted the NameNode (hadoop namenode -format)
    • Started the cluster again (start-dfs.sh)
  • Success!
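
The full sequence, run from the master as hduser. This is a sketch under the assumptions used throughout this post (hadoop.tmp.dir of /app/hadoop/tmp; "slave" and "slave2" are illustrative hostnames for the slave VMs), and it wipes all HDFS data, so only run it on a cluster with nothing worth keeping:

  # Stop HDFS and MapReduce everywhere
  $HADOOP_HOME/bin/stop-all.sh

  # Remove the old NameNode/DataNode/SecondaryNameNode data on the master...
  rm -rf /app/hadoop/tmp/dfs/*

  # ...and on each slave
  ssh hduser@slave 'rm -rf /app/hadoop/tmp/dfs/*'
  ssh hduser@slave2 'rm -rf /app/hadoop/tmp/dfs/*'

  # Reformat the NameNode and bring HDFS back up
  $HADOOP_HOME/bin/hadoop namenode -format
  $HADOOP_HOME/bin/start-dfs.sh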

Conclusion :

This is the second time that removing the folders/files under the dfs.name.dir/dfs.data.dir directories has resolved my problem.