This issue was found on Cloudera CDH4, but it applies to any supported Hadoop version.
A Hadoop script produces results when run in the "local" compute context, but when run in the Hadoop compute context it fails with the following error:
"Internal Error: Cannot reset hdfs internal params while connected to an hdfs file system."
Possible causes:
1.- The 'hostName' option in the RxHdfsFileSystem() call is not set correctly, or the call uses the wrong HDFS port number.
If you are running your code from an edge node, make sure that 'hostName' is set to the actual name of the NameNode and NOT to the hostname of the edge node you are running from.
Also check Cloudera Manager and verify that the HDFS service is using the default port, 8020. If it is running on a different port, you need to set that port explicitly in the following call in your Hadoop test script:
(For example)
myNameNode <- "test1.acme.com"
myPort <- 1700
hdfsFS <- RxHdfsFileSystem(hostName=myNameNode, port=myPort)
2.- If you are working from an edge node, specify the same hostname and port settings in both RxHadoopMR() and RxHdfsFileSystem().
3.- Make sure that you have copied the RevoScaleR jar file, scaleR-hadoop-0.1-SNAPSHOT.jar, from the directory in which you launched the Revolution installer (the Revolution folder)
into the Cloudera Hadoop lib directory, which is typically:
/opt/cloudera/parcels/CDH/lib/hadoop/lib (for parcels) or
/usr/lib/hadoop/lib/
This file needs to be copied into this folder on ALL of the nodes of your Hadoop cluster.
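For cause 2, one way to keep the two calls consistent is to define the NameNode host and port once and pass the same variables to both functions. The following is a minimal sketch, not a definitive setup: it reuses the example host and port values from cause 1, assumes that RxHadoopMR() accepts 'nameNode' and 'port' arguments in your RevoScaleR version, and omits any other arguments (such as ssh settings) that your cluster may require.

```r
# Example values only; replace them with your cluster's NameNode host and HDFS port.
myNameNode <- "test1.acme.com"
myPort <- 1700

# Run the RevoScaleR calls only where the package is installed.
if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  library(RevoScaleR)

  # File system object pointing at the NameNode, not at the edge node.
  hdfsFS <- RxHdfsFileSystem(hostName = myNameNode, port = myPort)
  rxSetFileSystem(hdfsFS)

  # Compute context built from the same host and port variables.
  # Assumption: 'nameNode' and 'port' are the matching arguments of RxHadoopMR().
  myHadoopCluster <- RxHadoopMR(nameNode = myNameNode, port = myPort)
  rxSetComputeContext(myHadoopCluster)
}
```

Because both calls read the same two variables, a change to the host or port only has to be made in one place.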
After changing any of these settings, rerun the script that originally showed the error.
If the error persists, contact Technical Support for further troubleshooting.