
How to submit an RRE job to Hadoop from a Windows client (PuTTY)



This article describes how to run a Revolution R Enterprise script in a Hadoop cluster from a Windows client outside the cluster using a PuTTY ssh client.
  1. Install and configure Revolution R Enterprise 7.3 in the Hadoop cluster per the Revolution R Enterprise 7.3 Hadoop Configuration Guide. Using the validation script from section 4 of that guide, verify that RRE works when the script is run from within the cluster.
  2. Install Revolution R Enterprise for Windows 7.3 on the client Windows system.
  3. Install the PuTTY ssh client on the client Windows system. Verify ssh login capability for the R/Hadoop user from the Windows client system.
  4. Configure passwordless ssh for the R/Hadoop user by creating an ssh keypair on the client and on the Hadoop namenode for the user. Information on doing this can be found here:

    https://cs.uwaterloo.ca/cscf/howto/ssh/public_key/#putty

    or get assistance from your IT group as needed to comply with security requirements. Save the private .ppk key on the Windows client, for example as "C:\data\hdp.ppk".
  5. In the PuTTY client, create and save a named PuTTY session for the login from the client to the Hadoop namenode, for example "RREHDP".
  6. Manually verify the passwordless login for the R user (for example, scott) using PuTTY's plink.exe tool, the saved session, and the key:
    "C:\Program Files (x86)\PuTTY\plink.exe" -i C:\data\hdp.ppk -l scott -load RREHDP
  7. If the plink.exe test login is successful, modify the Hadoop compute context used when running the script from within the cluster to include ssh connection information needed by the client. For example:

    Basic Hadoop compute context, used when running the script from a cluster node:
    myHadoopCluster <- RxHadoopMR(consoleOutput = TRUE)

    cluster <- rxSetComputeContext(myHadoopCluster)
    Extended Hadoop compute context, used when running the script from a Windows client via PuTTY. Note that when using PuTTY, mySshHostname should not be set to the namenode hostname, because that information is stored in the saved PuTTY session. In the script, set mySshHostname to the name of the saved session:
    mySshUsername <- "scott"
    mySshHostname <- "RREHDP"

    myShareDir <- paste("/var/RevoShare", mySshUsername, sep = "/")
    myHdfsShareDir <- paste("/user/RevoShare", mySshUsername, sep = "/")

    myHadoopCluster <- RxHadoopMR(
        hdfsShareDir = myHdfsShareDir,
        shareDir = myShareDir,
        sshUsername = mySshUsername,
        sshHostname = mySshHostname,
        sshClientDir = "C:\\Program Files (x86)\\PuTTY",
        sshSwitches = "-i c:\\data\\hdp.ppk",
        consoleOutput = TRUE)

    cluster <- rxSetComputeContext(myHadoopCluster)
    The sshSwitches value can also pass other arguments to the ssh client as needed, such as a non-default ssh port.
  8. Test the R script from Revolution R Enterprise on the Windows client. The script should connect using the PuTTY ssh client in the background to submit the script for execution on the namenode.
See the RevoScaleR Hadoop Getting Started Guide for more information.
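
The sshSwitches value from step 7 and the plink.exe login test from step 6 can also be assembled in R before the compute context is created. The following is a minimal sketch in base R, not part of RevoScaleR or PuTTY: the helper names buildSshSwitches and buildPlinkArgs, the port 2222, and the trailing remote command "exit" are all assumptions made for illustration. It relies on plink's -P option for the port and -batch option to suppress interactive prompts, and it assumes the saved session already contains the namenode hostname, so that the final argument is treated as a remote command.

```r
# Hypothetical helpers; none of these names come from RevoScaleR or PuTTY.

# Build the sshSwitches string: key file plus an optional non-default port.
buildSshSwitches <- function(keyFile, port = NULL) {
  switches <- paste("-i", keyFile)
  if (!is.null(port)) {
    # plink expects the port after an upper-case -P
    switches <- paste(switches, "-P", as.integer(port))
  }
  switches
}

# Build the plink arguments for a scripted version of the step 6 login test.
# -batch makes plink fail instead of prompting for input.
# The trailing "exit" is an assumed remote command that ends the session at once.
buildPlinkArgs <- function(session, user, switches) {
  c(switches, "-batch", "-l", user, "-load", session, "exit")
}

# Example values from this article; port 2222 is an assumed non-default port.
mySwitches <- buildSshSwitches("c:\\data\\hdp.ppk", port = 2222)
plinkExe <- "C:\\Program Files (x86)\\PuTTY\\plink.exe"

# Only attempt the login on a machine where plink is actually installed.
if (file.exists(plinkExe)) {
  status <- system2(plinkExe, buildPlinkArgs("RREHDP", "scott", mySwitches))
  if (status != 0) {
    stop("plink login failed; fix the ssh setup before creating the compute context")
  }
}
```

If the check succeeds, mySwitches could then be supplied as the sshSwitches argument of RxHadoopMR. Because the switches are joined with plain spaces, this sketch does not handle a key path that itself contains spaces.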

Article Info
Article ID : 3104143
Revision : 1
Created on : 1/7/2017
Published on : 11/1/2015
Exists online : False