Check the following items to verify that your cluster is set up properly to run LSF jobs on all of its nodes:
- Passwordless SSH must be set up and working between all of the nodes of the cluster.
- The shared directory that you are using must be visible to all nodes of the cluster. Mounts are sometimes dropped after a reboot or a DNS change. If that happens, it may be necessary to run 'sudo service nfs restart' on the host that exports the directory, and then 'sudo mount -a' on all other nodes.
- Ensure that you can run R on each node of your cluster.
- Ensure that the cluster is operational and visible from your client by running the LSF commands 'bhosts' and 'lshosts' on that client.
- Run the RevoScaleR command 'rxPingNodes()' in Revolution R to verify that all nodes are visible and operational on the cluster.
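
The first three checks above (passwordless SSH, the shared directory, and R on each node) can be scripted from the client. The following is a minimal sketch, not a supported tool. The node names, the shared directory path, and the function names 'check_node' and 'check_all' are hypothetical placeholders, and the sketch assumes that R is on the PATH as 'R' for non-interactive SSH sessions, which may not match your installation.

```shell
#!/usr/bin/env bash
# Sketch of per-node pre-flight checks. All values below are assumptions
# for illustration; replace them with the values for your own cluster.

NODES="${NODES:-node1 node2 node3}"        # hypothetical host names
SHARED_DIR="${SHARED_DIR:-/shared/data}"   # hypothetical shared directory
# BatchMode=yes makes ssh fail instead of prompting for a password,
# so a node without passwordless SSH is reported rather than hanging.
REMOTE="${REMOTE:-ssh -o BatchMode=yes -o ConnectTimeout=5}"

# Run the three checks against one node and print a single status line.
# Stops at the first failing check and returns non-zero.
check_node() {
    local node="$1"
    if ! $REMOTE "$node" true </dev/null >/dev/null 2>&1; then
        echo "$node: FAIL passwordless ssh"
        return 1
    fi
    if ! $REMOTE "$node" "test -d '$SHARED_DIR'" </dev/null >/dev/null 2>&1; then
        echo "$node: FAIL shared directory $SHARED_DIR not visible"
        return 1
    fi
    if ! $REMOTE "$node" "R --version" </dev/null >/dev/null 2>&1; then
        echo "$node: FAIL R not runnable"
        return 1
    fi
    echo "$node: OK"
}

# Check every node in NODES; return non-zero if any node failed.
check_all() {
    local node failed=0
    for node in $NODES; do
        check_node "$node" || failed=1
    done
    return "$failed"
}
```

To use the sketch, source the file, set NODES and SHARED_DIR for your cluster, and run 'check_all'. Any node that does not report OK should be fixed before you go on to the 'bhosts', 'lshosts', and 'rxPingNodes()' checks.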