Using ODBC connections in parallel code

Passing ODBC connections that were opened in the parent session to parallel worker processes may fail, as in the following example:

loaddata <- function(cn){
  result <- sqlQuery(cn,'select * from boston')
  return(head(result))
}

library(RODBC)

cn1 <- odbcConnect("RevoTestDB", uid='RevoTester', pwd='RevoTester')
cn2 <- odbcConnect("RevoTestDB", uid='RevoTester', pwd='RevoTester')
cn3 <- odbcConnect("RevoTestDB", uid='RevoTester', pwd='RevoTester')
cn4 <- odbcConnect("RevoTestDB", uid='RevoTester', pwd='RevoTester')

rxSetComputeContext('localpar')

system.time({
  z <- rxExec(loaddata, rxElemArg(list(cn1,cn2,cn3,cn4)),
              packagesToLoad='RODBC')
})

Error in do.call(.rxDoParFUN, as.list(args)) :
  task 1 failed - "first argument is not an open RODBC channel"

The worker processes receive the ODBC connections as closed channels. A connection is specific to the process that opened it, so unless the workers are forked from the parent process (as with multicore workers), they cannot use the parent's connections. To distribute ODBC computations on non-forked workers, open the connection on each worker as part of the distributed task.

Example:

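loaddata <- function(){
  library(RODBC)
  # Open the connection inside the task so it belongs to the worker process
  cn <- odbcConnect("RevoTestDB", uid='RevoTester', pwd='RevoTester')
  if (!inherits(cn, "RODBC")) stop("could not open ODBC connection")
  # Close the worker's connection when the task finishes
  on.exit(odbcClose(cn))
  result <- sqlQuery(cn,'select * from boston')
  return(head(result))
}

# Keep the results in z; system.time() only reports the elapsed time
system.time({
  z <- rxExec(loaddata, packagesToLoad='RODBC')
})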
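
If each task needs different work, one option is to pass plain values such as query strings to the workers instead of connections, since character strings survive the trip to another process. The following is a minimal sketch under that assumption, not a definitive implementation. The DSN, credentials, and 'boston' table are reused from the example above; the query argument and the run_queries() helper are hypothetical names introduced for illustration, and RevoScaleR is assumed to be attached in the calling session.

```r
# Sketch only: each worker opens, uses, and closes its own connection.
# 'query' and run_queries() are illustrative names, not part of RODBC
# or RevoScaleR.
loaddata <- function(query) {
  library(RODBC)
  cn <- odbcConnect("RevoTestDB", uid = 'RevoTester', pwd = 'RevoTester')
  # odbcConnect() returns -1 with a warning on failure, so check the class
  if (!inherits(cn, "RODBC")) stop("could not open ODBC connection")
  # Release the worker's connection even if the query step fails
  on.exit(odbcClose(cn), add = TRUE)
  head(sqlQuery(cn, query))
}

run_queries <- function(queries) {
  # rxElemArg() hands one element of 'queries' to each task;
  # assumes RevoScaleR is attached so rxExec() is available
  rxExec(loaddata, query = rxElemArg(queries), packagesToLoad = 'RODBC')
}
```

With the 'localpar' compute context set as above, a call such as run_queries(list('select * from boston', 'select count(*) from boston')) would be expected to return one list element per query.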

Applies to:

Revolution Analytics

Keywords: kb

Article Info

Article ID	:	3103824
Revision	:	1
Created on	:	1/7/2017
Published on	:	11/1/2015
Exists online	:	False
Views	:	230
