Tuning Forest and Boosted Tree Prediction Speed on Hadoop

By default, rxPredict launches one MapReduce (MR) job per tree to minimize memory usage.
For small data sets, either call rxPredict inside rxExec or set scheduleOnce=TRUE (available in 7.3) to reduce the scheduling overhead:

– rxPredict(dforestObject, data = myData, outData = myOutData, scheduleOnce = TRUE, ...)
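The rxExec alternative mentioned above can be sketched roughly as follows. This is a minimal illustration, not a verified recipe: the wrapper name predictInOneJob and its argument names are hypothetical, and it assumes RevoScaleR is installed and that a Hadoop compute context (for example RxHadoopMR) has already been set with rxSetComputeContext.

```r
# Sketch only: wrap rxPredict in a single rxExec call so that the scheduler
# receives one job rather than one job per tree.
# Assumptions (not from the article): the helper name and its arguments are
# hypothetical; a Hadoop compute context is already active.
predictInOneJob <- function(model, inData, outData) {
  RevoScaleR::rxExec(
    RevoScaleR::rxPredict,   # function to run on the cluster
    modelObject = model,     # fitted forest or boosted tree model
    data        = inData,    # input data source
    outData     = outData,   # output data source for the predictions
    timesToRun  = 1          # run the wrapped call exactly once
  )
}
```

It would be called as predictInOneJob(dforestObject, myData, myOutData). Whether this is faster than scheduleOnce=TRUE depends on the cluster and the data size, so it is worth timing both.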

For larger data sets, set scheduleOnce=1 to run the prediction in parallel in a single MR job. This option is available in 7.3; internally it uses rxDataStep to call predict.randomForest, so it requires the randomForest package:

– rxPredict(dforestObject, data = myData, outData = myOutData, scheduleOnce = 1, ...)

Applies to:

Revolution Analytics

Keywords: kb

Article Info

Article ID	:	3104165
Revision	:	1
Created on	:	1/7/2017
Published on	:	11/1/2015