
Tuning Forest and Tree Modeling Accuracy



Forest and Tree Modeling Accuracy

Tune the rxDForest parameters below; each change trades speed for accuracy   (*: defaults for open-source R (OSR) and Revolution R Enterprise (RRE))

–      Increase nTree, e.g. to 20 or more   (OSR=500, RRE=10)*

–      Increase maxDepth, e.g. to 20 or more   (OSR=N/A, RRE=10)*

–      Decrease minSplit, e.g. to 2   (OSR=5, RRE=sqrt(N))*

–      Increase mTry, e.g. to 40 or more   (OSR/RRE=sqrt(p) for classification or p/3 for regression, where p is the number of predictors)*

–      Increase maxNumBins, e.g. to 1e5 or 1e6

–      With the KDD dataset, the following settings gave an accuracy of 81.4%; raising nTree to 200 increased it further to 82.3%:

nTree=20, mTry=40, minSplit=2, maxDepth=20, maxNumBins=1e6
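
The settings quoted above could be passed to rxDForest roughly as follows. This is a minimal sketch, not the code behind the reported accuracy figures: the data frame name `kddTrain` and the response column `label` are hypothetical placeholders, and the call only runs if RevoScaleR is installed and such a data frame exists.

```r
# Settings quoted in the article for the KDD run.
forestArgs <- list(
  nTree      = 20,   # number of trees to grow
  mTry       = 40,   # variables sampled as split candidates at each node
  minSplit   = 2,    # minimum observations in a node before a split is attempted
  maxDepth   = 20,   # maximum depth of any tree node
  maxNumBins = 1e6   # maximum number of bins used to cut numeric data
)

# Hypothetical inputs (assumptions, not from the article):
# a data frame `kddTrain` with a response column named `label`.
if (requireNamespace("RevoScaleR", quietly = TRUE) && exists("kddTrain")) {
  predictors <- setdiff(names(kddTrain), "label")
  # Spell out the predictors explicitly rather than relying on `.` in the formula.
  form <- as.formula(paste("label ~", paste(predictors, collapse = " + ")))
  fit <- do.call(RevoScaleR::rxDForest,
                 c(list(formula = form, data = kddTrain), forestArgs))
  print(fit)
}
```

To try the larger forest mentioned above, set `forestArgs$nTree <- 200` before the call; expect a correspondingly longer run time.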

Alternatively, run the open-source randomForest routine across the Hadoop cluster using rxExec
–      See randomShrubbery in Section 6.5 of our Distributed Computing Guide

–      Adjust the MapReduce (MR) memory limits if needed, since the data must fit in memory on each node.
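
The rxExec approach can be sketched as below. This is an illustrative outline under stated assumptions, not the randomShrubbery code from the guide: the helper `growForest`, the task counts, and the use of the built-in `iris` data are all hypothetical stand-ins. It also assumes the randomForest package is installed on every node and that a compute context has already been set; with the default local context the tasks simply run on the local machine.

```r
# Hypothetical split of the work: 4 tasks x 50 trees each = 200 trees in total.
treesPerTask <- 50
nTasks <- 4

# Each task fits an independent forest entirely in memory.
# `iris` is a stand-in dataset used only for illustration.
growForest <- function(ntree) {
  randomForest::randomForest(Species ~ ., data = iris, ntree = ntree)
}

if (requireNamespace("RevoScaleR", quietly = TRUE) &&
    requireNamespace("randomForest", quietly = TRUE)) {
  # Run growForest nTasks times; rxExec returns a list with one result per task.
  forests <- RevoScaleR::rxExec(growForest, ntree = treesPerTask,
                                timesToRun = nTasks,
                                packagesToLoad = "randomForest")
  # Merge the per-task forests into a single randomForest object.
  combined <- do.call(randomForest::combine, forests)
  print(combined$ntree)
}
```

Because every task holds its full training data in memory, this pattern suits data that fits on a single node. Each task should also draw from a different random number stream so that the forests are not identical.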


Article Info
Article ID : 3104233
Revision : 1
Created on : 1/7/2017
Published on : 10/29/2015