Notice: This website is an unofficial Microsoft Knowledge Base (KB) archive intended to provide reliable access to content deleted from the Microsoft KB. All KB articles are owned by Microsoft Corporation.

Tuning Options for ScaleR Text Imports



Windows/Linux Block Size
  • Choose rowsPerRead so that each block holds roughly 10M elements (rows × columns), or fewer
    • With 20 columns, rowsPerRead=500e3 (20 × 500,000 = 10M elements)
    • With 1000 columns, rowsPerRead=1000 (1000 × 1000 = 1M elements, well under the target)
  • Blocks of this size are small enough that several can be processed per read
  • Use blocksPerRead > 1
    • The exact value depends on how much RAM is available
    • Having multiple blocks in memory at once generally improves performance
  • Increasing blocksPerRead is easy, but re-blocking is expensive, so err on the side of smaller blocks
  • If you use rxSplit() or rxDataStep() to create samples (for example, training and validation sets), then use rxDataStep() to re-block the results according to the same sizing principle
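
The sizing rule above is simple arithmetic, and a small helper can make it concrete. The following is a minimal Python sketch, not ScaleR code: the function names rows_per_read and blocks_that_fit are hypothetical, and the 8 bytes per element figure is an assumption (numeric columns stored as doubles) used only to get a rough memory estimate for picking blocksPerRead.

```python
# Sketch only: these helpers are hypothetical and are not part of ScaleR.
# The 8-byte element size is an assumption (double-precision numerics).

TARGET_ELEMENTS = 10_000_000  # ~10M elements per block, per the guideline


def rows_per_read(n_columns, target_elements=TARGET_ELEMENTS):
    """Largest row count that keeps rows * columns at or under the target."""
    if n_columns <= 0:
        raise ValueError("n_columns must be positive")
    return max(1, target_elements // n_columns)


def blocks_that_fit(n_columns, rows, ram_budget_bytes, bytes_per_element=8):
    """Rough number of blocks that fit in a RAM budget (at least 1)."""
    block_bytes = n_columns * rows * bytes_per_element
    return max(1, ram_budget_bytes // block_bytes)


print(rows_per_read(20))    # 500000, matching the 20-column example
print(rows_per_read(1000))  # 10000, an upper bound; smaller values are fine
print(blocks_that_fit(20, 500_000, 1024**3))  # 13 blocks in a 1 GiB budget
```

The first result would be passed as rowsPerRead at import time; the second is only a ceiling for blocksPerRead, since real memory use also depends on column types and on what the analysis itself allocates.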





Article Info
Article ID : 3104210
Revision : 1
Created on : 1/7/2017
Published on : 10/29/2015
Exists online : False