QA: How do the RevoScaleR chunking algorithms work?


You can use the same RevoScaleR functions to process very large data sets stored on disk as you use to analyze in-memory data frames. This works because RevoScaleR functions use 'chunking' algorithms, which process the data one chunk at a time instead of requiring all of it to be in memory at once. A chunking algorithm follows this process:
  1. Initialization: initialize the intermediate results needed to compute the final statistics.
  2. Read data: read a chunk of data (a set of observations of the variables).
  3. Transform data: perform any transformations and row selections on the chunk as needed; if the operation is only an import or a data step, write the chunk out.
  4. Process data: compute intermediate results for the chunk.
  5. Update results: combine the results from the chunk with those of the previous chunks.
  6. Repeat steps 2 through 5 (possibly in parallel) until all the data has been processed.
  7. Process results: once the results from all the chunks have been combined, perform the final computations and return the results.
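
The seven steps above can be illustrated with a small, self-contained sketch. It is written in plain Python and is not RevoScaleR code; the names read_chunks, chunked_mean_sd, chunk_size, and keep are hypothetical and chosen only for this illustration. It assumes the statistic of interest is a mean and sample standard deviation, and it combines per-chunk results with the standard pairwise update for means and sums of squared deviations, which may differ from what RevoScaleR does internally.

```python
from itertools import islice
from math import sqrt


def read_chunks(rows, chunk_size):
    """Step 2: yield successive lists of at most chunk_size rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def chunked_mean_sd(rows, chunk_size, keep=lambda x: True):
    """Mean and sample standard deviation computed one chunk at a time."""
    # Step 1: initialize the intermediate results
    # (count, running mean, running sum of squared deviations).
    n, mean, m2 = 0, 0.0, 0.0

    # Step 6: the loop repeats steps 2 through 5 until the data runs out.
    for chunk in read_chunks(rows, chunk_size):
        # Step 3: row selection (a transformation could be applied here too).
        selected = [x for x in chunk if keep(x)]
        if not selected:
            continue

        # Step 4: intermediate results for this chunk alone.
        c_n = len(selected)
        c_mean = sum(selected) / c_n
        c_m2 = sum((x - c_mean) ** 2 for x in selected)

        # Step 5: combine the chunk's results with those of previous chunks.
        delta = c_mean - mean
        total = n + c_n
        mean += delta * c_n / total
        m2 += c_m2 + delta * delta * n * c_n / total
        n = total

    # Step 7: final computations once every chunk has been combined.
    if n < 2:
        raise ValueError("need at least two selected observations")
    return mean, sqrt(m2 / (n - 1))


if __name__ == "__main__":
    data = range(1, 11)  # stands in for rows read from a file on disk
    print(chunked_mean_sd(data, chunk_size=3))
    print(chunked_mean_sd(data, chunk_size=4, keep=lambda x: x > 5))
```

Because step 5 only needs the three intermediate values from each chunk, the chunks could in principle be processed independently and merged in any order, which is what makes the "perhaps in parallel" note in step 6 possible.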


Article Info
Article ID : 3104271
Revision : 1
Created on : 1/7/2017
Published on : 10/29/2015