Notice: This website is an unofficial Microsoft Knowledge Base (hereinafter KB) archive and is intended to provide reliable access to deleted content from Microsoft KB. All KB articles are owned by Microsoft Corporation.

QA: How can I randomly select data from an .xdf file?



You can write an R transform function and pass it to the RevoScaleR 'rxDataStepXdf()' function through its 'transformFunc' argument. Inside the transform function, you can set the special row selection variable '.rxRowSelection' for each row. Rows where it is TRUE are written to the output file, and rows where it is FALSE are dropped. You can then use the new, subset .xdf file with other RevoScaleR functions. The sample R script below creates a new .xdf file by randomly sampling about 25% of the rows of a larger .xdf file.

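# Path to the existing .xdf file to sample from (hypothetical name; replace with your own file)
inFile <- "largeData.xdf"

# Create a transformFunc that selects about 25% of the rows at random
set.seed(13)  # make the random selection reproducible
xform <- function(data) {
  # One Bernoulli(0.25) draw per row in the current chunk; TRUE keeps the row
  data$.rxRowSelection <- as.logical(rbinom(length(data[[1]]), 1, .25))
  return(data)
}

rxDataStepXdf(inFile = inFile, outFile = "sampledData.xdf", transformFunc = xform, overwrite = TRUE)

# Check that subsetting was done and that the row selection variable is not kept in the data set
rxGetInfoXdf(inFile)
rxGetInfoXdf("sampledData.xdf")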



Keywords: kb


Article Info
Article ID : 3104278
Revision : 1
Created on : 1/7/2017
Published on : 10/29/2015
Exists online : False
Views : 57