Notice: This website is an unofficial Microsoft Knowledge Base (hereinafter KB) archive and is intended to provide a reliable access to deleted content from Microsoft KB. All KB articles are owned by Microsoft Corporation. Read full disclaimer for more details.

How to replace all NAs in an xdf file


View products that this article applies to.

For replacing all NA values by zero or other content it is necessary to transform data from an input data set to an output data set using function 

rxDataStep.

A sample script for replacing all NA in xdf file "AirlineDemoSmall.xdf" is shown below:

# Create a data frame with missing values
set.seed(17)
myDataF <- data.frame(x = rnorm(100), y = runif(100), z = rgamma(100, shape = 2))
xmiss <- seq.int(from = 5, to = 100, by = 5)
ymiss <- seq.int(from = 2, to = 100, by = 5)
myDataF$x[xmiss] <- NA
myDataF$y[ymiss] <- NA


# Convert into a xdf
myDataNA<-file.path(getwd(),"myDataNA.xdf")

trsfxdf<-rxDataStep(inData=myDataF,outFile=myDataNA,overwrite=TRUE)

writeLines("\n\nXdf Generated with random NA values")
print(rxGetInfo(myDataF, n = 15)$data)     # Test ouput data 
##
## Use from here if there is an existing xdf.
## replace myDataNA with your xdf file
##
writeLines("\n\nVariables that contains NA values (Missing Observations)")
(mySum <- rxSummary(~., data = myDataNA)$sDataFrame)

# Find variables that are missing
transVars <- mySum$Name[mySum$MissingObs > 0]
print(transVars) #Test detected variables

# create a function to replace NA vals with mean
NAreplace <- function(dataList) {
        replaceFun <- function(x) {
              x[is.na(x)] <- replaceValue
              return(x)
        }
 dataList <- lapply(dataList, replaceFun)
 return(dataList)
}
#
myDataRMV<-file.path(getwd(),"myDataRMV.xdf")       # Replace Missing Value 
trsfxdf<- rxDataStep(inData = myData1, outFile = myDataRMV,
     transformFunc = NAreplace, 
     transformVars = transVars,
     transformObjects = list(replaceValue = "REPLACED MISSING VALUE"),
     overwrite=TRUE)
writeLines("\n\nTransformed xdf with NA replaced by Value")
print(rxGetInfo(myDataRMV, n=15)$data)     # Test output data

↑ Back to the top


Keywords: kb

↑ Back to the top

Article Info
Article ID : 3104212
Revision : 1
Created on : 1/7/2017
Published on : 10/29/2015
Exists online : False
Views : 71