Notice: This website is an unofficial Microsoft Knowledge Base (hereinafter KB) archive, intended to provide reliable access to content deleted from the Microsoft KB. All KB articles are owned by Microsoft Corporation.

Help with RxTextData/rxImport delimiters



Problem

How do I tell the RxTextData function to use '|', or some other character, as the delimiter?



Solution

If your text data is not separated by commas or tabs, you must specify the delimiter using the columnDelimiters argument. (This is not actually an argument to rxImport, but to the underlying RxTextData data source object.) In normal usage, this argument is a single character, such as columnDelimiters="\t" for tab-delimited data or columnDelimiters="," for comma-delimited data. However, each column may be delimited by a different character; all the delimiters must be concatenated together into a single character string. For example, if you have one column delimited by a comma, a second by a plus sign, and a third by a new line, you would use the argument columnDelimiters=",+\n".

id|val
1|a
2|b

For the data above, how do I change the code below so that '|' is treated as the delimiter?

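# Connect to HDFS ("dummy" values are placeholders for the real host and port)
hdfsFS <- RxHdfsFileSystem(hostName = "dummy", port = "dummy")
# Text data source on HDFS; no delimiter is specified here
txtSource <- RxTextData("directory value/ file_name in hdfs", fileSystem = hdfsFS)
# Import the text data into an .xdf file
airData <- rxImport(inData = txtSource, outFile = "/tmp/test.xdf", stringsAsFactors = TRUE, missingValueString = "M", rowsPerRead = 200000, overwrite = TRUE)
# Summarize the two columns
rxSummary(~ id + val, data = airData)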


To read pipe-delimited data, set the option delimiter = "|" in your RxTextData() call:

txtSource <- RxTextData("directory value/ file_name in hdfs", fileSystem = hdfsFS, delimiter = "|")
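
To try the delimiter setting without an HDFS cluster, the following is a minimal sketch that writes the sample data from this article to a temporary local file and imports it. It assumes the RevoScaleR package is installed; the names sampleFile and pipeData are hypothetical and used only for illustration, and firstRowIsColNames = TRUE is set explicitly on the assumption that the first line holds the column names.

```r
# Write the article's sample data to a temporary local file.
sampleFile <- tempfile(fileext = ".txt")
writeLines(c("id|val", "1|a", "2|b"), sampleFile)

# Run the import only when RevoScaleR is available (assumption: it is installed).
if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  library(RevoScaleR)

  # delimiter = "|" tells the text data source to split columns on the pipe.
  txtSource <- RxTextData(sampleFile,
                          delimiter = "|",
                          firstRowIsColNames = TRUE,
                          stringsAsFactors = TRUE)

  # With no outFile, rxImport returns the data as a data frame.
  pipeData <- rxImport(inData = txtSource)

  # Inspect the parsed column names, then summarize both columns.
  print(names(pipeData))
  print(rxSummary(~ id + val, data = pipeData))
}
```

If the column names print as id and val rather than as a single combined column, the delimiter was applied. The same delimiter argument can then be used in the HDFS call shown above.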



Keywords: kb


Article Info
Article ID : 3103847
Revision : 1
Created on : 1/7/2017
Published on : 11/1/2015
Exists online : False
Views : 79