RevoScaleR formulas support two functions for switching a variable between categorical and continuous treatment:
N() treats a categorical variable as continuous.
F() treats a continuous variable as categorical.
F() accepts the additional arguments low, high, and exclude, which specify the value of the lowest category, the value of the highest category, and how values outside that range are handled.
The following example uses the sample census data shipped with RevoScaleR and wraps the 'age' variable in F() so that it is treated as a factor in the summary formula:
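# Locate the sample data directory shipped with RevoScaleR
sampleDataDir <- rxGetOption("sampleDataDir")
censusWorkers <- file.path(sampleDataDir, "CensusWorkers.xdf")
# Summarize age as a factor (via F()) together with sex
rxSummary(~ F(age) + sex, data = censusWorkers)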
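The low and high arguments described above can be supplied in the same formula. The following is a minimal sketch rather than a definitive recipe: the bounds 30 and 50 are illustrative values chosen here, not taken from the data, and the exclude argument is left at its default because its accepted values are not described in this text (see ?rxFormula).

```r
library(RevoScaleR)

# Locate the sample census data shipped with RevoScaleR
sampleDataDir <- rxGetOption("sampleDataDir")
censusWorkers <- file.path(sampleDataDir, "CensusWorkers.xdf")

# Treat age as a factor whose lowest category is 30 and highest is 50.
# The bounds are illustrative assumptions; how ages outside 30..50 are
# handled depends on the exclude argument, left at its default here.
rxSummary(~ F(age, low = 30, high = 50) + sex, data = censusWorkers)
```

Restricting the range this way can keep the number of factor levels small when only part of a variable's range is of interest.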
For more information on RevoScaleR formula syntax, type ?rxFormula at the Revolution R Enterprise console.