Description
When I run FFTrees() on a training dataset in which a factor column contains an NA value, I get an ungraceful error.
Reproducible example below:
library(FFTrees) # 1.9.0
data <- data.frame(crit = c(TRUE, TRUE, FALSE, TRUE),
                   sex = c("m", "f", "m", NA))
FFTrees(formula = crit ~ .,
        data = data)
Returns:
Aiming to create a new FFTrees object:
— Setting 'goal = bacc'
— Setting 'goal.chase = bacc'
— Setting 'goal.threshold = bacc'
— Setting 'max.levels = 4'
— Using default 'cost.outcomes' = (0 1 1 0)
— Using default 'cost.cues' = (0 per cue)
Successfully created a new FFTrees object.
Aiming to define FFTs:
Aiming to create FFTs with 'ifan' algorithm (chasing 'bacc'):
Aiming to rank 1 cues (optimizing 'bacc'):
Error: !any(is.na(cue_v)) is not TRUE
`actual`: FALSE
`expected`: TRUE
Desired behavior: Either an informative error message (like "Column {X} in the training data has {Y} NA values. NA values are not allowed in training.") or no error at all, with the trees built despite the NA values.
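To illustrate the first option, here is a minimal sketch of an up-front check that produces a message of that form. The function `check_cue_na()` and its argument names are hypothetical; they are not part of FFTrees, and this is not how the package currently validates its input.

```r
# Hypothetical helper (not part of FFTrees): stop with an informative
# message if any cue (predictor) column in the training data contains NAs.
check_cue_na <- function(data, criterion_name) {
  # All columns except the criterion are treated as cues
  cue_names <- setdiff(names(data), criterion_name)

  # Count NA values per cue and keep only the cues that have any
  na_counts <- vapply(data[cue_names], function(x) sum(is.na(x)), integer(1))
  na_counts <- na_counts[na_counts > 0]

  if (length(na_counts) > 0) {
    msg <- paste0("Column '", names(na_counts), "' in the training data has ",
                  na_counts, " NA value(s).", collapse = "\n")
    stop(msg, "\nNA values are not allowed in training.", call. = FALSE)
  }
  invisible(TRUE)
}

data <- data.frame(crit = c(TRUE, TRUE, FALSE, TRUE),
                   sex = c("m", "f", "m", NA))

# Capture the error message instead of aborting the script
res <- tryCatch(check_cue_na(data, "crit"),
                error = function(e) conditionMessage(e))
cat(res, "\n")
```

Run on the example data, the check names the offending column and its NA count rather than failing on an internal assertion.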
I believe the source of the error is here:
FFTrees/R/fftrees_threshold_factor_grid.R, line 52 in 5978e81:
testthat::expect_true(!any(is.na(cue_v)))
I am using FFTrees 1.9.0.
Is it on principle that we don't want to allow factors with NA values in training, or is this a bug? It's been a while since I've thought about the algorithm, but my recollection is that we could treat NA as its own (perfectly valid) factor value and include it in the tree definitions. Does that sound right?
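As a user-side illustration of the "NA as its own value" idea, the sketch below recodes missing values in categorical columns to an explicit label before training. The helper `recode_na()` and the label "missing" are assumptions made for this example, not FFTrees functionality, and I have not verified how FFTrees 1.9.0 behaves on the recoded data.

```r
# Hypothetical helper (not part of FFTrees): replace NA in character or
# factor columns with an explicit label so NA becomes an ordinary value.
recode_na <- function(x, label = "missing") {
  if (is.factor(x)) x <- as.character(x)  # factors are converted to character
  if (is.character(x)) x[is.na(x)] <- label
  x                                       # other column types are left as-is
}

data <- data.frame(crit = c(TRUE, TRUE, FALSE, TRUE),
                   sex = c("m", "f", "m", NA))

# Apply to every column while keeping the data frame structure
data[] <- lapply(data, recode_na)

print(data$sex)
```

After recoding, the `sex` column no longer contains NA, so the `!any(is.na(cue_v))` assertion quoted above would not be triggered by it.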