FFTrees() fails ungracefully when a factor in the training data has NA #160

@pa-nathaniel

When I run FFTrees() on a training dataset in which a factor column contains an NA value, it fails with an uninformative error.

Reproducible example below:

library(FFTrees) # 1.9.0

data <- data.frame(crit = c(TRUE, TRUE, FALSE, TRUE),
                   sex = c("m", "f", "m", NA))

FFTrees(formula = crit ~ .,
        data = data)

Output:

Aiming to create a new FFTrees object:
— Setting 'goal = bacc'
— Setting 'goal.chase = bacc'
— Setting 'goal.threshold = bacc'
— Setting 'max.levels = 4'
— Using default 'cost.outcomes' = (0 1 1 0)
— Using default 'cost.cues' = (0 per cue)
Successfully created a new FFTrees object.
Aiming to define FFTs:
Aiming to create FFTs with 'ifan' algorithm (chasing 'bacc'):
Aiming to rank 1 cues (optimizing 'bacc'):
Error: !any(is.na(cue_v)) is not TRUE

`actual`:   FALSE
`expected`: TRUE 

Desired behavior: either an informative error message (e.g., "Column {X} in the training data has {Y} NA values. NA values are not allowed in training.") or no error at all, with the trees built as usual.
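For illustration, an up-front check along these lines could produce the first kind of message before any cue ranking starts. `check_na_cues()` is a hypothetical helper written for this sketch; it is not part of FFTrees, and where such a check would live inside the package is an open assumption.

```r
# Hypothetical helper (NOT part of FFTrees): stop early with an informative
# message if any column of the training data contains NA values.
check_na_cues <- function(data) {
  na_counts <- colSums(is.na(data))      # number of NAs in each column
  bad <- na_counts[na_counts > 0]        # keep only columns with NAs
  if (length(bad) > 0) {
    msgs <- sprintf("Column '%s' in the training data has %d NA value(s).",
                    names(bad), as.integer(bad))
    stop(paste(c(msgs, "NA values are not allowed in training."),
               collapse = "\n"), call. = FALSE)
  }
  invisible(TRUE)
}

data <- data.frame(crit = c(TRUE, TRUE, FALSE, TRUE),
                   sex = c("m", "f", "m", NA))

# Capture the error message instead of halting, so it can be inspected.
msg <- tryCatch(check_na_cues(data), error = conditionMessage)
cat(msg, "\n")
```

Reporting every offending column at once (rather than failing on the first cue) saves the user from fixing one column per run.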

I believe the source of the error is here:

testthat::expect_true(!any(is.na(cue_v)))

I am using FFTrees 1.9.0

Is disallowing NA values in factor cues during training intentional, or is this a bug? It's been a while since I've thought about the algorithm, but my recollection is that we can treat NA as its own (perfectly valid) factor level and include it in the tree definitions. Does that sound right?
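Until this is decided, one user-side way to "treat NA as its own factor value" is to recode missing entries as an explicit level before calling FFTrees(). The sketch below uses a hypothetical helper, `na_as_level()`, and an assumed label `"(missing)"`; neither comes from FFTrees, and I have not verified that the resulting trees are sensible. The same recoding would also need to be applied to any test data passed in later.

```r
# Hypothetical helper (NOT part of FFTrees): replace NA in character and
# factor columns with an explicit label so "missing" becomes an ordinary
# category. Numeric and logical columns are left untouched.
na_as_level <- function(data, label = "(missing)") {
  for (col in names(data)) {
    x <- data[[col]]
    if (is.factor(x)) {
      # Add the label as a level first; assigning a non-level value to a
      # factor would otherwise produce NA again (with a warning).
      if (!label %in% levels(x)) levels(x) <- c(levels(x), label)
      x[is.na(x)] <- label
    } else if (is.character(x)) {
      x[is.na(x)] <- label
    }
    data[[col]] <- x
  }
  data
}

data <- data.frame(crit = c(TRUE, TRUE, FALSE, TRUE),
                   sex = c("m", "f", "m", NA))
data_clean <- na_as_level(data)
print(data_clean)
```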
