Description
I am encountering the following error:
Error: DataBackend did not return the queried rows correctly: 781 requested, 593 received.
The resampling was probably instantiated on a different task.
This happened PipeOp performance's $train()
when I run the following code:
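# packages assumed to provide the functions used below (not shown in the original report)
library(mlr3)
library(mlr3data)       # assumed source of the "ames_housing" task
library(mlr3learners)   # lrn("regr.lm")
library(mlr3filters)    # flt("performance")
library(mlr3pipelines)  # po(), %>>%, selector_type()
task = tsk("ames_housing")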
# remove columns with missing values (not of interest to problem)
mi = task$missings()
keep = setdiff(names(mi[mi == 0]), task$target_names)
task$select(keep)
# create graph learner:
# impact encoding -> filter for feature selection -> linear regression model
learner = lrn("regr.lm")
filter = flt("performance", learner = learner,
  resampling = rsmp("holdout", ratio = 2/3), measure = msr("regr.rmse"))
enc_po = po("encodeimpact", affect_columns = selector_type("factor"))
filt_po = po("filter", filter = filter, filter.nfeat = 1)
gl = as_learner(enc_po %>>% filt_po %>>% learner)
# resampling the graph learner results in the error
resample(task, gl, rsmp("cv", folds = 5))
There seems to be no problem when the dataset contains no factor variables, e.g. when task = tsk("mtcars") is used as the task in resample() in the code above.