Error: DataBackend did not return the queried rows correctly #172

@bommert

I am encountering the error

Error: DataBackend did not return the queried rows correctly: 781 requested, 593 received.
        The resampling was probably instantiated on a different task.
This happened PipeOp performance's $train()

when I run the following code:

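# packages this example appears to need (not listed in the original report;
# mlr3data is assumed to provide the "ames_housing" task)
library(mlr3)
library(mlr3pipelines)
library(mlr3filters)
library(mlr3learners)
library(mlr3data)

task = tsk("ames_housing")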

# remove columns with missing values (not of interest to problem)
mi = task$missings()
keep = setdiff(names(mi[mi == 0]), task$target_names)
task$select(keep)

# create graph learner: 
# impact encoding -> filter for feature selection -> linear regression model
learner = lrn("regr.lm")
filter = flt("performance", learner = learner, 
  resampling = rsmp("holdout", ratio = 2/3), measure = msr("regr.rmse"))
enc_po = po("encodeimpact", affect_columns = selector_type("factor"))
filt_po = po("filter", filter = filter, filter.nfeat = 1)
gl = as_learner(enc_po %>>% filt_po %>>% learner)

# resampling the graph learner results in the error
resample(task, gl, rsmp("cv", folds = 5))

There seems to be no problem when the dataset contains no factor variables, e.g. when tsk("mtcars") is used as the task passed to resample() in the code above.
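
To check the observation about factor variables, the feature types of the two tasks can be compared. This is only a rough diagnostic sketch, not a fix: it assumes that mlr3 is installed and that mlr3data provides the "ames_housing" task. The helper name count_feature_types is hypothetical and introduced here for illustration.

```r
library(mlr3)
library(mlr3data)  # assumption: registers tsk("ames_housing")

# Hypothetical helper: tabulate the storage types of a task's features.
count_feature_types = function(task) {
  table(task$feature_types$type)
}

# mtcars is expected to contain no factor features
print(count_feature_types(tsk("mtcars")))

# ames_housing is expected to contain factor features, which are the
# columns that selector_type("factor") hands to the impact encoder
print(count_feature_types(tsk("ames_housing")))
```

If the second table shows factor columns and the first does not, that matches the report: the error only appears when po("encodeimpact") actually has columns to encode.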
