Identical keywords in build_kwargs and config_kwargs lead to TypeError in load_dataset_builder() #4910

@bablf

Describe the bug

In load_dataset_builder(), builder_kwargs and config_kwargs can contain the same keyword. Both dicts are unpacked into the same call, so the duplicate leads to TypeError: type object got multiple values for keyword argument 'xyz'.
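The mechanism can be reproduced without the datasets library at all. The sketch below is only an illustration: `Builder` is a hypothetical stand-in for a builder class, and the two dicts are made-up values, not what `load_dataset_builder()` actually holds at that point.

```python
class Builder:
    """Hypothetical stand-in for a dataset builder class (not the real one)."""

    def __init__(self, base_path=None, **kwargs):
        self.base_path = base_path
        self.extra = kwargs


# Assumed example values: one dict prepared internally, one supplied by the user.
builder_kwargs = {"base_path": "/module/default"}
config_kwargs = {"base_path": "./sample_data"}

try:
    # Unpacking two dicts that share a key into one call raises TypeError
    # before __init__ is even entered.
    Builder(**builder_kwargs, **config_kwargs)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The prefix of the message depends on the Python version, but it always ends with "got multiple values for keyword argument 'base_path'".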

I ran into this problem with the keyword base_path. It might happen with other kwargs as well. I think a quick fix would be:

builder_cls = import_main_class(dataset_module.module_path)
builder_kwargs = dataset_module.builder_kwargs
data_files = builder_kwargs.pop("data_files", data_files)
config_name = builder_kwargs.pop("config_name", name)
hash = builder_kwargs.pop("hash")
base_path = builder_kwargs.pop("base_path")

and then pass base_path into builder_cls.
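One way to picture the idea is the sketch below. It is not the library's code: `merge_builder_kwargs`, its `overridable` parameter, and the `Builder` class are hypothetical names introduced for illustration, and letting the user-supplied value win over the module default is an assumption about the desired behaviour.

```python
def merge_builder_kwargs(builder_kwargs, config_kwargs, overridable=("base_path",)):
    """Hypothetical helper: pull shared keys out of both dicts so each key
    reaches the builder exactly once, preferring the user-supplied value."""
    remaining_builder = dict(builder_kwargs)  # copy, do not mutate the inputs
    remaining_config = dict(config_kwargs)
    resolved = {}
    for key in overridable:
        module_default = remaining_builder.pop(key, None)
        # The value from config_kwargs wins when present (assumed policy).
        resolved[key] = remaining_config.pop(key, module_default)
    return resolved, remaining_builder, remaining_config


class Builder:
    """Hypothetical stand-in for a dataset builder class."""

    def __init__(self, base_path=None, **kwargs):
        self.base_path = base_path
        self.extra = kwargs


resolved, remaining_builder, remaining_config = merge_builder_kwargs(
    {"base_path": "/module/default", "hash": "abc"},
    {"base_path": "./sample_data"},
)
# No key appears twice any more, so the call succeeds.
builder = Builder(**resolved, **remaining_builder, **remaining_config)
```

Using `pop(key, None)` rather than `pop(key)` keeps the helper from raising KeyError when a module does not set the key at all.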

Steps to reproduce the bug

from datasets import load_dataset
load_dataset("rotten_tomatoes", base_path="./sample_data")

Expected results

The docs state: **config_kwargs — Keyword arguments to be passed to the BuilderConfig and used in the DatasetBuilder.

So I would expect to be able to pass the base_path into load_dataset().

Actual results

TypeError: type object got multiple values for keyword argument 'base_path'

Environment info

  • datasets version: 2.4.0
  • Platform: macOS-12.5-arm64-arm-64bit
  • Python version: 3.8.9
  • PyArrow version: 9.0.0

Labels

bug, good first issue
