Remove script datasets in tests #38940

lhoestq · 2025-06-20T11:00:04Z

...and remove the tests that were skipped in #38931

This reverts commit 31d30b7.

HuggingFaceDocBuilderDev · 2025-06-20T11:14:33Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Rocketknight1

cc @ydshieh !

ydshieh

Should I trigger CI runs for these touched test files?

I prefer to see failures (if any) today instead of tomorrow 😅

lhoestq · 2025-06-20T15:43:09Z

Sure ! Btw I also found some tests failing because of jiwer 4.0 that came out today, but I'll release a patch release in evaluate later today so you can ignore them

ydshieh · 2025-06-20T17:37:13Z

(I just saw your comment about jiwer, thanks!)

There is

FAILED tests/models/layoutlmv3/test_image_processing_layoutlmv3.py::LayoutLMv3ImageProcessingTest::test_LayoutLMv3_integration_test - ValueError: Unsupported number of image dimensions: 2

but on main it fails with

tests/models/layoutlmv3/test_image_processing_layoutlmv3.py::LayoutLMv3ImageProcessingTest::test_LayoutLMv3_integration_test - KeyError: 'file'

but it was passing in the past few days, I believe. @lhoestq Could you take a look next week?

The 3 failing tests seem irrelevant to this PR, and I checked on the CI runners, where they are passing.

I will merge so we know the effect of this PR and see if there is other stuff to fix next week.

Thank you!

lhoestq · 2025-06-20T17:52:35Z

ok ! And I just released evaluate 0.4.4 btw :)

lhoestq · 2025-06-20T17:55:06Z

and I also fixed test_LayoutLMv3_integration_test ! feel free to merge

ydshieh · 2025-06-20T18:07:44Z

ok!

ydshieh · 2025-06-20T19:16:07Z

run-slow: beit, dpt, granite_speech, layoutlmv2, layoutlmv3, layoutxlm, mobilevit, nougat, segformer, udop, upernet

github-actions · 2025-06-20T19:17:32Z

This comment contains run-slow, running the specified jobs:

models: ['models/beit', 'models/dpt', 'models/granite_speech', 'models/layoutlmv2', 'models/layoutlmv3', 'models/layoutxlm', 'models/mobilevit', 'models/nougat', 'models/segformer', 'models/udop', 'models/upernet']
quantizations: [] ...
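The mapping from the `run-slow:` comment to the job list shown above can be sketched roughly as follows. This is only an illustration of the comment-to-list step, not the actual workflow code; the helper name `parse_run_slow` is hypothetical.

```python
def parse_run_slow(comment: str) -> list[str]:
    """Turn a 'run-slow: a, b' comment into ['models/a', 'models/b'].

    Hypothetical helper for illustration; the real CI workflow may differ.
    """
    prefix = "run-slow:"
    text = comment.strip()
    if not text.startswith(prefix):
        # Not a run-slow comment: nothing to schedule.
        return []
    names = [name.strip() for name in text[len(prefix):].split(",")]
    # Drop empty entries (e.g. from a trailing comma) and add the folder prefix.
    return [f"models/{name}" for name in names if name]


jobs = parse_run_slow("run-slow: beit, dpt, granite_speech")
print(jobs)  # ['models/beit', 'models/dpt', 'models/granite_speech']
```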

ydshieh · 2025-06-20T19:32:16Z

There are still some relevant failing tests (I believe).

Examples:

FAILED tests/models/layoutlmv2/test_processor_layoutlmv2.py::LayoutLMv2ProcessorTest::test_overflowing_tokens - FileNotFoundError: [Errno 2] No such file or directory: '/Users/quentinlhoest/.cache/huggingface/datasets/downloads/extracted/35aecbb2e1ba08d57652cba29ac16f2f4b257260d21662922d6eb772ff4b9be1/dataset/training_data/images/0000971160.png'

FAILED tests/models/layoutlmv2/test_processor_layoutlmv2.py::LayoutLMv2ProcessorIntegrationTests::test_processor_case_1 - ValueError: Unsupported number of image dimensions: 2

FAILED tests/models/beit/test_modeling_beit.py::BeitModelIntegrationTest::test_inference_semantic_segmentation - KeyError: 'file'

FAILED tests/models/beit/test_modeling_beit.py::BeitModelIntegrationTest::test_post_processing_semantic_segmentation - KeyError: 'file'

Let's not merge and work on this next week

ydshieh · 2025-06-23T09:18:43Z

@lhoestq Here is the list of issues I collected from the daily CI (relevant, I believe). Could you take a look, please? 🙏

For some tests, maybe the dataset order or format has changed?

single
tests/models/beit/test_modeling_beit.py::BeitModelIntegrationTest::test_inference_semantic_segmentation
(line 508)  KeyError: 'file'
--------------------------------------------------------------------------------
single
tests/models/beit/test_modeling_beit.py::BeitModelIntegrationTest::test_post_processing_semantic_segmentation
(line 551)  KeyError: 'file'
--------------------------------------------------------------------------------
single
tests/models/granite_speech/test_modeling_granite_speech.py::GraniteSpeechForConditionalGenerationIntegrationTest::test_small_model_integration_test_batch
(line 675)  AssertionError: Lists differ: ["sys[512 chars]r is mister quilter's manner less interesting than his matter"] != ["sys[512 chars]r is mister quilp's manner less interesting than his matter"]
--------------------------------------------------------------------------------
single
tests/models/layoutlmv2/test_processor_layoutlmv2.py::LayoutLMv2ProcessorTest::test_overflowing_tokens
(line 3505)  FileNotFoundError: [Errno 2] No such file or directory: '/Users/quentinlhoest/.cache/huggingface/datasets/downloads/extracted/35aecbb2e1ba08d57652cba29ac16f2f4b257260d21662922d6eb772ff4b9be1/dataset/training_data/images/0000971160.png'
--------------------------------------------------------------------------------
single
tests/models/layoutlmv2/test_processor_layoutlmv2.py::LayoutLMv2ProcessorIntegrationTests::test_processor_case_2
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutlmv2/test_processor_layoutlmv2.py::LayoutLMv2ProcessorIntegrationTests::test_processor_case_3
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_1
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_2
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_3
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_4
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_5
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutxlm/test_processor_layoutxlm.py::LayoutXLMProcessorTest::test_overflowing_tokens
(line 3505)  FileNotFoundError: [Errno 2] No such file or directory: '/Users/quentinlhoest/.cache/huggingface/datasets/downloads/extracted/35aecbb2e1ba08d57652cba29ac16f2f4b257260d21662922d6eb772ff4b9be1/dataset/training_data/images/0000971160.png'
--------------------------------------------------------------------------------
single
tests/models/layoutxlm/test_processor_layoutxlm.py::LayoutXLMProcessorIntegrationTests::test_processor_case_1
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutxlm/test_processor_layoutxlm.py::LayoutXLMProcessorIntegrationTests::test_processor_case_2
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutxlm/test_processor_layoutxlm.py::LayoutXLMProcessorIntegrationTests::test_processor_case_3
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutxlm/test_processor_layoutxlm.py::LayoutXLMProcessorIntegrationTests::test_processor_case_4
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutxlm/test_processor_layoutxlm.py::LayoutXLMProcessorIntegrationTests::test_processor_case_5
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/udop/test_modeling_udop.py::UdopModelIntegrationTests::test_conditional_generation
(line 420)  huggingface_hub.errors.EntryNotFoundError: 404 Client Error. (Request ID: Root=1-6855b86a-62ada26a5deba74073ee7878;ac013a8a-3c75-405f-aec4-c0e3a8c22767)
--------------------------------------------------------------------------------
single
tests/models/udop/test_processor_udop.py::UdopProcessorTest::test_overflowing_tokens
(line 3505)  FileNotFoundError: [Errno 2] No such file or directory: '/Users/quentinlhoest/.cache/huggingface/datasets/downloads/extracted/35aecbb2e1ba08d57652cba29ac16f2f4b257260d21662922d6eb772ff4b9be1/dataset/training_data/images/0000971160.png'
--------------------------------------------------------------------------------
single
tests/models/udop/test_processor_udop.py::UdopProcessorIntegrationTests::test_processor_case_1
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/udop/test_processor_udop.py::UdopProcessorIntegrationTests::test_processor_case_2
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/udop/test_processor_udop.py::UdopProcessorIntegrationTests::test_processor_case_3
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/udop/test_processor_udop.py::UdopProcessorIntegrationTests::test_processor_case_4
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/udop/test_processor_udop.py::UdopProcessorIntegrationTests::test_processor_case_5
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/upernet/test_modeling_upernet.py::UperNetModelIntegrationTest::test_inference_convnext_backbone
(line 420)  huggingface_hub.errors.EntryNotFoundError: 404 Client Error. (Request ID: Root=1-6855b8d7-1b8bdf0c2a7a598823cdce04;f28ca804-7b6e-41f0-bcb5-78a25849d239)
--------------------------------------------------------------------------------
single
tests/models/upernet/test_modeling_upernet.py::UperNetModelIntegrationTest::test_inference_swin_backbone
(line 420)  huggingface_hub.errors.EntryNotFoundError: 404 Client Error. (Request ID: Root=1-6855b8d9-6c0cbe2e2b685a0f63b8f2f1;d88ac8fe-e7e5-42ea-ae76-87b382f45534)
--------------------------------------------------------------------------------
single
tests/models/vision_encoder_decoder/test_modeling_vision_encoder_decoder.py::TrOCRModelIntegrationTest::test_inference_handwritten
(line 1171)  AssertionError: Tensor-likes are not close!
--------------------------------------------------------------------------------
single
tests/models/vision_encoder_decoder/test_modeling_vision_encoder_decoder.py::TrOCRModelIntegrationTest::test_inference_printed
(line 437)  ValueError: Unknown split "test". Should be one of ['train'].
--------------------------------------------------------------------------------
single
tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_batched_generation_multilingual
(line 675)  AssertionError: Lists differ: ['夏の時期の時期でした', ' It was the time of day and[37 chars]er.'] != ['木村さんに電話を貸してもらいました', ' Kimura-san called me.']
--------------------------------------------------------------------------------
single
tests/pipelines/test_pipelines_image_segmentation.py::ImageSegmentationPipelineTests::test_maskformer
(line 605)  KeyError: 'file'
--------------------------------------------------------------------------------
single
tests/pipelines/test_pipelines_image_segmentation.py::ImageSegmentationPipelineTests::test_oneformer
(line 659)  KeyError: 'file'
--------------------------------------------------------------------------------
single
tests/trainer/test_trainer_utils.py::TrainerUtilsTest::test_label_smoothing
(line 529)  ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?)
--------------------------------------------------------------------------------
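To see at a glance which error dominates a report like the one above, the entries can be grouped by error message. The sketch below assumes the four-line layout used in this report (a `single` line, the test ID, a `(line N)  Error` line, a dashed separator) and uses a three-entry excerpt as input; `count_errors` is a hypothetical helper, not part of the CI tooling.

```python
import re
from collections import Counter

# Short excerpt in the same layout as the report above.
REPORT = """\
single
tests/models/beit/test_modeling_beit.py::BeitModelIntegrationTest::test_inference_semantic_segmentation
(line 508)  KeyError: 'file'
--------------------------------------------------------------------------------
single
tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_1
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_2
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
"""

# Matches lines such as "(line 319)  ValueError: ..." and captures the message.
ERROR_LINE = re.compile(r"^\(line \d+\)\s+(?P<error>.+)$")


def count_errors(report: str) -> Counter:
    """Count how often each error message appears in the report."""
    counts = Counter()
    for line in report.splitlines():
        match = ERROR_LINE.match(line)
        if match:
            counts[match.group("error").strip()] += 1
    return counts


for message, count in count_errors(REPORT).most_common():
    print(count, message)
```

Grouping by the full message keeps distinct `ValueError` causes apart; messages that embed a request ID would each form their own group under this scheme.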

lhoestq · 2025-06-23T14:56:01Z

Alright I fixed them all except tests/trainer/test_trainer_utils.py::TrainerUtilsTest::test_label_smoothing which seems unrelated

lhoestq · 2025-06-23T15:00:41Z

tests/models/whisper/test_modeling_whisper.py

-        EXPECTED_TRANSCRIPTS = ["木村さんに電話を貸してもらいました", " Kimura-san called me."]
+        EXPECTED_TRANSCRIPTS = [
+            "夏の時期の時期でした",
+            " It was the time of day and all of the pens left during the summer.",


I updated this with samples from the same dataset (but for some reason couldn't get the same sample here)

Anyway FYI the ground truth is "It was the time of day when all of Spain slept during the summer.", from here: https://huggingface.co/datasets/hf-internal-testing/fixtures_common_voice/viewer/ja/test?row=0&views%5B%5D=test

ydshieh · 2025-06-23

So for inputs, we get different samples than before, so we need to update the expected outputs, right?

If so, good for me.

lhoestq · 2025-06-24

Yes, the EXPECTED_TRANSCRIPTS updated here are the new outputs :)

The input update is a few lines above, in load_dataset.
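The point discussed in this thread, that a changed sample order changes the test inputs and therefore the expected outputs, can be illustrated with a small sketch. It uses plain dictionaries rather than a real dataset, and the `id` and `text` fields as well as the `select_by_id` helper are hypothetical.

```python
def select_by_id(rows: list[dict], wanted_ids: list[str], key: str = "id") -> list[dict]:
    """Pick rows by a stable identifier instead of by position."""
    by_id = {row[key]: row for row in rows}
    return [by_id[wanted] for wanted in wanted_ids]


# The same two samples, stored in a different order (as could happen after a
# dataset is re-exported in another format).
old_order = [{"id": "a", "text": "first"}, {"id": "b", "text": "second"}]
new_order = list(reversed(old_order))

# Positional access now returns a different sample...
assert old_order[0] != new_order[0]
# ...while selection by identifier still returns the same one.
assert select_by_id(old_order, ["a"]) == select_by_id(new_order, ["a"])
```

When the original sample cannot be recovered, updating the expected outputs to match the new inputs, as done in this PR, is the remaining option.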

lhoestq · 2025-06-23T15:10:01Z

(just fixed the 2 remaining ones in the CI)

ydshieh · 2025-06-23T15:54:55Z

Hi, I am running on a T4 CI runner and get the following 9 failures. Let me know if you have any doubts about them.

FAILED tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_1 - ValueError: Unsupported number of image dimensions: 2

FAILED tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_2 - ValueError: Unsupported number of image dimensions: 2

FAILED tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_3 - ValueError: Unsupported number of image dimensions: 2

FAILED tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_4 - ValueError: Unsupported number of image dimensions: 2

FAILED tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_5 - ValueError: Unsupported number of image dimensions: 2

FAILED tests/models/udop/test_processor_udop.py::UdopProcessorTest::test_overflowing_tokens - KeyError: "Column image_path not in the dataset. Current columns in the dataset: ['id', 'words', 'bboxes', 'ner_tags', 'image']"

FAILED tests/models/vision_encoder_decoder/test_modeling_vision_encoder_decoder.py::TrOCRModelIntegrationTest::test_inference_handwritten - ValueError: Unknown split "test". Should be one of ['train'].

FAILED tests/models/vision_encoder_decoder/test_modeling_vision_encoder_decoder.py::TrOCRModelIntegrationTest::test_inference_printed - ValueError: Unknown split "test". Should be one of ['train'].
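The `image_path` and `"test"` split failures above both report a name that no longer exists next to the names that do. A minimal sketch for diagnosing this kind of mismatch is shown below; `pick_first_available` is a hypothetical helper, and the lists are copied from the error messages rather than read from a real dataset (with the `datasets` library, the column list would presumably come from the dataset's `column_names`).

```python
def pick_first_available(preferred: list[str], available: list[str]) -> str:
    """Return the first preferred name that exists in `available`."""
    for name in preferred:
        if name in available:
            return name
    raise KeyError(f"None of {preferred} found in {available}")


# Values taken from the error messages above.
columns = ["id", "words", "bboxes", "ner_tags", "image"]
splits = ["train"]

image_column = pick_first_available(["image_path", "image"], columns)
split = pick_first_available(["test", "train"], splits)
print(image_column, split)  # image train
```

In a test it is usually clearer to hard-code the new name once it is known; a fallback like this is mainly useful while working out what changed.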

ydshieh · 2025-06-23T15:57:46Z

tests/trainer/test_trainer_utils.py::TrainerUtilsTest::test_label_smoothing

this one is passing now. (so likely irrelevant to your changes indeed)

ydshieh · 2025-06-23T16:04:43Z

For

tests/models/granite_speech/test_modeling_granite_speech.py::GraniteSpeechForConditionalGenerationIntegrationTest::test_small_model_integration_test_batch - AssertionError: Lists differ: ["sys[202 chars]stantmister quilter is the apostle of the midd[322 chars]ter"] != ["sys[202 chars]stantnor is mister quilp's manner less interes[320 chars]pel"]

it uses

hf-internal-testing/librispeech_asr_dummy

and I think you didn't touch that one either.

So OK, this one is irrelevant.

lhoestq · 2025-06-24T17:37:22Z

I fixed all the ones you mentioned, including tests/models/granite_speech/test_modeling_granite_speech.py::GraniteSpeechForConditionalGenerationIntegrationTest::test_small_model_integration_test_batch

ydshieh · 2025-06-24T20:24:39Z

Thanks a lot for the improvement, and for your patience with the tests.

On T4, all passing (after my last commit). I will update 2 (or 4) expected outputs for A10 tomorrow then merge.

lhoestq added 3 commits June 20, 2025 13:00

remove trust_remote_code

b7ec09c

again

e6093de

Revert "Skip some tests for now (#38931)"

3dfebf2

This reverts commit 31d30b7.

lhoestq force-pushed the remove-script-datasets-in-tests branch from e7b9c4d to 3dfebf2 on June 20, 2025 11:00

lhoestq added 3 commits June 20, 2025 13:19

again

1fdb9f3

style

69419a4

again

0054598

Rocketknight1 approved these changes Jun 20, 2025


ydshieh approved these changes Jun 20, 2025


again

e2ed15c

style

d188134

lhoestq marked this pull request as ready for review June 20, 2025 15:55

fix integration test

9b2afaf

ydshieh added a commit that referenced this pull request Jun 20, 2025

trigger for Remove script datasets in tests #38940

ca57c95

ydshieh added a commit that referenced this pull request Jun 20, 2025

trigger for Remove script datasets in tests #38940

6d38d27

lhoestq added 2 commits June 23, 2025 16:53

fix tests

9db4ddb

style

f884fa5

lhoestq commented Jun 23, 2025


fix

f6c540d

fix

52f2dbe

lhoestq added 3 commits June 24, 2025 16:51

fix the last ones

d48c569

style

3b99fd2

last one

6cf0a52

fix last

9298622
