Remove script datasets in tests #38940

Open
wants to merge 17 commits into
base: main

Conversation

lhoestq
Member

@lhoestq lhoestq commented Jun 20, 2025

...and remove the tests that were skipped in #38931

@lhoestq lhoestq force-pushed the remove-script-datasets-in-tests branch from e7b9c4d to 3dfebf2 Compare June 20, 2025 11:00
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@Rocketknight1 Rocketknight1 left a comment

cc @ydshieh !

Collaborator

@ydshieh ydshieh left a comment

Should I trigger CI runs for these touched test files?

I prefer to see failures (if any) today instead of tomorrow 😅

@lhoestq
Member Author

lhoestq commented Jun 20, 2025

Sure! Btw, I also found some tests failing because of jiwer 4.0, which came out today, but I'll publish a patch release of evaluate later today, so you can ignore them

@lhoestq lhoestq marked this pull request as ready for review June 20, 2025 15:55
@ydshieh
Collaborator

ydshieh commented Jun 20, 2025

(I just saw your comment about jiwer, thanks!)

There is

FAILED tests/models/layoutlmv3/test_image_processing_layoutlmv3.py::LayoutLMv3ImageProcessingTest::test_LayoutLMv3_integration_test - ValueError: Unsupported number of image dimensions: 2

but on main it fails with

tests/models/layoutlmv3/test_image_processing_layoutlmv3.py::LayoutLMv3ImageProcessingTest::test_LayoutLMv3_integration_test - KeyError: 'file'

but I believe it was passing over the past few days. @lhoestq Could you take a look next week?

The 3 failing tests seem unrelated to this PR, and I checked that they pass on the CI runners.

I will merge so we can see the effect of this PR and check whether there is other stuff to fix next week.

Thank you!

@lhoestq
Member Author

lhoestq commented Jun 20, 2025

ok ! And I just released evaluate 0.4.4 btw :)

@lhoestq
Member Author

lhoestq commented Jun 20, 2025

and I also fixed test_LayoutLMv3_integration_test ! feel free to merge

ydshieh added a commit that referenced this pull request Jun 20, 2025
@ydshieh
Collaborator

ydshieh commented Jun 20, 2025

ok!

ydshieh added a commit that referenced this pull request Jun 20, 2025
@ydshieh
Collaborator

ydshieh commented Jun 20, 2025

run-slow: beit, dpt, granite_speech, layoutlmv2, layoutlmv3, layoutxlm, mobilevit, nougat, segformer, udop, upernet

Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/beit', 'models/dpt', 'models/granite_speech', 'models/layoutlmv2', 'models/layoutlmv3', 'models/layoutxlm', 'models/mobilevit', 'models/nougat', 'models/segformer', 'models/udop', 'models/upernet']
quantizations: [] ...
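
As an illustration of the mapping visible in the two comments above (a `run-slow:` line becoming a list of `models/...` jobs), here is a minimal sketch. The function name `parse_run_slow` is hypothetical, and this is not the CI bot's actual implementation, only a guess at the simplest logic that reproduces the output shown.

```python
def parse_run_slow(comment: str) -> list[str]:
    """Turn 'run-slow: a, b' into ['models/a', 'models/b'].

    Returns an empty list when the comment has no run-slow directive.
    Hypothetical helper, written only to mirror the bot output above.
    """
    prefix = "run-slow:"
    for line in comment.splitlines():
        line = line.strip()
        if line.lower().startswith(prefix):
            # Everything after the prefix is a comma-separated list of model names.
            names = line[len(prefix):].split(",")
            return [f"models/{name.strip()}" for name in names if name.strip()]
    return []


jobs = parse_run_slow("run-slow: beit, dpt, granite_speech")
print(jobs)  # ['models/beit', 'models/dpt', 'models/granite_speech']
```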

@ydshieh
Collaborator

ydshieh commented Jun 20, 2025

There are still some failing tests that are relevant (I believe).

Examples:

FAILED tests/models/layoutlmv2/test_processor_layoutlmv2.py::LayoutLMv2ProcessorTest::test_overflowing_tokens - FileNotFoundError: [Errno 2] No such file or directory: '/Users/quentinlhoest/.cache/huggingface/datasets/downloads/extracted/35aecbb2e1ba08d57652cba29ac16f2f4b257260d21662922d6eb772ff4b9be1/dataset/training_data/images/0000971160.png'

FAILED tests/models/layoutlmv2/test_processor_layoutlmv2.py::LayoutLMv2ProcessorIntegrationTests::test_processor_case_1 - ValueError: Unsupported number of image dimensions: 2

FAILED tests/models/beit/test_modeling_beit.py::BeitModelIntegrationTest::test_inference_semantic_segmentation - KeyError: 'file'

FAILED tests/models/beit/test_modeling_beit.py::BeitModelIntegrationTest::test_post_processing_semantic_segmentation - KeyError: 'file'

Let's not merge and work on this next week

@ydshieh
Collaborator

ydshieh commented Jun 23, 2025

@lhoestq Here is the list of failures I collected from the daily CI that I believe are relevant. Could you take a look please 🙏 ?

For some tests, maybe the dataset order or the format has changed?

single
tests/models/beit/test_modeling_beit.py::BeitModelIntegrationTest::test_inference_semantic_segmentation
(line 508)  KeyError: 'file'
--------------------------------------------------------------------------------
single
tests/models/beit/test_modeling_beit.py::BeitModelIntegrationTest::test_post_processing_semantic_segmentation
(line 551)  KeyError: 'file'
--------------------------------------------------------------------------------
single
tests/models/granite_speech/test_modeling_granite_speech.py::GraniteSpeechForConditionalGenerationIntegrationTest::test_small_model_integration_test_batch
(line 675)  AssertionError: Lists differ: ["sys[512 chars]r is mister quilter's manner less interesting than his matter"] != ["sys[512 chars]r is mister quilp's manner less interesting than his matter"]
--------------------------------------------------------------------------------
single
tests/models/layoutlmv2/test_processor_layoutlmv2.py::LayoutLMv2ProcessorTest::test_overflowing_tokens
(line 3505)  FileNotFoundError: [Errno 2] No such file or directory: '/Users/quentinlhoest/.cache/huggingface/datasets/downloads/extracted/35aecbb2e1ba08d57652cba29ac16f2f4b257260d21662922d6eb772ff4b9be1/dataset/training_data/images/0000971160.png'
--------------------------------------------------------------------------------
single
tests/models/layoutlmv2/test_processor_layoutlmv2.py::LayoutLMv2ProcessorIntegrationTests::test_processor_case_2
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutlmv2/test_processor_layoutlmv2.py::LayoutLMv2ProcessorIntegrationTests::test_processor_case_3
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_1
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_2
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_3
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_4
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_5
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutxlm/test_processor_layoutxlm.py::LayoutXLMProcessorTest::test_overflowing_tokens
(line 3505)  FileNotFoundError: [Errno 2] No such file or directory: '/Users/quentinlhoest/.cache/huggingface/datasets/downloads/extracted/35aecbb2e1ba08d57652cba29ac16f2f4b257260d21662922d6eb772ff4b9be1/dataset/training_data/images/0000971160.png'
--------------------------------------------------------------------------------
single
tests/models/layoutxlm/test_processor_layoutxlm.py::LayoutXLMProcessorIntegrationTests::test_processor_case_1
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutxlm/test_processor_layoutxlm.py::LayoutXLMProcessorIntegrationTests::test_processor_case_2
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutxlm/test_processor_layoutxlm.py::LayoutXLMProcessorIntegrationTests::test_processor_case_3
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutxlm/test_processor_layoutxlm.py::LayoutXLMProcessorIntegrationTests::test_processor_case_4
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/layoutxlm/test_processor_layoutxlm.py::LayoutXLMProcessorIntegrationTests::test_processor_case_5
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/udop/test_modeling_udop.py::UdopModelIntegrationTests::test_conditional_generation
(line 420)  huggingface_hub.errors.EntryNotFoundError: 404 Client Error. (Request ID: Root=1-6855b86a-62ada26a5deba74073ee7878;ac013a8a-3c75-405f-aec4-c0e3a8c22767)
--------------------------------------------------------------------------------
single
tests/models/udop/test_processor_udop.py::UdopProcessorTest::test_overflowing_tokens
(line 3505)  FileNotFoundError: [Errno 2] No such file or directory: '/Users/quentinlhoest/.cache/huggingface/datasets/downloads/extracted/35aecbb2e1ba08d57652cba29ac16f2f4b257260d21662922d6eb772ff4b9be1/dataset/training_data/images/0000971160.png'
--------------------------------------------------------------------------------
single
tests/models/udop/test_processor_udop.py::UdopProcessorIntegrationTests::test_processor_case_1
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/udop/test_processor_udop.py::UdopProcessorIntegrationTests::test_processor_case_2
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/udop/test_processor_udop.py::UdopProcessorIntegrationTests::test_processor_case_3
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/udop/test_processor_udop.py::UdopProcessorIntegrationTests::test_processor_case_4
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/udop/test_processor_udop.py::UdopProcessorIntegrationTests::test_processor_case_5
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------------------------------------------------------------------------------
single
tests/models/upernet/test_modeling_upernet.py::UperNetModelIntegrationTest::test_inference_convnext_backbone
(line 420)  huggingface_hub.errors.EntryNotFoundError: 404 Client Error. (Request ID: Root=1-6855b8d7-1b8bdf0c2a7a598823cdce04;f28ca804-7b6e-41f0-bcb5-78a25849d239)
--------------------------------------------------------------------------------
single
tests/models/upernet/test_modeling_upernet.py::UperNetModelIntegrationTest::test_inference_swin_backbone
(line 420)  huggingface_hub.errors.EntryNotFoundError: 404 Client Error. (Request ID: Root=1-6855b8d9-6c0cbe2e2b685a0f63b8f2f1;d88ac8fe-e7e5-42ea-ae76-87b382f45534)
--------------------------------------------------------------------------------
single
tests/models/vision_encoder_decoder/test_modeling_vision_encoder_decoder.py::TrOCRModelIntegrationTest::test_inference_handwritten
(line 1171)  AssertionError: Tensor-likes are not close!
--------------------------------------------------------------------------------
single
tests/models/vision_encoder_decoder/test_modeling_vision_encoder_decoder.py::TrOCRModelIntegrationTest::test_inference_printed
(line 437)  ValueError: Unknown split "test". Should be one of ['train'].
--------------------------------------------------------------------------------
single
tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_batched_generation_multilingual
(line 675)  AssertionError: Lists differ: ['夏の時期の時期でした', ' It was the time of day and[37 chars]er.'] != ['木村さんに電話を貸してもらいました', ' Kimura-san called me.']
--------------------------------------------------------------------------------
single
tests/pipelines/test_pipelines_image_segmentation.py::ImageSegmentationPipelineTests::test_maskformer
(line 605)  KeyError: 'file'
--------------------------------------------------------------------------------
single
tests/pipelines/test_pipelines_image_segmentation.py::ImageSegmentationPipelineTests::test_oneformer
(line 659)  KeyError: 'file'
--------------------------------------------------------------------------------
single
tests/trainer/test_trainer_utils.py::TrainerUtilsTest::test_label_smoothing
(line 529)  ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?)
--------------------------------------------------------------------------------
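
Most entries in the report above share a handful of error types, which is easier to see once they are counted. The following is a rough sketch that tallies exception names from a report in this layout; the regular expression and the name `count_errors` are assumptions made for illustration, and the sample is a short excerpt of the entries listed above.

```python
import re
from collections import Counter

# Matches lines such as "(line 319)  ValueError: ..." and captures the
# (possibly dotted) exception name. The pattern is an assumption based on
# the layout of the report above.
ENTRY = re.compile(r"^\(line \d+\)\s+([\w.]+):", re.MULTILINE)


def count_errors(report: str) -> Counter:
    """Count how many report entries raised each exception type."""
    return Counter(ENTRY.findall(report))


sample = """single
tests/models/beit/test_modeling_beit.py::BeitModelIntegrationTest::test_inference_semantic_segmentation
(line 508)  KeyError: 'file'
--------
single
tests/models/layoutlmv2/test_processor_layoutlmv2.py::LayoutLMv2ProcessorIntegrationTests::test_processor_case_2
(line 319)  ValueError: Unsupported number of image dimensions: 2
--------
single
tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_1
(line 319)  ValueError: Unsupported number of image dimensions: 2
"""

for name, count in count_errors(sample).most_common():
    print(name, count)
```

Grouping this way separates the failures that point at a changed dataset format (`KeyError: 'file'`, the image dimension errors) from those that may have other causes.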

@lhoestq
Member Author

lhoestq commented Jun 23, 2025

Alright, I fixed them all except tests/trainer/test_trainer_utils.py::TrainerUtilsTest::test_label_smoothing, which seems unrelated

- EXPECTED_TRANSCRIPTS = ["木村さんに電話を貸してもらいました", " Kimura-san called me."]
+ EXPECTED_TRANSCRIPTS = [
+     "夏の時期の時期でした",
+     " It was the time of day and all of the pens left during the summer.",
Member Author

I updated this with samples from the same dataset (but for some reason couldn't get the same sample here)

Anyway FYI the ground truth is "It was the time of day when all of Spain slept during the summer.", from here: https://huggingface.co/datasets/hf-internal-testing/fixtures_common_voice/viewer/ja/test?row=0&views%5B%5D=test

Collaborator

So for inputs, we get different samples than before, so we need to update the expected outputs, right?

If so, good for me

Member Author

yes, the EXPECTED_TRANSCRIPTS updated here are the new outputs :)

the input update is a few lines above, in load_dataset

@lhoestq
Member Author

lhoestq commented Jun 23, 2025

(just fixed the 2 remaining ones in the CI)

@ydshieh
Collaborator

ydshieh commented Jun 23, 2025

Hi, I am running on a T4 CI runner and get the following 9 failures. Let me know if you have any doubts about them.

FAILED tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_1 - ValueError: Unsupported number of image dimensions: 2

FAILED tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_2 - ValueError: Unsupported number of image dimensions: 2

FAILED tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_3 - ValueError: Unsupported number of image dimensions: 2

FAILED tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_4 - ValueError: Unsupported number of image dimensions: 2

FAILED tests/models/layoutlmv3/test_processor_layoutlmv3.py::LayoutLMv3ProcessorIntegrationTests::test_processor_case_5 - ValueError: Unsupported number of image dimensions: 2

FAILED tests/models/udop/test_processor_udop.py::UdopProcessorTest::test_overflowing_tokens - KeyError: "Column image_path not in the dataset. Current columns in the dataset: ['id', 'words', 'bboxes', 'ner_tags', 'image']"

FAILED tests/models/vision_encoder_decoder/test_modeling_vision_encoder_decoder.py::TrOCRModelIntegrationTest::test_inference_handwritten - ValueError: Unknown split "test". Should be one of ['train'].

FAILED tests/models/vision_encoder_decoder/test_modeling_vision_encoder_decoder.py::TrOCRModelIntegrationTest::test_inference_printed - ValueError: Unknown split "test". Should be one of ['train'].
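
The last three failures above are schema mismatches: a column (`image_path`) or a split (`test`) that the test expects is no longer there. A small guard like the one sketched below would report both kinds of mismatch in a single message. It works on plain Python lists rather than a real `datasets` object, and `check_schema` is a hypothetical helper, not part of any library or of this PR.

```python
def check_schema(available_splits, columns, split, required_columns):
    """Raise one explicit error if the requested split or columns are missing.

    Hypothetical helper for illustration; the arguments are plain lists of names.
    """
    problems = []
    if split not in available_splits:
        problems.append(f"split {split!r} not in {sorted(available_splits)}")
    missing = [c for c in required_columns if c not in columns]
    if missing:
        problems.append(f"missing columns {missing}; available: {list(columns)}")
    if problems:
        raise ValueError("; ".join(problems))


# Split and column names are taken from the failure messages above.
try:
    check_schema(
        available_splits=["train"],
        columns=["id", "words", "bboxes", "ner_tags", "image"],
        split="test",
        required_columns=["image_path"],
    )
except ValueError as err:
    print(err)
```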


@ydshieh
Collaborator

ydshieh commented Jun 23, 2025

tests/trainer/test_trainer_utils.py::TrainerUtilsTest::test_label_smoothing

this one is passing now. (so likely irrelevant to your changes indeed)

@ydshieh
Collaborator

ydshieh commented Jun 23, 2025

For

tests/models/granite_speech/test_modeling_granite_speech.py::GraniteSpeechForConditionalGenerationIntegrationTest::test_small_model_integration_test_batch - AssertionError: Lists differ: ["sys[202 chars]stantmister quilter is the apostle of the midd[322 chars]ter"] != ["sys[202 chars]stantnor is mister quilp's manner less interes[320 chars]pel"]

it uses

hf-internal-testing/librispeech_asr_dummy

and I think you didn't touch that one either.

So OK, this one is unrelated.

@lhoestq
Member Author

lhoestq commented Jun 24, 2025

I fixed all the ones you mentioned, including tests/models/granite_speech/test_modeling_granite_speech.py::GraniteSpeechForConditionalGenerationIntegrationTest::test_small_model_integration_test_batch

@ydshieh
Collaborator

ydshieh commented Jun 24, 2025

Thanks a lot for the improvement, and for your patience with the tests.

On T4, everything passes (after my last commit). I will update 2 (or 4) expected outputs for A10 tomorrow, then merge.

4 participants