Error Downloading Classification Task Data #5

Closed
kmdalton opened this issue Feb 10, 2025 · 2 comments

@kmdalton
Contributor

When running python -m pytorch_fob.dataset_setup benchmark.yaml for the classification task with the following config file, benchmark.yaml:

task:
  - classification
optimizer:
  - name: adamw_baseline
    lr: 1.e-2
    weight_decay: 0.1
engine:
  seed: 42

I run into the following error:

(fob) [kmdalton@sdfiana027 kmdalton]$ python -m pytorch_fob.dataset_setup benchmark.yaml
[2025-02-10 13:00:18,131] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[FOB INFO] Setting up data for task 'classification'...
2025-02-10 13:00:22.738465: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-02-10 13:00:22.738512: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-02-10 13:00:22.739487: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-10 13:00:22.744577: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-02-10 13:00:24.810116: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2025-02-10 13:00:27.293841: W external/local_tsl/tsl/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "NOT_FOUND: Could not locate the credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata.google.internal".
Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to /lscratch/kmdalton/data/classification/imagenet_resized/64x64/0.1.0...
Dl Size...: 100%|████████████████████████████████████| 13439/13439 [04:26<00:00, 50.45 MiB/s]
Dl Completed...: 100%|███████████████████████████████████████| 3/3 [04:26<00:00, 88.80s/ url]
Generating splits...:   0%|                                       | 0/2 [00:00<?, ? splits/s]
Generating train examples...: 144174 examples [02:23, 1119.35 examples/s]
Generating train examples...: 145336 examples [02:24, 1131.61 examples/s]
Generating train examples...: 385500 examples [06:15, 361.05 examples/s]
Dataset imagenet_resized downloaded and prepared to /lscratch/kmdalton/data/classification/imagenet_resized/64x64/0.1.0. Subsequent calls will reuse this data.
Traceback (most recent call last):
  File "/sdf/home/k/kmdalton/prjlumine22/results/kmdalton/wadam/anaconda/envs/fob/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/sdf/home/k/kmdalton/prjlumine22/results/kmdalton/wadam/anaconda/envs/fob/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/sdf/data/lcls/ds/prj/prjlumine22/results/kmdalton/wadam/FOB/pytorch_fob/dataset_setup.py", line 22, in <module>
    main(args, extra_args)
  File "/sdf/data/lcls/ds/prj/prjlumine22/results/kmdalton/wadam/FOB/pytorch_fob/dataset_setup.py", line 16, in main
    engine.prepare_data()
  File "/sdf/data/lcls/ds/prj/prjlumine22/results/kmdalton/wadam/FOB/pytorch_fob/engine/engine.py", line 101, in prepare_data
    run.get_datamodule().prepare_data()
  File "/sdf/data/lcls/ds/prj/prjlumine22/results/kmdalton/wadam/FOB/pytorch_fob/tasks/classification/data.py", line 76, in prepare_data
    tfds.data_source("imagenet_resized/64x64", data_dir=self.data_dir, download=True)
  File "/sdf/home/k/kmdalton/prjlumine22/results/kmdalton/wadam/anaconda/envs/fob/lib/python3.10/site-packages/tensorflow_datasets/core/logging/__init__.py", line 176, in __call__
    return function(*args, **kwargs)
  File "/sdf/home/k/kmdalton/prjlumine22/results/kmdalton/wadam/anaconda/envs/fob/lib/python3.10/site-packages/tensorflow_datasets/core/load.py", line 829, in data_source
    return dbuilder.as_data_source(
  File "/sdf/home/k/kmdalton/prjlumine22/results/kmdalton/wadam/anaconda/envs/fob/lib/python3.10/site-packages/tensorflow_datasets/core/logging/__init__.py", line 176, in __call__
    return function(*args, **kwargs)
  File "/sdf/home/k/kmdalton/prjlumine22/results/kmdalton/wadam/anaconda/envs/fob/lib/python3.10/site-packages/tensorflow_datasets/core/dataset_builder.py", line 882, in as_data_source
    raise NotImplementedError(unsupported_format_msg)
NotImplementedError: Random access data source for file format FileFormat.TFRECORD is not supported. Possible root causes:
        * You have to run download_and_prepare with file_format=array_record or parquet.
        * The dataset is already prepared at /lscratch/kmdalton/data/classification/imagenet_resized/64x64/0.1.0 in the FileFormat.TFRECORD format. Either choose another data_dir or delete the data.
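
For reference, the error message seems to point at the TFDS file format rather than at FOB itself. A rough sketch of what it suggests, using the plain tensorflow-datasets builder API (the file_format argument and paths are assumed from the log above and the TFDS docs, not taken from FOB's code), would be:

import tensorflow_datasets as tfds

# Sketch only: prepare the dataset in a random-access format (array_record)
# instead of the tfrecord default that triggers the NotImplementedError above.
builder = tfds.builder(
    "imagenet_resized/64x64",
    data_dir="/lscratch/kmdalton/data/classification",  # data_dir from the log above
    file_format="array_record",
)
builder.download_and_prepare()
ds = builder.as_data_source(split="train")  # random access should now work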

Conda environment info:

fob_env.txt

@kmdalton changed the title from "Error Downloading Classification Benchmark Data" to "Error Downloading Classification Task Data" on Feb 10, 2025
@simonblauth
Collaborator

I was able to reproduce this error with your environment settings. Apparently tensorflow-datasets changed its default file format from array_record to tfrecord in v4.9.7. Unfortunately, setting the file format manually results in another error.
So until I figure out a proper fix, you can try the following workaround (a quick check to confirm it worked is sketched after the steps):

  1. Downgrade tfds: pip install tensorflow-datasets==4.9.6
  2. Delete the generated splits: rm -r <data_dir>/classification/imagenet_resized (you can keep the downloaded data)
  3. Regenerate the splits: python -m pytorch_fob.dataset_setup benchmark.yaml
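
After step 3, a quick sanity check (just a sketch using the same tfds.data_source call FOB makes internally; the data_dir is the one from your log) would be something like:

import tensorflow_datasets as tfds

# Sketch only: with the splits regenerated under tensorflow-datasets==4.9.6,
# this should return a random-access data source instead of raising the
# NotImplementedError from the report above.
sources = tfds.data_source(
    "imagenet_resized/64x64",
    data_dir="/lscratch/kmdalton/data/classification",
    download=False,  # data was already prepared in step 3
)
print(len(sources["train"]))  # number of training examples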

Let me know if that works.

@kmdalton
Contributor Author

Yes, this workaround seems to work. Thanks!
