Error Downloading Classification Task Data #5

Closed
kmdalton opened this issue Feb 10, 2025 · 2 comments

@kmdalton
Contributor

When running python -m pytorch_fob.dataset_setup benchmark.yaml for the classification task with the following config file, benchmark.yaml:

task:
  - classification
optimizer:
  - name: adamw_baseline
    lr: 1.e-2
    weight_decay: 0.1
engine:
  seed: 42

I run into the following error:

(fob) [kmdalton@sdfiana027 kmdalton]$ python -m pytorch_fob.dataset_setup benchmark.yaml
[2025-02-10 13:00:18,131] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[FOB INFO] Setting up data for task 'classification'...
2025-02-10 13:00:22.738465: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-02-10 13:00:22.738512: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-02-10 13:00:22.739487: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-10 13:00:22.744577: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-02-10 13:00:24.810116: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2025-02-10 13:00:27.293841: W external/local_tsl/tsl/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "NOT_FOUND: Could not locate the credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata.google.internal".
Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to /lscratch/kmdalton/data/classification/imagenet_resized/64x64/0.1.0...
Dl Size...: 100%|████████████████████████████████████| 13439/13439 [04:26<00:00, 50.45 MiB/s]
Dl Completed...: 100%|███████████████████████████████████████| 3/3 [04:26<00:00, 88.80s/ url]
Generating splits...:   0%|                                       | 0/2 [00:00<?, ? splits/s]
Generating train examples...: 144174 examples [02:23, 1119.35 examples/s]
Generating train examples...: 145336 examples [02:24, 1131.61 examples/s]
Generating train examples...: 385500 examples [06:15, 361.05 examples/s]
Dataset imagenet_resized downloaded and prepared to /lscratch/kmdalton/data/classification/imagenet_resized/64x64/0.1.0. Subsequent calls will reuse this data.
Traceback (most recent call last):
  File "/sdf/home/k/kmdalton/prjlumine22/results/kmdalton/wadam/anaconda/envs/fob/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/sdf/home/k/kmdalton/prjlumine22/results/kmdalton/wadam/anaconda/envs/fob/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/sdf/data/lcls/ds/prj/prjlumine22/results/kmdalton/wadam/FOB/pytorch_fob/dataset_setup.py", line 22, in <module>
    main(args, extra_args)
  File "/sdf/data/lcls/ds/prj/prjlumine22/results/kmdalton/wadam/FOB/pytorch_fob/dataset_setup.py", line 16, in main
    engine.prepare_data()
  File "/sdf/data/lcls/ds/prj/prjlumine22/results/kmdalton/wadam/FOB/pytorch_fob/engine/engine.py", line 101, in prepare_data
    run.get_datamodule().prepare_data()
  File "/sdf/data/lcls/ds/prj/prjlumine22/results/kmdalton/wadam/FOB/pytorch_fob/tasks/classification/data.py", line 76, in prepare_data
    tfds.data_source("imagenet_resized/64x64", data_dir=self.data_dir, download=True)
  File "/sdf/home/k/kmdalton/prjlumine22/results/kmdalton/wadam/anaconda/envs/fob/lib/python3.10/site-packages/tensorflow_datasets/core/logging/__init__.py", line 176, in __call__
    return function(*args, **kwargs)
  File "/sdf/home/k/kmdalton/prjlumine22/results/kmdalton/wadam/anaconda/envs/fob/lib/python3.10/site-packages/tensorflow_datasets/core/load.py", line 829, in data_source
    return dbuilder.as_data_source(
  File "/sdf/home/k/kmdalton/prjlumine22/results/kmdalton/wadam/anaconda/envs/fob/lib/python3.10/site-packages/tensorflow_datasets/core/logging/__init__.py", line 176, in __call__
    return function(*args, **kwargs)
  File "/sdf/home/k/kmdalton/prjlumine22/results/kmdalton/wadam/anaconda/envs/fob/lib/python3.10/site-packages/tensorflow_datasets/core/dataset_builder.py", line 882, in as_data_source
    raise NotImplementedError(unsupported_format_msg)
NotImplementedError: Random access data source for file format FileFormat.TFRECORD is not supported. Possible root causes:
        * You have to run download_and_prepare with file_format=array_record or parquet.
        * The dataset is already prepared at /lscratch/kmdalton/data/classification/imagenet_resized/64x64/0.1.0 in the FileFormat.TFRECORD format. Either choose another data_dir or delete the data.
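
For reference, the error message seems to point at the TFDS file format rather than at FOB itself. A rough sketch of what it suggests, using the plain tensorflow-datasets builder API (the file_format argument and paths are assumed from the log above and the TFDS docs, not taken from FOB's code), would be:

import tensorflow_datasets as tfds

# Sketch only: prepare the dataset in a random-access format (array_record)
# instead of the tfrecord default that triggers the NotImplementedError above.
builder = tfds.builder(
    "imagenet_resized/64x64",
    data_dir="/lscratch/kmdalton/data/classification",  # data_dir from the log above
    file_format="array_record",
)
builder.download_and_prepare()
ds = builder.as_data_source(split="train")  # random access should now work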

Conda environment info:

fob_env.txt

@kmdalton changed the title from "Error Downloading Classification Benchmark Data" to "Error Downloading Classification Task Data" on Feb 10, 2025
@simonblauth
Collaborator

I was able to reproduce this error with your environment settings. Apparently tensorflow-datasets changed its default file format from array_record to tfrecord in v4.9.7. Unfortunately, setting the file format manually results in another error.
So until I figure out a proper fix, you can try the following workaround (a quick check to confirm it worked is sketched after the steps):

  1. Downgrade tfds: pip install tensorflow-datasets==4.9.6
  2. Delete the generated splits: rm -r <data_dir>/classification/imagenet_resized (you can keep the downloaded data)
  3. Regenerate the splits: python -m pytorch_fob.dataset_setup benchmark.yaml
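
After step 3, a quick sanity check (just a sketch using the same tfds.data_source call FOB makes internally; the data_dir is the one from your log) would be something like:

import tensorflow_datasets as tfds

# Sketch only: with the splits regenerated under tensorflow-datasets==4.9.6,
# this should return a random-access data source instead of raising the
# NotImplementedError from the report above.
sources = tfds.data_source(
    "imagenet_resized/64x64",
    data_dir="/lscratch/kmdalton/data/classification",
    download=False,  # data was already prepared in step 3
)
print(len(sources["train"]))  # number of training examples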

Let me know if that works.

@kmdalton
Contributor Author

Yes, this workaround seems to work. Thanks!
