FITS-HDF5-VOL

A terminal HDF5 Virtual Object Layer (VOL) connector that lets unmodified HDF5 applications read FITS files through the standard HDF5 API. The library / connector is named fits-hdf5-vol; this repository (grc-iit/FITS-HDF5-VOL) is its source home.

Status: v1.0.0-rc — M1 through M6 complete. Read-only. 162 ctest cases pass: synthetic fixtures, the astropy public test corpus, the NRAO ftt4b corpus, and 97 real open-access astronomy files (SkyView images + VizieR catalogs). The C-level subset is also clean under -fsanitize=address,undefined with leak detection enabled. The same libfits_hdf5_vol.so runs unchanged on HDF5 1.14.x and 2.1.x. A permanent VOL connector value from The HDF Group is pending; the current 510 is provisional and may change before the v1.0.0 tag.

Target HDF5: ≥ 1.14.3 (built and validated against 2.1.x). Hard dep: CFITSIO ≥ 4.0 (system install via pkg-config). License: see LICENSE.

Quick start

# 1. Install deps (Ubuntu / Debian; HDF5 ≥ 1.14.3 from your distro is fine).
sudo apt install libcfitsio-dev libhdf5-dev cmake gcc

# 2. Build.
cmake -S . -B build
cmake --build build -j$(nproc)

# 3. Point HDF5 at the connector and read any FITS file with stock tools.
export HDF5_PLUGIN_PATH=$PWD/build
export HDF5_VOL_CONNECTOR=fits
h5ls -r path/to/any.fits        # FITS now looks like HDF5

If your distro HDF5 is too old, see Building HDF5 from source.

Capabilities at a glance

What an unmodified HDF5 app can do today against any FITS file:

  • H5Fopen (read-only). Random Groups files and non-FITS files are rejected cleanly.
  • Walk the HDU tree as /HDU0, /HDU1, … (H5Lvisit2, h5ls -r). EXTNAME keywords surface as soft links /<EXTNAME>/HDUn.
  • Read every header keyword as a typed HDF5 attribute (int / float / bool / string / complex / COMMENT / HISTORY / HIERARCH / CONTINUE, plus __raw_header__ byte-exact card array). H5Aiterate2, H5Aread on any keyword, TUNITn exposed as units on per-column datasets.
  • Read every image HDU's pixels: all BITPIX (8/16/32/64/-32/-64), the BZERO unsigned-int convention (uint16/32/64), general BSCALE/BZERO rescale (→ float64), hyperslab and point selections.
  • Read every table HDU (ASCII + binary). Per-column view at /HDUn/columns/<TTYPE> and a row-view compound at /HDUn/table. Variable-length columns map to H5T_VLEN; TDIMn multi-D cells map to H5T_ARRAY.
  • Tile-compressed image HDUs surface for introspection; H5Dread on them fails with a clear v2-deferred error.
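
The image-pixel rules in the list above can be summarised as a small lookup. The following is a minimal sketch, not the connector's code: the function name exposed_dtype and the type-name strings are hypothetical, and the fallback to float64 for any BSCALE/BZERO pair other than the three unsigned cases is an assumption drawn from the wording of the list.

```python
# Illustrative mapping only; it restates the rules listed above and is not
# taken from the connector's source.
STORAGE = {8: "uint8", 16: "int16", 32: "int32", 64: "int64",
           -32: "float32", -64: "float64"}
UNSIGNED = {16: "uint16", 32: "uint32", 64: "uint64"}


def exposed_dtype(bitpix, bscale=1.0, bzero=0.0):
    """Return the element type an application would see for an image HDU."""
    if bitpix not in STORAGE:
        raise ValueError(f"invalid BITPIX: {bitpix}")
    if bscale == 1 and bzero == 0:
        return STORAGE[bitpix]           # raw stored values, no rescale
    if bitpix in UNSIGNED and bscale == 1 and bzero == 2 ** (bitpix - 1):
        return UNSIGNED[bitpix]          # FITS unsigned-integer convention
    return "float64"                     # general BSCALE/BZERO rescale


print(exposed_dtype(16, 1.0, 32768.0))   # uint16
```

In this sketch a 16-bit image with BSCALE = 1 and BZERO = 32768 reports uint16, while any other scaling reports float64.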

What's deferred:

  • Writing FITS: out of scope for v1; fits-hdf5-vol is read-only by design. See tools/h5_to_fits.c for a separate one-way HDF5→FITS converter.
  • Vlen members inside the compound row view (per-column vlen still works).
  • Vlen-string columns (TFORM 'PA').
  • WCS interpretation (keywords surfaced verbatim; geometry is the app's job).

Build

Prerequisites

# Ubuntu / Debian:
sudo apt install libcfitsio-dev libhdf5-dev cmake gcc

Build fits-hdf5-vol

cmake -S . -B build
cmake --build build -j$(nproc)

Outputs: build/libfits_hdf5_vol.so plus the demo utilities (fits_to_h5, h5_to_fits, fits_compare), all under build/. The .c sources for the utilities live in tools/.

Run the test suite

ctest --test-dir build
# Expect: 100% tests passed, 65 / 65

This covers synthetic fixtures, the 5 sha256-pinned astropy corpus files, and the 16 NRAO ftt4b files (if present at ~/fits-tests/ftt4b). For the full 162-test run including real astronomy data, see Astronomy test corpus below.

Rebuild and rerun

After editing source files, rebuild and rerun in one step:

cmake --build build -j$(nproc) && ctest --test-dir build

# Show output from any failing tests:
cmake --build build -j$(nproc) && ctest --test-dir build --output-on-failure

# Run only the base suite (no astronomy data needed):
ctest --test-dir build -LE astro

# Run only the astronomy corpus:
ctest --test-dir build -L astro

If you changed CMakeLists.txt or added files, reconfigure first:

cmake -S . -B build && cmake --build build -j$(nproc)
ctest --test-dir build

Optional: Building HDF5 from source

Only needed if your distro HDF5 is older than 1.14.3, or you want to exercise fits-hdf5-vol against HDF5 2.1.x:

git clone https://github.com/HDFGroup/hdf5
git -C hdf5 checkout 2.1.1
cmake -S hdf5 -B hdf5/build \
    -DCMAKE_INSTALL_PREFIX=$HOME/opt/hdf5-2.1 \
    -DBUILD_SHARED_LIBS=ON \
    -DHDF5_BUILD_TOOLS=ON \
    -DBUILD_TESTING=OFF
cmake --build hdf5/build -j$(nproc)
cmake --install hdf5/build

# Then build fits-hdf5-vol against it:
cmake -S . -B build -DCMAKE_PREFIX_PATH=$HOME/opt/hdf5-2.1
cmake --build build -j$(nproc)
LD_LIBRARY_PATH=$HOME/opt/hdf5-2.1/lib ctest --test-dir build

Astronomy test corpus

The repository ships a download script that pulls real open-access FITS files from NASA SkyView and CDS VizieR. These files are not stored in git (binary blobs; ~60 MB total). Once downloaded, CMake picks them up automatically and adds 97 ctest cases labelled astro.

Step 1 — install Python dependencies

pip install requests astroquery

Step 2 — download the data

python3 tools/download_test_data.py

This downloads in one shot:

  • 75 multi-survey images from NASA SkyView (DSS optical, 2MASS J-band, ROSAT X-ray) covering 25 sky targets (Orion, Andromeda, Galactic Centre, LMC/SMC, …)
  • 14 VizieR binary table catalogs (Gaia DR3/DR2/DR1/eDR3, 2MASS PSC, AllWISE, Hipparcos, Tycho-2, SDSS DR12, TESS, UCAC4, NGC 2000, …)
  • 7 astropy public test files (HorseHead, L1448 cube, M13 images, Chandra events)

Files land in tests/astronomy_data/images/ and tests/astronomy_data/tables/. Re-running is safe: existing files are skipped.

Step 3 — reconfigure and run

CMake detects the directory at configure time:

cmake -S . -B build          # or re-run your existing cmake command
ctest --test-dir build
# Expect: 100% tests passed, 162 / 162

# Run only the astronomy corpus:
ctest --test-dir build -L astro

If tests/astronomy_data/ is absent, the 97 astro_* tests are simply not registered and the base suite (65 tests) runs unchanged.

Load the connector

There are two ways to load the connector. The environment-variable form needs no source changes to your app:

export HDF5_PLUGIN_PATH=$PWD/build
export HDF5_VOL_CONNECTOR=fits
# If you built HDF5 from source: export LD_LIBRARY_PATH=$HOME/opt/hdf5-2.1/lib

# Now any HDF5 program reads FITS as if it were HDF5:
h5ls -r some.fits
h5dump -A some.fits
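
Exporting the variables in a shell affects every HDF5 program started from it. If that is too broad, one option is to set them for a single child process only. The sketch below assumes h5ls is on PATH and that the connector was built into build/; the helper names connector_env and h5ls are hypothetical.

```python
import os
import subprocess


def connector_env(build_dir, base=None):
    """Return a copy of the environment with the two connector variables set."""
    env = dict(os.environ if base is None else base)
    env["HDF5_PLUGIN_PATH"] = os.path.abspath(build_dir)
    env["HDF5_VOL_CONNECTOR"] = "fits"
    return env


def h5ls(fits_path, build_dir="build"):
    """Run `h5ls -r` on a FITS file with the connector enabled for this call only."""
    result = subprocess.run(
        ["h5ls", "-r", fits_path],
        env=connector_env(build_dir),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

The parent process keeps its original environment, so other HDF5 programs it starts still use the native VOL.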

The programmatic FAPL form applies fits-hdf5-vol only to the files you choose:

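/* Requires #include <hdf5.h>; HDF5_PLUGIN_PATH must still point at the connector. */
hid_t vol  = H5VLregister_connector_by_name("fits", H5P_DEFAULT);
hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_vol(fapl, vol, NULL);
hid_t fid  = H5Fopen("obs.fits", H5F_ACC_RDONLY, fapl);  /* negative on failure */
/* ... read through the usual H5D / H5A calls, then release the handles ... */
H5Fclose(fid); H5Pclose(fapl); H5VLclose(vol);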

End-to-end example: convert a FITS file to HDF5 and back

tools/ ships three small utilities. fits_to_h5 uses fits-hdf5-vol on the input side and the native HDF5 VOL on the output side. h5_to_fits does the reverse direction with CFITSIO directly (fits-hdf5-vol is read-only). fits_compare does a byte-by-byte HDU-pixel comparison via CFITSIO.

A complete round-trip on a small public FITS file:

# Set up the environment once.
export HDF5_PLUGIN_PATH=$PWD/build

mkdir -p roundtrip
curl -L -o roundtrip/gc_2mass_k.fits \
    https://github.com/astropy/astropy-data/raw/main/galactic_center/gc_2mass_k.fits

# 1. FITS → native HDF5 (via fits-hdf5-vol).
./build/fits_to_h5 roundtrip/gc_2mass_k.fits roundtrip/gc_2mass_k.h5

# 2. HDF5 → restored FITS (via CFITSIO).
./build/h5_to_fits roundtrip/gc_2mass_k.h5 roundtrip/gc_2mass_k_restored.fits

# 3. Verify pixels are bit-exact.
./build/fits_compare roundtrip/gc_2mass_k.fits roundtrip/gc_2mass_k_restored.fits
# → HDU0: rank 2|2  bitpix 16|16  dims [ 512 512 ] | [ 512 512 ]  ✓ pixels match (524288 bytes)
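
The byte count reported by fits_compare follows from the image dimensions and BITPIX. As a quick check, here is a minimal sketch; the helper name pixel_bytes is hypothetical, and it assumes an uncompressed image HDU.

```python
from math import prod


def pixel_bytes(dims, bitpix):
    """Size in bytes of an uncompressed image HDU's pixel array."""
    if not dims:                      # NAXIS = 0: header-only HDU, no pixels
        return 0
    return prod(dims) * abs(bitpix) // 8


print(pixel_bytes([512, 512], 16))    # 524288
```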

The intermediate roundtrip/gc_2mass_k.h5 is a native HDF5 file. Inspect it with stock HDF5 tools — fits-hdf5-vol is not needed:

h5ls -r roundtrip/gc_2mass_k.h5
# /                        Group
# /HDU0                    Group
# /HDU0/data               Dataset {512, 512}

Read it from Python with h5py:

import h5py
with h5py.File("roundtrip/gc_2mass_k.h5", "r") as f:
    data   = f["HDU0/data"][...]              # NumPy int16, shape (512, 512)
    bitpix = int(f["HDU0"].attrs["BITPIX"])   # 16

Reading a FITS file directly through fits-hdf5-vol (no conversion)

You can skip the conversion entirely and treat FITS as live HDF5:

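import os

# Set env vars BEFORE importing h5py: h5py reads them at import time.
os.environ["HDF5_PLUGIN_PATH"]   = "/path/to/FITS-HDF5-VOL/build"
os.environ["HDF5_VOL_CONNECTOR"] = "fits"
# If you built HDF5 from source, export LD_LIBRARY_PATH=/path/to/hdf5-2.1/lib in the
# shell before starting Python; the dynamic loader reads it at process start, so
# setting it through os.environ here has no effect.

import h5py  # imported only after the environment is configured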

with h5py.File("some.fits", "r") as f:
    pixels = f["HDU0/data"][...]
    ra     = f["HDU1/columns/RA_ICRS"][...]   # table column
    naxis2 = int(f["HDU0"].attrs["NAXIS2"])

h5py limitation (upstream): h5py 3.x builds with H5_USE_110_API, which routes f.keys(), f.visititems(), and "x" in f through the v1 link-iterate API that HDF5 rejects on non-native VOLs. Use direct path access (f["HDU0/data"]) instead. To use iteration, build h5py from source against the same HDF5 install:

HDF5_DIR=$HOME/opt/hdf5-2.1 pip install --no-binary=h5py --no-cache-dir h5py
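
With the stock h5py wheel, direct path lookups can stand in for iteration. The sketch below counts HDUs by probing /HDU0, /HDU1, … in turn. It assumes HDU groups are numbered consecutively from zero and that a missing path raises KeyError; the exception type seen through a non-native VOL may differ, so adjust the except clause if needed. The helper name count_hdus is hypothetical.

```python
def count_hdus(f, limit=1000):
    """Count consecutive HDUn groups using direct path access only."""
    n = 0
    while n < limit:
        try:
            f[f"HDU{n}"]          # direct lookup; no keys() or `in` involved
        except KeyError:
            break                 # first missing index ends the sequence
        n += 1
    return n
```

The function only needs an object that supports f["name"], so it accepts an open h5py.File or, for testing, a plain dict.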

Demo figures

tools/make_demo_figures.py generates presentation-quality images from real FITS files read live through the HDF5 API. Pass any .fits image files as positional arguments — one output figure is produced per file:

export HDF5_PLUGIN_PATH=$PWD/build
export HDF5_VOL_CONNECTOR=fits

python3 tools/make_demo_figures.py \
    tests/astronomy_data/images/Horsehead_Neb_2MASS_J_band.fits \
    tests/astronomy_data/images/M51_Whirlpool_DSS_optical.fits \
    tests/astronomy_data/images/LMC_2MASS_J_band.fits

Figures are written to demo/ as fig_01_<stem>.png, fig_02_<stem>.png, … Titles and labels are derived from the filename; no technical metadata is shown.
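
To predict the output paths for a given argument list, the naming pattern can be reproduced as follows. This is a sketch of the pattern described above, not the script's code: it assumes numbering follows argument order, starts at 1, and is zero-padded to two digits. The helper name figure_paths is hypothetical.

```python
from pathlib import Path


def figure_paths(fits_files, out_dir="demo"):
    """Return the expected output path for each input FITS file, in order."""
    return [
        Path(out_dir) / f"fig_{i:02d}_{Path(p).stem}.png"
        for i, p in enumerate(fits_files, start=1)
    ]
```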

If you built HDF5 from source, prefix with LD_LIBRARY_PATH=$HOME/opt/hdf5-2.1/lib.

Repository layout

FITS-HDF5-VOL/
├── src/fits_hdf5_vol_connector.c   VOL callback layer
├── adapters/fits/fits_adapter.c    FITS adapter (CFITSIO-backed)
├── include/fits_hdf5/              Public headers (adapter.h, fits_hdf5_vol.h)
├── tools/
│   ├── fits_to_h5.c                FITS → native HDF5 converter (source)
│   ├── h5_to_fits.c                HDF5 → FITS converter (source)
│   ├── fits_compare.c              Pixel-level FITS diff (source)
│   ├── download_test_data.py       Download SkyView images + VizieR tables + astropy files
│   └── make_demo_figures.py        Presentation figure generator (one figure per FITS file)
├── tests/
│   ├── integration/                C unit tests + h5py smoke + golden scripts
│   ├── fixtures/                   build_fixtures.c — 10 deterministic FITS files
│   ├── golden/                     *.h5ls.txt pinned reference outputs
│   └── astronomy_data/             Downloaded FITS files (git-ignored; see above)
│       ├── images/                 SkyView multi-survey image cutouts (75 files)
│       └── tables/                 VizieR catalog tables + astropy test files (21 files)
├── cmake/FetchCorpus.cmake         Pinned astropy public test files
├── docs/                           Format-adapter API and release notes
└── CMakeLists.txt

About

FITS-HDF5-VOL brings FITS files into the HDF5 ecosystem without conversion. It implements the HDF5 Virtual Object Layer interface, so any HDF5 application (h5py, h5dump, h5ls, PyTables, or anything else built on the HDF5 C library) can open an unmodified .fits file and read its pixels, tables, and headers through the standard HDF5 API.
