Skip to content

Latest commit

 

History

History
63 lines (45 loc) · 1.61 KB

2022-09-26.md

File metadata and controls

63 lines (45 loc) · 1.61 KB

Binary Sparse Format Working Group Meeting #3 - September 26, 2022

Attendees

  • Ben Brock
  • Tim Davis
  • Jim Kitchen
  • Tim Mattson
  • Scott McMillan
  • Erik Welch
  • Isaac Virshup
  • Will Kimmerer

Agenda

  • Discuss prototype implementations by Ben, Jim

    • What binary container library should we use? Goal: develop plan for evaluating options Goal: define what we need to specify vs. container library
  • Discuss Erik's progress on V2.0

Summary of Discussion

  • Ben: HDF5 has some pain points

    • Have to pick standard types for disk storage, which can be somewhat arbitrary
    • Currently storing strings as a datset, which is not quite satisfactory
    • Others in issue #9
  • Groups can have string attributes

    • Files should have an implicit root group
  • Jim: NetCDF sits on top of HDF5 and likely handles some of these issues.

  • Jim: should make sure that a library picking different defaults in different languages does not prevent some language (e.g. Julia) from inter-operating.

  • Erik: V2 update,

Isaac: How does chunking work?

Resources and Links

Data type for storing strings in HDF5:

DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }

CDF to HDF5 Converter https://github.com/h5netcdf/h5netcdf

TensorStore - Google thing built on top of Zarr/HDF5

conda install -c jim22k sscdf

Outcomes

  • Ben will try using NetCDF in his C++ implementation, looking some of SSCDF's outputs.

  • We will meet next week to discuss Erik's updates on V2 in detail.