Binary Sparse Format Working Group Meeting #1 - August 10, 2022

Attendees

Ben Brock
Tim Davis
Jim Kitchen
Tim Mattson
Scott McMillan
Erik Welch
Isaac Virshup

Agenda

Discuss the possibility of creating a binary storage format for sparse data.
Establish a general course of direction for creating a spec, enumerate desired features.

Summary of Discussion

It was pointed out that two components need to be created, following the approach proposed in Erik Welch's HPEC keynote: a JSON metadata segment describing the sparse matrix layout, combined with a series of binary blobs. The two components are:

The specification for valid metadata. This includes supported formats (CSR, COO, Dense, etc.), as well as which types of data are supported (vectors, matrices, n-dimensional tensors) and the specific syntax to be used.
How this metadata maps onto the stored binary blobs, and how the blobs themselves are stored. This gets into implementation details, such as whether to use HDF5, Zarr, or some other storage format.
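
To make the two components concrete, here is a minimal sketch of a CSR matrix split into a JSON metadata string and a set of named binary blobs. It is only an illustration: the function name `encode_csr`, every metadata key, and the choice of little-endian int64/float64 encodings are assumptions made for this example, since no syntax or storage container had been agreed on at this meeting.

```python
import json
import struct


def encode_csr(shape, pointers, indices, values):
    """Split a CSR matrix into a JSON metadata string and named binary blobs.

    Every key name below is hypothetical; no metadata syntax had been agreed on.
    """
    # Assumption: little-endian int64 for index arrays, float64 for values.
    blobs = {
        "pointers": struct.pack(f"<{len(pointers)}q", *pointers),
        "indices": struct.pack(f"<{len(indices)}q", *indices),
        "values": struct.pack(f"<{len(values)}d", *values),
    }
    metadata = {
        "format": "CSR",
        "shape": list(shape),
        "number_of_stored_values": len(values),
        "data_types": {
            "pointers": "int64",
            "indices": "int64",
            "values": "float64",
        },
    }
    return json.dumps(metadata), blobs


# The 2 x 3 matrix [[1, 0, 2], [0, 3, 0]] in CSR form.
metadata_json, blobs = encode_csr((2, 3), [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0])
```

In a real implementation the blobs would be handed to a container such as HDF5 or Zarr rather than kept as raw bytes; the sketch only shows the separation between the description and the data.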

We began by enumerating some of the storage options available.

HDF5
Zarr
ASDF
Possibly others

We also enumerated the features we would like any future spec to support. We established that version 1.0 of a spec would likely support storing:

Vectors, presumably with sparse and dense formats
Matrices with a variety of sparse storage formats
N-D Tensors, initially limited to COO, but likely CSF in future versions

The following matrix formats were discussed:

COO
CSR
CSC
Hyper CSR (DCSR)
Hyper CSC (DCSC)
ISO or single-value matrix
Dense mask (?)
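
As a rough illustration of how some of these formats relate, the sketch below converts COO triples to CSR and then compresses the CSR row pointers into the hypersparse (DCSR) form, which keeps entries only for non-empty rows. The function names and array names are hypothetical and chosen for this example; they are not taken from any agreed spec.

```python
def coo_to_csr(nrows, rows, cols, vals):
    """Convert COO triples to CSR arrays (pointers, indices, values).

    Entries within a row keep their input order; columns are not sorted.
    """
    # Count the stored entries in each row.
    counts = [0] * nrows
    for r in rows:
        counts[r] += 1
    # A prefix sum of the counts gives the row pointers.
    pointers = [0] * (nrows + 1)
    for i in range(nrows):
        pointers[i + 1] = pointers[i] + counts[i]
    # Scatter each entry into the next free slot of its row.
    next_slot = pointers[:-1]  # slicing makes a copy
    indices = [0] * len(vals)
    values = [0.0] * len(vals)
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        indices[k] = c
        values[k] = v
        next_slot[r] += 1
    return pointers, indices, values


def csr_to_dcsr(pointers):
    """Compress CSR row pointers by dropping empty rows.

    Returns (row_ids, compressed_pointers); indices and values are unchanged.
    """
    row_ids = []
    compressed = [0]
    for i in range(len(pointers) - 1):
        if pointers[i + 1] > pointers[i]:
            row_ids.append(i)
            compressed.append(pointers[i + 1])
    return row_ids, compressed


# A 4 x 3 matrix with entries only in rows 0 and 2.
pointers, indices, values = coo_to_csr(4, [0, 2, 0], [2, 1, 0], [2.0, 3.0, 1.0])
row_ids, compressed = csr_to_dcsr(pointers)
```

For a matrix with many empty rows, the DCSR pointer array has one entry per non-empty row (plus one) instead of one per row, which is the motivation for the hypersparse variants.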

We discussed various implementation complexities of a dense mask format, which may cause it to be left out of an initial spec.

We discussed the importance of supporting blocked formats, which could potentially allow for very efficient loading and storing on distributed memory systems.

We discussed the complexities of supporting user-defined (and even built-in) types. We concluded that we should depend on a binary storage library like HDF5 to handle this for us. Some storage formats, potentially including Zarr, may allow the use of user-defined types.

We discussed the need to support a variety of index types, encompassing at least the C integer types.
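
One way a writer could take advantage of multiple index types is to pick the narrowest fixed-width type that can hold the largest index and record its name alongside the data. The sketch below assumes unsigned, little-endian types and uses hypothetical helper names; the meeting did not decide on signedness, byte order, or naming.

```python
import struct

# (type name, struct format code) pairs, narrowest first.
# Assumption: unsigned fixed-width types mirroring uint8_t ... uint64_t.
_INDEX_TYPES = [("uint8", "B"), ("uint16", "H"), ("uint32", "I"), ("uint64", "Q")]


def smallest_index_type(max_index):
    """Return (name, struct code) of the narrowest type that holds max_index."""
    if max_index < 0:
        raise ValueError("indices must be non-negative")
    for name, code in _INDEX_TYPES:
        # With the "<" prefix, struct uses standard sizes: 1, 2, 4, and 8 bytes.
        if max_index < 2 ** (8 * struct.calcsize("<" + code)):
            return name, code
    raise ValueError("index does not fit in 64 bits")


def pack_indices(indices):
    """Pack indices as little-endian bytes using the narrowest suitable type."""
    name, code = smallest_index_type(max(indices, default=0))
    return name, struct.pack(f"<{len(indices)}{code}", *indices)


type_name, blob = pack_indices([0, 2, 1])
```

The returned type name is what a metadata segment would need to carry so that a reader can decode the blob with the matching width.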

We discussed TileDB's sparse storage format, acknowledging that we should perhaps take its design into consideration, but that certain aspects are not compatible with our vision for supporting multiple sparse matrix formats.

Ben expressed interest in developing a prototype implementation for C++. Jim Kitchen expressed interest in expanding on his prototype implementation in Python. Tim Davis expressed interest in supporting a cross-platform binary spec in SuiteSparse and possibly providing the format in the SuiteSparse Collection once the spec becomes mature.

Outcomes

Ben to open new GitHub repo for developing and discussing binary sparse format specification.
We will begin by agreeing on a feature set on GitHub, then working on the spec as prototype implementations develop.
The calendar invite will be updated to a monthly cadence, with the possibility of meeting sooner if there is progress on GitHub.
