Skip to content

Commit

Permalink
Update with minutes from today's meeting
Browse files Browse the repository at this point in the history
  • Loading branch information
BenBrock committed Aug 10, 2022
1 parent 0febdd7 commit 36a3637
Show file tree
Hide file tree
Showing 2 changed files with 90 additions and 0 deletions.
4 changes: 4 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Binary Sparse Format Specification
This is part of a new effort to create a binary storage format for storing sparse matrices and other sparse data to disk.

[Minutes from our meetings](minutes) are available.
86 changes: 86 additions & 0 deletions minutes/aug-10-2022.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
# Binary Sparse Format Working Group Meeting #1 - August 10, 2022

## Attendees
- [X] Ben Brock
- [X] Tim Davis
- [X] Jim Kitchen
- [X] Tim Mattson
- [ ] Scott McMillan
- [X] Erik Welch
- [X] Isaac Virshup

## Agenda

- Discuss the possibility of creating a binary storage format for sparse data.
- Establish a general course of direction for creating a spec, enumerate desired features.

## Summary of Discussion

It was pointed out that there are two components that need to be created, based
on the proposed approach in Erik Welch's HPEC keynote, which proposed JSON metadata
segment describing the sparse matrix layout combined with a series of binary
blobs.

1. The specification for valid metadata. This includes supported formats (CSR,
COO, Dense, etc.), as well as which types of data are supported (vectors, matrices,
n-dimensional tensors) and the specific syntax to be used.

2. How this metadata maps on to the stored binary blobs, as well as how they are
actually stored.
This gets into implementation details (are we using HDF5, Zarr, or some other
storage format).

We began by enumerating some of the storage options available.

- HDF5
- Zarr
- ASDF
- Possibly others

As well as the features we would like to see supported by any future spec.
We established that a version 1.0 of any spec would likely support storing:

- Vectors, presumably with sparse and dense formats
- Matrices with a variety of sparse storage formats
- N-D Tensors, initially limited to COO, but likely CSF in future versions

The following matrix formats were discussed:
- COO
- CSR
- CSR
- Hyper CSR (DCSR)
- Hyper COO (DCSC)
- ISO or single-value matrix
- Dense mask (?)

We discussed various implementation complexities of a dense mask format, which
may leave it to be left out of an initial spec.

We discussed the important of supporting blocked formats, which could potentially
allow for very efficient loading and storing on distributed memory systems.

We discussed the complexities of supporting user-defined (and even built-in) types.
We concluded that we should depend on a binary storage library like HDF5 to handle
this for us. Some storage formats, like potentially Zarr, may allow the use of
user-defined types.

We discussed the need to support different varieties of index types, at least
encompassing C integer types.

We discussed TileDB's sparse storage format, acknowledging that we should perhaps
take its design into consideration, but that certain aspects are not compatible
with our vision for supporting multiple sparse matrix formats.

Ben expressed interest in developing a prototype implementation for C++. Jim Kitchen
expressed interest in expanding on his prototype implementation in Python. Tim Davis
expressed interest in supporting a cross-platform binary spec in SuiteSparse and
possibly providing the format in the SuiteSparse Collection once the spec becomes mature.

## Outcomes
- Ben to open new GitHub repo for developing and discussing binary sparse format specification.

- We will begin by agreeing on a feature set on GitHub, then working on the spec
as prototype implementations develop.

- The calendar invite will be updated to once monthly, with the possibility of
meeting sooner if there is progress on the GitHub.

0 comments on commit 36a3637

Please sign in to comment.