-
Notifications
You must be signed in to change notification settings - Fork 4
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Update with minutes from today's meeting
- Loading branch information
Showing
2 changed files
with
90 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Binary Sparse Format Specification | ||
This is part of a new effort to create a binary storage format for storing sparse matrices and other sparse data to disk. | ||
|
||
[Minutes from our meetings](minutes) are available. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
# Binary Sparse Format Working Group Meeting #1 - August 10, 2022 | ||
|
||
## Attendees | ||
- [X] Ben Brock | ||
- [X] Tim Davis | ||
- [X] Jim Kitchen | ||
- [X] Tim Mattson | ||
- [ ] Scott McMillan | ||
- [X] Erik Welch | ||
- [X] Isaac Virshup | ||
|
||
## Agenda | ||
|
||
- Discuss the possibility of creating a binary storage format for sparse data. | ||
- Establish a general course of direction for creating a spec, enumerate desired features. | ||
|
||
## Summary of Discussion | ||
|
||
It was pointed out that there are two components that need to be created, based | ||
on the proposed approach in Erik Welch's HPEC keynote, which proposed JSON metadata | ||
segment describing the sparse matrix layout combined with a series of binary | ||
blobs. | ||
|
||
1. The specification for valid metadata. This includes supported formats (CSR, | ||
COO, Dense, etc.), as well as which types of data are supported (vectors, matrices, | ||
n-dimensional tensors) and the specific syntax to be used. | ||
|
||
2. How this metadata maps on to the stored binary blobs, as well as how they are | ||
actually stored. | ||
This gets into implementation details (are we using HDF5, Zarr, or some other | ||
storage format). | ||
|
||
We began by enumerating some of the storage options available. | ||
|
||
- HDF5 | ||
- Zarr | ||
- ASDF | ||
- Possibly others | ||
|
||
As well as the features we would like to see supported by any future spec. | ||
We established that a version 1.0 of any spec would likely support storing: | ||
|
||
- Vectors, presumably with sparse and dense formats | ||
- Matrices with a variety of sparse storage formats | ||
- N-D Tensors, initially limited to COO, but likely CSF in future versions | ||
|
||
The following matrix formats were discussed: | ||
- COO | ||
- CSR | ||
- CSR | ||
- Hyper CSR (DCSR) | ||
- Hyper COO (DCSC) | ||
- ISO or single-value matrix | ||
- Dense mask (?) | ||
|
||
We discussed various implementation complexities of a dense mask format, which | ||
may leave it to be left out of an initial spec. | ||
|
||
We discussed the important of supporting blocked formats, which could potentially | ||
allow for very efficient loading and storing on distributed memory systems. | ||
|
||
We discussed the complexities of supporting user-defined (and even built-in) types. | ||
We concluded that we should depend on a binary storage library like HDF5 to handle | ||
this for us. Some storage formats, like potentially Zarr, may allow the use of | ||
user-defined types. | ||
|
||
We discussed the need to support different varieties of index types, at least | ||
encompassing C integer types. | ||
|
||
We discussed TileDB's sparse storage format, acknowledging that we should perhaps | ||
take its design into consideration, but that certain aspects are not compatible | ||
with our vision for supporting multiple sparse matrix formats. | ||
|
||
Ben expressed interest in developing a prototype implementation for C++. Jim Kitchen | ||
expressed interest in expanding on his prototype implementation in Python. Tim Davis | ||
expressed interest in supporting a cross-platform binary spec in SuiteSparse and | ||
possibly providing the format in the SuiteSparse Collection once the spec becomes mature. | ||
|
||
## Outcomes | ||
- Ben to open new GitHub repo for developing and discussing binary sparse format specification. | ||
|
||
- We will begin by agreeing on a feature set on GitHub, then working on the spec | ||
as prototype implementations develop. | ||
|
||
- The calendar invite will be updated to once monthly, with the possibility of | ||
meeting sooner if there is progress on the GitHub. |