Binary Sparse Format Working Group Meeting #2 - September 7, 2022

Attendees

Ben Brock
Tim Davis
Jim Kitchen
Tim Mattson
Scott McMillan
Erik Welch
Isaac Virshup
Will Kimmerer

Agenda

Discuss Erik Welch's strawman proposal for v1.0 and v2.0 specs.

Summary of Discussion

Discussed stacking indices/pointers arrays vs. keeping them as separate arrays.
- Stacking potentially reduces overhead associated with having multiple arrays
- Stacking arrays requires that they be the same type
- Jim: perhaps stacking vs. independent arrays can be handled by the binary storage library.
- Isaac: one library seems to represent multiple Arrow arrays as a single array.
Add "comments" fields, perhaps some additional "attributes," or even allow users to provide arbitrary keys.
Discussed data types. Consensus seems to be to list datatypes of individual arrays.
- What types should we support?
- int8, 16, 32, 64, etc., as well as unsigned versions
- float, double, complex
- Isaac: what about user-defined types through Arrow?
- More investigation necessary, but can potentially use third-party library like Arrow.
Discussed formats to support in v1.
- Sorted, no duplicated indices.
- COO, CSR, CSC, DCSR, DCSC, (COOR and COOC sorted by row, column, with COO an alias to COOR)
- Format name "custom" or similar will indicate explicit tensor descriptor in v2.

Isaac: in-memory or memory-mapped usability is a big priority. Ben: memory-mapping is also important in distributed memory, as it can allow different processes to read independently in an efficient way. Hopefully a binary storage library will allow us to do this.

Outcomes

Consensus on the outline of features for v1, v2.

Deliverables:

Ben to set up a when2meet / whenisgood to establish new meeting time.
Erik to update his design document based on our discussion
Ben, Jim, and Tim to implement experimental reader/writers based on the design document for C++, Python, C.
Isaac to examine binary storage libraries, particularly Arrow and user-defined types

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

2022-09-07.md

2022-09-07.md

Binary Sparse Format Working Group Meeting #2 - September 7, 2022

Attendees

Agenda

Summary of Discussion

Outcomes

Files

2022-09-07.md

Latest commit

History

2022-09-07.md

File metadata and controls

Binary Sparse Format Working Group Meeting #2 - September 7, 2022

Attendees

Agenda

Summary of Discussion

Outcomes