Skip to content

Latest commit

 

History

History
59 lines (44 loc) · 2.13 KB

2022-09-07.md

File metadata and controls

59 lines (44 loc) · 2.13 KB

Binary Sparse Format Working Group Meeting #2 - September 7, 2022

Attendees

  • Ben Brock
  • Tim Davis
  • Jim Kitchen
  • Tim Mattson
  • Scott McMillan
  • Erik Welch
  • Isaac Virshup
  • Will Kimmerer

Agenda

  • Discuss Erik Welch's strawman proposal for v1.0 and v2.0 specs.

Summary of Discussion

  • Discussed stacking indices/pointers arrays vs. keeping them as separate arrays.

    • Stacking potentially reduces overhead associated with having multiple arrays
    • Stacking arrays requires that they be the same type
    • Jim: perhaps stacking vs. independent arrays can be handled by the binary storage library.
    • Isaac: one library seems to represent multiple Arrow arrays as a single array.
  • Add "comments" fields, perhaps some additional "attributes," or even allow users to provide arbitrary keys.

  • Discussed data types. Consensus seems to be to list datatypes of individual arrays.

    • What types should we support?
    • int8, 16, 32, 64, etc., as well as unsigned versions
    • float, double, complex
    • Isaac: what about user-defined types through Arrow?
    • More investigation necessary, but can potentially use third-party library like Arrow.
  • Discussed formats to support in v1.

    • Sorted, no duplicated indices.
    • COO, CSR, CSC, DCSR, DCSC, (COOR and COOC sorted by row, column, with COO an alias to COOR)
    • Format name "custom" or similar will indicate explicit tensor descriptor in v2.

Isaac: in-memory or memory-mapped usability is a big priority. Ben: memory-mapping is also important in distributed memory, as it can allow different processes to read independently in an efficient way. Hopefully a binary storage library will allow us to do this.

Outcomes

Consensus on the outline of features for v1, v2.

Deliverables:

  • Ben to set up a when2meet / whenisgood to establish new meeting time.

  • Erik to update his design document based on our discussion

  • Ben, Jim, and Tim to implement experimental reader/writers based on the design document for C++, Python, C.

  • Isaac to examine binary storage libraries, particularly Arrow and user-defined types