- Ben Brock
- Tim Davis
- Jim Kitchen
- Tim Mattson
- Scott McMillan
- Erik Welch
- Isaac Virshup
- Will Kimmerer
- Discuss Erik Welch's strawman proposal for v1.0 and v2.0 specs.
-
Discussed stacking indices/pointers arrays vs. keeping them as separate arrays.
- Stacking potentially reduces overhead associated with having multiple arrays
- Stacking arrays requires that they be the same type
- Jim: perhaps stacking vs. independent arrays can be handled by the binary storage library.
- Isaac: one library seems to represent multiple Arrow arrays as a single array.
-
Add "comments" fields, perhaps some additional "attributes," or even allow users to provide arbitrary keys.
-
Discussed data types. Consensus seems to be to list datatypes of individual arrays.
- What types should we support?
- int8, 16, 32, 64, etc., as well as unsigned versions
- float, double, complex
- Isaac: what about user-defined types through Arrow?
- More investigation necessary, but can potentially use third-party library like Arrow.
-
Discussed formats to support in v1.
- Sorted, no duplicated indices.
- COO, CSR, CSC, DCSR, DCSC, (COOR and COOC sorted by row, column, with COO an alias to COOR)
- Format name "custom" or similar will indicate explicit tensor descriptor in v2.
Isaac: in-memory or memory-mapped usability is a big priority. Ben: memory-mapping is also important in distributed memory, as it can allow different processes to read independently in an efficient way. Hopefully a binary storage library will allow us to do this.
Consensus on the outline of features for v1, v2.
Deliverables:
-
Ben to set up a when2meet / whenisgood to establish new meeting time.
-
Erik to update his design document based on our discussion
-
Ben, Jim, and Tim to implement experimental reader/writers based on the design document for C++, Python, C.
-
Isaac to examine binary storage libraries, particularly Arrow and user-defined types