Merge branch 'main' into wma/reference_output
willow-ahrens committed May 14, 2024
2 parents eb8f2ec + 51647ae commit 735a691
Showing 3 changed files with 324 additions and 7 deletions.
25 changes: 25 additions & 0 deletions .github/workflows/publish.yml
@@ -0,0 +1,25 @@
name: Publish

on:
  push:
    branches:
      - main

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2

      - name: Output to HTML
        uses: netwerk-digitaal-erfgoed/bikeshed-action@v1
        with:
          source: spec/latest/index.bs

      - name: Publish HTML to GitHub Pages
        if: success()
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./
5 changes: 4 additions & 1 deletion README.md
@@ -3,16 +3,19 @@ This is part of a new effort to create a binary storage format for storing spars

Minutes from our meetings are available [here](https://hackmd.io/0qzK4fJlQp-78t067yiYsA?view) (see also: [previous minutes](minutes)).



## Specification

[View Latest Spec](https://api.csswg.org/bikeshed/?url=https://raw.githubusercontent.com/GraphBLAS/binsparse-specification/main/spec/latest/index.bs)
[View Latest Spec](https://graphblas.org/binsparse-specification/)

## Parsers

Here is a table listing the current tensor frameworks that support the format:

| Language | Framework | Status | Notes |
| -------- | ------ | ------ | ----- |
| C | [binsparse-reference-c](https://github.com/GraphBLAS/binsparse-reference-c) | under development | converts between binsparse V1.0 and custom in-memory sparse matrices |
| C++ | [binsparse-reference-impl](https://github.com/GraphBLAS/binsparse-reference-impl) | under development | converts between binsparse V1.0 and custom in-memory sparse matrices |
| Julia | [Finch.jl](https://willowahrens.io/Finch.jl/dev/fileio/) | under development | converts between binsparse V1.0 and V2.0 and Finch matrices and tensors |
| Python | [binsparse-python](https://github.com/ivirshup/binsparse-python) | under development | converts between binsparse V1.0 and scipy.sparse matrices |
301 changes: 295 additions & 6 deletions spec/latest/index.bs
@@ -5,7 +5,7 @@ Level: 1
Status: LS-COMMIT
Status: w3c/UD
Group: GraphBLAS
URL: http://example.com/url-this-spec-will-live-at
URL: https://graphblas.org/binsparse-specification/
Repository: https://github.com/GraphBLAS/binsparse-specification
Editor: Benjamin Brock, Intel
Editor: Tim Davis, Texas A&M
@@ -55,8 +55,9 @@ outside of the "binsparse" namespace.

<div class=example>

Example of a JSON descriptor for a compressed-sparse column matrix with 10 rows
and 12 columns, containing float32 values, along with user-defined attributes.
Example of a JSON descriptor for a compressed-sparse column (CSC) matrix with 10
rows and 12 columns, containing float32 values, along with user-defined
attributes.

```json
{
```

@@ -273,6 +274,278 @@ Pairs must not be duplicated.

Coordinate format is an alias for [[#coor_format]] format.

### Version 2.0 only: Custom Formats ### {#custom_formats}

The contents of this section will be finalized with the release of Binsparse
V2.0, and are subject to change until then.

Binsparse describes custom multidimensional formats hierarchically. We can
understand these formats as arrays of arrays, where the parent array and
child arrays might use different formats. For example, we could have a dense
outer array which contains sparse inner arrays, so the first index would be
dense and the second index would be sparse. To achieve efficient storage, all
arrays in the same level are stored contiguously in a specialized data structure
called a level.

A level is a collection of zero or more arrays which all have the same format.
The elements of arrays in a level may be subarrays in a sublevel. The global
array we wish to store is represented by a level that holds a single root array.

For example, the simplest level is the element format, which represents a
collection of scalars. We can represent a collection of dense vectors with a
dense level format. Each vector in the collection would be composed from
contiguous scalars in an element level (analogously to the numpy.stack
operator). We can represent a collection of sparse vectors using a sparse level.
The sparse level format represents sparse vectors by listing the locations of
nonzeros, and storing only the nonzero scalars inside an element level.

In addition to storing scalars, dense and sparse levels may themselves store
multidimensional arrays. This leads to multiple ways to store sparse matrices
and tensors. For example, a dense vector of sparse vectors is equivalent to the
CSR matrix format, and a sparse vector of sparse vectors is equivalent to the
hypersparse DCSR matrix format.
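
As a rough illustration of the "dense vector of sparse vectors" view, the
sketch below flattens a small matrix into the three arrays a CSR-style layout
would hold. The function name `to_csr_arrays` and its variable names are
hypothetical and chosen only for this example; they are not defined by the
specification.

```python
def to_csr_arrays(matrix):
    """Flatten a list-of-rows matrix into CSR-style arrays.

    The outer (dense) level keeps one slot per row; the inner (sparse)
    level records only the nonzero columns of each row.
    """
    pointers = [0]   # pointers[p + 1] - pointers[p] = nonzeros in row p
    indices = []     # column index of each stored value
    values = []      # the stored nonzero scalars (the element level)
    for row in matrix:
        for col, value in enumerate(row):
            if value != 0:
                indices.append(col)
                values.append(value)
        pointers.append(len(indices))
    return pointers, indices, values


matrix = [
    [0, 5, 0],
    [0, 0, 0],
    [7, 0, 9],
]
pointers, indices, values = to_csr_arrays(matrix)
print(pointers)  # [0, 1, 1, 3]
print(indices)   # [1, 0, 2]
print(values)    # [5, 7, 9]
```

The empty middle row still occupies a slot in `pointers`, which is what makes
the outer level dense. A DCSR-style layout would instead list only rows 0 and 2
in an outer sparse level.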

When defining a custom format, the outermost `subformat` key is defined as the
root level descriptor (a level which will only hold one array). If a level holds
many different arrays, we refer to the `p`th array as the array in position `p`.

Levels are row-major by default (adding an outer level adds a row dimension).
The format descriptor may optionally define a `transpose` key, equal to a list of
the described dimensions in the order they should appear. If the tensor we wish
to represent is `A` and the tensor described by the format descriptor is `B`,
then `A[i_0, ..., i_(n - 1)] = B[i_(transpose[0]), ..., i_(transpose[n - 1])]`.
`transpose` must be a permutation of `0, ..., n - 1`.
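
The effect of `transpose` can be sketched with a small lookup helper. This is
an illustration only: `read_transposed` is a hypothetical name, `B` is modeled
as a nested Python list, and dimensions are assumed to be numbered from 0, as
in descriptors such as `"transpose": [1, 0]`.

```python
def read_transposed(B, transpose, index):
    """Return A[index], where A is the tensor described by `transpose` over B.

    Assumes 0-based dimension numbers. B is a nested list that is indexed
    one dimension at a time.
    """
    if sorted(transpose) != list(range(len(transpose))):
        raise ValueError("transpose must be a permutation")
    element = B
    # The k-th index into B is index[transpose[k]].
    for dim in transpose:
        element = element[index[dim]]
    return element


# B is stored as 2 rows by 3 columns.
B = [
    [1, 2, 3],
    [4, 5, 6],
]
# With transpose [1, 0], A[i_0, i_1] = B[i_1, i_0], so A is 3 by 2.
print(read_transposed(B, [1, 0], (2, 0)))  # 3
print(read_transposed(B, [0, 1], (1, 2)))  # 6
```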

If the format key is a dictionary, the `level` key must be present and shall
describe the storage format of the level used to represent the sparse array.

The level descriptors are dictionaries defined as follows:

#### Element #### {#element_level}

If the level key is "element", the level represents zero or more scalars.

: values
:: Array of size `number_of_positions` whose `p`th element holds the value of the scalar at position `p`.

#### Dense #### {#dense_level}

If the level key is "dense", the `subformat` key must be present. The `rank`
key must be present, and set to an integer `r` greater than or equal to 1. The
dense level represents zero or more `r`-dimensional dense arrays whose elements
are themselves arrays specified by `subformat`. For example, a dense level of
rank 2 represents a collection of dense matrices of subarrays.

Assuming that the level describes arrays of shape `I_0, ..., I_(N - 1)`, the
array at position `p` in a dense level of rank `r` is an array whose slice

`A[i_0, ..., i_(r - 1), :, ..., :]`

is described by the row-major position

`q = (((((p * I_0) + i_0) * I_1) + i_1) * I_2 + i_2) * ... + i_(r - 1)`

of the sublevel.
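
The row-major position can be computed with one multiply-add per dimension. The
helper below is a minimal sketch under the assumption that `shape` holds the
`r` extents `I_0, ..., I_(r - 1)` covered by this level; `dense_position` is a
hypothetical name, not something the specification defines.

```python
def dense_position(p, index, shape):
    """Row-major position in the sublevel for a rank-r dense level.

    p      -- position of the array within this level
    index  -- (i_0, ..., i_(r - 1))
    shape  -- (I_0, ..., I_(r - 1)), the extents covered by this level
    """
    if len(index) != len(shape):
        raise ValueError("index and shape must both have r entries")
    q = p
    for i, extent in zip(index, shape):
        if not 0 <= i < extent:
            raise IndexError("index out of bounds")
        q = q * extent + i  # one step of ((q * I_k) + i_k)
    return q


# Rank-2 dense level holding 3-by-4 arrays.
print(dense_position(0, (1, 2), (3, 4)))  # 6
print(dense_position(2, (0, 0), (3, 4)))  # 24
```

For a rank-1 level this reduces to `q = p * I_0 + i_0`, the familiar offset of
element `i_0` in the `p`th row of a dense matrix.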

#### Sparse #### {#sparse_level}

If the level key is "sparse", the `subformat` key must be present. The
`rank` key must be present, and set to an integer `r` greater than or equal to
`1`. The sparse level represents zero or more `r`-dimensional sparse arrays
whose non-implicit elements are themselves arrays specified by `subformat`. For
example, a sparse level of rank 1 represents a collection of sparse vectors of
subarrays.

Assume that this level represents `n`-dimensional subarrays and the root array
is `N`-dimensional. The sparse level implies the following binary arrays are
present:

: pointers_to_(N - n)
:: Array of size `number_of_positions + 1` whose 1st element is equal to `0` and whose `p + 1`th element is equal to the sum of `pointers_to_(N - n)[p]` and the number of explicitly represented slices in the `p`th position.

: indices_(N - n), ..., indices_(N - n + r - 1)
:: There are `r` such arrays. When `A[i_0, ..., i_(r - 1), :, ..., :]` is explicitly represented by the subarray in position `q`, `indices_(N-n+s)[q] = i_s`. The arrays must be ordered such that the tuples `(indices_(N-n)[q], ..., indices_(N-n+r-1)[q])` are unique and appear in lexicographic order for all `q` in each range `pointers_to_(N-n)[p] <= q < pointers_to_(N-n)[p + 1]`. These arrays must contain no other elements.

Special note: If the sparse level is the root level, the `pointers` array should
be omitted, as its first value will be `0` and its last value will be the
length of any of the `indices` arrays in this level.
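
To make the roles of the pointer and index arrays concrete, here is a sketch of
a lookup in a sparse level. It assumes the arrays have already been read into
Python lists; `sparse_find` and its parameter names are hypothetical and used
only for illustration.

```python
from bisect import bisect_left


def sparse_find(pointers, indices, p, index):
    """Find the sublevel position q that stores A[index] for the array at
    position p, or return None when that slice is not explicitly stored.

    pointers -- the pointers_to array (number_of_positions + 1 entries)
    indices  -- list of r index arrays, one per dimension of this level
    index    -- tuple (i_0, ..., i_(r - 1))
    """
    start, stop = pointers[p], pointers[p + 1]
    # Tuples within one position are unique and lexicographically sorted,
    # so they can be searched with bisect.
    stored = [tuple(arr[q] for arr in indices) for q in range(start, stop)]
    offset = bisect_left(stored, tuple(index))
    if offset < len(stored) and stored[offset] == tuple(index):
        return start + offset
    return None


# Two sparse vectors (rank 1): position 0 stores entries 1 and 4,
# position 1 stores entry 2.
pointers = [0, 2, 3]
indices = [[1, 4, 2]]
print(sparse_find(pointers, indices, 0, (4,)))  # 1
print(sparse_find(pointers, indices, 1, (2,)))  # 2
print(sparse_find(pointers, indices, 1, (4,)))  # None
```

When the sparse level is the root, a reader following this sketch could use
`[0, len(indices[0])]` in place of the omitted pointer array.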


### Equivalent Formats ### {#equivalent_formats}

Each of the following predefined formats is equivalent to the custom format
descriptor shown beneath its name.

#### DVEC #### {#dvec_format_equiv}

```json
"format": {
  "subformat": {
    "level": "dense",
    "rank": 1,
    "subformat": {
      "level": "element"
    }
  }
}
```

#### DMATR #### {#dmatr_format_equiv}

```json
"format": {
  "subformat": {
    "level": "dense",
    "rank": 1,
    "subformat": {
      "level": "dense",
      "rank": 1,
      "subformat": {
        "level": "element"
      }
    }
  }
}
```

#### DMATC #### {#dmatc_format_equiv}

```json
"format": {
  "transpose": [1, 0],
  "subformat": {
    "level": "dense",
    "rank": 1,
    "subformat": {
      "level": "dense",
      "rank": 1,
      "subformat": {
        "level": "element"
      }
    }
  }
}
```

#### CVEC #### {#cvec_format_equiv}

```json
"format": {
  "subformat": {
    "level": "sparse",
    "rank": 1,
    "subformat": {
      "level": "element"
    }
  }
}
```

#### CSR #### {#csr_format_equiv}

```json
"format": {
  "subformat": {
    "level": "dense",
    "rank": 1,
    "subformat": {
      "level": "sparse",
      "rank": 1,
      "subformat": {
        "level": "element"
      }
    }
  }
}
```

#### CSC #### {#csc_format_equiv}

```json
"format": {
  "transpose": [1, 0],
  "subformat": {
    "level": "dense",
    "rank": 1,
    "subformat": {
      "level": "sparse",
      "rank": 1,
      "subformat": {
        "level": "element"
      }
    }
  }
}
```

#### DCSR #### {#dcsr_format_equiv}

```json
"format": {
  "subformat": {
    "level": "sparse",
    "rank": 1,
    "subformat": {
      "level": "sparse",
      "rank": 1,
      "subformat": {
        "level": "element"
      }
    }
  }
}
```

#### DCSC #### {#dcsc_format_equiv}

```json
"format": {
  "transpose": [1, 0],
  "subformat": {
    "level": "sparse",
    "rank": 1,
    "subformat": {
      "level": "sparse",
      "rank": 1,
      "subformat": {
        "level": "element"
      }
    }
  }
}
```

#### COOR #### {#coor_format_equiv}

```json
"format": {
  "subformat": {
    "level": "sparse",
    "rank": 2,
    "subformat": {
      "level": "element"
    }
  }
}
```

#### COOC #### {#cooc_format_equiv}

Column-wise Coordinate format

```json
"format": {
  "transpose": [1, 0],
  "subformat": {
    "level": "sparse",
    "rank": 2,
    "subformat": {
      "level": "element"
    }
  }
}
```
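
A parser might walk the nested `subformat` keys to recover the sequence of
levels. The sketch below is one possible way to do that with Python's standard
`json` module; `describe_levels` is a hypothetical helper, and the descriptor
is wrapped in an enclosing object so that it parses as standalone JSON.

```python
import json


def describe_levels(format_descriptor):
    """Return a list of (level_name, rank) pairs from the root level inward.

    Element levels have no rank, so their rank is reported as None.
    """
    levels = []
    node = format_descriptor.get("subformat")
    while node is not None:
        levels.append((node["level"], node.get("rank")))
        node = node.get("subformat")
    return levels


# The CSC descriptor from above, wrapped in braces to form valid JSON.
text = """
{
  "format": {
    "transpose": [1, 0],
    "subformat": {
      "level": "dense",
      "rank": 1,
      "subformat": {
        "level": "sparse",
        "rank": 1,
        "subformat": {"level": "element"}
      }
    }
  }
}
"""
descriptor = json.loads(text)["format"]
print(describe_levels(descriptor))
# [('dense', 1), ('sparse', 1), ('element', None)]
print(descriptor.get("transpose"))  # [1, 0]
```

Note that `json.loads` rejects trailing commas, so the descriptor in this
sketch is written without any.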

Data Types {#key_data_types}
----------------------------

@@ -313,9 +586,9 @@ The following strings shall be used to describe data types:
## Value Modifiers ## {#value_modifiers}

When the value array is meant to be reinterpreted before reading, a special bracket syntax is
provided to indicate modifications to the underlying value array.
provided to indicate modifications to the underlying element level.

### Sparse Array with Complex Values ### {#complex_arrays}
### Complex Values (complex) ### {#complex_level}

When a value array is composed of alternating real and imaginary components of
complex numbers, the type is written as `complex[<type>]`. For example, a value
@@ -326,7 +599,7 @@ the modified array shall be at position `2i + 1` in the underlying array.
The `complex` value modifier may only be used with the types `float32` and
`float64`.
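
Reading a `complex[<type>]` value array amounts to pairing up adjacent entries.
The sketch below assumes the real component of element `i` sits at position
`2i` and the imaginary component at `2i + 1`, which is how the alternating
layout is read here; `decode_complex` is a hypothetical helper name.

```python
def decode_complex(values):
    """Pair up interleaved real/imaginary components into complex numbers."""
    if len(values) % 2 != 0:
        raise ValueError("interleaved complex data needs an even length")
    # Element i takes its real part from 2i and its imaginary part from 2i + 1.
    return [complex(values[2 * i], values[2 * i + 1])
            for i in range(len(values) // 2)]


underlying = [1.0, 2.0, 3.0, -4.0]
print(decode_complex(underlying))  # [(1+2j), (3-4j)]
```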

### Sparse Array with All Values the Same ### {#iso_arrays}
### All Values the Same (ISO) ### {#iso_level}

When all values of a sparse array are identical, the type is
written as `iso[<type>]`. This indicates that the array will store only a single
@@ -549,6 +822,22 @@ Example of a symmetric CSR matrix.

</div>

Attributes {#key_attributes}
--------------------------
The `attributes` key shall denote a dictionary whose keys carry information
about the stored matrix and the data it represents.
Attributes are optional and may be ignored by a compliant parser.

### Defined Attributes

#### number_of_diagonal_elements #### {#number_of_diagonal_elements_attributes}
`number_of_diagonal_elements` shall contain an integer value corresponding to
the number of elements on the stored matrix's diagonal.

Note: implementations are highly encouraged to provide the
`number_of_diagonal_elements` attribute for matrices with a symmetric,
skew-symmetric, or Hermitian structure.
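
A writer could derive this attribute from coordinate-style index arrays. The
sketch below assumes that "elements on the diagonal" means explicitly stored
entries whose row and column indices are equal; `count_diagonal_elements` is a
hypothetical helper, not part of the specification.

```python
def count_diagonal_elements(rows, cols):
    """Count stored entries whose row and column indices coincide."""
    return sum(1 for r, c in zip(rows, cols) if r == c)


# Coordinate-style index arrays for a 3-by-3 matrix with four stored entries.
rows = [0, 1, 2, 2]
cols = [0, 2, 0, 2]
attributes = {"number_of_diagonal_elements": count_diagonal_elements(rows, cols)}
print(attributes)  # {'number_of_diagonal_elements': 2}
```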

Binary Containers {#binary_container}
=====================================
Binary containers must store binary arrays in a standardized, cross-platform