Formalisation of the Parquet spec of weights IO #699

@martinfleis

While working on #698, I realised that there is probably room for a formal specification of the Parquet-based weights exchange format we have introduced in Graph. That way, other projects (spdep, pygeoda, rgeoda) can write their own IO, and we can avoid those horrendous space-separated text files with no formal specification.

At this moment, we expect:

  • exactly three columns, focal, neighbor and weight, where weight shall be numeric
  • canonical sorting of the observations to ensure correct sparse roundtripping
  • custom metadata with the transformation and the libpysal version; those would probably change if we want to open this up to other projects

So I don't expect a very long document. I think the best places for this discussion would be the SDSL Discord and SDSL 2024, and I am happy to lead it. Before I go down that rabbit hole, does anyone have any ideas or objections we should take into account?
