proof of concept python extension for frcw #1

Open · wants to merge 1 commit into main

Conversation

msarahan
Contributor

One of the things I discussed with @InnovativeInventor in mggg/GerryChain#379 was making it easier to use frcw from Python. As I mentioned there, I have some experience with wrapping Rust to make it available to Python.

This PR is meant to create discussion around this wrapping idea:

  • Is it worthwhile?
  • Does it avoid needing pcompress to go between GerryChain and frcw? Saving any trips to/from disk can be a big win.
  • What is the minimum viable amount of wrapping to make this useful?
  • What would a "dream/complete" wrapping look like?
  • How do we avoid code duplication (and perhaps also functionality divergence) between here and GerryChain, while also not requiring people to write Rust to test their new ideas?

Note that I changed the error handling for graph-related errors. This is hopefully a nice change from the string errors, which are more annoying to catch and handle. It also makes error handling for the PyO3 wrapping easier.
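
To illustrate the direction (a rough sketch using snafu, which this PR adds to Cargo.toml; the variants and the PyValueError mapping below are illustrative stand-ins, not the exact code in the diff):

use snafu::Snafu;

// Typed graph errors instead of bare strings; these variants are invented for illustration.
#[derive(Debug, Snafu)]
pub enum GraphError {
    #[snafu(display("node {} is out of bounds (graph has {} nodes)", index, size))]
    NodeOutOfBounds { index: usize, size: usize },
    #[snafu(display("edge ({}, {}) references a missing node", src, dst))]
    MissingEdgeNode { src: usize, dst: usize },
}

// Mapping onto a Python exception keeps the PyO3 layer thin.
impl From<GraphError> for pyo3::PyErr {
    fn from(e: GraphError) -> Self {
        pyo3::exceptions::PyValueError::new_err(e.to_string())
    }
}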

There are a lot of nitty-gritty details around mutability, ownership/references/copies, etc. that I haven't spent time examining yet. My Rust is rusty, and was never great to begin with, so I've mostly been trying to get a simple PoC working.

You can test this from the frcw.rs root folder:

pip install maturin
maturin develop
python -c "import frcw; g=frcw.Graph(10); g.edges_start"
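
For anyone who hasn't opened the diff, the wrapping is roughly of this shape (a simplified sketch only, assuming PyO3; the real Graph has more fields, and the placeholder constructor below just mirrors the frcw.Graph(10) example):

use pyo3::prelude::*;

// Stripped-down stand-in for frcw's Graph, just enough to mirror the one-liner above.
#[pyclass]
struct Graph {
    #[pyo3(get)]
    edges_start: Vec<usize>,
}

#[pymethods]
impl Graph {
    #[new]
    fn new(n: usize) -> Self {
        Graph { edges_start: vec![0; n] }
    }
}

// The module name must match what Python imports: `import frcw`.
#[pymodule]
fn frcw(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<Graph>()?;
    Ok(())
}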

Cargo.toml:
ndarray = { version = "0.14", optional = true }
ndarray-linalg = { version = "0.13", features = ["openblas-system"], optional = true }
pcompress = "1.0.6"
petgraph = "0.6.0"
Contributor Author


I sorted these, and the IDE bumped some of them. I'm happy to revert if this fluff bugs you. The important new additions are pyo3 and snafu.

@InnovativeInventor
Contributor

> does it avoid needing pcompress to go between GerryChain and frcw? Saving any trips to/from disk can be a big win.

pcompress does not need to go to/from disk: the Python wrapper around pcompress can read from any executable that emits assignment vectors, and the pcompress executable itself is really a tool for compressing Unix pipe streams of assignment vectors.

However, I'm hoping we can figure out a tighter integration between the Rust and Python code than just passing assignment vectors (e.g. some way of prescoring updaters in Rust), while also allowing the graph to be used as a native networkx object in Python. This looks like an excellent start!

@msarahan
Contributor Author

I'm not certain, but if you're relying on string output from any executable, you're paying a decent price for serialization/deserialization. Avoiding that can help a lot, but it'd be hard to say how much without profiling it.

I think anything is possible, but I don't have a good sense of how hard things might be. Dream away, and I'll give it a try.

@InnovativeInventor
Contributor

Of course -- you're totally right. Passing Python objects like you're doing from Rust would be awesome and probably much faster. I'm not sure how much this matters, though (an unoptimized pcompress parser/compressor that acts on the streams has reasonable performance even reading from disk, and there is some low-hanging fruit if we need to go faster).

Ultimately, I think the slowest part of our workflow (as I'm seeing in GerryChain Python and GerryChain Julia) is the sheer number of updaters people like to run in their analysis. For example, this is a fairly standard set of updaters (we usually run every single updater we have an implementation for, with some exceptions). The other big slowdown is when people run optimization runs (e.g. trying to optimize for VRA effectiveness or some fairness metric). In these two areas, the non-Python GerryChain implementations have a long way to go before adoption, which is the reason most people still stick to GerryChain Python.

I was working on adding pyo3 to the pcompress Rust code (the idea was to initialize the Partition object in Rust and pre-calculate as much as possible in parallel before sending it and exposing it as a PyIterator). This could speed things up even more (and achieve the goal of allowing GerryChain Python users to stay with Python). Unfortunately, I was having some difficulty with this, and it's still a work-in-progress.
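
Roughly, the skeleton I have in mind looks something like this (a sketch only; AssignmentIter and the Vec<u32> payload are placeholders, and it assumes a recent PyO3 where the iterator dunders can live directly in #[pymethods]):

use pyo3::prelude::*;

// Hypothetical iterator over precomputed assignment vectors, exposed to Python.
#[pyclass]
struct AssignmentIter {
    steps: std::vec::IntoIter<Vec<u32>>,
}

#[pymethods]
impl AssignmentIter {
    fn __iter__(slf: PyRef<'_, Self>) -> PyRef<'_, Self> {
        slf
    }
    // Returning None raises StopIteration on the Python side.
    fn __next__(mut slf: PyRefMut<'_, Self>) -> Option<Vec<u32>> {
        slf.steps.next()
    }
}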

@msarahan
Contributor Author

msarahan commented Jan 10, 2022 via email

@pjrule
Owner

pjrule commented Jan 10, 2022

@msarahan Thanks for jumping in on this—extremely exciting that someone other than me is excited about hacking on frcw! 😁 I particularly appreciate the tweaks to the error handling, which are of independent interest. (If it's not too much trouble, maybe one of us should create a dedicated PR for that?)

As @InnovativeInventor mentioned, running "raw chains" is just part of the Python GerryChain overhead. For a Python/Rust integration to be useful to most of our end users, we probably need to precompute the values of at least some updaters (e.g. tallies) on the Rust side. There's also the question of constraints and acceptance functions, which tend to be in tight inner loops and inherently non-parallelizable in the MCMC setting. I'm working on a highly experimental (and probably overkill...) extension to GerryChain that aims to compile operations over tallies, etc. down to a computation graph that can be mapped to vectorized operations on the Rust side. (This approach is heavily inspired by projects like TorchScript, JAX, tf.autograph, and Zygote.jl.) Needless to say, this is complicated, and I'm largely undertaking this effort to further my personal interest in the new wave of domain-specific abstract interpretation/JIT work in the MLSys community.

I could imagine something simpler in the short term—e.g. users declare their updaters and simple acceptance functions in a purely declarative markup language like the JSON format @InnovativeInventor linked to in the post above. The Rust engine then computes tallies, etc. specified in this format and exposes an object to Python with these values (with roughly the effect of prepopulating the _cache field in the GerryChain Partition object, though the implementation details may differ depending on how we leverage PyO3).
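
As a strawman, the Rust side might deserialize such a spec into something like this (field names invented, assuming serde/serde_json; the real format would follow whatever JSON schema we settle on):

use serde::Deserialize;

// Invented shape for a declarative chain spec; real field names would differ.
#[derive(Debug, Deserialize)]
struct ChainSpec {
    // Column names to tally per district at every step.
    tallies: Vec<String>,
    // Optional population-deviation bound applied as a constraint.
    pop_tolerance: Option<f64>,
}

fn parse_spec(json: &str) -> Result<ChainSpec, serde_json::Error> {
    serde_json::from_str(json)
}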

A meta-note: if you're willing, it might be productive to set up a Zoom call soon for the three of us to talk some of these ideas out! :)

@msarahan
Contributor Author

For the error handling, it will no doubt quickly balloon into a monster PR. Perhaps it is best to work through it one file/module at a time, so that reviews can be done with more care.

I'm out of my depth on the GerryOpt stuff. With the constraints/acceptance functions, you could maybe go faster by evaluating several proposals at once and keeping only the first that is valid. That way you are not waiting on a bad step to finish before trying another proposal. It's wasteful, of course, but if you have idle cores, perhaps helpful.
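
As a toy illustration of that "evaluate several, keep the first valid" idea (assuming rayon; propose and is_valid are stand-ins for frcw's real proposal and constraint code):

use rayon::prelude::*;

// Stand-ins for frcw's real proposal generation and constraint checks.
fn propose(seed: u64) -> u64 {
    seed.wrapping_mul(6364136223846793005).wrapping_add(1)
}
fn is_valid(p: &u64) -> bool {
    p % 7 == 0
}

fn first_valid(seeds: &[u64]) -> Option<u64> {
    // find_any returns whichever valid proposal a worker hits first,
    // so no thread sits waiting on a bad step before trying another seed.
    seeds.par_iter().map(|&s| propose(s)).find_any(|p| is_valid(p))
}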

A Zoom call would be good. Slack would be good too. I'm on the very old VRDI and hackathon Slacks, but not on any Slack that appears active. msarahan at gmail.com
