proof of concept python extension for frcw #1
base: main
Conversation
ndarray = { version = "0.14", optional = true }
ndarray-linalg = { version = "0.13", features = ["openblas-system"], optional = true }
pcompress = "1.0.6"
petgraph = "0.6.0"
I sorted these, and the IDE bumped some of them. I'm happy to revert if this fluff bugs you. The important new additions are pyo3 and snafu.

However, I'm hoping that we can figure out a tighter integration between the Rust and Python code than just passing assignment vectors (e.g. some way of prescoring updaters in Rust), while also allowing it to be used as a native executable.
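For readers who haven't used PyO3 before, here is a minimal sketch of what exposing frcw as a native module could look like (using roughly the pyo3 0.15-era API). The module name `frcw_py`, the `run_chain_steps` function, and its parameters are placeholders invented for illustration, not names from this PR:

```rust
use pyo3::prelude::*;

/// Hypothetical entry point: run `n_steps` of a chain and return one
/// assignment vector per step. The body is a stub so the sketch stands alone.
#[pyfunction]
fn run_chain_steps(n_steps: usize, seed: u64) -> PyResult<Vec<Vec<u32>>> {
    let _ = (n_steps, seed); // a real wrapper would call into frcw's chain runner here
    Ok(Vec::new())
}

/// Module definition: after building, Python can `import frcw_py`.
#[pymodule]
fn frcw_py(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(run_chain_steps, m)?)?;
    Ok(())
}
```

Built with something like `maturin develop` or `setuptools-rust`, Python code could then call `frcw_py.run_chain_steps(...)` directly instead of shelling out to a binary and parsing its output.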
I'm not certain, but if you're relying on string output from any executable, you're paying a decent price for serialization/deserialization. Avoiding that can help a lot, but it'd be hard to say how much without profiling it. I think anything is possible, but I don't have a good sense of how hard things might be. Dream away, and I'll give it a try.
Of course -- you're totally right. Passing Python objects like you're doing from Rust would be awesome and probably much faster. I'm not sure how much this matters, though (an unoptimized pcompress parser/compressor that acts on the streams has reasonable performance (https://github.com/mggg/pcompress#performance) even reading from disk, and there is some low-hanging fruit if we need to go faster).

Ultimately, I think the slowest part of our workflow (as I'm seeing in GerryChain Python and GerryChain Julia) is the sheer number of updaters people like to run in their analysis. For example, this (https://github.com/mggg/plan-evaluation-reporting/blob/main/state_specifications/Missouri.json#L215-L244) is a fairly standard set of updaters (we usually run every single updater we have an implementation for, with some exceptions). The other big slowdown is when people run optimization runs (e.g. trying to optimize for VRA effectiveness or some fairness metric). In these two areas, the non-Python GerryChain implementations have a long way to go before adoption, which is the reason most people still stick to GerryChain Python.

I was working on adding pyo3 to the pcompress Rust code (the idea was to initialize the Partition object in Rust and pre-calculate as much as possible in parallel before sending it and exposing it as a PyIterator). This could speed things up even more (and achieve the goal of allowing GerryChain Python users to stay with Python). Unfortunately, I was having some difficulty with this, and it's still a work-in-progress.
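To make the "pre-calculate as much as possible in parallel" idea a bit more concrete, here is a rough sketch using rayon. Everything here (the function name, the population-tally example, the data layout) is an assumption for illustration, not code from pcompress or frcw:

```rust
use rayon::prelude::*;

/// Hypothetical prescoring pass: given one assignment vector per chain step
/// and a per-node population column, compute district population tallies for
/// every step in parallel before anything is handed to Python.
fn prescore_population_tallies(
    step_assignments: &[Vec<u32>],
    node_populations: &[u64],
    num_districts: usize,
) -> Vec<Vec<u64>> {
    step_assignments
        .par_iter() // steps are independent once materialized, so fan out per step
        .map(|assignment| {
            let mut tallies = vec![0u64; num_districts];
            for (node, &district) in assignment.iter().enumerate() {
                tallies[district as usize] += node_populations[node];
            }
            tallies
        })
        .collect()
}
```

The results could then be surfaced to Python through an iterator yielding (assignment, tallies) pairs, which is roughly the PyIterator idea described above.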
I'd be happy to help with that if you want to give me some pointers of what the problems are and what you've tried so far.
@msarahan Thanks for jumping in on this -- it's extremely exciting that someone other than me is excited about hacking on frcw!

As @InnovativeInventor mentioned, running "raw chains" is just part of the Python GerryChain overhead. For a Python/Rust integration to be useful to most of our end users, we probably need to precompute the values of at least some updaters (e.g. tallies) on the Rust side. There's also the question of constraints and acceptance functions, which tend to be in tight inner loops and are inherently non-parallelizable in the MCMC setting.

I'm working on a highly experimental (and probably overkill...) extension to GerryChain that aims to compile operations over tallies, etc. down to a computation graph that can be mapped to vectorized operations on the Rust side. (This approach is heavily inspired by projects like TorchScript, JAX, and the like.) I could imagine something simpler in the short term -- e.g. users declare their updaters and simple acceptance functions in a purely declarative markup language like the JSON format @InnovativeInventor linked to in the post above. The Rust engine then computes tallies, etc. specified in this format and exposes an object to Python with these values (with roughly the effect of prepopulating the updater cache on the Python side).

A meta-note: if you're willing, it might be productive to set up a Zoom call soon for the three of us to talk some of these ideas out! :)
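As a sketch of the short-term declarative idea, the Rust side could deserialize a small updater specification with serde and compute whatever it describes. The schema below (a list of column tallies) is invented for illustration and is not the schema of the JSON file linked above:

```rust
use serde::Deserialize;

/// Hypothetical declarative spec: each entry names a tally updater and the
/// node-level data column it sums over.
#[derive(Debug, Deserialize)]
struct TallySpec {
    name: String,
    column: String,
}

#[derive(Debug, Deserialize)]
struct UpdaterSpec {
    tallies: Vec<TallySpec>,
}

fn main() -> Result<(), serde_json::Error> {
    let raw = r#"{
        "tallies": [
            { "name": "population", "column": "TOTPOP" },
            { "name": "bvap",       "column": "BVAP" }
        ]
    }"#;
    let spec: UpdaterSpec = serde_json::from_str(raw)?;
    // A Rust engine could walk this spec, compute each tally per accepted step,
    // and hand the results to Python as pre-populated updater values.
    for tally in &spec.tallies {
        println!("would tally column {} as updater {}", tally.column, tally.name);
    }
    Ok(())
}
```

This would keep user-facing configuration in Python/JSON while the hot loop stays entirely in Rust.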
For the error handling, it will no doubt quickly fester into a monster PR. Perhaps it is best to work through it one file/module at a time, so that reviewing can be done with more care.

I'm out of my depth on the GerryOpt stuff. With the constraints/acceptance functions, you could maybe go faster by running several proposals at once and taking only the first that is valid. That way you are not waiting on bad steps only to go back and try another proposal. It's wasteful, of course, but if you have idle cores, perhaps helpful.

A Zoom call would be good. Slack would be good too. I'm on the very old VRDI and hackathon Slacks, but no Slack that appears active. msarahan at gmail.com
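Here is a sketch of that speculative-evaluation suggestion, assuming rayon: score a batch of proposals in parallel and keep the first one (in proposal order) that passes the constraints. The `Proposal` type and `is_valid` predicate are stand-ins, not frcw types:

```rust
use rayon::prelude::*;

/// Stand-in for a chain proposal; frcw's real proposal type would go here.
struct Proposal {
    population_deviation: f64,
}

/// Stand-in constraint/acceptance check.
fn is_valid(proposal: &Proposal) -> bool {
    proposal.population_deviation < 0.02
}

/// Check a batch of speculative proposals in parallel and return the first
/// (lowest-index) valid one, so bad proposals don't serialize the chain.
fn first_valid(proposals: Vec<Proposal>) -> Option<Proposal> {
    let idx = proposals.par_iter().position_first(is_valid)?;
    proposals.into_iter().nth(idx)
}
```

One caveat: to keep runs reproducible for a fixed seed, it's probably worth taking the first valid proposal in generation order (as `position_first` does here) rather than whichever check happens to finish first.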
One of the things that I discussed with @InnovativeInventor at mggg/GerryChain#379 was to make it easier to use frcw from Python. As I mentioned there, I have some experience with wrapping Rust to make it available to Python.
This PR is meant to create discussion around this wrapping idea:
Note that I changed the error handling for graph-related errors. This is hopefully a nice change from the string errors that are more annoying to catch and deal with. It makes error handling for the PyO3 wrapping stuff easier, too.
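For readers unfamiliar with snafu, here is a minimal sketch of what typed graph errors plus a PyO3 bridge could look like. The `GraphError` name, its variants, and the choice of `ValueError` are illustrative assumptions, not necessarily what this diff does:

```rust
use pyo3::exceptions::PyValueError;
use pyo3::PyErr;
use snafu::Snafu;

/// Illustrative typed errors for graph loading/validation. Callers can match
/// on variants instead of parsing formatted error strings.
#[derive(Debug, Snafu)]
pub enum GraphError {
    #[snafu(display("node {} is missing attribute {}", node, attribute))]
    MissingAttribute { node: usize, attribute: String },

    #[snafu(display("graph is disconnected"))]
    Disconnected,
}

/// One possible bridge to Python: map graph errors to ValueError so PyO3
/// wrapper functions can return `Result<_, GraphError>` and just use `?`.
impl From<GraphError> for PyErr {
    fn from(err: GraphError) -> PyErr {
        PyValueError::new_err(err.to_string())
    }
}
```

With a conversion like this in place, functions exposed via `#[pyfunction]` can propagate graph errors with `?` and Python callers see ordinary exceptions.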
There are a lot of nitty-gritty details around mutability, ownership/references/copies, etc. that I haven't spent time examining yet. My Rust is rusty, and it was never great to begin with, so I've mostly just been trying to get a simple PoC working.
You can test this from the frcw.rs root folder: