-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: More Explicit/User-Friendly Python Bindings #195
base: main
Are you sure you want to change the base?
Conversation
The `GrackleSolver` class is intended to provide a somewhat more streamlined and more user-friendly interface than the traditional approach involving `FluidContainer`
e6373d4
to
da1e663
Compare
9d4e692
to
ea3a364
Compare
I think you've got a lot of really good ideas in here. I am very in favor of an overhaul to the current Python API to something that looks more like the library API, namely, the splitting of parameters and units back into their own structures. My main thought up front is more pragmatic. Personally, I'd like to see this implemented totally independently of the existing Python interface and the Some minor comments on the points you raise:
|
Overview
While playing around with the test-suite, I was recently reminded that I find Grackle's existing python bindings to be a little unsatisfactory. This PR proposes a newer set of python bindings (these are largely influenced by wrapper functions that I've previously written around the existing python-bindings, and the C++ bindings that I've written in Enzo-E).
My primary goal here is to gauge receptiveness to this new proposed API. The implementation is mostly a proof-of-concept. If people are receptive, I would be very happy to refine the implementation (and maybe add documentation). With this in mind, I would strongly encourage the reviewer to focus first on the overall design before looking at the code-changes.
I focus a lot on the advantages of the proposed python-bindings for users that have never directly interacted with Grackle before (i.e. there only prior experience may have been running a simulation with Grackle).1 When I first started with Grackle, I was very much in this camp and found that the existing python-bindings required me to understand a lot more about the underlying C API.
Description
The changes revolve around a new class called
GrackleSolver
. For a quick calculation, this is the only class that a user needs to know about (rather thanchemistry_data
&FluidContainer
).I will break the description into 2 sections: (i) initializing grackle & querying configuration and (ii) performing grackle-calculations. In each section, I first cover the proposed API and then compare it to the existing API
Initializing Grackle and Querying Configuration
Proposed API
The logic for initializing
GrackleSolver
looks like the following:At this point,
grackle_solver
is fully initialized and is ready to be used.The user might subsequently want to query the solver's configuration. The properties,
grackle_solver.parameters
,grackle_solver.rates
, &grackle_solver.code_units
return objects that provide access to the data stored by thechemistry_data
,code_units
, &chemistry_data_storage
structs. The following snippets show some examples.In more detail, these properties return instances of a new class called
_DataProxy
. These instances provide access to a subset of the attributes/properties of the underlyingpygrackle.grackle_wrapper.chemistry_data
extension-type (that is wrapped byGrackleSolver
.At the moment, the object returned by
grackle_solver.code_units
is NOT mutable. With that said, it's still possible to initially configuregrackle_solver
with one choice ofa_value
and then subsequently perform calculations using a differenta_value
. Consequently, I would argue that there isn't any other need for a user to modify thecode_units
attributes.2Currently, the object returned by
grackle_solver.parameters
is mutable. But, I'm very tempted to make it immutable. In my opinion, there should never be any reason for a user to need to modify these parameters after initializing grackle (if they ever want different parameters, then they should create a newGrackleSolver
instance). I talk a little more about this down below.It is possible to mutate rate values accessed through
grackle_solver.rates
.Comparison with current API
Based on my description so far,
GrackleSolver
serves a similar purpose tochemistry_data
, but there are 2 aspect clear differences:GrackleSolver
's constructor produces a fully configured/initialized object. If there is a problem with initialization, an exception is raised (that provides as much info as possible). In contrast,chemistry_data
's constructor creates an object that must be further configured and initialized as a separate step.GrackleSolver
explicitly organizes access to Grackle's configuration/state data based on whether a given values is considered to be a parameter, related to units, or is a rate.chemistry_data
provides access to configuration/state data via uncategorized attributes.The fact that
GrackleSolver
's constructor always creates a fully initialized object is unequivocally an improvement (the lack of a separate initialization step means that there is one less thing for users to do).In my opinion, the fact that
GrackleSolver
categorizes attribute access (and initialization) is also an improvement.chemistry_data
or a parameter/unit-related quantity after initialization), there weren't any really guardrails in place. The python bindings might accept such a change and subsequent calculations could silently provide nonsensical answers. Alternatively, some could loudly fail, but it's unclear what went wrong.GrackleSolver
at least puts more guardrails in place (it's harder to do something incorrect). The resulting code also ends up being a little more explicit.Performing Calculations
Proposed API
This section assumes that we have a
GrackleSolver
instance calledgrackle_solver
. When I describe these functions, I make a point of including default kwarg values in the function signaturesTo retrieve the species density fields required by a given grackle configuration, (including metal and dust densities) one can call
grackle_solver.expected_species_density_fields()
.The "calculation-methods" all share some common features:
data
. It is adict
-like object passed via an argument called that contains numpy arrays (an error is raised if the arrays don't share a common shape). These entries are explicitly used to populate thegrackle_field_data
struct. The intention is for the calculation method to raise errors if unknown key-value pairs are specified (I don't think the implementation is being very strict about that right now). Errors are also currently raised if a contained value is not a 1D numpy array (I would like to add support for non-numpy arrays and other dimensionalities)inplace
, with a default value ofFalse
. Wheninplace
isFalse
, copies are explicitly made of all entries indata
. WhenTrue
, the wrapper tries to directly make use of these arrays in the calculation (an error is raised if there is an invalid data layout or dtype).current_a_value
, with a default value ofNone
.GrackleSolver
configured without comoving coordinates, this must always beNone
GrackleSolver
configured with comoving coordinates, a value ofNone
indicates that thea_value
from initialization should be used during this calculation. Alternatively, this kwarg is a way to specify that a differenta_value
for the current calculation (the updatedcode_units
struct is automatically computed by the bindings)Solving Chemistry:
In more detail,
dt
specifies the timestep to evolve the chemistry over. Vurrently,grid_dx
is a placeholder argument for when support for 3D arrays is added. Wheninplace
isTrue
, the values ofdata
are updated in-place.This method always returns a
dict
-like object with all of the fields used by Grackle; you should be able to pass that object as thedata
argument in subsequentsolve_chemistry
calls. Note: the value ofinplace
has no impact on this behavior, it just determines whether fresh arrays are allocated.Calculating Derived Properties:
The following methods all return a newly allocated array holding computed derived values:
As a point of clarification, the
inplace
kwarg only controls whether the arrays specified by thedata
arg are directly used or are copied when constructing agrackle_field_data
struct. Arrays specified in thedata
arg are never used to hold the result of the calculation.I'm tempted to make an optional kwarg called
out
that can be used to specify the array for holding the output data.NOTE: Like before, the
grid_dx
keyword argument passed tocalculate_cooling_time
is mostly just a placeholder (until we add support for 3D arrays).Comparison with current API
I recognize that most of the "benefits" of using
GrackleSolver
overFluidContainer
are fairly subjective (and are mostly a matter of taste).In my opinion, the
GrackleSolver
's single biggest advantage is that it's much more explicit thanFluidContainer
. In other words, the proposed API is much more obvious about what the inputs and outputs are for each of the calculations (without needing to be very familiar with the C functions).For the
FluidContainer
object:FluidContainer
pre-allocates fields to hold derived quantities somewhat hinders this clarity.solve_chemistry
and which quantities are out-of-date after that callThere are also some other less important (subjective) advantages of
GrackleSolver
overFluidContainer
, but I'll skip those for now.The primary disadvantage of
GrackleSolver
is speed:GrackleSolver
is currently a wrapper aroundFluidContainer
(but that was just to make a proof-of-concept, I plan to fix that if this PR is well-received).GrackleSolver
will also always be a little slower wheninplace = False
, but that's also fine.inplace = True
. In this case, theGrackleSolver
's calculation-methods incur additional overhead from the array-checks that ensure that the input arrays can be directly passed directly to the solver.pygrackle.utilities.arraymap.ArrayMap
that is adict
-like type with additional restrictions. These additional restrictions explicitly enforce invariants that satisfy the aforementioned array-checks.Note on Implementation
Currently, the
GrackleSolver
object is implemented in terms ofchemistry_data
andFluidContainer
. I tried to implement the new API without making too many changes to the cython file. This meant that there is quite a bit of ugly python-magic going on while determining arguments for_DataProxy
that are related tochemistry_data
's rate-related attributes.If we decide that we like this new API, I'm happy to clean all of this up. The internals would still use
chemistry_data
(but notFluidContainer
). I would also probably move the implementationFluidContainer
into the cython file to reduce overhead).Highlighted Areas for Feedback
Beyond thoughts on the overall python API, I wanted to highlight some additional areas for feedback:
GrackleSolver
?grackle_solver.parameters
property? I recognize that my preference for immutability is very subjective, but in my opinion it makes reasoning about the program a lot easier. In this case, it would make it a lot more difficult for users to unintentionally put Grackle in an invalid state. (Furthermore, this is python after all. If somebody really wants to mutate a value, they can probably find a way).Future Ideas
_wrapped_c_chemistry_data
type as a part of the API and make similar types that wrap thecode_units
&chemistry_data_storage
GrackleSolver
's constructor (as an alternative of thedict
instances that are currently expected)grackle_solver.parameters
,grackle_solver.code_units
, &grackle_solver.rates
properties directly returns these instances (or we could wrap them if we desire immutability -- which would be cleaner than what we currently support)_wrapped_c_chemistry_data
acts like adict
(that was just the easiest thing to do when I implemented it). But, since the stored entries have fixed typing and arbitrary keys can't be added, I think it's more pythonic to have it act more like a dataclass or the typing.NamedTuple class.GrackleSolver
's constructor to optionally enableunyt
support? I'm not exactly sure what this would look like:unyt.unyt_array
instances as arguments to the calculation functions. Then, we would return arrays with attached units. (I think we would need to requireinplace=False
for this scenario to properly work)GrackleSolver
's constructor and/or add an option to the calculation-method to control how strict the calculation-methods are about the keys in the array. Currently the intention is to be very intolerant of unnecessary keys, by default. But, it could be useful to override that behavior.out
kwarg to the derived-property calculation methods to let users pre-allocate an array to hold the outputs of a calculation.GrackleSolver
in order to query the expected code units at a given cosmological scale factor.Footnotes
As I've mentioned before, I think that Grackle's python bindings is actually a major selling point for people in this position! To my knowledge, no individual simulation with adopting an advanced chemistry/heating/cooling solution provides this type of solution. I have personally found this functionality to be extremely useful for interpreting simulation results (and understanding the underlying physical results). ↩
If the fields are not comoving, then the
code_units
values can't change at all. When using the comoving coordinates, thea_value
member of thecode_units
struct is the only value that is allowed to freely change. While thelength_units
anddensity_units
also change, there is no flexibility; their values are entirely specified by the originalcode_units
struct and the currenta_value
. All of theGrackleSolver
calculation-methods allow the caller to specify a newcurrent_a_value
kwarg and will internally compute the newcode_units
struct with this information. It's my intention to propose similar functionality in the actual c library (but that's beyond the scope of this PR) ↩