Skip to content

Conversation

@wlav
Copy link
Contributor

@wlav wlav commented Sep 23, 2025

This is a proof-of-concept breadboard to show we can actually run Python algorithms with very low overhead and without requiring the Python algorithm author to write any C++. The current implementation largely mimics the C++ one, while using Python reflection to reduce boilerplate.

The trick is to add converter nodes, to/from PyObject*, which solves the combinatorical problem by making it enumerable. (This current code is just an exemplar providing converters for a subset of builtin types; the actual enumeration still needs to happen, e.g. generated from the IDL or maybe using the public converter interface of cppyy to have a complete set of converters.)

Lifetimes of view objects created by converter nodes are handled with a "lifeline" Python object to equalize their lives. The actual callback checks for lifeline objects and extracts the view for passing to Python for any found, leaving the source alive until the end of the callback.

Intermediaries are created from a combination of input/output labels as used in the registration and Python reflection (eg. function name and annotations). There's a trade-off between conformity and allowable flexibility that will need to be fleshed out with the stake-holders. For example, the formal argument names of the Python function could be used as (part of) the label.

Current tests show the use of a C++ producer, a Python algorithm performing a transformation, and a Python observer to verify the result. Intermediate converter nodes are not yet discarded for back-to-back Python calls, which is an optimization that needs to happen either at graph construction time (probably difficult) or after the fact on the graph. All Python calls are still serialized explicitly (and acquire the GIL). Error reporting translates any Python exceptions to C++ std::runtime_error.

One important open question remains, but does not need to be resolved before including this PR into the prototype:

  1. How to recover the GIL for cleanup and shutdown? The GIL needs to be released from the main thread prior to the execution of the graph by TBB and be reacquired afterwards for cleanup. Currently, the placement of the former is a bit arbitrary and the latter never happens. Note that this is a general problem for resources that are managed with tokens, not just the GIL. E.g. a database will need to establish a connection before becoming available and close that down when the job is done.

Do not merge this pull request yet. This PR exists to document delivery of the matching F1 milestones and in preparation of the prototype. Of the listed open questions above, at minimum point 1 needs to be resolved before being able to merge it, as the current location (in test/python) isn't good enough for the prototype.

@knoepfel knoepfel marked this pull request as draft September 24, 2025 12:45
@codecov
Copy link

codecov bot commented Oct 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

@@             Coverage Diff             @@
##             main      #45       +/-   ##
===========================================
+ Coverage   52.94%   81.67%   +28.72%     
===========================================
  Files         114      118        +4     
  Lines        2289     2074      -215     
  Branches     1107      336      -771     
===========================================
+ Hits         1212     1694      +482     
- Misses        237      244        +7     
+ Partials      840      136      -704     
Flag Coverage Δ
unittests 81.67% <ø> (+28.72%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 106 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b75c557...7bcb611. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@knoepfel knoepfel marked this pull request as ready for review December 5, 2025 17:39
@knoepfel knoepfel requested a review from beojan December 5, 2025 17:39
@knoepfel knoepfel changed the title DRAFT: proof-of-concept Python support using converter nodes Proof-of-concept Python support using converter nodes Dec 5, 2025
Copy link
Contributor

@beojan beojan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't help but think this could be made a lot shorter and easier to read by using Pybind11 (which supports embedding the Python interpreter). That would be a fair amount of work though, and probably not worth it for the prototype.

// Python wrapper for Phlex handles
extern PyTypeObject PhlexLifeline_Type;
// clang-format off
typedef struct py_lifeline {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be re-written in the standard C++ style. The C syntax ends up creating two names for the type (py_lifeline and py_lifeline_t).

if (mod) {
PyObject* reg = PyObject_GetAttrString(mod, "PHLEX_EXPERIMENTAL_REGISTER_ALGORITHMS");
if (reg) {
PyObject* pym = wrap_module(&m);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrapped_module?

dlopen(info.dli_fname, RTLD_GLOBAL | RTLD_NOW);
}

#if PY_VERSION_HEX < 0x03020000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is support for such old Python versions required?

import_numpy(true);
#endif

// TODO: the GIL should first be released on the main thread and this seems
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use a static RAII type that releases the GIL on construction and re-acquires on destruction that should solve this, and automatically handle only releasing it once:
https://en.cppreference.com/w/cpp/language/storage_duration.html#Static_block_variables

// clang-format on

PyObject* phlex::experimental::wrap_configuration(configuration const* config)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should document that the caller must release the reference

}

template <typename... Args>
void callv(Args... args)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between call and callv?

if (!np_view) \
return (intptr_t)nullptr; \
\
/* make the date read-only by not making it writable */ \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data?

.input_family(cinputs[0] + "py", cinputs[1] + "py", cinputs[2] + "py");
}
} else {
PyErr_SetString(PyExc_TypeError, "unsupported number of inputs");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems limiting

auto const& inp_type = input_types[i];

if (inp_type == "bool")
INSERT_INPUT_CONVERTER(bool, inp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have more than one Python algorithm using the same inputs, wouldn't this register the same conversion algorithm multiple times?

.input_family(cinputs[0] + "py")
.output_products("py" + output);
} else {
auto* pyc = new py_callback_1v{callable}; // id.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious that the v suffix here is for void and not vector.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants