Performances using a Python Callback from C++ #3503
-
Hard to tell w/o concrete test / benchmark ;) It could be that it's actually fast enough for your use case (or not, as you guess). There has been some discussion on benchmarking on (at least) these issues:
-
Hi, I made a simple benchmark example. This benchmark compares the same callback passed to a sort function: once the callback is pure C++, and once it is a Python function. I used

The C++ code:

    #include <algorithm>
    #include <functional>
    #include <vector>
    #include <pybind11/pybind11.h>
    #include <pybind11/stl.h>
    #include <pybind11/numpy.h>
    #include <pybind11/functional.h>

    namespace py = pybind11;

    // Thin wrapper so that both variants go through the same code path.
    template <typename T, typename C>
    void sort(T& collection, const C& callback)
    {
        std::sort(collection.begin(), collection.end(), callback);
    }

    PYBIND11_MODULE(callback_benchmark, m) {
        m.doc() = "Benchmarking callback using pybind11";

        // Comparator is a native C++ functor.
        m.def("no_callback", [](std::vector<float>& vec) {
            sort(vec, std::less<float>());
            return vec;
        }, "Sort input vector full C++");

        // Comparator is a Python callable wrapped in a std::function.
        m.def("with_callback", [](std::vector<float>& vec,
                                  const std::function<bool(float, float)>& f) {
            sort(vec, f);
            return vec;
        }, "Sort input vector with Python callback");
    }

I do not think the code is perfect; it makes a lot of copies. But the point for me is to have a comparison where the only thing we observe is the timing difference between the C++ callback and the Python callback.

The Python code is as follows:

    import time
    import numpy as np
    from matplotlib import pyplot as plt
    from build import callback_benchmark as cb

    def fun_less(a, b):
        return a < b

    y_axis = []  # number of elements per run (plotted on the horizontal axis)
    timing_no_cb = []
    timing_with_cb = []

    for nb_elem in range(1, 50000, 1000):
        print(nb_elem)
        y_axis.append(nb_elem)
        X = np.random.default_rng(0).random((nb_elem, 1))
        average_no_cb = []
        average_with_cb = []
        # Repeat each measurement 5 times and keep the mean.
        for i in range(5):
            start = time.time()
            cb.no_callback(X)
            average_no_cb.append(time.time()-start)
            start = time.time()
            cb.with_callback(X, fun_less)
            average_with_cb.append(time.time()-start)
        timing_no_cb.append(np.average(average_no_cb))
        timing_with_cb.append(np.average(average_with_cb))

    time_no_cb = np.array(timing_no_cb)
    time_with_cb = np.array(timing_with_cb)

    plt.title("C++/Python Callback comparison for C++ Backend")
    plt.xlabel("nb of elements in vector")
    plt.ylabel("time [s]")
    plt.plot(y_axis, time_no_cb, label='With C++ callback used')
    plt.plot(y_axis, time_with_cb, label='With Python callback used')
    plt.legend(loc="upper left")
    plt.show()

I got the following plot when running the previous code (the code was run inside a Jupyter Notebook):

I then also computed the ratio between both timings to get a better idea of how many times slower it is:

    plt.title("Factor Python callback / C++ callback")
    plt.xlabel("nb of elements in vector")
    plt.ylabel("factor Python cb / C++ cb")
    plt.plot(y_axis, time_with_cb/time_no_cb)
    plt.show()

From the last graph, the ratio seems to be "stable" at around 11 times slower. But I think we are "only" 11x slower because the callback does nothing more than a single comparison in Python. I suspect that the more operations the callback performs, the slower it will be compared to the same callback written in pure C++.

Please let me know first whether the benchmark is relevant or whether I made an obvious mistake. Also, if I can provide more data or perform a different comparison, let me know, and I will do it if I have time.

If I did not make any obvious mistake, I will discuss it with the users of my library, but I think that using a Python callback is not the right approach for my use case. Of course, if you have an alternative idea, let me know!

Best,
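One possible alternative, offered only as a sketch: if the user's ordering can be expressed as "compare a key computed from each element", the Python function can be called once per element to compute the keys, and the sort itself can then compare plain floats without calling back into Python. Whether the callbacks in this library fit that shape is an assumption, and the function name `sort_with_precomputed_keys` is made up for illustration. The example below is pure Python, with the built-in `sorted` standing in for the compiled sort; a C++ counterpart would receive the key vector and sort with a native comparator.

```python
def sort_with_precomputed_keys(values, key_fn):
    """Return `values` sorted by `key_fn`, calling `key_fn` once per element."""
    # n calls into user-defined Python code: one key per element.
    keys = [key_fn(v) for v in values]
    # Sort the indices by their precomputed key. The comparisons are done on
    # plain floats and never call back into user-defined Python code.
    order = sorted(range(len(values)), key=keys.__getitem__)
    return [values[i] for i in order]


if __name__ == "__main__":
    data = [0.3, -0.9, 0.1, -0.2]
    # Hypothetical user-supplied ordering: sort by absolute value.
    print(sort_with_precomputed_keys(data, abs))
```

This trades roughly n log n callback invocations for n, at the cost of restricting users to key-based orderings rather than arbitrary predicates.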
-
Hi,
Users of a library I maintain are interested in providing a callback function from Python to the C++ code at runtime.
Looking at the documentation, it seems that this is perfectly doable.
But one of my concerns is the performance of this solution, because the callback is a central point in the C++ core of my library.
To give an idea of how central it is in the computation: it is like using std::sort, where the callback function would correspond to the predicate, meaning that this function is called a lot. I am afraid that this would have a big impact on performance. Am I right in thinking so, or is that not the case?
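To make "called a lot" concrete, here is a minimal pure-Python sketch that counts how often a comparison predicate is invoked during a sort. It uses Python's built-in `sorted` with `functools.cmp_to_key` as a stand-in for `std::sort`; the two algorithms differ, so the exact counts will not match, but both perform on the order of n log n comparisons on random input. The helper name `count_comparisons` is made up for illustration.

```python
import functools
import random


def count_comparisons(values):
    """Sort `values` and return (sorted_list, number_of_predicate_calls)."""
    calls = 0

    def compare(a, b):
        # Three-way comparison expected by cmp_to_key; count every call.
        nonlocal calls
        calls += 1
        return (a > b) - (a < b)

    result = sorted(values, key=functools.cmp_to_key(compare))
    return result, calls


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (1_000, 10_000, 50_000):
        data = [rng.random() for _ in range(n)]
        _, calls = count_comparisons(data)
        print(f"n={n:>6}: {calls} predicate calls")
```

Each of those predicate calls would cross the C++/Python boundary if the predicate were a Python callback, which is why the per-call overhead matters more than the cost of the comparison itself.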
If performance is degraded as I expect, do you have an idea for an alternative that doesn't have too big an impact on runtime?
For the moment, we implement the callbacks in C++ and let users choose the callback from Python.
Best,
Julián