Skip to content

Create a static library mode for pybind11 #4001

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

royjacobson
Copy link

@royjacobson royjacobson commented Jun 13, 2022

Description

Adds a new target 'pybind11::static' that allows compiling pybind11 as a static library and linking against it.
@Skylion007 @giactra

This consists of the following changes:

  1. CMake infrastructure to support building this target, with an option to disable it.
  2. Defining some new macros to support this dual mode compilation.
  3. Creating one .cc file that is compiled into the new static library. The rest of the code could be refactored in subsequent PRs.
  4. Creating a test that compiling against the new target works.

This PR is partially based on the work here: #2445 by @cirosantilli2. A discussion of the motivation is there as well.

Note that the CMake approach taken here is different: The static/header only modes are configurable by selecting a different target, and not by globally configuring the CMake build scripts generation. I find this better and more flexible.

One issue I'm slightly concerned about is the visibility of pybind11 symbols when using the static library. I think it should be fine given the usual -fvisibility=hidden override, but please validate me on this.

Suggested changelog entry:

Introduce a new ``pybind11::static`` target. When using this target, a static
library will be built and statically linked against instead of the default header-only
mode. It is expected that using this target will improve compile times.
Currently this target is only available when using pybind11 as a CMake subdirectory.

@royjacobson royjacobson requested a review from henryiii as a code owner June 13, 2022 08:22
@royjacobson royjacobson force-pushed the pybind11_static_lib branch from d0ef7aa to aa7d132 Compare June 13, 2022 08:26
@royjacobson
Copy link
Author

It seems that the CI doesn't pass because of some cmake/setup.py errors. Will look into it.

@royjacobson royjacobson force-pushed the pybind11_static_lib branch from ab34ae1 to 9ddbd1e Compare June 13, 2022 14:32
@rwgk
Copy link
Collaborator

rwgk commented Jun 14, 2022

I'm sorry but it is very difficult for me to support this PR. I don't want to stand in the way, but from my perspective, a serious cleanup of the file organization has to come first. What we have at the moment is very far at the "uncontrolled organic growth" side of a spectrum that has "very well organized" at the other end. If we (more-or-less) double the number of source code files by splitting up the headers, we're effectively cementing the current file organization: If we don't have the ability to properly organize now, chances are we're stuck forever, after making that job even harder.

From my perspective, representing Google's needs, the smart_holder functionality is a critical requirement. For large-scale deployment, systematic UB (#1138) is not a viable solution. But what is currently on the smart_holder branch has a serious drawback, the dual mode: smart_holder as add-on, or default. Fixing that will require going into the internals. For that to happen in a clean & maintainable way, the general file organization must be improved.

Talking about dual mode: if we officially support header-only and precompiled, we have to double the number of CI jobs, to test everything in both modes.

Combined with the current smart_holder dual mode, I'd be looking at 4 x, or compromises (usually messy).

If we cleanly refactor master first, before splitting up everything, maybe I can build future smart_holder work on that. If the split comes first, I have serious doubts I'll be able to do that.

The question then is, who has the ability and free bandwidth to cleanly refactor master, as a solid basis for future developments?

@royjacobson
Copy link
Author

Hi @rwgk, thank you for the feedback!

If I understand you correctly, then a relatively large refactor is needed in order to cleanly merge the smart holder branch. Is this something you or someone else is planning on doing? Because if not, it sounds like the smart holder branch will not be merged anytime soon, making it functionally equivalent to a (closely following) fork.

I can't really help with that problem, but obviously I wouldn't want to create problems for the development of the smart holders branch. This is why I haven't actually done the header/inline separation for any file except for detail/common.h: to create the least amount of conflicts as possible. Then the actual splitting of implementations could wait for the upstreaming of the smart holders.

IMHO building pybind11 as a library is a pretty important feature for large-scale users and has been requested a lot in the past, as the compile times of pybind11 can get very large. It is also something that could help pybind11 get more module friendly, with c++modules slowly getting production ready.

@rwgk
Copy link
Collaborator

rwgk commented Jun 14, 2022

A bit of a tangent: the smart_holder branch could be merged as-is, it's an add-on, but it would double the number of CI jobs for full testing, which always stopped me from making a move. With respect to this PR, say we're merging smart_holder, then split the files for the static library, refactoring for future developments (fully baking in the cooperating unique_ptr/shared_ptr & disowning functionality) would be significantly harder.

It is also something that could help pybind11 get more module friendly, with c++modules slowly getting production ready.

That something I was wondering about, with a lot of uncertainty, because I have no experience actually working with C++ modules: does splitting the header files make it easier or harder to transition pybind11 to modules? My completely naive guess was: harder, because IIUC modules do away with the .h, .cpp split. Is that a wrong guess?

@royjacobson
Copy link
Author

A bit of a tangent: the smart_holder branch could be merged as-is, it's an add-on, but it would double the number of CI jobs for full testing, which always stopped me from making a move. With respect to this PR, say we're merging smart_holder, then split the files for the static library, refactoring for future developments (fully baking in the cooperating unique_ptr/shared_ptr & disowning functionality) would be significantly harder.

I see. The CI issue is unfortunate.

IMHO splitting the implementation shouldn't make future refactors that much harder - it doesn't duplicate the code, just splits some parts of it to other files. I don't think that makes future development much harder than the actual work - but again I'm also not very familiar with the current code structure and the needed changes.

I'm not sure what's your decision making process in this project - so if you discuss this and reach a consensus about how to proceed please let me know 😄.

That something I was wondering about, with a lot of uncertainty, because I have no experience actually working with C++ modules: does splitting the header files make it easier or harder to transition pybind11 to modules? My completely naive guess was: harder, because IIUC modules do away with the .h, .cpp split. Is that a wrong guess?

I don't have a lot of experience myself, but modules make explicit what declarations and definitions get exported and what doesn't. So having this made explicit in the source is nice and makes it much easier to export just the declarations in the future.
In theory modules could make headers absolute, but they are flexible enough to support code bases with headers (fmtlib do it that way). But I don't really expect many libraries to go full 'no headers' mode in the foreseeable future because of the slow adoption.

@rwgk
Copy link
Collaborator

rwgk commented Jun 15, 2022

I'm not sure what's your decision making process in this project - so if you discuss this and reach a consensus about how to proceed please let me know 😄.

If @henryiii & @Skylion007 want to go ahead I will try to follow with the smart_holder branch. I'm not sure I'll have the free energy to keep up, but maybe I'm too worried. We'll see.

If we came together to reorg the code into more reusable units first, that would be an investment in time that I could more easily explain to my management. — I realize my situation is unusual: we (Google) have a distributed build system, the static library gain for us will be minimal, or maybe even negative when also taking human time into account, because of friction due to increased complexity.

@royjacobson
Copy link
Author

@Skylion007 @henryiii, I'm waiting on an answer from you whether you still support this given @rwgk's objections.

There seem to be a problem with the windows tests but I put it on hold for the moment.

@rwgk
Copy link
Collaborator

rwgk commented Jul 3, 2022

@Skylion007 @henryiii, I'm waiting on an answer from you whether you still support this given @rwgk's objections.

It's best to look at my situation solely as "I won't be able to participate in this effort, and might not be able to follow with the smart_holder branch".

I'm thinking it comes down to: if @wjakob supports this step, please go ahead.

@henryiii
Copy link
Collaborator

henryiii commented Jul 5, 2022

I think we all had decided to support this some time ago, it was just hard to do with many inflight changes also happening. Now we are are down to one (a big one, smart_holder), but that's not as bad as it used to be. I was still hoping to get smart_holder in before making a big change to avoid double work, but I've been hoping that since last year, IIRC. So maybe it's best just to move ahead.

The more we can break this up in to incremental stand-alone chucks, the better. Reorganizing the code as well as making parts pre-compilable.

@rwgk
Copy link
Collaborator

rwgk commented Jul 15, 2022

Slightly random, but the last couple weeks I familiarized myself more with

  • linking all pybind11 extensions statically (with core Python)
  • -fvisibility=hidden
  • __attribute__((visibility("hidden")))

Seeing static library here in the title of the PR, I'm wondering, how will this play with

  • the module-local features of pybind11?
  • hiding the entire pybind11 namespace in general?
  • ODR: NDEBUG, PYBIND11_DETAILED_ERROR_MESSAGES (and maybe other similar macros, not sure what will need attention).

I'm thinking the static library needs to be versioned by not just the pybind11 version number, but also by those macros, and/or even everything that goes into PYBIND11_INTERNALS_ID and PYBIND11_MODULE_LOCAL_ID. Is that part of the plan already?

@royjacobson royjacobson force-pushed the pybind11_static_lib branch 2 times, most recently from 03a4749 to 7ff8dcf Compare September 3, 2022 21:55
…s a static library and linking against it.

This consists of the following changes:

1. CMake infrastructure to support exporting and installing this target, with an option to disable it.
2. Defining some new macros to support this dual mode compilation.
3. Creating one .cc file that is compiled into the new static library. The rest of the code could be refactored in subsequent PRs.
4. Creating tests that compiling against the new target works.

Co-authored-by: Ciro Santilli <[email protected]>
@royjacobson
Copy link
Author

Slightly random, but the last couple weeks I familiarized myself more with

* linking all pybind11 extensions statically (with core Python)

* `-fvisibility=hidden`

* `__attribute__((visibility("hidden")))`

Seeing static library here in the title of the PR, I'm wondering, how will this play with

* the module-local features of pybind11?

* hiding the entire pybind11 namespace in general?

* ODR: `NDEBUG`, `PYBIND11_DETAILED_ERROR_MESSAGES` (and maybe other similar macros, not sure what will need attention).

I'm thinking the static library needs to be versioned by not just the pybind11 version number, but also by those macros, and/or even everything that goes into PYBIND11_INTERNALS_ID and PYBIND11_MODULE_LOCAL_ID. Is that part of the plan already?

Those are great questions. The main use case for using this right now is as a vendored subdirectory in a CMake project that builds a dynamic extension. I certainly wouldn't want system package managers or pypi to distribute binary artifacts, IMHO that's a terrible idea for C++ libraries.

And between different libraries loaded into the same process I think we have hidden visibility already in the pybind11 namespace macro so I think that should work fine?

@rwgk
Copy link
Collaborator

rwgk commented Sep 4, 2022

I certainly wouldn't want system package managers or pypi to distribute binary artifacts, IMHO that's a terrible idea for C++ libraries.

I see, thanks for the explanation. (From past experience: I fear that's exactly what some people will try anyway, but if our stance is "That's your own problem" I'm fine with it.)

And between different libraries loaded into the same process I think we have hidden visibility already in the pybind11 namespace macro so I think that should work fine?

I think you're right. (Assuming my current limited understanding isn't too far off.)

Something completely different: Could you please not force push, except maybe for rebase on master? Otherwise it's very difficult to see what changed between CI runs. Just keep adding commits. It's very easy to get the view of the accumulated diffs (just click on Files changed), and when PRs are merged the commits are squashed.

royjacobson and others added 2 commits September 4, 2022 21:46
…to work.

* Fix typo in pyproject.toml and disable static library in Clang-Tidy CI pass.
@royjacobson
Copy link
Author

I see, thanks for the explanation. (From past experience: I fear that's exactly what some people will try anyway, but if our stance is "That's your own problem" I'm fine with it.)

Currently, after battling the Windows CI I gave up on the 'installation' part. Someone who knows CMake better than me could probably make this work, I guess, but maybe it's just not worth the effort given the potential problems and I hope it could unblock this PR. (And I admit that as I usually use bazel, I'm less invested in getting it to work).

Something completely different: Could you please not force push, except maybe for rebase on master? Otherwise it's very difficult to see what changed between CI runs. Just keep adding commits. It's very easy to get the view of the accumulated diffs (just click on Files changed), and when PRs are merged the commits are squashed.

Sure, sorry!

@royjacobson
Copy link
Author

@henryiii I gave up on the CMake installation code and fixed the remaining CI failures, so I think this is (finally) ready for review.

@gauenk
Copy link

gauenk commented Sep 10, 2022

Potentially stupid question: will this PR fix the really gross symbols in pybind11? I use "nm -D" on my .so file and I get all sorts of strange symbols surrounding my function calls. I want to read the .so file using FFI to integrate my torch code with jax, and these odd symbols cause problems.

I have a hack for now, but I am merely curious if this PR is related to my concern. I appreciate your thoughts or redirection to a more appropriate place to ask this question.

@rwgk
Copy link
Collaborator

rwgk commented Sep 10, 2022

will this PR fix the really gross symbols in pybind11?

Could you please provide some examples of what you consider gross?

My guess, without knowing what you have in mind: probably not.

I want to read the .so file using FFI

At first glance it seems counter-intuitive to combine pybind11 and FFI. Again, without knowing anything about your context, I'd guess that building an .so just for your library (i.e. completely independent from pybind11) would be the way to go.

@gauenk
Copy link

gauenk commented Sep 10, 2022

will this PR fix the really gross symbols in pybind11?

Could you please provide some examples of what you consider gross?

For example, I define a function void simple_test(void*, void*). When compiled with pybind11, the output from nm -D ./path/to/my.so | grep simple returns _Z11simple_testPvS_. As another example, I take a function named prod_search_with_heads_forward and it has a symbol _Z30prod_search_with_heads_forwardN2at6TensorES0_S0_S0_S0_S0_iiiiiiiiiiiiiiiiiibbbbbS0_S0_S0.

I want to read the .so file using FFI

At first glance it seems counter-intuitive to combine pybind11 and FFI. Again, without knowing anything about your context, I'd guess that building an .so just for your library (i.e. completely independent from pybind11) would be the way to go.

I wanted to compile only a single ".so" file for two purposes: interfacing with Pytorch and Jax. Pytorch reads the pybind11, but Jax requires interfacing with XLA which requires a boilerplate c++ function signature. To add this boilerplate, I wanted to create a c++ function inside of Python to programmatically add this for my library [read: way less for me]. However, compiling the c++-wrapper inside of Python using something like Numba will not work with Pybind11 functions [working numba example here]. As a work around, I am reading the functions in with FFI since this will compile. However, to do this I need to use the symbol names with these strange prefix/suffixes.

I am a bit self-conscious this isn't the place to ask this question. I am happy to jump off this PR thread no problem if you like 😄

@rwgk
Copy link
Collaborator

rwgk commented Sep 10, 2022

For example, I define a function void simple(void*, void*). When compiled with pybind11, the output from nm -D ./path/to/my.so | grep simple returns _Z11simple_testPvS_. As another example, I take a function named prod_search_with_heads_forward and it has a symbol _Z30prod_search_with_heads_forwardN2at6TensorES0_S0_S0_S0_S0_iiiiiiiiiiiiiiiiiibbbbbS0_S0_S0.

What you're seeing there is known as name mangling: https://en.wikipedia.org/wiki/Name_mangling

All C++ functions will have mangled names. You'd have to declare them with extern "C" to get the symbols you seem to be expecting:

$ cat mangled_name_or_not.cc
void simple(void*, void*) {}
extern "C" void actually_simple(void*, void*) {}
$ clang++ -fPIC -std=c++11 -c mangled_name_or_not.cc
$ clang++ -o mangled_name_or_not.so -shared mangled_name_or_not.o
$ nm -D mangled_name_or_not.so | grep simple
0000000000001110 T actually_simple
0000000000001100 T _Z6simplePvS_

To come back to your original question: No. This PR will not change the type of symbols (name mangled) that you're seeing.

I wanted to compile only a single ".so" file for two purposes:

I strongly recommend against it. It's almost certainly bad engineering because it'll create more work than you're hoping to avoid.

@gauenk
Copy link

gauenk commented Sep 10, 2022

Wow you rock! Thank you for the detailed answer. I totally forgot about the extern necessity. I can see now how this PR is unrelated to what I was asking -- how embarrassing for me.

I appreciate your direction here.

@royjacobson
Copy link
Author

@henryiii @Skylion007 ping

@henryiii
Copy link
Collaborator

I had dual conferences last week (a problem with virtual conferences), and I just started teaching a class. Once I get a bit more caught up on writing material, and once we get a patch release out fixing a few important bugs in pybind11, I think we should focus on this. I'm also about to release CLI11 with contributed pre-compile support; might be interesting to compare.

@rwgk
Copy link
Collaborator

rwgk commented Sep 19, 2022

@royjacobson do you have build time comparisons for a couple big projects, with and without this PR?

I forgot where I wrote this before, but splitting up the sources amounts to a permanent tax on development velocity in the future. What are we getting for that tax? From memory, on other PRs I've seen "substantial" or similar, I think that's not sufficient, concrete measurements are important.

FWIW: The pybind11 unit tests are probably not a good test case for timings, with defaults (-Os + -flto most of the time goes into linking).

@cirosantilli2
Copy link
Contributor

cirosantilli2 commented Sep 19, 2022

@royjacobson do you have build time comparisons for a couple big projects, with and without this PR?

I forgot where I wrote this before, but splitting up the sources amounts to a permanent tax on development velocity in the future. What are we getting for that tax? From memory, on other PRs I've seen "substantial" or similar, I think that's not sufficient, concrete measurements are important.

FWIW: The pybind11 unit tests are probably not a good test case for timings, with defaults (-Os + -flto most of the time goes into linking).

The split sped up build of the gem5 project by 40% from 25 minutes to 15 minutes, reproduction details at: https://gem5.atlassian.net/browse/GEM5-572

@henryiii
Copy link
Collaborator

henryiii commented Sep 19, 2022

permanent tax on development velocity

I don't know if this is true. Splitting between header and source isn't necessarily that much of a tax - it's how C++ was originally developed - and we might gain some really nice simplicity for it. One thing we have to do today that is very painful is pre-declare some things that should be coming from headers, but can't because they'd be cyclical because we force implementations into headers too. If this was split up, some of those ugly pre-declarations will simply go away; we can now include what we want from the headers without getting implementations, too.

And, of course, if our tests build faster, then that's a boon to development velocity too. But I'm thinking even without that, it might be at least a trade-off.

@henryiii
Copy link
Collaborator

The main thing I'd like to ensure is that we can support C++20/23 modules well, but I'd assume that's easier or similar with this change.

@rwgk
Copy link
Collaborator

rwgk commented Sep 19, 2022

The split sped up build of the gem5 project by 40% from 25 minutes to 15 minutes, reproduction details at: https://gem5.atlassian.net/browse/GEM5-572

Wow, nice writeup. I didn't look at the linked information, but just the main page looks convincing.

What's your opinion regarding the point I made earlier?

#4001 (comment)

but from my perspective, a serious cleanup of the file organization has to come first.

Do you feel you could help with that?

@rwgk
Copy link
Collaborator

rwgk commented Sep 19, 2022

it's how C++ was originally developed

C++ 20 modules finally go the other way, thanks god. This 1970 punch-card era approach is just terrible, 50 years later.

@cirosantilli2
Copy link
Contributor

but from my perspective, a serious cleanup of the file organization has to come first.

Do you feel you could help with that?

I'm afraid I stopped working on gem5 and don't have much time for it now :-(

Let's check with the people who I think are still working on the as they also seemed quite interested in this: CC. @giactra and @gabemblack

Do you have a more specific description of the refactoring being proposed?

@rwgk
Copy link
Collaborator

rwgk commented Sep 19, 2022

Do you have a more specific description of the refactoring being proposed?

Nothing specific, I think it comes down to a large number of case-by-case judgements:

  • Organize into subdirs (e.g. pytypes are currently lumped into one big file, a directory would make more sense; same for stl.h, stl_binders.h, probably a bunch more, pybind11.h, cast.h are real fur balls). A great role model to use as reference: the Boost.Python file organization.

  • Try not to break too many things by (a) making meaningful compromises and (b) adding backward compatibility headers.

I think it'll take a couple people with a lot of experience having done similar things before. I nominate myself, but having someone else actively working on it, with mutual reviews, would be ideal. Estimated effort is "a very intense couple weeks." To not let this slip into an eternal (dying) project, I think it's important that the second person can commit ~full attention until it's Done (released) in one intense push.

@gabemblack
Copy link

gabemblack commented Sep 19, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants