-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ImportError: generic_type: cannot initialize type "StatusCode": an object with that name is already defined #12
Comments
This error
is raised by pybind11 when We have this in pybind11_abseil (
That should only get called the I'd use a debugger to find out from where this line of code is reached: Swap the import order to get a backtrace for both locations. If using a debugger is impractical, I often simply insert |
I manage to get some stacktrace for the two places. pybind11_abseil 2c4932e this version is before the pybind11 2.10.4 Then I set a breakpoint on initialize. Importing TensorFlow first gets us
Import Launchpad first
Some other observations
|
Could it be that Could you try to print that out?
Reason for asking the question: You could also add prints there. |
inline bool IsStatusModuleImported() {
std::cout << "Type index" << std::type_index(typeid(absl::Status)).hash_code() << std::endl;
std::cout << "Local type info" << detail::get_local_type_info(typeid(absl::Status)) << std::endl;
std::cout << "Global type info "<< detail::get_global_type_info(typeid(absl::Status)) << std::endl;
return detail::get_type_info(typeid(absl::Status));
}
I won't be able to compile TF from source unfortunately.. If possible, would you mind taking a look? I tried to extract the issue from the original repo as much as possible. |
Oh boy ... binary incompatibility I'd guess then, but how? I'm really swamped, and digging into a situation that brings in prebuilt binaries could go anywhere. Could you please try more at your end? What puzzles me: if there is indeed a binary incompatibility, I'd expect the TF modules and launchpad to be isolated from each other more or less completely. A key piece of the mechanism behind that is here: Could you please try to build launchpad with Then that will live in its own universe for sure. It might break interop you need, but try to ignore that. We only want to know: do the two imports work in any order? Also, I'd definitely insert prints in (same link as before) to get a better idea what's happening. I'd try to print out the typeid name somehow. |
I will give it a try. Thanks! Maybe another data point on this. I just tried using latest version of pybind11 + absl combo found from this repo and that actually makes the import error go away (at least if I import tensorflow first). I couldn't get the reverse to work out as it complains about not finding pybind11_abseil module, but I suppose that is because I didn't put the status.so in my data dependency. |
Sorry my C++ and Bazel is bad, but is it not calling with
Update
|
Got the typeid name, it's |
Ouch. This is beginning to look very tricky. Maybe it's something simple, but I don't see it. I'm beginning to wonder: could there be two That's just a wild guess, can you look? I think look for Alternatively, import only tensorflow, then look in sys.modules, import only launchpad, again look in sys.modules. Something like |
Oh, that's another potential issue: maybe the two packages use different abseil-cpp versions? |
I got the absl-cpp version for TF 2.14.0 from https://github.com/tensorflow/tensorflow/blob/v2.14.0/third_party/absl/workspace.bzl I don't see any pybind11_abseil.status module in either TF or launchpad. Looking into my sitepackages I don't see those either. I think both launchpad and TF goes through the old |
pybind11_abseil/pybind11_abseil/import_status_module.cc Lines 13 to 16 in 4b883e4
I disabled that a more than year ago, in September 2022: What pybind11_abseil sources where used when TF was built? What pybind11_abseil sources does launchpad use? I never really understood the bypass mechanism, therefore cannot really tell what could go wrong. I was glad to get rid of it. I thought. |
2c4932e (ref https://github.com/tensorflow/tensorflow/blob/v2.14.0/third_party/pybind11_abseil/workspace.bzl#L10-L13)
1bb411e (ref https://github.com/google-deepmind/launchpad/blob/3b28eaed02c4294197b9ca2b8988cf68d8b5d868/WORKSPACE#L31-L34) I think the inconsistency in the pybind11_abseil does not matter as I can reproduce the issue with the TF version in my repro repo. The commit they use also predates the deprecation of the value (and a switch of the default). Looking closer at the diff of https://github.com/tensorflow/tensorflow/blob/v2.14.0/tensorflow/python/util/tf_stack.cc Between v2.14.0 and v2.13.1, the difference is that 2.14.0 starts using pybind11_abseil. There is usage of that in v2.13.1, but those usage never seems to be part of imports in the Python package, I think we can say that this issue has been there for a while :) Looking at your test cases in this repo, I see that to test depends on pybind11_abseil:status.so.
|
Away from keyboard
Qq
Can you build TF also from sources?
I believe: even if that takes you a few hours, it'll still be faster than
debugging a frankenbuild with outdated sources.
…On Fri, Oct 20, 2023 at 17:17 Yicheng Luo ***@***.***> wrote:
What pybind11_abseil sources where used when TF was built?
2c4932e
<2c4932e>
(ref
https://github.com/tensorflow/tensorflow/blob/v2.14.0/third_party/pybind11_abseil/workspace.bzl#L10-L13
)
What pybind11_abseil sources does launchpad use?
1bb411e
<1bb411e>
(ref
https://github.com/google-deepmind/launchpad/blob/3b28eaed02c4294197b9ca2b8988cf68d8b5d868/WORKSPACE#L31-L34
)
I think the inconsistency in the pybind11_abseil does not matter as I can
reproduce the issue with the TF version in my repro repo. The commit they
use also predates the deprecation of the value (and a switch of the
default).
Looking closer at the diff of
https://github.com/tensorflow/tensorflow/blob/v2.14.0/tensorflow/python/util/tf_stack.cc
This is the import that gives ImportError if we import TF after
Between v2.14.0 and v2.13.1, the difference is that 2.14.0 starts using
pybind11_abseil.
Looking at your test cases in this repo, I see that to test depends on
pybind11_abseil:status.so.
I understand this is required because we need to go through the regular
python imports so the status module need to be in sys.path. However,
neither TensorFlow nor launchpad distributes status.so and I think both is
depending on the branch which is the default in 1bb411e
<1bb411e>
if (bypass_regular_import) {
auto m = reinterpret_borrow<module_>(PyImport_AddModule(
PYBIND11_TOSTRING(PYBIND11_ABSEIL_STATUS_MODULE_PATH)));
if (!internal::IsStatusModuleImported()) {
internal::RegisterStatusBindings(m);
}
// else no-op because bindings are already loaded.
return m;
}
—
Reply to this email directly, view it on GitHub
<#12 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAFUZAGIN5A7VZ52GTSAMYLYAMIBDAVCNFSM6AAAAAA6ICKLBOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONZTGU2DQMRZGA>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
I will dig around :) Enjoy your weekend |
This is a rabbit hole.. Built from source TF v2.14.0, I now get imports to work (so we do not throw ImportError in my repro repo). However, I tried integrating my build into the launchpad repo. If I import launchpad before TensorFlow then things would work ok. But if I import TF first and then Launchpad I can't pass the unit test and it's saying:
The type_name does match though in either case. Building from source I got |
That looks like something I missed in a super-sized series of changes about a year ago. Could you please try this change: - exc = StatusNotOk(s.message())
+ exc = StatusNotOk(s)
Hm... that sounds like you still have two versions of
These are completely different. Did you update one or the other to use the same version? Really, pybind11_abseil should only be built once. I'm not familiar at all with how TF or Launchpad use bazel, and how to make both depend on the same sources + build artifacts. |
I pointed the owners of courier/python/client.py here. I'm hoping they'll update to a more modern version of pybind11_abseil. I'm also hoping they'll find a way to ensure TF and Launchpad are using the exact same version. |
Thanks!. Yeah updating the call and fix version fixes the error above. I guess the confusing part now is why things work if I built TF 2.14.0 from the source but not with their pre-built wheels. For testing, I actually built TF with the exact checkout from their tag and got it working (since I use the same version for launchpad testing anyway). |
I suspect this is a compiler thing. I just tried setting CC=clang and it seems that the error goes away. |
I'm glad you found a path that works.
That makes me suspect that the bypass import mechanism somehow subverts the ABI isolation mechanisms I pointed out before (pybind11/detail/internals.h). I wouldn't want to root-cause that. IMO the only good use of our time is to work towards
|
Thanks. I think I will probably want to fix this upstream in launchpad or tensorflow. Building and maintaining bazel project that depends on TensorFlow and is manylinux is quite some work so I hope this gets fixed upstream :)
I worked a bit on trying to get the later to work but apparently, it's quite difficult to set up a CC toolchain as used by TF. Maybe I will seek some help from the Launchpad authors or some other repos that use pre-built tensorflow as a dependency to create a clang toolchain to work with this. |
I ran this by @tkoeppe, he pointed out:
I think that's very consistent with our observations here. |
Yeah, unfortunately keeping in sync with TF's build dependency is just a lot for many downstream projects. I don't know how to best fix this either. Lucky for me, I hijacked a copy of TF's manylinux build toolchain configuration with clang and that seems to allow me to build manylinux wheels successfully. I am using that to work around issues with LaunchPad. There is a sibling project Reverb https://github.com/google-deepmind/reverb/tree/master/reverb which will likely have the same issue and I have created a PR with the fix. |
A proper "fix" would require (re)architecting a build system that provides a correct implementation of Python native extensions. This is not an easy problem. A slightly cleaner "hack" for the meantime might be to just control symbol visibility for Python module DSOs better. If the (duplicately linked) Abseil dependencies in each module are not exported, there's less chance of a mismatch at runtime. |
Hi,
I am trying to figure out some issues with using the pybind11_abseil.status usage. TensorFlow as well as some other Google projects are using the pybind11_abseil modules. I found that as of TensorFlow 2.14.0 (2.13.1 is fine), it looks like that using TensorFlow in conjunction with packages that use pybind11_abseil will fail with the error
This happens for example in google-deepmind/launchpad#44 and tensorflow/tflite-support#954. The error happens when a pybind11 extension that uses
py::google::ImportStatusModule()
is called.For example
Reversing the order we will see the same issues this time coming from TensorFlow trying to import a module that uses
ImportStatusModule.
I noticed that there are some changes to the pybind11_abseil version used in TensorFlow 2.14.0, which incoporates some changes around the import_status_module, but I don't have a really good clue about what that change entails, it would be quite useful if you can provide some help.
Thanks!
The text was updated successfully, but these errors were encountered: