This repository was archived by the owner on Nov 5, 2023. It is now read-only.

SIGSEGV on exit #52

Closed
virtuald opened this issue Jan 1, 2019 · 3 comments

virtuald commented Jan 1, 2019

Currently we release the GIL when calling CS_Shutdown, and sometimes this happens:

(gdb) bt
#0  0x00007ffff7d14232 in  () at /lib64/libpython3.7m.so.1.0
#1  0x00007ffff7da637e in  () at /lib64/libpython3.7m.so.1.0
#2  0x00007fffea629237 in std::_Function_base::_Base_manager<pybind11::detail::type_caster<std::function<void (cs::VideoEvent const&)>, void>::load(pybind11::handle, bool)::{lambda(cs::VideoEvent const&)#1}>::_M_manager(std::_Any_data&, std::_Function_base::_Base_manager<pybind11::detail::type_caster<std::function<void (cs::VideoEvent const&)>, void>::load(pybind11::handle, bool)::{lambda(cs::VideoEvent const&)#1}> const&, std::_Manager_operation) (__dest=..., __source=..., __op=4294967293)
    at /mnt/sdb1/virtuald_dot/virtualenvs/frc/include/site/python3.7/pybind11/pytypes.h:165
#3  0x00007fffea61cd0c in std::_Function_base::_Base_manager<cs::VideoListener::VideoListener(std::function<void (cs::VideoEvent const&)>, int, bool)::{lambda(cs::RawEvent const&)#1}>::_M_manager(std::_Any_data&, std::_Function_base::_Base_manager<cs::VideoListener::VideoListener(std::function<void (cs::VideoEvent const&)>, int, bool)::{lambda(cs::RawEvent const&)#1}> const&, std::_Manager_operation)
    (__dest=..., __source=..., __op=4294967293) at /usr/include/c++/8/bits/std_function.h:257
#4  0x00007fffea630287 in cs::Notifier::Thread::__dt_base() () at /usr/include/c++/8/bits/std_function.h:257
#5  0x00007fffea627957 in std::_Sp_counted_base::_M_release (this=0x7fffd8115f60)
    at /usr/include/c++/8/bits/shared_ptr_base.h:155
#6  0x00007fffea627957 in std::_Sp_counted_base::_M_release() (this=0x7fffd8115f60)
    at /usr/include/c++/8/bits/shared_ptr_base.h:148
#7  0x00007fffea641d6d in wpi::detail::SafeThreadOwnerBase::Stop() (this=0x555555a254f8)
    at /usr/include/c++/8/bits/shared_ptr_base.h:706
#8  0x00007fffea6b5bde in __lambda78::_FUN(void*) ()
    at cscore_src/cscore/src/main/native/cpp/Notifier.cpp:100
#9  0x00007ffff7cf5414 in  () at /lib64/libpython3.7m.so.1.0
#10 0x00007ffff7d13f1f in  () at /lib64/libpython3.7m.so.1.0
#11 0x00007ffff7da6770 in  () at /lib64/libpython3.7m.so.1.0
#12 0x00007ffff7d6cd67 in PyDict_SetItem () at /lib64/libpython3.7m.so.1.0
#13 0x00007ffff7daa50e in _PyModule_ClearDict () at /lib64/libpython3.7m.so.1.0
#14 0x00007ffff7df0d09 in PyImport_Cleanup () at /lib64/libpython3.7m.so.1.0
#15 0x00007ffff7e57f68 in Py_FinalizeEx () at /lib64/libpython3.7m.so.1.0
#16 0x00007ffff7e5a604 in  () at /lib64/libpython3.7m.so.1.0
#17 0x00007ffff7e5abdc in _Py_UnixMain () at /lib64/libpython3.7m.so.1.0
#18 0x00007ffff78c8413 in __libc_start_main () at /lib64/libc.so.6
#19 0x000055555555508e in _start ()

The error is on a Py_XDECREF, so that's probably related to releasing the GIL?

I release the GIL, and it seems that pybind11 has captured Python references inside the std::function objects, so cscore would need to clear them on exit.

(gdb) bt
#0  0x00007ffff7d14232 in  () at /lib64/libpython3.7m.so.1.0
#1  0x00007ffff7da637e in  () at /lib64/libpython3.7m.so.1.0
#2  0x00007fffea629087 in std::_Function_base::_Base_manager<pybind11::detail::type_caster<std::function<void (cs::VideoEvent const&)>, void>::load(pybind11::handle, bool)::{lambda(cs::VideoEvent const&)#1}>::_M_manager(std::_Any_data&, std::_Function_base::_Base_manager<pybind11::detail::type_caster<std::function<void (cs::VideoEvent const&)>, void>::load(pybind11::handle, bool)::{lambda(cs::VideoEvent const&)#1}> const&, std::_Manager_operation) (__dest=..., __source=..., __op=4294967293)
    at /mnt/sdb1/virtuald_dot/virtualenvs/frc/include/site/python3.7/pybind11/pytypes.h:165
#3  0x00007fffea61ccbc in std::_Function_base::_Base_manager<cs::VideoListener::VideoListener(std::function<void (cs::VideoEvent const&)>, int, bool)::{lambda(cs::RawEvent const&)#1}>::_M_manager(std::_Any_data&, std::_Function_base::_Base_manager<cs::VideoListener::VideoListener(std::function<void (cs::VideoEvent const&)>, int, bool)::{lambda(cs::RawEvent const&)#1}> const&, std::_Manager_operation)
    (__dest=..., __source=..., __op=4294967293) at /usr/include/c++/8/bits/std_function.h:257
#4  0x00007fffea630277 in cs::Notifier::Thread::__dt_base() () at /usr/include/c++/8/bits/std_function.h:257
#5  0x00007fffea63dd88 in std::thread::_State_impl::__dt_base ()
    at /usr/include/c++/8/bits/shared_ptr_base.h:155
#6  0x00007fffea63dd88 in std::thread::_State_impl::__dt_del() () at /usr/include/c++/8/thread:188
#7  0x00007fffe60e694c in  () at /lib64/libstdc++.so.6
#8  0x00007ffff7c0158e in start_thread () at /lib64/libpthread.so.0
#9  0x00007ffff79a16a3 in clone () at /lib64/libc.so.6

virtuald commented Jan 1, 2019

Fixed the Stop function to clean up after itself, but here's another weird stack trace (with optimizations disabled):

(gdb) bt
#0  0x00007ffff7d14232 in  () at /lib64/libpython3.7m.so.1.0
#1  0x00007ffff7da637e in  () at /lib64/libpython3.7m.so.1.0
#2  0x00007fffea39d1af in pybind11::handle::dec_ref() const & (this=<optimized out>)
    at /mnt/sdb1/virtuald_dot/virtualenvs/frc/include/site/python3.7/pybind11/pytypes.h:165
#3  0x00007fffea39d274 in pybind11::object::~object() (this=<optimized out>)
    at /mnt/sdb1/virtuald_dot/virtualenvs/frc/include/site/python3.7/pybind11/pytypes.h:208
#4  0x00007fffea3a6798 in pybind11::function::~function() (this=<optimized out>)
    at /mnt/sdb1/virtuald_dot/virtualenvs/frc/include/site/python3.7/pybind11/pytypes.h:1212
#5  0x00007fffea469f4e in pybind11::detail::type_caster<std::function<void (cs::VideoEvent const&)>, void>::load(pybind11::handle, bool)::{lambda(cs::VideoEvent const&)#1}::~handle() ()
    at /mnt/sdb1/virtuald_dot/virtualenvs/frc/include/site/python3.7/pybind11/functional.h:57
#6  0x00007fffea46f002 in std::_Function_base::_Base_manager<pybind11::detail::type_caster<std::function<void (cs::VideoEvent const&)>, void>::load(pybind11::handle, bool)::{lambda(cs::VideoEvent const&)#1}>::_M_destroy(std::_Any_data&, std::integral_constant<bool, false>) (__victim=...)
    at /usr/include/c++/8/bits/std_function.h:188
#7  0x00007fffea46e97f in std::_Function_base::_Base_manager<pybind11::detail::type_caster<std::function<void (cs::VideoEvent const&)>, void>::load(pybind11::handle, bool)::{lambda(cs::VideoEvent const&)#1}>::_M_manager(std::_Any_data&, std::_Function_base::_Base_manager<pybind11::detail::type_caster<std::function<void (cs::VideoEvent const&)>, void>::load(pybind11::handle, bool)::{lambda(cs::VideoEvent const&)#1}> const&, std::_Manager_operation) (__dest=..., __source=..., __op=<optimized out>) at /usr/include/c++/8/bits/std_function.h:212
#8  0x00007fffea3ab5fd in std::_Function_base::~_Function_base() (this=<optimized out>)
    at /usr/include/c++/8/bits/std_function.h:257
#9  0x00007fffea3adfca in std::function<void (cs::VideoEvent const&)>::~function() (this=<optimized out>)
    at /usr/include/c++/8/bits/std_function.h:370
#10 0x00007fffea3adfe4 in cs::VideoListener::VideoListener(std::function<void (cs::VideoEvent const&)>, int, bool)::{lambda(cs::RawEvent const&)#1}::~RawEvent() ()
    at cscore_src/cscore/src/main/native/include/cscore_oo.inl:617
#11 0x00007fffea3ea6a1 in std::_Function_base::_Base_manager<cs::VideoListener::VideoListener(std::function<void (cs::VideoEvent const&)>, int, bool)::{lambda(cs::RawEvent const&)#1}>::_M_destroy(std::_Any_data&, std::integral_constant<bool, false>) (__victim=...) at /usr/include/c++/8/bits/std_function.h:188
#12 0x00007fffea3d064d in std::_Function_base::_Base_manager<cs::VideoListener::VideoListener(std::function<void (cs::VideoEvent const&)>, int, bool)::{lambda(cs::RawEvent const&)#1}>::_M_manager(std::_Any_data&, std::_Function_base::_Base_manager<cs::VideoListener::VideoListener(std::function<void (cs::VideoEvent const&)>, int, bool)::{lambda(cs::RawEvent const&)#1}> const&, std::_Manager_operation)
    (__dest=..., __source=..., __op=<optimized out>) at /usr/include/c++/8/bits/std_function.h:212
#13 0x00007fffea3ab5fd in std::_Function_base::~_Function_base() (this=<optimized out>)
    at /usr/include/c++/8/bits/std_function.h:257
#14 0x00007fffea3ae024 in std::function<void (cs::RawEvent const&)>::~function() (this=<optimized out>)
    at /usr/include/c++/8/bits/std_function.h:370
#15 0x00007fffea482314 in cs::Notifier::Thread::Main() (this=<optimized out>)
    at cscore_src/cscore/src/main/native/cpp/Notifier.cpp:143
#16 0x00007fffea4ed848 in wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThread>)::{lambda()#1}::operator()() const () at cscore_src/wpiutil/src/main/native/cpp/SafeThread.cpp:34
#17 0x00007fffea4ee74a in std::__invoke_impl<void, wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThread>)::<lambda()> >(void) (__f=...) at /usr/include/c++/8/bits/invoke.h:60
#18 0x00007fffea4ee0b8 in std::__invoke<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThread>)::<lambda()> >(void) (__fn=...) at /usr/include/c++/8/bits/invoke.h:95
#19 0x00007fffea4eeef6 in std::_M_invoke<0>() (this=<optimized out>) at /usr/include/c++/8/thread:244
#20 0x00007fffea4eeeb7 in std::operator()() (this=<optimized out>) at /usr/include/c++/8/thread:253
#21 0x00007fffea4eee8e in std::_M_run() (this=<optimized out>) at /usr/include/c++/8/thread:196
#22 0x00007fffe5e6f943 in  () at /lib64/libstdc++.so.6
#23 0x00007ffff7c0158e in start_thread () at /lib64/libpthread.so.0
#24 0x00007ffff79a16a3 in clone () at /lib64/libc.so.6

virtuald commented Jan 1, 2019

Hm, this seems vaguely related as well: pybind/pybind11#1595

virtuald commented Jan 1, 2019

Ok, so the problem is almost certainly a mismatch between the way wpi::SafeThread destruction occurs and the way pybind11 expects destruction to occur.

  • pybind11 does not acquire the GIL in its object/function destructors, so you must already hold the GIL whenever those objects are destroyed
  • The notifier holds onto references that are only released when the notifier gets destroyed, which happens on its own thread without the GIL. So we need to hold onto any references that could be passed into the notifier ourselves, and release them on shutdown while holding the GIL.
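
For illustration, here is a minimal standard-library sketch of the two points above. It is not the actual cscore or pybind11 code. Every name in it (`fake_gil`, `GilGuard`, `FakePyCallable`, `CallbackRegistry`, `RunScenario`) is hypothetical. A mutex plus a thread-local flag stands in for the GIL, and a `shared_ptr` stands in for a Python reference whose final release must happen with the lock held. The worker thread plays the role of the notifier thread.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for the GIL: a mutex plus a per-thread "do I hold it" flag.
std::mutex fake_gil;
thread_local bool holds_gil = false;

struct GilGuard {
  GilGuard() { fake_gil.lock(); holds_gil = true; }
  ~GilGuard() { holds_gil = false; fake_gil.unlock(); }
  GilGuard(const GilGuard&) = delete;
  GilGuard& operator=(const GilGuard&) = delete;
};

std::atomic<int> destroyed_with_lock{0};
std::atomic<int> destroyed_without_lock{0};

// Stand-in for a Python callable. Its destructor does not take the lock
// itself; it only records whether the destroying thread held it.
struct FakePyCallable {
  ~FakePyCallable() {
    if (holds_gil) ++destroyed_with_lock; else ++destroyed_without_lock;
  }
  void operator()(int) const {}
};

// Hypothetical registry: the binding layer owns the strong references and
// gives the worker thread a wrapper that holds only a weak reference.
class CallbackRegistry {
 public:
  std::function<void(int)> Register(std::shared_ptr<FakePyCallable> cb) {
    assert(holds_gil);  // caller must hold the lock
    std::weak_ptr<FakePyCallable> weak = cb;
    owned_.push_back(std::move(cb));
    return [weak](int v) {
      GilGuard gil;  // take the lock before touching the callable
      if (auto strong = weak.lock()) (*strong)(v);
    };
  }
  // Releases every strong reference; must be called with the lock held.
  void Shutdown() {
    assert(holds_gil);
    owned_.clear();
  }
 private:
  std::vector<std::shared_ptr<FakePyCallable>> owned_;
};

struct Counts { int with_lock; int without_lock; };

// Hands one callback to a worker thread that destroys it on exit, then
// reports where the callable's final release happened.
Counts RunScenario(bool use_registry) {
  destroyed_with_lock = 0;
  destroyed_without_lock = 0;
  CallbackRegistry registry;
  std::unique_ptr<std::function<void(int)>> fn;
  {
    GilGuard gil;
    auto cb = std::make_shared<FakePyCallable>();
    if (use_registry) {
      fn = std::make_unique<std::function<void(int)>>(
          registry.Register(std::move(cb)));
    } else {
      // Problem case: the wrapper itself owns the last strong reference.
      fn = std::make_unique<std::function<void(int)>>(
          [cb](int v) { GilGuard call_gil; (*cb)(v); });
    }
  }
  std::thread worker([f = std::move(fn)]() mutable {
    (*f)(7);
    f.reset();  // wrapper destroyed on the worker thread, lock not held
  });
  worker.join();
  {
    GilGuard gil;
    registry.Shutdown();  // strong references dropped with the lock held
  }
  return {destroyed_with_lock.load(), destroyed_without_lock.load()};
}
```

Calling `RunScenario(false)` models the crash: the worker thread drops the last reference without the lock. `RunScenario(true)` models the proposed fix, where the final release happens in `Shutdown()` under the lock. In real binding code the lock would presumably be taken with `pybind11::gil_scoped_acquire` rather than a mutex.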
