Fail to simulate with gazebo on Ubuntu 20.04: Segmentation fault (core dumped) gzserver $verbose $world_path $ros_args #22958

Open
lipantao opened this issue Apr 1, 2024 · 24 comments

@lipantao

lipantao commented Apr 1, 2024

Describe the bug

I'm trying to run make px4_sitl gazebo, but gzserver crashes with a 'core dumped' error:
image
I tried to fix it by running make distclean and git submodule update --init --recursive. After that SITL started successfully twice, apparently by chance, but subsequent runs failed again with the same error. Gazebo itself runs perfectly when started standalone.

To Reproduce

I followed the instructions of the PX4 User Guide to install and configure, but got the error as described above.
All px4 commands starting SITL with gazebo-classic fail with the same core dumped error.

Expected behavior

I expect PX4 to connect to gazebo normally and successfully run SITL.

Screenshot / Media

No response

Flight Log

lw@lw-System-Product-Name:~/PX4-Autopilot$ make px4_sitl gazebo
[0/4] Performing build step for 'sitl_gazebo-classic'
ninja: no work to do.
[3/4] cd /home/lw/PX4-Autopilot/build/px4_sitl_default/src/modules...ome/lw/PX4-Autopilot /home/lw/PX4-Autopilot/build/px4_sitl_default
SITL ARGS
sitl_bin: /home/lw/PX4-Autopilot/build/px4_sitl_default/bin/px4
debugger: none
model: iris
world: none
src_path: /home/lw/PX4-Autopilot
build_path: /home/lw/PX4-Autopilot/build/px4_sitl_default
GAZEBO_PLUGIN_PATH :/home/lw/PX4-Autopilot/build/px4_sitl_default/build_gazebo-classic
GAZEBO_MODEL_PATH :/home/lw/PX4-Autopilot/Tools/simulation/gazebo-classic/sitl_gazebo-classic/models
LD_LIBRARY_PATH /home/lw/catkin_ws/devel/lib:/opt/ros/noetic/lib:/opt/ros/noetic/lib/x86_64-linux-gnu:/home/lw/PX4-Autopilot/build/px4_sitl_default/build_gazebo-classic
empty world, setting empty.world as default
Using: /home/lw/PX4-Autopilot/Tools/simulation/gazebo-classic/sitl_gazebo-classic/models/iris/iris.sdf
Warning [parser.cc:833] XML Attribute[version] in element[sdf] not defined in SDF, ignoring.
/home/lw/PX4-Autopilot/Tools/simulation/gazebo-classic/sitl_run.sh: line 147: 267253 Segmentation fault (core dumped) gzserver $verbose $world_path $ros_args
SITL COMMAND: "/home/lw/PX4-Autopilot/build/px4_sitl_default/bin/px4" "/home/lw/PX4-Autopilot/build/px4_sitl_default"/etc


______  __   __    ___
| ___ \ \ \ / /   /   |
| |_/ /  \ V /   / /| |
|  __/   /   \  / /_| |
| |     / /^\ \ \___  |
\_|     \/   \/     |_/

px4 starting.

INFO [px4] startup script: /bin/sh etc/init.d-posix/rcS 0
INFO [init] found model autostart file as SYS_AUTOSTART=10015
INFO [param] selected parameter default file parameters.bson
INFO [param] selected parameter backup file parameters_backup.bson
SYS_AUTOCONFIG: curr: 0 -> new: 1
SYS_AUTOSTART: curr: 0 -> new: 10015
CAL_ACC0_ID: curr: 0 -> new: 1310988
CAL_GYRO0_ID: curr: 0 -> new: 1310988
CAL_ACC1_ID: curr: 0 -> new: 1310996
CAL_GYRO1_ID: curr: 0 -> new: 1310996
CAL_ACC2_ID: curr: 0 -> new: 1311004
CAL_GYRO2_ID: curr: 0 -> new: 1311004
CAL_MAG0_ID: curr: 0 -> new: 197388
CAL_MAG0_PRIO: curr: -1 -> new: 50
CAL_MAG1_ID: curr: 0 -> new: 197644
CAL_MAG1_PRIO: curr: -1 -> new: 50
SENS_BOARD_X_OFF: curr: 0.0000 -> new: 0.0000
SENS_DPRES_OFF: curr: 0.0000 -> new: 0.0010
INFO [dataman] data manager file './dataman' size is 7872608 bytes
INFO [init] PX4_SIM_HOSTNAME: localhost
INFO [simulator_mavlink] Waiting for simulator to accept connection on TCP port 4560
Gazebo multi-robot simulator, version 11.14.0
Copyright (C) 2012 Open Source Robotics Foundation.
Released under the Apache 2 License.
http://gazebosim.org

[Msg] Waiting for master.
[Err] [ConnectionManager.cc:121] Failed to connect to master in 30 seconds.
[Err] [gazebo_shared.cc:78] Unable to initialize transport.
[Err] [gazebo_client.cc:56] Unable to setup Gazebo

Software Version

Gazebo 11.14.0, Ubuntu 20.04

Flight controller

None

Vehicle type

None

How are the different components wired up (including port information)

No response

Additional context

No response

@lipantao
Author

lipantao commented Apr 1, 2024

image

@julianoes
Contributor

I wonder if it is this one?

 Warning [parser.cc:833] XML Attribute[version] in element[sdf] not defined in SDF, ignoring.

Otherwise, can you try to open the core dump?
Or start with gdb by replacing gzserver in

gzserver $verbose $world_path $ros_args &

with gdb -ex run --args gzserver
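
The substitution described above can be sketched as a small helper. This is only an illustration: the function name gz_debug_cmd is hypothetical, it prints the command instead of running it, and the added -ex bt (print a backtrace once gdb regains control) is an optional extra.

```shell
# Hypothetical helper: print a gdb-wrapped gzserver command line.
# -ex run starts the program immediately; -ex bt prints a backtrace
# once gdb regains control, for example after a SIGSEGV.
gz_debug_cmd() {
    printf 'gdb -ex run -ex bt --args gzserver'
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

# Example: gz_debug_cmd --verbose worlds/empty.world
```

The printed line is meant to replace the plain gzserver invocation in sitl_run.sh; the arguments passed to the helper stand in for $verbose $world_path $ros_args.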

@lipantao
Author

lipantao commented Apr 2, 2024

Thanks for your reply! I don't think the XML warning is the cause: the same warning appears on the occasional runs where the simulation starts successfully, and it does not affect operation. The only difference between success and failure is the core dump.
I followed your suggestion and started gzserver under gdb; the output is as follows:
image
Something seems to go wrong in the Boost.Asio library. I reinstalled Boost and Gazebo, but it still doesn't work.
image

@roseyanpeng

I also encountered a similar problem. I have narrowed it down to the plugin declared as <plugin name='mavlink_interface' filename='libgazebo_mavlink_interface.so'/>. After checking its code further, I found that the crash happens at the statement "mavlink_interface_ = std::make_unique<MavlinkInterface>()". Can anyone else help check this issue? Thank you very much!

@julianoes
Contributor

Make sure to try to get more verbose output:

export VERBOSE_SIM=1

@lipantao
Author

lipantao commented Apr 4, 2024

I set VERBOSE_SIM=1 and the output is:
image
image

@julianoes
Contributor

Would be good if you could type backtrace when it segfaults to get the full backtrace.

@lipantao
Author

lipantao commented Apr 6, 2024

Thank you again for your patient and detailed reply! I ran make px4_sitl_default gazebo-classic_iris_gdb, typed backtrace at the gdb prompt, and got this output:
[Msg] Waiting for master.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7a19700 (LWP 3876917)]
[New Thread 0x7ffff7fc6700 (LWP 3876918)]
[New Thread 0x7ffff7218700 (LWP 3876919)]

Thread 3 "px4" received signal SIG32, Real-time event 32.
[Switching to Thread 0x7ffff7fc6700 (LWP 3876918)]
__lll_lock_wait_private (futex=0x7ffff7fc6d18) at lowlevellock.c:35
35 lowlevellock.c: No such file or directory.
(gdb) bt
#0 __lll_lock_wait_private (futex=0x7ffff7fc6d18) at lowlevellock.c:35
#1 0x00007ffff7f6b7b7 in start_thread (arg=<optimized out>) at pthread_create.c:453
#2 0x00007ffff7b3e353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) quit

@julianoes
Contributor

Hmm, I think that's the backtrace of PX4 but we need the one of gzserver.

@lipantao
Author

lipantao commented Apr 7, 2024

I think I obtained a backtrace for gzserver by using the command gdb -ex run -ex "bt" --args gzserver $verbose $world_path $ros_args & in sitl_run.sh:
image
And this is the text of the above output screenshot:
--Type <RET> for more, q to quit, c to continue without paging--
Thread 53 "gzserver" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff517f7700 (LWP 22215)]
boost::asio::detail::reactive_descriptor_service::reactive_descriptor_service (context=..., this=0x7ffef8f2ddb8) at /usr/local/include/boost/asio/detail/impl/reactive_descriptor_service.ipp:39
39 reactor_.init_task();
#0 boost::asio::detail::reactive_descriptor_service::reactive_descriptor_service(boost::asio::execution_context&) (context=..., this=0x7ffef8f2ddb8) at /usr/local/include/boost/asio/detail/impl/reactive_descriptor_service.ipp:39
#1 boost::asio::detail::posix_serial_port_service::posix_serial_port_service(boost::asio::execution_context&) (context=..., this=0x7ffef8f2dd90) at /usr/local/include/boost/asio/detail/impl/posix_serial_port_service.ipp:36
#2 boost::asio::detail::service_registry::create<boost::asio::detail::posix_serial_port_service, boost::asio::io_context>(void*) (owner=owner@entry=0x7ffef89ff740) at /usr/local/include/boost/asio/detail/impl/service_registry.hpp:87
#3 0x00007fffc427ce65 in boost::asio::detail::service_registry::do_use_service(boost::asio::execution_context::service::key const&, boost::asio::execution_context::service* (*)(void*), void*)
(owner=0x7ffef89ff740, factory=0x7fffc42874b0 <boost::asio::detail::service_registry::create<boost::asio::detail::posix_serial_port_service, boost::asio::io_context>(void*)>, key=<synthetic pointer>..., this=0x7ffef8f2c370)
at /usr/local/include/boost/asio/detail/impl/service_registry.ipp:132
#4 boost::asio::detail::service_registry::use_service<boost::asio::detail::posix_serial_port_service>(boost::asio::io_context&) (owner=..., this=0x7ffef8f2c370) at /usr/local/include/boost/asio/detail/impl/service_registry.hpp:39
#5 boost::asio::use_service<boost::asio::detail::posix_serial_port_service>(boost::asio::io_context&) (ioc=...) at /usr/local/include/boost/asio/impl/io_context.hpp:41
#6 boost::asio::detail::io_object_impl<boost::asio::detail::posix_serial_port_service, boost::asio::any_io_executor>::io_object_impl<boost::asio::io_context>(int, int, boost::asio::io_context&) (context=..., this=0x7ffef89ff750)
at /usr/local/include/boost/asio/detail/io_object_impl.hpp:58
#7 boost::asio::basic_serial_port<boost::asio::any_io_executor>::basic_serial_port<boost::asio::io_context>(boost::asio::io_context&, boost::asio::constraint<std::is_convertible<boost::asio::io_context&, boost::asio::execution_context&>::value, boost::asio::defaulted_constraint>::type) (context=..., this=0x7ffef89ff750) at /usr/local/include/boost/asio/basic_serial_port.hpp:120
#8 MavlinkInterface::MavlinkInterface() (this=0x7ffef89ef5d0) at /home/lw/PX4-Autopilot/Tools/simulation/gazebo-classic/sitl_gazebo-classic/src/mavlink_interface.cpp:4
#9 0x00007fffc423e252 in std::make_unique<MavlinkInterface>() () at /usr/include/eigen3/Eigen/src/Core/util/Memory.h:170
#10 gazebo::GazeboMavlinkInterface::GazeboMavlinkInterface() (this=0x7ffef896dba0) at /home/lw/PX4-Autopilot/Tools/simulation/gazebo-classic/sitl_gazebo-classic/src/gazebo_mavlink_interface.cpp:27
#11 0x00007fffc423e390 in gazebo::RegisterPlugin() () at /home/lw/PX4-Autopilot/Tools/simulation/gazebo-classic/sitl_gazebo-classic/src/gazebo_mavlink_interface.cpp:24
#12 0x00007ffff6d5a183 in () at /lib/x86_64-linux-gnu/libgazebo_physics.so.11
#13 0x00007ffff6d55d55 in gazebo::physics::Model::LoadPlugin(std::shared_ptr<sdf::v9::Element>) () at /lib/x86_64-linux-gnu/libgazebo_physics.so.11
#14 0x00007ffff6d56210 in gazebo::physics::Model::LoadPlugins(unsigned int) () at /lib/x86_64-linux-gnu/libgazebo_physics.so.11
#15 0x00007ffff6da3b64 in gazebo::physics::World::ProcessFactoryMsgs() () at /lib/x86_64-linux-gnu/libgazebo_physics.so.11
#16 0x00007ffff6db0da8 in gazebo::physics::World::ProcessMessages() () at /lib/x86_64-linux-gnu/libgazebo_physics.so.11
#17 0x00007ffff6db1527 in gazebo::physics::World::Step() () at /lib/x86_64-linux-gnu/libgazebo_physics.so.11
#18 0x00007ffff6db47fd in gazebo::physics::World::RunLoop() () at /lib/x86_64-linux-gnu/libgazebo_physics.so.11
#19 0x00007ffff7625df4 in () at /lib/x86_64-linux-gnu/libstdc++.so.6
#20 0x00007ffff6f23609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#21 0x00007ffff745f353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
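
In the backtrace above, the Boost.Asio frames resolve to headers under /usr/local/include, while the Gazebo libraries are loaded from /lib/x86_64-linux-gnu. A small sketch for pulling the distinct Boost include roots out of a saved backtrace; the function name and the file name bt.txt are hypothetical.

```shell
# Hypothetical helper: read a gdb backtrace on stdin and list the
# distinct directories whose boost/ subdirectory appears in it.
boost_include_roots() {
    grep -o '/[^ ():]*/include/boost/' | sed 's|/boost/$||' | LC_ALL=C sort -u
}

# Example: boost_include_roots < bt.txt
```

Seeing a root other than the distribution's /usr/include may be a hint that a second, separately installed Boost is involved, though it is not proof on its own.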

@julianoes
Contributor

Interesting. It's crashing related to the serial port but you're using SITL, so it shouldn't use the serial port. Are you trying HITL? Does it happen with a fresh clone?

@lipantao
Author

lipantao commented Apr 7, 2024

I'm not trying HITL, and my computer isn't connected to any peripherals besides the keyboard and mouse. I've verified that my PX4 code is up to date. I also listed the serial ports currently in use by running sudo lsof | grep /dev/tty, and none of them appear to be occupied except by some system processes.
image

@lipantao
Author

lipantao commented Apr 7, 2024

And I've noticed a significant improvement in success rate when using some other models for simulation, such as make px4_sitl gazebo-classic_typhoon_h480. On average, out of ten attempts, it succeeds two to three times, whereas with the default model's command, it might take dozens of attempts to succeed once.

@julianoes
Contributor

I'm sorry I'm out of ideas.

@lipantao
Author

lipantao commented Apr 8, 2024

That's OK. Thank you very much!

@lipantao
Author

lipantao commented Apr 8, 2024

I uninstalled Gazebo 11, which had been installed with sudo apt-get install (by ubuntu.sh), and then reinstalled Gazebo 11 from source following the official instructions at https://classic.gazebosim.org/tutorials?tut=install_from_source&cat=install . Now SITL runs smoothly and stably!

@julianoes
Contributor

Ah nice. So you actually built it from source? Or installed?

@lipantao
Author

Yes, I built it from the latest source code and then installed it with sudo make install. It's also Gazebo 11.14.0, the latest version.

@julianoes
Contributor

Wow, that's commitment! Still puzzled why that happened.

@lipantao
Author

I'm also puzzled about why the two installation methods behave differently.

@mengchaoheng

@julianoes I see the same problem on macOS 14.3, although all SITL targets ran fine before I updated the OS from 14.2 to 14.3. The error may come from Gazebo, since jMAVSim still runs successfully. Details in #22826.

@mengchaoheng

Maybe gazebosim/gazebo-classic#3380 ?

@mengchaoheng

mengchaoheng commented Apr 20, 2024

I have a fix! See #22826 (comment)

@liam-keepmove

My gdb output shows the same backtrace as yours, and I have now solved it. In my case, I had installed Boost 1.85 by compiling it from source, and then installed Gazebo via apt, which also pulled in libboost-dev (Boost 1.71). That left two not fully compatible Boost installations on my computer. When I then ran "make px4_sitl gazebo-classic", the generated target seemed to depend on Boost as well, and the two versions got mixed up. Because I was short on time I didn't investigate in detail, but my guess is that the core dump was caused by mixing two incompatible libraries.
My solution:

  1. Remove the Boost installed from source; keep libboost-dev (1.71) installed via apt.
  2. Clean up the PX4-Autopilot project.
  3. Run make again.
    image
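
One way to check for this situation is to look for more than one boost/version.hpp across the usual include prefixes. A minimal sketch, assuming /usr/include and /usr/local/include are the relevant prefixes; the function name boost_versions is hypothetical.

```shell
# Hypothetical helper: for each include directory given, print the
# directory and the BOOST_LIB_VERSION found in its boost/version.hpp.
# Directories without that header are skipped silently.
boost_versions() {
    for dir in "$@"; do
        hdr="$dir/boost/version.hpp"
        [ -f "$hdr" ] || continue
        ver=$(sed -n 's/^#define BOOST_LIB_VERSION "\([^"]*\)".*/\1/p' "$hdr")
        printf '%s %s\n' "$dir" "$ver"
    done
}

# Example: boost_versions /usr/include /usr/local/include
```

More than one output line suggests that two Boost installations coexist. In that case the steps listed above would apply; reading "clean up" as make distclean is an assumption here, not something the comment states.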
