[19267] Improve SHM resilience to crashing participants #3759

Merged
MiguelCompany merged 4 commits into master from bugfix/shm/resilience on Jul 31, 2023

MiguelCompany
Member

@MiguelCompany MiguelCompany commented Jul 27, 2023

Description

This, along with #3753, greatly improves the situation described in ros2/rmw_fastrtps#699.
I'm still not sure whether it is completely fixed, but the changes here fix two obvious bugs.

  1. When a participant crashes just after pushing a descriptor into the listening port of another participant, the descriptor should be popped from the port even if it points to a non-existent or corrupted segment. Otherwise the listener enters an infinite loop, because it keeps reading the same bad descriptor.
  2. Some mutex locks could throw an unhandled timeout exception, making participant crashes more likely.
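
For readers who do not know the SHM transport internals, here is a minimal sketch of the two failure modes and the shape of each fix. It is not the Fast DDS code: `Descriptor`, `Port`, `SegmentTable`, `SharedLock`, `LockTimeout`, `read_next` and `notify` are all hypothetical names made up for illustration, and the "shared memory" is simulated with ordinary in-process containers.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <stdexcept>
#include <unordered_map>
#include <vector>

// All types below are hypothetical stand-ins, not Fast DDS classes.
struct Descriptor
{
    std::uint32_t segment_id;  // which segment holds the payload
    std::uint32_t offset;      // position of the payload inside that segment
};

using SegmentTable = std::unordered_map<std::uint32_t, std::vector<int>>;

// Simulated listening port: a FIFO of descriptors pushed by writers.
class Port
{
public:
    void push(Descriptor d) { queue_.push_back(d); }
    bool empty() const { return queue_.empty(); }
    const Descriptor& head() const { return queue_.front(); }
    void pop() { queue_.pop_front(); }
private:
    std::deque<Descriptor> queue_;
};

// Fix 1 (sketch): the head descriptor is popped whether or not it resolves.
// Returns true and fills `payload` only when the descriptor is valid.
bool read_next(Port& port, const SegmentTable& segments, int& payload)
{
    assert(!port.empty());
    const Descriptor d = port.head();
    bool ok = false;
    const auto it = segments.find(d.segment_id);
    if (it != segments.end() && d.offset < it->second.size())
    {
        payload = it->second[d.offset];
        ok = true;
    }
    port.pop();  // always pop, so a bad descriptor cannot be re-read forever
    return ok;
}

struct LockTimeout : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Simulated lock whose timed acquisition throws, e.g. because the previous
// owner crashed while holding it.
class SharedLock
{
public:
    void abandon() { abandoned_ = true; }
    void timed_lock()
    {
        if (abandoned_ || locked_) { throw LockTimeout("lock timed out"); }
        locked_ = true;
    }
    void unlock() { locked_ = false; }
private:
    bool abandoned_ = false;
    bool locked_ = false;
};

// Fix 2 (sketch): the timeout is caught and reported instead of propagating.
bool notify(SharedLock& lock, int& notifications)
{
    try
    {
        lock.timed_lock();
    }
    catch (const LockTimeout&)
    {
        return false;  // caller can skip or retry; the process stays alive
    }
    ++notifications;
    lock.unlock();
    return true;
}
```

In this sketch a listener that drains a port containing a descriptor for a missing segment simply gets `false` for that entry and moves on to the next one, and a notification attempted on an abandoned lock returns `false` rather than terminating the caller.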

@Mergifyio backport 2.11.x 2.10.x 2.6.x

Contributor Checklist

  • Commit messages follow the project guidelines.
  • The code follows the style guidelines of this project.
  • N/A Tests that thoroughly check the new feature have been added/Regression tests checking the bug and its fix have been added; the added tests pass locally
  • N/A Any new/modified methods have been properly documented using Doxygen.
  • Changes are ABI compatible.
  • Changes are API compatible.
  • N/A New feature has been added to the versions.md file (if applicable).
  • N/A New feature has been documented/Current behavior is correctly described in the documentation.
  • Applicable backports have been included in the description.

Reviewer Checklist

  • The PR has a milestone assigned.
  • Check contributor checklist is correct.
  • Check CI results: changes do not issue any warning.
  • Check CI results: failing tests are unrelated to the changes.

@MiguelCompany MiguelCompany added this to the v2.12.0 milestone Jul 27, 2023
jsan-rt previously approved these changes Jul 27, 2023
Signed-off-by: Miguel Company <[email protected]>
@MiguelCompany
Member Author

@richiprosima Please test windows

@MiguelCompany MiguelCompany merged commit 7cf43a6 into master Jul 31, 2023
@MiguelCompany MiguelCompany deleted the bugfix/shm/resilience branch July 31, 2023 09:21
@MiguelCompany
Member Author

@Mergifyio backport 2.11.x 2.10.x 2.6.x

@mergify
Contributor

mergify bot commented Jul 31, 2023

backport 2.11.x 2.10.x 2.6.x

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Jul 31, 2023
* Refs #19255. Always pop descriptor from SHM port.

Signed-off-by: Miguel Company <[email protected]>

* Refs #19255. Catch lock timeout on DataSharing notifications.

Signed-off-by: Miguel Company <[email protected]>

* Refs #19255. Catch lock timeout on SharedMemGlobal.

Signed-off-by: Miguel Company <[email protected]>

* Refs #19255. Please linters.

Signed-off-by: Miguel Company <[email protected]>

---------

Signed-off-by: Miguel Company <[email protected]>
(cherry picked from commit 7cf43a6)
mergify bot pushed a commit that referenced this pull request Jul 31, 2023
* Refs #19255. Always pop descriptor from SHM port.

Signed-off-by: Miguel Company <[email protected]>

* Refs #19255. Catch lock timeout on DataSharing notifications.

Signed-off-by: Miguel Company <[email protected]>

* Refs #19255. Catch lock timeout on SharedMemGlobal.

Signed-off-by: Miguel Company <[email protected]>

* Refs #19255. Please linters.

Signed-off-by: Miguel Company <[email protected]>

---------

Signed-off-by: Miguel Company <[email protected]>
(cherry picked from commit 7cf43a6)

# Conflicts:
#	src/cpp/rtps/transport/shared_mem/SharedMemGlobal.hpp
@MiguelCompany MiguelCompany changed the title Improve SHM resilience to crashing participants [19267] Improve SHM resilience to crashing participants Jul 31, 2023
MiguelCompany added a commit that referenced this pull request Aug 1, 2023
* Improve SHM resilience to crashing participants (#3759)

* Refs #19255. Always pop descriptor from SHM port.

Signed-off-by: Miguel Company <[email protected]>

* Refs #19255. Catch lock timeout on DataSharing notifications.

Signed-off-by: Miguel Company <[email protected]>

* Refs #19255. Catch lock timeout on SharedMemGlobal.

Signed-off-by: Miguel Company <[email protected]>

* Refs #19255. Please linters.

Signed-off-by: Miguel Company <[email protected]>

---------

Signed-off-by: Miguel Company <[email protected]>
(cherry picked from commit 7cf43a6)

# Conflicts:
#	src/cpp/rtps/transport/shared_mem/SharedMemGlobal.hpp

* Fix conflicts

Signed-off-by: Miguel Company <[email protected]>

---------

Signed-off-by: Miguel Company <[email protected]>
Co-authored-by: Miguel Company <[email protected]>
MiguelCompany added a commit that referenced this pull request Aug 7, 2023
* Refs #19255. Always pop descriptor from SHM port.

Signed-off-by: Miguel Company <[email protected]>

* Refs #19255. Catch lock timeout on DataSharing notifications.

Signed-off-by: Miguel Company <[email protected]>

* Refs #19255. Catch lock timeout on SharedMemGlobal.

Signed-off-by: Miguel Company <[email protected]>

* Refs #19255. Please linters.

Signed-off-by: Miguel Company <[email protected]>

---------

Signed-off-by: Miguel Company <[email protected]>
(cherry picked from commit 7cf43a6)

Co-authored-by: Miguel Company <[email protected]>
MiguelCompany added a commit that referenced this pull request Aug 8, 2023
* Refs #19255. Always pop descriptor from SHM port.

Signed-off-by: Miguel Company <[email protected]>

* Refs #19255. Catch lock timeout on DataSharing notifications.

Signed-off-by: Miguel Company <[email protected]>

* Refs #19255. Catch lock timeout on SharedMemGlobal.

Signed-off-by: Miguel Company <[email protected]>

* Refs #19255. Please linters.

Signed-off-by: Miguel Company <[email protected]>

---------

Signed-off-by: Miguel Company <[email protected]>
(cherry picked from commit 7cf43a6)

Co-authored-by: Miguel Company <[email protected]>
juanlofer-eprosima pushed a commit that referenced this pull request Nov 10, 2023
* Refs #19255. Always pop descriptor from SHM port.

Signed-off-by: Miguel Company <[email protected]>

* Refs #19255. Catch lock timeout on DataSharing notifications.

Signed-off-by: Miguel Company <[email protected]>

* Refs #19255. Catch lock timeout on SharedMemGlobal.

Signed-off-by: Miguel Company <[email protected]>

* Refs #19255. Please linters.

Signed-off-by: Miguel Company <[email protected]>

---------

Signed-off-by: Miguel Company <[email protected]>