Test case and fix for for https://github.com/ros2/rclcpp/issues/2652#2713
Test case and fix for for https://github.com/ros2/rclcpp/issues/2652#2713mjcarroll merged 5 commits intoros2:rollingfrom
Conversation
6906974 to
660ffaa
Compare
|
@luca-della-vedova @fmrico can you test this fix ? |
luca-della-vedova
left a comment
There was a problem hiding this comment.
I tried on the reproduction repo and this branch indeed fixes it!
660ffaa to
e3142d0
Compare
Sure!! testing... |
|
I have tested this PR against:
Everything is working now: LGTM 🔥 🚀 |
fujitatomoya
left a comment
There was a problem hiding this comment.
@jmachowinski lgtm with green CI.
i would like to confirm one thing. how come do you consider this breaks ABI by the way? Templated functions are instantiated at compile-time, and their changes do not affect the binary interface (ABI) of the library.
This specific case is likely not to break ABI, but leaves you in a bad place. Depending on what was compiled at which time, you end up with the fix or not. E.g. you have libraries, that use the Executor therefore they contain a symbol of the Executor. These libraries were compiled before the fix. Even though you now compile with the new fixed template, the linker will choose one symbol to use later on during linking, and may use the 'broken' version from the library. In general, you can break ABI with templates, but you need to have a libraries. To break ABI, you need a to have a templated type, that changed its memory layout due to an alteration of the template. If this type is used by library A, but this was compiled against the old version of the template, and you now try to use the library with the new template, you get an ABI break, and most likely a crash... |
|
As we discussed in PMC, this doesn't really break ABI, but updating rclcpp by itself is insufficient to get the fix. You must also recompile any downstream users of the template. It's enough to justify a notification, and we are planning on doing a discourse post about it. |
|
Pulls: #2713 |
78d859d to
27b2d0e
Compare
|
Pulls: #2713 |
|
just in case i started the CI after rebasing. |
wjwwood
left a comment
There was a problem hiding this comment.
lgtm, but I had one question that might lead to an edit
| if(this->needs_pruning_) { | ||
| this->storage_prune_deleted_entities(); | ||
| this->needs_pruning_ = false; | ||
| } | ||
|
|
||
| this->storage_release_ownerships(); |
There was a problem hiding this comment.
Should this check and the pruning happen after releasing ownerships? It seems like holding the ownership by the storage might cause the weak pointers to have not expired even though all the user's shared pointers have been destroyed? Maybe it gets caught on the next run through?
I suppose this would only happen if something lost all ownership (except the wait set's) after needs_pruning_ became true but before this step.
Either way, since storage_prune_deleted_entities() iterates through all of the wait set's weak pointers and checks them for expired, we might as well release ownerships first no?
There was a problem hiding this comment.
This needs to happen while holding the ownership, in order to be sure that the indices still match. If you release ownership, and prune, and some references were dropped in between, the indices of the rcl waitset will not match any more the one in the storage. Basically leading to the same bug as before.
I don't like this implicit matching of indixes in general. I would like to introduce a backmapping datastructure later on, that uses the indices returned by the rcl_add functions
There was a problem hiding this comment.
@wjwwood does this answer your question ?
27b2d0e to
8f8aab8
Compare
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Signed-off-by: Francisco Martín Rico <fmrico@gmail.com>
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
8f8aab8 to
0c42945
Compare
|
Pulls: #2713 |
|
Pulls: #2713 |
|
Only flakes on Windows here looks like. |
|
The windows test is my pi something. It's been there in all recent ci runs and seems unrelated. |
|
@Mergifyio backport jazzy |
❌ Command disallowed due to command restrictions in the Mergify configuration.Details
|
|
@fujitatomoya could you trigger the backport ? |
|
@jmachowinski yeah i will take care of that. |
|
@Mergifyio backport jazzy |
✅ Backports have been createdDetails
|
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com> Signed-off-by: Francisco Martín Rico <fmrico@gmail.com> Co-authored-by: Janosch Machowinski <J.Machowinski@cellumation.com> Co-authored-by: Francisco Martín Rico <fmrico@gmail.com> (cherry picked from commit 605251b)
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com> Signed-off-by: Francisco Martín Rico <fmrico@gmail.com> Co-authored-by: Janosch Machowinski <J.Machowinski@cellumation.com> Co-authored-by: Francisco Martín Rico <fmrico@gmail.com> (cherry picked from commit 605251b) Co-authored-by: Janosch Machowinski <jmachowinski@users.noreply.github.com>
This is a test case and fix for the bug #2652