Make map operations deterministic in quorum queues #13971
Merged
Prior to this commit, map iteration order was undefined in quorum queues and could therefore differ between Erlang/OTP versions.

Example:

OTP 26.2.5.3

```
Erlang/OTP 26 [erts-14.2.5.3] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]

Eshell V14.2.5.3 (press Ctrl+G to abort, type help(). for help)
1> maps:foreach(fun(K, _) -> io:format("~b,", [K]) end, maps:from_keys(lists:seq(1, 33), ok)).
4,25,8,1,23,10,7,9,11,12,28,24,13,3,18,29,26,22,19,2,33,21,32,20,17,30,14,5,6,27,16,31,15,ok
```

OTP 27.3.3

```
Erlang/OTP 27 [erts-15.2.6] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]

Eshell V15.2.6 (press Ctrl+G to abort, type help(). for help)
1> maps:foreach(fun(K, _) -> io:format("~b,", [K]) end, maps:from_keys(lists:seq(1, 33), ok)).
18,4,12,19,29,13,2,7,31,8,10,23,9,15,32,1,25,28,20,6,11,17,24,14,33,3,16,30,21,5,27,26,22,ok
```

This can lead to non-determinism across members. For example, different members could potentially return messages in a different order. This commit introduces a new machine version that fixes this bug.
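The general remedy is to never let the internal layout of a map decide an order that can influence state. A minimal sketch, assuming a hypothetical helper module `det_map` (not part of RabbitMQ and not necessarily how `rabbit_fifo` implements the fix): sort the key/value pairs in Erlang term order before iterating. On OTP 26 and later, `maps:iterator(Map, ordered)` gives a similar guarantee without building a sorted list; the sketch uses `lists:sort/1` so it does not depend on that.

```erlang
-module(det_map).
-export([ordered_keys/1, ordered_foreach/2, ordered_fold/3]).

%% Keys in ascending Erlang term order, independent of the map's internal layout.
ordered_keys(Map) when is_map(Map) ->
    lists:sort(maps:keys(Map)).

%% Like maps:foreach/2, but visits entries in ascending key order.
%% Keys in a map are unique, so sorting {K, V} tuples effectively sorts by key.
ordered_foreach(Fun, Map) when is_function(Fun, 2), is_map(Map) ->
    lists:foreach(fun({K, V}) -> Fun(K, V) end, lists:sort(maps:to_list(Map))).

%% Like maps:fold/3, but folds over entries in ascending key order.
ordered_fold(Fun, Acc0, Map) when is_function(Fun, 3), is_map(Map) ->
    lists:foldl(fun({K, V}, Acc) -> Fun(K, V, Acc) end,
                Acc0,
                lists:sort(maps:to_list(Map))).
```

Running the shell expression above with `det_map:ordered_foreach/2` in place of `maps:foreach/2` prints the keys as `1,2,3,...,33,` on both OTP versions. The price is an O(n log n) sort on every traversal, which is why such ordering is usually applied only where the order can actually leak into the replicated state.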
This commit adds a property test that applies the same Ra commands in the same order on two different Erlang nodes. Both nodes should end up in exactly the same state. Ideally, the two nodes would run different OTP versions so that the test could catch any non-determinism across OTP versions. For now, however, having both nodes run the same OTP version is good enough, because the test fails with rabbit_fifo machine version 5 and succeeds with machine version 6.

This reveals another interesting finding: the default "undefined" map order can differ even between Erlang nodes running the **same** OTP version.
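The property can be sketched as: fold the same command list through the state machine on two nodes and compare the final states with `=:=`. This is a minimal sketch, assuming a hypothetical module `replay_check` and a toy `toy_apply/2` standing in for `rabbit_fifo`; the real Ra machine callback has a different signature, and the actual test additionally generates the commands with a property-testing library.

```erlang
-module(replay_check).
-export([replay/3, replay_on/4, same_final_state/4, toy_apply/2]).

%% Fold Cmds, in order, through ApplyFun(Cmd, State), starting from State0.
replay(ApplyFun, State0, Cmds) when is_function(ApplyFun, 2), is_list(Cmds) ->
    lists:foldl(ApplyFun, State0, Cmds).

%% Replay on Node. {Mod, Fun} is passed as an external fun, which refers to
%% Mod:Fun/2 only by name, so Mod and this module must be loaded on Node.
replay_on(Node, {Mod, Fun}, State0, Cmds) when Node =:= node() ->
    replay(fun Mod:Fun/2, State0, Cmds);
replay_on(Node, {Mod, Fun}, State0, Cmds) ->
    erpc:call(Node, ?MODULE, replay, [fun Mod:Fun/2, State0, Cmds]).

%% The property: replaying the same commands on both nodes must yield
%% exactly the same final state.
same_final_state([NodeA, NodeB], MF, State0, Cmds) ->
    StateA = replay_on(NodeA, MF, State0, Cmds),
    StateB = replay_on(NodeB, MF, State0, Cmds),
    StateA =:= StateB.

%% Toy state machine (hypothetical): the state is a map of key/value pairs.
toy_apply({put, K, V}, State) -> State#{K => V};
toy_apply({delete, K}, State) -> maps:remove(K, State).
```

Note that `=:=` compares map contents, not internal layout, so two states only compare unequal once iteration order has leaked into something order-sensitive, such as the order of messages in a list. That is exactly the kind of divergence the property is meant to catch.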
For test case `leader_locator_balanced`, the leaders actually elected were on nodes 1, 3 and 1, because those nodes know about machine version 6 while node 2 only knows about machine version 5.
kjnilsson approved these changes on Jun 4, 2025.
ansd added a commit that referenced this pull request on Jun 6, 2025:
## What?

PR #13971 added a property test that applies the same quorum queue Raft commands on different quorum queue members on different Erlang nodes, ensuring that the state machine ends up in exactly the same state. The different Erlang nodes, however, run the **same** Erlang/OTP version.

This commit adds another property test where the different Erlang nodes run **different** Erlang/OTP versions.

## Why?

This test allows spotting any non-determinism that could occur when running quorum queue members in a mixed-version cluster, where "mixed version" in this context means different Erlang/OTP versions.

## How?

CI currently runs tests with Erlang 27. This commit starts an Erlang 26 node in Docker, specifically for the `rabbit_fifo_prop_SUITE`. Test case `two_nodes_different_otp_version`, running on Erlang 27, then transfers a few Erlang modules (e.g. module `rabbit_fifo`) to the Erlang 26 node. The test case then runs the Ra commands both on its own Erlang 27 node and on the Erlang 26 node in Docker.

By default, this test case is skipped locally. To run it locally, start an Erlang node as follows:

```
erl -sname rabbit_fifo_prop@localhost
```
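The module transfer and the skip-by-default behavior could look roughly like the sketch below. It assumes a hypothetical helper module `mod_transfer`; `code:get_object_code/1`, `code:load_binary/3`, `erpc:call/4` and `net_adm:ping/1` are standard OTP functions, but this is not the code used in `rabbit_fifo_prop_SUITE`. BEAM files are not guaranteed to load on an OTP release older than the one that compiled them, so the sketch simply assumes the transferred modules are compatible.

```erlang
-module(mod_transfer).
-export([object_code/1, load_on/2, peer_available/1]).

%% Look up the object code of Mod via the local code path.
object_code(Mod) when is_atom(Mod) ->
    case code:get_object_code(Mod) of
        {Mod, Bin, File} -> {ok, {Mod, Bin, File}};
        error -> {error, {no_object_code, Mod}}
    end.

%% Load the local object code of each module in Mods on Node
%% (e.g. an older-OTP node running in Docker).
%% Crashes on the first module that cannot be found or loaded.
load_on(Node, Mods) when is_atom(Node), is_list(Mods) ->
    lists:foreach(
      fun(Mod) ->
              {ok, {Mod, Bin, File}} = object_code(Mod),
              {module, Mod} = erpc:call(Node, code, load_binary, [Mod, File, Bin])
      end,
      Mods).

%% true if Node answers a ping, false otherwise.
peer_available(Node) when is_atom(Node) ->
    net_adm:ping(Node) =:= pong.
```

In a Common Test case, returning `{skip, Reason}` when `peer_available(Node)` is `false` would produce the skip-by-default behavior described above, and `load_on/2` would then ship the modules under test, such as `rabbit_fifo`, to the peer.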
mergify bot pushed a commit that referenced this pull request on Jun 6, 2025: a cherry-pick of commit eccf9fe with the same commit message.
Prior to this commit, map iteration order was undefined in quorum queues and could therefore differ between RabbitMQ nodes.
Example: iterating over the same 33-key map with `maps:foreach/2` visits the keys in a different order on OTP 26.2.5.3 than on OTP 27.3.3.
This can lead to non-determinism across members. For example, different members could potentially return messages in a different order.
This commit introduces a new machine version that fixes this bug.
The property test added by this PR shows that non-determinism could have occurred (prior to this PR) even on different nodes running the same OTP version (tested on macOS and OTP 27.3.3). Such a test failure can be reproduced by changing the rabbit_fifo machine version from 6 to 5 in the new test case.
I also tested manually that the new test succeeds if the CT node runs OTP 27.3.3 and another locally started Erlang node runs OTP 26.2.5.3.
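To confirm that a two-node run really spans two OTP versions, a test could compare the runtimes of both nodes before replaying commands. A minimal sketch, assuming a hypothetical module `otp_check`: `erlang:system_info(otp_release)` returns the major release (e.g. `"27"`), and `erlang:system_info(version)` returns the ERTS version shown in the shell banner (e.g. `"15.2.6"` for the OTP 27.3.3 session in the first commit message).

```erlang
-module(otp_check).
-export([runtime/1, differing_runtimes/2]).

%% {OtpMajorRelease, ErtsVersion} of Node, both as strings.
%% The local node is queried directly; other nodes via erpc, which
%% needs no extra module loaded on the remote side.
runtime(Node) when Node =:= node() ->
    {erlang:system_info(otp_release), erlang:system_info(version)};
runtime(Node) when is_atom(Node) ->
    {erpc:call(Node, erlang, system_info, [otp_release]),
     erpc:call(Node, erlang, system_info, [version])}.

%% true if the two nodes differ in OTP major release or ERTS version.
differing_runtimes(NodeA, NodeB) ->
    runtime(NodeA) =/= runtime(NodeB).
```

A cross-version test could then skip, or fail fast, when `differing_runtimes/2` returns `false`, instead of silently degrading into a same-version run.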