
Run Quorum Queue property test on different OTP versions #14042


Merged: 1 commit, Jun 6, 2025

ansd commented Jun 6, 2025

## What?

PR #13971 added a property test that applies the same quorum queue Raft
commands on different quorum queue members on different Erlang nodes,
ensuring that the state machine ends up in exactly the same state.
However, the different Erlang nodes run the **same** Erlang/OTP version.

This commit adds another property test where the different Erlang nodes
run **different** Erlang/OTP versions.

## Why?

This test allows spotting any non-determinism that could occur when
running quorum queue members in a mixed version cluster, where mixed
version means in our context different Erlang/OTP versions.
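The core idea of the property test (same commands, same initial state, identical final state on every node) can be illustrated with a toy example. This is a hypothetical sketch, not the code in `rabbit_fifo_prop_SUITE`: the module name `qq_determinism_sketch` and `apply_cmd/2` are illustrative stand-ins for the real `rabbit_fifo` state machine. It folds the command list on each node via `erpc` and requires the results to be exactly equal (`=:=`).

```erlang
-module(qq_determinism_sketch).
-export([apply_cmd/2, final_states/3, deterministic/3]).

%% Toy state machine standing in for rabbit_fifo (hypothetical): the state is
%% an Erlang queue, and commands enqueue or dequeue a message.
apply_cmd({enqueue, Msg}, Queue) ->
    queue:in(Msg, Queue);
apply_cmd(dequeue, Queue) ->
    case queue:out(Queue) of
        {{value, _Msg}, Rest} -> Rest;
        {empty, Empty} -> Empty
    end.

%% Fold the same commands from the same initial state on every node.
%% apply_cmd/2 is passed as an external fun, so this module must be
%% loaded on each node in Nodes.
final_states(Nodes, Init, Commands) ->
    [erpc:call(Node, lists, foldl, [fun ?MODULE:apply_cmd/2, Init, Commands])
     || Node <- Nodes].

%% True if all nodes ended up in exactly the same state (=:=, not ==).
deterministic(Nodes, Init, Commands) ->
    [First | Rest] = final_states(Nodes, Init, Commands),
    lists:all(fun(State) -> State =:= First end, Rest).
```

Because the fun is external, every node in `Nodes` needs the module loaded, which is why the test has to make its modules available on the other node.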

## How?

CI currently runs tests with Erlang 27.

This commit starts an Erlang 26 node in Docker, specifically for the
`rabbit_fifo_prop_SUITE`.

Test case `two_nodes_different_otp_version`, running on Erlang 27, then transfers
a few Erlang modules (e.g. module `rabbit_fifo`) to the Erlang 26 node.
The test case then runs the Ra commands on its own Erlang 27 node and
on the Erlang 26 node in Docker.
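One way to ship locally compiled modules to another node uses `code:get_object_code/1` and `code:load_binary/3`. This is a hypothetical sketch of that approach, not the suite's implementation; the module `qq_transfer_sketch` and `transfer_modules/2` are illustrative names.

```erlang
-module(qq_transfer_sketch).
-export([transfer_modules/2]).

%% Hypothetical helper: load the object code of each module in Modules, as
%% found on this node's code path, onto Node. Returns ok, or
%% {error, Module, Reason} for the first module that fails.
transfer_modules(_Node, []) ->
    ok;
transfer_modules(Node, [Mod | Rest]) ->
    case code:get_object_code(Mod) of
        {Mod, Bin, File} ->
            %% code:load_binary/3 runs on the remote node, so the remote
            %% OTP release must be able to parse this BEAM binary.
            case erpc:call(Node, code, load_binary, [Mod, File, Bin]) of
                {module, Mod} -> transfer_modules(Node, Rest);
                {error, Reason} -> {error, Mod, Reason}
            end;
        error ->
            {error, Mod, object_code_not_found}
    end.
```

For example, `qq_transfer_sketch:transfer_modules('rabbit_fifo_prop@localhost', [rabbit_fifo])`. If the remote node cannot parse the BEAM format (for instance a file compiled on a newer OTP release), loading fails with an error such as `badfile`, which this helper returns as `{error, rabbit_fifo, badfile}`.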

By default, this test case is skipped locally, to avoid a local dependency on
Docker and to avoid assuming that specific OTP versions are installed on the local host.
To run this test case locally anyway, start a lower-versioned Erlang node as
follows:
```
erl -sname rabbit_fifo_prop@localhost
```
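A suite could guard such a test case along the following lines. This is a hypothetical helper, not the suite's actual skip logic: it pings the node, compares `erlang:system_info(otp_release)` on both sides, and otherwise returns `{skip, Reason}`, a value Common Test accepts from `init_per_testcase/2`.

```erlang
-module(qq_skip_sketch).
-export([check_lower_otp_node/1]).

%% Hypothetical check: ok if Node is reachable and runs a lower OTP release
%% than this node, otherwise {skip, Reason}.
check_lower_otp_node(Node) ->
    case net_adm:ping(Node) of
        pang ->
            {skip, "node " ++ atom_to_list(Node) ++ " is not reachable"};
        pong ->
            %% otp_release is a string such as "27"
            Local = list_to_integer(erlang:system_info(otp_release)),
            Remote = list_to_integer(
                       erpc:call(Node, erlang, system_info, [otp_release])),
            case Remote < Local of
                true -> ok;
                false -> {skip, "remote node does not run a lower OTP release"}
            end
    end.
```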

ansd force-pushed the fifo-prop-suite-different-otp branch 2 times, most recently from 79b4490 to fe2adf2, June 6, 2025 09:47
mergify bot added the make label Jun 6, 2025
ansd force-pushed the fifo-prop-suite-different-otp branch from 6a2a4e5 to 5884ba4, June 6, 2025 10:06
ansd force-pushed the fifo-prop-suite-different-otp branch from 14f42f1 to 1415b04, June 6, 2025 10:45
ansd changed the title from "Run QQ property test on different OTP versions" to "Run Quorum Queue property test on different OTP versions" Jun 6, 2025
ansd marked this pull request as ready for review June 6, 2025 12:08
ansd merged commit eccf9fe into main Jun 6, 2025
286 checks passed
ansd deleted the fifo-prop-suite-different-otp branch June 6, 2025 15:08
ansd added a commit that referenced this pull request Jun 25, 2025
Test case `two_nodes_different_otp_version` was introduced in
#14042.

In CI, the code was compiled on OTP 27. The test case then applied the
Ra commands on OTP 27 and OTP 26 at runtime. For that to work, the test
transferred the compiled BEAM modules to the OTP 26 node.

Bumping OTP to 28 causes the lower-version node (be it OTP 26 or OTP 27)
to fail when parsing the atom table chunk of a BEAM file that was
compiled on OTP 28:
```
  corrupt atom table

{error,badfile}
```
This is expected, as described in erlang/otp#8913 (comment),
since erlang/otp#8913 changes the format of the atom table chunk
in BEAM files.

```
beam_lib:chunks("deps/rabbit/ebin/lqueue.beam", [atoms]).
```
parses successfully on OTP 28, irrespective of whether the file was
compiled with OTP 27 or 28.
However, a file compiled with OTP 28 fails to load on OTP 27.
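To compare how different OTP releases treat the same BEAM file, the `beam_lib:chunks/2` call above can be wrapped in a small helper and run on each node. This is an illustrative sketch; the module `beam_atoms_sketch` and `atom_count/1` are hypothetical names. On a release that cannot parse the atom table chunk, we would expect the error branch to be taken.

```erlang
-module(beam_atoms_sketch).
-export([atom_count/1]).

%% Ask this node's beam_lib to parse the atom table chunk of File and
%% report how many atoms it contains, or the beam_lib error reason.
atom_count(File) ->
    case beam_lib:chunks(File, [atoms]) of
        {ok, {Module, [{atoms, Atoms}]}} ->
            {ok, Module, length(Atoms)};
        {error, beam_lib, Reason} ->
            {error, Reason}
    end.
```

For example, `beam_atoms_sketch:atom_count("deps/rabbit/ebin/lqueue.beam")` could be evaluated on each node (directly or via `erpc:call/4`) to see which releases can read the file.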

We could use the `no_long_atoms` option just for this test case.
However, given that the similar test case `two_nodes_same_otp_version`
exists, this commit skips test case `two_nodes_different_otp_version`.

Really, the best solution would be to use different OTP versions in
RabbitMQ mixed-version testing, in addition to different RabbitMQ
versions. That way, the test case could simply RPC into the different
RabbitMQ nodes and apply the Ra commands there.
michaelklishin pushed a commit that referenced this pull request Jun 27, 2025