Test "Lin Bytes test with Domain" is flaky #541
Reported in Debian as https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1101243
Thanks for sharing, acknowledged! Are these run in some form of containerized environment? "Lin Bytes test with Domain" has also been a pain point in #520. The test was furthermore extended in #521.
Right.
Yes. I don't know exactly the setup for the amd64 occurrence, but the s390x one surely ultimately boils down to either chroot or unshare.
I don't know the exact setup for the amd64 occurrence, but the s390x one occurred on this machine: https://db.debian.org/machines.cgi?host=zandonai (I would say "native", if that means anything for s390x).
Hello. I am the one who filed the original report in Debian. While we are at it: I've also noticed that the tests fail 100% of the time on the single-CPU instances where I tried to build the package. Am I right to think, given the package name (multicoretests), that it does not make sense to run the tests on single-CPU systems? (I guess this could be fixed easily in debian/rules.) Or maybe the tests themselves could detect that they are being executed on a single-CPU system and do nothing in that case? (Note that I can build 99.9% of all packages in Debian using machines with 1 CPU, and it's still more cost-efficient, so I see no
Also: Can I expect the tests to work flawlessly (i.e. not randomly) on machines with 2 CPUs? (i.e. does multicore just mean more than 1?) Thanks.
The amd64 failure I reported happened on a VM from AWS of type (So, the machine is virtual, but considering that the underlying hardware is probably amd64 as well, I think this would count more as "native" than "emulated"). Thanks.
Yes. The test suite has been developed to stress test OCaml 5's multicore support. To do so, we have a collection of both positive and negative tests:
The "Lin Bytes test with Domain" belongs to the latter category, and hence doesn't really make sense to run on a single core, as it will fail to find a counterexample.
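As a rough illustration of the single-core detection idea raised above, a guard along the following lines could skip a negative test when fewer than two cores are available. This is only a sketch: `run_if_multicore` and its `cores` and `name` parameters are hypothetical and not part of the test suite; `Domain.recommended_domain_count` is the OCaml 5 standard library call, and treating its result as a core count is an assumption.

```ocaml
(* Hypothetical helper, not part of multicoretests: run [test] only when
   at least two cores are reported. [cores] defaults to the stdlib's
   recommendation and can be overridden, e.g. for experimentation. *)
let run_if_multicore ?(cores = Domain.recommended_domain_count ()) ~name test =
  if cores < 2 then begin
    (* A negative test cannot find a parallel counterexample here. *)
    Printf.printf "Skipping %S: only %d core(s) available\n" name cores;
    `Skipped
  end else begin
    test ();
    `Ran
  end
```

A caller would wrap the test body, e.g. `run_if_multicore ~name:"Lin Bytes test with Domain" (fun () -> ...)`. Note that two reported cores do not guarantee that both are actually scheduled in parallel on a busy VM.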
Running the test suite on 2 cores may be sufficient. The macOS M1 GitHub runner offers 3 cores and runs fine:
We have published the other 3 opam packages from this repository (qcheck-lin, qcheck-stm, qcheck-multicoretests-util) on OCaml's opam-repository as they should be of more general interest. However, we have not published the
Hm. Potentially the 2 (v)CPUs could be the reason then... Many of the tests run a parent thread (it is called a "Domain" in OCaml's multicore terminology) and then spawn two child threads (Domains) to wreak havoc by simultaneous parallel manipulation. If the 2 vCPUs mean that the AWS VM may take additional time to reschedule (pausing the parent thread and lending the second vCPU to the second child), this could ruin the chance of triggering sufficient parallelism to find a counterexample I suppose... 🤔
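The parent/two-children pattern described above can be sketched in plain OCaml 5 roughly as follows. This is not the Lin harness itself, only a minimal illustration under assumptions: `run_once`, the buffer size, and the fill characters are made up for the example; `Domain.spawn`, `Domain.join`, `Bytes.fill`, and `Bytes.to_string` are standard library calls.

```ocaml
(* Illustrative sketch only: the parent domain spawns two child domains
   that write to the same Bytes buffer without any synchronization. *)
let run_once () =
  let buf = Bytes.make 8 'a' in
  let child c =
    Domain.spawn (fun () ->
      (* Each child overwrites the whole buffer, then snapshots it. *)
      Bytes.fill buf 0 (Bytes.length buf) c;
      Bytes.to_string buf)
  in
  let d1 = child 'b' in
  let d2 = child 'c' in
  (* The parent waits for both children and collects what each observed. *)
  (Domain.join d1, Domain.join d2)
```

If both children really run in parallel, a snapshot may mix `'b'` and `'c'`. If the scheduler effectively serializes them, as may happen with 2 vCPUs shared with the parent, each child tends to see only its own character, which is the situation in which a negative test finds no counterexample.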
I had a last thought, which I just wanted to share:
precisely to avoid individual tests stepping on each other's toes, by running in parallel and thereby unknowingly preventing the discovery of counterexamples. I can see in https://people.debian.org/~sanvila/build-logs/202503/ocaml-multicoretests_0.7-1_amd64-20250317T133603.533Z that you are running the test suite with 2 simultaneous jobs:
This could also explain why it is harder to discover counterexamples, e.g., with only 2 vCPUs. I admit that one has to read between the lines to understand the above recommendation. Finally, I can see your CI logs are lengthy from the many QCheck messages printed. For example, we use an interval of 60 seconds ourselves which gives a reasonable amount of output compared to
This just triggered on MSVC bytecode
With it, this has been observed on s390x, Linux amd64, and MSVC bytecode.
The test "Lin Bytes test with Domain" does not always behave the same, causing random failures in Debian:
The failure seldom happens, and I couldn't reproduce it myself (even using the same nonce).