Improve context switching chance in Lin_thread and STM_thread modes #540

Merged: jmid merged 17 commits into main from improve-thread-mode on Mar 25, 2025

jmid (Collaborator) commented Mar 21, 2025

This fixes #338 - or at least gives it a good kick in the right direction.

Both Lin_thread and STM_thread are currently marked as experimental - and for good reason:
The chance of them triggering unfortunate, error-triggering Thread context switches is currently slim.
In hindsight, releasing them earlier - even as experimental - was probably premature.

Recall that a Thread context switch may happen

IIUC:

This PR improves the situation by utilizing a terrific hack by @edwintorok in the form of a Gc.Memprof callback:
upon allocations, with a certain probability, Memprof triggers a callback, which can be used to perform an explicit Thread.yield.
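The mechanism can be sketched in OCaml roughly as follows. This is not the PR's code: `with_sampled_callback` and its parameters are hypothetical names, and the callback is passed in as a parameter so the sketch runs without the threads library (in the PR's setting it would be Thread.yield). It relies on the Gc.Memprof tracker interface shared by 4.14 and 5.3 (`start` returns unit on 4.14 and a profile value on 5.3, hence the `ignore`) and guards `start` with a handler in the spirit of the one described for OCaml 5.0-5.1 in the commit overview; catching `Failure` specifically is an assumption.

```ocaml
(* Hypothetical helper: run [f] while Gc.Memprof samples allocations at
   [sampling_rate] samples per word, calling [on_sample] for each sample.
   In the PR's setting [on_sample] would be Thread.yield. *)
let with_sampled_callback ~sampling_rate ~on_sample f =
  let tracker =
    let open Gc.Memprof in
    { null_tracker with
      (* Returning [None] means the sampled block is not tracked further. *)
      alloc_minor = (fun _ -> on_sample (); None);
      alloc_major = (fun _ -> on_sample (); None) }
  in
  (* Assumption: 5.0/5.1 raise Failure here; on 5.2 [start] is a no-op. *)
  let started =
    try
      ignore (Gc.Memprof.start ~sampling_rate ~callstack_size:0 tracker);
      true
    with Failure _ -> false
  in
  Fun.protect f ~finally:(fun () ->
      if started then try Gc.Memprof.stop () with Failure _ -> ())
```

For example, `with_sampled_callback ~sampling_rate:1e-1 ~on_sample:Thread.yield run_cmds` (with `run_cmds` a hypothetical function executing one thread's commands) would turn each sampled allocation into a potential context-switch point. Passing `~callstack_size:0` skips callstack capture, which the callback does not need.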

To get an idea of how well this works, I've run stats on our usual {int,int64} x {ref,CList} negative tests, repeating each one 10.000 times and recording how many of these runs failed the "interface is Thread-safe" property. Since the Thread failures are still relatively rare, each of the 10.000 test inputs is repeated 25 times with Util.repeat to raise the probability of a failure.
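Why repetition helps can be made concrete with a small OCaml sketch. The `repeat` below only illustrates the idea of a property that must hold on every one of n runs of the same input; it is not the source of Util.repeat. The probability formula assumes the runs fail independently, which is a simplification.

```ocaml
(* Sketch of the idea behind Util.repeat (not its actual source):
   the property must hold on every one of [n] runs on the same input. *)
let rec repeat n prop input =
  if n < 0 then invalid_arg "repeat: negative repetition count"
  else n = 0 || (prop input && repeat (n - 1) prop input)

(* If one run exposes the bug with probability [p], and runs are assumed
   independent, at least one of [n] runs fails with probability
   1 - (1 - p)^n. *)
let detection_probability ~p ~n = 1. -. ((1. -. p) ** float_of_int n)
```

With, e.g., a hypothetical per-run probability of p = 0.01, 25 repetitions give 1 - 0.99^25 ≈ 0.22, which is why rare Thread failures become measurable at all.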

For CList, the Domain-unsafe add_node does not do any allocations in the central "unsafe window", effectively making it Thread-safe.
For testing purposes, I've therefore introduced an even worse add_node_thread that moves an allocation into this window.
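The difference can be illustrated with a hypothetical OCaml sketch, loosely modelled on the description above rather than copied from src/neg_tests; the record layout and the use of Atomic are assumptions. Both functions have a non-atomic check-then-set window, but only the second allocates inside it, so only there can an allocation-triggered Thread.yield let another thread slip in.

```ocaml
(* Hypothetical list node; the real CList test code may differ. *)
type 'a node = { value : 'a; next : 'a node option }

(* Allocation happens BEFORE the check: the window between the check and
   the set contains no allocation, hence no allocation-driven yield.
   (A correct version would use Atomic.compare_and_set.) *)
let rec add_node (head : 'a node Atomic.t) (n : 'a) : bool =
  let old_head = Atomic.get head in
  let new_node = { value = n; next = Some old_head } in
  if Atomic.get head == old_head then (Atomic.set head new_node; true)
  else add_node head n

(* Allocation moved INTO the window: a sampled allocation here can trigger
   a Thread.yield, letting another thread update [head] in between, so a
   concurrently added node can be lost. *)
let rec add_node_thread (head : 'a node Atomic.t) (n : 'a) : bool =
  let old_head = Atomic.get head in
  if Atomic.get head == old_head then begin
    let new_node = { value = n; next = Some old_head } in
    Atomic.set head new_node;
    true
  end
  else add_node_thread head n
```

Run sequentially, both behave identically; the difference only shows up when a context switch lands between the check and the set.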

Each test is run on 5.3.0, 5.2.0, and 4.14.2, with only 5.2.0 lacking Gc.Memprof support: it is available in 4.14 and was restored in 5.3. On 5.2 the Memprof.{start,stop} calls thereby degrade to essentially no-ops, which enables an easy comparison with and without the improvement.
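The version matrix above can be summarised as a small predicate. This is a hypothetical helper, not part of the PR: it encodes the behaviour described here (working Memprof on 4.14 and 5.3, none on 5.0-5.2), plus two labelled assumptions, namely that 4.x releases from 4.11 (where Gc.Memprof first appeared) behave like 4.14, and that releases after 5.3 keep the support.

```ocaml
(* Hypothetical helper: is Gc.Memprof expected to deliver callbacks on
   this compiler version? Per the PR: yes on 4.14 and 5.3, no on 5.0-5.2.
   Assumptions: 4.11-4.13 behave like 4.14; later releases keep support. *)
let memprof_expected_to_work version =
  Scanf.sscanf version "%d.%d" (fun major minor ->
      match major with
      | 4 -> minor >= 11
      | 5 -> minor >= 3
      | m -> m > 5)

let () =
  Printf.printf "OCaml %s: Memprof-driven yields %s\n" Sys.ocaml_version
    (if memprof_expected_to_work Sys.ocaml_version then "expected"
     else "unavailable")
```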

STM numbers with Util.repeat 25 (failing runs out of 10.000; -bc = bytecode):

                        5.3.0  5.3.0-bc  5.2.0  5.2.0-bc  4.14.2  4.14.2-bc
int ref STM Thread         0       0        0        0        0       0
int64 ref STM Thread     925       0(!)    34        0      899       0(!)
CList int STM.thread     260     260        0        0      275     268
CList int64 STM.thread   277     274        0        0      274     280

Lin numbers with Util.repeat 25 (failing runs out of 10.000; -bc = bytecode):

                        5.3.0  5.3.0-bc  5.2.0  5.2.0-bc  4.14.2  4.14.2-bc
int ref Lin Thread         0       0        0        0        0       0
int64 ref Lin Thread     796       0(!)     1        0      804       0(!)
int CList Lin Thread     161     122        0        0      115     123
int64 CList Lin Thread   124     139        0        1      143     137

From the above it is clear that 5.3 and 4.14, which support Memprof, work remarkably better than 5.2, which behaves like the current version.
Even in bytecode mode this represents a good step forward. However, for some reason, in bytecode mode
both Lin_thread and STM_thread fail to trigger the int64 ref issue. I've not yet figured out why this is the case.

Overview of commits

  • The first two commits make the above change to the two Lin and STM modes
  • Because the fix works relatively well, the next commits remove the previous experimental alert attributes
  • OCaml 5.0-5.1 raise an exception upon calling Memprof.{start,stop}, so the 5th commit wraps them in a handler
  • The next 2 commits improve the documentation
  • A bunch of commits adjust the src/neg_tests tests to account for the improved behaviour
  • For anyone interested, commits ad1d5e4 and a4a635c contain the source for the above stats; I plan to remove these before merging.

jmid (Collaborator, Author) commented Mar 25, 2025

FTR, the PR initially used the magic constant of 1e-3 samples per word, after earlier experiments with higher interrupt frequencies. Especially in bytecode mode, these seemed to cause too big a slowdown.
Feedback from @edwintorok, however, made me realize that I hadn't explored the trade-off between the frequency of interrupts (more Thread.yields) and the Util.repeat count (raising the error rate by repetition). A second look at the code furthermore revealed that all but the stats were still using Util.repeat 100, a good catch.

I have experimented a bit more by timing src/neg_tests/stm_tests_clist_thread_stats.exe under different sampling_rates and rep_counts. Turning up either of the two increases the error rate (and hence the bug-finding ability); however, doing so also increases the test running time (because of more callbacks and more repeated executions, respectively), which will be noticeable for positive tests that run, e.g., 1000 iterations. To keep the experiment manageable I've measured native code, where the stats benchmark takes ~5min compared to ~30min under bytecode.

With the initial ~sampling_rate:1e-3 and rep_count:25:

CList int STM.thread 313 / 10000
CList int64 STM.thread 294 / 10000
265.02user 38.18system 4:52.81elapsed 103%CPU (0avgtext+0avgdata 31680maxresident)k

Now, with ~sampling_rate:1e-1 and rep_count:3, the wall-clock time is just ~9s slower out of ~5m, but with a ~5x better error rate:

CList int STM.thread 1564 / 10000
CList int64 STM.thread 1613 / 10000
294.49user 7.35system 5:01.87elapsed 99%CPU (0avgtext+0avgdata 31880maxresident)k

The above seems like a good compromise: it does not affect the running time significantly, while giving the error rate a significant bump upwards 🙂
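A back-of-the-envelope model helps explain why the revised setting finds more failures at similar cost. Memprof's sampling_rate is given in samples per word, so a run allocating w words takes on average sampling_rate × w samples, each a Thread.yield opportunity, and rep_count runs of one input take sampling_rate × w × rep_count in total. The OCaml sketch below encodes that model; the allocation volume `words` is hypothetical, and treating every sample as equally useful is a simplifying assumption.

```ocaml
(* Simplified model (assumption): every Memprof sample is one yield
   opportunity, and each run of a test input allocates [words] words. *)
let expected_samples ~sampling_rate ~rep_count ~words =
  sampling_rate *. float_of_int words *. float_of_int rep_count

(* Ratio of expected samples per test input between the revised and the
   initial configuration; the allocation volume cancels out. *)
let sample_ratio ~words =
  let initial = expected_samples ~sampling_rate:1e-3 ~rep_count:25 ~words in
  let revised = expected_samples ~sampling_rate:1e-1 ~rep_count:3 ~words in
  revised /. initial

let () =
  (* 10_000 words per run is a hypothetical figure *)
  Printf.printf "revised/initial expected samples: %.1f\n"
    (sample_ratio ~words:10_000)
```

Under this model the ratio is (0.1 × 3) / (0.001 × 25) = 12 regardless of `words`, while the number of repeated executions drops from 25 to 3.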

Both of the above were run with OCaml 5.3.0. For comparison, I've timed the same run on OCaml 5.2.0 (where Memprof is not supported) with rep_count:100 to mimic the state of affairs before this PR:

CList int STM.thread 2 / 10000
CList int64 STM.thread 2 / 10000
412.03user 181.22system 8:57.95elapsed 110%CPU (0avgtext+0avgdata 31732maxresident)k

Going from ~9m to ~5m wall-clock time shows that the Memprof solution is not only better error-rate-wise but also much faster (the many repetitions only manage to trigger issues in 2 out of 10000 test cases).

I also timed the run on OCaml 5.2.0 with rep_count:3 to assess the overhead of Memprof:

CList int STM.thread 0 / 10000
CList int64 STM.thread 0 / 10000
179.91user 5.55system 3:05.20elapsed 100%CPU (0avgtext+0avgdata 31676maxresident)k

so the Memprof callbacks add roughly 2m (~5m vs. ~3m), i.e., a ~66% overhead on this particular test. Non-allocating tests will naturally incur less overhead, while heavily allocating tests may incur more.
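The overhead estimate compares the elapsed times reported by GNU time. A small OCaml sketch of that computation follows; the function names are hypothetical, and it assumes the m:ss.ss format used in the runs above (runs over an hour print h:mm:ss and are not handled).

```ocaml
(* Convert an elapsed time in GNU time's "m:ss.ss" format to seconds. *)
let seconds_of_elapsed s =
  Scanf.sscanf s "%d:%f" (fun m sec -> (float_of_int m *. 60.) +. sec)

(* Relative slowdown of [sampled] compared to [baseline]; 0.5 = 50% slower. *)
let relative_overhead ~sampled ~baseline =
  let t = seconds_of_elapsed sampled and t0 = seconds_of_elapsed baseline in
  (t -. t0) /. t0

let () =
  (* 5.3.0 with Memprof vs 5.2.0 without, both at rep_count:3 *)
  Printf.printf "overhead: %.0f%%\n"
    (100. *. relative_overhead ~sampled:"5:01.87" ~baseline:"3:05.20")
```

Using the exact times (301.87s vs 185.20s) gives about 63%, close to the ~66% obtained from the rounded minutes.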

Finally, for completeness, below are the stats from the initial PR description, rerun with the revised settings.
These confirm that the error rate improves further across the board, except on 5.2 (and 5.0/5.1),
due to the reduced repetition count and the lack of Memprof support.

Revised STM numbers with sampling_rate:1e-1 and Util.repeat 3 (failing runs out of 10.000; -bc = bytecode):

                        5.3.0  5.3.0-bc  5.2.0  5.2.0-bc  4.14.2  4.14.2-bc
int ref STM Thread         0       0        0        0        0       0
int64 ref STM Thread    3826       0        7        0     3891       0
CList int STM.thread    1561    1576        0        0     1639    1650
CList int64 STM.thread  1558    1572        0        0     1579    1662

Revised Lin numbers with sampling_rate:1e-1 and Util.repeat 3 (failing runs out of 10.000; -bc = bytecode):

                        5.3.0  5.3.0-bc  5.2.0  5.2.0-bc  4.14.2  4.14.2-bc
int ref Lin Thread         0       0        0        0        0       0
int64 ref Lin Thread    3407       0        0        0     3395       0
int CList Lin Thread     702     710        0        0      700     725
int64 CList Lin Thread   721     665        0        0      677     686

jmid (Collaborator, Author) commented Mar 25, 2025

CI summary for ca10290: All 36 workflows passed.

CI summary for 292a3e0:

Out of 36 workflows, 1 failed with a genuine (supposedly Cygwin) error

jmid force-pushed the improve-thread-mode branch from 292a3e0 to 0ef30bb on March 25, 2025 17:04

jmid (Collaborator, Author) commented Mar 25, 2025

CI summary for 0ef30bb: all 36 workflows passed.
I'll merge.

jmid merged commit 895f187 into main on Mar 25, 2025 (36 checks passed)
jmid deleted the improve-thread-mode branch on March 25, 2025 20:58

jmid (Collaborator, Author) commented Mar 26, 2025

CI summary for merge to main: all 37 workflows passed

Development: successfully merging this pull request may close the issue "Improve reproducibility of Lin/STM Thread modes".