Integrate RMW with concepts from previous sections
1. Use 2 figures to connect concepts from the first 3 sections.
- Figure atomic_rmw illustrates that atomic operations consist of not
only a single operation but a group of operations that need to perform
atomically.
- Figure rmw_communicate shows how this atomic group of operations can
be used on shared resource for communication.
2. Discuss how to ensure the operations of accessing the shared resource
for communication between concurrent threads are correct:
- Use Test and Set and Compare and Swap as examples to illustrate how
this can be achieved.
3. Compare the usage scenarios of Exchange and Fetch and ...
4. Introduce the concept that we can utilize atomic operations to
ensure that a group of operations can perform atomically.
Initially, any optimizing compiler will restructure your code to enhance performance on its target hardware.
218
218
The primary objective is to maintain the operational effect within \emph{the current thread},
219
219
allowing reads and writes to be rearranged to prevent pipeline stalls\footnote{%
220
-
Most \textsc{CPU} architectures execute segments of multiple instructions concurrently to improve throughput (refer to \fig{pipeline}).
220
+
Most \textsc{CPU} architectures execute segments of multiple instructions concurrently to improve throughput (refer to \fig{fig:pipeline}).
221
221
A stall, or suspension of forward progress, occurs when an instruction awaits the outcome of a preceding one in the pipeline until the necessary result becomes available.} or to optimize data locality.\punckern\footnote{%
222
222
\textsc{RAM} accesses data not byte by byte, but in larger units known as \introduce{cache lines}.
223
223
Grouping frequently used variables on the same cache line means they are processed together,
@@ -232,21 +232,21 @@ \section{Background}
232
232
Even without compiler alterations,
233
233
we would face challenges because our hardware complicates matters further!
234
234
Modern \textsc{CPU}s operate in a fashion far more complex than what traditional pipelined methods,
235
-
like those depicted in \fig{pipeline}, suggest.
235
+
like those depicted in \fig{fig:pipeline}, suggest.
236
236
They are equipped with multiple data paths tailored for various instruction types and schedulers that reorder and direct instructions through these paths.
\captionof{figure}{A flowchart depicting how two concurrent programs communicate and coordinate through a shared resource to achieve a goal, accessing the shared resource.}
351
-
\label{atomicity}
351
+
\label{fig:atomicity}
352
352
353
-
Summary of concepts from the first three sections, as shown in \fig{atomicity}.
353
+
Summary of concepts from the first three sections, as shown in \fig{fig:atomicity}.
354
354
In \secref{background}, we observe the importance of maintaining the correct order of operations: t3 \to t4 \to t5 \to t6 \to t7, so that two concurrent programs can function as expected.
355
355
In \secref{seqcst}, we see how two concurrent programs communicate to guarantee the order of operations: t5 \to t6.
356
356
In \secref{atomicity}, we understand that certain operations must be treated as a single atomic step to ensure the order of operations: t3 \to t4 \to t5 and the order of operations: t6 \to t7.
357
357
358
358
\section{Arbitrarily-sized ``atomic'' types}
359
-
359
+
\label{atomictype}
360
360
Along with \cc|atomic_int| and friends,
361
361
\cplusplus{} provides the template \cpp|std::atomic<T>| for defining arbitrary atomic types.
362
362
\clang{}, lacking a similar language feature but wanting to provide the same functionality,
@@ -384,93 +384,87 @@ \section{Read-modify-write}
384
384
\label{rmw}
385
385
386
386
So far we have introduced the importance of order and atomicity.
387
-
The latter ensures that an operation can eventually finish without being interfered by other operations.
388
-
This also establishes ordering between operations, as no operations can occur concurrently.
389
-
For two operations, A and B, either A happens before B or B happens before A.
390
-
As in \secref{seqcst}, a local order of other operations associated to the an atomic object is given as well, with \introduce{sequential consistency} as default consistency level.
391
-
Since happens before relation is transitive, just like $>$ and $<$, a global order is established by combining local order and inter-thread order provided by atomic objects.
392
-
393
-
Atomic loads and stores are all well and good when we don't need to consider the previous state of atomic variables.
394
-
But sometimes we need to read a value, modify it,
395
-
and write it back as a single atomic step.
396
-
That is, the modification is based on the previous state that is visible for reading, and the result is then written back.
387
+
In \secref{seqcst}, we see how an atomic object ensures that a single store or load operation is not reordered by the compiler within a program.
388
+
Only after establishing the correct intra-thread order can we go on to examine how multiple threads establish a correct cross-thread order.
389
+
After achieving this goal, we can further explore how concurrent threads can coordinate and collaborate smoothly.
390
+
In \secref{atomicity}, there is a need for atomicity to ensure that a group of operations is not only executed sequentially but also completes without being interrupted by operations from other threads.
391
+
This establishes a correct order of operations from different threads.
\captionof{figure}{Exchange, Test and Set, Fetch and…, Compare and Swap can all be transformed into atomic RMW operations, ensuring that operations like t1 \to t2 \to t3 will become an atomic step.}
395
+
\label{fig:atomic_rmw}
396
+
397
+
Atomic loads and stores are all well and good when we do not need to consider the previous state of atomic variables, but sometimes we need to read a value, modify it, and write it back as a single atomic step.
398
+
As shown in \fig{fig:atomic_rmw}, the modification is based on the previous state that is visible for reading, and the result is then written back.
397
399
A complete \introduce{read-modify-write} operation is performed atomically to ensure visibility to subsequent operations.
400
+
401
+
Furthermore, for communication between concurrent threads, a shared resource is required, as shown in \fig{fig:atomicity}.
402
+
Think back to the discussion in previous sections.
403
+
In order for concurrent threads to collaborate on operating a shared resource, we need a way to communicate.
404
+
Thus, the need for a channel for communication arises with the appearance of the shared resource.
405
+
406
+
As discussed earlier, the process of accessing shared resources responsible for communication must also ensure both order and non-interference.
407
+
To prevent the recursive protection of shared resources,
408
+
atomic operations can be introduced for the shared resources responsible for communication, as shown in \fig{fig:atomic_types}.
398
409
399
-
There are a few common \introduce{read-modify-write} (\textsc{RMW}) operations.
410
+
There are a few common \introduce{read-modify-write} (\textsc{RMW}) operations that turn these steps into a single atomic step.
400
411
In \cplusplus{}, they are represented as member functions of \cpp|std::atomic<T>|.
401
412
In \clang{}, they are freestanding functions.
402
413
403
-
Following example code is a simplify implementation of thread pool to demonstrate the use of \clang{}11 atomic library.
404
-
405
-
\inputminted{c}{./examples/rmw_example.c}
406
-
407
-
Compile the code with \monobox{gcc rmw\_example.c -o rmw\_example -Wall -Wextra -std=c11 -pthread} and execute the program.
408
-
A thread pool has three states: idle, cancelled and running.
409
-
It is initialized with \monobox{N\_THREADS} (default 8) of threads.
410
-
\monobox{N\_JOBS} (default 16) of jobs are added, and the pool is then set to running.
411
-
A job is simply echoing its job ID.
412
-
\monobox{sleep(1)} is used to ensure that the second batch of jobs is added after the first batch is finished; otherwise, jobs may not be consumed as expected.
413
-
Thread pool is then destroyed right after starting running.
\captionof{figure}{Test and Set (Left) and Compare and Swap (Right) leverage their functionality of checking and their atomicity to make other RMW operations perform atomically.
416
+
The red color represents atomic RMW operations, while the blue color represents RMW operations that behave atomically.}
417
+
\label{fig:atomic_types}
438
418
439
419
\subsection{Exchange}
440
420
\label{exchange}
441
-
442
-
The simplest atomic \textsc{RMW} operation is an \introduce{exchange}:
443
-
the current value is read and replaced with a new one.
444
-
In function \monobox{thread\_pool\_destroy}, \monobox{atomic\_exchange(\&thrd\_pool->state, cancelled)} reads current state and replaces it with "cancelled". A warning message is printed if the pool is destroyed when still running.
445
-
If the exchange is not performed atomically, we may initially get the state as "running". Subsequently, a thread could set the state to "cancelled" after finishing the last one, resulting in a false warning.
421
+
Transform \textsc{RMW} into modifying a private variable first,
422
+
and then directly swapping the private variable with the shared variable.
423
+
Therefore, we only need to ensure that the second step,
424
+
which involves a Read that loads the shared variable, followed by a Modify and Write that exchange it with the private variable,
425
+
is a single atomic step.
426
+
This allows programmers to extensively modify the private variable beforehand and only write it to the shared variable when necessary.
446
427
447
428
\subsection{Test and set}
448
-
429
+
\label{Testandset}
449
430
\introduce{Test-and-set} works on a Boolean value:
450
431
we read it, set it to \cpp|true|, and provide the value it held beforehand.
451
432
\clang{} and \cplusplus{} offer a type dedicated to this purpose, called \monobox{atomic\_flag}.
452
-
The value of the flag is indeterminate until initialized with \monobox{ATOMIC\_FLAG\_INIT} macro.
453
-
A thread pool has a \monobox{atomic\_flag} indicating it's initialized or not. The flag ensures initialization is thread-safe, preventing a pool from being reinitialized.
454
-
Function \monobox{thread\_pool\_init} sets the flag with \monobox{atomic\_flag\_test\_and\_set(\&thrd\_pool->initialezed)} first.
455
-
If the return value is \monobox{true}, initialization is not performed again.
456
-
Function \monobox{thread\_pool\_destroy} clears the flag with \monobox{atomic\_flag\_clear(\&thrd\_pool->initialezed)} after destroying everything.
433
+
The initial value of an \monobox{atomic\_flag} is indeterminate until initialized with \monobox{ATOMIC\_FLAG\_INIT} macro.
457
434
458
-
\subsection{Fetch and…}
435
+
\introduce{Test-and-set} operations are not limited to just \textsc{RMW} functions;
436
+
they can also be utilized for constructing a simple spinlock.
437
+
In this scenario, the flag acts as a shared resource for communication between threads.
438
+
Thus, a spinlock implemented with \introduce{Test-and-set} operations ensures that entire \textsc{RMW} operations on shared resources are performed atomically, as shown in \fig{fig:atomic_types}.
439
+
\label{spinlock}
440
+
\begin{ccode}
441
+
atomic_flag af = ATOMIC_FLAG_INIT;
459
442
460
-
We can also read a value,
461
-
perform a simple operation on it (such as addition, subtraction,
462
-
or bitwise \textsc{AND}, \textsc{OR}, \textsc{XOR}) and return its previous value,
463
-
all as part of a single atomic operation.
464
-
In the function \monobox{thread\_pool\_destroy}, \monobox{atomic\_fetch\_and} is utilized as a means to set the state to idle.
465
-
Yet, in this case, it is not necessary, as the pool needs to be reinitialized for further use regardless.
466
-
Its return value could be further utilized, for instance, to report the previous state and perform additional actions.
443
+
void lock()
444
+
{
445
+
while (atomic_flag_test_and_set(&af)) { /* wait */ }
446
+
}
447
+
448
+
void unlock() { atomic_flag_clear(&af); }
449
+
\end{ccode}
450
+
If we call \cc|lock()| and the previous value is \cc|false|,
451
+
we are the first to acquire the lock,
452
+
and can proceed with exclusive access to whatever the lock protects.
453
+
If the previous value is \cc|true|,
454
+
someone else has acquired the lock and we must wait until they release it by clearing the flag.
455
+
456
+
\subsection{Fetch and…}
457
+
Transform \textsc{RMW} to directly modify the shared variable (such as addition, subtraction,
458
+
or bitwise \textsc{AND}, \textsc{OR}, \textsc{XOR}) and return its previous value,
459
+
all as part of a single atomic operation.
460
+
Compared with \introduce{Exchange} (\secref{exchange}), when programmers only need to make a simple modification to the shared variable,
461
+
they can use \introduce{Fetch and…}.
467
462
468
463
\subsection{Compare and swap}
469
464
\label{cas}
470
-
471
465
Finally, we have \introduce{compare-and-swap} (\textsc{CAS}),
472
466
sometimes called \introduce{compare-and-exchange}.
473
-
It allows us to conditionally exchange a value \emph{if} its previous value matches some expected one.
467
+
It allows us to conditionally exchange a value \emph{if} its previous value matches the expected one.
474
468
In \clang{} and \cplusplus{}, \textsc{CAS} resembles the following,
475
469
if it were executed atomically:
476
470
\begin{ccode}
@@ -492,6 +486,55 @@ \subsection{Compare and swap}
492
486
Indeed, there is. However, we will delve into that topic later in \secref{spurious-llsc-failures}.
493
487
\end{samepage}
494
488
489
+
Because \textsc{CAS} involves an expected value comparison,
490
+
it allows \textsc{CAS} operations to extend beyond just \textsc{RMW} functions.
491
+
Here's how it works: First, read the shared resource and use this value as the expected value.
492
+
Modify the private variable, and then \textsc{CAS}. Compare the current shared variable with the expected shared variable.
493
+
If they match, it indicates that the modification is exclusive, and the write is then performed by swapping the shared variable with the private variable.
494
+
If they don't match, it implies that interference from another thread has occurred.
495
+
Subsequently, update the expected value with the current shared value and retry the modification in a loop.
496
+
This iterative process allows \textsc{CAS} to serve as a communication mechanism between threads,
497
+
ensuring that entire \textsc{RMW} operations on shared resources are performed atomically.
498
+
As shown in \fig{fig:atomic_types}, compared with \introduce{Test-and-set} \secref{Testandset},
499
+
a thread that employs \textsc{CAS} can directly use the shared resource to check.
500
+
It uses atomic \textsc{CAS} to ensure that Modify is atomic,
501
+
coupled with a while loop to ensure that the entire \textsc{RMW} can behave atomically.
502
+
503
+
\subsection{Example}
504
+
\label{rmw_example}
505
+
The following example code is a simplified implementation of a thread pool that demonstrates the use of the \clang{}11 atomic library.
506
+
507
+
\inputminted{c}{./examples/rmw_example.c}
508
+
509
+
%Compile the code with \monobox{gcc rmw\_example.c -o rmw\_example -Wall -Wextra -std=c11 -pthread} and execute the program.
510
+
%A thread pool has three states: idle, cancelled and running.
511
+
%It is initialized with \monobox{N\_THREADS} (default 8) of threads.
512
+
%\monobox{N\_JOBS} (default 16) of jobs are added, and the pool is then set to running.
513
+
%A job is simply echoing its job ID.
514
+
%\monobox{sleep(1)} is used to ensure that the second batch of jobs is added after the first batch is finished; otherwise, jobs may not be consumed as expected.
515
+
%Thread pool is then destroyed right after starting running.
516
+
Stdout of the program is:
517
+
\begin{ccode}
518
+
PI calculated with 101 terms: 3.141592653589793
519
+
\end{ccode}
520
+
521
+
\textbf{Exchange}
522
+
In function \monobox{thread\_pool\_destroy}, \monobox{atomic\_exchange(\&thrd\_pool->state, cancelled)} reads current state and replaces it with "cancelled". A warning message is printed if the pool is destroyed when still running.
523
+
If the exchange is not performed atomically, we may initially get the state as "running". Subsequently, a thread could set the state to "cancelled" after finishing the last one, resulting in a false warning.
524
+
525
+
\textbf{Test and set}
526
+
In the example, the scenario is as follows:
527
+
First, the main thread acquires a lock \monobox{future->flag} and sets it to true,
528
+
which is akin to creating a job and then transferring its ownership to the worker.
529
+
Subsequently, the main thread will be blocked until the worker clears the flag.
530
+
This indicates that the main thread will wait until the worker completes the job and returns the ownership to the main thread, which ensures correct cooperation.
531
+
532
+
\textbf{Fetch and…}
533
+
In the function \monobox{thread\_pool\_destroy}, \monobox{atomic\_fetch\_and} is utilized as a means to set the state to idle.
534
+
Yet, in this case, it is not necessary, as the pool needs to be reinitialized for further use regardless.
535
+
Its return value could be further utilized, for instance, to report the previous state and perform additional actions.
536
+
537
+
\textbf{Compare and swap}
495
538
Once threads are created in the thread pool as workers, they will continuously search for jobs to do.
496
539
Jobs are taken from the tail of job queue.
497
540
To claim a job without it being taken by another worker halfway through, we need to atomically change the pointer to the last job. Otherwise, claiming the last job is subject to a data race.
@@ -516,7 +559,7 @@ \subsection{Compare and swap}
516
559
The following diff patch removes the atomicity of claiming a job and uses pthread instead of \clang{}11 threads, because ThreadSanitizer does not yet support \clang{}11 threads.
517
560
Save the diff as \monobox{racer.diff} and patch the example code with \monobox{\$ patch rmw\_example.c racer.diff}.
518
561
519
-
\inputminted{diff}{./examples/racer.diff}
562
+
%\inputminted{diff}{./examples/racer.diff}
520
563
521
564
After compiling and running the example, you will see warning messages printed and the same job IDs echoed repeatedly.
522
565
The top two sections of a warning message indicate which two threads executed which function causing the data race.