Commit 3833bb9 (1 parent: f0ee6bc)
Add new subsection "Memory consistency model"
File tree: 6 files changed, +64 −4 lines changed

concurrency-primer.tex

Lines changed: 64 additions & 4 deletions
@@ -1090,6 +1090,66 @@ \section{Do we always need sequentially consistent operations?}
\section{Memory orderings}

\subsection{Memory consistency models}
When a program is compiled and executed, it does not necessarily follow the order in which it was written. The system may reorder and optimize operations, as long as the observable result matches what line-by-line execution would produce.

This requires an agreement between the programmer and the system (compiler, hardware, and so on): if the programmer follows the rules, the system guarantees correct execution. Here, correctness is defined by specifying which outcomes are permissible among all possible results; such a specification is called a memory consistency model. A memory consistency model lets the system optimize freely while still guaranteeing correct results.
Memory consistency models operate at multiple levels. When machine code runs on hardware, the processor may reorder and optimize instructions, and the results must still match expectations. Likewise, when a compiler translates a high-level language to assembly, it may rearrange instructions while preserving the expected outcomes. Thus, from source code down to hardware execution, agreements at every level must ensure that the program produces the expected results.
One way to envision hardware that achieves sequential consistency: each thread accesses shared memory directly, and memory processes only one read or write operation at a time. Such a machine is naturally sequentially consistent.
\subsubsection{Sequential consistency (SC)}
In 1979, Leslie Lamport defined the best-known memory consistency model, sequential consistency (SC), as follows:
\begin{quote}
A multiprocessor system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.
\end{quote}
On modern processors, guaranteeing sequential consistency rules out many optimizations and therefore slows program execution. If some of the rules are relaxed, for example by not guaranteeing program order within each processing unit, performance can be improved further.
A memory consistency model is a conceptual contract: the program's observable results must conform to the model. When the program is compiled and run on real hardware, however, there is considerable freedom to adjust the actual execution order, which may vary with circumstances, so long as the results stay within the contract.
Note that sequential consistency does not imply a single order or a single result for the program. It only requires that the program appear to execute as some interleaving of the threads' operations on a single timeline; a sequentially consistent program can therefore still produce multiple possible results.
To build intuition for sequential consistency, consider a simple example: two threads write to and read from two shared variables \monobox{x} and \monobox{y}, both initially \monobox{0}.
\begin{ccode}
int x = 0;
int y = 0;

// Thread 1        // Thread 2
x = 1;             r1 = y;
y = 1;             r2 = x;
\end{ccode}
If this program satisfies sequential consistency, then for Thread 1, \monobox{x = 1} must occur before \monobox{y = 1}, and for Thread 2, \monobox{r1 = y} must occur before \monobox{r2 = x}. For the whole program, the following six execution orders are possible:
\begin{verbatim}
+-------------------+-------------------+-------------------+
| x = 1             | x = 1             | x = 1             |
| y = 1             | r1 = y(0)         | r1 = y(0)         |
| r1 = y(1)         | y = 1             | r2 = x(1)         |
| r2 = x(1)         | r2 = x(1)         | y = 1             |
+-------------------+-------------------+-------------------+
| r1 = y(0)         | r1 = y(0)         | r1 = y(0)         |
| x = 1             | x = 1             | r2 = x(0)         |
| y = 1             | r2 = x(1)         | x = 1             |
| r2 = x(1)         | y = 1             | y = 1             |
+-------------------+-------------------+-------------------+
\end{verbatim}
Observing these orders, none yields \monobox{r1 = 1} with \monobox{r2 = 0}. Sequential consistency therefore allows only the outcomes \monobox{(r1, r2)} of \monobox{(1, 1)}, \monobox{(0, 1)}, and \monobox{(0, 0)}. Under this convention, software can rely on \monobox{(1, 0)} never occurring, and hardware may apply any optimization that preserves this guarantee.
Figure \ref{hw-seq-cst} depicts such sequentially consistent hardware: each thread accesses shared memory directly, and memory processes one read or write operation at a time. In practice, there are many ways to implement sequentially consistent hardware; it can even include caches and be banked, as long as the observable results match this simple model.
\centering
\includegraphics[keepaspectratio,width=0.7\linewidth]{images/hw-seq-cst}
\captionof{figure}{The memory model of sequentially consistent hardware.}
\label{hw-seq-cst}
\subsection{C11/C++11 atomics}
By default, all atomic operations, including loads, stores, and various forms of \textsc{RMW},
are considered sequentially consistent.
However, this is just one among many possible orderings.
@@ -1136,7 +1196,7 @@ \section{Memory orderings}
let's look at what these orderings are and how we can use them.
As it turns out, almost all of the examples we have seen so far do not actually need sequentially consistent operations.

-\subsection{Acquire and release}
+\subsubsection{Acquire and release}

We have just examined the acquire and release operations in the context of the lock example from \secref{lock-example}.
You can think of them as ``one-way'' barriers: an acquire operation permits other reads and writes to move past it,
@@ -1201,7 +1261,7 @@ \subsection{Acquire and release}
}
\end{cppcode}

-\subsection{Relaxed}
+\subsubsection{Relaxed}
Relaxed atomic operations are useful for variables shared between threads where \emph{no specific order} of operations is needed.
Although it may seem like a niche requirement, such scenarios are quite common.

@@ -1243,7 +1303,7 @@ \subsection{Relaxed}
a \textsc{CAS} loop is performed to claim a job.
All of the loads can be relaxed as we do not need to enforce any order until we have successfully modified our value.

-\subsection{Acquire-Release}
+\subsubsection{Acquire-Release}

\cc|memory_order_acq_rel| is used with atomic \textsc{RMW} operations that need to both load-acquire \emph{and} store-release a value.
A typical example involves thread-safe reference counting,
@@ -1293,7 +1353,7 @@ \subsection{Acquire-Release}
experts-only construct we have in the language.
\end{quote}

-\subsection{Consume}
+\subsubsection{Consume}

Last but not least, we introduce \cc|memory_order_consume|.
Imagine a situation where data changes rarely but is frequently read by many threads.

Binary files added (not shown):
images/hw-relaxed.pdf (70.1 KB)
images/hw-seq-cst.pdf (13.4 KB)
images/hw-tso.pdf (19.3 KB)
images/race-free.pdf (6.89 KB)
images/race.pdf (4.45 KB)
