From f4e730465747cde41416f90359f55736f4e66eb5 Mon Sep 17 00:00:00 2001
From: Denis Bakhvalov
Date: Mon, 23 Sep 2024 14:09:31 -0400
Subject: [PATCH] [Grammar] Update 10-3 Replace branches with predication.md

---
 .../10-3 Replace branches with predication.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/chapters/10-Optimizing-Branch-Prediction/10-3 Replace branches with predication.md b/chapters/10-Optimizing-Branch-Prediction/10-3 Replace branches with predication.md
index a6e274dbd7..d531350994 100644
--- a/chapters/10-Optimizing-Branch-Prediction/10-3 Replace branches with predication.md
+++ b/chapters/10-Optimizing-Branch-Prediction/10-3 Replace branches with predication.md
@@ -14,7 +14,7 @@ if (cond) { /* frequently mispredicted */ => int y = computeY();
 foo(a);
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-For the code on the right, the compiler can replace the branch that comes from the ternary operator, and generate a `CMOV` x86 instruction instead. A `CMOVcc` instruction checks the state of one or more of the status flags in the `EFLAGS` register (`CF, OF, PF, SF` and `ZF`) and performs a move operation if the flags are in a specified state or condition. A similar transformation can be done for floating-point numbers with `FCMOVcc,VMAXSS/VMINSS` instructions. In the ARM ISA, there is `CSEL` (conditional selection) instruction, but also `CSINC` (select and increment), `CSNEG` (select and negate), and a few other conditional instructions.
+For the code on the right, the compiler can replace the branch that comes from the ternary operator and generate an x86 `CMOV` instruction instead. A `CMOVcc` instruction checks the state of one or more of the status flags in the `EFLAGS` register (`CF`, `OF`, `PF`, `SF`, and `ZF`) and performs a move operation if the flags are in a specified state or condition. A similar transformation can be done for floating-point numbers with the `FCMOVcc`, `VMAXSS`, and `VMINSS` instructions. In the ARM ISA, there is the `CSEL` (conditional select) instruction, but also `CSINC` (select and increment), `CSNEG` (select and negate), and a few other conditional instructions.

 Listing: Replacing Branches with Selection - x86 assembly code.

@@ -33,7 +33,7 @@ Listing: Replacing Branches with Selection - x86 assembly code.

 [@lst:ReplaceBranchesWithSelectionAsm] shows assembly listings for the original and the branchless version. In contrast with the original version, the branchless version doesn't have jump instructions. However, the branchless version calculates both `x` and `y` independently, and then selects one of the values and discards the other. While this transformation eliminates the penalty of a branch misprediction, it is doing more work than the original code.

-We already know that the branch in the original version on the left is hard to predict. This is what motivates us to try a branchless version in the first place. In this example, the performance gain of this change depends on the characteristics of `computeX` and `computeY` functions. If the functions are small[^1] and the compiler can inline them, then selection might bring noticeable performance benefits. If the functions are big[^2], it might be cheaper to take the cost of a branch mispredict than to execute both `computeX` and `computeY` functions. Ultimately, performance measurements always decide which version is better.
+We already know that the branch in the original version on the left is hard to predict. This is what motivates us to try a branchless version in the first place. In this example, the performance gain of this change depends on the characteristics of the `computeX` and `computeY` functions. If the functions are small[^1] and the compiler can inline them, then selection might bring noticeable performance benefits. If the functions are big[^2], it might be cheaper to pay the cost of a branch misprediction than to execute both `computeX` and `computeY`. Ultimately, performance measurements always decide which version is better.

 Take a look at [@lst:ReplaceBranchesWithSelectionAsm] one more time. On the left, a processor can predict, for example, that the `je 400514` branch will be taken, speculatively call `computeY`, and start running code from the function `foo`. Remember, branch prediction usually happens many cycles before we know the actual outcome of the branch. By the time we start resolving the branch, we could be already halfway through the `foo` function, despite it is still speculative. If we are correct, we've saved a lot of cycles. If we are wrong, we have to take the penalty and start over from the correct path. In the latter case, we don't gain anything from the fact that we have already completed a portion of `foo`, it all must be thrown away. If the mispredictions occur too often, the recovering penalty outweighs the gains from speculative execution.
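The branch-to-selection transformation discussed in this patch can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not code from the chapter. The bodies of `computeX` and `computeY` are hypothetical. The `int v` parameter and the helpers `selectBranchy`, `selectBranchless`, and `checkEquivalent` are also invented for the example. Whether the ternary actually becomes a `CMOV` (x86) or `CSEL` (ARM) depends on the compiler and optimization level, so inspecting the generated assembly is the way to confirm it.

```cpp
#include <cassert>

// Hypothetical stand-ins for the chapter's computeX/computeY. They are
// small and free of side effects, so evaluating both of them is safe.
static int computeX(int v) { return v * 3 + 1; }
static int computeY(int v) { return v / 2 - 7; }

// Branchy form: exactly one of computeX/computeY runs, selected by a
// conditional jump that may be mispredicted when `cond` is random.
int selectBranchy(bool cond, int v) {
  int a;
  if (cond) {
    a = computeX(v);
  } else {
    a = computeY(v);
  }
  return a;
}

// Branchless form: both values are computed unconditionally and the
// ternary operator picks one. Optimizing compilers often lower this to
// CMOV on x86 or CSEL on ARM, but that is not guaranteed.
int selectBranchless(bool cond, int v) {
  int x = computeX(v);
  int y = computeY(v);
  return cond ? x : y;
}

// Sanity check: both forms must agree for either outcome of `cond`;
// only the shape of the generated code differs.
void checkEquivalent(int v) {
  assert(selectBranchy(true, v) == selectBranchless(true, v));
  assert(selectBranchy(false, v) == selectBranchless(false, v));
}
```

The design mirrors the trade-off in the patch text. `selectBranchless` always pays for both calls, so it can only win when `computeX` and `computeY` are cheap and `cond` is hard to predict. Benchmarking both versions on realistic inputs is what settles it.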