Commit ba9ad9c

RFC: CFI Improvements with PAuth and BTI
Improve control flow integrity for compiled WebAssembly code by utilizing two technologies from the Arm instruction set architecture - Pointer Authentication and Branch Target Identification. Copyright (c) 2021, Arm Limited.
1 parent 2821d03 commit ba9ad9c

Lines changed: 224 additions & 0 deletions
# Summary
[summary]: #summary

This RFC proposes to improve control flow integrity for compiled WebAssembly code by utilizing two
technologies from the Arm instruction set architecture - Pointer Authentication and Branch Target
Identification.

# Motivation
[motivation]: #motivation

The [security model of WebAssembly][wasm-security] ensures that Wasm modules execute in a sandboxed
environment isolated from the host runtime. One aspect of that model is that it provides implicit
control flow integrity (CFI) by forcing all function call targets to specify a valid entry in the
function index space, by using a protected call stack that is not affected by buffer overflows in
the module heap, and so on. As a result, in some Wasm applications the runtime is able to execute
untrusted code safely. However, the burden of ensuring that the security properties are upheld is
placed on the compiler to a large extent.

On the other hand, a further aspect of the WebAssembly design is efficient execution (close to
native speed), which leads to a natural tendency towards sophisticated optimizing compilers.
Unfortunately, the additional complexity increases the risk of implementation problems and in
particular compromises of the security properties. For example, Cranelift has been affected by
issues such as [CVE-2021-32629][cve] that could make it possible to access the protected call stack
or memory that is private to the host runtime.

We are trying to tackle the challenge of ensuring compiler correctness with initiatives such as
expanding fuzzing and making it possible to apply formal verification to at least some parts of the
compilation process. However, it is also reasonable to consider a defense in depth strategy and to
evaluate mitigations for potential future issues.

Finally, Wasmtime can be used as a library and in particular embedded into an application that is
implemented in languages that lack some of the hardening provided by Rust such as C and C++. In that
case the compiled WebAssembly code could provide convenient instruction sequences for attacks that
subvert normal control flow and that originate from the embedder's code, even if Cranelift and
Wasmtime themselves lack any defects.

[cve]: https://github.com/bytecodealliance/wasmtime/security/advisories/GHSA-hpqh-2wqx-7qp5
[wasm-security]: https://webassembly.org/docs/security

# Proposal
[proposal]: #proposal

Currently this proposal focuses on the AArch64 execution environment.

## Background

The Pointer Authentication (PAuth) extension to the Arm architecture protects function returns, i.e.
it provides back-edge CFI. It is described in section D5.1.5 of
[the Arm Architecture Reference Manual][arm-arm]. Some of the PAuth operations act as `NOP`
instructions when executed by a processor that does not support the extension. Furthermore, a code
generator can use either of two keys (A and B) for the pointer authentication instructions; the
architecture does not impose any restrictions on either of them, leaving that to the software
environment.

The Branch Target Identification (BTI) extension protects other kinds of indirect branches, i.e. it
provides forward-edge CFI; it is described in section D5.4.4 of the same manual. Whether BTI applies
to an executable memory page or not is controlled by a dedicated page attribute. Note that the `BTI`
"landing pad" for indirect branches acts as a `NOP` instruction when the extension is not active
(e.g. for processors that do not support BTI).

Both extensions are applicable only to the AArch64 execution state and are optional, so the usage of
each CFI technique will be controlled by dedicated settings. Wasmtime embedders need to consider a
subtlety - the setting values may be located in memory that is potentially accessible to an
attacker, who could then disable the use of PAuth and BTI in subsequent code generation. Mitigating
this issue is outside the scope of this proposal.

The article [*Code reuse attacks: The compiler story*][code-reuse-attacks] and the whitepaper
[*Pointer Authentication on ARMv8.3*][qualcomm-pauth] provide an introduction to the technologies.

In the Intel® 64 architecture [the Control-Flow Enforcement Technology (CET)][intel-cet] provides
similar capabilities.

[arm-arm]: https://developer.arm.com/documentation/ddi0487/gb/?lang=en
[code-reuse-attacks]: https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/code-reuse-attacks-the-compiler-story
[intel-cet]: https://www.intel.com/content/www/us/en/developer/articles/technical/technical-look-control-flow-enforcement-technology.html
[qualcomm-pauth]: https://www.qualcomm.com/documents/whitepaper-pointer-authentication-armv83

## Improved back-edge CFI with PAuth

Assuming that the A key is used, the proposed implementation will add the `PACIASP` instruction to
the beginning of every function compiled by Cranelift and will replace the final return with either
the `RETAA` instruction or a combination of `AUTIASP` and `RET`.

In environments that use the DWARF format for unwinding the implementation will be modified to apply
the `DW_CFA_AARCH64_negate_ra_state` operation or an equivalent immediately after the `PACIASP`
instruction.

Those steps will be skipped for simple leaf functions that do not construct frame records on the
stack.

As a concrete example, consider the following function:

```plain
function %f() {
    fn0 = %g()

block0:
    call fn0()
    return
}
```

Without the proposal it will result in the generation of:

```plain
  stp fp, lr, [sp, #-16]!
  mov fp, sp
  ldr x0, 1f
  b 2f
1:
  .byte 0x00, 0x00, 0x00, 0x00
  .byte 0x00, 0x00, 0x00, 0x00
2:
  blr x0
  ldp fp, lr, [sp], #16
  ret
```

And with the proposal:

```plain
  paciasp
  stp fp, lr, [sp, #-16]!
  mov fp, sp
  ldr x0, 1f
  b 2f
1:
  .byte 0x00, 0x00, 0x00, 0x00
  .byte 0x00, 0x00, 0x00, 0x00
2:
  blr x0
  ldp fp, lr, [sp], #16
  retaa
```

Associated AArch64-specific Cranelift settings - the default values are always `false`:
* `has_pauth` - specifies whether the target environment supports PAuth
* `sign_return_address` - the main setting controlling whether the back-edge CFI implementation is
  used; results in the generation of operations that act as `NOP` instructions unless `has_pauth` is
  also enabled
* `sign_return_address_all` - specifies that all function return addresses will be authenticated,
  including the previously mentioned cases that do not need it in principle
* `sign_return_address_with_bkey` - changes the generated instructions to use the B key; note that
  this is enforced for any Apple ABI, irrespective of the value of this setting

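The way these settings interact can be sketched in Rust as follows. This is only an illustrative model, not Cranelift code: the `PauthSettings` and `ReturnProtection` types and the `return_protection` function are hypothetical names, and the sketch assumes that `sign_return_address_all` only takes effect together with `sign_return_address`, and that `RETAA`/`RETAB` are chosen exactly when `has_pauth` is set (they are not in the `NOP`-compatible encoding space, unlike `PACIASP` and `AUTIASP`).

```rust
/// Hypothetical mirror of the AArch64 settings listed above.
#[derive(Clone, Copy, Debug, Default)]
pub struct PauthSettings {
    pub has_pauth: bool,
    pub sign_return_address: bool,
    pub sign_return_address_all: bool,
    pub sign_return_address_with_bkey: bool,
}

/// Instructions that the back-edge CFI scheme contributes to one function.
#[derive(Debug, PartialEq, Eq)]
pub struct ReturnProtection {
    /// Signing instruction placed at the very start of the function, if any.
    pub prologue: Option<&'static str>,
    /// Instruction sequence that performs the final return.
    pub epilogue: &'static [&'static str],
}

/// Picks the prologue and return instructions for a single function.
/// `has_frame_record` is false for simple leaf functions; `apple_abi`
/// models the rule that the B key is always used for Apple ABIs.
pub fn return_protection(
    s: PauthSettings,
    has_frame_record: bool,
    apple_abi: bool,
) -> ReturnProtection {
    // Assumption: the "all" variant is a modifier of the main setting.
    let sign = s.sign_return_address && (has_frame_record || s.sign_return_address_all);
    if !sign {
        return ReturnProtection { prologue: None, epilogue: &["ret"] };
    }
    let use_bkey = s.sign_return_address_with_bkey || apple_abi;
    match (use_bkey, s.has_pauth) {
        // PAuth is known to be present: the combined return can be used.
        (false, true) => ReturnProtection { prologue: Some("paciasp"), epilogue: &["retaa"] },
        (true, true) => ReturnProtection { prologue: Some("pacibsp"), epilogue: &["retab"] },
        // PAuth may be absent: only NOP-compatible operations are emitted.
        (false, false) => ReturnProtection { prologue: Some("paciasp"), epilogue: &["autiasp", "ret"] },
        (true, false) => ReturnProtection { prologue: Some("pacibsp"), epilogue: &["autibsp", "ret"] },
    }
}
```

With `has_pauth` and `sign_return_address` enabled and a frame record present, the model yields the `paciasp` and `retaa` pair shown in the example above; for a simple leaf function it leaves the plain `ret` in place.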
## Enhanced forward-edge CFI with BTI

The proposed implementation will add the `BTI j` instruction to the beginning of every basic block
that is the target of an indirect branch and that is not a function prologue. Note that in the
AArch64 backend generated function calls always target function prologues and indirect branches that
do not act like function calls appear only in the implementation of the `br_table` IR operation.
On the other hand, function prologues will begin with the `BTI c` instruction, keeping in mind that
Cranelift does not have any special handling of tail calls. If PAuth is used at the same time, then
the initial `PACIASP`/`PACIBSP` operation will act as a landing pad instead.

There is only one associated AArch64-specific Cranelift setting, `use_bti`, which is `false` by
default. Wasmtime will set the respective memory protection attribute for all executable pages if
the WebAssembly module has been compiled with that setting enabled; similarly for the Cranelift JIT.

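The landing pad rules and the page attribute can be summarized with a small Rust model. It is a sketch under stated assumptions rather than the actual implementation: `BlockKind`, `LandingPad`, `landing_pad` and `code_page_protection` are hypothetical names, and the protection flags use the Linux AArch64 values (`PROT_BTI` is `0x10` there); other operating systems expose the attribute differently.

```rust
/// Hypothetical classification of a basic block for landing pad purposes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BlockKind {
    /// First block of a function, reachable through indirect calls.
    FunctionEntry,
    /// Block that is a destination of the `br_table` jump table.
    BrTableTarget,
    /// Block that is only reached by direct branches or fallthrough.
    Other,
}

/// Instruction that must start the block so that indirect branches may land on it.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LandingPad {
    /// No extra instruction is required.
    NotNeeded,
    /// `BTI c`, accepted for call-like indirect branches.
    BtiC,
    /// `BTI j`, accepted for jump-like indirect branches.
    BtiJ,
    /// The `PACIASP`/`PACIBSP` already at the function start doubles as the pad.
    PacInstruction,
}

pub fn landing_pad(kind: BlockKind, use_bti: bool, signs_return_address: bool) -> LandingPad {
    if !use_bti {
        return LandingPad::NotNeeded;
    }
    match kind {
        BlockKind::FunctionEntry if signs_return_address => LandingPad::PacInstruction,
        BlockKind::FunctionEntry => LandingPad::BtiC,
        BlockKind::BrTableTarget => LandingPad::BtiJ,
        BlockKind::Other => LandingPad::NotNeeded,
    }
}

// mmap/mprotect protection flags; the values are the Linux AArch64 ones.
pub const PROT_READ: i32 = 0x1;
pub const PROT_EXEC: i32 = 0x4;
pub const PROT_BTI: i32 = 0x10;

/// Protection flags for pages that hold code compiled with or without `use_bti`.
pub fn code_page_protection(use_bti: bool) -> i32 {
    let base = PROT_READ | PROT_EXEC;
    if use_bti {
        base | PROT_BTI
    } else {
        base
    }
}
```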
## CFI improvements to code that is not compiled by Cranelift

Currently the code that is not compiled by Cranelift is in assembly, C, C++, or Rust.

Improving CFI for compiled C, C++, and Rust code with the same technologies is outside the scope of
this proposal, but in general it should be achievable by passing the appropriate parameters to the
respective compiler.

Functions implemented in assembly will get treatment similar to that of generated code, i.e. they
will start with the `PACIASP` instruction (and any unwinding directives), assuming that the A key is
used. However, the regular return will be preserved and preceded by the `AUTIASP` instruction
instead of being replaced with `RETAA`. The reason is that both `AUTIASP` and `PACIASP` act as `NOP`
instructions when executed by a processor that does not support PAuth, thus making the assembly code
generic. Functions that do not need the pointer authentication operations will start with the
`BTI c` instruction instead.

One potential problem in the interaction between code that is compiled by Cranelift and code that is
not is that only one side might have the CFI enhancements. However, this proposal does not have any
ABI implications, so Rust code in the Wasmtime implementation that does not use PAuth and BTI, for
example, would be able to call functions compiled by Cranelift without any issues and vice versa.
The reason is that it is the responsibility of the callee to ensure that PAuth is used correctly,
while everything is transparent to the caller. As for BTI, if an executable memory page does not
have the respective attribute set, then the extension does not have any effect, except for
introducing extra `NOP` instructions, irrespective of how the code has been reached (e.g. via a
branch from a page with BTI protections enabled); similarly for branches out of the unprotected
page. The major exception that is relevant to Wasmtime is unwinding, but there should be no issues
as long as the above-mentioned DWARF operation is used and the system unwinder is recent.

Future work that is beyond what this proposal presents may introduce further hardening that
necessitates ABI changes, e.g. by being based on
[the proposed PAuth ABI extension to ELF][pauth-abi] or something similar.

[pauth-abi]: https://github.com/ARM-software/abi-aa/blob/2021Q3/pauthabielf64/pauthabielf64.rst

### Fiber implementation in Wasmtime

The fiber implementation in Wasmtime consists of a significant amount of assembly code that will
receive the treatment described in the previous section, as an initial implementation. However, the
fiber switching code saves the values of all callee-saved registers on the stack, i.e. memory that
is potentially accessible to an adversary. Some of those values could be code addresses that would
be used by indirect branches, so a complete CFI implementation will verify the integrity of the
saved state with the `PACGA` instruction.

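One conceivable shape of such an integrity check is to chain `PACGA` over the saved registers and compare the resulting tag when the fiber is resumed. The Rust model below only illustrates that idea and is not taken from the proposal: `toy_pacga` is a stand-in built on the standard library hasher (the real instruction uses the hardware generic key, which software cannot read), and `sign_saved_state` and `verify_saved_state` are hypothetical helpers. Like `PACGA`, the stand-in returns its code in the upper 32 bits of the result.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for `PACGA Xd, Xn, Xm`: a keyed code over `value` and `modifier`,
/// placed in the upper 32 bits with the lower 32 bits cleared.
/// NOT the architectural algorithm; the key is an explicit parameter here.
fn toy_pacga(value: u64, modifier: u64, key: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    (key, value, modifier).hash(&mut hasher);
    hasher.finish() & 0xFFFF_FFFF_0000_0000
}

/// Folds all saved register values into one tag, seeded with the stack
/// pointer so that the tag is bound to the location of the saved state.
pub fn sign_saved_state(saved_regs: &[u64], sp: u64, key: u64) -> u64 {
    saved_regs
        .iter()
        .fold(sp, |tag, &reg| toy_pacga(reg, tag, key))
}

/// Recomputes the tag before the registers are restored and compares it
/// with the one recorded when the fiber was suspended.
pub fn verify_saved_state(saved_regs: &[u64], sp: u64, key: u64, expected: u64) -> bool {
    sign_saved_state(saved_regs, sp, key) == expected
}
```

In this model the tag itself would have to be kept somewhere an adversary cannot forge it, or be protected by the secrecy of the key, which is the property the hardware provides.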
# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

Since the existing implementation already uses the standard back-edge CFI techniques that are
preferred in the absence of special hardware support (i.e. a separate protected stack that is not
used for buffers that could be accessed out of bounds), the alternative is not to implement the
proposal, so the rationale is based mainly on the overhead being insignificant. In terms of code
size the impact of the back-edge CFI improvements is 1 or 2 additional instructions per function.

The [Clang CFI design][clang-cfi-design] provides an idea for an alternative implementation of the
forward-edge CFI mechanism that is enabled by BTI. It involves instrumenting every indirect branch
to check if its destination is permitted. While the overhead of this approach can be reduced by
using efficient data structures for the destination address lookup and optionally limiting the
checks only to indirect function calls, it is still significantly larger than the worst-case BTI
overhead of one instruction per basic block per function. On the other hand, it does not require any
special hardware support, so it could be applied to all supported platforms.

[clang-cfi-design]: https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html

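To make the cost of the software alternative more tangible, the following Rust sketch models the check that would guard each indirect branch. It is a deliberately simplified illustration, not the Clang design itself (which relies on bit vectors and address alignment rather than a sorted list); `CallTargets` and `checked_indirect_call` are hypothetical names.

```rust
/// Set of addresses that indirect branches are allowed to reach.
pub struct CallTargets {
    sorted: Vec<usize>,
}

impl CallTargets {
    /// Builds the lookup structure once, e.g. when a module is compiled.
    pub fn new(mut targets: Vec<usize>) -> Self {
        targets.sort_unstable();
        targets.dedup();
        Self { sorted: targets }
    }

    /// The lookup that instrumented code performs before every indirect
    /// branch; here it is a binary search over the sorted addresses.
    pub fn is_permitted(&self, destination: usize) -> bool {
        self.sorted.binary_search(&destination).is_ok()
    }
}

/// Returns the destination if the branch may proceed, or the offending
/// address as an error, at which point real instrumented code would trap.
pub fn checked_indirect_call(targets: &CallTargets, destination: usize) -> Result<usize, usize> {
    if targets.is_permitted(destination) {
        Ok(destination)
    } else {
        Err(destination)
    }
}
```

Even in this minimal form every indirect branch pays for a lookup and a conditional branch, whereas BTI adds at most a single landing pad instruction at each destination.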
# Open questions
[open-questions]: #open-questions

- What is the performance overhead of the proposal?
