| 1 | +# Summary |
| 2 | +[summary]: #summary |
| 3 | + |
| 4 | +This RFC proposes to improve control flow integrity for compiled WebAssembly code by utilizing two |
| 5 | +technologies from the Arm instruction set architecture - Pointer Authentication and Branch Target |
| 6 | +Identification. |
| 7 | + |
| 8 | +# Motivation |
| 9 | +[motivation]: #motivation |
| 10 | + |
| 11 | +The [security model of WebAssembly][wasm-security] ensures that Wasm modules execute in a sandboxed |
| 12 | +environment isolated from the host runtime. One aspect of that model is that it provides implicit |
| 13 | +control flow integrity (CFI) by forcing all function call targets to specify a valid entry in the |
| 14 | +function index space, by using a protected call stack that is not affected by buffer overflows in |
| 15 | +the module heap, and so on. As a result, in some Wasm applications the runtime is able to execute |
| 16 | +untrusted code safely. However, the burden of ensuring that the security properties are upheld is |
| 17 | +placed on the compiler to a large extent. |
| 18 | + |
| 19 | +On the other hand, a further aspect of the WebAssembly design is efficient execution (close to |
| 20 | +native speed), which leads to a natural tendency towards sophisticated optimizing compilers. |
| 21 | +Unfortunately, the additional complexity increases the risk of implementation problems and in |
| 22 | +particular compromises of the security properties. For example, Cranelift has been affected by |
| 23 | +issues such as [CVE-2021-32629][cve] that could make it possible to access the protected call stack |
| 24 | +or memory that is private to the host runtime. |
| 25 | + |
| 26 | +We are trying to tackle the challenge of ensuring compiler correctness with initiatives such as |
| 27 | +expanding fuzzing and making it possible to apply formal verification to at least some parts of the |
| 28 | +compilation process. However, it is also reasonable to consider a defense in depth strategy and to |
| 29 | +evaluate mitigations for potential future issues. |
| 30 | + |
| 31 | +Finally, Wasmtime can be used as a library and in particular embedded into an application that is |
| 32 | +implemented in languages that lack some of the hardening provided by Rust such as C and C++. In that |
| 33 | +case the compiled WebAssembly code could provide convenient instruction sequences for attacks that |
| 34 | +subvert normal control flow and that originate from the embedder's code, even if Cranelift and |
| 35 | +Wasmtime themselves lack any defects. |
| 36 | + |
| 37 | +[cve]: https://github.com/bytecodealliance/wasmtime/security/advisories/GHSA-hpqh-2wqx-7qp5 |
| 38 | +[wasm-security]: https://webassembly.org/docs/security |
| 39 | + |
| 40 | +# Proposal |
| 41 | +[proposal]: #proposal |
| 42 | + |
| 43 | +Currently this proposal focuses on the AArch64 execution environment. |
| 44 | + |
| 45 | +## Background |
| 46 | + |
| 47 | +The Pointer Authentication (PAuth) extension to the Arm architecture protects function returns, i.e. |
| 48 | +provides back-edge CFI. It is described in section D5.1.5 of |
| 49 | +[the Arm Architecture Reference Manual][arm-arm]. Some of the PAuth operations act as `NOP` |
| 50 | +instructions when executed by a processor that does not support the extension. Furthermore, a code |
| 51 | +generator can use either one of two keys (A and B) for the pointer authentication instructions; the |
| 52 | +architecture does not impose any restrictions on any of them, leaving that to the software |
| 53 | +environment. |
| 54 | + |
| 55 | +The Branch Target Identification (BTI) extension protects other kinds of indirect branches, i.e. |
| 56 | +provides forward-edge CFI; it is described in section D5.4.4. Whether BTI applies to an executable |
| 57 | +memory page or not is controlled by a dedicated page attribute. Note that the `BTI` "landing pad" |
| 58 | +for indirect branches acts as a `NOP` instruction when the extension is not active (e.g. for |
| 59 | +processors that do not support BTI). |
| 60 | + |
| 61 | +Both extensions are applicable only to the AArch64 execution state and are optional, so the usage of |
| 62 | +each CFI technique will be controlled by dedicated settings. Wasmtime embedders need to consider a |
| 63 | +subtlety - the setting values may happen to be located in memory that could be potentially |
| 64 | +accessible to an attacker, so the latter could disable the use of PAuth and BTI in subsequent code |
| 65 | +generation. Mitigating this issue is outside the scope of this proposal. |
| 66 | + |
| 67 | +The article [*Code reuse attacks: The compiler story*][code-reuse-attacks] and the whitepaper |
| 68 | +[*Pointer Authentication on ARMv8.3*][qualcomm-pauth] provide an introduction to the technologies. |
| 69 | + |
| 70 | +In the Intel® 64 architecture [the Control-Flow Enforcement Technology (CET)][intel-cet] provides |
| 71 | +similar capabilities. |
| 72 | + |
| 73 | +[arm-arm]: https://developer.arm.com/documentation/ddi0487/gb/?lang=en |
| 74 | +[code-reuse-attacks]: https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/code-reuse-attacks-the-compiler-story |
| 75 | +[intel-cet]: https://www.intel.com/content/www/us/en/developer/articles/technical/technical-look-control-flow-enforcement-technology.html |
| 76 | +[qualcomm-pauth]: https://www.qualcomm.com/documents/whitepaper-pointer-authentication-armv83 |
| 77 | + |
| 78 | +## Improved back-edge CFI with PAuth |
| 79 | + |
| 80 | +Assuming that the A key is used, the proposed implementation will add the `PACIASP` instruction to |
| 81 | +the beginning of every function compiled by Cranelift and will replace the final return with either |
| 82 | +the `RETAA` instruction or a combination of `AUTIASP` and `RET`. |
| 83 | + |
| 84 | +In environments that use the DWARF format for unwinding the implementation will be modified to apply |
| 85 | +the `DW_CFA_AARCH64_negate_ra_state` operation or an equivalent immediately after the `PACIASP` |
| 86 | +instruction. |
| 87 | + |
| 88 | +Those steps will be skipped for simple leaf functions that do not construct frame records on the |
| 89 | +stack. |
| 90 | + |
| 91 | +As a concrete example, consider the following function: |
| 92 | + |
| 93 | +```plain |
| 94 | +function %f() { |
| 95 | + fn0 = %g() |
| 96 | + |
| 97 | +block0: |
| 98 | + call fn0() |
| 99 | + return |
| 100 | +} |
| 101 | +``` |
| 102 | + |
| 103 | +Without the proposal it will result in the generation of: |
| 104 | + |
| 105 | +```plain |
| 106 | + stp fp, lr, [sp, #-16]! |
| 107 | + mov fp, sp |
| 108 | + ldr x0, 1f |
| 109 | + b 2f |
| 110 | +1: |
| 111 | + .byte 0x00, 0x00, 0x00, 0x00 |
| 112 | + .byte 0x00, 0x00, 0x00, 0x00 |
| 113 | +2: |
| 114 | + blr x0 |
| 115 | + ldp fp, lr, [sp], #16 |
| 116 | + ret |
| 117 | +``` |
| 118 | + |
| 119 | +And with the proposal: |
| 120 | + |
| 121 | +```plain |
| 122 | + paciasp |
| 123 | + stp fp, lr, [sp, #-16]! |
| 124 | + mov fp, sp |
| 125 | + ldr x0, 1f |
| 126 | + b 2f |
| 127 | +1: |
| 128 | + .byte 0x00, 0x00, 0x00, 0x00 |
| 129 | + .byte 0x00, 0x00, 0x00, 0x00 |
| 130 | +2: |
| 131 | + blr x0 |
| 132 | + ldp fp, lr, [sp], #16 |
| 133 | + retaa |
| 134 | +``` |
| 135 | + |
| 136 | +Associated AArch64-specific Cranelift settings - the default values are always `false`: |
| 137 | +* `has_pauth` - specifies whether the target environment supports PAuth |
| 138 | +* `sign_return_address` - the main setting controlling whether the back-edge CFI implementation is |
| 139 | +used; results in the generation of operations that act as `NOP` instructions unless `has_pauth` is |
| 140 | +also enabled |
| 141 | +* `sign_return_address_all` - specifies that all function return addresses will be authenticated, |
| 142 | +including the simple leaf functions mentioned above, which do not need it in principle |
| 143 | +* `sign_return_address_with_bkey` - changes the generated instructions to use the B key; note that |
| 144 | +this is enforced for any Apple ABI, irrespective of the value of this setting |
| 145 | + |
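The interaction between these settings can be sketched as follows. This is a minimal illustration, not Cranelift's actual code: the `PauthSettings` struct, the `return_address_protection` function, and the `is_apple_abi` and `builds_frame_record` parameters are hypothetical names. It assumes that `sign_return_address_all` has no effect unless `sign_return_address` is enabled, and that the combined `RETAA`/`RETAB` forms are chosen only when `has_pauth` is set, since they do not act as `NOP` instructions on processors without PAuth.

```rust
/// Hypothetical mirror of the four settings above; not Cranelift's actual API.
#[derive(Clone, Copy, Default)]
struct PauthSettings {
    has_pauth: bool,
    sign_return_address: bool,
    sign_return_address_all: bool,
    sign_return_address_with_bkey: bool,
}

/// Returns the instruction inserted at function entry (if any) and the
/// instruction sequence that performs the final return.
fn return_address_protection(
    s: PauthSettings,
    is_apple_abi: bool,        // assumption: the caller knows the target ABI
    builds_frame_record: bool, // false for simple leaf functions
) -> (Option<&'static str>, Vec<&'static str>) {
    // Simple leaf functions are skipped unless `sign_return_address_all` is set.
    let sign = s.sign_return_address && (builds_frame_record || s.sign_return_address_all);
    if !sign {
        return (None, vec!["ret"]);
    }

    // The B key is enforced for any Apple ABI, irrespective of the setting.
    let use_b_key = s.sign_return_address_with_bkey || is_apple_abi;
    let (pac, aut, combined_ret) = if use_b_key {
        ("pacibsp", "autibsp", "retab")
    } else {
        ("paciasp", "autiasp", "retaa")
    };

    // Without `has_pauth`, emit only operations that act as `NOP` on
    // processors lacking the extension, followed by a regular return.
    let ret_sequence = if s.has_pauth {
        vec![combined_ret]
    } else {
        vec![aut, "ret"]
    };
    (Some(pac), ret_sequence)
}
```

Under these assumptions, the function `%f` from the example above (which builds a frame record) with `has_pauth` and `sign_return_address` enabled yields `paciasp` at entry and `retaa` at the return, matching the generated code shown earlier.
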
| 146 | +## Enhanced forward-edge CFI with BTI |
| 147 | + |
| 148 | +The proposed implementation will add the `BTI j` instruction to the beginning of every basic block |
| 149 | +that is the target of an indirect branch and that is not a function prologue. Note that in the |
| 150 | +AArch64 backend generated function calls always target function prologues and indirect branches that |
| 151 | +do not act like function calls appear only in the implementation of the `br_table` IR operation. |
| 152 | +On the other hand, function prologues will begin with the `BTI c` instruction, keeping in mind that |
| 153 | +Cranelift does not have any special handling of tail calls. If PAuth is used at the same time, then |
| 154 | +the initial `PACIASP`/`PACIBSP` operation will act as a landing pad instead. |
| 155 | + |
| 156 | +There is only one associated AArch64-specific Cranelift setting, `use_bti`, which is `false` by |
| 157 | +default. Wasmtime will set the respective memory protection attribute for all executable pages if |
| 158 | +the WebAssembly module has been compiled with that setting enabled; similarly for the Cranelift JIT. |
| 159 | + |
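The landing pad selection described above can be sketched as a small decision function. The `BlockKind` enum, the `landing_pad` function, and its parameters are hypothetical names introduced for illustration; the real backend may structure this differently. Here `signs_return_address` means that this particular function actually starts with a `PACIASP`/`PACIBSP` instruction.

```rust
/// How a basic block can be reached; a hypothetical classification.
#[derive(Clone, Copy, PartialEq, Debug)]
enum BlockKind {
    /// First block of a function, the target of (indirect) calls.
    FunctionEntry,
    /// Target of an indirect branch that is not a call, e.g. from `br_table`.
    IndirectBranchTarget,
    /// Reached only by direct branches or fallthrough.
    Other,
}

/// Returns the instruction that must come first in the block, if any.
fn landing_pad(
    kind: BlockKind,
    use_bti: bool,
    signs_return_address: bool,
    use_b_key: bool,
) -> Option<&'static str> {
    match kind {
        // The PAuth signing instruction doubles as a landing pad for calls.
        BlockKind::FunctionEntry if signs_return_address => {
            Some(if use_b_key { "pacibsp" } else { "paciasp" })
        }
        BlockKind::FunctionEntry if use_bti => Some("bti c"),
        BlockKind::IndirectBranchTarget if use_bti => Some("bti j"),
        // No landing pad is needed for directly reached blocks or when BTI is off.
        _ => None,
    }
}
```

For instance, with `use_bti` enabled a `br_table` destination gets `bti j`, while the entry block of a function that signs its return address with the A key keeps `paciasp` as its first instruction and needs no separate `bti c`.
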
| 160 | +## CFI improvements to code that is not compiled by Cranelift |
| 161 | + |
| 162 | +Currently the code that is not compiled by Cranelift is in assembly, C, C++, or Rust. |
| 163 | + |
| 164 | +Improving CFI for compiled C, C++, and Rust code with the same technologies is outside the scope of |
| 165 | +this proposal, but in general it should be achievable by passing the appropriate parameters to the |
| 166 | +respective compiler. |
| 167 | + |
| 168 | +Functions implemented in assembly will be treated similarly to generated code, i.e. they will |
| 169 | +start with the `PACIASP` instruction (and any unwinding directives), assuming that the A key is |
| 170 | +used. However, the regular return will be preserved and preceded by the `AUTIASP` instruction |
| 171 | +rather than replaced by `RETAA`, because both `AUTIASP` and `PACIASP` act as `NOP` instructions on |
| 172 | +processors that do not support PAuth, which keeps the assembly code generic. Functions that do |
| 173 | +not need the pointer authentication operations will start with the `BTI c` instruction instead. |
| 174 | + |
| 175 | +One potential problem in the interaction between code that is compiled by Cranelift and code that is |
| 176 | +not is that only one side might have the CFI enhancements. However, this proposal does not have any |
| 177 | +ABI implications, so Rust code in the Wasmtime implementation that does not use PAuth and BTI, for |
| 178 | +example, would be able to call functions compiled by Cranelift without any issues and vice versa. |
| 179 | +The reason is that it is the responsibility of the callee to ensure that PAuth is used correctly, |
| 180 | +while everything is transparent to the caller. As for BTI, if an executable memory page does not |
| 181 | +have the respective attribute set, then the extension does not have any effect, except for |
| 182 | +introducing extra `NOP` instructions, irrespective of how the code has been reached (e.g. via a |
| 183 | +branch from a page with BTI protections enabled); similarly for branches out of the unprotected |
| 184 | +page. The major exception that is relevant to Wasmtime is unwinding, but there should be no issues |
| 185 | +as long as the above-mentioned DWARF operation is used and the system unwinder is recent enough to support it. |
| 186 | + |
| 187 | +Future work that is beyond what this proposal presents may introduce further hardening that |
| 188 | +necessitates ABI changes, e.g. by being based on |
| 189 | +[the proposed PAuth ABI extension to ELF][pauth-abi] or something similar. |
| 190 | + |
| 191 | +[pauth-abi]: https://github.com/ARM-software/abi-aa/blob/2021Q3/pauthabielf64/pauthabielf64.rst |
| 192 | + |
| 193 | +### Fiber implementation in Wasmtime |
| 194 | + |
| 195 | +The fiber implementation in Wasmtime consists of a significant amount of assembly code that will |
| 196 | +receive the treatment described in the previous section, as an initial implementation. However, the |
| 197 | +fiber switching code saves the values of all callee-saved registers on the stack, i.e. memory that |
| 198 | +is potentially accessible to an adversary. Some of those values could be code addresses that would |
| 199 | +be used by indirect branches, so a complete CFI implementation will verify the integrity of the |
| 200 | +saved state with the `PACGA` instruction. |
| 201 | + |
| 202 | +# Rationale and alternatives |
| 203 | +[rationale-and-alternatives]: #rationale-and-alternatives |
| 204 | + |
| 205 | +Since the existing implementation already uses the standard back-edge CFI techniques that are |
| 206 | +preferred in the absence of special hardware support (i.e. a separate protected stack that is not |
| 207 | +used for buffers that could be accessed out of bounds), the alternative is not to implement the |
| 208 | +proposal, so the rationale is based mainly on the overhead being insignificant. In terms of code |
| 209 | +size the impact of the back-edge CFI improvements is 1 or 2 additional instructions per function. |
| 210 | + |
| 211 | +The [Clang CFI design][clang-cfi-design] provides an idea for an alternative implementation of the |
| 212 | +forward-edge CFI mechanism that is enabled by BTI. It involves instrumenting every indirect branch |
| 213 | +to check if its destination is permitted. While the overhead of this approach can be reduced by |
| 214 | +using efficient data structures for the destination address lookup and optionally limiting the |
| 215 | +checks only to indirect function calls, it is still significantly larger than the worst-case BTI |
| 216 | +overhead of one instruction per basic block per function. On the other hand, it does not require any |
| 217 | +special hardware support, so it could be applied to all supported platforms. |
| 218 | + |
| 219 | +[clang-cfi-design]: https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html |
| 220 | + |
| 221 | +# Open questions |
| 222 | +[open-questions]: #open-questions |
| 223 | + |
| 224 | +- What is the performance overhead of the proposal? |