Remove the need for inline assembly trampolines used by Wasmtime #4611

Closed
@alexcrichton

I'm opening this as a loose tracking issue for removing the need to have inline assembly trampolines defined by Wasmtime. Ideally all necessary trampolines would be provided by Cranelift instead of today's mixture of Rust-defined, inline-assembly, and Cranelift-defined trampolines.

Below is a lot of words from #4535 (comment) when I first wrote about this:


The stack unwinding in #4431 relies on precisely knowing the stack pointer when we enter WebAssembly along with the frame pointer and last program counter when we exit WebAssembly. This is not generally available in Rust itself so we are relying on handwritten assembly trampolines for these purposes instead.

Entry into WebAssembly

Entry into WebAssembly happens via one of two routes:

  1. A "typed" route using the wasmtime::TypedFunc API or when invoking a core instance's start function (which has a known fixed signature of no inputs and no outputs). In these cases Rust does an indirect call directly to the Cranelift-generated code for the corresponding wasm function.
  2. An "untyped" route which is used by wasmtime::Func::call as well as wasmtime::component::{Func,TypedFunc}::call. In this situation Rust will call a Cranelift-compiled trampoline. The Cranelift trampoline will load arguments from a stack parameter and then make an indirect call to the actual Cranelift-compiled wasm function which is also supplied as an argument.
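
To make the two routes concrete, here is a minimal Rust sketch of their shape. Everything in it is a hypothetical stand-in: `wasm_add` plays the role of a Cranelift-compiled wasm function, `trampoline_i32_i32_to_i32` plays the role of a per-signature trampoline, and the `u64` argument buffer is an assumed layout. The vmctx arguments that real wasm functions take are omitted.

```rust
// Hypothetical stand-in for a Cranelift-compiled wasm function of type
// (i32, i32) -> i32. Real wasm functions also receive vmctx pointers.
extern "C" fn wasm_add(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// Route (1), "typed": the signature is statically known, so the host makes
/// an indirect call straight to the function pointer.
fn call_typed(func: extern "C" fn(i32, i32) -> i32, a: i32, b: i32) -> i32 {
    func(a, b)
}

/// Route (2), "untyped": a per-signature trampoline loads the arguments from
/// a buffer, calls the type-erased callee it was handed, and stores the
/// result back into slot 0 of the same buffer.
unsafe extern "C" fn trampoline_i32_i32_to_i32(callee: *const (), values: *mut u64) {
    unsafe {
        let func: extern "C" fn(i32, i32) -> i32 = std::mem::transmute(callee);
        let a = *values.add(0) as i32;
        let b = *values.add(1) as i32;
        *values.add(0) = func(a, b) as u32 as u64;
    }
}

/// Host-side wrapper for the untyped route: packs arguments into the buffer,
/// invokes the trampoline, and unpacks the single result.
fn call_untyped(callee: *const (), args: &[i32]) -> i32 {
    assert_eq!(args.len(), 2, "this trampoline only handles (i32, i32) -> i32");
    let mut values: Vec<u64> = args.iter().map(|&v| v as u32 as u64).collect();
    unsafe { trampoline_i32_i32_to_i32(callee, values.as_mut_ptr()) };
    values[0] as i32
}

/// Calls the same function through both routes.
pub fn example_entry() -> (i32, i32) {
    let typed = call_typed(wasm_add, 2, 3);
    let untyped = call_untyped(wasm_add as *const (), &[2, 3]);
    (typed, untyped)
}
```

Both routes reach the same code. The untyped one pays for an extra hop and an argument buffer in exchange for not needing the signature at the Rust call site.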

Today this all records the entry stack pointer via the host_to_wasm_trampoline defined in inline assembly. Concretely Wasmtime will "prepare" an invocation which stores the Cranelift-generated function to call (be it a raw function in case (1) or a trampoline for case (2)) into the VMContext::callee field and then invoke the host_to_wasm_trampoline inline asm symbol.

This entry isn't too relevant to the component model since we're already doing what's necessary for the stack unwinding, recording the sp on entry. Nevertheless, while describing the situation, I want to note some oddities here as well:

  • The trampoline used in (2) to load arguments from the stack is not always defined by Cranelift. Instead sometimes it's a monomorphized Rust function host_to_wasm_trampoline from the Func::wrap API. This means we unfortunately cannot rely on Cranelift to supply all these trampolines, which means we can't rely on the trampolines to do things that Rust itself can't do.
  • The entry trampoline currently requires the ability to tail-call to the actual callee. This is a technical limitation due to using the exact same trampoline for every single entry point, regardless of signature.

Ideally we would always enter WebAssembly via a Cranelift-compiled trampoline. That would mean we could do anything in the trampoline that Cranelift would do and ideally remove the need to have inline asm for this. We might still need multiple trampolines for untyped entry points and typed entry points, but overall we should ideally be able to do better here.

Exiting WebAssembly

Exiting back to the host happens in a few locations, and this is the focus of this issue, since this is where support is missing in the component model:

  1. Exiting from core wasm will either end up in something defined by Func::wrap or Func::new (roughly). Both of these use a VMHostFunctionContext which internally has two function pointers. One is the VMCallerCheckedAnyfunc which wasm actually calls and the other is the actual host function pointer defined in Rust being invoked. The function pointer contained within the VMCallerCheckedAnyfunc is a trampoline written in inline assembly which spills the fp/pc combo into VMRuntimeLimits. The function pointer to invoke contained within the VMHostFunctionContext has the "system-v ABI" since it receives arguments in native platform registers. For Func::wrap this is a Rust function and for Func::new this is a Cranelift-generated trampoline which spills arguments to the stack and then calls a static address specified at compile time (using Func::new requires Cranelift at runtime).
  2. Exiting from a component always exits via a lowered host function. Concretely what happens is that a VMComponentContext has an array lowering_anyfuncs: [VMCallerCheckedAnyfunc; component.num_lowerings]. This array is what core wasm actually calls and is exclusively populated by Cranelift-compiled trampolines (via compile_lowered_trampoline). These trampolines are similar to the Cranelift-compiled trampolines for Func::new but call a host function of type signature VMLoweringCallee. This is where fp/pc are not recorded while we exit wasm. There's no clear way to use the same trick as Func::{wrap,new}, which have a singular inline asm trampoline for all signatures, since the callee to defer to depends on the LoweringIndex.
  3. Finally exiting wasm can also happen via libcalls implemented in Wasmtime. Currently each libcall gets a unique inline-asm-defined trampoline that records the pc/fp combo and then does a direct tail-call to the actual libcall itself.

Proposal to fix this issue

Overall I find the current trampoline story pretty complicated and also pretty inefficient. There's typically at least one extra indirect call for all of these transitions and additionally there's very little cache-locality. The fix I'm going to propose here isn't a silver bullet though and will only solve some issues, but I think it is still worth pursuing.

I think we should add a few new pseudo-instructions to Cranelift:

  • Something to get the current frame pointer
  • Something to get the current stack pointer
  • Something to get the return address of the current function
  • Something to get the address of a label in a function (this may already exist, not sure)

With these tools we can start trying to eventually move all of the trampolines above to Cranelift exclusively and remove both Rust-defined and inline-asm defined trampolines:

  1. For components, and this issue, compile_lowered_trampoline could be updated to use the Cranelift instructions to record the pc/fp combo into the VMRuntimeLimits. This would remove the need for any extra trampoline when exiting a component and would solve the issue at hand.
  2. For libcalls we could use the Cranelift instructions to manually save fp/pc just before a libcall out to the runtime. This would remove all trampolines related to libcalls.
  3. For Func::new the Cranelift-generated trampoline could act similarly to compile_lowered_trampoline and store the fp/pc combo to VMRuntimeLimits, avoiding the need for two trampolines.
  4. Untyped host-to-wasm trampolines could do the sp-saving internally rather than relying on the external trampoline to do so.
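
Items 1 through 3 share one shape: a single generated trampoline records the exit fp/pc and then calls the host directly. The Rust sketch below models only that bookkeeping and is not how Wasmtime implements it. `RuntimeLimits`, its field names, and `HostCallee` are simplified, hypothetical stand-ins for `VMRuntimeLimits` and `VMLoweringCallee`. The `caller_fp` and `return_pc` values are passed in as plain parameters because Rust cannot read them itself, which is exactly the gap the proposed Cranelift instructions would fill.

```rust
use std::cell::Cell;

/// Simplified, hypothetical stand-in for `VMRuntimeLimits`; only the two
/// values recorded on exit from wasm are modeled.
#[derive(Default)]
pub struct RuntimeLimits {
    last_wasm_exit_fp: Cell<usize>,
    last_wasm_exit_pc: Cell<usize>,
}

/// Hypothetical host callee signature; the real `VMLoweringCallee` differs.
pub type HostCallee = fn(&RuntimeLimits, u32) -> u32;

/// Models the proposed single exit trampoline: store fp/pc, then call the
/// host function directly with no second trampoline in between.
/// `caller_fp` and `return_pc` stand in for values the proposed
/// "get frame pointer" / "get return address" instructions would produce.
pub fn exit_trampoline(
    limits: &RuntimeLimits,
    caller_fp: usize,
    return_pc: usize,
    callee: HostCallee,
    arg: u32,
) -> u32 {
    limits.last_wasm_exit_fp.set(caller_fp);
    limits.last_wasm_exit_pc.set(return_pc);
    callee(limits, arg)
}

/// A host function. By the time it runs, a stack walker could start
/// unwinding from the recorded fp/pc pair.
fn host_double(limits: &RuntimeLimits, arg: u32) -> u32 {
    debug_assert_ne!(limits.last_wasm_exit_fp.get(), 0);
    arg * 2
}

/// Runs one simulated exit and reports the result plus the recorded state.
pub fn example_exit() -> (u32, usize, usize) {
    let limits = RuntimeLimits::default();
    let result = exit_trampoline(&limits, 0x7ffd_1000, 0x4000_1234, host_double, 21);
    (
        result,
        limits.last_wasm_exit_fp.get(),
        limits.last_wasm_exit_pc.get(),
    )
}
```

In this model the recording and the host call happen in the same function, which is what lets the callee vary per trampoline (for example by LoweringIndex) without a shared inline-asm shim.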

Those are at least the easy ones we could knock out with more Cranelift features. Otherwise there are still a number of places that we are requiring trampolines:

  • Exit trampolines with Func::wrap could ideally be generated by Cranelift but would still require two indirect calls. One call to get to the trampoline from the original core wasm and then a second call from the trampoline to the host function itself. The main problem here is getting a trampoline. Assuming trampolines are provided by Cranelift then they become available at runtime when modules are loaded, which means Func::wrap needs to, at some point, dynamically look up a trampoline and find a corresponding one in a previous module's compiled image. This is not trivial.
  • Entry trampolines to TypedFunc are similarly somewhat nontrivial, but I think surmountable. Today a Store has a registry of untyped trampolines per-function signature, and I think it could also have a registry of typed trampolines per-function signature. This typed trampoline would then be used to enter wasm instead of calling the raw wasm function as is done today. In this situation the callee would be passed as an argument to the trampoline in the same manner untyped trampolines receive the callee.
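
The registry idea in the last bullet, and the lookup difficulty in the one before it, can be sketched as a pair of maps keyed by signature. This is an illustrative model only: `Signature`, `ValType`, `TrampolineAddr`, and `TrampolineRegistry` are hypothetical names, and trampoline addresses are plain integers rather than real code pointers.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified wasm value types.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum ValType {
    I32,
    I64,
}

/// Hypothetical, simplified function signature used as the registry key.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Signature {
    pub params: Vec<ValType>,
    pub results: Vec<ValType>,
}

/// Stand-in for the address of a compiled trampoline.
pub type TrampolineAddr = usize;

/// Models a per-store registry holding both untyped and typed entry
/// trampolines, keyed by function signature.
#[derive(Default)]
pub struct TrampolineRegistry {
    untyped: HashMap<Signature, TrampolineAddr>,
    typed: HashMap<Signature, TrampolineAddr>,
}

impl TrampolineRegistry {
    /// Called when a module is loaded: its compiled image contributes an
    /// (untyped, typed) trampoline pair for each signature it uses.
    pub fn register_module(
        &mut self,
        trampolines: &[(Signature, TrampolineAddr, TrampolineAddr)],
    ) {
        for (sig, untyped, typed) in trampolines {
            // Keep the first trampoline seen for a signature.
            self.untyped.entry(sig.clone()).or_insert(*untyped);
            self.typed.entry(sig.clone()).or_insert(*typed);
        }
    }

    /// Looks up the untyped entry trampoline for `sig`, if any module has
    /// provided one.
    pub fn untyped(&self, sig: &Signature) -> Option<TrampolineAddr> {
        self.untyped.get(sig).copied()
    }

    /// Looks up the typed entry trampoline for `sig`, if any module has
    /// provided one.
    pub fn typed(&self, sig: &Signature) -> Option<TrampolineAddr> {
        self.typed.get(sig).copied()
    }
}

/// Looks up a typed trampoline before and after a module is registered.
pub fn example_registry() -> (Option<TrampolineAddr>, Option<TrampolineAddr>) {
    let sig = Signature {
        params: vec![ValType::I32, ValType::I32],
        results: vec![ValType::I32],
    };
    let mut registry = TrampolineRegistry::default();
    let before = registry.typed(&sig);
    registry.register_module(&[(sig.clone(), 0x1000, 0x2000)]);
    (before, registry.typed(&sig))
}
```

The failed lookup before any module is registered mirrors the Func::wrap difficulty: a trampoline for a signature only exists once some loaded module's compiled image has supplied it.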

Labels: wasmtime (Issues about wasmtime that don't fall into another label)
