Commit 7ea8b9a

add tinywasm blogpost
Signed-off-by: Henry Gressmann <[email protected]>
1 parent 3991606 commit 7ea8b9a

2 files changed

+72
-75
lines changed

content/2024/tinywasm/index.md

+72-75
@@ -1,109 +1,106 @@
11
---
22
title: "TinyWasm: How I wrote my own WebAssembly Runtime"
3-
date: 2024-10-06
4-
draft: true
3+
description: "Looking back at the development of TinyWasm, a small WebAssembly runtime written in Rust."
4+
date: 2024-10-13
55
---
66

7-
<!--
8-
Talk about how big, scary words and jargon don't matter. Brute forcing your way through works. Don't be afraid to throw away code. **Be** afraid to ask for help: with the right mindset, you can figure it out yourself (might not work for everyone ?).
9-
-->
7+
After a short hiatus from writing on this blog, I'm back with an update on what I've been working on lately (or rather slowly catching up on the backlog of posts I wanted to write).
108

11-
After a long hiatus from writing on this blog, I'm back with a small update on what I've been working on lately (or rather slowly catching up on the backlog of posts I wanted to write).
12-
13-
I finally finished writing my bachelor's thesis on WebAssembly and Edge Computing this summer. More on that in a later post, but for now, I wanted to talk about another project I worked on earlier this year that inspired much of the work I did for my thesis: TinyWasm.
9+
I finally finished writing my bachelor's thesis on WebAssembly and Edge Computing this summer. More on that in a later post,
10+
but for now, I wanted to talk about another project I worked on earlier this year that inspired much of the work I did
11+
for my thesis: [TinyWasm](https://github.com/explodingcamera/tinywasm), a fully compliant WebAssembly runtime written in Rust.
1412

1513
## <u>**TinyWasm**</u>
1614

17-
When writing my posts on [OS Development](https://blog.henrygressmann.de/series/rust-os/) last year, I got really interested in WebAssembly and wanted to try it out inside the kernel. I was fed up with writing context-switching and memory management code, and WebAssembly looked like an easy way to run existing code in the operating system.
15+
When writing my posts on [OS Development](https://blog.henrygressmann.de/series/rust-os/) last year, I got interested in WebAssembly
16+
and wanted to try it out inside the kernel. I was fed up with writing context-switching and memory management code,
17+
and WebAssembly looked like an easy way to run existing code in the operating system.
1818
I looked at the existing interpreters and compilers for WebAssembly, but they were all either too complex or had too many dependencies for my taste (I was going for embedded systems, so it had to be lightweight).
1919

20-
With minimal prior experience with WebAssembly and compilers/interpreters, I now had the topic for my capstone project: A tiny WebAssembly runtime. I've been pretty burned on a lot of (unnecessarily) complex projects in the past, so, to keep myself on track, I decided to set out some constraints at the start to actually finish it on time:
20+
With just a bit of prior experience with WebAssembly and compilers/interpreters, I now had the topic for my capstone project:
21+
Building a WebAssembly runtime. I've been pretty burned on a lot of (unnecessarily) complex projects in the past, so to keep myself on track,
22+
I decided to set out some constraints at the start to finish it on time:
2123

22-
1. **No Platform-Specific Code**: My first goal was to remove all dependencies on platform-specific code so everything could work in Rust's `no_std` environment. This also meant that I could only use a few of the existing libraries for WebAssembly. Thankfully, an excellent crate for parsing WebAssembly binaries already existed: [`wasmparser`](https://github.com/bytecodealliance/wasm-tools) (however, with no `no_std` support at the time).
24+
1. **No Platform-Specific Code**: My first goal was to remove all dependencies on platform-specific code so
25+
everything could work in Rust's `no_std` environment (and potentially in my OS). This also meant I was limited in the libraries I could use.
26+
Thankfully, an excellent crate for parsing WebAssembly binaries already existed:
27+
[`wasmparser`](https://github.com/bytecodealliance/wasm-tools). At the time, it didn't support `no_std`, but I was able to fork it and make it work.
2328

2429
2. **Build the MVP**: Focus on the initial version of WebAssembly, so no threads, no SIMD, no garbage collection, etc.
2530

26-
3. **Keep it simple**: I wanted the codebase to be as small and readable as possible to make it easier to integrate into other projects, such as my OS. No premature optimization.
27-
28-
4. **No Unsafe Code**: This came a bit later, but I decided to avoid unsafe Rust code entirely (maybe something for another post).
29-
30-
I started by taking a simple "Hello World" WebAssembly program and tried to infer everything I needed. Surprisingly, this worked well and worked in a short amount of time. This gave me some slightly misplaced confidence that everything would be smooth sailing from here on out.
31-
32-
<!--
33-
```rust
34-
let mut local_values = vec![];
35-
for (i, arg) in args.iter().enumerate() {
36-
let (val, ty) = arg.to_bytes();
37-
if locals[i] != ty {
38-
return Error::other(&format!("Invalid argument type for {}, index {}: expected {:?}, got {:?}" func_name, i, locals[i], ty));
39-
}
40-
local_values.push(val);
41-
}
42-
43-
let mut stack: Vec<Vec<u8>> = Vec::new();
44-
while let Some(op) = body.next() {
45-
match op.unwrap() {
46-
Operator::LocalGet { local_index } => stack.push(local_values[local_index as usize].clone()),
47-
Operator::I64Add => {
48-
let a = i64::from_le_bytes(stack.pop().unwrap().try_into().unwrap());
49-
let b = i64::from_le_bytes(stack.pop().unwrap().try_into().unwrap());
50-
stack.push((a + b).to_le_bytes().to_vec());
51-
}
52-
Operator::I32Add => {
53-
let a = i32::from_le_bytes(stack.pop().unwrap().try_into().unwrap());
54-
let b = i32::from_le_bytes(stack.pop().unwrap().try_into().unwrap());
55-
stack.push((a + b).to_le_bytes().to_vec());
56-
}
57-
Operator::End => {
58-
info!("stack: {:#?}", stack);
59-
return Ok(returns.iter().map(|ty| WasmValue::from_bytes(&stack.pop().unwrap(), ty)).collect::<Vec<_>>());
60-
}
61-
_ => {}
62-
}
63-
}
64-
``` -->
31+
3. **Keep it simple**: I wanted the codebase to be as small and readable as possible to make it easier to integrate into other projects,
32+
such as my OS (there has already been a [fork of TinyWasm](https://github.com/reef-runtime) used as a base for a distributed WebAssembly runtime). No premature optimization.
33+
34+
4. **No Unsafe Code**: This came a bit later, but I decided to avoid unsafe Rust code entirely (maybe something for another post). While this excludes some optimizations like using virtual memory to optimize bounds checking in Wasm memory, it also forces me to write simpler code.
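
To make the bounds-checking trade-off in the last point concrete, here is a minimal sketch of a Wasm-style linear memory written in safe Rust only. `LinearMemory`, `Trap` and `checked_range` are hypothetical names chosen for this illustration and are not TinyWasm's actual API; the sketch only shows that, without virtual-memory tricks, every load and store pays for an explicit range check.

```rust
use std::convert::TryFrom;
use std::ops::Range;

/// Hypothetical trap type for this sketch (not TinyWasm's error type).
#[derive(Debug, PartialEq)]
enum Trap {
    OutOfBounds,
}

/// Hypothetical linear memory backed by a plain Vec<u8>.
struct LinearMemory {
    bytes: Vec<u8>,
}

/// Compute `addr + offset .. addr + offset + len` without wrapping.
fn checked_range(addr: u32, offset: u32, len: u64) -> Result<Range<usize>, Trap> {
    // Two u32 values plus a small length always fit in a u64.
    let start = u64::from(addr) + u64::from(offset);
    let end = start + len;
    let start = usize::try_from(start).map_err(|_| Trap::OutOfBounds)?;
    let end = usize::try_from(end).map_err(|_| Trap::OutOfBounds)?;
    Ok(start..end)
}

impl LinearMemory {
    fn new(size: usize) -> Self {
        Self { bytes: vec![0; size] }
    }

    /// Load a little-endian i32; `get` returns None instead of panicking
    /// when the range falls outside the buffer.
    fn load_i32(&self, addr: u32, offset: u32) -> Result<i32, Trap> {
        let range = checked_range(addr, offset, 4)?;
        let slice = self.bytes.get(range).ok_or(Trap::OutOfBounds)?;
        let mut buf = [0u8; 4];
        buf.copy_from_slice(slice);
        Ok(i32::from_le_bytes(buf))
    }

    /// Store a little-endian i32 with the same explicit check.
    fn store_i32(&mut self, addr: u32, offset: u32, value: i32) -> Result<(), Trap> {
        let range = checked_range(addr, offset, 4)?;
        let slice = self.bytes.get_mut(range).ok_or(Trap::OutOfBounds)?;
        slice.copy_from_slice(&value.to_le_bytes());
        Ok(())
    }
}

fn main() {
    let mut mem = LinearMemory::new(16);
    mem.store_i32(4, 0, -7).unwrap();
    assert_eq!(mem.load_i32(0, 4), Ok(-7));
    // Bytes 13..17 do not fit into a 16-byte memory, so this traps.
    assert_eq!(mem.load_i32(13, 0), Err(Trap::OutOfBounds));
}
```

Runtimes that allow unsafe code can instead reserve a large virtual address range and let the hardware fault on out-of-bounds accesses, which is the optimization this constraint rules out.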
35+
36+
I started by taking a simple "Hello World" WebAssembly program and tried to infer everything I needed without looking at the specification.
37+
Surprisingly, this worked well, and in a short time, I had a simple interpreter that could run very basic programs.
6538

6639
{{ figure(caption = "The first test version of the interpreter.", position="center", src="./assets/code.jpg", link="https://github.com/explodingcamera/tinywasm/blob/93f8e10a8c15cbcf0d09517869016c32c6bd47eb/crates/tinywasm/src/module/mod.rs#L131-L185") }}
6740

68-
With this newly gained confidence, I scrapped the initial codebase and started from scratch. Beginning with a simple public API, I did something unlike me: TDD. Essentially, following a lengthy document that outlines everything about WebAssembly (the [specification](https://webassembly.github.io/spec/core/index.html)), you just have to write tests for everything. Thankfully, I didn't actually have to write any of these tests myself, as the reference interpreter conveniently already has [thousands of them](https://github.com/WebAssembly/testsuite) covering a lot of edge cases (that are also often not clear from the specification). Plumbing these tests into my own test suite was a bit of a pain, but in the end, I had a script that would run all of the relevant tests and give me a nice graph of how many tests I had passed (and some dopamine when the number went up).
41+
With this newly gained confidence, I scrapped the initial codebase and started from scratch.
42+
Starting by defining the structure of the interpreter and the different components it would need, I quickly realized that
43+
I would need a lot of tests to make sure everything worked as expected.
44+
Thankfully, I didn't have to write all of these tests myself, as the reference interpreter conveniently already has
45+
[thousands of them](https://github.com/WebAssembly/testsuite) covering a lot of edge cases. Plumbing these tests into my test suite was a bit
46+
of a pain, but in the end, I had a script that would run all of the relevant tests and give me a nice graph of
47+
how many tests I had passed (and some dopamine when the number went up).
6948

7049
{{ figuresvg(caption = "", position="center", src="content/2024/tinywasm/assets/progress-mvp.svg") }}
7150

72-
Now was about the time my newly found confidence started to decline.
51+
Now that I had a good test suite, I started implementing the interpreter. The WebAssembly specification is thorough,
52+
but it's also dense with abstract concepts and mathematical notation.
53+
I spent a lot of time looking at different interpreters and their APIs to get a better understanding of how things were supposed
54+
to work (I can recommend the trusty [grep.app](https://grep.app/) for this).
55+
56+
For the actual implementation, I mainly started by taking a couple of tests from one of the test suites and trying to get them to pass,
57+
which worked surprisingly well. Slowly but surely, the numbers went up, and more and more tests passed.
7358

74-
Around this time, my initial overconfidence started to wane. The WebAssembly specification is thorough, but it's also dense and filled with abstract concepts that aren't immediately helpful when you're trying to write actual code. I spent a lot of time looking at different interpreters and their APIs to get a better understanding of how something was supposed to work (I can recommend the trusty [grep.app](https://grep.app/) for this).
59+
Predictably, once only about 80 of the 2000+ test cases were still failing, I still had about 20% of my work and
60+
a couple of long nights ahead of me. Finally, once all the tests passed, I compiled the interpreter to WebAssembly
61+
and ran it using TinyWasm. It worked on the first try. I was completely surprised, but LLVM randomly did the right optimizations that made it work,
62+
and its code didn't trigger any of the remaining edge cases/bugs.
7563

76-
Most of the time, I just took a couple of tests from one of the test suites and tried to get them to pass, which worked surprisingly well. Slowly but surely, the numbers went up and more and more tests passed.
64+
## <u>**Optimization**</u>
7765

78-
Predictably, once I reached only about 80/2000+ test cases left, I still had about 20% of my work and a couple of long nights ahead of me. Finally, once all the tests passed, I just compiled the interpreter to WebAssembly and ran it using TinyWasm itself. It worked on the first try. I was completely confused, but LLVM randomly did the right optimizations that made it work, and its code didn't trigger any of the remaining edge cases/bugs.
66+
Once I had a (mostly) working interpreter, I started looking into profiling and optimizing the code. I had a few ideas on how to make it faster,
67+
but I wanted to optimize only the parts that were slow and not add any additional complexity to the codebase.
68+
I started by profiling the interpreter using `perf`, `cargo-flamegraph` and later `samply` to understand where the bottlenecks were. To keep things going in the right direction,
69+
I also added some basic benchmarks using `criterion` to ensure I didn't accidentally make things slower.
7970

80-
## <u>**(Premature) Optimization**</u>
71+
{{ figure(caption = "A flamegraph using Firefox's profiler & samply", src="./assets/flamegraph.jpg") }}
8172

82-
- i wanted good perf
83-
- profiling tools
84-
- simplifying the code
85-
- a lot of array indexing due to Rust's memory model
86-
- minimizing copies
87-
- reducing struct sizes so they fit into pointers
88-
- reducing the bytecode size (custom bytecode format)
89-
- reducing reference counted values
90-
- nudging the compiler in the right direction (jump tables for opcodes)
91-
- not competing with optimized runtimes - focus is on simplicity and size
92-
- AoS vs SoA - no great support in Rust, but could speed some things up. E.g stack is a SoA right now. Minimizes memory usage.
93-
- A lot of cache misses due to the interpreter loop without unsafe code
94-
- Accessing the store is slow, but it's a tradeoff for safety
95-
- Small memory accesses are slow, but a lot of the issues disappear with the bulk memory proposal enabled
96-
- Register based interpreters are a lot faster right now, pushing/popping from the stack is extremely expensive
73+
Initially, the biggest overhead was matching opcodes in the interpreter loop. Without using unsafe code,
74+
I had to nudge the compiler in the right direction to generate jump tables for the opcodes. Thankfully, a couple of
75+
`#[inline(always)]` annotations and moving some code around did the trick, giving me a nice speedup of roughly 50%.
76+
There's not a lot of information on how to do this in Rust, but a [post](https://pliniker.github.io/post/dispatchers/)
77+
hints at this probably being the easiest cross-platform way to do it.
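
As a rough illustration of the shape such an interpreter loop can take, the sketch below dispatches on a small opcode enum with a single dense `match` and keeps the per-opcode work in an `#[inline(always)]` helper. The `Instr` enum, `binop` and `run` are made up for this example and are not TinyWasm's code. Whether the compiler actually emits a jump table depends on the compiler version, optimization level and target, so this is something to verify in a profiler or in the generated assembly rather than assume.

```rust
/// Hypothetical opcode set for this sketch.
#[derive(Clone, Copy)]
enum Instr {
    I32Const(i32),
    I32Add,
    I32Mul,
    LocalGet(u32),
    Return,
}

/// Shared helper for binary operators. Forcing it inline keeps the
/// body inside the dispatch loop instead of behind a function call.
#[inline(always)]
fn binop(stack: &mut Vec<i32>, f: impl Fn(i32, i32) -> i32) -> Option<()> {
    let b = stack.pop()?;
    let a = stack.pop()?;
    stack.push(f(a, b));
    Some(())
}

/// Run until `Return`; yields None on a malformed program
/// (stack underflow, bad local index, or running off the end).
fn run(code: &[Instr], locals: &[i32]) -> Option<i32> {
    let mut stack: Vec<i32> = Vec::with_capacity(16);
    let mut pc = 0;
    loop {
        // One dense match over a fieldless-looking tag is the pattern
        // the optimizer can lower to a jump table.
        match *code.get(pc)? {
            Instr::I32Const(v) => stack.push(v),
            Instr::I32Add => binop(&mut stack, i32::wrapping_add)?,
            Instr::I32Mul => binop(&mut stack, i32::wrapping_mul)?,
            Instr::LocalGet(i) => stack.push(*locals.get(i as usize)?),
            Instr::Return => return stack.pop(),
        }
        pc += 1;
    }
}

fn main() {
    // Computes locals[0] * 2 + 1.
    let code = [
        Instr::LocalGet(0),
        Instr::I32Const(2),
        Instr::I32Mul,
        Instr::I32Const(1),
        Instr::I32Add,
        Instr::Return,
    ];
    assert_eq!(run(&code, &[20]), Some(41));
}
```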
78+
79+
From there, I also looked into reducing the size of the bytecode and the interpreter itself. TinyWasm uses a custom bytecode format
80+
that's a bit easier to execute than the standard WebAssembly format and can be zero-copy deserialized (powered by [`rkyv`](https://github.com/rkyv/rkyv)).
81+
This bytecode is represented by a big enum with all the different opcodes and their arguments, and without any optimizations, it was about 32 bytes per instruction.
82+
To reduce this, I removed some redundant information that could be inferred from the context and added more specialized opcodes for common patterns (Super Instructions).
83+
Currently, the bytecode is about 16 bytes per instruction, which is a nice improvement to memory usage and performance due to better memory alignment.
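
The effect of trimming payloads can be checked with `std::mem::size_of`. The two enums below are invented for this sketch (they are not TinyWasm's instruction set, and their sizes differ from the 32 and 16 bytes mentioned above); they only show that an enum is as large as its biggest variant, and that a fused "super instruction" such as the hypothetical `LocalGet2Add` still fits into the compact layout.

```rust
use std::mem::size_of;

/// Hypothetical "fat" encoding: one variant carries redundant data inline,
/// so every instruction pays for it.
#[allow(dead_code)]
enum FatInstr {
    I32Const(i32),
    LocalGet(u32),
    Block { ty: u32, end_offset: u64, else_offset: u64 },
}

/// Hypothetical compact encoding: narrower offsets, no redundant fields,
/// plus one fused instruction for a common pattern.
#[allow(dead_code)]
enum CompactInstr {
    I32Const(i32),
    LocalGet(u32),
    Block { end_offset: u32 },
    /// Stands for `local.get a; local.get b; i32.add` in a single dispatch.
    LocalGet2Add { a: u32, b: u32 },
}

/// Returns (fat, compact) sizes in bytes for the current target.
fn instr_sizes() -> (usize, usize) {
    (size_of::<FatInstr>(), size_of::<CompactInstr>())
}

fn main() {
    let (fat, compact) = instr_sizes();
    println!("fat: {fat} bytes, compact: {compact} bytes");
    assert!(compact < fat);
}
```

A cheap way to keep such a size from regressing is a unit test or `const` assertion on `size_of` for the instruction type.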
84+
85+
Right now, the biggest bottleneck is the stack, mainly `push` and `pop` operations. I'm looking into ways to optimize this, but it's tricky without using unsafe code. Other runtimes, such as [wasmi](https://wasmi-labs.github.io/blog/posts/wasmi-v0.32/), show that register-based interpreters are much faster. However, I'm not sure if I want to go down that route yet, as it would add a lot of complexity to parsing, and I'd like to stay
86+
as close to the original WebAssembly model as possible.
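
To see where the stack traffic comes from, the toy comparison below computes `c = a + b` once as a stack program and once as a single register-style instruction. `StackOp`, `RegOp` and both run functions are hypothetical and far simpler than what TinyWasm or wasmi do; the point is only that the stack form needs four dispatches and six push/pop operations where the register form needs one dispatch and none.

```rust
/// Hypothetical stack-machine ops, close to the Wasm model.
enum StackOp {
    LocalGet(usize),
    I32Add,
    LocalSet(usize),
}

/// Hypothetical register-style op: operands are named directly.
enum RegOp {
    I32Add { dst: usize, lhs: usize, rhs: usize },
}

/// Runs the stack program and returns how many push/pop operations it took.
fn run_stack(code: &[StackOp], locals: &mut [i32]) -> Option<usize> {
    let mut stack: Vec<i32> = Vec::new();
    let mut traffic = 0;
    for op in code {
        match *op {
            StackOp::LocalGet(i) => {
                stack.push(*locals.get(i)?);
                traffic += 1;
            }
            StackOp::I32Add => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a.wrapping_add(b));
                traffic += 3;
            }
            StackOp::LocalSet(i) => {
                *locals.get_mut(i)? = stack.pop()?;
                traffic += 1;
            }
        }
    }
    Some(traffic)
}

/// Runs the register program; no operand stack is involved at all.
fn run_reg(code: &[RegOp], regs: &mut [i32]) -> Option<()> {
    for op in code {
        match *op {
            RegOp::I32Add { dst, lhs, rhs } => {
                let value = regs.get(lhs)?.wrapping_add(*regs.get(rhs)?);
                *regs.get_mut(dst)? = value;
            }
        }
    }
    Some(())
}

fn main() {
    let mut locals = [2, 3, 0];
    let stack_code = [
        StackOp::LocalGet(0),
        StackOp::LocalGet(1),
        StackOp::I32Add,
        StackOp::LocalSet(2),
    ];
    let traffic = run_stack(&stack_code, &mut locals).unwrap();
    assert_eq!((locals[2], traffic), (5, 6));

    let mut regs = [2, 3, 0];
    run_reg(&[RegOp::I32Add { dst: 2, lhs: 0, rhs: 1 }], &mut regs).unwrap();
    assert_eq!(regs[2], 5);
}
```

The cost of the register form is the translation step: Wasm bytecode is stack-based, so the operands have to be assigned to registers while parsing, which is the extra complexity mentioned above.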
87+
88+
For actual performance, I'm currently at about 1/3 of the speed of wasmi, which is pretty good considering the size of the codebase. These benchmarks are not available online yet as this was part of my thesis, but whenever I get around to cleaning them up, I'll publish them on GitHub as well.
9789

9890
## <u>**Conclusion**</u>
9991

100-
I was super happy with the results, and I'm still pushing the odd update here and there. Currently, TinyWasm supports WebAssembly V2 (without SIMD and threads) and several of other proposals. I also posted it on HN and Reddit, where I got some nice feedback and a few stars on GitHub (obviously the most important part).
92+
I was super happy with the results, and I'm still pushing the odd update here and there. The next step is SIMD support (currently in the works), for which I recently refactored the stack to use a more efficient representation (SoA for differently sized types). After that, I'll look into adding threads and moving to support the WebAssembly System Interface (WASI). However, I'm waiting for the spec to stabilize before I start implementing it.
93+
94+
As of now, TinyWasm supports WebAssembly V2 (without SIMD and threads) and several other proposals, such as reference types and bulk memory operations, so most programs should work fine. After submitting it as my capstone project, I also posted it on HN and Reddit, where I got some nice feedback and a few stars on GitHub (obviously the most important part).
10195

10296
{{ figure(caption = "Internet points are important.", position="center", src="./assets/hn.jpg", link="https://news.ycombinator.com/item?id=39627410") }}
10397

104-
If you're interested in checking it out or maybe even contributing, TinyWasm is up on [GitHub](https://github.com/explodingcamera/tinywasm) and also on [crates.io](https://crates.io/crates/tinywasm). Feel free to poke around, open issues, or even submit a PR (I recently improved the test suite and added a small contribution guide).
98+
If you're interested in checking it out or maybe even contributing, TinyWasm is up
99+
on [GitHub](https://github.com/explodingcamera/tinywasm) and also on [crates.io](https://crates.io/crates/tinywasm).
100+
Feel free to poke around, open issues, or even submit a PR (I recently improved the test suite and added a small contribution guide).
105101

106102
## <u>**Further Reading**</u>
107103

108-
After finishing TinyWasm, I started working on my thesis, which was a lot of fun and work. I'll probably write a post about that in the future. Still, for now, I can recommend the following resources if you're interested in WebAssembly/Interpreters/Compilers: [Crafting Interpreters](https://craftinginterpreters.com/) by Robert Nystrom is probably the best introduction to the entire field.
109-
Going from there, I can also recommend the [Writing an Interpreter in Go](https://interpreterbook.com/)/[Writing a Compiler in Go](https://compilerbook.com/) books or the [Writing Interpreters in Rust Guide](httphttps://rust-hosted-langs.github.io/book/). But my biggest takeaway from this project was that you don't need to understand everything to start and can infer many basic principles from looking at other projects and just trying things.
104+
[Crafting Interpreters](https://craftinginterpreters.com/) by Robert Nystrom is probably the best introduction to the field.
105+
Going from there, I can also recommend the [Writing an Interpreter in Go](https://interpreterbook.com/)/[Writing a Compiler in Go](https://compilerbook.com/)
106+
books or the [Writing Interpreters in Rust Guide](https://rust-hosted-langs.github.io/book/). I mostly looked at the source code of other interpreters, though, so don't be scared by all the theory. Simple interpreters are surprisingly easy to write, and a small one can be a great weekend project.
