
Refactor bytecode representation #4220


Open · wants to merge 1 commit into main

Conversation


@raskad raskad commented Mar 30, 2025

This PR explores a new bytecode representation.

Currently the bytecode is encoded in one [u8] list. All opcodes (u8) and arguments are encoded in that list. This means that we have to perform a lot of individual, unaligned reads on that list. For example, when we read an opcode with two registers, we currently perform three unaligned reads.
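To make that cost concrete, here is a minimal sketch of what decoding an opcode with two register operands from a flat byte stream looks like; the names and operand widths are illustrative, not Boa's actual implementation:

```rust
// Sketch of reading from a flat `[u8]` bytecode stream. Each operand
// needs its own (potentially unaligned) multi-byte read.
fn read_u32(bytecode: &[u8], offset: usize) -> u32 {
    u32::from_le_bytes(bytecode[offset..offset + 4].try_into().unwrap())
}

fn decode_two_register_op(bytecode: &[u8], pc: usize) -> (u8, u32, u32) {
    let opcode = bytecode[pc];           // read 1: opcode
    let r1 = read_u32(bytecode, pc + 1); // read 2: first register
    let r2 = read_u32(bytecode, pc + 5); // read 3: second register
    (opcode, r1, r2)
}
```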

This PR moves the bytecode to a fixed-width list of 64-bit instructions, with arguments either encoded inline in the instruction or spilling over into a separate [u32] list. Each u64 instruction contains an opcode (u8), a flag describing the argument format (u8), and either the inline arguments or the index and count of the arguments in the spillover list.
In my local benchmarks this seems to have a positive impact on performance. All benchmarks score higher, with the overall score rising from 307 to 322.
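As an illustration, such a fixed-width layout could be sketched like this; the field positions and widths here are my assumptions, not necessarily the PR's actual encoding:

```rust
/// Hypothetical 64-bit instruction layout: opcode in bits 0..8,
/// argument-format flag in bits 8..16, and two inline 16-bit
/// registers in bits 16..32 and 32..48. Decoding each field is a
/// single aligned shift-and-mask instead of several unaligned reads.
#[derive(Copy, Clone)]
struct Instruction(u64);

impl Instruction {
    /// Pack an opcode with two inline register arguments.
    fn with_two_registers(opcode: u8, flag: u8, r1: u16, r2: u16) -> Self {
        Instruction(
            (opcode as u64)
                | ((flag as u64) << 8)
                | ((r1 as u64) << 16)
                | ((r2 as u64) << 32),
        )
    }

    fn opcode(self) -> u8 {
        self.0 as u8
    }

    fn flag(self) -> u8 {
        (self.0 >> 8) as u8
    }

    fn register1(self) -> u16 {
        (self.0 >> 16) as u16
    }

    fn register2(self) -> u16 {
        (self.0 >> 32) as u16
    }
}
```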

One drawback is that there is some wasted space in the instructions. How much depends on the opcode and the argument format, but since most opcodes use at least two registers, this is less of a concern than it might have been previously. The wasted space could be reduced further by fitting multiple opcodes into one u64 where possible; that would require a non-integer pc and adjustments to the patch code, but should be possible.

In addition to this change in bytecode encoding, I took the chance to make three further changes:

  1. Move argument decoding into defined central types. This lets us reuse the argument-decoding code and get rid of a lot of the per-opcode boilerplate that was error-prone to write manually.
  2. Generate emit functions for every opcode that can be used in the bytecompiler, to get rid of error-prone manual emit code.
  3. Move the handling of the different CompletionTypes into the opcode code itself, which allows the CompletionType enum to be removed. This moves the handling code out of the hot loop that iterates through the opcodes. Also, many opcodes can only return a limited set of completions; moving the handling into the opcodes enables more specific handling, in some cases removing the handling entirely.
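A generated emit function for points 1 and 2 might look roughly like the following; the builder type, opcode constant, and bit layout are hypothetical stand-ins for the real bytecompiler code:

```rust
/// Minimal stand-in for the bytecode builder; the real code lives in
/// the bytecompiler. `OP_ADD` and the bit layout are placeholders.
struct BytecodeBuilder {
    instructions: Vec<u64>,
}

const OP_ADD: u8 = 0x01;
const FORMAT_TWO_REGISTERS: u8 = 0x00;

impl BytecodeBuilder {
    /// A generated emit function: callers pass typed operands instead
    /// of hand-packing bytes, removing a class of manual encoding bugs.
    fn emit_add(&mut self, dst: u16, src: u16) {
        let inst = (OP_ADD as u64)
            | ((FORMAT_TWO_REGISTERS as u64) << 8)
            | ((dst as u64) << 16)
            | ((src as u64) << 32);
        self.instructions.push(inst);
    }
}
```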


Test262 conformance changes

| Test result | main count | PR count | difference |
|-------------|------------|----------|------------|
| Total       | 50,254     | 50,254   | 0          |
| Passed      | 46,656     | 46,656   | 0          |
| Ignored     | 1,634      | 1,634    | 0          |
| Failed      | 1,964      | 1,964    | 0          |
| Panics      | 0          | 0        | 0          |
| Conformance | 92.84%     | 92.84%   | 0.00%      |

Member Author

raskad commented Mar 30, 2025

This is not finished in any way; I just wanted to put it out to get some feedback on the overall concept. All of the flowgraph / trace code is still missing from the new approach. If the feedback is positive, I will see how to best implement that and possibly also move it into macros. In addition, spend_budget_and_execute is not implemented yet, and some arguments are not represented optimally (I wanted to see if there are performance gains first, before spending more time on that).

To get an overview / feeling for the changes:

  • look at some changes in the bytecompiler code for the new emit code,
  • look at some individual opcode files for the execution code,
  • see some of the major changes in core/engine/src/vm/opcode/mod.rs and core/engine/src/vm/opcode/args.rs.

@raskad raskad force-pushed the refactor-bytecode-u64-wip branch from e45496f to 55c225e Compare April 13, 2025 00:37
@raskad raskad marked this pull request as ready for review April 13, 2025 01:16
@raskad raskad added this to the next-release milestone Apr 13, 2025
@raskad raskad requested a review from a team April 13, 2025 01:16
@HalidOdat HalidOdat left a comment

Really liking the direction of this PR, simplifying the arguments and generating the opcode functions is a great step forward. 😄

That said, I did notice that the bytecode size increases by about 3x (based on checking the combined.js output). While some of that overhead might be reduced by encoding multiple instructions into a single one, it's still likely to end up at least twice as large overall. Additionally, splitting the bytecode into two arrays could have a negative impact on cache locality.
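A rough back-of-envelope comparison of the two footprints (hypothetical helpers, not the actual measurement code) shows where the growth comes from: the old layout spends exactly one byte per opcode plus the packed operand bytes, while the new one spends a full 8 bytes per instruction plus 4 bytes per spilled argument:

```rust
/// Old layout: opcodes and operands packed into one byte stream.
fn old_size_bytes(bytecode: &[u8]) -> usize {
    bytecode.len()
}

/// New layout: 8 bytes per u64 instruction plus 4 bytes per
/// spilled u32 argument.
fn new_size_bytes(instructions: &[u64], spill: &[u32]) -> usize {
    instructions.len() * 8 + spill.len() * 4
}
```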

Overall, I think the approach we're taking is aligned with what engines like V8 and JavaScriptCore are doing. There's a great article from the JavaScriptCore team that touches on a similar idea with prefix opcodes: A new bytecode format for JavaScriptCore.

In terms of performance, I suspect the bigger issue isn't so much unaligned reads, but rather how we read arguments — currently it's done one at a time, with a bounds check on each access. We might see a noticeable performance boost if we check bounds ahead of time and read the arguments in bulk.
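A minimal sketch of that idea, assuming the spilled arguments live in a [u32] slice (a hypothetical helper, not the engine's actual code):

```rust
/// Per-argument reads each pay their own bounds check; doing one
/// checked subslice up front and converting it to a fixed-size array
/// makes the individual reads check-free.
fn read_two_args_checked(spill: &[u32], index: usize) -> Option<[u32; 2]> {
    // One bounds check for the whole argument group...
    let args: &[u32; 2] = spill.get(index..index + 2)?.try_into().ok()?;
    // ...then both reads are plain array accesses.
    Some(*args)
}
```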

Would love to hear your thoughts! :)

EDIT: Here is the code I used to get the size :)
