Skip to content

Add support for function references proposal #2562

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

zherczeg
Copy link
Contributor

@zherczeg zherczeg commented Mar 11, 2025

This large patch adds function references support for those parts of the wabt library, which already present in the code. The current code is designed for an older (outdated) variant of the function references proposal.

Key changes:

  • Var variables can optionally store type data. This way there is no need for a Var/Type field pair in many cases.
  • CallRef now needs to explicitly provide its reference type (no auto detection)
  • Tracking local reference initialization (non nullable refs must be set before using them)
  • Supporting type equality comparisons in the validator and in the interpreter
  • Adds a code for reading/writing 33 bit integers (leb format)
  • Remove EndTypeSection in shared validator, types can be validated earlier
  • Improve named reference resolving in the text parser

@zherczeg
Copy link
Contributor Author

@sbc100 @keithw since I don't know enough about the project, these are just some random changes in the parser.

The aim is parsing https://github.com/WebAssembly/function-references/blob/main/test/core/call_ref.wast

It is doing something (parsing correctly is surely an exaggeration) with call_ref $ii and ref null $ii. The next item is ref.null $ii. It looks like it will need API changes, since a token type will not be enough. Would adding an index to it be a good idea?

Could you check that these changes makes sense? There is a Type::Reference enum in the code, but function reference introduces two other enums, and does not use Reference. This is strange to me.

Thank you for the help.

@zherczeg
Copy link
Contributor Author

This patch goes to a new direction. The 32 bit type filed of Var is split into two 16 bit fields. The second can be used to store WebAssembly types (negative 7 bit numbers). This way Var can be used in a more generic way.

@zherczeg zherczeg force-pushed the function_ref branch 2 times, most recently from c804a98 to c40c098 Compare March 13, 2025 12:53
@zherczeg
Copy link
Contributor Author

There is something I don't understand. There is a Reference = -0x15 in type.h. However, in
https://github.com/WebAssembly/function-references/blob/main/proposals/function-references/Overview.md
(ref ht) is defined as -0x1c and (ref null ht) is -0x1d. Is the old code obsolete?

The key change of the patch is that Var can also behave like Type, and transformation between the two is possible after names are resolved.

Btw, may I ask how actively maintained the wabt project?

@sbc100
Copy link
Member

sbc100 commented Mar 13, 2025

I believe ht refers to Heap Type which would not be needed for function references. I believe heap types refer to things like structs and arrays in the GC proposal.

@zherczeg
Copy link
Contributor Author

That is possible. Unfortunately not everything is described clearly in the text. For example, ref.null ht: [] -> [(ref null ht)] Which encoding is used for ht here?

https://github.com/WebAssembly/function-references/blob/main/proposals/function-references/Overview.md

@zherczeg
Copy link
Contributor Author

I have tried to track down the v8 implementation for ref.null.

https://github.com/v8/v8/blob/main/src/wasm/function-body-decoder-impl.h#L386

It calls read_heap_type. It is in the same file.

https://github.com/v8/v8/blob/main/src/wasm/function-body-decoder-impl.h#L217

Starts with:
decoder->read_i33v(pc, "heap type");

So it looks like it uses the s33 decoding which the specification mentioned.

@zherczeg zherczeg force-pushed the function_ref branch 5 times, most recently from c94e5a1 to a959acb Compare March 14, 2025 16:37
@zherczeg
Copy link
Contributor Author

I have some good news! This test is fully working now (including the interpreter):
https://github.com/WebAssembly/function-references/blob/main/test/core/call_ref.wast

Only 9 fails remained, they all look like CallRef syntax change related, so some tests need to be updated.

The patch is quite large. There are some parts which are questionable. I will comment about these parts later to help review.

@zherczeg zherczeg force-pushed the function_ref branch 2 times, most recently from 78596d6 to b79aead Compare March 15, 2025 06:49
@@ -181,7 +181,7 @@ class BinaryReaderLogging : public BinaryReaderDelegate {
Result OnCatchExpr(Index tag_index) override;
Result OnCatchAllExpr() override;
Result OnCallIndirectExpr(Index sig_index, Index table_index) override;
Result OnCallRefExpr() override;
Result OnCallRefExpr(Type sig_type) override;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think type sounds better than Index, although it could be an Index.

Index,
Name,
};

struct Var {
// Var can represent variables or types.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a major internal change. It does not increase the size of Var. I think it is better than constructing a Var/Type pair.


Location loc;

private:
void Destroy();

VarType type_;
// Can be set to Type::Enum types, 0 represent no optional type.
int16_t opt_type_;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This range should be enough for a long time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some reason we can't use Index index_ below to store the type index? Isn't Index supposed to be flexible in that way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively if you do need a third type for Var why not add another element to VarType and add another element to the union below?

Sorry, I think I must be misunderstanding something here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For indexed references, you need both a reference type (nullable / non-nullable) and an index or name (only for the text parser). The index or name is stored in the union. For non indexed types the type could go to the union.

Another alternative could be:

VarType {
  Name, Index, RefName, RefIndex, RefNullName, RefNullIndex, Type
}

But handling this seems far more complex than the current one, and we have more than enough bits (VarType is aligned to 32 bit, because the union is aligned to 32 bit)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the main part of this PR that I'm still finding confusing.

@tlively WDYT about this part?

@@ -63,6 +63,7 @@ void WriteS32Leb128(Stream* stream, T value, const char* desc) {
size_t ReadU32Leb128(const uint8_t* p, const uint8_t* end, uint32_t* out_value);
size_t ReadU64Leb128(const uint8_t* p, const uint8_t* end, uint64_t* out_value);
size_t ReadS32Leb128(const uint8_t* p, const uint8_t* end, uint32_t* out_value);
size_t ReadS33Leb128(const uint8_t* p, const uint8_t* end, uint64_t* out_value);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec adds this new type, the output is uint64_t. It could be merged with ReadS32Leb128 by adding a bool* argument.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is S33 used somewhere in the binary format now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. And they use it even more places in gc :(

https://webassembly.github.io/function-references/core/binary/types.html#heap-types
a 33 bit integer can hold a 32 bit unsigned index or a type value. Its maximum (leb) size is the same as 32 bit integers.

ref.null uses this for example.
https://webassembly.github.io/function-references/core/syntax/instructions.html#syntax-instr-ref

@@ -220,18 +221,6 @@ class SharedValidator {
Result OnUnreachable(const Location&);

private:
struct FuncType {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moved to type checker.

@@ -46,6 +46,8 @@ class Type {
FuncRef = -0x10, // 0x70
ExternRef = -0x11, // 0x6f
Reference = -0x15, // 0x6b
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is completely missing from the proposal. Maybe it was removed at some point? Currently Ref and Reference are considered equal, and randomly used one or the other. I wanted to wait a feedback before doing anything with it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In v8 it is kStructRefCode:
https://chromium.googlesource.com/v8/v8/+/refs/heads/main/src/wasm/wasm-constants.h#46

Probably used by GC. Probably kArrayRefCode will be needed as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tlively do you know the history here?

Result BinaryReaderLogging::name(Type type) { \
LOGF(#name "(%s)\n", type.GetName().c_str()); \
return reader_->name(type); \
#define DEFINE_TYPE(name) \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this might be reverted. I will check.

@@ -897,7 +897,7 @@ Result BinaryReaderObjdumpDisassemble::OnOpcodeType(Type type) {
if (!in_function_body) {
return Result::Ok;
}
if (current_opcode == Opcode::SelectT) {
if (current_opcode == Opcode::SelectT || current_opcode == Opcode::CallRef) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe more opcodes could go here (ref.null?). This is confusing for me, the two toString variants works differently, although they could be merged.

src/leb128.cc Outdated
@@ -288,6 +286,40 @@ size_t ReadS32Leb128(const uint8_t* p,
}
}

size_t ReadS33Leb128(const uint8_t* p,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These readers don't check that the minimum amount of bytes is used for encoding. This is important for utf, not sure for leb.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain a little more what you mean here?

For sure we allow encoding that are over-long, i.e. zero-padded. This is used a lot in the llvm wasm writer, for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not know that. Some encodings, such as utf8 expects a minimum number of byte encodings, so a 7 bit character cannot be encoded more than a single byte, even if the type bits are correct for longer encodings. If padding for leb is allowed then it is perfectly ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can avoid duplicating code here and instead just use ReadS64Leb128 here (with a bounds check after reading it)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is possible. We could add a default false boolean flag to the ReadS64Leb128 as well.

Anyway, what is the reason for the lot of code duplications (and manual loop unrollings) in this file? Performance? For consistency, I also added another duplication, but this part could be improved (reducing code size) in many ways. It can be done in another patch as well, before or after this patch.

What is your suggestion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, to keep this patch focused how about just removing the LEB changes from this PR?

I think its reasonable for now to just read S33 values using ReadS64Leb128. Apparently this is what binaryen does. We can come back and try to cleanup this stuff and remove the duplication later perhaps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I removed the S33 parsing. We should do something with it later. Please, let me know your opinion, and I will do it.

@@ -545,6 +546,17 @@ Result ResolveFuncTypes(Module* module, Errors* errors) {
}

if (func) {
if (!func->local_type_list.empty()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of a bugfix. Named references has been never resolved for local types. Maybe we could do it on the compressed list, but that can be done in another patch.

@zherczeg zherczeg force-pushed the function_ref branch 4 times, most recently from 6587cf7 to 0f7353f Compare March 16, 2025 04:16
@@ -0,0 +1,19 @@
;;; TOOL: run-interp-spec
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turned out that call_ref.wast is already in the repository, and it is passing now. There are other tests in the directory (e.g. binary.wast) which is also passing now, they could be updated later as well.

@zherczeg zherczeg marked this pull request as ready for review March 16, 2025 04:37
@zherczeg
Copy link
Contributor Author

I think the patch is ready. There are lots of follow-up work ahead.

@@ -189,13 +189,32 @@ void ScriptValidator::PrintError(const Location* loc, const char* format, ...) {
errors_->emplace_back(ErrorLevel::Error, *loc, buffer);
}

static Result CheckType(Type actual, Type expected) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this is only used by comparing expected results. This means "module" context is not present, and expected values are immediate values. So maybe an even restricted checker could work.

@zherczeg zherczeg force-pushed the function_ref branch 3 times, most recently from 58411a5 to bf0400f Compare March 18, 2025 05:06
@zherczeg zherczeg force-pushed the function_ref branch 2 times, most recently from ea9bd86 to 09523fe Compare April 28, 2025 04:21
@zherczeg
Copy link
Contributor Author

This is the next patch for supporting the function references proposal. It focuses on improving the wabt supported elements of WebAssembly. It is a feature patch now, the bugfixes were extracted and landed as separate patches.

The key concept changes of this patch:

  • Type comparison (type-checker) requires the types hashmap (shared-validator). This circular dependency can be broken by a new type comparison class, which is used by both type-checker and shared-validator.

  • Vars can optionally store types now without increasing its structure size. This simplify passing information, but does not necessary fit for the design concepts of the wabt project.

  • RefNull and CallRef use (the new) Vars, instead of types. Many structures uses Vars in the internal representation, so this is not necessary new. However, using Vars for types is something new and debatable.

Please let me know your opinion about these design concepts. Since the parser can store pointers to types, and check/resolve them later, it is possible to use types only. But using Vars directly is not necessary a problem.

@zherczeg zherczeg force-pushed the function_ref branch 5 times, most recently from d1a3c84 to f94828b Compare May 7, 2025 00:45
@zherczeg zherczeg force-pushed the function_ref branch 2 times, most recently from a9d13ab to 843bee5 Compare May 8, 2025 09:24
@zherczeg zherczeg force-pushed the function_ref branch 4 times, most recently from 8440624 to 9b964c0 Compare May 14, 2025 07:11
@zherczeg zherczeg force-pushed the function_ref branch 5 times, most recently from 25a19f8 to df1ba7c Compare May 21, 2025 06:47
Copy link
Member

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all your work on this!

I'm excited to see this land but I'm also having some trouble groking the whole change since its so large.

I think the parts that could use some more explanation, are the changes to ir.h an in particular the hybrid use of struct Var. Perhaps could you could update the PR description which an explanaation of the various parts of the this change?

@@ -184,6 +184,7 @@ struct FuncType : ExternType {

ValueTypes params;
ValueTypes results;
std::vector<FuncType>* func_types;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this field used for?

Copy link
Contributor Author

@zherczeg zherczeg May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed for supporting a complex requirement:

https://github.com/WebAssembly/testsuite/blob/d76759e746f3564a03f6106ae19679742f2a1831/proposals/function-references/type-equivalence.wast#L112

We need to match the import/export references across modules. To do that, the interpreter needs to see the whole type tree (where the references are expanded to other types).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment about it. This is a non-trivial connection.

@@ -36,24 +36,41 @@ namespace wabt {

struct Module;

enum class VarType {
enum class VarType : uint16_t {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not uint8_t?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simply split the 32 bit (including padding) field to two 16 bit fields. I could also use bitfields, but I think 16 bit is more than enough for WebAssembly types. Shall I change that to 8 bit (this way another 8 bit padding will be added by the compiler)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, perhaps just add a comment here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment added


void set_index(Index);
void set_name(std::string&&);
void set_name(std::string_view);
void set_opt_type(Type::Enum);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the opt part mean here? optional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. For most Vars (e.g. names of WebAssembly constructs) there is no need for any type.

@@ -727,9 +753,7 @@ class CallRefExpr : public ExprMixin<ExprType::CallRef> {
explicit CallRefExpr(const Location& loc = Location())
: ExprMixin<ExprType::CallRef>(loc) {}

// This field is setup only during Validate phase,
// so keep that in mind when you use it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment no longer valid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Call ref needs to explicitly specify the type, auto detection is not allowed anymore:

https://github.com/WebAssembly/testsuite/blob/d76759e746f3564a03f6106ae19679742f2a1831/proposals/function-references/call_ref.wast#L5

@@ -63,6 +63,7 @@ void WriteS32Leb128(Stream* stream, T value, const char* desc) {
size_t ReadU32Leb128(const uint8_t* p, const uint8_t* end, uint32_t* out_value);
size_t ReadU64Leb128(const uint8_t* p, const uint8_t* end, uint64_t* out_value);
size_t ReadS32Leb128(const uint8_t* p, const uint8_t* end, uint32_t* out_value);
size_t ReadS33Leb128(const uint8_t* p, const uint8_t* end, uint64_t* out_value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is S33 used somewhere in the binary format now?

return options_.features.function_references_enabled();

default:
return false;
return IsGenericReferenceType(type);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So generic types are concrete?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, that is just for removing code duplication. I have introduced IsGenericReferenceType, and this part of the code was exactly the same, so I used it here. Shall I copy back the original code?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this looks good, I just wanted to confirm that generic reference types are to be considered concrete types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might misunderstand the concept (naming) here. The new IsGenericReferenceType only used by ref.null, and it is a concrete type there. Shall I rename it to IsConcreteReferenceType?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed it to IsConcreteReferenceType.

src/leb128.cc Outdated
@@ -288,6 +286,40 @@ size_t ReadS32Leb128(const uint8_t* p,
}
}

size_t ReadS33Leb128(const uint8_t* p,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain a little more what you mean here?

For sure we allow encoding that are over-long, i.e. zero-padded. This is used a lot in the llvm wasm writer, for example.

if ((sign_bit_set && top_bits != 0x7e) ||
(!sign_bit_set && top_bits != 0)) {
int top_bits = p[9];
if (top_bits != 0x7f && top_bits != 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a bugfix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is an optimization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern is also repeated in other function in this file such as ReadS32Leb128. Should we make this change for all function? Maybe this can be its own PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a special case, since all bits must be zero or one (the 10th byte contains only the sign). For other read functions (S32), some bits of the last byte is also part of the number that is read.

ReadU64Leb128 has a similar optimization currently in the code:
https://github.com/WebAssembly/wabt/blob/main/src/leb128.cc#L243

I can move this two lines into a separate patch if you prefer that.

} else {
PushType(Type::FuncRef);
}
PushType(Type(Type::Ref, func_type));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we delete the second argument here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right!


return Result::Error;
result_ |= validator_.OnCallRef(expr->loc, expr->sig_type);
return Result::Ok;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this return failing if validator_.OnCallRef fails?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is strange for me as well. All the other functions accumulate errors in result_ and returns with Result::Ok so I just followed their style:

https://github.com/WebAssembly/wabt/blob/main/src/validator.cc#L320

@zherczeg
Copy link
Contributor Author

I have updated the description

@zherczeg
Copy link
Contributor Author

I hope I did all the requests. Thank you for the review!

@sbc100 sbc100 changed the title Implement changes of function reference proposal Implement support for function references proposal May 21, 2025
@sbc100 sbc100 changed the title Implement support for function references proposal Add support for function references proposal May 21, 2025
@zherczeg zherczeg force-pushed the function_ref branch 2 times, most recently from 9df2715 to 0719fc8 Compare May 22, 2025 04:43
Support named references for globals, locals, tables, elems
Support named references for call_ref, ref_null
Extend Var variables with an optional type field
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants