-
Notifications
You must be signed in to change notification settings - Fork 744
Add support for function references proposal #2562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@sbc100 @keithw since I don't know enough about the project, these are just some random changes in the parser. The aim is parsing https://github.com/WebAssembly/function-references/blob/main/test/core/call_ref.wast It is doing something (parsing correctly is surely an exaggeration) with Could you check that these changes makes sense? There is a Thank you for the help. |
This patch goes to a new direction. The 32 bit type filed of |
c804a98
to
c40c098
Compare
There is something I don't understand. There is a The key change of the patch is that Btw, may I ask how actively maintained the wabt project? |
I believe |
That is possible. Unfortunately not everything is described clearly in the text. For example, |
I have tried to track down the v8 implementation for https://github.com/v8/v8/blob/main/src/wasm/function-body-decoder-impl.h#L386 It calls https://github.com/v8/v8/blob/main/src/wasm/function-body-decoder-impl.h#L217 Starts with: So it looks like it uses the s33 decoding which the specification mentioned. |
c94e5a1
to
a959acb
Compare
I have some good news! This test is fully working now (including the interpreter): Only 9 fails remained, they all look like CallRef syntax change related, so some tests need to be updated. The patch is quite large. There are some parts which are questionable. I will comment about these parts later to help review. |
78596d6
to
b79aead
Compare
@@ -181,7 +181,7 @@ class BinaryReaderLogging : public BinaryReaderDelegate { | |||
Result OnCatchExpr(Index tag_index) override; | |||
Result OnCatchAllExpr() override; | |||
Result OnCallIndirectExpr(Index sig_index, Index table_index) override; | |||
Result OnCallRefExpr() override; | |||
Result OnCallRefExpr(Type sig_type) override; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think type sounds better than Index, although it could be an Index.
Index, | ||
Name, | ||
}; | ||
|
||
struct Var { | ||
// Var can represent variables or types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a major internal change. It does not increase the size of Var. I think it is better than constructing a Var/Type pair.
|
||
Location loc; | ||
|
||
private: | ||
void Destroy(); | ||
|
||
VarType type_; | ||
// Can be set to Type::Enum types, 0 represent no optional type. | ||
int16_t opt_type_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This range should be enough for a long time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some reason we can't use Index index_
below to store the type index? Isn't Index supposed to be flexible in that way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively if you do need a third type for Var why not add another element to VarType and add another element to the union below?
Sorry, I think I must be misunderstanding something here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For indexed references, you need both a reference type (nullable / non-nullable) and an index or name (only for the text parser). The index or name is stored in the union. For non indexed types the type could go to the union.
Another alternative could be:
VarType {
Name, Index, RefName, RefIndex, RefNullName, RefNullIndex, Type
}
But handling this seems far more complex than the current one, and we have more than enough bits (VarType is aligned to 32 bit, because the union is aligned to 32 bit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the main part of this PR that I'm still finding confusing.
@tlively WDYT about this part?
include/wabt/leb128.h
Outdated
@@ -63,6 +63,7 @@ void WriteS32Leb128(Stream* stream, T value, const char* desc) { | |||
size_t ReadU32Leb128(const uint8_t* p, const uint8_t* end, uint32_t* out_value); | |||
size_t ReadU64Leb128(const uint8_t* p, const uint8_t* end, uint64_t* out_value); | |||
size_t ReadS32Leb128(const uint8_t* p, const uint8_t* end, uint32_t* out_value); | |||
size_t ReadS33Leb128(const uint8_t* p, const uint8_t* end, uint64_t* out_value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spec adds this new type, the output is uint64_t. It could be merged with ReadS32Leb128
by adding a bool*
argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is S33 used somewhere in the binary format now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. And they use it even more places in gc :(
https://webassembly.github.io/function-references/core/binary/types.html#heap-types
a 33 bit integer can hold a 32 bit unsigned index or a type value. Its maximum (leb) size is the same as 32 bit integers.
ref.null
uses this for example.
https://webassembly.github.io/function-references/core/syntax/instructions.html#syntax-instr-ref
@@ -220,18 +221,6 @@ class SharedValidator { | |||
Result OnUnreachable(const Location&); | |||
|
|||
private: | |||
struct FuncType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is moved to type checker.
@@ -46,6 +46,8 @@ class Type { | |||
FuncRef = -0x10, // 0x70 | |||
ExternRef = -0x11, // 0x6f | |||
Reference = -0x15, // 0x6b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is completely missing from the proposal. Maybe it was removed at some point? Currently Ref and Reference are considered equal, and randomly used one or the other. I wanted to wait a feedback before doing anything with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In v8 it is kStructRefCode:
https://chromium.googlesource.com/v8/v8/+/refs/heads/main/src/wasm/wasm-constants.h#46
Probably used by GC. Probably kArrayRefCode will be needed as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tlively do you know the history here?
Result BinaryReaderLogging::name(Type type) { \ | ||
LOGF(#name "(%s)\n", type.GetName().c_str()); \ | ||
return reader_->name(type); \ | ||
#define DEFINE_TYPE(name) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this might be reverted. I will check.
@@ -897,7 +897,7 @@ Result BinaryReaderObjdumpDisassemble::OnOpcodeType(Type type) { | |||
if (!in_function_body) { | |||
return Result::Ok; | |||
} | |||
if (current_opcode == Opcode::SelectT) { | |||
if (current_opcode == Opcode::SelectT || current_opcode == Opcode::CallRef) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe more opcodes could go here (ref.null?). This is confusing for me, the two toString variants works differently, although they could be merged.
src/leb128.cc
Outdated
@@ -288,6 +286,40 @@ size_t ReadS32Leb128(const uint8_t* p, | |||
} | |||
} | |||
|
|||
size_t ReadS33Leb128(const uint8_t* p, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These readers don't check that the minimum amount of bytes is used for encoding. This is important for utf, not sure for leb.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain a little more what you mean here?
For sure we allow encoding that are over-long, i.e. zero-padded. This is used a lot in the llvm wasm writer, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not know that. Some encodings, such as utf8 expects a minimum number of byte encodings, so a 7 bit character cannot be encoded more than a single byte, even if the type bits are correct for longer encodings. If padding for leb is allowed then it is perfectly ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can avoid duplicating code here and instead just use ReadS64Leb128 here (with a bounds check after reading it)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is possible. We could add a default false boolean flag to the ReadS64Leb128 as well.
Anyway, what is the reason for the lot of code duplications (and manual loop unrollings) in this file? Performance? For consistency, I also added another duplication, but this part could be improved (reducing code size) in many ways. It can be done in another patch as well, before or after this patch.
What is your suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, to keep this patch focused how about just removing the LEB changes from this PR?
I think its reasonable for now to just read S33 values using ReadS64Leb128. Apparently this is what binaryen does. We can come back and try to cleanup this stuff and remove the duplication later perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I removed the S33 parsing. We should do something with it later. Please, let me know your opinion, and I will do it.
src/wast-parser.cc
Outdated
@@ -545,6 +546,17 @@ Result ResolveFuncTypes(Module* module, Errors* errors) { | |||
} | |||
|
|||
if (func) { | |||
if (!func->local_type_list.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kind of a bugfix. Named references has been never resolved for local types. Maybe we could do it on the compressed list, but that can be done in another patch.
6587cf7
to
0f7353f
Compare
@@ -0,0 +1,19 @@ | |||
;;; TOOL: run-interp-spec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turned out that call_ref.wast
is already in the repository, and it is passing now. There are other tests in the directory (e.g. binary.wast
) which is also passing now, they could be updated later as well.
I think the patch is ready. There are lots of follow-up work ahead. |
@@ -189,13 +189,32 @@ void ScriptValidator::PrintError(const Location* loc, const char* format, ...) { | |||
errors_->emplace_back(ErrorLevel::Error, *loc, buffer); | |||
} | |||
|
|||
static Result CheckType(Type actual, Type expected) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, this is only used by comparing expected results. This means "module" context is not present, and expected values are immediate values. So maybe an even restricted checker could work.
58411a5
to
bf0400f
Compare
ea9bd86
to
09523fe
Compare
This is the next patch for supporting the function references proposal. It focuses on improving the wabt supported elements of WebAssembly. It is a feature patch now, the bugfixes were extracted and landed as separate patches. The key concept changes of this patch:
Please let me know your opinion about these design concepts. Since the parser can store pointers to types, and check/resolve them later, it is possible to use types only. But using Vars directly is not necessary a problem. |
d1a3c84
to
f94828b
Compare
a9d13ab
to
843bee5
Compare
8440624
to
9b964c0
Compare
25a19f8
to
df1ba7c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all your work on this!
I'm excited to see this land but I'm also having some trouble groking the whole change since its so large.
I think the parts that could use some more explanation, are the changes to ir.h an in particular the hybrid use of struct Var
. Perhaps could you could update the PR description which an explanaation of the various parts of the this change?
@@ -184,6 +184,7 @@ struct FuncType : ExternType { | |||
|
|||
ValueTypes params; | |||
ValueTypes results; | |||
std::vector<FuncType>* func_types; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this field used for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed for supporting a complex requirement:
We need to match the import/export references across modules. To do that, the interpreter needs to see the whole type tree (where the references are expanded to other types).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment about it. This is a non-trivial connection.
@@ -36,24 +36,41 @@ namespace wabt { | |||
|
|||
struct Module; | |||
|
|||
enum class VarType { | |||
enum class VarType : uint16_t { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not uint8_t?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simply split the 32 bit (including padding) field to two 16 bit fields. I could also use bitfields, but I think 16 bit is more than enough for WebAssembly types. Shall I change that to 8 bit (this way another 8 bit padding will be added by the compiler)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, perhaps just add a comment here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment added
|
||
void set_index(Index); | ||
void set_name(std::string&&); | ||
void set_name(std::string_view); | ||
void set_opt_type(Type::Enum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the opt
part mean here? optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. For most Vars (e.g. names of WebAssembly constructs) there is no need for any type.
@@ -727,9 +753,7 @@ class CallRefExpr : public ExprMixin<ExprType::CallRef> { | |||
explicit CallRefExpr(const Location& loc = Location()) | |||
: ExprMixin<ExprType::CallRef>(loc) {} | |||
|
|||
// This field is setup only during Validate phase, | |||
// so keep that in mind when you use it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment no longer valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Call ref needs to explicitly specify the type, auto detection is not allowed anymore:
include/wabt/leb128.h
Outdated
@@ -63,6 +63,7 @@ void WriteS32Leb128(Stream* stream, T value, const char* desc) { | |||
size_t ReadU32Leb128(const uint8_t* p, const uint8_t* end, uint32_t* out_value); | |||
size_t ReadU64Leb128(const uint8_t* p, const uint8_t* end, uint64_t* out_value); | |||
size_t ReadS32Leb128(const uint8_t* p, const uint8_t* end, uint32_t* out_value); | |||
size_t ReadS33Leb128(const uint8_t* p, const uint8_t* end, uint64_t* out_value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is S33 used somewhere in the binary format now?
src/binary-reader.cc
Outdated
return options_.features.function_references_enabled(); | ||
|
||
default: | ||
return false; | ||
return IsGenericReferenceType(type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So generic types are concrete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, that is just for removing code duplication. I have introduced IsGenericReferenceType
, and this part of the code was exactly the same, so I used it here. Shall I copy back the original code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this looks good, I just wanted to confirm that generic reference types are to be considered concrete types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might misunderstand the concept (naming) here. The new IsGenericReferenceType
only used by ref.null
, and it is a concrete type there. Shall I rename it to IsConcreteReferenceType
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed it to IsConcreteReferenceType
.
src/leb128.cc
Outdated
@@ -288,6 +286,40 @@ size_t ReadS32Leb128(const uint8_t* p, | |||
} | |||
} | |||
|
|||
size_t ReadS33Leb128(const uint8_t* p, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain a little more what you mean here?
For sure we allow encoding that are over-long, i.e. zero-padded. This is used a lot in the llvm wasm writer, for example.
if ((sign_bit_set && top_bits != 0x7e) || | ||
(!sign_bit_set && top_bits != 0)) { | ||
int top_bits = p[9]; | ||
if (top_bits != 0x7f && top_bits != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a bugfix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it is an optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern is also repeated in other function in this file such as ReadS32Leb128
. Should we make this change for all function? Maybe this can be its own PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a special case, since all bits must be zero or one (the 10th byte contains only the sign). For other read functions (S32), some bits of the last byte is also part of the number that is read.
ReadU64Leb128 has a similar optimization currently in the code:
https://github.com/WebAssembly/wabt/blob/main/src/leb128.cc#L243
I can move this two lines into a separate patch if you prefer that.
} else { | ||
PushType(Type::FuncRef); | ||
} | ||
PushType(Type(Type::Ref, func_type)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we delete the second argument here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right!
|
||
return Result::Error; | ||
result_ |= validator_.OnCallRef(expr->loc, expr->sig_type); | ||
return Result::Ok; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this return failing if validator_.OnCallRef
fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is strange for me as well. All the other functions accumulate errors in result_
and returns with Result::Ok
so I just followed their style:
https://github.com/WebAssembly/wabt/blob/main/src/validator.cc#L320
I have updated the description |
I hope I did all the requests. Thank you for the review! |
9df2715
to
0719fc8
Compare
Support named references for globals, locals, tables, elems Support named references for call_ref, ref_null Extend Var variables with an optional type field
This large patch adds function references support for those parts of the wabt library, which already present in the code. The current code is designed for an older (outdated) variant of the function references proposal.
Key changes: