diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md
index c26efbe..9cedcec 100644
--- a/proposals/stringref/Overview.md
+++ b/proposals/stringref/Overview.md
@@ -202,11 +202,11 @@ value reaches any instruction in this proposal. The one exception is
### Creating strings
```
-(string.new_utf8 $memory ptr:address bytes:i32)
+(string.decode_from_utf8 $memory ptr:address bytes:i32)
-> str:stringref
-(string.new_lossy_utf8 $memory ptr:address bytes:i32)
+(string.decode_from_lossy_utf8 $memory ptr:address bytes:i32)
-> str:stringref
-(string.new_wtf8 $memory ptr:address bytes:i32)
+(string.decode_from_wtf8 $memory ptr:address bytes:i32)
-> str:stringref
```
Create a new string from the *`bytes`* bytes in memory at *`ptr`*.
@@ -215,22 +215,22 @@ Out-of-bounds access will trap. The maximum value for *`bytes`* is
These three instructions decode the bytes in three different ways:
- * `string.new_utf8` decodes using a strict UTF-8 decoder. If the
+ * `string.decode_from_utf8` decodes using a strict UTF-8 decoder. If the
bytes are not valid UTF-8, trap.
- * `string.new_lossy_utf8` decodes using a sloppy UTF-8 decoder: all
+ * `string.decode_from_lossy_utf8` decodes using a sloppy UTF-8 decoder: all
maximal subparts of an invalid subsequence are decoded as if they
were `U+FFFD` (the replacement character) instead. This instruction
will never trap due to a decoding error. See the section entitled
"U+FFFD Substitution of Maximal Subparts" in the Unicode standard,
version 14.0.0, page 126.
- * `string.new_wtf8` decodes using a strict WTF-8 decoder, which is like
+ * `string.decode_from_wtf8` decodes using a strict WTF-8 decoder, which is like
UTF-8 but also allows isolated surrogates. If the bytes are not
valid WTF-8, trap.
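As a rough model of the three decoding modes, the Python sketch below maps each instruction to a standard codec error handler. The function names are hypothetical and a raised `UnicodeDecodeError` stands in for a trap. Two assumptions apply: CPython's `replace` handler is taken to follow the maximal-subpart practice, and `surrogatepass` only approximates strict WTF-8, since it also accepts a surrogate pair encoded as two three-byte sequences, which WTF-8 forbids.

```python
def decode_utf8_strict(data: bytes) -> str:
    # Models string.decode_from_utf8: invalid UTF-8 raises (a trap).
    return data.decode("utf-8", errors="strict")


def decode_utf8_lossy(data: bytes) -> str:
    # Models string.decode_from_lossy_utf8: invalid input becomes U+FFFD,
    # so this never raises because of a decoding error.
    return data.decode("utf-8", errors="replace")


def decode_wtf8(data: bytes) -> str:
    # Approximates string.decode_from_wtf8: isolated surrogates are allowed,
    # any other malformed input still raises (a trap).
    return data.decode("utf-8", errors="surrogatepass")
```

For instance, the bytes `ED A0 80` (an isolated surrogate, `U+D800`) are rejected by the strict UTF-8 model and accepted by the WTF-8 model.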
```
-(string.new_wtf16 $memory ptr:address codeunits:i32)
+(string.decode_from_wtf16 $memory ptr:address codeunits:i32)
-> str:stringref
```
Create a new string from the *`codeunits`* code units encoded in memory at
@@ -240,14 +240,14 @@ is 2<sup>30</sup>–1; passing a higher value traps. Each code unit is
read from memory as if with `i32.load16`, and is therefore decoded
using little-endian byte order.
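The little-endian reading of 16-bit code units can be modelled in Python as follows. This is a sketch only: `decode_wtf16_le` is a hypothetical name, linear memory is represented as a `bytes` object, an `IndexError` stands in for a trap, and the code-unit limit is taken to be 2^30 − 1.

```python
MAX_CODEUNITS = 2**30 - 1  # assumed limit on the number of code units


def decode_wtf16_le(memory: bytes, ptr: int, codeunits: int) -> str:
    # Each code unit occupies two bytes, least significant byte first.
    if codeunits < 0 or codeunits > MAX_CODEUNITS:
        raise IndexError("code unit count over the limit (models a trap)")
    end = ptr + 2 * codeunits
    if ptr < 0 or end > len(memory):
        raise IndexError("out-of-bounds access (models a trap)")
    # surrogatepass keeps isolated surrogates instead of rejecting them.
    return memory[ptr:end].decode("utf-16-le", errors="surrogatepass")
```

Python merges a well-formed surrogate pair into a single code point when decoding, while an isolated surrogate is preserved as-is.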
-#### `string.new` size limits
+#### `string.decode_from_*` size limits
Creating a string is a form of dynamic allocation and can fail. The
same implementation running on different machines can have different
behaviors. The specification can only say that byte/code-unit sizes
above a certain limit *must* fail; but for sizes within the limits, the
allocations *may* fail. If an allocation fails, the implementation must
-trap. Fallible `string.new` is a possible future extension.
+trap. Fallible `string.decode_from_*` is a possible future extension.
### String literals
@@ -281,7 +281,7 @@ string literal section as a future extension.
The maximum size for the WTF-8 encoding of an individual string literal
is 2<sup>31</sup>–1 bytes. Embeddings may impose their own limits which
-are more restricted. But similarly to `string.new_wtf8`, instantiating
+are more restricted. But similarly to `string.decode_from_wtf8`, instantiating
a module with string literals may fail due to lack of memory resources,
even if the string size is formally within the limits. However
`string.const` itself never traps when passed a valid literal offset.
@@ -331,7 +331,7 @@ is 2<sup>30</sup>-1. If an encoding would require more code units than
the limit, the result is -1.
```
-(string.encode_utf8 $memory str:stringref ptr:address)
+(string.encode_to_utf8 $memory str:stringref ptr:address)
-> codeunits:i32
```
Encode the contents of the string *`str`* as UTF-8 to memory at *ptr*.
@@ -340,11 +340,11 @@ written, which will be the same as returned by the corresponding
`string.measure_utf8`.
The maximum number of bytes that can be encoded at once by
-`string.encode` is 2<sup>31</sup>-1. If an encoding would require more
+`string.encode_to_utf8` is 2<sup>31</sup>-1. If an encoding would require more
bytes, it is as if the codepoints can't be encoded (a trap).
```
-(string.encode_lossy_utf8 $memory str:stringref ptr:address)
+(string.encode_to_lossy_utf8 $memory str:stringref ptr:address)
-> codeunits:i32
```
Encode the contents of the string *`str`* as UTF-8 to memory at *`ptr`*.
@@ -353,11 +353,11 @@ character) instead. Return the number of code units written, which will
be the same as returned by the corresponding `string.measure_wtf8`.
The maximum number of bytes that can be encoded at once by
-`string.encode` is 2<sup>31</sup>-1. If an encoding would require more
+`string.encode_to_lossy_utf8` is 2<sup>31</sup>-1. If an encoding would require more
bytes, it is as if the codepoints can't be encoded (a trap).
```
-(string.encode_wtf8 $memory str:stringref ptr:address)
+(string.encode_to_wtf8 $memory str:stringref ptr:address)
-> codeunits:i32
```
Encode the contents of the string *`str`* as WTF-8 to memory at *`ptr`*.
@@ -365,11 +365,11 @@ Return the number of code units written, which will be the same as
returned by the corresponding `string.measure_wtf8`.
The maximum number of bytes that can be encoded at once by
-`string.encode` is 2<sup>31</sup>-1. If an encoding would require more
+`string.encode_to_wtf8` is 2<sup>31</sup>-1. If an encoding would require more
bytes, it is as if the codepoints can't be encoded (a trap).
```
-(string.encode_wtf16 $memory str:stringref ptr:address)
+(string.encode_to_wtf16 $memory str:stringref ptr:address)
-> codeunits:i32
```
Encode the contents of the string *`str`* as WTF-16 to memory at
@@ -380,7 +380,7 @@ Each code unit is written to memory as if stored by `i32.store16`, so
WTF-16 code units are in little-endian byte order.
The maximum number of bytes that can be encoded at once by
-`string.encode` is 2<sup>31</sup>-1. If an encoding would require more
+`string.encode_to_wtf16` is 2<sup>31</sup>-1. If an encoding would require more
bytes, it is as if the codepoints can't be encoded (a trap).
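To illustrate how the three 8-bit encoders treat isolated surrogates, here is a minimal Python sketch. The function names are hypothetical, a Python `str` stands in for a `stringref`, and a raised `UnicodeEncodeError` stands in for a trap.

```python
def _is_surrogate(ch: str) -> bool:
    return 0xD800 <= ord(ch) <= 0xDFFF


def encode_utf8_strict(s: str) -> bytes:
    # Models string.encode_to_utf8: an isolated surrogate raises (a trap).
    return s.encode("utf-8")


def encode_utf8_lossy(s: str) -> bytes:
    # Models string.encode_to_lossy_utf8: surrogates become U+FFFD.
    cleaned = "".join("\ufffd" if _is_surrogate(c) else c for c in s)
    return cleaned.encode("utf-8")


def encode_wtf8(s: str) -> bytes:
    # Models string.encode_to_wtf8: surrogates are written as 3-byte sequences.
    return s.encode("utf-8", errors="surrogatepass")
```

Because `U+FFFD` and an isolated surrogate both take three bytes, the lossy and WTF-8 encodings of a string always have the same length, which is consistent with both being paired with `string.measure_wtf8`.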
### Concatenation
@@ -603,13 +603,13 @@ The instructions below shall be available in WebAssembly implementations
that support both GC and stringrefs.
```
-(string.new_utf8_array codeunits:$t start:i32 end:i32)
+(string.decode_from_utf8_array codeunits:$t start:i32 end:i32)
if expand($t) => array i8
-> str:stringref
-(string.new_lossy_utf8_array codeunits:$t start:i32 end:i32)
+(string.decode_from_lossy_utf8_array codeunits:$t start:i32 end:i32)
if expand($t) => array i8
-> str:stringref
-(string.new_wtf8_array codeunits:$t start:i32 end:i32)
+(string.decode_from_wtf8_array codeunits:$t start:i32 end:i32)
if expand($t) => array i8
-> str:stringref
```
@@ -617,12 +617,12 @@ Create a new string from a subsequence of the *`codeunits`* bytes in a
GC-managed array, starting from offset *`start`* and continuing to but
not including *`end`*. If *`end`* is less than *`start`* or is greater
than the array length, trap. The bytes are decoded in the same way as
-`string.new_utf8`, `string.new_lossy_utf8`, and `string.new_wtf8`,
+`string.decode_from_utf8`, `string.decode_from_lossy_utf8`, and `string.decode_from_wtf8`,
respectively. The maximum value for *`end`*–*`start`* is
2<sup>31</sup>–1; passing a higher value traps.
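The bounds rules for the array variants can be sketched in Python as below. This is an illustration under assumptions: `decode_utf8_array` is a hypothetical name, a list of integers stands in for a GC `array i8`, indices are taken to be non-negative, exceptions stand in for traps, and the size limit is taken to be 2^31 − 1.

```python
MAX_BYTES = 2**31 - 1  # assumed limit on end - start


def decode_utf8_array(codeunits: list, start: int, end: int) -> str:
    # Trap if end is before start or past the end of the array.
    if end < start or end > len(codeunits):
        raise IndexError("invalid range (models a trap)")
    if end - start > MAX_BYTES:
        raise IndexError("range over the size limit (models a trap)")
    # The half-open range [start, end) is decoded with the strict decoder.
    return bytes(codeunits[start:end]).decode("utf-8")
```

A `start` beyond the array length is caught by the same check, because any `end` that is not less than it must also exceed the length.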
```
-(string.new_wtf16_array codeunits:$t start:i32 end:i32)
+(string.decode_from_wtf16_array codeunits:$t start:i32 end:i32)
if expand($t) => array i16
-> str:stringref
```
@@ -634,16 +634,16 @@ for *`end`*–*`start`* is 2<sup>30</sup>–1; passing a higher value
traps.
```
-(string.encode_utf8_array str:stringref array:$t start:i32)
+(string.encode_to_utf8_array str:stringref array:$t start:i32)
if expand($t) => array (mut i8)
-> codeunits:i32
-(string.encode_lossy_utf8_array str:stringref array:$t start:i32)
+(string.encode_to_lossy_utf8_array str:stringref array:$t start:i32)
if expand($t) => array (mut i8)
-> codeunits:i32
-(string.encode_wtf8_array str:stringref array:$t start:i32)
+(string.encode_to_wtf8_array str:stringref array:$t start:i32)
if expand($t) => array (mut i8)
-> codeunits:i32
-(string.encode_wtf16_array str:stringref array:$t start:i32)
+(string.encode_to_wtf16_array str:stringref array:$t start:i32)
if expand($t) => array (mut i16)
-> codeunits:i32
```
@@ -655,8 +655,8 @@ same as the result of the corresponding `string.measure_wtf8` or
code units in the array, trap. Note that no `NUL` terminator is ever
written.
-For `string.encode_utf8_array`, trap if an isolated surrogate is seen.
-For `string.encode_lossy_utf8_array`, replace isolated surrogates with
+For `string.encode_to_utf8_array`, trap if an isolated surrogate is seen.
+For `string.encode_to_lossy_utf8_array`, replace isolated surrogates with
`U+FFFD`.
## Binary encoding
@@ -669,21 +669,21 @@ reftype ::= ...
| 0x61 ⇒ stringview_iter ; SLEB128(-0x1f)
instr ::= ...
- | 0xfb 0x80:u32 $mem:u32 ⇒ string.new_utf8 $mem
- | 0xfb 0x81:u32 $mem:u32 ⇒ string.new_wtf16 $mem
+ | 0xfb 0x80:u32 $mem:u32 ⇒ string.decode_from_utf8 $mem
+ | 0xfb 0x81:u32 $mem:u32 ⇒ string.decode_from_wtf16 $mem
| 0xfb 0x82:u32 $idx:u32 ⇒ string.const $idx
| 0xfb 0x83:u32 ⇒ string.measure_utf8
| 0xfb 0x84:u32 ⇒ string.measure_wtf8
| 0xfb 0x85:u32 ⇒ string.measure_wtf16
- | 0xfb 0x86:u32 $mem:u32 ⇒ string.encode_utf8 $mem
- | 0xfb 0x87:u32 $mem:u32 ⇒ string.encode_wtf16 $mem
+ | 0xfb 0x86:u32 $mem:u32 ⇒ string.encode_to_utf8 $mem
+ | 0xfb 0x87:u32 $mem:u32 ⇒ string.encode_to_wtf16 $mem
| 0xfb 0x88:u32 ⇒ string.concat
| 0xfb 0x89:u32 ⇒ string.eq
| 0xfb 0x8a:u32 ⇒ string.is_usv_sequence
- | 0xfb 0x8b:u32 $mem:u32 ⇒ string.new_lossy_utf8 $mem
- | 0xfb 0x8c:u32 $mem:u32 ⇒ string.new_wtf8 $mem
- | 0xfb 0x8d:u32 $mem:u32 ⇒ string.encode_lossy_utf8 $mem
- | 0xfb 0x8e:u32 $mem:u32 ⇒ string.encode_wtf8 $mem
+ | 0xfb 0x8b:u32 $mem:u32 ⇒ string.decode_from_lossy_utf8 $mem
+ | 0xfb 0x8c:u32 $mem:u32 ⇒ string.decode_from_wtf8 $mem
+ | 0xfb 0x8d:u32 $mem:u32 ⇒ string.encode_to_lossy_utf8 $mem
+ | 0xfb 0x8e:u32 $mem:u32 ⇒ string.encode_to_wtf8 $mem
| 0xfb 0x90:u32 ⇒ string.as_wtf8
| 0xfb 0x91:u32 ⇒ stringview_wtf8.advance
| 0xfb 0x92:u32 $mem:u32 ⇒ stringview_wtf8.encode_utf8 $mem
@@ -700,14 +700,14 @@ instr ::= ...
| 0xfb 0xa2:u32 ⇒ stringview_iter.advance
| 0xfb 0xa3:u32 ⇒ stringview_iter.rewind
| 0xfb 0xa4:u32 ⇒ stringview_iter.slice
- | 0xfb 0xb0:u32 [gc] ⇒ string.new_utf8_array
- | 0xfb 0xb1:u32 [gc] ⇒ string.new_wtf16_array
- | 0xfb 0xb2:u32 [gc] ⇒ string.encode_utf8_array
- | 0xfb 0xb3:u32 [gc] ⇒ string.encode_wtf16_array
- | 0xfb 0xb4:u32 [gc] ⇒ string.new_lossy_utf8_array
- | 0xfb 0xb5:u32 [gc] ⇒ string.new_wtf8_array
- | 0xfb 0xb6:u32 [gc] ⇒ string.encode_lossy_utf8_array
- | 0xfb 0xb7:u32 [gc] ⇒ string.encode_wtf8_array
+ | 0xfb 0xb0:u32 [gc] ⇒ string.decode_from_utf8_array
+ | 0xfb 0xb1:u32 [gc] ⇒ string.decode_from_wtf16_array
+ | 0xfb 0xb2:u32 [gc] ⇒ string.encode_to_utf8_array
+ | 0xfb 0xb3:u32 [gc] ⇒ string.encode_to_wtf16_array
+ | 0xfb 0xb4:u32 [gc] ⇒ string.decode_from_lossy_utf8_array
+ | 0xfb 0xb5:u32 [gc] ⇒ string.decode_from_wtf8_array
+ | 0xfb 0xb6:u32 [gc] ⇒ string.encode_to_lossy_utf8_array
+ | 0xfb 0xb7:u32 [gc] ⇒ string.encode_to_wtf8_array
;; New section. If present, must be present only once, and right before
;; the globals section (or where the globals section would be). Each
@@ -733,11 +733,11 @@ operand allows you to elide the memory, in which case it defaults to 0.
local.get $ptr
local.get $ptr
call $strlen
- string.new_utf8)
+ string.decode_from_utf8)
```
If the bytes being decoded aren't actually valid UTF-8, this function
-will trap. Use `string.new_lossy_utf8` in contexts where replacing
+will trap. Use `string.decode_from_lossy_utf8` in contexts where replacing
invalid data with `U+FFFD` is a better strategy than trapping.
### Make string from an array of WTF-8 code units in memory
@@ -746,20 +746,20 @@ invalid data with `U+FFFD` is a better strategy than trapping.
(func $string-from-wtf8n (param $ptr i32) (param $len i32) (result stringref)
local.get $ptr
local.get $len
- string.new_wtf8)
+ string.decode_from_wtf8)
```
-Note that `string.new_wtf8` (and `string.new_wtf8_array`) are always
+Note that `string.decode_from_wtf8` (and `string.decode_from_wtf8_array`) are always
strict decoders: if the bytes are not valid WTF-8, the instruction
traps.
-### Make string from UTF-16 in memory
+### Make string from WTF-16 in memory
```wasm
-(func $string-from-utf16 (param $ptr i32) (param $units i32) (result stringref)
+(func $string-from-wtf16n (param $ptr i32) (param $units i32) (result stringref)
local.get $ptr
local.get $units
- string.new_wtf16)
+ string.decode_from_wtf16)
```
This proposal doesn't distinguish between UTF-16 and WTF-16 at all;
@@ -971,7 +971,7 @@ open to considering adding more instructions.
local.get $str
local.get $ptr
- string.encode_utf8 ;; push bytes written, same as $len
+ string.encode_to_utf8 ;; push bytes written, same as $len
local.get $ptr
i32.add
@@ -986,8 +986,8 @@ Using `string.measure_utf8` ensures that the encoded string is a valid
Unicode scalar value sequence. How to handle invalid UTF-8 is up to the
user; instead of `unreachable` we could throw an exception.
-Note that in this case, the subsequent `string.encode_utf8` could just
-as well have been `string.encode_lossy_utf8` or `string.encode_wtf8`, as
+Note that in this case, the subsequent `string.encode_to_utf8` could just
+as well have been `string.encode_to_lossy_utf8` or `string.encode_to_wtf8`, as
these instructions are all the same for strings that do not contain
isolated surrogates, and we checked that there were none.
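The measure-then-encode pattern can be modelled in Python as follows. This sketch assumes that `string.measure_utf8` yields -1 for a string containing an isolated surrogate. The names `measure_utf8` and `encode_checked` are hypothetical, and a `ValueError` plays the role of `unreachable`.

```python
def measure_utf8(s: str) -> int:
    # Assumed behaviour: -1 signals that strict UTF-8 encoding is impossible.
    if any(0xD800 <= ord(c) <= 0xDFFF for c in s):
        return -1
    return len(s.encode("utf-8"))


def encode_checked(s: str) -> bytes:
    n = measure_utf8(s)
    if n < 0:
        raise ValueError("string contains an isolated surrogate")
    data = s.encode("utf-8")
    # The number of bytes written matches the measured length.
    assert len(data) == n
    return data
```

Once the measurement succeeds, the strict, lossy, and WTF-8 encodings produce identical bytes, since they differ only in how they treat isolated surrogates.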
@@ -1012,7 +1012,7 @@ will encode isolated surrogates as WTF-8.
local.get $cursor
global.get $buf
i32.const 1024
- string.encode_wtf8 ;; push bytes written
+ string.encode_to_wtf8 ;; push bytes written
local.tee $bytes
(if i32.eqz (then return)) ;; if no bytes encoded, done
local.get $bytes
@@ -1445,7 +1445,7 @@ faster than `externref`+imports:
predictable performance than e.g. an encoder implemented in JS (for
web embeddings).
4. Reading string contents, either via
- `string.encode_wtf8`-then-process-inline or via `stringview_wtf16`,
+ `string.encode_to_wtf8`-then-process-inline or via `stringview_wtf16`,
is likely faster than calling out to JavaScript to read code units
one at a time. WebAssembly-to-JavaScript calls are cheap but not
free.
@@ -1506,8 +1506,8 @@ concrete adapter function specialized to the data representations used
by the caller and the callee. The instruction set in this proposal can
be used to implement the adapter function for passing a `stringref` as a
string; assuming that the adapter function is generated in such a way
-that it has access to the target memory, `string.encode_wtf8` can
-implement the copy and validation at the same time. `string.new_wtf8`
+that it has access to the target memory, `string.encode_to_wtf8` can
+implement the copy and validation at the same time. `string.decode_from_wtf8`
would be the implementation of getting a `stringref` from an
interface-typed string value, again assuming UTF-8 encoding for these
values.