@@ -2,15 +2,21 @@
 
 The types `char` and `str` hold textual data.
 
-A value of type `char` is a [Unicode scalar value] (i.e. a code point that
-is not a surrogate), represented as a 32-bit unsigned word in the 0x0000 to
-0xD7FF or 0xE000 to 0x10FFFF range. A `[char]` is effectively a UCS-4 / UTF-32
-string.
+A value of type `char` is a [Unicode scalar value] (i.e. a code point that is
+not a surrogate), represented as a 32-bit unsigned word in the 0x0000 to 0xD7FF
+or 0xE000 to 0x10FFFF range. It is immediate [Undefined Behavior] to create a
+`char` that falls outside this range. A `[char]` is effectively a UCS-4 / UTF-32
+string of length 1.
 
-A value of type `str` is a Unicode string, represented as an array of 8-bit
-unsigned bytes holding a sequence of UTF-8 code points. Since `str` is a
-[dynamically sized type], it is not a _first-class_ type, but can only be
-instantiated through a pointer type, such as `&str`.
+A value of type `str` is represented the same way as `[u8]`, it is a slice of
+8-bit unsigned bytes. However, the Rust standard library makes extra assumptions
+about `str`: methods working on `str` assume and ensure that the data in there
+is valid UTF-8. Calling a `str` method with a non-UTF-8 buffer can cause
+[Undefined Behavior] now or in the future.
+
+Since `str` is a [dynamically sized type], it can only be instantiated through a
+pointer type, such as `&str`.
 
 [Unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value
+[Undefined Behavior]: ../behavior-considered-undefined.md
 [dynamically sized type]: ../dynamically-sized-types.md
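
The valid range for `char` described in the new text can be checked from safe code. The following is a minimal sketch, not part of the change itself; the helper name `is_scalar_value` is hypothetical, and it relies only on the standard library's checked conversion `char::from_u32`, which returns `None` for surrogates and for values above 0x10FFFF.

```rust
/// Hypothetical helper (not from the diff): true if `v` is a Unicode scalar
/// value, i.e. a value that may legally be stored in a `char`.
fn is_scalar_value(v: u32) -> bool {
    // The checked conversion rejects surrogates and out-of-range values.
    char::from_u32(v).is_some()
}

fn main() {
    // Both ends of the two valid ranges are accepted.
    assert!(is_scalar_value(0x0000));
    assert!(is_scalar_value(0xD7FF));
    assert!(is_scalar_value(0xE000));
    assert!(is_scalar_value(0x10FFFF));

    // Surrogates (0xD800 to 0xDFFF) and values past 0x10FFFF are rejected.
    assert!(!is_scalar_value(0xD800));
    assert!(!is_scalar_value(0xDFFF));
    assert!(!is_scalar_value(0x110000));

    // A `char` occupies a 32-bit word.
    assert_eq!(std::mem::size_of::<char>(), 4);
    assert_eq!(char::from_u32(0x10FFFF), Some(char::MAX));
}
```

Going through the checked conversion keeps invalid values out of `char` entirely, which is the situation the added Undefined Behavior sentence warns about.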
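
The relationship between `str` and `[u8]` in the new wording can be illustrated in the same way. This is a sketch under the assumption that callers validate bytes before treating them as text; the helper name `checked_text` is hypothetical, while `std::str::from_utf8` and `str::as_bytes` are standard library functions.

```rust
/// Hypothetical helper (not from the diff): view `bytes` as `&str` only if
/// they are valid UTF-8.
fn checked_text(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

fn main() {
    // UTF-8 encoding of U+20AC (euro sign): three bytes, one scalar value.
    let valid: [u8; 3] = [0xE2, 0x82, 0xAC];
    let s: &str = checked_text(&valid).expect("valid UTF-8");
    assert_eq!(s, "\u{20AC}");
    assert_eq!(s.len(), 3); // `len` counts bytes, as for `[u8]`
    assert_eq!(s.chars().count(), 1);
    assert_eq!(s.as_bytes(), &valid[..]); // same bytes, viewed as `[u8]`

    // 0xFF never appears in well-formed UTF-8, so validation fails.
    let invalid: [u8; 2] = [0xFF, 0xFE];
    assert!(checked_text(&invalid).is_none());
}
```

The unchecked counterpart, `std::str::from_utf8_unchecked`, is `unsafe` because it skips this validation; passing it a non-UTF-8 buffer and then calling `str` methods is the case the added paragraph describes.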