diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 1c80b8b..d114a55 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -4,8 +4,7 @@ - [Translator's Notes](./intro_zh.md) - [Introduction](./intro.md) - [Types](./chapter_1.md) - - [User Types](./chapter_1/user_type.md) - - [Type System](./chapter_1/use-types-2.md) + - [Item 1: Use the type system to express data structures](./chapter_1/user_type.md) - [Concepts](./chapter_2.md) - [Dependencies](./chapter_3.md) - [Tooling](./chapter_4.md) diff --git a/src/chapter_1/user_type.md b/src/chapter_1/user_type.md index 3e00c54..84a55ff 100644 --- a/src/chapter_1/user_type.md +++ b/src/chapter_1/user_type.md @@ -1,16 +1,19 @@ -# Item 1: Use the type system to express your data structures +# Item 1: Use the type system to express your data structures -"who called them programers and not type writers" – @thingskatedid +> "who called them programers and not type writers" ([@thingskatedid](https://twitter.com/thingskatedid/status/1400213496785108997)) -The basics of Rust's type system are pretty familiar to anyone coming from another statically typed programming language (such as C++, Go or Java). There's a collection of integer types with specific sizes, both signed (i8, i16, i32, i64, i128) and unsigned (u8, u16, u32, u64, u128). +The basics of Rust's type system are pretty familiar to anyone coming from another statically typed programming language (such as C++, Go or Java). There's a collection of integer types with specific sizes, both signed (`i8`, `i16`, `i32`, `i64`, `i128`) and unsigned (`u8`, `u16`, `u32`, `u64`, `u128`). -There's also signed (isize) and unsigned (usize) integers whose size matches the pointer size on the target system. Rust isn't a language where you're going to be doing much in the way of converting between pointers and integers, so that characterization isn't really relevant. However, standard collections return their size as a usize (from .len()), so collection indexing means that usize values are quite common – which is obviously fine from a capacity perspective, as there can't be more items in an in-memory collection than there are memory addresses on the system. 
+There are also two integer types whose size matches the pointer size on the target system: signed (`isize`) and unsigned (`usize`). Rust isn't a language where you're going to be doing much in the way of converting between pointers and integers, so that characterization isn't really relevant. However, standard collections return their size as a `usize` (from `.len()`), so collection indexing means that `usize` values are quite common, which is obviously fine from a capacity perspective, as there can't be more items in an in-memory collection than there are memory addresses on the system. -The integral types do give us the first hint that Rust is a stricter world than C++ – attempting to put a quart (i32) into a pint pot (i16) generates a compile-time error. +The integral types do give us the first hint that Rust is a stricter world than C++: attempting to put a quart (`i32`) into a pint pot (`i16`) generates a compile-time error. + +```rust +let x: i32 = 42; +let y: i16 = x; +``` ```rust - let x: i32 = 42; - let y: i16 = x; error[E0308]: mismatched types --> use-types/src/main.rs:14:22 | @@ -25,13 +28,18 @@ help: you can convert an `i32` to an `i16` and panic if the converted value does | ++++++++++++++++++++ ``` -This is reassuring: Rust is not going to sit there quietly while the programmer does things that are risky. It also gives an early indication that while Rust has stronger rules, it also has helpful compiler messages that point the way to how to comply with the rules. The suggested solution raises the question of how to handle situations where the conversion would alter the value, and we'll have more to say on both error handling (Item 4) and using panic! (Item 18) later. +This is reassuring: Rust is not going to sit there quietly while the programmer does things that are risky. It also gives an early indication that while Rust has stronger rules, it also has helpful compiler messages that point the way to how to comply with the rules. + +The suggested solution raises the question of how to handle situations where the conversion would alter the value, and we'll have more to say on both error handling ([Item 4][方法 4]) and using `panic!` ([Item 18][方法 18]) later. -Rust also doesn't allow some things that might appear "safe": +Rust also disallows some operations that might appear "safe": + +```rust +let x = 42i32; // Integer literal with type suffix +let y: i64 = x; +``` ```rust - let x = 42i32; // Integer literal with type suffix - let y: i64 = x; error[E0308]: mismatched types --> use-types/src/main.rs:23:22 | @@ -46,77 +54,80 @@ help: you can convert an `i32` to an `i64` | +++++++ ``` -Here, the suggested solution doesn't raise the spectre of error handling, but the conversion does still need to be explicit. We'll discuss type conversions in more detail later (Item 6). 
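+Here, the suggested solution doesn't raise the spectre of error handling, but the conversion does still need to be explicit. We'll discuss type conversions in more detail later ([Item 6][方法 6]). + +Continuing with the unsurprising primitive types, Rust has a boolean type (`bool`), floating point types (`f32`, `f64`) and a unit type `()` (like C's `void`). -Continuing with the unsurprising primitive types, Rust has a bool type, floating point types (f32, f64) and a unit type () (like C's void). +More interesting is the `char` character type, which holds a [Unicode value][`Unicode` 值] (similar to Go's [`rune` type][`rune 类型`]). Although this is stored as 4 bytes internally, there are again no silent conversions to or from a 32-bit integer. -More interesting is the char character type, which holds a Unicode value (similar to Go's rune type). Although this is stored as 4 bytes internally, there are again no silent conversions to or from a 32-bit integer. +This precision in the type system forces you to be explicit about what you're trying to express: a `u32` value is different from a `char`, which in turn is different from a sequence of UTF-8 bytes, which in turn is different from a sequence of arbitrary bytes, and it's up to you to specify exactly which you mean[1](#footnote-1). [Joel Spolsky's famous blog post][Joel Spolsky 的著名博客] can help you understand which you need. -This precision in the type system forces you to be explicit about what you're trying to express – a u32 value is different than a char, which in turn is different than a sequence of UTF-8 bytes, which in turn is different than a sequence of arbitrary bytes, and it's up to you to specify exactly which you mean1. Joel Spolsky's famous blog post can help you understand which you need. +Of course, there are helper methods that allow you to convert between these different types, but their signatures force you to handle (or explicitly ignore) the possibility of failure. For example, a Unicode code point[2](#footnote-2) can always be represented in 32 bits, so `'a' as u32` is allowed, but the other direction is trickier (as there are `u32` values that are not valid Unicode code points): -Of course, there are helper methods that allow you to convert between these different types, but their signatures force you to handle (or explicitly ignore) the possibility of failure. 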
For example, a Unicode code point2 can always be represented in 32 bits, so 'a' as u32 is allowed, but the other direction is trickier (as there are u32 values that are not valid Unicode code points): +* [char::from_u32] returns an `Option<char>`, forcing the caller to handle the failure case +* [char::from_u32_unchecked] makes the assumption of validity, but is marked `unsafe` as a result, forcing the caller to use `unsafe` too ([Item 16][方法 16]). -char::from_u32 returns an Option forcing the caller to handle the failure case -char::from_u32_unchecked makes the assumption of validity, but is marked unsafe as a result, forcing the caller to use unsafe too (Item 16). -Aggregate Types -Moving on to aggregate types, Rust has: +## Aggregate Types + +Moving on to aggregate types, Rust has: +- Arrays, which hold multiple instances of a single type, where the number of instances is known at compile time. For example `[u32; 4]` is four 4-byte integers in a row. +- Tuples, which hold instances of multiple heterogeneous types, where the number of elements and their types are known at compile time, for example `(WidgetOffset, WidgetSize, WidgetColour)`. If the types in the tuple aren't distinctive (for example `(i32, i32, &'static str, bool)`) it's better to give each element a name and use… +- Structs, which also hold instances of heterogeneous types known at compile time, but which allow both the overall type and the individual fields to be referred to by name. +- Tuple structs, which are a cross-breed of a struct with a tuple: there's a name for the overall type, but no names for the individual fields, which are referred to by number instead: `s.0`, `s.1`, etc. -Arrays, which hold multiple instances of a single type, where the number of instances is known at compile time. For example [u32; 4] is four 4-byte integers in a row. -Tuples, which hold instances of multiple heterogeneous types, where the number of elements and their types are known at compile time, for example (WidgetOffset, WidgetSize, WidgetColour). If the types in the tuple aren't distinctive – for example (i32, i32, &'static str, bool) – it's better to give each element a name and use… -Structs, which also hold instances of heterogeneous types known at compile time, but which allows both the overall type and the individual fields to be referred to by name. -The tuple struct is a cross-breed of a struct with a tuple: there's a name for the overall type, but no names for the individual fields – they are referred to by number instead: s.0, s.1, etc. 
```rust - struct TextMatch(usize, String); - let m = TextMatch(12, "needle".to_owned()); - assert_eq!(m.0, 12); +struct TextMatch(usize, String); +let m = TextMatch(12, "needle".to_owned()); +assert_eq!(m.0, 12); ``` -This brings us to the jewel in the crown of Rust's type system, the enum. +This brings us to the jewel in the crown of Rust's type system: the `enum`. + +In its basic form, it's hard to see what there is to get excited about. As with other languages, the `enum` allows you to specify a set of mutually exclusive values, possibly with a numeric or string value attached. + -In its basic form, it's hard to see what there is to get excited about. As with other languages, the enum allows you to specify a set of mutually exclusive values, possibly with a numeric or string value attached. ```rust - enum HttpResultCode { - Ok = 200, - NotFound = 404, - Teapot = 418, - } - let code = HttpResultCode::NotFound; - assert_eq!(code as i32, 404); +enum HttpResultCode { + Ok = 200, + NotFound = 404, + Teapot = 418, +} +let code = HttpResultCode::NotFound; +assert_eq!(code as i32, 404); ``` -Because each enum definition creates a distinct type, this can be used to improve readability and maintainability of functions that take bool arguments. Instead of: +Because each `enum` definition creates a distinct type, this can be used to improve the readability and maintainability of functions that take `bool` arguments. For example: ```rust - print_page(/* both_sides= */ true, /* colour= */ false); +print_page(/* both_sides= */ true, /* colour= */ false); ``` -a version that uses a pair of enums: +can be replaced with a version that uses `enum`s: ```rust - pub enum Sides { - Both, - Single, - } - - pub enum Output { - BlackAndWhite, - Colour, - } - - pub fn print_page(sides: Sides, colour: Output) { - // ... - } +pub enum Sides { + Both, + Single, +} + +pub enum Output { + BlackAndWhite, + Colour, +} + +pub fn print_page(sides: Sides, colour: Output) { + // ... 
+} ``` -is more type-safe and easier to read at the point of invocation: +This is more type-safe and easier to read at the point of invocation: ```rust - print_page(Sides::Both, Output::BlackAndWhite); +print_page(Sides::Both, Output::BlackAndWhite); ``` -Unlike the bool version, if a library user were to accidentally flip the order of the arguments, the compiler would immediately complain: +Unlike the `bool` version, if a user of the library were to accidentally flip the order of the arguments, the compiler would immediately complain: ```rust error[E0308]: mismatched types @@ -131,18 +142,19 @@ error[E0308]: mismatched types | ^^^^^^^^^^^^^ expected enum `enums::Output`, found enum `enums::Sides` ``` -(Using the newtype pattern (Item 7) to wrap a bool also achieves type safety and maintainability; it's generally best to use that if the semantics will always be Boolean, and to use an enum if there's a chance that a new alternative (e.g. Sides::BothAlternateOrientation) could arise in the future.) +> Using the newtype pattern ([Item 7][方法 7]) to wrap a `bool` also achieves type safety and maintainability; it's generally best to use that if the semantics will always be Boolean, and to use an `enum` if there's a chance that a new alternative (e.g. `Sides::BothAlternateOrientation`) could arise in the future. -The type safety of Rust's enums continues with the match expression: +The type safety of Rust's `enum`s continues with the `match` expression. The following code does not compile: -This code does not compile! +```rust +let msg = match code { + HttpResultCode::Ok => "Ok", + HttpResultCode::NotFound => "Not found", + // forgot to deal with the all-important "I'm a teapot" code +}; +``` ```rust - let msg = match code { - HttpResultCode::Ok => "Ok", - HttpResultCode::NotFound => "Not found", - // forgot to deal with the all-important "I'm a teapot" code - }; error[E0004]: non-exhaustive patterns: `Teapot` not covered --> use-types/src/main.rs:65:25 | @@ -161,12 +173,15 @@ error[E0004]: non-exhaustive patterns: `Teapot` not covered = note: the matched value is of type `HttpResultCode` ``` -The compiler forces the programmer to consider all of the possibilities3 that are represented by the enum, even if the result is just to add a default arm _ => {}. (Note that modern C++ compilers can and do warn about missing switch arms for enums as well.) 
+The compiler forces the programmer to consider all of the possibilities[3](#footnote-3) that are represented by the `enum`, even if the result is just to add a default arm `_ => {}`. + +> Note that modern C++ compilers can and do warn about missing `switch` arms for `enum`s as well. -enums With Fields -The true power of Rust's enum feature comes from the fact that each variant can have data that comes along with it, making it into an algebraic data type (ADT). This is less familiar to programmers of mainstream languages; in C/C++ terms it's like a combination of an enum with a union – only type-safe. +## `enum`s With Fields -This means that the invariants of the program's data structures can be encoded into Rust's type system; states that don't comply with those invariants won't even compile. A well-designed enum makes the creator's intent clear to humans as well as to the compiler: +The true power of Rust's `enum` feature comes from the fact that each variant can have data that comes along with it, making it into an [algebraic data type][代数数据类型] (ADT). This is less familiar to programmers of mainstream languages; in C/C++ terms it's like a combination of an `enum` with a `union`, only type-safe. + +This means that the invariants of the program's data structures can be encoded into Rust's type system; states that don't comply with those invariants won't even compile. A well-designed `enum` makes the creator's intent clear to humans as well as to the compiler: ```rust pub enum SchedulerState { @@ -176,11 +191,11 @@ pub enum SchedulerState { } ``` -Just from the type definition, it's reasonable to guess that Jobs get queued up in the Pending state until the scheduler is fully active, at which point they're assigned to some per-CPU pool. +Just from the type definition, it's reasonable to guess that `Job`s get queued up in the `Pending` state until the scheduler is fully active, at which point they're assigned to some per-CPU pool. -This highlights the central theme of this Item, which is to use Rust's type system to express the concepts that are associated with the design of your software. +This highlights the central theme of this Item: use Rust's type system to express the concepts that are associated with the design of your software. -A dead giveaway for when this is not happening is a comment that explains when some field or parameter is valid: +A dead giveaway that this is not happening is a comment that explains when some field or parameter is valid: ```rust struct DisplayProps { @@ -192,7 +207,7 @@ struct DisplayProps { } ``` -This is a prime candidate for replacement with an enum holding data: +This struct is a prime candidate for replacement with an `enum` holding data: ```rust #[derive(Debug)] @@ -208,25 +223,52 @@ struct DisplayProperties { } ``` -This small example illustrates a key piece of advice: make invalid states inexpressible in your types. 
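Types that only support valid combinations of values mean that whole classes of error are rejected by the compiler, leading to smaller and safer code. +This small example illustrates a key piece of advice: make invalid states inexpressible in your types. Types that only support valid combinations of values mean that whole classes of error are rejected by the compiler, leading to smaller and safer code. + +## Options and Errors + +Returning to the power of the `enum`, there are two concepts that are so common that Rust includes built-in `enum` types to express them. + +The first is the concept of an `Option`: either there's a value of a particular type (`Some(T)`), or there isn't (`None`). Always use `Option` for values that can be absent; never fall back to using sentinel values (`-1`, `nullptr`, …) to try to express the same concept in-band. + +There is one subtle point to consider though. If you're dealing with a collection of things, you need to decide whether having zero things in the collection is the same as not having a collection. For most situations, the distinction doesn't arise and you can go ahead and use `Vec<T>`: a count of zero things implies an absence of things. + +However, there are definitely other rare scenarios where the two cases need to be distinguished with `Option<Vec<T>>`: for example, a cryptographic system might need to distinguish between "payload transported separately" and "empty payload provided". (This is related to the debates around the `NULL` marker for columns in SQL.) + +One common edge case that's in the middle is a `String` which might be absent: does `""` or `None` make more sense to indicate the absence of a value? Either way works, but `Option<String>` clearly communicates the possibility that this value may be absent. + +The second common concept arises from error processing: if a function fails, how should that failure be reported? Historically, special sentinel values (e.g. `-errno` return values from Linux system calls) or global variables (`errno` for POSIX systems) were used. More recently, languages that support multiple or tuple return values from functions (such as Go) may have a convention of returning a `(result, error)` pair, assuming the existence of some suitable "zero" value for the result when the error is non-"zero". + +In Rust, always encode the result of an operation that might fail as a `Result<T, E>`. The `T` type holds the successful result (in the `Ok` variant), and the `E` type holds error details (in the `Err` variant) on failure. Using the standard type makes the intent of the design clear, and allows the use of standard transformations ([Item 3][方法 3]) and error processing ([Item 4][方法 4]); it also makes it possible to streamline error processing with the `?` operator. + +--- + +#### Notes -Options and Errors -Returning to the power of the enum, there are two concepts that are so common that Rust includes built-in enum types to express them. +1: The situation gets muddier still if the filesystem is involved, since filenames on popular platforms are somewhere in between arbitrary bytes and UTF-8 sequences: see the [std::ffi::OsString] documentation. -The first is the concept of an Option: either there's a value of a particular type (Some(T)), or there isn't (None). Always use Option for values that can be absent; never fall back to using sentinel values (-1, nullptr, …) to try to express the same concept in-band. +2: Technically, a Unicode scalar value rather than a code point. -There is one subtle point to consider though. If you're dealing with a collection of things, you need to decide whether having zero things in the collection is the same as not having a collection. For most situations, the distinction doesn't arise and you can go ahead and use Vec<T>: a count of zero things implies an absence of things. 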
+3: This also means that adding a new variant to an existing `enum` in a library is a breaking change ([Item 21][方法 21]): clients of the library will need to change their code to cope with the new variant. If an `enum` is really just an old-style list of values, this behaviour can be avoided by marking it as a `non_exhaustive` `enum`; see [Item 21][方法 21]. -However, there are definitely other rare scenarios where the two cases need to be distinguished with Option<Vec<T>> – for example, a cryptographic system might need to distinguish between "payload transported separately" and "empty payload provided". (This is related to the debates around the NULL marker columns in SQL.) -One common edge case that's in the middle is a String which might be absent – does "" or None make more sense to indicate the absence of a value? Either way works, but Option<String> clearly communicates the possibility that this value may be absent. +The original text is available [here](https://www.lurklurk.org/effective-rust/use-types.html). -The second common concept arises from error processing: if a function fails, how should that failure be reported? Historically, special sentinel values (e.g. -errno return values from Linux system calls) or global variables (errno for POSIX systems) were used. More recently, languages that support multiple or tuple return values (such as Go) from functions may have a convention of returning a (result, error) pair, assuming the existence of some suitable "zero" value for the result when the error is non-"zero". +[方法 3]: https://www.lurklurk.org/effective-rust/transform.html +[方法 4]: https://www.lurklurk.org/effective-rust/errors.html +[方法 6]: https://www.lurklurk.org/effective-rust/casts.html +[方法 7]: https://www.lurklurk.org/effective-rust/newtype.html +[方法 16]: https://www.lurklurk.org/effective-rust/unsafe.html +[方法 18]: https://www.lurklurk.org/effective-rust/panic.html +[方法 21]: https://www.lurklurk.org/effective-rust/semver.html -In Rust, always encode the result of an operation that might fail as a Result<T, E>. The T type holds the successful result (in the Ok variant), and the E type holds error details (in the Err variant) on failure. 
Using the standard type makes the intent of the design clear, and allows the use of standard transformations (Item 3) and error processing (Item 4); it also makes it possible to streamline error processing with the ? operator. -1: The situation gets muddier still if the filesystem is involved, since filenames on popular platforms are somewhere in between arbitrary bytes and UTF-8 sequences: see the std::ffi::OsString documentation. +[char::from_u32]: https://doc.rust-lang.org/std/primitive.char.html#method.from_u32 +[char::from_u32_unchecked]: https://doc.rust-lang.org/std/primitive.char.html#method.from_u32_unchecked -2: Technically, a Unicode scalar value rather than a code point +[Joel Spolsky 的著名博客]:https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/ +[`Unicode` 值]: http://www.unicode.org/glossary/#unicode_scalar_value +[`rune 类型`]: https://golang.org/doc/go1#rune +[代数数据类型]: https://en.wikipedia.org/wiki/Algebraic_data_type -3: This also means that adding a new variant to an existing enum in a library is a breaking change (Item 21): clients of the library will need to change their code to cope with the new variant. If an enum is really just an old-style list of values, this behaviour can be avoided by marking it as a non_exhaustive enum; see Item 21. \ No newline at end of file +[std::ffi::OsString]: https://doc.rust-lang.org/std/ffi/struct.OsString.html
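
As a hedged illustration of the closing advice in this item (use `Option` for values that can be absent, `Result` for operations that can fail, and `?` to propagate errors), here is a minimal sketch. It is not part of the patch or of the original text: the function names `first_nonempty_line`, `parse_port`, `port_from_config` and `describe_code_point` are hypothetical and exist only for this example. The only library calls used are standard ones (`str::parse`, `char::from_u32`).

```rust
use std::num::ParseIntError;

/// Hypothetical helper: returns the first non-blank line, trimmed.
/// Absence is expressed with `Option`, not with a sentinel such as "".
pub fn first_nonempty_line(text: &str) -> Option<&str> {
    text.lines().map(str::trim).find(|line| !line.is_empty())
}

/// Hypothetical helper: parses a port number.
/// Failure is reported through `Result`, not through a magic value like -1.
pub fn parse_port(input: &str) -> Result<u16, ParseIntError> {
    input.trim().parse::<u16>()
}

/// Combines both ideas: "nothing configured" is `Ok(None)`, while
/// "configured but invalid" is an `Err`.
pub fn port_from_config(text: &str) -> Result<Option<u16>, ParseIntError> {
    match first_nonempty_line(text) {
        // No value present: this is absence, not failure.
        None => Ok(None),
        // `?` returns early with the `Err` if parsing fails.
        Some(line) => Ok(Some(parse_port(line)?)),
    }
}

/// `char::from_u32` returns `Option<char>`, so the caller has to
/// handle `u32` values that are not Unicode scalar values.
pub fn describe_code_point(value: u32) -> String {
    match char::from_u32(value) {
        Some(c) => format!("U+{value:04X} is {c:?}"),
        None => format!("U+{value:04X} is not a Unicode scalar value"),
    }
}
```

With these definitions, `port_from_config("8080")` evaluates to `Ok(Some(8080))`, a blank input gives `Ok(None)`, and `"70000"` gives an `Err` because the value does not fit in a `u16`. Returning `Result<Option<u16>, _>` keeps "nothing configured" distinct from "configured but invalid", in the same spirit as the `Option<Vec<T>>` discussion in the text.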