@@ -96,12 +96,11 @@ need, and it can make your lifetimes more complex.
9696
9797## Generic functions
9898
99- To write a function that's generic over types of strings, use [ the ` Str `
100- trait] ( http://doc.rust-lang.org/std/str/trait.Str.html ) :
99+ To write a function that works with any type of string, take a `&str` argument.
101100
102101``` {rust}
103- fn some_string_length<T: Str> (x: T ) -> uint {
104- x.as_slice(). len()
102+ fn some_string_length(x: &str) -> uint {
103+ x.len()
105104}
106105
107106fn main() {
@@ -111,15 +110,12 @@ fn main() {
111110
112111 let s = "Hello, world".to_string();
113112
114- println!("{}", some_string_length(s));
113+ println!("{}", some_string_length(s.as_slice()));
115114}
116115```
117116
118117Both of these lines will print ` 12 ` .
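
For readers on a current toolchain, here is a minimal sketch of the same function. It assumes modern Rust, where `uint` has been renamed `usize` and a `&String` coerces to `&str` without an explicit `as_slice()` call; the variable names `literal` and `owned` are illustrative only.

```rust
// Sketch in current Rust syntax: `usize` replaces the older `uint`.
fn some_string_length(x: &str) -> usize {
    // `len()` returns the length in bytes, not in characters.
    x.len()
}

fn main() {
    let literal = "Hello, world";
    let owned = String::from("Hello, world");

    println!("{}", some_string_length(literal));
    // `&owned` coerces from `&String` to `&str`; `owned.as_str()` is equivalent.
    println!("{}", some_string_length(&owned));
}
```

Taking `&str` keeps the function usable with both string literals and owned `String` values without introducing a type parameter.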
119118
120- The only method that the ` Str ` trait has is ` as_slice() ` , which gives you
121- access to a ` &str ` value from the underlying string.
122-
123119## Comparisons
124120
125121To compare a String to a constant string, prefer ` as_slice() ` ...
@@ -161,25 +157,93 @@ indexing is basically never what you want to do. The reason is that each
161157character can be a variable number of bytes. This means that you have to iterate
162158through the characters anyway, which is an O(n) operation.
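
To make the cost concrete, the sketch below compares the byte length of a string with its number of code points, and fetches the third code point by walking the string from the start. It assumes current Rust; the helper names `byte_len`, `char_count`, and `nth_char` are hypothetical and exist only for this illustration.

```rust
// Hypothetical helpers for illustration; none of these names come from the guide.

/// Length in bytes. This is stored with the string, so it is O(1).
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Number of code points. This has to walk the whole string, so it is O(n).
fn char_count(s: &str) -> usize {
    s.chars().count()
}

/// The n-th code point (zero-based), found by iterating from the start.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

fn main() {
    // "αἰθήρ" written with escapes so the exact code points are unambiguous.
    let s = "\u{3b1}\u{1f30}\u{3b8}\u{3ae}\u{3c1}";

    println!("bytes: {}", byte_len(s));
    println!("code points: {}", char_count(s));
    println!("third code point: {:?}", nth_char(s, 2));
}
```

Because the byte length and the code point count differ, a byte offset cannot be used directly as a character position.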
163159
164- To iterate over a string, use the ` graphemes() ` method on ` &str ` :
160+ There are three basic levels of Unicode (and its encodings):
161+
162+ - code units, the underlying data type used to store everything
163+ - code points/Unicode scalar values (`char`)
164+ - graphemes (visible characters)
165+
166+ Rust provides an iterator for each of these levels:
167+
168+ - ` .bytes() ` will iterate over the underlying bytes
169+ - ` .chars() ` will iterate over the code points
170+ - ` .graphemes() ` will iterate over each grapheme
171+
172+ Usually, the ` graphemes() ` method on ` &str ` is what you want:
165173
166174``` {rust}
167- let s = "αἰθήρ ";
175+ let s = "u͔n͈̰̎i̙̮͚̦c͚̉o̼̩̰͗d͔̆̓ͥé";
168176
169177for l in s.graphemes(true) {
170178 println!("{}", l);
171179}
172180```
173181
182+ This prints:
183+
184+ ``` {notrust,ignore}
185+ u͔
186+ n͈̰̎
187+ i̙̮͚̦
188+ c͚̉
189+ o̼̩̰͗
190+ d͔̆̓ͥ
191+ é
192+ ```
193+
174194Note that ` l ` has the type ` &str ` here, since a single grapheme can consist of
175195multiple codepoints, so a ` char ` wouldn't be appropriate.
176196
177- This will print out each character in turn, as you'd expect: first "α", then
178- "ἰ", etc. You can see that this is different than just the individual bytes.
179- Here's a version that prints out each byte:
197+ This will print out each visible character in turn, as you'd expect: first "u͔", then
198+ "n͈̰̎", etc. If you want each individual codepoint of each grapheme, use `.chars()`:
180199
181200``` {rust}
182- let s = "αἰθήρ";
201+ let s = "u͔n͈̰̎i̙̮͚̦c͚̉o̼̩̰͗d͔̆̓ͥé";
202+
203+ for l in s.chars() {
204+ println!("{}", l);
205+ }
206+ ```
207+
208+ This prints:
209+
210+ ``` {notrust,ignore}
211+ u
212+ ͔
213+ n
214+ ̎
215+ ͈
216+ ̰
217+ i
218+ ̙
219+ ̮
220+ ͚
221+ ̦
222+ c
223+ ̉
224+ ͚
225+ o
226+ ͗
227+ ̼
228+ ̩
229+ ̰
230+ d
231+ ̆
232+ ̓
233+ ͥ
234+ ͔
235+ e
236+ ́
237+ ```
238+
239+ You can see how some of them are combining characters, and therefore the output
240+ looks a bit odd.
241+
242+ If you want the individual byte representation of each codepoint, you can use
243+ ` .bytes() ` :
244+
245+ ``` {rust}
246+ let s = "u͔n͈̰̎i̙̮͚̦c͚̉o̼̩̰͗d͔̆̓ͥé";
183247
184248for l in s.bytes() {
185249 println!("{}", l);
@@ -189,16 +253,50 @@ for l in s.bytes() {
189253This will print:
190254
191255``` {notrust,ignore}
192- 206
193- 177
194- 225
195- 188
256+ 117
257+ 205
258+ 148
259+ 110
260+ 204
261+ 142
262+ 205
263+ 136
264+ 204
196265176
197- 206
198- 184
199- 206
266+ 105
267+ 204
268+ 153
269+ 204
200270174
201- 207
271+ 205
272+ 154
273+ 204
274+ 166
275+ 99
276+ 204
277+ 137
278+ 205
279+ 154
280+ 111
281+ 205
282+ 151
283+ 204
284+ 188
285+ 204
286+ 169
287+ 204
288+ 176
289+ 100
290+ 204
291+ 134
292+ 205
293+ 131
294+ 205
295+ 165
296+ 205
297+ 148
298+ 101
299+ 204
202300129
203301```
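
As a smaller check of the same idea, the sketch below takes only the final grapheme of the example, an `e` followed by the combining acute accent U+0301, and collects its code points and its UTF-8 bytes. It assumes current Rust; `code_points` and `utf8_bytes` are hypothetical helper names used only here.

```rust
// Hypothetical helpers for illustration; the names are not from the guide.

/// Collect the code points (`char` values) of a string.
fn code_points(s: &str) -> Vec<char> {
    s.chars().collect()
}

/// Collect the UTF-8 bytes of a string.
fn utf8_bytes(s: &str) -> Vec<u8> {
    s.bytes().collect()
}

fn main() {
    // One visible character made of two code points: 'e' + U+0301.
    let s = "e\u{301}";

    println!("{:?}", code_points(s));
    println!("{:?}", utf8_bytes(s));
}
```

The two code points occupy three bytes, which correspond to the last three values in the byte listing above: 101, 204, 129.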
204302