|
| 1 | +- Start Date: 2014-03-31 |
| 2 | +- RFC PR #: |
| 3 | +- Rust Issue #: |
| 4 | + |
| 5 | + |
| 6 | +# Summary |
| 7 | + |
| 8 | +Unify enums and structs by allowing enums to have fields, and structs to have |
| 9 | +variants. Allow nested enums/structs. Virtual dispatch of methods on struct/enum |
| 10 | +pointers. Remove struct variants. Treat enum variants as first class. Possibly |
| 11 | +remove nullary structs and tuple structs. |
| 12 | + |
| 13 | +The motivation for this is to provide an alternative to Java-style single |
| 14 | +inheritance, i.e., efficient sharing of fields, thin pointers, and virtual |
| 15 | +method dispatch. Along the way we simplify the language by unifying two language |
| 16 | +items and making obsolete a few more. |
| 17 | + |
| 18 | +Despite being a fairly radical proposal, I believe this is mostly backwards |
| 19 | +compatible. |
| 20 | + |
| 21 | +# Motivation |
| 22 | + |
| 23 | +We need to support efficient, heterogeneous data structures such as the DOM. |
| 24 | +Precisely, we need a form of code sharing which satisfies these constraints: |
| 25 | + |
| 26 | +* Cheap field access from internal methods; |
| 27 | +* Cheap dynamic dispatch of methods; |
| 28 | +* Cheap downcasting; |
| 29 | +* Thin pointers; |
| 30 | +* Sharing of fields and methods between definitions; |
| 31 | +* Safe, i.e., doesn't require a bunch of transmutes or other unsafe code to be usable. |
| 32 | + |
| 33 | +Example (Java-like pseudo-code): |
| 34 | + |
| 35 | +``` |
| 36 | +class Element { |
| 37 | + Element parent, left-sibling, right-sibling; |
| 38 | + Element[] children; |
| 39 | + |
| 40 | + foo(); |
| 41 | + |
| 42 | + template() { |
| 43 | + x = foo(); |
| 44 | + ... |
| 45 | + } |
| 46 | +} |
| 47 | + |
| 48 | +class Element1 : Element { |
| 49 | + Data some-data; |
| 50 | + |
| 51 | + template() { |
| 52 | + return some-data; |
| 53 | + } |
| 54 | +} |
| 55 | + |
| 56 | +final class Element2 : Element { |
| 57 | + ... |
| 58 | +} |
| 59 | +``` |
| 60 | + |
| 61 | + |
| 62 | +# Detailed design |
| 63 | + |
| 64 | +## Extend enums with fields |
| 65 | + |
| 66 | +For example, |
| 67 | + |
| 68 | +``` |
| 69 | +enum E { |
| 70 | + f1: T1, |
| 71 | + Var1(T5, T6), |
| 72 | + f2: T2 |
| 73 | +} |
| 74 | +``` |
| 75 | + |
| 76 | +## Extend structs with variants |
| 77 | + |
| 78 | +For example, |
| 79 | + |
| 80 | +``` |
| 81 | +struct S { |
| 82 | + f3: T3, |
| 83 | + Var2(T5, T6), |
| 84 | + f4: T4 |
| 85 | +} |
| 86 | +``` |
| 87 | + |
| 88 | +## I.e., unify structs and enums |
| 89 | + |
| 90 | +With the above extensions, enums and structs are basically the same: they have |
| 91 | +the same syntax (modulo the keyword), we would allow the same type |
| 92 | +parameterisation, etc. The difference is that only structs can be instantiated |
| 93 | +directly; an enum can only be instantiated via one of its variants (you could |
| 94 | +think of enums as abstract structs). So we could have values of `Var1`, `Var2`, |
| 95 | +and `S`, but not of `E` (nor of `f1`, etc., which are fields). When |
| 96 | +instantiating `S`, we must specify values for fields `f3` and `f4`. Values of |
| 97 | +`Var1` have named fields `f1` and `f2` and unnamed fields of types `T5` and `T6`; |
| 98 | +all must be specified when instantiating `Var1`. (Questions: what should the |
| 99 | +syntax look like? How do we specify constructors for the `E` part?) |
| 100 | + |
| 101 | +## Allow nested enums/structs |
| 102 | + |
| 103 | +For example, |
| 104 | + |
| 105 | +``` |
| 106 | +enum E1 { |
| 107 | + enum E2 { |
| 108 | + ... |
| 109 | + } |
| 110 | + struct S1 { |
| 111 | + enum E3 { ... } |
| 112 | + struct S2 { ... } |
| 113 | + } |
| 114 | +} |
| 115 | +``` |
| 116 | + |
| 117 | +Nesting does not introduce a scope, so from the scope in which `E1` is declared |
| 118 | +we can refer to `E1`, `E2`, `S1`, `E3`, and `S2` (modulo privacy, see unresolved |
| 119 | +questions). Nested items inherit fields from outer items. So, `S2` would inherit |
| 120 | +fields declared in `E1` and `S1`. |
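
For comparison, the field-inheritance rule can be approximated in today's Rust by composition. This is only a rough sketch: the field names (`id`, `name`, `flag`) and the helper functions are hypothetical and do not appear in the proposal, and under the proposal the compiler would presumably flatten the fields rather than require the explicit chain shown here.

```rust
// Fields that the proposal would declare directly inside `E1`
// (the field name is hypothetical).
struct E1Fields {
    id: u32,
}

// `S1` is nested in `E1`, so it carries `E1`'s fields plus its own.
struct S1 {
    e1: E1Fields,
    name: String,
}

// `S2` is nested in `S1`, so it carries the fields of both `E1` and `S1`.
struct S2 {
    s1: S1,
    flag: bool,
}

// Building an `S2` requires a value for every inherited field.
fn make_s2(id: u32, name: &str, flag: bool) -> S2 {
    S2 {
        s1: S1 {
            e1: E1Fields { id },
            name: name.to_string(),
        },
        flag,
    }
}

// With composition, reading an inherited field means walking the chain by hand.
fn id_of(s2: &S2) -> u32 {
    s2.s1.e1.id
}
```

The explicit `s2.s1.e1.id` path is the boilerplate that nested items are meant to remove.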
| 121 | + |
| 122 | + |
| 123 | +## Treat variants as 'first class' |
| 124 | + |
| 125 | +As well as instantiating variants, we allow the use of variants (whether |
| 126 | +structs, enums, tuples, or nullary) as types and allow impls for them. In |
| 127 | +combination with nested enums this is a partial replacement for 'refinement' |
| 128 | +types (that is, specifying a type on a subset of the variants of an enum). |
| 129 | +However, this is not the main motivation. The idea is that a variant (probably a |
| 130 | +struct variant) will replace a base class in a class hierarchy; an enum would |
| 131 | +replace an abstract base class and a struct would replace a non-abstract base |
| 132 | +class or leaf (concrete) class. Making variants first class makes it possible to |
| 133 | +refer to enum/struct objects other than the top level by type, and to provide |
| 134 | +methods for them in impls. |
| 135 | + |
| 136 | + |
| 137 | +## Virtual dispatch of methods for struct/enum objects |
| 138 | + |
| 139 | +We allow methods in impls for struct/enum objects (that is, references to |
| 140 | +struct/enum types) to be marked as `virtual` (allows overriding) and/or |
| 141 | +`override` (overrides a method). Methods declared on outer items are inherited |
| 142 | +by nested items. E.g., from the example above, a method declared on `E1` will |
| 143 | +be inherited by `E2` and `S2` (and others). If a method is declared `virtual`, |
| 144 | +then impls for nested items may override that method. If and only if a method is |
| 145 | +marked `override` then it must override a method declared in an outer item. |
| 146 | +Methods for enums may be declared without a body (like pure abstract/virtual |
| 147 | +methods in Java/C++ or required methods on traits). These must be overridden by |
| 148 | +any non-enum nested items. (Question: should we extend this to structs, i.e., |
| 149 | +allow pure virtual methods for structs, track these, and not allow |
| 150 | +instantiation of such structs?) |
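
The closest analogue in today's Rust is a trait with required and default methods. The sketch below is an approximation, not the proposed feature: the names `Node`, `Plain`, `Fancy`, and `describe` are hypothetical, nothing is marked `virtual` or `override`, and `&dyn Node` / `Box<dyn Node>` are fat pointers rather than the thin pointers this RFC asks for.

```rust
// `Node` plays the role of the outer enum (all names here are hypothetical).
trait Node {
    // No body: like a pure virtual method, every concrete type must supply it.
    fn foo(&self) -> u32;

    // Default body: like a `virtual` method that nested items may override.
    fn describe(&self) -> String {
        format!("node with foo = {}", self.foo())
    }
}

struct Plain;
struct Fancy;

impl Node for Plain {
    fn foo(&self) -> u32 {
        1
    }
    // `describe` is inherited unchanged.
}

impl Node for Fancy {
    fn foo(&self) -> u32 {
        2
    }

    // Plays the role of `override`: replaces the inherited body.
    fn describe(&self) -> String {
        String::from("fancy node")
    }
}

// Each call is dispatched through the trait object's v-table at run time.
fn describe_all(nodes: &[Box<dyn Node>]) -> Vec<String> {
    nodes.iter().map(|n| n.describe()).collect()
}
```

Note that traits give no way to share fields, which is one of the constraints listed in the motivation.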
| 151 | + |
| 152 | + |
| 153 | +## V-tables, thin pointers, and down-casting |
| 154 | + |
| 155 | +Struct/enum objects are referred to using thin pointers. Virtual dispatch is |
| 156 | +implemented using Java-style (or C++ with virtual single inheritance and without |
| 157 | +multiple inheritance) v-tables. That is, `&S1` or `~S1` is implemented as a |
| 158 | +pointer to a structure consisting of a pointer into a v-table (which identifies |
| 159 | +the dynamic type) and values for all fields of the dynamic type. Method calls are |
| 160 | +dispatched via the v-table. Since we identify the dynamic type, we can allow |
| 161 | +safe dynamic downcasting. This can be done by a match statement, continuing the |
| 162 | +example above: |
| 163 | + |
| 164 | +``` |
| 165 | +fn f(x: &E1) { |
| 166 | + match x { |
| 167 | + y @ &S2 {..} => { ... } // y is effectively a downcast of x to S2 |
| 168 | + y @ &S1 {..} => { ... } // y is effectively a downcast of x to S1 |
| 169 | + _ => { ... } // x isn't an instance of S1 or S2 |
| 170 | + } |
| 171 | +} |
| 172 | +``` |
| 173 | + |
| 174 | +We would allow the usual pattern matching too. |
| 175 | + |
| 176 | +Question: it might be handy to allow skipping the `{..}` for structs. Then again, |
| 177 | +hopefully downcasting won't be commonly used, so maybe we don't need to. |
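
For comparison, safe dynamic downcasting is available in today's Rust through `std::any::Any`. The sketch below is an illustration under assumptions, not the mechanism proposed here: the structs and the `classify` function are hypothetical, `&dyn Any` is a fat pointer, and `Any` only matches the exact type, so there is no notion of `S2` also being an `S1`.

```rust
use std::any::Any;

// Stand-ins for the nested items in the example (fields are hypothetical).
struct S1 {
    a: u32,
}

struct S2 {
    b: u32,
}

// Inspect the dynamic type behind a type-erased reference.
fn classify(x: &dyn Any) -> String {
    if let Some(y) = x.downcast_ref::<S2>() {
        // `y` is a `&S2`, i.e. a checked downcast of `x`.
        format!("S2 with b = {}", y.b)
    } else if let Some(y) = x.downcast_ref::<S1>() {
        // `y` is a `&S1`.
        format!("S1 with a = {}", y.a)
    } else {
        // `x` is neither an `S1` nor an `S2`.
        String::from("neither S1 nor S2")
    }
}
```

The chain of `if let` tests mirrors the arms of the `match` above, with the most specific type tested first.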
| 178 | + |
| 179 | + |
| 180 | +## Remove struct variants |
| 181 | + |
| 182 | +Unification of structs and enums makes struct variants obsolete. For example, |
| 183 | + |
| 184 | +``` |
| 185 | +enum E { |
| 186 | + Variant1{f: T} |
| 187 | +} |
| 188 | +``` |
| 189 | + |
| 190 | +can be written as |
| 191 | + |
| 192 | +``` |
| 193 | +enum E { |
| 194 | + struct Variant1{f: T} |
| 195 | +} |
| 196 | + |
| 197 | +``` |
| 198 | + |
| 199 | +Therefore, we may as well remove struct variants (they are currently |
| 200 | +feature-gated). |
| 201 | + |
| 202 | + |
| 203 | +## Coercions (subtyping) |
| 204 | + |
| 205 | +Nesting of enums/structs should give (probably implicit) coercions of |
| 206 | +references. E.g., (again, from the above example), `&S2` <: `&S1` <: `&E1`. |
| 207 | +There is no subtyping between values, to avoid slicing problems. (Is this |
| 208 | +right, or my imagination? I think we get into problems when virtual dispatch is |
| 209 | +expected but cannot be done safely. I probably need to think more about |
| 210 | +this.) |
| 211 | + |
| 212 | +We should forbid dereference of pointers to non-leaf items. This is not |
| 213 | +backwards compatible, since for non-nested enums (as currently present in the |
| 214 | +language) dereferencing references to such enums is allowed today. We could |
| 215 | +safely allow dereference inside a match expression (as in the downcast example, |
| 216 | +above) and hopefully that covers most of the current use cases. This would need |
| 217 | +a bit of investigation. |
| 218 | + |
| 219 | + |
| 220 | +# Example |
| 221 | + |
| 222 | +The first example in Java-ish syntax would be written as: |
| 223 | + |
| 224 | +``` |
| 225 | +enum Element { |
| 226 | + parent: Rc<Element>, |
| 227 | + children: ~[Rc<Element>], |
| 228 | + left: Rc<Element>, |
| 229 | + right: Rc<Element>, |
| 230 | + |
| 231 | + struct Element1 { |
| 232 | + x: uint, |
| 233 | + y: uint, |
| 234 | + }, |
| 235 | + |
| 236 | + struct Element2 { |
| 237 | + x: uint, |
| 238 | + y: uint, |
| 239 | + } |
| 240 | +} |
| 241 | + |
| 242 | +impl Element { |
| 243 | + virtual fn foo(&self) -> uint; |
| 244 | + |
| 245 | + fn template(&self) { |
| 246 | + let x = self.foo(); |
| 247 | + ... |
| 248 | + } |
| 249 | +} |
| 250 | + |
| 251 | +impl Element1 { |
| 252 | + override fn foo(&self) -> uint { self.x + self.y } |
| 253 | +} |
| 254 | + |
| 255 | +impl Element2 { |
| 256 | + override fn foo(&self) -> uint { self.x + self.y } |
| 257 | +} |
| 258 | +``` |
| 259 | + |
| 260 | +None of this prevents the usual use of traits and impls, which hopefully are an |
| 261 | +alternative to multiple inheritance. For example, `nsIConstraintValidation` is a |
| 262 | +mixin class in the Gecko DOM implementation. It could be implemented in Rust |
| 263 | +as something like: |
| 264 | + |
| 265 | +``` |
| 266 | +impl Element { |
| 267 | + virtual fn bar(&self) -> uint; |
| 268 | +} |
| 269 | + |
| 270 | +trait NSICompositor { |
| 271 | + fn x(&self) -> uint; |
| 272 | + fn y(&self) -> uint; |
| 273 | + fn bar(&self) -> uint { self.x() + self.y() } |
| 274 | +} |
| 275 | + |
| 276 | +impl NSICompositor for Element1 { |
| 277 | + fn x(&self) -> uint { self.x } |
| 278 | + fn y(&self) -> uint { self.y } |
| 279 | +} |
| 280 | + |
| 281 | +impl Element1 { |
| 282 | + override fn bar(&self) -> uint { NSICompositor::bar(self) } |
| 283 | +} |
| 284 | + |
| 285 | +impl NSICompositor for Element2 { |
| 286 | + fn x(&self) -> uint { self.x } |
| 287 | + fn y(&self) -> uint { self.y } |
| 288 | +} |
| 289 | + |
| 290 | +impl Element2 { |
| 291 | + override fn bar(&self) -> uint { NSICompositor::bar(self) } |
| 292 | +} |
| 293 | +``` |
| 294 | + |
| 295 | + |
| 296 | +# Alternatives |
| 297 | + |
| 298 | +RFC 5 - virtual structs |
| 299 | + |
| 300 | +RFC 11 - Alternative to virtual struct and functions by extending enums |
| 301 | + |
| 302 | +RFC 9 - RFC for "fat objects" for DSTs |
| 303 | + |
| 304 | +There's also a version of RFC 5 using macros etc. to add fewer language features. |
| 305 | + |
| 306 | + |
| 307 | +# Unresolved questions |
| 308 | + |
| 309 | +## Trait methods |
| 310 | + |
| 311 | +I think requiring indication of overridable and overriding methods is a good |
| 312 | +thing (both Java and C++ have keywords or annotations for this). However, we |
| 313 | +don't require them for methods in traits - should we? Or should we not require |
| 314 | +them for structs/enums for consistency? If we do want them for traits should |
| 315 | +they be in the trait or the impl? Trait seems to make more sense, but impl is |
| 316 | +what I propose here for structs/enums. I would like to have a consistent story |
| 317 | +here. |
| 318 | + |
| 319 | + |
| 320 | +## Remove tuple structs, nullary structs |
| 321 | + |
| 322 | +Unifying structs and enums and making variants first class makes tuple structs |
| 323 | +and nullary structs obsolete. They can be replaced by an enum with a single tuple |
| 324 | +variant or a single nullary variant, respectively. By combining with privacy |
| 325 | +annotations we might get a nice separation between interface and implementation. |
| 326 | +On the other hand it requires an extra name (maybe we should allow anonymous |
| 327 | +enums?) and a bit more syntax. One use case for tuple structs is new types. Not |
| 328 | +sure if the interface/implementation separation helps there or whether the extra |
| 329 | +`enum` keyword, name, and braces are just extra boilerplate. I think removing |
| 330 | +some language items would be nice. |
| 331 | + |
| 332 | + |
| 333 | +## Privacy |
| 334 | + |
| 335 | +I think all fields should be private by default on enums and structs, and |
| 336 | +variants should be public. We should allow `pub` and `priv` annotations to |
| 337 | +change these defaults. But we need to think about this a bit more deeply. |
| 338 | + |
| 339 | + |
| 340 | +## Destructors |
| 341 | + |
| 342 | +How should they work? I feel the C++ approach is too much of a foot gun. We |
| 343 | +should always be able to infer whether or not a destructor is virtual. Need to |
| 344 | +work out how exactly implementing the drop trait interacts with nested enums. We |
| 345 | +need to cope with the situation where a struct/enum object with static type T1 |
| 346 | +and dynamic type T2 goes out of scope and T2 implements `Drop` and T1 doesn't - |
| 347 | +we still need to call T2::drop (and then call the destructors of any types |
| 348 | +between T2 and T1). One solution could be that if a struct implements `Drop` |
| 349 | +then so must the outer struct/enum. Calling `drop` is then just a regular |
| 350 | +virtual call and is only necessary if the static type implements `Drop`. |
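
As a point of reference, today's trait objects already cover the case where the static type has no `Drop` impl but the dynamic type does. The sketch below illustrates that existing behaviour only; it says nothing about how the proposed nested items would do it. All names (`Base`, `OuterFields`, `Leaf`, `drop_through_base`) are hypothetical, and `OuterFields` stands in for the fields an outer item would contribute.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Shared log used to record the order in which destructors run.
type Log = Rc<RefCell<Vec<&'static str>>>;

// Plays the role of the static type T1; the trait itself knows nothing about `Drop`.
trait Base {}

// Stand-in for the fields contributed by an outer item.
struct OuterFields {
    log: Log,
}

impl Drop for OuterFields {
    fn drop(&mut self) {
        self.log.borrow_mut().push("OuterFields::drop");
    }
}

// Plays the role of the dynamic type T2.
struct Leaf {
    outer: OuterFields,
    log: Log,
}

impl Base for Leaf {}

impl Drop for Leaf {
    fn drop(&mut self) {
        self.log.borrow_mut().push("Leaf::drop");
    }
}

// Drop a `Leaf` through a `Box<dyn Base>` and report which destructors ran.
fn drop_through_base() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let _obj: Box<dyn Base> = Box::new(Leaf {
            outer: OuterFields { log: Rc::clone(&log) },
            log: Rc::clone(&log),
        });
        // `_obj` goes out of scope here; the destructor is found via the v-table.
    }
    let entries = log.borrow().clone();
    entries
}
```

Here the outermost `drop` runs first and the destructors of the contained fields run afterwards, which is the ordering asked for above (T2 first, then the types between T2 and T1).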
| 351 | + |
| 352 | +## Initialisers |
| 353 | + |
| 354 | +Need to think a bit about struct initialisers. We should require all fields to |
| 355 | +be specified. We should support constructors too. I'm not sure how we support |
| 356 | +'struct' initialisers for enums - which should not be instantiable. Since there |
| 357 | +is no kind of cross-module inheritance, perhaps it is not an issue since fields |
| 358 | +can always be accessed. |
| 359 | + |
| 360 | +## Calling overridden methods |
| 361 | + |
| 362 | +If a method is overridden, we should still be able to call it. C++ uses `::` |
| 363 | +syntax to allow this. In the example above we use `NSICompositor::bar(self)` to indicate |
| 364 | +static dispatch of an overridden method. I'm not sure if this is currently |
| 365 | +valid Rust or if it is the optimal solution. But it looks nice to me and we |
| 366 | +need something for such a situation. |
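
For trait methods, at least, the `Trait::method(self)` form is accepted by today's Rust and is statically dispatched. A small sketch under assumptions: `Shape`, `Square`, and `names` are hypothetical names, and the inherent method merely shadows the trait method for method-call syntax, which is only an approximation of a real override.

```rust
trait Shape {
    // Default body, standing in for the method that gets "overridden".
    fn name(&self) -> String {
        String::from("shape")
    }
}

struct Square;

// `Square` keeps the trait's default `name`.
impl Shape for Square {}

impl Square {
    // An inherent method with the same name takes priority for `sq.name()`.
    fn name(&self) -> String {
        // Path syntax selects the trait's version explicitly.
        format!("square, a kind of {}", Shape::name(self))
    }
}

// Returns the result of the shadowing method and of the explicit trait call.
fn names() -> (String, String) {
    let sq = Square;
    (sq.name(), Shape::name(&sq))
}
```

Whether the same syntax should select an overridden method on an outer struct/enum remains the open question.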
| 367 | + |
| 368 | +## Generics |
| 369 | + |
| 370 | +Not sure exactly how generics would work right now. I assume generics in outer |
| 371 | +items are available (and not overridable/shadowable) in inner items. All actual |
| 372 | +type parameters must be specified or inferred when an item is instantiated or |
| 373 | +used for a type (which is a little counter-intuitive). E.g., |
| 374 | + |
| 375 | +``` |
| 376 | +struct S1<X> { |
| 377 | + struct S2<Y> { |
| 378 | + ... |
| 379 | + } |
| 380 | +} |
| 381 | +``` |
| 382 | + |
| 383 | +When we use `S2` we would have to use `S2<T1, T2>`. Or perhaps we should say we |
| 384 | +require at least as many type variables in inner items as in outer ones, |
| 385 | +substitute them implicitly, and make outer type variables unavailable inside |
| 386 | +inner items (i.e., in the example above, `X` and `Y` are implicitly linked and |
| 387 | +`X` can't be used inside `S2`; we would use `S2<T>`). Or perhaps we should make |
| 388 | +the substitution explicit somehow (this would be my preferred solution, but I'm |
| 389 | +not sure how to express it). |
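
The first reading, where the inner item takes the outer parameters as well as its own, can be sketched in today's Rust with composition. This is a minimal illustration only: the field names and the `make` helper are hypothetical, and `S1Fields<X>` stands in for whatever fields `S1<X>` would declare.

```rust
// Fields that `S1<X>` would declare (the field name is hypothetical).
struct S1Fields<X> {
    outer: X,
}

// Under the first reading, `S2` lists the outer parameter `X` and its own `Y`.
struct S2<X, Y> {
    base: S1Fields<X>,
    inner: Y,
}

// Both type parameters are specified or inferred at the point of use.
fn make<X, Y>(outer: X, inner: Y) -> S2<X, Y> {
    S2 {
        base: S1Fields { outer },
        inner,
    }
}
```

A use site then writes, for example, `let v: S2<u8, &str> = make(1u8, "a");`, which corresponds to the `S2<T1, T2>` form.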