Commit ef2ff45

Unify and nest structs and enums
Another alternative to RFC rust-lang#5 and an extension/variant of RFC rust-lang#11. Unify enums and structs by allowing enums to have fields, and structs to have variants. Allow nested enums/structs. Virtual dispatch of methods on struct/enum pointers. Remove struct variants. Treat enum variants as first class. Possibly remove nullary structs and tuple structs.
1 parent a260073 commit ef2ff45

0000-enum-struct.md

Lines changed: 389 additions & 0 deletions
@@ -0,0 +1,389 @@
- Start Date: 2014-03-31
- RFC PR #:
- Rust Issue #:


# Summary

Unify enums and structs by allowing enums to have fields, and structs to have
variants. Allow nested enums/structs. Virtual dispatch of methods on struct/enum
pointers. Remove struct variants. Treat enum variants as first class. Possibly
remove nullary structs and tuple structs.

The motivation for this is to provide an alternative to Java-style single
inheritance. I.e., efficient sharing of fields, thin pointers, and virtual
method dispatch. Along the way we simplify the language by unifying two language
items and making obsolete a few more.

Despite being a fairly radical proposal, I believe this is mostly backwards
compatible.

# Motivation

The goal is to support efficient, heterogeneous data structures such as the DOM.
More precisely, we need a form of code sharing which satisfies the following
constraints:

* Cheap field access from internal methods;
* Cheap dynamic dispatch of methods;
* Cheap downcasting;
* Thin pointers;
* Sharing of fields and methods between definitions;
* Safe, i.e., doesn't require a bunch of transmutes or other unsafe code to be usable.

Example (Java-like pseudo-code):

```
class Element {
    Element parent, left-sibling, right-sibling;
    Element[] children;

    foo();

    template() {
        x = foo();
        ...
    }
}

class Element1 : Element {
    Data some-data;

    template() {
        return some-data;
    }
}

final class Element2 : Element {
    ...
}
```


# Detailed design

## Extend enums with fields

For example,

```
enum E {
    f1: T1,
    Var1(T5, T6),
    f2: T2
}
```

## Extend structs with variants

For example,

```
struct S {
    f3: T3,
    Var2(T5, T6),
    f4: T4
}
```

## I.e., unify structs and enums

With the above extensions, enums and structs are basically the same: they have
the same syntax (modulo the keyword), we would allow the same type
parameterisation, etc. The difference is that only structs can be instantiated
directly; an enum can only be instantiated via one of its variants (you could
think of enums as abstract structs). So we could have values of `Var1` and
`Var2` (but not `f1`, etc.), and `S`, but not `E`. When instantiating `S`, we
must specify values for fields `f3` and `f4`. Values of `Var1` have named fields
`f1` and `f2` and unnamed fields of types `T5` and `T6`; all must be specified
when instantiating `Var1`. (Questions: what should the syntax look like? How do
we specify constructors for the `E` part?)
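
For comparison, here is a minimal sketch of how "an enum with fields" is usually approximated in Rust as it exists today: a struct holding the shared fields next to an ordinary enum holding the variant payload. The names `Common`, `Kind` and the constructor `var1` are hypothetical, and the concrete types stand in for `T1`, `T2`, `T5` and `T6`. Unlike the proposal, this `E` can be instantiated directly.

```rust
// Fields shared by every variant (the `f1`/`f2` of the hypothetical `E`).
#[derive(Debug, Clone, PartialEq)]
struct Common {
    f1: u32,
    f2: String,
}

// The variant-specific payload (the unnamed `T5`, `T6` of `Var1`).
// Further variants would be added here.
#[derive(Debug, Clone, PartialEq)]
enum Kind {
    Var1(i64, bool),
}

// Today's closest equivalent of "an enum with fields".
#[derive(Debug, Clone, PartialEq)]
struct E {
    common: Common,
    kind: Kind,
}

impl E {
    // Hypothetical constructor: every shared and variant field must be given,
    // mirroring the rule stated for instantiating `Var1`.
    fn var1(f1: u32, f2: &str, a: i64, b: bool) -> E {
        E {
            common: Common { f1, f2: f2.to_string() },
            kind: Kind::Var1(a, b),
        }
    }
}
```

The cost of this encoding is an extra level of field access (`e.common.f1`) and a separate match on `e.kind`, which is part of what the unification aims to remove.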

## Allow nested enums/structs

For example,

```
enum E1 {
    enum E2 {
        ...
    }
    struct S1 {
        enum E3 { ... }
        struct S2 { ... }
    }
}
```

Nesting does not introduce a scope, so from the scope in which `E1` is declared,
we can refer to `E1`, `E2`, `S1`, `E3`, and `S2` (modulo privacy, see open
questions). Nested items inherit fields from outer items. So, `S2` would inherit
fields declared in `E1` and `S1`.
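
To make the field-inheritance rule concrete, the following sketch spells out by hand, in today's Rust, what `S2` would contain under the proposal. The field names (`id`, `name`, `weight`) and the wrapper `E1Fields` are assumptions for illustration only; the proposal itself would generate this layout implicitly.

```rust
// Fields declared directly in the hypothetical `E1`.
struct E1Fields {
    id: u32,
}

// `S1` is nested in `E1`, so it carries `E1`'s fields plus its own.
struct S1 {
    outer: E1Fields,
    name: String,
}

// `S2` is nested in `S1`, so it carries `S1`'s (and thus `E1`'s) fields.
struct S2 {
    outer: S1,
    weight: f64,
}

impl S2 {
    fn new(id: u32, name: &str, weight: f64) -> S2 {
        S2 {
            outer: S1 { outer: E1Fields { id }, name: name.to_string() },
            weight,
        }
    }

    // Accessors hide the nesting that the proposal would make implicit.
    fn id(&self) -> u32 {
        self.outer.outer.id
    }

    fn name(&self) -> &str {
        &self.outer.name
    }
}
```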


## Treat variants as 'first class'

As well as instantiating variants, we allow the use of variants (whether
structs, enums, tuples, or nullary) as types and allow impls for them. In
combination with nested enums this is a partial replacement for 'refinement'
types (that is, specifying a type on a subset of the variants of an enum).
However, this is not the main motivation. The idea is that a variant (probably a
struct variant) will replace a base class in a class hierarchy; an enum would
replace an abstract base class and a struct would replace a non-abstract base
class or leaf (concrete) class. Making variants first class makes it possible to
refer to enum/struct objects other than the top level by type, and to provide
methods for them in impls.


## Virtual dispatch of methods for struct/enum objects

We allow methods in impls for struct/enum objects (that is, references to
struct/enum types) to be marked as `virtual` (allows overriding) and/or
`override` (overrides a method). Methods declared on outer items are inherited
by nested items. E.g., from the example above, a method declared on `E1` will
be inherited by `E2` and `S2` (and others). If a method is declared `virtual`,
then impls for nested items may override that method. A method must be marked
`override` if and only if it overrides a method declared in an outer item.
Methods for enums may be declared without a body (like pure abstract/virtual
methods in Java/C++ or required methods on traits). These must be overridden by
any non-enum nested items. (Question: should we extend this to structs, i.e.,
allow pure virtual methods for structs, track these, and not allow
instantiation of such structs?)
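
The closest existing analogue of these rules is a trait: a method without a body behaves like a pure virtual method, and a method with a default body behaves like a `virtual` method that impls may replace. The sketch below uses today's trait syntax, not the proposed `virtual`/`override` keywords, and the names `Node`, `A`, `B` and `describe_all` are hypothetical.

```rust
trait Node {
    // No body: plays the role of a pure virtual method.
    fn foo(&self) -> u32;

    // Default body: plays the role of a `virtual` method that may be overridden.
    fn describe(&self) -> String {
        format!("node({})", self.foo())
    }
}

struct A {
    x: u32,
}

struct B {
    x: u32,
    y: u32,
}

impl Node for A {
    // Only the required method is supplied; `describe` is inherited.
    fn foo(&self) -> u32 {
        self.x
    }
}

impl Node for B {
    fn foo(&self) -> u32 {
        self.x + self.y
    }

    // Replaces the default body, as an `override` method would.
    fn describe(&self) -> String {
        format!("b({})", self.foo())
    }
}

// Dynamic dispatch through trait objects selects the right body at run time.
fn describe_all(nodes: &[Box<dyn Node>]) -> Vec<String> {
    nodes.iter().map(|n| n.describe()).collect()
}
```

Note that traits share methods but not fields, and trait objects are fat pointers, which is why the proposal does not simply rely on them.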


## V-tables, thin pointers, and down-casting

Struct/enum objects are referred to using thin pointers. Virtual dispatch is
implemented using Java-style (or C++ with virtual single inheritance and without
multiple inheritance) v-tables. That is, `&S1` or `~S1` is implemented as a
pointer to a structure consisting of a pointer into a v-table (which identifies
the dynamic type) and values for all fields of the dynamic type. Method calls
are dispatched via the v-table. Since we identify the dynamic type, we can allow
safe dynamic downcasting. This can be done with a match expression; continuing
the example above:

```
fn f(x: &E1) {
    match x {
        y @ &S2 {..} => { ... } // y is effectively a downcast of x to S2
        y @ &S1 {..} => { ... } // y is effectively a downcast of x to S1
        _ => { ... }            // x isn't an instance of S1 or S2
    }
}
```

We would allow the usual pattern matching too.

Question: it might be handy to allow skipping the `{..}` for structs; then
again, hopefully downcasting won't be commonly used, so maybe we don't need to.
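
As a point of reference for the "thin pointer" requirement, the sketch below measures pointer sizes in today's Rust and shows the existing checked downcast via `std::any::Any`. It illustrates the status quo, not the proposed implementation; `Shape`, `Square` and the helper functions are hypothetical names, and the two-word size of a trait-object reference reflects current compiler behaviour rather than a documented guarantee.

```rust
use std::any::Any;
use std::mem::size_of;

trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

// A reference to a sized type is a single machine word (thin).
fn thin_size() -> usize {
    size_of::<&Square>()
}

// A trait-object reference carries a data pointer plus a v-table pointer (fat).
fn fat_size() -> usize {
    size_of::<&dyn Shape>()
}

// Checked downcast: yields the side length only if `x` really is a `Square`.
fn side_if_square(x: &dyn Any) -> Option<f64> {
    x.downcast_ref::<Square>().map(|s| s.0)
}
```

Under the proposal the v-table pointer would live inside the object instead of inside the reference, so `&E1` would stay one word wide while still supporting dispatch and downcasting.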


## Remove struct variants

Unification of structs and enums makes struct variants obsolete. For example,

```
enum E {
    Variant1{f: T}
}
```

can be written as

```
enum E {
    struct Variant1{f: T}
}
```

Therefore, we may as well remove struct variants (they are currently
feature-gated).


## Coercions (subtyping)

Nesting of enums/structs should give (probably implicit) coercions of
references. E.g., (again, from the above example), `&S2` <: `&S1` <: `&E1`.
There is no subtyping between values, to avoid the slicing problem. (Is this
right? I think we get into problems when virtual dispatch is expected but
cannot be performed safely, but I need to think more about this.)

We should forbid dereference of pointers to non-leaf items. This is not
backwards compatible, since references to non-nested enums (as currently present
in the language) can be dereferenced today. We could safely allow dereference
inside a match expression (as in the downcast example, above) and hopefully that
covers most of the current use cases. This would need a bit of investigation.
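
Today's Rust has one mechanism that produces a similar chain of implicit reference coercions: `Deref` coercion. The sketch below uses it to imitate `&S2` <: `&S1` <: `&E1`. This is only an approximation under assumed field names (`base`, `id`, `name`, `weight`); using `Deref` to emulate inheritance is generally discouraged, and it provides neither virtual dispatch nor downcasting.

```rust
use std::ops::Deref;

struct E1 {
    id: u32,
}

struct S1 {
    base: E1,
    name: &'static str,
}

struct S2 {
    base: S1,
    weight: u32,
}

// `&S1` coerces to `&E1`.
impl Deref for S1 {
    type Target = E1;
    fn deref(&self) -> &E1 {
        &self.base
    }
}

// `&S2` coerces to `&S1`, and transitively to `&E1`.
impl Deref for S2 {
    type Target = S1;
    fn deref(&self) -> &S1 {
        &self.base
    }
}

fn id_of(e: &E1) -> u32 {
    e.id
}

fn name_of(s: &S1) -> &'static str {
    s.name
}
```

Only references are coerced here; a value of type `S2` is never converted into a value of type `S1`, which is the same restriction the section proposes in order to avoid slicing.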


# Example

The first example in Java-ish syntax would be written as:

```
enum Element {
    parent: RC<Element>,
    children: ~[RC<Element>],
    left: RC<Element>,
    right: RC<Element>,

    struct Element1 {
        x: uint,
        y: uint,
    },

    struct Element2 {
        x: uint,
        y: uint,
    }
}

impl Element {
    virtual fn foo(&self) -> uint;

    fn template(&self) {
        let x = self.foo();
        ...
    }
}

impl Element1 {
    override fn foo(&self) -> uint { self.x + self.y }
}

impl Element2 {
    override fn foo(&self) -> uint { self.x + self.y }
}
```

None of this prevents the usual use of traits and impls, which hopefully are an
alternative to multiple inheritance. For example, `nsIConstraintValidation` is a
mixin class in the Gecko DOM implementation. It could be implemented in Rust
as something like:

```
impl Element {
    virtual fn bar(&self) -> uint;
}

trait NSICompositor {
    fn x(&self) -> uint;
    fn y(&self) -> uint;
    fn bar(&self) -> uint { self.x() + self.y() }
}

impl NSICompositor for Element1 {
    fn x(&self) -> uint { self.x }
    fn y(&self) -> uint { self.y }
}

impl Element1 {
    override fn bar(&self) -> uint { NSICompositor::bar(self) }
}

impl NSICompositor for Element2 {
    fn x(&self) -> uint { self.x }
    fn y(&self) -> uint { self.y }
}

impl Element2 {
    override fn bar(&self) -> uint { NSICompositor::bar(self) }
}
```


# Alternatives

RFC 5 - virtual structs

RFC 11 - Alternative to virtual struct and functions by extending enums

RFC 9 - RFC for "fat objects" for DSTs

There's also a version of RFC 5 using macros etc. to add fewer language features.


# Unresolved questions

## Trait methods

I think requiring indication of overridable and overriding methods is a good
thing (both Java and C++ have keywords or annotations for this). However, we
don't require them for methods in traits - should we? Or should we not require
them for structs/enums for consistency? If we do want them for traits should
they be in the trait or the impl? Trait seems to make more sense, but impl is
what I propose here for structs/enums. I would like to have a consistent story
here.


## Remove tuple structs, nullary structs

Unifying structs and enums and making variants first class makes tuple structs
and empty structs obsolete. They can be replaced by an enum with a single tuple
variant or a single nullary variant, respectively. By combining with privacy
annotations we might get a nice separation between interface and implementation.
On the other hand it requires an extra name (maybe we should allow anonymous
enums?) and a bit more syntax. One use case for tuple structs is newtypes. Not
sure if the interface/implementation separation helps there or whether the extra
`enum` keyword, name, and braces are just extra boilerplate. I think removing
some language items would be nice.
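
The replacement described above can already be written in today's Rust, which makes the boilerplate cost easy to see. In this sketch `Meters`, `Metres`, `Marker` and `MarkerEnum` are hypothetical names chosen so that the struct and enum forms can sit side by side.

```rust
// A newtype written as a tuple struct.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(f64);

// The same newtype written as an enum with a single tuple variant.
// Note the extra name and braces.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Metres {
    Metres(f64),
}

impl Metres {
    fn value(self) -> f64 {
        // A single-variant enum can be destructured with an irrefutable `let`.
        let Metres::Metres(v) = self;
        v
    }
}

// A nullary struct and its single-nullary-variant enum counterpart.
#[derive(Debug, PartialEq)]
struct Marker;

#[derive(Debug, PartialEq)]
enum MarkerEnum {
    MarkerEnum,
}
```

Field access is `m.0` for the tuple struct but needs a pattern (or a helper such as `value`) for the enum, which is one more argument for treating the removal as optional.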


## Privacy

I think all fields should be private by default on enums and structs, and
variants should be public. We should allow `pub` and `priv` annotations to
change these defaults. But we need to think about this a bit more deeply.
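
For orientation, these defaults largely match what Rust does today for the existing item kinds: struct fields are private unless marked `pub`, while the variants of a `pub` enum (and their fields) are always public. The sketch below shows that behaviour; the module `shapes` and its contents are hypothetical.

```rust
mod shapes {
    pub struct Circle {
        radius: f64, // private by default: not accessible outside `shapes`
    }

    impl Circle {
        pub fn new(radius: f64) -> Circle {
            Circle { radius }
        }

        // Public accessor for the private field.
        pub fn radius(&self) -> f64 {
            self.radius
        }
    }

    pub enum Shape {
        Dot,       // variants of a `pub` enum are public
        Ring(f64), // and so are their fields
    }
}

// Code outside the module can match on variants and read their fields.
fn ring_width(s: &shapes::Shape) -> f64 {
    match s {
        shapes::Shape::Dot => 0.0,
        shapes::Shape::Ring(w) => *w,
    }
}
```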


## Destructors

How should they work? I feel the C++ approach is too much of a footgun. We
should always be able to infer whether or not a destructor is virtual. We need
to work out exactly how implementing the `Drop` trait interacts with nested
enums. We need to cope with the situation where a struct/enum object with static
type T1 and dynamic type T2 goes out of scope and T2 implements `Drop` and T1
doesn't: we still need to call `T2::drop` (and then call the destructors of any
types between T2 and T1). One solution could be that if a struct implements
`Drop` then so must the outer struct/enum. Calling `drop` is then just a regular
virtual call and is only necessary if the static type implements `Drop`.
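
The "most-derived first, then outwards" order asked for above is the order today's Rust already uses for composition: a value's own `Drop::drop` runs first, then its fields are dropped. The sketch below records that order, assuming the inner item is modelled as a struct that embeds the outer one; `Base`, `Derived`, `Log` and `drop_order` are hypothetical names.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Shared log so that both destructors can record when they run.
type Log = Rc<RefCell<Vec<&'static str>>>;

struct Base {
    log: Log,
}

impl Drop for Base {
    fn drop(&mut self) {
        self.log.borrow_mut().push("Base");
    }
}

// Stands in for an inner item: it embeds the outer item's state.
struct Derived {
    log: Log,
    _base: Base,
}

impl Drop for Derived {
    fn drop(&mut self) {
        self.log.borrow_mut().push("Derived");
    }
}

// Builds a `Derived`, lets it go out of scope, and returns the recorded order.
fn drop_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let _d = Derived { log: log.clone(), _base: Base { log: log.clone() } };
    } // `_d` is dropped here: `Derived::drop` first, then its fields
    let order = log.borrow().clone();
    order
}
```

What this does not cover is the case the section worries about, where the static type is the outer item: with plain composition there is no way to reach `Derived::drop` through a `&Base`, which is why a virtual `drop` is being considered.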

## Initialisers

Need to think a bit about struct initialisers. We should require all fields to
be specified. We should support constructors too. I'm not sure how we support
'struct' initialisers for enums - which should not be instantiable. Since there
is no kind of cross-module inheritance, perhaps it is not an issue since fields
can always be accessed.

## Calling overridden methods

If a method is overridden, we should still be able to call it. C++ uses `::`
syntax to allow this. In the example above we use `NSICompositor::bar(self)` to
indicate static dispatch of an overridden method. I'm not sure if this is
currently valid Rust or if it is the optimal solution. But it looks nice to me
and we need something for such a situation.
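
In Rust as it exists today, the path form `Trait::method(receiver)` is valid and statically selects the trait's method even when the type has an inherent method of the same name. The sketch below demonstrates this with a hypothetical trait `Compositor` (standing in for `NSICompositor`) and an inherent `bar` that plays the role of the overriding method.

```rust
trait Compositor {
    fn x(&self) -> u32;
    fn y(&self) -> u32;

    // Default body: the method that gets "overridden" below.
    fn bar(&self) -> u32 {
        self.x() + self.y()
    }
}

struct Element1 {
    x: u32,
    y: u32,
}

impl Compositor for Element1 {
    fn x(&self) -> u32 {
        self.x
    }

    fn y(&self) -> u32 {
        self.y
    }
}

impl Element1 {
    // Inherent method with the same name; `e.bar()` resolves to this one.
    fn bar(&self) -> u32 {
        // Path syntax bypasses the inherent method and calls the trait's body.
        10 * Compositor::bar(self)
    }
}
```

With `e = Element1 { x: 1, y: 2 }`, the call `e.bar()` goes to the inherent method and `Compositor::bar(&e)` goes to the trait's default body.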

## Generics

Not sure exactly how generics would work right now. I assume generics in outer
items are available (and not overridable/shadowable) in inner items. All actual
type parameters must be specified or inferred when an item is instantiated or
used for a type (which is a little counter-intuitive). E.g.,

```
struct S1<X> {
    struct S2<Y> {
        ...
    }
}
```

When we use `S2` we would have to use `S2<T1, T2>`. Or perhaps we should say we
require at least as many type variables in inner items as outer, implicitly
substitute them, and make outer type variables unavailable inside inner items
(i.e., in the example above, `X` and `Y` are implicitly linked, `X` can't be
used inside `S2`, and we would use `S2<T>`). Or perhaps we should make the
substitution explicit somehow (this would be my preferred solution, but I'm not
sure how to express it).
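
The first option (`S2<T1, T2>`) corresponds to what one would write by hand in today's Rust, where the inner item must repeat the outer item's type parameter in order to embed it. This is a sketch under assumed field names (`outer`, `outer_value`, `inner_value`), not the proposed syntax.

```rust
// Outer item, generic over `X`.
struct S1<X> {
    outer_value: X,
}

// Inner item: carries `S1<X>`'s fields, so it needs `X` as well as its own `Y`.
struct S2<X, Y> {
    outer: S1<X>,
    inner_value: Y,
}

// Both type parameters are inferred from the arguments at the call site.
fn make_s2<X, Y>(outer_value: X, inner_value: Y) -> S2<X, Y> {
    S2 { outer: S1 { outer_value }, inner_value }
}
```

A use site then names both parameters, for example `S2<u8, &str>`, which is the slightly counter-intuitive spelling the section mentions.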
