Add generalized arity tuples #2702

Closed
277 changes: 277 additions & 0 deletions text/0000-generalized-arity-tuples.md
@@ -0,0 +1,277 @@
- Feature Name: `generalized_arity_tuples`
- Start Date: 2019-05-22
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

Currently, it is not possible to write functions that generalize over tuples of arbitrary arity. Rust could support this feature if tuples had an alternative type-level representation. This RFC proposes a simple and straightforward solution that does not break any existing code and does not change the syntax or the way Rust reasons about types.

# Motivation
[motivation]: #motivation

Many crucial API functions can intuitively be generalized over tuples of any arity. Examples of such functions are spread over the entire ecosystem:

- `core::cmp::PartialEq::eq`
- `core::iter::Iterator::zip`
- `futures::future::join{,3,4,5}`
- `serde::Serialize::serialize`
- `specs::join::Join::join`
- etc.

Unfortunately, it is not possible to express the generalization strategy in Rust's type system. Instead, a common practice is to generalize code using the macro system. This has two major drawbacks:

- The code is not really general since it can only support a limited number of arities. This is the same restriction as if it had been written down by hand. To make things worse, each library has its own understanding of what is considered a good limit.
Contributor

For trait implementations, I think the typical number is around 10-12; do you really need more? -- please expand on this. :)

Functions like zip seem to be a different matter however.

- A lot of `fn`s or `impl`s are created and sent to tools like `racer` or `cargo doc`. As a result, these tools yield too many items and, hence, obfuscate the generalizing nature of the code.

Despite everything, it is possible to _emulate_ generalized arity tuples in Rust _right now_ by using recursive types. If the compiler were to create those types automatically for each tuple, it would be straightforward to implement, for example, the following generalized function:

- `futures::future::join` which consumes a tuple of any arity `(Future<Output=A>, Future<Output=B>, ..., Future<Output=Z>)` and returns a `Future<Output=(A, B, ..., Z)>` whose output tuple has the same arity

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

The following guide illustrates how to implement a `join` function that consumes a nested tuple of `Option`s and returns `Some` with all values unwrapped if possible and `None` otherwise, such that:

- `(Option<i32>, (Option<bool>, Option<&str>))` is mapped to `Option<(i32, (bool, &str))>`
- `(Some(99), (Some(true), Some("text"))).join()` evaluates to `Some((99, (true, "text")))`
- `(Some(99), (None::<bool>, Some("text"))).join()` evaluates to `None`

In order to generalize over the tuples, we first need to define an appropriate abstraction for the `join` function:

```rust
trait Join {
    type Joined;

    fn join(self) -> Option<Self::Joined>;
}
```

Next, we trivially implement `join` on the `Option` type:

```rust
impl<T> Join for Option<T> {
    type Joined = T;

    fn join(self) -> Option<Self::Joined> { self }
}
```

And on the empty tuple:

```rust
impl Join for () {
    type Joined = ();

    fn join(self) -> Option<Self::Joined> { Some(()) }
}
```

The only step left to make the `join` function work on any tuple of `Option`s is to implement a recursion:

```rust
impl<ELEM: Join, TAIL: Join> Join for Tuple<ELEM, TAIL> {
    type Joined = Tuple<ELEM::Joined, TAIL::Joined>;

    fn join(self) -> Option<Self::Joined> {
        if let (Some(elem), Some(tail)) = (self.elem.join(), self.tail.join()) {
            Some(Tuple::new(elem, tail))
        } else {
            None
        }
    }
}
```

Note that `Tuple<ELEM, TAIL>` is a desugared representation of the tuple `(ELEM, TAIL.0, ..., TAIL.n-1)` with arity `n + 1` where `TAIL` is a tuple with arity `n`.

## Why are we already done here?

Consider the type `(Option<i32>, (Option<bool>, Option<&str>))` from the requirements above. This type is just syntactic sugar for the type `Tuple<Option<i32>, Tuple<Tuple<Option<bool>, Tuple<Option<&str>, ()>>, ()>>`.

Now, apply the provided trait implementations step by step on the desugared type. The resulting associated type `Join::Joined` evaluates to `Tuple<i32, Tuple<Tuple<bool, Tuple<&str, ()>>, ()>>`. But this is just the desugared version of `(i32, (bool, &str))`.

This illustrates how the mechanism works on the type level.
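
The same walk-through can be tried on today's compiler by writing the desugared form by hand. The following is only a sketch under that assumption: `Tuple` is an ordinary user-defined struct standing in for the proposed compiler-provided type, and the nested values are spelled out manually because the `(a, b, ...)` sugar does not map to it.

```rust
/// Hand-written stand-in for the proposed `Tuple<ELEM, TAIL>` struct.
#[derive(Debug, PartialEq)]
struct Tuple<ELEM, TAIL> {
    elem: ELEM,
    tail: TAIL,
}

impl<ELEM, TAIL> Tuple<ELEM, TAIL> {
    fn new(elem: ELEM, tail: TAIL) -> Self {
        Tuple { elem, tail }
    }
}

trait Join {
    type Joined;

    fn join(self) -> Option<Self::Joined>;
}

impl<T> Join for Option<T> {
    type Joined = T;

    fn join(self) -> Option<Self::Joined> { self }
}

impl Join for () {
    type Joined = ();

    fn join(self) -> Option<Self::Joined> { Some(()) }
}

impl<ELEM: Join, TAIL: Join> Join for Tuple<ELEM, TAIL> {
    type Joined = Tuple<ELEM::Joined, TAIL::Joined>;

    fn join(self) -> Option<Self::Joined> {
        match (self.elem.join(), self.tail.join()) {
            (Some(elem), Some(tail)) => Some(Tuple::new(elem, tail)),
            _ => None,
        }
    }
}

fn main() {
    // Desugared form of (Some(99), (Some(true), Some("text"))).
    let nested = Tuple::new(
        Some(99),
        Tuple::new(Tuple::new(Some(true), Tuple::new(Some("text"), ())), ()),
    );
    // Desugared form of (99, (true, "text")).
    let expected = Tuple::new(99, Tuple::new(Tuple::new(true, Tuple::new("text", ())), ()));
    assert_eq!(nested.join(), Some(expected));

    // Desugared form of (Some(99), (None::<bool>, Some("text"))).
    let with_none = Tuple::new(
        Some(99),
        Tuple::new(Tuple::new(None::<bool>, Tuple::new(Some("text"), ())), ()),
    );
    assert_eq!(with_none.join(), None);
}
```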

## More examples / Advanced type mappings

Generalized implementations on `Tuple<ELEM, TAIL>` are not restricted to mappings between tuples of the same shape. The following example demonstrates how a `last` function can be realized that works with any tuple:

```rust
trait Last {
    type Last;

    fn last(self) -> Self::Last;
}

impl<ELEM> Last for (ELEM,) {
    type Last = ELEM;

    fn last(self) -> Self::Last {
        self.elem
    }
}

impl<ELEM, TAIL: Last> Last for Tuple<ELEM, TAIL> {
    type Last = TAIL::Last;

    fn last(self) -> Self::Last {
        self.tail.last()
    }
}
```

Correctly, the compiler rejects empty tuples while tuples of other sizes are accepted:

```rust
().last(); // does not compile: no method named `last` found for type `()` in the current scope
(1,).last(); // returns 1
(1, "two").last(); // returns "two"
// etc.
```
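
For experimentation on a current compiler, `Last` can be emulated with a hand-written `Tuple` struct. This is a sketch, not the RFC's definition: the recursive case below is written over `Tuple<ELEM, Tuple<NEXT, TAIL>>` (the `NEXT` parameter is an addition of this sketch) so that the two impls cannot overlap under today's coherence rules.

```rust
/// Hand-written stand-in for the proposed `Tuple<ELEM, TAIL>` struct.
struct Tuple<ELEM, TAIL> {
    elem: ELEM,
    tail: TAIL,
}

trait Last {
    type Last;

    fn last(self) -> Self::Last;
}

// Base case: the desugared one-element tuple `(ELEM,)`.
impl<ELEM> Last for Tuple<ELEM, ()> {
    type Last = ELEM;

    fn last(self) -> Self::Last {
        self.elem
    }
}

// Recursive case: at least two elements, so skip the head.
impl<ELEM, NEXT, TAIL> Last for Tuple<ELEM, Tuple<NEXT, TAIL>>
where
    Tuple<NEXT, TAIL>: Last,
{
    type Last = <Tuple<NEXT, TAIL> as Last>::Last;

    fn last(self) -> Self::Last {
        self.tail.last()
    }
}

fn main() {
    // Desugared form of (1,).
    let one = Tuple { elem: 1, tail: () };
    assert_eq!(one.last(), 1);

    // Desugared form of (1, "two").
    let two = Tuple { elem: 1, tail: Tuple { elem: "two", tail: () } };
    assert_eq!(two.last(), "two");

    // `().last()` is rejected at compile time because `()` has no `Last` impl.
}
```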

The final example demonstrates how every second element of a tuple can be removed:

```rust
trait Halve {
    type Output;

    fn halve(self) -> Self::Output;
}

impl Halve for () {
    type Output = ();

    fn halve(self) {}
}

impl<ELEM1, ELEM2, TAIL: Halve> Halve for Tuple<ELEM1, Tuple<ELEM2, TAIL>> {
    type Output = Tuple<ELEM1, TAIL::Output>;

    fn halve(self) -> Self::Output {
        // Keep the first element, drop the second, and recurse into the rest.
        Tuple::new(self.elem, self.tail.tail.halve())
    }
}
```

Results:

```rust
().halve(); // returns ()
(1,).halve(); // does not compile: no method named `halve` found for type `(i32,)` in the current scope
(1, "two").halve(); // returns (1,)
(1, "two", 3.0, '4').halve(); // returns (1, 3.0)
// etc.
```
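
The listed results can be checked with a hand-written `Tuple` struct on a current compiler. In this sketch (an emulation, not the proposed language feature) the impl recurses into the remaining tail, which is what makes the four-element case come out as `(1, 3.0)`.

```rust
/// Hand-written stand-in for the proposed `Tuple<ELEM, TAIL>` struct.
#[derive(Debug, PartialEq)]
struct Tuple<ELEM, TAIL> {
    elem: ELEM,
    tail: TAIL,
}

trait Halve {
    type Output;

    fn halve(self) -> Self::Output;
}

impl Halve for () {
    type Output = ();

    fn halve(self) -> Self::Output {}
}

impl<ELEM1, ELEM2, TAIL: Halve> Halve for Tuple<ELEM1, Tuple<ELEM2, TAIL>> {
    type Output = Tuple<ELEM1, TAIL::Output>;

    fn halve(self) -> Self::Output {
        // Keep the first element, drop the second, halve whatever remains.
        Tuple { elem: self.elem, tail: self.tail.tail.halve() }
    }
}

fn main() {
    // Desugared form of (1, "two", 3.0, '4').
    let four = Tuple {
        elem: 1,
        tail: Tuple {
            elem: "two",
            tail: Tuple { elem: 3.0, tail: Tuple { elem: '4', tail: () } },
        },
    };
    // Desugared form of (1, 3.0).
    let expected = Tuple { elem: 1, tail: Tuple { elem: 3.0, tail: () } };
    assert_eq!(four.halve(), expected);
}
```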

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

The idea of this RFC is to leave the tuple notation as is but to treat tuple type expressions as aliases according to the following pattern:

```rust
type () = (); // Not really an alias. Written down for completeness.
type (A,) = Tuple<A, ()>;
type (A, B) = Tuple<A, Tuple<B, ()>>;
Member

Unfortunately this desugaring is incompatible with the current unsizing rule.

  1. We need to allow ([u8],).
  2. We also need to allow (u8, [u8]).

If (A,) is desugared as Tuple<A, ()>, this means ELEM must be relaxed as ELEM: ?Sized.

If (A, B) is desugared as Tuple<A, (B,)>, this means TAIL must be relaxed as TAIL: ?Sized.

But we cannot have two unsized fields in a structure (struct Tuple<E: ?Sized, T: ?Sized> { head: E, tail: T }).

Therefore, the tuple desugaring must terminate at (A,) and cannot be further desugared to Tuple<A, ()>.

Alternatively, you could reverse the expansion direction, so that only the final field needs to be unsized.

struct Tuple<Init, Last: ?Sized> {
    init: Init,
    last: Last,
}

type () = ();
type (A,) = Tuple<(), A>;
type (A, B) = Tuple<Tuple<(), A>, B>;
type (A, B, C) = Tuple<Tuple<Tuple<(), A>, B>, C>;

Author

Good point! Easy solution 👍

type (A, B, C) = Tuple<A, Tuple<B, Tuple<C, ()>>>;
type (A, (B, C)) = Tuple<A, Tuple<Tuple<B, Tuple<C, ()>>, ()>>;
// etc.
```

where `Tuple` is a new struct located in `std::ops` with the following definition:

```rust
struct Tuple<ELEM, TAIL> {
Contributor

@Centril Centril May 23, 2019

So this is essentially a HList: https://docs.rs/frunk/0.3.0/frunk/hlist/index.html.

  • The good aspect of this is that you are essentially removing ty::TyKind::Tuple. The rules about unsized types in the last element should just fall out from structs. Overall, this is a substantial reduction in the amount of structural typing in the type system, which is a good thing. Instead, Tuple<H, T> is nominally typed and (T0, ..., Tn) is just sugar. You may want to extend the rationale with a note about the benefits of this simplification.

    • Also please make a note of Tuple becoming a #[lang = "tuple"] item by attaching the attribute here.
  • On the other hand, this also means that the compiler is no longer free to apply layout optimizations where fields are reordered. E.g. today, the compiler is free to lay (A, B, C, D) out as A D C B. After introducing struct Tuple<H, T>, the compiler can no longer do that because it is now possible to take a reference to tup.tail.

Contributor

@Centril Centril May 23, 2019

In a discussion with @oli-obk, they noted that #[non_referenceable] pub tail: Tail poses a problem with respect to implementing PartialEq and similar traits when you want to do it recursively and generally for all tuples.

Contributor

More details:

impl PartialEq for () {
    fn eq(&self, other: &Self) -> bool {
        true
    }
}
impl<ELEM: PartialEq, TAIL: PartialEq> PartialEq for Tuple<ELEM, TAIL> {
    fn eq(&self, other: &Self) -> bool {
        self.elem == other.elem && self.tail == other.tail
    }
}

The self.tail == other.tail is essentially PartialEq::eq(&self.tail, &other.tail), which would violate #[non_referenceable].

Author

@Woyten Woyten May 23, 2019

If I get this right, the compiler still has the flexibility to lay out Tuple<ELEM, TAIL> as either ELEM TAIL or TAIL ELEM. So (A, B, C, D) could become A B C D or B C D A or A C D B but not B A C D.

But, indeed, this is a hard restriction which might increase the memory footprint of every tuple.

This problem could be mitigated if the tuple representation was changed to a tree structure, e.g. Tuple<ELEM, LEFT, RIGHT>. In this way, the compiler could regain some control about the memory layout. In return, the compiler would need to match Tuple<Elem, Tail, ()> with Tuple<Elem, (), Tail> or wouldn't it? My first feeling is that this solution is bad just because it is not simple enough.

Contributor

@Ixrec Ixrec May 23, 2019

Maybe I misunderstood something in previous discussions, but it seemed like we already knew there's a fundamental choice we have to make here between:

  • the type lists used by variadic generics are identical to tuple types, and can compile away to nothing because tuples "don't exist at runtime" in a sense that rules out sub-tuples having addresses and being efficiently borrowable and so on
  • the type lists used by variadic generics are distinct from tuple types, so there is a certain amount of unfortunate duplication going on, but we get to make guarantees about tuple layout/addressability/etc

And any future variadic generics / generic tuples proposal would simply have to pick one of these and argue convincingly for it, but making up our minds on this was the main thing blocking progress on variadics. Is that no longer the case?

Contributor

Well there's a third option:

  • Go ahead with this proposal either with #[non_referenceable] or without, and then never add "variadic generics".

Author

In my understanding, solving the variadic tuple problem is equivalent to solving the variadic generic problem. I would even go so far as to say that you do not need variadic generics if you have variadic tuples.

@Ixrec I am not aware that the variadic generic problem has a final solution yet, has it?

Contributor

Yeah, it's definitely not solved yet. The whole point of my last comment was that any proposal for variadic tuples effectively is a proposal to solve the variadic generics problem. In other words, even if the proposed solution is simply that variadic tuples are enough, this needs to be made explicit, and then it has to be argued that "full" variadic generics are unnecessary or not worth it (afaik no one's made that argument before; maybe I could be convinced).

Author

I believe that variadic generics might not be strictly necessary since you could model every type Generic<A, ...> with variadic generics simply as Generic<T> where T: TraitThatIsValidForAllTuples. The transition to a real generic notation could be done using a syntactic sugar approach.

pub elem: ELEM,
pub tail: TAIL,
}
```
Contributor

@gnzlbg gnzlbg May 29, 2019

Is this compatible with the layout specified in the unsafe code guidelines?

an anonymous tuple type (T1..Tn) of arity N is laid out "as if" there were a corresponding tuple struct declared in libcore:

#[repr(Rust)]
struct TupleN<P1..Pn:?Sized>(P1..Pn);

Note that this specifies that the layout of a tuple does not match that of the struct this RFC proposes, but of a tuple-struct.

This allows the compiler to perform some optimizations, like field re-ordering, but this RFC does not mention anything about this particular trade-off. I'd recommend scanning the UCGs repo for trade-offs and extending the RFC with how would this change affect those.


This is everything needed to make the `Join` or `Last` traits from the guide section above work for tuples of any arity.
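
Until such compiler support exists, the alias pattern can be approximated with macros. The sketch below is an assumption-laden emulation: `tuple!` and `TupleTy!` are hypothetical helper macros invented for this illustration (they are not part of the RFC), and `Tuple` is an ordinary user-defined struct.

```rust
/// Hand-written stand-in for the proposed `Tuple<ELEM, TAIL>` struct.
#[derive(Debug, PartialEq)]
struct Tuple<ELEM, TAIL> {
    elem: ELEM,
    tail: TAIL,
}

/// Hypothetical value-level sugar: `tuple!(a, b)` expands to
/// `Tuple { elem: a, tail: Tuple { elem: b, tail: () } }`.
macro_rules! tuple {
    () => { () };
    ($head:expr $(, $rest:expr)*) => {
        Tuple { elem: $head, tail: tuple!($($rest),*) }
    };
}

/// Hypothetical type-level sugar: `TupleTy!(A, B)` expands to
/// `Tuple<A, Tuple<B, ()>>`.
macro_rules! TupleTy {
    () => { () };
    ($head:ty $(, $rest:ty)*) => {
        Tuple<$head, TupleTy!($($rest),*)>
    };
}

fn main() {
    let t: TupleTy!(i32, &str, char) = tuple!(1, "two", '3');
    let expected = Tuple {
        elem: 1,
        tail: Tuple { elem: "two", tail: Tuple { elem: '3', tail: () } },
    };
    assert_eq!(t, expected);

    // The empty tuple is its own desugaring.
    let unit: TupleTy!() = tuple!();
    assert_eq!(unit, ());
}
```

Unlike the proposed aliases, these macros only build the nested form; the built-in `(A, B)` types stay unrelated to `Tuple<A, Tuple<B, ()>>`.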

## Required compiler changes

- The compiler needs to treat any type `(ELEM, TAIL.0, ..., TAIL.n-1)` as equivalent to `Tuple<ELEM, (TAIL.0, ..., TAIL.n-1)>`. This could work in the same way as `std::io::Result<T>` is considered equivalent to `core::result::Result<T, std::io::Error>`.
- Equivalently, every tuple value `(elem, tail.0, ..., tail.n-1)` must be considered structurally equal to `Tuple { elem: elem, tail: (tail.0, ..., tail.n-1) }`.
- Every tuple index access `tuple.n` must evaluate to `tuple{{.tail}^n}.elem`. In other words, `.tail` must be called `n` times before calling `.elem`.
Contributor

So essentially, you are moving tuples to the HIR lowering phases of the compiler and out of later phases. That turns tuples into essentially syntactic sugar. This is a nice simplification. On the other hand, this may also inflate compile times by giving the type checker and later phases larger HIR trees to work with.

My overall sense is that it is hard to answer both the run-time and compile-time perf questions without implementing this in a PR and testing it out. Thus, if we are going to accept this, it would be best to run some experiments and get data to inform these questions.

Contributor

Can you also please discuss pattern matching? Please consider at least these cases:

  • let (a, b) = tup;
  • let (ref mut? a, b) = tup;
  • let (a, ..) = tup;
  • let (a, b, ..) = tup;
  • let (a, b, c @ ..) = tup; -- this is not allowed today, but could potentially be. This ties into questions about &tup.tail
  • let (a, b, ref c @ ..) = tup; -- same here re. &tup.tail; also not allowed today.

- `Tuple<_,_>` types must be mapped back to their user-friendly representation when used in compiler messages or documentation.
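
The index-access rule from the list above, together with one possible reading of how tuple patterns could map onto the struct, can be illustrated with a hand-written `Tuple`. The pattern mapping is an assumption of this sketch; the RFC text itself does not specify how patterns desugar.

```rust
/// Hand-written stand-in for the proposed `Tuple<ELEM, TAIL>` struct.
struct Tuple<ELEM, TAIL> {
    elem: ELEM,
    tail: TAIL,
}

fn main() {
    let sugar = (1, "two", 3.0);
    let desugared = Tuple {
        elem: 1,
        tail: Tuple { elem: "two", tail: Tuple { elem: 3.0, tail: () } },
    };

    // `tuple.n` corresponds to `n` applications of `.tail` followed by `.elem`.
    assert_eq!(sugar.0, desugared.elem);
    assert_eq!(sugar.1, desugared.tail.elem);
    assert_eq!(sugar.2, desugared.tail.tail.elem);

    // `let (a, b, c) = tup;` could correspond to a nested struct pattern.
    let Tuple { elem: a, tail: Tuple { elem: b, tail: Tuple { elem: c, tail: () } } } = desugared;
    assert_eq!((a, b, c), sugar);

    // `let (first, ..) = tup;` could correspond to ignoring the whole tail.
    let other = Tuple { elem: 'x', tail: Tuple { elem: 'y', tail: () } };
    let Tuple { elem: first, .. } = other;
    assert_eq!(first, 'x');
}
```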

# Drawbacks
[drawbacks]: #drawbacks

- People might not understand how or why the `Tuple<ELEM, TAIL>` representation works. Probably, library users do not need to fully understand the details but library maintainers, on the other hand, should.
- Although the intention is to make code more understandable, depending on how the documentation is rendered, it could become _less_ understandable.
- Someone might find a better, more general or simpler solution after this RFC has been implemented. In this case, it will be hard to remove the current solution.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

The selling point of the proposed solution is that it is completely based on existing concepts. The syntax and type system remain unaffected. Hence, the implementation effort should be predictable and the risk of compromising the overall quality of the language should be low. A second benefit is the possibility to define more advanced type mappings, e.g. `(A, B, C, ..., Z)` → `(B, A, C, ..., Z)`.
Contributor

@Centril Centril May 23, 2019

The type system is absolutely affected, but as I noted before it is simplified... ;)
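
The advanced mapping mentioned in the rationale, which swaps the first two elements of a tuple of any arity of at least two, can be sketched with a hand-written `Tuple` struct. The trait name `SwapFirstTwo` is hypothetical and chosen only for this illustration.

```rust
/// Hand-written stand-in for the proposed `Tuple<ELEM, TAIL>` struct.
#[derive(Debug, PartialEq)]
struct Tuple<ELEM, TAIL> {
    elem: ELEM,
    tail: TAIL,
}

/// Hypothetical trait: maps (A, B, rest...) to (B, A, rest...).
trait SwapFirstTwo {
    type Output;

    fn swap_first_two(self) -> Self::Output;
}

impl<A, B, TAIL> SwapFirstTwo for Tuple<A, Tuple<B, TAIL>> {
    type Output = Tuple<B, Tuple<A, TAIL>>;

    fn swap_first_two(self) -> Self::Output {
        Tuple {
            elem: self.tail.elem,
            tail: Tuple { elem: self.elem, tail: self.tail.tail },
        }
    }
}

fn main() {
    // Desugared form of (1, "two", '3').
    let input = Tuple {
        elem: 1,
        tail: Tuple { elem: "two", tail: Tuple { elem: '3', tail: () } },
    };
    // Desugared form of ("two", 1, '3').
    let expected = Tuple {
        elem: "two",
        tail: Tuple { elem: 1, tail: Tuple { elem: '3', tail: () } },
    };
    assert_eq!(input.swap_first_two(), expected);
}
```

No recursion is needed here because the rest of the tuple is passed through unchanged as `TAIL`.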


An alternative approach to the tuple generalization problem could be to add some kind of type-level iterator to the language. Although the idea seems simple and straightforward at first sight, it comes with some big drawbacks:

- New syntax for the iteration must be introduced. This, usually, is a very hard problem.
- The compiler needs a new type-level iterator machinery.
- It is hard to imagine how more advanced type mappings can be realized without introducing even more syntax.

# Prior art
[prior-art]: #prior-art

Similar solutions have been proposed in earlier RFCs. The drawbacks compared to this RFC are summarized here:

- [#1582](https://github.com/rust-lang/rfcs/pull/1582):
  - Introduces new syntax
  - Introduces new traits
  - Deals with memory layout issues
- [#1921](https://github.com/rust-lang/rfcs/pull/1921):
  - Focussed on variadic function arguments
  - Introduces new attributes
  - Changes the way the language works by introducing function overloading
- [#1935](https://github.com/rust-lang/rfcs/pull/1935):
  - Focussed on variadic generics
  - Introduces new syntax
  - Includes new traits
  - Uses special handling for references
Contributor

It would be good to survey / see a discussion of variadics, type level lists, etc. in other languages, including:

  • Haskell
  • Idris
  • C++
  • Other?


This is more or less how tuples work in Ceylon.


Contributor

Since you are taking the frunk approach to this, it would be a good idea to have a discussion of the library and the various traits and transformations in there. In particular, frunk should provide us with a good investigation of how this approach actually pans out in terms of what transformations can be written and not.

cc @lloydmeta @ExpHP

# Unresolved questions
[unresolved-questions]: #unresolved-questions

These points are mainly decisions to be made before implementing the RFC:

- How should compiler messages or the documentation be rendered? The printed output for `Tuple<A, Tuple<B, Tuple<C, ()>>>` must probably be mapped back to `(A, B, C)` for readability. But what if this reverse mapping is impossible as is the case for the generalized tuple `impl`s?
- What should the compiler do with nonsensical tuples? A nonsensical tuple is a `Tuple` whose `TAIL` parameter is not a tuple (e.g. `Tuple<String, String>`). It feels like the easiest and most idiomatic answer is that the compiler should not care and let the code run into a type error as soon as the tuple is used. Nevertheless, nonsensical tuples could be discovered and reported by `clippy`.
Contributor

I agree with not doing anything about "nonsensical" tuples; seems like banning them just brings unjustified complication to the type system and undoes the nice simplification benefits your proposal brings.


Rust's current type system is plenty enough to be able to specify that a tuple cons-element can only have () or another tuple cons-element as its tail associated type, so a separate mechanism for checking the well-formedness of a tuple type list is not needed. (Custom diagnostics might be of help in that area, though.)

Member

@varkor varkor May 27, 2019

Can't we just have a closed trait IsTuple that is only implemented by valid tuples? That would be a simple modification, but would avoid any extra complexities by not making sure Tuple is well-formed.

- What should the `Tuple` struct look like precisely? Should it export globally visible symbols like `tuple.elem` or `tuple.elem()` or should they be hidden behind a namespace, e.g. `Tuple::elem(tuple)`?
Contributor

I think the larger question here is whether &tup.tail should be possible or not. Please add that to the list.

Contributor

It seems to me that a big question here is how much of frunk we want to add to the standard library.
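
The closed `IsTuple` trait suggested in the review discussion above could look roughly like the following sketch. It relies on the common sealed-trait pattern; the `sealed` module, the `ARITY` constant and the `arity` function are additions made for this illustration and are not part of the RFC or of the suggestion itself.

```rust
/// Hand-written stand-in for the proposed `Tuple<ELEM, TAIL>` struct.
#[allow(dead_code)]
struct Tuple<ELEM, TAIL> {
    elem: ELEM,
    tail: TAIL,
}

mod sealed {
    /// Private supertrait: code outside this crate cannot implement it,
    /// which keeps the set of `IsTuple` implementors closed.
    pub trait Sealed {}

    impl Sealed for () {}
    impl<ELEM, TAIL: Sealed> Sealed for super::Tuple<ELEM, TAIL> {}
}

/// Implemented only for `()` and for `Tuple`s whose tail is itself a tuple.
trait IsTuple: sealed::Sealed {
    /// Number of elements, included here purely for illustration.
    const ARITY: usize;
}

impl IsTuple for () {
    const ARITY: usize = 0;
}

impl<ELEM, TAIL: IsTuple> IsTuple for Tuple<ELEM, TAIL> {
    const ARITY: usize = 1 + TAIL::ARITY;
}

/// Accepts well-formed tuples only.
fn arity<T: IsTuple>(_tuple: &T) -> usize {
    T::ARITY
}

fn main() {
    let pair = Tuple { elem: 1, tail: Tuple { elem: "two", tail: () } };
    assert_eq!(arity(&pair), 2);
    assert_eq!(arity(&()), 0);

    // A "nonsensical" tuple such as `Tuple { elem: 1, tail: "not a tuple" }`
    // can still be constructed, but passing it to `arity` is a compile
    // error because `&str` does not implement `IsTuple`.
}
```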


# Future possibilities
[future-possibilities]: #future-possibilities

- Generalized tuples could be an enabler or a replacement for variadic generic type parameters.
- Generalized tuples could be an enabler or a replacement for variadic function arguments.
- It might be possible to use the type aliasing strategy on `struct`s or `enum`s. Using const generics, the struct

```rust
struct MyStruct {
    first: String,
    second: usize,
}
```

could become

```rust
NamedFieldsStruct<
    "MyStruct",
    Field<
        "first",
        String,
        Field<
            "second",
            usize,
            End,
        >
    >
>
```

- With trait specialization, any list operation should be possible on the type level.
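
Some list operations can already be expressed on the nested representation without specialization. As one example, the sketch below concatenates two tuples on both the type level and the value level; the `Concat` trait is hypothetical and `Tuple` is again a hand-written stand-in for the proposed struct.

```rust
/// Hand-written stand-in for the proposed `Tuple<ELEM, TAIL>` struct.
#[derive(Debug, PartialEq)]
struct Tuple<ELEM, TAIL> {
    elem: ELEM,
    tail: TAIL,
}

/// Hypothetical trait: appends all elements of `RHS` to `Self`.
trait Concat<RHS> {
    type Output;

    fn concat(self, rhs: RHS) -> Self::Output;
}

// Base case: concatenating onto the empty tuple yields the right-hand side.
impl<RHS> Concat<RHS> for () {
    type Output = RHS;

    fn concat(self, rhs: RHS) -> Self::Output {
        rhs
    }
}

// Recursive case: keep the head and concatenate onto the tail.
impl<ELEM, TAIL: Concat<RHS>, RHS> Concat<RHS> for Tuple<ELEM, TAIL> {
    type Output = Tuple<ELEM, TAIL::Output>;

    fn concat(self, rhs: RHS) -> Self::Output {
        Tuple { elem: self.elem, tail: self.tail.concat(rhs) }
    }
}

fn main() {
    // Desugared forms of (1, "two") and ('3',).
    let left = Tuple { elem: 1, tail: Tuple { elem: "two", tail: () } };
    let right = Tuple { elem: '3', tail: () };

    // Desugared form of (1, "two", '3').
    let expected = Tuple {
        elem: 1,
        tail: Tuple { elem: "two", tail: Tuple { elem: '3', tail: () } },
    };
    assert_eq!(left.concat(right), expected);
}
```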