Semantic newtypes #2242

Closed
wants to merge 12 commits into from
263 changes: 263 additions & 0 deletions text/0000-newtypes.md
@@ -0,0 +1,263 @@
- Feature Name: semantic_newtypes
- Start Date: 2014-07-26
- RFC PR #: (leave this empty)
- Rust Issue #: (leave this empty)

# Summary

Introduce a newtype construction allowing newtypes to use the
capabilities of the underlying type while keeping type safety.

# Motivation

Consider the situation where we want to create separate primitive
types. For example we want to introduce an `Inch` and a `Cm`. These
could be modelled with `usize`, but we don't want to accidentally
mix the types.

With the current newtypes:

```rust
struct Inch(usize);
struct Cm(usize);

// We want to do generic manipulations
use std::ops::Sub;

fn calc_distance<T: Sub<Output = T>>(start: T, end: T) -> T {
    end - start
}

let (start_inch, end_inch) = (Inch(10), Inch(18));
let (start_cm, end_cm) = (Cm(2), Cm(5));

// We must explicitly destruct to reach the values
let (Inch(start), Inch(end)) = (start_inch, end_inch);
let inch_dist = Inch(calc_distance(start, end));

let (Cm(start), Cm(end)) = (start_cm, end_cm);
let cm_dist = Cm(calc_distance(start, end));

let (Inch(inch_val), Cm(cm_val)) = (inch_dist, cm_dist);
println!("dist: {} and {}", inch_val, cm_val);

// Disallowed at compile time
let not_allowed = calc_distance(start_inch, end_cm);
```

This is verbose, but at least the types don't mix.
We could explicitly implement traits for the types, but that's a lot of duplication
if we want the same capabilities as the underlying type. Additionally, if
someone defines a custom trait in a downstream crate and implements it for the
underlying type, users of our newtype would not be able to use the newtype
where that trait is used as a bound.
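
To make the duplication concrete, here is a minimal sketch in current Rust of what forwarding a single operator looks like for the two wrappers above. The derive list and the `Sub<Output = T>` bound are choices made for this illustration, not part of the RFC text.

```rust
use std::ops::Sub;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Inch(usize);

#[derive(Debug, Clone, Copy, PartialEq)]
struct Cm(usize);

// One hand-written forwarding impl per newtype and per operator.
impl Sub for Inch {
    type Output = Inch;
    fn sub(self, rhs: Inch) -> Inch {
        Inch(self.0 - rhs.0)
    }
}

impl Sub for Cm {
    type Output = Cm;
    fn sub(self, rhs: Cm) -> Cm {
        Cm(self.0 - rhs.0)
    }
}

// With the impls in place, the generic function accepts the wrappers directly.
fn calc_distance<T: Sub<Output = T>>(start: T, end: T) -> T {
    end - start
}
```

With these impls `calc_distance(Inch(10), Inch(18))` type-checks without destructuring, while mixing `Inch` and `Cm` is still rejected. Every further operator or trait needs another pair of impls like these.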

Another option is to use the `type` keyword, but then we lose type safety:

```rust
type Inch = usize;
type Cm = usize;

let inch: Inch = 10;
let cm: Cm = 2;

let oops = inch + cm; // not safe!
```

# Guide-level explanation

Imagine you have many `Vec`s in your code that are all indexable by some
different kind of id. As a small example, you have a `Vec<User>` and a `Vec<Pet>`.
If you get a `usize` for a userid, you can accidentally use it to index the
`Vec<Pet>`. Since these ids have nothing in common, it might be desirable to
make sure that you have a custom id type that cannot be confused with any other
id type:

```rust
type UserIndex is new usize;
Contributor

This syntax feels alien to me - why not simply:

newtype UserIndex(usize);

or:

new type UserIndex(usize);

Assuming we are allowed to introduce this to the grammar (with an epoch or whatever), to me this feels more direct - you know that it is a newtype from the first syllable and it's more consistent with the current syntax rules. Perhaps you considered this syntax and rejected it - if so, why?

Contributor

I assume there's a good reason we can't just retrofit any/all of the following (which are all semantically newtypes given #[repr(transparent)]):

struct New(Base);

struct New {
    field_name: Base
}

enum New {
    Variant(Base)
}

enum New {
    Variant {
        field_name: Base
    }
}

Does it have to do with coherence and/or backwards-compatibility? Still, I think it should be spelled out whatever the reason.

Contributor Author

> newtype

was rejected in #186 (comment)

> new type

I like it.

> #[repr(transparent)]

That representation does not do any newtyping, to the best of my knowledge. It just makes sure the memory representation in the backend is 100% the same as the inner type.

Contributor

> was rejected in #186 (comment)

2014 was eons ago and before 1.0 ^,- I think there's a good argument from familiarity (Haskell) to be made for newtype as a syntax. Tho new type could be equally clear. There's the "is new a modifier on type, or is it a different concept altogether?" discussion to be had.

> that representation does not do any newtyping to the best of my knowledge. It just makes sure the memory representation in the backend is 100% the same as just the inner type

Would not new type New(Base); assume #[repr(transparent)] if transmute is to be a possibility? What I mean is that:

#[repr(transparent)]
struct New(Base);

is semantically a new-type today - you just lack the auto-deriving capabilities. Those capabilities could perhaps be added for some traits (decidable by analysis on the syntax of the trait and not a specific list of traits?).

Contributor Author

Deriving creates more code for something that is just a semantic name change. Additionally it only works when you already know what traits to derive. Any traits implemented downstream for the base type won't get implemented for your type.

Contributor

Of course, you have to explicitly say which traits you want to derive - often, there might be many traits you do want to derive and perhaps 1-2 you want to leave out in order to give a different impl for the newtype. A common example of this scenario is the Monoid trait and numeric types.

// assume TVec is essentially `Vec` with a generic arg for the index type
type UserVec is new TVec<UserIndex, User>;
type PetIndex is new usize;
type PetVec is new TVec<PetIndex, Pet>;

fn foo(&mut self, u: UserIndex, p: PetIndex) {
    self.users[p].pets.add(u); // ERROR: the users array can only be indexed by UserIndex
    self.users[u].pets.add(p); // correct
}
```
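
The `TVec` assumed above does not exist in the standard library. A rough approximation in current Rust, using tuple structs in place of the proposed `is new` declarations, could look like the following. The `From<...> for usize` impls, the `new` constructor and the `PhantomData` field are assumptions of this sketch.

```rust
use std::marker::PhantomData;
use std::ops::Index;

#[derive(Debug, Clone, Copy, PartialEq)]
struct UserIndex(usize);

#[derive(Debug, Clone, Copy, PartialEq)]
struct PetIndex(usize);

impl From<UserIndex> for usize {
    fn from(i: UserIndex) -> usize {
        i.0
    }
}

impl From<PetIndex> for usize {
    fn from(i: PetIndex) -> usize {
        i.0
    }
}

// A `Vec` wrapper that remembers which index type it accepts.
struct TVec<I, T> {
    items: Vec<T>,
    _index: PhantomData<I>,
}

impl<I, T> TVec<I, T> {
    fn new(items: Vec<T>) -> Self {
        TVec { items, _index: PhantomData }
    }
}

// Indexing is only defined for the vector's own index type `I`.
impl<I: Into<usize>, T> Index<I> for TVec<I, T> {
    type Output = T;
    fn index(&self, index: I) -> &T {
        &self.items[index.into()]
    }
}
```

Indexing a `TVec<UserIndex, _>` with a `PetIndex` fails to compile because no matching `Index` impl exists, which is the error the guide example relies on.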

# Reference-level explanation

Steal the `is new` syntax from Ada's newtypes by extending type alias
declarations, which already use the `type` keyword.

```rust
type Inch is new usize;
type Cm is new usize;

// We want to do generic manipulations
use std::ops::Sub;

fn calc_distance<T: Sub<Output = T>>(start: T, end: T) -> T {
    end - start
}

// Initialize the same way as the underlying types
let (start_inch, end_inch): (Inch, Inch) = (10, 18);
Contributor

What if Inch has a smart constructor explicitly rejecting subsets of usize? Seems like that use case isn't supported, which is what I expect from a feature called "newtype" (as used in Haskell).

Contributor Author

I'd leave that to the ranged types #671 or custom literals ideas

let (start_cm, end_cm): (Cm, Cm) = (2, 5);

// Here `calc_distance` operates on the types `Inch` and `Cm`,
// where previously we had to cast to and from `usize`.
let inch_dist = calc_distance(start_inch, end_inch);
let cm_dist = calc_distance(start_cm, end_cm);

println!("dist: {} and {}", inch_dist, cm_dist);

// Disallowed at compile time
let not_allowed = calc_distance(start_inch, end_cm);
```

It would also allow generics:

```rust
struct A<N, M> { n: N, m: M }
type B<T> is new A<usize, T>;

let b = B { n: 2usize, m: "this is a T" };
```

It would not be possible to use the newtype in place of the parent type,
we would need to resort to traits.

```rust
fn bad(x: usize) { ... }
fn good<T: Sub>(x: T) { ... }

type Foo is new usize;
let a: Foo = 2;
bad(a); // Not allowed
good(a); // Ok, Foo implements Sub
```

## Derived traits
Contributor

The #![derive_unfinished(..)] RFC has some notes here on how we can be more flexible wrt. deriving by extending the syntax of derive(_unfinished) itself. Feel free to steal any of those notes =) Tho I understand if you want to be conservative initially.

I think generalized-newtype-deriving would be the major ergonomics-boost coming from this proposal.


In the derived trait implementations the basetype will be replaced by the newtype.

So for example as `usize` implements `Add<usize>`, `type Inch is new usize`
would implement `Add<Inch>`.
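
As a hedged illustration of this substitution rule, the impls below are what one would write by hand today to get the same surface for a tuple struct standing in for `type Inch is new usize`. Under the proposal they would be reused from `usize` rather than written or generated; the exact set shown here is an assumption based on the `Add` impls `usize` has.

```rust
use std::ops::Add;

// Stand-in for `type Inch is new usize;`
#[derive(Debug, Clone, Copy, PartialEq)]
struct Inch(usize);

// usize: Add<usize>   becomes   Inch: Add<Inch>
impl Add<Inch> for Inch {
    type Output = Inch;
    fn add(self, rhs: Inch) -> Inch {
        Inch(self.0 + rhs.0)
    }
}

// usize: Add<&usize>   becomes   Inch: Add<&Inch>
impl<'a> Add<&'a Inch> for Inch {
    type Output = Inch;
    fn add(self, rhs: &'a Inch) -> Inch {
        Inch(self.0 + rhs.0)
    }
}

// &usize: Add<&usize>   becomes   &Inch: Add<&Inch>
impl<'a, 'b> Add<&'b Inch> for &'a Inch {
    type Output = Inch;
    fn add(self, rhs: &'b Inch) -> Inch {
        Inch(self.0 + rhs.0)
    }
}
```

Note that no `Add<usize> for Inch` or `Add<Inch> for usize` impl appears: every occurrence of the base type is replaced, so the two types never mix.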
Member

This section requires a serious expansion. Without auto trait implementation, I don't see how this is better than simply introducing user-defined literals (1234_inch).

Examples to clarify:

  1. usize has an inherent method fn trailing_zeros(self) -> u32, should it be available to Inch?
  2. If (1) is yes, usize has an inherent method fn swap_bytes(self) -> usize, should it be available to Inch and return an Inch?
  3. If (2) is yes, usize has an inherent method fn checked_neg(self) -> Option<usize>, should it be available to Inch and return an Option<Inch>?
  4. If (2) is yes, usize has an associated (static) method const fn min_value() -> usize, should it be available to Inch?
  5. usize implements Add<&usize>. Would Inch implement Add<&Inch>?
  6. If (5) is yes, &usize implements Add<&usize>, would &Inch implement Add<&Inch>?
  7. Should usize implement Add<Inch>? Add<&Inch>?
  8. usize implements Shr<T> for all integer types T (u8, u32, usize etc). Should Inch implement Shr<usize>? Should Inch implement Shr<Inch>?

Contributor Author

  1. yes, that's the entire point of semantic newtypes, you get everything the base type has
  2. yes
  3. yes
  4. yes
  5. yes
  6. yes
  7. no
  8. The strategy proposed in this RFC would result in Inch: Shr<Inch>

I'll add these points in the RFC text
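
Answers 1 to 4 amount to forwarding inherent methods with `usize` replaced by `Inch` in the signatures. Written out by hand against a tuple struct (a stand-in for `type Inch is new usize`), that would look roughly like this. It is only a sketch of the resulting API; under the proposal none of this code would be written.

```rust
// Stand-in for `type Inch is new usize;`
#[derive(Debug, Clone, Copy, PartialEq)]
struct Inch(usize);

impl Inch {
    // (1) The return type does not mention the base type, so it is unchanged.
    fn trailing_zeros(self) -> u32 {
        self.0.trailing_zeros()
    }

    // (2) `usize` in the return position becomes `Inch`.
    fn swap_bytes(self) -> Inch {
        Inch(self.0.swap_bytes())
    }

    // (3) The replacement also applies inside generic arguments.
    fn checked_neg(self) -> Option<Inch> {
        self.0.checked_neg().map(Inch)
    }

    // (4) Associated functions without a `self` parameter are carried over too.
    const fn min_value() -> Inch {
        Inch(usize::MIN)
    }
}
```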


## Scoping

Newtypes would follow the natural scoping rules:

```rust
type Inch is new usize; // Not accessible from outside the module
pub type Cm is new usize; // Accessible

use module::Inch; // Import into scope
pub use module::Inch; // Re-export
```

### Reexporting private types

Newtypes are allowed to be newtypes over private types:

```rust
mod foo {
    struct Foo;
    pub type Bar is new Foo;
}
let x: foo::Bar = ...; // OK
let x: foo::Foo = ...; // Not OK
```
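
For comparison, the closest pattern available today is a public tuple struct with a private field of the private type. The sketch below assumes a hypothetical `Bar::new` constructor, since `Foo` cannot be built outside the module.

```rust
mod foo {
    // Private base type: it cannot be named outside `foo`.
    struct Foo;

    // Public wrapper over the private type; the field itself stays private.
    pub struct Bar(Foo);

    impl Bar {
        // Hypothetical constructor for this sketch.
        pub fn new() -> Bar {
            Bar(Foo)
        }
    }
}
```

Outside the module `foo::Bar` can be named and constructed through `foo::Bar::new()`, while any mention of `foo::Foo` is a privacy error.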

## Casting

Newtypes can explicitly be converted to their base types, and vice versa.
Implicit conversions are not allowed.
This is achieved via the `Into` trait, since newtypes automatically implement
`From<BaseType>` and `From<NewType> for BaseType`. In order to not expose new
Contributor

> This is achieved via the Into trait, since newtypes automatically implement From<BaseType> and From<NewType> for BaseType.

That sounds problematic. These impls would be available to anyone as soon as NewType is exported due to the way impls are not scoped in Rust. This means that you have no ability to write smart constructors that constrain BaseType further in the allowed set of values. While refinement types may partially (the expressiveness of refinement may/should not be turing complete, unlike unconstrained Rust-code, etc.) solve this in the future, we are far from having those currently. I think it is important that newtypes be allowed to provide only smart constructors such as NewType::new(..) -> Self that are user defined.

To that end, as an alternative I'd like to propose that the compiler generate a module:

mod NewType {
    pub fn from_base(BaseType) -> NewType {..}
    pub fn into_base(NewType) -> BaseType {..}
}

Another alternative may be inherent and associated methods on NewType.
The From impls should be generated if and only if the user writes derive(From, Into) or perhaps derive(Newtype) (with a possible actual trait Newtype), to avoid conflicts with custom-derive.

Some notes discussing T<NewType> -> T<BaseType> where T may be &, Box, Arc, Vec, .. , would be nice even if you are proposing we not allow those conversions. In that case, notes on why the conversions should not be allowed.
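
To illustrate the smart-constructor concern, here is a small sketch in current Rust. The `Percent` type and its `new`/`get` methods are hypothetical names chosen for this example. The invariant holds only because no public `From<u8>` impl exists; an automatically generated one would let any caller bypass `new`.

```rust
mod units {
    // The field is private, so values can only be created through `new`.
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct Percent(u8);

    impl Percent {
        // Smart constructor: rejects values outside 0..=100.
        pub fn new(value: u8) -> Option<Percent> {
            if value <= 100 {
                Some(Percent(value))
            } else {
                None
            }
        }

        // Read-only access to the underlying value.
        pub fn get(self) -> u8 {
            self.0
        }
    }
}
```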

Contributor Author

This is an open question as stated at the end of the RFC ;)

I'm totally on board with scrapping these impls.

I personally don't need them. I'd be happy writing them out whenever needed.

> Some notes discussing T<NewType> -> T<BaseType>

I'll add those

`as` casts, the automatically generated implementation simply contains a
`transmute`.

```rust
type Inch is new usize;

fn frobnicate(x: usize) -> usize { x * 2 + 14 - 3 * x * x }

let x: Inch = 2;
println!("{}", frobnicate(x.into()));

let a: usize = 2;
let i: Inch = a; // Compile error, implicit conversion not allowed
Member

What is the difference between a and 2 that makes

let i: Inch = a; // illegal
let j: Inch = 12; // legal

? Being a "literal"? What about a constant

const YARD: usize = 36;
let k: Inch = YARD;

And expression

let day: Seconds = 24 * 60 * 60;

What if the base type cannot be represented using any kind of literals

type RcInch is new Rc<usize>;
let ai: RcInch = Rc::new(0); // ?????

Contributor Author

The difference is that a has a type.

let a = 2;
let i: Inch = a; // works, because `a` is inferred to be `Inch`

The same with expressions. It already works for choosing any of the base integer types. If inference is adjusted to newtypes, this will work out of the box just like the inferred variable type example above.

type RcInch is new Rc<usize>;
let ai: RcInch = Rc::new(0); // ERROR, expected RcInch, got Rc
let aj: RcInch = Rc::new(0).into(); // might work, but a lot of generics
let ak: RcInch = RcInch::new(0); // ok

Member

@oli-obk Thanks for the clarifications.

In Rust only integer and floating point literals have unspecified types. I suppose we want to support these cases?

type Identifier<'a> is new &'a str;
type Verbose is new bool;

let default_verbose: Verbose = false;
let tag_name: Identifier<'static> = "center";

Do we need to give these literals generic types such as {{boolean}} and {{string}}?

let i = Inch::from(a); // Ok
let i: Inch = a.into(); // Ok
let b = usize::from(i); // Ok
```
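
The explicit conversions described in this section correspond to the following hand-written impls in current Rust. This is a sketch with a tuple struct standing in for the proposed declaration; it uses field access rather than the `transmute` the RFC mentions, which is a simplification of this example.

```rust
// Stand-in for `type Inch is new usize;`
#[derive(Debug, Clone, Copy, PartialEq)]
struct Inch(usize);

// Base type to newtype.
impl From<usize> for Inch {
    fn from(value: usize) -> Inch {
        Inch(value)
    }
}

// Newtype back to base type.
impl From<Inch> for usize {
    fn from(inch: Inch) -> usize {
        inch.0
    }
}

fn frobnicate(x: usize) -> usize {
    x * 2 + 14 - 3 * x * x
}
```

Because `Into` is implemented for every `From` pair, both `frobnicate(x.into())` and `usize::from(i)` work, while passing an `Inch` directly to `frobnicate` remains a type error.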

## Grammar

The grammar rules will be the same as for `type`, but there are two new
contextual keywords `is` and `new`. The reason for using `is new` instead of
another sigil is that `type X = Y;` would be very hard to distinguish from any
alternative like `type X <- Y;` or just `type X is Y;`.

Contributor

Not a fan of this grammar, see newtype as an alternative syntax above.

## Implementation

The compiler would treat newtypes as a thin wrapper around the original type.
This means that just declaring a newtype does *not* generate any code, because
Contributor

This is inconsistent with:

> This is achieved via the Into trait, since newtypes automatically implement From<BaseType> and From<NewType> for BaseType.

Contributor Author

Ha, good catch 😆

the trait and inherent implementations of the base type are reused.
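
The closest existing analogue to such a thin wrapper is a `#[repr(transparent)]` tuple struct, as mentioned in the review thread. It guarantees identical layout, but unlike the proposal it does not reuse any impls of the base type. The helper names below are made up for this sketch.

```rust
use std::mem::{align_of, size_of};

// Guaranteed to have the same layout as its single field.
#[repr(transparent)]
struct Inch(usize);

// Checks that the wrapper adds no size or alignment overhead.
fn same_layout_as_base() -> bool {
    size_of::<Inch>() == size_of::<usize>() && align_of::<Inch>() == align_of::<usize>()
}

// Unwrapping is a plain field access.
fn to_base(inch: Inch) -> usize {
    inch.0
}
```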

# Drawbacks

It adds a new contextual keyword pair to the language and increases the language complexity.

This requires nontrivial implementation work in the compiler and will touch
essentially the entire compilation pipeline.

Automatically deriving all traits may not make sense in some cases. For example,
deriving multiplication for `Inch` doesn't make much sense, as it would result
in `Inch * Inch -> Inch`, whereas semantically `Inch * Inch -> Inch^2`. This is a
deficiency in the design and may be addressed by allowing newtypes to override
trait implementations. Such a change would be strictly backwards
compatible in the language, even if adding an overriding trait impl to an
existing newtype would not be backwards compatible for libraries.
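
As a sketch of what such an override might express, here is the dimensionally correct impl written by hand in current Rust. `SquareInch` is a hypothetical type introduced only for this example, and tuple structs stand in for the proposed newtypes.

```rust
use std::ops::Mul;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Inch(usize);

// Hypothetical unit for areas.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SquareInch(usize);

// Multiplying two lengths yields an area, not another length.
impl Mul for Inch {
    type Output = SquareInch;
    fn mul(self, rhs: Inch) -> SquareInch {
        SquareInch(self.0 * rhs.0)
    }
}
```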

Types like `Vec<T>` can't have their index type overwritten by a newtype. With
the increased availability of newtypes this could be resolved by a new generic
argument to `Vec`, which defaults to `usize` and requires an `Into<usize>` impl.
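
A rough sketch of that idea, using a hypothetical wrapper `IdxVec` rather than changing `Vec` itself: the index type is a second generic parameter that defaults to `usize` and only needs to convert into `usize`.

```rust
use std::marker::PhantomData;
use std::ops::Index;

// Hypothetical vector with a configurable index type, defaulting to `usize`.
struct IdxVec<T, I = usize> {
    items: Vec<T>,
    _index: PhantomData<I>,
}

impl<T, I> IdxVec<T, I> {
    fn new(items: Vec<T>) -> Self {
        IdxVec { items, _index: PhantomData }
    }
}

// Any index type that converts into `usize` can be used for lookups.
impl<T, I: Into<usize>> Index<I> for IdxVec<T, I> {
    type Output = T;
    fn index(&self, index: I) -> &T {
        &self.items[index.into()]
    }
}

// Example index newtype (tuple struct standing in for `is new`).
#[derive(Debug, Clone, Copy, PartialEq)]
struct PetIndex(usize);

impl From<PetIndex> for usize {
    fn from(i: PetIndex) -> usize {
        i.0
    }
}
```

Code that writes `IdxVec<T>` keeps indexing with `usize`, while `IdxVec<T, PetIndex>` only accepts `PetIndex`.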

# Alternatives

* Explicitly derive selected traits

The [`newtype_derive`](https://crates.io/crates/newtype_derive) crate allows
deriving common traits that just forward to the inner value.

```rust
#[macro_use] extern crate custom_derive;
#[macro_use] extern crate newtype_derive;

custom_derive! {
    #[derive(NewtypeFrom, NewtypeAdd, NewtypeMul(i32))]
    pub struct Happy(i32);
}
```

This would avoid the problem of automatically deriving every trait
even where some of them do not make sense.

We could save a keyword with this approach and we might consider a generalization
over all tuple structs.

Contributor

Seems like an excellent idea! Please do consider that generalization =)

This approach not only requires two crates, a macro invocation and a list
of derives, it also doubles the amount of code generated compared to the
newtype approach.
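
To show what such forwarding derives boil down to, here is a self-contained `macro_rules!` sketch. It is not how `newtype_derive` is implemented; the macro name `forward_binop` is made up for this illustration. Each invocation expands to one more impl, which is the code duplication referred to above.

```rust
use std::ops::{Add, Sub};

// Generates `impl Trait for Newtype` by forwarding to the inner value.
macro_rules! forward_binop {
    ($trait:ident, $method:ident, $newtype:ident) => {
        impl $trait for $newtype {
            type Output = $newtype;
            fn $method(self, rhs: $newtype) -> $newtype {
                $newtype($trait::$method(self.0, rhs.0))
            }
        }
    };
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Happy(i32);

// Only the listed operators are forwarded; everything else stays unavailable.
forward_binop!(Add, add, Happy);
forward_binop!(Sub, sub, Happy);
```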

* Keep it the same

It works, but life could be simpler. The number of workarounds, macros and
complaints about it suggests that something needs to be done. Even
the compiler itself makes extensive use of generated newtypes for special `Vec`s that
have a newtype index type instead of `usize`.

Contributor

These are excellent points.

# Unresolved questions

* Conversion from basetype to newtype and vice versa not via `From`?
    * might cause accidental usage of basetype where newtype was expected (e.g. in heavily generic code)