Integer Manipulation API

# Proposal

## Problem statement


Logically there are 6 different behaviors that a conversion between two integers may have:
1. reinterpret bits
2. truncate bits
3. zero extend bits
4. sign extend bits
5. keep numerical value and saturate if out of range
6. keep numerical value and panic if out of range 

`as`-casts implement the first four of these possible behaviors, but can only express one of these behaviors for each pair of types `T as U`.
`TryFrom` can express behaviors 5 and 6 with the help of some extra code.

This API aims to implement **all** of behaviors 1 through 4 on every possible pair of integer types, using code that more directly expresses the desired behavior and can be combined to express more behaviors.
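
For comparison, here is a minimal sketch of the "extra code" that behaviors 5 and 6 need today with `TryFrom`. The function names are made up for illustration and the `i32` to `u8` pair is an arbitrary choice.

```rust
use std::convert::TryFrom;

/// Behavior 5: keep the numerical value, saturating at the bounds of `u8`.
/// (Hypothetical helper, not part of the proposal.)
fn saturating_i32_to_u8(x: i32) -> u8 {
    // `try_from` fails exactly when `x` is outside `0..=255`.
    u8::try_from(x).unwrap_or(if x < 0 { u8::MIN } else { u8::MAX })
}

/// Behavior 6: keep the numerical value, panicking if it does not fit.
/// (Hypothetical helper, not part of the proposal.)
fn panicking_i32_to_u8(x: i32) -> u8 {
    u8::try_from(x).expect("value out of range for u8")
}
```

Neither helper touches the bit-level behaviors 1 through 4, which is the gap this proposal targets.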


## Motivation, use-cases



Currently, converting between integer types can be done in two ways:
1. `as`-casts, which have a well defined effect for each pair of types
2. Manual bit manipulation to arrange the bits as desired, followed by `as`-casts or the `{to,from}_bytes` APIs on different integer types, which also involves making a byte array larger or smaller

Option 1 uses `as`, which can be undesirable because it accepts a wide range of input types, silently changes behavior when types change, and offers only a restricted set of behaviors.

Option 2 requires manual bit manipulation, even when that manipulation shouldn't need to be complicated. Even worse, it requires expanding or shrinking an array, which is difficult to do concisely.
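
As an illustration of Option 2, the following sketch zero extends the bits of an `i16` into a `u64` through the byte APIs. The function name is hypothetical; `to_le_bytes`, `from_le_bytes`, and `copy_from_slice` are the existing standard library calls.

```rust
/// Zero extends the bits of an `i16` into a `u64` by widening its byte array
/// by hand. (Hypothetical helper, shown only to illustrate Option 2.)
fn zero_extend_i16_to_u64(val: i16) -> u64 {
    let src = val.to_le_bytes(); // [u8; 2], least significant byte first
    let mut dst = [0u8; 8]; // wider array, high bytes stay zero
    dst[..src.len()].copy_from_slice(&src);
    u64::from_le_bytes(dst)
}
```

The array resizing is the noisy part: the intent (zero extension) is not visible anywhere in the code.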

Being able to express any combination of truncation, zero extension, sign extension, and bit reinterpretation in code whose behavior is checked at compile time is better than either of the current options, even if it is wordier.

Use cases:
```rust
// sign extending `val` from an i16 to a u32

// works fine, but you have to know all the `as` behaviors
// if `val` changes to an i64, this silently truncates
val as u32; 

// declares what behavior it wants
// if `val` changes to i64, this no longer compiles
// "extending" to a smaller type does not make sense
val.sign_extend::<i32>().cast::<u32>();
```

```rust
// zero extending `val` from an i16 to a u64

// unclear why this goes through u16
// if `val` changes to i32 this no longer zero extends, it truncates in the middle
val as u16 as u64; 

// declares what behavior it wants
// if `val` changes to i32 it compiles and continues to zero extend
val.zero_extend::<i64>().cast::<u64>();
```
```rust
// The dangers of `as` when used carelessly

fn convert(x: u32) -> i32 {
    x as _ // reinterprets
}
// changes to:
fn convert(x: u32) -> i64 {
    x as _ // now does a zero extension because the inferred type changed!
}


// The new API adds guarantees about what operations happen

fn convert(x: u32) -> i32 {
    x.cast() // reinterprets
}
// changes to:
fn convert(x: u32) -> i64 {
    x.cast() // does not compile, the sizes are not the same
}
```

## Solution sketches



In each of these examples, assume that `Self` is an integer type and that the target type `U` is also an integer.

(name might need improvement, `bit_cast`?)
```rust
fn cast<U>(self) -> U
```

- Converts one integer type into an integer type with the same size by bit casting
- Only exists for pairs of integers with the same size (`i8` -> `u8`, `u128` -> `i128`, etc) because that's the only unambiguous bit cast behavior
    - COUNTEREXAMPLES: `u8` -> `i16` or `i32` -> `i64`
- Does not exist to increase size of integers, use `zero_extend` or `sign_extend` instead (should be documented)
- Does not exist to decrease size of integers, use `truncate` instead (should be documented)
- **Does not** preserve numerical value (should be documented)
- The identity cast is supported (`u8` -> `u8`), even though it's not very useful
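
One possible way to get these rules checked at compile time is a helper trait that is only implemented for the allowed pairs. This is a sketch under assumptions: the trait names `CastTo` and `Cast` and the macro are invented here, and the real implementation would presumably use inherent methods instead.

```rust
/// Hypothetical helper trait, implemented only for same-size integer pairs.
trait CastTo<U> {
    fn cast_to(self) -> U;
}

/// Hypothetical extension trait that provides `x.cast::<U>()`.
trait Cast: Sized {
    fn cast<U>(self) -> U
    where
        Self: CastTo<U>,
    {
        <Self as CastTo<U>>::cast_to(self)
    }
}
impl<T> Cast for T {}

// For each unsigned/signed pair of equal width, allow both directions and
// the two identity casts. A same-size `as` cast only reinterprets the bits.
macro_rules! impl_cast {
    ($($u:ty, $i:ty);*) => {$(
        impl CastTo<$i> for $u { fn cast_to(self) -> $i { self as $i } }
        impl CastTo<$u> for $i { fn cast_to(self) -> $u { self as $u } }
        impl CastTo<$u> for $u { fn cast_to(self) -> $u { self } }
        impl CastTo<$i> for $i { fn cast_to(self) -> $i { self } }
    )*};
}

impl_cast!(u8, i8; u16, i16; u32, i32; u64, i64; u128, i128; usize, isize);
```

With this sketch, `255_u8.cast::<i8>()` compiles and reinterprets the bits, while `1_u32.cast::<i64>()` is rejected because no `CastTo<i64>` impl exists for `u32`.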

```rust
fn zero_extend<U>(self) -> U
```

- Extends an integer type into a larger integer type by filling in the high bits with zeros
- Only exists for pairs of integers where the target type is strictly larger than the self type and the signedness does not change (`u8` -> `u16`, `i32` -> `i64`, etc)
- Does not exist for same size, smaller size, or changed signs
    - COUNTEREXAMPLES: `i8` -> `u16`, `u8` -> `i8`, `u64` -> `u32`
- **Does not** preserve numerical value for negative values of *signed* types (this should be documented with a big noticeable red flashy block)
- **Always** preserves numerical value for *unsigned* types
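
A sketch of how `zero_extend` could behave, assuming the same helper-trait approach as a stand-in for the real implementation. The trait names and the macro are hypothetical, and only a handful of the proposed pairs are implemented to keep it short.

```rust
/// Hypothetical helper trait, implemented only for the allowed pairs.
trait ZeroExtendTo<U> {
    fn zero_extend_to(self) -> U;
}

/// Hypothetical extension trait that provides `x.zero_extend::<U>()`.
trait ZeroExtend: Sized {
    fn zero_extend<U>(self) -> U
    where
        Self: ZeroExtendTo<U>,
    {
        <Self as ZeroExtendTo<U>>::zero_extend_to(self)
    }
}
impl<T> ZeroExtend for T {}

// `$src, $bits => $dst`: go through `$bits`, the unsigned type with the same
// width as `$src`, so the widening `as` cast fills the high bits with zeros.
macro_rules! impl_zero_extend {
    ($($src:ty, $bits:ty => $dst:ty);*) => {$(
        impl ZeroExtendTo<$dst> for $src {
            fn zero_extend_to(self) -> $dst { self as $bits as $dst }
        }
    )*};
}

// Only a few of the proposed pairs, for illustration.
impl_zero_extend!(u8, u8 => u16; u8, u8 => u32; i8, u8 => i16; i16, u16 => i64);
```

Here `(-1_i8).zero_extend::<i16>()` yields `255_i16`: the bit pattern `0xFF` is kept and the new high byte is zero, so the negative value is not preserved.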

```rust
fn sign_extend<U>(self) -> U
```

- Extends an integer type into a larger integer type by filling in the high bits with copies of the sign bit of the self type
- Only exists for pairs of *signed* integers where the target type is strictly larger than the self type (`i8` -> `i16`, `i16` -> `i128`)
- Does not exist for equal or smaller target types, does not exist to change sign
    - COUNTEREXAMPLES: `i8` -> `i8`, `i64` -> `i16`, `i8` -> `u128`
- Does not exist for unsigned integers (there's no sign to extend), use `zero_extend` instead (this should be documented)
- **Always** preserves numerical value as a result of integers using 2's complement (this should be documented)
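
To make "filling in the high bits with copies of the sign bit" concrete, here is a hand-written sketch for the single pair `i8` -> `i32`. The function name is hypothetical; a real implementation would most likely just use a widening `as` cast.

```rust
/// Sign extends an `i8` into an `i32` with explicit bit operations.
/// (Hypothetical helper, for illustration only.)
fn sign_extend_i8_to_i32(x: i8) -> i32 {
    let low = u32::from(x as u8); // source bits in the low byte, high bits zero
    let sign_set = (low & 0x80) != 0; // sign bit of the source type
    // Copy the sign bit into every one of the 24 new high bits.
    let high: u32 = if sign_set { 0xFFFF_FF00 } else { 0 };
    (high | low) as i32 // same-size reinterpretation of the result
}
```

Because of 2's complement, the result has the same numerical value as the input for every `i8`.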

```rust
fn truncate<U>(self) -> U
```

- Converts from one integer type into a smaller integer type by truncating the high bits
- Only exists for pairs of integers where the target type is strictly smaller than the self type and where the signedness of the integers does not change (`u64` -> `u16`, `i128` -> `i32`, etc)
- Does not exist for same size or larger target types as there is no truncating operation, use `zero_extend` or `sign_extend` instead (should be documented)
- Does not exist for converting signs, use `cast` instead (should be documented)
- **Does not** necessarily preserve numerical value for any types (should be documented)
- The reason for not allowing signedness changes is to prevent some surprising behavior. For example, `-1_i16` truncated to `u8` directly via `as`-casting would be `255_u8`. All sign-changing behavior should be done with `cast`: `(-1_i16).truncate::<i8>().cast::<u8>() == 255_u8` (the parentheses are needed because a method call binds tighter than unary minus)
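
The two-step chain from the last bullet can be sketched as follows. The traits `TruncateTo`, `CastTo`, and `IntOps` are hypothetical stand-ins for the proposed methods, and only the impls needed for this example are written out.

```rust
/// Hypothetical helper traits, implemented only for the allowed pairs.
trait TruncateTo<U> {
    fn truncate_to(self) -> U;
}
trait CastTo<U> {
    fn cast_to(self) -> U;
}

/// Hypothetical extension trait providing `truncate` and `cast` as methods.
trait IntOps: Sized {
    fn truncate<U>(self) -> U
    where
        Self: TruncateTo<U>,
    {
        <Self as TruncateTo<U>>::truncate_to(self)
    }
    fn cast<U>(self) -> U
    where
        Self: CastTo<U>,
    {
        <Self as CastTo<U>>::cast_to(self)
    }
}
impl<T> IntOps for T {}

// Truncation keeps the signedness; only `cast` may change it.
impl TruncateTo<i8> for i16 { fn truncate_to(self) -> i8 { self as i8 } }
impl TruncateTo<u8> for u16 { fn truncate_to(self) -> u8 { self as u8 } }
impl CastTo<u8> for i8 { fn cast_to(self) -> u8 { self as u8 } }

/// The explicit two-step replacement for `val as u8`.
fn i16_low_byte(val: i16) -> u8 {
    val.truncate::<i8>().cast::<u8>()
}
```

In this sketch `val.truncate::<u8>()` on an `i16` does not compile, because there is deliberately no `TruncateTo<u8>` impl for `i16`.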

#### Interactions with `usize` and `isize`

`usize` and `isize` have target-dependent widths, which complicates interactions with them. To keep the methods consistent across targets and avoid introducing more surprising behavior, `usize` and `isize` can only be *truncated to* a `u8` or `i8` and can only be *extended from* a `u8` or `i8`. The `cast` method treats `usize` and `isize` as the same size as each other, but not as the same size as any other type (even if they happen to match on the current target). See below or the full implementation list for more details.

The reasoning behind this specific choice is that the minimum possible size for `usize` and `isize` is 16 bits, but there is no maximum size. `usize` and `isize` therefore cannot be reliably truncated to any type larger than 8 bits (they might not be large enough to truncate), and no type larger than 8 bits can be reliably extended into them (they might not be large enough to hold the source). `usize` and `isize` cannot be extended into any type, because the target type cannot reliably be larger than the source.

Even though `usize` and `isize` are always at least 16 bits wide, they cannot be truncated *to* 16-bit integers or extended *from* 16-bit integers, because these operations would be a no-op on some targets but not on others.
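
The target dependence can be made visible with a small sketch that reports which operation `usize as uN` performs on the current target. The enum and function are hypothetical names used only for illustration; `usize::BITS` is the existing associated constant.

```rust
use std::cmp::Ordering;

/// The operation that an `as`-cast from `usize` performs.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum AsCastOp {
    Truncate,
    NoOp,
    Extend,
}

/// Reports what `usize as uN` does on the *current* target, where
/// `target_bits` is the width of `uN`.
fn usize_as_op(target_bits: u32) -> AsCastOp {
    match usize::BITS.cmp(&target_bits) {
        Ordering::Greater => AsCastOp::Truncate,
        Ordering::Equal => AsCastOp::NoOp,
        Ordering::Less => AsCastOp::Extend,
    }
}
```

Only the answer for 8 bits is the same on every target (`Truncate`, since `usize` is at least 16 bits wide). For 16, 32, or 64 bits the answer depends on the target, which is why the proposal leaves those pairs out.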

## Supported Operations
<details>
    <summary>Click to open (warning: long)</summary>

`cast`:

- `u8` -> `u8`
- `i8` -> `i8`
- `u8` -> `i8`
- `i8` -> `u8`
- `u16` -> `u16`
- `i16` -> `i16`
- `u16` -> `i16`
- `i16` -> `u16`
- `u32` -> `u32`
- `i32` -> `i32`
- `u32` -> `i32`
- `i32` -> `u32`
- `u64` -> `u64`
- `i64` -> `i64`
- `u64` -> `i64`
- `i64` -> `u64`
- `u128` -> `u128`
- `i128` -> `i128`
- `u128` -> `i128`
- `i128` -> `u128`
- `usize` -> `usize`
- `usize` -> `isize`
- `isize` -> `isize`
- `isize` -> `usize`

`zero_extend`:

- `u8` -> `u16`
- `u8` -> `u32`
- `u8` -> `u64`
- `u8` -> `u128`
- `i8` -> `i16`
- `i8` -> `i32`
- `i8` -> `i64`
- `i8` -> `i128`
- `u16` -> `u32`
- `u16` -> `u64`
- `u16` -> `u128`
- `i16` -> `i32`
- `i16` -> `i64`
- `i16` -> `i128`
- `u32` -> `u64`
- `u32` -> `u128`
- `i32` -> `i64`
- `i32` -> `i128`
- `u64` -> `u128`
- `i64` -> `i128`
- `u8` -> `usize`
- `i8` -> `isize`

`sign_extend`:

- `i8` -> `i16`
- `i8` -> `i32`
- `i8` -> `i64`
- `i8` -> `i128`
- `i16` -> `i32`
- `i16` -> `i64`
- `i16` -> `i128`
- `i32` -> `i64`
- `i32` -> `i128`
- `i64` -> `i128`
- `i8` -> `isize`

`truncate`:

- `u16` -> `u8`
- `i16` -> `i8`
- `u32` -> `u8`
- `u32` -> `u16`
- `i32` -> `i8`
- `i32` -> `i16`
- `u64` -> `u8`
- `u64` -> `u16`
- `u64` -> `u32`
- `i64` -> `i8`
- `i64` -> `i16`
- `i64` -> `i32`
- `u128` -> `u8`
- `u128` -> `u16`
- `u128` -> `u32`
- `u128` -> `u64`
- `i128` -> `i8`
- `i128` -> `i16`
- `i128` -> `i32`
- `i128` -> `i64`
- `usize` -> `u8`
- `isize` -> `i8`
</details>

## Comparison to `as`-casts
For reference, here are the behaviors of `as`-casts on integer types:

- Casting between two integers of the same size (e.g. `i32` -> `u32`) is a no-op (Rust uses 2's complement for negative values of fixed-width integers)
- Casting from a larger integer to a smaller integer (e.g. `u32` -> `u8`) will truncate
- Casting from a smaller integer to a larger integer (e.g. `u8` -> `u32`) will
  - zero-extend if the source is unsigned
  - sign-extend if the source is signed
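
These rules can be checked with a few one-line wrappers. The function names are made up for this sketch; each body is a plain `as`-cast.

```rust
/// Same size: the bits are reinterpreted.
fn same_size(x: i32) -> u32 {
    x as u32
}

/// Larger to smaller: the high bits are dropped.
fn narrow(x: u32) -> u8 {
    x as u8
}

/// Smaller to larger, unsigned source: zero extension.
fn widen_unsigned(x: u8) -> u32 {
    x as u32
}

/// Smaller to larger, signed source: sign extension.
fn widen_signed(x: i8) -> i32 {
    x as i32
}
```

For example, `same_size(-1)` is `u32::MAX`, `narrow(0x1234_5678)` is `0x78`, and `widen_signed(-1)` is still `-1`.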

Every operation currently possible with `as`-casts can be replicated by chaining at most two of these functions, except for certain operations with `usize` and `isize` (see above). Examples (all types are written out explicitly; type inference may make this cleaner):

- `u32 as i32` becomes `u32.cast::<i32>()`
- `i16 as u8` becomes `i16.truncate::<i8>().cast::<u8>()`
- `u128 as u32` becomes `u128.truncate::<u32>()`
- `u32 as i64` becomes `u32.zero_extend::<u64>().cast::<i64>()`
- `i64 as u128` becomes `i64.sign_extend::<i128>().cast::<u128>()`
  - however this API allows for the following *alternate* behavior, which is only possible via multiple `as`-casts
  - `i64.zero_extend::<i128>().cast::<u128>()` which zero extends the value rather than sign extends.  The equivalent behavior with `as`-casts is `i64 as u64 as u128`
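
The difference between the last two chains can be sketched with `as`-casts standing in for the proposed methods. The function names are hypothetical and the comments mark which proposed call each step corresponds to.

```rust
/// Equivalent of `i64 as u128`: sign extend, then reinterpret.
fn sign_extend_then_cast(x: i64) -> u128 {
    let wide: i128 = x as i128; // stands in for `x.sign_extend::<i128>()`
    wide as u128 // stands in for `.cast::<u128>()`
}

/// Equivalent of `i64 as u64 as u128`: zero extend, then reinterpret.
fn zero_extend_then_cast(x: i64) -> u128 {
    let wide: i128 = x as u64 as i128; // stands in for `x.zero_extend::<i128>()`
    wide as u128 // stands in for `.cast::<u128>()`
}
```

For `-1_i64` the first function returns `u128::MAX` and the second returns `u64::MAX as u128`; for non-negative inputs both agree.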

## Links and related work



~~Entirely supersedes https://github.com/rust-lang/libs-team/issues/183.  `to_signed` and `to_unsigned` are representable with `cast`.~~ edit: this is not entirely true, there's specific macro cases that are much harder to represent.

## What happens now?

This issue is part of the libs-api team [API change proposal process]. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.

[API change proposal process]: https://std-dev-guide.rust-lang.org/feature-lifecycle/api-change-proposals.html
