Skip to content

Get rid of new_uninitialized and new_uninitialized_generic. #556

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
sebcrozet opened this issue Feb 23, 2019 · 9 comments
Open

Get rid of new_uninitialized and new_uninitialized_generic. #556

sebcrozet opened this issue Feb 23, 2019 · 9 comments
Labels
breaking change Fixing this issue, or merging this pull-request is likely to require breaking changes enhancement P-medium Medium priority

Comments

@sebcrozet
Copy link
Member

sebcrozet commented Feb 23, 2019

The two unsafe functions Matrix::new_uninitialized and Matrix::new_uninitialized_generic are used when we don't care about the initial value of a matrix/vector because they will be overwritten by the result of other operations. We use them mostly for performance reasons to avoid an unnecessary initialization step.

However, this is the most significant blocker for #282. So if we ever want to support non-Copy types, this is the first thing to address in a way that does not hurt performance for Copy types.

@sebcrozet sebcrozet added enhancement P-medium Medium priority breaking change Fixing this issue, or merging this pull-request is likely to require breaking changes labels Feb 23, 2019
@sebcrozet sebcrozet changed the title Get rid of new_uninitialized and new_uninitialized_generic. Get rid of new_uninitialized and new_uninitialized_generic. Feb 23, 2019
@aweinstock314
Copy link
Contributor

I'm interested in working on this, as I want non-Copy scalars (#282) for matrices of gmp::Mpz for some cryptanalysis tooling that I'm writing. I've gotten the benchmarks running locally (requires PR #678) so I can measure performance impact.

It looks like currently like new_uninitialized is implemented (by the macros impl_constructors and componentwise_constructors_impl) in terms of MatrixMN::new_uninitialized_generic, which in turn calls DefaultAllocator::allocate_uninitialized. DefaultAllocator::allocate_uninitialized calls mem::uninitialized if the dimensions are static, and Vec::{reserve_exact,set_len} if either dimension is dynamic.

In order to support non-Copy scalars while still using the above scheme for Copy scalars (for performance), it seems necessary to add another dimension to this type-case (e.g. go from {static, dynamic} x {static, dynamic} to {Copy, !Copy} x {static, dynamic} x {static, dynamic} for Allocator implementations.

I've tried the following sketches of some trait-level approaches to allowing this branching, but neither quite work:

mod bools {
    pub enum True {}
    pub enum False {}
    pub trait TypeLevelBool {}
    impl TypeLevelBool for True {}
    impl TypeLevelBool for False {}
    pub trait TraitLevelTrue: TypeLevelBool {}
    impl TraitLevelTrue for True {}
    pub trait TraitLevelFalse: TypeLevelBool {}
    impl TraitLevelFalse for False {}
}
use bools::{TypeLevelBool, True, False};

pub unsafe trait IsUninitializable { type Value: TypeLevelBool; }
unsafe impl IsUninitializable for u8 { type Value = True; }
unsafe impl IsUninitializable for Vec<u8> { type Value = False; }
pub trait AllocatorIsh<S: IsUninitializable> {
    unsafe fn allocate_maybe_uninitialized(n: usize) -> Vec<S>;
}
pub struct DefaultAllocatorIsh;

// the following attempt fails because equality bounds aren't yet supported, but if they were, this would guarantee non-overlap
#[cfg(attempt1)]
mod attempt1 {
    use super::*;
    impl<S: IsUninitializable> AllocatorIsh<S> for DefaultAllocatorIsh where <S as IsUninitializable>::Value = True {
        unsafe fn allocate_maybe_uninitialized(n: usize) -> Vec<S> {
            let mut ret = Vec::new();
            ret.reserve_exact(n);
            ret.set_len(n);
            ret
        }
    }
    impl AllocatorIsh<Vec<u8>> for DefaultAllocatorIsh where <Vec<u8> as IsUninitializable>::Value = False {
        unsafe fn allocate_maybe_uninitialized(n: usize) -> Vec<Vec<u8>> {
            vec![vec![]; n]
        }
    }
}

// the following attempt fails to compile because from rustc's point of view, bools could semver-compatibly implement TraitLevelFalse for True, so this doesn't guarantee non-overlap
#[cfg(attempt2)]
mod attempt2 {
    use super::*;
    use bools::{TraitLevelTrue, TraitLevelFalse};
    impl<S: IsUninitializable> AllocatorIsh<S> for DefaultAllocatorIsh where <S as IsUninitializable>::Value: TraitLevelTrue {
        unsafe fn allocate_maybe_uninitialized(n: usize) -> Vec<S> {
            let mut ret = Vec::new();
            ret.reserve_exact(n);
            ret.set_len(n);
            ret
        }
    }
    impl AllocatorIsh<Vec<u8>> for DefaultAllocatorIsh where <Vec<u8> as IsUninitializable>::Value: TraitLevelFalse {
        unsafe fn allocate_maybe_uninitialized(n: usize) -> Vec<Vec<u8>> {
            vec![vec![]; n]
        }
    }
}

fn main() {}

Does this seem like a good approach, in the sense that if the trait metaprogramming can be made to work, I'm not missing any performance/safety concerns?

@aweinstock314
Copy link
Contributor

I got a variant of the first approach working, and added more comments:

mod bools {
    pub enum True {}
    pub enum False {}
    pub trait TypeLevelBool {}
    impl TypeLevelBool for True {}
    impl TypeLevelBool for False {}
}
use bools::{TypeLevelBool, True, False};

pub unsafe trait IsUninitializable { type Value: TypeLevelBool; }
// examples of Copy types that should use uninitialized
unsafe impl IsUninitializable for u8 { type Value = True; }
unsafe impl IsUninitializable for u16 { type Value = True; }
// example of a !Copy type that shouldn't use uninitialized
unsafe impl IsUninitializable for Vec<u8> { type Value = False; }

// AllocatiorIsh is Allocator specialized to the dynamic case, as mem::uninitialized isn't used explicitly, but is implicit in the use of set_len
// this should extend to cross-product impls for static and dynamic dimensions straightforwardly
pub trait AllocatorIsh<S: IsUninitializable> {
    unsafe fn allocate_maybe_uninitialized(n: usize) -> Vec<S>;
}
pub struct DefaultAllocatorIsh;

mod attempt3 {
    use super::*;
    // any type can opt into the optimized case by `unsafe impl`-ing IsInitializable<Value=True>
    impl<S: IsUninitializable<Value=True>> AllocatorIsh<S> for DefaultAllocatorIsh {
        unsafe fn allocate_maybe_uninitialized(n: usize) -> Vec<S> {
            let mut ret = Vec::new();
            ret.reserve_exact(n);
            ret.set_len(n);
            ret
        }
    }
    // for IsUninitializable<Value=False> types, instances can be given explicitly, with ad-hoc initialization, by downstream crates
    impl AllocatorIsh<Vec<u8>> for DefaultAllocatorIsh {
        unsafe fn allocate_maybe_uninitialized(n: usize) -> Vec<Vec<u8>> {
            vec![vec![]; n]
        }
    }
}

fn main() {}

@aweinstock314
Copy link
Contributor

On further reflection, it looks like the point of std::mem::MaybeUninit is exactly for this use case (surpressing UB on uninitialized values with references in them). I'll look into how to make Allocator::allocate_uninitialied return a Buffer with N=MaybeUniit<S>. If it works out, I think this approach will requires less boilerplate from downstream crates.

@sebcrozet
Copy link
Member Author

sebcrozet commented Nov 21, 2019

@aweinstock314 The proposal based on Isuninitializable is interesting, though it feels wrong to require all the users of matrices with custom types to implement an unsafe trait. In addition the current way the Allocator trait is implemented is already quite difficult as it requires to deal with several combinations of parameters (see there). Adding a third dimension will likely make this even more complicated.

The actual solution would be to find a way to leverage the MaybeUninit type as you suggest.

I'll look into how to make Allocator::allocate_uninitialied return a Buffer with N = MaybeUninit<S>

Returning a buffer with a scalar type equal to MaybeUninit<N> (it's better avoid using S as a type parameter because it is already used for the generic Storage throughout nalgebra) may be a good idea. Though is there any way to get a Matrix<N, ...> from a Matrix<MaybeUninit<N>, ...> and viceversa? The documentation appears to say that transmuting a [MaybeUninit<Vec<...>>] to a [Vec<...>] is sound as long as all the elements have been initialized. Is it still the case for our matrix types?

@aweinstock314
Copy link
Contributor

#684 supports non-Copy scalars well enough to typecheck some (currently private) code that uses gmp::Mpz (with a newtype wrapper to implement Scalar). At runtime, there's a double-free in malloc in gmp::Mpz::drop, which is likely caused by mem::uninitialized, so fixing this is my next task.

@aweinstock314
Copy link
Contributor

Could I get feedback on the current design?

The core change is adding an UninitBuffer type to the Allocator trait: https://github.com/aweinstock314/nalgebra/blob/649507db1daf4c6aac55d9ebc33371cf86cd7d58/src/base/allocator.rs#L23-L26 . This type "lifts" mem::MaybeUninit to the Storage level. allocate _uninitialized doesn't yet use it; making that change typecheck will involve fixing all the uses of uninitialized memory in the codebase.

With respect to the soundness of converting between Matrix<N, ...> and Matrix<MaybeUninit<N>, ...> (which in the current approach, would be between Matrix<N, R, C, FooStorage<N, R, C, InitializedTag>> and Matrix<N, R, C, FooStorage<N, R, C, UninitializedTag>>), that should follow for the same reasons that it works for Vec. I'll add helpers to convert between the two as part of the refactoring to be triggered by changing the return type of Allocator::allocate_uninitialized.

The definition of UninitBuffer requires some extra machinery to be added to the Storage trait, that's done here: https://github.com/aweinstock314/nalgebra/blob/649507db1daf4c6aac55d9ebc33371cf86cd7d58/src/base/storage.rs#L37-L65 .

Most of the rest of the changes already done to this branch involve getting {ArrayStorage, VecStorage, SliceStorage} to work with the new definitions. This includes some mostly-duplicate definitions of {,Contiguous}Storage{,Mut} across {InitializedTag, UninitializedTag} which might be worth defining via macro_rules.

This is currently blocked waiting for a fix for a rustc ICE (probably a duplicate of rust-lang/rust#68090, which already has a fix in progress).

@sebcrozet
Copy link
Member Author

This looks like a great approach!
I believe allocate_uninitialized(), just like MaybeUninit::uninit(), does not have to be unsafe.

Once the trait architecture woks, the main challenge will be to make the methods from the base::blas module work (starting with the simplest method: axpy). In particular the transition between initialized and uninitialized types may not be so easy to handle transparently.

@RalfJung
Copy link

Heads-up, in rustc we are trying to better detect UB due to misuse of uninitialized memory (rust-lang/rust#71274), and I think this issue came up as a regression for the next stage of that check. Specifically, this crater regression looks to me like it is caused by these uninitialized methods being insta-UB on a type that involves bool in its name -- and indeed, and uninitialized bool is UB.

@RalfJung
Copy link

Ah, but you are using the dreadful MaybeUninit::uninit().assume_init() and thus side-stepping our ability to detect this problem. So you are "good"... in a sense. ;)

aweinstock314 added a commit to aweinstock314/nalgebra that referenced this issue Nov 27, 2020
…allocate_uninitialized` and `Matrix::new_uninitialized_generic`.

Most call sites still invoke UB through `assume_init`. Said call sites instead invoke `unimplemented!()` if the `no_unsound_assume_init` feature is enabled, to make it easier to gradually fix them.

Progress towards dimforge#556.
aweinstock314 added a commit to aweinstock314/nalgebra that referenced this issue Nov 27, 2020
…allocate_uninitialized` and `Matrix::new_uninitialized_generic`.

Most call sites still invoke UB through `assume_init`. Said call sites instead invoke `unimplemented!()` if the `no_unsound_assume_init` feature is enabled, to make it easier to gradually fix them.

Progress towards dimforge#556.
aweinstock314 added a commit to aweinstock314/nalgebra that referenced this issue Nov 27, 2020
…allocate_uninitialized` and `Matrix::new_uninitialized_generic`.

Most call sites still invoke UB through `assume_init`. Said call sites instead invoke `unimplemented!()` if the `no_unsound_assume_init` feature is enabled, to make it easier to gradually fix them.

Progress towards dimforge#556.
sebcrozet pushed a commit to aweinstock314/nalgebra that referenced this issue Feb 25, 2021
…allocate_uninitialized` and `Matrix::new_uninitialized_generic`.

Most call sites still invoke UB through `assume_init`. Said call sites instead invoke `unimplemented!()` if the `no_unsound_assume_init` feature is enabled, to make it easier to gradually fix them.

Progress towards dimforge#556.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change Fixing this issue, or merging this pull-request is likely to require breaking changes enhancement P-medium Medium priority
Projects
None yet
Development

No branches or pull requests

3 participants