Skip to content

Functional system redesign #57

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 7, 2018
Merged

Functional system redesign #57

merged 8 commits into from
Mar 7, 2018

Conversation

novacrazy
Copy link
Collaborator

@novacrazy novacrazy commented Mar 5, 2018

This explanation is going to be a bit wild.

Basically, my entire motivation for this and the previous work with zip, map and so forth was to organize safe operations in a way conducive to compiler optimizations, specifically auto-vectorization.

Unfortunately, this only seems to work on slices with a known size at compile-time. I guess because they are an intrinsic type. Any and all attempts to get a custom iterator to optimize like that has failed, even with unstable features.

Even though they technically worked, I wasn't happy with how the previous work did functional operations, with map/map_ref and zip/zip_ref. It felt a bit unintuitive. They were also strictly attached to the GenericArray type, so they were useless with GenericSequence by itself, like with generics.

So, I've redefined GenericSequence like this:

pub unsafe trait GenericSequence<T>: Sized + IntoIterator {
    type Length: ArrayLength<T>;
    type Sequence: GenericSequence<T, Length=Self::Length> + FromIterator<T>;

    fn generate<F>(f: F) -> Self::Sequence
        where F: Fn(usize) -> T;
}

where Sequence is defined as Self for the GenericArray implementation.

That may seem redundant, but now GenericSequence is broadly implemented for &'a S and &'a mut S, and carries over the same Sequence.

So:

<&'a S as GenericSequence<T>>::Sequence == <S as GenericSequence<T>>::Sequence

Furthermore, IntoIterator is now implemented for &'a GenericArray<T, N> and &'a mut GenericArray<T, N>, where both of those implementations use slice iterators, and each reference type automatically implements GenericSequence<T>

Next, I've added a new trait called MappedGenericSequence, which looks like:

pub unsafe trait MappedGenericSequence<T, U>: GenericSequence<T>
where
    Self::Length: ArrayLength<U>,
{
    type Mapped: GenericSequence<U, Length=Self::Length>;
}

and the implementation of that for GenericArray is just:

unsafe impl<T, U, N> MappedGenericSequence<T, U> for GenericArray<T, N>
where
    N: ArrayLength<T> + ArrayLength<U>,
    GenericArray<U, N>: GenericSequence<U, Length=N>,
{
    type Mapped = GenericArray<U, N>;
}

As you can see, it just defines another arbitrary GenericArray with the same length. The transformation allows for proving one GenericArray can be created from another, which leads into the FunctionalSequence trait.

You can see the default implementation for it in src/functional.rs, which uses the fact that any GenericSequence is IntoIterator and the associated Sequence is FromIterator to map/zip sequences using only simple iterators.

FunctionalSequence is also automatically implemented for &'a S and &'a mut S where S: GenericSequence<T>, so they automatically work with &GenericArray as well.

Furthermore, it's implemented directly on GenericArray as well, which uses the ArrayConsumer system to provide a lightweight and optimizable implementation, rather than relying on GenericArrayIter, which cannot be optimized.

As a result, code like in the assembly test:

let a = black_box(arr![i32; 1, 3, 5, 7]);
let b = black_box(arr![i32; 2, 4, 6, 8]);

let c = a.zip(&b, |l: i32, r: &i32| l + r);

assert_eq!(c, arr![i32; 3, 7, 11, 15]);

will correctly be optimized into a single VPADDD instruction, just as desired.

The downside of this is that non-reference RHS arguments will kill this optimization, because it will use .into_iter() and GenericArrayIter. There really isn't a good way around this currently. I found a good way around this currently.

The upside of all of this is that pass any random GenericSequence without knowing the length is finally feasible, as shown in tests/std.rs, and here:

pub fn test_generic<S>(s: S)
    where
        S: FunctionalSequence<i32>,            // `.map`
        SequenceItem<S>: Add<i32, Output=i32>, // `+`
        S: MappedGenericSequence<i32, i32>,    // `i32` -> `i32`
        MappedSequence<S, i32, i32>: Debug     // println!
{
    let a = s.map(|x| x + 1);

    println!("{:?}", a);
}

Which still has zero runtime length checking, but we've avoided having to know the length of the sequence. Furthermore, now test_generic can work for GenericArray, &GenericArray and &mut GenericArray with no problems.

BREAKING CHANGES:

  • The implementation of FromIterator for GenericArray now panics when the given iterator doesn't produce enough elements, wherein before it padded it with defaults.
  • map_ref and zip_ref are gone, replaced with the new functional system.
  • map/zip can fail to optimize unless used with references. Fixed
    • I should note that auto-vectorization only works on numbers anyway, so it's no worse than vec::IntoIter in the worst case.
  • 
    

What do you think? Perhaps I should write up some examples for the docs, too?

If I failed to explain anything, made a mistake or could improve on anything, please let me know. I just want to make the best things I can.

@fizyk20
Copy link
Owner

fizyk20 commented Mar 5, 2018

Whoah. I need to take some time to digest this, but I admire your commitment ;) Looks like a nice idea, though - making the crate cooperate with the compiler optimizations better will certainly be a plus.

@novacrazy
Copy link
Collaborator Author

Oh, I did forget to mention two things:

  1. from_slice and from_mut_slice are now wrappers around the From trait, where the real casting is now implemented. This doesn't really affect anything.
  2. map_slice is replaced with GenericArray::from_iter(slice.iter().map(...)). The only difference with that is pre-check vs. post-check of the length, but I don't think anyone will be using this for arrays so large that there will be any delay in panicking. from_iter is just more flexible.

Also, from_iter and GenericArrayIter were improved the best I could.

@novacrazy
Copy link
Collaborator Author

I think I need to take some extra time to optimize the case where both self and rhs are GenericArray, not references. The more I think about it the more it seems necessary.

@novacrazy
Copy link
Collaborator Author

novacrazy commented Mar 7, 2018

zip now optimizes correctly with most kinds of argument for self and rhs, with almost no further visible changes.

This is done via the only kind of specialization Rust currently supports, which is a default implementation of a hidden function on GenericSequence.

The added hidden function, inverted_zip, is zip with the arguments switched around, so self is the right argument. The 'left' argument is a GenericArray. The default implementation will use optimized iteration for the GenericArray and into_iter for self.

However, when the GenericSequence trait is implemented for GenericArray, we can override inverted_zip where self is now another GenericArray, and we can use optimized iteration for both of them.

This is in addition to the default implementation of zip on the FunctionalSequence trait. So altogether there are three versions of the zip function that blend together to handle almost any mix of GenericArray's and references.

EDIT: After I posted this I realized there is still that one mix of arguments that cannot be optimized, even with the new technique. Rust's lack of real specialization makes it even more difficult. The only one that isn't optimized is (&a).zip(b, ...).

I'll try to find a way to do that one, too.

Added missing MappedGenericSequence implementation for &mut S
@novacrazy
Copy link
Collaborator Author

novacrazy commented Mar 7, 2018

So it turns out that I could use that exact same trick to specialize the default implementation of FunctionalSequence::zip, and now every combination of:

  • a.zip(b, ...)
  • a.zip(&b, ...)
  • (&a).zip(b, ...)
  • (&a).zip(&b, ...)

and their mutable equivalents is correctly optimized.

No compromises for real this time, I hope.

@novacrazy
Copy link
Collaborator Author

novacrazy commented Mar 7, 2018

Is there anything else you'd like me to add? Perhaps a fold method while we're at it?

EDIT: It was so simple I went ahead and added it anyway. I can't think of anything else to add or improve, but if you have any thoughts I'd love to hear them!

@fizyk20
Copy link
Owner

fizyk20 commented Mar 7, 2018

This is awesome. I don't have any other ideas, except maybe filter or filter_map, but I don't think this will be possible (can't know the size of the result at compile time). So if you think you're done, I'm glad to merge this - and bump the version to 0.11, I guess, since some changes here are possibly breaking.

@novacrazy
Copy link
Collaborator Author

Yeah, those wouldn’t be possible because of the compile time length.

However, if it sounds like a good idea to you, after this is merged I can open a new PR to add standard library support with a cargo feature. Or rather, disable it with a no_std feature to match up with how other crates do it.

Then we can have Vec specific methods like efficiently creating one from a generic array instead of using the iterators, and even filter and filter_map that return results in a Vec.

@novacrazy
Copy link
Collaborator Author

Alright, this is ready to merge. I also have the std stuff ready to go when that's done.

@fizyk20
Copy link
Owner

fizyk20 commented Mar 7, 2018

Awesome! I'll merge it.

Regarding the std stuff - the crate is currently no-std by default and I think it's more fitting for something so basic... but then again, it's true that other crates usually use std by default and add a feature for removing the dependency. That would be another big breaking change, though - which I'd prefer to avoid, even though we're still 0.x. I'm not sure which approach is better, to be honest.

Anyway, we can discuss it separately - for now, thanks a lot for this PR, it is a piece of awesome work :)

@fizyk20 fizyk20 merged commit d3c1b1d into fizyk20:master Mar 7, 2018
@novacrazy
Copy link
Collaborator Author

So while browsing the auto-generated documentation I noticed that inverted_zip and inverted_zip2 were not properly hidden when overridden in the GenericArray implementation, even though they are correctly hidden in the trait definition.

So either you can do it or I can open a new pull request, but #[doc(hidden)] needs to be added to those two functions in src/lib.rs.

Sorry about that.

@fizyk20
Copy link
Owner

fizyk20 commented Mar 8, 2018

No worries, I'll add this. Nice catch!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants