Skip to content

Remove One #61

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Remove One #61

wants to merge 2 commits into from

Conversation

willtebbutt
Copy link
Member

@willtebbutt willtebbutt commented Nov 1, 2019

What it says on the tin. Currently getting a load of errors because I'm trying to extern a Wirtinger somewhere. @oxinabox could do with your assistance here. Any idea what's going on? Is our overly-aggressive externing causing problems maybe?

edit: the surprisingly large number of changed line is due to sorting out some formatting issues.

@oxinabox
Copy link
Member

oxinabox commented Nov 1, 2019

I will checkout your branch and give it a shot now

@willtebbutt
Copy link
Member Author

Please feel free to push directly. I'll leave it alone for now.

@simeonschaub
Copy link
Member

Could you give a bit of explanation of why we would want this and what problems One caused for you? It is probably worth discussing this in an isssue first, because, although probably not needed in quite as many places, I think One is still very useful. One is used in quite a few places in ChainRules.jl, it lets us use @scalar_rule in a lot of places, where we would otherwise either need to write rules manually or have an additional multiplication, and it's also very useful as a seed. We would otherwise have to "abuse" true in a lot of those places, which could lead to additional unnecessary allocations with arrays, among other potential problems.

@oxinabox
Copy link
Member

oxinabox commented Nov 1, 2019

Please feel free to push directly. I'll leave it alone for now.

Cause is that Extern is recursive,
the definitions of + and * should use unthunk

@willtebbutt
Copy link
Member Author

willtebbutt commented Nov 1, 2019

Ah, yes, sorry for not raising an issue to discuss this first. @oxinabox and I spent the day thinking about this and related issues yesterday and thought it would be a good idea to remove One for the following reasons:

  • it's not particularly special as a Differential, in that the operations that you can perform on Differentials almost always take you away from One. i.e. adding, scaling, and applying a linear map to them (which are the only three things you can really do on Differentials in a meaningful way) all generally require that you first extern your One and then do stuff with it. Since extern isn't well-defined in general (what's it a One w.r.t?) you can't really do anything with them.
  • I question how useful they actually are for seeding. It's not obvious to me when one would wish to use them as such in reverse-mode, while in forwards-mode you generally (if I've understood correctly) are interested in sparsity structure, which is given by Zero (which really is a useful thing) rather than locations of Ones.
  • when working with scalar rules, does one(x) not sufficient?

We would otherwise have to "abuse" true in a lot of those places, which could lead to additional unnecessary allocations with arrays, among other potential problems.

Could you elaborate a bit on this please? When does having a One element actually save you from allocating?

@oxinabox
Copy link
Member

oxinabox commented Nov 1, 2019

Could you give a bit of explanation of why we would want this and what problems One caused for you? It is probably worth discussing this in an isssue first, because, although probably not needed in quite as many places, I think One is still very useful.

One isn't actually a valid differential in general.
In contrast to Zero() which is always value

Part of my little differential types manifesto:
Note that this terminology its not written with reference to literature or even code really.

There are 3 kinds of things;

  • Primal types, (which may or may not also be valid differentials)
  • Differential types (which may or may not subtype AbstractDifferential)
  • Scale factor types (but we won't talk more on them as they are isomorphic to reals)

Primal Types exist, they have very few requirements on them.

Differential types exist, and they corespond to being valid for one or more Primal Type.
The most important rule is that for a differential type D to be valid for a primal type P,
then: ** for all p oftype P, for all d oftype D it must be true that p + d oftype P.
*

I thought this was enough at first, but it isn't.
They need a weaker version of closed under addition with themselves.
In particular:

  • for D a differential type that is valid for a primal type P
  • and for x and y oftype D,
  • then x+y must also be of some differential type that is valid for P. (often but not always D)
    We need this as for AD one wants to accumulate up gradients,
    and the accumulation must be of a differential type

Now, right now (without this PR) One() is a valid differential for scalars only.
Which is not that special, since true is also that.
as @simeonschaub pointed out a while ago we want One to be I for matrixes,
and

One can't be a valid differential for the Primals of Matrixes and for Scalars.
Because if it is valid for One() then it must be that One() + One() = 2
if it is valid for matrix's then One() + One() = 2I which is not addable to scalar so not a valid differential for matrixs.
We could make it One(2) but it seems like a lot of work for unclear gain.

Seperate to this we may want to reintroduce One later as a scale factor.
Right now this package conflates the two.
(Differentials don't need multiplication with anything other than scale-factors, but we define it)

There are some other rules too, I should write this out in a docs PR,
mostly they relate to scalefactors and zeros.

(*Note: Wirtinger is not a differential under this defintion, but it is some generalization, where potentially some sum of wirtingers will be a differential again)

@willtebbutt
Copy link
Member Author

I've opened #62 to discuss things related to this. It's a bit broader than this particular issue, but would be a good place to discuss it regardless.

@simeonschaub
Copy link
Member

Could you elaborate a bit on this please? When does having a One element actually save you from allocating?

One example would be the push_forward for matrix multiplication. We define:

frule(::typeof(*), A, B) = (dA, dB) -> (NO_RULE, dA * B, A * dB)

If we now wanted to find the jacobian of the function v -> M*v, what would we use as a seed? true, I, one(M)? All of them would have to actually be multiplied by M, but in all cases the result of this operation would have to be a new matrix. By using One(), we can just make this a no-op and return M.

Regarding @oxinabox's point: Why not just define extern(::One) = I? This would work well for scalar, as well as matrix cases. 2I plus a scalar does actually work, on 1.2 at least.

@oxinabox
Copy link
Member

oxinabox commented Nov 1, 2019

Regarding @oxinabox's point: Why not just define extern(::One) = I? This would work well for scalar, as well as matrix cases. 2I plus a scalar does actually work, on 1.2 at least.

Good point, I did not realize that was true.

(It still wouldn't be a valid differential with #59 Composites though, but that might be OK)

@willtebbutt
Copy link
Member Author

@simeonschaub I'm not sure that I entirely follow your example. The frule for * would be something along the lines of

function frule(::typeof(*), A, B)
    return A * B, (_, dA, dB) -> dA * B + A * dB
end

Your proposed function is

v -> M * v

where M is some arbitrary AbstractMatrix. Now, assuming that the seed w.r.t. M is Zero, we would hopefully wind up just computing

M dv

for some vector-valued perturbation dv. The naive thing to do with forwards-mode is to run it multiple times with differently seeded dv each time, so as to eventually assemble the jacobian. The thing you're proposing to do (as I understand it) is to propagate multiple seeds at the same time (which is a thing that ForwardDiff does, and ForwardDiff2 will presumably allow, so is well established as a thing you should want to do and I'm on board with).

So to me it seems that the example you propose conflates individual differentials with a representation of a collection of differentials, which isn't something that we really have any notion of in ChainRules at the minute.

Were we to consider the single-seed setting, a 1-hot vector of some kind would be the appropriate thing.

@oxinabox oxinabox mentioned this pull request Nov 1, 2019
fix extra comma
Fix test as per Lyndon's suggestion

Co-Authored-By: Lyndon White <[email protected]>
fix type

fix left over one
@oxinabox
Copy link
Member

oxinabox commented Nov 1, 2019

@willtebbutt I am done with messing with it, so you can touch it again freely.
I did a bunch of force-pushing to that it could easily be merged again if we do both this and #64

@willtebbutt
Copy link
Member Author

willtebbutt commented Nov 3, 2019

@simeonschaub regarding my above comment -- I am very pro- formalising this interface, as I imagine that @ChrisRackauckas need this for ForwardDiff2(?), but it feels like an orthogonal issue to this.

@oxinabox
Copy link
Member

oxinabox commented Nov 4, 2019

cc @YingboMa @shashi, I recall one of you demonstrated that One gave superior performance for something in ForwardDiff2

@nickrobinson251
Copy link
Contributor

bump :) is there still a decision to be made between this and #60 ?

@oxinabox
Copy link
Member

oxinabox commented Jan 5, 2020

bump :) is there still a decision to be made between this and #60

Yes. still needs to be decided.
#60 (comment)

I still don't think One() is a valid differential cf JuliaDiff/ChainRules.jl#133 for very many primals.
But it is a valiid differential for scalars at least.
And that is an important class of primals.

If it actually gives better performance than true for scalars, then we probably should keep it.

After JuliaDiff/ChainRules.jl#133
we need to go through and update the docstrings etc for everything
and specify for various differential types what primal types they are valid for.

@simeonschaub
Copy link
Member

What still worries me about replacing One() with I is the following:

julia> a = [1 2; 3 4]
2×2 Array{Int64,2}:
 1  2
 3  4

julia> b = I*a
2×2 Array{Int64,2}:
 1  2
 3  4

julia> a === b
false

so multiplication with I will create a copy in cases like this, which One() avoids.

@willtebbutt
Copy link
Member Author

My question is still why we want / need a One() differential at all 😬 .

Taking the vector space view of differentials, I can definitely get behind wanting a multiplicative identity for our scalars (most of the time?), but there's no special status for the vector 1, just 0 (additive identity).

While I agree that for Vector-input functions, the identity matrix happens to be helpful for propagating forwards multiple seeds at the same time, this doesn't hold in general (see my comment above).

@oxinabox
Copy link
Member

oxinabox commented Jan 5, 2020

So multiplication with I will create a copy in cases like this, which One() avoids.

Note this also applies to true and 1

julia> 1*x === x
false

julia> true*x = x

It does not apply to x being (immutable) value types, like scalars, since they don't copy anyway ever.

@YingboMa
Copy link
Member

YingboMa commented Jan 6, 2020

ForwardDiff2.jl doesn't use One at all.

@@ -5,7 +5,8 @@ export frule, rrule
export wirtinger_conjugate, wirtinger_primal, refine_differential
export @scalar_rule, @thunk
export extern, store!
export Wirtinger, Zero, One, DoesNotExist, Thunk, InplaceableThunk
export unthunk
export Wirtinger, Zero, DoesNotExist, Thunk, InplaceableThunk
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
export Wirtinger, Zero, DoesNotExist, Thunk, InplaceableThunk
export DoesNotExist, InplaceableThunk, Thunk, unthunk, Wirtinger, Zero

function Wirtinger(primal::Union{Number,AbstractDifferential},
conjugate::Union{Number,AbstractDifferential})
function Wirtinger(
primal::Union{Number,AbstractDifferential},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
primal::Union{Number,AbstractDifferential},
primal::Union{Number, AbstractDifferential},

conjugate::Union{Number,AbstractDifferential})
function Wirtinger(
primal::Union{Number,AbstractDifferential},
conjugate::Union{Number,AbstractDifferential},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
conjugate::Union{Number,AbstractDifferential},
conjugate::Union{Number, AbstractDifferential},

(x::Thunk)() = x.f()
@inline extern(x::Thunk) = extern(x())
@inline unthunk(x::Thunk) = x()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

@test rr1 == 1
end
@testset "rules" begin

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

@willtebbutt
Copy link
Member Author

Closing as this is horribly outdated. #62 is still open, so hopefully we'll get around to removing One at some point...

@willtebbutt willtebbutt closed this May 3, 2020
@oxinabox oxinabox deleted the wct/remove-one branch January 11, 2021 13:46
@oxinabox oxinabox mentioned this pull request May 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants