Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Too slow #4

Open
jfischoff opened this issue Oct 9, 2014 · 17 comments
Open

Too slow #4

jfischoff opened this issue Oct 9, 2014 · 17 comments

Comments

@jfischoff
Copy link
Owner

Doing numerical process with linked lists of box numbers is silly.

Linear should be much faster. I don't think I need data to solve it.

@ericpashman
Copy link
Collaborator

So what's the problem here? Are the computations too slow or is it just generating test data that's deathly slow?

If the plan is to update the Linear module to use Data.Vector or hmatrix, let me know and I might be able to start in on it. Typical usage, I imagine, will be with vectors and matrices of smaller dimensions than what hmatrix is really suited for, since it has to marshall everything in and out of LAPACK and BLAS through the FFI.

I don't think we really need very much support for doing linear computations (do we?), but if so we might want to check out some of the pure-Haskell options, like Edward Kmett's linear library.

@jfischoff
Copy link
Owner Author

On Mon, Oct 13, 2014 at 8:43 PM, Eric Pashman [email protected]
wrote:

So what's the problem here? Are the computations too slow or is it just
too slow to generate test data?

Computations.

If the plan is to update the Linear module to use Data.Vector or hmatrix,
let me know and I might be able to start in on it. Typical usage, I
imagine, will be with vectors and matrices of smaller dimensions than what
hmatrix is really suited for, since it has to marshall everything in and
out of LAPACK and BLAS through the FFI.

hmatrix I guess or something similar with a BSD license if possible. I
don't have a strong opinion.

I don't think we really need very much support for doing linear
computations (do we?), but if so we might want to check out some of the
pure-Haskell options, like Edward Kmett's linear
https://hackage.haskell.org/package/linear library.

linear is for low dimensional stuff ... I think .


Reply to this email directly or view it on GitHub
#4 (comment).

@ericpashman
Copy link
Collaborator

I pushed some initial work to re-write the Linear module using hmatrix into a new branch called linear.

Take a look at it if you get a chance. The code is broken in a couple of ways. I think the "no instance" errors on line 67 can be avoided by using dim from Data.Packed.Vector rather than the size class method. The trouble with deriving Eq and Show instances for LinearConstraints can probably be worked out as well. The big problem is with the type of the obj parameter in the definition of linear (line 64).

It's the same old thing: obj must have type forall a. Floating a => [a] -> a in order to be passed to minimize (defined in the ConjugateGradient module), but now a also needs to be an instance of Numeric because of the hmatrix functionality. There's no remotely sane way to add that constraint, I don't think.

If there is a way to get this to work, it might be by getting rid of the polymorphism, including in the definition of minimize. That would mean replacing all of the (Floating a, ...) => a types with Double and then wrapping them up in the mode types from Numeric.AD. So everything would end up as Reverse s Double or whatever. Then we'd have to make Reverse s Double an instance of all of the classes hmatrix uses. Which may or may not be possible, owing to that s parameter.

Anyway, I wonder whether there is way to do this that is not completely bonkers. ...

@ericpashman
Copy link
Collaborator

I re-wrote the Linear module using Data.Vector, naively replacing all list types with boxed Vector types. I don't want to move on to LinearTesting before first trying to get some idea about whether this actually might speed things up meaningfully. Is there a particular sort of problem I should test the linear solver with?

By the way, do you have much experience using Data.Vector? It's not clear to me whether I should even be using the basic Vector type or whether I should instead give Stream (from Data.Vector.Fusion.Stream) a whirl. ...

@jfischoff
Copy link
Owner Author

I'll take a look. We want the unboxed version though.

On Thu, Oct 16, 2014 at 3:46 PM, Eric Pashman [email protected]
wrote:

I re-wrote the Linear module using Data.Vector, naively replacing all
list types with boxed Vector types. I don't want to move on to
LinearTesting before first trying to get some idea about whether this
actually might speed things up meaningfully. Is there a particular sort of
problem I should test the linear solver with?

By the way, do you have much experience using Data.Vector? It's not clear
to me whether I should even be using the basic Vector type or whether I
should instead give Stream (from Data.Vector.Fusion.Stream) a whirl. ...


Reply to this email directly or view it on GitHub
#4 (comment).

@ericpashman
Copy link
Collaborator

I just pushed everything into a new branch, linear-vector. Completely untested, I got everything to compile and ran off to do other things.

We're going to run into the the same problem with too little polymorphism, I imagine, if we use unboxed vectors. I don't have a good grasp on why this module is so slow, but I figured some stream fusion was worth a shot. I think we get that from the Vector type I used, but I'm not sure, might have to use Stream.

@jfischoff
Copy link
Owner Author

I think we need to drop ad to use unboxed vectors, if that has not already
happened.

On Thu, Oct 16, 2014 at 4:19 PM, Eric Pashman [email protected]
wrote:

I just pushed everything into a new branch, linear-vector.

We're going to run into the the same problem with too little polymorphism,
I imagine, if we use unboxed vectors. I don't have a good grasp on why this
module is so slow, but I figured some stream fusion was worth a shot. I
think we get that from the Vector type I used, but I'm not sure, might
have to use Stream.


Reply to this email directly or view it on GitHub
#4 (comment).

@jfischoff
Copy link
Owner Author

Wait, I mean I'm basically sure we have to drop ad I mean

On Thu, Oct 16, 2014 at 4:38 PM, Jonathan Fischoff <
[email protected]> wrote:

I think we need to drop ad to use unboxed vectors, if that has not already
happened.

On Thu, Oct 16, 2014 at 4:19 PM, Eric Pashman [email protected]
wrote:

I just pushed everything into a new branch, linear-vector.

We're going to run into the the same problem with too little
polymorphism, I imagine, if we use unboxed vectors. I don't have a good
grasp on why this module is so slow, but I figured some stream fusion was
worth a shot. I think we get that from the Vector type I used, but I'm
not sure, might have to use Stream.


Reply to this email directly or view it on GitHub
#4 (comment).

@ericpashman
Copy link
Collaborator

You think so? We can definitely make everything work using ad, it's just a question of what it will be suited for. ...

If we're going to reimplement everything from the ground up around another way of computing gradients, it probably makes sense to keep the existing code around as an option for low-dimensional stuff. The linear package would then work just fine.

Still probably makes sense to see where stream fusion gets us.

On Oct 16, 2014, at 18:39, Jonathan Fischoff [email protected] wrote:

Wait, I mean I'm basically sure we have to drop ad I mean

On Thu, Oct 16, 2014 at 4:38 PM, Jonathan Fischoff <
[email protected]> wrote:

I think we need to drop ad to use unboxed vectors, if that has not already
happened.

On Thu, Oct 16, 2014 at 4:19 PM, Eric Pashman [email protected]
wrote:

I just pushed everything into a new branch, linear-vector.

We're going to run into the the same problem with too little
polymorphism, I imagine, if we use unboxed vectors. I don't have a good
grasp on why this module is so slow, but I figured some stream fusion was
worth a shot. I think we get that from the Vector type I used, but I'm
not sure, might have to use Stream.


Reply to this email directly or view it on GitHub
#4 (comment).


Reply to this email directly or view it on GitHub.

@jfischoff
Copy link
Owner Author

I haven't looked at ad for a bit, but I thought the traversable constraint
made using unboxed vectors not possible.

On Thu, Oct 16, 2014 at 5:18 PM, Eric Pashman [email protected]
wrote:

You think so? We can definitely make everything work using ad, it's just
a question of what it will be suited for. ...

If we're going to reimplement everything from the ground up around another
way of computing gradients, it probably makes sense to keep the existing
code around as an option for low-dimensional stuff. The linear package
would then work just fine.

Still probably makes sense to see where stream fusion gets us.

On Oct 16, 2014, at 18:39, Jonathan Fischoff [email protected]
wrote:

Wait, I mean I'm basically sure we have to drop ad I mean

On Thu, Oct 16, 2014 at 4:38 PM, Jonathan Fischoff <
[email protected]> wrote:

I think we need to drop ad to use unboxed vectors, if that has not
already
happened.

On Thu, Oct 16, 2014 at 4:19 PM, Eric Pashman <
[email protected]>
wrote:

I just pushed everything into a new branch, linear-vector.

We're going to run into the the same problem with too little
polymorphism, I imagine, if we use unboxed vectors. I don't have a
good
grasp on why this module is so slow, but I figured some stream fusion
was
worth a shot. I think we get that from the Vector type I used, but
I'm
not sure, might have to use Stream.


Reply to this email directly or view it on GitHub
#4 (comment).


Reply to this email directly or view it on GitHub.


Reply to this email directly or view it on GitHub
#4 (comment).

@ericpashman
Copy link
Collaborator

I think you're right. I had in mind another problem, the Unbox constraint on the underlying element type. That presents essentially the same barrier I ran into with the hmatrix vector types.

But I meant that things can (I think) be made to work, at least in some sense, using boxed vectors or the Stream type together with ad. I don't know whether such an implementation would be efficient enough for general use, but I'm pretty sure the pieces would at least fit together. So doing that, or using the linear package, would be one way to get things working using ad in roughly the manner we've been doing. One of those approaches will probably work just fine for low-dimensional data—though I suppose the approach in General might already suffice for such uses.

I really don't have any good ideas for how to handle the high-dimensional linear case. I'm just saying that it might be useful to have a separate implementation there while retaining what we have for non-linear and low-dim linear cases.

@ericpashman
Copy link
Collaborator

I updated the LinearTesting module and pushed everything into the linear-vector branch.

Ok, modules loaded: Numeric.MaxEnt.Linear, Numeric.MaxEnt.ConjugateGradient, Main, LinearTesting.
*Main> main
Tests
  Properties
    solvableSystemsAreSolvable: OK
      +++ OK, passed 100 tests.
    probsSumToOne:              OK
      +++ OK, passed 100 tests.
    solutionFitsConstraints:    OK
      +++ OK, passed 100 tests.

All 3 tests passed
*** Exception: ExitSuccess

I ran the tests single-threaded and they pass in about ten minutes on my machine, a MacBook Air with a rinky-dink processor. I guesstimate that's about two orders of magnitude faster than the tests ran previously. (When I let them run to completion a week ago, it took overnight plus several hours.)

Run everything and see what you think. There's still a lot of ugly nonsense going on in LinearTesting, generating lists and converting to vectors, etc. The whole thing's a mess, really. But if this performance is not completely unreasonable we might get out the profiler and see what we can make happen.

@ericpashman
Copy link
Collaborator

Actually, the tests pass in about three minutes on my machine. So, yes, this is at least a couple of orders of magnitude faster than using lists.

@jfischoff
Copy link
Owner Author

Oh awesome

On Fri, Oct 17, 2014 at 2:06 PM, Eric Pashman [email protected]
wrote:

Actually, the tests pass in about three minutes on my machine. So, yes,
this is at least a couple of orders of magnitude faster than using lists.


Reply to this email directly or view it on GitHub
#4 (comment).

@ericpashman
Copy link
Collaborator

The linear-vector branch now has fully updated versions of Linear and LinearTesting ready to go. Both modules are basically the same as they were originally, just using Data.Vector.Vector rather than lists.

I'll wait for your OK to merge into master. You might want to look over the test code (particularly the Arbitrary definitions) and make sure it's still doing what it was before. I can't have screwed things up too badly, but I did re-write the small amount of linear algebra that was going on in LinearTesting to use multMV and new versions of transpose, etc. (defined in Linear), rather than the list and hmatrix stuff—I don't think we're using hmatrix at all anymore, by the way.

Anyway, take a look. And let me know whether you think we can really get away with using the vector package to do all the the linear stuff we'll need to do in the future. If it will suffice for most uses then it's probably time to try to clean things up, write tests for the other modules (I've started that locally) and cut a release. Otherwise we need to figure out how to rip things apart and rebuild on new foundations.

@jfischoff
Copy link
Owner Author

I'm out today, I'll take a look tomorrow. thanks

On Sat, Oct 18, 2014 at 12:57 PM, Eric Pashman [email protected]
wrote:

The linear-vector branch now has fully updated versions of Linear and
LinearTesting ready to go. Both modules are basically the same as they
were originally, just using Data.Vector.Vector rather than lists.

I'll wait for your OK to merge into master. You might want to look over
the test code (particularly the Arbitrary definitions) and make sure it's
still doing what it was before. I can't have screwed things up too badly,
but I did re-write the small amount of linear algebra that was going on in
LinearTesting to use multMV and new versions of transpose, etc. (defined
in Linear) rather than the list and hmatrix stuff—I don't think we're
using hmatrix at all anymore, by the way.

Anyway, take a look. And let me know whether you think we can really get
away with using the vector package to do all the the linear stuff we'll
need to do in the future. If it will suffice for most uses then it's
probably time to try to clean things up, write tests for the other modules
(I've started that locally) and cut a release. Otherwise we need to figure
out how to rip things apart and rebuild on new foundations.


Reply to this email directly or view it on GitHub
#4 (comment).

@ericpashman
Copy link
Collaborator

Yep, no rush. Just want to make sure I get your thoughts on the linear stuff at some point. The use-cases I'm familiar with are all pretty low-dim, so please don't imagine I know what will work for image processing or whatever. :P

On Oct 18, 2014, at 18:39, Jonathan Fischoff [email protected] wrote:

I'm out today, I'll take a look tomorrow. thanks

On Sat, Oct 18, 2014 at 12:57 PM, Eric Pashman [email protected]
wrote:

The linear-vector branch now has fully updated versions of Linear and
LinearTesting ready to go. Both modules are basically the same as they
were originally, just using Data.Vector.Vector rather than lists.

I'll wait for your OK to merge into master. You might want to look over
the test code (particularly the Arbitrary definitions) and make sure it's
still doing what it was before. I can't have screwed things up too badly,
but I did re-write the small amount of linear algebra that was going on in
LinearTesting to use multMV and new versions of transpose, etc. (defined
in Linear) rather than the list and hmatrix stuff—I don't think we're
using hmatrix at all anymore, by the way.

Anyway, take a look. And let me know whether you think we can really get
away with using the vector package to do all the the linear stuff we'll
need to do in the future. If it will suffice for most uses then it's
probably time to try to clean things up, write tests for the other modules
(I've started that locally) and cut a release. Otherwise we need to figure
out how to rip things apart and rebuild on new foundations.


Reply to this email directly or view it on GitHub
#4 (comment).


Reply to this email directly or view it on GitHub.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants