Low-level Blas interface #28

Open
Rufflewind opened this issue Jun 27, 2014 · 15 comments

Comments

@Rufflewind

Here's a complete low-level Blas FFI that contains everything defined in the official Blas standard. It's a 1-to-1 map to the C interface, using Ptr, IO, etc, but with ordinary Haskell types (Int, Float, etc). It contains both safe and unsafe foreign calls.
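The pattern described above (a raw import over Ptr and C types, wrapped to expose ordinary Haskell types, offered as both a safe and an unsafe call) might look roughly like the sketch below. To keep it linkable without any Blas installed, it binds libm's `modf` as a stand-in; a real binding would name a CBLAS symbol such as `cblas_ddot` instead. Every Haskell name here is hypothetical and not taken from the package.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types (CDouble (..))
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek)

-- Raw imports of the same C symbol, once as a safe call and once as an
-- unsafe call. (Stand-in symbol: libm's modf, not a Blas routine.)
foreign import ccall safe "math.h modf"
  c_modf_safe :: CDouble -> Ptr CDouble -> IO CDouble

foreign import ccall unsafe "math.h modf"
  c_modf_unsafe :: CDouble -> Ptr CDouble -> IO CDouble

-- Shared marshalling: allocate the out-parameter, call the raw function,
-- and convert the C doubles back to ordinary Haskell Doubles.
wrapModf
  :: (CDouble -> Ptr CDouble -> IO CDouble)
  -> Double
  -> IO (Double, Double)
wrapModf raw x =
  alloca $ \intPartPtr -> do
    fracPart <- raw (realToFrac x) intPartPtr
    intPart <- peek intPartPtr
    pure (realToFrac intPart, realToFrac fracPart)

-- Returns (integral part, fractional part) via the safe import.
modfSafe :: Double -> IO (Double, Double)
modfSafe = wrapModf c_modf_safe

-- Same result via the unsafe import (cheaper call, but it blocks the
-- calling capability for the duration of the C call).
modfUnsafe :: Double -> IO (Double, Double)
modfUnsafe = wrapModf c_modf_unsafe
```

Both wrappers share one marshalling function, so the two flavours differ only in which raw import they pass in.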

For better or worse, it makes use of c2hs, which saves some effort with the marshaling of data types. The generated Haskell code is not portable partly because c2hs doesn't handle size_t properly, but also because the underlying Blas implementation is free to choose a different type for CBLAS_INDEX (most implementations do use size_t though).

Would you be interested in this? It could save you some of the more tedious work.

@cartazio
Member

This is NICE (e.g. I like what you're doing with saying symm == hemm in the real case). I think I will be interested in this, though I'll need to mull it over for a day or so first (and think about the design implications etc.).

So the design is that the maintainer runs the Makefile before pushing to Hackage? A strong design constraint for hblas that I want to adhere to is "zeroconfig".

Can this approach be extended to cover lapack? If so, what would be the challenges of that?

@cartazio
Member

(if i don't follow up in a day or so, please ping me, but this is NICE)

@cartazio
Member

I'm open to moving to a tool that generates the FFI code on the maintainer side, one that makes some presumptions about the normal target so that the typical config is zero config, but has some hooks/flags for installer-side config when appropriate. @maxpow4h did a preliminary foray into that, which I've cached in the ffi-utils folder but am not currently using.

What are some example BLAS configs in wide use that have a CBLAS_INDEX that's not a 32-bit int?

@Rufflewind
Author

Code generator

The Makefile is only for the developer. It does two things:

  • Runs tools/generate-ffi to produce the c2hs input (*.chs).
  • Rebuilds Generic/Unsafe.hs using Generic/Safe.hs as a template (which is just a sed command for now!) so I don't have to rewrite everything by hand.
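As a rough illustration of the second step, a template rewrite of this kind can be a plain textual substitution. The sketch below is not the actual sed command (which isn't shown here); the two substituted strings, the module name in the usage note, and the function names are assumptions made purely for illustration.

```haskell
import Data.List (isPrefixOf)

-- Replace every occurrence of a pattern in a string.
-- Precondition: the pattern must be non-empty.
replaceAll :: String -> String -> String -> String
replaceAll old new = go
  where
    go [] = []
    go s@(c : rest)
      | old `isPrefixOf` s = new ++ go (drop (length old) s)
      | otherwise = c : go rest

-- Hypothetical rewrite rules: rename the module and flip the call kind.
-- The real template may need different substitutions.
safeToUnsafe :: String -> String
safeToUnsafe =
  replaceAll "ccall safe" "ccall unsafe"
    . replaceAll "Generic.Safe" "Generic.Unsafe"
```

Applied line by line to a template, this would turn a header such as `module Blas.Generic.Safe where` (a made-up module name) into `module Blas.Generic.Unsafe where` and leave everything else untouched.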

For the user, these generated files would already be pre-packaged. The user does need to have c2hs installed though. Cabal will automatically invoke c2hs as needed.

The sticky details

CBLAS_INDEX usually maps to size_t. The type of size_t is architecture-dependent: it's an unsigned integer type. Practically speaking, it's generally 32-bit or 64-bit depending on platform (i.e. x86 vs x86_64). The C type size_t maps to Haskell's CSize.
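To make the width issue concrete, here is a small sketch that reports how wide CSize is on the machine it was compiled for and converts a size_t-valued index to Int with an explicit range check. It assumes CBLAS_INDEX really is size_t on the target, which (as noted) is common but not guaranteed; the function names are made up for this example.

```haskell
import Foreign.C.Types (CSize)
import Foreign.Storable (sizeOf)

-- Width of size_t, in bits, on the platform this was compiled for.
sizeTBits :: Int
sizeTBits = 8 * sizeOf (0 :: CSize)

-- Convert an index returned as size_t into an Int, refusing values that
-- do not fit. CSize is unsigned, so only the upper bound needs checking.
indexToInt :: CSize -> Maybe Int
indexToInt n
  | toInteger n <= toInteger (maxBound :: Int) = Just (fromIntegral n)
  | otherwise = Nothing
```

A wrapper around an index-returning routine could call `indexToInt` on the raw result instead of a bare `fromIntegral`, so that an out-of-range value is reported rather than silently wrapped.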

Leaving the integer size problems aside, there's also the issue that Blas implementations come under different names and in different flavors. Off the top of my head, there's: ACML, ATLAS, cuBLAS, Intel MKL, OpenBLAS, ... and of course the official reference implementation of BLAS by Netlib. It may take a lot of effort to accommodate all of them and deal with their quirks, if any. I'm not really sure how I plan to deal with this part yet. I would certainly prefer to streamline the process as much as possible for the user!

Lapack

I've not really looked into Lapack yet so that may take a while. It would also be a separate project.

@cartazio
Member

cartazio commented Jul 7, 2014

Still mulling this over from a few angles, but here are some naive (and possibly controversial!) thoughts:

  1. I think the high level api for BLAS and LAPACK bindings should actually be monomorphic! (though wrapping up the bindings is certainly made quite a bit easier when they're type classed). Why? Because

    a. BLAS (and lapack) will only ever provide the 4 types: float, double, complex float, complex double

    b. As best as I can determine (in my very limited experience, mind you!), naive users wind up only using the (complex) Double versions of various operations, and the folks using the (complex) Float versions tend to use them specifically because the difference in flops throughput makes a meaningful difference to their application workload.

  2. I don't think you have to try to support all of the BLAS variants off the bat. As long as the common installation configurations are easy to get working, and it's made clear to users that new ones can be added easily as there's demand for them, that's plenty. Requests for supporting exotic configurations will be a sign of success!

@Rufflewind
Author

  1. I'm not too interested in making a high-level API right now as it's a lot more complicated and a relatively unexplored territory (it's fun, but I also have other work to do!). For the time being I just want a nice basic and (most importantly) stable API for other people to build their abstractions on. My role is to save some of the drudgery of writing the FFI / marshalling / dealing with external C libraries and their quirks.

    Polymorphism is primarily for the sake of simplifying the Blas interface and making it uniform wrt all 4 types. It's true that Blas probably won't ever support more than the 4 basic types simply for hardware reasons, so a "closed" type class would be a bit more appropriate. I see little harm in doing this (could be wrong though), and the vanilla Blas interface is still available for those who want to prefix everything with sdcz.

  2. Right, this will be ongoing work. Based on my experience so far I know many people (esp. in academia) have a crazy hard time with getting dependencies to work properly, so I want to at least make this part less painful.
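To illustrate the point about polymorphism in item 1: a type class can give one uniform name to an operation while the prefixed, monomorphic versions remain available. The sketch below uses pure list-based stand-ins instead of real foreign calls, and all names (`sdot`, `ddot`, `dot`, `normSquared`) are chosen for illustration only; the actual class in the package may be shaped quite differently.

```haskell
-- Pure stand-ins for the prefixed routines; a real binding would call
-- into the C library instead of working on lists.
sdot :: [Float] -> [Float] -> Float
sdot xs ys = sum (zipWith (*) xs ys)

ddot :: [Double] -> [Double] -> Double
ddot xs ys = sum (zipWith (*) xs ys)

-- One uniform name across element types.
class Blas a where
  dot :: [a] -> [a] -> a

instance Blas Float where
  dot = sdot

instance Blas Double where
  dot = ddot

-- Code written once against the class works for every instance.
normSquared :: Blas a => [a] -> a
normSquared xs = dot xs xs
```

Callers who prefer the vanilla interface can still use `sdot` and `ddot` directly; the class only adds a layer on top.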

@cartazio
Member

cartazio commented Jul 8, 2014

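for exposing the "closedness", one approach would be adding something like

{-# LANGUAGE GADTs #-}

-- Each instance reports which of the supported element types it is.
class Blas a where
    elementType :: p a -> ElementType a

-- One constructor per supported element type.
data ElementType a where
  EFloat :: ElementType Float
  EDouble :: ElementType Double
  -- etc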

cool, sounds like we're converging on agreeing

@Rufflewind
Author

I figured there'd be a way to make them closed (saw a few on StackOverflow), but all of them are sort of a hack and I feel there's no need to go through all this trouble just to stop the user from shooting themselves with a nerf gun :P

@cartazio
Member

cartazio commented Jul 8, 2014

Wasn't meant for safety! I was thinking more about how there are certain double-precision routines that can be accelerated by computing a single-precision estimate first!

(It's also an easy way to make the closedness of the API obvious.)

@cartazio
Member

cartazio commented Jul 8, 2014

anyways, no hurry :)

this week i'm trying to finish up the alpha release engineering for my Numerical package

@cartazio
Member

cartazio commented Jul 8, 2014

to clarify, I'm suggesting adding something like elementType :: p a -> ElementType a to your BLAS type class; the GADT result can then only be constructed for the 4 supported types
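A fuller sketch of that suggestion, under some assumptions: the two complex constructor names are invented here (only EFloat and EDouble were spelled out earlier), and `prefix` and `widen` are hypothetical helpers added to show what matching on the tag buys you.

```haskell
{-# LANGUAGE GADTs #-}

import Data.Complex (Complex)
import Data.Proxy (Proxy (..))

-- One constructor per supported element type. The complex constructor
-- names are assumptions made for this sketch.
data ElementType a where
  EFloat :: ElementType Float
  EDouble :: ElementType Double
  EComplexFloat :: ElementType (Complex Float)
  EComplexDouble :: ElementType (Complex Double)

class Blas a where
  elementType :: proxy a -> ElementType a

instance Blas Float where elementType _ = EFloat
instance Blas Double where elementType _ = EDouble
instance Blas (Complex Float) where elementType _ = EComplexFloat
instance Blas (Complex Double) where elementType _ = EComplexDouble

-- Recover the conventional routine prefix from the type alone.
prefix :: Blas a => proxy a -> Char
prefix p = case elementType p of
  EFloat -> 's'
  EDouble -> 'd'
  EComplexFloat -> 'c'
  EComplexDouble -> 'z'

-- Matching on the tag refines the type variable, so each branch may
-- treat the value at its concrete type. Complex values yield Nothing.
widen :: Blas a => a -> Maybe Double
widen x = case elementType (Just x) of
  EFloat -> Just (realToFrac x)
  EDouble -> Just x
  _ -> Nothing
```

For instance, `prefix (Proxy :: Proxy Double)` evaluates to `'d'`. Nothing stops a user from writing a fifth instance of the class, but they could not give it a total `elementType`, which is what makes the set of types effectively closed.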

@Rufflewind
Author

Right, but why go through all this effort to make them closed?

@cartazio
Member

cartazio commented Jul 8, 2014

good point :)
Hrm, I guess some folks ARE working on implementing a BLAS-like tool for finite fields, so perhaps that's a good point

@Rufflewind
Author

Think I'm mostly done with it for the time being. The docs have a lot of room for improvement, but functionality-wise I'm not sure if there's anything else I plan on adding to it.

I tried to implement an automated Haskell script to help with the linking process (find where libraries are and set the right flags), and it ended up taking far more effort than I anticipated because I had to learn a lot of the internal details of Cabal. The flags themselves are also quite complex: the Intel MKL alone has like 5-8 flags that vary depending on the architecture, operating system, parallelization, etc.

For the time being, I'm deferring the linking part to the user (i.e. blas-hs does not link to Blas directly). What I might do in the near term is add a table of the linking flags for some common Blas implementations on typical systems, add some explanation of how linking works, and link to the appropriate documentation in case users need more info, so that they can do it themselves. For really simple cases (such as OpenBLAS, Netlib Blas) it's probably fine to implement Cabal flags for them too.

If you have any suggestions, please let me know!

@cartazio
Member

cartazio commented Aug 2, 2014

oh wow, good job! this is very very nice! I'll take a look soon, but this is quite nice. (a bit buried)
It might be worth looking at how Hmatrix handles linking on various platforms. (I copied a teeny bit of that logic myself.)

I'm happy to help you work on the Cabal Setup.hs foo in a few weeks, time permitting


2 participants