how to implement a new routine from the host side? #565

Open
sarithpeiris opened this issue Dec 8, 2024 · 8 comments
@sarithpeiris
sarithpeiris commented Dec 8, 2024

I understand the kernel code, but I find the host-side code hard to follow. (I am fine with C, but I lack proficiency in C++.)

I just want to add an activation function to the CLBlastgemm function. As far as I can tell, GEMM currently has no way to apply an activation function such as sigmoid.

@CNugteren
Owner

I'm not sure what you want to achieve here:

  • If you want to add a new routine to the CLBlast library and merge it in, then I would suggest we discuss it first before you do any work that might not be merged in.
  • If you just want to extend CLBlast for yourself, I suggest taking a look at a simple routine, such as the xCOPY routine. But you can also just run any OpenCL kernels outside of CLBlast, which might be way easier.
  • If you want to run some post-processing before storing the result to memory (e.g. a ReLU, to save memory bandwidth), then you can do that in the OpenCL kernel here. But it can be a bit tricky, so it might be easier to just run a separate post-processing kernel of your own, although you would pay the cost of reading data in and out of memory again.
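
The "separate post-processing kernel" option from the last bullet could look roughly like the sketch below. It is only an illustration: the kernel name `sigmoid_inplace`, the constant `kSigmoidKernelSource`, and the helper `SigmoidInplaceReference` are hypothetical names and not part of CLBlast. The kernel string would be built and enqueued with the regular OpenCL API after the GEMM call has finished; the CPU function applies the same operation and can be used to check the device result.

```cpp
#include <cmath>
#include <string>
#include <vector>

// Hypothetical stand-alone OpenCL kernel (not part of CLBlast): applies a
// sigmoid in place to the n elements of the GEMM output buffer.
const std::string kSigmoidKernelSource = R"(
__kernel void sigmoid_inplace(__global float* c, const int n) {
  const int i = get_global_id(0);
  if (i < n) { c[i] = 1.0f / (1.0f + exp(-c[i])); }
}
)";

// CPU reference for the same element-wise operation, useful for checking
// what the device kernel wrote back.
void SigmoidInplaceReference(std::vector<float>& c) {
  for (auto& value : c) { value = 1.0f / (1.0f + std::exp(-value)); }
}
```

This reads and writes the output matrix one extra time, which is the bandwidth cost mentioned above, but it leaves the CLBlast sources untouched.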

@sarithpeiris
Author

sarithpeiris commented Dec 11, 2024

> If you want to add a new routine to the CLBlast library and merge it in, then I would suggest we discuss it first before you do any work that might not be merged in.

Not for now, maybe in the coming days.

> If you just want to extend CLBlast for yourself, I suggest taking a look at a simple routine, such as the xCOPY routine. But you can also just run any OpenCL kernels outside of CLBlast, which might be way easier.

That's a good point. Instead of jumping into a complex routine, it's often better to start with simpler tasks.

> If you want to run some post-processing before storing the result to memory (e.g. a ReLU, to save memory bandwidth), then you can do that in the OpenCL kernel here. But it can be a bit tricky, so it might be easier to just run a separate post-processing kernel of your own, although you would pay the cost of reading data in and out of memory again.

Exactly as I initially thought. Here's what I did.

  • First, I changed the code here to this (sigmoid):
    #if PRECISION == 3232 || PRECISION == 6464
      #define Activation(value) value.x = 1.0f / (1.0f + exp(-value.x)); value.y = 1.0f / (1.0f + exp(-value.y))
    #else
      #define Activation(value) 1.0f / (1.0f + exp(-value))
    #endif

  • and then here like this
    cgm[index] = Activation(result);

  • I rebuilt CLBlast, but the output is the same as without activation. Nothing has changed :(
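
One way to tell whether a modified kernel actually ran is to compare the device output against a CPU reference that includes the activation. The sketch below is a minimal version of that check, assuming row-major single-precision matrices and no alpha/beta scaling; `GemmSigmoidReference` and `MaxAbsDiff` are hypothetical helper names, not CLBlast functions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference C = sigmoid(A * B) for row-major A (m x k), B (k x n), C (m x n).
std::vector<float> GemmSigmoidReference(const std::vector<float>& a,
                                        const std::vector<float>& b,
                                        std::size_t m, std::size_t n,
                                        std::size_t k) {
  std::vector<float> c(m * n, 0.0f);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (std::size_t p = 0; p < k; ++p) { acc += a[i * k + p] * b[p * n + j]; }
      c[i * n + j] = 1.0f / (1.0f + std::exp(-acc));  // activation on the result
    }
  }
  return c;
}

// Largest absolute element-wise difference between two equally sized buffers.
float MaxAbsDiff(const std::vector<float>& x, const std::vector<float>& y) {
  float worst = 0.0f;
  for (std::size_t i = 0; i < x.size(); ++i) {
    worst = std::max(worst, std::fabs(x[i] - y[i]));
  }
  return worst;
}
```

If the buffer read back from the device is close to this reference, the activation was applied; if it matches a plain GEMM instead, a different, unmodified kernel was probably selected.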

@CNugteren
Owner

Looks good indeed. One thing to note is that CLBlast is quite complex, and there are different versions of GEMM that might run, see also here.

One thing you can do for your tests is compile CLBlast with -DVERBOSE=ON (passed to CMake); it will then report which kernel it is actually running. Depending on your parameters and your device, you might also need to apply the same change here and here, for example.

@sarithpeiris
Author

sarithpeiris commented Dec 13, 2024

> One thing you can do for your tests is compile CLBlast with -DVERBOSE=ON (passed to CMake); it will then report which kernel it is actually running. Depending on your parameters and your device, you might also need to apply the same change here and here, for example.

Thanks, it worked like a charm!

But I got curious and decided to extend it like this. (this is a small part of the code)

#if PRECISION == 3232 || PRECISION == 6464 // complex types
  #if ACTV == 0 // none
    #define Activation(value) value.x = value.x; value.y = value.y
  #elif ACTV == 1 // tanh
    #define Activation(value) value.x = tanh(value.x); value.y = tanh(value.y)
  #endif
#else // real types
  #if ACTV == 0 // none
    #define Activation(value) value
  #elif ACTV == 1 // tanh
    #define Activation(value) tanh(value)
  #endif
#endif
  • Just a quick question: Is there a way to pass a macro to a specific kernel? I want to trigger ACTV with respect to the passed value. (e.g. -DACTV=1)

@CNugteren
Owner

Good to hear.

Do you mean something like this?
https://github.com/CNugteren/CLBlast/blob/master/src/utilities/compile.cpp#L41
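
The general idea behind setting a define at that point can be sketched as follows: a `#define` line is placed in front of the kernel source string before it goes to the OpenCL compiler, so the `#if ACTV == ...` branches in the kernel see the value. This is a simplified illustration, not the actual contents of compile.cpp; `WithActivationDefine` is a hypothetical helper name.

```cpp
#include <string>

// Hypothetical helper: returns the kernel source with "#define ACTV <value>"
// placed in front, so the #if ACTV == ... branches pick the matching macro.
std::string WithActivationDefine(const std::string& kernel_source, int actv) {
  return "#define ACTV " + std::to_string(actv) + "\n" + kernel_source;
}
```

Plain OpenCL also accepts preprocessor definitions such as `-DACTV=1` in the options string of `clBuildProgram`, which has the same effect for code that builds its own programs outside of CLBlast.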

@sarithpeiris
Author

sarithpeiris commented Dec 16, 2024

> Good to hear.
>
> Do you mean something like this?
> https://github.com/CNugteren/CLBlast/blob/master/src/utilities/compile.cpp#L41

This is fine, but it applies a single activation to the whole program, right?

I want to call CLBlastgemm several times with different activations in the same program.

//call gemm with ACTV 0

//do something

//call gemm with ACTV 1

@CNugteren
Owner

Ah, OK. So in that case you'll want to add a new parameter to the clblast.h header, and pass it through to https://github.com/CNugteren/CLBlast/blob/master/src/clblast.cpp#L1647 and https://github.com/CNugteren/CLBlast/blob/master/src/clblast.cpp#L1658 as well. And then use that value to set the ACTV define just before this line https://github.com/CNugteren/CLBlast/blob/master/src/routines/level3/xgemm.cpp#L28 here (the last argument to this routine class is the kernel string).

Alternatively you can create an entirely new routine for when ACTV should be 1 and then you can use a define such as ROUTINE_GEMM_WITH_ACTV similar to here https://github.com/CNugteren/CLBlast/blob/master/src/kernels/level3/xgemm_part4.opencl#L19.
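
A point to watch with the first approach: if compiled programs are cached and looked up by routine name (an assumption here, worth verifying against the CLBlast sources), then two GEMM calls with different ACTV values must not share one cache entry, otherwise the second call would reuse the program built for the first. The sketch below illustrates the idea with a cache keyed by routine name plus activation. `Activation`, `CompiledProgram`, and `ProgramCache` are all hypothetical stand-ins, and a real version would hold a `cl_program` rather than a string.

```cpp
#include <map>
#include <string>
#include <utility>

enum class Activation { kNone = 0, kTanh = 1 };  // hypothetical; mirrors ACTV 0/1

// Stand-in for a compiled program; real code would hold a cl_program here.
struct CompiledProgram { std::string source; };

class ProgramCache {
 public:
  // Returns the program for this routine/activation pair, building it on
  // first use and reusing it on every later call with the same pair.
  const CompiledProgram& Get(const std::string& routine, Activation actv,
                             const std::string& kernel_source) {
    const auto key = std::make_pair(routine, static_cast<int>(actv));
    auto it = cache_.find(key);
    if (it == cache_.end()) {
      const std::string define =
          "#define ACTV " + std::to_string(static_cast<int>(actv)) + "\n";
      it = cache_.emplace(key, CompiledProgram{define + kernel_source}).first;
      ++builds_;
    }
    return it->second;
  }

  // Number of distinct programs built so far.
  int builds() const { return builds_; }

 private:
  std::map<std::pair<std::string, int>, CompiledProgram> cache_;
  int builds_ = 0;
};
```

The second approach, a separate routine with its own name and a define such as ROUTINE_GEMM_WITH_ACTV, sidesteps the question because each variant is a distinct routine.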

@sarithpeiris
Author

> Ah, OK. So in that case you'll want to add a new parameter to the clblast.h header, and pass it through to https://github.com/CNugteren/CLBlast/blob/master/src/clblast.cpp#L1647 and https://github.com/CNugteren/CLBlast/blob/master/src/clblast.cpp#L1658 as well. And then use that value to set the ACTV define just before this line https://github.com/CNugteren/CLBlast/blob/master/src/routines/level3/xgemm.cpp#L28 here (the last argument to this routine class is the kernel string).
>
> Alternatively you can create an entirely new routine for when ACTV should be 1 and then you can use a define such as ROUTINE_GEMM_WITH_ACTV similar to here https://github.com/CNugteren/CLBlast/blob/master/src/kernels/level3/xgemm_part4.opencl#L19.

That sounds wonderful! I will check this out and keep you updated.
