how to implement a new routine from the host side? #565

sarithpeiris · 2024-12-08T19:35:49Z

I understand the kernel code, but I find the host-side code hard to follow. (I am fine with C, but I am not proficient in C++.)

I just want to add an activation function to the CLBlastgemm function. As far as I can tell, there is currently no way to apply an activation function inside GEMM. An example would be the sigmoid function.

CNugteren · 2024-12-09T16:24:37Z

I'm not sure what you want to achieve here:

- If you want to add a new routine to the CLBlast library and merge it in, then I would suggest we discuss it first, before you do any work that might not be merged in.
- If you just want to extend CLBlast for yourself, I suggest taking a look at a simple routine, such as the xCOPY routine. But you can also just run any OpenCL kernels outside of CLBlast, which might be way easier.
- If you want to run some post-processing before storing the result to memory (e.g. a ReLU, to save memory bandwidth), then you can do that in the OpenCL kernel here. But it can be a bit tricky, so it might be easier to just run a separate post-processing kernel of your own, although you would pay the cost of reading data in and out of memory again.
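To illustrate the last option, here is a minimal sketch of a separate post-processing kernel together with a CPU reference for checking its result. The kernel name `SigmoidInPlace`, the constant `kSigmoidKernelSource`, and the helper `SigmoidInPlaceReference` are hypothetical and not part of CLBlast; the sketch also assumes single precision and treats the GEMM output as one contiguous buffer of `n` floats.

```cpp
#include <cmath>
#include <string>
#include <vector>

// OpenCL C source for a hypothetical stand-alone kernel that would be
// launched on the GEMM output buffer after the GEMM call has finished.
// It reads and writes every element once, which is the extra memory
// traffic mentioned above.
const std::string kSigmoidKernelSource = R"(
__kernel void SigmoidInPlace(const int n, __global float* c) {
  const int id = get_global_id(0);
  if (id < n) {
    c[id] = 1.0f / (1.0f + exp(-c[id]));
  }
}
)";

// CPU reference with the same semantics, useful for validating the
// device result on a small buffer.
void SigmoidInPlaceReference(std::vector<float>& c) {
  for (float& value : c) {
    value = 1.0f / (1.0f + std::exp(-value));
  }
}
```

The kernel would be compiled and enqueued with the regular OpenCL host API on the same queue as the GEMM call, so that it runs after the GEMM has written its output.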

sarithpeiris · 2024-12-11T05:25:29Z

> If you want to add a new routine to the CLBlast library and merge it in, then I would suggest we discuss it first before you do any work that might not be merged in?

Not for now, maybe in the coming days.

> If you just want to extend CLBlast for yourself I suggest to take a look at a simple routine, such as the xCOPY routine. But you can also just run any OpenCL kernels outside of CLBlast, which might be way easier.

That's a good point. Instead of jumping into a complex routine, it's often better to start with simpler tasks.

> If you want to run some post-processing before storing the result to memory (e.g. a ReLU, to save memory bandwidth), then you can do that in the OpenCL kernel here. But it can be a bit tricky, so it might be easier to just run a separate post-processing kernel of your own, although you would pay the cost of reading data in and out of memory again.

Exactly as I initially thought. Here's what I did.

First, I changed the code here to this (sigmoid):

#if PRECISION == 3232 || PRECISION == 6464
  #define Activation(value) value.x = 1.0f / (1.0f + exp(-value.x)); value.y = 1.0f / (1.0f + exp(-value.y))
#else
  #define Activation(value) (1.0f / (1.0f + exp(-(value))))
#endif

and then the code here to this:

cgm[index] = Activation(result);
I rebuilt CLBlast, but the output is the same as without activation. Nothing has changed :(
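One way to confirm whether the activation was actually applied is to compare the device output against a small CPU reference. The sketch below is not part of CLBlast: `GemmSigmoidReference` and `AllClose` are hypothetical helpers, and it assumes row-major storage, single precision, alpha = 1 and beta = 0.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Computes C = sigmoid(A * B) for row-major A (m x k) and B (k x n).
std::vector<float> GemmSigmoidReference(const std::vector<float>& a,
                                        const std::vector<float>& b,
                                        std::size_t m, std::size_t n,
                                        std::size_t k) {
  std::vector<float> c(m * n, 0.0f);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (std::size_t l = 0; l < k; ++l) {
        acc += a[i * k + l] * b[l * n + j];
      }
      // The activation is applied once, just before the store.
      c[i * n + j] = 1.0f / (1.0f + std::exp(-acc));
    }
  }
  return c;
}

// Returns true if both buffers have the same size and every pair of
// elements differs by at most `tol`.
bool AllClose(const std::vector<float>& x, const std::vector<float>& y,
              float tol) {
  if (x.size() != y.size()) return false;
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (std::fabs(x[i] - y[i]) > tol) return false;
  }
  return true;
}
```

If the device result matches the plain product A * B rather than this reference, the modified kernel was most likely not the one that ran.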

CNugteren · 2024-12-11T07:46:07Z

Looks good indeed. One thing to note is that CLBlast is quite complex, and there are different versions of GEMM that might run, see also here.

One thing you can do for your tests is compile CLBlast with -DVERBOSE=ON (passed to CMake); it will then report which kernel it is actually running. Depending on your parameters and your device, you might also need to apply the same change here and here, for example.

sarithpeiris · 2024-12-13T19:23:38Z

> One thing you can do for your tests is compile CLBlast with -DVERBOSE=ON (to CMake) and then it will report which kernel it is actually running. Because depending on your parameters and your device you might also want to apply the same change here and here for example.

Thanks, it worked like a charm!

But I got curious and decided to extend it like this. (This is a small part of the code.)

#if PRECISION == 3232 || PRECISION == 6464
  #if ACTV == 0 // none
    #define Activation(value) value.x = value.x; value.y = value.y
  #elif ACTV == 1 // tanh
    #define Activation(value) value.x = tanh(value.x); value.y = tanh(value.y)
  #endif
#else
  #if ACTV == 0 // none
    #define Activation(value) value
  #elif ACTV == 1 // tanh
    #define Activation(value) tanh(value)
  #endif
#endif

Just a quick question: Is there a way to pass a macro to a specific kernel? I want to select the ACTV branch based on a value passed in at build time (e.g. -DACTV=1).

CNugteren · 2024-12-16T06:49:00Z

Good to hear.

Do you mean something like this?
https://github.com/CNugteren/CLBlast/blob/master/src/utilities/compile.cpp#L41
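
In plain OpenCL there are two common ways to get a macro such as ACTV into a kernel: pass it as a `-D name=definition` compiler option in the options string given to `clBuildProgram`, or prepend a `#define` line to the kernel source before compiling it. The sketch below only builds the two strings. The helper names `DefineAsBuildOption` and `DefineAsSourcePrefix` are hypothetical and this is not CLBlast's own code.

```cpp
#include <string>

// Formats a macro as an OpenCL compiler option, for example "-D ACTV=1",
// suitable for the options argument of clBuildProgram.
std::string DefineAsBuildOption(const std::string& name, int value) {
  return "-D " + name + "=" + std::to_string(value);
}

// Prepends "#define NAME VALUE" to the kernel source, so the macro is
// visible to every preprocessor check that follows in that source.
std::string DefineAsSourcePrefix(const std::string& name, int value,
                                 const std::string& kernel_source) {
  return "#define " + name + " " + std::to_string(value) + "\n" +
         kernel_source;
}
```

Either way the macro is fixed when the program is compiled, so a different value requires a separately compiled program.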

sarithpeiris · 2024-12-16T13:12:47Z

> Good to hear.
>
> Do you mean something like this?
> https://github.com/CNugteren/CLBlast/blob/master/src/utilities/compile.cpp#L41

So, this is fine, but it fixes a single activation for the whole program, right?

I want to call CLBlastgemm several times with different activations in the same program.

//call gemm with ACTV 0

//do something

//call gemm with ACTV 1

CNugteren · 2024-12-16T13:41:57Z

Ah, OK. So in that case you'll want to add a new parameter to the clblast.h header, and pass it through to https://github.com/CNugteren/CLBlast/blob/master/src/clblast.cpp#L1647 and https://github.com/CNugteren/CLBlast/blob/master/src/clblast.cpp#L1658 as well. And then use that value to set the ACTV define just before this line https://github.com/CNugteren/CLBlast/blob/master/src/routines/level3/xgemm.cpp#L28 here (the last argument to this routine class is the kernel string).

Alternatively you can create an entirely new routine for when ACTV should be 1 and then you can use a define such as ROUTINE_GEMM_WITH_ACTV similar to here https://github.com/CNugteren/CLBlast/blob/master/src/kernels/level3/xgemm_part4.opencl#L19.
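
Because the ACTV define is part of the kernel string, every distinct activation value produces a different source and therefore has to be compiled as its own program. The sketch below shows that idea in isolation, assuming a hypothetical `Activation` enum and a hypothetical `ProgramCache` class that stands in for real OpenCL compilation. It does not reflect how CLBlast caches its programs.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical per-call parameter that a modified GEMM API could accept.
enum class Activation { kNone = 0, kTanh = 1 };

// Stand-in for a compiled OpenCL program; it only stores the final source.
struct Program {
  std::string source;
};

// Builds one program per activation value and reuses it on later calls.
class ProgramCache {
 public:
  explicit ProgramCache(std::string kernel_source)
      : kernel_source_(std::move(kernel_source)) {}

  const Program& Get(Activation actv) {
    const int key = static_cast<int>(actv);
    auto it = cache_.find(key);
    if (it == cache_.end()) {
      // Set the ACTV define just before the kernel string.
      Program program{"#define ACTV " + std::to_string(key) + "\n" +
                      kernel_source_};
      it = cache_.emplace(key, std::move(program)).first;
      ++num_builds_;
    }
    return it->second;
  }

  int num_builds() const { return num_builds_; }

 private:
  std::string kernel_source_;
  std::map<int, Program> cache_;
  int num_builds_ = 0;
};
```

With this layout, calling GEMM with ACTV 0, then ACTV 1, then ACTV 0 again would compile two programs and reuse the first one on the third call.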

sarithpeiris · 2024-12-16T14:46:30Z

> Ah, OK. So in that case you'll want to add a new parameter to the clblast.h header, and pass it through to https://github.com/CNugteren/CLBlast/blob/master/src/clblast.cpp#L1647 and https://github.com/CNugteren/CLBlast/blob/master/src/clblast.cpp#L1658 as well. And then use that value to set the ACTV define just before this line https://github.com/CNugteren/CLBlast/blob/master/src/routines/level3/xgemm.cpp#L28 here (the last argument to this routine class is the kernel string).
>
> Alternatively you can create an entirely new routine for when ACTV should be 1 and then you can use a define such as ROUTINE_GEMM_WITH_ACTV similar to here https://github.com/CNugteren/CLBlast/blob/master/src/kernels/level3/xgemm_part4.opencl#L19.

That sounds wonderful! I will check this out and keep you updated.

CNugteren added the question label Dec 9, 2024
