-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
direct gather routine for single vector field #5776
direct gather routine for single vector field #5776
Conversation
419ded3
to
d615692
Compare
This routine seems to duplicate what is in |
dac35c8
to
f8fa09b
Compare
I noticed that too. Originally, this routine was optimized for the implicit solver where a Yee grid is assumed. Now that I have made the routine more generalized (and less optimized), it is now basically the same as doGatherShapeN for a single vector field without the galerkin_interpolation flag. There is no need to use that flag for B. What do you suggest we do? |
f8fa09b
to
6346f6d
Compare
@dpgrote I've generalized the routine for an arbitrary vector field F with separate depos_order template parameters for parallel and perpendicular gathers. |
FYI. The file diff is very misleading now. I added the function doGatherVectorFieldShapeN() above doGatherShapeN() (which is required) and then modified the latter to call the former. However, the file diff recognizes this as many separate changes to the doGatherShapeN() routine rather than one green block for doGatherVectorFieldShapeN() and then another few red/green blocks showing the modifications to doGatherShapeN(). |
AMREX_GPU_HOST_DEVICE AMREX_FORCE_INLINE | ||
void doGatherShapeN ([[maybe_unused]] const amrex::ParticleReal xp, | ||
void doGatherVectorFieldShapeN ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be more precise, can this name be changed to doDirectGatherVectorFieldShapeN
. And the same for doGatherShapeN
, change to doDirectGatherShapeN
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with renaming the new function, but I'm hesitant to change doGatherShapeN. Seems there are a lot of functions that start with that, so you could argue that all of them should change as well. That seems like something for a separate PR.
Also, why have ShapeN in the function names? I feel like it adds no value here.
ex_arr(lo.x+j_ex+ix, lo.y+k_ex+iy, lo.z+l_ex+iz); | ||
// Gather vector field F | ||
amrex::Real weight; | ||
for (int i=0; i<=depos_order_para; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For optimization, the loop ordering should be swapped, with i
the innermost loop. The same for all of the nested loops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know this. The Villasenor deposition routine also uses the bad ordering. Should that be changed as well? Separate PR of course.
} | ||
} | ||
|
||
#if defined(WARPX_DIM_RZ) | ||
// Convert Fxp and Fyp (which are actually Fr and Fth) to Fx and Fy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This routine is adding to previously calculated fields, so for RZ, this transformation needs to be done on only the fields calculated in this routine. The fields need to be accumulated into temporaries and the transform done on those before adding them to the total fields. You can see this in the original `doGatherShapeN. Also, note that there the XZ and RZ are fully separated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this routine does what you describe. It is also identical to how it is done in doGatherShapeNEsirkepovStencilImplicit() and in doGatherPicnicShapeN().
In either case, I made a small change that I think makes it more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline. Fixed now.
285aec2
to
13e20d6
Compare
Documenting recent commits. To reduce code duplication, we originally changed doGatherShapeN() to call the new doDirectGatherSingleField() once for E and once for B. @dpgrote and myself did some timing tests on CPU and GPU (Lassen and Perlmutter) using the 3D uniform_plasma input deck. There was no noticeable difference on CPU, but there was a 36-50% increase in the GatherAndPush timings on GPU. The increase seems to be because the shape factors in each direction are now computed twice. In order to not affect timing associated with GatherAndPush, doGatherShapeN() is restored to its original form and the new routine is only called by the implicit routines as needed. However, this means that there is a fair amount of duplicate code. |
} | ||
|
||
// Convert Frp and Fthp to Fxp and Fyp | ||
Fxp = costheta*Frp - sintheta*Fthp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be +=
to be consistent with the fields being accumulated.
@@ -495,22 +753,10 @@ void doGatherShapeNEsirkepovStencilImplicit ( | |||
#if defined(WARPX_DIM_RZ) | |||
amrex::Real const xp_new = xp_np1; | |||
amrex::Real const yp_new = yp_np1; | |||
amrex::Real const xp_mid = xp_nph; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes in this routine are unrelated to the main topic of the PR so should be reverted to keep the PR minimal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it is just a small change and makes things consistent with the cleaner implementation of the new routine, I would prefer to keep it.
@@ -15,6 +15,264 @@ | |||
|
|||
#include <AMReX.H> | |||
|
|||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a somewhat special purpose routine, maybe move this further down in the file, perhaps to just before doGatherPicnicShapeN
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we decide to use the new routine in doGatherShapeN, then it has to be before that routine. If I move it and someone wants to test out the timings again by calling it from doGatherShapeN, then they would have to move it back to where it is now.
… vector field F with different shape factors in perpendicular and parallel directions
bdcae30
to
3296277
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
A direct gather routine for B is used in doGatherPicnic() (only used by implicit solver) and will also be used in the mass matrix deposit routine in PR #5768. Making a separate routine for a direct Gather of the B-field will reduce code duplication.
This PR adds a direct gather routine for a single vector field, which can be used for E or B and accounts for the galerkin_interpolation flag in parallel and perpendicular directions.
@RemiLehe This PR also seems to overlap with PR #5673.