Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

direct gather routine for single vector field #5776

Merged
merged 24 commits into from
Mar 25, 2025

Conversation

JustinRayAngus
Copy link
Contributor

@JustinRayAngus JustinRayAngus commented Mar 17, 2025

A direct gather routine for B is used in doGatherPicnic() (only used by implicit solver) and will also be used in the mass matrix deposit routine in PR #5768. Making a separate routine for a direct Gather of the B-field will reduce code duplication.

This PR adds a direct gather routine for a single vector field, which can be used for E or B and accounts for the galerkin_interpolation flag in parallel and perpendicular directions.

@RemiLehe This PR also seems to overlap with PR #5673.

@dpgrote
Copy link
Member

dpgrote commented Mar 18, 2025

This routine seems to duplicate what is in doGatherShapeN (except for the Galerkin factor) which is doing direct deposition. It looks like a single kernel could be written that gathers the three field components that would be called separately for E and B. It would need parallel and perpendicular Galerkin factors for the E and B.

@JustinRayAngus
Copy link
Contributor Author

JustinRayAngus commented Mar 18, 2025

This routine seems to duplicate what is in doGatherShapeN (except for the Galerkin factor) which is doing direct deposition. It looks like a single kernel could be written that gathers the three field components that would be called separately for E and B. It would need parallel and perpendicular Galerkin factors for the E and B.

I noticed that too. Originally, this routine was optimized for the implicit solver where a Yee grid is assumed. Now that I have made the routine more generalized (and less optimized), it is now basically the same as doGatherShapeN for a single vector field without the galerkin_interpolation flag. There is no need to use that flag for B. What do you suggest we do?

@JustinRayAngus
Copy link
Contributor Author

JustinRayAngus commented Mar 19, 2025

@dpgrote I've generalized the routine for an arbitrary vector field F with separate depos_order template parameters for parallel and perpendicular gathers.

@JustinRayAngus
Copy link
Contributor Author

JustinRayAngus commented Mar 19, 2025

FYI. The file diff is very misleading now. I added the function doGatherVectorFieldShapeN() above doGatherShapeN() (which is required) and then modified the latter to call the former. However, the file diff recognizes this as many separate changes to the doGatherShapeN() routine rather than one green block for doGatherVectorFieldShapeN() and then another few red/green blocks showing the modifications to doGatherShapeN().

@JustinRayAngus JustinRayAngus added cleaning Clean code, improve readability component: core Core WarpX functionality labels Mar 19, 2025
@JustinRayAngus JustinRayAngus assigned dpgrote, ax3l and RemiLehe and unassigned dpgrote and RemiLehe Mar 19, 2025
@JustinRayAngus JustinRayAngus changed the title direct gather routine for magnetic field direct gather routine for single vector field Mar 20, 2025
AMREX_GPU_HOST_DEVICE AMREX_FORCE_INLINE
void doGatherShapeN ([[maybe_unused]] const amrex::ParticleReal xp,
void doGatherVectorFieldShapeN (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be more precise, can this name be changed to doDirectGatherVectorFieldShapeN. And the same for doGatherShapeN, change to doDirectGatherShapeN.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with renaming the new function, but I'm hesitant to change doGatherShapeN. Seems there are a lot of functions that start with that, so you could argue that all of them should change as well. That seems like something for a separate PR.

Also, why have ShapeN in the function names? I feel like it adds no value here.

ex_arr(lo.x+j_ex+ix, lo.y+k_ex+iy, lo.z+l_ex+iz);
// Gather vector field F
amrex::Real weight;
for (int i=0; i<=depos_order_para; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For optimization, the loop ordering should be swapped, with i the innermost loop. The same for all of the nested loops.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know this. The Villasenor deposition routine also uses the bad ordering. Should that be changed as well? Separate PR of course.

}
}

#if defined(WARPX_DIM_RZ)
// Convert Fxp and Fyp (which are actually Fr and Fth) to Fx and Fy
Copy link
Member

@dpgrote dpgrote Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This routine is adding to previously calculated fields, so for RZ, this transformation needs to be done on only the fields calculated in this routine. The fields need to be accumulated into temporaries and the transform done on those before adding them to the total fields. You can see this in the original `doGatherShapeN. Also, note that there the XZ and RZ are fully separated.

Copy link
Contributor Author

@JustinRayAngus JustinRayAngus Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this routine does what you describe. It is also identical to how it is done in doGatherShapeNEsirkepovStencilImplicit() and in doGatherPicnicShapeN().

In either case, I made a small change that I think makes it more clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline. Fixed now.

@JustinRayAngus JustinRayAngus force-pushed the do_gather_B branch 4 times, most recently from 285aec2 to 13e20d6 Compare March 24, 2025 21:29
@JustinRayAngus
Copy link
Contributor Author

JustinRayAngus commented Mar 24, 2025

Documenting recent commits. To reduce code duplication, we originally changed doGatherShapeN() to call the new doDirectGatherSingleField() once for E and once for B. @dpgrote and myself did some timing tests on CPU and GPU (Lassen and Perlmutter) using the 3D uniform_plasma input deck. There was no noticeable difference on CPU, but there was a 36-50% increase in the GatherAndPush timings on GPU. The increase seems to be because the shape factors in each direction are now computed twice.

In order to not affect timing associated with GatherAndPush, doGatherShapeN() is restored to its original form and the new routine is only called by the implicit routines as needed. However, this means that there is a fair amount of duplicate code.

@JustinRayAngus JustinRayAngus requested a review from dpgrote March 25, 2025 00:14
}

// Convert Frp and Fthp to Fxp and Fyp
Fxp = costheta*Frp - sintheta*Fthp;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be += to be consistent with the fields being accumulated.

@@ -495,22 +753,10 @@ void doGatherShapeNEsirkepovStencilImplicit (
#if defined(WARPX_DIM_RZ)
amrex::Real const xp_new = xp_np1;
amrex::Real const yp_new = yp_np1;
amrex::Real const xp_mid = xp_nph;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this routine are unrelated to the main topic of the PR so should be reverted to keep the PR minimal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it is just a small change and makes things consistent with the cleaner implementation of the new routine, I would prefer to keep it.

@@ -15,6 +15,264 @@

#include <AMReX.H>

/**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a somewhat special purpose routine, maybe move this further down in the file, perhaps to just before doGatherPicnicShapeN.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to use the new routine in doGatherShapeN, then it has to be before that routine. If I move it and someone wants to test out the timings again by calling it from doGatherShapeN, then they would have to move it back to where it is now.

Copy link
Member

@dpgrote dpgrote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@JustinRayAngus JustinRayAngus merged commit 404b444 into BLAST-WarpX:development Mar 25, 2025
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cleaning Clean code, improve readability component: core Core WarpX functionality component: implicit solvers Anything related to implicit solvers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants