
Conversation

@cemitch99 cemitch99 commented May 15, 2025

This PR adds a new element that allows the user to track through a region with a specified magnetostatic vector potential. Symplectic integration is performed using the exact form of the nonlinear relativistic Hamiltonian. We use the semiexplicit integrator appearing in:

B. Jayawardana and T. Ohsawa, "Semiexplicit symplectic integrators for non-separable Hamiltonian systems," Math. Comput. 92, pp. 251-281 (2022),
https://doi.org/10.1090/mcom/3778
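
For context, a minimal sketch of the Hamiltonian being integrated (lab-frame form; the element itself tracks with s as the independent variable, so the expressions in the source differ in detail):

$$
H(\mathbf{x},\mathbf{p}) = c\sqrt{\bigl(\mathbf{p} - q\,\mathbf{A}(\mathbf{x})\bigr)^2 + m^2 c^2},
\qquad
\dot{\mathbf{x}} = \frac{\partial H}{\partial \mathbf{p}} = \frac{c\,(\mathbf{p} - q\mathbf{A})}{\sqrt{(\mathbf{p} - q\mathbf{A})^2 + m^2 c^2}},
\qquad
\dot{p}_i = -\frac{\partial H}{\partial x_i} = q\,\frac{\partial \mathbf{A}}{\partial x_i}\cdot\dot{\mathbf{x}}.
$$

Because $\mathbf{A}$ depends on position, $H$ does not separate into a kinetic part in $\mathbf{p}$ plus a potential part in $\mathbf{x}$, which is why an integrator for non-separable Hamiltonians is needed.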

To do:

  • add template for the basic VectorPotential element
  • add functions needed for the semiexplicit integrator (see the illustrative sketch after this list)
  • update the map in VectorPotential to use the correct Hamiltonian derivatives
  • update user-facing inputs
  • add simple benchmark problem(s)
  • check the treatment of s-dependence
  • add an s-dependent benchmark problem
  • add Python bindings
  • update documentation
  • add support for 4th-order and 6th-order integration
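
For the "semiexplicit integrator" item above, here is a deliberately simplified, standalone C++ sketch of the extended-phase-space idea that this family of integrators builds on. This is the explicit Tao/Pihajoki-style variant for a 1D toy Hamiltonian, not the semiexplicit scheme of Jayawardana and Ohsawa used in this PR (which additionally constrains the two phase-space copies to coincide), and none of the names below appear in the ImpactX code:

#include <cmath>
#include <cstdio>

// Two copies (q,p) and (x,y) of phase space, as in extended-phase-space methods.
struct ExtState { double q, p, x, y; };

// Toy non-separable Hamiltonian: H(q,p) = sqrt(1 + (p - a(q))^2), with a(q) = sin(q)
// playing the role of a (normalized) vector potential.
double a    (double q) { return std::sin(q); }
double dadq (double q) { return std::cos(q); }
double dHdq (double q, double p) { double u = p - a(q); return -u*dadq(q)/std::sqrt(1.0 + u*u); }
double dHdp (double q, double p) { double u = p - a(q); return  u/std::sqrt(1.0 + u*u); }

// Exact flow of H(q,y): q and y stay fixed, so p is kicked and x drifts.
void phiA (ExtState& s, double dt) {
    s.p -= dt * dHdq(s.q, s.y);
    s.x += dt * dHdp(s.q, s.y);
}
// Exact flow of H(x,p): x and p stay fixed, so q drifts and y is kicked.
void phiB (ExtState& s, double dt) {
    s.q += dt * dHdp(s.x, s.p);
    s.y -= dt * dHdq(s.x, s.p);
}
// Exact flow of the coupling (omega/2)*((q-x)^2 + (p-y)^2): rotates the copy difference,
// leaving the copy sum unchanged.
void phiC (ExtState& s, double dt, double omega) {
    double const c = std::cos(2.0*omega*dt), sn = std::sin(2.0*omega*dt);
    double const du = s.q - s.x, dv = s.p - s.y, su = s.q + s.x, sv = s.p + s.y;
    double const duR =  c*du + sn*dv;
    double const dvR = -sn*du + c*dv;
    s.q = 0.5*(su + duR);  s.x = 0.5*(su - duR);
    s.p = 0.5*(sv + dvR);  s.y = 0.5*(sv - dvR);
}

// Second-order symmetric composition: A(dt/2) B(dt/2) C(dt) B(dt/2) A(dt/2).
void step2 (ExtState& s, double dt, double omega) {
    phiA(s, 0.5*dt); phiB(s, 0.5*dt); phiC(s, dt, omega); phiB(s, 0.5*dt); phiA(s, 0.5*dt);
}

int main () {
    ExtState s{0.1, 0.2, 0.1, 0.2};          // start with both copies equal
    double const dt = 1.0e-2, omega = 20.0;  // omega is a problem-dependent binding strength
    for (int n = 0; n < 1000; ++n) { step2(s, dt, omega); }
    std::printf("q=%g p=%g copy-deviation=%g\n", s.q, s.p, s.q - s.x);
    return 0;
}

The semiexplicit method of the reference handles the copy constraint differently (that is what makes it semiexplicit); see the paper cited above for the actual scheme.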

@cemitch99 cemitch99 marked this pull request as draft May 15, 2025 01:59
@cemitch99 cemitch99 marked this pull request as ready for review May 15, 2025 02:14
@ax3l ax3l self-requested a review June 18, 2025 18:48
@ax3l ax3l added the "component: elements" and "tracking: particles" labels Jun 18, 2025
@ax3l ax3l added this to the HTU LDRD milestone Jun 18, 2025
@cemitch99 cemitch99 changed the title [WIP] Symplectic integration in a user-defined vector potential Symplectic integration in a user-defined vector potential Jul 17, 2025
Reduce parallelism to avoid job shutdown
@ax3l ax3l commented Aug 8, 2025

The tolerances seem to fail with MPI; maybe they just need a small relaxation, @cemitch99?

Errors while running CTest
	485 - examples-fodo-vector-potential.py.analysis (Failed)
	491 - examples-exact-quad-vector-potential.py.analysis (Failed)
	512 - examples-solenoid-vector-potential.py.analysis (Failed)

Comment on lines +348 to +356
// Evaluate the vector potential and its derivatives
auto const ax = m_dfunc_ax(x, y, z) * m_scale;
auto const ay = m_dfunc_ay(x, y, z) * m_scale;
auto const daxdx = m_dfunc_daxdx(x, y, z) * m_scale;
auto const daxdy = m_dfunc_daxdy(x, y, z) * m_scale;
auto const daydx = m_dfunc_daydx(x, y, z) * m_scale;
auto const daydy = m_dfunc_daydy(x, y, z) * m_scale;
auto const dazdx = m_dfunc_dazdx(x, y, z) * m_scale;
auto const dazdy = m_dfunc_dazdy(x, y, z) * m_scale;

@WeiqunZhang I fear we use too many parsers for the CUDA CI (?) 😢


Possible hack to confirm

Suggested change
-// Evaluate the vector potential and its derivatives
-auto const ax = m_dfunc_ax(x, y, z) * m_scale;
-auto const ay = m_dfunc_ay(x, y, z) * m_scale;
-auto const daxdx = m_dfunc_daxdx(x, y, z) * m_scale;
-auto const daxdy = m_dfunc_daxdy(x, y, z) * m_scale;
-auto const daydx = m_dfunc_daydx(x, y, z) * m_scale;
-auto const daydy = m_dfunc_daydy(x, y, z) * m_scale;
-auto const dazdx = m_dfunc_dazdx(x, y, z) * m_scale;
-auto const dazdy = m_dfunc_dazdy(x, y, z) * m_scale;
+// Evaluate the vector potential and its derivatives
+auto const ax = 0_prt;
+auto const ay = 0_prt;
+auto const daxdx = 0_prt;
+auto const daxdy = 0_prt;
+auto const daydx = 0_prt;
+auto const daydy = 0_prt;
+auto const dazdx = 0_prt;
+auto const dazdy = 0_prt;


We probably could try to merge them into one Parser. Let me think about it.


Or maybe it does not help. I think we need to force noinline them.


How about this?

amrex::GpuArray<amrex::ParserExecutor<3>,8> df{m_dfunc_ax, m_dfunc_ay, ....}; // on host

// on device
amrex::Real results[8];
for (int i = 0; i < 8; ++i) {
    results[i] = df[i](x,y,z) * m_scale;
}


You might add a pragma to make sure the loop is not unrolled.
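
For example, with nvcc something like this might work (pragma spelling is compiler specific, so treat this as an untested sketch):

amrex::Real results[8];
#pragma unroll 1   // request that the loop not be unrolled, so the eight parser calls are not expanded back-to-back
for (int i = 0; i < 8; ++i) {
    results[i] = df[i](x,y,z) * m_scale;
}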


As for the noinline approach, you can add a helper in your code that is marked as noinline. Something like

template <int N, typename... T>
AMREX_GPU_HOST_DEVICE AMREX_NO_INLINE
auto call_parser (amrex::ParserExecutor<N> const& f, T... xyz)
{
    return f(xyz...);
}
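
With such a helper in place, the kernel above could then read, for example (a sketch, assuming the helper keeps the name call_parser):

auto const ax    = call_parser(m_dfunc_ax,    x, y, z) * m_scale;
auto const daxdx = call_parser(m_dfunc_daxdx, x, y, z) * m_scale;
// ... and likewise for ay, daxdy, daydx, daydy, dazdx, dazdy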


I think that might be good!

So far, reducing the CUDA build to -j1 helped, but calling a non-inlined helper for this case would be super helpful.

@WeiqunZhang would you like to push this to the PR or a follow-up?


I can put the helper function in amrex. Then we can use it here or in another PR. (I tested it. It did work.)


@ax3l ax3l commented Aug 13, 2025

Something is still off with the Python tests. Besides the tolerance issues, they seem to run significantly longer than their app/executable counterparts?

ctest --test-dir build -R vector
      Start 481: fodo-vector-potential.run
 1/18 Test #481: fodo-vector-potential.run .................   Passed    3.65 sec
      Start 482: fodo-vector-potential.analysis
 2/18 Test #482: fodo-vector-potential.analysis ............   Passed    0.30 sec
      Start 483: fodo-vector-potential.cleanup
 3/18 Test #483: fodo-vector-potential.cleanup .............   Passed    0.01 sec
      Start 484: fodo-vector-potential.py.run
 4/18 Test #484: fodo-vector-potential.py.run ..............   Passed  268.34 sec
      Start 485: fodo-vector-potential.py.analysis
 5/18 Test #485: fodo-vector-potential.py.analysis .........***Failed    0.26 sec
      Start 486: fodo-vector-potential.py.cleanup
 6/18 Test #486: fodo-vector-potential.py.cleanup ..........   Passed    0.01 sec
      Start 487: exact-quad-vector-potential.run
 7/18 Test #487: exact-quad-vector-potential.run ...........   Passed    3.73 sec
      Start 488: exact-quad-vector-potential.analysis
 8/18 Test #488: exact-quad-vector-potential.analysis ......   Passed    0.57 sec
      Start 489: exact-quad-vector-potential.cleanup
 9/18 Test #489: exact-quad-vector-potential.cleanup .......   Passed    0.01 sec
      Start 490: exact-quad-vector-potential.py.run
10/18 Test #490: exact-quad-vector-potential.py.run ........   Passed   11.98 sec
      Start 491: exact-quad-vector-potential.py.analysis
11/18 Test #491: exact-quad-vector-potential.py.analysis ...***Failed    0.57 sec
      Start 492: exact-quad-vector-potential.py.cleanup
12/18 Test #492: exact-quad-vector-potential.py.cleanup ....   Passed    0.01 sec
      Start 508: solenoid-vector-potential.run
13/18 Test #508: solenoid-vector-potential.run .............   Passed    6.24 sec
      Start 509: solenoid-vector-potential.analysis
14/18 Test #509: solenoid-vector-potential.analysis ........   Passed    0.59 sec
      Start 510: solenoid-vector-potential.cleanup
15/18 Test #510: solenoid-vector-potential.cleanup .........   Passed    0.01 sec
      Start 511: solenoid-vector-potential.py.run
16/18 Test #511: solenoid-vector-potential.py.run ..........   Passed   23.76 sec
      Start 512: solenoid-vector-potential.py.analysis
17/18 Test #512: solenoid-vector-potential.py.analysis .....***Failed    0.58 sec
      Start 513: solenoid-vector-potential.py.cleanup
18/18 Test #513: solenoid-vector-potential.py.cleanup ......   Passed    0.01 sec

This element requires these additional parameters:
* ``<element_name>.ds`` (``float``, in meters) the segment length
* ``<element_name>.unit`` (``integer``) specification of units for the vector potential (default: ``0``)

@ax3l ax3l Aug 13, 2025

This is called `unit` here, while the inputs file (app API) calls it `units`.

ax3l added 2 commits August 12, 2025 23:55
Was not the same input as used in the app example
Was not the same input as used in the app example
@ax3l ax3l commented Aug 13, 2025

@cemitch99 the three Python run files are not yet 100% identical to the app inputs files, which is likely the origin of the failing tests. I fixed the differences I spotted, but some remain.

WeiqunZhang added a commit to WeiqunZhang/amrex that referenced this pull request Aug 13, 2025
This works on lambdas, functors, normal functions. But it does not work on
overloaded functions like std::sin. If needed, one could however wrap
functions like std::sin inside a lambda function.

Here is the motivation behind this PR. In this impactx PR (BLAST-ImpactX/impactx#964), a GPU kernel uses 8 amrex::Parsers. The CUDA CI fails if more than one build job is used. Apparently the kernel is too big because all those parser functions are inlined. This PR provides a way to reduce the size by forcing noinline.
ax3l pushed a commit to AMReX-Codes/amrex that referenced this pull request Aug 28, 2025
This works on lambdas, functors, and normal functions. But it does not work on overloaded functions like std::sin; if needed, one could wrap functions like std::sin inside a lambda. It also does not work with normal functions for SYCL, where one would likewise have to wrap them inside a lambda.

Here is the motivation behind this PR. In this impactx PR (BLAST-ImpactX/impactx#964), a GPU kernel uses 8 amrex::Parsers. The CUDA CI fails if more than one build job is used. Apparently the kernel is too big because all those parser functions are inlined. This PR provides a way to reduce the size by forcing noinline.
@cemitch99 cemitch99 (Member Author) commented

The three new tests currently fail only in the OpenMP / GCC w/ MPI w/ Python case. Note that the execution time of the Python test is very long: 32.42 sec for the solenoid example, compared to only 0.56 sec on macOS / AppleClang (with similar behavior for the other two tests). Also, although the solenoid example runs successfully, the initial and final beam moments agree to all digits (and they should not), which appears to indicate that no tracking push was applied to the beam.

@ax3l ax3l commented Aug 29, 2025

Thanks! If they run a bit on the longer end, let us add the slow label on these (in CMakeLists.txt).
