Symplectic integration in a user-defined magnetostatic vector potential #964
base: development
Conversation
for more information, see https://pre-commit.ci
Reduce parallelism to avoid job shutdown
The tolerances seem to fail with MPI, maybe they just need a small relaxation, @cemitch99?
// Evaluate the vector potential and its derivatives
auto const ax = m_dfunc_ax(x, y, z) * m_scale;
auto const ay = m_dfunc_ay(x, y, z) * m_scale;
auto const daxdx = m_dfunc_daxdx(x, y, z) * m_scale;
auto const daxdy = m_dfunc_daxdy(x, y, z) * m_scale;
auto const daydx = m_dfunc_daydx(x, y, z) * m_scale;
auto const daydy = m_dfunc_daydy(x, y, z) * m_scale;
auto const dazdx = m_dfunc_dazdx(x, y, z) * m_scale;
auto const dazdy = m_dfunc_dazdy(x, y, z) * m_scale;
@WeiqunZhang I fear we use too many parsers for the CUDA CI (?) 😢
Possible hack to confirm
Suggested change (replace the parser evaluations shown above with constants):

// Evaluate the vector potential and its derivatives
auto const ax = 0_prt;
auto const ay = 0_prt;
auto const daxdx = 0_prt;
auto const daxdy = 0_prt;
auto const daydx = 0_prt;
auto const daydy = 0_prt;
auto const dazdx = 0_prt;
auto const dazdy = 0_prt;
We probably could try to merge them into one Parser. Let me think about it.
Or maybe it does not help. I think we need to force them to be noinline.
How about this?
amrex::GpuArray<amrex::ParserExecutor<3>,8> df{m_dfunc_ax, m_dfunc_ay, ....}; // on host

// on device
amrex::Real results[8];
for (int i = 0; i < 8; ++i) {
    results[i] = df[i](x,y,z) * m_scale;
}
You might add a pragma to make sure the loop is not unrolled.
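For example, a minimal sketch of that suggestion, assuming the loop above runs inside a CUDA device lambda (#pragma unroll 1 asks nvcc to keep the loop rolled rather than unrolling it):

amrex::Real results[8];
#pragma unroll 1  // keep the loop rolled so the parser calls are not replicated
for (int i = 0; i < 8; ++i) {
    results[i] = df[i](x, y, z) * m_scale;
}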
As for the noinline approach, you can add a helper in your code that is marked as noinline. Something like:
template <int N, typename... T>
AMREX_GPU_HOST_DEVICE AMREX_NO_INLINE
auto call_parser (ParserExecutor<N> const& f, T... xyz)
{
    return f(xyz...);
}
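As a usage sketch (assuming this helper is visible from the element's push kernel; the m_dfunc_* executors and m_scale are the members shown earlier in this thread):

auto const ax    = call_parser(m_dfunc_ax,    x, y, z) * m_scale;
auto const ay    = call_parser(m_dfunc_ay,    x, y, z) * m_scale;
auto const daxdx = call_parser(m_dfunc_daxdx, x, y, z) * m_scale;
// ... and likewise for the remaining five derivative evaluations.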
I think that might be good!
So far, reducing the compile to -j1 for CUDA helped, but calling a non-inlined helper for this case would be super helpful.
@WeiqunZhang would you like to push this to the PR or a follow-up?
I can put the helper function in amrex. Then we can use it here or in another PR. (I tested it. It did work.)
Something is still off with the Python tests. Besides the tolerance issues, they seem to run significantly longer than their app/executable counterparts?
This element requires these additional parameters:

* ``<element_name>.ds`` (``float``, in meters) the segment length
* ``<element_name>.unit`` (``integer``) specification of units for the vector potential (default: ``0``)
This is called `unit` here, while the inputs file (app API) calls it `units`.
Was not the same input as used in the app example (noted on each of the three Python run files).
@cemitch99 the three Python run files are not yet 100% identical with the app inputs files, which should be the origin of the failing tests. I fixed the things I spotted, but some differences remain.
This works on lambdas, functors, and normal functions. It does not work on overloaded functions like std::sin; if needed, one could wrap such functions inside a lambda. It also does not work with normal functions for SYCL, where one would likewise have to wrap them inside a lambda.

Here is the motivation behind this PR. In this ImpactX PR (BLAST-ImpactX/impactx#964), a GPU kernel uses 8 amrex::Parser executors. The CUDA CI fails if more than one job is used in the build; apparently the kernel is too big because all those parser functions are inlined. This PR provides a way to reduce the size by forcing noinline.
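As a hedged illustration of the std::sin workaround mentioned above (the wrapper name below is illustrative, not necessarily the helper that was merged into AMReX):

// A noinline helper in the spirit of the one discussed above;
// "noinline_call" is a placeholder name, not the actual AMReX API.
template <typename F, typename... Args>
AMREX_GPU_HOST_DEVICE AMREX_NO_INLINE
auto noinline_call (F const& f, Args... args) { return f(args...); }

// std::sin is overloaded, so it cannot be passed directly; a small lambda
// that selects one concrete overload can be passed instead:
auto sin_wrapped = [] AMREX_GPU_HOST_DEVICE (amrex::Real v) { return std::sin(v); };
// inside a kernel: amrex::Real s = noinline_call(sin_wrapped, x);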
The three new tests currently fail only in the OpenMP / GCC w/ MPI w/ Python case. Note that the execution time of the Python test is very long: 32.42 s for the solenoid example, which takes only 0.56 s on macOS / AppleClang (with similar behavior for the other two tests). Also, although the solenoid example runs successfully, the initial and final beam moments agree to all digits (and they should not), which appears to indicate that no tracking push was applied to the beam.
Thanks! If they run a bit on the longer end, let us add the …
This PR adds a new element that allows the user to track through a region with a specified magnetostatic vector potential. Symplectic integration is performed using the exact form of the nonlinear relativistic Hamiltonian. We use the semiexplicit integrator appearing in:
B. Jayawardana and T. Ohsawa, "Semiexplicit symplectic integrators for non-separable Hamiltonian systems," Math. Comput. 92, pp. 251-281 (2022),
https://doi.org/10.1090/mcom/3778
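For orientation, here is a minimal, hedged sketch of the extended phase-space splitting that the semiexplicit integrator builds on (doubling the phase space so that each sub-flow of a non-separable Hamiltonian becomes explicit). The symmetric-projection step added in the cited paper and the actual relativistic Hamiltonian of this element are not shown; all names and the toy interface are illustrative, not the PR's API:

#include <functional>

// Doubled phase space: the pair (q, p) and its copy (x, y).
struct ExtendedState { double q, p, x, y; };

// Partial derivatives of a (possibly non-separable) Hamiltonian H(q, p).
using Deriv = std::function<double (double, double)>;

// One Strang-split step of the extended Hamiltonian H(q, y) + H(x, p).
// Each sub-flow is explicit because the variables it differentiates
// with respect to stay frozen during that sub-flow.
void extended_step (ExtendedState& s, double dt, Deriv const& dHdq, Deriv const& dHdp)
{
    auto flow_A = [&] (double tau) {   // flow of H(q, y): q and y frozen
        s.p -= tau * dHdq(s.q, s.y);
        s.x += tau * dHdp(s.q, s.y);
    };
    auto flow_B = [&] (double tau) {   // flow of H(x, p): x and p frozen
        s.q += tau * dHdp(s.x, s.p);
        s.y -= tau * dHdq(s.x, s.p);
    };
    flow_A(0.5 * dt);
    flow_B(dt);
    flow_A(0.5 * dt);
}

In the element itself, the Hamiltonian is the exact relativistic one built from the user-supplied vector potential, and the projection step described in the cited paper keeps the two phase-space copies from drifting apart.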
To do: