[Proof of Concept] Implement data type to define threaded broadcasting #2284
Review checklist
This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.
Purpose and scope
Code quality
Documentation
Testing
Performance
Verification
Created with ❤️ by the Trixi.jl community.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2284 +/- ##
===========================================
- Coverage 89.49% 79.09% -10.40%
===========================================
Files 490 490
Lines 39507 39536 +29
===========================================
- Hits 35355 31270 -4085
- Misses 4152 8266 +4114
Looks interesting. We need to check how this interacts with the GPU plans. It would also be great to get a review from Valentin (who self-requested his review already).
@@ -101,8 +102,78 @@ function semidiscretize(semi::AbstractSemidiscretization, tspan;
    return ODEProblem{iip, specialize}(rhs!, u0_ode, tspan, semi)
end

struct ThreadedBroadcastArray{T, N, A} <: AbstractArray{T, N}
Suggested change:
- struct ThreadedBroadcastArray{T, N, A} <: AbstractArray{T, N}
+ struct ThreadedBroadcastArray{T, N, A <: AbstractArray{T, N}} <: AbstractArray{T, N}
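To illustrate what the suggested constraint buys, here is a minimal sketch. The names `LooseWrapper` and `StrictWrapper` are hypothetical and only mirror the two signatures above; they are not code from this PR.

```julia
# Hypothetical wrapper without a constraint on `A` (mirrors the original signature).
struct LooseWrapper{T, N, A} <: AbstractArray{T, N}
    array::A
end

# Hypothetical wrapper with the suggested constraint on `A`.
struct StrictWrapper{T, N, A <: AbstractArray{T, N}} <: AbstractArray{T, N}
    array::A
end

# Convenience constructor that derives T and N from the wrapped array.
StrictWrapper(a::AbstractArray{T, N}) where {T, N} = StrictWrapper{T, N, typeof(a)}(a)

v = [1.0, 2.0, 3.0]

# Without the constraint, the parameters may contradict the wrapped array:
# this claims to be an Int matrix while storing a Float64 vector.
loose = LooseWrapper{Int, 2, Vector{Float64}}(v)
loose_eltype = eltype(loose)

# With the constraint, the same mismatch is rejected when the type is formed.
strict_rejects = try
    StrictWrapper{Int, 2, Vector{Float64}}(v)
    false
catch err
    err isa TypeError
end

strict = StrictWrapper(v)
```

With the constraint in place, `T` and `N` are tied to the wrapped array, so methods such as `size` and `getindex` that forward to the parent cannot disagree with the declared `AbstractArray{T, N}` supertype.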
Base.parent(m::ThreadedBroadcastArray) = m.array
Base.size(m::ThreadedBroadcastArray) = size(m.array)
Suggested change:
- Base.parent(m::ThreadedBroadcastArray) = m.array
- Base.size(m::ThreadedBroadcastArray) = size(m.array)
+ Base.parent(m::ThreadedBroadcastArray) = m.array
+ Base.pointer(m::ThreadedBroadcastArray) = pointer(parent(m))
+ Base.size(m::ThreadedBroadcastArray) = size(parent(m))
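The forwarding pattern in this suggestion can be tried out in isolation. The following is a sketch only: `ForwardingArray` is a hypothetical stand-in, and the `getindex`, `setindex!`, and `IndexStyle` methods are assumptions added to make the example usable, not necessarily what the PR defines.

```julia
# Hypothetical wrapper type used only for this illustration.
struct ForwardingArray{T, N, A <: AbstractArray{T, N}} <: AbstractArray{T, N}
    array::A
end
ForwardingArray(a::AbstractArray{T, N}) where {T, N} = ForwardingArray{T, N, typeof(a)}(a)

# Route every accessor through `parent`, so the field is touched in one place only.
Base.parent(m::ForwardingArray) = m.array
Base.pointer(m::ForwardingArray) = pointer(parent(m))
Base.size(m::ForwardingArray) = size(parent(m))

# Assumed additions: indexing is forwarded to the wrapped array as well.
Base.IndexStyle(::Type{<:ForwardingArray{T, N, A}}) where {T, N, A} = IndexStyle(A)
Base.@propagate_inbounds Base.getindex(m::ForwardingArray, i...) = parent(m)[i...]
Base.@propagate_inbounds function Base.setindex!(m::ForwardingArray, v, i...)
    parent(m)[i...] = v
    return m
end

data = zeros(3)
w = ForwardingArray(data)
w[2] = 5.0   # writes through to `data`
```

Forwarding `pointer` means code that needs the raw memory address of the data sees the same address for the wrapper as for the wrapped array.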
It's also on the agenda for Tuesday.
Also see trixi-framework/TrixiParticles.jl#722 for the TrixiParticles version of this.
As discussed in #2283.
The new data type `ThreadedBroadcastArray` wraps any other array, but redefines broadcasting and functions like `copyto!` and `fill!` to use `@threaded` from Trixi.jl.
Using this data type for `u_ode` makes broadcasting in `perform_step!` of the time integration multithreaded.
This has the same effect as setting `thread = True()` for time integration schemes that support this option, but it also works for other schemes and would follow a potential redefinition of `@threaded` away from Polyester.jl.
Without threaded time integration:
With `thread = True()`:
With this PR:
Some notes:
- […] `thread = True()` and this PR is within measuring errors.
- […] `perform_step!` of the Carpenter-Kennedy scheme.
- […] `@batch` approach always allocates a little. Interestingly, the `thread = True()` version, which uses FastBroadcast.jl (which in turn is based on Polyester.jl), does not allocate for the `broadcastX` timers. Not sure why. This is the code they're using: https://github.com/YingboMa/FastBroadcast.jl/blob/ad586d83ffcac15c92969b93dd5cf0c8fd025af9/src/FastBroadcast.jl#L326-L334
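The wrapper idea from the description can be sketched in a few lines. This is not the PR's implementation: the type name `ThreadedSketchArray` is hypothetical, `Base.Threads.@threads` is used as a stand-in for Trixi.jl's `@threaded`, and aliasing checks between destination and arguments are omitted for brevity.

```julia
using Base.Threads: @threads
using Base.Broadcast: Broadcasted

# Hypothetical wrapper; the name and the details are assumptions for illustration.
struct ThreadedSketchArray{T, N, A <: AbstractArray{T, N}} <: AbstractArray{T, N}
    array::A
end
ThreadedSketchArray(a::AbstractArray{T, N}) where {T, N} = ThreadedSketchArray{T, N, typeof(a)}(a)

# Minimal AbstractArray interface, forwarded to the wrapped array.
Base.parent(m::ThreadedSketchArray) = m.array
Base.size(m::ThreadedSketchArray) = size(parent(m))
Base.IndexStyle(::Type{<:ThreadedSketchArray{T, N, A}}) where {T, N, A} = IndexStyle(A)
Base.@propagate_inbounds Base.getindex(m::ThreadedSketchArray, i...) = parent(m)[i...]
Base.@propagate_inbounds function Base.setindex!(m::ThreadedSketchArray, v, i...)
    parent(m)[i...] = v
    return m
end

# Threaded `fill!`: each thread writes a chunk of the destination.
function Base.fill!(m::ThreadedSketchArray, x)
    dest = parent(m)
    @threads for i in eachindex(dest)
        @inbounds dest[i] = x
    end
    return m
end

# Threaded in-place broadcast: `m .= f.(args...)` ends up in this method,
# because `materialize!` calls `copyto!(dest, bc)` with the fused expression.
function Base.copyto!(m::ThreadedSketchArray, bc::Broadcasted)
    axes(m) == axes(bc) || throw(DimensionMismatch("destination and broadcast axes differ"))
    dest = parent(m)
    @threads for I in eachindex(bc)
        # `bc[I]` evaluates the fused broadcast expression at index I.
        @inbounds dest[I] = bc[I]
    end
    return m
end

u = ThreadedSketchArray(zeros(1000))
v = collect(1.0:1000.0)
u .= 2 .* v .+ 1   # runs the threaded `copyto!` above
expected = 2 .* v .+ 1
```

Because only `copyto!` and `fill!` are specialized on the wrapper type, a time integrator that updates its state with in-place broadcasts would pick up the threaded loops without any changes to the integrator itself, provided Julia is started with more than one thread.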