
Conversation

@FlorisBuwaldaDeltares (Contributor) commented Dec 3, 2025

What was done

  • pol_to_cellmask was split into three optimized routines (init, calculate, cleanup)
  • Added a new pinpok_elemental routine for faster ray casting (a rough sketch of the idea follows this list)
  • Use OpenMP to parallelize the O(N^2) operation when not running in MPI mode
  • Added two unit tests: one with an enclosure polygon containing a dry-area polygon, and one with two nested dry-area polygons
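The ray-casting idea behind pinpok_elemental is, roughly, counting how often a horizontal ray from the query point crosses the polygon edges. The sketch below is an illustration only and does not reproduce the actual pinpok_elemental interface; the module, routine, and argument names are assumptions.

module pinpok_sketch
   implicit none
   integer, parameter :: dp = kind(1.0d0)
contains

   !> Ray-casting (crossing-number) point-in-polygon test.
   pure logical function point_in_polygon(xp, yp, xpoly, ypoly) result(inside)
      real(dp), intent(in) :: xp, yp              !< query point
      real(dp), intent(in) :: xpoly(:), ypoly(:)  !< polygon vertices (implicitly closed)
      integer :: i, j

      inside = .false.
      j = size(xpoly)
      do i = 1, size(xpoly)
         ! Does edge (j,i) straddle the horizontal line y = yp?
         if ((ypoly(i) > yp) .neqv. (ypoly(j) > yp)) then
            ! Is the crossing point to the right of the query point?
            if (xp < xpoly(i) + (xpoly(j) - xpoly(i)) * (yp - ypoly(i)) / (ypoly(j) - ypoly(i))) then
               inside = .not. inside
            end if
         end if
         j = i
      end do
   end function point_in_polygon

end module pinpok_sketch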

Evidence of the work done

  • Video/figures
    <add video/figures if applicable>
  • Clear from the issue description
  • Not applicable

Tests

  • Tests updated
    <add testcase numbers if applicable, Issue number>
  • [x] Not applicable

Documentation

  • Documentation updated
    <add description of changes if applicable, Issue number>
  • [x] Not applicable

Issue link

@harmenwierenga (Contributor) left a comment

Partial review


private

!> dbpinpol routines are public to avoid PetSC dependency in unit tests

Contributor:

Could you explain how it helped to make them public?

Contributor Author:

Maybe the comment is too vague. dbpinpol became a separate module from the pol_to_cellmask module so that the PETSc dependency of pol_to_cellmask does not bleed into the unit test. But maybe I should remove this comment, as it will only cause questions.

@@ -1,4 +1,4 @@
!----- AGPL --------------------------------------------------------------------
!----- AGPL --------------------------------------------------------------------

Contributor:

Accidental space

integer :: temp_threads
#endif
integer :: k
if (allocated(cellmask)) deallocate (cellmask)

Contributor:

Style guide: no single-line if statements (see also below). A converter is currently under review 😄

Contributor Author:

oops

#ifdef _OPENMP
temp_threads = omp_get_max_threads() !> Save old number of threads
if (jampi == 0) then
call omp_set_num_threads(OMP_GET_NUM_PROCS()) !> Set number of threads to max for this O(N^2) operation

Contributor:

Does this circumvent the user settings for the number of threads? Are we sure that they do not set that number for a reason?

Contributor Author:

Yes, this circumvents the user settings. Most users do not set the number of threads, since they use MPI to parallelize and OpenMP threading does not otherwise speed up Dflowfm meaningfully. However, a year ago I introduced this same principle for another O(N^2) operation in find1dcells and it has not caused any problems.
Worst-case scenario (you are running many test cases in parallel, but not with MPI), it might be a tiny bit slower due to thread-scheduling overhead. In my opinion the benefits outweigh the drawbacks. Ideally you would offload O(N^2) operations to the GPU or something similar; with 5 million cells you already get 25 trillion function evaluations.
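To make the pattern under discussion concrete: the overall structure is roughly as follows. This is a sketch, not a verbatim copy of the PR; it assumes use omp_lib is in scope, that ndx stands in for the number of cells, and that the thread count saved in the init routine is restored in the cleanup routine.

#ifdef _OPENMP
   temp_threads = omp_get_max_threads()               ! save the current thread setting
   if (jampi == 0) then                               ! only when not running under MPI
      call omp_set_num_threads(omp_get_num_procs())   ! use all cores for the O(N^2) loop
   end if
#endif

   !$OMP PARALLEL DO SCHEDULE(DYNAMIC, 100)
   do k = 1, ndx
      ! point-in-polygon evaluations for cell k
   end do
   !$OMP END PARALLEL DO

#ifdef _OPENMP
   call omp_set_num_threads(temp_threads)             ! restore the previous setting
#endif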

call omp_set_num_threads(OMP_GET_NUM_PROCS()) !> Set number of threads to max for this O(N^2) operation
end if !> no else, in MPI mode omp num threads is already set to 1
#endif
!$OMP PARALLEL DO SCHEDULE(DYNAMIC, 100)

Contributor:

How did you choose the chunk size of 100? Same question for find1dcells

Contributor Author:

I added dynamic scheduling because the worst-case evaluation is quite a bit slower than the best case. Without it, 7 out of 8 threads could already be finished with their best-case blocks and then have to wait for thread 8 to finish its worst-case block. A chunk size of 100 seemed like a good starting point, but to be honest I haven't put a lot of thought into it. Will do some research.
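One option (a suggestion, not code from this PR) would be to give the chunk size a name, so that tuning it later only touches one line:

   integer, parameter :: chunk_size = 100   ! tuning knob for dynamic scheduling

   !$OMP PARALLEL DO SCHEDULE(DYNAMIC, chunk_size)
   do k = 1, ndx
      ! per-cell work whose cost varies strongly between best and worst case
   end do
   !$OMP END PARALLEL DO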

VISUAL_STUDIO_PATH engines_gpl/dflowfm/test
)

# Force sequential build order for the test projects (workaround, to be fixed properly)

Contributor:

Should be possible to remove this soon 😄


! Setup nested dry point polygons:
! 1. Outer dry point polygon: x=[0,40], y=[0,40] with zpl=1
! 2. Inner dry point polygon: x=[10,30], y=[10,30] with zpl=1

Contributor:

What does zpl = 1 or -1 do?

Contributor Author:

Ah, it's explained in the code but not in the test: zpl = 1 marks a dry-area polygon, zpl = -1 an enclosure polygon. I'll update the description.
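In terms of polygon data, the nested dry-area setup described in the test boils down to something like the sketch below. The array names xpl/ypl/zpl and the dmiss separator between polygons are assumptions based on the surrounding code, not quoted from the test itself.

   ! Two nested dry-area polygons (zpl = 1); an enclosure polygon would use zpl = -1.
   ! Polygons are assumed to be concatenated in xpl/ypl/zpl and separated by dmiss.
   xpl = [ 0d0, 40d0, 40d0,  0d0, dmiss, 10d0, 30d0, 30d0, 10d0]
   ypl = [ 0d0,  0d0, 40d0, 40d0, dmiss, 10d0, 10d0, 30d0, 30d0]
   zpl = [ 1d0,  1d0,  1d0,  1d0, dmiss,  1d0,  1d0,  1d0,  1d0]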
