Fm/task/unst 9476 pol to cellmask #420
base: main
Conversation
harmenwierenga left a comment
Partial review
```fortran
private
!> dbpinpol routines are public to avoid PetSC dependency in unit tests
```
Could you explain how it helped to make them public?
Maybe the comment is too vague. dbpinpol became a whole separate module from the pol_to_cellmask module so that the PETSc dependency of pol_to_cellmask didn't bleed into the unit tests. But maybe I should remove this comment, since it will only cause questions.
```diff
@@ -1,4 +1,4 @@
-!----- AGPL --------------------------------------------------------------------
+!----- AGPL --------------------------------------------------------------------
```
Accidental space
```fortran
integer :: temp_threads
#endif
integer :: k
if (allocated(cellmask)) deallocate (cellmask)
```
Style guide: no single line if statements (see also below). A converter is currently under review 😄
oops
```fortran
#ifdef _OPENMP
temp_threads = omp_get_max_threads() !> Save old number of threads
if (jampi == 0) then
    call omp_set_num_threads(OMP_GET_NUM_PROCS()) !> Set number of threads to max for this O(N^2) operation
```
Does this circumvent the user settings for the number of threads? Are we sure that they do not set that number for a reason?
Yes, this circumvents the user settings. Most users do not set the number of threads: they either use MPI to parallelize, or OpenMP threading does not speed up Dflowfm meaningfully anyway. A year ago I introduced this same principle for another O(N^2) operation in find1dcells, and it has not caused any problems.
Worst case (you're running many test cases in parallel, but not with MPI), it might be slightly slower due to thread-scheduling overhead. In my opinion the benefits outweigh the risks. Ideally you would offload O(N^2) operations to the GPU or something similar: with 5 million cells you get 25 trillion function evaluations.
```fortran
    call omp_set_num_threads(OMP_GET_NUM_PROCS()) !> Set number of threads to max for this O(N^2) operation
end if !> no else, in MPI mode omp num threads is already set to 1
#endif
!$OMP PARALLEL DO SCHEDULE(DYNAMIC, 100)
```
How did you choose the chunk size of 100? Same question for find1dcells
I added dynamic scheduling because the worst-case evaluation is quite a bit slower than the best case. Without it, 7 out of 8 threads could already be finished with their best-case blocks and then have to wait for thread 8 to finish its worst-case block. A chunk size of 100 seemed like a good starting point, but to be honest I haven't put a lot of thought into it. Will do some research.
```cmake
    VISUAL_STUDIO_PATH engines_gpl/dflowfm/test
)

# Force sequential build order for the test projects (workaround, to be fixed properly)
```
Should be possible to remove this soon 😄
```fortran
! Setup nested dry point polygons:
! 1. Outer dry point polygon: x=[0,40], y=[0,40] with zpl=1
! 2. Inner dry point polygon: x=[10,30], y=[10,30] with zpl=1
```
What does zpl = 1 or -1 do?
Ah, it's explained in the code but not in the test: zpl = 1 is a dry-area polygon, zpl = -1 is an enclosure polygon. I'll update the description.
What was done
Evidence of the work done
<add video/figures if applicable>
Tests
<add testcase numbers if applicable, Issue number>
Documentation
<add description of changes if applicable, Issue number>
Issue link