Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow restart calculations from a p-refined solution #30052

Open
wants to merge 2 commits into
base: next
Choose a base branch
from

Conversation

YaqiWang
Copy link
Contributor

@YaqiWang YaqiWang commented Mar 7, 2025

This possibly will also enable p-refinement in mesh generators.

Closes #30048.

@YaqiWang YaqiWang force-pushed the p_restart_30048 branch 4 times, most recently from ebb96a0 to 9027bde Compare March 7, 2025 22:29
@moosebuild
Copy link
Contributor

moosebuild commented Mar 8, 2025

Job Documentation, step Docs: sync website on 3c161ae wanted to post the following:

View the site here

This comment will be updated on new commits.

@YaqiWang
Copy link
Contributor Author

Are the test failures related with my change?

@roystgnr
Copy link
Contributor

Are the test failures related with my change?

Well, they're definitely not a fluke. Most of these failures look the same to me: we're hitting a material's computeQpProperties() via FEProblemBase::initElementStatefulProps(), but when we try to set a value for qp 0 we discover that the underlying MooseArray is still sitting at size 0, so it screams at us in debug builds and crashes in opt builds.

I don't see how they could be triggered by your changes here, though. Unless the changes from update() to meshChanged() in SetupMeshCompleteAction::act() are inadvertently causing some initialization to be omitted??

@YaqiWang
Copy link
Contributor Author

I don't see how they could be triggered by your changes here, though.

Me neither ;-( MooseMesh::meshChanged is calling MooseMesh::update. Without it, I actually saw a test crash on my desktop, (the test is using pre-split mesh with a h uniform refinement) which makes sense to me. I think the failing tests are there without it.

@YaqiWang
Copy link
Contributor Author

Eh, I saw one function MooseMesh::buildRefinementAndCoarseningMaps could be a bit suspicious. Let me check.

@YaqiWang
Copy link
Contributor Author

Still cannot find any... One thing is that EmptyAction is no longer build in THM tests with my change, but that does not seem matter because it is an empty action.

@@ -614,6 +614,13 @@ MooseMesh::update()
cacheInfo();
buildElemIDInfo();

// this will make moose mesh aware of p-refinement added by mesh generators including
// a file mesh generator loading a restart checkpoint file
for (const auto & elem : *getActiveLocalElementRange())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something quite insane is happening here. If I replace this with

for (const auto & elem : getMesh().active_local_element_ptr_range())

the error is gone. While I admit I should directly use the libMesh element range, but this code is not affecting anything supposedly. I will push the change momentarily.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd admit that MOOSE has a lot of shims that are basically "add a second way to do a libMesh thing for no better reason than a preference for camelCase over snake_case", there generally aren't any shims that are "add a second way to do a libMesh thing, but make it broken this time".

... unless this is one, in this context? getActiveLocalElementRange() doesn't just get two libMesh iterators defining a range, it copies the entire range into a new StoredRange, and it caches that copy. This isn't insane (we use StoredRange in our threading APIs, and the copy is too expensive to repeat for every threaded section), but: if anyone adds an element to the mesh or deletes an element from the mesh, then tries to use that cached range again before calling MooseMesh::meshChanged(), the result is undefined behavior. A crash if you're lucky.

Comment on lines 36 to 40
const auto families_enum = AddVariableAction::getNonlinearVariableFamilies();
MultiMooseEnum disable_p_refinement_for_families(families_enum.getRawNames());
params.addParam<MultiMooseEnum>("disable_p_refinement_for_families",
disable_p_refinement_for_families,
"What families we should disable p-refinement for.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels very strange to have algebraic information in a mesh action

@@ -22,71 +22,71 @@
input = test.i
csvdiff = test_out.csv
prereq = 'disable_p_refinement/lagrange_second_order_mesh'
cli_args = 'Adaptivity/disable_p_refinement_for_families="hermite" AuxVariables/test/family=HERMITE AuxVariables/test/order=THIRD'
cli_args = 'Mesh/disable_p_refinement_for_families="hermite" AuxVariables/test/family=HERMITE AuxVariables/test/order=THIRD'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, I'm not a fan of this at all. What's the point of having an Adaptivity block at this point

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mesh could be p-refined by loading a checkpoint file. In such a case, we do not have [Adpativity] but we still need to disable certain shape function families. I think this is the reason that we need to move this into [Mesh]. We could move this into [Problem], but that is a syntax used for creating custom problems and may create implications. Having this parameter in [Mesh] just mean that the created mesh will not support these shape functions. Most of times, users can just use the default I guess.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be much better to save the information about what families to disable p-refinement for as restartable data in the problem/assembly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A mesh has absolutely no data about finite element families

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be possible to move the code directly in FEProblemBase::init at its beginning. If so, disable_p_refinement_for_families can be a parameter of [Problem].

Copy link
Contributor Author

@YaqiWang YaqiWang Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is a restartable data, it won't be available before calling FEProblemBase::init. We will have to call doingPRefinement after data are restarted in FEProblemBase::initialSetup. Will that be too late?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would be too late, but I am not 100% sure. It just needs to happen before a FE::reinit occurs. I would like to at least try and see becuase I still think this parameter is a better fit in the Adaptivity block as it is about adaptivity

@moosebuild
Copy link
Contributor

Job OpenMPI on aec4026 : invalidated by @YaqiWang

Random failure? Cannot produce this on my mac.

@moosebuild
Copy link
Contributor

Job Modules parallel on aec4026 : invalidated by @YaqiWang

Random failure? Cannot produce this on my mac.

@moosebuild
Copy link
Contributor

moosebuild commented Mar 12, 2025

Job Coverage, step Generate coverage on 3c161ae wanted to post the following:

Framework coverage

a5ad50 #30052 3c161a
Total Total +/- New
Rate 85.29% 85.29% +0.00% 93.33%
Hits 109051 109070 +19 70
Misses 18807 18810 +3 5

Diff coverage report

Full coverage report

Modules coverage

Level set

a5ad50 #30052 3c161a
Total Total +/- New
Rate 85.49% 85.49% - 100.00%
Hits 324 324 - 1
Misses 55 55 - 0

Diff coverage report

Full coverage report

Full coverage reports

Reports

This comment will be updated on new commits.

@moosebuild
Copy link
Contributor

Job Coverage, step Verify coverage on aec4026 wanted to post the following:

The following coverage requirement(s) failed:

  • Failed to generate subchannel coverage rate (required: 87.0%)

Comment on lines 6189 to 6193
if (_displaced_mesh)
_displaced_mesh->doingPRefinement(true);
if (_displaced_problem)
{
_displaced_problem->mesh().doingPRefinement(true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two are the same

@@ -22,71 +22,71 @@
input = test.i
csvdiff = test_out.csv
prereq = 'disable_p_refinement/lagrange_second_order_mesh'
cli_args = 'Adaptivity/disable_p_refinement_for_families="hermite" AuxVariables/test/family=HERMITE AuxVariables/test/order=THIRD'
cli_args = 'Mesh/disable_p_refinement_for_families="hermite" AuxVariables/test/family=HERMITE AuxVariables/test/order=THIRD'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would be too late, but I am not 100% sure. It just needs to happen before a FE::reinit occurs. I would like to at least try and see becuase I still think this parameter is a better fit in the Adaptivity block as it is about adaptivity

@YaqiWang
Copy link
Contributor Author

The issue is that this parameter disable_p_refinement_for_families is needed for a p-refined mesh in a checkpoint file loaded with file mesh generator, or in general any mesh generator performing p-refinement. I have to say that this parameter does not belong to Adaptivity although it is the only place using it currently. I will try making it restartable in FEProblemBase and report back.

@YaqiWang
Copy link
Contributor Author

Got a PETSc error with the new test I added if I move the doingPRefinement call after restart in FEProblemBase::initialSetup:

restart/p_refinement_restart.steady_from_steady_restart: [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
restart/p_refinement_restart.steady_from_steady_restart: [0]PETSC ERROR: Object is in wrong state
restart/p_refinement_restart.steady_from_steady_restart: [0]PETSC ERROR: Matrix is missing diagonal entry 33

@lindsayad
Copy link
Member

Let me take a look. Thanks for trying

1 similar comment
@lindsayad
Copy link
Member

Let me take a look. Thanks for trying

@lindsayad
Copy link
Member

Can you push a commit with the change to use restartable data so I can figure out why we are getting the PETSc error?

@YaqiWang
Copy link
Contributor Author

Can you push a commit with the change to use restartable data so I can figure out why we are getting the PETSc error?

The test in this PR can reproduce it.

@lindsayad
Copy link
Member

Did you not try declaring restartable data then?

@YaqiWang
Copy link
Contributor Author

Did you not try declaring restartable data then?

No. I stopped after seeing the error by moving the doingPRefinement call after restart in FEProblemBase::initialSetup.

@YaqiWang
Copy link
Contributor Author

YaqiWang commented Mar 17, 2025

Per our discussion today, we want to have a parameter Variables/*/disable_p_refinement replace the current Problem/disable_p_refinement_for_families. The parameter will have default value false for all families not in the list. We will have the same parameter for aux variables.

…a crash in tests/mesh/split_uniform_refine test
@YaqiWang
Copy link
Contributor Author

@lindsayad this is ready again.

Copy link
Member

@lindsayad lindsayad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite done, but I'll comment with what I have for now

@@ -1923,7 +1923,7 @@ class Assembly
* of quadrature points reflects the element p-level)
* @param disable_p_refinement_for_families Families that we should disable p-refinement for
*/
void havePRefinement(const std::vector<FEFamily> & disable_p_refinement_for_families);
void havePRefinement(const std::set<FEFamily> & disable_p_refinement_for_families);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void havePRefinement(const std::set<FEFamily> & disable_p_refinement_for_families);
void havePRefinement(const std::unordered_set<FEFamily> & disable_p_refinement_for_families);

@@ -1203,6 +1206,17 @@ class SubProblem : public Problem
/// Whether p-refinement has been requested at any point during the simulation
bool _have_p_refinement;

/// Indicate whether a family is disabled for p-refinement
std::map<FEFamily, bool> _family_for_p_refinement;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::map<FEFamily, bool> _family_for_p_refinement;
std::unordered_map<FEFamily, bool> _family_for_p_refinement;

/// Indicate whether a family is disabled for p-refinement
std::map<FEFamily, bool> _family_for_p_refinement;
/// The set of variable families by default disable p-refinement
std::set<FEFamily> _default_families_without_p_refinement = {libMesh::LAGRANGE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::set<FEFamily> _default_families_without_p_refinement = {libMesh::LAGRANGE,
std::unordered_set<FEFamily> _default_families_without_p_refinement = {libMesh::LAGRANGE,

Comment on lines 1212 to 1218
std::set<FEFamily> _default_families_without_p_refinement = {libMesh::LAGRANGE,
libMesh::NEDELEC_ONE,
libMesh::RAVIART_THOMAS,
libMesh::LAGRANGE_VEC,
libMesh::CLOUGH,
libMesh::BERNSTEIN,
libMesh::RATIONAL_BERNSTEIN};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Populate this in the constructor plz

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Checkpoint restart failure with p-refined mesh
4 participants