Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vasilii made test_execute_primitive resilient to MIOPEN_FIND_ENFORCE #3662

Merged
merged 4 commits into from
Apr 8, 2025

Conversation

randyspauldingamd
Copy link
Contributor

@randyspauldingamd randyspauldingamd commented Mar 31, 2025

Addressing a failure in https://ontrack-internal.amd.com/browse/SWDEV-524093

Fix for CPU_ExecutePrimitive_NONE* tests were sensitive to MIOPEN_FIND_ENFORCE being set in the environment. This was only affecting CQE tests.

@BrianHarrisonAMD
Copy link
Contributor

BrianHarrisonAMD commented Mar 31, 2025

Is this not already fixed by @bghimireamd's change here?

#3653

If we still have spots setting the FIND_ENFORCE environment during testing without properly clearing it, then we need to find and fix that spot I think. Otherwise, we could hit huge increases to testing runtime, and volatility in the single gtest binary.

@randyspauldingamd
Copy link
Contributor Author

Is this not already fixed by @bghimireamd's change here?

#3653

If we still have spots setting the FIND_ENFORCE environment during testing without properly clearing it, then we need to find and fix that spot I think. Otherwise, we could hit huge increases to testing runtime, and volatility in the single gtest binary.

We're not sure. I had in my notes that I repro'd the failure last week, but I couldn't repro it today. I'm going to do some testing with and without the single binary this evening to learn more, and I've asked Bibek what has been done.

In any case, this is a good fail-safe. Increasing test robustness is one of my primary concerns.

@randyspauldingamd randyspauldingamd changed the title Vasilii made test_execute_primitive resilient to FIND_ENFORCE Vasilii made test_execute_primitive resilient to MIOPEN_FIND_ENFORCE Apr 2, 2025
@randyspauldingamd
Copy link
Contributor Author

Is this not already fixed by @bghimireamd's change here?

#3653

If we still have spots setting the FIND_ENFORCE environment during testing without properly clearing it, then we need to find and fix that spot I think. Otherwise, we could hit huge increases to testing runtime, and volatility in the single gtest binary.

I've verified this crash does still exist with PR 3653. Continuing to investigate whether it might occur in CQE.

@randyspauldingamd
Copy link
Contributor Author

randyspauldingamd commented Apr 3, 2025

Still affecting QA.
Will merge this and create followup tickets to investigate the cause.

@randyspauldingamd
Copy link
Contributor Author

randyspauldingamd commented Apr 8, 2025

Promoted a test-local object to use for this and changed name to better reflect the usage.
Added a ticket for further investigation.

@randyspauldingamd
Copy link
Contributor Author

The ExecutePrimitive test now follows black-box better without requiring modifications to the test itself.

bghimireamd
bghimireamd previously approved these changes Apr 8, 2025
@@ -46,6 +46,29 @@ MIOPEN_INTERNALS_EXPORT void UpdateEnvVariable(std::string_view name, std::strin
// MT-Unsafe
MIOPEN_INTERNALS_EXPORT void ClearEnvVariable(std::string_view name);

class EnvVarSanitizer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar class is here https://github.com/ROCm/MIOpen/blob/develop/test/gtest/gtest_common.hpp#L43. Can we reuse that class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll add functionality to clear the var to it

@bghimireamd bghimireamd dismissed their stale review April 8, 2025 18:09

accidentally approved

@randyspauldingamd randyspauldingamd force-pushed the ddu/execute-primitive-env-proof branch from 20fd6b7 to 5b36341 Compare April 8, 2025 19:15

if(prev_env)
{
lib_env::clear(env_name);
Copy link
Contributor

@reidkwja reidkwja Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be a good idea to provide logging/exception if clearing the variable fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked under the hood and I'd say the existing mechanism is sufficient. It already has strict enforcement on valid env names:

MIOPEN_THROW(miopenStatusNotImplemented, "Environment variable not found: " + name);

Glad you mentioned that--I hadn't dived into this code before and it seems the env lists may have gotten out of sync.

@randyspauldingamd randyspauldingamd merged commit 2adc013 into develop Apr 8, 2025
14 of 66 checks passed
@randyspauldingamd randyspauldingamd deleted the ddu/execute-primitive-env-proof branch April 8, 2025 21:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants