
DaCe backend: set unstructured_horizontal_has_unit_stride=True #1130

Open
havogt wants to merge 5 commits into main from havogt-patch-2

Conversation

@havogt
Contributor

havogt commented Mar 25, 2026

No description provided.

havogt requested a review from edopao March 25, 2026 19:14
@havogt
Contributor Author

havogt commented Mar 25, 2026

cscs-ci run default

@havogt
Contributor Author

havogt commented Mar 25, 2026

cscs-ci run distributed

@havogt
Contributor Author

havogt commented Mar 26, 2026

cscs-ci run distributed

@edopao
Contributor

edopao commented Mar 31, 2026

cscs-ci run distributed

@edopao
Contributor

edopao commented Mar 31, 2026

cscs-ci run default

@edopao
Contributor

edopao commented Apr 1, 2026

cscs-ci run dace

@edopao
Contributor

edopao commented Apr 1, 2026

@msimberg Do you have any idea why the distributed CI pipeline is failing? It seems that the test passes.

@havogt
Contributor Author

havogt commented Apr 1, 2026

@edopao
from Mikael (a few days ago I had the same question):

hmm, on the job you linked it's useful to know the following:

  1. it passed on rank 0 but failed overall -> likely another rank failed
  2. srun: error: nid005301: task 1: Exited with exit code 1 -> rank 1 failed
  3. go to the right panel and browse or download the job artifacts
  4. open the log for rank 1
  5. ???
  6. profit

in this case the output is

FAILED model/atmosphere/dycore/tests/dycore/mpi_tests/test_parallel_solve_nonhydro.py::test_run_solve_nonhydro_single_step[experiment0-1-2021-06-20T12:00:10.000-1-2-2021-06-20T12:00:10.000-1-True] - AssertionError: assert False
 +  where False = <function dallclose at 0x4003eee55870>(array([[-4.49902001, -2.47195228, -2.43418221, ..., -1.1147065 ,\n        -0.97308378, -0.78161297],\n       [-4.74286997, -2.51603478, -2.50672366, ..., -1.60057587,\n        -1.36821611, -1.03553511],\n       [-3.69400648, -1.96190308, -3.1953062 , ..., -1.31898163,\n        -1.17395118, -0.9497928 ],\n       ...,\n       [ 9.05451828,  1.8218479 ,  3.39239301, ...,  0.90259989

...directly from that I can't tell if it's completely off or just something in the numerics that has changed the error slightly such that you just need to change tolerances
if you want to see the output more easily in ci you can also remove the ci-mpi-wrapper around the nox/pytest call, but all the output will be interleaved (it's a bit more obvious that something really went wrong, but it's a pain to read the logs)
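
For reference, steps 1 and 2 of the list above (finding which rank actually failed) can be sketched in a few lines of Python. This is only an illustration, not part of the CI tooling: the helper names `expand_tasks` and `failed_ranks` are hypothetical, the regular expression is modelled on the single `srun: error: ...` line quoted above, the ranged `tasks 2-3,5` form is an assumption about how srun groups several failing tasks, and it assumes the srun task id corresponds to the MPI rank, as it does in the job discussed here.

```python
import re

# Modelled on the line quoted above:
#   "srun: error: nid005301: task 1: Exited with exit code 1"
# The "tasks 2-3,5" variant is an assumption about how srun groups failures.
SRUN_ERROR = re.compile(
    r"srun: error: (?P<node>\S+): tasks? (?P<tasks>[\d,\-]+): "
    r"Exited with exit code (?P<code>\d+)"
)


def expand_tasks(spec):
    """Expand a task spec such as "1" or "2-3,5" into a list of ints."""
    ranks = []
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            ranks.extend(range(int(first), int(last) + 1))
        else:
            ranks.append(int(part))
    return ranks


def failed_ranks(log_text):
    """Return {rank: (node, exit_code)} for every task reported as failed."""
    failures = {}
    for match in SRUN_ERROR.finditer(log_text):
        for rank in expand_tasks(match.group("tasks")):
            failures[rank] = (match.group("node"), int(match.group("code")))
    return failures


if __name__ == "__main__":
    sample = "srun: error: nid005301: task 1: Exited with exit code 1\n"
    print(failed_ranks(sample))  # {1: ('nid005301', 1)}
```

The returned rank numbers tell you which per-rank log to open in the job artifacts (steps 3 and 4); the script does not try to locate or read those logs itself.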

@github-actions

github-actions bot commented Apr 1, 2026

Mandatory Tests

Please make sure you run these tests via comment before you merge!

  • cscs-ci run default
  • cscs-ci run distributed

Optional Tests

To run benchmarks you can use:

  • cscs-ci run benchmark-bencher

To run tests and benchmarks with the DaCe backend you can use:

  • cscs-ci run dace

To run test levels ignored by the default test suite (mostly simple datatest for static fields computations) you can use:

  • cscs-ci run extra

For more detailed information please look at CI in the EXCLAIM universe.

@edopao
Contributor

edopao commented Apr 1, 2026

cscs-ci run distributed

@edopao
Contributor

edopao commented Apr 1, 2026

cscs-ci run dace

