tests/f: fix remote tests & improve portability #6882
base: master
| if cylc.flow.flags.verbosity > 1: | ||
| print(f'$ ln -s "{target}" "{path}"') |
Make symlink dir issues easier to debug.
| [[_retrieve]] | ||
| $(cylc config -i "[platforms][$CYLC_TEST_PLATFORM]") | ||
| [[_retrieve]] | ||
| retrieve job logs = True | ||
| install target = $CYLC_TEST_PLATFORM | ||
| [[_no_retrieve]] | ||
| $(cylc config -i "[platforms][$CYLC_TEST_PLATFORM]") | ||
| [[_no_retrieve]] | ||
| retrieve job logs = False | ||
| install target = $CYLC_TEST_PLATFORM | ||
| " |
- Allow default `[directives]` to be specified in the test config.
- `install target` should already be set correctly.
| sed -i -E 's/--max-size=[^ ]* //' 'my-rsync.log.edited' # strip "retrieve job logs max size" arg | ||
| sort -u 'my-rsync.log.edited' # stip out duplicates (can result from PBS log file spooling) |
- Strip `--max-size` options which may or may not be added based on config.
- Tolerate job log retrieval retries (needed when PBS output spooling is in play).
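A minimal sketch of the normalisation described above, run against a made-up log. The helper name `normalise_rsync_log` and the sample lines are hypothetical; the test itself edits `my-rsync.log.edited` in place with `sed -i`.

```shell
# Hypothetical helper: strip any "--max-size=..." argument, then drop
# duplicate lines (e.g. from a retried job log retrieval).
normalise_rsync_log() {
    sed -E 's/--max-size=[^ ]* //' | sort -u
}

# Made-up sample: the same retrieval logged twice, once with --max-size.
printf '%s\n' \
    'rsync -a --max-size=10M host:log/job/ log/job/' \
    'rsync -a host:log/job/ log/job/' \
    | normalise_rsync_log
# prints the single line: rsync -a host:log/job/ log/job/
```

After normalisation the comparison no longer depends on whether a max size is configured or on how many retrieval attempts were made.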
| [[goodhostplatform]] | ||
| hosts = ${CYLC_TEST_HOST} | ||
| install target = ${CYLC_TEST_INSTALL_TARGET} |
We can't define platforms like this because they don't inherit any config from $CYLC_TEST_PLATFORM.
| named_grep_ok "job kill retries & succeeds" \ | ||
| "\[jobs-kill out\] \[TASK JOB SUMMARY\].*1/mixedhosttask/01" \ | ||
| "${LOGFILE}" |
This test turned out to be fragile. Replaced with more targeted checks.
| [[goodhosttask]] | ||
| script = sleep 60 | ||
| platform = goodhostplatform | ||
|
|
||
| [[mixedhosttask]] | ||
| script = sleep 60 | ||
| platform = mixedhostplatform |
Ensure both tasks are running when the kill command is sent.
| share/cycle = \$TMPDIR/\$USER/cylctb_tmp_share_dir | ||
| work = \$TMPDIR/\$USER | ||
| [[[$CYLC_TEST_INSTALL_TARGET]]] | ||
| run = \$TMPDIR/\$USER/test_cylc_symlink/ctb_tmp_run_dir |
$TMPDIR/$USER does not exist on our new machine and $TMPDIR path changes from node to node :/
Switched to a permanent directory $HOME/cylctb-symlinks with one subdir for each test.
| [[symlink dirs]] | ||
| [[[${CYLC_TEST_INSTALL_TARGET}]]] | ||
| run = \$TMPDIR/\$USER/sym-run | ||
| run = \$HOME/cylctb-symlinks/$TEST_NAME/ |
Ditto
f1ba4c3 to
1d0563e
Compare
| cylc message -- 'echo done' | ||
| # wait up to PT1M for the cat-log task to succeed | ||
| # (the workflow will shut down if cat-log fails) | ||
| sleep 60 | ||
| # fail if the task was not orphaned by this point | ||
| false |
This test was supposed to be testing `cat-log` against running tasks:
> Test "cylc cat-log" of currently-running local and remote jobs.

However, there was nothing to keep the task running, so it was actually testing succeeded jobs.
I've changed the approach: the tasks now sleep for 60s, then fail.
If the `cat-log` works, it will `cylc set` the task, orphaning the sleep and triggering `fin`, which is used to diagnose test success.
I went back to an early version of this test, and it used to have `sleep 60` there. Maybe it got borked during migration to Cylc 8.
Matt did purposefully try to pull out sleeps from the tests in order to speed them up and reduce flakiness. It might have vanished then.
In this particular case, the sleep is ok, but it could have easily been mistaken.
| local OPTS="$*" | ||
| local TEST_NAME | ||
| TEST_NAME="grep-ok: ${NAME}" | ||
| TEST_NAME="$(basename "${FILE}")-grep-ok" |
Revert an erroneous change made in 15ac2fb which broke test reporting.
All tests now passing on `_remote_background_indep_pbs`.
tests/functional/events/17-task-event-job-logs-retrieve-command.t
(I had a quick run through this; it generally LGTM, mysterious polling result notwithstanding).
I should have made this _remote_background_indep_* only. The consequences of using the _remote_pbs_indep_tcp don't make sense.
| # Test job kill will retry on a different host if there is a connection failure | ||
|
|
||
| export REQUIRE_PLATFORM='loc:remote fs:indep comms:tcp' | ||
| export REQUIRE_PLATFORM='loc:remote fs:indep comms:tcp runner:background' |
Test relies on mocking background submission (my_background) so can only be run with background platforms.
65df52a to
be94b23
Compare
| # give a bit of grace time for the job to leave the job runner queue | ||
| sleep 5 |
Unavoidable sleep :(
Give PBS a grace period to remove the finished job from the queue before attempting to poll.
This failed reliably yesterday, but passed flakily today. Looks like the required grace period is fairly small.
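For comparison, a bounded retry loop is one possible alternative to a fixed grace period. This is only a sketch, not what this PR does: `wait_until` and `job_has_left_queue` are hypothetical names, and the stand-in check below always succeeds.

```shell
# Hypothetical helper: run a command up to N times, one second apart,
# returning 0 as soon as it succeeds and 1 if every attempt fails.
wait_until() {
    attempts="$1"
    shift
    while [ "$attempts" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}

# Stand-in for a real "has the job left the job runner queue?" check.
job_has_left_queue() {
    true
}

wait_until 5 job_has_left_queue && echo 'job has left the queue'
```

A loop like this returns as soon as the condition holds, but it needs a reliable way to query the queue, which a plain `sleep` does not.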
4eeca35 to
36f98ea
Compare
Have added a bunch of fixes for tests which were using spaces, colons, slashes, etc. in test IDs. These IDs are used as filename prefixes.
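As a rough illustration of the constraint, the sketch below checks whether an ID is safe to use as a filename prefix. The helper `is_safe_test_id` and its allow-list (letters, digits, `.`, `_`, `-`) are assumptions for this example, not part of the test framework.

```shell
# Hypothetical check: succeed only for non-empty IDs made up entirely of
# characters assumed to be safe in file names.
is_safe_test_id() {
    case "$1" in
        '' | *[!A-Za-z0-9._-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Made-up IDs: the second contains a colon, a space and a slash.
for id in 'my-test.t-grep-ok' 'grep-ok: a/b'; do
    if is_safe_test_id "$id"; then
        echo "ok: $id"
    else
        echo "unsafe: $id"
    fi
done
```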
36f98ea to
8715bcc
Compare
* Cylc platforms defined in tests must inherit from the test platform config.
* Reference tests only capture triggering, not task outcome (add a downstream task if you need to check outputs).
* Fix some timing issues.
* Use the platform configured SSH command for any ad-hoc SSH calls.
* Remove `TMPDIR` dependence (ain't always there!).

* Test requires the background job runner.
* Test names cannot contain spaces.
* Revert an erroneous change made in 15ac2fb which broke test reporting.

* Tasks needed to wait for their started messages.
* Give the job runner a few seconds grace time for the job to exit the queue.
* Document the test.

* Test works by overriding the background job runner so is not compatible with other job runners.

* Test is only compatible with background platforms.

* Test IDs should be prefixed with "$TEST_NAME_BASE".
* Test IDs should be valid for use as file names (no ":", "/", " ", etc).
* `named_grep_ok` was broken by default :(
8715bcc to
5860116
Compare
All tests now passing on:
(and the other remote platforms covered by CI)

Over 8 runs against

All pass for me. Have you synced your branch to the test platform?

Now have all tests passing. Some of these seem a little sensitive to global config settings.

(for info, the failure(s) Tim was seeing were related to timing issues around PBS job log spooling, we have to configure log retrieval retries to make this work)

Ping @hjoliver
Pre-release testing for 8.5.0 flagged several test issues.
Check List
`CONTRIBUTING.md` and added my name as a Code Contributor.
`setup.cfg` (and `conda-environment.yml` if present).
`?.?.x` branch.