[CICD] Expand unit test coverage on CUDA #1210
Open
BrianPei wants to merge 10 commits into
# Conflicts: # flagscale/train/megatron/training/arguments_fs.py
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR expands unit test coverage across transformations, runner utilities, elastic monitoring/diagnostics, platform detection, and updates CI/coverage configuration to better target package code during reporting.
Changes:
- Added new unit tests for transformations (selectors/hooks/state scoping/log IO), runner utilities/launchers/factory/base, elastic monitoring/diagnostics/gpu health check, and platform management.
- Updated coverage configuration generation to focus on `flagscale/` and omit non-target paths.
- Adjusted CI workflow defaults for CUDA image builds and platform config for MetaX.
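The coverage change in the list above can be pictured with a minimal sketch. It assumes the script emits a coverage.py-style config whose `[run]` section sets `source` and `omit`; the helper name `build_coverage_config` and the omit patterns are hypothetical and are not taken from `run_unit_tests.sh`.

```python
import configparser
import io


def build_coverage_config(source="flagscale", omit=("tests/*", "third_party/*")):
    """Return .coveragerc text that limits measurement to `source`.

    The omit patterns are illustrative assumptions, not the PR's actual list.
    """
    parser = configparser.ConfigParser()
    parser["run"] = {
        "source": source,
        # coverage.py reads multi-line lists: one pattern per continuation line.
        "omit": "\n" + "\n".join(omit),
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


config_text = build_coverage_config()
print(config_text)
```

Restricting `source` to the package keeps test helpers and vendored code out of the reported percentage, which is presumably the point of the "focus on `flagscale/`" change.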
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| tests/unit_tests/transformations/test_transformation.py | Adds selector/transformation registry/config tests. |
| tests/unit_tests/transformations/test_state_store.py | Extends StateStore coverage (init args/reset/BaseState). |
| tests/unit_tests/transformations/test_state_scope_transform.py | Adds state-scope warning + cleanup callback tests. |
| tests/unit_tests/transformations/test_log_io_transformation.py | Adds LogIO hook/transformation tests. |
| tests/unit_tests/transformations/test_hook.py | Extends hook registry behavior/order/stateful propagation tests. |
| tests/unit_tests/test_small_modules.py | Adds tests for several small utility modules and integration helpers. |
| tests/unit_tests/test_agent_tool_matcher.py | Adds ToolMatcher scoring/degradation/cache/init-path tests. |
| tests/unit_tests/runner/test_runner_utils.py | Adds runner utils tests (args/envs/cmd update/log discovery/SSH helpers). |
| tests/unit_tests/runner/test_runner_train_inference_helpers.py | Adds train/inference runner helper tests (config updates + arg generation). |
| tests/unit_tests/runner/test_runner_factory.py | Adds RunnerFactory registry tests with stubbed imports. |
| tests/unit_tests/runner/test_runner_base.py | Adds Runner behavior tests (backend selection + launcher delegation). |
| tests/unit_tests/runner/test_launcher_ssh.py | Adds extensive SSH launcher tests (scripts, health checks, profiling, status). |
| tests/unit_tests/runner/elastic/test_simulated_fault.py | Adds simulated fault loop/CLI tests. |
| tests/unit_tests/runner/elastic/test_monitor_service.py | Adds MonitorService threading, hang detection, diagnostics, pid checks tests. |
| tests/unit_tests/runner/elastic/test_monitor_launcher.py | Adds monitor launcher/runner tests (pid wait, stop conditions, errors). |
| tests/unit_tests/runner/elastic/test_log_collector.py | Extends log collector tests (file resolution, offsets, local/remote paths). |
| tests/unit_tests/runner/elastic/test_gpu_health_check.py | Greatly expands GPU health check branch/path coverage (with torch stubs). |
| tests/unit_tests/runner/elastic/test_diagnostic.py | Extends diagnostic report tests (offsets/helpers/incremental output/errors). |
| tests/unit_tests/platforms/test_platforms.py | Adds platform registration/selection tests with optional torch stubbing. |
| tests/test_utils/runners/run_unit_tests.sh | Refines coverage config to focus on flagscale/ and omit non-target paths. |
| flagscale/train/megatron/training/arguments_fs.py | Formatting fixes + adds Engram CLI argument group. |
| .github/workflows/build_image_cuda.yml | Updates default runner labels/volumes and uses JSON-parsed runs-on. |
| .github/configs/metax.yml | Updates MetaX config paths and adds shared tar dir setting. |
      name: Build ${{ matrix.task }}
      needs: prepare
    - runs-on: [self-hosted, Linux, X64, nvidia-0, gpus-8]
    + runs-on: ${{ fromJson(inputs.runs_on || '["flagscale-nvidia-a100-gpu2-32c-128g"]') }}

      name: Load and push images
      needs: ['build', 'summary']
    - runs-on: [self-hosted, Linux, X64, nvidia-0, gpus-8]
    + runs-on: ${{ fromJson(inputs.runs_on || '["flagscale-nvidia-a100-gpu2-32c-128g"]') }}
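The new `runs-on` expression takes the `runs_on` input when it is set and otherwise falls back to a default JSON string, then parses the result into a list of runner labels. A rough Python analogue of that fallback, for illustration only (`resolve_runs_on` is a hypothetical name, and Python's `or` only approximates the `||` operator in GitHub Actions expressions):

```python
import json

# Default taken from the workflow line quoted in this review.
DEFAULT_RUNS_ON = '["flagscale-nvidia-a100-gpu2-32c-128g"]'


def resolve_runs_on(runs_on_input):
    """Mimic `fromJson(inputs.runs_on || DEFAULT)`.

    An empty or missing input falls back to the default JSON string;
    the chosen string is then parsed into a list of runner labels.
    """
    return json.loads(runs_on_input or DEFAULT_RUNS_ON)


print(resolve_runs_on(""))
print(resolve_runs_on('["self-hosted", "Linux", "X64"]'))
```

Because the input is parsed as JSON, a caller has to pass a JSON array string; a bare label such as `nvidia-0` would fail to parse.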
Comment on lines +58 to +59

    ["/mnt/airs-business/cicd/baai_datasets:/home/gitlab-runner/data",
     "/mnt/airs-business/cicd/baai_tokenizers:/home/gitlab-runner/tokenizers"]
Comment on lines +360 to +361

    '["/mnt/airs-business/cicd/docker_data:/home/gitlab-runner/data",
     "/mnt/airs-business/cicd/docker_tokenizers:/home/gitlab-runner/tokenizers"]' }}
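The two volume defaults quoted in this review mount different host directories on the same container paths. A small sketch that makes the difference explicit, assuming each entry has the `host:container` form shown above (`parse_volumes` is a hypothetical helper, not part of the workflow):

```python
import json

# Both JSON strings are copied from the lines quoted in this review.
DEFAULT_AT_LINE_58 = """["/mnt/airs-business/cicd/baai_datasets:/home/gitlab-runner/data",
 "/mnt/airs-business/cicd/baai_tokenizers:/home/gitlab-runner/tokenizers"]"""
DEFAULT_AT_LINE_360 = """["/mnt/airs-business/cicd/docker_data:/home/gitlab-runner/data",
 "/mnt/airs-business/cicd/docker_tokenizers:/home/gitlab-runner/tokenizers"]"""


def parse_volumes(raw):
    """Map each container path to its host path."""
    mounts = {}
    for entry in json.loads(raw):
        host, container = entry.split(":", 1)
        mounts[container] = host
    return mounts


first = parse_volumes(DEFAULT_AT_LINE_58)
second = parse_volumes(DEFAULT_AT_LINE_360)

# Container paths whose host directory differs between the two defaults.
mismatched = sorted(path for path in first if first[path] != second.get(path))
print(mismatched)
```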
    import pytest
    from omegaconf import OmegaConf

    torch = pytest.importorskip("torch")
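`pytest.importorskip("torch")` skips the whole module when `torch` is not installed, so those tests add no coverage on a machine without it. The file summary mentions torch stubbing as an alternative; a minimal sketch of that idea follows. The stub shape and `detect_device` are hypothetical and only stand in for whatever code the real tests exercise.

```python
import importlib
import sys
import types
from unittest import mock


def make_torch_stub(cuda_available=False):
    """Build a minimal fake `torch` module (hypothetical, for illustration)."""
    torch_stub = types.ModuleType("torch")
    torch_stub.cuda = types.SimpleNamespace(is_available=lambda: cuda_available)
    return torch_stub


def detect_device():
    """Hypothetical code under test: picks a device name via torch."""
    torch = importlib.import_module("torch")
    return "cuda" if torch.cuda.is_available() else "cpu"


# patch.dict restores sys.modules on exit, so a real torch install is untouched.
with mock.patch.dict(sys.modules, {"torch": make_torch_stub(cuda_available=True)}):
    device_with_gpu = detect_device()

with mock.patch.dict(sys.modules, {"torch": make_torch_stub(cuda_available=False)}):
    device_without_gpu = detect_device()

print(device_with_gpu, device_without_gpu)
```

The trade-off is that a stub only proves the branch logic; it says nothing about behavior against a real `torch` build.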
    warn.assert_called_once()


    def test_add_decive_extra_config_merges_matching_device_and_preserves_other_dicts():
        ...
        }


    def test_add_decive_extra_config_without_device_returns_full_config():
Comment on lines +60 to +61

    self.assertEqual(store._state_by_scope, {})
    self.assertIsNone(store._active_scope)
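The two assertions above check private attributes after a reset. For context, here is a self-contained sketch of a test with that shape. The `StateStore` below is a hypothetical stand-in: only the attribute names `_state_by_scope` and `_active_scope` come from the quoted lines, while `set_scope` and `reset` are assumed for illustration.

```python
import unittest


class StateStore:
    """Hypothetical stand-in; not the FlagScale implementation."""

    def __init__(self):
        self._state_by_scope = {}
        self._active_scope = None

    def set_scope(self, scope):
        # Activate a scope and make sure it has a state dict.
        self._active_scope = scope
        self._state_by_scope.setdefault(scope, {})

    def reset(self):
        # Drop all scoped state and clear the active scope.
        self._state_by_scope = {}
        self._active_scope = None


class ResetTest(unittest.TestCase):
    def test_reset_clears_scopes(self):
        store = StateStore()
        store.set_scope("layer0")
        store.reset()
        self.assertEqual(store._state_by_scope, {})
        self.assertIsNone(store._active_scope)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ResetTest)
result = unittest.TextTestRunner().run(suite)
```

Asserting on private attributes ties the test to the class's internals, so a refactor that keeps behavior but renames those fields would break it.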
Description
Expand unit test coverage for the runner, elastic, platforms, transformations, and smaller utility modules. Also stabilize the CUDA and MetaX CI setup by fixing runner labels, volume mounts, and the coverage configuration.