| 1 | +# DPDispatcher Development Guide |
| 2 | + |
| 3 | +DPDispatcher is a Python package that generates job input scripts for HPC (High-Performance Computing) schedulers (Slurm/PBS/LSF/Bohrium), submits them to HPC systems, and polls them until they finish. |
| 4 | + |
| 5 | +Always reference these instructions first, and fall back to search or bash commands only when you encounter unexpected information that does not match what is written here. |
| 6 | + |
| 7 | +## Working Effectively |
| 8 | + |
| 9 | +### Bootstrap, Build, and Test |
| 10 | + |
| 11 | +- **Create virtual environment and install:** |
| 12 | + |
| 13 | + ```bash |
| 14 | + uv venv .venv |
| 15 | + source .venv/bin/activate |
| 16 | + uv pip install .[test] coverage ruff pre-commit |
| 17 | + ``` |
| 18 | + |
| 19 | +- **Run the test suite:** |
| 20 | + |
| 21 | + ```bash |
| 22 | + python -m coverage run -p --source=./dpdispatcher -m unittest -v |
| 23 | + python -m coverage combine |
| 24 | + python -m coverage report |
| 25 | + ``` |
| 26 | + |
| 27 | + - **TIMING: Tests take ~25 seconds. NEVER CANCEL - set timeout to 2+ minutes.** |
| 28 | + |
| 29 | +- **Run linting and formatting:** |
| 30 | + |
| 31 | + ```bash |
| 32 | + ruff check . |
| 33 | + ruff format --check . |
| 34 | + ``` |
| 35 | + |
| 36 | + - **TIMING: Linting takes <1 second.** |
| 37 | + |
| 38 | +- **Run type checking:** |
| 39 | + |
| 40 | + ```bash |
| 41 | + pyright |
| 42 | + ``` |
| 43 | + |
| 44 | + - **TIMING: Type checking takes ~2-3 seconds.** |
| 45 | + |
| 46 | +- **Build documentation:** |
| 47 | + |
| 48 | + ```bash |
| 49 | + uv pip install .[docs] |
| 50 | + cd doc |
| 51 | + make html |
| 52 | + ``` |
| 53 | + |
| 54 | + - **TIMING: Documentation build takes ~14 seconds.** |
| 55 | + |
| 56 | +### CLI Usage |
| 57 | + |
| 58 | +- **Test basic CLI functionality:** |
| 59 | + |
| 60 | + ```bash |
| 61 | + dpdisp --help |
| 62 | + dpdisp run examples/dpdisp_run.py |
| 63 | + ``` |
| 64 | + |
| 65 | +- **Run sample job dispatch script:** |
| 66 | + ```bash |
| 67 | + python examples/dpdisp_run.py |
| 68 | + ``` |
| 69 | + |
| 70 | +## Validation |
| 71 | + |
| 72 | +- **ALWAYS run the test suite after making code changes.** Tests execute quickly (~25 seconds) and should never be cancelled. |
| 73 | +- **ALWAYS run linting before committing:** `ruff check . && ruff format --check .` |
| 74 | +- **ALWAYS run type checking:** Run `pyright` to catch type-related issues. |
| 75 | +- **Test CLI functionality:** Run `dpdisp --help` and test with example scripts to ensure the CLI works correctly. |
| 76 | +- **Build documentation:** Run `make html` in the `doc/` directory to verify documentation builds without errors. |
| 77 | + |
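The checks above can be chained in one small driver. The following is a minimal sketch, not part of the project: `run_checks` and the timeout values are hypothetical, chosen only to sit well above the timings quoted in this guide.

```python
import subprocess
from typing import List, Sequence, Tuple

# Commands copied from this guide; the timeouts (in seconds) are illustrative
# assumptions, set generously so a slow run is not cancelled early.
CHECKS: List[Tuple[List[str], int]] = [
    (["python", "-m", "coverage", "run", "-p", "--source=./dpdispatcher",
      "-m", "unittest", "-v"], 120),
    (["ruff", "check", "."], 60),
    (["ruff", "format", "--check", "."], 60),
    (["pyright"], 120),
]


def run_checks(checks: Sequence[Tuple[List[str], int]]) -> List[str]:
    """Run each (command, timeout) pair in order and return the failed commands."""
    failed = []
    for command, timeout in checks:
        try:
            ok = subprocess.run(command, timeout=timeout).returncode == 0
        except (subprocess.TimeoutExpired, OSError):
            # A timeout or a missing executable counts as a failed check.
            ok = False
        if not ok:
            failed.append(" ".join(command))
    return failed
```

Calling `run_checks(CHECKS)` from the repository root, inside the activated virtual environment, returns an empty list when every check passes.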
| 78 | +## Build and Development Workflow |
| 79 | + |
| 80 | +- **Dependencies:** The project uses `uv` as the package manager for fast dependency resolution. |
| 81 | +- **Build system:** Uses `setuptools` with `pyproject.toml` configuration. |
| 82 | +- **Python versions:** Supports Python 3.7+ (check `pyproject.toml` for current support matrix). |
| 83 | +- **Testing:** Uses `unittest` framework with coverage reporting. |
| 84 | +- **Linting:** Uses `ruff` for both linting and formatting. |
| 85 | +- **Type checking:** Uses `pyright` for static type analysis (configured in `pyproject.toml`). |
| 86 | +- **Documentation:** Uses Sphinx with MyST parser for markdown support. |
| 87 | + |
| 88 | +## Key Components |
| 89 | + |
| 90 | +### Core Modules |
| 91 | + |
| 92 | +- **`dpdispatcher/submission.py`**: Main classes - `Submission`, `Job`, `Task`, `Resources` |
| 93 | +- **`dpdispatcher/machine.py`**: Machine configuration and dispatch logic |
| 94 | +- **`dpdispatcher/contexts/`**: Execution context classes for different environments (SSH, local, etc.) |
| 95 | +- **`dpdispatcher/machines/`**: HPC scheduler implementations (Slurm, PBS, LSF, Shell, etc.) |
| 96 | + |
| 97 | +### Configuration Files |
| 98 | + |
| 99 | +- **`examples/machine/`**: Example machine configurations (JSON format) |
| 100 | +- **`examples/resources/`**: Example resource specifications |
| 101 | +- **`examples/task/`**: Example task definitions |
| 102 | +- **`examples/dpdisp_run.py`**: Complete working example using PEP 723 script metadata |
| 103 | + |
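PEP 723 stores script metadata as a TOML block inside comments. The sketch below shows how such a block can be pulled out of a script, using the regular expression from the PEP's reference implementation. The `SAMPLE` text and the `extract_script_metadata` helper are hypothetical, and the keys shown under `[tool.dpdispatcher]` are assumptions; check `examples/dpdisp_run.py` for the real ones.

```python
import re
from typing import Optional

# Regular expression from the PEP 723 reference implementation.
PEP723_RE = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)

# Hypothetical script text; the [tool.dpdispatcher] keys are assumptions.
SAMPLE = '''\
# /// script
# requires-python = ">=3.7"
# [tool.dpdispatcher]
# work_base = "./"
# ///
print("hello world!")
'''


def extract_script_metadata(script: str) -> Optional[str]:
    """Return the TOML text of the `script` metadata block, or None if absent."""
    for match in PEP723_RE.finditer(script):
        if match.group("type") != "script":
            continue
        lines = match.group("content").splitlines(keepends=True)
        # Drop the leading "# " (or a bare "#") from every comment line.
        return "".join(
            line[2:] if line.startswith("# ") else line[1:] for line in lines
        )
    return None
```

On Python 3.11 or newer, the returned text can be parsed with the standard-library `tomllib.loads`.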
| 104 | +### Important Directories |
| 105 | + |
| 106 | +- **`tests/`**: Comprehensive test suite including unit tests and integration tests |
| 107 | +- **`doc/`**: Sphinx documentation source files |
| 108 | +- **`ci/`**: CI/CD configuration for Docker-based testing |
| 109 | + |
| 110 | +## Common Tasks |
| 111 | + |
| 112 | +### Running Examples |
| 113 | + |
| 114 | +```bash |
| 115 | +# Example with lazy local execution (no HPC needed) |
| 116 | +dpdisp run examples/dpdisp_run.py |
| 117 | + |
| 118 | +# Example output files created: |
| 119 | +# - log: contains "hello world!" output |
| 120 | +# - Various job tracking files (automatically cleaned up) |
| 121 | +``` |
| 122 | + |
| 123 | +### Testing with Different HPC Systems |
| 124 | + |
| 125 | +- **Local/Shell execution:** Use `LazyLocalContext` with the `Shell` batch type (see `examples/machine/lazy_local.json`) |
| 126 | +- **SSH remote execution:** Use `SSHContext` (requires SSH setup) |
| 127 | +- **HPC schedulers:** Use the `Slurm`, `PBS`, or `LSF` batch types (require an actual HPC environment) |
| 128 | + |
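The combinations above differ mainly in two fields of the machine configuration. The sketch below builds such a configuration as a plain dictionary. The helper `make_machine_config` is hypothetical, and the key names `batch_type`, `context_type`, and `local_root` are assumed to mirror `examples/machine/lazy_local.json`; verify them against that file. SSH and scheduler setups need further settings that are not shown here.

```python
import json
from typing import Dict

# Names taken from this guide; the real package may accept more.
KNOWN_BATCH_TYPES = ("Shell", "Slurm", "PBS", "LSF")
KNOWN_CONTEXT_TYPES = ("LazyLocalContext", "SSHContext")


def make_machine_config(
    batch_type: str, context_type: str, local_root: str = "./"
) -> Dict[str, str]:
    """Build a machine configuration dict (key names are assumptions)."""
    if batch_type not in KNOWN_BATCH_TYPES:
        raise ValueError(f"unknown batch type: {batch_type!r}")
    if context_type not in KNOWN_CONTEXT_TYPES:
        raise ValueError(f"unknown context type: {context_type!r}")
    return {
        "batch_type": batch_type,
        "context_type": context_type,
        "local_root": local_root,
    }


# Local execution without any HPC scheduler.
lazy_local = make_machine_config("Shell", "LazyLocalContext")
print(json.dumps(lazy_local, indent=4))
```

Keeping the configuration as a plain dictionary makes it easy to dump to JSON and compare against the files under `examples/machine/`.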
| 129 | +### Debugging Job Execution |
| 130 | + |
| 131 | +- Check job logs in the work directory |
| 132 | +- Use `dpdisp submission` commands to inspect job states |
| 133 | +- Review machine and resource configurations in JSON files |
| 134 | + |
| 135 | +## Project Structure Reference |
| 136 | + |
| 137 | +``` |
| 138 | +. |
| 139 | +├── README.md |
| 140 | +├── pyproject.toml # Project configuration and dependencies |
| 141 | +├── dpdispatcher/ # Main package |
| 142 | +│ ├── __init__.py |
| 143 | +│ ├── submission.py # Core submission/job/task classes |
| 144 | +│ ├── machine.py # Machine configuration |
| 145 | +│ ├── contexts/ # Execution contexts (SSH, local, etc.) |
| 146 | +│ ├── machines/ # HPC scheduler implementations |
| 147 | +│ └── utils/ # Utility functions |
| 148 | +├── examples/ # Example configurations and scripts |
| 149 | +│ ├── machine/ # Example machine configs |
| 150 | +│ ├── resources/ # Example resource specs |
| 151 | +│ ├── task/ # Example task definitions |
| 152 | +│ └── dpdisp_run.py # Complete working example |
| 153 | +├── tests/ # Comprehensive test suite |
| 154 | +├── doc/ # Sphinx documentation |
| 155 | +└── ci/ # CI/CD Docker configurations |
| 156 | +``` |
| 157 | + |
| 158 | +## Development Notes |
| 159 | + |
| 160 | +- **Always use virtual environments** - The system Python is externally managed |
| 161 | +- **Always add type hints** - Include proper type annotations in all Python code for better maintainability |
| 162 | +- **Always use conventional commit format** - All commit messages and PR titles must follow the conventional commit specification (e.g., `feat:`, `fix:`, `docs:`, `refactor:`, `test:`, `chore:`) |
| 163 | +- **Test artifacts are gitignored** - Job execution creates temporary files that are automatically excluded |
| 164 | +- **Pre-commit hooks available** - Use `pre-commit install` to enable automated code quality checks |
| 165 | +- **Multiple execution contexts** - Code supports local execution, SSH remote execution, and various HPC schedulers |
| 166 | +- **Extensive examples** - Use `examples/` directory as reference for proper configuration formats |
| 167 | + |
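The conventional commit rule above can be checked mechanically. The following is a minimal sketch, not a project tool: `is_conventional` is a hypothetical helper, and it accepts only the types listed in this guide, although the Conventional Commits specification allows others.

```python
import re

# Types listed in this guide; extend the tuple if the project accepts more.
ALLOWED_TYPES = ("feat", "fix", "docs", "refactor", "test", "chore")

# type, optional "(scope)", optional "!" for breaking changes, then ": description"
HEADER_RE = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def is_conventional(header: str) -> bool:
    """Return True if a commit header or PR title matches the pattern above."""
    return HEADER_RE.match(header) is not None
```

A header such as `fix(ssh)!: close idle connections` passes, while `Update README` does not.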
| 168 | +## Troubleshooting |
| 169 | + |
| 170 | +- **Virtual environment issues:** Always use `uv venv .venv` and `source .venv/bin/activate` |
| 171 | +- **Test failures:** Most tests run locally; some require specific HPC environments and will be skipped |
| 172 | +- **Documentation build warnings:** Some warnings about external inventory URLs are expected in sandboxed environments |
| 173 | +- **Pre-commit network issues:** If pre-commit fails with network timeouts, run `ruff check` and `ruff format` directly |