Add CI/CD pipeline with GitHub Actions #1

sderev · 2025-10-16T17:18:17Z

Summary

This PR introduces a comprehensive CI/CD pipeline using GitHub Actions.

Changes

1. Development Dependencies (`pyproject.toml`)

Add pytest-cov ~= 6.0 for code coverage
Add ruff ~= 0.14 for linting and formatting
Configure ruff with 100-character line length and isort import sorting

2. CI Workflow (`.github/workflows/ci.yml`)

Features:

Concurrency control - Cancel outdated runs automatically
Matrix testing - Python 3.10, 3.11, 3.12 across Ubuntu, macOS, Windows
Comprehensive checks:
- Linting with ruff (GitHub annotations)
- Code formatting validation
- Full test suite with coverage
- Security audit with pip-audit
- Package build and metadata validation
- Installation testing across platforms
Optimized caching - Smart uv dependency caching
Artifact management - JUnit XML, coverage reports, build artifacts
SLSA provenance - Supply chain security attestation
Aggregate status - Single "all checks pass" requirement

3. Release Workflow (`.github/workflows/release.yml`)

Features:

Semver validation - Ensures tag matches pyproject.toml version
Full test suite - Runs before any release
SLSA Level 3 provenance - Supply chain security
PyPI publishing - Trusted publishing with OIDC (no API tokens!)
GitHub releases - Auto-generated changelog grouped by type
Rollback support - Failure handling and recovery instructions
Manual dispatch - Flexible release triggering
Prerelease detection - Automatic handling of alpha/beta/rc tags

4. Dependency Check Workflow (`.github/workflows/dependencies.yml`)

Weekly automated dependency audits
Auto-creates GitHub issues for outdated packages
Smart issue management (no duplicates)

5. Documentation (`.github/workflows/README.md`)

Complete setup instructions
PyPI trusted publishing configuration
Manual workflow triggers
Release process guide
Status badges
Local testing commands
Troubleshooting guide

Key Features

Security
- SLSA Level 3 build provenance
- Dependency vulnerability scanning
- Trusted publishing (no secrets!)
Developer Experience
- Fast feedback with concurrency control
- Clear error annotations
- Comprehensive test results
- Coverage tracking
Automation
- Weekly dependency checks
- Auto-generated changelogs
- Smart issue management

Prerequisites

Before merging, configure PyPI trusted publishing:

Go to https://pypi.org/manage/account/publishing/
Add GitHub as trusted publisher
Specify workflow: release.yml
Specify environment: pypi

Testing

All workflows have been validated:

✅ YAML syntax verified
✅ Atomic commits created
✅ Git best practices followed
✅ Documentation complete

The CI will run automatically on this PR to validate everything works!

Author: AI 🤖

- introduce AllWordsTranslatedError and reuse its message - handle the custom type in translate and --count code paths - update the CLI test double to raise the new error class

Introduce `get_pair_file_paths()` and `backup_language_pair_files()` helper functions to simplify file path resolution and backup operations for language pairs.

Introduce `calculate_vocabulary_stats()` to compute total, translated, and pending word counts for a vocabulary file.

Introduce `rename_language_pair()` in config_handler to enable renaming existing language pairs while preserving default pair associations. Include tests for successful rename, validation of existing pairs, and prevention of duplicate pairs.

Introduce `AliasedGroup` class to enable command aliases (e.g., 'pair' → 'pairs') while hiding aliases from help output.

Add `pairs` command group with subcommands: list, add, default, set-default, remove, rename, inspect. Remove deprecated commands (setup, default, show, config default, config remove). Extract `create_language_pair_interactively()` helper from old setup command. Add helper functions for pair selection, parsing, and management. Update all related tests to reflect new command structure.

Update README.md to reflect new `pairs` command group and enhanced command options. Add documentation for `pairs list`, `pairs set-default`, `pairs remove`, `pairs rename`, and `pairs inspect` commands.

Use single quotes for test string literal in test_backup_occurs_before_chat_request to maintain consistency with project coding style.

The test was incorrectly expecting that canceling the pairs add operation would save the entered values. This contradicts the actual behavior where cancellation prevents any state changes. Add initial state setup and verify that the original state is preserved when the user cancels adding a new language pair.

* Add `pyproject.toml` with modern build system configuration * Use setuptools>=61.0 as build backend * Migrate all metadata from `setup.py` * Add dependency version constraints to prevent vulnerable updates: * `click ~= 8.3` (compatible release for 8.x) * `openai >= 0.28, <1.0` (preserve <1.0 constraint) * `tiktoken ~= 0.12` (compatible release for 0.12.x) * Add dev dependencies under `[project.optional-dependencies]` * Include comprehensive PyPI classifiers and project URLs * Remove deprecated packaging files: * Delete `setup.py` * Delete `requirements.txt` * Delete `requirements-dev.txt` * Add `uv.lock` to `.gitignore` (package, not application) This modernizes the build system per PEP 517/518 and simplifies dependency management. Version constraints use compatible release (`~=`) to allow security patches while preventing breaking changes. All tests pass (111 passed).

* Update `gpt_integration.py` * Change pricing from per-1k to per-1M tokens * Add support for `gpt-4.1`, `gpt-4o`, `o1`, `o3`, `gpt-5` models * Add fallback to `cl100k_base` encoding for token counting * Change `estimate_prompt_cost()` to accept `model` parameter and return single value * Update `cli.py` * Fix `compute_prompt_estimate()` to use `gpt-4.1` instead of `gpt-3.5-turbo` * Improve output colors (yellow for values, blue for model) * Update wording for better clarity * Update test suite * Fix test mocks to return single cost string * Update assertions to match new output format

* Use whitelist approach with character set validation (simple, Pythonic, secure) * Only allow letters, numbers, underscores, and hyphens * Replace regex with `set(string.ascii_letters + string.digits + '_-')` * User-friendly error messages (no scary security jargon) * Validate language names at all entry points (CLI, config, utils) * Add comprehensive security tests in `test_path_traversal_prevention.py` This prevents attackers from reading/writing files outside the intended data directory by using malicious language pair names like `../../../etc/passwd`. The whitelist blocks ALL dangerous characters that could be used for path traversal: `.`, `/`, `\`, and any special characters. Fixes: PERSO-186

* Add `is_same_language_pair()` to detect matching languages * Add `get_pair_mode()` to return "definition" or "translation" mode * Use case-insensitive comparison via `casefold()` These utilities enable the tool to determine whether a language pair should operate in translation mode (different languages) or definition mode (same language).

* Update prompt system to support definition mode * Update Anki deck naming for definition mode

* Test `is_same_language_pair()` with exact matches and case variations * Test `is_same_language_pair()` correctly identifies different languages * Test `get_pair_mode()` returns "definition" for same-language pairs * Test `get_pair_mode()` returns "translation" for different-language pairs * Verify case-insensitive matching works correctly All tests pass, confirming the mode detection logic works as expected.

Add `test_gpt_integration.py` with 5 tests for `format_prompt()` covering translation mode, definition mode, default behavior, multiple words handling, and system message consistency. Extend `test_csv_handler.py` with 2 tests for same-language pairs verifying definition mode deck naming and case-insensitive behavior. All 34 tests pass.

Create `test_same_language_integration.py` with 6 comprehensive integration tests: * Test full workflow with same-language pairs generates definitions * Verify Anki output uses "definitions" deck name for same-language pairs * Test case-insensitive language matching (English:ENGLISH) * Verify different-language pairs still work with translation mode * Confirm Anki output uses "vocabulary" deck name for translation mode * Test prompt generation correctly detects and applies mode All 129 tests pass.

Add documentation for the new same-language pair feature: * Add feature bullet in Features section explaining definition mode * Add subsection "Definition mode for same-language pairs" with usage example * Update Table of Contents to include new subsection * Explain how same-language pairs work (concise definitions, target language examples, Anki deck naming)

Add \`ensure_csv_has_fieldnames()\` call to \`get_words_to_translate()\` to prevent KeyError when CSV files have incorrect headers (e.g., 'translate' instead of 'translation'). Ensures all vocabulary files have correct fieldnames before reading.

Remove "concise" and "(2-3 words)" constraints from definition mode prompt: * Change "Provide concise definitions" to "Provide definitions" * Remove "(2-3 words)" length specification * Update format example to use "definition" instead of "concise definition" This allows the LLM to generate more detailed, comprehensive definitions when appropriate.

* Add atomic write pattern to prevent config file corruption * Add CSV sanitization to prevent formula injection attacks

* Add directory path validation to prevent system directory access * Add word input validation to prevent CSV corruption and injection

* Reformat code with `ruff format` * Apply automatic fixes with `ruff check --fix`

The test was missing the `fake_home` fixture, causing it to write to the real user config file at `~/.config/vocabmaster/config.json` instead of an isolated test location. * Add `fake_home` fixture parameter to test signature * Add explicit config file path monkeypatch for clarity * Tests now properly isolated and don't touch user config The bug was introduced in 8652183 when the path traversal tests were added.

* Remove `uv.lock` from `.gitignore` * Add `uv.lock` file to repository The migration commit (69a0ca2) incorrectly added `uv.lock` to `.gitignore` with the reasoning "(package, not application)". However, per uv's official documentation, the lockfile should be committed for all projects - both libraries and applications - to ensure reproducible environments across development and deployment. References: * https://docs.astral.sh/uv/concepts/projects/layout/#the-lockfile * https://docs.astral.sh/uv/guides/projects/#uvlock

* Add `pytest-cov` for code coverage reporting in CI * Add `ruff` for fast, comprehensive linting and formatting * Configure ruff with 100-character line length * Enable isort-style import sorting (I001) These dependencies support the new CI/CD pipeline workflows.

* **Lint job**: Run `ruff` linter and formatter with GitHub annotations * **Test job**: Matrix testing across Python 3.10-3.12 and Ubuntu/macOS/Windows * **Security job**: Run `pip-audit` for vulnerability scanning * **Build job**: Build distribution packages with metadata validation * **Install test job**: Verify package installation across all platforms * **All checks pass job**: Aggregate status check using alls-green Key features: * Concurrency control to cancel outdated runs * Optimized caching strategy for `uv` dependencies * Coverage reporting to Codecov * JUnit XML test results and HTML coverage reports * SLSA Level 3 build provenance attestation * Retention policies for artifacts (30-90 days) The workflow runs on push to `main`, pull requests, and manual dispatch.

* **Validate tag job**: Verify semver format and match with `pyproject.toml` * **Test job**: Run full test suite before releasing * **Build job**: Create distribution packages with hash generation * **Generate provenance job**: SLSA Level 3 supply chain security * **Publish PyPI job**: Automated publishing with trusted publishing (OIDC) * **Create release job**: GitHub release with auto-generated changelog * **Announce job**: Success summary with links * **Rollback job**: Failure handling and recovery instructions Key features: * Semver validation (v*.*.*) with pyproject.toml version matching * Manual dispatch support for flexible releases * Grouped changelog by feature/fix/other * Prerelease detection for alpha/beta/rc tags * SLSA provenance attached to releases for verification * Comprehensive error handling and rollback guidance Triggered on tags matching `v*.*.*` or manual workflow dispatch.

* Check for outdated dependencies using `uv pip list --outdated` * Create or update GitHub issue when outdated packages found * Run every Monday at 9:00 UTC * Manual dispatch available * Smart issue management (avoid duplicates) The workflow automatically creates issues labeled `dependencies` and `automated` for tracking updates.

* Overview of all workflows (CI, release, dependencies) * Setup instructions for PyPI trusted publishing * Manual trigger examples using `gh` CLI * Release process documentation * Status badge snippets for README * Local testing commands * Troubleshooting guide This documentation helps maintainers understand and operate the CI/CD pipeline effectively.

Add security workflow with three scanning jobs: * **Dependency scan** - Check for vulnerable dependencies with `pip-audit` and `safety` * **Secret scan** - Detect leaked credentials using Gitleaks * **CodeQL analysis** - Semantic security vulnerability detection Runs on push/PR to main, daily at 2:00 UTC, and supports manual dispatch. Standalone approach provides better security coverage than the integrated job in `ci.yml`.

Remove the integrated security audit job from `ci.yml` since security checks are now handled by the standalone `security.yml` workflow. This separates security concerns from the main CI pipeline. * Remove `security` job (dependency scan with `pip-audit`) * Update `build` and `all-checks-pass` jobs to remove `security` from `needs` arrays

Update `.github/workflows/README.md` to document the new standalone security workflow: * Add section 4 describing `security.yml` workflow and its three scanning jobs * Add `security.yml` to manual workflow triggers * Update troubleshooting section to reference the standalone security workflow * Replace "Security Audit Issues" with "Security Scan Issues"

github-advanced-security · 2025-11-14T09:47:18Z

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

AI and others added 24 commits October 15, 2025 01:25

Signal already-translated vocab with a custom exception

b872448

- introduce AllWordsTranslatedError and reuse its message - handle the custom type in translate and --count code paths - update the CLI test double to raise the new error class

Add backend: Utility functions for language pair file paths

7a90a93

Introduce `get_pair_file_paths()` and `backup_language_pair_files()` helper functions to simplify file path resolution and backup operations for language pairs.

Add backend: Vocabulary statistics calculation

90ac9ec

Introduce `calculate_vocabulary_stats()` to compute total, translated, and pending word counts for a vocabulary file.

Add CLI infrastructure: Command aliasing support

b5fba29

Introduce `AliasedGroup` class to enable command aliases (e.g., 'pair' → 'pairs') while hiding aliases from help output.

Update README.md

de7166f

Update README.md to reflect new `pairs` command group and enhanced command options. Add documentation for `pairs list`, `pairs set-default`, `pairs remove`, `pairs rename`, and `pairs inspect` commands.

Refactor test: Normalize string quote style

c066bc8

Use single quotes for test string literal in test_backup_occurs_before_chat_request to maintain consistency with project coding style.

Normalize string quote style

9ddac1f

Update prompt system and Anki handling for definition mode

5950d84

* Update prompt system to support definition mode * Update Anki deck naming for definition mode

Add data protection against corruption and injection

f540a93

* Add atomic write pattern to prevent config file corruption * Add CSV sanitization to prevent formula injection attacks

Add input validation to prevent corruption and injection in CLI

9e939ed

* Add directory path validation to prevent system directory access * Add word input validation to prevent CSV corruption and injection

Apply ruff formatting and linting fixes

a3a1851

* Reformat code with `ruff format` * Apply automatic fixes with `ruff check --fix`

sderev force-pushed the feature/github-actions-ci-cd branch from 8bbecb6 to 472f613 Compare October 16, 2025 22:01

sderev force-pushed the main branch from 10ee47b to a3a1851 Compare October 16, 2025 22:07

AI added 4 commits October 19, 2025 06:45

AI added 6 commits November 14, 2025 10:38

sderev force-pushed the feature/github-actions-ci-cd branch from 472f613 to f9117cb Compare November 14, 2025 09:46

sderev force-pushed the main branch from 0a14ce1 to 59e73df Compare November 16, 2025 01:28

sderev closed this Dec 26, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add CI/CD pipeline with GitHub Actions #1

Add CI/CD pipeline with GitHub Actions #1

Uh oh!

sderev commented Oct 16, 2025

Uh oh!

github-advanced-security bot commented Nov 14, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Add CI/CD pipeline with GitHub Actions #1

Add CI/CD pipeline with GitHub Actions #1

Uh oh!

Conversation

sderev commented Oct 16, 2025

Summary

Changes

1. Development Dependencies (pyproject.toml)

2. CI Workflow (.github/workflows/ci.yml)

3. Release Workflow (.github/workflows/release.yml)

4. Dependency Check Workflow (.github/workflows/dependencies.yml)

5. Documentation (.github/workflows/README.md)

Key Features

Prerequisites

Testing

Uh oh!

github-advanced-security bot commented Nov 14, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

1. Development Dependencies (`pyproject.toml`)

2. CI Workflow (`.github/workflows/ci.yml`)

3. Release Workflow (`.github/workflows/release.yml`)

4. Dependency Check Workflow (`.github/workflows/dependencies.yml`)

5. Documentation (`.github/workflows/README.md`)