@sderev sderev commented Oct 16, 2025

Summary

This PR introduces a comprehensive CI/CD pipeline using GitHub Actions.

Changes

1. Development Dependencies (pyproject.toml)

  • Add pytest-cov ~= 6.0 for code coverage
  • Add ruff ~= 0.14 for linting and formatting
  • Configure ruff with 100-character line length and isort import sorting

2. CI Workflow (.github/workflows/ci.yml)

Features:

  • Concurrency control - Cancel outdated runs automatically
  • Matrix testing - Python 3.10, 3.11, 3.12 across Ubuntu, macOS, Windows
  • Comprehensive checks:
    • Linting with ruff (GitHub annotations)
    • Code formatting validation
    • Full test suite with coverage
    • Security audit with pip-audit
    • Package build and metadata validation
    • Installation testing across platforms
  • Optimized caching - Smart uv dependency caching
  • Artifact management - JUnit XML, coverage reports, build artifacts
  • SLSA provenance - Supply chain security attestation
  • Aggregate status - Single "all checks pass" requirement

3. Release Workflow (.github/workflows/release.yml)

Features:

  • Semver validation - Ensures tag matches pyproject.toml version
  • Full test suite - Runs before any release
  • SLSA Level 3 provenance - Supply chain security
  • PyPI publishing - Trusted publishing with OIDC (no API tokens!)
  • GitHub releases - Auto-generated changelog grouped by type
  • Rollback support - Failure handling and recovery instructions
  • Manual dispatch - Flexible release triggering
  • Prerelease detection - Automatic handling of alpha/beta/rc tags

4. Dependency Check Workflow (.github/workflows/dependencies.yml)

  • Weekly automated dependency audits
  • Auto-creates GitHub issues for outdated packages
  • Smart issue management (no duplicates)

5. Documentation (.github/workflows/README.md)

  • Complete setup instructions
  • PyPI trusted publishing configuration
  • Manual workflow triggers
  • Release process guide
  • Status badges
  • Local testing commands
  • Troubleshooting guide

Key Features

  1. Security

    • SLSA Level 3 build provenance
    • Dependency vulnerability scanning
    • Trusted publishing (no secrets!)
  2. Developer Experience

    • Fast feedback with concurrency control
    • Clear error annotations
    • Comprehensive test results
    • Coverage tracking
  3. Automation

    • Weekly dependency checks
    • Auto-generated changelogs
    • Smart issue management

Prerequisites

Before merging, configure PyPI trusted publishing:

  1. Go to https://pypi.org/manage/account/publishing/
  2. Add GitHub as trusted publisher
  3. Specify workflow: release.yml
  4. Specify environment: pypi

Testing

All workflows have been validated:

  • ✅ YAML syntax verified
  • ✅ Atomic commits created
  • ✅ Git best practices followed
  • ✅ Documentation complete

CI runs automatically on this PR to confirm that everything works.


Author: AI 🤖

AI and others added 24 commits October 15, 2025 01:25
- introduce AllWordsTranslatedError and reuse its message

- handle the custom type in translate and --count code paths

- update the CLI test double to raise the new error class
Introduce `get_pair_file_paths()` and `backup_language_pair_files()` helper
functions to simplify file path resolution and backup operations for
language pairs.
Introduce `calculate_vocabulary_stats()` to compute total, translated, and
pending word counts for a vocabulary file.
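
As an illustration of the counts described above, here is a minimal sketch. It assumes a CSV with `word` and `translation` columns and takes already-parsed rows instead of a file path; the project's actual signature and column names may differ.

```python
import csv
import io


def calculate_vocabulary_stats(rows):
    """Return total, translated, and pending counts for vocabulary rows."""
    total = 0
    translated = 0
    for row in rows:
        total += 1
        # Assumption: a word is "translated" once its translation cell is non-empty.
        if (row.get("translation") or "").strip():
            translated += 1
    return {"total": total, "translated": translated, "pending": total - translated}


# Hypothetical in-memory vocabulary file used only for this example.
sample = io.StringIO("word,translation\nchat,cat\nchien,\noiseau,bird\n")
stats = calculate_vocabulary_stats(csv.DictReader(sample))
print(stats)  # {'total': 3, 'translated': 2, 'pending': 1}
```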
Introduce `rename_language_pair()` in config_handler to enable renaming
existing language pairs while preserving default pair associations.

Include tests for successful rename, validation of existing pairs, and
prevention of duplicate pairs.
Introduce `AliasedGroup` class to enable command aliases (e.g., 'pair' →
'pairs') while hiding aliases from help output.
Add `pairs` command group with subcommands: list, add, default, set-default,
remove, rename, inspect. Remove deprecated commands (setup, default, show,
config default, config remove).

Extract `create_language_pair_interactively()` helper from old setup command.
Add helper functions for pair selection, parsing, and management.

Update all related tests to reflect new command structure.
Update README.md to reflect new `pairs` command group and enhanced command options.
Add documentation for `pairs list`, `pairs set-default`, `pairs
remove`, `pairs rename`, and `pairs inspect` commands.
Use single quotes for test string literal in test_backup_occurs_before_chat_request to maintain consistency with project coding style.
The test was incorrectly expecting that canceling the pairs add operation would save the entered values. This contradicts the actual behavior where cancellation prevents any state changes.

Add initial state setup and verify that the original state is preserved when the user cancels adding a new language pair.
* Add `pyproject.toml` with modern build system configuration
  * Use setuptools>=61.0 as build backend
  * Migrate all metadata from `setup.py`
  * Add dependency version constraints to prevent vulnerable updates:
    * `click ~= 8.3` (compatible release for 8.x)
    * `openai >= 0.28, <1.0` (preserve <1.0 constraint)
    * `tiktoken ~= 0.12` (compatible release: `>= 0.12, == 0.*`)
  * Add dev dependencies under `[project.optional-dependencies]`
  * Include comprehensive PyPI classifiers and project URLs

* Remove deprecated packaging files:
  * Delete `setup.py`
  * Delete `requirements.txt`
  * Delete `requirements-dev.txt`

* Add `uv.lock` to `.gitignore` (package, not application)

This modernizes the build system per PEP 517/518 and simplifies
dependency management. Version constraints use compatible release
(`~=`) to allow security patches while preventing breaking changes.

All tests pass (111 passed).
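
To make the compatible release operator concrete, the sketch below mirrors the PEP 440 rule that `~= X.Y` means `>= X.Y, == X.*`. It is a simplified illustration that handles only plain numeric versions; `is_compatible_release` is a hypothetical helper, not part of the project or of any packaging library.

```python
def parse_release(version):
    """Split a plain numeric version such as '8.3.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_compatible_release(spec, candidate):
    """Return True if `candidate` satisfies `~= spec` (numeric versions only)."""
    base = parse_release(spec)
    found = parse_release(candidate)
    if len(base) < 2:
        raise ValueError("~= needs at least two release segments")
    # Pad with zeros so that 8.3 and 8.3.0 compare as equal.
    width = max(len(base), len(found))
    padded_base = base + (0,) * (width - len(base))
    padded_found = found + (0,) * (width - len(found))
    # Everything except the last segment of the spec is pinned.
    prefix = base[:-1]
    return padded_found >= padded_base and padded_found[: len(prefix)] == prefix


print(is_compatible_release("8.3", "8.9"))      # True: still within 8.*
print(is_compatible_release("8.3", "9.0"))      # False: major version changed
print(is_compatible_release("1.4.2", "1.5.0"))  # False: pinned to 1.4.*
```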
* Update `gpt_integration.py`
  * Change pricing from per-1k to per-1M tokens
  * Add support for `gpt-4.1`, `gpt-4o`, `o1`, `o3`, `gpt-5` models
  * Add fallback to `cl100k_base` encoding for token counting
  * Change `estimate_prompt_cost()` to accept `model` parameter and return single value

* Update `cli.py`
  * Fix `compute_prompt_estimate()` to use `gpt-4.1` instead of `gpt-3.5-turbo`
  * Improve output colors (yellow for values, blue for model)
  * Update wording for better clarity

* Update test suite
  * Fix test mocks to return single cost string
  * Update assertions to match new output format
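
The per-1M pricing change comes down to simple arithmetic. The following sketch shows one way it could look; the price table, the model name, and the exact signature are placeholders for illustration, not the values or API in `gpt_integration.py`.

```python
from decimal import Decimal

# Hypothetical price in USD per 1M input tokens; not the project's real table.
PRICE_PER_MILLION_TOKENS = {"example-model": Decimal("2.00")}


def estimate_prompt_cost(token_count, model):
    """Return the estimated prompt cost as a single formatted string."""
    price = PRICE_PER_MILLION_TOKENS[model]
    # Prices are quoted per 1,000,000 tokens, so scale the count accordingly.
    cost = price * token_count / Decimal(1_000_000)
    return f"{cost:.6f}"


print(estimate_prompt_cost(1500, "example-model"))  # 0.003000
```

Using `Decimal` here is a design choice of the sketch to avoid floating-point noise in very small dollar amounts.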
* Use whitelist approach with character set validation (simple, Pythonic, secure)
* Only allow letters, numbers, underscores, and hyphens
* Replace regex with `set(string.ascii_letters + string.digits + '_-')`
* User-friendly error messages (no scary security jargon)
* Validate language names at all entry points (CLI, config, utils)
* Add comprehensive security tests in `test_path_traversal_prevention.py`

This prevents attackers from reading/writing files outside the intended
data directory by using malicious language pair names like `../../../etc/passwd`.

The whitelist blocks ALL dangerous characters that could be used for path
traversal: `.`, `/`, `\`, and any special characters.

Fixes: PERSO-186
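
A minimal sketch of the character-set check described above. The allowed set comes from the commit message; the function name `validate_language_name` and the error wording are assumptions made for this example.

```python
import string

# Allowed set taken from the commit message: letters, digits, '_' and '-'.
ALLOWED_CHARACTERS = set(string.ascii_letters + string.digits + "_-")


def validate_language_name(name):
    """Return `name` unchanged if it only uses allowed characters."""
    if not name or not set(name) <= ALLOWED_CHARACTERS:
        # Hypothetical message; the project's wording may differ.
        raise ValueError(
            "Language names can only contain letters, numbers, underscores, and hyphens."
        )
    return name


print(validate_language_name("english"))  # english

try:
    validate_language_name("../../../etc/passwd")
except ValueError as error:
    print(f"rejected: {error}")
```

Because `.`, `/`, and `\` are not in the set, a rejected name can never be interpreted as a relative path when it is later joined to the data directory.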
* Add `is_same_language_pair()` to detect matching languages
* Add `get_pair_mode()` to return "definition" or "translation" mode
* Use case-insensitive comparison via `casefold()`

These utilities enable the tool to determine whether a language pair
should operate in translation mode (different languages) or definition
mode (same language).
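
The mode detection could be sketched as follows. The function names and the `casefold()` comparison come from the commit message; the parameter names are assumptions.

```python
def is_same_language_pair(language_to_learn, mother_tongue):
    """Return True when both names refer to the same language, ignoring case."""
    return language_to_learn.casefold() == mother_tongue.casefold()


def get_pair_mode(language_to_learn, mother_tongue):
    """Return "definition" for same-language pairs, otherwise "translation"."""
    if is_same_language_pair(language_to_learn, mother_tongue):
        return "definition"
    return "translation"


print(get_pair_mode("English", "ENGLISH"))  # definition
print(get_pair_mode("English", "French"))   # translation
```

`casefold()` is more aggressive than `lower()`, so names such as `Straße` and `STRASSE` also compare as equal.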
* Update prompt system to support definition mode

* Update Anki deck naming for definition mode
* Test `is_same_language_pair()` with exact matches and case variations
* Test `is_same_language_pair()` correctly identifies different languages
* Test `get_pair_mode()` returns "definition" for same-language pairs
* Test `get_pair_mode()` returns "translation" for different-language pairs
* Verify case-insensitive matching works correctly

All tests pass, confirming the mode detection logic works as expected.
Add `test_gpt_integration.py` with 5 tests for `format_prompt()` covering translation mode, definition mode, default behavior, multiple words handling, and system message consistency.

Extend `test_csv_handler.py` with 2 tests for same-language pairs verifying definition mode deck naming and case-insensitive behavior.

All 34 tests pass.
Create `test_same_language_integration.py` with 6 comprehensive integration tests:

* Test full workflow with same-language pairs generates definitions
* Verify Anki output uses "definitions" deck name for same-language pairs
* Test case-insensitive language matching (English:ENGLISH)
* Verify different-language pairs still work with translation mode
* Confirm Anki output uses "vocabulary" deck name for translation mode
* Test prompt generation correctly detects and applies mode

All 129 tests pass.
Add documentation for the new same-language pair feature:

* Add feature bullet in Features section explaining definition mode
* Add subsection "Definition mode for same-language pairs" with usage example
* Update Table of Contents to include new subsection
* Explain how same-language pairs work (concise definitions, target language examples, Anki deck naming)
Add `ensure_csv_has_fieldnames()` call to `get_words_to_translate()` to prevent KeyError when CSV files have incorrect headers (e.g., 'translate' instead of 'translation').

Ensures all vocabulary files have correct fieldnames before reading.
Remove "concise" and "(2-3 words)" constraints from definition mode prompt:
* Change "Provide concise definitions" to "Provide definitions"
* Remove "(2-3 words)" length specification
* Update format example to use "definition" instead of "concise definition"

This allows the LLM to generate more detailed, comprehensive definitions when appropriate.
* Add atomic write pattern to prevent config file corruption

* Add CSV sanitization to prevent formula injection attacks
* Add directory path validation to prevent system directory access

* Add word input validation to prevent CSV corruption and injection
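
Two of the hardening steps above can be illustrated in a few lines. This is a sketch under stated assumptions, not the project's implementation: `write_json_atomically` and `sanitize_csv_cell` are hypothetical names, and the single-quote prefix is one common mitigation for formula injection.

```python
import json
import os
import tempfile


def write_json_atomically(path, data):
    """Write JSON to a temp file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace is atomic when source and target share a filesystem,
        # so readers see either the old file or the complete new one.
        os.replace(temp_path, path)
    except BaseException:
        if os.path.exists(temp_path):
            os.unlink(temp_path)
        raise


# Leading characters that spreadsheet applications may treat as a formula.
FORMULA_PREFIXES = ("=", "+", "-", "@")


def sanitize_csv_cell(value):
    """Prefix risky cells with a single quote so they are read as plain text."""
    if value.startswith(FORMULA_PREFIXES):
        return "'" + value
    return value


print(sanitize_csv_cell("=SUM(A1:A9)"))  # '=SUM(A1:A9)
print(sanitize_csv_cell("bonjour"))      # bonjour
```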
* Reformat code with `ruff format`
* Apply automatic fixes with `ruff check --fix`
AI added 4 commits October 19, 2025 06:45
The test was missing the `fake_home` fixture, causing it to write to the
real user config file at `~/.config/vocabmaster/config.json` instead of
an isolated test location.

* Add `fake_home` fixture parameter to test signature
* Add explicit config file path monkeypatch for clarity
* Tests are now properly isolated and no longer touch the user config

The bug was introduced in 8652183 when the path traversal tests were added.
* Remove `uv.lock` from `.gitignore`
* Add `uv.lock` file to repository

The migration commit (69a0ca2) incorrectly added `uv.lock` to `.gitignore`
with the reasoning "(package, not application)". However, per uv's official
documentation, the lockfile should be committed for all projects - both
libraries and applications - to ensure reproducible environments across
development and deployment.

References:
* https://docs.astral.sh/uv/concepts/projects/layout/#the-lockfile
* https://docs.astral.sh/uv/guides/projects/#uvlock
* Add `pytest-cov` for code coverage reporting in CI
* Add `ruff` for fast, comprehensive linting and formatting
* Configure ruff with 100-character line length
* Enable isort-style import sorting (I001)

These dependencies support the new CI/CD pipeline workflows.
* **Lint job**: Run `ruff` linter and formatter with GitHub annotations
* **Test job**: Matrix testing across Python 3.10-3.12 and Ubuntu/macOS/Windows
* **Security job**: Run `pip-audit` for vulnerability scanning
* **Build job**: Build distribution packages with metadata validation
* **Install test job**: Verify package installation across all platforms
* **All checks pass job**: Aggregate status check using alls-green

Key features:
* Concurrency control to cancel outdated runs
* Optimized caching strategy for `uv` dependencies
* Coverage reporting to Codecov
* JUnit XML test results and HTML coverage reports
* SLSA Level 3 build provenance attestation
* Retention policies for artifacts (30-90 days)

The workflow runs on push to `main`, pull requests, and manual dispatch.
AI added 6 commits November 14, 2025 10:38
* **Validate tag job**: Verify semver format and match with `pyproject.toml`
* **Test job**: Run full test suite before releasing
* **Build job**: Create distribution packages with hash generation
* **Generate provenance job**: SLSA Level 3 supply chain security
* **Publish PyPI job**: Automated publishing with trusted publishing (OIDC)
* **Create release job**: GitHub release with auto-generated changelog
* **Announce job**: Success summary with links
* **Rollback job**: Failure handling and recovery instructions

Key features:
* Semver validation (v*.*.*) with pyproject.toml version matching
* Manual dispatch support for flexible releases
* Grouped changelog by feature/fix/other
* Prerelease detection for alpha/beta/rc tags
* SLSA provenance attached to releases for verification
* Comprehensive error handling and rollback guidance

Triggered on tags matching `v*.*.*` or manual workflow dispatch.
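
The tag validation and prerelease detection can be sketched in Python as shown below. The workflow itself may well do this in shell; the regular expression, the function name `check_release_tag`, and the accepted prerelease spellings are assumptions made for illustration.

```python
import re

# Assumed tag shape: v<major>.<minor>.<patch> with an optional alpha/beta/rc suffix.
TAG_PATTERN = re.compile(
    r"^v(?P<version>\d+\.\d+\.\d+(?P<pre>-?(?:alpha|beta|rc)\.?\d*)?)$"
)


def check_release_tag(tag, project_version):
    """Validate a release tag against the version declared in pyproject.toml."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"{tag!r} is not a v<major>.<minor>.<patch> tag")
    if match.group("version") != project_version:
        raise ValueError(
            f"tag {tag!r} does not match project version {project_version!r}"
        )
    return {
        "version": match.group("version"),
        "prerelease": match.group("pre") is not None,
    }


print(check_release_tag("v1.2.3", "1.2.3"))
# {'version': '1.2.3', 'prerelease': False}
print(check_release_tag("v1.2.3-rc1", "1.2.3-rc1"))
# {'version': '1.2.3-rc1', 'prerelease': True}
```

Here `project_version` is passed in as a string; in a real job it would be read from `pyproject.toml` before the comparison.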
* Check for outdated dependencies using `uv pip list --outdated`
* Create or update a GitHub issue when outdated packages are found
* Run every Monday at 9:00 UTC
* Manual dispatch available
* Smart issue management (avoid duplicates)

The workflow automatically creates issues labeled `dependencies` and
`automated` for tracking updates.
* Overview of all workflows (CI, release, dependencies)
* Setup instructions for PyPI trusted publishing
* Manual trigger examples using `gh` CLI
* Release process documentation
* Status badge snippets for README
* Local testing commands
* Troubleshooting guide

This documentation helps maintainers understand and operate the CI/CD
pipeline effectively.
Add security workflow with three scanning jobs:

* **Dependency scan** - Check for vulnerable dependencies with `pip-audit` and `safety`
* **Secret scan** - Detect leaked credentials using Gitleaks
* **CodeQL analysis** - Semantic security vulnerability detection

Runs on push/PR to main, daily at 2:00 UTC, and supports manual dispatch. The standalone approach provides broader security coverage than the integrated job in `ci.yml`.
Remove the integrated security audit job from `ci.yml` since security checks are now handled by the standalone `security.yml` workflow. This separates security concerns from the main CI pipeline.

* Remove `security` job (dependency scan with `pip-audit`)
* Update `build` and `all-checks-pass` jobs to remove `security` from `needs` arrays
Update `.github/workflows/README.md` to document the new standalone security workflow:

* Add section 4 describing `security.yml` workflow and its three scanning jobs
* Add `security.yml` to manual workflow triggers
* Update troubleshooting section to reference the standalone security workflow
* Replace "Security Audit Issues" with "Security Scan Issues"
@sderev sderev force-pushed the feature/github-actions-ci-cd branch from 472f613 to f9117cb Compare November 14, 2025 09:46
@github-advanced-security

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.
