Add FAQ automation system with GitHub Actions integration #14
Conversation
This commit introduces a comprehensive FAQ automation system that uses RAG and LLM-based triage to intelligently process new FAQ proposals.

Features:
- AI-powered FAQ proposal analysis (NEW/UPDATE/DUPLICATE decisions)
- Automated PR creation for approved changes
- GitHub issue template for structured FAQ proposals
- Complete test suite with unit and integration tests
- Comprehensive documentation (README, CONTRIBUTING)

Components:
- faq_automation/: Python module with core logic
  - core.py: FAQ processing utilities
  - rag_agent.py: LLM-based decision agent using OpenAI
  - actions.py: GitHub Actions integration helpers
  - cli.py: Command-line interface for workflow
- .github/workflows/faq-automation.yml: GitHub Actions workflow
- .github/ISSUE_TEMPLATE/faq-proposal.yml: Structured issue template
- tests/: Comprehensive test coverage
- CONTRIBUTING.md: Contributor guidelines
- README.md: Updated with full documentation

Dependencies added:
- minsearch: Lightweight text search for FAQ retrieval
- openai: LLM integration for decision making
- pydantic: Structured output validation

The system processes FAQ proposals through:
1. Issue submission via GitHub template
2. Retrieval of similar existing FAQs
3. LLM analysis and decision (NEW/UPDATE/DUPLICATE)
4. Automated PR creation or issue closure with feedback

Supports: machine-learning-zoomcamp (initial course). Can be extended to support all courses in the future.
Created this WIP PR using claude-code. Needs testing before it can be merged.
That's a lot of code and text =) Have you tested it?
Also, can you add a link to the contributing guide at the top of the FAQ pages? And we probably need an issue template where people can select the course they are contributing to. Maybe we can have a drop-down list? Not sure if it's possible, but at least we need a format so it's easy to extract it from the text.
Thank you for your comments, @alexeygrigorev! I will work on addressing them. FYI, this is just an early draft that I added, made with claude-code, and it still needs a lot of testing. I've already caught a few bugs, but many more to come, I'm sure.
- Update CLI default model from gpt-4 to gpt-5-nano
- Update RAG agent default model to gpt-5-nano
- Update GitHub Actions workflow to use gpt-5-nano
- Fix setuptools package configuration
- Fix minsearch version requirement (0.0.7)
Added a simple text banner with a link to CONTRIBUTING.md at the top of each course page to encourage user contributions.
Set font size to 1.17em to match the question heading size for better visibility and consistency.
- Change course field from input to dropdown menu
- Add all 4 available courses as options:
  - machine-learning-zoomcamp
  - data-engineering-zoomcamp
  - llm-zoomcamp
  - mlops-zoomcamp
- Update uv.lock to sync with FAQ automation dependencies
- Add workflow_dispatch trigger with issue_number input
- Support both automatic (issue opened) and manual execution
- Fetch issue data when manually triggered
- Update error handler to use correct issue number

This allows testing the workflow on feature branches without merging.
Is it ready for merge?
- Remove incorrect 'python -m' prefix from uv commands in Makefile
- Update parse_issue_body to stop collecting content at any ### section
- Ensures Checklist and other sections are excluded from parsed answer
- All tests now passing
- Remove Quick Start section (missing necessary environment setup)
- Update all GPT-4 references to GPT-5
- Development section now contains all necessary setup instructions
I've made significant updates to the
- Replace relative issue links with absolute URLs
- Use direct link to FAQ proposal template
- Simplify CONTRIBUTING instructions (remove redundant steps)
- Point to https://github.com/DataTalksClub/faq/issues
- Document test_faq_automation.py (core functions)
- Document test_cli_parsing.py (issue body parsing)
- Document test_faq_actions.py (GitHub Actions integration)
- Update test coverage section with test counts
- Add example commands for running FAQ automation tests
- Reorganize test_faq_automation.py into classes (TestParseFrontmatter, TestWriteFrontmatter, TestGenerateDocumentId, TestKeepRelevant)
- Reorganize test_cli_parsing.py into TestParseIssueBody class
- Reorganize test_faq_actions.py into classes (TestGeneratePRBody, TestGenerateDuplicateComment)
- Follow the established test structure pattern from test_sorting.py
- All 102 unit tests passing
- Add example for running specific FAQ automation test method
- Add example for running specific CLI parsing test method
- Show proper class-based test structure in examples
- Use 'make test' commands as primary examples
- Show 'uv run pytest' commands as alternatives
- Remove '--extra dev' flag for consistency with Makefile
- All test commands now consistent across documentation
Update faq_automation/rag_agent.py to use the correct OpenAI API syntax from the notebook prototype:
- Changed beta.chat.completions.parse to responses.parse
- Changed messages parameter to input
- Changed response_format parameter to text_format
- Updated response parsing to extract from response.output

All 116 tests pass.
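For readers unfamiliar with the Responses API, the migration amounts to roughly this call shape (a sketch only; the model name, prompt content, and simplified FAQDecision stand-in are illustrative — the real code lives in faq_automation/rag_agent.py):

```python
from openai import OpenAI
from pydantic import BaseModel

class FAQDecision(BaseModel):  # simplified stand-in for the real model
    action: str
    rationale: str

client = OpenAI()
response = client.responses.parse(
    model="gpt-5-nano",
    input=[{"role": "user", "content": "..."}],  # was: messages=
    text_format=FAQDecision,                     # was: response_format=
)
decision = response.output_parsed  # parsed object is extracted from response.output
```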
Eliminates code duplication between JavaScript and Python parsing by using a single Python implementation for all issue body parsing.

Changes:
- Add parse_full_issue_body() to extract course, question, and answer
- Create scripts/extract_issue_fields.py for GitHub Actions integration
- Simplify workflow to use Python parsing instead of JS
- Add 6 comprehensive tests for parse_full_issue_body()
- Update tests/README.md with new test documentation

Benefits:
- Single source of truth for parsing logic
- All parsing code is testable
- Easier maintenance (one language, one implementation)
- No duplication between JS and Python

Test results: All 122 tests pass (was 116, added 6 new tests)
Creates a shared GitHub Actions helper module to eliminate bash scripting for writing to the GITHUB_OUTPUT environment variable.

Changes:
- Create faq_automation/github_actions.py with write_github_output()
  - Supports both multiline (heredoc) and single-line formats
  - Handles local testing mode (prints to stdout)
  - Environment detection helpers (is_github_actions, get_github_output_path)
- Update scripts/extract_issue_fields.py to use shared function
  - Import write_github_output from github_actions module
  - Remove duplicate implementation
- Create scripts/write_faq_decision_output.py
  - Reads faq_decision.json
  - Writes to GITHUB_OUTPUT using Python
  - Replaces bash: echo "decision=$(jq -c .)" >> $GITHUB_OUTPUT
- Update .github/workflows/faq-automation.yml
  - Replace bash output logic with Python script call
  - Cleaner, more maintainable workflow
- Add comprehensive tests (10 new tests)
  - test_github_actions.py with 3 test classes
  - Multiline and single-line output formats
  - Local testing mode behavior
  - Environment detection
- Update tests/README.md documentation

Benefits:
- All GitHub Actions integration uses Python
- Single source of truth for output writing
- Fully testable (no bash to test)
- Consistent approach across all scripts
- Better error handling and validation

Test results: All 132 tests pass (was 122, added 10 new tests)
Replace 'uv pip install --system -e .' with 'uv sync --no-dev' for:
- Consistency with local development workflow
- Modern uv best practices
- Declarative dependency management
- Faster installation (skips dev dependencies not needed in automation)

Benefits:
- Uses same uv sync approach as README recommends locally
- Only installs production dependencies needed for FAQ automation
- Automatic virtual environment management
- Better reproducibility
Removes code duplication and simplifies the workflow by having the CLI parse the course from the issue body instead of extracting it separately.

Changes:
- Remove --course argument from CLI (faq_automation/cli.py)
  - Always use parse_full_issue_body() to extract course, question, answer
  - Update all references from args.course to parsed course variable
- Simplify workflow (.github/workflows/faq-automation.yml)
  - Keep "Fetch issue body" step for clean separation
  - Remove "Extract issue fields with Python" step entirely (~26 lines removed)
  - Simplify "Process FAQ with AI" step (~14 lines removed)
  - Pass full issue body directly to CLI without reconstruction
- Delete scripts/extract_issue_fields.py (no longer needed)
- Update README.md example
  - Add course field to test_issue.txt example
  - Remove --course argument from CLI command

Benefits:
- Workflow reduced from 3 steps to 2 steps
- Removed ~40 lines from workflow file
- Deleted 1 script file (67 lines)
- Simpler CLI interface (one less argument)
- Single parsing path (no conditionals)
- Easier to maintain and understand

All 132 tests pass ✅
After switching to 'uv sync --no-dev', Python commands need to run within the uv-managed virtual environment using 'uv run'.

Changes:
- Use 'uv run python -m faq_automation.cli' to run the CLI module
- Use 'uv run scripts/write_faq_decision_output.py' for the script (leverages the shebang line for cleaner syntax)

This ensures Python commands execute with the correct dependencies installed by uv sync.

Fixes: ModuleNotFoundError: No module named 'yaml'
Removed the parse_issue_body() function from the CLI as it is no longer used in production. After removing the --course argument, the CLI always uses parse_full_issue_body(), which extracts course, question, and answer from the full issue body.

Changes:
- Removed parse_issue_body() function from faq_automation/cli.py (58 lines)
- Removed TestParseIssueBody class from tests/unit/test_cli_parsing.py (79 lines)
- Updated import statement in test file
- Updated tests/README.md to reflect new test count (132 → 127 tests)

All 127 tests pass.
- Updated description to clarify tests cover both site generator and FAQ automation
- Added guidance for adding tests to the FAQ automation system
- Specified which test files to use for different FAQ automation components
- Added notes about testing with real issue bodies and mocking external dependencies
Added 26 integration tests covering the complete end-to-end FAQ automation workflow, bringing the total test count from 143 to 153.

Test coverage includes:
- FAQ agent integration (5 tests): initialization, search, and proposal processing
- File creation and updates (3 tests): creating new FAQs and updating existing ones
- PR and comment generation (5 tests): generating outputs for all decision types
- CLI integration (2 tests): parsing issue bodies and full CLI execution
- Error handling and edge cases (4 tests): empty sections, non-existent docs, etc.
- Site generator integration (3 tests): verifying created files work with generate_website.py
- End-to-end workflows (3 tests): complete NEW/UPDATE/DUPLICATE flows

All tests use mocked OpenAI API responses for consistency and speed, while performing real file I/O to verify format compatibility with the site generator.

Updated tests/README.md with comprehensive documentation of the new test suite.
Modified the FAQ automation workflow to include "Closes #<issue>" in the PR body. This uses GitHub's native auto-close feature to automatically close the originating issue when the FAQ bot's PR is merged.
Updated README.md and CONTRIBUTING.md to clarify that issues with NEW or UPDATE actions are automatically closed when their associated PRs are merged, using GitHub's native "Closes #issue" feature.
Removed the workflow_dispatch trigger and all related logic for manual workflow execution. The workflow now only triggers automatically on issue creation with the faq-proposal label, which simplifies the codebase and reduces the maintenance burden.

Changes:
- Removed workflow_dispatch trigger and inputs section
- Simplified job condition to only check for the faq-proposal label
- Removed fallback logic for manual issue number input
- Streamlined issue body fetching (always uses context.payload.issue)
- Cleaned up error handler to assume issue context
This PR is now ready for final review and merge.

Thank you!

FAQ Automation System with AI-Powered Triage
This PR introduces a comprehensive automated FAQ management system that uses Retrieval-Augmented Generation (RAG) and LLM-based triage to intelligently process new FAQ proposals submitted via GitHub issues.
🎯 Overview
The system automatically analyzes new FAQ proposals and determines whether to:
- add a NEW FAQ entry,
- UPDATE an existing entry, or
- close the proposal as a DUPLICATE with feedback pointing to the existing answer.
🚀 Features
AI-Powered Decision Making
Automated Workflow
Developer Experience
📦 What's Included
1. FAQ Automation Module (faq_automation/)

Core Functions (core.py)
- parse_frontmatter(): Extract YAML frontmatter from markdown
- write_frontmatter(): Write structured FAQ files
- read_questions(): Load and parse all FAQs from a course
- generate_document_id(): Create collision-resistant 10-char IDs
- find_question_files(): Map document IDs to file paths
- find_largest_sort_order(): Determine next sort order number

RAG Agent (rag_agent.py)
- FAQAgent: Main agent class for processing proposals
- FAQDecision: Pydantic model for structured LLM outputs
- process_faq_proposal(): Convenience function for single proposals

GitHub Actions Integration (actions.py)
- create_new_faq_file(): Generate new FAQ markdown files
- update_existing_faq_file(): Update existing FAQ content
- generate_pr_body(): Create detailed PR descriptions
- generate_duplicate_comment(): Create helpful duplicate comments

CLI Tool (cli.py)
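As an illustration of the ID scheme named above, here is a minimal sketch of how a collision-resistant 10-character document ID could be produced (the actual implementation in core.py may differ):

```python
import hashlib
import secrets

def generate_document_id() -> str:
    # Hash 16 random bytes and keep the first 10 hex characters;
    # 40 bits of entropy makes accidental collisions very unlikely.
    return hashlib.sha256(secrets.token_bytes(16)).hexdigest()[:10]
```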
2. GitHub Integration

Issue Template (.github/ISSUE_TEMPLATE/faq-proposal.yml)
Fields: course (dropdown of the four zoomcamp courses), question, answer
Workflow (.github/workflows/faq-automation.yml)
Trigger: Issue opened with the faq-proposal label
Steps (see the workflow sketch below):
1. Fetch issue body
2. Process FAQ with AI (retrieval + structured LLM decision)
3. Create a PR or post feedback based on the decision
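A minimal sketch of the workflow file, based on the trigger, steps, and commands described in this PR (step details and action versions are assumptions, not the exact file):

```yaml
name: FAQ Automation
on:
  issues:
    types: [opened]

jobs:
  process-faq:
    if: contains(github.event.issue.labels.*.name, 'faq-proposal')
    runs-on: ubuntu-latest
    # permissions: see "Setup Requirements" below
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v3
      - name: Install dependencies
        run: uv sync --no-dev
      - name: Process FAQ with AI
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        # Full issue body is passed to the CLI; exact flags not shown here.
        run: uv run python -m faq_automation.cli
```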
3. Testing
Unit Tests
- test_faq_automation.py: Core function tests (frontmatter parsing, document ID generation, search result filtering)
- test_faq_actions.py: Action function tests (PR body generation, duplicate comment generation)
- test_cli_parsing.py: CLI parsing tests (issue body parsing, multi-line content handling, error cases)
4. Documentation

README.md Updates
CONTRIBUTING.md (New)
5. Dependencies

New Dependencies (added to pyproject.toml)
- minsearch>=0.4.1 - Lightweight text search
- openai>=1.0.0 - LLM integration
- pydantic>=2.0.0 - Data validation
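In pyproject.toml terms, the addition looks roughly like this (surrounding entries omitted):

```toml
[project]
dependencies = [
    "minsearch>=0.4.1",  # lightweight text search for FAQ retrieval
    "openai>=1.0.0",     # LLM integration
    "pydantic>=2.0.0",   # structured output validation
]
```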
🔄 How It Works

User Flow
1. User submits an FAQ proposal via the issue template (which auto-applies the faq-proposal label)
2. The automation analyzes the proposal and responds with a PR or a comment

Automation Flow
LLM Decision Process
The system sends the model (gpt-5-nano by default) the proposed question and answer together with the most similar existing FAQs retrieved for the course.

The model returns structured output with action, rationale, document_id, section_id, section_rationale, order, question, proposed_content, filename_slug, and warnings.
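A hedged sketch of what the FAQDecision Pydantic model could look like, given the fields listed above (the field types and defaults are assumptions; the real model lives in rag_agent.py):

```python
from pydantic import BaseModel

class FAQDecision(BaseModel):
    action: str                      # "NEW", "UPDATE", or "DUPLICATE"
    rationale: str                   # why the model chose this action
    document_id: str | None = None   # target FAQ for UPDATE/DUPLICATE
    section_id: str | None = None    # section the FAQ belongs to
    section_rationale: str           # why this section fits
    order: int | None = None         # sort order within the section
    question: str                    # normalized question text
    proposed_content: str            # final answer body in markdown
    filename_slug: str               # slug used for new file names
    warnings: list[str] = []         # issues the model flagged
```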
🛠️ Technical Details
RAG Pipeline
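The pipeline details were trimmed from this description, but given the minsearch dependency and the retrieval step above, the search side could look roughly like this (field names and result count are assumptions):

```python
from minsearch import Index

# Index existing FAQ documents for the course (fields are illustrative).
index = Index(text_fields=["question", "content"], keyword_fields=["id"])
index.fit(documents)  # documents: list of dicts, e.g. loaded via read_questions()

# Retrieve the FAQs most similar to the new proposal.
results = index.search(query=proposed_question, num_results=5)
```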
File Naming Convention
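The exact convention was trimmed here; given the sort order, 10-character document ID, and filename_slug mentioned above, a hypothetical example (path and pattern are guesses, not the repo's actual scheme) would be:

```
_questions/machine-learning-zoomcamp/<section>/042_a1b2c3d4e5_how-do-i-submit-homework.md
```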
Frontmatter Structure
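Also trimmed; a hedged sketch of the YAML frontmatter, assuming it carries the document ID and sort order discussed above (field names are guesses):

```yaml
---
id: a1b2c3d4e5        # collision-resistant 10-char document ID
question: How do I submit homework?
sort_order: 42        # position within the section
---
```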
📋 Setup Requirements
Before Merging
Add OpenAI API Key
- Add the OPENAI_API_KEY secret in the repository settings

Verify Permissions
- The workflow needs contents: write, issues: write, pull-requests: write
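In the workflow file, these map to a standard permissions block (the scope-to-purpose comments are my reading, not from the source):

```yaml
permissions:
  contents: write        # push branches with new/updated FAQ files
  issues: write          # comment on and close proposal issues
  pull-requests: write   # open PRs for approved changes
```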
After Merging

Test the Workflow
Monitor Initial Runs
🧪 Testing
Run the test suite:
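The exact commands were trimmed here; per the Makefile and uv setup described in the commits above, they should be along these lines:

```bash
make test        # primary entry point
uv run pytest    # alternative, directly via uv
```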
Test locally with CLI:
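Also trimmed; based on the commit history (test_issue.txt example, no --course argument, uv run invocation), a plausible invocation is shown below — the way the issue body is passed is an assumption:

```bash
export OPENAI_API_KEY=sk-...                            # required
uv run python -m faq_automation.cli < test_issue.txt    # hypothetical: body on stdin
```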
🎓 Course Support
Initial Support: machine-learning-zoomcamp
Future Expansion: The system is designed to support all courses. To add a new course:
- Add the course as an option in the issue template dropdown
- Ensure the course directory has a _metadata.yaml
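The contents of _metadata.yaml aren't shown in this PR description; a hypothetical minimal version might be:

```yaml
# hypothetical sketch; actual fields may differ
course: data-engineering-zoomcamp
title: Data Engineering Zoomcamp
```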
📊 Expected Impact

For Contributors
For Maintainers
For Users
🔍 Example Outputs
NEW Decision PR
DUPLICATE Comment
🚦 Breaking Changes
None - this is a new feature addition.
📝 Notes
- The workflow only runs on issues with the faq-proposal label (applied automatically by the template)
- The LLM model can be changed via the --model parameter

✅ Checklist
🔗 Related
- Prototype: notebooks/rag.ipynb
- Site generator (generate_website.py)
Once merged and configured with OpenAI API key, the FAQ automation system will be live and ready to process proposals.