Python application used to download, parse, and extract structured/unstructured data from filings in the SEC Edgar Database (including 10-K, 10-Q, 13-D, S-1, 8-K, etc.)

ryansmccoy/py-sec-edgar

py-sec-edgar: Professional SEC EDGAR Filing Processor

PyPI version | Python 3.10+ | License: MIT | Documentation Status | Code style: ruff | Tests | Coverage | GitHub stars

A powerful, modern Python application for downloading, processing, and analyzing SEC EDGAR filings with professional-grade workflow automation.

🚀 Quick Start • 📖 Documentation • 💼 Examples • 🔧 Installation • ⚡ Workflows


py-sec-edgar transforms complex SEC filing data into accessible, structured information with enterprise-grade reliability and ease of use:

🎯 Key Features

  • πŸ—οΈ Professional Workflow System: Four specialized workflows for different data collection needs
  • ⚑ High-Performance Processing: Efficient bulk download and processing of SEC archives
  • πŸŽ›οΈ Advanced Filtering: Filter by ticker symbols, form types, date ranges, and more
  • πŸ”„ Real-Time Monitoring: RSS feed integration for live filing notifications
  • πŸ“Š Structured Data Extraction: Extract and parse filing contents automatically
  • πŸ›‘οΈ Enterprise-Ready: Robust error handling, logging, and configuration management
  • 🐍 Modern Python: Built with Python 3.10+, type hints, and modern best practices

🎪 Use Cases

  • 📈 Investment Research: Download 10-K/10-Q filings for fundamental analysis
  • 🔍 Compliance Monitoring: Track insider trading (Form 4) and ownership changes
  • 📰 News & Events: Monitor 8-K filings for material corporate events
  • 🏫 Academic Research: Bulk download historical filing data for studies
  • 🤖 Machine Learning: Create datasets for NLP and financial prediction models
  • 📊 Portfolio Management: Automated due diligence for investment portfolios

🚀 Quick Start

Get up and running with py-sec-edgar in under 2 minutes:

1. Install with uv (Recommended)

# Install uv if you haven't already
pip install uv

# Clone and setup the project
git clone https://github.com/ryansmccoy/py-sec-edgar.git
cd py-sec-edgar

# Install dependencies
uv sync

# Verify installation
uv run python -m py_sec_edgar --help

2. Your First Filing Download

# First, explore what's available without downloading (safe exploration)
uv run python -m py_sec_edgar workflows rss --show-entries --count 10 --list-only

# See what Apple filings are available without downloading
uv run python -m py_sec_edgar workflows daily --tickers AAPL --days-back 7 --forms "8-K" --no-download

# When ready, download Apple's latest 10-K annual report (includes 2025Q3 data)
uv run python -m py_sec_edgar workflows full-index --tickers AAPL --forms "10-K" --download --extract

# Process the latest quarterly data (2025Q3)
uv run python -m py_sec_edgar workflows full-index --quarter 2025Q3 --download --extract

# Monitor recent filings for your portfolio (explore first, then download)
uv run python -m py_sec_edgar workflows daily --tickers AAPL --tickers MSFT --tickers GOOGL --days-back 7 --forms "8-K" --no-download
# When satisfied, add --download flag to actually download files

# Monitor Apple's earnings announcement from August 1, 2024
uv run python -m py_sec_edgar workflows daily --tickers AAPL --start-date 2024-08-01 --end-date 2024-08-01 --forms "8-K" --download --extract

# Real-time RSS monitoring (list mode for exploration)
uv run python -m py_sec_edgar workflows rss --show-entries --count 10 --list-only

3. Explore Your Data

# Your downloaded filings are organized like this:
sec_data/
├── Archives/edgar/data/
│   └── 320193/                    # Apple's CIK
│       └── 000032019324000123/    # Specific filing
│           ├── aapl-20240930.htm  # Main 10-K document
│           ├── exhibits/          # All exhibits
│           └── Financial_Report.xlsx  # Structured financial data

🔧 Installation

Prerequisites

  • Python 3.10+ (Required)
  • uv package manager (Recommended) or pip
  • 5GB+ disk space for substantial data collection

Method 1: Development Installation (Recommended)

# Clone the repository
git clone https://github.com/ryansmccoy/py-sec-edgar.git
cd py-sec-edgar

# Install with uv (handles everything automatically)
uv sync

# Install with pip (alternative)
pip install -e .

Method 2: Direct Installation

# Install from PyPI
pip install py-sec-edgar

# Or with uv
uv pip install py-sec-edgar

Method 3: Production Installation

# For production environments
uv pip install "py-sec-edgar[prod]"  # quoted so shells such as zsh do not expand the brackets

# For development with all tools
uv sync --extra dev

⚑ Powerful Workflows

py-sec-edgar provides four specialized workflows, each optimized for different use cases. Each workflow has comprehensive documentation with dozens of real-world examples:

| Workflow | Best For | Data Source | Time Range | Full Documentation |
| --- | --- | --- | --- | --- |
| 📚 Full Index | Historical research, bulk analysis | Quarterly archives | All historical data | 📖 Complete Guide |
| 📅 Daily | Recent monitoring, current events | Daily index feeds | Last 1-90 days | 📖 Complete Guide |
| 📊 Monthly | XBRL structured data | Monthly XBRL archives | Monthly intervals | 📖 Complete Guide |
| 📡 RSS | Real-time monitoring | Live RSS feeds | Real-time updates | 📖 Complete Guide |

📅 SEC Data Availability & Update Schedule

Understanding when SEC data is available helps you choose the right workflow for your needs:

| Data Type | Update Frequency | Availability | Best Workflow | Notes |
| --- | --- | --- | --- | --- |
| 🔴 Live Filings | Real-time | As filed | RSS | Immediate access to new filings |
| 📊 Daily Index | Nightly at 10 PM ET | Previous business day | Daily | Complete daily filing lists |
| 📈 Full Index | Updated throughout quarter | Current quarter + historical | Full Index | Comprehensive quarterly data |
| 📋 Quarterly Index | End of quarter | Complete quarter (static) | Full Index | Final quarterly archives |
| 🔄 Weekly Rebuild | Saturday mornings | All corrected data | All workflows | Post-acceptance corrections included |

Key Update Schedule Details:

  • 🌙 Daily Index Files: Updated nightly starting around 10:00 PM ET with the previous business day's filings
  • 📊 Full Index Files: Updated continuously throughout the current quarter, including all filings from quarter start through the previous business day
  • 📅 Quarterly Index Files: Static archives created at quarter-end containing the complete, final quarterly data
  • 🔧 Weekly Rebuilds: Every Saturday morning, all full and quarterly index files are rebuilt to incorporate post-acceptance corrections and amendments
  • ⚡ Real-time RSS: Live feed updated immediately as filings are accepted by the SEC

📖 Data Currency Best Practices:

  • For current events: Use RSS workflow for immediate access to breaking filings
  • For recent activity: Use Daily workflow for systematic monitoring of the last 1-90 days
  • For historical research: Use Full Index workflow for comprehensive quarterly archives
  • For completeness: Wait until Saturday morning rebuild for the most accurate quarterly data

💡 Pro Tip: Each workflow documentation contains 20+ practical examples, from basic usage to advanced enterprise patterns. Start with the Workflow Documentation Hub for complete coverage!

📚 Full Index Workflow

Perfect for comprehensive historical analysis and bulk data collection

# First, explore what's available for Apple without downloading
uv run python -m py_sec_edgar workflows full-index --tickers AAPL --no-download

# When ready, download all Apple filings from quarterly archives
uv run python -m py_sec_edgar workflows full-index --tickers AAPL --download

# Process the latest quarterly data (2025Q3) with extraction
uv run python -m py_sec_edgar workflows full-index --quarter 2025Q3 --download --extract

# Investment research: Explore tech giants first, then download
uv run python -m py_sec_edgar workflows full-index \
    --tickers AAPL --tickers MSFT --tickers GOOGL --tickers AMZN --tickers META \
    --forms "10-K" \
    --no-download  # Remove this flag when ready to download

# Academic research: Fortune 500 analysis with latest data
uv run python -m py_sec_edgar workflows full-index \
    --ticker-file examples/fortune500.csv \
    --forms "10-K" "10-Q" \
    --quarter 2025Q3 \
    --download --extract
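
The --quarter value uses a YYYYQn form, while the quarterly archives are stored in year/QTRn folders (see the local structure under How SEC Data is Organized). A hypothetical helper, not part of the package, that converts one form into the other could look like this:

```python
import re


def quarter_to_index_path(quarter: str) -> str:
    """Convert '2025Q3' into 'Archives/edgar/full-index/2025/QTR3'."""
    match = re.fullmatch(r"(\d{4})Q([1-4])", quarter.strip().upper())
    if match is None:
        raise ValueError(f"expected YYYYQn with n in 1-4, got {quarter!r}")
    year, number = match.groups()
    # Relative path; join it with your data directory or with the SEC host
    return f"Archives/edgar/full-index/{year}/QTR{number}"


if __name__ == "__main__":
    print(quarter_to_index_path("2025Q3"))
```

Rejecting anything outside Q1 to Q4 early avoids a failed request for a folder that cannot exist.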

📅 Daily Workflow

Ideal for monitoring recent activity and staying current

# Explore yesterday's filings without downloading first
uv run python -m py_sec_edgar workflows daily --days-back 1 --no-download

# When ready, download yesterday's filings
uv run python -m py_sec_edgar workflows daily --days-back 1 --download

# Weekly portfolio monitoring (explore first)
uv run python -m py_sec_edgar workflows daily \
    --ticker-file examples/portfolio.csv \
    --days-back 7 \
    --forms "8-K" "4" \
    --no-download  # Remove this flag when ready to download

# Monitor Apple's specific earnings announcement (August 1, 2024)
uv run python -m py_sec_edgar workflows daily \
    --tickers AAPL \
    --start-date 2024-08-01 \
    --end-date 2024-08-01 \
    --forms "8-K" \
    --download --extract  # Direct download since we know what we want
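
To launch the daily workflow from a scheduler or another Python script, the command line can be assembled as a list. The helper below is a hypothetical sketch that uses only flags shown in the examples above; it builds the argument list and leaves execution to the caller.

```python
import shlex


def build_daily_command(
    tickers: list[str],
    forms: list[str],
    days_back: int,
    download: bool = False,
) -> list[str]:
    """Assemble an argv list for the daily workflow."""
    argv = ["uv", "run", "python", "-m", "py_sec_edgar", "workflows", "daily"]
    for ticker in tickers:
        argv += ["--tickers", ticker]   # one --tickers flag per symbol
    argv += ["--days-back", str(days_back)]
    argv += ["--forms", *forms]         # form types follow a single --forms flag
    # Default to exploration mode, mirroring the "explore first" advice
    argv.append("--download" if download else "--no-download")
    return argv


if __name__ == "__main__":
    cmd = build_daily_command(["AAPL", "MSFT"], ["8-K"], days_back=7)
    print(shlex.join(cmd))  # hand cmd to subprocess.run(cmd, check=True) to execute
```

Passing a list rather than a single string avoids shell quoting problems with form names such as "DEF 14A".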

📊 Monthly Workflow

Specialized for XBRL structured financial data

# Explore what structured financial data is available (6 months)
uv run python -m py_sec_edgar workflows monthly --months-back 6 --no-download

# Download structured financial data when ready
uv run python -m py_sec_edgar workflows monthly --months-back 6 --download

# Focus on specific companies with extraction
uv run python -m py_sec_edgar workflows monthly \
    --tickers AAPL --tickers MSFT \
    --months-back 12 \
    --download --extract

📡 RSS Workflow

Real-time monitoring and live feed processing

# Explore latest filings in real-time (safe exploration)
uv run python -m py_sec_edgar workflows rss --show-entries --count 20 --list-only

# Monitor specific companies (list mode first)
uv run python -m py_sec_edgar workflows rss \
    --query-ticker AAPL \
    --count 10 \
    --show-entries --list-only

# When ready to process/download, remove --list-only flag
uv run python -m py_sec_edgar workflows rss \
    --query-ticker AAPL \
    --count 10 \
    --download

# Save RSS data for analysis (no download, just save feed data)
uv run python -m py_sec_edgar workflows rss \
    --save-file rss_filings.json \
    --count 100 \
    --list-only

💼 Comprehensive Examples

🏢 Investment Research Workflow

Scenario: You're analyzing potential investments in the renewable energy sector.

# Step 1: Use the provided renewable energy ticker list
# File: examples/renewable_energy.csv (already created)

# Step 2: Explore historical annual reports first (no download)
uv run python -m py_sec_edgar workflows full-index \
    --ticker-file examples/renewable_energy.csv \
    --forms "10-K" \
    --no-download

# Step 3: When ready, get historical annual reports with extraction
uv run python -m py_sec_edgar workflows full-index \
    --ticker-file examples/renewable_energy.csv \
    --forms "10-K" \
    --download --extract

# Step 4: Process specific quarterly filings (2025Q3)
uv run python -m py_sec_edgar workflows full-index \
    --ticker-file examples/renewable_energy.csv \
    --quarter 2025Q3 \
    --forms "10-Q" \
    --download --extract

# Step 5: Monitor recent Tesla activity (last 30 days for better data coverage)
uv run python -m py_sec_edgar workflows daily \
    --tickers TSLA \
    --days-back 30 \
    --forms "8-K" \
    --no-download  # Explore first, then add --download when ready

# Step 6: Set up real-time monitoring (exploration mode)
uv run python -m py_sec_edgar workflows rss \
    --query-ticker TSLA \
    --count 10 \
    --show-entries --list-only

Result: Complete dataset with historical context, recent activity, and real-time monitoring setup.

📊 Academic Research Pipeline

Scenario: Studying CEO compensation trends across S&P 500 companies.

# Step 1: Explore proxy statements availability (no download)
uv run python -m py_sec_edgar workflows full-index \
    --ticker-file examples/sp500_tickers.csv \
    --forms "DEF 14A" \
    --no-download

# Step 2: Download proxy statements when ready
uv run python -m py_sec_edgar workflows full-index \
    --ticker-file examples/sp500_tickers.csv \
    --forms "DEF 14A" \
    --download --extract

# Step 3: Process latest quarterly data (2025Q3) for comprehensive analysis
uv run python -m py_sec_edgar workflows full-index \
    --ticker-file examples/sp500_tickers.csv \
    --quarter 2025Q3 \
    --forms "10-Q" "DEF 14A" \
    --download --extract

# Step 4: Get recent quarterly filings (last 60 days for good data coverage)
uv run python -m py_sec_edgar workflows daily \
    --ticker-file examples/sp500_tickers.csv \
    --days-back 60 \
    --forms "10-Q" \
    --no-download  # Explore first

# Step 5: Extract structured financial data for analysis
uv run python -m py_sec_edgar workflows monthly \
    --ticker-file examples/sp500_tickers.csv \
    --months-back 12 \
    --download --extract

🔍 Compliance Monitoring System

Scenario: Monitor insider trading and ownership changes for your portfolio.

# Step 1: Explore recent insider trading (Form 4) - last 7 days
uv run python -m py_sec_edgar workflows daily \
    --ticker-file examples/portfolio.csv \
    --days-back 7 \
    --forms "4" \
    --no-download  # Explore first

# Step 2: When ready, download recent insider trading data
uv run python -m py_sec_edgar workflows daily \
    --ticker-file examples/portfolio.csv \
    --days-back 14 \
    --forms "4" \
    --download --extract

# Step 3: Track large ownership changes (last 30 days)
uv run python -m py_sec_edgar workflows daily \
    --ticker-file examples/portfolio.csv \
    --days-back 30 \
    --forms "SC 13G" "SC 13D" \
    --download --extract

# Step 4: Set up real-time insider trading alerts (exploration mode)
uv run python -m py_sec_edgar workflows rss \
    --query-form "4" \
    --count 25 \
    --show-entries --list-only

📰 News & Events Monitoring

Scenario: Stay ahead of market-moving news with automated 8-K monitoring.

# Monitor Apple's recent activity (last 30 days for good coverage)
uv run python -m py_sec_edgar workflows daily \
    --tickers AAPL \
    --days-back 30 \
    --forms "8-K" \
    --no-download  # Explore first, then add --download

# Monitor Tesla's recent activity (last 30 days)
uv run python -m py_sec_edgar workflows daily \
    --tickers TSLA \
    --days-back 30 \
    --forms "8-K" \
    --no-download  # Explore first

# When ready to download Apple's recent annual reports
uv run python -m py_sec_edgar workflows daily \
    --tickers AAPL \
    --days-back 90 \
    --forms "10-K" \
    --download --extract

# Set up comprehensive current events monitoring (exploration mode)
uv run python -m py_sec_edgar workflows rss \
    --query-form "8-K" \
    --show-entries \
    --count 25 \
    --list-only

# Advanced: Monitor multiple companies for 8-K filings
uv run python -m py_sec_edgar workflows daily \
    --ticker-file examples/portfolio.csv \
    --days-back 14 \
    --forms "8-K" \
    --no-download

🗂️ Understanding SEC Filings

py-sec-edgar makes it easy to work with SEC filings, but understanding what each form contains helps you choose the right data:

📋 Essential Form Types

| Form | Description | Frequency | Key Content |
| --- | --- | --- | --- |
| 10-K | Annual Report | Yearly | Complete business overview, audited financials, risk factors |
| 10-Q | Quarterly Report | Quarterly | Unaudited quarterly financials, updates since last 10-K |
| 8-K | Current Events | As needed | Material corporate events, breaking news |
| DEF 14A | Proxy Statement | Annually | Executive compensation, board elections, shareholder proposals |
| 4 | Insider Trading | Within 2 business days of the transaction | Executive stock transactions |
| SC 13G/D | Beneficial Ownership | When threshold crossed | Large shareholder positions (>5%) |

🏗️ How SEC Data is Organized

SEC Website Structure:

https://www.sec.gov/Archives/edgar/data/[CIK]/[AccessionNumber]/[Filename]

py-sec-edgar Local Structure:

sec_data/
├── Archives/edgar/
│   ├── full-index/           # Downloaded quarterly archives
│   │   ├── 2024/QTR1/
│   │   ├── 2024/QTR2/
│   │   └── 2025/QTR3/        # Latest quarterly data
│   └── data/                 # Extracted filing contents
│       └── [CIK]/            # Company folders (e.g., 320193 for Apple)
│           └── [Filing]/     # Individual filing folders
│               ├── main_document.htm
│               ├── exhibits/
│               └── Financial_Report.xlsx

🔍 Understanding Company Identifiers

Central Index Key (CIK): Unique numerical identifier assigned by SEC

  • Example: Apple Inc. = 320193
  • Permanent, never recycled
  • Used in all SEC filings and URLs

Ticker Symbol: Stock exchange trading symbol

  • Example: AAPL for Apple Inc.
  • Can change due to rebranding, mergers
  • py-sec-edgar handles ticker-to-CIK mapping automatically

📊 Filing Statistics (Historical Context)

| Form Type | Total Filings | Average per Year | Primary Use Case |
| --- | --- | --- | --- |
| Form 4 | 6,420,154 | ~800,000 | Insider trading monitoring |
| 8-K | 1,473,193 | ~180,000 | Breaking news and events |
| 10-Q | 552,059 | ~70,000 | Quarterly earnings analysis |
| 10-K | 180,787 | ~22,000 | Annual comprehensive analysis |
| 13F-HR | 224,996 | ~28,000 | Institutional holdings tracking |

🎛️ Advanced Configuration

⚙️ Environment Configuration

py-sec-edgar works out of the box with sensible defaults from .env.example. For custom configuration, create a .env file:

# Copy the example file and customize
cp .env.example .env

Key environment variables:

# SEC Data Directory (cross-platform)
SEC_DATA_DIR=./sec_data

# User Agent (Required by SEC)
USER_AGENT="YourCompany [email protected]"

# Request Settings (Conservative defaults)
REQUEST_DELAY=5.5
MAX_RETRIES=3

# Logging Configuration
LOG_LEVEL=WARNING
DEBUG=false

💡 Important: You must update USER_AGENT with your contact information for production use, as required by SEC guidelines.
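
For scripts that sit alongside py-sec-edgar, the same variables can be read with the standard library. The sketch below is an assumption about how one might do this, not the package's own settings loader; `Settings` and `load_settings` are hypothetical names, and the fallback values mirror the example file above.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    data_dir: Path
    user_agent: str
    request_delay: float
    max_retries: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the example values."""
    user_agent = env.get("USER_AGENT", "").strip()
    if not user_agent:
        # The SEC asks for contact details, so refuse to continue without them
        raise ValueError("USER_AGENT must be set to your contact information")
    return Settings(
        data_dir=Path(env.get("SEC_DATA_DIR", "./sec_data")),
        user_agent=user_agent,
        request_delay=float(env.get("REQUEST_DELAY", "5.5")),
        max_retries=int(env.get("MAX_RETRIES", "3")),
    )
```

Accepting any mapping instead of reading os.environ directly makes the function easy to exercise with a plain dictionary.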

📁 Ticker File Format

Create CSV files with ticker symbols (or use the provided examples):

# examples/portfolio.csv
TICKER
AAPL
MSFT
GOOGL
AMZN
TSLA

# Or simple format
AAPL
MSFT
GOOGL

🔧 Programmatic Usage

py-sec-edgar provides two Python APIs for programmatic usage:

Simple API (SEC class) - Quick Downloads

import asyncio

from py_sec_edgar import SEC, Forms


async def main() -> None:
    async with SEC(data_dir="./sec_data") as sec:
        # Download filings for specific companies
        result = await sec.download(
            tickers=["AAPL", "MSFT"],
            forms=[Forms.FORM_10K],
            days=365,
        )
        print(f"Downloaded {result.file_count} files")

        # List downloaded filings
        filings = await sec.list_filings(ticker="AAPL")


# "async with" and "await" are only valid inside a coroutine
asyncio.run(main())

Advanced API (SECFeed class) - Full FeedSpine Integration

import asyncio

from py_sec_edgar import SECFeed, SECFeedConfig
from py_sec_edgar.reporters import RichProgressReporter

# Placeholder: substitute the URL of a real filing document
filing_url = "https://www.sec.gov/Archives/edgar/data/[CIK]/[AccessionNumber]/[Filename]"


async def main() -> None:
    # SECFeed provides: DuckDB storage, blob storage, search, caching
    async with SECFeed(
        tickers=["AAPL", "MSFT"],
        forms=["10-K", "10-Q"],
        days=365,
        enable_search=True,
        enable_cache=True,
    ) as feed:
        # Collect with progress reporting
        await feed.collect(progress=RichProgressReporter())

        # Typed access to filings
        async for filing in feed.filings(form_type="10-K"):
            print(f"{filing.content.company_name}: {filing.content.accession_number}")

        # Full-text search
        results = await feed.search("revenue growth", limit=10)

        # Download documents (cached in blob storage)
        doc = await feed.download_document(filing_url)


# "async with" and "await" are only valid inside a coroutine
asyncio.run(main())

Workflow Functions

from py_sec_edgar.workflows import (
    run_full_index_workflow,
    run_daily_workflow,
    run_monthly_workflow,
    run_rss_workflow
)

# Run full index workflow
run_full_index_workflow(
    tickers=["AAPL", "MSFT"],
    forms=["10-K", "10-Q"],
    extract=True
)

# Monitor recent filings
run_daily_workflow(
    tickers=["AAPL", "MSFT"],
    days_back=7,
    forms=["8-K"],
    extract=True
)

🔨 Development & Contribution

🏗️ Development Setup

# Clone repository
git clone https://github.com/ryansmccoy/py-sec-edgar.git
cd py-sec-edgar

# Setup development environment
uv sync --extra dev

# Install pre-commit hooks
uv run pre-commit install

# Run tests
uv run pytest

# Run linting
uv run ruff check
uv run ruff format

# Type checking
uv run mypy src/

🧪 Testing

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=py_sec_edgar --cov-report=html

# Run specific test categories
uv run pytest -m "not slow"          # Skip slow tests
uv run pytest -m integration         # Integration tests only
uv run pytest tests/test_filing.py   # Specific test file

📊 Performance Testing

# Test with small dataset
uv run python -m py_sec_edgar workflows full-index \
    --tickers AAPL \
    --forms "10-K" \
    --no-extract

# Benchmark larger operations
time uv run python -m py_sec_edgar workflows daily \
    --tickers AAPL --tickers MSFT --tickers GOOGL \
    --days-back 30 \
    --extract

📖 Documentation

📚 Comprehensive Guides

🔗 Quick References


🚨 Important Notes

⚖️ SEC Compliance

  • User Agent Required: The SEC requires a proper User-Agent header with your contact information
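  • Rate Limiting: py-sec-edgar spaces out its requests, with the delay controlled by REQUEST_DELAY (5.5s in the example configuration above; 0.1s corresponds to the SEC's ceiling of 10 requests per second)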
  • Fair Use: Please be respectful of SEC resources and don't overwhelm their servers
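
If you write your own requests against sec.gov, the same rules apply. The following sketch uses only the standard library; `Throttle` and `fetch` are hypothetical names rather than py-sec-edgar internals, and the delay is whatever you pass in (for example the REQUEST_DELAY from your configuration).

```python
import time
import urllib.request
from typing import Callable


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: float | None = None  # time of the previous call, if any

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)   # pause only for the time still owed
                now = self._clock()
        self._last = now


def fetch(url: str, user_agent: str, throttle: Throttle) -> bytes:
    """Download one URL with an identifying User-Agent, after throttling."""
    throttle.wait()
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

Sharing one Throttle instance across all calls is what keeps the overall request rate bounded; the injectable clock and sleep functions exist so the timing logic can be checked without waiting.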

💾 Storage Requirements

  • Full Index Processing: Can generate several GB of data per quarter
  • Extracted Content: Individual filings can be 10-100MB when extracted
  • Recommendation: Start with specific tickers/forms, then scale up

🔒 Data Privacy

  • Public Data Only: All data accessed is publicly available SEC filings
  • No Personal Info: py-sec-edgar only accesses corporate disclosure documents
  • Compliance Ready: Suitable for professional and academic use

🤝 Contributing

We welcome contributions! Here's how to get started:

🐛 Reporting Issues

  1. Check existing issues
  2. Create detailed bug reports with examples
  3. Include system information and error logs

🔧 Contributing Code

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes with tests
  4. Run the test suite: uv run pytest
  5. Submit a pull request

📝 Contributing Documentation

  • Improve existing documentation
  • Add new examples and use cases
  • Create tutorials for specific workflows

📄 License

py-sec-edgar is dual-licensed:

  • Personal Use: MIT License (free for personal, educational, and research use)
  • Commercial Use: GNU AGPLv3 License (free with copyleft requirements)
  • Business Licensing: Contact [email protected] for commercial licensing options

See LICENSE for full details.


📞 Support & Community

💬 Getting Help

📧 Professional Support

  • Business Inquiries: [email protected]
  • Commercial Licensing: Available for enterprise use
  • Custom Development: Professional services available

🙏 Acknowledgments

  • SEC EDGAR System: For providing free access to corporate filing data
  • Python Community: For the excellent libraries that make this project possible
  • Contributors: Everyone who has contributed code, documentation, and feedback
  • Users: The community that drives continuous improvement

⭐ Star this repository if py-sec-edgar helps your financial analysis! ⭐

Built with ❤️ for the financial analysis and research community

🏠 Homepage • 📖 Docs • 🐛 Issues • 💬 Discussions
