[Python 3.10+] Modernize codebase with Python 3.10+ idioms and features #1927

@rahul-tuli

Now that we've dropped Python 3.9 support (#1910), we can modernize our codebase to use Python 3.10+ features for better readability and maintainability. This is a great opportunity for community contributions!

Background

With Python 3.10 as the minimum version, several features that make code more readable and Pythonic are available everywhere in the codebase:

  • PEP 604 (new in 3.10): Union types using | operator instead of Union[]
  • PEP 585 (new in 3.9, pairs naturally with PEP 604): Built-in collection types for generics (e.g., list instead of List)
  • PEP 634 (new in 3.10): Structural pattern matching with match/case

Our codebase has 500+ instances across 70+ files that can benefit from these modern idioms.

Modernization Categories

1. Type Hints with | Operator

Replace Union[] and Optional[] with the more readable | syntax.

Example:

# Before (Python 3.9 style)
from typing import Any, Dict, List, Optional, Union

def oneshot(
    model: Union[str, PreTrainedModel],
    recipe: Optional[Union[str, List[str]]] = None,
    dataset: Optional[Dict[str, Any]] = None,
) -> PreTrainedModel:
    ...

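# After (Python 3.10+ style)
# Postponed evaluation is needed here: PreTrainedModel is imported only for
# type checkers, so the annotations below must not be evaluated at runtime.
from __future__ import annotations

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # Seen by type checkers only; never executed at runtime
    from transformers import PreTrainedModel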

def oneshot(
    model: str | PreTrainedModel,
    recipe: str | list[str] | None = None,
    dataset: dict[str, Any] | None = None,
) -> PreTrainedModel:
    ...

Benefits:

  • More concise and readable
  • Aligns with modern Python style guides
  • Fewer typing imports (Union and Optional are no longer needed)

Files affected: ~70 files, ~500 instances


2. Built-in Generic Types

Use built-in types (list, dict, tuple, set) instead of importing from typing.

Example:

# Before
from typing import List, Dict, Tuple, Optional

def process_batch(
    items: List[str],
    config: Optional[Dict[str, int]] = None,
) -> Tuple[List[str], int]:
    ...

# After
def process_batch(
    items: list[str],
    config: dict[str, int] | None = None,
) -> tuple[list[str], int]:
    ...

Benefits:

  • Cleaner imports
  • Uses the built-in types directly instead of the typing aliases that PEP 585 deprecates
  • Better IDE support

3. Structural Pattern Matching

Replace if/elif isinstance chains with match/case for type-based dispatch.

Example 1: Type dispatch

# Before
def log_value(log_tag: str, log_value: Any, epoch: float):
    if isinstance(log_value, dict):
        logger_manager.log_scalars(tag=log_tag, values=log_value, step=epoch)
    elif isinstance(log_value, (int, float)):
        logger_manager.log_scalar(tag=log_tag, value=log_value, step=epoch)
    else:
        logger_manager.log_string(tag=log_tag, string=log_value, step=epoch)

# After
def log_value(log_tag: str, log_value: Any, epoch: float):
    match log_value:
        case dict():
            logger_manager.log_scalars(tag=log_tag, values=log_value, step=epoch)
        case int() | float():
            logger_manager.log_scalar(tag=log_tag, value=log_value, step=epoch)
        case _:
            logger_manager.log_string(tag=log_tag, string=log_value, step=epoch)

Example 2: Recursive type handling

# Before
def onload_value(value: Any, device: torch.device) -> Any:
    if isinstance(value, torch.Tensor):
        return value.to(device=device)

    if isinstance(value, list):
        return [onload_value(v, device) for v in value]

    if isinstance(value, tuple):
        return tuple(onload_value(v, device) for v in value)

    if isinstance(value, dict):
        return {k: onload_value(v, device) for k, v in value.items()}

    return value

# After
def onload_value(value: Any, device: torch.device) -> Any:
    match value:
        case torch.Tensor():
            return value.to(device=device)
        case list():
            return [onload_value(v, device) for v in value]
        case tuple():
            return tuple(onload_value(v, device) for v in value)
        case dict():
            return {k: onload_value(v, device) for k, v in value.items()}
        case _:
            return value

Example 3: Configuration handling

# Before
if splits is None:
    splits = {"all": None}
elif isinstance(splits, str):
    splits = {get_split_name(splits): splits}
elif isinstance(splits, list):
    splits = {get_split_name(s): s for s in splits}

# After
match splits:
    case None:
        splits = {"all": None}
    case str():
        splits = {get_split_name(splits): splits}
    case list():
        splits = {get_split_name(s): s for s in splits}

Benefits:

  • More explicit type handling
  • Better readability
  • Easier to extend with new types
  • Type checkers can provide better analysis

Files affected: ~11 files with isinstance chains


How to Contribute

This is a great opportunity for first-time contributors! Each file can be updated independently.

Getting Started

  1. Pick a scope: Choose a single file or small module to modernize
  2. Check existing PRs: Look at example PRs to see the pattern (links will be added)
  3. Make changes: Update type hints and/or add pattern matching
  4. Test thoroughly: Run make quality and relevant tests
  5. Submit PR: Reference this issue in your PR
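
When picking a scope in step 1, it can help to list where a file still imports the legacy typing names. The sketch below uses the standard ast module for that; find_legacy_typing_imports and the LEGACY_NAMES set are hypothetical and not part of the repository's tooling, and the set is an assumption about which names are worth flagging.

```python
import ast

# Assumed set of typing names that have a built-in or | replacement.
LEGACY_NAMES = {"Union", "Optional", "List", "Dict", "Tuple", "Set", "FrozenSet", "Type"}


def find_legacy_typing_imports(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, name) pairs for legacy names imported from typing."""
    hits: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module == "typing":
            for alias in node.names:
                if alias.name in LEGACY_NAMES:
                    hits.append((node.lineno, alias.name))
    return sorted(hits)
```

For a real file, read its contents first, for example with pathlib.Path(path).read_text(), and pass the text in. This only finds from typing import ... statements; it does not catch usages such as typing.List[...] through a module import.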

Example Contribution Sizes

Small PR (Good First Issue) 🟢

  • 1-2 files
  • 10-20 type hint changes
  • ~30 minutes of work

Medium PR 🟡

  • 1 module (3-5 files)
  • Add pattern matching to 1-2 functions
  • ~1-2 hours of work

Large PR 🔴

  • Complete module modernization
  • Multiple pattern matching refactors
  • ~3-4 hours of work

Suggested Files to Start With

Easy (Type Hints Only):

  • src/llmcompressor/args/*.py - Dataclass arguments
  • src/llmcompressor/core/helpers.py - Helper functions
  • src/llmcompressor/recipe/metadata.py - Recipe metadata

Medium (Type Hints + Pattern Matching):

  • src/llmcompressor/core/helpers.py - _log_model_loggable_items function
  • src/llmcompressor/datasets/utils.py - Dataset splits handling
  • src/llmcompressor/modifiers/*/base.py - Individual modifier files

Advanced (Complex Pattern Matching):

  • src/llmcompressor/pipelines/cache.py - Cache system with recursive types
  • src/llmcompressor/modifiers/quantization/gptq/gptq_quantize.py - GPTQ logic
  • src/llmcompressor/modifiers/pruning/sparsegpt/sgpt_base.py - SparseGPT logic

Requirements for PRs

✅ Must have:

  • All make quality checks pass (ruff formatting and linting)
  • Relevant tests pass (pytest tests/{module} -v)
  • No functional changes (type hints/style only)
  • Clean commit messages
  • Reference this issue (e.g., "Part of #XXX")

✅ Nice to have:

  • Updated docstrings if they reference old type syntax
  • Multiple files in same module (for consistency)
  • Comments explaining complex pattern matches

Testing Guidelines

# Code quality (required)
make quality

# Run tests for your changed module
pytest tests/llmcompressor/core -v                    # For core/* changes
pytest tests/llmcompressor/modifiers -v               # For modifiers/* changes
pytest tests/llmcompressor/transformers -v            # For transformers/* changes

# Quick smoke test (recommended)
pytest tests -m smoke

# Full test suite (for core/entrypoints changes)
make test

Progress Tracking

We'll update this section as PRs are merged. Track overall progress here:

Overall Progress

  • Type hints: 0/513 instances modernized (0%)
  • Pattern matching: 0/11 files modernized (0%)

By Module

  • core/ (80 instances across 5 files)
  • modifiers/ (81 instances across 15 files)
  • transformers/ (42 instances across 8 files)
  • entrypoints/ (33 instances across 3 files)
  • metrics/ (137 instances in logger.py)
  • args/ (27 instances across 4 files)
  • Other modules...

Resources

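Python Enhancement Proposals:

  • PEP 604: Allow writing union types as X | Y
  • PEP 585: Type Hinting Generics In Standard Collections
  • PEP 634: Structural Pattern Matching: Specification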

Guides:

Example PRs

We'll create 2-3 example PRs to demonstrate the patterns:

  • Example 1: Type hints in a core module (simple)
  • Example 2: Pattern matching in a helper function (medium)
  • Example 3: Comprehensive modernization of a modifier (advanced)

Links will be added here once created.

Related Issues

Questions?

Feel free to ask questions in this issue! We're happy to help guide contributions.

Common questions:

  • "Which file should I start with?" → Pick any file from the "Easy" list above
  • "Can I mix type hints and pattern matching?" → Yes, but keep PRs focused on one module
  • "How do I handle forward references?" → Use TYPE_CHECKING blocks (see examples above)
  • "What about breaking changes?" → None - these are syntax-only changes
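
On the forward-reference question: a name imported only under TYPE_CHECKING does not exist at runtime, so an annotation that uses it has to stay unevaluated. One option is to quote the annotation, as in the sketch below; putting from __future__ import annotations at the top of the module is the other common option. Fraction is only a lightweight stand-in for a heavy import such as PreTrainedModel, and describe is a hypothetical function.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Stand-in for an expensive import; seen by type checkers, skipped at runtime.
    from fractions import Fraction


def describe(value: "int | Fraction") -> str:
    """The quoted annotation is stored as a string and not evaluated at definition time."""
    return f"value={value}"
```

The whole union is quoted rather than just the forward-referenced name, because an expression like int | "Fraction" would try to apply | between a type and a str when the annotation is evaluated.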

Acknowledgments

Contributors who help modernize the codebase will be acknowledged in:

  • PR reviews and merges
  • Release notes
  • This issue (we'll maintain a contributors list)

Thank you for helping make LLM Compressor more modern and maintainable! 🚀
