Description
Now that we've dropped Python 3.9 support (#1910), we can modernize our codebase to use Python 3.10+ features for better readability and maintainability. This is a great opportunity for community contributions!
Background
With Python 3.10 as the minimum version, several features that make code more readable and Pythonic are available:
- PEP 604 (Python 3.10): Union types using the | operator instead of Union[]
- PEP 585 (Python 3.9): Built-in collection types for generics (e.g., list instead of List)
- PEP 634 (Python 3.10): Structural pattern matching with match/case
Our codebase has 500+ instances across 70+ files that can benefit from these modern idioms.
Modernization Categories
1. Type Hints with | Operator
Replace Union[] and Optional[] with the more readable | syntax.
Example:
# Before (Python 3.9 style)
from typing import Any, Dict, List, Optional, Union

from transformers import PreTrainedModel

def oneshot(
    model: Union[str, PreTrainedModel],
    recipe: Optional[Union[str, List[str]]] = None,
    dataset: Optional[Dict[str, Any]] = None,
) -> PreTrainedModel:
    ...

# After (Python 3.10+ style)
# Deferred evaluation keeps annotations as strings, so names imported only
# under TYPE_CHECKING do not raise NameError when the function is defined.
from __future__ import annotations

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from transformers import PreTrainedModel

def oneshot(
    model: str | PreTrainedModel,
    recipe: str | list[str] | None = None,
    dataset: dict[str, Any] | None = None,
) -> PreTrainedModel:
    ...
Benefits:
- More concise and readable
- Aligns with modern Python style guides
- Fewer names to import from typing
Files affected: ~70 files, ~500 instances
2. Built-in Generic Types
Use built-in types (list, dict, tuple, set) instead of importing from typing.
Example:
# Before
from typing import List, Dict, Tuple, Optional

def process_batch(
    items: List[str],
    config: Optional[Dict[str, int]] = None,
) -> Tuple[List[str], int]:
    ...

# After
def process_batch(
    items: list[str],
    config: dict[str, int] | None = None,
) -> tuple[list[str], int]:
    ...
Benefits:
- Cleaner imports
- Standard library usage
- Better IDE support
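The sketch below illustrates that the built-in generics carry the same runtime information as their typing aliases, using typing.get_origin and typing.get_args. The helper count_lengths is a hypothetical example, not a function from the codebase.

```python
from typing import Dict, List, get_args, get_origin

# Both spellings resolve to the same runtime origin and type arguments.
assert get_origin(list[str]) is list
assert get_origin(List[str]) is list
assert get_args(dict[str, int]) == (str, int)
assert get_args(Dict[str, int]) == (str, int)


def count_lengths(items: list[str]) -> dict[str, int]:
    """Hypothetical helper: map each string to its length."""
    return {item: len(item) for item in items}
```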
3. Structural Pattern Matching
Replace if/elif isinstance chains with match/case for type-based dispatch.
Example 1: Type dispatch
# Before
def log_value(log_tag: str, log_value: Any, epoch: float):
    if isinstance(log_value, dict):
        logger_manager.log_scalars(tag=log_tag, values=log_value, step=epoch)
    elif isinstance(log_value, (int, float)):
        logger_manager.log_scalar(tag=log_tag, value=log_value, step=epoch)
    else:
        logger_manager.log_string(tag=log_tag, string=log_value, step=epoch)

# After
def log_value(log_tag: str, log_value: Any, epoch: float):
    match log_value:
        case dict():
            logger_manager.log_scalars(tag=log_tag, values=log_value, step=epoch)
        case int() | float():
            logger_manager.log_scalar(tag=log_tag, value=log_value, step=epoch)
        case _:
            logger_manager.log_string(tag=log_tag, string=log_value, step=epoch)
Example 2: Recursive type handling
# Before
def onload_value(value: Any, device: torch.device) -> Any:
    if isinstance(value, torch.Tensor):
        return value.to(device=device)
    if isinstance(value, list):
        return [onload_value(v, device) for v in value]
    if isinstance(value, tuple):
        return tuple(onload_value(v, device) for v in value)
    if isinstance(value, dict):
        return {k: onload_value(v, device) for k, v in value.items()}
    return value

# After
def onload_value(value: Any, device: torch.device) -> Any:
    match value:
        case torch.Tensor():
            return value.to(device=device)
        case list():
            return [onload_value(v, device) for v in value]
        case tuple():
            return tuple(onload_value(v, device) for v in value)
        case dict():
            return {k: onload_value(v, device) for k, v in value.items()}
        case _:
            return value
Example 3: Configuration handling
# Before
if splits is None:
    splits = {"all": None}
elif isinstance(splits, str):
    splits = {get_split_name(splits): splits}
elif isinstance(splits, list):
    splits = {get_split_name(s): s for s in splits}

# After
match splits:
    case None:
        splits = {"all": None}
    case str():
        splits = {get_split_name(splits): splits}
    case list():
        splits = {get_split_name(s): s for s in splits}
Benefits:
- More explicit type handling
- Better readability
- Easier to extend with new types
- Type checkers can narrow the subject's type inside each case branch
Files affected: ~11 files with isinstance chains
How to Contribute
This is a great opportunity for first-time contributors! Each file can be updated independently.
Getting Started
- Pick a scope: Choose a single file or small module to modernize
- Check existing PRs: Look at example PRs to see the pattern (links will be added)
- Make changes: Update type hints and/or add pattern matching
- Test thoroughly: Run make quality and relevant tests
- Submit PR: Reference this issue in your PR
Example Contribution Sizes
Small PR (Good First Issue) 🟢
- 1-2 files
- 10-20 type hint changes
- ~30 minutes of work
Medium PR 🟡
- 1 module (3-5 files)
- Add pattern matching to 1-2 functions
- ~1-2 hours of work
Large PR 🔴
- Complete module modernization
- Multiple pattern matching refactors
- ~3-4 hours of work
Suggested Files to Start With
Easy (Type Hints Only):
- src/llmcompressor/args/*.py - Dataclass arguments
- src/llmcompressor/core/helpers.py - Helper functions
- src/llmcompressor/recipe/metadata.py - Recipe metadata
Medium (Type Hints + Pattern Matching):
- src/llmcompressor/core/helpers.py - _log_model_loggable_items function
- src/llmcompressor/datasets/utils.py - Dataset splits handling
- src/llmcompressor/modifiers/*/base.py - Individual modifier files
Advanced (Complex Pattern Matching):
- src/llmcompressor/pipelines/cache.py - Cache system with recursive types
- src/llmcompressor/modifiers/quantization/gptq/gptq_quantize.py - GPTQ logic
- src/llmcompressor/modifiers/pruning/sparsegpt/sgpt_base.py - SparseGPT logic
Requirements for PRs
Must have:
- All make quality checks pass (ruff formatting and linting)
- Relevant tests pass (pytest tests/{module} -v)
- No functional changes (type hints/style only)
- Clean commit messages
- Reference this issue (e.g., "Part of #XXX")
Nice to have:
- Updated docstrings if they reference old type syntax
- Multiple files in same module (for consistency)
- Comments explaining complex pattern matches
Testing Guidelines
# Code quality (required)
make quality
# Run tests for your changed module
pytest tests/llmcompressor/core -v # For core/* changes
pytest tests/llmcompressor/modifiers -v # For modifiers/* changes
pytest tests/llmcompressor/transformers -v # For transformers/* changes
# Quick smoke test (recommended)
pytest tests -m smoke
# Full test suite (for core/entrypoints changes)
make test
Progress Tracking
We'll update this section as PRs are merged. Track overall progress here:
Overall Progress
- Type hints: 0/513 instances modernized (0%)
- Pattern matching: 0/11 files modernized (0%)
By Module
- core/ (80 instances across 5 files)
- modifiers/ (81 instances across 15 files)
- transformers/ (42 instances across 8 files)
- entrypoints/ (33 instances across 3 files)
- metrics/ (137 instances in logger.py)
- args/ (27 instances across 4 files)
- Other modules...
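To find remaining candidates in a file, a small script along these lines could help. It is a rough sketch based on the standard ast module, not necessarily how the counts above were produced, and the LEGACY_NAMES set and the function name are assumptions for illustration.

```python
import ast

# Names from typing with a Python 3.10+ replacement (assumed list; extend as needed).
LEGACY_NAMES = {"Union", "Optional", "List", "Dict", "Tuple", "Set"}


def count_legacy_hints(source: str) -> int:
    """Count subscripted uses such as Optional[...] or List[...] in source code."""
    count = 0
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Subscript):
            continue
        target = node.value
        # Handle both `List[...]` and `typing.List[...]`.
        if isinstance(target, ast.Name):
            name = target.id
        else:
            name = getattr(target, "attr", None)
        if name in LEGACY_NAMES:
            count += 1
    return count


SAMPLE = '''
from typing import Dict, List, Optional

def f(items: List[str], config: Optional[Dict[str, int]] = None) -> list[str]:
    return items
'''
print(count_legacy_hints(SAMPLE))
```

To scan a real file, its text can be read with pathlib.Path(...).read_text() and passed to the same function.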
Resources
Python Enhancement Proposals:
- PEP 604 - Union Types as X | Y
- PEP 634 - Structural Pattern Matching
- PEP 636 - Pattern Matching Tutorial
- PEP 585 - Type Hinting Generics
Example PRs
We'll create 2-3 example PRs to demonstrate the patterns:
- Example 1: Type hints in a core module (simple)
- Example 2: Pattern matching in a helper function (medium)
- Example 3: Comprehensive modernization of a modifier (advanced)
Links will be added here once created.
Related Issues
- #1910 - Drop Python 3.9 support (merged)
Questions?
Feel free to ask questions in this issue! We're happy to help guide contributions.
Common questions:
- "Which file should I start with?" → Pick any file from the "Easy" list above
- "Can I mix type hints and pattern matching?" → Yes, but keep PRs focused on one module
- "How do I handle forward references?" → Use TYPE_CHECKING blocks together with from __future__ import annotations or quoted annotations (see examples above)
- "What about breaking changes?" → None - these are syntax-only changes
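The forward-reference answer can be illustrated with a minimal sketch. Here decimal.Decimal stands in for a heavier import such as a model class, and to_text is a hypothetical function; the annotation is quoted so that it is never evaluated at runtime.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only type checkers see this import; it is skipped at runtime.
    from decimal import Decimal


def to_text(value: "str | Decimal") -> str:
    """Hypothetical helper: the quoted annotation stays a plain string."""
    return str(value)


# The annotation is stored as a string, so defining the function raises no NameError.
print(to_text.__annotations__["value"])
```

Placing from __future__ import annotations at the top of the module has the same effect for every annotation in the file, without the quotes.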
Acknowledgments
Contributors who help modernize the codebase will be acknowledged in:
- PR reviews and merges
- Release notes
- This issue (we'll maintain a contributors list)
Thank you for helping make LLM Compressor more modern and maintainable!