Skip to content

Replace Bare Exception Handling with Specific Exception Types #7

Description

@Huneybaked

The codebase contains 25+ instances of overly broad except Exception: blocks that catch all exceptions indiscriminately. This makes debugging difficult and can mask serious bugs.

Examples

Example 1: Image Web Task (image/web.py)

try:
    dataset = load_dataset(dataset_id)
except Exception:  # ❌ TOO BROAD
    logger.error(f"Failed to load dataset {dataset_id}")
    return None

Problems:

  • Network errors, auth errors, and malformed data all handled the same way
  • Swallows keyboard interrupts and system exits
  • No actionable error information logged

Example 2: Document Parsing

try:
    parsed_doc = parse_pdf(file_path)
except Exception as e:  # ❌ TOO BROAD
    logger.warning(f"Parse failed: {e}")
    pass  # ❌ Silent failure

Impact

  • Debugging Nightmare - "Something failed" doesn't help
  • Silent Failures - Critical errors hidden
  • Incorrect Recovery - Can't distinguish recoverable vs. fatal errors
  • Masks Bugs - Programming errors treated like expected failures

Affected Files

Based on codebase analysis:

  • sdgsystem/tasks/image/web.py (multiple instances)
  • sdgsystem/documents/parse.py
  • sdgsystem/documents/load.py
  • sdgsystem/huggingface/download.py
  • sdgsystem/parallel.py
  • Others across the codebase

Recommended Fix Pattern

Pattern 1: Specific Exception Types

from requests.exceptions import RequestException, Timeout
from datasets.exceptions import DatasetNotFoundError

try:
    dataset = load_dataset(dataset_id)
except DatasetNotFoundError:
    logger.error(f"Dataset not found: {dataset_id}")
    return None
except Timeout:
    logger.error(f"Timeout loading {dataset_id} - retrying...")
    # Implement retry logic
except RequestException as e:
    logger.error(f"Network error: {e}", exc_info=True)
    raise

Pattern 2: Custom Exceptions

# Define in exceptions.py
class DocumentParseError(Exception):
    """Raised when document parsing fails."""
    pass

class InvalidDatasetError(Exception):
    """Raised when dataset format is invalid."""
    pass

# Use in code
try:
    doc = parse_pdf(path)
except PyMuPDFError as e:
    raise DocumentParseError(f"Failed to parse {path}: {e}") from e

Pattern 3: Fail Fast on Unexpected Errors

try:
    critical_operation()
except KnownRecoverableError:
    handle_gracefully()
except Exception as e:
    # Log full traceback for unexpected errors
    logger.exception(f"Unexpected error in critical_operation: {e}")
    raise  # ✅ Don't swallow unknown errors

Suggested Exception Hierarchy

# sdgsystem/exceptions.py
class SynDataError(Exception):
    """Base exception for all SynData errors."""
    pass

class ConfigurationError(SynDataError):
    """Invalid configuration."""
    pass

class DataGenerationError(SynDataError):
    """Error during data generation."""
    pass

class ModelClientError(SynDataError):
    """LLM/VLM client error."""
    pass

class DocumentProcessingError(SynDataError):
    """Document parsing/chunking error."""
    pass

class DatasetError(SynDataError):
    """Dataset loading/validation error."""
    pass

Implementation Plan

  1. Create sdgsystem/exceptions.py with custom exception hierarchy
  2. Audit all except Exception: occurrences
  3. Replace with specific exception types
  4. Add proper logging with exc_info=True for tracebacks
  5. Document exception handling in CONTRIBUTING.md

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions