Title & Overview
Template: Data Augmentation & Fairness: An Intermediate, End-to-End Analysis Tutorial
Overview (≤2 sentences): Learners will implement text augmentation techniques (back-translation, paraphrasing, noise injection) and fairness checks (bias evaluation across data slices). It is intermediate because it combines augmentation experiments with structured fairness/error analysis, reproducibility, and defensible reporting.
Purpose
The value-add is showing learners how to improve robustness with augmentation while also evaluating fairness impacts. This emphasizes reproducibility, slice-based fairness checks, and documenting trade-offs between performance gains and bias risks.
Prerequisites
- Skills: Python, Git, pandas, ML basics.
- NLP: embeddings, augmentation, fairness concepts (bias, group performance).
- Tooling: pandas, scikit-learn, Hugging Face Transformers + Datasets, nlpaug or TextAttack, MLflow, FastAPI.
Setup Instructions
- Environment: Conda/Poetry (Python 3.11), deterministic seeds.
- Install: pandas, scikit-learn, Hugging Face Transformers + Datasets, nlpaug or TextAttack, MLflow, FastAPI.
- Datasets (loading and seeding sketch after this list):
  - Small: SST-2 (binary sentiment).
  - Medium: Civil Comments or Jigsaw Toxicity (for bias/fairness slices).
- Repo layout:

```
tutorials/t13-augmentation-fairness/
├─ notebooks/
├─ src/
│  ├─ augment.py
│  ├─ fairness.py
│  ├─ eval.py
│  └─ config.yaml
├─ data/README.md
├─ reports/
└─ tests/
```
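A minimal setup sketch, assuming the Hugging Face dataset IDs `glue`/`sst2` and `civil_comments`; the helper name `set_seed` and the 50k subset size are illustrative choices, not part of the template:

```python
import random

import numpy as np
from datasets import load_dataset

SEED = 42  # single source of truth for all randomness


def set_seed(seed: int = SEED) -> None:
    """Seed the Python and NumPy RNGs; add torch.manual_seed() when fine-tuning DistilBERT."""
    random.seed(seed)
    np.random.seed(seed)


set_seed()

# SST-2 ships as a GLUE task; Civil Comments provides text for fairness slices.
sst2 = load_dataset("glue", "sst2")
civil = load_dataset("civil_comments")

# Reproducible split: shuffle with a fixed seed, then slice deterministically.
civil_small = civil["train"].shuffle(seed=SEED).select(range(50_000))
```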
Core Concepts
- Augmentation strategies: back-translation, synonym/paraphrase replacement, noise injection.
- Fairness evaluation: slice metrics across sensitive attributes (gender, identity terms).
- Trade-offs: augmentation improves robustness but may amplify bias.
- Evaluation: macro-F1, calibration, per-slice fairness metrics.
- Reproducibility: log augmentation seeds, configs, and dataset versions.
Step-by-Step Walkthrough
- Data intake & splits: load SST-2 and Civil Comments/Jigsaw; create reproducible splits (see the setup sketch above).
- Augmentation baselines (nlpaug sketch after this list):
  - Synonym replacement with WordNet.
  - Back-translation (EN→FR→EN).
  - Noise injection (typos, character swaps).
- Model baselines: Logistic Regression (TF-IDF) and a DistilBERT fine-tune (pipeline sketch after this list).
- Fairness evaluation: compute slice metrics by sensitive-attribute group (e.g., male/female, identity terms); see the slice-metrics sketch after this list.
- Error analysis: identify which groups see metric gains vs. losses under augmentation.
- Reporting: metrics tables, augmentation impact by slice, and fairness analysis in reports/t13-augmentation-fairness.md.
- (Optional) Serve: FastAPI endpoint with a toggle for augmentation during training/inference; log fairness slice metrics (serving sketch after this list).
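A minimal augmentation sketch with nlpaug; the MarianMT checkpoints for EN→FR→EN and the example sentence are assumptions, and back-translation is constructed but kept out of the loop because it is compute heavy:

```python
import nlpaug.augmenter.char as nac
import nlpaug.augmenter.word as naw

# 1) Synonym replacement via WordNet (requires the NLTK wordnet corpus).
syn_aug = naw.SynonymAug(aug_src="wordnet")

# 2) Back-translation EN→FR→EN with MarianMT checkpoints (compute heavy).
bt_aug = naw.BackTranslationAug(
    from_model_name="Helsinki-NLP/opus-mt-en-fr",
    to_model_name="Helsinki-NLP/opus-mt-fr-en",
)

# 3) Character-level noise: keyboard typos and character swaps.
typo_aug = nac.KeyboardAug()
swap_aug = nac.RandomCharAug(action="swap")

text = "the film is a delightful surprise"
for name, aug in [("synonym", syn_aug), ("typo", typo_aug), ("swap", swap_aug)]:
    # augment() returns a list of strings in recent nlpaug versions
    print(name, aug.augment(text))
```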
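A sketch of the classical baseline, assuming `train_texts`/`train_labels` and `test_texts`/`test_labels` come out of the data-intake step; the DistilBERT fine-tune (e.g., via the Transformers Trainer) is the heavier counterpart and is omitted here:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Classical baseline: TF-IDF features feeding a linear classifier.
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)
baseline.fit(train_texts, train_labels)

preds = baseline.predict(test_texts)
print("macro-F1:", f1_score(test_labels, preds, average="macro"))
```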
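One way to sketch slice metrics with pandas; the column names `y_true`, `y_pred`, and the group column are assumptions about how you assemble the evaluation frame (the Jigsaw Unintended Bias release ships per-comment identity scores that can be thresholded into such group flags):

```python
import pandas as pd
from sklearn.metrics import f1_score


def slice_metrics(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Per-group macro-F1 and support; expects y_true, y_pred, and group_col columns."""
    rows = []
    for group, sub in df.groupby(group_col):
        rows.append({
            group_col: group,
            "n": len(sub),
            "macro_f1": f1_score(sub["y_true"], sub["y_pred"], average="macro"),
        })
    return pd.DataFrame(rows).sort_values("macro_f1")


# Example: flag comments whose `female` identity score crosses 0.5, then
# compare macro-F1 for the flagged vs. unflagged slices.
```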
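A minimal serving sketch for the optional step; `baseline` and `syn_aug` refer to the illustrative objects above, and the request schema is an assumption:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="t13-augmentation-fairness")


class PredictRequest(BaseModel):
    text: str
    augment: bool = False  # toggle for test-time augmentation


@app.post("/predict")
def predict(req: PredictRequest) -> dict:
    text = req.text
    if req.augment:
        text = syn_aug.augment(text)[0]  # augment() returns a list
    label = int(baseline.predict([text])[0])
    return {"label": label, "augmented": req.augment}
```

Run it with `uvicorn` pointed at wherever you place the module, and log slice metrics from each evaluation run alongside the served model version.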
Hands-On Exercises
- Ablations: original vs. augmented training; synonym vs. back-translation vs. noise (runner sketch after this list).
- Robustness: evaluate the macro-F1 drop on a domain-shifted test set.
- Slice analysis: fairness metrics across sensitive groups before/after augmentation.
- Stretch: combine augmentation + debiasing (adversarial training, reweighting).
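A sketch of the ablation protocol: identical model family and held-out test set, different training corpora. `make_model()` and the per-variant arrays are hypothetical names for what the earlier steps produce:

```python
from sklearn.metrics import f1_score

variants = {
    "original": (train_texts, train_labels),
    "synonym": (syn_texts, syn_labels),
    "backtrans": (bt_texts, bt_labels),
    "noise": (noise_texts, noise_labels),
}

results = {}
for name, (X, y) in variants.items():
    model = make_model()  # fresh TF-IDF + LR pipeline per run, same hyperparams
    model.fit(X, y)
    results[name] = f1_score(test_labels, model.predict(test_texts), average="macro")

print(results)  # macro-F1 per training variant on the same test set
```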
Common Pitfalls & Troubleshooting
- Over-augmentation: noisy data may harm clean test performance.
- Bias amplification: careless augmentation replicates group biases.
- Metrics misuse: reporting only aggregate F1 hides slice disparities.
- Reproducibility gaps: augmentation randomness must be seeded.
- OOM: back-translation is compute heavy — subset or batch.
Best Practices
- Always log augmentation configs (percent augmented, method, seeds); MLflow sketch after this list.
- Compare augmentation strategies side by side.
- Evaluate both aggregate and slice metrics.
- Unit tests: verify augmentation reproducibility on a fixed corpus (pytest sketch after this list).
- Guardrails: enforce maximum augmentation ratio to avoid overfitting noise.
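A minimal MLflow sketch for config logging; the run name, config keys, and metric values are placeholders:

```python
import mlflow

aug_config = {
    "method": "back_translation",  # placeholder values
    "aug_ratio": 0.3,              # fraction of training examples augmented
    "seed": 42,
}

with mlflow.start_run(run_name="t13-backtrans-0.3"):
    mlflow.log_params(aug_config)
    mlflow.log_metric("macro_f1", 0.87)               # placeholder
    mlflow.log_metric("macro_f1_female_slice", 0.81)  # placeholder
```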
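A reproducibility test sketch; `augment_corpus` is a hypothetical helper in src/augment.py, and the test assumes the augmenter draws from Python's and NumPy's global RNGs (if your augmenter manages its own RNG, seed that instead):

```python
import random

import numpy as np

from src.augment import augment_corpus  # hypothetical project helper

FIXED_CORPUS = ["a fixed sentence", "another fixed sentence"]


def _seed_everything(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)


def test_augmentation_is_reproducible():
    _seed_everything()
    first = augment_corpus(FIXED_CORPUS, method="synonym")
    _seed_everything()
    second = augment_corpus(FIXED_CORPUS, method="synonym")
    assert first == second
```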
Reflection & Discussion Prompts
- When does augmentation actually help vs hurt?
- How do fairness metrics change when training with noisy or synthetic data?
- What are ethical implications of bias amplification through augmentation?
Next Steps / Advanced Extensions
- Explore paraphrase generation with T5/BART.
- Apply counterfactual data augmentation for fairness (e.g., swapping gender terms); see the sketch after this list.
- Lightweight monitoring: slice-level fairness metrics over time.
- Domain adaptation with augmentation + fairness auditing.
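A deliberately naive counterfactual-augmentation sketch; the term list is illustrative, and a real implementation needs a curated lexicon plus care with grammar (e.g., possessive "her" vs. objective "her"):

```python
import re

# Paired gender terms; both directions are listed so one pass swaps either way.
GENDER_PAIRS = {
    "he": "she", "she": "he",
    "him": "her", "her": "him",  # naive: ignores possessive "her" -> "his"
    "man": "woman", "woman": "man",
}

_PATTERN = re.compile(r"\b(" + "|".join(GENDER_PAIRS) + r")\b", re.IGNORECASE)


def swap_gender_terms(text: str) -> str:
    def repl(match: re.Match) -> str:
        word = match.group(0)
        swapped = GENDER_PAIRS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    return _PATTERN.sub(repl, text)


print(swap_gender_terms("He thanked her"))  # -> "She thanked him"
```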
Glossary / Key Terms
Augmentation, back-translation, paraphrasing, noise injection, fairness, slice metrics, bias amplification.
Additional Resources
- [TextAttack](https://textattack.readthedocs.io/)
- [nlpaug](https://github.com/makcedward/nlpaug)
- [Hugging Face Datasets](https://huggingface.co/datasets)
- [MLflow](https://mlflow.org/)
- [FastAPI](https://fastapi.tiangolo.com/)
Contributors
Author(s): TBD
Reviewer(s): TBD
Maintainer(s): TBD
Date updated: 2025-09-20
Dataset licenses: SST-2 (GLUE), Civil Comments/Jigsaw (CC).
Issues Referenced
Epic: HfLA Text Analysis Tutorials (T0–T14).
This sub-issue: T13: Data Augmentation & Fairness.