📊 IBM Data Science Capstone

IBM Data Science Professional Certificate Capstone Project

🎯 Overview

IBM Data Science Capstone is a Python project built for the IBM Data Science Professional Certificate capstone. It implements a machine learning pipeline that covers preprocessing, feature engineering, model training, and evaluation. The project aims to demonstrate software engineering practices such as clean architecture, automated testing, containerized deployment, and CI/CD readiness.

The codebase comprises 142 lines of source code organized across 5 modules.

✨ Key Features

  • 🔄 Data Pipeline: Scalable ETL with parallel processing
  • ✅ Data Validation: Schema validation and quality checks
  • 📊 Monitoring: Pipeline health metrics and alerting
  • 🔧 Configurability: YAML/JSON-based pipeline configuration
  • 🏗️ Object-Oriented: 2 core classes with clean architecture
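The validation and configurability features above can be illustrated with a minimal sketch. It assumes a JSON configuration that lists required columns, their expected dtypes, and a null-value threshold. The config keys and the `validate` function are hypothetical and are not taken from this repository's code.

```python
import json

import pandas as pd

# Hypothetical pipeline configuration; the keys are illustrative only.
CONFIG_JSON = """
{
  "required_columns": {"age": "int64", "income": "float64"},
  "max_null_fraction": 0.1
}
"""


def validate(df: pd.DataFrame, config: dict) -> list[str]:
    """Return a list of problems; an empty list means the frame passed."""
    problems = []
    for column, expected_dtype in config["required_columns"].items():
        # Schema check: the column must exist.
        if column not in df.columns:
            problems.append(f"missing column: {column}")
            continue
        # Schema check: the column must have the expected dtype.
        actual_dtype = str(df[column].dtype)
        if actual_dtype != expected_dtype:
            problems.append(f"{column}: expected {expected_dtype}, got {actual_dtype}")
        # Quality check: the share of missing values must stay under the limit.
        null_fraction = df[column].isna().mean()
        if null_fraction > config["max_null_fraction"]:
            problems.append(f"{column}: {null_fraction:.0%} nulls exceeds limit")
    return problems


config = json.loads(CONFIG_JSON)
good = pd.DataFrame({"age": [31, 45], "income": [52000.0, 61000.5]})
bad = pd.DataFrame({"age": [31.5, 45.0]})  # wrong dtype, and no income column

good_problems = validate(good, config)
bad_problems = validate(bad, config)
print(good_problems)
print(bad_problems)
```

Returning a list of problems rather than raising on the first failure lets a pipeline report every issue in one pass.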

🏗️ Architecture

graph LR
    subgraph Input["📥 Input"]
        A[Raw Data]
        B[Feature Config]
    end
    
    subgraph Pipeline["🔬 ML Pipeline"]
        C[Preprocessing]
        D[Feature Engineering]
        E[Model Training]
        F[Evaluation]
    end
    
    subgraph Output["📤 Output"]
        G[Trained Models]
        H[Metrics & Reports]
        I[Predictions]
    end
    
    A --> C --> D --> E --> F
    B --> D
    F --> G
    F --> H
    G --> I
    
    style Input fill:#e1f5fe
    style Pipeline fill:#f3e5f5
    style Output fill:#e8f5e9
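
One way to read the diagram is as a scikit-learn `Pipeline`. The following is a minimal sketch under stated assumptions: the data is synthetic, `FEATURE_CONFIG` is a hypothetical stand-in for the "Feature Config" input, and the chosen steps (scaling, polynomial features, logistic regression) are illustrative rather than the ones this project uses.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Input: synthetic stand-in for "Raw Data".
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Input: hypothetical "Feature Config" controlling feature engineering.
FEATURE_CONFIG = {"polynomial_degree": 2}

# Pipeline: preprocessing -> feature engineering -> model training.
pipeline = Pipeline(
    [
        ("preprocess", StandardScaler()),
        (
            "features",
            PolynomialFeatures(
                degree=FEATURE_CONFIG["polynomial_degree"], include_bias=False
            ),
        ),
        ("model", LogisticRegression(max_iter=1000)),
    ]
)
pipeline.fit(X_train, y_train)

# Output: predictions from the trained model and an evaluation metric.
predictions = pipeline.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"accuracy: {accuracy:.3f}")
```

Wrapping the steps in a single `Pipeline` object ensures that the scaler and feature transformer are fitted only on the training split and then reused unchanged at prediction time.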

🚀 Quick Start

Prerequisites

  • Python 3.12+
  • pip (Python package manager)

Installation

# Clone the repository
git clone https://github.com/galafis/ibm-data-science-capstone.git
cd ibm-data-science-capstone

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Running

# Run the application
python src/main_platform.py

🧪 Testing

# Run all tests
pytest

# Run with coverage report
pytest --cov --cov-report=html

# Run specific test module
pytest tests/test_platform.py -v

# Run with detailed output
pytest -v --tb=short
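
As an illustration of what these commands collect, here is a minimal sketch of a test module. The helper `fill_missing_with_mean` and its tests are hypothetical and do not come from this repository. By default pytest discovers functions named `test_*` in files named `test_*.py` or `*_test.py`.

```python
def fill_missing_with_mean(values):
    """Hypothetical helper: replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("cannot impute a column with no known values")
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def test_fill_missing_replaces_none_with_mean():
    # The mean of the known values 1.0 and 3.0 is 2.0.
    assert fill_missing_with_mean([1.0, None, 3.0]) == [1.0, 2.0, 3.0]


def test_fill_missing_rejects_empty_column():
    # A column with no known values cannot be imputed.
    try:
        fill_missing_with_mean([None, None])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

The tests use plain `assert` statements, so the module needs no imports beyond the code under test.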

📁 Project Structure

ibm-data-science-capstone/
├── docs/          # Documentation
│   ├── api_documentation.md
│   └── user_guide.md
├── src/          # Source code
│   ├── data_science_pipeline.py
│   └── main_platform.py
├── tests/         # Test suite
│   ├── __init__.py
│   ├── performance_test.py
│   └── test_platform.py
├── LICENSE
├── README.md
└── requirements.txt

🛠️ Tech Stack

| Technology | Description | Role |
|------------|-------------|------|
| Python | Core language | Primary |
| NumPy | Numerical computing | Framework |
| Pandas | Data manipulation library | Framework |
| Plotly | Interactive visualization | Framework |
| scikit-learn | Machine learning library | Framework |
| Streamlit | Data app framework | Framework |

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

  1. Fork the project
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👤 Author

Gabriel Demetrios Lafis

