📊 IBM Data Science Capstone

IBM Data Science Professional Certificate Capstone Project

English

🎯 Overview

IBM Data Science Capstone is a production-grade Python application that showcases modern software engineering practices, including clean architecture, comprehensive testing, containerized deployment, and CI/CD readiness.

The codebase comprises 142 lines of source code organized across 5 modules, following industry best practices for maintainability, scalability, and code quality.

✨ Key Features

🔄 Data Pipeline: Scalable ETL with parallel processing
✅ Data Validation: Schema validation and quality checks
📊 Monitoring: Pipeline health metrics and alerting
🔧 Configurability: YAML/JSON-based pipeline configuration
🏗️ Object-Oriented: 2 core classes with clean architecture
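
The configuration and validation features listed above could look roughly like the following. This is a minimal sketch using only the standard library, not the project's actual code: the JSON keys (`required_columns`, `max_missing_ratio`), the `ValidationReport` class, and the `validate_rows` function are all hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass

# Hypothetical JSON configuration; the key names are illustrative only.
CONFIG_JSON = """
{
  "required_columns": {"age": "int", "income": "float", "segment": "str"},
  "max_missing_ratio": 0.25
}
"""

# Maps the type names used in the config to Python types.
TYPE_MAP = {"int": int, "float": float, "str": str}


@dataclass
class ValidationReport:
    """Collects schema errors and the share of missing values."""
    errors: list
    missing_ratio: float

    @property
    def ok(self):
        return not self.errors


def validate_rows(rows, config):
    """Check each row against the configured schema and missing-value limit."""
    errors = []
    missing = 0
    total = 0
    schema = config["required_columns"]
    for index, row in enumerate(rows):
        for column, type_name in schema.items():
            total += 1
            value = row.get(column)
            if value is None:
                missing += 1  # counted as missing, not as a type error
                continue
            if not isinstance(value, TYPE_MAP[type_name]):
                errors.append(f"row {index}: {column} should be {type_name}")
    missing_ratio = missing / total if total else 0.0
    if missing_ratio > config["max_missing_ratio"]:
        errors.append(f"missing ratio {missing_ratio:.2f} exceeds limit")
    return ValidationReport(errors=errors, missing_ratio=missing_ratio)


config = json.loads(CONFIG_JSON)
rows = [
    {"age": 34, "income": 52000.0, "segment": "a"},
    {"age": "41", "income": 61000.0, "segment": "b"},  # wrong type for age
    {"age": 29, "income": None, "segment": "a"},       # missing income
]
report = validate_rows(rows, config)
print(report.ok, report.errors)
```

Keeping the schema in configuration rather than in code means the same check can be reused for a different dataset by editing the JSON only.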

🏗️ Architecture

graph LR
    subgraph Input["📥 Input"]
        A[Raw Data]
        B[Feature Config]
    end
    
    subgraph Pipeline["🔬 ML Pipeline"]
        C[Preprocessing]
        D[Feature Engineering]
        E[Model Training]
        F[Evaluation]
    end
    
    subgraph Output["📤 Output"]
        G[Trained Models]
        H[Metrics & Reports]
        I[Predictions]
    end
    
    A --> C --> D --> E --> F
    B --> D
    F --> G
    F --> H
    G --> I
    
    style Input fill:#e1f5fe
    style Pipeline fill:#f3e5f5
    style Output fill:#e8f5e9
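
The flow in the diagram above can be sketched as a chain of plain functions. This is an illustrative sketch only, assuming a single numeric feature and a least-squares line as the model; the function names, the `feature_config` keys, and the sample data are hypothetical and do not come from the project's source.

```python
from statistics import mean


def preprocess(rows):
    """Preprocessing: drop rows that contain missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def engineer_features(rows, feature_config):
    """Feature engineering: pick the configured feature and target columns."""
    x = [float(r[feature_config["feature"]]) for r in rows]
    y = [float(r[feature_config["target"]]) for r in rows]
    return x, y


def train(x, y):
    """Model training: fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return {"slope": slope, "intercept": y_bar - slope * x_bar}


def predict(model, x):
    """Predictions: apply the fitted line to new inputs."""
    return [model["slope"] * xi + model["intercept"] for xi in x]


def evaluate(model, x, y):
    """Evaluation: mean squared error of the fitted model."""
    predictions = predict(model, x)
    return mean((p - yi) ** 2 for p, yi in zip(predictions, y))


# Hypothetical raw data and feature config (inputs A and B in the diagram).
raw_data = [
    {"hours": 1, "score": 3},
    {"hours": 2, "score": 5},
    {"hours": 3, "score": None},  # removed during preprocessing
    {"hours": 3, "score": 7},
    {"hours": 4, "score": 9},
]
feature_config = {"feature": "hours", "target": "score"}

clean = preprocess(raw_data)
x, y = engineer_features(clean, feature_config)
model = train(x, y)
mse = evaluate(model, x, y)
new_predictions = predict(model, [5.0])
print(model, mse, new_predictions)
```

Each function corresponds to one node in the diagram, so a stage can be swapped (for example, replacing `train` with a scikit-learn estimator) without touching the others.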

🚀 Quick Start

Prerequisites

Python 3.12+
pip (Python package manager)

Installation

# Clone the repository
git clone https://github.com/galafis/ibm-data-science-capstone.git
cd ibm-data-science-capstone

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Running

# Run the application
python src/main_platform.py

🧪 Testing

# Run all tests
pytest

# Run with coverage report
pytest --cov --cov-report=html

# Run specific test module
pytest tests/test_platform.py -v

# Run with detailed output
pytest -v --tb=short
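
For readers new to pytest, a test module that the commands above would collect might look like the sketch below. The helper `drop_incomplete` is a hypothetical stand-in defined inline so the example is self-contained; it is not a function from this repository.

```python
def drop_incomplete(records):
    """Hypothetical helper: keep only records with no missing values."""
    return [r for r in records if None not in r.values()]


# pytest collects functions whose names start with "test_" and
# treats a failing bare assert as a test failure.
def test_drop_incomplete_removes_rows_with_missing_values():
    records = [{"a": 1, "b": 2}, {"a": None, "b": 3}]
    assert drop_incomplete(records) == [{"a": 1, "b": 2}]


def test_drop_incomplete_keeps_empty_input_empty():
    assert drop_incomplete([]) == []
```

In a real test module the helper would be imported from the package under `src/` instead of being defined next to the tests.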

📁 Project Structure

ibm-data-science-capstone/
├── docs/          # Documentation
│   ├── api_documentation.md
│   └── user_guide.md
├── src/          # Source code
│   ├── data_science_pipeline.py
│   └── main_platform.py
├── tests/         # Test suite
│   ├── __init__.py
│   ├── performance_test.py
│   └── test_platform.py
├── LICENSE
├── README.md
└── requirements.txt

🛠️ Tech Stack

Technology	Description	Role
Python	Core Language	Primary
NumPy	Numerical computing	Framework
Pandas	Data manipulation library	Framework
Plotly	Interactive visualization	Framework
scikit-learn	Machine learning library	Framework
Streamlit	Data app framework	Framework

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

1. Fork the project
2. Create your feature branch (git checkout -b feature/AmazingFeature)
3. Commit your changes (git commit -m 'Add some AmazingFeature')
4. Push to the branch (git push origin feature/AmazingFeature)
5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👤 Author

Gabriel Demetrios Lafis

GitHub: @galafis
LinkedIn: Gabriel Demetrios Lafis

Português

🎯 Overview

IBM Data Science Capstone is a professional-grade Python application that demonstrates modern software engineering practices, including clean architecture, comprehensive testing, containerized deployment, and CI/CD readiness.

The codebase comprises 142 lines of source code organized into 5 modules, following industry best practices for maintainability, scalability, and code quality.

✨ Key Features

🔄 Data Pipeline: Scalable ETL with parallel processing
✅ Data Validation: Schema validation and quality checks
📊 Monitoring: Pipeline health metrics and alerting
🔧 Configurability: YAML/JSON-based pipeline configuration
🏗️ Object-Oriented: 2 core classes with clean architecture

🏗️ Architecture

graph LR
    subgraph Input["📥 Input"]
        A[Raw Data]
        B[Feature Config]
    end
    
    subgraph Pipeline["🔬 ML Pipeline"]
        C[Preprocessing]
        D[Feature Engineering]
        E[Model Training]
        F[Evaluation]
    end
    
    subgraph Output["📤 Output"]
        G[Trained Models]
        H[Metrics & Reports]
        I[Predictions]
    end
    
    A --> C --> D --> E --> F
    B --> D
    F --> G
    F --> H
    G --> I
    
    style Input fill:#e1f5fe
    style Pipeline fill:#f3e5f5
    style Output fill:#e8f5e9

🚀 Quick Start

Prerequisites

Python 3.12+
pip (Python package manager)

Installation

# Clone the repository
git clone https://github.com/galafis/ibm-data-science-capstone.git
cd ibm-data-science-capstone

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Running

# Run the application
python src/main_platform.py

🧪 Testing

# Run all tests
pytest

# Run with coverage report
pytest --cov --cov-report=html

# Run specific test module
pytest tests/test_platform.py -v

# Run with detailed output
pytest -v --tb=short

📁 Project Structure

ibm-data-science-capstone/
├── docs/          # Documentation
│   ├── api_documentation.md
│   └── user_guide.md
├── src/          # Source code
│   ├── data_science_pipeline.py
│   └── main_platform.py
├── tests/         # Test suite
│   ├── __init__.py
│   ├── performance_test.py
│   └── test_platform.py
├── LICENSE
├── README.md
└── requirements.txt

🛠️ Tech Stack

Technology	Description	Role
Python	Core Language	Primary
NumPy	Numerical computing	Framework
Pandas	Data manipulation library	Framework
Plotly	Interactive visualization	Framework
scikit-learn	Machine learning library	Framework
Streamlit	Data app framework	Framework

🤝 Contributing

Contributions are welcome! Feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👤 Author

Gabriel Demetrios Lafis

GitHub: @galafis
LinkedIn: Gabriel Demetrios Lafis
