This project adopts a modular, object-oriented, Pythonic approach to Machine Learning Operations (MLOps), harnessing a diverse range of tools and services. It is organized into components and pipelines (sketched after the list below):
- Config YAML File: Contains configuration parameters for the project.
- Components: Individual modules or functions that perform specific tasks.
- Pipelines: Sequences of components orchestrated to execute machine learning workflows.
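To make this layout concrete, here is a minimal sketch of how a YAML config can drive components chained into a pipeline. All names and config keys in it are illustrative rather than the project's actual code, and it assumes PyYAML is installed:

```python
# Minimal sketch of the config -> components -> pipeline flow.
# All identifiers and config keys here are illustrative, not the
# project's actual code. Assumes PyYAML is installed.
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

import yaml

# Inlined here for a self-contained example; normally read from the config YAML file.
CONFIG_YAML = """
data_path: data/raw.csv
model:
  n_estimators: 100
"""

@dataclass
class Component:
    """One pipeline step: a name plus a callable that transforms shared state."""
    name: str
    run: Callable[[Dict[str, Any]], Dict[str, Any]]

class Pipeline:
    """Runs components in sequence, threading a state dict between them."""
    def __init__(self, components: List[Component]):
        self.components = components

    def execute(self, state: Dict[str, Any]) -> Dict[str, Any]:
        for component in self.components:
            print(f"Running component: {component.name}")
            state = component.run(state)
        return state

if __name__ == "__main__":
    config = yaml.safe_load(CONFIG_YAML)
    ingestion = Component("data_ingestion", lambda s: {**s, "data": "raw rows"})
    training = Component("model_training", lambda s: {**s, "model": "fitted"})
    Pipeline([ingestion, training]).execute({"config": config})
```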
The key tools and services:
- MLflow: Used for model registry and management (a logging sketch follows this list).
- DVC (Data Version Control): Used for pipeline orchestration and data versioning (see the data-access sketch after this list).
- Dagshub: Hosted repository for collaborative development and version control.
- AWS Free Tier: Used for cloud computing on a Free Tier instance with 1 GB of RAM.
- Google Drive: Used for storing project artifacts.
- AWS EC2: Hosting platform for deploying services and applications.
- AWS ECR (Elastic Container Registry): Used for storing and managing Docker container images.
- Docker: Containerization technology for packaging applications and dependencies.
- Streamlit: Web application framework used for interactive data visualization (a minimal app sketch follows).
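Since MLflow handles the model registry, here is a hedged sketch of what run logging and registration can look like. The tracking URI, parameter, and model name are placeholders rather than this project's values, and it assumes scikit-learn is installed (Dagshub exposes an MLflow tracking server per repository):

```python
# Hedged sketch of logging and registering a model with MLflow.
# The tracking URI and model name are placeholders, not this project's values.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

mlflow.set_tracking_uri("https://dagshub.com/<user>/<repo>.mlflow")  # placeholder URI

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

with mlflow.start_run():
    mlflow.log_param("max_iter", 200)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # registered_model_name adds the model to the MLflow Model Registry.
    mlflow.sklearn.log_model(model, "model", registered_model_name="demo-model")
```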
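DVC declares pipeline stages in `dvc.yaml` and reproduces them with `dvc repro`; versioned data can also be pulled programmatically. A sketch using `dvc.api.read`, where the repository URL and file path are placeholders:

```python
# Hedged sketch of reading DVC-versioned data; the repo URL and tracked
# file path are placeholders for this project's Dagshub repository.
import dvc.api

data = dvc.api.read(
    "data/raw.csv",                            # hypothetical tracked file
    repo="https://dagshub.com/<user>/<repo>",  # placeholder repo URL
    rev="main",                                # git revision to read from
)
print(data[:200])
```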
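For the visualization layer, a minimal Streamlit app sketch; the title and widgets are illustrative, not the project's actual interface. It would be served with `streamlit run app.py`:

```python
# Minimal Streamlit app sketch; the title and widgets are illustrative.
import pandas as pd
import streamlit as st

st.title("Project Dashboard")  # hypothetical title

uploaded = st.file_uploader("Upload a CSV of input data", type="csv")
if uploaded is not None:
    df = pd.read_csv(uploaded)
    st.write("Preview of the uploaded data:")
    st.dataframe(df.head())
    st.line_chart(df.select_dtypes("number"))
```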
Continuous Integration and Continuous Deployment (CI/CD) pipelines are implemented with GitHub Actions, automating the build, test, and deployment steps that deliver the project to AWS services.
The complete project code and resources can be accessed on Dagshub:
The project utilizes AWS Free Tier services for cloud computing: EC2 for hosting and ECR for Docker image management. Keep resource usage within the Free Tier limits to avoid charges.
All project artifacts, including data, models, and logs, are stored on Google Drive.
The project's data can be accessed from Google Drive using the following link:
Feel free to explore the project repository for more details; contributions are welcome.