The ETL Pipeline Web App is a no-code, interactive tool for extracting data from CSV files, JSON files, and APIs, transforming it with data cleaning, parsing, outlier removal, scaling, and feature engineering, and loading it into a SQLite or PostgreSQL database. It is designed for data analysts, engineers, and business users who need an efficient ETL solution without writing code.
- CSV files
- JSON files
- APIs (via URL)
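All three sources above can be read into a single pandas DataFrame. The `extract` helper below is an illustrative sketch, not the app's actual code (the in-memory CSV is just a stand-in for a real file or URL):

```python
import io

import pandas as pd
import requests

def extract(source, kind: str) -> pd.DataFrame:
    """Read a CSV file, JSON file, or API endpoint into a DataFrame."""
    if kind == "csv":
        return pd.read_csv(source)
    if kind == "json":
        return pd.read_json(source)
    if kind == "api":
        # Fetch JSON records from a URL and tabulate them.
        resp = requests.get(source, timeout=10)
        resp.raise_for_status()
        return pd.DataFrame(resp.json())
    raise ValueError(f"unsupported source kind: {kind}")

# Example with an in-memory CSV, so no network access is needed:
csv_data = io.StringIO("id,name\n1,alice\n2,bob")
df = extract(csv_data, "csv")
```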
- Clean column names and remove duplicates
- Parse date columns and fix data types
- Handle missing values and outliers
- Encode categorical variables
- Scale and normalize features
- Apply feature engineering
- Store transformed data in SQLite (default) or PostgreSQL
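A few of the transformation and loading steps above can be sketched in pandas. The `transform` helper, the sample data, and the `orders` table name are illustrative assumptions, not the app's internals:

```python
import sqlite3

import pandas as pd

def transform(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Clean column names: strip whitespace, lowercase, snake_case.
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
    # Remove duplicate rows.
    df = df.drop_duplicates()
    # Fill missing numeric values with the column median.
    for col in df.select_dtypes("number"):
        df[col] = df[col].fillna(df[col].median())
    return df

raw = pd.DataFrame({"Order ID": [1, 1, 2], "Amount ": [10.0, 10.0, None]})
clean = transform(raw)

# Load the cleaned data into SQLite (the default storage backend).
conn = sqlite3.connect(":memory:")
clean.to_sql("orders", conn, index=False, if_exists="replace")
```

PostgreSQL loading works the same way via `DataFrame.to_sql`, just with a PostgreSQL connection instead of SQLite.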
- Frontend: Streamlit
- Backend: Pandas, NumPy, Requests
- Storage: SQLite / PostgreSQL
- Machine Learning Support: Scikit-learn
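The scaling and normalization steps rely on scikit-learn preprocessors. A minimal sketch of the two common options on a toy feature column:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0]])

# Standardize to zero mean and unit variance.
scaled = StandardScaler().fit_transform(X)

# Rescale to the [0, 1] range.
normalized = MinMaxScaler().fit_transform(X)
```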
- Clone the repository:

  ```shell
  git clone https://github.com/your-repo/etl-pipeline-web-app.git
  cd etl-pipeline-web-app
  ```

- Install dependencies:

  ```shell
  pip install -r requirements.txt
  ```

- Run the Streamlit app:

  ```shell
  streamlit run etl_web_app.py
  ```
- Cloud storage integration (AWS S3, Google Drive)
- Automated scheduling with APScheduler
- Data visualization dashboard
This tool provides a streamlined ETL workflow without requiring coding expertise, making data transformation and integration accessible to all users.