Food Allergy Detection System (Flask + Machine Learning)
Overview
This project is a Flask-based machine learning web application that predicts potential food allergy categories based on structured ingredient attributes. It demonstrates the end-to-end deployment of a trained ML model into an interactive web interface, with a clear separation between data preprocessing, model training, and inference.
The current implementation focuses on controlled, dropdown-based inputs to ensure prediction consistency and robustness. The system is intentionally designed to support future extension to text-based ingredient analysis using NLP techniques.
Key Features
Web-based prediction interface built with Flask
Decision Tree classifier trained on categorical food attributes
Dynamic dropdown inputs populated directly from training encoders
End-to-end pipeline: preprocessing → training → model persistence → inference
Clean separation between current functionality and future roadmap
Technical Stack
Backend: Python, Flask
Machine Learning: scikit-learn (Decision Tree Classifier)
Model Persistence: joblib
Data Processing: NumPy, Pandas
Frontend: HTML (Jinja2 templating)
How the System Works
- Input Handling
Users select values for the following categorical features:
Food Product
Main Ingredient
Sweetener
Fat / Oil
Seasoning
Dropdown options are generated directly from the trained LabelEncoder classes, ensuring all inputs are valid and consistent with the model’s training data.
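As a rough illustration of the idea above, the sketch below derives dropdown choices from the `classes_` attribute of fitted `LabelEncoder` objects. It assumes `encoders.pkl` holds a dict mapping feature name to encoder; that layout, the helper name `dropdown_options`, and the sample values are assumptions made for the example, not details confirmed by the project.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical training values; the real ones come from the project's dataset.
training_columns = {
    "Food Product": ["Almond Cookies", "Chicken Soup", "Almond Cookies"],
    "Sweetener": ["Sugar", "None", "Sugar"],
}

# Assumed layout of encoders.pkl: {feature name: fitted LabelEncoder}.
encoders = {
    name: LabelEncoder().fit(values) for name, values in training_columns.items()
}


def dropdown_options(encoders):
    """Return {feature: [allowed values]} read from each encoder's classes_."""
    return {name: [str(c) for c in enc.classes_] for name, enc in encoders.items()}


options = dropdown_options(encoders)
```

Because `classes_` contains exactly the values seen during fitting, every option offered to the user can be transformed by the same encoder without raising an unseen-label error.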
- Prediction Pipeline
User selections are encoded using pre-trained encoders.
Encoded features are passed to the trained Decision Tree model.
The predicted class is decoded using the target encoder.
The allergy category is displayed on the web interface.
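The four steps above could look roughly like the following. This is a minimal sketch, not the project's actual code: the toy rows, the labels, and the function name `predict_allergy` are made up for illustration, and the encoders and model are fitted inline rather than loaded from the `.pkl` files.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["Food Product", "Main Ingredient", "Sweetener", "Fat / Oil", "Seasoning"]

# Toy data (hypothetical) standing in for the real training set.
rows = [
    ["Almond Cookies", "Almonds", "Sugar", "Butter", "Flour"],
    ["Chicken Soup", "Chicken", "None", "Olive oil", "Salt"],
]
labels = ["Tree nuts", "None"]

# One encoder per feature column, plus a separate encoder for the target.
encoders = {f: LabelEncoder().fit([r[i] for r in rows]) for i, f in enumerate(FEATURES)}
target_encoder = LabelEncoder().fit(labels)

X = np.column_stack(
    [encoders[f].transform([r[i] for r in rows]) for i, f in enumerate(FEATURES)]
)
y = target_encoder.transform(labels)
model = DecisionTreeClassifier(random_state=0).fit(X, y)


def predict_allergy(selection):
    """Encode one set of dropdown selections, predict, and decode the label."""
    encoded = [encoders[f].transform([selection[f]])[0] for f in FEATURES]
    predicted = model.predict([encoded])
    return str(target_encoder.inverse_transform(predicted)[0])


result = predict_allergy(dict(zip(FEATURES, rows[0])))
```

The feature order used at inference must match the column order used at training time, which is why the sketch iterates over a single `FEATURES` list in both places.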
Machine Learning Pipeline
Data Preparation
Raw food and ingredient datasets were cleaned and normalized.
Categorical variables were encoded using LabelEncoder.
Target allergy labels were encoded separately.
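A possible shape for this preparation step is sketched below. The exact cleaning rules are not documented here, so whitespace stripping, dropping incomplete rows, and de-duplication are assumptions, as are the target column name `Allergens`, the helper `prepare`, and the sample rows.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical raw rows; the real data lives in the project's CSV files.
raw = pd.DataFrame(
    {
        "Food Product": [" Almond Cookies", "Chicken Soup", "Chicken Soup", None],
        "Main Ingredient": ["Almonds", "Chicken ", "Chicken ", "Rice"],
        "Allergens": ["Tree nuts", "None", "None", "None"],
    }
)


def prepare(raw, target):
    """Clean the frame, then label-encode features and target separately."""
    df = raw.dropna().copy()              # drop incomplete rows
    for col in df.columns:
        df[col] = df[col].str.strip()     # normalize stray whitespace
    df = df.drop_duplicates().reset_index(drop=True)

    encoders = {}
    encoded = pd.DataFrame(index=df.index)
    for col in df.columns:
        if col == target:
            continue
        enc = LabelEncoder()
        encoded[col] = enc.fit_transform(df[col].tolist())
        encoders[col] = enc

    target_encoder = LabelEncoder()
    y = target_encoder.fit_transform(df[target].tolist())
    return encoded, y, encoders, target_encoder


X, y, encoders, target_encoder = prepare(raw, target="Allergens")
```

Keeping one encoder per column, and a separate one for the target, is what allows the web app to reuse the same mappings later for both the dropdowns and the decoded prediction.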
Model Training
A Decision Tree Classifier was trained on encoded categorical features.
Trained artifacts were saved for deployment:
allergy_model.pkl
encoders.pkl
target_encoder.pkl
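Training and saving could be sketched as follows. The feature matrix here is a tiny invented example, the hyperparameters are not the project's, and the artifact is written to a temporary directory rather than the repository root; only the file name `allergy_model.pkl` comes from the list above.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.tree import DecisionTreeClassifier

# Invented, already-encoded features and targets (illustration only).
X = [[0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0]]
y = [1, 0, 1, 0]

model = DecisionTreeClassifier(random_state=42).fit(X, y)

# Persist the fitted model, then load it back as the web app would.
out_dir = Path(tempfile.mkdtemp())
joblib.dump(model, out_dir / "allergy_model.pkl")
restored = joblib.load(out_dir / "allergy_model.pkl")
```

The feature encoders and the target encoder can be written with `joblib.dump` in the same way, so that inference uses exactly the mappings seen during training.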
Deployment
The trained model and encoders are loaded at application startup.
Predictions are performed in real time via Flask routes.
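The sketch below shows one way a Flask app can load artifacts once at startup and serve predictions from a route. It is an assumption-heavy stand-in for `app.py`: `load_artifacts` fits toy objects where the real app would call `joblib.load` on the three `.pkl` files, the feature list is shortened to two fields, and the inline `PAGE` template replaces `templates/index.html` so the example stays self-contained.

```python
from flask import Flask, render_template_string, request
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["Food Product", "Main Ingredient"]  # shortened for the example


def load_artifacts():
    """Stand-in for joblib.load(...) of the saved model and encoders."""
    rows = [("Almond Cookies", "Almonds"), ("Chicken Soup", "Chicken")]
    labels = ["Tree nuts", "None"]  # hypothetical categories
    encoders = {
        f: LabelEncoder().fit([r[i] for r in rows]) for i, f in enumerate(FEATURES)
    }
    target_encoder = LabelEncoder().fit(labels)
    X = [[encoders[f].transform([r[i]])[0] for i, f in enumerate(FEATURES)] for r in rows]
    model = DecisionTreeClassifier(random_state=0).fit(X, target_encoder.transform(labels))
    return model, encoders, target_encoder


# Executed once when the module is imported, i.e. at application startup.
model, encoders, target_encoder = load_artifacts()
app = Flask(__name__)

PAGE = """
<form method="post">
{% for name, values in options.items() %}
  <select name="{{ name }}">
  {% for v in values %}<option>{{ v }}</option>{% endfor %}
  </select>
{% endfor %}
  <button type="submit">Predict</button>
</form>
{% if prediction %}<p>Prediction: {{ prediction }}</p>{% endif %}
"""


@app.route("/", methods=["GET", "POST"])
def index():
    prediction = None
    if request.method == "POST":
        encoded = [encoders[f].transform([request.form[f]])[0] for f in FEATURES]
        prediction = str(target_encoder.inverse_transform(model.predict([encoded]))[0])
    options = {f: [str(c) for c in encoders[f].classes_] for f in FEATURES}
    return render_template_string(PAGE, options=options, prediction=prediction)
```

Loading at module level means each request only pays for encoding and a single `predict` call, not for reading the artifacts from disk.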
Project Structure
Allergies_Detection/
│
├── app.py                  # Flask application entry point
├── allergy_model.pkl       # Trained ML model
├── encoders.pkl            # Feature encoders
├── target_encoder.pkl      # Target label encoder
│
├── train.ipynb             # Model training notebook
├── test2.ipynb             # Model evaluation / testing
│
├── cleaned_food_allergy_dataset.csv
├── food_allergy_preprocessed.csv
├── food_ingredients_and_allergens.csv
│
└── templates/
    └── index.html          # Web interface
How to Run Locally
- Install Dependencies
pip install flask numpy pandas scikit-learn joblib
Note: For full reproducibility, dependency versions should match those used during model training.
- Start the Application
python app.py
- Access the App
Open a browser and navigate to:
Current Limitations
Input is limited to dropdown-based categorical selections
Free-text ingredient descriptions are not yet supported
Model artifacts were trained under a specific scikit-learn version
No authentication or database persistence implemented
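Regarding the scikit-learn version limitation, one possible guard (not something the project is known to implement) is to record the training-time version next to the artifacts and compare it at load time. The helper name `check_sklearn_version` and the idea of storing the version string are assumptions for this sketch.

```python
import warnings

import sklearn


def check_sklearn_version(trained_version, running_version=sklearn.__version__):
    """Return True when versions match; otherwise warn and return False."""
    if trained_version != running_version:
        warnings.warn(
            f"Model was trained with scikit-learn {trained_version}, "
            f"but {running_version} is installed; predictions may differ "
            "or loading may fail."
        )
        return False
    return True


# Example: the running version trivially matches itself.
same = check_sklearn_version(sklearn.__version__)
```

Pinning the same version in a requirements file achieves a similar effect without any runtime check.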
Future Enhancements (Planned)
Text-based ingredient input using NLP techniques
Feature extraction via TF-IDF or embedding-based methods
Model retraining to support unstructured inputs
Enhanced UI/UX and prediction explanations
Improved model performance using ensemble methods
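To make the TF-IDF item on the roadmap more concrete, here is a small sketch of what a text-based variant might look like. None of this exists in the project today: the ingredient strings and labels are invented, and pairing `TfidfVectorizer` with a decision tree is just one option among several.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Invented free-text ingredient lists and labels (illustration only).
texts = [
    "whole milk, sugar, cocoa",
    "almonds, honey, sea salt",
    "rice, water, salt",
    "cream, vanilla, sugar",
]
labels = ["Dairy", "Tree nuts", "None", "Dairy"]

# TF-IDF turns each ingredient string into a sparse numeric vector,
# which the classifier can consume directly.
text_model = make_pipeline(TfidfVectorizer(), DecisionTreeClassifier(random_state=0))
text_model.fit(texts, labels)

prediction = text_model.predict(["almonds, honey, sea salt"])[0]
```

Wrapping both steps in one pipeline means a single artifact would need to be saved and loaded, instead of a model plus separate encoders.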