The primary aim of this project is to develop a data-driven approach for detecting potentially fraudulent insurance claims. Beyond simply building a functional model, the focus lies in ensuring quality, stability, and real-world applicability. The core objectives are:
- Build accurate machine learning models capable of identifying fraudulent claims based on structured insurance data.
- Ensure model stability and reliability by achieving consistently high prediction scores across different subsets and unseen data.
- Apply the full data science workflow — from data understanding and preprocessing to feature engineering and model evaluation.
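
The stability objective above can be checked by scoring a model on several class-balanced subsets and looking at how much the score varies. The following is a minimal sketch under stated assumptions, not the project's actual evaluation code: the helper names `stratified_folds` and `score_spread` and the toy labels are hypothetical and exist only for illustration.

```python
import random
from statistics import mean, stdev


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with a similar class ratio in each fold."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share of the class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def score_spread(scores):
    """Return (mean, sample standard deviation) of per-fold scores."""
    return mean(scores), stdev(scores)


# Toy labels (assumption): 10 fraudulent (1) and 90 legitimate (0) claims.
labels = [1] * 10 + [0] * 90
folds = stratified_folds(labels, k=5)

# Each fold should hold the same share of the rare fraud class.
fraud_per_fold = [sum(labels[i] for i in fold) for fold in folds]
print(fraud_per_fold)
```

Each fold would serve once as the held-out set while the model is trained on the rest; a small standard deviation from `score_spread` over the per-fold scores suggests the model is stable. In practice, scikit-learn's `StratifiedKFold` provides the same kind of split.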
It contains the data exploration, visualization, and preprocessing part of the project, the results of which feed into the modelling part.
Different machine learning models are implemented and their performance is evaluated in this notebook.
List of required libraries for this project
Description of the project and the repository structure
Insurance claims data set from https://data.mendeley.com/datasets/992mh7dk9y/2
To run the notebooks, install all libraries listed in 'requirements.txt' in a virtual environment.
