🚢 Titanic Survival Prediction

📌 Project Overview

This project analyzes the Titanic dataset to predict passenger survival using various machine learning models. The dataset is preprocessed, explored, and evaluated through multiple classification algorithms.

📂 Dataset

The Titanic dataset used in this project is fetched from Seaborn:

titanic.csv
Features include age, fare, pclass, sex, embarked, and others.
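
A minimal sketch of how the data could be loaded and checked for missing values. The helper names `load_titanic` and `summarize` and the `csv_path` argument are hypothetical and are not taken from the project code. `sns.load_dataset("titanic")` downloads Seaborn's example copy and needs network access.

```python
import pandas as pd


def load_titanic(csv_path=None):
    """Return the Titanic data as a DataFrame.

    csv_path is a hypothetical argument: when given, a local CSV is read;
    otherwise Seaborn's example copy of the dataset is downloaded.
    """
    if csv_path is not None:
        return pd.read_csv(csv_path)
    import seaborn as sns  # imported lazily; only needed for the download path
    return sns.load_dataset("titanic")


def summarize(df):
    """Report the row count and the fraction of missing values per column."""
    missing = df.isna().mean().round(3).to_dict()
    return {"rows": len(df), "missing_fraction": missing}
```

Calling `summarize(load_titanic())` gives a quick view of which columns need imputation or removal before modelling.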

⚙️ Tech Stack

Languages & Libraries: Python, Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, Plotly
ML Models: Logistic Regression, KNN, Random Forest, SVM, Decision Tree, Naive Bayes
Other Tools: Streamlit for visualization, GridSearchCV for hyperparameter tuning

🔍 Data Preprocessing

Handling missing values (mean/mode imputation)
Dropping columns with a high proportion of missing values
Encoding categorical variables using one-hot encoding
Feature scaling (MinMaxScaler, StandardScaler)
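
The steps above can be sketched with scikit-learn's `ColumnTransformer`. This is an illustration under assumptions, not the notebook's actual code: the column grouping, the 0.5 missing-value threshold, and the toy rows are all made up for the example. `StandardScaler` is used here, and `MinMaxScaler` can be swapped in at the same place.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column names follow Seaborn's titanic dataset; the grouping is an assumption.
NUMERIC = ["age", "fare"]
CATEGORICAL = ["pclass", "sex", "embarked"]
MAX_MISSING = 0.5  # hypothetical threshold for dropping sparse columns


def build_preprocessor():
    """Mean-impute and scale numeric columns; mode-impute and one-hot encode the rest."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="mean")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    # sparse_threshold=0 forces a dense NumPy array as output.
    return ColumnTransformer(
        [("num", numeric, NUMERIC), ("cat", categorical, CATEGORICAL)],
        sparse_threshold=0,
    )


# Toy rows, made up purely to exercise the code.
toy = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 54.0],
    "fare": [7.25, 71.28, 8.05, 51.86],
    "pclass": [3, 1, 3, 1],
    "sex": ["male", "female", "female", "male"],
    "embarked": ["S", "C", np.nan, "S"],
    "deck": [np.nan, "C", np.nan, np.nan],
})

# Drop columns whose share of missing values exceeds the threshold ("deck" here).
toy = toy.loc[:, toy.isna().mean() <= MAX_MISSING]

features = build_preprocessor().fit_transform(toy)
print(features.shape)
```

Keeping imputation, encoding, and scaling inside one transformer means the same fitted steps can later be applied unchanged to new passengers.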

📊 Exploratory Data Analysis (EDA)

Count plots for survival distribution
Scatter plots for fare vs. age
Histograms of fares for different passenger classes
Boxplots for age and fare distribution
FacetGrid visualizations for survival analysis
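
A rough sketch of three of the plots listed above, using plain Matplotlib rather than the Seaborn calls the notebook likely uses. The toy rows are made up for illustration and are not real passenger records.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy rows, made up for illustration only.
toy = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0],
    "pclass": [3, 1, 3, 1, 2, 3],
    "age": [22.0, 38.0, 26.0, 54.0, 14.0, 20.0],
    "fare": [7.25, 71.28, 7.93, 51.86, 30.07, 8.05],
})

fig, (ax_count, ax_scatter, ax_hist) = plt.subplots(1, 3, figsize=(15, 4))

# Count plot: number of passengers per survival outcome.
counts = toy["survived"].value_counts().sort_index()
ax_count.bar(counts.index.astype(str), counts.values)
ax_count.set_xlabel("survived")
ax_count.set_ylabel("count")

# Scatter plot: fare vs. age, coloured by survival.
ax_scatter.scatter(toy["age"], toy["fare"], c=toy["survived"])
ax_scatter.set_xlabel("age")
ax_scatter.set_ylabel("fare")

# Overlaid fare histograms, one per passenger class.
for pclass, group in toy.groupby("pclass"):
    ax_hist.hist(group["fare"], bins=5, alpha=0.5, label=f"class {pclass}")
ax_hist.set_xlabel("fare")
ax_hist.legend()

fig.tight_layout()
```

The figure can then be written to disk with `fig.savefig(...)` or shown in a notebook.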

🚀 Model Training & Evaluation

Models were trained using a pipeline approach:

Preprocessing (Scaling + Encoding)
Splitting Data (80% train, 20% test)
Model Training (Logistic Regression, KNN, etc.)
Hyperparameter Tuning (GridSearchCV for Logistic Regression)
Evaluation Metrics:
- Accuracy
- Precision, Recall, F1 Score
- Confusion Matrix
- Cross-validation scores
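
The workflow above might look roughly like the following. The data is a synthetic stand-in from `make_classification` so the sketch runs anywhere. The parameter grid, `random_state`, and stratified split are assumptions, since the project does not state them.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; this is not the Titanic dataset.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)

# 80% train, 20% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Illustrative grid; the values searched in the project are not stated.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Evaluate the refitted best estimator on the held-out 20%.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred)

print("best params:", search.best_params_)
print("mean CV accuracy:", round(search.best_score_, 4))
print("test accuracy:", round(test_accuracy, 4))
print(matrix)
print(classification_report(y_test, y_pred))
```

Because the scaler sits inside the pipeline, each cross-validation fold fits it on its own training part only, which avoids leaking test-fold statistics into the search.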

🔥 Best Model Selection

Logistic Regression was the best-performing model, reaching an accuracy of 83.71% before tuning.
With hyperparameter tuning, its accuracy improved to 84.27%.
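
One way such a comparison could be run is to score every candidate with the same cross-validation folds and pick the highest mean accuracy. This sketch uses synthetic stand-in data and default hyperparameters, so its ranking says nothing about the Titanic results reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; this is not the Titanic dataset.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}

# Mean 5-fold accuracy for each model, with scaling applied inside each fold.
scores = {
    name: cross_val_score(make_pipeline(StandardScaler(), model), X, y, cv=5).mean()
    for name, model in candidates.items()
}

best_name = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.4f}")
print("best:", best_name)
```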

📌 Deployment

The trained model is saved using joblib (titanic_trained_model.pkl).
The trained model is tested using sample data.
Streamlit is used for interactive visualizations.
Confusion matrices, survival rate visualizations, and EDA graphs are included.
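
Saving and reloading a model with joblib can be sketched as follows. The tiny training set and the sample passenger are made up, and the file is written to a temporary directory rather than the project's notebook folder.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up training rows; the two columns stand for [age, fare].
X = np.array([[22.0, 7.25], [38.0, 71.28], [26.0, 7.93], [35.0, 53.10]])
y = np.array([0, 1, 0, 1])

model = LogisticRegression(max_iter=1000).fit(X, y)

# Persist the fitted model, then load it back as a separate object.
model_path = os.path.join(tempfile.mkdtemp(), "titanic_trained_model.pkl")
joblib.dump(model, model_path)
restored = joblib.load(model_path)

# Test the restored model on one hypothetical sample passenger.
sample = np.array([[30.0, 20.0]])
prediction = restored.predict(sample)
print("predicted class:", int(prediction[0]))
```

The restored object behaves like the original estimator, so a Streamlit app can load it once at startup and call `predict` on user input.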

📂 File Structure

Titanic-Insights/
   |-- dashboard/
      |-- dataset/
         |-- titanic.csv
         |-- cleaned_data.csv
      |-- notebook/
         |-- titanic.ipynb
         |-- titanic_trained_model.pkl
   |-- app.py  # Streamlit app
   |-- requirements.txt
   |-- README.md

🔧 Installation & Usage

Clone the repository:

git clone https://github.com/UFAQUE123/Titanic-Insights.git

Install dependencies:
```
pip install -r requirements.txt
```
Run the Streamlit app:
```
streamlit run app.py
```

✨ Conclusion

Feature engineering and proper preprocessing significantly improve model performance.
Logistic Regression was the best-performing of the models tested on this dataset.
The project demonstrates an end-to-end machine learning pipeline, from data preprocessing to deployment.

🚀 Future Work: Feature selection for better interpretability, and deploying a web-based ML model interface.

📌 Author: UFAQUE SHADAB
📧 Contact: [email protected]
