Welcome to my Data Science Learning Journey repository! It documents the tasks and projects I completed as a Data Science intern at Developers Hub Corporation. Each task tackles a real-world problem and gave me hands-on experience with key aspects of data science: data preprocessing, exploratory data analysis, visualization, machine learning model development, and model evaluation.
Objective: Perform exploratory data analysis (EDA) and visualize patterns in a real-world dataset.
Description:
- Loaded and explored datasets like the Titanic Dataset or Airbnb Listings Dataset using Pandas.
- Cleaned the data by handling missing values, removing duplicates, and managing outliers.
- Visualized insights using bar charts, histograms, and correlation heatmaps.
- Summarized key observations and patterns from the data.
Outcome: A Python script or Jupyter Notebook with the complete EDA process, visualizations, and insights.
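The cleaning and summarization steps above can be sketched as follows. This is a minimal illustration using a small hypothetical sample in place of the actual Titanic or Airbnb data; the column names and values are made up for the example.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical sample standing in for a dataset like Titanic
df = pd.DataFrame({
    "age": [22.0, 38.0, np.nan, 35.0, 35.0, 54.0],
    "fare": [7.25, 71.28, 7.92, 53.10, 53.10, 51.86],
    "survived": [0, 1, 1, 1, 1, 0],
})

# Handle missing values: fill numeric gaps with the column median
df["age"] = df["age"].fillna(df["age"].median())

# Remove exact duplicate rows
df = df.drop_duplicates()

# Manage outliers by clipping to the 1st-99th percentile range
low, high = df["fare"].quantile([0.01, 0.99])
df["fare"] = df["fare"].clip(low, high)

# Summarize: a correlation matrix (this is what feeds a heatmap)
corr = df.corr()
print(corr.round(2))
```

In the full notebook, `corr` would be passed to `seaborn.heatmap`, and the bar charts and histograms would come from `df.plot` or `seaborn` calls on the cleaned frame.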
Objective: Build a sentiment analysis model to predict the sentiment of textual data, such as movie reviews.
Description:
- Preprocessed text data by tokenizing, removing stopwords, and performing lemmatization.
- Converted text into numerical features using TF-IDF and word embeddings.
- Trained machine learning classifiers like Logistic Regression and Naive Bayes for sentiment prediction.
- Evaluated model performance using precision, recall, and F1-score metrics.
Outcome: A Python script that processes input text, predicts sentiment, and displays evaluation metrics.
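The TF-IDF and classifier steps can be sketched as below. The toy reviews and labels are invented for illustration, and the model is evaluated on its own training data for brevity; the real script would use a held-out test split.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Toy labeled reviews (illustrative only, not the actual training data)
texts = [
    "a wonderful, moving film", "great acting and a great story",
    "truly enjoyable from start to finish", "an excellent, uplifting movie",
    "a dull, boring mess", "terrible plot and awful acting",
    "painfully slow and boring", "a complete waste of time",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = positive, 0 = negative

# Convert text to TF-IDF features (stopword removal is built in here)
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(texts)

# Train a Logistic Regression classifier for sentiment prediction
clf = LogisticRegression().fit(X, labels)

# Report precision, recall, and F1-score
preds = clf.predict(X)
print(classification_report(labels, preds))
```

New input text would be scored with `clf.predict(vectorizer.transform([text]))`; tokenization and lemmatization via `nltk` or `spacy` would happen before vectorization.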
Objective: Develop a system to detect fraudulent transactions using credit card fraud datasets.
Description:
- Preprocessed data by handling imbalanced datasets using SMOTE.
- Trained machine learning models like Random Forest and Gradient Boosting to classify fraudulent transactions.
- Evaluated model performance using metrics such as precision, recall, and F1-score.
- Created a simple command-line interface for testing transactions in real time.
Outcome: A Python script capable of detecting fraudulent transactions with evaluation metrics and a testing interface.
Objective: Build a regression model to predict house prices using the Boston Housing Dataset.
Description:
- Preprocessed the dataset by normalizing numerical features and handling null values.
- Implemented and compared regression models, including Linear Regression, Random Forest, and XGBoost.
- Compared model performance using metrics such as RMSE and R².
- Visualized feature importance for tree-based models to understand the most influential variables.
Outcome: A Python script containing the implementation of regression models, performance comparisons, and visualizations.
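The model comparison can be sketched as below. Synthetic data from `make_regression` stands in for the housing data (the Boston dataset has been removed from recent scikit-learn releases), and only Linear Regression and Random Forest are shown to keep the example self-contained.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic housing-style regression data (illustrative shapes)
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fit each model and record RMSE and R² on the held-out split
results = {}
for name, model in [("linear", LinearRegression()),
                    ("forest", RandomForestRegressor(random_state=0))]:
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    results[name] = (rmse, r2_score(y_te, pred))

for name, (rmse, r2) in results.items():
    print(f"{name}: RMSE={rmse:.1f}  R2={r2:.3f}")
```

For the tree-based models, `model.feature_importances_` supplies the values behind the feature-importance plots.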
The following tools and technologies were used to complete the tasks:
- Python: Core programming language for analysis and model building.
- Libraries:
  - Data Manipulation: `pandas`, `numpy`
  - Visualization: `matplotlib`, `seaborn`
  - Machine Learning: `scikit-learn`, `xgboost`, `imblearn`
  - Text Processing: `nltk`, `spacy` (TF-IDF via `scikit-learn`)
- Jupyter Notebook: For interactive coding and step-by-step analysis.
- Git: For version control and collaboration.
- Command-Line Interface (CLI): To test models and run scripts.
I am grateful to Developers Hub Corporation for providing me with this amazing internship opportunity. This journey has been a pivotal step in my data science career, allowing me to gain hands-on experience with real-world datasets and practical machine learning problems.
Special thanks to:
- My mentors at Developers Hub Corporation for their guidance, feedback, and support.
- The open-source community for the amazing Python libraries that made these projects possible.
- Clone this repository to your local machine:

  ```bash
  git clone https://github.com/SSaadAKHTAR/Data_Science_Learning_Journey.git
  ```