- Master's student in Computer Engineering (ML concentration) at Northeastern University.
- 👨‍💻 Previously a Data Scientist at Kemper Insurance, where I improved ML model accuracy and engineered data pipelines that strengthened customer targeting and product adoption.
- Former Machine Learning Engineer at Accenture, where I applied ML models to real-world problems in energy forecasting, route optimization, and operational efficiency.
- Skilled in machine learning, deep learning, generative AI, LLM fine-tuning, retrieval-augmented generation (RAG), data engineering, data visualization, and MLOps, using tools such as Python, TensorFlow, and PyTorch and cloud platforms such as AWS, Azure, and GCP.
- Languages: Python, R, C++, SQL, Scala, MATLAB
- ML & Data Science: TensorFlow, PyTorch, Scikit-learn, XGBoost, Pandas, NumPy
- Data Engineering: Apache Spark, Hadoop, Kafka, Airflow, Snowflake, ETL Pipelines
- Cloud & Tools: AWS (SageMaker, EC2, Lambda), GCP, Azure, Docker, Terraform, Jenkins, Git
- Data Visualization: Tableau, Power BI, Plotly, Matplotlib, Seaborn
Fine-tuned BART and FLAN-T5 models with parameter-efficient methods such as LoRA to improve meeting-summarization accuracy. Implemented tokenization strategies tailored to informal, conversational text, which improved summarization performance.
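For readers new to LoRA, here is a minimal NumPy sketch of the core idea; it is not the project's code. A frozen weight `W` is adapted by a trainable low-rank product `B @ A`, scaled by `alpha / r`. The function names, shapes, and values are hypothetical; in practice a library such as Hugging Face PEFT would inject these adapters into the BART or FLAN-T5 layers.

```python
import numpy as np

def lora_linear(x, W, A, B, alpha):
    """Linear layer with a LoRA update (hypothetical helper for illustration).

    W: frozen pretrained weight, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    Effective weight: W + (alpha / r) * B @ A
    """
    r = A.shape[0]
    scaling = alpha / r
    # Frozen base path plus a scaled low-rank correction.
    return x @ W.T + scaling * (x @ A.T) @ B.T

def init_lora(d_in, d_out, r, rng):
    # A gets small random values and B starts at zero, so the adapted
    # layer initially reproduces the pretrained layer exactly.
    A = rng.normal(0.0, 0.01, size=(r, d_in))
    B = np.zeros((d_out, r))
    return A, B

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 8, 2  # toy sizes, assumed for the demo
W = rng.normal(size=(d_out, d_in))
A, B = init_lora(d_in, d_out, r, rng)
x = rng.normal(size=(4, d_in))

y = lora_linear(x, W, A, B, alpha=4.0)
print(np.allclose(y, x @ W.T))  # True: B = 0 means no change yet

full_params = d_out * d_in
lora_params = r * (d_in + d_out)
print(full_params, lora_params)  # 128 48
```

Because `B` starts at zero, training begins from the pretrained model's behavior, and only `r * (d_in + d_out)` parameters per layer are updated instead of `d_in * d_out`. After training, `B @ A` can be merged into `W`, so inference adds no extra cost.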
Built an interactive Streamlit app for exploring Airbnb listings across U.S. cities, with interactive maps, word clouds, calendar heatmaps, and other views for detailed neighborhood-level analysis.
Developed a distributed ML pipeline in PySpark for credit card fraud detection, reducing training time on large datasets by 8x.
🖼️ Generating Captions for Images
Developed an image-captioning model with an encoder-decoder architecture: a ResNet50 encoder and an LSTM decoder. Adding attention mechanisms significantly improved caption accuracy.
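The attention step can be illustrated with a small NumPy sketch. It assumes additive (Bahdanau-style) soft attention, one common choice for captioning; the description above does not say which variant was used. It also assumes ResNet50's final 7×7×2048 feature map is flattened into 49 region vectors. The parameter names and the sizes `H` and `K` are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def additive_attention(features, hidden, W_f, W_h, v):
    """Additive soft attention over image regions (illustrative sketch).

    features: (L, D) encoder outputs, one row per spatial region
    hidden:   (H,)   current LSTM hidden state
    W_f: (K, D), W_h: (K, H), v: (K,)  learned parameters (assumed names)
    Returns the context vector (D,) and attention weights (L,).
    """
    # Score every region against the decoder state: (L, K) -> (L,)
    scores = np.tanh(features @ W_f.T + hidden @ W_h.T) @ v
    weights = softmax(scores)
    # Context vector: attention-weighted sum of region features.
    context = weights @ features
    return context, weights

rng = np.random.default_rng(0)
L, D, H, K = 49, 2048, 512, 256  # 7x7 ResNet50 regions; H and K are assumed sizes
features = rng.normal(size=(L, D))
hidden = rng.normal(size=H)
W_f = rng.normal(scale=0.01, size=(K, D))
W_h = rng.normal(scale=0.01, size=(K, H))
v = rng.normal(size=K)

context, weights = additive_attention(features, hidden, W_f, W_h, v)
print(context.shape, weights.shape)  # (2048,) (49,)
```

At each decoding step, the context vector would typically be combined with the previous word's embedding as input to the LSTM, letting the decoder focus on different image regions for different words.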
- LinkedIn: linkedin.com/in/harshitsampgaon