aryaMehta26/README.md

Hey there, I'm Arya Mehta 👋

Passionate Software Developer | Java | Python & SQL Enthusiast | Distributed Systems | Kafka | Spark | Bay Area Resident
🚀 I enjoy building data pipelines and automating workflows.
💡 Constantly learning and exploring Data + AI.


About Me 👨‍💻

I am a versatile technologist with a strong foundation in Python, SQL, and software development principles, complemented by deep expertise in data engineering (Apache Spark, Hadoop, Snowflake, Airflow). I thrive on architecting and optimizing scalable systems, whether they are ETL pipelines, real-time data processing solutions, cloud-based architectures, or machine learning applications.

🔹 Technical Expertise:

✔ Programming & Development: Python (incl. software design, testing), SQL, Java (Spring Boot), JavaScript (React/Next.js), APIs, System Design

✔ Big Data & Cloud: Apache Spark, Hadoop, Snowflake, AWS/GCP/Azure

✔ Data Engineering & Pipelines: ETL/ELT, Apache Airflow, Kafka, Docker

✔ Machine Learning & AI: Model Development & Deployment (TensorFlow, Keras, scikit-learn), Feature Engineering (TF-IDF), Exploratory Data Analysis (EDA), Data Preparation (Pandas), MLOps concepts

✔ Databases: MySQL, PostgreSQL, NoSQL, MongoDB

🔹 Projects & Experience:

✅ Led a team to win SJ Hacks 2025 by developing "SJ HOPES," a full-stack platform addressing homelessness in San Jose. Built in 24 hours, it features real-time shelter visibility, client support, and micro-opportunities using Spring Boot, React/Next.js, MySQL, and Google Maps API.

✅ Developed a machine learning model to predict Netflix content popularity, leveraging TF-IDF for feature extraction from textual data and performing comprehensive EDA to uncover key insights. Successfully built, trained, and evaluated various models to achieve robust predictive performance.

✅ Engineered an end-to-end real-time finance data pipeline using Kafka, Spark, and Snowflake for streaming data processing and analytics.

✅ Designed and automated a robust ETL workflow using Airflow and AWS Lambda for efficient data ingestion and transformation from diverse sources.

✅ Created a Big Data analytics solution using Hadoop, Spark, and Tableau to derive insights from large-scale datasets.
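
For readers unfamiliar with the TF-IDF feature extraction mentioned in the Netflix project, here is a minimal standard-library sketch of the idea. It is an illustration only, not the project's actual code: the function name `tfidf`, the whitespace tokenizer, the unsmoothed weighting, and the sample descriptions are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (plain TF-IDF, no smoothing)."""
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical descriptions, invented for illustration only.
descriptions = [
    "a detective hunts a killer",
    "a chef opens a restaurant",
    "a detective opens a cold case",
]
vectors = tfidf(descriptions)
```

Terms that appear in every description (such as "a") get a weight of zero, while terms unique to one description (such as "killer") get the highest weight, which is what makes these vectors useful as model features. A production pipeline would more likely rely on a library vectorizer with smoothing and normalization.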

💡 I am passionate about leveraging technology to solve complex problems and continuously explore new paradigms in software engineering, data science, and AI. My goal is to contribute to impactful projects by building scalable, high-performance software and data infrastructure, with a keen interest in tech for social good.


My Tech Stack 🛠️

Here are some of the technologies I work with:

Python, SQL, Apache Airflow, Snowflake, Apache Spark, Apache Hadoop, Pandas, dbt, Git, Docker, Tableau


My GitHub Stats & Activity 📊📈

Arya's GitHub Stats
Top Languages
GitHub Streak
profile views
GitHub Activity Graph
Arya's GitHub Contribution Graph

GitHub Trophies

Snake Game for GitHub Contributions


Connect with Me 🌐

Let's connect! You can find me on:

LinkedIn | Medium


🦉 Fun Fact: Some of my best code gets written after sunset... I code till night!

Pinned

  1. Netflix_popularity_Predicition Public

    Decoding the Next Big Hit: How I Built a Crystal Ball for Netflix Content (Before It Airs!)

    Jupyter Notebook 1 1

  2. sj-hopes Public

    Forked from vatsalgandhi83/sj-hopes

    TypeScript

  3. enterprise-rag Public

    Enterprise RAG platform: FastAPI + React for secure document ingestion & retrieval with vector embeddings. 1M+ docs, 92% accuracy, 75% faster, 99.9% uptime. Docker • AWS • Redis.

    Python 1

  4. data-platform Public

    Distributed data platform processing 5TB+ daily with 100+ concurrent jobs. Built with Airflow, PySpark, FastAPI on Kubernetes. 50+ DQ rules, 99.9% uptime, <100ms latency, 40% cost reduction.

    Python 1

  5. LayoverOS Public

    Forked from ShachiMistry/LayoverOS

    An autonomous, state-persistent recovery agent for stranded travelers. Powered by MongoDB Atlas Vector Search & LangGraph.

    Python

  6. isaac-sim-synth-data Public

    High-fidelity synthetic data generation pipeline built on NVIDIA Omniverse Isaac Sim. Creates USD-based simulation environments with domain randomization and multi-sensor (camera/LiDAR/IMU) modelin…

    Python 1