I'm a third-year BSc Computer Science (Data Science & AI) student at the University of Dundee, on track for a First-Class degree. I build production-grade systems across three tracks: event-driven data pipelines (Kafka, Airflow, AWS, Star Schema), end-to-end ML and LLM systems (XGBoost, MLflow, RAG, LangChain, LLM-as-judge evaluation), and full-stack cloud applications (React, Flask/FastAPI, AWS, CI/CD) - backed by 1,452 automated tests in one project and 670 in another. I care about engineering rigour: dead-letter routing before data hits a database, leakage prevention before any CV fold runs, and deployment pipelines that abort on failure rather than hoping nothing breaks.
I'm currently seeking a post-graduate role starting in 2027, in Data Engineering, ML/AI Engineering, or Software Engineering.
- 🎓 BSc (Hons) Computer Science (Data Science & AI) - expected graduation June 2028
- 🏆 AWS Academy - Machine Learning Foundations
- 🏅 Microsoft Learn - Foundations of Azure AI: Concepts, Capabilities, and Implementation
- 🏆 AWS Academy - Cloud Foundations
- 📍 Based in Dundee, Scotland - open to relocation
I treat reliability and observability as non-negotiable from the start, not retrofitted after the fact. That means dead-letter routing before data reaches a database, leakage prevention baked into sklearn Pipelines before any cross-validation fold runs, and CI/CD gates that abort deployment on any test failure rather than hoping nothing breaks in production. I'm drawn to problems where silent failures are the hardest kind to debug (concurrent write contention, foreign key mismatches, retrieval quality in RAG systems), and I build systems that make those failures impossible to miss. I work from requirements before writing code - functional, non-functional, and acceptance criteria first - and treat API documentation, schema contracts, and test plans as deliverables in their own right, not afterthoughts.
Python, FastAPI, PostgreSQL, Apache Kafka, Redis, XGBoost, Scikit-learn, LangChain, ChromaDB, MLflow, React, Vite, Docker, AWS
Industry project for NCR Atleos - production-grade log ingestion pipeline with 3-layer anomaly detection and Agentic RAG diagnostic assistant. Led backend, data engineering, and ML end-to-end across a 7-person Agile team.
- Kafka event streaming: KRaft mode, 2 topics × 3 partitions, at-least-once delivery with manual offset commits. Hybrid deduplication: Redis SET with 1h TTL + 10K-entry in-memory LRU fallback.
- 3-layer detection engine: ML_ENSEMBLE (XGBoost + Isolation Forest, 99.8% CV accuracy) + ZSCORE (rolling 20-window sigma) + HEURISTIC (7 deterministic multi-source correlators). 600s configurable sliding window, 10-min cross-layer dedup.
- Agentic RAG: Cross-encoder reranking (ms-marco-MiniLM), 3-sample self-consistency with 3-gram Jaccard similarity, Reflexion (self-critique → regenerate), citation grounding with regex entity verification. 4-signal confidence fusion: retrieval (30%) + consistency (25%) + verbalized (25%) + grounding (20%).
- 8 Redis patterns: Rate limiting (sorted set), deduplication (set + TTL), JWT blacklist (string + TTL), distributed locks (SET NX EX), Pub/Sub streaming, response caching, dead-letter queue (streams with exponential backoff), analytics counters (INCR + HLL + ZINCRBY).
- MLOps via MLflow: Experiment tracking, model registry with "champion" aliases. 7 artifacts per training run: xgb_classifier, isolation_forest, scaler, label_encoder, feature names, IF feature indices, calibrated UNKNOWN threshold.
- 670 automated tests (521 backend + 149 frontend) across 10 tiers: unit, integration, stress, security, ML, RAG, Redis, Kafka, generators, parsers
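As an illustration of the 4-signal confidence fusion and the 3-gram Jaccard self-consistency check described above, here is a minimal sketch. Only the four weights come from the project description; the word-level tokenisation, the function names, and the mean-pairwise aggregation are assumptions made for the example, not the project's actual code.

```python
from itertools import combinations

# Weights as stated in the project description; everything else is illustrative.
WEIGHTS = {"retrieval": 0.30, "consistency": 0.25, "verbalized": 0.25, "grounding": 0.20}


def word_trigrams(text: str) -> set[tuple[str, ...]]:
    """Set of word-level 3-grams (assumption: lowercase, whitespace tokenisation)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def consistency_score(samples: list[str]) -> float:
    """Mean pairwise 3-gram Jaccard similarity across the sampled answers."""
    grams = [word_trigrams(s) for s in samples]
    pairs = list(combinations(grams, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def fuse_confidence(signals: dict[str, float]) -> float:
    """Weighted sum of the four signals, each expected to lie in [0, 1]."""
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
```

With three sampled answers, `consistency_score` averages three pairwise comparisons; the result would then be passed to `fuse_confidence` as the `consistency` signal alongside the other three.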
React, Tailwind CSS, Flask, Socket.IO, PostgreSQL, Docker, AWS, GitHub Actions, Pytest, Jest, Cypress
Multi-service full-stack application - Flask API with Socket.IO real-time sync, React SPA served through nginx, and PostgreSQL on AWS RDS. Designed for team collaboration: task management, GitHub issue/PR linking, and role-based access control with OIDC-authenticated CI/CD.
- Multi-stage Docker: Backend image reduced to 330MB (python:3.11-slim runtime, build deps kept out of the final image). Frontend built on node:20-alpine, served by nginx:1.27-alpine with envsubst template for API_UPSTREAM. Docker resolver (127.0.0.11) for runtime DNS. Two-compose-file pattern cleanly separates PostgreSQL from the app stack.
- Real-time collaboration: Flask-SocketIO with JWT-authenticated WebSocket handshake, project-scoped rooms preventing cross-project data leaks. Socket.IO-client on React side broadcasts task updates, comments, and notifications to all room members - zero polling.
- Full OIDC CI/CD pipeline: GitHub Actions with OIDC federation (no static credentials). Path-filtered test gates run BE (pytest) and FE (Jest) independently. On merge: image push to ECR, ECS Fargate rolling update gated by a 200-second health check, then frontend distribution via S3/CloudFront. Any test failure aborts the pipeline.
- JWT dual auth: Access + refresh token flow with both cookie and Bearer header transport. 3-tier RBAC (Developer / Team Lead / Admin) enforced at endpoint level via decorators - middleware validates the numerical hierarchy so higher roles inherit all lower permissions.
- Database design: 12 PostgreSQL tables with SQLAlchemy ORM, composite indexes on (project_id, status) and (user_id, notification_type) for common query patterns. Full-text search on task titles. Audit logging with automatic timestamping across all entity mutations.
- 1,452 automated tests (518 Pytest + 929 Jest + 5 Cypress) gate every PR. Backend tests run on in-memory SQLite, so no external database is required. Coverage thresholds at 85% for both backend and frontend.
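The decorator-based RBAC with a numerical hierarchy can be sketched roughly as follows. This is a framework-free illustration, not the project's Flask code: the `Role` values, the `require_role` decorator, the `Forbidden` exception, and the `archive_project` handler are all hypothetical names chosen for the example.

```python
from enum import IntEnum
from functools import wraps


class Role(IntEnum):
    """Roles ordered so that a higher value inherits every lower role's permissions."""
    DEVELOPER = 1
    TEAM_LEAD = 2
    ADMIN = 3


class Forbidden(Exception):
    """Raised when the caller's role is below the endpoint's minimum (hypothetical)."""


def require_role(minimum: Role):
    """Decorator factory: allow the call only if the caller's role >= `minimum`."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(user_role: Role, *args, **kwargs):
            # A single numeric comparison gives higher roles all lower permissions.
            if user_role < minimum:
                raise Forbidden(f"{user_role.name} cannot access {handler.__name__}")
            return handler(user_role, *args, **kwargs)
        return wrapper
    return decorator


@require_role(Role.TEAM_LEAD)
def archive_project(user_role: Role, project_id: int) -> str:
    """Hypothetical endpoint restricted to Team Lead and above."""
    return f"project {project_id} archived"
```

In a real Flask app the role would presumably come from the validated JWT rather than a function argument; passing it explicitly keeps the sketch self-contained.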
Apache Airflow, Python, PostgreSQL, AWS RDS, Power BI, Power Automate
Fully automated ETL pipeline transforming raw W3C IIS logs into a 9-dimension Star Schema on AWS RDS. 9-way parallel Airflow fan-out makes phase three 8× faster than sequential. Geolocation enrichment across 78 countries, −1 surrogate key fallback ensuring zero dropped records, and Power Automate failure alerting. 7-page Power BI dashboard including P95 response time via DAX.
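The −1 surrogate key fallback might look something like the sketch below. It assumes each dimension table carries an "Unknown" row with key −1, so that a fact row whose natural key has no match still satisfies the foreign key instead of being dropped. The function names, column names, and lookup structure are hypothetical.

```python
UNKNOWN_KEY = -1  # assumed surrogate key of the "Unknown" row in each dimension


def resolve_surrogate_key(natural_key, dimension_lookup: dict) -> int:
    """Map a natural key to its surrogate key, falling back to UNKNOWN_KEY."""
    return dimension_lookup.get(natural_key, UNKNOWN_KEY)


def build_fact_rows(raw_rows: list[dict], country_lookup: dict) -> list[dict]:
    """Build one fact row per raw row; unmatched or missing countries map to -1."""
    return [
        {
            "country_key": resolve_surrogate_key(row.get("country"), country_lookup),
            "response_ms": row["response_ms"],
        }
        for row in raw_rows
    ]


# Usage: the second row has an unseen country, the third has none at all.
country_lookup = {"GB": 1, "US": 2}
raw = [
    {"country": "GB", "response_ms": 120},
    {"country": "ZZ", "response_ms": 340},
    {"response_ms": 95},
]
facts = build_fact_rows(raw, country_lookup)
```

Every input row produces an output row, which is what keeps the record count intact even when enrichment fails.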
React Native, TypeScript, Firebase, Node.js, Jest, Alpha Vantage API
Full-stack mobile app converting physical receipts via OCR into structured financial records, mapping spending to stock tickers via Alpha Vantage, and projecting portfolio performance using ARIMA forecasting and Linear Regression. AES encryption at rest, biometric auth, 78 Jest tests.
Python, Scikit-learn, XGBoost, Pandas, Matplotlib, Jupyter Notebook
End-to-end ML pipeline: 7 classifiers benchmarked (~90% accuracy), 2 novel ratio-based features engineered that became top-3 predictors, GridSearchCV 5-fold CV, K-Means + DBSCAN clustering, and Linear Regression (R²=0.756). Strict leakage prevention via sklearn Pipelines with ColumnTransformer throughout.
Python, Scikit-learn, Flask, MovieLens Dataset, NumPy, Pandas
Hybrid recommendation engine (collaborative filtering + content-based) on MovieLens. ~78% hit rate, ~0.22 Precision@10. Dependency-injected strategy pattern means recommendation algorithms are fully swappable without touching the API layer. Cold-start problem addressed via hybrid signal combination.
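A dependency-injected strategy pattern of the kind described could be sketched as below. The `Recommender` protocol, the two stub strategies, and `RecommendationService` are hypothetical names for illustration; the real algorithms are replaced by fixed rankings so the example stays self-contained.

```python
from typing import Protocol


class Recommender(Protocol):
    """Interface every recommendation strategy is assumed to satisfy."""
    def recommend(self, user_id: int, k: int) -> list[str]: ...


class CollaborativeStub:
    """Stand-in for a collaborative-filtering strategy (fixed ranking)."""
    def recommend(self, user_id: int, k: int) -> list[str]:
        return ["Heat", "Alien", "Fargo"][:k]


class ContentBasedStub:
    """Stand-in for a content-based strategy (fixed ranking)."""
    def recommend(self, user_id: int, k: int) -> list[str]:
        return ["Alien", "Aliens", "Blade Runner"][:k]


class RecommendationService:
    """API-facing layer: it depends only on the Recommender interface."""
    def __init__(self, strategy: Recommender) -> None:
        self._strategy = strategy  # injected, so the algorithm can be swapped freely

    def top_k(self, user_id: int, k: int = 10) -> list[str]:
        return self._strategy.recommend(user_id, k)


# Swapping the algorithm means changing one constructor argument.
service_a = RecommendationService(CollaborativeStub())
service_b = RecommendationService(ContentBasedStub())
```

Because the service only calls `recommend`, a new strategy can be added without modifying the service or the routes that use it.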
C++
Modular C++ OOP system - polymorphic vehicle hierarchy, generic repository template pattern, zero raw pointer usage (smart pointers throughout). Levenshtein distance fuzzy search, automated late fee and loyalty rewards logic, file-based persistence, and an 8-scenario E2E test suite.
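For readers unfamiliar with Levenshtein-based fuzzy search, here is a minimal sketch of the idea. It is written in Python rather than the project's C++, and the `fuzzy_search` helper, its case-insensitive matching, and the `max_distance` threshold are assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) using two DP rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to limit memory
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def fuzzy_search(query: str, candidates: list[str], max_distance: int = 2) -> list[str]:
    """Return candidates within `max_distance` edits of the query, closest first."""
    scored = [(levenshtein(query.lower(), c.lower()), c) for c in candidates]
    return [c for distance, c in sorted(scored) if distance <= max_distance]
```

A misspelt query such as `"toyta"` would then still match `"Toyota"`, which is one edit away.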
Bash, Unix
Git-like VCS built from scratch in pure Bash - zero external dependencies beyond native Unix utilities. File locking, timestamped versioning, automatic diff generation, filterable activity logs with user attribution, multi-repo support, and compressed archive export. Currently implementing branching, three-way merge, and benchmarking against Git.
Organized to mirror CV track structure - Software Engineering, ML & AI Engineering, and Data Engineering.
Software Engineering
ML & AI Engineering
Data Engineering
Cloud & DevOps
Testing



