% TODO: put in numbers and impact
% TODO: explain role in each
% TODO: check for grammar
% TODO: get checked by recruiters
\newcommand{\myExpOne}{
\item Directing HPC Tools and Runtime Systems R\&D for Sandia's Exascale Software.
\item Implementing tooling for Kokkos that is integrated with (1) HPC performance monitoring (LDMS) and feedback and (2) PMPI and adaptive runtime systems for MPI.
\item Implementing and specifying tooling API features for Kokkos Tools for OpenMP, OpenACC, MPI, and C++ standards.
\item Developing AI-assisted HPC Tools through coderosetta.com for Kokkos-CUDA and APEX Kokkos Tools autotuning, resulting in an NVIDIA GTC 2025 poster.
}
\newcommand{\myExpTwo}{
\item Owner and IC of Kokkos Tools: merged 15 GitHub pull requests (PRs), averaging 40 lines of code each, that improved the CMake and Spack build systems and tool development while reducing tooling overheads.
\item Contributed autotuning features to the Kokkos 4.5 release.
\item Developed an LLVM-based symbolic execution (SymEx) tool for debugging Kokkos programs; it detected 7 Kokkos bug examples with no false positives, leading to a paper at SC24's Correctness workshop.
\item Implemented a prototype LLVM OpenMP loop transformation directive, `split', leading to a 1.24x speedup for an OpenMP + CUDA benchmark and to OpenMP 6.0's split directive.
% \item Developed and implemented CPU+GPU auto-tuning optimizations, leading to a 1.2x speedup.
\item Problem solving and collaboration to support OpenMP multi-GPU features for AWS, Google Cloud and OCI, leading to 19 proposed features for OpenMP 7.0.
%which led to technical report on the OpenMP Set object.
\item Ported HPC Tools and Runtime Systems to exascale and multicloud platforms, collaborating with AWS, Google Cloud, and Oracle OCI.
% \item Impacted six different production-level scientific applications running on El Capitan and Frontier supercomputers.
}
\newcommand{\myExpThree}{
% \item Contributed to developing an LLVM OpenMP implementation, specifically its compiler and runtime, targeted at the Department of Energy's upcoming Exascale Supercomputer platforms.
\item Implemented OpenMP user-defined multi-GPU scheduling for LLVM, offering a 2x speedup over an MPI parallelization and leading to two papers at IWOMP 24.
\item Problem solving by implementing optimizations in LLVM for OpenMP asynchronous GPU offloading that showed a 1.23x speedup, leading to a paper at SC22's HiPar.
\item Critical thinking by developing performance benchmarks to evaluate OpenMP implementations on GPUs, leading to a journal paper and two workshop papers.
% \item Developed benchmarks and evaluating OpenMP implementations, e.g., LLVM's OpenMP, NVIDIA's OpenMP, on Exascale Supercomputers.
\item Technical leadership as demonstrated by leading OpenMP GPU hackathons and training, serving as technical project manager for the ECP SOLLVE project, and serving as a representative in the OpenMP ARB.
}
\newcommand{\myExpFour}{
\item Implemented and researched LLVM OpenMP User-defined Loop Schedules (UDS).
\item Added a UDS feature to Charm++'s CkLoop, with one PR merged.
\item Analyzed and optimized science simulations on NVIDIA GPUs via CUPTI and autotuning.
}
\textbf{Sandia National Laboratories}\\
\textit{Principal Member of Technical Staff II} \hfill \textit{July 2024 - Present}
%\vspace{-0.02in}
\noindent
\begin{itemize}\onlyitems[include={1,2}]
\myExpOne
\end{itemize}
\noindent
\textit{Senior Member of Technical Staff} \hfill \textit{August 2022 - June 2024}
%\vspace{-0.02in}
\begin{itemize}
\myExpTwo
\end{itemize}
\textbf{Brookhaven National Laboratory}\hfill
\textit{Assistant Computational Scientist} \hfill \textit{May 2019 - August 2022}
%\vspace{-0.02in}
\begin{itemize}
\myExpThree
\end{itemize}
\noindent
\textbf{Charmworks and ISI}\hfill
\textit{Software Engineer} \hfill \textit{Jan 2016 - April 2019}
\vspace{-0.02in}
\begin{itemize}
\myExpFour
\end{itemize}
\noindent
\comments{
\textbf{University of Southern California / ISI}\hfill
\textit{Computer Scientist} \hfill \textit{Dec 2016 - Jun 2018}
%\vspace{-0.02in}
\begin{itemize}
\item Worked in a team to manage the computational performance of an application program for 3-D image reconstruction algorithms on NVIDIA GPUs.
%\item Ensured external network infrastructure to support transfer of application code's input data files were adequate for an application code's efficient execution using the Globus Toolkit.
%\item Translated an x-ray tomography code written in Matlab code to C code and then parallelizing it to run on a supercomputer
%having nodes with GPGPUs.
%\item \small Doing optimizations for MPI+CUDA application code involving low-overhead loop scheduling and loop optimizations such as loop unrolling.
%\item \small Working on transformations in LLVM.
\end{itemize}
\noindent
\textbf{Charmworks}\hfill
\textit{Software Developer} \hfill \textit{Jan 2016 - Nov 2016}
%\vspace*{-0.02in}
\begin{itemize}
\item Implemented user-defined loop schedules within Charm++'s thread scheduling library CkLoop.
%TODO: consider adding 'including in cloud environments' to the end of
%the sentence.
%TODO: make paragraph
%\item Helped to improve portability of Charm++ to a variety of platforms.
%\item Assisted with business aspects of a high-tech startup.
\end{itemize}
}
\noindent
\textbf{University of Illinois}\hfill
\textit{Postdoctoral Associate} \hfill \textit{Jul 2015 - Dec 2015}
%\vspace*{-0.02in}
\begin{itemize}
%\item Developed LLVM OpenMP lw-sched library that allows application programmers to use strategies from dissertation.
\item Exhibited problem solving: made an MPI+OpenMP+OpenACC science simulation code 1.24x faster.
%\item Incorporated over-decomposition and locality awareness into low-overhead OpenMP loop scheduling strategies.
\end{itemize}