168 changes: 101 additions & 67 deletions README.md
@@ -1,92 +1,126 @@
# Strobelight
# BPF GPUEventSnoop with LLM-based CUDA Trace Analysis

![Strobelight Logo](images/Strobelight_brandmark_full-color-black-text.svg)
Strobelight is a fleetwide profiler framework developed by Meta that provides comprehensive profiling capabilities across large-scale infrastructure, helping to identify performance bottlenecks and optimize resource utilization across a fleet of machines.
GPUEventSnoop traces CUDA GPU kernel functions via BPF and provides in-depth analysis through visualizations and optional LLM (Large Language Model)-powered summaries.

Strobelight is composed of a number of profilers; each profiler collects a certain type of profile, such as CPU, GPU, or memory profiles.
---

## gpuevent profiler
The `gpuevent` profiler attaches to `cudaLaunchKernel` events and collects information about the kernels being launched, including the demangled kernel name, arguments, stacks, and launch dimensions.
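For illustration, here is a minimal CUDA program (hypothetical, not part of this repo) whose launch the profiler would observe; the `<<<grid, block>>>` syntax compiles down to a `cudaLaunchKernel` call, which is the uprobe site the profiler attaches to.

```cuda
// illustrative_launch.cu -- hypothetical example, not part of this repo.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float* data, float factor, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

int main() {
  const int n = 1 << 20;
  float* d_data = nullptr;
  cudaMalloc(&d_data, n * sizeof(float));

  // The launch below goes through cudaLaunchKernel; the profiler would report
  // the demangled name "scale(float*, float, int)", GRID (4096,1,1),
  // BLOCK (256,1,1), and the argument values when -a is passed.
  scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);

  cudaDeviceSynchronize();
  cudaFree(d_data);
  printf("done\n");
  return 0;
}
```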
## 🚀 Prerequisites

## Getting Started
- NVIDIA GPU instance
- Ubuntu with kernel headers
- CUDA Toolkit installed (`nvcc` should be at `/usr/local/cuda/bin/nvcc`)

### Prerequisites
---

- A Linux-based system.
- GPU host with [NVIDIA CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)
- A CUDA binary for testing
- CMake
## 🔧 Install Required Packages

### Installation
```bash
sudo apt update
sudo apt install -y clang llvm libbpf-dev
sudo apt install -y linux-headers-$(uname -r)
sudo apt install -y build-essential git cmake libelf-dev libfl-dev pkg-config
```

1. Clone the repository:
```bash
git clone https://github.com/facebookincubator/strobelight.git
```
## 🛠️ Build Strobelight and GPUEventSnoop

```bash
cd strobelight
./scripts/build.sh
```

2. Navigate to the project directory and follow the build instructions:
```bash
cd strobelight
./scripts/build.sh -u
```
- The BPF source for the user-space and kernel programs lives in `strobelight/strobelight/src/profilers/gpuevent_snoop`
- After a successful build, binaries are placed in `strobelight/strobelight/src/_build/profilers`

### Usage
## 🧪 Run the Profiler

```bash
cd strobelight/strobelight/src/_build/profilers
./gpuevent_snoop -p <PID>
./gpuevent_snoop --help   # For all options
```

Once the build is done, you can run the generated binary against any CUDA PID:
```bash
$ strobelight/src/_build/profilers/gpuevent_snoop --help
Usage: gpuevent_snoop [OPTION...]
GpuEventSnoop.

Traces GPU kernel function execution and its input parameters

USAGE: ./gpuevent_snoop -p PID [-v] [-d duration_sec]

  -a, --args             Collect Kernel Launch Arguments
  -d, --duration=SEC     Trace for given number of seconds
  -p, --pid=PID          Trace process with given PID
  -r, --rb-count=CNT     RingBuf max entries
  -s, --stacks           Collect Kernel Launch Stacks
  -v, --verbose          Verbose debug output
  -?, --help             Give this help list
      --usage            Give a short usage message
```

Supported CUDA routines traced:

- `cudaMalloc`, `cudaFree`
- `cudaMemcpy`, `cudaMemcpyAsync`
- `cudaLaunchKernel`
- `cudaStreamCreate`, `cudaStreamDestroy`
- `cudaStreamSynchronize`, `cudaDeviceSynchronize`

## 🧫 Test Programs and Tracing

### Build a Sample Program

```bash
/usr/local/cuda/bin/nvcc test_cuda_api_multi_gpu.cu -o test_cuda_api_multi_gpu
```
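The source of `test_cuda_api_multi_gpu.cu` is not reproduced in this README; the sketch below is a hypothetical, single-GPU stand-in that exercises each of the traced routines listed above, so every probe fires at least once.

```cuda
// test_sketch.cu -- hypothetical stand-in, not the actual test_cuda_api_multi_gpu.cu.
#include <vector>
#include <cuda_runtime.h>

__global__ void add_one(int* v, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) v[i] += 1;
}

int main() {
  const int n = 1024;
  std::vector<int> host(n, 0);

  int* dev = nullptr;
  cudaMalloc(&dev, n * sizeof(int));                        // cudaMalloc

  cudaStream_t stream;
  cudaStreamCreate(&stream);                                // cudaStreamCreate

  cudaMemcpyAsync(dev, host.data(), n * sizeof(int),
                  cudaMemcpyHostToDevice, stream);          // cudaMemcpyAsync
  add_one<<<(n + 255) / 256, 256, 0, stream>>>(dev, n);     // cudaLaunchKernel
  cudaStreamSynchronize(stream);                            // cudaStreamSynchronize

  cudaMemcpy(host.data(), dev, n * sizeof(int),
             cudaMemcpyDeviceToHost);                       // cudaMemcpy
  cudaDeviceSynchronize();                                  // cudaDeviceSynchronize

  cudaStreamDestroy(stream);                                // cudaStreamDestroy
  cudaFree(dev);                                            // cudaFree
  return 0;
}
```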
### Run and Collect Trace

```bash
./gpuevent_snoop -p <pid> -a -s
Found Symbol cudaLaunchKernel at /strobelight/oss/src/cuda_example/__cuda_kernel_example__/cuda_kernel_example Offset: 0xca480
Started profiling at Thu Apr 4 13:20:28 2024
cuda_kernel_exa [4024506] KERNEL [0x269710] STREAM 0x0 GRID (1,1,1) BLOCK (256,1,1) add_vectors(double*, double*, do...
Args: add_vectors arg0=0x7f2096800000
    double arg1=0x7f2096800400
    double arg2=0x7f2096800800
    double arg3=0x100000064
    int arg4=0x7ffc2a866690
Stack:
  00000000002cb480: cudaLaunchKernel @ 0x2cb480+0x0
  000000000026a050: main @ 0x26a050+0x912
  000000000002c5f0: libc_start_call_main @ 0x2c5f0+0x67
  000000000002c690: libc_start_main_alias_2 @ 0x2c690+0x88
  0000000000269330: _start @ 0x269330+0x21
...
```

Update the demo `collect_trace.sh` with the correct `$REPO` path. The script builds and runs the CUDA program above and traces it with `gpuevent_snoop`:

```bash
$ ./collect_trace.sh > sample-trace.out
```

A sample trace file (`sample-trace.out`) is provided for testing.
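The `add_vectors` kernel in the example output above has its signature truncated (`do...`); the sketch below is a plausible reconstruction, for illustration only, that would produce a similar launch record (one block of 256 threads, several device pointers, and an integer length). The real kernel's argument list likely differs.

```cuda
// add_vectors.cu -- hypothetical reconstruction of the kernel seen in the trace above.
#include <cuda_runtime.h>

__global__ void add_vectors(double* a, double* b, double* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = a[i] + b[i];
}

int main() {
  const int n = 100;  // illustrative size; 0x64 == 100 shows up in arg3 above, but that is a guess
  double *a, *b, *out;
  cudaMalloc(&a, n * sizeof(double));
  cudaMalloc(&b, n * sizeof(double));
  cudaMalloc(&out, n * sizeof(double));

  // One block of 256 threads matches GRID (1,1,1) / BLOCK (256,1,1) in the trace.
  add_vectors<<<1, 256>>>(a, b, out, n);
  cudaDeviceSynchronize();

  cudaFree(a); cudaFree(b); cudaFree(out);
  return 0;
}
```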

## Contributing
## 📊 CUDA Trace Analysis (LLM-Enhanced)

We welcome contributions from the community. To contribute:
This toolset parses and analyzes CUDA traces, producing visualizations and summaries from LLMs such as OpenAI GPT.

1. Fork the repository and create a new branch from `main`.
2. Make your changes, ensuring they adhere to the project's coding standards.
3. Submit a pull request, including a detailed description of your changes.

For more information, please refer to the [Contributing Guide](https://github.com/facebookincubator/strobelight/blob/main/CONTRIBUTING.md).
## 🔧 Setup Environment

## License
```bash
cd llm-analysis
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
```

Strobelight is licensed under the Apache License, Version 2.0. See the [LICENSE](https://github.com/facebookincubator/strobelight/blob/main/LICENSE) file for more details.
## 📈 Run Analysis

With an OpenAI API key:

```bash
python ./enhanced_cuda_trace_analysis.py trace.out --llm_mode openai --api_key YOUR_API_KEY
```

Without an API key (mock mode):

```bash
python ./enhanced_cuda_trace_analysis.py trace.out --llm_mode mock
```

Results are stored in the `cuda_analysis_results` folder.


## 🧰 Command Line Options

- `trace_file`: Path to the CUDA trace file (required, positional)
- `--output_dir`: Output folder (default: `./cuda_analysis_results`)
- `--llm_mode`: LLM mode (`mock`, `openai`, `local`; default: `mock`)
- `--api_key`: OpenAI API key (required for `openai` mode)
- `--model_endpoint`: Local LLM API endpoint (default: `http://localhost:8000`)
- `--skip_parsing`: Skip trace parsing
- `--skip_analysis`: Skip trace data analysis
- `--skip_visualization`: Skip visualization generation
- `--test_llm`: Run LLM test suite

## 📂 Output Artifacts
- Parsed trace data (JSON)
- Analysis results (JSON)
- Visualizations (PNG)
- Enhanced dashboards
- LLM analysis (Markdown, HTML)
- Final summary reports

Sample outputs:
- llm-sample-results/
- sample_llm_analysis_report.html


## 📁 Repository Structure

```
.
├── strobelight/                      # Strobelight GPU profiler
├── llm-analysis/                     # LLM analysis tools
├── collect_trace.sh                  # CUDA trace demo script
├── test_cuda_api_multi_gpu.cu        # Demo CUDA program for tracing
├── sample_llm_analysis_report.html   # Example output
├── llm-sample-results/               # Example LLM results
└── README.md
```


## 🧠 Components in `llm-analysis/`

- `cuda_trace_parser.py` – Parses trace data
- `cuda_trace_analyzer.py` – Analyzes kernel launches
- `cuda_visualization_organizer.py` – Generates visualizations
- `enhanced_cuda_llm_analyzer.py` – Performs LLM analysis
- `enhanced_cuda_trace_analysis.py` – CLI wrapper
- `cuda_prompt_templates.py` – Prompt templates for LLMs
- `cuda_llm_analysis_tester.py` – LLM test suite

## 📬 Feedback & Contributions

Feedback and contributions are welcome!
Please open an issue or pull request to help improve this GPU tracing and analysis framework.

## Acknowledgements

This project is maintained by Meta's engineering team and is open to community contributions. We thank all contributors for their efforts in improving this project.
43 changes: 43 additions & 0 deletions collect_trace.sh
@@ -0,0 +1,43 @@
#!/bin/bash

# Set variables
CUDA_PROGRAM="test_cuda_api_multi_gpu.cu"  # source of the CUDA program to be traced
EXECUTABLE="test_cuda_api_multi_gpu"       # executable of the CUDA program to be traced
REPO="/efs/NFLX-GENAI-PROJECTS/GPUSNOOP"   # full path of the directory where the repository is cloned
TRACE_DURATION=30                          # run gpuevent_snoop for 30 seconds
GPU_EVENTSNOOP="$REPO/strobelight/strobelight/src/_build/profilers/gpuevent_snoop"  # BPF user-space program

# Step 1: Compile the CUDA Program
echo "Compiling $CUDA_PROGRAM..."
/usr/local/cuda/bin/nvcc $CUDA_PROGRAM -o $EXECUTABLE
if [ $? -ne 0 ]; then
    echo "Compilation failed!"
    exit 1
fi
echo "Compilation successful"

# Step 2: Run the Program in the Background
echo "Starting $EXECUTABLE..."
./$EXECUTABLE &
CUDA_PID=$!

# Give some time for the process to start
sleep 3

# Step 3: Verify the Process is Running
if ! ps -p $CUDA_PID > /dev/null; then
    echo "Error: CUDA process ($CUDA_PID) is not running!"
    exit 1
fi
echo "CUDA process running with PID: $CUDA_PID"

# Step 4: Run gpuevent_snoop for TRACE_DURATION seconds
echo "Running gpuevent_snoop for $TRACE_DURATION seconds..."
sudo $GPU_EVENTSNOOP -p $CUDA_PID -a -s -v --duration=$TRACE_DURATION

# Step 5: Kill the CUDA Program After Tracing (Optional)
echo "Stopping CUDA program..."
kill $CUDA_PID

echo "Tracing completed."

52 changes: 52 additions & 0 deletions llm-analysis/README.md
@@ -0,0 +1,52 @@
# CUDA Trace Analysis Tool

This tool analyzes CUDA trace files generated by BPF programs, providing detailed insights through visualizations and LLM-based analysis.

## Installation

```bash
pip install -r requirements.txt
```

## Usage

```bash
# Basic usage
python enhanced_cuda_trace_analysis.py /path/to/trace.out --output_dir ./output_directory --llm_mode mock

# With OpenAI API for enhanced analysis
python enhanced_cuda_trace_analysis.py /path/to/trace.out --output_dir ./output_directory --llm_mode openai --api_key YOUR_API_KEY
```

## Command Line Options

- `trace_file`: Path to the CUDA trace file (required)
- `--output_dir`: Output directory for analysis results (default: ./cuda_analysis_results)
- `--llm_mode`: LLM analysis mode (choices: mock, openai, local; default: mock)
- `--api_key`: OpenAI API key (required for openai mode)
- `--model_endpoint`: Local LLM API endpoint (for local mode; default: http://localhost:8000/v1/chat/completions)
- `--skip_parsing`: Skip trace file parsing (use existing parsed data)
- `--skip_analysis`: Skip trace data analysis (use existing analysis results)
- `--skip_visualization`: Skip visualization enhancement (use existing enhanced visualizations)
- `--test_llm`: Test LLM analysis using the testing framework

## Output

The tool generates the following outputs in the specified output directory:
- Parsed trace data (JSON)
- Analysis results (JSON)
- Visualizations (PNG)
- Enhanced visualizations and dashboard
- LLM analysis reports (Markdown)
- HTML report with integrated visualizations and analysis
- Final summary report (Markdown)

## Components

- `cuda_trace_parser.py`: Parses CUDA trace files into structured data
- `cuda_trace_analyzer.py`: Analyzes trace data and generates visualizations
- `cuda_visualization_organizer.py`: Enhances visualizations and creates dashboard
- `enhanced_cuda_llm_analyzer.py`: Performs LLM-based analysis of trace data
- `enhanced_cuda_trace_analysis.py`: Main program integrating all components
- `cuda_prompt_templates.py`: Templates for LLM prompts
- `cuda_llm_analysis_tester.py`: Testing framework for LLM analysis