168 changes: 101 additions & 67 deletions README.md
@@ -1,92 +1,126 @@
# Strobelight
# BPF GPUEventSnoop with LLM-based CUDA Trace Analysis

![Strobelight Logo](images/Strobelight_brandmark_full-color-black-text.svg)
Strobelight is a fleetwide profiler framework developed by Meta that provides comprehensive profiling capabilities across large-scale infrastructure, helping to identify performance bottlenecks and optimize resource utilization across a fleet of machines.
GPUEventSnoop traces CUDA GPU kernel functions via BPF and provides in-depth analysis through visualizations and optional LLM (Large Language Model)-powered summaries.

Strobelight is composed of a number of profilers; each profiler collects a certain type of profile, such as CPU, GPU, or memory profiles.
---

## gpuevent profiler
The `gpuevent` profiler attaches to `cudaLaunchKernel` events and collects information about the kernels being launched, including the demangled kernel name, arguments, stacks, and launch dimensions.
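For illustration, here is a minimal CUDA program (hypothetical, not part of this repo) whose launch the profiler would observe; the `<<<grid, block>>>` syntax compiles down to a `cudaLaunchKernel` call, which is the uprobe site the profiler attaches to.

```cuda
// illustrative_launch.cu -- hypothetical example, not part of this repo.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float* data, float factor, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

int main() {
  const int n = 1 << 20;
  float* d_data = nullptr;
  cudaMalloc(&d_data, n * sizeof(float));

  // The launch below goes through cudaLaunchKernel; the profiler would report
  // the demangled name "scale(float*, float, int)", GRID (4096,1,1),
  // BLOCK (256,1,1), and the argument values when -a is passed.
  scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);

  cudaDeviceSynchronize();
  cudaFree(d_data);
  printf("done\n");
  return 0;
}
```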
## 🚀 Prerequisites

## Getting Started
- NVIDIA GPU instance
- Ubuntu with kernel headers
- CUDA Toolkit installed (`nvcc` should be at `/usr/local/cuda/bin/nvcc`)

### Prerequisites
---

- A Linux-based system.
- GPU host with [NVIDIA CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)
- A CUDA binary for testing
- CMake
## 🔧 Install Required Packages

### Installation
```bash
sudo apt update
sudo apt install -y clang llvm libbpf-dev
sudo apt install -y linux-headers-$(uname -r)
sudo apt install -y build-essential git cmake libelf-dev libfl-dev pkg-config
```

1. Clone the repository:
```bash
git clone https://github.com/facebookincubator/strobelight.git
```
## 🛠️ Build Strobelight and GPUEventSnoop

```bash
cd strobelight
./scripts/build.sh
```

2. Navigate to the project directory and follow the build instructions:
```bash
cd strobelight
./scripts/build.sh -u
```
- The BPF source for the user-space and kernel programs lives in `strobelight/strobelight/src/profilers/gpuevent_snoop`
- After a successful build, binaries are placed in `strobelight/strobelight/src/_build/profilers`

### Usage
## 🧪 Run the Profiler

```bash
cd strobelight/strobelight/src/_build/profilers
./gpuevent_snoop -p <PID>
./gpuevent_snoop --help   # For all options
```

Once the build is done, you can run the generated binary against any CUDA PID:
```bash
$ strobelight/src/_build/profilers/gpuevent_snoop --help
Usage: gpuevent_snoop [OPTION...]
GpuEventSnoop.

Traces GPU kernel function execution and its input parameters

USAGE: ./gpuevent_snoop -p PID [-v] [-d duration_sec]

  -a, --args             Collect Kernel Launch Arguments
  -d, --duration=SEC     Trace for given number of seconds
  -p, --pid=PID          Trace process with given PID
  -r, --rb-count=CNT     RingBuf max entries
  -s, --stacks           Collect Kernel Launch Stacks
  -v, --verbose          Verbose debug output
  -?, --help             Give this help list
      --usage            Give a short usage message
```

Supported CUDA routines traced:

- `cudaMalloc`, `cudaFree`
- `cudaMemcpy`, `cudaMemcpyAsync`
- `cudaLaunchKernel`
- `cudaStreamCreate`, `cudaStreamDestroy`
- `cudaStreamSynchronize`, `cudaDeviceSynchronize`

## 🧫 Test Programs and Tracing

### Build a Sample Program

```bash
/usr/local/cuda/bin/nvcc test_cuda_api_multi_gpu.cu -o test_cuda_api_multi_gpu
```
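The source of `test_cuda_api_multi_gpu.cu` is not reproduced in this README; the sketch below is a hypothetical, single-GPU stand-in that exercises each of the traced routines listed above, so every probe fires at least once.

```cuda
// test_sketch.cu -- hypothetical stand-in, not the actual test_cuda_api_multi_gpu.cu.
#include <vector>
#include <cuda_runtime.h>

__global__ void add_one(int* v, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) v[i] += 1;
}

int main() {
  const int n = 1024;
  std::vector<int> host(n, 0);

  int* dev = nullptr;
  cudaMalloc(&dev, n * sizeof(int));                        // cudaMalloc

  cudaStream_t stream;
  cudaStreamCreate(&stream);                                // cudaStreamCreate

  cudaMemcpyAsync(dev, host.data(), n * sizeof(int),
                  cudaMemcpyHostToDevice, stream);          // cudaMemcpyAsync
  add_one<<<(n + 255) / 256, 256, 0, stream>>>(dev, n);     // cudaLaunchKernel
  cudaStreamSynchronize(stream);                            // cudaStreamSynchronize

  cudaMemcpy(host.data(), dev, n * sizeof(int),
             cudaMemcpyDeviceToHost);                       // cudaMemcpy
  cudaDeviceSynchronize();                                  // cudaDeviceSynchronize

  cudaStreamDestroy(stream);                                // cudaStreamDestroy
  cudaFree(dev);                                            // cudaFree
  return 0;
}
```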
### Run and Collect Trace

```bash
./gpuevent_snoop -p <pid> -a -s
Found Symbol cudaLaunchKernel at /strobelight/oss/src/cuda_example/__cuda_kernel_example__/cuda_kernel_example Offset: 0xca480
Started profiling at Thu Apr 4 13:20:28 2024
cuda_kernel_exa [4024506] KERNEL [0x269710] STREAM 0x0 GRID (1,1,1) BLOCK (256,1,1) add_vectors(double*, double*, do...
Args: add_vectors arg0=0x7f2096800000
    double arg1=0x7f2096800400
    double arg2=0x7f2096800800
    double arg3=0x100000064
    int arg4=0x7ffc2a866690
Stack:
  00000000002cb480: cudaLaunchKernel @ 0x2cb480+0x0
  000000000026a050: main @ 0x26a050+0x912
  000000000002c5f0: libc_start_call_main @ 0x2c5f0+0x67
  000000000002c690: libc_start_main_alias_2 @ 0x2c690+0x88
  0000000000269330: _start @ 0x269330+0x21
...
```

Update the demo `collect_trace.sh` with the correct `$REPO` path. The script builds and runs the CUDA program above and traces it with `gpuevent_snoop`:

```bash
$ ./collect_trace.sh > sample-trace.out
```

A sample trace file (`sample-trace.out`) is provided for testing.
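The `add_vectors` kernel in the example output above has its signature truncated (`do...`); the sketch below is a plausible reconstruction, for illustration only, that would produce a similar launch record (one block of 256 threads, several device pointers, and an integer length). The real kernel's argument list likely differs.

```cuda
// add_vectors.cu -- hypothetical reconstruction of the kernel seen in the trace above.
#include <cuda_runtime.h>

__global__ void add_vectors(double* a, double* b, double* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = a[i] + b[i];
}

int main() {
  const int n = 100;  // illustrative size; 0x64 == 100 shows up in arg3 above, but that is a guess
  double *a, *b, *out;
  cudaMalloc(&a, n * sizeof(double));
  cudaMalloc(&b, n * sizeof(double));
  cudaMalloc(&out, n * sizeof(double));

  // One block of 256 threads matches GRID (1,1,1) / BLOCK (256,1,1) in the trace.
  add_vectors<<<1, 256>>>(a, b, out, n);
  cudaDeviceSynchronize();

  cudaFree(a); cudaFree(b); cudaFree(out);
  return 0;
}
```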

## Contributing
## 📊 CUDA Trace Analysis (LLM-Enhanced)

We welcome contributions from the community. To contribute:
This toolset parses and analyzes CUDA traces, producing visualizations and summaries from LLMs such as OpenAI GPT.

1. Fork the repository and create a new branch from `main`.
2. Make your changes, ensuring they adhere to the project's coding standards.
3. Submit a pull request, including a detailed description of your changes.

For more information, please refer to the [Contributing Guide](https://github.com/facebookincubator/strobelight/blob/main/CONTRIBUTING.md).
## 🔧 Setup Environment

## License
```bash
cd llm-analysis
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
```

Strobelight is licensed under the Apache License, Version 2.0. See the [LICENSE](https://github.com/facebookincubator/strobelight/blob/main/LICENSE) file for more details.
## 📈 Run Analysis

With an OpenAI API key:

```bash
python ./enhanced_cuda_trace_analysis.py trace.out --llm_mode openai --api_key YOUR_API_KEY
```

Without an API key (mock mode):

```bash
python ./enhanced_cuda_trace_analysis.py trace.out --llm_mode mock
```

Results are stored in the `cuda_analysis_results` folder.


## 🧰 Command Line Options

- `trace_file`: Path to the CUDA trace file (required, positional)
- `--output_dir`: Output folder (default: `./cuda_analysis_results`)
- `--llm_mode`: LLM mode (`mock`, `openai`, `local`; default: `mock`)
- `--api_key`: OpenAI API key (required for `openai` mode)
- `--model_endpoint`: Local LLM API endpoint (default: `http://localhost:8000`)
- `--skip_parsing`: Skip trace parsing
- `--skip_analysis`: Skip trace data analysis
- `--skip_visualization`: Skip visualization generation
- `--test_llm`: Run LLM test suite

## 📂 Output Artifacts
- Parsed trace data (JSON)
- Analysis results (JSON)
- Visualizations (PNG)
- Enhanced dashboards
- LLM analysis (Markdown, HTML)
- Final summary reports

Sample outputs:
- llm-sample-results/
- sample_llm_analysis_report.html


## 📁 Repository Structure

```
.
├── strobelight/                      # Strobelight GPU profiler
├── llm-analysis/                     # LLM analysis tools
├── collect_trace.sh                  # CUDA trace demo script
├── test_cuda_api_multi_gpu.cu        # Demo CUDA program for tracing
├── sample_llm_analysis_report.html   # Example output
├── llm-sample-results/               # Example LLM results
└── README.md
```


## 🧠 Components in `llm-analysis/`

- `cuda_trace_parser.py` – Parses trace data
- `cuda_trace_analyzer.py` – Analyzes kernel launches
- `cuda_visualization_organizer.py` – Generates visualizations
- `enhanced_cuda_llm_analyzer.py` – Performs LLM analysis
- `enhanced_cuda_trace_analysis.py` – CLI wrapper
- `cuda_prompt_templates.py` – Prompt templates for LLMs
- `cuda_llm_analysis_tester.py` – LLM test suite

## 📬 Feedback & Contributions

Feedback and contributions are welcome!
Please open an issue or pull request to help improve this GPU tracing and analysis framework.

## Acknowledgements

This project is maintained by Meta's engineering team and is open to community contributions. We thank all contributors for their efforts in improving this project.
43 changes: 43 additions & 0 deletions collect_trace.sh
@@ -0,0 +1,43 @@
#!/bin/bash

# Set variables
CUDA_PROGRAM="test_cuda_api_multi_gpu.cu"  # source of the CUDA program to be traced
EXECUTABLE="test_cuda_api_multi_gpu"       # executable of the CUDA program to be traced
REPO="/efs/NFLX-GENAI-PROJECTS/GPUSNOOP"   # full path of the directory where the repository is cloned
TRACE_DURATION=30                          # run gpuevent_snoop for 30 seconds
GPU_EVENTSNOOP="$REPO/strobelight/strobelight/src/_build/profilers/gpuevent_snoop"  # BPF user-space program

# Step 1: Compile the CUDA Program
echo "Compiling $CUDA_PROGRAM..."
/usr/local/cuda/bin/nvcc $CUDA_PROGRAM -o $EXECUTABLE
if [ $? -ne 0 ]; then
    echo "Compilation failed!"
    exit 1
fi
echo "Compilation successful"

# Step 2: Run the Program in the Background
echo "Starting $EXECUTABLE..."
./$EXECUTABLE &
CUDA_PID=$!

# Give some time for the process to start
sleep 3

# Step 3: Verify the Process is Running
if ! ps -p $CUDA_PID > /dev/null; then
    echo "Error: CUDA process ($CUDA_PID) is not running!"
    exit 1
fi
echo "CUDA process running with PID: $CUDA_PID"

# Step 4: Run gpuevent_snoop for TRACE_DURATION seconds
echo "Running gpuevent_snoop for $TRACE_DURATION seconds..."
sudo $GPU_EVENTSNOOP -p $CUDA_PID -a -s -v --duration=$TRACE_DURATION

# Step 5: Kill the CUDA Program After Tracing (Optional)
echo "Stopping CUDA program..."
kill $CUDA_PID

echo "Tracing completed."

52 changes: 52 additions & 0 deletions llm-analysis/README.md
@@ -0,0 +1,52 @@
# CUDA Trace Analysis Tool

This tool analyzes CUDA trace files generated by BPF programs, providing detailed insights through visualizations and LLM-based analysis.

## Installation

```bash
pip install -r requirements.txt
```

## Usage

```bash
# Basic usage
python enhanced_cuda_trace_analysis.py /path/to/trace.out --output_dir ./output_directory --llm_mode mock

# With OpenAI API for enhanced analysis
python enhanced_cuda_trace_analysis.py /path/to/trace.out --output_dir ./output_directory --llm_mode openai --api_key YOUR_API_KEY
```

## Command Line Options

- `trace_file`: Path to the CUDA trace file (required)
- `--output_dir`: Output directory for analysis results (default: ./cuda_analysis_results)
- `--llm_mode`: LLM analysis mode (choices: mock, openai, local; default: mock)
- `--api_key`: OpenAI API key (required for openai mode)
- `--model_endpoint`: Local LLM API endpoint (for local mode; default: http://localhost:8000/v1/chat/completions)
- `--skip_parsing`: Skip trace file parsing (use existing parsed data)
- `--skip_analysis`: Skip trace data analysis (use existing analysis results)
- `--skip_visualization`: Skip visualization enhancement (use existing enhanced visualizations)
- `--test_llm`: Test LLM analysis using the testing framework

## Output

The tool generates the following outputs in the specified output directory:
- Parsed trace data (JSON)
- Analysis results (JSON)
- Visualizations (PNG)
- Enhanced visualizations and dashboard
- LLM analysis reports (Markdown)
- HTML report with integrated visualizations and analysis
- Final summary report (Markdown)

## Components

- `cuda_trace_parser.py`: Parses CUDA trace files into structured data
- `cuda_trace_analyzer.py`: Analyzes trace data and generates visualizations
- `cuda_visualization_organizer.py`: Enhances visualizations and creates dashboard
- `enhanced_cuda_llm_analyzer.py`: Performs LLM-based analysis of trace data
- `enhanced_cuda_trace_analysis.py`: Main program integrating all components
- `cuda_prompt_templates.py`: Templates for LLM prompts
- `cuda_llm_analysis_tester.py`: Testing framework for LLM analysis