
Preserve run details and allow custom stack metadata in "Run-Only" mode #876

@adinilfeld

Description


Component

Analysis/Plotting

Desired use case or feature

When executing llmdbenchmark run against an existing inference endpoint (e.g., using --endpoint-url without a prior standup), the resulting benchmark_report.json should still contain complete and accurate metadata. Currently, much of this metadata is dropped or defaults to empty values due to a combination of subshell scoping bugs and hardcoded assumptions about the standup phase.

There are two distinct areas where metadata drops out during run-only execution:

  1. Lost Harness Execution Details: The harness execution wrapper (workload/harnesses/inference-perf-llm-d-benchmark.sh) does attempt to record run metadata by exporting environment variables just before it exits:
export LLMDBENCH_HARNESS_START=$(date -d "@${start}" --iso-8601=seconds)
export LLMDBENCH_HARNESS_ARGS="--config_file ..."
export LLMDBENCH_HARNESS_VERSION=...

However, the parent orchestrator (build/llm-d-benchmark.sh) executes this wrapper as a child process (/usr/local/bin/${LLMDBENCH_RUN_EXPERIMENT_HARNESS}). Because the wrapper is executed rather than sourced, its exported variables die with the child shell when it exits.

By the time the analyzer executes (/usr/local/bin/${LLMDBENCH_RUN_EXPERIMENT_ANALYZER}), those variables are lost. The conversion script (native_to_br0_2.py) attempts os.environ.get("LLMDBENCH_HARNESS_START"), finds nothing, and leaves scenario.load.native.args and the custom timing metrics empty or null.

  2. Missing stack specification ConfigMap: Because standup was never run, the llm-d-benchmark-standup-parameters ConfigMap is never created in the cluster, and /standup/ev.yaml isn't mounted into the launcher pod. When native_to_br0_2.py queries the active Kubernetes namespace to populate the stack specification (e.g. TP, DP, accelerator type, model name), it finds an empty volume/ConfigMap, yielding a completely blank stack representation in the final report.
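The variable loss described in item 1 can be reproduced in isolation. A minimal Python sketch (the variable name is taken from the wrapper above; everything else is illustrative):

```python
import os
import subprocess

# Run a child shell that exports a variable, mirroring how the
# orchestrator executes the harness wrapper as a child process
# instead of sourcing it.
subprocess.run(
    ["sh", "-c", "export LLMDBENCH_HARNESS_START=2024-01-01T00:00:00"],
    check=True,
)

# The export happened in the child's environment only; the parent
# process (and any later sibling process, such as the analyzer)
# never sees it.
print(os.environ.get("LLMDBENCH_HARNESS_START"))  # prints None
```

This is exactly the situation native_to_br0_2.py ends up in: by the time it calls os.environ.get, the exporting process is long gone.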

Proposed solution

  1. Fix the subshell variable loss: Instead of passing environment variables between sequential child scripts, have the harness script write the metadata (start time, duration delta, CLI args, tool version) to a predictable intermediary file within $LLMDBENCH_RUN_EXPERIMENT_RESULTS_DIR (e.g., run_metadata.yaml). Modify native_to_br0_2.py to read this artifact instead of relying solely on os.environ.

  2. Support custom Stack definitions in Run-Only Mode: Permit users running with --endpoint-url to provide an optional --topology YAML file or individual flags (e.g. --tp=4, --accelerator=h100) that pass the stack dimensions to the launcher directly. The analyzer would then fall back to injecting these CLI-supplied values into the resulting BenchmarkReport, so users analyzing external endpoints still get fully populated scenario.stack blocks.
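A minimal sketch of the analyzer-side fallback from item 1, assuming a flat "key: value" layout for run_metadata.yaml; the helper name load_run_metadata and the file schema are hypothetical, since the issue only proposes the artifact's name and location:

```python
import os
from pathlib import Path

def load_run_metadata(results_dir, keys):
    """Prefer the run_metadata.yaml artifact written by the harness;
    fall back to os.environ for any key the file does not provide.

    Assumes a flat 'KEY: value' line format (hypothetical schema).
    """
    meta = {}
    path = Path(results_dir) / "run_metadata.yaml"
    if path.is_file():
        for line in path.read_text().splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
    # Environment variables remain a fallback so existing in-pod
    # runs (where the variables do survive) keep working unchanged.
    for key in keys:
        meta.setdefault(key, os.environ.get(key))
    return meta
```

With this in place, the conversion script would call something like load_run_metadata(results_dir, ["LLMDBENCH_HARNESS_START", "LLMDBENCH_HARNESS_ARGS", "LLMDBENCH_HARNESS_VERSION"]) instead of reading os.environ directly.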

Alternatives

No response

Additional context or screenshots

No response
