[TRTLLM-8511][feat] AutoDeploy: optimize fused_mlp_moe_kernel tiles #8597
base: main
Conversation
📝 Walkthrough
This pull request adds more than 100 JSON configuration files for Triton fused Mixture-of-Experts (MoE) kernel tuning across multiple NVIDIA GPU devices, data types, and model configurations. Each file maps numeric batch-size keys to kernel launch parameters (BLOCK_SIZE_M, BLOCK_SIZE_N, BLOCK_SIZE_K, GROUP_SIZE_M, num_warps, num_stages). No code logic is altered; this is purely configuration data for the auto-deploy mechanism.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Rationale: While the changes are homogeneous (repetitive JSON configuration files following an identical structure), the large volume (100+ files) requires systematic spot-checking for JSON validity, parameter consistency, and plausibility of tuning values across devices. The review benefits from the consistent pattern but demands verification of coverage, proper naming conventions, and absence of obvious configuration errors or duplicates.
Pre-merge checks and finishing touches
❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Actionable comments posted: 4
🧹 Nitpick comments (9)
tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=16,N=1792,device_name=NVIDIA_H100_80GB_HBM3.json (1)
1-146: Confirm configuration tuning and validity. Since this PR introduces 100+ JSON config files without code changes, please verify:
- These configurations were generated through an automated tuning process on H100 hardware.
- The schema is validated at runtime when loading these configs.
- Fallback/default behavior is documented if a batch size key is not present in this file.
Consider adding a README or schema validation to document expected parameter ranges and any inter-parameter constraints (e.g., GROUP_SIZE_M must divide BLOCK_SIZE_M).
tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=128,N=768,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json (1)
1-146: Verify batch size lookup strategy and fallback handling in loader code. This JSON configuration file is syntactically valid and appears well-structured for mapping batch sizes to kernel tuning parameters. However, several aspects require verification in the loading/lookup mechanism:
Batch size coverage: The file defines configurations for specific batch sizes (1, 2, 4, 8, 16, 24, 32, 48, 64, 96, 128, 256, 512, 1024, 1536, 2048, 3072, 4096). How are requests for batch sizes not in this set handled? (e.g., batch size 5, 10, etc.)
Fallback mechanism: The PR mentions fallback to a default configuration if the device config is not found. Confirm that the fallback also handles batch sizes not defined in this file.
Parameter validation: Ensure the loading code validates that required fields (BLOCK_SIZE_M/N/K, GROUP_SIZE_M, num_warps, num_stages) are present and have reasonable numeric values.
Please verify the configuration loading mechanism by examining the code that reads these JSON files. The verification should confirm:
- How batch size lookups are performed (exact match vs. closest match vs. fallback to default)
- Error handling for missing batch sizes
- Parameter validation before use
Consider adding a top-level `_metadata` object to document the configuration schema and version:
{
    "_metadata": {
        "version": "1.0",
        "device": "NVIDIA_H200",
        "dtype": "fp8_w8a8",
        "E": 128,
        "N": 768,
        "block_shape": [128, 128],
        "description": "Triton MoE kernel parameters indexed by batch size"
    },
    "1": { "BLOCK_SIZE_M": 64, ... }
}
This would improve maintainability and self-documentation without changing the core lookup logic.
tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=128,N=768,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json (1)
2-145: Configuration patterns appear reasonable but lack tuning documentation. The file shows sensible progression: small batches (1–128) use conservative block sizes (BLOCK_SIZE_M=16), while larger batches (256+) scale up to BLOCK_SIZE_M=64 with increasing GROUP_SIZE_M, which aligns with typical Triton kernel tuning heuristics. However, `num_warps` (4) and `num_stages` (3) remain constant across all batch sizes; clarify whether this is device-specific tuning wisdom or a missed opportunity for further optimization.
Additionally, include a brief comment (either in the filename, a README, or the PR description) documenting the tuning methodology: Which benchmark/workload was used? What performance metric was optimized? This context is valuable for future maintenance and validation.
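For reference, the kind of process that typically produces these per-batch-size tables can be sketched as follows. This is a schematic only (the actual benchmark/workload used for this PR is exactly what the comment above asks to have documented); `run_fused_moe` is a hypothetical callable and the candidate ranges are illustrative:

```python
# Schematic brute-force tuner for per-batch-size tile configs.
import itertools

from triton.testing import do_bench  # returns median runtime in ms

CANDIDATES = {
    "BLOCK_SIZE_M": [16, 32, 64, 128],
    "BLOCK_SIZE_N": [32, 64, 128, 256],
    "BLOCK_SIZE_K": [64, 128, 256],
    "GROUP_SIZE_M": [1, 16, 32, 64],
    "num_warps": [4, 8],
    "num_stages": [3, 4, 5],
}


def tune(batch_sizes, run_fused_moe):
    """Return {batch_size_as_str: best_config} by timing every candidate."""
    best = {}
    for m in batch_sizes:
        results = []
        for values in itertools.product(*CANDIDATES.values()):
            cfg = dict(zip(CANDIDATES, values))
            ms = do_bench(lambda: run_fused_moe(m, **cfg))
            results.append((ms, cfg))
        best[str(m)] = min(results, key=lambda r: r[0])[1]
    return best
```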
tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json (2)
1-218: Consider adding schema metadata for maintainability and validation. With 100+ configuration files across multiple devices and configurations, adding metadata to the JSON would improve robustness:
{
    "_schema_version": "1.0",
    "_generated_at": "2025-10-22",
    "_device": "NVIDIA_A100-SXM4-80GB",
    "_experts": 16,
    "_hidden_dim": 1792,
    "_notes": "Batch sizes are keys; values are Triton kernel launch parameters.",
    "1": { ... },
    ...
}
This enables:
- Validation of file structure and compatibility
- Tracking of config age and maintenance
- Documentation of filename encoding (what do E and N represent?)
- Easier debugging of issues across the configuration suite
1-218: Establish validation and testing strategy for the configuration suite. With 100+ device-specific configuration files, consider:
- Schema validation: Add a CI check that validates all JSON files conform to the expected schema (required fields, value ranges); a minimal sketch follows this list.
- Coverage testing: Verify that batch sizes in each config file cover the expected range and that interpolation/fallback logic is tested.
- Consistency checks: Flag suspicious patterns (e.g., identical configs across different device types) that might indicate copy-paste errors.
- Configuration generation process: Document how these configs were tuned/generated (benchmark tool, hyperparameter search strategy, reproducibility) so future maintainers can regenerate or update them.
- Integration tests: Load each config file and verify it produces valid kernel launch parameters when queried.
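A minimal sketch of the first and last bullets above, assuming pytest and the config directory added in this PR; the value checks are illustrative and should be tightened to the real kernel constraints:

```python
import json
from pathlib import Path

import pytest

CONFIG_DIR = Path(
    "tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs"
)
REQUIRED_KEYS = {"BLOCK_SIZE_M", "BLOCK_SIZE_N", "BLOCK_SIZE_K",
                 "GROUP_SIZE_M", "num_warps", "num_stages"}


@pytest.mark.parametrize("cfg_file", sorted(CONFIG_DIR.glob("*.json")), ids=lambda p: p.name)
def test_config_file_is_well_formed(cfg_file: Path):
    data = json.loads(cfg_file.read_text())  # fails the test on invalid JSON
    batch_entries = {k: v for k, v in data.items() if k.isdigit()}
    assert batch_entries, "config file contains no batch-size entries"
    for batch_size, cfg in batch_entries.items():
        assert REQUIRED_KEYS <= cfg.keys(), f"{batch_size}: missing {REQUIRED_KEYS - cfg.keys()}"
        # Illustrative sanity bound; replace with the real Triton constraints.
        assert all(isinstance(cfg[k], int) and cfg[k] > 0 for k in REQUIRED_KEYS)
```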
tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json (1)
1-146: Valid configuration file; note indentation inconsistency. JSON syntax and structure are correct. Extended batch-size coverage (26 keys) is appropriate for larger model scenarios. However, this file uses 2-space indentation while earlier files (files 1–4) use 4-space indentation. This is a minor formatting inconsistency across the configuration set.
Consider normalizing indentation to 4 spaces across all configuration files for consistency.
tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json (1)
1-146: Valid configuration file; note indentation inconsistency. JSON syntax and structure are correct. Configuration values are within Triton valid ranges. However, this file uses 2-space indentation while most other files use 4-space indentation, reinforcing the formatting inconsistency observed across the configuration set.
Normalize indentation to 4 spaces across all configuration files for consistency.
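If the team standardizes on 4-space indentation, a one-off normalization pass is simple; a sketch (directory path as in this PR; re-serialization preserves key order):

```python
import json
from pathlib import Path

config_dir = Path(
    "tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs"
)
for cfg_file in sorted(config_dir.glob("*.json")):
    data = json.loads(cfg_file.read_text())
    # Re-serialize every file with a uniform 4-space indent and trailing newline.
    cfg_file.write_text(json.dumps(data, indent=4) + "\n")
```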
tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=1,N=3072,device_name=NVIDIA_H200,dtype=int8_w8a16.json (1)
1-146: JSON is valid, but indentation is inconsistent with the companion configuration file. This file uses 4-space indentation while `E=128,N=1024,device_name=NVIDIA_H200,dtype=fp8_w8a8.json` uses 2-space indentation. While not a functional issue, normalizing indentation across all configuration files would improve consistency and maintainability.
tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=16,N=1792,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json (1)
2-17: Minor: Identical configurations for batch sizes 1 and 2. Batch sizes "1" and "2" share identical kernel parameters. While this may be intentional (treating both as "small batch" cases), verify this was not an unintended duplication during configuration generation or tuning.
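A quick scan along these lines would surface any such duplicated neighbors across the whole suite (a sketch; requires Python 3.10+ for itertools.pairwise, and whether identical neighbors are a problem is a tuning question rather than a correctness one):

```python
import json
from itertools import pairwise
from pathlib import Path

config_dir = Path(
    "tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs"
)
for cfg_file in sorted(config_dir.glob("*.json")):
    data = json.loads(cfg_file.read_text())
    # Sort batch-size entries numerically, then compare each adjacent pair.
    entries = sorted((int(k), v) for k, v in data.items() if k.isdigit())
    for (m1, c1), (m2, c2) in pairwise(entries):
        if c1 == c2:
            print(f"{cfg_file.name}: batch sizes {m1} and {m2} share identical parameters")
```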
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (107)
All under tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/, each with 1 hunk:
- E=1,N=14336,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json
- E=1,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json
- E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json
- E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json
- E=1,N=1792,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json
- E=1,N=3072,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json
- E=1,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json
- E=1,N=3072,device_name=NVIDIA_H100_80GB_HBM3.json
- E=1,N=3072,device_name=NVIDIA_H200,dtype=int8_w8a16.json
- E=1,N=3584,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json
- E=1,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json
- E=1,N=3584,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json
- E=1,N=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json
- E=1,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json
- E=1,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json
- E=128,N=1024,device_name=NVIDIA_H100,dtype=fp8_w8a8.json
- E=128,N=1024,device_name=NVIDIA_H200,dtype=fp8_w8a8.json
- E=128,N=1024,device_name=NVIDIA_H200.json
- E=128,N=1856,device_name=NVIDIA_H100_80GB_HBM3.json
- E=128,N=192,device_name=NVIDIA_A100-SXM4-80GB.json
- E=128,N=192,device_name=NVIDIA_H100_80GB_HBM3.json
- E=128,N=192,device_name=NVIDIA_H20-3e.json
- E=128,N=192,device_name=NVIDIA_H20.json
- E=128,N=192,device_name=NVIDIA_H200.json
- E=128,N=352,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json
- E=128,N=384,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json
- E=128,N=384,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json
- E=128,N=384,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json
- E=128,N=384,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json
- E=128,N=384,device_name=NVIDIA_H20-3e.json
- E=128,N=384,device_name=NVIDIA_H20.json
- E=128,N=384,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json
- E=128,N=384,device_name=NVIDIA_H200.json
- E=128,N=512,device_name=NVIDIA_H100_80GB_HBM3.json
- E=128,N=704,device_name=NVIDIA_B200,dtype=fp8_w8a8.json
- E=128,N=704,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json
- E=128,N=768,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json
- E=128,N=768,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json
- E=128,N=768,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json
- E=128,N=768,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json
- E=128,N=768,device_name=NVIDIA_H20.json
- E=128,N=768,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json
- E=128,N=768,device_name=NVIDIA_H200.json
- E=128,N=96,device_name=NVIDIA_H20.json
- E=16,N=1024,device_name=NVIDIA_B200,dtype=fp8_w8a8.json
- E=16,N=1024,device_name=NVIDIA_B200.json
- E=16,N=1024,device_name=NVIDIA_H100.json
- E=16,N=1024,device_name=NVIDIA_H200,dtype=fp8_w8a8.json
- E=16,N=1024,device_name=NVIDIA_H200.json
- E=16,N=1344,device_name=NVIDIA_A100-SXM4-40GB.json
- E=16,N=1344,device_name=NVIDIA_A100-SXM4-80GB.json
- E=16,N=1344,device_name=NVIDIA_H100_80GB_HBM3.json
- E=16,N=14336,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json
- E=16,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json
- E=16,N=14336,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json
- E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json
- E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json
- E=16,N=1792,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json
- E=16,N=1792,device_name=NVIDIA_H100_80GB_HBM3.json
- E=16,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8.json
- E=16,N=2048,device_name=NVIDIA_H200.json
- E=16,N=2688,device_name=NVIDIA_A100-SXM4-80GB.json
- E=16,N=2688,device_name=NVIDIA_H100_80GB_HBM3.json
- E=16,N=3072,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json
- E=16,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json
- E=16,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json
- E=16,N=3072,device_name=NVIDIA_H200,dtype=int8_w8a16.json
- E=16,N=3200,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json
- E=16,N=3584,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json
- E=16,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json
- E=16,N=3584,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json
- E=16,N=6400,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json
- E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json
- E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json
- E=16,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json
- E=16,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json
- E=16,N=800,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json
- E=160,N=192,device_name=NVIDIA_A800-SXM4-80GB.json
- E=160,N=192,device_name=NVIDIA_H20-3e.json
- E=160,N=320,device_name=NVIDIA_H20-3e.json
- E=160,N=640,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json
- E=160,N=640,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json
- E=160,N=640,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json
- E=20,N=2560,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json
- E=20,N=2560,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json
- E=20,N=2560,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json
- E=20,N=2560,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json
- E=256,N=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json
- E=256,N=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8.json
- E=256,N=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json
- E=256,N=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8.json
- E=256,N=128,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json
- E=256,N=128,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json
- E=256,N=128,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json
- E=256,N=256,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json
- E=256,N=256,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json
- E=256,N=256,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json
- E=256,N=256,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json
- E=256,N=256,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json
- E=256,N=256,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json
- E=256,N=512,device_name=NVIDIA_H100_80GB_HBM3.json
- E=256,N=64,device_name=NVIDIA_A800-SXM4-80GB.json
- E=32,N=1408,device_name=NVIDIA_B200.json
- E=32,N=2048,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json
- E=32,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json
- E=384,N=128,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json
- E=384,N=128,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json
⛔ Files not processed due to max files limit (24)
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=384,N=128,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=384,N=256,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=384,N=256,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=40,N=1536,device_name=NVIDIA_B200,dtype=fp8_w8a8.json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=40,N=2560,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=40,N=2560,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=40,N=2560,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=512,N=128,device_name=NVIDIA_B200.json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=512,N=128,device_name=NVIDIA_GB200,dtype=fp8_w8a8.json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=512,N=128,device_name=NVIDIA_H100_80GB_HBM3.json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=512,N=128,device_name=NVIDIA_H20-3e.json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=512,N=128,device_name=NVIDIA_H200.json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=512,N=256,device_name=NVIDIA_B200.json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=512,N=256,device_name=NVIDIA_GB200,dtype=fp8_w8a8.json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=512,N=256,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=512,N=256,device_name=NVIDIA_H100_80GB_HBM3.json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=512,N=256,device_name=NVIDIA_H20-3e.json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=512,N=256,device_name=NVIDIA_H200.json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=512,N=512,device_name=NVIDIA_B200.json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=512,N=512,device_name=NVIDIA_GB200,dtype=fp8_w8a8.json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=512,N=512,device_name=NVIDIA_H100_80GB_HBM3.json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=512,N=512,device_name=NVIDIA_H20-3e.json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=512,N=512,device_name=NVIDIA_H200.json
- tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=512,N=64,device_name=NVIDIA_B200.json
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
| { | ||
| "triton_version": "3.4.0", | ||
| "1": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 32, | ||
| "BLOCK_SIZE_K": 64, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 4 | ||
| }, | ||
| "2": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 32, | ||
| "BLOCK_SIZE_K": 64, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 3 | ||
| }, | ||
| "4": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 64, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 8, | ||
| "num_stages": 4 | ||
| }, | ||
| "8": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 8, | ||
| "num_stages": 5 | ||
| }, | ||
| "16": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 8, | ||
| "num_stages": 5 | ||
| }, | ||
| "24": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 32, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 64, | ||
| "num_warps": 4, | ||
| "num_stages": 3 | ||
| }, | ||
| "32": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 32, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 5 | ||
| }, | ||
| "48": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 64, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 5 | ||
| }, | ||
| "64": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 64, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 8, | ||
| "num_stages": 5 | ||
| }, | ||
| "96": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 32, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 5 | ||
| }, | ||
| "128": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 32, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 5 | ||
| }, | ||
| "256": { | ||
| "BLOCK_SIZE_M": 32, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 8, | ||
| "num_stages": 5 | ||
| }, | ||
| "512": { | ||
| "BLOCK_SIZE_M": 64, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 8, | ||
| "num_stages": 4 | ||
| }, | ||
| "1024": { | ||
| "BLOCK_SIZE_M": 64, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 8, | ||
| "num_stages": 4 | ||
| }, | ||
| "1536": { | ||
| "BLOCK_SIZE_M": 128, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 64, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 8, | ||
| "num_stages": 5 | ||
| }, | ||
| "2048": { | ||
| "BLOCK_SIZE_M": 128, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 64, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 8, | ||
| "num_stages": 5 | ||
| }, | ||
| "3072": { | ||
| "BLOCK_SIZE_M": 256, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 64, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 8, | ||
| "num_stages": 4 | ||
| }, | ||
| "4096": { | ||
| "BLOCK_SIZE_M": 256, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 64, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 8, | ||
| "num_stages": 4 | ||
| } | ||
| } |
🧩 Analysis chain
Validate batch-size key format and configuration file contract.
The JSON uses string keys (e.g., "1", "2", ..., "4096") to represent batch sizes. Ensure that:
- The consuming code correctly parses batch-size keys as strings and performs the intended lookup/matching logic
- All numeric parameter values (BLOCK_SIZE_M, BLOCK_SIZE_N, BLOCK_SIZE_K, GROUP_SIZE_M, num_warps, num_stages) are within valid ranges for Triton kernel configuration
- The triton_version field matches the version expected by the deployment system
Run a verification to check:
- Are batch-size keys consistently treated as strings across all similar config files?
- Are there any out-of-bounds or malformed entries across all device config files that follow this pattern?
🏁 Script executed:
#!/bin/bash
# Description: Validate JSON structure and key format consistency across all triton_fused_moe_configs files
# Check if all config files are valid JSON
echo "Validating JSON syntax..."
find tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs -name "*.json" -exec sh -c '
jq empty "$1" 2>&1 || echo "Invalid JSON: $1"
' _ {} \;
# Sample check: Verify that all files have triton_version field
echo -e "\nChecking triton_version presence..."
find tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs -name "*.json" -exec sh -c '
if ! jq -e ".triton_version" "$1" > /dev/null 2>&1; then
echo "Missing triton_version: $1"
fi
' _ {} \;
# Check for inconsistent GROUP_SIZE_M values relative to BLOCK_SIZE_M
echo -e "\nChecking GROUP_SIZE_M <= BLOCK_SIZE_M constraint..."
find tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs -name "*.json" -exec sh -c '
jq -r "to_entries[] | select(.value | type == \"object\" and has(\"GROUP_SIZE_M\") and has(\"BLOCK_SIZE_M\")) | select(.value.GROUP_SIZE_M > .value.BLOCK_SIZE_M) | \"\(.key): GROUP_SIZE_M=\(.value.GROUP_SIZE_M) > BLOCK_SIZE_M=\(.value.BLOCK_SIZE_M)\"" "$1" | while read line; do
[ -n "$line" ] && echo "$1: $line"
done
' _ {} \;
Length of output: 100555
Fix systematic configuration constraint violations across triton_fused_moe_configs.
Verification reveals two critical issues affecting this file and nearly all others in the directory:
- Missing triton_version field: 243+ config files lack the required `triton_version` field. The reviewed file correctly includes `"triton_version": "3.4.0"`, but this is exceptional.
- GROUP_SIZE_M > BLOCK_SIZE_M constraint violations: Widespread violations detected. In the reviewed file, batch size key "24" violates the constraint: GROUP_SIZE_M=64 > BLOCK_SIZE_M=16. This pattern is repeated across hundreds of entries in all device config files, affecting batch sizes 1, 2, 4, 8, 16, 24, 32, 48, 64, 96, 128, 256, 512, 1024, and higher.
The scale and consistency of these violations across the entire config directory suggest either a broken config generation pipeline or a schema/validation gap. Audit the generation process and ensure all configs satisfy kernel constraints before deployment.
🤖 Prompt for AI Agents
tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=128,N=1856,device_name=NVIDIA_H100_80GB_HBM3.json
lines 1-147: this file (and many others) either miss the required triton_version
field or contain entries where GROUP_SIZE_M > BLOCK_SIZE_M (e.g. key "24" has
GROUP_SIZE_M=64 while BLOCK_SIZE_M=16). Fix by ensuring every config has a
"triton_version" key set (add if missing) and enforce GROUP_SIZE_M <=
BLOCK_SIZE_M for every batch-size entry (adjust violating values to be <=
BLOCK_SIZE_M, e.g. change GROUP_SIZE_M from 64 to 16 or recompute to a valid
divisor), and add/restore a generation-time validation step (and CI check) that
rejects config files that fail these schema constraints so the generator cannot
emit invalid configs.
| { | ||
| "2": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 64, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 64, | ||
| "num_warps": 4, | ||
| "num_stages": 5 | ||
| }, | ||
| "4": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 64, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 4 | ||
| }, | ||
| "8": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 64, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 32, | ||
| "num_warps": 4, | ||
| "num_stages": 5 | ||
| }, | ||
| "16": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 64, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 3 | ||
| }, | ||
| "32": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 64, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 5 | ||
| }, | ||
| "64": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 64, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 5 | ||
| }, | ||
| "128": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 64, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 16, | ||
| "num_warps": 4, | ||
| "num_stages": 5 | ||
| }, | ||
| "256": { | ||
| "BLOCK_SIZE_M": 64, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 3 | ||
| }, | ||
| "512": { | ||
| "BLOCK_SIZE_M": 64, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 8, | ||
| "num_stages": 3 | ||
| }, | ||
| "1024": { | ||
| "BLOCK_SIZE_M": 64, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 16, | ||
| "num_warps": 8, | ||
| "num_stages": 3 | ||
| }, | ||
| "2048": { | ||
| "BLOCK_SIZE_M": 128, | ||
| "BLOCK_SIZE_N": 256, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 16, | ||
| "num_warps": 8, | ||
| "num_stages": 4 | ||
| }, | ||
| "3072": { | ||
| "BLOCK_SIZE_M": 128, | ||
| "BLOCK_SIZE_N": 256, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 16, | ||
| "num_warps": 8, | ||
| "num_stages": 4 | ||
| }, | ||
| "4096": { | ||
| "BLOCK_SIZE_M": 128, | ||
| "BLOCK_SIZE_N": 256, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 16, | ||
| "num_warps": 8, | ||
| "num_stages": 4 | ||
| }, | ||
| "8192": { | ||
| "BLOCK_SIZE_M": 128, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 3 | ||
| }, | ||
| "16384": { | ||
| "BLOCK_SIZE_M": 64, | ||
| "BLOCK_SIZE_N": 256, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 32, | ||
| "num_warps": 8, | ||
| "num_stages": 2 | ||
| } | ||
| } |
🧩 Analysis chain
Verify JSON structure and batch-size key lookup in the consuming code.
The JSON file is syntactically valid and well-formatted, with consistent parameter structures across all batch sizes. However, without access to the Python code that loads this file, I cannot confirm critical integration details.
Please verify the following using the code that loads these configuration files:
- Batch size key type handling: Are the string keys (e.g., "2", "4") correctly parsed? Does the code convert the incoming batch size to a string before lookup, or does it expect integer keys?
- Error handling & fallback: If a batch size is not found in the config, does the code gracefully fall back to a default configuration as mentioned in the PR objectives?
- File path resolution: Confirm that the filename convention (especially the device_name, E, N, and dtype parameters) is correctly resolved at runtime.
Run the following verification script to locate and inspect the code that consumes these configuration files:
🏁 Script executed:
#!/bin/bash
# Search for code that loads triton_fused_moe_configs JSON files
# Find Python files that reference the config directory or load JSON configs
rg -n "triton_fused_moe_configs|fused_moe.*config" --type=py -A 5 -B 2
# Search for JSON loading logic to understand key type handling
rg -n "json\.load|json\.loads" --type=py tensorrt_llm/_torch/auto_deploy -A 3 -B 1 | head -50Length of output: 2554
🏁 Script executed:
#!/bin/bash
# Get full function context around the config loading code
sed -n '190,240p' tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py
# Find the function definition and caller context
rg -n "def.*config|batch_size" tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py -A 2 -B 1 | head -100
# Search for error handling and KeyError catches
rg -n "except|KeyError|get\(|fallback|default" tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py -B 2 -A 2Length of output: 4805
Fix dtype parameter not being passed to get_moe_configs().
The JSON file structure is valid and batch-size key handling is correct: string keys are converted to integers at line 220, and lookup uses the closest matching batch size to avoid KeyError.
However, a critical bug was found in the calling code: _get_kernel_config() at line 234 in tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py hard-codes dtype=None instead of using the dtype parameter passed to the function. Since the config filename includes dtype (e.g., dtype=fp8_w8a8.json), this causes the wrong configuration file to be selected.
Change line 234 from:
configs = get_moe_configs(E, N, dtype=None)
to:
configs = get_moe_configs(E, N, dtype=dtype)
🤖 Prompt for AI Agents
In tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py around
line 234, the call to get_moe_configs(...) hard-codes dtype=None; update the
call to pass the function's dtype parameter (i.e., use dtype=dtype) so the
correct dtype-specific JSON config file is selected.
| { | ||
| "1": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 4 | ||
| }, | ||
| "2": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 4 | ||
| }, | ||
| "4": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 4 | ||
| }, | ||
| "8": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 3 | ||
| }, | ||
| "16": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 4 | ||
| }, | ||
| "24": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 3 | ||
| }, | ||
| "32": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 5 | ||
| }, | ||
| "48": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 5 | ||
| }, | ||
| "64": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 5 | ||
| }, | ||
| "96": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 5 | ||
| }, | ||
| "128": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 5 | ||
| }, | ||
| "256": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 4 | ||
| }, | ||
| "512": { | ||
| "BLOCK_SIZE_M": 64, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 16, | ||
| "num_warps": 4, | ||
| "num_stages": 4 | ||
| }, | ||
| "1024": { | ||
| "BLOCK_SIZE_M": 64, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 16, | ||
| "num_warps": 4, | ||
| "num_stages": 4 | ||
| }, | ||
| "1536": { | ||
| "BLOCK_SIZE_M": 64, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 16, | ||
| "num_warps": 4, | ||
| "num_stages": 4 | ||
| }, | ||
| "2048": { | ||
| "BLOCK_SIZE_M": 64, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 16, | ||
| "num_warps": 4, | ||
| "num_stages": 4 | ||
| }, | ||
| "3072": { | ||
| "BLOCK_SIZE_M": 64, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 32, | ||
| "num_warps": 4, | ||
| "num_stages": 4 | ||
| }, | ||
| "4096": { | ||
| "BLOCK_SIZE_M": 64, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 32, | ||
| "num_warps": 4, | ||
| "num_stages": 4 | ||
| } | ||
| } |
🧩 Analysis chain
JSON structure is valid; parameter ranges appear reasonable for B200 FP8 tuning.
The configuration file is well-formed with strategically spaced batch size entries (sparse at small batch sizes, exponential-like spacing at larger sizes). Parameter transitions are coherent: BLOCK_SIZE_M and GROUP_SIZE_M scale with batch size, and num_stages/BLOCK_SIZE_K vary appropriately. All values fall within expected ranges for Triton kernel tuning (num_warps=4, BLOCK_SIZE_N=128 fixed; BLOCK_SIZE_M∈{16,64}, GROUP_SIZE_M∈{1,16,32}, num_stages∈{3,4,5}).
However, I need to verify the integration with the calling code and tuning methodology.
Run the following script to verify the batch size lookup strategy and confirm that fallback logic handles missing configs gracefully:
Additionally, verify that these configurations were generated/tuned for the B200 device. Please provide or link to:
- Documentation on how these parameters were auto-tuned or benchmarked
- Evidence that this config was validated for correctness and performance on B200
🏁 Script executed:
#!/bin/bash
# Description: Inspect the config loading code for batch size lookup and fallback strategy.
# Find Python files that load these JSON configs
fd -e py -x grep -l 'triton_fused_moe_configs' {} \; 2>/dev/null
# Search for the config loader function/class
rg -nP 'def.*(?:load|select).*config|class.*ConfigLoader' --type=py -g 'fused_moe*'
# Verify fallback mechanism for missing batch sizes
rg -nP 'fallback|default.*config|nearest|batch.*size.*lookup' --type=py -g 'fused_moe*' -A 3 -B 1
Length of output: 68
🏁 Script executed:
#!/bin/bash
# Examine the config loading code in triton_moe.py
cat ./tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py
Length of output: 16987
🏁 Script executed:
#!/bin/bash
# Verify how _get_kernel_config is called and whether there are other callers
rg -n '_get_kernel_config' --type=py
Length of output: 226
JSON file is properly formatted, but config loading code has critical bugs preventing this file from being used.
The JSON structure is valid and parameter ranges are reasonable for B200 FP8 tuning. However, the configuration will never be loaded due to:
- dtype hardcoded to None (line 234): `get_moe_configs(E, N, dtype=None)` causes `get_config_file_name()` to generate filenames without dtype/block_shape. The file named with `dtype=fp8_w8a8,block_shape=[128,128]` will never be found; the system falls back to generic defaults.
- Parameter type mismatch (line 410): `_get_kernel_config(M, E, inter_size2, H, top_k)` passes H (a tensor dimension) and top_k (an int) where the function expects `dtype: str | None` and `block_shape: list[int]`. While these parameters are currently unused in the function body, this indicates incorrect API usage and prevents dtype/block_shape from being passed through correctly.
Fix the dtype hardcoding in get_moe_configs() call and correct the parameter types in the caller to enable device/dtype-specific configurations.
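Taken together, the corrected calls would look roughly like the snippet below; the argument names are inferred from the types quoted above and should be checked against the actual signatures in triton_moe.py before applying:

```python
# Around line 234: forward the caller's dtype instead of hard-coding None,
# so dtype/block_shape-specific config filenames can be generated.
configs = get_moe_configs(E, N, dtype=dtype)

# Around line 410: pass dtype and block_shape (not H and top_k) so they
# propagate through _get_kernel_config to the filename lookup.
_get_kernel_config(M, E, inter_size2, dtype, block_shape)
```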
| { | ||
| "1": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 4 | ||
| }, | ||
| "2": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 3 | ||
| }, | ||
| "4": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 3 | ||
| }, | ||
| "8": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 16, | ||
| "num_warps": 4, | ||
| "num_stages": 5 | ||
| }, | ||
| "16": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 16, | ||
| "num_warps": 4, | ||
| "num_stages": 3 | ||
| }, | ||
| "24": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 16, | ||
| "num_warps": 4, | ||
| "num_stages": 5 | ||
| }, | ||
| "32": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 3 | ||
| }, | ||
| "48": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 3 | ||
| }, | ||
| "64": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 3 | ||
| }, | ||
| "96": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 3 | ||
| }, | ||
| "128": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 3 | ||
| }, | ||
| "256": { | ||
| "BLOCK_SIZE_M": 16, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 3 | ||
| }, | ||
| "512": { | ||
| "BLOCK_SIZE_M": 32, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 1, | ||
| "num_warps": 4, | ||
| "num_stages": 3 | ||
| }, | ||
| "1024": { | ||
| "BLOCK_SIZE_M": 64, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 16, | ||
| "num_warps": 4, | ||
| "num_stages": 3 | ||
| }, | ||
| "1536": { | ||
| "BLOCK_SIZE_M": 64, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 16, | ||
| "num_warps": 4, | ||
| "num_stages": 4 | ||
| }, | ||
| "2048": { | ||
| "BLOCK_SIZE_M": 64, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 16, | ||
| "num_warps": 4, | ||
| "num_stages": 4 | ||
| }, | ||
| "3072": { | ||
| "BLOCK_SIZE_M": 64, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 256, | ||
| "GROUP_SIZE_M": 16, | ||
| "num_warps": 4, | ||
| "num_stages": 4 | ||
| }, | ||
| "4096": { | ||
| "BLOCK_SIZE_M": 64, | ||
| "BLOCK_SIZE_N": 128, | ||
| "BLOCK_SIZE_K": 128, | ||
| "GROUP_SIZE_M": 16, | ||
| "num_warps": 4, | ||
| "num_stages": 4 | ||
| } | ||
| } |
🧩 Analysis chain
JSON is valid; flag device name inconsistency in documentation.
The H100 configuration is well-formed and shows appropriate device-specific tuning divergence from the GB200 variant (e.g., different GROUP_SIZE_M and num_stages for mid-range batch sizes), which reflects distinct GPU architectures. However, the AI summary claims this file is named ...H100_80GB_HBM3... but the actual filename is NVIDIA_H100.
Clarify the device naming convention. Across the ~100+ configuration files, should device identifiers include memory/variant details (e.g., H100_80GB_HBM3) or remain generic (e.g., NVIDIA_H100)? Ensure the naming is consistent with the device lookup key used at runtime. Run the following verification to check naming consistency across the config directory:
🏁 Script executed:
#!/bin/bash
# List all device config files and their naming patterns
find tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs -name "*.json" -type f | \
sed 's/.*device_name=//; s/,dtype.*//' | sort | uniq -c | sort -rn
Length of output: 598
Rename file: NVIDIA_H100 → NVIDIA_H100_80GB_HBM3 to match config directory convention.
The script output confirms a critical naming inconsistency. Across ~178 configuration files, 51+ use the detailed device variant NVIDIA_H100_80GB_HBM3, while only 4 (including this file) use the generic NVIDIA_H100. Similar patterns hold for other device families (H200, H20, A100, etc.), where detailed names with memory/variant specifiers are the standard.
The file under review uses the minority naming pattern and should be renamed to NVIDIA_H100_80GB_HBM3 to align with the convention. Misnamed configs may fail device lookup or apply incorrect tuning parameters at runtime.
🤖 Prompt for AI Agents
In
tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=160,N=640,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json
(lines 1-146) the device name in the filename uses the generic "NVIDIA_H100" but
the repo convention requires the detailed variant "NVIDIA_H100_80GB_HBM3";
rename the file to replace NVIDIA_H100 with NVIDIA_H100_80GB_HBM3 (keeping the
rest of the filename identical), update any references in code/CI/configs that
import or lookup this filename to the new name, and verify there are no
duplicate files after the rename.
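One careful way to apply the rename is sketched below (a hypothetical helper script, not part of the PR); the pattern requires the device name to be followed by `,` or `.json`, so existing NVIDIA_H100_80GB_HBM3 files are left untouched:

```python
import re
from pathlib import Path

config_dir = Path(
    "tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs"
)
# Match NVIDIA_H100 only when followed by ',' or '.', i.e. not the _80GB_HBM3 form.
pattern = re.compile(r"device_name=NVIDIA_H100(?=[,.])")

for cfg_file in config_dir.glob("*device_name=NVIDIA_H100[,.]*"):
    new_name = pattern.sub("device_name=NVIDIA_H100_80GB_HBM3", cfg_file.name)
    target = cfg_file.with_name(new_name)
    assert not target.exists(), f"would overwrite {target.name}"
    cfg_file.rename(target)
    print(f"renamed {cfg_file.name} -> {new_name}")
```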
@nzmora-nvidia : what kind of perf upside are you seeing with the ability to dynamically select the config?
…kernel For the triton fused_moe_kernel, search for a device-specific (SKU) tile size configuration using the batch size as key. Each device has its own configuration file in JSON format. If the config file is not found, we revert to the default tile size configuration. Signed-off-by: Neta Zmora <[email protected]>
fd27001 to 7a4518b (Compare)
/bot run
| @@ -0,0 +1,146 @@ | |||
| { | |||
Is INT8 needed?
Thanks for catching this.
For the triton fused_moe_kernel, search for a device-specific (SKU) tile size configuration using the batch size as key. Each device has its own configuration file in JSON format. If the config file is not found, we revert to the default tile size configuration.
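As a rough illustration of that flow (a sketch only; the real logic lives in get_moe_configs()/get_config_file_name() in triton_moe.py, and DEFAULT_CONFIG below is a placeholder rather than the kernel's actual defaults):

```python
import functools
import json
from pathlib import Path

import torch

CONFIG_DIR = Path(__file__).parent / "triton_fused_moe_configs"
# Placeholder fallback; the real default tile configuration lives in triton_moe.py.
DEFAULT_CONFIG = {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 64, "BLOCK_SIZE_K": 32,
                  "GROUP_SIZE_M": 8, "num_warps": 4, "num_stages": 4}


@functools.lru_cache(maxsize=None)
def load_device_configs(E: int, N: int, dtype: str | None) -> dict | None:
    """Load the per-device JSON table, or None if no tuned file exists."""
    device_name = torch.cuda.get_device_name().replace(" ", "_")
    dtype_part = f",dtype={dtype}" if dtype else ""
    cfg_path = CONFIG_DIR / f"E={E},N={N},device_name={device_name}{dtype_part}.json"
    if not cfg_path.exists():
        return None
    data = json.loads(cfg_path.read_text())
    return {int(k): v for k, v in data.items() if k.isdigit()}


def select_tile_config(E: int, N: int, M: int, dtype: str | None = None) -> dict:
    """Pick the tuned config whose batch-size key is closest to M, else the default."""
    configs = load_device_configs(E, N, dtype)
    if not configs:
        return DEFAULT_CONFIG
    return configs[min(configs, key=lambda k: abs(k - M))]
```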
Summary by CodeRabbit
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
`/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...`
Provide a user-friendly way for developers to interact with a Jenkins server.
Run `/bot [-h|--help]` to print this help message. See details below for each supported subcommand.
run
`run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug (experimental)]`
Launch build/test pipelines. All previously running jobs will be killed.
- `--reuse-test (optional)pipeline-id` (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- `--disable-reuse-test` (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests are run regardless of previous successes.
- `--disable-fail-fast` (OPTIONAL): Disable fail fast on build/tests/infra failures.
- `--skip-test` (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- `--stage-list "A10-PyTorch-1, xxx"` (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- `--gpu-type "A30, H100_PCIe"` (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- `--test-backend "pytorch, cpp"` (OPTIONAL): Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- `--only-multi-gpu-test` (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--disable-multi-gpu-test` (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--add-multi-gpu-test` (OPTIONAL): Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
- `--post-merge` (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- `--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"` (OPTIONAL): Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- `--detailed-log` (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- `--debug` (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the `stage-list` parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.
For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.
kill
`kill`
Kill all running builds associated with the pull request.
skip
`skip --comment COMMENT`
Skip testing for the latest commit on the pull request. `--comment "Reason for skipping build/test"` is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
reuse-pipeline
`reuse-pipeline`
Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.