Failed to build inference trt_llm image: make: *** [Makefile:64: release_build] Error 1 #1904

Open
@dadaguai-jiangjun

System Info

CPU: X86
Memory size: 2TB
GPU Name: H20
TensorRT-LLM: 0.10.0
OS: Alibaba Cloud Linux release 3 (Soaring Falcon)
GPU Driver: 550.54.15
CUDA: cuda_12.4.r12.4/compiler.33961263_0
Docker: 26.1.3

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

1. cd nvtest-20240218
2. install nvtest
3. pip3 install paramiko
4. nvtest image make benchmarks/gpu/inference/trt_llm/

Expected behavior

The trt_llm image builds successfully.

Actual behavior

nvtest - INFO - #34 415.7
nvtest - INFO - #34 415.7 [ 81%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_256_S_40_tma_ws_sm90.cubin.cpp.o
nvtest - INFO - #34 415.7 4 errors detected in the compilation of "/src/tensorrt_llm/cpp/build/tensorrt_llm/kernels/cutlass_kernels/cutlass_instantiations/gemm_grouped/fused_moe_sm80_16_256_64_4_bf16_gelu.generated.cu".
nvtest - INFO - #34 415.7 [ 81%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_256_S_64_alibi_tma_ws_sm90.cubin.cpp.o
nvtest - INFO - #34 415.7 [ 81%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_256_S_64_tma_ws_sm90.cubin.cpp.o
nvtest - INFO - #34 415.7 [ 83%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_256_S_pagedKV_32_alibi_tma_ws_sm90.cubin.cpp.o
nvtest - INFO - #34 415.7 [ 83%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_256_S_pagedKV_32_tma_ws_sm90.cubin.cpp.o
nvtest - INFO - #34 415.7 [ 83%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_256_S_pagedKV_40_alibi_tma_ws_sm90.cubin.cpp.o
nvtest - INFO - #34 415.7 [ 83%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_256_S_pagedKV_40_tma_ws_sm90.cubin.cpp.o
nvtest - INFO - #34 415.7 [ 83%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_256_S_pagedKV_64_alibi_tma_ws_sm90.cubin.cpp.o
nvtest - INFO - #34 415.7 [ 83%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_256_S_pagedKV_64_tma_ws_sm90.cubin.cpp.o
nvtest - INFO - #34 415.7 gmake[3]: *** [tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/moe_gemm_src.dir/build.make:636: tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/moe_gemm_src.dir/cutlass_instantiations/gemm_grouped/fused_moe_sm80_16_256_64_3_f16_silu.generated.cu.o] Error 2
nvtest - INFO - #34 415.7 [ 83%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_104_sm89.cubin.cpp.o
nvtest - INFO - #34 415.7 [ 83%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_128_qk_tanh_sm89.cubin.cpp.o
nvtest - INFO - #34 415.8 4 errors detected in the compilation of "/src/tensorrt_llm/cpp/build/tensorrt_llm/kernels/cutlass_kernels/cutlass_instantiations/gemm_grouped/fused_moe_sm80_128_128_64_2_f16_silu.generated.cu".
nvtest - INFO - #34 415.8 [ 83%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_40_sm89.cubin.cpp.o
nvtest - INFO - #34 415.8 [ 83%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_128_sm89.cubin.cpp.o
nvtest - INFO - #34 415.8 gmake[3]: *** [tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/moe_gemm_src.dir/build.make:650: tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/moe_gemm_src.dir/cutlass_instantiations/gemm_grouped/fused_moe_sm80_16_256_64_4_bf16_gelu.generated.cu.o] Error 2
nvtest - INFO - #34 415.8 [ 83%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_64_sm89.cubin.cpp.o
nvtest - INFO - #34 415.8 [ 84%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_80_sm89.cubin.cpp.o
nvtest - INFO - #34 415.8 [ 84%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_96_sm89.cubin.cpp.o
nvtest - INFO - #34 415.8 [ 84%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_104_sm89.cubin.cpp.o
nvtest - INFO - #34 415.8 [ 84%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_104_sm90.cubin.cpp.o
nvtest - INFO - #34 415.8 [ 84%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_128_qk_tanh_sm89.cubin.cpp.o
nvtest - INFO - #34 415.8 [ 84%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_128_sm89.cubin.cpp.o
nvtest - INFO - #34 415.8 [ 84%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_128_qk_tanh_sm90.cubin.cpp.o
nvtest - INFO - #34 415.8 [ 84%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_128_sm90.cubin.cpp.o
nvtest - INFO - #34 415.8 gmake[3]: *** [tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/moe_gemm_src.dir/build.make:244: tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/moe_gemm_src.dir/cutlass_instantiations/gemm_grouped/fused_moe_sm80_128_128_64_2_f16_silu.generated.cu.o] Error 2
nvtest - INFO - #34 415.9 [ 84%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_40_sm90.cubin.cpp.o
nvtest - INFO - #34 415.9 [ 84%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_40_sm89.cubin.cpp.o
nvtest - INFO - #34 415.9 [ 84%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_80_sm89.cubin.cpp.o
nvtest - INFO - #34 415.9 [ 85%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_64_sm89.cubin.cpp.o
nvtest - INFO - #34 415.9 [ 85%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_64_sm90.cubin.cpp.o
nvtest - INFO - #34 415.9 [ 85%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_80_sm90.cubin.cpp.o
nvtest - INFO - #34 415.9 [ 85%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_96_sm89.cubin.cpp.o
nvtest - INFO - #34 415.9 [ 85%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_96_sm90.cubin.cpp.o
nvtest - INFO - #34 415.9 [ 85%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_160_alibi_tma_ws_sm90.cubin.cpp.o
nvtest - INFO - #34 415.9 [ 85%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_160_tma_ws_sm90.cubin.cpp.o
nvtest - INFO - #34 415.9 [ 85%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_16_sm89.cubin.cpp.o
nvtest - INFO - #34 415.9 [ 85%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_192_alibi_tma_ws_sm90.cubin.cpp.o
nvtest - INFO - #34 415.9 [ 85%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_192_tma_ws_sm90.cubin.cpp.o
nvtest - INFO - #34 416.0 [ 85%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_256_tma_ws_sm90.cubin.cpp.o
nvtest - INFO - #34 416.0 [ 86%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_256_alibi_tma_ws_sm90.cubin.cpp.o
nvtest - INFO - #34 416.0 [ 86%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_32_sm89.cubin.cpp.o
nvtest - INFO - #34 416.0 [ 86%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_pagedKV_160_alibi_tma_ws_sm90.cubin.cpp.o
nvtest - INFO - #34 416.0 [ 86%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_pagedKV_16_sm89.cubin.cpp.o
nvtest - INFO - #34 416.0 [ 86%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_pagedKV_160_tma_ws_sm90.cubin.cpp.o
nvtest - INFO - #34 416.0 [ 86%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_pagedKV_192_alibi_tma_ws_sm90.cubin.cpp.o
nvtest - INFO - #34 416.0 [ 86%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_pagedKV_192_tma_ws_sm90.cubin.cpp.o
nvtest - INFO - #34 416.0 [ 86%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_pagedKV_256_alibi_tma_ws_sm90.cubin.cpp.o
nvtest - INFO - #34 416.0 [ 86%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_pagedKV_16_sm90.cubin.cpp.o
nvtest - INFO - #34 416.0 [ 87%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_pagedKV_256_tma_ws_sm90.cubin.cpp.o
nvtest - INFO - #34 416.0 [ 87%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_pagedKV_32_sm89.cubin.cpp.o
nvtest - INFO - #34 416.0 [ 87%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_pagedKV_32_sm90.cubin.cpp.o
nvtest - INFO - #34 416.1 [ 87%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_128_32_ldgsts_sm90.cubin.cpp.o
nvtest - INFO - #34 416.1 [ 87%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_128_64_ldgsts_sm90.cubin.cpp.o
nvtest - INFO - #34 416.1 [ 87%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_256_64_ldgsts_sm90.cubin.cpp.o
nvtest - INFO - #34 416.1 [ 87%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_256_32_ldgsts_sm90.cubin.cpp.o
nvtest - INFO - #34 416.1 [ 87%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_384_64_ldgsts_sm90.cubin.cpp.o
nvtest - INFO - #34 416.1 [ 87%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_384_32_ldgsts_sm90.cubin.cpp.o
nvtest - INFO - #34 416.1 [ 87%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_512_32_ldgsts_sm90.cubin.cpp.o
nvtest - INFO - #34 416.1 [ 87%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_512_64_ldgsts_sm90.cubin.cpp.o
nvtest - INFO - #34 416.1 [ 89%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_64_32_ldgsts_sm90.cubin.cpp.o
nvtest - INFO - #34 416.1 [ 89%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_64_64_ldgsts_sm90.cubin.cpp.o
nvtest - INFO - #34 416.1 [ 89%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_128_64_ldgsts_sm90.cubin.cpp.o
nvtest - INFO - #34 416.1 [ 89%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_128_32_ldgsts_sm90.cubin.cpp.o
nvtest - INFO - #34 416.1 [ 89%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_256_64_ldgsts_sm90.cubin.cpp.o
nvtest - INFO - #34 416.1 [ 89%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_256_32_ldgsts_sm90.cubin.cpp.o
nvtest - INFO - #34 416.1 [ 89%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_384_32_ldgsts_sm90.cubin.cpp.o
nvtest - INFO - #34 416.1 [ 89%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_384_64_ldgsts_sm90.cubin.cpp.o
nvtest - INFO - #34 416.1 [ 89%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_512_32_ldgsts_sm90.cubin.cpp.o
nvtest - INFO - #34 416.1 [ 89%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_512_64_ldgsts_sm90.cubin.cpp.o
nvtest - INFO - #34 416.1 [ 89%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_64_32_ldgsts_sm90.cubin.cpp.o
nvtest - INFO - #34 416.1 [ 90%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_64_64_ldgsts_sm90.cubin.cpp.o
nvtest - INFO - #34 416.1 [ 90%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/fmhaRunner.cpp.o
nvtest - INFO - #34 416.2 [ 90%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/banBadWords.cu.o
nvtest - INFO - #34 416.2 [ 90%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/beamSearchKernels.cu.o
nvtest - INFO - #34 416.2 [ 90%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/banRepeatNgram.cu.o
nvtest - INFO - #34 416.2 [ 90%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/beamSearchKernels/beamSearchKernels16.cu.o
nvtest - INFO - #34 416.2 [ 90%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/beamSearchKernels/beamSearchKernels32.cu.o
nvtest - INFO - #34 416.2 [ 90%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/beamSearchKernels/beamSearchKernels4.cu.o
nvtest - INFO - #34 416.2 [ 90%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/beamSearchKernels/beamSearchKernels64.cu.o
nvtest - INFO - #34 416.2 [ 90%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/beamSearchKernels/beamSearchKernels8.cu.o
nvtest - INFO - #34 416.2 [ 90%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cumsumLastDim.cu.o
nvtest - INFO - #34 416.2 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/buildRelativeAttentionBiasKernel.cu.o
nvtest - INFO - #34 416.2 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/customAllReduceKernels.cu.o
nvtest - INFO - #34 416.2 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention.cu.o
nvtest - INFO - #34 416.2 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decodingCommon.cu.o
nvtest - INFO - #34 416.2 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decodingKernels.cu.o
nvtest - INFO - #34 416.2 4 errors detected in the compilation of "/src/tensorrt_llm/cpp/build/tensorrt_llm/kernels/cutlass_kernels/cutlass_instantiations/gemm_grouped/fused_moe_sm80_16_256_64_3_bf16_gelu.generated.cu".
nvtest - INFO - #34 416.2 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/gptKernels.cu.o
nvtest - INFO - #34 416.3 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/layernormKernels.cu.o
nvtest - INFO - #34 416.3 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/groupGemm.cu.o
nvtest - INFO - #34 416.3 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/lookupKernels.cu.o
nvtest - INFO - #34 416.3 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/lruKernel.cu.o
nvtest - INFO - #34 416.3 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/mambaConv1dKernels.cu.o
nvtest - INFO - #34 416.3 gmake[3]: *** [tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/moe_gemm_src.dir/build.make:594: tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/moe_gemm_src.dir/cutlass_instantiations/gemm_grouped/fused_moe_sm80_16_256_64_3_bf16_gelu.generated.cu.o] Error 2
nvtest - INFO - #34 416.3 gmake[2]: *** [CMakeFiles/Makefile2:1092: tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/moe_gemm_src.dir/all] Error 2
nvtest - INFO - #34 416.3 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/penaltyKernels.cu.o
nvtest - INFO - #34 416.3 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/mixtureOfExperts/moe_kernels.cu.o
nvtest - INFO - #34 416.3 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/quantization.cu.o
nvtest - INFO - #34 416.3 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/preQuantScaleKernel.cu.o
nvtest - INFO - #34 416.4 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/rmsnormKernels.cu.o
nvtest - INFO - #34 416.4 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/samplingAirTopPKernels.cu.o
nvtest - INFO - #34 416.4 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/samplingTopPKernels.cu.o
nvtest - INFO - #34 416.4 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/samplingTopKKernels.cu.o
nvtest - INFO - #34 416.4 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/selectiveScan.cu.o
nvtest - INFO - #34 416.4 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/speculativeDecoding/explicitDraftTokensKernels.cu.o
nvtest - INFO - #34 416.4 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/speculativeDecoding/externalDraftTokensKernels.cu.o
nvtest - INFO - #34 416.4 [ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/speculativeDecoding/common.cu.o
nvtest - INFO - #34 416.4 [ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/speculativeDecoding/medusaDecodingKernels.cu.o
nvtest - INFO - #34 416.4 [ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/speculativeDecoding/kvCacheUpdateKernels.cu.o
nvtest - INFO - #34 416.5 [ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/splitkGroupGemm.cu.o
nvtest - INFO - #34 416.5 [ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/stopCriteriaKernels.cu.o
nvtest - INFO - #34 416.5 [ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/unfusedAttentionKernels.cu.o
nvtest - INFO - #34 416.5 [ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/unfusedAttentionKernels/unfusedAttentionKernels_2_bf16_bf16.cu.o
nvtest - INFO - #34 416.5 [ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/unfusedAttentionKernels/unfusedAttentionKernels_2_bf16_fp8.cu.o
nvtest - INFO - #34 416.5 [ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/unfusedAttentionKernels/unfusedAttentionKernels_2_bf16_int8.cu.o
nvtest - INFO - #34 416.5 [ 95%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/unfusedAttentionKernels/unfusedAttentionKernels_2_float_float.cu.o
nvtest - INFO - #34 416.5 [ 95%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/unfusedAttentionKernels/unfusedAttentionKernels_2_float_fp8.cu.o
nvtest - INFO - #34 416.5 [ 95%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/unfusedAttentionKernels/unfusedAttentionKernels_2_float_int8.cu.o
nvtest - INFO - #34 416.5 [ 95%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/unfusedAttentionKernels/unfusedAttentionKernels_2_half_fp8.cu.o
nvtest - INFO - #34 416.5 [ 95%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/unfusedAttentionKernels/unfusedAttentionKernels_2_half_half.cu.o
nvtest - INFO - #34 416.6 [ 95%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/unfusedAttentionKernels/unfusedAttentionKernels_2_half_int8.cu.o
nvtest - INFO - #34 416.6 [ 95%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/fp8Gemm.cu.o
nvtest - INFO - #34 416.6 [ 95%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/int8SQ.cu.o
nvtest - INFO - #34 416.6 [ 95%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherBf16Int4GroupwiseColumnMajorFalse.cu.o
nvtest - INFO - #34 416.6 [ 95%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherBf16Int4GroupwiseColumnMajorInterleavedTrue.cu.o
nvtest - INFO - #34 416.6 [ 96%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherBf16Int4PerChannelColumnMajorFalse.cu.o
nvtest - INFO - #34 416.6 [ 96%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherBf16Int4PerChannelColumnMajorInterleavedTrue.cu.o
nvtest - INFO - #34 416.7 [ 96%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherBf16Int8PerChannelColumnMajorFalse.cu.o
nvtest - INFO - #34 416.7 [ 96%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherBf16Int8PerChannelColumnMajorInterleavedTrue.cu.o
nvtest - INFO - #34 416.8 [ 96%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherFp16Int4GroupwiseColumnMajorFalse.cu.o
nvtest - INFO - #34 417.1 In file included from /src/tensorrt_llm/cpp/tensorrt_llm/kernels/mixtureOfExperts/moe_kernels.h:22,
nvtest - INFO - #34 417.1 from /src/tensorrt_llm/cpp/tensorrt_llm/kernels/mixtureOfExperts/moe_kernels.cu:44:
nvtest - INFO - #34 417.1 /src/tensorrt_llm/cpp/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels.h:26:10: fatal error: cutlass/gemm/group_array_problem_shape.hpp: No such file or directory
nvtest - INFO - #34 417.1 26 | #include <cutlass/gemm/group_array_problem_shape.hpp>
nvtest - INFO - #34 417.1 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
nvtest - INFO - #34 417.1 compilation terminated.
nvtest - INFO - #34 417.1 gmake[3]: *** [tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/build.make:6278: tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/mixtureOfExperts/moe_kernels.cu.o] Error 1
nvtest - INFO - #34 417.1 gmake[3]: *** Waiting for unfinished jobs....
nvtest - INFO - #34 422.9 [ 96%] Built target common_src
nvtest - INFO - #34 423.7 [ 96%] Built target layers_src
nvtest - INFO - #34 426.5 [ 96%] Built target runtime_src
nvtest - INFO - #34 482.4 [ 96%] Linking CUDA device code CMakeFiles/cutlass_src.dir/cmake_device_link.o
nvtest - INFO - #34 482.5 [ 96%] Linking CXX static library libcutlass_src.a
nvtest - INFO - #34 482.7 [ 96%] Built target cutlass_src
nvtest - INFO - #34 521.1 gmake[2]: *** [CMakeFiles/Makefile2:1014: tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/all] Error 2
nvtest - INFO - #34 868.5 [ 96%] Built target decoder_attention_src
nvtest - INFO - #34 868.5 gmake[1]: *** [CMakeFiles/Makefile2:969: tensorrt_llm/CMakeFiles/tensorrt_llm.dir/rule] Error 2
nvtest - INFO - #34 868.5 gmake: *** [Makefile:205: tensorrt_llm] Error 2
nvtest - INFO - #34 868.5 Traceback (most recent call last):
nvtest - INFO - #34 868.5 File "/src/tensorrt_llm/scripts/build_wheel.py", line 389, in <module>
nvtest - INFO - #34 868.5 main(**vars(args))
nvtest - INFO - #34 868.5 File "/src/tensorrt_llm/scripts/build_wheel.py", line 187, in main
nvtest - INFO - #34 868.5 build_run(
nvtest - INFO - #34 868.5 File "/usr/lib/python3.10/subprocess.py", line 526, in run
nvtest - INFO - #34 868.5 raise CalledProcessError(retcode, process.args,
nvtest - INFO - #34 868.5 subprocess.CalledProcessError: Command 'cmake --build . --config Release --parallel 120 --target tensorrt_llm nvinfer_plugin_tensorrt_llm th_common bindings benchmarks executorWorker ' returned non-zero exit status 2.
nvtest - INFO - #34 ERROR: process "/bin/bash -c python3 scripts/build_wheel.py ${BUILD_WHEEL_ARGS}" did not complete successfully: exit code: 1
nvtest - INFO - ------
nvtest - INFO - > [wheel 10/10] RUN --mount=type=cache,target=/root/.cache/pip --mount=type=cache,target=/root/.cache/ccache python3 scripts/build_wheel.py --clean --trt_root /usr/local/tensorrt --python_bindings --benchmarks --cuda_architectures 89-real;90-real:
nvtest - INFO - 868.5 gmake[1]: *** [CMakeFiles/Makefile2:969: tensorrt_llm/CMakeFiles/tensorrt_llm.dir/rule] Error 2
nvtest - INFO - 868.5 gmake: *** [Makefile:205: tensorrt_llm] Error 2
nvtest - INFO - 868.5 Traceback (most recent call last):
nvtest - INFO - 868.5 File "/src/tensorrt_llm/scripts/build_wheel.py", line 389, in <module>
nvtest - INFO - 868.5 main(**vars(args))
nvtest - INFO - 868.5 File "/src/tensorrt_llm/scripts/build_wheel.py", line 187, in main
nvtest - INFO - 868.5 build_run(
nvtest - INFO - 868.5 File "/usr/lib/python3.10/subprocess.py", line 526, in run
nvtest - INFO - 868.5 raise CalledProcessError(retcode, process.args,
nvtest - INFO - 868.5 subprocess.CalledProcessError: Command 'cmake --build . --config Release --parallel 120 --target tensorrt_llm nvinfer_plugin_tensorrt_llm th_common bindings benchmarks executorWorker ' returned non-zero exit status 2.
nvtest - INFO - ------
nvtest - INFO - Dockerfile.multi:72
nvtest - INFO - --------------------
nvtest - INFO - 71 | ARG BUILD_WHEEL_ARGS="--clean --trt_root /usr/local/tensorrt --python_bindings --benchmarks"
nvtest - INFO - 72 | >>> RUN --mount=type=cache,target=/root/.cache/pip --mount=type=cache,target=/root/.cache/ccache
nvtest - INFO - 73 | >>> python3 scripts/build_wheel.py ${BUILD_WHEEL_ARGS}
nvtest - INFO - 74 |
nvtest - INFO - --------------------
nvtest - INFO - ERROR: failed to solve: process "/bin/bash -c python3 scripts/build_wheel.py ${BUILD_WHEEL_ARGS}" did not complete successfully: exit code: 1
nvtest - INFO - make: *** [Makefile:64: release_build] Error 1
nvtest - INFO - make: Leaving directory '/home/gpu_mode/nvtest-20240218/image/TensorRT-LLM/docker'
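
In the log above, the lines that matter are buried between hundreds of "Building ... object" progress lines. The earliest concrete cause shown is the `fatal error: cutlass/gemm/group_array_problem_shape.hpp: No such file or directory` line; the later `gmake` and `make` errors follow from it. A minimal sketch for pulling such lines out of a saved log, assuming the `nvtest - INFO - #NN SSS.S` prefix format seen here (the pattern list and the `extract_errors` helper are illustrative, not part of nvtest or TensorRT-LLM):

```python
import re

# Patterns that mark a failure in make/nvcc/gcc output.
# Assumption: this list is derived from the log above and is not exhaustive.
ERROR_PATTERNS = [
    re.compile(r"fatal error:"),
    re.compile(r"\berrors? detected in the compilation of\b"),
    re.compile(r"\*\*\* \["),  # make: "*** [target] Error N"
]

# Optional "nvtest - INFO - ", optional BuildKit step "#34 ", optional timestamp "417.1 ".
PREFIX = re.compile(r"^nvtest - INFO - (?:#\d+ )?(?:\d+\.\d+ )?")


def extract_errors(log_text):
    """Return the log lines that look like errors, with the nvtest prefix removed."""
    hits = []
    for line in log_text.splitlines():
        body = PREFIX.sub("", line)
        if any(p.search(body) for p in ERROR_PATTERNS):
            hits.append(body)
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python extract_errors.py < build.log
    for hit in extract_errors(sys.stdin.read()):
        print(hit)
```

Reading the filtered output from the top usually points to the first failing translation unit rather than the final `make` error.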
Traceback (most recent call last):
File "/usr/bin/nvtest", line 11, in <module>
load_entry_point('nvtest==22.12.1', 'console_scripts', 'nvtest')()
File "/usr/local/python3.8/lib/python3.8/site-packages/nvtest-22.12.1-py3.8.egg/nvtest/nvtest.py", line 445, in run
Command(sys.argv[1:])
File "/usr/local/python3.8/lib/python3.8/site-packages/nvtest-22.12.1-py3.8.egg/nvtest/nvtest.py", line 52, in __init__
getattr(self, args.command)()
File "/usr/local/python3.8/lib/python3.8/site-packages/nvtest-22.12.1-py3.8.egg/nvtest/nvtest.py", line 344, in image
args.func(args)
File "/usr/local/python3.8/lib/python3.8/site-packages/nvtest-22.12.1-py3.8.egg/nvtest/nvtest.py", line 242, in _image
ret = self.host.run(final_command)
File "/usr/local/python3.8/lib/python3.8/site-packages/nvtest-22.12.1-py3.8.egg/nvtest/common/host.py", line 70, in run
return self.backend.run(command, *args, **kwargs)
File "/usr/local/python3.8/lib/python3.8/site-packages/nvtest-22.12.1-py3.8.egg/nvtest/common/backend/local.py", line 18, in run
return self.run_local(self.get_command(command, *args))
File "/usr/local/python3.8/lib/python3.8/site-packages/nvtest-22.12.1-py3.8.egg/nvtest/common/backend/base.py", line 216, in run_local
stderr = p.stderr.read()
AttributeError: 'NoneType' object has no attribute 'read'
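
The final `AttributeError` is raised by nvtest itself after the build has already failed, so it is a secondary symptom. In Python, `Popen.stderr` is `None` unless the process was created with `stderr=subprocess.PIPE`. The sketch below shows that behavior with a stand-in `run_local` function. The name is borrowed from the traceback, but the body is hypothetical and is not nvtest's actual code.

```python
import subprocess
import sys


def run_local(command):
    """Run a command and return (returncode, stdout, stderr) as text.

    Hypothetical stand-in for the function named in the traceback.
    """
    p = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,  # without this, p.stderr is None
        text=True,
    )
    # communicate() drains both pipes and waits for the process to exit.
    out, err = p.communicate()
    return p.returncode, out, err


if __name__ == "__main__":
    # A child process that writes to stderr and exits with a non-zero status.
    child = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"]
    print(run_local(child))

    # For comparison: no stderr=PIPE, so the attribute is None.
    p = subprocess.Popen([sys.executable, "-c", "pass"])
    p.wait()
    print(p.stderr is None)
```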

Additional notes

① The folders (cutlass, cxxopts, json, NVTX) in TensorRT-LLM/3rdparty were empty, so I replaced them with copies that I downloaded from GitHub.
② I tried changing the --parallel count in TensorRT-LLM/scripts/build_wheel.py, but it did not help. The system has enough memory (2TB).
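
Empty folders under 3rdparty usually mean the git submodules were never fetched (`git submodule update --init --recursive` populates them). A manually downloaded cutlass may also be a different revision from the one this TensorRT-LLM source expects, which could explain the missing `group_array_problem_shape.hpp` header. A minimal sketch for checking both conditions before starting the long image build; the `check_third_party` helper and the `include/` layout inside the cutlass checkout are assumptions:

```python
from pathlib import Path

# Folder names come from the note above. The header is the one named in the
# "fatal error" line of the log. The include/ prefix is an assumption about
# how the cutlass checkout is laid out.
SUBMODULES = ["cutlass", "cxxopts", "json", "NVTX"]
HEADER = Path("cutlass/include/cutlass/gemm/group_array_problem_shape.hpp")


def check_third_party(repo_root):
    """Return (list of missing or empty 3rdparty folders, whether the header exists)."""
    third_party = Path(repo_root) / "3rdparty"
    empty = []
    for name in SUBMODULES:
        folder = third_party / name
        # A folder counts as empty if it is missing or has no entries at all.
        if not folder.is_dir() or not any(folder.iterdir()):
            empty.append(name)
    header_present = (third_party / HEADER).is_file()
    return empty, header_present


if __name__ == "__main__":
    import sys

    # Usage: python check_third_party.py /path/to/TensorRT-LLM
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    empty, header_present = check_third_party(root)
    print("empty or missing folders:", empty)
    print("group_array_problem_shape.hpp present:", header_present)
```

If the header is reported as absent while the cutlass folder is populated, the cutlass copy is likely older than what the source tree needs.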
