multi GPU error #100
My CUDA version is 11.7.
Downgrading Python to 3.8.5 still produces a similar error.
code:
This out-of-range problem does indeed come from an incorrect weight configuration: the [PAD] special token introduces an extra token id. However, the problem only shows up with my custom data, which still leaves it unexplained.
Hi, what does your custom dataset look like? Is the current problem that token ids go out of range when you use the custom dataset? Does removing [PAD] solve it for you?
The custom data is ordinary multivariate time-series data. The current problem is that the weights I downloaded from ModelScope are missing the eos token, which makes training on the custom data fail, yet ETTh2 trains without issue. After downloading the weights from the address provided in your project, training works normally. I suspect an out-of-range index: printing the minimum and maximum token ids gives 0 and 32000 (so 32001 distinct ids), while the vocabulary only has 32000 entries. But if it were an out-of-range index it should fail in both cases, and ETTh2 does not fail, so I'm confused. I haven't tried removing the [PAD] token yet; I'll try it in a follow-up experiment and report back.
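For context, a minimal sketch of the check being described here, assuming a Hugging Face LLaMA checkpoint (the local path and the decision to add [PAD] below are illustrative assumptions, not the project's actual configuration): compare the tokenizer's size against the number of rows in the embedding table, and resize the embeddings if [PAD] was added on top of the base 32000-entry vocabulary.

```python
# Sketch: verify that every token id fits inside the LLM's embedding table.
# Assumes a Hugging Face LLaMA checkpoint; the path below is a placeholder.
import torch
from transformers import LlamaModel, LlamaTokenizer

ckpt = "path/to/llama-7b"  # placeholder, not the weight address discussed above
tokenizer = LlamaTokenizer.from_pretrained(ckpt)
model = LlamaModel.from_pretrained(ckpt, torch_dtype=torch.bfloat16)

# Adding [PAD] as a new special token assigns it id 32000, one past the last
# valid row of a 32000-entry embedding table (valid ids are 0..31999).
tokenizer.add_special_tokens({"pad_token": "[PAD]"})

num_embeddings = model.get_input_embeddings().num_embeddings
print("tokenizer size:", len(tokenizer), "embedding rows:", num_embeddings)

if len(tokenizer) > num_embeddings:
    # Either grow the embedding table to match the tokenizer ...
    model.resize_token_embeddings(len(tokenizer))
    # ... or skip adding [PAD] and reuse an existing token (e.g. eos) for padding.
```

If the ids printed for a batch of prompts stay strictly below the number of embedding rows, the `srcIndex < srcSelectDimSize` assert quoted later in this thread is unlikely to come from the tokenizer.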
Hi, I ran it on both A100 and V100 and got the same error as in your screenshot. I'm also using the ModelScope LLaMA weights. Where is the "weight address provided in your project" you mentioned? I couldn't find it and would like to try it. Thanks.
1 similar comment
You can find the required weights via the Hugging Face address provided in the author's source code; the ModelScope weights do exhibit the problem described above.
Which source file contains the "Hugging Face address provided in the author's source code"? I couldn't find it just now. Could you share the link? Thanks.
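For reference, loading the backbone from the Hugging Face Hub rather than ModelScope generally follows the pattern below. The repo id is only an illustrative placeholder, since the exact address referenced in the project's source code is not quoted in this thread.

```python
# Sketch: load a LLaMA checkpoint from the Hugging Face Hub.
# "huggyllama/llama-7b" is an illustrative placeholder repo id, not necessarily
# the address used by the project.
from transformers import LlamaConfig, LlamaModel, LlamaTokenizer

repo_id = "huggyllama/llama-7b"  # placeholder

config = LlamaConfig.from_pretrained(repo_id)
tokenizer = LlamaTokenizer.from_pretrained(repo_id)
llm = LlamaModel.from_pretrained(repo_id, config=config)

# Quick sanity check that the checkpoint ships a usable eos token and that the
# vocabulary size matches the embedding table.
print("vocab_size:", config.vocab_size, "eos_token:", tokenizer.eos_token)
print("embedding rows:", llm.get_input_embeddings().num_embeddings)
```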
I tried running the LLaMA model on two A6000 (96 GB) cards and on two GV100 cards, but CUDA raises an error.
Single-card BERT runs fine, but as soon as I switch to two cards the error appears in the source_embeddings forward pass:
source_embeddings = self.mapping_layer(self.word_embeddings.permute(1, 0)).permute(1, 0)
The error output is below (a minimal debugging sketch follows the traceback). Has anyone run into a similar situation? Any help would be appreciated!
[2024-06-01 22:11:10,870] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_machines` was set to a value of `1`
        `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
[2024-06-01 22:11:13,842] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-01 22:11:14,481] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-01 22:11:15,874] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-06-01 22:11:16,787] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-06-01 22:11:16,787] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
Loading checkpoint shards: 100%|████████████████████████████████████| 33/33 [00:13<00:00, 2.48it/s]
Loading checkpoint shards: 100%|████████████████████████████████████| 33/33 [00:13<00:00, 2.45it/s]
[2024-06-01 22:12:03,981] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.13.0, git-hash=unknown, git-branch=unknown
[2024-06-01 22:12:15,085] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2024-06-01 22:12:15,086] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2024-06-01 22:12:15,086] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2024-06-01 22:12:15,087] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = Adam
[2024-06-01 22:12:15,087] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=Adam type=<class 'torch.optim.adam.Adam'>
[2024-06-01 22:12:15,087] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 2 optimizer
[2024-06-01 22:12:15,087] [INFO] [stage_1_and_2.py:143:__init__] Reduce bucket size 200000000
[2024-06-01 22:12:15,087] [INFO] [stage_1_and_2.py:144:__init__] Allgather bucket size 200000000
[2024-06-01 22:12:15,087] [INFO] [stage_1_and_2.py:145:__init__] CPU Offload: False
[2024-06-01 22:12:15,087] [INFO] [stage_1_and_2.py:146:__init__] Round robin gradient partitioning: False
0it [00:00, ?it/s][2024-06-01 22:12:15,837] [INFO] [utils.py:791:see_memory_usage] Before initializing optimizer states
[2024-06-01 22:12:15,837] [INFO] [utils.py:792:see_memory_usage] MA 12.51 GB Max_MA 12.55 GB CA 12.55 GB Max_CA 13 GB
[2024-06-01 22:12:15,838] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 9.52 GB, percent = 1.9%
[2024-06-01 22:12:15,934] [INFO] [utils.py:791:see_memory_usage] After initializing optimizer states
[2024-06-01 22:12:15,935] [INFO] [utils.py:792:see_memory_usage] MA 12.68 GB Max_MA 12.94 GB CA 12.98 GB Max_CA 13 GB
[2024-06-01 22:12:15,935] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 9.52 GB, percent = 1.9%
[2024-06-01 22:12:15,935] [INFO] [stage_1_and_2.py:533:__init__] optimizer state initialized
[2024-06-01 22:12:16,027] [INFO] [utils.py:791:see_memory_usage] After initializing ZeRO optimizer
[2024-06-01 22:12:16,027] [INFO] [utils.py:792:see_memory_usage] MA 12.68 GB Max_MA 12.68 GB CA 12.98 GB Max_CA 13 GB
[2024-06-01 22:12:16,027] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 9.52 GB, percent = 1.9%
[2024-06-01 22:12:16,028] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = Adam
[2024-06-01 22:12:16,028] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2024-06-01 22:12:16,028] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2024-06-01 22:12:16,028] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0003999999999999993], mom=[(0.95, 0.999)]
[2024-06-01 22:12:16,028] [INFO] [config.py:984:print] DeepSpeedEngine configuration:
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] amp_enabled .................. False
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] amp_params ................... False
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": "autotuning_results",
"exps_dir": "autotuning_exps",
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] bfloat16_enabled ............. True
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] checkpoint_parallel_write_pipeline False
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] checkpoint_tag_validation_enabled True
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] checkpoint_tag_validation_fail False
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7bd3dbb9f6d0>
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] communication_data_type ...... None
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] curriculum_enabled_legacy .... False
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] curriculum_params_legacy ..... False
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] data_efficiency_enabled ...... False
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] dataloader_drop_last ......... False
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] disable_allgather ............ False
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] dump_state ................... False
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] dynamic_loss_scale_args ...... None
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] eigenvalue_enabled ........... False
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] eigenvalue_gas_boundary_resolution 1
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] eigenvalue_layer_name ........ bert.encoder.layer
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] eigenvalue_layer_num ......... 0
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] eigenvalue_max_iter .......... 100
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] eigenvalue_stability ......... 1e-06
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] eigenvalue_tol ............... 0.01
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] eigenvalue_verbose ........... False
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] elasticity_enabled ........... False
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] flops_profiler_config ........ {
"enabled": false,
"recompute_fwd_factor": 0.0,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] fp16_auto_cast ............... None
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] fp16_enabled ................. False
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] fp16_master_weights_and_gradients False
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] global_rank .................. 0
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] grad_accum_dtype ............. None
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] gradient_accumulation_steps .. 1
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] gradient_clipping ............ 0.0
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] gradient_predivide_factor .... 1.0
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] graph_harvesting ............. False
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-06-01 22:12:16,029] [INFO] [config.py:988:print] initial_dynamic_scale ........ 1
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] load_universal_checkpoint .... False
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] loss_scale ................... 1.0
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] memory_breakdown ............. False
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] mics_hierarchial_params_gather False
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] mics_shard_size .............. -1
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] nebula_config ................ {
"enabled": false,
"persistent_storage_path": null,
"persistent_time_interval": 100,
"num_of_version_in_retention": 2,
"enable_nebula_load": true,
"load_path": null
}
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] optimizer_legacy_fusion ...... False
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] optimizer_name ............... None
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] optimizer_params ............. None
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] pld_enabled .................. False
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] pld_params ................... False
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] prescale_gradients ........... False
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] scheduler_name ............... None
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] scheduler_params ............. None
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] seq_parallel_communication_data_type torch.float32
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] sparse_attention ............. None
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] sparse_gradients_enabled ..... False
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] steps_per_print .............. inf
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] train_batch_size ............. 48
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] train_micro_batch_size_per_gpu 24
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] use_data_before_expert_parallel_ False
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] use_node_local_storage ....... False
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] wall_clock_breakdown ......... False
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] weight_quantization_config ... None
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] world_size ................... 2
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] zero_allow_untested_optimizer True
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=200000000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=200000000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] zero_enabled ................. True
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] zero_force_ds_cpu_optimizer .. True
[2024-06-01 22:12:16,030] [INFO] [config.py:988:print] zero_optimization_stage ...... 2
[2024-06-01 22:12:16,030] [INFO] [config.py:974:print_user_config] json = {
"bf16": {
"enabled": true,
"auto_cast": true
},
"zero_optimization": {
"stage": 2,
"allgather_partitions": true,
"allgather_bucket_size": 2.000000e+08,
"overlap_comm": true,
"reduce_scatter": true,
"reduce_bucket_size": 2.000000e+08,
"contiguous_gradients": true,
"sub_group_size": 1.000000e+09
},
"gradient_accumulation_steps": 1,
"train_batch_size": 48,
"train_micro_batch_size_per_gpu": 24,
"steps_per_print": inf,
"wall_clock_breakdown": false,
"fp16": {
"enabled": false
},
"zero_allow_untested_optimizer": true
}
0it [00:00, ?it/s]
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [183,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [183,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
...
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [221,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
0it [00:00, ?it/s]
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [523,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
...
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [523,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
0it [00:00, ?it/s]
Traceback (most recent call last):
File "/media/lenovo/DATA/zth/Time-LLM-main/run_main.py", line 211, in
outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
File "/home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1842, in forward
loss = self.module(*inputs, **kwargs)
File "/home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/media/lenovo/DATA/zth/Time-LLM-main/models/TimeLLM.py", line 197, in forward
dec_out = self.forecast(x_enc, x_mark_enc, x_dec, x_mark_dec)
File "/media/lenovo/DATA/zth/Time-LLM-main/models/TimeLLM.py", line 238, in forecast
source_embeddings = self.mapping_layer(self.word_embeddings.permute(1, 0)).permute(1, 0)
File "/home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
Traceback (most recent call last):
File "/media/lenovo/DATA/zth/Time-LLM-main/run_main.py", line 211, in
outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
File "/home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1842, in forward
loss = self.module(*inputs, **kwargs)
File "/home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/media/lenovo/DATA/zth/Time-LLM-main/models/TimeLLM.py", line 197, in forward
dec_out = self.forecast(x_enc, x_mark_enc, x_dec, x_mark_dec)
File "/media/lenovo/DATA/zth/Time-LLM-main/models/TimeLLM.py", line 238, in forecast
source_embeddings = self.mapping_layer(self.word_embeddings.permute(1, 0)).permute(1, 0)
File "/home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x70cf3ddaf4d7 in /home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x70cf3dd7936b in /home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x70cf52d42b58 in /home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x80 (0x70cee4777450 in /home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x70cee477aa28 in /home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x227 (0x70cee477bf77 in /home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: + 0xdc253 (0x70cf3d2dc253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #7: + 0x94ac3 (0x70cf5d494ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #8: + 0x126850 (0x70cf5d526850 in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7bd4877af4d7 in /home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7bd48777936b in /home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7bd491003b58 in /home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x80 (0x7bd419177450 in /home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7bd41917aa28 in /home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x227 (0x7bd41917bf77 in /home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: + 0xdc253 (0x7bd471cdc253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #7: + 0x94ac3 (0x7bd491e94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #8: + 0x126850 (0x7bd491f26850 in /lib/x86_64-linux-gnu/libc.so.6)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 0 (pid: 5552) of binary: /home/lenovo/anaconda3/envs/timellm/bin/python
Traceback (most recent call last):
File "/home/lenovo/anaconda3/envs/timellm/bin/accelerate", line 8, in
sys.exit(main())
File "/home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/accelerate/commands/launch.py", line 932, in launch_command
multi_gpu_launcher(args)
File "/home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/accelerate/commands/launch.py", line 627, in multi_gpu_launcher
distrib_run.run(args)
File "/home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/lenovo/anaconda3/envs/timellm/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
run_main.py FAILED
Failures:
[1]:
time : 2024-06-01_22:12:17
host : lenovo-ThinkStation-P7
rank : 1 (local_rank: 1)
exitcode : -6 (pid: 5553)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 5553
Root Cause (first observed failure):
[0]:
time : 2024-06-01_22:12:17
host : lenovo-ThinkStation-P7
rank : 0 (local_rank: 0)
exitcode : -6 (pid: 5552)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 5552
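As mentioned in the issue description above, here is a minimal debugging sketch for this kind of failure: make CUDA kernel launches synchronous (the log itself suggests CUDA_LAUNCH_BLOCKING=1) and validate the prompt token ids before they reach the embedding lookup. The helper and variable names below are assumptions for illustration, not code from the repository.

```python
# Sketch: turn the asynchronous device-side assert into a readable Python error.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA is initialized

import torch

def check_token_ids(prompt_ids: torch.Tensor, embedding: torch.nn.Embedding) -> None:
    """Fail with a clear message instead of `srcIndex < srcSelectDimSize`."""
    min_id, max_id = int(prompt_ids.min()), int(prompt_ids.max())
    if min_id < 0 or max_id >= embedding.num_embeddings:
        raise ValueError(
            f"token ids span [{min_id}, {max_id}] but the embedding table "
            f"only has {embedding.num_embeddings} rows"
        )

# Example usage (hypothetical placement inside the model's forecast(), just
# before the prompt embedding lookup that precedes the failing mapping_layer call):
# check_token_ids(prompt_ids, self.llm_model.get_input_embeddings())
```

With synchronous launches the traceback points at the kernel that actually fails, rather than the later cuBLAS call that typically only surfaces the earlier assert.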