RuntimeError: expected scalar type Float but found BFloat16 #3

Closed
gsamaras opened this issue Feb 6, 2024 · 17 comments

Comments

@gsamaras

gsamaras commented Feb 6, 2024

I am trying to run the ETTm1 example, but despite a plethora of efforts, I keep getting:

[2024-02-07 17:07:11,875] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-02-07 17:07:12,281] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.13.1, git-hash=unknown, git-branch=unknown
[2024-02-07 17:07:12,282] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-02-07 17:07:12,282] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2024-02-07 17:07:12,293] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=172.19.2.2, master_port=29500
[2024-02-07 17:07:12,293] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-02-07 17:07:13,600] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2024-02-07 17:07:13,601] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2024-02-07 17:07:13,601] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2024-02-07 17:07:13,602] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = Adam
[2024-02-07 17:07:13,602] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=Adam type=<class 'torch.optim.adam.Adam'>
[2024-02-07 17:07:13,603] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 2 optimizer
[2024-02-07 17:07:13,603] [INFO] [stage_1_and_2.py:143:__init__] Reduce bucket size 200000000
[2024-02-07 17:07:13,603] [INFO] [stage_1_and_2.py:144:__init__] Allgather bucket size 200000000
[2024-02-07 17:07:13,603] [INFO] [stage_1_and_2.py:145:__init__] CPU Offload: False
[2024-02-07 17:07:13,603] [INFO] [stage_1_and_2.py:146:__init__] Round robin gradient partitioning: False
[2024-02-07 17:07:13,759] [INFO] [utils.py:791:see_memory_usage] Before initializing optimizer states
[2024-02-07 17:07:13,760] [INFO] [utils.py:792:see_memory_usage] MA 0.0 GB         Max_MA 0.0 GB         CA 0.0 GB         Max_CA 0 GB 
[2024-02-07 17:07:13,761] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory:  used = 2.27 GB, percent = 7.2%
[2024-02-07 17:07:13,980] [INFO] [utils.py:791:see_memory_usage] After initializing optimizer states
[2024-02-07 17:07:13,981] [INFO] [utils.py:792:see_memory_usage] MA 0.0 GB         Max_MA 0.0 GB         CA 0.0 GB         Max_CA 0 GB 
[2024-02-07 17:07:13,981] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory:  used = 2.32 GB, percent = 7.4%
[2024-02-07 17:07:13,981] [INFO] [stage_1_and_2.py:533:__init__] optimizer state initialized
[2024-02-07 17:07:14,103] [INFO] [utils.py:791:see_memory_usage] After initializing ZeRO optimizer
[2024-02-07 17:07:14,104] [INFO] [utils.py:792:see_memory_usage] MA 0.0 GB         Max_MA 0.0 GB         CA 0.0 GB         Max_CA 0 GB 
[2024-02-07 17:07:14,105] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory:  used = 2.32 GB, percent = 7.4%
[2024-02-07 17:07:14,107] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = Adam
[2024-02-07 17:07:14,107] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2024-02-07 17:07:14,107] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2024-02-07 17:07:14,107] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[3.9999999999999996e-05], mom=[(0.95, 0.999)]
[2024-02-07 17:07:14,108] [INFO] [config.py:984:print] DeepSpeedEngine configuration:
[2024-02-07 17:07:14,108] [INFO] [config.py:988:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2024-02-07 17:07:14,108] [INFO] [config.py:988:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2024-02-07 17:07:14,108] [INFO] [config.py:988:print]   amp_enabled .................. False
[2024-02-07 17:07:14,108] [INFO] [config.py:988:print]   amp_params ................... False
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   bfloat16_enabled ............. True
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   checkpoint_parallel_write_pipeline  False
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   checkpoint_tag_validation_enabled  True
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   checkpoint_tag_validation_fail  False
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7985856dae60>
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   communication_data_type ...... None
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   curriculum_enabled_legacy .... False
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   curriculum_params_legacy ..... False
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   data_efficiency_enabled ...... False
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   dataloader_drop_last ......... False
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   disable_allgather ............ False
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   dump_state ................... False
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   dynamic_loss_scale_args ...... None
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   eigenvalue_enabled ........... False
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   eigenvalue_gas_boundary_resolution  1
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   eigenvalue_layer_num ......... 0
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   eigenvalue_max_iter .......... 100
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   eigenvalue_stability ......... 1e-06
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   eigenvalue_tol ............... 0.01
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   eigenvalue_verbose ........... False
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   elasticity_enabled ........... False
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   fp16_auto_cast ............... None
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   fp16_enabled ................. False
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   fp16_master_weights_and_gradients  False
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   global_rank .................. 0
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   grad_accum_dtype ............. None
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   gradient_accumulation_steps .. 1
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   gradient_clipping ............ 0.0
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   gradient_predivide_factor .... 1.0
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   graph_harvesting ............. False
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   initial_dynamic_scale ........ 1
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   load_universal_checkpoint .... False
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   loss_scale ................... 1.0
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   memory_breakdown ............. False
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   mics_hierarchial_params_gather  False
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   mics_shard_size .............. -1
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   optimizer_legacy_fusion ...... False
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   optimizer_name ............... None
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   optimizer_params ............. None
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   pld_enabled .................. False
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   pld_params ................... False
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   prescale_gradients ........... False
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   scheduler_name ............... None
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   scheduler_params ............. None
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   seq_parallel_communication_data_type  torch.float32
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   sparse_attention ............. None
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   sparse_gradients_enabled ..... False
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   steps_per_print .............. inf
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   train_batch_size ............. 24
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   train_micro_batch_size_per_gpu  24
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   use_data_before_expert_parallel_  False
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   use_node_local_storage ....... False
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   wall_clock_breakdown ......... False
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   weight_quantization_config ... None
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   world_size ................... 1
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   zero_allow_untested_optimizer  True
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=200000000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=200000000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   zero_enabled ................. True
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   zero_force_ds_cpu_optimizer .. True
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   zero_optimization_stage ...... 2
[2024-02-07 17:07:14,113] [INFO] [config.py:974:print_user_config]   json = {
    "bf16": {
        "enabled": true, 
        "auto_cast": true
    }, 
    "zero_optimization": {
        "stage": 2, 
        "allgather_partitions": true, 
        "allgather_bucket_size": 2.000000e+08, 
        "overlap_comm": true, 
        "reduce_scatter": true, 
        "reduce_bucket_size": 2.000000e+08, 
        "contiguous_gradients": true, 
        "sub_group_size": 1.000000e+09
    }, 
    "gradient_accumulation_steps": 1, 
    "train_batch_size": 24, 
    "train_micro_batch_size_per_gpu": 24, 
    "steps_per_print": inf, 
    "wall_clock_breakdown": false, 
    "fp16": {
        "enabled": false
    }, 
    "zero_allow_untested_optimizer": true
}
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/kaggle/working/Time-LLM/run_main.py", line 208, in <module>
    outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1842, in forward
    loss = self.module(*inputs, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/kaggle/working/Time-LLM/models/Autoformer.py", line 146, in forward
    dec_out = self.forecast(x_enc, x_mark_enc, x_dec, x_mark_dec)
  File "/kaggle/working/Time-LLM/models/Autoformer.py", line 102, in forecast
    enc_out = self.enc_embedding(x_enc, x_mark_enc)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/kaggle/working/Time-LLM/layers/Embed.py", line 145, in forward
    x = self.value_embedding(x) + self.temporal_embedding(x_mark)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/kaggle/working/Time-LLM/layers/Embed.py", line 42, in forward
    x = self.tokenConv(x.permute(0, 2, 1)).transpose(1, 2)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 310, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 303, in _conv_forward
    return F.conv1d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
RuntimeError: expected scalar type Float but found BFloat16
@gsamaras gsamaras closed this as completed Feb 6, 2024
@gsamaras gsamaras reopened this Feb 6, 2024
@gsamaras gsamaras changed the title Quick Start RuntimeError: expected scalar type Float but found BFloat16 Feb 7, 2024
@KimMeen
Owner

KimMeen commented Feb 8, 2024

@gsamaras I suspect this was caused by --mixed_precision bf16 and the related configuration. The current implementation natively targets Ampere (or newer) cards, which support bf16 tensor operations well.
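As a rough illustration of the Ampere requirement, the sketch below maps a CUDA compute capability to one of the values accepted by `accelerate launch --mixed_precision`. The function name `pick_mixed_precision` and the thresholds are assumptions made for this example, not part of Time-LLM. Ampere corresponds to compute capability 8.0, so anything below that falls back to fp16 or no mixed precision.

```python
def pick_mixed_precision(compute_capability):
    """Heuristic: choose a --mixed_precision value from (major, minor).

    Hypothetical helper for illustration only.
    """
    major, _minor = compute_capability
    if major >= 8:   # Ampere (e.g. A100) and newer: native bf16 support
        return "bf16"
    if major >= 7:   # Volta/Turing (e.g. V100, T4): fp16 tensor cores, no native bf16
        return "fp16"
    return "no"      # older cards: stay in float32


# Compute capabilities of a few common GPUs.
for name, cc in [("A100", (8, 0)), ("T4", (7, 5)), ("P100", (6, 0))]:
    print(name, cc, "->", pick_mixed_precision(cc))
```

On a machine with PyTorch and a GPU, the tuple can be obtained from `torch.cuda.get_device_capability()`.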

@gsamaras
Author

gsamaras commented Feb 8, 2024

If I run with accelerate launch, I get the same problem as in #1 (comment).

If I run with python run_main.py, I get the RuntimeError: expected scalar type Float but found BFloat16 error. The mixed precision parameter is only used by accelerate launch; when running with python, it is not recognized.

Do you have an online notebook I could use where you have a working instance of your model?

@KimMeen
Owner

KimMeen commented Feb 8, 2024

@gsamaras please use accelerate launch and make sure you have correctly configured num_processes, as I mentioned in #1. You may also refer to paperswithcode/galai#3 for the RuntimeError: CUDA error: invalid device ordinal error. The default configuration and scripts should be ready to use on an instance with 8*A100.

@gsamaras
Author

gsamaras commented Feb 8, 2024

I changed num_processes to the number of GPUs. Even if I try to run with fp16 (changing it in the DeepSpeed config too), I get:

RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half
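When switching precision, the `bf16` and `fp16` sections of the DeepSpeed JSON need to agree with the `--mixed_precision` flag. The sketch below builds such a config with exactly one of the two enabled. The helper name `make_ds_precision_config` is hypothetical, and the thread does not show the fp16 config that was actually used. The `bf16`/`fp16` keys with an `enabled` field match the config printed in the log above. This only keeps the config consistent; it does not cast input tensors, so a hard-coded cast inside the model can still cause a dtype mismatch.

```python
import json


def make_ds_precision_config(precision, base=None):
    """Return a copy of `base` with exactly one of bf16/fp16 enabled.

    Hypothetical helper for illustration; `precision` is "bf16", "fp16" or "no".
    """
    if precision not in ("bf16", "fp16", "no"):
        raise ValueError(f"unsupported precision: {precision!r}")
    config = dict(base or {})  # shallow copy, so `base` gains no new keys
    config["bf16"] = {"enabled": precision == "bf16"}
    config["fp16"] = {"enabled": precision == "fp16"}
    return config


# Values taken from the config shown in the log above.
base = {
    "zero_optimization": {"stage": 2},
    "train_micro_batch_size_per_gpu": 24,
}
cfg = make_ds_precision_config("fp16", base)
print(json.dumps(cfg, indent=2))
```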

Are you aware where I can find a free online instance with A100 for a basic demo of your code @KimMeen? Kaggle? Colab maybe?

@KimMeen
Owner

KimMeen commented Feb 8, 2024

@gsamaras The error you encountered may be caused by this:

enc_out, n_vars = self.patch_embedding(x_enc.to(torch.bfloat16))

Slight modifications, like the one mentioned above, are needed if you are not using Ampere cards. You may refer to this for information on GPU instances.
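One possible form of such a modification, sketched here under assumptions rather than taken from the repository, is to cast the input to whatever dtype the layer's weights currently have instead of hard-coding torch.bfloat16. The helper names `module_dtype` and `forward_in_module_dtype` are made up for this example, and a plain `nn.Linear` stands in for the patch embedding.

```python
try:
    import torch
    from torch import nn
except ImportError:  # keeps the sketch importable on machines without PyTorch
    torch = None


def module_dtype(module):
    """Return the dtype of the first floating-point parameter of `module`."""
    for param in module.parameters():
        if param.is_floating_point():
            return param.dtype
    return torch.get_default_dtype()


def forward_in_module_dtype(module, x):
    """Cast `x` to the module's parameter dtype before calling the module."""
    return module(x.to(module_dtype(module)))


if torch is not None:
    # A layer whose weights are bfloat16, fed with a float32 batch.
    layer = nn.Linear(7, 16).to(torch.bfloat16)
    batch = torch.randn(2, 96, 7)  # float32 by default
    out = forward_in_module_dtype(layer, batch)
    print(batch.dtype, "->", out.dtype)
```

With this pattern the same forward code works whether the weights are float32, float16 or bfloat16, because the input follows the weights rather than a fixed dtype.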

@aliper96

aliper96 commented Feb 8, 2024

for ii in range(args.itr):
    # setting record of experiments
    setting = '{}{}{}_{}_ft{}_sl{}_ll{}_pl{}_dm{}_nh{}_el{}_dl{}_df{}fc{}eb{}{}{}'.format(
        args.task_name,
        args.model_id,
        args.model,
        args.data,
        args.features,
        args.seq_len,
        args.label_len,
        args.pred_len,
        args.d_model,
        args.n_heads,
        args.e_layers,
        args.d_layers,
        args.d_ff,
        args.factor,
        args.embed,
        args.des, ii)

    train_data, train_loader = data_provider(args, 'train')
    vali_data, vali_loader = data_provider(args, 'val')
    test_data, test_loader = data_provider(args, 'test')

    if args.model == 'Autoformer':
        model = Autoformer.Model(args).float()
    elif args.model == 'DLinear':
        model = DLinear.Model(args).float()
    else:
        model = TimeLLM.Model(args).float()

    model = model.to(torch.bfloat16)

    path = os.path.join(args.checkpoints,
                        setting + '-' + args.model_comment)  # unique checkpoint saving path
    args.content = load_content(args)
    if not os.path.exists(path) and accelerator.is_local_main_process:
        os.makedirs(path)

    time_now = time.time()

    train_steps = len(train_loader)
    early_stopping = EarlyStopping(accelerator=accelerator, patience=args.patience)

    trained_parameters = []
    for p in model.parameters():
        if p.requires_grad is True:
            trained_parameters.append(p)

    model_optim = optim.Adam(trained_parameters, lr=args.learning_rate)

    if args.lradj == 'COS':
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(model_optim, T_max=20, eta_min=1e-8)
    else:
        scheduler = lr_scheduler.OneCycleLR(optimizer=model_optim,
                                            steps_per_epoch=train_steps,
                                            pct_start=args.pct_start,
                                            epochs=args.train_epochs,
                                            max_lr=args.learning_rate)

    criterion = nn.MSELoss()
    mae_metric = nn.L1Loss()

    train_loader, vali_loader, test_loader, model, model_optim, scheduler = accelerator.prepare(
        train_loader, vali_loader, test_loader, model, model_optim, scheduler)

    if args.use_amp:
        scaler = torch.cuda.amp.GradScaler()

    for epoch in range(args.train_epochs):
        iter_count = 0
        train_loss = []

        model.train()
        epoch_time = time.time()
        for i, (batch_x, batch_y, batch_x_mark, batch_y_mark) in tqdm(enumerate(train_loader)):
            iter_count += 1
            model_optim.zero_grad()

            batch_x = batch_x.float().to(accelerator.device)
            batch_y = batch_y.float().to(accelerator.device)
            batch_x_mark = batch_x_mark.float().to(accelerator.device)
            batch_y_mark = batch_y_mark.float().to(accelerator.device)

            # decoder input
            dec_inp = torch.zeros_like(batch_y[:, -args.pred_len:, :]).float().to(
                accelerator.device)
            dec_inp = torch.cat([batch_y[:, :args.label_len, :], dec_inp], dim=1).float().to(
                accelerator.device)

            # encoder - decoder
            if args.use_amp:
                with torch.cuda.amp.autocast():
                    if args.output_attention:
                        outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)[0]
                    else:
                        outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)

                    f_dim = -1 if args.features == 'MS' else 0
                    outputs = outputs[:, -args.pred_len:, f_dim:]
                    batch_y = batch_y[:, -args.pred_len:, f_dim:].to(accelerator.device)
                    loss = criterion(outputs, batch_y)
                    train_loss.append(loss.item())
            else:
                if args.output_attention:
                    outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)[0]
                else:
                    outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)

                f_dim = -1 if args.features == 'MS' else 0
                outputs = outputs[:, -args.pred_len:, f_dim:]
                batch_y = batch_y[:, -args.pred_len:, f_dim:]
                loss = criterion(outputs, batch_y)
                train_loss.append(loss.item())

            if (i + 1) % 100 == 0:
                accelerator.print(
                    "\titers: {0}, epoch: {1} | loss: {2:.7f}".format(i + 1, epoch + 1, loss.item()))
                speed = (time.time() - time_now) / iter_count
                left_time = speed * ((args.train_epochs - epoch) * train_steps - i)
                accelerator.print('\tspeed: {:.4f}s/iter; left time: {:.4f}s'.format(speed, left_time))
                iter_count = 0
                time_now = time.time()

            if args.use_amp:
                scaler.scale(loss).backward()
                scaler.step(model_optim)
                scaler.update()
            else:
                accelerator.backward(loss)
                model_optim.step()

            if args.lradj == 'TST':
                adjust_learning_rate(accelerator, model_optim, scheduler, epoch + 1, args, printout=False)
                scheduler.step()

        accelerator.print("Epoch: {} cost time: {}".format(epoch + 1, time.time() - epoch_time))
        train_loss = np.average(train_loss)
        vali_loss, vali_mae_loss = vali(args, accelerator, model, vali_data, vali_loader, criterion, mae_metric)
        test_loss, test_mae_loss = vali(args, accelerator, model, test_data, test_loader, criterion, mae_metric)
        accelerator.print(
            "Epoch: {0} | Train Loss: {1:.7f} Vali Loss: {2:.7f} Test Loss: {3:.7f} MAE Loss: {4:.7f}".format(
                epoch + 1, train_loss, vali_loss, test_loss, test_mae_loss))

        early_stopping(vali_loss, model, path)
        if early_stopping.early_stop:
            accelerator.print("Early stopping")
            break

        if args.lradj != 'TST':
            if args.lradj == 'COS':
                scheduler.step()
                accelerator.print("lr = {:.10f}".format(model_optim.param_groups[0]['lr']))
            else:
                if epoch == 0:
                    args.learning_rate = model_optim.param_groups[0]['lr']
                    accelerator.print("lr = {:.10f}".format(model_optim.param_groups[0]['lr']))
                adjust_learning_rate(accelerator, model_optim, scheduler, epoch + 1, args, printout=True)

        else:
            accelerator.print('Updating learning rate to {}'.format(scheduler.get_last_lr()[0]))

    accelerator.wait_for_everyone()

This worked for me, also on Windows.

@aliper96

aliper96 commented Feb 8, 2024

I also had to remove the DeepSpeed dependency, like this:

ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])

The only other change in the previous code is that I converted the entire model to 16 bits with this line:

model = model.to(torch.bfloat16)

@gsamaras
Author

gsamaras commented Feb 8, 2024

@aliper96 thanks for joining in, can you provide a minimal complete and reproducible example please?

@aliper96

aliper96 commented Feb 8, 2024

alitimellm.zip
Here is the notebook.

@gsamaras
Author

gsamaras commented Feb 8, 2024

@aliper96 I think I'm close, but I get the following error, something with the paths maybe? Check it live in Kaggle here:

---------------------------------------------------------------------------
HFValidationError                         Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py:385, in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, token, revision, local_files_only, subfolder, repo_type, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash, **deprecated_kwargs)
    383 try:
    384     # Load from URL or cache if already cached
--> 385     resolved_file = hf_hub_download(
    386         path_or_repo_id,
    387         filename,
    388         subfolder=None if len(subfolder) == 0 else subfolder,
    389         repo_type=repo_type,
    390         revision=revision,
    391         cache_dir=cache_dir,
    392         user_agent=user_agent,
    393         force_download=force_download,
    394         proxies=proxies,
    395         resume_download=resume_download,
    396         token=token,
    397         local_files_only=local_files_only,
    398     )
    399 except GatedRepoError as e:

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:110, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
    109 if arg_name in ["repo_id", "from_id", "to_id"]:
--> 110     validate_repo_id(arg_value)
    112 elif arg_name == "token" and arg_value is not None:

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:158, in validate_repo_id(repo_id)
    157 if repo_id.count("/") > 1:
--> 158     raise HFValidationError(
    159         "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
    160         f" '{repo_id}'. Use `repo_type` argument if needed."
    161     )
    163 if not REPO_ID_REGEX.match(repo_id):

HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/mnt/alps/modelhub/pretrained_model/LLaMA/7B_hf/'. Use `repo_type` argument if needed.

The above exception was the direct cause of the following exception:

OSError                                   Traceback (most recent call last)
Cell In[22], line 30
     28     model = DLinear.Model(args).float()
     29 else:
---> 30     model = TimeLLM.Model(args).float()
     32 model = model.to(torch.bfloat16)
     35 path = os.path.join(args.checkpoints,
     36                     setting + '-' + args.model_comment)  # unique checkpoint saving path

File /kaggle/working/Time-LLM/models/TimeLLM.py:44, in Model.__init__(self, configs, patch_len, stride)
     41 self.patch_len = configs.patch_len
     42 self.stride = configs.stride
---> 44 self.llama_config = LlamaConfig.from_pretrained('/mnt/alps/modelhub/pretrained_model/LLaMA/7B_hf/')
     45 # self.llama_config = LlamaConfig.from_pretrained('huggyllama/llama-7b')
     46 self.llama_config.num_hidden_layers = configs.llm_layers

File /opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py:605, in PretrainedConfig.from_pretrained(cls, pretrained_model_name_or_path, cache_dir, force_download, local_files_only, token, revision, **kwargs)
    601 kwargs["revision"] = revision
    603 cls._set_token_in_kwargs(kwargs, token)
--> 605 config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
    606 if "model_type" in config_dict and hasattr(cls, "model_type") and config_dict["model_type"] != cls.model_type:
    607     logger.warning(
    608         f"You are using a model of type {config_dict['model_type']} to instantiate a model of type "
    609         f"{cls.model_type}. This is not supported for all configurations of models and can yield errors."
    610     )

File /opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py:634, in PretrainedConfig.get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    632 original_kwargs = copy.deepcopy(kwargs)
    633 # Get config dict associated with the base config file
--> 634 config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
    635 if "_commit_hash" in config_dict:
    636     original_kwargs["_commit_hash"] = config_dict["_commit_hash"]

File /opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py:689, in PretrainedConfig._get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    685 configuration_file = kwargs.pop("_configuration_file", CONFIG_NAME)
    687 try:
    688     # Load from local folder or from cache or download from model Hub and cache
--> 689     resolved_config_file = cached_file(
    690         pretrained_model_name_or_path,
    691         configuration_file,
    692         cache_dir=cache_dir,
    693         force_download=force_download,
    694         proxies=proxies,
    695         resume_download=resume_download,
    696         local_files_only=local_files_only,
    697         token=token,
    698         user_agent=user_agent,
    699         revision=revision,
    700         subfolder=subfolder,
    701         _commit_hash=commit_hash,
    702     )
    703     commit_hash = extract_commit_hash(resolved_config_file, commit_hash)
    704 except EnvironmentError:
    705     # Raise any environment error raise by `cached_file`. It will have a helpful error message adapted to
    706     # the original exception.

File /opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py:450, in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, token, revision, local_files_only, subfolder, repo_type, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash, **deprecated_kwargs)
    448     raise EnvironmentError(f"There was a specific connection error when trying to load {path_or_repo_id}:\n{err}")
    449 except HFValidationError as e:
--> 450     raise EnvironmentError(
    451         f"Incorrect path_or_model_id: '{path_or_repo_id}'. Please provide either the path to a local folder or the repo_id of a model on the Hub."
    452     ) from e
    453 return resolved_file

OSError: Incorrect path_or_model_id: '/mnt/alps/modelhub/pretrained_model/LLaMA/7B_hf/'. Please provide either the path to a local folder or the repo_id of a model on the Hub.

@KimMeen
Owner

KimMeen commented Feb 8, 2024

@gsamaras Simply using 'huggyllama/llama-7b' instead of '/mnt/alps/modelhub/pretrained_model/LLaMA/7B_hf/' in TimeLLM.py will solve this issue.
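
A minimal sketch of that substitution, assuming the same code should run both on a machine that has the local checkpoint folder and on one (such as the Kaggle notebook) that does not. The helper name `resolve_llama_source` is hypothetical and not part of Time-LLM; it only chooses which string to pass to `from_pretrained`.

```python
import os

# Path hard-coded in TimeLLM.py (from the traceback) and the Hub repo id suggested above.
LOCAL_LLAMA_DIR = '/mnt/alps/modelhub/pretrained_model/LLaMA/7B_hf/'
HUB_LLAMA_ID = 'huggyllama/llama-7b'


def resolve_llama_source(local_dir=LOCAL_LLAMA_DIR, hub_id=HUB_LLAMA_ID):
    """Return local_dir if that folder exists on this machine, otherwise the Hub repo id."""
    return local_dir if os.path.isdir(local_dir) else hub_id


# Intended use inside TimeLLM.py (shown as a comment, since it needs transformers):
#   self.llama_config = LlamaConfig.from_pretrained(resolve_llama_source())
```

On Kaggle the folder does not exist, so the helper would return 'huggyllama/llama-7b'.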

@aliper96

aliper96 commented Feb 8, 2024

@KimMeen I'm a PhD student in generative AI, and thanks for the code... the cleanest and most understandable code!!

@gsamaras
Author

gsamaras commented Feb 8, 2024

@KimMeen unfortunately, after changing this, I get this error:

ImportError: Using `load_in_8bit=True` requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes `pip install -i https://test.pypi.org/simple/ bitsandbytes` or `pip install bitsandbytes`.

although I installed both packages.

@KimMeen
Owner

KimMeen commented Feb 8, 2024

@gsamaras Remove load_in_8bit=True and give it a try.

@gsamaras
Author

gsamaras commented Feb 8, 2024

@KimMeen it seems I don't have control there, it's not in your code, see the full error please:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[32], line 30
     28     model = DLinear.Model(args).float()
     29 else:
---> 30     model = Model(args).float() #TimeLLM.Model(args).float()
     32 model = model.to(torch.bfloat16)
     35 path = os.path.join(args.checkpoints,
     36                     setting + '-' + args.model_comment)  # unique checkpoint saving path

Cell In[12], line 55, in Model.__init__(self, configs, patch_len, stride)
     53 self.llama_config.output_attentions = True
     54 self.llama_config.output_hidden_states = True
---> 55 self.llama = LlamaModel.from_pretrained(
     56     #"/mnt/alps/modelhub/pretrained_model/LLaMA/7B_hf/",
     57     'huggyllama/llama-7b',
     58     trust_remote_code=True,
     59     local_files_only=True,
     60     config=self.llama_config,
     61     load_in_4bit=True
     62 )
     64 self.tokenizer = LlamaTokenizer.from_pretrained(
     65     #"/mnt/alps/modelhub/pretrained_model/LLaMA/7B_hf/tokenizer.model",
     66     'huggyllama/llama-7b',
     67     trust_remote_code=True,
     68     local_files_only=True
     69 )
     71 if self.tokenizer.eos_token:

File /opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py:3034, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3032     raise RuntimeError("No GPU found. A GPU is needed for quantization.")
   3033 if not (is_accelerate_available() and is_bitsandbytes_available()):
-> 3034     raise ImportError(
   3035         "Using `load_in_8bit=True` requires Accelerate: `pip install accelerate` and the latest version of"
   3036         " bitsandbytes `pip install -i https://test.pypi.org/simple/ bitsandbytes` or"
   3037         " `pip install bitsandbytes`."
   3038     )
   3040 if torch_dtype is None:
   3041     # We force the `dtype` to be float16, this is a requirement from `bitsandbytes`
   3042     logger.info(
   3043         f"Overriding torch_dtype={torch_dtype} with `torch_dtype=torch.float16` due to "
   3044         "requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. "
   3045         "Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass"
   3046         " torch_dtype=torch.float16 to remove this warning."
   3047     )

ImportError: Using `load_in_8bit=True` requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes `pip install -i https://test.pypi.org/simple/ bitsandbytes` or `pip install bitsandbytes`.
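
The traceback shows the error is raised when `is_accelerate_available()` and `is_bitsandbytes_available()` are not both true. A rough way to check from the same notebook kernel whether the two packages are importable is sketched below. It uses only `importlib` and approximates, rather than reproduces, the check inside `transformers`; the helper name `missing_packages` is hypothetical.

```python
import importlib.util


def missing_packages(names=("accelerate", "bitsandbytes")):
    """Return the top-level package names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Run in the same kernel that raised the ImportError, for example:
#   print(missing_packages())
```

If it returns an empty list but the error persists, restarting the kernel so that `transformers` is re-imported may be worth trying; that is an assumption, not something confirmed in this thread.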

@KimMeen
Owner

KimMeen commented Feb 8, 2024

@gsamaras Does removing load_in_4bit=True work on your end? See also https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g/discussions/11

@gsamaras
Author

gsamaras commented Feb 8, 2024

No, sorry:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[45], line 30
     28     model = DLinear.Model(args).float()
     29 else:
---> 30     model = Model(args).float() #TimeLLM.Model(args).float()
     32 model = model.to(torch.bfloat16)
     35 path = os.path.join(args.checkpoints,
     36                     setting + '-' + args.model_comment)  # unique checkpoint saving path

Cell In[42], line 54, in Model.__init__(self, configs, patch_len, stride)
     52 self.llama_config.output_attentions = True
     53 self.llama_config.output_hidden_states = True
---> 54 self.llama = LlamaModel.from_pretrained(
     55     #"/mnt/alps/modelhub/pretrained_model/LLaMA/7B_hf/",
     56     'huggyllama/llama-7b',
     57     trust_remote_code=True,
     58     local_files_only=True,
     59     config=self.llama_config,
     60     load_in_4bit=False
     61 )
     63 self.tokenizer = LlamaTokenizer.from_pretrained(
     64     #"/mnt/alps/modelhub/pretrained_model/LLaMA/7B_hf/tokenizer.model",
     65     'huggyllama/llama-7b',
     66     trust_remote_code=True,
     67     local_files_only=True
     68 )
     70 if self.tokenizer.eos_token:

File /opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py:3455, in from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3452         return key.replace("gamma", "weight")
   3453     return key
-> 3455 original_loaded_keys = loaded_keys
   3456 loaded_keys = [_fix_key(key) for key in loaded_keys]
   3458 if len(prefix) > 0:

OSError: huggyllama/llama-7b does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack
