Description
Hi! I've been using spaCy over the last few weeks to fine-tune a roberta-base model for NER. So far, the experience has been great and I'm able to train and use the fine-tuned models without any issues.
I now wanted to enable mixed precision to speed up the training process. However, when I do that, I get the following error:
File "/usr/local/lib/python3.10/dist-packages/thinc/shims/pytorch_grad_scaler.py", line 171, in update
torch._amp_update_scale_(
RuntimeError: current_scale must be a float tensor.
Toggling mixed_precision back to false results in successful training.
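For reference, this is the kind of bare-bones PyTorch AMP loop I would run to check whether GradScaler.update() works on its own on this runtime (just a sketch with a placeholder model and optimizer, not my actual pipeline; it uses the same init_scale as the grad_scaler_config in my config further down):

import torch

# Placeholder model/optimizer purely to exercise the scaler end to end.
model = torch.nn.Linear(8, 2).to("cuda:0")
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Same init_scale as grad_scaler_config in my spaCy config below.
scaler = torch.cuda.amp.GradScaler(init_scale=32768)

x = torch.randn(4, 8, device="cuda:0")
y = torch.randint(0, 2, (4,), device="cuda:0")

with torch.cuda.amp.autocast():
    loss = torch.nn.functional.cross_entropy(model(x), y)

scaler.scale(loss).backward()  # scale the loss before backprop
scaler.step(optimizer)         # unscales gradients, then optimizer.step()
scaler.update()                # the step that fails inside the thinc shim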
Traceback
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
ℹ Saving to output directory: spacy_trained_pipeline_en
ℹ Using GPU: 0
=========================== Initializing pipeline ===========================
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
✔ Initialized pipeline
============================= Training pipeline =============================
ℹ Pipeline: ['transformer', 'ner']
ℹ Initial learn rate: 0.0
E # LOSS TRANS... LOSS NER ENTS_F ENTS_P ENTS_R SCORE
--- ------ ------------- -------- ------ ------ ------ ------
⚠ Aborting and saving the final best model. Encountered exception:
RuntimeError('current_scale must be a float tensor.')
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/dist-packages/spacy/__main__.py", line 4, in <module>
setup_cli()
File "/usr/local/lib/python3.10/dist-packages/spacy/cli/_util.py", line 87, in setup_cli
command(prog_name=COMMAND)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 783, in main
return _main(
File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 225, in _main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/usr/local/lib/python3.10/dist-packages/spacy/cli/train.py", line 54, in train_cli
train(config_path, output_path, use_gpu=use_gpu, overrides=overrides)
File "/usr/local/lib/python3.10/dist-packages/spacy/cli/train.py", line 84, in train
train_nlp(nlp, output_path, use_gpu=use_gpu, stdout=sys.stdout, stderr=sys.stderr)
File "/usr/local/lib/python3.10/dist-packages/spacy/training/loop.py", line 135, in train
raise e
File "/usr/local/lib/python3.10/dist-packages/spacy/training/loop.py", line 118, in train
for batch, info, is_best_checkpoint in training_step_iterator:
File "/usr/local/lib/python3.10/dist-packages/spacy/training/loop.py", line 236, in train_while_improving
proc.finish_update(optimizer) # type: ignore[attr-defined]
File "spacy/pipeline/trainable_pipe.pyx", line 252, in spacy.pipeline.trainable_pipe.TrainablePipe.finish_update
File "/usr/local/lib/python3.10/dist-packages/thinc/model.py", line 342, in finish_update
shim.finish_update(optimizer)
File "/usr/local/lib/python3.10/dist-packages/thinc/shims/pytorch.py", line 180, in finish_update
self._grad_scaler.update()
File "/usr/local/lib/python3.10/dist-packages/thinc/shims/pytorch_grad_scaler.py", line 171, in update
torch._amp_update_scale_(
RuntimeError: current_scale must be a float tensor.
To me, this hints that the grad_scaler_config is somehow not getting to PyTorch, but I'm not sure what I'm doing wrong.
I'm following the example config from spacy-transformers.TransformerModel.v3.
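If I understand the TransformerModel.v3 docs correctly, grad_scaler_config is passed through to thinc.api.PyTorchGradScaler when mixed_precision is enabled, so my settings should boil down to roughly this (my assumption about the plumbing, not something I've verified in the source):

from thinc.api import PyTorchGradScaler

# Assumed equivalent of mixed_precision = true plus
# grad_scaler_config = {"init_scale": 32768} from the config below.
scaler = PyTorchGradScaler(enabled=True, init_scale=32768)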
My config file, trf_config.cfg
[paths]
train = null
dev = null
vectors = null
init_tok2vec = null
[system]
gpu_allocator = "pytorch"
seed = 0
[nlp]
lang = "en"
pipeline = ["transformer","ner"]
batch_size = 64
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
vectors = {"@vectors":"spacy.Vectors.v1"}
[components]
[components.ner]
factory = "ner"
incorrect_spans_key = null
moves = null
scorer = {"@scorers":"spacy.ner_scorer.v1"}
update_with_oracle_cut_size = 100
[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = false
nO = null
[components.ner.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
pooling = {"@layers":"reduce_mean.v1"}
upstream = "*"
[components.transformer]
factory = "transformer"
max_batch_items = 4096
set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}
[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
name = "roberta-base"
# mixed_precision = false
mixed_precision = true
grad_scaler_config = {"init_scale": 32768}
[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96
[components.transformer.model.tokenizer_config]
use_fast = true
[components.transformer.model.transformer_config]
[corpora]
[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null
[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null
[training]
accumulate_gradient = 3
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
patience = 1600
max_epochs = 0
max_steps = 20000
eval_frequency = 200
frozen_components = []
annotating_components = []
before_to_disk = null
before_update = null
[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
size = 200
buffer = 256
get_length = null
[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false
[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 0.00005
[training.score_weights]
ents_f = 1.0
ents_p = 0.0
ents_r = 0.0
ents_per_type = null
[pretraining]
[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null
[initialize.components]
[initialize.tokenizer]
How to reproduce the behaviour
I'm running the training on Google Colab, using a Tesla T4 runtime:
!nvidia-smi -L
!export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
GPU 0: Tesla T4 (UUID: GPU-0c3e659f-2933-c77e-7694-6112031f1cef)
I've tried not executing the line !export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, but it doesn't make a difference.
I've also made sure that I call spacy train with --gpu-id 0.
Here are the exact steps of the Colab notebook I use:
Colab notebook
!nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
!pip install spacy[cuda12x,transformers] transformers[sentencepiece]
!pip freeze | grep cupy
cupy-cuda12x==12.2.0
!python -m spacy download en_core_web_trf
!nvidia-smi -L
!export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
GPU 0: Tesla T4 (UUID: GPU-0c3e659f-2933-c77e-7694-6112031f1cef)
!pip3 freeze | grep torch
torch @ https://download.pytorch.org/whl/cu121/torch-2.3.0%2Bcu121-cp310-cp310-linux_x86_64.whl#sha256=0a12aa9aa6bc442dff8823ac8b48d991fd0771562eaa38593f9c8196d65f7007
torchaudio @ https://download.pytorch.org/whl/cu121/torchaudio-2.3.0%2Bcu121-cp310-cp310-linux_x86_64.whl#sha256=38b49393f8c322dcaa29d19e5acbf5a0b1978cf1b719445ab670f1fb486e3aa6
torchsummary==1.5.1
torchtext==0.18.0
torchvision @ https://download.pytorch.org/whl/cu121/torchvision-0.18.0%2Bcu121-cp310-cp310-linux_x86_64.whl#sha256=13e1b48dc5ce41ccb8100ab3dd26fdf31d8f1e904ecf2865ac524493013d0df5
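A quick sanity check along these lines is what I'd run in the same notebook to confirm that torch, CuPy and spaCy all see the GPU (sketch, output omitted):

import torch, cupy, spacy

# Quick check of the CUDA stack the training run will see.
print("torch:", torch.__version__, "built for CUDA", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
print("device:", torch.cuda.get_device_name(0))
print("cupy CUDA runtime:", cupy.cuda.runtime.runtimeGetVersion())
print("spaCy prefer_gpu:", spacy.prefer_gpu())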
!python -m spacy train ./trf_config.cfg --output ./spacy_trained_pipeline_en --paths.train "train.spacy" --paths.dev "dev.spacy" --gpu-id 0
Could you please give me a hand? Thanks a lot!
Info about spaCy
- spaCy version: 3.7.4
- Platform: Linux-6.1.85+-x86_64-with-glibc2.35
- Python version: 3.10.12
- Pipelines: en_core_web_trf (3.7.3), en_core_web_sm (3.7.1)