error message from spacy training #13715
Unanswered
mcveigh-h16 asked this question in Help: Coding & Implementations
Replies: 1 comment
Follow-up: I tried removing all non-ASCII characters, which I was sure would help, but I am still encountering the same error, so clearly that wasn't it.
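Since the same error persists after stripping non-ASCII characters, encoding is probably not the culprit: spaCy handles Unicode text natively. A quick sanity check (the sample text here is hypothetical):

```python
import spacy

# spaCy tokenizes Unicode text without any special handling,
# so non-ASCII characters alone should not cause E986.
nlp = spacy.blank("en")
doc = nlp("β-galactosidase activity at 37 °C in José's strain")
print(len(doc) > 0)  # True: the text tokenizes fine
```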
I am encountering an error message when trying to train spaCy data. I suspect it's a problem with the training data, but I am not sure what the issue is. I created the NER labels with the NER Annotator (https://arunmozhi.in/ner-annotator/), then used spacy convert to create the .spacy file from the .json file:
python -m spacy convert ./annotations_strain1.json ./
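One thing worth checking: spacy convert expects specific input formats (such as .conll, .iob, or spaCy's own JSON training format), and a JSON export from a third-party annotator may convert to zero documents rather than failing loudly. A common workaround is to build the .spacy file manually with DocBin. A minimal sketch, assuming the export has the shape {"classes": [...], "annotations": [[text, {"entities": [[start, end, label], ...]}], ...]} (the inline sample below is hypothetical, standing in for the attached file):

```python
import spacy
from spacy.tokens import DocBin

# Hypothetical sample in the assumed annotator-export shape.
data = {
    "classes": ["STRAIN"],
    "annotations": [
        ["Strain K-12 grows on minimal media.", {"entities": [[7, 11, "STRAIN"]]}],
    ],
}

nlp = spacy.blank("en")
db = DocBin()
for text, ann in data["annotations"]:
    doc = nlp.make_doc(text)
    spans = []
    for start, end, label in ann["entities"]:
        # Character offsets that don't align to token boundaries return None.
        span = doc.char_span(start, end, label=label, alignment_mode="contract")
        if span is not None:
            spans.append(span)
    doc.ents = spans
    db.add(doc)
db.to_disk("./train.spacy")
```

If the real export uses a different shape, the loop over data["annotations"] would need adjusting, but the DocBin round-trip is the standard v3 way to produce training data.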
The training command then generates an error:
python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./train.spacy
⚠ Aborting and saving the final best model. Encountered exception:
ValueError("[E986] Could not create any training batches: check your input. Are
the train and dev paths defined? Is discard_oversize set appropriately?")

I also tried debug data and got this:
python -m spacy debug data config.cfg --paths.train ./train.spacy --paths.dev ./train.spacy
============================ Data file validation ============================
✔ Pipeline can be initialized with data
✔ Corpus is loadable
=============================== Training stats ===============================
Language: en
Training pipeline:
0 training docs
0 evaluation docs
✔ No overlap between training and evaluation data
✘ Low number of examples to train a new pipeline (0)
============================== Vocab & Vectors ==============================
ℹ 0 total word(s) in the data (0 unique)
ℹ No word vectors present in the package
================================== Summary ==================================
✔ 3 checks passed
✘ 1 error
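For what it's worth, the "0 training docs" line suggests the .spacy file deserializes to an empty DocBin, which would also explain E986. The contents of a .spacy file can be inspected directly; a minimal stand-in demo, writing a hypothetical empty_train.spacy to show what debug data is counting:

```python
import spacy
from spacy.tokens import DocBin

# An empty DocBin round-trips to exactly what debug data reports: 0 docs.
nlp = spacy.blank("en")
DocBin().to_disk("./empty_train.spacy")
loaded = DocBin().from_disk("./empty_train.spacy")
print(len(list(loaded.get_docs(nlp.vocab))))  # 0
```

Running the same three lines against the real train.spacy would confirm whether the convert step produced any documents at all.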
Any clue where the problem is? The training data .json is attached and contains many non-ASCII characters; I suspect this could be the issue but can't find any documentation on that.
annotations_strain1.json