Slow training speed for textcat pipeline #4633
Replies: 4 comments
Sparse categories are definitely a problem with the current training format, and I haven't tried training with this many categories myself. It's hard to guess what's going on from the code provided here. I would suggest profiling the training run to see where the time is actually being spent.
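For reference, a generic way to profile a Python training run is the standard library's `cProfile`. This is a minimal sketch, not from the original comment; `run_training` is a hypothetical stand-in for whatever function drives the update loop:

```python
import cProfile
import pstats

def run_training():
    ...  # hypothetical: the nlp.update loop being investigated

profiler = cProfile.Profile()
profiler.enable()
run_training()
profiler.disable()

# Print the 25 most expensive call sites by cumulative time
pstats.Stats(profiler).sort_stats("cumulative").print_stats(25)
```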
@adrianeboyd Hi! Thanks for your reply! I would need to investigate the default model's implementation further to see which options can be passed in. I will update the ticket later with the profiling results. Thanks for your suggestion!
Hi @adrianeboyd. I got some profiling results. My train script is really simple; the train loop looks roughly like the sketch below. Also, it seems like the process is still mostly using the CPU rather than the GPU. Thank you for your time!
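The original snippet was lost in the page formatting; below is a hypothetical reconstruction of the kind of loop described, based on the spaCy v2.x API (`minibatch`, `nlp.update`) and the batch size of 64 mentioned in the original post. `all_labels`, `corpus`, and `N_EPOCHS` are stand-ins, not names from the thread:

```python
import random

import spacy
from spacy.util import minibatch

# Returns True if a GPU was activated; helps confirm whether training
# is actually running on the V100 or silently falling back to CPU.
print("GPU active:", spacy.prefer_gpu())

nlp = spacy.blank("en")
textcat = nlp.create_pipe("textcat")
nlp.add_pipe(textcat)
for label in all_labels:  # hypothetical: the ~6000 category names
    textcat.add_label(label)

optimizer = nlp.begin_training()
train_data = list(corpus.train_examples())  # hypothetical streaming corpus

for epoch in range(N_EPOCHS):
    random.shuffle(train_data)
    losses = {}
    for batch in minibatch(train_data, size=64):
        texts, annotations = zip(*batch)
        nlp.update(texts, annotations, sgd=optimizer, drop=0.2, losses=losses)
    print(epoch, losses["textcat"])
```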
That's an interesting analysis. I am not an expert on the thinc internals, so I think @honnibal might need to take a look to see if he knows what might be going on. |
Hi!
I am trying to train a textcat pipeline with over 6000 classes. The training data consists of around 300k documents. I tried to convert my training data to the correct `jsonl` format, but that would result in a file size of over 100G, and the initialization of `GoldCorpus` would take forever writing the message packs. Therefore I wrote the following `TextcatGoldCorpus` class:
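The class definition itself was lost from the page. As a rough illustration only, a streaming corpus along the lines described might look like the sketch below; the `.jsonl` field names `text` and `labels` are assumptions, not the original code:

```python
import json
from pathlib import Path

class TextcatGoldCorpus:
    """Streams (text, annotations) pairs straight from a .jsonl file,
    avoiding the huge up-front GoldCorpus/msgpack conversion."""

    def __init__(self, path, labels):
        self.path = Path(path)
        self.labels = list(labels)

    def train_examples(self):
        with self.path.open(encoding="utf8") as f:
            for line in f:
                record = json.loads(line)
                # Dense one-hot "cats" dict over all labels, in the
                # shape nlp.update expects for textcat annotations
                cats = {label: 0.0 for label in self.labels}
                for label in record["labels"]:
                    cats[label] = 1.0
                yield record["text"], {"cats": cats}
```

Note that even this sketch builds a dense ~6000-entry `cats` dict per example, which is one place a profile might show per-example overhead.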
Then I write a regular train loop that calls `nlp.update` with a batch size of 64. However, the training is very slow: on an Nvidia V100 GPU the average update speed is around 2-3 documents/second, which would take around two days to train one epoch for my task. I also notice that GPU training gains no significant speedup over CPU. I previously trained a convolutional model (with PyTorch) on the exact same task and each epoch took around 3 to 4 hours; I also fine-tuned a BERT Base model on a classification task and the entire training finished in around one day with 3 epochs.
I have almost no idea about the potential cause of this slowdown. Please give me some suggestions. Thanks!