Implementation on bigdata #132
Unanswered
St-mlengineer asked this question in Q&A
Replies: 1 comment, 1 reply
SymSpell can process about 10,000 words per second on a single processor core, and up to 100,000 words per second with multithreading. Of course, it depends on which SymSpell port/language you are using: Python is known for its slow execution times as an interpreted language, so for big data you should use C#, C++, Java, or Rust instead of Python. Also, make sure that LoadDictionary() is called only once, when you are initializing SymSpell, but NOT every single time before you do a Lookup() spelling correction.
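The symmetric-delete idea behind that throughput can be sketched in a few lines of Python. This is a toy illustration, not the library's API: deletions of every dictionary word are precomputed once at startup (the expensive step that LoadDictionary() performs), so each lookup only has to generate deletions of the query and verify the surviving candidates with a real edit-distance check.

```python
# Toy sketch of SymSpell's symmetric-delete algorithm (hypothetical code,
# not the library's API). Build the delete index ONCE; lookups are cheap.

def deletes(word, max_dist=1):
    # all strings reachable from `word` by deleting up to max_dist characters
    out = {word}
    frontier = {word}
    for _ in range(max_dist):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        out |= frontier
    return out

def build_index(dictionary, max_dist=1):
    # expensive one-time step: map each delete-variant to its source words
    index = {}
    for word in dictionary:
        for d in deletes(word, max_dist):
            index.setdefault(d, set()).add(word)
    return index

def edit_distance(a, b):
    # standard dynamic-programming Levenshtein distance, used to verify candidates
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def lookup(query, index, max_dist=1):
    # cheap per-query step: deletes of the query select candidates from the index
    candidates = set()
    for d in deletes(query, max_dist):
        candidates |= index.get(d, set())
    return sorted(w for w in candidates if edit_distance(query, w) <= max_dist)
```

Because a delete-variant shared by query and dictionary word does not by itself guarantee the edit distance bound, the final `edit_distance` filter is what makes the results exact; the real library additionally ranks candidates by word frequency.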
I am trying to implement SymSpell on a large dataset consisting of text strings. The UDF runs indefinitely on a Databricks cluster. Is there a way to make it faster?
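A common cause of this symptom is re-initializing the corrector (and reloading its dictionary) inside the UDF for every row. A minimal sketch of the usual fix, using a lazy per-process singleton; `Speller` here is a hypothetical stand-in for SymSpell, and its constructor simulates the expensive dictionary load:

```python
# Sketch: pay the dictionary-load cost once per worker process, not once per row.
# `Speller` is a placeholder for SymSpell; in real code its __init__ would call
# LoadDictionary() / load_dictionary().

class Speller:
    load_count = 0  # track how often the expensive setup actually runs

    def __init__(self):
        Speller.load_count += 1          # simulates the costly dictionary load
        self.dictionary = {"helo": "hello", "wrld": "world"}

    def correct(self, word):
        return self.dictionary.get(word, word)

_speller = None

def get_speller():
    # lazy singleton: each worker process constructs the Speller exactly once
    global _speller
    if _speller is None:
        _speller = Speller()
    return _speller

def correct_word(word):
    # the function you would register as the UDF body; note it reuses the
    # cached Speller instead of constructing one (and reloading) per call
    return get_speller().correct(word)
```

On Spark, the same idea applies per partition: a mapPartitions-style function or a pandas UDF lets you initialize once and then process the whole batch, instead of paying the setup cost row by row.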