Hi, I've found that Lingua generally works well with clean text. However, it can sometimes be badly thrown off by words that are not in any language's vocabulary (as far as I know). For example, with the text …, English is not even in the top 5. If I remove …
Hi Anders, thanks for trying my library and for reaching out to me.
I'm afraid there is nothing you can do about this. The language detector is sensitive to noise, especially in very short texts, because they do not contain enough distinct n-grams to calculate a reliable language estimate. If the sentence were longer and `SIARxKnru3t` were its only noisy word, the detector would surely output English. You should try to filter out the noise before trying to detect the language.
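One way to act on that advice is a small pre-filter that drops tokens that look like identifiers or random strings before the text reaches the detector. The following is a minimal sketch, not part of Lingua. The function names `is_noisy` and `strip_noisy_tokens` are hypothetical, and the two rules (a token mixes letters and digits, or has an uppercase letter directly after a lowercase one) are assumptions that would need tuning on real data.

```python
import re

# Hypothetical heuristics (assumptions, not part of Lingua):
# rule 1: the token contains both a letter and a digit, e.g. "SIARxKnru3t"
_MIXED_ALNUM = re.compile(r"(?=.*[A-Za-z])(?=.*\d)")
# rule 2: an uppercase letter directly follows a lowercase one, e.g. "xK"
_INNER_UPPER = re.compile(r"[a-z][A-Z]")


def is_noisy(token: str) -> bool:
    """Return True if the token looks like an ID or random string."""
    return bool(_MIXED_ALNUM.match(token) or _INNER_UPPER.search(token))


def strip_noisy_tokens(text: str) -> str:
    """Drop whitespace-separated tokens that the heuristics flag as noise."""
    return " ".join(tok for tok in text.split() if not is_noisy(tok))


if __name__ == "__main__":
    # The cleaned string is what would then be handed to the language detector.
    print(strip_noisy_tokens("Hello SIARxKnru3t world"))
```

For instance, `strip_noisy_tokens("Hello SIARxKnru3t world")` returns `"Hello world"`. Note that rule 2 would also drop legitimate words such as `iPhone`, so the filter trades a little real text for cleaner input; whether that is acceptable depends on the data.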