Here is the original sentence as it appears in the article:
Antik Yunanca Grekçe: matesis kelimesi matematik kelimesinin köküdür ve bilirim anlamına gelmektedir.
This is the corresponding wikitext source:
Antik Yunanca ''{{dil|grc|matesis}}'' kelimesi matematik kelimesinin köküdür ve ''bilirim'' anlamına gelmektedir.
And this is what is extracted (from text/AA/wiki_00 file):
Antik Yunanca ' kelimesi matematik kelimesinin köküdür ve \"bilirim\" anlamına gelmektedir.
Somehow a stray ' is introduced and the Greek word is dropped. The sentence no longer makes sense, although apart from the ' character it looks well-formed.
Because the Greek word is removed as well, we cannot catch the sentence by blacklisting that word.
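Since a word blacklist cannot help here, one possible workaround is to reject sentences that contain a quote character standing on its own. The following is only a rough sketch of that idea in Python; `has_stray_quote` is a hypothetical helper, not part of cv-sentence-extractor or wikiextractor, and the heuristic may well reject some legitimate sentences.

```python
import re

# A run of quote characters surrounded by whitespace (or the sentence edges).
# Assumption: a quote that is not attached to any word is leftover markup.
STRAY_QUOTE = re.compile(r"(?:^|\s)['\"]+(?=\s|$)")

def has_stray_quote(sentence: str) -> bool:
    """Return True if the sentence contains a free-standing quote character."""
    return STRAY_QUOTE.search(sentence) is not None

# The damaged sentence from the extraction is flagged ...
print(has_stray_quote("Antik Yunanca ' kelimesi matematik kelimesinin köküdür"))
# ... while a normally quoted word is not.
print(has_stray_quote('kelimesinin köküdür ve "bilirim" anlamına gelmektedir.'))
```

This only hides the symptom: the sentence is discarded rather than repaired.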
I'm not sure how many such occurrences would end up in the random selection of 3 sentences, but a fix would be good.
PS: I'm aware this is NOT a cv-sentence-extractor issue, but the workflow includes wikiextractor, so...
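To illustrate one way the Greek word could be preserved, here is a minimal Python sketch that expands the `{{dil|grc|matesis}}` template into its text before the wikitext reaches the extractor. My guess is that the stray ' appears because removing the template leaves four consecutive apostrophes behind, but I have not verified that. `expand_dil_templates` is a hypothetical pre-processing helper, not a wikiextractor function, and it assumes the template always has exactly two positional parameters and no nested templates.

```python
import re

# Matches {{dil|<language code>|<text>}} as seen in the example above.
# Assumption: two positional parameters, no nested templates or links.
DIL_TEMPLATE = re.compile(r"\{\{\s*dil\s*\|[^|{}]*\|([^|{}]*)\}\}", re.IGNORECASE)

def expand_dil_templates(wikitext: str) -> str:
    """Replace each {{dil|lang|text}} template with its text parameter."""
    return DIL_TEMPLATE.sub(lambda m: m.group(1).strip(), wikitext)

source = (
    "Antik Yunanca ''{{dil|grc|matesis}}'' kelimesi matematik kelimesinin "
    "köküdür ve ''bilirim'' anlamına gelmektedir."
)
# The template is replaced by the word it wraps; the italic markup stays.
print(expand_dil_templates(source))
```

With the word kept in place, the surrounding '' pairs remain ordinary italic markup, so the extracted sentence should stay meaningful.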