lemmatization #10

Open
1 of 3 tasks
ctschroeder opened this issue Mar 14, 2017 · 2 comments

ctschroeder commented Mar 14, 2017

All corpora need to be checked for the lemmatization of ⲩⲛⲟⲩ; the lemma should be ⲟⲩⲛⲟⲩ. See this ANNIS search.

  • AP
  • sahidica.mark

Corpora should also be checked for ⲡⲱⲛⲅ, which should lemmatize as ⲡⲱⲛⲕ (ⲡⲱⲛⲅ is a known variant; not sure whether it should also be normalized).

(The lemmatizer should also be checked.)

ctschroeder self-assigned this Mar 14, 2017

ctschroeder commented Mar 15, 2017

Adjust in the lemmatizer: inconsistent lemmatization of ⲙⲟⲓϩⲉ/ⲙⲟⲉⲓϩⲉ (should probably be normalized/lemmatized as ⲙⲟⲉⲓϩⲉ, as in Crum):
https://corpling.uis.georgetown.edu/annis/?id=2d63101d-4e33-48e5-9f54-d9c2a4c900e4

  • Eagerness

  • Abraham

also

  • ⲡⲟⲣⲛⲓⲁ should be normalized/lemmatized as porneia. (Checked lemmas only; no lemmas in the corpora are ⲡⲟⲣⲛⲓⲁ.)

Another normalization/lemmatization issue: we are normalizing and lemmatizing ϩⲟⲉⲓⲧⲉ to itself and ϩⲟⲓⲧⲉ to itself. (The dictionary also lists it as ϩⲟ(ⲉ)ⲓⲧⲉ, which of course links to nothing in ANNIS.) https://corpling.uis.georgetown.edu/annis/?id=40917f1a-f549-43b8-b633-d35f704533c0 . We should at least change the lemmas to ϩⲟⲉⲓⲧⲉ.

  • Mark

  • AP

@amir-zeldes

I think this is a normalization issue for moihe (ⲙⲟⲓϩⲉ). For unou (ⲩⲛⲟⲩ) it's different: after a vowel, that is actually the expected (normal, and hence norm) spelling.
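
For checking the corpora, the lemma corrections proposed in this thread could be expressed as a small lookup table. The following is only a rough sketch, not the project's actual lemmatizer: `LEMMA_FIXES`, `find_bad_lemmas`, and `fix_lemmas` are hypothetical names, and the input is assumed to be plain (norm, lemma) pairs already exported from a corpus. It touches lemmas only and leaves the norm layer alone, since whether ⲩⲛⲟⲩ and ⲡⲱⲛⲅ should also be normalized is still open above. ⲡⲟⲣⲛⲓⲁ is left out because no corpus lemma currently has that form.

```python
# Hypothetical table: variant lemma -> lemma proposed in this issue.
LEMMA_FIXES = {
    "ⲩⲛⲟⲩ": "ⲟⲩⲛⲟⲩ",
    "ⲡⲱⲛⲅ": "ⲡⲱⲛⲕ",
    "ⲙⲟⲓϩⲉ": "ⲙⲟⲉⲓϩⲉ",
    "ϩⲟⲓⲧⲉ": "ϩⲟⲉⲓⲧⲉ",
}


def find_bad_lemmas(rows):
    """Yield (row_no, norm, lemma, expected) for each row whose lemma is a listed variant.

    rows is an iterable of (norm, lemma) pairs; row numbers start at 1.
    """
    for row_no, (norm, lemma) in enumerate(rows, start=1):
        expected = LEMMA_FIXES.get(lemma)
        if expected is not None:
            yield row_no, norm, lemma, expected


def fix_lemmas(rows):
    """Return a new list of (norm, lemma) pairs with listed variant lemmas replaced."""
    return [(norm, LEMMA_FIXES.get(lemma, lemma)) for norm, lemma in rows]


if __name__ == "__main__":
    # Made-up sample rows for illustration, not real corpus data.
    sample = [("ⲩⲛⲟⲩ", "ⲩⲛⲟⲩ"), ("ⲙⲟⲉⲓϩⲉ", "ⲙⲟⲉⲓϩⲉ"), ("ϩⲟⲓⲧⲉ", "ϩⲟⲓⲧⲉ")]
    for row_no, norm, lemma, expected in find_bad_lemmas(sample):
        print(f"row {row_no}: norm={norm} lemma={lemma} -> expected {expected}")
```

Keeping the check (`find_bad_lemmas`) separate from the rewrite (`fix_lemmas`) would let each corpus be reviewed before any lemma is changed.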
