Lemmatization of "Jew" is lowercased #28

Open
AngledLuffa opened this issue Jan 6, 2025 · 1 comment

Comments

@AngledLuffa
Contributor

The lemma should probably be "Jew" (capitalized), to match the lemmatization of "Jews" and the lemmatization used in other treebanks.

Example:

# sent_id = en_lines-ud-train-doc3-883
# text = You must be a Jew, we are speaking Yiddish.
1       You     you     PRON    PERS-P2 Case=Nom|Person=2|PronType=Prs  5       nsubj   _       _
2       must    must    AUX     PRES-AUX        VerbForm=Fin    5       aux     _       _
3       be      be      AUX     INF     VerbForm=Inf    5       cop     _       _
4       a       a       DET     IND-SG  Definite=Ind|PronType=Art       5       det     _       _
5       Jew     jew     NOUN    SG-NOM  Number=Sing     0       root    _       SpaceAfter=No
6       ,       ,       PUNCT   Comma   _       9       punct   _       _
7       we      we      PRON    PERS-P1PL-NOM   Case=Nom|Number=Plur|Person=1|PronType=Prs      9       nsubj   _       _
8       are     be      AUX     PRES-AUX        Mood=Ind|Tense=Pres|VerbForm=Fin        9       aux     _       _
9       speaking        speak   VERB    ING     Tense=Pres|VerbForm=Part        5       advcl   _       _
10      Yiddish Yiddish NOUN    SG-NOM  Number=Sing     9       obj     _       SpaceAfter=No
11      .       .       PUNCT   Period  _       5       punct   _       _
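A lowercased lemma like this can be found and corrected mechanically. The sketch below is a minimal illustration, not the maintainers' actual fix. It assumes standard tab-separated CoNLL-U input with 10 columns (the sample above shows the tabs as spaces), and it treats "Jew" as the target lemma, following the suggestion in this issue. The function name `fix_jew_lemma` is hypothetical. To avoid touching unrelated uses, it only rewrites tokens tagged NOUN or PROPN whose form is "Jew" or "Jews" (case-insensitive) and whose lemma is exactly "jew".

```python
from __future__ import annotations

# 0-based column indices in a CoNLL-U token line:
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
FORM_COL, LEMMA_COL, UPOS_COL = 1, 2, 3


def fix_jew_lemma(conllu_text: str) -> tuple[str, int]:
    """Hypothetical helper: capitalize the lemma "jew" -> "Jew".

    Returns the corrected text and the number of tokens changed.
    """
    out_lines = []
    changed = 0
    for line in conllu_text.split("\n"):
        # Comment lines (# sent_id, # text) and blank sentence separators pass through unchanged.
        if not line or line.startswith("#"):
            out_lines.append(line)
            continue
        cols = line.split("\t")
        if (
            len(cols) == 10
            and cols[FORM_COL].lower() in ("jew", "jews")
            and cols[UPOS_COL] in ("NOUN", "PROPN")
            and cols[LEMMA_COL] == "jew"
        ):
            cols[LEMMA_COL] = "Jew"
            changed += 1
        out_lines.append("\t".join(cols))
    return "\n".join(out_lines), changed


# Usage: token 5 from the sentence above, with real tab separators.
sample = "\n".join([
    "# text = You must be a Jew, we are speaking Yiddish.",
    "\t".join(["5", "Jew", "jew", "NOUN", "SG-NOM", "Number=Sing", "0", "root", "_", "SpaceAfter=No"]),
])
fixed, n = fix_jew_lemma(sample)
print(n)  # 1
```

Restricting the match to nominal UPOS tags and an exact lowercase lemma keeps the rewrite narrow. A real correction pass over the treebank might instead compare each lemma against the lemmas assigned to the other inflected forms of the same word.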
@LarsAhrenberg
Contributor

Thanks for spotting this. Fixed now in dev.
