Plural nouns not using singular lemma, possible pluralia tantum #33
There's a lot of
I've never heard of single new issue created:
UniversalDependencies/UD_English-EWT#477
There are many more such words, some questionable. I analyzed the Morpha lexicon, BNC annotations and several other resources when building this lemmatizer. Perhaps there are better resources today, but maybe it can still be of some use.
For "species" -- as with other words such as "glasses" -- it should be annotated as a plurale tantum:
Hence me linking to the plurale tantum issue in EWT. Note: My validator is applying plural stemming rules for |
That makes sense, but I've never seen the plurale tantum feature on a word in either EWT or GUM. I'd prefer to wait for the larger treebanks to settle on a standard before implementing that here.
I've created an issue in the docs repo to ask about adding |
Discussed with @nschneid earlier; I think we'd both be willing to use Ptan. I've posted it elsewhere too, but the GUM validator also maintains a list of acceptable cases of xpos=NNS where form=lemma. Here it is again for convenience: https://github.com/amir-zeldes/gum/blob/master/_build/utils/validate.py#L725-L735
I can trivially assign Ptan to exactly the same items that the validator accepts, but it would be nice to have a definitive list/guidelines for consistency across corpora.
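To make the "assign Ptan to exactly the same items that the validator accepts" idea concrete, here is a minimal sketch, not the GUM validator's actual code. It assumes tab-separated 10-column CoNLL-U token lines; the function name `mark_ptan` and the allowlist `PTAN_ALLOWLIST` are hypothetical, and the two entries are just the words mentioned in this thread, not the contents of the GUM list.

```python
# Hypothetical allowlist: only the two words mentioned in this thread.
# The real list would come from the validator or from agreed guidelines.
PTAN_ALLOWLIST = {"species", "glasses"}


def mark_ptan(line, allowlist=PTAN_ALLOWLIST):
    """Rewrite Number=Plur to Number=Ptan on one CoNLL-U token line.

    Only touches tokens with xpos=NNS whose form equals the lemma
    (case-insensitively) and whose lemma is in the allowlist.
    Any other line is returned unchanged.
    """
    cols = line.split("\t")
    if len(cols) != 10:
        return line  # comment, blank line, or malformed token line
    form, lemma, xpos = cols[1], cols[2], cols[4]
    if xpos == "NNS" and form.lower() == lemma.lower() and lemma.lower() in allowlist:
        feats = cols[5].split("|")
        cols[5] = "|".join("Number=Ptan" if f == "Number=Plur" else f for f in feats)
    return "\t".join(cols)
```

Keeping the allowlist as a plain set means the same data could drive both the validator check and the feature assignment, which is the consistency concern raised above.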
See UniversalDependencies/docs#999 for the suggested definition of |
Where are we with this at the moment? @rhdunn, would you recheck the list of words that are not properly featurized / lemmatized, and we'll take a look at fixing those up?
These are instances of nouns (NN) and proper nouns (NNPS) marked as plurals (Number=Plur) where the lemma is the plural form. Each of these (on a case by case basis) should either:
Number=Ptan to mark them as plurale tantum -- see also "Ambiguous lemmatization of pluralia tantum", UD_English-EWT#374.
nouns
proper nouns
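For anyone wanting to regenerate the candidate list, the check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the validator used to produce this issue: it assumes standard 10-column CoNLL-U input, and the names `parse_feats` and `plural_lemma_candidates` are made up for this example.

```python
def parse_feats(feats):
    """Parse a CoNLL-U FEATS column into a dict ('_' means no features)."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def plural_lemma_candidates(conllu_text):
    """Return (form, lemma, xpos) for plural nouns whose lemma equals the form."""
    hits = []
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword token ranges and empty nodes
        form, lemma, upos, xpos, feats = cols[1], cols[2], cols[3], cols[4], cols[5]
        is_plural = parse_feats(feats).get("Number") == "Plur"
        if upos in {"NOUN", "PROPN"} and is_plural and form.lower() == lemma.lower():
            hits.append((form, lemma, xpos))
    return hits
```

Each hit still needs the case-by-case decision described above; the function only surfaces candidates.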