tokenization principles #119
Unanswered
arademaker
asked this question in
Q&A
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
In Portuguese, Monday is segunda-feira. All week days are compounds with hífen. They are semantically compositional although feira is a Latin word.
How do you tokenize them? Any justification for breaking it into ADJ NOUN?
Somehow related to it, how to deal with productive prefixes like ex- and vice-
Beta Was this translation helpful? Give feedback.
All reactions