FAQ
jan Koko loved the idea of an automated Toki Pona analyzer or parser that takes into account the many possible grammatical interpretations of the input Toki Pona text: if "tawa" could be either a preposition or a modifier, such a tool would output both. Then another idea clicked: a translator that also takes into account the many possible semantic interpretations. Now we're dealing with the meaning of the words themselves: "sike" could mean a circle or a cycle. jan Koko loved this idea so much that they had to build it.
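To make the "output every reading" idea concrete, here is a minimal TypeScript sketch covering word meanings only. The `glossary` entries and the `interpretations` function are hypothetical illustrations, not ilo Token's real dictionary or code, and the glosses are word-by-word, ignoring grammar and English word order.

```typescript
// Hypothetical, simplified glossary: each Toki Pona word maps to several
// candidate English readings. This is NOT ilo Token's real dictionary.
type Glossary = Record<string, string[]>;

const glossary: Glossary = {
  sike: ["circle", "cycle"],
  tawa: ["to", "moving"], // preposition reading vs. modifier reading
  pona: ["good", "simple"],
};

// Return every combination of readings for the given words (a Cartesian
// product) instead of picking a single "best" one.
function interpretations(words: string[], dict: Glossary): string[][] {
  let results: string[][] = [[]];
  for (const word of words) {
    // Unknown words fall back to themselves as their only reading.
    const readings = dict[word] ?? [word];
    const next: string[][] = [];
    for (const partial of results) {
      for (const reading of readings) {
        next.push([...partial, reading]);
      }
    }
    results = next;
  }
  return results;
}

const all = interpretations(["sike", "pona"], glossary);
// 4 readings: "circle good", "circle simple", "cycle good", "cycle simple"
console.log(all.map((r) => r.join(" ")));
```

Even two ambiguous words already produce four candidates, which is why the number of outputs becomes a concern as the input grows.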
The scope of rule-based translation from Toki Pona to English is smaller. Parsing Toki Pona is easier, and we can use a subset of English as the translation target. Going the other way has a much larger scope: English grammar and vocabulary are huge and inconsistent. That isn't to say it's impossible; there are natural language processing (NLP) tools that make it easier, but it is definitely harder.
By "smaller" and "easier" we don't mean it is trivial. ilo Token itself has thousands of lines of code. It just means it is easier than translating the other way around.
jan Koko personally thinks that, instead of a rule-based approach, machine learning would be preferable for English (or any language) to Toki Pona machine translation, provided that enough training data is available; otherwise, the translation would be inaccurate. There isn't much data available in Toki Pona because the language is still young. Nonetheless, machine learning translators already exist.
Machine learning needs a huge dataset, and we mean huge. Otherwise, the translation would be very inaccurate. There isn't much Toki Pona data because the language is still young, although machine learning translators already exist.
For the time being, making use of traditional programming to parse the patterns and constructions of the text is preferable.
The translator is limited to translating at most 2 sentences. There are two reasons:
- The translator considers many ways the original Toki Pona text could be interpreted and outputs multiple results. If it accepted many sentences, the number of outputs would grow exponentially with the number of sentences.
- When the original Toki Pona text spans multiple sentences, it's important to keep track of the context between sentences, which the translator cannot do. This is the main reason we disallow it.
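The growth described in the first point can be written as a small formula, under the simplifying assumption (ours, for illustration) that each sentence's readings combine independently. If sentence $i$ has $k_i$ interpretations, a text of $n$ sentences has

```latex
N = \prod_{i=1}^{n} k_i, \qquad k_i = k \text{ for all } i \;\Longrightarrow\; N = k^{n}
```

For example, with 4 readings per sentence, 2 sentences yield $4^2 = 16$ combined translations, while 5 sentences would already yield $4^5 = 1024$.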
By allowing 2 sentences, we can translate sentences that use "ni:", as well as sentences of the form "X o, Y", which are technically 2 sentences according to ilo Token.
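A minimal TypeScript sketch of how such a limit could be enforced. The splitting rules are assumptions for illustration only: `.`, `!`, `?` and `:` end a sentence (so "ni:" closes one), and a vocative ending in "o," counts as its own sentence. `splitSentences`, `checkLength` and `MAX_SENTENCES` are hypothetical names, not ilo Token's actual tokenizer.

```typescript
// Hypothetical constant mirroring the FAQ: at most 2 sentences.
const MAX_SENTENCES = 2;

// Split text into sentences. Assumed rules for this sketch only:
// - a token ending in ".", "!", "?" or ":" closes a sentence ("ni:" included),
// - a standalone "o," closes a vocative, so "X o, Y" counts as two sentences.
function splitSentences(text: string): string[] {
  const sentences: string[] = [];
  let current: string[] = [];
  for (const token of text.trim().split(/\s+/)) {
    if (token === "") continue; // empty input yields a single "" token
    current.push(token);
    const endsSentence = /[.!?:]$/.test(token) || token === "o,";
    if (endsSentence) {
      sentences.push(current.join(" "));
      current = [];
    }
  }
  // Keep any trailing words that lack final punctuation.
  if (current.length > 0) sentences.push(current.join(" "));
  return sentences;
}

// Reject input longer than the limit; otherwise return its sentences.
function checkLength(text: string): string[] {
  const sentences = splitSentences(text);
  if (sentences.length > MAX_SENTENCES) {
    throw new Error(`only up to ${MAX_SENTENCES} sentences are supported`);
  }
  return sentences;
}

console.log(checkLength("jan Koko o, sina pona."));
console.log(checkLength("mi wile e ni: sina pona."));
```

With these rules, "jan Koko o, sina pona." splits into "jan Koko o," and "sina pona.", and "mi wile e ni: sina pona." splits into "mi wile e ni:" and "sina pona.", so both fit within the limit, while three plain sentences are rejected.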
"Token" is derived from the ISO codes of Toki Pona and English: "tok" and "en". "Token" also happens to be parsing jargon, which is fitting.
It is a simplified representation of a flowchart. ilo Token's processes can be described with a bunch of flowcharts. This highlights that ilo Token is rule-based rather than based on machine learning.
It is due to the number of output translations: ilo Token has to process a lot of data. We consider this a bug, and it might be fixed in the future.
It's theoretically possible to do this with a compound-to-word dictionary. However, we didn't do it for two reasons: first, we want to discourage lexicalization; second, we don't want to compete with Sonja's Toki Pona Dictionary. If you want compound-to-word translations, buy the book.