FAQ
The translator is rule-based, meaning that before translating, it recognizes the patterns and constructions of the Toki Pona text through traditional programming rather than neural networks. This takes a lot of effort, but it is doable because the rules of Toki Pona are relatively simple.
If we instead started from English and translated into Toki Pona, we would be dealing with the rules of English, which are vast and inconsistent.
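To illustrate what recognizing patterns through traditional programming can look like, here is a minimal sketch in Python. It is not ilo Token's actual parser: the `Clause` class and the `split_on` and `parse_clause` functions are hypothetical names invented for this example, and it handles only a single clause of the shape "subject li predicate e object", including the rule that a bare "mi" or "sina" subject takes no "li".

```python
from dataclasses import dataclass, field


@dataclass
class Clause:
    subject: list[str]
    predicate: list[str]
    objects: list[list[str]] = field(default_factory=list)


def split_on(words, marker):
    """Split a word list into groups at every occurrence of `marker`."""
    groups = [[]]
    for word in words:
        if word == marker:
            groups.append([])
        else:
            groups[-1].append(word)
    return groups


def parse_clause(text):
    """Parse one simple clause with a single predicate (illustrative only)."""
    words = text.lower().rstrip(".").split()
    if "li" in words:
        # Rule: "li" separates the subject from the predicate.
        index = words.index("li")
        subject, rest = words[:index], words[index + 1:]
    elif words and words[0] in ("mi", "sina"):
        # Rule: a subject that is just "mi" or "sina" takes no "li".
        subject, rest = words[:1], words[1:]
    else:
        raise ValueError("no predicate found")
    # Rule: each "e" introduces a direct object.
    predicate, *objects = split_on(rest, "e")
    if not subject or not predicate or any(not obj for obj in objects):
        raise ValueError("incomplete clause")
    return Clause(subject, predicate, objects)


print(parse_clause("soweli li moku e kili"))
# Clause(subject=['soweli'], predicate=['moku'], objects=[['kili']])
```

A handful of explicit rules like these already covers the basic sentence shape, which is why a hand-written approach is feasible for Toki Pona.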
jan Koko personally thinks that, for English (or any other language) to Toki Pona machine translation, neural networks would be preferable to a rule-based approach, provided enough data is available. Machine learning needs a huge amount of data; without it, the translation would be very inaccurate. Not much Toki Pona data is available because the language is still young. Nonetheless, machine learning translators already exist.
For the time being, using traditional programming to parse the patterns and constructions of the text is preferable.
The translator is limited to translating at most 2 sentences. There are two reasons:
- The translator considers many ways the original Toki Pona text could be interpreted and outputs multiple results. If it accepted more sentences, the number of outputs would grow exponentially with the number of sentences.
- When the original Toki Pona text spans multiple sentences, it's important to take note of context between sentences, which the translator cannot do. This is the main reason we disallow it.
Allowing 2 sentences lets us translate sentences that use "ni:".
You can still give it complicated sentences, such as a word with many modifiers or a sentence with many predicates. We found no reason to limit that. Be careful not to crash the browser, however.
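The growth in output mentioned in the first reason can be sketched in Python. This is only an illustration, not how the translator is implemented: the English readings below are made up, and `combine_interpretations` and `count_outputs` are hypothetical helper names. The idea is that each combined output picks one interpretation per sentence, so the counts multiply.

```python
from itertools import product


def combine_interpretations(per_sentence):
    """Return every way to pick one interpretation per sentence."""
    return [" ".join(choice) for choice in product(*per_sentence)]


def count_outputs(interpretations_per_sentence, sentences):
    """Count combined outputs when every sentence has the same number of readings."""
    return interpretations_per_sentence ** sentences


# Made-up English readings for two Toki Pona sentences (illustrative only).
first = ["I eat.", "I ate.", "We eat."]
second = ["It is good.", "They are good."]

combined = combine_interpretations([first, second])
print(len(combined))        # 6, that is 3 * 2
print(count_outputs(10, 2))  # 100
print(count_outputs(10, 5))  # 100000
```

With ten readings per sentence, two sentences already give 100 combinations and five sentences give 100,000, which is why the input length is capped.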
"Token" is derived from the ISO codes of Toki Pona and English: "tok" and "en". "Token" also happens to be a term used in parsing, which is fitting.
We prioritized ease of development over good error messages. You may turn on telo misikeke error messages in the settings, which are a lot better.