Add final thoughts for Tagalog on draft mode
1 parent b57c0d9, commit 5634a55
Showing 2 changed files with 38 additions and 39 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
<!-- | ||
In Tagalog, we have this word called diskarte. There is no direct translation in | ||
English, but I can describe it loosely as resourcefulness and creativity. It's | ||
not a highly cognitive trait: smart people may be bookish, but not madiskarte. | ||
It's more practical, a form of street smarts, even. Diskarte is a | ||
highly Filipino trait, born from our need to solve things creatively in the | ||
presence of constraints. I mention this because working in Tagalog, or any | ||
low-resource language, requires a little diskarte, and I enjoy it! | ||
There are many exciting ways to tackle Tagalog NLP. Right now, I'm taking the | ||
standard labeling, training, and evaluation approach. However, I'm interested in | ||
exploring model-based techniques like cross-lingual transfer learning and | ||
multilingual NLP to "get around" the data bottleneck. After three months (twelve | ||
weekends, to be specific) of labeling, I realized how long and costly the | ||
process was. I still believe in getting gold-standard annotations, but I also | ||
want to balance this approach with short-term solutions. | ||
I wish we had more consolidated efforts to work on Tagalog NLP. Right now, | ||
research progress at each institution seems disconnected from the others. | ||
I definitely like what's happening in | ||
[Masakhane](https://www.masakhane.io/) for African languages and | ||
[IndoNLP](https://indonlp.github.io/) for Indonesian. I think they are good | ||
community models to follow. In the future, wouldn't it be great if [Komisyon sa | ||
Wikang Filipino](https://kwf.gov.ph/) had a dedicated computational linguistics | ||
group? Tagalog is not the only language in the Philippines, and being able to | ||
tackle one Philippine language at a time would be nice. | ||
Right now, I'm working on | ||
[calamanCy](https://github.com/ljvmiranda921/calamanCy), my attempt to create | ||
spaCy pipelines for Tagalog. Its name is based on kalamansi, a citrus fruit | ||
common in the Philippines. Unfortunately, it's something that I've been working | ||
on in my spare time, so progress is slower than usual! This blog post contains | ||
my experiments on building the NER part of the pipeline. I plan to add a | ||
dependency parser and POS tagger from Universal Dependencies in the future. | ||
That's all for now. Feel free to hit me up if you have any questions or want to | ||
collaborate! Maraming salamat! | ||
--> |