Shona to English - Mchechesi Innocent #177

Open
wants to merge 57 commits into
base: sn-en
4ecf9ed
Create README.md
chrisemezue May 10, 2021
0adc8df
Updated torch version
Freshia May 11, 2021
d0cf005
Updated torch version
Freshia May 11, 2021
3a8abc0
Merge pull request #153 from Freshia/master
juliakreutzer May 12, 2021
bc0cf01
Merge pull request #152 from chrisemezue/patch-1
juliakreutzer May 12, 2021
b990d95
Added reverse model for sw-en
Freshia May 12, 2021
da17bc5
Update README.md
Freshia May 12, 2021
d269bdf
Changed to proper vocab files
Freshia May 14, 2021
0674880
Merge branch 'master' of https://github.com/Freshia/masakhane
Freshia May 14, 2021
d31d3f0
Adding Masakhane web check in md
Kabongosalomon Jun 24, 2021
101e2d9
fix missing icon for kiswahili
Kabongosalomon Jun 24, 2021
d83987c
CDL: adding new script for constructing test sets, and a README about…
testusernamegithub Jun 24, 2021
ceba19d
Merge pull request #156 from Kabongosalomon/master
Jun 25, 2021
42302d0
Merge pull request #159 from cdleong/master
juliakreutzer Jul 2, 2021
24e79dc
Merge pull request #155 from Freshia/master
juliakreutzer Jul 6, 2021
78bcb1a
Merge pull request #154 from masakhane-io/sn-en
juliakreutzer Jul 6, 2021
461be9e
Olatomiwa Nigerian-Pidgin to English
Oct 8, 2021
255caef
Update README.md
NonchalantLaja Oct 8, 2021
575f4fe
Trained a JoeyNMT model to translate Kimbundu to English
kavilan-nair Oct 8, 2021
2707c6c
Added results for Kimbundu to English to machine translation - Kavila…
kavilan-nair Oct 9, 2021
22cc38e
Updated notebook to contain local training results
kavilan-nair Oct 9, 2021
85149e1
added config used to train
kavilan-nair Oct 9, 2021
5f4bf6f
Initial commit for baseline ve - en
Michael-Beukman Oct 10, 2021
86a2eec
Update README.md
Michael-Beukman Oct 10, 2021
252b8dd
Update README.md
Michael-Beukman Oct 10, 2021
9b602fa
Initial commit for baseline nr - en
Michael-Beukman Oct 10, 2021
7b8ab0a
Create ReadMe.md
umair-nasir14 Oct 11, 2021
b34711d
Update ReadMe.md
umair-nasir14 Oct 11, 2021
223ff0f
Update ReadMe.md
umair-nasir14 Oct 11, 2021
01be7c5
SN-EN Noam Baseline (NLP Lab) - Muhammad U. Nasir
umair-nasir14 Oct 11, 2021
89b903a
Rename starter_notebook_into_English_training.ipynb to sn_en_with_noa…
umair-nasir14 Oct 11, 2021
3b6fa23
Add files via upload
umair-nasir14 Oct 11, 2021
bc8bff2
Initial commit for baseline af - en
Michael-Beukman Oct 11, 2021
88b0426
Update readme
Michael-Beukman Oct 11, 2021
7517c48
Update README.md
NonchalantLaja Oct 11, 2021
bd1af7d
TN-EN
vkthe1st Oct 14, 2021
21b07da
seTshwana to English
Samundlov Oct 14, 2021
5588eef
Added the notebook to transfer data to a huggingface dataset object
Nov 21, 2021
da374c7
Added a load script to get the data in a proper format
sanchit-ahuja Dec 4, 2021
8a042bb
Olatomiwa Nigerian-Pidgin to English
Jan 19, 2022
5085297
Merge https://github.com/NonchalantLaja/masakhane-mt
Jan 19, 2022
17ab7d0
Update language_pairs.md
NonchalantLaja Jan 19, 2022
f119d33
Merge pull request #167 from NonchalantLaja/master
juliakreutzer Jan 20, 2022
57f9f6f
Merge pull request #203 from NonchalantLaja/patch-1
juliakreutzer Jan 20, 2022
d13549d
Merge pull request #168 from TripleBlackCat/master
juliakreutzer Jan 20, 2022
cfe2939
Merge pull request #171 from Michael-Beukman/ve-en
juliakreutzer Jan 20, 2022
1567676
Merge pull request #173 from Michael-Beukman/nr-en
juliakreutzer Jan 20, 2022
412c6a7
Merge pull request #179 from Michael-Beukman/af-en
juliakreutzer Jan 20, 2022
d115e21
Merge pull request #180 from umair-nasir14/master
juliakreutzer Jan 20, 2022
8e10dac
Update language_pairs.md
Michael-Beukman Jan 21, 2022
3c8442c
Update language_pairs.md
Michael-Beukman Jan 21, 2022
4ae66da
Update language_pairs.md
Michael-Beukman Jan 21, 2022
40c7559
Merge pull request #204 from Michael-Beukman/patch-1
juliakreutzer Jan 25, 2022
55a47a1
Merge pull request #192 from vkthe1st/master
juliakreutzer Jan 25, 2022
fedc239
Merge pull request #193 from Samundlov/master
juliakreutzer Jan 25, 2022
8d30a2f
Merge pull request #201 from sanchit-ahuja/master
juliakreutzer Mar 2, 2022
47dc469
Update README.txt
chrisemezue Jun 14, 2022
7,337 changes: 7,337 additions & 0 deletions benchmarks/af-en/jw300-baseline/AfrikaansToEnglish.ipynb

Large diffs are not rendered by default.

69 changes: 69 additions & 0 deletions benchmarks/af-en/jw300-baseline/README.md
@@ -0,0 +1,69 @@
# Afrikaans to English

Author: Michael Beukman

## Data

- The JW300 English-Afrikaans dataset.

## Model

- Default Masakhane Transformer translation model from the [into-english notebook](https://github.com/masakhane-io/masakhane-mt/blob/master/starter_notebook_into_English_training.ipynb), with the model parameters made larger, as specified in the TODOs. We trained for 23 epochs.
- [Link to Google Drive folder with models](https://drive.google.com/drive/folders/1XOgy2VNkQ_7oDWvW2EKiaJvNGf13qT29?usp=sharing)


Note: the final config provided here contains a `load_model` directive. It was created by the notebook, and the run is still reproducible.
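To illustrate the note above: in JoeyNMT configs, `load_model` sits under the `training` section and points at a checkpoint to initialise from. The following is a minimal sketch, assuming the config has already been loaded into a nested dictionary, of how one might drop that key to train from scratch. The helper name `without_load_model` and every value in the example fragment are hypothetical and are not taken from the config in this folder.

```python
import copy


def without_load_model(config: dict) -> dict:
    """Return a deep copy of the config with training.load_model removed."""
    cleaned = copy.deepcopy(config)
    # pop() with a default does nothing if the key is absent
    cleaned.get("training", {}).pop("load_model", None)
    return cleaned


# Hypothetical fragment: the real config has many more keys and different values.
config = {
    "training": {
        "load_model": "path/to/checkpoint.ckpt",  # placeholder path
        "model_dir": "path/to/model_dir",         # placeholder path
    },
}

fresh = without_load_model(config)
print(sorted(fresh["training"]))
```

The original dictionary is left untouched, so the same loaded config can be used both for resuming from a checkpoint and for a fresh run.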

## Analysis
At the end of training, the model produced translations such as the following:


Example 0
```
Source: My pa was die groepkneg , die term wat destyds gebruik is vir die broer wat die leiding in ’ n gemeente geneem het .
Reference: Father was the company servant , the term then used for the one taking the lead in a congregation .
Hypothesis: My father was the group servant , the term used at that time for the brother who took the lead in a congregation .
```
Example 1
```
Source: Die aantrekkingskrag is verstaanbaar , want adolessensie is ’ n tyd wanneer ’ n mens jouself leer ken en jou gevoelens op ’ n manier uitdruk wat tot ander spreek en hulle ontroer .
Reference: The appeal is understandable , for adolescence is a time of learning about oneself and revealing one ’ s feelings in a way that reaches and moves others .
Hypothesis: The attraction is understandable , for adolescence is a time when one comes to know oneself and expresses feelings in a way that speaks to others and touches them .
```
Example 2
```
Source: Hy het selfs die woorde “ u seun Dawid ” met verwysing na homself gebruik , moontlik om eerbiedig te erken dat Nabal ouer as hy was .
Reference: He even referred to himself as “ your son David , ” perhaps a respectful acknowledgment of Nabal ’ s greater age .
Hypothesis: He even used the words “ your son David ” with reference to himself , perhaps to respectfully acknowledge that Nabal was older than he was .
```

Example 3
```
Source: HOEKOM MOET ONS GEESTELIKE DOELWITTE STEL ?
Reference: WHY SET SPIRITUAL GOALS ?
Hypothesis: HOW DO WE SHOULD WE STECTS DOELVITS ?
```


## Qualitative Analysis
The model appears to struggle with sentences written entirely in capital letters (see Example 3).
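One way to check how widespread this is would be to separate all-caps sentences from the rest of a test set and inspect or score them on their own. The sketch below is not part of the original notebook; the helper names `is_all_caps` and `split_by_casing` are made up for illustration, and the two source sentences are taken from Examples 3 and 0 above.

```python
def is_all_caps(sentence: str) -> bool:
    """Return True if the sentence contains letters and none of them are lowercase."""
    letters = [ch for ch in sentence if ch.isalpha()]
    return bool(letters) and all(ch.isupper() for ch in letters)


def split_by_casing(sentences):
    """Partition sentences into (all_caps, mixed_case) lists, preserving order."""
    all_caps, mixed = [], []
    for s in sentences:
        (all_caps if is_all_caps(s) else mixed).append(s)
    return all_caps, mixed


sources = [
    "HOEKOM MOET ONS GEESTELIKE DOELWITTE STEL ?",
    "My pa was die groepkneg , die term wat destyds gebruik is vir die broer "
    "wat die leiding in ’ n gemeente geneem het .",
]

caps, mixed = split_by_casing(sources)
print(len(caps), "all-caps sentence(s);", len(mixed), "mixed-case sentence(s)")
```

Scoring the two groups separately would show whether the weakness is confined to all-caps input or only looks that way from a single example.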

The model also tends to translate much more literally than the reference, as in this excerpt from Example 2:


Source: `Hy het selfs die woorde “ u seun Dawid ” met verwysing na homself gebruik`

Hypothesis: `He even used the words “ your son David ” with reference to himself`

Reference: `He even referred to himself as “ your son David , ”`

The model's translation is very literal, almost word for word, whereas the reference rephrases the sentence more freely; both convey the same meaning.

## Observations
The dataset is quite large compared with those available for other African languages, which made training slow: 20 epochs took about 32 hours.
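As a rough back-of-the-envelope estimate, the timing above can be turned into a per-epoch cost and extrapolated to the 23 epochs mentioned in the Model section. This assumes a constant time per epoch, which the README does not state, so the extrapolated figure is only an approximation.

```python
# Figures reported in this README
epochs_timed = 20
hours_timed = 32
total_epochs = 23  # from the Model section

# Assumption: every epoch takes the same amount of time
hours_per_epoch = hours_timed / epochs_timed
estimated_total_hours = hours_per_epoch * total_epochs

print(f"{hours_per_epoch:.1f} hours per epoch")
print(f"about {estimated_total_hours:.1f} hours for {total_epochs} epochs")
```

Under that assumption, training works out to about 1.6 hours per epoch, or roughly 37 hours for the full 23 epochs.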

## Results

BLEU dev | BLEU test
--- | ---
51.47 | 57.22