NorSID

This is the code used by the MaiNLP team in the VarDial 2025 shared task on dialectal Norwegian slot and intent detection. It is described in the paper Add Noise, Tasks, or Layers? A Comparison of Methods to Improve Dialectal Norwegian Slot and Intent Detection (Verena Blaschke*, Felicia Körner*, Barbara Plank), to be published at VarDial @ COLING 2025.

Code

Clone with the --recursive flag to download the submodule contents: git clone git@github.com:mainlp/NorSID.git --recursive

Our experiments are described more closely in the respective folders:

experiments_baselines: Compare mDeBERTa-v3, ScandiBERT and NorBERT-v3 when trained on the English training data, its machine-translated counterpart, or 90% of the (predominantly dialectal) development set. This folder also contains some general scripts/notes (making MaChAmp compatible with NorBERT, preparing the 90:10 split of the dev data).
experiments_noise: Inject character-level noise into the MT'ed Norwegian training data and fine-tune the language models on the noised data.
experiments_auxtasks: Train ScandiBERT on another NLP task in (standard or dialectal) Norwegian (POS tagging, dependency parsing, NER, dialect identification), either before training on the English SID training set or simultaneously.
experiments_layer_swap: Train models on different datasets and swap out some of their layers (or reset some of their layers).
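
To give a rough idea of what character-level noise injection can look like, here is a minimal sketch. It is not the code in experiments_noise: the function name add_char_noise, the noise_level parameter, the replacement alphabet and the choice of insert/delete/substitute operations are all assumptions made for illustration. The sketch changes at most one character per selected word and never adds or removes whitespace, so the number of tokens stays the same and token-level slot labels remain aligned.

```python
import random
import string

# Assumed replacement alphabet (hypothetical, not taken from the repository).
ALPHABET = string.ascii_lowercase + "æøå"


def add_char_noise(sentence, noise_level=0.15, seed=None):
    """Perturb one character in roughly `noise_level` of the words.

    Illustrative only: the operations and defaults are assumptions.
    """
    rng = random.Random(seed)
    noised_words = []
    for word in sentence.split():
        if rng.random() < noise_level:
            pos = rng.randrange(len(word))
            op = rng.choice(("insert", "delete", "substitute"))
            if op == "insert":
                word = word[:pos] + rng.choice(ALPHABET) + word[pos:]
            elif op == "delete" and len(word) > 1:
                word = word[:pos] + word[pos + 1:]
            else:
                # Substitution; also used instead of deleting a
                # one-character word, so that no token disappears.
                word = word[:pos] + rng.choice(ALPHABET) + word[pos + 1:]
        noised_words.append(word)
    return " ".join(noised_words)


if __name__ == "__main__":
    print(add_char_noise("sett på en alarm klokka sju", noise_level=0.5, seed=1))
```

Passing a fixed seed makes the noised data reproducible, which helps when comparing models fine-tuned at different noise levels.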

Data

Data in submodule folder:

xSID (regular training/dev/test data for English + other languages)
NoMusic (shared task dev/test data; MT'ed Norwegian training data)
UD Nynorsk LIA (treebank with dialectal transcriptions)
NorNE (named entity annotations for Norwegian)

Data to be additionally downloaded:

Nordic Dialect Corpus (CC BY-SA 4.0): Download the phonetic & orthographic transcriptions with informant codes, unzip them, and add the folders ndc_phon_with_informant_codes and ndc_with_informant_codes to the data directory.
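
Before running experiments that need the Nordic Dialect Corpus, it can help to check that both folders ended up in the right place. The following is a small sketch, assuming the data directory is the data folder at the repository root; the helper name missing_ndc_folders is hypothetical and not part of the repository.

```python
from pathlib import Path

# Folder names as given in the download instructions above.
NDC_FOLDERS = ("ndc_phon_with_informant_codes", "ndc_with_informant_codes")


def missing_ndc_folders(data_dir="data"):
    """Return the expected NDC folder names that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in NDC_FOLDERS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_ndc_folders()
    if missing:
        print("Missing NDC folders:", ", ".join(missing))
    else:
        print("Both NDC folders found.")
```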
