NorSID

This is the code used by the MaiNLP team in the VarDial 2025 shared task on dialectal Norwegian slot and intent detection. It is described in the paper "Add Noise, Tasks, or Layers? A Comparison of Methods to Improve Dialectal Norwegian Slot and Intent Detection" (Verena Blaschke*, Felicia Körner*, Barbara Plank), to be published at VarDial @ COLING 2025.

Code

Clone with the --recursive flag to download the submodule contents: git clone git@github.com:mainlp/NorSID.git --recursive

Our experiments are described more closely in the respective folders:

  • experiments_baselines: Compare mDeBERTa-v3, ScandiBERT, and NorBERT-v3 when each is trained on the English training data, its machine-translated Norwegian counterpart, or 90% of the (predominantly dialectal) development set. This folder also contains some general scripts and notes (on making MaChAmp compatible with NorBERT and on preparing the 90:10 split of the development data).
  • experiments_noise: Inject character-level noise into the machine-translated Norwegian training data and fine-tune the language models on the noised data.
  • experiments_auxtasks: Train ScandiBERT on an additional NLP task in (standard or dialectal) Norwegian (POS tagging, dependency parsing, NER, or dialect identification), either before training on the English SID training set or simultaneously with it.
  • experiments_layer_swap: Train models on different datasets, then swap out (or reset) some of their layers.
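The 90:10 split of the development data mentioned under experiments_baselines could look roughly like the following. This is a minimal sketch, not the repository's own script: the function name split_90_10, the default seed, and the plain seeded shuffle are assumptions (the actual script may, for example, read and write the data files or group examples differently).

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_90_10(examples: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: shuffle with a fixed seed, return (train, held_out) at 90:10."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    cut = len(items) * 9 // 10          # integer arithmetic avoids float rounding issues
    return items[:cut], items[cut:]
```

Fixing the seed matters here because the same held-out 10% has to serve as the evaluation set across all compared models.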
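A common way to inject character-level noise, as in experiments_noise, is to apply random insertions, deletions, or substitutions to a fraction of the words. The sketch below illustrates that generic technique only; the alphabet, the one-edit-per-word rule, and the names add_noise and noise_level are assumptions, not the exact procedure used in this repository.

```python
import random
from typing import List

# Assumed alphabet: lowercase Latin letters plus the Norwegian æ, ø, å.
ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"


def noise_word(word: str, rng: random.Random) -> str:
    """Apply one random character edit: insertion, deletion, or substitution."""
    op = rng.choice(("insert", "delete", "substitute"))
    pos = rng.randrange(len(word))
    if op == "insert":
        return word[:pos] + rng.choice(ALPHABET) + word[pos:]
    if op == "delete" and len(word) > 1:
        return word[:pos] + word[pos + 1:]
    # Substitution; also the fallback so one-character words never vanish.
    return word[:pos] + rng.choice(ALPHABET) + word[pos + 1:]


def add_noise(tokens: List[str], noise_level: float, seed: int = 0) -> List[str]:
    """Edit each non-empty token with probability `noise_level` (hypothetical helper).

    The number of tokens never changes, so token-level slot labels stay aligned.
    """
    rng = random.Random(seed)
    return [noise_word(tok, rng) if tok and rng.random() < noise_level else tok
            for tok in tokens]
```

Editing tokens in place rather than re-tokenizing is the key design choice for slot filling: each noised token keeps the slot label of the original token.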
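The layer swapping in experiments_layer_swap can be pictured as copying selected transformer layers from one fine-tuned model's parameters into another's. The sketch below works on plain dicts keyed by parameter name (a PyTorch state_dict has the same shape). The key pattern encoder.layer.<i>. is an assumption modelled on Hugging Face BERT-style parameter names, and swap_layers is a hypothetical helper, not code from this repository.

```python
import re
from typing import Dict, Iterable, Optional, TypeVar

V = TypeVar("V")  # parameter values: tensors in practice, anything in this sketch

# Assumed naming scheme, modelled on Hugging Face BERT-style parameter keys
# such as "encoder.layer.3.attention.self.query.weight".
LAYER_KEY = re.compile(r"(?:^|\.)encoder\.layer\.(\d+)\.")


def layer_index(key: str) -> Optional[int]:
    """Return the transformer layer index encoded in a parameter name, if any."""
    match = LAYER_KEY.search(key)
    return int(match.group(1)) if match else None


def swap_layers(target: Dict[str, V], donor: Dict[str, V],
                layers: Iterable[int]) -> Dict[str, V]:
    """Copy `target`, replacing the parameters of the given layers with `donor`'s."""
    wanted = set(layers)
    merged = dict(target)  # shallow copy: `target` itself is left unchanged
    for key, value in donor.items():
        if layer_index(key) in wanted:
            if key not in target:
                raise KeyError(f"parameter {key!r} not found in target model")
            merged[key] = value
    return merged
```

Resetting layers could be sketched the same way by passing the parameters of a freshly initialized model as the donor. With PyTorch, the merged dict would then be loaded back via model.load_state_dict(merged).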

Data

Data included in the submodule folder:

  • xSID (regular training/dev/test data for English + other languages)
  • NoMusic (shared task dev/test data; machine-translated Norwegian training data)
  • UD Nynorsk LIA (treebank with dialectal transcriptions)
  • NorNE (named entity annotations for Norwegian)

Data that must be downloaded separately:

  • Nordic Dialect Corpus (CC BY-SA 4.0): Download the phonetic & orthographic transcriptions with informant codes, unzip them, and add the folders ndc_phon_with_informant_codes and ndc_with_informant_codes to the data directory.
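Before running experiments that rely on the Nordic Dialect Corpus, a quick check that both folders are in place can save a failed run. A minimal sketch, assuming the path of the data directory is passed in; the helper name missing_ndc_folders and the "data" path in the example are hypothetical, while the folder names are the ones given above.

```python
from pathlib import Path
from typing import List, Union

# Folder names as given in this README.
NDC_FOLDERS = ("ndc_phon_with_informant_codes", "ndc_with_informant_codes")


def missing_ndc_folders(data_dir: Union[str, Path]) -> List[str]:
    """Return the Nordic Dialect Corpus folders not (yet) present in `data_dir`."""
    base = Path(data_dir)
    return [name for name in NDC_FOLDERS if not (base / name).is_dir()]


if __name__ == "__main__":
    missing = missing_ndc_folders("data")  # "data" is an assumed location
    print("All NDC folders found." if not missing else f"Missing: {missing}")
```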
