
BaselineChipaSystem

Alex Rudnick edited this page Aug 26, 2014 · 2 revisions

This will end up as a significant part of the tasks/evaluation chapter.

Experiments that we'll do repeatedly...

  • disambiguate es-gn (Spanish to Guarani) for the top 100 words in es, using the Bible
  • also es-qu (Spanish to Quechua), using the Bible
  • also es-en and en-es using the Bible
  • also es-en and en-es using Europarl

baseline system looks like this

  • take the one-to-many alignments from cdec as ground truth
  • for each source-language word, train a classifier
  • do 10-fold cross validation with that classifier
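
The three steps above can be sketched roughly as follows. This is only an illustration, not the actual Chipa code: the `toy_instances` data is made up (real instances would come from the cdec alignments), the source word "banco" and its two target labels are hypothetical, and `window_features` and `evaluate_word` are names invented for this sketch.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline


def window_features(tokens, index, width=2):
    """Words within `width` positions of the focus word, keyed by offset."""
    feats = {}
    for offset in range(-width, width + 1):
        pos = index + offset
        if offset != 0 and 0 <= pos < len(tokens):
            feats["w[%+d]=%s" % (offset, tokens[pos])] = 1
    return feats


def toy_instances():
    """Made-up (tokens, focus index, aligned target) triples for 'banco'.

    In the real setup these would be read off the cdec alignments.
    """
    money = ["dinero", "cuenta", "credito", "ahorros"]
    park = ["parque", "plaza", "jardin", "patio"]
    instances = []
    for i in range(20):
        instances.append((["el", "banco", "guarda", money[i % 4]], 1, "bank"))
        instances.append((["el", "banco", "del", park[i % 4]], 1, "bench"))
    return instances


def evaluate_word(instances, folds=10):
    """Train and score one classifier for one source word with k-fold CV."""
    X = [window_features(tokens, index) for tokens, index, _ in instances]
    y = [label for _, _, label in instances]
    # DictVectorizer turns the feature dicts into a sparse matrix;
    # LogisticRegression is scikit-learn's maxent classifier.
    model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    return cross_val_score(model, X, y, cv=folds)


scores = evaluate_word(toy_instances())
print("mean accuracy over %d folds: %.3f" % (len(scores), scores.mean()))
```

In a full run, `evaluate_word` would presumably be called once per source-language word (e.g. each of the top 100 words), with the per-word scores collected for the evaluation tables.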

classifiers that we'll use are...

  • scikit-learn maxent classifiers (LogisticRegression, in scikit-learn's terms)
  • can try other classifiers too
  • try some different feature selection and regularization approaches
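
One possible way to line up these variants for comparison is sketched below. The specific choices here are assumptions for illustration only: the two values of `C` (regularization strength), chi-squared feature selection with `k=5`, and a linear SVM as the "other classifier". The toy data is synthetic and the names `toy_data` and `candidates` are hypothetical.

```python
import random

from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def toy_data(n=40, seed=0):
    """Synthetic feature dicts: one informative cue plus one noise word each."""
    rng = random.Random(seed)
    cues = {"bank": ["dinero", "cuenta", "credito", "ahorros"],
            "bench": ["parque", "plaza", "jardin", "patio"]}
    noise = ["el", "la", "un", "una", "muy", "ya"]
    X, y = [], []
    for i in range(n):
        label = "bank" if i % 2 == 0 else "bench"
        cue = cues[label][(i // 2) % 4]
        X.append({"bow=" + cue: 1, "bow=" + rng.choice(noise): 1})
        y.append(label)
    return X, y


# Each candidate is a full pipeline, so vectorizing and feature selection
# are refit inside every cross-validation fold.
candidates = {
    "maxent C=1.0": make_pipeline(
        DictVectorizer(), LogisticRegression(C=1.0, max_iter=1000)),
    "maxent C=0.1": make_pipeline(
        DictVectorizer(), LogisticRegression(C=0.1, max_iter=1000)),
    "maxent + chi2 top-5": make_pipeline(
        DictVectorizer(), SelectKBest(chi2, k=5),
        LogisticRegression(max_iter=1000)),
    "linear SVM": make_pipeline(DictVectorizer(), LinearSVC()),
}

X, y = toy_data()
results = {name: cross_val_score(model, X, y, cv=10).mean()
           for name, model in candidates.items()}
for name, accuracy in results.items():
    print("%-22s %.3f" % (name, accuracy))
```

Keeping every variant as a pipeline means the same 10-fold loop can score all of them without leaking test-fold information into feature selection.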

feature set that we'll use...

  • try a few different context window sizes: definitely at least 2,3,4,5
  • also bag of words over the whole sentence
  • try some dependency features: we can run a parser for both es and en as source languages
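
A minimal sketch of the first two feature types, using only the standard library. The function names and the feature-string format are made up for this example, and dependency features are left out because they need a parser. The example sentence is tokenized by whitespace and lowercased at the start, which is also just an assumption.

```python
def window_features(tokens, index, width):
    """Words within `width` positions of the focus word, keyed by offset."""
    feats = {}
    for offset in range(-width, width + 1):
        pos = index + offset
        if offset != 0 and 0 <= pos < len(tokens):
            feats["w[%+d]=%s" % (offset, tokens[pos])] = 1
    return feats


def bag_of_words_features(tokens, index):
    """Every other word in the sentence, ignoring position."""
    return {"bow=" + tok: 1 for i, tok in enumerate(tokens) if i != index}


def extract_features(tokens, index, width, use_bow=True):
    """Combine window features for one width with optional bag of words."""
    feats = window_features(tokens, index, width)
    if use_bow:
        feats.update(bag_of_words_features(tokens, index))
    return feats


tokens = "en el principio creó Dios los cielos y la tierra".split()
focus = tokens.index("cielos")

# One feature set per window size, so each size can be evaluated separately.
for width in (2, 3, 4, 5):
    feats = window_features(tokens, focus, width)
    print(width, sorted(feats))
```

The resulting dicts can be handed to something like scikit-learn's `DictVectorizer` to get a sparse feature matrix.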

some relevant links...
