BaselineChipaSystem


Alex Rudnick edited this page Aug 26, 2014 · 2 revisions

This will end up as a significant part of the tasks/evaluation chapter.

Experiments that we'll do repeatedly...

disambiguate es-gn (Spanish to Guarani) for the top 100 words in es, using the Bible
also es-qu (Spanish to Quechua), using the Bible
also es-en and en-es (Spanish and English, both directions), using the Bible
also es-en and en-es, using Europarl

the baseline system looks like this...

take the one-to-many alignments from cdec as ground truth
for each source-language word, train a classifier
do 10-fold cross validation with that classifier
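
A rough sketch of the three steps above, assuming the cdec alignments have already been turned into (source word, feature dict, target label) triples. That triple format, the function names, and the choice of stratified folds are assumptions made for illustration, not something these notes specify. scikit-learn's LogisticRegression stands in for the maxent classifier.

```python
from collections import defaultdict

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def group_by_source_word(instances):
    """Collect training data per source-language word.

    `instances` is an iterable of (source_word, feature_dict, target_label)
    triples; this input format is hypothetical.
    """
    grouped = defaultdict(lambda: ([], []))
    for word, feats, label in instances:
        grouped[word][0].append(feats)
        grouped[word][1].append(label)
    return grouped


def evaluate_word(feature_dicts, labels, n_folds=10):
    """Mean accuracy of one word's classifier under n-fold cross validation."""
    model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    # Stratified folds keep every target label represented in each split;
    # plain KFold would also fit the description above.
    folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    scores = cross_val_score(model, feature_dicts, labels, cv=folds)
    return float(scores.mean())
```

Calling evaluate_word once per entry of group_by_source_word gives one accuracy figure per source word. Words whose rarest label has fewer examples than folds would need special handling, which this sketch leaves out.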

classifiers that we'll use are...

scikit-learn maxent classifiers
can try other classifiers too
try some different feature selection and regularization approaches
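
One possible way to compare feature selection and regularization settings for a single word's classifier is a small grid search. This is only a sketch: chi-squared selection, the percentile values, and the C values are placeholders chosen for illustration, and tune_word_classifier is a hypothetical helper.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectPercentile, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


def tune_word_classifier(feature_dicts, labels, n_folds=10):
    """Grid-search feature selection and regularization for one source word."""
    pipeline = Pipeline([
        ("vec", DictVectorizer()),
        # chi2 needs non-negative feature values (binary or count features).
        ("select", SelectPercentile(chi2)),
        ("maxent", LogisticRegression(max_iter=1000)),
    ])
    grid = {
        # Share of features kept, in percent (illustrative values).
        "select__percentile": [25, 50, 100],
        # Inverse regularization strength: smaller C regularizes harder.
        "maxent__C": [0.1, 1.0, 10.0],
    }
    search = GridSearchCV(pipeline, grid, cv=n_folds)
    search.fit(feature_dicts, labels)
    return search.best_params_, search.best_score_
```

Tuning and reporting on the same folds gives optimistic scores, so a real run would probably want nested cross validation or a held-out set.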

feature set that we'll use...

try a few different context window sizes: definitely at least 2, 3, 4, 5
also bag of words over the whole sentence
try some dependency features: we can run a parser for both es and en as source languages
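
The window and bag-of-words features could be extracted roughly as below. The feature name format, the lowercasing, and the reading of "window size" as tokens on each side of the focus word are assumptions. Dependency features are left out because they depend on which parser is used.

```python
def window_features(tokens, index, width):
    """Binary features for the words within `width` positions of tokens[index]."""
    feats = {}
    for offset in range(-width, width + 1):
        if offset == 0:
            continue  # skip the focus word itself
        pos = index + offset
        if 0 <= pos < len(tokens):
            # %+d keeps the sign, so left and right context stay distinct.
            feats["w[%+d]=%s" % (offset, tokens[pos].lower())] = 1.0
    return feats


def bag_of_words_features(tokens, index):
    """Binary features for every other token in the sentence, ignoring position."""
    feats = {}
    for pos, tok in enumerate(tokens):
        if pos != index:
            feats["bow=" + tok.lower()] = 1.0
    return feats


def extract_features(tokens, index, width=3):
    """Combine window and bag-of-words features into one feature dict."""
    feats = window_features(tokens, index, width)
    feats.update(bag_of_words_features(tokens, index))
    return feats
```

The resulting dicts can be passed to scikit-learn's DictVectorizer, and rerunning with width set to 2, 3, 4, and 5 covers the window sizes listed above.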

some relevant links...