BaselineChipaSystem
Alex Rudnick edited this page Aug 26, 2014 · 2 revisions
This will end up as a significant part of the tasks/evaluation chapter.
- disambiguate es-gn (Spanish to Guarani) for the top 100 words in es, using the Bible
- also es-qu (Spanish to Quechua), using the Bible
- also es-en and en-es using the Bible
- also es-en and en-es using Europarl
- take the one-to-many alignments from cdec as ground truth
- for each source-language word, train a classifier
- do 10-fold cross validation with that classifier
- use scikit-learn maxent (logistic regression) classifiers
- can try other classifiers too
- try some different feature selection and regularization approaches
- try a few different context window sizes: definitely at least 2, 3, 4, 5
- also bag of words over the whole sentence
- try some dependency features: we can run a parser for both es and en as source languages
- http://scikit-learn.org/stable/modules/feature_selection.html
- http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
- http://scikit-learn.org/stable/modules/feature_selection.html#l1-feature-selection
- http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC
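A rough sketch of the per-word setup described above: context-window features plus a bag of words over the sentence, a scikit-learn logistic regression (maxent) classifier, and 10-fold cross validation. The helper names (`window_features`, `evaluate`), the instance format, and the toy "banco" data are all invented for illustration; real instances would come from the cdec alignments, which are not loaded here.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline


def window_features(tokens, index, width):
    """Features for the focus word at `index`: neighbors within `width`
    positions, plus a bag of words over the whole sentence."""
    feats = {}
    for offset in range(-width, width + 1):
        pos = index + offset
        if offset != 0 and 0 <= pos < len(tokens):
            feats["w[%+d]=%s" % (offset, tokens[pos])] = 1.0
    for tok in tokens:
        feats["bow=" + tok] = 1.0
    return feats


def evaluate(instances, width=3, folds=10):
    """Mean cross-validated accuracy for one source-language word.

    `instances` is a list of (tokens, focus_index, label) triples, where
    the label stands in for the aligned target-language word.
    (Hypothetical format, not taken from the original notes.)"""
    X = [window_features(toks, i, width) for toks, i, _ in instances]
    y = [label for _, _, label in instances]
    # C is the inverse regularization strength; smaller C regularizes more.
    model = make_pipeline(DictVectorizer(), LogisticRegression(C=1.0, max_iter=1000))
    return cross_val_score(model, X, y, cv=folds).mean()


# Invented toy data for the Spanish word "banco"; not real aligned text.
toy_instances = (
    [("el banco presta dinero".split(), 1, "bank")] * 10
    + [("deposito dinero en el banco".split(), 4, "bank")] * 10
    + [("ella se sienta en el banco del parque".split(), 5, "bench")] * 10
    + [("el banco de madera esta roto".split(), 1, "bench")] * 10
)

# Compare the window sizes listed above on the same instances.
for width in (2, 3, 4, 5):
    print(width, evaluate(toy_instances, width=width))
```

Putting the vectorizer inside the pipeline means the feature vocabulary is rebuilt from the training portion of each fold, so nothing from the held-out fold leaks into training. Swapping in another classifier (for example `LinearSVC`) would only change the second pipeline step.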