diff --git a/papers/2009-icon-hyderabad/acl-ijcnlp2009.sty b/papers/2009-icon-hyderabad/acl-ijcnlp2009.sty deleted file mode 100644 index 927779a..0000000 --- a/papers/2009-icon-hyderabad/acl-ijcnlp2009.sty +++ /dev/null @@ -1,368 +0,0 @@ -% File acl-ijcnlp2009.sty -% adapted from -- -% File eacl2006.sty -% September 19, 2005 -% Contact: e.agirre@ehu.es or Sergi.Balari@uab.es - -% This is the LaTeX style file for EACL 2006. It is nearly identical to the -% style files for ACL2005, ACL 2002, ACL 2001, ACL 2000, EACL 95 and EACL -% 99. -% -% Changes made include: adapt layout to A4 and centimeters, widden abstract - -% This is the LaTeX style file for ACL 2000. It is nearly identical to the -% style files for EACL 95 and EACL 99. Minor changes include editing the -% instructions to reflect use of \documentclass rather than \documentstyle -% and removing the white space before the title on the first page -% -- John Chen, June 29, 2000 - -% To convert from submissions prepared using the style file aclsub.sty -% prepared for the ACL 2000 conference, proceed as follows: -% 1) Remove submission-specific information: \whichsession, \id, -% \wordcount, \otherconferences, \area, \keywords -% 2) \summary should be removed. The summary material should come -% after \maketitle and should be in the ``abstract'' environment -% 3) Check all citations. This style should handle citations correctly -% and also allows multiple citations separated by semicolons. -% 4) Check figures and examples. Because the final format is double- -% column, some adjustments may have to be made to fit text in the column -% or to choose full-width (\figure*} figures. -% 5) Change the style reference from aclsub to acl2000, and be sure -% this style file is in your TeX search path - - -% This is the LaTeX style file for EACL-95. It is identical to the -% style file for ANLP '94 except that the margins are adjusted for A4 -% paper. -- abney 13 Dec 94 - -% The ANLP '94 style file is a slightly modified -% version of the style used for AAAI and IJCAI, using some changes -% prepared by Fernando Pereira and others and some minor changes -% by Paul Jacobs. - -% Papers prepared using the aclsub.sty file and acl.bst bibtex style -% should be easily converted to final format using this style. -% (1) Submission information (\wordcount, \subject, and \makeidpage) -% should be removed. -% (2) \summary should be removed. The summary material should come -% after \maketitle and should be in the ``abstract'' environment -% (between \begin{abstract} and \end{abstract}). -% (3) Check all citations. This style should handle citations correctly -% and also allows multiple citations separated by semicolons. -% (4) Check figures and examples. Because the final format is double- -% column, some adjustments may have to be made to fit text in the column -% or to choose full-width (\figure*} figures. - -% Place this in a file called aclap.sty in the TeX search path. -% (Placing it in the same directory as the paper should also work.) - -% Prepared by Peter F. Patel-Schneider, liberally using the ideas of -% other style hackers, including Barbara Beeton. -% This style is NOT guaranteed to work. It is provided in the hope -% that it will make the preparation of papers easier. -% -% There are undoubtably bugs in this style. If you make bug fixes, -% improvements, etc. please let me know. 
My e-mail address is: -% pfps@research.att.com - -% Papers are to be prepared using the ``acl'' bibliography style, -% as follows: -% \documentclass[11pt]{article} -% \usepackage{acl2000} -% \title{Title} -% \author{Author 1 \and Author 2 \\ Address line \\ Address line \And -% Author 3 \\ Address line \\ Address line} -% \begin{document} -% ... -% \bibliography{bibliography-file} -% \bibliographystyle{acl} -% \end{document} - -% Author information can be set in various styles: -% For several authors from the same institution: -% \author{Author 1 \and ... \and Author n \\ -% Address line \\ ... \\ Address line} -% if the names do not fit well on one line use -% Author 1 \\ {\bf Author 2} \\ ... \\ {\bf Author n} \\ -% For authors from different institutions: -% \author{Author 1 \\ Address line \\ ... \\ Address line -% \And ... \And -% Author n \\ Address line \\ ... \\ Address line} -% To start a seperate ``row'' of authors use \AND, as in -% \author{Author 1 \\ Address line \\ ... \\ Address line -% \AND -% Author 2 \\ Address line \\ ... \\ Address line \And -% Author 3 \\ Address line \\ ... \\ Address line} - -% If the title and author information does not fit in the area allocated, -% place \setlength\titlebox{} right after -% \usepackage{acl2000} -% where can be something larger than 2.25in - -% \typeout{Conference Style for ACL 2000 -- released June 20, 2000} -\typeout{Conference Style for ACL 2005 -- released Octobe 11, 2004} - -% NOTE: Some laser printers have a serious problem printing TeX output. -% These printing devices, commonly known as ``write-white'' laser -% printers, tend to make characters too light. To get around this -% problem, a darker set of fonts must be created for these devices. -% - -%% % Physical page layout - slightly modified from IJCAI by pj -%% \setlength\topmargin{0.0in} \setlength\oddsidemargin{-0.0in} -%% \setlength\textheight{9.0in} \setlength\textwidth{6.5in} -%% \setlength\columnsep{0.2in} -%% \newlength\titlebox -%% \setlength\titlebox{2.25in} -%% \setlength\headheight{0pt} \setlength\headsep{0pt} -%% %\setlength\footheight{0pt} -%% \setlength\footskip{0pt} -%% \thispagestyle{empty} \pagestyle{empty} -%% \flushbottom \twocolumn \sloppy - -%% Original A4 version of page layout -%% \setlength\topmargin{-0.45cm} % changed by Rz -1.4 -%% \setlength\oddsidemargin{.8mm} % was -0cm, changed by Rz -%% \setlength\textheight{23.5cm} -%% \setlength\textwidth{15.8cm} -%% \setlength\columnsep{0.6cm} -%% \newlength\titlebox -%% \setlength\titlebox{2.00in} -%% \setlength\headheight{5pt} -%% \setlength\headsep{0pt} -%% \setlength\footheight{0pt} -%% \setlength\footskip{0pt} -%% \thispagestyle{empty} -%% \pagestyle{empty} - -% A4 modified by Eneko -\setlength{\paperwidth}{21cm} % A4 -\setlength{\paperheight}{29.7cm}% A4 -\setlength\topmargin{-0.5cm} -\setlength\oddsidemargin{0cm} -\setlength\textheight{24.7cm} -\setlength\textwidth{16.0cm} -\setlength\columnsep{0.6cm} -\newlength\titlebox -\setlength\titlebox{2.00in} -\setlength\headheight{5pt} -\setlength\headsep{0pt} -\thispagestyle{empty} -\pagestyle{empty} - - -\flushbottom \twocolumn \sloppy - -% We're never going to need a table of contents, so just flush it to -% save space --- suggested by drstrip@sandia-2 -\def\addcontentsline#1#2#3{} - -% Title stuff, taken from deproc. 
-\def\maketitle{\par - \begingroup - \def\thefootnote{\fnsymbol{footnote}} - \def\@makefnmark{\hbox to 0pt{$^{\@thefnmark}$\hss}} - \twocolumn[\@maketitle] \@thanks - \endgroup - \setcounter{footnote}{0} - \let\maketitle\relax \let\@maketitle\relax - \gdef\@thanks{}\gdef\@author{}\gdef\@title{}\let\thanks\relax} -\def\@maketitle{\vbox to \titlebox{\hsize\textwidth - \linewidth\hsize \vskip 0.125in minus 0.125in \centering - {\Large\bf \@title \par} \vskip 0.2in plus 1fil minus 0.1in - {\def\and{\unskip\enspace{\rm and}\enspace}% - \def\And{\end{tabular}\hss \egroup \hskip 1in plus 2fil - \hbox to 0pt\bgroup\hss \begin{tabular}[t]{c}\bf}% - \def\AND{\end{tabular}\hss\egroup \hfil\hfil\egroup - \vskip 0.25in plus 1fil minus 0.125in - \hbox to \linewidth\bgroup\large \hfil\hfil - \hbox to 0pt\bgroup\hss \begin{tabular}[t]{c}\bf} - \hbox to \linewidth\bgroup\large \hfil\hfil - \hbox to 0pt\bgroup\hss \begin{tabular}[t]{c}\bf\@author - \end{tabular}\hss\egroup - \hfil\hfil\egroup} - \vskip 0.3in plus 2fil minus 0.1in -}} - -% margins for abstract -\renewenvironment{abstract}% - {\centerline{\large\bf Abstract}% - \begin{list}{}% - {\setlength{\rightmargin}{0.6cm}% - \setlength{\leftmargin}{0.6cm}}% - \item[]\ignorespaces}% - {\unskip\end{list}} - -%\renewenvironment{abstract}{\centerline{\large\bf -% Abstract}\vspace{0.5ex}\begin{quote}}{\par\end{quote}\vskip 1ex} - - -% bibliography - -\def\thebibliography#1{\section*{References} - \global\def\@listi{\leftmargin\leftmargini - \labelwidth\leftmargini \advance\labelwidth-\labelsep - \topsep 1pt plus 2pt minus 1pt - \parsep 0.25ex plus 1pt \itemsep 0.25ex plus 1pt} - \list {[\arabic{enumi}]}{\settowidth\labelwidth{[#1]}\leftmargin\labelwidth - \advance\leftmargin\labelsep\usecounter{enumi}} - \def\newblock{\hskip .11em plus .33em minus -.07em} - \sloppy - \sfcode`\.=1000\relax} - -\def\@up#1{\raise.2ex\hbox{#1}} - -% most of cite format is from aclsub.sty by SMS - -% don't box citations, separate with ; and a space -% also, make the penalty between citations negative: a good place to break -% changed comma back to semicolon pj 2/1/90 -% \def\@citex[#1]#2{\if@filesw\immediate\write\@auxout{\string\citation{#2}}\fi -% \def\@citea{}\@cite{\@for\@citeb:=#2\do -% {\@citea\def\@citea{;\penalty\@citeseppen\ }\@ifundefined -% {b@\@citeb}{{\bf ?}\@warning -% {Citation `\@citeb' on page \thepage \space undefined}}% -% {\csname b@\@citeb\endcsname}}}{#1}} - -% don't box citations, separate with ; and a space -% Replaced for multiple citations (pj) -% don't box citations and also add space, semicolon between multiple citations -\def\@citex[#1]#2{\if@filesw\immediate\write\@auxout{\string\citation{#2}}\fi - \def\@citea{}\@cite{\@for\@citeb:=#2\do - {\@citea\def\@citea{; }\@ifundefined - {b@\@citeb}{{\bf ?}\@warning - {Citation `\@citeb' on page \thepage \space undefined}}% - {\csname b@\@citeb\endcsname}}}{#1}} - -% Allow short (name-less) citations, when used in -% conjunction with a bibliography style that creates labels like -% \citename{, } -% -\let\@internalcite\cite -\def\cite{\def\citename##1{##1, }\@internalcite} -\def\shortcite{\def\citename##1{}\@internalcite} -\def\newcite{\def\citename##1{{\frenchspacing##1} (}\@internalciteb} - -% Macros for \newcite, which leaves name in running text, and is -% otherwise like \shortcite. 
-\def\@citexb[#1]#2{\if@filesw\immediate\write\@auxout{\string\citation{#2}}\fi - \def\@citea{}\@newcite{\@for\@citeb:=#2\do - {\@citea\def\@citea{;\penalty\@m\ }\@ifundefined - {b@\@citeb}{{\bf ?}\@warning - {Citation `\@citeb' on page \thepage \space undefined}}% -{\csname b@\@citeb\endcsname}}}{#1}} -\def\@internalciteb{\@ifnextchar [{\@tempswatrue\@citexb}{\@tempswafalse\@citexb[]}} - -\def\@newcite#1#2{{#1\if@tempswa, #2\fi)}} - -\def\@biblabel#1{\def\citename##1{##1}[#1]\hfill} - -%%% More changes made by SMS (originals in latex.tex) -% Use parentheses instead of square brackets in the text. -\def\@cite#1#2{({#1\if@tempswa , #2\fi})} - -% Don't put a label in the bibliography at all. Just use the unlabeled format -% instead. -\def\thebibliography#1{\vskip\parskip% -\vskip\baselineskip% -\def\baselinestretch{1}% -\ifx\@currsize\normalsize\@normalsize\else\@currsize\fi% -\vskip-\parskip% -\vskip-\baselineskip% -\section*{References\@mkboth - {References}{References}}\list - {}{\setlength{\labelwidth}{0pt}\setlength{\leftmargin}{\parindent} - \setlength{\itemindent}{-\parindent}} - \def\newblock{\hskip .11em plus .33em minus -.07em} - \sloppy\clubpenalty4000\widowpenalty4000 - \sfcode`\.=1000\relax} -\let\endthebibliography=\endlist - -% Allow for a bibliography of sources of attested examples -\def\thesourcebibliography#1{\vskip\parskip% -\vskip\baselineskip% -\def\baselinestretch{1}% -\ifx\@currsize\normalsize\@normalsize\else\@currsize\fi% -\vskip-\parskip% -\vskip-\baselineskip% -\section*{Sources of Attested Examples\@mkboth - {Sources of Attested Examples}{Sources of Attested Examples}}\list - {}{\setlength{\labelwidth}{0pt}\setlength{\leftmargin}{\parindent} - \setlength{\itemindent}{-\parindent}} - \def\newblock{\hskip .11em plus .33em minus -.07em} - \sloppy\clubpenalty4000\widowpenalty4000 - \sfcode`\.=1000\relax} -\let\endthesourcebibliography=\endlist - -\def\@lbibitem[#1]#2{\item[]\if@filesw - { \def\protect##1{\string ##1\space}\immediate - \write\@auxout{\string\bibcite{#2}{#1}}\fi\ignorespaces}} - -\def\@bibitem#1{\item\if@filesw \immediate\write\@auxout - {\string\bibcite{#1}{\the\c@enumi}}\fi\ignorespaces} - -% sections with less space -\def\section{\@startsection {section}{1}{\z@}{-2.0ex plus - -0.5ex minus -.2ex}{1.5ex plus 0.3ex minus .2ex}{\large\bf\raggedright}} -\def\subsection{\@startsection{subsection}{2}{\z@}{-1.8ex plus - -0.5ex minus -.2ex}{0.8ex plus .2ex}{\normalsize\bf\raggedright}} -%% changed by KO to - values to get teh initial parindent right -\def\subsubsection{\@startsection{subsubsection}{3}{\z@}{-1.5ex plus - -0.5ex minus -.2ex}{0.5ex plus .2ex}{\normalsize\bf\raggedright}} -\def\paragraph{\@startsection{paragraph}{4}{\z@}{1.5ex plus - 0.5ex minus .2ex}{-1em}{\normalsize\bf}} -\def\subparagraph{\@startsection{subparagraph}{5}{\parindent}{1.5ex plus - 0.5ex minus .2ex}{-1em}{\normalsize\bf}} - -% Footnotes -\footnotesep 6.65pt % -\skip\footins 9pt plus 4pt minus 2pt -\def\footnoterule{\kern-3pt \hrule width 5pc \kern 2.6pt } -\setcounter{footnote}{0} - -% Lists and paragraphs -\parindent 1em -\topsep 4pt plus 1pt minus 2pt -\partopsep 1pt plus 0.5pt minus 0.5pt -\itemsep 2pt plus 1pt minus 0.5pt -\parsep 2pt plus 1pt minus 0.5pt - -\leftmargin 2em \leftmargini\leftmargin \leftmarginii 2em -\leftmarginiii 1.5em \leftmarginiv 1.0em \leftmarginv .5em \leftmarginvi .5em -\labelwidth\leftmargini\advance\labelwidth-\labelsep \labelsep 5pt - -\def\@listi{\leftmargin\leftmargini} -\def\@listii{\leftmargin\leftmarginii - 
\labelwidth\leftmarginii\advance\labelwidth-\labelsep - \topsep 2pt plus 1pt minus 0.5pt - \parsep 1pt plus 0.5pt minus 0.5pt - \itemsep \parsep} -\def\@listiii{\leftmargin\leftmarginiii - \labelwidth\leftmarginiii\advance\labelwidth-\labelsep - \topsep 1pt plus 0.5pt minus 0.5pt - \parsep \z@ \partopsep 0.5pt plus 0pt minus 0.5pt - \itemsep \topsep} -\def\@listiv{\leftmargin\leftmarginiv - \labelwidth\leftmarginiv\advance\labelwidth-\labelsep} -\def\@listv{\leftmargin\leftmarginv - \labelwidth\leftmarginv\advance\labelwidth-\labelsep} -\def\@listvi{\leftmargin\leftmarginvi - \labelwidth\leftmarginvi\advance\labelwidth-\labelsep} - -\abovedisplayskip 7pt plus2pt minus5pt% -\belowdisplayskip \abovedisplayskip -\abovedisplayshortskip 0pt plus3pt% -\belowdisplayshortskip 4pt plus3pt minus3pt% - -% Less leading in most fonts (due to the narrow columns) -% The choices were between 1-pt and 1.5-pt leading -\def\@normalsize{\@setsize\normalsize{11pt}\xpt\@xpt} -\def\small{\@setsize\small{10pt}\ixpt\@ixpt} -\def\footnotesize{\@setsize\footnotesize{10pt}\ixpt\@ixpt} -\def\scriptsize{\@setsize\scriptsize{8pt}\viipt\@viipt} -\def\tiny{\@setsize\tiny{7pt}\vipt\@vipt} -\def\large{\@setsize\large{14pt}\xiipt\@xiipt} -\def\Large{\@setsize\Large{16pt}\xivpt\@xivpt} -\def\LARGE{\@setsize\LARGE{20pt}\xviipt\@xviipt} -\def\huge{\@setsize\huge{23pt}\xxpt\@xxpt} -\def\Huge{\@setsize\Huge{28pt}\xxvpt\@xxvpt} diff --git a/papers/2009-icon-hyderabad/paper.bib b/papers/2009-icon-hyderabad/paper.bib deleted file mode 100644 index 2978cb7..0000000 --- a/papers/2009-icon-hyderabad/paper.bib +++ /dev/null @@ -1,120 +0,0 @@ -%% Saved with string encoding Unicode (UTF-8) - -@article{malt, - Author = {Nivre, Joakim and Hall, Johan and Nilsson, Jens and Chanev, Atanas and Eryiğit, Gülşen and Kübler, Sandra and Marinov, Svetoslav and Marsi, Erwin}, - Title = {MaltParser: A language-independent system for data-driven dependency parsing}, - Year = 2007, - Journal = {Natural Language Engineering}, - Volume = 13, - Number = 2, - Pages = {95--135}, - CommentDoi = {10.1017/S1351324906004505}, - CommentUrl = {http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=1012768}} - -@InProceedings{mst, - author = {McDonald, Ryan and Pereira, Fernando and Ribarov, Kiril and Hajič, Jan}, - title = {Non-Projective Dependency Parsing using Spanning Tree Algorithms}, - booktitle = {Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing}, - month = {October}, - year = {2005}, - address = {Vancouver, British Columbia, Canada}, - publisher = {Association for Computational Linguistics}, - pages = {523--530}, - url = {http://www.aclweb.org/anthology/H/H05/H05-1066} -} - -@InProceedings{buchholz-marsi:2006:CoNLL-X, - author = {Buchholz, Sabine and Marsi, Erwin}, - title = {CoNLL-X Shared Task on Multilingual Dependency Parsing}, - booktitle = {Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)}, - month = {June}, - year = {2006}, - address = {New York City}, - publisher = {Association for Computational Linguistics}, - pages = {149--164}, - url = {http://www.aclweb.org/anthology/W/W06/W06-2920} -} - -@InProceedings{nivre-EtAl:2007:EMNLP-CoNLL2007, - author = {Nivre, Joakim and Hall, Johan and K\"ubler, Sandra and McDonald, Ryan and Nilsson, Jens and Riedel, Sebastian and Yuret, Deniz}, - title = {The {CoNLL} 2007 Shared Task on Dependency Parsing}, - booktitle = {Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007}, - 
month = {June}, - year = {2007}, - address = {Praha, Czechia}, - publisher = {Association for Computational Linguistics}, - pages = {915--932}, - url = {http://www.aclweb.org/anthology/D/D07/D07-1096} -} - -@book{svm, - author = {Cristianini, Nello and Shawe-Taylor, John}, - title = {An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods}, - year = {2000}, - address = {Cambridge, UK}, - publisher = {Cambridge University Press}} - -@phdthesis{dzparser, - author = {Daniel Zeman}, - title = {Parsing with a Statistical Dependency Model}, - year = {2004}, - address = {Praha, Czechia}, - school = {Univerzita Karlova v Praze}, -} - -@InProceedings{titov-henderson:2007:EMNLP-CoNLL2007, - author = {Titov, Ivan and Henderson, James}, - title = {Fast and Robust Multilingual Dependency Parsing with a Generative Latent Variable Model}, - booktitle = {Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007}, - month = {June}, - year = {2007}, - address = {Praha, Czechia}, - publisher = {Association for Computational Linguistics}, - pages = {947--951}, - url = {http://www.aclweb.org/anthology/D/D07/D07-1099} -} - -@InProceedings{nivre:2009:ACLIJCNLP, - author = {Nivre, Joakim}, - title = {Non-Projective Dependency Parsing in Expected Linear Time}, - booktitle = {Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP}, - month = {August}, - year = {2009}, - address = {Suntec, Singapore}, - publisher = {Association for Computational Linguistics}, - pages = {351--359}, - url = {http://www.aclweb.org/anthology/P/P09/P09-1040} -} - -@inproceedings{ biblio:ZeZaImprovingParsing2005, - author = {Daniel Zeman and Zden{\v{e}}k {\v{Z}}abokrtsk{\'{y}}}, - title = {Improving Parsing Accuracy by Combining Diverse Dependency Parsers}, - booktitle = {Proceedings of the Ninth International Workshop on Parsing Technologies ({IWPT})}, - year = {2005}, - publisher = {Association for Computational Linguistics}, - institution = {Simon Fraser University}, - address = {Vancouver, British Columbia, Canada}, - venue = {Westin Hotel}, - pages = {171--178}, - isbn = {1-932432-58-2}, -} - -@article{neproj, - Author = {Hajičová, Eva and Havelka, Jiří and Sgall, Petr and Veselá, Kateřina and Zeman, Daniel}, - Title = {Issues of Projectivity in the Prague Dependency Treebank}, - Year = 2004, - Journal = {The Prague Bulletin of the Mathematical Linguistics}, - Volume = 81, -} - -@InProceedings{nivre-mcdonald:2008:ACLMain, - author = {Nivre, Joakim and McDonald, Ryan}, - title = {Integrating Graph-Based and Transition-Based Dependency Parsers}, - booktitle = {Proceedings of ACL-08: HLT}, - month = {June}, - year = {2008}, - address = {Columbus, Ohio}, - publisher = {Association for Computational Linguistics}, - pages = {950--958}, - url = {http://www.aclweb.org/anthology/P/P08/P08-1108} -} diff --git a/papers/2009-icon-hyderabad/paper.tex b/papers/2009-icon-hyderabad/paper.tex deleted file mode 100644 index 5e3a202..0000000 --- a/papers/2009-icon-hyderabad/paper.tex +++ /dev/null @@ -1,326 +0,0 @@ -% !TEX TS-program = xelatex -% !TEX encoding = UTF-8 Unicode -% File acl-ijcnlp2009.tex -% -% Contact jshin@csie.ncnu.edu.tw -%% -%% Based on the style files for EACL-2009 and IJCNLP-2008... 
-%% Based on the style files for EACL 2006 by -%%e.agirre@ehu.es or Sergi.Balari@uab.es -%% and that of ACL 08 by Joakim Nivre and Noah Smith - -\documentclass[11pt]{article} -\usepackage{acl-ijcnlp2009} -\usepackage[ - pdfdisplaydoctitle, breaklinks, colorlinks, linkcolor=black, citecolor=black, filecolor=black, urlcolor=black, - backref, hyperfootnotes]{hyperref} % backref a modre URL asi nakonec zrusime -%\usepackage{times} -\usepackage{url} -\usepackage{amsmath} -\usepackage{color} %pro korektury -\usepackage{paralist} % for better itemize and enumerate - -% a footer required for the first page -\usepackage{fancyhdr} -\fancyhead{} % clear all header fields -\renewcommand{\headrulewidth}{0pt} -\fancyfoot[C]{Proceedings of ICON-2009: 7th International Conference on Natural Language Processing, Macmillan Publishers, India. Also accessible from http://ltrc.iiit.ac.in/proceedings/ICON-2009} - -% xelatex -\usepackage{fontspec, xunicode, xltxtra} -\defaultfontfeatures{Mapping=tex-text} -\setmainfont{Times New Roman} -\setmonofont[Scale=MatchLowercase]{Luxi Mono} -\setmathsf{Lohit Hindi}%\XXX -%\newfontinstance\hi[Script=Devanagari]{Lohit Hindi} -\newfontinstance\hifont[Script=Devanagari]{Code2000} -\newfontinstance\bnfont[Script=Bengali]{Code2000} -\newfontinstance\tefont[Script=Telugu]{Code2000} -\newfontinstance\translitfont{Gentium} -\newcommand{\hi}[1]{{\hifont #1}} -\newcommand{\bn}[1]{{\bnfont #1}} -\newcommand{\te}[1]{{\tefont #1}} -\newcommand{\translit}[1]{{\translitfont \textit{(#1)}}} - -% natbib -\usepackage{natbib} -\bibliographystyle{plainnat} -\bibpunct{(}{)}{;}{a}{,}{,} - -% our defs -\def\perscite#1{\citet{#1}} -\def\parcite#1{\citep{#1}} -%ps: Did you mean \citep and \citet (in-parentheses and textual reference)? There is even more, see `texdoc natbib`. -\def\Sref#1{Section~\ref{#1}} -\def\Tref#1{Table~\ref{#1}} -\def\Fref#1{Figure~\ref{#1}} -\newcommand{\red}[1]{\textcolor{red}{#1}} % komentare (TODO) -\newcommand{\XXX}{\textcolor{red}{XXX }} % komentare (TODO) - -\def\microsection#1{{\bf #1.}} - - - -\title{Maximum Spanning Malt: Hiring World's Leading Dependency Parsers to Plant Indian Trees% -% Tohle nechat zakomentované, je to jen tahák, jak udělat acknowledgement grantu při nedostatku místa. Jinak ale mám momentálně na konci opravdovou sekci Acknowledgements. -%\thanks{ \hspace{.6em}The research has been supported by the grant -%MSM0021620838 (Czech Ministry of Education).} -} - -\author{Daniel Zeman\\ -Univerzita Karlova v Praze, Ústav formální a aplikované lingvistiky\\ -Malostranské náměstí 25, CZ-11800, Praha, Czechia\\ -\texttt{zeman@ufal.mff.cuni.cz} -} - -%\title{Instructions for ACL-IJCNLP 2009 Proceedings} -% -%\author{First Author\\ -% Affiliation / Address line 1\\ -% Affiliation / Address line 2\\ -% {\tt email@domain} \And -% Second Author\\ -% Affiliation / Address line 1\\ -% Affiliation / Address line 2\\ -% {\tt email@domain}} - -\date{} - -\begin{document} -\maketitle -\thispagestyle{fancy} - -\begin{abstract} -We present our system used for participation in the ICON 2009 NLP Tools Contest: dependency parsing of Hindi, Bangla and Telugu. The system consists of three existing, freely available dependency parsers, two of which (MST and Malt) have been known to produce state-of-the-art structures on data sets for other languages. Various settings of the parsers are explored in order to adjust them for the three Indian languages, and a voting approach is used to combine them into a superparser. 
Since there is nothing novel about the approach used, a substantial part of the paper is devoted to the analysis of errors the system makes on the given data sets. -\end{abstract} - -\section{Introduction} -\label{sec:intro} - -Dependency parsing, i.e. sentence analysis that outputs a tree of word-on-word dependencies (as opposed to constituent trees of context-free derivations), has recently gained growing attention and popularity. There are data-driven dependency parsers that can be trained on syntactically annotated corpora (treebanks), and new, previously unseen material can be parsed very efficiently \citep{nivre:2009:ACLIJCNLP}. - -Most of the successful parsers employ discriminative learning techniques to sort out vast sets of potentially useful features observed in the input text. Thus, for every new training treebank, smart feature engineering is the key to getting the most out of the existing parsers, regardless of how well they performed on other data sets and languages. Now that there are new treebanks available for two Indo-Aryan and one Dravidian language, we took three existing dependency parsers and explored the possibilities of tuning them for the new training data. Both parser configuration and data preprocessing are relevant approaches to the tuning. In addition, we used parser combination to further improve the results. - -Throughout the paper we focus mainly on the unlabeled attachment score. Although the parsers produce labeled dependencies, we do not optimize the system towards label accuracy. - -The rest of the paper is organized as follows: In \Sref{sec:system}, we describe the respective parsers and the combined parsing system. In \Sref{sec:experiments}, we report on the experiments we performed, discuss various results on the development set and analyze the errors. In \Sref{sec:evaluation} we present the official results on the test data. We conclude by summarizing the best configuration we were able to find and by outlining future implications. - -\section{System Description} -\label{sec:system} - -Several good trainable dependency parsers have emerged during the past five years. The CoNLL-X \citep{buchholz-marsi:2006:CoNLL-X} and CoNLL 2007 \citep{nivre-EtAl:2007:EMNLP-CoNLL2007} shared tasks in multilingual dependency parsing have greatly contributed to the development of the parsers. Some of the parsers are now freely available on the web, and some are even open-source. We selected three of the publicly available parsers for our experiments: - -%\microsection{MST Parser} -\subsection{MST Parser} -\label{sec:mst} -The Maximum Spanning Tree (MST) parser \citep{mst} views the sentence as a directed complete graph with edges weighted by a feature scoring function. It finds the spanning tree of the graph that maximizes the total weight of the edges. A multi-class classification algorithm called MIRA is used to compute the scoring function. - -MST Parser achieved the best unlabeled attachment scores (UAS) for 9 out of the 13 languages of CoNLL-X, and the second-best scores in two others. Parsing is fast but training the parser takes many hours on large treebanks. On small data, however, multiple quick experiments with different settings are still feasible.
The parser is implemented in Java and freely available for download.\footnote{\url{http://sourceforge.net/projects/mstparser/}} - -%\microsection{Malt Parser} -\subsection{Malt Parser} -\label{sec:malt} -The Malt Parser \citep{malt} is a deterministic shift-reduce parser where input words can be either put to the stack or taken from the stack and combined to form a dependency. The decision of which operation to perform is made by an oracle based on various features of the words in the input buffer and the stack. The default machine learning algorithm used to train the oracle is a support vector machine (SVM) classifier \citep{svm}. - -Malt Parser has participated in both the CoNLL-X and CoNLL 2007 shared tasks, and although it achieved the best UAS in only three languages, it usually scored among the five best parsers, sometimes with a statistically insignificant difference from the winner. Malt Parser is very fast and its new Java implementation is open-source, freely available for download.\footnote{\url{http://maltparser.org/}} - -%\microsection{DZ Parser} -\subsection{DZ Parser} -\label{sec:dz} -In order to combine the two parsers above, we needed a third parser. We picked DZ Parser \citep{dzparser}, which is also reasonably fast and freely available.\footnote{\url{http://ufal.mff.cuni.cz/~zeman/projekty/parser/}} Although its accuracy is worse than that of MST or Malt by a wide margin, this parser proved useful because its only role was to help form a majority whenever MST and Malt disagreed. - -DZ Parser builds a model of bigrams of words that occur together in a dependency; most of the time, words are identified by their part of speech tags and morphological features. The parser was originally developed for Czech but it can be re-trained for any other language.\footnote{Of course there are other dependency parsers that successfully participated in the CoNLL shared tasks and are available for download. One alternative worth mentioning is the ISBN Parser \citep{titov-henderson:2007:EMNLP-CoNLL2007} at \url{http://flake.cs.uiuc.edu/~titov/}.} - -\subsection{Voting Superparser} -\label{sec:voting} -The three parsers are combined using a simple weighted-voting approach similar to \citet{biblio:ZeZaImprovingParsing2005}, except that the output is guaranteed to be cycle-free. We start by evaluating every parser separately on the development data. The UAS of each parser is subsequently used as the weight of that parser's vote. Dependencies are parent-child relations, and for every node there are up to three candidates for its parent (if all three parsers disagree). Candidates get weighted votes -- e.g., if parsers with weights $w_1 = 0.8$ and $w_2 = 0.7$ agree on the candidate, the candidate gets 1.5 votes. Since we have only three parsers, in practice this means that the candidate of the best parser loses only if (1) the other two parsers agree on another candidate, or (2) attaching the child to this candidate would create a cycle. - -The tree is constructed from the root down. We repeatedly add nodes whose winning parent candidates are already in the tree. If none of the remaining nodes meet this condition, we have to break a cycle. We do so by examining all unattached nodes. At each node we note the votes of its current winning parent. Then we remove the least-scoring winner and go on with adding nodes until all nodes are attached or there is another cycle to break.
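A minimal sketch of the voting-and-cycle-breaking procedure just described, assuming each parser's output is given as a child-to-parent map with node 0 as the artificial root; the function names and the fall-back to the root when a node runs out of candidates are illustrative assumptions, not part of the system described in this paper.

from collections import defaultdict

def combine(parser_parents, weights):
    """parser_parents: one {child: parent} dict per parser (parent 0 = root);
    weights: each parser's UAS on the development set.
    Returns a single cycle-free {child: parent} dict."""
    nodes = set(parser_parents[0])
    # Weighted votes for every parent candidate of every child.
    votes = {c: defaultdict(float) for c in nodes}
    for parents, w in zip(parser_parents, weights):
        for child, parent in parents.items():
            votes[child][parent] += w
    # Grow the tree from the root down: a node is attached only once its
    # winning parent candidate is already in the tree.
    in_tree, result, remaining = {0}, {}, set(nodes)
    while remaining:
        progressed = False
        for child in sorted(remaining):
            winner = max(votes[child], key=votes[child].get)
            if winner in in_tree:
                result[child] = winner
                in_tree.add(child)
                remaining.discard(child)
                progressed = True
        if not progressed:
            # Cycle: among unattached nodes, discard the winning candidate
            # with the lowest vote and try again.
            weakest = min(remaining, key=lambda c: max(votes[c].values()))
            del votes[weakest][max(votes[weakest], key=votes[weakest].get)]
            if not votes[weakest]:
                votes[weakest][0] = 0.0  # assumed fallback: attach to the root
    return result

For example, combine([{1: 2, 2: 0}, {1: 2, 2: 0}, {1: 0, 2: 1}], [0.86, 0.85, 0.75]) attaches node 2 to the root first (its winning parent is already in the tree) and then node 1 to node 2, since the two stronger parsers outvote the third.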
- -\section{Experiments} -\label{sec:experiments} - -The final test data are blind; any error analysis is therefore impossible. That is why all scores given in this section were measured on the development data. All three treebanks follow the same annotation scheme and each of them is available in two flavors: - -\begin{compactitem} -\item the \textit{nomorph} variety contains word forms, chunk labels, dependency links and dependency labels -\item the \textit{morph} variety is augmented by automatically assigned lemmas, part of speech tags and values of morphological features (gender, number, person, case, postposition and tam -- tense+aspect+modality) -\end{compactitem} - -\subsection{Morphology} -\label{sec:morphology} - -\Tref{tab:baseline} shows baseline results on the \textit{nomorph} data. Both the MST and Malt parsers were invoked in projective mode, Malt with the default Nivre arc-eager algorithm. - -\begin{table}[ht] -\begin{centering} -%\small -\begin{tabular}{l|l|l|l|l} -& \textbf{MST} & \textbf{Malt} & \textbf{DZ} & \textbf{Vote} \\ -\hline -hi & 80.32 & 81.84 & 62.00 & \textbf{82.48}\\ -bn & 82.00 & \textbf{84.71} & 71.02 & 83.11\\ -te & 77.63 & \textbf{80.89} & 70.52 & 80.59\\ -\end{tabular} -\caption{Baseline UAS of the four parsers on \textit{nomorph} development data. Language codes follow ISO 639: hi = Hindi, bn = Bangla, te = Telugu.} -\label{tab:baseline} -\end{centering} -\end{table} - -There are several ways to use the additional information from the \textit{morph} data. The easiest way, exploitable by all three parsers, is to combine the chunk label, the POS tag and the features into one tag string. The results (not presented here) are very poor. Although there may be tagging errors, the most likely cause is data sparseness. In \Tref{tab:corpus} we illustrate this by showing the numbers of unique values in the various attributes of treebank words. - -\begin{table}[ht] -\begin{centering} -%\small -\begin{tabular}{l|r|r|r|r|r|r} -& \textbf{occ} & \textbf{frm} & \textbf{lem} & \textbf{cl} & \textbf{pos} & \textbf{feat} \\ -\hline -hi & 13779 & 3973 & 3134 & 10 & 33 & 714\\ -bn & 6449 & 2997 & 2336 & 14 & 30 & 367\\ -te & 5494 & 2462 & 1403 & 12 & 31 & 453\\ -\end{tabular} -\caption{Size of the training corpora: occ -- word occurrences, frm -- distinct forms, lem -- lemmas, cl -- chunk labels, pos -- part of speech tags, feat -- feature value combinations.} -\label{tab:corpus} -\end{centering} -\end{table} - -To fight sparseness, we could either restrict the tag to selected information, or split the information into multiple features learnt separately, or both. We first restricted the tag string to selected information. Parsing on other treebanks showed that POS with case is especially useful \citep{dzparser}. The other feature we selected is called \textit{vibhakti}, which partially corresponds to a case suffix and partially to a postposition.\footnote{The Indian treebanks at hand are unusual in that nodes do not always map to words. They represent chunks, with function words such as postpositions hidden in node attributes.} - -\Tref{tab:poscasevib} presents the results of restricting the tag string to POS+case+vibhakti. This move especially improves the MST Parser, which now outperforms Malt on Hindi and Bangla. Malt improves on Hindi but drops behind on Bangla and Telugu.
DZ Parser also improves on Hindi and deteriorates elsewhere, but there is an interesting observation: even though its own scores are worse, the worsened output actually improves the voting results (provided the configurations of MST and Malt are fixed). So it seems that the newly introduced errors are less important (because they are taken care of by the more powerful parsers), while some difficult parts of the data are now covered better. - -\begin{table}[ht] -\begin{centering} -%\small -\begin{tabular}{l|l|l|l|l} -& \textbf{MST} & \textbf{Malt} & \textbf{DZ} & \textbf{Vote}\\ -\hline -hi & 86.16 & 85.84 & 75.12 & \textbf{87.12}\\ -bn & 85.70 & 77.31 & 54.38 & \textbf{85.82}\\ -te & \textbf{79.85} & 77.78 & 45.78 & 79.70\\ -\end{tabular} -\caption{UAS on refined \textit{morph} data (POS tag, case and vibhakti concatenated).} -\label{tab:poscasevib} -\end{centering} -% Když se ke značce POS přilepí pád, ale ne vibhakti, a s tím se přetrénuje Malt parser (ostatní parsery jsme nechali beze změny, tj. MST byl natrénovaný na nomorph a DZ naopak i na vibhakti), tak Malt sám má na všech třech jazycích pořád stejné výsledky (jako v této tabulce), ale výsledky Vote se na hindštině zhorší a na zbývajících dvou jazycích zlepší: hi=85.36, bn=83.35, te=80.89. -\end{table} - -Finally, we returned to \textit{nomorph} with Malt on Bangla and Telugu, where the POS+case+vibhakti tags did not help. The results are given in \Tref{tab:posmix2}. This is also the configuration we used to parse the test set for the official evaluation round 1. - -\begin{table}[ht] -\begin{centering} -%\small -\begin{tabular}{l|l|l|l|l} -& \textbf{MST} & \textbf{Malt} & \textbf{DZ} & \textbf{Vote}\\ -\hline -hi & 86.16 & 85.84 & 75.12 & \textbf{87.12}\\ -bn & 85.70 & 84.71 & 54.38 & \textbf{86.19}\\ -te & 79.85 & 80.89 & 45.78 & \textbf{82.37}\\ -\end{tabular} -\caption{UAS on mixed data: MST and DZ use POS+case+vibhakti for all languages; Malt uses it for Hindi only and just POS elsewhere.} -\label{tab:posmix2} -\end{centering} -\end{table} - -\subsection{Nonprojectivity} -\label{sec:nonprojectivity} - -Nonprojectivity is a property of the dependency structure and the word order \citep{neproj} that makes parsing more difficult. All three parsers can produce nonprojective structures and all three treebanks are nonprojective. However, except for Hindi, the proportion of nonprojective dependencies is so small that one can hardly imagine that running the parsers in nonprojective mode would bring any improvement. A quick experiment with Malt Parser switched to the nonprojective Stack Eager algorithm revealed that it actually hurts the results even for Hindi. - -\begin{table}[ht] -\begin{centering} -%\small -\begin{tabular}{l|ll} -& \textbf{Edges} & \textbf{Sentences} \\ -\hline -hi & 01.83 & 13.93\\ -bn & 00.96 & 05.49\\ -te & 00.45 & 01.31\\ -\end{tabular} -\caption{Percentage of nonprojective dependencies, and of sentences containing at least one nonprojectivity.} -\label{tab:nonprojectivity} -\end{centering} -\end{table} - -\subsection{Error Patterns} -\label{sec:errors} - -The accuracy of the dependencies is relatively high, and it is difficult to trace repetitive error patterns. In Hindi, many wrong attachments seem to be long-distance, and verbs, conjunctions, root and NULL nodes are frequently involved. Frequent words should perhaps be available to the parsers as parts of tag strings: for instance, Hindi \hi{कि} \translit{ki} ``that'' or \hi{तो} \translit{to} are wrongly attached because the parser only sees the general CC tag.
On a similar note, problems with coordination, also observed e.g. by \citet{dzparser}, occur here, too: \hi{भाई और भाभी} \translit{bhāī aura bhābhī} ``brother and his wife'' is correctly recognized as a coordination rooted by the conjunction \hi{और}; however, the conjunction node lacks the information about its noun children and fails to attach as the subject of the verb. - -The tag string should contain both the chunk label and the POS tag. So far we wrongly assumed that POS always determines the chunk label. It is often so but not always, as exemplified in the Bangla chunk sequence \bn{তবে সুদীপ ওকে একদিন আড়ালে ডেকে বলেছিল কৌতূহল দেখালে তুমি উঁচুতে উঠতে অনিমেষ} \translit{tabe sudīpa oke ekadina āṛāle ḍeke balechila kautūhala dekhāle tumi um̃cute uṭhate animeša}. The words \bn{ডেকে} and \bn{দেখালে} are tagged VGNF|VM while \bn{বলেছিল} and \bn{উঠতে} are VGF|VM. The parser gets them wrong, which could be caused by it seeing only VM in the tag. - -In Telugu, an extraordinarily large number of sentences follow the SOV order so strictly that the last node (the verb) is almost always attached to the root and most other nodes are attached directly to the last node. An example chunk sequence where this rule would lead to 100~\% accuracy follows: \te{రాష్ట్రంలొ రంగారెడ్డి మెదక్ నిజామాబాద్ జిల్లాలలొ పంటను గొప్పొ పండిస్తున్నారు} \translit{rāšṭraṁlo raṁgāreḍḍi medak nijāmābād jillālalo paṁṭanu goppo paṁḍistunnāru}. In the light of such examples it seems reasonable to provide the parsers with an additional feature telling whether a particular dependency observes the ``naïve Telugu'' structure. Note, however, that this will not help with the other two languages. While 73.75~\% of Telugu dependencies follow this rule, it is only 39.52~\% in Bangla and 35.71~\% in Hindi. - -\subsection{Voting Potential} -\label{sec:votingpotential} - -In order to see how much can potentially be gained from parser combination, we summarized the attachments that at least one of the parsers got correct. This \textit{oracle accuracy} gives an upper limit for the real scores we can achieve. It corresponds to the case that for every word, an oracle correctly tells which parser to ask about the word's parent. \Tref{tab:oracle} presents the oracle accuracies together with the percentage of unique correct attachments that only one parser delivered. These figures give some idea of how similar the errors of the respective parsers are to each other. Malt Parser has the most unique know-how in all three languages, which could be explained by its focus on local features. Both MST and DZ can reach for global, sentence-wide relations. Note, however, that the development data set is small and the percentages correspond to 42 (Malt/Bangla) or fewer words. - -\begin{table}[ht] -\begin{centering} -%\small -\begin{tabular}{l|l|l|l|l} -& \textbf{Oracle} & \textbf{UqMST} & \textbf{UqMalt} & \textbf{UqDZ} \\ -\hline -hi & 93.92 & 2.96 & \textbf{3.12} & 1.84\\ -bn & 94.20 & 4.32 & \textbf{5.18} & 1.97\\ -te & 88.00 & 2.37 & \textbf{5.48} & 2.07\\ -\end{tabular} -\caption{Oracle accuracy for the three languages, and unique correct attachments (\%) proposed by a single parser.} -\label{tab:oracle} -\end{centering} -\end{table} - -%\subsection{\XXX To Do} -%\begin{compactitem} -%\item \XXX How many times a candidate lost due to cycle prevention, and how many times this introduced an error? -%\item \XXX Learning curve -%\item \XXX Spočítat, kolik je ve kterém jazyce uzlů NULL. -%\item \XXX casta slova, priklad z hindstiny -%\item \XXX Změřit výkon kombinovaného parseru.
K tomu je potřeba rozchodit skript zmeritvykon.pl, nechtěl mi fungovat. -%\end{compactitem} - -\section{Official Evaluation} -\label{sec:evaluation} - -Finally, we present the official evaluation of our voting superparser, as measured by the organizers on the test data. For this purpose, the parsing system has been retrained on both the training data and the development data. The results are shown in \Tref{tab:evaluation}. - -\begin{table}[ht] -\begin{centering} -\small -\begin{tabular}{l|l|l|l} -& \textbf{UAS} & \textbf{LAA} & \textbf{LAS} \\ -\hline -hi & 88.58 (3:90.31) & 72.66 (4:76.38) & 68.60 (4:74.48)\\ -bn & 86.06 (4:90.32) & 71.28 (4:81.27) & 66.70 (5:79.81)\\ -te & 80.27 (4:86.28) & 54.20 (4:61.58) & 49.91 (4:60.55)\\ -\end{tabular} -\caption{Official scores on the test data: unlabeled attachment score (UAS), label assignment accuracy (LAA) and labeled attachment score (LAS). The numbers in parentheses are the rank of our system and the score of the best system w.r.t. the given metric.} -\label{tab:evaluation} -\end{centering} -\end{table} - -\section{Related and Future Work} -\label{sec:related} - -There is a large body of work on parser combination. A summary can be found in \citet{nivre-mcdonald:2008:ACLMain}, whose approach is also related to ours w.r.t. the selection of parsers. However, their feature-based integration of MST and Malt parsers is much more sophisticated than our lightweight voting. Further improvement of accuracy can be expected if MST-Malt integration is applied to the Indian treebanks. - -Future work, at the time of writing, includes experiments that can be run before the evaluation round 2, and thus their results will appear in the final version of this paper. We definitely intend to test other algorithms of the Malt parser, as well as to reconfigure its feature pool and let it work with POS, case and vibhakti separately. - -Labeling of the dependencies is another problem that deserves more attention. We have concentrated on the unlabeled attachment score so far and for the sake of the official evaluation, we simply pushed the MST labels through. A separate postprocessing classifier would probably produce better results. - -\section{Conclusion} -\label{sec:concl} - -We have described our system of voting parsers, as applied to the ICON 2009 NLP Tools Contest task. We showed that case and vibhakti are important features at least for parsing Hindi while their usability in Bangla and Telugu is limited by data sparseness. Providing these features to MST and DZ in all languages, and to Malt in Hindi only yielded the best combined parser. We also discussed several error patterns that could lead to further improvements of the parsing system in future. - -\section*{Acknowledgements} - -We are enormously grateful to the developers of MST and Malt parsers for making their software available to the research community. -The research has been supported by the grant -MSM0021620838 (Czech Ministry of Education). - -\begin{small} -\bibliography{paper} -\end{small} - -\end{document} diff --git a/papers/2009-icon-hyderabad/submitted1.pdf b/papers/2009-icon-hyderabad/submitted1.pdf deleted file mode 100644 index fc34e1e..0000000 Binary files a/papers/2009-icon-hyderabad/submitted1.pdf and /dev/null differ