Commit

Working on anaphora resolution
Stephen Bly committed Apr 14, 2013
1 parent bc71d26 commit 7eb8b66
Showing 2 changed files with 60 additions and 5 deletions.
47 changes: 47 additions & 0 deletions README.log
@@ -0,0 +1,47 @@
This is pdfTeX, Version 3.1415926-2.4-1.40.13 (TeX Live 2012) (format=pdflatex 2012.6.30) 5 APR 2013 02:11
entering extended mode
restricted \write18 enabled.
file:line:error style messages enabled.
%&-line parsing enabled.
**README.md
(./README.md
LaTeX2e <2011/06/27>
Babel <v3.8m> and hyphenation patterns for english, dumylang, nohyphenation, ge
rman-x-2012-05-30, ngerman-x-2012-05-30, afrikaans, ancientgreek, ibycus, arabi
c, armenian, basque, bulgarian, catalan, pinyin, coptic, croatian, czech, danis
h, dutch, ukenglish, usenglishmax, esperanto, estonian, ethiopic, farsi, finnis
h, french, friulan, galician, german, ngerman, swissgerman, monogreek, greek, h
ungarian, icelandic, assamese, bengali, gujarati, hindi, kannada, malayalam, ma
rathi, oriya, panjabi, tamil, telugu, indonesian, interlingua, irish, italian,
kurmanji, latin, latvian, lithuanian, mongolian, mongolianlmc, bokmal, nynorsk,
polish, portuguese, romanian, romansh, russian, sanskrit, serbian, serbianc, s
lovak, slovenian, spanish, swedish, turkish, turkmen, ukrainian, uppersorbian,
welsh, loaded.

./README.md:1: LaTeX Error: Missing \begin{document}.

See the LaTeX manual or LaTeX Companion for explanation.
Type H <return> for immediate help.
...

l.1 A
RKref
?
./README.md:1: Emergency stop.
...

l.1 A
RKref
You're in trouble here. Try typing <return> to proceed.
If that doesn't work, type X <return> to quit.


Here is how much of TeX's memory you used:
7 strings out of 493488
256 string characters out of 3141326
49432 words of memory out of 3000000
3412 multiletter control sequences out of 15000+200000
3640 words of font info for 14 fonts, out of 3000000 for 9000
957 hyphenation exceptions out of 8191
5i,0n,4p,18b,14s stack positions out of 5000i,500n,10000p,200000b,50000s
./README.md:1: ==> Fatal error occurred, no output PDF file produced!
18 changes: 13 additions & 5 deletions answer
@@ -13,7 +13,7 @@ import re
 import itertools
 import nltk
 from nltk.stem import PorterStemmer
-from bs4 import BeautifulSoup
+import bs4
 # Import our modules from /modules
 sys.path.append("modules")
 import questionClassifier
@@ -27,16 +27,24 @@ def contains_negative(sent):

 # the set of pronouns, used for anaphora resolution
 pronouns = set(["he", "she", "it", "him", "her", "his","they","their","we",
-  "our","I","you","your","my","mine","yours","ours"])
+  "our","i","you","your","my","mine","yours","ours"])

 # Runs coreference resolution on the article using arkref.
 # This still needs to be implemented.
 def coref(path_to_article):
     subprocess.call(["./arkref.sh", "-input", path_to_article])
     tagged_article = open(path_to_article.replace("txt", "tagged")).read()
     tagged_article = "<root>"+tagged_article+"</root>"
-    soup = BeautifulSoup(tagged_article, "xml")
-    print soup.prettify()
+    print tagged_article
+    soup = bs4.BeautifulSoup(tagged_article, "html.parser").root
+    for entity in soup.find_all(True):
+        if entity.string != None and entity.string.strip().lower() in pronouns:
+            antecedent_id = entity["entityid"].split("_")[0]
+            antecedent = soup.find(mentionid=antecedent_id)
+            #entity.string.replace_with(antecedent)
+            print antecedent
+            #entity.unwrap()
+
     return open(path_to_article).read()

 # Answers a question from the information in article.
@@ -83,4 +91,4 @@ if __name__ == '__main__':
     print "Our answer:", answer(question, article)
     print "Correct answer:", correct_answer

-print
+print
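
The replacement step that this commit leaves commented out in coref() could look roughly like the sketch below. It is not the author's implementation: it is written for Python 3, uses the standard library's xml.etree.ElementTree instead of bs4, and assumes the arkref output is well-formed XML in which every mention element carries the mentionid and entityid attributes seen in the diff, with the part of entityid before the first "_" naming the antecedent's mentionid. The function name resolve_pronouns and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

# Same pronoun set as in the diff (all lowercase, since text is lowercased before lookup).
PRONOUNS = {"he", "she", "it", "him", "her", "his", "they", "their", "we",
            "our", "i", "you", "your", "my", "mine", "yours", "ours"}


def resolve_pronouns(tagged_text):
    """Replace each pronoun mention with the text of its antecedent mention.

    Hypothetical helper: assumes well-formed markup with mentionid/entityid
    attributes, as read by coref() in the diff.
    """
    # Wrap the fragment in a single root element, as coref() does.
    root = ET.fromstring("<root>" + tagged_text + "</root>")

    # Index every element that has a mentionid attribute.
    by_id = {el.get("mentionid"): el for el in root.iter() if el.get("mentionid")}

    for el in root.iter():
        text = (el.text or "").strip().lower()
        # Only leaf elements whose whole text is a pronoun are candidates.
        if len(el) > 0 or text not in PRONOUNS or el.get("entityid") is None:
            continue
        antecedent_id = el.get("entityid").split("_")[0]
        antecedent = by_id.get(antecedent_id)
        # Leave the pronoun alone if the id is unknown or points at itself.
        if antecedent is None or antecedent is el:
            continue
        el.text = "".join(antecedent.itertext()).strip()

    # Drop the markup and return plain text.
    return "".join(root.itertext())


if __name__ == "__main__":
    sample = ('<mention mentionid="1" entityid="1_2">Alice</mention> said '
              '<mention mentionid="2" entityid="1_2">she</mention> was tired.')
    print(resolve_pronouns(sample))  # Alice said Alice was tired.
```

The sketch substitutes the antecedent text verbatim, so possessives such as "his" or "their" would come out ungrammatical ("Alice book" rather than "Alice's book"). Handling those would need an extra rule that is outside what the diff shows.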
