Clarification in the jsonify.py code #4

kurtespinosa · 2017-05-11T17:10:07Z

Dear Butsugiri,

Thank you for sharing your code. I have a question about the input dataset that I need to jsonify. I downloaded the dataset and used the respective data partitions, for example WikiQA-test.tsv for the test set, which has the sample entry below.

QuestionID Question DocumentID DocumentTitle SentenceID Sentence Label
Q0 HOW AFRICAN AMERICANS WERE IMMIGRATED TO THE US D0 African immigration to the United States D0-0 African immigration to the United States refers to immigrants to the United States who are or were nationals of Africa . 0

Now I'm confused, because in the jsonify code the question would point to D0-0, which is the SentenceID. It seems that question_id and question were interchanged. Am I right, or did I miss something?

question_id = data[1]
....
question = data[-3]
answer = data[-2]
....
....
'question': question.lower().split(" "),
'answer': answer.lower().split(" "),

Should it have been the following?

question = data[1]
.....
question_id = data[-3]
answer = data[-2]
....
....
'question': question.lower().split(" "),
'answer': answer.lower().split(" "),
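
To make the index question concrete, here is a small sketch that splits the sample row quoted above and prints which column each index lands on. It assumes the raw file is tab-separated with the seven header columns shown; the variable names `columns` and `data` are only illustrative.

```python
# Header and sample row as quoted from WikiQA-test.tsv (assumed tab-separated).
header = "QuestionID\tQuestion\tDocumentID\tDocumentTitle\tSentenceID\tSentence\tLabel"
row = (
    "Q0\tHOW AFRICAN AMERICANS WERE IMMIGRATED TO THE US\tD0\t"
    "African immigration to the United States\tD0-0\t"
    "African immigration to the United States refers to immigrants to the "
    "United States who are or were nationals of Africa .\t0"
)

columns = header.split("\t")
data = row.split("\t")

# Show which column each index used in the snippets above refers to
# when applied to the raw, unpreprocessed row.
for index in (0, 1, -3, -2, -1):
    print(f"data[{index}] is {columns[index]}: {data[index][:50]}")
```

On the raw row, `data[1]` is the Question column and `data[-3]` is the SentenceID column, while the QuestionID sits at `data[0]`.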

Cheers,
Kurt

butsugiri · 2017-05-13T02:34:01Z

Hi,

The indexing for some variables, like question and question_id, seems interchanged because jsonify.py requires some extra preprocessing beforehand (I am sorry that it is not provided in this repo).
The preprocessing basically removes the questions that have no correct answer among their candidate sentences, as described in the original paper.
So please fix the code if you think it is necessary.
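
For anyone who needs to reproduce that filtering step, the following is a minimal sketch and not the preprocessing script actually used for this repo. It assumes the raw TSV columns shown earlier in the thread and that a Label of "1" marks a correct answer sentence; the function name `drop_questions_without_answer` and the sample rows are made up for illustration.

```python
import csv
import io


def drop_questions_without_answer(rows):
    """Keep only rows whose question has at least one sentence labelled '1'."""
    answered = {row["QuestionID"] for row in rows if row["Label"] == "1"}
    return [row for row in rows if row["QuestionID"] in answered]


# Made-up rows in the WikiQA column layout (not real dataset entries).
sample = (
    "QuestionID\tQuestion\tDocumentID\tDocumentTitle\tSentenceID\tSentence\tLabel\n"
    "Q0\tquestion zero\tD0\ttitle zero\tD0-0\tfirst candidate\t0\n"
    "Q0\tquestion zero\tD0\ttitle zero\tD0-1\tsecond candidate\t0\n"
    "Q1\tquestion one\tD1\ttitle one\tD1-0\tfirst candidate\t0\n"
    "Q1\tquestion one\tD1\ttitle one\tD1-1\tsecond candidate\t1\n"
)

# QUOTE_NONE so that quote characters inside sentences are treated as plain text.
reader = csv.DictReader(io.StringIO(sample), delimiter="\t", quoting=csv.QUOTE_NONE)
rows = list(reader)
kept = drop_questions_without_answer(rows)

for row in kept:
    print(row["QuestionID"], row["SentenceID"], row["Label"])
```

Here Q0 is dropped because none of its candidate sentences is labelled as correct, while both rows of Q1 are kept.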

After preprocessing, the file should look like:

{"label": "0", "sentence_id": "D11-0", "question": ["how", "big", "is", "bmc", "software", "in", "houston", ",", "tx"], "title": "BMC Software", "answer": ["bmc", "software", ",", "inc.", "is", "an", "american", "company", "specializing", "in", "business", "service", "management", "(", "bsm", ")", "software", "."], "document_id": "D11", "question_id": "Q11"}
{"label": "0", "sentence_id": "D11-1", "question": ["how", "big", "is", "bmc", "software", "in", "houston", ",", "tx"], "title": "BMC Software", "answer": ["headquartered", "in", "houston", ",", "texas", ",", "bmc", "develops", ",", "markets", "and", "sells", "software", "used", "for", "multiple", "functions", ",", "including", "it", "service", "management", ",", "data", "center", "automation", ",", "performance", "management", ",", "virtualization", "lifecycle", "management", "and", "cloud", "computing", "management", "."], "document_id": "D11", "question_id": "Q11"}

Each line contains one QA pair in JSON format.
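
A file in this one-object-per-line layout can be read back with the standard `json` module. The sketch below is only an illustration: the helper name `read_qa_pairs` is hypothetical, and the sample line is a shortened version of the first entry above.

```python
import io
import json


def read_qa_pairs(lines):
    """Yield one dict per non-empty line, each line holding one JSON object."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Shortened version of the first example line, wrapped in a file-like object.
sample = io.StringIO(
    '{"label": "0", "sentence_id": "D11-0", '
    '"question": ["how", "big", "is", "bmc", "software"], '
    '"title": "BMC Software", '
    '"answer": ["bmc", "software", ",", "inc."], '
    '"document_id": "D11", "question_id": "Q11"}\n'
)

pairs = list(read_qa_pairs(sample))
for pair in pairs:
    print(pair["question_id"], pair["sentence_id"], " ".join(pair["question"]))
```

With a real file, the same function can be given an open file handle instead of the `io.StringIO` object.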

kurtespinosa · 2017-05-15T10:20:50Z

Thank you for taking the time to answer my question. This clarifies it.
