crowd-deliberation/data

Latest commit: d368578 · Sep 19, 2017

Crowd Deliberation

This is the official repository of the Crowd Deliberation data set. There are three different files with relevant data:

  • data.csv which lists all 80 texts that were analyzed in the study: 40 from the Sarcasm data set, and 40 from the Relation data set.
  • labels.csv which lists all labels collected from crowd workers for the 80 texts listed in data.csv.
  • deliberations.csv which lists all group deliberations that were done on individual cases. Each group deliberation has 3 members who may or may not have been active in each of the Justify and Reconsider sessions.
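
The three files are linked through shared ID columns. As a minimal sketch of how they fit together, assuming each file has already been read into a list of dicts keyed by column name (for example with Python's csv.DictReader), the rows can be grouped by DATA_ID. The helper name group_by and all rows shown are invented placeholders, not part of the data set.

```python
from collections import defaultdict


def group_by(rows, key):
    """Group a list of row dicts by the value found in column `key`."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return dict(grouped)


# Invented placeholder rows (NOT real data); only the key columns are shown.
data_rows = [{"DATA_ID": "1"}, {"DATA_ID": "2"}]
label_rows = [
    {"LABEL_ID": "10", "DATA_ID": "1", "DELIBERATION_ID": "5"},
    {"LABEL_ID": "11", "DATA_ID": "1", "DELIBERATION_ID": ""},
    {"LABEL_ID": "12", "DATA_ID": "2", "DELIBERATION_ID": ""},
]
deliberation_rows = [{"DELIBERATION_ID": "5", "DATA_ID": "1"}]

labels_by_text = group_by(label_rows, "DATA_ID")
deliberations_by_text = group_by(deliberation_rows, "DATA_ID")

for text in data_rows:
    data_id = text["DATA_ID"]
    n_labels = len(labels_by_text.get(data_id, []))
    n_deliberations = len(deliberations_by_text.get(data_id, []))
    print(data_id, n_labels, n_deliberations)
```

IDs are kept as strings here because that is how a plain CSV reader returns them; convert them to int if numeric comparison is needed.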

Columns in data.csv


  • DATASET: the identifier of the data set that text belongs to; either "deliberation-sarcasm" or "deliberation-relation-person-place"
  • DATA_ID: the unique numeric ID of that text, used to match entries in labels.csv and deliberations.csv
  • TEXT: the actual text content for that text
  • GROUND_TRUTH_LABEL: the correct label for that text; only available for the first 25 cases of the "deliberation-relation-person-place" data set.
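
The columns above can be read with Python's standard csv module. The following is a sketch only: the sample rows are invented placeholders that mimic the documented columns, and it assumes that a missing ground truth is stored as an empty cell, which the description above does not state.

```python
import csv
import io
from collections import Counter

# Invented placeholder rows that mimic the documented columns of data.csv;
# they are NOT taken from the real data set.
SAMPLE_DATA_CSV = """DATASET,DATA_ID,TEXT,GROUND_TRUTH_LABEL
deliberation-sarcasm,1,"Oh great, another Monday.",
deliberation-relation-person-place,2,"Ada was born in London.",relation_expressed
deliberation-relation-person-place,3,"Bob once mentioned Paris.",
"""


def read_rows(file_obj):
    """Read a CSV file object into a list of dicts keyed by column name."""
    return list(csv.DictReader(file_obj))


def texts_per_dataset(rows):
    """Count how many texts belong to each DATASET identifier."""
    return Counter(row["DATASET"] for row in rows)


def rows_with_ground_truth(rows):
    """Keep rows whose GROUND_TRUTH_LABEL is non-empty (assumed encoding)."""
    return [row for row in rows if row["GROUND_TRUTH_LABEL"].strip()]


rows = read_rows(io.StringIO(SAMPLE_DATA_CSV))
print(texts_per_dataset(rows))
print([row["DATA_ID"] for row in rows_with_ground_truth(rows)])
```

To run this against the real file, replace the io.StringIO object with an opened file, for example open("data.csv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption.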

Columns in labels.csv


  • DATA_ID: the numeric ID of the text that label was given for
  • DATASET: the data set this text belongs to
  • ANNOTATOR_ID: the numeric ID of the crowd worker who provided that label
  • LABEL_ID: the unique numeric ID of that label, used to match entries in deliberations.csv
  • ORIGINAL_LABEL: the actual label provided; one of "sarcastic" / "not_sarcastic" for the Sarcasm data set, and "relation_expressed" / "not_expressed" for the Relation Extraction data set
  • ORIGINAL_CONFIDENCE: the confidence for the label provided; one of 0 (Not Sure) / 0.5 (Sure) / 1 (Very Sure)
  • EVIDENCE: A list of text snippets highlighted as evidence for the provided label in JSON format: [ { "start": 123, "end": 456, "quote": "The highlight text snippet.", "comment": "An optional comment from the annotator." }, { ... }, { ... } ]; annotators were required to highlight at least one text snippet, but they could highlight as many as they wanted.
  • QUESTION_DO_YOU_THINK_OTHER_PEOPLE_MIGHT_CHOOSE_A_DIFFERENT_ANSWER_THAN_YOU_DID: One of "I expect most people to agree with me." / "I expect only about half of the people to agree with me." / "I expect most people to disagree with me."
  • QUESTION_WHY_DO_YOU_THINK_OTHER_PEOPLE_MIGHT_CHOOSE_A_DIFFERENT_ANSWER: Annotators were shown a list of six possible reasons and an "Other: ______" field for a free-form text answer. They had to check at least one reason, but could check several, if they had indicated that they expected some disagreement from other people. This column contains the list of all checked answers, including any free-form answer, in JSON format: [ "The text contains relevant details other people could easily miss.", "This is a case where a person's answer would depend heavily on their personal preferences and taste.", "Other: some other free-form answer." ]
  • QUESTION_PLEASE_ELABORATE_ON_YOUR_ANSWER_TO_THE_PREVIOUS_QUESTION_EXPLAINING_WHY_YOU_THINK_OTHER_PEOPLE_MIGHT_CHOOSE_A_DIFFERENT_ANSWER: Free-form text answer
  • QUESTION_IF_THERE_WERE_OTHER_PEOPLE_WHO_CHOSE_A_DIFFERENT_ANSWER_THAN_YOU_DID_DO_YOU_THINK_A_GROUP_DISCUSSION_WOULD_HELP_TO_RESOLVE_THE_CASE: One of "Yes, a group discussion would help to resolve the case." / "No, a group discussion would not help to resolve the case."
  • DELIBERATION_ID: the numeric ID of the deliberation this label was discussed in if any. If this label was never part of any group discussion, this field is empty.
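
Several labels.csv columns hold JSON or coded values. A minimal sketch of decoding one row, assuming the row is a dict of strings as returned by csv.DictReader; the helper names and the sample row are hypothetical, and treating an empty cell as an empty list is an assumption.

```python
import json

# Mapping taken from the ORIGINAL_CONFIDENCE description above.
CONFIDENCE_NAMES = {0.0: "Not Sure", 0.5: "Sure", 1.0: "Very Sure"}

WHY_COLUMN = "QUESTION_WHY_DO_YOU_THINK_OTHER_PEOPLE_MIGHT_CHOOSE_A_DIFFERENT_ANSWER"


def parse_json_list(cell):
    """Decode a JSON list cell; an empty cell is treated as an empty list."""
    return json.loads(cell) if cell and cell.strip() else []


def decode_label_row(row):
    """Turn the coded and JSON columns of one labels.csv row into Python values."""
    evidence = parse_json_list(row["EVIDENCE"])
    reasons = parse_json_list(row[WHY_COLUMN])
    return {
        "label_id": row["LABEL_ID"],
        "label": row["ORIGINAL_LABEL"],
        "confidence": CONFIDENCE_NAMES[float(row["ORIGINAL_CONFIDENCE"])],
        "quotes": [item["quote"] for item in evidence],
        "free_form_reasons": [r for r in reasons if r.startswith("Other:")],
        # An empty DELIBERATION_ID means the label was never discussed.
        "was_deliberated": bool(row["DELIBERATION_ID"].strip()),
    }


# Invented placeholder row (NOT real data) with only the columns used here.
sample_row = {
    "LABEL_ID": "17",
    "ORIGINAL_LABEL": "sarcastic",
    "ORIGINAL_CONFIDENCE": "0.5",
    "EVIDENCE": json.dumps(
        [{"start": 3, "end": 8, "quote": "great", "comment": ""}]
    ),
    WHY_COLUMN: json.dumps(
        [
            "The text contains relevant details other people could easily miss.",
            "Other: the tone is hard to judge.",
        ]
    ),
    "DELIBERATION_ID": "",
}

decoded = decode_label_row(sample_row)
print(decoded)
```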

Columns in deliberations.csv


  • DELIBERATION_ID: The unique numeric ID of that deliberation
  • DATA_ID: The numeric ID of the text discussed in that deliberation
  • WAS_RESOLVED: Whether this case was resolved; one of "True" / "False"
  • NUM_DISSENTING_OPINIONS_DISCUSSED: The number of dissenting opinions discussed in this deliberation; one of 0, 2, 3
  • NUM_DISSENTING_OPINIONS_RECONSIDERED: The number of dissenting opinions reconsidered in this deliberation; one of 0, 2, 3
  • NUM_DISSENTING_OPINIONS_DISCUSSED_AND_RECONSIDERED: The number of dissenting opinions discussed AND reconsidered in this deliberation; one of 0, 2, 3
  • NUM_DISSENTING_OPINIONS_DISCUSSED_RECONSIDERED_AND_CONCLUDED: The number of dissenting opinions discussed, reconsidered AND concluded in this deliberation; one of 0, 2, 3
  • MESSAGES: The chat history of messages exchanged in this deliberation in JSON format: [ { "user_id": 123, "user_pseudonym": "Happy Hippo", "content": "This is my message for the group." }, { ... }, { ... } ]
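
The MESSAGES column can be decoded with the standard json module. The sketch below counts messages per pseudonym; the function name and the sample chat are hypothetical, and an empty cell is assumed to mean that no messages were exchanged.

```python
import json
from collections import Counter


def messages_per_member(messages_cell):
    """Count chat messages per pseudonym in one MESSAGES cell."""
    messages = json.loads(messages_cell) if messages_cell.strip() else []
    return Counter(message["user_pseudonym"] for message in messages)


# Invented placeholder chat (NOT real data) in the documented JSON format.
sample_cell = json.dumps(
    [
        {"user_id": 123, "user_pseudonym": "Happy Hippo", "content": "I read it as a joke."},
        {"user_id": 456, "user_pseudonym": "Calm Camel", "content": "I took it literally."},
        {"user_id": 123, "user_pseudonym": "Happy Hippo", "content": "Look at the last sentence."},
    ]
)

counts = messages_per_member(sample_cell)
print(counts)
```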

The following fields exist once for each of the 3 members of the group discussion, prefixed with MEMBER_1_, MEMBER_2_, and MEMBER_3_ respectively (written as MEMBER_X_ below):

  • MEMBER_X_USER_ID: The numeric ID of group member X
  • MEMBER_X_NAME: The human-readable pseudonym assigned to group member X
  • MEMBER_X_LABEL_ID: The numeric ID of group member X's label discussed in this deliberation
  • MEMBER_X_ORIGINAL_LABEL: Group member X's original label; one of "sarcastic" / "not_sarcastic" for the Sarcasm data set, and "relation_expressed" / "not_expressed" for the Relation Extraction data set
  • MEMBER_X_ORIGINAL_CONFIDENCE: Group member X's original confidence in their label; one of 0 (Not Sure) / 0.5 (Sure) / 1 (Very Sure)
  • MEMBER_X_DID_DISCUSS: Whether group member X came back for follow-up session 1 to discuss the case; one of "True" / "False"
  • MEMBER_X_DID_RECONSIDER: Whether group member X came back for follow-up session 2 to reconsider their position; one of "True" / "False"
  • MEMBER_X_RECONSIDERED_LABEL: Group member X's reconsidered label; one of "sarcastic" / "not_sarcastic" for the Sarcasm data set, and "relation_expressed" / "not_expressed" for the Relation Extraction data set
  • MEMBER_X_RECONSIDERED_CONFIDENCE: Group member X's reconsidered confidence in their label; one of 0 (Not Sure) / 0.5 (Sure) / 1 (Very Sure)
  • MEMBER_X_QUESTION_BASED_ON_YOUR_DELIBERATION_WHY_DO_YOU_THINK_THE_OTHER_PEOPLE_IN_THE_GROUP_CHOSE_A_DIFFERENT_ANSWER: Same format as in column QUESTION_WHY_DO_YOU_THINK_OTHER_PEOPLE_MIGHT_CHOOSE_A_DIFFERENT_ANSWER for labels.csv to make both answers easily comparable.
  • MEMBER_X_QUESTION_PLEASE_ELABORATE_ON_YOUR_ANSWER_TO_THE_PREVIOUS_QUESTION_FOR_EXAMPLE_IF_YOU_CHANGED_YOUR_MIND_ABOUT_THE_SOURCE_OF_DISAGREEMENT_PLEASE_EXPLAIN_WHY: Free-form text answer
  • MEMBER_X_DID_CONCLUDE: Whether group member X came back for follow-up session 3 to conclude the case and give their assessment on why the case could be/could not be resolved; one of "True" / "False"
  • MEMBER_X_QUESTION_WHY_DO_YOU_THINK_THIS_CASE_COULD_BE_RESOLVED: Free-form text answer
  • MEMBER_X_QUESTION_WHY_DO_YOU_THINK_THIS_CASE_COULD_NOT_BE_FULLY_RESOLVED: Free-form text answer
  • MEMBER_X_QUESTION_DID_SOMEBODY_MAKE_YOU_DOUBT_YOUR_ORIGINAL_ANSWER_WHY_OR_WHY_NOT: A text string starting with "Yes" / "No", followed by an explanation.
  • MEMBER_X_QUESTION_DID_SOMEBODY_MAKE_YOU_CHANGE_YOUR_ORIGINAL_ANSWER_WHY_OR_WHY_NOT: A text string starting with "Yes" / "No", followed by an explanation.
  • MEMBER_X_QUESTION_DID_YOU_MANAGE_TO_CONVINCE_SOMEONE_TO_CHANGE_THEIR_ANSWER_OR_CONFIDENCE_LEVEL_WHY_DO_YOU_THINK_YOU_WERE_ABLEUNABLE_TO_CONVINCE_THEM: A text string starting with "Yes" / "No", followed by an explanation.
  • MEMBER_X_QUESTION_DESCRIBE_HOW_YOU_FEEL_ABOUT_THE_DELIBERATION_PROCESS: Free-form text answer
  • MEMBER_X_QUESTION_DESCRIBE_HOW_YOU_FEEL_ABOUT_THE_DELIBERATION_OUTCOME: Free-form text answer
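
Because each deliberation row repeats the same fields for three members, it can be convenient to split a row into one record per member. This is a sketch under the assumption that rows are dicts of strings (as returned by csv.DictReader); the helper names split_members and changed_label and the sample row are hypothetical.

```python
def split_members(row, member_numbers=(1, 2, 3)):
    """Split one deliberations.csv row into one dict per group member.

    The MEMBER_<n>_ prefix is stripped from each column name so that all
    members share the same keys.
    """
    members = []
    for number in member_numbers:
        prefix = f"MEMBER_{number}_"
        fields = {
            key[len(prefix):]: value
            for key, value in row.items()
            if key.startswith(prefix)
        }
        fields["DELIBERATION_ID"] = row["DELIBERATION_ID"]
        fields["MEMBER_NUMBER"] = number
        members.append(fields)
    return members


def changed_label(member):
    """True if the member reconsidered and chose a different label."""
    return (
        member["DID_RECONSIDER"] == "True"
        and member["RECONSIDERED_LABEL"] != member["ORIGINAL_LABEL"]
    )


# Invented placeholder row (NOT real data) with only a few of the columns.
sample_row = {
    "DELIBERATION_ID": "5",
    "MEMBER_1_ORIGINAL_LABEL": "sarcastic",
    "MEMBER_1_DID_RECONSIDER": "True",
    "MEMBER_1_RECONSIDERED_LABEL": "not_sarcastic",
    "MEMBER_2_ORIGINAL_LABEL": "not_sarcastic",
    "MEMBER_2_DID_RECONSIDER": "True",
    "MEMBER_2_RECONSIDERED_LABEL": "not_sarcastic",
    "MEMBER_3_ORIGINAL_LABEL": "sarcastic",
    "MEMBER_3_DID_RECONSIDER": "False",
    "MEMBER_3_RECONSIDERED_LABEL": "",
}

members = split_members(sample_row)
switched = [m["MEMBER_NUMBER"] for m in members if changed_label(m)]
print(switched)
```

Checking DID_RECONSIDER first avoids counting an empty reconsidered label as a change for members who did not return for the second session.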