diff --git a/schema_csvs/GDELT_2.0_eventMentions_Column_Labels_Header_Row_Sep2016.csv b/schema_csvs/GDELT_2.0_eventMentions_Column_Labels_Header_Row_Sep2016.csv new file mode 100644 index 0000000..729c741 --- /dev/null +++ b/schema_csvs/GDELT_2.0_eventMentions_Column_Labels_Header_Row_Sep2016.csv @@ -0,0 +1,17 @@ +tableId,dataType,Empty,Description +GLOBALEVENTID,INTEGER,NULLABLE,This is the ID of the event that was mentioned in the article. +EventTimeDate,INTEGER,NULLABLE,This is the 15-minute timestamp (YYYYMMDDHHMMSS) when the event being mentioned was first recorded by GDELT (the DATEADDED field of the original event record). This field can be compared against the next one to identify events being mentioned for the first time (their first mentions) or to identify events of a particular vintage being mentioned now (such as filtering for mentions of events at least one week old). +MentionTimeDate,INTEGER,NULLABLE,This is the 15-minute timestamp (YYYYMMDDHHMMSS) of the current update. This is identical for all entries in the update file but is included to make it easier to load the Mentions table into a database. +MentionType,INTEGER,NULLABLE,"This is a numeric identifier that refers to the source collection the document came from and is used to interpret the MentionIdentifier in the next column. In essence, it specifies how to interpret the MentionIdentifier to locate the actual document. At present, it can hold one of the following values: 1 = WEB (The document originates from the open web and the MentionIdentifier is a fully-qualified URL that can be used to access the document on the web). 2 = CITATIONONLY (The document originates from a broadcast, print, or other offline source in which only a textual citation is available for the document. 
In this case the MentionIdentifier contains the textual citation for the document). 3 = CORE (The document originates from the CORE archive and the MentionIdentifier contains its DOI, suitable for accessing the original document through the CORE website). 4 = DTIC (The document originates from the DTIC archive and the MentionIdentifier contains its DOI, suitable for accessing the original document through the DTIC website). 5 = JSTOR (The document originates from the JSTOR archive and the MentionIdentifier contains its DOI, suitable for accessing the original document through your JSTOR subscription if your institution subscribes to it). 6 = NONTEXTUALSOURCE (The document originates from a textual proxy (such as closed captioning) of a non-textual information source (such as a video) available via a URL and the MentionIdentifier provides the URL of the non-textual original source. At present, this Collection Identifier is used for processing of the closed captioning streams of the Internet Archive Television News Archive in which each broadcast is available via a URL, but the URL offers access only to the video of the broadcast and does not provide any access to the textual closed captioning used to generate the metadata. This code is used in order to draw a distinction between URL-based textual material (Collection Identifier 1 (WEB)) and URL-based non-textual material like the Television News Archive)." +MentionSourceName,STRING,NULLABLE,"This is a human-friendly identifier of the source of the document. For material originating from the open web with a URL this field will contain the top-level domain the page was from. For BBC Monitoring material it will contain “BBC Monitoring” and for JSTOR material it will contain “JSTOR.” This field is intended for human display of major sources as well as for network analysis of information flows by source, obviating the requirement to perform domain or other parsing of the MentionIdentifier field." 
+MentionIdentifier,STRING,NULLABLE,"This is the unique external identifier for the source document. It can be used to uniquely identify the document and access it if you have the necessary subscriptions or authorizations and/or the document is public access. This field can contain a range of values, from URLs of open web resources to textual citations of print or broadcast material to DOI identifiers for various document repositories. For example, if MentionType is equal to 1, this field will contain a fully-qualified URL suitable for direct access. If MentionType is equal to 2, this field will contain a textual citation akin to what would appear in an academic journal article referencing that document (NOTE that the actual citation format will vary (usually between APA, Chicago, Harvard, or MLA) depending on a number of factors and no assumptions should be made on its precise format at this time due to the way in which this data is currently provided to GDELT – future efforts will focus on normalization of this field to a standard citation format). If MentionType is 5, the field will contain a numeric or alpha-numeric DOI that can be typed into JSTOR’s search engine to access the document if your institution has a JSTOR subscription." +SentenceID,INTEGER,NULLABLE,"The sentence within the article where the event was mentioned (starting with the first sentence as 1, the second sentence as 2, the third sentence as 3, and so on). This can be used similarly to the CharOffset fields below, but reports the event’s location in the article in terms of sentences instead of characters, which is more amenable to certain measures of the “importance” of an event’s positioning within an article." +Actor1CharOffset,INTEGER,NULLABLE,"The location within the article (in terms of English characters) where Actor1 was found. This can be used in combination with the GKG or other analysis to identify further characteristics and attributes of the actor. 
NOTE: due to processing performed on each article, this may be slightly offset from the position seen when the article is rendered in a web browser." +Actor2CharOffset,INTEGER,NULLABLE,"The location within the article (in terms of English characters) where Actor2 was found. This can be used in combination with the GKG or other analysis to identify further characteristics and attributes of the actor. NOTE: due to processing performed on each article, this may be slightly offset from the position seen when the article is rendered in a web browser." +ActionCharOffset,INTEGER,NULLABLE,"The location within the article (in terms of English characters) where the core Action description was found. This can be used in combination with the GKG or other analysis to identify further characteristics and attributes of the actor. NOTE: due to processing performed on each article, this may be slightly offset from the position seen when the article is rendered in a web browser." +InRawText,INTEGER,NULLABLE,This records whether the event was found in the original unaltered raw article text (a value of 1) or whether advanced natural language processing algorithms were required to synthesize and rewrite the article text to identify the event (a value of 0). See the discussion on the Confidence field below for more details. Mentions with a value of “1” in this field likely represent strong detail-rich references to an event. +Confidence,INTEGER,NULLABLE,Percent confidence in the extraction of this event from this article. See the discussion in the codebook at http://data.gdeltproject.org/documentation/GDELT-Event_Codebook-V2.0.pdf +MentionDocLen,INTEGER,NULLABLE,The length in English characters of the source document (making it possible to filter for short articles focusing on a particular event versus long summary articles that casually mention an event in passing). 
+MentionDocTone,FLOAT,NULLABLE,"The same contents as the AvgTone field in the Events table, but computed for this particular article. NOTE: users interested in emotional measures should use the MentionIdentifier field above to merge the Mentions table with the GKG table to access the complete set of 2,300 emotions and themes from the GCAM system." +MentionDocTranslationInfo,STRING,NULLABLE,"This field is internally delimited by semicolons and is used to record provenance information for machine translated documents indicating the original source language and the citation of the translation system used to translate the document for processing. It will be blank for documents originally in English. At this time the field will also be blank for documents translated by a human translator and provided to GDELT in English (such as BBC Monitoring materials) – in future this field may be expanded to include information on human translation pipelines, but at present it only captures information on machine translated materials. An example of the contents of this field might be “srclc:fra; eng:Moses 2.1.1 / MosesCore Europarl fr-en / GT-FRA 1.0”. NOTE: Machine translation is often not as accurate as human translation and users requiring the highest possible confidence levels may wish to exclude events whose only mentions are in translated reports, while those needing the highest-possible coverage of the non-Western world will find that these events often offer the earliest glimmers of breaking events or smaller-bore events of less interest to Western media. SRCLC: This is the Source Language Code, representing the three-letter ISO 639-2 code of the language of the original source material. ENG: This is a textual citation string that indicates the engine(s) and model(s) used to translate the text. The format of this field will vary across engines and over time and no expectations should be made on the ordering or formatting of this field. 
In the example above, the string “Moses 2.1.1 / MosesCore Europarl fr-en / GT-FRA 1.0” indicates that the document was translated using version 2.1.1 of the Moses SMT platform, using the “MosesCore Europarl fr-en” translation and language models, with the final translation enhanced via GDELT Translingual’s own version 1.0 French translation and language models. A value of “GT-ARA 1.0” indicates that GDELT Translingual’s version 1.0 Arabic translation and language models were the sole resources used for translation. Additional language systems used in the translation pipeline such as word segmentation systems are also captured in this field such that a value of “GT-ZHO 1.0 / Stanford PKU” indicates that the Stanford Chinese Word Segmenter was used to segment the text into individual words and sentences, which were then translated by GDELT Translingual’s own version 1.0 Chinese (Traditional or Simplified) translation and language models." +Extras,STRING,NULLABLE,"This field is currently blank, but is reserved for future use to encode special additional measurements for selected material." diff --git a/schema_csvs/GDELT_2.0_eventMentions_Column_Labels_Header_Row_Sep2016.tsv b/schema_csvs/GDELT_2.0_eventMentions_Column_Labels_Header_Row_Sep2016.tsv deleted file mode 100644 index 4740ea8..0000000 --- a/schema_csvs/GDELT_2.0_eventMentions_Column_Labels_Header_Row_Sep2016.tsv +++ /dev/null @@ -1,18 +0,0 @@ - 0 1 2 3 -0 GLOBALEVENTID INTEGER NULLABLE This is the ID of the event that was mentioned in the article. -1 EventTimeDate INTEGER NULLABLE This is the 15-minute timestamp (YYYYMMDDHHMMSS) when the event being mentioned was first recorded by GDELT (the DATEADDED field of the original event record). This field can be compared against the next one to identify events being mentioned for the first time (their first mentions) or to identify events of a particular vintage being mentioned now (such as filtering for mentions of events at least one week old). 
-2 MentionTimeDate INTEGER NULLABLE This is the 15-minute timestamp (YYYYMMDDHHMMSS) of the current update. This is identical for all entries in the update file but is included to make it easier to load the Mentions table into a database. -3 MentionType INTEGER NULLABLE This is a numeric identifier that refers to the source collection the document came from and is used to interpret the MentionIdentifier in the next column. In essence, it specifies how to interpret the MentionIdentifier to locate the actual document. At present, it can hold one of the following values:o 1 = WEB (The document originates from the open web and the MentionIdentifier is a fully-qualified URL that can be used to access the document on the web).o 2 = CITATIONONLY (The document originates from a broadcast, print, or other offline source in which only a textual citation is available for the document. In this case the MentionIdentifier contains the textual citation for the document).o 3 = CORE (The document originates from the CORE archive and the MentionIdentifier contains its DOI, suitable for accessing the original document through the CORE website).o 4 = DTIC (The document originates from the DTIC archive and the MentionIdentifier contains its DOI, suitable for accessing the original document through the DTIC website).o 5 = JSTOR (The document originates from the JSTOR archive and the MentionIdentifier contains its DOI, suitable for accessing the original document through your JSTOR subscription if your institution subscribes to it).o 6 = NONTEXTUALSOURCE (The document originates from a textual proxy (such as closed captioning) of a non-textual information source (such as a video) available via a URL and the MentionIdentifier provides the URL of the non-textual original source. 
At present, this Collection Identifier is used for processing of the closed captioning streams of the Internet Archive Television News Archive in which each broadcast is available via a URL, but the URL offers access only to the video of the broadcast and does not provide any access to the textual closed captioning used to generate the metadata. This code is used in order to draw a distinction between URL-based textual material (Collection Identifier 1 (WEB) and URL-based non-textual material like the Television News Archive). -4 MentionSourceName STRING NULLABLE This is a human-friendly identifier of the source of the document. For material originating from the open web with a URL this field will contain the top-level domain the page was from. For BBC Monitoring material it will contain “BBC Monitoring” and for JSTOR material it will contain “JSTOR.” This field is intended for human display of major sources as well as for network analysis of information flows by source, obviating the requirement to perform domain or other parsing of the MentionIdentifier field. -5 MentionIdentifier STRING NULLABLE This is the unique external identifier for the source document. It can be used to uniquely identify the document and access it if you have the necessary subscriptions or authorizations and/or the document is public access. This field can contain a range of values, from URLs of open web resources to textual citations of print or broadcast material to DOI identifiers for various document repositories. For example, if MentionType is equal to 1, this field will contain a fully-qualified URL suitable for direct access. 
If MentionType is equal to 2, this field will contain a textual citation akin to what would appear in an academic journal article referencing that document (NOTE that the actual citation format will vary (usually between APA, Chicago, Harvard, or MLA) depending on a number of factors and no assumptions should be made on its precise format at this time due to the way in which this data is currently provided to GDELT – future efforts will focus on normalization of this field to a standard citation format). If MentionType is 3, the field will contain a numeric or alpha-numeric DOI that can be typed into JSTOR’s search engine to access the document if your institution has a JSTOR subscription. -6 SentenceID INTEGER NULLABLE The sentence within the article where the event was mentioned (starting with the first sentence as 1, the second sentence as 2, the third sentence as 3, and so on). This can be used similarly to the CharOffset fields below, but reports the event’s location in the article in terms of sentences instead of characters, which is more amenable to certain measures of the “importance” of an event’s positioning within an article. -7 Actor1CharOffset INTEGER NULLABLE The location within the article (in terms of English characters) where Actor1 was found. This can be used in combination with the GKG or other analysis to identify further characteristics and attributes of the actor. NOTE: due to processing performed on each article, this may be slightly offset from the position seen when the article is rendered in a web browser. -8 Actor2CharOffset INTEGER NULLABLE The location within the article (in terms of English characters) where Actor2 was found. This can be used in combination with the GKG or other analysis to identify further characteristics and attributes of the actor. NOTE: due to processing performed on each article, this may be slightly offset from the position seen when the article is rendered in a web browser. 
-9 ActionCharOffset INTEGER NULLABLE The location within the article (in terms of English characters) where the core Action description was found. This can be used in combination with the GKG or other analysis to identify further characteristics and attributes of the actor. NOTE: due to processing performed on each article, this may be slightly offset from the position seen when the article is rendered in a web browser. -10 InRawText INTEGER NULLABLE This records whether the event was found in the original unaltered raw article text (a value of 1) or whether advanced natural language processing algorithms were required to synthesize and rewrite the article text to identify the event (a value of 0). See the discussion on the Confidence field below for more details. Mentions with a value of “1” in this field likely represent strong detail-rich references to an event. -11 Confidence INTEGER NULLABLE Percent confidence in the extraction of this event from this article. See the discussion in the codebook at http://data.gdeltproject.org/documentation/GDELT-Event_Codebook-V2.0.pdf -12 MentionDocLen INTEGER NULLABLE The length in English characters of the source document (making it possible to filter for short articles focusing on a particular event versus long summary articles that casually mention an event in passing). -13 MentionDocTone FLOAT NULLABLE The same contents as the AvgTone field in the Events table, but computed for this particular article. NOTE: users interested in emotional measures should use the MentionIdentifier field above to merge the Mentions table with the GKG table to access the complete set of 2,300 emotions and themes from the GCAM system. -14 MentionDocTranslationInfo STRING NULLABLE This field is internally delimited by semicolons and is used to record provenance information for machine translated documents indicating the original source language and the citation of the translation system used to translate the document for processing. 
It will be blank for documents originally in English. At this time the field will also be blank for documents translated by a human translator and provided to GDELT in English (such as BBC Monitoring materials) – in future this field may be expanded to include information on human translation pipelines, but at present it only captures information on machine translated materials. An example of the contents of this field might be “srclc:fra; eng:Moses 2.1.1 / MosesCore Europarl fr-en / GT-FRA 1.0”. NOTE: Machine translation is often not as accurate as human translation and users requiring the highest possible confidence levels may wish to exclude events whose only mentions are in translated reports, while those needing the highest-possible coverage of the non-Western world will find that these events often offer the earliest glimmers of breaking events or smaller-bore events of less interest to Western media.o SRCLC. This is the Source Language Code, representing the three-letter ISO639-2 code of the language of the original source material. o ENG. This is a textual citation string that indicates the engine(s) and model(s) used to translate the text. The format of this field will vary across engines and over time and no expectations should be made on the ordering or formatting of this field. In the example above, the string “Moses 2.1.1 / MosesCore Europarl fr-en / GT-FRA 1.0” indicates that the document was translated using version 2.1.1 of the Moses SMT platform, using the “MosesCore Europarl fr-en” translation and language models, with the final translation enhanced via GDELT Translingual’s own version 1.0 French translation and language models. A value of “GT-ARA 1.0” indicates that GDELT Translingual’s version 1.0 Arabic translation and language models were the sole resources used for translation. 
Additional language systems used in the translation pipeline such as word segmentation systems are also captured in this field such that a value of “GT-ZHO 1.0 / Stanford PKU” indicates that the Stanford Chinese Word Segmenter was used to segment the text into individual words and sentences, which were then translated by GDELT Translingual’s own version 1.0 Chinese (Traditional or Simplified) translation and language models. -15 Extras STRING NULLABLE This field is currently blank, but is reserved for future use to encode special additional measurements for selected material. -16 Add New Fields diff --git a/schema_csvs/GDELT_2.0_gdeltKnowledgeGraph_Column_Labels_Header_Row_Sep2016.csv b/schema_csvs/GDELT_2.0_gdeltKnowledgeGraph_Column_Labels_Header_Row_Sep2016.csv new file mode 100644 index 0000000..78633cd --- /dev/null +++ b/schema_csvs/GDELT_2.0_gdeltKnowledgeGraph_Column_Labels_Header_Row_Sep2016.csv @@ -0,0 +1,28 @@ +tableId,dataType,Empty,Description +GKGRECORDID,STRING,NULLABLE,GKG record ID takes the form “YYYYMMDDHHMMSS-X” or “YYYYMMDDHHMMSS-TX” in which the first portion of the ID is the full date+time of the 15 minute update batch that this record was created in +DATE,INTEGER,NULLABLE,Date in YYYYMMDDHHMMSS format on which the news media used to construct this GKG file was published. +SourceCollectionIdentifier,INTEGER,NULLABLE,Numeric identifier that refers to the source collection the document came from and is used to interpret the DocumentIdentifier in the next column. In essence +SourceCommonName,STRING,NULLABLE,A human-friendly identifier of the source of the document. For material originating from the open web with a URL this field will contain the toplevel domain the page was from +DocumentIdentifier,STRING,NULLABLE,This is the unique external identifier for the source document. It can be used to uniquely identify the document and access it if you have the necessary subscriptions or authorizations and/or the document is public access. 
This field can contain a range of values +Counts,STRING,NULLABLE,This is the list of Counts found in this document. Each Count found is separated with a semicolon +V2Counts,STRING,NULLABLE,This field is identical to the V1COUNTS field except that it adds a final additional field to the end of each entry that records its approximate character offset in the document +Themes,STRING,NULLABLE,This is the list of all Themes found in the document. For the complete list of possible themes +V2Themes,STRING,NULLABLE,This contains a list of all GKG themes referenced in the document +Locations,STRING,NULLABLE,This is a list of all locations found in the text +V2Locations,STRING,NULLABLE,This field is identical to the V1LOCATIONS field with the primary exception of an extra field appended to the end of each location block after its FeatureID that lists the approximate character offset of the reference to that location in the text. In addition +Persons,STRING,NULLABLE,This is the list of all person names found in the text +V2Persons,STRING,NULLABLE,This contains a list of all person names referenced in the document +Organizations,STRING,NULLABLE,This is the list of all company and organization names found in the text +V2Organizations,STRING,NULLABLE,This contains a list of all organizations/companies referenced in the document +V2Tone,STRING,NULLABLE,This field contains a comma-delimited list of six core emotional dimensions +Dates,STRING,NULLABLE,This contains a list of all date references in the document +GCAM,STRING,NULLABLE,The Global Content Analysis Measures (GCAM) system runs an array of content analysis systems over each document and compiles their results into this field. 
New content analysis systems will be constantly added to the GCAM pipeline over time +SharingImage,STRING,NULLABLE,Many news websites specify a so-called “sharing image” for each article in which the news outlet manually specifies a particular image to be displayed when that article is shared via social media or other formats. +RelatedImages,STRING,NULLABLE,News articles frequently include photographs +SocialImageEmbeds,STRING,NULLABLE,News websites are increasingly embedding image-based social media posts inline in their articles to illustrate them with realtime reaction or citizen reporting from the ground. GDELT currently recognizes embedded image-based Twitter and Instagram posts and records their URLs in this field. Only those posts containing imagery are included in this field. +SocialVideoEmbeds,STRING,NULLABLE,News websites are increasingly embedding videos inline in their articles to illustrate them with realtime reaction or citizen reporting from the ground. Some news outlets that also have television properties may crosslink their television reporting into their web-based presentation. +Quotations,STRING,NULLABLE,News coverage frequently features excerpted statements from participants in an event and/or those affected by it and these quotations can offer critical insights into differing perspectives and emotions surrounding that event. GDELT identifies and extracts all quoted statements from each article and additionally attempts to identify the verb introducing the quote to help lend additional context +AllNames,STRING,NULLABLE,This field contains a list of all proper names referenced in the document +Amounts,STRING,NULLABLE,This field contains a list of all precise numeric amounts referenced in the document +TranslationInfo,STRING,NULLABLE,This field is used to record provenance information for machine translated documents indicating the original source language and the citation of the translation system used to translate the document for processing. 
It will be blank for documents originally in English. At this time the field will also be blank for documents translated by a human translator and provided to GDELT in English (such as BBC Monitoring materials) – in future this field may be expanded to include information on human translation pipelines +Extras,STRING,NULLABLE,This field is reserved to hold special non-standard data applicable to special subsets of the GDELT collection. diff --git a/schema_csvs/GDELT_2.0_gdeltKnowledgeGraph_Column_Labels_Header_Row_Sep2016.tsv b/schema_csvs/GDELT_2.0_gdeltKnowledgeGraph_Column_Labels_Header_Row_Sep2016.tsv deleted file mode 100644 index fb71763..0000000 --- a/schema_csvs/GDELT_2.0_gdeltKnowledgeGraph_Column_Labels_Header_Row_Sep2016.tsv +++ /dev/null @@ -1,28 +0,0 @@ - tableId dataType Empty Description -0 GKGRECORDID STRING NULLABLE GKG record ID takes the form “YYYYMMDDHHMMSS-X” or “YYYYMMDDHHMMSS-TX” in which the first portion of the ID is the full date+time of the 15 minute update batch that this record was created in -1 DATE INTEGER NULLABLE Date in YYYYMMDDHHMMSS format on which the news media used to construct this GKG file was published. -2 SourceCollectionIdentifier INTEGER NULLABLE Numeric identifier that refers to the source collection the document came from and is used to interpret the DocumentIdentifier in the next column. In essence -3 SourceCommonName STRING NULLABLE A human-friendly identifier of the source of the document. For material originating from the open web with a URL this field will contain the toplevel domain the page was from -4 DocumentIdentifier STRING NULLABLE This is the unique external identifier for the source document. It can be used to uniquely identify the document and access it if you have the necessary subscriptions or authorizations and/or the document is public access. This field can contain a range of values -5 Counts STRING NULLABLE This is the list of Counts found in this document. 
Each Count found is separated with a semicolon -6 V2Counts STRING NULLABLE This field is identical to the V1COUNTS field except that it adds a final additional field to the end of each entry that records its approximate character offset in the document -7 Themes STRING NULLABLE This is the list of all Themes found in the document. For the complete list of possible themes -8 V2Themes STRING NULLABLE This contains a list of all GKG themes referenced in the document -9 Locations STRING NULLABLE This is a list of all locations found in the text -10 V2Locations STRING NULLABLE This field is identical to the V1LOCATIONS field with the primary exception of an extra field appended to the end of each location block after its FeatureID that lists the approximate character offset of the reference to that location in the text. In addition -11 Persons STRING NULLABLE This is the list of all person names found in the text -12 V2Persons STRING NULLABLE This contains a list of all person names referenced in the document -13 Organizations STRING NULLABLE This is the list of all company and organization names found in the text -14 V2Organizations STRING NULLABLE This contains a list of all organizations/companies referenced in the document -15 V2Tone STRING NULLABLE This field contains a comma-delimited list of six core emotional dimensions -16 Dates STRING NULLABLE This contains a list of all date references in the document -17 GCAM STRING NULLABLE The Global Content Analysis Measures (GCAM) system runs an array of content analysis systems over each document and compiles their results into this field. New content analysis systems will be constantly added to the GCAM pipeline over time -18 SharingImage STRING NULLABLE Many news websites specify a so-called “sharing image” for each article in which the news outlet manually specifies a particular image to be displayed when that article is shared via social media or other formats. 
-19 RelatedImages STRING NULLABLE News articles frequently include photographs -20 SocialImageEmbeds STRING NULLABLE News websites are increasingly embedding image-based social media posts inline in their articles to illustrate them with realtime reaction or citizen reporting from the ground. GDELT currently recognizes embedded image-based Twitter and Instagram posts and records their URLs in this field. Only those posts containing imagery are included in this field. -21 SocialVideoEmbeds STRING NULLABLE News websites are increasingly embedding videos inline in their articles to illustrate them with realtime reaction or citizen reporting from the ground. Some news outlets that also have television properties may crosslink their television reporting into their web-based presentation. -22 Quotations STRING NULLABLE News coverage frequently features excerpted statements from participants in an event and/or those affected by it and these quotations can offer critical insights into differing perspectives and emotions surrounding that event. GDELT identifies and extracts all quoted statements from each article and additionally attempts to identify the verb introducing the quote to help lend additional context -23 AllNames STRING NULLABLE This field contains a list of all proper names referenced in the document -24 Amounts STRING NULLABLE This field contains a list of all precise numeric amounts referenced in the document -25 TranslationInfo STRING NULLABLE This field is used to record provenance information for machine translated documents indicating the original source language and the citation of the translation system used to translate the document for processing. It will be blank for documents originally in English. 
At this time the field will also be blank for documents translated by a human translator and provided to GDELT in English (such as BBC Monitoring materials) – in future this field may be expanded to include information on human translation pipelines -26 Extras STRING NULLABLE This field is reserved to hold special non-standard data applicable to special subsets of the GDELT collection.
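
Reviewer note: the Mentions schema in this diff describes two things that are easy to get wrong when loading the table, so here is a minimal Python sketch of both. It assumes MentionDocTranslationInfo follows the semicolon-delimited `key:value` shape of the example in its description (`srclc:…; eng:…`), and that EventTimeDate and MentionTimeDate are 14-digit YYYYMMDDHHMMSS integers. The function names `parse_translation_info` and `mention_age` are hypothetical helpers for illustration, not part of GDELT or of this repository.

```python
from datetime import datetime, timedelta
from typing import Dict

# YYYYMMDDHHMMSS, as stated in the EventTimeDate / MentionTimeDate descriptions.
GDELT_TS_FORMAT = "%Y%m%d%H%M%S"


def parse_translation_info(raw: str) -> Dict[str, str]:
    """Split a semicolon-delimited 'key:value' string into a dict.

    Assumption: each chunk looks like 'srclc:fra' or 'eng:<citation>'.
    The citation text itself is kept verbatim, since the schema warns
    that its format varies across engines and over time.
    """
    info: Dict[str, str] = {}
    for part in raw.split(";"):
        key, sep, value = part.strip().partition(":")
        if sep:  # skip empty or malformed chunks (e.g. a blank field)
            info[key.strip().lower()] = value.strip()
    return info


def mention_age(event_time_date: int, mention_time_date: int) -> timedelta:
    """Return how long after the event was first recorded it was mentioned."""
    event_dt = datetime.strptime(str(event_time_date), GDELT_TS_FORMAT)
    mention_dt = datetime.strptime(str(mention_time_date), GDELT_TS_FORMAT)
    return mention_dt - event_dt


if __name__ == "__main__":
    example = "srclc:fra; eng:Moses 2.1.1 / MosesCore Europarl fr-en / GT-FRA 1.0"
    print(parse_translation_info(example))
    # Filter for mentions of events at least one week old.
    print(mention_age(20160901000000, 20160908001500) >= timedelta(weeks=1))
```

A zero `mention_age` is one plausible way to flag first mentions, and an empty dict from `parse_translation_info` corresponds to a blank field (English-language or human-translated documents, per the description).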