Various datasets for the workshop.
We provide a script to parse these data sources into a suitable format. Follow the instructions here.
Interviews of world leaders from various journalistic sources.
Vladimir Putin
File: vladimir_putin_interviews.json
Sources:
- https://www.ft.com/content/878d2344-98f0-11e9-9573-ee5cbb98ed36
- https://www.npr.org/news/specials/putin/nprinterview.html
- https://english.alarabiya.net/en/News/world/2019/10/13/Full-transcript-of-Russian-president-Vladimir-Putin-interview-with-Al-Arabiya.html
Barack Obama
File: barack_obama_interviews.json
Sources:
- https://www.bbc.com/news/world-us-canada-33646542
- https://edition.cnn.com/2016/12/26/politics/axe-files-obama-transcript/index.html
- https://abcnews.go.com/Politics/week-transcript-president-barack-obama/story?id=44630949
File: cornell_movie_dialogs_corpus.json.zip
220,579 conversational exchanges between 10,292 pairs of movie characters
Source: http://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html
File: html-dataset.txt
HTML files from various GitHub projects.
Scraped from these repositories: https://gist.github.com/VladislavZavadskyy/e31ab07b03a5c22b11982c49669a400b
Source: https://www.kaggle.com/zavadskyy/lots-of-code
Files:
typescript.zip
json.zip
TypeScript (.ts) and JSON (.json) files collected from a fresh Angular app with routing (ng new <app-name>).
To install Angular, see https://angular.io/guide/setup-local
File: javascript.zip
A sample of JavaScript files (.js) drawn from a larger data set of JavaScript files.
Source: https://www.sri.inf.ethz.ch/js150
Chess games from 2019 in PGN format.
File: ficsgamesdb_2019_standard2000_nomovetimes_110541.pgn
Source: https://www.ficsgames.org/download.html
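PGN stores each game as a block of tag pairs such as [Event "..."] followed by the move text. As a rough illustration of that layout, the sketch below collects the tag pairs of each game using only the standard library. The helper name iter_game_tags is hypothetical and not part of the workshop script, and the sketch assumes every tag pair sits on its own line and every game has some move text after its tags.

```python
import re
from typing import Dict, Iterable, Iterator

# A PGN tag pair looks like: [White "somePlayer"]
TAG_PAIR = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def iter_game_tags(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict of tag pairs per game (hypothetical helper).

    Assumes each tag pair is on its own line and that move text
    separates the tag sections of consecutive games.
    """
    tags: Dict[str, str] = {}
    in_tag_section = False
    for line in lines:
        match = TAG_PAIR.match(line)
        if match:
            # A tag line after move text marks the start of the next game.
            if not in_tag_section and tags:
                yield tags
                tags = {}
            in_tag_section = True
            tags[match.group(1)] = match.group(2)
        elif line.strip():
            # Any other non-blank line is treated as move text.
            in_tag_section = False
    if tags:
        yield tags
```

To try it on the file above, open ficsgamesdb_2019_standard2000_nomovetimes_110541.pgn in text mode and pass the file object to iter_game_tags. The file's encoding is not stated here, so opening it with errors="replace" is a cautious choice.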
Music in ABC-Notation.
File: abc_notation_songs.txt
Source: https://www.kaggle.com/raj5287/abc-notation-of-tunes/version/3
The data set is split into two files.
Files:
realdonaldtrump-1.ndjson
realdonaldtrump-2.ndjson
Source: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi%3A10.7910%2FDVN%2FKJEBIL
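Because the data set is split across two NDJSON files (one JSON value per line), the two parts can be read as a single stream. The following is a minimal sketch using only the standard library. The helper name iter_ndjson is hypothetical, and nothing is assumed about the fields inside each record.

```python
import json
from pathlib import Path
from typing import Any, Iterable, Iterator


def iter_ndjson(paths: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line, file by file (hypothetical helper)."""
    for path in paths:
        with Path(path).open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines, e.g. a trailing empty line
                    yield json.loads(line)
```

Calling iter_ndjson(["realdonaldtrump-1.ndjson", "realdonaldtrump-2.ndjson"]) walks both files in order without loading either one fully into memory.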
File: shakespeare_data.csv
Source: https://www.kaggle.com/kingburrito666/shakespeare-plays