diff --git a/archives/nines.md b/archives/nines.md index 05f7d62..2f32531 100644 --- a/archives/nines.md +++ b/archives/nines.md @@ -4,13 +4,13 @@ Once a text is encoded in TEI, the work of a textual scholar has only begun. Enc When you find a work encoded in TEI online, the encoding might not be apparent at first glance. You can find the TEI materials of _The Old Bailey Online_ by clicking "View as XML" at the bottom of a particular entry \(TEI is a version of XML, which is a more general [markup language](https://en.wikipedia.org/wiki/Markup_language)\). If you do so for [this entry](http://www.oldbaileyonline.org/browse.jsp?id=t18881119-50&div=t18881119-50&terms=ripper#highlight) on a robbery case that mentions Jack the Ripper, you can take a look at their TEI. You might not recognize a lot of the tags \(there are _loads_ of TEI tags\), but their general arrangement should look familiar: -![Jack the Ripper TEI](/assets/archives/old-bailey-tei.png) +![Jack the Ripper TEI](/assets/archives/old-bailey-tei.jpg) Focus on what you do know: the tagging syntax, and where to go to look for help. A lot of working with technology consists of not panicking and then looking up what you don't know. But I digress. When you look at the entry on the _Old Bailey Online_, almost all the tags disappear: -![Same entry without TEI](/assets/archives/old-bailey-sans-tei.png) +![Same entry without TEI](/assets/archives/old-bailey-sans-tei.jpg) We have already talked a bit about the functions you can get from TEI, but those alone might not seem enough to warrant the amount of work that goes into putting together a TEI-encoded text. @@ -26,13 +26,13 @@ If you have ever tried to access a resource from an online newspaper only to be Once materials are put online, it is possible to connect them to a wider, global network of similar digital materials.
In the same way that a library gathers information about its materials so that they can be organized systematically \(more on **metadata** in our chapter on "Problems with Data"\), this networking has to be overseen carefully. Technical standards also shift \(TEI tags can change over time\), so archival materials require constant maintenance. If you have ever used a digital archive, you have taken advantage of a vast and often invisible amount of labor happening behind the scenes. The hidden work of gallery, library, archive, and museum \(or **GLAM**\) professionals ensures that our cultural heritage will remain accessible and sustainable for centuries to come. -![nines splash page](/assets/archives/nines-splash.png) +![nines splash page](/assets/archives/nines-splash.jpg) The **Networked Infrastructure for Nineteenth-Century Electronic Scholarship ([NINES](https://www.nines.org))** is one such digital humanities organization that attempts to facilitate this process and gather archived materials pertaining to the nineteenth century. You might think of NINES as something like a one-stop shop for all your nineteenth-century digital humanities and text analysis needs. It brings together peer-reviewed projects on nineteenth-century literature and culture created by different research teams around the globe; some focus on an individual author, others on a genre \(periodicals or "penny dreadfuls"\) or a particular issue \(disability or court cases\). If you go to the site and scroll through "Federated Websites," you'll see the range of projects you can get to from NINES, from one on the eighteenth-century book trade in France to another featuring the letters of Emily Dickinson. For the purposes of this class, you'll notice that some of the projects will be extremely useful to you, such as the Old Bailey Online, which contains trial records from London's central criminal court.
Others, such as a project on the journals of Lewis and Clark, won't be relevant for this class -- but might be for others you are taking. You might also notice that NINES has a relatively expansive view of what the nineteenth century is, since this site includes projects that deal with the late eighteenth and early nineteenth century. Historians often talk of the "long nineteenth century" as the period from the beginning of the French Revolution in 1789 to the outbreak of World War I in 1914. In other words, historians of the nineteenth century like to claim big chunks of other people's time periods. -![nines federation](/assets/archives/nines-federated.png) +![nines federation](/assets/archives/nines-federated.jpg) Archives submit themselves for affiliation with NINES so that their materials can be searched alongside other NINES sites, but they must pass a rigorous process of **peer review** first. Academic journals rely on peer review to ensure that scholarship meets particular standards of rigor and relevance; you can think of it as a bit like quality control for scholarly writing. The peer review process typically involves submitting an article or book to a series of blind reviewers who, anonymous themselves, write letters supporting or rejecting the project and offer suggestions for improvement. Should the piece pass, it moves on to publication and receives the publication's explicit seal of approval. @@ -44,7 +44,7 @@ One of the early missions of NINES was to provide scholars who could perform thi The peer reviewers at NINES had several comments in their feedback for you. They wanted you to make changes before they would accept your project into NINES, but, good news: these comments make your work much stronger anyway, and you are happy to make them \(peer review doesn't always go so smoothly\). Once reviewed, NINES makes the archival materials available for searching alongside other peer-reviewed projects.
You can see an example search of _The Old Bailey Online_ [here](http://www.nines.org/search?q=old%20bailey). Now that your archival materials are part of the federation, a search for 'old bailey' in NINES reveals objects not only in NINES, but also in a wide range of other archives. -![old bailey federated search results](/assets/archives/nines-old-bailey-search.png) +![old bailey federated search results](/assets/archives/nines-old-bailey-search.jpg) **What does peer review mean for you as a user of an archive?** @@ -54,7 +54,7 @@ If you've made it this far in life, you've probably realized that you can't trus Beyond the fact that you can have a lot of confidence in the projects you find here, NINES is just going to make it easier for you to find things. For one, you might not have known about all these different projects. NINES has also made sure that these projects "play nice" with each other (a.k.a. interoperability), which means you can find references to a particular topic or word across these projects with a simple search. -![nines crime search](/assets/archives/nines-crime-search.png) +![nines crime search](/assets/archives/nines-crime-search.jpg) Doing a search for "crime" gets you all the references to this term in all of the different projects linked to NINES, saving you from having to search each one individually. \(One warning: only some of the results in the left pane will get you to material from the online projects affiliated with NINES. In other cases, NINES is searching library catalogs where the material is not available digitally. In this instance, if you wanted to read the first work, Alexandre Dumas's _Celebrated Crimes_, you would have to drive to Charlottesville and go to UVA's Special Collections Library.\) diff --git a/archives/tei.md b/archives/tei.md index 4c912fd..19f0661 100644 --- a/archives/tei.md +++ b/archives/tei.md @@ -33,11 +33,11 @@ Think about all the annotations that you put on your own pages as you read them.
And we can represent it graphically, like so, where black denotes the stanza boundaries, horizontal blue the lines, and the rotating colors under the final words indicate the rhyme scheme: -![marking up poem by hand graphically](/assets/archives/tei-graphic.png) +![marking up poem by hand graphically](/assets/archives/tei-graphic.jpg) But you would probably need a moment to realize what was going on if you came to this without having highlighted things yourself. We can perform a similar function with text annotations, which come closer to a meaning we could understand without any prompting: -![tei with text annotations](/assets/archives/tei.png) +![tei with text annotations](/assets/archives/tei.jpg) But for a computer to understand this, we need an even more delineated way of describing the passage. We have to pay careful attention to **syntax**, the ways in which we mark particular things to provide information to the computer. Computers require very specific, systematic guidelines to be able to process information, as you will learn in our chapter on "Cyborg Readers." Scholars have been working for years to develop such a system for describing texts in a way that can be processed by software. The *Text Encoding Initiative* (TEI), the result of this work, is an attempt to make abstract humanities concepts legible to machines.
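The leap from hand annotation to machine-readable syntax can be sketched in a few lines of Python (the same language as the repository's image-convert.py script). The element names below follow TEI conventions — `lg` for a line group and `l` for a verse line — but the sample couplet and the particular attributes are invented for illustration, not the chapter's actual example:

```python
import xml.etree.ElementTree as ET

# Sketch only: the hand annotations (stanza, line number, rhyme) re-expressed
# as explicit markup. <lg> ("line group") and <l> ("line") are real TEI element
# names, but this couplet and its attribute values are invented.
annotations = [
    ("The cat sat on the mat,", "a"),
    ("and dreamed of this and that.", "a"),
]

stanza = ET.Element("lg", type="stanza")
for number, (text, rhyme) in enumerate(annotations, start=1):
    verse = ET.SubElement(stanza, "l", n=str(number), rhyme=rhyme)
    verse.text = text

# The computer can now recover the rhyme scheme without "seeing" any colors:
print("".join(l.get("rhyme") for l in stanza.iter("l")))  # prints "aa"
print(ET.tostring(stanza, encoding="unicode"))
```

Once the structure is made explicit like this, a program can count lines, compare rhyme schemes across stanzas, or strip the markup away entirely for display.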
TEI, applied to the passage, might begin to look something like this: diff --git a/assets/archives/nines-crime-search.jpg b/assets/archives/nines-crime-search.jpg new file mode 100644 index 0000000..ae61afd Binary files /dev/null and b/assets/archives/nines-crime-search.jpg differ diff --git a/assets/archives/nines-crime-search.png b/assets/archives/nines-crime-search.png deleted file mode 100644 index a310345..0000000 Binary files a/assets/archives/nines-crime-search.png and /dev/null differ diff --git a/assets/archives/nines-federated.jpg b/assets/archives/nines-federated.jpg new file mode 100644 index 0000000..72f3c77 Binary files /dev/null and b/assets/archives/nines-federated.jpg differ diff --git a/assets/archives/nines-federated.png b/assets/archives/nines-federated.png deleted file mode 100644 index 1b9d22d..0000000 Binary files a/assets/archives/nines-federated.png and /dev/null differ diff --git a/assets/archives/nines-old-bailey-search.jpg b/assets/archives/nines-old-bailey-search.jpg new file mode 100644 index 0000000..77fa5a3 Binary files /dev/null and b/assets/archives/nines-old-bailey-search.jpg differ diff --git a/assets/archives/nines-old-bailey-search.png b/assets/archives/nines-old-bailey-search.png deleted file mode 100644 index b832446..0000000 Binary files a/assets/archives/nines-old-bailey-search.png and /dev/null differ diff --git a/assets/archives/nines-splash.jpg b/assets/archives/nines-splash.jpg new file mode 100644 index 0000000..8af42a9 Binary files /dev/null and b/assets/archives/nines-splash.jpg differ diff --git a/assets/archives/nines-splash.png b/assets/archives/nines-splash.png deleted file mode 100644 index 01bddb1..0000000 Binary files a/assets/archives/nines-splash.png and /dev/null differ diff --git a/assets/archives/old-bailey-sans-tei.jpg b/assets/archives/old-bailey-sans-tei.jpg new file mode 100644 index 0000000..f536923 Binary files /dev/null and b/assets/archives/old-bailey-sans-tei.jpg differ diff --git 
a/assets/archives/old-bailey-sans-tei.png b/assets/archives/old-bailey-sans-tei.png deleted file mode 100644 index f7082c8..0000000 Binary files a/assets/archives/old-bailey-sans-tei.png and /dev/null differ diff --git a/assets/archives/old-bailey-tei.jpg b/assets/archives/old-bailey-tei.jpg new file mode 100644 index 0000000..9679ce6 Binary files /dev/null and b/assets/archives/old-bailey-tei.jpg differ diff --git a/assets/archives/old-bailey-tei.png b/assets/archives/old-bailey-tei.png deleted file mode 100644 index 2e60ed4..0000000 Binary files a/assets/archives/old-bailey-tei.png and /dev/null differ diff --git a/assets/archives/tei-graphic.jpg b/assets/archives/tei-graphic.jpg new file mode 100644 index 0000000..7b825c8 Binary files /dev/null and b/assets/archives/tei-graphic.jpg differ diff --git a/assets/archives/tei-graphic.png b/assets/archives/tei-graphic.png deleted file mode 100644 index c899ed6..0000000 Binary files a/assets/archives/tei-graphic.png and /dev/null differ diff --git a/assets/archives/tei.jpg b/assets/archives/tei.jpg new file mode 100644 index 0000000..ad9340f Binary files /dev/null and b/assets/archives/tei.jpg differ diff --git a/assets/archives/tei.png b/assets/archives/tei.png deleted file mode 100644 index baea844..0000000 Binary files a/assets/archives/tei.png and /dev/null differ diff --git a/assets/close-reading/prism-raven-font-size.jpg b/assets/close-reading/prism-raven-font-size.jpg new file mode 100644 index 0000000..f0087c9 Binary files /dev/null and b/assets/close-reading/prism-raven-font-size.jpg differ diff --git a/assets/close-reading/prism-raven-font-size.png b/assets/close-reading/prism-raven-font-size.png deleted file mode 100644 index 5e92259..0000000 Binary files a/assets/close-reading/prism-raven-font-size.png and /dev/null differ diff --git a/assets/close-reading/prism-raven-highlights.jpg b/assets/close-reading/prism-raven-highlights.jpg new file mode 100644 index 0000000..6b5a182 Binary files /dev/null and 
b/assets/close-reading/prism-raven-highlights.jpg differ diff --git a/assets/close-reading/prism-raven-highlights.png b/assets/close-reading/prism-raven-highlights.png deleted file mode 100644 index e090946..0000000 Binary files a/assets/close-reading/prism-raven-highlights.png and /dev/null differ diff --git a/assets/close-reading/prism-raven-winning-facet.jpg b/assets/close-reading/prism-raven-winning-facet.jpg new file mode 100644 index 0000000..38d4653 Binary files /dev/null and b/assets/close-reading/prism-raven-winning-facet.jpg differ diff --git a/assets/close-reading/prism-raven-winning-facet.png b/assets/close-reading/prism-raven-winning-facet.png deleted file mode 100644 index 007e9fb..0000000 Binary files a/assets/close-reading/prism-raven-winning-facet.png and /dev/null differ diff --git a/assets/conclusion/clone-url.jpg b/assets/conclusion/clone-url.jpg new file mode 100644 index 0000000..b30e29b Binary files /dev/null and b/assets/conclusion/clone-url.jpg differ diff --git a/assets/conclusion/clone-url.png b/assets/conclusion/clone-url.png deleted file mode 100644 index bb64087..0000000 Binary files a/assets/conclusion/clone-url.png and /dev/null differ diff --git a/assets/conclusion/fork-button.jpg b/assets/conclusion/fork-button.jpg new file mode 100644 index 0000000..f490919 Binary files /dev/null and b/assets/conclusion/fork-button.jpg differ diff --git a/assets/conclusion/fork-button.png b/assets/conclusion/fork-button.png deleted file mode 100644 index 6ab9cd9..0000000 Binary files a/assets/conclusion/fork-button.png and /dev/null differ diff --git a/assets/conclusion/gitbook-add-book.jpg b/assets/conclusion/gitbook-add-book.jpg new file mode 100644 index 0000000..2668091 Binary files /dev/null and b/assets/conclusion/gitbook-add-book.jpg differ diff --git a/assets/conclusion/gitbook-add-book.png b/assets/conclusion/gitbook-add-book.png deleted file mode 100644 index 5729f64..0000000 Binary files a/assets/conclusion/gitbook-add-book.png and 
/dev/null differ diff --git a/assets/conclusion/gitbook-repo-selection.jpg b/assets/conclusion/gitbook-repo-selection.jpg new file mode 100644 index 0000000..aa138d1 Binary files /dev/null and b/assets/conclusion/gitbook-repo-selection.jpg differ diff --git a/assets/conclusion/gitbook-repo-selection.png b/assets/conclusion/gitbook-repo-selection.png deleted file mode 100644 index 14f36c9..0000000 Binary files a/assets/conclusion/gitbook-repo-selection.png and /dev/null differ diff --git a/assets/conclusion/gitbooks-clone.jpg b/assets/conclusion/gitbooks-clone.jpg new file mode 100644 index 0000000..c8dd753 Binary files /dev/null and b/assets/conclusion/gitbooks-clone.jpg differ diff --git a/assets/conclusion/gitbooks-clone.png b/assets/conclusion/gitbooks-clone.png deleted file mode 100644 index 9cf361b..0000000 Binary files a/assets/conclusion/gitbooks-clone.png and /dev/null differ diff --git a/assets/conclusion/gitbooks-editor-interface.jpg b/assets/conclusion/gitbooks-editor-interface.jpg new file mode 100644 index 0000000..8af415f Binary files /dev/null and b/assets/conclusion/gitbooks-editor-interface.jpg differ diff --git a/assets/conclusion/gitbooks-editor-interface.png b/assets/conclusion/gitbooks-editor-interface.png deleted file mode 100644 index a9d7dee..0000000 Binary files a/assets/conclusion/gitbooks-editor-interface.png and /dev/null differ diff --git a/assets/conclusion/gitbooks-github-complete-import-template.jpg b/assets/conclusion/gitbooks-github-complete-import-template.jpg new file mode 100644 index 0000000..ef6df87 Binary files /dev/null and b/assets/conclusion/gitbooks-github-complete-import-template.jpg differ diff --git a/assets/conclusion/gitbooks-github-complete-import-template.png b/assets/conclusion/gitbooks-github-complete-import-template.png deleted file mode 100644 index a1aaa6a..0000000 Binary files a/assets/conclusion/gitbooks-github-complete-import-template.png and /dev/null differ diff --git 
a/assets/conclusion/gitbooks-import-github.jpg b/assets/conclusion/gitbooks-import-github.jpg new file mode 100644 index 0000000..1344ead Binary files /dev/null and b/assets/conclusion/gitbooks-import-github.jpg differ diff --git a/assets/conclusion/gitbooks-import-github.png b/assets/conclusion/gitbooks-import-github.png deleted file mode 100644 index 992236e..0000000 Binary files a/assets/conclusion/gitbooks-import-github.png and /dev/null differ diff --git a/assets/conclusion/gitbooks-sync.jpg b/assets/conclusion/gitbooks-sync.jpg new file mode 100644 index 0000000..011fbef Binary files /dev/null and b/assets/conclusion/gitbooks-sync.jpg differ diff --git a/assets/conclusion/gitbooks-sync.png b/assets/conclusion/gitbooks-sync.png deleted file mode 100644 index d799a05..0000000 Binary files a/assets/conclusion/gitbooks-sync.png and /dev/null differ diff --git a/assets/conclusion/github-forking.jpg b/assets/conclusion/github-forking.jpg new file mode 100644 index 0000000..538d48a Binary files /dev/null and b/assets/conclusion/github-forking.jpg differ diff --git a/assets/conclusion/github-forking.png b/assets/conclusion/github-forking.png deleted file mode 100644 index 8f3b90e..0000000 Binary files a/assets/conclusion/github-forking.png and /dev/null differ diff --git a/assets/conclusion/sentence-difficulty.jpg b/assets/conclusion/sentence-difficulty.jpg new file mode 100644 index 0000000..498470c Binary files /dev/null and b/assets/conclusion/sentence-difficulty.jpg differ diff --git a/assets/conclusion/sentence-difficulty.png b/assets/conclusion/sentence-difficulty.png deleted file mode 100644 index 7ad3d18..0000000 Binary files a/assets/conclusion/sentence-difficulty.png and /dev/null differ diff --git a/assets/crowdsourcing/prism-create-one.jpg b/assets/crowdsourcing/prism-create-one.jpg new file mode 100644 index 0000000..a48bfef Binary files /dev/null and b/assets/crowdsourcing/prism-create-one.jpg differ diff --git 
a/assets/crowdsourcing/prism-create-one.png b/assets/crowdsourcing/prism-create-one.png deleted file mode 100644 index 5657f78..0000000 Binary files a/assets/crowdsourcing/prism-create-one.png and /dev/null differ diff --git a/assets/crowdsourcing/prism-create-two.jpg b/assets/crowdsourcing/prism-create-two.jpg new file mode 100644 index 0000000..6f89bef Binary files /dev/null and b/assets/crowdsourcing/prism-create-two.jpg differ diff --git a/assets/crowdsourcing/prism-create-two.png b/assets/crowdsourcing/prism-create-two.png deleted file mode 100644 index 0ed4508..0000000 Binary files a/assets/crowdsourcing/prism-create-two.png and /dev/null differ diff --git a/assets/crowdsourcing/prism-future-stacked.jpg b/assets/crowdsourcing/prism-future-stacked.jpg new file mode 100644 index 0000000..27f8a39 Binary files /dev/null and b/assets/crowdsourcing/prism-future-stacked.jpg differ diff --git a/assets/crowdsourcing/prism-future-stacked.png b/assets/crowdsourcing/prism-future-stacked.png deleted file mode 100644 index 296c4a1..0000000 Binary files a/assets/crowdsourcing/prism-future-stacked.png and /dev/null differ diff --git a/assets/crowdsourcing/prism-myprisms.jpg b/assets/crowdsourcing/prism-myprisms.jpg new file mode 100644 index 0000000..d61c2ad Binary files /dev/null and b/assets/crowdsourcing/prism-myprisms.jpg differ diff --git a/assets/crowdsourcing/prism-myprisms.png b/assets/crowdsourcing/prism-myprisms.png deleted file mode 100644 index a386fe4..0000000 Binary files a/assets/crowdsourcing/prism-myprisms.png and /dev/null differ diff --git a/assets/cyborg-readers/scandal-in-bohemia-word-cloud.jpg b/assets/cyborg-readers/scandal-in-bohemia-word-cloud.jpg new file mode 100644 index 0000000..eb6213d Binary files /dev/null and b/assets/cyborg-readers/scandal-in-bohemia-word-cloud.jpg differ diff --git a/assets/cyborg-readers/scandal-in-bohemia-word-cloud.png b/assets/cyborg-readers/scandal-in-bohemia-word-cloud.png deleted file mode 100644 index 
0c237a6..0000000 Binary files a/assets/cyborg-readers/scandal-in-bohemia-word-cloud.png and /dev/null differ diff --git a/assets/cyborg-readers/stopword-free-concordance.jpg b/assets/cyborg-readers/stopword-free-concordance.jpg new file mode 100644 index 0000000..9818512 Binary files /dev/null and b/assets/cyborg-readers/stopword-free-concordance.jpg differ diff --git a/assets/cyborg-readers/stopword-free-concordance.png b/assets/cyborg-readers/stopword-free-concordance.png deleted file mode 100644 index ce78249..0000000 Binary files a/assets/cyborg-readers/stopword-free-concordance.png and /dev/null differ diff --git a/assets/cyborg-readers/voyant-frequency-graph.jpg b/assets/cyborg-readers/voyant-frequency-graph.jpg new file mode 100644 index 0000000..f47b1f7 Binary files /dev/null and b/assets/cyborg-readers/voyant-frequency-graph.jpg differ diff --git a/assets/cyborg-readers/voyant-frequency-graph.png b/assets/cyborg-readers/voyant-frequency-graph.png deleted file mode 100644 index 5baad43..0000000 Binary files a/assets/cyborg-readers/voyant-frequency-graph.png and /dev/null differ diff --git a/assets/cyborg-readers/voyant-links.jpg b/assets/cyborg-readers/voyant-links.jpg new file mode 100644 index 0000000..f49100c Binary files /dev/null and b/assets/cyborg-readers/voyant-links.jpg differ diff --git a/assets/cyborg-readers/voyant-links.png b/assets/cyborg-readers/voyant-links.png deleted file mode 100644 index a8b6b30..0000000 Binary files a/assets/cyborg-readers/voyant-links.png and /dev/null differ diff --git a/assets/cyborg-readers/voyant-overview.jpg b/assets/cyborg-readers/voyant-overview.jpg new file mode 100644 index 0000000..550d3a8 Binary files /dev/null and b/assets/cyborg-readers/voyant-overview.jpg differ diff --git a/assets/cyborg-readers/voyant-overview.png b/assets/cyborg-readers/voyant-overview.png deleted file mode 100644 index 53cca64..0000000 Binary files a/assets/cyborg-readers/voyant-overview.png and /dev/null differ diff --git 
a/assets/cyborg-readers/voyant-phrases.jpg b/assets/cyborg-readers/voyant-phrases.jpg new file mode 100644 index 0000000..e0d7ec1 Binary files /dev/null and b/assets/cyborg-readers/voyant-phrases.jpg differ diff --git a/assets/cyborg-readers/voyant-phrases.png b/assets/cyborg-readers/voyant-phrases.png deleted file mode 100644 index 5a090c3..0000000 Binary files a/assets/cyborg-readers/voyant-phrases.png and /dev/null differ diff --git a/assets/cyborg-readers/voyant-settings.jpg b/assets/cyborg-readers/voyant-settings.jpg new file mode 100644 index 0000000..ae8469d Binary files /dev/null and b/assets/cyborg-readers/voyant-settings.jpg differ diff --git a/assets/cyborg-readers/voyant-settings.png b/assets/cyborg-readers/voyant-settings.png deleted file mode 100644 index fb2d7b2..0000000 Binary files a/assets/cyborg-readers/voyant-settings.png and /dev/null differ diff --git a/assets/cyborg-readers/voyant-splash-page.jpg b/assets/cyborg-readers/voyant-splash-page.jpg new file mode 100644 index 0000000..724a757 Binary files /dev/null and b/assets/cyborg-readers/voyant-splash-page.jpg differ diff --git a/assets/cyborg-readers/voyant-splash-page.png b/assets/cyborg-readers/voyant-splash-page.png deleted file mode 100644 index 7794dd9..0000000 Binary files a/assets/cyborg-readers/voyant-splash-page.png and /dev/null differ diff --git a/assets/cyborg-readers/voyant-stopwords.jpg b/assets/cyborg-readers/voyant-stopwords.jpg new file mode 100644 index 0000000..72adcbe Binary files /dev/null and b/assets/cyborg-readers/voyant-stopwords.jpg differ diff --git a/assets/cyborg-readers/voyant-stopwords.png b/assets/cyborg-readers/voyant-stopwords.png deleted file mode 100644 index 5abc091..0000000 Binary files a/assets/cyborg-readers/voyant-stopwords.png and /dev/null differ diff --git a/assets/cyborg-readers/voyant-summary.jpg b/assets/cyborg-readers/voyant-summary.jpg new file mode 100644 index 0000000..b1a019d Binary files /dev/null and 
b/assets/cyborg-readers/voyant-summary.jpg differ diff --git a/assets/cyborg-readers/voyant-summary.png b/assets/cyborg-readers/voyant-summary.png deleted file mode 100644 index df37e37..0000000 Binary files a/assets/cyborg-readers/voyant-summary.png and /dev/null differ diff --git a/assets/cyborg-readers/voyant-term-frequencies.jpg b/assets/cyborg-readers/voyant-term-frequencies.jpg new file mode 100644 index 0000000..29362af Binary files /dev/null and b/assets/cyborg-readers/voyant-term-frequencies.jpg differ diff --git a/assets/cyborg-readers/voyant-term-frequencies.png b/assets/cyborg-readers/voyant-term-frequencies.png deleted file mode 100644 index f6fc96d..0000000 Binary files a/assets/cyborg-readers/voyant-term-frequencies.png and /dev/null differ diff --git a/assets/cyborg-readers/voyant-word-cloud-default.jpg b/assets/cyborg-readers/voyant-word-cloud-default.jpg new file mode 100644 index 0000000..083fb0e Binary files /dev/null and b/assets/cyborg-readers/voyant-word-cloud-default.jpg differ diff --git a/assets/cyborg-readers/voyant-word-cloud-default.png b/assets/cyborg-readers/voyant-word-cloud-default.png deleted file mode 100644 index 22bd326..0000000 Binary files a/assets/cyborg-readers/voyant-word-cloud-default.png and /dev/null differ diff --git a/assets/cyborg-readers/voyant-word-cloud-dense.jpg b/assets/cyborg-readers/voyant-word-cloud-dense.jpg new file mode 100644 index 0000000..01e0ef2 Binary files /dev/null and b/assets/cyborg-readers/voyant-word-cloud-dense.jpg differ diff --git a/assets/cyborg-readers/voyant-word-cloud-dense.png b/assets/cyborg-readers/voyant-word-cloud-dense.png deleted file mode 100644 index 6136feb..0000000 Binary files a/assets/cyborg-readers/voyant-word-cloud-dense.png and /dev/null differ diff --git a/assets/data-cleaning/data-cat-high-five.jpg b/assets/data-cleaning/data-cat-high-five.jpg new file mode 100644 index 0000000..bfc8d9e Binary files /dev/null and b/assets/data-cleaning/data-cat-high-five.jpg differ diff 
--git a/assets/data-cleaning/data-cat-high-five.png b/assets/data-cleaning/data-cat-high-five.png deleted file mode 100644 index 2427f4b..0000000 Binary files a/assets/data-cleaning/data-cat-high-five.png and /dev/null differ diff --git a/assets/data-cleaning/holmes-ocr-text.jpg b/assets/data-cleaning/holmes-ocr-text.jpg new file mode 100644 index 0000000..5eb358e Binary files /dev/null and b/assets/data-cleaning/holmes-ocr-text.jpg differ diff --git a/assets/data-cleaning/holmes-ocr-text.png b/assets/data-cleaning/holmes-ocr-text.png deleted file mode 100644 index 186b00e..0000000 Binary files a/assets/data-cleaning/holmes-ocr-text.png and /dev/null differ diff --git a/assets/data-cleaning/holmes.jpg b/assets/data-cleaning/holmes.jpg new file mode 100644 index 0000000..eedb972 Binary files /dev/null and b/assets/data-cleaning/holmes.jpg differ diff --git a/assets/data-cleaning/holmes.png b/assets/data-cleaning/holmes.png deleted file mode 100644 index 93ddacc..0000000 Binary files a/assets/data-cleaning/holmes.png and /dev/null differ diff --git a/assets/data-cleaning/zotero-add-citation.jpg b/assets/data-cleaning/zotero-add-citation.jpg new file mode 100644 index 0000000..87e73a8 Binary files /dev/null and b/assets/data-cleaning/zotero-add-citation.jpg differ diff --git a/assets/data-cleaning/zotero-add-citation.png b/assets/data-cleaning/zotero-add-citation.png deleted file mode 100644 index 699703e..0000000 Binary files a/assets/data-cleaning/zotero-add-citation.png and /dev/null differ diff --git a/assets/data-cleaning/zotero-document-with-bibliography.jpg b/assets/data-cleaning/zotero-document-with-bibliography.jpg new file mode 100644 index 0000000..26a17f3 Binary files /dev/null and b/assets/data-cleaning/zotero-document-with-bibliography.jpg differ diff --git a/assets/data-cleaning/zotero-document-with-bibliography.png b/assets/data-cleaning/zotero-document-with-bibliography.png deleted file mode 100644 index ee214cf..0000000 Binary files 
a/assets/data-cleaning/zotero-document-with-bibliography.png and /dev/null differ
diff --git a/assets/data-cleaning/zotero-download-from-chrome.jpg b/assets/data-cleaning/zotero-download-from-chrome.jpg
new file mode 100644
index 0000000..b2edb3e
Binary files /dev/null and b/assets/data-cleaning/zotero-download-from-chrome.jpg differ
diff --git a/assets/data-cleaning/zotero-download-from-chrome.png b/assets/data-cleaning/zotero-download-from-chrome.png
deleted file mode 100644
index 31c5767..0000000
Binary files a/assets/data-cleaning/zotero-download-from-chrome.png and /dev/null differ
diff --git a/assets/data-cleaning/zotero-download.jpg b/assets/data-cleaning/zotero-download.jpg
new file mode 100644
index 0000000..e094f98
Binary files /dev/null and b/assets/data-cleaning/zotero-download.jpg differ
diff --git a/assets/data-cleaning/zotero-download.png b/assets/data-cleaning/zotero-download.png
deleted file mode 100644
index ca19194..0000000
Binary files a/assets/data-cleaning/zotero-download.png and /dev/null differ
diff --git a/assets/data-cleaning/zotero-editing-pane.jpg b/assets/data-cleaning/zotero-editing-pane.jpg
new file mode 100644
index 0000000..6e45526
Binary files /dev/null and b/assets/data-cleaning/zotero-editing-pane.jpg differ
diff --git a/assets/data-cleaning/zotero-editing-pane.png b/assets/data-cleaning/zotero-editing-pane.png
deleted file mode 100644
index 6187161..0000000
Binary files a/assets/data-cleaning/zotero-editing-pane.png and /dev/null differ
diff --git a/assets/data-cleaning/zotero-example-citation.jpg b/assets/data-cleaning/zotero-example-citation.jpg
new file mode 100644
index 0000000..792091a
Binary files /dev/null and b/assets/data-cleaning/zotero-example-citation.jpg differ
diff --git a/assets/data-cleaning/zotero-example-citation.png b/assets/data-cleaning/zotero-example-citation.png
deleted file mode 100644
index 8fe62be..0000000
Binary files a/assets/data-cleaning/zotero-example-citation.png and /dev/null differ
diff --git a/assets/data-cleaning/zotero-give-page-numbers-to-citation.jpg b/assets/data-cleaning/zotero-give-page-numbers-to-citation.jpg
new file mode 100644
index 0000000..a2a81ee
Binary files /dev/null and b/assets/data-cleaning/zotero-give-page-numbers-to-citation.jpg differ
diff --git a/assets/data-cleaning/zotero-give-page-numbers-to-citation.png b/assets/data-cleaning/zotero-give-page-numbers-to-citation.png
deleted file mode 100644
index 6e29818..0000000
Binary files a/assets/data-cleaning/zotero-give-page-numbers-to-citation.png and /dev/null differ
diff --git a/assets/data-cleaning/zotero-input-by-isbn.jpg b/assets/data-cleaning/zotero-input-by-isbn.jpg
new file mode 100644
index 0000000..1cbc5f1
Binary files /dev/null and b/assets/data-cleaning/zotero-input-by-isbn.jpg differ
diff --git a/assets/data-cleaning/zotero-input-by-isbn.png b/assets/data-cleaning/zotero-input-by-isbn.png
deleted file mode 100644
index 5029a89..0000000
Binary files a/assets/data-cleaning/zotero-input-by-isbn.png and /dev/null differ
diff --git a/assets/data-cleaning/zotero-input-from-web.jpg b/assets/data-cleaning/zotero-input-from-web.jpg
new file mode 100644
index 0000000..aecc7b0
Binary files /dev/null and b/assets/data-cleaning/zotero-input-from-web.jpg differ
diff --git a/assets/data-cleaning/zotero-input-from-web.png b/assets/data-cleaning/zotero-input-from-web.png
deleted file mode 100644
index 69a0bf9..0000000
Binary files a/assets/data-cleaning/zotero-input-from-web.png and /dev/null differ
diff --git a/assets/data-cleaning/zotero-magic-wand.jpg b/assets/data-cleaning/zotero-magic-wand.jpg
new file mode 100644
index 0000000..993c93d
Binary files /dev/null and b/assets/data-cleaning/zotero-magic-wand.jpg differ
diff --git a/assets/data-cleaning/zotero-magic-wand.png b/assets/data-cleaning/zotero-magic-wand.png
deleted file mode 100644
index 33cf6c7..0000000
Binary files a/assets/data-cleaning/zotero-magic-wand.png and /dev/null differ
diff --git a/assets/data-cleaning/zotero-menu-in-word.jpg b/assets/data-cleaning/zotero-menu-in-word.jpg
new file mode 100644
index 0000000..e76e985
Binary files /dev/null and b/assets/data-cleaning/zotero-menu-in-word.jpg differ
diff --git a/assets/data-cleaning/zotero-menu-in-word.png b/assets/data-cleaning/zotero-menu-in-word.png
deleted file mode 100644
index 8702a24..0000000
Binary files a/assets/data-cleaning/zotero-menu-in-word.png and /dev/null differ
diff --git a/assets/data-cleaning/zotero-searching-for-citation.jpg b/assets/data-cleaning/zotero-searching-for-citation.jpg
new file mode 100644
index 0000000..0534e3f
Binary files /dev/null and b/assets/data-cleaning/zotero-searching-for-citation.jpg differ
diff --git a/assets/data-cleaning/zotero-searching-for-citation.png b/assets/data-cleaning/zotero-searching-for-citation.png
deleted file mode 100644
index 6b1241b..0000000
Binary files a/assets/data-cleaning/zotero-searching-for-citation.png and /dev/null differ
diff --git a/assets/data-cleaning/zotero-select-citation-style.jpg b/assets/data-cleaning/zotero-select-citation-style.jpg
new file mode 100644
index 0000000..8c3aa0f
Binary files /dev/null and b/assets/data-cleaning/zotero-select-citation-style.jpg differ
diff --git a/assets/data-cleaning/zotero-select-citation-style.png b/assets/data-cleaning/zotero-select-citation-style.png
deleted file mode 100644
index 23f4969..0000000
Binary files a/assets/data-cleaning/zotero-select-citation-style.png and /dev/null differ
diff --git a/assets/data-cleaning/zotero-standalone.jpg b/assets/data-cleaning/zotero-standalone.jpg
new file mode 100644
index 0000000..d6999e1
Binary files /dev/null and b/assets/data-cleaning/zotero-standalone.jpg differ
diff --git a/assets/data-cleaning/zotero-standalone.png b/assets/data-cleaning/zotero-standalone.png
deleted file mode 100644
index 5788a32..0000000
Binary files a/assets/data-cleaning/zotero-standalone.png and /dev/null differ
diff --git a/assets/image-convert.py b/assets/image-convert.py
new file mode 100644
index 0000000..aef27c4
--- /dev/null
+++ b/assets/image-convert.py
@@ -0,0 +1,30 @@
+import os
+from PIL import Image
+
+
+def all_files(dirname):
+    for (root, _, files) in os.walk(dirname):
+        for fn in files:
+            yield os.path.join(root, fn)
+
+
+def convert_image(file):
+    # JPEG has no alpha channel, so flatten RGBA/palette PNGs to RGB first.
+    img = Image.open(file).convert('RGB')
+    target = os.path.splitext(file)[0] + '.jpg'
+    print(target)
+    img.save(target)
+    os.remove(file)
+
+
+def main():
+    fns = []
+    for fn in all_files('.'):
+        if os.path.splitext(fn)[1] == '.png':
+            fns.append(fn)
+
+    for fn in fns:
+        convert_image(fn)
+
+if __name__ == '__main__':
+    main()
diff --git a/assets/issues/google-ngram-viewer.jpg b/assets/issues/google-ngram-viewer.jpg
new file mode 100644
index 0000000..c5ee75d
Binary files /dev/null and b/assets/issues/google-ngram-viewer.jpg differ
diff --git a/assets/issues/google-ngram-viewer.png b/assets/issues/google-ngram-viewer.png
deleted file mode 100644
index cd29b57..0000000
Binary files a/assets/issues/google-ngram-viewer.png and /dev/null differ
diff --git a/assets/issues/visual-clarity.jpg b/assets/issues/visual-clarity.jpg
new file mode 100644
index 0000000..53656c0
Binary files /dev/null and b/assets/issues/visual-clarity.jpg differ
diff --git a/assets/issues/visual-clarity.png b/assets/issues/visual-clarity.png
deleted file mode 100644
index b91dee2..0000000
Binary files a/assets/issues/visual-clarity.png and /dev/null differ
diff --git a/assets/reading-at-scale/distant-reading-dinosaur.jpg b/assets/reading-at-scale/distant-reading-dinosaur.jpg
new file mode 100644
index 0000000..95b9448
Binary files /dev/null and b/assets/reading-at-scale/distant-reading-dinosaur.jpg differ
diff --git a/assets/reading-at-scale/distant-reading-dinosaur.png b/assets/reading-at-scale/distant-reading-dinosaur.png
deleted file mode 100644
index f3f9b27..0000000
Binary files a/assets/reading-at-scale/distant-reading-dinosaur.png and /dev/null differ
diff --git a/assets/reading-at-scale/distant-reading-graphs.jpg b/assets/reading-at-scale/distant-reading-graphs.jpg
new file mode 100644
index 0000000..e07d847
Binary files /dev/null and b/assets/reading-at-scale/distant-reading-graphs.jpg differ
diff --git a/assets/reading-at-scale/distant-reading-graphs.png b/assets/reading-at-scale/distant-reading-graphs.png
deleted file mode 100644
index 1a95fc5..0000000
Binary files a/assets/reading-at-scale/distant-reading-graphs.png and /dev/null differ
diff --git a/assets/reading-at-scale/sweeney-said.jpg b/assets/reading-at-scale/sweeney-said.jpg
new file mode 100644
index 0000000..34fa7fa
Binary files /dev/null and b/assets/reading-at-scale/sweeney-said.jpg differ
diff --git a/assets/reading-at-scale/sweeney-said.png b/assets/reading-at-scale/sweeney-said.png
deleted file mode 100644
index f45fedf..0000000
Binary files a/assets/reading-at-scale/sweeney-said.png and /dev/null differ
diff --git a/assets/reading-at-scale/voyant-collocates.jpg b/assets/reading-at-scale/voyant-collocates.jpg
new file mode 100644
index 0000000..7ed5892
Binary files /dev/null and b/assets/reading-at-scale/voyant-collocates.jpg differ
diff --git a/assets/reading-at-scale/voyant-collocates.png b/assets/reading-at-scale/voyant-collocates.png
deleted file mode 100644
index 1c54a75..0000000
Binary files a/assets/reading-at-scale/voyant-collocates.png and /dev/null differ
diff --git a/assets/reading-at-scale/voyant-contexts.jpg b/assets/reading-at-scale/voyant-contexts.jpg
new file mode 100644
index 0000000..8288dc5
Binary files /dev/null and b/assets/reading-at-scale/voyant-contexts.jpg differ
diff --git a/assets/reading-at-scale/voyant-contexts.png b/assets/reading-at-scale/voyant-contexts.png
deleted file mode 100644
index f3c6a59..0000000
Binary files a/assets/reading-at-scale/voyant-contexts.png and /dev/null differ
diff --git a/assets/reading-at-scale/voyant-word-cloud-default.jpg b/assets/reading-at-scale/voyant-word-cloud-default.jpg
new file mode 100644
index 0000000..b3453ee
Binary files /dev/null and b/assets/reading-at-scale/voyant-word-cloud-default.jpg differ
diff --git a/assets/reading-at-scale/voyant-word-cloud-default.png b/assets/reading-at-scale/voyant-word-cloud-default.png
deleted file mode 100644
index 8a265c3..0000000
Binary files a/assets/reading-at-scale/voyant-word-cloud-default.png and /dev/null differ
diff --git a/assets/topic-modeling/topic-modeling-highlights.jpg b/assets/topic-modeling/topic-modeling-highlights.jpg
new file mode 100644
index 0000000..9059382
Binary files /dev/null and b/assets/topic-modeling/topic-modeling-highlights.jpg differ
diff --git a/assets/topic-modeling/topic-modeling-highlights.png b/assets/topic-modeling/topic-modeling-highlights.png
deleted file mode 100644
index bc2376d..0000000
Binary files a/assets/topic-modeling/topic-modeling-highlights.png and /dev/null differ
diff --git a/close-reading/prism-part-one.md b/close-reading/prism-part-one.md
index 7f7f2fd..5ddb325 100644
--- a/close-reading/prism-part-one.md
+++ b/close-reading/prism-part-one.md
@@ -14,15 +14,15 @@ The game asks you to make graphic representations of these decisions, to identif

*Prism* is a digital version of the same game. Given a choice between a few predetermined categories, *Prism* asks you to highlight a given text. In this *Prism* example, readers are asked to mark an excerpt from Edgar Allan Poe's *The Raven*. Selecting one of the buttons next to the categories on the right turns your cursor into a colored highlighter. Clicking and dragging across the text will highlight it in the same way that you might if you were reading a print version.
-![prism highlights of the raven](/assets/close-reading/prism-raven-highlights.png)
+![prism highlights of the raven](/assets/close-reading/prism-raven-highlights.jpg)

After you click "Save Highlights", the tool combines your markings with those made by everyone else who has ever read the same *Prism* text to help you visualize how people are marking things. By default, *Prism* will bring up the **winning facet visualization**, which colors the text according to the category that was most frequently marked for each individual word. Clicking on an individual word will color the pie chart and tell you exactly what percentage of markings the word received from each category.

-![prism winning facet](/assets/close-reading/prism-raven-winning-facet.png)
+![prism winning facet](/assets/close-reading/prism-raven-winning-facet.jpg)

Seeing a graphic representation of the reading process might help you notice things that you otherwise would not. For example, here you might notice that people tended to mark passages containing first-person pronouns as "sense." Is it because "sense" implies thinking? Phrases like "I remember," "my soul grew," and "I stood there wondering" do suggest an emphasis on introspection, at the very least. Did you mark the same phrases, or did you select other passages?

*Prism* comes with two visualizations baked into it. To change visualizations, click the "Font Size Visualization" button on the right sidebar. The **font size visualization** lets you see which parts of the text tended to be more frequently thought of as belonging to a particular category: *Prism* resizes the text to reflect concentrations of reading. So in this example, where readers were marking for "sound," they tended to mark rhyming words more frequently.

-![prism font size visualization](/assets/close-reading/prism-raven-font-size.png)
+![prism font size visualization](/assets/close-reading/prism-raven-font-size.jpg)

Makes sense, and you might have done the same.
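The winning-facet idea is easy to picture in code. As a rough sketch (not *Prism*'s actual implementation, and with invented numbers rather than real marking data), suppose each word carries a tally of how many readers marked it with each category; the winning facet is simply the most-marked one, and the pie chart is each category's share of the total:

```python
from collections import Counter

# Hypothetical per-word tallies: how many readers marked each word
# with each category. (Illustrative numbers, not real Prism data.)
markings = {
    'remember': Counter({'sense': 12, 'sound': 3}),
    'nevermore': Counter({'sound': 9, 'sense': 8}),
}

def winning_facet(word):
    """Return the category most frequently marked for a word."""
    return markings[word].most_common(1)[0][0]

def facet_percentages(word):
    """Return each category's share of a word's markings, like Prism's pie chart."""
    total = sum(markings[word].values())
    return {facet: count / total for facet, count in markings[word].items()}

print(winning_facet('remember'))      # 'sense'
print(facet_percentages('remember'))  # {'sense': 0.8, 'sound': 0.2}
```

Note that "nevermore" wins for "sound" by a single vote; a real visualization has to decide how to present such near-ties.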
By selecting the other category, you could check out what readers tended to mark for "sense." By design, *Prism* forces you to think more deeply about the categories that you are given for highlighting. The creator of this *Prism* wants you to mark for "sound" and "sense" - categories that relate to Alexander Pope's famous formulation of poetry from *An Essay on Criticism*. In it, Pope suggests that the sound of a poem should complement its underlying meaning. So the creator of this game wants you to try to pinpoint where these categories overlap and where they depart. You might not have known this context, though you might have intuited elements of it. Guided reading in this way might change how you otherwise would read the passage, and the absence of clear guidelines complicates your experience of the text.

diff --git a/conclusion/adapting.md b/conclusion/adapting.md
index d13233b..654c677 100644
--- a/conclusion/adapting.md
+++ b/conclusion/adapting.md
@@ -17,29 +17,29 @@ The contents of this book are hosted in [a repository on GitHub](https://github.

First you will need to make a copy of our GitHub repository for your own account. When logged in and looking at our repository page, you should see these three buttons in the top-left corner of the window:

-![fork button on github](/assets/conclusion/fork-button.png)
+![fork button on github](/assets/conclusion/fork-button.jpg)

**Forking** is GitHub's term for creating a copy of a repository for yourself - imagine a road forking and diverging into two paths. If you click "Fork," GitHub should start the copying process. When finished, you will be redirected to your fresh copy of the repository.

-![copy of github repository after forking](/assets/conclusion/github-forking.png)
+![copy of github repository after forking](/assets/conclusion/github-forking.jpg)

Note the "forked from bmw9t/introduction-to-text-analysis" label at the top of the page, which lets you know where the book originated.
Above that you will see your own book's location.

#### Publishing

-You have a copy of the files that make up the book, but you will need to sync them with GitBooks if you want to publish them online in the same way that we have done here. To do so, after logging into GitBooks you will click on the green 'Import Button.' ![gitbook add book button](/assets/conclusion/gitbook-add-book.png)
+You have a copy of the files that make up the book, but you will need to sync them with GitBooks if you want to publish them online in the same way that we have done here. To do so, after logging into GitBooks you will click on the green 'Import' button. ![gitbook add book button](/assets/conclusion/gitbook-add-book.jpg)

Selecting the "GITHUB" option, you will need to link your GitHub account and verify your account by email.

-![import github repository to gitbook](/assets/conclusion/gitbooks-import-github.png)
+![import github repository to gitbook](/assets/conclusion/gitbooks-import-github.jpg)

After linking your GitHub account, if you have more than one repository under your name you will have to select the one that you want to import to GitBooks. In this case, we will import the *Introduction to Text Analysis* repository.

-![select your repo in GitBooks](/assets/conclusion/gitbook-repo-selection.png)
+![select your repo in GitBooks](/assets/conclusion/gitbook-repo-selection.jpg)

Give your repository a name and a description, and you're all set. A complete form should look something like this:

-![Complete form for importing a github repository into GitBooks](/assets/conclusion/gitbooks-github-complete-import-template.png)
+![Complete form for importing a github repository into GitBooks](/assets/conclusion/gitbooks-github-complete-import-template.jpg)

You now have a working copy of the book hosted on GitHub and rendered in GitBooks (GitBooks should automatically redirect you to your copy).
You can do anything you want with these files, and they won't affect our own base copy of the resources.

@@ -57,15 +57,15 @@ If markdown feels too complicated, GitBooks also provides a handy [desktop edito

But I can also highlight text and press command + b as I would in Microsoft Word to produce the same effect.

-![gitbooks editor interface](/assets/conclusion/gitbooks-editor-interface.png)
+![gitbooks editor interface](/assets/conclusion/gitbooks-editor-interface.jpg)

The interface provides a preview of what your text will look like to the right of the window, which can be very helpful if you are new to markdown. If you do decide to work in the GitBooks Editor, you will need to log in the first time you do so. Then select the "GitBooks.com" option for importing.

-![gitbooks cloning locally](/assets/conclusion/gitbooks-clone.png)
+![gitbooks cloning locally](/assets/conclusion/gitbooks-clone.jpg)

The computer will **clone**, or copy, the book to your computer. From there, you can follow the instructions in the [editor's documentation](https://help.gitbook.com/). The only significant difference from MS Word is that, after saving your work, you will need to click the sync button to upload your content to GitHub.

-![gitbooks sync](/assets/conclusion/gitbooks-sync.png)
+![gitbooks sync](/assets/conclusion/gitbooks-sync.jpg)

After doing so, any changes you have made from the GitBooks editor will also change the GitHub repository's files, which will then automatically get rendered in the GitBooks version of the site. You are all set!

@@ -73,7 +73,7 @@ After doing so, any changes you have made from the GitBooks editor will also cha

If you are planning to use the terminal, the process is fairly similar. Once you have forked and have your own copy of the book on GitHub, you will just clone it to your computer using the clone url found at the top of your repository's page on GitHub.
Here is the one for the original book:

-![github clone url](/assets/conclusion/clone-url.png)
+![github clone url](/assets/conclusion/clone-url.jpg)

Find your own clone url, copy it to your clipboard, and use it like so (without curly braces):

diff --git a/conclusion/where-to-go.md b/conclusion/where-to-go.md
index 5fbe39c..ce116d7 100644
--- a/conclusion/where-to-go.md
+++ b/conclusion/where-to-go.md
@@ -2,7 +2,7 @@ While writing this book, I used [GitBook's text editor](https://www.gitbook.com/editor/osx) so that I could preview the final product before it was published online. I found it really annoying to type while my text was screaming at me like this: Test

-![sentence difficulty in GitBook editor](/assets/conclusion/sentence-difficulty.png)
+![sentence difficulty in GitBook editor](/assets/conclusion/sentence-difficulty.jpg)

The most irritating thing was that I could not tell what metrics they were using to diagnose my writing. What makes a sentence difficult? The number of words in each sentence? The number of clauses? Subjects in particular positions? I have all sorts of opinions about why writing might be unclear, but, as best I could tell, the editor was mostly basing its suggestions on the number of words in each sentence. I turned the feature off and went on with my life, but not before noting a truism of working in the digital humanities: using a tool built by someone else forces you to abide by their assumptions and conventions.

You might have had similar feelings while reading this book. You have used a series of powerful tools in the course of working through this book, but tools have their limitations. While using *Prism*, for example, you might have wished that you could see an individual user's interpretations to compare them with the group's reading. Or when using *Voyant*, you might have wondered if you could analyze patterns in the use of particular parts of speech throughout a text.
diff --git a/crowdsourcing/prism-part-two.md b/crowdsourcing/prism-part-two.md
index fa7fdb0..4871abe 100644
--- a/crowdsourcing/prism-part-two.md
+++ b/crowdsourcing/prism-part-two.md
@@ -2,7 +2,7 @@

Think back to *[Prism](prism.scholarslab.org)* and the transparency game. So far we have only really focused on the single transparencies, the individual interpretations supplied by each person. But the crucial last element of the game involves collecting the transparencies and stacking them. Hold the stack up to the light, and you get a whole rainbow. *Prism*'s visualizations offer one way of adapting this activity to a digital environment.

-![prism transparencies stacked](/assets/crowdsourcing/prism-future-stacked.png)
+![prism transparencies stacked](/assets/crowdsourcing/prism-future-stacked.jpg)

In this photo from the "Future Directions" page for *Prism*, you can see the prototype for another possible visualization that would shuffle through the various sets of highlights. Even without this animated interpretation, *Prism* allows you to get a sense of how a whole group interprets a text. When you upload your own markings, they are collected along with those of everyone who has ever read that text in *Prism*. We can begin to get some sense of trends in the ways that the group reads.

@@ -10,12 +10,12 @@ In this photo from the "Future Directions" page for *Prism*, you can see the pro

*Prism* offers a few options to facilitate group reading. Most importantly, it assumes very little about how its users will use the tool. Anyone can upload their own text as a *Prism* and, within certain guidelines, adapt the tool to their own purposes. When logged in, you can create a *Prism* by clicking the big create button to pull up the uploading interface:

-![prism creation interface](/assets/crowdsourcing/prism-create-one.png)
+![prism creation interface](/assets/crowdsourcing/prism-create-one.jpg)

When uploading a text, you will paste your text into the window provided.
*Prism* does not play well with very long texts, so you may have to play around in order to find a length that works for the tool as well as for you. The three facets on the right correspond to the three marking categories according to which you want users to highlight. The rest of these categories should be self-explanatory. Note, however, that you will only be able to give a short description to readers: your document and marking categories will largely have to stand on their own.

Below these main parameters for your text, you will be asked to make some other choices that may be less intuitive:

-![listed vs unlisted prism interface](/assets/crowdsourcing/prism-create-two.png)
+![listed vs unlisted prism interface](/assets/crowdsourcing/prism-create-two.jpg)

By default, *Prism* assumes that the text you will upload and all its markings will be made available to the public. Selecting **unlisted** will make your *Prism* private so that it will be viewable only to people to whom you send the URL. Once you create the *Prism*, you will want to be extra certain that you copy that URL down somewhere so that you can send it out to your group.

@@ -23,7 +23,7 @@ By default, *Prism* assumes that the text you will upload and all its markings w

Once you upload a text, the easiest way to find it will be to go to your personal page by clicking the "MYPRISMS" link in the top menu. On this profile page, you can easily access both the texts that you have uploaded and the texts by others that you have highlighted (quite handy if you lose the URL for an unlisted text).

-![myprisms page](/assets/crowdsourcing/prism-myprisms.png)
+![myprisms page](/assets/crowdsourcing/prism-myprisms.jpg)

With these tools, you can upload a range of texts for any kind of experiment. It is tempting to say that you are limited only by your imagination, but you will run up against scenarios in which the parameters of the tool cause you headaches. That's OK!
Take these opportunities to reflect:

diff --git a/cyborg-readers/voyant-part-one.md b/cyborg-readers/voyant-part-one.md
index 68821d9..d2adde1 100644
--- a/cyborg-readers/voyant-part-one.md
+++ b/cyborg-readers/voyant-part-one.md
@@ -4,17 +4,17 @@ We will be using a tool called [Voyant](http://voyant-tools.org/) to introduce s

Upon arriving at Voyant you will encounter a space where you can upload texts. For the following graphs, we have uploaded the full text of _The String of Pearls_, the 1846-7 penny dreadful that featured Sweeney Todd, the demon barber of Fleet Street. Feel free to [download that dataset](/assets/the-string-of-pearls-full.txt) and use it to follow along and produce the same results, or upload your own texts using the window provided.

-![Voyant splash page and text uploader](/assets/cyborg-readers/voyant-splash-page.png)
+![Voyant splash page and text uploader](/assets/cyborg-readers/voyant-splash-page.jpg)

After Voyant processes your text you'll get a series of window panes with lots of information. Voyant packages several features into one tight digital package: each pane offers you different means of interacting with the text.

-![default view of string of pearls in voyant](/assets/cyborg-readers/voyant-overview.png)
+![default view of string of pearls in voyant](/assets/cyborg-readers/voyant-overview.jpg)

Voyant gives you lots of options, so do not be overwhelmed. Voyant provides great [documentation](http://docs.voyant-tools.org/start/) for working through its interface, and we will not rehearse it all again here. Instead, we will just focus on a few features. The top left pane may be the most familiar to you:

-![voyant default wordcloud of string of pearls](/assets/cyborg-readers/voyant-word-cloud-default.png)
+![voyant default wordcloud of string of pearls](/assets/cyborg-readers/voyant-word-cloud-default.jpg)

Word clouds like these have been made popular in recent years by [Wordle](http://www.wordle.net/).
They do nothing more than count the different words in a text: the more frequently a particular word appears, the larger its presence in the word cloud. In fact, Voyant allows you to see the underlying frequencies that it is using to generate the cloud if you click the "Corpus Terms" button above the word cloud.

-![underlying corpus term frequency](/assets/cyborg-readers/voyant-term-frequencies.png)
+![underlying corpus term frequency](/assets/cyborg-readers/voyant-term-frequencies.jpg)

Concordances like these are some of the oldest forms of text analysis that we have, and computers are especially good at producing them. In fact, a project of this kind is frequently cited as one of the origin stories of digital humanities: [Father Roberto Busa's massive concordance of the works of St. Thomas Aquinas](http://www.historyofinformation.com/expanded.php?id=2321), begun on punch cards in the 1940s and '50s, was one of the first works of its kind and was instrumental in expanding the kinds of things that we could use computers to do.

@@ -22,11 +22,11 @@ Busa's work took years. We can now carry out similar searches in seconds, and we

Notice the words that you do not see on this list: words like 'a' or 'the.' Words like these, called **stopwords**, are _so_ common that they are frequently excluded from analyses entirely, the reasoning being that they become something like linguistic noise, overshadowing words that might be more meaningful to the document. To see the words that Voyant excludes by default, hover next to the question mark at the top of the pane and click the second option from the right:

-![voyant settings](/assets/cyborg-readers/voyant-settings.png)
+![voyant settings](/assets/cyborg-readers/voyant-settings.jpg)

Use the dropdown list to switch from 'auto-detect' to 'none.' Now the concordance will show you the actual word frequencies in the text.
Notice that 'said', the number one result in the original graph, does not even come close to the frequent usage of articles, prepositions, and pronouns.

-![concordance with no stopwords](/assets/cyborg-readers/stopword-free-concordance.png)
+![concordance with no stopwords](/assets/cyborg-readers/stopword-free-concordance.jpg)

Words like these occur with such frequency that we often need to remove them entirely in order to get meaningful results. But the list of words that we might want to remove might change depending on the context. For example, language does not remain stable over time. Different decades and centuries have different linguistic patterns for which you might need to account. Shakespearean scholars might want to use an [early modern stopword list](file.path/assets/early-modern-stopwords.txt) provided by Stephen Wittek. You can use this same area of _Voyant_ to edit the stoplist used by your analysis.

@@ -34,7 +34,7 @@ There are some instances in which we might care a lot about just these words. Th

Return to the word cloud. Using the slider below the word cloud, you can reduce or expand the number of terms visible in the visualization. Slide it all the way to the left to include the maximum number of words.

-![voyant word cloud dense](/assets/cyborg-readers/voyant-word-cloud-dense.png)
+![voyant word cloud dense](/assets/cyborg-readers/voyant-word-cloud-dense.jpg)

Just like the stopword list can be used to adjust the filters that help to give you meaningful results, this slider adjusts the visualization that you get. It should become clear as you play with both options that different filters and different visualizations can give you radically different readings. The results are far from objective: your own reading, the tool itself, and how you use it all shape the data as it comes to be known.
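Under the hood, a term-frequency list of this kind is just counting. Here is a minimal sketch of the idea, not Voyant's actual code: the tokenizer and the tiny stopword list are simplified stand-ins, and the sample sentence is invented for illustration:

```python
import re
from collections import Counter

# A tiny stand-in stopword list; real lists (like Voyant's) run to hundreds of words.
STOPWORDS = {'a', 'the', 'and', 'of', 'to', 'in', 'that', 'were'}

def term_frequencies(text, stopwords=frozenset()):
    """Count word frequencies, optionally dropping stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stopwords)

sample = "The barber said that the chair and the razor were in the shop."

# Without a stoplist, 'the' dominates, just as in the unfiltered concordance.
print(term_frequencies(sample).most_common(1))            # [('the', 4)]

# With the stoplist, content words like 'barber' and 'razor' surface instead.
print(term_frequencies(sample, STOPWORDS).most_common(3))
```

Swapping in a different stoplist (an early modern one, say) is just a matter of changing the set passed to the function, which is exactly the kind of adjustment the Voyant settings pane exposes.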
@@ -52,7 +52,7 @@ Additionally, most of the words are relatively short and most are only one or tw

If we load Arthur Conan Doyle's "A Scandal in Bohemia" into Voyant, you can see that we get quite different results. \(Again, feel free to follow along!\)

-![scandal in bohemia word cloud](/assets/cyborg-readers/scandal-in-bohemia-word-cloud.png)
+![scandal in bohemia word cloud](/assets/cyborg-readers/scandal-in-bohemia-word-cloud.jpg)

A quick glance shows that the most common words tend to be longer than those in _The String of Pearls_, with the three-syllable "photograph" being one of the most frequently used terms in this short story, one written for a middle-class as opposed to a lower-class audience. So maybe the simple vocabulary of the penny dreadful is related to the nature of its readership.

diff --git a/data-cleaning/problems-with-data.md b/data-cleaning/problems-with-data.md
index ddda961..9c2aa3a 100644
--- a/data-cleaning/problems-with-data.md
+++ b/data-cleaning/problems-with-data.md
@@ -8,13 +8,13 @@ The basic principle to remember is "GIGO," or "garbage in, garbage out": you are

Take this image from a newspaper ad for the American film version of Sherlock Holmes:

-![sherlock holmes article clipping](/assets/data-cleaning/holmes.png)
+![sherlock holmes article clipping](/assets/data-cleaning/holmes.jpg)

By default, the computer has no idea that there is text inside of this image. For a computer, an image is just an image, and you can only do image-y things to it. The computer could rotate it, crop it, zoom in, or paint over parts of it, but your machine cannot read the text there - unless you tell it how to do so. The computer requires a little extra help to pull out the text information from the image. The process of using software to extract the text from an image of a text is called **optical character recognition**, or OCR. There are many tools that can do this, and some are proprietary.
All of these tools are only so good at the process. Running this image through tesseract, a common tool for OCR'ing text, I get something like this:

-![ocr'd sherlock holmes text](/assets/data-cleaning/holmes-ocr-text.png)The material here is still recognizable as being part of the same text, though there are obvious problems with the reproduction. At first blush, you might think, "This should be easy! I learned to read in first grade \[or whenever you learned to read\]. I can even read things written in cursive! Why does the computer have such a hard time with this?" This is one of those instances where what is really easy for you is really hard for a computer. Humans are great at pattern recognition, which is essentially what OCR is. Computers, not so much.
+![ocr'd sherlock holmes text](/assets/data-cleaning/holmes-ocr-text.jpg)The material here is still recognizable as being part of the same text, though there are obvious problems with the reproduction. At first blush, you might think, "This should be easy! I learned to read in first grade \[or whenever you learned to read\]. I can even read things written in cursive! Why does the computer have such a hard time with this?" This is one of those instances where what is really easy for you is really hard for a computer. Humans are great at pattern recognition, which is essentially what OCR is. Computers, not so much.

OCR'ing text is actually a pretty complicated problem for computers. [WhatFontis.com](https://www.whatfontis.com) lists over 342,000 fonts, and this count only appears to include Western fonts. A single word will look slightly different in each font and at each size. And that doesn't even begin to account for hand-written text or text that has been partially damaged: even a slight imperfection in a letter can complicate the scanning process. The process is complicated and takes a lot of work: even the most expensive OCR software is prone to errors.
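One reason humans can clean up OCR output at all is that the errors are systematic: certain letter shapes get confused in predictable ways. As a toy illustration, a rule-based pass can repair some classic confusions. The substitution rules below are invented for the example, not drawn from tesseract, and real correction needs dictionaries and context, not just a lookup table:

```python
import re

# A few classic OCR confusions, invented for illustration.
CORRECTIONS = [
    (r'\bSherloek\b', 'Sherlock'),  # 'c' misread as 'e'
    (r'\bHolrnes\b', 'Holmes'),     # 'm' misread as 'rn'
    (r'\b1he\b', 'The'),            # 'T' misread as '1'
]

def clean_ocr(text):
    """Apply simple substitution rules to raw OCR output."""
    for pattern, replacement in CORRECTIONS:
        text = re.sub(pattern, replacement, text)
    return text

raw = "1he adventures of Sherloek Holrnes"
print(clean_ocr(raw))  # The adventures of Sherlock Holmes
```

The word-boundary anchors (`\b`) keep the rules from firing inside longer words, but even so, a list like this only catches errors you have already anticipated, which is why large-scale OCR cleanup remains stubbornly human work.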
If you see clean text transcriptions of an image online, odds are high that a human cleaned up the OCR to make it readable. You can find a more detailed explanation of how OCR works [here](http://www.explainthatstuff.com/how-ocr-works.html).

@@ -28,7 +28,7 @@ I'm going to count to ten!

You probably meant to have a 9 in there, but the computer will have no idea that you probably mistyped and left out a number. You would have to specifically tell it to account for such errors. This simple fact about computational logic becomes a big problem in the humanities, because humanities data is _messy_. To see what I mean, go check out the Wikipedia section on Sir Arthur Conan Doyle's [name](https://en.wikipedia.org/wiki/Arthur_Conan_Doyle#Name). I will wait. Here is a picture of a cat in the meantime. Imagine it's a cat high-fiving you when you clean up some data.

-![high fiving cat](/assets/data-cleaning/data-cat-high-five.png)
+![high fiving cat](/assets/data-cleaning/data-cat-high-five.jpg)

Did you read it? Don't lie to me.

diff --git a/data-cleaning/zotero.md b/data-cleaning/zotero.md
index bb1d744..50b6fb4 100644
--- a/data-cleaning/zotero.md
+++ b/data-cleaning/zotero.md
@@ -43,39 +43,39 @@ To deal with citation situations like this, the academic community has developed

First visit the [Zotero download page](https://www.zotero.org/download). You can run Zotero out of Firefox, which allows you to capture and manage your citations without leaving the browser. But we like to separate the process of collecting citations from managing and using them. Download "Zotero Standalone" from the right window pane:

-![zotero download](/assets/data-cleaning/zotero-download.png)
+![zotero download](/assets/data-cleaning/zotero-download.jpg)

This will download an application to your desktop that, if you're like me, you'll want to put in a place where you'll have quick and easy access to it.
While you're at it, you will need to add at least one of the browser extensions by clicking on the button in the same pane. I recommend you just add the extension to every browser that you have on your computer: this little download is what will allow you to pull citation information down off a webpage. Once you download all those, open up Zotero Standalone. It should look something like this: -![zotero standalone screen](/assets/data-cleaning/zotero-standalone.png) +![zotero standalone screen](/assets/data-cleaning/zotero-standalone.jpg) Your Zotero installation and library will look different from Brandon's, because his is full of materials related to things he has written. Let's grab some metadata to store it in Zotero. We will do this three different ways. First, let's enter information manually. Let's pull out our copy of Rosalind Crone's _Violent Victorians_ \(you can find the relevant metadata [here](https://www.amazon.com/dp/071908685X/?tag=mh0b-20&hvadid=4965340066&hvqmt=p&hvbmt=bp&hvdev=c&ref=pd_sl_6usflqlo9o_p) if you don't have your own copy\). By clicking on the plus sign at the top, you can select the type of object you are adding to your collection. This is a book, so let's select that. -![adding a book citation](/assets/data-cleaning/zotero-add-citation.png) +![adding a book citation](/assets/data-cleaning/zotero-add-citation.jpg) Doing so will shift your Zotero pane so that you can enter the metadata for Crone's book on the right. Go ahead and do that. -![adding crone citation](/assets/data-cleaning/zotero-editing-pane.png) +![adding crone citation](/assets/data-cleaning/zotero-editing-pane.jpg) Zotero now knows about our citation and we could use it for any number of things. But before we move on, let's cover two other ways to add citation information. Every book is given an identifying number, an **International Standard Book Number \(ISBN\)**. This number is unique to every book. Zotero can map these numbers to their associated metadata.
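Part of what makes this lookup reliable is that ISBNs carry a built-in check digit, so software can catch a mistyped number before it even queries a database. Here is a minimal sketch of the standard ISBN-10 check, using the ISBN 0520221680 that appears in this chapter:

```python
# A sketch of the ISBN-10 checksum built into these identifiers.
# Each digit is weighted 10 down to 1; a valid ISBN-10's weighted sum is
# divisible by 11 ('X' stands for 10 in the final position).

def is_valid_isbn10(isbn: str) -> bool:
    digits = isbn.replace("-", "").upper()
    if len(digits) != 10:
        return False
    total = 0
    for weight, ch in zip(range(10, 0, -1), digits):
        if ch == "X" and weight == 1:
            value = 10
        elif ch.isdigit():
            value = int(ch)
        else:
            return False
        total += weight * value
    return total % 11 == 0

print(is_valid_isbn10("0520221680"))  # True: the ISBN used in this chapter
print(is_valid_isbn10("0520221681"))  # False: one mistyped digit breaks the check
```

This is only the validity check; the mapping from a valid ISBN to its title and author happens against bibliographic databases that Zotero consults behind the scenes.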
Clicking on the magic wand at the top of the Zotero Standalone pane will give you a place to enter an ISBN: -![magic wand ISBN input](/assets/data-cleaning/zotero-magic-wand.png) +![magic wand ISBN input](/assets/data-cleaning/zotero-magic-wand.jpg) Try entering this one: 0520221680. Zotero will automatically go out and grab the metadata for its associated book: _Spectacular Realities_ by Vanessa Schwartz. -![input by isbn](/assets/data-cleaning/zotero-input-by-isbn.png) +![input by isbn](/assets/data-cleaning/zotero-input-by-isbn.jpg) Magic! But wait - there's more. Visit the Amazon webpage for _[Sara Baartman and the Hottentot Venus: A Ghost Story and a Biography](https://www.amazon.com/Sara-Baartman-Hottentot-Venus-Biography/dp/0691147965)_. If you pay careful attention to your toolbars at the top of the webpage, you may notice a new one for Zotero \(fourth from the left in this image from Brandon's computer\). -![zotero download from chrome](/assets/data-cleaning/zotero-download-from-chrome.png) +![zotero download from chrome](/assets/data-cleaning/zotero-download-from-chrome.jpg) By default, Zotero just assumes you are trying to grab the webpage itself. When you visit a page with a citation you can download, however, the Zotero icon will change accordingly as it recognizes the metadata embedded in the page. Zotero will suck down the metadata on the page and store it in your Standalone App so that you can use it later. Magic! -![zotero input from web](/assets/data-cleaning/zotero-input-from-web.png) +![zotero input from web](/assets/data-cleaning/zotero-input-from-web.jpg) Well, not quite. We won't get lost in the weeds of explaining the technical details of how this works, as that is a subject for a different class. For now, just know that Zotero is interacting with hidden information on the webpage that provides information to programs like it.
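One common pattern (among the many formats Zotero's translators understand) is citation metadata embedded in a page's `<meta>` tags. Here is a sketch of reading such tags with Python's built-in `html.parser`; the HTML snippet is invented for illustration:

```python
# A sketch of how a program can read citation metadata hidden in a page's
# <meta> tags. The HTML snippet is invented for illustration; Zotero's real
# translators handle many metadata formats beyond this one.
from html.parser import HTMLParser

class CitationMetaParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name, content = attrs.get("name", ""), attrs.get("content")
        if name and name.startswith("citation_") and content:
            self.metadata.setdefault(name, []).append(content)

page = """
<html><head>
  <meta name="citation_title" content="Violent Victorians">
  <meta name="citation_author" content="Crone, Rosalind">
</head><body>The visible page goes here.</body></html>
"""

parser = CitationMetaParser()
parser.feed(page)
print(parser.metadata["citation_title"])  # ['Violent Victorians']
```

A human reader never sees those tags, but a program scanning the page's source finds a ready-made citation waiting for it.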
The average user never knows that these webpages contain information like this, but Zotero can leverage that invisible data into powerful content to make your life easier. @@ -85,19 +85,19 @@ That being said, you need to remember the principle of Garbage In, Garbage Out. Now that we have our metadata, the fun begins. If you use Microsoft Word, strap in and buckle up. Go to the Zotero menu and select Preferences. From the 'Cite' menu, install the Microsoft Word Add-in. Doing so will add a special 'Zotero' menu to every Microsoft Word document that you open. -![zotero menu in word](/assets/data-cleaning/zotero-menu-in-word.png) +![zotero menu in word](/assets/data-cleaning/zotero-menu-in-word.jpg) There are a lot of options here, but the most important for right now are 'Add Bibliography' and 'Add Citation'. First, click 'Add Citation.' You will need to select a citation style since there is none associated with this document yet. Let's select MLA 7th Edition because Brandon is a literary studies person. It will make him happy. -![zotero selection citation style](/assets/data-cleaning/zotero-select-citation-style.png) +![zotero selection citation style](/assets/data-cleaning/zotero-select-citation-style.jpg) Next, an input field pops up asking for some information so that Zotero can locate a citation for you. Typing in 'Crone' will allow Zotero to recognize the author of the book we entered. It will bring up the metadata we want. Click it to accept. -![zotero searching for citation](/assets/data-cleaning/zotero-searching-for-citation.png) +![zotero searching for citation](/assets/data-cleaning/zotero-searching-for-citation.jpg) The input field now shows how the citation will appear. In most cases, however, we want to customize it. To do so, click on the citation to bring up some more options. Here you can add page numbers or, importantly, suppress the author name depending on whether you only need the page numbers themselves.
Let's give this entry the page numbers 45-7. Hit return to accept your changes. -![zotero give page numbers to citation](/assets/data-cleaning/zotero-give-page-numbers-to-citation.png) +![zotero give page numbers to citation](/assets/data-cleaning/zotero-give-page-numbers-to-citation.jpg) Et voilà! Your citation appears in the text in just the same way as if you were doing it by hand, properly formatted with the correct page numbers. The process might appear a little slow, but once you get the hang of the workflow, it greatly speeds up your writing. Gather all your citations in one place, and then learn the handful of keyboard shortcuts for working with Zotero in Word. These are the ones I use most often: @@ -105,13 +105,13 @@ Et voila! Your citation appears in the text in just the same way as if you were * 'arrow keys' allow you to highlight particular objects from the Zotero search when inputting a citation * 'return' selects a particular citation. * 'cmd + down arrow' will bring up additional options like adding page numbers once you have a citation selected in the add citation input field. - !\[zotero example citation\[\(\/assets\/zotero\-example\-citation.png\) + !\[zotero example citation\]\(\/assets\/zotero\-example\-citation.jpg\) Get the hang of these commands, and you'll save loads of time. But adding citations is only one part of the process. You also want to add a bibliography to your document based on those citations. Zotero can do this too! From the Zotero menu in Word, select 'Add Bibliography'. Zotero will magically format the metadata you're using into bibliographical entries, then arrange those into a bibliography based on the citation style you have chosen for the document. By default, the bibliography will appear wherever your cursor was, so you'll need to move it around to put it in a location that works for you. Add a couple more citations using the other things we added to Zotero and see if you can get it looking reasonable.
Here is what I came up with: -![zotero text with bibliography](/assets/data-cleaning/zotero-document-with-bibliography.png) +![zotero text with bibliography](/assets/data-cleaning/zotero-document-with-bibliography.jpg) I put the bibliography at the end of the text and hit return a few times after the last sentence to give it space \(you often might insert a page break to put it on its own page\). I gave it a centered heading. And I inserted a couple other citations to flesh things out. diff --git a/issues/google-ngram.md index 7a196e7..89eff2e 100644 --- a/issues/google-ngram.md +++ b/issues/google-ngram.md @@ -1,7 +1,7 @@ # Google NGram Viewer The [Google NGram Viewer](https://books.google.com/ngrams) is often the first thing brought out when people discuss large-scale textual analysis, and it serves well as a basic introduction to the possibilities of computer-assisted reading. -![google ngram splash page](/assets/issues/google-ngram-viewer.png) +![google ngram splash page](/assets/issues/google-ngram-viewer.jpg) The Google NGram Viewer provides a quick and easy way to explore changes in language over the course of many years in many texts. Provide a word or comma-separated phrase, and the NGram viewer will graph how often those search terms occur over a given corpus for a given number of years. You can specify a range of years as well as a particular Google Books corpus. @@ -78,6 +78,6 @@ That last phrase should cause some alarm: we haven't actually read any of these ## Interpretation -Of course, these graphs mean nothing on their own. It is our job to look at the results and describe them in meaningful ways. But be critical of what you see. You might find something interesting, but you might be looking at nonsense. It is your job to tell the difference. Beware of **apophenia**, the all to human urge to look at random data and find meaningful patterns in it.
You can find wild patterns in relationships in anything if [you look hard enough](http://tylervigen.com/spurious-correlations). After all, visualizations can confuse as much as clarify. Numbers and graphs do not carry objective meaning.![apophenia illustrated - noise illustration](/assets/issues/visual-clarity.png) +Of course, these graphs mean nothing on their own. It is our job to look at the results and describe them in meaningful ways. But be critical of what you see. You might find something interesting, but you might be looking at nonsense. It is your job to tell the difference. Beware of **apophenia**, the all too human urge to look at random data and find meaningful patterns in it. You can find wild patterns and relationships in anything if [you look hard enough](http://tylervigen.com/spurious-correlations). After all, visualizations can confuse as much as clarify. Numbers and graphs do not carry objective meaning.![apophenia illustrated - noise illustration](/assets/issues/visual-clarity.jpg) Always think. Never let a graph think for you. \ No newline at end of file diff --git a/reading-at-scale/distant-reading.md index ba45e58..85dde93 100644 --- a/reading-at-scale/distant-reading.md +++ b/reading-at-scale/distant-reading.md @@ -32,11 +32,11 @@ If you have a corpus where the dates are known, you can begin to draw inferences It is easy to think that the results given to you by the computer are correct, to take them at their word. After all, how could numbers lie? The truth is, however, that statistics reflect the biases of the people who produced them.
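As a toy illustration of how easily this happens (all numbers below are invented for the purpose), two series that merely both trend upward will produce a near-perfect correlation coefficient, no matter how unrelated the things they measure are:

```python
# Two invented yearly series that have nothing to do with one another:
# both just happen to trend upward, so Pearson's r comes out near 1 anyway.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

cheese_kg_per_person = [14.2, 14.9, 15.4, 16.1, 16.8, 17.5]  # made up
archives_launched = [21, 25, 28, 33, 36, 41]                  # made up

r = pearson(cheese_kg_per_person, archives_launched)
print(round(r, 3))  # very close to 1.0, yet the "relationship" means nothing
```

A correlation that strong looks like evidence, but here it is nothing more than two lines that happen to slope the same way.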
Seemingly good statistics can make anything seem like objective truth when there might not be anything more than a pretty picture: -![bad statistics make a dinosaur](/assets/reading-at-scale/distant-reading-dinosaur.png) +![bad statistics make a dinosaur](/assets/reading-at-scale/distant-reading-dinosaur.jpg) And a flashy visualization can just as easily show nothing: -![bad visualization](/assets/reading-at-scale/distant-reading-graphs.png) +![bad visualization](/assets/reading-at-scale/distant-reading-graphs.jpg) Your own results might stem from a setting that you have configured just slightly incorrectly. Or maybe you uploaded the wrong text. Or maybe you are misunderstanding how the tool works in the first place. If something has you scratching your head, take a step back and see if there is something wrong with your setup. diff --git a/reading-at-scale/voyant-part-two.md index 9d652e8..0208a64 100644 --- a/reading-at-scale/voyant-part-two.md +++ b/reading-at-scale/voyant-part-two.md @@ -2,13 +2,13 @@ Look back at the word cloud that Voyant gave us for _The String of Pearls_: -![voyant default wordcloud of string of pearls](/assets/reading-at-scale/voyant-word-cloud-default.png) +![voyant default wordcloud of string of pearls](/assets/reading-at-scale/voyant-word-cloud-default.jpg) Using the standard stopword filter in Voyant, the most common word by far is 'said.' Taken alone, that might not mean an awful lot to you. But it implies a range of conversations: people speaking to each other, about different things. One of the limitations of a concordance is that it only shows you a very high-level view of the text. Once you find an interesting observation, such as 'said' being the most frequent word in the text, you might want to drill down more deeply to see particular instances of the usage. Voyant can help you do just that by providing a number of context-driven tools.
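At their core, context tools like these are keyword-in-context (KWIC) displays. Here is a minimal sketch of the idea, using a made-up sentence as a stand-in for the novel itself:

```python
# A minimal keyword-in-context (KWIC) sketch of the sort of context display
# Voyant offers. The sample sentence is a stand-in, not _The String of Pearls_.

def kwic(text: str, keyword: str, window: int = 3):
    """Yield (left context, keyword, right context) for each hit."""
    words = text.lower().split()
    for i, word in enumerate(words):
        if word.strip('.,!?"\'') == keyword:
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            yield (left, word, right)

sample = 'Forgive me, said Sweeney Todd, I was not myself, said he.'
for left, hit, right in kwic(sample, "said"):
    print(f"{left:>25} | {hit} | {right}")
```

Each printed row is one "context": the keyword with a few words on either side, which is exactly what the contexts pane lists for you.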
In the bottom-right pane Voyant provides a series of options for examining the contexts around a particular word. You can change the word being examined by selecting a new word from the 'Reader' pane. By adjusting the context slider, you can modify exactly how much context \(i.e., how many words\) you see around the instances of the word you are examining. Tools like these can be helpful for interpreting the more quantitative results that the tool provides you. 670 instances of 'said' might not mean an awful lot, and the contexts pane can help you to understand how this word is being used. In this case, it can be useful for seeing different conversations: frequently, 'said' followed by a name indicates dialogue from a particular character. -![voyant contexts](/assets/reading-at-scale/voyant-contexts.png) +![voyant contexts](/assets/reading-at-scale/voyant-contexts.jpg) In this list of the first ten uses of 'said', two of them are closely joined with a name: 'Sweeney Todd.' If we look back at the word cloud for the text, we can see that these two words occur with high frequency in the text itself. Given this information, we might become interested in a series of related questions: How often is he talking? What is he talking about? Who is he talking to? @@ -22,7 +22,7 @@ You may have heard of **ngrams** from [the Google Ngram Viewer](https://books.google.com/ngrams) **Collocations** are words that tend to occur together in meaningful patterns: so 'good night' is a collocation because it is part of a recognized combination of words whose meaning changes when put together. 'A night,' on the other hand, is not a collocation because the words do not form a unit of meaning in the same way. We can think of collocations as bigrams that occur with such frequency that the combination itself is meaningful in some way.
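The mechanical first step in spotting collocations is simply counting bigrams. A sketch over a tiny invented text:

```python
# Counting bigrams -- adjacent word pairs -- is the mechanical first step
# toward spotting collocations. The sample text is invented for illustration.
from collections import Counter

text = "good night said sweeney todd good night said the boy"
words = text.split()
bigrams = Counter(zip(words, words[1:]))

for pair, count in bigrams.most_common(3):
    print(pair, count)
```

Deciding which of those frequent pairs are genuine collocations, rather than accidents of phrasing, is the interpretive step that no counter can do for you.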
-![voyant collocates](/assets/reading-at-scale/voyant-collocates.png) +![voyant collocates](/assets/reading-at-scale/voyant-collocates.jpg) In this case, it helps to know that the 'context' slider allows you to find sentences where two words occur near each other. So setting a context of three for 'sweeney' and 'todd' will give you all the three-word phrases in which those two words occur: they do not need to be contiguous. So in this case "Sweeney Todd said" would match, as would "Sweeney said Todd." Each row tells you how often those words appear within a certain distance from each other. @@ -32,7 +32,7 @@ Click on the row that lists 'said sweeney 52'. Many of the windows in Voyant are When you do, you will see a graph of the selected collocation over time. 'Sweeney' and 'said' occur within a space of three words in highly variable amounts over the course of the text. By looking at the graph, we can get a rough idea of when Sweeney Todd speaks over the course of the narrative. -![graph of sweeney said](/assets/reading-at-scale/sweeney-said.png) +![graph of sweeney said](/assets/reading-at-scale/sweeney-said.jpg) To graph things, Voyant breaks up your document into a series of segments \(you can change how many it uses\). Within each piece of the text it calculates how often the selected phrase or word appears. In this case, we might say Sweeney Todd talks significantly more in the first 70% of the text than he does in the last portion. Since you read the last few chapters of the novel, you might have a sense of why this is. The end of the text deals more with revelations about Todd's actions \(and the consequences of those actions\) than his actions themselves. Of course, you wouldn't know this if you hadn't read portions of the text, a good example of how "distant reading" and regular, old-fashioned reading can and should enrich each other.
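The segmenting step behind such trend graphs can be sketched directly; the "document" below is invented, but the procedure is the same idea in miniature:

```python
# A sketch of how a trends graph is computed: split the text into equal
# segments and count a word in each. The sample "document" is invented.

def counts_by_segment(text: str, word: str, segments: int = 5):
    words = text.lower().split()
    size = max(1, len(words) // segments)
    chunks = [words[i:i + size] for i in range(0, len(words), size)]
    return [chunk.count(word) for chunk in chunks[:segments]]

doc = "said " * 6 + "todd " * 4 + "quiet " * 10
print(counts_by_segment(doc, "said", segments=4))  # [5, 1, 0, 0]
```

Plot those per-segment counts left to right and you have, in essence, the line Voyant draws; changing the number of segments changes how smooth or jagged that line looks.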
diff --git a/topic-modeling/bags-of-words.md index b913b8c..f70d6ac 100644 --- a/topic-modeling/bags-of-words.md +++ b/topic-modeling/bags-of-words.md @@ -14,7 +14,7 @@ The excerpt is from a letter about the Jack the Ripper murders from the *Pall Mall Gazette*. Now let's take the same materials but highlight each word. -![topic modeling highlights](/assets/topic-modeling/topic-modeling-highlights.png) +![topic modeling highlights](/assets/topic-modeling/topic-modeling-highlights.jpg) > How does this work?
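In miniature, it starts by reducing the text to a bag of words: word order is thrown away and only counts remain. A sketch, using a stand-in sentence rather than the actual letter:

```python
# A bag-of-words sketch: throw away word order, keep only word counts.
# The sentence is a stand-in, not the actual letter from the newspaper.
from collections import Counter
import string

sentence = "The murders, the police said, shocked the whole of London."
words = [w.strip(string.punctuation).lower() for w in sentence.split()]
bag = Counter(words)

print(bag["the"])  # 3
```

Once every document is just a bag of counts like this, the computer can compare documents by their word distributions, which is the raw material topic modeling works from.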