[Dataset] Typo in util.py causes WikiConv downloads to fail #350

Description

@jpwchang

If you run any of the example code from the WikiConv documentation, e.g., wikiconv_russian_2004 = Corpus(filename=download("wikiconv-russian-2004")), it immediately crashes with a 404 error. From my debugging, the cause is a simple typo in util.py: the _get_wikiconv_year_info function builds the download URL with the string "corpus_zipped", whereas the actual URL on the zissou server uses "corpus-zipped" (dash instead of underscore).
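To illustrate the mismatch, here is a minimal sketch of how a single character in the directory name changes the resulting URL. The base URL, the path layout, and the build_year_url helper are hypothetical stand-ins for illustration only; they are not ConvoKit's actual code or the real server path.

```python
# Hypothetical stand-in for the URL-building logic; NOT ConvoKit's actual code.
# BASE_URL is a placeholder, not the real zissou server path.
BASE_URL = "https://example.invalid/wikiconv-corpus/"


def build_year_url(year: str, zipped_dir: str) -> str:
    """Join the base URL, the zipped-corpus directory name, and the year."""
    return f"{BASE_URL}{zipped_dir}/{year}/"


# Underscore variant: the string the issue reports in util.py (request 404s).
broken_url = build_year_url("2004", "corpus_zipped")

# Dash variant: the directory name the issue reports as existing on the server.
fixed_url = build_year_url("2004", "corpus-zipped")

print(broken_url)
print(fixed_url)
```

The two URLs differ only in the underscore versus the dash, which would be consistent with every WikiConv download failing in the same way.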

Steps to reproduce

  • Run download("wikiconv-russian-2004") (or any other wikiconv corpus)
  • Observe that this immediately dies with a 404 error

Additional information

This was tested on the latest ConvoKit release (4.1.1) running in a Python 3.11.15 conda environment on a Linux server. The typo is also still present as of the most recent commits on the ConvoKit GitHub.
