User Documentation: Importing a Dataset

For the import system to read your dataset without requiring you to write any code, the dataset must follow a specific layout. The command-line utility takes the path to the folder (directory) where your dataset is located. That folder must contain a file named dataset_metadata.txt and a directory named documents. The format of each is specified below.
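
Before running the import, it can help to confirm that a folder has the layout described above. The following is a minimal sketch, not part of the import system itself; the function name check_dataset_layout is hypothetical, and only the two required names (dataset_metadata.txt and documents) come from this page.

```python
from pathlib import Path


def check_dataset_layout(dataset_dir):
    """Return a list of layout problems; an empty list means none were found.

    Hypothetical helper: it only checks for the two entries this page requires.
    """
    root = Path(dataset_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]

    problems = []
    # The metadata file must exist directly inside the dataset folder.
    if not (root / "dataset_metadata.txt").is_file():
        problems.append("missing file: dataset_metadata.txt")
    # The documents directory holds one file per document.
    if not (root / "documents").is_dir():
        problems.append("missing directory: documents")
    return problems
```

Calling check_dataset_layout on the folder you intend to import and printing the returned list shows anything that is missing.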

Dataset Metadata Format

The dataset_metadata.txt file specifies metadata about the dataset itself, not to be confused with document metadata. Each line in the file represents one metadata key-value pair. The key is text naming the data, and the value is text, a date, or a number. The key and the value are separated by the first colon (":") on the line; any other colons on the same line are part of the value, not the key. White space is removed from both sides of the key and the value. Only two keys are special: readable_name and description. The readable_name key specifies the name of the dataset as it will appear on the website. The description key specifies the text that will be displayed to describe the dataset. For example:

readable_name: My Example Dataset
description: This is just a description. This dataset is a trivial example.
a number: 42
a date: 12/31/2011
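
The parsing rules above (split on the first colon, trim white space) can be sketched as follows. This is an illustration under stated assumptions, not the import system's actual code: the function name parse_dataset_metadata is hypothetical, values are kept as plain strings because this page does not say how dates and numbers are detected, and skipping lines without a colon is an assumption.

```python
def parse_dataset_metadata(text):
    """Parse the contents of dataset_metadata.txt into a dict of strings."""
    metadata = {}
    for line in text.splitlines():
        if ":" not in line:
            # Assumption: lines without a colon (such as blank lines) are skipped.
            continue
        # Split on the first colon only, so later colons stay in the value.
        key, value = line.split(":", 1)
        metadata[key.strip()] = value.strip()
    return metadata


example = """readable_name: My Example Dataset
description: This is just a description. This dataset is a trivial example.
a number: 42
a date: 12/31/2011
"""
parsed = parse_dataset_metadata(example)
print(parsed["readable_name"])  # My Example Dataset
```

Because only the first colon separates key from value, a line such as "source: http://example.com" yields the key "source" and the value "http://example.com".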

Documents Directory and Format

The documents directory must contain files, each of which represents one document. Each document file must be encoded in ASCII or UTF-8; otherwise, some characters may be dropped during the import process. The top of each document must contain the document's metadata. After a blank line, the text of the document begins. For example:

metadata: this is some metadata

The document text starts here.
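
The split between document metadata and document text can be sketched as below. This is a hedged illustration only: the function name split_document is hypothetical, and it assumes that the metadata lines use the same key: value form as the example above and that the first blank line ends the metadata, neither of which this page states explicitly.

```python
def split_document(raw):
    """Split a document's contents into (metadata dict, body text)."""
    lines = raw.splitlines()
    metadata = {}
    body_start = len(lines)  # If no blank line is found, the body is empty.
    for i, line in enumerate(lines):
        if not line.strip():
            # Assumption: the first blank line ends the metadata section.
            body_start = i + 1
            break
        if ":" in line:
            # Assumption: metadata lines are key: value pairs.
            key, value = line.split(":", 1)
            metadata[key.strip()] = value.strip()
    return metadata, "\n".join(lines[body_start:])


raw = "metadata: this is some metadata\n\nThe document text starts here.\n"
doc_metadata, doc_text = split_document(raw)
print(doc_metadata)  # {'metadata': 'this is some metadata'}
print(doc_text)      # The document text starts here.
```

When reading a document from disk for this function, opening it with encoding="utf-8" covers both accepted encodings, since ASCII is a subset of UTF-8.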
