
Programmer Documentation: Import System

craig8196 edited this page Feb 9, 2015 · 1 revision

This document covers how to write code that interfaces with a custom-formatted dataset and how to create an analysis tool that generates topic distributions for a dataset. To import a dataset that uses a standard document format, see Importing a Dataset.

Datasets

To implement your own dataset interface, create a Python script in the import_tool/dataset/interfaces directory. In that script, create a class that inherits from the AbstractDataset class found in import_tool/dataset/interfaces/abstract_dataset.py. There is also an AbstractDocument class that must be implemented. From there, follow the comments in abstract_dataset.py. To make the dataset class work with the command-line utility, add your class to the import_tool/dataset/interfaces/__init__.py file. The command-line utility has a --dry-run flag to aid with debugging.
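The subclassing pattern described above might look roughly like the following sketch. The two base classes here are stand-ins defined inline so the example runs on its own; their method names (get_name, get_content, __iter__) are assumptions made for illustration, and the methods you actually need to implement are the ones documented in abstract_dataset.py. PlainTextDataset and PlainTextDocument are likewise hypothetical names.

```python
import os
from abc import ABC, abstractmethod


# Stand-ins for the real base classes in
# import_tool/dataset/interfaces/abstract_dataset.py.
# The method names are hypothetical; consult that file for the real ones.
class AbstractDocument(ABC):
    @abstractmethod
    def get_name(self):
        """Return an identifier for this document."""

    @abstractmethod
    def get_content(self):
        """Return the document text."""


class AbstractDataset(ABC):
    @abstractmethod
    def get_name(self):
        """Return an identifier for this dataset."""

    @abstractmethod
    def __iter__(self):
        """Yield AbstractDocument instances."""


class PlainTextDocument(AbstractDocument):
    """One .txt file treated as one document."""

    def __init__(self, path):
        self.path = path

    def get_name(self):
        return os.path.basename(self.path)

    def get_content(self):
        with open(self.path, encoding="utf-8") as f:
            return f.read()


class PlainTextDataset(AbstractDataset):
    """A directory of .txt files treated as a dataset."""

    def __init__(self, directory):
        self.directory = directory

    def get_name(self):
        # Use the directory name as the dataset name.
        return os.path.basename(os.path.normpath(self.directory))

    def __iter__(self):
        # Sort so that documents are yielded in a stable order.
        for filename in sorted(os.listdir(self.directory)):
            if filename.endswith(".txt"):
                yield PlainTextDocument(os.path.join(self.directory, filename))
```

In a real interface you would import AbstractDataset and AbstractDocument from abstract_dataset.py instead of redefining them, then add your class to import_tool/dataset/interfaces/__init__.py.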

Analyses

Creating an analysis for an imported dataset is similar to creating a dataset interface. First, create a Python script in import_tool/analysis/interfaces. Second, create a class that inherits from AbstractAnalysis, found in import_tool/analysis/interfaces/abstract_analysis.py. Third, follow the documentation in abstract_analysis.py. Fourth, hook your analysis up to the command-line utility by adding the class to a list in import_tool/analysis/interfaces/__init__.py. Then use the --dry-run flag to test your analysis on an existing dataset.
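As a rough illustration of the steps above, the sketch below defines a toy analysis that assigns each document a distribution over topics by counting keyword matches. It is not a real topic model and not the project's method. The inline AbstractAnalysis is a stand-in so the example is runnable, and its method names (get_name, run) as well as the KeywordTopicAnalysis class and its input and output shapes are assumptions; the real interface is documented in abstract_analysis.py.

```python
from abc import ABC, abstractmethod
from collections import Counter


# Stand-in for the real base class in
# import_tool/analysis/interfaces/abstract_analysis.py.
# The method names are hypothetical; consult that file for the real ones.
class AbstractAnalysis(ABC):
    @abstractmethod
    def get_name(self):
        """Return an identifier for this analysis."""

    @abstractmethod
    def run(self, documents):
        """Return a topic distribution for each document."""


class KeywordTopicAnalysis(AbstractAnalysis):
    """Toy analysis: topic weight = share of keyword hits per topic."""

    def __init__(self, topics):
        # topics maps a topic name to an iterable of lowercase keywords.
        self.topics = {name: set(words) for name, words in topics.items()}

    def get_name(self):
        return "keyword_topics"

    def run(self, documents):
        # documents maps a document name to its text.
        distributions = {}
        for doc_name, text in documents.items():
            counts = Counter()
            for token in text.lower().split():
                for topic, words in self.topics.items():
                    if token in words:
                        counts[topic] += 1
            total = sum(counts.values())
            # Normalize to proportions; all zeros if nothing matched.
            distributions[doc_name] = {
                topic: (counts[topic] / total if total else 0.0)
                for topic in self.topics
            }
        return distributions
```

A real analysis would inherit from the project's AbstractAnalysis and be added to the list in import_tool/analysis/interfaces/__init__.py so that the command-line utility can find it.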

Import System Utilities

If you don't want to use the command-line utility, or you want to give the user some other way to select options before importing a dataset or running an analysis, use the functions in import_tool/import_system_utilities.py. This file contains the code that imports data into the database through the dataset and analysis interfaces described above. See the documentation in the file for more details.