Programmer Documentation: Import System
This document covers how to write code that interfaces with a custom-formatted dataset, and how to create an analysis tool that generates topic distributions for a dataset. To import a dataset that uses a standard document format, see Importing a Dataset.
To implement your own dataset interface, create a Python script in the import_tool/dataset/interfaces directory. In that script, create a Python class that inherits from the AbstractDataset class found in import_tool/dataset/interfaces/abstract_dataset.py (there is also an AbstractDocument class that must be implemented). From there, follow the comments in abstract_dataset.py. To make the dataset class work with the command-line utility, add your class to the import_tool/dataset/interfaces/__init__.py file. Note that the command-line utility accepts a --dry-run flag to aid with debugging.
Adding an analysis for an imported dataset is similar to creating a dataset interface. First, create a Python script in import_tool/analysis/interfaces. Second, create a class that inherits from AbstractAnalysis, found in import_tool/analysis/interfaces/abstract_analysis.py. Third, follow the documentation in abstract_analysis.py. Fourth, hook your analysis up to the command-line utility by adding the class to the list in import_tool/analysis/interfaces/__init__.py. Then use the --dry-run flag to test your analysis on an existing dataset.
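As a rough illustration of the steps above, the sketch below subclasses a stand-in AbstractAnalysis and produces a per-document topic distribution. The method name run_analysis, its input and output shapes, and the KeywordAnalysis class are all assumptions made for this example; the actual required methods are documented in abstract_analysis.py. The "analysis" here is a toy keyword counter, not a real topic model.

```python
from abc import ABC, abstractmethod
from collections import Counter


# Stand-in for the real base class in
# import_tool/analysis/interfaces/abstract_analysis.py. The method name
# run_analysis is an assumption for illustration only.
class AbstractAnalysis(ABC):
    @abstractmethod
    def run_analysis(self, documents):
        """Return {document name: {topic: proportion}}."""


class KeywordAnalysis(AbstractAnalysis):
    """Toy analysis that treats each keyword as a topic.

    This is not a real topic model; it only shows the shape of a subclass.
    """

    def __init__(self, keywords):
        self.keywords = list(keywords)

    def run_analysis(self, documents):
        result = {}
        for name, text in documents.items():
            words = text.lower().split()
            counts = Counter(w for w in words if w in self.keywords)
            total = sum(counts.values())
            if total:
                # Normalize so each document's proportions sum to 1.
                result[name] = {k: counts[k] / total for k in self.keywords}
            else:
                # No keyword hits: leave the distribution empty.
                result[name] = {}
        return result


docs = {"doc1": "cats and dogs and cats", "doc2": "nothing relevant"}
distributions = KeywordAnalysis(["cats", "dogs"]).run_analysis(docs)
```

A real analysis would import AbstractAnalysis from abstract_analysis.py and be added to the list in import_tool/analysis/interfaces/__init__.py so the command-line utility can find it.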
If you don't want to use the command-line utility, or you want to give the user some other way to select options before importing a dataset or running an analysis, use the methods in import_tool/import_system_utilities.py. This file contains the methods that import data into the database through the interfaces described above. See the documentation in the file for more details.