Allow loading already-indexed data #352

matt-gardner · 2017-05-11T18:37:03Z

This would cut down pre-processing time, at the cost of having to make sure the vocabulary files (and anything else the indices depend on) match the ones used when the data was indexed. It would probably also make some of the sequence tagging stuff simpler.

There are two ways to do this. The first depends on #328: each script would have an option to output a pre-indexed file, running the data indexing code and saving the results. The second would be a stand-alone script that just runs the pre-processing and saves the data indexer. The second option is probably cleaner, and doesn't depend on #328. Either way, you'd also have to add an option to TextTrainer that tells it that it's loading a pre-indexed dataset, and add a way to save and load IndexedInstances (maybe just pickling them).
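A minimal sketch of the "just pickle them" idea, to make the vocabulary concern concrete. Everything here is an assumption for illustration: `IndexedInstance` is a plain-tuple stand-in rather than the library's actual class, and `vocab_fingerprint`, `save_indexed`, and `load_indexed` are hypothetical helpers, not existing functions. The indexed data is stored next to a hash of the vocabulary, so loading with a different vocabulary fails loudly instead of silently producing wrong indices.

```python
import hashlib
import pickle
from typing import Dict, List, Tuple

# Hypothetical stand-in for an indexed instance: (word indices, label).
# The real IndexedInstance classes would need their own pickling support.
IndexedInstance = Tuple[List[int], int]


def vocab_fingerprint(vocab: Dict[str, int]) -> str:
    """Hash the sorted (token, index) pairs of a vocabulary."""
    payload = repr(sorted(vocab.items())).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def save_indexed(path: str,
                 instances: List[IndexedInstance],
                 vocab: Dict[str, int]) -> None:
    """Pickle the instances together with a fingerprint of the vocabulary."""
    data = {
        "vocab_fingerprint": vocab_fingerprint(vocab),
        "instances": list(instances),
    }
    with open(path, "wb") as f:
        pickle.dump(data, f)


def load_indexed(path: str, vocab: Dict[str, int]) -> List[IndexedInstance]:
    """Load pickled instances, refusing to proceed if the vocabulary differs."""
    with open(path, "rb") as f:
        data = pickle.load(f)  # only unpickle files you created yourself
    if data["vocab_fingerprint"] != vocab_fingerprint(vocab):
        raise ValueError(
            "Pre-indexed data was built with a different vocabulary")
    return data["instances"]
```

An option on TextTrainer could then call something like `load_indexed` in place of the usual read-and-index step. Pickle is convenient but ties the saved files to the class definitions and should only be used on trusted files.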


matt-gardner added New API feature P2 Performance improvement labels May 11, 2017

matt-gardner mentioned this issue Jun 1, 2017

Let datasets know how to load themselves #378

Merged
