This repository was archived by the owner on Jan 19, 2019. It is now read-only.

Allow loading already-indexed data #352

Open
matt-gardner opened this issue May 11, 2017 · 0 comments

Comments

@matt-gardner
Contributor

This would cut down pre-processing time, at the cost of having to make sure you're loading the vocabulary files that match the indexed data. It would probably also make some of the sequence tagging code simpler.

One way to do this depends on #328: each script would get an option to output a pre-indexed file, running the data indexing code and saving the results. Alternatively, this could be a stand-alone script that just runs the pre-processing and saves the data indexer. The second option is probably cleaner, and it doesn't depend on #328. Either way, you'd also have to add an option to TextTrainer that tells it that it's loading a pre-indexed dataset, and add a way to save and load IndexedInstances (maybe just by pickling them).
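To make the stand-alone script idea concrete, here is a rough sketch of the save/load round trip using pickle. It does not use the real data indexer, TextTrainer, or IndexedInstance classes: `build_vocab`, `index_sentences`, `save_indexed`, and `load_indexed` are hypothetical names, the vocabulary is a plain dict, and each indexed instance is just a list of word indices. Storing the vocabulary next to the indexed data is one assumed way to catch the "wrong vocabulary file" problem at load time.

```python
import os
import pickle
import tempfile
from typing import Dict, List, Optional, Tuple

# Hypothetical reserved tokens; the real data indexer may use different ones.
PADDING, UNKNOWN = "@@PADDING@@", "@@UNKNOWN@@"


def build_vocab(sentences: List[str]) -> Dict[str, int]:
    """Assign an index to every lowercased, whitespace-separated token."""
    vocab = {PADDING: 0, UNKNOWN: 1}
    for sentence in sentences:
        for token in sentence.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def index_sentences(sentences: List[str], vocab: Dict[str, int]) -> List[List[int]]:
    """Convert each sentence to a list of word indices (the slow step to cache)."""
    unknown = vocab[UNKNOWN]
    return [[vocab.get(token, unknown) for token in sentence.lower().split()]
            for sentence in sentences]


def save_indexed(path: str, vocab: Dict[str, int], indexed: List[List[int]]) -> None:
    """Pickle the indexed data together with the vocabulary that produced it."""
    with open(path, "wb") as handle:
        pickle.dump({"vocab": vocab, "indexed": indexed}, handle)


def load_indexed(path: str,
                 expected_vocab: Optional[Dict[str, int]] = None
                 ) -> Tuple[Dict[str, int], List[List[int]]]:
    """Load pre-indexed data, optionally checking it matches the vocabulary in use."""
    with open(path, "rb") as handle:
        payload = pickle.load(handle)
    if expected_vocab is not None and payload["vocab"] != expected_vocab:
        raise ValueError("Pre-indexed data was built with a different vocabulary")
    return payload["vocab"], payload["indexed"]


if __name__ == "__main__":
    sentences = ["the cat sat", "the dog sat down"]
    vocab = build_vocab(sentences)
    indexed = index_sentences(sentences, vocab)
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "indexed.pkl")
        save_indexed(path, vocab, indexed)
        loaded_vocab, loaded_indexed = load_indexed(path, expected_vocab=vocab)
        print(loaded_indexed)
```

In this sketch the trainer-side option would amount to calling something like `load_indexed` instead of re-running the indexing step. Since unpickling can execute arbitrary code, this only makes sense for files you produced yourself.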
