TODO.md

  • Rename the repositories in packages/storage to use the word Storage instead of Repository.

  • Vector Storage (not chunk storage)

    • Rename the files from packages/storage/src/vector-storage to packages/storage/src/vector
    • No fixed column names; use the schema to define the columns.
    • Add an option for which column to use if there are multiple; default to the first one.
    • Use @mceachen/sqlite-vec for sqlite storage.
  • Chunks and nodes are not always the same.

    • We may also need to save the chunk's node path (or paths), a document range, or some standard metadata.
    • Instead of passing doc_id around, pass a document key of type unknown (a string or an object).
  • Get a better model for question answering.

  • Get a better model for named entity recognition; the current one recognizes everything as a token, which is not helpful.

  • Titles are not making it into the chunks.

  • Tests for CLI commands.

  • Add ability for queues to specify if inputs should be converted to text, binary blob, a transferable object, structured clone, or just passed as is.

  • Add specialized versions of the task queues for hugging face transformers and tensorflow mediapipe.

  • Add audio conversion, like the image conversion.

  • Rename the registration functions so they look less ugly: registerHuggingfaceTransformers(), registerHuggingfaceTransformersUsingWorkers(), and registerHuggingfaceTransformersInsideWorker().

  • Fix image transferables.

  • Consider different ways to connect tasks to queues. What is a task? What is a job?
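
For the queue input item above (text, binary blob, transferable object, structured clone, or passed as is), one possible shape is a per-queue mode that decides how an input is prepared before it is handed to a worker. This is only a sketch: `InputTransferMode`, `PreparedInput`, and `prepareInput` are hypothetical names, not existing code in this repository, and the binary case assumes UTF-8 bytes in a `Uint8Array` rather than a real `Blob`.

```typescript
// Hypothetical sketch: none of these names exist in the repository.
type InputTransferMode =
  | "text"
  | "binary"
  | "transferable"
  | "structured-clone"
  | "as-is";

interface PreparedInput {
  // What would be handed to the job (or to postMessage).
  payload: unknown;
  // Buffers whose ownership should move to the worker instead of being copied.
  transfer: ArrayBuffer[];
}

function toText(input: unknown): string {
  return typeof input === "string" ? input : JSON.stringify(input);
}

function prepareInput(input: unknown, mode: InputTransferMode): PreparedInput {
  switch (mode) {
    case "text":
      // Serialize non-string inputs as JSON text.
      return { payload: toText(input), transfer: [] };
    case "binary":
      // Assumption: "binary blob" is modeled as UTF-8 bytes.
      if (input instanceof Uint8Array) return { payload: input, transfer: [] };
      return { payload: new TextEncoder().encode(toText(input)), transfer: [] };
    case "transferable":
      // Only ArrayBuffer is handled here; other transferables are left out of the sketch.
      if (!(input instanceof ArrayBuffer)) {
        throw new TypeError("transferable mode expects an ArrayBuffer");
      }
      return { payload: input, transfer: [input] };
    case "structured-clone":
      // Deep copy up front, so the caller's object is never shared with the job.
      return { payload: structuredClone(input), transfer: [] };
    case "as-is":
      // Same reference, no conversion.
      return { payload: input, transfer: [] };
  }
}
```

A queue could then call `worker.postMessage(prepared.payload, prepared.transfer)` for worker-backed jobs, or pass `prepared.payload` directly for in-process jobs.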

onnx-community/ModernBERT-finetuned-squad-ONNX - summarization

The sqlitevectorstorage currently does not use a built-in vector search. Use @mceachen/sqlite-vec for vector indexing in the sqlite storage.
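
For context, a storage without built-in vector search usually has to load every row and score it in application code. The sketch below illustrates that brute-force pattern with cosine similarity. It is an assumption about the general approach, not a copy of the current implementation, and `VectorRow`, `cosineSimilarity`, and `bruteForceSearch` are hypothetical names.

```typescript
// Hypothetical sketch of a brute-force search; not the repository's actual code.
interface VectorRow {
  key: string;
  vector: Float32Array;
}

interface ScoredRow {
  key: string;
  score: number;
}

function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error("vector dimensions do not match");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Treat a zero-length vector as having no similarity to anything.
  return denom === 0 ? 0 : dot / denom;
}

// Scores every row against the query: O(rows x dimensions) work per search.
function bruteForceSearch(
  rows: VectorRow[],
  query: Float32Array,
  topK: number
): ScoredRow[] {
  return rows
    .map((row) => ({ key: row.key, score: cosineSimilarity(row.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Every search touches every stored vector, which is the cost that moving the search into the database with a vector extension is meant to avoid.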