Skip to content

Latest commit

 

History

History
9 lines (7 loc) · 264 Bytes

README.md

File metadata and controls

9 lines (7 loc) · 264 Bytes

A Basic Byte Pair Encoder Tokeniser with a GUI and included training data.

Trains on some given text data, and then reconstructs the input.

To Do:

  • handle memory better for 1 gb txt file (ie text8)
  • CLI
  • remove the unsafe
  • improve the README