A Basic Byte Pair Encoder Tokeniser with a GUI and included training data.
Trains on some given text data, and then reconstructs the input.
To Do:
- handle memory better for 1 gb txt file (ie text8)
- CLI
- remove the unsafe
- improve the README
A Basic Byte Pair Encoder Tokeniser with a GUI and included training data.
Trains on some given text data, and then reconstructs the input.
To Do: