ConnorArmstrong/rs-tokeniser

A basic byte pair encoding (BPE) tokeniser with a GUI and bundled training data.

Trains on the given text data, then reconstructs the input from the resulting tokens.
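
To illustrate the train-then-reconstruct idea, here is a minimal sketch of byte-level BPE in Rust. It is not taken from this repository: the names `train`, `encode`, `decode` and `merge_pair`, the `Merge` type, and the choice to start from raw bytes with new ids from 256 upward are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// A learned merge (hypothetical type): the pair (left, right) becomes `new_id`.
type Merge = ((u32, u32), u32);

/// Replace every non-overlapping occurrence of `pair` in `tokens` with `new_id`.
fn merge_pair(tokens: &[u32], pair: (u32, u32), new_id: u32) -> Vec<u32> {
    let mut out = Vec::with_capacity(tokens.len());
    let mut i = 0;
    while i < tokens.len() {
        if i + 1 < tokens.len() && (tokens[i], tokens[i + 1]) == pair {
            out.push(new_id);
            i += 2;
        } else {
            out.push(tokens[i]);
            i += 1;
        }
    }
    out
}

/// Learn up to `num_merges` merges from `text`, starting from raw bytes (ids 0..=255).
/// Returns the merges in learned order and the encoded training text.
fn train(text: &str, num_merges: usize) -> (Vec<Merge>, Vec<u32>) {
    let mut tokens: Vec<u32> = text.bytes().map(u32::from).collect();
    let mut merges = Vec::new();
    for step in 0..num_merges {
        // Count adjacent pairs in the current token sequence.
        let mut counts: HashMap<(u32, u32), usize> = HashMap::new();
        for w in tokens.windows(2) {
            *counts.entry((w[0], w[1])).or_insert(0) += 1;
        }
        // Pick the most frequent pair; ties go to the smallest pair so the
        // result does not depend on HashMap iteration order.
        let best = counts
            .into_iter()
            .max_by(|a, b| a.1.cmp(&b.1).then(b.0.cmp(&a.0)));
        let (pair, count) = match best {
            Some(found) => found,
            None => break,
        };
        if count < 2 {
            break; // nothing left that is worth merging
        }
        let new_id = 256 + step as u32;
        tokens = merge_pair(&tokens, pair, new_id);
        merges.push((pair, new_id));
    }
    (merges, tokens)
}

/// Encode new text by replaying the merges in the order they were learned.
fn encode(text: &str, merges: &[Merge]) -> Vec<u32> {
    let mut tokens: Vec<u32> = text.bytes().map(u32::from).collect();
    for &(pair, new_id) in merges {
        tokens = merge_pair(&tokens, pair, new_id);
    }
    tokens
}

/// Reconstruct the original bytes by expanding merged ids back into their pairs.
fn decode(tokens: &[u32], merges: &[Merge]) -> Vec<u8> {
    let table: HashMap<u32, (u32, u32)> =
        merges.iter().map(|&(pair, id)| (id, pair)).collect();
    // Reverse so that popping from the stack yields tokens left to right.
    let mut stack: Vec<u32> = tokens.iter().rev().copied().collect();
    let mut out = Vec::new();
    while let Some(token) = stack.pop() {
        match table.get(&token) {
            // Push right first so the left half is expanded first.
            Some(&(left, right)) => {
                stack.push(right);
                stack.push(left);
            }
            None => out.push(token as u8), // ids below 256 are raw bytes
        }
    }
    out
}
```

Calling `train(text, n)` and then `decode` on the returned tokens should give back the bytes of `text`, which is the round trip described above. This sketch rebuilds the whole token vector on every merge, so it is only suitable for small inputs.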

To Do:

  • handle memory better for a 1 GB txt file (i.e. text8)
  • CLI
  • remove the unsafe code
  • improve the README
