Releases: meta-pytorch/tokenizers
v1.0.1
What's Changed
- Add installation instructions to README by @mergennachin in #148
- Changes required to release pip wheel by @larryliu0820 in #150
Full Changelog: v1.0.0...v1.0.1
v1.0.0
What's Changed
- Add base64.h by @larryliu0820 in #1
- Enable trunk CI job by @larryliu0820 in #2
- Add base64 test by @larryliu0820 in #3
- Add CODE_OF_CONDUCT by @larryliu0820 in #4
- Add log.h by @larryliu0820 in #5
- Add tiktoken by @larryliu0820 in #6
- Add tiktoken tests by @larryliu0820 in #7
- Change license to align with torchchat/ET. by @larryliu0820 in #8
- Adding Contributing file by @facebook-github-bot in #9
- Include tk_pal_emit_log_message tk_pal_log_level_t into tokenizers by @larryliu0820 in #10
- HF Tokenizers by @gabe-l-hart in #11
- Change CMAKE_SOURCE_DIR to CMAKE_CURRENT_SOURCE_DIR by @larryliu0820 in #12
- Improve error message by @larryliu0820 in #13
- Format by @larryliu0820 in #14
- Pull machine by @larryliu0820 in #15
- Update trunk job to use macos-14-xlarge by @larryliu0820 in #16
- Buckify tokenizers by @larryliu0820 in #17
- Add llama2.c tokenizers by @larryliu0820 in #19
- Move headers from include/ to include/pytorch/tokenizers/ by @larryliu0820 in #20
- Migrate extension/llm/tokenizer python users to use the new repo by @larryliu0820 in #22
- Migrate helios' usage of extension/llm/tokenizer to pytorch/tokenizers by @larryliu0820 in #23
- Tokenizer test by @lucylq in #21
- Add setup.py by @larryliu0820 in #24
- Fix tiktoken test by @lucylq in #25
- Add a TARGETS file for executorch's buck build in oss by @larryliu0820 in #26
- Use executorch's glob_defs.bzl by @larryliu0820 in #28
- Extending build systems to include FBCode by @ChristianWLang in #27
- Add override to destructor by @larryliu0820 in #29
- Fix clang format by @larryliu0820 in #30
- Update bpe_tokenizer_base.h by @larryliu0820 in #31
- clang format the rest of code by @larryliu0820 in #32
- Rely on runtime_wrapper to provide supported platforms by @larryliu0820 in #33
- Add .clang-format by @larryliu0820 in #34
- Add TARGETS file for internal build by @larryliu0820 in #35
- Fix build for tokenizer tool script by @jackzhxng in #36
- Port string_integer_map and changes to tiktoken to pytorch by @larryliu0820 in #37
- Remove old tokenizer/ directory in ExecuTorch by @larryliu0820 in #39
- fix qnn export by @cccclai in #41
- Make it public by @helunwencser in #42
- Update pattern key for split pretokenizer by @jackzhxng in #38
- Include build files in gitignore by @jathu in #46
- Use common base class private functions for TikToken by @jackzhxng in #45
- Add regex interface with re2 and std::regex implementations by @jackzhxng in #48
- Decouple tokenizers from Re2 and use IRegex interface by @jackzhxng in #49
- Add pcre2 as re2 fallback by @jackzhxng in #50
- Initialize bos_tok_ = 0 in tokenizer.h by @kirklandsign in #54
- Pcre2 buck target in third-party (#55) by @jackzhxng in #58
- Gate regex lookahead in cmake behind compile flag by @jackzhxng in #59
- Fix CQS signal. Id] 29511954 -- readability-redundant-string-init in fbcode/pytorch/tokenizers by @facebook-github-bot in #57
- Fix pcre2 target by @jackzhxng in #60
- Log hf tokenizer load failure to cerr instead of cout by @jackzhxng in #61
- Add cmakelist for llama unicode by @jackzhxng in #62
- Fix duplicate bpe tokenizer base symbol by @jackzhxng in #63
- Forward fix missing vtable for bpe tokenizer by @jackzhxng in #64
- Handle null bos and eos token by @jackzhxng in #66
- Fix tokenizer special token handling by @jackzhxng in #67
- Accept custom pattern string and special tokens by @sxu in #69
- Revert "Fix tokenizer special token handling" by @jackzhxng in #72
- Revert "Handle null bos and eos token" by @jackzhxng in #73
- Add look ahead tiktoken target by @larryliu0820 in #75
- Add is_loaded() API by @larryliu0820 in #53
- Use weak symbol create_fallback_regex to separate the implementation using PCRE2 and std::regex by @larryliu0820 in #77
- Add regex unit tests and enable shared linkage in fbcode by @larryliu0820 in #78
- Reland #66 and #67 by @jackzhxng in #74
- Fix cmake for regex lookahead by @jackzhxng in #80
- Enable install find package by @larryliu0820 in #82
- Consolidate TokenIndex definition by @kimishpatel in #84
- Add sentencepiece tokenizer support to llm runner by @larryliu0820 in #85
- [hf] Add new features to HF tokenizer by @larryliu0820 in #87
- Use a small tokenizer.json for unit test (#92) by @larryliu0820 in #94
- Add support for more behavior in Split pretokenizer by @larryliu0820 in #93
- Fix test_hf_tokenizer by @larryliu0820 in #95
- Support gemma3 HF tokenizer.json by @larryliu0820 in #96
- Add Package.swift by @larryliu0820 in #97
- Add python bindings by @larryliu0820 in #98
- Only run python test in fbcode by @larryliu0820 in #100
- Fix SmolLM3 support and add unit test to cover it by @larryliu0820 in #102
- Support NFC Normalizer by @larryliu0820 in #104
- Don't build any binaries for 3rd party deps like sentencepiece by @shoumikhin in #105
- Fixes in the CMake install setup for EXPORT by @swolchok in #103
- Remove unused file by @larryliu0820 in #107
- Fix lint issues by @larryliu0820 in #108
- Don't use CMAKE_INSTALL_PREFIX in tokenizers-config.cmake by @swolchok in #106
- Install the sentencepiece headers and fix include path by @swolchok in #110
- Rename build files from TARGETS to BUCK (group ID: -4292110907067644689) by @bigfootjon in #112
- Remove unused exception parameter from pymk/aggregator/early_stage_ranking/EarlyStageRanker.cpp by @r-barnes in #117
- Add Tekken tokenizer implementation with Python bindings by @mergennachin in #118
- Add tekken to the tokenize tool by @jackzhxng in #119
- [EZ] Replace pytorch-labs with meta-pytorch by @ZainRizvi in #120
- [Windows] Fix build issues using Clang-CL on Windows, add CI by @GregoryComer in #121
- Forward fix tekken tokeniz...