feat: remove pympler dependency and add better way to calculate size of tokenizer cache #3580

jacopo-chevallard · 2025-01-30T11:31:08Z

We now compute the size of the tokenizer cache from the on-disk size of each tokenizer's files.
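As a rough illustration of the idea, the sketch below sums the on-disk size of a set of files. The function name `total_file_size` and the sample file names are hypothetical and not taken from the PR. Skipping missing paths is an assumption of this sketch, not confirmed behaviour.

```python
import os
import tempfile


def total_file_size(paths):
    """Return the combined size in bytes of the files in `paths` that exist."""
    return sum(os.path.getsize(p) for p in paths if p and os.path.isfile(p))


# Usage: two small stand-in "tokenizer files" in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    vocab = os.path.join(tmp, "vocab.json")    # hypothetical file name
    merges = os.path.join(tmp, "merges.txt")   # hypothetical file name
    with open(vocab, "wb") as f:
        f.write(b"x" * 100)
    with open(merges, "wb") as f:
        f.write(b"y" * 50)
    # The third path does not exist and is ignored by the sum.
    size = total_file_size([vocab, merges, os.path.join(tmp, "missing.bin")])
    print(size)
```

Measuring files on disk avoids walking the in-memory object graph, which is what a profiler such as pympler does, at the cost of only approximating the memory the loaded tokenizer actually occupies.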

linear · 2025-01-30T11:31:12Z

AmineDiro · 2025-01-30T11:47:25Z

core/quivr_core/llm/llm_endpoint.py

+        if not hasattr(self.tokenizer, "vocab_files_names") or not hasattr(
+            self.tokenizer, "init_kwargs"
+        ):
+            return 5 * 1024 * 1024
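The reviewed lines fall back to a hard-coded 5 MiB when the tokenizer lacks the attributes needed to locate its files, and the follow-up commit 09cbebe appears to move that default into a variable. A minimal sketch of such a guard follows. The constant name `DEFAULT_TOKENIZER_SIZE` and the helper `fallback_size_or_none` are assumptions for illustration, not names confirmed by the PR.

```python
from types import SimpleNamespace

# Hypothetical name for the default; the value matches the reviewed diff (5 MiB).
DEFAULT_TOKENIZER_SIZE = 5 * 1024 * 1024


def fallback_size_or_none(tokenizer):
    """Return the default size when file metadata is missing, else None."""
    if not hasattr(tokenizer, "vocab_files_names") or not hasattr(
        tokenizer, "init_kwargs"
    ):
        return DEFAULT_TOKENIZER_SIZE
    return None  # caller can go on to measure the tokenizer files


# Usage with stand-in objects rather than real tokenizers.
bare = SimpleNamespace()
full = SimpleNamespace(vocab_files_names={}, init_kwargs={})
bare_result = fallback_size_or_none(bare)
full_result = fallback_size_or_none(full)
```

Naming the default keeps the fallback value in one place, so it can be tuned without hunting for the literal.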



🤖 I have created a release *beep* *boop*

---

## [0.0.31](core-0.0.30...core-0.0.31) (2025-01-30)

### Features

* cache tokenizers ([#3558](#3558)) ([699dc2e](699dc2e))
* limit tokenizers cache size ([#3577](#3577)) ([e2a3bcb](e2a3bcb))
* remove pympler dependency and add better way to calculate size of tokenizer cache ([#3580](#3580)) ([2fbd5d4](2fbd5d4))
* remove tokenizer load ([#3576](#3576)) ([05e212a](05e212a))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

jacopo-chevallard added 3 commits January 30, 2025 11:06

chore: removed pympler dependency

3255114

feat: computing the size of the object without using pympler

79a2a8a

feat: compute tokenizer size from the size of the tokenizer files

619782d

jacopo-chevallard requested a review from AmineDiro January 30, 2025 11:31

dosubot bot added the size:M label (This PR changes 30-99 lines, ignoring generated files.) Jan 30, 2025

jacopo-chevallard changed the title ~~feature: remove pympler dependency and add better way to calculate size of tokenizer cache~~ feat: remove pympler dependency and add better way to calculate size of tokenizer cache Jan 30, 2025

AmineDiro reviewed Jan 30, 2025


feat: using variable for default size of tokenizer

09cbebe

AmineDiro approved these changes Jan 30, 2025


dosubot bot added the lgtm label (This PR has been approved by a maintainer) Jan 30, 2025

AmineDiro merged commit 2fbd5d4 into main Jan 30, 2025
6 checks passed

AmineDiro deleted the feature/core-354-remove-pympler-dependency branch January 30, 2025 12:23

StanGirard mentioned this pull request Jan 30, 2025

chore(main): release core 0.0.31 #3560

Merged
