Description
When using tiktoken-go (v0.1.7) in aiproxy service for token counting, we're experiencing severe CPU saturation. The profiling data shows that tiktoken's byte pair encoding operations are consuming nearly all CPU resources during request processing.
Environment
- Library: github.com/pkoukk/tiktoken-go@v0.1.7
- Go Version: 1.25.0
- OS/Arch: Linux/amd64
Performance Profile Analysis
Based on CPU profiling over 10.18 seconds with 4.94s of samples:
Hotspot Breakdown
- `bytePairMerge` (34.01% of total time): 1680ms spent in this function alone; called repeatedly from `bytePairEncode`.
- `runtime.memmove` (63.16% of total time): 3120ms spent in memory copy operations; called extensively from `bytePairMerge` at line 54, with multiple call sites indicating heavy slice manipulation.
Call Chain
CountTokenText
→ getTokenNum
→ tiktoken.Encode
→ CoreBPE.encodeNative
→ bytePairEncode
→ bytePairMerge (multiple calls)
→ runtime.memmove (63% of CPU time)
Impact
- CPU Usage: Near 100% CPU saturation during token counting
- Request Latency: Significant delays in response processing
- Service Throughput: Severely limited by token counting operations
Code Context
The issue occurs in our streaming response handler when calculating token usage:
openai.StreamHandler
→ ResponseText2Usage
→ CountTokenText
→ tiktoken operations
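One aggravating factor worth ruling out: if the handler path constructs the tokenizer on every `CountTokenText` call, the fixed setup cost (loading the BPE rank table) compounds across stream chunks. A minimal sketch of caching the encoder once per process follows; `loadEncoder` and the `encoder` type are hypothetical stand-ins, and the character count is a placeholder, not real tokenization.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	encOnce sync.Once
	enc     *encoder
	loads   int // counts how often the expensive constructor runs
)

type encoder struct{}

// count is a placeholder metric standing in for real BPE tokenization.
func (e *encoder) count(s string) int { return len([]rune(s)) }

// loadEncoder stands in for an expensive constructor (in reality, parsing
// or fetching the BPE ranks, e.g. tiktoken.EncodingForModel).
func loadEncoder() *encoder {
	loads++
	return &encoder{}
}

// CountTokenText lazily initializes the encoder exactly once, so repeated
// calls across stream chunks pay only the counting cost.
func CountTokenText(s string) int {
	encOnce.Do(func() { enc = loadEncoder() })
	return enc.count(s)
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(CountTokenText("hello")) // 5 each time
	}
	fmt.Println("loads:", loads) // loads: 1
}
```

This does not fix the merge algorithm itself, but it removes any per-request setup overhead from the streaming path.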
Root Cause Analysis
The profiling clearly shows that:
- `bytePairMerge` is performing excessive memory operations
- The slice manipulation at `bpe.go:54` is causing repeated `memmove` calls
- The algorithm appears to have O(n²) or worse complexity for certain inputs
Reproduction
We are currently unable to reproduce this reliably.
Attached Resources
Expected Behavior: Token counting should be a lightweight operation that doesn't dominate CPU usage.
Actual Behavior: Token counting consumes 100% of CPU time, with 63% spent in memory copy operations alone.