Description
When using tiktoken-go (v0.1.7) in aiproxy service for token counting, we're experiencing severe CPU saturation. The profiling data shows that tiktoken's byte pair encoding operations are consuming nearly all CPU resources during request processing.
Environment
- Library: github.com/pkoukk/tiktoken-go@v0.1.7
- Go Version: 1.25.0
- OS/Arch: Linux/amd64
Performance Profile Analysis
Based on CPU profiling over 10.18 seconds with 4.94s of samples:
Hotspot Breakdown
- `bytePairMerge` (34.01% of total time): 1680ms spent in this function alone; called repeatedly from `bytePairEncode`.
- `runtime.memmove` (63.16% of total time): 3120ms spent in memory copy operations; called extensively from `bytePairMerge` at line 54, with multiple call sites indicating heavy slice manipulation.
Call Chain
CountTokenText
→ getTokenNum
→ tiktoken.Encode
→ CoreBPE.encodeNative
→ bytePairEncode
→ bytePairMerge (multiple calls)
→ runtime.memmove (63% of CPU time)
Impact
- CPU Usage: Near 100% CPU saturation during token counting
- Request Latency: Significant delays in response processing
- Service Throughput: Severely limited by token counting operations
Code Context
The issue occurs in our streaming response handler when calculating token usage:
openai.StreamHandler
→ ResponseText2Usage
→ CountTokenText
→ tiktoken operations
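One aggravating factor worth ruling out: if the handler path constructs the tokenizer on every `CountTokenText` call, the fixed setup cost (loading the BPE rank table) compounds across stream chunks. A minimal sketch of caching the encoder once per process follows; `loadEncoder` and the `encoder` type are hypothetical stand-ins, and the character count is a placeholder, not real tokenization.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	encOnce sync.Once
	enc     *encoder
	loads   int // counts how often the expensive constructor runs
)

type encoder struct{}

// count is a placeholder metric standing in for real BPE tokenization.
func (e *encoder) count(s string) int { return len([]rune(s)) }

// loadEncoder stands in for an expensive constructor (in reality, parsing
// or fetching the BPE ranks, e.g. tiktoken.EncodingForModel).
func loadEncoder() *encoder {
	loads++
	return &encoder{}
}

// CountTokenText lazily initializes the encoder exactly once, so repeated
// calls across stream chunks pay only the counting cost.
func CountTokenText(s string) int {
	encOnce.Do(func() { enc = loadEncoder() })
	return enc.count(s)
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(CountTokenText("hello")) // 5 each time
	}
	fmt.Println("loads:", loads) // loads: 1
}
```

This does not fix the merge algorithm itself, but it removes any per-request setup overhead from the streaming path.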
Root Cause Analysis
The profiling clearly shows that:
- `bytePairMerge` is performing excessive memory operations
- The slice manipulation at `bpe.go:54` is causing repeated `memmove` calls
- The algorithm appears to have O(n²) or worse complexity for certain inputs
Reproduction
We are currently unable to reproduce this reliably.
Attached Resources
Expected Behavior: Token counting should be a lightweight operation that doesn't dominate CPU usage.
Actual Behavior: Token counting consumes 100% of CPU time, with 63% spent in memory copy operations alone.