
Conversation


@enyst enyst commented Oct 23, 2025

Summary

  • Return to simple, robust core family substring matching across the full raw model string
  • Remove fnmatch/globbing and stop using normalization for feature detection
  • Update pattern tables to pure substrings (no wildcards)
  • Adjust tests accordingly and add coverage that validates Bedrock-style names

What & Why
A recent refactor introduced fnmatch-based globbing over a normalized basename. This unintentionally diverged from the prior V0 behavior, where we effectively matched by substring on the full provider/model name. The change broke real-world cases, notably AWS Bedrock, where names embed dotted vendor prefixes and version suffixes inside the basename (e.g., bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0). Patterns like 'claude-3-5-sonnet*' stopped matching after normalization and globbing.

This PR restores the durable invariant: if a meaningful family token (e.g., 'claude-3-5-sonnet', 'gpt-4o', 'o3', 'gemini-2.5-pro') appears anywhere in the model string, the feature applies. This eliminates the pattern maintenance whack-a-mole caused by dotted prefixes and provider-specific suffixes and aligns again with proven behavior in the wild.
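
For illustration, here is a minimal Python sketch of the failure mode; the exact normalization used by the refactor may differ, and the basename split below is only an assumption:

import fnmatch

# Bedrock-style raw model string: provider prefix plus a dotted vendor
# prefix and a version suffix surrounding the family token.
raw = "bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0"

# Assumption for illustration: normalization kept the basename after the
# provider prefix, so the dotted vendor prefix is still present.
basename = raw.split("/", 1)[-1]  # "anthropic.claude-3-5-sonnet-20241022-v2:0"

# fnmatch matches the whole string, so the leading vendor prefix defeats a
# pattern that only has a trailing wildcard.
print(fnmatch.fnmatch(basename, "claude-3-5-sonnet*"))  # False

# Plain substring matching on the full raw string is unaffected.
print("claude-3-5-sonnet" in raw)  # True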

Implementation Details

  1. model_matches() (a minimal sketch follows this list)

    • Lowercase + strip the incoming model string and perform case-insensitive substring checks on the full raw string
    • For each pattern, lowercase/strip and drop any trailing '*' (migration aid); treat the remaining token as a plain substring
    • Return True on first match; False otherwise
    • No use of normalize_model_name() here
  2. Pattern tables: remove '*'

    • FUNCTION_CALLING_PATTERNS, REASONING_EFFORT_PATTERNS, PROMPT_CACHE_PATTERNS, SUPPORTS_STOP_WORDS_FALSE_PATTERNS, RESPONSES_API_PATTERNS now contain pure substrings
    • Provider-qualified entries remain supported by virtue of substring matching against the raw string
  3. normalize_model_name()

    • Not used by matching. Tests exercising normalization for matching were removed to avoid confusion
  4. Tests

    • Remove wildcard expectations; adapt to pure substring semantics
    • Ensure Bedrock coverage: e.g., 'bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0' enables function calling and prompt cache
    • Verify provider-qualified substrings gate as expected (e.g., 'openai/gpt-4o' matches 'openai/gpt-4o' but not 'anthropic/*')
    • Keep conservative defaults for unknown models
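
A minimal sketch of the matching behavior described above; the pattern table is illustrative (the real SDK tables and function signature may differ), and the trailing-'*' stripping is the migration aid mentioned in item 1:

FUNCTION_CALLING_PATTERNS = [
    # Pure substrings, no wildcards; provider-qualified entries still work
    # because matching runs against the full raw string.
    "claude-3-5-sonnet",
    "gpt-4o",
    "gemini-2.5-pro",
    "openai/gpt-4o",
]

def model_matches(model: str, patterns: list[str]) -> bool:
    """Case-insensitive substring match over the full raw model string."""
    haystack = model.strip().lower()
    for pattern in patterns:
        # Migration aid: tolerate a leftover trailing '*' from the old glob tables.
        token = pattern.strip().lower().rstrip("*")
        if token and token in haystack:
            return True
    return False

# Bedrock-style name with a dotted vendor prefix and version suffix still matches.
assert model_matches("bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
                     FUNCTION_CALLING_PATTERNS)
# Unknown models stay conservative: no match means the feature stays off.
assert not model_matches("some-unknown-model", FUNCTION_CALLING_PATTERNS)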

Outcomes

  • Clear behavior: if the essential family token appears in the model string, the feature applies
  • Fewer special-case patterns and more durable matching across providers
  • Restores pre-refactor semantics that worked reliably in practice

Checklist

  • Code formatted and linted via pre-commit
  • Updated tests for sdk changes; all impacted sdk tests pass locally

Closes #844



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Base Image                                  | Docs / Tags
golang  | golang:1.21-bookworm                        | Link
java    | eclipse-temurin:17-jdk                      | Link
python  | nikolaik/python-nodejs:python3.12-nodejs22  | Link

Pull (multi-arch manifest)

docker pull ghcr.io/openhands/agent-server:60d1363-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-60d1363-python \
  ghcr.io/openhands/agent-server:60d1363-python

All tags pushed for this build

ghcr.io/openhands/agent-server:60d1363-golang
ghcr.io/openhands/agent-server:v1.0.0a4_golang_tag_1.21-bookworm_binary
ghcr.io/openhands/agent-server:60d1363-java
ghcr.io/openhands/agent-server:v1.0.0a4_eclipse-temurin_tag_17-jdk_binary
ghcr.io/openhands/agent-server:60d1363-python
ghcr.io/openhands/agent-server:v1.0.0a4_nikolaik_s_python-nodejs_tag_python3.12-nodejs22_binary

The 60d1363 tag is a multi-arch manifest (amd64/arm64); your client pulls the right arch automatically.

Cross-repo impact: Fix: OpenHands/OpenHands#11248

…normalize usage

- model_matches now does case-insensitive substring on full raw model
- strip trailing '*' in patterns (migration aid)
- pattern tables converted to plain substrings (no '*')
- drop normalize_model_name and related tests
- update tests to reflect substring semantics and Bedrock coverage

Fixes #844

Co-authored-by: openhands <[email protected]>

github-actions bot commented Oct 23, 2025

Coverage

Coverage Report

File  | Stmts | Miss | Cover | Missing
TOTAL | 10595 | 4647 | 56%   |
report-only-changed-files is enabled. No files were changed during this commit :)

@enyst enyst marked this pull request as draft October 23, 2025 18:51
enyst and others added 4 commits October 23, 2025 18:53
…handling and empty-token skipping

- Patterns are now used exactly as provided (lowercased/stripped)
- No special handling for '*' or empty tokens

Co-authored-by: openhands <[email protected]>
…eature detection

- Validate provider-prefixed Bedrock ids and plain vendor-prefixed names
- Ensure function-calling and prompt-cache features are enabled for Claude families

Co-authored-by: openhands <[email protected]>
…edrock dotted vendor prefixes

- Function-calling: adds claude-sonnet-4-5 and claude-sonnet-4.5, and us.anthropic.* examples
- Prompt cache: keep only supported families; drop unsupported haiku-4.5 dotted vendor case

Co-authored-by: openhands <[email protected]>
… extend tests with dotted vendor forms

- Add claude-haiku-4.5 and claude-haiku-4-5 to PROMPT_CACHE_PATTERNS
- Expand tests for us.anthropic.* and local names for Haiku 4.5

Co-authored-by: openhands <[email protected]>
@enyst enyst added the integration-test Runs the integration tests and comments the results label Oct 23, 2025
@github-actions

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

2 similar comments

@github-actions

🧪 Integration Tests Results

Overall Success Rate: 0.0%
Total Cost: $0.00
Models Tested: 3
Timestamp: 2025-10-23 22:32:12 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model                                              | Success Rate | Tests Passed | Total Tests | Cost
litellm_proxy_deepseek_deepseek_chat               | 0.0%         | 0/7          | 7           | $0.00
litellm_proxy_openai_gpt_5_mini                    | 0.0%         | 0/7          | 7           | $0.00
litellm_proxy_anthropic_claude_sonnet_4_5_20250929 | 0.0%         | 0/7          | 7           | $0.00

📋 Detailed Results

litellm_proxy_deepseek_deepseek_chat

  • Success Rate: 0.0% (0/7)
  • Total Cost: $0.00
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_4b0e7bd_deepseek_run_N7_20251023_223118

Failed Tests:

  • t07_interactive_commands: Test execution failed: Conversation run failed for id=3381176d-bbf5-4e09-aa99-9d37f0172810: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t01_fix_simple_typo: Test execution failed: Conversation run failed for id=daa50858-8bda-46ca-b17d-1dfb5ed26937: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t02_add_bash_hello: Test execution failed: Conversation run failed for id=caa42434-1fd8-4c0c-98b7-4a2d90387e67: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t04_git_staging: Test execution failed: Conversation run failed for id=d654e711-96c2-46dc-bac9-363f0cf5ac83: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t03_jupyter_write_file: Test execution failed: Conversation run failed for id=8b07fea4-494d-4dc1-bb65-dba27a4ee011: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t06_github_pr_browsing: Test execution failed: Conversation run failed for id=ac31aab6-3f85-4776-8701-6957bf2a1afc: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t05_simple_browsing: Test execution failed: Conversation run failed for id=691a3e8d-1605-4167-9798-3109f516c82a: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)

litellm_proxy_openai_gpt_5_mini

  • Success Rate: 0.0% (0/7)
  • Total Cost: $0.00
  • Run Suffix: litellm_proxy_openai_gpt_5_mini_4b0e7bd_gpt5_mini_run_N7_20251023_223120

Failed Tests:

  • t06_github_pr_browsing: Test execution failed: Conversation run failed for id=7ea1018f-3f7d-421f-a3d4-704ce271e6f8: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - {"error":{"message":"Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable","type":"token_not_found_in_db","param":"key","code":"401"}} (Cost: $0.00)
  • t07_interactive_commands: Test execution failed: Conversation run failed for id=1e745fe5-2578-4960-8be8-9f134b8e40fe: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - {"error":{"message":"Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable","type":"token_not_found_in_db","param":"key","code":"401"}} (Cost: $0.00)
  • t04_git_staging: Test execution failed: Conversation run failed for id=b29a12b7-02a4-47ca-bcd0-1276d9e7a115: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - {"error":{"message":"Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable","type":"token_not_found_in_db","param":"key","code":"401"}} (Cost: $0.00)
  • t01_fix_simple_typo: Test execution failed: Conversation run failed for id=fe526d6f-12e4-4d0b-86cd-ee6190e0f417: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - {"error":{"message":"Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable","type":"token_not_found_in_db","param":"key","code":"401"}} (Cost: $0.00)
  • t02_add_bash_hello: Test execution failed: Conversation run failed for id=6339e0ed-571b-43cd-83b8-741ee3a51af9: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - {"error":{"message":"Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable","type":"token_not_found_in_db","param":"key","code":"401"}} (Cost: $0.00)
  • t03_jupyter_write_file: Test execution failed: Conversation run failed for id=df8efe23-aad7-4a43-be6f-0e1f0ae2ae23: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - {"error":{"message":"Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable","type":"token_not_found_in_db","param":"key","code":"401"}} (Cost: $0.00)
  • t05_simple_browsing: Test execution failed: Conversation run failed for id=c7ebc958-d96b-411e-a31f-a1d7a8b295c2: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - {"error":{"message":"Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable","type":"token_not_found_in_db","param":"key","code":"401"}} (Cost: $0.00)

litellm_proxy_anthropic_claude_sonnet_4_5_20250929

  • Success Rate: 0.0% (0/7)
  • Total Cost: $0.00
  • Run Suffix: litellm_proxy_anthropic_claude_sonnet_4_5_20250929_4b0e7bd_sonnet_run_N7_20251023_223118

Failed Tests:

  • t01_fix_simple_typo: Test execution failed: Conversation run failed for id=442e74c5-486d-4649-b83a-44faf8ecb301: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t02_add_bash_hello: Test execution failed: Conversation run failed for id=8da7a4d4-b9f2-4e00-b64f-c1cc0303e3ae: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t06_github_pr_browsing: Test execution failed: Conversation run failed for id=780227fd-fdf6-488b-8bdd-e82b8e9a7e5f: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t04_git_staging: Test execution failed: Conversation run failed for id=b1d32414-3058-46cd-b012-7f5010c03ac4: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t07_interactive_commands: Test execution failed: Conversation run failed for id=22cf4b92-a21f-4e8d-8921-02785e74cc71: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t03_jupyter_write_file: Test execution failed: Conversation run failed for id=ede7d737-1649-4025-a454-6eba705dc062: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)
  • t05_simple_browsing: Test execution failed: Conversation run failed for id=5d32c77b-fe3d-43c7-aede-ff4de873af4e: litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...T9_Q, Key Hash (Token) =61c9fb32902f3b0764b58f832bcf8f0908410d7664c9b8d9030801eba8dbffde. Unable to find token in cache or LiteLLM_VerificationTokenTable (Cost: $0.00)

