Error cascade when trying to use Gemini CLI models #233

@mrbungie

Pre-submission checklist

  • I have searched existing issues for duplicates
  • I have read the 📋 Troubleshooting Guide
  • I have read the README installation instructions

Model used

gemini-3-flash-preview, gemini-2.5-flash (Gemini CLI models)

Note: Antigravity models (antigravity-gemini-3-pro, antigravity-claude-sonnet-4-5, etc.) work correctly. This issue only affects Gemini CLI model routing.

Exact error message

429: You have exhausted your capacity on this model. Your quota will reset after 58s.
400: Request contains an invalid argument.
403: You are currently configured to use a Google Cloud Project but lack a Gemini Code Assist license. (#3501)

Bug description

When using opencode-antigravity-auth to proxy requests through Gemini CLI endpoints (not Antigravity endpoints), multiple parallel requests trigger a cascade of errors across different fallback endpoints before eventually succeeding. This results in:

  1. Unnecessary latency as the plugin tries multiple failing endpoints
  2. Poor UX with intermittent failures
  3. Wasted quota on endpoints that will fail anyway

The issue appears when OpenCode fires concurrent requests (e.g., title generation + summarization + main prompt simultaneously).

Important: Antigravity-prefixed models work fine. The issue is specific to Gemini CLI models like gemini-3-flash-preview and gemini-2.5-flash.

Steps to reproduce

  1. Configure opencode-antigravity-auth with a single Google account
  2. Configure Gemini CLI models in opencode.json (see config below)
  3. Start OpenCode and send a message using a Gemini CLI model
  4. OpenCode fires multiple parallel requests (title gen, summary, main response)
  5. First request succeeds on daily-cloudcode-pa.sandbox.googleapis.com
  6. Second request hits 429 rate limit on same endpoint
  7. Third request with thinkingConfig gets 400 Bad Request
  8. Fallback to autopush-cloudcode-pa.sandbox.googleapis.com returns 403 (license required)
  9. Finally succeeds on cloudcode-pa.googleapis.com

Did this ever work?

Sometimes - depends on timing and request volume

Number of Google accounts configured

1

Reproducibility

Often (50-90% of the time)

Plugin version

1.3.0

OpenCode version

1.1.25

Operating System

macOS

Node.js version

v22.17.0

Environment type

Standard terminal

Debug logs (REQUIRED)

Request Flow (anonymized)

| Request | Model                  | Endpoint                          | Result | Latency |
|---------|------------------------|-----------------------------------|--------|---------|
| 1       | gemini-3-flash-preview | daily-cloudcode-pa.sandbox.*      | ✅ 200 | 1378ms  |
| 2       | gemini-3-flash-preview | daily-cloudcode-pa.sandbox.*      | ❌ 429 | ~350ms  |
| 3       | gemini-2.5-flash       | daily-cloudcode-pa.sandbox.*      | ❌ 400 | 728ms   |
| 4       | gemini-3-flash-preview | autopush-cloudcode-pa.sandbox.*   | ❌ 403 | 1289ms  |
| 5       | gemini-3-flash-preview | cloudcode-pa.googleapis.com       | ✅ 200 | 1170ms  |

Error Details

429 Rate Limit (Request 2):

{
  "error": {
    "code": 429,
    "message": "You have exhausted your capacity on this model. Your quota will reset after 58s.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [{
      "@type": "type.googleapis.com/google.rpc.ErrorInfo",
      "reason": "RATE_LIMIT_EXCEEDED",
      "domain": "cloudcode-pa.googleapis.com"
    }]
  }
}
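
The reset delay in this response is only available inside the human-readable message. As a rough sketch of how a client could recover it, the helper below pulls the number of seconds out of the message text. The function name `parseQuotaResetMs` and the `ApiErrorBody` type are hypothetical and not part of opencode-antigravity-auth, and the message wording is assumed to match the sample above.

```typescript
// Hypothetical helper, not part of opencode-antigravity-auth.
// Assumes the 429 message keeps the "reset after <N>s" wording shown above.
interface ApiErrorBody {
  error: { code: number; message: string; status?: string };
}

// Returns the reset delay in milliseconds, or null when no hint is present.
function parseQuotaResetMs(body: ApiErrorBody): number | null {
  if (body.error.code !== 429) return null;
  const match = /reset after (\d+(?:\.\d+)?)s/.exec(body.error.message);
  return match ? Math.round(Number(match[1]) * 1000) : null;
}

const sample429: ApiErrorBody = {
  error: {
    code: 429,
    message:
      "You have exhausted your capacity on this model. Your quota will reset after 58s.",
    status: "RESOURCE_EXHAUSTED",
  },
};

console.log(parseQuotaResetMs(sample429)); // 58000
```

If the server also sends a Retry-After header, that would be a more robust source than the message text. The logs above do not show whether it does.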

400 Invalid Argument (Request 3):

{
  "error": {
    "code": 400,
    "message": "Request contains an invalid argument.",
    "status": "INVALID_ARGUMENT"
  }
}

This request included thinkingConfig: { includeThoughts: true, thinkingBudget: 16000 }, which may not be supported on this endpoint.
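
If thinkingConfig does turn out to be the cause, one possible mitigation is to retry once without it. The sketch below is illustrative only: it assumes thinkingConfig sits under a generationConfig object in the request payload, which the logs above do not confirm, and the type and function names are hypothetical.

```typescript
// Assumed request shape; the payload the plugin actually sends may differ.
interface ThinkingConfig {
  includeThoughts?: boolean;
  thinkingBudget?: number;
}
interface GenerationConfig {
  thinkingConfig?: ThinkingConfig;
  [key: string]: unknown;
}
interface RequestPayload {
  generationConfig?: GenerationConfig;
  [key: string]: unknown;
}

// Returns a copy of the payload with thinkingConfig removed.
// The input object is left untouched so the original can still be logged.
function withoutThinkingConfig(payload: RequestPayload): RequestPayload {
  if (!payload.generationConfig) return payload;
  const generationConfig: GenerationConfig = { ...payload.generationConfig };
  delete generationConfig.thinkingConfig;
  return { ...payload, generationConfig };
}

const originalPayload: RequestPayload = {
  generationConfig: {
    thinkingConfig: { includeThoughts: true, thinkingBudget: 16000 },
  },
};
const retryPayload = withoutThinkingConfig(originalPayload);

console.log("thinkingConfig" in (retryPayload.generationConfig ?? {})); // false
console.log("thinkingConfig" in (originalPayload.generationConfig ?? {})); // true
```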

403 License Required (Request 4):

{
  "error": {
    "code": 403,
    "message": "You are currently configured to use a Google Cloud Project but lack a Gemini Code Assist license. (#3501)",
    "status": "PERMISSION_DENIED",
    "details": [{
      "@type": "type.googleapis.com/google.rpc.ErrorInfo",
      "reason": "SUBSCRIPTION_REQUIRED",
      "domain": "cloudaicompanion.googleapis.com"
    }]
  }
}
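
The three errors above call for different reactions: the 429 is temporary, the 400 points at the request itself, and the 403 will not go away for this account. A minimal sketch of that distinction follows. The action names and the `classifyError` function are hypothetical, and the mapping is one reasonable reading of the responses above rather than the plugin's actual logic.

```typescript
// Hypothetical classification of the error bodies shown above.
type FallbackAction = "wait-and-retry" | "fix-request" | "skip-endpoint" | "try-next";

interface ErrorDetail {
  "@type"?: string;
  reason?: string;
  domain?: string;
}
interface ApiError {
  code: number;
  message?: string;
  status?: string;
  details?: ErrorDetail[];
}

function classifyError(err: ApiError): FallbackAction {
  const details = err.details ?? [];
  const needsSubscription = details.some((d) => d.reason === "SUBSCRIPTION_REQUIRED");
  // A missing license will not resolve itself, so stop using this endpoint.
  if (err.code === 403 && needsSubscription) return "skip-endpoint";
  // Quota exhaustion is temporary: wait for the reset instead of failing over.
  if (err.code === 429) return "wait-and-retry";
  // An invalid argument would likely fail on every endpoint, so change the request.
  if (err.code === 400) return "fix-request";
  return "try-next";
}

console.log(
  classifyError({
    code: 403,
    status: "PERMISSION_DENIED",
    details: [{ reason: "SUBSCRIPTION_REQUIRED", domain: "cloudaicompanion.googleapis.com" }],
  }),
); // skip-endpoint
```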

Configuration

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "chrome-devtools": {
      "type": "local",
      "command": ["npx", "-y", "chrome-devtools-mcp@latest"]
    },
    "mantic": {
      "type": "local",
      "command": ["npx", "-y", "mantic.sh@latest", "server"]
    }
  },
  "plugin": [
    "opencode-antigravity-auth@latest",
    "oh-my-opencode"
  ],
  "provider": {
    "google": {
      "models": {
        "antigravity-gemini-3-pro": {
          "name": "Gemini 3 Pro (Antigravity)",
          "limit": { "context": 1048576, "output": 65535 },
          "modalities": { "input": ["text", "image", "pdf"], "output": ["text"] },
          "variants": {
            "low": { "thinkingLevel": "low" },
            "high": { "thinkingLevel": "high" }
          }
        },
        "antigravity-gemini-3-flash": {
          "name": "Gemini 3 Flash (Antigravity)",
          "limit": { "context": 1048576, "output": 65536 },
          "modalities": { "input": ["text", "image", "pdf"], "output": ["text"] },
          "variants": {
            "minimal": { "thinkingLevel": "minimal" },
            "low": { "thinkingLevel": "low" },
            "medium": { "thinkingLevel": "medium" },
            "high": { "thinkingLevel": "high" }
          }
        },
        "antigravity-claude-sonnet-4-5": {
          "name": "Claude Sonnet 4.5 (Antigravity)",
          "limit": { "context": 200000, "output": 64000 },
          "modalities": { "input": ["text", "image", "pdf"], "output": ["text"] }
        },
        "antigravity-claude-sonnet-4-5-thinking": {
          "name": "Claude Sonnet 4.5 Thinking (Antigravity)",
          "limit": { "context": 200000, "output": 64000 },
          "modalities": { "input": ["text", "image", "pdf"], "output": ["text"] },
          "variants": {
            "low": { "thinkingConfig": { "thinkingBudget": 8192 } },
            "max": { "thinkingConfig": { "thinkingBudget": 32768 } }
          }
        },
        "antigravity-claude-opus-4-5-thinking": {
          "name": "Claude Opus 4.5 Thinking (Antigravity)",
          "limit": { "context": 200000, "output": 64000 },
          "modalities": { "input": ["text", "image", "pdf"], "output": ["text"] },
          "variants": {
            "low": { "thinkingConfig": { "thinkingBudget": 8192 } },
            "max": { "thinkingConfig": { "thinkingBudget": 32768 } }
          }
        },
        "gemini-2.5-flash": {
          "name": "Gemini 2.5 Flash (Gemini CLI)",
          "limit": { "context": 1048576, "output": 65536 },
          "modalities": { "input": ["text", "image", "pdf"], "output": ["text"] }
        },
        "gemini-2.5-pro": {
          "name": "Gemini 2.5 Pro (Gemini CLI)",
          "limit": { "context": 1048576, "output": 65536 },
          "modalities": { "input": ["text", "image", "pdf"], "output": ["text"] }
        },
        "gemini-3-flash-preview": {
          "name": "Gemini 3 Flash Preview (Gemini CLI)",
          "limit": { "context": 1048576, "output": 65536 },
          "modalities": { "input": ["text", "image", "pdf"], "output": ["text"] }
        },
        "gemini-3-pro-preview": {
          "name": "Gemini 3 Pro Preview (Gemini CLI)",
          "limit": { "context": 1048576, "output": 65535 },
          "modalities": { "input": ["text", "image", "pdf"], "output": ["text"] }
        }
      }
    }
  }
}

Suggested Improvements

  1. Request queuing: Serialize requests to the same endpoint to avoid instant 429s
  2. Endpoint capability detection: Skip autopush-* endpoints if the user lacks a Gemini Code Assist license
  3. thinkingConfig validation: Check whether the endpoint/model supports thinking before sending
  4. Smarter fallback: Skip known-bad endpoints rather than trying all of them in sequence
  5. Retry-After handling: Respect the reset time reported in the 429 response (58s in the example above)
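
Suggestions 2, 4 and 5 could be combined in a small per-endpoint state tracker. The sketch below is only meant to illustrate the idea. The `EndpointSelector` class and its method names are hypothetical, and the endpoint order is taken from the request flow logged above rather than from the plugin source.

```typescript
// Hypothetical per-endpoint health record; all names are illustrative.
interface EndpointState {
  disabled: boolean; // set after a permanent failure such as 403 SUBSCRIPTION_REQUIRED
  retryAtMs: number; // earliest time the endpoint may be tried again after a 429
}

class EndpointSelector {
  private readonly hosts: string[];
  private readonly states: Record<string, EndpointState> = {};

  constructor(hosts: string[]) {
    this.hosts = hosts;
    for (const host of hosts) {
      this.states[host] = { disabled: false, retryAtMs: 0 };
    }
  }

  // Record a 429: hold the endpoint back until the quota reset has passed.
  markRateLimited(host: string, resetMs: number, nowMs: number): void {
    const state = this.states[host];
    if (state) state.retryAtMs = nowMs + resetMs;
  }

  // Record a permanent failure so the endpoint is not tried again.
  markDisabled(host: string): void {
    const state = this.states[host];
    if (state) state.disabled = true;
  }

  // First endpoint, in priority order, that is neither disabled nor cooling down.
  pick(nowMs: number): string | null {
    for (const host of this.hosts) {
      const state = this.states[host];
      if (state && !state.disabled && nowMs >= state.retryAtMs) return host;
    }
    return null;
  }
}

const selector = new EndpointSelector([
  "daily-cloudcode-pa.sandbox.googleapis.com",
  "autopush-cloudcode-pa.sandbox.googleapis.com",
  "cloudcode-pa.googleapis.com",
]);
selector.markRateLimited("daily-cloudcode-pa.sandbox.googleapis.com", 58000, 0);
selector.markDisabled("autopush-cloudcode-pa.sandbox.googleapis.com");

console.log(selector.pick(1000)); // cloudcode-pa.googleapis.com
console.log(selector.pick(58000)); // daily-cloudcode-pa.sandbox.googleapis.com
```

Passing the current time in explicitly keeps the selector deterministic and easy to test. A real integration would probably pass Date.now() and share one selector across all concurrent requests.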

Compliance

  • I'm using this plugin for personal development only
  • This issue is not related to commercial use or TOS violations
