[BUG] Phoenix UI not showing VertexAI Token Count #1287

Open
TheMellyBee opened this issue Feb 14, 2025 · 3 comments
Labels: bug (Something isn't working), instrumentation: llama-index (Related to llama-index instrumentation module), language: python (Related to Python integration)

Comments

@TheMellyBee

Describe the bug
The UI isn't showing the token count for VertexAI after building up the callback manager and calling LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider).

However, I can see the token counts in the log on completion.

[Screenshots: token counts appear in the completion log but not in the Phoenix UI trace]
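
For reference, a minimal sketch of how the instrumentation is presumably wired up (the registration code isn't in the issue; phoenix.otel.register, the project name, and the llm_settings module name are assumptions):

from phoenix.otel import register
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

from llm_settings import initialize_settings  # hypothetical import of the module below

# Register a Phoenix tracer provider and instrument LlamaIndex with it.
tracer_provider = register(project_name="vertex-token-counts")  # project name is illustrative
LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)

# Build Settings.llm / Settings.callback_manager as in the module below.
initialize_settings()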

Code

# -*- coding: utf-8 -*-
"""LLM Settings

This module contains the settings for the LLM used by LlamaIndex.
Uses a singleton pattern to manage initialization state.

Attributes:
    None

Todo:
    * Add support for more LLM models as they become available
"""
import logging
from typing import Optional, Union

from llama_index.core import Settings
from llama_index.embeddings.gemini import GeminiEmbedding
from llama_index.llms.vertex import Vertex
from vertexai.generative_models import HarmCategory, HarmBlockThreshold, SafetySetting
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from vertexai.preview.tokenization import get_tokenizer_for_model

from .constants import GeminiModel

def patched_get_string_tokens(self, string: str) -> int:
    """Get the token count for a string.

    The Vertex tokenizer returns a CountTokensResult, so extract total_tokens
    instead of treating the result as a token list.
    """
    return self.tokenizer(string).total_tokens

class SettingsManager:
    _instance = None
    _initialized = False
    _current_model = None

    @classmethod
    def get_instance(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    @classmethod
    def reset(cls):
        """Reset the singleton state - should only be used in testing"""
        cls._instance = None
        cls._initialized = False
        cls._current_model = None
        # Reset global Settings
        Settings.llm = None
        Settings.embed_model = None
        Settings.callback_manager = None

    def is_initialized(self) -> bool:
        return self._initialized

    def get_current_model(self) -> Optional[GeminiModel]:
        return self._current_model
    
    def gemini_tokenizer(self, text: str) -> int:
        """
        Get the number of tokens in the text using the Gemini tokenizer.

        Note: count_tokens returns a CountTokensResult, not an int; the patched
        get_string_tokens extracts total_tokens from it.
        """
        tokenizer = get_tokenizer_for_model(str(self._current_model))
        return tokenizer.count_tokens(text)


    def initialize(self, model_id: Union[str, GeminiModel] = GeminiModel.get_default()) -> None:
        """Initialize settings with the specified model (defaults to GeminiModel.get_default())"""
        model = model_id if GeminiModel.is_valid_model(model_id) else GeminiModel.get_default()

        # Set up embedding model
        embed_model = GeminiEmbedding(
            model_name="models/text-embedding-004", 
            embed_batch_size=1
        )

        # Set up safety config for the LLM
        safety_config = [
            SafetySetting(
                category=HarmCategory.HARM_CATEGORY_HATE_SPEECH,
                threshold=HarmBlockThreshold.BLOCK_ONLY_HIGH,
            ),
            SafetySetting(
                category=HarmCategory.HARM_CATEGORY_HARASSMENT,
                threshold=HarmBlockThreshold.BLOCK_ONLY_HIGH,
            ),
            SafetySetting(
                category=HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
                threshold=HarmBlockThreshold.BLOCK_ONLY_HIGH,
            ),
            SafetySetting(
                category=HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
                threshold=HarmBlockThreshold.BLOCK_ONLY_HIGH,
            ),
        ]

        # Initialize Vertex LLM
        Vertex.update_forward_refs()
        
        llm = Vertex(
            model=str(model),
            temperature=0.1,
            safety_settings=safety_config
        )

        # Patch for the TokenCountingHandler bug: the tokenizer returns a
        # CountTokensResult, so get_string_tokens must read total_tokens from it.
        token_counter = TokenCountingHandler(tokenizer=self.gemini_tokenizer, verbose=True)
        token_counter._token_counter.get_string_tokens = patched_get_string_tokens.__get__(
            token_counter._token_counter, token_counter._token_counter.__class__
        )
        Settings.callback_manager = CallbackManager([token_counter])

        # Update global settings
        Settings.llm = llm
        Settings.embed_model = embed_model

        self._current_model = model
        self._initialized = True
        logging.info(f"Settings initialized with model: {model}")


def initialize_settings(llm_model_id: Optional[Union[str, GeminiModel]] = None):
    """
    Initialize global settings for LlamaIndex if not already initialized,
    or if a new model is specified.

    Args:
        llm_model_id (Union[str, GeminiModel], optional): The LLM model to use. Can be either a string
            matching a valid model name or a GeminiModel enum. Defaults to GeminiModel.get_default().
    """
    settings_manager = SettingsManager.get_instance()
    
    # Only initialize if:
    # 1. Settings haven't been initialized yet, or
    # 2. A specific model is requested that's different from current
    if (not settings_manager.is_initialized() or 
        (llm_model_id is not None and 
         GeminiModel.is_valid_model(llm_model_id) and
         str(llm_model_id) != str(settings_manager.get_current_model()))):
        settings_manager.initialize(llm_model_id)


def get_settings() -> Settings:
    """
    Get the current global settings.
    This is useful when you need settings for a specific context.

    Returns:
        Settings: The current global settings instance
    """
    # Initialize if not already done
    if not SettingsManager.get_instance().is_initialized():
        initialize_settings()
    return Settings

def reset_settings():
    """Reset all settings to their initial state. Should only be used in testing."""
    SettingsManager.reset()

if __name__ == "__main__":
    initialize_settings()
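
For context on why patched_get_string_tokens is needed: the Vertex tokenizer's count_tokens returns a CountTokensResult rather than an int, so the handler has to read .total_tokens. A minimal check (the model name is illustrative):

from vertexai.preview.tokenization import get_tokenizer_for_model

tokenizer = get_tokenizer_for_model("gemini-1.5-flash-002")  # illustrative model name
result = tokenizer.count_tokens("hello world")
print(type(result))          # CountTokensResult, not an int
print(result.total_tokens)   # the integer the default token counter expects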

Expected behavior
The UI should show the token counts.


Environment:

  • OS: macOS
  • Browser: Chrome

Additional context

VertexAIInstrumentor().instrument(tracer_provider=tracer_provider)

This will capture the tokens, but they arrive in an entirely different trace (which makes sense). Still, if that instrumentor can surface them, it seems like this one should too.

[Screenshot: the token counts captured by VertexAIInstrumentor in a separate trace]
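
To reproduce the comparison, both instrumentors can share one tracer provider so their spans land in the same Phoenix project; a sketch assuming the openinference VertexAI instrumentation package:

from phoenix.otel import register
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
from openinference.instrumentation.vertexai import VertexAIInstrumentor

# One tracer provider for both instrumentors; the VertexAI spans still arrive as
# separate traces, but they end up in the same project for comparison.
tracer_provider = register(project_name="vertex-token-counts")  # name is illustrative
LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)
VertexAIInstrumentor().instrument(tracer_provider=tracer_provider)
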
TheMellyBee added the bug (Something isn't working) and triage (Issues that require triage) labels on Feb 14, 2025
@TheMellyBee (Author)

https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-vertex/llama_index/llms/vertex/base.py

I'm also seeing that the kwargs are getting passed into the completion. Since both the completion and the callback output make it into the log, it looks like the miss is on the Arize UI side.

@mikeldking (Contributor)

Hey @TheMellyBee! Thanks so much for reporting this! I believe this is on the instrumentation side, so I'll be transferring it over to that repository. We will get this prioritized and looked at. Thanks for the thorough analysis!

cc @nate-mar

mikeldking removed the triage (Issues that require triage) label on Feb 14, 2025
mikeldking transferred this issue from Arize-ai/phoenix on Feb 14, 2025
mikeldking added the triage (Issues that require triage) label on Feb 14, 2025
dosubot added the language: python (Related to Python integration) label on Feb 14, 2025
mikeldking removed this from the phoenix project on Feb 14, 2025
dosubot (bot) commented Feb 14, 2025

The issue you're experiencing with the Phoenix UI not displaying the token count for VertexAI, despite it being visible in the logs, might be related to how the TokenCountingHandler is integrated and how the UI retrieves and displays this information.

Here are a few potential areas to investigate:

  1. TokenCountingHandler Configuration: Ensure that the TokenCountingHandler is correctly configured and integrated within the CallbackManager. The handler should be properly set up to count tokens using the gemini_tokenizer, and any modifications, such as the patched_get_string_tokens, should be correctly applied (see the verification sketch after this comment).

  2. UI Integration: Check if there is a specific UI component or configuration that needs to be updated to display the token counts. The UI might require additional setup to fetch and render this data.

  3. Tracer Provider: Since LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider) is used, ensure that the tracer provider is correctly configured to capture and relay the necessary information to the UI. Compare this setup with VertexAIInstrumentor().instrument(tracer_provider=tracer_provider), which seems to work, to identify any discrepancies.

  4. Logs vs. UI Data Flow: Verify the data flow from the logs to the UI. There might be a disconnect or missing link in how the token count data is passed from the backend to the frontend.

By focusing on these areas, you might be able to identify the root cause of the discrepancy between the logs and the UI display.
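
As a quick way to separate a counting problem from a UI problem (point 1 above), the handler's counters can be read directly after a request; a minimal sketch assuming the settings module from the issue (the llm_settings module name is hypothetical):

from llama_index.core import Settings
from llama_index.core.llms import ChatMessage

from llm_settings import initialize_settings  # hypothetical module name

initialize_settings()
token_counter = Settings.callback_manager.handlers[0]  # the TokenCountingHandler set up above

_ = Settings.llm.chat([ChatMessage(role="user", content="Say hello.")])

# If these are non-zero, counting works locally and the gap is in what reaches the Phoenix UI.
print("prompt tokens:    ", token_counter.prompt_llm_token_count)
print("completion tokens:", token_counter.completion_llm_token_count)
print("total tokens:     ", token_counter.total_llm_token_count)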


mikeldking added the instrumentation: llama-index (Related to llama-index instrumentation module) label and removed the triage (Issues that require triage) label on Feb 19, 2025