
Langfuse SDK alignment & Support for embedders #2464

@Hansehart

Description

Is your feature request related to a problem? Please describe.
When using MistralChatGenerator, MistralTextEmbedder, or MistralDocumentEmbedder with the Langfuse integration, usage metrics and cost details are not tracked properly. This is because these components are created as SPAN observation types instead of GENERATION and EMBEDDING types respectively.

The Langfuse SDK supports the usage_details and cost_details parameters only for ObservationTypeGenerationLike types (i.e., generation and embedding), not for SPAN types. Without the correct observation type, it is impossible to track LLM costs and token usage for Mistral components, which is critical for production monitoring and cost management.
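
For reference, a minimal reproduction might look like the snippet below. The import paths and the `LangfuseConnector` wiring are taken from the current haystack-integrations packages as I understand them; Langfuse and Mistral API keys plus `HAYSTACK_CONTENT_TRACING_ENABLED=true` are assumed to be set in the environment:

```python
from haystack import Pipeline
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.connectors.langfuse import LangfuseConnector
from haystack_integrations.components.generators.mistral import MistralChatGenerator

pipeline = Pipeline()
pipeline.add_component("tracer", LangfuseConnector("mistral-demo"))
pipeline.add_component("llm", MistralChatGenerator())

result = pipeline.run(
    data={"llm": {"messages": [ChatMessage.from_user("Hello!")]}}
)
# In the Langfuse UI, the "llm" observation shows up as a SPAN,
# so its usage and cost fields stay empty.
```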

Describe the solution you'd like
Add Mistral components to the appropriate whitelists in the Langfuse tracer:

  1. Add MistralChatGenerator to the _SUPPORTED_CHAT_GENERATORS list so it's created as a generation observation type
  2. Add MistralTextEmbedder and MistralDocumentEmbedder to a new _SUPPORTED_EMBEDDERS list and update the tracer logic to create them as embedding observation types (see the sketch below)
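
Roughly, the change in the Langfuse tracer could look like the sketch below. `_SUPPORTED_CHAT_GENERATORS` is the existing list; `_SUPPORTED_EMBEDDERS` is the proposed new one, and the `_observation_type` helper is only illustrative (the real tracer may structure this differently):

```python
# Sketch of the proposed change; exact contents of the existing list
# and the surrounding tracer code are assumptions.
_SUPPORTED_CHAT_GENERATORS = [
    "OpenAIChatGenerator",
    "AzureOpenAIChatGenerator",
    "MistralChatGenerator",  # new: created as a "generation" observation
]

# New list so embedder components become "embedding" observations.
_SUPPORTED_EMBEDDERS = [
    "MistralTextEmbedder",
    "MistralDocumentEmbedder",
]

def _observation_type(component_type: str) -> str:
    """Map a Haystack component type to a Langfuse observation type (hypothetical helper)."""
    if component_type in _SUPPORTED_CHAT_GENERATORS:
        return "generation"
    if component_type in _SUPPORTED_EMBEDDERS:
        return "embedding"
    return "span"
```

With both lists in place, usage_details and cost_details can be attached to Mistral observations the same way they are for the already supported OpenAI components.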

Describe alternatives you've considered

  1. Manual workaround: directly monkey-patch the tracer or manually log usage (roughly sketched below); this is brittle and defeats the purpose of the integration
  2. Using only supported generators: Switch from Mistral to OpenAI/Anthropic - not ideal when Mistral is the preferred provider
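
For completeness, the manual workaround from option 1 would look roughly like this (Langfuse v3 client API and the Haystack message/meta fields are assumed); it works, but it duplicates exactly what the integration should do automatically:

```python
# Hand-rolled usage logging that bypasses the Haystack integration.
from langfuse import get_client
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.mistral import MistralChatGenerator

langfuse = get_client()
llm = MistralChatGenerator(model="mistral-small-latest")

with langfuse.start_as_current_generation(
    name="mistral-chat", model="mistral-small-latest", input="Hello!"
) as generation:
    reply = llm.run([ChatMessage.from_user("Hello!")])["replies"][0]
    usage = reply.meta.get("usage", {})  # OpenAI-style usage dict
    generation.update(
        output=reply.text,
        usage_details={
            "input": usage.get("prompt_tokens"),
            "output": usage.get("completion_tokens"),
        },
    )
```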

Additional context

  • Mistral components inherit from OpenAI components (MistralChatGenerator(OpenAIChatGenerator), MistralTextEmbedder(OpenAITextEmbedder)), so they have the same interface and return usage data in the same format
  • Langfuse Python SDK @ 3.8.0
