@igordayen igordayen commented Dec 31, 2025

Add Comprehensive Thinking Extraction Support

Summary

This PR implements end-to-end thinking extraction, giving LLM clients
access to the model's reasoning content alongside structured outputs.
The implementation spans API design, integration with Spring AI, manual
converter chains, and unit and integration test coverage.

New API Surface

  // Fluent API for thinking extraction

  ThinkingResponse<Person> response = promptRunner
      .withThinking()
      .createObject("Analyze this person", Person.class);

  // Response structure

  data class ThinkingResponse<T>(
      val result: T?,                          // Converted object
      val thinkingBlocks: List<ThinkingBlock>  // Extracted reasoning
  )

Execution Flow

  User Code
    ↓
  PromptRunner.withThinking()
    ↓
  OperationContextPromptRunner
    ↓
  ThinkingPromptRunnerOperationsImpl
    ↓
  ChatClientLlmOperations.doTransformWithThinking()
    ↓
  Manual Converter Chain:
    - WithExampleConverter (with examples)
    - SuppressThinkingConverter (JSON cleaning)
    - FilteringJacksonOutputConverter (parsing)
    ↓
  extractAllThinkingBlocks(rawResponse)
    ↓
  ThinkingResponse<T>
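
The converter chain in the flow above can be pictured as nested decorators, each stage cleaning the text and handing it to the next. The following is only a minimal Java sketch under assumptions: the `Converter` interface, the `SuppressThinking` and `IntParser` classes, and the `think`/`reason` tag names are hypothetical stand-ins, not the PR's actual `SuppressThinkingConverter` or `FilteringJacksonOutputConverter`.

```java
import java.util.regex.Pattern;

final class ConverterChainSketch {

    /** Minimal converter contract (hypothetical, for illustration only). */
    interface Converter<T> {
        T convert(String text);
    }

    /** Stand-in for a thinking-suppressing stage: strips tagged reasoning, then delegates. */
    static final class SuppressThinking<T> implements Converter<T> {
        // Assumed tag names; the backreference \1 makes the closing tag match the opening one.
        private static final Pattern THINKING =
                Pattern.compile("<(think|reason)>.*?</\\1>", Pattern.DOTALL);

        private final Converter<T> delegate;

        SuppressThinking(Converter<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public T convert(String text) {
            // Remove every tagged block and surrounding whitespace before parsing.
            String cleaned = THINKING.matcher(text).replaceAll("").trim();
            return delegate.convert(cleaned);
        }
    }

    /** Stand-in for the final parsing stage; a real chain would end in a JSON parser. */
    static final class IntParser implements Converter<Integer> {
        @Override
        public Integer convert(String text) {
            return Integer.parseInt(text);
        }
    }
}
```

Wrapping the parser as `new SuppressThinking<>(new IntParser())` means the parsing stage never sees the reasoning text, which is the role the flow assigns to the JSON cleaning step.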

Key Components

  1. API Layer
  • PromptRunner: Exposes core thinking functionality through
    withThinking()
  • ThinkingPromptRunnerOperations: Interface defining thinking-aware
    operations
  • ThinkingResponse: Response wrapper containing result + thinking
    blocks
  2. Implementation Layer
  • ThinkingPromptRunnerOperationsImpl: Core implementation routing to
    ChatClientLlmOperations
  • ChatClientLlmOperations: Enhanced with doTransformWithThinking()
    methods
  • Manual converter chains: Bypassing responseEntity() to preserve
    thinking blocks
  3. Infrastructure
  • SuppressThinkingConverter: Strips thinking blocks from the response
    before JSON parsing
  • ThinkingDetector: Used in streaming; refactored to use the centralized
    ThinkingTags definitions

Technical Approach

Manual Converter Chain Pattern

  // Extract thinking BEFORE converter chain
  val thinkingBlocks = extractAllThinkingBlocks(rawText)

  // Clean conversion without responseEntity()
  val result = converter.convert(rawText)

  // Combine both in thinking-aware response
  ThinkingResponse(result, thinkingBlocks)
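
For readers who want to try the extract-then-convert pattern outside the framework, here is a self-contained Java sketch. It rests on assumptions: the two records are simplified stand-ins for the PR's `ThinkingBlock` and `ThinkingResponse` (the real `ThinkingBlock` also carries a `tagType`, per the log output in this PR), the regex treats any matching `<tag>...</tag>` pair as a thinking block, and `transformWithThinking` is a hypothetical name rather than the signature of `doTransformWithThinking()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExtractThenConvertSketch {

    // Simplified, hypothetical stand-ins for the PR's types.
    record ThinkingBlock(String content, String tagValue) {}
    record ThinkingResponse<T>(T result, List<ThinkingBlock> thinkingBlocks) {}

    // Assumption: any <tag>...</tag> pair counts as a thinking block.
    private static final Pattern TAGGED =
            Pattern.compile("<(\\w+)>(.*?)</\\1>", Pattern.DOTALL);

    /** Collects every tagged block from the raw LLM text, in order of appearance. */
    static List<ThinkingBlock> extractAllThinkingBlocks(String rawText) {
        List<ThinkingBlock> blocks = new ArrayList<>();
        Matcher m = TAGGED.matcher(rawText);
        while (m.find()) {
            blocks.add(new ThinkingBlock(m.group(2).trim(), m.group(1)));
        }
        return blocks;
    }

    /** Extracts thinking first, then converts, then returns both together. */
    static <T> ThinkingResponse<T> transformWithThinking(
            String rawText, Function<String, T> converter) {
        List<ThinkingBlock> blocks = extractAllThinkingBlocks(rawText); // before conversion
        T result = converter.apply(rawText);                            // converter does its own cleaning
        return new ThinkingResponse<>(result, List.copyOf(blocks));
    }
}
```

Because the raw text is held as a plain `String`, it can be read twice (once for extraction, once for conversion), which is what the manual chain makes possible.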

Thinking Block Preservation

  • Extract thinking blocks from raw LLM response text first
  • Use manual converter chains to avoid single-consumption constraints
  • Preserve thinking blocks even in failure scenarios via
    ThinkingException for createObjectIfPossible
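
The failure-path behaviour in the last bullet might look roughly like the sketch below: when conversion fails, the already-extracted blocks travel with the exception so the caller can still inspect the reasoning. The exception shape shown here is an assumption made for illustration (blocks are plain strings, and `convertOrThrow` is a hypothetical helper); the PR's actual `ThinkingException` may be structured differently.

```java
import java.util.List;
import java.util.function.Function;

final class ThinkingPreservationSketch {

    /** Hypothetical exception that carries the thinking blocks extracted before the failure. */
    static final class ThinkingException extends RuntimeException {
        private final List<String> thinkingBlocks;

        ThinkingException(String message, List<String> thinkingBlocks, Throwable cause) {
            super(message, cause);
            this.thinkingBlocks = List.copyOf(thinkingBlocks);
        }

        List<String> thinkingBlocks() {
            return thinkingBlocks;
        }
    }

    /** Runs the converter; on failure, rethrows with the thinking blocks attached. */
    static <T> T convertOrThrow(
            String cleanedText, List<String> thinkingBlocks, Function<String, T> converter) {
        try {
            return converter.apply(cleanedText);
        } catch (RuntimeException e) {
            throw new ThinkingException(
                    "Conversion failed: " + e.getMessage(), thinkingBlocks, e);
        }
    }
}
```

A caller in the style of createObjectIfPossible could catch this exception, treat the object as absent, and still log or return the model's reasoning.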

Test Coverage

Unit Tests

  • ThinkingPromptRunnerOperationsTest: API contract testing
  • ThinkingPromptRunnerOperationsExtractionTest: Thinking extraction
    validation
  • ChatClientLlmOperationsThinkingTest: Core implementation testing
  • ThinkingPromptRunnerBuilderTest: Java builder pattern validation

Integration Tests

  • LLMAnthropicThinkingBuilderIT: End-to-end with Anthropic Claude models
  • LLMOllamaThinkingBuilderIT: End-to-end with Ollama models

Files Modified

  • Enhanced: ChatClientLlmOperations.kt, OperationContextPromptRunner.kt
  • Fixed: SuppressThinkingConverter.kt, ThinkingDetector.kt
  • Updated: LlmOptions.kt, existing test files

Usage Examples

Java

    PromptRunner runner = ai.withLlm("claude-sonnet-4-5")
            .withToolObject(Tooling.class)
            .withGenerateExamples(true);

    String prompt = """
            What is the hottest month in Florida and provide its temperature.
            Please respond with your reasoning using tags <reason>.

            The name should be the month name, temperature should be in Fahrenheit.
            """;

    // When: Use runner to create object with thinking
    ThinkingResponse<MonthItem> response = runner
            .withThinking()
            .createObject(prompt, MonthItem.class);


19:27:56.460 [main] INFO  LLMAnthropicThinkingBuilderIT - Created object: MonthItem{name='August', temperature=91}
19:27:56.460 [main] INFO  LLMAnthropicThinkingBuilderIT - Extracted [ThinkingBlock(content=Florida's hottest month is typically August, when temperatures peak during the summer season. The average high temperature in August across Florida is approximately 91-92°F, though it can vary slightly by region. In many areas, particularly inland and southern Florida, temperatures regularly reach into the low to mid-90s during this month. I'll use 91°F as a representative average high temperature for Florida in August., tagType=TAG, tagValue=reason)] thinking blocks

Addressed code complexity in ChatClientLlmOperations by moving exception handling blocks and prompt builders into separate private methods.

This implementation provides thinking extraction capabilities
while maintaining backward compatibility, with test coverage
across multiple LLM providers.

Note: this PR depends on another PR:
embabel/embabel-common#99

ThinkingBlocks abstraction support
PromptRunner API withThinking
Introduced ChatResponseWithThinking for PromptRunner createObject APIs
Delegation from OperationContextPromptRunner to ChatClientLlmOperations for thinking support
Comprehensive unit and integration testing

@johnsonr johnsonr left a comment

Great stuff. Really important functionality.

Renamed and moved ResponseWithThinking and ThinkingException to the recommended package
Factored the withThinking API into PromptRunner
Introduced the tag interface Thinking Capability to ensure thinking is applied only to suitable prompt runner operations
Ensured createObject and other APIs will not compile on prompt runners that do not implement thinking functionality
More fluent Java builder API
PromptRunner withThinking returns ThinkingPromptRunnerOperations - single critical change
Removed Thinking Extensions
Renamed ResponseWithThinking to ThinkingResponse
Updated java Thinking IT tests to use Core PromptRunner API, rather than builder
Thinking Builder - not deprecated, kept to support the builder pattern; no longer relies on extensions
Updated documentation with reference to Core API rather than builder
@igordayen

commit 688a53a (HEAD -> thinking-blocks-support, origin/thinking-blocks-support)
Author: Igor Dayen
Date: Mon Jan 5 12:05:54 2026 -0500

Include Thinking into Core PromptRunner API:

PromptRunner withThinking returns ThinkingPromptRunnerOperations - single critical change
Removed Thinking Extensions
Renamed ResponseWithThinking to ThinkingResponse
Updated java Thinking IT tests to use Core PromptRunner API, rather than builder
Thinking Builder - not deprecated, kept to support the builder pattern; no longer relies on extensions
Updated documentation with reference to Core API rather than builder


@johnsonr johnsonr left a comment

Very close now. Good stuff

Enhanced documentation
Updated multi-line text
@johnsonr johnsonr merged commit 5dcf26f into main Jan 6, 2026
13 checks passed
@johnsonr johnsonr deleted the thinking-blocks-support branch January 6, 2026 05:25