Skip to content

refactor: standardize Observation class #923

@simonrosenberg

Description

@simonrosenberg

There are 9 different observation classes that implement the same things but with minor differences. It should be standardized in the base class.

Common Patterns Identified

1. Error Handling (7/9 observations)

Pattern variants:

# Variant A: String error field (4 observations)
error: str | None = None              # FileEditor, Grep, Glob, Browser

# Variant B: Boolean error flag (2 observations)
error: bool = False                   # ExecuteBash
is_error: bool = False                # MCPTool

# No error field (2 observations): Think, Finish

Usage Pattern in to_llm_content:

if self.error:
    return [TextContent(text=f"Error: {self.error}")]
# ... normal content

2. Output/Content Field (8/9 observations)

Pattern variants:

output: str                           # ExecuteBash, FileEditor, Browser
content: str                          # TaskTracker, Think
message: str                          # Finish
content: list[TextContent | ImageContent]  # MCPTool

# No direct output field: Grep, Glob (use structured data instead)

Refactoring Opportunities

1. Standardized Error Handling

Current Problem:

  • Inconsistent error field naming (error vs is_error)
  • Inconsistent types (str | None vs bool)
  • Repeated error-first pattern in to_llm_content

Proposed Base Class Enhancement:

class Observation(Schema, ABC):
    # Optional error field that can be overridden
    error: str | None = Field(
        default=None, 
        description="Error message if operation failed"
    )
    
    @property
    def has_error(self) -> bool:
        """Check if observation represents an error."""
        return self.error is not None
    
    def _format_error(self) -> TextContent:
        """Standard error formatting."""
        return TextContent(text=f"Error: {self.error}")

2. Success/Status Indication

Current Problem:

  • No standardized way to indicate success/failure
  • Some use exit codes, some use error flags, some have no indicator

Proposed Base Class Enhancement:

from enum import Enum

class ObservationStatus(str, Enum):
    SUCCESS = "success"
    ERROR = "error"
    
class Observation(Schema, ABC):
    @property
    def status(self) -> ObservationStatus:
        """Compute observation status."""
        if self.has_error:
            return ObservationStatus.ERROR
        return ObservationStatus.SUCCESS

3. Standardized Output Field

Current Problem:

  • Inconsistent field names for primary output: output, content, message
  • 8 different observations use 3 different naming conventions

Current Field Usage:

# Variant A: "output" field (4 observations)
output: str                           # ExecuteBash, FileEditor, Browser

# Variant B: "content" field (3 observations)  
content: str                          # TaskTracker, Think
content: list[TextContent | ImageContent]  # MCPTool

# Variant C: "message" field (1 observation)
message: str                          # Finish

# Variant D: No direct output field (2 observations)
# Use structured data only: matches, files  # Grep, Glob

Proposed Base Class Enhancement:

class Observation(Schema, ABC):
    # Standardized primary output field
    output: str = Field(
        default="",
        description="Primary text output from the tool operation"
    )

and add a base implementation of to_llm_content

@property
def to_llm_content(self) -> Sequence[TextContent | ImageContent]:
    if self.error:
        return [TextContent(text=f"Error: {self.error}")]
    return [TextContent(text=self.output)]

Sub-classes can override the base implementation of to_llm_content to provide more context to the agent if needed

Metadata

Metadata

Labels

openhandsSolving the issue with OpenHands.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions