@its-hammer-time its-hammer-time commented Oct 23, 2025

Optimize error path performance by caching ReferenceSchema string conversion

Summary

This PR significantly improves validation performance for the error path by eliminating redundant []byte to string conversions of the rendered schema. The optimization caches the string representation and reuses it across all validation errors, leveraging Go's string immutability guarantees.

Problem

When validating requests/responses against large OpenAPI schemas, validation errors include the full rendered schema in each SchemaValidationFailure object for debugging purposes. Previously, the code converted renderedSchema (a []byte) to a string inside the error processing loop, causing the large byte slice to be copied for every single validation error.

For endpoints with complex schemas (e.g., schemas that inline many references and result in 500KB+ rendered schemas), this caused severe performance degradation:

  • Each validation error triggered a full copy of the rendered schema
  • For requests with 1,000 items generating 6 errors each, this meant 6,000 × 500KB = ~3GB of allocations per request
  • Massive GC pressure and high memory consumption

Solution

The fix is simple and leverages Go's string immutability:

  1. Convert once, reuse many: Convert renderedSchema to a string immediately after rendering (or loading from cache), before the error loop
  2. Cache the string: Add a ReferenceSchema string field to SchemaCacheEntry to cache the pre-converted string
  3. Share across errors: When assigning to error objects, Go's string semantics ensure all copies share the same underlying buffer (only the string header, 16 bytes on 64-bit platforms, is copied, not the data)
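
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `SchemaValidationFailure` type here is a simplified stand-in with only two fields, and `buildFailures` is a hypothetical helper name.

```go
package main

import "fmt"

// SchemaValidationFailure is a simplified stand-in for the library's
// error type; only the fields needed for the illustration are included.
type SchemaValidationFailure struct {
	Reason          string
	ReferenceSchema string
}

// buildFailures (hypothetical helper) converts the rendered schema to a
// string exactly once, before the loop, and assigns that same string to
// every failure. Each assignment copies only the string header.
func buildFailures(renderedSchema []byte, reasons []string) []*SchemaValidationFailure {
	referenceSchema := string(renderedSchema) // the only copy of the bytes

	failures := make([]*SchemaValidationFailure, 0, len(reasons))
	for _, reason := range reasons {
		failures = append(failures, &SchemaValidationFailure{
			Reason:          reason,
			ReferenceSchema: referenceSchema, // shared, not re-converted
		})
	}
	return failures
}

func main() {
	rendered := []byte(`{"type":"object"}`)
	failures := buildFailures(rendered, []string{"missing property", "wrong type"})
	for _, f := range failures {
		fmt.Printf("%s: %d-byte schema attached\n", f.Reason, len(f.ReferenceSchema))
	}
}
```

The pre-optimization pattern would be equivalent to writing `ReferenceSchema: string(renderedSchema)` inside the loop, which copies the full byte slice on every iteration.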

Performance Results

Benchmark comparison using a 536KB rendered schema with 100 invalid items:

Before Optimization

BenchmarkErrorPath_100Actions-16       169   33,782,870 ns/op   504,285,983 B/op   194,989 allocs/op
  • Time: 33.8ms per operation
  • Memory: 504MB per operation

After Optimization

BenchmarkErrorPath_100Actions-16       418   13,733,243 ns/op   16,861,448 B/op    192,162 allocs/op
  • Time: 13.7ms per operation
  • Memory: 16.9MB per operation

Improvement

  • Latency: 2.46x faster (59% reduction)
  • 🧠 Memory: 30x less memory (96.7% reduction)
  • 🔢 Throughput: 2.46x higher

Valid Path (No Regression)

Valid request performance remains unchanged (differences within measurement noise):

BEFORE: BenchmarkValidPath_100Actions-16    5,721   1,046,346 ns/op   1,547,632 B/op
AFTER:  BenchmarkValidPath_100Actions-16    5,576   1,057,066 ns/op   1,548,117 B/op

Changes Made

1. Enhanced Cache Structure

// cache/cache.go
type SchemaCacheEntry struct {
    Schema          *base.Schema
    RenderedInline  []byte
    RenderedJSON    []byte
    ReferenceSchema string // NEW: Cached string version to avoid repeated conversions
    CompiledSchema  *jsonschema.Schema
}

2. Request Validation (requests/validate_request.go)

  • Cache hit path: Load pre-converted referenceSchema from cache (zero conversion cost)
  • Cache miss path: Convert renderedSchema to string immediately after RenderInline() and cache it
  • Error loop: Use the pre-converted referenceSchema variable instead of calling string(renderedSchema) repeatedly
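
A rough sketch of the cache hit and cache miss paths described above. It assumes a `sync.Map` keyed by a string as the cache, and uses hypothetical names (`schemaCacheEntry`, `newCache`, `referenceSchemaFor`, and a `render` callback standing in for `RenderInline()`); the real request validation code is structured differently.

```go
package main

import (
	"fmt"
	"sync"
)

// schemaCacheEntry is a trimmed-down, hypothetical version of
// SchemaCacheEntry with only the fields relevant to this sketch.
type schemaCacheEntry struct {
	RenderedInline  []byte
	ReferenceSchema string // pre-converted string form of RenderedInline
}

// newCache returns an empty cache. A sync.Map is an assumption made
// for this sketch.
func newCache() *sync.Map {
	return &sync.Map{}
}

// referenceSchemaFor returns the string form of the rendered schema.
// On a cache hit no conversion happens; on a miss the schema is
// rendered, converted once, and stored for later calls.
func referenceSchemaFor(cache *sync.Map, key string, render func() []byte) string {
	if v, ok := cache.Load(key); ok {
		if entry, ok := v.(*schemaCacheEntry); ok {
			return entry.ReferenceSchema // cache hit: reuse the stored string
		}
	}

	// Cache miss: render, convert exactly once, then cache both forms.
	rendered := render()
	entry := &schemaCacheEntry{
		RenderedInline:  rendered,
		ReferenceSchema: string(rendered),
	}
	cache.Store(key, entry)
	return entry.ReferenceSchema
}

func main() {
	cache := newCache()
	renders := 0
	render := func() []byte {
		renders++
		return []byte(`{"type":"object"}`)
	}

	first := referenceSchemaFor(cache, "example-key", render)
	second := referenceSchemaFor(cache, "example-key", render)
	fmt.Println(first == second, "renders:", renders)
}
```

In this sketch two goroutines that miss at the same time would both render and store an entry; that is harmless here because the entries are equivalent, but it is a simplification.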

3. Response Validation (responses/validate_response.go)

  • Same changes as request validation

Why This Works

Go strings are immutable, so copies of a string value can safely share the same backing storage (no copy-on-write is needed because the bytes are never written to):

  • String header: struct { Data uintptr; Len int } = 16 bytes on 64-bit platforms
  • When assigned (str2 = str1), only the header is copied
  • All copies of the string point to the same underlying buffer

Memory impact for 1,000 validation errors:

  • Before: 1,000 × 536KB = 536MB (1,000 separate allocations)
  • After: 536KB + (1,000 × 16 bytes) = ~552KB (1 allocation + 1,000 header copies)

This is guaranteed safe because strings are immutable in Go - the shared buffer can never be modified.

Testing

  • ✅ All existing tests pass
  • ✅ Benchmark suite confirms improvements
  • ✅ No behavioral changes to validation logic
  • ✅ Valid path performance unaffected

Breaking Changes

None. This is a pure performance optimization with no API changes.

Related Issues

This optimization is particularly beneficial for:

  • Large OpenAPI schemas (500KB+) with many inlined references
  • Bulk validation operations (hundreds or thousands of items)
  • Endpoints with complex oneOf/anyOf schemas that may generate many validation errors
  • High-throughput validation scenarios

codecov bot commented Oct 23, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.32%. Comparing base (5ace66c) to head (690e4a5).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #193      +/-   ##
==========================================
+ Coverage   97.21%   97.32%   +0.11%     
==========================================
  Files          42       42              
  Lines        4597     3740     -857     
==========================================
- Hits         4469     3640     -829     
+ Misses        128      100      -28     
Flag Coverage Δ
unittests 97.32% <100.00%> (+0.11%) ⬆️

@its-hammer-time its-hammer-time marked this pull request as ready for review October 23, 2025 17:54
Copilot AI left a comment

Pull Request Overview

This PR optimizes validation error path performance by caching the string conversion of rendered schemas. Previously, []byte to string conversions occurred repeatedly inside error loops, causing significant memory allocations (e.g., 504MB per operation in benchmarks). The optimization converts the schema once and stores it in the cache, reducing memory usage by 96.7% and improving latency by 59%.

Key Changes:

  • Added ReferenceSchema string field to SchemaCacheEntry to cache the pre-converted string
  • Modified request/response validation to convert renderedSchema once and reuse across all validation errors
  • Updated cache warming logic to populate the new field

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

File | Description
cache/cache.go | Added ReferenceSchema field to cache struct for storing pre-converted string
requests/validate_request.go | Updated to use cached ReferenceSchema string instead of repeated string(renderedSchema) conversions
responses/validate_response.go | Updated to use cached ReferenceSchema string instead of repeated string(renderedSchema) conversions
validator.go | Modified cache warming functions to populate ReferenceSchema field during schema compilation
validator_test.go | Added assertions to verify ReferenceSchema is properly cached and matches RenderedInline


@daveshanley daveshanley left a comment

LGTM!

@daveshanley daveshanley merged commit ab0675e into pb33f:main Oct 23, 2025
4 checks passed