-
Notifications
You must be signed in to change notification settings - Fork 84
π§ Refactor populate_field_type to use type-directed schema walking
#1639
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 2 commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
7e380a2
π§ Refactor `populate_field_type` to use type-directed schema walking
chrisjsewell ef9aac4
Merge branch 'master' into cjs-improve-reduce-needs
chrisjsewell efd8324
Delete local_validation_cache.md
chrisjsewell fae627e
Update config_utils.py
chrisjsewell File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,294 @@ | ||
| # Local Validation Caching System | ||
|
|
||
| This document explains the local validation caching system implemented in `sphinx_needs/schema/core.py` to optimize schema validation performance. | ||
|
|
||
| ## Overview | ||
|
|
||
| The caching system avoids redundant JSON schema validation when the same need is validated against the same schema multiple times. This is common in network validation where linked needs are traversed repeatedly from different starting points. | ||
|
|
||
| ## Problem Statement | ||
|
|
||
| In network validation, the same need can be validated multiple times: | ||
|
|
||
| ``` | ||
| Need REQ_002 links to REQ_001 | ||
| Need REQ_003 links to REQ_001 | ||
|
|
||
| ββββββββββ links ββββββββββ | ||
| βREQ_002 ββββββββββββββββΊβREQ_001 β | ||
| ββββββββββ ββββββββββ | ||
| β² | ||
| ββββββββββ links β | ||
| βREQ_003 ββββββββββββββββββββββ | ||
| ββββββββββ | ||
|
|
||
| Validation order (without cache): | ||
| 1. Validate REQ_002 | ||
| βββ Network: validate REQ_001 (expensive!) | ||
|
|
||
| 2. Validate REQ_003 | ||
| βββ Network: validate REQ_001 (same validation repeated!) | ||
| ``` | ||
|
|
||
| ## Solution: Higher-Level Caching | ||
|
|
||
| Rather than restructuring the recursive algorithm (which would change warning attribution), we cache at a higher level: the raw validation results before context-dependent data is added. | ||
|
|
||
| ## Data Structures | ||
|
|
||
| ### CachedLocalResult | ||
|
|
||
| Stores the raw validation outcome without context-dependent data: | ||
|
|
||
| ```python | ||
| @dataclass(slots=True) | ||
| class CachedLocalResult: | ||
| reduced_need: dict[str, Any] # The reduced need that was validated | ||
| errors: tuple[ValidationError, ...] # Validation errors (empty if passed) | ||
| schema_error: str | None = None # Schema compilation error, if any | ||
|
|
||
| @property | ||
| def success(self) -> bool: | ||
| return not self.errors and self.schema_error is None | ||
| ``` | ||
|
|
||
| ### LocalValidationCache | ||
|
|
||
| A cache keyed by `(need_id, schema_key)`: | ||
|
|
||
| ```python | ||
| @dataclass(slots=True) | ||
| class LocalValidationCache: | ||
| _cache: dict[tuple[str, tuple[str, ...]], CachedLocalResult] | ||
|
|
||
| def get(self, need_id: str, schema_key: tuple[str, ...]) -> CachedLocalResult | None | ||
| def store(self, need_id: str, schema_key: tuple[str, ...], result: CachedLocalResult) -> None | ||
| def invalidate(self, need_id: str) -> None # Remove all entries for a need | ||
| def clear(self) -> None | ||
| ``` | ||
|
|
||
| ## Cache Key Structure | ||
|
|
||
| ``` | ||
| schema_key = tuple of schema path components | ||
|
|
||
| Examples: | ||
| ("requirement_schema", "local") | ||
| ("requirement_schema", "validate", "network", "implements", "items", "local") | ||
| ("specification_schema", "local") | ||
|
|
||
| Combined with need_id for full cache key: | ||
| ("REQ_001", ("requirement_schema", "local")) | ||
| ("REQ_001", ("requirement_schema", "validate", "network", ...)) | ||
| ``` | ||
|
|
||
| ## Validation Flow | ||
|
|
||
| ### Without Cache | ||
|
|
||
| ``` | ||
| validate_type_schema(needs, schema) | ||
| β | ||
| βΌ | ||
| βββββββββββββββββ | ||
| β For each need β | ||
| βββββββββββββββββ | ||
| β | ||
| βΌ | ||
| recurse_validate_schemas(need, ...) | ||
| β | ||
| ββββΊ LOCAL: get_ontology_warnings() | ||
| β β | ||
| β βΌ | ||
| β βββββββββββββββββββββββββββ | ||
| β β 1. reduce_need() β βββ Expensive | ||
| β β 2. validator.validate() β βββ Expensive | ||
| β β 3. Build warnings β | ||
| β βββββββββββββββββββββββββββ | ||
| β | ||
| ββββΊ NETWORK: For each link_type | ||
| β | ||
| βΌ | ||
| For each target_need_id | ||
| β | ||
| βΌ | ||
| recurse_validate_schemas(target_need, ...) βββ Same need may be | ||
| β validated multiple | ||
| βΌ times! | ||
| (repeat LOCAL + NETWORK) | ||
| ``` | ||
|
|
||
| ### With Cache | ||
|
|
||
| ``` | ||
| validate_type_schema(needs, schema) | ||
| β | ||
| βΌ | ||
| βββββββββββββββββββββββββββββββ | ||
| β Create LocalValidationCache β | ||
| β Create validator_cache β | ||
| βββββββββββββββββββββββββββββββ | ||
| β | ||
| βΌ | ||
| βββββββββββββββββ | ||
| β For each need β | ||
| βββββββββββββββββ | ||
| β | ||
| βΌ | ||
| recurse_validate_schemas(need, ..., local_cache) | ||
| β | ||
| ββββΊ LOCAL validation: | ||
| β β | ||
| β βΌ | ||
| β ββββββββββββββββββββββββββββββββββββ | ||
| β β cache_key = (schema_path, "local")β | ||
| β β cached = local_cache.get( β | ||
| β β need_id, cache_key) β | ||
| β ββββββββββββββββββββββββββββββββββββ | ||
| β β | ||
| β ββββββββ΄βββββββ | ||
| β β β | ||
| β βΌ βΌ | ||
| β CACHE HIT CACHE MISS | ||
| β β β | ||
| β βΌ βΌ | ||
| β ββββββββββ βββββββββββββββββββββββ | ||
| β βRebuild β β 1. reduce_need() β | ||
| β βwarningsβ β 2. validate() β | ||
| β βfrom β β 3. Cache result β | ||
| β βcached β β 4. Build warnings β | ||
| β βresult β βββββββββββββββββββββββ | ||
| β ββββββββββ | ||
| β | ||
| ββββΊ NETWORK: (unchanged, passes local_cache down) | ||
| ``` | ||
|
|
||
| ## What Gets Cached vs. Rebuilt | ||
|
|
||
| ### Cached (Context-Independent) | ||
|
|
||
| | Field | Description | | ||
| |-------|-------------| | ||
| | `reduced_need` | The need dict after field reduction | | ||
| | `errors` | Tuple of `ValidationError` objects | | ||
| | `schema_error` | Schema compilation error message | | ||
|
|
||
| ### Rebuilt (Context-Dependent) | ||
|
|
||
| | Field | Description | | ||
| |-------|-------------| | ||
| | `need_path` | Path like `["REQ_002", "implements", "REQ_001"]` | | ||
| | `schema_path` | Path like `["my_schema", "local", "properties"]` | | ||
| | `user_message` | Only at `recurse_level == 0` | | ||
| | `user_severity` | Only at `recurse_level == 0` | | ||
| | `rule` | `local_fail` vs `network_local_fail` | | ||
|
|
||
| This separation is key: the expensive validation is cached, but warnings are rebuilt with the correct context for each call site. | ||
|
|
||
| ## Invalidation for Incremental Updates | ||
|
|
||
| When a need is modified, its cached entries can be invalidated: | ||
|
|
||
| ``` | ||
| Before modification: | ||
| βββββββββββββββββββββββββββββββββββββββββββ | ||
| β Cache entries: β | ||
| β (REQ_001, schema_A) β result1 β | ||
| β (REQ_001, schema_B) β result2 β | ||
| β (REQ_002, schema_A) β result3 β | ||
| β (REQ_003, schema_A) β result4 β | ||
| βββββββββββββββββββββββββββββββββββββββββββ | ||
|
|
||
| After local_cache.invalidate("REQ_001"): | ||
| βββββββββββββββββββββββββββββββββββββββββββ | ||
| β Cache entries: β | ||
| β (REQ_002, schema_A) β result3 β β | ||
| β (REQ_003, schema_A) β result4 β β | ||
| βββββββββββββββββββββββββββββββββββββββββββ | ||
| ``` | ||
|
|
||
| ## Code Changes | ||
|
|
||
| ### Modified Files | ||
|
|
||
| #### `sphinx_needs/schema/core.py` | ||
|
|
||
| 1. **Added imports:** | ||
| ```python | ||
| from dataclasses import dataclass, field as dataclass_field | ||
| ``` | ||
|
|
||
| 2. **Added `CachedLocalResult` dataclass** (after line 35): | ||
| - Stores validation results without context | ||
| - Has `success` property | ||
|
|
||
| 3. **Added `LocalValidationCache` dataclass:** | ||
| - `get()` - Retrieve cached result | ||
| - `store()` - Store a validation result | ||
| - `invalidate()` - Remove all entries for a need | ||
| - `clear()` - Clear entire cache | ||
| - `__len__()` - Return cache size | ||
|
|
||
| 4. **Modified `validate_type_schema()`:** | ||
| - Creates `LocalValidationCache` instance | ||
| - Passes `local_cache` to `recurse_validate_schemas()` | ||
|
|
||
| 5. **Modified `recurse_validate_schemas()`:** | ||
| - Added `local_cache: LocalValidationCache` parameter | ||
| - Checks cache before validating | ||
| - Uses `_construct_warnings_from_cache()` on cache hit | ||
| - Passes `local_cache` and `cache_key` to `get_ontology_warnings()` on cache miss | ||
|
|
||
| 6. **Added `_construct_warnings_from_cache()` function:** | ||
| - Rebuilds `OntologyWarning` objects from cached results | ||
| - Applies current context (need_path, schema_path, user_message, user_severity) | ||
|
|
||
| 7. **Modified `get_ontology_warnings()`:** | ||
| - Added optional `local_cache` and `cache_key` parameters | ||
| - Stores results in cache when parameters are provided | ||
|
|
||
| ### New Test Files | ||
|
|
||
| #### `tests/schema/test_core_cache.py` | ||
|
|
||
| Unit tests for the caching classes: | ||
|
|
||
| - `TestCachedLocalResult` - Tests for the result dataclass | ||
| - `TestLocalValidationCache` - Tests for cache operations including: | ||
| - Basic get/store | ||
| - Different schema keys for same need | ||
| - Different needs with same schema | ||
| - Invalidation | ||
| - Clear | ||
| - Overwrite | ||
|
|
||
| ## Usage Example | ||
|
|
||
| ```python | ||
| from sphinx_needs.schema.core import LocalValidationCache, validate_type_schema | ||
|
|
||
| # The cache is created and used automatically within validate_type_schema | ||
| # No changes needed to calling code | ||
|
|
||
| # For incremental updates (future use): | ||
| cache = LocalValidationCache() | ||
| # ... validation happens ... | ||
|
|
||
| # When a need is modified: | ||
| cache.invalidate("REQ_001") # Clear cached results for this need | ||
|
|
||
| # Re-validate only affected needs | ||
| ``` | ||
|
|
||
| ## Performance Impact | ||
|
|
||
| The caching is most effective when: | ||
|
|
||
| 1. **Many needs link to the same targets** - Each target is validated once instead of N times | ||
| 2. **Deep network validation** - Reduces exponential re-validation | ||
| 3. **Multiple schemas validate the same needs** - Each (need, schema) pair is cached separately | ||
|
|
||
| The cache has minimal overhead: | ||
| - Cache lookup is O(1) dict access | ||
| - Stored data is already computed (no copies needed) | ||
| - Memory usage scales with unique (need_id, schema_key) pairs | ||
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.