perf(normalizer): use Rc<str> interning to reduce alias resolution allocations#726
Open
anakrish wants to merge 1 commit into
Open
perf(normalizer): use Rc<str> interning to reduce alias resolution allocations#726anakrish wants to merge 1 commit into
anakrish wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR optimizes Azure Policy alias normalization by precomputing and reusing Rc<str>-backed strings (lowercased short names and split ARM path segments) so the hot normalization path avoids repeated String allocations and per-call split('.') work.
Changes:
- Precompute
ResolvedEntrylowercase short names and path segments asRc<str>at registry-load time. - Add
ObjMaphelpers to insert pre-allocatedRc<str>keys and to lowercase keys with a fast-path. - Update alias resolution to navigate ARM paths via precomputed segments and insert common-case scalar aliases without allocating new keys.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/languages/azure_policy/aliases/types.rs |
Stores precomputed Rc<str> lowercased short name and pre-split path segments in ResolvedEntry. |
src/languages/azure_policy/aliases/obj_map.rs |
Adds Rc<str> insertion/lowercasing helpers and changes root-field collision prefix check to avoid allocation. |
src/languages/azure_policy/aliases/normalizer/alias_resolution.rs |
Uses precomputed segments for ARM navigation and inserts scalar aliases via Rc cloning in the common case. |
Comments suppressed due to low confidence (1)
src/languages/azure_policy/aliases/obj_map.rs:283
is_root_field_collisionno longer matches the previousstarts_with("properties.")behavior: the newdefault_path.len() > 11excludes"properties."(len == 11), which previously returned true. If this function is only meant to avoid allocations, consider using a slice-based, case-insensitive prefix check that preserves the exact prior semantics (including the len==11 case).
pub fn is_root_field_collision(short_name: &str, default_path: &str) -> bool {
ROOT_FIELDS
.iter()
.any(|f| f.eq_ignore_ascii_case(short_name))
&& default_path
.as_bytes()
.iter()
.zip(b"properties.")
.all(|(a, b)| a.to_ascii_lowercase() == *b)
&& default_path.len() > 11
ab744ce to
89d3997
Compare
…locations Replace per-alias heap allocations with reference-counted string interning throughout the normalizer's alias resolution pipeline: - Store alias ARM path segments as Vec<Rc<str>> instead of Vec<String>, enabling zero-alloc BTreeMap lookups via Rc::clone (refcount bump) rather than Value::from() (heap allocation per lookup) - Pre-compute lowercased short name (short_name_lc: Rc<str>) at registry-load time, enabling allocation-free insertion for the common case of non-dotted, non-collision alias short names - Add rc_lowercase() fast-path that skips allocation when the input string is already lowercase ASCII (common for ARM property names) - Update set_nested_lowercased/set_nested_inner/set_nested_in_btree to thread Rc<str> through intermediate object construction, avoiding String temporaries at every nesting level - Replace to_ascii_lowercase().starts_with() in is_root_field_collision with a zero-alloc byte-level comparison Measured 27-49% improvement in normalization time across a range of Azure Policy alias-heavy resource types (293-608 aliases per type). Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
89d3997 to
9b773fa
Compare
Comment on lines
391
to
416
| /// A resolved alias entry with its default path and optional versioned paths. | ||
| #[derive(Debug, Clone)] | ||
| pub struct ResolvedEntry { | ||
| /// The original-cased alias short name (e.g., `"accountType"`, not | ||
| /// `"accounttype"`). The entries map uses lowercase keys for | ||
| /// case-insensitive lookup, but the normalizer needs the original casing | ||
| /// to write values at correctly-cased paths in the output. | ||
| pub short_name: String, | ||
| /// The default ARM JSON path. | ||
| pub default_path: String, | ||
| /// Versioned path overrides: `(api_version, arm_path)` pairs. | ||
| pub versioned_paths: Vec<(String, String)>, | ||
| /// Optional metadata from the alias catalog (type, modifiability). | ||
| pub metadata: Option<AliasPathMetadata>, | ||
|
|
||
| // ── Precomputed fields (derived at registry-load time) ────────────── | ||
| /// Whether `short_name` contains `[*]` (i.e., this is a wildcard/array alias). | ||
| pub is_wildcard: bool, | ||
| /// Pre-lowercased short name as `Rc<str>` for allocation-free common-case inserts. | ||
| pub(crate) short_name_lc: Rc<str>, | ||
| /// Precomputed `default_path.split('.').collect()` for fast ARM path navigation. | ||
| pub default_path_segments: Vec<String>, | ||
| pub(crate) default_path_segments: Vec<Rc<str>>, | ||
| /// Precomputed path segments for each versioned path, in the same order | ||
| /// as `versioned_paths`. | ||
| pub versioned_path_segments: Vec<Vec<String>>, | ||
| pub(crate) versioned_path_segments: Vec<Vec<Rc<str>>>, | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The alias normalizer allocates fresh
Strings for path segments and lowercased short names on every resolution. Since these values are derived from the alias registry and never change, we can precompute them asRc<str>at registry-load time and reuse them via cheap refcount bumps during normalization.Concretely,
ResolvedEntrynow stores pre-split path segments asVec<Rc<str>>and a pre-lowercased short name asRc<str>. The normalizer usesRc::cloneinstead ofString::from/to_ascii_lowercasein the hot path. A few new helpers (obj_insert_rc,rc_lowercase,obj_insert_lc) keep the call sites clean. Theis_root_field_collisioncheck also avoids a temporary allocation by doing byte-level comparison directly.Benchmarking across 11 real Azure policies shows 27-49% improvement in cached eval time (including normalization), with the biggest wins on high-alias-count resources where per-alias allocation cost dominates.