OAL V2: A Complete Rewrite of Apache SkyWalking's Metrics Engine
TL;DR: We've completely rewritten the OAL (Observability Analysis Language) engine from scratch. The new V2 engine generates classes that decompile to the same code as V1's, apart from a single string-comparison bug fix. That fix is the only behavioral change. V2 also provides a cleaner architecture, better error messages, and improved maintainability.
Background: What is OAL?
OAL (Observability Analysis Language) is a domain-specific language that defines how SkyWalking aggregates telemetry data into metrics. Every time you see metrics like service_resp_time, service_cpm, or endpoint_sla in the SkyWalking UI, they're computed by classes generated from OAL scripts.
Here's what OAL looks like:
// Calculate average response time for services
service_resp_time = from(Service.latency).longAvg();
// Calculate success rate (SLA) for endpoints
endpoint_sla = from(Endpoint.*).filter(status == true).percent();
// Calculate calls per minute for service relations (client side only)
service_relation_client_cpm = from(ServiceRelation.*)
.filter(detectPoint == DetectPoint.CLIENT).cpm();
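To make the three statements above concrete, here is a plain-Java sketch of what they compute over an in-memory batch of calls. It is not how SkyWalking evaluates OAL: the real engine generates classes and aggregates per entity and time bucket. Several names and formulas here are assumptions for illustration only:

- the `Call` record and its fields;
- the boolean `clientSide`, which stands in for `detectPoint == DetectPoint.CLIENT`;
- the simplified formulas: integer average, percentage in 0-100, and calls divided by a given number of minutes.

The real functions may scale or bucket their results differently.

```java
import java.util.List;

// Hypothetical, simplified telemetry record; not SkyWalking's Service/Endpoint source classes.
record Call(String service, long latencyMs, boolean status, boolean clientSide) {}

final class OalSemantics {
    // longAvg(): integer average of a long field (0 when there is no data).
    static long longAvg(List<Call> calls) {
        long sum = 0;
        for (Call c : calls) {
            sum += c.latencyMs();
        }
        return calls.isEmpty() ? 0 : sum / calls.size();
    }

    // filter(status == true).percent(): share of calls matching the filter, as 0-100 (simplified).
    static long percent(List<Call> calls) {
        long matched = calls.stream().filter(Call::status).count();
        return calls.isEmpty() ? 0 : matched * 100 / calls.size();
    }

    // filter(detectPoint == CLIENT).cpm(): matching calls per minute over a window (simplified).
    static long clientCpm(List<Call> calls, long windowMinutes) {
        long matched = calls.stream().filter(Call::clientSide).count();
        return windowMinutes <= 0 ? 0 : matched / windowMinutes;
    }
}

class OalSemanticsDemo {
    public static void main(String[] args) {
        List<Call> calls = List.of(
            new Call("checkout", 100, true, true),
            new Call("checkout", 300, false, false),
            new Call("checkout", 200, true, true));
        System.out.println(OalSemantics.longAvg(calls));      // 200
        System.out.println(OalSemantics.percent(calls));      // 66
        System.out.println(OalSemantics.clientCpm(calls, 1)); // 2
    }
}
```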
At runtime, SkyWalking parses these scripts and generates Java classes using bytecode manipulation (Javassist). These generated classes handle:
Metrics classes: Data aggregation, serialization, time-bucketing
Dispatcher classes: Routing telemetry data to the correct metrics
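The split between these two kinds of generated classes can be sketched by hand. This is a hypothetical stand-in: `ServiceLatency`, `RespTimeMetrics`, and `ServiceRespTimeDispatcher` are illustrative names. The time bucket is simplified to minutes since the epoch. Real generated classes also carry serialization and storage hooks, which are omitted here.

The idea shown is that a dispatcher turns each source record into a metrics object keyed by entity and time bucket. Metrics objects that share a key are then combined.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical source record: one observed call to a service.
record ServiceLatency(String serviceId, long timestampMillis, long latencyMs) {}

// Hypothetical metrics class for a longAvg-style metric such as service_resp_time.
final class RespTimeMetrics {
    final String entityId;
    final long timeBucket; // minutes since the epoch (simplified time bucket)
    long summation;
    long count;

    RespTimeMetrics(String entityId, long timeBucket, long summation, long count) {
        this.entityId = entityId;
        this.timeBucket = timeBucket;
        this.summation = summation;
        this.count = count;
    }

    // Merges another partial result for the same entity and time bucket.
    void combine(RespTimeMetrics other) {
        summation += other.summation;
        count += other.count;
    }

    // Average latency in this bucket, using integer division as a long average does.
    long value() {
        return count == 0 ? 0 : summation / count;
    }

    String key() {
        return entityId + "@" + timeBucket;
    }
}

// Hypothetical dispatcher: routes each source record to the metric it feeds.
final class ServiceRespTimeDispatcher {
    private final Map<String, RespTimeMetrics> buckets = new HashMap<>();

    void dispatch(ServiceLatency source) {
        RespTimeMetrics incoming = new RespTimeMetrics(
            source.serviceId(), source.timestampMillis() / 60_000, source.latencyMs(), 1);
        // Same entity and bucket: combine; otherwise start a new entry.
        buckets.merge(incoming.key(), incoming, (existing, next) -> {
            existing.combine(next);
            return existing;
        });
    }

    RespTimeMetrics get(String serviceId, long timeBucket) {
        return buckets.get(serviceId + "@" + timeBucket);
    }
}
```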
Why Rewrite?
The original OAL engine (now called V1) was developed in 2018 and has served SkyWalking well. However, over more than six years of evolution, several architectural issues emerged:
1. Mutable Parser Models
V1 uses mutable objects throughout the parsing pipeline. The ANTLR listener directly modifies shared state:
// V1: Mutable accumulation during parsing
public class OALListener {
    private AnalysisResult current; // Mutable, reused across rules

    public void enterSource(SourceContext ctx) {
        current.setSourceName(ctx.getText()); // Direct mutation
    }
}
This made the code harder to reason about and test in isolation.
2. Mixed Concerns
V1 combines parsing logic with code generation concerns. The parser directly accesses reflection APIs to validate sources:
// V1: Parsing and validation coupled together
public void exitAggregationStatement(...) {
    // Parsing + immediate reflection-based validation
    Class<?> sourceClass = Class.forName(sourceName);
    Field field = sourceClass.getDeclaredField(fieldName);
    // ... continues with more interleaved logic
}
3. Limited Error Messages
When OAL scripts had syntax errors, V1's error messages were often cryptic:
OAL parsing failure.
No line number, no column, no indication of what went wrong.
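For contrast, an error type that always carries its position can report exactly where parsing stopped. The sketch below is hypothetical. The `SourceLocation` and `OalParseException` names, the `example.oal` file name, and the message format are illustrative and are not taken from the V2 code. It only shows the idea of attaching a file, line, and column to every parse error.

```java
// Hypothetical position of a parsed element in an OAL script.
record SourceLocation(String file, int line, int column) {
    @Override
    public String toString() {
        return file + ":" + line + ":" + column;
    }
}

// Hypothetical parse error that always reports where it happened.
final class OalParseException extends RuntimeException {
    private final SourceLocation location;

    OalParseException(SourceLocation location, String detail) {
        super("OAL parsing failure at " + location + ": " + detail);
        this.location = location;
    }

    SourceLocation location() {
        return location;
    }
}

class OalParseExceptionDemo {
    public static void main(String[] args) {
        OalParseException error = new OalParseException(
            new SourceLocation("example.oal", 12, 34), "expected ')'");
        // Prints: OAL parsing failure at example.oal:12:34: expected ')'
        System.out.println(error.getMessage());
    }
}
```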
4. String-Based Type Handling
Filter values and function arguments were stored as raw strings, requiring repeated parsing:
// V1: String-based, parsed multiple times
String filterValue = "DetectPoint.CLIENT";
// Later: parse again to determine if it's an enum, string, or number
What Changed in V2
Clean Architecture with Immutable Models
V2 introduces a clear separation between parsing, enrichment, and code generation. All parser output is immutable:
// V2: Immutable model with builder
@Getter
public class MetricDefinition {
    private final String name;
    private final SourceReference source;
    private final List<FilterExpression> filters; // Unmodifiable
    private final FunctionCall aggregationFunction;
    // Built via builder, never modified after construction
}
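Without Lombok, the same immutability guarantee can be sketched with a Java record plus a small builder. `MetricDefinitionSketch` is a hypothetical name, and its field types are simplified to strings. The real class uses `SourceReference`, `FilterExpression`, and `FunctionCall`.

The key step is `List.copyOf`. It takes an unmodifiable copy, so later changes to the builder cannot leak into a definition that was already built. This is also why tests get simpler: a definition can be constructed directly, without running the parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified immutable metric definition.
record MetricDefinitionSketch(String name, String source, List<String> filters, String aggregationFunction) {
    MetricDefinitionSketch {
        Objects.requireNonNull(name, "name");
        Objects.requireNonNull(source, "source");
        Objects.requireNonNull(aggregationFunction, "aggregationFunction");
        filters = List.copyOf(filters); // unmodifiable snapshot, detached from the builder
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private String name;
        private String source;
        private final List<String> filters = new ArrayList<>();
        private String aggregationFunction;

        Builder name(String name) { this.name = name; return this; }
        Builder source(String source) { this.source = source; return this; }
        Builder addFilter(String filter) { filters.add(filter); return this; }
        Builder aggregationFunction(String fn) { this.aggregationFunction = fn; return this; }

        MetricDefinitionSketch build() {
            return new MetricDefinitionSketch(name, source, filters, aggregationFunction);
        }
    }
}
```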
Type-Safe Filter Values
V2 represents filter values with proper types from the start, instead of storing them as raw strings.
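One way to model typed filter values is a sealed interface with one record per value kind, classified once at parse time. The sketch below is hypothetical. The `FilterValue` type names are illustrative, and so are the classification rules:

- double-quoted text is a string;
- `true`/`false` is a boolean;
- a numeric literal is a number;
- `Type.CONSTANT` is an enum reference.

These rules are assumptions, not V2's actual logic.

```java
// Hypothetical typed representation of a filter's right-hand side.
sealed interface FilterValue permits NumberValue, StringValue, BooleanValue, EnumValue {
    // Classify a raw literal once; later stages inspect the type instead of re-parsing text.
    static FilterValue parse(String raw) {
        String text = raw.strip();
        if (text.length() >= 2 && text.startsWith("\"") && text.endsWith("\"")) {
            return new StringValue(text.substring(1, text.length() - 1));
        }
        if (text.equals("true") || text.equals("false")) {
            return new BooleanValue(Boolean.parseBoolean(text));
        }
        if (text.matches("-?\\d+(\\.\\d+)?")) {
            return new NumberValue(Double.parseDouble(text));
        }
        if (text.matches("[A-Za-z_]\\w*\\.[A-Za-z_]\\w*")) {
            int dot = text.indexOf('.');
            return new EnumValue(text.substring(0, dot), text.substring(dot + 1));
        }
        throw new IllegalArgumentException("Unrecognized filter value: " + raw);
    }
}

record NumberValue(double value) implements FilterValue {}
record StringValue(String value) implements FilterValue {}
record BooleanValue(boolean value) implements FilterValue {}
record EnumValue(String enumType, String constant) implements FilterValue {}
```

With this shape, `FilterValue.parse("DetectPoint.CLIENT")` yields `EnumValue("DetectPoint", "CLIENT")` once. Code generation can then branch on the record type rather than guessing again from the string.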
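Rich Error Messages with Source Location
Every parsed element carries its source location, so error messages now point to the precise place in the script where a problem occurred.
Independent Codebase
V2 has zero dependencies on V1 code. The entire implementation lives in dedicated packages under org.apache.skywalking.oal.v2.
How We Validated: Bytecode-Level Comparison
The most critical question: Does V2 generate identical code to V1?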
We performed a comprehensive cross-check by:
Building both branches (master with V1, oal-v2 branch with V2)
Generating all classes from all 9 OAL scripts
Decompiling 946 generated classes using CFR decompiler
Comparing line-by-line
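The final comparison step can be sketched as a small program. It walks two directories of decompiled `.java` files and counts which ones match line by line. The directory layout is an assumption: one file per generated class, with the same relative paths in both trees. The decompilation itself was done with CFR and is not reproduced here. `DecompiledDiff` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

final class DecompiledDiff {
    // Result of comparing every decompiled V1 class with its V2 counterpart.
    record Summary(int total, int identical, List<Path> differing) {}

    static Summary compare(Path v1Root, Path v2Root) throws IOException {
        List<Path> v1Files;
        try (Stream<Path> walk = Files.walk(v1Root)) {
            v1Files = walk.filter(p -> p.toString().endsWith(".java")).sorted().toList();
        }
        int identical = 0;
        List<Path> differing = new ArrayList<>();
        for (Path v1File : v1Files) {
            Path relative = v1Root.relativize(v1File);
            Path v2File = v2Root.resolve(relative);
            // A class missing from the V2 tree counts as a difference.
            if (Files.exists(v2File) && Files.readAllLines(v1File).equals(Files.readAllLines(v2File))) {
                identical++;
            } else {
                differing.add(relative);
            }
        }
        return new Summary(v1Files.size(), identical, differing);
    }

    public static void main(String[] args) throws IOException {
        Summary s = compare(Path.of(args[0]), Path.of(args[1]));
        System.out.printf("%d of %d classes identical%n", s.identical(), s.total());
        s.differing().forEach(p -> System.out.println("differs: " + p));
    }
}
```

Classes that exist only in the V2 tree are not counted by this sketch. A full check would also walk the V2 directory.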
Results
| Category | Classes | Identical |
|---|---|---|
| Metrics | 455 | 0* |
| Builder | 455 | 455 (100%) |
| Dispatcher | 36 | 36 (100%) |
| Total | 946 | 491 |
*Each of the 455 Metrics classes differs from its V1 counterpart in exactly one place, and that difference is a bug fix:
V1 (incorrect):
if (remoteData.getDataStrings(0) != "") { // Reference comparison!
    this.setEntityId(remoteData.getDataStrings(0));
}
V2 (correct):
if (!remoteData.getDataStrings(0).isEmpty()) { // Proper string check
    this.setEntityId(remoteData.getDataStrings(0));
}
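V1 used != for string comparison, which compares object references rather than string content. A string produced at runtime, for example by deserialization, is not guaranteed to be the same object as the "" literal, so the V1 check can pass even for an empty value. V2 fixes this by using .isEmpty(). This is the only behavioral difference, and it's a bug fix.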
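The difference is easy to reproduce. String literals are interned, so comparing the literal `""` with itself via `!=` gives false. A string built at runtime, however, can be a separate object with the same (empty) content. The demo below uses `new String("")` to force a distinct empty string object; `EmptyStringCheck` is a hypothetical name.

```java
final class EmptyStringCheck {
    // V1-style check: compares references, not content.
    static boolean v1HasValue(String s) {
        return s != "";
    }

    // V2-style check: compares content.
    static boolean v2HasValue(String s) {
        return !s.isEmpty();
    }

    public static void main(String[] args) {
        String runtimeEmpty = new String(""); // distinct object with empty content
        System.out.println(v1HasValue(runtimeEmpty)); // true: the empty value slips through
        System.out.println(v2HasValue(runtimeEmpty)); // false
        System.out.println(v1HasValue(""));           // false: both are the same interned literal
    }
}
```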
Migration Strategy
For Users: Zero Action Required
OAL V2 is a pure internal refactoring. Your OAL scripts work exactly as before. No configuration changes, no migration steps.
For Contributors: Simplified Development
If you're extending SkyWalking's metrics:
Same OAL syntax - All existing scripts work unchanged
Better debugging - Set SW_OAL_ENGINE_DEBUG=true to write generated .class files to disk
Clearer code - Each component has a single responsibility
Easier testing - Immutable models can be constructed directly in tests
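A debug switch like this typically amounts to checking a flag and writing each generated class's bytes to a directory. The sketch below is hypothetical: only the `SW_OAL_ENGINE_DEBUG` variable name comes from this post. The `ClassDumper` class, the output directory, and the file naming are illustrative. How V2 actually chooses its output location is not described here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;

final class ClassDumper {
    private final boolean enabled;
    private final Path outputDir;

    // Takes the environment as a map so it can be tested without touching the real environment.
    ClassDumper(Map<String, String> env, Path outputDir) {
        this.enabled = Boolean.parseBoolean(env.getOrDefault("SW_OAL_ENGINE_DEBUG", "false"));
        this.outputDir = outputDir;
    }

    static ClassDumper fromEnvironment(Path outputDir) {
        return new ClassDumper(System.getenv(), outputDir);
    }

    // Writes "a.b.Foo" to <outputDir>/a/b/Foo.class when debugging is enabled.
    Optional<Path> dump(String className, byte[] bytecode) throws IOException {
        if (!enabled) {
            return Optional.empty();
        }
        Path target = outputDir.resolve(className.replace('.', '/') + ".class");
        Files.createDirectories(target.getParent());
        Files.write(target, bytecode);
        return Optional.of(target);
    }
}
```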
For Maintainers: Cleaner Codebase
The V1 code remains in the repository temporarily for reference but is no longer used. Future OAL enhancements (new aggregation functions, new filter operators, etc.) will be implemented in V2 only.
Technical Deep Dive
Parser Implementation
V2 uses the same ANTLR4 grammar as V1 (OALParser.g4, OALLexer.g4) but with a completely new listener.
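V2's listener itself is not reproduced in this post. As a stand-in, the sketch below uses a regular expression instead of ANTLR to show the same shape. Each statement is turned directly into a new immutable value, with its line number, rather than mutating a shared object. The `ParsedMetric` record, the `MiniOalParser` class, and the supported subset are assumptions for illustration. That subset is one `name = from(Source.field).func();` statement per line, with no filters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical immutable parse result, one per OAL statement.
record ParsedMetric(String name, String source, String field, String function, int line) {}

final class MiniOalParser {
    // name = from(Source.field).func();  where field may also be "*"
    private static final Pattern STATEMENT = Pattern.compile(
        "\\s*(\\w+)\\s*=\\s*from\\(\\s*(\\w+)\\.(\\w+|\\*)\\s*\\)\\.(\\w+)\\(\\)\\s*;\\s*");

    static List<ParsedMetric> parse(String script) {
        List<ParsedMetric> result = new ArrayList<>();
        String[] lines = script.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String text = lines[i];
            if (text.isBlank() || text.strip().startsWith("//")) {
                continue; // skip blank lines and comments
            }
            Matcher m = STATEMENT.matcher(text);
            if (!m.matches()) {
                throw new IllegalArgumentException(
                    "Unsupported statement at line " + (i + 1) + ": " + text.strip());
            }
            // Each rule yields a fresh value; nothing is shared between statements.
            result.add(new ParsedMetric(m.group(1), m.group(2), m.group(3), m.group(4), i + 1));
        }
        return List.copyOf(result);
    }
}
```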
Enrichment Phase
The MetricDefinitionEnricher adds metadata required for code generation:
public class MetricDefinitionEnricher {
    public CodeGenModel enrich(MetricDefinition metric) {
        // Resolve source class via reflection
        Class<?> sourceClass = resolveSourceClass(metric.getSource());

        // Extract column metadata from annotations
        List<SourceColumn> columns = extractColumns(sourceClass);

        // Determine persistent fields based on aggregation function
        List<PersistentField> fields = determinePersistentFields(
            metric.getAggregationFunction(), columns);

        return CodeGenModel.builder()
            .metric(metric)
            .sourceColumns(columns)
            .persistentFields(fields)
            .build();
    }
}
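The `extractColumns` step can be illustrated with plain reflection. In this sketch, the `@Col` annotation, the `SourceColumnSketch` record, the `ServiceSample` source class, and `ColumnExtractor` are hypothetical stand-ins for SkyWalking's own source classes and column annotations. The point is only that column metadata is read from annotations during enrichment, after parsing has finished.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical column marker; runtime retention makes it visible via reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Col {
    String name();
}

// Hypothetical column metadata produced by enrichment.
record SourceColumnSketch(String fieldName, String columnName, Class<?> type) {}

// Hypothetical source class.
class ServiceSample {
    @Col(name = "service_name") String name;
    @Col(name = "latency") long latency;
    boolean internalFlag; // not annotated, so not a column
}

final class ColumnExtractor {
    static List<SourceColumnSketch> extractColumns(Class<?> sourceClass) {
        List<SourceColumnSketch> columns = new ArrayList<>();
        for (Field field : sourceClass.getDeclaredFields()) {
            Col col = field.getAnnotation(Col.class);
            if (col != null) {
                columns.add(new SourceColumnSketch(field.getName(), col.name(), field.getType()));
            }
        }
        // getDeclaredFields() has no guaranteed order, so sort for deterministic output.
        columns.sort(Comparator.comparing(SourceColumnSketch::fieldName));
        return columns;
    }
}
```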
Code Generation
V2 uses FreeMarker templates for generating method bodies:
<#-- metrics/deserialize.ftl -->
@Override
public void deserialize(RemoteData remoteData) {
<#list serializeFields.stringFields as field>
if (!remoteData.getDataStrings(${field?index}).isEmpty()) {
this.${field.setter}(remoteData.getDataStrings(${field?index}));
}
</#list>
// ... more field types
}
This is cleaner than V1's string concatenation approach and easier to maintain.
Future Possibilities
With V2's clean architecture, several enhancements become easier:
Static analysis - Detect issues before runtime (unused metrics, type mismatches)
Incremental compilation - Only regenerate changed metrics
New aggregation functions - Cleaner extension points
Conclusion
OAL V2 is a foundational improvement to SkyWalking's core metrics engine. While invisible to end users, it sets the stage for easier maintenance and future enhancements. The bytecode-level validation gives us confidence that this rewrite preserves exact behavioral compatibility (minus one bug fix).
We welcome feedback and contributions! If you're interested in the implementation details, check out:
oap-server/oal-rt/src/main/java/org/apache/skywalking/oal/v2/
oap-server/oal-rt/src/test/java/org/apache/skywalking/oal/v2/
Apache SkyWalking is an open-source APM system. Learn more at skywalking.apache.org.