OAL V2: A Complete Rewrite of Apache SkyWalking's Metrics Engine (since 10.4.0) #13701

wu-sheng (Collaborator) · 2026-02-12T09:25:50Z

OAL V2: A Complete Rewrite of Apache SkyWalking's Metrics Engine

TL;DR: We've completely rewritten the OAL (Observability Analysis Language) engine from scratch. The classes generated by the new V2 engine match V1's output when decompiled, apart from one bug fix in the Metrics classes. Existing behavior is otherwise preserved, while the architecture is cleaner, error messages are better, and the code is easier to maintain.

Background: What is OAL?

OAL (Observability Analysis Language) is a domain-specific language that defines how SkyWalking aggregates telemetry data into metrics. Every time you see metrics like service_resp_time, service_cpm, or endpoint_sla in the SkyWalking UI, they're computed by classes generated from OAL scripts.

Here's what OAL looks like:

// Calculate average response time for services
service_resp_time = from(Service.latency).longAvg();

// Calculate success rate (SLA) for endpoints
endpoint_sla = from(Endpoint.*).filter(status == true).percent();

// Calculate calls per minute for service relations (client side only)
service_relation_client_cpm = from(ServiceRelation.*)
    .filter(detectPoint == DetectPoint.CLIENT).cpm();

At runtime, SkyWalking parses these scripts and generates Java classes using bytecode manipulation (Javassist). These generated classes handle:

Metrics classes: Data aggregation, serialization, time-bucketing
Builder classes: Storage conversion (entity ↔ database)
Dispatcher classes: Routing telemetry data to the correct metrics
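
To make the role of a generated Metrics class more concrete, here is a hand-written sketch of the aggregation that a longAvg() metric implies. The class name AvgSketch and its methods are illustrative assumptions, not the actual generated code or SkyWalking's base classes; the real classes also handle serialization and storage conversion.

```java
// Hypothetical, simplified stand-in for a generated average metric.
// It only illustrates time-bucketed aggregation; it is not SkyWalking code.
final class AvgSketch {
    private final long timeBucket; // e.g. 202602120925 for minute precision
    private long summation;
    private long count;

    AvgSketch(long timeBucket) {
        this.timeBucket = timeBucket;
    }

    /** Accept one source value, for example the latency of one request. */
    void accept(long value) {
        summation += value;
        count++;
    }

    /** Merge a partial aggregate that belongs to the same time bucket. */
    void combine(AvgSketch other) {
        if (other.timeBucket != timeBucket) {
            throw new IllegalArgumentException("time buckets differ");
        }
        summation += other.summation;
        count += other.count;
    }

    /** The aggregated value: integer average, or 0 when nothing was accepted. */
    long value() {
        return count == 0 ? 0 : summation / count;
    }
}
```

Keeping the sum and the count separately, rather than a running average, is what allows partial aggregates from different nodes to be merged without losing precision.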

Why Rewrite?

The original OAL engine (now called V1) was developed in 2018 and has served SkyWalking well. However, over more than six years of evolution, several architectural issues emerged:

1. Mutable Parser Models

V1 uses mutable objects throughout the parsing pipeline. The ANTLR listener directly modifies shared state:

// V1: Mutable accumulation during parsing
public class OALListener {
    private AnalysisResult current;  // Mutable, reused across rules

    public void enterSource(SourceContext ctx) {
        current.setSourceName(ctx.getText());  // Direct mutation
    }
}

This made the code harder to reason about and test in isolation.

2. Mixed Concerns

V1 combines parsing logic with code generation concerns. The parser directly accesses reflection APIs to validate sources:

// V1: Parsing and validation coupled together
public void exitAggregationStatement(...) {
    // Parsing + immediate reflection-based validation
    Class<?> sourceClass = Class.forName(sourceName);
    Field field = sourceClass.getDeclaredField(fieldName);
    // ... continues with more interleaved logic
}

3. Limited Error Messages

When OAL scripts had syntax errors, V1's error messages were often cryptic:

OAL parsing failure.

No line number, no column, no indication of what went wrong.

4. String-Based Type Handling

Filter values and function arguments were stored as raw strings, requiring repeated parsing:

// V1: String-based, parsed multiple times
String filterValue = "DetectPoint.CLIENT";
// Later: parse again to determine if it's an enum, string, or number

What Changed in V2

Clean Architecture with Immutable Models

V2 introduces a clear separation between parsing, enrichment, and code generation:

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  .oal file  │───▶│   Parser    │───▶│  Enricher   │───▶│  Generator  │
│             │    │             │    │             │    │             │
│ OAL script  │    │ MetricDef   │    │ CodeGenModel│    │ Bytecode    │
│             │    │ (immutable) │    │ (metadata)  │    │             │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘

All parser output is immutable:

// V2: Immutable model with builder
@Getter
public class MetricDefinition {
    private final String name;
    private final SourceReference source;
    private final List<FilterExpression> filters;  // Unmodifiable
    private final FunctionCall aggregationFunction;

    // Built via builder, never modified after construction
}
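
For readers who want to see the immutability guarantee without Lombok, the following is a minimal hand-written sketch. It assumes simplified field types (plain strings instead of SourceReference, FilterExpression, and FunctionCall), and the class name MetricDefSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of an immutable model with a builder.
// Field types are reduced to strings for illustration only.
final class MetricDefSketch {
    private final String name;
    private final String source;
    private final List<String> filters;
    private final String function;

    private MetricDefSketch(Builder b) {
        this.name = b.name;
        this.source = b.source;
        this.filters = List.copyOf(b.filters); // unmodifiable defensive copy
        this.function = b.function;
    }

    String getName() { return name; }
    String getSource() { return source; }
    List<String> getFilters() { return filters; }
    String getFunction() { return function; }

    static Builder builder() { return new Builder(); }

    static final class Builder {
        private String name;
        private String source;
        private String function;
        private final List<String> filters = new ArrayList<>();

        Builder name(String v) { this.name = v; return this; }
        Builder source(String v) { this.source = v; return this; }
        Builder function(String v) { this.function = v; return this; }
        Builder filter(String v) { this.filters.add(v); return this; }

        MetricDefSketch build() {
            if (name == null || source == null || function == null) {
                throw new IllegalStateException("name, source and function are required");
            }
            return new MetricDefSketch(this);
        }
    }
}
```

Because build() copies the filter list, later calls on the builder cannot change an instance that has already been handed out, which is what makes such models safe to construct directly in tests.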

Type-Safe Filter Values

V2 represents filter values with proper types from the start:

// V2: Type-safe from parsing
public sealed interface FilterValue {
    record StringValue(String value) implements FilterValue {}
    record LongValue(long value) implements FilterValue {}
    record BooleanValue(boolean value) implements FilterValue {}
    record EnumValue(String enumClass, String enumValue) implements FilterValue {}
    record ListValue(List<FilterValue> values) implements FilterValue {}
}
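
As an illustration of "typed from the start", the sketch below classifies a raw token exactly once and returns a typed value. The interface is repeated (without ListValue) so the example is self-contained; FilterValueParser and the token shapes it accepts are assumptions for illustration, since the real parser works on ANTLR contexts rather than raw strings.

```java
// Repeats a subset of the FilterValue interface so the sketch compiles on its own.
sealed interface FilterValue {
    record StringValue(String value) implements FilterValue {}
    record LongValue(long value) implements FilterValue {}
    record BooleanValue(boolean value) implements FilterValue {}
    record EnumValue(String enumClass, String enumValue) implements FilterValue {}
}

// Hypothetical helper: decides the type of a token once, at parse time.
final class FilterValueParser {
    private FilterValueParser() {}

    static FilterValue parse(String token) {
        String t = token.trim();
        // "GET" (with double quotes) becomes a string value
        if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
            return new FilterValue.StringValue(t.substring(1, t.length() - 1));
        }
        if (t.equals("true") || t.equals("false")) {
            return new FilterValue.BooleanValue(Boolean.parseBoolean(t));
        }
        if (t.matches("-?\\d+")) {
            return new FilterValue.LongValue(Long.parseLong(t));
        }
        // Exactly one dot with text on both sides, e.g. DetectPoint.CLIENT
        int dot = t.indexOf('.');
        if (dot > 0 && dot == t.lastIndexOf('.') && dot < t.length() - 1) {
            return new FilterValue.EnumValue(t.substring(0, dot), t.substring(dot + 1));
        }
        throw new IllegalArgumentException("Unrecognized filter value: " + token);
    }
}
```

Downstream code can then branch on the value's type instead of re-inspecting a string each time it is used.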

Rich Error Messages with Source Location

Every parsed element carries its source location:

public record SourceLocation(String fileName, int line, int column) {
    public String format() {
        return fileName + ":" + line + ":" + column;
    }
}

Error messages now include precise locations:

OAL parsing error at core.oal:42:15
  Undefined source field: Service.latencyy (did you mean: latency?)
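
The post does not describe how the "did you mean" hint is computed. One common approach is to pick the known field with the smallest edit distance to the misspelled one, and the sketch below does that. All names here (FieldSuggestionSketch, suggest, undefinedField) and the list of known fields are hypothetical.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: builds a located error message with an optional suggestion.
final class FieldSuggestionSketch {
    private FieldSuggestionSketch() {}

    /** Classic Levenshtein distance using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Closest known name, if it is within maxDistance edits. */
    static Optional<String> suggest(String unknown, List<String> known, int maxDistance) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : known) {
            int d = editDistance(unknown, candidate);
            if (d < bestDistance) {
                best = candidate;
                bestDistance = d;
            }
        }
        return bestDistance <= maxDistance ? Optional.of(best) : Optional.empty();
    }

    static String undefinedField(String file, int line, int column,
                                 String source, String field, List<String> known) {
        String hint = suggest(field, known, 2)
            .map(s -> " (did you mean: " + s + "?)")
            .orElse("");
        return "OAL parsing error at " + file + ":" + line + ":" + column + "\n"
            + "  Undefined source field: " + source + "." + field + hint;
    }
}
```

The distance threshold (2 in this sketch) is a tuning choice: too low and real typos get no hint, too high and unrelated fields are suggested.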

Independent Codebase

V2 has zero dependencies on V1 code. The entire implementation lives in dedicated packages:

org.apache.skywalking.oal.v2/
├── model/           # Immutable AST models
├── parser/          # ANTLR listener and parser facade
├── generator/       # Code generation (Javassist + FreeMarker)
├── metadata/        # Source introspection utilities
└── OALEngineV2      # Main entry point

How We Validated: Bytecode-Level Comparison

The most critical question: Does V2 generate identical code to V1?

We performed a comprehensive cross-check by:

Building both branches (master with V1, the oal-v2 branch with V2)
Generating all classes from all 9 OAL scripts
Decompiling the 946 generated classes with the CFR decompiler
Comparing the decompiled sources line by line
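
The final comparison step can be sketched as follows, assuming the decompiled sources of each branch have already been written to two directory trees with matching relative paths. The class name DecompiledTreeDiff and the directory layout are assumptions; the decompilation itself is not shown, and this is not the script that was actually used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: line-by-line comparison of two decompiled source trees.
final class DecompiledTreeDiff {
    private DecompiledTreeDiff() {}

    /**
     * Returns relative path -> description for every V1 file that is missing
     * in V2 or differs from it. Files that exist only in V2 are not reported.
     */
    static Map<String, String> diff(Path v1Root, Path v2Root) throws IOException {
        List<Path> v1Files;
        try (Stream<Path> walk = Files.walk(v1Root)) {
            v1Files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<String, String> result = new TreeMap<>();
        for (Path file : v1Files) {
            Path relative = v1Root.relativize(file);
            Path other = v2Root.resolve(relative);
            if (!Files.exists(other)) {
                result.put(relative.toString(), "missing in V2");
                continue;
            }
            List<String> a = Files.readAllLines(file);
            List<String> b = Files.readAllLines(other);
            int common = Math.min(a.size(), b.size());
            int firstDiff = -1;
            for (int i = 0; i < common; i++) {
                if (!a.get(i).equals(b.get(i))) {
                    firstDiff = i;
                    break;
                }
            }
            if (firstDiff < 0 && a.size() != b.size()) {
                firstDiff = common; // one file is a strict prefix of the other
            }
            if (firstDiff >= 0) {
                result.put(relative.toString(), "first difference at line " + (firstDiff + 1));
            }
        }
        return result;
    }
}
```

An empty result map means every V1 file has an identical counterpart in the V2 tree.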

Results

Category	Classes	Identical
Metrics	455	0 (0%)*
Builder	455	455 (100%)
Dispatcher	36	36 (100%)
Total	946	491 (51.9%)

*Each of the 455 Metrics classes differs from its V1 counterpart in exactly one place, and that difference is a bug fix:

V1 (incorrect):

if (remoteData.getDataStrings(0) != "") {  // Reference comparison!
    this.setEntityId(remoteData.getDataStrings(0));
}

V2 (correct):

if (!remoteData.getDataStrings(0).isEmpty()) {  // Proper string check
    this.setEntityId(remoteData.getDataStrings(0));
}

V1 used != for string comparison, which compares object references rather than string content. V2 fixes this by using .isEmpty(). This is the only behavioral difference, and it's a bug fix.
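
The difference between the two checks can be reproduced in isolation. The sketch below uses hypothetical helper names and a deliberately non-interned empty string; whether the strings produced during real deserialization share a reference with the "" literal depends on the runtime and is not stated in this post.

```java
// Hypothetical demo of reference comparison versus content comparison.
final class StringComparisonDemo {
    private StringComparisonDemo() {}

    /** Mirrors the V1 check: true whenever s is not the same object as the "" literal. */
    static boolean v1StyleNonEmpty(String s) {
        return s != "";
    }

    /** Mirrors the V2 check: true only when s contains at least one character. */
    static boolean v2StyleNonEmpty(String s) {
        return !s.isEmpty();
    }

    /** An empty string that is a distinct object from the interned "" literal. */
    static String freshEmptyString() {
        return new String("");
    }
}
```

For the string returned by freshEmptyString(), v1StyleNonEmpty reports true even though the string is empty, while v2StyleNonEmpty reports false. Both helpers agree on any string that actually has content.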

Migration Strategy

For Users: Zero Action Required

OAL V2 is purely an internal change. Your OAL scripts work exactly as before. No configuration changes, no migration steps.

For Contributors: Simplified Development

If you're extending SkyWalking's metrics:

Same OAL syntax - All existing scripts work unchanged
Better debugging - Set SW_OAL_ENGINE_DEBUG=true to write generated .class files to disk
Clearer code - Each component has a single responsibility
Easier testing - Immutable models can be constructed directly in tests

For Maintainers: Cleaner Codebase

The V1 code remains in the repository temporarily for reference but is no longer used. Future OAL enhancements (new aggregation functions, new filter operators, etc.) will be implemented in V2 only.

Technical Deep Dive

Parser Implementation

V2 uses the same ANTLR4 grammar as V1 (OALParser.g4, OALLexer.g4) but with a completely new listener:

public class OALListenerV2 extends OALParserBaseListener {
    private final List<MetricDefinition> metrics = new ArrayList<>();
    private MetricDefinition.Builder currentBuilder;

    @Override
    public void enterMetricStatement(MetricStatementContext ctx) {
        // Start fresh builder for each metric
        currentBuilder = MetricDefinition.builder()
            .location(locationOf(ctx));
    }

    @Override
    public void exitMetricStatement(MetricStatementContext ctx) {
        // Finalize and collect
        metrics.add(currentBuilder.build());
        currentBuilder = null;
    }
}

Enrichment Phase

The MetricDefinitionEnricher adds metadata required for code generation:

public class MetricDefinitionEnricher {
    public CodeGenModel enrich(MetricDefinition metric) {
        // Resolve source class via reflection
        Class<?> sourceClass = resolveSourceClass(metric.getSource());

        // Extract column metadata from annotations
        List<SourceColumn> columns = extractColumns(sourceClass);

        // Determine persistent fields based on aggregation function
        List<PersistentField> fields = determinePersistentFields(
            metric.getAggregationFunction(), columns);

        return CodeGenModel.builder()
            .metric(metric)
            .sourceColumns(columns)
            .persistentFields(fields)
            .build();
    }
}
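
The "extract column metadata from annotations" step relies on ordinary Java reflection. The sketch below shows the general technique with a made-up annotation and source class; SketchColumn, SketchServiceSource, and ColumnExtractorSketch are hypothetical and do not correspond to SkyWalking's real annotations or the actual extractColumns implementation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical annotation; it must be retained at runtime to be visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SketchColumn {
    String name();
}

// Hypothetical source class with two annotated fields and one plain field.
final class SketchServiceSource {
    @SketchColumn(name = "name")
    private String name;
    @SketchColumn(name = "latency")
    private int latency;
    private Object notAColumn;
}

final class ColumnExtractorSketch {
    private ColumnExtractorSketch() {}

    /** Collects "columnName:javaType" for every annotated field, including inherited ones. */
    static List<String> extractColumns(Class<?> sourceClass) {
        List<String> columns = new ArrayList<>();
        for (Class<?> c = sourceClass; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                SketchColumn column = field.getAnnotation(SketchColumn.class);
                if (column != null) {
                    columns.add(column.name() + ":" + field.getType().getSimpleName());
                }
            }
        }
        // getDeclaredFields() does not guarantee an order, so sort for stable output
        Collections.sort(columns);
        return columns;
    }
}
```

Running this kind of lookup in a dedicated enrichment phase, rather than inside the parser, means the parser output can be built and tested without any source classes on the classpath.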

Code Generation

V2 uses FreeMarker templates for generating method bodies:

<#-- metrics/deserialize.ftl -->
@Override
public void deserialize(RemoteData remoteData) {
    <#list serializeFields.stringFields as field>
    if (!remoteData.getDataStrings(${field?index}).isEmpty()) {
        this.${field.setter}(remoteData.getDataStrings(${field?index}));
    }
    </#list>
    // ... more field types
}

This is cleaner than V1's string concatenation approach and easier to maintain.

Future Possibilities

With V2's clean architecture, several enhancements become easier:

Static analysis - Detect issues before runtime (unused metrics, type mismatches)
Incremental compilation - Only regenerate changed metrics
New aggregation functions - Cleaner extension points

Conclusion

OAL V2 is a foundational improvement to SkyWalking's core metrics engine. While invisible to end users, it sets the stage for easier maintenance and future enhancements. The bytecode-level validation gives us confidence that this rewrite preserves exact behavioral compatibility (minus one bug fix).

We welcome feedback and contributions! If you're interested in the implementation details, check out:

PR: #13699
Code: oap-server/oal-rt/src/main/java/org/apache/skywalking/oal/v2/
Tests: oap-server/oal-rt/src/test/java/org/apache/skywalking/oal/v2/

Apache SkyWalking is an open-source APM system. Learn more at skywalking.apache.org.
