Feature: PPL command - WMA Trendline #3293

andy-k-improving · 2025-02-03T23:24:27Z

Description

This PR introduces a new variant of the trendline command, Weighted Moving Average (WMA), along with corresponding test cases and documentation.

Reference:
WMA calculation: https://corporatefinanceinstitute.com/resources/career-map/sell-side/capital-markets/weighted-moving-average-wma/
WMA implementation on Spark: opensearch-project/opensearch-spark#872

High-level changes:

Introduce WMA trendline feature
Fix the existing SMA implementation to support all numeric types (short, int, long, etc.) instead of double only, with test cases.

Related Issues

Resolves #3011, #3277

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
API changes companion pull request created.
Commits are signed per the DCO using --signoff.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Andy Kwok <[email protected]>

currantw

Reviewed to f6c82ae.

core/src/main/java/org/opensearch/sql/planner/physical/TrendlineOperator.java

currantw · 2025-02-04T18:04:17Z

core/src/main/java/org/opensearch/sql/planner/physical/TrendlineOperator.java

+    private static class NumericWmaEvaluator implements WmaTrendlineEvaluator {
+
+      private static final NumericWmaEvaluator INSTANCE = new NumericWmaEvaluator();
+
+      @Override
+      public ExprValue evaluate(ArrayList<ExprValue> receivedValues) {
+        double sum = 0D;
+        int totalWeight = (receivedValues.size() * (receivedValues.size() + 1)) / 2;
+        for (int i = 0; i < receivedValues.size(); i++) {
+          sum += receivedValues.get(i).doubleValue() * ((i + 1D) / totalWeight);
+        }
+        return new ExprDoubleValue(sum);
+      }
+    }


There seems to have been quite an explosion of classes here! Some ideas:

Do we need to do this with singleton classes? Rather than an Evaluator class with a single evaluate method, would it make sense to instead store a reference to an evaluate method that takes a List of ExprValues and returns the resulting ExprValue? I don't know which is more common, but seven separate Evaluator classes seems like a lot; perhaps seven Evaluator methods would be more manageable?

As alluded to above, could this take List<ExprValue> instead of an ArrayList?

As mentioned in a previous comment, I think LinkedList might be more efficient for this purpose, but not if we iterate over it by index like this! Can we use an iterator instead? Then we would get O(1) access per element for either a LinkedList or an ArrayList. See below for how I think this could look.

As mentioned in a previous comment, SMA and WMA are pretty much the same ... only the weights change. Would it be possible to change the evaluate signature so that it takes a List<ExprValue> and a List<Double> of weights? i.e. evaluate(List<ExprValue> values, List<Double> weights)? This would have the added advantage that, in the case of WMA, we wouldn't need to re-calculate the weights every time.

public ExprValue evaluate(List<ExprValue> values) {
  // Calculate the weights 1/d, 2/d, ..., n/d (oldest value first), where d = n(n+1)/2.
  int n = values.size();
  double denominator = n * (n + 1) / 2.0;
  List<Double> weights =
      IntStream.rangeClosed(1, n).mapToDouble(i -> i / denominator).boxed().toList();

  // Calculate the weighted average.
  double average = 0.0;
  Iterator<ExprValue> valuesIterator = values.iterator();
  Iterator<Double> weightsIterator = weights.iterator();
  while (valuesIterator.hasNext() && weightsIterator.hasNext()) {
    average += valuesIterator.next().doubleValue() * weightsIterator.next();
  }
  return new ExprDoubleValue(average);
}

I understand the concern about having too many Evaluators. However, these are two distinct types of Evaluator. In the SMA case, the Evaluator not only takes the new item but also reads the running total to avoid re-computation, and all evaluators under the SMA umbrella reflect this, especially at the API signature level. For example:

public ExprValue evaluate(Expression runningTotal, LiteralExpression numberOfDataPoints) {
  return DSL.divide(runningTotal, numberOfDataPoints).valueOf();
}

Also, the method calculateFirstTotal is unique to SMA:

@Override
public Expression calculateFirstTotal(List<ExprValue> dataPoints) {
  Expression total = DSL.literal(0.0D);
  for (ExprValue dataPoint : dataPoints) {
    total = DSL.add(total, DSL.literal(dataPoint.doubleValue()));
  }
  return DSL.literal(total.valueOf().doubleValue());
}

In contrast, WMA does not share this characteristic: re-computation is required on every update, and the concept of a running total does not apply.
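To illustrate the distinction drawn above, here is a minimal sketch, not the PR's code. The class and method names (`MovingAverageSketch`, `Sma`, `Wma`, `add`) are hypothetical, and plain `double` values stand in for `ExprValue`. The SMA variant keeps a running total and updates it in constant time, while the WMA variant re-applies the position-dependent weights to the whole window on each update.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch; names are illustrative and not taken from the PR. */
final class MovingAverageSketch {

  /** SMA: a running total lets each update adjust the sum in O(1). */
  static final class Sma {
    private final int window;
    private final Deque<Double> values = new ArrayDeque<>();
    private double runningTotal = 0.0;

    Sma(int window) {
      this.window = window;
    }

    /** Returns the average once the window is full, otherwise null. */
    Double add(double value) {
      values.addLast(value);
      runningTotal += value;
      if (values.size() > window) {
        runningTotal -= values.removeFirst(); // evict the oldest value
      }
      if (values.size() < window) {
        return null;
      }
      return runningTotal / window;
    }
  }

  /** WMA: every value's weight shifts when the window slides, so the sum is recomputed. */
  static final class Wma {
    private final int window;
    private final Deque<Double> values = new ArrayDeque<>();

    Wma(int window) {
      this.window = window;
    }

    /** Returns the weighted average once the window is full, otherwise null. */
    Double add(double value) {
      values.addLast(value);
      if (values.size() > window) {
        values.removeFirst();
      }
      if (values.size() < window) {
        return null;
      }
      double totalWeight = window * (window + 1) / 2.0;
      double sum = 0.0;
      int weight = 1; // oldest value gets weight 1, newest gets weight n
      for (double v : values) {
        sum += v * weight;
        weight++;
      }
      return sum / totalWeight;
    }
  }

  public static void main(String[] args) {
    Sma sma = new Sma(3);
    Wma wma = new Wma(3);
    for (double v : new double[] {1, 2, 3, 4}) {
      System.out.println("sma=" + sma.add(v) + " wma=" + wma.add(v));
    }
  }
}
```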

Regarding the concern about too many evaluators, I have moved all WMA-related evaluators inside the WeightedMovingAverageAccumulator class to further narrow their scope.

Thanks. Good explanation for why SMA and WMA accumulators should be separate - thanks for that!

For WMA, I still think there is an opportunity to combine some common logic. The weights should only need to be calculated once - can we pass them directly to the Evaluator from the Accumulator? Moreover, as I (tried to) describe in this comment, can we extract the common logic from all the WMA accumulators (related to applying the weights to the values), and only have the different parts (mapping each value to a number, mapping the result back to the right ExprValue) split out into the different implementations?

As mentioned elsewhere, I think we should also use an iterator for these loops, so that we can get O(1) access for a linked list or queue.

That makes sense. I have now used BiFunction to replace the custom WMA evaluator interface and the static class creation.
I have also moved the totalWeight calculation out of the respective function calls, as it is common to all calculations.
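The change described above might look roughly like the following sketch. It is an assumption about the shape of the code, not the PR's actual implementation: `WmaEvaluatorSketch`, `NUMERIC_WMA`, `TIMESTAMP_WMA` and `totalWeight` are hypothetical names, and `Double` and `Long` stand in for the project's `ExprValue` types. Each evaluator is a `BiFunction` that receives the window values and a total weight computed once by the caller.

```java
import java.util.List;
import java.util.function.BiFunction;

/** Hypothetical sketch; names and types are illustrative, not the PR's code. */
final class WmaEvaluatorSketch {

  /** Denominator n(n+1)/2, computed once by the caller rather than inside each evaluator. */
  static int totalWeight(int n) {
    return n * (n + 1) / 2;
  }

  /** (window values, total weight) -> weighted average as a double. */
  static final BiFunction<List<Double>, Integer, Double> NUMERIC_WMA =
      (values, totalWeight) -> {
        double sum = 0.0;
        int weight = 1; // oldest value first
        for (double v : values) { // enhanced for uses the list's iterator
          sum += v * weight / totalWeight;
          weight++;
        }
        return sum;
      };

  /** Same weighting for epoch-millisecond values, rounded back to a long. */
  static final BiFunction<List<Long>, Integer, Long> TIMESTAMP_WMA =
      (values, totalWeight) -> {
        double sum = 0.0;
        int weight = 1;
        for (long v : values) {
          sum += (double) v * weight / totalWeight;
          weight++;
        }
        return Math.round(sum);
      };
}
```

Storing the evaluators as function-typed constants avoids one singleton class per data type, which was the concern raised earlier in this thread.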

That makes sense. I have now used BiFunction to replace the custom WMA evaluator interface and the static class creation.

Thanks. This looks good to me.

I have also moved the totalWeight calculation out of the respective function calls, as it is common to all calculations.

I like that you have moved the totalWeight (the denominator) so that it doesn't need to be re-calculated each time. However, I think it is possible to move all the weight calculations out of these functions and just pass a list containing all the weights to the function (i.e. store the n "complete" weight values as a WeightedMovingAverageAccumulator member).

I also think it is possible to extract the common logic for determining the sum into another helper function: as mentioned in a previous comment, that logic is the same, except for mapping the ExprValue list to longs.

Let me know if you want to discuss either.
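A rough sketch of the two suggestions above, under stated assumptions: `PrecomputedWeightsSketch` and its members are hypothetical names, and generic mapping functions stand in for the conversions to and from `ExprValue`. The normalized weights are computed once in the constructor and stored as a member, and a single `evaluate` method holds the shared weighted-sum logic, so implementations differ only in how a value maps to a number and how the result maps back.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.DoubleFunction;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch; T is the stored value type, R the result type. */
final class PrecomputedWeightsSketch<T, R> {
  private final List<Double> weights; // normalized weights, oldest value first
  private final ToDoubleFunction<T> toNumber; // maps each value to a number
  private final DoubleFunction<R> fromNumber; // maps the result back

  PrecomputedWeightsSketch(
      int n, ToDoubleFunction<T> toNumber, DoubleFunction<R> fromNumber) {
    double denominator = n * (n + 1) / 2.0;
    List<Double> w = new ArrayList<>(n);
    for (int i = 1; i <= n; i++) {
      w.add(i / denominator); // computed once, reused on every update
    }
    this.weights = Collections.unmodifiableList(w);
    this.toNumber = toNumber;
    this.fromNumber = fromNumber;
  }

  /** Shared logic: walk values and weights with iterators, O(1) per element. */
  R evaluate(Iterable<T> values) {
    double sum = 0.0;
    Iterator<T> valueIterator = values.iterator();
    Iterator<Double> weightIterator = weights.iterator();
    while (valueIterator.hasNext() && weightIterator.hasNext()) {
      sum += toNumber.applyAsDouble(valueIterator.next()) * weightIterator.next();
    }
    return fromNumber.apply(sum);
  }

  public static void main(String[] args) {
    PrecomputedWeightsSketch<Integer, Double> numeric =
        new PrecomputedWeightsSketch<>(3, Integer::doubleValue, d -> d);
    System.out.println(numeric.evaluate(List.of(1, 2, 3)));
  }
}
```

Because `evaluate` accepts any `Iterable`, the same code works whether the window is backed by a linked list, an array list, or a queue.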

docs/user/ppl/cmd/trendline.rst

integ-test/src/test/java/org/opensearch/sql/ppl/TrendlineCommandIT.java

YANG-DB · 2025-02-04T19:10:52Z

LGTM - thanks

core/src/main/java/org/opensearch/sql/planner/physical/TrendlineOperator.java

currantw

Reviewed to 4d95529.

docs/user/ppl/cmd/trendline.rst

currantw · 2025-02-11T01:08:02Z

docs/user/ppl/cmd/trendline.rst

+
+    WMA(t) = ( Σ from i=t−n+1 to t of (w[i] * f[i]) ) / ( Σ from i=t−n+1 to t of w[i] )
+
+Example 1: Calculate the weighted moving average on one field.
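As a worked instance of the formula above, assume linearly increasing weights w = 1, 2, 3 (the weighting used by the NumericWmaEvaluator shown earlier in this PR) and a window of n = 3 values f = 10, 20, 30, listed oldest first. These sample values are illustrative only:

```latex
\[
\mathrm{WMA} = \frac{1 \cdot 10 + 2 \cdot 20 + 3 \cdot 30}{1 + 2 + 3}
             = \frac{140}{6} \approx 23.33
\]
```

The simple moving average of the same window is 20, so the weighted result leans toward the most recent value.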


Since the headers for the WMA examples include "weighted moving average", it probably makes sense to update the headers for the SMA examples to include "simple moving average" for consistency?

currantw · 2025-02-11T01:14:12Z

core/src/main/java/org/opensearch/sql/planner/physical/TrendlineOperator.java

+      super(
+          DSL.literal(computation.getNumberOfDataPoints().doubleValue()),
+          EvictingQueue.create(computation.getNumberOfDataPoints()));


Can this logic be moved up to the parent class? Seems like it is duplicated in both the SMA and WMA cases, with the exception that the SMA sub-class uses an EvictingQueue while the WMA sub-class uses a LinkedList. Can't they both use the same data structure?

core/src/main/java/org/opensearch/sql/planner/physical/TrendlineOperator.java

currantw

Reviewed to b8cf496d.

andy-k-improving added 8 commits January 31, 2025 15:32

WMA

927fbfa

Signed-off-by: Andy Kwok <[email protected]>

Update switch

3836f31

Signed-off-by: Andy Kwok <[email protected]>

Unit-test

0fcd688

Signed-off-by: Andy Kwok <[email protected]>

Integ-test

898d3a6

Signed-off-by: Andy Kwok <[email protected]>

Doc test

ab58370

Signed-off-by: Andy Kwok <[email protected]>

Spotless

51d8395

Signed-off-by: Andy Kwok <[email protected]>

Update test cases

e4c25b3

Signed-off-by: Andy Kwok <[email protected]>

Update test coverage

f6c82ae

Signed-off-by: Andy Kwok <[email protected]>

andy-k-improving requested review from ps48, kavithacm, derek-ho, joshuali925, dai-chen, YANG-DB, mengweieric, Swiddis, penghuo, seankao-az, MaxKsyunz, Yury-Fridlyand, anirudha, forestmvey, acarbonetto, GumpacG, ykmr1224, LantaoJin and noCharger as code owners February 3, 2025 23:24

currantw reviewed Feb 4, 2025

YANG-DB previously approved these changes Feb 4, 2025

Update docs/user/ppl/cmd/trendline.rst

d0cf898

Co-authored-by: Taylor Curran <[email protected]> Signed-off-by: Andy Kwok <[email protected]>

andy-k-improving dismissed YANG-DB’s stale review via d0cf898 February 6, 2025 19:46

andy-k-improving and others added 3 commits February 6, 2025 11:46

Update docs/user/ppl/cmd/trendline.rst

ce94ca9

Co-authored-by: Taylor Curran <[email protected]> Signed-off-by: Andy Kwok <[email protected]>

Remove debug

f32e5ad

Signed-off-by: Andy Kwok <[email protected]>

Address code comments

4d95529

Signed-off-by: Andy Kwok <[email protected]>