Feature: PPL command - WMA Trendline #3293

Open
wants to merge 22 commits into
base: main

Conversation

andy-k-improving
Contributor

@andy-k-improving andy-k-improving commented Feb 3, 2025

Description

This PR introduces a new variant of the trendline command, Weighted Moving Average (WMA), along with corresponding test cases and documentation.

Reference:
WMA calculation: https://corporatefinanceinstitute.com/resources/career-map/sell-side/capital-markets/weighted-moving-average-wma/
WMA implementation on Spark: opensearch-project/opensearch-spark#872

High-level changes:

  • Introduce WMA trendline feature
  • Fix the existing SMA implementation to support numeric types (short, int, long, etc.) rather than double only, with test cases.

Related Issues

Resolves #3011, #3277

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Andy Kwok <[email protected]>
Contributor

@currantw currantw left a comment


Reviewed to f6c82ae.

Comment on lines 234 to 247
private static class NumericWmaEvaluator implements WmaTrendlineEvaluator {

  private static final NumericWmaEvaluator INSTANCE = new NumericWmaEvaluator();

  @Override
  public ExprValue evaluate(ArrayList<ExprValue> receivedValues) {
    double sum = 0D;
    int totalWeight = (receivedValues.size() * (receivedValues.size() + 1)) / 2;
    for (int i = 0; i < receivedValues.size(); i++) {
      sum += receivedValues.get(i).doubleValue() * ((i + 1D) / totalWeight);
    }
    return new ExprDoubleValue(sum);
  }
}
Contributor

@currantw currantw Feb 4, 2025


There seems to have been quite an explosion of classes here! Some ideas:

  • Do we need to do this with singleton classes? Rather than an Evaluator class with a single evaluate method, would it make sense to instead store a reference to an evaluate method that takes a List of ExprValues and returns the resulting ExprValue? I don't know which is more common, but seven separate Evaluator classes seems like a lot; perhaps seven Evaluator methods would be more manageable?
  • As alluded to above, could this take List<ExprValue> instead of an ArrayList?
  • As mentioned in a previous comment, I think LinkedList might be more efficient for this purpose, but not if we iterate over it by index like this! Can we use an iterator instead? Then each element access would be O(1) for either a LinkedList or an ArrayList. See below for how I think this could look.
  • As mentioned in a previous comment, SMA and WMA are pretty much the same; only the weights change. Would it be possible to change the evaluate signature so that it takes a List<ExprValue> and a List<Double> of weights, i.e. evaluate(List<ExprValue> values, List<Double> weights)? This would have the added advantage that, in the case of WMA, we wouldn't need to re-calculate the weights every time.
  public ExprValue evaluate(List<ExprValue> values) {

    // Calculate weights 1/d, 2/d, ..., n/d (oldest value first), where d = n(n+1)/2.
    int n = values.size();
    double denominator = (n * (n + 1)) / 2.0;
    List<Double> weights =
        IntStream.rangeClosed(1, n).mapToDouble(i -> i / denominator).boxed().toList();

    // Calculate weighted average, walking both lists with iterators.
    double average = 0.0;

    Iterator<ExprValue> valuesIterator = values.iterator();
    Iterator<Double> weightsIterator = weights.iterator();

    while (valuesIterator.hasNext() && weightsIterator.hasNext()) {
      average += valuesIterator.next().doubleValue() * weightsIterator.next();
    }

    return new ExprDoubleValue(average);
  }

Contributor Author

@andy-k-improving andy-k-improving Feb 6, 2025


I understand the concern about having too many Evaluators. However, these really are two distinct types of Evaluator. In the SMA case, the Evaluator does not just take the new item; it also reads the running total in order to avoid re-computation. All evaluators under the SMA umbrella take this into account, particularly in the API signature, e.g.:

public ExprValue evaluate(Expression runningTotal, LiteralExpression numberOfDataPoints) {
  return DSL.divide(runningTotal, numberOfDataPoints).valueOf();
}

Also the method calculateFirstTotal is unique to SMA:

@Override
public Expression calculateFirstTotal(List<ExprValue> dataPoints) {
  Expression total = DSL.literal(0.0D);
  for (ExprValue dataPoint : dataPoints) {
    total = DSL.add(total, DSL.literal(dataPoint.doubleValue()));
  }
  return DSL.literal(total.valueOf().doubleValue());
}

In contrast, WMA does not share this characteristic: re-computation is required on every update, and the concept of a running total does not apply.

Regarding the concern about too many evaluators, I have moved all WMA-related evaluators inside the WeightedMovingAverageAccumulator class to narrow their scope further.
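
To illustrate the distinction described in this comment, here is a minimal sketch. It is not the PR's code: the class names SmaWindow and WmaWindow are hypothetical, and plain doubles stand in for ExprValue. The SMA window updates a running total in O(1) per new value, while the WMA window, following the approach described here, re-applies the weights to the whole window on every update.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical SMA window: keeps a running total so each update is O(1). */
final class SmaWindow {
  private final int size;
  private final Deque<Double> window = new ArrayDeque<>();
  private double runningTotal = 0.0;

  SmaWindow(int size) {
    this.size = size;
  }

  /** Adds a value; returns the average once the window is full, otherwise null. */
  Double add(double value) {
    window.addLast(value);
    runningTotal += value;
    if (window.size() > size) {
      runningTotal -= window.removeFirst(); // drop the oldest value from the total
    }
    return window.size() == size ? runningTotal / size : null;
  }
}

/** Hypothetical WMA window: re-applies the weights 1..n on every update. */
final class WmaWindow {
  private final int size;
  private final Deque<Double> window = new ArrayDeque<>();

  WmaWindow(int size) {
    this.size = size;
  }

  /** Adds a value; returns the weighted average once the window is full, otherwise null. */
  Double add(double value) {
    window.addLast(value);
    if (window.size() > size) {
      window.removeFirst();
    }
    if (window.size() < size) {
      return null;
    }
    double totalWeight = size * (size + 1) / 2.0;
    double sum = 0.0;
    int weight = 1; // the oldest value gets weight 1, the newest gets weight n
    for (double v : window) {
      sum += v * weight++;
    }
    return sum / totalWeight;
  }
}
```

For a window of 3 fed 1, 2, 3, the SMA sketch divides the running total 6 by 3, while the WMA sketch computes (1*1 + 2*2 + 3*3) / 6.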

Contributor

Thanks, that is a good explanation of why the SMA and WMA accumulators should be separate.

For WMA, I still think there is an opportunity to combine some common logic. The weights should only need to be calculated once; can we pass them directly to the Evaluator from the Accumulator? Moreover, as I (tried to) describe in this comment, can we extract the common logic from all the WMA accumulators (related to applying the weights to the values), and split only the parts that differ (mapping each value to a number, and mapping the result back to the right ExprValue) into the separate implementations?

As mentioned elsewhere, I think we should also use an iterator for these loops, so that we get O(1) access per element for a linked list or queue.
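
As a rough sketch of the first and last suggestions (weights computed once, values walked with an iterator), something along these lines could work. PrecomputedWma is a hypothetical name, and Number stands in for ExprValue; this is not the PR's implementation.

```java
import java.util.List;

/** Hypothetical evaluator that computes the WMA weights once, at construction. */
final class PrecomputedWma {
  // weights[i] = (i + 1) / (n(n + 1) / 2), ordered from oldest value to newest
  private final double[] weights;

  PrecomputedWma(int n) {
    double totalWeight = n * (n + 1) / 2.0;
    weights = new double[n];
    for (int i = 0; i < n; i++) {
      weights[i] = (i + 1) / totalWeight;
    }
  }

  /** Applies the stored weights to a full window of values (oldest first). */
  double evaluate(List<? extends Number> values) {
    if (values.size() != weights.length) {
      throw new IllegalArgumentException("expected " + weights.length + " values");
    }
    double average = 0.0;
    int i = 0;
    // The enhanced for loop uses the list's iterator, so a LinkedList is
    // traversed in O(n) overall rather than O(n^2) with get(i).
    for (Number value : values) {
      average += value.doubleValue() * weights[i++];
    }
    return average;
  }
}
```

One instance would be created per trendline computation and reused for every window, so only the multiply-and-add loop runs per update.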

Contributor Author


The above makes sense. I am now using BiFunction in place of the custom WMA evaluator interface and the static class creation.
I have also moved the totalWeight calculation out of the respective function calls, as it is common to all calculations.

Contributor

The above makes sense. I am now using BiFunction in place of the custom WMA evaluator interface and the static class creation.

Thanks. This looks good to me.

I have also moved the totalWeight calculation out of the respective function calls, as it is common to all calculations.

I like that you have moved the totalWeight (the denominator) so that it doesn't need to be re-calculated each time. However, I think it is possible to move all the weight calculations out of these functions, and just pass a list containing all the weights to the function (i.e. store the n "complete" weight values as a WeightedMovingAverageAccumulator member).

I also think it is possible to extract the common logic for determining the sum into another helper function: as mentioned in a previous comment, that logic is the same, except for mapping the ExprValue list to longs.

Let me know if you want to discuss either.
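
One possible shape for such a helper, offered only as a sketch: WmaHelper and its method names are hypothetical, and Double and java.time.Instant stand in for the numeric and date/time ExprValue types. The weighted sum lives in one generic method, and each type supplies only the mapping to a number and the mapping back.

```java
import java.time.Instant;
import java.util.List;
import java.util.function.DoubleFunction;
import java.util.function.ToDoubleFunction;

/** Hypothetical helper that shares the weighted-sum loop across value types. */
final class WmaHelper {

  /** Returns the weights 1/d, 2/d, ..., n/d where d = n(n + 1) / 2. */
  static double[] weights(int n) {
    double totalWeight = n * (n + 1) / 2.0;
    double[] weights = new double[n];
    for (int i = 0; i < n; i++) {
      weights[i] = (i + 1) / totalWeight;
    }
    return weights;
  }

  /** Common logic: map each value to a number, apply the weights, map the sum back. */
  static <T, R> R weightedAverage(
      List<T> values,
      double[] weights,
      ToDoubleFunction<T> toNumber,
      DoubleFunction<R> fromNumber) {
    double sum = 0.0;
    int i = 0;
    for (T value : values) {
      sum += toNumber.applyAsDouble(value) * weights[i++];
    }
    return fromNumber.apply(sum);
  }

  /** Numeric case: values are already numbers. */
  static Double numeric(List<Double> values, double[] weights) {
    return weightedAverage(values, weights, Double::doubleValue, sum -> sum);
  }

  /** Timestamp case: average the epoch milliseconds, then map back to an Instant. */
  static Instant timestamp(List<Instant> values, double[] weights) {
    return weightedAverage(
        values, weights, Instant::toEpochMilli, ms -> Instant.ofEpochMilli(Math.round(ms)));
  }
}
```

With this split, adding another value type would mean adding one more pair of mapping functions rather than another copy of the loop.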

YANG-DB previously approved these changes Feb 4, 2025
Member

YANG-DB commented Feb 4, 2025

LGTM - thanks

Co-authored-by: Taylor Curran <[email protected]>
andy-k-improving and others added 3 commits February 6, 2025 11:46
Contributor

@currantw currantw left a comment


Reviewed to 4d95529.

andy-k-improving and others added 6 commits February 7, 2025 16:07

WMA(t) = ( Σ from i=t−n+1 to t of (w[i] * f[i]) ) / ( Σ from i=t−n+1 to t of w[i] )

Example 1: Calculate the weighted moving average on one field.
Contributor


Since the headers for the WMA examples include "weighted moving average", it probably makes sense to update the headers for the SMA examples to include "simple moving average" for consistency?

Comment on lines 144 to 146
super(
    DSL.literal(computation.getNumberOfDataPoints().doubleValue()),
    EvictingQueue.create(computation.getNumberOfDataPoints()));
Contributor


Can this logic be moved up to the parent class? Seems like it is duplicated in both the SMA and WMA cases, with the exception that the SMA sub-class uses an EvictingQueue while the WMA sub-class uses a LinkedList. Can't they both use the same data structure?
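
A minimal sketch of what a shared parent could look like, under the assumption that both variants can use one bounded queue. The class names are hypothetical, plain doubles stand in for ExprValue, and a java.util.ArrayDeque with manual eviction is used here only to keep the example free of dependencies (Guava's EvictingQueue evicts the oldest element automatically when full).

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical parent class that owns the bounded window for both variants. */
abstract class TrendlineWindowSketch {
  protected final int windowSize;
  protected final Queue<Double> window;

  protected TrendlineWindowSketch(int windowSize) {
    this.windowSize = windowSize;
    this.window = new ArrayDeque<>(windowSize);
  }

  /** Adds a value, evicting the oldest one first if the window is already full. */
  final void accumulate(double value) {
    if (window.size() == windowSize) {
      window.remove();
    }
    window.add(value);
  }

  /** Computes the moving average over the values currently in the window. */
  abstract double calculate();
}

/** Simple moving average: every value has the same weight. */
final class SmaSketch extends TrendlineWindowSketch {
  SmaSketch(int windowSize) {
    super(windowSize);
  }

  @Override
  double calculate() {
    double sum = 0.0;
    for (double v : window) {
      sum += v;
    }
    return sum / window.size();
  }
}

/** Weighted moving average: the oldest value has weight 1, the newest weight n. */
final class WmaSketch extends TrendlineWindowSketch {
  WmaSketch(int windowSize) {
    super(windowSize);
  }

  @Override
  double calculate() {
    int n = window.size();
    double totalWeight = n * (n + 1) / 2.0;
    double sum = 0.0;
    int weight = 1;
    for (double v : window) {
      sum += v * weight++;
    }
    return sum / totalWeight;
  }
}
```

Only calculate differs between the two subclasses; the window construction and eviction live in one place.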

Contributor

@currantw currantw left a comment


Reviewed to b8cf496d.

andy-k-improving and others added 4 commits February 11, 2025 10:44
Successfully merging this pull request may close these issues.

[META]PPL new trendline command
3 participants