-
Notifications
You must be signed in to change notification settings - Fork 145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: PPL command - WMA Trendline #3293
base: main
Are you sure you want to change the base?
Feature: PPL command - WMA Trendline #3293
Conversation
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed to f6c82ae
.
core/src/main/java/org/opensearch/sql/planner/physical/TrendlineOperator.java
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/planner/physical/TrendlineOperator.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/planner/physical/TrendlineOperator.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/planner/physical/TrendlineOperator.java
Outdated
Show resolved
Hide resolved
private static class NumericWmaEvaluator implements WmaTrendlineEvaluator { | ||
|
||
private static final NumericWmaEvaluator INSTANCE = new NumericWmaEvaluator(); | ||
|
||
@Override | ||
public ExprValue evaluate(ArrayList<ExprValue> receivedValues) { | ||
double sum = 0D; | ||
int totalWeight = (receivedValues.size() * (receivedValues.size() + 1)) / 2; | ||
for (int i = 0; i < receivedValues.size(); i++) { | ||
sum += receivedValues.get(i).doubleValue() * ((i + 1D) / totalWeight); | ||
} | ||
return new ExprDoubleValue(sum); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to have been quite an explosion of classes here! Some ideas:
- Do we need to do this with singleton classes? Rather than an
Evaluator
class with a singleevaluate
method, would it make sense to instead store a reference to aevaluate
method that takes a List ofExprValue
s and returns the resultingExprValue
? Don't know what is more common, but it seems like a lot to have seven separateEvaluator
classes; perhaps sevenEvaluator
method would be more manageable? - As alluded to above, could this take
List<ExprValue>
instead of anArrayList
? - As mentioned in a previous comment, I think
LinkedList
might be more efficient for this purpose -- but not if we iterator over it like this! Are we able to use an iterator to do so? Then we would getO(1)
access for both aLinkedList
or anArrayList
. See below for how I think this could look. - As mentioned in a previous comment, SMA and WMA are pretty much the same ... only the weights change. Would it be possible to change the
evaluate
signature so that it takes aList<ExprValue>
and aList<Double>
of weights? i.e.evaluate(List<ExprValue> values, List<Double> weights)
? This would have the added advantage that, in the case of WMA, we wouldn't need to re-calculate the weights every time.
public ExprValue evaluate(List<ExprValue> values) {
// Calculate weights
int n = values.size();
double denominator = (double) (n * (n + 1)) / 2.0;
List<Double> weights = IntStream.range(n, 0).mapToDouble(i -> (double) i / denominator).boxed().toList().reversed();
// Calculate weighted average.
double average = 0.0;
Iterator<ExprValue> valuesIterator = values.iterator();
Iterator<Double> weightsIterator = weights.iterator();
while(valuesIterator.hasNext() && weightsIterator.hasNext()) {
average += valuesIterator.next().doubleValue() * weightsIterator.next();
}
return new ExprDoubleValue(average);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the point for too many Evaluator, however indeed these are two distinct types of Evaluator
, in the case of SMA, Evaluator
not just take the new item, but also read the running-total
in order to avoid re-computation, and all evaluators under the SMA umbrella take this into consideration especially on the API signature level, ex:
public ExprValue evaluate(Expression runningTotal, LiteralExpression numberOfDataPoints) {
return DSL.divide(runningTotal, numberOfDataPoints).valueOf();
}
Also the method calculateFirstTotal is unique to SMA:
@Override
public Expression calculateFirstTotal(List<ExprValue> dataPoints) {
Expression total = DSL.literal(0.0D);
for (ExprValue dataPoint : dataPoints) {
total = DSL.add(total, DSL.literal(dataPoint.doubleValue()));
}
return DSL.literal(total.valueOf().doubleValue());
}
However in contrast, WMA don't share the same characteristic, which re-computation is required upon every update, also, the concept of running-total
is not applicable here.
Regarding the concern of too many evaluators, I have updated to move all WMA related evaludator inside of class WeightedMovingAverageAccumulator
in order to further narrow down the scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Good explanation for why SMA and WMA accumulators should be separate - thanks for that!
For WMA, I still think there is opportunity to combine some common logic. The weights should only need to be calculated once - can we pass them directly to the Evaluator from the Accumulator? Moreover, as I (tried to) describe in this comment, can we extract the common logic from all the WMA accumulators (related applying the weights to the values), and only have the different parts (mapping each value to a number, mapping the result back to the right ExprValue
) split out into the different implementations.
As mentioned elsewhere, I think we should also use an iterator for these loops, so that we can get O(1) access for a linked list or queue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above make sense, and I have now using BiFunction
to replace original usage of custom interface of wmaEvalulator
along with the static class creation.
Also I have moved out the logic of totalWeight
calculation out from respective function call, as that is common to all calculation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above make sense, and I have now using BiFunction to replace original usage of custom interface of wmaEvalulator along with the static class creation.
Thanks. This looks good to me.
Also I have moved out the logic of totalWeight calculation out from respective function call, as that is common to all calculation.
I like that you have moved the totalWeight
(the denominator) so that it doesn't need to be re-calculated each time. However, I think it is possible to move all the weight calculations out of these functions, and just pass a list containing all the weights to the function (i.e. to store n
"complete" weights values as a WeightedMovingAverageAccumulator
member).
I also think it is possible to extract a the common logic for determining sum
into another helper function: as mentioned in a previous comment, that logic is the same, except for mapping the ExprValue
list to longs.
Let me know if you want to discuss either.
integ-test/src/test/java/org/opensearch/sql/ppl/TrendlineCommandIT.java
Outdated
Show resolved
Hide resolved
integ-test/src/test/java/org/opensearch/sql/ppl/TrendlineCommandIT.java
Outdated
Show resolved
Hide resolved
LGTM - thanks |
Co-authored-by: Taylor Curran <[email protected]> Signed-off-by: Andy Kwok <[email protected]>
Co-authored-by: Taylor Curran <[email protected]> Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
core/src/main/java/org/opensearch/sql/planner/physical/TrendlineOperator.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/planner/physical/TrendlineOperator.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/planner/physical/TrendlineOperator.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/planner/physical/TrendlineOperator.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed to 4d95529
.
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Co-authored-by: Taylor Curran <[email protected]> Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
|
||
WMA(t) = ( Σ from i=t−n+1 to t of (w[i] * f[i]) ) / ( Σ from i=t−n+1 to t of w[i] ) | ||
|
||
Example 1: Calculate the weighted moving average on one field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the headers for the WMA examples include "weighted moving average:, probably makes sense to update the headers for the SMA examples to include "simple moving average" for consistency?
super( | ||
DSL.literal(computation.getNumberOfDataPoints().doubleValue()), | ||
EvictingQueue.create(computation.getNumberOfDataPoints())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this logic be moved up to the parent class? Seems like it is duplicated in both the SMA and WMA cases, with the exception that the SMA sub-class uses an EvictingQueue
while the WMA sub-class uses a LinkedList
. Can't they both use the same data structure?
core/src/main/java/org/opensearch/sql/planner/physical/TrendlineOperator.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed to b8cf496d
.
Co-authored-by: Taylor Curran <[email protected]> Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Signed-off-by: Andy Kwok <[email protected]>
Description
This PR introduces a new varient (Weighted Moving Average - WMA) of trendline command, along with corresponded test-cases and documentation.
Reference:
WMA calculation: https://corporatefinanceinstitute.com/resources/career-map/sell-side/capital-markets/weighted-moving-average-wma/
WMA implementation on Spark: opensearch-project/opensearch-spark#872
High-level changes:
double
exclusive, with test-cases.Related Issues
Resolves #3011, #3277
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.