Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

introduce data aggregation (#194) #234

Draft
wants to merge 3 commits into
base: develop
Choose a base branch
from
Draft

Conversation

stklcode
Copy link
Contributor

@stklcode stklcode commented Sep 8, 2022

We add a new column named hits to the database table with default value of 1.
The tracking routine is preserved as is, i.e. it will insert a single row per hit.

The cleanup routine (cron job) is extended with an aggregation function that walks through the database and groups distinct date/referrer/target combinations with the sum of all hits.

before:

created referrer target hits
2022-09-07 example.com / 1
2022-09-07 pluginkollektiv.org /test/ 1
2022-09-07 example.com / 1
2022-09-07 example.com / 1
2022-09-08 pluginkollektiv.org /test/ 1
2022-09-08 pluginkollektiv.org / 1

after:

created referrer target hits
2022-09-07 example.com / 3
2022-09-07 pluginkollektiv.org /test/ 1
2022-09-08 pluginkollektiv.org /test/ 1
2022-09-08 pluginkollektiv.org / 1

The output is generated from SUM(hits) instead of COUNT(*) then.

We also introduce a boolean book statify__skip_aggregation (defaults to false) to keep the previous behavior and disable the new aggregation.

This might be necessary if the inserted entry is extended with custom columns, e.g. in a statify__visit_saved hook. If this is the case, aggregation will fail, because we don't know about additional columns. We could of course add another hook for custom aggregation columns and dynamically build up the statements, but for most users this won't be necessary and it can be extended in future releases as well.

@stklcode stklcode linked an issue Sep 8, 2022 that may be closed by this pull request
@stklcode stklcode force-pushed the feature/194-aggregation branch from 575425b to ac6cb56 Compare September 8, 2022 16:11
This column will be used for aggregation and is filled with 1 initially.
The number of hits for a specific date/target/referrer is thus no longer
the number of entries, but the sum of all hits.
We now aggregate statistics data in the cleanup routine preserving
only distinct entries for each date/referrer/target combination with
the sum of all hits.
If Statify is extended by custom columns the aggregation routine will
fail. To support such scenarios, we introduce a boolean hook, so the
previous behavior can be preserved without breaking compatibility.
@stklcode stklcode force-pushed the feature/194-aggregation branch from ac6cb56 to fd58722 Compare December 15, 2022 08:56
@stklcode stklcode force-pushed the feature/194-aggregation branch from fd58722 to fe5e993 Compare March 18, 2023 15:41
@pluginkollektiv pluginkollektiv deleted a comment from sonarqubecloud bot Mar 18, 2023
@pluginkollektiv pluginkollektiv deleted a comment from sonarqubecloud bot Mar 18, 2023
@stklcode stklcode force-pushed the feature/194-aggregation branch from fe5e993 to cdf4f5a Compare March 18, 2023 15:54
@stklcode stklcode marked this pull request as ready for review March 18, 2023 16:01
@stklcode stklcode force-pushed the feature/194-aggregation branch from cdf4f5a to 18b8a63 Compare March 20, 2023 08:21
@stklcode stklcode force-pushed the feature/194-aggregation branch from 18b8a63 to f8e356a Compare March 20, 2023 19:48
@stklcode stklcode force-pushed the feature/194-aggregation branch from f8e356a to fe3dd71 Compare September 17, 2023 09:22
@pluginkollektiv pluginkollektiv deleted a comment from sonarqubecloud bot Sep 17, 2023
@patrickrobrecht
Copy link
Member

These changes would require changes to keep the functionality of Statify - Extended Evaluation, but hopefully helps to improve the performance with large Statify tables (I've installs with 1,000,000+ entries): patrickrobrecht/extended-evaluation-for-statify#29.

@patrickrobrecht patrickrobrecht self-requested a review October 10, 2023 20:24
@stklcode stklcode force-pushed the feature/194-aggregation branch 2 times, most recently from 48543bb to e789310 Compare October 18, 2023 09:03
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 3 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@stklcode
Copy link
Contributor Author

stklcode commented Nov 5, 2023

As previously discussed in various places this feature might need some additional planning.

  • With planned additional data collection (probably even dynamically extensible) this might require constant refactoring.
  • Is database size really still an issue so that it's worth the effort?
  • Probably drop this idea in favor of maybe some more elaborate caching of evaluation.

@stklcode stklcode marked this pull request as draft March 29, 2024 09:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data aggregation
3 participants