From 148591c016aa4e302a239f29146301910b79139c Mon Sep 17 00:00:00 2001 From: diosmosis Date: Fri, 16 Jun 2023 13:04:17 -0700 Subject: [PATCH 1/7] add document on how to write a RecordBuilder --- docs/5.x/writing-a-record-builder.md | 227 +++++++++++++++++++++++++++ 1 file changed, 227 insertions(+) create mode 100644 docs/5.x/writing-a-record-builder.md diff --git a/docs/5.x/writing-a-record-builder.md b/docs/5.x/writing-a-record-builder.md new file mode 100644 index 000000000..574a000f6 --- /dev/null +++ b/docs/5.x/writing-a-record-builder.md @@ -0,0 +1,227 @@ +--- +category: Develop +--- + +
+**This API is unstable.** + +The RecordBuilder API will eventually be public and the only way to define archiving logic, but for now the API is unstable +and subject to change. Please be aware it could potentially change between minor version releases. +
+ +# Writing a RecordBuilder + +RecordBuilders encapsulate the smallest units of log aggregation logic required to generate records for a plugin. + +They define two methods: `aggregate()` which builds the actual `DataTable` and numeric records to insert into archive tables +and `getRecordMetadata()` which returns information about what records the `RecordBuilder` builds. + +`aggregate()` will generally aggregate data from log tables to create records, but it does not have to. An example of a use case +without aggregation would be importing analytics data from another service. + +`getRecordMetadata()` is used when aggregating records for non-day periods. In this case, Matomo will find the record values +for the subperiods of the non-day period and aggregate them together. + +If your plugin needs to insert data into the archive tables during archiving, then you'll want to create your own `RecordBuilders`. +This guide describes how to do that. + +## How to create one + +### Step one: identify the list of records and log aggregation queries you want to bundle together + +Log aggregation queries are expensive (especially with segmentation) and Matomo wants to be able to run as few of them +as necessary at a time. A `RecordBuilder` is meant to encapsulate the smallest amount of archiving logic possible so Matomo +can run just one bit at a time if it needs to. + +Many times this will either be running a single log aggregation query to generate a single `DataTable` or running a single +log aggregation query to generate multiple numeric metrics. Sometimes it will mean running multiple log aggregation queries +to generate a single `DataTable` or multiple log aggregation queries to generate multiple `DataTable`s and multiple metrics. + +It is up to you as a developer to find the balance between efficiency (executing the fewest log aggregation queries overall) +and modularity (having `RecordBuilders` that individually do as little as possible). 
+ +Once you've done this, create the new `RecordBuilder` class in a `RecordBuilders` subfolder of your plugin. For example, +`/path/to/matomo/plugins/MyPlugin/RecordBuilders/MyRecordBuilder`. + +**A note about Parameterized RecordBuilders** + +`RecordBuilder`s that can be created without specifying constructor arguments (as in, are default constructable) +are found and created automatically by Matomo. But it is also possible to create `RecordBuilder`s that require +parameters. These `RecordBuilder`s are added via the `Archiver.addRecordBuilders` event. + +The ability to create parameterized `RecordBuilder`s may not be necessary in most cases, but if your plugin +manages entities and provides reports about those entities, it can be used to avoid having to run a query for +every entity in the database in one `RecordBuilder`. + +Examples of plugins that use this feature are the Custom Reports premium feature and the A/B Testing premium feature. +Each of these plugins use a `RecordBuilder` that takes an ID. For Custom Reports this is the ID of the specific custom +report and for A/B Testing this is the ID of the experiment. + +### Step two: implement `getRecordMetadata()` + +Once you know what queries the `RecordBuilder`s you are going to create will execute, you can start writing some code. +The first thing to do is implement the `getRecordMetadata()` method. + +All this method does is return a list of `Record` entries describing the records the builder will create: + +``` +use Piwik\ArchiveProcessor\Record; + +public function getRecordMetadata(ArchiveProcessor $archiveProcessor): array +{ + return [ + Record::make(Record::TYPE_BLOB, 'MyPlugin_myRecord'), + Record::make(Record::TYPE_NUMERIC, 'MyPlugin_myMetric'), + ... + ]; +} +``` + +The code in this method will usually look like the above, but it doesn't have to just be a hard-coded array. 
+You can use the `ArchiveProcessor` to get the current site/period/segment or fetch system settings or measurable +settings and vary the result based on that. The only requirement is that every `Record` returns is matches +what can be returned by the `aggregate()` method, which we'll look at next. + +### Step three: implement `aggregate()` + +The next step is to implement your actual log aggregation logic in the `aggregate()` method. This method accepts +an `ArchiveProcessor` and returns an array mapping record names with record values to insert. Record values are +either numeric metric values or `DataTable` instances, which get serialized and inserted as blobs. + +As for how they are created, well, there is no straightforward way to define how log aggregation is done. + +The current pattern in Matomo is to use the core `LogAggregator` class to query log data and loop through the result. +If your plugin provides its own additional log tables, then the pattern is to define your own `Aggregator` classes +to build and execute log aggregation SQL queries, and use those in `RecordBuilders`. 
+ +An example of this might look like: + +``` +public function aggregate(ArchiveProcessor $archiveProcessor): array +{ + $logAggregator = $archiveProcessor->getLogAggregator(); + + $report = new DataTable(); + + $query = $logAggregator->queryVisitsByDimension(['label' => 'config_browser_name']); + while ($row = $query->fetch()) { + $columns = [ + Metrics::INDEX_NB_UNIQ_VISITORS => $row[Metrics::INDEX_NB_UNIQ_VISITORS], + Metrics::INDEX_NB_VISITS => $row[Metrics::INDEX_NB_VISITS], + Metrics::INDEX_NB_ACTIONS => $row[Metrics::INDEX_NB_ACTIONS], + Metrics::INDEX_NB_USERS => $row[Metrics::INDEX_NB_USERS], + Metrics::INDEX_MAX_ACTIONS => $row[Metrics::INDEX_MAX_ACTIONS], + Metrics::INDEX_SUM_VISIT_LENGTH => $row[Metrics::INDEX_SUM_VISIT_LENGTH], + Metrics::INDEX_BOUNCE_COUNT => $row[Metrics::INDEX_BOUNCE_COUNT], + Metrics::INDEX_NB_VISITS_CONVERTED => $row[Metrics::INDEX_NB_VISITS_CONVERTED], + ]; + + $report->sumRowWithLabel($row['label'] ?? '', $columns); + } + + return [ + 'MyPlugin_myRecord' => $report, + ]; +} +``` + +This example queries the `log_visit` table, grouping by the `config_browser_name` column and aggregating visit metrics. +Then for each row of that query, adds the metrics to a `DataTable` which is eventually returned. + +Most `aggregate()` methods will be more complicated than this, but hopefully it provides you with a general understanding +of how they should work. We recommend looking at existing `RecordBuilder`s in Matomo as well to see what is possible. + +### Step four: decide whether you need to set custom row limits or aggregation operations + +At this point, the hard parts are over. The last two steps are just finishing touches. + +By default, Matomo does not limit the data that is inserted into archive tables. For reports that have a limited number +of rows, like the `VisitorInterest.getVisitsByVisitCount` and `UserCountry.getCountry`, this is fine. 
But for reports +with a variable number of rows, it is good practice to make sure the number of rows is capped. + +To set a limit, set the `maxRowsInTable` and `maxRowsInSubtable` properties in the constructor of your `RecordBuilder`. +This can be hard-coded or it can come from configuration: + +``` +class MyRecordBuilder extends RecordBuilder +{ + public function __construct() + { + parent::__construct(); + $this->maxRowsInTable = (int)Config::getInstance()->MyPlugin['datatable_archiving_maximum_rows']; + $this->maxRowsInSubtable = (int)Config::getInstance()->MyPlugin['datatable_archiving_maximum_rows_subtable']; + + // we want to sort by the most important metric in our reports before we cut off rows + $this->columnToSortByBeforeTruncation = Metrics::INDEX_NB_VISITS; + } +} +``` + +If you don't know what to use, you can set both values to `Config::getInstance()->General['datatable_archiving_maximum_rows_standard']`. + +Also note we set `columnToSortByBeforeTruncation` to make sure the rows with the least visits are the ones that get removed. + +Additionally, if your plugin provides metrics that should be aggregated together with an operation other than being `sum`-ed, +you will need to set the `$columnAggregationOps` property: + +``` +class MyRecordBuilder extends RecordBuilder +{ + public function __construct() + { + parent::__construct(); + + // ... + + $this->columnAggregationOps = [ + 'my_max_metric' => 'max', + 'my_min_metric' => 'min', + 'my_other_metric' => function ($thisValue, $otherValue) { + // custom aggregation logic here + }, + ]; + } +} +``` + +Note that each of these settings can also be overridden for specific records by setting the relevant property +on `Record`s in your `getRecordMetadata()` method. + +### Step five: if your RecordBuilder is parameterized, implement the relevant event + +If your `RecordBuilder` is not parameterized then there's nothing else to do. You're done and Matomo will detect and use it. 
+ +If it is parameterized, then there's still one thing to do. Matomo will not be able to automatically create a `RecordBuilder` +that takes parameters, so it must be added manually in the `Archiver.addRecordBuilders` event like so: + +``` +class MyPlugin +{ + public function registerEvents() + { + $hooks = [ + 'Archiver.addRecordBuilders' => 'addRecordBuilders', + ]; + return $hooks; + } + + public function addRecordBuilders(array &$recordBuilders): void + { + $idSite = \Piwik\Request::fromRequest()->getIntegerParameter('idSite', 0); + if (!$idSite) { + return; + } + + $entities = StaticContainer::get(MyEntityDao::class)->getAllEntitiesForSite($idSite); + foreach ($entities as $entity) { + $recordBuilders[] = new MyRecordBuilder($entity); + } + } +} +``` + +Here we create a `RecordBuilder` instance for every entity our plugin manages. + +--- + +And that's it, your `RecordBuilder` is done. From 49a3f79d96bf0ddd12ead9e21d4429312d9fb319 Mon Sep 17 00:00:00 2001 From: dizzy Date: Sat, 17 Jun 2023 19:10:47 -0700 Subject: [PATCH 2/7] Update writing-a-record-builder.md --- docs/5.x/writing-a-record-builder.md | 34 ++++++++++++++-------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/docs/5.x/writing-a-record-builder.md b/docs/5.x/writing-a-record-builder.md index 574a000f6..3e0d0ab48 100644 --- a/docs/5.x/writing-a-record-builder.md +++ b/docs/5.x/writing-a-record-builder.md @@ -11,9 +11,9 @@ and subject to change. Please be aware it could potentially change between minor # Writing a RecordBuilder -RecordBuilders encapsulate the smallest units of log aggregation logic required to generate records for a plugin. +RecordBuilders encapsulate the smallest units of aggregation logic required to generate records for a plugin. 
-They define two methods: `aggregate()` which builds the actual `DataTable` and numeric records to insert into archive tables +They define two methods: `aggregate()` which builds the actual `DataTable` & numeric records to insert into archive tables, and `getRecordMetadata()` which returns information about what records the `RecordBuilder` builds. `aggregate()` will generally aggregate data from log tables to create records, but it does not have to. An example of a use case @@ -22,7 +22,7 @@ without aggregation would be importing analytics data from another service. `getRecordMetadata()` is used when aggregating records for non-day periods. In this case, Matomo will find the record values for the subperiods of the non-day period and aggregate them together. -If your plugin needs to insert data into the archive tables during archiving, then you'll want to create your own `RecordBuilders`. +If your plugin needs to insert data into the archive tables during archiving, then you'll want to create your own `RecordBuilder` classes. This guide describes how to do that. ## How to create one @@ -30,17 +30,17 @@ This guide describes how to do that. ### Step one: identify the list of records and log aggregation queries you want to bundle together Log aggregation queries are expensive (especially with segmentation) and Matomo wants to be able to run as few of them -as necessary at a time. A `RecordBuilder` is meant to encapsulate the smallest amount of archiving logic possible so Matomo -can run just one bit at a time if it needs to. +as possible at a time. A `RecordBuilder` is meant to encapsulate the smallest amount of archiving logic possible, allowing Matomo +can run just what it needs to. Many times this will either be running a single log aggregation query to generate a single `DataTable` or running a single log aggregation query to generate multiple numeric metrics. 
Sometimes it will mean running multiple log aggregation queries -to generate a single `DataTable` or multiple log aggregation queries to generate multiple `DataTable`s and multiple metrics. +to generate a single `DataTable` or running multiple log aggregation queries to generate multiple `DataTable`s and multiple metrics. It is up to you as a developer to find the balance between efficiency (executing the fewest log aggregation queries overall) and modularity (having `RecordBuilders` that individually do as little as possible). -Once you've done this, create the new `RecordBuilder` class in a `RecordBuilders` subfolder of your plugin. For example, +Once you've identified the `RecordBuilder`s you'll need, create empty classes for them in a `RecordBuilders` subfolder of your plugin. For example, `/path/to/matomo/plugins/MyPlugin/RecordBuilders/MyRecordBuilder`. **A note about Parameterized RecordBuilders** @@ -51,7 +51,7 @@ parameters. These `RecordBuilder`s are added via the `Archiver.addRecordBuilders The ability to create parameterized `RecordBuilder`s may not be necessary in most cases, but if your plugin manages entities and provides reports about those entities, it can be used to avoid having to run a query for -every entity in the database in one `RecordBuilder`. +every entity in the database within a single `RecordBuilder`. Examples of plugins that use this feature are the Custom Reports premium feature and the A/B Testing premium feature. Each of these plugins use a `RecordBuilder` that takes an ID. For Custom Reports this is the ID of the specific custom @@ -59,7 +59,7 @@ report and for A/B Testing this is the ID of the experiment. ### Step two: implement `getRecordMetadata()` -Once you know what queries the `RecordBuilder`s you are going to create will execute, you can start writing some code. +Once you know what queries the `RecordBuilder`s you are going to create will execute, you can start coding. 
The first thing to do is implement the `getRecordMetadata()` method. All this method does is return a list of `Record` entries describing the records the builder will create: @@ -77,9 +77,9 @@ public function getRecordMetadata(ArchiveProcessor $archiveProcessor): array } ``` -The code in this method will usually look like the above, but it doesn't have to just be a hard-coded array. +The above is a typical example of how this method would be implemented, but it doesn't have to just be a hard-coded array. You can use the `ArchiveProcessor` to get the current site/period/segment or fetch system settings or measurable -settings and vary the result based on that. The only requirement is that every `Record` returns is matches +settings and vary the result based on that information. The only requirement is that every `Record` returned matches what can be returned by the `aggregate()` method, which we'll look at next. ### Step three: implement `aggregate()` @@ -92,7 +92,7 @@ As for how they are created, well, there is no straightforward way to define how The current pattern in Matomo is to use the core `LogAggregator` class to query log data and loop through the result. If your plugin provides its own additional log tables, then the pattern is to define your own `Aggregator` classes -to build and execute log aggregation SQL queries, and use those in `RecordBuilders`. +to build and execute log aggregation SQL queries, and use those classes in your `RecordBuilders`. An example of this might look like: @@ -126,7 +126,7 @@ public function aggregate(ArchiveProcessor $archiveProcessor): array ``` This example queries the `log_visit` table, grouping by the `config_browser_name` column and aggregating visit metrics. -Then for each row of that query, adds the metrics to a `DataTable` which is eventually returned. +Then, for each row of that query, it adds the metrics to a `DataTable` which is eventually returned. 
Most `aggregate()` methods will be more complicated than this, but hopefully it provides you with a general understanding of how they should work. We recommend looking at existing `RecordBuilder`s in Matomo as well to see what is possible. @@ -136,8 +136,8 @@ of how they should work. We recommend looking at existing `RecordBuilder`s in Ma At this point, the hard parts are over. The last two steps are just finishing touches. By default, Matomo does not limit the data that is inserted into archive tables. For reports that have a limited number -of rows, like the `VisitorInterest.getVisitsByVisitCount` and `UserCountry.getCountry`, this is fine. But for reports -with a variable number of rows, it is good practice to make sure the number of rows is capped. +of rows, like the `VisitorInterest.getVisitsByVisitCount` and `UserCountry.getCountry`, this is acceptable. But for reports +with a variable number of rows, it's good practice to make sure the number of rows is capped. To set a limit, set the `maxRowsInTable` and `maxRowsInSubtable` properties in the constructor of your `RecordBuilder`. This can be hard-coded or it can come from configuration: @@ -185,13 +185,13 @@ class MyRecordBuilder extends RecordBuilder ``` Note that each of these settings can also be overridden for specific records by setting the relevant property -on `Record`s in your `getRecordMetadata()` method. +on `Record` instances in your `getRecordMetadata()` method. ### Step five: if your RecordBuilder is parameterized, implement the relevant event If your `RecordBuilder` is not parameterized then there's nothing else to do. You're done and Matomo will detect and use it. -If it is parameterized, then there's still one thing to do. Matomo will not be able to automatically create a `RecordBuilder` +If it is parameterized, then there's still one thing left to do. 
Matomo will not be able to automatically create a `RecordBuilder` that takes parameters, so it must be added manually in the `Archiver.addRecordBuilders` event like so: ``` From 1e5ad878ce157736045b70bbd528700ec8dbfca7 Mon Sep 17 00:00:00 2001 From: dizzy Date: Sun, 18 Jun 2023 10:36:47 -0700 Subject: [PATCH 3/7] Update docs/5.x/writing-a-record-builder.md Co-authored-by: Michal Kleiner --- docs/5.x/writing-a-record-builder.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/5.x/writing-a-record-builder.md b/docs/5.x/writing-a-record-builder.md index 3e0d0ab48..718def002 100644 --- a/docs/5.x/writing-a-record-builder.md +++ b/docs/5.x/writing-a-record-builder.md @@ -31,7 +31,7 @@ This guide describes how to do that. Log aggregation queries are expensive (especially with segmentation) and Matomo wants to be able to run as few of them as possible at a time. A `RecordBuilder` is meant to encapsulate the smallest amount of archiving logic possible, allowing Matomo -can run just what it needs to. +to run just what it needs to. Many times this will either be running a single log aggregation query to generate a single `DataTable` or running a single log aggregation query to generate multiple numeric metrics. Sometimes it will mean running multiple log aggregation queries From a540e1fc511923e15086a676c69b5949c03e2b4f Mon Sep 17 00:00:00 2001 From: dizzy Date: Sun, 18 Jun 2023 10:37:06 -0700 Subject: [PATCH 4/7] Update docs/5.x/writing-a-record-builder.md Co-authored-by: Michal Kleiner --- docs/5.x/writing-a-record-builder.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/5.x/writing-a-record-builder.md b/docs/5.x/writing-a-record-builder.md index 718def002..7b1949a4b 100644 --- a/docs/5.x/writing-a-record-builder.md +++ b/docs/5.x/writing-a-record-builder.md @@ -38,7 +38,7 @@ log aggregation query to generate multiple numeric metrics. 
Sometimes it will me to generate a single `DataTable` or running multiple log aggregation queries to generate multiple `DataTable`s and multiple metrics. It is up to you as a developer to find the balance between efficiency (executing the fewest log aggregation queries overall) -and modularity (having `RecordBuilders` that individually do as little as possible). +and modularity (having `RecordBuilder`s that individually do as little as possible). Once you've identified the `RecordBuilder`s you'll need, create empty classes for them in a `RecordBuilders` subfolder of your plugin. For example, `/path/to/matomo/plugins/MyPlugin/RecordBuilders/MyRecordBuilder`. From 255d8ad585c1bca77f51a3988422ce97e278cfd3 Mon Sep 17 00:00:00 2001 From: diosmosis Date: Wed, 29 May 2024 04:21:46 -0700 Subject: [PATCH 5/7] add a quick section on overriding non-day period aggregation --- docs/5.x/writing-a-record-builder.md | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/docs/5.x/writing-a-record-builder.md b/docs/5.x/writing-a-record-builder.md index 7b1949a4b..56b04a658 100644 --- a/docs/5.x/writing-a-record-builder.md +++ b/docs/5.x/writing-a-record-builder.md @@ -225,3 +225,26 @@ Here we create a `RecordBuilder` instance for every entity our plugin manages. --- And that's it, your `RecordBuilder` is done. + +## Advanced + +### Overriding non-day period aggregation + +Archiving for non-day periods is handled by the `buildForNonDayPeriod()` method, which +will use record metadata to query and aggregate records for the requested period's subperiods. + +Normally, when creating a `RecordBuilder`, you will not need to interact with it. But, in +some rare cases, the default behavior of aggregating subperiods will not be enough. + +In this case, it is perfectly acceptable to override the `buildForNonDayPeriod()` method +and implement your own logic. 
+ +If doing so, keep the following in mind: + +* when querying for records of subperiods, do not fetch all of them into memory at once. + Record data can take up a significant amount of memory, and querying all the data at once here + can cause out-of-memory errors for the archiving process. Instead, use a method like + `Archive::querySingleBlob()` which uses a cursor. + +* insert blob records via the `RecordBuilder::insertBlobRecord()` method. For numeric records, + use `ArchiveProcessor::insertNumericRecords()`. From 397eb4a37ddb728b432cd491e97b5ce6e41806e8 Mon Sep 17 00:00:00 2001 From: diosmosis Date: Thu, 30 May 2024 03:51:13 -0700 Subject: [PATCH 6/7] apply review feedback --- docs/5.x/writing-a-record-builder.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/5.x/writing-a-record-builder.md b/docs/5.x/writing-a-record-builder.md index 56b04a658..569ab5dd9 100644 --- a/docs/5.x/writing-a-record-builder.md +++ b/docs/5.x/writing-a-record-builder.md @@ -176,7 +176,7 @@ class MyRecordBuilder extends RecordBuilder $this->columnAggregationOps = [ 'my_max_metric' => 'max', 'my_min_metric' => 'min', - 'my_other_metric' => function ($thisValue, $otherValue) { + 'my_other_metric' => function ($thisValue, $otherValue, $thisRow, $otherRow) { // custom aggregation logic here }, ]; From 2b9ccbf0b95ce8f881ede7a1aaa10625caab17b0 Mon Sep 17 00:00:00 2001 From: dizzy Date: Sun, 10 Nov 2024 17:05:09 -0800 Subject: [PATCH 7/7] Update writing-a-record-builder.md --- docs/5.x/writing-a-record-builder.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/5.x/writing-a-record-builder.md b/docs/5.x/writing-a-record-builder.md index 569ab5dd9..99fc23c34 100644 --- a/docs/5.x/writing-a-record-builder.md +++ b/docs/5.x/writing-a-record-builder.md @@ -121,6 +121,7 @@ public function aggregate(ArchiveProcessor $archiveProcessor): array return [ 'MyPlugin_myRecord' => $report, + 'MyPlugin_myMetric' => $report->getRowsCount(), ]; } ```
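
Note on patch 7/7: with this change the `aggregate()` example returns both records that the `getRecordMetadata()` example declares (`MyPlugin_myRecord` and `MyPlugin_myMetric`), which is the requirement the guide states in step two. The check below is a minimal sketch of that requirement, not Matomo code. It assumes plain PHP arrays standing in for `Record` and `DataTable` objects, and `findUndeclaredRecords()` is a hypothetical helper name introduced here for illustration only.

```php
<?php

/**
 * Hypothetical helper (not part of Matomo): returns the record names that an
 * aggregate() result contains but the metadata list does not declare.
 *
 * @param string[] $declaredNames   names taken from the getRecordMetadata() entries
 * @param array    $aggregateResult map of record name => value, as aggregate() returns
 * @return string[] undeclared record names, re-indexed from 0
 */
function findUndeclaredRecords(array $declaredNames, array $aggregateResult): array
{
    // array_diff() keeps the original keys, so re-index for a clean list.
    return array_values(array_diff(array_keys($aggregateResult), $declaredNames));
}

// Names as declared in the guide's getRecordMetadata() example.
$declared = ['MyPlugin_myRecord', 'MyPlugin_myMetric'];

// Stand-in for the array returned by aggregate(); a plain array replaces
// the DataTable here (assumption made for the sake of a runnable sketch).
$result = [
    'MyPlugin_myRecord' => ['Firefox' => 10, 'Chrome' => 25],
    'MyPlugin_myMetric' => 2,
];

$undeclared = findUndeclaredRecords($declared, $result);
echo $undeclared === [] ? "all records declared\n" : implode(', ', $undeclared) . "\n";
```

A check like this could live in a plugin's own test suite, run against the real metadata and a sample `aggregate()` result, to catch a record name that was added to one method but not the other.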