| 1 | +[[ml-configuring-populations]] |
| 2 | += Performing population analysis |
| 3 | + |
| 4 | +Population analysis is a method of detecting anomalies by comparing the behavior of entities or events within a specified population. |
| 5 | +In this approach, {ml} analytics create a profile of what is considered "typical" behavior for users, machines, or other entities over a specified time period. |
| 6 | +An entity is considered anomalous when its behavior deviates from that of the rest of the population. |
| 7 | + |
| 8 | +This type of analysis is most effective when the behavior within a group is generally homogeneous, allowing for the identification of unusual patterns. |
| 9 | +However, it is less useful when members of the population show vastly different behaviors. |
| 10 | +In such cases, you can segment your data into groups with similar behaviors and run separate jobs for each. |
| 11 | +To do so, use a query filter in the datafeed or set the `partition_field_name` property in the detector to split the analysis across the groups. |
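| | + |
| | +As a minimal sketch of the `partition_field_name` approach, the following job adds that property to a population detector so that each group is modeled separately. The job ID `population-partitioned` and the `geo.src` partition field are assumptions chosen for illustration; substitute a field that separates your entities into groups with similar behavior. |
| | + |
| | +[source,console] |
| | +---------------------------------- |
| | +PUT _ml/anomaly_detectors/population-partitioned |
| | +{ |
| | +  "description": "Population analysis split by group (illustrative)", |
| | +  "analysis_config": { |
| | +    "bucket_span": "15m", |
| | +    "influencers": [ |
| | +      "clientip", |
| | +      "geo.src" |
| | +    ], |
| | +    "detectors": [ |
| | +      { |
| | +        "function": "mean", |
| | +        "field_name": "bytes", |
| | +        "over_field_name": "clientip", |
| | +        "partition_field_name": "geo.src" <1> |
| | +      } |
| | +    ] |
| | +  }, |
| | +  "data_description": { |
| | +    "time_field": "timestamp", |
| | +    "time_format": "epoch_ms" |
| | +  } |
| | +} |
| | +---------------------------------- |
| | +// TEST[skip:needs-licence] |
| | + |
| | +<1> Hypothetical partition field. Each distinct value of this field gets its own population, so a client is compared only with other clients in the same group. |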
| 12 | + |
| 13 | +Population analysis is resource-efficient and scales well, enabling the analysis of populations consisting of hundreds of thousands or even millions of entities with a lower resource footprint than analyzing each series individually. |
| 14 | + |
| 15 | + |
| 16 | + |
| 17 | +[discrete] |
| 18 | +[[population-recommendations]] |
| 19 | +== Recommendations |
| 20 | + |
| 21 | +* Use population analysis when the behavior within a group is mostly homogeneous, as it helps identify anomalous patterns effectively. |
| 22 | +* Leverage population analysis when dealing with large-scale datasets. |
| 23 | +* Avoid using population analysis when members of the population exhibit vastly different behaviors, as it may not be effective. |
| 24 | + |
| 25 | + |
| 26 | +[discrete] |
| 27 | +[[creating-population-jobs]] |
| 28 | +== Creating population jobs |
| 29 | + |
| 30 | +. In {kib}, navigate to **{ml-app} > Anomaly Detection > Jobs**. |
| 31 | +. Click **Create job**, then select the {data-source} that you want to analyze. |
| 32 | +. Select the **Population** wizard from the list. |
| 33 | +. Choose a population field and the metric that you want to use for the analysis. This example uses the `clientip` field and the `Mean(bytes)` metric. |
| 34 | ++ |
| 35 | +-- |
| 36 | +[role="screenshot"] |
| 37 | +image::images/ml-population-wizard.png[Creating a population job in Kibana] |
| 38 | +-- |
| 39 | +. Click **Next**. |
| 40 | +. Provide a job ID and click **Next**. |
| 41 | +. If the validation is successful, click **Next** to review the summary of the job creation. |
| 42 | +. Click **Create job**. |
| 43 | + |
| 44 | +[%collapsible] |
| 45 | +.API example |
| 46 | +==== |
| 47 | +To specify the population, use the `over_field_name` property. For example: |
| 48 | + |
| 49 | +[source,console] |
| 50 | +---------------------------------- |
| 51 | +PUT _ml/anomaly_detectors/population |
| 52 | +{ |
| 53 | +  "description" : "Population analysis", |
| 54 | +  "analysis_config" : { |
| 55 | +    "bucket_span":"15m", |
| 56 | +    "influencers": [ |
| 57 | +      "clientip" |
| 58 | +    ], |
| 59 | +    "detectors": [ |
| 60 | +      { |
| 61 | +        "function": "mean", |
| 62 | +        "field_name": "bytes", |
| 63 | +        "over_field_name": "clientip" <1> |
| 64 | +      } |
| 65 | +    ] |
| 66 | +  }, |
| 67 | +  "data_description" : { |
| 68 | +    "time_field":"timestamp", |
| 69 | +    "time_format": "epoch_ms" |
| 70 | +  } |
| 71 | +} |
| 72 | +---------------------------------- |
| 73 | +// TEST[skip:needs-licence] |
| 74 | + |
| 75 | +<1> This `over_field_name` property indicates that the metrics for each client (as identified by their IP address) are analyzed relative to other clients in each bucket. |
| 76 | +==== |
| 77 | + |
| 78 | +[discrete] |
| 79 | +[[population-job-results]] |
| 80 | +=== Viewing the job results |
| 81 | + |
| 82 | +Use the **Anomaly Explorer** in {kib} to view the analysis results: |
| 83 | + |
| 84 | +[role="screenshot"] |
| 85 | +image::images/ml-population-anomalies.png["Population results in the Anomaly Explorer"] |
| 86 | + |
| 87 | +The results are often quite sparse. |
| 88 | +There might be just a few data points for the selected time period. |
| 89 | +Population analysis is particularly useful when you have many entities and the data for specific entities is sporadic or sparse. |
| 90 | +If you click on a section in the timeline or swim lanes, you can see more details about the anomalies: |
| 91 | + |
| 92 | +[role="screenshot"] |
| 93 | +image::images/ml-population-anomaly.png["Anomaly details for a specific user"] |
| 94 | + |
| 95 | +In this example, the client IP address `167.145.234.154` received a high volume of bytes on the date and time shown. |
| 96 | +This event is anomalous because the mean is four times higher than the value expected for the population. |
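| | + |
| | +To inspect the same anomalies outside of {kib}, one option is the get records API. The request below is a sketch that assumes the `population` job ID from the API example in this page; the `record_score` threshold of `75` is an arbitrary illustrative value, not a recommendation. |
| | + |
| | +[source,console] |
| | +---------------------------------- |
| | +GET _ml/anomaly_detectors/population/results/records |
| | +{ |
| | +  "sort": "record_score", |
| | +  "desc": true, |
| | +  "record_score": 75 |
| | +} |
| | +---------------------------------- |
| | +// TEST[skip:needs-licence] |
| | + |
| | +For population jobs, the returned records should identify the anomalous entity in `over_field_value` and list the contributing anomalies in the `causes` array. |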