
Operational Visibility with Anomaly Detection and Root Cause Analysis

The goal of this sample is to accelerate deployment of Industrial IoT visibility patterns. There is no one-size-fits-all solution; there are many considerations, so please review them before moving your workload to production.

Operational visibility enables manufacturers to gain insights and drive decision-making that improves quality, efficiency, and safety. Many data sources, including historians, IIoT telemetry, and operational systems such as MES and ERP, are key for building a visibility control tower. In this sample we will use the IIoT telemetry data gathered in the previous Connectivity sample to understand trends via time series analysis, perform anomaly detection and root cause analysis, and trigger alerts and actions based on anomalies.

High Level Design

Operational Visibility Sample

Prerequisites

  • You have the Connectivity Deployment sample working, or already have your IIoT data in Data Explorer.

  • Add a new Status tag in Kepware that changes every 15 minutes. Use the LineSimulationDemo-2.json file to update the configuration.

  • Add the Status tag to the opcconfig.json file as shown; a minimal sketch of the entry appears after this list.

  • Copy the updated file to the EFLOW VM and verify the update in PowerShell:

    • Copy-EflowVMFile -fromFile "opcconfig.json" -toFile ~\opcconfig\opcconfig.json -pushFile

    • Connect-EflowVm

    • sudo iotedge logs OPCPublisher --tail 20
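
For reference, a minimal sketch of what the OPC Publisher published-nodes entry in opcconfig.json might look like. The Kepware endpoint host, port, and exact tag path are assumptions based on the tag naming used later in this sample; match them to your environment:

    [
      {
        "EndpointUrl": "opc.tcp://<kepware-host>:49320",
        "UseSecurity": false,
        "OpcNodes": [
          { "Id": "nsu=KEPServerEX;s=Simulator.Line1.Status" }
        ]
      }
    ]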

Time Series Analysis

Analyzing telemetry data can provide insights such as monitoring service health, physical production processes, and usage trends. Data Explorer contains native support for the creation, manipulation, and analysis of multiple time series. For this sample we will build queries to perform time series analysis and also build a near real-time dashboard to monitor all our lines.
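
As an illustration of that native support, here is a minimal KQL sketch of a seasonal/trend/residual decomposition. The telemetry table and tag naming match this sample's data feed query further below; the 7-day window and 15-minute bins are illustrative choices (the actual queries live in TimeSeriesQueries.kql):

    // Decompose a Line 1 temperature series into baseline, seasonal, trend, and residual components
    telemetry
    | where SourceTimestamp > ago(7d) and ExpandedNodeId contains "Line1.Temperature"
    | make-series SensorValue = avg(todouble(Value)) default = 0.0 on SourceTimestamp step 15m
    | extend (baseline, seasonal, trend, residual) = series_decompose(SensorValue)
    | render timechart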

  • Open TimeSeriesQueries.kql file in Data Explorer Web UI

  • Plot Temperature Sensor for Line 1 with Seasonal, Trend, and Residual components.

  • Plot Anomalies for Humidity Sensor on Line 1.

  • In the Data Explorer Web UI, click on Dashboards > New Dashboard > Import Dashboard file and import the iiot-operational-visibility-dashboard.json file.

  • Click on the IIoT Operational Visibility dashboard

Anomaly Detection and Root Cause Analysis

Anomaly detection is the first step towards predictive maintenance. It establishes a baseline of what "normal" looks like and detects values that rise above or fall below that baseline. The right approach depends on the process and sensor calibration, but a simple one is to set a hard threshold and send alerts whenever a value moves more than 2 or 3 standard deviations from the mean. This works well in simple scenarios; see the sketch below.
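
For example, a minimal KQL sketch of such a hard threshold, assuming the same telemetry table; the 7-day baseline window and the 3-sigma band are illustrative choices:

    // Compute a 7-day baseline for one tag, then flag recent values more than 3 standard deviations away
    let mu = toscalar(telemetry
        | where SourceTimestamp > ago(7d) and ExpandedNodeId contains "Line1.Humidity"
        | summarize avg(todouble(Value)));
    let sigma = toscalar(telemetry
        | where SourceTimestamp > ago(7d) and ExpandedNodeId contains "Line1.Humidity"
        | summarize stdev(todouble(Value)));
    telemetry
    | where SourceTimestamp > ago(1h) and ExpandedNodeId contains "Line1.Humidity"
    | extend SensorValue = todouble(Value)
    | extend IsAnomaly = abs(SensorValue - mu) > 3.0 * sigma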

For more complex scenarios, such as analyzing and correlating multiple sensor values, a better approach may be to use machine learning algorithms that can detect trends over a large corpus of data and extract correlations between multiple variables simultaneously.

In this sample we will use the Metrics Advisor service to set up anomaly detection using machine learning (smart detection), and see how we can perform some root cause analysis.

Setup Metrics Advisor

  • Create a new Metrics Advisor resource using the Azure portal in the same region as your Data Explorer cluster.

  • Sign in to the Metrics Advisor portal and verify access.

  • Assign database permissions to Metrics Advisor using the managed identity of the Metrics Advisor resource: in Data Explorer, go to Permissions > Add, then select the Metrics Advisor name in the Principals list.

  • Create the connection string in the form: Data Source=<Data Explorer Cluster URI>;Initial Catalog=<Database>
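
For example, with a hypothetical cluster URI and database name:

    Data Source=https://iiotmfgdev.westus2.kusto.windows.net;Initial Catalog=iiotmfgdevdb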

Data onboarding

  • Add a data feed that fetches data every 1 minute using the above connection string and the query below. The @IntervalStart and @IntervalEnd placeholders are filled in by Metrics Advisor for each ingestion interval:

    telemetry
    | where SourceTimestamp >= datetime(@IntervalStart) and SourceTimestamp < datetime(@IntervalEnd)
    | summarize avg(todouble(Value)) by ExpandedNodeId, DataSetWriterID, bin(SourceTimestamp, 1m)
    | project SensorTag = replace_string(ExpandedNodeId, "nsu=KEPServerEX;s=Simulator.", ""), SensorValue = avg_Value, SourceTimestamp

  • Click Load Data, select the Dimension, Measure, and Timestamp columns as shown below, and click Verify Schema.

  • Keep the other defaults as-is; we don't need to set up automatic rollups because we have already done the 1-minute rollup in our Data Explorer query. Use Smart filling for missing points.

  • Provide the data feed name iiotmfgdevdb and click Submit.

  • Click the Visit Data Feed button; it should redirect to the data feed progress page. You can also click Data feeds in the left navigation menu to reach this page.

Anomaly detection

  • From the data feed page, click the SensorValue metric name to set up the anomaly configuration.

  • Click on Choose Series and select all the Humidity tags.

  • For the metric-level configuration, select Hard threshold, Above, 55, and click Save. The dashboard should immediately show the anomalies based on the threshold.

  • From the image above you can see that Line4.Humidity has the most anomalies in that date range.

  • Click Line4.Humidity to drill down further into the details of the anomaly.

Root Cause Analysis

  • Let Metrics Advisor run for a few minutes, then click Incident hub to perform deeper analysis on each anomaly incident.

  • Click Diagnose for a cross-dimension diagnostic drill-down.

  • Add/Remove relevant cross dimensions in the chart to perform root cause analysis.

Alerts & Business Actions

Create Logic App Workflow

For this sample we have created a simple workflow that sends an email with the anomaly details.

  • Update the values in the anomaly-alert-workflow.json file:

    • Ocp-Apim-Subscription-Key : the key from the Metrics Advisor resource in the Azure portal

    • x-api-key : the API key from the Metrics Advisor portal

    • To : the email address to send the alert email to

    • Replace additional connector details if you're using an email connector other than the Office 365 Outlook connector.

    • Replace the placeholder subscription values (00000000-0000-0000-0000-000000000000) in the connection identifiers (connectionId and id) under the connections parameter ($connections) with your own subscription values, and replace the resource group name iiotsample in connectionId with your resource group name (see the sketch after this list).

  • Deploy new logic app workflow using Azure CLI:

    • az logic workflow create --resource-group "iiotsample" --location "westus2" --name "anomaly-alert-workflow" --definition "anomaly-alert-workflow.json"
  • Open the workflow in the Azure portal and copy the HTTP POST URL.
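
For orientation, a sketch of the typical shape of the $connections parameter after substitution. The connector key office365 matches the default Office 365 Outlook connector; the subscription ID and resource group values are placeholders:

    "$connections": {
      "value": {
        "office365": {
          "connectionId": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Web/connections/office365",
          "connectionName": "office365",
          "id": "/subscriptions/<subscription-id>/providers/Microsoft.Web/locations/westus2/managedApis/office365"
        }
      }
    }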

Create Metrics Advisor Hook

  • Open the Metrics Advisor portal and create a new hook as shown below. Use the workflow HTTP POST URL copied in the previous step.

Create Metrics Advisor Alerts

  • In the Metrics Advisor portal, click Data feeds > SensorValue (the metric name) and add a new alerting configuration as shown below:

  • Example Alert Email

  • Click the Alert count on the Incident hub to drill down on each Alert.

Integration with Data Lakehouse

Data Lakehouse is an emerging pattern in the data platform world. The key aspect is that traditional data lakes have advanced to the point where many of their capabilities overlap with a traditional data warehouse. In many scenarios it is much more flexible to store the raw data in a data lake and use services like Azure Synapse Analytics to process and query that data using T-SQL.

There are multiple ways to push telemetry data from IoT Hub to a data lake. For this sample we will use the built-in routing available in IoT Hub to push the data in Avro format to the data lake. We will use this data in later samples to build machine learning models.

Create Data Lake

  • Create a Storage Account with hierarchical namespace

  • az storage account create --name iiotmfgdatalake --resource-group iiotsample --location westus2 --sku Standard_RAGRS --kind StorageV2 --enable-hierarchical-namespace true

  • az storage fs create -n raw --account-name iiotmfgdatalake --auth-mode login

Create Message Routing in IoT Hub

  • az iot hub identity assign --name iiotmfghub --resource-group iiotsample --system-assigned

  • Assign Storage Blob Data Contributor permissions on the raw data lake container to the iiotmfghub managed identity (see the sketch after this list).

  • Add a new routing endpoint to Storage and a route for device telemetry.

  • Validate data in Data Lake
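
A sketch of these steps in Azure CLI, assuming the resource names used above. The routing-endpoint and route flags vary across CLI versions, so verify them against az iot hub routing-endpoint create --help and az iot hub route create --help:

    # Look up the hub's system-assigned principal and grant it access to the raw container
    principalId=$(az iot hub identity show --name iiotmfghub --resource-group iiotsample --query principalId -o tsv)
    storageId=$(az storage account show --name iiotmfgdatalake --resource-group iiotsample --query id -o tsv)
    az role assignment create --role "Storage Blob Data Contributor" \
      --assignee-object-id $principalId --assignee-principal-type ServicePrincipal \
      --scope "$storageId/blobServices/default/containers/raw"

    # Add an identity-based storage endpoint with Avro encoding, then route all device telemetry to it
    az iot hub routing-endpoint create --hub-name iiotmfghub --resource-group iiotsample \
      --endpoint-name datalake-raw --endpoint-type azurestoragecontainer \
      --endpoint-resource-group iiotsample --endpoint-subscription-id <subscription-id> \
      --endpoint-uri https://iiotmfgdatalake.blob.core.windows.net \
      --container-name raw --encoding avro --identity "[system]"
    az iot hub route create --hub-name iiotmfghub --resource-group iiotsample \
      --route-name telemetry-to-datalake --endpoint-name datalake-raw \
      --source-type devicemessages --condition true --enabled true

    # Validate: list the blobs IoT Hub has written to the data lake
    az storage blob list --account-name iiotmfgdatalake --container-name raw --auth-mode login -o table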