Exploratory Data Analysis for failure predictions using machine learning

The goal of this sample is to accelerate deployment of Industrial IoT Prediction Patterns. There is no one-size-fits-all solution; there are many considerations, so please review them before moving your workload to production.

Exploratory Data Analysis (EDA) is the first step before building any custom machine learning models. It is a critical and often complex step in which we normalize and clean the data, understand data distributions, outliers, and correlations, and assess the data against various hypotheses and experiments.
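
As a quick illustration of these steps, here is a minimal EDA sketch using pandas; the file name and columns are placeholders, not artifacts from this sample:

```python
import pandas as pd

# Placeholder export of the sensor telemetry; substitute your own file or dataset
telemetry = pd.read_csv("sensor-telemetry.csv")

# Basic shape, types, and missing values
print(telemetry.shape)
print(telemetry.dtypes)
print(telemetry.isna().sum())

# Distribution summary per column (quick check for outliers and scale differences)
print(telemetry.describe())

# Pairwise correlations between numeric sensor values
print(telemetry.corr(numeric_only=True))
```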

Scenario / Hypothesis

Our scenario is predicting quality-related failures based on machine condition. The telemetry data contains point-in-time snapshots of all sensor values; how those values actually affected quality is logged in a different system. For this sample we will use:

  1. Simulated Sensor Data
    • Generated via an IoT Edge Module
    • Contains 40+ different sensor values
    • Contains the production batch number
  2. Production Quality Data
    • Contains the production batch number
    • Contains a quality error code for each batch
    • 1 = Meets quality expectations | 0 = Does not meet quality expectations.
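
The two datasets only relate to each other through the production batch number. Below is a minimal sketch of joining them into a single labeled table; the file and column names are assumptions for illustration, not the actual schema used in this sample:

```python
import pandas as pd

# Hypothetical exports of the two datasets described above
sensors = pd.read_csv("simulated-sensor-data.csv")   # 40+ sensor columns + BatchNumber
quality = pd.read_csv("batch-quality-data.csv")      # BatchNumber + QualityCode (1/0)

# Join each telemetry snapshot to its quality outcome via the batch number
labeled = sensors.merge(quality, on="BatchNumber", how="inner")

# 1 = meets quality expectations, 0 = does not
print(labeled["QualityCode"].value_counts())
```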

High Level Design

[Architecture diagram: Exploratory Data Analysis]

Prerequisites

Simulate Additional Sensor Data

  • Add a new SimulatedManufacturingSensors module to the IoT Edge device created in the prerequisite sample above.

    • In Azure Portal select IoT Hub > IoT Edge > [Your Device] > Set Modules

    • Select Add > IoT Edge Module

    • Set Module Name to SimulatedManufacturingSensors and Image URI to ghcr.io/jomit/simulatedmanufacturingsensors:0.0.1-amd64, then click Add

    • Click Next and verify that the upstream route value is FROM /messages/* INTO $upstream

    • Click Next and Create

    • Wait a few seconds and verify that the module is deployed and sending logs

    • Verify the data in Azure Data Explorer using the query in VerifySimulatedData.kql
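
The sample verifies the data with VerifySimulatedData.kql in the Azure Data Explorer UI. If you prefer to run the same sanity check from a notebook, here is a sketch using the azure-kusto-data package; the cluster URI, database, and table name are placeholders:

```python
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# Placeholder values; use your own cluster URI, database, and telemetry table
cluster = "https://<your-adx-cluster>.<region>.kusto.windows.net"
database = "<your-database>"

# Authenticate with the identity already signed in via `az login`
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster)
client = KustoClient(kcsb)

# Simple check that telemetry from the simulated module is arriving
query = "<your-telemetry-table> | take 10"
response = client.execute(database, query)
for row in response.primary_results[0]:
    print(row)
```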

Upload production quality data

  • Open the data lake created earlier in the Azure Portal and upload the batch-quality-data.csv file to a folder named qualitydata
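
Uploading through the portal is sufficient. If you would rather script the upload, here is a sketch using the azure-storage-file-datalake package; the account and filesystem names are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder account and filesystem (container) names
account_url = "https://<your-datalake-account>.dfs.core.windows.net"
filesystem_name = "<your-filesystem>"

service = DataLakeServiceClient(account_url, credential=DefaultAzureCredential())
filesystem = service.get_file_system_client(filesystem_name)

# Create the qualitydata folder and upload the CSV into it
directory = filesystem.create_directory("qualitydata")
file_client = directory.create_file("batch-quality-data.csv")
with open("batch-quality-data.csv", "rb") as data:
    file_client.upload_data(data, overwrite=True)
```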

Create Machine Learning Workspace

An Azure Machine Learning workspace provides end-to-end data science lifecycle management services. It also provides a centralized place to collaborate on artifacts for machine learning development and deployment.

  • Create a new machine learning workspace

    • az ml workspace create -w iiotml -g iiotsample -l westus2
  • Create a new compute instance for development. (Compute instances are typically per user, so prefix the name with your own.)

    • az ml computetarget create computeinstance --name jomitdev --vm-size STANDARD_DS3_V2 -w iiotml -g iiotsample
  • Go to the Notebooks section in the Machine Learning Studio portal and upload the files from the notebooks folder
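
The CLI commands above use the azure-cli-ml extension. If you prefer to do the same from Python, here is a rough equivalent with the v1 azureml-core SDK; the subscription ID is a placeholder and the names simply mirror the CLI example:

```python
from azureml.core import Workspace
from azureml.core.compute import ComputeInstance, ComputeTarget

# Create (or reuse) the workspace; substitute your own subscription ID
ws = Workspace.create(name="iiotml",
                      subscription_id="<your-subscription-id>",
                      resource_group="iiotsample",
                      location="westus2",
                      exist_ok=True)

# Per-user development compute instance
compute_config = ComputeInstance.provisioning_configuration(vm_size="STANDARD_DS3_V2")
instance = ComputeTarget.create(ws, "jomitdev", compute_config)
instance.wait_for_completion(show_output=True)
```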

Create Datastore

  • Open Machine Learning Studio and select the workspace created above.

  • Create a new datastore to connect to the telemetry data lake created earlier.
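
The datastore can be created in the Studio UI as described above. As an alternative, here is a sketch of registering an ADLS Gen2 datastore with the v1 azureml-core SDK; the datastore name, filesystem, and service principal details are placeholders:

```python
from azureml.core import Datastore, Workspace

ws = Workspace.from_config()

# Placeholder names and credentials for the telemetry data lake
datastore = Datastore.register_azure_data_lake_gen2(
    workspace=ws,
    datastore_name="telemetrydatalake",
    filesystem="<your-filesystem>",
    account_name="<your-datalake-account>",
    tenant_id="<your-tenant-id>",
    client_id="<your-service-principal-id>",
    client_secret="<your-service-principal-secret>")
```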

Create raw Dataset
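
The raw dataset itself is built in the sample notebooks. As a rough illustration of the step, here is a sketch using the v1 SDK with a placeholder path into the datastore registered above:

```python
from azureml.core import Dataset, Datastore, Workspace

ws = Workspace.from_config()
datastore = Datastore.get(ws, "telemetrydatalake")

# Placeholder path to the raw telemetry files in the data lake
raw_dataset = Dataset.Tabular.from_delimited_files(path=(datastore, "<path-to-telemetry>/*.csv"))
raw_dataset = raw_dataset.register(workspace=ws, name="raw-sensor-telemetry", create_new_version=True)

# Quick peek at the first few rows
print(raw_dataset.take(5).to_pandas_dataframe())
```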

Perform Feature Engineering

Perform basic Frequency Analysis

Build Baseline Model(s)
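
The feature engineering, frequency analysis, and baseline models live in the uploaded notebooks. As a rough illustration only, here is one possible baseline using scikit-learn, assuming a joined dataframe like the `labeled` table sketched earlier; column names are placeholders:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

# `labeled` is the joined sensor + quality dataframe; QualityCode is the label
X = labeled.drop(columns=["QualityCode", "BatchNumber"])
y = labeled["QualityCode"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Simple logistic regression as a first baseline
baseline = LogisticRegression(max_iter=1000)
baseline.fit(X_train, y_train)

predictions = baseline.predict(X_test)
print("precision:", precision_score(y_test, predictions))
print("recall:", recall_score(y_test, predictions))
```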

Align Business and ML Objectives

For any Machine Learning project to succeed, it’s crucial to tie Machine Learning metrics to overall business performance. Here's an example of how you might approach this for quality prediction scenarios:

  1. Build a baseline of the business metrics that you want to improve using ML. For example:
    • Number of quality failures
    • Percentage of scrap
    • Additional time spent on quality rework
    • Cost of quality failures
    • Cost of quality rework
  2. Select machine learning metrics for model performance based on the use case / scenario. For example:
    • "Precision" attempts to answer: What proportion of positive identifications were actually correct?
    • "Recall" attempts to answer: What proportion of actual positives were identified correctly?
    • For scenarios where the cost of a wrong prediction is high, favor higher "precision"
    • For scenarios where the cost of missing any detection is high, favor higher "recall"
  3. Perform A/B testing and quantify business metric improvements and cost impact, as in the example below:

     | Metric                                  | Current | With ML (precision=50%, recall=90%) | Cost Impact                    |
     |-----------------------------------------|---------|-------------------------------------|--------------------------------|
     | Number of quality failures per year     | 100     | 25                                  | cost per quality failure - 75% |
     | Percentage of scrap                     | 15%     | 9%                                  | cost of scrap - 6%             |
     | Additional time spent on quality rework | 10%     | 2%                                  | cost of rework - 8%            |
     | ...                                     | ...     | ...                                 | ...                            |
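
As a rough illustration of how precision and recall translate into the business baseline, here is a small calculation sketch; the numbers are purely illustrative and are not taken from, nor meant to reproduce, the table above:

```python
# Illustrative numbers only: translate model precision/recall into counts you can cost out.
failures_per_year = 100        # baseline quality failures (from your own business baseline)
recall = 0.90                  # share of actual failures the model flags
precision = 0.50               # share of flagged batches that are true failures

caught = failures_per_year * recall                   # true failures flagged by the model
not_flagged = failures_per_year - caught              # true failures the model misses
false_alarms = caught * (1 - precision) / precision   # good batches flagged for review

print(f"caught={caught:.0f}, not_flagged={not_flagged:.0f}, false_alarms={false_alarms:.0f}")
```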