
class: center, middle

Artificial Intelligence

Machine Learning for Games


Gerard Escudero & Samir Kanaan, 2019


:scale 50%



class: left, middle, inverse

Outline

  • .cyan[Introduction]

  • Machine Learning

  • Deep Learning

  • Reinforcement Learning

  • References


Classification Data Example

.center[

| class | sepal length | sepal width | petal length | petal width |
|------------|----:|----:|----:|----:|
| setosa     | 5.1 | 3.5 | 1.4 | 0.2 |
| setosa     | 4.9 | 3.0 | 1.4 | 0.2 |
| versicolor | 6.1 | 2.9 | 4.7 | 1.4 |
| versicolor | 5.6 | 2.9 | 3.6 | 1.3 |
| virginica  | 7.6 | 3.0 | 6.6 | 2.1 |
| virginica  | 4.9 | 2.5 | 4.5 | 1.7 |

150 rows or examples (50 per class).red[*]
]
  • The .blue[class] or .blue[target] column is usually referred to as the vector .blue[Y]

  • The matrix of the remaining columns (.blue[attributes] or .blue[features]) is usually referred to as the matrix .blue[X]

.footnote[.red[*] Source : Iris problem UCI repository (Frank & Asunción, 2010)]


Main objective

.large[Build from data a .blue[model] able to give a prediction to new .blue[unseen] examples.]

.center[model]

where:

  • data = previous table
  • unseen = [4.9, 3.1, 1.5, 0.1]
  • prediction = "setosa"

Regression Data Example

.center[

| quality | density | pH | sulphates | alcohol |
|---:|---:|---:|---:|---:|
| 6 | 0.998   | 3.16 | 0.58 | 9.8 |
| 4 | 0.9948  | 3.51 | 0.43 | 11.4 |
| 8 | 0.9973  | 3.35 | 0.86 | 12.8 |
| 3 | 0.9994  | 3.16 | 0.63 | 8.4 |
| 7 | 0.99514 | 3.44 | 0.68 | 10.55 |

1599 examples & 12 columns (11 attributes + 1 target).red[*]
]

The main difference between classification and regression is the type of the Y or target values:

  • .blue[Classification]: discrete or nominal values
    Example: Iris, {“setosa”, “virginica”, “versicolor”}.

  • .blue[Regression]: continuous or real values
    Example: WineQuality, values from 0 to 10.

.footnote[.red[*] Source : wine quality problem from UCI repository (Frank & Asunción, 2010)]


Example method:

kNN with k=1

  • How can we give a prediction for the next examples?

.center[

| class | sep-len | sep-wid | pet-len | pet-wid |
|---|---:|---:|---:|---:|
| ?? | 4.9 | 3.1 | 1.5 | 0.1 |

Unseen classification example on Iris]

.center[

| target | density | pH | sulphates | alcohol |
|---|---:|---:|---:|---:|
| ?? | 0.99546 | 3.29 | 0.54 | 10.1 |

Unseen regression example on WineQuality]
  • Let’s begin with a representation of the problems...

Classification Data Example

:scale 90%


Regression Data Example

:scale 90%


1 Nearest Neighbors algorithm

  • classification & regression

$$h(T)=y_i$$

.center[where $i = argmin_i(distance(X_i,T))$, $n=\vert features\vert$] .center[and $distance(X,Z) = \sqrt{(x_1-z_1)^2+\ldots+(x_n-z_n)^2}$]

  • Classification example (Iris):

    • distances: [0.47, 0.17, 3.66, 2.53, 6.11, 3.45]
    • prediction = setosa (0.17)
  • Regression example (WineQuality):

    • distances: [0.33, 1.32, 2.72, 1.71, 0.49]
    • prediction = 6 (0.33)
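
A minimal 1-NN sketch in Python (NumPy assumed; the training data is the six-row Iris excerpt from the table above, not the full UCI set):

```python
import numpy as np

# Six-row Iris excerpt (sepal length/width, petal length/width)
X = np.array([[5.1, 3.5, 1.4, 0.2],   # setosa
              [4.9, 3.0, 1.4, 0.2],   # setosa
              [6.1, 2.9, 4.7, 1.4],   # versicolor
              [5.6, 2.9, 3.6, 1.3],   # versicolor
              [7.6, 3.0, 6.6, 2.1],   # virginica
              [4.9, 2.5, 4.5, 1.7]])  # virginica
Y = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]

def predict_1nn(T):
    # h(T) = y_i where i = argmin_i(distance(X_i, T))
    distances = np.sqrt(((X - T) ** 2).sum(axis=1))
    return Y[distances.argmin()]

print(predict_1nn(np.array([4.9, 3.1, 1.5, 0.1])))  # -> setosa (distance 0.17)
```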

--

  • .blue[lazy learning]: kNN does nothing in the learning step

    • all computation happens in the classification step
  • This can cause problems in real-time applications

    • such as .blue[Games]

How about games?

Application examples:

  • .blue[Classification]: decision about braking a car.
| Brake? | Distance | Speed |
|---|---:|---:|
| Y | 2.4 | 11.3 |
| Y | 3.2 | 70.2 |
| N | 75.7 | 72.7 |
| N | 2.8 | 15.2 |
| ?? | 79.2 | 12.1 |
.center[Source: (Millington, 2019)]
  • .blue[Regression]: required amount of force in a curve.

Decision Learning

  • A basic application is learning .blue[decisions] from .blue[observations]

    • decisions will become classes ($Y$)

    • observations will become features ($X$)

  • Some kind of measure is needed to evaluate the model

  • .blue[The Balance of Effort]

    • Learning is often harder than designing the behaviour by hand (e.g., a Behaviour Tree)

Some categories

When?

  • .blue[Online]: learn as playing

  • .blue[Offline]:

    • learn from saved matches
    • .blue[bootstrapping]: AI components playing against each other
      • the same algorithm with different parameters
      • different algorithms
      (See chess example)

What?


Action Prediction

  • A simple technique that tries to guess the player's next movement from previous recordings

  • Human behaviour is not random

  • It is also called RPS, after the rock-paper-scissors game

  • Example for a simple RPS:

    • Movements: Left, Right
    • Recording: "LRRLRLLL"
  • It requires a window size (see the sketch below)

    • Example: the previous 3 movements
  • It can produce unbeatable AIs

    • Some level adjustment may be needed
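
A minimal sketch of the prediction step, assuming it is implemented as a frequency table over windows of previous movements (pure Python; `window=2` is used in the call only so that the short example recording contains the final context):

```python
from collections import Counter, defaultdict

def predict_next(recording, window=3):
    # Count which movement follows each window of previous movements
    continuations = defaultdict(Counter)
    for i in range(len(recording) - window):
        context = recording[i:i + window]
        continuations[context][recording[i + window]] += 1
    # Predict the most frequent continuation of the current context
    counts = continuations.get(recording[-window:])
    if counts is None:
        return None  # unseen context: fall back to a random move
    return counts.most_common(1)[0][0]

print(predict_next("LRRLRLLL", window=2))  # -> 'L'
```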

class: left, middle, inverse

Outline

  • .brown[Introduction]

  • .cyan[Machine Learning]

    • .cyan[Naïve Bayes]

    • Decision Trees

  • Deep Learning

  • Reinforcement Learning

  • References


Maximum likelihood estimation

| class | cap-shape | cap-color | gill-size | gill-color |
|---|---|---|---|---|
| poisonous | convex | brown | narrow | black |
| edible | convex | yellow | broad | black |
| edible | bell | white | broad | brown |
| poisonous | convex | white | narrow | brown |
| edible | convex | yellow | broad | brown |
| edible | bell | white | broad | brown |
| poisonous | convex | white | narrow | pink |

.center[up to 8,124 examples & 22 attributes .red[*]]
  • What is $P(poisonous)$?

$$P(poisonous)=\frac{N(poisonous)}{N}=\frac{3}{7}\approx 0.429$$

.footnote[.red[*] Source : Mushroom problem from UCI repository (Frank & Asunción, 2010)]


Naïve Bayes

Learning Model

$$\text{model}=[P(y)\simeq\frac{N(y)}{N},P(x_i|y)\simeq\frac{N(x_i|y)}{N(y)};\forall y \forall x_i]$$

.col5050[ .col1[

| $y$ | $P(y)$ |
|---|---:|
| poisonous | 0.429 |
| edible | 0.571 |
]
.col2[
| attr:value | poisonous |
|:---|---:|
| cap-shape:convex | 1 |
| cap-shape:bell | 0 |
| cap-color:brown | 0.33 |
| cap-color:yellow | 0 |
| cap-color:white | 0.67 |
| gill-size:narrow | 1 |
| gill-size:broad | 0 |
| gill-color:black | 0.33 |
| gill-color:brown | 0.33 |
| gill-color:pink | 0.33 |
]
]

Naïve Bayes

Classification

$$h(T) \approx argmax_y P(y)\cdot P(t_1|y)\cdot\ldots\cdot P(t_n|y)$$

  • Test example $T$:
| class | cap-shape | cap-color | gill-size | gill-color |
|---|---|---|---|---|
| ?? | convex | brown | narrow | black |
  • Numbers: $$P(poisonous|T) = 0.429 \cdot 1 \cdot 0.33 \cdot 1 \cdot 0.33 = 0.047$$ $$P(edible|T) = 0.571 \cdot 0.5 \cdot 0 \cdot 0 \cdot 0.25 = 0$$
  • Prediction: $$h(T) = poisonous$$

Naïve Bayes

Notes

  • It needs a smoothing technique to avoid zero counts

    • Example: Laplace $$P(x_i|y)\approx\frac{N(x_i|y)+1}{N(y)+N}$$
  • It is empirically a decent classifier but a bad estimator

    • This means that $P(y|T)$ is not a good probability

Implementation

.footnote[.red[*] Formatted with http://hilite.me/]
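
The original snippet is not preserved in this copy; a minimal pure-Python sketch of the learning and classification steps above, on the mushroom toy data, could look like this:

```python
from collections import Counter, defaultdict

def train(X, Y):
    # model = [P(y) = N(y)/N, P(x_i|y) = N(x_i|y)/N(y)]
    prior = {y: c / len(Y) for y, c in Counter(Y).items()}
    cond = defaultdict(Counter)
    for xs, y in zip(X, Y):
        for i, v in enumerate(xs):
            cond[(y, i)][v] += 1
    return prior, cond, Counter(Y)

def classify(T, prior, cond, counts):
    # h(T) = argmax_y P(y) * P(t_1|y) * ... * P(t_n|y)
    def score(y):
        p = prior[y]
        for i, v in enumerate(T):
            p *= cond[(y, i)][v] / counts[y]
        return p
    return max(prior, key=score)

X = [["convex", "brown", "narrow", "black"],
     ["convex", "yellow", "broad", "black"],
     ["bell", "white", "broad", "brown"],
     ["convex", "white", "narrow", "brown"],
     ["convex", "yellow", "broad", "brown"],
     ["bell", "white", "broad", "brown"],
     ["convex", "white", "narrow", "pink"]]
Y = ["poisonous", "edible", "edible", "poisonous",
     "edible", "edible", "poisonous"]

model = train(X, Y)
print(classify(["convex", "brown", "narrow", "black"], *model))  # poisonous
```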


Gaussian Naïve Bayes I

  • How about numerical features?
| Brake? | Distance | Speed |
|---|---:|---:|
| Y | 2.4 | 11.3 |
| Y | 3.2 | 70.2 |
| N | 75.7 | 72.7 |
| N | 2.8 | 15.2 |
| ?? | 79.2 | 12.1 |

.center[Source: (Millington, 2019)]

$$P(x_i|y)=\frac{1}{\sqrt{2\pi\sigma_y^2}}\exp\left(-\frac{(x_i-\mu_y)^2}{2\sigma_y^2}\right)$$

.center[where $\mu_y=\frac{x_1^y+\cdots+x_n^y}{n_y}$ and $\sigma_y^2=\frac{(x_1^y - \mu_y)^2+\cdots+(x_n^y - \mu_y)^2}{n_y}$]


Gaussian Naïve Bayes II

.col5050[ .col1[ Learning model:

| $y$ | $P(y)$ |
|---|---:|
| Y | 0.5 |
| N | 0.5 |

| | Distance | Speed |
|---|---:|---:|
| $\mu_Y$ | 2.8 | 40.75 |
| $\mu_N$ | 39.25 | 43.95 |
| $\sigma_Y^2$ | 0.32 | 1734.605 |
| $\sigma_N^2$ | 2657.205 | 1653.125 |
]
.col2[
Classification:

$$P(Y|T)=0.5\cdot 0.0\cdot 0.00756 = 0.0$$

$$P(N|T)=0.5\cdot 0.00573\cdot 0.00722 = 0.00002$$

$$h(T)=N$$

Note:

$$P(speed=12.1|Y)=\frac{1}{\sqrt{2\cdot\pi\cdot 1734.605}}\cdot$$

$$\cdot \exp\left(-\frac{(12.1-40.75)^2}{2\cdot 1734.605}\right)=0.00756$$ ]]


Gaussian Naïve Bayes III

Implementation:

.footnote[.red[*] Formatted with http://hilite.me/]
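
As above, the original snippet is missing; a sketch of the Gaussian variant on the braking data (standard library only; the model values are taken from the learning-model table):

```python
import math

def gaussian(x, mu, var):
    # P(x_i|y) under a normal distribution with mean mu and variance var
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Learning model: per-class mean and variance of (Distance, Speed)
model = {
    "Y": {"prior": 0.5, "mu": [2.8, 40.75],   "var": [0.32, 1734.605]},
    "N": {"prior": 0.5, "mu": [39.25, 43.95], "var": [2657.205, 1653.125]},
}

def classify(T):
    def score(y):
        p = model[y]["prior"]
        for x, mu, var in zip(T, model[y]["mu"], model[y]["var"]):
            p *= gaussian(x, mu, var)
        return p
    return max(model, key=score)

print(classify([79.2, 12.1]))  # -> 'N': far away and slow, no need to brake
```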


class: left, middle, inverse

Outline

  • .brown[Introduction]

  • .cyan[Machine Learning]

    • .brown[Naïve Bayes]

    • .cyan[Decision Trees]

  • Deep Learning

  • Reinforcement Learning

  • References


Decision Trees

Splitting measure:

  • Decision Trees: $accuracy=\frac{N(ok)}{N}$

  • ID3: $entropy=-\sum_{\forall y} P(y) \cdot log_2(P(y))$

  • CART: $gini = 1 - \sum_{\forall y} P(y)^2$ + binary trees

  • .blue[Our approach]: $gini$ + k-ary trees

    • nominal and numeric features

    • classification and regression

    • one of the easiest decision trees


Example I

.cols5050[ .col1[

| class | cap-shape | cap-color |
|---|---|---|
| poisonous | convex | brown |
| edible | convex | yellow |
| edible | bell | white |
| poisonous | convex | white |
| edible | convex | yellow |
| edible | bell | white |
| poisonous | convex | white |
]
.col2[
algorithm:

build each node of the tree from the feature with the minimum weighted sum of the

Gini index:

.small[$gini = 1 - \sum_{\forall y} P(y)^2$] ]]


Example II

cap-shape:

| cap-shape | poisonous | edible | #examples |
|---|---:|---:|---:|
| convex | 3 | 2 | 5 |
| bell | 0 | 2 | 2 |

.small[ .blue[For each value:]

$gini(\text{cap-shape}=\text{convex})=1-(\frac{3}{5})^2-(\frac{2}{5})^2=0.48$

$gini(\text{cap-shape}=\text{bell})=1-(\frac{0}{2})^2-(\frac{2}{2})^2=0.0$

.blue[Weighted sum:]

$gini(\text{cap-shape})=\frac{5}{7}\cdot 0.48+\frac{2}{7}\cdot 0.0=0.343$ ]


Example III

cap-color:

| cap-color | poisonous | edible | #examples |
|---|---:|---:|---:|
| brown | 1 | 0 | 1 |
| yellow | 0 | 2 | 2 |
| white | 2 | 2 | 4 |

.small[ .blue[For each value:]

$gini(\text{cap-color}=\text{brown})=1-(\frac{1}{1})^2-(\frac{0}{1})^2=0.0$

$gini(\text{cap-color}=\text{yellow})=1-(\frac{0}{2})^2-(\frac{2}{2})^2=0.0$

$gini(\text{cap-color}=\text{white})=1-(\frac{2}{4})^2-(\frac{2}{4})^2=0.5$

.blue[Weighted sum:]

$gini(\text{cap-color})=\frac{1}{7}\cdot 0.0+\frac{2}{7}\cdot 0.0+\frac{4}{7}\cdot 0.5=0.286$ ]


Example IV

Selecting best feature:

  • the best feature is the one with the minimum Gini index:

.small[ $$\text{best feature}=\min((0.343,\text{cap-shape}),(0.286,\text{cap-color}))=\text{cap-color}$$ ]

  • every value with only one class becomes a leaf:

    • brown $\rightarrow$ poisonous
    • yellow $\rightarrow$ edible
  • a new set is built for the remaining values

.center[

| class | cap-shape |
|---|---|
| edible | bell |
| poisonous | convex |
| edible | bell |
| poisonous | convex |

.small[white examples without cap-color]
]
  • the process restarts with the new set
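
A sketch of the feature-selection step in Python (nominal attributes only, with the values from the Example I table):

```python
from collections import Counter, defaultdict

def gini(counts):
    # gini = 1 - sum_y P(y)^2
    n = sum(counts.values())
    return 1 - sum((c / n) ** 2 for c in counts.values())

def weighted_gini(column, Y):
    # Weighted sum of the Gini index over the values of one feature
    per_value = defaultdict(Counter)
    for v, y in zip(column, Y):
        per_value[v][y] += 1
    return sum(sum(c.values()) / len(Y) * gini(c) for c in per_value.values())

cap_shape = ["convex", "convex", "bell", "convex", "convex", "bell", "convex"]
cap_color = ["brown", "yellow", "white", "white", "yellow", "white", "white"]
Y = ["poisonous", "edible", "edible", "poisonous", "edible", "edible", "poisonous"]

# Best feature = minimum weighted Gini
print(weighted_gini(cap_shape, Y))  # 0.343
print(weighted_gini(cap_color, Y))  # 0.286 -> cap-color wins
```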

Example V

Resulting Tree:

.center[ :scale 55% ]


Cutting Points I

What about numerical attributes?

| class | width |
|---|---:|
| versicolor | 2.9 |
| versicolor | 2.9 |
| virginica | 3.0 |
| virginica | 2.5 |

Cutting points:

| class | width | cutting point | weighted gini |
|---|---:|---:|---|
| virginica | 2.5 | | |
| versicolor | 2.9 | 2.7 | $\frac{1}{4}\cdot 0.0+\frac{3}{4}\cdot 0.45=0.3375$ |
| versicolor | 2.9 | | |
| virginica | 3.0 | 2.95 | $\frac{3}{4}\cdot 0.45+\frac{1}{4}\cdot 0.0=0.3375$ |

.center[.small[cutting points for the width attribute]]

Cutting Points II

Gini example:

| width | versicolor | virginica | #examples | gini |
|---|---:|---:|---:|---|
| < 2.7 | 0 | 1 | 1 | $1-(\frac{0}{1})^2-(\frac{1}{1})^2=0$ |
| > 2.7 | 2 | 1 | 3 | $1-(\frac{2}{3})^2-(\frac{1}{3})^2=0.45$ |

.cols5050[ .col1[ Resulting Tree:

.center[ :scale 80% ] ] .col2[

]]
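
The same idea in code: a sketch of the cutting-point search for one numeric attribute (`gini` is repeated from the previous sketch so the block stays self-contained; the printed score differs from the slide's 0.3375 only by rounding):

```python
from collections import Counter

def gini(counts):
    n = sum(counts.values())
    return 1 - sum((c / n) ** 2 for c in counts.values())

def best_cut(values, Y):
    # Candidate cuts: midpoints between consecutive distinct sorted values
    pairs = sorted(zip(values, Y))
    best = None
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue
        cut = (v1 + v2) / 2
        left = Counter(y for v, y in pairs if v < cut)
        right = Counter(y for v, y in pairs if v >= cut)
        w = (sum(left.values()) * gini(left) +
             sum(right.values()) * gini(right)) / len(pairs)
        if best is None or w < best[0]:
            best = (w, cut)
    return best

widths = [2.9, 2.9, 3.0, 2.5]
Y = ["versicolor", "versicolor", "virginica", "virginica"]
print(best_cut(widths, Y))  # (0.333, 2.7): both candidate cuts tie here
```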


Regression

  • standard deviation as splitting measure

$$\sigma=\sqrt{\frac{(x_1-\mu)^2+\dots+(x_n-\mu)^2}{n}}$$

.center[where $\mu=\frac{x_1+\dots+x_n}{n}$]

.cols5050[ .col1[ example:

| target | outlook | wind |
|---:|---|---|
| 25 | sun | weak |
| 30 | sun | strong |
| 52 | rain | weak |
| 23 | rain | strong |
| 45 | rain | weak |

.center[.small[Source]]
]
.col2[
total amounts:

$$\mu=35$$ $$\sigma=11.472$$ ]]


Regression II

.cols5050[ .col1[ outlook:

| outlook | $\mu$ | $\sigma$ | #examples |
|---|---:|---:|---:|
| sun | 27.5 | 2.5 | 2 |
| rain | 40.0 | 12.356 | 3 |
]
.col2[
weighted sum:

$$\sigma_{weighted}=\frac{2}{5}\cdot 2.5+\frac{3}{5}\cdot 12.356=8.414$$

$\sigma$ reduction: $$\sigma_{reduction}=11.472-8.414=3.058$$ ]]

.cols5050[ .col1[ wind:

| wind | $\mu$ | $\sigma$ | #examples |
|---|---:|---:|---:|
| weak | 40.667 | 11.441 | 3 |
| strong | 26.5 | 3.5 | 2 |
]
.col2[
weighted sum:

$$\sigma_{weighted}=\frac{3}{5}\cdot 11.441+\frac{2}{5}\cdot 3.5=8.265$$

$\sigma$ reduction: $$\sigma_{reduction}=11.472-8.265=3.207$$ ]]

The feature with the highest $\sigma$ reduction wins: .blue[wind]
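
The same selection in code: a sketch computing the standard-deviation reduction per feature (`statistics.pstdev` is the population $\sigma$ used above):

```python
from statistics import pstdev
from collections import defaultdict

target  = [25, 30, 52, 23, 45]
outlook = ["sun", "sun", "rain", "rain", "rain"]
wind    = ["weak", "strong", "weak", "strong", "weak"]

def std_reduction(column, y):
    # sigma_reduction = sigma(all targets) - weighted sum of sigma per value
    groups = defaultdict(list)
    for v, t in zip(column, y):
        groups[v].append(t)
    weighted = sum(len(g) / len(y) * pstdev(g) for g in groups.values())
    return pstdev(y) - weighted

print(std_reduction(outlook, target))  # 3.058
print(std_reduction(wind, target))     # 3.207 -> wind wins
```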


Notes on Decision Trees

  • Tends to overfit when leaves contain few examples

  • High variance

    • small changes in the training set produce different trees
  • Pruning, to avoid overfitting:

    • stop when a node has fewer than 5 instances
    • limit the maximum depth
  • The previous regression tree, predicting by averaging the leaf instances:

.cols5050[ .col1[ .center[:scale 80%] ] .col2[

sklearn (Projectile Motion)

]]


class: left, middle, inverse

Outline

  • .brown[Introduction]

  • .brown[Machine Learning]

  • .cyan[Deep Learning]

    • .cyan[Neural Networks]

    • Bias & Variance

    • DL Architectures

  • Reinforcement Learning

  • References


Artificial Neuron Model

.center[:scale 55%]

.center[:scale 60%]

.footnote[Source: Artificial Neuron]


Perceptron

.cols5050[ .col1[

  • Classifier and regressor

    • One neuron (or unit)
    • Linear model
  • Learning process:

    • Find a hyperplane that separates the data set $$\sum_{i=1}^nw_ix_i+b=0$$
  • Prediction formula:

$$h(x)=f(\sum_{i=1}^n w_i x_i + b)$$

] .col2[

:scale 80%

:scale 80%

]]
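
A minimal sketch of the prediction formula with a step activation (the weights below are illustrative, not learned):

```python
import numpy as np

def perceptron(x, w, b):
    # h(x) = f(sum_i w_i * x_i + b), with f a step function
    return 1 if np.dot(w, x) + b >= 0 else 0

# Illustrative hyperplane in 2D: x1 + x2 - 1 = 0
w, b = np.array([1.0, 1.0]), -1.0
print(perceptron(np.array([0.9, 0.8]), w, b))  # 1: above the hyperplane
print(perceptron(np.array([0.1, 0.2]), w, b))  # 0: below it
```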


Multi-layer Perceptron

.cols5050[ .col1[

  • .blue[Classification & regression]

    • One hidden layer
    • Non-linear model
  • Learning: .blue[backpropagation]:

    • Gradient descent: $W$
    • Loss function: $error(h(x),y)$

.center[:scale 90%] .center[source] ] .col2[ .center[:scale 80%] .center[source] ]]


MLP Capabilities

:scale 95%

.footnote[source]


Examples

.blue[Splitting positives and negatives:]

.cols5050[ .col1[ :scale 95% ] .col2[ Excel

Unity / C# ]]

.blue[sklearn on Iris:]
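
The original snippet is not preserved; a minimal sklearn version might look like this (one hidden layer of 10 units; the layer size and random seeds are assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# One hidden layer of 10 units, trained with backpropagation
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=42)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on the held-out test split
```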


Example on Unity

.center[ :scale 90% Building a neural network framework in C# ]

BackPropNetwork

  • Unity project on GitHub
  • Backpropagation implementation

Deep Learning

  • Neural network with 2 or more hidden layers

.center[

]

.footnote[Source: The Neural Network Zoo]


Keras

A high-level library for Deep Learning.

.blue[Example] on MNIST:
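
The original example is not preserved; a minimal tf.keras sketch of the usual MNIST setup (the layer sizes and epoch count are assumptions) might be:

```python
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # normalize pixel values

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # one unit per class
])
# Multiclass task: categorical cross-entropy (sparse variant: integer labels)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=32)
print(model.evaluate(x_test, y_test))  # [test loss, test accuracy]
```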

Usual .blue[Parameters]:

| Task | Loss function | Output layer |
|---|---|---|
| Binary classification | binary cross-entropy | single unit, sigmoid activation |
| Multiclass classification | categorical cross-entropy | one unit per class, softmax activation |
| Regression | MSE or RMSE | single unit, linear activation |
  • Epochs: number of times every training example is seen
  • Batch size: number of examples processed at each iteration
  • Optimizer: gradient descent variants (AdaGrad, RMSProp, Adam)

Keras Documentation


class: left, middle, inverse

Outline

  • .brown[Introduction]

  • .brown[Machine Learning]

  • .cyan[Deep Learning]

    • .brown[Neural Networks]

    • .cyan[Bias & Variance]

    • DL Architectures

  • Reinforcement Learning

  • References


Underfitting & Overfitting

.center[ :scale 95% source ]

.cols5050[ .col1[ .blue[Underfitting]

  • Symptoms:
    • high training error
  • Causes:
    • model too simple
    • not enough training
] .col2[ .blue[Overfitting]

  • Symptoms:
    • low training error
    • higher validation error
  • Causes:
    • model too complex
    • too much training
    • training set too small
]]

Bias & Variance

The usual practice: use a validation set to select parameters and avoid overfitting.

.footnote[Source: left, right]


class: left, middle, inverse

Outline

  • .brown[Introduction]

  • .brown[Machine Learning]

  • .cyan[Deep Learning]

    • .brown[Neural Networks]

    • .brown[Bias & Variance]

    • .cyan[DL Architectures]

  • Reinforcement Learning

  • References


Convolutional Neural Networks

From Computer Vision: processing images & video, .blue[invariant to translation & scale]

:scale 90%

.footnote[Source: A Comprehensive Guide to Convolutional Neural Networks]


Convolutional Neural Networks II

.cols5050[ .col1[

  • Convolution: extracts high-level features such as edges

    • the filter patterns are learned
  • Pooling: reduces dimensionality

    • max (edges) or average (smooth features)
    • lowers the computational cost
    • keeps the dominant features (rotational and positional invariance)
  • Example:

:scale 80% ]]

.footnote[Source: A Comprehensive Guide to Convolutional Neural Networks]


The Neural Network Zoo

:scale 90%

.footnote[Source: The Neural Network Zoo]


The Neural Network Zoo II

:scale 90%

.footnote[Source: The Neural Network Zoo]


The Neural Network Zoo III

:scale 90%

.footnote[Source: The Neural Network Zoo]


Open Neural Network eXchange

ONNX is an open format built to represent machine learning models.

.cols5050[ .col1[ .blue[Resources]:

Barracuda: a lightweight Unity package for neural network inference.
]]

.footnote[.red[*] Formatted with http://hilite.me/]


class: left, middle, inverse

Outline

  • .brown[Introduction]

  • .brown[Machine Learning]

  • .brown[Deep Learning]

  • .cyan[Reinforcement Learning]

    • .cyan[Q-Learning]

    • RL Platforms

  • References


Reinforcement Learning

:scale 65%

.cols5050[ .col1[

  • an agent
  • a set of states $S$
  • a set of actions $A$

] .col2[ Learning an action-value function $Q: S \times A \to \mathbb{R}$ that estimates the total future reward, which the agent tries to maximize.

]]

  • Q-Learning: a method for learning an approximation of $Q$.

.footnote[Source: My Journey Into Deep Q-Learning with Keras and Gym]


Q-table Example

.center[:scale 55%]

$$Q(s_t,a_t)\leftarrow Q(s_t,a_t)+\alpha\left[r_t+\gamma\max_aQ(s_{t+1},a)-Q(s_t,a_t)\right]$$

.footnote[Source: Reinforcement Q-Learning from Scratch in Python with OpenAI Gym]
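
The update rule as code: a sketch of one tabular Q-learning step (the table shape and the $\alpha$, $\gamma$ values are illustrative):

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

Q = np.zeros((5, 2))                    # toy table: 5 states x 2 actions
q_update(Q, s=0, a=1, r=1.0, s_next=2)
print(Q[0, 1])                          # 0.1: the reward starts filling the table
```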


Q-learning Example

  • Simple example of Q-learning & Q-Table.

.center[:scale 65%]


Deep Reinforcement Learning

  • Convolutional Neural Network for learning $Q$

.center[:scale 65%]

.footnote[Source: Deep Reinforcement Learning]


class: left, middle, inverse

Outline

  • .brown[Introduction]

  • .brown[Machine Learning]

  • .brown[Deep Learning]

  • .cyan[Reinforcement Learning]

    • .brown[Q-Learning]

    • .cyan[RL Platforms]

  • References


Gym

Toolkit for developing and comparing reinforcement learning algorithms.

.cols5050[ .col1[ :scale 100% ] .col2[ .blue[Gym]:


ML-Agents

Unity plugin for training intelligent agents.

:scale 70%

Address: https://github.com/Unity-Technologies/ml-agents

Combines .blue[AI Gym], .blue[Deep Learning] & .blue[Reinforcement Learning].


ML-Agents: Hummingbirds

Unity Learn course by Adam Kelly.

.center[ :scale 70%
source ]

  • Intelligent hummingbirds:
    Navigate to flowers, dip their beaks in, and drink nectar.

  • Reinforcement Learning with Unity ML-Agents.


Hummingbirds: Observations

.cols5050[ .col1[

  • The agent's current rotation

  • The direction to the nearest flower

  • The distance to the nearest flower

  • How close the agent's beak is to pointing at the flower

  • How close the agent's beak is to being in front of the flower

  • Several raycasts that act like LIDAR so that the agent can avoid obstacles ] .col2[ :scale 90% ]]

    public override void CollectObservations(VectorSensor sensor)
    {
        // A total of 10 observations is added (the values listed above)
        sensor.AddObservation(new float[10]);
    }

.footnote[source]


Hummingbirds: Rewards

  • A small positive reward each timestep the bird's beak is touching the nectar: AddReward(.01f)

  • A large negative reward for hitting the ground or boundaries of the training area: AddReward(-.5f)

.center[ :scale 80% ]

.footnote[source]


Hummingbirds: Actions

Output of the Neural Network:

    /// <summary>
    /// Called when an action is received from either
    /// the player input or the neural network
    ///
    /// vectorAction[i] represents:
    /// Index 0: move vector x (+1 = right, -1 = left)
    /// Index 1: move vector y (+1 = up, -1 = down)
    /// Index 2: move vector z (+1 = forward, -1 = backward)
    /// Index 3: pitch angle (+1 = pitch up, -1 = pitch down)
    /// Index 4: yaw angle (+1 = turn right, -1 = turn left)
    /// </summary>
    /// <param name="vectorAction">The actions to take</param>
    public override void OnActionReceived(float[] vectorAction)
    {
        ...
    }

class: left, middle, inverse

Outline

  • .brown[Introduction]

  • .brown[Machine Learning]

  • .brown[Deep Learning]

  • .brown[Reinforcement Learning]

  • .cyan[References]


References


Videos

Documentation
