Uncertainty - CS50AI Lecture 2

Probability Theory:

When rolling a die, there are 6 possible worlds: 1, 2, 3, 4, 5, and 6. Each world has a probability of being true, and we can represent both the world and its probability as follows.

ω -> A possible world
P(ω) -> The probability of world ω being true, where
0 ≤ P(ω) ≤ 1.

The sum of all P(ω) has to equal 1. Logically, this tells us that some world is guaranteed to be true. In the die example, it is guaranteed that we will get a number between 1 and 6. We can represent this with the equation:

$\sum_{\omega \in \Omega} P(\omega) = 1$

The sum of the probabilities of all possible worlds is equal to 1.

In a more complex model, it's not always the case that each world is equally likely.
Example:
dice

In this example, although each world (a combination of two dice rolls) is equally likely, the possible sums of the rolls are not. There are 6 possible worlds where the sum is 7 but only 1 possible world where the sum is 12. We can represent this as follows.

Only 1 out of the 36 worlds has a sum of 12, so P(sum = 12) = 1/36.
6 out of the 36 worlds have a sum of 7, so P(sum = 7) = 6/36 = 1/6.
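The two-dice example can be checked by enumerating all 36 worlds; a minimal sketch (the helper `p_sum` is mine, not from the lecture):

```python
from itertools import product

# Enumerate the 36 equally likely worlds for two dice.
worlds = list(product(range(1, 7), repeat=2))

def p_sum(target):
    """Probability that the two rolls add up to `target`."""
    favorable = [w for w in worlds if sum(w) == target]
    return len(favorable) / len(worlds)

print(p_sum(12))  # 1/36
print(p_sum(7))   # 6/36 = 1/6
```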

Unconditional Probability:

Degree of belief in a proposition in the absence of any other evidence.

Conditional Probability:

Degree of belief in a proposition given some evidence that has already been revealed.

$P(a | b)$: Probability that a is true given b.

$P(a | b) = \frac{P(a \wedge b)}{P(b)}$

The cases where both a and b are true, out of all the cases where b is true.

or,

$P(a \wedge b) = P(a | b)P(b)$

Example: In the image above, the probability of the sum being 12 given a red die with value 6.

sum = 12: the a value in the above equation.
red = 6: the b value in the above equation.

First, the probability of red being 6 is 1/6. Next, we need the probability of both a and b being true, which is 1/36 because only 1 world has a sum of 12 with red being 6. Finally, we divide these two to get the conditional probability:

$P(sum = 12 \mid red = 6) = \frac{1/36}{1/6} = \frac{1}{6}$

Therefore, given that the red die rolled 6, the probability of the sum being 12 is 1/6.
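The same conditional probability can be computed by counting worlds, a minimal sketch of the definition P(a | b) = P(a ∧ b) / P(b):

```python
from itertools import product

worlds = list(product(range(1, 7), repeat=2))  # (red, blue)

p_b = sum(1 for r, b in worlds if r == 6) / len(worlds)                   # P(red = 6) = 1/6
p_ab = sum(1 for r, b in worlds if r == 6 and r + b == 12) / len(worlds)  # P(sum = 12 and red = 6) = 1/36

# P(sum = 12 | red = 6) = P(sum = 12 and red = 6) / P(red = 6)
print(p_ab / p_b)  # 1/6
```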

Random Variables:

A variable in probability theory that has a domain of values it can take on.

Ex: Roll = {1, 2, 3, 4, 5, 6}
Ex: Weather = {sun, cloud, rain, wind, snow}
Ex: Traffic = {none, light, heavy}
Ex: Flight = {on time, delayed, cancelled}

Probability Distribution:

Ex: P(Flight = on time) = 0.6
Ex: P(Flight = delayed) = 0.3
Ex: P(Flight = cancelled) = 0.1

This can be represented using a vector: a sequence of values interpreted in order.
P(Flight) = <0.6, 0.3, 0.1>

Independence:

The knowledge that one event occurs does not affect the probability of other events.
Ex: The red die roll does not influence the blue die roll.
Ex: Clouds and rain are most likely not independent.

If a and b are independent, then $P(a \wedge b) = P(a)P(b)$.

Example of Independence:

$P(red = 6 \wedge blue = 6) = P(red = 6)P(blue = 6) = \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36}$

Example of Non-Independence:

$P(red = 6 \wedge red = 4) = 0 \neq P(red = 6)P(red = 4)$

There is no way red can roll 4 if it rolled 6. Therefore the event that red rolls 6 and the event that red rolls 4 are not independent: if one of them occurs, the other is guaranteed not to occur on that same roll.
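Both cases can be verified by enumeration; a minimal sketch comparing P(a ∧ b) against P(a)P(b):

```python
from itertools import product

worlds = list(product(range(1, 7), repeat=2))  # (red, blue)
n = len(worlds)

# Independent events: the red roll and the blue roll.
p_red6 = sum(1 for r, b in worlds if r == 6) / n
p_blue6 = sum(1 for r, b in worlds if b == 6) / n
p_both = sum(1 for r, b in worlds if r == 6 and b == 6) / n
print(abs(p_both - p_red6 * p_blue6) < 1e-12)  # True: 1/36 == 1/6 * 1/6

# Non-independent events: the same die cannot roll both 6 and 4.
p_red4 = sum(1 for r, b in worlds if r == 4) / n
p_joint = sum(1 for r, b in worlds if r == 6 and r == 4) / n
print(p_joint == p_red6 * p_red4)  # False: 0 != 1/36
```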

Bayes' Rule:

$P(b | a) = \frac{P(a | b)P(b)}{P(a)}$

Derivation:

Given:

$P(a \wedge b) = P(a)P(b | a)$

and

$P(a \wedge b) = P(b)P(a | b)$

We can use substitution to derive Bayes' Rule:

$P(a)P(b | a) = P(b)P(a | b)$

Divide by P(a) to get:

$P(b | a) = \frac{P(a | b)P(b)}{P(a)}$

This is useful when one probability is easier to know than the other. Knowing P(visible effect | unknown cause), we can calculate P(unknown cause | visible effect).
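A small numeric sketch of the cause-and-effect use of Bayes' Rule. The rain/wet-grass scenario and all three numbers are hypothetical, chosen only for illustration:

```python
# Hypothetical numbers: effect = wet grass, unknown cause = rain.
p_rain = 0.2            # P(rain)
p_wet_given_rain = 0.9  # P(wet | rain), easy to know
p_wet = 0.3             # P(wet), overall

# Bayes' Rule: P(rain | wet) = P(wet | rain) * P(rain) / P(wet)
p_rain_given_wet = p_wet_given_rain * p_rain / p_wet
print(p_rain_given_wet)  # 0.6
```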

Joint Probability:

AM:

| C = cloud | C = ¬cloud |
| --- | --- |
| 0.4 | 0.6 |

PM:

| R = rain | R = ¬rain |
| --- | --- |
| 0.1 | 0.9 |

AM & PM:

|  | R = rain | R = ¬rain |
| --- | --- | --- |
| C = cloud | 0.08 | 0.32 |
| C = ¬cloud | 0.02 | 0.58 |

Probability of clouds given it is raining: $P(C | rain)$ (the comma and ∧ are used interchangeably, so $P(C, rain) = P(C \wedge rain)$).

$P(C | rain) = \frac{P(C, rain)}{P(rain)} = \alpha P(C, rain)$

The conditional distribution $P(C | rain)$ is proportional to the joint probability $P(C, rain)$ = <0.08, 0.02> (the cloud probabilities when it's raining). Normalizing these values so they sum to 1 gives $P(C | rain)$ = α<0.08, 0.02> = <0.8, 0.2>.
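The normalization step can be sketched directly from the R = rain column of the table above; α is just 1 divided by the column's total:

```python
# Joint column for R = rain, taken from the table above.
joint = {"cloud": 0.08, "no cloud": 0.02}

# alpha = 1 / P(rain); dividing by the total renormalizes the values to sum to 1.
total = sum(joint.values())  # P(rain) = 0.10
conditional = {c: p / total for c, p in joint.items()}
print(conditional)  # {'cloud': 0.8, 'no cloud': 0.2}
```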

Negation:

The probability of ¬a happening is 1 minus the probability of a happening: $P(\neg a) = 1 - P(a)$.

Inclusion-Exclusion:

$P(a \lor b) = P(a) + P(b) - P(a \land b)$

The probability of a or b being true is the probability of a being true, plus the probability of b being true, minus the probability of both a and b being true (subtracted so the overlap is not counted twice).

Marginalization:

$P(a) = P(a, b) + P(a, \neg b)$

This formula helps find the probability of a using some other variable b. Either b is true or b is not true, and the probability of a is the sum of both cases. Sometimes b takes the form of a random variable with multiple possible values; then we sum over all the values b could take. This can be represented as follows:

$P(X = x_i) = \sum_j P(X = x_i, Y = y_j)$
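Marginalization can be sketched with the cloud/rain joint table from earlier: to get P(C = cloud), sum the joint entries over every value of R:

```python
# Full joint distribution P(C, R) from the cloud/rain table above.
joint = {
    ("cloud", "rain"): 0.08, ("cloud", "no rain"): 0.32,
    ("no cloud", "rain"): 0.02, ("no cloud", "no rain"): 0.58,
}

# Marginalization: P(C = cloud) = sum over all values of R.
p_cloud = sum(p for (c, r), p in joint.items() if c == "cloud")
print(p_cloud)  # 0.08 + 0.32 = 0.4
```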

Conditioning:

$P(a) = P(a | b)P(b) + P(a | \neg b)P(\neg b)$

Similar to marginalization, except it uses conditional probabilities instead of joint probabilities. Again, there is a version of conditioning that applies to random variables:

$P(X = x_i) = \sum_j P(X = x_i | Y = y_j)P(Y = y_j)$

This means: for each state $y_j$ that variable Y can take on, take the probability that Y takes on state $y_j$, multiply it by the probability of X taking on state $x_i$ given that Y took on state $y_j$, and sum over all states of Y.
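Conditioning can be sketched with numbers derived from the joint table above (P(rain | cloud) = 0.08/0.4 and P(rain | ¬cloud) = 0.02/0.6):

```python
# From the joint table above:
p_cloud = 0.4
p_rain_given_cloud = 0.2        # 0.08 / 0.4
p_rain_given_no_cloud = 1 / 30  # 0.02 / 0.6

# Conditioning: P(rain) = P(rain | cloud)P(cloud) + P(rain | ¬cloud)P(¬cloud)
p_rain = p_rain_given_cloud * p_cloud + p_rain_given_no_cloud * (1 - p_cloud)
print(p_rain)  # 0.08 + 0.02 = 0.1, matching the joint table
```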

Bayesian Network:

Data structure that represents the dependencies among random variables.

  • Directed graph
  • Each node represents a random variable
  • Arrow from X to Y means X is a parent of Y
  • Each node X has probability distribution $P(X | Parents(X))$

Example:
BayesianNetworkExample

Description:

Appointment: Dependent on Train (1 arrow pointing to it)
Train: Dependent on Maintenance and Rain (2 arrows pointing to it)
Maintenance: Dependent on Rain (1 arrow pointing to it)
Rain: Not dependent on anything (no arrows pointing to it)

Using Bayesian Networks:

Let's say we wanted to find P(light, no, delayed, miss).
Maintenance is dependent on Rain, so we write the probability of Maintenance taking on the value no as P(light)*P(no | light).
The Train is dependent on both Rain and Maintenance, so we condition on both light and no to extend the expression to
P(light)*P(no | light)*P(delayed | light, no).
Ultimately, Appointment is only dependent on the Train, so we add one more factor to get
P(light, no, delayed, miss) = P(light)*P(no | light)*P(delayed | light, no)*P(miss | delayed).
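The chain of factors above can be sketched as a product of CPT lookups. The probability values here are hypothetical placeholders, not the lecture's actual tables:

```python
# Hypothetical CPT entries (the lecture's actual numbers may differ).
P_rain = {"light": 0.2}                      # P(Rain)
P_maint = {("no", "light"): 0.8}             # P(Maintenance | Rain)
P_train = {("delayed", "light", "no"): 0.3}  # P(Train | Rain, Maintenance)
P_appt = {("miss", "delayed"): 0.6}          # P(Appointment | Train)

# P(light, no, delayed, miss)
#   = P(light) * P(no | light) * P(delayed | light, no) * P(miss | delayed)
p = (P_rain["light"]
     * P_maint[("no", "light")]
     * P_train[("delayed", "light", "no")]
     * P_appt[("miss", "delayed")])
print(p)  # 0.2 * 0.8 * 0.3 * 0.6 = 0.0288
```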

Inference:

  • Query X: variable for which to compute distribution.
  • Evidence variables E: observed variables for event e.
  • Hidden variables Y: non-evidence, non-query variables.

Goal: Calculate $P(X | e)$.

Let's say we wanted to find $P(Appointment | light, no)$.

Query: Appointment
Evidence: light, no
Hidden: Train

We know that a conditional probability is equivalent to α times the joint probability:

$P(Appointment | light, no) = \alpha P(Appointment, light, no)$

Either the train is on time, or the train is delayed:

$= \alpha [P(Appointment, light, no, on\ time) + P(Appointment, light, no, delayed)]$
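Summing out the hidden Train variable and renormalizing with α can be sketched as follows; the probability values are hypothetical, chosen only so the arithmetic is easy to follow:

```python
# Hypothetical numbers: P(Train | light, no) and P(Appointment | Train).
p_train = {"on time": 0.7, "delayed": 0.3}
p_appt = {("attend", "on time"): 0.9, ("miss", "on time"): 0.1,
          ("attend", "delayed"): 0.4, ("miss", "delayed"): 0.6}

# Sum out the hidden variable Train, then normalize with alpha.
unnormalized = {
    a: sum(p_train[t] * p_appt[(a, t)] for t in p_train)
    for a in ("attend", "miss")
}
alpha = 1 / sum(unnormalized.values())
posterior = {a: alpha * p for a, p in unnormalized.items()}
print(posterior)  # {'attend': 0.75, 'miss': 0.25}
```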

Inference by Enumeration:

X is the query variable.
e is the evidence.
y ranges over values of hidden variables.
normalizes the result.

Sampling:

Direct Sampling (unconditional probabilities):

Randomly choose a value for each root variable according to its distribution. Every variable after that is sampled based on the values chosen for its parents. If we run this sampling simulation many, many times, we can count the samples where the query is true and estimate the probability rather than computing every possible case.

Rejection Sampling (conditional probabilities):

This works like direct sampling, except we reject the samples where the evidence is not satisfied and keep only the ones where it is.
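Rejection sampling can be sketched on a tiny two-variable network (Cloud → Rain, with the cloud/rain numbers used earlier): sample full worlds, throw away the ones that contradict the evidence, and count what remains.

```python
import random

# Tiny network Cloud -> Rain, using the cloud/rain numbers from earlier.
def sample():
    cloud = random.random() < 0.4
    rain = random.random() < (0.2 if cloud else 1 / 30)
    return cloud, rain

# Estimate P(cloud | rain): reject every sample where it did not rain.
random.seed(0)
accepted = [cloud for cloud, rain in (sample() for _ in range(100_000)) if rain]
estimate = sum(accepted) / len(accepted)
print(estimate)  # close to the exact answer 0.8
```

Note how wasteful this is: rain is true in only ~10% of samples, so roughly 90% of the work is thrown away, which is exactly the problem described below.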

Problems with Sampling: If a world is very unlikely to occur, a lot of computation is wasted.

Likelihood Weighting:

  • Start by fixing the values for the evidence variables
  • Sample the non-evidence variables using the conditional probabilities in the Bayesian Network
  • Weight each sample by its likelihood: the probability of all the evidence given the sampled values.
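The steps above can be sketched on the same tiny Cloud → Rain network: fix the evidence Rain = true, sample only Cloud, and weight each sample by P(rain | sampled cloud) instead of rejecting anything.

```python
import random

random.seed(0)

# Evidence: Rain = true. Sample Cloud, weight by P(evidence | sample).
num, den = 0.0, 0.0
for _ in range(100_000):
    cloud = random.random() < 0.4      # sample the non-evidence variable
    weight = 0.2 if cloud else 1 / 30  # P(rain | cloud), likelihood of evidence
    num += weight * cloud
    den += weight

estimate = num / den
print(estimate)  # close to the exact answer P(cloud | rain) = 0.8
```

Unlike rejection sampling, every sample contributes, so no computation is wasted on impossible worlds.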

Markov Model:

Markov Assumption:

The assumption that the current state depends on only a finite, fixed number of previous states.

Markov Chain:

A sequence of random variables where the distribution of each variable follows the Markov assumption.
markovChainExample

Transition Model: Transition from one state to another state.

Given today's weather, predict tomorrow's weather. Use tomorrow's weather to predict the day after's weather and so on...
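Chaining predictions like this can be sketched by repeatedly sampling from a transition model; the sun/rain probabilities here are hypothetical:

```python
import random

# Hypothetical transition model: P(tomorrow's weather | today's weather).
transition = {
    "sun":  {"sun": 0.8, "rain": 0.2},
    "rain": {"sun": 0.3, "rain": 0.7},
}

def sample_chain(start, days):
    """Sample a weather sequence by repeatedly applying the transition model."""
    chain = [start]
    for _ in range(days):
        probs = transition[chain[-1]]
        chain.append(random.choices(list(probs), weights=probs.values())[0])
    return chain

random.seed(0)
print(sample_chain("sun", 5))
```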

Hidden Markov Model:

A Markov model for a system with hidden states that generate some observed event.
Sensor Model:
hiddenMarkov
In this case, we aren't directly observing whether it is sun or rain; we use umbrella sightings (the observed events) to infer the weather, which is "hidden."

Sensor Markov Assumption:

The assumption that the evidence variable depends only on the corresponding state.

Filtering: Given observations from start until now, calculate probability for the current state.
Prediction: Given observations from start until now, calculate probability for a future state.
Smoothing: Given observations from start until now, calculate probability for a past state.
Most likely explanation: Given observations from start until now, calculate the most likely sequence of states.
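One step of filtering in the umbrella model can be sketched as a predict-then-update loop over the hidden state. All probability values here are hypothetical:

```python
# Hypothetical transition and sensor models for the umbrella HMM.
transition = {"sun": {"sun": 0.8, "rain": 0.2},
              "rain": {"sun": 0.3, "rain": 0.7}}
p_umbrella = {"sun": 0.1, "rain": 0.9}  # sensor model: P(umbrella | state)

belief = {"sun": 0.5, "rain": 0.5}  # current distribution over the hidden state

# Predict: push the belief through the transition model.
predicted = {s: sum(belief[t] * transition[t][s] for t in belief) for s in belief}
# Update: weight by the sensor model after observing an umbrella, then normalize.
updated = {s: p_umbrella[s] * predicted[s] for s in belief}
alpha = 1 / sum(updated.values())
belief = {s: alpha * p for s, p in updated.items()}
print(belief)  # seeing an umbrella shifts belief strongly toward rain
```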