Todo
Rewrite finite sets policy part
Rewrite P(A|B) derivation.
Notes on notation
iiuc - A bit unsure (like below 80%), but if I understand correctly...
mbg - Quite unsure, but my best guess would be something like...
tbh idk/tbh idu - To be honest I don't know/didn't understand...
Probability Theory - The Logic of Science by E.T. Jaynes
Why is this book good?
E.Y. says it's sexy.
"Preface"
Why might You be interested in this book?
Jaynes says:
Book describes inference.
The prob theory explained in this book is more practical than classic
statistics.
The book resolves many known problems in classic statistics.
The book contains new results which would be hard to prove using
classic statistics.
"History"
What was the point of this subchapter?
Answering the question -
Why should You trust Jaynes?
The book actually contains useful results.
The resulting theory gives pretty much the same results as classic
stats, where it works.
Many struggles that classic stats has are explained and solved using the
book's prob theory.
Many Bayesian struggles of the past are explained and solved in the
book.
The book contains new useful results hard or impossible to prove in
classic stats.
The book is quite rigorous in a practical sense.
It starts off with some reasonable rules/axioms about how
common sense and acting on probabilities should ideally work.
@Though some of them don't seem obvious to me.
Then proves that there's only one way of thinking and doing prob
theory, based on the axioms.
And notes where it's using approximations or assumptions, which
might cause confusion or paradoxes if not noted.
Jaynes has spent like 40 years giving lectures and thinking about
the details and consequences.
Why are the rules/axioms Jaynes accepts trustworthy?
The resulting theory works like classic stats,
when the frequency interpretation of possibility makes sense.
But the theory also gives useful results much less awkwardly in
situations where
interpreting possibility as
You having partial knowledge about the truth
makes more sense than
interpreting it as a frequency.
e.g. so far most obviously
Choosing between two hypotheses.
As to Jaynes's official explanation iiuc:
G. Pólya showed that there were some strong common sense rules as
to how the possibility of something seems to change.
He didn't know how to describe it with numbers.
He just had rules for what makes something more or less possible.
Possibility here is intended to mean something like -
How likely is something to be true?
e.g.
What's the possibility that this man is guilty of a crime?
What's the possibility of the next coinflip being heads?
and not -
What's the proportion of positive cases against the total cases?
(Which is the interpretation in classic stats)
e.g.
What's the proportion/frequency of throwing heads in coinflips?
What's the proportion/frequency of throwing double sixes
in Monopoly?
In addition to G. Pólya's rules,
R. T. Cox made up some rules about consistency of assigning and
acting on probabilities.
He then proved that any way of thinking and acting on
probabilities is isomorphic to the way it is done in this book,
or it breaks one of Pólya's common sense rules, or Cox's rational
consistency rules.
These rules and the proof are described in chap 1 and 2.
What does isomorphic mean?
Roughly?
Two things are called isomorphic,
if You can prove that they work in the same way, but have
different names assigned to stuff.
Then if You prove something is true in one thing,
by analogy it's true in the other thing.
@I'm a bit unsure about what isomorphism means in the book.
@It can mean a lot of things based on what stuff can change
and what has to be left constant.
Slightly more formal example of how isomorphism could be defined?
I'll call a "thing" a set and functions defined on a set.
Two things are isomorphic if You can rename the elements of the
sets so that both things become equal.
Name collisions are not allowed
e.g.
Thing 1 - set S = {A, B}, function f:{A -> B, B -> A}
Thing 2 - set S2 = {0, 1}, function g:{0 -> 1, 1 -> 0}
If You rename
A -> 0
B -> 1
f -> g
S -> S2
Then thing 1 becomes thing 2.
Thing 1 - set S = R, function + (The plus sign that we all
know and love)
Thing 2 - set S2 = R+, function * (The multiplication sign
we all know and love)
If we take thing 1
and rename
All numbers x -> e ^ x
+ -> *
S -> S2
Then thing 1 becomes thing 2.
anti e.g.
Thing 1 - set S = {0}
Thing 2 - set S2 = {0, 1}
Since name collisions aren't allowed, You can't give two
numbers the same name.
Thing 1 - set S = {0, 1}, f:{0 -> 0, 1 -> 1}
Thing 2 - set S2 = {0, 1}, g:{0 -> 1, 1 -> 0}
I can't think of a way to rename them.
One way to see they're not isomorphic: f has fixed points
(f(0) = 0), g has none, and renaming preserves fixed points.
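@A minimal Python sketch (mine, not from the book) checking the second
example numerically: the renaming x -> e^x turns + on R into * on R+.
    # rename(x + y) should equal rename(x) * rename(y).
    import math

    def rename(x):
        return math.exp(x)

    for x, y in [(0.0, 1.0), (2.5, -3.0), (10.0, 0.1)]:
        assert math.isclose(rename(x + y), rename(x) * rename(y))
    print("renaming x -> e^x maps + to *")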
"Foundations"
What was the point of this chapter?
iiuc
Trying to explain where Jaynes got the main ideas used to guide
deriving all the stuff.
What are the most foundational ideas used for deriving Jaynes's prob
theory?
Probability theory as an extension of logic
What's Probability theory as an extension of logic?
iiuc
The idea that prob theory should be able to assign probabilities
to systems of logical statements and observations about those
systems.
Why does Jaynes call it that?
Because it extends the possible values of logic from false
and true, to a bunch of possibility values between.
Why is that important?
In classic prob theory, people start with random variables
defined on sets of events.
iiuc
That makes it awkward to assign probabilities to systems
of logical statements, even though doing so is useful in practice.
Pólya's and Cox's axioms/rules
The finite sets policy
What's the finite sets policy?
The idea that infinite sets shouldn't have properties assigned
to them, at least in probability theory.
The way Jaynes proposes modeling things with infinite sets, is to
always define a limiting process of finite systems, and see whether
the properties of interest converge/work.
What does limiting process mean?
You just look at larger and larger systems constructed by some
algorithm, and see whether a property of the system gets closer
and closer to some value.
Why?
Jaynes says the classic approach of assigning properties to
infinite sets causes more paradoxes than always defining a
limiting process.
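@A toy Python example (mine, not from the book) of a limiting process:
instead of asking "what fraction of all positive integers is even?"
directly, look at the first n integers and let n grow.
    def fraction_even(n):
        return sum(1 for k in range(1, n + 1) if k % 2 == 0) / n

    for n in [10, 100, 1000, 10000]:
        print(n, fraction_even(n))
    # The values settle on 1/2, which is the answer the limiting
    # process assigns to the infinite question.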
Some comparisons
The Kolmogorov axioms of probability theory can be roughly derived
from Pólya's and Cox's work. At least where the axioms are applicable.
de Finetti's prob theory differs in how it handles infinite sets, which
causes paradoxes.
"Comparisons"
What was the point of this chapter?
iiuc
Explain how prob theory ??philosophies compare when looking at the
problems they can solve.
@I might want to rewrite these notes once I understand better.
What philosophies are discussed?
Frequentism
Bayesianism
Probability as an extension of logic
Maximum entropy
By "better" I mean "more effectively" here.
How do the philosophies overlap in the problems they can solve?
iiuc
Frequentism is the most narrow.
Bayesianism solves many problems better than frequentism.
I suspect that Bayesianism includes frequentism where it works,
though that wasn't clearly stated, so it might not be true.
Prob theory as an extension of logic solves all problems
frequentism and
Bayesianism can solve.
It also solves some problems they can't.
iiuc
Maximum entropy can solve
all the problems Bayesianism can,
and something extra.
But Bayesianism solves them better than maximum entropy, if the
problem is "well developed".
What does "well developed" mean?
You have to know the problem's
model,
sample space,
hypothesis space,
prior probabilities.
sampling distribution.
@What does that even mean?
tbh idk.
tbh idu how
maximum entropy and
prob theory as an extension of logic
overlap.
Limitations of philosophies?
Frequentism requires that
The problem is interpretable as a repeatable random experiment,
which is rarely true.
iiuc
Other philosophies don't have this limitation.
Frequentism works badly when
Prior information is important.
What does that mean?
mbg
Prior information means something like assumptions you know
to be true about the problem.
iiuc
Other philosophies don't have this limitation as much.
Frequentist statistics don't use all the data optimally if a good
enough statistic doesn't exist.
In Bayesianism, it's pretty straightforward how to use the data
optimally.
In practice Bayesianism can solve harder problems than frequentism.
Some important tools used in "frequentist statistics" require
extra assumptions that don't come from the axioms, and they break
down at the extremes.
e.g.
Unbiased estimators
Tail-area hypothesis tests
Confidence intervals
iiuc
Other philosophies, but not frequentism, allow
"elimination of nuisance parameters".
What's that?
iiuc
Some ways of simplifying the calculations, if You're only
interested in some information about the problem, but not all.
To use Bayesianism a problem has to be "well developed".
@Does frequentism require this?
tbh idk.
Maximum entropy doesn't require knowing the model and sampling
distribution.
@Limitations of prob theory as an extension of logic?
I suspect similar to Bayesianism, though that wasn't clearly
stated.
To use maximum entropy, You have to know a problem's
sample space,
hypothesis space,
prior probabilities.
Jaynes suspects there might exist principles which don't even require
that.
@Are there ways in which frequentism is better than Bayesianism?
tbh idu.
Jaynes proposes reading these books for examples of ??maximum entropy:
Bayesian Spectrum Analysis and Parameter Estimation by Bretthorst,
Maximum Entropy in Action by Buck and Macaulay
Data Analysis - A Bayesian Tutorial by Sivia
"Mental Activity"
In many cases prob theory can describe how people reason.
In some cases the connection might seem surprising or even disturbing.
Jaynes thinks it has potential for psychological or legal research.
"What is 'safe'?"
What was the point of this chapter?
mbg
Showing that incorrect prior information baked into models and
methods of analysis is important, and can lead to incorrect
results.
If the prior information is wrong, it's possible that no
amount of data will bring correct results.
Example?
People assume a linear model between how much a substance is
eaten and how toxic it is.
But that model is wrong, because many substances have thresholds
up to which they're not dangerous at all.
So testing extremely large doses can lead to overestimating
danger for normal doses,
and testing extremely small doses can lead to underestimating
the danger for normal doses.
iiuc
A more philosophy-level example of incorrect prior information:
You can't derive Newtonian mechanics by modeling a coin with a
stochastic model no matter how much data You get.
Jaynes thinks these ideas are not taken seriously enough in medical
research, and this is very dangerous.
"Style of Presentation"
@Note Draft
What was the point of this chapter?
Explaining various details relating to
Structure of Jaynes's explanations,
Jaynes's views relating to mathematical rigor and practicality,
Roasting classic frequentist methods a little.
Structure of the book?
Part 1 contains:
Explanations of the principles and axioms.
Explanations for how to apply them.
Explanations of where historically people have messed up.
Part 2 contains:
Just explanations of advanced applications.
Jaynes will usually
First explain the problem, and how people have failed in the past.
Then work out a couple examples.
Opinions on how to explain axioms and principles?
Jaynes pays more attention to explaining the connection between the
world and the principles and axioms, than he does to applications.
Why?
Because in his experience students have trouble understanding the
connection to the real world,
but no trouble generalizing to new problems once they do.
iiuc
When explaining the principles and axioms
Jaynes cares more about clarity and understandability than rigor.
Because
Rigor is useless if the connection to the real world is not
well understood.
Jaynes will be very strict with his derivations.
Everything will be a result of the axioms and rules.
And I suppose it will be clearly stated when extra assumptions or
approximations are made.
Unorthodox opinions
Jaynes thinks mathematics creates wrong results because of how it
handles infinite sets, so rigor doesn't even necessarily lead to
correct results when working with infinite sets.
Jaynes will use continuous approximations for finite systems.
But he doesn't think "showing how to generate an uncountable set as a
limit of a finite one" is important.
@What does that even mean?
tbh idk.
Jaynes will not do statistics as is done in frequentism,
instead always using likelihood functions.
Jaynes's justifications for doing this:
iiuc
the classic frequentist methods for working with statistics don't
come as a result of the axioms, and thus cause "paradoxes".
The likelihood function comes as a result of the axioms.
As a result the likelihood function works much better
(see the sketch after this list):
The likelihood function perfectly includes all the information in
the data, while usually classic statistics don't.
It is always clear how to write down a likelihood function, but not
always clear how to reduce the problem to a good statistic.
This allows for much harder problems to get solved.
likelihood function calculations often allow for simplifications to
be done. (nuisance parameter elimination)
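@A minimal sketch (mine, numbers made up) of what a likelihood function
looks like for a toy problem - coin flips with unknown heads
probability theta:
    # L(theta) = theta^heads * (1 - theta)^tails for the observed flips.
    # Note it depends on the data only through (heads, tails).
    data = [1, 0, 1, 1, 0, 1, 1, 1]            # made-up flips, 1 = heads
    heads, tails = sum(data), len(data) - sum(data)

    def likelihood(theta):
        return theta ** heads * (1 - theta) ** tails

    for theta in [0.1, 0.3, 0.5, 0.7, 0.9]:
        print(theta, likelihood(theta))
    # Multiplying by a prior over theta and normalizing would give a
    # posterior; no separate "statistic" has to be invented.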
Jaynes tries to explain everything as understandably as possible,
that includes:
Not using any linguistic tricks or hidden meanings.
Explaining everything in plain English.
iiuc
Not explaining things that people understand intuitively
exceptionally well and can't really reduce much further, like what
does it mean for something to be true or exist.
Part 1. Principles and elementary applications
1. Plausible reasoning
1.1 Deductive and plausible reasoning
What are we currently trying to understand at all?
What does common sense mean?
And how to generalize it?
What types of statements does math logic deal with?
If
A => B and
A = True
Then
B = True
What types of statements do people usually use when thinking irl?
If
A => B and
B = True
Then
A becomes more probable
Or
If
A makes B more probable and
B = True
Then
A becomes more probable
Example?
You live in the same room as Your little brother.
Occasionally "Your brother drinks juice in Your room".
You come home, and see "some juice spilled on the floor".
"Your brother drinks juice in Your room" makes
"some juice spilled on the floor" more likely so
it becomes more likely "Your brother drank juice in Your room".
Even when in reality a burglar might have broken into the house,
drank some juice, spilled some on the floor, and now is hiding in
Your wardrobe, because You came home too early.
Does prior information affect reasoning?
Yes.
Example?
If every time You find "some juice is spilled on the floor", a
burglar turns out to have broken into Your house, You wouldn't be
so harsh on Your little brother.
In practice even mathematicians always use the probabilistic style of
thinking when
figuring out what conjectures/theorems are likely to be true and
worth thinking about.
How do we know that?
Figuring out formal proofs is usually delayed to publication
writing time.
Is mathematical implication causation?
Nope.
How come?
It rains => It was cloudy 1 second ago
but rain is not the cause of clouds
Concrete connection between implication and causation?
Causation implies implication, but
implication doesn't imply causation.
1.2 Analogies with physical theories
How do physicists do things in their field?
Approximate procedure
Someone finds a pattern in some features of the world
They write up an idealized model
It seems to successfully predict something about these features
This is considered progress in the field
Then use newfound knowledge to find larger patterns
How does this relate to understanding common sense?
Jaynes claims he'll take a similar approach.
Finding small patterns in how probabilistic reasoning seems to
work.
Then writing more general models.
Why create a model at all?
Creating an abstract model of probabilistic reasoning would be
useful in cases
which are too complicated to be handled by human reasoning
capabilities.
Examples?
Many propositions or variables.
Situations where emotions come into play.
Situations where precise estimates are necessary.
etc.
1.3 The thinking computer
Psychologically it's better to think about
how to build a thinking computer
rather than how to model human common sense
because
it's hard to think about human common sense without becoming
philosophical and involved in debates.
Because human common sense includes a lot of
biases and inconsistencies
Because the human mind hasn't evolved just for good reasoning.
1.4 Introducing the robot
Assume we have a robot who can do math logic.
1.5 Boolean algebra
What's AB mean in bool alg in this book?
Both A and B are true
What's A + B ...?
A or B are true (not xor)
What's A = B ...?
A and B have the same truth value
0th reasonable axiom of possible reasoning?
If two statements are mathematically equivalent, their possibilities
are equal.
What's \overline{A} ...?
Not A
What's A => B ...?
If A is true then B is also true
How does possible reasoning improve on mathematical logic?
In math if A => B, then we don't get any info about
B if we find out A is false, or
A if we find out B is true.
However we know that irl there is some information there.
That's what a possible reasoning model could help with.
Note
Why is "implication" a bad name for "=>"?
"A implies B" is sometimes interpreted as "B can be deduced from A"
In math "A implies B" just means "If A is true then B is true".
e.g.
2 + 2 = 5 => I am the pope, is a mathematically true statement. And
2 + 2 = 5 => I am not the pope is also a true statement.
Because
False => False is correct in math
False => True is correct in math
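@A small Python check (mine) that "A => B" in the math sense is just
"Not A Or B", so it comes out true whenever A is false:
    def implies(a, b):
        return (not a) or b

    for a in (False, True):
        for b in (False, True):
            print(a, "=>", b, "is", implies(a, b))
    # False => False and False => True both print True, which is why
    # "2 + 2 = 5 => I am the pope" is a true statement.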
1.6 Adequate sets of operations
Assume we know math logic.
1.7 The basic desiderata
What's desiderata?
Useful quality/property that we would like to have.
The author insists on using this word.
What does Jaynes define A|B to mean?
Possibility of A given that B is true.
Corresponds to a real number.
A|B is defined only if B is possible.
Why exactly this definition?
Usually we know something about A.
We don't reason about abstract symbols we know nothing about.
It's still possible to write down the possibility of an event given
no observation by writing A|True.
Does A|False exist?
No. It's not defined.
What if A or B is a number?
A and B have to be bool statements.
Clear enough that they work with bool logic.
A or B can be statements like
"x has a value a", which is usually meant by authors who write
numbers instead of statements.
Jaynes says he'll describe how not doing this actually leads to
paradoxes in practical situations in chap 15.
Properties we'd like to have for possible reasoning?
I Degrees of possibility are represented by real numbers.
Why?
Jaynes couldn't think of a system without this or a property
equivalent in practice.
There is also some theoretical requirement or something.
The directions of updates correspond to common sense.
A bigger possibility will correspond to a bigger number.
Why?
Convenience, but not necessary
A small increase in possibility will cause only a small increase
in the number. (i.e. continuity)
Why?
Convenience, but not necessary
How does Jaynes understand the word "update"?
The moment when we understand that the relevant information for
calculating the possibility of statement A is C' instead of C.
i.e. We calculate A|C' instead of A|C
Usually because C' = C And "some new statement".
If updating C makes B more possible
i.e. (B|C') > (B|C),
Then (Not B|C') < (Not B|C),
i.e. The opposite of B becomes less possible.
Why?
Seems quite reasonable
If
updating C makes B more possible,
i.e. (B|C') > (B|C),
and the update doesn't change the possibility of A given B,
i.e. (A|BC') = (A|BC),
then
the information should increase the possibility of (A and B),
i.e. (AB|C') >= (AB|C)
Why?
After thinking of a couple of examples it seems reasonable
Consistency
IIIa
Any way of calculating the result leads to the same result.
IIIb
Prob theory doesn't guarantee working if not all relevant knowledge
is put into the condition.
IIIc
If in two problems the relations are the same except for
labeling, any derived results should be the same.
Why these properties?
Author claims that
these properties ~uniquely define a way to handle possibilities.
Assume our robot follows these rules.
2. The quantitative rules
2.1 The product rule
What variables go into calculating the possibility of AB|C?
AB|C = F(B|C, A|BC) or F(A|C, B|AC)
Why?
It seems reasonable that the possibility of AB|C can be split
into two parts:
Calculating the possibility of
B given C
and then taking into account how possible it is we will see
A given B and C happened.
Also there was an exhaustive proof by a guy named Cox that showed
other options would absurdly contradict intuition in everyday
situations.
When an update C -> C' happens such that
B|C' > B|C and
A|BC' = A|BC and thus
AB|C' >= AB|C
When does equality happen?
When A|BC' = A|BC = impossible.
Why?
Not explained really, but seems intuitive.
When an update C -> C' happens such that
B|C' = B|C and
A|BC' > A|BC and thus
AB|C' >= AB|C
When does equality happen?
When B|C' = B|C = impossible.
Why?
Not explained really, but seems intuitive.
F is continuous and monotonically increasing in both arguments.
why?
Didn't really understand, but seems reasonable.
Except F(x, y) is impossible if x or y is impossible.
F(x, F(y, z)) = F(F(x, y), z)
Why?
(ABC|D) = F(BC|D, A|BCD) = F(F(C|D, B|CD), A|BCD)
and also (ABC|D) = F(C|D, AB|CD) = F(C|D, F(B|CD, A|BCD))
Why is this important?
All possible solutions to this equation, given properties of F,
are known.
What are the solutions?
F(B|C, A|BC) = inv(w)(w(B|C) * w(A|BC))
w - positive & monotonic function
Why?
There was an advanced proof using differential equations and
math anal.
But I didn't really understand it.
Name?
Product Rule
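@A small numeric check (mine) that defining F through any positive
monotonic w gives an F satisfying F(x, F(y, z)) = F(F(x, y), z);
the choice w(x) = x^3 is arbitrary, just for illustration:
    import math, random

    def w(x):
        return x ** 3

    def w_inv(x):
        return x ** (1.0 / 3.0)

    def F(x, y):          # F(B|C, A|BC) = inv(w)(w(B|C) * w(A|BC))
        return w_inv(w(x) * w(y))

    random.seed(0)
    for _ in range(1000):
        x, y, z = random.random(), random.random(), random.random()
        assert math.isclose(F(x, F(y, z)), F(F(x, y), z))
    print("associativity holds numerically")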
What's NCC?
For shortness, that's what I call the number corresponding to certainty.
e.g.
(A or Not A|C),
(True|C),
etc.
w(NCC) = 1
why?
Since equivalent logical statements have the same possibility
(0) Take C => A, and thus A|C = NCC, then
Because
(C => A) => A = True
True And B = B
(1) AB|C = B|C and
Because
B And C = True => (C = True) and
C => A
(2) A|BC = NCC = A|C
AB|C =
= F(B|C, A|BC) = (From 2)
= F(B|C, A|C)
(3) i.e. AB|C = F(B|C, A|C)
From 3 and Product Rule.
w(AB|C) = w(B|C)w(A|C) (Changing LHS from 1)
w(B|C) = w(B|C)w(A|C)
w(A|C) = 1 (Changing LHS from 0)
w(NCC) = 1
Q.E.D.
What's NCI?
-||- Number corresponding to impossibility.
e.g.
(False|C),
(A And Not A|C),
etc.
w(NCI) = 0 or +inf
why?
Similar to proof for why w(NCC) = 1
What happens if +inf?
If one was to assume w(NCI) = +inf instead of w(NCI) = 0,
that later turns out to work sort of equivalently.
We'll assume w(NCI) = 0, as is traditional.
2.2 The sum rule
What's the algebraic relation between w(A|B) and w(Not A|B)?
w(A|B)^m + w(Not A|B)^m = 1, for some m > 0.
The solutions for different m are all again sort of equivalent
(m can be absorbed into w).
But because the formulas look simpler,
and by tradition, we'll take m = 1:
w(A|B) = 1 - w(Not A|B)
name?
Negation rule.
why?
Algebraic, differential and math anal magic, didn't comprehend.
What was all of this fucking around for?
It was sort of proved that,
No matter what the mapping from possibilities to real numbers, and
as long as it obeys the desiderata,
there will exist a function w that maps from the possibility
associated real numbers to [0, 1],
Such that the w's can be calculated regardless of the
possibility -> R association. (Which is ~explained later in notes)
I'm really not sure if this doesn't cause inconsistencies, like
mapping one w value to different possibilities, but it's not the
first thing I don't understand.
Also w has the properties:
w(number corresponding to True = certain) = 1
w(number corresponding to False = impossible) = 0
w(AB|C) = w(B|C) * w(A|BC)
w(A|B) = 1 - w(Not A|B)
If two statements A and B play symmetric roles given C, then
w(A|C) = w(B|C)
Also, calculations with this w(A|B) function in practice work
analogously to how people traditionally do calculations with P(A|B).
Except w(A|B) works with math statements and P(A|B) with sets.
Thus as is traditional we'll call the w function P(A|B).
How to calculate P(A + B|C)?
P(A + B|C) = P(A|C) + P(B|C) - P(AB|C)
why?
Algebraically derivable from negation and product rule.
What's mutually exclusive mean?
Two of the events cannot be true at the same time.
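@A small check (mine, example made up) of the rule on one fair die
roll, using exact fractions:
    from fractions import Fraction

    outcomes = range(1, 7)                   # C: one roll of a fair die
    def P(event):                            # uniform probability given C
        return Fraction(sum(1 for x in outcomes if event(x)), 6)

    A = lambda x: x % 2 == 0                 # "the roll is even"
    B = lambda x: x <= 3                     # "the roll is at most 3"
    lhs = P(lambda x: A(x) or B(x))          # P(A + B|C)
    rhs = P(A) + P(B) - P(lambda x: A(x) and B(x))
    assert lhs == rhs == Fraction(5, 6)
    # For mutually exclusive statements P(AB|C) = 0, so the rule
    # reduces to P(A + B|C) = P(A|C) + P(B|C).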
Given all P(subset(Ai)|X), can You calculate any P(f(Ai)|f2(Ai)X)?
Yes.
How?
Worst case:
P(f(Ai)|f2(Ai)X) = (Product rule)
= P(f(Ai)f2(Ai)|X) / P(f2(Ai)|X)
Now the problem has been reduced to calculating two probabilities of
form P(f(Ai)|X).
Next calculate probabilities of all possible combinations of Ais,
e.g. P(A1 And Not A2 And A3 And A4 And Not A5...|X),
(Which is possible using the product and negation rules).
Next write f(Ai) as a sum of all the relevant combinations of Ai.
In practice, usually there are other ways of getting the result faster
and without knowing all the P(subset(Ai)|X) values.
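@A minimal sketch (mine, atom probabilities made up) of the worst-case
recipe: store the probability of every combination of A1..A3, then any
P(f(Ai)|f2(Ai)X) is a sum of atoms divided by a sum of atoms:
    from itertools import product

    # Made-up P(+-A1 +-A2 +-A3|X), one number per truth assignment.
    atoms = {bits: 0.125 for bits in product([False, True], repeat=3)}

    def P(statement):                  # statement: (a1, a2, a3) -> bool
        return sum(p for bits, p in atoms.items() if statement(*bits))

    def P_given(f, f2):                # P(f|f2 X) via the product rule
        return P(lambda *b: f(*b) and f2(*b)) / P(f2)

    f  = lambda a1, a2, a3: a1 or a3   # some function of the Ai
    f2 = lambda a1, a2, a3: not a2     # some condition built from the Ai
    print(P_given(f, f2))              # 0.75 for these made-up numbers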
2.3 Qualitative properties
Does implication make sense using Ps?
Yes?
Why?
Does knowing A in implication still imply B?
C = (A => B)
P(B|AC) = P(AB|C) / P(A|C)
C => AB = A
P(B|AC) = P(A|C) / P(A|C) = 1
Does knowing B is False imply A is False?
C = (A => B)
P(A|Not B and C) = P(A And Not B|C) / P(Not B|C)
C => A And Not B = False, i.e. is impossible assuming A => B
P(A|Not B and C) = P(False|C) / P(Not B|C) = 0
Does the statement
"If A => B, then (if B = True then A becomes more possible)"
work?
Yes.
Why?
C = (A => B)
P(A|BC) = P(AB|C) / P(B|C)
C => AB = A thus
P(A|BC) = P(A|C) / P(B|C)
1 <= 1 / P(B|C)
So P(A|BC) >= P(A|C)
i.e. Knowing that B is true, makes A|BC more likely.
Does the statement A => B, thus if A = False then B is less possible
work?
Yes.
Why?
(1) C = (A => B)
P(B|Not A and C) = P(Not A and B|C) / P(Not A|C) (Prod. rule)
(2) P(B|Not A and C) = P(B|C) * P(Not A|BC) / P(Not A|C)
Since ((A => B) => P(A|BC) >= P(A|C))
P(A|BC) >= P(A|C) => (Negation rule)
1 - P(Not A|BC) >= 1 - P(Not A|C) =>
P(Not A|BC) <= P(Not A|C) =>
P(Not A|BC) / P(Not A|C) <= 1 => (From dividing 2)
P(B|Not A And C) <= P(B|C)
Q.E.D.
Does the statement
(Seeing A makes B more possible, thus seeing B makes A more possible)
i.e.
(0) P(B|AC) > P(B|C) =>
P(A|BC) > P(A|C)
work?
Yes.
Why?
From prod. rule:
P(A|BC) = P(AB|C) / P(B|C) => (Prod rule)
(1) P(A|BC) = P(A|C) * P(B|AC) / P(B|C)
From given:
P(B|AC) > P(B|C) =>
(2) P(B|AC) / P(B|C) > 1
From 1 and 2
P(A|BC) > P(A|C)
Q.E.D.
Interesting memes from looking at formula 1.
If seeing A makes B only slightly more likely, then seeing B makes
A only slightly more likely.
i.e. (3) P(B|AC) = P(B|C) + eps1 => P(A|BC) = P(A|C) + eps2
Why?
Formula 1:
P(A|BC) = P(A|C) * P(B|AC) / P(B|C) => (From 3)
P(A|BC) = P(A|C) * (P(B|C) + eps1) / P(B|C) =>
P(A|BC) = P(A|C) * (1 + eps2) =>
P(A|BC) = P(A|C) + P(A|C) * eps2 =>
P(A|BC) = P(A|C) + eps3
Q.E.D.
e.g?
Eating at McDonalds makes it slightly more likely a person will
become overweight.
So knowing someone's overweight makes it slightly more likely
they eat at McDonalds, but not by a lot.
If a person is gay, it makes them slightly more likely to talk
about gay shit.
So knowing someone talks about gay shit, makes them more likely
to be gay, but not by a lot, because the possibility of someone
talking about gay shit is high anyway.
Another interesting meme is that the previous effect always
makes A more likely, never less likely, if it was possible at all.
For A to increase a lot when B is observed, it is necessary but not
sufficient for P(B|C) to be small
i.e. P(B|C) ~= 1 => P(A|BC) ~= P(A|C)
Why?
If P(B|C) ~= 1 =>
P(B|AC) / P(B|C) ~= P(B|AC)
P(B|AC) > P(B|C) =>
P(B|AC) ~= 1 =>
P(B|AC) / P(B|C) ~= P(B|AC) ~= 1 => (From 1)
P(A|BC) ~= P(A|C)
e.g.?
A person who's religious would definitely celebrate Christmas,
Yet nearly everyone celebrates Christmas, so You get nearly
no information on whether they're religious.
i.e.
Assume
A = Religious
B = Celebrates X-mas = X-mas
P(Religious|C) ~= 0.1
i.e. Default chance of being religious
P(X-mas|Religious and C) ~= 1.0
i.e. Chance of celebrating X-mas given religious
P(X-mas|C) ~= 0.99
i.e. Default chance of celebrating X-mas
Then
P(Religious|X-mas and C) = P(Religious|C) *
P(X-mas|Religious and C) / P(X-mas|C) =>
P(Religious|X-mas and C) = 0.1 * 1.0 / 0.99 ~=
= 0.1 * 1.01 ~= 0.101
Seeing someone break a window is very rare, so it gives
You a lot of information on whether the person should be
arrested.
i.e.
Assume
A = Vandalized someone's property = Vandal
B = Broke a window = Broke
P(Vandal|C) ~= 0.0001
i.e. Chance of being a Vandal.
P(Broke|Vandal and C) ~= 0.05
i.e. Chance of breaking a window given a vandal.
P(Broke|C) ~= 0.00001
i.e. Chance that someone would break a window.
Then
P(Vandal|Broke and C) = P(Vandal|C) *
P(Broke|Vandal and C) / P(Broke|C)
P(Vandal|Broke and C) = 0.0001 * 0.05 / 0.00001
= 0.0001 * 5000 = 0.5
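@A tiny script (mine) reproducing both made-up examples with
P(A|BC) = P(A|C) * P(B|AC) / P(B|C):
    def posterior(p_a, p_b_given_a, p_b):
        return p_a * p_b_given_a / p_b

    # Religious / X-mas: the evidence is common, so it barely moves A.
    print(posterior(0.1, 1.0, 0.99))          # ~0.101
    # Vandal / broke a window: the evidence is rare, so it moves A a lot.
    print(posterior(0.0001, 0.05, 0.00001))   # 0.5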