Week6_Notes: Hardy Weinberg, More PCA
========================================================
author: Alexander Frieden
date: 3/07/2016


Hardy Weinberg
========================================================
In 1908 the mathematician G. H. Hardy wrote in Science about an observation he had made.

It turned out that a German physician, Wilhelm Weinberg, had independently arrived at the same result.

Thus the equation that had been known as Hardy's law became known as the Hardy-Weinberg law.


The Hardy–Weinberg principle, also known as the Hardy–Weinberg equilibrium, model, theorem, or law, states that allele and genotype frequencies in a population will remain constant from generation to generation in the absence of other evolutionary influences.

Example
============================================================

![allele frequency](pictures/alleleFrequencyCalc.jpg)


Example (part 2)
===================================
In general, let $f_{A/A}$, $f_{A/a}$, and $f_{a/a}$ be the three genotype frequencies at a locus with two alleles. The population frequencies of alleles A and a, written $p$ and $q$, are then given by
$$
\begin{align}
p&=f_{A/A} + \frac{1}{2}f_{A/a} \\
q&=f_{a/a} + \frac{1}{2}f_{A/a}
\end{align}
$$

Note that p + q = 1.
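
As a quick check of these formulas, the sketch below computes $p$ and $q$ from genotype counts in base R. It borrows the MN blood group counts (298, 489, 213) that appear later in these notes; the variable names are only illustrative.

```{r}
# Genotype counts for a locus with two alleles (MN counts used later in these notes)
counts <- c(AA = 298, Aa = 489, aa = 213)

# Genotype frequencies f_{A/A}, f_{A/a}, f_{a/a}
geno_freq <- counts / sum(counts)

# Allele frequencies: each heterozygote carries one copy of each allele
p <- unname(geno_freq["AA"] + 0.5 * geno_freq["Aa"])
q <- unname(geno_freq["aa"] + 0.5 * geno_freq["Aa"])

c(p = p, q = q, total = p + q)
```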


Hardy Weinberg Part 3
========================================================
How can we get genotype frequencies from allele
frequencies?

Assuming random mating, the equilibrium genotype
frequencies will be

$$
\begin{align}
&f_{A/A} = p^2 \\
&f_{A/a} = 2pq \\
&f_{a/a} = q^2
\end{align}
$$

These proportions are obtained after a single generation of
random mating.

Hardy Weinberg Part 4
========================================================
What are the allele frequencies in the next generation?

We compute:
$$
\begin{align}
p' &= f_{A/A} + \frac{1}{2}f_{A/a} \\
   &=p^2 + \frac{1}{2}(2pq) = p^2 + pq \\
   &=p(p + q) = p(1) = p
\end{align}
$$

The allele frequencies are unchanged!
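
The same argument can be checked numerically. The following sketch starts from arbitrary genotype frequencies that are not in Hardy-Weinberg proportions (the starting values are made up for illustration), applies one round of random mating, and recomputes the allele frequency.

```{r}
# Arbitrary starting genotype frequencies (illustrative values, not in HW proportions)
f0 <- c(AA = 0.5, Aa = 0.2, aa = 0.3)
p <- unname(f0["AA"] + 0.5 * f0["Aa"])   # frequency of allele A
q <- 1 - p

# Genotype frequencies after one generation of random mating
f1 <- c(AA = p^2, Aa = 2 * p * q, aa = q^2)

# Allele frequency in the next generation
p_next <- unname(f1["AA"] + 0.5 * f1["Aa"])

c(p = p, p_next = p_next)   # identical up to floating point rounding
```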


Hardy Weinberg Part 5
========================================================
!["hardy weinberg"](pictures/hardyWeinberg.png)

Hardy Weinberg Part 5 Explanation
========================================================
The plot shows Hardy–Weinberg proportions for two alleles. The horizontal axis shows the two allele frequencies p and q.

The vertical axis shows the expected genotype frequencies.

Each line shows one of the three possible genotypes.

Hardy Weinberg Part 6
========================================================

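Note that rare alleles are rarely found in homozygotes: when $q$ is small, $q^2$ is much smaller than $2pq$, so most copies of a rare allele sit in heterozygotes.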
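
To put numbers on this, the sketch below compares the expected frequency of heterozygous carriers with that of rare homozygotes. Under HWE the ratio is $2pq/q^2 = 2p/q$, which grows quickly as $q$ shrinks. The values of $q$ are chosen only for illustration.

```{r}
q <- c(0.5, 0.1, 0.01, 0.001)   # illustrative frequencies of the rarer allele
p <- 1 - q

hw <- data.frame(
  q           = q,
  homozygotes = q^2,          # expected frequency of a/a
  carriers    = 2 * p * q,    # expected frequency of A/a
  ratio       = 2 * p / q     # carriers per rare homozygote
)
hw
```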


HWE Example
=======================================
Albinism is a rare genetically inherited trait that is only expressed in the phenotype of homozygous recessive individuals (aa).

The most characteristic symptom is a marked deficiency in the skin and hair pigment melanin.

This condition can occur among any human group as well as among other animal species.  The average human frequency of albinism in North America is only about 1 in 20,000.

HWE Example (part 2)
=======================================

Referring back to the Hardy-Weinberg equation ($p^2 + 2pq + q^2 = 1$), the frequency of homozygous recessive individuals (aa) in a population is $q^2$.  Therefore, in North America the following must be true for albinism:
$$
q^2 = \frac{1}{20{,}000} = 0.00005
$$

By taking the square root of both sides of this equation, we get:

$$
q = \sqrt{0.00005} \approx 0.00707
$$

In other words, the frequency of the recessive albinism allele (a) is about $0.00707$, or about 1 in 140.  Knowing one of the two variables $(q)$ in the Hardy-Weinberg equation, it is easy to solve for the other $(p)$.

HWE Example (part 3)
=======================================

Solving:
$$
\begin{align}
p &= 1 - q \\
  &= 1 - 0.00707 \\
  &= 0.99293
\end{align}
$$
The frequency of the dominant, normal allele (A) is, therefore, .99293 or about 99 in 100.

HWE Example (part 4)
=======================================
The next step is to plug the frequencies of p and q into the Hardy-Weinberg equation:


$$
\begin{align}
p^2 + 2pq + q^2 &= 1 \\
(0.993)^2 + 2(0.993)(0.007) + (0.007)^2 &\approx 1 \\
0.986 + 0.014 + 0.00005 &\approx 1
\end{align}
$$

HWE Example (part 4, continued)
========================================
This gives us the frequencies for each of the three genotypes for this trait in the population:

|Term|Genotype|Predicted frequency|
|----|----|----|
| $p^2$ | homozygous dominant individuals | .986 = 98.6% |
| $2pq$ | heterozygous individuals | .014 = 1.4% |
| $q^2$ | homozygous recessive individuals (the albinos) | .00005 = .005% |

HWE Example (part 5)
========================================

With a frequency of .005% (about 1 in 20,000), albinos are extremely rare.  However, heterozygous carriers for this trait, with a predicted frequency of 1.4% (about 1 in 72), are far more common than most people imagine.  There are roughly 280 times more carriers than albinos.  Clearly, though, the vast majority of humans (98.6%) are predicted to be homozygous dominant and do not carry the albinism allele.
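
The arithmetic of this example can be reproduced in a few lines of base R. This is only a sketch of the calculation described above, starting from the quoted frequency of 1 in 20,000; working with unrounded values, the carrier to albino ratio comes out at about 281.

```{r}
# Frequency of albinism (aa) in North America, from the text: about 1 in 20,000
q2 <- 1 / 20000

q <- sqrt(q2)     # frequency of the recessive allele a
p <- 1 - q        # frequency of the dominant allele A

genotypes <- c(AA = p^2, Aa = 2 * p * q, aa = q^2)
round(genotypes, 5)

# Carriers per affected individual: 2pq / q^2 = 2p / q
carrier_ratio <- unname(genotypes["Aa"] / genotypes["aa"])
carrier_ratio
```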

Value of HWE
==========================================

By the outset of the 20th century, geneticists were able to use Punnett squares to predict the probability of offspring genotypes for particular traits based on the known genotypes of their two parents when the traits followed simple Mendelian rules of dominance and recessiveness.

The Hardy-Weinberg equation essentially allowed geneticists to do the same thing for entire populations.

Hardy Weinberg Implementation
=========================================
```{r}
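# Install HardyWeinberg only if it is missing, then load it
if (!requireNamespace("HardyWeinberg", quietly = TRUE))
  install.packages("HardyWeinberg", repos = "http://cran.us.r-project.org")
library("HardyWeinberg")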
```


R Example
==================================
We store the genotype counts (298, 489 and 213 for $MM$, $MN$ and $NN$ respectively) in a named vector and run the chi-square test:
```{r}
x <- c(MM = 298, MN = 489, NN = 213)
HW.test <- HWChisq(x, verbose = TRUE)
```

R Example (part 2)
=============================================

This shows that the chi-square statistic has value 0.179, and that the corresponding p value
for the test is 0.6723. Taking a significance level of $\alpha = 0.05$, we do not reject HWE
for the MN locus.

R Example (part 3)
=============================================

When verbose is set to FALSE (the default) the test is silent, and HW.test is
a list object containing the results of the test: the chi-square statistic, the p value of the test,
half the deviation from HWE for the heterozygote, $D = \frac{1}{2}(f_{AB} - e_{AB})$, the minor allele
frequency $p$, and the inbreeding coefficient $f$.

The coefficient of inbreeding ("f") is a measure of the likelihood of genetic effects due to inbreeding to be expected based on a known pedigree (i.e. a fully documented genealogy e.g. due to a fixed system of breeding).

R Example (part 4)
=============================================

By default, HWChisq applies a continuity correction. This is not recommended for low minor allele frequencies. In order to perform a chi-square test without Yates’ continuity correction, it is necessary to set the cc parameter to zero.

HWE without Continuity Correction
=============================================

```{r}
HW.test <- HWChisq(x, cc = 0, verbose = TRUE)
```

There is no significant deviation from HWE.
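
For reference, the uncorrected statistic can be reproduced by hand in base R. This is a sketch of the standard chi-square calculation, not the package's internal code: expected counts come from the estimated allele frequency, and one degree of freedom is assumed because one parameter ($p$) is estimated from three genotype classes.

```{r}
x <- c(MM = 298, MN = 489, NN = 213)
n <- sum(x)

# Estimated frequency of the M allele
p <- unname((2 * x["MM"] + x["MN"]) / (2 * n))
q <- 1 - p

# Expected genotype counts under HWE
expected <- n * c(MM = p^2, MN = 2 * p * q, NN = q^2)

# Chi-square statistic without continuity correction, and its p value (df = 1)
chisq <- sum((x - expected)^2 / expected)
pval  <- pchisq(chisq, df = 1, lower.tail = FALSE)

round(c(chisq = chisq, p.value = pval), 4)
```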

Chi-Square Distribution
=======================================
```{r}
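plotChi <- function() {
  x <- seq(0, 30, 0.1)
  # df = 1 is drawn dashed in black; the remaining curves are solid and coloured
  plot(x, dchisq(x, 1), main = "Chi-square distribution", ylab = "Density",
       type = "l", lty = 2, lwd = 2, col = "black")
  colors <- c("red", "blue", "darkgreen", "gold", "black")
  degf <- c(2, 3, 8, 30)   # must match the first four legend labels
  for (i in seq_along(degf)) lines(x, dchisq(x, degf[i]), lwd = 2, col = colors[i])
  labels <- c("df=2", "df=3", "df=8", "df=30", "df=1")
  legend("topright", inset = .05, title = "Chi Square Distribution", legend = labels,
         lwd = 2, lty = c(1, 1, 1, 1, 2), col = colors)
}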
```


Chi-Square Distribution (part 2)
=======================================
```{r, out.width = '500px', out.height = '500px'}
plotChi()
```

Chi-Square Test
=======================================

* A statistical test for comparing observed counts with expected ratios is the chi-square test, also known as the goodness-of-fit test.

* An important question in any genetic experiment is how to decide whether the data fit one of the Mendelian ratios we have discussed.

Chi-Square Test (part 2)
=======================================

Chi-Square Formula:

$$
\chi^2 = \sum\frac{(Observed\,Value - Expected\,Value)^2}{Expected\,Value}
$$

Degrees of freedom (df) = n-1 where n is the number of classes


Chi-Square Example
============================

The chi-square test provides a method for testing the association between the row and column variables in a two-way table. The null hypothesis $H_0$ assumes that there is no association between the variables (in other words, one variable does not vary according to the other variable).

The alternative hypothesis $H_a$ claims that some association does exist. The alternative hypothesis does not specify the type of association, so close attention to the data is required to interpret the information provided by the test.


Chi-Square Example (part 2)
============================

Observed

|Goal|Grade 4|Grade 5|Grade 6|
|----|---------------|---------------|----|
| Grades | 49 | 50 |69|
| Popular |24 |36 |38|
| Sports |19|22|28|

Expected

|Goal|Grade 4|Grade 5|Grade 6|
|----|---------------|---------------|----|
| Grades |46.1|54.2|67.7|
| Popular |26.9|31.6|39.5|
| Sports |18.9|22.2|27.8|

Each expected count is (row total × column total) / grand total. For example, the expected count for grade 4 students who chose "grades" as most important is 168*92/335 = 46.1.


Chi-Square Example (part 2, continued)
============================

The chi-square statistic for the above example is computed as follows:
$$
\begin{align}
X^2 &= \frac{(49 - 46.1)^2}{46.1} + \frac{(50 - 54.2)^2}{54.2} + \frac{(69 - 67.7)^2}{67.7} \\
    &\quad + \cdots + \frac{(28 - 27.8)^2}{27.8} \\
    &= 0.18 + 0.33 + 0.03 + \cdots + 0.01  \\
    &= 1.51
\end{align}
$$

Chi-Square Example (part 3)
============================

The degrees of freedom are equal to (3-1)(3-1) = 2*2 = 4, so we are interested in the probability P($X^2$ > 1.51) = 0.8244 on 4 degrees of freedom.

This indicates that there is no evidence of an association between the choice of most important factor and the grade of the student: the differences between the observed values and the values expected under the null hypothesis are small.
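
The hand calculation above can be reproduced with `chisq.test` from base R. The matrix below is typed in from the observed table; the object names `goals` and `fit` are only illustrative.

```{r}
# Observed counts: rows are goals, columns are school grades
goals <- matrix(c(49, 50, 69,
                  24, 36, 38,
                  19, 22, 28),
                nrow = 3, byrow = TRUE,
                dimnames = list(goal  = c("Grades", "Popular", "Sports"),
                                grade = c("4", "5", "6")))

fit <- chisq.test(goals)

round(fit$expected, 1)   # expected counts under independence
fit$statistic            # chi-square statistic
fit$parameter            # degrees of freedom
fit$p.value              # P(X^2 > observed statistic)
```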

Similar thing in R
==================================

```{r}
## From Agresti(2007) p.39
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(M) <- list(gender = c("F", "M"),
                    party = c("Democrat","Independent", "Republican"))
(Xsq <- chisq.test(M))  # Prints test summary
```

Similar thing in R (part 2)
==================================

We can also get the following metadata:
```{r}
Xsq$observed   # observed counts (same as M)
Xsq$expected   # expected counts under the null
```

Similar thing in R (part 3)
==================================

```{r}
Xsq$residuals  # Pearson residuals
Xsq$stdres     # standardized residuals
```

Example #2
===================================
```{r}
data(Markers)
Markers[1:12,]
```

Example #2 part 2
======================================

Note that this data is at the level of each individual. The data frame Markers contains one SNP
with missing values (SNP1), the two allele intensities of that SNP (iG and iT) and two covariate
markers (SNP2 and SNP3). Here, the covariates have no missing values. We first test SNP1 for
HWE using a chi-square test, ignoring the missing genotypes.


Example #2 part 3
======================================
The data we just looked at is in a data frame, but the Hardy-Weinberg chi-square test only takes a vector of genotype counts.  How do we convert between these two?

Example #2 part 4
======================================

Answer: We tabulate the SNP1 column of the Markers data frame and pass the resulting counts to the test.
```{r}
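Xt <- table(Markers[,1])   # genotype counts for SNP1; table() drops NA by default
Xv <- as.vector(Xt)        # plain numeric vector of counts
names(Xv) <- names(Xt)     # keep the genotype labels
HW.test <- HWChisq(Xv,cc=0,verbose=TRUE)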
```


Inbreeding coefficient
==========================================


HWE Power
======================================
Tests for HWE have low power for small samples with a low minor allele frequency, or for samples that deviate only moderately from HWE. It is therefore important to be able to compute power.

The function **HWPower** can be used to compute the power of a test for HWE. The function **mac** is used to compute the minor allele count. The parameter $\theta = f_{AB}^2 / (f_{AA} f_{BB})$ measures the degree of disequilibrium and equals 4 under HWE, so setting $\theta=4$ gives the type I error rate of the test.

A type I error occurs when the null hypothesis $H_0$ is true but is rejected.

A type II error occurs when the null hypothesis is false but fails to be rejected.

HWE Power (part 2)
=======================================

```{r}
x <- c(MM = 298, MN = 489, NN = 213)
n <- sum(x)
nM <- mac(x)
pw4 <- HWPower(n, nM, alpha = 0.05, test = "exact", theta = 4,
               pvaluetype = "selome")
print(pw4)
```

HWE Power (part 3)
=========================================
```{r}
pw8 <- HWPower(n, nM, alpha = 0.05, test = "exact", theta = 8,
               pvaluetype = "selome")
print(pw8)
```

HWE Power (part 4)
==========================================
These computations show that for a large sample like this one, the type I error rate (0.0482) is very close to the nominal rate, 0.05.

Also, the standard exact test has good power (0.9997) for detecting deviations as large as $\theta = 8$, which is a doubling of $\theta$ with respect to its value under HWE.

Type I error rate and power for the chi-square test can be calculated
by setting **test="chisq"**.

With the allele frequency of this sample (0.5425), $\theta = 8$ amounts to an inbreeding coefficient of -0.1698.
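
The link between $\theta$ and the inbreeding coefficient can be checked numerically. The sketch below assumes the usual parameterisation of genotype frequencies with an inbreeding coefficient $f$, namely $f_{AA} = p^2 + pqf$, $f_{AB} = 2pq(1-f)$ and $f_{BB} = q^2 + pqf$, together with $\theta = f_{AB}^2 / (f_{AA} f_{BB})$. These definitions are not spelled out in the notes, and the helper `theta_from_f` is a hypothetical name, not a function from the HardyWeinberg package.

```{r}
p <- 0.5425          # allele frequency of the MN sample
q <- 1 - p

# theta as a function of the inbreeding coefficient f (assumed parameterisation)
theta_from_f <- function(f) {
  fAA <- p^2 + p * q * f
  fAB <- 2 * p * q * (1 - f)
  fBB <- q^2 + p * q * f
  fAB^2 / (fAA * fBB)
}

theta_from_f(0)      # theta under Hardy-Weinberg equilibrium (f = 0)

# Solve theta_from_f(f) = 8 for f; a negative f means heterozygote excess
sol <- uniroot(function(f) theta_from_f(f) - 8,
               interval = c(-0.8, 0.99), tol = 1e-10)
round(sol$root, 4)
```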


Plotting HWE
================================
Genetic association studies, genome-wide association studies in particular, use many genetic markers.

In this context graphics such as ternary plots, log-ratio plots and Q-Q plots become particularly useful, because they can reveal whether HWE is a reasonable assumption for the whole data set.

We begin to explore the Han Chinese HapMap data set by making a ternary plot.

Plotting HWE (part 2)
================================
```{r,out.width = '500px', out.height = '500px'}
data("HapMapCHBChr1", package = "HardyWeinberg")
HWTernaryPlot(HapMapCHBChr1, region = 1, vbounds = FALSE)
```


Plotting HWE (part 3)
==================================
```{r,out.width = '500px', out.height = '500px'}
HWTernaryPlot(HapMapCHBChr1, region = 7, vbounds = FALSE)
```


Using Vcf
==================================

Install required packages
```{r}
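# biocLite() has been retired; current Bioconductor releases install through BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("VariantAnnotation")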
```

Using Vcf part 2
===============================

```{r}
library(VariantAnnotation)
fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")
```


Using Vcf part 3
================================

```{r}
## The return value is a data.frame with genotype counts
## and allele frequencies.
df <- snpSummary(vcf)
df
```

Using Vcf part 4
================================

```{r,out.width = '500px', out.height = '500px'}
## Compare to ranges in the VCF object:
rowRanges(vcf)
```


Using Vcf part 5
===================================

 * No statistics were computed for the variants in rows 3, 4 and 5.

 * They were omitted because row 3 has two alternate alleles, row 4 has none and row 5 is not a SNP.

```
                                   ALT      QUAL      FILTER
                 <DNAStringSetList> <numeric> <character>
       rs6054257                  A        29        PASS
    20:17330_T/A                  A         3         q10
       rs6040355                G,T        67        PASS
  20:1230237_T/.                           47        PASS
       microsat1             G,GTCT        50        PASS
```

PCA worked example part 1
================================

Let's go through a worked example.

Let us analyze the following 3-variate dataset with 10 observations. Each observation consists of 3 measurements on a wafer: thickness, horizontal displacement, and vertical displacement.

PCA worked example part 2
================================

$$
X=
\left[
\begin{array}
{ccc}
7 & 4 & 3 \\
4 & 1 & 8 \\
6 & 3 & 5 \\
8 & 6 & 1 \\
8 & 5 & 7 \\
7 & 2 & 9 \\
5 & 3 & 3 \\
9 & 5 & 8 \\
7 & 4 & 5 \\
8 & 2 & 2 \\
\end{array}
\right]
$$

PCA worked example part 3
================================
First compute the correlation matrix.

$$
R=
\left[
\begin{array}
{ccc}
1.00 & 0.67 & -0.10 \\
0.67 & 1.00 & -0.29 \\
-0.10 & -0.29 & 1.00 \\
\end{array}
\right]
$$

The diagonal entries are 1 because every variable is perfectly correlated with itself. This holds for any correlation matrix and does not require the variables to have variance 1.
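
The correlation matrix can be reproduced directly from the data with `cor` from base R. The sketch below enters X as shown on the previous slide; the column names are added here for readability and are not part of the original data.

```{r}
# The 10 x 3 data matrix from the worked example (one row per wafer)
X <- matrix(c(7, 4, 3,
              4, 1, 8,
              6, 3, 5,
              8, 6, 1,
              8, 5, 7,
              7, 2, 9,
              5, 3, 3,
              9, 5, 8,
              7, 4, 5,
              8, 2, 2),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("thickness", "horizontal", "vertical")))

# Pearson correlation matrix; the diagonal of a correlation matrix is always 1
R_full <- cor(X)
round(R_full, 2)
```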

PCA worked example part 4
================================

I have entered the correlation matrix in R because finding eigenvalues by hand is tedious.

```{r}
R<-cbind(c(1,0.67,-0.1), c(0.67,1,-0.29),c(-0.1,-0.29,1))
R
```

PCA worked example part 5
================================
Let's compute the eigenvalues and eigenvectors:

```{r}
eigen(R)
```

PCA worked example part 6
================================

A couple of notes:

* Each eigenvalue satisfies $|R-\lambda I|=0$.

* The sum of the eigenvalues is $3=p$, which is equal to the trace of **R** (i.e., the sum of the main diagonal elements).

* The determinant of **R** is the product of the eigenvalues.

* The product is $\lambda_1\times\lambda_2\times\lambda_3=0.499$.
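
These properties are easy to confirm numerically. The sketch below uses the unrounded correlation matrix computed from X, which is why the determinant comes out near 0.499; the two-decimal matrix typed in earlier gives a slightly smaller value (about 0.496). The names `R_full` and `lambda` are only illustrative.

```{r}
X <- matrix(c(7, 4, 3,  4, 1, 8,  6, 3, 5,  8, 6, 1,  8, 5, 7,
              7, 2, 9,  5, 3, 3,  9, 5, 8,  7, 4, 5,  8, 2, 2),
            ncol = 3, byrow = TRUE)
R_full <- cor(X)                 # unrounded correlation matrix
lambda <- eigen(R_full)$values   # eigenvalues, largest first

sum(lambda)                      # sum of the eigenvalues
sum(diag(R_full))                # trace of R
prod(lambda)                     # product of the eigenvalues
det(R_full)                      # determinant of R

# Each eigenvalue makes R - lambda * I singular (determinant close to 0)
sapply(lambda, function(l) det(R_full - l * diag(3)))
```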

PCA worked example part 6 (continued)
================================

Substituting the first eigenvalue, $\lambda_1 = 1.769$, and **R** into $(R - \lambda_1 I)v_1 = 0$, we obtain

$$
\left[
\begin{array}
{ccc}
-0.769 & 0.670 & -0.100 \\
0.670 & -0.769 & -0.290 \\
-0.100 & -0.290 & -0.769 \\
\end{array}
\right]
\left[
\begin{array}
{c}
v_{11} \\
v_{21} \\
v_{31} \\
\end{array}
\right] =
\left[
\begin{array}
{c}
0 \\
0 \\
0 \\
\end{array}
\right]
$$

PCA worked example part 7
================================

This is the matrix expression for three homogeneous equations with three unknowns and yields the first column of V:   0.64  0.69  -0.34  (again, a computerized solution is indispensable).

Repeating this procedure for the other two eigenvalues yields the matrix V.

$$
V=
\left[
\begin{array}
{ccc}
0.64 & 0.38 & -0.66 \\
0.69 & 0.10 & 0.72 \\
-0.34 & 0.91 & 0.20 \\
\end{array}
\right]
$$

PCA worked example part 8
================================

Now form the matrix $L^{1/2}$, which is a diagonal matrix whose elements are the square roots of the eigenvalues of R.


$$
L^{1/2}=
\left[
\begin{array}
{ccc}
1.33 & 0 & 0 \\
0 & 0.96 & 0 \\
0 & 0 & 0.55 \\
\end{array}
\right]
$$

PCA worked example part 9
================================

Then obtain **S**, the factor structure, using $S=VL^{1/2}$.

$$
S=
\left[
\begin{array}
{ccc}
0.64 & 0.38 & -0.66 \\
0.69 & 0.10 & 0.72 \\
-0.34 & 0.91 & 0.20 \\
\end{array}
\right]
\left[
\begin{array}
{ccc}
1.33 & 0 & 0 \\
0 & 0.96 & 0 \\
0 & 0 & 0.55 \\
\end{array}
\right]
=
\left[
\begin{array}
{ccc}
0.85 & 0.37 & -0.37 \\
0.91 & 0.10 & 0.40 \\
-0.45 & 0.88 & 0.11 \\
\end{array}
\right]
$$

This can be read as follows: 0.91 is the correlation between the second variable and the first principal component. The other elements are read similarly.

PCA worked example part 10
================================

Next compute the communality, using the first two principal components only (the first two columns of **S**). The entries of the product are computed from unrounded values, so they differ slightly from what the two-decimal matrices give.

$$
SS^{\prime} =
\left[
\begin{array}
{cc}
0.85 & 0.37 \\
0.91 & 0.10 \\
-0.45 & 0.88 \\
\end{array}
\right]
\left[
\begin{array}
{ccc}
0.85 & 0.91 & -0.45 \\
0.37 & 0.10 & 0.88 \\
\end{array}
\right] =
\left[
\begin{array}
{ccc}
0.8662 & 0.8140 & -0.0606 \\
0.8140 & 0.8420 & -0.3321 \\
-0.0606 & -0.3321 & 0.9876 \\
\end{array}
\right]
$$

PCA worked example part 11
================================
So we see from this the following results:

|Variable|Communality|
|----|----|
|1|0.8662|
|2|0.8420|
|3|0.9876|

This means that the first two principal components "explain" 86.62% of the variance of the first variable, 84.20% of the second variable, and 98.76% of the third.
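
A minimal sketch of this calculation in base R, assuming the unrounded correlation matrix of X. Eigenvectors are only defined up to sign, so individual loadings may come out with flipped signs relative to the slides; the communalities are unaffected.

```{r}
X <- matrix(c(7, 4, 3,  4, 1, 8,  6, 3, 5,  8, 6, 1,  8, 5, 7,
              7, 2, 9,  5, 3, 3,  9, 5, 8,  7, 4, 5,  8, 2, 2),
            ncol = 3, byrow = TRUE)

e <- eigen(cor(X))
V <- e$vectors                       # eigenvectors as columns
S <- V %*% diag(sqrt(e$values))      # factor structure S = V L^{1/2}

# Communality of each variable from the first two components
communality <- rowSums(S[, 1:2]^2)
round(communality, 4)
```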

PCA worked example part 12
================================
The coefficient matrix, $B$, is formed using the reciprocals of the diagonals of $L^{1/2}$

$$
B = VL^{-1/2} = \left[
\begin{array}
{ccc}
0.48 & 0.40 & -1.20 \\
0.52 & 0.10 & 1.31 \\
-0.26 & 0.95 & 0.37 \\
\end{array}
\right]
$$

PCA worked example part 13
================================

Finally, we can compute the factor scores from ZB, where Z is X converted to standard score form. The columns of the result are the principal factors.

We find these values by taking the original values, subtracting the column mean and dividing by the column standard deviation to obtain Z, then multiplying Z by the principal components coefficient matrix B.

PCA worked example part 14
================================

$$
F = ZB = \left[
\begin{array}
{ccc}
0.41 & -0.69 & 0.06 \\
-2.11 & 0.07 & 0.63 \\
-0.46 & -0.32 & 0.30 \\
1.62 & -1.00 & 0.70 \\
0.70 & 1.09 & 0.65 \\
-0.86 & 1.32 & -0.85 \\
-0.60 & -1.31 & 0.86 \\
0.94 & 1.72 & -0.04 \\
0.22 & 0.03 & 0.34 \\
0.15 & -0.91 & -2.65
\end{array}
\right]
$$
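
The whole procedure fits in a few lines of base R. This is a sketch rather than the code behind the slides. The scores shown above appear to match a standardization that divides by the population standard deviation (denominator n rather than n - 1); that is inferred from the numbers, not stated in the notes, so it is written out explicitly here. Because eigenvector signs are arbitrary, whole columns of the scores may be multiplied by -1 compared with the matrix above.

```{r}
X <- matrix(c(7, 4, 3,  4, 1, 8,  6, 3, 5,  8, 6, 1,  8, 5, 7,
              7, 2, 9,  5, 3, 3,  9, 5, 8,  7, 4, 5,  8, 2, 2),
            ncol = 3, byrow = TRUE)
n <- nrow(X)

# Standard scores, assuming the population standard deviation (denominator n)
centered <- sweep(X, 2, colMeans(X))
sd_pop   <- sqrt(colSums(centered^2) / n)
Z        <- sweep(centered, 2, sd_pop, "/")

e <- eigen(cor(X))
B <- e$vectors %*% diag(1 / sqrt(e$values))   # coefficient matrix B = V L^{-1/2}

scores <- Z %*% B                             # factor scores F = ZB
round(scores, 2)

# With this scaling the scores are uncorrelated with unit variance
round(crossprod(scores) / n, 10)
```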


Google Genomics
================================
A couple of years ago, Google announced an effort to collect genomics data and build tools that work at scale.


Google Genomics
=================================
Google has a number of tools, one of which is a tool for running Hardy-Weinberg tests:

http://googlegenomics.readthedocs.org/en/latest/use_cases/analyze_variants/hardy_weinberg_equilibrium.html

This allows us to make calls to the Google Genomics endpoints to analyze our data and return the metadata we need in R.


Using Google Genomics
===================================
Download the required files:

~~~~~~~
~/projects]$ git clone git@github.com:googlegenomics/codelabs.git
~~~~~~~

Run BigQuery Analysis
===================================
~~~~~~~
sortAndLimit <- "ORDER BY ChiSq DESC, reference_name, start, alternate_bases LIMIT 1000"
result <- DisplayAndDispatchQuery("./sql/hardy-weinberg.sql",
                                  project=project,
                                  replacements=c("#_ORDER_BY_"=sortAndLimit,
                                                 queryReplacements))
~~~~~~~~