Repository for the Epigenetic and High-Dimension Mediation Data Challenge (Aussois, June 7-9 2017)
To participate in the challenge, you need to install R on your computer. To make R easier to use, we suggest installing RStudio, an integrated development environment (IDE) for R.
To install the R packages that are useful for the data challenge, copy and paste the following code into R:
#Install R packages for the Epigenetic & High-Dimension Mediation Data Challenge
#Package to read large tables quickly
install.packages("data.table")
#Package for controlling the FDR, using an empirical null distribution
install.packages("fdrtool")
#Package to make an R developer's life easier
install.packages("devtools")
#Collection of packages for data manipulation and visualization
install.packages("tidyverse")
#Package to perform mediation analysis with multiple mediators
devtools::install_github("YinanZheng/HIMA")
#Package to compute the Sobel test
install.packages("multilevel")
#Bioconductor packages: qvalue (controlling the FDR) and sva (surrogate variable analysis)
#biocLite() is deprecated in recent Bioconductor releases; BiocManager replaces it
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c("qvalue", "sva"))
#Package for Confounder Adjusted Testing and Estimation
install.packages("cate")
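To check that the installation worked, a minimal sketch like the following tries to load each package's namespace (without attaching it) and reports any package that is missing. The package list simply mirrors the install commands above; `pkgs` and `installed` are illustrative names.

```r
# Packages installed above (CRAN, GitHub and Bioconductor)
pkgs <- c("data.table", "fdrtool", "devtools", "tidyverse", "HIMA",
          "multilevel", "qvalue", "sva", "cate")

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

if (all(installed)) {
  message("All packages are installed.")
} else {
  message("Missing packages: ", paste(pkgs[!installed], collapse = ", "))
}
```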
The datasets for the 1st and 2nd challenges can be loaded in R with the following code:
library(data.table)
#Download each data file only if it is not already in the working directory
if (!file.exists("challenge1.txt"))
  download.file("https://goo.gl/iLFGeC", destfile = "challenge1.txt")
data1 <- fread("challenge1.txt", header = TRUE, data.table = FALSE)
if (!file.exists("challenge2.txt"))
  download.file("https://goo.gl/OzDSRO", destfile = "challenge2.txt")
data2 <- read.table("challenge2.txt")
A tutorial provides details about how to form a team and how to submit lists of markers.
To participate in the challenge, you must form a team of 1, 2, or 3 participants. Once you have chosen a name for your team, send an email to Florian Privé with "team mediation 2017" as the subject. A key for your team will then be sent to you by email.
The objective of the two data challenges is to find markers that mediate the effect of an exposure on a health outcome. For instance, in the 2nd data challenge, you are asked to find the methylation markers that mediate the effect of sun exposure on skin cancer.
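To make the setting concrete, here is a small sketch on simulated data (not the challenge data): an exposure X acts on an outcome Y partly through 5 of 50 markers. Each marker is screened with a joint significance test, which keeps the larger of the p-values for the X -> M path and for the M -> Y path (adjusted for X). This max-p rule is one simple technique chosen for illustration only; it is not the method used to score the challenge, and packages such as HIMA implement more refined approaches. All variable names and effect sizes are assumptions.

```r
# Simulated toy data (hypothetical): exposure X -> markers M -> outcome Y
set.seed(1)
n <- 200                      # individuals
p <- 50                       # markers
true_mediators <- 1:5         # markers 1 to 5 mediate the effect of X on Y

X <- rnorm(n)                                            # exposure
M <- matrix(rnorm(n * p), nrow = n, ncol = p)            # markers
M[, true_mediators] <- M[, true_mediators] + 0.8 * X     # path a: X -> M
Y <- 0.5 * X + 0.8 * rowSums(M[, true_mediators]) + rnorm(n)  # path b: M -> Y, plus a direct effect

# Joint significance test for marker j:
# both the a path (M_j ~ X) and the b path (Y ~ M_j + X) must be significant
mediation_pvalue <- function(j) {
  p_a <- summary(lm(M[, j] ~ X))$coefficients["X", 4]
  p_b <- summary(lm(Y ~ M[, j] + X))$coefficients[2, 4]  # row 2 is the M_j coefficient
  max(p_a, p_b)
}
pvals <- vapply(seq_len(p), mediation_pvalue, numeric(1))

# Benjamini-Hochberg adjustment, then markers selected at a 5% FDR level
selected <- which(p.adjust(pvals, method = "BH") < 0.05)
selected
```

The resulting p-values can also be handled with the qvalue or fdrtool packages installed above instead of `p.adjust()`.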
To submit a list of markers involved in mediation, use the submission website. The file mysubmission.txt gives an example of a submission file containing a list of markers involved in mediation.
The ranking of the participants will be based on the F1 score. The F1 score depends on the false discovery rate (FDR), which is the percentage of false-positive markers in the submitted list, and on the power, which is the percentage of markers truly involved in mediation that are found in the submitted list. The F1 score is the harmonic mean of the power and of one minus the FDR: F1 = 2 * power * (1 - FDR) / (power + 1 - FDR).
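The F1 score can be computed with a small helper like the one below, for example to evaluate a method on simulated data where the true mediators are known. `f1_score()` is a hypothetical name, and markers are assumed to be identified by the same IDs in the submitted list and in the set of true mediators.

```r
# Hypothetical helper: F1 score of a submitted marker list against the true mediators
f1_score <- function(submitted, truth) {
  submitted <- unique(submitted)
  tp <- length(intersect(submitted, truth))    # true positives
  if (tp == 0) return(0)
  fdr   <- 1 - tp / length(submitted)          # share of false positives in the list
  power <- tp / length(truth)                  # share of true mediators that were found
  2 * power * (1 - fdr) / (power + (1 - fdr))  # harmonic mean of power and 1 - FDR
}

# 10 submitted markers, 8 of which are among 20 true mediators:
# power = 0.4, FDR = 0.2, F1 = 2 * 0.4 * 0.8 / (0.4 + 0.8)
f1_score(submitted = 1:10, truth = 3:22)
# [1] 0.5333333
```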
At the end of each challenge, send an email to Florian Privé with an attached text file containing your list of p-values (one p-value per line), which can be written with:
write(mypvalue, file = "pval_nameofmyteam_challengenumber.txt", ncolumns = 1)
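Because `write()` places 5 values per line by default for numeric vectors, it is worth checking that the file really contains one p-value per line. A minimal sketch, with random numbers standing in for your own p-values and the file written to a temporary directory for the demonstration:

```r
# Stand-in p-values for 1000 markers (replace with your own results)
set.seed(42)
mypvalue <- runif(1000)

outfile <- file.path(tempdir(), "pval_nameofmyteam_challengenumber.txt")
# ncolumns = 1 writes one value per line (the numeric default is 5 per line)
write(mypvalue, file = outfile, ncolumns = 1)

# Check the file: one line per p-value, every value between 0 and 1
lines <- readLines(outfile)
stopifnot(length(lines) == length(mypvalue))
back <- as.numeric(lines)
stopifnot(all(back >= 0 & back <= 1))
```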
A graphic showing your mediation analysis will be returned to you by email.
At the end of each challenge, prepare 3 slides for the debriefing, covering Materials and Methods, Results, and Discussion.
We provide some files with examples of mediation analysis in R:
Mediation analysis: Baron and Kenny procedure, Sobel test, and 2 data challenges
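As a self-contained illustration of the Baron and Kenny procedure and of the Sobel test, the sketch below simulates a single mediator and fits the three regressions with base R's `lm()`. The effect sizes are assumptions chosen for illustration. The Sobel statistic is z = ab / sqrt(b^2 s_a^2 + a^2 s_b^2), where s_a and s_b are the standard errors of the estimated paths a and b.

```r
# Simulated single-mediator data (hypothetical): X -> M -> Y
set.seed(2017)
n <- 300
X <- rnorm(n)
M <- 0.5 * X + rnorm(n)               # path a = 0.5
Y <- 0.3 * X + 0.6 * M + rnorm(n)     # direct effect 0.3, path b = 0.6

# Baron and Kenny regressions
step1 <- lm(Y ~ X)        # total effect c of X on Y
step2 <- lm(M ~ X)        # path a: effect of X on M
step3 <- lm(Y ~ X + M)    # path b (coefficient of M) and direct effect c' (coefficient of X)

c_total  <- unname(coef(step1)["X"])
c_direct <- unname(coef(step3)["X"])
a    <- unname(coef(step2)["X"])
se_a <- summary(step2)$coefficients["X", "Std. Error"]
b    <- unname(coef(step3)["M"])
se_b <- summary(step3)$coefficients["M", "Std. Error"]

# Sobel test of the indirect effect a * b (two-sided normal p-value)
sobel_z <- (a * b) / sqrt(b^2 * se_a^2 + a^2 * se_b^2)
sobel_p <- 2 * pnorm(-abs(sobel_z))

c(total = c_total, direct = c_direct, indirect = a * b, z = sobel_z, p = sobel_p)
```

With ordinary least squares, the total effect equals the direct effect plus the indirect effect (c = c' + ab) exactly, which makes a useful sanity check. The multilevel package installed above also provides a `sobel()` function for this test.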