-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathfig3_varianceAnalysis.Rmd
118 lines (81 loc) · 3.07 KB
/
fig3_varianceAnalysis.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
---
title: "HubMap pancreata exploratory figures and variance analysis"
output: html_document
---
This document describes the basic analysis we need to showcase the value of the Hubmap data and proteomics analysis.
```{r loading, echo=F, warning=F, message=F}
library(dplyr)
library(purrr)
library(devtools)
#library(MSnSet.utils)
if(!require(MSnSet.utils))
devtools::install_github('PNNL-Comp-Mass-Spec/MSnSet.utils')
library(ggplot2)
library(ggfortify)
library(cowplot)
library(ggbiplot)
source('loadHumanPancData.R')
```
Note that we store all files on synapse at [http://synapse.org/hubmap](http://synapse.org/hubmap). This includes basic data files and analysis results.
Now we have a few ways to normalize.
## Plot of individual proteins
We first need to plot the original protein expression levels
```{r Insulin expression plots}
##we use this function every time
fixed.crosstab<-correctMissingProteins(fulltab)
insgluc<-fixed.crosstab[c('INS_HUMAN','GLUC_HUMAN'),]%>%
as.data.frame()%>%
tibble::rownames_to_column('protein')%>%
tidyr::pivot_longer(2:(1+ncol(fixed.crosstab)),names_to='sample',values_to='logRatio')%>%
left_join(tibble::rownames_to_column(isletMeta,'sample'))
##now we can plot grid
p<- ggplot(insgluc)+
geom_raster(aes(x=`X-coord`,y=`Y-coord`,fill=logRatio))+
scale_fill_viridis_c()+coord_equal()+
geom_point(aes(x=`X-coord`,y=`Y-coord`,col=IsletOrNot,alpha=0.1))+
scale_color_manual(values=c('black','white'))+
facet_grid(protein~`Islet Number`)+theme_classic()
p
ggsave('fig3_protExpr.pdf',p,width=8)#,width=6)
```
Now that we can summarize the overlap between various regions we can also evaluate how various proteins vary. Basically there are over 1000 proteins that are generally up-regulated and down-regulated in islet.
```{r variable proteins}
library(leapR)
#data("ncipid")
data('krbpaths')
##map features to Gene Names
map<-read.table('uniprotMap.txt',header = TRUE)
var.vals <- apply(fixed.crosstab,1,var,na.rm=T)%>%sort()%>%
as.data.frame()%>%
tibble::rownames_to_column('From')%>%
#tidyr::separate(feature,sep='\\|',into=c('sp','id','From'))%>%
left_join(map)%>%
dplyr::select(-c(From))%>%
#subset(!is.na(var.vals))%>%
subset(!is.na(To))%>%
distinct()
colnames(var.vals)[1]<-'variance'
#rownames(var.df)
hist(log10(var.vals$variance))
##now remove dupes
dupes<-var.vals$To[which(duplicated(var.vals$To))]
non.dupes<-var.vals%>%
subset(!To%in%dupes)
fixed.dupes<-var.vals%>%
subset(To%in%dupes)%>%
dplyr::group_by(To,.drop=F)%>%
dplyr::summarize(maxVar=max(variance))%>%
left_join(var.vals)%>%
subset(maxVar==variance)%>%
dplyr::select(-maxVar)
full.var<-rbind(non.dupes,fixed.dupes)%>%
# tibble::remove_rownames()%>%
# tibble::column_to_rßownames('To')%>%
arrange(desc(variance))
##Now we can do analysis on thoes that are changing or not changing.
order.res<-leapR(datamatrix=full.var,krbpaths,'enrichment_in_order',
primary_columns='variance',id_column='To',minsize=10)%>%
subset(BH_pvalue<0.01)
p<-plotLeapR(order.res,'REACTOME')
ggsave('mostVariableReactome.pdf',width=10)
```