fig3_varianceAnalysis.Rmd

---
title: "HubMap pancreata exploratory figures and variance analysis"
output: html_document
---

This document describes the basic analysis we need to showcase the value of the Hubmap data and proteomics analysis.

```{r loading, echo=F, warning=F, message=F}
library(dplyr)
library(purrr)
library(devtools)
#library(MSnSet.utils)
if(!require(MSnSet.utils))
  devtools::install_github('PNNL-Comp-Mass-Spec/MSnSet.utils')

library(ggplot2)
library(ggfortify)
library(cowplot)
library(ggbiplot)

source('loadHumanPancData.R')

```

Note that we store all files on synapse at [http://synapse.org/hubmap](http://synapse.org/hubmap). This includes basic data files and analysis results.

Now we have a few ways to normalize.

## Plot of individual proteins

We first need to plot the original protein expression levels

```{r Insulin expression plots}

##we use this function every time
fixed.crosstab<-correctMissingProteins(fulltab)

insgluc<-fixed.crosstab[c('INS_HUMAN','GLUC_HUMAN'),]%>%
  as.data.frame()%>%
  tibble::rownames_to_column('protein')%>%
  tidyr::pivot_longer(2:(1+ncol(fixed.crosstab)),names_to='sample',values_to='logRatio')%>%
  left_join(tibble::rownames_to_column(isletMeta,'sample'))


##now we can plot grid
p<- ggplot(insgluc)+
   geom_raster(aes(x=`X-coord`,y=`Y-coord`,fill=logRatio))+
   scale_fill_viridis_c()+coord_equal()+
   geom_point(aes(x=`X-coord`,y=`Y-coord`,col=IsletOrNot,alpha=0.1))+
   scale_color_manual(values=c('black','white'))+
   facet_grid(protein~`Islet Number`)+theme_classic()

p
ggsave('fig3_protExpr.pdf',p,width=8)#,width=6)

```


Now that we can summarize the overlap between various regions we can also evaluate how various proteins vary. Basically there are over 1000 proteins that are generally up-regulated and down-regulated in islet. 

```{r variable proteins}

library(leapR)

#data("ncipid")
 data('krbpaths')
##map features to Gene Names
map<-read.table('uniprotMap.txt',header = TRUE)


var.vals <- apply(fixed.crosstab,1,var,na.rm=T)%>%sort()%>%
as.data.frame()%>%
tibble::rownames_to_column('From')%>%
  #tidyr::separate(feature,sep='\\|',into=c('sp','id','From'))%>%
  left_join(map)%>%
  dplyr::select(-c(From))%>%
  #subset(!is.na(var.vals))%>%
  subset(!is.na(To))%>%
  distinct()

colnames(var.vals)[1]<-'variance'
#rownames(var.df)

hist(log10(var.vals$variance))


##now remove dupes
dupes<-var.vals$To[which(duplicated(var.vals$To))]

non.dupes<-var.vals%>%
  subset(!To%in%dupes)

fixed.dupes<-var.vals%>%
    subset(To%in%dupes)%>%
    dplyr::group_by(To,.drop=F)%>%
    dplyr::summarize(maxVar=max(variance))%>%
    left_join(var.vals)%>%
    subset(maxVar==variance)%>%
  dplyr::select(-maxVar)

full.var<-rbind(non.dupes,fixed.dupes)%>%
#  tibble::remove_rownames()%>%
 # tibble::column_to_rßownames('To')%>%
  arrange(desc(variance))

##Now we can do analysis on thoes that are changing or not changing.
order.res<-leapR(datamatrix=full.var,krbpaths,'enrichment_in_order',
                 primary_columns='variance',id_column='To',minsize=10)%>%
  subset(BH_pvalue<0.01)

p<-plotLeapR(order.res,'REACTOME')

ggsave('mostVariableReactome.pdf',width=10)

```