Commit 6cda6a2 (parent e6de942): add post

4 files changed: 259 additions, 0 deletions

`.gitignore` (1 addition, 0 deletions): adds `tidyomicsBlog.Rproj` after the existing `.Rhistory`, `.RData` and `.Ruserdata` entries.

New post file (258 additions, 0 deletions):

---
title: "Introducing the Tidyomics Ecosystem: A Comprehensive Guide to Tidy Transcriptomics"
author: "Stefano Mangiola"
date: "`r Sys.Date()`"
slug: []
tags:
  - tidyomics/tidyomicsBlog
  - ecosystem
  - transcriptomics
  - tidyverse
  - bioconductor
lastmod: "`r Sys.Date()`"
keywords: []
description: "A comprehensive introduction to the tidyomics ecosystem, including core packages, publications, GitHub projects, and community resources for tidy transcriptomics analysis."
comment: false
toc: true
autoCollapseToc: false
postMetaInFooter: true
hiddenFromHomePage: false
contentCopyright: false
reward: false
mathjax: false
mathjaxEnableSingleDollar: false
mathjaxEnableAutoNumber: false
hideHeaderAndFooter: false
flowchartDiagrams:
  enable: false
  options: ''
sequenceDiagrams:
  enable: false
  options: ''
format:
  html:
    toc: true
    code-fold: false
---

![Tidyomics Ecosystem Cover](images/tidyomics_book_cover.png){.cover-image width=100%}

# Introduction

The tidyomics ecosystem was born from a common challenge faced by life scientists: every omics technology and framework in R seemed to require learning a new data structure and syntax. Switching from bulk RNA-seq to single-cell, or from expression data to genomic ranges, often felt like climbing a different mountain. Tidyomics keeps the **underlying objects exactly the same** while giving them a single, tidyverse-flavoured grammar, so that moving from bulk RNA-seq to single-cell or spatial data is no harder than moving between two dplyr pipelines. Its design principles take inspiration from the tidyverse philosophy of clear, human-readable code, as articulated by Wickham *et al.* (2019) ([JOSS 10.21105/joss.01686](https://joss.theoj.org/papers/10.21105/joss.01686)).

Addressing that challenge snowballed into an international collaboration and, ultimately, into **tidyomics**.

# What is Tidyomics?

**Tidyomics** is an open project to develop and integrate software and documentation that enable a tidy data analysis framework for omics data objects ([Hutchison *et al.* 2024](https://doi.org/10.1038/s41592-024-02299-2)). The development of packages and tutorials is organized around the [tidyomics open challenges](https://github.com/tidyomics/). Tidyomics enables the use of familiar tidyverse verbs (`select`, `filter`, `mutate`, etc.) to manipulate rich data objects in the Bioconductor ecosystem. Importantly, the data objects themselves are not modified: tidyomics provides a tidy *interface* to the native objects, leveraging existing Bioconductor classes and algorithms.

In practice, **tidyomics** is a set of R packages developed by an international group of contributors. The ecosystem allows for code such as:

```r
single_cell_data |>
  filter(Phase == "G1") |>
  ggplot(aes(UMAP_1, UMAP_2, color = score)) +
  geom_point()
```

(filter single cells in G1 phase and plot UMAP coordinates)

or

```r
chip_seq_peaks |>
  filter(FDR < 0.01) |>
  join_overlap_inner(promoters) |>
  group_by(promoter_type) |>
  summarize(ave_score = mean(score))
```

(compute the average score by type of promoter overlap, for significant peaks)

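To make the "native objects" point concrete, here is a minimal sketch on a hypothetical toy `SummarizedExperiment`. The gene and sample names and the `condition` column are made up for illustration. It assumes the SummarizedExperiment, dplyr and tidySummarizedExperiment packages are installed, and it checks that a tidy `filter()` hands back the same class of object rather than a data frame.

```r
library(SummarizedExperiment)
library(dplyr)
library(tidySummarizedExperiment)

# Hypothetical toy data: 3 genes x 4 samples of raw counts
counts <- matrix(c(10, 0, 5,  20, 3, 8,  0, 7, 12,  1, 9, 4), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("s", 1:4)))
se <- SummarizedExperiment(
  assays  = list(counts = counts),
  colData = DataFrame(condition = c("control", "control", "treated", "treated"),
                      row.names = colnames(counts))
)

# A tidy verb applied to the native Bioconductor object...
treated <- se |> filter(condition == "treated")

# ...returns a SummarizedExperiment again, now holding only the treated samples
is(treated, "SummarizedExperiment")
dim(treated)
```

Because the result is still a `SummarizedExperiment`, it can be passed straight to any Bioconductor function that expects one.
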
## Core Principles

The tidyomics ecosystem is built on several fundamental principles:

- **Tidy interface to native objects**: Provides tidy verbs while preserving Bioconductor object structure
- **Verbose, jargon-free vocabulary**: Function and variable names are designed to be self-explanatory
- **Minimal temporary variables**: Reduce the need for intermediate variables through chaining operations
- **Consistent interfaces**: Provide uniform interfaces across different data containers
- **Compatibility**: Work seamlessly with existing Bioconductor and tidyverse workflows

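The "minimal temporary variables" principle is easiest to see side by side. This sketch uses plain dplyr on a hypothetical sample table, not an omics object. It computes the same summary twice: once with a named variable for every step, and once as a single chain.

```r
library(dplyr)

# Hypothetical per-sample table (values made up for illustration)
samples <- tibble(
  sample    = c("s1", "s2", "s3", "s4"),
  condition = c("control", "control", "treated", "treated"),
  lib_size  = c(1.2e6, 0.9e6, 1.5e6, 0.4e6)
)

# Style 1: one temporary variable per step
step1 <- filter(samples, lib_size > 5e5)
step2 <- group_by(step1, condition)
with_temps <- summarise(step2, mean_lib_size = mean(lib_size))

# Style 2: the same steps chained, with no intermediate objects
chained <- samples |>
  filter(lib_size > 5e5) |>
  group_by(condition) |>
  summarise(mean_lib_size = mean(lib_size))

identical(with_temps, chained)
```

The chained version reads top to bottom in the order the operations happen, and it leaves no `step1`/`step2` objects behind in the workspace.
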
## Omics Integration Under a Single, Consistent Interface

The tidyomics ecosystem provides a unified approach to omics data analysis, enabling seamless integration across different omics domains through a consistent tidy interface.

![Tidyomics Network Integration](images/tidyomics_net.png){.center-image width=80%}

This integration allows researchers to work with transcriptomics, genomics and other omics data using the same familiar tidyverse verbs, regardless of the underlying data structure.

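A practical consequence of a consistent interface is that a helper written once with tidy verbs can be reused across containers. The minimal sketch below assumes the SummarizedExperiment, SingleCellExperiment, dplyr, tidySummarizedExperiment and tidySingleCellExperiment packages are installed. The toy counts, the `condition` column and the `keep_treated()` helper are hypothetical.

```r
library(SummarizedExperiment)
library(SingleCellExperiment)
library(dplyr)
library(tidySummarizedExperiment)
library(tidySingleCellExperiment)

# Hypothetical toy counts shared by both containers (3 genes x 4 columns)
counts <- matrix(1:12, nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("c", 1:4)))
annotation <- DataFrame(condition = c("control", "treated", "treated", "control"),
                        row.names = colnames(counts))

bulk   <- SummarizedExperiment(assays = list(counts = counts), colData = annotation)
single <- SingleCellExperiment(assays = list(counts = counts), colData = annotation)

# Hypothetical helper: written once with a tidy verb, applied to both containers
keep_treated <- function(x) filter(x, condition == "treated")

bulk_treated   <- keep_treated(bulk)    # still a SummarizedExperiment
single_treated <- keep_treated(single)  # still a SingleCellExperiment
```

Each call dispatches to the method registered for that class, so the helper does not need to know which container it receives.
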
# Core Packages

Before diving into the individual packages, you can simply load the **meta-package** and immediately gain access to all tidyomics functionality:

```{r}
#| eval: false
# BiocManager::install("tidyomics")  # tidyomics is distributed through Bioconductor
library(tidyomics) # loads tidySummarizedExperiment, tidySingleCellExperiment, plyranges, etc.
```

With a single call, you have a tidy interface ready for bulk, single-cell and genomic range data.

## Transcriptomics Packages

Each tidyomics package tackles a real-world analytical challenge. Bulk RNA-seq analyses, for example, are traditionally scattered across disjoint data frames, objects and helper lists. `tidySummarizedExperiment` re-imagines a `SummarizedExperiment` as a tibble-first citizen: you can `filter()`, `mutate()` and `group_by()` genes or samples exactly as you do with any tidyverse data frame. For single-cell data the same philosophy inspired `tidySingleCellExperiment`, while for users of the Seurat workflow we created `tidyseurat`, a drop-in tidy wrapper that never compromises the original Seurat object.

### tidySummarizedExperiment
The tidy interface for `SummarizedExperiment` objects, enabling tidyverse operations on bulk RNA-seq data.

**GitHub**: <https://github.com/tidyomics/tidySummarizedExperiment>

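As a sketch of what "tibble-first" means in practice, the example below uses hypothetical toy counts. It assumes tidySummarizedExperiment's `.feature` column naming for genes and that assay values are exposed as columns named after the assay (`counts` here). `mutate()` adds a sample-level flag, and `group_by()` plus `summarise()` then produce an ordinary tibble of per-gene means.

```r
library(SummarizedExperiment)
library(dplyr)
library(tidySummarizedExperiment)

# Hypothetical toy bulk data: 2 genes x 4 samples
counts <- matrix(c(10, 100,  20, 200,  30, 300,  40, 400), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), paste0("s", 1:4)))
se <- SummarizedExperiment(
  assays  = list(counts = counts),
  colData = DataFrame(condition = c("control", "control", "treated", "treated"),
                      row.names = colnames(counts))
)

# Each gene-sample pair behaves like one row of a tibble
gene_summary <- se |>
  mutate(treated = condition == "treated") |>          # new sample-level column
  group_by(.feature, treated) |>                       # group by gene and flag
  summarise(mean_log_count = mean(log1p(counts)), .groups = "drop")

gene_summary
```
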
### tidySingleCellExperiment
Single-cell experiments often contain millions of cells and dozens of matrices. `tidySingleCellExperiment` flattens this complexity so you can focus on the biology instead of the bookkeeping.

**GitHub**: <https://github.com/tidyomics/tidySingleCellExperiment>

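The following minimal sketch builds a hypothetical five-cell `SingleCellExperiment` (toy counts and `cluster` labels). It assumes SingleCellExperiment, dplyr and tidySingleCellExperiment are installed. Cell-level metadata is then handled like tibble columns: a per-cell total is added with `mutate()`, low-count cells are dropped with `filter()`, and clusters are counted after converting to a tibble.

```r
library(SingleCellExperiment)
library(dplyr)
library(tidySingleCellExperiment)

# Hypothetical toy data: 3 genes x 5 cells with a cluster label per cell
counts <- matrix(c(0, 2, 5,  1, 0, 0,  3, 3, 3,  0, 0, 9,  4, 1, 0), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:5)))
sce <- SingleCellExperiment(
  assays  = list(counts = counts),
  colData = DataFrame(cluster = c("A", "A", "B", "B", "B"),
                      row.names = colnames(counts))
)

# Cell-level metadata behaves like tibble columns (one row per cell)
filtered <- sce |>
  mutate(total_counts = colSums(assay(sce, "counts"))) |>
  filter(total_counts > 2)

# Still a SingleCellExperiment; cell2 (1 count in total) has been dropped
ncol(filtered)

# Per-cluster cell numbers as a plain tibble
filtered |> as_tibble() |> count(cluster)
```
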
### tidyseurat
For Seurat users, `tidyseurat` adds the missing tidyverse layer without forcing you to abandon familiar Seurat functions.

**GitHub**: <https://github.com/stemangiola/tidyseurat>

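A short sketch of the same idea on a Seurat object. It assumes SeuratObject, dplyr and tidyseurat are installed, and it uses the small `pbmc_small` example object shipped with SeuratObject. `pbmc_small` carries a `groups` metadata column with values `g1` and `g2`. Filtering on that column returns a smaller `Seurat` object, not a data frame.

```r
library(SeuratObject)
library(dplyr)
library(tidyseurat)

# pbmc_small: a small example Seurat object distributed with SeuratObject
data("pbmc_small", package = "SeuratObject")

# Filtering on cell metadata returns a Seurat object, not a data frame
g1_cells <- pbmc_small |> filter(groups == "g1")
class(g1_cells)

# Standard Seurat-object methods keep working on the result
ncol(g1_cells)
```
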
### tidySpatialExperiment
Spatial transcriptomics combines gene expression with tissue geography. `tidySpatialExperiment` brings the tidy philosophy to `SpatialExperiment` objects so you can transform, visualise and model spatial spots with the same verbs you already use for bulk and single-cell data.

**GitHub**: <https://github.com/william-hutchison/tidySpatialExperiment>

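As a minimal sketch, assuming SpatialExperiment, dplyr and tidySpatialExperiment are installed, the example below builds a hypothetical four-spot `SpatialExperiment` with toy x/y coordinates and an `in_tissue` flag. It then keeps only in-tissue spots with `filter()`. The spatial coordinates are subset together with the spots, and the result remains a `SpatialExperiment`.

```r
library(SpatialExperiment)
library(dplyr)
library(tidySpatialExperiment)

# Hypothetical toy data: 2 genes x 4 spots
counts <- matrix(c(5, 0,  3, 1,  0, 0,  8, 2), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), paste0("spot", 1:4)))
coords <- matrix(c(10, 20, 30, 40,   # x
                   10, 10, 50, 50),  # y
                 ncol = 2, dimnames = list(colnames(counts), c("x", "y")))
spe <- SpatialExperiment(
  assays        = list(counts = counts),
  colData       = DataFrame(in_tissue = c(TRUE, TRUE, FALSE, TRUE),
                            row.names = colnames(counts)),
  spatialCoords = coords
)

# Keep only spots inside the tissue; coordinates are subset alongside them
in_tissue_spots <- spe |> filter(in_tissue)
spatialCoords(in_tissue_spots)
```
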
## Genomics Packages

Genomic ranges represent locations along chromosomes; think of them as the geographical coordinates of the genome. With traditional Bioconductor tools, even simple tasks such as “take promoters and find overlaps with ATAC-seq peaks” require specialised syntax. The tidy answer is **`plyranges`**, a grammar that lets you manipulate `GRanges` with the fluency of dplyr verbs. And because biology is three-dimensional, the sister package **`plyinteractions`** brings the same elegance to chromatin-interaction data.

### plyranges
A tidy interface for genomic ranges data, providing a grammar of genomic data manipulation.

**GitHub**: <https://github.com/tidyomics/plyranges>

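The “take promoters and find overlaps with ATAC-seq peaks” task can be sketched with plyranges as follows. The promoter and peak coordinates are hypothetical toy values on `chr1`, and the example assumes dplyr and plyranges are installed. `as_granges()` turns each data frame into a `GRanges`, `join_overlap_inner()` pairs each promoter with the peaks it overlaps, and a grouped `summarise()` reports peaks per gene.

```r
library(dplyr)
library(plyranges)

# Hypothetical toy ranges on chr1: two promoters and three ATAC-seq peaks
promoter_ranges <- data.frame(seqnames = "chr1", start = c(1000, 5000), width = 500,
                              gene_id = c("geneA", "geneB")) |>
  as_granges()
atac_peaks <- data.frame(seqnames = "chr1", start = c(1200, 1400, 9000), width = 100,
                         score = c(8, 4, 6)) |>
  as_granges()

# Overlap promoters with peaks, then summarise the overlapping peaks per gene
promoter_peaks <- promoter_ranges |>
  join_overlap_inner(atac_peaks) |>
  group_by(gene_id) |>
  summarise(n_peaks = length(score), mean_score = mean(score))

promoter_peaks
```

Only `geneA` appears in the result, because its promoter overlaps the first two peaks while `geneB` overlaps none. An inner join drops promoters without overlaps; a left join would keep them.
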
### plyinteractions
A tidy interface for genomic interaction data, enabling analysis of chromatin interactions.

**GitHub**: <https://github.com/tidyomics/plyinteractions>

## Analysis Packages

The core adapters above focus on **data representation**; the package below provides a high-level analysis grammar that builds on those tidy foundations.

### tidybulk
A tidy framework for modular transcriptomic data analysis, covering bulk RNA-seq steps such as filtering lowly abundant genes, scaling abundance and testing for differential abundance.

**GitHub**: <https://github.com/stemangiola/tidybulk>

# Publications

*Hutchison W.J.*, *Keyes T.J.*, *et al.* (2024). **“The tidyomics ecosystem: enhancing omic data analyses.”** *Nature Methods* 21, 1166–1170. DOI [10.1038/s41592-024-02299-2](https://doi.org/10.1038/s41592-024-02299-2)

This community paper introduces tidyomics and demonstrates its scalability on 7.5 million PBMCs from the Human Cell Atlas.

## Transcriptomics

1. *Mangiola S.*, Molania R., Dong R., Doyle M.A. & Papenfuss A.T. (2021). **“tidybulk: a tidy framework for modular transcriptomic data analysis.”** *Genome Biology* 22, 42. DOI [10.1186/s13059-020-02254-4](https://doi.org/10.1186/s13059-020-02254-4)
2. *Mangiola S.*, Doyle M.A. & Papenfuss A.T. (2021). **“Interfacing Seurat with the R tidy universe.”** *Bioinformatics* 37(22), 4100–4103. DOI [10.1093/bioinformatics/btab404](https://doi.org/10.1093/bioinformatics/btab404)

## Genomics

3. *Lee S.*, Cook D. & Lawrence M. (2019). **“plyranges: a grammar of genomic data transformation.”** *Genome Biology* 20, 4. DOI [10.1186/s13059-018-1597-8](https://doi.org/10.1186/s13059-018-1597-8)

# Community

Tidyomics is more than code: it is a **lively community of developers, users and code curators** who collaborate across academic labs, core facilities and industry groups on five continents. Developers extend the toolbox, users pressure-test new ideas on real datasets, and curators keep documentation and tutorials clear and current. Whether you write R every day or are about to analyse your first sequencing experiment, you’ll find mentors ready to help, and eager to learn from your perspective.

## Getting Involved

### Contributing
The tidyomics ecosystem welcomes contributions from the community. You can contribute by:

1. **Reporting Issues**: open or search issues in the relevant repository at <https://github.com/tidyomics>
2. **Submitting Pull Requests**: contribute code improvements or new features; open challenges are tracked at <https://github.com/orgs/tidyomics/projects/1>
3. **Improving Documentation**: help make the ecosystem more accessible
4. **Creating Tutorials**: share your knowledge with the community!

### Communication Channels

- **GitHub** – start or join a discussion in the relevant tidyomics repository, or browse the open challenges board: <https://github.com/orgs/tidyomics/projects/1>
- **Bioconductor Support Forum** – tag your post with *tidyomics*: <https://support.bioconductor.org>
- **Zulip Chat** – drop by the `#tidiness_in_bioc` channel for real-time discussion: <https://community-bioc.zulipchat.com/#narrow/channel/507542-tidiness_in_bioc>

# Example Workflows

## Transcriptomics Example
```{r}
#| eval: false
library(tidyverse)
library(tidybulk)
library(tidySummarizedExperiment)

# Example workflow on the airway dataset (requires the airway package)
data(airway, package = "airway")

airway %>%
  keep_abundant(factor_of_interest = dex) %>%   # drop lowly abundant genes
  scale_abundance() %>%                         # library-size scaling
  test_differential_abundance(~ dex) %>%        # treated vs untreated
  pivot_transcript() %>%                        # one row per gene, with test results
  filter(FDR < 0.05) %>%
  arrange(desc(abs(logFC)))
```

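## Genomics Example
```{r}
#| eval: false
library(plyranges)
library(tidyverse)

# Toy peaks and promoters (hypothetical values); replace with your own ranges
peaks <- data.frame(seqnames = "chr1", start = c(100, 500, 900), width = 50,
                    score = c(5, 20, 15)) %>%
  as_granges()
promoter_ranges <- data.frame(seqnames = "chr1", start = c(450, 880), width = 100,
                              gene_id = c("geneA", "geneB")) %>%
  as_granges()

peaks %>%
  filter(score > 10) %>%
  join_overlap_inner(promoter_ranges) %>%
  group_by(gene_id) %>%
  summarize(mean_score = mean(score))
```
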
## Single-Cell Example
```{r}
#| eval: false
library(tidySingleCellExperiment)
library(tidyverse)

# `sce` is assumed to be your own SingleCellExperiment, with a cell-cycle `Phase`
# column, a per-cell `score` column and a UMAP embedding exposed as UMAP_1/UMAP_2
sce %>%
  filter(Phase == "G1") %>%
  ggplot(aes(UMAP_1, UMAP_2, color = score)) +
  geom_point()
```

# Future Directions

## Planned Developments

1. **Enhanced Single-Cell Support**: Expanded analysis capabilities for single-cell data
2. **Multi-Omics Integration**: Support for multi-omics data analysis
3. **Cloud Computing**: Integration with cloud-based analysis platforms
4. **Educational Expansion**: More comprehensive educational materials

## Community Goals

1. **Increased Adoption**: Broader adoption in the bioinformatics community
2. **Educational Integration**: Integration into more university curricula
3. **Industry Applications**: Adoption in pharmaceutical and biotech industries
4. **International Collaboration**: Expansion of the global community

# To Conclude

The tidyomics ecosystem represents a significant advancement in omics data analysis, providing a consistent, intuitive and powerful framework for biological data analysis across multiple domains, including transcriptomics and genomics. By bringing the principles of tidy data to omics, the ecosystem makes complex biological analyses more accessible, reproducible and enjoyable.

Whether you're a seasoned bioinformatician working with transcriptomics or genomics data, or just starting your journey in omics analysis, the tidyomics ecosystem provides the tools and resources you need to analyze your data effectively and efficiently.

The ecosystem continues to grow, with new packages and capabilities being developed through the [tidyomics open challenges](https://github.com/tidyomics/), ensuring that the community drives the development of tools that meet real-world needs.

Join the community, contribute to the ecosystem, and help shape the future of tidy omics!

---

*For more information, visit the [tidyomics GitHub organization](https://github.com/tidyomics) or follow us on [Zulip](https://community-bioc.zulipchat.com/#narrow/channel/507542-tidiness_in_bioc).*

0 commit comments

Comments
 (0)