Skip to content

How to extract row/column order after clustering in a heatmap? #91

@CaoWZ254

Description

@CaoWZ254

The ggalign package has been very beneficial for my actual work, but there's a confusion that I haven't found a good solution for.

In bioinformatics analysis, there is a type of heatmap. Its main body uses relative abundance for representation, while the side annotations display the absolute abundance of each gene to express richer gene abundance information. My question is: After clustering using align_dendro(), the column order of the main heatmap changes accordingly. At this point, how can one conveniently extract the changed column order and apply it to the absolute abundance plot?

First, the ComplexHeatmap package provides simple parameters to achieve this purpose, but I am not familiar with base R graphics grammar:

library(ComplexHeatmap)

set.seed(100)
expr_matrix <- matrix(sample(c(0,1), 400, replace = TRUE),
                     nrow = 20, ncol = 20)

rownames(expr_matrix) <- paste0("Gene", 1:20)
colnames(expr_matrix) <- paste0("Sample", 1:20)

# Calculate z-score
zscore_matrix <- t(scale(t(expr_matrix)))

# Calculate total expression per row
expr_counts <- rowSums(expr_matrix)

# Create heatmap
zscore_matrix |>
 Heatmap(
   name = " ",
   rect_gp = gpar(col = "white", lwd = 2),
   row_names_gp = gpar(fontsize = 8),
   column_names_rot = 45,
   column_names_gp = gpar(fontsize = 8),
   cluster_rows = TRUE,
   cluster_columns = TRUE,
   show_row_dend = TRUE,
   show_column_dend = TRUE,
   right_annotation = rowAnnotation(
     ExprCount = anno_barplot(
       expr_counts,
       bar_width = 0.8,
       gp = gpar(fill = "steelblue"),
       axis_param = list(
         gp = gpar(fontsize = 6)
       )
     )
   )
 ) -> ht

Secondly, I wrote my own set of code to apply to the ggalign workflow:

library(tidyverse)
library(ggalign)

get_gene_order <- function(
    zscore_matrix,
    method = "complete",
    distance = "euclidean"
) {
  dist_mat <- dist(zscore_matrix, method = distance)
  hc_row <- hclust(dist_mat, method = method)
  hc_row$labels[hc_row$order]
}

create_row_barplot <- function(
    expr_matrix,
    gene_order,
    fill_color = "steelblue"
) {
  Gene <- rownames(expr_matrix)
  Count <- rowSums(expr_matrix)

  tibble::tibble(Gene, Count) |>
    mutate(Gene = forcats::fct_relevel(Gene, gene_order)) |>
    ggplot(aes(x = Count, y = Gene)) +
    geom_bar(stat = "identity", fill = fill_color, color = "black") +
    theme_minimal() +
    scale_x_continuous(labels = scales::label_number(accuracy = 1)) +
    theme(
      axis.ticks.y = element_blank(),
      axis.text.y = element_blank(),
      axis.text.x = element_text(angle = 90, hjust = 1),
      axis.title = element_blank(),
      panel.grid = element_blank(),
      panel.border = element_rect(color = "black")
    )
}


ggene_heatmap <- function(
    expr_matrix,
    bar_fill = "steelblue",
    cluster_palette_row = "Set1",
    cluster_palette_col = "Dark2",
    dendro_size = 0.2,
    tile_color = "black",
    tile_width = 0.9,
    tile_height = 0.9,
    axis_text_x_angle = 45,
    cluster_method = "complete",
    cluster_distance = "euclidean"
) {
  # Step 1: Calculate z-score
  zscore_matrix <- t(scale(t(expr_matrix)))

  # Step 2: Get row clustering order
  gene_order <- get_gene_order(
    zscore_matrix,
    method = cluster_method,
    distance = cluster_distance
  )

  # Step 3: Draw right-side barplot
  p_bar <- create_row_barplot(expr_matrix, gene_order, fill_color = bar_fill)

  # Step 4: Assemble ggalign heatmap
  zscore_matrix |>
    ggheatmap(filling = NULL) +
    geom_tile(
      aes(fill = value),
      color = tile_color,
      width = tile_width,
      height = tile_height
    ) +
    scale_fill_viridis_c(name = "") +
    theme_minimal() +
    theme(
      axis.text.x = element_text(angle = axis_text_x_angle, hjust = 1),
      axis.title = element_blank()
    ) +
    # Top dendro
    anno_top(size = dendro_size) +
    align_dendro() +
    scale_color_brewer(palette = cluster_palette_col, guide = "none") +
    theme(
      axis.title = element_blank(),
      axis.ticks = element_blank(),
      axis.text = element_blank()
    ) +
    # Left dendro
    anno_left(size = dendro_size) +
    align_dendro() +
    scale_color_brewer(palette = cluster_palette_row, guide = "none") +
    theme(
      axis.title = element_blank(),
      axis.ticks = element_blank(),
      axis.text = element_blank()
    ) +
    # Right bar chart
    quad_switch(position = "right", size = dendro_size) +
    p_bar
}

ggene_heatmap(expr_matrix)

Although the above code can run, I believe this is not the best solution for the ggalign workflow. Therefore, I sincerely ask if you have a method to achieve either of the following goals:

  1. Extract the row/column order of the heatmap after clustering.
  2. Precisely align the row/column order of two plots when their axis labels are completely identical.

I look forward to your reply!

Metadata

Metadata

Assignees

Labels

documentationImprovements or additions to documentationgood first issueGood for newcomers

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions