
[QST] Why does the draw_graph result look so weird? #527

@Ci-TJ


Thank you very much for developing rapids-singlecell (rsc). While using rsc, I noticed that the layout produced by its ForceAtlas2 (FA2) implementation (rsc.tl.draw_graph) differs significantly from the layouts produced by Scanpy and by cuGraph. I am not very familiar with the underlying principles of FA2, so I would like to understand what causes these differences. This is how I ran rsc:

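import time

import rapids_singlecell as rsc
import scanpy as sc

# Move the AnnData object to the GPU and time the transfer
start = time.perf_counter()
rsc.get.anndata_to_GPU(adata)
elapsed_min = (time.perf_counter() - start) / 60
print(f"{elapsed_min:.3f} min")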

start = time.perf_counter()

SCVI_LATENT_KEY = "X_scVI"

# use scVI latent space for UMAP generation
#rsc.pp.neighbors(adata, use_rep=SCVI_LATENT_KEY)
#rsc.tl.umap(adata, min_dist=0.5)

rsc.pp.neighbors(adata, n_neighbors=30, use_rep=SCVI_LATENT_KEY, key_added="scVI")
rsc.tl.umap(adata, neighbors_key="scVI", key_added="X_umap_scVI")
rsc.tl.louvain(adata, resolution=0.6, neighbors_key="scVI", key_added="louvain_scVI")
rsc.tl.leiden(adata, resolution=0.6, neighbors_key="scVI", key_added="leiden_scVI")

elapsed_min = (time.perf_counter() - start) / 60
print(f"{elapsed_min:.3f} min")

start = time.perf_counter()
#rsc.tl.draw_graph(adata)
rsc.tl.draw_graph(adata, max_iter=500)
sc.pl.draw_graph(
    adata,
    color="Tissue",
    legend_loc="best",
    legend_fontsize="xx-small"
)
elapsed_min = (time.perf_counter() - start) / 60
print(f"{elapsed_min:.3f} min")


[Image: layout plotted after rsc.tl.draw_graph]
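
The timing pattern repeated in the snippets above could be wrapped in a small context manager. This is only a sketch: `timed` and the `timings` dict are hypothetical names, not part of rsc or Scanpy, and the toy workload stands in for the real calls.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print elapsed wall-clock minutes for the enclosed block (hypothetical helper)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_min = (time.perf_counter() - start) / 60
        if results is not None:
            # Keep the measurement so steps can be compared later
            results[label] = elapsed_min
        print(f"{label}: {elapsed_min:.3f} min")


timings = {}
with timed("toy workload", timings):
    # Stand-in for a call such as rsc.tl.draw_graph(adata, max_iter=500)
    total = sum(i * i for i in range(100_000))
```

GPU work can be launched asynchronously, so a wall-clock timer on the host may need an explicit device synchronization before the clock is read. Whether that matters for the calls above is not something this sketch verifies.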

For comparison, I used cuGraph to implement a simplified version of the FA2 pipeline, calling cugraph.force_atlas2 directly on the neighbor graph:

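import cudf
import cugraph
import cupy as cp
import numpy as np

start = time.perf_counter()
###################################
# Step 1: Extract the neighbor graph
# Note: this reads the default graph; the graph built above with
# key_added="scVI" is stored under adata.obsp["scVI_connectivities"]
A = adata.obsp["connectivities"].tocoo()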

# Step 2: Convert to a cuDF DataFrame
df = cudf.DataFrame({
    "src": cp.asarray(A.row),
    "dst": cp.asarray(A.col),
    "weight": cp.asarray(A.data)
})

# Step 3: Build a cuGraph Graph object
G = cugraph.Graph()
G.from_cudf_edgelist(df, source="src", destination="dst", edge_attr="weight")

# Step 4: Run cuGraph ForceAtlas2
pos = cugraph.force_atlas2(G, max_iter=500)

###################################
# Step 5: Handle isolated nodes (cuGraph does not return nodes with degree = 0)
# Create a full coordinate matrix (n_cells × 2)
coords = np.full((adata.n_obs, 2), np.nan, dtype=np.float32)

# Sort by vertex index
pos_sorted = pos.sort_values("vertex")

# Fill coordinates for nodes returned by cuGraph
coords[pos_sorted["vertex"].to_pandas().values] = (
    pos_sorted[["x", "y"]].to_pandas().values
)

# Store coordinates in AnnData
adata.obsm["X_draw_graph_fa"] = coords

###################################
# Step 6: Make Scanpy recognize this layout
adata.uns["draw_graph"] = {
    "params": {"layout": "fa"}
}

###################################
# Step 7: Plot the graph layout
sc.pl.draw_graph(
    adata,
    color="Tissue",
    legend_loc="best",
    legend_fontsize="xx-small"
)

elapsed_min = (time.perf_counter() - start) / 60
print(f"{elapsed_min:.3f} min")

[Image: layout plotted after cugraph.force_atlas2]
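
Step 5 above can be checked on the CPU with a toy example. The sketch below uses only NumPy; `fill_layout` is a hypothetical helper and the vertex ids and coordinates are made up. It shows that indexing by vertex id places each row correctly even when the ids arrive unsorted, and that cells without a returned position stay NaN and can be listed.

```python
import numpy as np


def fill_layout(n_obs, vertex_ids, xy):
    """Scatter per-vertex coordinates into an (n_obs, 2) array; missing vertices stay NaN."""
    coords = np.full((n_obs, 2), np.nan, dtype=np.float32)
    coords[np.asarray(vertex_ids)] = np.asarray(xy, dtype=np.float32)
    return coords


# Toy input: 5 cells, positions returned only for vertices 3, 0 and 1 (unsorted on purpose)
vertex_ids = np.array([3, 0, 1])
xy = np.array([[3.0, 30.0], [0.0, 0.5], [1.0, 10.0]])

coords = fill_layout(5, vertex_ids, xy)

# Indices of cells that received no position
missing = np.flatnonzero(np.isnan(coords).any(axis=1))
print(missing)
```

Counting the NaN rows this way on the real `adata.obsm["X_draw_graph_fa"]` would show how many cells each layout actually positions, which is one thing to rule out when two plots look different.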

Best,
Seager

Labels: question
