Closed
Labels
question (Further information is requested)
Description
Thank you very much for developing rapids-singlecell (rsc). While using rsc, I noticed that the layout produced by its ForceAtlas2 (FA2) implementation differs significantly from the layouts produced by Scanpy and cuGraph. Since I am not very familiar with the underlying principles of FA2, I would like to understand the reasons behind these differences. The code I ran is shown here, first with rsc and then with cuGraph.
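import time

import rapids_singlecell as rsc
import scanpy as sc

# Move the AnnData matrices to the GPU and time the transfer
start = time.perf_counter()
rsc.get.anndata_to_GPU(adata)
elapsed_min = (time.perf_counter() - start) / 60
print(f"{elapsed_min:.3f} min")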
start = time.perf_counter()
SCVI_LATENT_KEY = "X_scVI"
# Use the scVI latent space for the neighbor graph, UMAP, and clustering
# rsc.pp.neighbors(adata, use_rep=SCVI_LATENT_KEY)
# rsc.tl.umap(adata, min_dist=0.5)
rsc.pp.neighbors(adata, n_neighbors=30, use_rep=SCVI_LATENT_KEY, key_added="scVI")
rsc.tl.umap(adata, neighbors_key="scVI", key_added="X_umap_scVI")
rsc.tl.louvain(adata, resolution=0.6, neighbors_key="scVI", key_added="louvain_scVI")
rsc.tl.leiden(adata, resolution=0.6, neighbors_key="scVI", key_added="leiden_scVI")
elapsed_min = (time.perf_counter() - start) / 60
print(f"{elapsed_min:.3f} min")
start = time.perf_counter()
# rsc.tl.draw_graph(adata)
# Run the rsc ForceAtlas2 layout and plot it with Scanpy
rsc.tl.draw_graph(adata, max_iter=500)
sc.pl.draw_graph(
    adata,
    color="Tissue",
    legend_loc="best",
    legend_fontsize="xx-small",
)
elapsed_min = (time.perf_counter() - start) / 60
print(f"{elapsed_min:.3f} min")
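The calls above store the scVI neighbor graph with key_added="scVI", while rsc.tl.draw_graph(adata, max_iter=500) is called without naming a graph, so it may be worth confirming which graph each layout actually uses. The following is a minimal sketch, assuming the Scanpy convention that adata.uns[<neighbors key>] records a "connectivities_key" entry pointing into adata.obsp (and assuming rsc follows the same convention). The helper name resolve_connectivities_key is hypothetical and not part of either library.

```python
from typing import Any, Mapping, Optional


def resolve_connectivities_key(
    uns: Mapping[str, Any], neighbors_key: Optional[str] = None
) -> str:
    """Return the obsp key that holds the graph for a given neighbors key.

    Hypothetical helper: it assumes the Scanpy layout of `adata.uns`, where
    the default key is "neighbors" and a custom key such as "scVI" maps to
    "scVI_connectivities".
    """
    key = "neighbors" if neighbors_key is None else neighbors_key
    if key not in uns:
        raise KeyError(f"No neighbors entry {key!r} found in uns")
    # Fall back to the naming convention if the entry does not record the key
    fallback = "connectivities" if key == "neighbors" else f"{key}_connectivities"
    return uns[key].get("connectivities_key", fallback)


# Example with plain dicts standing in for adata.uns
example_uns = {
    "neighbors": {"connectivities_key": "connectivities"},
    "scVI": {"connectivities_key": "scVI_connectivities"},
}
default_graph = resolve_connectivities_key(example_uns)
scvi_graph = resolve_connectivities_key(example_uns, "scVI")
```

With a real AnnData object, resolve_connectivities_key(adata.uns, "scVI") would give the obsp entry to pass to any layout that should run on the scVI graph.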
For comparison, I ran cuGraph's ForceAtlas2 directly on the neighbor graph, as a simplified version of the FA2 workflow.
import cudf
import cugraph
import cupy as cp
import numpy as np

start = time.perf_counter()
###################################
# Step 1: Extract the neighbor graph (the default "connectivities" entry in obsp)
A = adata.obsp["connectivities"].tocoo()
# Step 2: Convert to a cuDF DataFrame
df = cudf.DataFrame({
    "src": cp.asarray(A.row),
    "dst": cp.asarray(A.col),
    "weight": cp.asarray(A.data),
})
# Step 3: Build a cuGraph Graph object
G = cugraph.Graph()
G.from_cudf_edgelist(df, source="src", destination="dst", edge_attr="weight")
# Step 4: Run cuGraph ForceAtlas2
pos = cugraph.force_atlas2(G, max_iter=500)
###################################
# Step 5: Handle isolated nodes (cuGraph does not return nodes with degree = 0)
# Create a full coordinate matrix (n_cells × 2)
coords = np.full((adata.n_obs, 2), np.nan, dtype=np.float32)
# Sort by vertex index
pos_sorted = pos.sort_values("vertex")
# Fill coordinates for nodes returned by cuGraph
coords[pos_sorted["vertex"].to_pandas().values] = (
    pos_sorted[["x", "y"]].to_pandas().values
)
# Store coordinates in AnnData
adata.obsm["X_draw_graph_fa"] = coords
###################################
# Step 6: Make Scanpy recognize this layout
adata.uns["draw_graph"] = {
    "params": {"layout": "fa"}
}
###################################
# Step 7: Plot the graph layout
sc.pl.draw_graph(
    adata,
    color="Tissue",
    legend_loc="best",
    legend_fontsize="xx-small",
)
elapsed_min = (time.perf_counter() - start) / 60
print(f"{elapsed_min:.3f} min")
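When comparing the two layouts, it may help to separate cosmetic differences from structural ones. A force-directed layout is only defined up to rotation, reflection, translation, and overall scale, so two runs can look quite different while placing cells in essentially the same relative arrangement. The sketch below is one possible way to quantify this with a Procrustes-style disparity using NumPy only. The function name procrustes_disparity is hypothetical, and the approach is an illustration rather than anything rsc, Scanpy, or cuGraph provides. Rows containing NaN (such as the isolated nodes filled in Step 5) are ignored.

```python
import numpy as np


def procrustes_disparity(a, b) -> float:
    """Disparity between two 2-D layouts after optimal alignment.

    Both layouts are centered and scaled to unit Frobenius norm, then aligned
    by the best orthogonal transform (rotation or reflection) and scale.
    Returns a value in [0, 1]; 0 means identical up to those transforms.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # Keep only cells that have finite coordinates in both layouts
    keep = np.isfinite(a).all(axis=1) & np.isfinite(b).all(axis=1)
    a, b = a[keep], b[keep]
    # Remove translation and overall scale
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    # The singular values of a.T @ b determine the best orthogonal alignment
    singular_values = np.linalg.svd(a.T @ b, compute_uv=False)
    return float(1.0 - singular_values.sum() ** 2)


# Synthetic check: a rotated, scaled, and shifted copy of the same layout
rng = np.random.default_rng(0)
layout = rng.normal(size=(50, 2))
theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
transformed = 3.0 * layout @ rotation + 5.0
same_shape = procrustes_disparity(layout, transformed)
unrelated = procrustes_disparity(layout, rng.normal(size=(50, 2)))
```

In practice, one would copy adata.obsm["X_draw_graph_fa"] after each method (it is overwritten by the second run) and pass the two arrays to the function. A value near 0 would suggest the layouts differ mainly in orientation and scale, while a large value would point to a genuinely different arrangement.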
Best,
Seager