st.add_cell_labels error: ValueError: Length mismatch #128

XYZuo · 2021-09-24T13:13:54Z

Hi,
I'm using a loom file saved from a Seurat object. My adata looks like this:

AnnData object with n_obs × n_vars = 56109 × 22965
obs: 'ClusterID', 'ClusterName', 'DF_classification', 'RNA_snn_res_1_5', 'cell_types', 'gender', 'group', 'nCount_RNA', 'nFeature_RNA', 'orig_ident', 'percent_hsp', 'percent_mt', 'percent_rb', 'seurat_clusters', 'label_color'
var: 'Selected', 'vst_mean', 'vst_variable', 'vst_variance', 'vst_variance_expected', 'vst_variance_standardized'
uns: 'label_color', 'workdir'
obsm: 'harmony_cell_embeddings', 'pca_cell_embeddings', 'umap_cell_embeddings'
varm: 'harmony_feature_loadings_projected', 'pca_feature_loadings'
layers: 'norm_data', 'scale_data'

I extracted my cell labels like this:
adata.obs['cell_types'].to_csv('labels.tsv',sep='\t',header=0)
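For comparison, note that the call above keeps the default index=True, so the cell IDs are written as a first column next to the labels. Below is a minimal sketch of writing one label per line with no header and no index column. It assumes st.add_cell_labels expects that layout, in the same order as adata.obs_names (the layout of the label files in the STREAM tutorial, as far as I know; not confirmed in this thread). The helper name write_labels is hypothetical.

```python
import os
import tempfile

import pandas as pd


def write_labels(labels: pd.Series, path: str) -> int:
    """Write one label per line (no header, no index) and return the line count.

    Assumption: the reader expects a single column of labels listed in the
    same order as the cells (i.e. adata.obs_names).
    """
    # index=False drops the cell-ID column; header=False drops the name row
    labels.to_csv(path, sep="\t", header=False, index=False)
    with open(path) as fh:
        return sum(1 for _ in fh)


if __name__ == "__main__":
    # Tiny stand-in for adata.obs['cell_types']
    cell_types = pd.Series(
        ["EryP", "preB2", "GMP"],
        index=pd.Index(["c1", "c2", "c3"], name="CellID"),
        name="cell_types",
    )
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "labels.tsv")
        # The line count should equal the number of cells (adata.n_obs)
        print(write_labels(cell_types, out) == len(cell_types))
```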

But when I try to add them to my object with:
st.add_cell_labels(adata, file_name = 'labels.tsv')

I got an error:
ValueError: Length mismatch: Expected axis has 56110 elements, new values have 56109 elements

I checked my adata, and there seem to be 56109 cells with no problem:

adata.obs.index
Index(['HC_1_AAACCCAAGACAGTCG-1', 'HC_1_AAACCCAAGAGCCTGA-1',
'HC_1_AAACCCAAGGTCGCCT-1', 'HC_1_AAACCCACAGGTATGG-1',
'HC_1_AAACCCAGTCAATGGG-1', 'HC_1_AAACCCAGTGTTACAC-1',
'HC_1_AAACCCATCGTTTACT-1', 'HC_1_AAACCCATCTAACGGT-1',
'HC_1_AAACCCATCTGGTTGA-1', 'HC_1_AAACGAAAGAATTCAG-1',
...
'ITP_5_TTTGTTGAGACTTGTC-1', 'ITP_5_TTTGTTGAGGACAACC-1',
'ITP_5_TTTGTTGCAAACTAGA-1', 'ITP_5_TTTGTTGCACTTCATT-1',
'ITP_5_TTTGTTGGTAGCTTAC-1', 'ITP_5_TTTGTTGGTCATCGGC-1',
'ITP_5_TTTGTTGGTGCATCTA-1', 'ITP_5_TTTGTTGTCGCGCTGA-1',
'ITP_5_TTTGTTGTCTAAGGAA-1', 'ITP_5_TTTGTTGTCTGTAAGC-1'],
dtype='object', name='CellID', length=56109)

adata.obs['cell_types']
CellID
HC_1_AAACCCAAGACAGTCG-1 EryP
HC_1_AAACCCAAGAGCCTGA-1 preB2
HC_1_AAACCCAAGGTCGCCT-1 GMP
HC_1_AAACCCACAGGTATGG-1 MPP
HC_1_AAACCCAGTCAATGGG-1 preB1
...
ITP_5_TTTGTTGGTCATCGGC-1 MPP
ITP_5_TTTGTTGGTGCATCTA-1 GMP
ITP_5_TTTGTTGTCGCGCTGA-1 MPP
ITP_5_TTTGTTGTCTAAGGAA-1 MDP
ITP_5_TTTGTTGTCTGTAAGC-1 MPP
Name: cell_types, Length: 56109, dtype: object

Could you please help me? I can't figure it out.
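In pandas, this message comes from assigning a new index to a table: the existing axis has 56110 entries and the new values have 56109, so presumably the table parsed from labels.tsv ended up with one row more than there are cells. A minimal stdlib sketch for inspecting the file before loading it: it counts rows, columns per row and blank rows, and shows the first row so a header line stands out. The helper name inspect_label_file is hypothetical.

```python
import csv
from collections import Counter


def inspect_label_file(path: str, n_cells: int) -> dict:
    """Summarise a tab-separated label file and compare it to the cell count."""
    with open(path, newline="") as fh:
        rows = list(csv.reader(fh, delimiter="\t"))
    # How many fields each row has, e.g. {2: 56109} for "CellID<TAB>label"
    widths = Counter(len(row) for row in rows)
    # Rows that are empty or contain only whitespace
    blank = sum(1 for row in rows if not any(field.strip() for field in row))
    return {
        "n_rows": len(rows),
        "extra_rows": len(rows) - n_cells,
        "columns_per_row": dict(widths),
        "blank_rows": blank,
        "first_row": rows[0] if rows else None,
    }


# Usage (file name from this thread; adata assumed to be loaded):
# print(inspect_label_file("labels.tsv", n_cells=adata.n_obs))
```

If first_row holds column names, blank_rows is non-zero, or columns_per_row shows more than one column, the file carries more than a bare list of labels, which is worth ruling out first.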

huidongchen · 2021-09-27T20:52:09Z

Hi,

Thanks for the feedback. Unfortunately I was not able to reproduce the error when playing around with example data. I am happy to take a closer look if you can share with me the file 'labels.tsv'.

But in your case, you can actually skip the step st.add_cell_labels(). This is equivalent to
adata.obs['label'] = adata.obs['cell_types'].copy()

Let me know if this works for you.

XYZuo · 2021-09-28T12:32:34Z

Thank you for your help! I skipped st.add_cell_labels() and it works.
But the image I got was not consistent with the cell types I annotated.

I want the HSC group to be at the starting position. Could I set the root myself? I guess the 'init_nodes_pos' parameter of st.seed_elastic_principal_graph could do this, but I don't know how to set it.

huidongchen · 2021-09-28T20:30:50Z

Yes, you can. Once the tree structure is learnt, the pseudotime with respect to each node is computed, and all of it is stored in adata.obs.

So you can simply replace 'S4' with the root node you want. E.g., in your case, you can replace S4_pseudotime with S5_pseudotime to use the HSCs as the root. (I'm not 100% sure about the colors here, but it seems the HSCs all gather around the S5 node.)
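A minimal sketch of picking the pseudotime for a chosen root, assuming (as described above) one column per node named like 'S4_pseudotime' in adata.obs. A small pandas DataFrame with made-up values stands in for adata.obs, and pseudotime_for_root is a hypothetical helper, not part of STREAM.

```python
import pandas as pd


def pseudotime_for_root(obs: pd.DataFrame, root: str) -> pd.Series:
    """Return the pseudotime column computed with `root` as the starting node."""
    column = f"{root}_pseudotime"
    if column not in obs.columns:
        available = sorted(c for c in obs.columns if c.endswith("_pseudotime"))
        raise KeyError(f"{column!r} not found; available: {available}")
    return obs[column]


# Made-up stand-in for adata.obs (values are illustrative only)
obs = pd.DataFrame(
    {
        "label": ["HSC", "MPP", "GMP"],
        "S4_pseudotime": [0.9, 0.5, 0.0],
        "S5_pseudotime": [0.0, 0.4, 0.9],
    },
    index=["c1", "c2", "c3"],
)

# Order cells starting from the HSC end of the tree (root S5)
order = pseudotime_for_root(obs, "S5").sort_values().index
print(obs.loc[order, "label"].tolist())  # ['HSC', 'MPP', 'GMP']
```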

XYZuo · 2021-09-29T02:26:52Z

Thanks for your advice! It worked. But I find that I can't use my own annotation colors unless I follow the STREAM tutorial and add the colors with 'st.add_cell_colors'. My color annotations are stored in adata.obs.label_color, which match the cluster labels in adata.obs.label. How could I use my annotation colors when plotting the stream?

huidongchen · 2021-09-29T02:42:03Z

For now I guess it has to be done in a hacky way.

You can add your own colors with:
adata.uns['label_color'] = pd.Series(data=adata.obs['label_color'].tolist(),index=adata.obs['label'].tolist()).to_dict()

But this is something that will certainly be addressed in our stream v2.
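One caveat with the one-liner above: pd.Series(...).to_dict() on a Series with repeated index labels keeps only the last color seen for each label, so a label paired with two different colors in adata.obs would pass silently. A stdlib sketch of the same mapping that fails loudly instead; build_label_colors is a hypothetical name, and the label/color pairings in the example are made up.

```python
def build_label_colors(labels, colors):
    """Map each label to its color, refusing labels that carry several colors."""
    if len(labels) != len(colors):
        raise ValueError(f"{len(labels)} labels but {len(colors)} colors")
    mapping = {}
    for label, color in zip(labels, colors):
        # setdefault stores the first color seen and returns the stored one
        previous = mapping.setdefault(label, color)
        if previous != color:
            raise ValueError(f"label {label!r} has colors {previous!r} and {color!r}")
    return mapping


# Hypothetical usage with AnnData (not run here):
# adata.uns['label_color'] = build_label_colors(
#     adata.obs['label'].tolist(), adata.obs['label_color'].tolist())

# Made-up pairings for illustration
print(build_label_colors(["HSC", "MPP", "HSC"], ["#D9534F", "#99CCFF", "#D9534F"]))
```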

XYZuo · 2021-09-29T07:58:51Z

Thank you so much.
Unfortunately it gave an error message after I ran st.plot_stream_sc:
ValueError: 'c' argument must be a color, a sequence of colors, or a sequence of numbers, not ['#EDE574' '#99CCFF' '#99CCFF' ... '#D071A9' '#CBE86B' '#D9534F']
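matplotlib raises this message when it cannot convert the whole color array, so one plausible cause (an assumption, not confirmed in this thread) is a single entry that is not a valid color, such as a missing value or a stray space. A stdlib sketch that lists the offending positions; it only accepts '#RRGGBB' hex strings like the ones shown in the message, which is narrower than what matplotlib accepts. find_invalid_colors is a hypothetical name.

```python
import re

# Only the '#RRGGBB' form; matplotlib also accepts names, '#RGB', '#RRGGBBAA', ...
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def find_invalid_colors(colors):
    """Return (position, value) pairs that are not '#RRGGBB' hex strings."""
    return [
        (i, c)
        for i, c in enumerate(colors)
        if not (isinstance(c, str) and HEX_COLOR.fullmatch(c))
    ]


# Usage with AnnData (not run here):
# print(find_invalid_colors(adata.obs['label_color'].tolist()))
print(find_invalid_colors(["#EDE574", "#99CCFF", None, "#D071A9 ", "#CBE86B"]))
# [(2, None), (3, '#D071A9 ')]
```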

I also ran into an error similar to issue #115 after running st.plot_stream(adata,root='S5',color=['label'],save_fig=True, fig_format='pdf'):

Traceback (most recent call last):
  File "", line 1, in
  File "/home/cchu5/miniconda3/envs/zxy-stream/lib/python3.7/site-packages/stream/core.py", line 3131, in plot_stream
    log_scale=log_scale,factor_zoomin=factor_zoomin)
  File "/home/cchu5/miniconda3/envs/zxy-stream/lib/python3.7/site-packages/stream/extra.py", line 933, in cal_stream_polygon_string
    df_stream.loc[df_stream.index[id_cells],'edge'] = [x]
  File "/home/cchu5/miniconda3/envs/zxy-stream/lib/python3.7/site-packages/pandas/core/indexing.py", line 723, in __setitem__
    if not is_list_like_indexer(new_ix):
  File "/home/cchu5/miniconda3/envs/zxy-stream/lib/python3.7/site-packages/pandas/core/indexing.py", line 1730, in _setitem_with_indexer
    # a) avoid getting things via sections and (to minimize dtype changes)
  File "/home/cchu5/miniconda3/envs/zxy-stream/lib/python3.7/site-packages/pandas/core/indexing.py", line 1769, in _setitem_with_indexer_split_path
    key = tuple([key] + [slice(None)] * (len(labels.levels) - 1))
  File "/home/cchu5/miniconda3/envs/zxy-stream/lib/python3.7/site-packages/pandas/core/indexing.py", line 1830, in _setitem_with_indexer_2d_value
    ):
ValueError: Must have equal len keys and value when setting with an ndarray

I tried downgrading to pandas==1.0 and several other versions, but it didn't help.

Sorry to encroach upon your time. I am also looking forward to the release of stream v2.

huidongchen · 2021-09-30T14:11:35Z

I am sorry that you have to go through these tricky steps to use stream.

Unfortunately I am not sure how to address this issue as I have not run into it or been able to reproduce it myself.

If you can share with me a script and a dummy dataset to reproduce the error, I am more than happy to take a closer look.

XYZuo · 2021-10-04T04:17:27Z

Hi,
Thank you for your patience! Strangely, after I added label_color to the Seurat object and then converted it to a loom file, the color error disappeared. But the second error after running st.plot_stream still exists.
I put a test loom file here https://github.com/ZxyChopcat/STREAMtest/blob/master/STREAMtest.zip
And this is my script:

import stream as st
st.__version__
import pandas as pd
import numpy as np
import anndata as ad
import matplotlib
matplotlib.use('pdf')
import matplotlib.pyplot as plt
adata = ad.read_loom("/zxy/STREAM/itp.data1.2.STREAM.loom", sparse=True, cleanup=False, X_name='spliced', obs_names='CellID', var_names='Gene', dtype='float32')
st.set_workdir(adata,'/data/tmp_data/zxy/STREAM')
adata.var_names_make_unique()
# Reuse the PCA and UMAP embeddings computed in Seurat
adata.obsm['top_pcs'] = adata.obsm['pca_cell_embeddings']
adata.obsm['X_dr'] = adata.obsm['umap_cell_embeddings']
adata.obsm['X_vis_umap'] = adata.obsm['umap_cell_embeddings'][:,:2]
# Build the label -> color mapping from the per-cell label_color column
adata.uns['label_color'] = pd.Series(data=adata.obs['label_color'].tolist(),index=adata.obs['label'].tolist()).to_dict()
st.plot_visualization_2D(adata,method='umap',n_neighbors=50,color=['label'],use_precomputed=True,save_fig=True, fig_name='visualization_2D.pdf')
st.seed_elastic_principal_graph(adata,n_clusters=10,use_vis=True)
st.elastic_principal_graph(adata,epg_alpha=0.01,epg_mu=0.05,epg_lambda=0.05,save_fig=True, fig_name='ElPiGraph_analysis.pdf')
st.plot_dimension_reduction(adata,color=['label'],n_components=2,show_graph=True,show_text=False,save_fig=True, fig_name='dimension_reduction.pdf')
st.plot_branches(adata,show_text=True,save_fig=True, fig_name='branches.pdf')
st.plot_flat_tree(adata,color=['label','branch_id_alias','S5_pseudotime'],dist_scale=0.5,show_graph=True,show_text=True,save_fig=True,fig_name='flat_tree.pdf')
st.plot_stream_sc(adata,root='S5',color=['label','GATA1'],dist_scale=0.5,show_graph=True,show_text=False,save_fig=True, fig_format='pdf',fig_size=(14,9))
st.plot_stream(adata,root='S5',color=['label','GATA1'],save_fig=True, fig_format='pdf')

huidongchen · 2021-10-08T18:48:36Z

hmmm, that is very strange.

I just tested your script and I was able to run it without any errors.

I am attaching the notebook I was using here.
test_stream.html.zip

XYZuo · 2021-10-09T01:49:29Z

So the problem is probably in my environment.
I created the conda environment with 'conda create -n stream python=3.7 stream=1.0 jupyter', and here is my pip list:
Package Version

anndata 0.7.3
argcomplete 1.12.3
argon2-cffi 20.1.0
async-generator 1.10
attrs 21.2.0
backcall 0.2.0
bleach 4.0.0
Bottleneck 1.3.2
cached-property 1.5.2
certifi 2021.5.30
cffi 1.14.6
click 8.0.2
cycler 0.10.0
debugpy 1.4.1
decorator 5.1.0
defusedxml 0.7.1
entrypoints 0.3
fonttools 4.25.0
gunicorn 20.1.0
h5py 3.2.1
importlib-metadata 4.8.1
ipykernel 6.2.0
ipython 7.27.0
ipython-genutils 0.2.0
ipywidgets 7.6.4
jedi 0.18.0
Jinja2 3.0.1
joblib 1.0.1
jsonschema 3.2.0
jupyter 1.0.0
jupyter-client 7.0.1
jupyter-console 6.4.0
jupyter-core 4.7.1
jupyterlab-pygments 0.1.2
jupyterlab-widgets 1.0.0
kiwisolver 1.3.1
llvmlite 0.36.0
loompy 3.0.6
MarkupSafe 2.0.1
matplotlib 3.2.2
matplotlib-inline 0.1.2
mistune 0.8.4
mkl-fft 1.3.0
mkl-random 1.2.2
mkl-service 2.4.0
munkres 1.1.4
natsort 7.1.1
nbclient 0.5.3
nbconvert 6.1.0
nbformat 5.1.3
nest-asyncio 1.5.1
networkx 2.1
notebook 6.4.3
numba 0.53.1
numexpr 2.7.3
numpy 1.17.5
numpy-groupies 0.9.14
olefile 0.46
packaging 21.0
pandas 1.0.5
pandocfilters 1.4.3
parso 0.8.2
patsy 0.5.1
pexpect 4.8.0
pickleshare 0.7.5
Pillow 8.3.1
pip 21.2.2
plotly 5.1.0
prometheus-client 0.11.0
prompt-toolkit 3.0.20
ptyprocess 0.7.0
pycparser 2.20
Pygments 2.10.0
pynndescent 0.5.4
pyparsing 2.4.7
pyrsistent 0.17.3
python-dateutil 2.8.2
python-slugify 5.0.2
pytz 2021.1
pyzmq 22.2.1
qtconsole 5.1.1
QtPy 1.10.0
rpy2 2.9.4
scikit-learn 0.24.2
scipy 1.7.1
seaborn 0.11.2
Send2Trash 1.8.0
setuptools 58.0.4
Shapely 1.7.1
simplegeneric 0.8.1
six 1.15.0
statsmodels 0.12.2
stream 1.0
tenacity 8.0.1
terminado 0.9.4
testpath 0.5.0
text-unidecode 1.3
threadpoolctl 2.2.0
tornado 6.1
traitlets 5.1.0
typing-extensions 3.10.0.2
tzlocal 2.1
umap-learn 0.5.1
Unidecode 1.2.0
wcwidth 0.2.5
webencodings 0.5.1
wheel 0.37.0
widgetsnbextension 3.5.1
zipp 3.5.0

Can you find anything wrong?
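Comparing a long package list against another environment by eye is error-prone. A stdlib sketch that parses `pip list` output in the 'Package Version' layout above and reports packages whose versions differ or that exist in only one environment. The function names are hypothetical, and the second listing in the example is invented for illustration.

```python
def parse_pip_list(text):
    """Parse `pip list` output into {lowercased package name: version}."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row and the dashed separator row
        if len(parts) < 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        packages[parts[0].lower()] = parts[1]
    return packages


def diff_environments(mine, other):
    """Return {package: (my_version, other_version)} where they differ or one is missing."""
    names = sorted(set(mine) | set(other))
    return {
        name: (mine.get(name), other.get(name))
        for name in names
        if mine.get(name) != other.get(name)
    }


mine = parse_pip_list("""Package Version
anndata 0.7.3
pandas 1.0.5
stream 1.0
""")
# Invented second environment, for illustration only
other = parse_pip_list("""Package    Version
---------- -------
anndata    0.7.4
pandas     1.0.5
""")
print(diff_environments(mine, other))
# {'anndata': ('0.7.3', '0.7.4'), 'stream': ('1.0', None)}
```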
