
st.add_cell_labels error: ValueError: Length mismatch #128

Open
XYZuo opened this issue Sep 24, 2021 · 10 comments

Comments

@XYZuo

XYZuo commented Sep 24, 2021

Hi,
I'm using a loom file saved from a Seurat object. My adata looks like this:

AnnData object with n_obs × n_vars = 56109 × 22965
obs: 'ClusterID', 'ClusterName', 'DF_classification', 'RNA_snn_res_1_5', 'cell_types', 'gender', 'group', 'nCount_RNA', 'nFeature_RNA', 'orig_ident', 'percent_hsp', 'percent_mt', 'percent_rb', 'seurat_clusters', 'label_color'
var: 'Selected', 'vst_mean', 'vst_variable', 'vst_variance', 'vst_variance_expected', 'vst_variance_standardized'
uns: 'label_color', 'workdir'
obsm: 'harmony_cell_embeddings', 'pca_cell_embeddings', 'umap_cell_embeddings'
varm: 'harmony_feature_loadings_projected', 'pca_feature_loadings'
layers: 'norm_data', 'scale_data'

I exported my cell labels with:
adata.obs['cell_types'].to_csv('labels.tsv',sep='\t',header=0)

But when I try to add them to my object with:
st.add_cell_labels(adata, file_name = 'labels.tsv')

I get an error:
ValueError: Length mismatch: Expected axis has 56110 elements, new values have 56109 elements

I checked my adata, and there seem to be 56109 cells with no problem:

adata.obs.index
Index(['HC_1_AAACCCAAGACAGTCG-1', 'HC_1_AAACCCAAGAGCCTGA-1',
'HC_1_AAACCCAAGGTCGCCT-1', 'HC_1_AAACCCACAGGTATGG-1',
'HC_1_AAACCCAGTCAATGGG-1', 'HC_1_AAACCCAGTGTTACAC-1',
'HC_1_AAACCCATCGTTTACT-1', 'HC_1_AAACCCATCTAACGGT-1',
'HC_1_AAACCCATCTGGTTGA-1', 'HC_1_AAACGAAAGAATTCAG-1',
...
'ITP_5_TTTGTTGAGACTTGTC-1', 'ITP_5_TTTGTTGAGGACAACC-1',
'ITP_5_TTTGTTGCAAACTAGA-1', 'ITP_5_TTTGTTGCACTTCATT-1',
'ITP_5_TTTGTTGGTAGCTTAC-1', 'ITP_5_TTTGTTGGTCATCGGC-1',
'ITP_5_TTTGTTGGTGCATCTA-1', 'ITP_5_TTTGTTGTCGCGCTGA-1',
'ITP_5_TTTGTTGTCTAAGGAA-1', 'ITP_5_TTTGTTGTCTGTAAGC-1'],
dtype='object', name='CellID', length=56109)

adata.obs['cell_types']
CellID
HC_1_AAACCCAAGACAGTCG-1 EryP
HC_1_AAACCCAAGAGCCTGA-1 preB2
HC_1_AAACCCAAGGTCGCCT-1 GMP
HC_1_AAACCCACAGGTATGG-1 MPP
HC_1_AAACCCAGTCAATGGG-1 preB1
...
ITP_5_TTTGTTGGTCATCGGC-1 MPP
ITP_5_TTTGTTGGTGCATCTA-1 GMP
ITP_5_TTTGTTGTCGCGCTGA-1 MPP
ITP_5_TTTGTTGTCTAAGGAA-1 MDP
ITP_5_TTTGTTGTCTGTAAGC-1 MPP
Name: cell_types, Length: 56109, dtype: object

Could you please help me? I can't figure it out.
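A minimal sketch for checking the exported labels file before importing it. It assumes (this is not confirmed in the thread) that st.add_cell_labels expects one label per line, in the same order as adata.obs_names, with no header row and no index column. The helpers `write_labels` and `count_label_rows` are hypothetical names. Comparing the row count with adata.n_obs shows whether an extra line, for example a header row read back as data, could explain an off-by-one such as 56110 vs 56109.

```python
import pandas as pd


def write_labels(labels: pd.Series, path: str) -> None:
    """Write one label per line, with no header row and no index column.

    Assumption: this is the layout st.add_cell_labels expects; check the STREAM docs.
    """
    labels.to_csv(path, sep="\t", header=False, index=False)


def count_label_rows(path: str) -> int:
    """Count rows, treating every line (including any stray header) as data."""
    return len(pd.read_csv(path, sep="\t", header=None))


if __name__ == "__main__":
    import os
    import tempfile

    # Toy labels standing in for adata.obs['cell_types'].
    labels = pd.Series(["EryP", "preB2", "GMP"], index=["c1", "c2", "c3"], name="cell_types")
    labels.index.name = "CellID"
    with tempfile.TemporaryDirectory() as tmp:
        clean = os.path.join(tmp, "labels.tsv")
        write_labels(labels, clean)
        print(count_label_rows(clean))  # 3: one row per cell

        with_header = os.path.join(tmp, "labels_with_header.tsv")
        labels.to_csv(with_header, sep="\t", header=True, index=True)
        print(count_label_rows(with_header))  # 4: the header line is counted as a cell
```

With a real object, the check would be `write_labels(adata.obs['cell_types'], 'labels.tsv')` followed by comparing `count_label_rows('labels.tsv')` with `adata.n_obs`.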

@huidongchen
Collaborator

Hi,

Thanks for the feedback. Unfortunately I was not able to reproduce the error when playing around with example data. I am happy to take a closer look if you can share with me the file 'labels.tsv'.

But in your case, you can actually skip the step st.add_cell_labels(). This is equivalent to
adata.obs['label'] = adata.obs['cell_types'].copy()

Let me know if this works for you.

@XYZuo
Author

XYZuo commented Sep 28, 2021

Thank you for your help! I skipped st.add_cell_labels() and it works.
But the plot I got is not consistent with the cell types I annotated.
[image]
I want the HSC group to be at the starting position. Can I set the root myself? I guess the 'init_nodes_pos' parameter of st.seed_elastic_principal_graph could do this, but I don't know how to set it.

@huidongchen
Collaborator

Yes, you can. Once the tree structure is learnt, the pseudotime with respect to each node is computed, and this pseudotime info is stored in adata.obs.

So you can simply replace 'S4' with the root node you want. E.g., in your case, you can replace S4_pseudotime with S5_pseudotime to make the HSC cells the root. (I'm not 100% sure about the colors here, but it seems the HSCs all gather around node S5.)
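Because a `<node>_pseudotime` column is stored in adata.obs for every node, one rough way to choose a root is to pick the node whose pseudotime is lowest, on median, among the HSC cells. This is a hypothetical heuristic, not STREAM's own method; the function `pick_root_node` and the toy values are illustrative only, and the column names follow the S4_pseudotime/S5_pseudotime pattern mentioned above.

```python
import pandas as pd


def pick_root_node(obs: pd.DataFrame, label_col: str, start_label: str) -> str:
    """Return the node name (e.g. 'S5') whose pseudotime is smallest for the start population.

    Hypothetical heuristic: cells that sit near the true root should have small
    pseudotime when that node is used as the root.
    """
    pt_cols = [c for c in obs.columns if c.endswith("_pseudotime")]
    if not pt_cols:
        raise ValueError("no '<node>_pseudotime' columns found")
    start_cells = obs[obs[label_col] == start_label]
    if start_cells.empty:
        raise ValueError(f"no cells labelled {start_label!r}")
    medians = start_cells[pt_cols].median()  # one median per candidate root node
    best_col = medians.idxmin()
    return best_col[: -len("_pseudotime")]


if __name__ == "__main__":
    # Toy stand-in for adata.obs; real values come from STREAM after the tree is learnt.
    obs = pd.DataFrame(
        {
            "label": ["HSC", "HSC", "GMP", "EryP"],
            "S4_pseudotime": [3.0, 2.8, 1.0, 0.5],
            "S5_pseudotime": [0.1, 0.2, 2.0, 3.0],
        }
    )
    print(pick_root_node(obs, "label", "HSC"))  # S5
```

With a real object this would be `root = pick_root_node(adata.obs, 'label', 'HSC')`, and the result could then be passed as `root=` to st.plot_stream, as in the script later in this thread. It is worth confirming the choice visually on the flat tree plot.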

@XYZuo
Author

XYZuo commented Sep 29, 2021

Thanks for your advice! It worked. But I find that I can't use my own annotation colors unless I add them with 'st.add_cell_colors' as in the STREAM tutorial. My color annotations are stored in adata.obs.label_color and match the cluster labels in adata.obs.label. How can I use my annotation colors when plotting the stream?

@huidongchen
Collaborator

For now, I guess it has to be done in a hacky way.

You can add your own colors with:
adata.uns['label_color'] = pd.Series(data=adata.obs['label_color'].tolist(),index=adata.obs['label'].tolist()).to_dict()

But this is something that will certainly be addressed in our stream v2.
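The one-liner above keeps the last color seen for each label, so if cells sharing a label carry different colors, all but one are silently dropped. Below is a minimal sketch, using only pandas and the standard library, that builds the same label-to-color dictionary but first checks that each label has exactly one color and that every color is a six-digit hex string. The function name `build_label_color_map` is hypothetical, and restricting colors to `#RRGGBB` is an assumption based on the values shown in this thread (matplotlib itself accepts other color formats).

```python
import re

import pandas as pd

HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def build_label_color_map(
    obs: pd.DataFrame, label_col: str = "label", color_col: str = "label_color"
) -> dict:
    """Map each label to its color, failing loudly on conflicts or malformed colors."""
    # astype(str) also turns categorical or missing values into plain strings ('nan' fails the hex check).
    pairs = obs[[label_col, color_col]].astype(str).drop_duplicates()
    conflicts = pairs[pairs[label_col].duplicated(keep=False)]
    if not conflicts.empty:
        raise ValueError(f"labels with more than one color:\n{conflicts}")
    bad = [c for c in pairs[color_col] if not HEX_COLOR.fullmatch(c)]
    if bad:
        raise ValueError(f"colors not in #RRGGBB form: {bad}")
    return dict(zip(pairs[label_col], pairs[color_col]))


if __name__ == "__main__":
    obs = pd.DataFrame(
        {"label": ["HSC", "GMP", "HSC"], "label_color": ["#EDE574", "#99CCFF", "#EDE574"]}
    )
    print(build_label_color_map(obs))  # {'HSC': '#EDE574', 'GMP': '#99CCFF'}
```

With a real object this would be `adata.uns['label_color'] = build_label_color_map(adata.obs)`, so a label/color inconsistency surfaces before plotting rather than inside it.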

@XYZuo
Author

XYZuo commented Sep 29, 2021

Thank you so much.
Unfortunately, it gave an error message after I ran st.plot_stream_sc:
ValueError: 'c' argument must be a color, a sequence of colors, or a sequence of numbers, not ['#EDE574' '#99CCFF' '#99CCFF' ... '#D071A9' '#CBE86B' '#D9534F']

And after running st.plot_stream(adata,root='S5',color=['label'],save_fig=True, fig_format='pdf') I got an error similar to the one in issue #115:

Traceback (most recent call last):
File "", line 1, in
File "/home/cchu5/miniconda3/envs/zxy-stream/lib/python3.7/site-packages/stream/core.py", line 3131, in plot_stream
log_scale=log_scale,factor_zoomin=factor_zoomin)
File "/home/cchu5/miniconda3/envs/zxy-stream/lib/python3.7/site-packages/stream/extra.py", line 933, in cal_stream_polygon_string
df_stream.loc[df_stream.index[id_cells],'edge'] = [x]
File "/home/cchu5/miniconda3/envs/zxy-stream/lib/python3.7/site-packages/pandas/core/indexing.py", line 723, in __setitem__
if not is_list_like_indexer(new_ix):
File "/home/cchu5/miniconda3/envs/zxy-stream/lib/python3.7/site-packages/pandas/core/indexing.py", line 1730, in _setitem_with_indexer
# a) avoid getting things via sections and (to minimize dtype changes)
File "/home/cchu5/miniconda3/envs/zxy-stream/lib/python3.7/site-packages/pandas/core/indexing.py", line 1769, in _setitem_with_indexer_split_path
key = tuple([key] + [slice(None)] * (len(labels.levels) - 1))
File "/home/cchu5/miniconda3/envs/zxy-stream/lib/python3.7/site-packages/pandas/core/indexing.py", line 1830, in _setitem_with_indexer_2d_value
):
ValueError: Must have equal len keys and value when setting with an ndarray

I tried downgrading to pandas==1.0 and several other versions, but it didn't help.

Sorry for taking up your time. I am also looking forward to the release of stream v2.

@huidongchen
Collaborator

I am sorry that you have to go through these tricky steps to use stream.

Unfortunately I am not sure how to address this issue as I have not run into it or been able to reproduce it myself.

If you can share with me a script and a dummy dataset to reproduce the error, I am more than happy to take a closer look.

@XYZuo
Author

XYZuo commented Oct 4, 2021

Hi,
Thank you for your patience! Strangely, after I added label_color to the Seurat object and then converted it to a loom file, the color error disappeared. But the second error, from st.plot_stream, still occurs.
I put a test loom file here: https://github.com/ZxyChopcat/STREAMtest/blob/master/STREAMtest.zip
And this is my script:

import stream as st
st.__version__
import pandas as pd
import numpy as np
import anndata as ad
import matplotlib
matplotlib.use('pdf')
import matplotlib.pyplot as plt
adata = ad.read_loom("/zxy/STREAM/itp.data1.2.STREAM.loom", sparse=True, cleanup=False, X_name='spliced', obs_names='CellID', var_names='Gene', dtype='float32')
st.set_workdir(adata,'/data/tmp_data/zxy/STREAM')
adata.var_names_make_unique()
adata.obsm['top_pcs'] = adata.obsm['pca_cell_embeddings']
adata.obsm['X_dr'] = adata.obsm['umap_cell_embeddings']
adata.obsm['X_vis_umap'] = adata.obsm['umap_cell_embeddings'][:,:2]
adata.uns['label_color'] = pd.Series(data=adata.obs['label_color'].tolist(),index=adata.obs['label'].tolist()).to_dict()
st.plot_visualization_2D(adata,method='umap',n_neighbors=50,color=['label'],use_precomputed=True,save_fig=True, fig_name='visualization_2D.pdf')
st.seed_elastic_principal_graph(adata,n_clusters=10,use_vis=True)
st.elastic_principal_graph(adata,epg_alpha=0.01,epg_mu=0.05,epg_lambda=0.05,save_fig=True, fig_name='ElPiGraph_analysis.pdf')
st.plot_dimension_reduction(adata,color=['label'],n_components=2,show_graph=True,show_text=False,save_fig=True, fig_name='dimension_reduction.pdf')
st.plot_branches(adata,show_text=True,save_fig=True, fig_name='branches.pdf')
st.plot_flat_tree(adata,color=['label','branch_id_alias','S5_pseudotime'],dist_scale=0.5,show_graph=True,show_text=True,save_fig=True,fig_name='flat_tree.pdf')
st.plot_stream_sc(adata,root='S5',color=['label','GATA1'],dist_scale=0.5,show_graph=True,show_text=False,save_fig=True, fig_format='pdf',fig_size=(14,9))
st.plot_stream(adata,root='S5',color=['label','GATA1'],save_fig=True, fig_format='pdf')

@huidongchen
Collaborator

hmmm, that is very strange.

I just tested your script and I was able to run it without any errors.

I am attaching the notebook I was using here.
test_stream.html.zip

@XYZuo
Author

XYZuo commented Oct 9, 2021

So the problem is likely in my environment.
I created the conda environment with 'conda create -n stream python=3.7 stream=1.0 jupyter'. Here is my pip list:

Package Version
------- -------
anndata 0.7.3
argcomplete 1.12.3
argon2-cffi 20.1.0
async-generator 1.10
attrs 21.2.0
backcall 0.2.0
bleach 4.0.0
Bottleneck 1.3.2
cached-property 1.5.2
certifi 2021.5.30
cffi 1.14.6
click 8.0.2
cycler 0.10.0
debugpy 1.4.1
decorator 5.1.0
defusedxml 0.7.1
entrypoints 0.3
fonttools 4.25.0
gunicorn 20.1.0
h5py 3.2.1
importlib-metadata 4.8.1
ipykernel 6.2.0
ipython 7.27.0
ipython-genutils 0.2.0
ipywidgets 7.6.4
jedi 0.18.0
Jinja2 3.0.1
joblib 1.0.1
jsonschema 3.2.0
jupyter 1.0.0
jupyter-client 7.0.1
jupyter-console 6.4.0
jupyter-core 4.7.1
jupyterlab-pygments 0.1.2
jupyterlab-widgets 1.0.0
kiwisolver 1.3.1
llvmlite 0.36.0
loompy 3.0.6
MarkupSafe 2.0.1
matplotlib 3.2.2
matplotlib-inline 0.1.2
mistune 0.8.4
mkl-fft 1.3.0
mkl-random 1.2.2
mkl-service 2.4.0
munkres 1.1.4
natsort 7.1.1
nbclient 0.5.3
nbconvert 6.1.0
nbformat 5.1.3
nest-asyncio 1.5.1
networkx 2.1
notebook 6.4.3
numba 0.53.1
numexpr 2.7.3
numpy 1.17.5
numpy-groupies 0.9.14
olefile 0.46
packaging 21.0
pandas 1.0.5
pandocfilters 1.4.3
parso 0.8.2
patsy 0.5.1
pexpect 4.8.0
pickleshare 0.7.5
Pillow 8.3.1
pip 21.2.2
plotly 5.1.0
prometheus-client 0.11.0
prompt-toolkit 3.0.20
ptyprocess 0.7.0
pycparser 2.20
Pygments 2.10.0
pynndescent 0.5.4
pyparsing 2.4.7
pyrsistent 0.17.3
python-dateutil 2.8.2
python-slugify 5.0.2
pytz 2021.1
pyzmq 22.2.1
qtconsole 5.1.1
QtPy 1.10.0
rpy2 2.9.4
scikit-learn 0.24.2
scipy 1.7.1
seaborn 0.11.2
Send2Trash 1.8.0
setuptools 58.0.4
Shapely 1.7.1
simplegeneric 0.8.1
six 1.15.0
statsmodels 0.12.2
stream 1.0
tenacity 8.0.1
terminado 0.9.4
testpath 0.5.0
text-unidecode 1.3
threadpoolctl 2.2.0
tornado 6.1
traitlets 5.1.0
typing-extensions 3.10.0.2
tzlocal 2.1
umap-learn 0.5.1
Unidecode 1.2.0
wcwidth 0.2.5
webencodings 0.5.1
wheel 0.37.0
widgetsnbextension 3.5.1
zipp 3.5.0

Can you find anything wrong?
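When a script runs in one environment but fails in another, comparing the installed versions of the packages involved is a quick first step. A minimal sketch using the standard-library `importlib.metadata` (Python 3.8+); on Python 3.7 it falls back to the `importlib-metadata` backport, which appears in the list above. The package selection (those used by the script or appearing in the traceback) and the helper name `package_versions` are assumptions for illustration.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7: the importlib-metadata backport
    from importlib_metadata import PackageNotFoundError, version

# Packages used by the script or seen in the traceback in this thread (assumed selection).
PACKAGES = ["stream", "pandas", "numpy", "matplotlib", "anndata", "networkx", "loompy"]


def package_versions(names):
    """Return {package: installed version string, or None if it is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, ver in package_versions(PACKAGES).items():
        print(f"{name:12s} {ver or 'not installed'}")
```

Running the same snippet in both environments gives two short lists that can be compared line by line, which is easier than comparing full pip lists.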
