BUG: index.name not preserved in concat in case of unequal object index #13475

Closed
@dllahr

xref #13742 for additional cases.

In [23]: df1 = pd.DataFrame({'a':[1,2]}, index=pd.Index(['a', 'b'], name='idx'))

In [24]: df2 = pd.DataFrame({'b':[2,3]}, index=pd.Index(['b', 'c'], name='idx'))

In [26]: pd.concat([df1, df2], axis=1)
Out[26]:
     a    b
a  1.0  NaN
b  2.0  2.0
c  NaN  3.0

In [27]: print pd.concat([df1, df2], axis=1).index.name
None

So the issue seems to be with a string (object) index that is not equal across the frames. When the index of the two frames is equal (no NaNs are introduced), the name is kept. It is also kept when using numerical indexes, see #13475 (comment).
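
The three cases described above can be compared side by side. This is a minimal sketch, not part of the original report: the helper `concat_index_name` and the case labels are hypothetical names introduced here, and the printed names depend on the installed pandas version, so no expected output is shown.

```python
import pandas as pd


def concat_index_name(idx1, idx2):
    """Concat two one-column frames along axis=1 and return the result's index name.

    Hypothetical helper for illustration only.
    """
    df1 = pd.DataFrame({'a': [1, 2]}, index=idx1)
    df2 = pd.DataFrame({'b': [2, 3]}, index=idx2)
    return pd.concat([df1, df2], axis=1).index.name


cases = {
    # unequal string index: the case reported here as losing the name
    'unequal_str': (pd.Index(['a', 'b'], name='idx'),
                    pd.Index(['b', 'c'], name='idx')),
    # equal string index: reported as keeping the name
    'equal_str': (pd.Index(['a', 'b'], name='idx'),
                  pd.Index(['a', 'b'], name='idx')),
    # unequal numeric index: reported as keeping the name
    'unequal_int': (pd.Index([1, 2], name='idx'),
                    pd.Index([2, 3], name='idx')),
}

# Map each case label to the index name of the concatenated frame.
names = {label: concat_index_name(i1, i2) for label, (i1, i2) in cases.items()}
for label in sorted(names):
    print(label, names[label])
```

If the bug is present, `unequal_str` should be the only case that prints `None`.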


When I use the concat function with input dataframes that have index.name assigned, sometimes the resulting dataframe has the index.name assigned, sometimes it does not.

I ran the code below from the Python interpreter, using a conda environment with pandas 0.18.1.

I don't see any odd or extra characters around the "pert_well" column in either of the files.

Code Sample, a copy-pastable example if possible

import pandas
from io import StringIO  # on Python 2 (as used in this report): from StringIO import StringIO

a_data = """x_amount_mg x_annotation    x_mmoles_per_liter  mfc_plate_name  x_avg_mol_weight    x_volume_ul pert_mfc_desc   pert_iname  x_purity    pert_id_vendor  pert_well   pert_vehicle    pert_mfc_id x_smiles    x_mg_per_ml pert_dose_unit  pert_dose   pert_id pert_plate  pert_type
0.04784 ACCEPT  10.0    B-REPO-01-B64-101   405.4084    11  Taltirelin  Taltirelin  86.52   HY-B0596    C18 DMSO    BRD-K93869735-001-01-1  CN1C(=O)C[C@H](NC1=O)C(=O)N[C@@H](Cc1cnc[nH]1)C(=O)N1CCC[C@H]1C(N)=O    4.054084    um  20.0    BRD-K93869735   PMEL008 trt_cp"""

b_data = """pert_well   pert_2_type pert_2_id   pert_2_mfc_id   pert_2_mfc_desc pert_2_id_vendor    pert_2_iname    pert_2_dose pert_2_dose_unit    pert_2_vehicle  pert_3_type pert_3_id   pert_3_mfc_id  pert_3_mfc_desc pert_3_id_vendor    pert_3_iname    pert_3_dose pert_3_dose_unit    pert_3_vehicle
A01 ctl_vehicle DMSO    DMSO    DMSO    -666    DMSO    -666    -666    -666    ctl_untrt   CMAP-000    -666    UnTrt   -666    -666    -666    -666    -666"""

d_data = """x_amount_mg x_annotation    x_mmoles_per_liter  mfc_plate_name  x_avg_mol_weight    x_volume_ul pert_mfc_desc   pert_iname  x_purity    pert_id_vendor  pert_well   pert_vehicle    pert_mfc_id x_smiles    x_mg_per_ml pert_dose_unit  pert_dose   pert_id pert_plate  pert_type
0.0 -666    -666    B-REPO-01-B64-107   -666    0   -666    -666    -666    -666    A01 -666    -666    -666    -666    -666    -666    CMAP-000    PMEL001 ctl_untrt"""

a = pandas.read_csv(StringIO(a_data), sep="\t", index_col="pert_well")
b = pandas.read_csv(StringIO(b_data), sep="\t", index_col="pert_well")
c = pandas.concat([a,b], axis=1)
c.index

d = pandas.read_csv(StringIO(d_data), sep="\t", index_col="pert_well")
e = pandas.concat([d,b], axis=1)
e.index

results:

Index([u'A01', u'A02', u'A03', u'A04', u'A05', u'A06', u'A07', u'A08', u'A09',
       u'A10',
       ...
       u'P15', u'P16', u'P17', u'P18', u'P19', u'P20', u'P21', u'P22', u'P23',
       u'P24'],
      dtype='object', length=384)

Index([u'A01', u'A02', u'A03', u'A04', u'A05', u'A06', u'A07', u'A08', u'A09',
       u'A10',
       ...
       u'P15', u'P16', u'P17', u'P18', u'P19', u'P20', u'P21', u'P22', u'P23',
       u'P24'],
      dtype='object', name=u'pert_well', length=384)

Expected Output

c.index.name should be "pert_well"
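
Until the name is preserved by `concat` itself, one possible workaround is to restore it afterwards. The sketch below is an assumption, not something proposed in the report: `concat_keep_index_name` is a hypothetical helper, and it only restores the name when every input frame uses the same index name.

```python
import pandas as pd


def concat_keep_index_name(frames):
    """Concat along axis=1, restoring the index name if all inputs share it.

    Hypothetical workaround helper, not part of pandas.
    """
    result = pd.concat(frames, axis=1)
    input_names = set(frame.index.name for frame in frames)
    # Restore only when concat dropped the name and the inputs agree on one.
    if result.index.name is None and len(input_names) == 1:
        # Index.rename returns a new Index, so the inputs are left untouched.
        result.index = result.index.rename(input_names.pop())
    return result


df1 = pd.DataFrame({'a': [1, 2]}, index=pd.Index(['a', 'b'], name='idx'))
df2 = pd.DataFrame({'b': [2, 3]}, index=pd.Index(['b', 'c'], name='idx'))
out = concat_keep_index_name([df1, df2])
print(out.index.name)
```

When the inputs carry different index names, the helper leaves the result as `concat` returned it, since there is no single name to restore.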

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-573.7.1.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 23.0.0
Cython: None
numpy: 1.11.0
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

PMEL_input_files_for_pandas_issue.zip

Labels: Bug, Constructors, Index, Reshaping
