BUG: to_hdf on dataframe with string column failing with compression #64180

@jorisvandenbossche

Description

#60663 added support for the new str dtype in HDF IO, but apparently this was not tested in combination with compression. Writing without compression works, while passing complevel fails:

>>> df = pd.DataFrame({"col": ["a", "b", "c"]})
>>> df.to_hdf("test_strings.h5", key="df")
>>> df.to_hdf("test_strings_compressed.h5", key="df", complevel=1)
...
File ~/conda/envs/pandas-30/lib/python3.13/site-packages/pandas/io/pytables.py:3288, in GenericFixed.write_array(self, key, obj, items)
   3285 if self._filters is not None:
   3286     with suppress(ValueError):
   3287         # get the atom for this datatype
-> 3288         atom = _tables().Atom.from_dtype(value.dtype)
   3290 if atom is not None:
   3291     # We only get here if self._filters is non-None and
   3292     #  the Atom.from_dtype call succeeded
   3293 
   3294     # create an empty chunked array and fill it from value
   3295     if not empty_array:

File ~/conda/envs/pandas-30/lib/python3.13/site-packages/tables/atom.py:366, in Atom.from_dtype(cls, dtype, dflt)
    341 @classmethod
    342 def from_dtype(cls, dtype: np.dtype, dflt: Any = None) -> Atom:
    343     """Create an Atom from a NumPy dtype.
    344 
    345     An optional default value may be specified as the dflt
   (...)    364 
    365     """
--> 366     basedtype = dtype.base
    367     shape = tuple(SizeType(i) for i in dtype.shape)
    368     if basedtype.names:

AttributeError: 'StringDtype' object has no attribute 'base'
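
Reading the traceback, the call to Atom.from_dtype sits inside `with suppress(ValueError)`, but the failure is an AttributeError, so it is not suppressed and propagates out of write_array. A minimal stdlib-only sketch of that mechanism follows. `FakeStringDtype`, `fake_from_dtype` and `get_atom` are hypothetical stand-ins invented for illustration, not pandas or PyTables APIs, and the widened exception tuple only demonstrates the difference in behavior; it is not a claim about how this should be fixed in pandas.

```python
from contextlib import suppress


class FakeStringDtype:
    """Hypothetical stand-in for an extension dtype that has no `.base`."""


def fake_from_dtype(dtype):
    # Mimics the first step of Atom.from_dtype in the traceback:
    # it assumes a NumPy dtype and reads `dtype.base`.
    return dtype.base


def get_atom(dtype, exceptions=(ValueError,)):
    # Same shape as the guarded block in write_array: try to build an atom,
    # and leave it as None if one of the listed exceptions is raised.
    atom = None
    with suppress(*exceptions):
        atom = fake_from_dtype(dtype)
    return atom


# With only ValueError suppressed, the AttributeError escapes.
try:
    get_atom(FakeStringDtype())
    escaped = False
except AttributeError:
    escaped = True

# With AttributeError also listed, the atom simply stays None.
atom_when_widened = get_atom(FakeStringDtype(), (ValueError, AttributeError))
```

Whether the right fix is to catch more exception types, or to avoid passing a non-NumPy dtype to Atom.from_dtype at all, is left open here.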

Metadata

    Labels

    Bug, IO HDF5 (read_hdf, HDFStore), Strings (String extension data type and string data)
