Open
Labels
0 - Backlog: In queue waiting for assignment
Python: Affects Python cuDF API.
cuIO: cuIO issue
feature request: New feature or request
libcudf: Affects libcudf (C++/CUDA) code.
Description
Is your feature request related to a problem? Please describe.
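When writing code with import cudf as pd, a Parquet file that pandas wrote with a category column is read back by pd.read_parquet with the column as dtype object.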
Describe the solution you'd like
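The same behavior as import pandas as pd: read_parquet should return the column as dtype category. Session showing the difference (cuDF is imported as pd, pandas under its own name):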
In [1]: import cudf as pd
In [2]: pd.__version__
Out[2]: '22.12.01'
In [3]: df = pd.DataFrame({'a': ['one','two','three'] * 10})
In [4]: df.info()
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 a 30 non-null object
dtypes: object(1)
memory usage: 234.0+ bytes
In [5]: df.a = df.astype('category')
In [6]: df.info()
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 a 30 non-null category
dtypes: category(1)
memory usage: 57.0 bytes
In [7]: %ls df.parquet
ls: cannot access 'df.parquet': No such file or directory
In [8]: df.to_pandas().to_parquet('df.parquet')
In [9]: %ls df.parquet
df.parquet
In [10]: pd.read_parquet('df.parquet').info()
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 a 30 non-null object
dtypes: object(1)
memory usage: 234.0+ bytes
In [11]: import pandas
In [12]: pandas.read_parquet('df.parquet').info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 a 30 non-null category
dtypes: category(1)
memory usage: 290.0 bytes
In [13]: pd.DataFrame(pandas.read_parquet('df.parquet')).info()
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 a 30 non-null category
dtypes: category(1)
memory usage: 57.0 bytes
The cuDF Parquet reader (In [10]) returns the column as dtype=object, while pandas (In [12]) restores it as category from the same file.
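For context on why pandas can restore the dtype: when pandas writes Parquet through pyarrow, a JSON blob is stored in the file's schema metadata under the key pandas, and categorical columns are marked there with pandas_type set to categorical. A minimal sketch of a possible workaround follows, assuming that metadata is present. The helper name categorical_columns and the sample JSON are hypothetical and hand-written for illustration; this is not how cuDF or pandas implement the feature.

```python
import json


def categorical_columns(pandas_metadata_json):
    """Return the names of columns that the pandas metadata marks as categorical.

    Hypothetical helper: expects the JSON stored under the b"pandas" key of
    the Parquet schema metadata (written by pandas/pyarrow).
    """
    meta = json.loads(pandas_metadata_json)
    return [
        col["name"]
        for col in meta.get("columns", [])
        if col.get("pandas_type") == "categorical"
    ]


# Hand-written sample mimicking the shape of the pandas metadata for a frame
# with a categorical column "a" and an integer column "b" (assumed, not
# copied from the file in this issue).
sample = json.dumps({
    "columns": [
        {"name": "a", "field_name": "a", "pandas_type": "categorical",
         "numpy_type": "int8",
         "metadata": {"num_categories": 3, "ordered": False}},
        {"name": "b", "field_name": "b", "pandas_type": "int64",
         "numpy_type": "int64", "metadata": None},
    ]
})

print(categorical_columns(sample))

# With a real file, the JSON could be obtained with pyarrow (not run here):
#   import pyarrow.parquet as pq
#   raw = pq.read_schema("df.parquet").metadata[b"pandas"]
# and each listed column cast after reading with cuDF:
#   df = cudf.read_parquet("df.parquet")
#   for name in categorical_columns(raw):
#       df[name] = df[name].astype("category")
```

This only recovers the dtype after the fact; the column is still materialized as strings on read, so it does not replace reader support for the category dtype.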