My error

TypeError: All estimators should implement fit and transform, or can be 'drop' or 'passthrough' specifiers. '(Pipeline(steps=[('imputer', SimpleImputer(fill_value=4, strategy='constant')),

My code

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Modelling
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split, GridSearchCV

# Set up the random seed
import numpy as np
np.random.seed(seed=42)

# Import data
data = pd.read_csv('car-sales-extended-missing-data.csv')
data

# Drop rows with a missing value in the Price column (the label)
data.dropna(subset=['Price'], inplace=True)

# The "Make" and "Colour" columns need their missing values filled
# and then need to be converted to numbers.
# The "Doors" column also has missing values to fill.
categorical_features = ['Make', 'Colour']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy="constant", fill_value="missing")),
    ('onehot', OneHotEncoder(handle_unknown="ignore"))
])
# Missing categorical values are filled with the constant string "missing",
# then OneHotEncoder converts the categories to numerical values.

door_features = ['Doors']
door_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy="constant", fill_value=4)),
]),
# SimpleImputer fills any missing "Doors" values with 4.

numerical_features = ['Odometer (KM)']
numeric_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy="mean"))])
# Missing "Odometer (KM)" values are filled with the mean of the
# non-missing values in that column.

# Set up preprocessing steps (fill missing values, then convert to numbers)
preprocessor = ColumnTransformer(
    transformers=[
        ("cat", categorical_transformer, categorical_features),
        ("door", door_transformer, door_features),
        ("num", numeric_transformer, numerical_features)
    ])

# Create a preprocessing and modelling Pipeline.
model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', RandomForestRegressor())
])
# The first step runs the data through the preprocessor; once that is done,
# RandomForestRegressor() is fitted on the preprocessed data.

# Split data
X = data.drop('Price', axis=1)
y = data['Price']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Fit and score the model
model.fit(X_train, y_train)
```
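
A note on the likely cause, offered as a sketch rather than a confirmed diagnosis: the estimator quoted in the error message starts with `(Pipeline(`, which suggests that `door_transformer` is a tuple rather than a `Pipeline`. In Python, a trailing comma after an expression creates a one-element tuple, and the `door_transformer` assignment in the code above ends with `]),`. The example below uses a stand-in class to show the effect without needing scikit-learn or the dataset; `FakeTransformer` and `looks_like_transformer` are hypothetical names, not scikit-learn APIs.

```python
class FakeTransformer:
    """Hypothetical stand-in for an estimator with fit and transform."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


def looks_like_transformer(obj):
    # Rough approximation (an assumption, not the library's actual code)
    # of the check behind the error: the object needs fit and transform.
    return hasattr(obj, "fit") and hasattr(obj, "transform")


# The trailing comma wraps the object in a one-element tuple.
with_comma = FakeTransformer(),

# Without the comma, the name refers to the object itself.
without_comma = FakeTransformer()

print(type(with_comma).__name__, looks_like_transformer(with_comma))
print(type(without_comma).__name__, looks_like_transformer(without_comma))
```

If this is what is happening, ending the `door_transformer` assignment with `])` instead of `]),` should let `ColumnTransformer` accept it.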
-
Handling missing data with pandas

# Fill the odometer column with the mean value for the missing data

I got the following error message:

TypeError Traceback (most recent call last)
File ~\Desktop\sample_project_1\env\lib\site-packages\pandas\core\series.py:6225, in Series.mean(self, axis, skipna, numeric_only, **kwargs)
File ~\Desktop\sample_project_1\env\lib\site-packages\pandas\core\generic.py:11992, in NDFrame.mean(self, axis, skipna, numeric_only, **kwargs)
File ~\Desktop\sample_project_1\env\lib\site-packages\pandas\core\generic.py:11949, in NDFrame._stat_function(self, name, func, axis, skipna, numeric_only, **kwargs)
File ~\Desktop\sample_project_1\env\lib\site-packages\pandas\core\series.py:6133, in Series._reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
File ~\Desktop\sample_project_1\env\lib\site-packages\pandas\core\nanops.py:147, in bottleneck_switch.__call__.<locals>.f(values, axis, skipna, **kwds)
File ~\Desktop\sample_project_1\env\lib\site-packages\pandas\core\nanops.py:404, in _datetimelike_compat.<locals>.new_func(values, axis, skipna, mask, **kwargs)
File ~\Desktop\sample_project_1\env\lib\site-packages\pandas\core\nanops.py:719, in nanmean(values, axis, skipna, mask)
File ~\Desktop\sample_project_1\env\lib\site-packages\numpy\core\_methods.py:49, in _sum(a, axis, dtype, out, keepdims, initial, where)
TypeError: unsupported operand type(s) for +: 'float' and 'method'

I have tried every source I could find online, as advised, but cannot solve the problem. Could you clarify what is going on?
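
One possible explanation, assuming an earlier cell filled the column using `.mean` without parentheses (that code is not shown in the post, so this is a guess): the column would then hold the method object itself in place of the missing values, and a later `.mean()` call would try to add a float to a method. The sketch below reproduces the same message in plain Python; `Column` is a hypothetical stand-in, not a pandas class, and the odometer values are made up for illustration.

```python
class Column:
    """Hypothetical stand-in for a numeric column; not a pandas class."""

    def __init__(self, values):
        self.values = values

    def mean(self):
        present = [v for v in self.values if v is not None]
        return sum(present) / len(present)

    def fillna(self, value):
        # Replace missing entries (None) with the given value.
        return Column([value if v is None else v for v in self.values])


odometer = Column([35000.0, None, 84000.0])

# Mistake: passing the method itself (no parentheses) as the fill value.
wrong = odometer.fillna(odometer.mean)
try:
    wrong.mean()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Calling the method first gives a number to fill with.
right = odometer.fillna(odometer.mean())
filled_mean = right.mean()

print(error_message)
print(filled_mean)
```

If that is what happened, the column may already contain method objects, so it would be worth reloading the CSV and filling again with `.mean()` (with parentheses).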
-
This template is to help with the formatting of questions to best help others who are reading it.
The four main sections are:
You can copy the demo template below and fill out the sections in your question. Much of the formatting has already been done to make things look nicer: