My error
TypeError: All estimators should implement fit and transform, or can be 'drop' or 'passthrough' specifiers. '(Pipeline(steps=[('imputer', SimpleImputer(fill_value=4, strategy='constant')),

My code
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
# Modelling
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split, GridSearchCV
# Set up random seed
import numpy as np
# Import data and drop rows with missing labels
data = pd.read_csv('car-sales-extended-missing-data.csv')
# Drop rows with a missing value in the Price column
data.dropna(subset=['Price'], inplace=True)
# The "Make" and "Colour" columns need to be converted to numbers and have their missing values filled.
# The missing values in the "Doors" column also need to be filled.
categorical_features = ['Make', 'Colour']
categorical_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy="constant", fill_value="missing")),
                                          ('onehot', OneHotEncoder(handle_unknown="ignore"))])
# Here, missing values in categorical_features are filled with the constant string "missing",
# then OneHotEncoder converts the categories to numerical values.
door_features = ['Doors']
door_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy="constant", fill_value=4))])
# SimpleImputer handles missing data in "Doors": any missing value is filled with 4.
numerical_features = ['Odometer (KM)']
numeric_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy="mean"))])
# This fills the numerical column, Odometer (KM), using strategy="mean": it takes the mean of the
# non-missing Odometer (KM) values and fills every missing row in that column with it.
# Set up preprocessing steps (fill missing values, then convert to numbers)
preprocessor = ColumnTransformer(
    transformers=[
        ("cat", categorical_transformer, categorical_features),
        ("door", door_transformer, door_features),
        ("num", numeric_transformer, numerical_features)
    ])
# Create a preprocessing and modelling Pipeline.
model = Pipeline(steps=[('preprocessor', preprocessor),
                        ('model', RandomForestRegressor())])
# The first step in the Pipeline runs the data through the preprocessor;
# once preprocessing is done, a RandomForestRegressor is fitted on the result.
# Split data.
X = data.drop('Price' , axis=1)
y = data['Price']
# train , test split
X_train, X_test, y_train , y_test = train_test_split(X , y , test_size=0.2)
# Fit and score the model
model.fit(X_train, y_train)
model.score(X_test, y_test)

What I've tried so far
TODO: Add some steps for what you've tried to do so far to solve your error... (this will help others know what you've done when troubleshooting)
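A possible cause, offered as a guess because the listing above is cut off: the error message prints the offending transformer as starting with `(Pipeline(`, which is how Python displays a tuple. A stray comma after the closing bracket of `door_transformer = Pipeline(...)` would turn it into a one-element tuple, and a tuple has neither `fit` nor `transform`. The sketch below shows the effect with a hypothetical `FakeTransformer` class and a hypothetical `looks_like_transformer` helper. Both are stand-ins invented for illustration (not scikit-learn's actual code) so that the snippet runs without scikit-learn installed.

```python
class FakeTransformer:
    """Hypothetical stand-in for a scikit-learn Pipeline or transformer."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


def looks_like_transformer(obj):
    """Rough approximation (an assumption) of the check described in the error message."""
    if isinstance(obj, str):
        return obj in ("drop", "passthrough")
    return hasattr(obj, "fit") and hasattr(obj, "transform")


# Note the trailing comma: Python builds a one-element tuple here.
door_transformer_bad = FakeTransformer(),

# Without the comma, the name refers to the transformer itself.
door_transformer_ok = FakeTransformer()

print(type(door_transformer_bad).__name__, looks_like_transformer(door_transformer_bad))
print(type(door_transformer_ok).__name__, looks_like_transformer(door_transformer_ok))
```

If that is what happened, removing the trailing comma so that `door_transformer` is the Pipeline itself should clear the TypeError.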
Handling missing data with pandas
# Fill the odometer column with the mean value for the missing data
I got the following error message:

TypeError Traceback (most recent call last)
File ~\Desktop\sample_project_1\env\lib\site-packages\pandas\core\, in Series.mean(self, axis, skipna, numeric_only, **kwargs)
File ~\Desktop\sample_project_1\env\lib\site-packages\pandas\core\, in NDFrame.mean(self, axis, skipna, numeric_only, **kwargs)
File ~\Desktop\sample_project_1\env\lib\site-packages\pandas\core\, in NDFrame._stat_function(self, name, func, axis, skipna, numeric_only, **kwargs)
File ~\Desktop\sample_project_1\env\lib\site-packages\pandas\core\, in Series._reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
File ~\Desktop\sample_project_1\env\lib\site-packages\pandas\core\, in, axis, skipna, **kwds)
File ~\Desktop\sample_project_1\env\lib\site-packages\pandas\core\, in _datetimelike_compat..new_func(values, axis, skipna, mask, **kwargs)
File ~\Desktop\sample_project_1\env\lib\site-packages\pandas\core\, in nanmean(values, axis, skipna, mask)
File ~\Desktop\sample_project_1\env\lib\site-packages\numpy\, in _sum(a, axis, dtype, out, keepdims, initial, where)
TypeError: unsupported operand type(s) for +: 'float' and 'method'

I have tried all the sources on the net as advised but cannot solve the problem. Could you kindly clarify what is going on?
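The code that produced this traceback is not shown, so what follows is only a guess. The message `'float' and 'method'` means the column being averaged holds a method object next to its numbers. One common way that happens is an earlier fill that passed `.mean` without parentheses, which stores the method itself instead of the number it returns. The sketch below reproduces that with a hypothetical `Column` class, a minimal stand-in for a pandas Series written only for this illustration, so it runs without pandas.

```python
class Column:
    """Hypothetical, minimal stand-in for a pandas Series (illustration only)."""

    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        # Average every entry that is not missing (None marks a missing value here).
        present = [v for v in self.values if v is not None]
        return sum(present) / len(present)

    def fillna(self, fill):
        # Replace each missing entry with whatever object was passed in.
        return Column(fill if v is None else v for v in self.values)


odometer = Column([10000.0, None, 30000.0])

# Missing parentheses: the bound method itself is stored in the column.
wrong = odometer.fillna(odometer.mean)

# With parentheses: the computed mean is stored.
right = odometer.fillna(odometer.mean())

try:
    wrong.mean()
    message = ""
except TypeError as exc:
    message = str(exc)

print(message)
print(right.mean())
```

If this matches your notebook, re-import the CSV first so the column no longer contains the method object, then fill with `.mean()` including the parentheses.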
My error
This template is to help with the formatting of questions to best help others who are reading it.
The four main sections are:
You can copy the demo template below and fill out the sections in your question. Much of the formatting is there just to make things look nicer: