Demo question template #48

mrdbourke · 2023-06-02T01:38:09Z

mrdbourke
Jun 2, 2023
Maintainer

This template helps you format your question so that others reading it can help you more easily.

The four main sections are:

* Where you're stuck - best to put a timestamp/video number of the course here so people know exactly where you are (e.g. ZTM video 135, timestamp 11:07)
* Your problem - what is your problem?
* Your code - what code did you try that isn't working?
* What you've tried so far - detailing the troubleshooting steps you've taken already will help others help you.

You can copy the demo template below and fill out each section in your question. Most of the formatting has already been done to make things look nicer:

* Course video/timestamp: (e.g. video number 147 on ZTM, timestamp 10:35)

## My error

TODO: Add some info here about your error...

## My code

TODO: Add your code here, best to format with backticks as well, for example: 

\```python <- use triple backticks before and after your code, write "python" after the first set to make the code formatted
[your code here]
\``` <- be sure to delete the slashes, only keep the backticks

## What I've tried so far

TODO: Add some steps for what you've tried to do so far to solve your error... (this will help others know what you've done when troubleshooting)

nischal6 · 2023-09-27T13:05:31Z

nischal6
Sep 27, 2023

Course video/timestamp: video number 149 on ZTM, timestamp 10:35

My error

TypeError: All estimators should implement fit and transform, or can be 'drop' or 'passthrough' specifiers. '(Pipeline(steps=[('imputer', SimpleImputer(fill_value=4, strategy='constant')),
('onehots', OneHotEncoder(handle_unknown='ignore'))]),)' (type <class 'tuple'>) doesn't.

My code

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Modelling
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split, GridSearchCV

# Set up random seed
import numpy as np
np.random.seed(seed=42)

# Import data and drop rows with missing labels
data = pd.read_csv('car-sales-extended-missing-data.csv')
data

# Drop rows of Price column with missing data
data.dropna(subset=['Price'] , inplace=True)



# We need to fill the missing values in the "Make" and "Colour" columns, then convert them to numbers.
# We also need to fill the missing values in the "Doors" column.

categorical_features = ['Make' , 'Colour']
categorical_transformer = Pipeline(steps=[('imputer' , SimpleImputer(strategy="constant" , fill_value="missing")),
                                          ('onehot'  , OneHotEncoder(handle_unknown="ignore"))
                                         ])
# Here, missing values in categorical_features are filled with the constant string "missing",
# then OneHotEncoder converts the categories to numerical values.

door_features = ['Doors']
door_transformer = Pipeline(steps=[('imputer' , SimpleImputer(strategy="constant" , fill_value=4)),
                                  ]),
# SimpleImputer fills any missing values in "Doors" with 4.

numerical_features = ['Odometer (KM)']
numeric_transformer = Pipeline(steps=[('imputer' , SimpleImputer(strategy="mean"))])
# This fills missing values in the numerical column, i.e. Odometer (KM), using strategy="mean":
# each missing row in that column is filled with the mean of the column's remaining values.


# Set up preprocessing steps (fill missing values, then convert to numbers)
preprocessor = ColumnTransformer(
                                  transformers = [
                                               ("cat" , categorical_transformer , categorical_features),
                                               ("door" , door_transformer , door_features ),
                                               ("num" , numeric_transformer, numerical_features)
                                                ])

# Creating a preprocessing and modelling Pipeline.
model = Pipeline(steps=[('preprocessor',preprocessor),
                        ('model' , RandomForestRegressor() )
                        ])
# The first step in the Pipeline runs the preprocessor.
# Once preprocessing is done, RandomForestRegressor() is built on the preprocessed data.

# Split data.
X = data.drop('Price' , axis=1)
y = data['Price']

# train , test split
X_train, X_test,  y_train , y_test = train_test_split(X , y , test_size=0.2)

# Fit and score the model

model.fit(X_train , y_train)



1 reply

mrdbourke Sep 28, 2023
Maintainer Author

Hey @nischal6 ,

There's a small error in your code on this line:

door_transformer = Pipeline(steps=[('imputer' , SimpleImputer(strategy="constant" , fill_value=4)),
                                  ]),

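There's an extra comma at the very end, after the closing bracket. That trailing comma turns `door_transformer` into a one-element tuple containing the Pipeline rather than the Pipeline itself, which is why the error reports `(type <class 'tuple'>)`. See the updated code here: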

door_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy="constant" , fill_value=4))])
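To see why a single trailing comma matters, here is a minimal sketch in plain Python. `FakePipeline` is a hypothetical stand-in used only for illustration, not a scikit-learn class, so the snippet runs without any installs:

```python
# Hypothetical stand-in for a scikit-learn Pipeline (illustration only).
class FakePipeline:
    def fit(self, X):
        return self

    def transform(self, X):
        return X


# A trailing comma after the closing bracket creates a one-element tuple.
with_comma = FakePipeline(),

# Without the comma, the name refers to the object itself.
without_comma = FakePipeline()

print(type(with_comma).__name__)       # tuple
print(type(without_comma).__name__)    # FakePipeline

# A tuple has no fit/transform methods, which is what ColumnTransformer complains about.
print(hasattr(with_comma, "fit"))      # False
print(hasattr(without_comma, "fit"))   # True
```

Running `type(door_transformer)` in the notebook is a quick way to check for the same problem in the real code.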

And the full working code here:

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Modelling
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split, GridSearchCV

# Set up random seed
import numpy as np
np.random.seed(seed=42)

# Import data and drop rows with missing labels
data = pd.read_csv('../data/car-sales-extended-missing-data.csv')
data

# Drop rows of Price column with missing data
data.dropna(subset=['Price'] , inplace=True)



# We need to fill the missing values in the "Make" and "Colour" columns, then convert them to numbers.
# We also need to fill the missing values in the "Doors" column.

categorical_features = ['Make' , 'Colour']
categorical_transformer = Pipeline(steps=[('imputer' , SimpleImputer(strategy="constant" , fill_value="missing")),
                                          ('onehot'  , OneHotEncoder(handle_unknown="ignore"))
                                         ])
# Here, missing values in categorical_features are filled with the constant string "missing",
# then OneHotEncoder converts the categories to numerical values.

door_features = ['Doors']
door_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy="constant" , fill_value=4))])
# SimpleImputer fills any missing values in "Doors" with 4.

numerical_features = ['Odometer (KM)']
numeric_transformer = Pipeline(steps=[('imputer' , SimpleImputer(strategy="mean"))])
# This fills missing values in the numerical column, i.e. Odometer (KM), using strategy="mean":
# each missing row in that column is filled with the mean of the column's remaining values.


# Set up preprocessing steps (fill missing values, then convert to numbers)
preprocessor = ColumnTransformer(
                                  transformers = [
                                               ("cat" , categorical_transformer , categorical_features),
                                               ("door" , door_transformer , door_features ),
                                               ("num" , numeric_transformer, numerical_features)
                                                ])

# Creating a preprocessing and modelling Pipeline.
model = Pipeline(steps=[('preprocessor', preprocessor),
                        ('model', RandomForestRegressor())])

# The first step in the Pipeline runs the preprocessor.
# Once preprocessing is done, RandomForestRegressor() is built on the preprocessed data.

# Split data.
X = data.drop('Price' , axis=1)
y = data['Price']

# train , test split
X_train, X_test,  y_train , y_test = train_test_split(X , y , test_size=0.2)

# Fit and score the model
model.fit(X_train , y_train)

vfonsecal · 2024-07-29T09:53:18Z

vfonsecal
Jul 29, 2024

Handling missing data with pandas
I'm following section #114.
After running:

#Fill the odometer column with the mean value for the missing data
car_sales_missing["Odometer (KM)"].fillna(value=car_sales_missing["Odometer (KM)"].mean(), inplace=True)

I got the following error message:

TypeError Traceback (most recent call last)
Cell In[70], line 2
1 #Fill the odometer column with the mean value for the missing data
----> 2 car_sales_missing["Odometer (KM)"].fillna(value=car_sales_missing["Odometer (KM)"].mean(), inplace=True)

File ~\Desktop\sample_project_1\env\lib\site-packages\pandas\core\series.py:6225, in Series.mean(self, axis, skipna, numeric_only, **kwargs)
6217 @doc(make_doc("mean", ndim=1))
6218 def mean(
6219 self,
(...)
6223 **kwargs,
6224 ):
-> 6225 return NDFrame.mean(self, axis, skipna, numeric_only, **kwargs)

File ~\Desktop\sample_project_1\env\lib\site-packages\pandas\core\generic.py:11992, in NDFrame.mean(self, axis, skipna, numeric_only, **kwargs)
11985 def mean(
11986 self,
11987 axis: Axis | None = 0,
(...)
11990 **kwargs,
11991 ) -> Series | float:

11992 return self._stat_function(
11993 "mean", nanops.nanmean, axis, skipna, numeric_only, **kwargs
11994 )

File ~\Desktop\sample_project_1\env\lib\site-packages\pandas\core\generic.py:11949, in NDFrame._stat_function(self, name, func, axis, skipna, numeric_only, **kwargs)
11945 nv.validate_func(name, (), kwargs)
11947 validate_bool_kwarg(skipna, "skipna", none_allowed=False)

11949 return self._reduce(
11950 func, name=name, axis=axis, skipna=skipna, numeric_only=numeric_only
11951 )

File ~\Desktop\sample_project_1\env\lib\site-packages\pandas\core\series.py:6133, in Series._reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
6128 # GH#47500 - change to TypeError to match other methods
6129 raise TypeError(
6130 f"Series.{name} does not allow {kwd_name}={numeric_only} "
6131 "with non-numeric dtypes."
6132 )
-> 6133 return op(delegate, skipna=skipna, **kwds)

File ~\Desktop\sample_project_1\env\lib\site-packages\pandas\core\nanops.py:147, in bottleneck_switch.__call__.<locals>.f(values, axis, skipna, **kwds)
145 result = alt(values, axis=axis, skipna=skipna, **kwds)
146 else:
--> 147 result = alt(values, axis=axis, skipna=skipna, **kwds)
149 return result

File ~\Desktop\sample_project_1\env\lib\site-packages\pandas\core\nanops.py:404, in _datetimelike_compat.<locals>.new_func(values, axis, skipna, mask, **kwargs)
401 if datetimelike and mask is None:
402 mask = isna(values)
--> 404 result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
406 if datetimelike:
407 result = _wrap_results(result, orig_values.dtype, fill_value=iNaT)

File ~\Desktop\sample_project_1\env\lib\site-packages\pandas\core\nanops.py:719, in nanmean(values, axis, skipna, mask)
716 dtype_count = dtype
718 count = _get_counts(values.shape, mask, axis, dtype=dtype_count)
--> 719 the_sum = values.sum(axis, dtype=dtype_sum)
720 the_sum = _ensure_numeric(the_sum)
722 if axis is not None and getattr(the_sum, "ndim", False):

File ~\Desktop\sample_project_1\env\lib\site-packages\numpy\core\_methods.py:49, in _sum(a, axis, dtype, out, keepdims, initial, where)
47 def _sum(a, axis=None, dtype=None, out=None, keepdims=False,
48 initial=_NoValue, where=True):
---> 49 return umr_sum(a, axis, dtype, out, keepdims, initial, where)

TypeError: unsupported operand type(s) for +: 'float' and 'method'

I tried all the sources I could find online, as advised, but could not solve the problem.

Could you kindly clarify what is going on?

0 replies
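
One possible reading of the last line of the traceback above, offered as a guess rather than a confirmed diagnosis: Python raises `unsupported operand type(s) for +: 'float' and 'method'` when a sum meets a method object among floats. That would happen if the `Odometer (KM)` column ended up holding a method, for example if an earlier cell used `.mean` without the parentheses (an assumption, since the earlier cells aren't shown here). The sketch below uses a plain list, a hypothetical `Column` class and a hypothetical helper `non_numeric_entries` to reproduce the message and locate the offending entry:

```python
from numbers import Real


class Column:
    """Hypothetical stand-in for an object that has a mean() method."""

    def __init__(self, values):
        self.values = values

    def mean(self):
        return sum(self.values) / len(self.values)


def non_numeric_entries(values):
    """Return (position, type name) for every entry that is not a real number."""
    return [
        (i, type(v).__name__)
        for i, v in enumerate(values)
        if not isinstance(v, Real)
    ]


col = Column([35431.0, 192714.0])

# Note the missing parentheses: col.mean is the method itself, not its result.
values = [35431.0, 192714.0, col.mean]

try:
    sum(values)
except TypeError as err:
    message = str(err)
    print(message)  # unsupported operand type(s) for +: 'float' and 'method'

print(non_numeric_entries(values))  # [(2, 'method')]
```

In the notebook, the same helper could be called on `car_sales_missing["Odometer (KM)"]`, and `car_sales_missing.dtypes` would show whether that column is `object` instead of `float64`. If a method did get stored in the column, re-running the notebook from the cell that loads the CSV should restore the original values.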

Monishabasith · 2024-11-08T10:37:26Z

Monishabasith
Nov 8, 2024

Course video/timestamp: video number 239 timestamp 5:51

My error
Building model with: https://tfhub.dev/google/imagenet/mobilenet_v2_130_224/classification/4

ValueError Traceback (most recent call last)
in <cell line: 1>()
----> 1 model = create_model()
2 model.summary()

2 frames
/usr/local/lib/python3.10/dist-packages/keras/src/models/sequential.py in add(self, layer, rebuild)
93 layer = origin_layer
94 if not isinstance(layer, Layer):
---> 95 raise ValueError(
96 "Only instances of keras.Layer can be "
97 f"added to a Sequential model. Received: {layer} "

ValueError: Only instances of keras.Layer can be added to a Sequential model. Received: <tensorflow_hub.keras_layer.KerasLayer object at 0x7de34da405b0> (of type <class 'tensorflow_hub.keras_layer.KerasLayer'>)

My code

def create_model(input_shape=INPUT_SHAPE, output_shape=OUTPUT_SHAPE, model_url=MODEL_URL):
  print("Building model with:", MODEL_URL)

  # Setup the model layers
  model = tf.keras.Sequential([
    hub.KerasLayer(MODEL_URL), # Layer 1 (input layer)
    tf.keras.layers.Dense(units=OUTPUT_SHAPE,
                          activation="softmax") # Layer 2 (output layer)
  ])

  # Compile the model
  model.compile(
      loss=tf.keras.losses.CategoricalCrossentropy(),
      optimizer=tf.keras.optimizers.Adam(),
      metrics=["accuracy"]
  )

  # Build the model
  model.build(INPUT_SHAPE)

  return model


## What I've tried so far

I tried some changes to the code based on Stack Overflow answers. I even tried running your end-to-end Dog Vision notebook, and it shows the same error.


0 replies
