Why don't I need to use `torch.softmax()` for calculating predictions? #314

mrdbourke · 2023-02-24T07:33:05Z

mrdbourke
Feb 24, 2023
Maintainer

Original question:

Why is it that in the PyTorch neural network classification, we used softmax for the output logits to convert into prediction probabilities but in computer vision when using the forward pass we don't use softmax on the logits? We simply get the prediction probabilities by taking torch.argmax()?

mrdbourke · 2023-02-24T07:39:17Z

mrdbourke
Feb 24, 2023
Maintainer Author

Good question!

In short:

softmax helps go from: logits -> prediction probabilities -> predictions
but you can just go from: logits -> predictions (skip the middle step) via torch.argmax()

Why?

This is because softmax is a "monotonic function" - https://en.wikipedia.org/wiki/Monotonic_function

For example, it doesn't change the order of the inputs that go into it, only their values (from one form to another form, more specifically to be in range of 0 to 1 and all adding to 1).

Examples

The following two lines of code will output the same values:

import torch

torch.manual_seed(42)

tensor_A = torch.randn(1, 10)

softmax = torch.argmax(torch.softmax(tensor_A, dim=1), dim=1)
no_softmax = torch.argmax(tensor_A, dim=1)

print(softmax == no_softmax)

Output:

tensor([True])

Why transform logits to prediction probabilities then?

I find these helpful to understand more than raw logits but it's not 100% necessary.

Try playing around with the input and output values with and without softmax.

And to dig deeper, see how the softmax function is defined, replicating it without using in-built functions would be a great exercise: https://en.wikipedia.org/wiki/Softmax_function

2 replies

Joaquin-Pineiro Mar 16, 2023

I was about to make the same question, hopefully someone did it first, thank you for your reply Daniel!

henrykohl Mar 19, 2023

Hi Daniel:
I have a question about the further prediction evaluation in lecture #121.

`
for X, y in tqdm(test_dataloader, desc="Making predictions..."):

X, y = X.to(device), y.to(device)
y_logit = model_2(X)  
y_pred = torch.softmax(y_logit.squeeze(), dim=0).argmax(dim=1) 
y_preds.append(y_pred.cpu())

`

Why do we use "y_pred = torch.softmax(y_logit.squeeze(), dim=0).argmax(dim=1)" instead of "y_pred = torch.softmax(y_logit.squeeze(), dim=1).argmax(dim=1) "? Should the softmax not be computed along dim=1? As mentioned above, we can go from: logits -> predictions (skip the middle step) via torch.argmax(). Actually, y_logit.argmax(dim=1) is equal to y_pred = torch.softmax(y_logit.squeeze(), dim=1).argmax(dim=1)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Why don't I need to use `torch.softmax()` for calculating predictions? #314

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment 2 replies

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Select a reply

Uh oh!

Why don't I need to use torch.softmax() for calculating predictions? #314

Uh oh!

mrdbourke Feb 24, 2023 Maintainer

Replies: 1 comment · 2 replies

Uh oh!

mrdbourke Feb 24, 2023 Maintainer Author

Examples

Why transform logits to prediction probabilities then?

Uh oh!

Uh oh!

Joaquin-Pineiro Mar 16, 2023

Uh oh!

Uh oh!

henrykohl Mar 19, 2023

Why don't I need to use `torch.softmax()` for calculating predictions? #314

mrdbourke
Feb 24, 2023
Maintainer

Replies: 1 comment 2 replies

mrdbourke
Feb 24, 2023
Maintainer Author