
Hi @hasorez,

Glad you're enjoying the course!

You're right about the CNN turning images into patch embeddings.

The position embeddings are added so the model knows the patches have some kind of sequential order.

As in, patch 1 will be somewhat more related to patch 2 than to patch 16 (in most cases).

Otherwise, the model may treat the patches as being in a random order (it would likely still learn something here, because neural networks are pretty robust).

From a high level, this in turn means that the model would see an image as a collection of random patches of colour, not really knowing that a dog's head is most often connected to the rest of its body (the patch with the dog head is c…
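To make the two steps above concrete, here's a minimal sketch (not the course's exact code) of patchifying an image, projecting each flattened patch to an embedding, and then adding one position embedding per patch. The linear projection stands in for the CNN's learned filters; all the numbers (224px image, 16px patches, 768-dim embeddings) are just the common ViT-Base defaults.

```python
import numpy as np

rng = np.random.default_rng(0)

img_size, patch_size, embed_dim = 224, 16, 768
num_patches = (img_size // patch_size) ** 2  # 14 * 14 = 196

image = rng.standard_normal((3, img_size, img_size))  # (C, H, W)

# Split the image into non-overlapping 16x16 patches and flatten each one.
patches = (
    image.reshape(3, img_size // patch_size, patch_size,
                  img_size // patch_size, patch_size)
         .transpose(1, 3, 0, 2, 4)                 # (14, 14, 3, 16, 16)
         .reshape(num_patches, 3 * patch_size**2)  # (196, 768)
)

# Linear projection standing in for the CNN's learned patch embedding.
projection = rng.standard_normal((3 * patch_size**2, embed_dim)) * 0.02
patch_embeddings = patches @ projection            # (196, 768)

# One learnable vector per patch position, simply added element-wise.
# Without this, shuffling the 196 rows would look identical to the model.
position_embeddings = rng.standard_normal((num_patches, embed_dim)) * 0.02
tokens = patch_embeddings + position_embeddings    # (196, 768)

print(tokens.shape)  # (196, 768)
```

Note the position embeddings are added, not concatenated, so the sequence shape stays `(num_patches, embed_dim)` going into the transformer.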

Answer selected by hasorez