
Hi @hasorez,

Glad you're enjoying the course!

You're right about the CNN turning images into patch embeddings.

The position embeddings are added so the model knows the patches have some kind of sequential order.

As in, patch 1 will be somewhat more related to patch 2 than to patch 16 (in most cases).

Otherwise, the model may treat the patches as being in a random order (it would likely still learn something here, because neural networks are pretty robust).

From a high level, this in turn means that the model would see an image as a collection of random patches of colour, not really knowing that a dog's head is most often connected to the rest of its body (the patch with the dog head is c…
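To make the two steps above concrete, here's a minimal sketch (not the course's exact code) of patchifying an image, projecting each flattened patch to an embedding, and then adding one position embedding per patch. The linear projection stands in for the CNN's learned filters; all the numbers (224px image, 16px patches, 768-dim embeddings) are just the common ViT-Base defaults.

```python
import numpy as np

rng = np.random.default_rng(0)

img_size, patch_size, embed_dim = 224, 16, 768
num_patches = (img_size // patch_size) ** 2  # 14 * 14 = 196

image = rng.standard_normal((3, img_size, img_size))  # (C, H, W)

# Split the image into non-overlapping 16x16 patches and flatten each one.
patches = (
    image.reshape(3, img_size // patch_size, patch_size,
                  img_size // patch_size, patch_size)
         .transpose(1, 3, 0, 2, 4)                 # (14, 14, 3, 16, 16)
         .reshape(num_patches, 3 * patch_size**2)  # (196, 768)
)

# Linear projection standing in for the CNN's learned patch embedding.
projection = rng.standard_normal((3 * patch_size**2, embed_dim)) * 0.02
patch_embeddings = patches @ projection            # (196, 768)

# One learnable vector per patch position, simply added element-wise.
# Without this, shuffling the 196 rows would look identical to the model.
position_embeddings = rng.standard_normal((num_patches, embed_dim)) * 0.02
tokens = patch_embeddings + position_embeddings    # (196, 768)

print(tokens.shape)  # (196, 768)
```

Note the position embeddings are added, not concatenated, so the sequence shape stays `(num_patches, embed_dim)` going into the transformer.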

Answer selected by hasorez