I’m confused about using image information as queries. Why would the attention map of image-text correlation be weighted towards the text information? In fact, text-to-image diffusion models all use text as the query.
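To make the question concrete, here is a minimal sketch of how the choice of query decides which axis the softmax normalizes over. It is not taken from this repository's code: the toy features and the names `cross_attention_map`, `image_feats`, and `text_feats` are hypothetical, and it uses plain single-head scaled dot-product attention with no learned projections. With image patches as queries and text tokens as keys, each row of the map is a distribution over text tokens for one patch.

```python
import math

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_map(queries, keys):
    """Return softmax(Q K^T / sqrt(d)): one row per query, one column per key."""
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)
    return [
        softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys])
        for q in queries
    ]

# Hypothetical toy features: 3 image patches and 2 text tokens, dimension 2.
image_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_feats = [[2.0, 0.0], [0.0, 2.0]]

# Image as query, text as key: shape (3 patches, 2 tokens).
# Each row sums to 1 over the text tokens.
img_query_map = cross_attention_map(image_feats, text_feats)

# Text as query, image as key: shape (2 tokens, 3 patches).
# Each row sums to 1 over the image patches.
txt_query_map = cross_attention_map(text_feats, image_feats)

for row in img_query_map:
    print([round(w, 3) for w in row])
```

In this sketch the image-as-query map answers "for each patch, which text token matters most", while the text-as-query map answers "for each token, which patches matter most". Which one a given model uses depends on its implementation.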