Research: Caricatures #121

@colah

🔬 This is an experiment in doing radically open research. I plan to post all my work on this openly as I do it, tracking it in this issue. I'd love for people to comment, or better yet collaborate! See more.

Please be respectful of the fact that this is unpublished research and that people involved in this are putting themselves in an unusually vulnerable position. Please treat it as you would unpublished work described in a seminar or by a colleague.

Description

Caricatures are a powerful feature visualization technique that we haven't fully explored or published on yet. Roughly, they allow us to take an input image, feed it through to some layer of a network, and get a sense of how the network understood it.

Caricatures do this by creating a new image that has a similar but more extreme activation pattern to the original at a given layer.
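
To make this concrete, here is a minimal sketch of the optimization, not the implementation behind the images in this issue. It optimizes a fresh image so that its activations at a chosen layer have a large dot product with, and stay directionally close to, the original image's activations. Everything specific below is an assumption for illustration: torchvision's GoogLeNet standing in for the model, `inception4a` as the target layer, a plain pixel parameterization, and none of the robustness tricks from the feature visualization work.

```python
# Minimal caricature sketch in PyTorch. All concrete choices here are
# illustrative assumptions: torchvision's GoogLeNet as the model, the
# `inception4a` block as the target layer, a plain pixel parameterization,
# and no transformation robustness or decorrelated parameterization.
import torch
import torch.nn.functional as F
import torchvision

model = torchvision.models.googlenet(weights="DEFAULT").eval()
for p in model.parameters():
    p.requires_grad_(False)

# Grab activations at the target layer with a forward hook.
acts = {}
model.inception4a.register_forward_hook(lambda m, i, o: acts.update(layer=o))

def layer_activations(img):
    model(img)               # we only care about the hooked activations
    return acts["layer"]

def caricature(original_img, steps=512, lr=0.05, cos_power=4.0):
    """Optimize a new image whose activations are 'like the original, but more so'."""
    target = layer_activations(original_img).detach()

    img = torch.rand_like(original_img).requires_grad_(True)
    opt = torch.optim.Adam([img], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        a = layer_activations(img.clamp(0, 1))
        dot = (a * target).sum()                           # push activations further...
        cos = F.cosine_similarity(a.flatten(1), target.flatten(1)).mean()
        loss = -(dot * cos.clamp(min=1e-4) ** cos_power)   # ...in the original's direction
        loss.backward()
        opt.step()

    return img.detach().clamp(0, 1)
```

The dot-product-times-cosine-similarity objective is one simple way to encode "similar but more extreme"; the actual notebooks may differ in the objective details, parameterization, and transforms, and `original_img` is assumed here to be a preprocessed `(1, 3, 224, 224)` batch.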

There are two related properties that make caricatures really interesting as a visualization:

  • They are basis-free visualizations. Unlike neuron visualizations, where which neurons you pick dramatically affects the results and rotating activation space would dramatically change things, caricatures are unaffected (see the short check after this list). This means they work well even for models where concepts don't align with neurons.

  • They are comparable visualizations. Most visualizations we have are not comparable between models. For example, if you visualize a neuron in one model, and another in a different model, there's no reason for them to represent the same thing and you learn little about how the models compare. While there are other comparable visualizations, caricatures are by far the simplest ones.
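
To unpack the basis-free point a little (notation mine): if the caricature objective only depends on inner products with the original activation vector, then rotating activation space changes nothing, so the visualization can't depend on the neuron basis.

```latex
% f(x): activations at the chosen layer, x_0: the original image,
% R: any rotation (orthogonal change of basis) of activation space.
\langle R f(x),\, R f(x_0) \rangle
  = f(x)^{\top} R^{\top} R \, f(x_0)
  = f(x)^{\top} f(x_0)
  = \langle f(x),\, f(x_0) \rangle .
```

The same holds for a cosine-similarity term, since rotations preserve norms, so the optimum, and hence the caricature, is the same in any basis.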

This makes caricatures a really important technique, because:

  1. they are our first, simplest line of attack on model comparison

  2. they are a super useful tool for debugging feature visualization when it doesn’t work (because they remove neuron choice as a potential problem).

Next Steps

  1. Caricatures are much more powerful when shown in context, as demonstrated at the top of this notebook. It would be great to scale this!

  2. It would be super exciting to do more controlled experiments where we change network architectures and see how the caricatures respond. (The models would also be a useful resource to have for future model comparison work.) The ones I'm most immediately excited about are network branches, the effect of datasets, and preprocessing.

  3. We've recently had some early exciting results about "attributive caricatures" which might be interesting to explore:

  4. It might be useful to show how they can be used for debugging feature vis.
