Research: Neuron Mechanics #110
🔬 This is an experiment in doing radically open research. I plan to post all my work on this openly as I do it, tracking it in this issue. I'd love for people to comment, or better yet collaborate! See more.
Please be respectful of the fact that this is unpublished research and that people involved in this are putting themselves in an unusually vulnerable position. Please treat it as you would unpublished work described in a seminar or by a colleague.
Description
Decomposed feature visualization is a new technique for exploring the mechanics of a feature. Instead of just visualizing the feature by optimizing a new input, one optimizes the activations of an earlier hidden layer, then visualizes each component of that optimized activation separately.
For example, it reveals that a car-detecting neuron at layer mixed4c works by doing a 5x5 convolution, looking for windows in the top half of its receptive field and wheels in the bottom half:
When applied to adjacent layers, decomposed feature visualization can be thought of as visualizing the weights or the gradient between two layers. It can also be thought of as a way to explain attribution. Look at the draft article!
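To make the mechanics concrete, here's a minimal PyTorch-style sketch of the idea (not the lucid code from the notebook; `net_tail`, `unit`, and the patch shape are hypothetical placeholders): optimize an activation patch of the earlier layer so it excites the target unit, then decompose the optimized patch into one channel vector per spatial position and give each vector its own ordinary feature visualization.

```python
import torch

def decomposed_feature_vis(net_tail, unit, patch_shape=(1, 512, 5, 5),
                           steps=256, lr=0.05):
    """Optimize an activation patch of an earlier layer so that it excites one
    target unit, then split the result into per-position channel vectors."""
    patch = torch.randn(*patch_shape, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # net_tail maps earlier-layer activations to the target layer; maximize
        # the target unit's activation over the patch.
        loss = -net_tail(patch)[0, unit].mean()
        loss.backward()
        opt.step()
        with torch.no_grad():  # rescale so the (roughly linear) objective can't blow up
            patch.mul_(patch_shape[1] ** 0.5 / (patch.norm() + 1e-8))
    # Decompose: one channel vector per spatial position of the optimized patch.
    _, _, H, W = patch_shape
    return [patch.detach()[0, :, y, x] for y in range(H) for x in range(W)]

# Each returned vector would then be rendered with ordinary feature visualization
# as a direction in the earlier layer (e.g. mixed4b), giving one image per
# position of the 5x5 kernel -- windows at the top, wheels at the bottom.
```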
I think this technique is exciting because it seems to significantly advance our mechanistic understanding of how convolutional networks operate. This seems critical to our ability to eventually audit models, or gain deep scientific understanding.
Links
- Draft article (password: "vis")
- Please don't share the draft link anywhere high-profile. We'd like to save attention for a polished final version.
- Colab notebook
Next Steps
- Fiddle with hyperparameters / objectives for visualization.
- Try using a neuron objective instead of a channel objective (see the objective sketch after this list).
- Try using cossim (cosine similarity) objectives? (See the issue on new feature vis objectives.)
- Scale up: visualize many neurons this way, potentially across many models.
- Find interesting examples of meaningful units and how they are implemented by the network.
- To start, pick interesting neurons from the feature vis appendix and explore those. Note this technique doesn't work well for 1x1 conv neurons.
- Ideally, find interesting trends in how neural nets do things.
- Explore using this to explain attribution (see demo in article).
- Explore using this to directly visualize hidden-layer attribution. Instead of a saliency map of scalars, apply feature vis to each position's attribution vector (see the attribution sketch after this list).
- Writing: There's a lot of work to be done on the draft article.
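For the neuron-vs-channel item above, here's a rough sketch of how the two objectives differ, in generic PyTorch terms rather than lucid's actual API (`acts` is assumed to be the layer's activation tensor of shape (1, C, H, W)):

```python
import torch

def channel_objective(acts, c):
    """Channel objective: mean activation of channel c over all spatial positions."""
    return acts[0, c].mean()

def neuron_objective(acts, c, y=None, x=None):
    """Neuron objective: activation of channel c at a single spatial position
    (defaulting to the center), which avoids averaging over translations."""
    _, _, H, W = acts.shape
    y = H // 2 if y is None else y
    x = W // 2 if x is None else x
    return acts[0, c, y, x]
```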
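And for the hidden-layer attribution item, a sketch of what "apply feature vis to each vector" could mean, assuming a simple gradient-times-activation attribution (the article's demo may compute attribution differently):

```python
import torch

def hidden_layer_attribution_vectors(acts, grads):
    """Per-position attribution *vectors* rather than a scalar saliency map.
    acts:  hidden layer activations, shape (1, C, H, W)
    grads: gradient of the output logit w.r.t. those activations, same shape."""
    attr = acts * grads  # gradient-times-activation attribution
    _, C, H, W = attr.shape
    # One channel vector per spatial position; each vector would then be shown
    # with feature visualization instead of being summed into a single scalar.
    return [attr[0, :, y, x] for y in range(H) for x in range(W)]
```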