1. Overview
2. Introduction
   2.1. Motivation
   2.2. Tensor Networks
   2.3. Summary
3. Architectures
   3.1. MLPs
   3.2. Sequences
   3.3. Attention
   3.4. Convolution
   3.5. Mixer
   3.6. Composition
   3.7. Summary
4. Decompositions
   4.1. Orthogonalisation
   4.2. Diagonalisation
   4.3. Extensions
   4.4. Summary
5. Interpretability
   5.1. Duality
   5.2. Sparsity
   5.3. Structure
   5.4. Features
   5.5. Circuits
   5.6. Supervision
   5.7. Superposition
   5.8. Summary
6. Experiments
   6.1. Chess
   6.2. Tradeoffs
7. Conclusion
   7.1. Future Work
8. Appendix
   8.1. Glossary
   8.2. Normalisation
   8.3. Invariants
   8.4. Spiders
   8.5. Squared Attention
9. Documentation
   9.1. Modules
      9.1.1. Matrix
      9.1.2. Bilinear
      9.1.3. Attention
   9.2. Compositions
      9.2.1. Sequential
      9.2.2. Dual
   9.3. Sparsification
      9.3.1. TICA
   9.4. Plotting

Compositional Interpretability