- 1. Overview
- 2. Introduction
- 2.1. Motivation
- 2.2. Tensor Networks
- 2.3. Summary
- 3. Architectures
- 3.1. MLPs
- 3.2. Sequences
- 3.3. Attention
- 3.4. Convolution
- 3.5. Mixer
- 3.6. Composition
- 3.7. Summary
- 4. Decompositions
- 4.1. Orthogonalisation
- 4.2. Diagonalisation
- 4.3. Extensions
- 4.4. Summary
- 5. Interpretability
- 5.1. Duality
- 5.2. Sparsity
- 5.3. Structure
- 5.4. Features
- 5.5. Circuits
- 5.6. Supervision
- 5.7. Superposition
- 5.8. Summary
- 6. Experiments
- 6.1. Chess
- 6.2. Tradeoffs
- 7. Conclusion
- 7.1. Future Work
- 8. Appendix
- 8.1. Glossary
- 8.2. Normalisation
- 8.3. Invariants
- 8.4. Spiders
- 8.5. Squared Attention
- 9. Documentation
- 9.1. Modules
- 9.1.1. Matrix
- 9.1.2. Bilinear
- 9.1.3. Attention
- 9.2. Compositions
- 9.2.1. Sequential
- 9.2.2. Dual
- 9.3. Sparsification
- 9.3.1. TICA
- 9.4. Plotting