Interpretability
This chapter proposes a new compositional framework for interpretability.
Mechanistic interpretability aims to decompile neural networks into computer programs.
The envisioned approach treats features as the program's variables and circuits as its functions.
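The following minimal sketch makes the variables-and-functions analogy concrete. It assumes a hypothetical toy network with two binary input features and a single hidden ReLU unit whose weights are hand-set rather than learned. The weights and the names `network` and `and_circuit` are illustrative only and do not come from this chapter. If we read the inputs as features (variables) and the hidden unit as a circuit (function), we get a one-line program, and we can check that program against the network on every input.

```python
from itertools import product


def relu(z: float) -> float:
    return max(0.0, z)


# Hypothetical toy network: two binary input features, one hidden ReLU unit.
# Weights are hand-set for illustration, not learned.
W_HIDDEN = [1.0, 1.0]
B_HIDDEN = -1.0


def network(x: list[float]) -> float:
    """Forward pass: weighted sum of inputs plus bias, then ReLU."""
    pre = sum(w * xi for w, xi in zip(W_HIDDEN, x)) + B_HIDDEN
    return relu(pre)


# "Decompiled" reading: the inputs are features (variables) and the
# hidden unit implements a circuit (function) computing logical AND.
def and_circuit(feature_a: bool, feature_b: bool) -> bool:
    return feature_a and feature_b


def decompilation_matches() -> bool:
    """Check that the program reproduces the network on every binary input."""
    for a, b in product([0, 1], repeat=2):
        net_out = network([float(a), float(b)]) > 0.5
        prog_out = and_circuit(bool(a), bool(b))
        if net_out != prog_out:
            return False
    return True


print(decompilation_matches())
```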
....
We propose a similar framework.