Interpretability

This chapter proposes a new compositional framework for interpretability.

Mechanistic interpretability aims to decompile neural networks into computer programs.
In this envisioned approach, features play the role of variables and circuits play the role of functions.
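To make the variables-and-functions analogy concrete, here is a minimal hypothetical sketch. It is not taken from this chapter and does not represent the framework proposed here. It shows a hand-weighted two-input ReLU network that computes XOR, next to a "decompiled" Python program. In that program, each hidden unit becomes a named variable (a feature), and the computation that combines them becomes an explicit function (a circuit). The weights and the names `count`, `both`, and `xor_circuit` are illustrative assumptions. Networks trained in practice rarely decompose this cleanly.

```python
def relu(x):
    return max(x, 0.0)

# Hypothetical hand-set weights for a toy 2-2-1 ReLU network computing XOR on {0, 1} inputs.
W1 = [[1.0, 1.0],   # hidden unit 0 reads a + b
      [1.0, 1.0]]   # hidden unit 1 also reads a + b ...
b1 = [0.0, -1.0]    # ... but its bias means it only fires when a + b > 1
W2 = [1.0, -2.0]    # output weights

def network(a, b):
    """The 'compiled' form: an opaque weighted sum of ReLU activations."""
    h = [relu(W1[i][0] * a + W1[i][1] * b + b1[i]) for i in range(2)]
    return sum(w * hi for w, hi in zip(W2, h))

def xor_circuit(count, both):
    """Circuit (function): combines the two features into the output."""
    return count - 2 * both

def decompiled(a, b):
    """The 'decompiled' form: hidden units become named feature variables."""
    count = a + b               # feature from hidden unit 0: how many inputs are on
    both = max(a + b - 1, 0)    # feature from hidden unit 1: whether both inputs are on
    return xor_circuit(count, both)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, network(a, b), decompiled(a, b))
```

Checking that the two versions agree on every input is the kind of faithfulness test a decompiled program should pass. Here the check is exhaustive only because the input space has four points.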

....

We propose a framework in a similar spirit.