Compositional Interpretability

CompInterp uncovers the structure of neural representations, showing how simple features compose into complex behaviours. By unifying the tensor and neural network paradigms, CompInterp treats model weights and data as a single modality. This compositional lens on design, analysis, and control paves the way for inherently interpretable AI without compromising performance.

Compositional architectures capture rich non-linear (polynomial) relationships between representation spaces. Instead of masking these relationships with linear approximations, CompInterp methods expose their inherent hierarchical structure across levels of abstraction. This enables weight-based subcircuit analysis, grounding interpretability in formal (de)compositions rather than in post-hoc, activation-based heuristics.
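
As a toy illustration of this weight-based view, take a single bilinear layer: each output is a degree-2 polynomial of the input, fully described by an interaction matrix sitting in the weights, which can be eigendecomposed directly without running the model on data. The NumPy sketch below is purely illustrative (names and sizes are made up; it is not our released tooling):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 8, 4

# A bilinear layer: y_k = x^T B_k x, so every output feature is a
# degree-2 polynomial of the input, fully described by the weights B.
B = rng.normal(size=(d_out, d_in, d_in))  # third-order weight tensor

def bilinear(x):
    return np.einsum('kij,i,j->k', B, x, x)

# Weight-based analysis: symmetrise each interaction matrix (x^T B x
# depends only on the symmetric part) and eigendecompose it. The
# eigenvectors are input directions whose squared projections sum to
# the output; no forward passes over data are needed.
B_sym = 0.5 * (B + np.transpose(B, (0, 2, 1)))
eigvals, eigvecs = np.linalg.eigh(B_sym[0])   # analyse output feature 0

x = rng.normal(size=d_in)
coords = eigvecs.T @ x                        # project x onto eigen-directions
assert np.isclose(bilinear(x)[0], np.sum(eigvals * coords**2))
```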

We’re now scaling compositional interpretability to transformers and CNNs by leveraging their low-rank structure through tensor decomposition and information theory. Learn more in our latest talk!
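
As a rough sketch of what exploiting low-rank structure can look like (again illustrative NumPy, not our actual pipeline): unfolding a weight tensor along each mode and reading off its singular values, the first step of a higher-order SVD, shows how compressible the interactions are.

```python
import numpy as np

rng = np.random.default_rng(0)

# Mimic a trained layer whose interactions are secretly low-rank:
# a (4, 8, 8) weight tensor built from just r = 3 rank-one terms.
r = 3
B = sum(np.einsum('k,i,j->kij',
                  rng.normal(size=4),
                  rng.normal(size=8),
                  rng.normal(size=8)) for _ in range(r))

def unfold(T, mode):
    """Flatten tensor T into a matrix with `mode` as its rows."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# Singular values of each mode unfolding expose the low rank:
# values past index r-1 are (numerically) zero.
for mode in range(B.ndim):
    s = np.linalg.svd(unfold(B, mode), compute_uv=False)
    print(f"mode {mode}: singular values {np.round(s[:5], 3)}")
```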

news

May 12, 2026 Explore our interactive manifold viewer on Qwen 3.5! :telescope:
May 10, 2026 Bilinear Autoencoders Find Interpretable Manifolds is out! :ocean:
May 09, 2026 From Mechanistic to Compositional Interpretability is out! :triangular_ruler:
Apr 15, 2026 We are mentoring for MARS (Mentorship for Alignment Research Students)! :seedling:
Apr 14, 2026 Research using our tools was featured in TIME magazine! :newspaper:

selected publications

  1. Bilinear autoencoders find interpretable manifolds
    arXiv, May 2026
  2. From Mechanistic to Compositional Interpretability
    Ward Gauderis*, Thomas Dooms*, Steven T. Holmer, Kola Ayonrinde, and 1 more author
    arXiv, May 2026
  3. Compositionality Unlocks Deep Interpretable Models
    In Connecting Low-Rank Representations in AI, at the 39th Annual AAAI Conference on Artificial Intelligence, Nov 2024