Compositional Interpretability

CompInterp uncovers the structure of neural representations, showing how simple features compose into complex behaviours. By unifying the tensor and neural network paradigms, it treats model weights and data as a single modality. This compositional lens on design, analysis and control paves the way for inherently interpretable AI without compromising performance.

Compositional architectures capture rich non-linear (polynomial) relationships between representation spaces. Instead of masking them through linear approximations, CompInterp methods expose their inherent hierarchical structure across levels of abstraction. This enables weight-based subcircuit analysis, grounding interpretability in formal (de)compositions rather than post-hoc activation-based heuristics, as sketched below.
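
As a minimal illustration (not our released code), the sketch below assumes a toy bilinear layer y_k = xᵀ B_k x with illustrative sizes. It shows how the weights of a degree-2 layer form a third-order tensor whose per-output interaction matrices can be eigendecomposed directly, yielding candidate feature directions from the weights alone, without activations or probing data.

```python
import numpy as np

# Hypothetical toy sizes; any bilinear layer y_k = x^T B_k x has this structure.
d_in, d_out = 8, 4
rng = np.random.default_rng(0)

# Weights of a bilinear (degree-2 polynomial) layer form a third-order tensor
# B of shape (d_out, d_in, d_in): output k is the quadratic form x^T B[k] x.
B = rng.normal(size=(d_out, d_in, d_in))

def bilinear_forward(x, B):
    """Compute y_k = x^T B_k x for every output feature k."""
    return np.einsum("i,kij,j->k", x, B, x)

# Weight-based analysis: only the symmetric part of B[k] affects the output,
# and its eigendecomposition ranks input directions by their contribution
# to output feature k.
k = 0
B_sym = 0.5 * (B[k] + B[k].T)
eigvals, eigvecs = np.linalg.eigh(B_sym)

# The directions with the largest |eigenvalue| are candidate "features"
# composing output k.
order = np.argsort(-np.abs(eigvals))
print("leading eigenvalues for output 0:", eigvals[order][:3])

x = rng.normal(size=d_in)
print("forward pass:", bilinear_forward(x, B))
```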

We’re now scaling compositional interpretability to transformers and CNNs by leveraging their low-rank structure through tensor decomposition and information theory. Learn more in our latest talk!
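
As a rough sketch of what "leveraging low-rank structure" can mean here (illustrative only, with a synthetic weight tensor rather than trained weights), the snippet below builds a low-rank third-order tensor in CP form and exposes that structure by matricizing it and inspecting the singular-value spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 16, 32, 4

# Synthetic weight tensor that is low-rank by construction (CP form):
# B = sum_r a_r (outer) u_r (outer) v_r, mimicking compressible trained weights.
A = rng.normal(size=(d_out, rank))
U = rng.normal(size=(d_in, rank))
V = rng.normal(size=(d_in, rank))
B = np.einsum("kr,ir,jr->kij", A, U, V)

# Mode-0 unfolding: reshape to (d_out, d_in * d_in). A fast-decaying spectrum
# indicates the tensor is well summarised by a few rank-one components.
B_unfolded = B.reshape(d_out, d_in * d_in)
singular_values = np.linalg.svd(B_unfolded, compute_uv=False)
print("top singular values:", np.round(singular_values[:6], 2))
# Only `rank` values are numerically non-zero here, exposing the low-rank structure.
```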

news

Oct 15, 2025 We are presenting our work at Flanders AI Research Day. :hand_over_mouth:
Oct 01, 2025 We got a spotlight at the mechinterp workshop (NeurIPS’25)! :partying_face:
Apr 05, 2025 We have a website now! :sparkles:
Mar 04, 2025 We are presenting our poster at CoLoRAI (AAAI’25)!

selected publications

  1. Finding Manifolds With Bilinear Autoencoders
    In Mechanistic Interpretability Workshop, at the Thirty-Ninth Annual Conference on Neural Information Processing Systems, Oct 2025
  2. Compositionality Unlocks Deep Interpretable Models
    In Connecting Low-Rank Representations in AI, at the 39th Annual AAAI Conference on Artificial Intelligence, Nov 2024