Compositional Interpretability

CompInterp uncovers the structure of neural representations, showing how simple features compose into complex behaviours. By unifying the tensor and neural network paradigms, CompInterp treats model weights and data as a single modality. This compositional lens on design, analysis, and control paves the way for inherently interpretable AI without compromising performance.

Compositional architectures capture rich non-linear (polynomial) relationships between representation spaces. Instead of masking these relationships with linear approximations, CompInterp methods expose their inherent hierarchical structure across levels of abstraction. This enables weight-based subcircuit analysis, grounding interpretability in formal (de)compositions rather than in post-hoc, activation-based heuristics.
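
As a toy illustration of this weight-based view, take a single bilinear layer: each output is a degree-2 polynomial of the input, fully described by an interaction matrix sitting in the weights, which can be eigendecomposed directly without running the model on data. The NumPy sketch below is purely illustrative (names and sizes are made up; it is not our released tooling):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 8, 4

# A bilinear layer: y_k = x^T B_k x, so every output feature is a
# degree-2 polynomial of the input, fully described by the weights B.
B = rng.normal(size=(d_out, d_in, d_in))  # third-order weight tensor

def bilinear(x):
    return np.einsum('kij,i,j->k', B, x, x)

# Weight-based analysis: symmetrise each interaction matrix (x^T B x
# depends only on the symmetric part) and eigendecompose it. The
# eigenvectors are input directions whose squared projections sum to
# the output; no forward passes over data are needed.
B_sym = 0.5 * (B + np.transpose(B, (0, 2, 1)))
eigvals, eigvecs = np.linalg.eigh(B_sym[0])   # analyse output feature 0

x = rng.normal(size=d_in)
coords = eigvecs.T @ x                        # project x onto eigen-directions
assert np.isclose(bilinear(x)[0], np.sum(eigvals * coords**2))
```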

We’re now scaling compositional interpretability to transformers and CNNs by leveraging their low-rank structure through tensor decomposition and information theory. Learn more in our latest talk!
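
As a rough sketch of what exploiting low-rank structure can look like (again illustrative NumPy, not our actual pipeline): unfolding a weight tensor along each mode and reading off its singular values, the first step of a higher-order SVD, shows how compressible the interactions are.

```python
import numpy as np

rng = np.random.default_rng(0)

# Mimic a trained layer whose interactions are secretly low-rank:
# a (4, 8, 8) weight tensor built from just r = 3 rank-one terms.
r = 3
B = sum(np.einsum('k,i,j->kij',
                  rng.normal(size=4),
                  rng.normal(size=8),
                  rng.normal(size=8)) for _ in range(r))

def unfold(T, mode):
    """Flatten tensor T into a matrix with `mode` as its rows."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# Singular values of each mode unfolding expose the low rank:
# values past index r-1 are (numerically) zero.
for mode in range(B.ndim):
    s = np.linalg.svd(unfold(B, mode), compute_uv=False)
    print(f"mode {mode}: singular values {np.round(s[:5], 3)}")
```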

news

May 12, 2026 Explore our interactive manifold viewer on Qwen 3.5! :telescope:
May 10, 2026 Bilinear Autoencoders Find Interpretable Manifolds is out! :ocean:
May 09, 2026 From Mechanistic to Compositional Interpretability is out! :triangular_ruler:
Apr 15, 2026 We are mentoring for MARS (Mentorship for Alignment Research Students)! :seedling:
Apr 14, 2026 Research using our tools was featured in TIME magazine! :newspaper:

selected publications

  1. Bilinear autoencoders find interpretable manifolds
    arXiv, May 2026
  2. From Mechanistic to Compositional Interpretability
    Ward Gauderis*, Thomas Dooms*, Steven T. Holmer, Kola Ayonrinde, and 1 more author
    arXiv, May 2026
  3. Compositionality Unlocks Deep Interpretable Models
    In Connecting Low-Rank Representations in AI, at the 39th Annual AAAI Conference on Artificial Intelligence, Nov 2024