We propose χ-net, an intrinsically interpretable architecture that combines the compositional multilinear structure of tensor networks with the expressivity and efficiency of deep neural networks. χ-nets match the accuracy of their baseline counterparts. Our novel, efficient diagonalisation algorithm, ODT, reveals linear low-rank structure in a multilayer SVHN model. We leverage this structure for formal weight-based interpretability and model compression.
@inproceedings{dooms_compositionality_2024,
  title     = {Compositionality Unlocks Deep Interpretable Models},
  author    = {Dooms, Thomas and Gauderis, Ward and Wiggins, Geraint and Oramas, Jose},
  booktitle = {Connecting Low-Rank Representations in AI: At the 39th Annual AAAI Conference on Artificial Intelligence},
  year      = {2025},
  url       = {https://openreview.net/forum?id=bXAt5iZ69l},
  urldate   = {2025-02-17},
}
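For intuition, here is a minimal sketch of the kind of multilinear building block a χ-net composes: each layer is an element-wise product of two linear maps, so a stack of such layers is a polynomial map with no element-wise nonlinearity. This is our own illustration, not code from the paper; the class name `BilinearLayer` and the dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class BilinearLayer(nn.Module):
    """Element-wise product of two linear maps: y = (Wx) * (Vx).
    Purely multilinear in the weights, with no activation function."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.W = nn.Linear(d_in, d_out, bias=False)
        self.V = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x):
        return self.W(x) * self.V(x)

# Stacking such layers composes multilinear maps, so the end-to-end
# network is a polynomial in x whose coefficient structure can be
# analyzed with tensor-network tools.
net = nn.Sequential(BilinearLayer(32, 64), BilinearLayer(64, 10))
y = net(torch.randn(8, 32))
```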
A mechanistic understanding of how MLPs perform computation in deep neural networks remains elusive. Current interpretability work can extract features from hidden activations over an input dataset but generally cannot explain how MLP weights construct features. One challenge is that element-wise nonlinearities introduce higher-order interactions and make it difficult to trace computations through the MLP layer. In this paper, we analyze bilinear MLPs, a type of Gated Linear Unit (GLU) without any element-wise nonlinearity that nevertheless achieves competitive performance. Bilinear MLPs can be fully expressed in terms of linear operations using a third-order tensor, allowing flexible analysis of the weights. Analyzing the spectra of bilinear MLP weights using eigendecomposition reveals interpretable low-rank structure across toy tasks, image classification, and language modeling. We use this understanding to craft adversarial examples, uncover overfitting, and identify small language model circuits directly from the weights alone. Our results demonstrate that bilinear layers serve as an interpretable drop-in replacement for current activation functions and that weight-based interpretability is viable for understanding deep-learning models.
@inproceedings{pearceBilinearMLPsEnable2024a,
  title     = {Bilinear {MLPs} Enable Weight-Based Mechanistic Interpretability},
  author    = {Pearce, Michael T. and Dooms, Thomas and Rigg, Alice and Oramas, Jose and Sharkey, Lee},
  booktitle = {The Thirteenth International Conference on Learning Representations},
  year      = {2025},
  url       = {https://openreview.net/forum?id=gI0kPklUKS},
  urldate   = {2025-05-07},
  eprint    = {2410.08417},
  archiveprefix = {arXiv},
}
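To make the weight-based analysis concrete, here is a small sketch of the eigendecomposition idea under the setup the abstract describes (our own construction, not the paper's released code): for a bilinear layer y = (Wx) ⊙ (Vx), the component of y along an output direction u is a quadratic form xᵀQx, and eigendecomposing the symmetrized Q exposes low-rank interaction structure. The function name and shapes below are hypothetical.

```python
import torch

def interaction_matrix(W, V, u):
    """Q such that u @ ((W @ x) * (V @ x)) == x @ Q @ x for all x.
    Equals sum_o u[o] * outer(W[o], V[o]), then symmetrized."""
    M = W.T @ torch.diag(u) @ V      # (d_in, d_in), generally asymmetric
    return 0.5 * (M + M.T)           # x^T M x depends only on sym(M)

# Eigenvectors give the input directions whose pairwise interactions
# drive output direction u; eigenvalues rank their strength, so a few
# large eigenvalues indicate low-rank structure.
d_in, d_hid = 32, 64
W, V = torch.randn(d_hid, d_in), torch.randn(d_hid, d_in)
u = torch.randn(d_hid)
eigvals, eigvecs = torch.linalg.eigh(interaction_matrix(W, V, u))
```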