Summary
All prominent building blocks in neural network design can be written as tensor networks.
Often, the only requirement is replacing ordinary activation functions with multiplicative variants.
For instance, MLPs can be replaced with bilinear layers, a commonly used multiplicative variant that admits a direct tensor representation.
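As a concrete illustration of such a multiplicative replacement, here is a minimal sketch of a bilinear layer in PyTorch; the class name BilinearLayer and the layer sizes are illustrative and not taken from the text.

```python
import torch
import torch.nn as nn

class BilinearLayer(nn.Module):
    """Multiplicative replacement for an MLP block: out = (W x) * (V x).

    The elementwise product of two linear maps is equivalent to contracting
    the input twice with a single third-order weight tensor, which is what
    makes the layer expressible as a tensor network.
    """
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.W = nn.Linear(d_in, d_out, bias=False)
        self.V = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.W(x) * self.V(x)

# Usage: a drop-in stand-in where an MLP hidden layer would otherwise sit.
layer = BilinearLayer(d_in=64, d_out=128)
y = layer(torch.randn(8, 64))   # shape (8, 128)
```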
Many modern architectures chunk their inputs into sequences in which every entry is processed the same way.
We discussed the similarities and differences between mixers, convolutions, and attention; a sketch of the comparison follows below.
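To make that comparison concrete, the following sketch (an assumption on my part, written with torch.einsum rather than any notation from the text) expresses all three as contractions over the sequence dimension: a mixer uses a free learned matrix, a convolution constrains that matrix to be Toeplitz, and attention computes it from the input.

```python
import torch

B, T, D = 2, 16, 32                      # batch, sequence length, channels (illustrative)
x = torch.randn(B, T, D)

# Mixer: a learned, input-independent T x T matrix mixes the sequence axis.
M = torch.randn(T, T)
mixer_out = torch.einsum('st,btd->bsd', M, x)

# Convolution: the same mixing, but the matrix is constrained to be a
# banded Toeplitz matrix built from a small kernel (size 3 here).
k = torch.randn(3)
C = sum(torch.diag(k[i] * torch.ones(T - abs(i - 1)), i - 1) for i in range(3))
conv_out = torch.einsum('st,btd->bsd', C, x)

# Attention: the mixing matrix is computed from the input itself
# (query, key, and value projections are omitted for brevity).
A = torch.softmax(torch.einsum('bsd,btd->bst', x, x) / D ** 0.5, dim=-1)
attn_out = torch.einsum('bst,btd->bsd', A, x)
```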
In short, this formal diagrammatic language helps us reason about the structure and composition of these building blocks. The next chapter shows another advantage: tensor networks are thoroughly studied in the literature, and sophisticated algorithms exist for computing global metrics that would otherwise be intractable.