Tensor Networks

Matrices can be seen as maps from one space to another. Tensors generalise this by mapping between an arbitrary number of spaces.

For instance, a matrix may encode a graph, mapping source -> target. A time-dependent graph, indexed by time, source -> target, can be described by a tensor. Tensors can be pictured as high-dimensional cubes of numbers, where each axis is called an index (sometimes called a mode). The number of indices is called the order of the tensor.
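As a concrete (made-up) sketch in NumPy, such a time-dependent graph is simply an array with three indices:

    import numpy as np

    # T[time, source, target]: a third-order tensor encoding a time-dependent graph.
    n_times, n_nodes = 4, 3
    T = np.zeros((n_times, n_nodes, n_nodes))
    T[0, 0, 1] = 1.0  # at time 0, an edge from node 0 to node 1
    T[2, 1, 2] = 1.0  # at time 2, an edge from node 1 to node 2

    print(T.ndim)   # 3, the order of the tensor
    print(T.shape)  # (4, 3, 3), one size per index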

Unfortunately, the number of parameters in a tensor grows exponentially with its order. This makes it intractable to work with dense tensors for many non-trivial problems.

Factorising tensors into networks

Even though matrices are only quadratic in their parameter count, huge matrices can already be impractical to work with. A common solution is to factorise or decompose them into something that is easier to handle. For instance, such a matrix \(X\) can be approximated by a product of two low-rank factors, \(X \approx A B\).
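As a minimal sketch (shapes and rank are made up), one way to obtain such factors is a truncated SVD:

    import numpy as np

    rng = np.random.default_rng(0)
    # A large matrix that happens to be (nearly) low rank.
    X = rng.normal(size=(1000, 32)) @ rng.normal(size=(32, 800))

    # Truncated SVD gives the best rank-r approximation in the Frobenius norm.
    r = 32
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    A = U[:, :r] * s[:r]   # shape (1000, 32)
    B = Vt[:r, :]          # shape (32, 800)

    # A @ B stores (1000 + 800) * 32 = 57,600 numbers instead of X's 800,000.
    print(np.linalg.norm(X - A @ B))  # ~0 here, since X really is rank 32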

There exist several generalisations of such factorisations to tensors, all of which can be described by tensor networks. Tensor networks are graphs where each node, or box, represents a tensor and each edge, or wire, represents how tensors are connected. Unlike an ordinary graph, a tensor network can have unconnected edges, which represent free indices (inputs/outputs). This graphical notation is very useful for understanding what kind of structure the described tensor has.

Tensors instantiated by deep neural networks are laughably huge, with orders well over a thousand. However, they can still be studied efficiently as highly structured tensor networks.

Introduction to tensor diagrams

The following introduces the tensor diagram notation, focussing on common intuitions.1

A matrix is denoted as a box.
It has an input and an output wire.
These can be named for clarity.

A vector is also a box but only has a single wire.
We don't distinguish between input and output vectors for simplicity.

A vector-matrix multiplication connects them.

A matrix multiplication connects two matrices.
It contracts away the hidden dimension.
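In code, the connected wire is a shared index that gets summed over; a minimal einsum sketch (shapes arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(4, 5))   # wires: i (input), j (hidden)
    B = rng.normal(size=(5, 6))   # wires: j (hidden), k (output)

    # Connecting the j-wires contracts the hidden dimension away.
    C = np.einsum('ij,jk->ik', A, B)
    assert np.allclose(C, A @ B)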

Transposing matrices changes the order of input and output. This corresponds to a vertical reflection of the diagram (hence the nook).

(Flattened) Matrices can represent a connection between inputs, just like in the attention mechanism. This shows why one side should be transposed.
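For instance, a rough sketch assuming dot-product attention scores (shapes made up): connecting queries and keys along their shared feature wire is exactly the transposed product \(Q K^\top\).

    import numpy as np

    rng = np.random.default_rng(0)
    Q = rng.normal(size=(7, 16))   # queries: (position i, feature d)
    K = rng.normal(size=(9, 16))   # keys:    (position j, feature d)

    # Both inputs expose a feature wire d; contracting it connects position i to position j.
    scores = np.einsum('id,jd->ij', Q, K)
    assert np.allclose(scores, Q @ K.T)   # one side ends up transposed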

Special matrix structure can be conveyed through diagrams. This is the identity matrix. It doesn't do anything to the input.

A diagonal matrix is written as an identity matrix, element-wise multiplied by a vector.
In this case, the multiplication is denoted as a black circle.
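A small sketch of this identity in NumPy: element-wise multiplying the identity matrix by a vector along one wire gives the diagonal matrix.

    import numpy as np

    v = np.array([1.0, 2.0, 3.0])
    I = np.eye(3)

    # Identity matrix, element-wise multiplied (by broadcasting) with the vector.
    D = I * v
    assert np.allclose(D, np.diag(v))

    # Applying D to a vector just rescales it element-wise.
    x = np.array([4.0, 5.0, 6.0])
    assert np.allclose(D @ x, v * x)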

When an isometric (think orthogonal) matrix is multiplied with its transpose, the result is the identity. We indicate such matrices with a border.

Diagrammatic equations are a powerful tool to simplify and analyse tensor networks. We write the SVD of a matrix as follows.
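A minimal NumPy check of this equation (shapes arbitrary): the factors \(U\) and \(V\) are isometries and \(S\) is diagonal.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 4))

    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    # The diagram equation: X equals isometry, diagonal, isometry.
    assert np.allclose(X, U @ np.diag(s) @ Vt)

    # Isometries contract with their own transpose to the identity.
    assert np.allclose(U.T @ U, np.eye(4))
    assert np.allclose(Vt @ Vt.T, np.eye(4))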

The number of free/open wires indicates the order of the tensor. This box represents a dense third-order tensor.

There exist other, potentially more efficient, ways to instantiate tensors. This is a diagram of a bilinear layer.
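A rough sketch of one such structured construction, assuming the common bilinear form \(h = (W x) \odot (V x)\) (the exact layer in the diagram may differ):

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_out = 8, 5
    W = rng.normal(size=(d_out, d_in))
    V = rng.normal(size=(d_out, d_in))
    x = rng.normal(size=d_in)

    # Bilinear layer (assumed form): two matrices joined by an element-wise product.
    h = (W @ x) * (V @ x)

    # The same map as one dense third-order tensor B[k, i, j] = W[k, i] * V[k, j].
    B = np.einsum('ki,kj->kij', W, V)
    assert np.allclose(h, np.einsum('kij,i,j->k', B, x, x))

    # B has d_out * d_in**2 entries; the factored form only 2 * d_out * d_in.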

Mathematical meaning

Diagrams are useful because they convey an intuitive overview of the structure of networks. However, they can also be used to understand gritty details of specific contractions.

In these diagrams, boxes are tensors, but what are wires and what does it mean to connect them? Wires represent a sum over basis vectors, for example, the standard basis vectors that are zero except for a 1 at entry \(i\).

Hence, a connection indicates that two tensors share an index (which is summed over). Following this rule, diagrams can be converted to Einstein notation, as encountered in einsums.
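As a made-up example, a chain of boxes with two open wires translates index-for-index into an einsum:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(3, 4))
    B = rng.normal(size=(4, 5))
    C = rng.normal(size=(5, 6))

    # Each wire gets a letter; connected wires share a letter and are summed,
    # while the open wires ('i' and 'l') remain as indices of the result.
    Y = np.einsum('ij,jk,kl->il', A, B, C)
    assert np.allclose(Y, A @ B @ C)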

Diagrammatic reasoning

Becoming fluent in tensor networks takes time and practice; some seemingly simple diagrams have non-trivial interpretations. However, these diagrams allow us to spatially reason about tensor operations in a way other mathematical tools don't. They are not merely a visualisation tool; they are a formal language over tensor networks. Diagrams can be seen as formulae between which equalities can hold (as illustrated in the SVD example).

The following are a few examples of specific networks with a well-known meaning. Try to interpret what each diagram represents.

Hint Connecting both sides of the same matrix implies taking a sum over equal indices.
Solution The trace of A \(\left(\sum_i A_{ii}\right)\).
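In einsum form (a small check):

    import numpy as np

    A = np.arange(9.0).reshape(3, 3)

    # Connecting a matrix's input wire to its own output wire sums the diagonal.
    assert np.isclose(np.einsum('ii->', A), np.trace(A))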

Hint This diagram can be seen as the trace of A multiplied with its own transpose.
Solution The squared Frobenius norm of matrix A \(\left(\sum_{ij} A_{ij}A_{ij}\right)\).
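Again as an einsum (a small check):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(3, 4))

    # Connecting both wires of A to a copy of itself sums the squared entries.
    sq_frob = np.einsum('ij,ij->', A, A)
    assert np.isclose(sq_frob, np.linalg.norm(A, 'fro') ** 2)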

Hint Both sides perform an element-wise multiplication instead of an inner product.
Solution The element-wise multiplication of matrix entries in A and B.
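As an einsum:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(3, 4))
    B = rng.normal(size=(3, 4))

    # Repeating 'ij' in the output keeps the indices instead of summing them.
    C = np.einsum('ij,ij->ij', A, B)
    assert np.allclose(C, A * B)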
Hint Unconnected sub-networks represent a (tensor) product.
Solution A partial identity, scaled by the inner product of V and the second input.

  1. This notation varies throughout the literature. While the underlying meaning is formal, the notation itself is often intuitive. The diagrams used here are inspired by Coecke and Kissinger (2018).