Diagonalisation
Since direct orthogonalisation uses an eigendecomposition, each tensor is already ordered by local importance. However, it's possible to do better: tensors can be ordered with respect to any point in the network (often the output).
Preliminaries
One advantage of using tensor networks is that they readily reveal structure within tensors. Mathematicians past toiled tirelessly so we could plagiarise their lemmas.
One such lemma is that the SVD of the (flattened) tensor product of two matrices equals the tensor product of their individual SVDs: decompose each factor separately, then take the tensor product of the corresponding parts.
The proof is based on the fact that the SVD is unique (up to the ordering of singular values and degenerate subspaces). Since the tensor product of the two separate SVDs is itself a valid singular value decomposition, performing the 'full' decomposition must yield an equivalent result.
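The lemma is easy to check numerically. The sketch below (a minimal verification, with arbitrary random matrices standing in for network tensors) confirms both that the tensor product of the separate SVD factors reconstructs the product, and that the singular values agree with a direct SVD once sorted:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((2, 2))

# SVD of each factor separately.
U_a, s_a, Vt_a = np.linalg.svd(A)
U_b, s_b, Vt_b = np.linalg.svd(B)

# Tensor (Kronecker) products of the corresponding parts reconstruct
# the tensor product of the matrices exactly (mixed-product property).
recon = np.kron(U_a, U_b) @ np.diag(np.kron(s_a, s_b)) @ np.kron(Vt_a, Vt_b)
assert np.allclose(recon, np.kron(A, B))

# The singular values also match a direct SVD, up to ordering.
s_full = np.linalg.svd(np.kron(A, B), compute_uv=False)
s_sep = np.sort(np.kron(s_a, s_b))[::-1]
assert np.allclose(s_full, s_sep)
```

The only caveat is the ordering: the tensor product of the singular values is not automatically sorted, which is exactly the "up to ordering" qualifier in the uniqueness argument.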
The full network can be split into a pre tensor (upstream) and a post tensor (downstream). This split shows that all duplicated inputs are fully separable and equal. Hence, computing the SVD on one such wire is sufficient to find the best global decomposition.
Due to the orthogonalisation, the pre tensor is already orthogonal, and hence already provides valid right singular vectors for the SVD.
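To make this concrete, here is a hedged sketch of the argument with matrices standing in for the flattened tensors (the names `P` for the post tensor and `Q` for the orthogonalised pre tensor are illustrative, not from the text). If the network factors as W = P·Q with Q having orthonormal rows, then an SVD of P alone yields a valid SVD of W, with the right singular vectors obtained by composing through Q:

```python
import numpy as np

rng = np.random.default_rng(1)
# Post tensor P, and an orthogonal pre tensor Q (orthonormal rows),
# standing in for the orthogonalised upstream network.
P = rng.standard_normal((4, 6))
Q = np.linalg.qr(rng.standard_normal((8, 6)))[0].T  # 6x8, orthonormal rows

W = P @ Q                                   # full network, flattened
U, s, Vt = np.linalg.svd(P, full_matrices=False)

# Composing P's right factor with Q gives valid right singular vectors of W.
Vt_W = Vt @ Q
assert np.allclose(Vt_W @ Vt_W.T, np.eye(4))       # rows stay orthonormal
assert np.allclose((U * s) @ Vt_W, W)              # valid decomposition of W
assert np.allclose(np.linalg.svd(W, compute_uv=False), s)  # same spectrum
```

This is why the orthogonalisation pays off: the pre tensor never needs to be decomposed again, only carried along.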
Hence, we only need to compute the left singular vectors and singular values from the post tensor. Fortunately, that's also easy. Multiplying the post tensor with its transpose naively would yield a huge tensor. However, since all hidden wires are equal, they can be connected, yielding the following matrix.
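As a sketch of the general trick being used here (assuming the standard Gram-matrix route to left singular vectors; `P` below is a hypothetical stand-in for the flattened post tensor): the left singular vectors and singular values of P can be read off from an eigendecomposition of P·Pᵀ, which stays small even when P's other dimension is huge.

```python
import numpy as np

rng = np.random.default_rng(2)
P = rng.standard_normal((4, 1000))  # few rows, huge column dimension

# P @ P.T is only 4x4, even though P.T @ P would be 1000x1000.
gram = P @ P.T
eigvals, U = np.linalg.eigh(gram)

# Sort descending; singular values are square roots of the eigenvalues,
# and the eigenvectors are the left singular vectors of P.
order = np.argsort(eigvals)[::-1]
s = np.sqrt(eigvals[order])
U = U[:, order]

assert np.allclose(s, np.linalg.svd(P, compute_uv=False))
```

Connecting the equal hidden wires in the network plays the same role as forming this small Gram matrix: it avoids ever materialising the large intermediate.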