A small collection of insights on the structure of embeddings (latent spaces) produced by deep neural networks.
Manifold Hypothesis: High-dimensional data sampled from natural (real-world) processes lies on or near a low-dimensional manifold embedded in the high-dimensional space.
Hierarchical Organization: Features organize hierarchically across layers – earlier layers capture low-level (small context) features while deeper layers represent increasingly abstract (large context) concepts.
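Linear Hypothesis: Neural networks represent features as directions in their activation space, so a feature’s strength corresponds to the projection of the activation onto that direction.
Superposition Hypothesis: Neural nets represent more “independent” features than a layer has neurons (dimensions) by assigning each feature a direction that is a linear combination of neurons. These directions are only approximately orthogonal, so features interfere slightly, which is tolerable when only a few features are active at once.
Universality Hypothesis: Analogous features and circuits reappear across different models trained on the same data.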
Adversarial Vulnerability: Small changes in input space can cause large shifts in embeddings and therefore also in predictions made from them, suggesting the learned manifolds have irregular geometric properties.
Neural Collapse: After extensive training, the final-layer features of each class cluster tightly around their class mean, and the network’s classification weights align with these mean directions. Within-class variation becomes minimal compared to between-class differences, creating distinct, well-separated clusters for each class.
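The Superposition Hypothesis entry above can be illustrated with a small numerical sketch. It is a toy under stated assumptions, not how any trained network arrives at its features: the feature directions here are random unit vectors rather than learned ones, and the names `num_neurons`, `num_features`, and `active` are hypothetical. It shows that far more directions than dimensions can coexist with limited interference, and that a sparse set of active features can still be read out by linear projection.

```python
import numpy as np

rng = np.random.default_rng(0)
num_neurons, num_features = 256, 1024  # more features than dimensions

# Assumption: each feature gets a random unit direction in activation space.
directions = rng.normal(size=(num_features, num_neurons))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

# Interference between two distinct features is their cosine similarity.
cosines = directions @ directions.T
off_diagonal = ~np.eye(num_features, dtype=bool)
max_interference = np.abs(cosines[off_diagonal]).max()

# A sparse input: only three (hypothetical) features are active, each at strength 1.
active = np.array([3, 100, 200])
feature_values = np.zeros(num_features)
feature_values[active] = 1.0

# The layer's activation is the weighted sum of the active feature directions.
activation = feature_values @ directions  # shape: (num_neurons,)

# Linear readout: project the activation onto every feature direction
# and keep the three strongest responses.
readout = directions @ activation
recovered = np.sort(np.argsort(readout)[-len(active):])
```

Inactive features still receive a small nonzero readout from interference; with many features active at once, that noise would grow and the readout would degrade, which is why sparsity matters in this picture.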
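One way to make the Neural Collapse entry measurable is to compare within-class scatter to between-class scatter of the final-layer features. The sketch below is a minimal illustration on synthetic data, not the metric from any particular paper: the function name `within_between_ratio` and the "loose" and "tight" feature sets are assumptions standing in for features taken early and late in training.

```python
import numpy as np

def within_between_ratio(features, labels):
    """Total within-class scatter divided by total between-class scatter."""
    global_mean = features.mean(axis=0)
    within = 0.0
    between = 0.0
    for c in np.unique(labels):
        class_feats = features[labels == c]
        class_mean = class_feats.mean(axis=0)
        # Spread of samples around their own class mean.
        within += ((class_feats - class_mean) ** 2).sum()
        # Spread of the class mean around the global mean, weighted by class size.
        between += len(class_feats) * ((class_mean - global_mean) ** 2).sum()
    return within / between

rng = np.random.default_rng(0)
num_classes, per_class, dim = 4, 50, 16
class_means = rng.normal(size=(num_classes, dim))
labels = np.repeat(np.arange(num_classes), per_class)

# Hypothetical "early training" features: wide spread around each class mean.
loose = class_means[labels] + 1.0 * rng.normal(size=(len(labels), dim))
# Hypothetical "late training" features: tight clusters around each class mean.
tight = class_means[labels] + 0.01 * rng.normal(size=(len(labels), dim))

ratio_loose = within_between_ratio(loose, labels)
ratio_tight = within_between_ratio(tight, labels)
```

A ratio approaching zero corresponds to the collapsed regime described above. On a real network, `features` would be the final-layer activations of a trained model rather than synthetic samples.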