Executive Summary
This consolidated paper unifies two previously separate works into a single treatment of universal entanglement in transformer activation space. The first contribution (from AI-25) establishes the discrimination-activation dissociation: SVD directions in multi-concept ridge regression can be concept-pure for classification (V-matrix purity > 0.96) while simultaneously carrying all concepts in their activations. The damage matrix, constructed by projecting out each direction and measuring leave-one-out accuracy loss, reveals a minimum cross-concept damage of 38.8%: no direction can be removed without damaging every concept.
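A minimal sketch of the damage-matrix construction on synthetic data. The closed-form ridge solver, the 0.5 decision threshold, and the toy dimensions are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 600, 64, 3                      # samples, hidden dim, concepts (toy sizes)

# Synthetic stand-in for transformer activations carrying k binary concepts.
Y = rng.integers(0, 2, size=(n, k)).astype(float)
X = Y @ rng.normal(size=(k, d)) + 0.5 * rng.normal(size=(n, d))

def ridge_fit(X, Y, lam=1.0):
    """Closed-form multi-output ridge: W maps activations to concept scores."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y).T  # (k, d)

def accuracy(X, Y, W):
    """Per-concept accuracy, thresholding the ridge scores at 0.5."""
    return ((X @ W.T > 0.5) == (Y > 0.5)).mean(axis=0)

W = ridge_fit(X, Y)
U, S, Vt = np.linalg.svd(W, full_matrices=False)   # rows of Vt: directions in activation space
base = accuracy(X, Y, W)

# Damage matrix: project one SVD direction out of the activations, refit,
# and record the per-concept accuracy drop.
damage = np.zeros((k, k))                          # damage[i, c] = loss for concept c
for i in range(k):
    v = Vt[i]
    X_abl = X - np.outer(X @ v, v)                 # remove direction i from activations
    damage[i] = base - accuracy(X_abl, Y, ridge_fit(X_abl, Y))
```

With real activations, a row of this matrix with no zero entries is exactly the paper's claim: ablating that direction damages every concept.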
The second contribution (from AI-26) establishes that this entanglement is geometric, not learned. Eight experiments across four transformer architectures (GPT-2 124M, Qwen-0.5B, Qwen-7B, and Qwen-7B-Instruct) show that random Gaussian projections to d ≥ 448 reproduce the learned entanglement intensity (EI = 1.50), while PCA reverses it. Superlinear amplification is confirmed: triple EI exceeds mean pairwise EI by roughly 2x. Together, the results establish that entanglement intensity is determined by the ratio d/k (hidden dimension to number of concepts), not by training or architecture.
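The random-projection result has a Johnson-Lindenstrauss flavor: a Gaussian projection to enough dimensions preserves the pairwise geometry of the activations, while projecting to the informative rank does not. A self-contained sketch of that distortion contrast (toy Gaussian data; the paper's EI metric itself is not computed here):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 3584                 # toy activations; 3584 matches Qwen-7B's hidden size

X = rng.normal(size=(n, d))

def max_distortion(X, m, rng):
    """Worst-case pairwise-distance distortion under a Gaussian projection to m dims."""
    G = rng.normal(size=(X.shape[1], m)) / np.sqrt(m)   # JL scaling: norms preserved in expectation
    Z = X @ G

    def pdist2(A):
        # squared pairwise distances via the Gram trick
        sq = (A * A).sum(1)
        return sq[:, None] + sq[None, :] - 2 * A @ A.T

    D0, D1 = pdist2(X), pdist2(Z)
    mask = ~np.eye(len(X), dtype=bool)                  # ignore zero self-distances
    return np.abs(D1[mask] / D0[mask] - 1).max()

wide = max_distortion(X, 448, rng)    # the dimension that matches learned EI in the paper
narrow = max_distortion(X, 7, rng)    # the 7-dim informative rank: geometry collapses
```

At m = 448 the distortion stays modest, so whatever geometry produces entanglement survives; at m = 7 it does not, which is consistent with the baseline EI of 0.18 reported for rank-7 projections.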
Discrimination-Activation Dissociation
Linear probing assumes that concept-separability in the classifier implies concept-separability in the activations. This paper shows that assumption is false. Using multi-concept ridge regression with SVD decomposition on Qwen 2.5-7B, directions can be concept-pure for discrimination while simultaneously carrying all concepts in their activations. The V-matrix shows what the classifier uses each direction for; the damage matrix shows what each direction actually carries. This establishes a fundamental limitation on direction-based concept editing: the geometry that supports classification is not the geometry that carries information.
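One way to make the purity side of the dissociation concrete, under the assumption that purity is read off the concept-side singular vectors of the ridge weight matrix (the summary does not give the exact definition, so this is a plausible reconstruction):

```python
import numpy as np

def concept_purity(W):
    """Purity of each SVD direction of a (k_concepts x d) weight matrix.

    Purity of direction i = largest squared concept loading of the i-th
    concept-side singular vector; 1.0 means the classifier uses that
    direction for exactly one concept.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    loads = U ** 2                                  # (k, k) concept loadings per direction
    return loads.max(axis=0) / loads.sum(axis=0)

# A weight matrix with nearly axis-aligned rows yields near-perfect purity,
# even though the underlying activations may still mix every concept.
rng = np.random.default_rng(2)
W = np.hstack([np.diag([3.0, 2.0, 1.0]), np.zeros((3, 5))]) + 0.01 * rng.normal(size=(3, 8))
purity = concept_purity(W)
```

High purity here says nothing about what the activations carry, which is exactly why the damage matrix is needed as the second measurement.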
Entanglement Is Geometric
Random Gaussian projections to 448 dimensions match learned EI (1.50); projections to the 7-dimensional informative rank yield baseline EI (0.18). PCA to 112 dimensions achieves EI 0.18 with purity 0.76 — reversing the entanglement. Concept-type independence is validated by replacing linguistic concepts with software engineering concepts (mean ratio 0.97). RLHF accelerates entanglement crystallization during training. Stratified bootstrap confidence intervals (2,000 iterations) confirm EI is significantly above zero for all four models.
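A sketch of the stratified bootstrap behind the confidence intervals, assuming each stratum (e.g. each model) is resampled independently with replacement so stratum sizes are preserved; the EI values below are illustrative toys:

```python
import numpy as np

def stratified_bootstrap_ci(values, strata, n_iter=2000, alpha=0.05, seed=0):
    """Percentile CI for the mean, resampling within each stratum."""
    rng = np.random.default_rng(seed)
    values, strata = np.asarray(values), np.asarray(strata)
    groups = [values[strata == s] for s in np.unique(strata)]
    means = np.empty(n_iter)
    for i in range(n_iter):
        # resample each stratum with replacement, keeping its size fixed
        resampled = [g[rng.integers(0, len(g), len(g))] for g in groups]
        means[i] = np.concatenate(resampled).mean()
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

# Toy per-layer EI scores for two models; an interval above zero is the
# paper's significance criterion.
ei = [1.4, 1.6, 1.5, 1.3, 1.7, 1.2, 1.5, 1.6]
model = [0, 0, 0, 0, 1, 1, 1, 1]
lo, hi = stratified_bootstrap_ci(ei, model)
```

Stratifying prevents a model with many layers from dominating the resamples, which is why the paper's 2,000-iteration intervals are comparable across the four architectures.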
Superlinear Amplification
When three concepts are probed simultaneously, the triple EI exceeds the mean pairwise EI by roughly 2x (GPT-2: 1.87x, Qwen-7B: 2.15x). Nesting two concepts into one reduces EI below the pairwise baseline, confirming that independent concept axes drive the superlinearity. This is not a measurement artifact but a structural consequence of encoding multiple concepts in a shared high-dimensional space.
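The amplification ratio itself is simple arithmetic; a hedged sketch with illustrative EI values (the paper's reported ratios are 1.87x and 2.15x):

```python
def amplification(ei_joint, ei_pairwise):
    """Ratio of joint EI (all concepts probed together) to mean pairwise EI.

    A ratio near 1.0 would mean entanglement adds linearly across pairs;
    ratios around 2x indicate superlinear amplification.
    """
    return ei_joint / (sum(ei_pairwise) / len(ei_pairwise))

# Illustrative values only, not the paper's measurements.
ratio = amplification(ei_joint=3.0, ei_pairwise=[1.5, 1.4, 1.6])  # → 2.0
```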
Key Findings
- V-matrix purity > 0.96: SVD directions are concept-pure for discrimination
- Minimum cross-concept damage 38.8%: Every direction carries every concept in activations
- Random projection reproduces entanglement: Gaussian projections to 448d match learned EI (1.50)
- PCA reverses entanglement: PCA to 112d achieves EI 0.18 with purity 0.76
- Superlinear amplification: Triple/pairwise EI ratio 1.87x-2.15x
- Cross-model consistency: All four architectures show EI > 1.0 at terminal layers
- RLHF accelerates crystallization: Instruction tuning reaches terminal EI faster
Superseded Papers
This paper consolidates and supersedes:
- AI-25: Entangled Directions — discrimination-activation dissociation and the damage matrix
- AI-26: Structural Entanglement in the Informative Subspace — eight experiments establishing entanglement as geometric
Key References
INLP: Iterative Null-Space Projection for concept erasure.
Johnson-Lindenstrauss: Extensions of Lipschitz mappings into a Hilbert space (random projections preserve geometry).
The Concentration Barrier (AI-11): effective dimensionality bounds on selectivity.
The Entanglement Theorem (AI-27): formal proof that entanglement is geometric.