Executive Summary
If one AI agent has just finished analyzing a complex medical case, could it transfer its reasoning state to a second agent by copying internal neural activations -- bypassing the bottleneck of explaining its findings in words? We tested this systematically with four transfer approaches, using Qwen 2.5-7B across 160 probes spanning four domains (medical, legal, science, code). Text beats all of them.
A three-sentence summary achieves 95% domain classification accuracy and preserves 63.7% of the sender's reasoning trajectory (measured by the receiver's ability to predict the sender's specific next words). The best activation injection adds just 1.2% to trajectory alignment. Full activation injection without text -- despite preserving representational geometry with 4x higher fidelity -- carries zero reasoning trajectory information: the receiver's perplexity is indistinguishable from having no context at all.
The strongest result comes from using activations to select input sequences rather than inject state. Priming selection closes 48.9% of the expert-baseline gap, outperforming activation injection by 5.4x. The practical recommendation is direct: design the input sequence, do not inject activations.
INLP Projection Transfer (from former AI-16)
Text baseline achieves 95% domain classification accuracy. INLP injection (36 dimensions) adds +1.9%, outperforming full activation injection (3,584 dimensions, +1.3%) through a denoising effect. Sender and receiver encode domain in nearly orthogonal directions (cosine similarity 0.27). Procrustes alignment fails (residual > 1). No shared coordinate system exists for domain-specific information.
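For concreteness, a minimal sketch of the INLP procedure (Ravfogel et al., 2020) behind the 36-dimension condition follows. It assumes sender activations X and domain labels y are already extracted; the iteration count, probe choice, and shapes are illustrative assumptions, not the study's exact configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def inlp_domain_projector(X, y, n_iters=9):
    """Iterative Nullspace Projection: repeatedly fit a linear domain probe
    on activations X (n_samples, d_model) and project out the directions it
    uses. With 4 domain classes, each iteration removes up to 4 directions,
    so 9 iterations yield roughly the 36-dimension subspace described above."""
    d = X.shape[1]
    P = np.eye(d)
    for _ in range(n_iters):
        probe = LogisticRegression(max_iter=1000).fit(X @ P, y)
        W = probe.coef_                              # (n_classes, d_model)
        # Null-space projector for the probe's row space: I - W^+ W
        P = P @ (np.eye(d) - np.linalg.pinv(W) @ W)
    return P

# The component removed by this projector, x - P @ x, carries the linearly
# decodable domain information; injecting only that component is the
# "36 dimensions" condition above.
```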
Full Activation Injection (from former AI-17)
All text-based conditions cluster at RSA ~0.11 regardless of injection bandwidth (0-3,584 dimensions). BOS + Full activation achieves RSA = 0.47 -- 4x higher. Despite this geometric advantage, KL divergence is similar across all conditions (6.97-7.35 nats). Domain reversal: text best preserves legal within-domain structure (0.23), while activations best preserve science (0.64). The two modalities carry complementary information -- text carries lexical/semantic structure, activations carry computational/geometric structure.
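The RSA numbers above can be read against a minimal sketch of the metric (Kriegeskorte et al., 2008): build a representational dissimilarity matrix (RDM) over the same probe set for sender and receiver, then rank-correlate the two. The distance metric and variable names are assumptions for illustration.

```python
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_score(acts_a, acts_b):
    """acts_*: (n_probes, d_model) activations over the same probe set.
    pdist returns the condensed upper triangle of the pairwise RDM."""
    rdm_a = pdist(acts_a, metric="correlation")
    rdm_b = pdist(acts_b, metric="correlation")
    return spearmanr(rdm_a, rdm_b).correlation

# RSA ~0.11 (text conditions) vs 0.47 (BOS + full activation) corresponds
# to this score computed sender-vs-receiver.
```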
Continuation Perplexity (from former AI-18)
Original context achieves PPL = 1.59 (near-deterministic). Text summary achieves PPL = 3.05, closing 63.7% of the perplexity gap between no context and the original context. Full activation without a text scaffold achieves PPL = 5.60, indistinguishable from no context (5.62). Despite its 4x geometric fidelity, activation injection without text carries zero reasoning trajectory information. Continuation perplexity -- measuring word-by-word prediction alignment with the sender's actual next tokens -- is itself a methodological contribution.
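A hedged sketch of the continuation-perplexity measurement: score the sender's actual continuation tokens under the receiver, conditioned on whatever context the transfer condition provides. The Hugging Face model identifier and scoring details are assumptions for illustration, not the study's exact harness.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B",
                                             torch_dtype=torch.bfloat16)
model.eval()

@torch.no_grad()
def continuation_ppl(context: str, continuation: str) -> float:
    ctx = tok(context, return_tensors="pt").input_ids
    cont = tok(continuation, return_tensors="pt",
               add_special_tokens=False).input_ids
    ids = torch.cat([ctx, cont], dim=1)
    logits = model(ids).logits
    # Each position's logits predict the *next* token, so the continuation
    # tokens are scored by logits at positions ctx_len-1 .. end-2.
    cont_logits = logits[0, ctx.shape[1] - 1 : -1]
    nll = torch.nn.functional.cross_entropy(cont_logits, cont[0])
    return float(torch.exp(nll))

# PPL 1.59 / 3.05 / 5.60 / 5.62 are this quantity averaged over probes,
# with the context swapped per condition.
```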
Activation-Guided Input Selection (from former AI-19)
Priming selection -- choosing the right structured input sequence based on activation-space proximity -- closes 48.9% of the expert-baseline gap. Centroid injection closes 9.1%. Socratic scaffolding worsens performance by 19.9%. Science priming is selected for 77% of probes, achieving 86% gap closure for medical and science but only 22-27% for legal and code. The mechanism is processing-style alignment rather than domain matching: the selected priming matches the probe's domain only 15.6% of the time.
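A minimal sketch of the selection rule, assuming a small library of candidate priming sequences with precomputed activation centroids at the tap layer; the names and the cosine criterion are illustrative assumptions.

```python
import numpy as np

def select_priming(probe_act, priming_library):
    """probe_act: (d_model,) mean activation of the probe at the tap layer.
    priming_library: dict mapping priming name -> (d_model,) centroid."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    # Pick the priming whose centroid is nearest in cosine terms; the winner
    # is then prepended as ordinary text input, with no state injection.
    return max(priming_library, key=lambda name: cos(probe_act, priming_library[name]))
```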
Key Findings
- Geometry-function dissociation: 4x higher geometric fidelity (RSA) translates to zero improvement in reasoning trajectory alignment (PPL)
- Text preserves function, activations preserve geometry: Text achieves low perplexity despite low geometric similarity (RSA ~0.11); activations preserve geometry (RSA 0.47) without any functional alignment
- Selection outperforms injection 5.4x: Using activations to choose the right input closes 48.9% of the gap; injecting activations closes 9.1%
- 17-layer overwrite: Injection at layer 10 is dominated by the 17 subsequent text-conditioned attention layers (see the sketch after this list)
- Domain reversal: Text best preserves legal structure; activations best preserve science structure
- Bandwidth ceiling at ~100 dimensions: Additional injection bandwidth beyond 100 PCA components adds no information
- Propositional framing is counterproductive: Analytical scaffolds ("the domain is X") perform worse than no coordination
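To make the overwrite and bandwidth findings concrete, here is a hedged sketch of the injection mechanism itself: a forward hook that replaces the receiver's layer-10 hidden state with the sender's activation, optionally truncated to k PCA components (the "bandwidth"). The module path and hook details assume a Qwen2-style transformers model and are not the study's exact code.

```python
import torch

def make_injection_hook(sender_act, pca_basis=None, k=100):
    """sender_act: (d_model,) sender activation at the tap position.
    pca_basis: optional (d_model, d_model) PCA components (rows = directions)."""
    if pca_basis is not None:
        top = pca_basis[:k]                      # keep the top-k directions
        sender_act = top.T @ (top @ sender_act)  # project and reconstruct
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden[:, -1, :] = sender_act            # overwrite the final position
        return output
    return hook

# handle = model.model.layers[10].register_forward_hook(make_injection_hook(act))
# ...run the receiver's forward pass...
# handle.remove()
# The 17 decoder layers above layer 10 then re-mix this state with
# text-conditioned attention, which is what erodes the injected signal.
```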
Limitations
- Same-model transfer (Qwen 2.5-7B to itself) -- the best case for activation injection; cross-model transfer faces steeper obstacles
- Single injection layer (10) -- multi-layer or terminal-layer injection might maintain signal through the forward pass
- Continuation perplexity is a proxy for reasoning alignment, not a direct measure of task completion
- Resource asymmetry: ~150 tokens of text vs 36 floats of injection -- deliberately unfair to test whether injection adds value atop text
- Four domains, one model architecture, one scale
Key References
- McEntire (2026) -- Universal Entanglement in Transformer Activation Space (parent paper)
- Kornblith et al. (2019) -- Similarity of Neural Network Representations Revisited (CKA)
- Kriegeskorte et al. (2008) -- Representational Similarity Analysis
- Ravfogel et al. (2020) -- Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection (INLP)