Entanglement Under Fine-Tuning: Architecture-Dependent Collapse, Scale Thresholds, and Entanglement-Optimal Adaptation

Fine-tuning can destroy entanglement geometry — but only in specific architectures at sufficient scale

Absorbs: AI-28: Entanglement-Optimal Fine-Tuning — block-diagonal LoRA, CSR, cross-family replication; AI-29: Selective Disentanglement — architecture-dependent collapse, scale ladder, mechanism tests

Executive Summary

This consolidated paper unifies two previously separate works on how fine-tuning interacts with structural entanglement in transformer activation space. The first contribution (from AI-28) demonstrates that entanglement geometry should be exploited, not fought: block-diagonal LoRA adapters aligned with entanglement structure achieve 48.2% pass@1 on HumanEval+, versus 36.6% for concept-isolating adapters. An 8-seed strong-intervention replication reveals that B3 (code+NL fine-tuning) drives entanglement intensity (EI) to zero in Qwen-32B, a complete phase transition by step 3,500. Cross-family replication on CodeLlama-7B and DeepSeek-Coder-6.7B shows the collapse is Qwen-specific.

The second contribution (from AI-29) establishes that the collapse is both architecture-dependent and scale-dependent. Four architectures at ~7B show four qualitatively different responses to the same B3 protocol: CodeLlama increases EI (+27-39%), DeepSeek modestly decreases it (-14%), Mistral is unchanged (+0.4%), and Qwen-7B partially decreases (-32%). A scale ladder within the Qwen family reveals graded susceptibility: -32% at 7B, -93% at 14B, -100% at 32B, with a phase boundary between 14B and 32B.

Entanglement-Optimal Adaptation

Block-diagonal LoRA adapters that respect entanglement geometry outperform adapters that attempt to isolate concept subspaces. B2 (entanglement-aligned) achieves 48.2% on HumanEval+ versus B3 (concept-isolating) at 36.6%, despite comparable parameter counts. B4 (complement-subspace regularization, CSR) achieves the smallest EI change, with a 95% CI spanning zero, preserving entanglement structure while still enabling adaptation. Crosstalk-guided companion selection minimizes cross-concept interference during fine-tuning.
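The paper's actual adapter construction is in the full text; as a rough illustration of the idea, an entanglement-aligned block-diagonal LoRA update can be sketched by giving each block of hidden dimensions its own low-rank factor pair, so the update never couples dimensions across blocks. All names, shapes, and the initialization below are illustrative assumptions, not the paper's code:

```python
import numpy as np

def block_diagonal_lora_delta(blocks, rank_per_block, rng=None):
    """Build a LoRA-style low-rank update delta_W whose nonzero pattern is
    block-diagonal: each block couples only the hidden dimensions assigned
    to one entanglement-aligned group.

    blocks: list of index arrays partitioning the hidden dimension.
    """
    rng = np.random.default_rng(rng)
    d = sum(len(b) for b in blocks)
    delta = np.zeros((d, d))
    for idx in blocks:
        idx = np.asarray(idx)
        n = len(idx)
        # Per-block low-rank factors; standard LoRA would initialize B to
        # zero, but random B makes the block structure visible here.
        A = rng.normal(scale=0.02, size=(rank_per_block, n))
        B = rng.normal(scale=0.02, size=(n, rank_per_block))
        delta[np.ix_(idx, idx)] = B @ A
    return delta

# Two entanglement-aligned blocks over an 8-dim hidden space (illustrative).
blocks = [np.arange(0, 4), np.arange(4, 8)]
dW = block_diagonal_lora_delta(blocks, rank_per_block=2, rng=0)
# Off-diagonal blocks are exactly zero: no cross-block coupling.
assert np.allclose(dW[:4, 4:], 0.0) and np.allclose(dW[4:, :4], 0.0)
```

The design point is that total rank budget is comparable to an unconstrained adapter, but every rank is spent within a block rather than across the entanglement boundary.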

Architecture-Dependent Collapse

B3 drives EI to zero in all 8 seeds of Qwen-32B while preserving domain-classification accuracy (0.90-0.98). The same protocol applied to four other architectures produces qualitatively different responses. The non-degeneracy-margin hypothesis is refuted as the mechanism: Qwen-32B has the largest margin (1.78) yet is the only model that collapses. The pre-training-composition hypothesis is likewise unsupported: all models show similar cross-domain activation structure. Scale dependence is confirmed within the Qwen family, with a qualitative phase boundary between 14B and 32B.

Key Findings

  • Entanglement-aware beats concept-isolating: B2 achieves 48.2% vs B3 at 36.6% on HumanEval+
  • B3 drives EI to zero in Qwen-32B: All 8 seeds converge to EI = 0.000 by step 3,500 — a sharp phase transition
  • Architecture-specificity: Four families at ~7B show four qualitatively different responses to identical protocol
  • Scale ladder: -32% at 7B, -93% at 14B, -100% at 32B within Qwen family
  • CSR preserves structure: B4 achieves smallest EI change with 95% CI spanning zero
  • Probe validation: Three independent probe sets confirm collapse is geometric, not a measurement artifact
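The HumanEval+ scores above are pass@1 figures; these are conventionally computed with the unbiased pass@k estimator introduced alongside HumanEval (Chen et al., 2021). A minimal sketch:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator (Chen et al., 2021): probability that at
    least one of k samples drawn without replacement from n generations,
    of which c are correct, passes the tests."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 3 correct out of 10 generations, pass@1 is the per-problem success
# rate; averaging over problems yields benchmark scores like 48.2%.
assert abs(pass_at_k(10, 3, 1) - 0.3) < 1e-12
```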

Superseded Papers

This paper consolidates and supersedes:

  • AI-28: Entanglement-Optimal Fine-Tuning
  • AI-29: Selective Disentanglement

Key References

  • McEntire (2026) — Structural Entanglement (AI-26): establishes the geometric phenomenon this paper exploits
  • McEntire (2026) — Entangled Directions (AI-25): discovers the discrimination-activation dissociation
  • McEntire (2026) — The Entanglement Theorem (AI-27): formal proof of the geometric mechanism
  • Hu et al. (2022) — LoRA: Low-Rank Adaptation of Large Language Models
