# Intuition emerges in Maximum Caliber models at criticality
**Authors**: Lluís Arola-Fernández
> Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (CSIC-UIB), Campus UIB, 07122 Palma de Mallorca, Spain
> Departament d’Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili, 43007 Tarragona, Catalonia, Spain (current address)
(September 26, 2025)
## Abstract
The question of whether large predictive models merely parrot their training data or produce genuine insight lacks a physical explanation. This work reports a primitive form of intuition that emerges as a metastable phase of next-token prediction under future path-entropy maximization. The intuition mechanism is discovered via mind-tuning, the minimal principle that imposes Maximum Caliber in predictive models with a temperature-like control parameter $\lambda$. Training on random walks in deterministic mazes reveals a rich phase diagram: imitation (low $\lambda$), rule-breaking hallucination (high $\lambda$), and a fragile in-between window exhibiting strong protocol-dependence (hysteresis) and multistability, where models spontaneously discover novel goal-directed strategies. These results are captured by a mechanistic low-dimensional theory and frame intuition as an emergent property at the critical balance between memorizing what is and wondering what could be.
Introduction.— The rise of large-scale predictive models is reshaping artificial intelligence and transforming science and society. This progress is built upon a dominant scaling paradigm: pre-training autoregressive neural networks [1] with enormous parameter counts on vast volumes of data [2] using massive compute resources [3]. When coupled with powerful search at inference time [4], this approach has yielded impressive performance in complex games [5], medical diagnosis [6] and algorithmic discovery [7]. Yet, the brute-force solution does not match the elegant efficiency of natural intelligence, which discovers intuitive shortcuts and novel, creative strategies from sparse data without rewards [8]. This contrast sharpens a foundational debate: are these models showing sparks of artificial general intelligence (AGI) [9], or are they “stochastic parrots” [10] that leverage vast experience to create an illusion of thought [5, 11]? While often addressed via complex reasoning benchmarks [12], the paradigm’s limits can be distilled into a simple Gedankenexperiment (Fig. 1).
<details>
<summary>Figure1.png Details</summary>

Black-and-white geometric maze graphic: thick, interlocking rectilinear corridors with sharp right angles and figure-ground ambiguity; no text, labels, or data.
</details>
Figure 1: Gedankenexperiment on emergent reasoning. A minimal environment abstracts a reasoning task into its essential components: a constrained space (a maze) and a hidden, optimal solution (to escape). The reader’s own intuition immediately grasps the task, yet a standard predictive model trained on random walk trajectories (i.e., non-intelligent data without rewards) will never discover it.
This work provides a physical explanation for this leap. We introduce mind-tuning, a simple principle that balances next-token prediction against future path-entropy maximization with a temperature-like parameter $\lambda$ . To our knowledge, mind-tuning is the minimal implementation of the Maximum Caliber (MaxCal) principle [13, 14, 15] compatible with autoregressive training. It reveals the emergence of a fragile metastable phase, within a narrow temperature window between imitation and hallucination regimes, that is reminiscent of intuition.
While our intuition mechanism points toward a horizon of diverse futures to explore, the prevailing paradigm remains blind, fixated only on predicting the next token. Constrained path-entropy maximization is already implicit in intrinsic motivation frameworks [16] like Causal Entropic Forces [17], Active Inference [18], Empowerment [19], or the Maximum Occupancy Principle [20]. Yet, a physical basis for such emergent behavior in pure predictive models has remained elusive. The metastable regime reported here, bounded by distinct entropy- and energy-driven transitions with strong hysteresis and multistability, explains why emergent reasoning is both rare and protocol-dependent. Furthermore, the high-dimensional mechanisms behind this phenomenology are captured analytically by a low-dimensional theory.
This perspective casts intelligence as a state of computational matter [21], building on a rich history of minimal models for emergent cognitive behavior, from Hopfield’s memory [22] and Kuramoto’s synchronization [23] to phenomena in deep learning like double-descent [24], grokking [25], neural collapse [26], symmetry-breaking [27], and collective learning [28], often analyzed through spin-glass analogies [29] and phase diagrams [30, 28]. The phase-transition picture is key to research showing that intelligent systems may operate near a critical point, at the “edge of chaos” [31, 32, 33]. At criticality, fluctuations and system responsiveness peak [31, 34], creating the ideal conditions for the leap from mimicry to insight. In the learning problem, our theory points toward a critical scaling axis driven by the system’s intrinsic dynamics and suggests that current models operate in a suboptimal imitation phase, lacking the intuition that a physical mechanism unlocks.
Mind-tuning.— We focus on reasoning problems solvable by generating trajectories $z=(x_{0},a_{0},x_{1},a_{1},\dots)$ . The system’s behavior is governed by a policy $\pi_{\theta,\beta}$ , a neural network with parameters $\theta$ that maps a data history $h_{t}=(x_{0},a_{0},\dots,x_{t})$ to a probability distribution over a discrete set of actions $\mathcal{A}$ via a softmax function
$$
\pi_{\theta,\beta}(a_{t}\!\mid\!h_{t})=\frac{e^{\beta\,\ell_{\theta}(h_{t},a_{t})}}{\sum_{a^{\prime}\in\mathcal{A}}e^{\beta\,\ell_{\theta}(h_{t},a^{\prime})}}, \tag{1}
$$
where $\ell_{\theta}$ are the network’s output logits and $\beta$ controls the policy’s stochasticity. This general setting includes state-decision spaces, standard autoregressive models where histories contain tokens, and other representations (see SM Sec. S1 for implementation details).
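As a minimal sketch of Eq. (1), the softmax policy can be written directly; the tabular `logits` array below is an illustrative stand-in for the network outputs $\ell_{\theta}(h_{t},a)$ at a fixed history, not the paper's implementation.

```python
import numpy as np

def policy(logits, beta):
    """Softmax policy of Eq. (1): pi(a|h) = exp(beta*l_a) / sum_a' exp(beta*l_a').
    `logits` stands in for the network outputs at a fixed history h_t."""
    z = beta * (np.asarray(logits, dtype=float) - np.max(logits))  # shift for stability
    w = np.exp(z)
    return w / w.sum()

# beta -> 0 gives a uniform (maximally stochastic) policy;
# large beta concentrates probability on the argmax (greedy) action.
print(policy([1.0, 2.0, 3.0], beta=0.0))   # uniform over the 3 actions
print(policy([1.0, 2.0, 3.0], beta=50.0))  # nearly one-hot on the last action
```

The stability shift leaves the distribution unchanged, since the softmax is invariant under a common offset of the logits.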
To isolate the intuition mechanism, we assume an offline, imperfect setting [35]: the model never interacts with the environment, has no external rewards, and learns from a dataset $\mathcal{D}$ of non-optimal histories. How can a purely predictive model discover a better solution than what it has seen? By biasing prediction toward futures with high causal path diversity, as prescribed by the Maximum Caliber (MaxCal) principle [13]: among all dynamics consistent with known constraints, prefer those that maximize the entropy of trajectories.
The most unbiased learning objective that imposes MaxCal is the free-energy-like functional
$$
\mathcal{F}_{\lambda,\beta,\tau}(\theta)=\mathcal{E}_{\beta}(\theta)-\lambda\mathcal{H}_{\tau,\beta}(\theta), \tag{2}
$$
where $\lambda\!\geq\!0$ is an effective temperature controlling the energy-entropy trade-off. The first term is the standard Cross-Entropy, or negative log-likelihood ($\mathcal{E}$), measuring the cost of imitating the training data
$$
\mathcal{E}_{\beta}(\theta)=\left\langle-\log\pi_{\theta,\beta}(a_{t}|h_{t})\right\rangle_{(h_{t},a_{t})\in\mathcal{D}}. \tag{3}
$$
This energy $\mathcal{E}$ is traded against the causal path-entropy $\mathcal{H}$ , a Shannon entropy of self-generated futures up to a horizon of length $\tau$
$$
\mathcal{H}_{\tau,\beta}(\theta)=\left\langle\frac{1}{\tau}\left\langle-\ln P(z_{\text{future}}|h_{t})\right\rangle_{z_{\text{future}}\sim\pi_{\theta,\beta}}\right\rangle_{h_{t}\in\mathcal{D}}. \tag{4}
$$
Eq.(4) is estimated over the cone of futures induced by the model itself (see SM Sec. S2B for entropy calculations), making the objective function inherently subjective and self-referential, as the internal beliefs dynamically shape the learning landscape. The gradient update
$$
\theta(t+1)\leftarrow\theta(t)+\eta[{-\nabla_{\theta}\mathcal{E}_{\beta}(\theta)}+{\lambda\nabla_{\theta}\mathcal{H}_{\tau,\beta}(\theta)}] \tag{5}
$$
frames learning as a competition between prediction and causal entropic forces acting on the system’s degrees of freedom, i.e., the network weights. To our knowledge, this self-contained mechanism is the minimal MaxCal implementation compatible with prevalent offline auto-regressive training. Unlike surprise-minimization [36, 37], here the entropic term rewards keeping plausible futures open, pulling toward the adjacent possible [38], without environment interaction [19, 20, 39]. The framework also admits a Bayesian interpretation [40, 41]: standard auto-regressive training uses flat priors, whereas in mind-tuning the data likelihood filters an optimistic entropic prior over futures with high diversity.
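A minimal numerical sketch of the objective in Eqs. (2)-(4), under simplifying assumptions: a tabular, memoryless toy policy (no history dependence), and a Monte Carlo estimate of the path-entropy over futures sampled from the policy itself. All names are illustrative, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits, beta=1.0):
    z = beta * (np.asarray(logits, dtype=float) - np.max(logits))
    w = np.exp(z)
    return w / w.sum()

def cross_entropy(logits, beta, data_actions):
    """Eq. (3): mean negative log-likelihood of the dataset's actions."""
    p = softmax(logits, beta)
    return -np.mean(np.log(p[data_actions]))

def path_entropy_mc(logits, beta, tau, n_paths=20000):
    """Eq. (4): (1/tau) * E[-log P(z_future)] over futures of length tau
    sampled from the policy itself (memoryless toy, so P factorizes per step)."""
    p = softmax(logits, beta)
    log_p_path = np.zeros(n_paths)
    for _ in range(tau):
        a = rng.choice(len(p), size=n_paths, p=p)
        log_p_path += np.log(p[a])
    return -log_p_path.mean() / tau

# Free-energy-like objective of Eq. (2) for a uniform 4-action policy
logits = np.zeros(4)
data = rng.integers(0, 4, size=200)   # stand-in for random-walk actions
E = cross_entropy(logits, beta=1.0, data_actions=data)
H = path_entropy_mc(logits, beta=1.0, tau=10)
lam = 0.1
F = E - lam * H                        # minimized by the update of Eq. (5)
```

For the uniform policy both terms reduce to $\ln|\mathcal{A}|$, a useful sanity check; in the actual scheme the gradient of $F$ with respect to $\theta$ drives the update of Eq. (5).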
Experiments.— We test this principle in the minimal sandbox of the Gedankenexperiment (Fig. 1). A model is trained on constrained random-walk trajectories, which respect the maze walls but contain no intelligent strategies for escaping. Sweeping the parameter $\lambda$ yields a rich phase diagram, with clear transitions in both genotype (Fig. 2 A) and phenotype (Fig. 2 B) metrics.
<details>
<summary>Figure2.png Details</summary>

Five-panel figure. (A) Cross-entropy $\mathcal{E}_{\lambda}$ (red) and path-entropy $\mathcal{H}_{\lambda}$ (blue) versus $\lambda$ (log scale, $10^{-3}$ to $10^{1}$): $\mathcal{H}$ rises sigmoidally while $\mathcal{E}$ stays flat near 0.83 before rising sharply; the curves cross near $\lambda \approx 0.2$ and converge near 1.4 at high $\lambda$. Inset: fluctuations $\sigma$ versus $\lambda$, with a cross-entropy peak near $\lambda \approx 0.5$. (B) Normalized MFPT (black) and WHR (red) versus $\lambda$: MFPT dips to a minimum near $\lambda \approx 10^{-1}$, where the shaded intuition likelihood $P_{\text{intuition}}$ peaks, while WHR is near zero at low $\lambda$ and saturates near 0.9 at high $\lambda$. (C-E) Maze trajectory schematics for the Imitation, Intuition, and Hallucination phases: a single clean path, an efficient path with localized exploration near the exit, and dense chaotic coverage of the maze, respectively.
</details>
Figure 2: Experimental phase diagram. Sweeping $\lambda$ reveals three behavioral phases. (A) Genotype metrics: Cross-Entropy ($\mathcal{E}$) and causal path-entropy ($\mathcal{H}$). Inset: steady-state fluctuations $\sigma$ over different initial realizations as a function of $\lambda$. (B) Phenotype metrics: Mean First Passage Time (MFPT), Wall Hit Ratio (WHR) and intuition likelihood (see SM Sec. S4B). (C-E) Example trajectories for each phase: (C) Imitation, (D) Intuition, and (E) Hallucination.
For low $\lambda$ , the system is in an imitation phase: cross-entropy is low, path-entropy is low, and trajectories reproduce the suboptimal random walks from the data, leading to a high Mean First Passage Time (MFPT) to the exit (Fig. 2 C). For high $\lambda$ , the entropic term dominates and the system enters a hallucination phase: cross- and path-entropy are high; maze rules are broken to maximize path diversity, and the Wall Hit Ratio (WHR) increases sharply (Fig. 2 E). Between these two regimes lies a narrow intuition phase, where the trade-off between $\mathcal{E}$ and $\mathcal{H}$ yields an emergent strategy: the model discovers the shortest legal path to the exit (Fig. 2 D), achieving minimal MFPT with zero WHR. The separation between the fluctuation peaks of $\mathcal{E}$ and $\mathcal{H}$ (Fig. 2 A inset) reveals distinct entropy- and energy-driven phase boundaries.
<details>
<summary>Figure3.png Details</summary>

Four-panel figure comparing forward (solid) and backward (dashed) sweeps of $\lambda$ (log scale, $10^{-2}$ to $10^{2}$). (A) $\mathcal{E}_{\lambda}$ (red) and $\mathcal{H}_{\lambda}$ (blue): the backward curves undergo their transitions at lower $\lambda$ than the forward ones, opening a hysteresis loop; all four converge near 1.4 at high $\lambda$. Inset: random baseline with a softer transition and lower plateau. (B, C) MFPT (black) and WHR (red) for the forward and backward sweeps: the forward sweep shows an MFPT minimum with low WHR inside a shaded window near $\lambda \approx 10^{-1}$ to $10^{0}$, absent in the backward sweep, where WHR rises more abruptly. (D) Mean weight $\langle w \rangle$: the forward branch stays near 1 until collapsing near $\lambda \approx 10^{1}$, while the backward branch collapses near $\lambda \approx 10^{0}$, signalling bistability and path-dependence. Inset: random baseline collapsing near $\lambda \approx 10^{0}$.
</details>
Figure 3: Hysteresis and protocol-dependence. Comparing a forward (solid) and backward (dashed) sweep of $\lambda$ reveals that the intuitive state is stable once found. (A) Hysteresis loop in genotype metrics ( $\mathcal{E},\mathcal{H}$ ). (B, C) Phenotype for the forward and backward sweeps, respectively, with the forward sweep showing a wider intuition window. (D) The mean network weight $\langle w\rangle$ acts as an order parameter capturing the system’s bistability. Insets show baselines without protocol.
Operationally, this critical learning phase maximizes future path-entropy with minimal cross-entropy, enabling novel, goal-directed behavior at inference without interaction or explicit rewards. Reaching this phase depends on data quality and model complexity, requiring a sufficiently large future horizon and adequate model capacity (see SM Sec. S3 for a parametric study). The fragility of the mechanism is tied to multistability, as observed when applying adiabatic protocols that smoothly sweep the control parameter $\lambda$ (Fig. 3). A large hysteretic loop appears in the genotype metrics (A), with behavioral consequences in the phenotype: while a forward sweep from $\lambda\approx 0$ opens the intuition window, with low MFPT and low WHR (B), a backward sweep starting from high $\lambda$ does not reach the desired phase (C). The bistability is captured by an effective order parameter, the mean network weight, which remains in an ordered intuitive state once the system has been guided there (D). The adiabatic protocol shows that a self-referential fine-tuning from imitation to controlled imagination allows the system to stabilize in a metastable phase, a process that motivates the term mind-tuning.
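The protocol-dependence can be illustrated on a toy bistable landscape (a tilted double well, a generic stand-in and not the paper's actual free energy): warm-started gradient descent while sweeping the tilt parameter forward versus backward settles in different wells at the same parameter value.

```python
import numpy as np

def relax(m0, lam, steps=2000, eta=0.05):
    """Gradient descent on a toy tilted double well F(m) = m^4/4 - m^2/2 - lam*m,
    an illustrative stand-in for a multistable free-energy landscape."""
    m = m0
    for _ in range(steps):
        m -= eta * (m**3 - m - lam)   # dF/dm
    return m

lams = np.linspace(-1.0, 1.0, 41)

# Forward sweep: warm-start each relaxation from the previous minimum
m = relax(-1.0, lams[0])
forward = []
for lam in lams:
    m = relax(m, lam)
    forward.append(m)

# Backward sweep, starting from the opposite well at high lam
m = relax(1.0, lams[-1])
backward = []
for lam in lams[::-1]:
    m = relax(m, lam)
    backward.append(m)
backward = backward[::-1]
# At lam = 0 the two protocols end in different wells: hysteresis.
```

The two branches only merge beyond the spinodal points where one well disappears, mimicking the protocol-dependent window of Fig. 3.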
Effective theory.— The phenomenology of mind-tuning emerges from a high-dimensional, multistable free-energy landscape. We capture the essential mechanism in a scalar order parameter $m\in[0,1]$ , representing the model’s rationality, and define a Boltzmann policy with an effective potential $U_{m}(a)$ :
$$
p_{m,\beta}(a|h_{t})=\frac{e^{-\beta U_{m}(a)}}{\sum_{a^{\prime}\in\mathcal{A}}e^{-\beta U_{m}(a^{\prime})}}. \tag{6}
$$
Actions, or decisions, are classified into optimal $a^{*}$, rational-but-suboptimal $a^{r}$, and non-rational $a^{n}$; $m_{D}$ is a free parameter representing the training data’s rationality. The effective costs,
$$
U(a^{*})=0,\qquad U(a^{r})=\frac{\max(0,m-m_{D})}{1-m_{D}},\qquad U(a^{n})=m, \tag{7}
$$
are designed to create a trade-off: as the model’s rationality $m$ improves beyond the data’s, the cost of suboptimal-but-legal actions grows, forcing a choice between true optimality and rule-breaking. For the simple Markovian maze with a small state-space, the free energy $\mathcal{F}_{\lambda}(m)$ can be computed analytically (see SM Sec. S4A). For a given $\lambda$ , one can also explore the learning dynamics in this landscape by sampling rationality states $m$ from the equilibrium distribution $P(m)\propto e^{-\hat{\beta}\mathcal{F}(m)}$ , where the inverse temperature $\hat{\beta}$ controls the exploration-exploitation trade-off, modeling stochasticity during gradient descent.
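The effective potentials and the resulting policy can be sketched numerically. The following is a minimal illustration of Eqs. (6)-(7), assuming a single representative action per class and the reference value $m_{D}=0.7$ used in Fig. 4; the function names are ours, not the paper's code:

```python
import numpy as np

def effective_costs(m, m_D=0.7):
    """Effective potentials U_m(a) for the three action classes (Eq. 7)."""
    U_opt = 0.0                               # optimal action a*
    U_rat = max(0.0, m - m_D) / (1.0 - m_D)   # rational-but-suboptimal a^r
    U_non = m                                 # non-rational a^n
    return np.array([U_opt, U_rat, U_non])

def boltzmann_policy(m, beta=5.0, m_D=0.7):
    """Boltzmann policy over the action classes (Eq. 6)."""
    U = effective_costs(m, m_D)
    w = np.exp(-beta * U)
    return w / w.sum()

# At m = m_D rational actions are still free (U(a^r) = 0), so a* and a^r
# are equally likely; at m = 1 the costs of a^r and a^n coincide, forcing
# the choice between true optimality and rule-breaking described above.
print(boltzmann_policy(0.7))   # U = [0, 0, 0.7]
print(boltzmann_policy(1.0))   # U = [0, 1, 1]
```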
This effective theory qualitatively reproduces the experimental phase diagram, including the transitions in both genotypic (Fig. 4 A) and phenotypic metrics (Fig. 4 B). The underlying mechanism is revealed by exploring the minima of the free-energy landscape, found by solving $\partial\mathcal{F}_{\lambda}(m)/\partial m=0$ . This analysis confirms a smooth, entropy-driven transition followed by an abrupt, first-order, energy-driven one, creating a bistable region where intuition ( $m>m_{D}$ ) and hallucination ( $m\ll m_{D}$ ) coexist (Fig. 4 C). Intriguingly, the theory further predicts a more elusive inspiration phase: a third stable solution with $m\approx 1$ , associated with a state of true creative insight. This strategy departs abruptly from the data and represents internalized understanding. Unlike the subtle intuitive state, which often requires a high inference $\beta$ to be executed without error, this inspired solution would be robust even under a noisy policy. Yet, it is hidden within a tiny basin of attraction masked by the dominant hallucination phase (see SM Sec. S4C). These predictions point to a very rich phase diagram, in which intuition may be the trigger of even more exotic phenomena.
<details>
<summary>Figure4.png Details</summary>

### Visual Description
Three panels vs. λ on a logarithmic scale (10⁻³ to 10³). (A) The mean cross-entropy ⟨E_λ⟩ and path-entropy ⟨H_λ⟩ rise sigmoidally from a low plateau to a common high plateau around λ ≈ 1; an inset of their first derivatives peaks at the critical point, with the path-entropy beginning its ascent slightly earlier. (B) The normalized MFPT drops sharply before the WHR rises, and the shaded p_intuition region peaks in between (roughly 10⁻² < λ < 10⁰), where MFPT is low and WHR has not yet risen. (C) Minima of the rationality landscape vs. λ against the reference m_d = 0.7: the global minimum m* sits near 0.7, rises slightly approaching λ ≈ 1, then drops discontinuously toward 0, while secondary minima m** and m*** appear at larger λ.
</details>
Figure 4: Theoretical predictions. The low-dimensional model reproduces the experimental findings. (A) Theoretical $\mathcal{E}$ and $\mathcal{H}$ vs. $\lambda$ . (B) Corresponding MFPT and WHR. (C) Minima of the free-energy landscape vs the control parameter $\lambda$ . The plot reveals coexisting stable states ( $m^{*},m^{**},m^{***}$ ) and a first-order transition where the global minimum jumps discontinuously, explaining the observed hysteresis.
Accessing these different cognitive phases requires navigating a complex landscape. Indeed, the observed hysteresis and the success of the adiabatic protocol are explained by this multistability. The analytical phase diagram (Fig. 4 C) shows that slowly increasing $\lambda$ is a safe route to guide the system into the intuition basin of attraction. In Bayesian terms, it first grounds the model with the data likelihood before introducing the entropic prior. Reaching more exotic phases in the landscape, like the predicted inspiration state, would likely demand more complex, non-equilibrium protocols.
Discussion.— High-quality human data can carry an implicit drive toward path diversity, and optimization itself can induce entropic pressures that improve generalization [42], yielding an “intelligent simulator” from curated experience. This view predicts that current models should spontaneously increase their causal path entropy with scale. Our framework makes this drive explicit and grounded in MaxCal, providing a shortcut to intuition that encodes implicit search into model weights to reduce the need for expensive search at inference [43]. These results point toward a hidden axis, training-time imagination, that may be key to unlocking out-of-distribution generalization in offline predictive models [35].
Our results are demonstrated in a minimal sandbox, a choice that is deliberate. The maze is the simplest non-trivial setting where the mechanism can be isolated and reproduced analytically. Many reasoning tasks can be viewed as navigation through a “conceptual maze” where a key insight unlocks a vastly larger state-space [17, 19, 20, 21]. This analogy suggests applications in control [17, 20], reasoning [8, 44], and planning [44]. Stefan Zweig’s The Royal Game [45] provides a compelling literary analogue: a prisoner achieves chess mastery by first studying games (imitation) and then playing against himself in his mind (imagination). His triumph occurs at the edge of madness, a state mirroring intuition coexisting with hallucination in our phase diagram.
Yet, scaling mind-tuning to real-world cases faces significant challenges. Computationally, estimating path-entropy for long horizons is hard due to the combinatorial explosion of futures [13]. This requires designing clever sampling strategies [17, 46], perhaps inspired by dreaming, hierarchical reasoning [44] and unconventional methods and architectures [47, 48]. Theoretically, a full characterization of the phase diagrams and universality classes is needed to design optimal tuning protocols [49]. For uncharted domains, identifying the right spaces for entropy maximization can be difficult, and the offline theory may need data augmentation from environment interaction [18]. Moreover, tuning $\lambda$ for future diversity in practice can turn into an alignment problem, trading benefits for safety [50]. Despite these challenges, this work takes a high-risk, high-reward route to reframing intelligence not merely as compression and computation, but as a physical phenomenon emerging at criticality.
Acknowledgments.— The author thanks many colleagues at IFISC and URV for enriching discussions. This work has been partially supported by the María de Maeztu project CEX2021-001164-M funded by the MICIU/AEI/10.13039/501100011033 and by Programa Maria Goyri URV.
## References
- Vaswani et al. [2017] A. Vaswani et al., Attention is all you need, in Adv. in Neural Info. Processing Systems, Vol. 30 (2017).
- Kaplan et al. [2020] J. Kaplan et al., Scaling laws for neural language models (2020), arXiv:2001.08361 [cs.LG] .
- Hoffmann et al. [2022] J. Hoffmann et al., Training compute-optimal large language models, arXiv preprint (2022), 2203.15556 .
- DeepSeek-AI et al. [2025] DeepSeek-AI et al., Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning (2025), arXiv:2501.12948 [cs.CL] .
- Shojaee et al. [2024] P. Shojaee et al., The illusion of thinking, arXiv preprint (2024), 2401.00675 .
- Brodeur et al. [2024] P. G. Brodeur et al., Superhuman performance of a large language model on the reasoning tasks of a physician, arXiv preprint (2024), 2412.10849 .
- Novikov et al. [2025] A. Novikov et al., Alphaevolve: A coding agent for scientific and algorithmic discovery (2025), arXiv:2506.13131 .
- Chollet [2019] F. Chollet, On the measure of intelligence, arXiv preprint (2019), 1911.01547 .
- Bubeck et al. [2023] S. Bubeck et al., Sparks of artificial general intelligence: Early experiments with gpt-4, (2023), 2303.12712 .
- Bender et al. [2021] E. M. Bender et al., On the dangers of stochastic parrots: Can language models be too big?, in Proceedings ACM (2021) pp. 610–623.
- Mitchell and Krakauer [2023] M. Mitchell and D. C. Krakauer, The debate over understanding in ai’s large language models, PNAS 120, e2215907120 (2023).
- Liang et al. [2022] P. Liang et al., Holistic evaluation of language models, arXiv preprint (2022), 2211.09110 .
- Jaynes [1980] E. T. Jaynes, The minimum entropy production principle, Ann. Rev. of Physical Chemistry 31, 579 (1980).
- Pressé et al. [2013] S. Pressé, K. Ghosh, J. Lee, and K. A. Dill, Principles of maximum entropy and maximum caliber in statistical physics, Reviews of Modern Physics 85, 1115 (2013).
- Dixit et al. [2018] P. D. Dixit et al., Perspective: Maximum caliber is a general variational principle for dynamical systems, The Journal of Chemical Physics 148, 010901 (2018).
- Kiefer [2025] A. B. Kiefer, Intrinsic motivation as constrained entropy maximization, arXiv preprint (2025), 2502.02962 .
- Wissner-Gross and Freer [2013] A. D. Wissner-Gross and C. E. Freer, Causal entropic forces, Physical Review Letters 110, 168702 (2013).
- Wen [2025] B. Wen, The missing reward: Active inference in the era of experience (2025), arXiv:2508.05619 .
- Klyubin et al. [2005] A. S. Klyubin, D. Polani, and C. L. Nehaniv, Empowerment: A universal agent-centric measure of control, in 2005 IEEE CEC, Vol. 1 (2005) pp. 128–135.
- Ramirez-Ruiz et al. [2024] J. Ramirez-Ruiz et al., Complex behavior from intrinsic motivation to occupy future action-state path space, Nature Communications 15, 5281 (2024).
- Friston et al. [2022] K. J. Friston et al., Designing ecosystems of intelligence from first principles, arXiv preprint (2022), 2212.01354 .
- Hopfield [1982] J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proceedings of the National Academy of Sciences 79, 2554 (1982).
- Kuramoto [1975] Y. Kuramoto, Self-entrainment of a population of coupled non-linear oscillators, in International Symposium on Mathematical Problems in Theoretical Physics (Springer, 1975) pp. 420–422.
- Belkin et al. [2019] M. Belkin, D. Hsu, S. Ma, and S. Mandal, Reconciling modern machine-learning practice and the classical bias–variance trade-off, PNAS 116, 15849 (2019).
- Power et al. [2022] A. Power et al., Grokking: Generalization beyond overfitting in small neural networks, arXiv (2022), 2201.02177 .
- Papyan et al. [2020] V. Papyan, X. Y. Han, and D. L. Donoho, Prevalence of neural collapse during the terminal phase of deep learning training, PNAS 117, 24927 (2020).
- Liu et al. [2025] Z. Liu, Y. Xu, T. Poggio, and I. Chuang, Parameter symmetry potentially unifies deep learning theory, arXiv preprint (2025), 2502.05300 .
- Arola-Fernández and Lacasa [2024] L. Arola-Fernández and L. Lacasa, Effective theory of collective deep learning, Phys. Rev. Res. 6, L042040 (2024).
- Carleo et al. [2019] G. Carleo et al., Machine learning and the physical sciences, Rev. Mod. Phys. 91, 045002 (2019).
- Lewkowycz et al. [2020] A. Lewkowycz et al., The large learning rate phase of deep learning: the catapult mechanism (2020), arXiv:2003.02218 [stat.ML] .
- Muñoz [2018] M. A. Muñoz, Colloquium: Criticality and dynamical scaling in living systems, Rev. Mod. Phys. 90, 031001 (2018).
- Zhang et al. [2025] S. Zhang et al., Intelligence at the edge of chaos (2025), arXiv:2410.02536 [cs.AI] .
- Jiménez-González et al. [2025] P. Jiménez-González, M. C. Soriano, and L. Lacasa, Leveraging chaos in the training of artificial neural networks (2025), arXiv:2506.08523 [cs.LG] .
- Arola-Fernández et al. [2020] L. Arola-Fernández et al., Uncertainty propagation in complex networks: From noisy links to critical properties, Chaos: An Interdisciplinary Journal of Nonlinear Science 30, 023129 (2020).
- Levine et al. [2020] S. Levine, A. Kumar, G. Tucker, and J. Fu, Offline reinforcement learning: Tutorial, review, and perspectives on open problems (2020), arXiv:2005.01643 [cs.LG] .
- Heins et al. [2024] C. Heins et al., Collective behavior from surprise minimization, PNAS 121, e2320239121 (2024).
- Friston [2010] K. Friston, The free-energy principle: A unified brain theory?, Nature Reviews Neuroscience 11, 127 (2010).
- Kauffman [2000] S. A. Kauffman, Investigations (Oxford Univ. Pr., 2000).
- Eysenbach and Levine [2022] B. Eysenbach and S. Levine, Maximum entropy rl (provably) solves some robust rl problems (2022), arXiv:2103.06257 [cs.LG] .
- Jaynes [1957] E. T. Jaynes, Information theory and statistical mechanics, The Physical Review 106, 620 (1957).
- Zdeborová and Krzakala [2016] L. Zdeborová and F. Krzakala, Statistical physics of inference: thresholds and algorithms, Adv. in Phys. 65, 453–552 (2016).
- Ziyin et al. [2025] L. Ziyin, Y. Xu, and I. Chuang, Neural thermodynamics i: Entropic forces in deep and universal representation learning (2025), arXiv:2505.12387 [cs.LG] .
- Belcak et al. [2025] P. Belcak et al., Small language models are the future of agentic ai (2025), arXiv:2506.02153 .
- Wang et al. [2025] G. Wang et al., Hierarchical reasoning models, arXiv preprint (2025), 2506.21734 .
- Zweig [1943] S. Zweig, The Royal Game (Viking Press, 1943).
- Aguilar et al. [2022] J. Aguilar et al., Sampling rare trajectories using stochastic bridges, Phys. Rev. E 105, 064138 (2022).
- Labay-Mora et al. [2025] Labay-Mora et al., Theoretical framework for quantum associative memories, Quantum Science and Technology 10, 035050 (2025).
- Brunner et al. [2025] D. Brunner et al., Roadmap on neuromorphic photonics (2025), arXiv:2501.07917 [cs.ET] .
- Manzano et al. [2024] G. Manzano et al., Thermodynamics of computations with absolute irreversibility, unidirectional transitions, and stochastic computation times, Phys. Rev. X 14, 021026 (2024).
- Arenas et al. [2011] A. Arenas et al., The joker effect: Cooperation driven by destructive agents, J. of Theo. Bio. 279, 113–119 (2011).
- Maddison et al. [2017] C. J. Maddison, A. Mnih, and Y. W. Teh, The concrete distribution: A continuous relaxation of discrete random variables, in ICLR (2017) 1611.00712 .
- Williams [1992] R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning 8, 229 (1992).
Supplementary Material for: “Intuition emerges in Maximum Caliber models at criticality”
## Appendix A S1. Experimental Setup and Hyperparameters
The experimental setting is a minimal yet non-trivial environment for testing emergent reasoning. It consists of a deterministic $24\times 24$ maze with periodic boundary conditions, where an agent must find the path to a designated exit. This controlled testbed provides a tractable state space for analyzing the learning dynamics. The agent’s behavior is determined by a policy network that maps the current state (2D position $x_{t}$ ) to a probability distribution over the four cardinal actions: $\mathcal{A}=\{\text{Up, Down, Right, Left}\}$ . For auto-regressive training, a simple deterministic function $f(x_{t},a)$ maps the last action to the next state.
The training dataset $\mathcal{D}$ is intentionally non-optimal. In our main experiments, it contains $N=100$ trajectories, each of length $T=60$ steps, generated by constrained random walks. These walkers respect the maze walls (i.e., never collide with them) but otherwise move randomly, exhibiting no goal-directed behavior. This design ensures that the optimal exit strategy is not present in the training data, forcing the model to discover it.
The model parameters $\theta$ are optimized by minimizing the free-energy functional $\mathcal{F}_{\lambda,\beta,\tau}(\theta)$ (Eq. (2) in the main text) via the Adam optimizer. The results presented in the main text (Fig. 2) are averaged over 20 independent training runs, each with a different random weight initialization, to ensure statistical robustness. The key hyperparameters used in the main experiments are: a policy network structured as a multi-layer perceptron (MLP) with one hidden layer of 128 neurons and ReLU activation; a learning rate of $1\times 10^{-3}$ ; 300 training epochs per $\lambda$ value; and a future horizon of $\tau=40$ steps in the entropy calculation.
The policy stochasticities are set to $\beta=1$ for training, $\beta=5$ for entropy calculation (imagination), and $\beta=10$ at inference time. A high imagination $\beta$ (compared to the training $\beta$ ) is beneficial for discovering hidden solutions that maximize causal entropy (i.e., finding the exit) with a finite $\tau$ and sparse data. A high inference $\beta$ is necessary to induce intuitive behavior in practice. In the intuition phase, the agent finds a superior solution but must execute its policy quite deterministically to follow the optimal path in the minimum time.
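For concreteness, the hyperparameters listed above can be collected in a single configuration. The key names below are illustrative only; the released code may organize them differently:

```python
# Hyperparameters of the main experiments, as stated in the text.
CONFIG = {
    "maze_size": (24, 24),       # deterministic maze, periodic boundaries
    "n_trajectories": 100,       # N constrained random walks
    "trajectory_length": 60,     # T steps each
    "hidden_units": 128,         # one-hidden-layer MLP, ReLU activation
    "learning_rate": 1e-3,       # Adam optimizer
    "epochs_per_lambda": 300,
    "future_horizon": 40,        # tau in the path-entropy term
    "beta_train": 1.0,           # policy stochasticity during training
    "beta_imagination": 5.0,     # during the entropy calculation
    "beta_inference": 10.0,      # at inference time
    "n_runs": 20,                # independent weight initializations
}
```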
For problems that are not Markovian or where the data representation does not contain full state information (e.g., data are sequences of moves or the agent only sees its local environment), a more advanced neural network is required. Transformers are the standard for modeling long, non-Markovian sequences of tokens. Our framework naturally extends to these sequential autoregressive architectures, albeit at the cost of more parameters and computational effort.
Code availability.— PyTorch source code to reproduce the results of this paper is publicly available on GitHub: https://github.com/mystic-blue/mind-tuning.
## Appendix B S2. Calculation of Objective Functionals
The mind-tuning objective function $\mathcal{F}_{\lambda,\beta,\tau}(\theta)=\mathcal{E}_{\beta}(\theta)-\lambda\mathcal{H}_{\tau,\beta}(\theta)$ consists of two key terms. Below we detail their calculation.
### B.1 A. Cross-Entropy Estimation
The cross-entropy term $\mathcal{E}_{\beta}(\theta)$ , defined in Eq. (3) of the main text, measures the model’s ability to imitate the training data. It is estimated by averaging the negative log-likelihood of the actions taken in the dataset $\mathcal{D}$ given the preceding histories:
$$
\hat{\mathcal{E}}_{\beta}(\theta)=\frac{1}{|\mathcal{D}|}\sum_{(h_{t},a_{t})\in\mathcal{D}}[-\log\pi_{\theta,\beta}(a_{t}|h_{t})]
$$
where $|\mathcal{D}|$ is the total number of state-action pairs in the training set. This term encourages the policy to assign high probability to the trajectories observed during training.
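A minimal sketch of this estimator, assuming a tabular policy stored as per-state probability arrays (the actual implementation uses an MLP policy and PyTorch autograd):

```python
import numpy as np

def cross_entropy(policy_probs, dataset):
    """Empirical cross-entropy E_beta(theta): mean negative log-likelihood
    of the dataset's actions under the current policy.

    policy_probs: dict mapping a history (here, a state) to an array of
                  action probabilities pi_theta(a | h).
    dataset:      iterable of (history, action_index) pairs.
    """
    nll = [-np.log(policy_probs[h][a]) for h, a in dataset]
    return float(np.mean(nll))

# A uniform policy over 4 actions yields E = ln 4 on any dataset.
uniform = {s: np.full(4, 0.25) for s in range(3)}
data = [(0, 1), (1, 2), (2, 0)]
print(cross_entropy(uniform, data))   # ln 4 ≈ 1.386
```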
### B.2 B. Causal Path-Entropy: Analytic Calculation for Markovian Systems
For systems with fully-observed, discrete, and reasonably small state spaces $\mathcal{V}$ , such as our maze environment, the path-entropy can be computed analytically. Since the system is Markovian ( $h_{t}=x_{t}$ ), we can define a policy-dependent transition matrix $M_{\pi}$ . The element $(M_{\pi})_{x^{\prime},x}$ gives the probability of transitioning from state $x$ to state $x^{\prime}$ under the current policy $\pi_{\theta,\beta}$ . Specifically, $(M_{\pi})_{x^{\prime},x}=\sum_{a\in\mathcal{A}}\pi_{\theta,\beta}(a|x)\delta_{x^{\prime},f(x,a)}$ , where $f(x,a)$ is the deterministic function that returns the next state.
Given a starting state $x_{start}$ , we can compute the probability distribution over future states $\vec{\rho}_{k}$ at any time step $k$ by evolving an initial occupancy vector (a point mass at $x_{start}$ ) via the recursion $\vec{\rho}_{k+1}=M_{\pi}\vec{\rho}_{k}$ . The conditional path-entropy for a trajectory starting at $x_{start}$ is then the time-averaged Shannon entropy of the policy, weighted by the occupancy probability at each future state:
$$
\mathcal{H}_{\tau,\beta}(\theta|x_{start})=\frac{1}{\tau}\sum_{k=0}^{\tau-1}\sum_{x\in\mathcal{V}}(\rho_{k})_{x}\left[-\sum_{a\in\mathcal{A}}\pi_{\theta,\beta}(a|x)\log\pi_{\theta,\beta}(a|x)\right].
$$
The total functional $\mathcal{H}_{\tau,\beta}(\theta)$ is the expectation of Eq. (S2) over all starting states in the training dataset $\mathcal{D}$ . This entire calculation is fully differentiable with respect to the network parameters $\theta$ , allowing for efficient gradient-based optimization. This exact method was used to produce all experimental and theoretical results in this work. Its primary computational cost scales with the size of the state space $|\mathcal{V}|$ , making it suitable for our testbed.
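The construction of $M_{\pi}$ and the occupancy recursion can be sketched as follows. This toy version assumes a tabular policy over integer state indices and a deterministic transition function $f(x,a)$; the helper names are ours:

```python
import numpy as np

def transition_matrix(policy, next_state, n_states, n_actions):
    """(M_pi)_{x',x} = sum_a pi(a|x) * delta(x', f(x, a))."""
    M = np.zeros((n_states, n_states))
    for x in range(n_states):
        for a in range(n_actions):
            M[next_state(x, a), x] += policy[x, a]
    return M

def path_entropy(policy, next_state, x_start, tau, n_states, n_actions):
    """Time-averaged causal path-entropy conditioned on x_start (Eq. S2)."""
    M = transition_matrix(policy, next_state, n_states, n_actions)
    # Local Shannon entropy of the policy at each state.
    with np.errstate(divide="ignore", invalid="ignore"):
        logp = np.where(policy > 0, np.log(policy), 0.0)
    h_local = -(policy * logp).sum(axis=1)
    rho = np.zeros(n_states)
    rho[x_start] = 1.0                 # point mass at the start state
    H = 0.0
    for _ in range(tau):
        H += rho @ h_local             # occupancy-weighted entropy at step k
        rho = M @ rho                  # evolve the occupancy vector
    return H / tau

# Toy check: on a ring with a uniform 2-action (left/right) policy,
# the path-entropy equals ln 2 for any horizon.
n = 5
pi = np.full((n, 2), 0.5)
f = lambda x, a: (x + (1 if a == 0 else -1)) % n
print(path_entropy(pi, f, 0, 10, n, 2))   # ln 2 ≈ 0.693
```

In the maze experiments this computation is written with differentiable tensor operations so that gradients flow through $M_{\pi}$ back to the network parameters $\theta$.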
### B.3 C. Causal Path-Entropy: Monte Carlo Estimation for High-Dimensional Systems
For high-dimensional or continuous state spaces, or for non-Markovian sequence models like Transformers, the analytic approach becomes intractable. In these cases, $\mathcal{H}$ must be estimated via Monte Carlo sampling. For each starting history $h_{start}$ in a training mini-batch, we generate $K$ independent future trajectories (rollouts) of length $\tau$ by autoregressively sampling actions from the policy. The estimator for the path-entropy functional is:
$$
\hat{\mathcal{H}}_{\tau,\beta}(\theta)\approx\frac{1}{|\mathcal{B}|}\sum_{h_{start}\in\mathcal{B}}\left(\frac{1}{K\tau}\sum_{k=1}^{K}\sum_{j=0}^{\tau-1}\left[-\ln\pi_{\theta,\beta}(a_{j}^{(k)}|h_{j}^{(k)})\right]_{h_{start}}\right).
$$
To ensure that gradients can be backpropagated through the sampling process, especially for discrete action spaces, reparameterization techniques are required. A standard method is the Gumbel-Softmax trick [51], which provides a continuous, differentiable approximation to the sampling procedure. Alternatively, the gradient of the entropic objective can be estimated using policy gradient methods like REINFORCE [52], though this often suffers from high variance.
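A sampling-only sketch of this estimator for a single start state (the gradient machinery via Gumbel-Softmax or REINFORCE is omitted); on a toy uniform policy it recovers the analytic value, since every step contributes $-\ln 0.5$:

```python
import numpy as np

def mc_path_entropy(policy, next_state, x_start, tau, K, rng):
    """Monte Carlo estimate of the causal path-entropy for one start state:
    average of -log pi(a_j | h_j) over K rollouts of length tau."""
    total = 0.0
    for _ in range(K):
        x = x_start
        for _ in range(tau):
            p = policy[x]
            a = rng.choice(len(p), p=p)   # sample an action from the policy
            total += -np.log(p[a])
            x = next_state(x, a)
    return total / (K * tau)

# Uniform 2-action policy on a ring: the estimate equals ln 2 exactly.
n = 5
pi = np.full((n, 2), 0.5)
f = lambda x, a: (x + (1 if a == 0 else -1)) % n
rng = np.random.default_rng(0)
print(mc_path_entropy(pi, f, 0, tau=10, K=20, rng=rng))   # ln 2 ≈ 0.693
```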
## Appendix C S3. Parametric Dependencies of the Intuition Phase
The emergence of the fragile intuition phase is a critical phenomenon highly sensitive to the model, data, and learning protocol parameters. Below, we detail the key dependencies we investigated.
### C.1 A. Future Horizon $\tau$
The future horizon $\tau$ dictates the timescale of the model’s “imagination”. Our experiments show that the intuition phase only emerges for a sufficiently long horizon (Fig. S1).
For a small $\tau$ , the model is myopic; the long-term entropic gain from escaping the maze is not visible, so the model defaults to minimizing cross-entropy and remains in the imitation phase. As $\tau$ increases, the model can foresee the vast expansion of possible futures that awaits outside the maze, creating a strong entropic incentive to find an exit. For intermediate horizons, we often observe a cheating phase—a local minimum in the free-energy landscape where the model learns to take a single illegal step through a wall. This strategy is a compromise: it incurs a small penalty for rule-breaking but gains a significant medium-term entropic advantage. Only for large $\tau$ does the incentive to find a legal path to maximal freedom dominate (i.e., virtue over vice).
<details>
<summary>FigureS1.png Details</summary>

### Visual Description
## Multi-Panel Line Chart: Metric and Performance Analysis Across Lambda (λ) and Tau (τ)
### Overview
The image displays a composite figure containing six line charts arranged in a 2x3 grid. The three columns (A, B, C) correspond to different values of a parameter τ (tau): τ=1, τ=20, and τ=40. The top row of charts plots two entropy-based metrics against λ (lambda), while the bottom row plots two normalized performance metrics against the same λ. The x-axis for all charts is λ on a logarithmic scale.
### Components/Axes
* **Panels:** Three main vertical panels labeled **A**, **B**, and **C** at the top-left of each column.
* **Top Row Charts:**
* **Y-axis:** Label is "Metric value". Scale ranges from 0.2 to 1.4.
* **X-axis:** Shared with the bottom chart in each panel. Label is "λ (log scale)". Scale is logarithmic, with major ticks at 10⁻³, 10⁻², 10⁻¹, 10⁰, 10¹, 10².
* **Legend (Top-Left):** Contains two entries.
* Red line: `E_λ(θ*) (cross-entropy)`
* Blue line: `H_λ(θ*) (path-entropy)`
* **Bottom Row Charts:**
* **Y-axis:** Label is "Normalised MFPT / WHR". Scale ranges from 0.0 to 1.0.
* **X-axis:** Identical to the top row. Label is "λ (log scale)".
* **Legend (Bottom-Left):** Contains two entries.
* Black line: `MFPT`
* Red line: `WHR`
* **Data Representation:** Each chart shows a solid line (likely a mean or trend) overlaid on a scatter of semi-transparent data points (gray for MFPT, light red for WHR, light blue for path-entropy, light red for cross-entropy).
### Detailed Analysis
**Panel A (τ=1):**
* **Top Chart:** Both `E_λ(θ*)` (red) and `H_λ(θ*)` (blue) start at low values (~0.8 and ~0.6 respectively) for small λ. They exhibit a sharp, simultaneous increase between λ ≈ 10⁻¹ and λ ≈ 10⁰, plateauing together at a value of approximately 1.4 for λ > 10⁰.
* **Bottom Chart:** `MFPT` (black) starts high (~1.0) and fluctuates before beginning a steady decline around λ ≈ 10⁻¹, settling near 0.3 for λ > 10⁰. `WHR` (red) starts at 0, begins a sharp increase around λ ≈ 10⁻¹, and plateaus near 1.0 for λ > 10⁰. The crossover point where `WHR` surpasses `MFPT` is near λ ≈ 10⁻⁰·⁵ (~0.3).
**Panel B (τ=20):**
* **Top Chart:** The increase in both entropy metrics is more gradual and staggered compared to Panel A. `H_λ(θ*)` (blue) begins rising earlier (around λ ≈ 10⁻²) and leads `E_λ(θ*)` (red) through the transition. They converge to the same plateau (~1.4) around λ ≈ 10⁰.
* **Bottom Chart:** The decline of `MFPT` and rise of `WHR` are also more gradual. The `MFPT` curve shows a distinct dip and partial recovery between λ ≈ 10⁻¹ and λ ≈ 10⁰ before settling. The crossover occurs slightly later than in Panel A, near λ ≈ 10⁻⁰·³ (~0.5).
**Panel C (τ=40):**
* **Top Chart:** The separation between the two entropy curves is most pronounced. `H_λ(θ*)` (blue) rises steadily from λ ≈ 10⁻², while `E_λ(θ*)` (red) remains flat until λ ≈ 10⁻¹ before rising steeply to meet the blue line at the plateau (~1.4) near λ ≈ 10⁰·⁵ (~3).
* **Bottom Chart:** The trends are similar to Panel B but stretched. A notable feature is a light blue shaded vertical region between approximately λ ≈ 10⁻¹·⁵ and λ ≈ 10⁻⁰·⁵ (0.03 to 0.3). Within this region, `MFPT` drops sharply and `WHR` begins its ascent. The crossover point is near λ ≈ 10⁻⁰·² (~0.6).
### Key Observations
1. **Consistent Plateaus:** In all top charts, both entropy metrics converge to the same maximum value (~1.4) for sufficiently large λ.
2. **Inverse Relationship:** In all bottom charts, `MFPT` and `WHR` exhibit an inverse relationship; as one increases, the other decreases.
3. **Effect of τ:** Increasing τ (from A to C) delays and smooths the transition region for all metrics. The "activation" of the metrics shifts to higher values of λ.
4. **Transition Sharpness:** The transition is sharpest for τ=1 (Panel A) and becomes progressively more gradual for τ=20 and τ=40.
5. **Data Variability:** The scatter of data points is most pronounced in the transition regions, indicating higher variance in system behavior during the phase change.
### Interpretation
This figure shows the behavior of the trained predictive models as the path-entropy weight λ (the temperature-like control parameter) and the future horizon τ are varied.
* **Entropy Metrics (Top Row):** The increase of both cross-entropy and path-entropy with λ shows the models trading fidelity to the training walks for higher-entropy futures. The path-entropy (`H_λ`) is a leading indicator, responding to λ earlier than the cross-entropy (`E_λ`), especially at higher τ.
* **Behavioral Metrics (Bottom Row):** `MFPT` (mean first-passage time to the exit) and `WHR` (wall-hit rate) move in opposite directions. At high λ the model reaches the exit quickly only by violating the maze walls (`WHR` ≈ 1), the signature of the hallucination phase.
* **Role of τ:** The horizon τ sets how far into the future path entropy is accumulated. A longer horizon shifts the transition to higher λ and, crucially, opens the intuition window (shaded region in Panel C): a sharp dip in `MFPT` while `WHR` remains near zero, i.e., fast escape that still respects the walls.
* **Overall Narrative:** The data demonstrates an **entropy-driven phase transition** from imitation (low λ: high `MFPT`, near-zero `WHR`) to hallucination (high λ: low `MFPT`, `WHR` ≈ 1). The intuition window is the fragile intermediate phase, stabilized only by a sufficiently long horizon.
</details>
Figure S1: Dependence on Future Horizon $\tau$ . Phase diagram of the genotypic (top) and phenotypic (bottom) metrics as a function of $\lambda$ for different future horizons. The intuition window (sharp dip in MFPT and zero WHR, shaded blue) appears and stabilizes only for a long horizon ( $\tau=40$ ). (A) A short horizon ( $\tau=1$ ) yields only imitation and hallucination. (B) An intermediate horizon ( $\tau=20$ ) can lead to a cheating strategy, which is worse than the true intuitive solution (C).
### C.2 B. Model Capacity
The capacity of the policy network, controlled by the number of neurons, is relevant (Fig. S2). A model with insufficient capacity has high bias and lacks the representational power to learn the complex, mixed strategy required to balance maze constraints with goal-directed exploration. It cannot simultaneously represent the world model and the entropic drive, so the intuition phase does not emerge. Conversely, a model with excessive capacity relative to the task complexity is prone to overfitting. It may perfectly memorize the noisy random walks from the training data or discover trivial, non-generalizable solutions (e.g., exploiting specific numerical artifacts) to maximize entropy. The intuition phase occupies a “sweet spot” where model capacity is well-matched to the problem, enabling generalization from sparse data rather than mere memorization or unconstrained hallucination.
<details>
<summary>FigureS2.png Details</summary>

### Visual Description
## Multi-Panel Line Charts: Entropy Weight (λ) vs. Model Performance Metrics
### Overview
The image displays a set of three multi-panel charts (labeled A, B, C) from a scientific or technical study. Each panel corresponds to a different model "capacity" (8, 64, 128). Within each panel, there are two vertically stacked subplots sharing a common x-axis. The charts plot various performance metrics against the control parameter λ on a logarithmic scale. The data is presented as scatter points with overlaid trend lines.
### Components/Axes
* **Panels:** Three main panels arranged horizontally, labeled **A**, **B**, and **C** in the top-left corner of each.
* **Panel A Title:** `capacity=8`
* **Panel B Title:** `capacity=64`
* **Panel C Title:** `capacity=128`
* **X-Axis (Common to all subplots):**
* **Label:** `λ (log scale)`
* **Scale:** Logarithmic, ranging from `10^-3` to `10^2`. Major ticks are at `10^-3`, `10^-2`, `10^-1`, `10^0`, `10^1`, `10^2`.
* **Top Subplot (Per Panel):**
* **Y-Axis Label:** `Metric value`
* **Scale:** Linear, from `0.2` to `1.4`. Major ticks at `0.2`, `0.4`, `0.6`, `0.8`, `1.0`, `1.2`, `1.4`.
* **Legend (Located in Panel A, top subplot):**
* **Dark Red Line/Points:** `ε_λ(θ*) (cross-entropy)`
* **Blue Line/Points:** `H_λ(θ*) (path-entropy)`
* **Bottom Subplot (Per Panel):**
* **Y-Axis Label:** `Normalised MFPT / WHR`
* **Scale:** Linear, from `0.0` to `1.0`. Major ticks at `0.0`, `0.2`, `0.4`, `0.6`, `0.8`, `1.0`.
* **Legend (Located in Panel A, bottom subplot):**
* **Black Line/Points:** `MFPT`
* **Red Line/Points:** `WHR`
* **Additional Visual Element:** Light blue shaded vertical regions appear in the bottom subplots of panels B and C, roughly between λ = `10^-1.5` and `10^-0.5`.
### Detailed Analysis
**Top Subplots (Cross-entropy & Path-entropy):**
* **Trend Verification:**
* **Cross-entropy (Dark Red):** Starts near `1.0` at low λ. Shows a slight dip or plateau before λ=`10^-1`, then increases sharply, plateauing at approximately `1.4` for λ > `10^0`.
* **Path-entropy (Blue):** Starts significantly lower than cross-entropy at low λ (approx. `0.35` in A, `0.55` in B/C). Increases steadily with λ, converging with the cross-entropy line at the high plateau of `~1.4`.
* **Capacity Effect:** The initial value of path-entropy at low λ increases with capacity (A: ~0.35, B/C: ~0.55). The transition to the high plateau appears slightly smoother with higher capacity.
**Bottom Subplots (MFPT & WHR):**
* **Trend Verification:**
* **MFPT (Black):** Starts at the maximum normalized value of `1.0` at low λ. Undergoes a sharp, precipitous drop between λ=`10^-2` and `10^-1`, reaching a minimum near `0.1-0.2`. For λ > `10^-1`, it recovers slightly and stabilizes around `0.3-0.4`.
* **WHR (Red):** Starts near `0.0` at low λ. Begins a sharp increase at approximately the same λ where MFPT drops (`10^-2` to `10^-1`). It plateaus at a high value for λ > `10^0`.
* **Capacity Effect & Key Data Points:**
* **Plateau WHR Value:** Increases with capacity.
* **Panel A (capacity=8):** WHR plateaus at `~0.5`.
* **Panel B (capacity=64):** WHR plateaus at `~0.95`.
* **Panel C (capacity=128):** WHR plateaus at `~0.95` (similar to B).
* **Transition Region:** The light blue shaded area highlights the λ region (`~0.03` to `~0.3`) where the most dramatic changes in MFPT and WHR occur. This region is present in B and C but not explicitly shaded in A.
* **MFPT Minimum:** The lowest point of the MFPT curve occurs within or just after the shaded transition region, reaching values as low as `~0.1` in panels B and C.
### Key Observations
1. **Inverse Relationship:** There is a clear inverse relationship between MFPT and WHR. As λ increases through the critical transition region, MFPT collapses while WHR surges.
2. **Phase Transition:** The data suggests a phase transition in model behavior controlled by λ. Low λ favors high MFPT/low WHR and separated entropy metrics. High λ favors low MFPT/high WHR and converged entropy metrics.
3. **Capacity Saturation:** Increasing capacity from 8 to 64 has a dramatic effect on the final WHR plateau (0.5 to ~0.95). Increasing further to 128 shows minimal additional gain, suggesting a saturation point.
4. **Metric Convergence:** At high λ, both entropy metrics (cross-entropy and path-entropy) converge to the same high value (`~1.4`), indicating a loss of distinction between them in the high-regularization regime.
### Interpretation
This figure shows the same phase diagram as a function of model capacity (the hidden-layer width), with λ again the temperature-like path-entropy weight.
* **What the data suggests:** As λ increases, the model transitions from imitation (high `MFPT`, near-zero `WHR`) to hallucination (low `MFPT`, `WHR` ≈ 1, i.e., fast "escapes" that cross walls). The intuition window (shaded in B and C) is the intermediate region where `MFPT` drops sharply while `WHR` is still small.
* **How elements relate:** The convergence of cross-entropy and path-entropy at high λ marks the hallucination plateau, where the policy maximizes path entropy with little regard for the data. The higher initial path-entropy of the larger models (B, C vs. A) indicates a richer policy representation before the entropic drive dominates.
* **Capacity effect:** The capacity-8 model (A) cannot represent the required mixed strategy: its `WHR` plateau is lower (~0.5) and no intuition window is shaded. Capacities 64 and 128 (B, C) behave almost identically, indicating saturation.
**In summary, the figure demonstrates that the intuition window requires sufficient model capacity: below it, the network can neither imitate the maze constraints well nor discover the optimal escape, while 64 and 128 neurons both yield a robust intuition phase.**
</details>
Figure S2: Dependence on Model Capacity. Emergence of the intuition phase as a function of the number of neurons in the hidden layer. (A) A model with insufficient capacity (e.g., 8 neurons) cannot learn the required behavior. The intuition phase is robust for models with sufficient capacity (e.g., 64 (B) or 128 neurons (C)), which are powerful enough to discover the solution but not so powerful that they immediately overfit.
### C.3 C. Maze Complexity
We evaluated the framework on several environments of increasing complexity (Fig. S3). In simpler environments (e.g., a straight corridor), the escape task is trivial because the data trajectories are already close to the optimal solution; the intuition window is consequently wide and appears at lower values of $\lambda$ . As maze complexity increases, finding the optimal path becomes a harder constraint-satisfaction problem: the cross-entropy term $\mathcal{E}$ more strongly penalizes deviations from valid paths, so a stronger entropic pressure (a higher $\lambda$ ) is required to motivate the search for the distant exit. As a result, the intuition window narrows and shifts in the phase diagram, indicating that a more precise tuning of the energy-entropy balance is needed for harder problems. In some cases the intuition window may disappear entirely, and protocols such as an adiabatic sweep are then required to reach it.
<details>
<summary>FigureS3.png Details</summary>

### Visual Description
## Multi-Panel Scientific Figure: Navigation Metrics Across Environments
### Overview
The image is a composite scientific figure containing four columns (labeled A, B, C, D) corresponding to four distinct environments: **Corridor**, **Path**, **Maze**, and **Room**. Each column contains three vertically stacked panels: a top heatmap, a middle line graph with an inset, and a bottom line graph. The figure analyzes the relationship between a parameter `λ` (lambda) and various performance or behavioral metrics across these environments.
### Components/Axes
**Global Structure:**
- **Columns (A-D):** Labeled at the top-left of each column's heatmap. Corresponding environment names are printed below the heatmaps: **Corridor** (A), **Path** (B), **Maze** (C), **Room** (D).
- **Rows:**
1. **Top Row:** Heatmaps showing spatial distributions.
2. **Middle Row:** Line graphs plotting "Metric value" vs. `λ (log scale)`.
3. **Bottom Row:** Line graphs plotting "MFPT / WHR (normalized)" vs. `λ (log scale)`.
**Middle Row Graphs (All Columns):**
| Element | Description |
| :--- | :--- |
| **Y-axis** | Label: "Metric value". Scale: 0.6 to 1.4. |
| **X-axis** | Label: "λ (log scale)". Scale: logarithmic, 10⁻³ to 10². |
| **Legend** | Located at bottom-left of each plot.<br>- Red line: `ε_λ(θ*)`<br>- Blue line: `H_λ(θ*)` |
| **Inset Graph** | Located at bottom-right of each plot.<br>- Title: "fluctuations"<br>- Y-axis: Label "σ". Scale: 0.00 to 0.05.<br>- X-axis: Label "λ". Scale: linear, matching main plot's range.<br>- Contains red and blue lines corresponding to the main plot's metric fluctuations. |
**Bottom Row Graphs (All Columns):**
| Element | Description |
| :--- | :--- |
| **Y-axis** | Label: "MFPT / WHR (normalized)". Scale: 0.0 to 1.0. |
| **X-axis** | Label: "λ (log scale)". Scale: logarithmic, 10⁻³ to 10². |
| **Legend** | Embedded within plot area.<br>- Black line: **MFPT** (Mean First Passage Time)<br>- Red line: **WHR** (wall-hit rate)<br>- Cyan shaded area: **p_intuition** |
| **Data Points** | Scatter points (gray/black for MFPT, red for WHR) overlaid on the lines, indicating individual data samples. |
### Detailed Analysis
**Top Row - Heatmaps:**
- Each heatmap displays a 2D spatial grid. A central black square (likely an obstacle or start zone) is present in all.
- **Color Scale:** Blue to Red/Orange. Blue indicates low values, red/orange indicates high values.
- **Panel A (Corridor):** High values (red/orange) are concentrated in a vertical column above the central block, with a diffuse blue region spreading outward.
- **Panel B (Path):** High values form a narrow, winding path emanating from the central block.
- **Panel C (Maze):** High values are confined to a more complex, maze-like structure around the central block.
- **Panel D (Room):** High values are concentrated in a square "room" shape directly above the central block, with minimal spread.
**Middle Row - Metric Value vs. λ:**
- **Trend for `H_λ(θ*)` (Blue Line):** In all four environments, the blue line shows a consistent **sigmoidal increase**. It starts at a lower value (~0.6-1.0) for small λ (10⁻³), rises steeply between λ ≈ 10⁻¹ and λ ≈ 1, and plateaus at a high value (~1.4) for λ > 1.
- **Trend for `ε_λ(θ*)` (Red Line):** The red line shows a more environment-dependent pattern.
- **Corridor (A):** Remains relatively flat (~1.1) until λ ≈ 1, then jumps sharply to meet the blue line's plateau.
- **Path (B) & Maze (C):** Starts lower (~0.9), increases gradually, then joins the blue plateau around λ ≈ 1.
- **Room (D):** Starts high (~1.25), remains flat, then increases to join the plateau.
- **Inset "fluctuations" (σ):** The standard deviation (σ) for both metrics is generally low (<0.05). A notable peak in σ for the red line (`ε_λ`) occurs around the transition point (λ ≈ 1) in Corridor, Path, and Maze, indicating higher variability during the phase change.
**Bottom Row - MFPT/WHR vs. λ:**
- **Trend for MFPT (Black Line):** Shows a **non-monotonic, U-shaped or decreasing-then-stable** pattern.
- Starts high (~0.6-1.0) at low λ.
- Drops to a minimum around λ ≈ 10⁻¹ to 10⁰.
- Rises slightly and stabilizes at a moderate value (~0.3-0.4) for λ > 1.
- **Trend for WHR (Red Line):** Shows a **sharp, sigmoidal increase**.
- Is near zero for λ < 10⁻¹.
- Increases dramatically between λ ≈ 10⁻¹ and λ ≈ 1.
- Plateaus near 1.0 for λ > 1.
- **Trend for p_intuition (Cyan Shaded Area):** Represents a probability or region of "intuitive" behavior.
- Peaks in the intermediate λ range (λ ≈ 10⁻¹ to 10⁰), precisely where MFPT is minimized and WHR is undergoing its sharp increase.
- The peak is most pronounced and broad in the Corridor (A) and Path (B) environments.
- **Scatter Points:** The gray/black (MFPT) and red (WHR) dots show the spread of individual trial data around the mean lines. The spread is larger at lower λ values, especially for MFPT.
### Key Observations
1. **Phase Transition at λ ≈ 1:** A critical transition occurs around λ = 1 across all metrics and environments. This is where `H_λ` and `ε_λ` plateau, WHR saturates, and MFPT stabilizes.
2. **Environment-Dependent Initial Conditions:** The starting values (at low λ) for `ε_λ` and MFPT vary significantly by environment (e.g., MFPT starts highest in Room, lowest in Maze), reflecting the inherent difficulty or structure of each space.
3. **Optimal Intermediate λ:** The `p_intuition` region and the minimum of MFPT coincide in the intermediate λ range (0.1 - 1), suggesting an optimal parameter zone for efficient navigation.
4. **Consistency of H_λ:** The blue line (`H_λ`) behaves almost identically across all four environments, suggesting it measures a fundamental property that scales uniformly with λ regardless of spatial structure.
### Interpretation
This figure shows the dependence of the phase diagram on environment structure, where `λ` tunes the balance between imitating the data and maximizing future path entropy.
- **What the data suggests:** At low `λ` the agent imitates the random-walk data (high MFPT, near-zero WHR). In an intermediate, environment-dependent window, `p_intuition` peaks and MFPT dips while WHR stays near zero: the agent reaches the exit efficiently without breaking the maze rules. At high `λ`, WHR saturates near 1.0, the hallucination phase in which fast "escapes" cross walls.
- **Relationship between elements:** The top heatmaps visualize the spatial occupancy behind the curves below them. The narrow path in B versus the open room in D explains the different baseline values of `ε_λ` and MFPT across environments, while the middle graphs track the energy-entropy competition (`ε_λ`, `H_λ`) that drives the behavioral curves at the bottom.
- **Notable Anomalies/Patterns:** The sharp peak in the fluctuation (`σ`) of `ε_λ` at the transition point is a classic signature of criticality, where the system is most susceptible to perturbations.
**In summary, the figure demonstrates that the intuition window sits at an intermediate, environment-dependent `λ`: wide and at low `λ` for simple environments, narrower for harder ones, and potentially absent for the hardest.**
</details>
Figure S3: Dependence on Maze Complexity. Position and width of the intuition window (measured by MFPT) for environments of varying difficulty. (A,B) For a simple corridor, the window is wide and appears at low $\lambda$ . (C) For the more complex maze used in the main text, the window is narrower, reflecting the increased difficulty. (D) For even more complex problems, the intuition window can disappear, necessitating specific protocols to reach the desired phase.
## Appendix D S4. Detailed Theory and Further Predictions
Here we expand on the theory from the main text, providing the explicit analytical forms for the free energy functional. We also clarify the calculation of the intuition likelihood ( $p_{\text{intuition}}$ ) and discuss the existence of a more elusive inspiration phase as a further prediction of the theory.
### D.1 A. The Effective Free Energy Functional
For the Markovian maze environment with a small state-space, the terms of the free energy functional $\mathcal{F}_{\lambda}(m)=\mathcal{E}(m)-\lambda\mathcal{H}(m)$ can be computed analytically as a function of the rationality order parameter $m$ . Note that this theory is effective: the $\beta$ of the analytical policy is distinct from the experimental one, since the former controls only three minimal analytical costs while the latter modulates the entire logit vector.
The effective cross-entropy, $\mathcal{E}(m)$ , is the expectation of the negative log-likelihood over the state distribution of the given data, $\rho_{\mathcal{D}}(s)$ . For a single state $s$ , the cross-entropy is $E_{s}(m)=\log Z_{m}(s)+\beta\langle U_{m}(a)\rangle_{a\sim\text{uniform}}$ , where the average is over legal moves from $s$ . Summing over the data distribution gives
$$
\mathcal{E}(m)=\left\langle\log\left(\sum_{a\in\mathcal{A}}e^{-\beta U_{m}(a|s)}\right)+\beta\frac{\sum_{a^{\prime}\in\mathcal{A}_{\text{legal}}(s)}U_{m}(a^{\prime}|s)}{|\mathcal{A}_{\text{legal}}(s)|}\right\rangle_{s\sim\rho_{\mathcal{D}}},
$$
where $U_{m}(a|s)$ is the cost of action $a$ in state $s$ (which depends on whether the move is optimal, suboptimal, or a wall collision) and $\mathcal{A}_{\text{legal}}(s)$ is the set of valid moves from $s$ .
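For the small Markovian state space, the per-state term $E_{s}(m)$ above can be evaluated directly. A minimal numerical sketch, assuming illustrative action costs and a legality mask (these toy values are our own, not the paper's actual maze costs):

```python
import numpy as np

def cross_entropy_state(U, legal, beta):
    """E_s(m) = log Z_m(s) + beta * <U_m(a)>_{a ~ uniform over legal moves}.

    U     : costs U_m(a|s) for every action a in A (illustrative values).
    legal : boolean mask selecting A_legal(s), the valid moves averaged over.
    beta  : inverse temperature of the softmax policy p ∝ exp(-beta * U).
    """
    log_Z = np.log(np.exp(-beta * U).sum())   # log partition function log Z_m(s)
    mean_legal_cost = U[legal].mean()         # uniform average over legal moves
    return log_Z + beta * mean_legal_cost
```

The full $\mathcal{E}(m)$ then follows by averaging $E_{s}(m)$ over states weighted by $\rho_{\mathcal{D}}(s)$.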
The effective path-entropy, $\mathcal{H}(m)$ , is the time-averaged Shannon entropy of the policy $p_{m,\beta}$ over trajectories of length $\tau$ starting from an initial state distribution $\rho_{0}$ (in our case, a single point at the maze start). It is calculated using the policy-dependent transition matrix $M_{m}$:
$$
\mathcal{H}(m)=\frac{1}{\tau}\sum_{k=0}^{\tau-1}\left(\sum_{s\in\mathcal{V}}(\rho_{k})_{s}\cdot h_{m}(s)\right),
$$
where $\vec{\rho}_{k}=(M_{m})^{k}\vec{\rho}_{0}$ is the state occupancy vector at time $k$ , and $h_{m}(s)$ is the local policy entropy at state $s$ :
$$
h_{m}(s)=-\sum_{a\in\mathcal{A}}p_{m,\beta}(a|s)\log p_{m,\beta}(a|s).
$$
These analytical expressions are used to generate the theoretical plots in the main text.
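The path-entropy average above can be propagated explicitly. A minimal sketch, assuming a column-stochastic transition matrix with $M[s',s]=P(s'\mid s)$ and a precomputed vector of local entropies $h_m(s)$:

```python
import numpy as np

def path_entropy(M, h, rho0, tau):
    """H(m) = (1/tau) * sum_{k=0}^{tau-1} rho_k . h, with rho_k = M^k rho_0.

    M    : column-stochastic transition matrix, M[s_next, s] = P(s_next | s).
    h    : local policy entropies h_m(s), one entry per state.
    rho0 : initial state distribution (a delta at the maze start in the paper).
    tau  : future horizon.
    """
    rho = np.asarray(rho0, dtype=float)
    total = 0.0
    for _ in range(tau):
        total += rho @ h        # expected local entropy at time k
        rho = M @ rho           # propagate the occupancy one step
    return total / tau
```

For a uniform two-state chain with local entropy $\log 2$ everywhere, the time average is $\log 2$ for any horizon, which serves as a quick sanity check.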
### D.2 B. Calculating the Intuition Likelihood ( $p_{\text{intuition}}$ )
The intuition metric, visualized as the cyan region in the plots, quantifies the model’s ability to spontaneously follow the optimal path. In experiments, this intuition likelihood is measured as the fraction of independent trials where the system displays the optimal solution at inference (minimal MFPT with zero WHR).
The same empirical criterion can be applied to the effective theory. More interestingly, the intuition likelihood can also be calculated analytically if the optimal route is known. We define it as the joint probability of generating the true shortest path to the exit, for a horizon of $q$ steps (where $q$ depends on the maze topology). Let the optimal path be the sequence of states $z^{*}=(s_{0}^{*},s_{1}^{*},\dots,s_{q}^{*})$ , where $s_{0}^{*}$ is the starting position, and let $a_{t}^{*}$ be the optimal action to transition from $s_{t}^{*}$ to $s_{t+1}^{*}$ . The intuition likelihood for a given rationality level $m$ is:
$$
p_{\text{intuition}}(m)=\prod_{t=0}^{q-1}p_{m,\beta}(a_{t}^{*}|s_{t}^{*})
$$
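The product above is straightforward to evaluate once the optimal state-action sequence is known. A minimal sketch (the dictionary-style policy is our own illustration of the interface, not the paper's implementation):

```python
def p_intuition(policy, optimal_states, optimal_actions):
    """prod_{t=0}^{q-1} p_{m,beta}(a_t* | s_t*) along the known shortest path.

    policy          : mapping state -> probability vector over actions.
    optimal_states  : (s_0*, ..., s_{q-1}*), states visited along the path.
    optimal_actions : (a_0*, ..., a_{q-1}*), the optimal action at each state.
    """
    p = 1.0
    for s, a in zip(optimal_states, optimal_actions):
        p *= policy[s][a]
    return p
```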
Since the system can be multistable, the final value reported in the figure for a given $\lambda$ is the Boltzmann-weighted average of this likelihood over all coexisting free energy minima ( $m^{*},m^{**},\dots$ ):
$$
\langle p_{\text{intuition}}\rangle_{\lambda}=\sum_{i}w_{i}(\lambda)\cdot p_{\text{intuition}}(m_{i})\quad\text{where}\quad w_{i}(\lambda)=\frac{e^{-\hat{\beta}\mathcal{F}_{\lambda}(m_{i})}}{\sum_{j}e^{-\hat{\beta}\mathcal{F}_{\lambda}(m_{j})}},
$$
where $\hat{\beta}$ is an inverse temperature controlling the sampling of minima. At high $\hat{\beta}$ (low thermal noise), the system predominantly samples the global minimum, reproducing the steady-state results of the main experiments. In the numerical experiments, each run starts from a random weight initialization, and gradient descent acts as a local search that can fall into any of the attracting states. The likelihood metric is therefore zero in the imitation and hallucination phases (where the probability of following the optimal path is negligible) and peaks sharply in the narrow intuition window, provided the policy’s inference $\beta$ is sufficiently high.
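The multistable average can be sketched with softmax weights over the coexisting minima (the free-energy values in the test are illustrative):

```python
import numpy as np

def boltzmann_average(F_minima, p_minima, beta_hat):
    """<p_intuition>_lambda = sum_i w_i * p_intuition(m_i), with
    w_i = exp(-beta_hat * F(m_i)) / sum_j exp(-beta_hat * F(m_j)).

    Shifting by F.min() leaves the weights unchanged but avoids overflow.
    """
    F = np.asarray(F_minima, dtype=float)
    w = np.exp(-beta_hat * (F - F.min()))
    w /= w.sum()
    return float(w @ np.asarray(p_minima, dtype=float))
```

At large $\hat{\beta}$ the weights collapse onto the global minimum, reproducing the steady-state limit described above.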
### D.3 C. From Intuition to Inspiration: Further Predictions of the Effective Theory
The intuition phase represents a significant discovery, where the model finds a hidden, optimal solution that smoothly branches from the data-driven imitation phase. It is a better way, but not a radical departure. Intriguingly, the theory predicts the existence of a distinct, more profound cognitive phase: inspiration. Inspiration is not a continuous improvement but an abrupt jump to a qualitatively different state of insight. This corresponds to the emergence of a new, globally optimal minimum in the free-energy landscape, where the rationality parameter is close to $m\approx 1$ . A model in the inspiration phase does not merely approximate the optimal policy; it knows the solution is correct. This internalized understanding would manifest through a key operational signature: the model could execute the optimal strategy robustly, even with a stochastic policy (low inference $\beta$ ), distinguishing it from the more tentative intuitive state.
The theory predicts that the imagination temperature $\beta_{\text{dream}}$ —the policy stochasticity in the entropy term—is a key parameter for accessing these states (Fig. S4). At low $\beta_{\text{dream}}$ , the intuition phase ( $m>m_{D}$ ) is unstable. It emerges in a stable window only for sufficiently large $\beta_{\text{dream}}$ . At even higher values, this stable intuition branch can bifurcate into two locally stable solutions: the familiar intuition phase and this hidden inspiration phase ( $m\approx 1$ ). Both can coexist while the hallucination phase ( $m\ll m_{D}$ ) remains the global attractor. Observing this more exotic inspiration phase in practice would likely require careful tuning protocols, potentially starting from the intuition phase and employing non-equilibrium dynamics.
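The coexistence of branches described above corresponds to multiple local minima of $\mathcal{F}_{\lambda}(m)$. A minimal sketch that locates the minima of a sampled free-energy curve; the double-well curve is a toy stand-in for the actual functional, with wells placed near $m\approx 0.1$ (hallucination-like) and $m\approx 1$ (inspiration-like):

```python
import numpy as np

def local_minima(m_grid, F_vals):
    """Grid points m where a sampled curve F(m) has a local minimum
    (strictly below both neighbours; endpoints are ignored)."""
    F = np.asarray(F_vals, dtype=float)
    idx = [i for i in range(1, len(F) - 1) if F[i] < F[i - 1] and F[i] < F[i + 1]]
    return [m_grid[i] for i in idx]

# Toy double-well free energy with coexisting minima, mimicking multistability.
m = np.linspace(0.0, 1.2, 601)
F_toy = (m - 0.1) ** 2 * (m - 1.0) ** 2
minima = local_minima(m, F_toy)
```

Tracking how the set of minima changes as $\lambda$ or $\beta_{\text{dream}}$ is varied is exactly how the bifurcation into intuition and inspiration branches would be detected.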
<details>
<summary>FigureS4.png Details</summary>

### Visual Description
## Multi-Panel Line Graph Analysis: Entropy Metrics, First-Passage Times, and Rationality Minima
### Overview
The image is a composite figure containing three main panels (A, B, C), each displaying a set of three vertically stacked line graphs. The panels compare system behavior under three different values of a parameter labeled `β_dream` (2.0, 5.0, and 20.0). The graphs plot various metrics against a common x-axis, `λ` (log scale), which likely represents a control parameter or inverse temperature. The figure appears to be from a scientific or technical paper analyzing phase transitions, decision-making, or optimization processes.
### Components/Axes
**Global Structure:**
* **Three Main Panels:** Labeled **A**, **B**, and **C** in the top-left corner of each panel.
* **Panel Titles:** Each panel has a title indicating the `β_dream` value: `β_dream = 2.0` (A), `β_dream = 5.0` (B), `β_dream = 20.0` (C).
* **Common X-Axis:** All nine subplots share the same x-axis label at the bottom: `λ (log scale)`. The axis is logarithmic, with major tick marks at `10^-4`, `10^-3`, `10^-2`, `10^-1`, `10^0`, `10^1`, `10^2`, `10^3`, `10^4`.
* **Vertical Layout per Panel:** Each panel (A, B, C) contains three subplots stacked vertically.
**Subplot Details (Common to all panels, y-axes differ):**
1. **Top Subplot:**
* **Y-Axis Label:** `Expected Metric Values`.
* **Legend (Top-Left):**
* Solid dark red line: `⟨ℰ⟩_λ (cross-entropy)`
* Dash-dot blue line: `⟨H⟩_λ (path-entropy)`
* **Inset Plot (Center-Right):** A smaller plot titled `Critical Points`.
* **Y-Axis Label:** `First Derivative`.
* **Legend:** Solid dark red line: `ℰ`; Dash-dot blue line: `H`.
* **X-Axis:** Same log scale as the main plot (`10^-4` to `10^2`).
2. **Middle Subplot:**
* **Y-Axis Label:** `Normalized MFPT/WHR`.
* **Legend (Bottom-Right):**
* Solid black line: `MFPT`
* Dashed red line: `WHR`
* Light cyan shaded area: `P_intuition`
3. **Bottom Subplot:**
* **Y-Axis Label:** `Rationality Minima (m)`.
* **Legend (Bottom-Left):**
* Solid black line: `m* (global)`
* Dashed black line: `m** (2nd)`
* Dash-dot purple line: `m_d = 0.7` (a constant reference line).
* **Panel C only:** An additional dotted black line: `m*** (3rd)`.
### Detailed Analysis
**Panel A (β_dream = 2.0):**
* **Top (Entropy):** Both `⟨ℰ⟩_λ` and `⟨H⟩_λ` start at a low, constant value (~1.00 and ~1.27 respectively) for `λ < 10^0`. They undergo a sharp, sigmoidal increase between `λ ≈ 10^0` and `λ ≈ 10^1`, saturating at a high constant value (~1.38). The `⟨H⟩_λ` curve rises slightly earlier than `⟨ℰ⟩_λ`. The inset shows the first derivatives of both metrics peak sharply in the transition region (`λ ≈ 10^0`), with the `ℰ` peak being taller and narrower than the `H` peak.
* **Middle (MFPT/WHR):** `MFPT` starts high (~0.95), remains stable until `λ ≈ 10^0`, then drops sharply to a low plateau (~0.38) by `λ ≈ 10^1`. `WHR` starts at 0, begins rising at `λ ≈ 10^0`, and saturates at 1.0 by `λ ≈ 10^1`. The cyan `P_intuition` shaded region is a narrow peak centered around `λ ≈ 10^0.5` (approx. 3), coinciding with the crossover point of the MFPT and WHR curves.
* **Bottom (Rationality):** `m* (global)` starts at ~0.7, begins a smooth decline at `λ ≈ 10^0`, and approaches 0 by `λ ≈ 10^2`. `m** (2nd)` is a constant line at ~0.72, slightly above the initial `m*`. The `m_d = 0.7` reference line is constant.
**Panel B (β_dream = 5.0):**
* **Top (Entropy):** Similar sigmoidal transition as Panel A, but it occurs at a lower `λ` value (starting around `λ ≈ 10^-1`). The saturation values appear similar. The inset derivative peaks are now located at `λ ≈ 10^-1`. The `H` derivative shows a small secondary bump before the main peak.
* **Middle (MFPT/WHR):** The transition is more complex. `MFPT` starts high (~1.0), drops sharply at `λ ≈ 10^-1`, but then exhibits a pronounced non-monotonic "bump" or local maximum between `λ ≈ 10^0` and `λ ≈ 10^1` before settling to its low plateau. `WHR` rises from 0 starting at `λ ≈ 10^-1` and saturates at 1.0. The `P_intuition` region is broader and more complex, with a main peak under the MFPT drop and a secondary lobe under the MFPT bump.
* **Bottom (Rationality):** `m* (global)` starts at ~0.7, begins declining at `λ ≈ 10^-1`, and approaches 0. `m** (2nd)` now shows a dynamic behavior: it starts at ~0.7, rises to ~0.85 by `λ ≈ 10^1`, and then plateaus. The `m_d = 0.7` line remains constant.
**Panel C (β_dream = 20.0):**
* **Top (Entropy):** The sigmoidal transition is even sharper and occurs at the lowest `λ` (starting before `λ ≈ 10^-2`). The curves appear almost step-like. The inset derivative peaks are very sharp and located at `λ ≈ 10^-2`. The `H` derivative shows a distinct double-peak structure.
* **Middle (MFPT/WHR):** The `MFPT` drop is very steep, occurring at `λ ≈ 10^-2`. The non-monotonic bump seen in Panel B is absent; the curve drops directly to its low plateau. `WHR` rises sharply at the same `λ`. The `P_intuition` region is a single, sharp peak aligned with the transition.
* **Bottom (Rationality):** `m* (global)` starts at ~0.7 and drops to 0 very sharply at `λ ≈ 10^-2`. `m** (2nd)` starts at ~0.7, rises sharply to ~0.95 at `λ ≈ 10^0`, and then plateaus. A new `m*** (3rd)` line appears, starting at ~0.1 at `λ ≈ 10^-1`, jumping to ~0.95 at `λ ≈ 10^0`, and plateauing. The `m_d = 0.7` line is constant.
### Key Observations
1. **Phase Transition Shift:** As `β_dream` increases from 2.0 to 20.0, the critical transition point (where metrics change sharply) shifts to lower values of `λ` (from ~10^0 to ~10^-2).
2. **Transition Sharpness:** The transitions in both entropy and MFPT/WHR become sharper and more step-like with increasing `β_dream`.
3. **Complexity in Intermediate β:** Panel B (`β_dream=5.0`) shows the most complex behavior in the middle subplot, with a non-monotonic MFPT curve and a broad, multi-lobed `P_intuition` region. This suggests an intermediate regime with competing effects.
4. **Rationality Hierarchy:** The number of distinct rationality minima (`m*`, `m**`, `m***`) that become relevant increases with `β_dream`. In Panel C, a third minimum (`m***`) emerges.
5. **Correlation of Features:** The peaks in the derivative insets (top subplots) align with the `λ` values where the main entropy curves, MFPT/WHR curves, and `P_intuition` regions show their most dramatic changes.
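Observations 1–2 and 5 amount to reading a transition's location and sharpness off the peak of the numerical derivative on the log-`λ` grid. A minimal sketch of this extraction, using synthetic sigmoids as stand-ins for the measured entropy curves (the steepness `k` and centers below are illustrative, not the paper's fitted values):

```python
import numpy as np

def transition_point(log_lam, curve):
    """Locate a transition as the peak of |d curve / d log10(lambda)|;
    the peak height serves as a proxy for the transition's sharpness."""
    d = np.abs(np.gradient(curve, log_lam))  # numerical first derivative
    i = int(np.argmax(d))
    return log_lam[i], d[i]

# Synthetic sigmoidal "entropy" curves: a steeper k mimics a larger
# beta_dream, with the transition center shifted toward lower lambda.
log_lam = np.linspace(-3, 2, 501)
for k, center in [(2.0, 0.0), (5.0, -1.0), (20.0, -2.0)]:
    H = 1.0 / (1.0 + np.exp(-k * (log_lam - center)))
    loc, sharp = transition_point(log_lam, H)
    print(f"k={k:5.1f}  transition at log10(lambda)={loc:+.2f}  peak |dH|={sharp:.2f}")
```

As in the figure, the detected transition shifts left and its derivative peak grows as the steepness (the stand-in for `β_dream`) increases.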
### Interpretation
This figure illustrates the behavior of a system governed by a parameter `λ` under different "dream" or exploration intensities (`β_dream`). The data suggests a **phase transition** from one regime (low `λ`) to another (high `λ`).
* **Low `λ` Regime:** Characterized by low entropy (ordered state), high mean first-passage time (MFPT, slow dynamics), zero WHR, and a single dominant rationality minimum (`m*`), consistent with the imitation phase described in the text.
* **High `λ` Regime:** Characterized by high entropy (disordered, exploratory state), low MFPT (fast dynamics), saturated WHR, and the emergence of multiple rationality minima (`m**`, `m***`), indicating a more complex decision landscape, consistent with the rule-breaking hallucination phase.
* **Role of `β_dream`:** This parameter controls the **location and sharpness** of the transition. Higher `β_dream` makes the system switch between regimes at a lower `λ` value and in a more abrupt, discontinuous manner. The intermediate `β_dream=5.0` reveals a richer structure, possibly indicating a region where ordered and disordered phases coexist or compete, leading to the non-monotonic MFPT and broad intuition peak.
* **`P_intuition`:** This shaded region likely represents the parameter space where an intuitive or heuristic strategy is most effective or probable. Its peak coincides with the phase transition, suggesting intuition is most valuable when the system is poised between order and disorder.
In essence, the figure maps out how increasing exploratory pressure (`β_dream`) shifts and sharpens a fundamental transition in system dynamics, performance, and the underlying rational structure, with intuition playing a key role at the critical point.
</details>
Figure S4: Dependence of theoretical predictions on the imagination temperature $\beta_{\text{dream}}$ . The theoretical phase diagram is shown for increasing values of $\beta_{\text{dream}}=\{2.0,5.0,20.0\}$ . This parameter controls the policy stochasticity in the self-referential entropy calculation. (A-C) As $\beta_{\text{dream}}$ increases, the system’s phase diagram (bottom row) changes, and higher values of this temperature can also reveal more complex phase structures, including the emergence of the inspiration phase, as discussed in the text. Insets in the first row (here and in the main text) show the numerical first derivatives of both the cross-entropy and the path-entropy for low sampling temperature at equilibrium (thus for global attractors). The separation between the peaks of the discontinuities (B,C) signals the entropy- and energy-driven transitions that delimit the intuition window.
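The peak-separation measurement described in the caption can be sketched numerically: with two transition curves standing in for the cross-entropy and the path-entropy (synthetic sigmoids below, not the paper's data), the width of the intuition window is read off as the distance between their derivative peaks on the log-`λ` axis.

```python
import numpy as np

log_lam = np.linspace(-3, 2, 501)

def derivative_peak(curve):
    """log10(lambda) at which |d curve / d log10(lambda)| is maximal."""
    d = np.abs(np.gradient(curve, log_lam))
    return log_lam[int(np.argmax(d))]

# Stand-ins for the energy-driven (cross-entropy) and entropy-driven
# (path-entropy) transitions; centers and steepness are illustrative.
S_cross = 1.0 / (1.0 + np.exp(-8.0 * (log_lam + 1.5)))
S_path = 1.0 / (1.0 + np.exp(-8.0 * (log_lam + 0.5)))

window = derivative_peak(S_path) - derivative_peak(S_cross)
print(f"intuition window width in log10(lambda): {window:.2f}")
```

When the two peaks coincide (Panel A), this width collapses to zero; their growing separation in Panels B and C is what delimits the intuition window.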