# Statistical physics for artificial neural networks
**Authors**:
- Zongrui Pei (New York University, New York, NY 10003, USA)
- E-mail: peizongrui@gmail.com; zp2137@nyu.edu
Abstract

The 2024 Nobel Prize in Physics was awarded for pioneering contributions at the intersection of artificial neural networks (ANNs) and spin-glass physics, underscoring the profound connections between these fields. The topological similarities between ANNs and Ising-type models, such as the Sherrington-Kirkpatrick model, reveal shared structures that bridge statistical physics and machine learning. In this perspective, we explore how concepts and methods from statistical physics, particularly those related to glassy and disordered systems like spin glasses, are applied to the study and development of ANNs. We discuss the key differences, common features, and deep interconnections between spin glasses and neural networks while highlighting future directions for this interdisciplinary research. Special attention is given to the synergy between spin-glass studies and neural network advancements and the challenges that remain in statistical physics for ANNs. Finally, we examine the transformative role that quantum computing could play in addressing these challenges and propelling this research frontier forward.

Contents
1 Introduction
2 Spin-glass physics for artificial neural networks
  2.1 Relations of spin glasses, biological neurons, and associative memory
  2.2 Dictionary of corresponding concepts
  2.3 Hopfield neural network and Boltzmann machines
  2.4 Replica theory and the cavity method
  2.5 Overparameterization and double-descent behavior
3 Challenges and perspectives
  3.1 Challenges in ANNs and spin glasses for ANNs
  3.2 Spin-glass physics helps understand ANNs
  3.3 Do we need new order parameters for ANNs?
  3.4 Quantum computing for spin glasses and ANNs
  3.5 More opportunities
4 Conclusions
1 Introduction
Artificial neural networks (ANNs) have had a profound influence on many sectors, as demonstrated by numerous notable milestones. AlexNet was one of the first models to show the impressive capabilities of ANNs in classification tasks on the ImageNet dataset krizhevsky2012imagenet. AlphaGo demonstrated that ANNs can outperform human players in games silver2016mastering; silver2017mastering, and ANNs also power self-driving cars badue2021self. Recently, large language models chatGPT; pei2025language, which are complex ANN models, have fundamentally changed how we work, learn, and teach, influencing nearly everyone.

ANNs have also been widely adopted in various scientific research domains. An outstanding example is AlphaFold, which solved a half-century-old challenge in structural biology jumper2021highly and accelerated the determination of three-dimensional (3D) protein structures, potentially revolutionizing drug discovery and the healthcare industry. A few more typical examples in physics are shown in Figure 1. ANNs have been used to describe many-body interactions and construct empirical potentials, such as PhysNet unke2019physnet; ANN potentials with multiple neural layers are also referred to as deep potentials zhang2018deep. Recently, inspired by the success of foundation models for natural language, foundation models for potentials have been proposed, and initial studies have been conducted batatia2023foundation. Order parameters describe phase transitions in thermodynamics, and ANNs have been successfully used to construct new order parameters yin2021neural; rogal2019neural; jung2025roadmap. As a proof of concept, ANNs have also been used to rediscover physical laws and concepts iten2020discovering. ANN models have been developed to design novel materials pei2023toward; Tshitoyan2019; pei2024designing; pei2024computer, linking the physical properties of materials to their crystal structures, microstructures, and processing pei2021machine; pei2024towards. Applications include low-dimensional materials, structural materials, and functional materials, among others.
Figure 1: The applications of artificial neural networks in physics. Artificial neural networks have been widely utilized to solve a diverse range of problems, including those in scientific research. Here, we show a few typical examples in physical research, i.e., (i) many-body interactions (machine-learning potentials) unke2019physnet; zhang2018deep, (ii) thermodynamics (proposal of order parameters to describe phase transitions) yin2021neural; rogal2019neural, (iii) discovery of physical laws and concepts iten2020discovering, (iv) discovery or design of new materials pei2021machine, and (v) language models for materials design pei2023toward; Tshitoyan2019; pei2024towards.
ANNs have been employed to understand Ising-like physical systems mills2020finding; carrasquilla2017machine; hibat2021variational; fan2023searching, including spin glasses hibat2021variational; fan2023searching, and they provide a new opportunity to understand spin-glass physics. Fan et al. searched for spin-glass ground states through deep reinforcement learning fan2023searching. Huang et al. confirmed the efficiency of classical machine learning in describing quantum phases, taking 2D random Heisenberg models as an example huang2022provably. ANNs can also boost Monte Carlo simulations of spin glasses (autoregressive neural networks) mcnaughton2020boosting and search for spin-glass ground states (tropical tensor networks) liu2021tropical.
Figure 2: The history of artificial neural networks and relevant physical models. Other landmarks include backpropagation in the 1970s and AlexNet in 2012. Interestingly, bio-inspired ANNs ultimately led to the solution of the protein-folding problem that had plagued us for half a century.
The significant breakthroughs in developing ANNs highlight the importance of convergence across different disciplines. ANNs mimic the structure and function of biological neurons, inspired by the human brain. In addition to this biological origin, the physical foundation of ANNs was recognized by the 2024 Nobel Prize in Physics, awarded to physicist John J. Hopfield and computer scientist Geoffrey E. Hinton hinton2025nobel; their foundational discoveries enabled the development of ANNs and their applications today. ANNs are closely connected with spin glasses, a traditional topic first theorized by Edwards and Anderson in 1975 edwards1975theory that has attracted considerable attention in recent years. The importance of spin glasses as a representative complex system across research domains and length scales was demonstrated by the 2021 Nobel Prize in Physics, awarded partly to Giorgio Parisi parisi1979infinite. The 2021 and 2024 Nobel Prizes have contributed to this increased attention: the prize-winning studies revealed a profound connection between spin glasses and diverse disciplines, ranging from large to small scales, encompassing complexity science, biology, and computer science.
There are a few landmarks in the history of ANNs [Figure 2]: the seminal work of McCulloch and Pitts on neural networks in 1943 mcculloch1943logical, which initiated the research domain; the Hebbian mechanism, or Hebb's rule, proposed in 1949 to describe when neurons fire hebb2005organization; the simple ANN perceptron proposed by Rosenblatt in 1958 rosenblatt1958perceptron; and the associative neural networks proposed by Little in 1974 little1974existence and later by Hopfield in 1982 hopfield1982neural. In 1985, Hinton and colleagues proposed the Boltzmann machine ackley1985learning, a stochastic recurrent neural network (RNN) in which the energy of each network configuration follows the Boltzmann distribution. Restricted Boltzmann machines were proposed later, removing the connections between neurons within the same layer to improve the network's efficiency. The proposal of backpropagation in the 1970s, which Hinton and colleagues utilized to optimize ANNs such as AlexNet in 2012, is another significant milestone. In 2017, the transformer architecture was proposed as an alternative to RNN structures, with the attention mechanism as its key novel component. Owing to its high efficiency and suitability for optimizing large models, it opened a new era for deep neural networks. AlphaFold jumper2021highly, the Nobel-Prize-winning model, and popular large language models such as GPT-3.5/4/4.5 and Llama 2/3/4 all adopt the transformer architecture.
Despite this enormous success, the mechanisms behind it remain enigmatic. Fortunately, it has been demonstrated that idealized versions of these powerful networks are mathematically equivalent to older, simpler machine-learning models, such as kernel machines belkin2018understand; jacot2018neural, or to physical models jcrn-3nrc. The physical foundation of ANNs is therefore closely related to magnetism (spins) and statistical physics amit1985spin; amit1987statistical; watkin1993statistical; mezard2024spin. Hopfield networks hopfield1982neural; hopfield1999brain; krotov2023new and Boltzmann machines ackley1985learning are essentially Sherrington-Kirkpatrick (SK) spin glasses with random interaction parameters sherrington1975solvable, in which each neuron or spin is connected to all others except itself. When mapping the spin-glass structure to an ANN, each state corresponds to a pattern or memory in the Hopfield network. A finite number of metastable states indicates the limited capacity of the Hopfield network folli2017maximum, so it is essential to determine whether general spin glasses exhibit infinitely many metastable states, as the mean-field SK model does. Spherical spin-glass models have been used to study simplified, idealized ANNs choromanska2015loss. Parisi proposed a replica-symmetry-breaking (RSB) solution parisi1979infinite; charbonneau2023spin to the SK model sherrington1975solvable. RSB theory has also been applied to ANNs, and steps toward rigorous results have been discussed agliari2020replica. Ghio et al. analyzed sampling with flows, diffusion, and autoregressive neural networks from a spin-glass perspective ghio2024sampling. If this equivalence between ANNs and spin-glass systems can be extended beyond idealized neural networks, it may explain how practical ANNs achieve their astonishing results.
Box 1 | Basic Concepts of Statistical Physics Used by ANNs
Thermodynamics
Energy Landscapes and Loss Functions The loss functions in ANNs are similar to the total energy of physical systems [Figure 1]. Statistical physics concepts, such as energy landscapes, are used to understand the dynamics and optimization in neural networks. For example, the weights and states in neural networks can be thought of as occupying a rugged energy landscape with many local minima, just like glassy Ising models. To optimize or train ANNs, we use methods such as simulated annealing and stochastic gradient descent, which are inspired by statistical physics.
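As an illustration of optimization on a rugged landscape, here is a minimal simulated-annealing sketch. The one-dimensional `energy` function below is an invented toy example (not from the text); the Metropolis acceptance rule lets the search occasionally move uphill and escape local minima while the temperature is slowly lowered:

```python
import math
import random

def energy(x):
    """A hypothetical rugged 1D 'loss landscape' with several local minima."""
    return x * x + 2.0 * math.sin(5.0 * x)

def simulated_annealing(steps=20000, t_start=2.0, t_end=1e-3, seed=0):
    rng = random.Random(seed)
    x = rng.uniform(-3.0, 3.0)
    e = energy(x)
    for k in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / steps)
        x_new = x + rng.gauss(0.0, 0.3)
        e_new = energy(x_new)
        # Metropolis rule: always accept downhill moves; accept uphill
        # moves with Boltzmann probability exp(-dE/T).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
    return x, e

x_min, e_min = simulated_annealing()
```

The same acceptance rule, with the loss function playing the role of the energy, underlies the physical picture of stochastic training dynamics described above.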
Entropy and Information Theory Entropy from statistical physics is used to measure uncertainty in predictions and model behavior. One type of entropy, known as âcross entropy,â between the actual values and predictions can be used as an error function. In addition, the entropy concept is closely related to the Information Bottleneck Principle, which explains how neural networks compress information during training, much like physical systems reduce entropy under certain conditions.
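For instance, the cross entropy between one-hot targets and predicted class probabilities can serve directly as the error function; a minimal sketch:

```python
import math

def cross_entropy(targets, predictions, eps=1e-12):
    """Average cross entropy H(p, q) = -sum_i p_i log q_i between
    target distributions p and predicted probabilities q."""
    total = 0.0
    for p, q in zip(targets, predictions):
        total -= sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q))
    return total / len(targets)

# A confident correct prediction costs little; a confident wrong one costs a lot.
targets = [[1.0, 0.0], [0.0, 1.0]]
good = [[0.9, 0.1], [0.2, 0.8]]
bad = [[0.1, 0.9], [0.8, 0.2]]
```

Here `cross_entropy(targets, good)` is about 0.16, while `cross_entropy(targets, bad)` is about 1.96, so minimizing this entropy-based error drives the predicted distribution toward the data.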
Figure 3: Basic concepts in statistical physics and thermodynamics. Here, we show that each pattern (memory or configuration) corresponds to an energy state in the rugged energy landscape of the free energy function or an error function.
Phase Transitions The concept of phase transitions is useful for describing the behavior of ANNs. ANNs exhibit phase transitions during training when their predictions and reliability change abruptly upon small variations, around critical values, of the network structure (e.g., number of neural layers, parameter count per layer), the learning rate, and so on. This is similar to physical systems undergoing structural changes (e.g., spin glasses).
Bayesian Inference and Partition Functions
Bayesian statistics finds application in ANNs. Bayes' theorem is used to calculate and update probability distributions when new data or knowledge become available. This conditional structure matches the flexibility required by real-world problems, making it practically useful. Bayesian neural networks use principles from statistical physics for probabilistic inference, and the partition function, a cornerstone of statistical mechanics, is used to calculate probabilities in probabilistic graphical models, including some neural networks.
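For a system small enough to enumerate, the partition function and the resulting Boltzmann probabilities can be computed exactly. The frustrated three-spin triangle below is an illustrative toy example (the couplings and temperature are arbitrary choices, not from the text):

```python
import itertools
import math

def boltzmann_distribution(J, beta):
    """Exact Boltzmann distribution for a small Ising system by brute-force
    enumeration of all 2^N configurations. J maps spin pairs (i, j) to couplings."""
    n = max(max(pair) for pair in J) + 1
    states, energies = [], []
    for sigma in itertools.product([-1, 1], repeat=n):
        e = -sum(Jij * sigma[i] * sigma[j] for (i, j), Jij in J.items())
        states.append(sigma)
        energies.append(e)
    weights = [math.exp(-beta * e) for e in energies]
    Z = sum(weights)                 # the partition function
    probs = [w / Z for w in weights]
    return states, probs, Z

# Three spins with mixed ferro/antiferro couplings: a frustrated triangle.
J = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): -1.0}
states, probs, Z = boltzmann_distribution(J, beta=1.0)
```

The normalized weights `probs` are exactly the probabilities that probabilistic graphical models assign to configurations; for larger systems the sum over $2^N$ states must be approximated, which is where sampling methods enter.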
Learning Dynamics and Langevin Equations
There are a few connections between Langevin equations and ANNs. Here, we mention two of them: (i) the learning dynamics of ANNs can be described by Langevin equations, analogous to the motion of particles in a thermal bath; in studies of stochastic gradient descent (SGD), the noise introduced by mini-batch sampling is found to resemble thermal noise goldt2019dynamics. (ii) Diffusion models for image generation are closely related to Langevin equations albergo2023stochastic: both start from a white noisy background and gradually recover the true state through incremental updates of each particle or pixel.
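A minimal sketch of point (i), using overdamped Langevin dynamics on a quadratic loss. This is a standard idealization (the temperature here stands in for mini-batch noise; the step size, temperature, and loss are arbitrary choices, not from the text):

```python
import math
import random

def langevin_descent(grad, x0, eta=0.01, temperature=0.1, steps=5000, seed=1):
    """Overdamped Langevin update x <- x - eta*grad(x) + sqrt(2*eta*T)*xi,
    where the Gaussian noise xi plays the role of mini-batch noise in SGD."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        x = x - eta * grad(x) + math.sqrt(2 * eta * temperature) * rng.gauss(0.0, 1.0)
    return x

# Quadratic loss L(x) = x^2 / 2, so grad(x) = x. The stationary distribution
# is then a Gaussian whose variance equals the temperature.
samples = [langevin_descent(lambda x: x, x0=5.0, seed=s) for s in range(200)]
mean = sum(samples) / len(samples)
```

The long-time samples fluctuate around the minimum with variance set by the temperature, which is the thermal-bath picture of SGD described above.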
2 Spin-glass physics for artificial neural networks
Just as biological neurons are the biological analogue of ANNs, spin glasses could be considered their physical analogue: they were the inspiration for the original Hopfield model hopfield1982neural, which itself is viewed by some as a type of spin glass. We discuss their relationships below.
2.1 Relations of spin glasses, biological neurons, and associative memory
Spin glass A spin glass is a type of magnetic system in which the quenched interactions between localized spins may be either ferromagnetic or antiferromagnetic with roughly equal probability. It exhibits an array of experimental properties: spin freezing with no long-range magnetic order at low temperature, a magnetic susceptibility cusp at a critical temperature accompanied by the absence of a singularity in the specific heat, slow relaxation, aging, and several others stein2013spin; stein2011spin. Most theoretical work has focused on the Edwards-Anderson (EA) spin glass edwards1975theory proposed in 1975 and its infinite-range analogue, the Sherrington-Kirkpatrick (SK) model sherrington1975solvable, proposed the same year.
The EA Hamiltonian in the absence of an external field is given by
$$
H=-\sum_{<ij>}J_{ij}\sigma_{i}\sigma_{j}\ , \tag{1}
$$
where the couplings $J_{ij}$ are i.i.d. random variables representing the interaction between spins $\sigma_{i}$ and $\sigma_{j}$ at nearest-neighbor sites $i$ and $j$ (denoted by the bracket in (1)). One is free to choose the distribution of the couplings: a common choice, used here, is a Gaussian distribution with zero mean and unit variance. This distinguishes the spin glass from more conventional magnets: if $J_{ij}$ were a constant $J$, the model would reduce to either a ferromagnet ($J>0$) or an antiferromagnet ($J<0$), both with an ordered ground state.
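As a concrete illustration, the EA energy (1) on a small periodic square lattice with Gaussian couplings can be evaluated directly. The lattice size and random seed below are arbitrary choices:

```python
import random

def ea_energy(spins, couplings):
    """Energy H = -sum_<ij> J_ij s_i s_j of the Edwards-Anderson model on an
    L x L square lattice with periodic boundaries. couplings[(i, j, axis)]
    holds the bond to the right (axis=0) or down (axis=1) neighbor."""
    L = len(spins)
    e = 0.0
    for i in range(L):
        for j in range(L):
            e -= couplings[(i, j, 0)] * spins[i][j] * spins[i][(j + 1) % L]
            e -= couplings[(i, j, 1)] * spins[i][j] * spins[(i + 1) % L][j]
    return e

rng = random.Random(0)
L = 8
# Quenched disorder: i.i.d. Gaussian couplings with zero mean and unit variance.
J = {(i, j, a): rng.gauss(0.0, 1.0)
     for i in range(L) for j in range(L) for a in (0, 1)}
spins = [[rng.choice([-1, 1]) for _ in range(L)] for _ in range(L)]
E = ea_energy(spins, J)
```

Because the Hamiltonian is quadratic in the spins, flipping every spin leaves the energy unchanged, the global $Z_2$ symmetry of Eq. (1).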
As noted above, when every pair of spins interacts with each other, the model becomes the SK model, a mean-field spin glass with Hamiltonian
$$
H=-\frac{1}{\sqrt{N}}\sum_{i<j}J_{ij}\sigma_{i}\sigma_{j}\ , \tag{2}
$$
where the couplings $J_{ij}$ are again i.i.d. Gaussian variables with zero mean and unit variance. The thermodynamic properties of the SK model are well understood, and analytical expressions have been found for its free energy and order parameters parisi1979infinite. Whether similar properties hold for short-range spin glasses remains an open question, and understanding them in finite dimensions is a major challenge in both condensed matter physics and statistical mechanics.
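The SK Hamiltonian (2) can be sampled with a textbook Metropolis loop; the sketch below (system size, temperature, and sweep count are arbitrary choices, not from the text) relaxes a random configuration toward low-energy states:

```python
import math
import random

def sk_metropolis(n=64, beta=1.5, sweeps=200, seed=0):
    """Metropolis sampling of the SK model H = -(1/sqrt(N)) sum_{i<j} J_ij s_i s_j
    with Gaussian couplings; returns the spins and the energy per spin."""
    rng = random.Random(seed)
    # Quenched symmetric Gaussian couplings with zero diagonal.
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            J[i][j] = J[j][i] = rng.gauss(0.0, 1.0)
    s = [rng.choice([-1, 1]) for _ in range(n)]

    def local_field(i):
        # h_i = (1/sqrt(N)) sum_{j != i} J_ij s_j  (J[i][i] is zero).
        return sum(J[i][j] * s[j] for j in range(n)) / math.sqrt(n)

    for _ in range(sweeps):
        for i in range(n):
            dE = 2.0 * s[i] * local_field(i)   # energy cost of flipping spin i
            if dE <= 0 or rng.random() < math.exp(-beta * dE):
                s[i] = -s[i]
    energy_per_spin = -sum(s[i] * local_field(i) for i in range(n)) / (2 * n)
    return s, energy_per_spin

spins, e = sk_metropolis()
```

Below the spin-glass transition temperature the chain settles into one of many low-energy valleys rather than a unique ordered state, which is the glassy behavior the mean-field theory describes.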
ANNs and spin glasses Magnetic systems have long been used to model neural networks, going back (at least) to the work of McCulloch and Pitts mcculloch1943logical. In its simplest form, a neuron is assumed to have only two states: "on" (firing) and "off" (quiescent). Its state can therefore be modelled as an Ising spin, where $+1$ corresponds to the firing state and $-1$ to the quiescent state. A state, or firing pattern, of the entire neural network then corresponds to a spin configuration $\{\sigma\}$. The interactions between neurons, i.e., synaptic efficiencies, correspond to the couplings $J_{ij}$ between spins in a magnetic system.
The Hebb learning rule is used to determine the coupling parameters hebb2005organization. In associative neural networks, such as the Hopfield network, suppose there are $p$ patterns (corresponding to memories). Then the interactions $J_{ij}$ between neurons are modelled as
$$
J_{ij}=\frac{1}{N}\sum_{\mu=1}^{p}\xi_{i}^{\mu}\xi_{j}^{\mu}, \tag{3}
$$
where $\xi_{i}^{\mu}$, which also takes the values $\pm 1$, is the state of the $i^{\rm th}$ spin in the $\mu^{\rm th}$ pattern (spin configuration). When $p=1$, the system is equivalent to a Mattis model mattis1976solvable, which is gauge-equivalent to a ferromagnet. Setting $p>1$ introduces frustration into the system, and its behavior becomes more spin-glass-like. Apart from its interpretation as a model for neural networks, the Hopfield model is an interesting statistical-mechanical system in its own right and has been the subject of considerable study. Several studies have used statistical mechanics methods to investigate phase transitions in the Hopfield model hopfield1982neural; amit1985storing; gardner1988space and have found a bound on the number $p$ of memories that can be faithfully recalled from any initial state in a system of $N$ spins or neurons.
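The Hebb rule (3) and the retrieval dynamics can be sketched in a few lines. The network below stores $p=3$ random patterns in $N=100$ neurons (well below the capacity bound discussed above; the sizes and noise level are illustrative choices) and recovers a corrupted pattern:

```python
import random

def train_hebb(patterns):
    """Hebb rule J_ij = (1/N) sum_mu xi_i^mu xi_j^mu, with zero diagonal."""
    n = len(patterns[0])
    J = [[0.0] * n for _ in range(n)]
    for xi in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    J[i][j] += xi[i] * xi[j] / n
    return J

def recall(J, state, sweeps=10):
    """Asynchronous updates s_i <- sign(sum_j J_ij s_j), which drive the
    state toward a stored pattern acting as a fixed point."""
    n = len(state)
    s = list(state)
    for _ in range(sweeps):
        for i in range(n):
            h = sum(J[i][j] * s[j] for j in range(n))
            s[i] = 1 if h >= 0 else -1
    return s

rng = random.Random(42)
n, p = 100, 3                      # N neurons, p stored patterns (p << N)
patterns = [[rng.choice([-1, 1]) for _ in range(n)] for _ in range(p)]
J = train_hebb(patterns)

# Corrupt 10% of the first pattern's bits, then let the dynamics clean it up.
noisy = list(patterns[0])
for i in rng.sample(range(n), 10):
    noisy[i] = -noisy[i]
recovered = recall(J, noisy)
```

The update rule only lowers the network energy, so the dynamics fall into the nearest attractor, which is the associative-memory picture of the Hopfield model.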
In addition to Hopfield networks, some idealized ANNs have similar correspondences with spin glasses little1974existence, such as the spherical SK spin-glass models choromanska2015loss [Figure 4]. In a spherical spin glass, the spins have varying magnitudes subject to the constraint $\sum_{i=1}^{N}\sigma_{i}^{2}=N$. Given training data $\{X_{i}\},y$, a neural network with $H-1$ hidden layers can be written as choromanska2015loss $y=q\sigma\big(W^{T}_{H}\sigma(W^{T}_{H-1}...\sigma(W_{1}^{T}X))...\big)$, where $q$ is a normalization factor. The loss function of this ANN is
$$
L_{\Lambda,H}(\bm{\tilde{w}})=\frac{1}{\Lambda^{(H-1)/2}}\sum_{i_{1},i_{2},...,i_{H}=1}^{\Lambda}X_{i_{1},i_{2},...,i_{H}}\tilde{w}_{i_{1}}\tilde{w}_{i_{2}}...\tilde{w}_{i_{H}}, \tag{4}
$$
where $\Lambda$ is the number of weights. By imposing the spherical constraint $\frac{1}{\Lambda}\sum_{i=1}^{\Lambda}\tilde{w}_{i}^{2}=1$ on the weights, the loss function of this neural network was found to be mathematically equivalent to the Hamiltonian of a spherical spin glass.
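For $H=2$, loss (4) reduces to a quadratic form in the weights; the sketch below evaluates it for random Gaussian couplings, with the weights rescaled onto the sphere $\frac{1}{\Lambda}\sum_i \tilde{w}_i^2=1$ (the size and seed are arbitrary illustrative choices):

```python
import math
import random

def spherical_loss(X, w):
    """Loss (4) for H = 2: L = Lambda^{-1/2} sum_{i,j} X_ij w_i w_j,
    evaluated for weights satisfying (1/Lambda) sum_i w_i^2 = 1."""
    lam = len(w)
    return sum(X[i][j] * w[i] * w[j]
               for i in range(lam) for j in range(lam)) / math.sqrt(lam)

rng = random.Random(7)
lam = 50
X = [[rng.gauss(0.0, 1.0) for _ in range(lam)] for _ in range(lam)]

# Random weights rescaled onto the sphere (1/Lambda) * sum_i w_i^2 = 1.
w = [rng.gauss(0.0, 1.0) for _ in range(lam)]
norm = math.sqrt(sum(wi * wi for wi in w) / lam)
w = [wi / norm for wi in w]
value = spherical_loss(X, w)
```

Because the loss is a degree-$H$ homogeneous polynomial in the weights, the spherical constraint is what pins its scale and makes the identification with a spherical spin-glass Hamiltonian exact.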
Figure 4: Topology of an artificial neural network and Sherrington-Kirkpatrick (SK) model. a, A simple, fully connected neural network with one hidden layer. b, A four-spin mean field SK spin-glass model.
Similarity between biological neurons and ANNs ANNs are designed to imitate the structure of biological neurons [Figure 5] duan2020spiking. A neuron or nerve cell consists of three major parts: a cell body (soma), dendrites (receiving extensions), and an axon (conducting extension). When two neurons are connected, the presynaptic cell sends an electrochemical signal to the postsynaptic cell. The signal travels from the axon of the presynaptic cell to the dendrites of the postsynaptic cell, which are directly connected. A threshold potential controls the activation of a neural connection [Figure 5 c]: the postsynaptic potential determines when the neuron fires an action potential. The potential at a specific neuron is a function of all postsynaptic potentials from its presynaptic neurons, i.e.,
$$
V_{i}=\sum_{j}J_{ij}(S_{j}+1). \tag{5}
$$
When $V_{i}$ is larger than a threshold $U_{i}$ , i.e., $V_{i}-U_{i}>0$ , the neuron is active ( $S_{i}=1$ ). Using the Heaviside function $H(x)$ , the state of the neuron can be written as $S_{i}=H(V_{i}-U_{i})$ . We use $h_{i}$ to represent the molecular field, $h_{i}=V_{i}-U_{i}$ . When the spin direction is parallel to $h_{i}$ , the local configuration is stable, i.e., $h_{i}S_{i}>0$ . Usually, the threshold is assumed to satisfy $U_{i}=\sum_{j}J_{ij}$ ; then we find $-\frac{1}{2}\sum_{i}h_{i}S_{i}\equiv H=-\frac{1}{2}\sum_{ij}J_{ij}S_{i}S_{j}$ , which has the same form as Eq. 1. The above formulation is for zero temperature. At finite temperature, the probability of activating one neuron is $P(S_{i})=\frac{1}{\exp[-\beta(V_{i}-U_{i})]+1}$ , which is a Fermi-Dirac distribution. Here, $\beta$ is the inverse temperature, defined by $\beta=1/k_{B}T$ , with $k_{B}$ the Boltzmann constant and $T$ the temperature.
ANNs inherit these key features. Each artificial neuron sends data to the neurons in the following layer, which respond collectively. This response is calculated by matrix multiplication plus a bias. The activation and updating of connections between neurons follow Hebb's rule hebb2005organization, which states that when two neurons fire together, the excitatory (ferromagnetic, in magnetic terms) component of their coupling is enhanced: the more often two neurons are active together, the stronger their excitatory connection becomes. This holds for both biological and artificial neural networks.
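A schematic sketch of these two ingredients, the Fermi-Dirac activation probability and a Hebbian weight update, might look as follows (the network size, inverse temperature, and learning rate are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
N, beta = 8, 2.0                     # illustrative size and inverse temperature

def activation_probability(V, U, beta):
    """P(S_i = 1) = 1 / (exp[-beta (V_i - U_i)] + 1), the Fermi-Dirac form."""
    return 1.0 / (np.exp(-beta * (V - U)) + 1.0)

def hebbian_update(J, S, lr=0.01):
    """Hebb's rule in the ±1 convention: co-active pairs strengthen their coupling."""
    return J + lr * np.outer(S, S)

J = 0.1 * rng.standard_normal((N, N))
J = (J + J.T) / 2.0                  # symmetric couplings
np.fill_diagonal(J, 0.0)
S = rng.choice([-1, 1], size=N)      # neuron states (+1 active, -1 inactive)
U = J.sum(axis=1)                    # thresholds U_i = sum_j J_ij, as in the text
V = J @ (S + 1)                      # postsynaptic potentials, Eq. 5
p = activation_probability(V, U, beta)
S_new = np.where(rng.random(N) < p, 1, -1)   # stochastic firing at finite T
J = hebbian_update(J, S_new)
print(p)
```

At $\beta\to\infty$ the probabilities saturate to 0 or 1 and the deterministic threshold rule above is recovered.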
Figure 5: Biological neurons, artificial neural networks, and action potentials. a, Biological neuron and its associated concepts duan2020spiking. b, Artificial neural network with input features $V_{\mathrm{in},i}$ and output features $V_{\mathrm{out},i}$ duan2020spiking. Here, the resistor symbols (rectangles) represent the activation functions that switch the signals from individual inputs on or off. c, Action potentials naundorf2006unique. The top panel shows an action potential of a cat visual cortex neuron in vivo. The middle panel shows an action potential from a cat visual cortical slice in vitro at 20°C. The bottom panel shows a model potential. The arrow indicates the characteristic kink at the onset of the action potential. This figure is adapted from Refs. duan2020spiking; naundorf2006unique and modified.
2.2 Dictionary of corresponding concepts
To clarify the connections across the three research domains, we provide a list of analogous features of ANNs, biological neurons, and spin glasses in Table 1, where the corresponding concepts are compared. A few concepts in the table have been discussed above, such as the correspondence of a spin in a spin glass to a biological neuron in biology and to an artificial neuron in ANNs. This comparison helps identify interesting concepts that have not yet been studied, thus deepening our understanding and providing new research opportunities. For example, the Hamiltonian in spin glasses yields the total energy of the system and plays a role similar to the loss function in ANNs, whereas it has no correspondence in biological neurons. The effective learning mechanism of neurons remains elusive, although a few explanations have been proposed, such as the principle of predictive coding luczak2022neurons. The concept of a Hamiltonian, if it can capture the overall activity of biological neurons, may be relevant to what governs the coordination of specific neurons and the choice of particular information-propagation paths friston2010free; friston2009free. Various forms of order parameters, including spin-overlap parameters, magnetization, and giant-cluster size (commonly used in percolation theory), are physical quantities that summarize the average information of spin variables and capture phase states. An analogous quantity can be computed from the overall activity of biological or artificial neurons, for example, how closely the current firing pattern matches a learned activity pattern in the hippocampus or cortex. Studying order parameters in ANNs can help quantify how many neurons are typically used or activated, which is valuable for understanding the fundamental principles of ANNs, such as their kernel behavior.
Table 1: Dictionary for artificial neural networks (ANNs), spin glasses, and biological neurons. The terms in spin glasses and their corresponding terms in ANNs are compared. Interestingly, no term in ANNs corresponds to the order parameter, which is closely connected to the value of loss functions. The common features of ANNs and spin glasses explain (i) why Monte Carlo methods can explore their energy landscapes and (ii) why spin glass methods can be directly used to study ANNs. Another interesting question is whether we need to introduce new order parameters for ANNs.
| spin glasses | biological neurons | ANNs |
| --- | --- | --- |
| spin variables | neuron state (active/inactive) | features |
| interaction of two spins | electromagnetic signal | weight between two neurons |
| interaction strengths | action potential | weight matrix |
| total energy (Hamiltonian) | ? (unclear) | loss function |
| stable or metastable states | memory | memorized patterns in associative network |
| connectivity of spins | synapses | activation function |
| spherical SG models | ? (unknown) | topology of fully connected ANNs |
| order parameter | overall activity of neurons | ? (no correspondence) |
2.3 Hopfield neural network and Boltzmann machines
The Hopfield neural network hopfield1982neural consists of a single layer of fully interconnected neurons, with each neuron linked to every other neuron [Figure 4 a]. The metastable states, or local minima of the network, correspond to the stored patterns, which allows the creation of associative or content-addressable memories. Patterns are memorized and encoded in the network parameters. A similar idea was proposed by Little eight years prior to Hopfield's discovery little1974existence. Structurally, the network bears a topological resemblance to the SK model sherrington1975solvable, and its phase diagram is also similar to that of the SK model. The original motivation of the Hopfield network was to create a model of associative memory, in which a stored pattern can be retrieved from an incomplete or noisy input.
The Hopfield network evolves over time, with the state of each neuron changing dynamically (Figure 4 b). This temporal evolution allows it to be viewed as a multi-layered system, where each "layer" represents a different time step. In this sense, it functions as a fully connected neural network, with each neuron capable of reading and outputting data. The network stores patterns by adjusting connection weights, and the number of patterns it can retain is proportional to the number of neurons folli2017maximum.
The Hopfield neural network influenced the development of recurrent neural networks (RNNs), such as long short-term memory networks and the more efficient gated recurrent units. RNNs, characterized by recurrent neural layers, are designed to process sequential data, such as speech and natural language. The transformer architecture, in which self-attention replaces recurrence for processing sequential data, has gradually superseded these methods. The Hopfield neural network and its successors remain an active area of research agliari2019relativistic; saccone2022direct; negri2023storage.
Similar to the Hopfield neural network, the Boltzmann machine is a system in which each spin interacts with all the others ackley1985learning [Figure 6 b]. It consists of a visible layer and a hidden layer, with all neurons, whether in the visible (input/output) layer or the hidden layer, fully connected. Data are both input and output through the visible layer. A defining characteristic of Boltzmann machines is their stochastic nature: neuron activation is probabilistic rather than deterministic, influenced by connection weights and inputs. The probability of activation is determined by the Boltzmann distribution.
Due to the computational complexity of Boltzmann machines, more tractable "restricted" Boltzmann machines (RBMs) were introduced smolensky1986information; nair2010rectified [Figure 6 c]. RBMs adopt a bipartite structure, in which the digital (visible) layer and the analog (hidden) layer are fully connected to each other, but there are no intra-layer connections. This structural constraint allows the use of training algorithms that are more computationally efficient than those for unrestricted Boltzmann machines fischer2014training. Boltzmann machines have been widely applied to physical and chemical problems, including quantum many-body wavefunction simulations nomura2017restricted; melko2019restricted, modeling polymer conformational properties yu2019generating, and representing quantum states with non-Abelian or anyonic symmetries vieijra2020restricted.
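The bipartite structure is precisely what makes training efficient: given the visible layer, all hidden units are conditionally independent, and vice versa, so the layers can be sampled in one block each. A minimal block-Gibbs sampling sketch (the layer sizes and logistic units are illustrative choices, not tied to a specific reference implementation):

```python
import numpy as np

rng = np.random.default_rng(3)
n_visible, n_hidden = 6, 4           # illustrative layer sizes
W = 0.1 * rng.standard_normal((n_visible, n_hidden))
a = np.zeros(n_visible)              # visible biases
b = np.zeros(n_hidden)               # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v):
    """All hidden units are independent given v (no intra-layer connections)."""
    p = sigmoid(v @ W + b)
    return (rng.random(n_hidden) < p).astype(float), p

def sample_visible(h):
    """Likewise, visible units are independent given h."""
    p = sigmoid(W @ h + a)
    return (rng.random(n_visible) < p).astype(float), p

# One block-Gibbs step v -> h -> v', the core of contrastive-divergence training
v0 = rng.choice([0.0, 1.0], size=n_visible)
h0, _ = sample_hidden(v0)
v1, _ = sample_visible(h0)
print(v1)
```

In an unrestricted Boltzmann machine, intra-layer couplings destroy this conditional independence, forcing unit-by-unit Gibbs sampling and making training far more costly.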
Barra et al. studied the equivalence of Hopfield networks and Boltzmann machines (Figure 6 d) barra2012equivalence. The study is based on a "hybrid" Boltzmann machine (HBM) model, in which the $P$ hidden units in the analog layer are continuous (for pattern storage) and the $N$ visible neurons are discrete and binary. They showed that the HBM, when marginalized over the hidden units, is statistically equivalent to the Hopfield network. Assume $P(\sigma,z)$ is the joint distribution of the HBM, $P(z)$ is the distribution of the continuous hidden variables, which usually follow a Gaussian distribution, and $P(\sigma)$ is the distribution of the Hopfield network. These quantities follow Bayes' rule, $P(\sigma,z)=P(\sigma|z)P(z)=P(z|\sigma)P(\sigma)$ .
Since this work involves several key concepts used in ANNs, such as the diffusion model and pattern overlap, we discuss its theoretical details further here. The activity of the hidden layer follows a stochastic differential equation $T\frac{dz_{\mu}}{dt}=-z_{\mu}(t)+\sum_{i}\xi_{i}^{\mu}\sigma_{i}+\sqrt{\frac{2T}{\beta}}\zeta_{\mu}(t)$ , where $\zeta_{\mu}$ is a white Gaussian noise. The idea is similar to the diffusion model widely used in image and video generation. The stationary probability for $z_{\mu}$ described by this stochastic differential equation is $P(z_{\mu}|\sigma)=\sqrt{\frac{\beta}{2\pi}}\exp\bigg[-\frac{\beta}{2}\bigg(z_{\mu}-\sum_{i}\xi_{i}^{\mu}\sigma_{i}\bigg)^{2}\bigg]$ . The Hamiltonian of the HBM shown in Figure 6 d is $H_{hbm}(\sigma,z,\tau;\xi,\eta)=\frac{1}{2}\bigg(\sum_{\mu}z^{2}_{\mu}+\sum_{\nu}\tau^{2}_{\nu}\bigg)-\sum_{i}\sigma_{i}\bigg(\sum_{\mu}\xi_{i}^{\mu}z_{\mu}+\sum_{\nu}\eta_{i}^{\nu}\tau_{\nu}\bigg)$ . Then, the joint probability for the HBM is $P(\sigma,z,\tau)=\exp[-\beta H_{hbm}(\sigma,z,\tau;\xi,\eta)]/Z(\beta,\xi,\eta)$ , where the partition function is $Z(\beta,\xi,\eta)=\sum_{\sigma}\int\prod^{P}_{\mu=1}dz_{\mu}\int\prod^{K}_{\nu=1}d\tau_{\nu}\exp[-\beta H_{hbm}(\sigma,z,\tau;\xi,\eta)]$ . Assisted by the Gaussian integral and Bayes' rule, the marginal probability for $\sigma$ is $P(\sigma)\propto\exp\bigg(\frac{\beta}{2}\sum_{i,j}\big(\sum_{\mu}\xi_{i}^{\mu}\xi_{j}^{\mu}\big)\sigma_{i}\sigma_{j}\bigg)$ . If we set $J_{ij}=\sum_{\mu}\xi_{i}^{\mu}\xi_{j}^{\mu}$ , it is straightforward to see that this is the Boltzmann weight of an SK-type spin-glass Hamiltonian. After determining the HBM Hamiltonian, it is not difficult to see that the concept of pattern overlap is mathematically equivalent to the overlap of replicas, an order parameter used in spin glasses edwards1975theory.
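Because the HBM Hamiltonian separates over the hidden units, the Gaussian marginalization can be verified numerically on a toy instance. The sketch below drops the $\tau$ sector, enumerates all $\sigma$ configurations of a tiny system, and checks that the numerically integrated weights match the Hopfield Boltzmann factors after normalization (all sizes and $\beta$ are illustrative):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
N, P, beta = 3, 2, 1.0                      # toy sizes
xi = rng.choice([-1.0, 1.0], size=(P, N))   # patterns xi_i^mu
J = xi.T @ xi                               # J_ij = sum_mu xi_i^mu xi_j^mu

z = np.linspace(-10, 10, 4001)              # grid for each hidden unit z_mu
dz = z[1] - z[0]

def marginal_weight(sigma):
    """Integrate exp[-beta(z_mu^2/2 - sum_i xi_i^mu sigma_i z_mu)] over each z_mu;
    the integral factorizes over mu."""
    w = 1.0
    for mu in range(P):
        m = xi[mu] @ sigma
        w *= np.exp(-beta * (0.5 * z**2 - m * z)).sum() * dz
    return w

sigmas = [np.array(s) for s in product([-1.0, 1.0], repeat=N)]
num = np.array([marginal_weight(s) for s in sigmas])
hop = np.array([np.exp(0.5 * beta * s @ J @ s) for s in sigmas])

# The two distributions agree after normalization
print(np.allclose(num / num.sum(), hop / hop.sum(), rtol=1e-4))
```

Each 1D integral is a shifted Gaussian, $\int dz\,e^{-\beta(z^{2}/2-mz)}=\sqrt{2\pi/\beta}\,e^{\beta m^{2}/2}$, so the product over $\mu$ reproduces $\exp(\frac{\beta}{2}\sigma^{T}J\sigma)$ up to a $\sigma$-independent constant that cancels on normalization.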
Both Boltzmann machines and Hopfield neural networks have been foundational in the advancement of ANNs and deep learning. While newer, more efficient algorithms continue to emerge, these models demonstrate the profound impact of statistical physics on shaping machine-learning techniques.
2.4 Replica theory and the cavity method
Replica theory Replicas are copies of the same system. The key idea is to consider $n$ replicas of the system when calculating its free energy, which is given by
$$
F=-k_{B}T\langle\ln(Z)\rangle, \tag{6}
$$
where $\langle\cdot\rangle$ represents the average over the quenched disorder, i.e., over the distribution of the couplings. Then, the mathematical identity
$$
\langle\ln(Z)\rangle=\lim_{n\rightarrow 0}\frac{\langle Z^{n}\rangle-1}{n} \tag{7}
$$
can be used in the limit $n\rightarrow 0$ to find the free energy. This step is purely mathematical rather than physical, because $n$ , initially assumed to be an integer, is treated as a real number that can be smaller than one and arbitrarily close to 0. The mathematician Talagrand later proved the correctness of this procedure rigorously talagrand2003spin. In the replica-symmetric method, each replica is treated identically, which is the origin of the negative entropy in the mean-field solution of the SK model. Parisi proposed a replica symmetry-breaking method that successfully resolved the negative-entropy issue by constructing a matrix ansatz in which replicas can have different ordering states.
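The identity in Eq. 7 can be probed numerically on a toy ensemble, here a single spin in a Gaussian random field $h$ with $Z(h)=2\cosh(\beta h)$ (an illustrative choice): as $n\to 0$, $(\langle Z^{n}\rangle-1)/n$ approaches the quenched average $\langle\ln Z\rangle$.

```python
import numpy as np

rng = np.random.default_rng(5)
beta = 1.0
h = rng.standard_normal(200_000)      # samples of the quenched random field
Z = 2.0 * np.cosh(beta * h)           # single-spin partition function

quenched = np.mean(np.log(Z))         # <ln Z>, the quantity we want

# The replica-trick estimate (<Z^n> - 1)/n for decreasing n
for n in (1.0, 0.1, 0.01, 0.001):
    replica = (np.mean(Z**n) - 1.0) / n
    print(n, replica)
```

The printed estimates drift toward `quenched` as $n$ shrinks, since $Z^{n}=e^{n\ln Z}\approx 1+n\ln Z$ for small $n$; the replica method performs this limit analytically instead of numerically.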
The replica theory was initially developed to study glassy, disordered systems such as spin glasses kirkpatrick1978infinite; parisi1983order; charbonneau2023spin; newman2024critical, and later to understand the macroscopic behavior of learning algorithms and capacity limits gardner1988space; amit1985storing. It was used to analyze the phase transitions of associative neural networks (e.g., the Hopfield network) and the overparameterization and generalization of ANNs rocks2022memorizing; baldassi2022learning. One crucial question in associative neural networks is the storage capacity of memory. A network of size $N$ can only provide associative memory for $p\le\alpha_{c}N$ patterns at zero temperature, with $\alpha_{c}$ between 0.1 and 0.2 for Hopfield models hopfield1982neural; amit1985storing. For example, Amit and colleagues found $\alpha_{c}\approx 0.138$ with Hebb's rule $J_{ij}=\frac{1}{2}\sum_{\mu}\xi_{i}^{\mu}\xi_{j}^{\mu}$ amit1985storing. When different or no constraints are imposed on $J_{ij}$ , different values of $\alpha_{c}$ are found gardner1988space; gardner1987maximum; gardner1988optimal; krauth1989storage. When $p\ge\alpha_{c}N$ , memory quality degrades quickly; there is a phase transition at $\alpha_{c}$ . Studying this problem is equivalent to counting the ground states of a spin glass.
The replica theory has also been used to understand the phase transitions and critical behavior of ANNs in supervised and unsupervised learning. Hou et al. proposed a statistical-physics model of unsupervised learning and found that sensory inputs drive a series of continuous phase transitions related to spontaneous intrinsic-symmetry breaking hou2020statistical. Baldassi et al. studied the subdominant dense clusters in ANNs with discrete synapses and found that these clusters enable high computational performance baldassi2015subdominant.
Cavity method Later, Parisi and coauthors proposed another method, the cavity method, to solve the SK model mezard1987spin. It is a statistical method for calculating thermodynamic properties and serves as an alternative to replica theory: one removes a single spin and its interactions with its neighbors to create a cavity and calculates the response of the rest of the system. At zero temperature, the response is determined by energy minimization. Rocks et al. adopted the zero-temperature cavity method for the random nonlinear features model rocks2022memorizing to study the double-descent behavior, an important phenomenon that we will discuss later.
Figure 6: Comparison of Hopfield neural network and Boltzmann machines. a, Principle of the Hopfield neural network. b, Illustration of a Boltzmann machine. c, Restricted Boltzmann machine, generated by removing the intra-layer connections of visible and hidden layers. d, The equivalence of Hopfield neural network and a hybrid restricted Boltzmann machine. Here, the visible variables $\{\sigma_{i}\}$ are discrete and binary and hidden variables $\{z_{i}\}$ and $\{\tau_{i}\}$ are continuous.
2.5 Overparameterization and double-descent behavior
Overparameterization The variance-bias trade-off is prevalent and has been observed in numerous models, and the reason is apparent: fewer parameters result in low variance and high bias, while more parameters lead to high variance and low bias for each prediction. There is an optimal intermediate parameter size at which the model achieves its best performance. However, this does not seem to hold for neural networks, whose optimal performance is achieved with overparameterization belkin2019reconciling; rocks2022memorizing; baldassi2022learning. A neural network is considered overparameterized when it has more parameters than training data points. For simple analytical models, such as linear equations, this typically becomes problematic, a phenomenon known as overfitting. One long-standing mystery of ANNs is that they seemingly subvert traditional machine-learning theory: their parameter count can exceed the number of training data points without any sign of overfitting.
Double-descent behavior The total error function of a model usually has a "U" shape, and its minimum corresponds to the optimal parameters. The training error vanishes at a critical parameter size, namely when the number of parameters equals the number of training data points. At this critical size, the test error diverges, analogous to the specific heat at a transition temperature. In traditional models, the test error only increases beyond the optimum; in the overparameterized regime, the test error first decreases, then increases up to the critical value, and then decreases again. This phenomenon is referred to as double-descent behavior belkin2019reconciling; rocks2022memorizing; baldassi2022learning, and it places the optimal (minimal) test error in the overparameterized region.
Understanding the variance-bias trade-off of overparameterized neural networks is crucial in deep learning. There are different opinions on the overparameterization phenomenon, whose analysis relies on methods from statistics and probability theory. For example, this behavior can be studied using the replica trick: the neural network undergoes a phase transition when the parameter count crosses the critical threshold. The Information Bottleneck Principle (IBP) proposes that deep neural networks first fit the training data and then discard irrelevant information by passing through an information bottleneck, which helps them generalize tishby2000information; tishby2015deep. Since these results are based on a particular type of ANN, it remains controversial whether the IBP conclusion holds generally for all deep neural networks saxe2019information.
ANNs with more neurons per layer, i.e., wide neural layers, usually generalize better than their narrower counterparts. When the number of neurons approaches infinity, this extreme case becomes mathematically more tractable, much as the SK model is easier to solve than the EA model. A layer with infinitely many neurons is equivalent to a Gaussian process neal1996priors; bahri2024houches. Interestingly, a recent study came to similar conclusions for quantum neural networks garcia2025quantum. The Gaussian-process view is one way to understand ANNs and explains why so many parameters do not cause overfitting. In this infinite-width limit, ANNs are equivalent to kernel machines at initialization and throughout training: the associated kernel remains constant during training jacot2018neural, and its value does not depend on the training data but on the architecture of the neural network.
Machine learning algorithms that use kernels are known as kernel machines. These models operate by mapping data from a low-dimensional space to a high-dimensional one using functions such as Gaussian kernels, which can enhance classification performance. The inverse process is regularization, which aims to reduce the number of free parameters to prevent overfitting. Kernel machines are conceptually simpler and more analytically tractable. During training, the evolution of the function represented by an infinite-width neural network mirrors that of a kernel machine. In function space, both models can be visualized as descending a smooth, convex (bowl-shaped) landscape in a high-dimensional space. Due to this structure, it is mathematically straightforward to prove that gradient descent converges to the global minimum. However, the practical relevance of this equivalence remains debated. Real-world neural networks have finite width, and their parameters can change in complex ways during training. In terms familiar to statistical physicists, this is akin to asking whether insights from the SK model of spin glasses remain valid for the short-range EA model, i.e., whether the behavior of idealized, solvable models carries over to more realistic, complex systems.
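As a concrete (and purely illustrative) kernel machine, a Gaussian-kernel ridge regressor fits a 1D toy problem in a few lines; the bandwidth, ridge strength, and data are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(7)

def gaussian_kernel(A, B, bandwidth=0.5):
    """k(x, x') = exp(-||x - x'||^2 / (2 bandwidth^2)): an implicit map to a
    high- (infinite-) dimensional feature space."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth**2))

# Toy 1D regression: y = sin(3x) + noise
X = rng.uniform(-2, 2, size=(80, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(80)

lam = 1e-3                                   # ridge term: the regularizer
K = gaussian_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

X_new = np.array([[0.5]])
pred = gaussian_kernel(X_new, X) @ alpha     # compare with the true sin(1.5)
print(pred)
```

The convexity mentioned above is visible here: the fit reduces to solving a single linear system, so there is one global minimum by construction.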
3 Challenges and perspectives
ANNs are good at recognizing patterns. However, they suffer from limited explainability and transferability, as well as bias and hallucination. Further drawbacks that need to be mitigated include reliance on massive datasets, catastrophic forgetting, vulnerability to attacks, high computational costs, and inadequate symbolic reasoning wang2024hopfield. In this section, we elaborate on the challenges most relevant to their statistical aspects and offer our perspectives on possible solutions.
3.1 Challenges in ANNs and spin glasses for ANNs
Many unanswered questions are specific to ANNs, such as why ANNs work and why backpropagation performs better than other methods in high-dimensional spaces belkin2021fit. We do not know where to start solving them, and if we knew, "a horde of people would do it" miller2024nobel. To make matters worse, ANNs have complicated and diverse structures, which hinders the construction of a rigorous mathematical foundation. Nonetheless, statistical physics provides essential tools for attacking these problems.
Statistical physics is deeply connected to optimization problems and provides approaches to solving them. This connection has been convincingly demonstrated by simulated annealing, which provides solutions to combinatorial optimization problems such as the traveling salesman problem kirkpatrick1983optimization. Additionally, the simple "basin-hopping" approach has been applied to atomic and molecular clusters, and more sophisticated hypersurface-deformation techniques to crystals and biomolecules wales1999global. Breakthroughs in statistical physics are valuable for finding optimal solutions to ANNs, and we anticipate the development of more efficient statistical methods.
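A minimal sketch of simulated annealing on a small, randomly generated traveling-salesman instance (the city count, cooling rate, and step budget are arbitrary illustrative choices, not values from kirkpatrick1983optimization):

```python
import numpy as np

rng = np.random.default_rng(2)

# Random cities for a small travelling-salesman instance
n = 12
pts = rng.random((n, 2))

def tour_length(order):
    p = pts[order]
    return np.sum(np.linalg.norm(p - np.roll(p, -1, axis=0), axis=1))

# Simulated annealing: accept worse tours with Boltzmann probability exp(-dE/T)
order = np.arange(n)
best = order.copy()
T = 1.0
for step in range(20000):
    i, j = sorted(rng.integers(0, n, size=2))
    if i == j:
        continue
    cand = order.copy()
    cand[i:j + 1] = cand[i:j + 1][::-1]   # 2-opt-style segment reversal
    dE = tour_length(cand) - tour_length(order)
    if dE < 0 or rng.random() < np.exp(-dE / T):
        order = cand
        if tour_length(order) < tour_length(best):
            best = order.copy()
    T *= 0.9995                            # geometric cooling schedule
print(tour_length(best))
```

The thermal acceptance rule lets the walk escape local minima at high temperature, while the cooling schedule freezes it into a low-energy (short) tour, exactly the mechanism borrowed from the statistical physics of annealing.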
Understanding the nature of the ground states of short-range spin glasses can provide valuable, if indirect, insights into the capacity of ANNs to memorize patterns. One crucial direction for statistical physics applied to ANNs is to study the topological structure of non-idealized ANNs and their connection with spin-glass models mezard2024spin. At present, we can only connect simplified ANNs with spin glasses, not the more interesting, widely applied ones. Several challenges remain in understanding spin glasses with interaction ranges intermediate between the short-range EA and mean-field SK models, as well as their mapping to general ANNs. Specific examples include (i) finding the globally optimal solutions and (ii) determining whether the number of ground states in a general spin glass is finite or infinite newman2022ground. Both numerical and theoretical bottlenecks contribute to these challenges. Numerically, (1) information on large systems and their scaling is limited by available computing resources, and (2) visualization methods that could help process high-dimensional data and inspire analytical solutions are lacking. Theoretically, (1) we have no clear picture of these models in the thermodynamic limit, and (2) there is no analytical method for short-range spin-glass models. Analytic solutions are limited to the dynamics of models with all-to-all interactions, such as the SK model. It is non-trivial to understand a sparse network in which not all neurons or spins interact with each other. However, finding an analytical solution for sparse networks is essential for comprehending general ANNs, just as the SK model is for fully connected ANNs. Recently, Metz proposed a dynamical mean-field method for sparse directed networks and found exact analytical solutions metz2025dynamical. This general solution is claimed to apply to the study of neural networks and beyond, including ecosystems, epidemic spreading, and synchronization, which represents meaningful progress.
However, more efforts are still needed in this direction.
3.2 Spin-glass physics helps understand ANNs
The relationship between spin-glass physics and ANNs is reciprocal. Important inference problems in machine learning can be formulated as problems in the statistical physics of disordered systems. However, the significant issues we face in analyzing deep networks require a new chapter of spin-glass theory mezard2024spin. Although constructing a theory of deep learning is challenging, only a solid theory can transform deep-network predictions from best guesses in a black box into interpretable, demonstrable statements whose worst-case behavior can be controlled.
We have summarized a few examples in which ANNs solve problems in statistical and theoretical physics, such as detecting phases of matter or materials; this remains an active research area. Meanwhile, mean-field theory (e.g., the cavity method or replica theory) inspires the development of ANN algorithms and enhances our understanding of ANNs, for example of the double-descent behavior. Currently, analytical results have been obtained only for simple ANNs, and only these simple ANNs are well understood. Unlike earlier methods, which are suitable for single-layer or shallow networks, active research is ongoing to develop a statistical-mechanical theory of learning in deep architectures, such as the method proposed by Li and Sompolinsky li2021statistical. In the future, we expect the properties of more realistic ANNs to be fully explored.
Figure 7: Neural-network-based order parameter in complex concentrated alloys. a, A schematic of the variational auto-encoder (VAE) used. The information extracted from the latent space is projected into a two-dimensional (2D) space using t-SNE; the 2D data are then used to construct the order parameter $\langle Z^{op}\rangle_{T}$ based on the Manhattan distance $\sum_{i}|x_{i}|$. b, The new order parameter distinguishes the different configurations. The first-order derivative of $\langle Z^{op}\rangle_{T}$ (i.e., $\chi(Z^{op})$) plays a role similar to that of the specific heat $C_{v}$, with peaks marking the two phase transitions. c, The phase-transition temperatures indicated by the peaks of $\chi(Z^{op})$ are compared with experimental data. This figure is adapted from Ref. yin2021neural.
3.3 Do we need new order parameters for ANNs?
Historically, order parameters have played an essential role in describing the thermodynamic behavior of complex systems, as demonstrated by the spin-overlap order parameter proposed by Edwards and Anderson edwards1975theory. New order parameters advance the study of glassy systems. Currently, the phase distributions of short-range spin glasses are not fully understood, which may limit our understanding of complex ANNs. We may need new, ANN-based order parameters to describe phase transitions and the phase distribution of spin glasses, and a rigorous mathematical description of metastable states and solutions appears necessary. In a previous study, we adopted a similar idea and proposed a neural-network-based order parameter to study complex concentrated alloys, in which multiple co-existing principal elements introduce maximal disorder (Figure 7) yin2021neural. We successfully used the new order parameter to differentiate between phases and found that the predicted phase-transition temperatures are consistent with experimental results. Similarly, given the complex nature of ANN phase transitions, we may need one ANN to help us understand the phases of another ANN. ANNs can capture size-independent patterns that pave the way toward understanding the ground states of spin glasses in the thermodynamic limit; currently, only spin-glass results for relatively small system sizes are available, limited by computing resources (the speed of supercomputers). However, difficulties remain to be overcome, including the feature extraction and feature synthesis used in ANNs.
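As a toy sketch of the peak-finding step described above (in Ref. yin2021neural the order-parameter curve comes from a VAE latent space; here `z_op` is a hypothetical sigmoid stand-in with a known transition temperature), one can locate a transition from the peak of the derivative $\chi(Z^{op})$:

```python
import numpy as np

# Hypothetical order-parameter curve: a smooth drop at a known Tc, mimicking
# the shape of <Z_op>(T) so we can test the chi-peak procedure end to end.
Tc = 1200.0
T = np.linspace(500, 2000, 301)
z_op = 1.0 / (1.0 + np.exp((T - Tc) / 60.0))

# chi(Z_op) as the magnitude of the first-order derivative d<Z_op>/dT;
# its peak plays the role of a specific-heat peak marking the transition
chi = np.abs(np.gradient(z_op, T))
T_peak = T[np.argmax(chi)]
print(T_peak)
```

The recovered peak temperature coincides with the built-in transition, which is the same logic used to read off phase-transition temperatures from $\chi(Z^{op})$ in Figure 7.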
3.4 Quantum computing for spin glasses and ANNs
Due to the rugged energy landscapes of spin glasses, even today's most powerful supercomputers cannot simulate these complex systems at large sizes, leaving much of their behavior unexplored. Quantum computers can mitigate this issue thanks to their exponentially scaling state space. Breakthroughs in both quantum hardware and algorithms provide new opportunities. In December 2024, a team from Google reported an error-corrected chip, Willow, featuring 105 superconducting qubits Acharya2025. The performance of the Willow chip is challenged by China's Zuchongzhi 3.0 processor gao2025establishing, a superconducting quantum-computer prototype also featuring 105 qubits, reported in March 2025. Hope for quantum computing based on Majorana particles, a type of quasiparticle in topological superconductors, rose in February 2025, when Microsoft published an intermediate, controversial result on its quantum chip, Majorana 1 quantum2025interferometric. Quantum computers can be used to efficiently explore the phase spaces of spin-glass-like systems, and simulating Ising spin glasses on a quantum computer provides new opportunities Lidar1997. For example, the Sachdev-Ye-Kitaev (SYK) model, a fermionic relative of the SK spin glass that has been used to describe the dynamics of traversable wormholes between black holes, can be solved numerically by quantum computing gao2021traversable; jafferis2022traversable. Quantum computing can also be applied to quantum machine learning, such as the development of quantum versions of ANNs, or quantum neural networks boreiri2025topologically. This will become more promising with the emergence of new quantum algorithms. Recently, King and colleagues used quantum annealing processors to simulate quantum dynamics in programmable spin glasses King2025-D-Wave, one application of quantum computers of practical interest. Given the topological relationship between spin glasses and ANNs, such simulations will provide valuable insights into understanding ANNs.
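For intuition about the problem quantum hardware targets, a tiny transverse-field Ising spin glass can still be diagonalized exactly on a classical machine; the sketch below (system size, field strength, and random couplings are arbitrary illustrative choices) also checks the variational bound that the quantum ground-state energy lies at or below the classical minimum:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
n = 6  # 2^6 = 64-dimensional Hilbert space; classically trivial, unlike n ~ 100

# Random all-to-all couplings of a tiny SK-like spin glass
J = np.triu(rng.normal(size=(n, n)), 1)

sx = np.array([[0, 1], [1, 0]], dtype=float)
sz = np.array([[1, 0], [0, -1]], dtype=float)
I2 = np.eye(2)

def op(single, site):
    # Embed a single-qubit operator at `site` in the n-qubit Hilbert space
    out = np.ones((1, 1))
    for k in range(n):
        out = np.kron(out, single if k == site else I2)
    return out

# Transverse-field Ising Hamiltonian: H = -sum_ij J_ij sz_i sz_j - h sum_i sx_i
h = 0.5
H = np.zeros((2**n, 2**n))
for i in range(n):
    for j in range(i + 1, n):
        H -= J[i, j] * op(sz, i) @ op(sz, j)
    H -= h * op(sx, i)

E0 = np.linalg.eigvalsh(H)[0]   # exact quantum ground-state energy

# Classical ground state by brute force over all 2^n spin configurations
E_cl = min(-sum(J[i, j] * s[i] * s[j]
                for i in range(n) for j in range(i + 1, n))
           for s in product([-1, 1], repeat=n))
print(E0, E_cl)
```

The exponential growth of the matrix dimension with `n` is precisely why such brute-force diagonalization fails for large spin glasses, and why quantum simulation of these Hamiltonians is attractive.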
Additionally, it is anticipated that ANNs can leverage the benefits of quantum computing in terms of efficiency and accuracy. For example, a quantum version of the Hopfield network was developed by replacing classical Hebbian learning with a quantum Hebbian learning rule rebentrost2018quantum. The researchers demonstrated the ability to store exponentially large networks with a polynomial number of qubits by encoding them as amplitudes of quantum states; their quantum algorithm achieves a computational complexity that is logarithmic in the dimensionality of the data. The advantages of quantum ANNs, as well as of other quantum algorithms, on near-term quantum computers remain uncertain. To address this critical question, Abbas et al. proposed the so-called effective dimension to measure the power and trainability of quantum ANNs abbas2021power. Using this measure, they showed a quantum advantage for a class of quantum ANNs. Quantum ANNs are under active investigation and are still in the early stages of development.
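For contrast with the quantum version, the classical Hopfield network with Hebbian learning that it builds on can be sketched in a few lines (the network size, pattern count, and corruption level below are illustrative choices, not values from rebentrost2018quantum):

```python
import numpy as np

rng = np.random.default_rng(4)
N, P = 100, 5   # neurons and stored patterns (P/N well below capacity ~0.14)

# Hebbian learning: W = (1/N) sum_mu xi^mu (xi^mu)^T with zero diagonal
patterns = rng.choice([-1, 1], size=(P, N))
W = (patterns.T @ patterns) / N
np.fill_diagonal(W, 0)

def recall(state, n_sweeps=10):
    # Asynchronous sign updates monotonically descend the Hopfield energy
    state = state.copy()
    for _ in range(n_sweeps):
        for i in rng.permutation(N):
            state[i] = 1 if W[i] @ state >= 0 else -1
    return state

# Corrupt a stored pattern by flipping 10% of its bits, then recall it
probe = patterns[0].copy()
flip = rng.choice(N, size=10, replace=False)
probe[flip] *= -1
overlap = recall(probe) @ patterns[0] / N
print(overlap)
```

The overlap returning to (near) 1 is the associative-memory retrieval discussed throughout this article; the quantum construction stores the same kind of couplings in the amplitudes of a quantum state instead of an explicit $N \times N$ matrix.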
3.5 More opportunities
One exciting future direction for ANNs is neural computing, also known as synaptic computation abbott2004synaptic; wang2018fully; weilenmann2024single; zhou2024field. In current ANN hardware, storage and computation are separate. In contrast, biological neurons store data and perform computations in the same neural network, which makes them highly efficient. Since associative ANNs have demonstrated their capability for memory storage and have been successfully applied in computation, it is natural to take the next step and design memristive neural networks. Memristors can store and compute information simultaneously. A memristor combines a memory unit and a resistor and can mimic biological synaptic functions in ANNs; its resistance changes based on the history of applied current or voltage. A prominent advantage of memristors is that they are non-volatile and retain memory in the absence of power. For example, Wang et al. developed fully memristive neural networks for pattern classification with unsupervised learning wang2018fully. In a recent study, Weilenmann and colleagues explored energy-efficient neural networks using a single neuromorphic memristor that mimics multiple synaptic mechanisms weilenmann2024single. More details can be found in a recent review of memristive Hopfield neural networks applied to chaotic systems lin2023review. In short, memristors are artificial analogs of biological neurons that harness computing power by mimicking their structure and function. Another exciting recent direction that deserves more attention is the direct participation of in vitro neurons in computational processes: such studies integrate adaptive in vitro neurons with in silico high-density multi-electrode arrays in digital systems to perform computations kagan2022vitro. Nonetheless, these research directions are still in their infancy and offer many opportunities to explore the potential of neural computing.
Since they are not the focus of this article, we will not discuss them in depth.
4 Conclusions
We have reviewed the applications and history of artificial neural networks, as well as their connections with biology and statistical physics, particularly in the context of spin glasses. We have shown the deep connections among these multidisciplinary directions; for example, we demonstrated how replica theory is applied to understand the behavior of artificial neural networks. We also discussed the challenges and possible solutions. One significant problem is understanding and accelerating the training of large artificial neural networks, which can benefit from integration with quantum computing and neural computing: quantum computing can provide exponential acceleration, while neural computing can mitigate the storage limit and reduce the latency of data transfer. Theoretically, the complex behavior, reliability, and stability of ANNs require more research effort, which lags behind their applications. Arguably, the most challenging problems involve understanding biological and artificial neurons; the former concerns the formation of consciousness, the latter the realization of artificial intelligence. We conclude that statistical physics bridges these key multidisciplinary problems and will provide valuable methods for finding their answers.