# Statistical physics for artificial neural networks
**Authors**:
- Zongrui Pei (New York University, New York, NY10003, USA)
- E-mail: peizongrui@gmail.com; zp2137@nyu.edu
Abstract
The 2024 Nobel Prize in Physics was awarded for pioneering contributions at the intersection of artificial neural networks (ANNs) and spin-glass physics, underscoring the profound connections between these fields. The topological similarities between ANNs and Ising-type models, such as the Sherrington-Kirkpatrick model, reveal shared structures that bridge statistical physics and machine learning. In this perspective, we explore how concepts and methods from statistical physics, particularly those related to glassy and disordered systems like spin glasses, are applied to the study and development of ANNs. We discuss the key differences, common features, and deep interconnections between spin glasses and neural networks while highlighting future directions for this interdisciplinary research. Special attention is given to the synergy between spin-glass studies and neural network advancements and the challenges that remain in statistical physics for ANNs. Finally, we examine the transformative role that quantum computing could play in addressing these challenges and propelling this research frontier forward.

Contents
1. Introduction
2. Spin-glass physics for artificial neural networks
   - 2.1 Relations of spin glasses, biological neurons, and associative memory
   - 2.2 Dictionary of corresponding concepts
   - 2.3 Hopfield neural network and Boltzmann machines
   - 2.4 Replica theory and the cavity method
   - 2.5 Overparameterization and double-descent behavior
3. Challenges and perspectives
   - 3.1 Challenges in ANNs and spin glasses for ANNs
   - 3.2 Spin-glass physics helps understand ANNs
   - 3.3 Do we need new order parameters for ANNs?
   - 3.4 Quantum computing for spin glasses and ANNs
   - 3.5 More opportunities
4. Conclusions
1 Introduction
Artificial neural networks (ANNs) have had a profound influence on many sectors, as demonstrated by numerous notable milestones in ANN applications. For example, AlexNet was one of the first models to show the impressive capabilities of ANNs in classification tasks using the ImageNet dataset krizhevsky2012imagenet. AlphaGo demonstrated that ANNs can outperform human players in games silver2016mastering; silver2017mastering. ANNs also find application in self-driving cars badue2021self. Recently, large language models chatGPT; pei2025language, which are complex ANN models, have fundamentally changed how we work, learn, and teach, influencing nearly everyone.

ANNs have also been widely adopted across scientific research domains. An outstanding example is AlphaFold, which solved the half-century-long challenge in structural biology jumper2021highly and accelerated the determination of three-dimensional (3D) protein structures, potentially revolutionizing drug discovery and the healthcare industry. A few more typical examples in physics are shown in Figure 1. ANNs have been used to describe many-body interactions or construct empirical potentials, such as PhysNet unke2019physnet. Some ANN potentials with multiple neural layers are also referred to as deep potentials zhang2018deep. Recently, inspired by the success of foundation models for natural languages, foundation models for potentials have been proposed, and some initial studies have been conducted batatia2023foundation. Order parameters can describe phase transitions in thermodynamics, and ANNs have been successfully adopted to construct new order parameters yin2021neural; rogal2019neural; jung2025roadmap. As a proof of concept, ANNs have also been used to rediscover physical laws and concepts iten2020discovering. ANN models have been developed to design novel materials pei2023toward; Tshitoyan2019; pei2024designing; pei2024computer.
They have been utilized to link the physical properties of various materials and their crystal structures, microstructure, and processing pei2021machine; pei2024towards. Examples of applications include low-dimensional materials, structural materials, and functional materials, among others.
<details>
<summary>x1.png Details</summary>

### Visual Description
## Machine Learning and Physics Applications
### Overview
The image is a composite diagram illustrating the applications of machine learning (ML) in physics and materials science. It is structured as a pentagon in the center, with each side leading to a different application area. The central pentagon is labeled "ML & Physics". The diagram showcases how ML techniques are used to discover physical laws, new materials, model many-body interactions, analyze thermodynamics/order parameters, and develop language models for materials design.
### Components/Axes
* **Central Pentagon:** Labeled "ML & Physics". Serves as the central theme connecting the different applications.
* **Application Areas (clockwise from top):**
* Many-body interaction (ML potentials)
* Discover new materials
* Language models for materials design
* Thermodynamics/Order parameters
* Discover physical laws and concepts
### Detailed Analysis
**1. Many-body interaction (ML potentials):**
* **Diagram:** Shows a molecular structure with atoms represented as spheres. A dashed red circle indicates a cutoff radius labeled "r_cut".
* **Equations:**
* Energy: `E = Σ_i E_i + Σ_i Σ_j k_ij q_i q_j / r_ij` (summation indices not fully clear, but likely `i=1` to `N` and `j>i`)
* Forces: `F_i = -∂E/∂r_i`
* Dipole: `p = Σ_i q_i r_i` (summation from `i=1` to `N`)
* **Description:** This section focuses on using ML to model the interactions between multiple bodies (atoms/molecules). The equations represent the energy, forces, and dipole moments in a system.
**2. Discover new materials:**
* **Diagram:** A periodic table fragment is shown, with elements colored according to the agreement between ML predictions and experimental results.
* **Legend:**
* White square: ML vs YSI(Mech. I)
* Upward-pointing triangle: ML vs -dz(Mech. II)
* Downward-pointing triangle: YSI(Mech. I) vs -dz(Mech. II)
* **Color Scale:** A horizontal bar ranges from "disagree" (purple) to "agree" (teal).
* **Elements:** The periodic table includes elements from Be to Bi. Elements are colored based on the agreement between different methods. For example, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ga, Ge, As, Se, Br, Zr, Nb, Mo, Tc, Ru, Rh, Pd, Ag, Cd, Ta, W, Re, Os, Ir, Pt, Au, Hg, La, Ce, Pr, Nd, Pm, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb.
* **Description:** This section explores the use of ML to predict and discover new materials. The colored periodic table indicates the level of agreement between ML predictions and experimental data for various elements.
**3. Language models for materials design:**
* **Diagram:** A neural network architecture is depicted.
* **Components:**
* Input layer (one-hot vectors): "Collection of Books, journals, etc." -> "Digitalization" -> "Corpora bank" -> Vocabulary of m words. Example input vectors are shown for Mg and CoCrFeNiMn.
* Hidden layer: Contains "Neuron 1", "Neuron 2", ..., "Neuron n". Hidden layer matrix M_hidden.
* Output layer (softmax): Contains "W_1", "W_2", ..., "W_m".
* **Text:** "Only connected within a window of size W".
* **Description:** This section illustrates the use of language models in materials design. The neural network takes text data (books, journals) as input, processes it through hidden layers, and generates an output, likely related to material properties or design parameters.
**4. Thermodynamics/Order parameters:**
* **Diagrams:**
* **(a)** Two images showing atomic configurations. The left image is labeled "Q^900 = 0.20", and the right image is labeled "Q^A15 = 0.52".
* **(b)** A plot of f(Q) vs. time (ns). The x-axis ranges from 0 to 20 ns. The y-axis, f(Q), ranges from 0 to 1.0. The plot shows a step-like increase in f(Q) around 10 ns. Inset images show atomic configurations at different points in time.
* **Description:** This section focuses on using ML to analyze thermodynamic properties and order parameters in materials. The plot shows how an order parameter, f(Q), changes over time, indicating a phase transition or structural change.
**5. Discover physical laws and concepts:**
* **Diagrams:**
* A graph of x vs. t, showing a linear relationship.
* A schematic of an encoding-decoding process: "observations" -> "encoding" -> "parameterization" -> "decoding" -> "answer".
* A neural network diagram with an "encoder E", a "latent representation r", and a "decoder D". The input is "observation o" and "question q", and the output is "answer a".
* **Equations:**
* `v = ___` (parameterization step)
* `x(t) = x_0 + vt'` (answer)
* **Description:** This section explores the use of ML to discover underlying physical laws and concepts. The diagrams illustrate how ML can be used to encode observations, parameterize them, and decode them to obtain answers or predictions.
### Key Observations
* The diagram provides a high-level overview of how ML is being applied in various areas of physics and materials science.
* Each application area is represented by a combination of diagrams, equations, and descriptions.
* The use of neural networks is a common theme across several application areas.
* The periodic table visualization provides a clear way to assess the agreement between ML predictions and experimental data.
### Interpretation
The image demonstrates the growing importance of machine learning in physics and materials science. ML techniques are being used to address complex problems, such as modeling many-body interactions, discovering new materials, and uncovering fundamental physical laws. The diagram highlights the interdisciplinary nature of this research area, bringing together concepts from physics, materials science, and computer science. The applications shown have the potential to accelerate scientific discovery and lead to the development of new technologies. The periodic table visualization is particularly insightful, as it provides a visual representation of the accuracy and reliability of ML predictions for different elements. The language models section suggests a move towards automated materials design, where ML can generate new material candidates based on text data and scientific literature.
</details>
Figure 1: The applications of artificial neural networks in physics. Artificial neural networks have been widely utilized to solve a diverse range of problems, including those in scientific research. Here, we show a few typical examples in physical research, i.e., (i) many-body interactions (machine-learning potentials) unke2019physnet; zhang2018deep, (ii) thermodynamics (proposal of order parameters to describe phase transitions) yin2021neural; rogal2019neural, (iii) discovery of physical laws and concepts iten2020discovering, (iv) discovery or design of new materials pei2021machine, and (v) language models for materials design pei2023toward; Tshitoyan2019; pei2024towards.
ANNs have been employed to understand Ising-like physical systems mills2020finding; carrasquilla2017machine; hibat2021variational; fan2023searching, including spin glasses hibat2021variational; fan2023searching, and they provide a new opportunity to understand the physics of spin glasses. Fan et al. searched for spin-glass ground states through deep reinforcement learning fan2023searching. Huang et al. confirmed the efficiency of classical machine learning in describing quantum phases, taking 2D random Heisenberg models as an example huang2022provably. ANNs can also accelerate Monte-Carlo simulations of spin glasses (autoregressive neural networks) mcnaughton2020boosting and search for spin-glass ground states (tropical tensor networks) liu2021tropical.
<details>
<summary>x2.png Details</summary>

### Visual Description
## Timeline of Neural Network and LLM Development
### Overview
The image presents a timeline illustrating the evolution of neural networks and Large Language Models (LLMs) from 1943 to the present day. It includes diagrams of various network architectures, a protein structure, and a bar chart showing the number of major releases of different LLM architectures over time.
### Components/Axes
* **Timeline Axis:** A horizontal arrow labeled "Year" spans from 1943 to "Now," with key years marked.
* **Network Diagrams:**
* Hopfield Network: A square with four nodes, each labeled "0" or "1," connected by lines.
* Boltzmann Machine: A similar structure with visible and hidden nodes.
* Restricted Boltzmann Machine: Similar to the Boltzmann Machine.
* **Spin Glass Diagram:** A triangle with "+" and "-" signs, and arrows indicating spin directions.
* **Transformer Architecture Diagram:** A block diagram illustrating the components of a transformer network, including positional encoding, input/output embeddings, multi-head attention, and feed-forward layers.
* **Protein Structure:** A 3D rendering of a protein structure, likely related to AlphaFold.
* **LLM Architecture Evolution Chart:**
* X-axis: Years from 2018 to 2024.
* Y-axis: "Number of major releases," ranging from 0 to 20.
* Legend:
* Encoder-Only (Blue)
* Encoder-Decoder (Orange)
* Decoder-Only (Green)
* Linear-Attn (Brown)
### Detailed Analysis
**Timeline Events:**
* **1943:** McCulloch-Pitts network
* **1949:** Hebb's rule
* **1958:** Perceptron
* **1974:** Little's associative network
* **1975:** EA/SK Spin glass
* **1982:** Hopfield network
* **1985:** Boltzmann Machine
* **2017:** Transformer
* **2021:** AlphaFold
* **Now:** LLMs
**LLM Architecture Evolution Chart:**
* **2018:** Encoder-Only: ~2
* **2019:** Encoder-Only: ~4, Encoder-Decoder: ~3, Total: ~7
* **2020:** Encoder-Only: ~1, Encoder-Decoder: ~2, Total: ~3. "GPT3" is labeled near this bar.
* **2021:** Decoder-Only: ~12
* **2022:** Decoder-Only: ~19
* **2023:** Decoder-Only: ~9
* **2024:** Decoder-Only: ~11, Linear-Attn: ~1, Total: ~12
**Network Diagrams:**
* **Hopfield Network:** Contains four nodes arranged in a square. Two nodes are filled black and labeled "1", while the other two are white and labeled "0". Each node is connected to every other node.
* **Boltzmann Machine:** Contains four visible nodes (two black "1", two white "0") and four hidden nodes (all gray "0"). All nodes are interconnected.
* **Restricted Boltzmann Machine:** Contains four visible nodes (two black "1", two white "0") and four hidden nodes (all gray "0"). All nodes are interconnected.
**Spin Glass Diagram:**
* A triangle with arrows indicating spin directions. Two arrows point upwards (blue), and one points downwards. "+" and "-" signs are placed at the corners. A question mark in red is placed at the bottom right corner.
**Transformer Architecture Diagram:**
* Illustrates the flow of data through a transformer network, including input and output embeddings, positional encoding, multi-head attention, and feed-forward layers.
### Key Observations
* The timeline highlights the progression from early neural network models to modern LLMs.
* The LLM architecture chart shows a significant increase in the number of Decoder-Only model releases, especially after 2020.
* The protein structure likely represents the impact of AI, specifically AlphaFold, on protein structure prediction.
### Interpretation
The image effectively visualizes the historical development of neural networks and LLMs. The shift from Encoder-Only and Encoder-Decoder architectures to predominantly Decoder-Only models suggests a trend towards sequence-to-sequence models and generative AI. The inclusion of AlphaFold indicates the broader impact of AI on scientific domains beyond natural language processing. The Spin Glass diagram represents a problem in physics that has connections to neural networks. The question mark suggests an unknown or unresolved aspect of the system.
</details>
Figure 2: The history of artificial neural networks and relevant physical models. Other landmarks include backpropagation in the 1970s and AlexNet in 2012. Interestingly, bio-inspired ANNs ultimately led to the solution of the protein folding problem that had plagued us for half a century.
The significant breakthroughs in developing ANNs highlight the importance of convergence across different disciplines. ANNs mimic the structure and function of biological neurons, inspired by the human brain. In addition to their biological origin, the physical foundation of ANNs was recognized by the Nobel Prize in Physics 2024, awarded to physicist John J. Hopfield and computer scientist Geoffrey E. Hinton hinton2025nobel. Their foundational discoveries enabled the development of ANNs and their applications today. ANNs are closely connected with spin glasses, a traditional topic theorized by Edwards and Anderson in 1975 edwards1975theory that has attracted considerable attention in recent years. The importance of spin glasses as a representative complex system across research domains and length scales was demonstrated by the Nobel Prize in Physics 2021, awarded partly to Giorgio Parisi parisi1979infinite; the Nobel Prizes of 2021 and 2024 have both contributed to the increased attention. The prize-winning studies revealed a profound connection between spin glasses and diverse disciplines, ranging from large to small scales and encompassing complexity science, biology, and computer science.
There are a few landmarks in the history of ANNs [Figure 2]: the seminal work of McCulloch and Pitts on neural networks in 1943 mcculloch1943logical, which initiated the research domain; the Hebbian mechanism, or Hebb's rule, proposed in 1949 to describe when neurons fire hebb2005organization; the simple ANN perceptron proposed by Rosenblatt in 1958 rosenblatt1958perceptron; and the proposal of associative neural networks by Little in 1974 little1974existence and later by Hopfield in 1982 hopfield1982neural. In 1985, Hinton and colleagues proposed the Boltzmann machine ackley1985learning, a stochastic recurrent neural network (RNN) in which the probability of each neural-network configuration follows the Boltzmann distribution of its energy. Restricted Boltzmann machines were proposed later, removing the connections between neurons within the same layer to improve the network's efficiency. The proposal of backpropagation in the 1970s, which Hinton and colleagues utilized to optimize ANNs like AlexNet in 2012, is another significant milestone. In 2017, the transformer architecture was proposed as an alternative to RNN structures, with the attention mechanism as its key novel component. Due to its high efficiency and suitability for large-model optimization, it opened a new era for deep neural networks. AlphaFold jumper2021highly, the Nobel-Prize-winning model, and popular large language models like GPT-3.5/4/4.5 and Llama 2/3/4 all adopt the transformer architecture.
Despite this enormous success, the mechanisms behind it remain enigmatic. Fortunately, it has been demonstrated that idealized versions of these powerful networks are mathematically equivalent to older, simpler machine learning models, such as kernel machines belkin2018understand; jacot2018neural or physical models jcrn-3nrc. Therefore, the physical foundation of ANNs is closely related to magnetism (spins) and statistical physics amit1985spin; amit1987statistical; watkin1993statistical; mezard2024spin. Hopfield networks hopfield1982neural; hopfield1999brain; krotov2023new and Boltzmann machines ackley1985learning are essentially Sherrington-Kirkpatrick (SK) spin glasses with random interaction parameters sherrington1975solvable, where each neuron or spin is connected with all others except itself. When mapping the spin-glass structure to an ANN, each state corresponds to a pattern or memory in the Hopfield network. A finite number of metastable states indicates the limited capacity of the Hopfield network folli2017maximum. It is essential to determine whether general spin glasses exhibit infinitely many metastable states, as is the case for the mean-field SK model. Spherical spin glass models have been used to study simplified, idealized ANNs choromanska2015loss. Parisi proposed a replica symmetry breaking (RSB) solution parisi1979infinite; charbonneau2023spin to the SK model sherrington1975solvable. The RSB theory has also been used to study ANNs, and steps toward some rigorous results were discussed agliari2020replica. Ghio et al. analyzed sampling with flows, diffusion, and autoregressive neural networks from a spin-glass perspective ghio2024sampling. If this equivalence between ANNs and spin-glass systems can be extended beyond idealized neural networks, it may explain how practical ANNs achieve their astonishing results.
Box 1 | Basic Concepts of Statistical Physics Used by ANNs
Thermodynamics
**Energy Landscapes and Loss Functions** The loss functions in ANNs are analogous to the total energy of physical systems [Figure 3]. Statistical physics concepts, such as energy landscapes, are used to understand the dynamics and optimization of neural networks. For example, the weights and states of a neural network can be thought of as occupying a rugged energy landscape with many local minima, just like glassy Ising models. To optimize or train ANNs, we use methods such as simulated annealing and stochastic gradient descent, which are inspired by statistical physics.
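These ideas can be made concrete with a toy example. The following sketch (illustrative only; the couplings, system size, and cooling schedule are arbitrary choices, not taken from any cited work) runs simulated annealing on a small Ising system with random couplings, i.e., Metropolis dynamics at a slowly decreasing temperature:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "rugged landscape": an Ising system with random couplings,
# analogous to a loss surface with many local minima.
N = 20
J = rng.normal(size=(N, N))
J = (J + J.T) / 2          # symmetric couplings
np.fill_diagonal(J, 0.0)   # no self-interaction

def energy(s):
    # H = -1/2 * sum_ij J_ij s_i s_j (the 1/2 avoids double counting)
    return -0.5 * s @ J @ s

def simulated_annealing(steps=20000, T0=5.0, Tmin=0.01):
    s = rng.choice([-1, 1], size=N)
    E = energy(s)
    best_s, best_E = s.copy(), E
    for t in range(steps):
        T = max(Tmin, T0 * (1 - t / steps))            # linear cooling schedule
        i = rng.integers(N)
        dE = 2 * s[i] * (J[i] @ s)                     # energy change of flipping spin i
        if dE <= 0 or rng.random() < np.exp(-dE / T):  # Metropolis acceptance rule
            s[i] = -s[i]
            E += dE
            if E < best_E:
                best_s, best_E = s.copy(), E
    return best_s, best_E

s_best, E_best = simulated_annealing()
```

At high temperature the dynamics hop freely between basins; as the temperature drops, the system settles into a low-lying minimum of the rugged landscape, which is the analogy exploited by annealing-style training schedules.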
**Entropy and Information Theory** Entropy from statistical physics is used to measure uncertainty in predictions and model behavior. One type of entropy, known as "cross entropy," between the actual values and the predictions can be used as an error function. In addition, the entropy concept is closely related to the Information Bottleneck Principle, which explains how neural networks compress information during training, much like physical systems reduce entropy under certain conditions.
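As a concrete illustration (a minimal sketch, not tied to any specific model in the text), the cross-entropy error between a one-hot target and a predicted probability distribution can be computed as:

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Cross entropy H(p, q) = -sum_i p_i log q_i between a target
    distribution p and a predicted distribution q."""
    q = np.clip(q, eps, 1.0)   # guard against log(0)
    return -np.sum(p * np.log(q))

# One-hot target (true class is index 2) vs. a softmax-like prediction.
target = np.array([0.0, 0.0, 1.0])
pred   = np.array([0.1, 0.2, 0.7])

loss = cross_entropy(target, pred)   # reduces to -log(0.7) for a one-hot target
```

For a one-hot target the cross entropy reduces to the negative log-probability assigned to the true class, which is exactly the per-sample loss minimized in classification training.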
<details>
<summary>x3.png Details</summary>

### Visual Description
## Energy Landscape Diagram
### Overview
The image depicts an energy landscape diagram illustrating the concept of local minima, global minimum, and their corresponding states. It shows a plot of energy/error function versus parameter space, with a blue line representing the rugged energy landscape. The diagram also includes visual representations of "High-temperature state" and "Ising square-ice ground state" as arrangements of black and white circles.
### Components/Axes
* **Y-axis:** "Energy / Error function"
* **X-axis:** "Parameter space"
* **Blue Line:** "Rugged Energy Landscape"
* **Red Dots:** Represent "Local minima/metastable phases"
* **Green Dot:** Represents "Global minimum/ground phase"
* **Red Box (Top-Right):** "High-temperature state" - A visual representation of a disordered state with alternating black and white circles.
* **Green Box (Bottom-Right):** "Ising square-ice ground state" - A visual representation of an ordered state with a specific arrangement of black and white circles. An inset shows arrows representing the direction of the Ising spins.
### Detailed Analysis or ### Content Details
* **Rugged Energy Landscape (Blue Line):** The blue line shows a curve with multiple local minima and one global minimum. The curve starts at a relatively high energy level, dips down to a local minimum, rises again, dips to a global minimum, rises again, and then dips to another local minimum before rising sharply.
* **Local Minima/Metastable Phases (Red Dots):** Two red dots are placed on the blue line, indicating local minima. These are connected to the "High-temperature state" representation with red dashed lines.
* **Global Minimum/Ground Phase (Green Dot):** A green dot is placed at the lowest point of the blue line, indicating the global minimum. This is connected to the "Ising square-ice ground state" representation with a green dashed line.
* **High-temperature state (Red Box):** The arrangement of black and white circles appears random, suggesting a high-energy, disordered state.
* **Ising square-ice ground state (Green Box):** The arrangement of black and white circles shows a more ordered pattern, suggesting a lower-energy, ordered state. The inset shows arrows pointing up and to the right, labeled 'v', indicating the direction of the Ising spins.
### Key Observations
* The energy landscape has multiple local minima, representing metastable states.
* The global minimum represents the most stable ground state.
* The "High-temperature state" is associated with local minima and a disordered configuration.
* The "Ising square-ice ground state" is associated with the global minimum and an ordered configuration.
### Interpretation
The diagram illustrates the concept of energy landscapes in physical systems. The rugged energy landscape represents the potential energy of a system as a function of its parameters. The local minima correspond to metastable states, where the system can be trapped for a certain period. The global minimum represents the most stable state of the system. The "High-temperature state" and "Ising square-ice ground state" provide visual representations of the system's configuration in different energy states. The diagram suggests that the system tends to settle into the global minimum, but it can be trapped in local minima depending on the initial conditions and energy barriers. The arrows in the Ising square-ice ground state inset likely represent the direction of magnetic spins in the Ising model.
</details>
Figure 3: Basic concepts in statistical physics and thermodynamics. Here, we show that each pattern (memory or configuration) corresponds to an energy state in the rugged energy landscape of the free energy function or an error function.
**Phase Transitions** The concept of phase transitions is useful for describing the behavior of ANNs. ANNs exhibit phase transitions during training: their performance and reliability change abruptly when the network structure (e.g., the number of neural layers or the parameter count per layer), the learning rate, or other hyperparameters cross critical values. This is similar to physical systems undergoing structural changes (e.g., spin glasses).
Bayesian Inference and Partition Functions
Bayesian statistics finds application in ANNs. It uses Bayes' theorem to calculate and update a probability distribution as new data or knowledge become available. This conditional feature meets the flexibility requirements of real-world problems, making it practically useful. Bayesian neural networks use principles from statistical physics for probabilistic inference. The partition function, a cornerstone of statistical mechanics, is used to calculate probabilities in probabilistic graphical models, including some neural networks.
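For a system small enough to enumerate, the partition function and the resulting Boltzmann probabilities can be computed exactly. The following sketch (with an arbitrary four-spin Ising chain as the example) also illustrates why exact computation of $Z$ is feasible only for tiny systems: the state space grows as $2^N$.

```python
import itertools
import numpy as np

# Exact Boltzmann probabilities for a tiny Ising chain via the
# partition function Z = sum_s exp(-H(s)/T).
N, J, T = 4, 1.0, 1.0

def H(s):
    # nearest-neighbour chain with open boundaries
    return -J * sum(s[i] * s[i + 1] for i in range(N - 1))

states  = list(itertools.product([-1, 1], repeat=N))   # all 2^N configurations
weights = np.array([np.exp(-H(s) / T) for s in states])
Z       = weights.sum()
probs   = weights / Z   # normalized Boltzmann distribution over all states
```

The two aligned configurations (all spins up or all spins down) carry the largest Boltzmann weight; learning algorithms for Boltzmann machines must approximate this normalization because the exhaustive sum is intractable at realistic sizes.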
Learning Dynamics and Langevin Equations
There are a few connections between Langevin equations and ANNs; here we mention two. (i) The learning dynamics of ANNs can be described by Langevin equations, analogous to the motion of particles in a thermal bath. In studies of stochastic gradient descent (SGD), the noise introduced by mini-batch sampling has been found to resemble thermal noise goldt2019dynamics. (ii) Diffusion models for image generation are similar to Langevin dynamics albergo2023stochastic: both start from white noise and gradually recover the true state through incremental updates of each particle or pixel.
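A minimal sketch of point (i), with an arbitrary double-well potential standing in for a loss surface (all parameter values here are illustrative choices, not taken from the cited works):

```python
import numpy as np

rng = np.random.default_rng(1)

# Overdamped Langevin dynamics on the double-well potential
# U(x) = (x^2 - 1)^2:  dx = -U'(x) dt + sqrt(2 T dt) * xi,
# the same kind of equation used to model noisy SGD updates.
def grad_U(x):
    return 4 * x * (x**2 - 1)

def langevin(x0=0.0, T=0.2, dt=1e-3, steps=50_000):
    x = x0
    traj = np.empty(steps)
    for t in range(steps):
        # deterministic drift down the gradient plus thermal noise
        x += -grad_U(x) * dt + np.sqrt(2 * T * dt) * rng.normal()
        traj[t] = x
    return traj

traj = langevin()
```

Starting at the barrier top, the trajectory falls into one of the two minima at $x=\pm 1$ and fluctuates around it, with occasional noise-driven barrier crossings; in the SGD analogy, the temperature plays the role of the mini-batch noise scale.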
2 Spin-glass physics for artificial neural networks
Just as neurons are the biological analogue of ANNs, spin glasses could be considered the physical analogue of ANNs: they were the inspiration for the original Hopfield model hopfield1982neural, which itself is viewed by some as a type of spin glass. We will discuss their relationships below.
2.1 Relations of spin glasses, biological neurons, and associative memory
**Spin glass** A spin glass is a type of magnetic system in which the quenched interactions between localized spins may be either ferromagnetic or antiferromagnetic with roughly equal probability. It exhibits an array of experimental properties: spin freezing with no long-range magnetic order at low temperature, a cusp in the magnetic susceptibility at a critical temperature accompanied by the absence of a singularity in the specific heat, slow relaxation, aging, and several others stein2013spin; stein2011spin. Most theoretical work has focused on the Edwards-Anderson (EA) spin glass edwards1975theory proposed in 1975 and its infinite-range analogue, the Sherrington-Kirkpatrick (SK) model sherrington1975solvable, proposed the same year.
The EA Hamiltonian in the absence of an external field is given by
$$
H=-\sum_{<ij>}J_{ij}\sigma_{i}\sigma_{j}\ , \tag{1}
$$
where the couplings $J_{ij}$ are i.i.d. random variables representing the interaction between spins $\sigma_{i}$ and $\sigma_{j}$ at nearest-neighbor sites $i$ and $j$ (denoted by the angle brackets in (1)). One is free to choose the distribution of the couplings: a common choice, used here, is a Gaussian distribution with zero mean and unit variance. This randomness distinguishes the spin glass from more conventional magnets: if $J_{ij}$ were a constant $J$, the model would reduce to either a ferromagnet ($J>0$) or an antiferromagnet ($J<0$), both with an ordered ground state.
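As a concrete illustration of Eq. (1), the following sketch evaluates the EA energy of a spin configuration on a small square lattice with open boundaries and Gaussian couplings (lattice size and random seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Edwards-Anderson energy H = -sum_<ij> J_ij s_i s_j on an L x L
# square lattice with Gaussian couplings and open boundaries.
L = 8
spins = rng.choice([-1, 1], size=(L, L))
J_right = rng.normal(size=(L, L - 1))  # horizontal nearest-neighbour bonds
J_down  = rng.normal(size=(L - 1, L))  # vertical nearest-neighbour bonds

def ea_energy(s):
    e  = -np.sum(J_right * s[:, :-1] * s[:, 1:])   # horizontal bond energies
    e += -np.sum(J_down  * s[:-1, :] * s[1:, :])   # vertical bond energies
    return e

E = ea_energy(spins)
```

Note that flipping every spin leaves the energy unchanged, since each term contains a product of two spins; this global $Z_2$ symmetry is shared with the ferromagnet, while the random signs of $J_{ij}$ are what introduce frustration.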
As noted above, when every pair of spins interacts with each other, the model becomes the SK model, a mean-field spin glass with Hamiltonian
$$
H=-\frac{1}{\sqrt{N}}\sum_{i<j}J_{ij}\sigma_{i}\sigma_{j}\ , \tag{2}
$$
where the couplings $J_{ij}$ are also drawn from a mean-zero, unit-variance Gaussian distribution. The thermodynamic properties of the SK model are well understood, and analytical expressions have been found for its free energy and order parameters parisi1979infinite. Whether similar properties exist in short-range spin glasses remains an open question, and understanding short-range spin glasses in finite dimensions remains a major challenge in both condensed matter physics and statistical mechanics.
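A corresponding sketch for Eq. (2), evaluating the SK energy of a random spin configuration (system size and seed are again arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sherrington-Kirkpatrick energy H = -(1/sqrt(N)) sum_{i<j} J_ij s_i s_j,
# where every pair of spins interacts (mean-field / infinite-range model).
N = 200
J = rng.normal(size=(N, N))
J = np.triu(J, k=1)            # keep each pair i < j exactly once

def sk_energy(s):
    # s @ J @ s sums J_ij s_i s_j over the upper triangle, i.e., i < j
    return -(s @ J @ s) / np.sqrt(N)

spins = rng.choice([-1, 1], size=N)
E = sk_energy(spins)
```

The $1/\sqrt{N}$ scaling keeps the energy extensive (proportional to $N$) even though each spin couples to all $N-1$ others, which is what makes the mean-field limit well defined.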
**ANNs and spin glasses** Magnetic systems have long been used to model neural networks, going back (at least) to the work of McCulloch and Pitts mcculloch1943logical. In its simplest form, a neuron is assumed to have only two states: "on" (firing) and "off" (quiescent). Its state can therefore be modelled as an Ising spin, where $+1$ corresponds to the firing state and $-1$ to the quiescent state. A state, or firing pattern, of the entire neural network then corresponds to a spin configuration $\{\sigma\}$. The interactions between neurons, i.e., synaptic efficacies, correspond to the couplings $J_{ij}$ between spins in a magnetic system.
The Hebb learning rule is used to determine the coupling parameters hebb2005organization. In associative neural networks, such as the Hopfield network, suppose there are $p$ patterns (corresponding to memories). Then the interactions $J_{ij}$ between neurons are modelled as
$$
J_{ij}=\frac{1}{N}\sum_{\mu=1}^{p}\xi_{i}^{\mu}\xi_{j}^{\mu}, \tag{3}
$$
where $\xi_{i}^{\mu}$, which also takes the values $\pm 1$, represents the state of the $i^{\rm th}$ spin in the $\mu^{\rm th}$ pattern (spin configuration). When $p=1$, the system is equivalent to a Mattis model mattis1976solvable, which is gauge-equivalent to a ferromagnet. Setting $p>1$ introduces frustration into the system, and its behavior becomes more spin-glass-like. Apart from its interpretation as a model for neural networks, the Hopfield model represents an interesting statistical mechanical system in its own right and has been the subject of considerable study. Several studies have used statistical mechanics methods to investigate phase transitions in the Hopfield model hopfield1982neural; amit1985storing; gardner1988space and have found a bound on the number $p$ of memories that can be faithfully recalled from any initial state in a system of $N$ spins or neurons.
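The Hebb rule of Eq. (3) can be demonstrated directly: store a few random patterns, corrupt one of them, and let asynchronous updates relax the state back. The following sketch is purely illustrative (pattern count, network size, and corruption level are arbitrary choices, kept well below the storage bound):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hopfield network with Hebbian couplings J_ij = (1/N) sum_mu xi_i^mu xi_j^mu,
# recalling a stored pattern from a corrupted probe.
N, p = 100, 3
patterns = rng.choice([-1, 1], size=(p, N))   # the xi^mu patterns

J = (patterns.T @ patterns) / N
np.fill_diagonal(J, 0.0)                      # no self-coupling

def recall(s, sweeps=10):
    s = s.copy()
    for _ in range(sweeps):
        for i in rng.permutation(N):          # asynchronous spin updates
            s[i] = 1 if J[i] @ s >= 0 else -1
    return s

# Corrupt 10% of the first pattern and let the dynamics relax.
probe = patterns[0].copy()
flip = rng.choice(N, size=10, replace=False)
probe[flip] *= -1

recovered = recall(probe)
overlap = (recovered @ patterns[0]) / N       # overlap m = 1 means perfect recall
```

Asynchronous updates never increase the energy, so the state descends into the nearest attractor; with $p \ll N$ that attractor is the stored pattern, and the overlap returns close to one.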
In addition to Hopfield networks, some idealized ANNs also have similar correspondences with spin glasses little1974existence, such as the spherical SK spin glass models choromanska2015loss [Figure 4]. In a spherical spin glass, the spins have varying magnitudes subject to the constraint $\sum_{i=1}^{N}\sigma_{i}^{2}=N$. Given training data $\{X_{i}\},y$, a neural network with $H-1$ hidden layers computes choromanska2015loss $y=q\sigma\big(W^{T}_{H}\sigma(W^{T}_{H-1}...\sigma(W_{1}^{T}X))...\big)$, where $q$ is a normalization factor. The loss function of this ANN is
$$
L_{\Lambda,H}(\bm{\tilde{w}})=\frac{1}{\Lambda^{(H-1)/2}}\sum_{i_{1},i_{2},...,i_{H}=1}^{\Lambda}X_{i_{1},i_{2},...,i_{H}}\tilde{w}_{i_{1}}\tilde{w}_{i_{2}}...\tilde{w}_{i_{H}}, \tag{4}
$$
where $\Lambda$ is the number of weights. By imposing the spherical constraint above on the weights that follow $1/\Lambda\sum_{i=1}^{\Lambda}\tilde{w}_{i}^{2}=1$ , the loss function of this neural network was found to be mathematically equivalent to the Hamiltonian formulation of a spin glass.
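A minimal numerical sketch of Eq. (4) for $H=3$ (our choice; the Gaussian tensor $X$ and the sizes are purely illustrative) shows that, with the spherical constraint on the weights, the loss is literally the Hamiltonian of a spherical 3-spin model evaluated on one configuration:

```python
import numpy as np

rng = np.random.default_rng(1)
Lam, H = 50, 3                      # Lambda weights; H = interaction order

# Random data tensor X_{i1 i2 i3}, playing the role of quenched disorder
X = rng.standard_normal((Lam, Lam, Lam))

# Weights on the sphere: (1/Lambda) sum_i w_i^2 = 1
w = rng.standard_normal(Lam)
w *= np.sqrt(Lam) / np.linalg.norm(w)

# Eq. (4): L = Lambda^{-(H-1)/2} sum_{ijk} X_{ijk} w_i w_j w_k,
# the Hamiltonian of a spherical 3-spin glass evaluated at w
loss = Lam ** (-(H - 1) / 2) * np.einsum("ijk,i,j,k->", X, w, w, w)
print(loss)
```

Minimizing this quantity over $w$ on the sphere is exactly the problem of finding low-lying states of the spherical $p$-spin glass.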
Figure 4: Topology of an artificial neural network and Sherrington-Kirkpatrick (SK) model. a, A simple, fully connected neural network with one hidden layer. b, A four-spin mean field SK spin-glass model.
Similarity between biological neurons and ANNs ANNs are designed to imitate the structure of biological neurons [Figure 5] duan2020spiking. A neuron, or nerve cell, consists of three major parts: a cell body (soma), dendrites (receiving extensions), and an axon (conducting extension). When two neurons are connected, the presynaptic cell sends a signal to the postsynaptic cell: the signal travels from the axon of the presynaptic cell, across the synapse, to the dendrites of the postsynaptic cell. An action potential fires once the membrane potential exceeds a threshold [Figure 5 c]. The postsynaptic potential decides when a neuron fires a signal. The potential at a specific neuron is a function of all postsynaptic potentials from its presynaptic neurons, i.e.,
$$
V_{i}=\sum_{j}J_{ij}(S_{j}+1). \tag{5}
$$
When $V_{i}$ is larger than a threshold $U_{i}$ , i.e., $V_{i}-U_{i}>0$ , the neuron is active ( $S_{i}=1$ ). Writing the local (molecular) field as $h_{i}=V_{i}-U_{i}$ , the state of the neuron at zero temperature is $S_{i}=\mathrm{sgn}(h_{i})$ , or equivalently $S_{i}=2H(h_{i})-1$ with the Heaviside function $H(x)$ . A local configuration is stable when the spin is parallel to its field, i.e., $h_{i}S_{i}>0$ . If the threshold is chosen as $U_{i}\approx\sum_{j}J_{ij}$ , then $h_{i}\approx\sum_{j}J_{ij}S_{j}$ and $-\frac{1}{2}\sum_{i}h_{i}S_{i}\approx H=-\frac{1}{2}\sum_{ij}J_{ij}\sigma_{i}\sigma_{j}$ , which has the same form as Eq. 1. The above formulation is for zero temperature. At finite temperature, the probability of activating a neuron is $P(S_{i}=1)=\frac{1}{\exp[-\beta(V_{i}-U_{i})]+1}$ , a Fermi-Dirac distribution. Here, $\beta=1/k_{B}T$ is the inverse temperature, with $k_{B}$ the Boltzmann constant and $T$ the temperature.
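The finite-temperature activation rule can be sketched numerically. In the snippet below (the couplings, sizes, and $\beta$ values are illustrative assumptions), the Fermi-Dirac probability is written in an equivalent, numerically stable tanh form; large $\beta$ recovers the deterministic threshold rule, and $\beta\to 0$ gives $P=1/2$:

```python
import numpy as np

rng = np.random.default_rng(2)

def activation_prob(V, U, beta):
    """P(S_i = 1) = 1 / (exp[-beta (V_i - U_i)] + 1).

    Written via tanh, which is the same Fermi-Dirac form but
    numerically stable for large beta."""
    return 0.5 * (1.0 + np.tanh(0.5 * beta * (V - U)))

N = 4
J = rng.standard_normal((N, N))
J = (J + J.T) / 2                   # symmetric couplings
np.fill_diagonal(J, 0.0)
S = rng.choice([-1, 1], size=N)     # -1 = quiescent, +1 = firing
U = J.sum(axis=1)                   # threshold choice U_i = sum_j J_ij

V = J @ (S + 1)                     # postsynaptic potential, Eq. (5)
p_fire = activation_prob(V, U, beta=2.0)

# beta -> infinity recovers the deterministic threshold (Heaviside) rule
p_zero_T = activation_prob(V, U, beta=1e6)
print(p_fire, p_zero_T)
```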
ANNs inherit these key features. Each artificial neuron sends data to the neurons in the following layer, which respond collectively. This response is calculated as a matrix multiplication plus a bias term. The activation and updating of connections follow Hebb's rule hebb2005organization, which states that when two neurons fire together, the excitatory (ferromagnetic, in magnetic terms) component of their coupling is enhanced: the more often two neurons are active together, the stronger their excitatory connection becomes. This holds for both biological and artificial neural networks.
Figure 5: Biological neurons, artificial neural networks, and action potentials. a, Biological neuron and its associated concepts duan2020spiking. b, Artificial neural network with input features $V_{\mathrm{in},i}$ and output features $V_{\mathrm{out},i}$ duan2020spiking. Here, the resistor symbols (rectangles) represent the activation functions that switch the signals from individual inputs on or off. c, Action potentials naundorf2006unique. The top panel shows an action potential of a cat visual cortex neuron in vivo. The middle panel shows an action potential from a cat visual cortical slice in vitro at 20°C. The bottom panel shows a model potential. The arrow indicates the characteristic kink at the onset of the action potential. This figure is adapted and modified from Refs. duan2020spiking; naundorf2006unique.
2.2 Dictionary of corresponding concepts
To clarify the connections across the three research domains, we provide a list of analogous features of ANNs, biological neurons, and spin glasses in Table 1, where the corresponding concepts are compared. A few concepts in the table have been discussed above, such as the correspondence of a spin in a spin glass to a biological neuron in biology and to an artificial neuron in ANNs. The comparison helps identify interesting concepts that have not yet been studied, thereby deepening our understanding and suggesting new research opportunities. For example, the Hamiltonian in spin glasses yields the total energy of the system and plays a role similar to the loss function in ANNs, yet it has no established counterpart for biological neurons. The effective learning mechanism of neurons remains elusive, although a few explanations have been proposed, such as the principle of predictive coding luczak2022neurons. A Hamiltonian-like quantity, if it can capture the overall activity of biological neurons, may be relevant to what dominates the coordination of specific neurons and the choice of particular information-propagation paths friston2010free; friston2009free. Various order parameters, including spin overlaps, magnetization, giant-cluster size (commonly used in percolation theory), etc., are physical quantities that summarize the average behavior of spin variables and characterize phase states. An analogous quantity could be computed from the overall activity of biological or artificial neurons, for example, how closely the current firing pattern matches a learned activity pattern in the hippocampus or cortex. Studying order parameters in ANNs can help reveal how many neurons are actually used or activated, which is valuable for understanding the fundamental principles of ANNs as kernels.
Table 1: Dictionary for artificial neural networks (ANNs), spin glasses, and biological neurons. The terms in spin glasses and their corresponding terms in ANNs are compared. Interestingly, no term in ANNs corresponds to the order parameter, which is closely connected to the value of loss functions. The common features of ANNs and spin glasses explain (i) why Monte Carlo methods can explore their energy landscapes and (ii) why spin glass methods can be directly used to study ANNs. Another interesting question is whether we need to introduce new order parameters for spin glass physics.
| spin glass | biological neuron | artificial neural network |
| --- | --- | --- |
| spin variables | neuron state (active/inactive) | features |
| interaction of two spins | electromagnetic signal | weight between two neurons |
| interaction strengths | action potential | weight matrix |
| total energy (Hamiltonian) | ? (unclear) | loss function |
| stable or metastable states | memory | memorized patterns in associative network |
| connectivity of spins | synapses | activation function |
| spherical SG models | ? (unknown) | topology of fully connected ANNs |
| order parameter | overall activity of neurons | ? (no correspondence) |
2.3 Hopfield neural network and Boltzmann machines
The Hopfield neural network hopfield1982neural consists of a single layer of fully interconnected neurons, with each neuron linked to every other neuron [Figure 4 a]. The metastable states, or local minima, of the network correspond to the stored patterns, which allows the creation of associative or content-addressable memories: patterns are memorized and encoded in the network parameters. A similar idea was proposed by Little eight years before Hopfield's work little1974existence. Structurally, the network bears a topological resemblance to the SK model sherrington1975solvable, and its phase diagram is also similar to that of the SK model. The original motivation of the Hopfield network was to create a model of associative memory, in which a stored pattern can be retrieved from an incomplete or noisy input.
The Hopfield network evolves over time, with the state of each neuron changing dynamically (Figure 4 b). This temporal evolution allows it to be viewed as a multi-layered system, where each âlayerâ represents a different time step. In this sense, it functions as a fully connected neural network, with each neuron capable of reading and outputting data. The network stores patterns by adjusting connection weights, and the number of patterns it can retain is proportional to the number of neurons folli2017maximum.
The Hopfield neural network influenced the development of recurrent neural networks (RNNs), such as the long short-term memory networks and the more efficient gated recurrent units. RNNs, characterized by recurrent neural layers, are designed to process sequential data, such as speech and natural languages. The transformer architecture has gradually replaced these methods, where the self-attention mechanism is adopted to replace recurrence for processing sequential data. The Hopfield neural network and its successors remain an active area of research agliari2019relativistic; saccone2022direct; negri2023storage.
Similar to the Hopfield neural network, the Boltzmann machine is a system where each spin interacts with all the others ackley1985learning [Figure 6 b]. It consists of a visible layer and a hidden layer, with all neurons, whether in the visible (input/output) or hidden layer, fully connected. Data is both input and output through the visible layer. A defining characteristic of Boltzmann machines is their stochastic nature: neuron activation is probabilistic rather than deterministic, influenced by connection weights and inputs. The probability of activation is given by the Boltzmann distribution.
Due to the computational complexity of Boltzmann machines, more tractable "restricted" Boltzmann machines (RBMs) were introduced smolensky1986information; nair2010rectified [Figure 6 c]. RBMs adopt a bipartite structure, where the digital (visible) layer and the analog (hidden) layer are fully connected to each other, but there are no intra-layer connections. This structural constraint allows the use of more computationally efficient training algorithms than are available for unrestricted Boltzmann machines fischer2014training. Boltzmann machines have been widely applied to physical and chemical problems, including quantum many-body wavefunction simulations nomura2017restricted; melko2019restricted, modeling polymer conformational properties yu2019generating, and representing quantum states with non-Abelian or anyonic symmetries vieijra2020restricted.
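To make the bipartite structure and stochastic training concrete, here is a minimal RBM trained with one-step contrastive divergence (CD-1). Everything here, the toy patterns, layer sizes, learning rate, and number of steps, is an illustrative assumption rather than a prescription from the cited references:

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Tiny RBM: n_v visible and n_h hidden units, bipartite (no intra-layer links)
n_v, n_h, lr = 6, 3, 0.1
W = 0.1 * rng.standard_normal((n_v, n_h))
a = np.zeros(n_v)                         # visible biases
b = np.zeros(n_h)                         # hidden biases

# Toy data: two binary patterns to be memorized
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]], dtype=float)

for step in range(2000):                  # CD-1: a single Gibbs step
    v0 = data[rng.integers(len(data))]
    ph0 = sigmoid(v0 @ W + b)             # P(h = 1 | v0)
    h0 = (rng.random(n_h) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + a)           # reconstruction P(v = 1 | h0)
    v1 = (rng.random(n_v) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b)
    # Gradient estimate: <v h>_data - <v h>_model
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    a += lr * (v0 - v1)
    b += lr * (ph0 - ph1)

# After training, the first pattern should reconstruct itself
ph = sigmoid(data[0] @ W + b)
recon = sigmoid(ph @ W.T + a)
print(np.round(recon, 2))
```

The bipartite restriction is what makes the conditional distributions factorize, so both Gibbs half-steps are single matrix multiplications.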
Barra et al. studied the equivalence of Hopfield networks and Boltzmann machines (Figure 6 d) barra2012equivalence. The study is based on a "hybrid" Boltzmann machine (HBM) model, where the $P$ hidden units in the analog layer are continuous (for pattern storage) and the $N$ visible neurons are discrete and binary. They showed that the HBM, when marginalized over the hidden units, is statistically equivalent to the Hopfield network. Let $P(\sigma,z)$ be the joint distribution of the HBM, $P(z)$ the distribution of the continuous hidden variables, which usually follow a Gaussian distribution, and $P(\sigma)$ the distribution of the Hopfield network. These quantities obey Bayes' rule $P(\sigma,z)=P(\sigma|z)P(z)=P(z|\sigma)P(\sigma)$ .
Since this work involves several key concepts used in ANNs, such as the diffusion model and pattern overlap, we discuss its theoretical details further here. The activity of the hidden layer follows a stochastic differential equation $T\frac{dz_{\mu}}{dt}=-z_{\mu}(t)+\sum_{i}\xi_{i}^{\mu}\sigma_{i}+\sqrt{\frac{2T}{\beta}}\zeta_{\mu}(t)$ , where $\zeta_{\mu}$ is white Gaussian noise. The idea is similar to the diffusion models widely used in image and video generation. The stationary distribution of $z_{\mu}$ under this equation is $P(z_{\mu}|\sigma)=\sqrt{\frac{\beta}{2\pi}}\exp\bigg[-\frac{\beta}{2}\bigg(z_{\mu}-\sum_{i}\xi_{i}^{\mu}\sigma_{i}\bigg)^{2}\bigg]$ . The Hamiltonian of the HBM shown in Figure 6 d is $H_{hbm}(\sigma,z,\tau;\xi,\eta)=\frac{1}{2}\bigg(\sum_{\mu}z^{2}_{\mu}+\sum_{\nu}\tau^{2}_{\nu}\bigg)-\sum_{i}\sigma_{i}\bigg(\sum_{\mu}\xi_{i}^{\mu}z_{\mu}+\sum_{\nu}\eta_{i}^{\nu}\tau_{\nu}\bigg)$ . The joint probability for the HBM is then $P(\sigma,z,\tau)=\exp[-\beta H_{hbm}(\sigma,z,\tau;\xi,\eta)]/Z(\beta,\xi,\eta)$ , with the partition function $Z(\beta,\xi,\eta)=\sum_{\sigma}\int\prod^{P}_{\mu=1}dz_{\mu}\int\prod^{K}_{\nu=1}d\tau_{\nu}\exp[-\beta H_{hbm}(\sigma,z,\tau;\xi,\eta)]$ . Carrying out the Gaussian integrals and applying Bayes' rule, the marginal probability for $\sigma$ is $P(\sigma)\propto\exp\bigg(\frac{\beta}{2}\sum_{i,j}J_{ij}\sigma_{i}\sigma_{j}\bigg)$ with $J_{ij}=\sum_{\mu}\xi_{i}^{\mu}\xi_{j}^{\mu}+\sum_{\nu}\eta_{i}^{\nu}\eta_{j}^{\nu}$ , which is precisely the Boltzmann distribution of a Hopfield network with Hebbian couplings; for random patterns it is an SK-type spin-glass model. After determining the HBM Hamiltonian, it is not difficult to see that the concept of pattern overlap is mathematically equivalent to the replica overlap, an order parameter used in spin-glass theory edwards1975theory.
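The marginalization argument can be checked numerically on a toy HBM. The sketch below (restricted, for simplicity, to a single analog hidden layer carrying the $\xi$ patterns; the sizes and inverse temperature are arbitrary choices) integrates out the Gaussian hidden units by brute force and compares the resulting distribution over $\sigma$ with the Boltzmann weights of a Hopfield network with Hebbian couplings:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
beta, N, P = 0.5, 3, 2
xi = rng.choice([-1, 1], size=(P, N))        # stored patterns xi[mu, i]

z, dz = np.linspace(-12, 12, 4001, retstep=True)

def marginal_weight(sigma):
    """exp(-beta H) integrated numerically over the Gaussian hidden units."""
    w = 1.0
    for mu in range(P):
        m = xi[mu] @ sigma                   # sum_i xi_i^mu sigma_i
        w *= np.exp(-beta * (0.5 * z**2 - m * z)).sum() * dz
    return w

states = [np.array(s) for s in product([-1, 1], repeat=N)]
w_hbm = np.array([marginal_weight(s) for s in states])
p_hbm = w_hbm / w_hbm.sum()

# Hopfield weights with Hebbian couplings J_ij = sum_mu xi_i^mu xi_j^mu:
# P(sigma) proportional to exp( (beta/2) sigma^T J sigma )
J = xi.T @ xi
w_hop = np.array([np.exp(0.5 * beta * s @ J @ s) for s in states])
p_hop = w_hop / w_hop.sum()

print(np.max(np.abs(p_hbm - p_hop)))         # near zero: the models agree
```

The state-independent Gaussian normalization cancels upon normalizing, so the two distributions match to numerical precision.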
Both Boltzmann machines and Hopfield neural networks have been foundational in the advancement of ANNs and deep learning. While newer, more efficient algorithms continue to emerge, these models demonstrate the profound impact of statistical physics on shaping machine-learning techniques.
2.4 Replica theory and the cavity method
Replica theory Replicas are copies of the same system. The key idea is to consider $n$ replicas when calculating the free energy, which is given by
$$
F=-k_{B}T\langle\ln(Z)\rangle, \tag{6}
$$
where $\langle\cdot\rangle$ denotes the average over the quenched disorder (the random couplings). Then, the mathematical identity
$$
\langle\ln(Z)\rangle=\lim_{n\rightarrow 0}\frac{\langle Z^{n}\rangle-1}{n} \tag{7}
$$
can be used in the limit $n\to 0$ to find the free energy. This is a purely mathematical device rather than a physical one, because $n$ , initially assumed to be an integer, is treated as a real number that can be smaller than one and taken infinitely close to 0. The mathematician Talagrand later proved the validity of this approach rigorously talagrand2003spin. In the replica-symmetric method, each replica is treated identically, which is the origin of the negative entropy in the mean-field solution of the SK model. Parisi proposed a replica-symmetry-breaking scheme that resolved the negative-entropy issue by constructing a matrix ansatz in which replicas can have different ordering states.
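The identity in Eq. (7) is easy to probe numerically. In the toy example below (the log-normal "partition functions" are an arbitrary choice for illustration), $\langle(Z^{n}-1)/n\rangle$ approaches the quenched average $\langle\ln Z\rangle$ as $n\to 0$, and the quenched and annealed averages visibly differ:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy "partition functions": Z = exp(g) with g standard normal, so that
# <ln Z> = 0 exactly while ln <Z> = 1/2 (quenched vs annealed averages)
Z = np.exp(rng.standard_normal(100_000))

for n in [0.1, 0.01, 0.001]:
    approx = ((Z**n - 1.0) / n).mean()       # <(Z^n - 1)/n> over disorder
    print(n, approx)

quenched = np.log(Z).mean()                  # <ln Z>, the hard average
annealed = np.log(Z.mean())                  # ln <Z>, the easy average
print(quenched, annealed)
```

The gap between the two averages is exactly why the trick is needed: $\langle Z^{n}\rangle$ is tractable for integer $n$, while $\langle\ln Z\rangle$ is not.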
The replica theory was initially developed to study glassy, disordered systems, such as spin glasses kirkpatrick1978infinite; parisi1983order; charbonneau2023spin; newman2024critical, and later applied to understand the macroscopic behavior of learning algorithms and their capacity limits gardner1988space; amit1985storing. It was used to analyze the phase transitions of associative neural networks (e.g., the Hopfield network) and the overparameterization and generalization of ANNs rocks2022memorizing; baldassi2022learning. One crucial question in associative neural networks is the storage capacity of memory. A network of size $N$ can only provide associative memory for $p\le\alpha_{c}N$ patterns at zero temperature, with $\alpha_{c}\approx 0.1$ to $0.2$ for Hopfield models hopfield1982neural; amit1985storing. For example, Amit and colleagues found $\alpha_{c}\approx 0.138$ with Hebb's rule $J_{ij}=1/2\sum_{\mu}\xi_{i}^{\mu}\xi_{j}^{\mu}$ amit1985storing. When different or no constraints are imposed on $J_{ij}$ , different values of $\alpha_{c}$ are found gardner1988space; gardner1987maximum; gardner1988optimal; krauth1989storage. When $p\ge\alpha_{c}N$ , memory quality degrades quickly; there is a phase transition at $\alpha_{c}$ . Studying this problem is equivalent to counting the ground states of a spin glass.
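A small simulation makes the capacity threshold tangible. The sketch below (network size, corruption level, and number of sweeps are illustrative choices) stores $p$ random patterns with the Hebb rule and attempts recall at loads below and above $\alpha_c$:

```python
import numpy as np

rng = np.random.default_rng(6)

def recall_overlap(N, p, n_flips=5, sweeps=10):
    """Store p patterns via the Hebb rule, then recall pattern 0 from a
    corrupted cue using zero-temperature asynchronous dynamics."""
    xi = rng.choice([-1, 1], size=(p, N))
    J = (xi.T @ xi) / N
    np.fill_diagonal(J, 0.0)
    sigma = xi[0].copy()
    sigma[rng.choice(N, size=n_flips, replace=False)] *= -1
    for _ in range(sweeps):
        for i in rng.permutation(N):
            sigma[i] = 1 if J[i] @ sigma >= 0 else -1
    return (sigma @ xi[0]) / N               # overlap m with the target

N = 200
below = recall_overlap(N, p=10)   # alpha = 0.05 < alpha_c: good recall
above = recall_overlap(N, p=80)   # alpha = 0.40 > alpha_c: recall degrades
print(below, above)
```

Below the threshold the final overlap is essentially 1; well above it, the dynamics falls into a spurious spin-glass state only weakly correlated with the stored pattern.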
The replica theory has been used to understand the phase transition or the critical behavior of ANNs in supervised and unsupervised learning. Hou et al. proposed a statistical physics model of unsupervised learning and found that the sensory inputs drive a series of continuous phase transitions related to spontaneous intrinsic-symmetry breaking hou2020statistical. Baldassi et al. studied the subdominant dense clusters in ANNs with discrete synapses baldassi2015subdominant. They found these clusters enabled high computational performance in these systems.
Cavity method Later, Parisi and coauthors proposed another route to solving the SK model, the cavity method, in which the statistical properties are computed by removing a single spin and observing the reaction of the remaining spin system mezard1987spin. It serves as an alternative to replica theory for calculating thermodynamic properties: one removes a spin together with its interactions to create a cavity, and then calculates the response of the rest of the system. At zero temperature, the response is determined by energy minimization. Rocks et al. adopted the "zero-temperature cavity method" for the random nonlinear features model rocks2022memorizing to study double-descent behavior, an important phenomenon that we discuss below.
Figure 6: Comparison of Hopfield neural network and Boltzmann machines. a, Principle of the Hopfield neural network. b, Illustration of a Boltzmann machine. c, Restricted Boltzmann machine, generated by removing the intra-layer connections of visible and hidden layers. d, The equivalence of Hopfield neural network and a hybrid restricted Boltzmann machine. Here, the visible variables $\{\sigma_{i}\}$ are discrete and binary and hidden variables $\{z_{i}\}$ and $\{\tau_{i}\}$ are continuous.
2.5 Overparameterization and double-descent behavior
Overparameterization The bias-variance trade-off is prevalent and has been observed in numerous models, for an apparent reason: fewer parameters result in low variance and high bias, while more parameters lead to high variance and low bias for each prediction. There is thus an optimal intermediate parameter size at which the model achieves its best performance. However, this does not seem to hold for neural networks, whose optimal performance is achieved with overparameterization belkin2019reconciling; rocks2022memorizing; baldassi2022learning. A neural network with more parameters than training data points is considered overparameterized. In simple analytical models, such as linear regression, this typically becomes problematic, a phenomenon known as overfitting. One long-standing mystery of ANNs is that they seemingly subvert traditional machine-learning theory: their parameter count can exceed the number of training data points without any sign of overfitting.
Double-descent behavior The total error of a model as a function of parameter size usually has a "U" shape, and its minimum corresponds to the optimal parameter count. The training error vanishes at a critical parameter size, namely when it equals the number of training data points. At this critical size the test error diverges, analogous to the specific heat at a transition temperature. In traditional models, the test error only increases beyond the classical optimum; in overparameterized models, the test error first decreases, peaks at the critical value, and then decreases again. This phenomenon is referred to as double-descent behavior belkin2019reconciling; rocks2022memorizing; baldassi2022learning, and it can place the optimal (minimal) test error in the overparameterized region.
Understanding the bias-variance trade-off of neural networks with overparameterization is crucial in deep learning. There are different views on the overparameterization phenomenon, many relying on methods from statistics and probability theory. For example, the behavior can be studied using the replica trick: the neural network undergoes a phase transition when the parameter count crosses the critical threshold. The Information Bottleneck Principle (IBP) proposes that deep neural networks first fit the training data and then discard irrelevant information by going through an information bottleneck, which helps them generalize tishby2000information; tishby2015deep. Since these results are based on a particular type of ANN, it is controversial whether the IBP conclusion holds generally for all deep neural networks saxe2019information.
ANNs with more neurons in a layer, i.e., wider layers, usually generalize better than their narrower counterparts. When the number of neurons approaches infinity, this extreme case becomes mathematically more tractable, much as the SK model is easier to solve than the EA model. A layer with infinitely many neurons is equivalent to a Gaussian process neal1996priors; bahri2024houches. Interestingly, a recent study reached similar conclusions for quantum neural networks garcia2025quantum. The Gaussian-process picture is one way to view ANNs and helps explain why models with many parameters do not overfit. In this limit, ANNs are equivalent to kernels at initialization and throughout the training process: many parameters barely move during training, so the network behaves like a fixed kernel jacot2018neural, whose form depends not on the training data but on the architecture of the network.
Machine learning algorithms that use kernels are known as kernel machines. These models map data from a low-dimensional space to a high-dimensional one using functions such as Gaussian kernels, which can enhance classification performance. The inverse process is regularization, which reduces the number of effective free parameters to prevent overfitting. Kernel machines are conceptually simpler and more analytically tractable. During training, the function represented by an infinite-width neural network evolves exactly like that of a kernel machine. In function space, both models descend a smooth, convex (bowl-shaped) landscape in a high-dimensional space, which makes it mathematically straightforward to prove that gradient descent converges to the global minimum. However, the practical relevance of this equivalence remains debated: real-world neural networks are of finite width, and their parameters can change in complex ways during training. In terms familiar to statistical physicists, this is akin to asking whether insights from the SK model of spin glasses remain valid in the short-range EA model, i.e., whether behavior established in idealized, solvable models carries over to more realistic, complex systems.
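The infinite-width correspondence can be checked numerically. The sketch below, an illustrative setup not taken from the cited works, compares the empirical kernel of a finite random ReLU layer with its analytic infinite-width limit (the degree-1 arc-cosine kernel of Cho and Saul); as the width grows, the Monte-Carlo estimate converges to the closed form.

```python
import numpy as np

rng = np.random.default_rng(1)

def arccos_kernel(x, xp):
    # analytic infinite-width kernel of one ReLU layer with
    # standard-normal weights (arc-cosine kernel of degree 1):
    # K(x, x') = |x||x'|/(2*pi) * (sin(t) + (pi - t) * cos(t))
    nx, nxp = np.linalg.norm(x), np.linalg.norm(xp)
    cos_t = np.clip(x @ xp / (nx * nxp), -1.0, 1.0)
    t = np.arccos(cos_t)
    return nx * nxp / (2 * np.pi) * (np.sin(t) + (np.pi - t) * cos_t)

def empirical_kernel(x, xp, width):
    # Monte-Carlo kernel of a finite random ReLU layer:
    # (1/width) * sum_j relu(w_j . x) * relu(w_j . x')
    W = rng.standard_normal((width, x.size))
    return np.mean(np.maximum(0, W @ x) * np.maximum(0, W @ xp))

x, xp = np.array([1.0, 0.0]), np.array([0.6, 0.8])
print("analytic   :", arccos_kernel(x, xp))
print("width 100  :", empirical_kernel(x, xp, 100))
print("width 1e5  :", empirical_kernel(x, xp, 100_000))
```

The gap between the width-100 and width-100000 estimates illustrates why finite-width networks can deviate from the idealized kernel picture.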
3 Challenges and perspectives
ANNs are good at recognizing patterns. However, they lack explainability and transferability and suffer from bias and hallucination. More drawbacks that need to be mitigated include reliance on massive datasets, catastrophic forgetting, vulnerability to attacks, high computational costs, and inadequate symbolic reasoning wang2024hopfield. In this section, we elaborate on the challenges most relevant to their statistical aspects and provide our perspectives on their solutions.
3.1 Challenges in ANNs and spin glasses for ANNs
There are many unanswered questions specific to ANNs, such as why ANNs work and why backpropagation performs better than other methods in high-dimensional spaces belkin2021fit. We do not know where to start solving them, and if we knew, "a horde of people would do it" miller2024nobel. To make things worse, ANNs have complicated and diverse structures, which hinders the construction of a rigorous mathematical foundation. Nonetheless, statistical physics provides essential tools for attacking these problems.
Statistical physics is deeply connected to optimization problems and provides approaches to solving them. This connection has been convincingly demonstrated by simulated annealing, which provides solutions to combinatorial optimization problems such as the traveling salesman problem kirkpatrick1983optimization. Additionally, the simple "basin-hopping" approach has been applied to atomic and molecular clusters, and more sophisticated hypersurface-deformation techniques to crystals and biomolecules wales1999global. Breakthroughs in statistical physics are valuable for finding optimal solutions for ANNs, and we anticipate the development of more efficient statistical methods.
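As a concrete illustration of this connection, the following minimal sketch (with made-up random city coordinates) applies simulated annealing with 2-opt moves and the Metropolis acceptance rule to a small traveling-salesman instance, in the spirit of kirkpatrick1983optimization.

```python
import math
import random

random.seed(0)

# Simulated annealing on a small traveling-salesman instance
# (city coordinates are random, purely for illustration).
cities = [(random.random(), random.random()) for _ in range(15)]

def tour_length(tour):
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def anneal(n_steps=10_000, T=1.0, cooling=0.999):
    tour = list(range(len(cities)))
    best = tour_length(tour)
    for _ in range(n_steps):
        i, j = sorted(random.sample(range(len(cities)), 2))
        # 2-opt move: reverse the segment between positions i and j
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        delta = tour_length(cand) - tour_length(tour)
        # Metropolis rule: accept downhill moves always, uphill moves
        # with probability exp(-delta / T)
        if delta < 0 or random.random() < math.exp(-delta / T):
            tour = cand
            best = min(best, tour_length(tour))
        T *= cooling  # slowly lower the "temperature"
    return tour, best

tour, best = anneal()
print("best tour length found:", round(best, 3))
```

The occasional uphill moves at high temperature are what let the method escape local minima of the rugged tour-length landscape, exactly the mechanism statistical physics contributes.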
Understanding the nature of the ground states of short-range spin glasses can provide valuable, if indirect, insights into the capacity of ANNs to memorize patterns. One crucial direction for the statistical physics of ANNs is to study the topological structure of non-idealized ANNs and their connection with spin-glass models mezard2024spin. At present, we can only connect simplified ANNs with spin glasses, not the more interesting, widely applied ANNs. Several challenges remain in understanding spin glasses with interaction ranges intermediate between those of the short-range EA and mean-field SK models, as well as their mapping to general ANNs. Specific examples include (i) finding the globally optimal solutions and (ii) determining whether the number of ground states of a general spin glass is finite or infinite newman2022ground. Both numerical and theoretical bottlenecks contribute to these challenges. Numerically, (1) information on large systems and their scaling is limited by available computing resources, and (2) there is a lack of visualization methods to help process high-dimensional data and inspire ideas for analytical solutions. Theoretically, (1) we have no clear picture of these models in the thermodynamic limit, and (2) there is no analytical method for short-range spin-glass models. Analytic solutions are limited to the dynamics of models with all-to-all interactions, such as the SK model. It is non-trivial to understand a sparse network in which not all neurons or spins interact with each other; however, finding an analytical solution for sparse networks is essential for comprehending general ANNs, just as the SK model is for fully connected ANNs. Recently, Metz proposed a dynamical mean-field method for sparse directed networks and found their exact analytical solutions metz2025dynamical. The general solution is claimed to apply to the study of neural networks and beyond, including ecosystems, epidemic spreading, and synchronization, which represents meaningful progress.
However, more efforts are still needed in this direction.
3.2 Spin-glass physics helps understand ANNs
Spin-glass physics and ANNs benefit each other. Important inference problems in machine learning can be formulated as problems in the statistical physics of disordered systems. However, the significant issues we face in analyzing deep networks require a new chapter of spin-glass theory mezard2024spin. Although constructing a theory of deep learning is challenging, only a solid theory can transform deep-network predictions from best guesses in a black box into interpretable, demonstrable statements whose worst-case behavior can be controlled.
We have summarized a few examples where ANNs solve problems in statistical and theoretical physics, such as the phase detection of matter or materials; this is an active research area. Meanwhile, mean-field theory (e.g., the cavity method or replica theory) inspires the development of ANN algorithms and enhances our understanding of ANNs, for instance of the double-descent behavior. Currently, analytical results have been obtained only for simple ANNs, and only these simple ANNs are well understood. Unlike the earlier methods, which are suitable for single-layer or shallow networks, active research is ongoing to develop a statistical-mechanical theory of learning in deep architectures, such as the method proposed by Li and Sompolinsky li2021statistical. In the future, it is expected that the properties of more realistic ANNs can be fully explored.
Figure 7: Neural-network-based order parameter in complex concentrated alloys. a, Schematic of the variational autoencoder (VAE) used. The information extracted from the latent space is projected into a two-dimensional (2D) space using t-SNE, and the 2D data are then used to construct the order parameter $\langle Z^{op}\rangle_{T}$ based on the Manhattan distance $\sum_{i}|x_{i}|$. b, The new order parameter distinguishes the different configurations. The first-order derivative of $\langle Z^{op}\rangle_{T}$ (i.e., $\chi(Z^{op})$) plays a role similar to the specific heat $C_{v}$, where the peaks represent the two phase transitions. c, The phase-transition temperatures given by the peaks of $\chi(Z^{op})$ are compared with experimental data. This figure is adapted from Ref. yin2021neural.
3.3 Do we need new order parameters for ANNs?
Historically, order parameters have played an essential role in describing the thermodynamic behavior of complex systems, as demonstrated by the order parameter $\langle q^{2}\rangle$ proposed by Edwards and Anderson edwards1975theory, and new order parameters have advanced the study of glassy systems. Currently, the phase distributions of short-range spin glasses are not fully understood, which may limit our understanding of complex ANNs. We may need new, ANN-based order parameters to describe the phase transitions and phase distributions of spin glasses, and a rigorous mathematical description of metastable states and solutions appears necessary. In a previous study, we adopted a similar idea and proposed a neural-network-based order parameter to study complex concentrated alloys, in which multiple principal elements co-exist and introduce maximal disorder (Figure 7) yin2021neural. We successfully used the new order parameter to differentiate between phases and found that the predicted phase-transition temperatures are consistent with experimental results. Similarly, given the complex nature of ANN phase transitions, we may need one ANN to help us understand the phases of another ANN. ANNs can capture size-independent patterns, paving the way to understanding the ground states of spin glasses in the thermodynamic limit; currently, only spin-glass results for relatively small system sizes are available, limited by computing resources. However, difficulties remain, including the feature extraction and feature synthesis used in ANNs.
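The order-parameter construction of Figure 7 can be sketched numerically. In the toy snippet below, the latent trajectory is synthetic (two logistic curves standing in for real VAE + t-SNE output, not data from the original study): latent coordinates are mapped to $Z^{op}$ via the Manhattan distance, and the peak of the susceptibility-like derivative $\chi$ locates the transition temperature.

```python
import numpy as np

# Synthetic latent trajectory: two coordinates that collapse toward
# the origin as temperature increases (a smooth stand-in for the
# VAE + t-SNE output of the original study).
temps = np.linspace(100.0, 2000.0, 200)
x1 = 5.0 / (1.0 + np.exp((temps - 600.0) / 80.0)) + 0.1
x2 = 4.0 / (1.0 + np.exp((temps - 600.0) / 80.0)) + 0.1

# Order parameter: Manhattan distance sum_i |x_i| of the latent point
z_op = np.abs(x1) + np.abs(x2)
# Susceptibility-like quantity chi = |d<Z^op>/dT| via finite differences
chi = np.abs(np.gradient(z_op, temps))

T_c = temps[np.argmax(chi)]
print(f"estimated transition temperature: {T_c:.0f} K")
```

The same pipeline, with real latent coordinates in place of the synthetic curves, is what turns a learned representation into a physically interpretable order parameter.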
3.4 Quantum computing for spin glasses and ANNs
Due to the rugged energy landscapes of spin glasses, even today's most powerful supercomputers cannot model these complex systems at large sizes, leaving much of their behavior unexplored. Quantum computers can mitigate this issue thanks to the exponential scaling of their state space. Breakthroughs in both quantum hardware and algorithms provide new opportunities. A team from Google reported an error-corrected chip, Willow, featuring 105 superconducting qubits, in December 2024 Acharya2025. The performance of the Willow chip is challenged by China's Zuchongzhi 3.0 processor gao2025establishing, a superconducting quantum-computer prototype also featuring 105 qubits, reported in March 2025. Hopes for quantum computing based on Majorana particles, a type of quasiparticle in topological superconductors, rose in February 2025, when Microsoft published an intermediate, controversial result on its quantum chip, Majorana 1 quantum2025interferometric. Quantum computers can be used to efficiently explore the phase spaces of spin-glass-like systems, and simulating Ising spin glasses on a quantum computer provides new opportunities Lidar1997. For example, the Sachdev-Ye-Kitaev (SYK) model of interacting fermions, which describes the dynamics of traversable wormholes between black holes, can be solved numerically with quantum computing gao2021traversable; jafferis2022traversable. Quantum computing can also be applied to quantum machine learning, such as the development of quantum versions of ANNs, or quantum neural networks boreiri2025topologically; this will become more promising as new quantum algorithms emerge. Recently, King and colleagues used quantum-annealing processors to simulate quantum dynamics in programmable spin glasses King2025-D-Wave, one application of quantum computers of practical interest. Given the topological relationship between spin glasses and ANNs, such work will provide valuable insights into understanding ANNs.
Additionally, it is anticipated that ANNs can leverage the benefits of quantum computing in terms of efficiency and accuracy. For example, a quantum version of the Hopfield network was developed by replacing classical Hebbian learning with a quantum Hebbian learning rule rebentrost2018quantum. The researchers demonstrated the ability to store exponentially large networks with a polynomial number of qubits by encoding the weights as amplitudes of quantum states; their quantum algorithm has a computational complexity that is logarithmic in the dimensionality of the data. The advantages of quantum ANNs, as well as of other quantum algorithms, remain uncertain on near-term quantum computers. To address this critical question, Abbas et al. proposed the so-called effective dimension to measure the power and trainability of quantum ANNs abbas2021power. Assisted by this measure, they showed a quantum advantage for a class of quantum ANNs. Quantum ANNs are under active investigation and still in the early stages of development.
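For context, the classical baseline that the quantum Hebbian rule replaces can be written in a few lines. The toy sketch below (network size, pattern count, and corruption level are arbitrary illustrative choices) stores random patterns with the Hebbian rule and recalls one of them from a corrupted probe; the quantum algorithm of rebentrost2018quantum encodes such weights in quantum-state amplitudes instead.

```python
import numpy as np

rng = np.random.default_rng(2)

# Classical Hopfield network with the Hebbian learning rule;
# sizes and patterns are arbitrary toy choices.
N, P = 100, 5
patterns = rng.choice([-1, 1], size=(P, N))

W = (patterns.T @ patterns) / N   # Hebbian rule: W = (1/N) sum_mu xi xi^T
np.fill_diagonal(W, 0.0)          # no self-connections

def recall(state, n_sweeps=10):
    # asynchronous sign updates toward a fixed point of the dynamics
    state = state.copy()
    for _ in range(n_sweeps):
        for i in rng.permutation(N):
            state[i] = 1 if W[i] @ state >= 0 else -1
    return state

# corrupt 10% of a stored pattern and recover it
probe = patterns[0].copy()
probe[rng.choice(N, size=10, replace=False)] *= -1
print("overlap after recall:", recall(probe) @ patterns[0] / N)
```

Storing the N x N weight matrix classically costs quadratic memory; the claimed quantum advantage is precisely the compression of this object into the amplitudes of a logarithmic number of qubits.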
3.5 More opportunities
One exciting future direction for ANNs is neural computing, also known as synaptic computation abbott2004synaptic; wang2018fully; weilenmann2024single; zhou2024field. In current ANNs, storage and computation are separate. In contrast, biological neurons store data and perform computations in the same neural network, which makes them highly efficient. Since associative ANNs have demonstrated their capability for memory storage and have been successfully applied to computation, it is natural to take the next step and design memristive neural networks. Memristors can store and compute information simultaneously. A memristor combines a memory unit and a resistor and can mimic biological synaptic functions in ANNs: its resistance changes based on the history of the applied current or voltage. A prominent advantage of memristors is that they are non-volatile and retain memory in the absence of power. For example, Wang et al. developed fully memristive neural networks for pattern classification with unsupervised learning wang2018fully. In a recent study, Weilenmann and colleagues explored energy-efficient neural networks using a single neuromorphic memristor that mimics multiple synaptic mechanisms weilenmann2024single. More details can be found in a recent review of memristive Hopfield neural networks applied to chaotic systems lin2023review. In short, memristors are artificial analogs of biological neurons that harness computing power by mimicking their structure and function. Another exciting recent direction that deserves more attention is the direct participation of in vitro neurons in computational processes. These studies integrate adaptive in vitro neurons and in silico high-density multi-electrode arrays into digital systems to perform computations kagan2022vitro. Nonetheless, these research directions are still in their infancy and offer many opportunities to explore the potential of neural computing.
Since they are not the focus of this article, we will not discuss them in depth.
4 Conclusions
We have reviewed the applications and history of artificial neural networks, as well as their connections with biology and statistical physics, particularly in the context of spin glasses. We have shown the deep connections among these multidisciplinary directions; for example, we demonstrated how replica theory is applied to understand the behavior of artificial neural networks. We also discussed the challenges and possible solutions. One significant problem is understanding, and also accelerating, the training of large artificial neural networks, which can benefit from integration with quantum computing and neural computing: quantum computing can provide exponential acceleration, while neural computing can mitigate the storage limit and reduce the latency of data transfer. Theoretically, the complex behavior, reliability, and stability of ANNs require more research effort, which lags behind their applications. Arguably, the most challenging problems involve understanding biological and artificial neurons: the former involves the formation of consciousness, while the latter involves the realization of artificial intelligence. We conclude that statistical physics bridges these key multidisciplinary problems and will provide valuable methods to find their answers.