## Memristors - from In-memory computing, Deep Learning Acceleration, Spiking Neural Networks, to the Future of Neuromorphic and Bio-inspired Computing
Adnan Mehonic * , Abu Sebastian, Bipin Rajendran, Osvaldo Simeone, Eleni Vasilaki, Anthony J. Kenyon
Dr. Adnan Mehonic, Prof Anthony J. Kenyon
Department of Electronic & Electrical Engineering, UCL, Torrington Place, London WC1E 7JE, United Kingdom
E-mail: adnan.mehonic.09@ucl.ac.uk
Dr. Abu Sebastian
IBM Research - Zurich, 8803 Rüschlikon, Switzerland
Dr. Bipin Rajendran, Prof Osvaldo Simeone
Centre for Telecommunications Research, Department of Engineering, King's College London, WC2R 2LS, United Kingdom
Prof. Eleni Vasilaki
Department of Computer Science, University of Sheffield, Sheffield, South Yorkshire, United Kingdom
Keywords: memristor, neuromorphic, AI, deep learning, spiking neural networks, in-memory computing
## Abstract
Machine learning, particularly in the form of deep learning, has driven most of the recent fundamental developments in artificial intelligence. Deep learning is based on computational models that are, to a certain extent, bio-inspired, as they rely on networks of connected simple computing units operating in parallel. Deep learning has been successfully applied in areas such as object/pattern recognition, speech and natural language processing, self-driving vehicles, intelligent self-diagnostic tools, autonomous robots, knowledgeable personal assistants, and monitoring. These successes have been supported mostly by three factors: the availability of vast amounts of data, continuous growth in computing power, and algorithmic innovations. The approaching demise of Moore's law, and the consequently modest improvements in computing power expected from scaling, raise the question of whether this progress will be slowed or halted due to hardware limitations. This paper reviews the case for a novel beyond-CMOS hardware technology - memristors - as a potential solution for the implementation of power-efficient in-memory computing, deep learning accelerators, and spiking neural networks. Central themes are the reliance on non-von Neumann computing architectures and the need to develop tailored learning and inference algorithms. To argue that lessons from biology can usefully guide further progress in artificial intelligence, we briefly discuss an example based on reservoir computing. We conclude the review by speculating on the 'big picture' view of future neuromorphic and brain-inspired computing systems.
## 1. Introduction
Three factors are currently driving the main developments in artificial intelligence (AI): the availability of vast amounts of data, continuous growth in computing power, and algorithmic innovations. Graphics processing units (GPUs) have proven to be effective coprocessors for the implementation of machine learning (ML) algorithms based on deep learning (DL). Solutions based on deep learning and GPU implementations have led to massive improvements in many AI tasks, but have also caused an exponential increase in demand for computing power. Recent analyses show that the demand for computing power has increased by a factor of 300,000 since 2012, and that it is expected to double every 3.4 months - a much faster rate than the improvements historically delivered by Moore's-law scaling (a 7-fold improvement over the same period) [1]. At the same time, Moore's law has been slowing significantly for the last few years [2], and there are strong indications that we will not be able to continue scaling down CMOS transistors. This calls for the exploration of alternative technology roadmaps for the development of scalable and efficient AI solutions.
Transistor scaling is not the only way to improve computing performance. Architectural innovations such as GPUs, field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs) have all significantly advanced the ML field [3]. A common aspect of modern computing architectures for ML is a move away from the classical von Neumann architecture, which physically separates memory and computing. That separation creates a performance bottleneck - the cost of moving data - that is often the main source of both the energy and speed inefficiency of ML implementations on conventional hardware platforms. However, architectural developments alone are unlikely to be sufficient. In fact, standard digital CMOS components are inherently ill-suited to implementing the massive number of continuous weights/synapses in artificial neural networks (ANNs).
1.1. The promise of memristors. There is a strong case to be made for the exploration of alternative technologies. Although memristor technology is still in development, it is a strong candidate for future non-CMOS and beyond-von Neumann computing solutions [4]. Since its early development in 2008 [5], or even earlier under different names [6], memristor technology has expanded remarkably to include many different material systems, physical mechanisms, and novel computing approaches [4]. A single progress report cannot cover all the different approaches and fast-growing developments in the field; evaluations of the state of the art in memristor-based electronics can be found elsewhere [7]. Instead, in this paper we present and discuss a few representative case studies, showcasing the potential role of memristors in the expanding field of AI hardware. We present examples of how memristors are used for in-memory computing systems, deep learning accelerators, and spike-based computing. Finally, we discuss and speculate on the future of neuromorphic and bio-inspired computing paradigms, providing reservoir computing as an example.
For the last 15 years, memristors have been a focal point for many different research communities - mathematicians, solid-state physicists, experimental material scientists, electrical engineers and, more recently, computer scientists and computational neuroscientists. The concept of the memristor was introduced almost 50 years ago, in 1971 [8], but was then nearly forgotten for almost four decades. It is now experiencing a rebirth with a vibrant and very active research community. There are many different flavours of memristive technologies. Still, in their most popular implementation, memristors are simple two-terminal devices with the extraordinary property that their resistance depends on their history of electrical stimuli. In other words, memristors are resistors with memory. They promise high levels of integration, stable non-volatile resistance states, fast resistance switching, and excellent energy efficiency - all very desirable properties for the next generation of memory technologies.
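As a concrete illustration of 'a resistor with memory', the linear dopant-drift model proposed in [5] can be simulated in a few lines. The parameter values below are order-of-magnitude placeholders, not fitted to any real device; the model itself is a simplification that ignores boundary effects and nonlinear drift.

```python
# Toy simulation of the linear dopant-drift memristor model of [5]:
# R(x) = R_on * x + R_off * (1 - x),  dx/dt = (mu * R_on / D**2) * i(t).
# All parameter values are illustrative placeholders.

def simulate_memristor(voltages, dt=1e-6, r_on=100.0, r_off=16e3,
                       mu=1e-14, d=1e-8, x0=0.1):
    """Return the resistance trace for a sequence of applied voltage samples."""
    x = x0  # normalised position of the doped/undoped boundary (0..1)
    resistances = []
    for v in voltages:
        r = r_on * x + r_off * (1.0 - x)
        i = v / r                             # Ohm's law
        x += mu * r_on / d**2 * i * dt        # linear drift of the state variable
        x = min(max(x, 0.0), 1.0)             # clamp at the device boundaries
        resistances.append(r)
    return resistances

# A train of positive pulses lowers the resistance, and the device
# retains ('remembers') that lower resistance when the stimulus stops.
trace = simulate_memristor([1.0] * 1000)
```

The key point the sketch captures is that resistance is a function of the integrated current history, not of the instantaneous voltage alone.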
The physical implementations of memristors are broad and arguably include many different technologies, such as redox-based resistive random-access memory (ReRAM), phase-change memory (PCM), and magnetoresistive random-access memory (MRAM). Further differentiations within these larger classes can be made depending on the physical mechanisms that govern the resistance change. Many excellent reviews cover the principles and switching mechanisms of memristor devices. Here, we briefly describe two extensively studied types of memristive device, namely ReRAM and PCM.
Resistance switching is one of the most explored properties of memristive devices. A thin insulating film reversibly changes its electrical resistance - between an insulating state and a conducting state - under the application of an external electrical stimulus. For binary memory devices, two stable states are sought, typically called the high resistance state (HRS), and the low resistance state (LRS). The transition from the HRS to the LRS is called a SET process, while a RESET process describes the transition from the LRS to the HRS.
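The binary SET/RESET behaviour described above can be captured by a minimal behavioural model. The threshold voltages and resistance values here are hypothetical placeholders chosen only for illustration; real devices show cycle-to-cycle variability and gradual transitions that this sketch deliberately ignores.

```python
# Minimal behavioural model of a binary resistive memory cell.
# Thresholds and resistance values are illustrative placeholders.

class BinaryMemristorCell:
    HRS = 1e6   # high resistance state, ohms
    LRS = 1e4   # low resistance state, ohms

    def __init__(self, v_set=1.5, v_reset=-1.5):
        self.v_set, self.v_reset = v_set, v_reset
        self.resistance = self.HRS  # start in the insulating state

    def apply_pulse(self, voltage):
        """SET above +v_set (HRS -> LRS); RESET below v_reset (LRS -> HRS)."""
        if voltage >= self.v_set:
            self.resistance = self.LRS
        elif voltage <= self.v_reset:
            self.resistance = self.HRS
        # sub-threshold pulses (e.g. read pulses) leave the state unchanged

cell = BinaryMemristorCell()
cell.apply_pulse(2.0)   # SET pulse: cell now in the LRS
cell.apply_pulse(0.5)   # low-voltage read: nonvolatile state is retained
cell.apply_pulse(-2.0)  # RESET pulse: cell returns to the HRS
```

The nonvolatility is what distinguishes the cell from a plain resistor: the state persists between pulses without any refresh.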
Basic memory cells of both types, in their most straightforward implementation, have three layers - two conductive electrodes and a thin switching layer sandwiched in between. Local redox processes govern resistance switching in ReRAM devices. A broad classification can be made based on a distinction between switching that results from intrinsic properties of the switching material (typically oxides) and switching that results from the in-diffusion of metal ions (typically from one of the metallic electrodes). The former type is called intrinsic switching, and the latter extrinsic switching [9]. Alternatively, a classification can be made depending on the main driving force for the redox process (thermal or electrical), or on the type of ions that move. The three main classes are electrochemical metallization (or conductive bridge) ReRAMs (ECM), valence change ReRAMs (VCM) and thermochemical ReRAMs (TCM) [4].
Many ReRAM devices require an electroforming step prior to resistance switching. This can be considered a soft breakdown of the insulating material: a conductive filament is produced inside the insulating film as a result of the applied electrical bias. Modification of the conductive filament, driven by a local redox process, leads to the change of resistance. The diameter of the conductive filament is typically of the order of a few nanometres to a few tens of nanometres, and it does not depend on the size of the electrodes. Another, less common type is interface-type switching, which does not depend on the creation and modification of conductive filaments but can instead be driven by the formation of a tunnel or Schottky barrier across the whole interface between the electrode and the switching layer.
In the case of PCM, the change of resistance is due to the crystallisation and amorphisation of phase-change materials. The amplitude and duration of the applied voltage pulses control the phase transitions - the SET process changes the amorphous phase to a crystalline phase (HRS-to-LRS transition), and the RESET process changes the crystalline phase to an amorphous phase (LRS-to-HRS transition).
For many computing tasks, more than two states are required, and for most memristive devices, including ReRAMs and PCMs, many resistance states can be achieved. However, benchmarking of memristive devices for different applications, beyond pure digital memory, can be challenging and relies on many different parameters other than the number of different resistance states. We will discuss the main device properties in the context of different applications.
1.2 The landscape of different approaches and applications. In the context of this paper, memristors can be used in applications beyond simple memory devices [10]. A 'big picture' landscape of memristor-based approaches for AI is shown in Figure 1. There is more than one way in which memristors can perform computing. A unique feature of memristor devices is the ability to co-locate memory and computing, breaking the von Neumann bottleneck at the lowest, nanometre-scale level. One such approach is the concept of in-memory computing, which uses memory not only to store data but also to perform computation at the same physical location. Furthermore, memristors have long been considered for deep learning acceleration. Specifically, memristive crossbar arrays physically represent the weights of artificial neural networks as conductances at each crosspoint. When voltages are applied on one side of the crossbar and currents are sensed on the orthogonal terminals, the array performs a vector-matrix multiplication in a single step using Kirchhoff's and Ohm's laws. Vector-matrix multiplications dominate most DL algorithms - hundreds of thousands are often needed during training and inference. When weights are implemented as memristor conductances, there is no need for the extensive, power-hungry data movement required by conventional digital systems based on the von Neumann architecture.
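The crossbar read operation described above can be sketched numerically: Ohm's law gives each crosspoint current as voltage times conductance, and Kirchhoff's current law sums the currents along each column. The conductance scale and input values below are arbitrary placeholders.

```python
import numpy as np

# Sketch of a memristive crossbar computing y = v @ W in one analog step.
# Each weight w_ij is stored as a conductance g_ij; row voltages encode the
# input vector, and the current summed on each column (Kirchhoff's current
# law) is the corresponding output element (Ohm's law: i = v * g).
rng = np.random.default_rng(0)

weights = rng.uniform(0.0, 1.0, size=(4, 3))  # target weight matrix (4 in, 3 out)
g = weights * 1e-4                            # map weights to conductances (siemens)
v = np.array([0.2, -0.1, 0.3, 0.05])          # input vector encoded as voltages (V)

i_out = v @ g                                 # column currents: the analog product
y = i_out / 1e-4                              # undo the conductance scaling
```

Note that the whole multiply-accumulate happens in the physics of the array; the digital periphery only applies voltages and senses currents, which is where the energy saving over a von Neumann machine comes from.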
Other, more bio-realistic concepts are also being explored. These include schemes relying on spike-based communication. The central premise of this approach can be summarised by the motto 'computing with time, not in time'. It has been shown that memristors can directly implement some functions of biological neurons and synapses - most importantly, synapse-like plasticity and neuron-like integration and spiking. In these solutions, information is encoded and transferred in the form of voltage or current spikes. Memristor resistances are used as proxies for synaptic strengths. More importantly, adjustment of the resistances is controlled according to local learning rules. One popular local learning rule is spike-timing-dependent plasticity (STDP), which dynamically adjusts a local state variable, such as conductance, based on the relative timing of spikes. In a simple example, the conductance of a memristive 'synapse' can be increased or decreased depending on the degree of overlap between pre- and post-synaptic voltage pulses. Implementations also exist that do not require overlapping pulses, instead exploiting the volatile internal dynamics of memristive devices. Spike-based computing promises further improvements in power efficiency, taking inspiration from the remarkable efficiency of the human brain.
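A minimal pair-based STDP rule of the kind described above can be sketched as follows. The amplitudes and time constant are illustrative and not taken from any particular device or biological measurement; in a memristive implementation the returned value would correspond to a conductance change applied by overlapping programming pulses.

```python
import math

# Toy pair-based STDP rule: the conductance change depends only on the
# relative timing of pre- and post-synaptic spikes.

def stdp_delta_g(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Conductance update for one pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:    # pre fires before post: causal pairing, potentiation
        return a_plus * math.exp(-dt / tau)
    elif dt < 0:  # post fires before pre: anti-causal pairing, depression
        return -a_minus * math.exp(dt / tau)
    return 0.0    # simultaneous spikes: no change in this toy rule

dg_causal = stdp_delta_g(10.0, 15.0)       # positive: synapse strengthened
dg_anticausal = stdp_delta_g(15.0, 10.0)   # negative: synapse weakened
```

The exponential decay makes the update local in time: spike pairs far apart produce a vanishingly small change, which is what allows the rule to be computed at the synapse itself with no global error signal.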
Finally, we speculate that, for future developments in AI, new knowledge and computational models from the field of computational neuroscience could play a crucial role. Virtually all recent developments in ML and DL have been driven by the field of computer science, while the algorithmic inspiration drawn from neuroscience is mostly based on old models established as early as the 1950s. Although our understanding of the working principles of the biological brain is still in its infancy, novel brain-inspired architectural principles, beyond simple probabilistic deep learning approaches, could lead to higher-level cognitive functionalities. One such example is the concept of reservoir computing, which we discuss briefly in this paper. It is unlikely that current digital CMOS transistor technology can be optimized for the efficient implementation of much more dynamic and adaptive systems. In contrast, memristor-based systems, with their rich switching dynamics and many state variables, may provide a perfect substrate on which to build a new class of intelligent and efficient neuromorphic systems.
Figure 1. The landscape of memristor-based systems for Artificial Intelligence. In-memory computing aims to eliminate the von Neumann bottleneck by implementing computation directly within the memory. Deep learning accelerators based on memristive crossbars implement vector-matrix multiplication directly using Ohm's and Kirchhoff's laws. Spiking neural networks, a type of artificial neural network, are biologically more plausible; they do not operate on continuous signals but use spikes to process and transfer data. Memristor systems could provide a hardware platform for implementing spike-based learning and inference. More complex (neuromorphic) functionalities, beyond the simple digital CMOS switching paradigm, implemented directly in memristive hardware primitives, might fuel the next wave of higher cognitive systems.
<details>
<summary>Image 1 Details</summary>

### Visual Description
## Diagram: Memristor-Based Computing Landscape
### Overview
This diagram illustrates the landscape of memristor-based computing, showcasing its various applications and underlying principles. It highlights the shift from conventional von Neumann architecture to in-memory computing, the role of memristors in deep learning acceleration and spiking neural networks, and the potential for future cognitive computing. The diagram is organized around a central depiction of memristors, with radiating sections detailing different aspects of the technology.
### Components/Axes
The diagram is divided into several key sections:
* **In-memory computing:** Illustrates the integration of memory and computation.
* **Memristors:** Central focus, depicting different types (PCM, ReRAM).
* **Deep Learning Accelerators:** Shows a memristor crossbar array and its function in a MAC accelerator.
* **Memristor-based Spiking Neural Networks:** Depicts memristors mimicking synaptic functionality within a neural network.
* **Future of cognitive computing:** Illustrates the potential applications in cognitive abilities.
There are no explicit axes in the traditional sense, but the diagram uses spatial arrangement to convey relationships between components.
### Detailed Analysis or Content Details
**1. In-memory computing (Top-Left):**
* Depicts a transition from separate "Memory" and "Logic" blocks to an integrated "In-memory" block.
* Text: "Bringing computing closer to memory".
* Text: "Conventional von-Neumann architecture Minimising von-Neumann bottleneck improves efficiency".
**2. Memristors (Center):**
* Visual representation of different memristor types:
* **PCM (Phase Change Memory):** Depicted as a layered structure with red and blue spheres.
* **ReRAM (Resistive Random Access Memory):** Depicted as a layered structure with blue and red spheres.
* Label: "Memristors" is prominently displayed over the central image.
**3. Deep Learning Accelerators (Top-Right):**
* **Memristor crossbar array:** A grid of memristors with "Operating/Sensing Terminals" labeled.
* **Power-efficient analog MAC accelerator:** Illustrates the use of memristors in a matrix-vector multiplication.
* Equation: "I = VG" (Output current is proportional to input voltage).
* Text: "Input Multiplication matrix C is mapped onto ReRAM crossbar array".
* Text: "Input Multiplication vector is defined by voltage vector V".
* Text: "Output: current I represents a vector-matrix product".
**4. Memristor-based Spiking Neural Networks (Bottom-Left):**
* Depiction of a neural network with memristors acting as synapses.
* Text: "Memristor functionality".
* Text: "Synaptic functionality".
* Diagram of spiking neurons with input and output signals.
* Text: "Spike-based learning and inference".
**5. Future of cognitive computing (Bottom-Right):**
* Illustration of a human brain silhouette with various cognitive attributes highlighted.
* Attributes: "Attention", "Creativity", "Speed", "Focus", "Flexibility", "Memory".
* Text: "Novel cognitive applications".
* Depiction of bio-inspired algorithms, devices, and systems.
* Text: "Novel bio-inspired algorithms, devices and systems".
* Image of a circuit board with various components.
* Text: "Biology".
### Key Observations
* The diagram emphasizes the potential of memristors to overcome the limitations of the von Neumann architecture.
* The central placement of memristors highlights their crucial role in various computing paradigms.
* The diagram showcases a wide range of applications, from deep learning acceleration to cognitive computing.
* The use of visual representations of memristor structures (PCM, ReRAM) provides insight into their physical characteristics.
* The equation "I = VG" suggests a simple linear relationship between input voltage and output current in the MAC accelerator.
### Interpretation
The diagram presents a compelling vision of the future of computing, where memristors play a central role in enabling more efficient and intelligent systems. The shift from conventional von Neumann architecture to in-memory computing is presented as a key enabler for overcoming the limitations of traditional computing. The diagram suggests that memristors can be used to accelerate deep learning algorithms, mimic the functionality of biological neurons, and ultimately create systems with human-like cognitive abilities. The inclusion of "Biology" in the bottom section suggests that the design of these systems is inspired by the natural world. The diagram is a high-level overview and does not delve into the specific challenges and complexities of implementing these technologies. The diagram is a promotional piece, and as such, it presents a very optimistic view of the technology.
</details>
## 2. In-memory computing
In the von Neumann architecture, which dates back to the 1940s, memory and processing units are physically separated, and large amounts of data need to be shuttled back and forth between them during the execution of computational tasks. The latency and energy associated with accessing data in the memory units are key performance bottlenecks for a range of applications, in particular the increasingly prominent artificial-intelligence workloads [11]. The energy cost of moving data is a key challenge both for severely energy-constrained mobile and edge computing and for high-performance computing in a cloud environment, due to cooling constraints. Current approaches, such as using hundreds of processors in parallel [12] or application-specific processors [13], are unlikely to fully overcome the challenge of data movement. It is becoming increasingly clear that novel architectures need to be explored in which memory and processing are better collocated. In-memory computing is one such non-von Neumann approach, in which certain computational tasks are performed in place in the memory itself, organized as a computational memory unit [14, 15, 16, 17]. As schematically illustrated in Figure 2, in-memory computing obviates the need to move data into a processing unit. Computing is performed by exploiting the physical attributes of the memory devices, their array-level organization, the peripheral circuitry, and the control logic. In this paradigm, the memory is an active participant in the computational task. Besides reducing the latency and energy cost associated with data movement, in-memory computing also has the potential to improve the computational time complexity of certain tasks, thanks to the massive parallelism afforded by a dense array of millions of nanoscale memory devices serving as compute units.
By introducing physical coupling between the memory devices, there is also a potential for further reduction in computational time complexity [18, 19]. Memristive devices such as PCM, ReRAM and MRAM [20, 21] are particularly well suited for in-memory computing.
Figure 2. In-memory computing. In a conventional computing system, when an operation f is performed on data D, D has to be moved into a processing unit. This incurs significant latency and energy cost and creates the well-known von Neumann bottleneck. With in-memory computing, f(D) is performed within a computational memory unit by exploiting the physical attributes of the memory devices. This obviates the need to move D to the processing unit. (Adapted and reproduced with permission [14] , Copyright 2017, Nature Research)
<details>
<summary>Image 2 Details</summary>

### Visual Description
## Diagram: Processing Unit and Memory Interaction - Conventional vs. Computational
### Overview
The image presents a comparative diagram illustrating two architectures for processing unit and memory interaction. The left side depicts a conventional architecture, while the right side shows a computational memory architecture. Both diagrams represent the flow of data and control signals during a computation defined as A := f(A), where A is a variable and f is a function.
### Components/Axes
The diagram consists of two main sections, each representing an architecture. Each section contains:
* **Memory:** Divided into banks labeled "Bank #1" through "Bank #N".
* **Processing Unit:** Comprising a Control Unit, Arithmetic Logic Unit (ALU), and Cache.
* **Connections:** Arrows indicating data and control flow. Labels on these arrows include "CONTROL", "FETCH", "STORE", and "bottleneck".
* **Mathematical Notation:** "A := f(A)" appears above each diagram, representing the computation being performed.
* **Data Representation:** The variable 'A' is shown within the memory banks. In the computational memory architecture, the function 'f' is also stored within the memory bank alongside 'A'.
### Detailed Analysis or Content Details
**Left Diagram (Conventional Architecture):**
* **Memory:** The memory is divided into 'N' banks. The variable 'A' is located in Bank #1. Data flow between the memory and the processing unit is represented by pink arrows for 'FETCH' and 'STORE', and a black arrow for 'CONTROL'.
* **Processing Unit:** The 'FETCH' signal retrieves data from memory to the Cache. The Cache then passes the data to the ALU, which applies the function 'f' to it. The result is then 'STORE'd back into memory. The 'bottleneck' label is placed near the 'FETCH' and 'STORE' arrows, indicating a potential performance limitation.
* **ALU:** The output of the ALU is labeled 'f'.
**Right Diagram (Computational Memory Architecture):**
* **Memory:** The memory is divided into 'N' banks. Bank #1 contains both the variable 'A' and the function 'f'. The memory is further labeled as "Computational memory" and "Conventional memory".
* **Processing Unit:** The 'CONTROL' signal is sent from the Control Unit to the memory. The ALU receives the function 'f' directly from the memory (Bank #1) via a dotted arrow, bypassing the Cache.
* **ALU:** The output of the ALU is labeled 'f' with a circular arrow around it, suggesting an iterative process or feedback loop.
### Key Observations
* The computational memory architecture aims to reduce the "bottleneck" associated with data transfer between memory and the processing unit by storing the function 'f' directly in memory alongside the data 'A'.
* The dotted arrow in the computational memory architecture indicates a direct connection between the memory and the ALU for the function 'f', bypassing the Cache.
* The conventional architecture relies heavily on the Cache for intermediate data and function storage, potentially leading to performance limitations.
### Interpretation
The diagram illustrates a shift in computing paradigms from a traditional Von Neumann architecture (conventional) to a more modern computational memory architecture. The conventional architecture suffers from the "memory wall" problem, where the speed of data transfer between the processor and memory limits overall performance. The computational memory architecture attempts to address this by bringing computation closer to the data, reducing the need for frequent data transfers.
The presence of 'f' within the memory bank in the computational architecture suggests that the processing is partially or fully offloaded to the memory itself. This is a key characteristic of in-memory computing and near-data processing. The dotted line represents a more efficient pathway for the function 'f' to reach the ALU, potentially improving performance.
The diagram highlights the potential benefits of integrating computation and memory, leading to reduced latency, increased bandwidth, and improved energy efficiency. The 'A := f(A)' notation emphasizes the iterative nature of many computations, and the computational memory architecture is designed to optimize this iterative process.
</details>
Figure 3. The key physical attributes of memristive devices that facilitate in-memory computing. a) Binary storage capability, whereby the devices can be switched between high and low resistance values in a repeatable manner (Adapted and reproduced with permission [22]. Copyright 2019, IOP Publishing). b) Multi-level storage capability, whereby the devices can be programmed to a continuum of resistance values by the application of appropriate programming pulses (Adapted and reproduced with permission [23]. Copyright 2018, American Institute of Physics). c) The accumulative behavior, whereby the resistance of a device can be progressively decreased by the successive application of identical programming pulses (Adapted and reproduced with permission [23]. Copyright 2018, American Institute of Physics).
<details>
<summary>Image 3 Details</summary>

### Visual Description
## Scatter Plots: Resistance vs. Cycling/Current/Pulses
### Overview
The image presents three scatter plots (labeled a, b, and c) illustrating the relationship between resistance and different parameters: number of cycles, programming current, and number of pulses, respectively. Each plot includes schematic diagrams of a device structure, likely a memristor or similar resistive switching element, positioned above the data. The plots appear to demonstrate the resistive switching behavior of the device.
### Components/Axes
**Plot a:**
* **X-axis:** Number of cycles (logarithmic scale, from 10⁰ to 10¹⁰) labeled "Number of cycles".
* **Y-axis:** Resistance (logarithmic scale, from 10³ to 10⁷ Ω) labeled "Resistance (Ω)".
* **Data Series:** Two distinct data series, labeled "SET" (red circles) and "RESET" (black circles) in the top-left corner.
**Plot b:**
* **X-axis:** Programming current (linear scale, from 100 to 800 μA) labeled "Programming current (µA)".
* **Y-axis:** Resistance (logarithmic scale, from 10⁴ to 10⁷ Ω) labeled "Resistance (Ω)".
* **Data Series:** Three data series, represented by red triangles, green circles, and purple squares.
**Plot c:**
* **X-axis:** Number of pulses (linear scale, from 0 to 30) labeled "Number of pulses".
* **Y-axis:** Resistance (logarithmic scale, from 10⁴ to 10⁷ Ω) labeled "Resistance (Ω)".
* **Data Series:** Four data series, represented by red triangles, green circles, purple squares, and magenta diamonds.
All plots share a similar schematic diagram at the top, depicting a layered device structure. The diagrams include red and black rectangular elements, and a blue patterned element.
### Detailed Analysis or Content Details
**Plot a:**
* The "SET" data series (red) shows a decreasing resistance with increasing number of cycles, starting at approximately 6 x 10⁶ Ω and decreasing to approximately 2 x 10⁵ Ω. The trend is generally downward, but with some fluctuations.
* The "RESET" data series (black) shows an increasing resistance with increasing number of cycles, starting at approximately 2 x 10⁴ Ω and increasing to approximately 8 x 10⁶ Ω. The trend is generally upward, but with some fluctuations.
**Plot b:**
* Red triangles: Resistance decreases from approximately 7 x 10⁶ Ω at 100 μA to approximately 2 x 10⁵ Ω at 600 μA, then plateaus.
* Green circles: Resistance decreases from approximately 6 x 10⁶ Ω at 100 μA to approximately 3 x 10⁵ Ω at 600 μA, then plateaus.
* Purple squares: Resistance decreases from approximately 7 x 10⁶ Ω at 100 μA to approximately 2 x 10⁵ Ω at 600 μA, then plateaus.
**Plot c:**
* Red triangles: Resistance decreases rapidly from approximately 6 x 10⁶ Ω at 0 pulses to approximately 2 x 10⁵ Ω at 15 pulses, then plateaus.
* Green circles: Resistance decreases rapidly from approximately 6 x 10⁶ Ω at 0 pulses to approximately 2 x 10⁵ Ω at 15 pulses, then plateaus.
* Purple squares: Resistance decreases rapidly from approximately 6 x 10⁶ Ω at 0 pulses to approximately 2 x 10⁵ Ω at 15 pulses, then plateaus.
* Magenta diamonds: Resistance decreases rapidly from approximately 6 x 10⁶ Ω at 0 pulses to approximately 2 x 10⁵ Ω at 15 pulses, then plateaus.
### Key Observations
* All three plots demonstrate a clear resistive switching behavior, where the resistance can be switched between high and low states.
* Plot a shows the endurance of the switching behavior over many cycles.
* Plot b shows the switching behavior as a function of the applied programming current.
* Plot c shows the switching behavior as a function of the number of applied pulses.
* The data series in plots b and c are very similar, suggesting that the switching behavior is consistent across different conditions or devices.
### Interpretation
The data suggests that the device exhibits reliable resistive switching characteristics. The plots demonstrate that the resistance can be controlled by varying the number of cycles, programming current, and number of pulses. The logarithmic scales on the Y-axes highlight the significant change in resistance between the high and low resistance states. The schematic diagrams indicate a layered structure, likely a memristive device. The consistency of the data series in plots b and c suggests that the switching mechanism is robust and repeatable. The plateaus observed in plots b and c indicate a saturation effect, where further increases in current or pulses do not result in significant changes in resistance. This data is likely used to characterize the performance of a novel memory or neuromorphic computing device.
</details>
There are several key physical attributes that enable in-memory computing using memristive devices. First of all, the ability to store two levels of resistance/conductance in a nonvolatile manner and to reversibly switch from one level to the other (binary storage capability) can be exploited for computing. Figure 3a shows the resistance values achieved upon repeated switching of a representative PCM device between low-resistance SET states and high-resistance RESET states. Because a device can be held in either the SET or the RESET state, resistance can serve as an additional logic state variable. In conventional CMOS, voltage serves as the single logic state variable: the input signals are processed as voltage signals and are output as voltage signals. By combining CMOS circuitry with memristive devices, it is possible to exploit the additional resistance state variable. For example, the RESET state could indicate logic '0' and the SET state could denote logic '1'. This enables logical operations that rely on the interaction between the voltage and resistance state variables and could enable the seamless integration of processing and storage. This is the essential idea behind memristive logic, which is an active area of research [24, 25, 26]. Memristive logic has the potential to impact application areas such as image processing [27], encryption and database query [28]. Brain-inspired hyperdimensional computing, which involves the manipulation of large binary vectors, has recently emerged as another promising application area for in-memory logic [29, 30]. Going beyond binary storage, certain memristive devices can also be programmed to a continuum of resistance or conductance values (analog storage capability). For example, Figure 3b shows a continuum of resistance levels in a PCM device achieved by the application of programming pulses with varying amplitude. 
The device is first programmed to the fully crystalline state, after which RESET pulses are applied with progressively increasing amplitude. The device resistance is measured after the application of each RESET pulse. Due to this property, it is possible to program a memristive device to a certain desired resistance value through iterative programming by applying several pulses in a closed-loop manner [31] . Yet another physical attribute that enables in-memory computing is the accumulative behavior exhibited by certain memristive devices. In these devices, it is possible to progressively reduce the device resistance by the successive application of SET pulses with the same amplitude. And in certain cases, it is possible to progressively increase the resistance by the successive application of RESET pulses. Experimental measurement of this accumulative behavior in a
PCM device is shown in Figure 3c. This accumulative behavior is central to applications such as the training of deep neural networks, described later. The intrinsic stochasticity associated with the switching behavior of memristive devices can also be exploited for in-memory computing [32]. Applications include stochastic computing [33] and physically unclonable functions [34].
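As a rough illustration of this accumulative property, the conductance response to a train of identical SET pulses can be modeled with a simple saturating update. All parameters below (`g_max`, `alpha`, the initial conductance) are hypothetical, not measured device values:

```python
# Illustrative model (not device data): accumulative SET behavior of a
# memristive device. Each identical SET pulse nudges the conductance
# towards a saturation value g_max; parameters are hypothetical.

def apply_set_pulses(g, n_pulses, g_max=100e-6, alpha=0.2):
    """Return the conductance trace after n_pulses identical SET pulses.

    alpha controls how strongly each pulse moves the device towards g_max.
    """
    trace = [g]
    for _ in range(n_pulses):
        g = g + alpha * (g_max - g)   # each pulse closes part of the gap
        trace.append(g)
    return trace

trace = apply_set_pulses(g=1e-6, n_pulses=15)
assert all(b >= a for a, b in zip(trace, trace[1:]))  # monotonic increase
assert trace[-1] < 100e-6                             # approaches, never exceeds, g_max
```

The saturating form mirrors the plateaus seen in the measured pulse-response curves: early pulses produce large conductance changes, later ones progressively smaller ones.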
Figure 4. a) Compressed sensing involves one matrix-vector multiplication. Data recovery is performed via an iterative scheme, using several matrix-vector multiplications on the very same measurement matrix and its transpose. b) An experimental illustration of compressed sensing recovery in the context of image compression is presented, showing 50% compression of a 128x128 pixel image. The normalized mean square error (NMSE) associated with the reconstructed signal is plotted against the number of iterations. Adapted and reproduced with permission [35], Copyright 2018, IEEE.
<details>
<summary>Image 4 Details</summary>

### Visual Description
## Diagram & Chart: Approximate Message Passing (AMP) Algorithm & Reconstruction Performance
### Overview
The image presents a visual explanation of the Approximate Message Passing (AMP) algorithm for iterative reconstruction, alongside a performance comparison of different numerical precisions. The top portion (a) illustrates the AMP algorithm's flow with matrix operations and vector updates. The bottom portion (b) shows a chart comparing the Normalized Mean Squared Error (NMSE) for different precision levels (PCM chip, 4x4-bit Fixed-point, and Floating-point) as a function of iterations, and visual examples of original and reconstructed images.
### Components/Axes
**Part a (AMP Algorithm Diagram):**
* **Header:** "Measurement" and "Iterative reconstruction (AMP Algorithm)"
* **Variables:** `y`, `x1` to `xN`, `q1` to `qM`, `u1` to `uN`, `z1` to `zM`
* **Matrices:** `M`, `M̂`, `Mᵀ`
* **Equation:** `y = Mx`
* **Equation:** `q(k) = M̂ x̂(k)`
* **Equation:** `u(k) = Mᵀ z(k)`
**Part b (Performance Chart):**
* **X-axis:** "Iterations k" (Scale: 0 to 30)
* **Y-axis:** "NMSE" (Scale: 10^-2 to 10^0, logarithmic)
* **Legend:**
* Red Solid Line: "PCM chip"
* Red Dashed Line: "4x4-bit Fixed-point"
* Green Dashed Line: "Floating-point"
* **Images:** "Original Image" and "Reconstructed Image" (side-by-side comparison)
### Detailed Analysis or Content Details
**Part a (AMP Algorithm Diagram):**
The diagram shows a series of matrix-vector operations.
* The "Measurement" section shows a matrix `M` multiplying a vector `x` (with components `x1` to `xN`) to produce a vector `y` (with components `y1` to `yM`). The elements of `M` are represented by small green arrows.
* The "Iterative reconstruction" section shows three stages:
1. `q(k) = M̂ x̂(k)`: The stored matrix `M̂` (with elements represented by green arrows) multiplies the current estimate `x̂(k)` to calculate `q(k)` (with components `q1` to `qM`).
2. `u(k) = Mᵀ z(k)`: The transpose of `M` (`Mᵀ`) is multiplied by the residual `z(k)` (with components `z1` to `zM`) to produce `u(k)` (with components `u1` to `uN`).
3. The diagram shows iterative updates with `x̂1(k)` to `x̂N(k)` and the corresponding `q` and `z` vectors at each iteration `k`.
**Part b (Performance Chart):**
* **Floating-point:** The green dashed line representing Floating-point precision shows a steep downward trend initially, rapidly decreasing from approximately 10^0 to approximately 10^-3 NMSE within the first 10 iterations. The line then plateaus, with minimal further reduction in NMSE. At iteration 30, the NMSE is approximately 2x10^-3.
* **4x4-bit Fixed-point:** The red dashed line representing 4x4-bit Fixed-point precision shows a similar initial downward trend, but less steep than Floating-point. It starts at approximately 10^0 and decreases to approximately 5x10^-2 at iteration 30.
* **PCM chip:** The red solid line representing PCM chip precision shows a more gradual decrease in NMSE. It starts at approximately 10^0 and decreases to approximately 8x10^-2 at iteration 30. The line exhibits some oscillations.
* **Images:** The "Original Image" shows a building with visible details. The "Reconstructed Image" appears slightly blurred compared to the original, but retains the overall structure of the building.
### Key Observations
* Floating-point precision achieves the lowest NMSE and fastest convergence.
* 4x4-bit Fixed-point precision performs better than PCM chip precision, but significantly worse than Floating-point.
* The reconstructed image, while not perfect, demonstrates that the AMP algorithm can effectively reconstruct the original image.
* The NMSE curves suggest diminishing returns with increasing iterations beyond a certain point, particularly for Floating-point precision.
### Interpretation
The data demonstrates the impact of numerical precision on the performance of the AMP algorithm for image reconstruction. Floating-point precision provides the highest accuracy and fastest convergence, likely due to its ability to represent a wider range of values with greater precision. Fixed-point and PCM chip precisions introduce quantization errors that limit the algorithm's performance. The visual comparison of the original and reconstructed images confirms the quantitative results, showing that lower precision leads to a slightly degraded reconstruction quality. The logarithmic scale of the NMSE highlights the significant difference in performance between the different precision levels. The AMP algorithm appears to be effective in reconstructing images, but its performance is sensitive to the numerical precision used in the calculations. The plateauing of the NMSE curves suggests that there is a limit to the achievable reconstruction accuracy, even with high-precision arithmetic. This could be due to factors such as noise in the measurements or imperfections in the model used for reconstruction.
</details>
A very useful in-memory computing primitive enabled by the binary and analog nonvolatile storage capability is matrix-vector multiplication (MVM) [36, 37]. The physical laws that are exploited to perform this operation are Ohm's law and Kirchhoff's current summation law. For example, to perform the operation Ax = b, the elements of A are mapped linearly to the conductance values of memristive devices organized in a crossbar configuration. The x values are mapped linearly to the amplitudes of read voltages and are applied to the crossbar along the rows. The result of the computation, b, will be proportional to the resulting currents measured along the columns of the array. Compressed sensing and recovery is one application that could benefit from an in-memory computing unit that performs matrix-vector multiplications. The objective behind compressed sensing is to acquire a large signal at a sub-Nyquist sampling rate and to subsequently reconstruct that signal accurately. Unlike most other compression schemes, sampling and compression are done simultaneously, with the signal getting compressed as it is sampled. Such techniques have widespread applications in the domains of medical imaging, security systems, and camera sensors. The compressed measurements can be thought of as a mapping of a signal x of length N to a measurement vector y of length M < N. If this process is linear, then it can be modeled by an M × N measurement matrix M. The idea is to store this measurement matrix in the in-memory computing unit, with memristive devices organized in a crossbar configuration (see Figure 4(a)). In this manner the compression operation can be performed in O(1) time complexity.
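The crossbar MVM mapping can be sketched in a few lines: matrix entries become conductances, inputs become read voltages, Ohm's law gives per-device currents, and Kirchhoff's law sums them on each output line. The scale factors `k_g` and `k_v` below are illustrative assumptions:

```python
# Minimal sketch of the analog matrix-vector multiplication Ax = b on a
# memristive crossbar. Matrix entries map to conductances G[i][j], the
# input vector to read voltages V[j]; Ohm's law gives per-device currents
# and Kirchhoff's current law sums them along each output line.
# All numbers are illustrative, not measured device values.

def crossbar_mvm(G, V):
    """I_i = sum_j G[i][j] * V[j] -- the current summed on output line i."""
    return [sum(g * v for g, v in zip(row, V)) for row in G]

# Map A (unitless) to conductances with a scale factor, x to voltages:
A = [[1.0, 2.0], [3.0, 4.0]]
x = [0.5, 1.0]
k_g, k_v = 1e-6, 0.1               # conductance (S) and voltage (V) scales
G = [[a * k_g for a in row] for row in A]
V = [xi * k_v for xi in x]
I = crossbar_mvm(G, V)             # physical currents on the output lines
b = [i / (k_g * k_v) for i in I]   # rescale currents back to b = Ax
assert all(abs(bi - ref) < 1e-9 for bi, ref in zip(b, [2.5, 5.5]))
```

The whole product is obtained in a single read operation regardless of matrix size, which is the source of the O(1) time complexity claimed above.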
To recover the original signal from the compressed measurements, an approximate message passing (AMP) algorithm can be used: an iterative algorithm that involves several matrix-vector multiplications on the very same measurement matrix and its transpose. In this way the same matrix that was coded in the in-memory computing unit can also be used for the reconstruction, reducing the reconstruction complexity from O(MN) to O(N). An experimental illustration of compressed sensing recovery in the context of image compression is shown in Figure 4(b). A 128x128-pixel image was compressed by 50% and recovered using the measurement matrix elements encoded in a PCM array. The normalized mean square error associated with the recovered signal is plotted as a function of the number of iterations. A remarkable property of AMP is that its convergence rate is independent of the precision of the matrix-vector multiplications. The lack of precision only results in a higher error floor, which may be considered acceptable for many applications. Note that, in this application, the measurement matrix remains fixed and hence the property of PCM that is exploited is the multi-level storage capability.
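To show how the recovery loop reuses the stored matrix and its transpose, here is a minimal sketch using ISTA-style soft thresholding, a simpler relative of AMP (the Onsager correction term that distinguishes AMP is omitted for brevity, and all values are toy data):

```python
# Simplified sketch of iterative sparse recovery reusing the stored matrix M
# and its transpose (ISTA-style soft thresholding; the Onsager term of true
# AMP is omitted). Step size and threshold are illustrative.

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def soft_threshold(v, t):
    return [max(abs(x) - t, 0.0) * (1 if x >= 0 else -1) for x in v]

def recover(M, y, n_iter=500, step=0.1, thresh=0.01):
    Mt = transpose(M)
    x_hat = [0.0] * len(M[0])
    for _ in range(n_iter):
        z = [yi - qi for yi, qi in zip(y, matvec(M, x_hat))]  # residual
        u = matvec(Mt, z)                                     # back-projection
        x_hat = soft_threshold([x + step * ui for x, ui in zip(x_hat, u)],
                               step * thresh)
    return x_hat

# Toy example: recover a sparse x from y = Mx.
M = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]
x_true = [0.0, 0.0, 2.0]
y = matvec(M, x_true)
x_hat = recover(M, y)
```

Both `matvec` calls inside the loop would be executed on the same crossbar that performed the compression, which is exactly the reuse the text describes.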
## 3. Deep learning accelerators
<details>
<summary>Image 5 Details</summary>

### Visual Description
## Diagram: Neuromorphic Computing Architecture
### Overview
The image depicts a conceptual diagram of a neuromorphic computing architecture, illustrating how an image of a dog is processed through layers of artificial neurons and peripheral circuits. The diagram highlights the flow of information from a digital interface, through a control unit, into a network of peripheral circuits, and finally to a classification output ("dog").
### Components/Axes
The diagram consists of the following key components:
* **Digital Interface:** A dark purple rectangle on the left side, representing the input source.
* **Control Unit:** A dark blue rectangle connected to the digital interface.
* **Peripheral Circuits:** Multiple green rectangles, each containing a grid of smaller squares representing individual processing elements. These are arranged in a repeating pattern.
* **Communication Network:** A dotted red line connecting the peripheral circuits.
* **Artificial Neural Network:** Layers of white circles representing neurons, with connections between them. A red arrow indicates the flow of information.
* **Image Input:** A photograph of a dog in the top-left corner.
* **Output Label:** The word "dog" is written to the right of the final layer of the neural network.
There are no explicit axes or scales in this diagram. It is a schematic representation of a system rather than a data visualization.
### Detailed Analysis / Content Details
The diagram shows a multi-layered system.
1. **Input:** An image of a dog is presented as the initial input.
2. **Neural Network:** The image is processed through multiple layers of interconnected neurons (white circles). The connections between neurons are represented by lines. The red arrow indicates the direction of information flow.
3. **Peripheral Circuits:** The output of the neural network is then fed into a series of peripheral circuits (green rectangles). Each circuit appears to contain a grid of processing elements (small squares). The arrangement of these elements within the circuits is consistent across all shown circuits.
4. **Communication Network:** The peripheral circuits are interconnected via a communication network (dotted red line).
5. **Control Unit & Digital Interface:** The entire system is controlled by a control unit (dark blue rectangle) which receives input from a digital interface (dark purple rectangle).
The diagram shows a repeating pattern of peripheral circuits, indicated by the ellipsis ("...") suggesting that the system can be scaled. The number of neurons in each layer of the neural network appears to decrease as the information flows through the layers. The exact number of neurons in each layer is difficult to determine due to the schematic nature of the diagram.
### Key Observations
* The diagram emphasizes the parallel processing nature of the architecture, with multiple peripheral circuits operating concurrently.
* The use of a neural network suggests that the system is designed for pattern recognition and classification tasks.
* The diagram does not provide specific details about the algorithms or hardware used in the system. It is a high-level conceptual overview.
* The image of the dog is used as a specific example to illustrate the system's ability to recognize objects.
### Interpretation
This diagram illustrates a neuromorphic computing architecture designed to mimic the structure and function of the biological brain. The key idea is to move away from traditional von Neumann architectures, which separate processing and memory, and towards architectures where processing and memory are co-located, as in the brain.
The neural network layers perform feature extraction and pattern recognition, while the peripheral circuits likely implement the actual computation and memory storage. The communication network allows the circuits to exchange information and coordinate their activities. The control unit manages the overall operation of the system.
The use of an image of a dog as an example suggests that the system is intended for image recognition tasks. However, the architecture is general enough to be applied to other types of data and tasks.
The diagram highlights the potential advantages of neuromorphic computing, such as low power consumption, high parallelism, and robustness to noise. However, it also reveals the challenges of designing and building such systems, such as the complexity of the hardware and the difficulty of programming them. The diagram is a conceptual illustration and does not provide enough detail to assess the performance or feasibility of the architecture. It is a high-level overview of a potential approach to building more brain-like computers.
</details>
Figure 5. Deep learning based on in-memory computing. The various layers of a neural network are mapped to a computational memory unit where memristive devices are organized in a crossbar configuration. The synaptic weights are stored in the conductance state of the memristive devices. A global communication network is used to send data from one array to another. Adapted and reproduced with permission [17] , Copyright 2020, Nature Research.
Deep neural networks (DNNs), loosely inspired by biological neural networks, consist of parallel processing units called neurons interconnected by plastic synapses. By tuning the weights of these interconnections using millions of labelled examples, these networks are able to perform certain supervised learning tasks remarkably well. These networks are typically trained via a supervised learning algorithm based on gradient descent. During the training phase, the input data is forward propagated through the neuron layers with the synaptic networks performing multiply-accumulate operations. The final layer responses are compared with input data labels and the errors are back-propagated. Both steps involve sequences of matrix-vector multiplications. Subsequently, the synaptic weights are updated to reduce the error. This optimization approach can take multiple days or weeks to train state-of-the-art networks on conventional computers. Hence, there is a significant effort towards the design of custom ASICs based on reduced precision arithmetic and highly optimized dataflow [13, 38] . However, the need to shuttle millions of synaptic weight values between the memory and processing unit remains a key performance bottleneck and hence in-memory computing is being explored as an alternative approach for both inference and training of DNNs [39, 40] . The essential idea is to map the various layers of a neural network to an in-memory computing unit where memristive devices are organized in a crossbar configuration (see Figure 5). The synaptic weights are stored in the conductance state of the memristive devices and the propagation of data through each layer is performed in a single step by inputting the data to the crossbar rows and deciphering the results at the columns.
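A minimal sketch of this mapping, assuming the common scheme (used in the experiments discussed below) in which each signed weight is represented by a differential pair of non-negative conductances; `g_max` is a hypothetical device limit:

```python
# Hedged sketch: mapping a signed weight matrix onto a differential pair of
# conductance arrays (G_plus, G_minus), and running one layer of forward
# propagation as a crossbar operation. g_max is a hypothetical maximum
# device conductance; the weights are toy values.

def to_differential(W, g_max=25e-6):
    """Split signed weights into two non-negative conductance arrays."""
    w_abs_max = max(abs(w) for row in W for w in row)
    scale = g_max / w_abs_max
    G_plus  = [[max(w, 0.0) * scale for w in row] for row in W]
    G_minus = [[max(-w, 0.0) * scale for w in row] for row in W]
    return G_plus, G_minus, scale

def forward_layer(G_plus, G_minus, scale, x):
    """y_i = (I_plus_i - I_minus_i) / scale, followed by a ReLU."""
    out = []
    for gp_row, gm_row in zip(G_plus, G_minus):
        i_net = sum((gp - gm) * xi for gp, gm, xi in zip(gp_row, gm_row, x))
        out.append(max(i_net / scale, 0.0))  # nonlinearity in the periphery
    return out

W = [[0.5, -1.0], [2.0, 0.25]]
Gp, Gm, s = to_differential(W)
y = forward_layer(Gp, Gm, s, [1.0, 1.0])
assert abs(y[0] - 0.0) < 1e-9 and abs(y[1] - 2.25) < 1e-9
```

Propagation through each layer then amounts to one crossbar read per layer, with only the activations travelling over the communication network between arrays.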
<details>
<summary>Image 6 Details</summary>

### Visual Description
## Chart/Diagram Type: Performance Comparison of Neural Network Training Methods
### Overview
The image presents a comparison of the test accuracy achieved by different training methods for a neural network on the CIFAR-10 dataset, plotted against training time. The top portion of the image shows a diagram of the neural network architecture, while the bottom portion displays a line chart illustrating the performance of three different training approaches: a floating-point (FP32) baseline, custom training, and direct mapping of FP32 weights.
### Components/Axes
* **X-axis:** Time (s), logarithmic scale from 10<sup>-5</sup> to 10<sup>5</sup>.
* **Y-axis:** Test Accuracy (%), ranging from 60% to 100%.
* **Data Series:**
* Floating point (FP32) baseline (dashed red line)
* Experiments: Custom training (blue line with square markers)
* Experiments: Direct mapping of FP32 weights (red line with diamond markers)
* **Neural Network Diagram:** Shows a series of convolutional layers and ResNet blocks.
* Input: CIFAR10 image
* First Layer: Conv (3x3x16)
* ResNet Block 1: 3x3x16 convolutions
* 6 Conv Layers repeated 3 times with increasing filter sizes (1x1x28, 3x3x28, 3x3x56)
* Final Layer: Softmax (56x10)
* Output: Label
* **Legend:** Located in the bottom-left corner, clearly identifying each data series by color and marker type.
### Detailed Analysis or Content Details
The chart displays the test accuracy as a function of training time.
* **Floating Point (FP32) Baseline:** Starts at approximately 91% accuracy at 10<sup>-5</sup> seconds and decreases steadily to around 81% accuracy at 10<sup>5</sup> seconds. The line is relatively smooth.
* **Custom Training:** Begins at approximately 89% accuracy at 10<sup>-5</sup> seconds and remains relatively stable, fluctuating between approximately 88% and 91% accuracy throughout the entire training period (up to 10<sup>5</sup> seconds).
* **Direct Mapping of FP32 Weights:** Starts at approximately 74% accuracy at 10<sup>-5</sup> seconds and increases to around 83% accuracy at 10<sup>2</sup> seconds, then fluctuates significantly between approximately 75% and 85% accuracy for the remainder of the training period.
Specific data points (approximate):
| Time (s) | FP32 Baseline (%) | Custom Training (%) | Direct Mapping (%) |
|---|---|---|---|
| 10<sup>-5</sup> | 91 | 89 | 74 |
| 10<sup>-3</sup> | 90 | 89 | 78 |
| 10<sup>-1</sup> | 88 | 90 | 81 |
| 10<sup>1</sup> | 86 | 89 | 82 |
| 10<sup>2</sup> | 84 | 88 | 83 |
| 10<sup>3</sup> | 82 | 90 | 78 |
| 10<sup>5</sup> | 81 | 91 | 82 |
### Key Observations
* The FP32 baseline exhibits a consistent decrease in accuracy over time, suggesting potential overfitting or diminishing returns.
* Custom training maintains a relatively high and stable accuracy throughout the training process.
* Direct mapping of FP32 weights shows an initial increase in accuracy, followed by significant fluctuations, indicating instability or sensitivity to training parameters.
* Custom training consistently outperforms direct mapping of FP32 weights.
* At the beginning of training, direct mapping starts with significantly lower accuracy than both the baseline and custom training.
### Interpretation
The data suggests that custom training is the most effective method for achieving and maintaining high accuracy on the CIFAR-10 dataset, given the network architecture shown. The FP32 baseline, while starting with high accuracy, degrades over time, potentially due to overfitting. Direct mapping of FP32 weights shows promise with an initial increase in accuracy, but its instability and lower overall performance compared to custom training indicate that it requires further optimization or is not well-suited for this specific network and dataset.
The neural network diagram illustrates a deep convolutional neural network with ResNet blocks, which are known for their ability to mitigate the vanishing gradient problem and enable the training of deeper networks. The architecture appears to be designed for image classification tasks, as evidenced by the final Softmax layer and the CIFAR-10 image input. The diagram provides context for understanding the performance results, as the network's complexity and design choices influence its training dynamics and accuracy. The logarithmic scale on the x-axis highlights the importance of considering training time, especially for computationally intensive tasks like deep learning. The fluctuations in the direct mapping method could be due to the challenges of transferring weights from a higher-precision representation (FP32) to a lower-precision one, potentially leading to quantization errors or instability during training.
</details>
Time (s)
Figure 6. Deep learning inference. Experimental results on ResNet-32 using the CIFAR-10 dataset. The classification accuracies obtained via the direct mapping and custom training approaches are compared to the floating-point baseline. Adapted and reproduced with permission [40] , Copyright 2019, IEEE.
Deep learning inference refers to just the forward propagation in a DNN once the weights have been learned. Both the binary and the analogue storage capability of memristive devices can be exploited for the MVM operations associated with inference. The key challenges are the inaccuracies associated with programming the devices to a specified synaptic weight, as well as the drift and noise associated with the conductance values [41]. For these reasons, the synaptic weights that are obtained by training in high-precision arithmetic (e.g. 32-bit floating point) cannot be mapped directly to computational memory. However, it can be shown that by customizing the training procedure to make it aware of these device-level nonidealities, it is possible to obtain synaptic weights that are suitable for being mapped to an in-memory computing unit [42, 40]. A more recent approach is to use a committee machine of multiple smaller neural networks, which shows promise for increasing inference accuracy without increasing the number of devices [43]. Figure 6 shows mixed hardware/software experimental results using a prototype multi-level PCM chip. The synaptic weights are mapped to PCM devices organized in a 2-PCM differential configuration (723,444 PCM devices in total). It can be seen that the custom training scheme approaches the floating-point baseline, whereas the direct mapping approach fails to deliver sufficient accuracy. The slight temporal decline in accuracy is attributed to the conductance drift exhibited by PCM devices [44]. However, in spite of the drift, a classification accuracy of close to 90% is maintained over a significant duration of time.
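The idea behind such hardware-aware training can be sketched on a toy task: multiplicative noise mimicking programming inaccuracy is injected into the weights during each forward pass, so the optimizer settles on solutions that remain accurate after mapping. The noise model and task below are illustrative, not taken from the cited work:

```python
# Hedged sketch of hardware-aware ("custom") training: during each forward
# pass, multiplicative Gaussian noise standing in for programming
# inaccuracy is injected into the weight, so training finds solutions
# robust to the conductance errors seen on real devices.
# The noise level and the toy task (y = 2x) are illustrative.
import random

random.seed(0)

def train(n_steps=2000, lr=0.05, noise_std=0.1):
    w = 0.0
    data = [(x / 10.0, 2.0 * (x / 10.0)) for x in range(1, 11)]
    for _ in range(n_steps):
        x, y = random.choice(data)
        w_noisy = w * (1.0 + random.gauss(0.0, noise_std))  # device noise
        err = w_noisy * x - y
        w -= lr * err * x        # gradient step uses the noisy forward pass
    return w

w = train()
assert abs(w - 2.0) < 0.3        # converges near the ideal weight
```

Because the loss is evaluated through the noisy weights, the learned solution tolerates the same class of perturbation when the weights are finally programmed into devices.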
Figure 7. Deep learning training. a) Schematic illustration of the mixed-precision architecture for training DNNs. b) The synaptic weight distributions and classification accuracies are compared between the experiments and floating point baseline [45] .
<details>
<summary>Image 7 Details</summary>

### Visual Description
## Diagram & Chart: Neural Network Training Accuracy
### Overview
The image contains two main parts: (a) a diagram illustrating a high-precision unit and computational memory architecture for neural network training, and (b) a chart showing the accuracy of different models (FP64, PCM model, and Experiment) on training and test sets over training epochs.
### Components/Axes
**(a) Diagram:**
* **High-precision unit (red dashed box):** Contains blocks labeled "Forward propagation", "Backward propagation", "Weight update", "Compute ΔW", "Accumulate ΔW", and a computational block with "+", "x", and "floor(x/ε)".
* **Computational memory (blue dashed box):** Contains an array of memory cells with resistors and DAC/ADC connections. Also contains a "DAC/ADC Programming circuit".
* Arrows indicate data flow between the high-precision unit and the computational memory.
**(b) Chart:**
* **X-axis:** "Training epoch" ranging from 0 to 30.
* **Y-axis:** "Accuracy (%)" ranging from 95 to 100.
* **Data Series:**
* "Training set" (black diamonds)
* "Test set" (grey diamonds)
* "FP64" (light blue solid line)
* "PCM model" (green solid line)
* "Experiment" (teal solid line with blue diamonds)
* **Legend:** Located in the top-right corner, associating colors with the data series.
### Detailed Analysis or Content Details
**(a) Diagram:**
The diagram depicts a system for neural network training. The "High-precision unit" performs the standard forward and backward propagation steps, calculates weight updates (ΔW), and accumulates these updates. The "Computational memory" appears to store weights and perform computations using DAC/ADC conversion and a programming circuit. The "floor(x/ε)" block suggests a quantization operation. The arrows indicate a bidirectional flow of data between the two units.
**(b) Chart:**
* **Training Set:** Starts at approximately 96.5% accuracy at epoch 0, increases rapidly to approximately 99.2% accuracy by epoch 10, and plateaus around 99.3% for the remainder of the training period (epochs 10-30).
* **Test Set:** Starts at approximately 97.2% accuracy at epoch 0, increases to approximately 98.2% accuracy by epoch 10, and plateaus around 97.8% for the remainder of the training period.
* **FP64:** Starts at approximately 96.8% accuracy at epoch 0, increases to approximately 99.1% accuracy by epoch 10, and plateaus around 99.2% for the remainder of the training period.
* **PCM Model:** Starts at approximately 96.5% accuracy at epoch 0, increases to approximately 98.5% accuracy by epoch 10, and plateaus around 98.7% for the remainder of the training period.
* **Experiment:** Starts at approximately 96.7% accuracy at epoch 0, increases to approximately 98.8% accuracy by epoch 10, and plateaus around 98.9% for the remainder of the training period.
### Key Observations
* All models exhibit a rapid increase in accuracy during the initial training epochs (0-10).
* The training set consistently achieves higher accuracy than the test set, indicating some degree of overfitting.
* The "Experiment" model achieves the highest accuracy on both the training and test sets, followed by the "PCM model" and then "FP64".
* The accuracy plateaus for all models after approximately 10 epochs, suggesting that further training would yield minimal improvements.
### Interpretation
The diagram illustrates a novel architecture for neural network training that leverages computational memory to potentially improve efficiency or reduce precision requirements. The chart demonstrates the performance of this architecture ("Experiment") compared to traditional floating-point (FP64) and a "PCM model" (likely a phase-change memory based model).
The results suggest that the "Experiment" model achieves comparable or slightly better accuracy than FP64, while the PCM model performs well but slightly underperforms the "Experiment". The gap between training and test accuracy indicates that the models are overfitting to the training data, which could be addressed through regularization techniques. The plateau in accuracy after 10 epochs suggests that the models have converged and further training is unlikely to yield significant improvements. The use of a "floor(x/ε)" block in the diagram suggests that the system may be employing quantization to reduce memory usage or computational complexity. The DAC/ADC components indicate a conversion between digital and analog signals, potentially for efficient weight storage and computation within the computational memory.
</details>
In-memory computing can also be used in the context of supervised training of DNNs with backpropagation. When performing training of a DNN encoded in crossbar arrays, forward propagation is performed in the same way as inference described above. Next, backward propagation is performed by inputting the error gradient from the subsequent layer onto the columns of the current layer and deciphering the result from the rows. Subsequently the error gradient is computed. Finally, the weight update is performed based on the outer product of activations and error gradients of each layer. This weight update relies on the accumulative behaviour of memristive devices. Recent deep learning research shows that when training DNNs, it is possible to perform the forward and backward propagations rather imprecisely while the gradients need to be accumulated in high precision [ 46 ] . This observation makes the DL training problem amenable to the mixed-precision in-memory computing approach that was recently proposed [ 47 ] . The in-memory compute unit is used to store the synaptic weights and to perform the forward and backward passes, while the weight changes are accumulated in high precision (Figure 7(a)) [ 48 , 49 ] . When the accumulated weight exceeds a certain threshold, pulses are applied to the corresponding memory devices to alter the synaptic weights. This approach was tested using the handwritten digit classification problem based on the MNIST data set. A two-layered neural network was employed with 2-PCM devices in differential configuration (approx. 400,000 devices) representing the synaptic weights. Resulting test accuracy after 20 epochs of training was approx. 98% (Figure 7(b)). After training, inference on this network was performed for over a year with marginal reduction in the test accuracy. The crossbar topology also facilitates the estimation of the gradient and the in-place update of the resulting synaptic weight all in O(1) time complexity [ 50 , 39] . 
By obviating the need to perform gradient accumulation externally, this approach could yield better performance than the mixed-precision approach. However, significant improvements to memristive technology, in particular to the accumulative behaviour, are needed before it can be applied to a wide range of DNNs [51, 52].
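The accumulate-and-pulse logic of the mixed-precision scheme described above can be illustrated with a small numerical sketch; the pulse granularity, threshold, learning rate, and all variable names below are illustrative assumptions rather than values from the cited works:

```python
import numpy as np

rng = np.random.default_rng(0)

# Device weights live on a coarse conductance grid; gradient contributions
# are accumulated in a high-precision variable chi (illustrative values).
GRANULARITY = 0.02        # smallest weight change one programming pulse gives
THRESHOLD = GRANULARITY   # accumulated update needed to trigger a pulse

w_device = rng.normal(0.0, 0.1, size=(4, 3))  # weights encoded in devices
chi = np.zeros_like(w_device)                 # high-precision accumulator

def train_step(x, target, lr=0.1):
    """One step: in-memory forward/backward, high-precision accumulation."""
    global w_device, chi
    y = w_device @ x                  # forward pass (analog MVM)
    err = y - target                  # output error gradient
    chi -= lr * np.outer(err, x)      # accumulate outer-product gradient
    pulses = np.trunc(chi / THRESHOLD)
    w_device += pulses * GRANULARITY  # coarse, pulse-based device update
    chi -= pulses * THRESHOLD         # keep the residual for later steps

x = np.array([1.0, 0.5, -0.5])
target = np.array([0.2, -0.1, 0.0, 0.4])
for _ in range(200):
    train_step(x, target)
print(np.max(np.abs(w_device @ x - target)))  # residual of order GRANULARITY
```

Only the thresholded, quantized part of the accumulated gradient ever reaches the devices, mimicking the limited programming granularity of real memristors, while the sub-threshold residual is retained digitally for later steps.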
Compared to the charge-based memory devices that are also used for in-memory computing [53, 54, 55], a key advantage of memristive devices is their potential to be scaled to dimensions of a few nanometres [56, 57, 58, 59, 60]. Most memristive devices are also suitable for back-end-of-line integration, enabling their combination with a wide range of front-end CMOS technologies. Another key advantage is their non-volatility, which obviates the need for computing systems to be constantly connected to a power supply. However, there are also challenges to overcome. The significant inter-device and intra-device variability associated with the SET and RESET states is a key challenge for applications where memristive devices are used for logical operations. For applications that rely on analogue storage capability, a significant challenge is programming variability, which captures the inaccuracies associated with programming an array of devices to desired conductance values. In ReRAM, this variability is attributed mostly to the stochastic nature of filamentary switching, and one prominent approach to counter it is to establish preferential paths for conductive filament (CF) formation [61, 62]. Representing single computational elements by multiple memory devices is another promising approach [63]. Yet another challenge is the temporal and temperature-induced variation of the programmed conductance values; the resistance 'drift' in PCM devices, which is attributed to the intrinsic structural relaxation of the amorphous phase, is an example. The concept of projected phase-change memory is a promising approach towards tackling 'drift' [64, 65]. The requirements that memristive devices need to fulfil when employed for computational memory are heavily application dependent. For memristive logic, high cycling endurance (> 10¹² cycles) and low device-to-device variability of the SET/RESET resistance values are critical.
For computational tasks involving read-only operations, such as matrix-vector multiplication, the conductance states must remain relatively unchanged during execution. It is also desirable to have a gradual, analogue-type switching characteristic, so that a continuum of resistance values can be programmed in a single device. A linear and symmetric accumulative behaviour is also required in applications where the device conductance needs to be updated incrementally, such as in deep learning training [66]. For stochastic computing applications, random device variability is not problematic, but graceful device degradation is highly desirable, as described in [67].
## 4. Spiking Neural Networks and Memristors
As opposed to the deep learning networks discussed above, spiking neural networks (SNNs) can more naturally incorporate the notion of time in signal encoding and processing. SNNs are typically modelled on the integrate-and-fire behaviour of neurons in the brain. In this framework, neurons communicate with each other using binary signals or spikes. The arrival of a spike at a synapse triggers a current flow into the downstream neuron, with the magnitude of the current weighted by the effective conductance of the synapse. The incoming currents are integrated by the neuron to determine its membrane potential and a spike is issued when the potential exceeds a threshold. This spiking behaviour can be triggered in a deterministic or probabilistic manner. Once a spike is issued, the membrane potential is reset to a resting potential or decreased according to some predetermined rule. The integration is limited to a specific time window, or else a leak factor is incorporated in the integration, endowing the neuron model with a finite memory of past spiking events.
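The integrate-and-fire dynamics described above can be captured in a few lines of illustrative Python; the leak factor, threshold, and reset value are arbitrary choices for this sketch, not parameters from the text:

```python
def lif(input_current, leak=0.9, threshold=1.0, v_reset=0.0):
    """Minimal discrete-time leaky integrate-and-fire neuron."""
    v = v_reset
    spikes = []
    for i_t in input_current:
        v = leak * v + i_t      # leaky integration of the incoming current
        if v >= threshold:      # membrane potential crosses the threshold
            spikes.append(1)    # emit a binary spike ...
            v = v_reset         # ... and reset to the resting potential
        else:
            spikes.append(0)
    return spikes

# A constant sub-threshold current still fires periodically once integrated.
print(lif([0.4] * 10))  # → [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
```

The leak factor gives the model its finite memory of past inputs: with `leak < 1`, old contributions to the membrane potential decay away geometrically.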
Compared to the second-generation deep neural networks (DNNs) discussed in the previous section, SNNs can potentially offer significant improvements in efficiency. The first reason comes from the underlying signal encoding mechanism. Computing the output of a neuron involves a weighted sum of the synaptic weights with the real-valued neuronal outputs of the previous layer. For a fully connected second-generation DNN with 𝑁 neurons in each layer, this requires 𝑁² multiplications of real-valued numbers, typically stored in low-precision representations. In contrast, the forward propagation operation in an SNN requires only addition operations, as the input neuronal signals are binary spike signals. To elaborate, assume that the input signal is encoded as a spike train of duration 𝑇, with a minimum inter-spike interval of ∆𝑡. If the probability of a spike at any instant of time is 𝑝, then on average 𝑁𝑝𝑇/∆𝑡 spikes have to be propagated through the synapses, and this requires 𝑁²𝑝𝑇/∆𝑡 addition operations. In most modern processors, the cost of a multiplication, 𝐶ₘ, is 3-4 times higher than that of an addition, 𝐶ₐ. Hence, provided the neuronal and synaptic variables required for computation are available in the processor, SNNs offer a path to more efficient computation if the inequality
$$C _ { a } p \left ( \frac { T } { \Delta t } \right ) < C _ { m }$$
holds. Hence, it is important to develop algorithms for SNNs that minimize 𝑝 and 𝑇/∆𝑡 to improve computational efficiency. This requires sparse binary signal encoding schemes that go beyond the rate coding typically used in SNNs today. The following section will discuss strategies for developing general-purpose learning rules for SNNs that satisfy such constraints.
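As a quick numerical illustration of this inequality (the cost ratio of 4 between multiplication and addition, and the spike statistics below, are assumed values):

```python
C_M = 4.0   # assumed relative cost of one multiplication
C_A = 1.0   # assumed relative cost of one addition

def snn_cheaper(p, T, dt):
    """True when C_a * p * (T / dt) < C_m, i.e. the SNN forward pass wins."""
    return C_A * p * (T / dt) < C_M

print(snn_cheaper(p=0.02, T=100.0, dt=1.0))  # sparse code: 2.0 < 4.0 -> True
print(snn_cheaper(p=0.20, T=100.0, dt=1.0))  # dense rate code: 20.0 -> False
```

The comparison makes the point concrete: for a 100-step window, only spike probabilities below 𝐶ₘ/𝐶ₐ ∆𝑡/𝑇 = 0.04 per step keep the addition-only SNN cheaper, which is why sparse coding matters.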
The second potential for efficiency improvement of SNNs as compared to second-generation networks arises thanks to novel memory-processor architectures based on memristive devices. While SNNs can be implemented using Si CMOS SRAM or DRAM technologies, the advent of novel nanoscale memristive devices provides opportunities for significant improvements in overall computational efficiency.
Figure 8. A crossbar-array-based representation of an SNN. Each synaptic weight is represented by the differential conductance of two nanoscale devices in the crossbar.
Memristive devices can be integrated at the junctions of crossbar arrays to represent the weights of synapses, and CMOS circuits at the periphery can be designed to implement the neuronal integration and learning logic. As mentioned above, this architecture enables the spike propagation operation to be computed efficiently, based on Kirchhoff's current law, as:
$$I _ { k } = \sum _ { j } \left ( G _ { k j } ^ { + } - G _ { k j } ^ { - } \right ) V _ { j }$$
In this formula, 𝑉ⱼ denotes the voltage pulse that is triggered when the 𝑗th input neuron spikes and is applied to the line connected to that neuron, 𝐺ₖⱼ⁺ and 𝐺ₖⱼ⁻ are the conductances of the devices configured in a differential arrangement to represent the synaptic weight, and 𝐼ₖ is the total incoming current into the 𝑘th output neuron. The small form factor of the devices, coupled with the scalability of operating voltages and currents beyond what is possible with conventional CMOS, suggests that these architectures could offer several orders of magnitude efficiency improvement over silicon-based implementations [68, 69].
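A software sketch of this differential readout follows; the weight matrix, voltages, and programming-noise level are illustrative, and a real array would of course perform the weighted sum physically rather than numerically:

```python
import numpy as np

rng = np.random.default_rng(1)

W = np.array([[0.5, -0.3],
              [-0.2, 0.8]])            # target signed synaptic weights
G_pos = np.maximum(W, 0.0)             # positive part on the G+ devices
G_neg = np.maximum(-W, 0.0)            # negative part on the G- devices
# Programming variability modelled as small Gaussian perturbations;
# conductances are clipped to stay physically non-negative.
G_pos = np.clip(G_pos + rng.normal(0.0, 0.01, G_pos.shape), 0.0, None)
G_neg = np.clip(G_neg + rng.normal(0.0, 0.01, G_neg.shape), 0.0, None)

V = np.array([1.0, 0.5])               # spike-triggered input voltage pulses
I = (G_pos - G_neg) @ V                # I_k = sum_j (G+_kj - G-_kj) V_j
print(I, W @ V)                        # currents match W @ V up to noise
```

Splitting each signed weight across a 𝐺⁺/𝐺⁻ device pair is what lets a purely resistive (non-negative conductance) array represent both excitatory and inhibitory synapses.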
However, apart from the non-idealities of memristive devices already mentioned, crossbar arrays with more than 2048×2048 devices cannot be fabricated and operated reliably, due to the resistance drop along the wires and the sneak paths that corrupt the measurement and programming of synaptic states. One approach to mitigating these issues is to design neurosynaptic cores with smaller crossbars and associated neuron circuits, tile these cores on a 2D array, and provide communication fabrics between the cores [70]. Such tiled neurosynaptic core-based designs are particularly amenable to realizing SNNs, as only binary spikes corresponding to intermittently active spiking neurons need to be transported between cores, as opposed to the real-valued neuronal variables that are active for all the neurons in a core in the case of deep learning networks. This is the second inherent advantage of SNNs over DNNs in terms of computational efficiency.
Overcoming the reliability challenges mentioned above is essential for building dependable systems, and will require the co-optimization of algorithms and architectures designed to mitigate, or even leverage, these non-ideal behaviours for computation. Two kinds of systems can be envisaged, depending on the mode of application. Inference engines, which do not support on-chip learning, can be designed using memristive devices integrated on crossbars, where the devices are programmed to the desired conductance states based on weights obtained from software training. However, as memristive devices support incremental conductance changes through the application of suitable electrical programming pulses, it is also possible to design learning systems in which network weight updates are implemented on-chip in an event-driven manner [82]. There are also many recent examples where these devices have been engineered to mimic the integrate-and-fire characteristics of biological neurons [71, 72, 73], potentially enabling all-memristor implementations of spiking neural networks [74]. The field is still in its infancy and has so far witnessed only small proof-of-concept demonstrations. We now discuss some of the approaches that have been explored towards realizing memristor-based inference-only spiking networks, as well as learning networks with SNNs.
4.1. Memristive SNNs for inference. A common approach to developing SNNs is to start with a second-generation ANN trained using traditional backpropagation-based methods, and then convert the resulting network to a spiking network in software. These solutions are based on weight-normalization schemes that make the spike rates of the neurons in the SNN proportional to the activations of the neurons in the ANN [75, 76]. While this should in principle result in SNNs with accuracies comparable to those of their second-generation counterparts, some device-aware re-training will typically be necessary when the network is implemented in hardware, owing to the non-linearity and limited dynamic range of nanoscale devices.
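The principle behind such rate-based conversion can be sketched as follows: a normalized ReLU activation is approximated by the empirical firing rate of a Bernoulli spike train whose per-step spike probability is proportional to the activation. The window length and the normalization into [0, 1] are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def relu(x):
    return max(x, 0.0)

def rate_coded(x, steps=10000):
    """Approximate relu(x), assumed normalized into [0, 1], by a spike rate."""
    p = min(relu(x), 1.0)             # per-time-step spike probability
    spikes = rng.random(steps) < p    # Bernoulli spike train of given rate
    return spikes.mean()              # empirical firing rate ~ activation

for x in (-0.5, 0.3, 0.9):
    print(x, relu(x), rate_coded(x))  # the rate tracks the ANN activation
```

The sketch also shows the cost of this encoding: approximation error shrinks only as the observation window grows, which is one reason rate-coded SNN inference can be slow compared to sparser temporal codes.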
One of the differentiating features of inference engines is that the nanoscale devices storing state variables are programmed only rarely, compared to the number of reads (potentially at every inference cycle). Since higher-energy programming cycles have a stronger effect in degrading device lifetimes compared to the lower-energy read cycles, this mode of operation can have better overall system reliability compared to that of learning systems.
In a preliminary hardware demonstration leveraging this approach, R. Midya et al. used memristors based on SiOxNy:Ag to implement compact oscillatory neurons whose output voltage oscillation frequency is proportional to the input current [77] . In this proof-of-concept demonstration of a 3-layer network, ANN to SNN conversion was limited to the last layer alone, but the approach could be extended to hidden layers as well.
4.2. Memristive SNNs for unsupervised learning and adaptation. Most hardware demonstrations of SNNs using memristive devices have focused on the unsupervised learning paradigm, where the synaptic weights are modified in an unsupervised manner according to the biologically inspired spike timing dependent plasticity (STDP) rule [78] . The rule captures the experimental observation that when a synapse experiences multiple pre-before-post pairings, the effective synaptic strength increases, and conversely, multiple post-before-pre spike pairs result in an effective decrease of synaptic conductance.
It should be noted that, while other biological mechanisms that have been observed experimentally [79, 80] may also play a key role in learning and memory formation in the brain, STDP is a simple local learning rule that is especially straightforward to implement in hardware. While it is possible to implement timing-dependent plasticity rules using many-transistor CMOS circuits [81], it was demonstrated experimentally early on that memristive devices can exhibit STDP-like weight adaptation upon the application of suitable waveforms [82, 83, 84]. Going beyond individual device demonstrations, IBM has demonstrated an integrated neuromorphic core with 256×256 phase change memory synapses, fabricated alongside Si CMOS neuron circuits, capable of on-chip learning based on a simplified model of STDP for auto-associative pattern learning tasks [85].
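A pair-based form of the STDP rule just described can be written down directly; the amplitudes and the time constant below are illustrative choices, not values from the cited experiments:

```python
import math

A_PLUS = 0.05    # potentiation amplitude (illustrative)
A_MINUS = 0.05   # depression amplitude (illustrative)
TAU = 20.0       # plasticity time constant in ms (illustrative)

def stdp_dw(t_pre, t_post):
    """Weight change for a single pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:                                  # pre before post: potentiate
        return A_PLUS * math.exp(-dt / TAU)
    return -A_MINUS * math.exp(dt / TAU)        # post before pre: depress

print(stdp_dw(10.0, 15.0))   # positive change: synapse strengthened
print(stdp_dw(15.0, 10.0))   # negative change: synapse weakened
```

The exponential window means only spike pairs separated by a few time constants contribute appreciably, which keeps the rule strictly local in time as well as in space.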
Boybat et al. used phase change memristive synapses to demonstrate temporal correlation detection through unsupervised learning based on a simplified form of STDP [86], as shown in Figure 9. In their experiment, a multi-memristive architecture was introduced, in which 𝑁 PCM devices represent one synapse: all devices within a synapse are read during spike transmission, but only one device, selected through an arbitration scheme, is programmed to update the synaptic weight. Software-equivalent accuracies were obtained in the experiment with this scheme, even though the individual devices suffer from several common non-ideal effects such as programming non-linearity, read noise, and conductance drift. Note that with 𝑁 = 1 device representing a synapse, the network accuracy was significantly lower than the software baseline; 𝑁 = 7 devices were necessary to obtain close-to-ideal performance.
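A toy model of this multi-memristive scheme (the device count, update size, and programming-noise level are illustrative, and the round-robin counter stands in for the arbitration scheme used in the experiment) shows the read/program asymmetry, with each programming event hitting only one device while all devices are read together:

```python
import numpy as np

rng = np.random.default_rng(3)

class MultiMemristiveSynapse:
    """N devices represent one synapse; read all, program one at a time."""
    def __init__(self, n_devices=7, sigma=0.05):
        self.g = np.zeros(n_devices)  # per-device conductances
        self.sigma = sigma            # per-pulse programming noise
        self.counter = 0              # round-robin arbitration counter

    def read(self):
        return self.g.mean()          # every device contributes to the weight

    def program(self, delta):
        i = self.counter % len(self.g)            # select one device
        self.g[i] += delta + rng.normal(0.0, self.sigma)
        self.counter += 1

syn = MultiMemristiveSynapse()
for _ in range(70):
    syn.program(0.1)          # 70 potentiation events spread over 7 devices
print(syn.read())             # expected value 1.0 (= 70 * 0.1 / 7)
```

Because each update lands on a different device, the effective weight is a sum over devices whose individual imperfections partially offset one another, which is the intuition behind the improvement from 𝑁 = 1 to 𝑁 = 7 reported above.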
Spiking networks can also be used for other unsupervised learning [87] and adaptation tasks. Recently, Y. Fang et al. demonstrated that certain optimization problems can be solved by the coupled dynamics of ferroelectric field-effect transistor (FeFET) based spiking neurons [88]. While there was no synaptic weight adaptation in this approach, the optimal solution is determined by the coupled interactions between the neurons, which modulate each other's membrane potentials in an event-driven manner.
Figure 9. a) Unsupervised learning demonstration using the multi-memristive PCM architecture. The network consists of an integrate-and-fire neuron receiving inputs from 1000 multi-PCM synapses, each synapse being excited by Poisson-generated binary spike streams. 10% of the synapses receive correlated inputs, while the rest receive uncorrelated inputs. The weights evolve based on the simplified STDP rule shown. b) With N = 7 PCM devices per synapse, the correlated and uncorrelated synaptic weights evolve to well-separated values, whereas with N = 1 the separation is corrupted by programming noise. Adapted with permission [86], Copyright 2018, Nature Research.
4.3. Memristive SNNs for supervised learning. Compared to the previous two approaches, implementing supervised learning in SNNs is a more challenging task, as the algorithm and the network must generate spikes at precise time instants based on the input excitation. In contrast to the backpropagation algorithm that is highly successful in training ANNs, supervised learning algorithms for SNNs are not yet well developed, owing to the inherent difficulty of applying gradient descent methods to spiking neuron models, whose outputs are discontinuous at the instants of spikes. Nevertheless, there have been several demonstrations of supervised learning algorithms for SNNs based on approximate forms of gradient descent for simple fully connected networks [89, 90, 91].
Figure 10. a) SNN supervised learning experiment. A two-layer network is tasked with generating 1000 ms long spike streams from the 168 neurons at the output, corresponding to images of the spoken characters. The inputs to the network are 132 spike streams representing the characters, subsampled from the output of a silicon cochlea chip. The weights are modified based on the NormAD learning rule. b) Using multi-PCM synapses, the accuracy of spike placement at the output is about 80%, compared to the FP64 accuracy of close to 98% [92].
Recently, Nandakumar et al. demonstrated a proof-of-concept realization of supervised learning in a two-layer SNN implemented using nanoscale phase change memory synapses, based on the Normalized Approximate Descent (NormAD) algorithm [89]. In the experiment, 132 spike streams representing spoken audio signals, generated using a silicon cochlea chip, were used as the input, and the network was trained to generate 168 spike streams whose arrival times indicate the pixel intensities corresponding to the spoken characters [92]. Compared to conventional classification problems in deep networks, where the accuracy depends only on the relative magnitudes of the responses of the output neurons, the SNN problem is harder, as the network is tasked with generating close to 1000 spikes at specific time instants over a period of 1250 ms from 168 spiking neurons excited by 132 input spike streams. The spike placement accuracy obtained in the experiment was about 80%, compared to a software baseline accuracy of over 98%, despite using the same multi-memristive architecture described earlier. This experiment is hence illustrative of the need to develop more robust, event-driven learning algorithms for SNNs that can mitigate, or even leverage, device non-idealities in the design of computational systems.
4.4. Harnessing randomness for learning - noise: from impairment to asset. As discussed in the previous section, the implementation of standard deterministic learning rules, such as STDP or gradient-based schemes like NormAD [89], may be severely impaired in hardware implementations whose components are inherently noisy. In this section, we explore the idea that, if properly harnessed, native hardware randomness can be an asset for the deployment of training algorithms for SNNs [93, 94]. The gist of the argument is that randomness enables the native implementation of probabilistic models, which would otherwise require additional, potentially costly, components. As we elaborate next, probabilistic models have several advantages over their conventional deterministic counterparts. We focus the discussion on the problem of training, but we will also mention some of the advantages in terms of inference.
4.5. Training deterministic SNN models. Standard Artificial Neural Network (ANN)-based models only account for uncertainty at their inputs or outputs, while the process transforming inputs to outputs is deterministic. While limiting their expressiveness and their capacity to model structured uncertainty [95] , this modelling choice does not cause a problem in the development of learning rules for ANNs. This is because deterministic ANN models define a differentiable input-output mapping as a function of the model weights, enabling the direct derivation of gradient-based learning rules through backpropagation and automatic differentiation.
Not so for SNNs. In fact, deterministic spiking neuron models such as Leaky Integrate-and-Fire (LIF) define non-differentiable functions of the synaptic weights: increasing or decreasing the synaptic weights of a spiking neuron may cause the membrane potential to cross, or step back from, the spiking threshold, causing an abrupt change in the output. The derivative with respect to the weights is hence zero except at the firing threshold, where it is undefined. As a result, standard gradient-based learning rules cannot be directly derived for deterministic models of SNNs.
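This non-differentiability can be made concrete with a minimal sketch (function names and parameter values are illustrative, not from the paper): a single LIF neuron's spike count is piecewise constant in its weights, so a finite-difference gradient is zero almost everywhere.

```python
import numpy as np

def lif_spike_count(w, inputs, v_th=1.0, leak=0.9):
    """Count output spikes of a single leaky integrate-and-fire (LIF) neuron.
    `inputs` is a (timesteps, n_inputs) array of presynaptic spikes."""
    v, spikes = 0.0, 0
    for x_t in inputs:
        v = leak * v + np.dot(w, x_t)  # leaky integration of weighted inputs
        if v >= v_th:                  # threshold crossing: spike and reset
            spikes += 1
            v = 0.0
    return spikes

rng = np.random.default_rng(0)
inputs = (rng.random((100, 5)) < 0.3).astype(float)
w = np.full(5, 0.2)

# A tiny weight perturbation almost never changes the spike count, so the
# finite-difference "gradient" is zero almost everywhere (and undefined at
# the rare points where a membrane trajectory grazes the threshold).
eps = 1e-9
grad = (lif_spike_count(w + eps, inputs) - lif_spike_count(w, inputs)) / eps
print(grad)
```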
A second important issue with conventional gradient-based methods, when applied to deterministic SNN models, concerns the problem of credit assignment. Discrete-time deterministic SNN models can be interpreted as Recurrent Neural Networks (RNNs) whose state is defined by the neurons' membrane potentials, input currents, and previous spiking behaviours [91] . Accordingly, the outputs and state transitions produced as a function of the exogenous inputs and state depend on the learnable synaptic weights. A synaptic weight therefore affects the loss function being optimized via changes that are propagated through the neurons and through time. Assigning credit for changes in the output - which is what is needed to compute the gradient - hence requires either backpropagating per-output changes through neurons and time, or keeping track of per-weight changes in a forward manner through neurons and time [96,97,91] . Both solutions come with significant drawbacks: backpropagation requires storing forward activations and flowing information backward through time, while forward methods entail memorizing per-weight quantities across all neurons.
Given the two challenges discussed above - non-differentiability and credit assignment - state-of-the-art training methods for SNNs based on deterministic, typically LIF, models follow various heuristic approaches. As discussed in the previous section, the most common class of methods sidesteps both challenges by carrying out an offline conversion from a pretrained ANN. This makes online on-chip learning impossible, and it also limits information processing to rate encoding, which encodes information in the spike frequency (e.g., see [75]). A second popular approach is to implement biologically inspired local synaptic update rules, such as STDP, that do not require credit assignment. The main downside of these approaches is that they do not optimize specific objective functions - sidestepping the problem of non-differentiability - and hence are difficult to generalize to a variety of tasks and requirements. When focusing on rate encoding, it is possible to overcome the problem of non-differentiability, but not that of credit assignment, by removing non-linearities and working directly with spiking rates, for example with low-pass filtered spike trains [98,89] .
In contrast to standard rate encoding, SNNs enable a novel type of information processing that computes with time, rather than merely over it as ANNs do. To make use of this unique capability of SNNs, it is necessary to derive learning rules capable of processing information encoded in the timing of the spikes and not only in their frequency. The simplest way to do this is to limit the number of spikes per neuron to one, so as to assign a continuous-valued output to each neuron. This allows the derivation of backpropagation-based rules as for ANNs, whereby the neurons' (differentiable) non-linearities capture the relationship between input and output spike timings [99] .
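The single-spike idea can be illustrated with a simple sketch (this particular neuron model and its parameters are an assumption for illustration, not the construction of [99]): for a non-leaky integrate-and-fire neuron driven by a constant weighted input current, the time to first spike is t = v_th / (w·x), which is a smooth function of the weights wherever the current is positive.

```python
import numpy as np

def first_spike_time(w, x, v_th=1.0):
    """Time at which a non-leaky IF neuron driven by the constant current
    w.x first reaches threshold: t = v_th / (w.x). Smooth in w for w.x > 0."""
    return v_th / np.dot(w, x)

def grad_first_spike_time(w, x, v_th=1.0):
    """Analytic gradient dt/dw = -v_th * x / (w.x)^2 -- well defined,
    unlike the gradient of a spike count."""
    current = np.dot(w, x)
    return -v_th * x / current**2

w = np.array([0.5, 0.3])
x = np.array([1.0, 2.0])
t = first_spike_time(w, x)          # v_th / (0.5 + 0.6)
g = grad_first_spike_time(w, x)

# Check the analytic gradient against a finite difference on the first weight.
eps = 1e-6
fd = (first_spike_time(w + np.array([eps, 0.0]), x) - t) / eps
print(t, g[0], fd)
```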
More sophisticated methods, allowing for multiple spikes per neuron, are based either on soft non-linearity models [100] or on surrogate gradient methods [91] . The first type of approach tackles the problem of non-differentiability by approximating the threshold activation function with a differentiable function [100] . As a result, these methods do not preserve the key feature of SNNs of processing and communicating binary spikes. The second class of techniques approximates the derivative of the threshold activation function (but not the function itself) when computing gradients [91] . Both types of methods require backward or forward propagation, or the implementation of heuristic credit assignment methods such as random backpropagation [101] . As an example, SuperSpike uses forward propagation to carry out credit assignment over time, coupled with random backpropagation for spatial credit assignment [90,91] .
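A minimal sketch of the surrogate gradient idea (the sigmoid surrogate, its steepness, and the one-step setup are illustrative assumptions): the forward pass keeps the hard threshold, so the neuron still emits binary spikes, while the backward pass substitutes a sigmoid derivative for the zero/undefined derivative of the step function.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def spike_forward(v, v_th=1.0):
    """Forward pass: hard threshold, so the output stays a binary spike."""
    return (v >= v_th).astype(float)

def spike_surrogate_grad(v, v_th=1.0, beta=5.0):
    """Backward pass: derivative of a sigmoid centred on the threshold,
    used in place of the true derivative of the step function."""
    s = sigmoid(beta * (v - v_th))
    return beta * s * (1.0 - s)

# One neuron, one time step: v = w.x, loss = (spike - target)^2.
w = np.array([0.4, 0.3])
x = np.array([1.0, 1.0])
target = 1.0

v = np.dot(w, x)                   # membrane potential 0.7, below threshold
out = spike_forward(v)             # forward: 0.0 (no spike)

# Chain rule with the surrogate in place of d(spike)/dv:
dloss_dout = 2.0 * (out - target)
dout_dv = spike_surrogate_grad(v)  # strictly positive near the threshold
grad_w = dloss_dout * dout_dv * x
print(out, grad_w)                 # a nonzero update despite the binary output
```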
We emphasize that, while this section focuses on the role of randomness in facilitating training, randomness in SNNs can also be useful in the inference phase, enabling Gibbs sampling-based Bayesian inference strategies [93, 102] .
4.6. Probabilistic SNN models. Among their key advantages, probabilistic models allow the direct encoding of domain knowledge in the graph of connections among the constituent variables - a key feature of so-called expert systems - and the modelling of uncertainty [103] .
They can also account for complex multi-modal distributions, unlike their deterministic counterparts [104] . Finally, stochastic models, even for ANNs, can both improve generalization, as in dropout regularization, and facilitate exploration of the training space [105] .
Training of probabilistic models is generally conceptually more complex than for deterministic models due to the need to account for the exponentially large space of values that the hidden stochastic units can take. Note, however, that probabilistic models have provided the framework used to develop the first deep learning algorithms for ANNs in [106] through Boltzmann machines. Early training methods for general (undirected) models used Gibbs sampling or mean-field approximation, requiring an expensive cycling through the variables one at a time [107,108] . More modern approaches leverage advanced forms of approximate learning and inference via (Generalized) Expectation Maximization, Monte Carlo methods, and variational inference [106,104,109,110] .
Probabilistic models for SNNs can be thought of as direct extensions of the belief networks studied in [107,106,105] from static to dynamic models. As in belief networks, a neuron spikes probabilistically with a probability that increases with its membrane potential. In belief networks, the membrane potential of a neuron is an instantaneous function of the current spikes emitted by the neurons in its fan-in. In contrast, in an SNN, the membrane potential of a neuron evolves over time, as in LIF models, as a function of the past spiking behaviour of the neuron itself and of the neurons in its fan-in (see [111] for a review).
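A minimal sketch of such a probabilistic (GLM-style) spiking neuron, with illustrative parameter choices (filter, bias, and weights are assumptions, not taken from [111]): the membrane potential is a leaky trace of weighted past presynaptic spikes, and the neuron fires with a sigmoid probability that is differentiable in the weights.

```python
import numpy as np

rng = np.random.default_rng(1)

def glm_neuron(w, inputs, bias=-2.0, decay=0.8):
    """Probabilistic spiking neuron: the membrane potential is a leaky trace
    of weighted presynaptic spikes, and the neuron fires with probability
    sigmoid(potential) -- a differentiable function of the weights w."""
    trace = np.zeros_like(w)
    out_spikes, out_probs = [], []
    for x_t in inputs:
        trace = decay * trace + x_t       # filtered presynaptic activity
        u = np.dot(w, trace) + bias       # membrane potential
        p = 1.0 / (1.0 + np.exp(-u))      # firing probability
        out_probs.append(p)
        out_spikes.append(float(rng.random() < p))
    return np.array(out_spikes), np.array(out_probs)

inputs = (rng.random((50, 4)) < 0.2).astype(float)
spikes, probs = glm_neuron(np.full(4, 1.0), inputs)
print(spikes.mean(), probs.min(), probs.max())
```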
4.7. Training probabilistic SNN models. For the development of training rules, probabilistic SNN models have the fundamental advantage over their deterministic counterparts that the probability of the neurons' outputs is a differentiable function of the model parameters, including the synaptic weights. Many learning criteria can be formulated as the average of a given loss or reward function over this distribution. Specifically, in supervised and unsupervised learning, the learning problem can be formulated as the minimization of a loss function averaged over the joint distribution of the data and of specific neurons in a read-out layer [112,111] ; and in reinforcement learning, the goal is to maximize an average reward function dependent on the behaviour of the neurons in the read-out layer [113] . Unlike deterministic SNN models, probabilistic SNN models hence naturally allow for the definition of differentiable learning criteria.
Once a learning criterion is determined based on the problem under study, training can be carried out via stochastic gradient-based rules. The key novel challenge in deriving such rules is the need to differentiate over the distribution of the neurons' outputs. Mathematically, with deterministic models, one needs to differentiate a training criterion of the type
$$L _ { d } ( \theta ) = E _ { X \sim D } [ f _ { \theta } ( X ) ] ,$$
where the expectation is taken over the empirical distribution $D$ of the data and the model parameter $\theta$ directly affects the learning criterion $f_{\theta}(X)$ through the input-output function of the network. In contrast, with probabilistic models, the relevant learning criterion is of the type
$$L _ { p } ( \theta ) = E _ { X \sim D } [ E _ { Y \sim P _ { \theta } } [ f ( X , Y ) ] ] ,$$
in which 𝑌 represents the random output of the neurons. Note that unlike the standard deterministic approach, the model parameters affect the learning performance through the distribution of the random output of the neurons.
Optimization of the criterion above can in principle be carried out via Expectation Maximization. In practice, the intractability of Bayesian inference over the hidden neurons entails the need for approximate solutions based on sampling methods and gradient-based techniques [104]. Computing stochastic gradients of $L_{p}(\theta)$ requires a double empirical expectation, one over the data distribution $D$ and one over the output distribution $P_{\theta}$. Estimators based on such samples can be derived by following one of a variety of principles, yielding different statistical properties in terms of, e.g., bias and variance [114].
While a number of techniques attempt to reuse the standard backpropagation algorithm, e.g., the 'straight-through' estimator [105], an approach that is more suitable for the implementation of SNNs is obtained via the score-function, or log-likelihood, or REINFORCE method and variations thereof (see [104,109,110]). Accordingly, for a given data and neurons' output sample $(X, Y)$, the gradient with respect to a synaptic weight can be estimated through the correlation between the loss function accrued over time and the log-probability of the realized output, i.e., (somewhat informally)
$$\nabla _ { \theta } L _ { p } ( \theta ) \approx f ( X , Y ) \nabla _ { \theta } \log ( P _ { \theta } ( Y ) ) .$$
Intuitively, the higher the loss is, the more the negative gradient should push away from output distributions that generate such disadvantageous samples 𝑌 . Various improvements of the statistical properties of this estimator are reviewed in [114].
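The estimator can be sketched for a single Bernoulli "neuron" with firing probability sigmoid(θ), for which the score simplifies to ∇θ log Pθ(Y) = Y − p; the particular loss and parameter values below are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Single Bernoulli "neuron": Y ~ Bern(p) with p = sigmoid(theta).
theta = 0.3
p = sigmoid(theta)
f = lambda y: (y - 1.0) ** 2          # loss penalizing the absence of a spike

# Score-function (REINFORCE) estimate of d/dtheta E[f(Y)]:
# for a Bernoulli-sigmoid unit, grad log P(Y) is simply (Y - p).
samples = (rng.random(200000) < p).astype(float)
reinforce = np.mean(f(samples) * (samples - p))

# Closed-form gradient for comparison: E[f(Y)] = f(1) p + f(0) (1 - p),
# so d/dtheta E[f(Y)] = (f(1) - f(0)) * p * (1 - p).
exact = (f(1.0) - f(0.0)) * p * (1.0 - p)
print(reinforce, exact)
```

Note that the estimate needs only samples of Y and the (differentiable) log-probability, not any backward pass through the network, which is what makes it attractive for SNN hardware.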
The REINFORCE gradient estimate of $\nabla_{\theta} L_{p}(\theta)$ highlights not only the direct differentiability of generic learning criteria but also the fact that probabilistic learning rules solve the credit assignment problem without requiring any form of backpropagation [105]. In fact, a gradient-based rule that uses this estimate of $\nabla_{\theta} L_{p}(\theta)$ only requires all nodes to receive a global feedback signal $f(X, Y)$, which may be computed by a central node [111]. The resulting learning procedure follows the standard three-factor rule from theoretical neuroscience, whereby the synaptic weights are modified based on recent pre- and post-synaptic spikes, which are locally available at each neuron, and on a global feedback signal [111]. Accordingly, the rule can easily be implemented in an online, streaming fashion.
4.8. Generalized probabilistic SNN models. Apart from the advantages described above in terms of differentiability and credit assignment, probabilistic models can be directly extended in various directions with only minor conceptual and algorithmic modifications. First, it is possible to directly derive - technically, by selecting a categorical instead of a Bernoulli distribution in a Generalized Linear Model (GLM) for SNNs - training rules that allow for multi-valued spikes or inter-neuron instantaneous connections or, equivalently, Winner-Take-All (WTA) circuits [115, 102] . This is particularly important since the data produced by some neuromorphic sensors incorporates a sign to indicate a positive or negative change [116] . Multi-valued spikes can also be used for time compression [117] . Second, various decoding rules, such as first-to-spike, can be directly optimized for, instead of having to rely on surrogate target spiking sequences [118] . Third, probabilistic models can provide an estimate of the uncertainty on the trained weights by means of Bayesian Monte Carlo methods [115] .
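The categorical extension can be sketched as follows (the outcome set and potentials are hypothetical, e.g. mimicking a signed event from a neuromorphic vision sensor): instead of a Bernoulli spike/no-spike choice, the neuron samples one of several outcomes from a softmax over per-outcome membrane potentials.

```python
import numpy as np

rng = np.random.default_rng(0)

def categorical_spike(u, rng):
    """Multi-valued spike: sample one of K+1 outcomes (K spike values plus
    'no spike') from a softmax over per-outcome membrane potentials u."""
    z = np.exp(u - u.max())           # numerically stable softmax
    p = z / z.sum()
    return rng.choice(len(u), p=p), p

# Potentials for {no spike, +1 spike, -1 spike}, e.g. for a signed DVS event.
u = np.array([0.5, 2.0, -1.0])
outcome, p = categorical_spike(u, rng)
print(outcome, p.round(3))
```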
Before describing some applications of the models and learning rules reviewed above, we mention briefly here alternative probabilistic formulations for SNNs. In the models discussed above, randomness is defined at the level of neurons' outputs. Alternative models introduce randomness at the level of synapses or thresholds [119,120] .
4.9. Examples. Once an SNN is trained, it can be used as a sequence-to-sequence mapper in order to solve supervised, unsupervised, and reinforcement learning problems. Alternatively, with specific choices of the synaptic kernels and memory, the SNN can be used as a Gibbs sampler to carry out Bayesian inference with outputs encoded in the spiking rates [93, 102] . We now briefly discuss three applications that fall in the first category, one concerning supervised learning, one reinforcement learning, and one federated learning.
Figure 11. Test error and number of spikes as a function of the time expansion parameter ∆𝑇 defining source encoding from natural signals to spikes. Reproduced with permission [111], Copyright 2019, IEEE.
To illustrate the potential of probabilistic SNNs trained to process time-encoded information, in Figure 11 we consider an online sequence prediction problem in which samples of a discrete-time source are converted into spiking signals, with ∆𝑇 time instants allotted to each sample of the input source. We consider two types of encoding: one based on standard quantization and rate encoding, and one based on time encoding via Gaussian receptive fields. The results, fully detailed in [111], demonstrate that time encoding can vastly outperform rate encoding both in terms of accuracy and in terms of the number of spikes, which is a proxy for energy consumption.
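The two encodings can be sketched as follows (neuron counts, receptive-field widths, and the response-to-latency mapping are illustrative assumptions, not the exact construction of [111]): rate encoding turns a value into a Bernoulli spike train, while Gaussian receptive field encoding assigns each neuron a preferred value and makes it fire a single, earlier spike the closer the input is to that value.

```python
import numpy as np

def rate_encode(x, n_steps, rng):
    """Rate encoding: a value in [0, 1] becomes a Bernoulli spike train
    whose per-step firing probability equals the value."""
    return (rng.random(n_steps) < x).astype(int)

def grf_time_encode(x, n_steps, n_neurons=5, sigma=0.15):
    """Gaussian receptive field time encoding: each neuron has a preferred
    value (its centre); the closer x is to the centre, the earlier that
    neuron fires its single spike within the n_steps window."""
    centres = np.linspace(0.0, 1.0, n_neurons)
    response = np.exp(-0.5 * ((x - centres) / sigma) ** 2)   # in (0, 1]
    # Map strong response -> early spike, weak response -> late spike.
    spike_times = np.round((1.0 - response) * (n_steps - 1)).astype(int)
    train = np.zeros((n_neurons, n_steps), dtype=int)
    train[np.arange(n_neurons), spike_times] = 1             # one spike each
    return train

rng = np.random.default_rng(0)
x = 0.7
rate = rate_encode(x, n_steps=10, rng=rng)
timed = grf_time_encode(x, n_steps=10)
print(rate.sum(), timed.sum())   # rate spikes scale with x; GRF fires once per neuron
```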
Second, we consider a standard reinforcement learning task, in which a probabilistic SNN is used as a stochastic policy. Figure 12 compares the performance as a function of the resolution of the input grid representation for a policy directly trained with a first-to-spike decoder and one that is instead converted using state-of-the-art methods from a pre-trained ANN. The results clearly validate the intuition that directly training the stochastic policy as an SNN is more efficient than using ANN-to-SNN conversion.
Figure 12. Time steps to reach goal and spikes per episode for a grid-world reinforcement learning task. Reproduced with permission [113], Copyright 2019, IEEE.
Finally, we consider the potential of SNNs for on-mobile training via Federated Learning (FL). The approach is motivated by the fact that training on a single device is limited by the amount of data available on it. Cooperative training can be carried out through FL, as explored in [121], where an online FL-based learning rule is introduced for networked on-mobile probabilistic SNNs. As seen in Figure 13, with sufficiently frequent inter-device communication (a communication round occurring every 𝜏 iterations), the scheme demonstrates significant advantages over separate on-mobile training.
Figure 13. Test loss versus the number of training iterations, with inter-device communication taking place every 𝜏 iterations.
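The communication pattern can be sketched with a toy federated averaging loop (the quadratic local losses and all parameter values are illustrative assumptions, standing in for the on-mobile SNN learning rule of [121]): each device runs local gradient steps and the parameters are averaged every 𝜏 iterations.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_grad(w, data):
    """Gradient of a simple quadratic local loss ||w - mean(data)||^2 / 2,
    standing in for each device's on-mobile learning rule."""
    return w - data.mean()

# Two devices with different local data distributions.
device_data = [rng.normal(1.0, 0.1, 100), rng.normal(3.0, 0.1, 100)]
weights = np.zeros(2)             # one scalar parameter per device
tau, lr = 8, 0.1                  # communicate every tau iterations

for t in range(1, 201):
    for i, data in enumerate(device_data):
        weights[i] -= lr * local_grad(weights[i], data)   # local SGD step
    if t % tau == 0:
        weights[:] = weights.mean()                       # averaging round

# With frequent averaging, both devices converge near the global optimum
# (the mean of all data, around 2.0) rather than their local optima.
print(weights)
```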
4.10. Algorithmic and hardware co-design. To sum up the discussion in this section, spike-based learning and inference are promising facets of the neuromorphic computing paradigm. Unlike conventional machine learning models, spike-based processing "computes with time, not in time". As we have discussed, the main advantage is a potentially massive increase in power efficiency. In this section, we have reviewed algorithmic models that leverage stochastic behaviour for the implementation of SNNs. While spike-based computing can be implemented in CMOS technology, there is a great deal to be gained, in terms of scalability and power efficiency, from compact nanoscale implementations of the fundamental functional blocks: spiking neurons and adjustable synapses. Memristors are much better suited to emulate, and not merely simulate, many of the sought functionalities. Moreover, the implementation of probabilistic models on current hardware platforms is made difficult by the lack of randomness sources in such systems. In contrast, the inherent randomness of switching processes in memristive devices could provide a source of randomness "for free". Research in spike-based computing is a fast-growing field, and we believe that better-suited hardware platforms would accelerate the progress of co-designed spike-based learning and inference machines. Memristors may be the missing piece that unlocks the potential of spike-based computing.
## 5. Future of neuromorphic and bio-inspired computing systems
Taking a 'big picture' view, current AI, and machine learning methods in particular, have achieved astonishing results in every field to which they have been applied, and have become, or are becoming, standard tools for nearly every type of industry one can think of. This impressive expansion was mainly propelled by deep learning, which is loosely inspired by biological neural networks.
Deep learning primarily refers to learning with artificial neural networks of many layers, and is not fundamentally different from what was known in the field in the 1990s. Indeed, the key algorithm underlying the success of deep learning, backpropagation, is an old story: 'Learning representations by back-propagating errors' by Rumelhart, Hinton and Williams was published in 1986 [122] . The most commonly used neural networks are feedforward networks, and convolutional neural networks, used for image processing, can be seen as inspired by our visual system; neither of these is a very new concept.
Backpropagation is perhaps the most fundamental method we can think of for parameter optimisation. It is derived by differentiating an error function with respect to the learnable parameters, so in some ways it is not entirely surprising that the algorithm has existed for many years. What might be somewhat surprising is that we have not been able to move far from this idea. While there has been recent progress, much of it has consisted of relatively small additions and tweaks, for instance new ways to address the so-called 'vanishing gradient' problem, the deterioration of the error signal as it is backpropagated from the output to the input of the network. Undoubtedly, there have been some fundamentally different architectures, smart techniques, and novel analyses, but arguably the key factor behind this success has been the vast availability of data and computational power.
In fact, recent advances from the neuroscience community are largely absent from today's artificial neural networks. We do not want to argue that this, per se, is either good or bad, or to suggest that the next super-algorithms will come from copying nature. We only want to underline that, though only loosely inspired, artificial neural networks had their basis in neuroscience concepts, and that there are many phenomena that have, perhaps, not been sufficiently explored in an AI context. For instance, biological neural networks have different learning rules for positive and negative connections; connections change on multiple time scales and show reversible dynamic behaviour (known as short-term plasticity); and the brain itself has a structure in which specific areas play different roles, to name just a few.
Instead, our progress has mainly been based on hardware improvements that made this success possible by allowing long training phases, an amount of training unrealistic for any human. While it is true that human intelligence also develops over years and that human learning involves many trials, for comparison AlphaGo Zero, which surpassed human performance in the game of Go, was trained over 4.9 million games [123] . To match this number of games, a human living for 90 years would have to complete one Go game every 10 minutes from the moment they were born. This realization tells us two things: (1) our machines do not learn the same way that humans do, and even if we think of our methods as bio-inspired, we likely still miss some key ingredients; and (2) executing that many games certainly requires considerable computational power and energy consumption.
As a consequence, training algorithms often have a high energy footprint due to the long training times and the hyper-parameter tuning involved. Hyper-parameters are parameters of the system that are not (usually) adapted via the learning method itself; one such example is the learning rate, which indicates how quickly the network should update its 'knowledge'. Before rushing to say that a high learning rate is obviously desirable, note that too high a learning rate can lead to oscillations, as optimal solutions may be overshot, or to forgetting previously acquired knowledge. Setting the learning rate correctly is not always trivial. In fact, the difficulty of tuning hyper-parameters was what originally made the machine learning community turn away from artificial neural networks, and it was the performance of deep learning that brought the focus back. One may then wonder: at the end of the day, how energy-inefficient can deep learning systems be? The answer is perhaps surprising: the estimated carbon emissions for training a standard natural language processing model are approximately five times those of running a car over its lifetime [124]. This realization suggests an urgent need to improve on both current hardware and learning models.
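The oscillation point can be seen on even the simplest objective. The sketch below (a hypothetical one-dimensional quadratic, our own assumption, not tied to any model in the text) runs gradient descent with a modest and with an excessive learning rate:

```python
def descend(lr, steps=20, w=1.0):
    """Gradient descent on f(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w -= lr * 2 * w
    return w

small = descend(0.1)  # each step scales w by 0.8: converges towards 0
large = descend(1.1)  # each step scales w by -1.2: oscillates and diverges
```

With `lr = 0.1` the iterate contracts geometrically towards the optimum; with `lr = 1.1` the sign flips every step while the magnitude grows, which is exactly the oscillating divergence described above.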
Given such energy concerns, systems based on low-power memristive devices are a highly promising alternative [125,126]. Besides their low carbon footprint, there is a large body of work demonstrating devices that mimic neurons, synapses, and plasticity phenomena. Often such approaches work well for offline training. However, some of these attempts, particularly where plasticity is involved, are opportunistic (including our own work), and how they could be scaled to larger networks is not always obvious. Faithfully reproducing brain functionality, when neuroscience itself still has so many open questions, is challenging for any technology. Moreover, attempting it with technologies that potentially allow fewer possibilities for engineering than traditional methods (such as CMOS) might well be mission impossible. How far we can go, in terms of scalability, by reconstructing the brain neuron by neuron and synapse by synapse remains unclear. A more promising way might be to achieve a deeper understanding of the physics of the relevant materials and, based on this understanding, co-develop the technology and the learning methods required for achieving Artificial Intelligence.
Figure 14 Reservoir Computing maps inputs x(t) to a higher-dimensional space, defined by the reservoir states r(t). Only the weights connecting the reservoir states r(t) and the output y(t) need to be trained.
In the meantime, in parallel, we can immediately explore simple bio-inspired approaches that harness the dynamics of the material and could prove useful for particular sets of problems. Here we present one such example, which stems from the area of reservoir computing, an idea invented separately by Herbert Jaeger for the machine learning community [127], under the name of echo state networks, and by Wolfgang Maass [128] for the computational neuroscience community, under the name of liquid state machines. We strongly suspect that both methods were largely motivated by the difficulty of training recurrent networks with a generalization of backpropagation known as backpropagation through time. While feedforward networks can perform many tasks successfully, recurrences are required for memory and, moreover, the brain is clearly not only feedforward. If recurrences exist and are required, there must be a way to train such structures efficiently. As a side note, it is very difficult to imagine how a biological neural network could implement backpropagation through time, and for this reason alternative approaches have recently made their appearance [129].
Reservoir computing methods came up with a workaround to the problem of training recurrent networks: they do not train them but instead harness their properties. Common to echo state networks and liquid state machines is the idea of using a randomly connected recurrent network with fixed connectivity, hence with no need to resort to backpropagation through time. This recurrent network is called a reservoir. It provides memory and at the same time transforms the input data into a spatiotemporal representation of higher dimensionality. This enhanced representation can be used as input to single-layer perceptrons trained with a very simple learning method, so the only learnable parameters are the feedforward weights between the reservoir neurons and the output neurons. The key difference between echo state networks and liquid state machines is that the former uses recurrent artificial neuron dynamics while the latter uses recurrent spiking neural networks, reflecting the mindsets of their corresponding communities. The main principle of reservoir computing is shown in Figure 14. The input x(t) is projected into the higher-dimensional feature space r(t) by the dynamical reservoir system. Only the weights connecting the internal states r(t) with the output y(t) need to be trained, while the rest of the system is fixed. The advantage of this approach is that it requires only a simple training method, while the ability to process complex temporal data is retained.
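The whole recipe fits in a few lines. The sketch below is a minimal echo state network (the sizes, the spectral-radius scaling of 0.9, and the toy next-step sine prediction task are all our own illustrative assumptions): the input and recurrent weights are random and fixed, and only the readout is fitted, here by ridge regression:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed, random input and recurrent weights (never trained).
n_in, n_res = 1, 100
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.standard_normal((n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius < 1

def reservoir_states(u):
    """Drive the fixed reservoir with input sequence u; collect states r(t)."""
    r = np.zeros(n_res)
    states = []
    for x in u:
        r = np.tanh(W_in @ np.atleast_1d(x) + W @ r)
        states.append(r.copy())
    return np.array(states)

# Toy task: predict the next sample of a sine wave from the current one.
t = np.arange(400)
u = np.sin(0.2 * t)
R = reservoir_states(u[:-1])
target = u[1:]

# Only the output weights are trained, by ridge regression on the states.
ridge = 1e-6
W_out = np.linalg.solve(R.T @ R + ridge * np.eye(n_res), R.T @ target)

pred = R @ W_out
mse = np.mean((pred - target) ** 2)
```

Replacing `reservoir_states` with measurements taken from a physical dynamical system, such as a memristive network, would leave the training procedure unchanged.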
Indeed, it might be surprising how much randomness can achieve from a computational point of view: a random network can enrich data representations sufficiently that a linear method can separate the data into the desired classes. This approach is conceptually similar to the well-known method of support vector machines, which uses kernels to augment the dimensionality of the data so that, again, a simple linear method is sufficient to achieve data classification. In fact, a link between the purely statistical technique of support vector machines and the bio-inspired technique of reservoir computing has been formally established [130]. We can perhaps think of this link as a demonstration that biological inspiration and purely mathematical methodology might solve problems in a similar manner.
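A compact illustration of this point (our own toy example, with feature count chosen arbitrarily): XOR cannot be separated by any linear classifier in the raw two-dimensional inputs, but after a fixed random nonlinear projection a simple linear readout solves it:

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: the classic problem no linear classifier solves in the raw inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

# Fixed random nonlinear projection to a higher-dimensional space.
P = rng.standard_normal((2, 50))
b = rng.standard_normal(50)
F = np.tanh(X @ P + b)

# A simple linear readout (least squares) on the random features.
w, *_ = np.linalg.lstsq(F, y, rcond=None)
pred = (F @ w > 0.5).astype(float)
```

The random projection plays the role of the reservoir (here without recurrence), or equivalently of an implicit kernel: only the linear readout is ever fitted.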
We claim that reservoir computing would benefit from appropriate hardware. In simulation, the convergence of the recurrent network takes time, because the continuous system must be discretized and run sequentially on a CPU. If instead we replace the reservoir with an appropriate material, this step could become both fast and energy efficient: the material could compute effortlessly using its physical properties. Reservoirs do not need over-engineering, since no specific structure is required; we only need dynamics that are sufficiently complex but not chaotic. In fact, there has already been work exploiting memristors in this direction [131].
Could ideas from biology still add value to existing methods? A recent augmentation of echo state networks [132], inspired by the fruit-fly brain, explores the concept of sparseness in order to improve the learning performance of reservoirs. In brains, contrary to typical artificial neural networks, only a few neurons fire at a time, a fact that has been linked to memory capacity. Neuronal thresholds, appropriately initialized and updated with a slower time constant than that of the feedforward learnable weights, can modulate sparseness and lead to better performance in comparison not only to the non-sparse reservoir but also to state-of-the-art methods on a set of benchmark problems. Because the sparseness leads to task-specific neurons, this bio-inspired technique can alleviate the problem of catastrophic forgetting. Machine learning methods often suffer from the fact that once they learn a new task they forget the previous one. Since in the sparse reservoir network a new task will likely recruit previously unused neurons, learning a new skill does not completely override those previously learned. This simple method competes with, and surpasses, more complicated methods built specifically to address catastrophic forgetting. Most importantly, the formulation of the specific rule allows the network dynamics to be completely replaced with any other dynamics, including material dynamics, that are suitable for the purpose (i.e. highly nonlinear but not chaotic). Perhaps there are more such lessons to be learned from biology.
So, what can be done right now? To us it is clear that a better understanding of the physics behind memristive devices is key to the progress of the field [133]. A deeper understanding will allow us to harness the properties of the system for brain-like computation, rather than trying to fabricate some arbitrary brain behavior that may or may not be important in the context of a specific application, or, worse, may not scale up. Instead of thinking at the level of mimicking neurons and synapses, we can take inspiration from biological systems, consider the dynamics required for neuronal processing, and use the material physics to reproduce them.
## 6. Conclusion
Memristor technologies have yet to realise the full potential that has been promoted over the last 15 years. Although predominantly seen as candidates to replace or augment our current digital memory technologies, the impact of memristor technologies on the broader fields of artificial intelligence and cognitive computing platforms is likely to be even more significant. As discussed in this progress report, the versatility of memristor technologies has resulted in their use across a range of applications: from in-memory computing, deep learning accelerators, and spiking neural networks to more futuristic bio-inspired computing paradigms. These approaches should not be seen as solutions to the same problem, nor as technologies that are in direct competition among themselves or with current, very successful, CMOS systems. Additionally, it is crucial to recognise that many of the research areas discussed are still at the very beginning of their development. Of these, the more mature approaches will likely produce industrially relevant solutions sooner. For example, greater power-efficiency is an essential requirement and a pressing issue that many engineers are trying to address. In-memory computing and deep learning accelerators based on memristors represent an attractive proposition for extreme power-efficiency.
There is also significant scope for more fundamental work. Development of new generations of bio-inspired algorithms would further boost advancements in hardware systems and platforms. The challenge and opportunity lie in the interdisciplinary nature of the research and the necessity to understand distinct methodologies and approaches. We believe that the community will benefit from the next generation of researchers being well educated across different traditional disciplines. For example, there is an undeniable link between the fields of computer science, more specifically machine learning, and computational neuroscience. The two disciplines could co-exist separately and act independently with distinct goals; however, there are great benefits to be gained from a more holistic approach. A strong case for closer collaboration has been made recently [134] . Collaborations should be expanded to include researchers in solid-state physics, materials science, nanoelectronics, circuit/architecture design and information theory. Memristors show great promise to be a fabric for producing brain-inspired building blocks [135] , and this progress report showcases different types of memristor-based applications. Memristor technologies are versatile enough to provide the perfect platform for different disciplines to strive together in pushing the frontiers of our current technologies in the most fundamental way.
## Acknowledgements
A.M. acknowledges funding and support from the Royal Academy of Engineering under the Research Fellowship scheme. A.S. acknowledges funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement number 682675). B.R. acknowledges partial support from IBM and Cisco. O. S. acknowledges funding from the European Research Council (ERC) under the European Union Horizon 2020 research and innovation program (grant agreement 725731). E.V. would like to acknowledge a Google Faculty Research Award (2017). AJK acknowledges funding from the Engineering and Physical Sciences Research Council (EPSRC).
## References
- 1 Dario Amodei and Danny Hernandez, AI and Compute, https://openai.com/blog/ai-and-compute/, Access: March 2020
2 M. M. Waldrop, Nature 2016 , 530 , 144.
3 V. Sze, Y.-H. Chen, T.-J. Yang, J. S. Emer, Proc. IEEE 2017 , 105 , 2295.
- 4 D. Ielmini, R. Waser, Eds. , Resistive Switching: From Fundamentals of Nanoionic Redox Processes to Memristive Device Applications , Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, 2016 .
- 5 D. B. Strukov, G. S. Snider, D. R. Stewart, R. S. Williams, Nature 2008 , 453 , 80.
- 6 K. Szot, W. Speier, G. Bihlmayer, R. Waser, Nature Mater 2006 , 5 , 312.
- 7 M. A. Zidan, J. P. Strachan, W. D. Lu, Nat Electron 2018 , 1 , 22.
- 8 L. Chua, IEEE Trans. Circuit Theory 1971 , 18 , 507.
- 9 A. Mehonic, A. J. Kenyon, in Defects at Oxide Surfaces (Eds.: J. Jupille, G. Thornton), Springer International Publishing, Cham, 2015 , pp. 401-428.
- 10 S. Yu, Neuro-Inspired Computing Using Resistive Synaptic Devices , Springer Science+Business Media, New York, NY, 2017 .
- 11 O. Mutlu, S. Ghose, J. Gómez-Luna, R. Ausavarungnirun, Microprocessors and Microsystems 2019 , 67 , 28.
- 12 S. W. Keckler, W. J. Dally, B. Khailany, M. Garland, D. Glasco, IEEE Micro 2011 , 31 , 7.
- 13 N. P. Jouppi, A. Borchers, R. Boyle, P. Cantin, C. Chao, C. Clark, J. Coriell, M. Daley, M. Dau, J. Dean, B. Gelb, C. Young, T. V. Ghaemmaghami, R. Gottipati, W. Gulland, R. Hagmann, C. R. Ho, D. Hogberg, J. Hu, R. Hundt, D. Hurt, J. Ibarz, N. Patil, A. Jaffey, A. Jaworski, A. Kaplan, H. Khaitan, D. Killebrew, A. Koch, N. Kumar, S. Lacy, J. Laudon, J. Law, D. Patterson, D. Le, C. Leary, Z. Liu, K. Lucke, A. Lundin, G. MacKean, A. Maggiore, M. Mahony, K. Miller, R. Nagarajan, G. Agrawal, R. Narayanaswami, R. Ni, K. Nix, T. Norrie, M. Omernick, N. Penukonda, A. Phelps, J. Ross, M. Ross, A. Salek, R. Bajwa, E. Samadiani, C. Severn, G. Sizikov, M. Snelham, J. Souter, D. Steinberg, A. Swing, M. Tan, G. Thorson, B. Tian, S. Bates, H. Toma, E. Tuttle, V. Vasudevan, R. Walter, W. Wang, E. Wilcox, D. H. Yoon, S. Bhatia, N. Boden, in Proceedings of the 44th Annual International Symposium on Computer Architecture - ISCA '17 , ACM Press, Toronto, ON, Canada, 2017 , pp. 1-12.
- 14 A. Sebastian, T. Tuma, N. Papandreou, M. Le Gallo, L. Kull, T. Parnell, E. Eleftheriou, Nat Commun 2017 , 8 , 1115.
- 15 J. J. Yang, D. B. Strukov, D. R. Stewart, Nature Nanotech 2013 , 8 , 13.
- 16 D. Ielmini, H.-S. P. Wong, Nat Electron 2018 , 1 , 333.
- 17 A. Sebastian, M. Le Gallo, R. Khaddam-Aljameh, E. Eleftheriou, Nat. Nanotechnol. 2020 , DOI 10.1038/s41565-020-0655-z.
- 18 M. Di Ventra, Y. V. Pershin, Nature Phys 2013 , 9 , 200.
- 19 Z. Sun, G. Pedretti, E. Ambrosi, A. Bricalli, W. Wang, D. Ielmini, Proc Natl Acad Sci USA 2019 , 116 , 4123.
- 20 L. Chua, Appl. Phys. A 2011 , 102 , 765.
- 21 H.-S. P. Wong, S. Salahuddin, Nature Nanotech 2015 , 10 , 191.
- 22 A. Sebastian, M. Le Gallo, E. Eleftheriou, J. Phys. D: Appl. Phys. 2019 , 52 , 443002.
- 23 A. Sebastian, M. Le Gallo, G. W. Burr, S. Kim, M. BrightSky, E. Eleftheriou, Journal of Applied Physics 2018 , 124 , 111101.
- 24 J. Borghetti, G. S. Snider, P. J. Kuekes, J. J. Yang, D. R. Stewart, R. S. Williams, Nature 2010 , 464 , 873.
- 25 I. Vourkas, G. Ch. Sirakoulis, IEEE Circuits Syst. Mag. 2016 , 16 , 15.
- 26 S. Kvatinsky, D. Belousov, S. Liman, G. Satat, N. Wald, E. G. Friedman, A. Kolodny, U. C. Weiser, IEEE Trans. Circuits Syst. II 2014 , 61 , 895.
- 27 A. Haj-Ali, R. Ben-Hur, N. Wald, R. Ronen, S. Kvatinsky, IEEE Trans. Circuits Syst. I 2018 , 65 , 4258.
- 28 S. Hamdioui, H. A. Du Nguyen, M. Taouil, A. Sebastian, M. L. Gallo, S. Pande, S. Schaafsma, F. Catthoor, S. Das, F. G. Redondo, G. Karunaratne, A. Rahimi, L. Benini, in 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE) , IEEE, Florence, Italy, 2019 , pp. 486-491.
- 29 A. Rahimi, S. Datta, D. Kleyko, E. P. Frady, B. Olshausen, P. Kanerva, J. M. Rabaey, IEEE Trans. Circuits Syst. I 2017 , 64 , 2508.
- 30 G. Karunaratne, M. L. Gallo, G. Cherubini, L. Benini, A. Rahimi, A. Sebastian, arXiv:1906.01548 [physics] 2020 .
- 31 N. Papandreou, H. Pozidis, A. Pantazi, A. Sebastian, M. Breitwisch, C. Lam, E. Eleftheriou, in 2011 IEEE International Symposium of Circuits and Systems (ISCAS) , IEEE, Rio de Janeiro, Brazil, 2011 , pp. 329-332.
- 32 R. Carboni, D. Ielmini, Adv. Electron. Mater. 2019 , 5 , 1900198.
- 33 Y. Shim, S. Chen, A. Sengupta, K. Roy, Sci Rep 2017 , 7 , 14101.
- 34 H. Nili, G. C. Adam, B. Hoskins, M. Prezioso, J. Kim, M. R. Mahmoodi, F. M. Bayat, O. Kavehei, D. B. Strukov, Nat Electron 2018 , 1 , 197.
- 35 M. Le Gallo, A. Sebastian, G. Cherubini, H. Giefers, E. Eleftheriou, IEEE Trans. Electron Devices 2018 , 65 , 4304.
- 36 G. W. Burr, R. M. Shelby, A. Sebastian, S. Kim, S. Kim, S. Sidler, K. Virwani, M. Ishii, P. Narayanan, A. Fumarola, L. L. Sanches, I. Boybat, M. Le Gallo, K. Moon, J. Woo, H. Hwang, Y. Leblebici, Advances in Physics: X 2017 , 2 , 89.
- 37 M. A. Zidan, J. P. Strachan, W. D. Lu, Nat Electron 2018 , 1 , 22.
- 38 B. Fleischer, S. Shukla, M. Ziegler, J. Silberman, J. Oh, V. Srinivasan, J. Choi, S. Mueller, A. Agrawal, T. Babinsky, N. Cao, C.-Y. Chen, P. Chuang, T. Fox, G. Gristede, M. Guillorn, H. Haynie, M. Klaiber, D. Lee, S.H. Lo, G. Maier, M. Scheuermann, S. Venkataramani, C. Vezyrtzis, N. Wang, F. Yee, C. Zhou, P.-F. Lu, B. Curran, L. Chang, K. Gopalakrishnan, in 2018 IEEE Symposium on VLSI Circuits , IEEE, Honolulu, HI, 2018 , pp. 35-36.
- 39 G. W. Burr, R. M. Shelby, S. Sidler, C. di Nolfo, J. Jang, I. Boybat, R. S. Shenoy, P. Narayanan, K. Virwani, E. U. Giacometti, B. N. Kurdi, H. Hwang, IEEE Trans. Electron Devices 2015 , 62 , 3498.
- 40 A. Sebastian, I. Boybat, M. Dazzi, I. Giannopoulos, V. Jonnalagadda, V. Joshi, G. Karunaratne, B. Kersting, R. Khaddam-Aljameh, S. R. Nandakumar, A. Petropoulos, C. Piveteau, T. Antonakopoulos, B. Rajendran, M. L. Gallo, E. Eleftheriou, in 2019 Symposium on VLSI Technology , IEEE, Kyoto, Japan, 2019 , pp. T168-T169.
- 41 A. Mehonic, D. Joksas, W. H. Ng, M. Buckwell, A. J. Kenyon, Front. Neurosci. 2019 , 13 , 593.
- 42 V. Joshi, M. L. Gallo, I. Boybat, S. Haefeli, C. Piveteau, M. Dazzi, B. Rajendran, A. Sebastian, E. Eleftheriou, arXiv:1906.03138 [cs] 2019 .
- 43 D. Joksas, P. Freitas, Z. Chai, W. H. Ng, M. Buckwell, W. D. Zhang, A. J. Kenyon, A. Mehonic, arXiv:1909.06658 [cs] 2019 .
- 44 M. Le Gallo, D. Krebs, F. Zipoli, M. Salinga, A. Sebastian, Adv. Electron. Mater. 2018 , 4 , 1700627.
- 45 S. R. Nandakumar, M. L. Gallo, C. Piveteau, V. Joshi, G. Mariani, I. Boybat, G. Karunaratne, R. KhaddamAljameh, U. Egger, A. Petropoulos, T. Antonakopoulos, B. Rajendran, A. Sebastian, E. Eleftheriou, arXiv:2001.11773 [cs] 2020 .
- 46 I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, Y. Bengio, arXiv:1609.07061 [cs] 2016 .
- 47 M. Le Gallo, A. Sebastian, R. Mathis, M. Manica, H. Giefers, T. Tuma, C. Bekas, A. Curioni, E. Eleftheriou, Nat Electron 2018 , 1 , 246.
- 48 S. R. Nandakumar, M. Le Gallo, I. Boybat, B. Rajendran, A. Sebastian, E. Eleftheriou, in 2018 IEEE International Symposium on Circuits and Systems (ISCAS) , IEEE, Florence, 2018 , pp. 1-5.
- 49 E. Eleftheriou, G. Karunaratne, B. Kersting, M. Stanisavljevic, V. P. Jonnalagadda, N. Ioannou, K. Kourtis, P. A. Francese, A. Sebastian, M. L. Gallo, S. R. Nandakumar, C. Piveteau, I. Boybat, V. Joshi, R. KhaddamAljameh, M. Dazzi, I. Giannopoulos, IBM J. Res. & Dev. 2019 , 63 , 7:1.
- 50 F. Alibart, E. Zamanidoost, D. B. Strukov, Nat Commun 2013 , 4 , 2072.
- 51 T. Gokmen, Y. Vlasov, Front. Neurosci. 2016 , 10 , DOI 10.3389/fnins.2016.00333
- 52 S. Ambrogio, P. Narayanan, H. Tsai, R. M. Shelby, I. Boybat, C. di Nolfo, S. Sidler, M. Giordano, M. Bodini, N. C. P. Farinha, B. Killeen, C. Cheng, Y. Jaoudi, G. W. Burr, Nature 2018 , 558 , 60.
- 53 V. Seshadri, T. C. Mowry, D. Lee, T. Mullins, H. Hassan, A. Boroumand, J. Kim, M. A. Kozuch, O. Mutlu, P. B. Gibbons, in Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture MICRO-50 '17 , ACM Press, Cambridge, Massachusetts, 2017 , pp. 273-287.
- 54 S. Aga, S. Jeloka, A. Subramaniyan, S. Narayanasamy, D. Blaauw, R. Das, in 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) , IEEE, Austin, TX, 2017 , pp. 481-492.
- 55 N. Verma, H. Jia, H. Valavi, Y. Tang, M. Ozatay, L.-Y. Chen, B. Zhang, P. Deaville, IEEE Solid-State Circuits Mag. 2019 , 11 , 43.
- 56 F. Xiong, A. D. Liao, D. Estrada, E. Pop, Science 2011 , 332 , 568.
- 57 Kai-Shin Li, C. Ho, Ming-Taou Lee, Min-Cheng Chen, Cho-Lun Hsu, J. M. Lu, C. H. Lin, C. C. Chen, B. W. Wu, Y. F. Hou, C. Yi. Lin, Y. J. Chen, T. Y. Lai, M. Y. Li, I. Yang, C. S. Wu, Fu-Liang Yang, in 2014 Symposium on VLSI Technology (VLSI-Technology): Digest of Technical Papers , IEEE, Honolulu, HI, USA, 2014 , pp. 1-2.
- 58 M. Salinga, B. Kersting, I. Ronneberger, V. P. Jonnalagadda, X. T. Vu, M. Le Gallo, I. Giannopoulos, O. Cojocaru-Mirédin, R. Mazzarello, A. Sebastian, Nature Mater 2018 , 17 , 681.
- 59 S. Pi, C. Li, H. Jiang, W. Xia, H. Xin, J. J. Yang, Q. Xia, Nature Nanotech 2019 , 14 , 35.
- 60 M. Buckwell, L. Montesi, S. Hudziak, A. Mehonic, A. J. Kenyon, Nanoscale 2015 , 7 , 18030.
- 61 S. Brivio, J. Frascaroli, S. Spiga, Nanotechnology 2017 , 28 , 395202.
- 62 S. Choi, S. H. Tan, Z. Li, Y. Kim, C. Choi, P.-Y. Chen, H. Yeon, S. Yu, J. Kim, Nature Mater 2018 , 17 , 335.
- 63 I. Boybat, M. Le Gallo, S. R. Nandakumar, T. Moraitis, T. Parnell, T. Tuma, B. Rajendran, Y. Leblebici, A. Sebastian, E. Eleftheriou, Nat Commun 2018 , 9 , 2514.
- 64 W. W. Koelmans, A. Sebastian, V. P. Jonnalagadda, D. Krebs, L. Dellmann, E. Eleftheriou, Nat Commun 2015 , 6 , 8181.
- 65 I. Giannopoulos, A. Sebastian, M. Le Gallo, V. P. Jonnalagadda, M. Sousa, M. N. Boon, E. Eleftheriou, in 2018 IEEE International Electron Devices Meeting (IEDM) , IEEE, San Francisco, CA, 2018 , pp. 27.7.1-27.7.4.
- 66 S. Yu, Proc. IEEE 2018 , 106 , 260.
- 67 M. Le Gallo, T. Tuma, F. Zipoli, A. Sebastian, E. Eleftheriou, in 2016 46th European Solid-State Device Research Conference (ESSDERC) , IEEE, Lausanne, Switzerland, 2016 , pp. 373-376.
- 68 S. Yu, B. Gao, Z. Fang, H. Yu, J. Kang, H.-S. P. Wong, in 2012 International Electron Devices Meeting , IEEE, San Francisco, CA, USA, 2012 , pp. 10.4.1-10.4.4.
- 69 P. Yao, H. Wu, B. Gao, J. Tang, Q. Zhang, W. Zhang, J. J. Yang, H. Qian, Nature 2020 , 577 , 641.
- 70 P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S. K. Esser, R. Appuswamy, B. Taba, A. Amir, M. D. Flickner, W. P. Risk, R. Manohar, D. S. Modha, Science 2014 , 345 , 668.
- 71 M. Jerry, W. Tsai, B. Xie, X. Li, V. Narayanan, A. Raychowdhury, S. Datta, in 2016 74th Annual Device Research Conference (DRC) , IEEE, Newark, DE, USA, 2016 , pp. 1-2.
- 72 T. Tuma, A. Pantazi, M. L. Gallo, A. Sebastian, E. Eleftheriou, Nat. Nanotechnol. 2016 , 11, 693.
73 A. Mehonic, A. J. Kenyon, Front. Neurosci. 2016 , 10 , DOI 10.3389/fnins.2016.00057
- 74 A. Sengupta, A. Ankit, K. Roy, in 2017 International Joint Conference on Neural Networks (IJCNN) , IEEE, Anchorage, AK, USA, 2017 , pp. 4557-4563.
- 75 P. U. Diehl, D. Neil, J. Binas, M. Cook, S.-C. Liu, M. Pfeiffer, in 2015 International Joint Conference on Neural Networks (IJCNN) , IEEE, Killarney, Ireland, 2015 , pp. 1-8.
- 76 A. Sengupta, Y. Ye, R. Wang, C. Liu, K. Roy, Front. Neurosci. 2019 , 13 , 95.
- 77 R. Midya, Z. Wang, S. Asapu, S. Joshi, Y. Li, Y. Zhuo, W. Song, H. Jiang, N. Upadhay, M. Rao, P. Lin, C. Li, Q. Xia, J. J. Yang, Adv. Electron. Mater. 2019 , 5, 1900060.
- 78 G. Bi, M. Poo, J. Neurosci. 1998 , 18 , 10464.
- 79 H. Shouval, Front. Comput. Neurosci. 2010 , DOI 10.3389/fncom.2010.00019.
- 80 Z. Brzosko, S. B. Mierau, O. Paulsen, Neuron 2019 , 103 , 563.
- 81 J. Seo, B. Brezzo, Y. Liu, B. D. Parker, S. K. Esser, R. K. Montoye, B. Rajendran, J. A. Tierno, L. Chang, D. S. Modha, D. J. Friedman, in 2011 IEEE Custom Integrated Circuits Conference (CICC) , IEEE, San Jose, CA, USA, 2011 , pp. 1-4.
- 82 D. Kuzum, R. G. D. Jeyasingh, B. Lee, H.-S. P. Wong, Nano Lett. 2012 , 12 , 2179.
- 83 S. Kim, C. Du, P. Sheridan, W. Ma, S. Choi, W. D. Lu, Nano Lett. 2015 , 15 , 2203.
- 84 K. Zarudnyi, A. Mehonic, L. Montesi, M. Buckwell, S. Hudziak, A. J. Kenyon, Front. Neurosci. 2018 , 12 , 57.
- 85 S. Kim, M. Ishii, S. Lewis, T. Perri, M. BrightSky, W. Kim, R. Jordan, G. W. Burr, N. Sosa, A. Ray, J.-P. Han, C. Miller, K. Hosokawa, C. Lam, in 2015 IEEE International Electron Devices Meeting (IEDM) , IEEE, Washington, DC, USA, 2015 , pp. 17.1.1-17.1.4.
- 86 I. Boybat, M. Le Gallo, S. R. Nandakumar, T. Moraitis, T. Parnell, T. Tuma, B. Rajendran, Y. Leblebici, A. Sebastian, E. Eleftheriou, Nat Commun 2018 , 9 , 2514.
- 87 A. Serb, J. Bill, A. Khiat, R. Berdan, R. Legenstein, T. Prodromakis, Nat Commun 2016 , 7 , 12611.
- 88 Y. Fang, Z. Wang, J. Gomez, S. Datta, A. I. Khan, A. Raychowdhury, Front. Neurosci. 2019 , 13 , 855.
- 89 N. Anwani, B. Rajendran, Neurocomputing 2020 , 380 , 67.
- 90 F. Zenke, S. Ganguli, Neural Computation 2018 , 30 , 1514.
- 91 E. O. Neftci, H. Mostafa, F. Zenke, IEEE Signal Process. Mag. 2019 , 36 , 51.
- 92 S. R. Nandakumar, I. Boybat, M. L. Gallo, E. Eleftheriou, A. Sebastian, B. Rajendran, arXiv:1905.11929 [cs] 2019 .
- 93 W. Maass, Proc. IEEE 2014 , 102 , 860.
- 94 M. Payvand, M. V. Nair, L. K. Müller, G. Indiveri, Faraday Discuss. 2019 , 213 , 487.
- 95 D. Koller, N. Friedman, Probabilistic Graphical Models: Principles and Techniques , MIT Press, Cambridge, MA, 2009 .
- 96 R. J. Williams, D. Zipser, Neural Computation 1989 , 1 , 270.
- 97 A. Griewank, A. Walther, Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation , Society For Industrial And Applied Mathematics, Philadelphia, PA, 2008 .
- 98 J. H. Lee, T. Delbruck, M. Pfeiffer, Front. Neurosci. 2016 , 10 , DOI 10.3389/fnins.2016.00508.
- 99 H. Mostafa, IEEE Trans. Neural Netw. Learning Syst. 2017 , 1.
- 100 D. Huh, T. J. Sejnowski, arXiv:1706.04698 [cs, q-bio, stat] 2017 .
- 101 E. O. Neftci, C. Augustine, S. Paul, G. Detorakis, Front. Neurosci. 2017 , 11 , 324.
- 102 B. Nessler, M. Pfeiffer, L. Buesing, W. Maass, PLoS Comput Biol 2013 , 9 , e1003037
- 103 R. M. Neal, Artificial Intelligence 1992 , 56 , 71.
- 104 Y. Tang, R. Salakhutdinov, Advances In Neural Information Processing Systems 2013 (Pp. 530-538).
- 105 Y. Bengio, N. Léonard, A. Courville, arXiv:1308.3432 [cs] 2013 .
- 106 G. E. Hinton, S. Osindero, Y.-W. Teh, Neural Computation 2006 , 18 , 1527
- 107 R. M. Neal, Artificial Intelligence 1992 , 56 , 71.
- 108 D. Barber, P. Sollich, in Advances In Neural Information Processing Systems 2000 (Pp. 393-399).
- 109 T. Raiko, M. Berglund, G. Alain, L. Dinh, arXiv:1406.2989 [cs, stat] 2015 .
- 110 S. Gu, S. Levine, I. Sutskever, A. Mnih, arXiv:1511.05176 [cs] 2016 .
- 111 H. Jang, O. Simeone, B. Gardner, A. Gruning, IEEE Signal Process. Mag. 2019 , 36 , 64.
- 112 J. Brea, W. Senn, J.-P. Pfister, Journal of Neuroscience 2013 , 33 , 9565
- 113 B. Rosenfeld, O. Simeone, B. Rajendran, in 2019 IEEE 20th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC) , IEEE, Cannes, France, 2019 , pp. 1-5.
- 114 S. Mohamed, M. Rosca, M. Figurnov, A. Mnih, arXiv:1906.10652 [cs, math, stat] 2019 .
- 115 H. Jang, O. Simeone, in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , IEEE, Brighton, United Kingdom, 2019 , pp. 3382-3386.
S.-C. Liu, B. Rueckauer, E. Ceolini, A. Huber, T. Delbruck,
C. Xu, W. Zhang, Y. Liu, P. Li,
Front. Neurosci.
,
IEEE Signal Process. Mag.
, 104.
118 A. Bagheri, O. Simeone, B. Rajendran, in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , IEEE, Calgary, AB, 2018 , pp. 2986-2990.
N. Kasabov,
Neural Networks
,
23 , 16
120 H. Mostafa, G. Cauwenberghs, Neural Computation 2018 , 30 , 1542.
N. Skatchkovsky, H. Jang, O. Simeone, arXiv:1910.09594 [cs, eess, stat]
D. E. Rumelhart, G. E. Hinton, R. J. Williams,
,
, 533.
Nature
123 D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, D. Hassabis, Nature 2017 , 550 , 354.
E. Strubell, A. Ganesh, A. McCallum,
Q. Xia, J. J. Yang,
Nat. Mater.
arXiv:1906.02243 [cs]
,
, 309.
M. Rahimi Azghadi, Y.-C. Chen, J. K. Eshraghian, J. Chen, C.-Y. Lin, A. Amirsoleimani, A. Mehonic, A. J.
Kenyon, B. Fowler, J. C. Lee, Y.-F. Chang,
Advanced Intelligent Systems
, 1900189.
The 'echo State' approach to Analysing and Training Recurrent Neural Networks
H. Jaeger,
German National Research Institute For Computer Science,
W. Maass, T. Natschläger, H. Markram,
.
Neural Computation
L. Manneschi, E. Vasilaki,
Nat Mach Intell
,
M. Hermans, B. Schrauwen,
, 155
Neural Computation
,
C. Du, F. Cai, M. A. Zidan, W. Ma, S. H. Lee, W. D. Lu,
, 104.
Nat Commun
L. Manneschi, A. C. Lin, E. Vasilaki, arXiv:1912.08124 [cs, stat]
,
.
M. Lanza, H.-S. P. Wong, E. Pop, D. Ielmini, D. Strukov, B. C. Regan, L. Larcher, M. A. Villena, J. J. Yang,
L. Goux, A. Belmonte, Y. Yang, F. M. Puglisi, J. Kang, B. Magyari-Köpe, E. Yalon, A. Kenyon, M. Buckwell,
A. Mehonic, A. Shluger, H. Li, T.-H. Hou, B. Hudec, D. Akinwande, R. Ge, S. Ambrogio, J. B. Roldan, E.
Miranda, J. Suñe, K. L. Pey, X. Wu, N. Raghavan, E. Wu, W. D. Lu, G. Navarro, W. Zhang, H. Wu, R. Li, A.
Holleitner, U. Wurstbauer, M. C. Lemme, M. Liu, S. Long, Q. Liu, H. Lv, A. Padovani, P. Pavan, I. Valov, X.
Jing, T. Han, K. Zhu, S. Chen, F. Hui, Y. Shi,
Adv. Electron. Mater.
Commun. ACM
J. B. Aimone,
J. D. Kendall, S. Kumar,
,
, 110.
Applied Physics Reviews
,
, 011305.
, 1800143.
.
,
,
.
14 , 2531.
, 2204.
,
36 , 29.
, GMD -