## Memristors - from In-memory computing, Deep Learning Acceleration, Spiking Neural Networks, to the Future of Neuromorphic and Bio-inspired Computing
Adnan Mehonic * , Abu Sebastian, Bipin Rajendran, Osvaldo Simeone, Eleni Vasilaki, Anthony J. Kenyon
Dr. Adnan Mehonic, Prof. Anthony J. Kenyon
Department of Electronic & Electrical Engineering, UCL, Torrington Place, London WC1E 7JE, United Kingdom
E-mail: adnan.mehonic.09@ucl.ac.uk
Dr. Abu Sebastian
IBM Research - Zurich, 8803 Rüschlikon, Switzerland
Dr. Bipin Rajendran, Prof. Osvaldo Simeone
Centre for Telecommunications Research, Department of Engineering, King's College London, WC2R 2LS, United Kingdom
Prof. Eleni Vasilaki
Department of Computer Science, University of Sheffield, Sheffield, South Yorkshire, United Kingdom
Keywords: memristor, neuromorphic, AI, deep learning, spiking neural networks, in-memory computing
## Abstract
Machine learning, particularly in the form of deep learning, has driven most of the recent fundamental developments in artificial intelligence. Deep learning is based on computational models that are, to a certain extent, bio-inspired, as they rely on networks of connected simple computing units operating in parallel. Deep learning has been successfully applied in areas such as object/pattern recognition, speech and natural language processing, self-driving vehicles, intelligent self-diagnostic tools, autonomous robots, knowledgeable personal assistants, and monitoring. These successes have been mostly supported by three factors: the availability of vast amounts of data, continuous growth in computing power, and algorithmic innovations. The approaching demise of Moore's law, and the consequently modest improvements in computing power expected from further scaling, raise the question of whether the described progress will be slowed or halted due to hardware limitations. This paper reviews the case for a novel beyond-CMOS hardware technology - memristors - as a potential solution for the implementation of power-efficient in-memory computing, deep learning accelerators, and spiking neural networks. Central themes are the reliance on non-von Neumann computing architectures and the need to develop tailored learning and inference algorithms. To argue that lessons from biology can be useful in providing directions for further progress in artificial intelligence, we briefly discuss an example based on reservoir computing. We conclude the review by speculating on the 'big picture' view of future neuromorphic and brain-inspired computing systems.
## 1. Introduction
Three factors currently drive the main developments in artificial intelligence (AI): the availability of vast amounts of data, continuous growth in computing power, and algorithmic innovations. Graphics processing units (GPUs) have been demonstrated as effective coprocessors for the implementation of machine learning (ML) algorithms based on deep learning (DL). Solutions based on deep learning and GPU implementations have led to massive improvements in many AI tasks, but have also caused an exponential increase in demand for computing power. Recent analyses show that the demand for computing power has increased by a factor of 300,000 since 2012, and the estimate is that this demand will double every 3.4 months - a much faster rate than the improvements made historically through Moore's scaling (a 7-fold improvement over the same period of time) [1]. At the same time, Moore's law has been slowing down significantly for the last few years [2], and there are strong indications that we will not be able to continue scaling down CMOS transistors. This calls for the exploration of alternative technology roadmaps for the development of scalable and efficient AI solutions.
Transistor scaling is not the only way to improve computing performance. Architectural innovations such as GPUs, field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs) have all significantly advanced the ML field [3]. A common aspect of modern computing architectures for ML is a move away from the classical von Neumann architecture that physically separates memory and computing. This separation creates a performance bottleneck, caused by costly data movement, that is often the main reason for both the energy and speed inefficiency of ML implementations on conventional hardware platforms. However, architectural developments alone are not likely to be sufficient. In fact, standard digital CMOS components are inherently not well suited for the implementation of a massive number of continuous weights/synapses in artificial neural networks (ANNs).
1.1. The promise of memristors. There is a strong case to be made for the exploration of alternative technologies. Although memristor technology is still in development, it is a strong candidate for future non-CMOS and beyond von-Neumann computing solutions [4]. Since its early development in 2008 [5], or even earlier under different names [6], memristor technology has expanded remarkably to include many different material solutions, physical mechanisms, and novel computing approaches [4]. A single progress report cannot cover all the different approaches and fast-growing developments in the field; an evaluation of the state of the art in memristor-based electronics can be found elsewhere [7]. Instead, in this paper, we present and discuss a few representative case studies, showcasing the potential role of memristors in the expanding field of AI hardware. We present examples of how memristors are used for in-memory computing systems, deep learning accelerators, and spike-based computing. Finally, we discuss and speculate on the future of neuromorphic and bio-inspired computing paradigms, providing reservoir computing as an example.
For the last 15 years, memristors have been a focal point for many different research communities - mathematicians, solid-state physicists, experimental material scientists, electrical engineers and, more recently, computer scientists and computational neuroscientists. The concept of the memristor was introduced almost 50 years ago, back in 1971 [8], and was then nearly forgotten for almost four decades. It is now experiencing a rebirth with a vibrant and very active research community. There are many different flavours of memristive technologies. Still, in their most popular implementation, memristors are simple two-terminal devices with the extraordinary property that their resistance depends on their history of electrical stimuli. In other words, memristors are resistors with memory. They promise high levels of integration, stable non-volatile resistance states, fast resistance switching, and excellent energy efficiency - all very desirable properties for the next generation of memory technologies.
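The phrase "resistors with memory" can be made concrete with the classic linear ion-drift model often used to introduce memristive behaviour. The sketch below is purely illustrative: all parameter values are hypothetical choices for demonstration, not measurements of any real device.

```python
import numpy as np

# Illustrative linear ion-drift memristor model (hypothetical parameters).
R_ON, R_OFF = 1e2, 1.6e4   # fully-ON / fully-OFF resistances (ohms)
D = 10e-9                  # switching-film thickness (m)
MU = 1e-14                 # dopant mobility (m^2 s^-1 V^-1)

def simulate(v, dt, x0=0.1):
    """Integrate the internal state x = w/D under a voltage waveform v."""
    x, resistances = x0, []
    for vk in v:
        r = R_ON * x + R_OFF * (1.0 - x)   # resistance depends on state
        i = vk / r                          # Ohm's law
        x += MU * R_ON / D**2 * i * dt      # state drifts with charge flow
        x = min(max(x, 0.0), 1.0)           # state is bounded by the film
        resistances.append(r)
    return np.array(resistances)

# A positive pulse lowers the resistance; a negative pulse raises it again,
# so the final resistance depends on the stimulus history:
t = np.linspace(0, 1e-3, 1000)
v = np.where(t < 0.5e-3, 1.0, -1.0)        # +1 V for 0.5 ms, then -1 V
r = simulate(v, dt=1e-6)
```

The key point of the model is that the device resistance at any instant is a function of the total charge that has flowed through it, which is exactly the "memory" property exploited by the applications discussed below.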
The physical implementations of memristors are broad and arguably include many different technologies, such as redox-based resistive random-access memory (ReRAM), phase-change memory (PCM), and magnetoresistive random-access memory (MRAM). Further differentiation within these larger classes can be made depending on the physical mechanisms that govern the resistance change. Many excellent reviews cover the principles and switching mechanisms of memristive devices. Here, we briefly discuss two extensively studied types, namely ReRAM and PCM.
Resistance switching is one of the most explored properties of memristive devices. A thin insulating film reversibly changes its electrical resistance - between an insulating state and a conducting state - under the application of an external electrical stimulus. For binary memory devices, two stable states are sought, typically called the high resistance state (HRS), and the low resistance state (LRS). The transition from the HRS to the LRS is called a SET process, while a RESET process describes the transition from the LRS to the HRS.
Basic memory cells of both types, in their most straightforward implementation, have three layers - two conductive electrodes and a thin switching layer sandwiched in-between. Local redox processes govern resistance switching in ReRAM devices. A broad classification can be made based on a distinction between switching that happens as a result of intrinsic properties of the switching material (typically oxides), and switching that is the result of in-diffusion of metal ions (typically from one of the metallic electrodes). The former type is called intrinsic switching, and the latter is called extrinsic switching [9]. Alternatively, a classification can be made depending on the main driving force for the redox process (thermal or electrical), or the type of ions that move. The three main classes are electrochemical metallization (or conductive bridge) ReRAMs (ECM), valence change ReRAMs (VCM), and thermochemical ReRAMs (TCM) [4].
Many ReRAM devices require an electroforming step prior to resistance switching. This can be considered a soft breakdown of the insulating material. A conductive filament is produced inside the insulating film as a result of the applied electrical bias. Modification of this conductive filament, driven by a local redox process, leads to the change of resistance. The diameter of the conductive filament is typically of the order of a few nanometers to a few tens of nanometers, and it does not depend on the size of the electrodes. Another, less common type is interface-type switching, which does not depend on the creation and modification of conductive filaments but can instead be driven by the formation of a tunnel or Schottky barrier across the whole interface between the electrode and the switching layer.
In the case of PCMs, the change of resistance is due to the crystallisation and amorphisation of phase-change materials. The amplitude and duration of applied voltage pulses control the phase transitions - the SET process changes the amorphous phase to a crystalline phase (HRS to LRS transition), and the RESET process changes the crystalline phase to an amorphous phase (LRS to HRS transition).
For many computing tasks, more than two states are required, and for most memristive devices, including ReRAMs and PCMs, many resistance states can be achieved. However, benchmarking of memristive devices for different applications, beyond pure digital memory, can be challenging and relies on many different parameters other than the number of different resistance states. We will discuss the main device properties in the context of different applications.
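Reaching one of these many intermediate resistance states in practice typically relies on a closed-loop program-and-verify procedure: apply a pulse, read the state, and repeat until the target is hit within tolerance. The sketch below illustrates only the control loop; `read`, `pulse_set`, `pulse_reset`, and the `ToyDevice` model are hypothetical stand-ins for real hardware operations.

```python
import random

def program_to_target(read, pulse_set, pulse_reset, target, tol=0.02, max_iters=50):
    """Toy closed-loop (program-and-verify) tuning sketch.
    All callables are hypothetical stand-ins for hardware operations."""
    for _ in range(max_iters):
        g = read()
        if abs(g - target) <= tol * target:
            return g       # within tolerance: done
        if g < target:
            pulse_set()    # SET pulses increase conductance
        else:
            pulse_reset()  # RESET pulses decrease conductance
    return read()

# Toy device: conductance moves by a noisy fixed step per pulse.
class ToyDevice:
    def __init__(self, g=1.0):
        self.g = g
    def read(self):
        return self.g
    def pulse_set(self):
        self.g += 0.05 * (1 + 0.2 * (random.random() - 0.5))
    def pulse_reset(self):
        self.g -= 0.05 * (1 + 0.2 * (random.random() - 0.5))

dev = ToyDevice()
g = program_to_target(dev.read, dev.pulse_set, dev.pulse_reset, target=1.6)
```

The loop converges despite cycle-to-cycle variability because each verify step corrects for the stochastic response of the previous pulse, which is why iterative programming is widely used to place devices at precise analog levels.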
1.2 The landscape of different approaches and applications. Memristors can be used in applications well beyond simple memory devices [10]. A 'big picture' landscape of memristor-based approaches for AI is shown in Figure 1. There is more than one way that memristors can perform computing. A unique feature of memristive devices is the ability to co-locate memory and computing, breaking the von Neumann bottleneck at the lowest, nanometre-scale level. One such approach is the concept of in-memory computing, which uses memory not only to store the data but also to perform computation at the same physical location. Furthermore, memristors have long been considered for deep learning acceleration. Specifically, memristive crossbar arrays physically represent the weights of artificial neural networks as conductances at each crosspoint. When voltages are applied on one side of the crossbar and currents are sensed on the orthogonal terminals, the array performs a vector-matrix multiplication in a single constant-time step using Kirchhoff's and Ohm's laws. Vector-matrix multiplications dominate most DL algorithms - hundreds of thousands are often needed during training and inference. When weights are implemented as memristor conductances, there is no need for the extensive power-hungry data movement required by conventional digital systems based on the von Neumann architecture.
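The crossbar multiply-accumulate operation described above can be sketched numerically: column currents are the sums of row voltages weighted by crosspoint conductances. The values below are illustrative, not taken from a real array.

```python
import numpy as np

# Sketch of a memristive crossbar computing i = G^T v in one analog step.
# Conductances G (siemens) sit at the crosspoints; the input vector is
# applied as row voltages and the column currents are sensed.
rng = np.random.default_rng(42)
G = rng.uniform(1e-6, 1e-4, size=(4, 3))   # 4 rows x 3 columns of devices
v = np.array([0.2, -0.1, 0.3, 0.05])       # input voltages on the rows

# Each column current sums v_i * G_ij (Ohm's law per device,
# Kirchhoff's current law per column):
i_out = G.T @ v

# The same result, spelled out device by device:
i_check = np.array([sum(v[r] * G[r, c] for r in range(4)) for c in range(3)])
assert np.allclose(i_out, i_check)
```

The point of the physical implementation is that the entire sum forms in the wires themselves, so the cost of the operation does not grow with the number of rows as it would in a sequential digital multiply-accumulate loop.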
Other, more bio-realistic concepts are also being explored. These include schemes relying on spike-based communication. The central premise of this approach can be summarised with the motto 'computing with time, not in time'. It has been shown that memristors can directly implement some functions of biological neurons and synapses, most importantly synapse-like plasticity and neuron-like integration and spiking. In these solutions, the information is encoded and transferred in the form of voltage or current spikes. Memristor resistances are used as proxies for synaptic strengths. More importantly, adjustment of the resistances is controlled according to local learning rules. One popular local learning rule is spike-timing-dependent plasticity (STDP), which dynamically adjusts a local state variable, such as conductance, based on the relative timing of spikes. In a simple example, the conductance of a memristive 'synapse' can be increased or decreased depending on the degree of overlap between pre- and post-synaptic voltage pulses. There also exist implementations that do not require overlapping pulses, instead utilising the volatile internal dynamics of memristive devices. Spike-based computing promises further improvements in power efficiency, taking inspiration from the remarkable efficiency of the human brain.
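The timing dependence of STDP can be made concrete with the standard pair-based exponential rule. The amplitudes and time constant below are illustrative placeholders, not values from any specific memristive synapse.

```python
import math

# Minimal pair-based STDP rule (all parameters are illustrative).
A_PLUS, A_MINUS = 0.01, 0.012   # potentiation / depression amplitudes
TAU = 20.0                      # plasticity time constant (ms)

def stdp_dw(t_pre, t_post):
    """Conductance change for one pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt >= 0:   # pre fires before post: strengthen (potentiation)
        return A_PLUS * math.exp(-dt / TAU)
    else:         # post fires before pre: weaken (depression)
        return -A_MINUS * math.exp(dt / TAU)

# Causal pairing potentiates, anti-causal pairing depresses,
# and the magnitude decays as the spikes move further apart:
dw_pot = stdp_dw(t_pre=10.0, t_post=15.0)
dw_dep = stdp_dw(t_pre=15.0, t_post=10.0)
```

In memristive implementations, this update is not computed explicitly; the overlap of shaped pre- and post-synaptic pulses across the device produces an analogous conductance change directly in the physics.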
Finally, we speculate that, for future developments in AI, new knowledge and computational models from the field of computational neuroscience could play a crucial role. Virtually all recent developments in ML and DL are driven by the field of computer science. At the same time, the algorithmic inspiration from neuroscience is mostly based on old models established as early as the 1950s. Although our understanding of the working principles of the biological brain is still in its infancy, novel brain-inspired architectural principles, beyond simple probabilistic deep learning approaches, could lead to higher-level cognitive functionalities. One such example is the concept of reservoir computing, which we discuss briefly in this paper. It is unlikely that current digital CMOS transistor technology can be optimized for the efficient implementation of much more dynamic and adaptive systems. In contrast, memristor-based systems, with their rich switching dynamics and many state variables, may provide a perfect substrate on which to build a new class of intelligent and efficient neuromorphic systems.
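As a flavour of the reservoir computing concept mentioned above, the following echo-state-network sketch trains only a linear readout on top of a fixed random recurrent network. The reservoir size, task (recalling the previous input sample), and all parameters are toy choices for illustration.

```python
import numpy as np

# Minimal echo-state-network sketch: the recurrent "reservoir" is fixed
# and random; only the linear readout is trained.
rng = np.random.default_rng(0)
N = 100                                     # reservoir size (toy choice)
W_in = rng.uniform(-0.5, 0.5, size=N)       # fixed input weights
W = rng.normal(0, 1, size=(N, N))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))   # spectral radius < 1 (echo state)

u = np.sin(0.2 * np.arange(500))            # input signal
target = np.roll(u, 1)                      # toy task: recall previous input

# Drive the reservoir and collect its states
x = np.zeros(N)
states = []
for uk in u:
    x = np.tanh(W @ x + W_in * uk)          # fixed, untrained dynamics
    states.append(x.copy())
X = np.array(states)[50:]                   # discard the initial transient
y = target[50:]

# Ridge-regression readout - the only trained component
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(N), X.T @ y)
pred = X @ W_out
mse = np.mean((pred - y) ** 2)
```

The appeal for memristive hardware is that the fixed, dynamic, imperfect reservoir does not need to be programmed precisely; device variability and rich internal dynamics become a feature rather than a defect, and only the simple readout must be learned.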
Figure 1. The landscape of memristor-based systems for Artificial Intelligence. In-memory computing aims to eliminate the von-Neumann bottleneck by implementing computation directly within the memory. Deep learning accelerators based on memristive crossbars implement vector-matrix multiplication directly using Ohm's and Kirchhoff's laws. Spiking neural networks, a type of artificial neural network, are biologically more plausible and do not operate with continuous signals, but use spikes to process and transfer data. Memristive systems could provide a hardware platform to implement spike-based learning and inference. More complex (neuromorphic) functionalities, beyond the simple digital CMOS switching paradigm, directly implemented in memristive hardware primitives, might fuel the next wave of higher cognitive systems.
<details>
<summary>Image 1 Details</summary>

### Visual Description
## Diagram: Memristor-Based Computing Architectures and Applications
### Overview
The image presents a conceptual framework for memristor-based computing technologies, divided into four quadrants surrounding a central oval labeled "Memristors." Each quadrant explores a distinct application or architectural approach, emphasizing efficiency improvements, neural network implementations, and future cognitive computing paradigms.
---
### Components/Axes
1. **Central Oval (Memristors)**
- Contains two material diagrams:
- **PCM (Phase-Change Memory)**: Blue spheres with a pink core.
- **ReRAM (Resistive RAM)**: Pink spheres with a blue core.
- Text: "Memristors" (bold black font).
2. **Quadrant Labels**
- **Top-Left**: "In-memory computing"
- **Top-Right**: "Memristor crossbar array" and "Deep Learning Accelerators"
- **Bottom-Left**: "Memristor-based Spiking Neural Networks"
- **Bottom-Right**: "Future of cognitive computing"
3. **Diagram Elements**
- **Top-Left**:
- Conventional von-Neumann architecture (green box with "COMPUTE" and "MEMORY" labels).
- Arrows indicate data flow: "Bringing computing closer to memory."
- Text: "Minimising von-Neumann bottleneck improves efficiency."
- **Top-Right**:
- Memristor crossbar array (grid of green lines with yellow nodes labeled "Memristor").
- "Operating/Sensing Terminals" labeled on grid edges.
- Equations for analog MAC accelerator:
- Input: Multiplication matrix **G** mapped to RRAM crossbar.
- Output: Current vector **I** = **Y·G** (vector-matrix product).
- **Bottom-Left**:
- Neuron-like structure with dendritic spines (pink/red dots) and synaptic terminals.
- Text: "Spike-based learning and inference."
- **Bottom-Right**:
- Brain silhouette with labeled cognitive traits: "Attention," "Creativity," "Speed," "Focus," "Flexibility," "Memory."
- Green arrow pointing to "Novel bio-inspired algorithms, devices and systems."
---
### Detailed Analysis
1. **Top-Left (In-memory computing)**
- Conventional architecture shows separation between memory (blue) and logic (green).
- Memristor integration reduces data movement, improving efficiency.
2. **Top-Right (Memristor crossbar array)**
- Grid structure represents crossbar arrays with memristors at intersections.
- Analog MAC accelerator equations suggest parallel computation capabilities.
3. **Bottom-Left (Spiking Neural Networks)**
- Neuron model includes dendritic spines (pink/red) and synaptic terminals.
- Memristor array acts as synaptic weights for spike-based learning.
4. **Bottom-Right (Future cognitive computing)**
- Brain silhouette emphasizes human-like cognitive traits.
- Green arrow links memristors to bio-inspired systems.
---
### Key Observations
- **Color Coding**:
- Blue (PCM), Pink (ReRAM), Green (crossbar arrays), Orange (neurons).
- **Efficiency Focus**: All quadrants emphasize reducing energy consumption (e.g., "power-efficient analog MAC accelerator").
- **Biological Inspiration**: Spiking neural networks and brain-like cognitive traits suggest neuromorphic computing goals.
---
### Interpretation
The diagram illustrates memristors as a foundational technology for next-generation computing systems. By integrating memory and logic (top-left), enabling analog crossbar arrays (top-right), and mimicking biological neural processes (bottom-left), memristors address von-Neumann bottlenecks and enable energy-efficient AI. The bottom-right quadrant ties these advancements to broader cognitive applications, suggesting memristors could underpin systems with human-like adaptability and creativity. The emphasis on "novel bio-inspired algorithms" implies a shift toward neuromorphic computing paradigms that prioritize parallelism and low-power operation.
</details>
## 2. In-memory computing
In the von Neumann architecture, which dates back to the 1940s, memory and processing units are physically separated, and large amounts of data need to be shuttled back and forth between them during the execution of various computational tasks. The latency and energy associated with accessing data from the memory units are key performance bottlenecks for a range of applications, in particular for the increasingly prominent artificial intelligence related workloads [11]. The energy cost associated with moving data is a key challenge both for severely energy-constrained mobile and edge computing and, due to cooling constraints, for high-performance computing in a cloud environment. The current approaches, such as using hundreds of processors in parallel [12] or application-specific processors [13], are not likely to fully overcome the challenge of data movement. It is becoming increasingly clear that novel architectures need to be explored where memory and processing are better collocated. In-memory computing is one such non-von Neumann approach, where certain computational tasks are performed in place in the memory itself, organized as a computational memory unit [14, 15, 16, 17]. As schematically illustrated in Figure 2, in-memory computing obviates the need to move data into a processing unit. Computing is performed by exploiting the physical attributes of the memory devices, their array-level organization, the peripheral circuitry, as well as the control logic. In this paradigm, the memory is an active participant in the computational task. Besides reducing the latency and energy cost associated with data movement, in-memory computing also has the potential to improve the computational time complexity of certain tasks due to the massive parallelism afforded by a dense array of millions of nanoscale memory devices serving as compute units.
By introducing physical coupling between the memory devices, there is also a potential for further reduction in computational time complexity [18, 19]. Memristive devices such as PCM, ReRAM and MRAM [20, 21] are particularly well suited for in-memory computing.
Figure 2. In-memory computing. In a conventional computing system, when an operation f is performed on data D, D has to be moved into a processing unit. This incurs significant latency and energy cost and creates the well-known von Neumann bottleneck. With in-memory computing, f(D) is performed within a computational memory unit by exploiting the physical attributes of the memory devices. This obviates the need to move D to the processing unit. (Adapted and reproduced with permission [14] , Copyright 2017, Nature Research)
<details>
<summary>Image 2 Details</summary>

### Visual Description
## Diagram: Memory and Processing Unit Architectures Comparison
### Overview
The image presents two side-by-side diagrams comparing memory and processing unit architectures. The left diagram illustrates a conventional architecture with a clear separation between memory banks and processing units, while the right diagram shows a computational memory architecture with integrated processing capabilities. Both diagrams include control units, ALUs (Arithmetic Logic Units), and memory banks, but differ in their data flow and computational capabilities.
### Components/Axes
**Left Diagram (Conventional Architecture):**
- **Memory Section**:
- Contains multiple "Bank #1" to "Bank #N" blocks, each with two square elements.
- Labels: "Memory", "Bank #1", "Bank #N".
- **Processing Unit Section**:
- Contains "Control unit" and "ALU" blocks.
- Labels: "Processing unit", "Control unit", "ALU".
- **Data Flow**:
- Arrows labeled "CONTROL" (blue) connect memory banks to the control unit.
- Arrows labeled "FETCH" (blue) and "STORE" (red) connect the cache to the ALU.
- A "bottleneck" label (red) highlights the fetch/store pathway.
- **Function Notation**: "A := f(A)" appears above the processing unit.
**Right Diagram (Computational Memory Architecture):**
- **Memory Section**:
- Contains "Computational memory" with "Bank #1" to "Bank #N" blocks.
- Labels: "Computational memory", "Bank #1", "Bank #N".
- **Processing Unit Section**:
- Contains "Control unit" and "ALU" blocks.
- Labels: "Processing unit", "Control unit", "ALU".
- **Data Flow**:
- Arrows labeled "CONTROL" (blue) connect computational memory to the control unit.
- Dashed arrows labeled "f" (function application) connect memory banks directly to the ALU.
- No explicit "bottleneck" label.
- **Function Notation**: "A := f(A)" appears above the processing unit.
**Shared Elements**:
- Both diagrams use color-coded arrows:
- Blue: "CONTROL" signals.
- Red: "STORE" operations (left diagram only).
- Pink: "bottleneck" highlight (left diagram only).
- Both include "Cache" blocks connected to the ALU.
### Detailed Analysis
**Left Diagram**:
- Memory banks are isolated from processing units.
- Data must pass through the control unit before reaching the ALU.
- Fetch/store operations create a bottleneck, indicated by the red label and thicker arrow.
- Function application (f(A)) occurs after data retrieval from memory.
**Right Diagram**:
- Memory banks are labeled "Computational memory," implying integrated computation.
- Dashed arrows labeled "f" suggest in-memory computation (A := f(A)).
- No explicit bottleneck, as computation occurs closer to memory.
- Direct data flow from memory to ALU via function application.
### Key Observations
1. **Bottleneck Elimination**: The right diagram removes the fetch/store bottleneck present in the left diagram.
2. **In-Memory Computation**: The right diagram introduces function application (f) directly in memory banks, enabling computation without data movement.
3. **Architectural Integration**: Computational memory in the right diagram blurs the line between storage and processing.
4. **Control Unit Role**: Both architectures retain a central control unit, but its role differs:
- Left: Manages data flow between isolated components.
- Right: Coordinates integrated memory-processing operations.
### Interpretation
The diagrams contrast traditional von Neumann architectures (left) with emerging computational memory designs (right). The left diagram's bottleneck highlights the performance limitations of separating memory and processing. The right diagram's integration of computation into memory banks suggests:
- Reduced latency through in-memory operations.
- Lower energy consumption by minimizing data movement.
- Potential for parallel computation across memory banks.
This architectural shift aligns with trends in neuromorphic and in-memory computing, where processing units are embedded within memory hierarchies to address the "memory wall" problem. The function notation (f(A)) implies support for complex operations directly in memory, which could enable applications like AI/ML workloads with reduced data transfer overhead.
</details>
Figure 3. The key physical attributes of memristive devices that facilitate in- memory computing . a) Binary storage capability whereby the devices can be switched between high and low resistance values in a repeatable manner (Adapted and reproduced with permission [22] . Copyright 2019, IOP Publishing). b) Multi- level storage capability whereby the devices can be programmed to a continuum of resistance values by the application of appropriate programming pulses (Adapted and reproduced with permission [23] . Copyright 2018, American Institute of Physics) c) The accumulative behavior whereby the resistance of a device can be progressively decreased by the successive application of identical programming pulses (Adapted and reproduced with permission [23] . Copyright 2018, American Institute of Physics).
<details>
<summary>Image 3 Details</summary>

### Visual Description
## Line Graphs: Resistance vs. Cycles, Programming Current, and Pulses
### Overview
The image contains three line graphs (a, b, c) depicting resistance (Ω) as a function of different operational parameters: (a) number of cycles, (b) programming current (μA), and (c) number of pulses. Each graph includes insets illustrating a device structure with layered components (electrode, active layer, electrode) and red dots representing localized features. The graphs use logarithmic scales for resistance and linear scales for operational parameters.
---
### Components/Axes
#### Graph a: Resistance vs. Number of Cycles
- **X-axis**: Number of cycles (logarithmic scale: 10⁰ to 10¹⁰)
- **Y-axis**: Resistance (Ω) (logarithmic scale: 10⁴ to 10⁷)
- **Legend**:
- Black squares: "SET"
- Red dots: "RESET"
- **Insets**: Two device diagrams showing layered structures with red dots in the active layer.
#### Graph b: Resistance vs. Programming Current
- **X-axis**: Programming current (μA) (linear scale: 100 to 800)
- **Y-axis**: Resistance (Ω) (logarithmic scale: 10⁴ to 10⁷)
- **Legend**:
- Purple circles: "SET"
- Green squares: "RESET"
- Blue triangles: "RESET+SET"
- Red diamonds: "RESET+SET+RESET"
- **Insets**: Three device diagrams with red dots in the active layer.
#### Graph c: Resistance vs. Number of Pulses
- **X-axis**: Number of pulses (linear scale: 0 to 30)
- **Y-axis**: Resistance (Ω) (logarithmic scale: 10⁴ to 10⁷)
- **Legend**:
- Purple circles: "SET"
- Green squares: "RESET"
- Blue triangles: "RESET+SET"
- Red diamonds: "RESET+SET+RESET"
- **Insets**: Three device diagrams with red dots in the active layer.
---
### Detailed Analysis
#### Graph a: Resistance vs. Number of Cycles
- **SET (black squares)**: Resistance decreases from ~10⁶ Ω to ~10⁴ Ω over 10⁸ cycles. The trend is monotonic and gradual.
- **RESET (red dots)**: Resistance remains constant at ~10⁷ Ω across all cycles. No variation observed.
- **Key Data Points**:
- At 10² cycles: SET ≈ 10⁶ Ω, RESET ≈ 10⁷ Ω.
- At 10⁸ cycles: SET ≈ 10⁴ Ω, RESET ≈ 10⁷ Ω.
#### Graph b: Resistance vs. Programming Current
- **SET (purple circles)**: Resistance increases from ~10⁵ Ω to ~10⁶ Ω as current rises from 100 to 800 μA.
- **RESET (green squares)**: Resistance remains flat at ~10⁵ Ω regardless of current.
- **RESET+SET (blue triangles)**: Resistance increases from ~10⁵ Ω to ~10⁶ Ω, similar to SET but with a steeper slope.
- **RESET+SET+RESET (red diamonds)**: Resistance increases from ~10⁵ Ω to ~10⁶ Ω, with a plateau at higher currents.
- **Key Data Points**:
- At 100 μA: SET ≈ 10⁵ Ω, RESET ≈ 10⁵ Ω, RESET+SET ≈ 10⁵ Ω, RESET+SET+RESET ≈ 10⁵ Ω.
- At 800 μA: SET ≈ 10⁶ Ω, RESET ≈ 10⁵ Ω, RESET+SET ≈ 10⁶ Ω, RESET+SET+RESET ≈ 10⁶ Ω.
#### Graph c: Resistance vs. Number of Pulses
- **SET (purple circles)**: Resistance decreases from ~10⁶ Ω to ~10⁵ Ω over 30 pulses.
- **RESET (green squares)**: Resistance remains flat at ~10⁵ Ω.
- **RESET+SET (blue triangles)**: Resistance decreases from ~10⁶ Ω to ~10⁵ Ω, with a slower rate than SET.
- **RESET+SET+RESET (red diamonds)**: Resistance decreases from ~10⁶ Ω to ~10⁵ Ω, with a gradual decline.
- **Key Data Points**:
- At 0 pulses: SET ≈ 10⁶ Ω, RESET ≈ 10⁵ Ω, RESET+SET ≈ 10⁶ Ω, RESET+SET+RESET ≈ 10⁶ Ω.
- At 30 pulses: SET ≈ 10⁵ Ω, RESET ≈ 10⁵ Ω, RESET+SET ≈ 10⁵ Ω, RESET+SET+RESET ≈ 10⁵ Ω.
---
### Key Observations
1. **SET Operation**: Resistance decreases with cycles (graph a) and increases with programming current (graph b). In graph c, resistance decreases with pulses.
2. **RESET Operation**: Resistance remains constant across all parameters (graphs a, b, c).
3. **Combined Sequences**:
- "RESET+SET" and "RESET+SET+RESET" show intermediate resistance values between SET and RESET.
- Resistance trends for combined sequences are less pronounced than for individual operations.
4. **Device Structure**: Insets in all graphs show a layered device with red dots in the active layer, likely representing defects or active regions critical for resistance modulation.
---
### Interpretation
- **SET/RESET Dynamics**: The device exhibits non-volatile switching behavior, where resistance changes persist across cycles (graph a) and pulses (graph c). The SET operation reduces resistance, while RESET maintains it.
- **Current Dependence**: Resistance modulation is current-dependent (graph b), with higher currents inducing larger resistance changes. This suggests a threshold effect for SET/RESET operations.
- **Pulse Effects**: Repeated pulses (graph c) reduce resistance, indicating potential fatigue or stabilization of the active layer. The gradual decline suggests a memory effect or degradation mechanism.
- **Device Mechanism**: The red dots in the active layer (insets) may represent localized defects or phase-separated regions that govern resistance switching. Their distribution could influence the device's response to electrical stimuli.
The data collectively demonstrate a resistive switching memory device with tunable resistance via electrical programming, current, and pulse sequences. The interplay between SET/RESET operations and device structure highlights the importance of material composition and defect engineering for optimizing performance.
</details>
There are several key physical attributes that enable in-memory computing using memristive devices. First of all, the ability to store two levels of resistance/conductance values in a nonvolatile manner and to reversibly switch from one level to the other (binary storage capability) can be exploited for computing. Figure 3a shows the resistance values achieved upon repeated switching of a representative PCM device between low-resistance SET states and high-resistance RESET states. Because a device retains either its SET or its RESET state, resistance can serve as an additional logic state variable. In conventional CMOS, voltage serves as the single logic state variable: input signals are processed as voltage signals and are output as voltage signals. By combining CMOS circuitry with memristive devices, it is possible to exploit the additional resistance state variable. For example, the RESET state could indicate logic '0' and the SET state could denote logic '1'. This enables logical operations that rely on the interaction between the voltage and resistance state variables and could enable the seamless integration of processing and storage. This is the essential idea behind memristive logic, which is an active area of research [24, 25, 26]. Memristive logic has the potential to impact application areas such as image processing [27], encryption and database queries [28]. Brain-inspired hyperdimensional computing, which involves the manipulation of large binary vectors, has recently emerged as another promising application area for in-memory logic [29, 30]. Going beyond binary storage, certain memristive devices can also be programmed to a continuum of resistance or conductance values (analog storage capability). For example, Figure 3b shows a continuum of resistance levels in a PCM device achieved by the application of programming pulses with varying amplitude.
The device is first programmed to the fully crystalline state, after which RESET pulses are applied with progressively increasing amplitude. The device resistance is measured after the application of each RESET pulse. Owing to this property, it is possible to program a memristive device to a desired resistance value through iterative programming, applying several pulses in a closed-loop manner [31]. Yet another physical attribute that enables in-memory computing is the accumulative behavior exhibited by certain memristive devices. In these devices, it is possible to progressively reduce the device resistance by the successive application of SET pulses with the same amplitude; in certain cases, it is also possible to progressively increase the resistance by the successive application of RESET pulses. An experimental measurement of this accumulative behavior in a PCM device is shown in Figure 3c. This accumulative behavior is central to applications such as the training of deep neural networks, described later. The intrinsic stochasticity associated with the switching behavior of memristive devices can also be exploited for in-memory computing [32]. Applications include stochastic computing [33] and physically unclonable functions [34].
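The closed-loop (program-and-verify) procedure mentioned above can be sketched in a few lines. The device model below is entirely hypothetical — the noise range, conductance values, and tolerance are illustrative assumptions, not measured behavior:

```python
import random

class ToyMemristor:
    """Hypothetical analog device: a pulse realises only a noisy fraction
    of the requested conductance change (all values are illustrative)."""
    def __init__(self, g=10e-6):
        self.g = g  # conductance in siemens
    def read(self):
        return self.g
    def apply_pulse(self, delta):
        # Sign of delta selects a SET-like (+) or RESET-like (-) pulse.
        self.g += delta * random.uniform(0.3, 0.9)

def program_to_target(dev, g_target, tol=0.2e-6, max_iter=50):
    """Program-and-verify: read, apply a corrective pulse, repeat until
    the conductance is within tolerance of the target."""
    for _ in range(max_iter):
        err = g_target - dev.read()
        if abs(err) <= tol:
            break
        dev.apply_pulse(err)
    return dev.read()
```

In practice the pulse polarity and amplitude would be chosen from device characterization data; here the sign of the error simply selects a SET-like or RESET-like correction.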
Figure 4. a) Compressed sensing involves one matrix-vector multiplication. Data recovery is performed via an iterative scheme, using several matrix-vector multiplications on the very same measurement matrix and its transpose. b) An experimental illustration of compressed sensing recovery in the context of image compression is presented, showing 50% compression of a 128x128-pixel image. The normalized mean square error (NMSE) associated with the reconstructed signal is plotted against the number of iterations. Adapted and reproduced with permission [35], Copyright 2018, IEEE.
A very useful in-memory computing primitive enabled by the binary and analog nonvolatile storage capability is matrix-vector multiplication (MVM) [36, 37]. The physical laws exploited to perform this operation are Ohm's law and Kirchhoff's current summation law. For example, to perform the operation Ax = b, the elements of A are mapped linearly to the conductance values of memristive devices organized in a crossbar configuration. The x values are mapped linearly to the amplitudes of read voltages and are applied to the crossbar along the rows. The result of the computation, b, is proportional to the resulting current measured along the columns of the array. Compressed sensing and recovery is one application that could benefit from an in-memory computing unit that performs matrix-vector multiplications. The objective of compressed sensing is to acquire a large signal at a sub-Nyquist sampling rate and to subsequently reconstruct that signal accurately. Unlike most other compression schemes, sampling and compression are done simultaneously: the signal is compressed as it is sampled. Such techniques have widespread applications in medical imaging, security systems, and camera sensors. The compressed measurement can be thought of as a mapping of a signal x of length N to a measurement vector y of length M < N. If this process is linear, it can be modeled by an M × N measurement matrix M. The idea is to store this measurement matrix in the in-memory computing unit, with memristive devices organized in a crossbar configuration (see Figure 4(a)). In this manner the compression operation can be performed in O(1) time complexity.
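As a rough illustration of the mapping just described, the sketch below performs an MVM with an idealised, noise-free crossbar model; the conductance range and read voltage are arbitrary placeholder values, and negative matrix entries are deliberately excluded for simplicity:

```python
import numpy as np

def crossbar_mvm(A, x, g_max=50e-6, v_read=0.2):
    """Idealised crossbar MVM: A is mapped linearly to conductances,
    x to read voltages. Ohm's law gives each device current G_ij * V_j,
    and Kirchhoff's law sums the currents along each output line."""
    assert (A >= 0).all(), "signed matrices need e.g. a differential pair"
    a_max, x_max = A.max(), np.abs(x).max()
    G = A / a_max * g_max          # conductances in [0, g_max] siemens
    V = x / x_max * v_read         # read voltages
    I = G @ V                      # summed column currents
    return I * (a_max * x_max) / (g_max * v_read)  # undo both scalings
```

Because both mappings are linear, the rescaled currents recover Ax exactly in this idealised model; real arrays add programming error, read noise, and wire resistance on top of this.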
To recover the original signal from the compressed measurements, an approximate message passing (AMP) algorithm can be used: an iterative algorithm that involves several matrix-vector multiplications on the very same measurement matrix and its transpose. In this way the same matrix that was coded in the in-memory computing unit can also be used for the reconstruction, reducing the reconstruction complexity from O(MN) to O(N). An experimental illustration of compressed sensing recovery in the context of image compression is shown in Figure 4(b). A 128x128-pixel image was compressed by 50% and recovered using the measurement matrix elements encoded in a PCM array. The normalized mean square error associated with the recovered signal is plotted as a function of the number of iterations. A remarkable property of AMP is that its convergence rate is independent of the precision of the matrix-vector multiplications. The lack of precision only results in a higher error floor, which may be considered acceptable for many applications. Note that, in this application, the measurement matrix remains fixed, and hence the property of PCM that is exploited is the multi-level storage capability.
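The reuse of the stored matrix for both compression and recovery can be illustrated with a simplified iterative-shrinkage loop (ISTA-style) standing in for full AMP, which additionally carries an Onsager correction term; every iteration needs exactly one multiply by M and one by its transpose, both of which the crossbar provides. All parameters here are illustrative:

```python
import numpy as np

def iterative_recover(M, y, n_iter=50, step=None, lam=0.05):
    """Simplified sparse-recovery loop (ISTA-style, standing in for AMP):
    one forward multiply by M and one backprojection by M.T per iteration,
    followed by soft-thresholding to enforce sparsity."""
    if step is None:
        step = 1.0 / np.linalg.norm(M, 2) ** 2   # 1 / spectral norm squared
    x = np.zeros(M.shape[1])
    for _ in range(n_iter):
        r = y - M @ x                    # residual (forward pass)
        x = x + step * (M.T @ r)         # backprojection through the array
        x = np.sign(x) * np.maximum(np.abs(x) - lam * step, 0.0)  # shrink
    return x
```

Both matrix products in the loop would be executed in the crossbar, so the digital host only performs the cheap elementwise shrinkage step.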
## 3. Deep learning accelerators
Figure 5. Deep learning based on in-memory computing. The various layers of a neural network are mapped to a computational memory unit where memristive devices are organized in a crossbar configuration. The synaptic weights are stored in the conductance state of the memristive devices. A global communication network is used to send data from one array to another. Adapted and reproduced with permission [17] , Copyright 2020, Nature Research.
Deep neural networks (DNNs), loosely inspired by biological neural networks, consist of parallel processing units called neurons interconnected by plastic synapses. By tuning the weights of these interconnections using millions of labelled examples, such networks can perform certain supervised learning tasks remarkably well. They are typically trained via a supervised learning algorithm based on gradient descent. During the training phase, the input data is forward-propagated through the neuron layers, with the synaptic networks performing multiply-accumulate operations. The final-layer responses are compared with the input data labels and the errors are back-propagated. Both steps involve sequences of matrix-vector multiplications. Subsequently, the synaptic weights are updated to reduce the error. This optimization approach can take multiple days or weeks to train state-of-the-art networks on conventional computers. Hence, there is a significant effort towards the design of custom ASICs based on reduced-precision arithmetic and highly optimized dataflow [13, 38]. However, the need to shuttle millions of synaptic weight values between the memory and the processing unit remains a key performance bottleneck, and hence in-memory computing is being explored as an alternative approach for both inference and training of DNNs [39, 40]. The essential idea is to map the various layers of a neural network to an in-memory computing unit where memristive devices are organized in a crossbar configuration (see Figure 5). The synaptic weights are stored in the conductance states of the memristive devices, and the propagation of data through each layer is performed in a single step by inputting the data to the crossbar rows and deciphering the results at the columns.
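One practical detail in this mapping is that conductances are non-negative while synaptic weights are signed. A common scheme represents each weight as the difference of two conductances (the differential configuration also used in the experiments discussed in this section). A minimal sketch, with an arbitrary conductance range:

```python
import numpy as np

def to_differential(W, g_max=25e-6):
    """Map signed weights onto two non-negative conductance arrays:
    G_plus carries the positive parts, G_minus the negative parts."""
    scale = np.abs(W).max() / g_max
    G_plus = np.clip(W, 0.0, None) / scale
    G_minus = np.clip(-W, 0.0, None) / scale
    return G_plus, G_minus, scale

def differential_mvm(G_plus, G_minus, scale, v):
    """Signed MVM: subtract the column currents of the two arrays."""
    return (G_plus @ v - G_minus @ v) * scale
```

Since G_plus - G_minus equals W up to the scale factor, subtracting the two current readouts recovers the signed product; the cost is two devices per weight.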
Figure 6. Deep learning inference. Experimental results on ResNet-32 using the CIFAR-10 dataset. The classification accuracies obtained via the direct mapping and custom training approaches are compared to the floating-point baseline. Adapted and reproduced with permission [40] , Copyright 2019, IEEE.
Deep learning inference refers to just the forward propagation in a DNN once the weights have been learned. Both the binary and the analogue storage capability of memristive devices can be exploited for the MVM operations associated with inference. The key challenges are the inaccuracies associated with programming the devices to a specified synaptic weight, as well as the drift and noise associated with the conductance values [41]. For these reasons, synaptic weights obtained by training in high-precision arithmetic (e.g. 32-bit floating point) cannot be mapped directly to computational memory. However, it can be shown that by customizing the training procedure to make it aware of these device-level nonidealities, it is possible to obtain synaptic weights that are suitable for mapping to an in-memory computing unit [42, 40]. A more recent approach uses committee machines composed of multiple smaller neural networks, and shows promise for increasing inference accuracy without increasing the number of devices [43]. Figure 6 shows mixed hardware/software experimental results using a prototype multi-level PCM chip. The synaptic weights are mapped to PCM devices organized in a 2-PCM differential configuration (723,444 PCM devices in total). It can be seen that the custom training scheme approaches the floating-point baseline, whereas the direct mapping approach fails to deliver sufficient accuracy. The slight temporal decline in accuracy is attributed to the conductance drift exhibited by PCM devices [44]. However, in spite of the drift, a classification accuracy of close to 90% is maintained over a significant duration of time.
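The idea of making training "aware" of device-level nonidealities can be illustrated at toy scale: inject a noise model of the programming inaccuracy into the forward pass during training, so the learned weights tolerate the perturbation at inference time. The multiplicative-noise model and all parameters below are illustrative assumptions, not the actual scheme of [42, 40]:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_forward(w, x, sigma):
    """Each use of the weights sees a perturbed copy, mimicking the
    inaccuracy of programmed conductances (multiplicative, zero-mean)."""
    return (w * (1.0 + sigma * rng.standard_normal(w.shape))) @ x

def train_noise_aware(X, y, sigma=0.05, lr=0.05, epochs=100):
    """Toy linear model trained through the noise model with SGD, so the
    learned weights stay accurate when perturbed at inference time."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            err = noisy_forward(w, x_i, sigma) - y_i
            w -= lr * err * x_i    # SGD step on the noisy prediction error
    return w
```

The same principle scales up to DNNs: because every training step already experiences the perturbation, the optimizer settles in regions of weight space that are robust to it.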
Figure 7. Deep learning training. a) Schematic illustration of the mixed-precision architecture for training DNNs. b) The synaptic weight distributions and classification accuracies are compared between the experiments and the floating-point baseline [45].
In-memory computing can also be used in the context of supervised training of DNNs with backpropagation. When training a DNN encoded in crossbar arrays, forward propagation is performed in the same way as the inference described above. Next, backward propagation is performed by inputting the error gradient from the subsequent layer onto the columns of the current layer and deciphering the result from the rows, from which the error gradient of the current layer is computed. Finally, the weight update is performed based on the outer product of the activations and error gradients of each layer. This weight update relies on the accumulative behaviour of memristive devices. Recent deep learning research shows that, when training DNNs, it is possible to perform the forward and backward propagations rather imprecisely, while the gradients need to be accumulated in high precision [46]. This observation makes the DL training problem amenable to the mixed-precision in-memory computing approach that was recently proposed [47]. The in-memory computing unit is used to store the synaptic weights and to perform the forward and backward passes, while the weight changes are accumulated in high precision (Figure 7(a)) [48, 49]. When the accumulated weight change exceeds a certain threshold, pulses are applied to the corresponding memory devices to alter the synaptic weights. This approach was tested on the handwritten digit classification problem based on the MNIST data set. A two-layered neural network was employed, with 2-PCM devices in a differential configuration (approx. 400,000 devices) representing the synaptic weights. The resulting test accuracy after 20 epochs of training was approx. 98% (Figure 7(b)). After training, inference on this network was performed for over a year with marginal reduction in test accuracy. The crossbar topology also facilitates the estimation of the gradient and the in-place update of the resulting synaptic weight, all in O(1) time complexity [50, 39].
By obviating the need to perform gradient accumulation externally, this approach could yield better performance than the mixed-precision approach. However, significant improvements to the memristive technology, in particular to the accumulative behavior, are needed to apply it to a wide range of DNNs [51, 52].
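The accumulate-and-pulse rule of the mixed-precision scheme can be sketched as follows. The pulse granularity eps is a stand-in for the real device's conductance change per pulse, and np.trunc replaces the floor(χ/ε) of the schematic so that positive and negative accumulations are treated symmetrically — both choices are illustrative:

```python
import numpy as np

def mixed_precision_update(w_device, chi, d_w, eps):
    """One step of the accumulate-and-pulse rule: gradient contributions
    d_w are accumulated in the high-precision variable chi; the device is
    programmed only when chi crosses the pulse granularity eps, and the
    sub-pulse remainder is carried over to the next step."""
    chi = chi + d_w                    # high-precision accumulation
    n_pulses = np.trunc(chi / eps)     # whole pulses to apply now
    w_device = w_device + n_pulses * eps
    chi = chi - n_pulses * eps         # remainder stays accumulated
    return w_device, chi
```

Because the remainder is never discarded, no gradient information is lost even though the device weight only moves in discrete pulse-sized steps.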
Compared to the charge-based memory devices that are also used for in-memory computing [53, 54, 55], a key advantage of memristive devices is their potential to be scaled down to dimensions of a few nanometers [56, 57, 58, 59, 60]. Most memristive devices are also suitable for back-end-of-line integration, enabling their combination with a wide range of front-end CMOS technologies. Another key advantage is the non-volatility of these devices, which obviates the need for computing systems to be constantly connected to a power supply. However, there are also challenges to overcome. The significant inter-device and intra-device variability associated with the SET and RESET states is a key challenge for applications where memristive devices are used for logical operations. For applications that rely on the analogue storage capability, a significant challenge is programming variability, which captures the inaccuracies associated with programming an array of devices to desired conductance values. In ReRAM, this variability is attributed mostly to the stochastic nature of filamentary switching, and one prominent approach to counter it is to establish preferential paths for conductive filament (CF) formation [61, 62]. Representing single computational elements by multiple memory devices is another promising approach [63]. Yet another challenge is the temporal and temperature-induced variation of the programmed conductance values. The resistance 'drift' in PCM devices, attributed to the intrinsic structural relaxation of the amorphous phase, is one example. The concept of projected phase-change memory is a promising approach to tackling drift [64, 65]. The requirements that memristive devices need to fulfil when employed as computational memory are heavily application-dependent. For memristive logic, high cycling endurance (>10¹² cycles) and low device-to-device variability of the SET/RESET resistance values are critical.
For computational tasks involving read-only operations, such as matrix-vector multiplication, the conductance states are required to remain relatively unchanged during execution. It is also desirable to have a gradual, analogue-type switching characteristic for programming a continuum of resistance values in a single device. A linear and symmetric accumulative behaviour is also required in applications where the device conductance needs to be incrementally updated, such as in deep learning training [66]. For stochastic computing applications, random device variability is not problematic, but graceful device degradation is highly desirable, as described in [67].
## 4. Spiking Neural Networks and Memristors
As opposed to the deep learning networks discussed above, spiking neural networks (SNNs) can more naturally incorporate the notion of time in signal encoding and processing. SNNs are typically modelled on the integrate-and-fire behaviour of neurons in the brain. In this framework, neurons communicate with each other using binary signals or spikes. The arrival of a spike at a synapse triggers a current flow into the downstream neuron, with the magnitude of the current weighted by the effective conductance of the synapse. The incoming currents are integrated by the neuron to determine its membrane potential and a spike is issued when the potential exceeds a threshold. This spiking behaviour can be triggered in a deterministic or probabilistic manner. Once a spike is issued, the membrane potential is reset to a resting potential or decreased according to some predetermined rule. The integration is limited to a specific time window, or else a leak factor is incorporated in the integration, endowing the neuron model with a finite memory of past spiking events.
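The deterministic leaky integrate-and-fire model just described can be simulated in a few lines; the leak factor, threshold, and reset value below are arbitrary illustrative choices:

```python
import numpy as np

def lif_simulate(spike_trains, weights, v_th=1.0, leak=0.9, v_reset=0.0):
    """Deterministic leaky integrate-and-fire neuron: weighted input
    spikes are integrated into the membrane potential, the leak factor
    gives finite memory of past events, and crossing the threshold emits
    an output spike and resets the potential."""
    v, out = 0.0, []
    for t in range(spike_trains.shape[1]):
        v = leak * v + weights @ spike_trains[:, t]  # integrate inputs
        if v >= v_th:
            out.append(1)
            v = v_reset                              # reset after firing
        else:
            out.append(0)
    return np.array(out)
```

Note that the only arithmetic on the synaptic side is the addition of weights for the inputs that spiked at time t, which is the source of the efficiency argument made next.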
Compared to the realization of second-generation deep neural networks (the DNNs discussed in the previous section), SNNs can potentially deliver significant improvements in efficiency. The first reason comes from the underlying signal encoding mechanism. The calculation of the output of a neuron involves the determination of the weighted sum of synaptic weights with the real-valued neuronal outputs of the previous layer. For a fully connected second-generation DNN with $N$ neurons in each layer, this requires $N^2$ multiplications of real-valued numbers, typically stored in low-precision representations. In contrast, the forward propagation operation in an SNN requires only addition operations, as the input neuronal signals are binary spike signals. To elaborate, assume that the input signal is encoded as a spike train of duration $T$, with a minimum inter-spike interval of $\Delta t$. If the probability of a spike at any instant of time is $p$, then on average $NpT/\Delta t$ spikes have to be propagated through the synapses, and this requires $N^2 pT/\Delta t$ addition operations. In most modern processors, the cost of multiplication, $C_m$, is 3-4 times higher than that of addition, $C_a$. Hence, provided the neuronal and synaptic variables required for computation are available in the processor, SNNs offer a path to more efficient computation if the inequality
$$C _ { a } p \left ( \frac { T } { \Delta t } \right ) < C _ { m }$$
holds. Hence, it is important to develop algorithms for SNNs that minimize $p$ and $T/\Delta t$ to improve computational efficiency. This requires sparse binary signal encoding schemes that go beyond the rate coding typically used in SNNs today. The following section discusses strategies to develop general-purpose learning rules for SNNs that satisfy such constraints.
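As a quick sanity check, the operation-count comparison above can be written out directly. The cost ratio and encoding parameters below are illustrative placeholders, chosen only to exercise the inequality:

```python
def snn_is_cheaper(N, p, T, dt, C_a=1.0, C_m=3.5):
    """Compare per-layer operation costs for a fully connected layer with
    N neurons: N^2 real-valued multiplications for the DNN versus
    N^2 * p * T/dt additions for the spike-based equivalent.
    C_m / C_a ~ 3-4 reflects the typical multiply-vs-add cost ratio."""
    dnn_cost = N ** 2 * C_m
    snn_cost = N ** 2 * p * (T / dt) * C_a
    # Equivalent to the inequality C_a * p * (T / dt) < C_m from the text.
    return snn_cost < dnn_cost
```

With sparse coding (`p = 0.01`) over `T/dt = 100` time steps the inequality holds, while a dense rate code (`p = 0.1`) over the same window does not, which is why sparse encodings matter.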
The second potential for efficiency improvement of SNNs as compared to second-generation networks arises thanks to novel memory-processor architectures based on memristive devices. While SNNs can be implemented using Si CMOS SRAM or DRAM technologies, the advent of novel nanoscale memristive devices provides opportunities for significant improvements in overall computational efficiency.
Figure 8. A cross-bar array based representation of an SNN. Each synaptic weight is represented by the differential conductance of two nanoscale devices in the crossbar.
Memristive devices can be integrated at the junctions of crossbar arrays to represent the weights of synapses, and CMOS circuits at the periphery can be designed to implement the neuronal integration and learning logic. As mentioned above, this architecture enables the computation of spike propagation operation in an efficient manner based on Kirchhoff's law as:
$$I _ { k } = \sum _ { j } \left ( G _ { k j } ^ { + } - G _ { k j } ^ { - } \right ) V _ { j }$$
In this formula, $V_j$ denotes the voltage pulses that are triggered when an input neuron spikes and are applied to the line connected to the $j$-th input neuron, $G_{kj}^+$ and $G_{kj}^-$ are the conductances of the devices configured in a differential configuration to represent the synaptic weight, and $I_k$ is the total incoming current into the $k$-th output neuron. The small form factor of the devices, coupled with the scalability of operating voltages and currents beyond what is possible with conventional CMOS, suggests that these architectures can achieve several orders of magnitude efficiency improvement over silicon-based implementations [68,69].
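In simulation, the differential crossbar read-out reduces to a matrix-vector product. A minimal, device-agnostic sketch of the formula above:

```python
import numpy as np

def crossbar_currents(G_plus, G_minus, V):
    """Kirchhoff's-law read-out of a differential crossbar:
    I_k = sum_j (G+_kj - G-_kj) V_j.

    G_plus, G_minus: (K, J) conductance matrices of the two device arrays;
    V: length-J vector of voltage pulses from spiking input neurons.
    """
    return (G_plus - G_minus) @ V

# Example: the effective weight matrix is the conductance difference.
G_plus = np.array([[2.0, 1.0], [0.5, 1.5]])
G_minus = np.array([[1.0, 1.0], [0.5, 0.5]])
V = np.array([1.0, 2.0])
I = crossbar_currents(G_plus, G_minus, V)
```

The differential pair allows both positive and negative effective weights even though each physical conductance is non-negative.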
However, apart from the already mentioned non-idealities of memristive devices, crossbar arrays with more than 2048×2048 devices cannot be fabricated and operated reliably, owing to the resistance drop along the wires and the sneak paths that corrupt the measurement and programming of synaptic states. One approach to mitigate these issues is to design neurosynaptic cores with smaller crossbars and associated neuron circuits, tile these cores on a 2D array, and provide communication fabrics between the cores [70]. Such tiled neurosynaptic core-based designs are particularly amenable to realizing SNNs, as only binary spikes corresponding to intermittently active spiking neurons need to be transported between cores, as opposed to the real-valued neuronal variables that are active for all neurons in the core in deep learning networks. This is the second inherent advantage that SNNs have over DNNs in terms of computational efficiency.
Overcoming the reliability challenges mentioned above is essential for building dependable systems, and requires the co-optimization of algorithms and architectures designed to mitigate or leverage these non-ideal behaviours for computation. Two kinds of systems can be envisaged, depending on the application mode. Inference engines, which do not support on-chip learning, can be designed based on memristive devices integrated on crossbars, where the devices are programmed to the desired conductance states based on the weights obtained from software training. However, as memristive devices support incremental conductance changes through the application of suitable electrical programming pulses, it is also possible to design learning systems in which network weight updates are implemented on-chip in an event-driven manner [82]. There are also many recent examples where these devices have been engineered to mimic the integrate-and-fire characteristics of biological neurons [71,72,73], potentially enabling all-memristor implementations of spiking neural networks [74]. The field is still in its infancy and has so far witnessed only small proof-of-concept demonstrations. We now discuss some of the approaches that have been explored towards realizing memristive inference-only spiking networks as well as learning networks with SNNs.
4.1. Memristive SNNs for inference. A common approach to developing SNNs is to start with a second-generation ANN trained using traditional backpropagation-based methods, and then convert the resulting network to a spiking network in software. These solutions are based on weight-normalization schemes that make the spike rates of the neurons in the SNN proportional to the activations of the neurons in the ANN [75,76]. While this should in principle result in SNNs with accuracies comparable to their second-generation counterparts, some device-aware retraining is typically necessary when the network is implemented in hardware, owing to the non-linearity and limited dynamic range of nanoscale devices.
One of the differentiating features of inference engines is that the nanoscale devices storing state variables are programmed only rarely, compared to the number of reads (potentially at every inference cycle). Since higher-energy programming cycles have a stronger effect in degrading device lifetimes compared to the lower-energy read cycles, this mode of operation can have better overall system reliability compared to that of learning systems.
In a preliminary hardware demonstration leveraging this approach, R. Midya et al. used memristors based on SiOxNy:Ag to implement compact oscillatory neurons whose output voltage oscillation frequency is proportional to the input current [77] . In this proof-of-concept demonstration of a 3-layer network, ANN to SNN conversion was limited to the last layer alone, but the approach could be extended to hidden layers as well.
4.2. Memristive SNNs for unsupervised learning and adaptation. Most hardware demonstrations of SNNs using memristive devices have focused on the unsupervised learning paradigm, where the synaptic weights are modified in an unsupervised manner according to the biologically inspired spike-timing-dependent plasticity (STDP) rule [78]. The rule captures the experimental observation that when a synapse experiences multiple pre-before-post spike pairings the effective synaptic strength increases, while multiple post-before-pre pairings result in an effective decrease of synaptic conductance.
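A common textbook form of the pair-based STDP rule uses exponentially decaying weight changes. The amplitudes and time constant below are illustrative, not those of any cited device or experiment:

```python
import math

def stdp_dw(t_post, t_pre, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based exponential STDP: pre-before-post (t_post > t_pre)
    potentiates the synapse; post-before-pre depresses it, with the
    magnitude decaying exponentially in the timing difference."""
    dt = t_post - t_pre
    if dt > 0:
        return a_plus * math.exp(-dt / tau)    # potentiation
    return -a_minus * math.exp(dt / tau)       # depression
```

The locality of the rule, depending only on the two spike times at a single synapse, is what makes it attractive for direct hardware implementation.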
It should be noted that while other biological mechanisms, observed experimentally [79,80], may also play a key role in learning and memory formation in the brain, STDP is a simple local learning rule that is especially straightforward to implement in hardware. While it is possible to implement timing-dependent plasticity rules using many-transistor CMOS circuits [81], it was experimentally demonstrated early on that memristive devices can exhibit STDP-like weight adaptation upon the application of suitable waveforms [82,83,84]. Going beyond individual device demonstrations, IBM has demonstrated an integrated neuromorphic core with 256×256 phase-change memory synapses fabricated alongside Si CMOS neuron circuits, capable of on-chip learning based on a simplified model of STDP for auto-associative pattern learning tasks [85].
Boybat et al. used phase-change memristive synapses to demonstrate temporal correlation detection through unsupervised learning based on a simplified form of STDP [86], as shown in Figure 9. In their experiment, a multi-memristive architecture was introduced in which $N$ PCM devices represent one synapse: all devices within a synapse are read during spike transmission, but only one device, selected through an arbitration scheme, is programmed to update the synaptic weight. Software-equivalent accuracies could be obtained with this scheme, even though the individual devices suffer from several common non-ideal effects such as programming non-linearity, read noise, and conductance drift. Note that with $N = 1$ device per synapse, the network accuracy was significantly lower than the software baseline; $N = 7$ devices were necessary to obtain close-to-ideal performance.
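The read-all/program-one idea can be sketched as follows. The round-robin counter and the multiplicative noise model are illustrative simplifications of the arbitration scheme in [86], not its exact implementation:

```python
import numpy as np

class MultiMemristiveSynapse:
    """N devices jointly represent one synapse: all are read during spike
    transmission, but only one, chosen by a counter-based arbiter, is
    programmed per update, spreading wear and averaging device noise."""

    def __init__(self, n_devices, seed=0):
        self.g = np.zeros(n_devices)          # normalized conductances
        self.counter = 0                      # arbitration counter
        self.rng = np.random.default_rng(seed)

    def read(self):
        return self.g.sum()                   # all devices contribute

    def program(self, dg):
        idx = self.counter % len(self.g)      # round-robin selection
        noisy = dg * (1.0 + 0.3 * self.rng.standard_normal())  # prog. noise
        self.g[idx] = np.clip(self.g[idx] + noisy, 0.0, 1.0)
        self.counter += 1
```

With several devices per synapse, the summed weight is less sensitive to any single noisy programming event, which is the intuition behind the $N = 7$ result above.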
Spiking networks can also be used for other unsupervised learning [87] and adaptation tasks. Recently, Y. Fang et al. demonstrated that certain optimization problems can be solved by the coupled dynamics of ferroelectric field-effect transistor (FeFET) based spiking neurons [88]. While there was no synaptic weight adaptation in this approach, the optimal solution to the problem is determined by the coupled interactions between the neurons, which modulate each other's membrane potentials in an event-driven manner.
Figure 9. a) Unsupervised learning demonstration using the multi-memristive PCM architecture. The network consists of an integrate-and-fire neuron receiving inputs from 1000 multi-PCM synapses, each excited by Poisson-generated binary spike streams. 10% of the synapses receive correlated inputs, while the rest receive uncorrelated inputs. The weights evolve according to the simplified STDP rule shown. b) With N = 7 PCM devices per synapse, the correlated and uncorrelated synaptic weights evolve to well-separated values, while with N = 1 the separation is corrupted by programming noise. Adapted with permission [86], Copyright 2018, Nature Research.
4.3. Memristive SNNs for supervised learning. Compared to the previous two approaches, implementing supervised learning in SNNs is a more challenging task, as the algorithm and the network must generate spikes at precise time instants based on the input excitation. In contrast to the backpropagation algorithm that is highly successful in training ANNs, supervised learning algorithms for SNNs are not yet well developed, owing to the inherent difficulty of applying gradient-descent methods to spiking neuron models, which are discontinuous at the instants of spikes. Nevertheless, there have been several demonstrations of supervised learning algorithms for SNNs based on approximate forms of gradient descent for simple fully connected networks [89,90,91].
Figure 10. a) SNN supervised learning experiment. A two-layer network is tasked with generating 1000 ms-long spike streams from the 168 output neurons, corresponding to images of the spoken characters. The inputs to the network are 132 spike streams representing the characters, subsampled from the output of a silicon cochlea chip. The weights are modified based on the NormAD learning rule. b) Using multi-PCM synapses, the accuracy of spike placement at the output is about 80%, compared to the FP64 accuracy of close to 98% [92].
Recently, Nandakumar et al. demonstrated a proof-of-concept realization of supervised learning in a two-layer SNN implemented using nanoscale phase-change memory synapses, based on the Normalized Approximate Descent (NormAD) algorithm [89]. In the experiment, 132 spike streams representing spoken audio signals generated using a silicon cochlea chip were used as input, and the network was trained to generate 168 spike streams whose arrival times indicate the pixel intensities corresponding to the spoken characters [92]. Compared to typical classification problems in deep networks, where the accuracy depends only on the relative magnitude of the responses of the output neurons, the SNN problem is harder, as the network is tasked with generating close to 1000 spikes at specific time instants over a period of 1250 ms from 168 spiking neurons excited by 132 input spike streams. The accuracy of spike placement obtained in the experiment was about 80%, compared to a software baseline accuracy of over 98%, despite using the same multi-memristive architecture described earlier. This experiment thus illustrates the need to develop more robust, event-driven learning algorithms for SNNs that can mitigate or even leverage device non-idealities.
4.4. Harnessing randomness for learning - noise from impairment to asset. As discussed in the previous section, the implementation of standard deterministic learning rules, such as STDP or gradient-based schemes like NormAD [89], may be severely impaired in hardware implementations whose components are inherently noisy. In this section, we explore the idea that, if properly harnessed, native hardware randomness can be an asset for the deployment of training algorithms for SNNs [93,94]. The gist of the argument is that randomness enables the native implementation of probabilistic models, which would otherwise require additional, potentially costly, components. As we elaborate next, probabilistic models have several advantages over their conventional deterministic counterparts. We focus the discussion on the problem of training, but we also mention some advantages in terms of inference.
4.5. Training deterministic SNN models. Standard Artificial Neural Network (ANN)-based models account for uncertainty only at their inputs or outputs, while the process transforming inputs to outputs is deterministic. While this modelling choice limits their expressiveness and their capacity to model structured uncertainty [95], it does not pose a problem for the development of learning rules for ANNs. This is because deterministic ANN models define a differentiable input-output mapping as a function of the model weights, enabling the direct derivation of gradient-based learning rules through backpropagation and automatic differentiation.
Not so for SNNs. In fact, deterministic spiking neuron models such as Leaky Integrate and Fire (LIF) define non-differentiable functions of the synaptic weights: Increasing or decreasing the synaptic weights of a spiking neuron may cause the membrane potential to cross or step back from the spiking threshold, causing an abrupt change in the output. The derivative with respect to the weights is hence zero except around the firing threshold, where it is undefined. As a result, standard gradient-based learning rules cannot be directly derived for deterministic models of SNNs.
A second important issue with conventional gradient-based methods applied to deterministic SNN models concerns the problem of credit assignment. Discrete-time deterministic SNN models can be interpreted as Recurrent Neural Networks (RNNs) whose state is defined by the neurons' membrane potentials, input currents, and previous spiking behaviours [91]. Accordingly, the outputs and state transitions produced as a function of exogenous inputs and state depend on the learnable synaptic weights. A synaptic weight therefore affects the loss function being optimized via changes that are propagated through the neurons and through time. Assigning credit for changes in the output - which is what is needed to compute the gradient - hence requires either backpropagating per-output changes through neurons and time or keeping track of per-weight changes in a forward manner through neurons and time [96,97,91]. Both solutions come with significant drawbacks: backpropagation requires storing forward activations and flowing information backward through time, while forward methods entail memorizing per-weight quantities across all neurons.
Given the two challenges discussed above - non-differentiability and credit assignment - state-of-the-art training methods for SNNs based on deterministic, typically LIF, models follow various heuristic approaches. As discussed in the previous section, the most common class of methods sidesteps both challenges by carrying out an offline conversion from a pretrained ANN. This makes it impossible to implement online on-chip learning, and it also limits information processing to rate encoding, which encodes information in the spike frequency (see, e.g., [75]). A second popular approach is to implement biologically inspired local synaptic update rules, such as STDP, that do not require credit assignment. The main downside of these approaches is that they do not optimize specific objective functions - sidestepping the problem of non-differentiability - and hence are difficult to generalize to a variety of tasks and requirements. When focusing on rate encoding, it is possible to overcome the problem of non-differentiability, but not that of credit assignment, by removing non-linearities and working directly with spiking rates, for example with low-pass filtered spike trains [98,89].
In contrast to standard rate encoding, SNNs enable a novel type of information processing that computes with time, rather than merely over it as ANNs do. To make use of this unique capability, it is necessary to derive learning rules capable of processing information encoded in the timing of spikes, not only in their frequency. The simplest way to do this is to limit the number of spikes per neuron to one, so as to assign a continuous-valued output to each neuron. This allows the derivation of backpropagation-based rules as for ANNs, whereby the neurons' (differentiable) non-linearities capture the relationship between input and output spike timings [99].
More sophisticated methods, allowing for multiple spikes per neuron, are based either on soft non-linearity models [100] or on surrogate gradient methods [91]. The first type of approach tackles the problem of non-differentiability by approximating the threshold activation function with a differentiable function [100]. As a result, these methods do not preserve the key feature of SNNs of processing and communicating binary spikes. The second class of techniques approximates the derivative of the threshold activation function (but not the function itself) when computing gradients [91]. Both types of methods require backward or forward propagation, or the implementation of heuristic credit-assignment methods such as random backpropagation [101]. As an example, SuperSpike uses forward propagation to carry out credit assignment over time, coupled with random backpropagation for spatial credit assignment [90,91].
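The surrogate-gradient idea can be illustrated with a fast-sigmoid surrogate of the kind used in this class of methods: the hard threshold is kept in the forward pass, while a smooth stand-in derivative replaces the true (zero-or-undefined) derivative in the backward pass. The sharpness parameter `beta` is an illustrative hyperparameter:

```python
import numpy as np

def spike_forward(v, v_thresh=1.0):
    """Forward pass: the actual, non-differentiable spiking non-linearity."""
    return (v >= v_thresh).astype(float)

def spike_surrogate_grad(v, v_thresh=1.0, beta=10.0):
    """Backward pass: derivative of a fast sigmoid centred on the
    threshold, used in place of the true derivative of the step."""
    return 1.0 / (beta * np.abs(v - v_thresh) + 1.0) ** 2
```

Because only the backward pass is approximated, the network still processes and communicates binary spikes, preserving the efficiency argument made earlier.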
We emphasize again that the discussion above focused on the role of randomness in facilitating training. Randomness in SNNs can also be useful in the inference phase to enable Gibbs sampling-based Bayesian inference strategies [93, 102] .
4.6. Probabilistic SNN models. Among their key advantages, probabilistic models allow the direct encoding of domain knowledge in the graph of connections among the constituent variables - a key feature of so-called expert systems - and the modelling of uncertainty [103] .
They can also account for complex multi-modal distributions, unlike their deterministic counterparts [104] . Finally, stochastic models, even for ANNs, can both improve generalization, as in dropout regularization, and facilitate exploration of the training space [105] .
Training of probabilistic models is generally conceptually more complex than for deterministic models due to the need to account for the exponentially large space of values that the hidden stochastic units can take. Note, however, that probabilistic models have provided the framework used to develop the first deep learning algorithms for ANNs in [106] through Boltzmann machines. Early training methods for general (undirected) models used Gibbs sampling or mean-field approximation, requiring an expensive cycling through the variables one at a time [107,108] . More modern approaches leverage advanced forms of approximate learning and inference via (Generalized) Expectation Maximization, Monte Carlo methods, and variational inference [106,104,109,110] .
Probabilistic models for SNNs can be thought of as direct extensions of the belief networks studied in [107,106,105] from static to dynamic models. As in belief networks, a neuron spikes probabilistically with a probability that increases with its membrane potential. In belief networks, the membrane potential of a neuron is an instantaneous function of the current spikes emitted by the neurons in its fan-in. In contrast, in an SNN, the membrane potential of a neuron evolves over time, as in LIF models, as a function of the past spiking behaviour of the neuron itself and of the neurons in its fan-in (see [111] for a review).
4.7. Training probabilistic SNN models. For the development of training rules, probabilistic SNN models have the fundamental advantage over their deterministic counterparts that the probability of the neurons' outputs is a differentiable function of the model parameters, including the synaptic weights. Many learning criteria can be formulated as the average over such distribution of a given loss or reward function. Specifically, in supervised and unsupervised learning, the learning problem can be formulated as the minimization of a loss function averaged over the joint distribution of data and of specific neurons in a read-out layer [112,111] ; and in reinforcement learning, the goal is to minimize an average reward function dependent on the behaviour of the neurons in the readout layer [113] . Unlike deterministic SNN models, probabilistic SNN models hence allow naturally for the definition of differentiable learning criteria.
Once a learning criterion is determined based on the problem under study, training can be carried out via stochastic gradient-based rules. The key novel challenge in deriving such rules is the need to differentiate over the distribution of the neurons' outputs. Mathematically, with deterministic models, one needs to differentiate a training criterion of the type
$$L _ { d } ( \theta ) = E _ { X \sim D } [ f _ { \theta } ( X ) ] ,$$
where the expectation is taken over the empirical distribution 𝐷 of the data and the model parameter 𝜃 directly affects the learning criterion 𝑓𝜃(𝑋) through the input-output function of the network. In contrast, with probabilistic models, the relevant learning criterion is of the type
$$L _ { p } ( \theta ) = E _ { X \sim D } [ E _ { Y \sim P _ { \theta } } [ f ( X , Y ) ] ] ,$$
in which 𝑌 represents the random output of the neurons. Note that unlike the standard deterministic approach, the model parameters affect the learning performance through the distribution of the random output of the neurons.
Maximization of the criterion above can in principle be carried out via Expectation Maximization. In practice, the intractability of Bayesian inference over the hidden neurons entails the need for approximate solutions based on sampling methods and gradient-based techniques [104]. Computing stochastic gradients of 𝐿𝑝(𝜃) requires a double empirical expectation, one over the data distribution 𝐷 and one over the output distribution 𝑃𝜃. Estimators based on such samples can be derived by following one of a variety of principles, yielding different statistical properties in terms of, e.g., bias and variance [114].
While a number of techniques attempt to reuse the standard backpropagation algorithm, e.g., the 'Straight-Through' estimator [105], an approach that is more suitable for the implementation of SNNs is obtained via the score function, or log-likelihood, or REINFORCE method and variations thereof (see [104,109,110]). Accordingly, for given data and neurons' output samples, the gradient with respect to a synaptic weight can be estimated through the correlation between the accrued loss function over time and the log-probability of the realized output for a given sample (𝑋, 𝑌), i.e., (somewhat informally)
$$\nabla _ { \theta } L _ { p } ( \theta ) \approx f ( X , Y ) \nabla _ { \theta } \log ( P _ { \theta } ( Y ) ) .$$
Intuitively, the higher the loss is, the more the negative gradient should push away from output distributions that generate such disadvantageous samples 𝑌 . Various improvements of the statistical properties of this estimator are reviewed in [114].
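The mechanics of the score-function estimator can be illustrated with a single Bernoulli unit, the simplest possible probabilistic "neuron". The sketch below (all names hypothetical) compares the REINFORCE estimate against the exact gradient, which is computable in closed form in this toy case; the estimator is unbiased, so the sample average converges to the exact value.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Toy model: Y ~ Bernoulli(p) with p = sigmoid(theta), loss f(Y).
# Exact gradient of L(theta) = E_Y[f(Y)] = p*f(1) + (1-p)*f(0):
#   dL/dtheta = p*(1-p)*(f(1) - f(0))
def exact_grad(theta, f):
    p = sigmoid(theta)
    return p * (1 - p) * (f(1) - f(0))

# REINFORCE / score-function estimate: grad ~ f(Y) * d/dtheta log P_theta(Y).
# For a Bernoulli with p = sigmoid(theta): dlogP/dtheta = Y - p.
def reinforce_grad(theta, f, n_samples, rng):
    p = sigmoid(theta)
    y = (rng.random(n_samples) < p).astype(float)
    return np.mean([f(yi) * (yi - p) for yi in y])

f = lambda y: 3.0 if y == 1 else 1.0   # hypothetical loss values
theta = 0.5
g_exact = exact_grad(theta, f)
g_est = reinforce_grad(theta, f, 200000, rng)
print(g_exact, g_est)   # sample estimate approaches the exact gradient
```

Note that the estimator never differentiates through the sampled output 𝑌 itself, only through its log-probability, which is what makes it applicable to discrete spikes.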
The REINFORCE gradient estimate ∇𝜃𝐿𝑝(𝜃) highlights not only the direct differentiability of generic learning criteria but also the fact that probabilistic learning rules solve the credit assignment problem without requiring any form of backpropagation [105]. Instead, a gradient-based rule that uses ∇𝜃𝐿𝑝(𝜃) only requires all nodes to receive a global feedback signal 𝑓(𝑋, 𝑌), which may be computed by a central node [111]. The resulting learning procedure follows the standard three-factor rule from computational neuroscience, whereby the synaptic weights are modified based on recent pre- and post-synaptic spikes, which are locally available at each neuron, and on a global feedback signal [111]. Accordingly, the rule can easily be implemented in an online streaming fashion.
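A minimal sketch of such an online three-factor update for a single synapse is given below. The decomposition into a local eligibility trace and a broadcast global signal is the generic structure described above; the specific trace dynamics, loss signal, and constants are hypothetical illustrations, not the rule of [111].

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical online three-factor update for one synapse:
#   local factors : pre- and post-synaptic spikes, accumulated in an
#                   eligibility trace e (for a Bernoulli neuron, the
#                   log-probability gradient is pre * (post - p))
#   global factor : broadcast feedback signal g = f(X, Y)
def three_factor_step(w, e, pre_spike, post_spike, post_prob, g,
                      lr=0.01, gamma=0.9):
    e = gamma * e + pre_spike * (post_spike - post_prob)  # local trace
    w = w - lr * g * e                                    # modulated by global g
    return w, e

w, e = 0.0, 0.0
for t in range(100):
    pre = float(rng.random() < 0.3)
    p = 1.0 / (1.0 + np.exp(-(w * pre)))
    post = float(rng.random() < p)
    g = post          # hypothetical global loss: penalize spiking
    w, e = three_factor_step(w, e, pre, post, p, g)
print(w)
```

The update uses only quantities available at the synapse plus the single scalar g, which is the property that makes the rule attractive for streaming neuromorphic hardware.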
4.8. Generalized probabilistic SNN models. Apart from the advantages described above in terms of differentiability and credit assignment, probabilistic models can be directly extended in various directions with little conceptual or algorithmic difficulty. First, it is possible to directly derive - technically, by selecting a categorical instead of a Bernoulli distribution in a Generalized Linear Model (GLM) for SNNs - training rules that allow for multi-valued spikes or inter-neuron instantaneous connections or, equivalently, Winner-Take-All (WTA) circuits [115,102]. This is particularly important since data produced by some neuromorphic sensors incorporates a sign to indicate a positive or negative change [116]. Multi-valued spikes can also be used for time compression [117]. Second, various decoding rules, such as first-to-spike, can be directly optimized for, instead of having to rely on surrogate target spiking sequences [118]. Third, probabilistic models can provide an estimate of the uncertainty on the trained weights by means of Bayesian Monte Carlo methods [115].
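The Bernoulli-to-categorical generalization amounts to letting each neuron emit one of K possible values per time instant with softmax probabilities, as in the minimal sketch below (naming and the choice K = 3 are hypothetical; value 0 can be read as "no spike" and the remaining values as signed or graded spikes).

```python
import numpy as np

rng = np.random.default_rng(5)

# Categorical spiking: at each instant the neuron emits one of K values
# with softmax probabilities driven by K membrane potentials, instead of
# a single Bernoulli spike/no-spike decision.
def categorical_spike(potentials, rng):
    z = np.exp(potentials - np.max(potentials))   # numerically stable softmax
    p = z / z.sum()
    return rng.choice(len(p), p=p)

counts = np.zeros(3)
for _ in range(10000):
    counts[categorical_spike(np.array([1.0, 0.0, -1.0]), rng)] += 1
print(counts / 10000)   # approaches softmax([1, 0, -1]) ~ [0.665, 0.245, 0.090]
```

Since the softmax probabilities remain differentiable in the underlying potentials, the score-function training rules above carry over unchanged, with the categorical log-probability replacing the Bernoulli one.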
Before describing some applications of the models and learning rules reviewed above, we mention briefly here alternative probabilistic formulations for SNNs. In the models discussed above, randomness is defined at the level of neurons' outputs. Alternative models introduce randomness at the level of synapses or thresholds [119,120] .
4.9. Examples. Once an SNN is trained, it can be used as a sequence-to-sequence mapper in order to solve supervised, unsupervised, and reinforcement learning problems. Alternatively, with specific choices of the synaptic kernels and memory, the SNN can be used as a Gibbs sampler to carry out Bayesian inference with outputs encoded in the spiking rates [93, 102] . We now briefly discuss three applications that fall in the first category, one concerning supervised learning, one reinforcement learning, and one federated learning.
Figure 11. Test error and number of spikes as a function of the time expansion parameter defining source encoding from natural signals to spikes. Reproduced with permission [111], Copyright 2019, IEEE.
In order to first illustrate the potential of probabilistic SNNs trained to process time-encoded information, in Figure 11, we consider an online sequence prediction problem in which samples of a discrete-time source are converted into spiking signals with ∆𝑇 time instants for each sample of the input source. We consider two types of encoding, one based on standard quantization and rate encoding, and one based on the time encoding via Gaussian receptive fields. The figures, fully detailed in [111], demonstrate that time encoding can vastly outperform rate encoding both in terms of accuracy and in terms of number of spikes, which is a proxy for energy consumption.
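The two encodings compared in Figure 11 can be sketched as follows. Each analogue sample is expanded into ∆𝑇 binary time instants: rate encoding emits a number of spikes that grows with the sample value, whereas time encoding via Gaussian receptive fields places a single spike in the slot whose receptive-field centre best matches the value. This is a simplified illustration, not the exact scheme of [111]; function names and the receptive-field width are hypothetical.

```python
import numpy as np

# Expand one analogue sample x in [0, 1] into delta_t binary time instants.

def rate_encode(x, delta_t):
    # Spike count grows with x: roughly x * delta_t spikes.
    n_spikes = int(round(x * delta_t))
    s = np.zeros(delta_t)
    s[:n_spikes] = 1.0
    return s

def time_encode(x, delta_t):
    # Gaussian receptive fields, one per time slot: the sample triggers a
    # single spike in the slot whose centre is closest to x.
    centres = (np.arange(delta_t) + 0.5) / delta_t
    responses = np.exp(-((x - centres) ** 2) / (2 * (0.5 / delta_t) ** 2))
    s = np.zeros(delta_t)
    s[np.argmax(responses)] = 1.0
    return s

for dt in (3, 5, 10):
    r, t = rate_encode(0.7, dt), time_encode(0.7, dt)
    print(dt, r.sum(), t.sum())
```

The sketch already shows the trend in Figure 11: as ∆𝑇 grows, the rate code spends more spikes (and hence energy) per sample, while the time code keeps the spike count constant and gains resolution instead.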
Second, we consider a standard reinforcement learning task, in which a probabilistic SNN is used as a stochastic policy. Figure 12 compares the performance as a function of the resolution of the input grid representation for a policy directly trained with a first-to-spike decoder and one that is instead converted using state-of-the-art methods from a pre-trained ANN. The results clearly validate the intuition that directly training the stochastic policy as an SNN is more efficient than using ANN-to-SNN conversion.
Figure 12. Time steps to reach goal and spikes per episode for a grid world reinforcement learning task. Reproduced with permission [113], Copyright 2019, IEEE.
Finally, we consider the potential of SNNs for on-mobile training via Federated Learning (FL). The approach is motivated by the fact that training on a device is limited by the amount of data available to it. Cooperative training can be carried out through FL, as explored in [121], where an online FL-based learning rule is introduced for networked on-mobile probabilistic SNNs. As seen in Figure 13, with sufficiently frequent inter-device communication, i.e., a communication round occurring every 𝜏 iterations, the scheme demonstrates significant advantages over separate on-mobile training.
Figure 13. Test loss versus number of training iterations with inter-device communication taking place every 𝜏 iterations.
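The FL communication pattern itself is simple to sketch: each device runs local SGD on its own data, and every 𝜏 iterations the devices average their model parameters. The toy below (fitting a scalar model y = w·x on each device; all names and constants hypothetical, and deliberately much simpler than the SNN rule of [121]) illustrates only this averaging schedule.

```python
import numpy as np

rng = np.random.default_rng(3)

# FL-style cooperative training: local SGD interleaved with periodic
# parameter averaging (the "communication round") every tau iterations.
def federated_training(datasets, tau, n_iters, lr=0.05):
    weights = np.zeros(len(datasets))
    for it in range(1, n_iters + 1):
        for d, (x, y) in enumerate(datasets):
            i = rng.integers(len(x))
            grad = (weights[d] * x[i] - y[i]) * x[i]   # local SGD step
            weights[d] -= lr * grad
        if it % tau == 0:
            weights[:] = weights.mean()                # communication round
    return weights

# Each device sees (noise-free) data from the same underlying model w* = 2.
w_true = 2.0
datasets = []
for _ in range(4):
    x = rng.normal(0, 1, 50)
    datasets.append((x, w_true * x))
w = federated_training(datasets, tau=8, n_iters=400)
print(w)   # all devices agree and approach w* = 2
```

The trade-off in Figure 13 is between this communication frequency (small 𝜏, faster convergence) and the communication cost it incurs.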
4.10. Algorithmic and hardware co-design. To sum up the discussion in this section, spike-based learning and inference are promising facets of the neuromorphic computing paradigm. Unlike conventional machine learning models, spike-based processing "computes with time, not in time". As we have discussed, the main advantage is a potentially massive increase in power efficiency. In this section, we have presented a review of algorithmic models that leverage stochastic behaviour for the implementation of SNNs. While it is true that spike-based computing can be implemented in CMOS technology, there is a great deal to be gained, in terms of scalability and power efficiency, from compact nano-scale implementations of the fundamental functional blocks - spiking neurons and adjustable synapses. Memristors are much better suited to emulate, and not merely simulate, many of the sought functionalities. Moreover, the implementation of probabilistic models on current hardware platforms is made difficult by the lack of randomness sources in such systems. In contrast, the inherent randomness of switching processes in memristive devices could provide a source of randomness "for free". Research in spike-based computing is a fast-growing field. We believe that developing better-suited hardware platforms would accelerate the progress of co-designed spike-based learning and inference machines. Memristors may be the missing piece that unlocks the potential of spike-based computing.
## 5. Future of neuromorphic and bio-inspired computing systems
Taking a 'big picture' view, current AI, and machine learning methods in particular, have achieved astonishing results in every field they have been applied to, and have become, or are becoming, standard tools for nearly every type of industry one can think of. This impressive expansion has been propelled mainly by deep learning, which is loosely inspired by biological neural networks.
Deep learning primarily refers to learning with artificial neural networks of many layers, and is fundamentally not different from what was known in the field in the 1990s. Indeed, the key algorithm underlying the success of deep learning, backpropagation, is an old story: 'Learning representations by back-propagating errors' by Rumelhart, Hinton and Williams was published in 1986 [122]. The most commonly used networks are feedforward neural networks, and convolutional neural networks, used for image processing, can be seen as inspired by our visual system; neither is a very new concept.
Backpropagation is perhaps the most fundamental method we can think of for parameter optimisation. It is derived by differentiating an error function with respect to the learnable parameters, so in some ways it is not entirely surprising that the algorithm has existed for many years. What might be somewhat surprising is that we have not been able to move far away from this idea. While there has been recent progress, much of it consists of relatively small additions and tweaks, for instance new ways to address the so-called 'vanishing gradient' problem, the deterioration of the error signal as it is backpropagated from the output to the input of the network. Undoubtedly, there have been some fundamentally different architectures, smart techniques and novel analyses, but arguably the key factor behind this success has been the vast availability of data and computational power.
In fact, recent advances from the neuroscience community are largely absent from today's artificial neural networks. We do not want to argue that this, per se, is either good or bad, or to suggest that the next super-algorithms will copy nature. We only want to underline that, though only loosely inspired by the brain, artificial neural networks have their basis in neuroscience concepts, and that there are many phenomena that have, perhaps, not been sufficiently explored within an AI context. For instance, biological neural networks have different learning rules for positive and negative connections, connections change over multiple time scales and show reversible dynamic behaviour (known as short-term plasticity), and the brain itself has a structure in which specific areas play different roles, to name just a few.
Instead, our progress has mainly been based on hardware improvements that made this success possible by allowing long training phases; an amount of training unrealistic for any human. While it is true that human intelligence also develops over years and that human learning involves many trials, for comparison AlphaGo Zero, which surpassed human performance in the game of Go, was trained over 4.9 million games [123]. To match this number of games, a human living for 90 years would have to complete one game of Go every 10 minutes from the moment they were born. This realization tells us two things: (1) our machines do not learn the same way that humans do, and even if we think of our methods as bio-inspired, we likely still miss some key ingredients; and (2) executing that many games certainly requires considerable computational power and energy consumption.
As a consequence, training algorithms often have a high energy footprint due to the long training times and hyper-parameter tuning involved. Hyper-parameters are parameters of the system that are not (usually) adapted via the learning method itself; one such example is the learning rate, which indicates how fast the network should update its 'knowledge'. Before rushing to say that a high learning rate is obviously desirable, note that such a learning rate could lead to oscillations, as, for instance, optimal solutions could be overshot, or it could lead to forgetting previously obtained knowledge. Setting the learning rate right is not always trivial. In fact, the tuning of hyper-parameters was what originally made the machine learning community turn away from artificial neural networks, and it was the performance of deep learning that brought the focus back. One may then wonder: at the end of the day, how energy-inefficient can deep learning systems be? The answer is perhaps surprising: the estimated carbon emissions from training a standard natural language processing model are approximately five times those of running a car over its lifetime [124]. This suggests there is an urgent need to improve both current hardware and learning models.
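The sensitivity to the learning rate can be seen in the smallest possible example: gradient descent on the quadratic f(w) = w², whose gradient is 2w. With a small rate each step contracts towards the minimum; once the rate exceeds the stability threshold, each step overshoots and the iterates diverge.

```python
# Gradient descent on f(w) = w**2 (gradient 2*w), starting from w0 = 1.
def gradient_descent(lr, w0=1.0, steps=20):
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w   # each step multiplies w by (1 - 2*lr)
    return w

print(gradient_descent(0.1))   # |1 - 2*lr| = 0.8 < 1: converges towards 0
print(gradient_descent(1.1))   # |1 - 2*lr| = 1.2 > 1: oscillates and diverges
```

Real loss surfaces are far less forgiving than this quadratic, which is why learning-rate tuning, multiplied over many hyper-parameter configurations, dominates the energy cost of training.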
Given such energy concerns, systems based on low-power memristive devices are a highly promising alternative [125,126]. Besides having a low carbon footprint, there are numerous works demonstrating devices that mimic neurons, synapses, and plasticity phenomena. Often such approaches work well for offline training. However, some of these attempts, particularly where plasticity is involved, are opportunistic (including our own work), and how scaling to larger networks could happen is not always obvious. Faithfully reproducing brain functionality, when neuroscience itself still has so many open questions, is challenging for any technology. Moreover, using technologies that potentially allow fewer possibilities for engineering in comparison to traditional methods (such as CMOS) might well be mission impossible. How far we can go, in terms of scalability, by reconstructing the brain neuron by neuron and synapse by synapse remains unclear. A more promising way might be to achieve a deeper understanding of the physics of the relevant materials and, based on this understanding, co-develop the technology and the required learning methods for achieving Artificial Intelligence.
Figure 14. Reservoir Computing maps inputs x(t) to a higher-dimensional space defined by the reservoir states r(t). Only the weights connecting the reservoir states r(t) and the output y(t) need to be trained.
In the meantime, in parallel, we can immediately explore simple bio-inspired approaches that harness the dynamics of the material and could prove useful for particular sets of problems. Here we present one such example, which stems from the area of reservoir computing, an idea invented separately by Herbert Jaeger for the machine learning community [127], under the name of echo state networks, and by Wolfgang Maass [128] for the computational neuroscience community, under the name of liquid state machines. We strongly suspect that both these methods were very much motivated by the difficulty of training recurrent networks with a generalization of backpropagation known as backpropagation through time. While feedforward networks can perform many tasks successfully, recurrences are required for memory and, moreover, the brain is clearly not only feedforward. If recurrences exist and are required, there must be a way to efficiently train such structures. As a side note, it is very difficult to imagine how a biological neural network could implement backpropagation through time, and for this reason alternative approaches have recently appeared [129].
Reservoir computing methods came up with a workaround to the problem of training recurrent networks: they do not train them but instead harness their properties. Common to echo state networks and liquid state machines is the idea of using a random recurrent network with fixed connectivity, hence with no need to resort to backpropagation through time. This recurrent network is called a reservoir. It provides memory and at the same time transforms the input data into a spatiotemporal representation of higher dimensionality. This enhanced representation can be used as an input to single-layer perceptrons trained with a very simple learning method, so the only learnable parameters are the feedforward weights between the reservoir neurons and the output neurons. The key difference between echo state networks and liquid state machines is that the former use recurrent artificial neuron dynamics while the latter use recurrent spiking neural networks, reflecting the mindsets of their corresponding communities. The main principle of reservoir computing is shown in Figure 14. The input x(t) is projected into the higher-dimensional feature space r(t) by the dynamical reservoir system. Only the weights connecting the internal states r(t) with the output y(t) need to be trained, while the rest of the system is fixed. The advantage of this approach is that it requires only a simple training method, while the ability to process complex and temporal data is retained.
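An echo-state-style reservoir can be sketched in a few lines: fixed random input and recurrent weights (the recurrent matrix scaled to spectral radius below 1 so the dynamics are rich but not chaotic), and a linear readout trained by ridge regression. Sizes, scalings, and the sine-prediction task below are hypothetical illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)

# Fixed, untrained part: random input weights and a random recurrent
# matrix rescaled so its spectral radius is below 1 (echo state property).
n_in, n_res, T = 1, 100, 500
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.normal(0, 1, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

def run_reservoir(u):
    r = np.zeros(n_res)
    states = np.zeros((len(u), n_res))
    for t in range(len(u)):
        r = np.tanh(W_in @ u[t] + W @ r)   # fixed nonlinear dynamics
        states[t] = r
    return states

# Task: predict x(t+1) from x(t) for a sine wave.
x = np.sin(0.2 * np.arange(T + 1))
u = x[:-1].reshape(-1, 1)
target = x[1:]
R = run_reservoir(u)

# Trained part: only the linear readout, via ridge regression.
lam = 1e-6
W_out = np.linalg.solve(R.T @ R + lam * np.eye(n_res), R.T @ target)
pred = R @ W_out
mse = np.mean((pred - target) ** 2)
print(mse)
```

Note that training reduces to a single linear solve; everything recurrent stays fixed, which is precisely what makes it plausible to replace the simulated reservoir with a physical dynamical medium.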
Indeed, it might be surprising how much randomness can do from a computational point of view: a random network can enrich data representations sufficiently that a linear method can separate the data into the desired classes. This approach is conceptually similar to the well-known method of support vector machines, which uses kernels to augment the dimensionality of the data so that, again, a simple linear method is sufficient to achieve data classification. In fact, a link between the purely statistical technique of support vector machines and the bio-inspired technique of reservoir computing has been formally established [130]. We can perhaps think of this link as a demonstration that biological inspiration and purely mathematical methodology might solve problems in a similar manner.
We claim that reservoir computing would benefit from appropriate hardware. In simulation, running the recurrent network to convergence takes time, because the continuous system must be discretized and run sequentially on a CPU. If instead we replace the reservoir with an appropriate material, this step could become both fast and energy efficient: the material could compute effortlessly using its physical properties. Reservoirs do not need over-engineering, since no specific structure is required; we only need to produce dynamics that are complex enough but not chaotic. In fact, there has already been work exploiting memristors in this direction [131].
Could ideas from biology still add value to existing methods? A recent augmentation of echo state networks [132], inspired by the fruit-fly brain, explores the concept of sparseness in order to improve the learning performance of reservoirs. In brains, contrary to typical artificial neural networks, only a few neurons fire at a time, a fact that has been linked to memory capacity. Neuronal thresholds, appropriately initialized and updated with a slower time constant than that of the feedforward learnable weights, can modulate sparseness and lead to better performance in comparison to the non-sparse reservoir, but also in comparison to state-of-the-art methods on a set of benchmark problems. Because the sparseness leads to task-specific neurons, this bio-inspired technique can alleviate the problem of catastrophic forgetting. Machine learning methods often suffer from the fact that once they learn a new task they forget the previous one. Since in the sparse reservoir network a new task will likely recruit previously unused neurons, learning a new skill does not completely override those previously learned. This simple method competes with, and surpasses, more complicated methods built specifically to address catastrophic forgetting. Most importantly, the formulation of the specific rule allows for completely replacing the network dynamics with any other dynamics, including material dynamics, that are suitable for the purpose (i.e. highly nonlinear but not chaotic). Perhaps there are more such lessons to be learned from biology.
So, what can be done right now? To us it is clear that a better understanding of the physics behind memristive devices is key to the progress of the field [133] . A deeper understanding will allow us to harness the properties of the system for brain-like computation, rather than trying to fabricate some arbitrary brain-like behavior that may or may not matter in the context of a specific application, or, worse, may not scale up. Instead of thinking at the level of mimicking neurons and synapses, we can take inspiration from biological systems, consider the dynamics required for neuronal processing, and use the material physics to reproduce them.
## 6. Conclusion
Memristor technologies have yet to realise the full potential promoted over the last 15 years. Although predominantly seen as candidates to replace or augment current digital memory technologies, the impact of memristor technologies on the broader fields of artificial intelligence and cognitive computing platforms is likely to be even more significant. As discussed in this progress report, the versatility of memristor technologies has resulted in their use across a range of applications: from in-memory computing, deep learning accelerators, and spiking neural networks to more futuristic bio-inspired computing paradigms. These approaches should not be seen as solutions to the same problem, nor as technologies in direct competition with one another or with current, very successful, CMOS systems. It is also crucial to recognise that many of the research areas discussed are still at the very beginning of their development. The more mature among them will likely produce industrially relevant solutions sooner. For example, greater power efficiency is a pressing need that many engineers are trying to address, and memristor-based in-memory computing and deep learning accelerators represent an attractive proposition for extreme power efficiency.
There is also significant scope for more fundamental work. New generations of bio-inspired algorithms would further boost advances in hardware systems and platforms. The challenge, and the opportunity, lie in the interdisciplinary nature of the research and the need to understand distinct methodologies and approaches. We believe the community will benefit from a next generation of researchers educated across the traditional disciplines. For example, there is an undeniable link between computer science, more specifically machine learning, and computational neuroscience. The two disciplines could exist separately and pursue distinct goals independently; however, there are great benefits to be gained from a more holistic approach, and a strong case for closer collaboration has been made recently [134] . Collaborations should be expanded to include researchers in solid-state physics, materials science, nanoelectronics, circuit/architecture design, and information theory. Memristors show great promise as a fabric for producing brain-inspired building blocks [135] , and this progress report showcases different types of memristor-based applications. Memristor technologies are versatile enough to provide a platform on which these disciplines can work together to push the frontiers of our current technologies in the most fundamental way.
## Acknowledgements
A.M. acknowledges funding and support from the Royal Academy of Engineering under the Research Fellowship scheme. A.S. acknowledges funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement number 682675). B.R. acknowledges partial support from IBM and Cisco. O. S. acknowledges funding from the European Research Council (ERC) under the European Union Horizon 2020 research and innovation program (grant agreement 725731). E.V. would like to acknowledge a Google Faculty Research Award (2017). AJK acknowledges funding from the Engineering and Physical Sciences Research Council (EPSRC).
## References
- 1 Dario Amodei and Danny Hernandez, AI and Compute, https://openai.com/blog/ai-and-compute/, Accessed: March 2020
- 2 M. M. Waldrop, Nature 2016 , 530 , 144.
- 3 V. Sze, Y.-H. Chen, T.-J. Yang, J. S. Emer, Proc. IEEE 2017 , 105 , 2295.
- 4 D. Ielmini, R. Waser, Eds. , Resistive Switching: From Fundamentals of Nanoionic Redox Processes to Memristive Device Applications , Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, 2016 .
- 5 D. B. Strukov, G. S. Snider, D. R. Stewart, R. S. Williams, Nature 2008 , 453 , 80.
- 6 K. Szot, W. Speier, G. Bihlmayer, R. Waser, Nature Mater 2006 , 5 , 312.
- 7 M. A. Zidan, J. P. Strachan, W. D. Lu, Nat Electron 2018 , 1 , 22.
- 8 L. Chua, IEEE Trans. Circuit Theory 1971 , 18 , 507.
- 9 A. Mehonic, A. J. Kenyon, in Defects at Oxide Surfaces (Eds.: J. Jupille, G. Thornton), Springer International Publishing, Cham, 2015 , pp. 401-428.
- 10 S. Yu, Neuro-Inspired Computing Using Resistive Synaptic Devices , Springer Science+Business Media, New York, NY, 2017 .
- 11 O. Mutlu, S. Ghose, J. Gómez-Luna, R. Ausavarungnirun, Microprocessors and Microsystems 2019 , 67 , 28.
- 12 S. W. Keckler, W. J. Dally, B. Khailany, M. Garland, D. Glasco, IEEE Micro 2011 , 31 , 7.
- 13 N. P. Jouppi, A. Borchers, R. Boyle, P. Cantin, C. Chao, C. Clark, J. Coriell, M. Daley, M. Dau, J. Dean, B. Gelb, C. Young, T. V. Ghaemmaghami, R. Gottipati, W. Gulland, R. Hagmann, C. R. Ho, D. Hogberg, J. Hu, R. Hundt, D. Hurt, J. Ibarz, N. Patil, A. Jaffey, A. Jaworski, A. Kaplan, H. Khaitan, D. Killebrew, A. Koch, N. Kumar, S. Lacy, J. Laudon, J. Law, D. Patterson, D. Le, C. Leary, Z. Liu, K. Lucke, A. Lundin, G. MacKean, A. Maggiore, M. Mahony, K. Miller, R. Nagarajan, G. Agrawal, R. Narayanaswami, R. Ni, K. Nix, T. Norrie, M. Omernick, N. Penukonda, A. Phelps, J. Ross, M. Ross, A. Salek, R. Bajwa, E. Samadiani, C. Severn, G. Sizikov, M. Snelham, J. Souter, D. Steinberg, A. Swing, M. Tan, G. Thorson, B. Tian, S. Bates, H. Toma, E. Tuttle, V. Vasudevan, R. Walter, W. Wang, E. Wilcox, D. H. Yoon, S. Bhatia, N. Boden, in Proceedings of the 44th Annual International Symposium on Computer Architecture - ISCA '17 , ACM Press, Toronto, ON, Canada, 2017 , pp. 1-12.
- 14 A. Sebastian, T. Tuma, N. Papandreou, M. Le Gallo, L. Kull, T. Parnell, E. Eleftheriou, Nat Commun 2017 , 8 , 1115.
- 15 J. J. Yang, D. B. Strukov, D. R. Stewart, Nature Nanotech 2013 , 8 , 13.
- 16 D. Ielmini, H.-S. P. Wong, Nat Electron 2018 , 1 , 333.
- 17 A. Sebastian, M. Le Gallo, R. Khaddam-Aljameh, E. Eleftheriou, Nat. Nanotechnol. 2020 , DOI 10.1038/s41565-020-0655-z.
- 18 M. Di Ventra, Y. V. Pershin, Nature Phys 2013 , 9 , 200.
- 19 Z. Sun, G. Pedretti, E. Ambrosi, A. Bricalli, W. Wang, D. Ielmini, Proc Natl Acad Sci USA 2019 , 116 , 4123.
- 20 L. Chua, Appl. Phys. A 2011 , 102 , 765.
- 21 H.-S. P. Wong, S. Salahuddin, Nature Nanotech 2015 , 10 , 191.
- 22 A. Sebastian, M. Le Gallo, E. Eleftheriou, J. Phys. D: Appl. Phys. 2019 , 52 , 443002.
- 23 A. Sebastian, M. Le Gallo, G. W. Burr, S. Kim, M. BrightSky, E. Eleftheriou, Journal of Applied Physics 2018 , 124 , 111101.
- 24 J. Borghetti, G. S. Snider, P. J. Kuekes, J. J. Yang, D. R. Stewart, R. S. Williams, Nature 2010 , 464 , 873.
- 25 I. Vourkas, G. Ch. Sirakoulis, IEEE Circuits Syst. Mag. 2016 , 16 , 15.
- 26 S. Kvatinsky, D. Belousov, S. Liman, G. Satat, N. Wald, E. G. Friedman, A. Kolodny, U. C. Weiser, IEEE Trans. Circuits Syst. II 2014 , 61 , 895.
- 27 A. Haj-Ali, R. Ben-Hur, N. Wald, R. Ronen, S. Kvatinsky, IEEE Trans. Circuits Syst. I 2018 , 65 , 4258.
- 28 S. Hamdioui, H. A. Du Nguyen, M. Taouil, A. Sebastian, M. L. Gallo, S. Pande, S. Schaafsma, F. Catthoor, S. Das, F. G. Redondo, G. Karunaratne, A. Rahimi, L. Benini, in 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE) , IEEE, Florence, Italy, 2019 , pp. 486-491.
- 29 A. Rahimi, S. Datta, D. Kleyko, E. P. Frady, B. Olshausen, P. Kanerva, J. M. Rabaey, IEEE Trans. Circuits Syst. I 2017 , 64 , 2508.
- 30 G. Karunaratne, M. L. Gallo, G. Cherubini, L. Benini, A. Rahimi, A. Sebastian, arXiv:1906.01548 [physics] 2020 .
- 31 N. Papandreou, H. Pozidis, A. Pantazi, A. Sebastian, M. Breitwisch, C. Lam, E. Eleftheriou, in 2011 IEEE International Symposium of Circuits and Systems (ISCAS) , IEEE, Rio de Janeiro, Brazil, 2011 , pp. 329-332.
- 32 R. Carboni, D. Ielmini, Adv. Electron. Mater. 2019 , 5 , 1900198.
- 33 Y. Shim, S. Chen, A. Sengupta, K. Roy, Sci Rep 2017 , 7 , 14101.
- 34 H. Nili, G. C. Adam, B. Hoskins, M. Prezioso, J. Kim, M. R. Mahmoodi, F. M. Bayat, O. Kavehei, D. B. Strukov, Nat Electron 2018 , 1 , 197.
- 35 M. Le Gallo, A. Sebastian, G. Cherubini, H. Giefers, E. Eleftheriou, IEEE Trans. Electron Devices 2018 , 65 , 4304.
- 36 G. W. Burr, R. M. Shelby, A. Sebastian, S. Kim, S. Kim, S. Sidler, K. Virwani, M. Ishii, P. Narayanan, A. Fumarola, L. L. Sanches, I. Boybat, M. Le Gallo, K. Moon, J. Woo, H. Hwang, Y. Leblebici, Advances in Physics: X 2017 , 2 , 89.
- 37 M. A. Zidan, J. P. Strachan, W. D. Lu, Nat Electron 2018 , 1 , 22.
- 38 B. Fleischer, S. Shukla, M. Ziegler, J. Silberman, J. Oh, V. Srinivasan, J. Choi, S. Mueller, A. Agrawal, T. Babinsky, N. Cao, C.-Y. Chen, P. Chuang, T. Fox, G. Gristede, M. Guillorn, H. Haynie, M. Klaiber, D. Lee, S.H. Lo, G. Maier, M. Scheuermann, S. Venkataramani, C. Vezyrtzis, N. Wang, F. Yee, C. Zhou, P.-F. Lu, B. Curran, L. Chang, K. Gopalakrishnan, in 2018 IEEE Symposium on VLSI Circuits , IEEE, Honolulu, HI, 2018 , pp. 35-36.
- 39 G. W. Burr, R. M. Shelby, S. Sidler, C. di Nolfo, J. Jang, I. Boybat, R. S. Shenoy, P. Narayanan, K. Virwani, E. U. Giacometti, B. N. Kurdi, H. Hwang, IEEE Trans. Electron Devices 2015 , 62 , 3498.
- 40 A. Sebastian, I. Boybat, M. Dazzi, I. Giannopoulos, V. Jonnalagadda, V. Joshi, G. Karunaratne, B. Kersting, R. Khaddam-Aljameh, S. R. Nandakumar, A. Petropoulos, C. Piveteau, T. Antonakopoulos, B. Rajendran, M. L. Gallo, E. Eleftheriou, in 2019 Symposium on VLSI Technology , IEEE, Kyoto, Japan, 2019 , pp. T168-T169.
- 41 A. Mehonic, D. Joksas, W. H. Ng, M. Buckwell, A. J. Kenyon, Front. Neurosci. 2019 , 13 , 593.
- 42 V. Joshi, M. L. Gallo, I. Boybat, S. Haefeli, C. Piveteau, M. Dazzi, B. Rajendran, A. Sebastian, E. Eleftheriou, arXiv:1906.03138 [cs] 2019 .
- 43 D. Joksas, P. Freitas, Z. Chai, W. H. Ng, M. Buckwell, W. D. Zhang, A. J. Kenyon, A. Mehonic, arXiv:1909.06658 [cs] 2019 .
- 44 M. Le Gallo, D. Krebs, F. Zipoli, M. Salinga, A. Sebastian, Adv. Electron. Mater. 2018 , 4 , 1700627.
- 45 S. R. Nandakumar, M. L. Gallo, C. Piveteau, V. Joshi, G. Mariani, I. Boybat, G. Karunaratne, R. KhaddamAljameh, U. Egger, A. Petropoulos, T. Antonakopoulos, B. Rajendran, A. Sebastian, E. Eleftheriou, arXiv:2001.11773 [cs] 2020 .
- 46 I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, Y. Bengio, arXiv:1609.07061 [cs] 2016 .
- 47 M. Le Gallo, A. Sebastian, R. Mathis, M. Manica, H. Giefers, T. Tuma, C. Bekas, A. Curioni, E. Eleftheriou, Nat Electron 2018 , 1 , 246.
- 48 S. R. Nandakumar, M. Le Gallo, I. Boybat, B. Rajendran, A. Sebastian, E. Eleftheriou, in 2018 IEEE International Symposium on Circuits and Systems (ISCAS) , IEEE, Florence, 2018 , pp. 1-5.
- 49 E. Eleftheriou, G. Karunaratne, B. Kersting, M. Stanisavljevic, V. P. Jonnalagadda, N. Ioannou, K. Kourtis, P. A. Francese, A. Sebastian, M. L. Gallo, S. R. Nandakumar, C. Piveteau, I. Boybat, V. Joshi, R. KhaddamAljameh, M. Dazzi, I. Giannopoulos, IBM J. Res. & Dev. 2019 , 63 , 7:1.
- 50 F. Alibart, E. Zamanidoost, D. B. Strukov, Nat Commun 2013 , 4 , 2072.
- 51 T. Gokmen, Y. Vlasov, Front. Neurosci. 2016 , 10 , DOI 10.3389/fnins.2016.00333
- 52 S. Ambrogio, P. Narayanan, H. Tsai, R. M. Shelby, I. Boybat, C. di Nolfo, S. Sidler, M. Giordano, M. Bodini, N. C. P. Farinha, B. Killeen, C. Cheng, Y. Jaoudi, G. W. Burr, Nature 2018 , 558 , 60.
- 53 V. Seshadri, T. C. Mowry, D. Lee, T. Mullins, H. Hassan, A. Boroumand, J. Kim, M. A. Kozuch, O. Mutlu, P. B. Gibbons, in Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture MICRO-50 '17 , ACM Press, Cambridge, Massachusetts, 2017 , pp. 273-287.
- 54 S. Aga, S. Jeloka, A. Subramaniyan, S. Narayanasamy, D. Blaauw, R. Das, in 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) , IEEE, Austin, TX, 2017 , pp. 481-492.
- 55 N. Verma, H. Jia, H. Valavi, Y. Tang, M. Ozatay, L.-Y. Chen, B. Zhang, P. Deaville, IEEE Solid-State Circuits Mag. 2019 , 11 , 43.
- 56 F. Xiong, A. D. Liao, D. Estrada, E. Pop, Science 2011 , 332 , 568.
- 57 Kai-Shin Li, C. Ho, Ming-Taou Lee, Min-Cheng Chen, Cho-Lun Hsu, J. M. Lu, C. H. Lin, C. C. Chen, B. W. Wu, Y. F. Hou, C. Yi. Lin, Y. J. Chen, T. Y. Lai, M. Y. Li, I. Yang, C. S. Wu, Fu-Liang Yang, in 2014 Symposium on VLSI Technology (VLSI-Technology): Digest of Technical Papers , IEEE, Honolulu, HI, USA, 2014 , pp. 1-2.
- 58 M. Salinga, B. Kersting, I. Ronneberger, V. P. Jonnalagadda, X. T. Vu, M. Le Gallo, I. Giannopoulos, O. Cojocaru-Mirédin, R. Mazzarello, A. Sebastian, Nature Mater 2018 , 17 , 681.
- 59 S. Pi, C. Li, H. Jiang, W. Xia, H. Xin, J. J. Yang, Q. Xia, Nature Nanotech 2019 , 14 , 35.
- 60 M. Buckwell, L. Montesi, S. Hudziak, A. Mehonic, A. J. Kenyon, Nanoscale 2015 , 7 , 18030.
- 61 S. Brivio, J. Frascaroli, S. Spiga, Nanotechnology 2017 , 28 , 395202.
- 62 S. Choi, S. H. Tan, Z. Li, Y. Kim, C. Choi, P.-Y. Chen, H. Yeon, S. Yu, J. Kim, Nature Mater 2018 , 17 , 335.
- 63 I. Boybat, M. Le Gallo, S. R. Nandakumar, T. Moraitis, T. Parnell, T. Tuma, B. Rajendran, Y. Leblebici, A. Sebastian, E. Eleftheriou, Nat Commun 2018 , 9 , 2514.
- 64 W. W. Koelmans, A. Sebastian, V. P. Jonnalagadda, D. Krebs, L. Dellmann, E. Eleftheriou, Nat Commun 2015 , 6 , 8181.
- 65 I. Giannopoulos, A. Sebastian, M. Le Gallo, V. P. Jonnalagadda, M. Sousa, M. N. Boon, E. Eleftheriou, in 2018 IEEE International Electron Devices Meeting (IEDM) , IEEE, San Francisco, CA, 2018 , pp. 27.7.1-27.7.4.
- 66 S. Yu, Proc. IEEE 2018 , 106 , 260.
- 67 M. Le Gallo, T. Tuma, F. Zipoli, A. Sebastian, E. Eleftheriou, in 2016 46th European Solid-State Device Research Conference (ESSDERC) , IEEE, Lausanne, Switzerland, 2016 , pp. 373-376.
- 68 S. Yu, B. Gao, Z. Fang, H. Yu, J. Kang, H.-S. P. Wong, in 2012 International Electron Devices Meeting , IEEE, San Francisco, CA, USA, 2012 , pp. 10.4.1-10.4.4.
- 69 P. Yao, H. Wu, B. Gao, J. Tang, Q. Zhang, W. Zhang, J. J. Yang, H. Qian, Nature 2020 , 577 , 641.
- 70 P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S. K. Esser, R. Appuswamy, B. Taba, A. Amir, M. D. Flickner, W. P. Risk, R. Manohar, D. S. Modha, Science 2014 , 345 , 668.
- 71 M. Jerry, W. Tsai, B. Xie, X. Li, V. Narayanan, A. Raychowdhury, S. Datta, in 2016 74th Annual Device Research Conference (DRC) , IEEE, Newark, DE, USA, 2016 , pp. 1-2.
- 72 T. Tuma, A. Pantazi, M. L. Gallo, A. Sebastian, E. Eleftheriou, Nat. Nanotechnol. 2016 , 11, 693.
- 73 A. Mehonic, A. J. Kenyon, Front. Neurosci. 2016 , 10 , DOI 10.3389/fnins.2016.00057.
- 74 A. Sengupta, A. Ankit, K. Roy, in 2017 International Joint Conference on Neural Networks (IJCNN) , IEEE, Anchorage, AK, USA, 2017 , pp. 4557-4563.
- 75 P. U. Diehl, D. Neil, J. Binas, M. Cook, S.-C. Liu, M. Pfeiffer, in 2015 International Joint Conference on Neural Networks (IJCNN) , IEEE, Killarney, Ireland, 2015 , pp. 1-8.
- 76 A. Sengupta, Y. Ye, R. Wang, C. Liu, K. Roy, Front. Neurosci. 2019 , 13 , 95.
- 77 R. Midya, Z. Wang, S. Asapu, S. Joshi, Y. Li, Y. Zhuo, W. Song, H. Jiang, N. Upadhay, M. Rao, P. Lin, C. Li, Q. Xia, J. J. Yang, Adv. Electron. Mater. 2019 , 5, 1900060.
- 78 G. Bi, M. Poo, J. Neurosci. 1998 , 18 , 10464.
- 79 H. Shouval, Front. Comput. Neurosci. 2010 , DOI 10.3389/fncom.2010.00019.
- 80 Z. Brzosko, S. B. Mierau, O. Paulsen, Neuron 2019 , 103 , 563.
- 81 J. Seo, B. Brezzo, Y. Liu, B. D. Parker, S. K. Esser, R. K. Montoye, B. Rajendran, J. A. Tierno, L. Chang, D. S. Modha, D. J. Friedman, in 2011 IEEE Custom Integrated Circuits Conference (CICC) , IEEE, San Jose, CA, USA, 2011 , pp. 1-4.
- 82 D. Kuzum, R. G. D. Jeyasingh, B. Lee, H.-S. P. Wong, Nano Lett. 2012 , 12 , 2179.
- 83 S. Kim, C. Du, P. Sheridan, W. Ma, S. Choi, W. D. Lu, Nano Lett. 2015 , 15 , 2203.
- 84 K. Zarudnyi, A. Mehonic, L. Montesi, M. Buckwell, S. Hudziak, A. J. Kenyon, Front. Neurosci. 2018 , 12 , 57.
- 85 S. Kim, M. Ishii, S. Lewis, T. Perri, M. BrightSky, W. Kim, R. Jordan, G. W. Burr, N. Sosa, A. Ray, J.-P. Han, C. Miller, K. Hosokawa, C. Lam, in 2015 IEEE International Electron Devices Meeting (IEDM) , IEEE, Washington, DC, USA, 2015 , pp. 17.1.1-17.1.4.
- 86 I. Boybat, M. Le Gallo, S. R. Nandakumar, T. Moraitis, T. Parnell, T. Tuma, B. Rajendran, Y. Leblebici, A. Sebastian, E. Eleftheriou, Nat Commun 2018 , 9 , 2514.
- 87 A. Serb, J. Bill, A. Khiat, R. Berdan, R. Legenstein, T. Prodromakis, Nat Commun 2016 , 7 , 12611.
- 88 Y. Fang, Z. Wang, J. Gomez, S. Datta, A. I. Khan, A. Raychowdhury, Front. Neurosci. 2019 , 13 , 855.
- 89 N. Anwani, B. Rajendran, Neurocomputing 2020 , 380 , 67.
- 90 F. Zenke, S. Ganguli, Neural Computation 2018 , 30 , 1514.
- 91 E. O. Neftci, H. Mostafa, F. Zenke, IEEE Signal Process. Mag. 2019 , 36 , 51.
- 92 S. R. Nandakumar, I. Boybat, M. L. Gallo, E. Eleftheriou, A. Sebastian, B. Rajendran, arXiv:1905.11929 [cs] 2019 .
- 93 W. Maass, Proc. IEEE 2014 , 102 , 860.
- 94 M. Payvand, M. V. Nair, L. K. Müller, G. Indiveri, Faraday Discuss. 2019 , 213 , 487.
- 95 D. Koller, N. Friedman, Probabilistic Graphical Models: Principles and Techniques , MIT Press, Cambridge, MA, 2009 .
- 96 R. J. Williams, D. Zipser, Neural Computation 1989 , 1 , 270.
- 97 A. Griewank, A. Walther, Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation , Society For Industrial And Applied Mathematics, Philadelphia, PA, 2008 .
- 98 J. H. Lee, T. Delbruck, M. Pfeiffer, Front. Neurosci. 2016 , 10 , DOI 10.3389/fnins.2016.00508.
- 99 H. Mostafa, IEEE Trans. Neural Netw. Learning Syst. 2017 , 1.
- 100 D. Huh, T. J. Sejnowski, arXiv:1706.04698 [cs, q-bio, stat] 2017 .
- 101 E. O. Neftci, C. Augustine, S. Paul, G. Detorakis, Front. Neurosci. 2017 , 11 , 324.
- 102 B. Nessler, M. Pfeiffer, L. Buesing, W. Maass, PLoS Comput Biol 2013 , 9 , e1003037
- 103 R. M. Neal, Artificial Intelligence 1992 , 56 , 71.
- 104 Y. Tang, R. Salakhutdinov, Advances In Neural Information Processing Systems 2013 (Pp. 530-538).
- 105 Y. Bengio, N. Léonard, A. Courville, arXiv:1308.3432 [cs] 2013 .
- 106 G. E. Hinton, S. Osindero, Y.-W. Teh, Neural Computation 2006 , 18 , 1527
- 107 R. M. Neal, Artificial Intelligence 1992 , 56 , 71.
- 108 D. Barber, P. Sollich, in Advances In Neural Information Processing Systems 2000 (Pp. 393-399).
- 109 T. Raiko, M. Berglund, G. Alain, L. Dinh, arXiv:1406.2989 [cs, stat] 2015 .
- 110 S. Gu, S. Levine, I. Sutskever, A. Mnih, arXiv:1511.05176 [cs] 2016 .
- 111 H. Jang, O. Simeone, B. Gardner, A. Gruning, IEEE Signal Process. Mag. 2019 , 36 , 64.
- 112 J. Brea, W. Senn, J.-P. Pfister, Journal of Neuroscience 2013 , 33 , 9565
- 113 B. Rosenfeld, O. Simeone, B. Rajendran, in 2019 IEEE 20th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC) , IEEE, Cannes, France, 2019 , pp. 1-5.
- 114 S. Mohamed, M. Rosca, M. Figurnov, A. Mnih, arXiv:1906.10652 [cs, math, stat] 2019 .
- 115 H. Jang, O. Simeone, in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , IEEE, Brighton, United Kingdom, 2019 , pp. 3382-3386.
- 116 S.-C. Liu, B. Rueckauer, E. Ceolini, A. Huber, T. Delbruck, IEEE Signal Process. Mag. 2019 , 36 , 29.
- 117 C. Xu, W. Zhang, Y. Liu, P. Li, Front. Neurosci. 2020 , 14 , 104.
- 118 A. Bagheri, O. Simeone, B. Rajendran, in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , IEEE, Calgary, AB, 2018 , pp. 2986-2990.
- 119 N. Kasabov, Neural Networks 2010 , 23 , 16.
- 120 H. Mostafa, G. Cauwenberghs, Neural Computation 2018 , 30 , 1542.
- 121 N. Skatchkovsky, H. Jang, O. Simeone, arXiv:1910.09594 [cs, eess, stat] 2019 .
- 122 D. E. Rumelhart, G. E. Hinton, R. J. Williams, Nature 1986 , 323 , 533.
- 123 D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, D. Hassabis, Nature 2017 , 550 , 354.
- 124 E. Strubell, A. Ganesh, A. McCallum, arXiv:1906.02243 [cs] 2019 .
- 125 Q. Xia, J. J. Yang, Nat. Mater. 2019 , 18 , 309.
- 126 M. Rahimi Azghadi, Y.-C. Chen, J. K. Eshraghian, J. Chen, C.-Y. Lin, A. Amirsoleimani, A. Mehonic, A. J. Kenyon, B. Fowler, J. C. Lee, Y.-F. Chang, Advanced Intelligent Systems 2020 , 2 , 1900189.
- 127 H. Jaeger, The 'Echo State' Approach to Analysing and Training Recurrent Neural Networks , GMD - German National Research Institute for Computer Science, 2001 .
- 128 W. Maass, T. Natschläger, H. Markram, Neural Computation 2002 , 14 , 2531.
- 129 L. Manneschi, E. Vasilaki, Nat Mach Intell 2020 , 2 , 155.
- 130 M. Hermans, B. Schrauwen, Neural Computation 2012 , 24 , 104.
- 131 C. Du, F. Cai, M. A. Zidan, W. Ma, S. H. Lee, W. D. Lu, Nat Commun 2017 , 8 , 2204.
- 132 L. Manneschi, A. C. Lin, E. Vasilaki, arXiv:1912.08124 [cs, stat] 2019 .
- 133 M. Lanza, H.-S. P. Wong, E. Pop, D. Ielmini, D. Strukov, B. C. Regan, L. Larcher, M. A. Villena, J. J. Yang, L. Goux, A. Belmonte, Y. Yang, F. M. Puglisi, J. Kang, B. Magyari-Köpe, E. Yalon, A. Kenyon, M. Buckwell, A. Mehonic, A. Shluger, H. Li, T.-H. Hou, B. Hudec, D. Akinwande, R. Ge, S. Ambrogio, J. B. Roldan, E. Miranda, J. Suñe, K. L. Pey, X. Wu, N. Raghavan, E. Wu, W. D. Lu, G. Navarro, W. Zhang, H. Wu, R. Li, A. Holleitner, U. Wurstbauer, M. C. Lemme, M. Liu, S. Long, Q. Liu, H. Lv, A. Padovani, P. Pavan, I. Valov, X. Jing, T. Han, K. Zhu, S. Chen, F. Hui, Y. Shi, Adv. Electron. Mater. 2019 , 5 , 1800143.
- 134 J. B. Aimone, Commun. ACM 2019 , 62 , 110.
- 135 J. D. Kendall, S. Kumar, Applied Physics Reviews 2020 , 7 , 011305.