## Analog Alchemy: Neural Computation with In-Memory Inference, Learning and Routing
Dissertation submitted for the degree of Doctor of Natural Sciences
(Dr. sc. UZH ETH Zürich)
to the Faculty of Mathematics and Natural Sciences of the University of Zurich and ETH Zurich by
Yiğit Demirağ
aus der
Türkei
Doctoral committee
Prof. Dr. Giacomo Indiveri (chair and supervisor)
Prof. Dr. Melika Payvand
Prof. Dr. Benjamin Grewe
Zürich, 2024
## yigit demirag
## ANALOG ALCHEMY: NEURAL COMPUTATION WITH IN-MEMORY INFERENCE, LEARNING AND ROUTING
To the engineers and scientists who will one day build superintelligence; from whatever materials and circuits, in whatever form.
## ABSTRACT
As neural computation is revolutionizing the field of Artificial Intelligence (AI), rethinking the ideal neural hardware is becoming the next frontier. The fast and reliable von Neumann architecture has been the hosting platform for neural computation. Although capable, its separation of memory and computation creates a bottleneck for the energy efficiency of neural computation, in contrast to the biological brain. The question remains: how can we efficiently combine memory and computation, while exploiting the physics of the substrate, to build intelligent systems? In this thesis, I explore an alternative way with memristive devices for neural computation, where the unique physical dynamics of the devices are used for inference, learning and routing. Guided by the principles of gradient-based learning, we selected the functions that need to be materialized and analyzed connectomics principles for efficient wiring. Despite the non-idealities and noise inherent in analog physics, I will provide hardware evidence of the adaptability of local learning to memristive substrates, new material stacks and circuit blocks that aid in solving the credit assignment problem, and efficient routing between analog crossbars for scalable architectures. First, I address the limited bit precision problem of binary Resistive Random Access Memory (RRAM) devices for stable training. By introducing a new device programming technique that precisely controls the filament growth process, we enhance the effective bit precision of these devices. Later, we prove the versatility of this technique by applying it to novel perovskite memristors. Second, I focus on the hard problem of online credit assignment in recurrent Spiking Neural Networks (SNNs) in the presence of memristor non-idealities. I present a simulation framework based on a comprehensive statistical model of a Phase Change Material (PCM) crossbar array, capturing all major device non-idealities. Building upon the recently developed e-prop local learning rule, we demonstrate that gradient accumulation is crucial for reliably implementing the learning rule with memristive devices. Moreover, I introduce PCM-trace, a scalable implementation of synaptic eligibility traces, a functional block demanded by many learning rules, using the volatile characteristics of specifically fabricated PCM devices. Third, I present our discovery of a novel memristor material capable of switching between volatile and non-volatile modes. This reconfigurable memristor, based on halide perovskite nanocrystals, offers a significant advancement in emerging memory technologies, enabling the implementation of both static and dynamic neural variables with the same material and fabrication technique, while holding the world record in endurance. Finally, I introduce Mosaic, a memristive systolic architecture for in-memory computing and routing. Mosaic, trained with our novel layout-aware training methods, efficiently implements small-world graph connectivity and demonstrates superior energy efficiency in spike routing compared to other hardware platforms.
## ACKNOWLEDGEMENTS
This thesis wouldn't have been possible without many people: scientists, friends, and family. I'm honored to have shared this journey with such curious and driven individuals. Among the many who contributed, there are a few exceptional individuals who were absolutely core to making this happen:
First I'd like to thank my supervisor, Giacomo Indiveri, who is a rare scientist truly channeling his work towards a dream. Over these 5 years, he gave me complete freedom to explore what I believe are the most exciting problems, while providing me with high-bandwidth feedback on demand, with more than 500 emails and many thousands of DMs. He taught me the importance of pushing unusual ideas to the limit. Whenever I came up with an ambitious project goal, he always reminded me to deeply consider the efficiency on the silicon, first. I'm grateful for having been his student.
I've been very fortunate to coincide with Melika Payvand, my co-supervisor, in this particular academic space and time. She is the most curious mind craving to understand the emergence of intelligence from the physics of computation, and her passion is infectious. Together, we traversed the probability trees for nearly every project in my PhD, and executed against the entropy. Her close friendship is the cherry on the cake; I enjoyed and valued every second of it.
Then there are people I am very lucky to collaborate with and learn from. Rohit A. John, an extraordinary person who taught me the importance of grinding with massive focus while solving hard problems. And Elisa Vianello, who always provided her seamless support and insights that have made hard projects a joyful exploration. And Emre Neftci, whose disruptive scientific ideas deeply resonated with me, and with whom I always enjoyed discussing ideas.
I have to thank Alpha, Anqchi, Arianna, Chiara, Dmitrii, Farah, Filippo, Jimmy, Karthik, Manu, Maryada, Nicoletta, Tristan, and many others, who I hope will forgive me for not being mentioned individually or for resorting to alphabetical order when I did. Thank you for inspiring conversations in INI hallways, night walks in Zurich, and giving me the privilege of calling you, my friends.
During my PhD, I completed two internships at Google Zurich and one research visit at MILA. All of these were fantastic learning experiences, where I got a chance to reshape my research scope. From these experiences, I would especially like to thank Jyrki Alakuijala, Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, Alexander Mordvintsev, Esteban Real, Arna Ghosh, Jonathan Cornford, Joao Sacramento, Blake Richards, Guillaume Lajoie, and Blaise Aguera y Arcas.
I would also like to extend my thanks to my professors from my Master's degree, particularly Ekmel Ozbay, Bayram Butun and Yusuf Leblebici, for their invaluable support and inspiration.
And to Gizay, for sharing most of the journey and everything we created together.
But most of all, I want to express my deepest gratitude to my mom and dad, who inspired me to be curious, to take the world as a playground, and provided me with a loving home. And to my little brother, Efe, who is the best teammate in every game we play and in life's adventures.
## CONTENTS
- 1 Introduction
- 2 Enhancing Bit Precision of Binary Memristors for Robust On-chip Learning
  - 2.1 Introduction
  - 2.2 ReRAM Device Modeling
  - 2.3 Bit-Precision Enhancing Weight Update Rule
  - 2.4 Learning Circuits and Architecture
  - 2.5 System-level Simulations
  - 2.6 Discussion
- 3 Online Temporal Credit Assignment with Non-volatile and Volatile Memristors
  - 3.1 Framework for Online Training of RSNNs with Non-volatile Memristors
    - 3.1.1 Introduction
    - 3.1.2 Building blocks for training on in-memory processing cores
    - 3.1.3 PCM device modeling and integration into neural networks
    - 3.1.4 Discussion
  - 3.2 Implementing Online Training of RSNNs on a Neuromorphic Hardware
    - 3.2.1 From the simulation to an analog chip
    - 3.2.2 Discussion
  - 3.3 Scalable Synaptic Eligibility Traces with Volatile Memristive Devices
    - 3.3.1 Introduction
    - 3.3.2 PCM-trace: Implementing eligibility traces with PCM drift
    - 3.3.3 Multi PCM-trace: Increasing the dynamic range of traces
    - 3.3.4 Circuits and Architecture
    - 3.3.5 Discussion
- 4 Discovering Single Material that Switches Between Volatile or Non-Volatile Modes
  - 4.1 Introduction
  - 4.2 Diffusive Mode of the Perovskite Reconfigurable Memristor
  - 4.3 Drift Mode of the Perovskite Reconfigurable Memristor
  - 4.4 Reservoir Computing with Perovskite Memristors
  - 4.5 Diffusive Perovskite Memristors as Reservoir Elements
  - 4.6 Drift Perovskite Memristors as Readout Elements
  - 4.7 Classification of Neural Firing Patterns
  - 4.8 Discussion
  - 4.9 Methods
- 5 Mosaic: An Analog Systolic Architecture for In-Memory Computing and Routing
  - 5.1 Introduction
  - 5.2 Mosaic Hardware Computing and Routing Measurements
  - 5.3 Analog Hardware-aware Simulations
  - 5.4 Benchmarking Routing Energy in Neuromorphic Platforms
  - 5.5 Discussion
  - Methods
  - 5.6 Conclusions
- A Appendix
- Bibliography
- Contributions
- Publications
## INTRODUCTION

You've got to listen to the silicon, because it's always trying to tell you what it can do.
Carver Mead
How do we imbue the spark of intelligence into lifeless computational physical substrates? This question has been my quest, inspired by the early pioneers such as McCulloch and Pitts [ 1 ], Alan Turing [ 2 ] and von Neumann [ 3 ], who laid the foundation of modern neural computing. As the field of AI progressively gains superiority in numerous benchmarks, the quest to understand intelligence and to rethink its ideal implementation on a physical substrate has never been more pressing.
While intelligence remains elusive, with numerous definitions, learning 1 seems to me to be a cornerstone of intelligence. Whether natural or artificial, an intelligent agent must adapt to survive and replicate. Bacteria can learn to swim away from environments that lower the probability of successful replication [ 4 ]. And Artificial Neural Network (ANN) architectures that absorb datasets better are preferred by AI researchers and industry [ 5 ]. This is a common theme: intelligent systems must learn well to last. And any physical implementation of an intelligent system likewise needs to implement learning dynamics. However, the computational demands of learning place an enormous burden on existing hardware.
For over 75 years, computing hardware has relied on the von Neumann architecture: synchronous, deterministic, binary logic driving a processing unit that interfaces with a separate memory subsystem. This design excels at executing arbitrary sequential instructions, but it necessitates constant shuttling of data between memory and compute. 2 Memory hierarchies, with their layers of progressively larger and slower storage, have been the stopgap solution, but fundamentally the bottleneck remains. This non-local memory access is a leading factor in the latency and energy consumption of modern AI systems. In stark contrast, neural computation in biology is inherently intertwined with memory, operating asynchronously, sparsely, and stochastically. This calls for a fundamental rethinking of computing, where neural models and hardware are co-designed with locality and physics-awareness as first principles.
## An Alternative Path
This thesis departs from the well-established path of digital accelerator design. The inherent noise tolerance of neural networks presents an opportunity to relax strict precision and determinism requirements in both the compute and memory subsystems of digital electronics. In turn, this relaxation unlocks exotic modes of computation, where the subthreshold regime of transistors and the raw physics of novel materials can be exploited for neural computation and storage. Historically, this is the essence of neuromorphic engineering, where a deliberate trade-off is made, favoring the low power and scalability offered by statistical physical processes over the theoretical precision of Boolean algebra.
This thesis optimizes neural computation across Marr's computational, algorithmic, and implementational levels [ 6 ], advocating for the co-design of neural models and hardware with locality and physics-awareness as guiding principles. By pushing critical neural operations to the fundamental level of material physics, engaging electrical, chemical, or even mechanical properties, we explore a new frontier in low-power neural computing.
Specifically, we investigate the following key strategies, which are detailed in the subsequent sections:
1 The dynamic process of adapting to environmental pressure to improve the probability of survival or replication.
2 partially due to the rapid time-multiplexing of resources.
- Analog In-Memory Computing: We exploit the physics of volatile (temporary) or nonvolatile (permanent) materials to perform critical neural network operations directly within the memory units. This non-von Neumann architecture fundamentally eliminates the need for data movement between memory and compute units, which are traditionally decoupled systems.
- Local Learning: We depart from the computationally intensive Back-Propagation Through Time (BPTT) for training, opting instead for local and online gradient-based learning rules. These rules inherently exhibit varying degrees of variance and bias in their estimates of the gradient of an objective function [ 7 ]. However, they offer significant advantages in hardware in terms of power efficiency and simplified implementation due to local availability of weight update signals and the elimination of the need for buffering intermediate values.
- Analog In-Memory Routing: We impose locally dense and globally sparse connectivity on neural networks. This connectivity prior allows high utilization of routers built with non-volatile materials to efficiently transmit neural activations between cores of analog systolic arrays.
- Physics-aware Training: We utilize data-driven optimization to counteract non-idealities inherent to analog technologies. This involves collecting extensive component measurements to model their collective behavior, tailoring learning algorithms and circuits accordingly. Additionally, we employ gradient-based architectural adaptations and weight re-parameterizations for robust on-chip inference and training. Our methods are validated on small-scale fabrications to assess their on-device performance and scalability potential.
In the following section, I will explain memristors, the prima materia of our endeavor, which enable and unify the strategies explored in this thesis.
## Memristors for Analog In-Memory Neural Operations
Neural network computation, biological or artificial, is fundamentally memory-centric. The human brain operates on $O(10^{15})$ synapses [ 8 ], while Large Language Models (LLMs) like GPT-4 perform non-linear operations on $O(10^{12})$ parameters [ 9 ]. Scaling laws reveal a direct link between parameter count and performance [ 10 ], suggesting that increasing network size is a reliable path to improved performance in future neural networks.
Given that memory requirements scale quadratically with the number of layers, memory becomes the primary design constraint in neural computing hardware, impacting scalability, throughput, and power efficiency. 3
An ideal memory system for neural computing should be high-density, low-energy, quick to access, and on-chip. However, such an ideal memory does not yet exist, as these requirements are often conflicting. High-density memories (i.e., DRAM, High Bandwidth Memory (HBM), 3D NAND flash) are off-chip with long access times; faster memories that don't use capacitors (i.e., SRAM) are larger in area due to transistor count; and high-bandwidth memories (i.e., HBM) are expensive as they require additional banks, ports and channels. That's why the memory hierarchy exists: to address this trade-off by using multiple levels of progressively larger and slower storage.
Figure 1.1: The importance of data locality is shown in the memory hierarchy: the computation (an 8-bit multiplication) costs only a fraction of the energy of a memory access, in 45 nm CMOS [ 11 ].
[Figure 1.1 depicts a memory-hierarchy pyramid with approximate energy per access in 45 nm CMOS: 8-bit multiply 0.2 pJ; local SRAM (8 kB) 5 pJ per 8 bits; on-chip SRAM (MBs) 50 pJ per 8 bits; DRAM (GBs) 640 pJ per 32 bits.]
Yet, the von Neumann bottleneck persists, with each additional level of the hierarchy (local SRAM → on-chip SRAM → DRAM) incurring roughly an order of magnitude more energy and latency per access. Even in the ideal case of local SRAM, the cost of memory access exceeds that of computation by an order of magnitude. In-memory computing offers a compelling
3 On an edge Tensor Processing Unit (TPU), for instance, memory access can consume over 90% of the total energy, throttle throughput to below 10% of peak capacity, and in general dominate the majority of chip area [ 12 ].
alternative to data movement by processing data where it is stored, and one of the most promising technologies for in-memory computing is memristors.
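To make the gap in Figure 1.1 concrete, the short sketch below estimates the energy of a single fully connected layer when its weights must be streamed from DRAM, using the 45 nm per-access figures quoted above; the layer size and the assumption of one operand fetch per multiply-accumulate are illustrative choices of mine, not numbers from the thesis.

```python
# Back-of-the-envelope energy comparison using the 45 nm numbers from Figure 1.1.
# Layer dimensions and one-DRAM-fetch-per-weight are illustrative assumptions.

E_MUL_8B = 0.2e-12      # J per 8-bit multiply
E_LOCAL_SRAM = 5e-12    # J per 8-bit access from local SRAM
E_DRAM_32B = 640e-12    # J per 32-bit DRAM access

def layer_energy(n_in, n_out, weights_in_dram=True):
    """Energy of one matrix-vector product of an n_in x n_out layer."""
    macs = n_in * n_out
    compute = macs * E_MUL_8B
    # One weight fetch per MAC; a 32-bit DRAM word holds four 8-bit weights.
    memory = macs * (E_DRAM_32B / 4 if weights_in_dram else E_LOCAL_SRAM)
    return compute, memory

compute, memory = layer_energy(1024, 1024, weights_in_dram=True)
print(f"compute: {compute*1e9:.1f} nJ, weight fetch: {memory*1e9:.1f} nJ "
      f"({memory/compute:.0f}x compute)")
```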
Memristors are two-terminal analog resistive memory devices capable of both computation and memory [ 13 ]. They have the unique ability to encode and store information in their electrical conductance, which can be altered based on the history of applied electrical pulses [ 14 ]. Because a single memristor's conductance can transition between multiple levels between the Low Conductance State (LCS) and the High Conductance State (HCS), it can store more than one bit of information (typically between 3 and 5 bits), improving memory density. 4 Various memristor types exist, each with distinct operating principles and advantages. For example, PCM relies on the contrasting electrical resistance of amorphous and crystalline phases in chalcogenide materials [ 16 , 17 ], RRAM operates by altering the resistance of a dielectric material through the drift of oxygen vacancies or the formation of conductive filaments [ 18 ], and Ferroelectric Random Access Memory (FeRAM) uses the polarization of ferroelectric materials to store and change information [ 19 ]. Although the electrical interface to these types does not differ significantly (applying electrical pulses to read or program the conductance), these underlying mechanisms determine the device's switching speed, footprint, stochasticity, endurance, and energy efficiency.
When implementing functions with memristors, it is helpful to categorize them by volatility. Non-volatile memristors retain their conductance after programming, which is ideal for weight storage in neural networks. This property extends to in-memory computing to perform dot-product operations [ 20 -22 ]. When an array of N memristive devices stores vector elements $G_i$ in their conductance states, applying voltages $V_i$ to the devices and measuring the resulting currents $I_i$ computes the dot product $\sum_{i=1}^{N} G_i V_i$ with $O(1)$ complexity by Ohm's law and Kirchhoff's current law. This principle extends to matrix-vector multiplication, enabling neural network inference directly through the analog substrate's physics. In-memory neural inference has been demonstrated in large prototypes like the PCM-based HERMES chip [ 23 ] and the RRAM-based NeuRRAM chip [ 18 ].
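As a minimal numerical illustration of this principle, the sketch below treats a conductance matrix as the crossbar, applies input voltages along the rows, and recovers the matrix-vector product from the column currents; the optional Gaussian read-noise term is a generic stand-in for device non-idealities, not a calibrated model.

```python
import numpy as np

def crossbar_mvm(G, V, read_noise_std=0.0, rng=None):
    """Analog in-memory matrix-vector multiply.

    G: (rows, cols) conductance matrix in siemens (stored weights).
    V: (rows,) input voltages applied to the rows.
    Each column current is I_j = sum_i G_ij * V_i (Ohm's + Kirchhoff's laws),
    so all dot products are obtained in a single read step.
    """
    rng = rng or np.random.default_rng()
    I = V @ G                               # ideal column currents
    if read_noise_std > 0:
        I = I + rng.normal(0.0, read_noise_std, size=I.shape)
    return I

G = np.random.uniform(1e-6, 1e-4, size=(128, 64))   # 1 uS - 100 uS devices
V = np.random.uniform(0.0, 0.2, size=128)            # small read voltages
print(np.allclose(crossbar_mvm(G, V), V @ G))         # True: physics == math
```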
Volatile memristors, however, provide functionality beyond mere storage and are often less explored. Their conductance decay within 10 ms to 100 ms allows for continuous-time, stochastic accumulate-and-decay functions, approximating low-pass filtering, signal averaging, or time tracking. In neural computation, volatile memristors have been used to implement short-term synaptic [ 24 , 25 ] and neuronal dynamics [ 26 , 27 ].
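A caricature of this accumulate-and-decay behavior is sketched below, assuming a simple exponential relaxation with a 50 ms time constant; real volatile devices may follow other decay laws (the power-law drift exploited in Chapter 3 being one example), so the constants here are purely illustrative.

```python
import numpy as np

def volatile_trace(pulse_times, dt=1e-3, t_end=0.5, tau=50e-3, dg=1.0):
    """Conductance of a volatile device driven by programming pulses.

    Between pulses the excess conductance decays as exp(-dt/tau), so the
    device behaves as a leaky integrator (low-pass filter) of the pulse train.
    tau = 50 ms and unit pulse increments are illustrative values.
    """
    t = np.arange(0.0, t_end, dt)
    g = np.zeros_like(t)
    pulses = set(np.round(np.array(pulse_times) / dt).astype(int))
    for k in range(1, len(t)):
        g[k] = g[k - 1] * np.exp(-dt / tau)   # relaxation towards rest
        if k in pulses:
            g[k] += dg                         # pulse-driven increment
    return t, g

t, g = volatile_trace(pulse_times=[0.01, 0.02, 0.03, 0.2])
print(f"peak excess conductance: {g.max():.2f}, value at t = 0.5 s: {g[-1]:.3f}")
```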
why spikes? In this thesis, I focus on SNNs, where neural activations are represented as discrete pulses called spikes. This choice is primarily motivated by hardware considerations. Spikes enable extreme spatiotemporal sparsity, aligning well with asynchronous computation where energy consumption directly correlates with the spiking events that trigger localized circuits [ 28 ]. Mixed-signal spiking neuron circuits inherently act as sigma-delta Analog-to-Digital Converters (ADCs), converting analog neural computation into digital spikes [ 29 ] for noise-robust transmission over long distances without signal degradation [ 30 ]. Additionally, spiking neurons seamlessly interface with the reading and programming of memristive devices and also with event-based sensors [ 31 , 32 ] through the Address-Event Representation (AER) protocol [ 33 ]. Spiking communication has even been proposed to mitigate severe heating issues in 3D fabrication using n-ary coding schemes [ 34 ].
From a computational perspective, a common argument for spikes is efficient information encoding through precise use of the temporal dimension (e.g., time-to-first-spike [ 35 ], phase coding [ 36 ], or inter-spike intervals [ 37 ]). Furthermore, the spiking framework allows us to formulate hypotheses about biophysical spike-based learning mechanisms in the brain [ 38 -40 ] and to explore the computational capabilities of spiking neural networks [ 41 ], as demonstrated in our recent work. While a comprehensive exploration of the advantages of spikes is beyond this thesis's scope, the primary focus here is their potential for energy-efficient communication on analog substrates, as their unique computational benefits remain to be conclusively demonstrated.
4 These attributes are interestingly similar to biological synapses, where the synaptic efficacy is modulated by the history of pre- and post-synaptic activity, and is suggested to take 26 distinguishable strengths (correlated with spine head volume) [ 15 ].
## State of the Art
Despite their potential, memristors are not without challenges for neural computing. In this section, I will outline some of these challenges in inference, learning and routing, and state-of-the-art attempts to address them.
Limited bit precision. Programming nanoscale memristive devices modifies their atomic arrangement, a process that is inherently stochastic, non-linear and of limited granularity [ 42 -45 ]. These analog non-idealities are known to cause significant performance drops when training networks compared to software simulations [ 46 ]. Controlled experiments by Sidler et al. [ 22 ] have demonstrated that the poor training performance is primarily due to the insufficient number of pulses available to switch the device between the High Resistive State (HRS) and the Low Resistive State (LRS).
Following this, various attempts have been made to improve the bit precision of memristive devices. Optimizing RRAM materials has increased the number of bits per device, but often at the cost of lower ON/OFF ratios (i.e., the ratio of the LRS and HRS), making it harder to distinguish states with small footprint circuits [ 47 -50 ]. Furthermore, architectural optimizations have been explored, including using multiple binary devices per synapse [ 51 ], assigning a number system to multiple devices [ 52 ], leveraging stochastic switching [ 53 , 54 ], and complementing binary memristive devices with capacitors [ 55 ]. However, these methods still require complex and large synaptic architectures, limiting scalability.
In Chapter 2, we propose a novel approach to program intrinsically 1-bit RRAM devices to increase their effective bit resolution by precise control of the filament formation.
The credit assignment problem. Learning in any system is fundamentally about adjusting its parameters to improve its performance. In neural networks, learning involves adjusting the weights, represented by the vector $W$, to optimize performance as measured by an objective function, $F(W)$. The credit assignment problem refers to the challenge of determining the precise weight adjustments needed for improvement, especially in deep networks where the relationship between individual neurons and overall performance is less clear [ 56 ].
Traditionally, Hebbian mechanisms [ 57 ], leveraging the timing of pre- and post-synaptic spikes, have been the go-to solution for on-chip learning in the neuromorphic field. This is because Hebbian rules explain numerous neuroscientific observations [ 58 , 59 ] and possess interesting variance-maximization properties [ 60 ], but more practically, they use signals local to the synapse in a simple way, making them easily adaptable to silicon circuits [ 61 -64 ]. However, Hebbian rules alone have had limited success when scaling to large networks, and require heavily crafted architectural biases to achieve hierarchical, disentangled representations. 5
Backpropagation [ 68 ], on the other hand, remains the state-of-the-art algorithm for training modern ANNs, and is one of the pillars of the deep learning revolution. To give an intuition for why it works, following the insight from Richards et al. [ 7 ], consider weight changes $\Delta W$ that are small and an objective function to be maximized, $F$, that is locally smooth; the resulting change $\Delta F$ can then be approximated as $\Delta F \approx \Delta W^{T} \cdot \nabla_{W} F(W)$, where $\nabla_{W} F(W)$ is the gradient of the objective function with respect to the weights. This means that to guarantee improving learning performance ($\Delta F \geq 0$), a principled approach to weight adjustment is to take a small step in the direction of steepest performance improvement, guided by the gradient: setting $\Delta W = \eta \nabla_{W} F(W)$ gives $\Delta F \approx \eta \nabla_{W} F(W)^{T} \cdot \nabla_{W} F(W) \geq 0$. Backpropagation, which explicitly calculates these gradients, is powerful but unsuitable for online learning on analog substrates due to its need for symmetric feedback weights and distinct forward/backward phases [ 56 ]. And for temporal credit assignment, BPTT unrolls the network in reverse time to backpropagate the error gradients, resulting in memory complexity scaling with $O(kT)$, where $k$ is the number of time steps and $T$ is the number of neurons. This temporal non-locality necessitates the use of a memory hierarchy to save silicon space, sacrificing energy efficiency and latency. For this reason, various alternatives have been proposed to estimate gradients while offering better locality, such as feedback alignment [ 69 ], Q-AGREL [ 70 ], difference
5 While this is true, it is possible that end-to-end credit assignment might not be needed [ 65 ]. Recent works successfully replacing global backpropagation with truncated layer-wise backpropagation supports this view [ 66 , 67 ]. Nevertheless, designing the right architecture and local loss functions to guide the truncated credit assignment is still an open question.
target propagation [ 71 ], predictive coding [ 72 ] and weight perturbation-based methods [ 73 ]. These methods exhibit varying degrees of variance (resulting in slower convergence) and bias (leading to poor generalization [ 74 ]) in their estimations [ 7 ].
However, as long as variance and bias are within reasonable bounds, the learning rule can still be effective, potentially offering a favorable sweet spot between locality and performance. The challenge of local and effective spatio-temporal credit assignment is still a critical frontier in neural processing in analog substrates, demanding new and creative approaches.
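This bias-variance argument can be checked numerically. In the toy sketch below, a small step along a noisy but unbiased estimate of the gradient of a smooth objective still improves $F$ on average, although individual steps become less reliable as the variance of the estimate grows; the quadratic objective and the noise levels are illustrative choices, not taken from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

def F(w):                      # toy smooth objective to be maximized
    return -0.5 * np.sum(w**2)

def grad_F(w):                 # exact gradient of F
    return -w

w = rng.normal(size=10)
eta = 0.01

for noise_std in [0.0, 0.5, 2.0]:
    # noisy but unbiased gradient estimate, as used by local/perturbation rules
    gains = []
    for _ in range(1000):
        g_hat = grad_F(w) + rng.normal(0.0, noise_std, size=w.shape)
        gains.append(F(w + eta * g_hat) - F(w))
    gains = np.array(gains)
    print(f"noise_std={noise_std:.1f}: mean dF = {gains.mean():+.4f}, "
          f"P(dF > 0) = {np.mean(gains > 0):.2f}")
```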
In Chapter 3, we address this challenge for the first time, designing materials and circuits built around gradient calculations, and implementing the e-prop [ 75 ] local learning rule for Recurrent Spiking Neural Networks (RSNNs) on a memristive chip.
Programming memristors with non-idealities for online learning. On-chip learning requires the ability to program the network weights in accordance with the demands of the learning rule. While digital systems often rely on quantization to optimize memory access [ 76 ] and associated mitigation techniques such as stochastic rounding [ 77 ], gradient scaling [ 78 ], quantization range learning [ 79 ] and optimizing the weight representation [ 80 ], analog memristive systems present unique challenges due to non-idealities such as conductance-dependent, non-linear, stochastic, and time-varying programming responses [ 81 ]. It is therefore crucial to identify which digital methods can be effectively transferred to analog systems while promising a small footprint and energy overhead.
In Sections 3.1 and 3.2, we analyze several practical weight update schemes implementing an online learning rule for mixed-signal systems on a custom simulator, and later validate them on a real neuromorphic chip.
Scalable synaptic eligibility traces for local learning. Many high-performing local rules rely on eligibility traces [ 39 , 82 -86 ], slow synaptic memory mechanisms that carry information forward in time. These traces bridge the temporal gap between synaptic activity occurring on millisecond timescales and network errors arising seconds later, helping to solve the distal reward problem [ 87 , 88 ]. While several neuromorphic platforms [ 89 -91 ] have incorporated synaptic eligibility traces for learning, this mechanism is one of the most costly building blocks in neural computation, due to the quadratic scaling of the number of synapses with the number of neurons. Digital implementations suffer from the memory-intensive nature of numerical trace calculations, leading to a von Neumann bottleneck [ 92 , 93 ]. Even in mixed-signal designs, the slow dynamics of eligibility traces require large capacitors, sacrificing scalability [ 94 ].
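The role such a trace plays can be sketched in a few lines: each synapse keeps a slowly decaying memory of recent pre/post coincidences, and when a global error or reward signal arrives seconds later, the weight update is the product of that signal with the local trace. The decay constant and the generic three-factor update below are illustrative and do not reproduce any specific rule from the cited works.

```python
import numpy as np

def three_factor_update(pre, post, error, dt=1e-3, tau_e=2.0, lr=1e-2):
    """Weight updates from eligibility traces and a delayed error signal.

    pre, post: (T, N) binary spike rasters; error: (T,) global learning signal.
    e_ij decays with time constant tau_e and is incremented by pre/post
    coincidences; dW accumulates error(t) * e(t) (a generic three-factor rule).
    """
    T, N = pre.shape
    e = np.zeros((N, N))
    dW = np.zeros((N, N))
    decay = np.exp(-dt / tau_e)
    for t in range(T):
        e = decay * e + np.outer(post[t], pre[t])  # local, per-synapse memory
        dW += lr * error[t] * e                    # credit paid when error arrives
    return dW

rng = np.random.default_rng(1)
pre = rng.random((3000, 8)) < 0.02                 # 3 s of activity at 1 ms steps
post = rng.random((3000, 8)) < 0.02
error = np.zeros(3000); error[-1] = 1.0            # reward only at the very end
print(three_factor_update(pre, post, error).shape) # (8, 8) weight updates
```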
In Section 3.3, we propose a novel and scalable implementation of synaptic eligibility traces using volatile memristors.
Unifying volatile and non-volatile materials. Different neural building blocks require different volatility, bit precision, and endurance characteristics from memristive devices, which are then tailored to meet these demands. For example, ANN inference workloads require a linear non-volatile conductance response over a wide dynamic range for optimal weight updates and minimum noise for gradient calculation [ 21 , 22 , 95 ]. SNNs, on the other hand, often demand richer and multiple synaptic dynamics simultaneously, e.g., short-term conductance decay (to implement Short-Term Plasticity (STP) and eligibility traces [ 82 ]), non-volatile device states (to represent synaptic efficacy) and a probabilistic nature (to mimic synaptic vesicle releases [ 24 ]). However, optimizing a different active memristive material for each of these features limits their suitability to a wide range of computational frameworks and ultimately increases system complexity for the most demanding applications. Moreover, these diverse specifications cannot always be implemented by combining
Figure 1.2: Bias and variance of learning rules that estimate the gradient, even if they do not explicitly compute gradients. Figure taken from [ 7 ].
[Figure 1.2 plots learning algorithms by the variance (randomness in weight changes) and bias of their gradient estimates: error backpropagation sits near the origin; contrastive learning, predictive coding, dendritic error, and feedback alignment have low variance but increasing bias; AGREL, node perturbation, and weight perturbation have low bias but increasing variance; RDD sits at moderate bias and variance.]
different types of memristors in a monolithic circuit (e.g., volatile and non-volatile, binary and analog), due to the incompatibility of the fabrication processes. Although some prototype materials have been proposed to exhibit dual-functional memory [ 96 , 97 ], the dominance of one of the mechanisms often results in poor switching performance. Therefore, the lack of universal memristors capable of realizing diverse computational primitives has remained a challenge.
In Chapter 4 , we present our discovery of a novel memristor type that can be used for both volatile and non-volatile operations based on a simple programming scheme, while achieving a world-record in endurance.
Routing of multi-crossbar arrays for scaling. Scaling neural networks, by increasing layer width or depth, has proven to be a powerful technique for improving performance [ 98 ]. However, scaling memristive crossbar array dimensions is hindered by analog non-idealities such as current sneak paths, parasitic resistance, capacitance of metal lines, and yield limitations [ 99 -101 ]. For this reason, large-scale systems need to adopt multiple crossbars of manageable dimensions [ 102 ], but this introduces the overhead of routing activity, especially over long wires between the source, router and destination. To reduce wiring length, three-dimensional (3D) technology to vertically stack logic, crossbar arrays, and routers has been proposed [ 34 , 102 ], but the fabrication complexity and cost of 3D integration are currently prohibitive. Today's most advanced multi-crossbar neuromorphic chips, e.g., HERMES [ 23 ] and NeuRRAM [ 18 ], still rely on off-chip communication for routing, strongly diminishing communication energy efficiency. This necessitates the development of efficient on-chip routing mechanisms to achieve energy-efficient communication. When routing is optimized for communicating events through the on-chip AER protocol, the designer faces a trade-off between source-based and destination-based routing. Source-based routing offers the flexibility of per-neuron Content Addressable Memory (CAM) as used by DYNAP-SE [ 103 ], but this comes at the cost of increased chip area and slower memory access. Destination-based routing, while more area-efficient, sacrifices some degree of network configurability.
In Chapter 5 , we propose and fabricate a novel memristive in-memory routing core that can be reconfigured to route signals between crossbars, enabling dense local, sparse global connectivity with orders of magnitude more routing efficiency compared to other SNN hardware platforms.
## Thesis Overview
This thesis explores the path towards intelligent and low-power analog computing substrates, embracing the Bitter Lesson [ 104 ] and addressing some of the challenges in learning and scale. It consists of six selected publications, which I have coauthored with amazing electrical engineers, computer scientists, material designers and neuroscientists. My individual contributions to the work presented in each chapter are clarified and outlined in Appendix 4 .
In the first part, we focus on developing mixed-signal learning circuits targeting memristive weights in single-layer feedforward SNN architectures. We address the limited resolution and the device-to-device and cycle-to-cycle variability of binary RRAM weights, aiming to enable on-chip learning. Building upon the observation of Ielmini [ 105 ] that the filament size in RRAM can be precisely controlled by the compliance current, we introduce programming circuitry that modulates synaptic weights based on estimated gradients using a modified Delta rule [ 106 ]. This approach achieves multi-level weight resolution within the conductance of intrinsically binary-switching RRAM devices. We model the variability of device responses to our new compliance-current programming scheme, mapping $I_{CC}$ to $G_{LRS}$, from experimental measurements on a 4 kb HfO 2 -based RRAM array, and adjust our implementation accordingly. We validate our approach and circuits through circuit simulations for a standard 180 nm Complementary Metal-Oxide-Semiconductor (CMOS) process and system simulations on the MNIST dataset. This co-design of algorithm, material, and circuit properties establishes a significant building block for single-layer on-chip learning with memristive devices. Furthermore, in Chapter 4 , we demonstrate that our programming
scheme can also be applied, with an even more linear response, to novel perovskite memristors.
In Chapter 3, we extend our focus to the more challenging task of training RSNNs on-chip, addressing the complexities of online temporal credit assignment with memristive devices for in-memory computing. Our work encompasses three complementary efforts: 1) given a recently developed local learning rule for RSNNs [ 75 ], investigating in simulation how to reliably program analog devices based on a realistic PCM model, 2) validating these results on a PCM-based neuromorphic chip, and 3) proposing a scalable implementation of synaptic eligibility traces, a crucial component of many local learning rules, using volatile memristors.
To achieve this, we start by developing a PyTorch [ 107 ]-based simulation framework built on a comprehensive statistical model of a PCM crossbar array, capturing the major device non-idealities: programming noise, read noise, and temporal drift [ 81 ]. Our selected learning rule, e-prop, estimates the gradient under nice locality constraints, but it is not known how to reflect the gradient signal reliably while programming memristors with non-idealities. This framework enables benchmarking four commonly practiced memristor-aware weight update mechanisms (Sign-Gradient Descent [ 108 ], stochastic updates [ 81 ], multi-memristor updates [ 109 ] and mixed-precision [ 110 ]) for reliably programming memristor conductances based on estimated gradients on generic regression tasks. We show that the mixed-precision update scheme is superior: by accumulating gradients in a high-precision memory (similar to quantization-aware training methods [ 111 ]), it allows for a lower learning rate and improved alignment of weight update magnitudes with the PCM's minimum programmable conductance change. It also reduces the total number of write pulses by 2-3 orders of magnitude, reducing the energy spent on costly memristive programming and mitigating potential endurance issues. Furthermore, we verified that the stochastic update scheme of previous digital implementations [ 77 ] works well down to 8-bit precision in digital simulations.
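A schematic version of the mixed-precision idea is sketched below under assumptions of mine: gradients are accumulated in a high-precision variable, and pulses are issued to the analog device only when the accumulated update exceeds its minimum programmable conductance change, with a Gaussian term standing in for programming noise. The class name, constants, and noise model are illustrative, not those of the cited implementations.

```python
import numpy as np

class MixedPrecisionSynapse:
    """Accumulate high-precision gradients; write the analog device sparsely.

    eps: minimum programmable conductance change of the device (illustrative).
    write_noise: std of programming noise per pulse (illustrative).
    """
    def __init__(self, g0=0.0, eps=0.05, write_noise=0.01, rng=None):
        self.g = g0            # analog device conductance (weight)
        self.chi = 0.0         # high-precision gradient accumulator
        self.eps = eps
        self.noise = write_noise
        self.rng = rng or np.random.default_rng()
        self.pulses = 0

    def update(self, grad, lr=1e-3):
        self.chi += lr * grad
        n = int(self.chi / self.eps)          # whole programmable steps available
        if n != 0:
            target = n * self.eps
            self.g += target + self.rng.normal(0.0, self.noise) * abs(n)
            self.chi -= target                # keep the un-written remainder
            self.pulses += abs(n)
        return self.g

rng = np.random.default_rng(0)
syn = MixedPrecisionSynapse()
for _ in range(10_000):                        # many small, noisy gradient steps
    syn.update(rng.standard_normal() + 0.2)
print(f"weight: {syn.g:+.3f}, device writes: {syn.pulses} (<< 10,000 steps)")
```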
Following the simulations, the next step was to validate them on a physical neuromorphic chip. We implemented the e-prop learning rule on the HERMES chip [ 23 ], fabricated in 14 nm CMOS, with four 256 × 256 PCM crossbar arrays in a differential parallel setup. We programmed all weights of an RSNN onto physical PCM devices across HERMES cores for in-memory inference. On-chip training was controlled by a digital coprocessor implementing a mixed-precision algorithm to accumulate gradients in a high-precision memory unit. Our mixed-precision training on HERMES achieved performance competitive with conventional software simulations, maintained a regularized firing rate, and significantly reduced the number of PCM programming pulses, enhancing energy efficiency. Our results demonstrate the first successful implementation of a powerful gradient-based online learning rule for RSNNs on an analog substrate. While the mixed-precision technique requires an additional high-precision memory unit, we demonstrate that inference can be in-memory, and off-chip guided learning can be activated as needed with minimal analog device programming.
Local learning rules, including e-prop, often require synaptic eligibility traces [ 39 , 82 -86 ], posing a scaling challenge for analog hardware due to their $O(N^2)$ area scaling, where $N$ is the number of neurons. This challenge is exacerbated by increased time-constant requirements, as larger capacitors are needed for implementation. To address this, we introduce PCM-trace, a novel small-footprint circuit leveraging the inherent conductance drift of PCM to emulate eligibility traces for local learning rules. We exploit the material bug - the structural relaxation and temporal conductance drift in PCM's amorphous regime [ 112 ], described by $R(t) = R(t_0)\,(t/t_0)^{\nu}$ - and turn it into a feature. Our optimized material choice allows gradual SET pulses to accumulate the trace, while the conductance naturally decays over seconds. We also introduce a multi-PCM-trace configuration, distributing synaptic traces across multiple PCM devices to significantly improve the dynamic range. Experimental results on 130 nm CMOS technology confirm that PCM-trace can maintain eligibility traces for over 10 seconds while offering more than 11× area savings compared to conventional capacitor-based trace implementations [ 94 ].
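The trace mechanism can be illustrated directly from the drift law above: each SET pulse increments the conductance, which then relaxes following the conductance form of $R(t) = R(t_0)(t/t_0)^{\nu}$. The drift exponent, pulse amplitude, and time base in this sketch are representative values chosen for illustration, not measured device parameters.

```python
import numpy as np

def pcm_trace(pulse_times, t_end=15.0, dt=0.01, nu=0.1, dg=1.0):
    """Eligibility trace emulated by PCM conductance drift.

    Between pulses the conductance follows G(t) = G(t_p) * (t / t_p)**(-nu),
    the conductance form of R(t) = R(t0) * (t / t0)**nu; each SET pulse adds
    dg and restarts the drift clock. nu, dg and the time base are illustrative.
    """
    t = np.arange(dt, t_end, dt)
    g = np.zeros_like(t)
    g_last, t_last = 0.0, dt
    pulses = sorted(pulse_times)
    for k, tk in enumerate(t):
        if pulses and tk >= pulses[0]:
            g_last = g_last * (tk / t_last) ** (-nu) + dg   # SET pulse
            t_last = tk
            pulses.pop(0)
        g[k] = g_last * (tk / t_last) ** (-nu)               # power-law decay
    return t, g

t, g = pcm_trace(pulse_times=[0.5, 0.7, 0.9])
print(f"trace at 1 s: {np.interp(1.0, t, g):.2f}, at 10 s: {np.interp(10.0, t, g):.2f}")
```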
In Chapter 4, we introduce a state-of-the-art memristive material based on halide perovskite nanocrystals that can be dynamically reconfigured to exhibit volatile or non-volatile behavior. This is motivated by the fact that many in-memory neural computing systems demand devices with specific switching characteristics, and existing memristive devices cannot be reconfigured to meet these diverse volatile and non-volatile switching requirements. To achieve this, we develop cesium lead bromide nanocrystals capped with organic ligands as the active switching matrix and silver as the active electrode. This design leverages the low activation energy of ion migration in halide perovskites to achieve both diffusive (volatile) and drift (non-volatile) switching. By actively controlling the compliance current ($I_{CC}$), following our prior work in Chapter 2, the magnitude of the ion flux is adjusted, enabling on-demand switching between the two modes. This control mechanism allows the selection of diffusive dynamics with low $I_{CC}$ (1 µA) for volatile behavior and drift kinetics with higher $I_{CC}$ (1 mA) for non-volatile memory operation. Moreover, our measurements demonstrated that memristors using perovskite nanocrystals capped with OGB ligands achieve record endurance in both volatile and non-volatile modes. We attribute this superior performance to the larger OGB ligands, which better insulate the nanocrystals and regulate the electrochemical reactions responsible for the switching behavior.
In Chapter 5, we switch gears and focus on the scalability of memristive architectures in neuromorphic systems, a challenge hindered by the analog non-idealities of crossbar arrays. We introduce Mosaic, a novel memristive systolic array architecture consisting of interconnected Neuron Tiles and Routing Tiles, both implemented in RRAM-integrated 130 nm CMOS technology.
Each Neuron Tile is a crossbar array of memristors storing the network weights of an RSNN layer with Leaky Integrate-and-Fire (LIF) neurons. These neurons emit spikes based on integrated synaptic inputs, transmitting them to neighbouring tiles through Routing Tiles. The Routing Tiles, also based on memristor arrays, define the connectivity patterns between Neuron Tiles. The resulting structure becomes a small-world graph with dense local and sparse long-range connections, similar to the connectivity found in biological brains 6 .
Our in-memory routing approach necessitates careful optimization of connectivity during offline training to prune RSNNs into a Mosaic-compatible, small-world graph. We introduce a novel hardware layout-aware training method that considers the physical layout of the chip and optimizes neural network weights using either gradient-based or evolutionary algorithms [ 113 ].
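A minimal sketch of the layout-aware idea, under assumptions of mine rather than the exact procedure used in Chapter 5: neurons are assigned to tiles on a 2D grid, a fixed connectivity mask makes within-tile connections dense and cross-tile connections exponentially less likely with tile distance, and that mask is applied to the weight matrix during training so the learned network already matches the routing fabric.

```python
import numpy as np

def mosaic_mask(n_tiles_x=4, n_tiles_y=4, neurons_per_tile=16,
                p_far=0.05, decay=1.0, rng=None):
    """Small-world connectivity mask: dense within a tile, sparse across tiles.

    Cross-tile connection probability falls off as p_far * exp(-decay * (d - 1)),
    where d is the Manhattan distance between tiles on the 2D layout.
    All constants are illustrative.
    """
    rng = rng or np.random.default_rng(0)
    n = n_tiles_x * n_tiles_y * neurons_per_tile
    tile_of = np.arange(n) // neurons_per_tile
    tx, ty = tile_of % n_tiles_x, tile_of // n_tiles_x
    d = np.abs(tx[:, None] - tx[None, :]) + np.abs(ty[:, None] - ty[None, :])
    p = np.where(d == 0, 1.0, p_far * np.exp(-decay * (d - 1)))
    return rng.random((n, n)) < p

mask = mosaic_mask()
W = np.random.default_rng(1).normal(size=mask.shape) * mask   # masked weights
print(f"overall connection density: {mask.mean():.3f}")        # dense local, sparse global
```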
In-memory routing and optimization for sparse long-range and locally dense communication in Mosaic result in significant energy efficiency gains in spike routing, surpassing other SNN hardware platforms by at least one order of magnitude, as demonstrated by hardware measurements and system-level simulations. Notably, due to its layout-aware structured sparsity, Mosaic achieves competitive accuracy in edge computing tasks like biosignal anomaly detection, keyword spotting, and motor control.
6 It may not be a coincidence that some of the solutions proposed in this thesis can also be found in the biological brain, whispering of a deep connection between highly optimized silicon substrates and the evolved neural tissue.
## ENHANCING BIT PRECISION OF BINARY MEMRISTORS FOR ROBUST ON-CHIP LEARNING
This chapter's content was published in the IEEE International Symposium on Circuits and Systems (ISCAS). The original publication is authored by Melika Payvand, Yigit Demirag, Thomas Dalgaty, Elisa Vianello, and Giacomo Indiveri.
Analog neuromorphic circuits with memristive synapses offer the potential for power-efficient neural network inference, but the limited bit precision of memristors poses challenges for gradient-based training. In this chapter, we introduce a weight programming technique that enhances the effective bit precision of intrinsically binary memristive devices, enabling more robust and performant on-chip training. To overcome the problems of variability and limited resolution of ReRAM memristive devices used to store synaptic weights, we propose to use only their HCS and to control their desired conductance by modulating their programming compliance current, $I_{CC}$. We introduce spike-based CMOS circuits for training the network weights, and demonstrate the relationship between the weight, the device conductance, and the $I_{CC}$ used to set the weight, supported by experimental measurements from a 4 kb array of HfO 2 -based devices. To validate the approach and the circuits presented, we report circuit simulation results for a standard 180 nm CMOS process and system-level simulations for classifying handwritten digits from the MNIST dataset.
## 2.1 introduction
Neural networks deployed on resource-constrained devices can benefit greatly from online training to adapt to shifting data distributions, sensory noise, device degradation, or new tasks not seen during pretraining. While CMOS architectures with integrated memristive devices offer ultra-low power inference, their use for online learning has been limited [ 114 , 115 ].
In this chapter, we propose novel learning circuits for SNN architectures implemented with 1T1R arrays. These circuits enable analog weight updates on binary ReRAM devices by controlling the ICC of their SET operation. In addition to increasing the bit precision of network weights, the proposed strategy allows a compact, fast, and scalable event-based learning scheme compatible with the AER interface [ 116 ].
Previously, significant efforts have aimed to increase the bit precision of memristive devices for online learning through material and architectural optimizations.
material optimization Several groups have reported TiO2-based [ 47 -49 ] and HfO2-based [ 50 ] ReRAM devices with up to 8 bits of precision. However, in all these works, analog behavior is traded off against a lower available ON/OFF ratio. While analog behavior is important for training neural networks, cycle-to-cycle and device-to-device variability further reduces the effective number of bits when the ON/OFF ratio is small. Moreover, tuning a precise memory state is not always achievable in real time, requiring recursive tuning with an active feedback scheme [ 50 , 117 ]. Some efforts have also focused on carefully engineering a barrier level through exhaustive experimental search over a range of materials [ 47 , 48 ], which makes such devices difficult to fabricate.
architecture optimization Increasing the effective bit resolution has also been demonstrated with architectural advancements. Strategies such as using multiple binary switches to emulate n-bit synapses [ 51 ] or exploiting stochastic switching properties for analog-like adaptation [ 53 , 54 ] have been explored. Alternatively, IBM's approach of using a capacitor alongside two PCM devices as an analog volatile memory increases the combined precision but incurs significant area overhead [ 55 ]. Recently, a mixed-precision approach has been employed to train networks using a digital coprocessor for weight-update accumulation [ 118 ], but this requires digital buffering of weights and gradients and suffers from domain-conversion costs.
Thus, neither device nor architecture optimizations have fully resolved the challenge of low bit precision in memristors for online learning. Prior work by Ielmini [ 105 ] observed that the electrical resistance of memristors after a SET operation follows a power law with ICC (a linear relationship in log-log scale), governed by the size of the conductive filament. This critical observation underpins our approach, in which we exploit this relationship to directly control the device conductance.
To minimize the effect of variability, we adopt an algorithm-device co-design approach. We restrict devices to their HCS only and modulate their conductance by adjusting the programming ICC. Specifically, we derive a technologically feasible online learning algorithm based on the Delta rule [ 106 ], mapping weight updates onto the ICC used to SET the device. This co-design offers several advantages: (i) relaxed fabrication constraints compared to multi-bit devices, and (ii) increased state stability due to the use of only two levels per device.
## 2.2 reram device modeling
To find the average relationship between the mean of the cycle-to-cycle distribution of the HCS and the SET programming ICC , we performed measurements on a 16 × 256 ( 4 kb) array of HfO 2 -based ReRAM devices integrated onto a 130 nm CMOS process between metal layers 4 and 5 [ 119 ]. Each device is connected in series to the drain of an n-type selector transistor which allows the SET programming ICC to be controlled based on the voltage applied to its gate. The 1 T 1 R structure allows a single device to be selected for reading or programming by applying appropriate voltages to a pair of Source/Bit Lines (SL/BL) and a single Word Line (WL).
All devices in the 4 kb array were initially formed in a raster-scan fashion by applying a large voltage (typically 4 V) between the SL and BL to induce a soft breakdown in the oxide layer and introduce conductive oxygen vacancies. After forming, each device was subjected to sets of 100 RESET/SET cycles over a range of SET ICC values between 10 µA and 400 µA, with the resistance of each device recorded after each SET operation. The mean of all devices' median resistances over the 100 cycles, at a single ICC, gives the average relationship between the HCS median and the SET ICC shown in Fig. 2.1. The relationship follows a line in the log-log plot (a power law), and over this ICC range it allows precise control of the median of the cycle-to-cycle distribution, corresponding to resistances between 50 kΩ and 2 kΩ.
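As an illustration of how such a calibration curve can be used, the sketch below fits a power law to hypothetical (ICC, conductance) pairs in log-log space and inverts it to pick the ICC that targets a desired conductance; the numbers are placeholders consistent with the range reported above, not the measured medians.

```python
import numpy as np

# Hypothetical (ICC, conductance) pairs standing in for the measured medians.
icc = np.array([10e-6, 50e-6, 100e-6, 200e-6, 400e-6])    # SET compliance current (A)
g_hcs = np.array([20e-6, 90e-6, 160e-6, 280e-6, 500e-6])  # median HCS conductance (S)

# Fit log10(G) = a * log10(ICC) + b, i.e. G = 10^b * ICC^a (power law).
a, b = np.polyfit(np.log10(icc), np.log10(g_hcs), deg=1)

def icc_to_conductance(i_cc):
    """Expected median conductance programmed by a given compliance current."""
    return 10**b * i_cc**a

def conductance_to_icc(g_target):
    """Invert the power law: which ICC programs a desired conductance?"""
    return (g_target / 10**b) ** (1.0 / a)

print(icc_to_conductance(150e-6), conductance_to_icc(100e-6))
```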
## 2.3 bit-precision enhancing weight update rule
The learning algorithm is based on the Delta rule, the simplest form of gradient descent for single-layer networks. In our implementation, the objective function is defined as the difference between the desired target output signal y and the network prediction ŷ for a given set of input signals x, weighted by the synaptic weight parameters w. The Delta rule then gives the change of the weight connecting a neuron i in the input layer to a neuron j in the output layer as:
$$\Delta w_{ji} = \eta (y_j - \hat{y}_j)\, x_i = \eta\, \delta_j\, x_i \tag{2.1}$$
Figure 2.1: Mean and standard deviation of the device conductance as a function of ICC. The inset shows the samples from the fitted mean and standard deviation used for the simulations.
Algorithm 1: Delta rule with dual memristors. Initialize $w_{ji1}$ and $w_{ji2}$ randomly. At each time step, compute the error $\delta_j = |\hat{y} - y|$; if a presynaptic event occurs and $\delta_j > \delta_{th}$, then for every synapse: READ both devices ($I_{ji1}$, $I_{ji2}$), scale the read currents by $c_1$, add $\eta\delta_j$ to one scaled current and subtract it from the other depending on the sign of $(\hat{y} - y)$, scale the results by $c_2$ to obtain $I_{CC,ji1}$ and $I_{CC,ji2}$, then RESET and SET both devices with these compliance currents.
Figure 2.2: Event-based neuromorphic architecture using online learning in a 1T1R array (a), and the asynchronous state machine used as the switch controller, applying the appropriate voltages on the BL, SL and WL of the array for online learning (b).
where $\delta_j$ is the error and $\eta$ is the learning rate. To implement this with a memristive synaptic architecture, we represent each synaptic weight $w_{ji}$ by the combined conductance of two memristors, $w_{ji1}$ and $w_{ji2}$, arranged in a push-pull differential configuration. This scheme extends the effective dynamic range of a single synapse and captures negative weight values.
During the network operation, the target and the prediction signals are compared continuously to generate the error signal. With the arrival of a pre-synaptic event, if the error signal is larger than a small error threshold, the weight update process is initiated. The small error threshold that creates the "stop-learning" regime has been proposed to help the convergence of the neural networks with stochastic weight updates [ 120 ].
The implementation of the synaptic plasticity consists of three phases (Alg. 1). First, a READ operation is performed on every excitatory and inhibitory memristor to determine their conductances. The resulting current values ($I_{ji1}$ and $I_{ji2}$) are scaled to the level of the error signal. Second, a current proportional to the desired weight change $\eta \delta_j x_i$ is added to the scaled READ current to represent the target conductance to be programmed. Finally, these currents are further scaled to a valid ICC range using the linear scaling constants $c_1$ and $c_2$. To provide a larger dynamic range per synapse, the conductances of both memristors are updated with a push-pull mechanism according to the sign of the error (i.e., if the conductance of one memristor is increased, the conductance of the complementary memristor is decreased, and vice versa).
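A behavioral sketch of these three phases for a single differential synapse is given below; the READ/RESET/SET helpers, the scaling constants c1 and c2, and the error threshold are placeholders standing in for the circuit operations of Alg. 1.

```python
def delta_rule_update(y_hat, y, read, reset, set_pulse, eta, c1, c2, delta_th=0.05):
    """Behavioral sketch of Alg. 1 for one differential synapse (w1, w2).
    `read`, `reset`, and `set_pulse` stand in for the array READ/RESET/SET
    operations; c1, c2, and delta_th are illustrative constants."""
    delta_j = abs(y_hat - y)
    if delta_j <= delta_th:          # stop-learning regime: no update
        return
    i1, i2 = read()                  # Phase 1: READ currents of both devices
    i1, i2 = i1 * c1, i2 * c1        # scale to the error-signal level
    if (y_hat - y) > 0:              # Phase 2: push-pull update by error sign
        s1, s2 = i1 + eta * delta_j, i2 - eta * delta_j
    else:
        s1, s2 = i1 - eta * delta_j, i2 + eta * delta_j
    icc1, icc2 = s1 * c2, s2 * c2    # Phase 3: map to valid compliance currents
    reset()                          # RESET both devices
    set_pulse(icc1, icc2)            # SET with the computed ICCs
```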
## 2.4 learning circuits and architecture
Figure 2.3: Learning circuits generating the ICC for updating the devices based on the distance between the neuron and its target frequency. Highlighted in red are the Gm-C filters, low-pass filtering the neuron and target spikes and giving rise to VN and VT. In green and orange, the error between the two is calculated, generating positive (IErrP) and negative (IErrN) errors, unless the error is small and the STOP signal is high. In purple, Ve, the excitatory voltage from Fig. 2.2, regenerates the read current, which is scaled to Iscale producing IeS. Based on the error sign (UP), ICC1 is either the sum of IeS and IErr or their difference.
neuromorphic architecture Figure 2.2a illustrates the event-based neuromorphic architecture encompassing the learning algorithm. It consists of a 1T1R array, a switch controller, Leaky Integrate-and-Fire (I&F) neurons, and a learning block (LB). Each neuron receives excitatory and inhibitory currents from two rows of the 1T1R array, respectively.
With the arrival of every event through the AER interface (not shown), two consecutive READ and WRITE signals are generated [ 115 ]. Based on these signals, the asynchronous state machine in Fig. 2.2b controls the sequence so that the SLs, BLs and WLs of the array are driven by the appropriate voltages: the device is read and its value is integrated by the I&F neuron; the error value is updated through the learning block (LB), generating ICC1 and ICC2 (Section 2.3); and, based on these values, the excitatory and inhibitory devices are programmed.
learning circuits Based on Alg. 1 and the data from Fig. 2.1, we designed circuits that generate the appropriate ICC based on the distance between the neuron's firing rate and its target. Figure 2.3 presents these circuits. The spikes from the neuron and the target are integrated using subthreshold Gm-C filters, highlighted in red, generating VN and VT. These voltages are subtracted from one another using a subthreshold "Bump" (subBump) circuit [ 121 ], highlighted in green, and an above-threshold "Bump" circuit (abvBump), in orange.
The subBump circuit compares VN and VT, giving rise to the error currents when the neuron and target frequencies are far apart, and generates the STOP signal when the error is small and within the stop-learning range ( δ th ) [ 120 , 122 ]. The STOP signal gates the tail current of all above-threshold circuits and thus substantially reduces the power consumption when learning is stopped. Input events are also used as an additional gating mechanism. The abvBump circuit subtracts VN and VT and scales the result to Iscale, equal to the maximum ICC required based on Fig. 2.1. Based on the error sign (UP), the scaled error current is summed with or subtracted from the scaled device current, generating the desired ICC (Alg. 1). This circuit is highlighted in purple.
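For intuition only, the signal flow of this learning block can be summarized in a behavioral (non-circuit-level) sketch; the stop-learning band and the scaling current are illustrative assumptions rather than extracted circuit parameters.

```python
def learning_block_step(v_n, v_t, v_e_read, i_scale, stop_band=0.05):
    """Behavioral sketch of the learning block: compare the filtered neuron (v_n)
    and target (v_t) activities, gate learning when the error is small, and combine
    the scaled error with the scaled device read current to form ICC1."""
    err = v_n - v_t
    stop = abs(err) < stop_band                  # subBump: stop-learning signal
    i_err = 0.0 if stop else i_scale * abs(err)  # abvBump: scaled error current
    i_e_s = i_scale * v_e_read                   # scaled device read current
    up = err > 0                                 # error sign selects sum vs. difference
    i_cc1 = i_e_s + i_err if up else max(i_e_s - i_err, 0.0)
    return i_cc1, stop
```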
circuit simulation results Figure 2.4a depicts the positive and negative error currents, the STOP-learning signal, and the ICC1 and ICC2 currents. The error currents follow a sigmoid, which can be approximated by a line for error values between −1 and 1. As explained in Alg. 1, for positive errors ICC2 (ICC1) follows the summation (subtraction) of the error current with the scaled device current, while for negative errors the opposite holds. Figure 2.4b illustrates the dependence of ICC on the current value of the devices, which shifts the error current curve up or down.
## 2.5 system-level simulations
We performed SNN simulations with BRIAN 2 [ 123 ] to evaluate the performance of our proposed update scheme, incorporating the device models (see Fig. 2.1) with stochastic weight changes. Our goal was to achieve a test accuracy comparable to artificial neural networks trained with backpropagation at single-precision floating-point (FP32) precision on digital hardware.
We evaluated our network on the MNIST handwritten digits dataset [ 124 ] using the first five classes ( 30596 training and 5139 testing images, each 28 × 28 pixels). We trained a fully connected, single-layer spiking network with 784 input LIF neurons and 5 output LIF neurons. Each input image was presented for 100 ms, with pixel intensity encoded as Poisson spikes with a rate of [ 0, 200 ] Hz. At the output layer, spikes were counted per neuron during each stimulus, and the neuron with the maximum firing rate was selected as the network prediction. The error signal was calculated as the difference between low-pass filtered network output spikes and low-pass filtered target spikes, encoded as Poisson spikes at 40 kHz.
We modeled the cycle-to-cycle variability of the ICC-dependent GLRS conductance using a Gaussian distribution with ICC-dependent mean and standard deviation, as described in Section 2.2. This variability model was applied to all synaptic devices in the simulation, and we achieved a test accuracy of 92.68% after training for three epochs.
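A minimal sketch of this stochastic programming model, with hypothetical fits for the ICC-dependent mean and standard deviation, could look as follows.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_g(icc):
    """Hypothetical power-law fit of the mean programmed conductance (S)."""
    return 1.5 * icc**0.8

def std_g(icc):
    """Hypothetical fit of the cycle-to-cycle standard deviation (S)."""
    return 0.1 * mean_g(icc)

def program_devices(icc_array):
    """Sample the programmed conductance of every device from a Gaussian whose
    mean and std depend on the applied compliance current (cycle-to-cycle noise)."""
    g = rng.normal(mean_g(icc_array), std_g(icc_array))
    return np.clip(g, a_min=1e-7, a_max=None)  # conductance cannot be negative

g_new = program_devices(np.full((784, 5), 100e-6))  # program a 784x5 synaptic array
```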
Figure 2.4: (a) Error current, STOP-learning signal and ICC as a function of the normalized error between the target and the neuron frequencies. (b) Change of ICC1 (red) and ICC2 (blue) as a function of the error and the resistance value of the devices.
## 2.6 discussion
Significant effort is underway to develop learning algorithms for SNNs due to their potential for highly parallel, low-power processing. However, a substantial gap exists between these algorithms and their hardware implementation due to noise, variability, and limited bit precision. This gap underscores the importance of technologically plausible learning algorithms rooted in device physics and measurements [ 125 ]. Our work, exploiting ReRAM current compliance ICC for weight updates, represents a step in this direction.
power consumption and scalability While the learning block can generate up to several hundred µA of ICC for large errors, design considerations such as event and STOP-learning signal gating mitigate the average power consumption. The peak current per learning block ranges from 1 to 600 µA, depending on the network error. Leveraging the Poisson distribution of events (due to thermal noise), we can assume that only one column of devices is programmed at a time. Therefore, peak power scales sublinearly with the number of neurons (linearly in the worst case). This sublinear scaling implies that power consumption does not fundamentally limit scalability. However, with Poisson-distributed input events and a maximum frequency per input channel, an upper bound on the array size exists, determined by the event pulse width and the tolerance to missing events [ 54 ].
the nonlinear effect The power-law relationship between ICC and GLRS (Fig. 2.1) introduces a nonlinear mapping of weight updates. This nonlinearity slightly biases the weight updates away from the optimal values calculated by the Delta rule. Further investigation into mitigating this bias through calibration or algorithmic compensation could improve learning accuracy.
In this chapter, I presented a technologically plausible learning algorithm that leverages the compliance current of binary ReRAMs to generate variable, multi-level conductance changes. Our comprehensive co-design approach spans multiple levels of abstraction, from device measurements to algorithm, architecture, and circuits. We believe this work represents a significant step toward realizing always-on, on-chip learning systems. As we will see in Chapter 4 , our method can be extended to other non-volatile materials, providing a broader pathway for on-chip learning hardware.
## ONLINE TEMPORAL CREDIT ASSIGNMENT WITH NON-VOLATILE AND VOLATILE MEMRISTORS
This chapter builds upon three conceptually linked works. My initial concept of investigating the online credit assignment problem for recurrent neural networks implemented with non-ideal non-volatile memristive devices [ 126 ], and the subsequent utilization of eligibility traces with volatile memristors for scalability [ 127 ], laid the foundation for this research. These concepts were further validated through real-hardware testing in collaboration with IBM Research Zürich [ 128 ].
## 3.1 framework for online training of rsnns with non-volatile memristors
Training RSNNs on ultra-low-power hardware remains a significant challenge. This is primarily due to the lack of spatio-temporally local learning mechanisms capable of addressing the credit assignment problem effectively, especially with limited weight resolution and online training with a batch size of one. These challenges are accentuated when using memristive devices for in-memory computing to mitigate the von Neumann bottleneck, at the expense of increased stochasticity in recurrent computations.
To investigate online learning in memristive neuromorphic Recurrent Neural Network (RNN) architectures, we present a simulation framework and experiments on differential-architecture crossbar arrays based on an accurate and comprehensive PCM device model. We train a spiking RNN on regression tasks, with weights emulated within this framework, using the recently proposed e-prop learning rule. While e-prop truncates the exact gradients to follow locality constraints, its direct implementation on memristive substrates is hindered by significant PCM non-idealities. We compare several widely adopted weight update schemes designed to cope with these non-idealities and demonstrate that only gradient accumulation can enable efficient online training of RSNNs on memristive substrates.
## 3.1.1 Introduction
RNNs are a remarkably expressive [ 129 ] class of neural networks, successfully applied in domains such as audio/video processing, language modeling and Reinforcement Learning (RL) [ 130 -135 ]. Their power lies in their architecture, which enables the processing of long and complex sequential data. Each neuron contributes to the network's processing at various times in the computation, promoting hardware efficiency through the principle of reuse; recurrence is also the dominant architectural motif observed in the mammalian neocortex [ 136 , 137 ]. However, training RNNs under constrained memory and computational resources remains a challenge [ 83 ].
Current hardware implementations of neural networks still lag behind the energy efficiency of biological systems, largely due to data movement between separate processing and memory units in von Neumann architectures. Compact nanoscale memristive devices have gained attention for implementing artificial synapses [ 99 , 138 -142 ]. These devices enable computing synaptic propagation in-memory between neurons, breaking the von Neumann bottleneck [ 92 , 93 ].
Memristive devices are particularly promising for SNNs, especially for low-power, sparse, and event-based neuromorphic systems that emulate biological principles [ 143 , 144 ]. In these systems, synapses (memory) and neurons (processing units) are arranged in a crossbar architecture (Fig. 3 . 1 a), with memristive devices storing synaptic efficacy in their programmable multi-bit conductance values. This architecture inherently supports the sparse, event-driven nature of SNNs, enabling in-memory computation of synaptic propagation through Ohm's and Kirchhoff's Laws. As demonstrated for 32 nm technology [ 145 , 146 ], memristive crossbar arrays offer higher density and lower dynamic energy consumption during inference compared to traditional Static Random Access Memory (SRAM). Additionally, their non-volatile nature reduces static power consumption associated with volatile CMOS memory. Thus, in-memory acceleration of spiking
RNNs with non-volatile, multi-bit resolution memristive devices is a promising path for scalable neuromorphic hardware in temporal signal processing.
PCM devices are among the most mature emerging resistive memory technologies. Their small footprint, fast read/write operation, and multi-bit storage capacity make them ideal for in-memory computation of synaptic propagation [ 147 , 148 ]. Consequently, PCM technology has seen increased interest in neuromorphic computing [ 138 , 149 -151 ].
While a single PCM device can achieve 3-4 bits of resolution [ 152 ], these devices exhibit significant non-idealities due to their stochastic, Joule-heating-based switching physics. Molecular dynamics introduce 1/f noise and structural relaxation, leading to cycle-to-cycle variation in addition to the device-to-device variability that originates from fabrication.
Hardware-algorithm co-design with chip-in-the-loop setups is one approach to address these non-idealities [ 138 ]. However, neural network training necessitates iterative evaluation of architectures, learning-rule modifications, and hyperparameter tuning on large datasets, which is time- and resource-intensive with such setups. In contrast, a software simulation framework with a highly accurate statistical model of memristive devices offers faster iteration and a better understanding of device effects, thanks to the increased observability of internal state variables.
In this work, we investigate whether an RSNN can be trained with a local learning rule despite the adverse impacts of memristive in-memory computing, including write/read noise, conductance drift, and limited bit precision. We build upon the statistical PCM model from Nandakumar et al. [ 81 ] to faithfully model a differential memristor crossbar array (Section 3.1.3), define a target spiking RNN architecture, and describe the properties of an ideal learning rule, selecting the e-prop algorithm [ 75 ] for training (Section 3.1.3.1). We implement multiple memristor-aware weight update methods to map ideal e-prop updates to memristor conductances on the crossbar array, addressing device non-idealities (Section 3.1.3.2). Finally, we present a training scheme exploiting in-memory computing with extreme sparsity and reduced conductance updates for energy-efficient training (Section 3.1.3.4).
## 3.1.2 Building blocks for training on in-memory processing cores
In this section, we describe the main components of our simulation framework for training spiking RNNs with PCM synapses 1 .
## 3.1.3 PCM device modeling and integration into neural networks
The nanoscale PCM device typically consists of a Ge2Sb2Te5 (GST) switching material sandwiched between two metal electrodes, forming a mushroom-like structure (Fig. 3 . 1 b). Short electrical pulses applied to the device terminals induce Joule heating, locally modifying the temperature distribution within the PCM. This controlled temperature change can switch the molecular configuration of GST between amorphous (high-resistance) and crystalline (low-resistance) states [ 17 ].
A short, high-amplitude RESET pulse (typically 3.5 V amplitude and 20 ns duration) increases the amorphous volume by melting a significant portion of the GST, which then rapidly quenches into an amorphous configuration. Conversely, a longer, lower-amplitude SET pulse increases the crystalline volume by raising the temperature enough to initiate crystal nucleation and growth. To read the device conductance, a small-amplitude READ pulse (0.2 V amplitude, 50 ns duration) is applied to avoid inducing phase transitions.
In practice, PCM programming operations suffer from write/read noise and conductance drift [ 153 ]. The asymmetry of SET and RESET operations, along with the nonlinear conductance response to pulse number and frequency, complicate precise programming. Accurately capturing these non-idealities and device dynamics in network models is crucial for realistic evaluation of metrics such as weight update robustness, hyperparameter choices, and training duration.
While comprehensive models exist for describing PCM electrical [ 45 ], thermal [ 154 ], structural [ 155 , 156 ], and phase-change properties [ 157 , 158 ], these models often require solving differential equations on the fly with uncertain numerical convergence, do not incorporate inter- and
1 The code is available at https://github.com/YigitDemirag/srnn-pcm
Figure 3.1: a. PCM devices can be arranged in a crossbar architecture to emulate both a non-volatile synaptic memory and a parallel and asynchronous synaptic propagation using in-memory computation. b. Mushroom-type geometry of a single PCM device. The conductance of the device can be reconfigured by changing the volume ratio of amorphous and crystalline regions.
intra-device stochasticity, or are designed for pulse shapes and current-voltage sweeps that do not reflect circuit operating conditions [ 17 ].
Therefore, we adopted the statistical PCM model by Nandakumar et al. [ 81 ], which captures the major PCM non-idealities based on measurements from 10,000 devices. This model includes the nonlinear conductance change with respect to applied pulses, conductance-dependent write and read stochasticity, and temporal drift (Fig. 3.2). A programming-history variable represents the device's nonlinear response to consecutive SET pulses and is updated after each pulse. After a new SET pulse, the model samples the conductance change ($\Delta G$) from a Gaussian distribution whose mean and standard deviation depend on the programming history and the previous conductance. Drift is included using the empirical exponential drift model [ 156 ], $G(t) = G(T_0)\,(t/T_0)^{-\nu}$, where $G(T_0)$ is the conductance after a WRITE pulse at time $T_0$ and $G(t)$ is the final conductance after drift. The model also accounts for 1/f READ noise [ 159 ], which increases monotonically with conductance.
To integrate this model into neural network simulations, we developed a PyTorch-based PCM crossbar array simulation framework [ 107 ]. This framework tracks all simulated PCM devices in the crossbar simultaneously, enabling realistic SET, RESET, and READ operations (implementation details in Section A). Section 3.1.3.3 describes how this framework is used to represent synaptic weights in an RNN.
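A condensed sketch of such a per-device statistical model (nonlinear SET response, conductance-dependent write/read noise, and power-law drift) is shown below; all constants are illustrative and are not the parameters fitted in [81].

```python
import torch

class PCMArray:
    """Minimal sketch of a statistical PCM crossbar model: saturating SET response,
    conductance-dependent write/read noise, and power-law drift. All constants are
    illustrative, not the values fitted in [81]."""
    def __init__(self, shape, nu=0.05, t0=38.0):
        self.g = torch.zeros(shape)       # device conductances (uS)
        self.t_prog = torch.zeros(shape)  # time of last programming (s)
        self.nu, self.t0 = nu, t0

    def set_pulse(self, mask, t):
        # Nonlinear, saturating conductance increment with write noise.
        dg_mean = 1.0 * torch.exp(-self.g / 10.0)
        dg = torch.normal(dg_mean, 0.3 * dg_mean.clamp(min=1e-3))
        self.g = torch.where(mask, (self.g + dg).clamp(min=0.0), self.g)
        self.t_prog = torch.where(mask, torch.full_like(self.t_prog, t), self.t_prog)

    def read(self, t):
        # Power-law drift followed by conductance-dependent read noise.
        dt = (t - self.t_prog).clamp(min=self.t0)
        g_drift = self.g * (dt / self.t0) ** (-self.nu)
        noise = torch.normal(torch.zeros_like(g_drift), 0.02 * g_drift.clamp(min=1e-3))
        return g_drift + noise

arr = PCMArray((128, 128))
arr.set_pulse(torch.rand(128, 128) > 0.5, t=100.0)
print(arr.read(t=1000.0).mean())
```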
Figure 3.2: The chosen PCM model from [ 81 ] captures the major device non-idealities. a. The WRITE model enables calculation of the conductance increase with each consecutive SET pulse applied to the device. The band illustrates one standard deviation. b. The READ model enables calculation of the 1/f noise, which increases as a function of conductance. c. The DRIFT model calculates the temporal conductance evolution as a function of time. T0 indicates the time of measurement after the initial programming of the device.
## 3.1.3.1 Credit Assignment Solutions for Recurrent Network Architectures
The credit assignment problem refers to the problem of determining the appropriate change for each synaptic weight to achieve the desired network behavior [ 7 ]. As the architecture determines the information flow inside the network, the credit assignment solution is intertwined with the network architecture. Consequently, many proposed solutions in the SNN landscape are specific to architectural components, e.g., eligibility traces [ 160 ], dendritic [ 161 ] or neuromodulatory [ 162 ] signals.
In our work, we select an RSNN with LIF neuron dynamics described by the following discrete-time equations [ 75 ]:
$$\begin{aligned} v_j^{t+1} &= \alpha v_j^t + \sum_{i \neq j} W_{ji}^{rec} z_i^t + \sum_{i} W_{ji}^{in} x_i^t - z_j^t v_{th} \\ z_j^t &= H\!\left(\frac{v_j^t - v_{th}}{v_{th}}\right) \end{aligned}$$
where $v_j^t$ is the membrane voltage of neuron $j$ at time $t$. The output state of a neuron is a binary variable $z_j^t$ that indicates either a spike (1) or no spike (0). The neuron spikes when the membrane voltage exceeds the threshold voltage $v_{th}$, a condition implemented with the Heaviside function $H$. The parameter $\alpha \in [0, 1]$ is the membrane decay factor, calculated as $\alpha = e^{-\delta t / \tau_m}$, where $\delta t$ is the discrete time-step resolution of the simulation and $\tau_m$ is the neuronal membrane decay time constant, typically tens of milliseconds. The network activity is driven by input spikes $x_i^t$. Input and recurrent weights are denoted $W_{ji}^{in}$ and $W_{ji}^{rec}$, respectively. At the output layer, the recurrent spikes are fed through readout weights $W_{kj}^{out}$ into a single layer of leaky integrator units $y_k$ with decay factor $\kappa \in [0, 1]$. This continuous-valued output unit is analogous to a motor function that generates coherent motor output patterns [ 163 ] of the type shown in Fig. 3.3.
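A direct transcription of these dynamics into a single simulation step of the recurrent layer might look as follows; the surrogate-gradient machinery used for training is omitted, and the weight scales are arbitrary.

```python
import torch

def rsnn_step(v, z, x, w_in, w_rec, alpha, v_th):
    """One discrete-time step of the recurrent LIF layer (Eqs. above): leaky
    integration of input and recurrent spikes, soft reset by subtracting v_th
    after a spike, and spike generation by thresholding."""
    i_syn = z @ w_rec.T + x @ w_in.T          # recurrent + feedforward input
    v = alpha * v + i_syn - z * v_th          # membrane update with soft reset
    z_new = (v > v_th).float()                # Heaviside of (v - v_th)/v_th
    return v, z_new

n_in, n_rec = 100, 200
v, z = torch.zeros(n_rec), torch.zeros(n_rec)
x = (torch.rand(n_in) < 0.05).float()                       # one frame of input spikes
w_in = torch.randn(n_rec, n_in) * 0.1
w_rec = torch.randn(n_rec, n_rec) * 0.05
w_rec.fill_diagonal_(0)                                      # no self-connections
v, z = rsnn_step(v, z, x, w_in, w_rec, alpha=0.95, v_th=1.0)
```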
The training goal is to find optimal network weights $\{W_{ji}^{in}, W_{ji}^{rec}, W_{kj}^{out}\}$ that maximize task performance [ 7 ]. For ideal neuromorphic hardware, the learning algorithm must (i) use spatio-temporally local signals, (ii) operate online, and (iii) be validated beyond toy problems. As an example, the FORCE algorithm [ 163 , 164 ] performs well on motor tasks, but it violates the first requirement by requiring knowledge of all synaptic weights. BPTT with surrogate gradients [ 165 , 166 ] has its own drawbacks: it needs to buffer intermediate neuron states and activations, violating the second requirement.
E-prop offers a local and online learning rule for single-layer RNNs [ 75 ] by factorizing the gradients into a sum of products between instantaneously available learning signals and local eligibility traces. Specifically, the gradient $\frac{dE}{dW_{ji}}$ is expressed as a sum of products over time $t$:
$$\frac { d E } { d W _ { j i } } = \sum _ { t } \frac { d E } { d z _ { j } ^ { t } } \cdot \left [ \frac { d z _ { j } ^ { t } } { d W _ { j i } } \right ] _ { l o c a l } \, ,$$
where $E$ is the loss term, such as the mean squared error between the network output $y_k^t$ and the target $y_k^{*,t}$ for regression tasks.
The term $\left[\frac{dz_j^t}{dW_{ji}}\right]_{local}$ is not an approximation. It is computed locally, carries the factorization of the gradient (a local measure of the synaptic weight's contribution to the neuronal activity) forward in time, and is referred to as the eligibility trace for the synapse from neuron $i$ to neuron $j$ at time $t$. Ideally, the term $\frac{dE}{dz_j^t}$ would be the total derivative of the loss with respect to the neuron's spike output. However, this is unavailable online, as it requires information about the spike's future impact on the error. Therefore, e-prop approximates the learning signal using the partial derivative $\frac{\partial E}{\partial z_j^t}$, considering only the direct influence of the spike output on the instantaneous error. This approximation enables e-prop to function as an online learning algorithm for RSNNs, at the cost of truncating the exact gradients.
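For LIF neurons without adaptation, the eligibility trace reduces to the product of a pseudo-derivative of the spike function and a low-pass-filtered presynaptic spike trace; the sketch below accumulates e-prop gradients online under this simplification, using an illustrative piecewise-linear surrogate derivative.

```python
import torch

def eprop_step(z_pre_bar, z_pre, psi, learning_signal, grad_acc, alpha=0.95):
    """One online e-prop step for LIF synapses: update the filtered presynaptic
    trace, form the eligibility trace, and accumulate the weight gradient as the
    learning signal (per postsynaptic neuron) times the eligibility (per synapse)."""
    z_pre_bar = alpha * z_pre_bar + z_pre            # low-pass filtered presynaptic spikes
    e_trace = psi[:, None] * z_pre_bar[None, :]      # eligibility trace e_{ji}^t
    grad_acc += learning_signal[:, None] * e_trace   # dE/dW accumulated over time
    return z_pre_bar, grad_acc

def surrogate_derivative(v, v_th, gamma=0.3):
    """Piecewise-linear surrogate of dz/dv used as the pseudo-derivative psi."""
    return gamma * torch.clamp(1.0 - torch.abs((v - v_th) / v_th), min=0.0)
```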
E-prop's performance is notable, achieving results comparable to Long Short-Term Memory (LSTM) networks [ 167 ] trained with BPTT on complex temporal tasks. While the OSTL algorithm [ 84 ] also supports complex recurrent architectures and produces exact gradients, its computational complexity makes it less suitable for hardware implementation than e-prop.
In this work, we focus on e-prop due to its
- Sufficient gradient alignment: while not exact, e-prop provides a sufficient approximation of the true gradient.
- Relative simplicity: computational locality and simplicity are well-suited for low-power neuromorphic hardware applications.
- Neuroscientific relevance: the employment of eligibility traces aligns with neuroscientific observations of synaptic plasticity.
## 3.1.3.2 Memristor-Aware Weight Update Optimization
In mixed-signal neuromorphic processors, Learning Blocks (LBs) are typically co-located with the neuron circuits [ 143 , 168 , 169 ]. LBs continuously monitor signals available to the neuron (e.g., pre-synaptic activities, inference and feedback signals) and, based on the desired learning rule, instruct weight updates on the synapses. However, when network weights are implemented with memristive devices, weight updates are subject to analog non-idealities. Therefore, LB design must account for these device non-idealities, such as programming noise and asymmetric SET/RESET updates, to ensure accurate transfer of the calculated gradients to device conductances. To save both energy and area, weight updates are typically implemented in a single-shot fashion, using one or multiple gradual SET pulses to update the device without requiring a read-verify cycle.
In the following, we describe four widely adopted weight update methods for LBs, implemented in our PCM crossbar array simulation framework. Each method is designed to cope with device non-idealities. In all experiments, our framework employs a differential synaptic configuration [ 81 , 109 , 110 ], where each synapse has two sets of memristors ($G^+$ and $G^-$) whose difference represents the effective synaptic conductance (Fig. 3.1a).²
Sign gradient descent (SignGD). In SignGD, the synaptic weights $W$ are updated based solely on the sign of the loss-function gradient with respect to the weights, $\hat{\nabla}_W \mathcal{L}$, as approximated by the online learning rule. Updates occur only when the magnitude of the gradient for a weight exceeds a predefined threshold $\theta$, such that
$$\Delta W = -\delta \, \mathrm{sign}(\hat{\nabla}_{W}\mathcal{L}) \odot \mathbb{I}(|\hat{\nabla}_{W}\mathcal{L}| > \theta) \qquad (3.3)$$
where $\delta$ is a positive stochastic variable representing the conductance change due to a single SET pulse applied to the memristor, $\mathrm{sign}(\cdot)$ is the element-wise sign function, $\odot$ is the Hadamard product, and $\mathbb{I}(\cdot)$ is the indicator function implementing the stop-learning regime (1 if true, 0 otherwise).
This approach ensures convergence under certain conditions [ 171 ]. Due to its simplicity, SignGD is popular in memristive neuromorphic systems [ 172 -174 ]. At weight-update onset, the LB applies a single SET pulse to either the $G^+$ or the $G^-$ PCM device, as determined by the gradient's sign. The effective value of $\delta$ is not constant due to WRITE noise, which can bias the weight-update distribution and potentially impact learning dynamics and convergence.
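A minimal sketch of the SignGD rule of Eq. 3.3, assuming a Gaussian model for the stochastic SET-pulse granularity $\delta$; the noise parameters and function name are placeholders, not the framework's exact implementation.

```python
import torch

def signgd_update(W, grad, theta=1e-3, delta_mean=0.05, delta_std=0.01):
    """Eq. 3.3: one noisy SET-sized step against the gradient sign,
    applied only where |grad| exceeds the stop-learning threshold theta."""
    delta = delta_mean + delta_std * torch.randn_like(W)   # stochastic pulse size (WRITE noise)
    mask = (grad.abs() > theta).float()                    # indicator / stop-learning regime
    return W - delta * torch.sign(grad) * mask
```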
2 In the differential configuration, unidirectional updates can saturate one or both devices [ 81 , 109 , 110 ]. While a push-pull mechanism [ 170 ] can address this issue in memristors with symmetric SET/RESET characteristics, it is not feasible in PCM due to the abrupt nature of the RESET operation [ 46 ]. This necessitates a frequent saturation check and a refresh mechanism to reset and reprogram both memristors.
Stochastic update (SU). Conventional optimization methods often require weight updates that are 3-4 orders of magnitude smaller than the original weight values [ 175 ], posing a challenge for PCM devices with limited precision [ 176 ]. To bridge this precision gap, SU executes updates stochastically, based on the approximated gradient's magnitude [ 81 ]:
$$P(\mathrm{update}) = \min\left(1, \frac{|\hat{\nabla}_{W}\mathcal{L}|}{p}\right) \qquad (3.4)$$
where $p$ is a scaling factor controlling the update probability. Choosing $p$ such that the update probability $P(\mathrm{update})$ is proportional to $\|\hat{\nabla}_{W}\mathcal{L}\|$ ensures that larger gradients are more likely to trigger updates, effectively adapting the learning rate to the limited precision of PCM devices.
In our implementation, we scale the gradient by 1/ p before comparing it to a random uniform value to determine whether an update occurs. This approach, inspired by Nandakumar et al. [ 81 ], allows for fine-grained control over the effective learning rate. Unlike the original work, we perform the refresh operation before the update to prevent updates on saturated devices. This modification further enhances the stability and reliability of the learning process.
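The following sketch illustrates Eq. 3.4: each synapse fires a single pulse with probability $\min(1, |\hat{\nabla}_W \mathcal{L}|/p)$. The refresh step described above is assumed to have run beforehand and is therefore omitted.

```python
import torch

def stochastic_update(W, grad, p=1.0, delta=0.05):
    """Eq. 3.4: fire a single pulse per synapse with probability min(1, |grad| / p)."""
    prob = torch.clamp(grad.abs() / p, max=1.0)
    fire = (torch.rand_like(W) < prob).float()     # Bernoulli draw per synapse
    return W - delta * torch.sign(grad) * fire
```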
Multi-memristor update (MMU). MMU enhances synaptic weight resolution and mitigates write noise by utilizing $2N$ PCM devices per synapse, arranged in $N$ differential pairs [ 109 ]. Updates are applied sequentially to these devices, effectively reducing the minimum achievable weight change by a factor of $2N$ and the variance due to write noise by a factor of $\sqrt{2N}$ (see Supplementary Note 3).
In our implementation, we estimate the number of SET pulses required to achieve the desired conductance change, assuming a linear conductance increase of 0.75 µS per pulse (see Section 3.1.3).³ These pulses are then applied sequentially to the PCM devices in a circular queue. A refresh operation is performed if the conductance of either device in a pair exceeds 9 µS and their difference is less than 4.5 µS. This refresh mechanism helps maintain the dynamic range of the synaptic weights and ensures reliable long-term operation.
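A simplified sketch of the multi-memristor update described above: the desired conductance change is converted into an integer number of 0.75 µS SET pulses, which are distributed over the devices of one polarity via a circular pointer, and a saturation check flags pairs for refresh. The data layout and noise magnitude are assumptions.

```python
import torch

G_STEP, G_MAX, G_DIFF = 0.75, 9.0, 4.5   # µS: pulse granularity and refresh thresholds

def mmu_update(G_plus, G_minus, ptr, delta_w_uS):
    """Distribute SET pulses over the N differential PCM pairs of one synapse (illustrative).

    G_plus, G_minus : (N,) conductances of the potentiating / depressing devices, in µS
    ptr             : index of the next device in the circular queue
    delta_w_uS      : desired effective conductance change in µS (signed)
    """
    n_pulses = int(round(abs(delta_w_uS) / G_STEP))
    target = G_plus if delta_w_uS > 0 else G_minus
    N = target.numel()
    for _ in range(n_pulses):
        target[ptr] += G_STEP + 0.1 * torch.randn(()).item()   # one noisy SET pulse
        ptr = (ptr + 1) % N                                     # advance the circular queue
    # Flag pairs that saturate while still encoding a small effective weight;
    # a real controller would RESET and reprogram the flagged devices.
    needs_refresh = ((torch.maximum(G_plus, G_minus) > G_MAX)
                     & ((G_plus - G_minus).abs() < G_DIFF)).any().item()
    return G_plus, G_minus, ptr, needs_refresh
```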
Mixed-precision update (MPU). MPU addresses the discrepancy between the high precision required by learning algorithms and the limited resolution of PCM devices by accumulating gradients in a high-precision co-processor until they reach a threshold that can be reliably represented in PCM. This approach is analogous to quantization-aware training techniques [ 110 , 111 ].
In our implementation, the approximated gradients calculated by e-prop are accumulated in FP32 memory until they reach an integer multiple of the PCM update granularity (0.75 µS). The accumulated values are then converted to the corresponding number of pulses and applied to the PCM devices. A refresh operation is triggered when the conductance of either device in a pair exceeds 9 µS and their difference is less than 4.5 µS, maintaining synaptic efficacy and preventing saturation.
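A sketch of the mixed-precision bookkeeping described above: gradients are accumulated in an FP32 buffer expressed in conductance units, and only integer multiples of the 0.75 µS granularity are emitted as pulse counts. The weight-to-conductance scale $\beta$ and the sign convention are assumptions.

```python
import torch

G_STEP = 0.75   # µS, minimum reliably programmable conductance change

def mpu_update(acc, grad, lr, beta):
    """Accumulate e-prop gradients in FP32 and emit integer SET-pulse counts (illustrative).

    acc  : FP32 accumulator in conductance units (µS), same shape as the weight matrix
    grad : approximated gradient for this step
    beta : weight-to-conductance scale of W = beta * (G+ - G-)
    """
    acc = acc - lr * grad / beta            # desired conductance change, kept in high precision
    n_pulses = torch.trunc(acc / G_STEP)    # signed number of full granules to program now
    acc = acc - n_pulses * G_STEP           # keep only the sub-granule residue in the buffer
    return acc, n_pulses                    # n_pulses > 0: SET pulses on G+; < 0: on G-
```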
## 3.1.3.3 Training a Spiking RNN on a PCM Crossbar Simulation Framework
We used the PCM crossbar array model to determine realistic values for the network parameters $\{W_{ji}^{in}, W_{ji}^{rec}, W_{kj}^{out}\}$. To represent synaptic weights $W \in [-1, 1]$ with PCM device conductance values $G \in [0.1, 12]$ µS [ 81 ], we used the linear relationship $W = \beta\left[\sum_N G^+ - \sum_N G^-\right]$, where $\sum_N G^+$ and $\sum_N G^-$ are the total conductances of the $N$ memristors⁴ representing the potentiation and the depression of the synapse, respectively [ 81 ].
The forward computation (inference) of Eq. 3.1 is simulated using the PCM crossbar simulation framework, incorporating the effects of READ noise and temporal conductance drift. Subsequently, the weight updates calculated by the e-prop algorithm are applied to the PCM-based crossbar arrays using each of the methods described in Section 3.1.3.2.
3 For a more precise pulse estimation method, refer to Nandakumar et al. [ 81 ].
4 N = 1 for all weight update methods, except multi-memristor updates.
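For illustration, a minimal sketch of the conductance-to-weight readout used during inference, assuming a Gaussian READ-noise term and a generic power-law drift factor; the drift exponent, noise level and time units are placeholders rather than parameters of the fitted statistical PCM model.

```python
import torch

def read_weights(G_plus, G_minus, beta, t, t0=1.0, nu=0.05, read_noise_std=0.2):
    """Map PCM conductances to effective weights at read time t > 0 (illustrative).

    G_plus, G_minus : (..., N) conductances in µS of the N devices per polarity
    beta            : conductance-to-weight scale so that W spans roughly [-1, 1]
    nu              : assumed power-law drift exponent (placeholder value)
    """
    drift = (t / t0) ** (-nu)                                   # temporal conductance drift
    G_p = (G_plus * drift + read_noise_std * torch.randn_like(G_plus)).clamp(min=0.0)
    G_m = (G_minus * drift + read_noise_std * torch.randn_like(G_minus)).clamp(min=0.0)
    return beta * (G_p.sum(dim=-1) - G_m.sum(dim=-1))           # W = beta * (sum G+ - sum G-)
```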
Figure 3.3: Overview of the spiking RNN training framework with the proposed PCM crossbar array simulation framework, illustrated for a pattern generation task. Network weights are allocated from three crossbar array models, $G_{inp}$, $G_{rec}$, $G_{out}$. The network-generated pattern is compared to the target pattern to produce a learning signal, which is fed back to each neuron. The LB calculates instantaneous weight changes $\Delta W$ using the e-prop learning rule and has to efficiently transfer the desired weight change to a conductance change, i.e. $\Delta W \to \Delta G$, while accounting for PCM non-idealities.
Table 3.1: Performance evaluation of spiking RNNs with models of PCM crossbar arrays.
| Method | Sign-gradient | Stochastic | Multi-mem (N=4) | Multi-mem (N=8) | Mixed-precision |
|----------|-----------------|--------------|---------------------|---------------------|-------------------|
| MSE Loss | 0.2080 | 0.1808 | 0.1875 | 0.1645 | 0.0380 |
## 3.1.3.4 Results
We validated online training on a 1D continual pattern generation task [ 75 ], relevant for motor control and value function estimation in RL [ 163 ], using our analog crossbar framework (see Section 3.1.3.5).
Table 3.1 summarizes the training performance of the RSNN using different weight update methods on PCM crossbar arrays. We defined an MSE loss of < 0.1 as the performance threshold for this task (see Section A for the selection criteria). Among the five configurations, only the mixed-precision approach achieved this threshold, demonstrating sparse spiking activity and successful pattern generation.
During training, the weight saturation problem due to the differential configuration is rare (< 1%), as shown in Fig. 3.5 (right). We hypothesize that this is because the mixed-precision algorithm reduces the total number of WRITE pulses through update accumulation (∼12 WRITE pulses are applied per epoch, Fig. 3.6). Fig. 3.5 (left) illustrates the effective weight distribution of the PCM synapses at the end of training.
To assess the performance loss due to PCM non-idealities (WRITE/READ noise, drift), we simulated an ideal 4-bit device model, effectively acting as a digital 4-bit memory (Section A).
Table 3.2 summarizes the performance of the different update methods with this ideal model. Stochastic, multi-memristor ($N = 8$), and mixed-precision updates successfully solved the task, with mixed-precision achieving the best accuracy. All methods performed better without PCM non-idealities. Interestingly, stochastic updates outperformed both multi-memristor methods, suggesting the importance of applying only a few stochastic updates when training with quantized weights.
To further evaluate the impact of limited bit precision, we trained the same network with e-prop using standard FP32 weights. This high-resolution training yielded results comparable to mixed-precision training with either the ideal quantized memory or the PCM cell model.
Figure 3.4: Dynamics of a network trained with the mixed-precision algorithm. The raster plot (top) shows the sparse spiking activity (∼3.3 Hz) of recurrent-layer neurons. The training loss (bottom left) shows the MSE loss over 250 epochs, averaged over the ten best network hyperparameter configurations (see Fig. A.8 for the best-performing hyperparameters). Properly tuned neuronal time constants and trained network weights result in generated patterns following the targets (bottom right). The generated patterns are extracted from three different spiking RNNs.
Figure 3.5: (left) The effective conductance distributions ($G^+ - G^-$) of the synapses in the input, recurrent and output layers at the end of training with the mixed-precision method. (right) Averaged over 50 training runs, the mean number of PCM devices requiring a refresh is shown for each layer. The refresh operation was not needed for the recurrent and output layers.
Figure 3.6: The total number of WRITE pulses applied to PCM devices is shown for the input, recurrent and output layers. Only 0.07%, 0.07% and 0.1% of the PCM devices within each layer are programmed, respectively, during mixed-precision training.
Table 3.2: Performance evaluation of spiking RNNs with an ideal crossbar array model⁵
| Method | Sign-gradient | Stochastic | Multi-mem (N=4) | Multi-mem (N=8) | Mixed-precision |
|----------|-----------------|--------------|---------------------|---------------------|-------------------|
| MSE Loss | 0.1021 | 0.0758 | 0.1248 | 0.0850 | 0.0289 |
Similar to Nandakumar et al. [ 81 ], we found that the probability scaling factor $p$ in the stochastic update method allows tuning the number of devices programmed during training. Figure 3.7 shows that increasing $p$ (decreasing the update probability) can reduce WRITE pulses by up to an order of magnitude without degrading the loss. This result highlights the potential for optimizing energy efficiency in memristive online learning systems by strategically adjusting the update probability.
Figure 3.7: The stochastic update method enables tuning the number of WRITE pulses to be applied to PCM devices.
## 3.1.3.5 Methods
For the chosen pattern generation task, the network consists of 100 input and 100 recurrent LIF neurons, along with one leaky-integrator output unit. The network receives fixed-rate Poisson input, and the target pattern is a one-second-long sequence defined as the sum of four sinusoids (1 Hz, 2 Hz, 3 Hz and 5 Hz), with phases and amplitudes randomly sampled from the uniform distributions $[0, 2\pi]$ and $[0.5, 2]$, respectively. Throughout training, all layer weights $\{W_{ji}^{in}, W_{ji}^{rec}, W_{kj}^{out}\}$ are kept plastic and the device conductances are clipped between 0.1 and 12 µS. This benchmark is adapted from [ 82 ].
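For reference, the target pattern described above can be generated with a short script along the following lines; this is an illustrative reconstruction of the benchmark, not the exact code used.

```python
import torch

def make_target(dt=1e-3, duration=1.0, freqs=(1.0, 2.0, 3.0, 5.0)):
    """One-second target: sum of four sinusoids with random phase in [0, 2*pi]
    and amplitude in [0.5, 2] (Section 3.1.3.5)."""
    t = torch.arange(0.0, duration, dt)
    target = torch.zeros_like(t)
    for f in freqs:
        phase = 2 * torch.pi * torch.rand(())
        amp = 0.5 + 1.5 * torch.rand(())
        target += amp * torch.sin(2 * torch.pi * f * t + phase)
    return target   # 1000 samples at 1 ms resolution
```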
We trained approximately 1000 different spiking RNNs for each of the weight update methods described in Section 3.1.3.2. Each network shared the same architecture, differing only in synapse implementation, some hyperparameters and weight initialization. As each update method requires specific additional hyperparameters and can significantly affect network dynamics, we tuned these hyperparameters for each method using Bayesian optimization [ 177 ]. We selected the best-performing network hyperparameters out of 1000 candidates based on performance over 250 epochs of the pattern generation task.
## 3.1.4 Discussion
On-chip learning capability for RSNN chips enables ultra-low-power intelligent edge devices with adaptation capabilities [ 178 , 179 ]. This work focused on evaluating the efficacy of non-von Neumann analog computing with non-volatile emerging memory technologies in implementing
5 Multi-memristor configurations are implemented assuming a 4-bit resolution per memory cell. Hence, N = 4 and N = 8 correspond to 7-bit and 8-bit digital weight resolutions per synapse, respectively.
updates calculated by the spatio-temporally local e-prop learning rule. This task is particularly challenging due to the need to preserve task-relevant information in the network's activation dynamics for extended periods, despite analog weight non-idealities and the truncated gradients inherent to e-prop.
We developed a PyTorch-based PCM crossbar array simulation framework to evaluate four simplistic memristor update mechanisms. Through extensive hyperparameter optimization, we demonstrated that the mixed-precision update scheme yielded the best accuracy. This superior performance stems from the accumulation of instantaneous gradients on high-precision memory, enabling the use of a low learning rate. Consequently, the ideal weight update magnitude aligns closely with the minimum programmable PCM conductance change, leading to improved convergence.
However, the mixed-precision scheme necessitates high-precision memory for gradient accumulation. This could potentially be addressed by incorporating a co-processor alongside the memristor crossbar array, as demonstrated previously [ 110 ]. Despite this requirement, gradient accumulation enables training with high learning rates and reduces both the number of programming cycles and the total energy required to train. The synergy between memristor-based synapses and learning rules or neural architectures that are inherently capable of gradient accumulation is a promising avenue for further research.
Memory resolution is a critical factor influencing learning performance, aligning with previous findings on mixed-precision learning [ 110 ]. However, increasing resolution often comes at the cost of larger synapses due to the increased number of devices. An alternative solution is to employ binary synapses with a stochastic rounding update scheme [ 180 ]. This approach can leverage the intrinsic cycle-to-cycle variability of memristive devices [ 143 ] to implement stochastic updates efficiently, effectively reducing the learning rate and guiding weight parameters towards their optimal binary values [ 181 , 182 ].
From a computational neuroscience perspective, mixed-precision hardware resembles the cascade memory model, in which complex synapses hold a hidden state variable that only becomes visible after crossing a threshold [ 183 ]. Similar meta-plastic models have also recently been used to mitigate catastrophic forgetting [ 184 ].
To the best of our knowledge, this is the first report on the online training of RSNNs with the e-prop learning rule based on realistic PCM synapse models. Our simulation framework enables benchmarking of common update circuits designed to cope with memristor non-idealities and demonstrates that accumulating gradients enhances PCM device programming reliability, reduces the number of programmed devices and outperforms the other synaptic weight-update mechanisms.
In the following Section 3.2, we will present the implementation of the mixed-precision update scheme on neuromorphic hardware with PCM devices. Later, in Section 3.3, I will introduce how eligibility traces, a crucial building block of many local learning rules, can be implemented with volatile memristive devices.
## 3.2 implementing online training of rsnns on neuromorphic hardware
Building upon the simulation results of Section 3.1, we now present an implementation of e-prop on neuromorphic hardware with in-memory computing capabilities. We embed all the network weights directly onto physical PCM devices and control the training procedure with a hardware-in-the-loop setup. We utilize the HERMES chip [ 185 ], fabricated in 14 nm CMOS technology with 256 × 256 PCM crossbar arrays, to validate the feasibility of on-chip learning under physical device constraints. Our experiments demonstrate that the mixed-precision training approach remains effective, achieving performance competitive with an FP32 realization while simultaneously equipping the RSNN with online training capabilities and leveraging the ultra-low-power benefits of the hardware.
## 3.2.1 From the simulation to an analog chip
The HERMES chip features two in-memory computing cores comprising PCM devices [ 23 ], with conductances ($G$) controlled by individual SET and RESET pulses. A low-amplitude SET pulse (100 µA, 600 ns) gradually switches the material from the amorphous to the crystalline phase, while a high-amplitude RESET pulse (700 µA, 600 ns) rapidly switches it to the HRS.
The crossbar array operates in a differential parallel setup, i.e., each element of the embedded weight matrices $W$ is represented by four PCM devices as $W_{ij} = ((G_A^+ + G_B^+) - (G_A^- + G_B^-))/2$, indicated by the 8T4R label in the system diagram of Fig. 3.8. The in-memory computing core allows feeding inputs, e.g., $\{x^t, z^{t-1}, z^t\}$, to the rows and, through the application of Ohm's and Kirchhoff's laws, performs a fully parallel matrix-vector multiplication [ 23 ] with approximately 4-bit precision [ 186 ].
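The sketch below illustrates the four-device differential weight encoding and the crossbar matrix-vector product; it models the analog MVM as an ideal matrix multiply on the decoded weights, whereas the chip realizes it physically through Ohm's and Kirchhoff's laws with roughly 4-bit precision.

```python
import torch

def decode_weights(G_Ap, G_Bp, G_Am, G_Bm):
    """W_ij = ((G_A+ + G_B+) - (G_A- + G_B-)) / 2 for the 8T4R unit cell."""
    return ((G_Ap + G_Bp) - (G_Am + G_Bm)) / 2.0

def crossbar_mvm(G_Ap, G_Bp, G_Am, G_Bm, x):
    """Idealized in-memory matrix-vector multiplication of the inputs x (spikes or activations)."""
    W = decode_weights(G_Ap, G_Bp, G_Am, G_Bm)   # (n_out, n_in) effective weights
    return W @ x                                  # on-chip: rows carry x, columns sum currents
```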
Figure 3.8: Illustration of the working principle of phase-change devices and their integration into a crossbar architecture. a, Phase-switching behavior of PCM devices. SET and RESET pulses can be used to transition between the amorphous and the crystalline phase. b, The PCM devices from a can be incorporated into a crossbar array structure. In our work, four PCM devices are used in a differential manner to represent each weight element.
Similar to Section 3.1, we tested the network on a 1D continual pattern regression task, using an RSNN architecture with 100 recurrently connected LIF neurons and one leaky-integrator output unit (Eq. 3.1). Again, the RSNN is driven by fixed-rate Poisson spikes from 100 input neurons, with a membrane decay time constant of 30 ms for both recurrent and output units.
To optimize resource allocation, we strategically embedded the trainable parameters $W^{in}$, $W^{rec}$ and $W^{out}$ within Core 1, and the feedback matrix $B^{out}$ within Core 2. This architectural design, illustrated in Fig. 3.9, enables network inference on one core and error-based learning-signal generation on the other. We use e-prop to compute the approximated gradients for the network weights and leverage the mixed-precision algorithm to accumulate the gradients in a high-precision unit. This unit also serves to store the dataset, compute neuron activations and eligibility traces, and calculate the network error.
Figure 3.9: Task illustration and realization of the network on neuromorphic hardware. Realization of the regression task on neuromorphic hardware using two cores with PCM crossbar arrays. The trainable parameters $W^{in}$, $W^{rec}$ and $W^{out}$ are placed in Core 1 and $B^{out}$ is placed in Core 2. The neuromorphic hardware is used to perform the matrix-vector multiplications required to compute the network activity and the learning signal. A high-precision unit is used to track the individual gradients which are applied to the trainable parameters.
Before transitioning to hardware testing, a comprehensive investigation of network performance was conducted within the simulation framework, subjecting the network to various hardware constraints. This involved utilizing weights with varying precision levels, including 32-bit floating point (SW 32 bit), 8-bit and 4-bit fixed point (SW 8 bit and SW 4 bit), and weights based on a PCM model (SW PCM), as depicted in Fig. 3.2.
The results, illustrated in Figure 3.10, were obtained using four distinct random seeds. Notably, the performance metrics with 32-bit and 8-bit precision were comparable, affirming the robustness of the network under reduced precision. The network's performance remained resilient even with 4-bit ($E = 3.93 \pm 0.58$) or simulated PCM ($E = 3.72 \pm 0.53$) weights. Although a slight degradation was observed with the hardware ($E = 5.59 \pm 1.29$), the network succeeded in reproducing the target pattern (Fig. 3.10b). The individual subpanels within this figure visualize the patterns at different stages of training: at the beginning, after 30 iterations, and at the end of training.
The histograms in Fig. 3.10c show the distributions of the trainable matrices before and after training. Notably, modifications in the output weights were instrumental in achieving accurate pattern generation. The average firing rate was maintained at approximately 10 Hz through regularization. This ensured sparsity in both communication and programming pulses, crucial for energy efficiency and extended device longevity. The sparse application of SET pulses, evident in Fig. 3.10d, further underscores this efficiency. Specifically, only 2-3 pulses per iteration were required for the output weight matrix, even though it comprises 100 elements. This highlights the effectiveness of gradient accumulation in reducing the number of programming pulses and, consequently, the overall energy consumption.
## 3.2.2 Discussion
We have demonstrated that training RSNNs with the e-prop local weight update rule, using a hardware-in-the-loop approach, can be robust to both limited computational precision and the analog imperfections inherent to memristive devices. Furthermore, our experiments show that this system achieves performance competitive with full-precision conventional software implementations. The RSNN neurons exhibited a low firing rate, and mixed-precision training significantly reduced the number of PCM programming pulses. This reduction in programming activity not only enhances energy efficiency but also mitigates potential endurance issues associated with frequent
device switching. Our findings enable RSNNs trained with biologically-inspired algorithms to be deployed on memristive neuromorphic hardware for sparse and online learning.
In the following Section 3.3, we will present how eligibility traces can be implemented with volatile memristive devices, enabling more scalable credit assignment on neuromorphic hardware.
Figure 3.10: Results of our training approach realized with different synapse models. a, Evolution of the squared error using different synapse models, averaged over 4 random initializations. b, Visual comparison of the network output and the target pattern at the beginning of training (first subpanel), after 30 training iterations (second subpanel) and after training (third subpanel). c, Histogram of the trainable weight matrices before and after training. d, Number of SET pulses applied to the individual weight matrices.
## 3.3 scalable synaptic eligibility traces with volatile memristive devices
Dedicated hardware implementations of spiking neural networks that combine the advantages of mixed-signal neuromorphic circuits with those of emerging memory technologies have the potential of enabling ultra-low power pervasive sensory processing. To endow these systems with additional flexibility and the ability to learn to solve specific tasks, it is important to develop appropriate on-chip learning mechanisms. Recently, a new class of three-factor spike-based learning rules has been proposed that can solve the temporal credit assignment problem and approximate the error back-propagation algorithm on complex tasks. However, the efficient implementation of these rules on hybrid CMOS/memristive architectures remains an open challenge. Here we present a new neuromorphic building block, called PCM-trace, which exploits the drift behavior of phase-change materials to implement long-lasting eligibility traces, a critical ingredient of three-factor learning rules. We demonstrate how the proposed approach improves area efficiency by > 10× compared to existing solutions, and we present a technologically plausible learning algorithm supported by experimental data from device measurements.
## 3.3.1 Introduction
Neuromorphic engineering uses electronic analog circuit elements to implement compact and energy-efficient intelligent cognitive systems [ 187 -190 ]. Leveraging the substrate's physics to emulate biophysical dynamics is a strong incentive toward ultra-low power and real-time implementations of neural networks using mixed-signal memristive event-based neuromorphic circuits [ 144 , 191 -193 ]. The majority of these systems are currently deployed in edge-computing applications only in inference mode, in which the network parameters are fixed. However, learning in edge computing can have many advantages, as it enables adaptation to changing input statistics, sensory degradations, reduced network congestion, and increased privacy. Indeed, there have been multiple efforts implementing Spike-Timing Dependent Plasticity (STDP) variants and Hebbian learning using neuromorphic processors [ 168 , 181 , 194 ]. These methods control Long-Term Depression (LTD) or Long-Term Potentiation (LTP) based on specific local features of pre- and post-synaptic activities. However, local learning rules by themselves do not guarantee that network performance will improve in multi-layer or recurrent networks. Local error-driven approaches, e.g., the Delta Rule, aim to solve this problem but fail to assign credit to neurons that are multiple synapses away from the network output [ 195 , 196 ]. On the other hand, it has recently been shown that, by using external third-factor neuromodulatory signals (e.g., reward or prediction error in reinforcement learning, or a teaching signal in supervised learning), this can be achieved in hierarchical networks [ 197 , 198 ]. However, there needs to be a mechanism for synapses to remember their past activity for long periods of time, until the reward event or teacher signal is presented. In the brain, these signals are believed to be implemented by calcium ions or CaMKII enzymes in the synaptic spine [ 199 ] and are called eligibility traces. In machine learning, algorithmic top-down analyses of gradient descent have demonstrated how local eligibility traces at synapses allow networks to reach performance comparable to the error back-propagation algorithm on complex tasks [ 75 , 85 , 200 ].
There are already neuromorphic platforms that support synaptic eligibility traces, such as Loihi [89], BrainScaleS [90] or SpiNNaker [91]. Learning (i.e., the weight update) on these platforms is supported only through digital processors; hence, the numerical trace calculation leads to extremely memory-intensive operations and forms a von Neumann bottleneck [92, 93]. Especially when the eligibility trace is computed per synapse (instead of per neuron), the memory overhead quickly becomes overwhelming, as the number of traces scales quadratically with the number of neurons in the network. Unlike convolutional architectures on digital neuromorphic processors, where weight sharing reduces the memory bandwidth, eligibility traces cannot be shared due to their activity-dependent nature.
On the other hand, mixed-signal neuromorphic processors that perform in-memory computation can emulate the desired neural and synaptic dynamics using the physics of the analog substrate [ 201 , 202 ]. Differential-Pair Integrator (DPI) based circuits [ 202 , 203 ] which rely on
Table 3.3: Eligibility Traces in Gradient-Estimating Learning Rules

| Learning rule            | Pre-synaptic terms | Post-synaptic terms |
|--------------------------|--------------------|---------------------|
| e-prop (LIF) [82]        | $x_j$              | -                   |
| e-prop (ALIF) [82]       | $x_j$              | $\psi_i$            |
| Sparse RTRL [83]         | $x_j$              | -                   |
| BDSP [39]                | $x_j$              | -                   |
| SuperSpike [85]          | $\epsilon * x_j$   | $\psi_i$            |
| Sparse Spiking G.D. [86] | $x_j$              | -                   |
accumulating volatile information on capacitors, can in principle be used to implement eligibility traces. Recently, substantial progress has been made [94] in implementing slow-dynamics DPI synapse circuits using advanced Fully Depleted Silicon-On-Insulator (FDSOI) technologies. By combining reverse body biasing and self-cascoding techniques [204], these circuits can achieve synaptic traces as long as ∼6 s [94]. However, the resulting area due to large capacitor sizes and the area-dependent leakage of Alternate Polarity Metal-On-Metal (APMOM) structures hinder the scalability of such hardware implementations.
In this work, we present a novel approach that exploits the drift behavior of PCM devices to intrinsically perform eligibility trace computation over behavioral timescales. We present the PCM-trace building block as a hybrid memristive-CMOS circuit solution that can lead to record-low area requirements per synapse. To the best of our knowledge, this is the first work that uses a memristive device not only to store the synaptic weight, but also to keep track of synaptic eligibility for interaction with a third factor, toward scalable next-generation on-chip learning.
## Eligibility traces in machine learning and neuroscience
An eligibility trace is a decaying synaptic state variable that tracks the recency and frequency of synaptic events, as described in Eq. (3.5). The trace state, $e_{ij}$, of the synapse between pre-synaptic neuron $j$ and post-synaptic neuron $i$ is typically driven by a linear function of the pre-synaptic spiking activity, $f_j(x_j)$, and a non-linear function of the post-synaptic activity, $g_i(x_i)$, such that

$$\frac{de_{ij}}{dt} = -\frac{e_{ij}}{\tau_e} + \eta\, f_j\left(x_j\right) g_i\left(x_i\right), \qquad (3.5)$$

where $\tau_e$ is the time constant of the trace and $\eta$ is a constant scaling factor [197].
The function of the eligibility trace is to keep the temporal correlation history of $f_j(x_j)$ and $g_i(x_i)$ available at the synapse by accumulating instantaneous correlation events, called synaptic tags. From the top-down, gradient-based machine learning perspective, various learning rules require eligibility trace functionality as part of the network architecture and specify their own synaptic tag requirements, $f_j \cdot g_i$ (i.e., what information to accumulate on the synapse).
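To make the accumulation dynamics concrete, the following minimal Python sketch integrates Eq. (3.5) with a forward-Euler step. The spike statistics, the unit post-synaptic factor and all parameter values are illustrative assumptions for demonstration, not values taken from this thesis.

```python
import numpy as np

def eligibility_trace(pre_spikes, post_factor, tau_e=10.0, eta=1.0, dt=1e-3):
    """Forward-Euler integration of Eq. (3.5): de/dt = -e/tau_e + eta*f_j(x_j)*g_i(x_i).

    pre_spikes  : binary array, f_j(x_j) sampled at each time step
    post_factor : array, g_i(x_i) sampled at each time step
    """
    e = np.zeros(len(pre_spikes))
    for t in range(1, len(pre_spikes)):
        tag = eta * pre_spikes[t] * post_factor[t]          # instantaneous synaptic tag
        e[t] = e[t - 1] - dt * e[t - 1] / tau_e + tag       # leaky accumulation of tags
    return e

# toy example: sparse pre-synaptic spikes over 20 s (dt = 1 ms), constant post-synaptic factor
rng = np.random.default_rng(0)
pre = (rng.random(20000) < 0.002).astype(float)
post = np.ones_like(pre)
trace = eligibility_trace(pre, post, tau_e=10.0)
print(f"trace peak = {trace.max():.3f}, trace at t = 20 s -> {trace[-1]:.3f}")
```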
Table 3.3 summarizes some of the recently developed biologically plausible, local learning rules employing eligibility traces within the supervised learning framework. Most of the listed learning rules accumulate pre-synaptic events, $x_j$, occasionally further smoothed with $\epsilon$ (a causal membrane kernel), whereas some learning rules additionally require a surrogate partial derivative of the post-synaptic state, $\psi_i$. By approximating gradient-based optimization for spiking neural networks, these learning rules achieve competitive performance on standard audio and image classification datasets such as the TIMIT dataset [205], Spiking Heidelberg Digits [206], Fashion-MNIST [207], Neuromorphic-MNIST [208], CIFAR10 [209] and even ImageNet [210].
In RL, the advantages of long-lasting synaptic eligibility traces are even more evident. Eligibility traces carry synaptic tag information into the future, allowing a backward view when a sparse reward arrives from the environment [211]. By doing so, eligibility traces assist in solving the distal reward problem [88] (how the brain assigns credit or blame to neurons if the activity patterns responsible for the reward no longer exist when the reward arrives) by bridging millisecond neuronal timescales and second-long behavioral timescales. Almost any Temporal Difference (TD) method, e.g., Q-learning or SARSA, can benefit from eligibility traces to learn more efficiently [211]. Policy gradient methods that rely on temporal discounting, e.g., discounted reward [211] or discounted advantage [212], naturally demand synaptic eligibility traces. Moreover, some on-policy and off-policy RL models can explain behavioral and physiological experiments across multiple sensory modalities only if they are equipped with synaptic eligibility traces with decay times longer than 10 s [213].
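As a hedged illustration of this backward view, the sketch below runs tabular TD(λ) with accumulating eligibility traces on a toy chain environment with a single sparse reward at the end; the environment, learning rate and trace parameters are arbitrary choices for demonstration, not taken from the cited works.

```python
import numpy as np

n_states, alpha, gamma, lam = 5, 0.1, 0.99, 0.9
V = np.zeros(n_states)                   # state-value estimates
e = np.zeros(n_states)                   # per-state accumulating eligibility traces

for episode in range(200):
    s = 0
    e[:] = 0.0
    while s < n_states - 1:
        s_next = s + 1                                   # deterministic chain environment
        r = 1.0 if s_next == n_states - 1 else 0.0       # sparse reward only at the end
        delta = r + gamma * V[s_next] - V[s]             # TD error acts as the third factor
        e *= gamma * lam                                 # all traces decay ...
        e[s] += 1.0                                      # ... and the visited state is tagged
        V += alpha * delta * e                           # credit flows back along the traces
        s = s_next

print(np.round(V, 2))                                    # values propagate toward earlier states
```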
Eligibility traces are also deeply rooted in neurobiology. The synaptic machinery that implements the eligibility trace might be calcium-based mechanisms in the spine, e.g., CaMKII [214, 215], or metastable transient states of the molecular dynamics inside the synapse [216]. In the visual and frontal cortex, in-vivo STDP experiments suggest that pre-before-post pairings induce a synaptic tag that decays over ∼10 s and results in LTP upon the arrival of the third-factor neuromodulator noradrenaline [217]. In the hippocampus, Brzosko, Schultz & Paulsen [218] found that post-before-pre pairings can result in LTP if the third-factor neuromodulator dopamine arrives within minutes.
In summary, both top-down approaches following machine learning principles and bottom-up approaches built upon in-vivo and in-vitro synaptic plasticity experiments imply the importance of having eligibility traces in neural architectures.
## 3 . 3 . 2 PCM-trace: Implementing eligibility traces with PCM drift
## 3 . 3 . 2 . 1 PCM Measurements
The temporal evolution of electrical resistivity is a widely observed phenomenon in PCM, caused by the rearrangement of atoms in the amorphous phase [219]. This behavior is commonly referred to as structural relaxation or drift. To start the drift, a strong RESET pulse is applied to induce a crystalline-to-amorphous phase transition, in which the PCM is melted and quenched. The low-ordered, highly stressed amorphous state then evolves toward a more energetically favorable glass state over tens of seconds [112].
At constant ambient temperature, the resistivity follows
$$R(t) = R(t_0)\left(\frac{t}{t_0}\right)^{\nu}, \qquad (3.6)$$
where $R(t_0)$ is the resistance measured at time $t_0$ and $\nu$ is the drift coefficient. It has been experimentally verified by many groups that Eq. (3.6) successfully captures the drift dynamics [112, 156, 220], from the microsecond to the hour range [221].
Figure 3.11: Experimental (dots) and simulated (dashed lines) resistance drift characteristics at constant room temperature.
<details>
<summary>Image 19 Details</summary>

### Visual Description
## Resistance vs. Time Chart
### Overview
The image is a chart displaying the relationship between resistance (in Ohms) and time (in seconds). It shows three data series, each representing an experiment, along with corresponding model curves. An inset diagram illustrates the pulse shape used in the experiment.
### Components/Axes
* **X-axis:** Time (s), ranging from 0 to 30 seconds. Increments are marked at 5-second intervals.
* **Y-axis:** Resistance (Ω), scaled by a factor of 10^6. The axis ranges from 2 to 5, implying a resistance range of 2 x 10^6 Ω to 5 x 10^6 Ω. Increments are marked at intervals of 1 x 10^6 Ω.
* **Legend:** Located in the bottom-right corner.
* "Experiment": Represented by blue dots.
* "Model": Represented by a blue dashed-dotted line.
* **Inset Diagram:** Located in the top-left corner, showing a pulse shape with "V_RESET" indicating the voltage and "100 ns" indicating the pulse width.
### Detailed Analysis
There are three distinct data series plotted on the chart. Each series consists of blue dots representing experimental data and a corresponding blue dashed-dotted line representing the model.
* **Bottom Series:**
* **Experiment (Blue Dots):** Starts at approximately 1.8 x 10^6 Ω at time 0, increases rapidly initially, and then plateaus to approximately 3.1 x 10^6 Ω at 30 seconds.
* **Model (Blue Dashed-Dotted Line):** Follows a similar trend to the experimental data, starting at approximately 1.8 x 10^6 Ω and approaching 3.1 x 10^6 Ω at 30 seconds.
* **Middle Series:**
* **Experiment (Blue Dots):** Starts at approximately 2.8 x 10^6 Ω at time 0, increases rapidly initially, and then plateaus to approximately 4.3 x 10^6 Ω at 30 seconds.
* **Model (Blue Dashed-Dotted Line):** Follows a similar trend to the experimental data, starting at approximately 2.8 x 10^6 Ω and approaching 4.3 x 10^6 Ω at 30 seconds.
* **Top Series:**
* **Experiment (Blue Dots):** Starts at approximately 3.3 x 10^6 Ω at time 0, increases rapidly initially, and then plateaus to approximately 5.3 x 10^6 Ω at 30 seconds.
* **Model (Blue Dashed-Dotted Line):** Follows a similar trend to the experimental data, starting at approximately 3.3 x 10^6 Ω and approaching 5.3 x 10^6 Ω at 30 seconds.
### Key Observations
* All three data series exhibit a similar trend: a rapid increase in resistance during the initial phase (0-5 seconds), followed by a gradual plateauing as time increases.
* The "Model" curves closely match the "Experiment" data points for all three series, suggesting a good fit between the model and the experimental results.
* The inset diagram indicates that the experiment involves applying a voltage pulse (V_RESET) with a duration of 100 nanoseconds.
### Interpretation
The chart demonstrates the time-dependent behavior of resistance under certain experimental conditions, likely related to a material or device being subjected to voltage pulses. The close agreement between the experimental data and the model suggests that the model accurately captures the underlying physical processes governing the resistance change. The initial rapid increase in resistance could be attributed to a fast-acting mechanism, while the subsequent plateauing indicates a saturation effect or a slower, rate-limiting process. The different starting resistance values for the three series could represent different initial states or experimental parameters. The pulse shape and duration (V_RESET, 100 ns) are crucial parameters in understanding the experimental setup and the observed resistance behavior.
</details>
In collaboration with CEA-LETI, we integrated Ge$_2$Sb$_2$Te$_5$-based PCM into state-of-the-art heater-based PCM devices fabricated in the Back-End-Of-Line (BEOL) of a 130 nm CMOS technology. The PCM layer is 50 nm thick with a bottom size of 3600 nm$^2$. Drift measurements
Figure 3.12: Accumulating eligibility trace using the PCM-trace drift model (Eq. 3.7). After resetting the PCM-trace device at t = 0, five random synaptic tags are applied to the synapse, each implemented by a gradual SET that results in a ∼50% increase in conductance. The device retains the eligibility trace for more than 10 s.
<details>
<summary>Image 20 Details</summary>

### Visual Description
## Line Graph: Conductance vs. Time
### Overview
The image is a line graph showing the relationship between conductance (in microSiemens, µS) and time (in seconds, s). The graph displays two data series: "Accumulating e-trace" represented by a solid black line, and "Synaptic tagging" represented by a dotted black line. The graph shows an initial period of rapid change followed by a gradual decay towards a higher baseline. The background has a gradient from white to light blue, with the "Higher Baseline" text placed in the upper-right quadrant.
### Components/Axes
* **X-axis:** Time (s), ranging from 0 to 12 seconds, with tick marks at every 2 seconds.
* **Y-axis:** Conductance (µS), ranging from 0 to 1.0 µS, with tick marks at 0.5 and 1.0 µS.
* **Legend:** Located in the center-right of the graph.
* Solid black line: "Accumulating e-trace"
* Dotted black line: "Synaptic tagging"
* **Text Annotation:** "Higher Baseline" is written in the upper-right quadrant of the graph.
### Detailed Analysis
* **Accumulating e-trace (Solid Black Line):**
* Trend: Initially starts at approximately 0.3 µS, exhibits a series of sharp increases and decreases between 0 and 2 seconds, peaks at approximately 1.1 µS, and then gradually decreases to a value around 0.5 µS by 12 seconds.
* Data Points:
* Time = 0 s, Conductance ≈ 0.3 µS
* Time = 1 s, Conductance peaks at approximately 0.7 µS
* Time = 1.5 s, Conductance peaks at approximately 1.1 µS
* Time = 2 s, Conductance ≈ 0.7 µS
* Time = 4 s, Conductance ≈ 0.5 µS
* Time = 6 s, Conductance ≈ 0.5 µS
* Time = 8 s, Conductance ≈ 0.5 µS
* Time = 10 s, Conductance ≈ 0.5 µS
* Time = 12 s, Conductance ≈ 0.5 µS
* **Synaptic tagging (Dotted Black Line):**
* Trend: Consists of three vertical dotted lines between 0 and 2 seconds.
* Data Points:
* Located at approximately 0.6 s, 1.0 s, and 1.4 s.
### Key Observations
* The "Accumulating e-trace" shows an initial burst of activity followed by a decay to a stable level.
* The "Synaptic tagging" events occur during the initial burst of activity of the "Accumulating e-trace".
* The conductance stabilizes at a "Higher Baseline" of approximately 0.5 µS after the initial activity.
### Interpretation
The graph illustrates the dynamics of conductance over time, likely in a biological system such as a neuron. The "Accumulating e-trace" represents the overall electrical activity, while the "Synaptic tagging" indicates specific events that might be related to synaptic plasticity or learning. The initial burst of activity suggests a response to a stimulus, and the subsequent decay indicates a return to a baseline state. The "Higher Baseline" suggests that the system settles into a new, elevated state of conductance after the initial activity. The synaptic tagging events occurring during the initial burst may be crucial for long-term changes in synaptic strength.
</details>
were performed on three devices to monitor the temporal evolution of the resistance in the HRS, confirming the model in Eq. (3.6). The test was conducted by first resetting all cells with a RESET pulse applied to the heater, with a width of 100 ns, rise and fall times of 5 ns, and a peak voltage of 1.85 V. An additional programming pulse then brings the devices to different initial conditions, corresponding to R(t = 1 s) = [1.77 MΩ, 2.39 MΩ, 2.89 MΩ]. The low-field device resistance is measured every 1 s for 30 s by applying a READ pulse with the same timing as the RESET pulse but a peak voltage of 0.05 V.
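The following short sketch evaluates the drift model of Eq. (3.6) at the three initial conditions quoted above; the drift coefficient ν = 0.06 is an assumed, typical value for GST-based PCM rather than the value fitted to these devices.

```python
import numpy as np

def drift_resistance(t, r_t0, nu=0.06, t0=1.0):
    """Eq. (3.6): R(t) = R(t0) * (t / t0)**nu."""
    return r_t0 * (t / t0) ** nu

t = np.arange(1.0, 31.0)                    # read every 1 s for 30 s, as in the measurement
for r0 in (1.77e6, 2.39e6, 2.89e6):         # initial conditions R(t = 1 s) from the text
    r = drift_resistance(t, r0)
    print(f"R(1 s) = {r0/1e6:.2f} MOhm  ->  R(30 s) = {r[-1]/1e6:.2f} MOhm")
```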
PCM-trace is a novel method to implement seconds-long eligibility traces for the synapse using the drift behavior of PCM. By writing Eq. (3.6) as a difference equation of the conductance, we can show that the temporal evolution of the conductance has decay characteristics similar to Eq. (3.5), i.e., $G_{ij}^{t+\Delta t} = \left(\frac{t - t_p}{t - t_p + \Delta t}\right)^{\nu} G_{ij}^{t}$, where $G_{ij}^{t_0} = 1/R_{ij}^{t_0}$ and $t_p$ is the time of the last programming event, since drift re-initializes with every gradual SET [81, 222]. The main difference is that the rate of change of the PCM resistivity is a function of time; nevertheless, its effective time constant, $\tau_{\mathrm{PCM}} = -\Delta t / \log\left(\left(t/(t+\Delta t)\right)^{\nu}\right)$, is on the order of tens of seconds and thus comparable to behavioral time-scales [213]. Therefore, the PCM-trace dynamics can emulate the eligibility trace of the synapse as follows:
$$G_{ij}^{t+\Delta t} = \left(\frac{t - t_p}{t - t_p + \Delta t}\right)^{\nu} G_{ij}^{t} + \eta\, f_j(x_j^t)\, g_i(x_i^t) \qquad (3.7)$$
In the PCM-trace method (Eq. 3.7), the accumulation term of the eligibility trace is implemented by applying a gradual SET to the PCM device whenever the synapse is tagged. To maximize the number of accumulations a PCM device can handle without getting stuck in the LRS regime, some operational conditions need to be satisfied. We initialize the device to the HRS by applying a strong RESET pulse and wait for an initialization time $t_{init}$ of at least 250 ms for the device resistance to increase. If $t_{init}$ is too short, the device conductance is still too high to accumulate enough tags; if it is too long, the decay becomes weaker (see Eq. 3.6). The initialization time can be tuned to reach the desired drift speed depending on the material choice and the application. After the initialization time, whenever the synapse is tagged, a single gradual SET pulse (100 µA amplitude, 100 ns width, 5 ns rise and fall times) is applied. To ensure that the device stays in the HRS, a read-verify-set scheme can be used. Finally, the value of the eligibility trace can be read out after seconds by measuring the conductance of the device (see Fig. 3.12).
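A minimal simulation of the PCM-trace dynamics is sketched below: the conductance decays between programming events according to the difference form of Eq. (3.6)/(3.7), and each synaptic tag applies a gradual SET modeled as a ∼50% conductance increase, after which the drift re-initializes. The drift coefficient, time step, tag times and absolute conductance scale are assumptions chosen only to roughly mimic the behavior shown in Fig. 3.12.

```python
import numpy as np

nu, dt = 0.06, 0.01                          # drift coefficient (assumed) and 10 ms time step
t_init, t_end = 0.25, 12.0                   # >= 250 ms initialization wait, 12 s simulation
g = 0.3e-6                                   # conductance after initialization (assumed, ~0.3 uS)
t_p = 0.0                                    # time of the last programming (RESET/SET) event
tag_times = [0.6, 1.0, 1.4, 1.8, 2.2]        # five synaptic tags (assumed timing)

times = np.arange(0.0, t_end, dt)
trace = np.empty_like(times)
for k, t in enumerate(times):
    trace[k] = g
    if t >= t_init and any(abs(t - tt) < dt / 2 for tt in tag_times):
        g *= 1.5                             # gradual SET: ~50% conductance increase per tag
        t_p = t                              # drift re-initializes with every programming pulse
    if t > t_p:
        g *= ((t - t_p) / (t - t_p + dt)) ** nu   # drift-driven decay from t to t + dt

print(f"trace at t = 3 s:  {trace[int(round(3.0/dt))]*1e6:.2f} uS")
print(f"trace at t = 12 s: {trace[-1]*1e6:.2f} uS")
```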
## 3 . 3 . 3 Multi PCM-trace: Increasing the dynamic range of traces
The number of gradual SET pulses that can be applied to a single PCM-trace device is limited, because each pulse partially increases the device conductance and eventually moves the device toward its LRS (< 2 MΩ), where the drift converges to a higher baseline level. This problem can be solved by distributing the synaptic eligibility trace across multiple PCM devices, as in Fig. 3.13.
Figure 3.13: Multi PCM-trace concept. Each synapse has a weight and a PCM-trace block where multiple parallel PCM devices keep the eligibility trace of the synapse with their natural drift behavior.
<details>
<summary>Image 21 Details</summary>

### Visual Description
## Diagram: Synapse Model
### Overview
The image presents a diagram of a synapse model, illustrating the connection between a presynaptic neuron and a postsynaptic neuron. The synapse is represented as a central processing unit with two key components: synaptic efficacy and synaptic eligibility trace. The diagram includes mathematical notations to describe the signal transmission and processing within the synapse.
### Components/Axes
* **Presynaptic neuron:** Labeled "Presynaptic neuron" on the left side of the diagram. It is represented by a circle labeled "j".
* **Postsynaptic neuron:** Labeled "Postsynaptic neuron" on the right side of the diagram. It is represented by a circle labeled "i".
* **Synapse:** The central part of the diagram, enclosed in a dashed-line rectangle and labeled "Synapse" at the top.
* **Synaptic efficacy:** The upper part of the synapse, labeled "Synaptic efficacy". It contains the term "W<sub>ij</sub>".
* **Synaptic eligibility trace:** The lower part of the synapse, labeled "Synaptic eligibility trace". It contains a representation of "N parallel memristor devices".
* **Input signal:** Represented by an arrow labeled "x<sub>j</sub>" pointing from the presynaptic neuron to the synapse.
* **Output signal:** Represented by an arrow labeled "x<sub>i</sub>" pointing from the synapse to the postsynaptic neuron.
* **Synaptic current:** Represented by an arrow labeled "I<sub>i</sub> = W<sub>ij</sub>x<sub>j</sub>" pointing from the synapse to the postsynaptic neuron. A blue rectangular pulse is shown above this arrow.
* **Eligibility trace signal:** Represented by an arrow labeled "e<sub>ij</sub> = Σ<sub>n=1</sub><sup>N</sup> G<sub>n</sub>" pointing from the synapse to the postsynaptic neuron. A decaying blue curve is shown above this arrow.
### Detailed Analysis or Content Details
* **Presynaptic neuron:** The presynaptic neuron sends a signal "x<sub>j</sub>" to the synapse.
* **Synapse:** The synapse processes the incoming signal. It consists of two main components:
* **Synaptic efficacy (W<sub>ij</sub>):** Represents the strength of the connection between the two neurons.
* **Synaptic eligibility trace:** Implemented using "N parallel memristor devices". This component tracks the history of synaptic activity.
* **Postsynaptic neuron:** The postsynaptic neuron receives the processed signal from the synapse.
* **Synaptic current (I<sub>i</sub>):** The current generated in the postsynaptic neuron is calculated as the product of the synaptic efficacy (W<sub>ij</sub>) and the presynaptic signal (x<sub>j</sub>), i.e., I<sub>i</sub> = W<sub>ij</sub>x<sub>j</sub>.
* **Eligibility trace signal (e<sub>ij</sub>):** The eligibility trace signal is calculated as the sum of the contributions from N parallel memristor devices, represented as e<sub>ij</sub> = Σ<sub>n=1</sub><sup>N</sup> G<sub>n</sub>.
### Key Observations
* The diagram illustrates a simplified model of a synapse, highlighting the key components involved in signal transmission and processing.
* The synaptic efficacy (W<sub>ij</sub>) represents the strength of the connection, while the synaptic eligibility trace captures the history of synaptic activity.
* The synaptic current (I<sub>i</sub>) is directly proportional to the synaptic efficacy and the presynaptic signal.
* The eligibility trace signal (e<sub>ij</sub>) is a function of the activity of N parallel memristor devices.
### Interpretation
The diagram provides a functional representation of a synapse, emphasizing the role of synaptic efficacy and eligibility traces in neural communication. The use of memristor devices to implement the eligibility trace suggests a potential mechanism for learning and adaptation in neural networks. The equations presented describe how the signals are processed and transmitted between neurons. The diagram highlights the complex interplay of factors that determine the strength and plasticity of synaptic connections.
</details>
By successively routing the tags to multiple PCM devices, the number of gradual SET pulses applied per device is significantly reduced. The post-synaptic neuron receives the sum of products of the pre-synaptic activities and the weights of the weight block. In parallel, the PCM-trace block computes the eligibility trace as a function of pre- and post-synaptic activities (Eq. 3.7), to be used in the weight update. Fig. 3.14 demonstrates the increase in effective dynamic range (the number of eligibility trace updates possible without getting stuck in the LRS) obtained by using multiple PCM devices; a minimal routing sketch is given below.
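The sketch below illustrates the routing idea under the same simplified device model as before: tags are assigned round-robin to N parallel devices, each device drifts independently, and the effective trace is the summed conductance of the block. Device count, conductance scale and timings are illustrative assumptions loosely following Fig. 3.14, not fitted device parameters.

```python
import numpy as np

def run(n_devices, n_tags=15, nu=0.06, dt=0.01, g0=0.3e-6):
    """Route n_tags round-robin onto n_devices; return the per-device peak conductance
    and the summed (effective) eligibility trace at the end of the run."""
    g = np.full(n_devices, g0)                 # conductances after initialization (assumed)
    g_peak = g.copy()
    t_p = np.zeros(n_devices)                  # per-device time of the last programming event
    tag_times = np.linspace(0.3, 1.3, n_tags)  # 15 tags between 300 ms and 1300 ms
    for t in np.arange(dt, 30.0, dt):
        g *= ((t - t_p) / (t - t_p + dt)) ** nu          # independent drift decay per device
        for i, tt in enumerate(tag_times):
            if abs(t - tt) < dt / 2:
                d = i % n_devices                         # round-robin routing of the tag
                g[d] *= 1.5                               # gradual SET: ~50% increase
                t_p[d] = t
        g_peak = np.maximum(g_peak, g)
    return g_peak.max() * 1e6, g.sum() * 1e6

# with more parallel devices, each device sees fewer SETs and stays farther from the LRS
for n in (1, 3):
    peak, trace = run(n)
    print(f"N = {n}: per-device peak = {peak:.2f} uS, effective trace at 30 s = {trace:.2f} uS")
```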
Figure 3.14: Accumulating eligibility trace using the multi-PCM configuration. The synapse receives 15 tags between 300 ms and 1300 ms, which are routed to three different devices shown in the top three plots. The effective eligibility trace is computed by applying a READ pulse to the parallel PCM devices. The initialization duration and synaptic activity period are indicated with dashed lines in the bottom plot. The synaptic efficacy $W_{ij}$ is modified according to the state of the eligibility trace once the third-factor signal arrives.
<details>
<summary>Image 22 Details</summary>

### Visual Description
## Chart: PCM and Synaptic Dynamics
### Overview
The image presents a series of time-series plots illustrating the dynamics of Phase Change Memory (PCM) cells and synaptic activity. The top three plots show the conductance (G) of three PCM cells (PCM #1, PCM #2, PCM #3) over time, with initial spiking behavior followed by a relatively stable state. The fourth plot shows the synaptic efficacy (e_ij) over time, exhibiting initial spiking behavior and then decaying to a lower value. The bottom plot shows the synaptic weight (W_ij) over time, with step changes at approximately 10s and 20s.
### Components/Axes
* **Y-axis (Conductance G):**
* Units: µS (microsiemens)
* Range: Varies for each plot.
* PCM #1, #2, #3: 0.25 to 1.2
* e_ij: 1 to 3
* W_ij: 0 to 10
* **X-axis (Time):**
* Units: s (seconds)
* Range: 0 to 30
* **Legends (Top-Right):**
* PCM #1: Black line
* PCM #2: Black line
* PCM #3: Black line
* e_ij: Black line
* W_ij: Black line
* **Annotations:**
* "init": Vertical dashed line at approximately 0.2s
* "tags": Vertical dashed line at approximately 0.7s
* "delay": Text label pointing to the decaying part of the e_ij curve.
* "3rd-f": Downward arrows at approximately 10s and 20s on the e_ij plot.
### Detailed Analysis
* **PCM #1, PCM #2, PCM #3:**
* All three PCM cells exhibit similar behavior.
* Initial rapid spikes in conductance (G) within the first 2 seconds. The spikes are colored blue, green, and red respectively.
* After the initial spikes, the conductance stabilizes at a lower value, approximately 0.25 µs.
* Trend: Initial spikes followed by stabilization.
* **e_ij (Synaptic Efficacy):**
* Initial rapid spikes in synaptic efficacy (e_ij) within the first 2 seconds. The spikes are colored blue, green, and red respectively.
* After the initial spikes, the efficacy decays over time, starting from approximately 2.5 µs and decreasing to approximately 1 µs.
* Vertical arrows labeled "3rd-f" at approximately 10s and 20s.
* A horizontal dotted line is present at G = 1.
* Trend: Initial spikes, followed by decay.
* **W_ij (Synaptic Weight):**
* The synaptic weight (W_ij) starts at 0.
* At approximately 10s, W_ij jumps to a value of approximately 8.
* At approximately 20s, W_ij returns to 0.
* Trend: Step function with two transitions.
### Key Observations
* The PCM cells exhibit similar spiking behavior initially, followed by stabilization at a low conductance value.
* The synaptic efficacy (e_ij) shows initial spiking, followed by a decay over time.
* The synaptic weight (W_ij) exhibits step changes at specific time points, suggesting external control or events.
* The "3rd-f" annotations on the e_ij plot coincide with the step changes in W_ij.
### Interpretation
The data suggests a relationship between the PCM cell activity, synaptic efficacy, and synaptic weight. The initial spiking in the PCM cells and synaptic efficacy might represent an initial learning or adaptation phase. The subsequent decay in synaptic efficacy could represent forgetting or weakening of the synapse over time. The step changes in synaptic weight, triggered by external events ("3rd-f"), could represent reinforcement or modification of the synaptic connection. The "delay" annotation suggests a time delay in the effect of the initial spiking on the overall synaptic dynamics. The PCM cells are likely acting as input signals to a synapse, where the synaptic efficacy and weight determine the strength of the connection. The "3rd-f" events are likely external stimuli that modulate the synaptic weight.
</details>
## 3 . 3 . 4 Circuits and Architecture
## 3 . 3 . 4 . 1 PCM-trace Architecture
An example in-memory event-based neuromorphic architecture is shown in Fig. 3 . 15 , where the PCM-trace is employed to enable three-factor learning on behavioral time scales.
Synapse: Each synapse includes a weight block $W_{ij}$ in which two PCM devices are used in a differential configuration to represent positive and negative weights [143]. The effective synaptic weight is the difference of the two conductance values, i.e., $W_{ij} = W_{ij}^{+} - W_{ij}^{-}$. Each synapse also has a PCM-trace block $e_{ij}$ that keeps the eligibility trace. Inside the PCM-trace block, two PCM devices keep track of the positive and negative correlations between the pre- and post-synaptic neurons. Upon a pre-synaptic input spike, $PRE_j$: (i) $W_{ij}$ is read, and the current is integrated by the post-synaptic neuron $i$; (ii) based on the UP/DN signal from the learning block (LB), a gradual SET programming current is applied to the positive/negative PCM-trace device.
Neuron with Learning Block (LB): The LB estimates the correlation between pre- and post-synaptic neurons using the Spike-Driven Synaptic Plasticity (SDSP) rule [223]. At the time of a pre-synaptic spike, the post-synaptic membrane variable is compared against a threshold, above (below) which an UP (DN) signal is generated, representing the tag type. On the arrival of the third-factor binary reward signal, REW, the state of the eligibility trace (ET) devices is read by the VPROG block (Fig. 3.16b), which generates a gate voltage that modulates the current programming the weight devices $W_{ij}$ (see Alg. 2); a simplified behavioral sketch follows.
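The following behavioral sketch mirrors the flow of Alg. 2 at a purely algorithmic level: pre-synaptic events trigger UP/DN tagging of the positive/negative trace devices depending on a membrane variable, the traces drift between events, and a reward at 10 s programs the differential weight proportionally to the trace state. Thresholds, rates, the drift model and the programming gain are simplified assumptions, not the circuit-level values.

```python
import numpy as np

rng = np.random.default_rng(0)
nu, dt, scale = 0.06, 0.01, 5.0
e_pos, e_neg = 0.3, 0.3                 # PCM-trace conductances (uS), positive / negative
w_pos, w_neg = rng.random(), rng.random()
t_p_pos = t_p_neg = 0.0

def drift(g, t, t_p):
    """Drift-driven decay of a trace device between programming events (difference form)."""
    return g * ((t - t_p) / (t - t_p + dt)) ** nu if t > t_p else g

for step in range(int(30.0 / dt)):
    t = step * dt
    e_pos, e_neg = drift(e_pos, t, t_p_pos), drift(e_neg, t, t_p_neg)
    pre_spike = rng.random() < 0.02 and 0.25 < t < 1.3   # pre-synaptic activity window
    if pre_spike:
        i_mem = rng.random()                             # post-synaptic membrane variable
        if i_mem > 0.6:                                  # UP: positive correlation tag
            e_pos, t_p_pos = e_pos * 1.5, t
        elif i_mem < 0.4:                                # DN: negative correlation tag
            e_neg, t_p_neg = e_neg * 1.5, t
    if abs(t - 10.0) < dt / 2:                           # third factor (reward) arrives at 10 s
        w_pos += scale * e_pos                           # program weights from the trace state
        w_neg += scale * e_neg

print(f"e+ = {e_pos:.2f} uS, e- = {e_neg:.2f} uS, W_ij = {w_pos - w_neg:+.2f}")
```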
Figure 3.15: PCM-trace-based neuromorphic architecture for three-factor learning. Only the positive eligibility trace ($e_{ij}^{+}$) and $W_{ij}^{+}$ are shown.
<details>
<summary>Image 23 Details</summary>

### Visual Description
## Circuit Diagram: Memristor-Based Neural Network
### Overview
The image is a circuit diagram depicting a memristor-based neural network architecture. It shows a 2x2 array of memristor cells, along with associated circuitry for programming (PROG), reading (READ), precharging (PRE), and rewriting (REW). The diagram also illustrates the connection to neuron blocks (NEURON + LB). The diagram is repeated with ellipsis to indicate that the array can be expanded.
### Components/Axes
* **Vertical Axis (Left):**
* PROG: Programming line
* READ: Read line
* **Memristor Cells:**
* W11, W12, W21, W22: Memristors representing synaptic weights.
* e11, e12, e21, e22: Memristors representing error signals.
* **Control Signals:**
* PRE: Precharge signal
* REW: Rewrite signal
* UP & PRE: Update and Precharge signal
* **Neuron Blocks (Right):**
* NEURON + LB #1: Neuron and Learning Block 1
* NEURON + LB #2: Neuron and Learning Block 2
* **Other Components:**
* VPROG: Programming voltage source
* Switches: Various switches controlling the flow of current for different operations.
* Ground: Ground connections.
* **Interconnections:**
* Horizontal lines: Connect memristor cells to control signals and neuron blocks.
* Vertical lines: Connect memristor cells to programming and read lines.
* UP/DN: Update/Down signal lines connecting to neuron blocks.
### Detailed Analysis
The diagram shows a 2x2 array of memristor cells. Each cell consists of two memristors, one representing the synaptic weight (W) and the other representing the error signal (e). The cells are arranged in a grid, with the synaptic weights (W11, W12, W21, W22) located above the error signals (e11, e12, e21, e22) in each cell.
Each memristor cell is connected to a set of control signals (PRE, REW, UP & PRE) and to the programming and read lines (PROG, READ). The control signals are used to configure the cell for different operations, such as programming, reading, precharging, and rewriting.
The memristor cells are also connected to neuron blocks (NEURON + LB). The neuron blocks receive the output of the memristor cells and perform the necessary computations to update the synaptic weights.
The diagram also shows a programming voltage source (VPROG), which is used to program the memristor cells. The programming voltage is applied to the memristor cells through a set of switches.
The diagram includes ellipsis (...) to indicate that the array can be expanded to larger sizes.
* **Top-Left Cell:** Contains memristor W11, memristor e11, and associated switches. Control signals PRE, REW, and UP & PRE are connected to this cell.
* **Top-Right Cell:** Contains memristor W12, memristor e12, and associated switches. Control signals PRE, REW, and UP & PRE are connected to this cell.
* **Bottom-Left Cell:** Contains memristor W21, memristor e21, and associated switches.
* **Bottom-Right Cell:** Contains memristor W22, memristor e22, and associated switches.
* **Neuron Blocks:** Two neuron blocks are shown, labeled NEURON + LB #1 and NEURON + LB #2. These blocks receive input from the memristor cells and provide update signals (UP/DN).
### Key Observations
* The diagram illustrates a basic architecture for a memristor-based neural network.
* The memristor cells are arranged in a grid, with each cell containing two memristors.
* The memristor cells are connected to control signals, programming and read lines, and neuron blocks.
* The diagram includes a programming voltage source and various switches for controlling the flow of current.
* The array can be expanded to larger sizes.
### Interpretation
The diagram demonstrates a hardware implementation of a neural network using memristors. Memristors are used to represent synaptic weights and error signals, allowing for efficient storage and computation. The control signals and switches enable precise control over the programming, reading, and updating of the memristor cells. The neuron blocks perform the necessary computations to update the synaptic weights based on the error signals. This architecture has the potential to enable the development of energy-efficient and high-performance neural networks. The use of memristors allows for in-memory computing, which can significantly reduce the energy consumption of neural networks. The ability to expand the array to larger sizes allows for the creation of more complex neural networks.
</details>
## 3 . 3 . 4 . 2 Circuit Simulation
Fig. 3.16 shows the block diagram of the LB implementing the SDSP rule, which computes the pre-post neuron correlation. The membrane variable (described here as a current, $I_{mem}$, since the circuits operate in current mode) is compared against a threshold value $I_{th}$ through a Bump circuit [143, 224].
The output of this block is digitized through a current comparator (in our design, a Winner-Take-All (WTA) block [225]), which generates UP/DN signals if the membrane variable is above/below the threshold $I_{th}$, and a STOP signal, SP, if the two are close, i.e., within the dead zone of the bump circuit [224].
Fig. 3.16b presents the circuit schematic that reads the PCM-trace and generates $V_{PROG}$. To read the state of the device, a voltage divider is formed between the PCM device and a pseudo-resistor, highlighted in green. As the device resistance changes, the input voltage to the differential pair, highlighted in red, changes. This change is amplified by the gain of the differential pair, and the device current is normalized to the tail current, giving rise to $I_{PROG}$,
Figure 3.16: (a) Learning block diagram generating UP/DN signals as a function of the correlation between pre- and post-synaptic activity. (b) VPROG circuit reading from the eligibility trace device through the voltage divider (green) and generating $I_{PROG}$ through the differential pair (red) to program the weight device.
<details>
<summary>Image 24 Details</summary>

### Visual Description
## Circuit Diagram: Learning Block and Programming Circuit
### Overview
The image presents two circuit diagrams, labeled (a) and (b). Diagram (a) depicts a "Learning Block" composed of a "BUMP" (Correlation Detector) and a "WTA" (Current Comparator). Diagram (b) shows a programming circuit with transistors and a memristor.
### Components/Axes
**Diagram (a): Learning Block**
* **Title:** (a)
* **Overall Structure:** The Learning Block is enclosed in a dashed rectangle.
* **Input Currents:**
* I<sub>mem</sub> (input on the left)
* I<sub>th</sub> (input on the left)
* **Components:**
* **BUMP:** Labeled "BUMP" with the description "Correlation Detector" below it.
* **WTA:** Labeled "WTA" with the description "Current Comparator" below it.
* **Output Signals:**
* UP (output on the right)
* SP (output on the right)
* DN (output on the right)
**Diagram (b): Programming Circuit**
* **Title:** (b)
* **Key Components:**
* Memristor: Located at the top, connected to a switch.
* Transistors: Several transistors are arranged in a specific configuration.
* **Currents/Voltages:**
* I<sub>ET</sub> (current flowing from the memristor)
* I<sub>PROG</sub> (programming current)
* V<sub>PROG</sub> (programming voltage)
* **Highlighted Regions:**
* Top Region (Orange): Contains a set of transistors.
* Bottom-Left Region (Teal): Contains a stack of transistors connected to ground.
### Detailed Analysis or Content Details
**Diagram (a): Learning Block**
* The Learning Block takes two current inputs, I<sub>mem</sub> and I<sub>th</sub>.
* The BUMP block performs correlation detection.
* The WTA block performs current comparison.
* The Learning Block outputs three signals: UP, SP, and DN.
**Diagram (b): Programming Circuit**
* The circuit includes a memristor, which is a type of non-volatile memory.
* The current I<sub>ET</sub> flows from the memristor through a switch.
* The programming current I<sub>PROG</sub> is controlled by the programming voltage V<sub>PROG</sub>.
* The orange region contains a transistor configuration that likely serves as a current mirror or amplifier.
* The teal region contains a transistor configuration that likely serves as a current sink.
### Key Observations
* Diagram (a) represents a high-level functional block, while diagram (b) shows a detailed circuit implementation.
* The Learning Block likely uses the programming circuit to adjust its internal parameters or weights.
* The memristor in diagram (b) suggests that the learning process involves non-volatile storage of information.
### Interpretation
The image illustrates a system for implementing learning functionality using memristors. The Learning Block (a) performs the core learning operations, while the programming circuit (b) allows for adjusting the memristor's resistance, effectively storing the learned information. The BUMP and WTA blocks likely perform correlation detection and winner-take-all current comparison, respectively. The UP, SP, and DN signals could represent different states or outputs of the learning process. The programming circuit uses transistors to control the current flowing through the memristor, allowing for precise adjustment of its resistance. This setup suggests a neuromorphic computing approach, where memristors are used to emulate the behavior of synapses in the brain.
</details>
which develops $V_{PROG}$ through the diode-connected NMOS transistor. $V_{PROG}$ is connected to the gate of the transistor in series with the weight PCM device (see Fig. 3.15).
Fig. 3.17a plots the PRE events, $I_{mem}$, the output of the learning block at the time of the PRE events, and the gradual
<details>
<summary>Image 25 Details</summary>

### Visual Description
## Algorithm: Three-factor learning with PCM-trace
### Overview
The image presents Algorithm 2, which describes a three-factor learning process using PCM-trace. The algorithm outlines the steps for updating weights and eligibility traces based on reward signals and temporal conditions.
### Components/Axes
The algorithm consists of the following components:
- Initialization of weights (W) using a random function.
- Resetting eligibility traces (e).
- A main loop that iterates while time (t) is less than taskDuration.
- Calculation of Ii,x based on Vi,th and Vi,mem.
- Conditional execution based on @Pre and t > tinit for eligibility trace accumulation.
- Updating eligibility traces based on Ii,x compared to thresholds I+th and I-th.
- A third-factor component that executes if Reward is true.
- Reading and updating Iij,e+ and Iij,e- based on eligibility traces.
- Calculating I+PROG and I-PROG using scale_const.
- Gradual setting of weights (W) based on I+PROG and I-PROG.
### Detailed Analysis or ### Content Details
The algorithm is presented as pseudocode. Here's a breakdown of the code:
1. **Initialization:**
* `W_{ij}^+ = rand();`
* `W_{ij}^- = rand();`
* `RESET(e_{ij}^+);`
* `RESET(e_{ij}^-);`
2. **Main Loop:**
* `while t < taskDuration do`
* `I_{i,x} = 1 - (V_{i,th} - V_{i,mem}) / V_{i,th};`
* `if @Pre and t > t_{init} then`
* `# Eligibility trace accumulation`
* `forall e_{ij} do`
* `if I_{i,x} > I_{th}^+ then`
* `GRADUAL_SET(e_{ij}^+);`
* `if I_{i,x} < I_{th}^- then`
* `GRADUAL_SET(e_{ij}^-);`
3. **Third-factor:**
* `# Third-factor`
* `if Reward then`
* `forall W_{ij} do`
* `I_{ij,e^+}, I_{ij,e^-} = READ(e_{ij}^+, e_{ij}^-);`
* `I_{PROG}^+ = I_{ij,e^+} * scale_const;`
* `I_{PROG}^- = I_{ij,e^-} * scale_const;`
* `GRADUAL_SET(W_{ij}^+, I_{PROG}^+);`
* `GRADUAL_SET(W_{ij}^-, I_{PROG}^-);`
### Key Observations
- The algorithm uses both positive and negative weights and eligibility traces, indicated by the "+" and "-" superscripts.
- The `GRADUAL_SET` function is used to update both eligibility traces and weights, suggesting a gradual adjustment mechanism.
- The third-factor component is triggered by a `Reward` signal, indicating a reinforcement learning aspect.
- The variable `I_{i,x}` is calculated based on `V_{i,th}` and `V_{i,mem}`, which likely represent threshold and memory values, respectively.
### Interpretation
The algorithm describes a three-factor learning rule that incorporates eligibility traces and a reward signal to update weights. The use of PCM-trace suggests that Phase Change Memory is involved in storing and updating the eligibility traces. The algorithm appears to implement a form of reinforcement learning where the weights are adjusted based on the reward and the eligibility traces, which capture the temporal relationship between actions and rewards. The `GRADUAL_SET` function implies a smooth and incremental adjustment of the weights and traces, potentially contributing to the stability of the learning process. The third factor, triggered by the `Reward` signal, likely modulates the weight updates based on the magnitude and valence of the reward.
</details>
SET pulse applied to the device. As shown, the UP signal is asserted when the membrane current exceeds the threshold indicated in red, which causes a 100 µA gradual SET pulse to be applied across the PCM-trace device upon PRE events. Fig. 3.17b shows the generated $I_{PROG}$ as a function of the state of the eligibility trace device. The higher the ET device's resistance, the smaller the accumulated correlation, and thus the lower the programming current applied to the weight device. The resistance range on the x-axis of the plot matches the measured resistances of the PCM devices shown in Fig. 3.11; an illustrative mapping is sketched below.
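As a purely illustrative approximation of the trend in Fig. 3.17b (not the transistor-level design), the sketch below models the read path as a resistive divider between the ET device and a pseudo-resistor, followed by a differential-pair sigmoid bounded by the tail current; all component values below are assumptions chosen only to reproduce the qualitative inverse relation.

```python
import numpy as np

def i_prog(r_et, r_pseudo=3e6, v_read=0.2, v_ref=0.1, i_tail=0.1e-3, u_t=0.026):
    """Toy model: divider voltage rises as R_ET drops; the diff-pair output is capped at i_tail."""
    v_div = v_read * r_pseudo / (r_pseudo + r_et)            # voltage-divider read of the trace
    return i_tail / (1.0 + np.exp(-(v_div - v_ref) / u_t))   # differential-pair sigmoid

for r in (2e6, 3e6, 4e6):                                    # resistances in the measured range
    print(f"R_ET = {r/1e6:.0f} MOhm  ->  I_PROG ~ {i_prog(r)*1e3:.3f} mA")
```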
## 3 . 3 . 5 Discussion
Long-lasting ETs enable the construction of powerful learning mechanisms for solving complex tasks by bridging synaptic and behavioral time-scales. In this work, for the first time, we proposed using the drift of PCM devices to implement ETs, and we analyzed the feasibility of implementing them in existing fabrication technologies.
Implementing three-factor learning rules with per-synapse ETs requires complex memory structures to keep track of both the eligibility trace and the weight. Our proposed approach has clear advantages for scaling. Table 3.4 shows a comparison between our PCM synapse and a CMOS-only implementation in 22 nm FDSOI technology from [226].
PCM is among the most advanced emerging memory technologies integrated into the neuromorphic domain [109]. Our approach of using PCM to store both the synaptic weight and the eligibility trace requires no additional nano-fabrication steps.
|                 | Area (µm²) | τ (s) | Area/τ (µm² s⁻¹) |
|-----------------|------------|-------|------------------|
| CMOS [226]      | 20 × 17    | 6     | 56.6             |
| PCM [This work] | 12 × 12    | > 30  | < 4.8            |

Table 3.4: Area comparison of eligibility trace implementations
<details>
<summary>Image 26 Details</summary>

### Visual Description
## Chart/Diagram Type: Combined Plots
### Overview
The image presents a combination of plots. On the left, there are four time-series plots showing current variations over time. On the right, there is a plot showing the relationship between the programming current (I_PROG) and the PCM-trace resistance.
### Components/Axes
**Left Side Plots (Time Series):**
* **X-axis (all plots):** Time (ms), ranging from approximately 558 ms to 600 ms.
* **Top Plot:** Shows a series of pulses labeled "PRE". The Y-axis is not explicitly labeled, but it represents some form of signal strength.
* **Second Plot:** Y-axis: Current (nA), ranging from 0 to 10 nA. Two lines are plotted:
* Solid Black Line: Labeled "I_mem"
* Dashed Red Line: Labeled "I+_th"
* **Third Plot:** Shows a series of pulses labeled "PRE & UP". The Y-axis is not explicitly labeled, but it represents some form of signal strength.
* **Bottom Plot:** Y-axis: Current (µA), ranging from 0 to 100 µA. A series of pulses are plotted, labeled "I_SET".
**Right Side Plot:**
* **X-axis:** PCM-trace Resistance (MΩ), ranging from 2 MΩ to 4 MΩ.
* **Y-axis:** I_PROG (mA), ranging from 0.04 mA to 0.09 mA.
* A single black line shows the relationship between PCM-trace resistance and I_PROG.
### Detailed Analysis or ### Content Details
**Left Side Plots (Time Series):**
* **PRE Plot:** Shows a series of pulses with a consistent amplitude. The pulses are closely spaced.
* **I_mem and I+_th Plot:**
* I_mem (Black): Starts near 0 nA, increases linearly to approximately 1 nA around 570 ms, then drops to 0 nA. It increases again to approximately 2 nA around 585 ms, then drops to 0 nA.
* I+_th (Red): A dashed line that remains constant at approximately 1 nA.
* **PRE & UP Plot:** Shows two sets of pulses. The first set occurs around 562 ms, and the second set occurs around 582 ms.
* **I_SET Plot:** Shows two sets of pulses. The first set occurs around 562 ms, and the second set occurs around 582 ms. The pulses reach a peak of approximately 100 µA.
**Right Side Plot:**
* The black line shows a decreasing relationship between PCM-trace resistance and I_PROG.
* At 2 MΩ, I_PROG is approximately 0.093 mA.
* At 3 MΩ, I_PROG is approximately 0.055 mA.
* At 4 MΩ, I_PROG is approximately 0.035 mA.
### Key Observations
* The "PRE" signal consists of a train of pulses.
* The "I_mem" signal increases linearly and then drops to zero, possibly indicating a charging and discharging process.
* The "PRE & UP" and "I_SET" signals are synchronized, suggesting they are related.
* The "I_PROG" decreases as the PCM-trace resistance increases, indicating an inverse relationship.
### Interpretation
The plots likely represent the behavior of a phase-change memory (PCM) cell during a programming operation. The "PRE" signal might be a pre-conditioning pulse. "I_mem" could be the current through the memory cell, and "I+_th" might be a threshold current. "PRE & UP" could represent a combination of pre-conditioning and programming pulses, and "I_SET" could be the current used to set the memory cell to a specific state. The right-side plot shows that the programming current required decreases as the resistance of the PCM-trace increases. This could be due to the fact that higher resistance limits the current flow.
</details>
Figure 3.17: (a) From the top: PRE events, POST membrane current ($I_{mem}$) and learning threshold ($I_{th}$), PRE events gated by $I_{mem} > I_{th}$, and the corresponding gradual SET current pulse applied to the PCM-trace device. (b) Programming current to be applied to the weight PCM as a function of the eligibility trace state.
## DISCOVERING A SINGLE MATERIAL THAT SWITCHES BETWEEN VOLATILE AND NON-VOLATILE MODES
This chapter's content was published in Nature Communications, was featured as one of the 50 best papers recently published in the field, and can be found online in an extended form including all experimental details, which we omit here for clarity. The original publication is authored by Yigit Demirag∗, Rohit Abraham John∗, Yevhen Shynkarenko, Yuliia Berezovska, Natacha Ohannessian, Melika Payvand, Peng Zeng, Maryna I. Bodnarchuk, Frank Krumeich, Gökhan Kara, Ivan Shorubalko, Manu V. Nair, Graham A. Cooke, Thomas Lippert, Giacomo Indiveri and Maksym V. Kovalenko.
∗ These authors contributed equally.
Many in-memory computing frameworks demand electronic devices with specific switching characteristics to achieve the desired level of computational complexity. Existing memristive devices cannot be reconfigured to meet the diverse volatile and non-volatile switching requirements, and hence rely on tailored material designs specific to the targeted application, limiting their universality. "Reconfigurable memristors" that combine both ionic diffusive and drift mechanisms could address these limitations, but they remain elusive. Here we present a reconfigurable halide perovskite nanocrystal memristor that achieves on-demand switching between diffusive/volatile and drift/non-volatile modes through controllable electrochemical reactions. Judicious selection of the perovskite nanocrystals and organic capping ligands enables state-of-the-art endurance performance in both modes: volatile ($2 \times 10^6$ cycles) and non-volatile ($5.6 \times 10^3$ cycles). We demonstrate the relevance of such proof-of-concept perovskite devices on a benchmark reservoir network with a volatile recurrent layer and a non-volatile readout layer, based on 19,900 measurements across 25 dynamically configured devices.
## 4 . 1 introduction
The human brain, operating at petaflop scale, consumes less than 20 W, setting a precedent for scientists that real-time, ultra-low-power data processing in a small volume is possible. Inspired by the human brain, the field of neuromorphic computing attempts to emulate various computational principles of the biological substrate by engineering unique materials [227-229] and circuits [29, 230, 231]. In the context of hardware implementations of neural networks, the discovery of memristors has been one of the main driving forces for highly efficient in-memory realizations of synaptic operations. Similar to evolution optimizing neurons and synapses by exploiting stable and metastable molecular dynamics [232], memristive devices based on various physical mechanisms [25, 233, 234] have been discovered and developed with different volatile and non-volatile specifications. Since their inception, memristors have been used to implement a wide gamut of applications [235] such as stochastic computing [236], hyperdimensional computing [237], spiking [238] and artificial neural networks [95]. However, many of these frameworks demand very different hardware specifications [16] (Fig. 4.1a). To meet these specifications, memristor fabrication processes are often tediously engineered to reflect the requirements of the targeted neural network configuration (e.g., neural encoding, synaptic precision, etc.). For example, the latest state-of-the-art spiking neural network (SNN) models [75, 197] require memory elements operating at multiple timescales, with both volatile and non-volatile properties (from tens of milliseconds to hours) [144]. The current approach of optimizing memristive devices for a single requirement hinders the possibility of implementing multiple computational primitives in neural networks and precludes their monolithic integration on the same hardware substrate.
In this regard, the realization of drift and diffusive memristors has garnered significant attention. Drift memristors, which exhibit non-volatile memory characteristics, are typically designed using oxide dielectric materials with soft-breakdown behaviour. In combination with inert electrodes, the switching mechanism is determined by filaments of oxygen vacancies (valence change memory), whereas implementations with reactive electrodes rely on electrochemical
Figure 4.1: (a) Different neural network frameworks demand particular switching characteristics from in-memory computing implementations. For example, delay systems [239] (dynamical nonlinear systems with delayed feedback, such as virtual reservoir networks) should exhibit only a fading memory to process inputs from the recent past. Such short-term dynamics are best represented by volatile threshold-switching memristors [240]. SNNs often demand both volatile and non-volatile dynamics simultaneously. Synaptic mechanisms requiring STP and eligibility traces [241] can be implemented using volatile memristors [24, 151], whereas synaptic efficacy requires either efficient binary-switching [20] or analog switching devices. Lastly, ANN performance specifically benefits from non-volatile features such as multi-level bit precision of weights and a linear conductance response during the training phase [21, 22]. (b) A reconfigurable memristor with active control over its diffusive and drift dynamics may be a feasible unifying solution. A schematic of the reconfigurable halide perovskite nanocrystal memristor is shown for reference. We utilize the same active switching material (CsPbBr$_3$ NCs capped with OGB ligands) to implement two distinct types of computation in the RC framework. The volatile diffusive mode exhibiting short-term memory is utilized as the reservoir layer, while the non-volatile drift mode exhibiting long-term memory serves as the readout layer.
<details>
<summary>Image 28 Details</summary>

### Visual Description
## Neural Network Switching Types
### Overview
The image illustrates three types of neural networks (Virtual Reservoir, Spiking Neural, and Artificial Neural Networks) and their corresponding switching behaviors (Threshold, Binary, and Analog). It also depicts the physical mechanisms behind volatile and non-volatile switching, showing thin and thick filament formation.
### Components/Axes
**Part a: Neural Networks and Switching Characteristics**
* **Top Row:** Three types of neural networks are shown:
* Virtual Reservoir Networks: A circular network with nodes and connections.
* Spiking Neural Networks: A network with amplifier-like components.
* Artificial Neural Networks: A layered network structure.
* **Bottom Row:** Current-Voltage (I-V) characteristics for each network type:
* **Axes:** Each I-V plot has "Voltage" on the x-axis and "Current" on the y-axis.
* **Key Points:** Each plot indicates Vreset, Icc, and Vset.
* **Threshold Switching:** The I-V curve shows a sharp transition at a threshold voltage.
* **Binary Switching:** The I-V curve shows a clear on/off state transition.
* **Analog Switching:** The I-V curve shows gradual changes in current with voltage. The plot also labels "Gradual reset" and "Gradual set".
**Part b: Physical Mechanisms of Switching**
* **Left:** "Volatile Diffusive" switching mechanism with a "Thin filament" structure.
* Shows a schematic of a device with layers labeled: ITO, PEDOT:PSS, pTPD, OGB capped CsPbBr3 NCs, and Ag.
* Illustrates the formation of a thin filament.
* **Middle:** A top-down view of the device structure.
* **Right:** "Non-Volatile Drift" switching mechanism with a "Thick filament" structure.
* Shows a schematic of a device with layers labeled: ITO, PEDOT:PSS, pTPD, OGB capped CsPbBr3 NCs, and Ag.
* Illustrates the formation of a thick filament.
### Detailed Analysis or ### Content Details
**Part a: Neural Networks and Switching Characteristics**
* **Virtual Reservoir Networks:**
* The network consists of a dashed circle with several filled blue circles representing nodes. One node is white. Arrows indicate the direction of flow. The symbol "τ" is present inside the circle.
* **Spiking Neural Networks:**
* The network contains amplifier-like components connected by lines.
* **Artificial Neural Networks:**
* The network is represented by stacked layers of blocks, with the number "5" visible above the layers.
* **Threshold Switching:**
* The I-V curve starts at low voltage and current, then sharply increases in current at a certain voltage (Vset). As voltage decreases, the current drops sharply at Vreset.
* **Binary Switching:**
* The I-V curve shows a clear on/off state transition.
* **Analog Switching:**
* The I-V curve shows gradual changes in current with voltage, indicating multiple intermediate states.
**Part b: Physical Mechanisms of Switching**
* **Volatile Diffusive (Thin Filament):**
* The schematic shows a thin filament forming between the top and bottom electrodes.
* The layers are arranged from bottom to top: ITO, PEDOT:PSS, pTPD, OGB capped CsPbBr3 NCs, and Ag.
* **Non-Volatile Drift (Thick Filament):**
* The schematic shows a thick filament forming between the top and bottom electrodes.
* The layers are arranged from bottom to top: ITO, PEDOT:PSS, pTPD, OGB capped CsPbBr3 NCs, and Ag.
### Key Observations
* The type of neural network correlates with the switching behavior observed in the I-V characteristics.
* Volatile switching is associated with thin filament formation, while non-volatile switching is associated with thick filament formation.
* The device structure consists of multiple layers, including ITO, PEDOT:PSS, pTPD, OGB capped CsPbBr3 NCs, and Ag.
### Interpretation
The image demonstrates the relationship between different types of neural networks and their corresponding switching behaviors. It highlights the physical mechanisms underlying volatile and non-volatile switching, which are attributed to the formation of thin and thick filaments, respectively. The different switching behaviors (threshold, binary, and analog) are linked to the specific characteristics of the neural networks and the filament formation process. This suggests that the choice of materials and device structure can be tailored to achieve specific switching characteristics for different neural network applications.
</details>
reactions to form conductive bridges (electrochemical metallization memory) [242]. Such drift-based memristors fit well for emulating synaptic weights that stay stable between weight updates. In contrast, diffusive memristors are often built with precisely embedded clusters of metallic ions with low diffusion activation energy within a dielectric matrix [234]. The large availability of such mobile ionic species and their low diffusion activation energy facilitate spontaneous relaxation to the insulating state upon removing power, resulting in volatile threshold switching. Memristive devices with such short-term volatility are better suited to process temporally-encoded input patterns [243]. Hence, the application determines the required volatility, bit precision and endurance of the memristors, which are then heavily tailored by tedious material design strategies to meet these demands [16]. For example, deep neural network (DNN) inference workloads require a linear conductance response over a wide dynamic range for optimal weight updates and minimum noise for gradient calculation [21, 22, 95], whereas SNNs often demand richer and multiple synaptic dynamics simultaneously, e.g., short-term conductance decay (to implement synaptic cleft phenomena such as Ca2+-dependent short-term plasticity (STP) and CaMKII-related eligibility traces [244]), non-volatile device states (to represent synaptic efficacy) and a probabilistic nature (to mimic synaptic vesicle releases [243]) (Fig. 4.1a). However, optimizing the active memristive material for each of these features limits their feasibility to suit a wide range of computational frameworks and ultimately increases the system complexity for the most demanding applications. Moreover, these diverse specifications cannot always be implemented by combining different types of memristors on a monolithic circuit, e.g., volatile and non-volatile, binary and analog, due to the incompatibility of the fabrication processes. Therefore, the lack of universality of memristors that realize not only one, but diverse computational primitives remains an unsolved challenge today.
A reconfigurable memristive computing substrate that allows active control over its ionic diffusive and drift dynamics can offer a viable unifying solution, but has hitherto not been demonstrated. Although dual-functional memory behaviour has been observed previously, the dominance of one of the mechanisms often results in poor switching performance for either one or both modes, limiting the employability of such devices for demanding applications [96, 97]. To the best of our knowledge, there is no report yet of a reconfigurable memristive material that can exhibit both volatile diffusive and multi-state non-volatile drift kinetics, switch facilely between these two modes, and still retain excellent performance.
Here we report a reconfigurable memristive computing substrate based on halide perovskite nanocrystals that achieves on-demand switching between volatile and non-volatile modes by encompassing both diffusive and drift kinetics (Fig. 4.1b). Halide perovskites are newcomer optoelectronic semiconducting materials that have enabled state-of-the-art solar cells [245], solid-state light emitters [246, 247] and photodetectors [248-250]. Recently, these materials have attracted significant attention as memory elements due to their rich variety of charge transport physics that supports memristive switching, such as modulatable ion migration [251-253], electrochemical metallization reactions with metal electrodes [254] and localized interfacial doping with charge transport layers [255]. While most reports are based on thin films or bulk crystals of halide perovskites [251-253, 256], perovskite nanocrystal (NC)-based formulations have, interestingly, been much less investigated to date [244, 257]. NCs in general are garnering significant attention for artificial synaptic implementations because they support a wide range of switching physics, such as trapping and release of photogenerated carriers at dangling bonds over a broad spectral region [258], and single-electron tunnelling [259]. They allow low-energy (< fJ), high-speed (MHz) operation, and can support scalable and CMOS-compatible fabrication processes. In the case of perovskite NCs, however, existing implementations often utilize NCs only as a charge trapping medium to modulate the resistance states of another semiconductor, in flash-like configurations a.k.a. synaptic transistors [260-263]. The memristive switching capabilities and limits of a perovskite NC active matrix remain unaddressed, entailing significant research in this direction. Colloids of perovskite NCs are readily processable into thin-film NC solids and offer a modular approach to impart mesoscale structures and electronic interfaces, tunable by adjusting the NC composition, size and surface ligand capping.
Our device comprises all-inorganic cesium lead bromide (CsPbBr3) NCs capped with organic ligands as the active switching matrix and silver (Ag) as the active electrode. The design principle for realizing reconfigurable memristors revolves around two main factors. (i) From a material selection perspective, the low activation energy of migration of Ag+ and Br- ions allows easy formation of conductive filaments. The soft lattice of the halide perovskite NCs facilitates diffusion of the mobile ions. Moreover, the organic capping ligands help regulate the extent of electrochemical reactions, resulting in high endurance and good reconfigurability. (ii) From a peripheral circuit design perspective, active control of the compliance current (Icc) determines the magnitude of the flux of the mobile ionic species and in turn allows facile switching between the volatile diffusive and multi-bit non-volatile drift modes of operation.
The surface capping ligands are observed to play a vital role in determining the switching characteristics and endurance performance. CsPbBr3 NCs capped with didodecyldimethylammonium bromide (DDAB) ligands display poor switching performance in both volatile (10 cycles) and non-volatile (50 cycles) modes, whereas NCs capped with oleylguanidinium bromide (OGB) ligands exhibit record-high endurance in both the volatile (2 million cycles) and non-volatile (5655 cycles) switching modes [240, 255, 264].
To validate our approach and demonstrate the advantages of such reconfigurable memristive materials, we use a benchmark model of a fully-memristive reservoir computing (RC) framework interfaced to an artificial neural network (ANN) [240]. The reservoir is modelled as a network of recurrently-connected units whose dynamics act as a short-term memory. Any temporal signal entering the reservoir is subject to a high-dimensional nonlinear transformation that enhances the separability of its temporal features. A linear readout ANN layer is then connected to the reservoir units with all-to-all connections and trained to perform classification based on the temporal information maintained in the reservoir. Our RC implementation comprises perovskite memristors that are configured as diffusion-based volatile dynamic elements to implement the reservoir nodes and as drift-based non-volatile weights to implement the readout ANN layer. In the diffusive mode, the low activation energy of migration of the mobile ionic species (Ag+ and Br-) enables volatile threshold switching. The resulting short-term dynamics are essential for capturing temporal correlations within the input data stream. In the drift mode, stable conductive filaments formed by the drift of the ionic species facilitate programming of non-volatile synaptic weights in the readout layer for both training and inference. Furthermore, the readout layer can be trained online via active regulation of the Icc, which allows precise selection of the drift dynamics and enables multiple-bit resolution in the low resistive state (LRS). Using neural firing patterns, we show via both experiments and simulations that an RC framework based on reconfigurable perovskite memristors can accurately extract features from temporal signals and classify firing patterns.
## 4.2 diffusive mode of the perovskite reconfigurable memristor
We investigate two systems for diffusive dynamics: didodecyldimethylammonium bromide (DDAB)- and oleylguanidinium bromide (OGB)-capped CsPbBr3 NCs. The device structure comprises indium tin oxide (ITO), poly(3,4-ethylenedioxythiophene) polystyrene sulfonate (PEDOT:PSS), poly(N,N'-bis(4-butylphenyl)-N,N'-bisphenyl)benzidine (polyTPD), CsPbBr3 NCs and Ag (see the "Methods" section). With an Icc of 1 µA, both material systems portray volatile threshold switching characteristics with diffusive dynamics and spontaneous relaxation back to the initial state, albeit with contrasting endurance. The DDAB-capped perovskite NCs exhibit a poor on-off ratio (volatile memory, a.k.a. VM, Ipower ON/Ipower OFF ∼ 10) and a quick transition to a non-volatile state, resulting in an inferior volatile endurance of ∼10 cycles. On the other hand, the OGB-capped perovskite NCs depict a highly robust threshold switching behaviour with sub-1 V set voltages, VM Ipower ON/Ipower OFF ∼ 10³ and a record volatile endurance of 2 × 10⁶ cycles (Fig. 4.3a). The volatile threshold switching behaviour can be attributed to the redistribution of Ag+ and Br- ions under an applied electric field, and their back-diffusion upon removing power [253, 265, 266]. It is also important to note that both these devices exhibit a unidirectional DC threshold switching behaviour, with no switching occurring under reverse bias (negative voltage on the Ag electrode). This can be correlated to the dominant bipolar electrode effect over thermal-driven diffusion, in alignment with the literature [267-269].
## 4.3 drift mode of the perovskite reconfigurable memristor
Upon increasing the Icc to 1 mA, both the DDAB- and OGB-capped CsPbBr3 NC memristors portray typical non-volatile bipolar resistive switching characteristics, once again with contrasting endurance (Fig. 4.2, Fig. 4.3 and Supplementary Fig. A.22). Both systems depict forming-free operation and similar on-off ratios (≥ 10³). However, the DDAB-capped perovskite NCs quickly transition to a non-erasable non-volatile state, resulting in an inferior non-volatile endurance of 50 cycles (Supplementary Fig. A.23). On the other hand, the OGB-capped perovskite NC-based memristor portrays a highly robust switching behaviour with sub-1 V set voltages, and record-high non-volatile endurance and retention of 5655 cycles and 10⁵ s, respectively (Fig. 4.3, Supplementary Fig. A.24). Similar to the volatile threshold switching mechanism, the non-volatile resistive
Figure 4.2: The device structure comprises ITO (100 nm), PEDOT:PSS (30 nm), polyTPD (20 nm), OGB-capped CsPbBr3 NCs (20 nm) and Ag (150 nm). (a) Diffusive mode: illustration of the proposed volatile diffusive switching mechanism. (b) Drift mode: illustration of the proposed non-volatile drift switching mechanism. Additional note: the thicknesses of the individual layers in the device schematic are not drawn to scale with respect to the experimentally-measured thicknesses. The perovskite layer is not a bulk semiconductor, but 1-2 layers of nanocrystals (NCs). The schematic is drawn for simplicity, to illustrate the formation and rupture of conductive filaments (CFs) of Ag through the device structure.
switching can also be attributed to the redistribution of ions and electrochemical reactions under an applied electric field [251, 252]. The larger Icc of 1 mA results in permanent and thicker conductive filamentary pathways, and the switching dynamics are now dominated by the drift kinetics of the mobile ionic species Ag+ and Br-, rather than by diffusion.
In the case of DDAB-capped CsPbBr3 NCs, the inferior volatile endurance, quick transition to a non-volatile state and mediocre non-volatile endurance indicate poor control of the underlying electrochemical processes and the formation of permanent conductive filaments even at low compliance currents. On the other hand, capping CsPbBr3 NCs with OGB ligands enables better regulation of the electrochemical processes, resulting in a superior on-off ratio, volatile endurance as well as non-volatile endurance. Scanning Electron Microscope (SEM) images indicate similar film thicknesses in both devices, ruling out a dependence on the active material thickness (Fig. 4.3). Transmission Electron Microscopy (TEM) and Atomic Force Microscopy (AFM) images reveal similar nanocrystal sizes (∼10 nm) and surface roughness for both films, dismissing variations in crystal size and morphology as possible differentiating reasons (Fig. 4.3 and Supplementary Fig. A.25). While the exact mechanism is still unknown, the larger size of the OGB ligands compared to DDAB (2.3 nm vs. 1.7 nm) could intuitively provide better isolation to the CsPbBr3 NCs and prevent excess electrochemical redox reactions of Ag+ and Br-, modulating the formation and rupture of conductive filaments. This comparison is further supported by photoluminescence measurements, which point to a larger drop of the photoluminescence quantum yield in films of DDAB-capped NCs, arising from stronger excitonic diffusion and trapping.
To probe the mechanism further, devices with Au as the top electrode were fabricated, but did not show any resistive switching behaviour. These devices do not reach the compliance current of 1 mA during the set process and do not portray the sudden increase in current typical of filamentary memristors. This indicates that Ag is crucial for resistive switching and also proves that Br- ions play a negligible role in our devices, if any. Control experiments on PEDOT:PSS-only and PEDOT:PSS + pTPD devices further reiterate the importance of the perovskite NC thin film as an active matrix for reliable and robust Ag filament formation and rupture. Secondary ion mass spectrometry (SIMS) profiling reveals a clear difference in the 107Ag cross-section profile
Figure 4.3: The device structure comprises ITO (100 nm), PEDOT:PSS (30 nm), polyTPD (20 nm), OGB-capped CsPbBr3 NCs (20 nm) and Ag (150 nm), as shown in the SEM cross-section. The thicknesses of the individual layers were confirmed by AFM. The TEM image reveals NCs with an average diameter of ∼10 nm. (a) Diffusive mode: evolution of the device conductance upon applying DC sweep voltages (0 V → 2 V → 0 V) with an Icc = 1 µA (top), and endurance performance (bottom). (b) Drift mode: evolution of the device conductance upon applying DC sweep voltages (0 V → +5 V → 0 V → -7 V → 0 V) with an Icc = 1 mA during the SET operation (top), and endurance performance (bottom).
when comparing an ON and an OFF device. An increase in the 107Ag count is observed at the interface between the halide perovskite and the organic layers for the device in the ON state. Temperature-dependent measurements further confirm the migration of Ag+ ions through the perovskite matrix. The conclusions of this study are observed to be independent of the NC layer thickness, NC size and dispersity.
## 4.4 reservoir computing with perovskite memristors
To demonstrate the advantages of the reconfigurability of our perovskite memristors, we model in simulation a fully-memristive RC framework with a dynamically-configured layer of virtual volatile reservoir nodes and a readout ANN layer with non-volatile weights. In particular, we address three distinct computational requirements using the reconfigurability of the proposed device: an accumulating/decaying short-term memory for temporal processing in the reservoir; a stable long-term memory for retaining trained weights in the readout layer; and a circuit methodology for accessing analog states from binary devices to enhance the training performance.
## 4.5 diffusive perovskite memristors as reservoir elements
To implement the reservoir layer with the fabricated memristor devices, we utilize the virtual node concept originally proposed by Appeltant et al. [239]. Instead of conventionally transforming the input signal into a high-dimensional reservoir state by processing it over many non-linear units, the virtual node concept employs delayed feedback on a single physical device exhibiting strong short-term memory effects. Under the influence of a sequential input, the dynamical device state goes through a non-linear transient response, which is sampled at fixed timesteps to create a set of virtual nodes representing the reservoir state. Hence, the transient device non-linearity constitutes the temporal processing, and the delay system forms the high-dimensional representation of the reservoir. A minimal sketch of this construction is given below.
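The sketch below (Python) illustrates the virtual node construction. The `device_response` function is a toy saturating-and-decaying stand-in for the measured diffusive perovskite dynamics; its parameters are illustrative assumptions, not fitted values. Only the sampling logic mirrors the construction described above.

```python
import numpy as np

def device_response(pulse_train, dt=1e-3, tau=10e-3, gain=0.5):
    """Toy short-term-memory model standing in for the diffusive memristor:
    the state accumulates non-linearly with each input pulse and decays
    exponentially in between (fading memory)."""
    g, trace = 0.0, np.zeros(len(pulse_train))
    for t, v in enumerate(pulse_train):
        g += gain * v * (1.0 - g)      # saturating, non-linear accumulation
        g *= np.exp(-dt / tau)         # spontaneous relaxation
        trace[t] = g
    return trace

def virtual_nodes(trace, dt=1e-3, sample_interval=35e-3):
    """Sample the transient device state at fixed intervals to obtain the
    virtual reservoir nodes of the delay-system reservoir [239]."""
    step = int(round(sample_interval / dt))
    return trace[step - 1::step]

# A 1050 ms input pattern in 1 ms bins, sampled every 35 ms -> 30 virtual nodes.
pattern = (np.random.rand(1050) < 0.02).astype(float)
nodes = virtual_nodes(device_response(pattern))
print(nodes.shape)  # (30,)
```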
Elements of a reservoir layer should ideally possess a fading memory (sometimes called short-term memory or the echo state property) and non-linear internal dynamics [270]. The fading memory effect plays a key role in extracting features in the temporal domain of the input data stream, while the non-linear internal dynamics enable the projection of temporal features to a high-dimensional state with good separability [271]. The response of the OGB-capped CsPbBr3 NC memristors to low-voltage electrical spikes reveals short-term/fading diffusive dynamics with a relaxation time ≥ 5 ms for an input pulse duration of 20 ms and an amplitude of 1 V. Non-linear internal dynamics are evident in four forms: (i) in the transient evolution of the device conductance during stimulation, and in the final device conductance as a function of the applied pulse (ii) amplitude, (iii) width and (iv) number (Supplementary Fig. A.26). An additional test of the echo state property reveals that the present device state is reflective of the input temporal features in the recent past (< 23 ms) but not the far past, enabling efficient capture of short-term dependencies in the input data stream (Supplementary Fig. A.27). Stimulation with pulse streams with different temporal features results in distinct temporal dynamics of the memristor states.
## 4.6 drift perovskite memristors as readout elements
Storing the weights of the fully-connected readout layer of the ANN requires non-volatile synaptic devices. To represent synaptic efficacy, we use the drift-based perovskite memristor configuration that enables stable access to multiple conductance states. Because synaptic efficacy in ANNs can be either positive or negative, we use two memristor devices, G+ and G-, in a differential architecture to represent a single synapse [272]. Hence, synaptic potentiation is obtained by increasing the conductance of G+, and depression by increasing the conductance of G-, with identical pulses. The effective synaptic strength is expressed by the difference between the two conductances (G+ - G-). Arranged in a crossbar array with the differential configuration, synaptic propagation at the readout layer is realized efficiently, governed by Kirchhoff's current law and Ohm's law at O(1) complexity [273], as sketched below.
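As an illustration of the differential read-out, the following sketch computes the readout-layer activations from two conductance matrices; the β scaling follows the convention given in the Methods section, while the specific conductance values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two crossbar conductance matrices (in mS) encode the 30 x 4 readout weights.
g_plus = rng.normal(0.5, 0.1, size=(30, 4))
g_minus = rng.normal(0.5, 0.1, size=(30, 4))

# Effective weight of each differential pair: W = beta * (G+ - G-),
# with beta = 1 / (Gmax - Gmin) scaling conductance to weight.
beta = 1.0 / (3.5 - 0.1)
weights = beta * (g_plus - g_minus)

# In hardware, applying the virtual-node voltages to the rows yields the
# column currents I = (G+ - G-) V in a single read (Ohm's and Kirchhoff's laws);
# here the matrix product plays that role.
x = rng.random(30)                                    # virtual reservoir node outputs
activations = 1.0 / (1.0 + np.exp(-(x @ weights)))    # 4 sigmoid readout neurons
```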
Like most filament-based memristors, our devices display non-volatile switching across only two stable states (binary) and lack access to true analog conductance states for synaptic efficacy. Such low bit resolution during learning has been empirically shown to cause poor network performance [274, 275]. To gain more granular control over the filament formation, we adapt a recently proposed programming approach for oxide memristors to halide perovskites [276]. We achieve multi-level stable conductance states in the device's low-resistance regime by modulating the programming Icc. In contrast to the undesirable non-linear transformations seen in HfO2 devices, the mapping from Icc to conductance follows a linear relation for the drift-based CsPbBr3 NC devices, hence providing a linear mapping to the desired conductance values (see below). This enables controlled weight updates in a single shot, without requiring a write-verify scheme.
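The single-shot programming scheme can be sketched as follows. The calibration pairs below are illustrative placeholders rather than the measured Icc-to-conductance statistics; only the linear fit, its inversion, and the Normal sampling of the achieved conductance reflect the procedure described here.

```python
import numpy as np

# Placeholder calibration: compliance current (µA) vs. mean programmed conductance (mS).
icc_cal = np.array([100.0, 200.0, 400.0, 600.0, 800.0, 1000.0])
g_cal = np.array([0.3, 0.7, 1.3, 2.0, 2.7, 3.3])
slope, intercept = np.polyfit(icc_cal, g_cal, 1)   # linear Icc -> G relation

def icc_for_target(g_target):
    """Invert the linear fit: pick the Icc that programs the desired
    conductance in a single shot, without a write-verify loop."""
    return (g_target - intercept) / slope

def program(g_target, sigma=0.08, rng=np.random.default_rng()):
    """Model of single-shot programming: the achieved conductance scatters
    around the value predicted by the chosen Icc (device variability)."""
    icc = icc_for_target(g_target)
    return rng.normal(slope * icc + intercept, sigma)
```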
We use Icc modulation to train the readout layer of the reservoir network (see "Methods"), using statistical measurement data from the devices. For every input pattern received from the reservoir nodes, the readout layer produces a classification prediction via a sigmoid activation function. Depending on the classification error, the desired conductance change of each differential memristor pair per synapse is calculated. The memristive weights are then updated with the corresponding Icc, resulting in the desired conductance values.
## 4.7 classification of neural firing patterns
Next, we present a virtual reservoir neural network [239, 277] simulation with the short-term diffusive configuration of perovskite memristors in the reservoir layer and the long-term stable drift configuration in the trainable readout layer (Fig. 4.4a). The network is tested on the classification of four commonly observed neural firing patterns in the human brain: Bursting, Adaptation, Tonic, and Irregular [278]. These spike trains (Supplementary Fig. A.28) are applied to a single perovskite memristor in the reservoir layer, whose diffusive dynamics constitute a short-term memory on a 5-20 ms timescale. We exploit the concept of a virtual reservoir, where the reservoir state is uniformly sampled at finite intervals to emulate the rich non-linear temporal processing of reservoir computing. We use a sampling interval of 35 ms, resulting in a population of 30 virtual reservoir nodes representing the temporal features across the 1050 ms long neural firing patterns. The device responses are derived from electrical measurements of 25 different memristive devices (Fig. 4.4b). Both device-to-device and cycle-to-cycle variability are captured with extensive measurements. Stimulation with "Bursting" spikes results in an accumulative behaviour within each high-frequency group and an exponential decay in the inter-group interval, reflective of the fading memory and non-linear internal dynamics described above. "Adaptation" patterns trigger a weakened accumulative behaviour as a function of the pulse interval, "Irregular" results in random accumulation and decay, while "Tonic" generates states with no observable accumulation. As the last stage of computation, these features are projected to a fully-connected readout layer with 4 sigmoid neurons (see "Methods"). The reservoir network achieves a classification accuracy of 85.1% with the training method of modulating the programming Icc of the drift-based perovskite weights in the readout layer (Fig. 4.4c). Remarkably, this is close to the 91.8% test accuracy obtained with double-precision floating-point weights trained with the Delta rule [195] on the readout, confirming the effectiveness of our Icc approach (Supplementary Figs. A.29, A.30, A.31). The training and test accuracies over 5 epochs demonstrate that neither network is overfitting the training data (Fig. 4.4, Supplementary Table A.1).
## 4.8 discussion
We present robust halide perovskite NC-based memristive switching elements that can be reconfigured to exhibit both volatile diffusive and non-volatile drift dynamics. This represents a significant advancement in the experimental realization of memristors. In comparison to pristine volatile and non-volatile memristors, our reconfigurable CsPbBr3 NC memristors can be utilized to implement both neurons and synapses with the same material/device platform and can adapt to diverse computational primitives at run-time, without additional modifications to the device stack. The closest comparison to our devices are dual-functional memristors, i.e., those that exhibit both volatile and non-volatile switching behaviours without additional materials or device engineering. While impressive demonstrations of dual-functional memristors exist, many devices require an electroforming step to initiate the resistive switching behaviour and, most importantly, their endurance and retention are often limited to < 500 cycles in both modes and ≤ 10⁴ s, respectively. In comparison, we report a record-high endurance of 2 million cycles in the volatile mode, 5655 cycles in the non-volatile mode, and a retention of 10⁵ s, highlighting the significance of our approach. This makes these devices ideal for always-on online learning systems. The forming-free operation and low set-reset voltages would allow low-power vector-matrix multiplication operations, while the high retention and endurance ensure precise mapping of synaptic weights during training and inference of artificial neural networks. In contrast to most metal-oxide-based diffusive memristors that require high programming voltages or currents to initiate filament formation (≥ 1 V and/or ≥ 10 µA), our devices demonstrate forming-free volatile switching at lower voltages and currents (≤ 1 V and ≤ 1 µA). This is possibly due to the lower activation energy for Ag+ and Br- migration in halide perovskites compared to oxygen vacancies in oxide dielectrics, the softer lattice of the halide perovskite layer and the large availability of mobile ionic species in the halide perovskite matrix. Most importantly, our devices can be switched back to the volatile mode even after programming multiple non-volatile states, proving true "reconfigurability" (Supplementary Fig. A.32). Such behaviour is an example of the neuromorphic implementation of synapses in SNNs that demand both volatile and non-volatile switching properties simultaneously (see Fig. 4.1a). It is important to note that existing implementations of dual-functional devices cannot be reconfigured back to the volatile mode once the non-volatile mode is activated, making our device concept and its use case for neuromorphic computing unique.
In operando thermal camera imaging provides further support to our hypothesis of better management of the electrochemical reactions with the OGB ligands compared to DDAB. While the exact memristive mechanism is still unclear, our results empirically favour NC film implementations over thin films. The insights derived on the apt choice of capping ligands pave the way for further investigations into nanocrystal-ligand chemistry for the development of high-performance, robust memristors. The ability to reconfigure the switching mode on demand allows easy implementation of multiple computational layers with a single technology, alleviating the hardware system design requirements for new neuromorphic computational frameworks. Our work complements and goes beyond previous model-based implementations [240] by comprehensively characterizing diffusive and drift devices for ∼5000 patterns of different input spike streams, and by collecting statistical data on device-to-device and cycle-to-cycle variability, device degradation, temporal conductance drift and real-time nanoscopic changes in memristor conductance. This statistical data is incorporated in the simulations for very accurate modelling of the device behaviour for this task. To the best of our knowledge, this is the first time such a systematic analysis has been carried out to use the same device for both diffusive and drift behaviour on a real-world benchmark. Given the excellent performance and record endurance of our reconfigurable halide perovskite memristors, this work opens the way for a completely new type of memristive substrate, for applications such as time-series forecasting and feature classification.
## 4.9 methods
device fabrication Indium tin oxide (ITO, 7 Ω cm⁻²) coated glass substrates were cleaned by sequential sonication in Hellmanex soap, distilled water, acetone, and isopropanol. Substrates were dried and exposed to UV for 15 min. PEDOT:PSS films were deposited by spin-coating (4000 rpm for 25 s) the precursors (Clevios, Al 4083), followed by annealing at 130 °C for 20 min. PolyTPD (poly[N,N'-bis(4-butylphenyl)-N,N'-bisphenylbenzidine]) dissolved in chlorobenzene (4 mg/ml) was then spin-coated at 2000 rpm for 25 s, followed by annealing at 130 °C for 20 min. Solutions of CsPbBr3 NCs capped with DDAB and OGB were next deposited via spin-coating (2000 rpm for 25 s). Finally, 150 nm of Ag was thermally evaporated through shadow masks (100 µm × 100 µm) to complete the device fabrication.
electrical measurements For endurance testing in the volatile mode, write and read voltages of +2 V and +0.1 V were used, respectively, with a pulse width of 5 ms. The following methodology was used: 1. read the current level of the device using +0.1 V; 2. apply +2 V for 5 ms as the write pulse and monitor the device's current level; 3. repeat step 1. For the non-volatile mode, a write voltage of +5 V, an erase voltage of -7 V and a read voltage of +0.1 V were used. The following methodology was used: 1. read the current level of the device using +0.1 V; 2. apply +5 V/-7 V for 5 ms as the write/erase pulse; 3. repeat step 1 and extract the on-off ratio by comparing steps 1 and 3. Note: since our VM loses the stored information upon removing power, the ON state (Ipower ON) is reported as the current value corresponding to the application of the programming pulse (at 2 V) and the OFF state (Ipower OFF) is reported as the current value corresponding to the application of the reading pulse (at 0.1 V), in alignment with the reported literature [234]. For endurance measurements in the non-volatile memory (NVM) mode, the conventional methodology was used, i.e., the ON-OFF ratios were extracted from the current values corresponding to the same reading pulse (0.1 V). The volatile-mode protocol is sketched in code below.
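The following sketch summarizes the volatile-mode endurance loop. The instrument wrapper `smu.pulse(voltage, width)` is a hypothetical interface assumed to return the current measured during the pulse; only the voltages, pulse width and step order follow the protocol above.

```python
def volatile_endurance(smu, n_cycles, v_read=0.1, v_write=2.0, width=5e-3):
    """Volatile-mode endurance test: read at +0.1 V, write at +2 V for 5 ms, repeat.
    The VM on/off ratio uses the current during the write pulse as the ON state
    and the current during the read pulse as the OFF state, as described above."""
    ratios = []
    for _ in range(n_cycles):
        i_off = smu.pulse(v_read, width)    # step 1: read the relaxed device
        i_on = smu.pulse(v_write, width)    # step 2: write pulse; monitor current
        ratios.append(i_on / i_off)         # step 3: repeat from step 1
    return ratios
```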
neural spike pattern dataset generation The neural spike pattern dataset consists of samples from four classes: Bursting, Adaptation, Tonic, and Irregular. "Bursting" firing patterns are defined as groups of high-frequency spikes with a constant inter-group interval; "Adaptation" corresponds to spikes with gradually increasing intervals; "Tonic" denotes low-frequency spikes with a constant interval; and "Irregular" corresponds to spikes that fire irregularly. In total, the dataset consists of 4975 patterns (199 cycles applied to 25 devices) for each of the four types. Each pattern is ∼1050 ms long, where spikes are emulated with square-wave voltage pulses (1 V, 25 ms). For Bursting patterns, each spike train consists of 4-5 high-frequency burst groups (4 spikes per burst group) with an inter-spike interval (ISI) of 5 ms; between bursts, there are 75-125 ms intervals. For Adaptation patterns, each spike train starts with high-frequency pulses with an ISI of 5 ms that gradually increases by 50% with each new spike (with 5% standard deviation). For Tonic patterns, a regular spiking pattern with an average ISI of 70 ms is used; a 5% standard deviation is applied to each ISI. For Irregular patterns, spike trains are divided into 60 ms segments, and a spike is assigned randomly with a 50% probability to the beginning of each segment. A sketch of these generators is given below.
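The four pattern generators can be sketched as follows (spike times in milliseconds). Details not fixed by the text, such as exactly how the 5% jitter is applied or the truncation at 1050 ms, are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1050.0  # approximate pattern length in ms

def bursting():
    """4-5 burst groups of 4 spikes each (ISI 5 ms), 75-125 ms between groups."""
    t, times = 0.0, []
    for _ in range(rng.integers(4, 6)):
        times += [t + k * 5.0 for k in range(4)]
        t = times[-1] + rng.uniform(75.0, 125.0)
    return np.array([s for s in times if s < T])

def adaptation():
    """ISI starts at 5 ms and grows by ~50% per spike (5% relative jitter)."""
    t, isi, times = 0.0, 5.0, []
    while t < T:
        times.append(t)
        t += isi
        isi *= rng.normal(1.5, 0.05)
    return np.array(times)

def tonic():
    """Regular spiking with a 70 ms mean ISI and 5% jitter."""
    t, times = 0.0, []
    while t < T:
        times.append(t)
        t += rng.normal(70.0, 3.5)
    return np.array(times)

def irregular():
    """A spike with 50% probability at the start of each 60 ms segment."""
    starts = np.arange(0.0, T, 60.0)
    return starts[rng.random(len(starts)) < 0.5]
```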
simulation of neural networks For classifying neural spike patterns, a fully-connected readout layer with 30 inputs and 4 outputs is used, plus one bias unit at the input. The 4 output neurons are sigmoid neurons. For training, 90% of the neural spike pattern dataset is used over 5 epochs. At the end of each epoch, the network performance is tested on the remaining 10% of the dataset. During Icc-modulated training, each synapse comprises two conductance values in a differential configuration. The differential current is scaled such that W = β(G+ - G-), where β = 1/(Gmax - Gmin), with Gmax and Gmin the maximum and minimum allowed conductance values of the memristors. Conductances are initialized randomly from a Normal distribution (µG = 0.5 mS and σG = 0.1 mS). The network prediction is selected deterministically by choosing the output neuron with the maximum activation. After the prediction, the L1 loss is calculated. Then, the weight change, corresponding to a step along the descending loss gradient, is calculated as ∆W = (η xi δj)/β, where η is the learning rate, xi is the reservoir node output, δj is the calculated error and 1/β is the scaling factor between weights and conductances. The target conductances are clipped between 0.1 mS and 3.5 mS. Subsequently, the Icc values corresponding to the target conductances are calculated (Supplementary Fig. A.31). Finally, we sample new conductance values from a Normal distribution whose mean and standard deviation are calculated using linear functions of Icc. For the double-precision floating-point-based training, the same readout layer size is used. The network loss is calculated via the Mean Squared Error. Weights are adjusted using the Delta rule with an adaptive learning rate [175]. Both networks are trained with a batch size of 1 and suitably tuned hyperparameters. A minimal sketch of a single Icc-modulated training step is shown below.
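The sketch below follows the conventions above (differential pairs, W = β(G+ - G-), ∆W = η x δ / β, clipping to 0.1-3.5 mS). The error term δ, the programming-noise level, and the way the achieved conductance scatters around its target are simplified assumptions standing in for the measured Icc statistics.

```python
import numpy as np

G_MIN, G_MAX = 0.1, 3.5               # mS, allowed conductance range
BETA = 1.0 / (G_MAX - G_MIN)          # weight <-> conductance scaling
rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, target, g_plus, g_minus, eta=0.1, prog_sigma=0.05):
    """One update of the readout layer with Icc-modulated programming.
    x:       (31,) reservoir node outputs including the bias unit
    target:  (4,)  one-hot label
    g_plus, g_minus: (31, 4) conductances in mS (simulation state)."""
    w = BETA * (g_plus - g_minus)
    y = sigmoid(x @ w)
    delta = target - y                           # per-output error signal
    dw = eta * np.outer(x, delta) / BETA         # desired conductance change
    # Potentiation increases G+, depression increases G- (identical pulses).
    g_plus_t = np.clip(g_plus + np.maximum(dw, 0.0), G_MIN, G_MAX)
    g_minus_t = np.clip(g_minus + np.maximum(-dw, 0.0), G_MIN, G_MAX)
    # Single-shot programming via Icc: achieved conductances scatter around targets.
    g_plus_new = np.clip(rng.normal(g_plus_t, prog_sigma), G_MIN, G_MAX)
    g_minus_new = np.clip(rng.normal(g_minus_t, prog_sigma), G_MIN, G_MAX)
    return g_plus_new, g_minus_new
```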
Figure 4.4: (a) An ANN is trained to perform classification using the temporal properties of the reservoir, in response to a series of inputs representing neural firing patterns. Using Icc control, OGB-capped CsPbBr3 NC memristors are configured to the diffusion-based volatile mode to serve as virtual nodes in the reservoir, and to the drift-based non-volatile mode to implement synaptic weights in the ANN readout layer. During a single inference, a neural firing pattern represented as a short voltage pulse train is applied to a single diffusive-mode perovskite device. Based on the virtual node concept [239], temporal features of the input signal are intrinsically encoded in the evolving conductance of the device due to its non-linear short-term memory effects. This evolving device state is sampled at equal intervals of 35 ms, yielding 30 virtual nodes that jointly represent the reservoir state. These virtual node states are delayed and fed into the readout layer, whose weights Wji (size 30 × 4) are implemented by the drift-mode non-volatile perovskite memristors, placed in a differential configuration [115]. Classification of neural firing patterns: (b) Experiments. The memristive reservoir elements are stimulated using four common neural firing input patterns: "Bursting", "Adaptation", "Tonic" and "Irregular". During the presentation of inputs, the evolution of the device conductance is monitored. Each spike in the input data stream is realized as a voltage pulse of 1 V amplitude and 20 ms duration, while the device states are read with -0.5 V, 5 ms pulses. (c) Distribution of the programmed perovskite memristor non-volatile conductances with Icc modulation. The inset shows the simulated linear Icc → G relation. Simulations: (d) The normalized confusion matrix shows the classification results with the Icc-controlled training scheme. The RC performs slightly worse on Irregular patterns due to the lack of temporal correlations among samples. (e) Training (86.75%) and test (85.14%) accuracies of the fully-memristive RC framework.
## MOSAIC: AN ANALOG SYSTOLIC ARCHITECTURE FOR IN-MEMORY COMPUTING AND ROUTING
This chapter's content was published in Nature Communications and was featured as one of the 50 best recently published papers in the field. The original publication is authored by Yigit Demirag*, Thomas Dalgaty*, Filippo Moro*, Alessio De Pra, Giacomo Indiveri, Elisa Vianello and Melika Payvand.
* These authors contributed equally.
The brain's connectivity is locally dense and globally sparse, forming a small-world graph: a principle prevalent across the evolution of various species, suggesting a universal solution for efficient information routing. However, current artificial neural network circuit architectures do not fully embrace small-world neural network models. Here, we present the neuromorphic Mosaic: a non-von Neumann systolic architecture employing distributed memristors for in-memory computing and in-memory routing, efficiently implementing small-world graph topologies for SNNs. We have designed, fabricated, and experimentally demonstrated the Mosaic's building blocks, using memristors integrated in 130 nm CMOS technology. We show that, thanks to enforcing locality in the connectivity, the routing efficiency of Mosaic is at least one order of magnitude higher than that of other SNN hardware platforms, while Mosaic achieves competitive accuracy on a variety of edge benchmarks. Mosaic offers a scalable approach for edge systems based on distributed spike-based computing and in-memory routing.
## 5.1 introduction
Despite millions of years of evolution, the fundamental wiring principle of biological brains has been preserved: dense local and sparse global connectivity through synapses between neurons. This persistence indicates the efficiency of this solution in optimizing both computation and the utilization of the underlying neural substrate [30]. Studies have revealed that this connectivity pattern in neuronal networks increases the signal propagation speed [279], enhances echo-state properties [280] and allows for a more synchronized global network [281]. While densely connected neurons in the network are attributed with performing functions such as integration and feature extraction [282], long-range sparse connections may play a significant role in the hierarchical organization of such functions [283]. Such neural connectivity is called small-worldness in graph theory and is widely observed in the cortical connections of the human brain [279, 284, 285] (Fig. 5.1a, b). Small-world connectivity matrices, representing neuronal connections, display a distinctive pattern with a dense diagonal and progressively fewer connections between neurons as their distance from the diagonal increases (see Fig. 5.1c). A standard graph-theoretic construction of such connectivity is sketched below.
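For readers less familiar with the graph-theoretic notion, the following sketch builds a canonical small-world graph (the Watts-Strogatz construction: a ring lattice with dense local links plus a few rewired long-range shortcuts) and prints its clustering coefficient and average path length. It assumes the networkx package and is purely illustrative; it is unrelated to the anatomical data shown in Fig. 5.1.

```python
import networkx as nx

# Watts-Strogatz small-world graph: each of n nodes connects to its k nearest
# neighbours on a ring (dense local connectivity); a fraction p of edges is
# rewired to random targets (sparse long-range shortcuts).
n, k, p = 256, 8, 0.05
g = nx.connected_watts_strogatz_graph(n, k, p)

# Small-worldness: clustering stays close to the lattice value while the
# average shortest path length drops towards that of a random graph.
print("average clustering :", nx.average_clustering(g))
print("average path length:", nx.average_shortest_path_length(g))
```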
Crossbar arrays of non-volatile memory technologies, e.g., floating gates [287], RRAM [139, 191, 288-291], and PCM [138, 292-294], have previously been proposed as a means for realizing artificial neural networks in hardware (Fig. 5.1d). These computing architectures perform in-memory vector-matrix multiplication, the core operation of artificial neural networks, reducing the data movement, and consequently the power consumption, relative to conventional von Neumann architectures [144, 287, 295-299].
However, existing crossbar array architectures are not inherently efficient for realizing small-world neural networks at all scales. Implementing networks with small-world connectivity in a large crossbar array would result in an under-utilization of the off-diagonal memory elements (i.e., a ratio of non-allocated to allocated connections > 10) (see Fig. 5.1d and Supplementary Note 1); a back-of-the-envelope estimate of this ratio is sketched below. Furthermore, the impact of analog-related hardware non-idealities such as current sneak-paths, parasitic resistance and capacitance of the metal lines, as well as excessively large read currents and diminishing yield, limit the maximum size of crossbar arrays in practice [99-101].
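The under-utilization figure can be reproduced with a rough estimate such as the one below; the distance-dependent connection probabilities are illustrative assumptions, not the connectivity statistics used in the Supplementary Note.

```python
import numpy as np

def wasted_to_allocated(n=1024, local_width=16, p_local=0.8, p_long=0.01, seed=0):
    """Map a small-world connectivity matrix (dense near the diagonal, sparse
    far from it) onto a single n x n crossbar and return the ratio of
    never-programmed ('wasted') devices to allocated ones."""
    rng = np.random.default_rng(seed)
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    p_connect = np.where(np.abs(i - j) <= local_width, p_local, p_long)
    used = rng.random((n, n)) < p_connect
    allocated = used.sum()
    return (n * n - allocated) / allocated

print(wasted_to_allocated())   # > 10 for these illustrative parameters
```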
These issues are also common to biological networks. As resistance attenuates the spread of the action potential, cytoplasmic resistance sets an upper bound on the length of dendrites [30].
Figure 5.1: Small-world graphs in biological networks and how to build them into a hardware architecture for edge applications. (a) Depiction of the small-world property in the brain, with highly-clustered neighboring regions highlighted with the same color. (b) The network connectivity of the brain is a small-world graph, with highly clustered groups of neurons and sparse connectivity among them. (c) The functional connectivity matrix derived from anatomical data (adapted from Bullmore and Sporns 2009 [286]), with rows and columns representing neural units. The diagonal region of the matrix (darkest colors) contains the strongest connectivity, representing the connections between neighboring regions. The off-diagonal elements are not connected. (d) Hardware implementation of the connectivity matrix of c, with neurons and synapses arranged in a crossbar architecture. The red squares represent the group of memory devices on the diagonal, connecting neighboring neurons. Black squares show the group of memory devices that are never programmed in a small-world network, and are thus highlighted as 'wasted'. (e) The Mosaic architecture breaks the large crossbar into small densely-connected crossbars (green Neuron Tiles) and connects them through small routing crossbars (blue Routing Tiles). This gives rise to a distributed two-dimensional mesh, with highly connected clusters of neurons connected to each other through routers. (f) The state of the resistive memory devices in the Neuron Tiles determines how information is processed, while the state of the routing devices determines how it is propagated in the mesh. The resistive memory devices are integrated in 130 nm technology. (g) Plot showing the required memory (number of memristors) as a function of the number of neurons per tile, for different total numbers of neurons in the network. The horizontal dashed line indicates the number of required memory bits using a fully-connected RRAM crossbar array. The cross (X) illustrates the cross-over point below which the Mosaic approach becomes favorable. (h) The Mosaic can be used for a variety of edge AI applications, benchmarked here on sensory processing and reinforcement learning tasks.
Hence, the intrinsic physical structure of nervous systems necessitates the use of local over global connectivity.
Drawing inspiration from the biological solution to the same problem leads to (i) a similar optimal silicon layout, a small-world graph, and (ii) a similar information transfer mechanism through electrical pulses, or spikes. A large crossbar can be divided into an array of smaller, more locally connected crossbars. These correspond to the green squares of Fig. 5.1e. Each green crossbar hosts a cluster of spiking neurons with a high degree of local connectivity. To pass information among these clusters, small routers can be placed between them (the blue tiles in Fig. 5.1e). We call this two-dimensional systolic matrix of distributed crossbars, or tiles, the neuromorphic Mosaic architecture. Each green tile serves as an analog computing core, which sends out information in the form of spikes, while each blue tile serves as a routing core that spreads the spikes throughout the mesh to other green tiles. Thus, the Mosaic takes advantage of distributed and de-centralized computing and routing to enable not only in-memory computing, but also in-memory routing (Fig. 5.1f). Though the Mosaic architecture is independent of the choice of memory technology, here we take advantage of resistive memory for its non-volatility, small footprint, low access time and power, and fast programming [300].
Neighborhood-based computing with resistive memory has previously been explored using Cellular Neural Networks [301, 302], Self-organizing Maps (SOM) [303], and the crossnet architecture [304]. Though cellular architectures use local clustering, their lack of global connectivity limits both the speed of information propagation and their configurability. Their application has therefore been mostly limited to low-level image processing [305]. The same applies to SOMs, which exploit neighboring connectivity and are typically trained with unsupervised methods to visualize low-dimensional data [306]. Similarly, the crossnet architecture proposed using small tilted crossbars, distributed and integrated on top of the CMOS substrate, to create local connectivity domains for image processing [304]. The tilted crossbars allow the nano-wire feature size to be independent of the CMOS technology node [307]. However, this approach requires extra post-processing lithographic steps in the fabrication process, which has so far limited its realization.
Unlike most previous approaches, the Mosaic supports both dense local connectivity and globally sparse long-range connections by introducing re-configurable routing crossbars between the computing tiles. This allows specific small-world network configurations to be flexibly programmed and compiled onto the Mosaic for solving the desired task. Moreover, the Mosaic is fully compatible with standard integrated RRAM/CMOS processes available at foundries, without the need for extra post-processing steps. Specifically, we have designed the Mosaic for small-world SNNs, where the communication between the tiles is through electrical pulses, or spikes. In the realm of SNN hardware, the Mosaic goes beyond the AER [103, 308, 309], the standard spike-based communication scheme, by removing the need to store each neuron's connectivity information in either bulky local or centralized memory units, which draw static power and can consume a large chip area (Supplementary Note 2).
In this Article, we first present the Mosaic architecture. We report electrical circuit measurements from computational and Routing Tiles that we designed and fabricated in 130 nm CMOS technology co-integrated with hafnium dioxide-based RRAM devices. Then, calibrated on these measurements, and using a novel method for training small-world neural networks that exploits the intrinsic layout of the Mosaic, we run system-level simulations on a variety of edge computing tasks (Fig. 5.1h). Finally, we compare our approach to other neuromorphic hardware platforms, which highlights a significant reduction in spike-routing energy of between one and four orders of magnitude.
## 5.2 mosaic hardware computing and routing measurements
In the Mosaic (Fig. 5.1e), each tile consists of a small memristor crossbar that can receive and transmit spikes to and from its neighboring tiles in the North (N), South (S), East (E) and West (W) directions (Supplementary Note 4). The memristive crossbar array in the green Neuron Tiles stores the synaptic weights of several LIF neurons. These neurons are implemented using analog circuits located at the termination of each row, emitting voltage spikes at their outputs [29]. The spikes from a Neuron Tile are copied in the four directions N, S, E and W. These spikes are communicated between Neuron Tiles through a mesh of blue Routing Tiles, whose crossbar arrays store the connectivity pattern between Neuron Tiles. The Routing Tiles in the different directions decide whether or not a received spike should be communicated further. Together, the two tile types give rise to a continuous mosaic of neuromorphic computation and memory for realizing small-world SNNs.
Small-world topology can be obtained by randomly programming memristors in a computer model of the Mosaic (see Methods and Supplementary Note 3). The resulting graph exhibits an intriguing set of connection patterns that reflect many of the small-world graphical motifs observed in animal nervous systems: central 'hub-like' neurons with connections to numerous nodes, reciprocal connections between pairs of nodes reminiscent of winner-take-all mechanisms, and some heavily connected local neural clusters [285]. If desired, these graph properties can be adapted on the fly by re-programming the RRAM states in the two tile types. For example, a set of desired small-world graph properties can be achieved by randomly programming the RRAM devices into their HCS with a certain probability (Supplementary Note 3). Such random programming can be achieved elegantly by simply modulating the RRAM programming voltages [125].
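As a rough illustration of this random-programming procedure (a software sketch only, not the model used in Methods; the tile counts, programming probabilities, and the helper name `random_small_world` are illustrative assumptions), one could generate a Mosaic-like connectivity matrix as follows:

```python
import numpy as np

def random_small_world(n_tiles: int, neurons_per_tile: int,
                       p_local: float = 0.8, p_route: float = 0.02,
                       seed: int = 0) -> np.ndarray:
    """Toy model of Mosaic-style random programming (illustrative only).

    Devices inside each Neuron Tile are programmed into their HCS with a
    high probability (dense local clusters), while long-range links through
    the Routing Tiles are enabled with a much lower probability (sparse
    global connectivity), yielding a small-world-like graph."""
    rng = np.random.default_rng(seed)
    n = n_tiles * neurons_per_tile
    adj = np.zeros((n, n), dtype=bool)
    for t in range(n_tiles):
        lo, hi = t * neurons_per_tile, (t + 1) * neurons_per_tile
        # dense local connections inside one Neuron Tile
        adj[lo:hi, lo:hi] = rng.random((neurons_per_tile, neurons_per_tile)) < p_local
        # sparse long-range connections enabled through Routing Tiles
        far = rng.random((neurons_per_tile, n)) < p_route
        far[:, lo:hi] = False
        adj[lo:hi, :] |= far
    return adj

A = random_small_world(n_tiles=16, neurons_per_tile=4)
print("overall connection density:", round(float(A.mean()), 3))
```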
For Mosaic-based small-world graphs, we estimate the required number of memory devices (synaptic and routing weights) as a function of the total number of neurons in the network, through a mathematical derivation (see Methods). Fig. 5.1g plots the memory footprint as a function of the number of neurons in each tile for different network sizes. Horizontal dashed lines show the number of memory elements when using one large crossbar for each network size, as has previously been used for RNN implementations [310]. The cross-over points, at which the Mosaic memory footprint becomes favorable, are marked with a cross. While for smaller network sizes (i.e., 128 neurons) no memory reduction is observed compared to a single large array, the memory saving becomes increasingly important as the network is scaled up. For example, for a network of 1024 neurons and 4 neurons per Neuron Tile, the Mosaic requires almost one order of magnitude fewer memory devices than a single crossbar implementing an equivalent network model.
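As a back-of-the-envelope illustration of the under-utilization argument (not the derivation of Methods, which also accounts for the routing devices), one can count how many devices of a single fully connected crossbar would actually be allocated by a small-world connectivity pattern, for instance one produced by the `random_small_world` sketch above:

```python
import numpy as np

def crossbar_utilization(adj):
    """For an N x N connectivity matrix, return the devices a single fully
    connected crossbar would need (N*N), the devices actually allocated
    (non-zero connections), and the non-allocated/allocated ratio."""
    n = adj.shape[0]
    total = n * n
    allocated = int(np.count_nonzero(adj))
    return total, allocated, (total - allocated) / allocated

A = random_small_world(n_tiles=256, neurons_per_tile=4)   # 1024 neurons
total, allocated, ratio = crossbar_utilization(A)
print(f"{allocated} of {total} devices would be used; wasted/used ratio ~ {ratio:.0f}")
```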
neuron tile circuits: small-worlds   Each Neuron Tile in the Mosaic (Fig. 5.2a) is composed of multiple rows, each a circuit that models a LIF neuron and its synapses. The details of one neuron row are shown in Fig. 5.2b. It has N parallel one-transistor-one-resistor (1T1R) RRAM structures at its input. The synaptic weights of each neuron are stored in the conductance levels of the RRAM devices in one row. On the arrival of any of the input events Vin<i>, the amplifier pins node Vx to Vtop, and thus a read voltage equivalent to Vtop - Vbot is applied across Gi, giving rise to current i_in at M1, and in turn to i_buff. This current pulse is mirrored through Iw to the 'synaptic dynamics' circuit, a Differential Pair Integrator (DPI) [311], which low-pass filters it by charging the capacitor M9 in the presence of the pulse and discharging it through the current Itau in its absence. The charge/discharge of M9 generates an exponentially decaying current, Isyn, which is injected into the neuron's membrane potential node, Vmem, and charges capacitor M13. The capacitor leaks through M11, whose rate is controlled by Vlk at its gate. As soon as the voltage developed on Vmem passes the threshold of the following inverter stage, a pulse is generated at Vout. The refractory period time constant depends on the capacitor M16 and the bias on Vrp. (For a more detailed explanation of the circuit, please see Supplementary Note 5.)
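The row behaviour described above can be summarized in a simple discrete-time behavioural model; the time step, time constants, gain, threshold, and function name below are illustrative assumptions rather than extracted circuit parameters.

```python
import numpy as np

def simulate_neuron_row(G, spikes_in, dt=1e-5, tau_syn=1e-3, tau_mem=1e-2,
                        gain=5e6, v_th=0.6, t_refractory=2e-3, v_read=0.3):
    """Behavioural sketch of one Neuron Tile row (assumed parameters).

    G         : RRAM conductances of the row (S), one per input column.
    spikes_in : binary array of shape (T, len(G)).
    A spike on column i injects a current v_read * G[i]; a DPI-like block
    low-pass filters it (i_syn) and a leaky membrane integrates it (v_mem);
    crossing v_th emits an output spike and starts a refractory period."""
    i_syn, v_mem, refr = 0.0, 0.0, 0.0
    out = np.zeros(len(spikes_in), dtype=int)
    for t, s in enumerate(spikes_in):
        i_in = v_read * float(G @ s)                    # current from active columns
        i_syn += dt / tau_syn * (i_in - i_syn)          # exponentially decaying i_syn
        if refr > 0:                                    # refractory: membrane held reset
            refr -= dt
            continue
        v_mem += dt / tau_mem * (gain * i_syn - v_mem)  # leaky integration
        if v_mem >= v_th:                               # threshold crossing -> spike
            out[t] = 1
            v_mem = 0.0
            refr = t_refractory
    return out

# usage: 4 input columns, pulses arriving only on column 0 every 1 ms
G = np.array([100e-6, 20e-6, 20e-6, 20e-6])             # conductances in siemens
spk = np.zeros((2000, 4)); spk[::100, 0] = 1
print(int(simulate_neuron_row(G, spk).sum()), "output spikes")
```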
We have fabricated and measured the circuits of the Neuron Tile in a 130 nm CMOS technology integrated with RRAM devices [119]. The measurements were done at the wafer level, using the probe station shown in Fig. 5.2c. In the fabricated circuit, we statistically characterized the RRAMs through iterative programming [312] and control of the programming current, resulting in nine stable conductance states, G, shown in Fig. 5.2d. After programming each device, we apply a pulse on Vin<0> and measure the voltage on Vsyn, which is the voltage developed on the M9 capacitor. We repeat the experiment for four different conductance levels of 4 µS, 48 µS, 64 µS and 147 µS. The resulting Vsyn traces are plotted in Fig. 5.2e. Vsyn starts from an initial value close to the power supply, 1.2 V. The amount of discharge depends on the Iw current, which is a linear function of the conductance value of the RRAM, G. The higher the G, the higher the
Figure 5.2: Experimental results from the neuron column circuit. (a) Neuron Tile, a crossbar with feed-forward and recurrent inputs displaying network parameters represented by colored squares. (b) Schematic of a single row of the fabricated crossbars, where RRAMs represent neuron weights. Insets show scanning and transmission electron microscopy images of the 1T1R stack with a hafnium-dioxide layer sandwiched between memristor electrodes. Upon input events Vin<i>, Vtop - Vbot is applied across Gi, yielding i_in and subsequently i_buff, which feeds into the synaptic dynamics block, producing the exponentially decaying current i_syn, with a time constant set by MOS capacitor M9 and bias current Itau. Integration of i_syn into the neuron membrane potential Vmem triggers an output pulse (Vout) upon exceeding the inverter threshold. The refractory period is regulated through MOS cap M16 and the Vrp bias. (c) Wafer-level measurement setup utilizing an Arduino for logic circuitry management to program RRAMs and a B1500 Device Parameter Analyzer to read device conductance. (d) Cumulative distributions of RRAM conductance (G) resulting from iterative programming in a 4096-device RRAM array with varied SET programming currents. (e) Vsyn, initially at 1.2 V, decreases as capacitor M9 discharges upon pulse arrival at time 0. The discharge magnitude depends on Iw, set by G. Vsyn curves are recorded for four conductance values. (f) An input pulse train (gray pulses) at Vin<0> increases the zeroth neuron's Vmem (purple trace) until it fires (light blue trace) after six pulses, causing feedback influence on neuron 1's Vmem. (g) Statistical measurements of the peak membrane potential in response to a pulse across a 5-neuron array over 10 cycles. (h) Neuron output frequency correlates linearly with G, with error bars reflecting variability across 4096 devices.
Iw, and the larger the decrease in Vsyn, resulting in a higher Isyn, which is integrated on the neuron membrane Vmem. The peak value of the membrane potential in response to a pulse is measured across one array of 5 neurons, each with a different conductance level (Fig. 5.2g). Each pulse increases the membrane potential according to the corresponding conductance level, and once the potential hits the threshold, the neuron generates an output spike (Fig. 5.2f). The peak value of the neuron's membrane potential, and thus its firing rate, is proportional to the conductance G, as shown in Fig. 5.2h. The error bars on the plot show the variability of the devices in the 4 kb array. It is worth noting that this implementation does not support negative weights, as the focus of the design has been on the concept. Negative weights could be implemented using a differential signaling approach, with two RRAMs per synapse [115].
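Where signed weights are needed, the differential scheme mentioned above could be modelled as in the following sketch; the mapping and the conductance bounds (taken loosely from the measured range in Fig. 5.2d) are illustrative assumptions.

```python
import numpy as np

def to_differential(w, g_min=4e-6, g_max=147e-6):
    """Map signed weights in [-1, 1] to (G_plus, G_minus) RRAM pairs so that
    the effective weight is proportional to G_plus - G_minus. The bounds
    g_min/g_max are used here only as illustrative conductance limits."""
    w = np.clip(np.asarray(w, dtype=float), -1.0, 1.0)
    span = g_max - g_min
    g_plus = g_min + span * np.clip(w, 0, None)    # positive part of the weight
    g_minus = g_min + span * np.clip(-w, 0, None)  # negative part of the weight
    return g_plus, g_minus

gp, gm = to_differential([-0.5, 0.0, 0.8])
print((gp - gm) / (147e-6 - 4e-6))   # recovers approximately [-0.5, 0.0, 0.8]
```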
routing tile circuits: connecting small-worlds   A Routing Tile circuit is shown in Fig. 5.3a. It acts as a flexible means of configuring how spikes emitted from Neuron Tiles propagate locally between small-worlds. The routed message is thus a spike, which is either
Figure 5.3: Experimental measurements of the fabricated Routing Tile circuits. (a) The Routing Tile, a crossbar whose memory state steers the input spikes from different directions towards the destination. (b) Detailed schematic of one row of the fabricated routing circuits. On the arrival of a spike at any of the input ports of the Routing Tile, Vin<i>, a current proportional to Gi flows in i_in, similar to the Neuron Tile. A current comparator compares this current against a reference current, I_ref, which is a bias current generated on chip by providing a DC voltage from the I/O pads to the gate of an on-chip transistor. If i_in > i_ref, the spike is regenerated and thus 'passes'; otherwise it is 'blocked'. (c) Wafer-level measurements of the test circuits through the probe station test setup. (d) Measurements from a 4 kb array showing the Cumulative Distribution Function (CDF) of the RRAM in its High Conductive State (HCS) and Low Conductive State (LCS). The line that separates the two distributions is considered the 'Threshold' conductance, the decision boundary for passing or blocking spikes. Based on this threshold value, the I_ref bias in panel (b) is determined. (e) Experimental results from the Routing Tile, with continuous and dashed blue traces showing the waveforms applied to the <N> and <S> inputs, while the orange trace shows the response of the output towards the <E> port. The <E> output port follows the <N> input, as the corresponding device is programmed into the HCS, while the input from the <S> port is blocked because the corresponding RRAM device is in its LCS. (f) A binary checker-board pattern programmed into the routing array, showing a ratio of 10 between the High Resistive and Low Resistive States, which sets a clear boundary for the binary decision required by the Routing Tile.
blocked by the router, if the corresponding RRAM is in its HRS, or passed otherwise. The functional principles of the Routing Tile circuits are similar to those of the Neuron Tiles. The principal difference is the replacement of the synapse and neuron circuit models with a simple current comparator circuit (highlighted with a black box in Fig. 5.3b). The measurements were done at the wafer level, using the probe station shown in Fig. 5.3c. On the arrival of a spike at an input port of the Routing Tile, Vin<i>, 0 < i < N, a current proportional to Gi flows through the device, giving rise to the read current i_buff. A current comparator compares i_buff against i_ref, a bias generated on chip by providing a voltage from an I/O pad to the gate of a transistor (not shown in the figure). The I_ref value is chosen based on the 'Threshold' conductance boundary in Fig. 5.3d. The Routing Tile regenerates a spike if the resulting i_buff is higher than i_ref, and blocks it otherwise, in which case the output remains at zero. Therefore, the state of the device serves to either pass or block input spikes arriving from the different input ports (N, S, W, E), sending them to its output ports (Supplementary Note 4). Since the routing array acts as a binary pass/no-pass element, the decision boundary is whether the device is in its HCS or LCS, as shown in Fig. 5.3d [313]. Using a fabricated Routing Tile circuit, we demonstrate its functionality experimentally in Fig. 5.3e. Continuous and dashed blue traces show the waveforms applied to the <N> and <S> inputs of the tile respectively, while the orange trace shows the response of the output towards the <E> port. The <E> output port follows the <N> input because the corresponding RRAM is programmed into its HCS, while the input from the <S> port is blocked as the corresponding RRAM device is in its LCS, and the output thus remains at zero. This output pulse propagates onward to the next tile.
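Functionally, each Routing Tile row therefore implements a binary pass/block decision per input port. A minimal software model, with an assumed threshold placed between the LCS and HCS distributions of Fig. 5.3d, might look like:

```python
import numpy as np

G_THRESHOLD = 60e-6  # assumed HCS/LCS decision boundary (illustrative value)

def route_spikes(G_row, spikes_in, g_th=G_THRESHOLD):
    """Pass an input spike onward if the corresponding RRAM device is in its
    high-conductive state (G > g_th), block it otherwise.

    G_row     : conductances of one routing row, one per input port (N, S, W, E)
    spikes_in : binary vector of incoming spikes on those ports
    Returns 1 if any enabled port carries a spike (pulse regenerated), else 0."""
    enabled = np.asarray(G_row) > g_th
    return int(np.any(enabled & np.asarray(spikes_in, dtype=bool)))

# device on the <N> port in HCS, device on the <S> port in LCS (as in Fig. 5.3e)
print(route_spikes([100e-6, 10e-6], [1, 0]))  # N spike passes -> 1
print(route_spikes([100e-6, 10e-6], [0, 1]))  # S spike blocked -> 0
```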
Note that in Fig. 5.3e the output spike does not appear rectangular due to the large capacitive load of the probe station (see Methods). To allow for greater reconfigurability, more channels per direction can be used in the Routing Tiles (see Supplementary Note 6).
## 5.3 analog hardware-aware simulations
## Application to real-time sensory-motor processing through hardware-aware simulations
The Mosaic is programmable hardware well suited for deploying pre-trained small-world RSNNs in energy- and memory-constrained applications at the edge. Through hardware-aware simulations, we assess the suitability of the Mosaic on a series of representative sensory processing tasks, including anomaly detection in heartbeats (applications in wearable devices), keyword spotting (applications in voice command), and motor system control (applications in robotics) (Fig. 5.4a,b,c respectively). We apply these tasks to three network cases: (i) a non-constrained RSNN with full-bit-precision weights (32-bit Floating Point, FP32) (Fig. 5.4d), (ii) Mosaic-constrained connectivity with FP32 weights (Fig. 5.4e), and (iii) Mosaic-constrained connectivity with noisy and quantized RRAM weights (Fig. 5.4f). Case (iii) is therefore fully hardware-aware, including the architectural choices (e.g., number of neurons per Neuron Tile), the connectivity constraints, and the noise and quantization of the weights.
For training case (i), we use BPTT [314] with surrogate-gradient approximations of the derivative of the LIF neuron activation function on a vanilla RSNN [165] (see Methods). For training case (ii), we introduce a Mosaic-regularized cost function during training, which leads to a learned weight matrix with small-world connectivity that is mappable onto the Mosaic (see Methods). For case (iii), we quantize the weights using a mixed hardware-software experimental methodology, whereby each memory element in a Mosaic software model is assigned a conductance value programmed into a corresponding memristor in a fabricated array. The programmed conductances are obtained through a closed-loop programming strategy [312, 315-317].
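A minimal sketch of one possible Mosaic-regularized cost is shown below; the exact regularizer used in Methods is not reproduced here, and the distance-based penalty, the weighting lam, and the helper names are assumptions made for illustration only.

```python
import numpy as np

def mosaic_regularizer(W, tile_of, lam=1e-3):
    """Penalize weights between neurons in different Neuron Tiles, with a
    penalty that grows with the Manhattan distance between tiles on the 2D
    mesh, encouraging a small-world layout that maps onto the Mosaic.

    W       : (N, N) recurrent weight matrix
    tile_of : (N, 2) array giving each neuron's tile coordinates (row, col)
    """
    W = np.asarray(W)
    tile_of = np.asarray(tile_of)
    d = np.abs(tile_of[:, None, :] - tile_of[None, :, :]).sum(-1)  # hop distance
    return lam * np.sum(d * np.abs(W))

# example usage inside a training loop (task_loss computed elsewhere):
# loss = task_loss + mosaic_regularizer(W, tile_coords)
```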
For all networks and tasks, the input is fed as a spike train and the output class is identified as the neuron with the highest firing rate. The RSNN of case (i) includes a standard input layer, recurrent layer, and output layer. In the Mosaic cases (ii) and (iii), the inputs are fed directly into the Mosaic Neuron Tiles from the top left, are processed in the small-world RSNN, and the resulting output is taken directly from the opposing side of the Mosaic, by assigning some of the Neuron Tiles as output neurons. As the inputs and outputs are part of the Mosaic fabric, this scheme avoids the need for explicit input and output readout layers, relative to the RSNN, which may greatly simplify a practical implementation.
ecg anomaly detection   We first benchmark our approach on a binary classification task: detecting anomalies in the Electrocardiogram (ECG) recordings of the MIT-BIH Arrhythmia Database [318]. To make the data compatible with the RSNN, we first encode the continuous ECG time-series into trains of spikes using a delta-modulation technique, which describes the relative changes in signal magnitude [319, 320] (see Methods). An example heartbeat and its spike encoding are plotted in Fig. 5.4a.
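A simple delta-modulation encoder of the kind referenced above can be sketched as follows; the threshold value is an illustrative assumption, not the setting used in Methods.

```python
import numpy as np

def delta_modulation(signal, threshold=0.1):
    """Encode a continuous signal into UP/DN spike trains: emit an UP spike
    whenever the signal has risen by `threshold` since the last emitted
    spike, and a DN spike when it has fallen by the same amount."""
    up = np.zeros(len(signal), dtype=int)
    dn = np.zeros(len(signal), dtype=int)
    ref = signal[0]
    for t, x in enumerate(signal):
        if x - ref >= threshold:
            up[t], ref = 1, x
        elif ref - x >= threshold:
            dn[t], ref = 1, x
    return up, dn

t = np.linspace(0, 1, 500)
up, dn = delta_modulation(np.sin(2 * np.pi * 2 * t))
print(int(up.sum()), "UP spikes,", int(dn.sum()), "DN spikes")
```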
The accuracy over the test set for five iterations of training, transfer, and test for cases (i) (red), (ii) (green) and (iii) (blue) is plotted in Fig. 5.4g as a boxplot. Although the Mosaic constrains the connectivity to follow a small-world pattern, the median accuracy of case (ii) drops by only 3 % compared to the non-constrained RSNN of case (i). Introducing the quantization and noise of the RRAM devices in case (iii) drops the median accuracy by a further 2 %, resulting in a median accuracy of 92.4 %. As often reported, the variation in the accuracy of case (iii) also increases due to the cycle-to-cycle variability of RRAM devices [317].
keyword spotting (kws)   We then benchmark our approach on a 20-class speech task using the Spiking Heidelberg Digits (SHD) dataset [206]. SHD includes the spoken digits between zero and nine in English and German, uttered by 12 speakers. In this dataset, the speech signals have been encoded into spikes using a biologically inspired cochlea model, which effectively computes a spectrogram with Mel-spaced filter banks and converts it into instantaneous firing rates [206].
The accuracy over the test set for five iterations of training, transfer, and test for cases (i) (red), (ii) (green) and (iii) (blue) is plotted in Fig. 5.4h as a boxplot. The dashed red box is taken directly from the SHD paper [206]. The Mosaic connectivity constraints cause only about a 2.5 % drop in accuracy, with a further drop of 1 % when introducing the RRAM quantization and noise constraints. Furthermore, we experimented with various numbers of Neuron Tiles and of neurons within each Neuron Tile (Supplementary Note 8, Fig. S10), as well as with sparsity constraints (Supplementary Note 8, Fig. S11) as hyperparameters. We found that optimal performance is achieved when an adequate amount of neural resources is allocated to the task.
motor control by reinforcement learning   Finally, we benchmark the Mosaic on a motor system control RL task, the Half-cheetah [321]. RL has applications ranging from active sensing via camera control [322] to dexterous robot locomotion [323].
To train the network weights, we employ the evolutionary strategies (ES) of Salimans et al. [113] in a reinforcement learning setting [324-326]. ES stochastically perturbs the network parameters, evaluates the fitness of the resulting population on the task, and updates the parameters using a stochastic gradient estimate, in a way that scales well for RL.
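In outline, one ES update perturbs the parameter vector with Gaussian noise, evaluates each perturbed policy, and forms a gradient estimate from the fitness-weighted noise. A compact sketch in the style of Salimans et al., with a placeholder fitness function standing in for the half-cheetah return, is:

```python
import numpy as np

def es_step(theta, fitness_fn, pop_size=64, sigma=0.1, lr=0.02, rng=None):
    """One evolutionary-strategies update (illustrative sketch).

    theta      : current parameter vector (e.g., flattened RSNN weights)
    fitness_fn : maps a parameter vector to a scalar episode return
    """
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal((pop_size, theta.size))        # population of perturbations
    returns = np.array([fitness_fn(theta + sigma * e) for e in eps])
    advantages = (returns - returns.mean()) / (returns.std() + 1e-8)
    grad = (advantages[:, None] * eps).mean(0) / sigma        # stochastic gradient estimate
    return theta + lr * grad

# usage with a toy quadratic fitness function (placeholder for the RL return)
theta = np.zeros(8)
theta = es_step(theta, fitness_fn=lambda p: -np.sum((p - 1.0) ** 2))
```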
Fig. 5.4i shows the maximum reward obtained over five runs in cases (i), (ii), and (iii), which indicates that the network learns an effective policy for forward running. Unlike the tasks of Fig. 5.4a and b, the network connectivity constraints and parameter quantization have relatively little impact here.
Encouragingly, across three highly distinct tasks, performance was only slightly impacted when passing from an unconstrained neural network topology to a noisy small-world neural network. In particular, for the half-cheetah RL task, this had no impact.
## 5.4 benchmarking routing energy in neuromorphic platforms
In-memory computing greatly reduces the energy consumption inherent to data movement in von Neumann architectures. Although crossbars bring memory and computing together, when neural networks are scaled up, neuromorphic hardware will require an array of distributed crossbars (or cores) due to physical constraints such as IR drop and capacitive charging [101]. Small-world networks naturally permit the minimization of communication between these crossbars, but a certain energy and latency cost associated with data movement remains, since the compilation of a small-world network onto a general-purpose routing architecture is not ideal. Hardware that is specifically designed for small-world networks should minimize these energy and latency costs (Fig. 5.1g). To understand how the spike-routing efficiency of the Mosaic compares to other SNN hardware platforms, optimized for other metrics such as maximizing connectivity, we compare the energy and latency of (i) routing one spike within a core (0-hop), (ii) routing one spike to a neighboring core (1-hop), and (iii) the total routing power consumption required for tasks A and B, i.e., heartbeat anomaly detection and spoken digit classification respectively (Fig. 5.4a,b).
The results are presented in Table 5.1. We report the energy and latency figures both in the original technology in which each system was designed and scaled to the 130 nm technology in which the Mosaic circuits are designed, using general scaling laws [327]. The routing power estimates for Tasks A and B are obtained by evaluating the 0- and 1-hop routing energies and the number of spikes required to solve the tasks, neglecting any other circuit overheads. In particular, the optimization of the sparsity of connections between neurons used to train the Mosaic ensures that 95 % of the spikes are routed with 0-hop operations, while about 4 % of the spikes are routed via 1-hop operations. The remaining spikes require k hops to reach the destination Neuron Tile. The routing energy consumption for Tasks A and B is estimated by accounting for the total spike count and this partition of routing hops.
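This estimate amounts to weighting the per-hop energies by the hop distribution and the total spike rate. The sketch below uses the Mosaic per-hop energies reported in Table 5.1; the fraction of spikes beyond 1-hop and the overall spike rate are placeholders, and k-hop energy is taken as k times the 1-hop value, as assumed in the comparison.

```python
def routing_power(spikes_per_second, hop_fractions, e_0hop=400e-15, e_1hop=1.6e-12):
    """Estimate average routing power (W) from per-hop energies and the
    fraction of spikes routed over each hop count."""
    energy_per_spike = 0.0
    for hops, frac in enumerate(hop_fractions):
        e = e_0hop if hops == 0 else hops * e_1hop   # k-hop assumed k * 1-hop
        energy_per_spike += frac * e
    return spikes_per_second * energy_per_spike

# ~95% of spikes stay on-tile, ~4% need one hop, the remainder assumed 2 hops
print(routing_power(1e6, [0.95, 0.04, 0.01]))  # watts, for an assumed 1 Mspike/s rate
```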
The scaled energy figures show that, although the Mosaic's design has not been optimized for energy efficiency, its 0- and 1-hop routing energies are reduced relative to other approaches,
Figure 5.4: Benchmarking the Mosaic on three edge tasks: heartbeat (ECG) arrhythmia detection, keyword spotting (KWS), and motor control by reinforcement learning (RL). (a,b,c) Depiction of the three tasks, along with the corresponding input presented to the Mosaic. (a) ECG task, where each of the two-channel waveforms is encoded into up (UP) and down (DN) spiking channels, representing the direction of the signal derivative. (b) KWS task, with the spikes representing the density of information in different input (frequency) channels. (c) Half-cheetah RL task, with input channels representing the state space, consisting of positional values of different body parts of the cheetah followed by the velocities of those individual parts. (d, e, f) Depiction of the three network cases applied to each task. (d) Case (i) is a non-constrained Recurrent Spiking Neural Network (RSNN) with full-bit-precision weights (FP32), encompassing an input layer, a recurrent layer, and an output layer. (e) Case (ii) is Mosaic-constrained connectivity with FP32 weights, omitting explicit input and output layers. Input directly enters the Mosaic, and output is extracted directly from it. Circular arrows denote local recurrent connections, while straight arrows signify sparse global connections between cores. (f) Case (iii) is similar to case (ii), but with noisy and quantized RRAM weights. (g, h, i) Comparison of task accuracy among the three cases: case (i) (red, leftmost box), case (ii) (green, middle box), and case (iii) (blue, rightmost box). Boxplots display accuracy/maximum reward across five iterations, with boxes spanning the upper and lower quartiles and whiskers extending to the maximum and minimum values. The median accuracy is represented by a solid horizontal line, with the corresponding value indicated on top of each box. The dashed red box for the KWS task with the FP32 RSNN network is included from Cramer et al., 2020 [206] with 1024 neurons for comparison (with the mean value indicated). This comparison reveals that the decline in accuracy due to the Mosaic connectivity, and further due to the RRAM weights, is negligible across all tasks. The inset figures depict the resulting Mosaic connectivity after training, which follows a small-world graphical structure.
even when compared with digital approaches in more advanced technology nodes. This efficiency can be attributed to the Mosaic's in-memory routing approach, which results in low-energy routing memory accesses distributed in space. This (i) reduces the size, and thus the energy, of each router compared to the larger centralized routers employed in some platforms, and (ii) avoids the use of CAMs, which consume the majority of the routing energy in some other spike-based routing mechanisms (Supplementary Note 2).
Neuromorphic platforms have each been designed to optimize different objectives [328], and the communication energy efficiency of the Mosaic stems from its explicit optimization for this very metric, thanks to its small-world connectivity layout. Despite this, as shown in Fig. 5.4, the Mosaic does not suffer a considerable drop in accuracy, at least for problem sizes typical of sensory processing applications at the edge. This implies that, for these problems, large connectivity between cores is not required, a fact which can be exploited to reduce energy.
The Mosaic's latency figure per router is comparable to the average latency of the other platforms. Often, and in particular for neural networks with sparse firing activity, this is a negligible factor. In applications requiring sensory processing of real-world, slowly changing signals, the time constants dictating how quickly the model state evolves are determined by the bias Vlk in Fig. 5.2, and are typically on the order of tens or hundreds of milliseconds. Although the routing latency grows linearly with the number of hops in the network, as shown in the final connectivity matrices of Fig. 5.4g,h,i, the number of non-local connections decays exponentially. Therefore, the routing latency is always much smaller than the time scale of the real-world signals at the edge, which is our target application.
The final two columns of Table 5.1 report the power consumption of the neuromorphic platforms on tasks A and B respectively. All the platforms are assumed to use a core (i.e., Neuron Tile) size of 32 neurons and to have an N-hop energy cost equal to N times the 1-hop value. The potential of the Mosaic is clearly demonstrated: a power consumption of only a few hundred picowatts is required, compared with a few nanowatts to microwatts on the other neuromorphic platforms.
## 5.5 discussion
We have identified small-world graphs as a favorable topology for efficient routing, proposed a hardware architecture that efficiently implements them, designed and fabricated memristor-based building blocks for the architecture in 130 nm technology, and reported measurements and a comparison to other approaches. We empirically quantified the impact of both the small-world neural network topology and the low memristor precision on three diverse and challenging tasks representative of edge-AI settings. We also introduced an adapted machine learning strategy that enforces small-worldness and accounts for the low precision of noisy RRAM devices. The results achieved across these tasks were comparable to those of floating-point-precision models with unconstrained network connectivity.
Although the connectivity of the Mosaic is sparse, it still requires more routing nodes than computing nodes. However, the Routing Tiles are more compact than the Neuron Tiles, as they only perform a binary pass/block decision. This means that their read-out circuitry does not require a large Signal-to-Noise Ratio (SNR) compared to the Neuron Tiles. This relaxed requirement reduces the overhead of the Routing Tile readout in terms of both area and power (Supplementary Note 9).
In this work, we have treated the Mosaic as a standard RSNN, trained it with BPTT using the surrogate-gradient approximation, and simply added loss terms that penalize dense connectivity in order to shape sparse graphs. Therefore, the potential computational advantages of small-world architectures do not necessarily emerge, and the performance of the network is mainly related to its number of parameters. In fact, we found that the Mosaic requires more neurons, but about the same number of parameters, to reach the same accuracy as an RSNN on the same task. This confirms that taking advantage of small-world connectivity requires a novel training procedure, which we hope to develop in the future. Moreover, in this paper we have benchmarked the Mosaic on sensory processing tasks and have proposed to take advantage of small-worldness for energy savings thanks to the locality of information processing. From a computational perspective, however, these tasks do not necessarily exploit small-worldness. In future work, one can foresee tasks that exploit small-world connectivity from a computational standpoint.
The Mosaic favors local processing of input data, in contrast to conventional deep learning algorithms such as Convolutional and Recurrent Neural Networks. However, novel approaches in
Table 5.1: Comparison of spike-routing performance across neuromorphic platforms.

| Neuromorphic Chip | Technology | Routing | 0-hop energy (original∗ / sct.∗∗ 130 nm) | 1-hop energy (original⋄ / sct. 130 nm) | 1-hop latency (original / sct. 130 nm) | Optimized for Small-Worldness | Routing Power for task A | Routing Power for task B |
|---|---|---|---|---|---|---|---|---|
| Mosaic | 130 nm (1.2 V) | on-chip | 400 fJ◦ / 400 fJ | 1.6 pJ◦ / 1.6 pJ◦ | 25 ns / 25 ns | Yes | 809 pW◦ | 5.06 nW◦ |
| Loihi [89] | 14 nm (0.75 V) | on-chip | 23.6 pJ / 60.416 pJ | 3.5 pJ / 10.24 pJ | 6.5 ns / 60.35 ns | No | 7.71 nW | 248.41 nW |
| Dynap-SE [103] | 180 nm (1.8 V) | on-chip | 30 pJ / 13.4 pJ | 17 pJ (@1.3 V) / 17 pJ | 40 ns / 28.88 ns | Yes | 10.02 nW | 322.7 nW |
| Neurogrid [331] | 180 nm (3 V) | on/off-chip | 1 nJ / 160 pJ | 14 nJ / 8.35 nJ | 20 ns / 14.4 ns | No | 563.31 nW | 18.14 µW |
| SpiNNaker [330] | 130 nm (1.2 V) | on-chip | 30.3 nJ / 30.3 nJ | 1.11 nJ / 1.11 nJ | 200 ps / 200 ps | No | 9.85 µW | 317.08 µW |
| TrueNorth [329] | 28 nm (0.775 V) | on-chip | 26 pJ / 62.4 pJ | 2.3 pJ / 5.52 pJ | 6.25 ns / 29 ns | No | 8.47 nW | 272.82 nW |

◦ Assuming an average resistance value of 10 kΩ and a 10 ns read pulse width.
∗ The same as energy per Synaptic Operation (SOP); numbers are taken from Basu et al., 2022 [332].
∗∗ sct. = scaled to.
⋄ Numbers are taken from Moradi et al., 2018 [103].
deep learning, e.g., Vision Transformers with Local Attention [ 333 ] and MLP-mixers [ 334 ], treat input data in a similar way as the Mosaic, subdividing the input dimensions, and processing the resulting patches locally. This is also similar to how most biological system processes information in a local fashion, such as the visual system of fruit flies [ 335 ].
In the broader context, graph-based computing is currently receiving attention as a promising means of leveraging the capabilities of SNNs [336-338]. The Mosaic is thus a timely dedicated hardware architecture, optimized for a specific type of graph that is abundant in nature and in the real world, and that promises to find application at the extreme edge.
## 5.6 methods

## Design and fabrication of Mosaic circuits

## 5.6.0.1 Neuron and routing column circuits
Both the neuron and routing columns share a common front-end circuit which reads the conductances of the RRAM devices. The RRAM bottom electrode has a constant DC voltage Vbot applied to it, and the common top electrode is pinned to the voltage Vx by a rail-to-rail operational amplifier (OPAMP) circuit. The OPAMP output is connected in negative feedback to its non-inverting input (due to the 180-degree phase shift between the gate and drain of transistor M1 in Fig. 5.2) and has the constant DC bias voltage Vtop applied to its inverting input. As a result, the output of the OPAMP modulates the gate voltage of transistor M1 such that the current it sources onto the node Vx maintains this node as close as possible to the DC bias Vtop. Whenever an input pulse Vin<n> arrives, a current i_in equal to (Vx - Vbot)·Gn flows out of the bottom electrode. The negative feedback of the OPAMP then acts to ensure that Vx = Vtop by sourcing an equal current from transistor M1. By connecting the OPAMP output to the gate of transistor M2, a current equal to i_in is therefore also buffered, as i_buf, into the branch composed of transistors M2 and M3 in series. In the Routing Tile, this current is compared against a reference current and, if higher, a pulse is generated and transferred onwards. The current comparator circuit is composed of two current mirrors and an inverter (Fig. 5.3b). In the neuron column, this current is injected into a CMOS differential-pair integrator synapse circuit [188], which generates an exponentially decaying waveform from the onset of the pulse with an amplitude proportional to the injected current. Finally, this exponential current is injected onto the membrane capacitor of a CMOS leaky integrate-and-fire neuron circuit [339], where it integrates as a voltage (see Fig. 5.2b). Upon exceeding a voltage threshold (the switching voltage of an inverter), a pulse is emitted at the output of the circuit. This pulse in turn feeds back and shunts the capacitor to ground such that it is discharged. Further circuits were required in order to program the device conductance states. Notably, multiplexers were integrated on each end of the column in order to apply voltages to the top and bottom electrodes of the RRAM devices.
A critical parameter in both Neuron and Routing Tiles is the spike pulse width. Minimizing the width of spikes ensures maximal energy efficiency, but this comes at a cost. If the duration of the voltage pulse is too short, the readout current from the 1T1R will be imprecise, and parasitic effects due to the metal lines in the array might even impede the correct propagation of either the voltage pulse or the readout current. For this reason, we thoroughly investigated the minimal pulse width that allows spikes and readout currents to be reliably propagated with a probability of 99.7% (3σ). Extensive Monte Carlo simulation resulted in a spike pulse width of around 100 ns. Based on these SPICE simulations, we also estimated the energy consumption of Mosaic for the different tasks presented in Figure 5.4.
## Fabrication/integration
The circuits described in the Results section have been taped out in 130 nm technology at CEA-Leti, on a 200 mm production line. The Front End of the Line, below metal layer 4, was realized by STMicroelectronics, while from the fifth metal layer upwards, including the deposition of the composites for the RRAM devices, the process was completed by CEA-Leti. RRAM devices are composed of a 5 nm thick HfO2 layer sandwiched between two 5 nm thick TiN electrodes, forming a TiN/HfO2/Ti/TiN stack. Each device is accessed by a transistor, giving rise to the 1T1R unit cell. The access transistor is 650 nm wide. 1T1R cells are integrated with CMOS-based circuits by stacking the RRAM cells on the higher metal layers. In the case of the neuron and Routing Tiles, 1T1R cells are organized in a small matrix - either 2x2 or 2x4 - in which the bottom electrodes are shared between devices in the same column and the gates are shared between devices in the same row. Multiplexers operated by simple logic circuits make it possible to select either a single device or a row of devices for programming or reading operations. The circuits integrated on the wafer were accessed by a probe card connected to pads of dimension 50 x 90 µm².
## RRAM characteristics
Resistive switching in the devices used in our work is based on the formation and rupture of a conductive filament under an electric field applied across the device. The change in the geometry of the filament results in a different resistive state of the device. A SET/RESET operation is performed by applying a positive/negative pulse across the device, which forms/disrupts the conductive filament in the memory cell, thus decreasing/increasing its resistance. When the filament is formed, the cell is in the HCS; otherwise the cell is in the LCS. For a SET operation, the bottom of the 1T1R structure is conventionally left at ground level, and a positive voltage is applied to the 1T1R top electrode. The reverse is applied in the RESET operation. Typical values for the SET operation are Vgate in [0.9-1.3] V, while the Vtop peak voltage is normally at 2.0 V. For the RESET operation, the gate voltage is instead in the [2.75, 3.25] V range, while the bottom electrode reaches a peak of 3.0 V. The reading operation is performed by limiting the Vtop voltage to 0.3 V, a value that avoids read disturbances, while opening the gate voltage at 4.5 V.
## Mosaic circuit measurement setups
The tests involved analyzing and recording the dynamical behavior of the analog CMOS circuits as well as programming and reading the RRAM devices. Both phases required dedicated instrumentation, all simultaneously connected to the probe card. For programming and reading the RRAM devices, Source Measure Units (SMUs) from a Keithley 4200 SCS machine were used. To maximize the stability and precision of the programming operation, SET and RESET are performed in a quasi-static manner. This means that a slowly rising and falling voltage input is applied to either the top (SET) or bottom (RESET) electrode, while the gate is kept at a fixed value. For the Vtop(t) and Vbot(t) voltages, we applied a triangular pulse with rise and fall times of 1 s and picked a value for Vgate. For a SET operation, the bottom of the 1T1R structure is conventionally left at ground level, while in the RESET case Vtop is equal to 0 V and a positive voltage is applied to Vbot. Typical values for the SET operation are Vgate in [0.9-1.3] V, while the Vtop peak voltage is normally at 2.0 V. Such values allow modulating the RRAM resistance in an interval of [5-30] kΩ corresponding to the HCS of the device. For the RESET operation, the gate voltage is instead in the [2.75, 3.25] V range, while the bottom electrode reaches a peak of 3.0 V.
The LCS is less controllable than the HCS due to the inherent stochasticity of the rupture of the conductive filament; the corresponding resistance is thus spread over a wider [80-1000] kΩ interval. The reading operation is performed by limiting the Vtop voltage to 0.3 V, a value that avoids read disturbances, while opening the gate voltage at 4.5 V.
Inputs and outputs are analog dynamical signals. For the inputs, we alternated between two HP 8110 pulse generators and a Tektronix AFG 3011 waveform generator. As a general rule, input pulses had a pulse width of 1 µs and rise/fall times of 50 ns. This type of pulse is taken as the stereotypical spiking event of a Spiking Neural Network. For the outputs, a 1 GHz Teledyne LeCroy oscilloscope was used to record the output signals.
## Mosaic layout-aware training via regularizing the loss function
We introduce a new regularization function, LM, that accounts for the realization cost of short and long-range connections in the Mosaic layout. Assuming the Neuron Tiles are placed in a square layout, LM uses a matrix H ∈ R^{j×i} expressing the minimum number of Routing Tiles needed to connect a source neuron Nj to a target neuron Ni, based on their Neuron Tile positions in Mosaic. From this, a static mask S ∈ R^{j×i} is created to exponentially penalize long-range connections, such that S = e^{βH} − 1, where β is a positive number that controls how strongly connection distance is penalized. Finally, we calculate LM = ∑ S ⊙ W², for the recurrent weight matrix W ∈ R^{j×i}. Note that the weights corresponding to intra-Neuron Tile connections (where H = 0) are not penalized, allowing the neurons within a Neuron Tile to be densely connected. During training, the task-related cross-entropy loss term (total reward in the case of RL) increases the network performance, while the LM term reduces the strength of the weights creating long-range connections in the Mosaic layout. Starting from the 10th epoch, we deterministically prune connections (setting the corresponding weight matrix elements to 0) when their L1-norm is smaller than a fixed threshold of 0.005. This pruning procedure privileges local connections (i.e., those within a Neuron Tile or to a nearby Neuron Tile) and naturally gives rise to a small-world neural network topology. Our experiments found that gradient norm clipping during training and reducing the learning rate by a factor of ten after the 135th epoch in classification tasks help stabilize the optimization against the detrimental effects of pruning.
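For concreteness, a minimal PyTorch sketch of this regularizer is given below. The hop-count matrix H is approximated here by the Manhattan distance between Neuron Tile coordinates; the exact hop count on the Mosaic grid, the `tile_pos` layout and the `beta` value are illustrative assumptions rather than the exact implementation.

```python
import torch

def mosaic_regularizer(W, tile_pos, beta=0.5):
    # W        : (n, n) recurrent weight matrix between the n neurons.
    # tile_pos : (n, 2) integer grid coordinates of each neuron's Neuron Tile.
    # H[j, i]  : distance between the tiles of neurons j and i, used here as a
    #            proxy for the number of Routing Tiles a connection traverses.
    H = (tile_pos[:, None, :] - tile_pos[None, :, :]).abs().sum(-1).float()
    # Exponential penalty; intra-tile connections (H == 0) give S == 0 and stay free.
    S = torch.exp(beta * H) - 1.0
    return (S * W.pow(2)).sum()
```

During training this term is simply added to the task loss, e.g. `loss = task_loss + lam * mosaic_regularizer(W_rec, tile_pos)`, with `lam` a tunable trade-off weight.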
## RRAM-aware noise-resilient training
The strategy of choice for endowing Mosaic with the ability to solve real-world tasks is offline training. This procedure consists of producing an abstraction of the Mosaic architecture on a server computer, formalized as a Spiking Neural Network that is trained to solve a particular task. Once the parameters of Mosaic are optimized, in a digital 32-bit floating-point (FP32) representation, they are transferred to the physical Mosaic chip. However, the parameters in Mosaic are stored in RRAM devices, which are not as precise as their FP32 counterparts. Furthermore, RRAMs suffer from other types of non-idealities such as programming stochasticity, temporal conductance relaxation, and read noise [312, 315-317].
To mitigate these detrimental effects at the weight-transfer stage, we adapted the noise-resilient training method for RRAM devices [18, 340]. Similar to quantization-aware training, at every forward pass the network weights are perturbed with additive noise, while gradients are propagated with a straight-through estimator. We used Gaussian noise with zero mean and a standard deviation equal to 5% of the maximum conductance to emulate the transfer non-idealities. The profile of this additive noise is based on our characterization of an array of 4096 RRAM devices [312], which were programmed with a program-and-verify scheme (up to 10 iterations) to various conductance levels and then measured after 60 seconds to model the resulting distribution.
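The sketch below illustrates one way to implement this noise injection with a straight-through estimator in PyTorch. The 5% relative standard deviation follows the text, while mapping the maximum conductance to the maximum absolute weight is an assumption made for illustration.

```python
import torch

class RRAMNoise(torch.autograd.Function):
    """Forward: perturb weights with Gaussian noise proportional to the maximum
    conductance (here, the maximum absolute weight). Backward: straight-through."""
    @staticmethod
    def forward(ctx, w, rel_std=0.05):
        g_max = w.detach().abs().max()
        return w + torch.randn_like(w) * rel_std * g_max

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # gradients pass through the noise unchanged

# inside the model's forward pass (hypothetical usage):
# w_noisy = RRAMNoise.apply(self.weight)
```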
## ECG task description
The Mosaic hardware-aware training procedure is tested on an electrocardiogram (ECG) arrhythmia detection task. The ECG dataset was downloaded from the MIT-BIH arrhythmia repository [318]. The database is composed of continuous 30-minute recordings measured from multiple subjects. The QRS complex of each heartbeat has been annotated as either healthy or exhibiting one of many possible heart arrhythmias by a team of cardiologists. We selected one patient exhibiting approximately half healthy and half arrhythmic heartbeats. Each heartbeat was isolated from the others in a 700 ms time series centered on the labelled QRS complex. Each of the two 700 ms channel signals was then converted to spikes using a delta-modulation scheme [341]. This consists of recording the initial value of the time series and, going forward in time, recording the time stamp whenever the signal changes by a pre-determined positive or negative amount. The value of the signal at that time stamp is then recorded and used in the next comparison forward in time, and the process is repeated. For the two channels, this results in four event streams in total, denoting upward and downward changes in the signals. During the simulation of the neural network, these four event streams corresponded to the four input neurons of the spiking recurrent neural network implemented by the Mosaic.
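A minimal sketch of such a delta-modulation encoder is shown below; the function name and array-based interface are illustrative, and the threshold corresponds to the pre-determined change amount mentioned above.

```python
import numpy as np

def delta_modulate(signal, times, threshold):
    """Emit UP/DOWN event times whenever the signal moves by +/- threshold
    with respect to the last recorded value."""
    up_times, down_times = [], []
    last = signal[0]
    for t, x in zip(times, signal):
        if x - last >= threshold:
            up_times.append(t)
            last = x  # record the value at this time stamp for the next comparison
        elif last - x >= threshold:
            down_times.append(t)
            last = x
    return np.array(up_times), np.array(down_times)
```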
Data points were presented to the model in mini-batches of 16. Two populations of neurons in two Neuron Tiles were used to indicate whether the presented ECG signal corresponded to a healthy or an arrhythmic heartbeat. The softmax of the total number of spikes generated by the neurons in each population was used to obtain a classification probability, and the resulting categorical cross-entropy (negative log-likelihood) with respect to the signal labels was minimized.
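In PyTorch terms, this readout amounts to treating the population spike counts directly as logits (tensor names are illustrative):

```python
import torch.nn.functional as F

# spike_counts: (batch, 2) total spikes emitted by the two output populations per heartbeat
# labels:       (batch,)   0 = healthy, 1 = arrhythmic
loss = F.cross_entropy(spike_counts.float(), labels)  # softmax + negative log-likelihood
```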
## Keyword Spotting task description
For the keyword spotting task, we used the SHD dataset (20 classes, 8156 training and 2264 test samples). Each input example drawn from the dataset is subsampled three times along the channel dimension, without overlap, to obtain three augmentations of the same data with 256 channels each. The advantage of this method is that it allows feeding the input stream to fewer Neuron Tiles by reducing the input dimension, and it also triples the sizes of both the training and testing datasets.
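One plausible reading of this augmentation is strided, non-overlapping subsampling of the channel dimension, sketched below; the exact channel indexing used in the thesis may differ.

```python
def subsample_channels(events, n_splits=3):
    """Split the channel dimension into n_splits non-overlapping strided subsets,
    each treated as an independent example with a reduced number of channels.
    `events` is assumed to be a dense (time, channels) array."""
    return [events[:, offset::n_splits] for offset in range(n_splits)]
```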
We set the simulation time step to 1 ms. The recurrent neural network consists of 2048 LIF neurons with a 45 ms membrane time constant. The neurons are distributed into 8 x 8 Neuron Tiles with 32 neurons each. The input spikes are fed only into the neurons of the Mosaic layout's first row (8 tiles). The network prediction is determined after presenting each speech sample for 100 ms by counting the total number of spikes from 20 neurons (the total number of classes) in 2 output Neuron Tiles located in the bottom-right of the Mosaic layout. The neurons inside the input and output Neuron Tiles are not recurrently connected. The network is trained using BPTT on the loss L = LCE + λ LM, where LCE is the cross-entropy loss between the output logits and the targets, and LM is the Mosaic layout-aware regularization term. We use a batch size of 512 and suitably tuned hyperparameters.
## Reinforcement Learning task description
In the RL experiments, we test the versatility of a Mosaic-optimized RSNN on a continuous-action-space motor-control task, half-cheetah, implemented using the BRAX physics engine for rigid-body simulations [342]. At every time step t, the environment provides an input observation vector o_t ∈ R^25 and a scalar reward r_t. The goal of the agent is to maximize the expected sum of rewards R = ∑_{t=0}^{1000} r_t over an episode of 1000 environment interactions by selecting actions a_t ∈ R^7 computed by the output of the policy network. The policy network of our agent consists of 256 recurrently connected LIF neurons, with a membrane decay time constant of 30 ms. The neurons are equally distributed over 16 Neuron Tiles to form a 7 x 7 Mosaic layout. We note that for simulation purposes, selecting a small network of 16 Neuron Tiles with 16 neurons each, while not optimal in terms of memory footprint (Eq. 5.2), was preferred in order to fit the large ES population within the constraints of a single GPU's memory capacity. At each time step, the observation vector o_t is accumulated into the membrane voltages of the first 25 neurons of the two upper-left input tiles. The action vector a_t is computed by reading the membrane voltages of the last seven neurons in the bottom-right corner after passing them through a tanh non-linearity.
We considered Evolutionary Strategies (ES) as an optimization method to adjust the RSNN weights such that, after training, the agent can successfully solve the environment with a policy network that has only locally dense and globally sparse connectivity. We found ES a particularly promising approach for hardware-aware training because (i) it is blind to non-differentiable hardware constraints, e.g., the spiking function, quantized weights and connectivity patterns, and (ii) it is highly parallelizable, since ES does not require the spiking variables to be stored over a thousand time steps, in contrast to BPTT, which explicitly calculates the gradient. In ES, the fitness of an offspring is defined as the combination of the total reward over an episode, R, and the realization cost of short and long-range connections, LM (same as in the KWS task), such that F = R − λ LM. We used a population size of 4096 (with antithetic sampling to reduce variance) and a mutation noise standard deviation of 0.05. At the end of each generation, the network weights with L0-norm smaller than a fixed threshold are deterministically pruned. The agent is trained for 1000 generations.
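A minimal sketch of one antithetic-sampling ES generation is given below; the population size and mutation standard deviation follow the text, while the learning rate, fitness normalization and parameter flattening are assumptions for illustration.

```python
import numpy as np

def es_generation(theta, fitness_fn, pop_size=4096, sigma=0.05, lr=0.01):
    """One ES update of the flattened policy parameters theta.
    fitness_fn(theta) should return F = R - lambda * L_M over a full episode."""
    half = pop_size // 2
    eps = np.random.randn(half, theta.size)
    eps = np.concatenate([eps, -eps])                      # antithetic pairs
    fitness = np.array([fitness_fn(theta + sigma * e) for e in eps])
    fitness = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
    grad_est = (eps * fitness[:, None]).mean(axis=0) / sigma
    return theta + lr * grad_est
```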
## Calculation of memory footprint
We calculate the Mosaic architecture's Memory Footprint (MF) in comparison to a single large crossbar array for building small-world graphical models.

To evaluate the MF of one large crossbar array, we count the total number of devices required to implement any possible connection between neurons - allowing any RSNN to be mapped onto the system. Setting N to be the number of neurons in the system, the total possible number of connections in the graph is MFref = N².
For the Mosaic architecture, the number of RRAM cells (i.e., the MF) is equal to the number of devices in all the Neuron Tiles and Routing Tiles: MFmosaic = MFNeuronTiles + MFRoutingTiles .
Considering Neuron Tiles with k neurons each, each Neuron Tile contributes 5k² devices (where the factor of 5 accounts for the four possible directions each tile can connect to, plus the recurrent connections within the tile). Evenly dividing the N neurons among the Neuron Tiles gives rise to T = ceil(N/k) required Neuron Tiles. This brings the total number of devices attributed to the Neuron Tiles to T · 5k².
The number of Routing Tiles that connect all the Neuron Tiles depends on the geometry of the Mosaic systolic array. Here, we assume Neuron Tiles assembled in a square, each with a Routing Tile on each side. We consider R to be the number of Routing Tiles, each with (4k)² devices. This brings the total number of devices related to the Routing Tiles to MFRoutingTiles = R · (4k)².

The problem can then be re-written as a function of the geometry. Considering Fig. 5.1g, let i be an integer and (2i + 1)² the total number of tiles. The number of Neuron Tiles can be written as T = (i + 1)², as we consider the case where Neuron Tiles form the outer ring of tiles. As a consequence, the number of Routing Tiles is R = (2i + 1)² − (i + 1)². Substituting these values into the previous expressions for MFNeuronTiles and MFRoutingTiles, and recalling that N = k · T, we can impose MFMosaic = MFNeuronTiles + MFRoutingTiles < MFref.
This results in the following expression:
$$MF_{Mosaic} = MF_{NeuronTiles} + MF_{RoutingTiles} < MF_{ref}$$

$$(i+1)^2 \, (5k^2) + \left[(2i+1)^2 - (i+1)^2\right] (4k)^2 < \left(k\,(i+1)^2\right)^2 \qquad (5.2)$$
This expression can then be evaluated for i, given a network size, giving rise to the relationships plotted in Fig. 5.1g.
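The inequality can be checked numerically with a few lines of code; the sketch below follows Eq. 5.2 directly, and k = 32 is an arbitrary example tile size.

```python
def mf_mosaic(i, k):
    """Devices used by Mosaic for a layout with (2i+1)^2 tiles of k neurons each."""
    neuron_tiles = (i + 1) ** 2
    routing_tiles = (2 * i + 1) ** 2 - neuron_tiles
    return neuron_tiles * 5 * k ** 2 + routing_tiles * (4 * k) ** 2

def mf_crossbar(i, k):
    """Devices used by a single dense crossbar for N = k * (i+1)^2 neurons."""
    return (k * (i + 1) ** 2) ** 2

k = 32
for i in range(1, 30):
    if mf_mosaic(i, k) < mf_crossbar(i, k):
        print(f"Mosaic becomes cheaper for i >= {i}, i.e. N >= {k * (i + 1) ** 2} neurons")
        break
```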
## Conclusion

In this thesis, I explored implementing neural computation on mixed-signal hardware with in-memory computing capabilities. Neural computation (i.e., inference, learning and routing) is fundamentally memory-centric, and the von Neumann architecture's separation of memory and computation is a major contributor to modern neural networks' overall energy consumption and latency. Based on this observation, we investigated offloading critical neural computations directly onto the raw analog dynamics of resistive memory technologies.
This thesis begins by addressing the challenge of online training of feedforward spiking neural networks implemented using intrinsically 1-bit RRAM devices. In Chapter 2, we presented a novel memristor programming technique to precisely control the filament formation in RRAM devices, increasing their effective bit resolution for more stable training dynamics and improved performance. The versatility of this technique was further demonstrated by applying it to novel perovskite memristors in Chapter 4, achieving an even more linear device response.
In Chapter 3, we tackled the temporal credit assignment problem in RSNNs on an analog substrate. We developed a simulation framework based on a comprehensive statistical model of a PCM crossbar array, capturing the major non-idealities of the memristors. This framework enabled the simulation of device responses to various weight programming techniques designed to mitigate these non-idealities. Using this framework, we trained an RSNN with the e-prop local learning rule, demonstrating that gradient accumulation is crucial for accurately reflecting weight updates on memristive devices - a finding later validated on a real neuromorphic chip. While our solution, which relies on a digital coprocessor, incurs an energy cost, it motivates the development of future devices with intrinsic accumulation capabilities, potentially through conductance or charge storage. Additionally, we introduced PCM-trace, a scalable implementation of synaptic eligibility traces for local learning using the volatile regime of PCM devices.
In Chapter 4, we presented the discovery of a novel memristor capable of switching between volatile and non-volatile modes. This reconfigurable memristor, based on halide perovskite nanocrystals, offers a significant advancement in emerging memory technologies, enabling the implementation of both static and dynamic neural parameters with the same material and fabrication technique.
Finally, in Chapter 5, we introduced Mosaic, a memristive systolic architecture for in-memory computing and routing. Mosaic efficiently implements small-world graph connectivity, demonstrating superior energy efficiency in spike routing compared to other hardware platforms. Additionally, we introduced a hardware layout-aware training method that takes the physical layout of the chip into account while optimizing the neural network weights.
## Discussion and Outlook
I conclude by highlighting significant limitations of neural computation on analog substrates, which serve as a foundation for future research.
- 1. Discovering efficient materials and peripheral circuits. Although our experiments demonstrate the potential of learning on analog substrates using mature RRAM and PCM technologies, significant challenges remain in matching the efficiency of digital accelerators. The limited bit resolution inherent to these devices necessitates energy-intensive gradient accumulation on a co-processor, hindering overall efficiency. As analog device sizes shrink, this problem is expected to become more pronounced. While improving bit precision can enhance accuracy, it alone will not guarantee efficient analog learning. Additionally, the high WRITE energy of memristors needs to be lowered to match the low programming energies of SRAM. Finally, it is worth noting that scaling trends are primarily driven by the peripheral circuitry rather than the memory cells, necessitating area optimization at the periphery to achieve cost-effective fabrication. Future research should prioritize these challenges, roughly in this order, to achieve learning performance competitive with digital accelerators.
- 2. Local learning in unconventional networks. The co-design approach is driving neural architectures to become increasingly tailored to hardware for efficiency. For example, quantization is adopted to reduce the memory footprint, State Space Models (SSMs) leverage recurrence to improve arithmetic intensity (FLOPs/byte) on digital hardware, and spiking neurons convert the analog membrane voltage to binary spikes, eliminating the need for ADCs on analog hardware. It seems reasonable to expect more such primitives to emerge, potentially leveraging sparsity or stochasticity. This raises the question of how to design local learning rules for unconventional future architectures. Can a single learning rule provide an end-to-end solution, or can credit assignment be modularized for subgraphs? Recent work suggests that exploring such directions may yield valuable insights and unexpected performance trade-offs [69, 71, 72, 343, 344].
- 3. Mismatch between silicon and the brain. Biological plausibility and silicon efficiency represent distinct optimization goals. While silicon dynamics outpace neuronal dynamics in raw speed¹, silicon falls short of matching the brain's power efficiency and parallelism. I argue that while mimicking the brain's intricacies may be scientifically interesting, it is not necessarily an ideal blueprint for either silicon efficiency or intelligence. Instead, focusing on silicon efficiency allows us to leverage the unique strengths of this substrate. This will naturally lead to design principles that the two substrates share, such as minimizing wiring and localizing computation; however, we are not bound by the specific artifacts of biological evolution. It thus remains an open question which criteria from neuroscience should inform hardware design improvements.
- 4. Remembering the Jevons Paradox. While hardware-algorithm co-design aims to enhance the power efficiency of on-device intelligence, the Jevons Paradox suggests that this may inadvertently increase overall AI energy consumption due to increased demand [345]. This phenomenon occurred in the past, when the increasing efficiency of coal engines led to greater coal consumption. This raises important questions for future investigation about potential societal and governmental regulations to mitigate the environmental impact, or more exotic technological deployment strategies that keep the carbon footprint of AI minimal.
1 Electron mobility in transistors is at least 10⁷ times faster than the ion transport rate in neuronal channels.
At the time of writing, learning on mixed-signal hardware with memristive weights has not reached the efficiency of digital accelerators. It seems probable that this will change as material design and fabrication technology mature. Nevertheless, it will remain an exciting playground where intelligence meets the raw, unfiltered physics of computation.
We present here some additional results to complement the main results discussed in the previous sections.
## appendix 1: online training of spiking recurrent neural networks with phase-change memory synapses

## supplementary note 1

We implemented the PCM crossbar array simulation framework in PyTorch [107], which can be used for both inference and training of ANNs or SNNs. Built on top of the statistical model introduced by Nandakumar et al. [81], our crossbar model supports asynchronous SET, RESET and READ operations over entire crossbar structures and simultaneously keeps track of the temporal evolution of the device conductances.
A single crossbar array consists of P × Q nodes (each node representing a synapse), where each node has 2N memristors arranged in the differential architecture (N potentiation and N depression devices). Each memristor state is represented by four variables: t_p, storing the last time the device was written (used to calculate the effect of drift); count, counting how many times it has been written (used later by the arbiter of N-memristor architectures); Pmem, its programming history (required by the PCM model); and G, the conductance of the device T0 seconds after the last programming time. The initial conductances of the PCM devices in the crossbar array are assumed to be iteratively programmed to the HRS, sampled from a Normal distribution N(µ = 0.1, σ = 0.01) µS.
The PCM crossbar simulation framework supports three major functions: READ, SET and RESET. The READ function takes the time of the applied READ pulse, t, and calculates the effect of drift based on the last programming time t_p. Then, it adds the conductance-dependent READ noise and returns the conductance values of the whole array. The SET function takes the timing information of the applied SET pulse, together with a mask of shape (2 × N × P × Q), and calculates the effect of applying a single SET pulse to the PCM devices selected by the mask. Finally, the RESET function initializes all the state variables of the devices selected by the mask and re-initializes their conductances using a Normal distribution N(µ = 0.1, σ = 0.01) µS.
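A skeleton of the corresponding state container is sketched below; the variable names follow the text, while the shapes and initialization are a simplified reading of the framework (the actual drift, noise and programming models of [81] are omitted).

```python
import torch

class PCMCrossbarState:
    def __init__(self, P, Q, N):
        shape = (2, N, P, Q)                       # potentiation and depression devices
        self.t_p = torch.zeros(shape)              # time of last write, used for drift
        self.count = torch.zeros(shape)            # number of writes, used by the arbiter
        self.Pmem = torch.zeros(shape)             # programming history required by the PCM model
        # initial conductances ~ N(0.1, 0.01) microsiemens (HRS)
        self.G = 0.1 + 0.01 * torch.randn(shape)   # stored in microsiemens for readability
```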
## supplementary note 2

READ and WRITE operations to the simulated PCM devices in the crossbar model are stochastic and subject to temporal conductance drift. Additionally, PCM devices offer only very limited bit precision. Therefore, to ease the network training procedure, and especially the hyperparameter tuning, we developed the perf-mode. When the crossbar model is operated in perf-mode, all stochasticity sources and the conductance drift are disabled. READ operations directly access the device conductance without 1/f noise or drift, whereas SET operations increase the device conductance as
$$G_N = G_{N-1} + \frac{G_{MAX}}{2^{CB_{RES}}}, \qquad (A.1)$$
where GMAX is the maximum PCM conductance, set to 12 µS (conductivity boundaries are determined from the device measurements in [81]), and CBRES is the desired bit resolution of a single PCM device. In a nutshell, the perf-mode turns the PCM devices into ideal memory cells, corresponding to a digital memory with limited bit precision.
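In code, a perf-mode SET pulse reduces to the fixed increment of Eq. A.1; the clipping at GMAX is an assumption consistent with the saturation visible in Fig. A.1, and CBRES = 4 is an arbitrary example value.

```python
def perf_mode_set(G, G_max=12.0, cb_res=4):
    """Ideal SET in perf-mode: deterministic step of G_max / 2**cb_res (in microsiemens)."""
    return min(G + G_max / 2 ** cb_res, G_max)
```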
## supplementary note 3

Here, we demonstrate the impact of using multiple memristor devices per synapse (arranged in a differential configuration) on the precision of targeted programming updates. Specifically, we modeled synapses with N = 1, 4, 8 PCM devices and programmed them from initial integer conductance values Gsource ∈ [−10, 10] µS to integer target conductance values Gtarget ∈ [−10, 10] µS using the multi-memristor update scheme described in Section 3.1.3.2. The effective conductance of a synapse is calculated as Gsyn = ∑ G+_i − ∑ G−_i; however, we normalized the conductance across the 1-PCM, 4-PCM and 8-PCM architectures for easier comparison, such that Gsyn = (1/N)(∑ G+_i − ∑ G−_i).
- (a) Comparison of the full PCM model and its perf-mode equivalent after 8 consecutive SET pulses.
- (b) Comparison of the full PCM model and its perf-mode equivalent after 8 consecutive SET pulses, averaged over 300 measurements showing the effect of drift.
Figure A.1: The PCM crossbar model supports both the full PCM model from [81] and its corresponding simplified version as an ideal digital memory in perf-mode.
Figure A.2: Multi-memristor configuration with 1 PCM (one depression and one potentiation) per synapse.
Our empirical results verify the claim of Boybat et al. [109] that the standard deviation and the update resolution of the write process decrease by √N.
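This scaling can be checked with a quick numerical experiment on the normalized differential read-out; the per-device error model below (i.i.d. Gaussian programming errors) is a simplification used only for illustration.

```python
import torch

sigma, trials = 1.0, 100_000                    # arbitrary per-device write-error std
for N in (1, 4, 8):
    G_pos = sigma * torch.randn(trials, N)      # programming errors of potentiation devices
    G_neg = sigma * torch.randn(trials, N)      # programming errors of depression devices
    G_syn = (G_pos.sum(1) - G_neg.sum(1)) / N   # normalized differential synaptic conductance
    print(N, round(G_syn.std().item(), 3))      # std shrinks roughly as 1 / sqrt(N)
```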
## supplementary note 4

In differential architectures, consecutive SET pulses applied to the positive and negative memristors may cause the saturation of the synaptic conductance and block further updates. The saturation effect is more apparent when a single synapse receives more than 10 updates in one direction (potentiation or depression) during training. For example, this effect is clearly visible in Figs. A.2, A.3 and A.4 when the source and target conductances differ by more than 8-10 µS.
We implemented a weight update scheme denoted the update-ready criterion, which aims to prevent conductance saturation when applying single large updates. Before applying the update, we read both the positive and negative pair conductances and check whether the target update is possible. If not, we reset both devices, calculate the new target, and apply the corresponding number of pulses. For example, given G+ = 8 µS, G− = 4 µS and a targeted update of +6 µS, the algorithm decides to reset both devices because G+ cannot be increased to 14 µS; after both devices are reset, G+ can be programmed to 10 µS.
Figure A.3: Multi-memristor configuration with 8 PCM (four depression and four potentiation) per synapse.
Figure A.4: Multi-memristor configuration with 16 PCM (eight depression and eight potentiation) per synapse.
Although our PCM crossbar array simulation framework supports it, this weight-transfer criterion was not used in our simulations because it requires reading the device states during the update.
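A behavioral sketch of the criterion is given below; it returns target device conductances directly rather than pulse counts, and the 12 µS upper bound per device is an assumption taken from the GMAX used above.

```python
def update_ready(G_pos, G_neg, delta_G, G_max=12.0):
    """Update-ready criterion for a differential PCM pair (values in microsiemens).
    If the requested update would saturate a device, reset both and re-program
    the pair toward the new effective target G_pos - G_neg + delta_G."""
    if delta_G >= 0 and G_pos + delta_G <= G_max:
        return G_pos + delta_G, G_neg             # plain potentiation
    if delta_G < 0 and G_neg - delta_G <= G_max:
        return G_pos, G_neg - delta_G             # plain depression
    target = G_pos - G_neg + delta_G              # reset-and-reprogram branch
    return (target, 0.0) if target >= 0 else (0.0, -target)

# Example from the text: G+ = 8, G- = 4, update +6  ->  reset, then G+ = 10, G- = 0.
print(update_ready(8.0, 4.0, 6.0))
```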
Figure A.5: Update-ready criterion tested with N = 1 memristor per synapse.
## supplementary note 5

We defined the task success criterion as an MSE loss < 0.1, based on visual inspection. Some network outputs at different loss values are shown in Fig. A.6 below.
Figure A.6: Comparison of network performances with six different loss values.
Figure A.7: Mean firing rate of 50 networks with PCM synapses trained using the mixed-precision method.
<details>
<summary>Image 45 Details</summary>

### Visual Description
## Line Chart: Mean Firing Rate vs. Epoch
### Overview
The image is a line chart showing the relationship between the mean firing rate (in Hz) and the epoch number. The chart displays a single data series with a shaded area around the line, presumably representing the standard deviation or confidence interval. The firing rate initially increases, then decreases rapidly before stabilizing at a lower level.
### Components/Axes
* **X-axis:** Epoch, with tick marks at approximately 50, 100, 150, 200, and 250.
* **Y-axis:** Mean Firing Rate (Hz), with tick marks at 0, 5, 10, 15, and 20.
* **Data Series:** A single blue line representing the mean firing rate. A light blue shaded area surrounds the line, indicating variability.
* **Legend:** There is no explicit legend, but the single line is understood to represent the "Mean Firing Rate."
### Detailed Analysis
* **Data Series Trend:** The blue line starts at approximately 6 Hz, rises to a peak of around 14.5 Hz, then rapidly declines to approximately 3 Hz, where it stabilizes and fluctuates slightly.
* **Data Points:**
* Epoch 0: Approximately 6 Hz
* Epoch 25: Approximately 14.5 Hz
* Epoch 50: Approximately 9.5 Hz
* Epoch 75: Approximately 6 Hz
* Epoch 100: Approximately 4 Hz
* Epoch 150: Approximately 3 Hz
* Epoch 200: Approximately 3 Hz
* Epoch 250: Approximately 3 Hz
* Epoch 275: Approximately 3.5 Hz
* **Shaded Area:** The light blue shaded area is widest at the beginning of the chart (around Epoch 0-50), indicating higher variability in the firing rate during the initial epochs. The shaded area narrows as the epoch number increases, suggesting that the firing rate becomes more stable over time.
### Key Observations
* The firing rate peaks early in the process (around Epoch 25) and then decreases significantly.
* The firing rate stabilizes after approximately 100 epochs.
* The variability in the firing rate is higher during the initial epochs.
### Interpretation
The chart suggests that the system or process being measured experiences an initial burst of activity (high firing rate) that then settles down to a more stable, lower level. The higher variability in the early epochs could indicate a period of learning, adaptation, or instability before the system reaches a steady state. The data demonstrates a clear trend of decreasing firing rate over time, with a stabilization point around epoch 100. This could represent a biological process, a machine learning algorithm converging, or any system that exhibits an initial surge followed by a period of consolidation.
</details>
Figure A.8: MSE loss of 50 networks trained with PCM synapses using the mixed-precision method.
<details>
<summary>Image 46 Details</summary>

### Visual Description
## Chart: Training Loss
### Overview
The image is a line chart showing the training loss over training steps. The chart displays a decreasing trend in loss, indicating that the model is learning. A shaded area around the line represents the uncertainty or variance in the loss.
### Components/Axes
* **Title:** Training loss
* **X-axis:** Training step
* Scale: 0 to 5000, with markers at 0, 1000, 2000, 3000, 4000, and 5000.
* **Y-axis:** Loss
* Scale: 0.10 to 0.30, with markers at 0.10, 0.15, 0.20, 0.25, and 0.30.
* **Data Series:**
* A blue line represents the training loss.
* A light blue shaded area surrounds the blue line, indicating the variance or uncertainty in the loss.
### Detailed Analysis
* **Training Loss (Blue Line):**
* The line starts at approximately 0.24 at training step 0.
* It increases to approximately 0.28 by training step 250.
* The line then generally decreases, with some fluctuations, to approximately 0.09 at training step 5000.
* At training step 1000, the loss is approximately 0.25.
* At training step 2000, the loss is approximately 0.20.
* At training step 3000, the loss is approximately 0.18.
* At training step 4000, the loss is approximately 0.13.
* At training step 5000, the loss is approximately 0.09.
* **Uncertainty (Light Blue Shaded Area):**
* The shaded area represents the range of possible loss values around the mean loss (blue line).
* The width of the shaded area varies, indicating different levels of uncertainty at different training steps.
* The uncertainty appears to decrease as the training progresses.
### Key Observations
* The training loss generally decreases over the training steps, indicating that the model is learning.
* There are fluctuations in the loss, suggesting that the learning process is not perfectly smooth.
* The uncertainty in the loss decreases as the training progresses, suggesting that the model is becoming more confident in its predictions.
### Interpretation
The chart demonstrates the training process of a machine learning model. The decreasing training loss indicates that the model is learning to make better predictions. The fluctuations in the loss may be due to the stochastic nature of the training process or to changes in the data distribution. The decreasing uncertainty suggests that the model is becoming more robust and less sensitive to noise in the data. The overall trend suggests that the model is converging to a good solution.
</details>
## supplementary note 6
## appendix 2: mosaic: in-memory computing and routing for small-world spike-based neuromorphic systems
<details>
<summary>Image 47 Details</summary>

### Visual Description
## Heatmap: Watts Strogatz vs. Newman Watts Strogatz Graphs
### Overview
The image presents two heatmaps comparing the Watts Strogatz graph and the Newman Watts Strogatz graph. The heatmaps display values for different combinations of 'k' (degree) and 'n' (number of nodes). The color intensity represents the magnitude of the value, with darker shades indicating lower values and lighter shades indicating higher values.
### Components/Axes
**Left Heatmap (Watts Strogatz Graph):**
* **Title:** Watts Strogatz Graph
* **Y-axis:** 'k' (degree) with values 8, 16, 32, and 64.
* **X-axis:** 'n' (number of nodes) with values 128, 256, 512, and 1024.
* **Color Scale:** Ranges from dark blue (low values) to light yellow (high values), with a scale from approximately 0 to 120.
**Right Heatmap (Newman Watts Strogatz Graph):**
* **Title:** Newman Watts Strogatz Graph
* **Y-axis:** 'k' (degree) with values 8, 16, 32, and 64.
* **X-axis:** 'n' (number of nodes) with values 128, 256, 512, and 1024.
* **Color Scale:** Ranges from dark blue (low values) to light yellow (high values), with a scale from approximately 0 to 80.
### Detailed Analysis
**Left Heatmap (Watts Strogatz Graph):**
* **k = 8:**
* n = 128: 15.0
* n = 256: 31.0
* n = 512: 63.0
* n = 1024: 127.0
* Trend: Values increase significantly as 'n' increases.
* **k = 16:**
* n = 128: 7.0
* n = 256: 15.0
* n = 512: 31.0
* n = 1024: 63.0
* Trend: Values increase as 'n' increases.
* **k = 32:**
* n = 128: 3.0
* n = 256: 7.0
* n = 512: 15.0
* n = 1024: 31.0
* Trend: Values increase as 'n' increases.
* **k = 64:**
* n = 128: 1.0
* n = 256: 3.0
* n = 512: 7.0
* n = 1024: 15.0
* Trend: Values increase as 'n' increases.
**Right Heatmap (Newman Watts Strogatz Graph):**
* **k = 8:**
* n = 128: 10.0
* n = 256: 20.8
* n = 512: 41.8
* n = 1024: 84.5
* Trend: Values increase significantly as 'n' increases.
* **k = 16:**
* n = 128: 4.4
* n = 256: 9.7
* n = 512: 20.4
* n = 1024: 41.6
* Trend: Values increase as 'n' increases.
* **k = 32:**
* n = 128: 1.7
* n = 256: 4.3
* n = 512: 9.6
* n = 1024: 20.4
* Trend: Values increase as 'n' increases.
* **k = 64:**
* n = 128: 0.3
* n = 256: 1.7
* n = 512: 4.3
* n = 1024: 9.7
* Trend: Values increase as 'n' increases.
### Key Observations
* In both heatmaps, the values increase as 'n' increases for a given 'k'.
* In both heatmaps, the values decrease as 'k' increases for a given 'n'.
* The Watts Strogatz graph (left) generally has higher values than the Newman Watts Strogatz graph (right) for the same 'k' and 'n' values.
* The most significant increase in values occurs when 'n' increases from 512 to 1024, especially for lower 'k' values.
### Interpretation
The heatmaps illustrate the relationship between the degree ('k') and the number of nodes ('n') in Watts Strogatz and Newman Watts Strogatz graphs. The data suggests that as the number of nodes increases, the measured value (likely related to network properties like average path length or clustering coefficient) also increases. Conversely, as the degree increases, the measured value decreases.
The difference in values between the two graph types indicates that the Newman Watts Strogatz model results in lower values compared to the Watts Strogatz model for the same parameters. This could be due to differences in how the graphs are constructed or how the network properties are calculated.
The observed trends highlight the importance of both degree and network size in determining the characteristics of these small-world network models. The steeper increase in values for larger 'n' suggests that network size has a more pronounced effect on the measured property than the degree.
</details>
Figure A.9: The heatmaps show the ratio of zero elements to non-zero elements in the connectivity matrix for two examples of recurrently connected small-world graph generators. As n (the number of nodes, e.g., neurons, in the graph) increases and k (the number of neighbour nodes for each node in a ring topology) decreases, more entries of the connectivity matrix are zero, indicating an increased proportion of unused memory elements in an n × n crossbar array.
supplementary note 1 Figure A.9 quantifies the under-utilization of conventional crossbar arrays when storing example small-world connectivity patterns generated by two standard random graph generation models: Watts-Strogatz small-world graphs [ 279 ] and Newman-Watts-Strogatz small-world graphs [ 346 ]. The first type of graph is characterized by a high degree of local clustering with short vertex-vertex distances, observed in neural networks and self-organizing systems, whereas the latter type mostly captures the properties of lattices associated with statistical physics.
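As a rough illustration, the ratio plotted in Fig. A.9 can be estimated, up to the stochasticity of the graph generators, with a few lines of Python using networkx; the rewiring/shortcut probability `p` below is an assumed value, as it is not reported in the figure.

```python
# Sketch (not the thesis code): estimate the fraction of unused devices when a
# small-world connectivity matrix is mapped onto a dense n x n crossbar.
import networkx as nx
import numpy as np

def zero_to_nonzero_ratio(graph):
    """Ratio of zero entries to non-zero entries in the adjacency matrix."""
    adj = nx.to_numpy_array(graph)
    nonzero = np.count_nonzero(adj)
    return (adj.size - nonzero) / nonzero

p = 0.3  # assumed rewiring / shortcut probability (not given in Fig. A.9)
for n in (128, 256, 512, 1024):
    for k in (8, 16, 32, 64):
        ws = nx.watts_strogatz_graph(n, k, p)
        nws = nx.newman_watts_strogatz_graph(n, k, p)
        print(f"n={n:5d} k={k:3d}  WS: {zero_to_nonzero_ratio(ws):6.1f}  "
              f"NWS: {zero_to_nonzero_ratio(nws):6.1f}")
```

For the Watts-Strogatz case the expected value is simply (n - k - 1)/(k + 1), since each node keeps k neighbours regardless of rewiring, which matches the left heatmap (e.g., 15.0 for n = 128, k = 8).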
supplementary note 2 To communicate events between the computing nodes in neuromorphic chips, the Address Event Representation (AER) communication scheme has been developed and used [ 308 ]. In AER, whenever a spiking neuron in a chip (or module) generates a spike, its 'address' (or any given ID) is written on a high-speed digital bus and sent to the receiving neuron(s) in one (or more) receiver module(s). In general, AER processing modules require at least one AER input port and one AER output port. As neuromorphic systems scale up in size, complexity, and functionality, researchers have been developing more complex and smarter AER 'variations' to maintain the efficiency, reconfigurability, and reliability of the ever-growing target systems they want to build. The scheme used to transport events can be source- or destination-based, where the source or destination address is embedded in the transmitted event 'packet'. In the source-based scheme, each receiving neuron has a local Content-Addressable Memory (CAM) that stores the addresses of all the neurons connected to it. In the destination-based approach, each event hops between nodes, where its address is compared to the node's address until it matches and the event is delivered. Source-driven routing gives the designer more freedom to balance event traffic and design routes, but the hardware complexity increases the delays. Destination-based routing creates pre-determined routes along the network, and the designer can only change the output ports [ 347 ]. In summary, source-based routing requires a CAM per neuron, which increases the area and memory access read times, whereas destination-based routing reduces the configurability of the network structure. Comparatively, in the Mosaic, the routers are memory crossbars that are distributed between the computing cores and steer the spiking information in the mesh. Thus, neither local CAMs nor a centralized memory is required for routing.
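The difference between the two lookup styles can be summarized with a toy sketch; the addresses, tables and route below are made up for illustration and do not represent the protocol of any particular chip.

```python
# Toy illustration of the two AER delivery styles described above.

# Source-based: every receiving neuron holds a CAM-like table of the source
# addresses it listens to; an event only carries the sender's address.
cam_tables = {
    "n7": {"n1", "n3"},   # neuron n7 listens to events from n1 and n3
    "n8": {"n3"},         # neuron n8 listens to events from n3
}

def deliver_source_based(event_src, neurons=cam_tables):
    """Return all neurons whose CAM contains the source address."""
    return [dst for dst, cam in neurons.items() if event_src in cam]

# Destination-based: the event carries the destination address and hops along a
# pre-determined route, being compared against each node's address on the way.
def deliver_destination_based(event_dst, route):
    for node in route:            # route is fixed by the network configuration
        if node == event_dst:     # address match -> deliver here
            return node
    return None                   # not delivered on this route

print(deliver_source_based("n3"))                           # ['n7', 'n8']
print(deliver_destination_based("n8", ["n5", "n6", "n8"]))  # 'n8'
```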
Figure A.10: (top) Different random graphs generated using the Mosaic model, changing the probability of devices being in their high conductive state in the neuron tile ( pn ) and routing tile ( pr ). (bottom) The probability of a device switching is a function of the voltage applied to it while being programmed.
<details>
<summary>Image 48 Details</summary>

### Visual Description
## Compound Image: Network Diagrams and Performance Charts
### Overview
The image presents a compound figure comprising two network diagrams and two performance charts related to memristor behavior. The network diagrams illustrate connectivity patterns with different probabilities, while the charts depict SET probability versus SET voltage and high resistive state versus absolute reset voltage.
### Components/Axes
**Top-Left: Network Diagram 1**
* Nodes: Represented by green circles.
* Edges: Represented by blue lines connecting the nodes.
* Text: "p<sub>n</sub> = 0.05, p<sub>r</sub> = 0.5" located at the bottom-right of the diagram.
**Top-Right: Network Diagram 2**
* Nodes: Represented by green circles.
* Edges: Represented by blue lines with arrowheads, indicating directionality, connecting the nodes.
* Text: "p<sub>n</sub> = 0.25, p<sub>r</sub> = 0.05" located at the bottom-right of the diagram.
**Bottom-Left: SET Probability Chart**
* X-axis: "SET Voltage (V)", ranging from 0.6 to 1.4 in increments of 0.2.
* Y-axis: "SET Probability", ranging from 0.0 to 1.0 in increments of 0.5.
* Data Series:
* Cyan with '+': "100ns Square"
* Green with '+': "500ns Square"
* Red with '+': "10µs Square"
* Purple with '+': "10µs Ramp"
**Bottom-Right: High Resistive State Chart**
* X-axis: "Absolute Reset Voltage (V)", ranging from 1.5 to 3.0 in increments of 0.5.
* Y-axis: "High Resistive State (Ω)", logarithmic scale ranging from 100k to 1G.
* Data Series:
* Light Blue with Square Markers: "Mean"
* Black Dashed Line: "1σ"
* Brown Dashed Line: "2σ"
### Detailed Analysis
**Top-Left: Network Diagram 1**
* The diagram shows a network with a high degree of connectivity.
* The nodes are densely interconnected.
* p<sub>n</sub> = 0.05 and p<sub>r</sub> = 0.5.
**Top-Right: Network Diagram 2**
* The diagram shows a network with a lower degree of connectivity compared to the top-left diagram.
* The connections are directional, indicated by arrowheads.
* p<sub>n</sub> = 0.25 and p<sub>r</sub> = 0.05.
**Bottom-Left: SET Probability Chart**
* All data series show a sigmoidal trend, indicating a transition from low to high SET probability as the SET voltage increases.
* The "10µs Ramp" (purple) and "10µs Square" (red) curves are nearly overlapping and reach a SET probability of approximately 1.0 at a SET voltage of around 1.2V.
* The "500ns Square" (green) curve reaches a SET probability of approximately 1.0 at a SET voltage of around 1.3V.
* The "100ns Square" (cyan) curve reaches a SET probability of approximately 1.0 at a SET voltage of around 1.4V.
* At 0.8V, the 100ns Square has a probability of approximately 0.05.
* At 1.0V, the 100ns Square has a probability of approximately 0.3.
* At 1.2V, the 100ns Square has a probability of approximately 0.8.
**Bottom-Right: High Resistive State Chart**
* The "Mean" (light blue with square markers) shows an increasing trend of high resistive state with increasing absolute reset voltage.
* At 1.5V, the Mean is approximately 200kΩ.
* At 2.0V, the Mean is approximately 2MΩ.
* At 2.5V, the Mean is approximately 20MΩ.
* At 3.0V, the Mean is approximately 100MΩ.
* The "1σ" (black dashed line) and "2σ" (brown dashed line) represent the standard deviation bands around the mean. The spread increases with voltage.
### Key Observations
* The network diagrams show different connectivity patterns based on the probabilities p<sub>n</sub> and p<sub>r</sub>.
* The SET probability increases with SET voltage for all pulse types.
* The high resistive state increases with absolute reset voltage.
* The spread of the high resistive state, as indicated by the standard deviation bands, increases with voltage.
### Interpretation
The data suggests that the SET probability is highly dependent on the applied SET voltage and the pulse type. Longer pulse durations (10µs) result in a faster transition to a high SET probability compared to shorter pulse durations (100ns). The high resistive state of the memristor is also dependent on the absolute reset voltage, with higher voltages leading to higher resistance values. The increasing spread in the high resistive state at higher voltages indicates greater variability in the reset process. The network diagrams likely represent different configurations or states of the memristor network, with varying degrees of connectivity influencing the overall device behavior. The parameters p<sub>n</sub> and p<sub>r</sub> likely represent probabilities related to network connectivity or state transitions.
</details>
Figure A.11: Mosaic connectivity example, formed by setting the probability of connection within Neuron Tiles ( pNT ) and Routing Tiles ( pRT ). (left) Densely connected Mosaic composed of 2 Neuron Tiles and 1 Routing Tile. The corresponding connectivity graph and adjacency matrix are also shown. (right) Sparsely connected Mosaic. The graph is programmed to favor intra-Neuron-Tile connectivity and allow two clusters to emerge, penalizing connections between the two clusters.
<details>
<summary>Image 49 Details</summary>

### Visual Description
## Neural Network Diagram: Connectivity and Adjacency Matrices
### Overview
The image presents two sets of diagrams illustrating neural network connectivity. Each set includes a schematic of a physical neuron array, a graph representing the network's connectivity, and a heatmap visualizing the adjacency matrix. The left set represents a network with higher connection probabilities (PNT=0.75, PRT=0.6), while the right set represents a network with lower connection probabilities (PNT=0.30, PRT=0.05).
### Components/Axes
**Left Side:**
* **Neuron Array Schematic:** A grid of interconnected nodes, with input/output connections labeled 1-4 and 5-8.
* **Connectivity Graph:** A graph showing connections between neurons 1-8. The thickness of the lines indicates the strength or frequency of connection.
* **Adjacency Matrix (Heatmap):** A matrix where rows and columns represent neuron IDs (1-8). Blue squares indicate a connection between the corresponding neurons, while light squares indicate no connection.
* X-axis: Neuron ID (1-8)
* Y-axis: Neuron ID (1-8)
* **Connection Probabilities:** PNT=0.75, PRT=0.6 (PNT and PRT are not defined in the image, but are assumed to be connection probabilities)
**Right Side:**
* **Neuron Array Schematic:** Identical to the left side.
* **Connectivity Graph:** A graph showing connections between neurons 1-8, but with fewer connections compared to the left side.
* **Adjacency Matrix (Heatmap):** A matrix where rows and columns represent neuron IDs (1-8). Blue squares indicate a connection between the corresponding neurons, while light squares indicate no connection.
* X-axis: Neuron ID (1-8)
* Y-axis: Neuron ID (1-8)
* **Connection Probabilities:** PNT=0.30, PRT=0.05
### Detailed Analysis
**Left Side (High Connectivity):**
* **Connectivity Graph:** Neurons 3, 6, and 7 are highly interconnected. Neuron 7 has the most connections.
* **Adjacency Matrix:**
* Row 1: Connections to neurons 2, 3, 6, and 7.
* Row 2: Connections to neurons 1, 3, 5, 7, and 8.
* Row 3: Connections to neurons 1, 2, 4, 6, 7, and 8.
* Row 4: Connections to neurons 3, 5, 6, and 7.
* Row 5: Connections to neurons 2, 4, 6, 7, and 8.
* Row 6: Connections to neurons 1, 3, 4, 5, 7, and 8.
* Row 7: Connections to neurons 1, 2, 3, 4, 5, 6, and 8.
* Row 8: Connections to neurons 2, 3, 5, 6, and 7.
**Right Side (Low Connectivity):**
* **Connectivity Graph:** Neurons 3, 6, and 8 are interconnected.
* **Adjacency Matrix:**
* Row 1: Connections to neurons 2 and 3.
* Row 2: Connections to neurons 1 and 3.
* Row 3: Connections to neurons 1, 2, 6, and 8.
* Row 4: Connection to neuron 3.
* Row 5: Connection to neuron 8.
* Row 6: Connections to neurons 3, 7, and 8.
* Row 7: Connection to neuron 6.
* Row 8: Connections to neurons 3, 5, and 6.
### Key Observations
* The left side demonstrates a highly connected network, while the right side shows a sparsely connected network.
* The adjacency matrices visually confirm the connectivity patterns observed in the graphs.
* The connection probabilities (PNT and PRT) significantly influence the network's connectivity.
### Interpretation
The image illustrates how varying connection probabilities (PNT and PRT) affect the structure and connectivity of a neural network. Higher probabilities result in a densely connected network with numerous connections between neurons, as seen on the left. Lower probabilities lead to a sparse network with fewer connections, as seen on the right. The adjacency matrices provide a clear visual representation of these connectivity patterns, allowing for a quantitative analysis of the network's structure. The connectivity graphs provide a more intuitive representation of the network's structure. The combination of these representations provides a comprehensive understanding of the network's connectivity properties.
</details>
supplementary note 3 Routing tiles define the connectivity of spiking neural networks implemented on Mosaic. When many of the memristive devices in the routing tiles are in their high-conductive state (HCS), Mosaic resembles a densely connected neural network (Fig. A.10, top left). When most of the memristors in the routing tiles are in the low-conductance state (LCS), Mosaic is sparsely connected (Fig. A.10, top right). Furthermore, Mosaic networks can be sparsified further by setting memristors in the neuron tiles to the LCS. To do so, we can change the probability of memristors being in their HCS in the neuron tiles, pn, and in the routing tiles, pr. The switching of the RRAMs is probabilistic, with the switching probability depending on the voltage applied during the programming operation, as shown in Fig. A.10, bottom.
Fig. A.11 shows the construction of two graph topologies, made of 2 Neuron Tiles and one Routing Tile, to clarify the formation of the graphical structure in the Mosaic. By controlling the probability of connections within the Neuron and Routing Tiles, we can produce a densely connected graph (left) with pNT = 0.75, pRT = 0.6, and a sparse graph (right) with pNT = 0.30, pRT = 0.05.
The corresponding connectivity matrix is also shown in the figure; it maps directly onto the hardware architecture of the three Mosaic tiles.
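As a rough sketch of how pn and pr shape the resulting connectivity, and of how a programming voltage can target a desired switching probability, consider the following; all tile sizes, the sigmoid fit and the parameter values are illustrative assumptions, not the measured device model of Fig. A.10 (bottom).

```python
# Minimal sketch of pn / pr shaping a Mosaic-like connectivity pattern.
import numpy as np

rng = np.random.default_rng(0)

def program_tile(shape, p_hcs):
    """Bernoulli-sample which devices end up in the HCS (1) vs the LCS (0)."""
    return (rng.random(shape) < p_hcs).astype(int)

def set_probability(v_set, v50=1.0, slope=10.0):
    """Assumed sigmoidal SET probability as a function of programming voltage."""
    return 1.0 / (1.0 + np.exp(-slope * (v_set - v50)))

# Dense vs. sparse device patterns, in the spirit of Fig. A.10 (top)
dense_neuron_tile   = program_tile((4, 4), p_hcs=0.50)
sparse_routing_tile = program_tile((16, 16), p_hcs=0.05)

# Mapping programming voltages to an expected HCS probability
print(set_probability(np.array([0.8, 1.0, 1.2])))
```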
Figure A.12: Neuron tiles (green) transfer information in the form of spikes to each other through routing tiles (blue). Details of the Mosaic architecture are shown, together with the sizes of the neuron and routing tiles. The neuron tiles receive feed-forward input from the four directions North (N), East (E), West (W) and South (S), and local recurrent input from the neurons in the tile. The neurons integrate the information and, once they spike, send their output in the 4 directions. Having 4 neurons in a tile gives rise to 16 outputs (4 outputs copied in 4 directions) and 20 inputs (4 inputs from each of the 4 directions (16), plus 4 recurrent inputs). The routing tiles receive 16 inputs (4 inputs from each of the 4 directions) and send out 16 outputs (4 outputs in each of the 4 directions). In the crossbars, the red squares and black squares represent devices in their high conductive and low conductive state, respectively. The connection between the neuron tile and the routing tile is made directly through a wire. For instance, Vout<3:0> is the same as Vin,W<3:0>, and Vin,E<3:0> is the same as Vout,W<3:0>.
<details>
<summary>Image 50 Details</summary>

### Visual Description
## Diagram: Interconnected Processing Array
### Overview
The image depicts a schematic of an interconnected processing array, showcasing the data flow between processing elements. It illustrates the input and output voltage signals (V_in and V_out) for individual processing units and the overall array structure. The diagram highlights the connectivity and data routing within the array.
### Components/Axes
* **Processing Elements (PEs):** The array consists of multiple processing elements arranged in a grid-like structure. Each PE is represented by a square block, colored either light blue or light green.
* **Input Voltage Signals (V_in):** Input voltage signals are labeled as V_in, with directional subscripts (N, E, W, S) indicating the direction from which the signal originates (North, East, West, South). Each V_in signal is a 4-bit vector, denoted by "<3:0>".
* **Output Voltage Signals (V_out):** Output voltage signals are labeled as V_out, with directional subscripts (N, E, W, S) indicating the direction in which the signal is sent. Each V_out signal is a 4-bit vector, denoted by "<3:0>".
* **Internal Structure of PEs:** Two zoomed-in views of the internal structure of the PEs are shown at the top of the diagram. These views illustrate the arrangement of internal components (represented by black and red squares) and the routing of signals within the PE.
* **Array Dimensions:** The array is composed of blocks of "20 x 4 (in 4 dir.)" and "16 x 16" processing elements.
* **Arrows:** Arrows indicate the direction of data flow between processing elements. Blue arrows represent data flow in one direction, while green arrows represent data flow in another direction.
### Detailed Analysis
**Zoomed-In PE Structures:**
* **Left PE Structure:** This structure shows a 4x4 grid of interconnected components. The input signals V_in,N, V_in,E, V_in,W, and V_in,S enter the PE. Red squares are interspersed among black squares. Four output signals, V_out<0>, V_out<1>, V_out<2>, and V_out<3>, are generated by four triangle-shaped components.
* **Right PE Structure:** This structure also shows a grid of interconnected components. The input signals V_in,N, V_in,E, V_in,W, and V_in,S enter the PE. Red squares are interspersed among black squares. The output signals V_out,N, V_out,E, V_out,W, and V_out,S exit the PE.
**Array Interconnection:**
* The processing elements are interconnected in a grid-like fashion.
* Arrows indicate the direction of data flow between adjacent PEs.
* The array appears to be expandable, as indicated by the ellipsis (...) symbols at the edges.
### Key Observations
* The diagram emphasizes the data flow and connectivity within the processing array.
* The use of directional subscripts for V_in and V_out signals indicates a structured data routing scheme.
* The internal structure of the PEs suggests a complex processing capability within each element.
### Interpretation
The diagram illustrates a parallel processing architecture where data is processed and routed between interconnected processing elements. The use of 4-bit voltage signals suggests a digital processing approach. The grid-like arrangement and directional data flow indicate a systolic array-like architecture, where data is pumped through the array in a synchronized manner. The different colors of the processing elements (light blue and light green) might indicate different types of processing units or different states within the array. The diagram provides a high-level overview of the array's structure and data flow, highlighting its potential for parallel processing applications.
</details>
supplementary note 4 Figure A.12 shows the details of the Mosaic architecture, with a zoomed-in neuron and routing tile pair. The diagram at the top shows how one neuron tile/router sends and receives information to and from the neighbouring routing/neuron tile. This highlights a strength of the architecture: connectivity is realized through simple wiring to the neighbour, without suffering from long wires, as the maximum length of a wire is the length of one row/column plus the length of the connecting column/row.
Figure A.13: Schematic of the neuron tile including the CMOS synapse and neuron circuits fabricated for use in this paper. RRAMs are used as the weights of the neurons. On the arrival of any of the input events Vin<i>, the amplifier pins node Vx to Vtop and thus a read voltage equivalent to Vtop - Vbot is applied across Gi, giving rise to current iin at M1. This current is mirrored to M2, giving rise to ibuff, which is in turn mirrored again through the M3-M4 transistor pair. The 'synaptic dynamics' circuit is the Differential Pair Integrator (DPI) [ 311 ]. On the arrival of any of the input events Vi, 0 < i < n, a current Iw, equivalent to ibuff, flows in transistor M5. Depending on the value of Vg, a portion of Iw flows out of the MOS capacitor M6 and discharges it. This current is proportional to Gi, 0 < i < n. As soon as the event is gone, MOS capacitor M6 charges back through the M8 path with current Itau, which determines the rate of charging, and thus the time constant of the synaptic dynamics. The output current of the DPI synapse, Isyn, is injected into the neuron's membrane potential node, Vmem, and charges MOS capacitor M13. There is also an alternative path with a DC current input through M17 which can charge the neuron's membrane potential. Membrane potential charging has a time constant determined by Vlk at the gate of M11. As soon as the voltage developed on Vmem passes the threshold of the following inverter stage, it generates a pulse. The width of the pulse depends on the delay of the feedback path from Vout to the gate of M12. This delay is determined by the inverter delays and the refractory time constant. The inverter symbols with the horizontal dashed lines correspond to starved inverter circuits with longer delays. The refractory period time constant depends on the MOS cap M16 and the bias on Vrp.
<details>
<summary>Image 51 Details</summary>

### Visual Description
## Circuit Diagram: Memristor Crossbar Readout Circuit
### Overview
The image presents a detailed circuit diagram of a memristor crossbar readout circuit. It includes several key components: a memristor crossbar array, a buffer stage, a write pulse generator, a sense amplifier, and an output stage. The diagram illustrates the flow of signals and the interconnections between these components.
### Components/Axes
* **Memristor Crossbar Array (Left):**
* Memristors labeled as G0, G1, ..., GN.
* Input voltages: Vin<0>, Vin<1>, ..., Vin<N> at the bottom.
* Voltage Vx at the top of the memristors.
* Voltage Vbot at the bottom of the memristors.
* Current iin flowing through transistor M1.
* Voltage Vtop at the output of the op-amp.
* **Buffer Stage (Green Box):**
* Transistors M2 and M3.
* Output current ibuff.
* **Write Pulse Generator (Gray Box):**
* Transistors M4, M5, M6, M7, M8, M9, and M10.
* Input voltages: Vin<0>, Vin<1>, ..., Vin<N>.
* Voltage Vtau.
* Voltage Vg.
* Write voltage Vw.
* Output current Isyn.
* A pulse waveform is shown above the gray box.
* **Sense Amplifier and Output Stage (Blue Box):**
* Transistors M11, M12, M13, M14, M15, M16, and M17.
* Voltage Vdc.
* Voltage Vmem.
* Voltage Vik.
* Voltage Vrp.
* Output voltage Vout.
* A series of inverters are present in the middle of the blue box.
* A waveform is shown above the blue box, representing Vout.
### Detailed Analysis
* **Memristor Crossbar Array:** The array consists of N+1 memristors, each connected to a transistor. The input voltages Vin<0> to Vin<N> control the transistors, allowing current to flow through selected memristors.
* **Buffer Stage:** The buffer stage, comprising transistors M2 and M3, provides isolation and amplification for the signal coming from the memristor array. The output current ibuff is fed into the write pulse generator.
* **Write Pulse Generator:** This circuit generates a write pulse based on the input voltages and the state of the memristors. The NOR gate combines the input voltages. The transistors M4 through M10 control the timing and amplitude of the write pulse.
* **Sense Amplifier and Output Stage:** The sense amplifier detects the current flowing through the memristor array and converts it into a voltage signal. The inverters amplify and shape the signal. The transistors M14 through M16 form an output buffer to drive the output voltage Vout.
### Key Observations
* The circuit is designed to read out the state of a memristor crossbar array.
* The write pulse generator is used to program the memristors.
* The sense amplifier and output stage provide a clean and amplified output signal.
### Interpretation
The circuit diagram illustrates a complete system for reading and writing data to a memristor crossbar array. The memristor array stores data as resistance values. The buffer stage isolates the array from the write pulse generator. The write pulse generator applies appropriate voltages to program the memristors. The sense amplifier and output stage convert the resistance values into a readable voltage signal. The circuit is designed for non-volatile memory applications, where data is stored even when power is off.
</details>
Figure A.14: Measurements of the fabricated neuron's output frequency as a function of the input DC voltage. The DC voltage is applied at the gate of transistor M17, shown in Fig. A.13 as Vdc. Therefore, as the gate voltage of M17 changes linearly, the current of M17, and thus the output frequency of the neuron, changes non-linearly. Each curve is measured with a different neuron time constant, set by a different voltage Vlk on the gate of transistor M11 in Fig. A.13. As the leak voltage increases, the neuron's time constant decreases, giving rise to a lower output frequency.
<details>
<summary>Image 52 Details</summary>

### Visual Description
## Line Chart: Output Frequency vs. Vdc for Different Vlk Values
### Overview
The image is a line chart showing the relationship between output frequency (in kHz) and Vdc (in mV) for various values of Vlk (also in mV). The chart displays multiple lines, each representing a different Vlk value, allowing for a comparison of how Vdc affects output frequency at different Vlk settings.
### Components/Axes
* **X-axis:** Vdc (mV), ranging from 200 to 400 in increments of 25.
* **Y-axis:** Output Frequency (kHz), ranging from 0 to 80 in increments of 20.
* **Legend:** Located in the top-left corner, the legend identifies each line by its corresponding Vlk value:
* Blue: Vlk = 260mV
* Green: Vlk = 270mV
* Orange: Vlk = 280mV
* Red: Vlk = 290mV
* Purple: Vlk = 300mV
* Black: Vlk = 320mV
* Gray: Vlk = 350mV
* Dark Blue: Vlk = 370mV
### Detailed Analysis
Here's a breakdown of each data series, including trend descriptions and approximate data points:
* **Vlk = 260mV (Blue):** The output frequency remains at 0 kHz until Vdc reaches approximately 375 mV. Then, the output frequency increases sharply to approximately 50 kHz at Vdc = 400 mV.
* **Vlk = 270mV (Green):** The output frequency remains at 0 kHz until Vdc reaches approximately 325 mV. Then, the output frequency increases to approximately 75 kHz at Vdc = 400 mV.
* **Vlk = 280mV (Orange):** The output frequency remains at 0 kHz until Vdc reaches approximately 325 mV. Then, the output frequency increases to approximately 80 kHz at Vdc = 400 mV.
* **Vlk = 290mV (Red):** The output frequency remains at 0 kHz until Vdc reaches approximately 325 mV. Then, the output frequency increases to approximately 82 kHz at Vdc = 400 mV.
* **Vlk = 300mV (Purple):** The output frequency remains at 0 kHz until Vdc reaches approximately 325 mV. Then, the output frequency increases to approximately 70 kHz at Vdc = 400 mV.
* **Vlk = 320mV (Black):** The output frequency remains at 0 kHz until Vdc reaches approximately 350 mV. Then, the output frequency increases to approximately 50 kHz at Vdc = 400 mV.
* **Vlk = 350mV (Gray):** The output frequency remains at 0 kHz until Vdc reaches approximately 375 mV. Then, the output frequency increases to approximately 45 kHz at Vdc = 400 mV.
* **Vlk = 370mV (Dark Blue):** The output frequency remains at 0 kHz until Vdc reaches approximately 375 mV. Then, the output frequency increases to approximately 20 kHz at Vdc = 400 mV.
### Key Observations
* For all Vlk values, the output frequency remains at or near 0 kHz until Vdc reaches a certain threshold.
* The threshold Vdc value at which the output frequency starts to increase varies depending on the Vlk value. Higher Vlk values generally result in lower threshold Vdc values.
* The rate of increase in output frequency with respect to Vdc is generally very steep once the threshold is reached.
* The output frequency at Vdc = 400 mV varies significantly depending on the Vlk value.
### Interpretation
The chart demonstrates the relationship between Vdc, Vlk, and output frequency. It suggests that Vlk acts as a control parameter, influencing the threshold Vdc value required to initiate a significant output frequency. The data indicates that increasing Vlk generally lowers the Vdc threshold for output frequency activation. This relationship could be crucial in designing and optimizing circuits where precise control over output frequency is required based on specific voltage parameters. The steep increase in output frequency after the threshold suggests a non-linear relationship, possibly indicative of a switching behavior or a rapid amplification effect within the circuit.
</details>
supplementary note 5 Details of the implementation of the neuron row, the circuit that uses the conductance of a memristor to weight the effect of a spike on a neuron, are shown in Figure A.13. The circuit features multiple inputs connected to a row of memristive devices (left) and a front-end circuit buffering the current read from the devices into a differential-pair-integrator synapse. The synapse is then connected to a leaky integrate-and-fire (LIF) neuron which eventually emits a spike. Figure A.14 delves deeper into the behavior of the LIF neuron, analyzing its output spiking frequency against an input DC voltage and its linear behavior with respect to the RRAM conductance in a neuron row circuit.
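A behavioral, discrete-time sketch of this neuron row is given below: a spike is weighted by a memristor conductance, filtered by a first-order DPI-like synapse, and integrated by a LIF neuron with a refractory period. All constants are illustrative assumptions; the fabricated circuit operates in continuous time with subthreshold CMOS dynamics.

```python
# Behavioral sketch of the neuron row of Fig. A.13 (illustrative parameters only).
import numpy as np

dt, T = 1e-4, 0.2                      # 0.1 ms step, 200 ms simulation
tau_syn, tau_mem = 5e-3, 20e-3         # synaptic / membrane time constants (assumed)
G = 50e-6                              # device conductance (S), sets the spike weight
v_th, t_ref = 1.0, 2e-3                # firing threshold and refractory period (assumed)

steps = int(T / dt)
spikes_in = np.random.default_rng(1).random(steps) < 0.02   # Poisson-like input spikes

i_syn, v_mem, refrac = 0.0, 0.0, 0.0
out_spikes = []
for t in range(steps):
    # DPI-like first-order decay plus a conductance-weighted jump on each input spike
    i_syn += dt * (-i_syn / tau_syn) + G * spikes_in[t]
    if refrac > 0:
        refrac -= dt                                         # neuron held in reset
        v_mem = 0.0
    else:
        v_mem += dt * (-v_mem / tau_mem + i_syn / 1e-6)      # leaky integration (scaled)
        if v_mem > v_th:                                     # threshold crossing -> spike
            out_spikes.append(t * dt)
            v_mem, refrac = 0.0, t_ref

print(f"{len(out_spikes)} output spikes in {T * 1e3:.0f} ms")
```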
Figure A.15: Schematic of a routing tile circuit offering two paths per direction. The routing tile receives eight inputs, comprising two pulse channels per direction, labelled <0> or <1>, from the neighbouring tiles to the North (N), South (S), East (E) and West (W), and provides complementary outputs. An example is shown of an input pulse arriving at the common gate of the fourth row of memory. Devices are coloured green or red to denote whether they are in the HCS or LCS. Due to this input pulse, output pulses are produced by the routing columns containing the (green) devices programmed in the HCS.
<details>
<summary>Image 53 Details</summary>

### Visual Description
## Memristor Crossbar Array Diagram
### Overview
The image depicts a memristor crossbar array, a type of non-volatile memory. It shows an 8x6 grid of memristors, with input and output lines labeled. Some memristors are highlighted in red (LCS - Low Conductance State) and green (HCS - High Conductance State), indicating their state. The diagram also includes voltage source labels and pulse signal representations.
### Components/Axes
* **Grid:** An 8x6 grid representing the memristor array.
* **Memristors:** Each cell in the grid contains a memristor symbol.
* **Input Lines:** Labeled `Vin<0,N>`, `Vin<1,N>`, `Vin<0,S>`, `Vin<1,S>`, `Vin<0,W>`, `Vin<1,W>`, `Vin<0,E>`, `Vin<1,E>` on the left side of the grid.
* **Output Lines:** Labeled `Vout<0,N>`, `Vout<1,N>`, `Vout<0,S>`, `Vout<1,S>`, `Vout<0,W>`, `Vout<1,W>`, `Vout<0,E>`, `Vout<1,E>` at the bottom of the grid.
* **Voltage Source:** Labeled `Vbot` above each column of the grid.
* **Legend:** Located on the left side, indicating:
* Red: LCS (Low Conductance State)
* Green: HCS (High Conductance State)
* **Pulse Signal:** A square wave pulse signal is shown on the left side, pointing to the right, and another at the bottom, pointing downwards.
### Detailed Analysis or Content Details
* **Memristor States:**
* Row 3 (corresponding to `Vin<0,S>`): Contains one red (LCS) memristor in column 2, one green (HCS) memristor in column 3, and one red (LCS) memristor in column 5.
* Row 4 (corresponding to `Vin<1,S>`): Contains one red (LCS) memristor in column 1, one red (LCS) memristor in column 4, and one green (HCS) memristor in column 6.
* **Grid Structure:** The memristors are connected in a crossbar fashion, with horizontal lines representing input lines and vertical lines representing output lines.
* **Voltage Source Placement:** The `Vbot` voltage source is connected to the top of each column.
* **Input/Output Line Labels:** The input and output lines are labeled with a combination of 0/1 and N/S/W/E, likely representing different input/output configurations or addresses.
* **Pulse Signal Direction:** The input pulse signal points to the right, while the output pulse signal points downwards.
### Key Observations
* The memristor array is designed to store data based on the conductance state of individual memristors.
* The red and green memristors indicate the stored data pattern.
* The input and output lines are labeled in a structured manner, suggesting a specific addressing scheme.
* The pulse signals likely represent the read/write operations performed on the array.
### Interpretation
The diagram illustrates the architecture and operation of a memristor crossbar array. The array stores data by changing the conductance state of individual memristors. The input and output lines allow for addressing and accessing specific memristors within the array. The `Vbot` voltage source provides the necessary voltage for reading and writing data. The pulse signals represent the timing and direction of the read/write operations. The specific pattern of LCS and HCS memristors represents a particular data pattern stored in the array. The N/S/W/E notation in the input/output labels might refer to different directions or polarities of the applied voltage or current. The 0/1 notation might refer to different voltage levels or binary states.
</details>
Figure A.16: The routing column circuit with example waveforms. Input (red, left) voltage pulses, Vin, draw a current iin proportional to the conductance state, Gn, of the read 1T1R structures. Two devices are labelled HCS, indicating that they have been programmed with a conductance corresponding to the high conductance state, and one is labelled LCS in reference to the low conductance state. The resulting currents are buffered (green, centre) as ibuff into a current comparator circuit, where they are compared with a reference current iref. When the buffered current exceeds the reference current, a voltage pulse is generated at the column output (blue, right).
<details>
<summary>Image 54 Details</summary>

### Visual Description
## Circuit Diagram with Timing Charts
### Overview
The image presents a circuit diagram coupled with three timing charts illustrating the voltage and current behavior at different points in the circuit. The circuit involves memristors, transistors, and an operational amplifier, ultimately producing an output voltage. The timing charts show the input voltage pulses, buffer current, and output voltage over time.
### Components/Axes
**Timing Charts (Top Row):**
* **X-axis (all charts):** Time (ms), ranging from 0 to 3 ms.
* **Top-Left Chart:**
* Y-axis: Voltage (V), ranging from 0 to 1.25 V.
* Data Series:
* `Vin<0>`: Light orange pulses at approximately 0.5 ms.
* `Vin<1>`: Red pulses at approximately 1.5 ms.
* `Vin<N>`: Dark red pulses at approximately 2.5 ms.
* **Top-Middle Chart:**
* Y-axis: Current (μA), ranging from 0 to 2.5 μA.
* Data Series:
* `Ibuff`: Green pulses at approximately 1 ms and 2 ms.
* Annotations: "HCS" (High Conductance State) above the pulses, "LCS" (Low Conductance State) below the first pulse.
* **Top-Right Chart:**
* Y-axis: Current (μA), ranging from 0 to 1.25 μA.
* Data Series:
* `Vout`: Blue pulses at approximately 2 ms.
**Circuit Diagram (Bottom):**
* **Memristor Array:** Labeled `G0`, `G1`, ..., `GN`. Connected to input voltages `Vin<0>`, `Vin<1>`, ..., `Vin<N>`.
* **Input Voltages:** `Vin<0>`, `Vin<1>`, `Vin<N>` are connected to transistors.
* **Operational Amplifier:** Labeled with "+" and "-". Inputs are `Vx` and `Vtop`.
* **Buffer Stage:** Enclosed in a green box, labeled `Ibuff`.
* **Output Stage:** Enclosed in a blue box, labeled `Vout`.
* **Currents:** `Iin` and `Iref` are labeled.
* **Voltage:** `Vbot` is labeled.
### Detailed Analysis or Content Details
**Timing Charts:**
* **`Vin<0>`:** A voltage pulse of approximately 1.25V at around 0.5 ms.
* **`Vin<1>`:** A voltage pulse of approximately 1.25V at around 1.5 ms.
* **`Vin<N>`:** A voltage pulse of approximately 1.25V at around 2.5 ms.
* **`Ibuff`:** Current pulses of approximately 2.25 μA at around 1 ms and 2 ms.
* **`Vout`:** A current pulse of approximately 1.25 μA at around 2 ms.
**Circuit Diagram:**
* The memristor array is connected to input voltages via transistors.
* The output of the memristor array feeds into an operational amplifier.
* The operational amplifier's output drives a buffer stage.
* The buffer stage's output drives the final output stage, producing `Vout`.
### Key Observations
* The input voltage pulses (`Vin<0>`, `Vin<1>`, `Vin<N>`) occur at different times.
* The buffer current (`Ibuff`) pulses correspond to the input voltage pulses.
* The output voltage (`Vout`) pulse occurs after the buffer current pulse.
* The annotations "HCS" and "LCS" indicate the high and low conductance states of the memristors.
### Interpretation
The diagram illustrates a circuit that uses memristors as input elements. The input voltages control the conductance of the memristors, which in turn affects the current flowing through the circuit. The operational amplifier and buffer stage amplify and shape the signal before it reaches the output. The timing charts show how the voltage and current change at different points in the circuit over time. The "HCS" and "LCS" annotations suggest that the memristors are switching between high and low conductance states, which is a key characteristic of memristor behavior. The circuit appears to be implementing a logic function or signal processing operation based on the memristor array.
</details>
supplementary note 6 Details on the implementation of the Routing Tiles. Figure A.15 shows a full-size schematic of a routing tile with 2 neurons allocated per direction. Figure A.16 expands on the details of the implementation of the routing column, the circuit that uses the state of a memristor to decide whether to block or pass (route) a spike through the Mosaic architecture.
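Functionally, a routing column can be sketched as a thresholded read of its devices: a spike is routed onward only if the summed read current exceeds the comparator reference. The conductance values, read voltage and reference current below are assumptions for illustration, not the fabricated circuit's parameters.

```python
# Functional sketch of the routing column of Fig. A.16.
import numpy as np

G_HCS, G_LCS = 50e-6, 1e-6          # assumed high / low conductance states (S)
V_READ = 0.2                         # assumed read voltage (V)
I_REF = 5e-6                         # assumed comparator reference current (A)

def routing_column(spikes_in, conductances):
    """Return True if the column emits an output pulse for this input vector."""
    i_buff = V_READ * np.dot(spikes_in, conductances)   # summed read current
    return i_buff > I_REF                               # current comparator decision

column = np.array([G_HCS, G_LCS, G_HCS])                # two HCS devices, one LCS
print(routing_column(np.array([1, 0, 0]), column))      # spike hits an HCS device -> True (routed)
print(routing_column(np.array([0, 1, 0]), column))      # spike hits the LCS device -> False (blocked)
```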
Figure A.17: An example of how a neuron tile can be interfaced to external event-based inputs (i.e., those generated by an event-based sensor). With respect to the neuron tile circuit presented in the paper (permitting connections to adjacent tiles as well as recurrent connections within the tile), this figure shows two additional rows of devices stacked on top of the array. In this arbitrary example, two additional signals can be integrated by the neuron circuits.
<details>
<summary>Image 55 Details</summary>

### Visual Description
## Memristor Array Diagram
### Overview
The image depicts a memristor crossbar array with input and output connections. It shows a 2x3 grid of memristor devices, with voltage inputs on the rows and current buffers on the columns. The diagram illustrates how external inputs are connected to the array, and how output voltages are derived from the current buffers.
### Components/Axes
* **Memristor Devices:** Represented by a symbol resembling a resistor with a curved line inside.
* **Input Voltages:** Labeled as V<sub>in</sub><0> and V<sub>in</sub><1>, indicating two input lines.
* **Output Voltages:** Labeled as V<sub>out</sub><0> and V<sub>out</sub><1>, indicating two output lines.
* **Bottom Voltages:** Labeled as V<sub>bot</sub> at the top of each column.
* **Current Buffers:** Represented by circles labeled as i<sub>buff</sub>.
* **External Inputs:** An arrow pointing into the array, labeled "External inputs".
* **Gray Shaded Region:** Highlights a portion of the array, specifically the top row.
### Detailed Analysis
The diagram shows a 2x2 array of memristors, with each memristor connected between a horizontal input line and a vertical output line.
* **Input Lines:** Two horizontal input lines are labeled V<sub>in</sub><0> and V<sub>in</sub><1>.
* **Output Lines:** Two vertical output lines connect to current buffers (i<sub>buff</sub>), which then connect to the output voltages V<sub>out</sub><0> and V<sub>out</sub><1>.
* **Top Row:** The top row of memristors is connected to V<sub>bot</sub>.
* **External Inputs:** The "External inputs" arrow indicates that the array receives external signals.
* **Memristor Connections:** Each memristor has one terminal connected to a horizontal input line and the other terminal connected to a vertical output line.
### Key Observations
* The diagram illustrates a basic memristor crossbar architecture.
* The input voltages control the state of the memristors.
* The current buffers provide stable output voltages.
* The gray shaded region highlights a specific section of the array, possibly indicating a particular operation or configuration.
### Interpretation
The diagram represents a fundamental building block for memristor-based memory or computation. The memristor array allows for storing and processing information based on the resistance states of the individual memristors. The input voltages determine the state of the memristors, and the output voltages reflect the stored information or the result of a computation. The "External inputs" suggest that the array can be integrated into a larger system. The use of current buffers ensures that the output voltages are stable and can be reliably read.
</details>
## supplementary note 7
Figure A.18: SHD keyword spotting dataset test accuracies for Mosaic architectures with different total numbers of neurons in the network, for a) a 4 × 4 Neuron Tile layout (a total of 16 Neuron Tiles) and b) an 8 × 8 Neuron Tile layout. The number of neurons per tile is equal to the total number of recurrent neurons divided by the number of neuron tiles. Median and standard deviation are calculated over 3 experiments with varying sparsity constraints.
<details>
<summary>Image 56 Details</summary>

### Visual Description
## Bar Chart: SHD Test Accuracy vs. Total Recurrent Neurons
### Overview
The image presents two bar charts comparing SHD (Spiking Heidelberg Digits) test accuracy against the total number of recurrent neurons. Chart (a) displays results for 16 neuron tiles, while chart (b) shows results for 64 neuron tiles. Each chart compares the test accuracy for 2048, 1024, 512, and 256 total recurrent neurons. Error bars are included on each bar, indicating the variability in the test accuracy.
### Components/Axes
* **Chart Titles:**
* (a) Number of Neuron Tiles: 16
* (b) Number of Neuron Tiles: 64
* **Y-axis Title:** Test accuracy
* **Y-axis Scale:** 0 to 60, with no explicit markings, but implied increments of 20.
* **X-axis Title:** Total recurrent neurons
* **X-axis Categories:** 2048, 1024, 512, 256
* **Bar Colors:** Green (2048), Light Blue (1024), Gray (512), Red (256)
### Detailed Analysis
**Chart (a): Number of Neuron Tiles: 16**
* **2048 Neurons (Green):** Test accuracy is approximately 68% with a small error bar.
* **1024 Neurons (Light Blue):** Test accuracy is approximately 64% with a small error bar.
* **512 Neurons (Gray):** Test accuracy is approximately 59% with a moderate error bar.
* **256 Neurons (Red):** Test accuracy is approximately 53% with a moderate error bar.
**Chart (b): Number of Neuron Tiles: 64**
* **2048 Neurons (Green):** Test accuracy is approximately 68% with a small error bar.
* **1024 Neurons (Light Blue):** Test accuracy is approximately 64% with a small error bar.
* **512 Neurons (Gray):** Test accuracy is approximately 47% with a large error bar.
* **256 Neurons (Red):** Test accuracy is approximately 27% with a moderate error bar.
### Key Observations
* In both charts, the test accuracy generally decreases as the number of total recurrent neurons decreases from 2048 to 256.
* The test accuracy for 2048 and 1024 neurons is nearly identical between the 16 and 64 neuron tile configurations.
* The test accuracy for 512 and 256 neurons is significantly lower in the 64 neuron tile configuration compared to the 16 neuron tile configuration.
* The error bars tend to be larger for lower numbers of neurons, especially in the 64 neuron tile configuration.
### Interpretation
The data suggests that increasing the number of neuron tiles from 16 to 64 negatively impacts the test accuracy when using fewer total recurrent neurons (512 and 256). This could indicate that a higher number of neuron tiles requires a larger number of total recurrent neurons to maintain optimal performance. The larger error bars for lower neuron counts, especially with 64 tiles, suggest that the performance becomes less stable and more variable under these conditions. The similar performance at 2048 and 1024 neurons regardless of the number of tiles suggests that the network is less sensitive to the number of tiles when a sufficient number of neurons are available.
</details>
<details>
<summary>Image 57 Details</summary>

### Visual Description
## Bar Chart: SHD Test Accuracy vs. Sparsity Regularization Constant
### Overview
The image presents two bar charts comparing SHD (Spiking Heidelberg Digits) test accuracy against different sparsity regularization constants (lambda). The chart on the left represents results with 16 neuron tiles, while the chart on the right represents results with 64 neuron tiles. Each chart displays test accuracy for regularization constants of 0.1, 0.05, and 0.01. Error bars are included to indicate the variability in the test accuracy.
### Components/Axes
* **Titles:**
* Left Chart: "Number of Neuron Tiles: 16"
* Right Chart: "Number of Neuron Tiles: 64"
* Both Charts: "SHD Test Accuracy vs. Sparsity Regularization Constant"
* **Y-axis:**
* Label: "Test accuracy"
* Scale: 0 to 60, with tick marks at 20, 40, and 60.
* **X-axis:**
* Label: "Regularization constant, λ"
* Categories: 0.1, 0.05, 0.01
* **Bar Colors:**
* 0.1: Light Purple
* 0.05: Medium Purple
* 0.01: Dark Purple
### Detailed Analysis
**Chart A (16 Neuron Tiles):**
* **Regularization Constant 0.1 (Light Purple):**
* Test accuracy: Approximately 62
* Error bar extends from approximately 42 to 72.
* **Regularization Constant 0.05 (Medium Purple):**
* Test accuracy: Approximately 64
* Error bar extends from approximately 54 to 74.
* **Regularization Constant 0.01 (Dark Purple):**
* Test accuracy: Approximately 68
* Error bar extends from approximately 58 to 78.
**Chart B (64 Neuron Tiles):**
* **Regularization Constant 0.1 (Light Purple):**
* Test accuracy: Approximately 62
* Error bar extends from approximately 42 to 72.
* **Regularization Constant 0.05 (Medium Purple):**
* Test accuracy: Approximately 64
* Error bar extends from approximately 54 to 74.
* **Regularization Constant 0.01 (Dark Purple):**
* Test accuracy: Approximately 68
* Error bar extends from approximately 58 to 78.
### Key Observations
* In both charts, test accuracy appears to increase slightly as the regularization constant (lambda) decreases from 0.1 to 0.01.
* The error bars are relatively large, indicating substantial variability in the test accuracy for each regularization constant.
* The test accuracy values and error bar ranges are nearly identical between the 16 neuron tiles and 64 neuron tiles configurations.
### Interpretation
The data suggests that decreasing the sparsity regularization constant (lambda) from 0.1 to 0.01 may lead to a slight improvement in SHD test accuracy, regardless of whether 16 or 64 neuron tiles are used. However, the large error bars indicate that this improvement may not be statistically significant. The similarity in results between the 16 and 64 neuron tile configurations suggests that increasing the number of neuron tiles from 16 to 64 does not significantly impact the test accuracy within the tested range of regularization constants. The error bars are large, which could be due to a number of factors, including the inherent variability in the SHD algorithm, the size of the test dataset, or the specific parameters used in the simulation. Further investigation with more data points and statistical analysis would be needed to confirm these trends and determine the optimal regularization constant for maximizing test accuracy.
</details>
(b)
Figure A.19: SHD keyword spotting dataset test accuracies for Mosaic architectures trained with different sparsity regularization values. As explained in the Methods section of the main text, the regularization is added to the loss function to exponentially penalize the long-range connections. The plot shows the accuracy for strong (default, λ = 0.1), medium (λ = 0.05) and weaker (λ = 0.01) sparsity regularization on a) the 4 × 4 neuron tile layout and b) the 8 × 8 neuron tile layout. Median and standard deviation are calculated over 4 experiments with a varying number of neurons per neuron tile.
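For illustration only, the snippet below is a minimal sketch of such a layout-aware penalty, not the implementation used in the thesis: it assumes an L1 term on each recurrent weight, scaled exponentially by the Manhattan distance between the tiles hosting the pre- and post-synaptic neurons. The names `exp_distance_penalty`, `tile_xy` and the distance scale `d0` are illustrative assumptions.

```python
import numpy as np

def exp_distance_penalty(W, tile_xy, lam=0.1, d0=1.0):
    """Layout-aware sparsity term: lam * sum_ij |W_ij| * exp(d_ij / d0),
    where d_ij is the Manhattan distance between the tiles of neurons i and j."""
    # Pairwise Manhattan distance between the tiles hosting each neuron pair.
    d = np.abs(tile_xy[:, None, :] - tile_xy[None, :, :]).sum(axis=-1)
    return lam * np.sum(np.abs(W) * np.exp(d / d0))

# Example: 64 recurrent neurons, 4 per tile, on a 4 x 4 tile grid.
rng = np.random.default_rng(0)
tile_xy = np.array([((i // 4) % 4, (i // 4) // 4) for i in range(64)])
W = rng.normal(scale=0.1, size=(64, 64))
loss_reg = exp_distance_penalty(W, tile_xy, lam=0.1)  # added to the task loss
```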
## supplementary note 8
<details>
<summary>Image 58 Details</summary>

### Visual Description
## Circuit Diagram: Memristor Circuit with Op-Amp Feedback
### Overview
The image depicts an electronic circuit incorporating a memristor, two MOSFET transistors (M1 and M2), and an operational amplifier (op-amp). The circuit appears to be designed to control the output current (I_out) based on the input voltage (V_in) and the memristor's conductance (G_i).
### Components/Axes
* **Transistors:** Two MOSFET transistors labeled M1 and M2.
* **Memristor:** A memristor element labeled G_i.
* **Op-Amp:** An operational amplifier with inputs labeled "+" and "-". The gain is labeled "A".
* **Voltages:** V_bot, V_x, V_in, V_top, V_out.
* **Currents:** i_in, I_out.
### Detailed Analysis
* **Transistor M1:** The drain of M1 is connected to V_x. The source of M1 is connected to the output voltage V_out. The gate of M1 is connected to the power supply.
* **Transistor M2:** The drain of M2 is connected to the output voltage V_out. The source of M2 is connected to ground. The gate of M2 is connected to the power supply.
* **Memristor G_i:** The memristor is connected between the node where V_x and i_in are defined and the drain of another transistor.
* **Input Transistor:** A transistor is connected between the memristor and V_bot. The gate of this transistor is connected to V_in.
* **Op-Amp:** The positive input (+) of the op-amp is connected to the node where V_x and i_in are defined. The negative input (-) of the op-amp is connected to V_top. The output of the op-amp is connected to the output voltage V_out.
### Key Observations
* The circuit uses an op-amp in a feedback configuration, likely to regulate the output voltage or current.
* The memristor's conductance (G_i) plays a crucial role in determining the circuit's behavior.
* The input voltage (V_in) controls the current through the transistor connected to the memristor.
* The output current (I_out) is controlled by the transistor M2.
### Interpretation
This circuit likely functions as a current source or a voltage-controlled resistor, where the memristor's state influences the output. The op-amp provides feedback to maintain a specific relationship between the input voltage, memristor conductance, and output current. The memristor's non-volatile resistance property could be exploited for memory or adaptive circuit applications. The circuit's behavior would depend on the specific characteristics of the memristor and the op-amp's gain.
</details>
Figure A.20: Analysis of the read-out circuitry. The amplifier with gain A pins the voltage Vx to the voltage Vtop. On the arrival of a pulse on Vin<i>, a current equal to i_in = (Vtop - Vbot) Gi flows into memristor i, which is then mirrored out as i_out.
## supplementary note 9
Figure A.20 details the implementation of the read-out circuit used in the Mosaic architecture. Though not optimized for area, we have used this implementation for both the neuron and routing tiles.
The dominant power consumption of the circuit depends on the required bandwidth (BW) of the feedback loop. This BW depends on the maximum conductance of the RRAM, Gmax. For Gi,max, once an input arrives at Vin<i>, the current i_in has to settle to (Vtop - Vbot) Gi,max within a settling time ts, a fraction of the pulse width. This timing sets the speed at which the loop must operate, and thus its BW. If the loop does not close in this time, the amplifier slews and the voltage Vx drops. In both the neuron and routing tiles, this condition must be met for Vx to stay pinned at Vtop while the RRAM is being read. However, the two tiles have different BW requirements.
In the neuron tile, the read-out circuitry has to resolve at least 8 current levels, one for each of the 8 conductance levels that an RRAM device can take. Therefore, the Least Significant Bit (LSB) of the i_in current for the neuron tile is i_in,LSB,N = Vref (Gmax - Gmin) / N. Based on Fig. 5.2d, this value is 100 mV × (120 µS - 40 µS) / 8 = 1 µA. Note that since the 8 levels to be resolved lie in the Low Resistive State (LRS) of the RRAM, Gmin and Gmax are the bounds of the LRS range, which correspond to 40 µS and 120 µS.
In the routing tile, the read-out circuitry has to resolve between two levels, which will either let the spike regenerate and thus propagate, or block it. Therefore, the LSB of the i_in current in the routing tile is i_in,LSB,R = Vref (Gmax - Gmin) / N. Based on Fig. 5.2d, this value for the routing tile is 100 mV × (40 µS - 10 µS) / 2 = 15 µA. Note that since the 2 levels to be resolved are the LRS and HRS of the RRAM, Gmin and Gmax correspond to 10 µS and 40 µS.
To distinguish between any two levels in both cases, we consider a maximum error of i_in,LSB / 2. Therefore, the maximum tolerable error is 0.5 µA in the neuron tile and 7.5 µA in the routing tile.
This means that if the feedback loop does not close within ts of the pulse width, a drop in Vx is far more tolerable in the routing tile than in the neuron tile. The bandwidth requirement of the neuron tile is therefore 7.5/0.5 = 15 times higher than that of the routing tile. The BW requirement directly translates to the biasing of the amplifier and thus its power consumption, so the static power consumption of the neuron tile is 15 times that of the routing tile. The current requirements also translate to area, since larger currents require wider transistors.
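As a quick numeric check of the neuron-tile budget above, the following snippet evaluates the LSB current and the half-LSB error tolerance; the values come from the text and the variable names are illustrative only.

```python
V_ref = 100e-3                  # read voltage across the device (V)
G_min, G_max = 40e-6, 120e-6    # LRS conductance window of the RRAM (S)
N = 8                           # current levels to resolve per device

i_in_lsb = V_ref * (G_max - G_min) / N   # least significant bit of i_in
max_error = i_in_lsb / 2                 # half-LSB settling-error budget

print(f"i_in,LSB = {i_in_lsb * 1e6:.2f} uA")           # 1.00 uA
print(f"tolerable error = {max_error * 1e6:.2f} uA")   # 0.50 uA
```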
## appendix 3: reconfigurable halide perovskite nanocrystal memristors for neuromorphic computing
Figure A.21: Proposed volatile diffusive switching mechanism. (a-d) illustrate the various stages of filament formation and rupture. (i-iv) indicate the possible reactions happening within the device.
<details>
<summary>Image 59 Details</summary>

### Visual Description
## Diagram: Silver Bromide Electrochemical Cell
### Overview
The image depicts a schematic representation of a silver bromide (AgBr) electrochemical cell under different voltage conditions, illustrating the movement of ions and the formation/dissolution of a silver filament. The diagram is divided into four sub-figures (a, b, c, d) showing the cell's state at different applied voltages, and a legend explaining the reactions.
### Components/Axes
* **Sub-figures:** a, b, c, d
* **Voltage Labels:** 0 V, +I, ++I
* **Cell Components:**
* Left Electrode: Represented by a gray block filled with black circles (Ag atoms).
* Electrolyte: Represented by a light blue region containing blue circles (Br- ions) and white circles (Ag+ ions).
* Right Electrode: Represented by a series of light gray blocks connected to a voltage source.
* **Legend:** Located at the bottom, explaining the reactions with numbered labels (i, ii, iii, iv).
* **Annotations:** "Thin filament", "Diffusive dynamics"
### Detailed Analysis or ### Content Details
**Sub-figure a (0 V):**
* The left electrode (Ag) contains a uniform distribution of black circles (Ag atoms).
* The electrolyte (AgBr) contains a mix of blue circles (Br- ions) and white circles (Ag+ ions), seemingly randomly distributed.
**Sub-figure b (+I):**
* Voltage: +I
* Reaction i: At the interface between the left electrode and the electrolyte, a black circle (Ag atom) transforms into a white circle (Ag+ ion) and a red circle (electron). The arrow indicates the direction of the reaction.
* Reaction ii: At the interface between the electrolyte and the right electrode, a white circle (Ag+ ion) combines with a red circle (electron) to form a black circle (Ag atom).
* White arrows indicate the movement of blue circles (Br- ions) towards the left electrode.
**Sub-figure c (++I):**
* Voltage: ++I (higher positive voltage than in b)
* A "Thin filament" of black circles (Ag atoms) is formed, extending from the left electrode towards the right electrode. This filament is enclosed in a dashed purple rectangle.
* The concentration of white circles (Ag+ ions) is higher near the right electrode.
* The concentration of blue circles (Br- ions) is higher near the left electrode.
**Sub-figure d (0 V):**
* Voltage: 0 V
* Annotation: "Diffusive dynamics"
* Reaction iii: A blue circle (Br- ion) transforms into a white circle (VBr, Bromine vacancy).
* Reaction iv: A black circle (Ag atom) combines with a red circle (electron) to form a white circle (Ag+ ion).
* The black circles (Ag atoms) in the filament are dissolving, and the white circles (Ag+ ions) are dispersing.
**Possible Reactions (Legend):**
* **Reaction i:** Ag (black circle) → Ag+ (white circle) + e- (red circle) (Silver atom becomes a silver ion and releases an electron)
* **Reaction ii:** Ag+ (white circle) + e- (red circle) → Ag (black circle) (Silver ion gains an electron and becomes a silver atom)
* **Reaction iii:** Br- (blue circle) → VBr (white circle) (Bromide ion becomes a bromine vacancy)
* **Reaction iv:** Ag (black circle) - e- (red circle) → Ag+ (white circle) (Silver atom loses an electron and becomes a silver ion)
### Key Observations
* The applied voltage influences the movement of ions and the formation of a silver filament.
* At higher voltages, a thin filament of silver forms, connecting the electrodes.
* When the voltage is removed, the filament dissolves due to diffusion and electrochemical reactions.
* The legend provides a key to understanding the reactions occurring at the electrode-electrolyte interfaces.
### Interpretation
The diagram illustrates the electrochemical processes involved in the formation and dissolution of a silver filament in a silver bromide cell. Applying a positive voltage causes silver ions to migrate towards the cathode (right electrode), where they are reduced to silver atoms, forming a conductive filament. When the voltage is removed, the filament dissolves as silver atoms oxidize back into silver ions, and bromine ions diffuse. This process is fundamental to resistive switching phenomena observed in memristive devices. The diagram effectively visualizes the ionic transport and redox reactions that govern the cell's behavior.
</details>
## supplementary note 1
Diffusive behaviour: The volatile threshold switching behaviour can be attributed to the redistribution of Ag+ and Br- ions under an applied electric field, and their back-diffusion upon removal of power. The soft lattice of the halide perovskite matrix has been observed to enable Ag+ migration with an activation energy of ∼0.15 eV [2]. Interestingly, migration of halide ions and their vacancies within the perovskite matrix also occurs at similar energies (∼0.10-0.25 eV) [253, 348, 349], making it difficult to pinpoint a single operating mechanism. We hypothesize that during the SET process, Ag atoms are ionized to Ag+ and form a percolation path through the device structure. Electrons from the grounded electrode reduce Ag+ to form weak Ag filaments. In parallel, Br- ions are attracted towards the positively charged electrode and a weak filament composed of vacancies (VBr) is formed. Both factors increase the device conductance from a high resistance state (HRS) to a temporary low resistance state (LRS). Upon removal of the electric field, the low activation energy of the ions causes them to diffuse back spontaneously, breaking the percolation path and leading to volatile memory characteristics, i.e. short-term plasticity. The low compliance current (Icc) of 1 µA ensures that the electrochemical reactions are well regulated, and the percolation pathways formed are weak enough to allow these diffusive dynamics.
Figure A.22: Proposed non-volatile drift switching mechanism. (a-e) illustrate the various stages of filament formation and rupture. (i-iv) indicate the possible reactions happening within the device.
<details>
<summary>Image 60 Details</summary>

### Visual Description
## Device Operation Diagrams
### Overview
The image presents a series of diagrams illustrating the operation of a device under varying voltage conditions. The diagrams depict the movement of ions and electrons within the device, leading to the formation and dissolution of conductive filaments. The image also includes a legend describing possible reactions within the device.
### Components/Axes
Each diagram (a-e) shows a device consisting of two electrodes separated by a material. The left electrode is represented by a collection of black circles, and the right electrode is represented by a vertical line connected to a circuit symbol. The material between the electrodes contains blue circles, white circles, and in some cases, black circles.
* **Labels:**
* Diagrams are labeled a, b, c, d, and e.
* Voltage levels are indicated next to each diagram: 0 V (a, d), +I (b), +++I (c), -V (e).
* Titles: Thick filament, Drift dynamics, Possible Reactions
* **Legend (Possible Reactions):** Located at the bottom-right of the image.
* Reaction i: Ag (black circle) -> Ag+ (white circle) + e- (red circle)
* Reaction ii: Ag+ (white circle) + e- (red circle) -> Ag (black circle)
* Reaction iii: Br- (blue circle) -> VBr (white circle)
* Reaction iv: Ag (black circle) + e- (red circle) -> Ag+ (white circle)
* **Particles:**
* Ag (Silver): Represented by black circles.
* Ag+ (Silver Ion): Represented by white circles.
* e- (Electron): Represented by red circles.
* Br- (Bromide Ion): Represented by blue circles.
* VBr (Bromide Vacancy): Represented by white circles.
### Detailed Analysis or ### Content Details
* **Diagram a (0 V):**
* Voltage: 0 V
* Distribution: Blue circles (Br-) are evenly distributed within the material. White circles (Ag+) are sparsely distributed. Black circles (Ag) are concentrated in the left electrode.
* Trend: No significant movement or reaction is depicted.
* **Diagram b (+I):**
* Voltage: +I
* Distribution: Blue circles (Br-) move away from the left electrode. White circles (Ag+) move towards the right electrode. Red circles (e-) are present near the left electrode.
* Arrows: An arrow indicates the movement of blue circles away from the left electrode. Another arrow indicates the movement of white circles towards the right electrode.
* Reactions: Reaction i is shown near the left electrode, and reaction ii is shown near the right electrode.
* **Diagram c (+++I):**
* Voltage: +++I
* Distribution: A thick filament of black circles (Ag) has formed, spanning from the left electrode towards the right electrode. Blue circles (Br-) are concentrated at the bottom of the material.
* Region: A dashed blue rectangle labeled "Thick filament" highlights the region where the black circles are concentrated.
* **Diagram d (0 V):**
* Voltage: 0 V
* Distribution: Black circles (Ag) are concentrated near the left electrode, forming a partial filament. Blue circles (Br-) are concentrated at the bottom of the material.
* Title: "Drift dynamics" suggests the diagram illustrates the movement of ions.
* **Diagram e (-V):**
* Voltage: -V
* Distribution: The filament of black circles (Ag) is dissolving. Blue circles (Br-) are becoming more evenly distributed.
* Arrows: An arrow indicates the movement of blue circles. Reaction iii is shown near the left electrode, and reaction iv is shown near the right electrode.
### Key Observations
* The diagrams illustrate the formation and dissolution of a conductive filament under different voltage conditions.
* The movement of ions (Ag+ and Br-) plays a crucial role in the filament formation and dissolution process.
* The legend provides the chemical reactions associated with the movement of ions and electrons.
### Interpretation
The diagrams demonstrate the operation of a resistive switching device. Applying a positive voltage (+I or +++I) causes the formation of a conductive filament composed of silver (Ag) atoms. This filament bridges the gap between the electrodes, allowing current to flow. The formation of the filament is driven by the migration of silver ions (Ag+) and the reduction of silver ions to silver atoms. Applying a negative voltage (-V) causes the dissolution of the filament, breaking the conductive path. This process is driven by the oxidation of silver atoms to silver ions. The movement of bromide ions (Br-) also plays a role in the process. The "Drift dynamics" diagram suggests that the movement of ions is influenced by the electric field. The "Thick filament" diagram shows that under high positive voltage, a robust conductive path is formed. The device's ability to switch between conductive and non-conductive states makes it suitable for memory and other electronic applications.
</details>
## supplementary note 2
Drift behaviour: Upon increasing Icc to 1 mA (three orders of magnitude higher than that used for volatile threshold switching), permanent and thicker conductive filamentary pathways are likely formed within the device, as illustrated in Fig. A.22. This increases the device conductance from a high resistance state (HRS) to a permanent and much lower low resistance state (LRS). Electrochemical reactions are triggered to a greater extent, and the switching dynamics are now dominated by the drift kinetics of the mobile ion species Ag+ and Br-, rather than by diffusion. Hence, upon removal of the electric field, the conductive filaments remain largely unaffected, and the devices retain their LRS and display long-term plasticity. Application of voltage sweeps, or pulses of opposite polarity, ruptures these filaments and resets the devices to their HRS. For DDAB-capped CsPbBr3 NCs, the devices transition to a non-erasable non-volatile state within ∼50 cycles, indicating the formation of very thick filaments (Fig. A.23). In contrast, the OGB-capped CsPbBr3 NCs display a record-high non-volatile endurance of 5655 cycles and a retention of 10^5 seconds (Fig. A.24), pointing to better regulation of the filament formation and rupture kinetics.
Figure A.23: Non-volatile drift switching of DDAB-capped CsPbBr3 NC memristors. (a) Representative I-V characteristics. (b) Endurance. (c) Retention.
<details>
<summary>Image 61 Details</summary>

### Visual Description
## Current-Voltage and Time Dependence Charts
### Overview
The image presents three charts (a, b, c) illustrating the electrical characteristics of a device, likely a memristor or similar resistive switching element. Chart (a) shows the current-voltage (I-V) characteristics, exhibiting hysteresis. Chart (b) displays the current levels for Low Resistance State (LRS) and High Resistance State (HRS) over multiple cycles. Chart (c) shows the current levels for LRS and HRS over time.
### Components/Axes
**Chart a: Current vs. Voltage**
* **Y-axis:** Current (A), logarithmic scale from 10^-9 to 10^-1.
* **X-axis:** Voltage (V), linear scale from -4 to 2.
* **Data:** A single curve showing the I-V relationship with hysteresis. Arrows indicate the direction of voltage sweep. The curve is green.
* **Annotations:** Numbers 1, 2, 3, and 4 indicate different segments of the voltage sweep cycle.
**Chart b: Current vs. Cycles**
* **Y-axis:** Current (A), logarithmic scale from 10^-6 to 10^-2.
* **X-axis:** Cycles, linear scale from 0 to 50.
* **Data:** Two scatter plots representing LRS and HRS currents over cycles.
* LRS: Data points are light yellow-green squares.
* HRS: Data points are dark green squares.
* **Legend:** Located at the top-right of the chart.
* LRS (light yellow-green square)
* HRS (dark green square)
**Chart c: Current vs. Time**
* **Y-axis:** Current (A), logarithmic scale from 10^-9 to 10^-3.
* **X-axis:** Time (secs), linear scale from 0 to 1x10^5 (100,000).
* **Data:** Two curves representing LRS and HRS currents over time.
* LRS: Data points are light yellow-green squares.
* HRS: Data points are dark green squares.
* **Legend:** Located at the top-right of the chart.
* LRS (light yellow-green square)
* HRS (dark green square)
### Detailed Analysis
**Chart a: Current vs. Voltage**
* The I-V curve starts at approximately (0V, 10^-7 A) and follows the path indicated by the arrows.
* Segment 1: As voltage increases from 0V to approximately 1V, the current increases to approximately 10^-3 A.
* Segment 2: The current remains relatively constant at approximately 10^-3 A as the voltage increases from 1V to 2V.
* Segment 3: As the voltage decreases from 2V to -2V, the current initially remains at approximately 10^-3 A, then drops sharply to approximately 10^-6 A around -1V.
* Segment 4: As the voltage decreases from -2V to -4V, the current remains relatively constant at approximately 10^-3 A. As the voltage increases from -4V to 0V, the current increases from approximately 10^-3 A to approximately 10^-7 A.
**Chart b: Current vs. Cycles**
* The LRS current (light yellow-green squares) fluctuates around 10^-3 A.
* The HRS current (dark green squares) fluctuates around 10^-6 A.
* There is a clear separation between the LRS and HRS current levels.
**Chart c: Current vs. Time**
* The LRS current (light yellow-green squares) starts at approximately 10^-3 A and decreases slightly over time to approximately 5*10^-4 A.
* The HRS current (dark green squares) starts at approximately 10^-8 A and increases significantly over time to approximately 10^-6 A.
### Key Observations
* **Hysteresis:** Chart (a) clearly shows hysteresis in the I-V characteristics, indicating resistive switching behavior.
* **Stable LRS:** In chart (b), the LRS current remains relatively stable over multiple cycles.
* **Unstable HRS:** In chart (b), the HRS current fluctuates more significantly than the LRS current.
* **Time Dependence:** Chart (c) shows that the HRS current increases significantly over time, while the LRS current decreases slightly.
### Interpretation
The data suggests that the device exhibits resistive switching behavior, with a clear difference between the LRS and HRS current levels. The hysteresis in the I-V curve confirms this behavior. The stability of the LRS current over cycles indicates good endurance. However, the increase in HRS current over time suggests that the device may be drifting towards the LRS, which could be a reliability concern. The device's resistance changes depending on the applied voltage and the duration of the applied voltage. The device's resistance also changes over time.
</details>
Figure A.24: Non-volatile drift switching of OGB-capped CsPbBr3 NC memristors. The figure shows the retention performance.
<details>
<summary>Image 62 Details</summary>

### Visual Description
## Chart: Current vs. Time
### Overview
The image is a plot of current (A) versus time (seconds) showing two distinct states, LRS (Low Resistance State) and HRS (High Resistance State). The plot shows the current values over time for both states, with the LRS exhibiting a higher current level than the HRS.
### Components/Axes
* **Y-axis:** Current (A), logarithmic scale from 10^-9 to 10^-3.
* **X-axis:** Time (secs), linear scale from 0 to 1x10^5.
* **Legend:** Located in the center-right of the plot.
* LRS (Low Resistance State): Represented by light blue squares.
* HRS (High Resistance State): Represented by dark blue squares.
### Detailed Analysis
* **LRS (Low Resistance State):**
* Trend: The current is relatively stable at approximately 1x10^-3 A for the majority of the time.
* Data Points:
* Initial value: ~1x10^-3 A
* Value at 5x10^4 secs: ~1x10^-3 A
* Value at 1x10^5 secs: ~5x10^-4 A
* **HRS (High Resistance State):**
* Trend: The current starts at a higher value, then decreases over time, with some fluctuations.
* Data Points:
* Initial value: ~1x10^-7 A
* Value at 5x10^4 secs: ~2x10^-9 A
* Value at 8x10^4 secs: ~1x10^-9 A
* Value at 1x10^5 secs: ~1x10^-7 A
### Key Observations
* The LRS maintains a significantly higher current level compared to the HRS throughout the experiment.
* The HRS exhibits a decreasing trend in current over time, indicating a change in resistance.
* The HRS shows a jump in current near the end of the time period.
### Interpretation
The plot demonstrates the behavior of a device that can switch between two resistance states. The LRS represents a state where the device allows a higher current to flow, while the HRS represents a state where the current is significantly lower. The change in current over time in the HRS suggests that the resistance of the device is not constant and may be influenced by external factors or internal changes within the device. The jump in current at the end of the HRS time period could indicate a transition back towards the LRS or some other change in the device's state.
</details>
Figure A.25: Transmission electron microscope (TEM) images of DDAB-capped CsPbBr3 NCs.
<details>
<summary>Image 63 Details</summary>

### Visual Description
## Microscopic Image: Nanocrystal Array
### Overview
The image is a microscopic view of an array of nanocrystals. The nanocrystals appear as small, roughly square-shaped objects arranged in a grid-like pattern. The image provides a visual representation of the size and arrangement of these nanocrystals.
### Components/Axes
- **Scale Bar:** Located at the bottom-left corner, indicating a length of 10 nm.
### Detailed Analysis
The image shows a dense packing of nanocrystals. The nanocrystals are relatively uniform in size and shape, appearing as small, dark squares against a lighter background. The arrangement is mostly ordered, forming a grid-like structure, although some imperfections and dislocations are visible. The scale bar indicates that the nanocrystals are approximately 10 nm in size.
### Key Observations
- The nanocrystals are arranged in a grid-like pattern.
- The nanocrystals are roughly square-shaped.
- The scale bar indicates a length of 10 nm.
### Interpretation
The image demonstrates the successful synthesis and assembly of nanocrystals into an ordered array. The uniformity in size and shape of the nanocrystals suggests a controlled synthesis process. The grid-like arrangement indicates a self-assembly process or a directed assembly technique. The image provides valuable information about the morphology and arrangement of these nanocrystals, which can be crucial for understanding their properties and potential applications.
</details>
## supplementary note 3
Figure A.26: Non-linear variation of the device conductance as a function of the stimulation pulse (a) amplitude, (b) width and (c) number. For (a), the pulse width and number are kept constant at 25 ms and 1, respectively. For (b), the pulse amplitude and number are kept constant at 1 V and 1, respectively. For (c), the pulse amplitude and width are kept constant at 1 V and 25 ms, respectively.
<details>
<summary>Image 64 Details</summary>

### Visual Description
## Chart Type: Multiple Line Graphs
### Overview
The image presents three line graphs (a, b, and c) showing the relationship between current (in Amperes) and different parameters: voltage (in Volts), pulse width (in milliseconds), and time (in seconds). Each graph has a specific condition mentioned in the title.
### Components/Axes
**Graph a:**
* **Title:** Pulse width = 25 ms
* **X-axis:** Voltage (V), with markers at 1, 1.2, 1.4, 1.6, 1.8, and 2.
* **Y-axis:** Current (A), with a scale from 0 to 1.2x10^-5. Markers are present at 2x10^-6, 4x10^-6, 6x10^-6, 8x10^-6, 1x10^-5, and 1.2x10^-5.
* **Data Series:** Blue squares connected by a line.
**Graph b:**
* **Title:** Pulse amplitude = 1 V
* **X-axis:** Pulse width (ms), with markers at 10, 15, 20, 25, 30, 35, and 40.
* **Y-axis:** Current (A), with a scale from 0 to 1.4x10^-6. Markers are present at 1x10^-6, 1.2x10^-6, and 1.4x10^-6.
* **Data Series:** Green squares connected by a line.
**Graph c:**
* **Title:** Pulse amplitude = 1 V
* **X-axis:** Time (secs), with markers at 0, 0.2, 0.4, and 0.6.
* **Y-axis:** Current (A), with a scale from approximately 8x10^-7 to 1.8x10^-6. Markers are present at 8x10^-7, 1x10^-6, 1.2x10^-6, 1.4x10^-6, 1.6x10^-6, and 1.8x10^-6.
* **Data Series:** Orange squares connected by a line.
### Detailed Analysis
**Graph a (Current vs. Voltage):**
* The current increases as the voltage increases.
* At 1 V, the current is approximately 1x10^-6 A.
* At 1.2 V, the current is approximately 1.2x10^-6 A.
* At 1.4 V, the current is approximately 2.1x10^-6 A.
* At 1.6 V, the current is approximately 3.7x10^-6 A.
* At 1.8 V, the current is approximately 6.3x10^-6 A.
* At 2 V, the current is approximately 1.1x10^-5 A.
**Graph b (Current vs. Pulse Width):**
* The current increases as the pulse width increases.
* At 10 ms, the current is approximately 8.8x10^-7 A.
* At 15 ms, the current is approximately 1.15x10^-6 A.
* At 20 ms, the current is approximately 1.25x10^-6 A.
* At 25 ms, the current is approximately 1.35x10^-6 A.
* At 30 ms, the current is approximately 1.42x10^-6 A.
* At 35 ms, the current is approximately 1.45x10^-6 A.
* At 40 ms, the current is approximately 1.5x10^-6 A.
**Graph c (Current vs. Time):**
* The current increases rapidly initially and then plateaus after approximately 0.4 seconds.
* At 0 seconds, the current is approximately 1.0x10^-6 A.
* At 0.1 seconds, the current is approximately 1.3x10^-6 A.
* At 0.2 seconds, the current is approximately 1.4x10^-6 A.
* At 0.3 seconds, the current is approximately 1.5x10^-6 A.
* At 0.4 seconds, the current is approximately 1.65x10^-6 A.
* From 0.5 to 0.6 seconds, the current remains relatively constant at approximately 1.7x10^-6 A.
### Key Observations
* In graph a, the relationship between current and voltage appears to be non-linear, with the current increasing more rapidly at higher voltages.
* In graph b, the current increases with pulse width, but the rate of increase slows down at higher pulse widths.
* In graph c, the current exhibits a transient behavior, increasing rapidly at first and then reaching a steady-state value.
### Interpretation
The graphs illustrate how current responds to changes in voltage, pulse width, and time under specific conditions. Graph a shows the current response to varying voltage with a fixed pulse width. Graph b shows the current response to varying pulse width with a fixed voltage amplitude. Graph c shows the current response over time with a fixed voltage amplitude. The data suggests that the current is influenced by all three parameters, with the voltage having a more pronounced effect than the pulse width. The transient behavior in graph c indicates a capacitive or inductive effect in the circuit. The plateauing of the current in graph c suggests a saturation effect or a limit to the current flow.
</details>
Figure A.27: Echo state properties. Variation in the device conductance of the volatile diffusive perovskite memristor as a function of the inter-group pulse interval. The interval between the two sequences increases from (a) 10 ms and (b) 30 ms to (c) 300 ms. (d-g) Current responses when subjected to 10 identical stimulation pulses (1 V, 5 ms) with different pulse interval conditions for the final pulse. The interval varies from (d) 10 ms, (e) 23 ms, (f) 41 ms, to (g) 80 ms.
<details>
<summary>Image 65 Details</summary>

### Visual Description
## Chart Type: Multiple Time Series Plots
### Overview
The image contains seven time series plots (labeled a-g), each showing the relationship between current (in Amperes) and time (in seconds). Each plot also includes a schematic representation of a stimulus pattern along the time axis. The current values are on the order of 10^-6 Amperes.
### Components/Axes
* **Y-axis (Current):** All plots share the same y-axis, labeled "Current (A)". The scale ranges from approximately 8x10^-7 to 1.6x10^-6.
* **X-axis (Time):** All plots have an x-axis labeled "Time (secs)". The time scales vary between plots.
* **Data Points:** Data points are represented by squares. Plots a, b, and c use green squares, while plots d, e, f, and g use purple squares.
* **Stimulus Pattern:** Each plot includes a schematic representation of a stimulus pattern, depicted as vertical lines along the x-axis.
* **Labels:** Plot 'a' has labels 'a1' and 'a4' indicating specific data points.
### Detailed Analysis
**Plot a:**
* **X-axis:** Time ranges from approximately 0.02 to 0.1 seconds.
* **Data Points:** Green squares.
* **Trend:** The current increases with time.
* **Values:**
* a1: Time ~ 0.04 secs, Current ~ 1.2x10^-6 A
* a4: Time ~ 0.07 secs, Current ~ 1.4x10^-6 A
* **Stimulus Pattern:** Four vertical lines, indicating four stimuli.
**Plot b:**
* **X-axis:** Time ranges from approximately 0.6 to 0.7 seconds.
* **Data Points:** Green squares.
* **Trend:** The current increases slightly with time.
* **Values:**
* Time ~ 0.61 secs, Current ~ 1.3x10^-6 A
* Time ~ 0.67 secs, Current ~ 1.4x10^-6 A
* **Stimulus Pattern:** Two vertical lines, indicating two stimuli.
**Plot c:**
* **X-axis:** Time ranges from approximately 1.2 to 1.6 seconds.
* **Data Points:** Green squares.
* **Trend:** The current increases slightly with time.
* **Values:**
* Time ~ 1.2 secs, Current ~ 1.4x10^-6 A
* Time ~ 1.4 secs, Current ~ 1.5x10^-6 A
* **Stimulus Pattern:** Two vertical lines, indicating two stimuli.
**Plot d:**
* **X-axis:** Time ranges from approximately 0 to 0.2 seconds.
* **Data Points:** Purple squares.
* **Trend:** The current increases with time.
* **Values:**
* Time ~ 0.02 secs, Current ~ 1.2x10^-6 A
* Time ~ 0.16 secs, Current ~ 1.6x10^-6 A
* **Stimulus Pattern:** Eight vertical lines, indicating eight stimuli.
**Plot e:**
* **X-axis:** Time ranges from approximately 0.6 to 0.8 seconds.
* **Data Points:** Purple squares.
* **Trend:** The current increases with time.
* **Values:**
* Time ~ 0.61 secs, Current ~ 1.4x10^-6 A
* Time ~ 0.75 secs, Current ~ 1.6x10^-6 A
* **Stimulus Pattern:** Eight vertical lines, indicating eight stimuli.
**Plot f:**
* **X-axis:** Time ranges from approximately 1.2 to 1.4 seconds.
* **Data Points:** Purple squares.
* **Trend:** The current increases with time.
* **Values:**
* Time ~ 1.22 secs, Current ~ 1.4x10^-6 A
* Time ~ 1.35 secs, Current ~ 1.6x10^-6 A
* **Stimulus Pattern:** Eight vertical lines, indicating eight stimuli.
**Plot g:**
* **X-axis:** Time ranges from approximately 1.8 to 2.0 seconds.
* **Data Points:** Purple squares.
* **Trend:** The current increases with time.
* **Values:**
* Time ~ 1.81 secs, Current ~ 1.5x10^-6 A
* Time ~ 1.98 secs, Current ~ 1.7x10^-6 A
* **Stimulus Pattern:** Eight vertical lines, indicating eight stimuli.
### Key Observations
* The current generally increases with time in all plots.
* The stimulus patterns vary in the number of stimuli and their timing.
* Plots a, b, and c show a smaller number of stimuli and a shorter time scale compared to plots d, e, f, and g.
* Plots a, b, and c use green data points, while plots d, e, f, and g use purple data points.
### Interpretation
The plots likely represent the response of a system (possibly a biological or electronic system) to different stimulus patterns. The increase in current over time suggests that the system is responding to the stimuli. The different stimulus patterns and data point colors (green vs. purple) may represent different experimental conditions or different types of stimuli. The data suggests that the system's response is dependent on the timing and number of stimuli applied. Further information about the experimental setup and the nature of the stimuli would be needed to fully interpret the results.
</details>
## supplementary note 4
The echo state property of a reservoir refers to the impact that previous inputs have on the current reservoir state, and how that influence fades with time. To test this, four short pulses of 1 V, 5 ms are applied to the device in a paired-pulse format and the device states are recorded. A non-linear accumulative behaviour is observed as a function of the paired-pulse interval. In Fig. A.27a, a short paired-pulse interval of 10 ms results in an echo index (defined as (a4 / a1) × 100) of 118%. Longer intervals (30 ms and 300 ms in Fig. A.27b,c) result in smaller echo indices (107.5% and 107.2%, respectively), reflecting the short-term memory of the perovskite memristors. To further test the echo state property, three pulse trains consisting of 10 identical stimulation pulses (1 V, 5 ms) are applied to the device and the device states are recorded. In all cases, a non-linear accumulative behaviour is observed. As shown in Fig. A.27d-g, short intervals (≤ 23 ms) before the last stimulation pulse result in further accumulation, while long intervals result in depression of the device state. This indicates that the present device state remembers input temporal features from the recent past but not the far past, allowing the diffusive perovskite memristors to act as efficient reservoir elements.
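The echo index reduces to a single ratio; the helper below is a minimal illustration, where a1 and a4 denote the device currents after the first and fourth pulse, and the input values are hypothetical readings chosen only to reproduce the 118% quoted for the 10 ms interval.

```python
def echo_index(a1, a4):
    """Echo index in percent: device current after the 4th pulse (a4)
    relative to the current after the 1st pulse (a1)."""
    return (a4 / a1) * 100.0

# Hypothetical readings, not measured values.
print(echo_index(a1=1.00e-6, a4=1.18e-6))   # 118.0 %
```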
Figure A.28: Input waveforms. A representative "Write" (amplitude = 1 V, pulse width = 20 ms) and "Read" (amplitude = -0.5 V, pulse width = 5 ms) spike train applied to the volatile perovskite memristors in the reservoir layer.
<details>
<summary>Image 66 Details</summary>

### Visual Description
## Bar Chart: Voltage vs. Time
### Overview
The image is a bar chart displaying voltage (V) over time (s). The chart shows discrete voltage spikes at various time intervals. The voltage appears to be either 0V or 1V.
### Components/Axes
* **X-axis (Horizontal):** Time (s), ranging from 0.00 to 1.00, with tick marks at intervals of 0.25. There are also minor tick marks at intervals of 0.04166666666666667.
* **Y-axis (Vertical):** Voltage (V), ranging from -1.0 to 1.5, with tick marks at intervals of 0.5.
* **Data:** Vertical bars representing voltage at specific time points. The bars are gray.
### Detailed Analysis
The chart shows voltage spikes occurring at the following approximate times:
* Around 0.02 s: Several closely spaced spikes.
* Around 0.12 s: A single spike.
* Around 0.20 s: A single spike.
* Around 0.28 s: A single spike.
* Around 0.40 s: A single spike.
* Around 0.52 s: A single spike.
* Around 0.68 s: A single spike.
* Around 0.76 s: A single spike.
* Around 1.00 s: A single spike.
The voltage during these spikes appears to be approximately 1.0 V. At all other times, the voltage is 0 V.
### Key Observations
* The voltage is primarily at 0V, with intermittent spikes to 1V.
* The spikes are not evenly distributed over time. There is a cluster of spikes near the beginning of the time interval.
### Interpretation
The chart represents a digital signal where the voltage is either low (0V) or high (1V). The spikes indicate instances where the signal is high. The pattern of spikes suggests a specific event or process occurring at those times. The clustering of spikes at the beginning might indicate an initial setup or burst of activity.
</details>
Figure A.29: Weight distribution. The synaptic weight distribution after training. (a-b) Conductance distribution of both positive and negative differential perovskite memristors. (c) The effective (unscaled) conductance distribution. (d) Weight distribution of the readout layer with double-precision floating-point weights. The effective memristive weights and FP64 weights follow a similar distribution.
<details>
<summary>Image 67 Details</summary>

### Visual Description
## Histogram Plots: Conductance Distributions
### Overview
The image presents four histogram plots (a, b, c, d) displaying the distribution of conductance values in milliSiemens (mS) for different experimental conditions or data sets. Each plot shows the count (frequency) of observations for various conductance ranges.
### Components/Axes
* **General Structure:** Each plot has a similar structure:
* **Y-axis:** "Count", ranging from 0 to a maximum value (varying between plots).
* **X-axis:** "Conductance (mS)", with varying ranges depending on the plot.
* **Legend:** Located in the top-right corner of each plot, indicating the data series represented.
* **Plot a:**
* **Y-axis:** Count, ranging from 0 to 20.
* **X-axis:** Conductance (mS), ranging from 0 to 3.
* **Legend:** "G+" (dark blue).
* **Plot b:**
* **Y-axis:** Count, ranging from 0 to 15.
* **X-axis:** Conductance (mS), ranging from 0 to 3.
* **Legend:** "G-" (dark blue).
* **Plot c:**
* **Y-axis:** Count, ranging from 0 to 15.
* **X-axis:** Conductance (mS), ranging from -2 to 2.
* **Legend:** "G+ - G-" (dark blue).
* **Plot d:**
* **Y-axis:** Count, ranging from 0 to 30.
* **X-axis:** Conductance (mS), ranging from -2 to 2.
* **Legend:** "FP64" (red).
### Detailed Analysis
* **Plot a (G+):**
* The distribution is right-skewed, with the highest count occurring between 0 and 1 mS.
* Approximate counts:
* 0-0.5 mS: ~20
* 0.5-1 mS: ~19
* 1-1.5 mS: ~12
* 1.5-2 mS: ~9
* 2-2.5 mS: ~10
* 2.5-3 mS: ~3
* 3-3.5 mS: ~7
* **Plot b (G-):**
* The distribution is also right-skewed, similar to Plot a.
* Approximate counts:
* 0-0.5 mS: ~11
* 0.5-1 mS: ~17
* 1-1.5 mS: ~13
* 1.5-2 mS: ~13
* 2-2.5 mS: ~5
* 2.5-3 mS: ~4
* 3-3.5 mS: ~8
* **Plot c (G+ - G-):**
* The distribution is centered around 0 mS, with a relatively symmetrical shape.
* Approximate counts:
* -2 to -1.5 mS: ~3
* -1.5 to -1 mS: ~7
* -1 to -0.5 mS: ~8
* -0.5 to 0 mS: ~14
* 0 to 0.5 mS: ~15
* 0.5 to 1 mS: ~8
* 1 to 1.5 mS: ~7
* 1.5 to 2 mS: ~3
* **Plot d (FP64):**
* The distribution is sharply peaked around 0 mS.
* Approximate counts:
* -2 to -1.5 mS: ~1
* -1.5 to -1 mS: ~2
* -1 to -0.5 mS: ~6
* -0.5 to 0 mS: ~14
* 0 to 0.5 mS: ~28
* 0.5 to 1 mS: ~7
* 1 to 1.5 mS: ~3
* 1.5 to 2 mS: ~1
### Key Observations
* Plots a (G+) and b (G-) show similar distributions, both skewed towards lower conductance values.
* Plot c (G+ - G-) shows a distribution centered around zero, suggesting a balance between G+ and G- values.
* Plot d (FP64) exhibits a much narrower distribution, with a strong peak at 0 mS, indicating a more consistent conductance value for this condition.
### Interpretation
The histograms provide insights into the conductance characteristics of different experimental conditions. The similarity between G+ and G- distributions suggests a common underlying mechanism. The G+ - G- distribution centered around zero implies a degree of compensation or balance between these two components. The sharp peak in the FP64 distribution indicates a more stable and defined conductance state compared to the other conditions. The data suggests that FP64 exhibits a more consistent conductance behavior, while G+ and G- have a broader range of conductance values. The difference between G+ and G- is relatively centered around zero, suggesting that they tend to balance each other out.
</details>
Figure A.30: Training the RC with FP64 readout weights using backpropagation. The training metrics of the ANN are shown. (a) Training and testing accuracy over 5 epochs demonstrate that the network solves the classification task with high accuracy without overfitting. (b) Confusion matrix calculated at the end of training; the correct-response probability is shown in the colour scale on the right. The network performs slightly worse at discriminating irregular patterns.
<details>
<summary>Image 68 Details</summary>

### Visual Description
## Chart/Diagram Type: Combined Line Graph and Heatmap
### Overview
The image presents two sub-figures: (a) a line graph showing the accuracy of training and testing datasets over epochs, and (b) a heatmap displaying the relationship between different categories (Burst, Adaptation, Irregular, Tonic).
### Components/Axes
**Sub-figure (a): Line Graph**
* **Title:** Implicitly, "Accuracy vs. Epoch"
* **X-axis:** Epoch
* Scale: 0 to 5, incrementing by 1.
* **Y-axis:** Accuracy (%)
* Scale: 0 to 100, incrementing by 20.
* **Legend:** Located in the top-right of the graph.
* Training (Blue line with square markers)
* Testing (Red line with square markers)
**Sub-figure (b): Heatmap**
* **X-axis:** Categories (Burst, Adaptation, Irregular, Tonic)
* **Y-axis:** Categories (Burst, Adaptation, Irregular, Tonic)
* **Color Scale:** Ranges from approximately 0 (light green) to 1 (dark blue). The color scale is located on the right side of the heatmap.
### Detailed Analysis
**Sub-figure (a): Line Graph**
* **Training (Blue):**
* Trend: The training accuracy increases sharply from epoch 0 to 1, then plateaus with a slight increase from epoch 1 to 5.
* Data Points:
* Epoch 0: ~24%
* Epoch 1: ~85%
* Epoch 2: ~92%
* Epoch 3: ~93%
* Epoch 4: ~94%
* Epoch 5: ~95%
* **Testing (Red):**
* Trend: The testing accuracy increases sharply from epoch 0 to 1, then plateaus with a slight increase from epoch 1 to 5.
* Data Points:
* Epoch 0: ~22%
* Epoch 1: ~84%
* Epoch 2: ~90%
* Epoch 3: ~92%
* Epoch 4: ~92%
* Epoch 5: ~93%
**Sub-figure (b): Heatmap**
The heatmap represents the relationship between the categories "Burst", "Adaptation", "Irregular", and "Tonic". The color intensity indicates the strength of the relationship, with darker blue indicating a stronger relationship and lighter green indicating a weaker relationship.
* **Categories:**
* Burst
* Adaptation
* Irregular
* Tonic
* **Heatmap Values (Approximate):**
* Burst vs. Burst: ~0.9 (Dark Blue)
* Burst vs. Adaptation: ~0.2 (Light Green)
* Burst vs. Irregular: ~0.3 (Light Green)
* Burst vs. Tonic: ~0.2 (Light Green)
* Adaptation vs. Burst: ~0.2 (Light Green)
* Adaptation vs. Adaptation: ~0.9 (Dark Blue)
* Adaptation vs. Irregular: ~0.3 (Light Green)
* Adaptation vs. Tonic: ~0.2 (Light Green)
* Irregular vs. Burst: ~0.3 (Light Green)
* Irregular vs. Adaptation: ~0.3 (Light Green)
* Irregular vs. Irregular: ~0.9 (Dark Blue)
* Irregular vs. Tonic: ~0.3 (Light Green)
* Tonic vs. Burst: ~0.2 (Light Green)
* Tonic vs. Adaptation: ~0.2 (Light Green)
* Tonic vs. Irregular: ~0.3 (Light Green)
* Tonic vs. Tonic: ~0.9 (Dark Blue)
### Key Observations
* **Line Graph:** The training and testing accuracy are very close, suggesting the model is not overfitting. The accuracy increases rapidly in the first epoch and then plateaus.
* **Heatmap:** The diagonal elements (Burst vs. Burst, Adaptation vs. Adaptation, Irregular vs. Irregular, Tonic vs. Tonic) have high values (dark blue), indicating a strong self-correlation. The off-diagonal elements are generally low (light green), indicating weak relationships between different categories.
### Interpretation
The line graph demonstrates the learning curve of a model, showing how accuracy improves with training epochs. The close proximity of the training and testing accuracy lines suggests good generalization performance. The heatmap visualizes the relationships between different categories, indicating that each category is strongly correlated with itself and weakly correlated with the others. This could imply that the categories are well-defined and distinct.
</details>
Table A.1: Comparison of training and test accuracies of both approaches. The neural spiking pattern classification performance table comparing the two approaches. The readout layer with drift-based halide-perovskite memristor weights trained with online (Icc) control achieves results comparable to FP64 weights trained with backpropagation.
| Accuracy (%) | | Epoch 0 | Epoch 1 | Epoch 2 | Epoch 3 | Epoch 4 | Epoch 5 |
|---|---|---|---|---|---|---|---|
| FP64 with backpropagation | Training | 24.68 | 87.76 | 91.38 | 92.47 | 92.99 | 93.54 |
| | Testing | 20.88 | 86.14 | 89.16 | 90.16 | 91.37 | 91.77 |
| Perovskite with Icc control | Training | 10.32 | 69.11 | 89.12 | 83.09 | 85.59 | 86.75 |
| | Testing | 14.46 | 73.29 | 86.14 | 80.92 | 84.94 | 85.14 |
Figure A.31: Icc-modulated training for the drift-based perovskite configuration.
<details>
<summary>Image 69 Details</summary>

### Visual Description
## Diagram: Icc Controlled Weight Update
### Overview
The image is a flowchart illustrating the process of Icc (compliance current) controlled weight update in a system involving initialization, inference, and weight adjustment based on prediction error. The diagram shows the flow of data and operations from input to the final weight update.
### Components/Axes
The diagram is divided into three main sections:
1. **Initialization:** Located at the top-left.
2. **Inference:** Located in the middle-left.
3. **Icc Controlled Weight Update:** Located at the top-right and extending to the right side of the diagram.
The diagram uses rounded rectangles to represent processes or operations, and arrows to indicate the flow of data.
### Detailed Analysis or ### Content Details
**1. Initialization:**
* Text: "Initialization"
* Process: "RESET G+ and G-"
**2. Inference:**
* Text: "Inference"
* Input: "Input"
* Process: "Readout Inference" containing the formula: `σ[∑ xi(G+ij - G-ij)]`
* Process: "Prediction Å·"
* Process: "Target y"
* Process: "Loss" with the formula: `y - Å·`
**3. Icc Controlled Weight Update:**
* Text: "Icc Controlled Weight Update"
* "Prediction Error" branches off from the "Loss" process.
* Process: "READ G+"
* Process: "READ G-"
* Scaling Factor: "η" (located between the Loss and Linear Scaling)
* Process: "Linear Scaling"
* Process: "Target G+"
* Process: "Target G-"
* Process: "G to Icc Mapping"
* Process: "Target I+CC"
* Process: "Target I-CC"
* Process: "RESET G+ and G-"
* Process: "WRITE G+ with I+CC"
* Process: "WRITE G- with I-CC"
### Key Observations
* The diagram shows a cyclical process where the output of the inference stage is used to update the weights, which then influence subsequent inferences.
* The "Loss" calculation (y - Å·) is central to the weight update process, as it determines the "Prediction Error" that drives the adjustments.
* The diagram uses G+ and G- to represent positive and negative components, respectively.
* The Icc mapping is used to control the weight update.
### Interpretation
The diagram illustrates a machine learning or neural network training process. The system takes an input, performs inference to generate a prediction (ŷ), compares it to the target (y) to calculate the loss, and then uses this loss to update the weights (G+ and G-) in a controlled manner using the Icc mapping. The initialization step resets the weights, and the final step writes the updated weights back into the system. The use of G+ and G- suggests a mechanism for handling both positive and negative influences on the weights. The Icc controlled weight update likely aims to improve the model's accuracy and generalization by adjusting the weights based on the prediction error and the Icc mapping.
</details>
## supplementary note 5
During the inference procedure, the reservoir output vector of length 30 is fed into the readout layer. Memristors in the readout layer are arranged in a differential architecture, in which the difference of the conductance values of two differential memristors (G+ and G-) determines the effective synaptic strength. Scaled by β = 1/(Gmax - Gmin), where Gmax = 0.35 mS and Gmin = 0.1 mS, the weight matrix (30 × 4) is calculated as W = β [G+ - G-]. The network prediction is determined by choosing the output neuron index with the maximum activation level.
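The following is a minimal sketch of this differential read-out, shown only to make the shapes and scaling explicit; the conductance matrices here are random placeholders, not measured device data, and the helper name is an assumption.

```python
import numpy as np

G_max, G_min = 0.35e-3, 0.10e-3            # S, from the text
beta = 1.0 / (G_max - G_min)               # conductance-to-weight scaling

def readout_predict(x, G_plus, G_minus):
    """x: reservoir output (30,); G_plus, G_minus: (30, 4) conductances in S."""
    W = beta * (G_plus - G_minus)          # effective 30 x 4 weight matrix
    activation = x @ W                     # four output-neuron activations
    return int(np.argmax(activation))      # predicted firing-pattern index

rng = np.random.default_rng(0)
x = rng.random(30)                                   # placeholder reservoir output
G_plus = rng.uniform(G_min, G_max, size=(30, 4))     # placeholder device readings
G_minus = rng.uniform(G_min, G_max, size=(30, 4))
print(readout_predict(x, G_plus, G_minus))
```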
For the training procedure, the network loss is calculated as the difference between the output layer prediction and the one-hot encoded target vector indicating one of the four firing patterns. At this point, one could calculate the target weights using the backpropagation algorithm. However, to support fully online learning, we tested an Icc-controlled weight update scheme in which the following stages of the pipeline can easily be implemented with mixed-signal circuits in an event-driven manner. The Icc-controlled weight update is implemented as follows. First, the required weight change is calculated as ΔWtarget = η xi δj, where η is a suitably low learning rate, xi is the reservoir layer output and δj is the calculated error. To obtain the target conductance values for both the positive and negative memristors, we first linearly scale the weight change to a conductance change (by multiplying with 1/β). Second, we read both the positive and negative conductance values and, using a push-pull mechanism, calculate the target conductance values; the push-pull mechanism ensures a higher dynamic range in the differential configuration. Third, the target conductances are linearly mapped to the target Icc values (Icc,target = (Gtarget + 1.249 × 10^-5) / 3.338) for the positive and negative memristors. The weights are then updated by applying RESET and SET pulses with the targeted Icc. Using the linear relation of the Icc → G control, we calculate the mean (µG = 3.338 Icc - 1.294 × 10^-5) and standard deviation (σG = 7.040 Icc + 3.0585) and sample from the corresponding Normal distribution.
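A compact sketch of this update loop is given below for illustration only; the push-pull split, the learning rate and the helper names are assumptions, the G → Icc mapping uses the linear fit quoted above, and the stochastic programming step (sampling the written conductance from the Normal distribution described above) is omitted for brevity.

```python
import numpy as np

G_max, G_min = 0.35e-3, 0.10e-3
beta = 1.0 / (G_max - G_min)
eta = 1e-3                                     # assumed learning rate

def g_to_icc(G_target):
    """Linear conductance-to-compliance-current mapping quoted in the text."""
    return (G_target + 1.249e-5) / 3.338

def icc_weight_update(x, delta, G_plus, G_minus):
    """One online update: x (30,) reservoir output, delta (4,) output error."""
    dW = eta * np.outer(x, delta)              # required weight change
    dG = dW / beta                             # weight change -> conductance change
    # Push-pull split (assumed form): move G+ up and G- down by half the change
    # each, keeping the differential pair inside its dynamic range.
    G_plus_target = np.clip(G_plus + dG / 2.0, G_min, G_max)
    G_minus_target = np.clip(G_minus - dG / 2.0, G_min, G_max)
    # Each target conductance is programmed by a RESET followed by a SET pulse
    # at the corresponding compliance current.
    return g_to_icc(G_plus_target), g_to_icc(G_minus_target)

# Dummy usage with random placeholders in place of measured data.
rng = np.random.default_rng(0)
x = rng.random(30)
delta = rng.normal(size=4)
G_plus = rng.uniform(G_min, G_max, size=(30, 4))
G_minus = rng.uniform(G_min, G_max, size=(30, 4))
Icc_plus, Icc_minus = icc_weight_update(x, delta, G_plus, G_minus)
print(Icc_plus.shape, Icc_minus.shape)         # (30, 4) (30, 4)
```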
Figure A.32: 'Reconfigurability' on-the-fly of OGB-capped CsPbBr3 NC memristors. The device is switched between its non-volatile and volatile modes on demand.
<details>
<summary>Image 70 Details</summary>

### Visual Description
## Current vs. Time Graph with Varying Voltage and Current Conditions
### Overview
The image is a line graph showing the current (in Amperes) as a function of time (in seconds). The graph illustrates how the current changes over time under different voltage and current conditions, which are indicated by numbered, colored vertical bands. The conditions alternate between applying a "perturb" voltage (VM) and setting the voltage with varying current limits (NVM).
### Components/Axes
* **Y-axis:** Current (A), with a scale from 0 to 2x10^-4 A. Major tick marks are at 0, 5x10^-5, 1x10^-4, 1.5x10^-4, and 2x10^-4 A.
* **X-axis:** Time (secs), with a scale from 0 to 0.6 seconds. Major tick marks are at 0, 0.1, 0.2, 0.3, 0.4, 0.5, and 0.6 seconds.
* **Data Series:** A single data series represented by a blue line with square markers, showing the current's evolution over time.
* **Vertical Bands:** Nine vertical bands, each with a different background color, numbered 1 through 9. Each band corresponds to a specific voltage/current condition described in the legend to the right of the graph.
* **Legend (Right Side):** A numbered list describing the voltage and current conditions for each vertical band.
1. NVM: I = 1 nA, read = 0.1 V, 5 ms
2. NVM: Vset = 5 V, 5 ms, Icc = 1 µA, read = 0.1 V, 5 ms
3. VM: Vperturb = 0.3 V, 5 ms
4. NVM: Vset = 5 V, 5 ms, Icc = 60 µA, read = 0.1 V, 5 ms
5. VM: Vperturb = 0.3 V, 5 ms
6. NVM: Vset = 5 V, 5 ms, Icc = 120 µA, read = 0.1 V, 5 ms
7. VM: Vperturb = 0.3 V, 5 ms
8. NVM: Vset = 5 V, 5 ms, Icc = 180 µA, read = 0.1 V, 5 ms
9. VM: Vperturb = 0.3 V, 5 ms
### Detailed Analysis
* **Band 1 (Blue):** NVM, I = 1 nA. The current is approximately 0 A.
* **Band 2 (Green):** NVM, Vset = 5 V, Icc = 1 µA. The current increases to approximately 2x10^-5 A.
* **Band 3 (Yellow):** VM, Vperturb = 0.3 V. The current remains around 2x10^-5 A, with some fluctuations.
* **Band 4 (Green):** NVM, Vset = 5 V, Icc = 60 µA. The current increases to approximately 6x10^-5 A.
* **Band 5 (Yellow):** VM, Vperturb = 0.3 V. The current remains around 6x10^-5 A, with some fluctuations.
* **Band 6 (Green):** NVM, Vset = 5 V, Icc = 120 µA. The current increases to approximately 1.2x10^-4 A.
* **Band 7 (Yellow):** VM, Vperturb = 0.3 V. The current remains around 1.2x10^-4 A, with some fluctuations.
* **Band 8 (Pink):** NVM, Icc = Vset = 5 V, 180 µA. The current increases to approximately 1.8x10^-4 A.
* **Band 9 (Purple):** VM, Vperturb = 0.3 V. The current remains around 1.8x10^-4 A, with some fluctuations.
### Key Observations
* The current increases significantly when the voltage is set (Vset) with increasing current limits (Icc).
* The current remains relatively stable, with minor fluctuations, when the perturb voltage (Vperturb) is applied.
* The current levels off at different plateaus corresponding to the different Icc values.
### Interpretation
The graph demonstrates the effect of applying different voltage and current conditions to a non-volatile memory (NVM) device. Setting the voltage (Vset) with increasing current limits (Icc) causes the current to increase in a stepwise manner, indicating a change in the device's resistance. Applying a perturb voltage (Vperturb) does not significantly alter the current, suggesting that it does not cause a significant change in the device's state. The data suggests that the NVM device's resistance can be controlled by varying the Vset and Icc parameters. The plateaus observed during the Vperturb phases indicate that the device maintains its state until a new Vset is applied.
</details>
supplementary note 6 To demonstrate on-the-fly "reconfigurability", our devices are switched between volatile and non-volatile modes through precise compliance current (Icc) control and the choice of activation voltages. Fig. A.32 shows that the devices can act as a volatile memory even after being set to multiple non-volatile states. This demonstrates true "reconfigurability" of our devices, which to our knowledge had not been shown before. Such behaviour enables neuromorphic implementations of synapses in Spiking Neural Networks (SNNs) that demand both volatile and non-volatile switching properties simultaneously.
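As a minimal illustration, the measurement sequence of Fig. A.32 can be written down as a replayable pulse protocol. The pulse parameters follow the figure; the `device` object and its `set()`, `read()` and `perturb()` methods are hypothetical placeholders for whatever instrument or device model is driving the experiment.

```python
# Pulse protocol of Fig. A.32: NVM SET pulses with stepwise-increasing Icc,
# each alternated with a volatile-mode perturbation read.
PROTOCOL = [
    ("NVM-read",   {"v_read": 0.1, "t": 5e-3}),
    ("NVM-set",    {"v_set": 5.0, "icc": 1e-6,   "t": 5e-3}),
    ("VM-perturb", {"v_perturb": 0.3, "t": 5e-3}),
    ("NVM-set",    {"v_set": 5.0, "icc": 60e-6,  "t": 5e-3}),
    ("VM-perturb", {"v_perturb": 0.3, "t": 5e-3}),
    ("NVM-set",    {"v_set": 5.0, "icc": 120e-6, "t": 5e-3}),
    ("VM-perturb", {"v_perturb": 0.3, "t": 5e-3}),
    ("NVM-set",    {"v_set": 5.0, "icc": 180e-6, "t": 5e-3}),
    ("VM-perturb", {"v_perturb": 0.3, "t": 5e-3}),
]

def run_protocol(device, protocol=PROTOCOL):
    """Replay the pulse sequence and collect the measured currents."""
    trace = []
    for mode, p in protocol:
        if mode == "NVM-set":
            device.set(v=p["v_set"], icc=p["icc"], duration=p["t"])
            trace.append(device.read(v=0.1))       # read after each SET
        elif mode == "VM-perturb":
            trace.append(device.perturb(v=p["v_perturb"], duration=p["t"]))
        else:  # initial NVM read
            trace.append(device.read(v=p["v_read"]))
    return trace
```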
- 1 . McCulloch, W. S. & Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5 , 115 ( 4 1943 ).
- 2 . Turing, A. M. I.-computing machinery and intelligence. Mind LIX , 433 ( 236 1950 ).
- 3 . P., H. & von Neumann, J. The computer and the brain. Mathematical Tables and Other Aids to Computation 13 , 226 ( 67 1959 ).
- 4 . Tagkopoulos, I., Liu, Y.-C. & Tavazoie, S. Predictive behavior within microbial genetic networks. Science 320 , 1313 ( 5881 2008 ).
- 5 . Chiang, W.-L., Zheng, L., Sheng, Y., Angelopoulos, A. N., Li, T., Li, D., Zhang, H., Zhu, B., Jordan, M., Gonzalez, J. E. & Stoica, I. Chatbot Arena: An open platform for evaluating LLMs by human preference. arXiv [cs.AI] ( 2024 ).
- 6 . Marr, D. Vision: A computational investigation into the human representation and processing of visual information 429 pp. (MIT Press, London, England, 2010 ).
- 7 . Richards, B. A., Lillicrap, T. P., Beaudoin, P., Bengio, Y., Bogacz, R., Christensen, A., Clopath, C., Costa, R. P., Berker, A. d., Ganguli, S., Gillon, C. J., Hafner, D., Kepecs, A., Kriegeskorte, N., Latham, P., Lindsay, G. W., Miller, K. D., Naud, R., Pack, C. C., Poirazi, P., Roelfsema, P., Sacramento, J., Saxe, A., Scellier, B., Schapiro, A. C., Senn, W., Wayne, G., Yamins, D., Zenke, F., Zylberberg, J., Therien, D. & Kording, K. P. A deep learning framework for neuroscience. Nature Neuroscience 22 , 1761 ( 2019 ).
- 8 . Braitenberg, V. & Schüz, A. in Cortex: Statistics and Geometry of Neuronal Connectivity 51 (Springer Berlin Heidelberg, Berlin, Heidelberg, 1998 ).
- 9 . OpenAI et al. GPT4 Technical Report. arXiv [cs.CL] ( 2023 ).
- 10 . Henighan, T., Kaplan, J., Katz, M., Chen, M., Hesse, C., Jackson, J., Jun, H., Brown, T. B., Dhariwal, P., Gray, S., Hallacy, C., Mann, B., Radford, A., Ramesh, A., Ryder, N., Ziegler, D. M., Schulman, J., Amodei, D. & McCandlish, S. Scaling laws for autoregressive generative modeling. arXiv [cs.LG] ( 2020 ).
- 11 . Horowitz, M. 1 . 1 Computing's Energy Problem (and what we can do about it) in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC) 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC) ( 2014 ), 10 .
- 12 . Boroumand, A., Ghose, S., Akin, B., Narayanaswami, R., Oliveira, G. F., Ma, X., Shiu, E. & Mutlu, O. Google neural network models for edge devices: Analyzing and mitigating machine learning inference bottlenecks in 2021 30 th International Conference on Parallel Architectures and Compilation Techniques (PACT) 2021 30 th International Conference on Parallel Architectures and Compilation Techniques (PACT)Atlanta, GA, USA (IEEE, 2021 ), 159 .
- 13 . Chua, L. Memristor-The missing circuit element. IEEE Trans. Circuit Theory 18 , 507 ( 5 1971 ).
- 14 . Le Gallo, M., Sebastian, A., Mathis, R., Manica, M., Giefers, H., Tuma, T., Bekas, C., Curioni, A. & Eleftheriou, E. Mixed-precision in-memory computing. Nat. Electron. 1 , 246 ( 4 2018 ).
- 15 . Bartol, T. M., Bromer, C., Kinney, J., Chirillo, M. A., Bourne, J. N., Harris, K. M. & Sejnowski, T. J. Nanoconnectomic upper bound on the variability of synaptic plasticity. Elife 4 , e 10778 ( 2015 ).
- 16 . Burr, G. W., Shelby, R. M., Sebastian, A., Kim, S., Kim, S., Sidler, S., Virwani, K., Ishii, M., Narayanan, P., Fumarola, A., Sanches, L. L., Boybat, I., Le Gallo, M., Moon, K., Woo, J., Hwang, H. & Leblebici, Y. Neuromorphic computing using non-volatile memory. Adv. Phys. X. 2 , 89 ( 1 2017 ).
- 17 . Demirag, Y. Multiphysics modeling of Ge 2 Sb 2 Te 5 based synaptic devices for brain inspired computing MA thesis (Ihsan Dogramaci Bilkent University, Ankara, Turkey, 2018 ).
- 18 . Wan, W., Kubendran, R., Schaefer, C., Eryilmaz, S. B., Zhang, W., Wu, D., Deiss, S., Raina, P., Qian, H., Gao, B., et al. A compute-in-memory chip based on resistive random-access memory. Nature 608 , 504 ( 2022 ).
- 19 . Chanthbouala, A., Garcia, V., Cherifi, R. O., Bouzehouane, K., Fusil, S., Moya, X., Xavier, S., Yamada, H., Deranlot, C., Mathur, N. D., Bibes, M., Barthélémy, A. & Grollier, J. A ferroelectric memristor. Nat. Mater. 11 , 860 ( 10 2012 ).
- 20 . Lu, S. & Sengupta, A. Exploring the connection between binary and Spiking Neural Networks. Front. Neurosci. 14 , 535 ( 2020 ).
- 21 . John, R. A., Acharya, J., Zhu, C., Surendran, A., Bose, S. K., Chaturvedi, A., Tiwari, N., Gao, Y., He, Y., Zhang, K. K., Xu, M., Leong, W. L., Liu, Z., Basu, A. & Mathews, N. Optogenetics inspired transition metal dichalcogenide neuristors for in-memory deep recurrent neural networks. Nat. Commun. 11 , 3211 ( 1 2020 ).
- 22 . Sidler, S., Boybat, I., Shelby, R. M., Narayanan, P., Jang, J., Fumarola, A., Moon, K., Leblebici, Y., Hwang, H. & Burr, G. W. Large-scale neural networks implemented with Non-Volatile Memory as the synaptic weight element: Impact of conductance response in 2016 46 th European Solid-State Device Research Conference (ESSDERC) ESSDERC 2016 -46 th European Solid-State Device Research ConferenceLausanne, Switzerland (IEEE, 2016 ), 440 .
- 23 . Khaddam-Aljameh, R., Stanisavljevic, M., Mas, J. F., Karunaratne, G., Brändli, M., Liu, F., Singh, A., Müller, S. M., Egger, U., Petropoulos, A., Antonakopoulos, T., Brew, K., Choi, S., Ok, I., Lie, F. L., Saulnier, N., Chan, V ., Ahsan, I., Narayanan, V., Nandakumar, S. R., Le Gallo, M., Francese, P. A., Sebastian, A. & Eleftheriou, E. HERMES-Core-A 1 . 59 -TOPS/mm 2 PCM on 14 -nm CMOS In-Memory Compute Core Using 300 -ps/LSB Linearized CCO-Based ADCs. IEEE J. Solid-State Circuits , 1 ( 2022 ).
- 24 . Berdan, R., Vasilaki, E., Khiat, A., Indiveri, G., Serb, A. & Prodromakis, T. Emulating short-term synaptic dynamics with memristive devices. Sci. Rep. 6 , 18639 ( 1 2016 ).
- 25 . Ohno, T., Hasegawa, T., Tsuruoka, T., Terabe, K., Gimzewski, J. K. & Aono, M. Short-term plasticity and long-term potentiation mimicked in single inorganic synapses. Nat. Mater. 10 , 591 ( 8 2011 ).
- 26 . Zhang, X., Wang, W., Liu, Q., Zhao, X., Wei, J., Cao, R., Yao, Z., Zhu, X., Zhang, F., Lv, H., Long, S. & Liu, M. An artificial neuron based on a threshold switching memristor. IEEE Electron Device Lett. 39 , 308 ( 2 2018 ).
- 27 . Huang, H.-M., Yang, R., Tan, Z.-H., He, H.-K., Zhou, W., Xiong, J. & Guo, X. Quasi-HodgkinHuxley neurons with leaky integrate-and-fire functions physically realized with memristive devices. Adv. Mater. 31 , e 1803849 ( 3 2019 ).
- 28 . Dalgaty, T., Moro, F., Demirag, Y., De Pra, A., Indiveri, G., Vianello, E. & Payvand, M. Mosaic: in-memory computing and routing for small-world spike-based neuromorphic systems. Nat. Commun. 15 , 1 ( 1 2024 ).
- 29 . Indiveri, G., Linares-Barranco, B., Hamilton, T., van Schaik, A., Etienne-Cummings, R., Delbruck, T., Liu, S.-C., Dudek, P., Häfliger, P., Renaud, S., Schemmel, J., Cauwenberghs, G., Arthur, J., Hynna, K., Folowosele, F., Saighi, S., Serrano-Gotarredona, T., Wijekoon, J., Wang, Y. & Boahen, K. Neuromorphic silicon neuron circuits. Frontiers in Neuroscience 5 , 1 ( 2011 ).
- 30 . Sterling, P. in Principles of Neural Design 155 (The MIT Press, 2015 ).
- 31 . Liu, S.-C., Van Schaik, A., Minch, B. A. & Delbruck, T. Event-based 64 -channel binaural silicon cochlea with Q enhancement mechanisms in 2010 IEEE International Symposium on Circuits and Systems (ISCAS) ( 2010 ), 2027 .
- 32 . Rueckauer, B. & Delbruck, T. Evaluation of event-based algorithms for optical flow with ground-truth from inertial measurement sensor. Front. Neurosci. 10 , 176 ( 2016 ).
- 33 . Mahowald, M. VLSI analogs of neuronal visual processing: a synthesis of form and function ( 1992 ).
- 34 . Boahen, K. Dendrocentric learning for synthetic intelligence. Nature 612 , 43 ( 7938 2022 ).
- 35 . Kubke, M. F., Massoglia, D. P. & Carr, C. E. Developmental changes underlying the formation of the specialized time coding circuits in barn owls (Tyto alba). J. Neurosci. 22 , 7671 ( 17 2002 ).
- 36 . O'Keefe, J. & Recce, M. L. Phase relationship between hippocampal place units and the EEG theta rhythm. Hippocampus 3 , 317 ( 3 1993 ).
- 37 . MacKay, D. M. & McCulloch, W. S. The limiting information capacity of a neuronal link. Bulletin of Mathematical Biophysics 14 , 127 ( 2 1952 ).
- 38 . Aceituno, P. V., de Haan, S., Loidl, R. & Grewe, B. F. Hierarchical target learning in the mammalian neocortex: A pyramidal neuron perspective. bioRxiv ( 2024 ).
- 39 . Payeur, A., Guerguiev, J., Zenke, F., Richards, B. A. & Naud, R. Burst-dependent synaptic plasticity can coordinate learning in hierarchical circuits. bioRxiv ( 2020 ).
- 40 . Gerstner, W., Kempter, R., van Hemmen, J. L. & Wagner, H. A neuronal learning rule for sub-millisecond temporal coding. Nature 383 , 76 ( 6595 1996 ).
- 41 . Demirag, Y. & Indiveri, G. Network of biologically plausible neuron models can solve motor tasks through heterogeneity in Computational and Systems Neuroscience (COSYNE) (Lisbon, Portugal, 2024 ).
- 42 . Hardtdegen, A., La Torre, C., Cuppers, F., Menzel, S., Waser, R. & Hoffmann-Eifert, S. Improved switching stability and the effect of an internal series resistor in HfO 2 /TiO x bilayer ReRAM cells. IEEE Trans. Electron Devices 65 , 3229 ( 8 2018 ).
- 43 . Nandakumar, S. R., Boybat, I., Han, J.-P., Ambrogio, S., Adusumilli, P., Bruce, R. L., BrightSky, M., Rasch, M., Le Gallo, M. & Sebastian, A. Precision of synaptic weights programmed in phase-change memory devices for deep learning inference in 2020 IEEE International Electron Devices Meeting (IEDM) 2020 IEEE International Electron Devices Meeting (IEDM)San Francisco, CA, USA (IEEE, 2020 ), 29 . 4 . 1 .
- 44 . Gong, N., Idé, T., Kim, S., Boybat, I., Sebastian, A., Narayanan, V. & Ando, T. Signal and noise extraction from analog memory elements for neuromorphic computing. Nat. Commun. 9 , 2102 ( 1 2018 ).
- 45 . Gallo, M. L., Kaes, M., Sebastian, A. & Krebs, D. Subthreshold electrical transport in amorphous phase-change materials. New Journal of Physics 17 , 093035 ( 2015 ).
- 46 . Burr, G., Shelby, R., Nolfo, C., Jang, J., Shenoy, R., Narayanan, P., Virwani, K., Giacometti, E., Kurdi, B. & Hwang, H. Experimental Demonstration and Tolerancing of a Large-Scale Neural Network ( 165 , 000 Synapses), using Phase-Change Memory as the Synaptic Weight Element. 2014 IEEE International Electron Devices Meeting , 1 ( 2014 ).
- 47 . Stathopoulos, S., Khiat, A., Trapatseli, M., Cortese, S., Serb, A., Valov, I. & Prodromakis, T. Multibit memory operation of metal-oxide bi-layer memristors. Scientific reports 7 , 17532 ( 2017 ).
- 48 . Prezioso, M., Merrikh-Bayat, F., Hoskins, B., Adam, G. C., Likharev, K. K. & Strukov, D. B. Training and operation of an integrated neuromorphic network based on metal-oxide memristors. Nature 521 , 61 ( 2015 ).
- 49 . Bayat, F. M., Prezioso, M., Chakrabarti, B., Nili, H., Kataeva, I. & Strukov, D. Implementation of multilayer perceptron network with highly uniform passive memristive crossbar circuits. Nature communications 9 , 2331 ( 2018 ).
- 50 . Li, C., Hu, M., Li, Y., Jiang, H., Ge, N., Montgomery, E., Zhang, J., Song, W., Dávila, N., Graves, C. E., et al. Analogue signal and image processing with large memristor crossbars. Nature Electronics 1 , 52 ( 2018 ).
- 51 . Boybat, I., Le Gallo, M., Nandakumar, S., Moraitis, T., Parnell, T., Tuma, T., Rajendran, B., Leblebici, Y., Sebastian, A. & Eleftheriou, E. Neuromorphic computing with multimemristive synapses. Nature communications 9 , 2514 ( 2018 ).
- 52 . Agarwal, S., Jacobs Gedrim, R. B., Hsia, A. H., Hughart, D. R., Fuller, E. J., Alec Talin, A., James, C. D., Plimpton, S. J. & Marinella, M. J. Achieving ideal accuracies in analog neuromorphic computing using periodic carry in 2017 Symposium on VLSI Technology (IEEE, 2017 ), T 174 .
- 53 . Suri, M., Bichler, O., Querlioz, D., Palma, G., Vianello, E., Vuillaume, D., Gamrat, C. & DeSalvo, B. CBRAM devices as binary synapses for low-power stochastic neuromorphic systems: auditory (cochlea) and visual (retina) cognitive processing applications in 2012 International Electron Devices Meeting ( 2012 ), 10 .
- 54 . Payvand, M., Muller, L. K. & Indiveri, G. Event-based circuits for controlling stochastic learning with memristive devices in neuromorphic architectures in 2018 IEEE International Symposium on Circuits and Systems (ISCAS) ( 2018 ), 1 .
- 55 . Ambrogio, S., Narayanan, P., Tsai, H., Shelby, R. M., Boybat, I., di Nolfo, C., Sidler, S., Giordano, M., Bodini, M., Farinha, N. C., et al. Equivalent-accuracy accelerated neuralnetwork training using analogue memory. Nature 558 , 60 ( 2018 ).
- 56 . Lillicrap, T. P. & Santoro, A. Backpropagation through time and the brain. Curr. Opin. Neurobiol. Machine Learning, Big Data, and Neuroscience 55 , 82 ( 2019 ).
- 57 . Attneave, F. & Hebb, D. O. The organization of behavior; A neuropsychological theory. Am. J. Psychol. 63 , 633 ( 4 1950 ).
- 58 . Sjöström, P. J., Turrigiano, G. G. & Nelson, S. B. Rate, timing, and cooperativity jointly determine cortical synaptic plasticity. Neuron 32 , 1149 ( 6 2001 ).
- 59 . Feldman, D. E. The spike-timing dependence of plasticity. Neuron 75 , 556 ( 4 2012 ).
- 60 . Oja, E. A simplified neuron model as a principal component analyzer. J. Math. Biol. 15 , 267 ( 3 1982 ).
- 61 . Frenkel, C., Lefebvre, M., Legat, J.-D. & Bol, D. A 0 . 086 -mm 2 12 . 7 -pJ/SOP 64 k-Synapse 256 -Neuron Online-Learning Digital Spiking Neuromorphic Processor in 28 -nm CMOS. IEEE Trans. Biomed. Circuits Syst. 13 , 145 ( 1 2019 ).
- 62 . Frenkel, C., Legat, J.-D. & Bol, D. MorphIC: A 65 -nm 738 k-synapse/mm 2 quad-core binaryweight digital neuromorphic processor with stochastic spike-driven online learning. arXiv [cs.NE] ( 2019 ).
- 63 . Mayr, C., Partzsch, J., Noack, M., Hänzsche, S., Scholze, S., Höppner, S., Ellguth, G. & Schüffny, R. A biological-realtime neuromorphic system in 28 nm CMOS using low-leakage switched capacitor circuits. IEEE Trans. Biomed. Circuits Syst. 10 , 243 ( 1 2016 ).
- 64 . Brader, J. M., Senn, W. & Fusi, S. Learning real-world stimuli in a neural network with spike-driven synaptic dynamics. Neural computation 19 , 2881 ( 2007 ).
- 65 . Marblestone, A. H., Wayne, G. & Kording, K. P. Toward an integration of deep learning and neuroscience. Front. Comput. Neurosci. 10 , 94 ( 2016 ).
- 66 . Laskin, M., Metz, L., Nabarro, S., Saroufim, M., Noune, B., Luschi, C., Sohl-Dickstein, J. & Abbeel, P. Parallel training of deep networks with local updates. arXiv [cs.LG] ( 2020 ).
- 67 . Kaiser, J., Mostafa, H. & Neftci, E. Synaptic plasticity dynamics for Deep Continuous Local Learning (DECOLLE). Front. Neurosci. 14 , 424 ( 2020 ).
- 68 . Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by backpropagating errors. Nature 323 , 533 ( 6088 1986 ).
- 69 . Lillicrap, T. P., Cownden, D., Tweed, D. B. & Akerman, C. J. Random synaptic feedback weights support error backpropagation for deep learning. Nat. Commun. 7 , 13276 ( 1 2016 ).
- 70 . Pozzi, I., Bohté, S. & Roelfsema, P. A biologically plausible learning rule for deep learning in the brain. arXiv [cs.NE] ( 2018 ).
- 71 . Lee, D.-H., Zhang, S., Fischer, A. & Bengio, Y. Difference Target Propagation. arXiv [cs.LG] ( 2014 ).
- 72 . Millidge, B., Tschantz, A. & Buckley, C. L. Predictive Coding Approximates Backprop along Arbitrary Computation Graphs. arXiv ( 2020 ).
- 73 . Baydin, A. G., Pearlmutter, B. A., Syme, D., Wood, F. & Torr, P. Gradients without Backpropagation. arXiv [cs.LG] ( 2022 ).
- 74 . Liu, Y. H., Ghosh, A., Richards, B. A., Shea-Brown, E. & Lajoie, G. Beyond accuracy: generalization properties of bio-plausible temporal credit assignment rules. arXiv [cs.NE] ( 2022 ).
- 75 . Bellec, G., Scherr, F., Subramoney, A., Hajek, E., Salaj, D., Legenstein, R. & Maass, W. A solution to the learning dilemma for recurrent networks of spiking neurons. Nature Communications 11 , 1 ( 2020 ).
- 76 . Nagel, M., Fournarakis, M., Amjad, R. A., Bondarenko, Y., van Baalen, M. & Blankevoort, T. A white paper on neural network quantization. arXiv [cs.LG] ( 2021 ).
- 77 . Frenkel, C. & Indiveri, G. ReckOn: A 28 nm sub-mm 2 task-agnostic spiking recurrent neural network processor enabling on-chip learning over second-long timescales. arXiv [cs.NE] ( 2022 ).
- 78 . Lee, J., Kim, D. & Ham, B. Network quantization with element-wise gradient scaling. arXiv [cs.CV] ( 2021 ).
- 79 . Fournarakis, M. & Nagel, M. In-Hindsight Quantization Range Estimation for Quantized Training. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) , 3057 ( 2021 ).
- 80 . Bernstein, J., Zhao, J., Meister, M., Liu, M.-Y., Anandkumar, A. & Yue, Y. Learning compositional functions via multiplicative weight updates. arXiv [cs.NE] ( 2020 ).
- 81 . Nandakumar, S., Le Gallo, M., Boybat, I., Rajendran, B., Sebastian, A. & Eleftheriou, E. A phase-change memory model for neuromorphic computing. Journal of Applied Physics 124 , 152135 ( 2018 ).
- 82 . Bellec, G., Salaj, D., Subramoney, A., Legenstein, R. & Maass, W. Long short-term memory and learning-to-learn in networks of spiking neurons in Advances in Neural Information Processing Systems ( 2018 ), 787 .
- 83 . Zenke, F. & Neftci, E. O. Brain-Inspired Learning on Neuromorphic Substrates. Proceedings of the IEEE , 1 ( 2021 ).
- 84 . Bohnstingl, T., Woźniak, S., Maass, W., Pantazi, A. & Eleftheriou, E. Online Spatio-Temporal Learning in Deep Neural Networks. arXiv ( 2020 ).
- 85 . Zenke, F. & Ganguli, S. Superspike: Supervised learning in multilayer spiking neural networks. Neural computation 30 , 1514 ( 2018 ).
- 86 . Perez-Nieves, N. & Goodman, D. F. M. Sparse Spiking Gradient Descent. arXiv ( 2021 ).
- 87 . Singh, S. P. & Sutton, R. S. Reinforcement Learning with Replacing Eligibility Traces. Mach. Learn. 22 , 123 ( 1 / 2 / 3 1996 ).
- 88 . Hull, C. L. Principles of Behavior. The Journal of Nervous and Mental Disease 101 , 396 ( 4 1945 ).
- 89 . Davies, M., Srinivasa, N., Lin, T.-H., Chinya, G., Cao, Y., Choday, S. H., Dimou, G., Joshi, P., Imam, N., Jain, S., Liao, Y., Lin, C.-K., Lines, A., Liu, R., Mathaikutty, D., McCoy, S., Paul, A., Tse, J., Venkataramanan, G., Weng, Y.-H., Wild, A., Yang, Y. & Wang, H. Loihi: A neuromorphic manycore processor with on-chip learning. IEEE Micro 38 , 82 ( 2018 ).
- 90 . Grübl, A., Billaudelle, S., Cramer, B., Karasenko, V. & Schemmel, J. Verification and Design Methods for the BrainScaleS Neuromorphic Hardware System. arXiv preprint arXiv: 2003 . 11455 ( 2020 ).
- 91 . Furber, S., Galluppi, F., Temple, S. & Plana, L. The SpiNNaker Project. Proceedings of the IEEE 102 , 652 ( 2014 ).
- 92 . Backus, J. Can programming be liberated from the von Neumann style?: a functional style and its algebra of programs. Communications of the ACM 21 , 613 ( 1978 ).
- 93 . Indiveri, G. & Liu, S.-C. Memory and information processing in neuromorphic systems. Proceedings of the IEEE 103 , 1379 ( 2015 ).
- 94 . Rubino, A., Livanelioglu, C., Qiao, N., Payvand, M. & Indiveri, G. Ultra-low-power FDSOI neural circuits for extreme-edge neuromorphic intelligence. IEEE Trans. Circuits Syst. I Regul. Pap. 68 , 45 ( 1 2021 ).
- 95 . Fuller, E. J., Keene, S. T., Melianas, A., Wang, Z., Agarwal, S., Li, Y., Tuchman, Y., James, C. D., Marinella, M. J., Yang, J. J., Salleo, A. & Talin, A. A. Parallel programming of an ionic floating-gate memory array for scalable neuromorphic computing. Science 364 , 570 ( 6440 2019 ).
- 96 . Huang, Y.-J., Chao, S.-C., Lien, D., Wen, C., He, J.-H. & Lee, S.-C. Dual-functional memory and threshold resistive switching based on the push-pull mechanism of oxygen ions. Sci. Rep. 6 , 23945 ( 1 2016 ).
- 97 . Abbas, H., Abbas, Y., Hassan, G., Sokolov, A. S., Jeon, Y.-R., Ku, B., Kang, C. J. & Choi, C. The coexistence of threshold and memory switching characteristics of ALD HfO 2 memristor synaptic arrays for energy-efficient neuromorphic computing. Nanoscale 12 , 14120 ( 26 2020 ).
- 98 . Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J. & Amodei, D. Scaling laws for neural language models. arXiv [cs.LG] ( 2020 ).
- 99 . Prezioso, M., Merrikh-Bayat, F., Hoskins, B., Adam, G., Likharev, K. K. & Strukov, D. B. Training and operation of an integrated neuromorphic network based on metal-oxide memristors. Nature 521 , 61 ( 2015 ).
- 100 . Yao, P., Wu, H., Gao, B., Tang, J., Zhang, Q., Zhang, W., Yang, J. J. & Qian, H. Fully hardware-implemented memristor convolutional neural network. Nature 577 , 641 ( 2020 ).
- 101 . Chen, J., Yang, S., Wu, H., Indiveri, G. & Payvand, M. Scaling Limits of Memristor-Based Routers for Asynchronous Neuromorphic Systems. arXiv preprint arXiv: 2307 . 08116 ( 2023 ).
- 102 . Vianello, E. & Payvand, M. Scaling neuromorphic systems with 3 D technologies. Nat. Electron. 7 , 419 ( 6 2024 ).
- 103 . Moradi, S., Qiao, N., Stefanini, F. & Indiveri, G. A Scalable Multicore Architecture With Heterogeneous Memory Structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs). Biomedical Circuits and Systems, IEEE Transactions on 12 , 106 ( 2018 ).
- 104 . Sutton, R. S. The Bitter Lesson ( 2024 ).
- 105 . Ielmini, D. Resistive switching memories based on metal oxides: mechanisms, reliability and scaling. Semiconductor Science and Technology 31 , 063002 ( 2016 ).
- 106 . Widrow, B. & Winter, R. Neural nets for adaptive filtering and adaptive pattern recognition. Computer 21 , 25 ( 1988 ).
- 107 . Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J. & Chintala, S. in Advances in Neural Information Processing Systems 32 (eds Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E. & Garnett, R.) 8024 (Curran Associates, Inc., 2019 ).
- 108 . Bernstein, J., Wang, Y.-X., Azizzadenesheli, K. & Anandkumar, A. signSGD: Compressed Optimisation for Non-Convex Problems. arXiv [cs.LG] ( 2018 ).
- 109 . Boybat, I., Gallo, M. L., Moraitis, T., Parnell, T., Tuma, T., Rajendran, B., Leblebici, Y., Sebastian, A. & Eleftheriou, E. Neuromorphic computing with multi-memristive synapses. Nature Communications 9 , 2514 ( 2018 ).
- 110 . Nandakumar, S. R., Gallo, M. L., Piveteau, C., Joshi, V., Mariani, G., Boybat, I., Karunaratne, G., Khaddam-Aljameh, R., Egger, U., Petropoulos, A., Antonakopoulos, T., Rajendran, B., Sebastian, A. & Eleftheriou, E. Mixed-Precision Deep Learning Based on Computational Memory. Frontiers in Neuroscience 14 , 406 ( 2020 ).
- 111 . Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R. & Bengio, Y. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations. arXiv ( 2016 ).
- 112 . Le Gallo, M., Krebs, D., Zipoli, F., Salinga, M. & Sebastian, A. Collective Structural Relaxation in Phase-Change Memory Devices. Advanced Electronic Materials 4 , 1700627 ( 2018 ).
- 113 . Salimans, T., Ho, J., Chen, X., Sidor, S. & Sutskever, I. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. arXiv ( 2017 ).
- 114 . Indiveri, G., Linares-Barranco, B., Legenstein, R., Deligeorgis, G. & Prodromakis, T. Integration of nanoscale memristor synapses in neuromorphic computing architectures. Nanotechnology 24 , 384010 ( 2013 ).
- 115 . Payvand, M., Nair, M. V., Müller, L. K. & Indiveri, G. A neuromorphic systems approach to in-memory computing with non-ideal memristive devices: From mitigation to exploitation. Faraday Discussions 213 , 487 ( 2019 ).
- 116 . Deiss, S., Douglas, R. & Whatley, A. in Pulsed Neural Networks (eds Maass, W. & Bishop, C.) 157 (MIT Press, 1998 ).
- 117 . Alibart, F., Gao, L., Hoskins, B. D. & Strukov, D. B. High precision tuning of state for memristive devices by adaptable variation-tolerant algorithm. Nanotechnology 23 , 075201 ( 2012 ).
- 118 . Nandakumar, S., Le Gallo, M., Boybat, I., Rajendran, B., Sebastian, A. & Eleftheriou, E. Mixed-precision architecture based on computational memory for training deep neural networks in 2018 IEEE International Symposium on Circuits and Systems (ISCAS) ( 2018 ), 1 .
- 119 . Grossi, A., Nowak, E., Zambelli, C., Pellissier, C., Bernasconi, S., Cibrario, G., Hajjam, K. E., Crochemore, R., Nodin, J. F., Olivo, P. & Perniola, L. Fundamental variability limits of filament-based RRAM in 2016 IEEE International Electron Devices Meeting (IEDM) ( 2016 ), 4 . 7 . 1 .
- 120 . Payvand, M. & Indiveri, G. Spike-Based Plasticity Circuits for Always-on On-Line Learning in Neuromorphic Systems in 2019 IEEE International Symposium on Circuits and Systems (ISCAS) (IEEE, Sapporo, Japan, 2019 ), 1 .
- 121 . Delbruck, T. 'Bump' circuits for computing similarity and dissimilarity of analog voltages in Neural Networks, 1991 ., IJCNN91 -Seattle International Joint Conference on 1 ( 1991 ), 475 .
- 122 . Payvand, M., Fouda, M., Kurdahi, F., Eltawil, A. & Neftci, E. O. Error-triggered Three-Factor Learning Dynamics for Crossbar Arrays. arXiv preprint arXiv: 1910 . 06152 ( 2019 ).
- 123 . Goodman, D. Brian: a simulator for spiking neural networks in Python. Frontiers in Neuroinformatics 2 ( 2008 ).
- 124 . Lecun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 , 2278 ( 1998 ).
- 125 . Dalgaty, T., Payvand, M., Moro, F., Ly, D. R., Pebay-Peyroula, F., Casas, J., Indiveri, G. & Vianello, E. Hybrid neuromorphic circuits exploiting non-conventional properties of RRAM for massively parallel local plasticity mechanisms. APL Materials 7 , 081125 ( 2019 ).
- 126 . Demirag, Y., Frenkel, C., Payvand, M. & Indiveri, G. Online training of spiking recurrent neural networks with Phase-Change Memory synapses 2021 .
- 127 . Demirag, Y., Moro, F., Dalgaty, T., Navarro, G., Frenkel, C., Indiveri, G., Vianello, E. & Payvand, M. PCM-trace: Scalable synaptic eligibility traces with resistivity drift of phase-change materials in 2021 IEEE International Symposium on Circuits and Systems (ISCAS) 2021 IEEE International Symposium on Circuits and Systems (ISCAS)Daegu, Korea. 00 (IEEE, 2021 ), 1 .
- 128 . Bohnstingl, T., Surina, A., Fabre, M., Demirag, Y., Frenkel, C., Payvand, M., Indiveri, G. & Pantazi, A. Biologically-inspired training of spiking recurrent neural networks with neuromorphic hardware in 2022 IEEE 4 th International Conference on Artificial Intelligence Circuits and Systems (AICAS) 2022 IEEE 4 th International Conference on Artificial Intelligence Circuits and Systems (AICAS)Incheon, Korea, Republic of. 00 (IEEE, 2022 ), 218 .
- 129 . Khrulkov, V., Novikov, A. & Oseledets, I. Expressive power of recurrent neural networks. arXiv ( 2017 ).
- 130 . Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A. & Kavukcuoglu, K. WaveNet: A Generative Model for Raw Audio. arXiv: 1609 . 03499 [cs] ( 2016 ).
- 131 . Teed, Z. & Deng, J. RAFT: Recurrent All-Pairs Field Transforms for Optical Flow. arXiv ( 2020 ).
- 132 . Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I. & Amodei, D. Language Models are Few-Shot Learners. arXiv ( 2020 ).
- 133 . Berner, C., Brockman, G., Chan, B., Cheung, V., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., Hesse, C., Józefowicz, R., Gray, S., Olsson, C., Pachocki, J., Petrov, M., Salimans, T., Schlatter, J., Schneider, J., Sidor, S., Sutskever, I., Tang, J., Wolski, F. & Zhang, S. Dota 2 with Large Scale Deep Reinforcement Learning, 66 ( 2019 ).
- 134 . Ha, D. & Schmidhuber, J. World Models. arXiv: 1803 . 10122 [cs, stat] ( 2018 ).
- 135 . Vinyals, O., Babuschkin, I., Czarnecki, W. M., Mathieu, M., Dudzik, A., Chung, J., Choi, D. H., Powell, R., Ewalds, T., Georgiev, P., Oh, J., Horgan, D., Kroiss, M., Danihelka, I., Huang, A., Sifre, L., Cai, T., Agapiou, J. P., Jaderberg, M., Vezhnevets, A. S., Leblond, R., Pohlen, T., Dalibard, V., Budden, D., Sulsky, Y., Molloy, J., Paine, T. L., Gulcehre, C., Wang, Z., Pfaff, T., Wu, Y., Ring, R., Yogatama, D., Wünsch, D., McKinney, K., Smith, O., Schaul, T., Lillicrap, T., Kavukcuoglu, K., Hassabis, D., Apps, C. & Silver, D. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575 , 350 ( 2019 ).
- 136 . Douglas, R. & Martin, K. in The synaptic organization of the brain (ed Shepherd, G.) 4 th, 459 (Oxford University Press, Oxford, New York, 1998 ).
- 137 . Douglas, R., Mahowald, M. & Mead, C. Neuromorphic analogue VLSI. Annual Review of Neuroscience 18 , 255 ( 1995 ).
- 138 . Ambrogio, S., Narayanan, P., Tsai, H., Shelby, R. M., Boybat, I., di Nolfo, C., Sidler, S., Giordano, M., Bodini, M., Farinha, N. C. P., Killeen, B., Cheng, C., Jaoudi, Y. & Burr, G. W. Equivalent-accuracy accelerated neural-network training using analogue memory. Nature 558 , 60 ( 2018 ).
- 139 . Li, C., Belkin, D., Li, Y., Yan, P., Hu, M., Ge, N., Jiang, H., Montgomery, E., Lin, P., Wang, Z., Song, W., Strachan, J. P., Barnell, M., Wu, Q., Williams, R. S., Yang, J. J. & Xia, Q. Efficient and self-adaptive in-situ learning in multilayer memristor neural network. Nature Communications 9 , 1 ( 2018 ).
- 140 . Dalgaty, T., Castellani, N., Turck, C., Harabi, K.-E., Querlioz, D. & Vianello, E. In situ learning using intrinsic memristor variability via Markov chain Monte Carlo sampling. Nature Electronics 4 , 151 ( 2021 ).
- 141 . Cai, F., Kumar, S., Van Vaerenbergh, T., Sheng, X., Liu, R., Li, C., Liu, Z., Foltin, M., Yu, S., Xia, Q., et al. Power-efficient combinatorial optimization using intrinsic noise in memristor Hopfield neural networks. Nature Electronics 3 , 409 ( 2020 ).
- 142 . Sebastian, A., Gallo, M. L. & Eleftheriou, E. Computational phase-change memory: beyond von Neumann computing. Journal of Physics D: Applied Physics 52 , 443002 ( 2019 ).
- 143 . Payvand, M., Nair, M. V., Müller, L. K. & Indiveri, G. A neuromorphic systems approach to in-memory computing with non-ideal memristive devices: From mitigation to exploitation. Faraday Discussions 213 , 487 ( 2019 ).
- 144 . Chicca, E. & Indiveri, G. A recipe for creating ideal hybrid memristive-CMOS neuromorphic processing systems. Applied Physics Letters 116 , 120501 ( 2020 ).
- 145 . Peng, X., Huang, S., Luo, Y., Sun, X. & Yu, S. DNN+NeuroSim: An End-to-End Benchmarking Framework for Compute -in-Memory Accelerators with Versatile Device Technologies. IEEE International Electron Devices Meeting (IEDM) , 32 . 5 . 1 ( 2019 ).
- 146 . Peng, X., Huang, S., Jiang, H., Lu, A. & Yu, S. DNN+NeuroSim V 2 . 0 : An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators for On-chip Training. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems PP , 1 ( 2020 ).
- 147 . Burr, G. W., Brightsky, M. J., Sebastian, A., Cheng, H.-Y., Wu, J.-Y., Kim, S., Sosa, N. E., Papandreou, N., Lung, H.-L., Pozidis, H., Eleftheriou, E. & Lam, C. H. Recent Progress in Phase-Change Memory Technology. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 6 , 146 ( 2016 ).
- 148 . Burr, G. W., Shelby, R. M., Sebastian, A., Kim, S., Kim, S., Sidler, S., Virwani, K., Ishii, M., Narayanan, P., Fumarola, A., Sanches, L. L., Boybat, I., Le Gallo, M., Moon, K., Woo, J., Hwang, H. & Leblebici, Y. Neuromorphic computing using non-volatile memory. Advances in Physics: X 2 , 89 ( 2017 ).
- 149 . Tuma, T., Pantazi, A., Le Gallo, M., Sebastian, A. & Eleftheriou, E. Stochastic phase-change neurons. Nature Nanotechnology 11 , 693 ( 2016 ).
- 150 . Karunaratne, G., Gallo, M. L., Cherubini, G., Benini, L., Rahimi, A. & Sebastian, A. Inmemory hyperdimensional computing. Nature Electronics 3 , 327 ( 2020 ).
- 151 . Demirag, Y., Moro, F., Dalgaty, T., Navarro, G., Frenkel, C., Indiveri, G., Vianello, E. & Payvand, M. PCM-trace: Scalable Synaptic Eligibility Traces with Resistivity Drift of PhaseChange Materials. 2021 IEEE International Symposium on Circuits and Systems (ISCAS) , 1 ( 2021 ).
- 152 . Gallo, M. L., Sebastian, A., Cherubini, G., Giefers, H. & Eleftheriou, E. Compressed Sensing With Approximate Message Passing Using In-Memory Computing. IEEE Transactions on Electron Devices 65 , 4304 ( 2018 ).
- 153 . Gallo, M. L. & Sebastian, A. An overview of phase-change memory device physics. Journal of Physics D: Applied Physics 53 , 213002 ( 2020 ).
- 154 . Gallo, M. L., Athmanathan, A., Krebs, D. & Sebastian, A. Evidence for thermally assisted threshold switching behavior in nanoscale phase-change memory cells. Journal of Applied Physics 119 , 025704 ( 2016 ).
- 155 . Ielmini, D., Lavizzari, S., Sharma, D. & Lacaita, A. L. Physical Interpretation, Modeling and Impact on Phase Change Memory (PCM) Reliability of Resistance Drift Due to Chalcogenide Structural Relaxation. 2007 IEEE International Electron Devices Meeting , 939 ( 2007 ).
- 156 . Karpov, I., Mitra, M., Kau, D., Spadini, G., Kryukov, Y. & Karpov, V. Fundamental drift of parameters in chalcogenide phase change memory. Journal of Applied Physics 102 , 124503 ( 2007 ).
- 157 . Redaelli, A., Pirovano, A., Benvenuti, A. & Lacaita, A. L. Threshold switching and phase transition numerical models for phase change memory simulations. Journal of Applied Physics 103 , 111101 ( 2008 ).
- 158 . Salinga, M., Carria, E., Kaldenbach, A., Bornhöfft, M., Benke, J., Mayer, J. & Wuttig, M. Measurement of crystal growth velocity in a melt-quenched phase-change material. Nature Communications 4 , 2371 ( 2013 ).
- 159 . Nardone, M., Kozub, V. I., Karpov, I. V. & Karpov, V. G. Possible mechanisms for 1 /f noise in chalcogenide glasses: A theoretical description. Physical Review B 79 , 165206 ( 2009 ).
- 160 . Frémaux, N. & Gerstner, W. Neuromodulated spike-timing-dependent plasticity, and theory of three-factor learning rules. Front. Neur. Circ. 9 , 85 ( 2016 ).
- 161 . Sacramento, J., Costa, R. P., Bengio, Y. & Senn, W. Dendritic cortical microcircuits approximate the backpropagation algorithm in Advances in neural information processing systems ( 2018 ), 8721 .
- 162 . Pozzi, I., Bohté, S. & Roelfsema, P. A Biologically Plausible Learning Rule for Deep Learning in the Brain. arXiv ( 2018 ).
- 163 . Sussillo, D. & Abbott, L. Generating coherent patterns of activity from chaotic neural networks. Neuron 63 , 544 ( 2009 ).
- 164 . Nicola, W. & Clopath, C. Supervised Learning in Spiking Neural Networks with FORCE Training. Nature Communications 8 , 2208 ( 2017 ).
- 165 . Neftci, E. O., Mostafa, H. & Zenke, F. Surrogate gradient learning in spiking neural networks: Bringing the power of gradient-based optimization to spiking neural networks. IEEE Signal Processing Magazine 36 , 51 ( 2019 ).
- 166 . Lee, J. H., Delbruck, T. & Pfeiffer, M. Training Deep Spiking Neural Networks Using Backpropagation. Frontiers in Neuroscience 10 , 508 ( 2016 ).
- 167 . Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural computation 9 , 1735 ( 1997 ).
- 168 . Qiao, N., Mostafa, H., Corradi, F., Osswald, M., Stefanini, F., Sumislawska, D. & Indiveri, G. A reconfigurable on-line learning spiking neuromorphic processor comprising 256 neurons and 128 K synapses. Frontiers in neuroscience 9 , 141 ( 2015 ).
- 169 . Payvand, M., Muller, L. K. & Indiveri, G. Event-based circuits for controlling stochastic learning with memristive devices in neuromorphic architectures in Circuits and Systems (ISCAS), 2018 IEEE International Symposium on ( 2018 ), 1 .
- 170 . Nair, M. V., Mueller, L. K. & Indiveri, G. A differential memristive synapse circuit for on-line learning in neuromorphic computing systems. Nano Futures 1 , 1 ( 2017 ).
- 171 . Balles, L., Pedregosa, F. & Roux, N. L. The Geometry of Sign Gradient Descent. arXiv ( 2020 ).
- 172 . Nair, M. V. & Dudek, P. Gradient-descent-based learning in memristive crossbar arrays in International Joint Conference on Neural Networks (IJCNN) ( 2015 ), 1 .
- 173 . Müller, L., Nair, M. & Indiveri, G. Randomized Unregulated Step Descent for Limited Precision Synaptic Elements in International Symposium on Circuits and Systems, (ISCAS) ( 2017 ).
- 174 . Payvand, M., Fouda, M. E., Kurdahi, F., Eltawil, A. M. & Neftci, E. O. On-Chip ErrorTriggered Learning of Multi-Layer Memristive Spiking Neural Networks. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 10 , 522 ( 2020 ).
- 175 . Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv: 1412 . 6980 ( 2014 ).
- 176 . Athmanathan, A., Stanisavljevic, M., Papandreou, N., Pozidis, H. & Eleftheriou, E. Multilevel-Cell Phase-Change Memory: A Viable Technology. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 6 , 87 ( 2016 ).
- 177 . Snoek, J., Larochelle, H. & Adams, R. P. Practical Bayesian Optimization of Machine Learning Algorithms in Proceedings of the 25 th International Conference on Neural Information Processing Systems - Volume 2 (Curran Associates Inc., Lake Tahoe, Nevada, 2012 ), 2951 .
- 178 . Davies, M., Wild, A., Orchard, G., Sandamirskaya, Y., Guerra, G. A. F., Joshi, P., Plank, P. & Risbud, S. R. Advancing neuromorphic computing with Loihi: A survey of results and outlook. Proceedings of the IEEE 109 , 911 ( 2021 ).
- 179 . Frenkel, C., Bol, D. & Indiveri, G. Bottom-Up and Top-Down Neural Processing Systems Design: Neuromorphic Intelligence as the Convergence of Natural and Artificial Intelligence. arXiv preprint arXiv: 2106 . 01288 ( 2021 ).
- 180 . Muller, L. K. & Indiveri, G. Rounding methods for neural networks with low resolution synaptic weights. arXiv preprint arXiv: 1504 . 05767 , 1 ( 2015 ).
- 181 . Frenkel, C., Legat, J.-D. & Bol, D. MorphIC: A 65 -nm 738 k-Synapse/mm 2 quad-core binaryweight digital neuromorphic processor with stochastic spike-driven online learning. IEEE Transactions on Biomedical Circuits and Systems 13 , 999 ( 2019 ).
- 182 . Frenkel, C., Legat, J.-D. & Bol, D. A 28 -nm convolutional neuromorphic processor enabling online learning with spike-based retinas in 2020 IEEE International Symposium on Circuits and Systems (ISCAS) ( 2020 ), 1 .
- 183 . Fusi, S. & Abbott, L. Limits on the memory storage capacity of bounded synapses. Nature Neuroscience 10 , 485 ( 2007 ).
- 184 . Laborieux, A., Ernoult, M., Hirtzlin, T. & Querlioz, D. Synaptic metaplasticity in binarized neural networks. Nature communications 12 , 1 ( 2021 ).
- 185 . Khaddam-Aljameh, R., Stanisavljevic, M., Fornt Mas, J., Karunaratne, G., Brandli, M., Liu, F., Singh, A., Muller, S. M., Egger, U., Petropoulos, A., Antonakopoulos, T., Brew, K., Choi, S., Ok, I., Lie, F. L., Saulnier, N., Chan, V ., Ahsan, I., Narayanan, V., Nandakumar, S. R., Le Gallo, M., Francese, P. A., Sebastian, A. & Eleftheriou, E. HERMES-core-A 1 . 59 -TOPS/mm 2 PCM on 14 -nm CMOS in-memory compute core using 300 -ps/LSB linearized CCO-based ADCs. IEEE J. Solid-State Circuits 57 , 1027 ( 4 2022 ).
- 186 . Le Gallo, M., Sebastian, A., Cherubini, G., Giefers, H. & Eleftheriou, E. Compressed Sensing With Approximate Message Passing Using In-Memory Computing. IEEE Trans. Electron Devices 65 , 4304 ( 2018 ).
- 187 . Mead, C. How we created neuromorphic engineering. Nature Electronics 3 , 434 ( 2020 ).
- 188 . Chicca, E., Stefanini, F., Bartolozzi, C. & Indiveri, G. Neuromorphic electronic circuits for building autonomous cognitive systems. Proceedings of the IEEE 102 , 1367 ( 2014 ).
- 189 . Indiveri, G. & Horiuchi, T. Frontiers in Neuromorphic Engineering. Frontiers in Neuroscience 5 , 1 ( 2011 ).
- 190 . Mead, C. Neuromorphic Electronic Systems. Proceedings of the IEEE 78 , 1629 ( 1990 ).
- 191 . Serb, A., Bill, J., Khiat, A., Berdan, R., Legenstein, R. & Prodromakis, T. Unsupervised learning in probabilistic neural networks with multi-state metal-oxide memristive synapses. Nature Communications 7 , 12611 ( 2016 ).
- 192 . Li, Y., Wang, Z., Midya, R., Xia, Q. & Yang, J. J. Review of memristor devices in neuromorphic computing: materials sciences and device challenges. Journal of Physics D: Applied Physics 51 , 503002 ( 2018 ).
- 193 . Spiga, S., Sebastian, A., Querlioz, D. & Rajendran, B. in Memristive Devices for BrainInspired Computing (eds Spiga, S., Sebastian, A., Querlioz, D. & Rajendran, B.) 3 (Woodhead Publishing, 2020 ).
- 194 . Payvand, M. & Indiveri, G. Spike-Based Plasticity Circuits for Always-on On-Line Learning in Neuromorphic Systems in IEEE International Symposium on Circuits and Systems (ISCAS) ( 2019 ), 1 .
- 195 . Widrow, B. & Hoff, M. Adaptive Switching Circuits in 1960 IRE WESCON Convention Record, Part 4 (IRE, New York, 1960 ), 96 .
- 196 . Payvand, M., Fouda, M. E., Kurdahi, F., Eltawil, A. & Neftci, E. O. Error-triggered three-factor learning dynamics for crossbar arrays in International Conference on Artificial Intelligence Circuits and Systems (AICAS) ( 2020 ), 218 .
- 197 . Gerstner, W., Lehmann, M., Liakoni, V., Corneil, D. & Brea, J. Eligibility traces and plasticity on behavioral time scales: experimental support of neohebbian three-factor learning rules. Front. Neur. Circ. 12 , 53 ( 2018 ).
- 198 . Neftci, E. O. Data and Power Efficient Intelligence with Neuromorphic Learning Machines. iScience 5 , 52 ( 2018 ).
- 199 . Sanhueza, M. & Lisman, J. The CaMKII/NMDAR complex as a molecular memory. Molecular brain 6 , 1 ( 2013 ).
- 200 . Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by backpropagating errors. Nature 323 , 533 ( 1986 ).
- 201 . Qiao, N., Bartolozzi, C. & Indiveri, G. An Ultralow Leakage Synaptic Scaling Homeostatic Plasticity Circuit With Configurable Time Scales up to 100 ks. IEEE Transactions on Biomedical Circuits and Systems ( 2017 ).
- 202 . Bartolozzi, C. & Indiveri, G. Synaptic dynamics in analog VLSI. Neural Computation 19 , 2581 ( 2007 ).
- 203 . Bartolozzi, C., Mitra, S. & Indiveri, G. An ultra low power current-mode filter for neuromorphic systems and biomedical signal processing in 2006 IEEE Biomedical Circuits and Systems Conference 2006 IEEE Biomedical Circuits and Systems Conference - Healthcare Technology (BioCas)London, UK (IEEE, 2006 ), 130 .
- 204 . Saxena, V. & Baker, R. J. Compensation of CMOS op-amps using split-length transistors in Circuits and Systems (MWSCAS), 2008 IEEE 51 st International Midwest Symposium on ( 2008 ), 109 .
- 205 . Garofolo, John S., Lamel, Lori F., Fisher, William M., Pallett, David S., Dahlgren, Nancy L., Zue, Victor & Fiscus, Jonathan G. TIMIT Acoustic-Phonetic Continuous Speech Corpus 1993 .
- 206 . Cramer, B., Stradmann, Y., Schemmel, J. & Zenke, F. The Heidelberg spiking data sets for the systematic evaluation of spiking neural networks. IEEE Transactions on Neural Networks and Learning Systems ( 2020 ).
- 207 . Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv [cs.LG] ( 2017 ).
- 208 . Orchard, G., Jayawant, A., Cohen, G. K. & Thakor, N. Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades. Frontiers in Neuroscience 9 ( 2015 ).
- 209 . Krizhevsky, A. Learning multiple layers of features from tiny images research rep. ( 2009 ).
- 210 . Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K. & Fei-Fei, L. ImageNet: A large-scale hierarchical image database in 2009 IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2009 ).
- 211 . Sutton, R. S. & Barto, A. G. Reinforcement learning: an introduction (MIT Press, Cambridge, Mass, 1998 ).
- 212 . Wawrzyński, P. & Tanwani, A. K. Autonomous reinforcement learning with experience replay. Neural Networks 41 , 156 ( 2013 ).
- 213 . Lehmann, M. P., Xu, H. A., Liakoni, V., Herzog, M. H., Gerstner, W. & Preuschoff, K. One-shot learning and behavioral eligibility traces in sequential decision making. Elife 8 , e 47463 ( 2019 ).
- 214 . Lisman, J. A mechanism for the Hebb and the anti-Hebb processes underlying learning and memory. Proc. Natl. Acad. Sci. U. S. A. 86 , 9574 ( 23 1989 ).
- 215 . Shouval, H. Z., Bear, M. F. & Cooper, L. N. A unified model of NMDA receptor-dependent bidirectional synaptic plasticity. Proceedings of the National Academy of Sciences 99 , 10831 ( 16 2002 ).
- 216 . Bosch, M., Castro, J., Saneyoshi, T., Matsuno, H., Sur, M. & Hayashi, Y. Structural and Molecular Remodeling of Dendritic Spine Substructures during Long-Term Potentiation. Neuron 82 , 444 ( 2 2014 ).
- 217 . He, K., Huertas, M., Hong, S. Z., Tie, X., Hell, J. W., Shouval, H. & Kirkwood, A. Distinct Eligibility Traces for LTP and LTD in Cortical Synapses. Neuron 88 , 528 ( 3 2015 ).
- 218 . Brzosko, Z., Schultz, W. & Paulsen, O. Retroactive modulation of spike timing-dependent plasticity by dopamine. eLife 4 , e 09685 ( 2015 ).
- 219 . Ielmini, D., Lavizzari, S., Sharma, D. & Lacaita, A. L. Temperature acceleration of structural relaxation in amorphous Ge 2 Sb 2 Te 5 . Applied Physics Letters 92 , 193511 ( 2008 ).
- 220 . Pirovano, A., Lacaita, A. L., Pellizzer, F., Kostylev, S. A., Benvenuti, A. & Bez, R. Low-field amorphous state resistance and threshold voltage drift in chalcogenide materials. IEEE Transactions on Electron Devices 51 , 714 ( 2004 ).
- 221 . Kim, S., Lee, B., Asheghi, M., Hurkx, F., Reifenberg, J. P., Goodson, K. E. & Wong, H.-S. P. Resistance and threshold switching voltage drift behavior in phase-change memory and their temperature dependence at microsecond time scales studied using a micro-thermal stage. IEEE Transactions on Electron Devices 58 , 584 ( 2011 ).
- 222 . Demirag, Y. Multiphysics modeling of Ge 2 Sb 2 Te 5 based synaptic devices for brain inspired computing MA thesis (Ihsan Dogramaci Bilkent University, Ankara, Turkey, 2018 ).
- 223 . Brader, J. M., Senn, W. & Fusi, S. Learning real-world stimuli in a neural network with spike-driven synaptic dynamics. Neural Computation 19 , 2881 ( 2007 ).
- 224 . Delbruck, T. & Mead, C. Bump circuits in Proceedings of International Joint Conference on Neural Networks 1 ( 1993 ), 475 .
- 225 . Liu, S.-C., Kramer, J., Indiveri, G., Delbruck, T. & Douglas, R. Analog VLSI:Circuits and Principles (MIT Press, 2002 ).
- 226 . Rubino, A., Payvand, M. & Indiveri, G. Ultra-Low Power Silicon Neuron Circuit for ExtremeEdge Neuromorphic Intelligence in International Conference on Electronics, Circuits, and Systems, (ICECS), 2019 ( 2019 ), 458 .
- 227 . Strukov, D. B., Snider, G. S., Stewart, D. R. & Williams, R. S. The missing memristor found. Nature 453 , 80 ( 7191 2008 ).
- 228 . Kumar, S., Williams, R. S. & Wang, Z. Third-order nanocircuit elements for neuromorphic engineering. Nature 585 , 518 ( 7826 2020 ).
- 229 . Grollier, J., Querlioz, D., Camsari, K. Y., Everschor-Sitte, K., Fukami, S. & Stiles, M. D. Neuromorphic spintronics. Nat. Electron. 3 , 360 ( 7 2020 ).
- 230 . Chua, L. Memristor-The missing circuit element. IEEE Trans. Circuit Theory 18 , 507 ( 5 1971 ).
- 231 . Chicca, E., Stefanini, F., Bartolozzi, C. & Indiveri, G. Neuromorphic electronic circuits for building autonomous cognitive systems. Proc. IEEE Inst. Electr. Electron. Eng. 102 . Comment: Submitted to Proceedings of IEEE, spiking neural network implementations in full custom VLSI, 1367 ( 9 2014 ).
- 232 . Cheng, Q., Song, S.-H. & Augustine, G. J. Molecular mechanisms of short-term plasticity: Role of synapsin phosphorylation in augmentation and potentiation of spontaneous glutamate release. Front. Synaptic Neurosci. 10 , 33 ( 2018 ).
- 233 . Boyn, S., Grollier, J., Lecerf, G., Xu, B., Locatelli, N., Fusil, S., Girod, S., Carrétéro, C., Garcia, K., Xavier, S., Tomas, J., Bellaiche, L., Bibes, M., Barthélémy, A., Saïghi, S. & Garcia, V. Learning through ferroelectric domain dynamics in solid-state synapses. Nat. Commun. 8 , 14736 ( 1 2017 ).
- 234 . Wang, Z., Joshi, S., Savel'ev, S. E., Jiang, H., Midya, R., Lin, P., Hu, M., Ge, N., Strachan, J. P., Li, Z., Wu, Q., Barnell, M., Li, G.-L., Xin, H. L., Williams, R. S., Xia, Q. & Yang, J. J. Memristors with diffusive dynamics as synaptic emulators for neuromorphic computing. Nat. Mater. 16 , 101 ( 1 2017 ).
- 235 . Mehonic, A., Sebastian, A., Rajendran, B., Simeone, O., Vasilaki, E. & Kenyon, A. J. Memristors-from in-memory computing, deep learning acceleration, and spiking neural networks to the future of neuromorphic and bio-inspired computing. Adv. Intell. Syst. 2 , 2000085 ( 11 2020 ).
- 236 . Mahmoodi, M. R., Prezioso, M. & Strukov, D. B. Versatile stochastic dot product circuits based on nonvolatile memories for high performance neurocomputing and neurooptimization. Nat. Commun. 10 , 5113 ( 1 2019 ).
- 237 . Karunaratne, G., Le Gallo, M., Cherubini, G., Benini, L., Rahimi, A. & Sebastian, A. Inmemory hyperdimensional computing. Nat. Electron. 3 , 327 ( 6 2020 ).
- 238 . Tuma, T., Pantazi, A., Le Gallo, M., Sebastian, A. & Eleftheriou, E. Stochastic phase-change neurons. Nat. Nanotechnol. 11 , 693 ( 8 2016 ).
- 239 . Appeltant, L., Soriano, M. C., Van der Sande, G., Danckaert, J., Massar, S., Dambre, J., Schrauwen, B., Mirasso, C. R. & Fischer, I. Information processing using a single dynamical node as complex system. Nat. Commun. 2 , 468 ( 1 2011 ).
- 240 . Zhu, X., Wang, Q. & Lu, W. D. Memristor networks for real-time neural activity analysis. Nat. Commun. 11 , 2439 ( 1 2020 ).
- 241 . Ninan, I. & Arancio, O. Presynaptic CaMKII is necessary for synaptic plasticity in cultured hippocampal neurons. Neuron 42 , 129 ( 1 2004 ).
- 242 . Yang, J. J., Strukov, D. B. & Stewart, D. R. Memristive devices for computing. Nat. Nanotechnol. 8 , 13 ( 1 2013 ).
- 243 . Midya, R., Wang, Z., Asapu, S., Joshi, S., Li, Y., Zhuo, Y., Song, W., Jiang, H., Upadhay, N., Rao, M., Lin, P., Li, C., Xia, Q. & Yang, J. J. Artificial neural network (ANN) to spiking neural network (SNN) converters based on diffusive memristors. Adv. Electron. Mater. 5 , 1900060 ( 9 2019 ).
- 244 . Yang, K., Li, F., Veeramalai, C. P. & Guo, T. A facile synthesis of CH 3 NH 3 PbBr 3 perovskite quantum dots and their application in flexible nonvolatile memory. Appl. Phys. Lett. 110 , 083102 ( 8 2017 ).
- 245 . Jeong, J., Kim, M., Seo, J., Lu, H., Ahlawat, P., Mishra, A., Yang, Y., Hope, M. A., Eickemeyer, F. T., Kim, M., Yoon, Y. J., Choi, I. W., Darwich, B. P., Choi, S. J., Jo, Y., Lee, J. H., Walker, B., Zakeeruddin, S. M., Emsley, L., Rothlisberger, U., Hagfeldt, A., Kim, D. S., Grätzel, M. & Kim, J. Y. Pseudo-halide anion engineering for α -FAPbI 3 perovskite solar cells. Nature 592 , 381 ( 7854 2021 ).
- 246 . Hassan, Y., Park, J. H., Crawford, M. L., Sadhanala, A., Lee, J., Sadighian, J. C., Mosconi, E., Shivanna, R., Radicchi, E., Jeong, M., Yang, C., Choi, H., Park, S. H., Song, M. H., De Angelis, F., Wong, C. Y., Friend, R. H., Lee, B. R. & Snaith, H. J. Ligand-engineered bandgap stability in mixed-halide perovskite LEDs. Nature 591 , 72 ( 7848 2021 ).
- 247 . Protesescu, L., Yakunin, S., Bodnarchuk, M. I., Krieg, F., Caputo, R., Hendon, C. H., Yang, R. X., Walsh, A. & Kovalenko, M. V. Nanocrystals of cesium lead Halide perovskites ( CsPbX 3, X = Cl, Br, and I): Novel optoelectronic materials showing bright emission with wide color gamut. Nano Lett. 15 , 3692 ( 6 2015 ).
- 248 . Saidaminov, M. I., Adinolfi, V., Comin, R., Abdelhady, A. L., Peng, W., Dursun, I., Yuan, M., Hoogland, S., Sargent, E. H. & Bakr, O. M. Planar-integrated single-crystalline perovskite photodetectors. Nat. Commun. 6 , 8724 ( 1 2015 ).
- 249 . Yakunin, S., Sytnyk, M., Kriegner, D., Shrestha, S., Richter, M., Matt, G. J., Azimi, H., Brabec, C. J., Stangl, J., Kovalenko, M. V. & Heiss, W. Detection of X-ray photons by solution-processed organic-inorganic perovskites. Nat. Photonics 9 , 444 ( 7 2015 ).
- 250 . Wu, W., Han, X., Li, J., Wang, X., Zhang, Y., Huo, Z., Chen, Q., Sun, X., Xu, Z., Tan, Y., Pan, C. & Pan, A. Ultrathin and conformable lead Halide perovskite photodetector arrays for potential application in retina-like vision sensing. Adv. Mater. 33 , e 2006006 ( 9 2021 ).
- 251 . Xiao, Z. & Huang, J. Energy-efficient hybrid perovskite memristors and synaptic devices. Adv. Electron. Mater. 2 , 1600100 ( 7 2016 ).
- 252 . Xu, W., Cho, H., Kim, Y.-H., Kim, Y.-T., Wolf, C., Park, C.-G. & Lee, T.-W. Organometal Halide perovskite artificial synapses. Adv. Mater. 28 , 5916 ( 28 2016 ).
- 253 . John, R. A., Yantara, N., Ng, Y. F., Narasimman, G., Mosconi, E., Meggiolaro, D., Kulkarni, M. R., Gopalakrishnan, P. K., Nguyen, C. A., De Angelis, F., Mhaisalkar, S. G., Basu, A. & Mathews, N. Ionotronic Halide perovskite drift-diffusive synapses for low-power neuromorphic computation. Adv. Mater. 30 , e 1805454 ( 51 2018 ).
- 254 . Lee, S., Kim, H., Kim, D. H., Kim, W. B., Lee, J. M., Choi, J., Shin, H., Han, G. S., Jang, H. W. & Jung, H. S. Tailored 2 D/ 3 D Halide perovskite heterointerface for substantially enhanced endurance in conducting bridge resistive switching memory. ACS Appl. Mater. Interfaces 12 , 17039 ( 14 2020 ).
- 255 . John, R. A., Yantara, N., Ng, S. E., Patdillah, M. I. B., Kulkarni, M. R., Jamaludin, N. F., Basu, J., Ankit, Mhaisalkar, S. G., Basu, A. & Mathews, N. Diffusive and drift Halide perovskite memristive barristors as nociceptive and synaptic emulators for neuromorphic computing. Adv. Mater. 33 , 2007851 ( 15 2021 ).
- 256 . Tian, H., Zhao, L., Wang, X., Yeh, Y.-W., Yao, N., Rand, B. P. & Ren, T.-L. Extremely low operating current resistive memory based on exfoliated 2 D perovskite single crystals for neuromorphic computing. ACS Nano 11 , 12247 ( 12 2017 ).
- 257 . Wang, Y., Lv, Z., Liao, Q., Shan, H., Chen, J., Zhou, Y., Zhou, L., Chen, X., Roy, V. A. L., Wang, Z., Xu, Z., Zeng, Y.-J. & Han, S.-T. Synergies of electrochemical metallization and valance change in all-inorganic perovskite quantum dots for resistive switching. Adv. Mater. 30 , e 1800327 ( 28 2018 ).
- 258 . Tan, H., Ni, Z., Peng, W., Du, S., Liu, X., Zhao, S., Li, W., Ye, Z., Xu, M., Xu, Y., Pi, X. & Yang, D. Broadband optoelectronic synaptic devices based on silicon nanocrystals for neuromorphic computing. Nano Energy 52 , 422 ( 2018 ).
- 259 . Jarschel, P., Kim, J. H., Biadala, L., Berthe, M., Lambert, Y., Osgood 3 rd, R. M., Patriarche, G., Grandidier, B. & Xu, J. Single-electron tunneling PbS/InP heterostructure nanoplatelets for synaptic operations. ACS Appl. Mater. Interfaces 13 , 38450 ( 32 2021 ).
- 260 . Wang, Y., Lv, Z., Chen, J., Wang, Z., Zhou, Y., Zhou, L., Chen, X. & Han, S.-T. Photonic synapses based on inorganic perovskite quantum dots for neuromorphic computing. Adv. Mater. 30 , e 1802883 ( 38 2018 ).
- 261 . Jiang, T., Shao, Z., Fang, H., Wang, W., Zhang, Q., Wu, D., Zhang, X. & Jie, J. Highperformance nanofloating gate memory based on lead Halide perovskite nanocrystals. ACS Appl. Mater. Interfaces 11 , 24367 ( 27 2019 ).
- 262 . Hao, J., Kim, Y.-H., Habisreutinger, S. N., Harvey, S. P., Miller, E. M., Foradori, S. M., Arnold, M. S., Song, Z., Yan, Y., Luther, J. M. & Blackburn, J. L. Low-energy room-temperature optical switching in mixed-dimensionality nanoscale perovskite heterojunctions. Sci. Adv. 7 ( 18 2021 ).
- 263 . Subramanian Periyal, S., Jagadeeswararao, M., Ng, S. E., John, R. A. & Mathews, N. Halide perovskite quantum dots photosensitized-amorphous oxide transistors for multimodal synapses. Adv. Mater. Technol. 5 , 2000514 ( 11 2020 ).
- 264 . Xiao, X., Hu, J., Tang, S., Yan, K., Gao, B., Chen, H. & Zou, D. Recent advances in Halide perovskite memristors: Materials, structures, mechanisms, and applications. Adv. Mater. Technol. 5 , 1900914 ( 6 2020 ).
- 265 . Xiao, Z., Yuan, Y., Shao, Y., Wang, Q., Dong, Q., Bi, C., Sharma, P., Gruverman, A. & Huang, J. Giant switchable photovoltaic effect in organometal trihalide perovskite devices. Nat. Mater. 14 , 193 ( 2 2015 ).
- 266 . Chen, L.-W., Wang, W.-C., Ko, S.-H., Chen, C.-Y., Hsu, C.-T., Chiao, F.-C., Chen, T.-W., Wu, K.-C. & Lin, H.-W. Highly uniform all-vacuum-deposited inorganic perovskite artificial synapses for reservoir computing. Adv. Intell. Syst. 3 , 2000196 ( 1 2021 ).
- 267 . Midya, R., Wang, Z., Zhang, J., Savel'ev, S. E., Li, C., Rao, M., Jang, M. H., Joshi, S., Jiang, H., Lin, P., Norris, K., Ge, N., Wu, Q., Barnell, M., Li, Z., Xin, H. L., Williams, R. S., Xia, Q. & Yang, J. J. Anatomy of Ag/Hafnia-based selectors with 10^10 nonlinearity. Adv. Mater. 29 ( 12 2017 ).
- 268 . Wang, Z., Rao, M., Midya, R., Joshi, S., Jiang, H., Lin, P., Song, W., Asapu, S., Zhuo, Y., Li, C., Wu, H., Xia, Q. & Yang, J. J. Threshold switching of Ag or Cu in dielectrics: Materials, mechanism, and applications. Adv. Funct. Mater. 28 , 1704862 ( 6 2018 ).
- 269 . Guo, M. Q., Chen, Y. C., Lin, C. Y., Chang, Y. F., Fowler, B., Li, Q. Q., Lee, J. & Zhao, Y. G. Unidirectional threshold resistive switching in Au/NiO/Nb:SrTiO3 devices. Appl. Phys. Lett. 110 , 233504 ( 23 2017 ).
- 270 . Du, C., Cai, F., Zidan, M. A., Ma, W., Lee, S. H. & Lu, W. D. Reservoir computing using dynamic memristors for temporal information processing. Nat. Commun. 8 , 2204 ( 1 2017 ).
- 271 . Gibbons, T. E. Unifying quality metrics for reservoir networks in The 2010 International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain (IEEE, 2010 ), 1 .
- 272 . Suri, M., Bichler, O., Querlioz, D., Cueto, O., Perniola, L., Sousa, V., Vuillaume, D., Gamrat, C. & DeSalvo, B. Phase change memory as synapse for ultra-dense neuromorphic systems: Application to complex visual pattern extraction in 2011 IEEE International Electron Devices Meeting (IEDM), Washington, DC, USA (IEEE, 2011 ), 4.4.1.
- 273 . Hu, M., Strachan, J. P., Li, Z., Grafals, E. M., Davila, N., Graves, C., Lam, S., Ge, N., Yang, J. J. & Williams, R. S. Dot-product engine for neuromorphic computing: programming 1T1M crossbar to accelerate matrix-vector multiplication in Proceedings of the 53rd Annual Design Automation Conference (DAC '16), Austin, Texas (ACM, New York, NY, USA, 2016 ), 1 .
- 274 . Boybat, I., Le Gallo, M., Nandakumar, S. R., Moraitis, T., Parnell, T., Tuma, T., Rajendran, B., Leblebici, Y., Sebastian, A. & Eleftheriou, E. Neuromorphic computing with multimemristive synapses. Nat. Commun. 9 , 2514 ( 1 2018 ).
- 275 . Sun, X., Wang, N., Chen, C.-Y., Ni, J., Agrawal, A., Cui, X., Venkataramani, S., El Maghraoui, K., Srinivasan, V. & Gopalakrishnan, K. Ultra-Low Precision 4-bit Training of Deep Neural Networks in Advances in Neural Information Processing Systems (eds Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. F. & Lin, H.) 33 (Curran Associates, Inc., 2020 ), 1796 .
- 276 . Payvand, M., Demirag, Y., Dalgaty, T., Vianello, E. & Indiveri, G. Analog weight updates with compliance current modulation of binary ReRAMs for on-chip learning in 2020 IEEE International Symposium on Circuits and Systems (ISCAS), Seville, Spain (IEEE, 2020 ), 1 .
- 277 . Tanaka, G., Yamane, T., Héroux, J. B., Nakane, R., Kanazawa, N., Takeda, S., Numata, H., Nakano, D. & Hirose, A. Recent advances in physical reservoir computing: A review. Neural Netw. 115 , 100 ( 2019 ).
- 278 . Gerstner, W., Kistler, W. M., Naud, R. & Paninski, L. Neuronal Dynamics: From Single Neurons to Networks and Models of Cognition 590 pp. (Cambridge University Press, Cambridge, England, 2014 ).
- 279 . Watts, D. J. & Strogatz, S. H. Collective dynamics of 'small-world' networks. Nature 393 , 440 ( 1998 ).
- 280 . Kawai, Y., Park, J. & Asada, M. A small-world topology enhances the echo state property and signal propagation in reservoir computing. Neural Networks 112 , 15 ( 2019 ).
- 281 . Loeffler, A., Zhu, R., Hochstetter, J., Li, M., Fu, K., Diaz-Alvarez, A., Nakayama, T., Shine, J. M. & Kuncic, Z. Topological Properties of Neuromorphic Nanowire Networks. Frontiers in Neuroscience 14 , 184 ( 2020 ).
- 282 . Park, H.-J. & Friston, K. Structural and Functional Brain Networks: From Connections to Cognition. Science 342 ( 2013 ).
- 283 . Gallos, L. K., Makse, H. A. & Sigman, M. A small world of weak ties provides optimal global integration of self-similar modules in functional brain networks. Proceedings of the National Academy of Sciences 109 , 2825 ( 2012 ).
- 284 . Sporns, O. & Zwi, J. D. The Small World of the Cerebral Cortex. Neuroinformatics 2 , 145 ( 2004 ).
- 285 . Bullmore, E. & Sporns, O. Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience 10 , 186 ( 2009 ).
- 286 . Bullmore, E. & Sporns, O. Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience 10 , 186 ( 2009 ).
- 287 . Hasler, J. Large-scale field-programmable analog arrays. Proceedings of the IEEE 108 , 1283 ( 2019 ).
- 288 . Jo, S. H., Chang, T., Ebong, I., Bhadviya, B. B., Mazumder, P. & Lu, W. Nanoscale memristor device as synapse in neuromorphic systems. Nano letters 10 , 1297 ( 2010 ).
- 289 . Ielmini, D. & Waser, R. Resistive Switching: From Fundamentals of Nanoionic Redox Processes to Memristive Device Applications (John Wiley & Sons, 2015 ).
- 290 . Strukov, D., Indiveri, G., Grollier, J. & Fusi, S. Building brain-inspired computing. Nature Communications 10 ( 2019 ).
- 291 . Kingra, S. K., Parmar, V., Chang, C.-C., Hudec, B., Hou, T.-H. & Suri, M. SLIM: Simultaneous Logic-In-Memory computing exploiting bilayer analog OxRAM devices. Scientific Reports 10 , 1 ( 2020 ).
- 292 . Woźniak, S., Pantazi, A., Bohnstingl, T. & Eleftheriou, E. Deep learning incorporating biologically inspired neural dynamics and in-memory computing. Nature Machine Intelligence 2 , 325 ( 2020 ).
- 293 . Ambrogio, S., Narayanan, P., Okazaki, A., Fasoli, A., Mackin, C., Hosokawa, K., Nomura, A., Yasuda, T., Chen, A., Friz, A., et al. An analog-AI chip for energy-efficient speech recognition and transcription. Nature 620 , 768 ( 2023 ).
- 294 . Le Gallo, M., Khaddam-Aljameh, R., Stanisavljevic, M., Vasilopoulos, A., Kersting, B., Dazzi, M., Karunaratne, G., Brändli, M., Singh, A., Müller, S. M., et al. A 64 -core mixed-signal inmemory compute chip based on phase-change memory for deep neural network inference. Nature Electronics 6 , 680 ( 2023 ).
- 295 . Sebastian, A., Le Gallo, M., Khaddam-Aljameh, R. & Eleftheriou, E. Memory devices and applications for in-memory computing. Nature Nanotechnology 15 , 529 ( 2020 ).
- 296 . Jouppi, N. P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., Bates, S., Bhatia, S., Boden, N., Borchers, A., et al. In-datacenter performance analysis of a Tensor Processing Unit in Proceedings of the 44 th annual international symposium on computer architecture ( 2017 ), 1 .
- 297 . Yu, S., Sun, X., Peng, X. & Huang, S. Compute-in-memory with emerging nonvolatile-memories: challenges and prospects in 2020 IEEE Custom Integrated Circuits Conference (CICC) ( 2020 ), 1 .
- 298 . Joksas, D., Freitas, P., Chai, Z., Ng, W., Buckwell, M., Li, C., Zhang, W., Xia, Q., Kenyon, A. & Mehonic, A. Committee machines-a universal method to deal with non-idealities in memristor-based neural networks. Nature Communications 11 , 1 ( 2020 ).
- 299 . Zidan, M. A., Strachan, J. P. & Lu, W. D. The future of electronics based on memristive systems. Nature Electronics 1 , 22 ( 2018 ).
- 300 . Mannocci, P., Farronato, M., Lepri, N., Cattaneo, L., Glukhov, A., Sun, Z. & Ielmini, D. In-memory computing with emerging memory devices: Status and outlook. APL Machine Learning 1 ( 2023 ).
- 301 . Duan, S., Hu, X., Dong, Z., Wang, L. & Mazumder, P. Memristor-based cellular nonlinear/neural network: design, analysis, and applications. IEEE Transactions on Neural Networks and Learning Systems 26 , 1202 ( 2014 ).
- 302 . Ascoli, A., Messaris, I., Tetzlaff, R. & Chua, L. O. Theoretical foundations of memristor cellular nonlinear networks: Stability analysis with dynamic memristors. IEEE Transactions on Circuits and Systems I: Regular Papers 67 , 1389 ( 2019 ).
- 303 . Wang, R., Shi, T., Zhang, X., Wei, J., Lu, J., Zhu, J., Wu, Z., Liu, Q. & Liu, M. Implementing insitu self-organizing maps with memristor crossbar arrays for data mining and optimization. Nature Communications 13 , 1 ( 2022 ).
- 304 . Likharev, K., Mayr, A., Muckra, I. & Türel, Ö. CrossNets: High-performance neuromorphic architectures for CMOL circuits. Annals of the New York Academy of Sciences 1006 , 146 ( 2003 ).
- 305 . Betta, G., Graffi, S., Kovacs, Z. M. & Masetti, G. CMOS implementation of an analogically programmable cellular neural network. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing 40 , 206 ( 1993 ).
- 306 . Khacef, L., Rodriguez, L. & Miramond, B. Brain-inspired self-organization with cellular neuromorphic computing for multimodal unsupervised learning. Electronics 9 , 1605 ( 2020 ).
- 307 . Lin, P., Pi, S. & Xia, Q. 3D integration of planar crossbar memristive devices with CMOS substrate. Nanotechnology 25 , 405202 ( 2014 ).
- 308 . Boahen, K., Nomura, M., Ros Vidal, E. & Van Rullen, R. Address-Event Senders and Receivers: Implementing Direction-Selectivity and Orientation-Tuning (eds Cohen, A., Douglas, R., Horiuchi, T., Indiveri, G., Koch, C., Sejnowski, T. & Shamma, S.) 1998 .
- 309 . Park, J., Yu, T., Joshi, S., Maier, C. & Cauwenberghs, G. Hierarchical address event routing for reconfigurable large-scale neuromorphic systems. IEEE transactions on neural networks and learning systems 28 , 2408 ( 2016 ).
- 310 . Cai, F., Kumar, S., Van Vaerenbergh, T., Sheng, X., Liu, R., Li, C., Liu, Z., Foltin, M., Yu, S., Xia, Q., et al. Power-efficient combinatorial optimization using intrinsic noise in memristor Hopfield neural networks. Nature Electronics 3 , 409 ( 2020 ).
- 311 . Bartolozzi, C. & Indiveri, G. Synaptic dynamics in analog VLSI. Neural computation 19 , 2581 ( 2007 ).
- 312 . Esmanhotto, E., Brunet, L., Castellani, N., Bonnet, D., Dalgaty, T., Grenouillet, L., Ly, D., Cagli, C., Vizioz, C., Allouti, N., et al. High-Density 3D Monolithically Integrated Multiple 1T1R Multi-Level-Cell for Neural Networks in 2020 IEEE International Electron Devices Meeting (IEDM) ( 2020 ), 36 .
- 313 . Chen, J., Wu, C., Indiveri, G. & Payvand, M. Reliability Analysis of Memristor Crossbar Routers: Collisions and On/off Ratio Requirement in 2022 29 th IEEE International Conference on Electronics, Circuits and Systems (ICECS) ( 2022 ), 1 .
- 314 . Werbos, P. J. Backpropagation through time: What it does and how to do it. Proceedings of the IEEE 78 , 1550 ( 1990 ).
- 315 . Dalgaty, T., Castellani, N., Turck, C., Harabi, K.-E., Querlioz, D. & Vianello, E. In situ learning using intrinsic memristor variability via Markov chain Monte Carlo sampling. Nature Electronics 4 , 151 ( 2021 ).
- 316 . Zhao, M., Wu, H., Gao, B., Zhang, Q., Wu, W., Wang, S., Xi, Y., Wu, D., Deng, N., Yu, S., Chen, H.-Y. & Qian, H. Investigation of statistical retention of filamentary analog RRAM for neuromorphic computing in 2017 IEEE International Electron Devices Meeting (IEDM) ( 2017 ), 39.4.1.
- 317 . Moro, F., Esmanhotto, E., Hirtzlin, T., Castellani, N., Trabelsi, A., Dalgaty, T., Molas, G., Andrieu, F., Brivio, S., Spiga, S., et al. Hardware calibrated learning to compensate heterogeneity in analog RRAM-based Spiking Neural Networks. IEEE International Symposium on Circuits and Systems ( 2022 ).
- 318 . Moody, G. B. & Mark, R. G. The impact of the MIT-BIH arrhythmia database. IEEE Engineering in Medicine and Biology Magazine 20 , 45 ( 2001 ).
- 319 . Lee, H.-Y., Hsu, C.-M., Huang, S.-C., Shih, Y.-W. & Luo, C.-H. Designing low power of sigma delta modulator for biomedical application. Biomedical Engineering: Applications, Basis and Communications 17 , 181 ( 2005 ).
- 320 . Corradi, F. & Indiveri, G. A neuromorphic event-based neural recording system for smart brain-machine-interfaces. IEEE Transactions on Biomedical Circuits and Systems 9 , 699 ( 2015 ).
- 321 . Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J. & Zaremba, W. OpenAI Gym 2016 .
- 322 . Luo, W., Sun, P., Zhong, F., Liu, W., Zhang, T. & Wang, Y. End-to-End Active Object Tracking and Its Real-World Deployment via Reinforcement Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 42 , 1317 ( 2020 ).
- 323 . Lee, J., Hwangbo, J., Wellhausen, L., Koltun, V. & Hutter, M. Learning quadrupedal locomotion over challenging terrain. Science Robotics 5 ( 2020 ).
- 324 . Vinyals, O., Babuschkin, I., Czarnecki, W. M., Mathieu, M., Dudzik, A., Chung, J., Choi, D. H., Powell, R., Ewalds, T., Georgiev, P., Oh, J., Horgan, D., Kroiss, M., Danihelka, I., Huang, A., Sifre, L., Cai, T., Agapiou, J. P., Jaderberg, M., Vezhnevets, A. S., Leblond, R., Pohlen, T., Dalibard, V., Budden, D., Sulsky, Y., Molloy, J., Paine, T. L., Gulcehre, C., Wang, Z., Pfaff, T., Wu, Y., Ring, R., Yogatama, D., Wünsch, D., McKinney, K., Smith, O., Schaul, T., Lillicrap, T., Kavukcuoglu, K., Hassabis, D., Apps, C. & Silver, D. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575 , 350 ( 2019 ).
- 325 . OpenAI, Andrychowicz, M., Baker, B., Chociej, M., Jozefowicz, R., McGrew, B., Pachocki, J., Petron, A., Plappert, M., Powell, G., Ray, A., Schneider, J., Sidor, S., Tobin, J., Welinder, P., Weng, L. & Zaremba, W. Learning Dexterous In-Hand Manipulation. arXiv:1808.00177 [cs, stat] ( 2019 ).
- 326 . Jordan, J., Schmidt, M., Senn, W. & Petrovici, M. A. Evolving interpretable plasticity for spiking networks. eLife 10 , e 66273 ( 2021 ).
- 327 . Rabaey, J. M., Chandrakasan, A. P. & Nikolić, B. Digital integrated circuits: a design perspective (Pearson education Upper Saddle River, NJ, 2003 ).
- 328 . Yik, J., Ahmed, S. H., Ahmed, Z., Anderson, B., Andreou, A. G., Bartolozzi, C., Basu, A., Blanken, D. d., Bogdan, P., Bohte, S., et al. NeuroBench: Advancing neuromorphic computing through collaborative, fair and representative benchmarking. arXiv preprint arXiv:2304.04640 ( 2023 ).
- 329 . Merolla, P., Arthur, J., Alvarez, R., Bussat, J.-M. & Boahen, K. A Multicast Tree Router for Multichip Neuromorphic Systems. Circuits and Systems I: Regular Papers, IEEE Transactions on 61 , 820 ( 2014 ).
- 330 . Painkras, E., Plana, L., Garside, J., Temple, S., Galluppi, F., Patterson, C., Lester, D., Brown, A. & Furber, S. SpiNNaker: A 1-W 18-Core System-on-Chip for Massively-Parallel Neural Network Simulation. IEEE Journal of Solid-State Circuits 48 , 1943 ( 2013 ).
- 331 . Benjamin, B. V., Gao, P., McQuinn, E., Choudhary, S., Chandrasekaran, A. R., Bussat, J., Alvarez-Icaza, R., Arthur, J., Merolla, P. & Boahen, K. Neurogrid: A Mixed-Analog-Digital Multichip System for Large-Scale Neural Simulations. Proceedings of the IEEE 102 , 699 ( 2014 ).
- 332 . Basu, A., Deng, L., Frenkel, C. & Zhang, X. Spiking neural network integrated circuits: A review of trends and future directions in 2022 IEEE Custom Integrated Circuits Conference (CICC) ( 2022 ), 1 .
- 333 . Pan, X., Ye, T., Xia, Z., Song, S. & Huang, G. Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition ( 2023 ), 2082 .
- 334 . Yu, T., Li, X., Cai, Y., Sun, M. & Li, P. S2-MLP: Spatial-shift MLP architecture for vision in Proceedings of the IEEE/CVF winter conference on applications of computer vision ( 2022 ), 297 .
- 335 . Strother, J. A., Nern, A. & Reiser, M. B. Direct observation of ON and OFF pathways in the Drosophila visual system. Current Biology 24 , 976 ( 2014 ).
- 336 . Davies, M., Wild, A., Orchard, G., Sandamirskaya, Y., Guerra, G. A. F., Joshi, P., Plank, P. & Risbud, S. R. Advancing neuromorphic computing with Loihi: A survey of results and outlook. Proceedings of the IEEE 109 , 911 ( 2021 ).
- 337 . Dalgaty, T., Mesquida, T., Joubert, D., Sironi, A., Vivet, P. & Posch, C. HUGNet: Hemi-Spherical Update Graph Neural Network applied to low-latency event-based optical flow in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition ( 2023 ), 3952 .
- 338 . Aimone, J. B., Date, P., Fonseca-Guerra, G. A., Hamilton, K. E., Henke, K., Kay, B., Kenyon, G. T., Kulkarni, S. R., Mniszewski, S. M., Parsa, M., et al. A review of non-cognitive applications for neuromorphic computing. Neuromorphic Computing and Engineering 2 , 032003 ( 2022 ).
- 339 . Dalgaty, T., Payvand, M., De Salvo, B., et al. Hybrid CMOS-RRAM neurons with intrinsic plasticity in IEEE ISCAS ( 2019 ), 1 .
- 340 . Joshi, V., Gallo, M. L., Haefeli, S., Boybat, I., Nandakumar, S. R., Piveteau, C., Dazzi, M., Rajendran, B., Sebastian, A. & Eleftheriou, E. Accurate deep neural network inference using computational phase-change memory. Nature Communications 11 ( 2020 ).
- 341 . Corradi, F., Bontrager, D. & Indiveri, G. Toward neuromorphic intelligent brain-machine interfaces: An event-based neural recording and processing system in Biomedical Circuits and Systems Conference (BioCAS) ( 2014 ), 584 .
- 342 . Freeman, C. D., Frey, E., Raichuk, A., Girgin, S., Mordatch, I. & Bachem, O. Brax - A Differentiable Physics Engine for Large Scale Rigid Body Simulation, version 0.0.13 ( 2021 ).
- 343 . Zucchet, N., Meier, R., Schug, S., Mujika, A. & Sacramento, J. Online learning of long-range dependencies. arXiv [cs.LG] ( 2023 ).
- 344 . Scellier, B. & Bengio, Y. Equilibrium Propagation: Bridging the gap between energy-based models and Backpropagation. Front. Comput. Neurosci. 11 , 24 ( 2017 ).
- 345 . Polimeni, J. M., Mayumi, K., Giampietro, M. & Alcott, B. The Jevons paradox and the myth of resource efficiency improvements 200 pp. (Routledge, London, England, 2012 ).
- 346 . Newman, M. & Watts, D. Renormalization group analysis of the small-world network model. Physics Letters A 263 , 341 ( 1999 ).
- 347 . Zamarreño-Ramos, C., Camuñas-Mesa, L., Pérez-Carrasco, J., Masquelier, T., Serrano-Gotarredona, T. & Linares-Barranco, B. On spike-timing-dependent-plasticity, memristive devices, and building a self-learning visual cortex. Frontiers in Neuroscience 5 , 1 ( 2011 ).
- 348 . Zhu, X., Lee, J. & Lu, W. D. Iodine vacancy redistribution in organic-inorganic Halide perovskite films and resistive switching effects. Adv. Mater. 29 , 1700527 ( 29 2017 ).
- 349 . Nedelcu, G., Protesescu, L., Yakunin, S., Bodnarchuk, M. I., Grotevent, M. J. & Kovalenko, M. V. Fast anion-exchange in highly luminescent nanocrystals of cesium lead Halide perovskites (CsPbX3, X = Cl, Br, I). Nano Lett. 15 , 5635 ( 8 2015 ).
This thesis consists of six selected publications, conducted in collaboration with electrical engineers, computer scientists, material scientists and neuroscientists. In this section, I briefly outline my personal contributions to each project.
## Analog weight updates with compliance current modulation of binary ReRAMs for on-chip learning (Chapter 2 )
- collaborated with material scientists for data collection and modeling (e.g., Fig. 1 )
- coded training and evaluation of RRAM-based neural network simulations (e.g., Alg. 1 )
- assisted circuit design with simulation findings (e.g., Fig. 2 )
- contributed to the majority of the manuscript, including figures
## Online training of spiking recurrent neural networks with Phase-Change Memory synapses (Chapter 3 )
- performed a literature review to identify the problem
- coded e-prop, the PCM-based analog simulation framework, and the neural network training (a minimal sketch of an eligibility-trace-style update is given after this list)
- conducted all data analysis and visualization
- contributed to the majority of the manuscript, including figures
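For orientation only, the sketch below illustrates the flavor of an e-prop-style update: a local eligibility trace is low-pass filtered from pre- and post-synaptic quantities and then gated by a top-down learning signal. This is a minimal Python/NumPy illustration under assumed names, constants and a surrogate derivative; it is not the simulation framework developed in Chapter 3.

```python
import numpy as np

# Minimal e-prop-style sketch (illustrative assumptions only, not the
# Chapter 3 framework): an eligibility trace is low-pass filtered from the
# product of a surrogate spike derivative and the pre-synaptic activity,
# then gated by a top-down learning signal to produce the weight update.

def pseudo_derivative(v_mem, v_th=1.0, gamma=0.3):
    """Surrogate derivative of the spike non-linearity around threshold."""
    return gamma * np.maximum(0.0, 1.0 - np.abs((v_mem - v_th) / v_th))

def eprop_step(e_trace, pre_act, v_mem, learning_signal, decay=0.9, lr=1e-3):
    """One update step for a dense weight matrix of shape (post, pre)."""
    e_trace = decay * e_trace + np.outer(pseudo_derivative(v_mem), pre_act)
    dw = -lr * learning_signal[:, None] * e_trace
    return e_trace, dw

# Toy usage: 4 post-synaptic neurons, 6 pre-synaptic inputs.
rng = np.random.default_rng(0)
e_trace = np.zeros((4, 6))
for _ in range(10):
    e_trace, dw = eprop_step(e_trace,
                             pre_act=rng.random(6),
                             v_mem=rng.random(4),
                             learning_signal=rng.random(4) - 0.5)
```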
## Biologically-inspired training of spiking recurrent neural networks with neuromorphic hardware (Chapter 3 )
- collaboratively planned the project with IBM researchers and INI (e.g., work assignments, experiments, deadlines)
- assisted with experiment datasets and architecture design, following prior work [126]
- weekly supervision of Anja Šurina (e.g., debugging, hyperparameter optimization)
- assisted paper writing and designed several figures
## PCM-trace: scalable synaptic eligibility traces with resistivity drift of Phase-Change Materials (Chapter 3 )
- performed a literature review to identify the problem
- collaborated with material scientists for data collection and modeling
- coded PCM-trace and multi-PCM-trace experiments and analysis
- assisted circuit design with simulation findings
- contributed to the majority of the manuscript
## Reconfigurable halide perovskite nanocrystal memristors for neuromorphic computing (Chapter 4 )
- performed a literature review to identify the problem
- collaborated with material scientists on data collection, required device specifications, and modeling (e.g., non-volatility time constant, Fig. 4b-c)
- coded training and evaluation of simulations with volatile and non-volatile memristor models (e.g., RC framework)
- designed ICC-modulated training following prior work [276] (e.g., Supplementary Fig. 28)
- contributed to the majority of the manuscript
## Mosaic: in-memory computing and routing for small-world spike-based neuromorphic systems (Chapter 5 )
- performed an extensive literature review to identify the strengths of the idea
- collaborated with material scientists for data collection and modeling (e.g., Fig. 2 d)
- coded layout-aware training and evaluation on the SHD benchmark with backprop and on RL benchmarks with ES (e.g., Fig. 4)
- contributed to the majority of the manuscript
## Articles in peer-reviewed journals:
- 1 . John, R. A., Demirag, Y., Shynkarenko, Y., Berezovska, Y., Ohannessian, N., Payvand, M., Zeng, P., Bodnarchuk, M. I., Krumeich, F., Kara, G., Shorubalko, I., Nair, M. V., Cooke, G. A., Lippert, T., Indiveri, G. & Kovalenko, M. V. Reconfigurable halide perovskite nanocrystal memristors for neuromorphic computing. Nat. Commun. 13 , 2074 ( 1 2022 ).
- 2 . Dalgaty, T., Moro, F., Demirag, Y., De Pra, A., Indiveri, G., Vianello, E. & Payvand, M. Mosaic: in-memory computing and routing for small-world spike-based neuromorphic systems. Nat. Commun. 15 , 1 ( 1 2024 ).
- 3 . D'Agostino, S., Moro, F., Torchet, T., Demirag, Y., Grenouillet, L., Castellani, N., Indiveri, G., Vianello, E. & Payvand, M. DenRAM: neuromorphic dendritic architecture with RRAM for efficient temporal processing with delays. Nat. Commun. 15 , 1 ( 1 2024 ).
## Preprints:
- 4 . Demirag, Y., Frenkel, C., Payvand, M. & Indiveri, G. Online training of spiking recurrent neural networks with Phase-Change Memory synapses 2021 .
## Conference contributions:
- 5 . Demirag, Y., Moro, F., Dalgaty, T., Navarro, G., Frenkel, C., Indiveri, G., Vianello, E. & Payvand, M. PCM-trace: Scalable synaptic eligibility traces with resistivity drift of phase-change materials in 2021 IEEE International Symposium on Circuits and Systems (ISCAS), Daegu, Korea (IEEE, 2021 ), 1 .
- 6 . Demirag, Y. & Indiveri, G. Network of biologically plausible neuron models can solve motor tasks through heterogeneity in Computational and Systems Neuroscience (COSYNE) (Lisbon, Portugal, 2024 ).
- 7 . Demirag, Y., Dittmann, R., Indiveri, G. & Neftci, E. Overcoming phase-change material nonidealities by meta-learning for adaptation on the edge in Proceedings of Neuromorphic Materials, Devices, Circuits and Systems (NeuMatDeCaS) ( 2023 ).
- 8 . Payvand, M., Demirag, Y., Dalgaty, T., Vianello, E. & Indiveri, G. Analog weight updates with compliance current modulation of binary ReRAMs for on-chip learning in 2020 IEEE International Symposium on Circuits and Systems (ISCAS), Seville, Spain (IEEE, 2020 ), 1 .
- 9 . Bohnstingl, T., Surina, A., Fabre, M., Demirag, Y., Frenkel, C., Payvand, M., Indiveri, G. & Pantazi, A. Biologically-inspired training of spiking recurrent neural networks with neuromorphic hardware in 2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS), Incheon, Republic of Korea (IEEE, 2022 ), 218 .
- 10 . Payvand, M., D'Agostino, S., Moro, F., Demirag, Y., Indiveri, G. & Vianello, E. Dendritic computation through exploiting resistive memory as both delays and weights in Proceedings of the 2023 International Conference on Neuromorphic Systems (ICONS) (New York, USA, 2023 ), 1 .
- 11 . Raghunathan, K. C., Demirag, Y., Neftci, E. & Payvand, M. Hardware-aware Few-shot Learning on a Memristor-based Small-world Architecture in Neuro Inspired Computational Elements Conference (NICE) (IEEE, 2024 ).
- 12 . Raghunathan, K. C., Demirag, Y., Moro, F., Neftci, E. & Payvand, M. Few-shot learning on brain-inspired small-world graphical hardware in International Conference on Neuromorphic, Natural and Physical Computing (NNPC 2023 ) (Hannover, Germany, 2023 ).
- 13 . Yu, Z., Bégon-Lours, L., Demirag, Y. & Offrein, B. J. BEOL compatible cross-bar array of ferroelectric synapses in Proceedings of the 2021 International Conference on Neuromorphic Systems (ICONS) ( 2021 ).
- 14 . Yik, J., Ahmed, S. H., Ahmed, Z., Anderson, B., Andreou, A. G., Bartolozzi, C., Basu, A., Blanken, D. d., Bogdan, P., Bohte, S., Bouhadjar, Y., Buckley, S., Cauwenberghs, G., Corradi, F., de Croon, G., Danielescu, A., Daram, A., Davies, M., Demirag, Y., Eshraghian, J., Forest, J., Furber, S., Furlong, M., Gilra, A., Indiveri, G., Joshi, S., Karia, V., Khacef, L., Knight, J. C., Kriener, L., Kubendran, R., Kudithipudi, D., Lenz, G., Manohar, R., Mayr, C., Michmizos, K., Muir, D., Neftci, E., Nowotny, T., Ottati, F., Ozcelikkale, A., Pacik-Nelson, N., Panda, P., Pao-Sheng, S., Payvand, M., Pehle, C., Petrovici, M. A., Posch, C., Renner, A., Sandamirskaya, Y., Schaefer, C. J. S., van Schaik, A., Schemmel, J., Schuman, C., Seo, J.-S., Sheik, S., Shrestha, S. B., Sifalakis, M., Sironi, A., Stewart, K., Stewart, T. C., Stratmann, P., Tang, G., Timcheck, J., Verhelst, M., Vineyard, C. M., Vogginger, B., Yousefzadeh, A., Zhou, B., Zohora, F. T., Frenkel, C. & Reddi, V. J. NeuroBench: Advancing neuromorphic computing through collaborative, fair and representative benchmarking in ( 2023 ).