## Analog Alchemy: Neural Computation with In-Memory Inference, Learning and Routing
Dissertation for the attainment of the doctoral degree in natural sciences (Dr. sc. UZH ETH Zürich), submitted to the Faculty of Mathematics and Natural Sciences of the University of Zurich and the Swiss Federal Institute of Technology Zurich by

Yiğit Demirağ

from Turkey

Doctoral Committee:

Prof. Dr. Giacomo Indiveri (chair and supervisor)

Prof. Dr. Melika Payvand

Prof. Dr. Benjamin Grewe

Zürich, 2024
To the engineers and scientists who will one day build superintelligence; from whatever materials and circuits, in whatever form.
## ABSTRACT
As neural computation revolutionizes the field of Artificial Intelligence (AI), rethinking the ideal neural hardware is becoming the next frontier. The fast and reliable von Neumann architecture has been the host platform for neural computation. Although capable, its separation of memory and computation creates a bottleneck for the energy efficiency of neural computation, in stark contrast to the biological brain. The question remains: how can we efficiently combine memory and computation, while exploiting the physics of the substrate, to build intelligent systems? In this thesis, I explore an alternative approach to neural computation with memristive devices, in which the unique physical dynamics of the devices are used for inference, learning and routing. Guided by the principles of gradient-based learning, we selected the functions that need to be materialized and analyzed connectomics principles for efficient wiring. Despite the non-idealities and noise inherent in analog physics, I provide hardware evidence of the adaptability of local learning to memristive substrates, new material stacks and circuit blocks that aid in solving the credit assignment problem, and efficient routing between analog crossbars for scalable architectures. First, I address the limited bit precision of binary Resistive Random Access Memory (RRAM) devices for stable training. By introducing a new device programming technique that precisely controls the filament growth process, we enhance the effective bit precision of these devices. Later, we prove the versatility of this technique by applying it to novel perovskite memristors. Second, I focus on the hard problem of online credit assignment in recurrent Spiking Neural Networks (SNNs) in the presence of memristor non-idealities. I present a simulation framework based on a comprehensive statistical model of a Phase Change Material (PCM) crossbar array, capturing all major device non-idealities. Building upon the recently developed e-prop local learning rule, we demonstrate that gradient accumulation is crucial for reliably implementing the learning rule with memristive devices. Moreover, I introduce PCM-trace, a scalable implementation of synaptic eligibility traces, a functional block demanded by many learning rules, using the volatile characteristics of specifically fabricated PCM devices. Third, I present our discovery of a novel memristor material capable of switching between volatile and non-volatile modes. This reconfigurable memristor, based on halide perovskite nanocrystals, offers a significant advancement in emerging memory technologies, enabling the implementation of both static and dynamic neural variables with the same material and fabrication technique, while holding the world record in endurance. Finally, I introduce Mosaic, a memristive systolic architecture for in-memory computing and routing. Mosaic, trained with our novel layout-aware training methods, efficiently implements small-world graph connectivity and demonstrates superior energy efficiency in spike routing compared to other hardware platforms.
## ACKNOWLEDGEMENTS
This thesis wouldn't have been possible without many people: scientists, friends, and family. I'm honored to have shared this journey with such curious and driven individuals. Among the many who contributed, there are a few exceptional individuals who were absolutely core to making this happen:
First, I'd like to thank my supervisor, Giacomo Indiveri, who is a rare scientist truly channeling his work towards a dream. Over these 5 years, he gave me complete freedom to explore what I believe are the most exciting problems, while providing me with high-bandwidth feedback on demand, with more than 500 emails and many thousands of DMs. He taught me the importance of pushing unusual ideas to the limit. Whenever I came up with an ambitious project goal, he always reminded me to first consider efficiency on the silicon. I'm grateful to have been his student.
I've been very fortunate to coincide with Melika Payvand, my co-supervisor, in this particular academic space and time. She is the most curious mind craving to understand the emergence of intelligence from the physics of computation, and her passion is infectious. Together, we traversed the probability trees for nearly every project in my PhD, and executed against the entropy. Her close friendship is the cherry on the cake; I enjoyed and valued every second of it.
Then there are people I am very lucky to collaborate with and learn from. Rohit A. John, an extraordinary person who taught me the importance of grinding with massive focus while solving hard problems. And Elisa Vianello, who always provided her seamless support and insights that have made hard projects a joyful exploration. And Emre Neftci, whose disruptive scientific ideas deeply resonated with me, and with whom I always enjoyed discussing ideas.
I have to thank Alpha, Anqchi, Arianna, Chiara, Dmitrii, Farah, Filippo, Jimmy, Karthik, Manu, Maryada, Nicoletta, Tristan, and many others, who I hope will forgive me for not being mentioned individually or for resorting to alphabetical order when I did. Thank you for inspiring conversations in INI hallways, night walks in Zurich, and giving me the privilege of calling you my friends.
During my PhD, I completed two internships at Google Zurich and one research visit at MILA. All of these were fantastic learning experiences, where I got a chance to reshape my research scope. From these experiences, I would especially like to thank Jyrki Alakuijala, Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, Alexander Mordvintsev, Esteban Real, Arna Ghosh, Jonathan Cornford, Joao Sacramento, Blake Richards, Guillaume Lajoie, and Blaise Aguera y Arcas.
I would also like to extend my thanks to my professors from my Master's degree, particularly Ekmel Ozbay, Bayram Butun and Yusuf Leblebici, for their invaluable support and inspiration.
And to Gizay, for sharing most of the journey and everything we created together.
But most of all, I want to express my deepest gratitude to my mom and dad, who inspired me to be curious, to take the world as a playground, and provided me with a loving home. And to my little brother, Efe, who is the best teammate in every game we play and in life's adventures.
## CONTENTS
- 1 Introduction
- 2 Enhancing Bit Precision of Binary Memristors for Robust On-chip Learning
  - 2.1 Introduction
  - 2.2 ReRAM Device Modeling
  - 2.3 Bit-Precision Enhancing Weight Update Rule
  - 2.4 Learning Circuits and Architecture
  - 2.5 System-level Simulations
  - 2.6 Discussion
- 3 Online Temporal Credit Assignment with Non-volatile and Volatile Memristors
  - 3.1 Framework for Online Training of RSNNs with Non-volatile Memristors
    - 3.1.1 Introduction
    - 3.1.2 Building blocks for training on in-memory processing cores
    - 3.1.3 PCM device modeling and integration into neural networks
    - 3.1.4 Discussion
  - 3.2 Implementing Online Training of RSNNs on a Neuromorphic Hardware
    - 3.2.1 From the simulation to an analog chip
    - 3.2.2 Discussion
  - 3.3 Scalable Synaptic Eligibility Traces with Volatile Memristive Devices
    - 3.3.1 Introduction
    - 3.3.2 PCM-trace: Implementing eligibility traces with PCM drift
    - 3.3.3 Multi PCM-trace: Increasing the dynamic range of traces
    - 3.3.4 Circuits and Architecture
    - 3.3.5 Discussion
- 4 Discovering a Single Material that Switches Between Volatile and Non-Volatile Modes
  - 4.1 Introduction
  - 4.2 Diffusive Mode of the Perovskite Reconfigurable Memristor
  - 4.3 Drift Mode of the Perovskite Reconfigurable Memristor
  - 4.4 Reservoir Computing with Perovskite Memristors
  - 4.5 Diffusive Perovskite Memristors as Reservoir Elements
  - 4.6 Drift Perovskite Memristors as Readout Elements
  - 4.7 Classification of Neural Firing Patterns
  - 4.8 Discussion
  - 4.9 Methods
- 5 Mosaic: An Analog Systolic Architecture for In-Memory Computing and Routing
  - 5.1 Introduction
  - 5.2 Mosaic Hardware Computing and Routing Measurements
  - 5.3 Analog Hardware-aware Simulations
  - 5.4 Benchmarking Routing Energy in Neuromorphic Platforms
  - 5.5 Discussion
  - 5.6 Methods
- Conclusions
- Appendix
- Bibliography
- Contributions
- Publications
You've got to listen to the silicon, because it's always trying to tell you what it can do.
Carver Mead
How do we imbue the spark of intelligence into lifeless computational physical substrates? This question has been my quest, inspired by the early pioneers such as McCulloch and Pitts [ 1 ], Alan Turing [ 2 ] and von Neumann [ 3 ], who laid the foundation of modern neural computing. As the field of AI progressively gains superiority in numerous benchmarks, the quest to understand intelligence and to rethink its ideal implementation on a physical substrate has never been more pressing.
While intelligence remains elusive, with numerous definitions, learning 1 seems to me to be a cornerstone of intelligence. Whether natural or artificial, an intelligent agent must adapt to survive and replicate. Bacteria can learn to swim away from environments that lower the probability of successful replication [ 4 ]. And Artificial Neural Network (ANN) architectures that absorb datasets better are preferred by AI researchers and industry [ 5 ]. This is a common theme: intelligent systems should learn well to last. And any physical implementation of an intelligent system likewise needs to implement learning dynamics. However, the computational demands of learning place an enormous burden on existing hardware.
For over 75 years, computing hardware has relied on the von Neumann architecture: synchronous, deterministic, binary logic driving a processing unit that interfaces with a separate memory subsystem. This design excels at executing arbitrary sequential instructions, but it necessitates constant shuttling of data between memory and compute. 2 Memory hierarchies, with their layers of progressively larger and slower storage, have been the stopgap solution, but fundamentally the bottleneck remains. This non-local memory access is a leading factor in the latency and energy consumption of modern AI systems. In stark contrast, neural computation in biology is inherently intertwined with memory, operating asynchronously, sparsely, and stochastically. This calls for a fundamental rethinking of computing, where neural models and hardware are co-designed with locality and physics-awareness as first principles.
## An Alternative Path
This thesis departs from the well-established path of digital accelerator design. The inherent noise tolerance of neural networks presents an opportunity to relax strict precision and determinism requirements in both the compute and memory subsystems of digital electronics. In turn, this relaxation unlocks some exotic modes of computation, where the subthreshold regime of transistors and the raw physics of novel materials can be exploited for neural computation and storage. Historically, this is the essence of neuromorphic engineering, where a deliberate trade-off can be made, favoring the low power and scalability offered by statistical physical processes over the theoretical precision of Boolean algebra.
This thesis optimizes neural computation across Marr's computational, algorithmic, and implementational levels [ 6 ], advocating for the co-design of neural models and hardware with locality and physics-awareness as guiding principles. By pushing critical neural operations to the fundamental level of material physics, engaging electrical, chemical, or even mechanical properties, we explore a new frontier in low-power neural computing.
Specifically, we investigate the following key strategies, which are detailed in the subsequent sections:
1 The dynamic process of adapting to environmental pressure to improve the probability of survival or replication.
2 partially due to the rapid time-multiplexing of resources.
- Analog In-Memory Computing: We exploit the physics of volatile (temporary) or non-volatile (permanent) materials to perform critical neural network operations directly within the memory units. This non-von Neumann architecture fundamentally eliminates the need for data movement between memory and compute units, which are traditionally decoupled systems.
- Local Learning: We depart from the computationally intensive Back-Propagation Through Time (BPTT) for training, opting instead for local and online gradient-based learning rules. These rules inherently exhibit varying degrees of variance and bias in their estimates of the gradient of an objective function [ 7 ]. However, they offer significant advantages in hardware in terms of power efficiency and simplified implementation due to local availability of weight update signals and the elimination of the need for buffering intermediate values.
- Analog In-Memory Routing: We impose locally dense and globally sparse connectivity on neural networks. This connectivity prior enables high utilization of routers built from non-volatile materials, which efficiently transmit neural activations across the cores of analog systolic arrays.
- Physics-aware Training: We utilize data-driven optimization to counteract non-idealities inherent to analog technologies. This involves collecting extensive component measurements to model their collective behavior, tailoring learning algorithms and circuits accordingly. Additionally, we employ gradient-based architectural adaptations and weight re-parameterizations for robust on-chip inference and training. Our methods are validated on small-scale fabrications to assess their on-device performance and scalability potential.
In the following section, I will introduce memristors, the prima materia of our endeavor, which enable and unify the strategies explored in this thesis.
## Memristors for Analog In-Memory Neural Operations
Neural network computation, biological or artificial, is fundamentally memory-centric. The human brain operates on $O(10^{15})$ synapses [ 8 ], while Large Language Models (LLMs) like GPT-4 perform non-linear operations on $O(10^{12})$ parameters [ 9 ]. Scaling laws reveal a direct link between parameter count and performance [ 10 ], suggesting that increasing network size is a reliable path to improved performance in future neural networks.
Given that memory requirements scale quadratically with the number of neurons per layer, memory becomes the primary design constraint in neural computing hardware, impacting scalability, throughput, and power efficiency. 3
An ideal memory system for neural computing should be high-density, low-energy, quickly accessible, and on-chip. However, an ideal memory does not yet exist, as these requirements are often conflicting. High-density memories (i.e., DRAM, High Bandwidth Memory (HBM), 3D NAND flash) are off-chip and slow to access; faster memories that don't use capacitors (i.e., SRAM) are larger in area due to their transistor count; and high-bandwidth memories (i.e., HBM) are expensive, as they require additional banks, ports and channels. This is why the memory hierarchy exists: to address this trade-off using multiple levels of progressively larger and slower storage. Yet, the von Neumann bottleneck persists, with memory access to each layer in the hierarchy
Figure 1.1: The importance of data locality, shown in the memory hierarchy: an 8-bit multiplication costs only 0.2 pJ, a fraction of the energy of memory access (roughly 5 pJ/8b for local SRAM, 50 pJ/8b for on-chip SRAM, and 640 pJ/32b for DRAM), in 45 nm CMOS [ 11 ].
(i.e., DRAM → on-chip SRAM → local SRAM) incurring roughly an order of magnitude more energy and latency penalty per level. Even in the ideal case of local SRAM, the cost of memory access exceeds that of computation by an order of magnitude.
3 On an edge Tensor Processing Unit (TPU), for instance, memory access can consume over 90% of the total energy, throttle throughput to below 10% of peak capacity, and in general dominate the majority of chip area [ 12 ].
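To make the asymmetry concrete, here is a back-of-the-envelope comparison using only the energy figures quoted in Figure 1.1 (the per-8-bit DRAM cost is scaled down from the quoted 32-bit number):

```python
# Energy figures from Figure 1.1 (45 nm CMOS [11]), normalized to 8 bits.
E_MUL_8B = 0.2            # pJ, one 8-bit multiplication
E_LOCAL_SRAM_8B = 5.0     # pJ per 8 bits, local SRAM (8 kB)
E_ONCHIP_SRAM_8B = 50.0   # pJ per 8 bits, on-chip SRAM (~100 MB)
E_DRAM_8B = 640.0 / 4     # pJ per 8 bits, scaled from 640 pJ per 32 bits

# Even the cheapest memory access dwarfs the arithmetic it feeds.
for name, e in [("local SRAM", E_LOCAL_SRAM_8B),
                ("on-chip SRAM", E_ONCHIP_SRAM_8B),
                ("DRAM", E_DRAM_8B)]:
    print(f"{name}: {e / E_MUL_8B:.0f}x the energy of an 8-bit multiply")
# local SRAM: 25x, on-chip SRAM: 250x, DRAM: 800x
```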
In-memory computing offers a compelling alternative to data movement by processing data where it is stored, and one of the most promising technologies for in-memory computing is memristors.
Memristors are two-terminal analog resistive memory devices capable of both computation and memory [ 13 ]. They have a unique ability to encode and store information in their electrical conductance, which can be altered based on the history of applied electrical pulses [ 14 ]. Because a single memristor's conductance can transition between multiple levels between the Low Conductance State (LCS) and the High Conductance State (HCS), it can store more than one bit of information (typically between 3 and 5 bits), improving memory density. 4 Various memristor types exist, each with distinct operating principles and advantages. For example, PCM relies on the contrasting electrical resistance of the amorphous and crystalline phases of chalcogenide materials [ 16 , 17 ], RRAM operates by altering the resistance of a dielectric material through the drift of oxygen vacancies or the formation of conductive filaments [ 18 ], and Ferroelectric Random Access Memory (FeRAM) uses the polarization of ferroelectric materials to store and change information [ 19 ]. Although the electrical interface to these types does not differ significantly (applying electrical pulses to read or program the conductance), the underlying mechanisms determine the device's switching speed, footprint, stochasticity, endurance, and energy efficiency.
When implementing functions with memristors, it is helpful to categorize them by volatility. Non-volatile memristors retain their conductance after programming, making them ideal for weight storage in neural networks. This property extends to in-memory computing, where dot-product operations are performed in place [ 20 -22 ]. When an array of $N$ memristive devices stores vector elements $G_i$ in their conductance states, applying voltages $V_i$ to the devices and measuring the resulting currents $I_i$ computes the dot product $\sum_{i=1}^{N} G_i V_i$ with $O(1)$ time complexity, by Ohm's law and Kirchhoff's current law. This principle extends to matrix-vector multiplication, enabling neural network inference directly through the analog substrate's physics. In-memory neural inference has been demonstrated in large prototypes such as the PCM-based HERMES chip [ 23 ] and the RRAM-based NeuRRAM chip [ 18 ].
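To make this concrete, here is a minimal NumPy sketch of an ideal crossbar read-out; the conductance and voltage values are purely illustrative, and all the device non-idealities discussed later are ignored:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conductances (siemens) of an N x M crossbar: each column
# stores one output's weight vector in device conductance states.
G = rng.uniform(1e-6, 1e-4, size=(128, 64))   # N = 128 rows, M = 64 columns

# Input vector encoded as row voltages (volts).
V = rng.uniform(0.0, 0.2, size=128)

# Ohm's law gives each device current G[i, j] * V[i]; Kirchhoff's current
# law sums these along every column wire, so all M dot products are read
# out simultaneously, in O(1) time on the analog substrate.
I = G.T @ V   # shape (64,): output currents, one per column
```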
Volatile memristors, however, provide functionality beyond mere storage, and are often less explored. Their conductance decays within 10 ms to 100 ms, allowing for continuous-time, stochastic accumulate-and-decay functions that approximate low-pass filtering, signal averaging, or time tracking. In neural computation, volatile memristors have been used to implement short-term synaptic [ 24 , 25 ] and neuronal dynamics [ 26 , 27 ].
why spikes? In this thesis, I focus on SNNs, where neural activations are represented as discrete pulses called spikes. This choice is primarily motivated by hardware considerations. Spikes enable extreme spatiotemporal sparsity, aligning well with asynchronous computation, where energy consumption directly correlates with the spiking events that trigger localized circuits [ 28 ]. Mixed-signal spiking neuron circuits inherently act as sigma-delta Analog-to-Digital Converters (ADCs), converting analog neural computation into digital spikes [ 29 ] for noise-robust transmission over long distances without signal degradation [ 30 ]. Additionally, spiking neurons seamlessly interface with the reading and programming of memristive devices, and with event-based sensors [ 31 , 32 ], through the Address-Event Representation (AER) protocol [ 33 ]. Spiking communication has even been proposed to mitigate severe heating issues in 3D fabrication using n-ary coding schemes [ 34 ].
From a computational perspective, a common argument for spikes is efficient information encoding through precise use of the temporal dimension (e.g., time-to-first-spike [ 35 ], phase coding [ 36 ], or inter-spike interval [ 37 ]). Furthermore, the spiking framework allows us to formulate hypotheses about biophysical spike-based learning mechanisms in the brain [ 38 -40 ] and to explore the computational capabilities of spiking neural networks [ 41 ], as demonstrated in our recent work. While a comprehensive exploration of the advantages of spikes is beyond this thesis's scope, the primary focus here is their potential for energy-efficient communication on analog substrates, as their unique computational benefits remain to be conclusively demonstrated.
4 These attributes are interestingly similar to biological synapses, where the synaptic efficacy is modulated by the history of pre- and post-synaptic activity, and is suggested to take 26 distinguishable strengths (correlated with spine head volume) [ 15 ].
## State of the Art
Despite their potential, memristors are not without challenges for neural computing. In this section, I outline some of these challenges in inference, learning and routing, along with state-of-the-art attempts to address them.
Limited bit precision. Programming nanoscale memristive devices modifies their atomic arrangement, a process that is inherently stochastic, non-linear, and of limited granularity [ 42 -45 ]. These analog non-idealities are known to cause significant performance drops when training networks, compared to software simulations [ 46 ]. Controlled experiments by Sidler et al. [ 22 ] have demonstrated that the poor training performance is primarily due to the insufficient number of pulses it takes to switch the device between the High Resistive State (HRS) and the Low Resistive State (LRS).
Following this, various attempts have been made to improve the bit precision of memristive devices. Optimizing RRAM materials has increased the number of bits per device, but often at the cost of lower ON/OFF ratios (i.e., the ratio between the LRS and HRS resistances), making it harder to distinguish states with small-footprint circuits [ 47 -50 ]. Furthermore, architectural optimizations have been explored, including using multiple binary devices per synapse [ 51 ], assigning a number system to multiple devices [ 52 ], leveraging stochastic switching [ 53 , 54 ], and complementing binary memristive devices with capacitors [ 55 ]. However, these methods still require complex and large synaptic architectures, limiting scalability.
In Chapter 2, we propose a novel approach to program intrinsically 1-bit RRAM devices to increase their effective bit resolution through precise control of filament formation.
The credit assignment problem. Learning in any system is fundamentally about adjusting its parameters to improve its performance. In neural networks, learning involves adjusting the weights, represented by the vector $W$, to optimize performance as measured by an objective function $F(W)$. The credit assignment problem refers to the challenge of determining the precise weight adjustments needed for improvement, especially in deep networks where the relationship between individual neurons and overall performance is less clear [ 56 ].
Traditionally, Hebbian mechanisms [ 57 ], leveraging the timing of pre- and post-synaptic spikes, have been the go-to solution for on-chip learning in the neuromorphic field. This is because Hebbian rules explain numerous neuroscientific observations [ 58 , 59 ] and possess interesting variance-maximization properties [ 60 ], but, more practically, because they utilize signals local to the synapse in a simple way, making them easily adaptable to silicon circuits [ 61 -64 ]. However, Hebbian rules alone have had limited success when scaling to large networks, and require heavily crafted architectural biases to achieve hierarchical, disentangled representations. 5
Backpropagation [ 68 ], on the other hand, remains the state-of-the-art algorithm for training modern ANNs, and is one of the pillars of the deep learning revolution. To give an intuition for why it works, following the insight of Richards et al. [ 7 ], let us consider the weight changes $\Delta W$ to be small and the objective function to be maximized, $F$, to be locally smooth; the resulting $\Delta F$ can then be approximated as $\Delta F \approx \Delta W^{\top} \nabla_W F(W)$, where $\nabla_W F(W)$ is the gradient of the objective function with respect to the weights. This means that to guarantee improved learning performance ($\Delta F \geq 0$), a principled approach to weight adjustment is to take a small step in the direction of steepest performance improvement, guided by the gradient ($\Delta F \approx \eta\, \nabla_W F(W)^{\top} \nabla_W F(W) \geq 0$). Backpropagation, which explicitly calculates these gradients, is powerful but unsuitable for online learning on analog substrates due to its need for symmetric feedback weights and distinct forward/backward phases [ 56 ]. For temporal credit assignment, BPTT unrolls the network in reverse time to backpropagate the error gradients, resulting in memory complexity scaling as $O(kT)$, where $k$ is the number of time steps and $T$ is the number of neurons. This temporal non-locality necessitates the use of a memory hierarchy to save silicon space, sacrificing energy efficiency and latency. For this reason, various alternatives have been proposed to estimate gradients while offering better locality, such as feedback alignment [ 69 ], Q-AGREL [ 70 ], difference
5 While this is true, it is possible that end-to-end credit assignment might not be needed [ 65 ]. Recent works successfully replacing global backpropagation with truncated layer-wise backpropagation support this view [ 66 , 67 ]. Nevertheless, designing the right architecture and local loss functions to guide the truncated credit assignment is still an open question.
target propagation [ 71 ], predictive coding [ 72 ] and weight perturbation-based methods [ 73 ]. These methods exhibit varying degrees of variance (resulting in slower convergence) and bias (leading to poor generalization [ 74 ]) in their estimations [ 7 ].
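Spelling out the gradient-ascent argument from the paragraph above as display math (same notation; this restates the reasoning of Richards et al. [ 7 ] rather than adding anything new):

$$
\Delta F \;\approx\; \Delta W^{\top}\,\nabla_W F(W), \qquad
\Delta W = \eta\,\nabla_W F(W),\ \eta > 0
\;\;\Longrightarrow\;\;
\Delta F \;\approx\; \eta\,\lVert \nabla_W F(W) \rVert^{2} \;\geq\; 0.
$$

The same expansion shows why imperfect estimators can still work: if a rule applies $\Delta W = \eta\,\hat{g}$ for some estimate $\hat{g}$ of the gradient, then $\Delta F \approx \eta\,\hat{g}^{\top}\nabla_W F(W)$, which remains non-negative as long as $\hat{g}$ stays within 90° of the true gradient.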
However, as long as variance and bias are within reasonable bounds, the learning rule can still be effective, potentially offering a favorable sweet spot between locality and performance. The challenge of local and effective spatio-temporal credit assignment is still a critical frontier in neural processing in analog substrates, demanding new and creative approaches.
In Chapter 3, we address this challenge for the first time, designing materials and circuits evolved around gradient calculations, and implementing the e-prop [ 75 ] local learning rule for Recurrent Spiking Neural Networks (RSNNs) on a memristive chip.
Programming memristors with non-idealities for online learning. On-chip learning requires the ability to program the network weights in accordance with the demands of the learning rule. While digital systems often rely on quantization to optimize memory access [ 76 ], together with associated mitigation techniques such as stochastic rounding [ 77 ], gradient scaling [ 78 ], quantization range learning [ 79 ] and optimized weight representations [ 80 ], analog memristive systems present unique challenges due to non-idealities such as conductance-dependent, non-linear, stochastic, and time-varying programming responses [ 81 ]. It is therefore crucial to identify which digital methods can be transferred effectively to analog systems while promising a small footprint and energy overhead.
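As an illustration of the first of these digital techniques, here is a minimal stochastic-rounding sketch (a generic textbook version with hypothetical step sizes, not the exact scheme of [ 77 ]):

```python
import numpy as np

def stochastic_round(x, step):
    """Round x to an integer multiple of `step`; round up with probability
    equal to the fractional remainder, so the result is unbiased: E[out] = x."""
    scaled = np.asarray(x, dtype=float) / step
    floor = np.floor(scaled)
    up = np.random.random(scaled.shape) < (scaled - floor)
    return step * (floor + up)

# An update far below the programmable step size survives in expectation:
dw = np.full(10000, 0.003)   # desired weight updates
step = 0.01                  # minimum programmable change
print(stochastic_round(dw, step).mean())   # ~0.003 despite 0.01 granularity
```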
In Sections 3.1 and 3.2, we analyze several practical weight update schemes implementing an online learning rule for mixed-signal systems in a custom simulator, and later validate them on a real neuromorphic chip.
Scalable synaptic eligibility traces for local learning. Many high-performing local rules rely on eligibility traces [ 39 , 82 -86 ], slow synaptic memory mechanisms that carry information forward in time. These traces bridge the temporal gap between synaptic activity occurring on millisecond timescales and network errors arising seconds later, helping to solve the distal reward problem [ 87 , 88 ]. While several neuromorphic platforms [ 89 -91 ] have incorporated synaptic eligibility traces for learning, this mechanism is one of the most costly building blocks in neural computation, due to the quadratic scaling of the number of synapses with the number of neurons. Digital implementations suffer from the memory-intensive nature of numerical trace calculations, leading to a von Neumann bottleneck [ 92 , 93 ]. Even in mixed-signal designs, the slow dynamics of eligibility traces require large capacitors, sacrificing scalability [ 94 ].
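As an illustration only (the exact trace definition differs across the rules cited above), the simplest low-pass form of an eligibility trace, and the $O(N^2)$ state it implies, looks as follows:

```python
import numpy as np

N, T = 100, 1000            # neurons, time steps
dt, tau_e = 1e-3, 0.5       # 1 ms step, 500 ms trace time constant
decay = np.exp(-dt / tau_e)

e = np.zeros((N, N))        # one trace per synapse: O(N^2) state to maintain
for t in range(T):
    pre = np.random.random(N) < 0.02    # pre-synaptic spikes (toy 2% rate)
    post = np.random.random(N) < 0.02   # post-synaptic spikes
    # decay every trace, then bump synapses whose pre and post were co-active
    e = decay * e + np.outer(post, pre).astype(float)

# A learning signal arriving seconds later can still credit the right synapses:
L_signal = np.random.randn(N)           # hypothetical per-neuron error signal
dW = L_signal[:, None] * e              # e-prop-style weight update sketch
```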
In Section 3 . 3 , we propose a novel and scalable implementation of synaptic eligibility traces using volatile memristors.
Unifying volatile and non-volatile materials. Different neural building blocks require different volatility, bit precision, and endurance characteristics from memristive devices, which are then tailored to meet these demands. For example, ANN inference workloads require a linear, non-volatile conductance response over a wide dynamic range for optimal weight updates and minimal noise for gradient calculation [ 21 , 22 , 95 ]. SNNs, in contrast, often demand multiple, richer synaptic dynamics simultaneously, e.g., short-term conductance decay (to implement Short-Term Plasticity (STP) and eligibility traces [ 82 ]), non-volatile device states (to represent synaptic efficacy) and a probabilistic nature (to mimic synaptic vesicle release [ 24 ]). However, optimizing a different active memristive material for each of these features limits suitability to a wide range of computational frameworks and ultimately increases system complexity for the most demanding applications. Moreover, these diverse specifications cannot always be implemented by combining
Figure 1.2: Bias and variance in learning rules that estimate gradients, even when they do not explicitly compute them. Figure taken from [ 7 ].
different types of memristors on a monolithic circuit (e.g., volatile and non-volatile, binary and analog) due to the incompatibility of their fabrication processes. Although some prototype materials have been proposed that exhibit dual-functional memory [ 96 , 97 ], the dominance of one of the mechanisms often results in poor switching performance. Therefore, the lack of universal memristors capable of realizing diverse computational primitives has been a standing challenge.
In Chapter 4, we present our discovery of a novel memristor type that can be used for both volatile and non-volatile operation via a simple programming scheme, while achieving a world record in endurance.
Routing of multi-crossbar arrays for scaling. Scaling neural networks, by increasing layer width or depth, has proven to be a powerful technique for improving performance [ 98 ]. However, scaling the dimensions of memristive crossbar arrays is hindered by analog non-idealities such as current sneak paths, parasitic resistance and capacitance of the metal lines, and yield limitations [ 99 -101 ]. For this reason, large-scale systems need to adopt multiple crossbars of manageable dimensions [ 102 ], but this introduces the overhead of routing activations, especially over the long wires connecting source, router and destination. To reduce wiring length, three-dimensional (3D) technology vertically stacking logic, crossbar arrays, and routers has been proposed [ 34 , 102 ], but the fabrication complexity and cost of 3D integration are currently prohibitive. Today's most advanced multi-crossbar neuromorphic chips, e.g., HERMES [ 23 ] and NeuRRAM [ 18 ], still rely on off-chip communication for routing, strongly diminishing communication energy efficiency. This necessitates the development of efficient on-chip routing mechanisms for energy-efficient communication. When routing is optimized for communicating events through the on-chip AER protocol, the designer faces a trade-off between source-based and destination-based routing. Source-based routing offers the flexibility of a per-neuron Content Addressable Memory (CAM), as used by DYNAP-SE [ 103 ], but at the cost of increased chip area and slower memory access. Destination-based routing, while more area-efficient, sacrifices some degree of network configurability.
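For intuition, here is a toy software analogue of source-based AER routing (the addresses and table entries are hypothetical; on-chip, the per-neuron table is the role the CAM plays):

```python
# Toy source-based AER routing: a spike carries only its source address,
# and a per-neuron lookup table returns the (core, synapse) destinations
# the event must be delivered to.
routing_table = {
    0: [(1, 17), (2, 3)],   # neuron 0 -> core 1 / syn 17 and core 2 / syn 3
    1: [(1, 5)],
}

def route_event(src_neuron):
    """One memory lookup per spike buys full per-neuron flexibility."""
    return routing_table.get(src_neuron, [])

print(route_event(0))   # [(1, 17), (2, 3)]
```

Roughly speaking, destination-based routing inverts this table, storing at each destination which sources it listens to; the lookup state shrinks, but freely rewiring an individual neuron's fan-out becomes harder.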
In Chapter 5, we propose and fabricate a novel memristive in-memory routing core that can be reconfigured to route signals between crossbars, enabling dense local and sparse global connectivity with orders-of-magnitude better routing efficiency than other SNN hardware platforms.
## Thesis Overview
This thesis explores the path towards intelligent and low-power analog computing substrates, embracing the Bitter Lesson [ 104 ] and addressing some of the challenges in learning and scale. It consists of six selected publications, which I have coauthored with amazing electrical engineers, computer scientists, material designers and neuroscientists. My individual contributions to the work presented in each chapter are clarified and outlined in Appendix 4 .
In the first part, we focus on developing mixed-signal learning circuits targeting memristive weights in single-layer feedforward SNN architectures. We address the limited resolution and the device-to-device and cycle-to-cycle variability of binary RRAM weights, aiming to enable on-chip learning. Building upon the observation of Ielmini [ 105 ] that the filament size in RRAM can be precisely controlled by the compliance current, we introduce programming circuitry that modulates synaptic weights based on estimated gradients using a modified Delta Rule [ 106 ]. This approach achieves multi-level weight resolution within the conductance of intrinsically binary-switching RRAM devices. We model the variability of device responses to our new compliance-current programming scheme, I_CC to G_LRS, from experimental measurements on a 4 kb HfO2-based RRAM array, and adjust our implementation accordingly. We validate our approach and circuits through circuit simulations for a standard Complementary Metal-Oxide-Semiconductor (CMOS) 180 nm process and system simulations on the MNIST dataset. This co-design of algorithm, material, and circuit properties establishes a significant building block for single-layer on-chip learning with memristive devices. Furthermore, in Chapter 4, we demonstrate that our programming
scheme can also be applied, with an even more linear response, to novel perovskite memristors.
In Chapter 3, we extend our focus to the more challenging task of training RSNNs on-chip, addressing the complexities of online temporal credit assignment with memristive devices for in-memory computing. Our work encompasses three complementary efforts: 1) given a recently developed local learning rule for RSNNs [ 75 ], investigating in simulation how to reliably program analog devices based on a realistic PCM model, 2) validating these results on a PCM-based neuromorphic chip, and 3) proposing a scalable implementation of synaptic eligibility traces, a crucial component of many local learning rules, using volatile memristors.
To achieve this, we start by developing a PyTorch [ 107 ]-based simulation framework built on a comprehensive statistical model of a PCM crossbar array, capturing the major device non-idealities: programming noise, read noise, and temporal drift [ 81 ]. Our selected learning rule, e-prop, estimates the gradient under appealing locality constraints, but it was not known how to reliably reflect the gradient signal while programming memristors with non-idealities. This framework enables benchmarking four commonly practiced memristor-aware weight update mechanisms (Sign-Gradient Descent [ 108 ], stochastic updates [ 81 ], multi-memristor updates [ 109 ] and mixed-precision [ 110 ]) for reliably programming memristor conductances based on estimated gradients on generic regression tasks. We show that the mixed-precision update scheme is superior: by accumulating gradients in a high-precision memory (similar to quantization-aware training methods [ 111 ]), it allows for a lower learning rate and improved alignment of weight update magnitudes with the PCM's minimum programmable conductance change. It also reduces the total number of write pulses by 2-3 orders of magnitude, reducing the energy spent on costly memristive programming and mitigating potential endurance issues. Furthermore, we verified that previous digital implementations [ 77 ] of stochastic updates work well down to 8-bit precision in digital simulations.
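A minimal sketch of this mixed-precision accumulate-then-program loop (all constants illustrative; the actual scheme follows [ 110 ] and runs on a digital coprocessor):

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.01                    # minimum reliably programmable conductance change
w = np.zeros(256)             # weights stored in memristor conductances
chi = np.zeros(256)           # high-precision digital gradient accumulator
lr = 1e-3
pulses = 0

for step in range(1000):
    grad = rng.standard_normal(256)     # stand-in for e-prop gradient estimates
    chi -= lr * grad                    # cheap digital accumulation every step
    n = np.trunc(chi / eps)             # whole device-steps ready to be applied
    w += n * eps                        # costly analog programming (rare)
    chi -= n * eps                      # carry the sub-eps residual forward
    pulses += int(np.abs(n).sum())

print(f"write pulses issued: {pulses} (vs. {1000 * 256} naive per-step writes)")
```

Because most per-step updates stay below eps, almost all potential write events are skipped, which is where the 2-3 orders of magnitude reduction in programming pulses comes from.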
Following the simulations, the next step was to validate them on a physical neuromorphic chip. We implemented the e-prop learning rule on the HERMES chip [ 23 ], fabricated in 14 nm CMOS with four 256 × 256 PCM crossbar arrays in a differential parallel setup. We programmed all weights of an RSNN onto physical PCM devices across HERMES cores for in-memory inference. On-chip training was controlled by a digital coprocessor implementing a mixed-precision algorithm that accumulates gradients in a high-precision memory unit. Our mixed-precision training on HERMES achieved performance competitive with conventional software simulations, maintained a regularized firing rate, and significantly reduced the number of PCM programming pulses, enhancing energy efficiency. Our results demonstrate the first successful implementation of a powerful gradient-based online learning rule for RSNNs on an analog substrate. While the mixed-precision technique requires an additional high-precision memory unit, we demonstrate that inference can remain in-memory, and off-chip guided learning can be activated as needed with minimal analog device programming.
Local learning rules, including e-prop, often require synaptic eligibility traces [ 39 , 82 -86 ], posing a scaling challenge for analog hardware due to their $O(N^2)$ area scaling, where $N$ is the number of neurons. This challenge is exacerbated by increased time-constant requirements, as larger capacitors are needed for implementation. To address this, we introduce PCM-trace, a novel small-footprint circuit leveraging the inherent conductance drift of PCM to emulate eligibility traces for local learning rules. We exploit a material bug, the structural relaxation and temporal conductance drift in PCM's amorphous regime [ 112 ], described by $R(t) = R(t_0)\,(t/t_0)^{\nu}$, and turn it into a feature. Our optimized material choice allows gradual SET pulses to accumulate the trace, while the conductance naturally decays over seconds. We also introduce a multi-PCM-trace configuration, distributing synaptic traces across multiple PCM devices to significantly improve the dynamic range. Experimental results in 130 nm CMOS technology confirm that PCM-trace can maintain eligibility traces for over 10 seconds while offering more than 11× area savings compared to conventional capacitor-based trace implementations [ 94 ].
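A toy simulation of the drift law quoted above (with a hypothetical drift exponent and conductances; real devices add programming and read stochasticity) shows how slowly such a trace fades, with the conductance decaying as the inverse of the resistance drift:

```python
import numpy as np

nu = 0.1      # hypothetical drift exponent for the amorphous PCM regime
t0 = 1e-3     # reference time after programming (s)

def pcm_conductance(g0, t_elapsed):
    """Conductance decay implied by R(t) = R(t0) * (t / t0)**nu."""
    return g0 * ((t_elapsed + t0) / t0) ** (-nu)

# A SET pulse at t = 0 leaves conductance g0; the trace then decays as a
# power law, slowly enough to bridge second-long gaps to a reward signal.
g0 = 10e-6   # siemens, hypothetical post-SET conductance
for t in [0.01, 0.1, 1.0, 10.0]:
    print(f"t = {t:5.2f} s : g = {pcm_conductance(g0, t) * 1e6:.2f} uS")
# after 10 s the trace still retains roughly 40% of its initial value
```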
In Chapter 4, we introduce a state-of-the-art memristive material based on halide perovskite nanocrystals that can be dynamically reconfigured to exhibit volatile or non-volatile behavior. This is motivated by the fact that many in-memory neural computing systems demand devices with specific switching characteristics, and existing memristive devices cannot be reconfigured to meet these diverse volatile and non-volatile switching requirements. To achieve this, we develop cesium lead bromide nanocrystals capped with organic ligands as the active switching matrix, with silver as the active electrode. This design leverages the low activation energy of ion migration in halide perovskites to achieve both diffusive (volatile) and drift (non-volatile) switching. By actively controlling the compliance current (Icc), following our prior work in Chapter 2, the magnitude of the ion flux is adjusted, enabling on-demand switching between the two modes. This control mechanism allows the selection of diffusive dynamics at low Icc (1 µA) for volatile behavior and drift kinetics at higher Icc (1 mA) for non-volatile memory operation. Moreover, our measurements demonstrate that memristors using perovskite nanocrystals capped with OGB ligands achieve record endurance in both volatile and non-volatile modes. We attribute this superior performance to the larger OGB ligands, which better insulate the nanocrystals and regulate the electrochemical reactions responsible for the switching behavior.
In Chapter 5, we switch gears and focus on the scalability of memristive architectures in neuromorphic systems, a challenge hindered by the analog non-idealities of crossbar arrays. We introduce Mosaic, a novel memristive systolic array architecture consisting of interconnected Neuron Tiles and Routing Tiles, both implemented in RRAM-integrated 130 nm CMOS technology.
Each Neuron Tile is a crossbar array of memristors storing the network weights of an RSNN layer with Leaky Integrate-and-Fire (LIF) neurons. These neurons emit spikes based on integrated synaptic inputs, transmitting them to neighboring tiles through Routing Tiles. The Routing Tiles, also based on memristor arrays, define the connectivity patterns between Neuron Tiles. The resulting structure is a small-world graph with dense local and sparse long-range connections, similar to the connectivity found in biological brains 6 .
Our in-memory routing approach necessitates careful optimization of connectivity during offline training to prune RSNNs into a Mosaic-compatible, small-world graph. We introduce a novel hardware layout-aware training method that considers the physical layout of the chip and optimizes neural network weights using either gradient-based or evolutionary algorithms [ 113 ].
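A minimal sketch of what such a layout-aware connectivity prior could look like (the grid size, neuron counts, and exponential distance kernel here are all hypothetical; the method in Chapter 5 optimizes connectivity jointly with the weights):

```python
import numpy as np

# Place 16 Neuron Tiles on a 4 x 4 grid; neurons inherit their tile position.
grid = np.array([(x, y) for x in range(4) for y in range(4)])
n_per_tile = 64
pos = np.repeat(grid, n_per_tile, axis=0)          # (1024, 2) neuron coords

# Manhattan distance between tiles ~ number of Routing Tile hops on chip.
d = np.abs(pos[:, None, :] - pos[None, :, :]).sum(-1)

# Distance-dependent connection probability: dense locally, sparse globally.
lam = 1.0
rng = np.random.default_rng(0)
mask = rng.random(d.shape) < np.exp(-d / lam)       # small-world-like mask

# During training, weights are simply confined to the feasible wiring:
W = rng.standard_normal(d.shape) * 0.01 * mask
print(f"connection density: {mask.mean():.3f}")
```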
In-memory routing and optimization for sparse long-range and locally dense communication in Mosaic result in significant energy efficiency in spike routing, surpassing other SNN hardware platforms by at least one order of magnitude, as demonstrated by hardware measurements and system-level simulations. Notably, due to its layout-aware structured sparsity, Mosaic achieves competitive accuracy in edge computing tasks like biosignal anomaly detection, keyword spotting, and motor control.
6 It may not be a coincidence that some of the solutions proposed in this thesis can also be found in the biological brain, whispering of a deep connection between highly optimized silicon substrates and the evolved neural tissue.
## ENHANCING BIT PRECISION OF BINARY MEMRISTORS FOR ROBUST ON-CHIP LEARNING
This chapter's content was published in IEEE International Symposium on Circuits and Systems (ISCAS). The original publication is authored by Melika Payvand, Yigit Demirag, Thomas Dalgaty, Elisa Vianello, Giacomo Indiveri.
Analog neuromorphic circuits with memristive synapses offer the potential for power-efficient neural network inference, but the limited bit precision of memristors poses challenges for gradient-based training. In this chapter, we introduce a weight-programming technique that enhances the effective bit precision of intrinsically binary memristive devices, enabling more robust and performant on-chip training. To overcome the variability and limited resolution of the ReRAM memristive devices used to store synaptic weights, we propose to use only their HCS and to control their conductance by modulating the programming compliance current, ICC. We introduce spike-based CMOS circuits for training the network weights, and demonstrate the relationship between the weight, the device conductance, and the ICC used to set the weight, supported by experimental measurements from a 4 kb array of HfO2-based devices. To validate the approach and circuits, we report circuit simulation results for a standard 180 nm CMOS process and system-level simulations on classifying handwritten digits from the MNIST dataset.
## 2.1 introduction
Neural networks deployed on resource-constrained devices can benefit greatly from online training to adapt to shifting data distributions, sensory noise, device degradation, or new tasks not seen during pretraining. While CMOS architectures with integrated memristive devices offer ultra-low-power inference, their use for online learning has been limited [114, 115].
In this chapter, we propose novel learning circuits for SNN architectures implemented with 1T1R arrays. These circuits enable analog weight updates on binary ReRAM devices by controlling the ICC of their SET operation. In addition to increasing the bit precision of the network weights, the proposed strategy allows a compact, fast, and scalable event-based learning scheme compatible with the AER interface [116].
Previously, significant efforts have aimed to increase the bit precision of memristive devices for online learning through material and architectural optimizations.
material optimization Several groups reported TiO2-based [47-49] and HfO2-based [50] ReRAM devices with up to 8 bits of precision. However, in all these works, analog behavior is traded off against a lower available ON/OFF ratio. While analog behavior is an important concern for training neural networks, cycle-to-cycle and device-to-device variability further reduces the effective number of bits when the ON/OFF ratio is small. Moreover, tuning a precise memory state is not always achievable in real time, requiring recursive tuning with an active feedback scheme [50, 117]. Furthermore, some efforts have focused on carefully engineering a barrier level through exhaustive experimental search over a range of materials [47, 48], which makes such devices difficult to fabricate.
architecture optimization Increasing the effective bit resolution has also been demonstrated with architectural advancements. Strategies such as using multiple binary switches to emulate n-bit synapses [51] or exploiting stochastic switching properties for analog-like adaptation [53, 54] have been explored. Alternatively, IBM's approach of using a capacitor alongside two PCM devices as an analog volatile memory increases the combined precision but incurs significant area overhead [55]. Recently, a mixed-precision approach has been employed to train networks using a digital coprocessor for weight-update accumulation [118], but it requires digital buffering of weights and gradients and suffers from domain-conversion costs.
Thus, neither device nor architecture optimizations have fully resolved the challenges of low bit precision in memristors for online learning. Prior work by Ielmini [105] observed that the electrical resistance of memristors after a SET operation follows a power law in ICC (a line in log-log scale), reflecting control over the size of the conductive filament. This critical observation underpins our approach, where we exploit this relationship to directly control the device conductance.
To minimize the effect of variability, we adopt an algorithm-device co-design approach. We restrict devices to their HCS and modulate their conductance by adjusting the programming ICC. Specifically, we derive a technologically feasible online learning algorithm based on the Delta rule [106], mapping weight updates onto the ICC used for setting the device. This co-design offers several advantages: (i) relaxed fabrication constraints compared to multi-bit devices, and (ii) increased state stability, since each device is operated only in its HCS.
## 2.2 reram device modeling
To find the average relationship between the mean of the cycle-to-cycle distribution of the HCS and the SET programming ICC, we performed measurements on a 16 × 256 (4 kb) array of HfO2-based ReRAM devices integrated onto a 130 nm CMOS process between metal layers 4 and 5 [119]. Each device is connected in series to the drain of an n-type selector transistor, which allows the SET programming ICC to be controlled by the voltage applied to its gate. The 1T1R structure allows a single device to be selected for reading or programming by applying appropriate voltages to a pair of Source/Bit Lines (SL/BL) and a single Word Line (WL).
All 4 kb devices were initially formed in a raster-scan fashion by applying a large voltage (typically 4 V) between the SL and BL to induce a soft breakdown in the oxide layer and introduce conductive oxygen vacancies. After forming, each device was subjected to sets of 100 RESET/SET cycles over a range of SET ICC values between 10 µA and 400 µA, with the resistance of each device recorded after each SET operation. The mean of all devices' median resistances over the 100 cycles, at a single ICC, gives the average relationship between the HCS median and the SET ICC shown in Fig. 2.1. The relationship follows a line in the log-log plot (a power law), and over this ICC range it allows precise control of the median of the cycle-to-cycle resistance distribution between 50 kΩ and 2 kΩ.
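As an illustration of how such a power law can be extracted, the following sketch fits a line in log-log space to (ICC, median conductance) pairs. The numerical values here are hypothetical placeholders, not the measured data behind Fig. 2.1.

```python
import numpy as np

# Hypothetical (ICC, median HCS conductance) pairs; placeholders only,
# not the measured values underlying Fig. 2.1.
i_cc = np.array([10e-6, 50e-6, 100e-6, 200e-6, 400e-6])   # A
g_med = np.array([20e-6, 90e-6, 160e-6, 280e-6, 500e-6])  # S

# A power law G = a * ICC^b is a straight line in log-log space, so an
# ordinary least-squares fit on the logarithms recovers (b, log a).
b, log_a = np.polyfit(np.log(i_cc), np.log(g_med), 1)
a = np.exp(log_a)

def g_from_icc(icc):
    """Predict the median HCS conductance for a given compliance current."""
    return a * icc ** b

print(f"G ~ {a:.3e} * ICC^{b:.2f}")
```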
## 2.3 bit-precision enhancing weight update rule
The learning algorithm is based on the Delta rule, the simplest form of gradient descent for single-layer networks. In our implementation, the objective function is defined as the difference between the desired target output signal y and the network prediction signal ŷ, for a given set of input signals x, weighted by the synaptic weight parameters w. The Delta rule then gives the change of the weight connecting a neuron i in the input layer and a neuron j in the output layer as follows:
$$\Delta w_{ji} = \eta (y_j - \hat{y}_j) x_i = \eta \, \delta_j \, x_i \qquad (2.1)$$
Figure 2.1: Mean and standard deviation of the device conductance as a function of ICC. The inset shows samples from the fitted mean and standard deviation used for the simulations.
Figure 2.2: (a) Event-based neuromorphic architecture using online learning in a 1T1R array, and (b) the asynchronous state machine used as the switch controller, applying the appropriate voltages on the BL, SL, and WL of the array for online learning.
where $\delta_j$ is the error and $\eta$ is the learning rate. To implement this with a memristive synaptic architecture, we represent each synaptic weight w by the combined conductance of two memristors, $w_{ji1}$ and $w_{ji2}$, arranged in a push-pull differential configuration. This scheme extends the effective dynamic range of a single synapse to capture negative values.
During network operation, the target and prediction signals are continuously compared to generate the error signal. Upon the arrival of a pre-synaptic event, if the error signal is larger than a small error threshold, the weight-update process is initiated. This small error threshold, which creates the "stop-learning" regime, has been proposed to help the convergence of neural networks with stochastic weight updates [120].
The implementation of the synaptic plasticity consists of three phases (Alg. 1). First, a READ operation is performed on every excitatory and inhibitory memristor to determine its conductance. The resulting current values ($I_{ji1}$ and $I_{ji2}$) are scaled to the level of the error signal. Second, a current proportional to the weight change $\eta \delta_j x_i$ is summed with the scaled READ current to represent the desired conductance to be programmed. Finally, these currents are scaled to a valid ICC range using linear scaling constants $c_1$ and $c_2$. To provide a larger dynamic range per synapse, the conductances of both memristors are updated with a push-pull mechanism according to the sign of the error (i.e., if the conductance of one memristor is increased, the conductance of the complementary memristor is decreased, and vice versa).
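A minimal behavioral sketch of this three-phase update is shown below. The callables and scaling constants are illustrative stand-ins for the circuit operations described above, not the literal pseudocode of Alg. 1.

```python
def delta_rule_update(read, set_pulse, w1, w2, err, eta, c1, c2, err_th):
    """Push-pull update of one differential memristor pair (w1, w2).

    read(w) returns the READ current of a device; set_pulse(w, icc)
    reprograms a device in its HCS with compliance current icc. Both
    are placeholders for the analog operations. err is the signed
    error delta_j, eta the learning rate, err_th the stop-learning
    threshold; c1 and c2 are illustrative linear scaling constants.
    """
    if abs(err) <= err_th:          # stop-learning regime
        return
    # Phase 1: READ both devices and scale the currents to the error level.
    i1, i2 = read(w1) * c1, read(w2) * c1
    # Phase 2: add the desired weight change with opposite signs on the
    # two devices (push-pull differential update).
    s1, s2 = i1 + eta * err, i2 - eta * err
    # Phase 3: map the target currents to a valid ICC range and program.
    set_pulse(w1, s1 * c2)
    set_pulse(w2, s2 * c2)
```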
## 2.4 learning circuits and architecture
Figure 2.3: Learning circuits generating the ICC for updating the devices based on the distance between the neuron and its target frequency. Highlighted in red are the Gm-C filters, low-pass filtering the neuron and target spikes to give rise to VN and VT. In green and orange, the error between the two is calculated, generating positive (IErrP) and negative (IErrN) errors, unless the error is small and the STOP signal is high. In purple, Ve, the excitatory voltage from Fig. 2.2, regenerates the read current, which is scaled to Iscale, producing IeS. Based on the error sign (UP), ICC1 is either the sum or the difference of IeS and IErr.
neuromorphic architecture Figure 2.2a illustrates the event-based neuromorphic architecture encompassing the learning algorithm. It consists of a 1T1R array, a switch controller, Leaky Integrate-and-Fire (I&F) neurons, and a learning block (LB). Every neuron receives excitatory and inhibitory currents from two rows of the 1T1R array, respectively.
With the arrival of every event through the AER interface (not shown), two consecutive READ and WRITE signals are generated [115]. Based on these signals, the asynchronous state machine in Fig. 2.2b drives the SLs, BLs, and WLs of the array with the appropriate voltages so that the device is read and its value is integrated by the I&F neuron; the error value is updated through the learning block (LB), generating ICC1 and ICC2 (Section 2.3); and, based on these values, the excitatory and inhibitory devices are programmed.
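Behaviorally, the per-event sequence enforced by the controller can be summarized as in the sketch below; all four callables are placeholders for analog circuit operations, not a literal controller implementation.

```python
def handle_aer_event(read_device, integrate, compute_icc, program_devices):
    """Sketch of the READ-then-WRITE sequence of Fig. 2.2b per AER event."""
    # READ phase: the selected device is read and its current is
    # integrated by the I&F neuron.
    g = read_device()
    integrate(g)
    # Learning phase: the learning block updates the error and derives
    # the compliance currents ICC1 and ICC2 (Section 2.3).
    icc1, icc2 = compute_icc()
    # WRITE phase: the excitatory and inhibitory devices are programmed
    # with the derived compliance currents.
    program_devices(icc1, icc2)
```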
learning circuits Based on Alg. 1 and the data from Fig. 2.1, we designed circuits that generate the appropriate ICC based on the firing-rate distance between the neuron and its target. Figure 2.3 presents these circuits. The spikes from the neuron and the target are integrated using subthreshold Gm-C filters, highlighted in red, generating VN and VT. These voltages are subtracted from one another using a subthreshold "Bump" (subBump) circuit [121], highlighted in green, and an above-threshold "Bump" circuit (abvBump), in orange.
The subBump circuit compares VN and VT, giving rise to the error currents when the neuron and target frequencies are far apart, and generates the STOP signal when the error is small and within the stop-learning range (δth) [120, 122]. The STOP signal gates the tail current of all the above-threshold circuits and thus substantially reduces the power consumption when learning is stopped. Moreover, input events are used as an additional gating mechanism. The abvBump circuit subtracts VN from VT and scales the result to Iscale, equal to the maximum ICC required according to Fig. 2.1. Based on the error sign (UP), the scaled error current is summed with or subtracted from the scaled device current, generating the desired ICC (Alg. 1). This circuit is highlighted in purple.
circuit simulation results Figure 2.4a depicts the positive and negative error currents, the STOP-learning signal, and the ICC1 and ICC2 currents. The error currents follow a sigmoid, which can be approximated by a line for error values between -1 and 1. As explained in Alg. 1, for positive errors, ICC2 (ICC1) follows the summation (subtraction) of the error current with the scaled device current, while for negative errors it is the opposite. Figure 2.4b illustrates the dependence of the ICC on the current value of the devices, which shifts the error-current curve up or down.
## 2.5 system-level simulations
We performed SNN simulations with BRIAN 2 [123] to evaluate the performance of our proposed update scheme, incorporating the device models (see Fig. 2.1) with stochastic weight changes. Our goal was to achieve a test accuracy comparable to artificial neural networks trained with backpropagation at single-precision floating-point (FP32) precision on digital hardware.
We evaluated our network on the MNIST handwritten digits dataset [124] using the first five classes (30,596 training and 5,139 testing images, each 28 × 28 pixels). We trained a fully connected, single-layer spiking network with 784 input LIF neurons and 5 output LIF neurons. Each input image was presented for 100 ms, with pixel intensity encoded as Poisson spikes with rates in [0, 200] Hz. At the output layer, spikes were counted per neuron during each stimulus, and the neuron with the maximum firing rate was selected as the network prediction. The error signal was calculated as the difference between the low-pass-filtered network output spikes and the low-pass-filtered target spikes, encoded as Poisson spikes at 40 kHz.
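For concreteness, the rate-to-Poisson-spike encoding used here can be sketched in NumPy as follows (a stand-in for the BRIAN 2 setup; only the 100 ms presentation time and the [0, 200] Hz rate range are taken from the text).

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_encode(image, t_stim=0.1, dt=1e-3, max_rate=200.0):
    """Encode pixel intensities in [0, 1] as Poisson spike trains.

    Returns a (timesteps, pixels) binary array; each pixel fires at
    rate intensity * max_rate (Hz) for t_stim seconds.
    """
    rates = image.flatten() * max_rate          # Hz, in [0, 200]
    n_steps = int(t_stim / dt)
    p_spike = rates * dt                        # per-step spike probability
    return (rng.random((n_steps, rates.size)) < p_spike).astype(np.uint8)

# Example: a random 28x28 "image" presented for 100 ms
spikes = poisson_encode(rng.random((28, 28)))
print(spikes.shape, int(spikes.sum()), "spikes")
```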
We modeled the cycle-to-cycle variability of the ICC-dependent GLRS conductance using a Gaussian distribution with ICC-dependent mean and standard deviation, as described in Section 2.2. This variability model was applied to all synaptic devices in the simulation, and we achieved a test accuracy of 92.68% after training for three epochs.
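In code, such a variability model amounts to sampling each programmed conductance from an ICC-dependent Gaussian; the fit functions below are hypothetical placeholders for the fits to Fig. 2.1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical power-law mean and proportional standard deviation
mu = lambda icc: 1.6 * icc ** 0.94          # placeholder fit, S
sigma = lambda icc: 0.08 * mu(icc)          # placeholder fit, S

def sample_g_lrs(i_cc):
    """Sample one programmed HCS conductance for a given ICC, following
    the ICC-dependent Gaussian cycle-to-cycle model of Section 2.2."""
    return rng.normal(mu(i_cc), sigma(i_cc))
```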
Figure 2.4: (a) Error current, STOP-learning signal, and ICC as a function of the normalized error between the target and the neuron frequencies. (b) Change of ICC1 (red) and ICC2 (blue) as a function of the error and the resistance value of the devices.
## 2.6 discussion
Significant effort is underway to develop learning algorithms for SNNs due to their potential for highly parallel, low-power processing. However, a substantial gap exists between these algorithms and their hardware implementation due to noise, variability, and limited bit precision. This gap underscores the importance of technologically plausible learning algorithms rooted in device physics and measurements [125]. Our work, exploiting the ReRAM compliance current ICC for weight updates, represents a step in this direction.
power consumption and scalability While the learning block can generate up to hundreds of µA of ICC for large errors, design considerations such as event gating and STOP-learning signal gating mitigate the average power consumption. The peak current per learning block ranges from 1 µA to 600 µA, depending on the network error. Leveraging the Poisson distribution of events (due to thermal noise), we can assume that only one column of devices is programmed at a time. Therefore, the peak power scales sublinearly with the number of neurons (linearly in the worst case). This sublinear scaling implies that power consumption does not fundamentally limit scalability. However, with Poisson-distributed input events and a maximum frequency per input channel, an upper bound on the array size exists, determined by the event pulse width and the tolerance to missing events [54].
the nonlinear effect The power-law relationship between ICC and GLRS (Fig. 2.1) introduces a nonlinear mapping of weight updates. This nonlinearity slightly biases the weight updates away from the optimal values calculated by the Delta rule. Further investigation into mitigating this bias through calibration or algorithmic compensation could improve learning accuracy.
In this chapter, I presented a technologically plausible learning algorithm that leverages the compliance current of binary ReRAMs to generate variable, multi-level conductance changes. Our comprehensive co-design approach spans multiple levels of abstraction, from device measurements to algorithm, architecture, and circuits. We believe this work represents a significant step toward realizing always-on, on-chip learning systems. As we will see in Chapter 4, our method can be extended to other non-volatile materials, providing a broader pathway for on-chip learning hardware.
## ONLINE TEMPORAL CREDIT ASSIGNMENT WITH NON-VOLATILE AND VOLATILE MEMRISTORS
This chapter builds upon three conceptually linked works. My initial concept of investigating the online credit assignment problem for recurrent neural networks implemented with non-ideal, non-volatile memristive devices [126], and the subsequent utilization of eligibility traces with volatile memristors for scalability [127], laid the foundation for this research. These concepts were further validated through real-hardware testing in collaboration with IBM Research Zürich [128].
## 3.1 framework for online training of rsnns with non-volatile memristors
Training RSNNs on ultra-low-power hardware remains a significant challenge. This is primarily due to the lack of spatio-temporally local learning mechanisms capable of addressing the credit assignment problem effectively, especially with limited weight resolution and online training with a batch size of one. These challenges are accentuated when using memristive devices for in-memory computing to mitigate the von Neumann bottleneck, at the expense of increased stochasticity in recurrent computations.
To investigate online learning in memristive neuromorphic Recurrent Neural Network (RNN) architectures, we present a simulation framework and experiments on differential-architecture crossbar arrays based on an accurate and comprehensive PCM device model. We train a spiking RNN on regression tasks, with weights emulated within this framework, using the recently proposed e-prop learning rule. While e-prop truncates the exact gradients to follow locality constraints, its direct implementation on memristive substrates is hindered by significant PCM non-idealities. We compare several widely adopted weight update schemes designed to cope with these non-idealities and demonstrate that only gradient accumulation can enable efficient online training of RSNNs on memristive substrates.
## 3.1.1 Introduction
RNNs are a remarkably expressive [129] class of neural networks, successfully applied in domains such as audio/video processing, language modeling, and Reinforcement Learning (RL) [130-135]. Their power lies in their architecture, which enables the processing of long and complex sequential data. Each neuron contributes to network processing at various times in the computation, promoting hardware efficiency through the principle of reuse; recurrence is also the dominant architectural motif observed in the mammalian neocortex [136, 137]. However, training RNNs under constrained memory and computational resources remains a challenge [83].
Current hardware implementations of neural networks still lag behind the energy efficiency of biological systems, largely due to data movement between separate processing and memory units in von Neumann architectures. Compact nanoscale memristive devices have gained attention for implementing artificial synapses [99, 138-142]. These devices enable in-memory computation of synaptic propagation between neurons, breaking the von Neumann bottleneck [92, 93].
Memristive devices are particularly promising for SNNs, especially for low-power, sparse, and event-based neuromorphic systems that emulate biological principles [143, 144]. In these systems, synapses (memory) and neurons (processing units) are arranged in a crossbar architecture (Fig. 3.1a), with memristive devices storing synaptic efficacy in their programmable multi-bit conductance values. This architecture inherently supports the sparse, event-driven nature of SNNs, enabling in-memory computation of synaptic propagation through Ohm's and Kirchhoff's laws. As demonstrated for 32 nm technology [145, 146], memristive crossbar arrays offer higher density and lower dynamic energy consumption during inference compared to traditional Static Random Access Memory (SRAM). Additionally, their non-volatile nature reduces the static power consumption associated with volatile CMOS memory. Thus, in-memory acceleration of spiking RNNs with non-volatile, multi-bit memristive devices is a promising path toward scalable neuromorphic hardware for temporal signal processing.
PCM devices are among the most mature emerging resistive memory technologies. Their small footprint, fast read/write operation, and multi-bit storage capacity make them ideal for in-memory computation of synaptic propagation [147, 148]. Consequently, PCM technology has seen increased interest in neuromorphic computing [138, 149-151].
While a single PCM device can achieve 3-4 bits of resolution [152], PCM devices exhibit significant non-idealities due to their stochastic, Joule-heating-based switching physics. Molecular dynamics introduce 1/f noise and structural relaxation, leading to cycle-to-cycle variation on top of the device-to-device variability from fabrication.
Hardware-algorithm co-design with chip-in-the-loop setups is one approach to addressing these non-idealities [138]. However, neural network training necessitates iterative evaluation of architectures, learning-rule modifications, and hyperparameter tuning on large datasets, which is time- and resource-intensive with such setups. In contrast, a software simulation framework with a highly accurate statistical model of the memristive devices offers faster iteration and a better understanding of device effects, thanks to the increased observability of internal state variables.
In this work, we investigate whether an RSNN can be trained with a local learning rule despite the adverse impacts of memristive in-memory computing, including write/read noise, conductance drift, and limited bit precision. We build upon the statistical PCM model of Nandakumar et al. [81] to faithfully model a differential memristor crossbar array (Section 3.1.3), define a target spiking RNN architecture, and describe the properties of an ideal learning rule, selecting the e-prop algorithm [75] for training (Section 3.1.3.1). We implement multiple memristor-aware weight-update methods to map ideal e-prop updates onto memristor conductances in the crossbar array, addressing device non-idealities (Section 3.1.3.2). Finally, we present a training scheme exploiting in-memory computing with extreme sparsity and reduced conductance updates for energy-efficient training (Section 3.1.3.4).
## 3.1.2 Building blocks for training on in-memory processing cores
In this section, we describe the main components of our simulation framework for training spiking RNNs with PCM synapses 1 .
## 3.1.3 PCM device modeling and integration into neural networks
The nanoscale PCM device typically consists of a Ge2Sb2Te5 (GST) switching material sandwiched between two metal electrodes, forming a mushroom-like structure (Fig. 3.1b). Short electrical pulses applied to the device terminals induce Joule heating, locally modifying the temperature distribution within the PCM. This controlled temperature change can switch the molecular configuration of GST between amorphous (high-resistance) and crystalline (low-resistance) states [17].
A short, high-amplitude RESET pulse (typically 3.5 V amplitude and 20 ns duration) increases the amorphous volume by melting a significant portion of the GST, which then rapidly quenches into an amorphous configuration. Conversely, a longer, lower-amplitude SET pulse increases the crystalline volume by raising the temperature to initiate crystal-nuclei growth. To read the device conductance, a smaller-amplitude READ pulse (0.2 V amplitude, 50 ns duration) is applied to avoid inducing phase transitions.
In practice, PCM programming operations suffer from write/read noise and conductance drift [153]. The asymmetry of SET and RESET operations, along with the nonlinear conductance response to pulse number and frequency, complicates precise programming. Accurately capturing these non-idealities and device dynamics in network models is crucial for a realistic evaluation of metrics such as weight-update robustness, hyperparameter choices, and training duration.
While comprehensive models exist for describing PCM electrical [45], thermal [154], structural [155, 156], and phase-change properties [157, 158], these models often involve on-the-fly differential equations with uncertain numerical convergence, lack incorporation of inter- and intra-device stochasticity, or are designed for pulse shapes and current-voltage sweeps that do not reflect circuit operating conditions [17].
1 The code is available at https://github.com/YigitDemirag/srnn-pcm
Figure 3.1: a. PCM devices can be arranged in a crossbar architecture to emulate both a non-volatile synaptic memory and parallel, asynchronous synaptic propagation using in-memory computation. b. Mushroom-type geometry of a single PCM device. The conductance of the device can be reconfigured by changing the volume ratio of the amorphous and crystalline regions.
Therefore, we adopted the statistical PCM model by Nandakumar et al. [81], which captures the major PCM non-idealities based on measurements from 10,000 devices. This model includes the nonlinear conductance change with respect to applied pulses, conductance-dependent write and read stochasticity, and temporal drift (Fig. 3.2). A programming-history variable represents the device's nonlinear response to consecutive SET pulses and is updated after each pulse. After a new SET pulse, the model samples the conductance change ($\Delta G$) from a Gaussian distribution whose mean and standard deviation depend on the programming history and the previous conductance. Drift is included using the empirical exponential drift model [156], $G(t) = G(T_0)(t/T_0)^{-\nu}$, where $G(T_0)$ is the conductance after a WRITE pulse at time $T_0$ and $G(t)$ is the final conductance after drift. The model also accounts for 1/f READ noise [159], which increases monotonically with conductance.
To integrate this model into neural network simulations, we developed a PyTorch-based PCM crossbar array simulation framework [107]. This framework tracks all simulated PCM devices in the crossbar simultaneously, enabling realistic SET, RESET, and READ operations (implementation details in Section A). Section 3.1.3.3 describes how this framework is used to represent synaptic weights in an RNN.
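A heavily simplified, single-device version of such a statistical model is sketched below to make the three modeled non-idealities concrete; all constants are illustrative and do not reproduce the fitted parameters of [81].

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyPCM:
    """Toy single-device PCM model with the three modeled non-idealities:
    a saturating, noisy SET response driven by a programming-history
    variable; conductance-dependent READ noise; and exponential drift.
    Constants are illustrative, not the fitted values of [81]."""

    def __init__(self, g0=0.1, nu=0.05):
        self.g = g0             # conductance after programming (uS)
        self.history = 0        # programming history (SET pulse count)
        self.t_prog = 1.0       # time of the last WRITE (s)
        self.nu = nu            # drift exponent

    def set_pulse(self, t):
        # Mean update shrinks with programming history (saturation);
        # write noise grows with the current conductance.
        mean_dg = 1.0 / (1.0 + 0.2 * self.history)
        dg = rng.normal(mean_dg, 0.1 * (1.0 + self.g / 10.0))
        self.g = float(np.clip(self.g + dg, 0.0, 12.0))
        self.history += 1
        self.t_prog = t

    def read(self, t):
        # Drift since the last WRITE, plus conductance-dependent noise.
        g_drift = self.g * (t / self.t_prog) ** (-self.nu)
        return g_drift + rng.normal(0.0, 0.02 * (1.0 + g_drift))
```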
Figure 3.2: The chosen PCM model from [81] captures the major device non-idealities. a. The WRITE model enables calculation of the conductance increase with each consecutive SET pulse applied to the device. The band illustrates one standard deviation. b. The READ model enables calculation of 1/f noise, which increases as a function of conductance. c. The DRIFT model calculates the temporal conductance evolution as a function of time. T0 indicates the time of measurement after the initial programming of the device.
## 3.1.3.1 Credit Assignment Solutions for Recurrent Network Architectures
The credit assignment problem refers to the problem of determining the appropriate change for each synaptic weight to achieve the desired network behavior [7]. As the architecture determines the information flow inside the network, the credit assignment solution is intertwined with the network architecture. Consequently, many proposed solutions in the SNN landscape are specific to architectural components, e.g., eligibility traces [160], dendritic [161] or neuromodulatory [162] signals.
In our work, we select an RSNN with LIF neuron dynamics described by the following discrete-time equations [75]:
$$v_j^{t+1} = \alpha v_j^t + \sum_{i \neq j} W_{ji}^{rec} z_i^t + \sum_i W_{ji}^{in} x_i^t - z_j^t v_{th} \\ z_j^t = H\left(\frac{v_j^t - v_{th}}{v_{th}}\right)$$
where $v_j^t$ is the membrane voltage of neuron j at time t. The output state of a neuron is a binary variable $z_j^t$ that indicates either a spike (1) or no spike (0). The neuron spikes when the membrane voltage exceeds the threshold voltage $v_{th}$, a condition implemented by the Heaviside function H. The parameter $\alpha \in [0,1]$ is the membrane decay factor, calculated as $\alpha = e^{-\delta t / \tau_m}$, where $\delta t$ is the discrete time-step resolution of the simulation and $\tau_m$ is the neuronal membrane time constant, typically tens of milliseconds. The network activity is driven by input spikes $x_i^t$. Input and recurrent weights are represented as $W_{ji}^{in}$ and $W_{ji}^{rec}$, respectively. At the output layer, the recurrent spikes are fed through readout weights $W_{kj}^{out}$ into a single layer of leaky integrator units $y_k$ with decay factor $\kappa \in [0,1]$. This continuous-valued output unit is analogous to a motor function generating coherent motor output patterns [163] of the type shown in Fig. 3.3.
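Equation 3.1 translates directly into a discrete-time update; a minimal NumPy sketch of one simulation step:

```python
import numpy as np

def rsnn_step(v, z, x, W_rec, W_in, alpha, v_th):
    """One discrete-time step of the recurrent LIF dynamics in Eq. 3.1.
    v, z: membrane voltages and previous-step spikes of the recurrent
    layer; x: input spikes. The -z * v_th term implements the
    spike-triggered soft reset; self-connections are excluded by
    zeroing the diagonal of W_rec."""
    v = alpha * v + W_rec @ z + W_in @ x - z * v_th
    z = (v > v_th).astype(v.dtype)      # Heaviside H((v - v_th) / v_th)
    return v, z
```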
The training goal is to find optimal network weights $\{W_{ji}^{in}, W_{ji}^{rec}, W_{kj}^{out}\}$ that maximize task performance [7]. For ideal neuromorphic hardware, the learning algorithm must (i) use spatio-temporally local signals, (ii) operate online, and (iii) be tested beyond toy problems. For example, the FORCE algorithm [163, 164] performs well on motor tasks but violates the first requirement by requiring knowledge of all synaptic weights. BPTT with surrogate gradients [165, 166] has its own drawbacks due to the need to buffer intermediate neuron states and activations, violating the second requirement.
E-prop offers a local and online learning rule for single-layer RNNs [75] by factorizing the gradients into a sum of products between instantaneously available learning signals and local eligibility traces. Specifically, the gradient $dE/dW_{ji}$ is represented as a sum of products over time t:
$$\frac{dE}{dW_{ji}} = \sum_t \frac{dE}{dz_j^t} \cdot \left[\frac{dz_j^t}{dW_{ji}}\right]_{local} \, ,$$
where E is the loss term, such as the mean squared error between the network output $y_k^t$ and the target $y_k^{*,t}$ for regression tasks.
The term $[dz_j^t/dW_{ji}]_{local}$ is not an approximation. It is computed locally, carries the factorization of the gradient (a local measure of the synaptic weight's contribution to the neuronal activity) forward in time, and is described as the eligibility trace for the synapse from neuron i to neuron j at time t. Ideally, the term $dE/dz_j^t$ would be the total derivative of the loss function with respect to the neuron's spike output. However, this is unavailable online, as it requires information about the spike's future impact on the error. Therefore, e-prop approximates the learning signal using the partial derivative $\partial E / \partial z_j^t$, considering only the direct influence of the spike output on the instantaneous error. This approximation enables e-prop to function as an online learning algorithm for RSNNs, at the cost of truncating the exact gradients.
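For LIF neurons, the eligibility trace reduces to a low-pass-filtered presynaptic trace gated by a surrogate derivative of the spike function. The sketch below follows this structure of e-prop [75] for the recurrent weights; the particular surrogate derivative and the random feedback matrix B are illustrative choices, not the exact formulation.

```python
import numpy as np

def eprop_grad_step(z_bar, z_pre, v, v_th, y, y_star, B, alpha, gamma=0.3):
    """One time step of a simplified e-prop gradient for recurrent
    LIF weights. z_bar: running presynaptic trace; z_pre: presynaptic
    spikes; v: postsynaptic membrane voltages; B: feedback matrix
    broadcasting the output error to the recurrent neurons."""
    z_bar = alpha * z_bar + z_pre                       # eligibility vector
    psi = gamma * np.maximum(0.0, 1.0 - np.abs(v - v_th) / v_th)  # surrogate dz/dv
    L = B @ (y - y_star)                                # learning signal per neuron
    grad = np.outer(L * psi, z_bar)                     # one term of the sum over t
    return grad, z_bar

# Summing `grad` over time approximates dE/dW; the result is then mapped
# onto device conductances with the update schemes of Section 3.1.3.2.
```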
E-prop's performance is notable, achieving results comparable to Long Short-Term Memory (LSTM) networks [167] trained with BPTT on complex temporal tasks. While the OSTL algorithm [84] also supports complex recurrent architectures and generates exact gradients, its computational complexity makes it less suitable for hardware implementation than e-prop.
In this work, we focus on e-prop due to its
- Sufficient gradient alignment: while not exact, e-prop provides a sufficient approximation of the true gradient.
- Relative simplicity: computational locality and simplicity are well-suited for low-power neuromorphic hardware applications.
- Neuroscientific relevance: the employment of eligibility traces aligns with neuroscientific observations of synaptic plasticity.
## 3.1.3.2 Memristor-Aware Weight Update Optimization
In mixed-signal neuromorphic processors, Learning Blocks (LBs) are typically co-located with the neuron circuits [143, 168, 169]. LBs continuously monitor signals available to the neuron (e.g., presynaptic activity, inference, feedback) and, based on the desired learning rule, instruct weight updates on the synapses. However, when the network weights are implemented with memristive devices, weight updates are subject to analog non-idealities. Therefore, LB design must account for these device non-idealities, such as programming noise and asymmetric SET/RESET updates, to ensure accurate transfer of the calculated gradients to the device conductances. To save both energy and area, weight updates are typically implemented in a single-shot fashion, using one or multiple gradual SET pulses to update the device without requiring a read-verify cycle.
In the following, we describe four widely adopted weight-update methods for LBs, implemented in our PCM crossbar array simulation framework. Each method is designed to cope with device non-idealities. In all experiments, our framework employs a differential synaptic configuration [81, 109, 110], where each synapse has two sets of memristors (G+ and G-) whose difference represents the effective synaptic conductance (Fig. 3.1a) 2 .
the sign gradient descent (signgd) In SignGD, the synaptic weights W are updated based solely on the sign of the loss-function gradient with respect to the weights, $\hat{\nabla}_W \mathcal{L}$, as approximated by the online learning rule. Updates occur only when the magnitude of the gradient for a weight exceeds a predefined threshold $\theta$, such that
$$\Delta W = -\delta \, \mathrm{sign}(\hat{\nabla}_W \mathcal{L}) \odot \mathbb{I}(|\hat{\nabla}_W \mathcal{L}| > \theta) \qquad (3.3)$$
where $\delta$ is a positive stochastic variable representing the conductance change due to a single SET pulse applied to the memristor, $\mathrm{sign}(\cdot)$ is the element-wise sign function, $\odot$ is the Hadamard product, and $\mathbb{I}(\cdot)$ is the indicator function implementing the stop-learning regime (1 if true, 0 otherwise).
This approach ensures convergence under certain conditions [171]. Due to its simplicity, SignGD is popular in memristive neuromorphic systems [172-174]. Upon a weight update, the LB applies a single SET pulse to either the $G^+$ or the $G^-$ PCM device, as determined by the gradient's sign. The effective value of $\delta$ is not constant due to WRITE noise, which can bias the weight update distribution and thereby impact learning dynamics and convergence.
2 In the differential configuration, unidirectional updates can saturate one or both devices [81, 109, 110]. While a push-pull mechanism [170] can address this issue in memristors with symmetric SET/RESET characteristics, it is not feasible in PCMs due to the abrupt nature of RESET operations [46]. This necessitates a frequent saturation check and a refresh mechanism to reset and reprogram both memristors.
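A minimal sketch of the SignGD update of Eq. 3.3 on a differential PCM pair is given below; the threshold and pulse statistics (`delta_mu`, `delta_sigma`) are illustrative placeholders, and WRITE noise is modeled as a Gaussian pulse amplitude for simplicity.

```python
import torch

def signgd_update(g_pos, g_neg, grad, theta=1e-3,
                  delta_mu=0.75e-6, delta_sigma=0.2e-6):
    """Apply one single-shot SET pulse per above-threshold gradient (Eq. 3.3)."""
    update = grad.abs() > theta                      # stop-learning mask I(.)
    # delta: stochastic conductance change of one gradual SET pulse
    delta = torch.normal(delta_mu, delta_sigma, size=grad.shape).clamp(min=0.0)
    # Negative gradient -> increase weight -> SET pulse on G+; otherwise on G-
    g_pos += delta * (update & (grad < 0))
    g_neg += delta * (update & (grad > 0))
    return g_pos, g_neg
```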
stochastic update (su) Conventional optimization methods often require weight updates that are 3-4 orders of magnitude smaller than the original weight values [175], posing a challenge for PCM devices with limited precision [176]. To bridge this precision gap, SU stochastically executes updates based on the approximated gradient's magnitude [81]:
$$P(\text{update}) = \min\left(1, \frac{|\hat{\nabla}_W \mathcal{L}|}{p}\right) \qquad (3.4)$$
where $p$ is a scaling factor controlling the update probability. Choosing $p$ such that the update probability $P(\text{update})$ is proportional to $\|\hat{\nabla}_W \mathcal{L}\|$ ensures that larger gradients are more likely to trigger updates, effectively adapting the learning rate to the limited precision of PCM devices.
In our implementation, we scale the gradient by 1/p before comparing it to a random uniform value to determine whether an update occurs. This approach, inspired by Nandakumar et al. [81], allows for fine-grained control over the effective learning rate. Unlike the original work, we perform the refresh operation before the update to prevent updates on saturated devices. This modification further enhances the stability and reliability of the learning process.
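A corresponding sketch of the stochastic update rule of Eq. 3.4 follows; the scaling factor and pulse granularity shown are again illustrative placeholders rather than tuned values.

```python
import torch

def stochastic_update(g_pos, g_neg, grad, p=500.0, delta=0.75e-6):
    """Execute a SET pulse with probability min(1, |grad| / p), per Eq. 3.4."""
    prob = (grad.abs() / p).clamp(max=1.0)
    fire = torch.rand_like(grad) < prob              # per-synapse Bernoulli draw
    g_pos += delta * (fire & (grad < 0))
    g_neg += delta * (fire & (grad > 0))
    return g_pos, g_neg
```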
multi-memristor update (mmu) MMU enhances synaptic weight resolution and mitigates write noise by utilizing 2N PCM devices per synapse, arranged in N differential pairs [109]. Updates are applied sequentially to these devices, effectively reducing the minimum achievable weight change by a factor of 2N and the variance due to write noise by a factor of $\sqrt{2N}$ (see Supplementary Note 3).
In our implementation, we estimate the number of SET pulses required to achieve the desired conductance change, assuming a linear conductance increase of 0.75 µS per pulse (see Section 3.1.3).³ These pulses are then applied sequentially to the PCM devices in a circular queue. A refresh operation is performed if the conductance of any device pair exceeds 9 µS and their difference is less than 4.5 µS. This refresh mechanism helps maintain the dynamic range of the synaptic weights and ensures reliable long-term operation.
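The circular-queue scheduling at the heart of MMU can be sketched as follows; storing the N differential pairs along a leading tensor dimension and advancing a single queue pointer per update are our simplifying assumptions.

```python
import torch

def mmu_update(g_pos, g_neg, queue_idx, grad, theta=1e-3, delta=0.75e-6):
    """Program only the differential pair selected by the circular queue.

    g_pos, g_neg: (N, *w_shape) conductances of the N differential pairs.
    """
    update = grad.abs() > theta
    g_pos[queue_idx] += delta * (update & (grad < 0))
    g_neg[queue_idx] += delta * (update & (grad > 0))
    # The effective weight sums over all N pairs, so each pulse still moves
    # the weight by ~delta while write noise averages down across 2N devices.
    return g_pos, g_neg, (queue_idx + 1) % g_pos.shape[0]
```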
mixed-precision update (mpu) MPU addresses the discrepancy between the high precision of learning algorithms and the limited resolution of PCM devices by accumulating gradients on a high-precision co-processor until they reach a threshold that can be reliably represented in PCM. This approach is analogous to quantization-aware training techniques [110, 111].
In our implementation, approximated gradients calculated by e-prop are accumulated in FP32 memory until they are an integer multiple of the PCM update granularity (0.75 µS). These accumulated values are then converted to the corresponding number of pulses and applied to the PCM devices. A refresh operation is triggered when the conductance of either device in a pair exceeds 9 µS and their difference is less than 4.5 µS, maintaining synaptic efficacy and preventing saturation.
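The essence of MPU, accumulating in FP32 and flushing only the part representable with the PCM granularity, can be sketched as below; the learning rate and the mapping of signed changes onto G+ or G- SET pulses are illustrative assumptions, and a hardware implementation would interleave the refresh check described above.

```python
import torch

def mpu_update(g_pos, g_neg, grad_accum, grad, lr=1e-3, delta=0.75e-6):
    """Accumulate gradients in FP32; flush integer multiples of delta to PCM."""
    grad_accum -= lr * grad                          # high-precision accumulation
    n_pulses = (grad_accum.abs() / delta).floor()    # representable part
    flush = torch.sign(grad_accum) * n_pulses * delta
    g_pos += flush.clamp(min=0.0)                    # positive changes -> G+
    g_neg += (-flush).clamp(min=0.0)                 # negative changes -> G-
    grad_accum -= flush                              # keep the sub-delta residue
    return g_pos, g_neg, grad_accum
```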
## 3.1.3.3 Training a Spiking RNN on a PCM Crossbar Simulation Framework
We used the PCM crossbar array model to determine realistic values for the network parameters $\{W^{\text{in}}_{ji}, W^{\text{rec}}_{ji}, W^{\text{out}}_{kj}\}$. To represent synaptic weights $W \in [-1, 1]$ with PCM device conductance values $G \in [0.1, 12]$ µS [81], we used the linear relationship $W = \beta \left[\sum_N G^+ - \sum_N G^-\right]$, where $\sum_N G^+$ and $\sum_N G^-$ are the total conductances of the N memristors⁴ representing the potentiation and the depression of the synapse, respectively [81].
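As a concrete reading of this mapping, the following sketch converts differential conductances to weights; the default scaling β is chosen here only so that the full conductance swing maps to |W| ≈ 1, which is an assumption on our part.

```python
import torch

def conductance_to_weight(g_pos, g_neg, beta=1.0 / 12e-6):
    """W = beta * (sum_N G+ - sum_N G-), with G in [0.1, 12] uS, W in [-1, 1].

    g_pos, g_neg: (N, n_post, n_pre) conductances of the N devices per synapse.
    """
    g_diff = g_pos.sum(dim=0) - g_neg.sum(dim=0)     # sum over the N devices
    return (beta * g_diff).clamp(-1.0, 1.0)
```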
The forward computation (inference) of Eq. 3.1 is simulated using the PCM crossbar simulation framework, incorporating the effects of READ noise and temporal conductance drift. Subsequently, the weight updates calculated by the e-prop algorithm are applied to the PCM-based crossbar arrays using each of the methods described in Section 3.1.3.2.
3 For a more precise pulse estimation method, refer to Nandakumar et al. [81].
4 N = 1 for all weight update methods, except multi-memristor updates.
Figure 3.3: Overview of the spiking RNN training framework with the proposed PCM crossbar array simulation framework, illustrated for a pattern generation task. Network weights are allocated from three crossbar array models, $G_{\text{inp}}$, $G_{\text{rec}}$, $G_{\text{out}}$. The network-generated pattern is compared to the target pattern to produce a learning signal, which is fed back to each neuron. The LB calculates instantaneous weight changes $\Delta W$ using the e-prop learning rule and must efficiently transfer the desired weight change to a conductance change, i.e., $\Delta W \rightarrow \Delta G$, while accounting for PCM non-idealities.
Table 3.1: Performance evaluation of spiking RNNs with models of PCM crossbar arrays.

| Method | Sign-gradient | Stochastic | Multi-mem (N=4) | Multi-mem (N=8) | Mixed-precision |
|----------|-----------------|--------------|-------------------|-------------------|-------------------|
| MSE Loss | 0.2080 | 0.1808 | 0.1875 | 0.1645 | 0.0380 |
## 3.1.3.4 Results
We validated online training on a 1D continual pattern generation task [75], relevant for motor control and value function estimation in RL [163], using our analog crossbar framework (see Section 3.1.3.5).
Table 3.1 summarizes the training performance of the RSNN using different weight update methods on PCM crossbar arrays. We defined an MSE loss of < 0.1 as the performance threshold for this task (see Section A for selection criteria). Among the five configurations, only the mixed-precision approach achieved this threshold, demonstrating sparse spiking activity and successful pattern generation.
During training, the weight saturation problem due to the differential configuration is rare (< 1%), as shown in Fig. 3.5 (right). We hypothesize that this is due to the mixed-precision algorithm reducing the total number of WRITE pulses through update accumulation (∼12 WRITE pulses are applied per epoch, Fig. 3.6). Fig. 3.5 (left) illustrates the effective weight distribution of the PCM synapses at the end of training.
To assess performance loss due to PCM non-idealities (WRITE/READ noise, drift), we simulated an ideal 4-bit device model, effectively acting as a digital 4-bit memory (Section A).
Table 3.2 summarizes the performance of different update methods with this ideal model. Stochastic, multi-memristor (N = 8), and mixed-precision updates successfully solved the task, with mixed-precision achieving the best accuracy. All methods performed better without PCM non-idealities. Interestingly, stochastic updates outperformed both multi-memristor methods, suggesting the importance of sparse stochastic updates when training with quantized weights.
To further evaluate the impact of limited bit precision, we trained the same network with e-prop using standard FP32 weights. This high-resolution training yielded results comparable to mixed-precision training with either ideal quantized memory or the PCM cell model.
Figure 3.4: Dynamics of a network trained with the mixed-precision algorithm. The raster plot (top) shows the sparse spiking activity (∼3.3 Hz) of recurrent-layer neurons. The training loss (bottom left) shows the MSE loss over 250 epochs, averaged over the ten best network hyperparameter sets (see Fig. A.8 for the best-performing hyperparameters). Properly tuned neuronal time constants and trained network weights result in generated patterns following the targets (bottom right). The generated patterns are extracted from three different spiking RNNs.
Figure 3.5: (left) The effective conductance distributions ($G^+ - G^-$) of the synapses in the input, recurrent and output layers at the end of training with the mixed-precision method. (right) Averaged over 50 training runs, the mean number of PCM devices requiring a refresh is shown for each layer. The refresh operation was not needed for the recurrent and output layers.
Figure 3.6: The total number of WRITE pulses applied to PCM devices is shown for the input, recurrent and output layers. Only 0.07%, 0.07% and 0.1% of the PCM devices within each layer, respectively, are programmed during mixed-precision training.
Table 3.2: Performance evaluation of spiking RNNs with an ideal crossbar array model⁵

| Method | Sign-gradient | Stochastic | Multi-mem (N=4) | Multi-mem (N=8) | Mixed-precision |
|----------|-----------------|--------------|-------------------|-------------------|-------------------|
| MSE Loss | 0.1021 | 0.0758 | 0.1248 | 0.0850 | 0.0289 |
Similar to Nandakumar et al. [81], we found that the probability scaling factor p in the stochastic update method allows tuning the number of devices programmed during training. Figure 3.7 shows that increasing p (decreasing the update probability) can reduce WRITE pulses by up to an order of magnitude without degrading the loss. This result highlights the potential for optimizing energy efficiency in memristive online learning systems by strategically adjusting the update probability.
Figure 3.7: The stochastic update method enables tuning the number of WRITE pulses applied to PCM devices.
## 3.1.3.5 Methods
For the chosen pattern generation task, the network consists of 100 input and 100 recurrent LIF neurons, along with one leaky-integrator output unit. The network receives a fixed-rate Poisson input, and the target pattern is a one-second-long sequence defined as the sum of four sinusoids (1 Hz, 2 Hz, 3 Hz and 5 Hz), with phases and amplitudes randomly sampled from uniform distributions over [0, 2π] and [0.5, 2], respectively. Throughout training, all layer weights $\{W^{\text{in}}_{ji}, W^{\text{rec}}_{ji}, W^{\text{out}}_{kj}\}$ are kept plastic, and the device conductances are clipped between 0.1 and 12 µS. This benchmark is adapted from [82].
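For reference, a target pattern of this form can be generated with a few lines of PyTorch; the time resolution and seed handling below are illustrative choices, not the thesis's exact setup.

```python
import torch

def make_target_pattern(t_steps=1000, dt=1e-3, seed=0):
    """One-second target: sum of 1, 2, 3 and 5 Hz sinusoids with random
    phases in [0, 2*pi] and amplitudes in [0.5, 2]."""
    gen = torch.Generator().manual_seed(seed)
    t = torch.arange(t_steps) * dt
    freqs = torch.tensor([1.0, 2.0, 3.0, 5.0])
    phases = 2 * torch.pi * torch.rand(4, generator=gen)
    amps = 0.5 + 1.5 * torch.rand(4, generator=gen)
    return sum(a * torch.sin(2 * torch.pi * f * t + p)
               for a, f, p in zip(amps, freqs, phases))
```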
We trained approximately 1000 different spiking RNNs for each weight update method described in Section 3.1.3.2. Each network shared the same architecture, except for the synapse implementation, some hyperparameters and the weight initialization. As each update method requires specific additional hyperparameters that can significantly affect network dynamics, we tuned these hyperparameters for each method using Bayesian optimization [177]. We selected the best-performing network hyperparameters out of 1000 candidates based on performance over 250 epochs of the pattern generation task.
## 3.1.4 Discussion
On-chip learning capability for RSNN chips enables ultra-low-power intelligent edge devices with adaptation capabilities [178, 179]. This work focused on evaluating the efficacy of non-von Neumann analog computing with non-volatile emerging memory technologies in implementing
5 Multi-memristor configurations are implemented assuming a 4-bit resolution per memory cell. Hence, N = 4 and N = 8 correspond to 7-bit and 8-bit digital weight resolution per synapse, respectively.
updates calculated by the spatio-temporally local e-prop learning rule. This task is particularly challenging due to the need to preserve task-relevant information in the network's activation dynamics for extended periods, despite analog weight non-idealities and the truncated gradients inherent to e-prop.
We developed a PyTorch-based PCM crossbar array simulation framework to evaluate four simple memristor update mechanisms. Through extensive hyperparameter optimization, we demonstrated that the mixed-precision update scheme yielded the best accuracy. This superior performance stems from the accumulation of instantaneous gradients in high-precision memory, enabling the use of a low learning rate. Consequently, the ideal weight update magnitude aligns closely with the minimum programmable PCM conductance change, leading to improved convergence.
However, the mixed-precision scheme necessitates high-precision memory for gradient accumulation. This could potentially be addressed by incorporating a co-processor alongside the memristor crossbar array, as demonstrated previously [110]. Despite this requirement, gradient accumulation enables training with high learning rates and reduces both the number of programming cycles and the total training energy. The synergy between memristor-based synapses and learning rules or neural architectures inherently capable of gradient accumulation is a promising avenue for further research.
Memory resolution is a critical factor influencing learning performance, in line with previous findings on mixed-precision learning [110]. However, increasing resolution often comes at the cost of larger synapses due to the increased number of devices. An alternative solution is to employ binary synapses with a stochastic rounding update scheme [180]. This approach can leverage the intrinsic cycle-to-cycle variability of memristive devices [143] to implement stochastic updates efficiently, effectively reducing the learning rate and guiding weight parameters towards their optimal binary values [181, 182].
From a computational neuroscience perspective, mixed-precision hardware resembles the cascade memory model, in which complex synapses hold a hidden state variable that only becomes visible after crossing a threshold [183]. Similar meta-plastic models have also recently been used to mitigate catastrophic forgetting [184].
To the best of our knowledge, this is the first report on the online training of RSNNs with the e-prop learning rule based on realistic PCM synapse models. Our simulation framework enables benchmarking of common update circuits designed to cope with memristor non-idealities and demonstrates that accumulating gradients enhances PCM device programming reliability, reduces the number of programmed devices and outperforms other synaptic weight-update mechanisms.
In the following Section 3.2, we will present the implementation of the mixed-precision update scheme on neuromorphic hardware with PCM devices. Later, in Section 3.3, I will introduce how eligibility traces, a crucial building block of many local learning rules, can be implemented with volatile memristive devices.
## 3.2 implementing online training of rsnns on neuromorphic hardware
Building upon the simulation results in Section 3.1, we now present an implementation of e-prop on neuromorphic hardware with in-memory computing capabilities. We embed all the network weights directly onto physical PCM devices and control the training procedure with a hardware-in-the-loop setup. We utilize the HERMES chip [185], fabricated in 14 nm CMOS technology with 256 × 256 PCM crossbar arrays, to validate the feasibility of on-chip learning under physical device constraints. Our experiments demonstrate that the mixed-precision training approach remains effective, achieving performance competitive with an FP32 realization while simultaneously equipping the RSNN with online training capabilities and leveraging the ultra-low-power benefits of the hardware.
## 3.2.1 From the simulation to an analog chip
The HERMES chip features two in-memory computing cores comprising PCM devices [23], with conductances (G) controlled by individual SET and RESET pulses. A low-amplitude SET pulse (100 µA, 600 ns) gradually switches the material from the amorphous to the crystalline phase, while a high-amplitude RESET pulse (700 µA, 600 ns) rapidly switches it to the HRS.
The crossbar array operates in a differential parallel setup, i.e., each element of the embedded weight matrices $W$ is represented by four PCM devices as $W_{ij} = ((G^+_A + G^+_B) - (G^-_A + G^-_B))/2$, indicated by the 8T4R quantifier in the system diagram of Fig. 3.8. The in-memory computing core allows feeding inputs, e.g., $\{x^t, z^{t-1}, z^t\}$, to the rows; through the application of Ohm's and Kirchhoff's laws, it performs a fully parallel matrix-vector multiplication [23] with approximately 4-bit precision [186].
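A minimal model of this differential read-out, with multiplicative READ noise as a stand-in for the device-level noise sources, is sketched below; the noise magnitude is an assumed placeholder.

```python
import torch

def pcm_mvm(x, g_pa, g_pb, g_na, g_nb, read_noise_std=0.01):
    """Differential matrix-vector product of the 8T4R configuration:
    W_ij = ((G+_A + G+_B) - (G-_A + G-_B)) / 2, read with multiplicative noise."""
    w = ((g_pa + g_pb) - (g_na + g_nb)) / 2.0
    w_read = w * (1.0 + read_noise_std * torch.randn_like(w))  # READ noise
    return w_read @ x            # Ohm's and Kirchhoff's laws in the analog array
```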
Figure 3.8: Illustration of the working principle of phase-change devices and their integration into a crossbar architecture. a, Phase switching behavior of PCM devices. SET and RESET pulses can be used to transition between the amorphous and the crystalline phases. b, The PCM devices from a can be incorporated into a crossbar array structure. In our work, four PCM devices are used in a differential manner to represent each weight element.
Similar to Section 3.1, we tested the network on a 1D continual pattern regression task, using an RSNN architecture with 100 recurrently connected LIF neurons and one leaky-integrator output unit (Eq. 3.1). Again, the RSNN is driven by fixed-rate Poisson spikes from 100 input neurons, with a membrane decay time constant of 30 ms for both recurrent and output units.
To optimize resource allocation, we strategically embedded the trainable parameters $W^{\text{in}}$, $W^{\text{rec}}$ and $W^{\text{out}}$ within Core 1, and the feedback matrix $B^{\text{out}}$ within Core 2. This architectural design, illustrated in Fig. 3.9, enables network inference on one core and error-based learning signal generation on the other. We use e-prop to compute the approximated gradients for the network weights and leverage the mixed-precision algorithm to accumulate gradients in a high-precision unit. This unit also serves to store the dataset, to compute neuron activations and eligibility traces, and to calculate the network error.
Figure 3.9: Task illustration and realization of the network on neuromorphic hardware. The regression task is realized on neuromorphic hardware using two cores with PCM crossbar arrays. The trainable parameters $W^{\text{in}}$, $W^{\text{rec}}$ and $W^{\text{out}}$ are placed into Core 1 and $B^{\text{out}}$ is placed into Core 2. The neuromorphic hardware performs the matrix-vector multiplications required to compute the network activity and the learning signal. A high-precision unit is used to track the individual gradients, which are applied to the trainable parameters.
Before transitioning to hardware testing, a comprehensive investigation of network performance was conducted within a simulation framework, subjecting the network to various hardware constraints. This involved utilizing weights with varying precision levels, including 32-bit floating point (SW 32 bit), 8-bit and 4-bit fixed point (SW 8 bit and SW 4 bit), and weights based on a PCM model (SW PCM), as depicted in Fig. 3.2.
The results, illustrated in Figure 3.10, were obtained using four distinct random seeds. Notably, the performance metrics at 32-bit and 8-bit precision were comparable, affirming the robustness of the network under reduced precision. The network's performance remained resilient even with 4-bit weights ($E = 3.93 \pm 0.58$) or simulated PCM ($E = 3.72 \pm 0.53$). Although a slight degradation was observed on hardware ($E = 5.59 \pm 1.29$), the network succeeded in reproducing the target pattern (Fig. 3.10b). The individual subpanels within this figure visualize the patterns at different stages of training: at the beginning, after 30 iterations, and at the end of training.
The histograms in Fig. 3.10c show the distributions of the trainable matrices before and after training. Notably, modifications in the output weights were instrumental in achieving accurate pattern generation. The average firing rate was maintained at approximately 10 Hz through regularization. This ensured sparsity in both communication and programming pulses, crucial for energy efficiency and extended device longevity. The sparse application of SET pulses, evident in Fig. 3.10d, further underscores this efficiency. Specifically, only 2-3 pulses per iteration were required for the output weight matrix, even though it comprises 100 elements. This highlights the effectiveness of gradient accumulation in reducing the number of programming pulses and, consequently, the overall energy consumption.
## 3.2.2 Discussion
We have demonstrated that training RSNNs with the e-prop local weight update rule, using a hardware-in-the-loop approach, can be robust to both limited computational precision and the analog imperfections inherent to memristive devices. Furthermore, our experiments show that this system achieves performance competitive with full-precision conventional software implementations. The RSNN neurons exhibited a low firing rate, and mixed-precision training significantly reduced the number of PCM programming pulses. This reduction in programming activity not only enhances energy efficiency but also mitigates potential endurance issues associated with frequent device switching. Our findings enable RSNNs trained with biologically inspired algorithms to be deployed on memristive neuromorphic hardware for sparse and online learning.
In the following Section 3.3, we will present how eligibility traces can be implemented with volatile memristive devices, enabling more scalable credit assignment on neuromorphic hardware.
Figure 3.10: Results of our training approach realized with different synapse models. a, Evolution of the squared error using different synapse models, averaged over 4 random initializations. b, Visual comparison of the network output and the target pattern at the beginning of training (first subpanel), after 30 training iterations (second subpanel) and after training (third subpanel). c, Histograms of the trainable weight matrices before and after training. d, Number of SET pulses applied to the individual weight matrices.
## 3.3 scalable synaptic eligibility traces with volatile memristive devices
Dedicated hardware implementations of spiking neural networks that combine the advantages of mixed-signal neuromorphic circuits with those of emerging memory technologies have the potential of enabling ultra-low-power pervasive sensory processing. To endow these systems with additional flexibility and the ability to learn to solve specific tasks, it is important to develop appropriate on-chip learning mechanisms. Recently, a new class of three-factor spike-based learning rules has been proposed that can solve the temporal credit assignment problem and approximate the error back-propagation algorithm on complex tasks. However, the efficient implementation of these rules on hybrid CMOS/memristive architectures is still an open challenge. Here we present a new neuromorphic building block, called PCM-trace, which exploits the drift behavior of phase-change materials to implement long-lasting eligibility traces, a critical ingredient of three-factor learning rules. We demonstrate how the proposed approach improves area efficiency by >10× compared to existing solutions and present a technologically plausible learning algorithm supported by experimental data from device measurements.
## 3.3.1 Introduction
Neuromorphic engineering uses electronic analog circuit elements to implement compact and energy-efficient intelligent cognitive systems [187-190]. Leveraging the substrate's physics to emulate biophysical dynamics is a strong incentive toward ultra-low-power and real-time implementations of neural networks using mixed-signal memristive event-based neuromorphic circuits [144, 191-193]. The majority of these systems are currently deployed in edge-computing applications only in inference mode, in which the network parameters are fixed. However, learning at the edge has many advantages, as it enables adaptation to changing input statistics and sensory degradations, reduced network congestion, and increased privacy. Indeed, there have been multiple efforts implementing Spike-Timing Dependent Plasticity (STDP) variants and Hebbian learning on neuromorphic processors [168, 181, 194]. These methods control Long-Term Depression (LTD) or Long-Term Potentiation (LTP) through specific local features of pre- and post-synaptic activity. However, local learning rules by themselves do not provide any guarantee that network performance will improve in multi-layer or recurrent networks. Local error-driven approaches, e.g., the Delta Rule, aim to solve this problem but fail to assign credit to neurons that are multiple synapses away from the network output [195, 196]. On the other hand, it has recently been shown that this can be achieved in hierarchical networks by using external third-factor neuromodulatory signals (e.g., reward or prediction error in reinforcement learning, a teaching signal in supervised learning) [197, 198]. However, there needs to be a mechanism for synapses to remember their past activities for long periods of time, until the reward event or teacher signal is presented. In the brain, these signals are believed to be implemented by calcium ions or CaMKII enzymes in the synaptic spine [199] and are called eligibility traces. In machine learning, algorithmic top-down analysis of gradient descent has demonstrated how local eligibility traces at synapses allow networks to reach performances comparable to the error back-propagation algorithm on complex tasks [75, 85, 200].
There are already neuromorphic platforms that support synaptic eligibility trace implementations, such as Loihi [89], BrainScaleS [90] or SpiNNaker [91]. Learning (weight updates) on these platforms is only supported through the use of digital processors; hence, the numerical trace calculation leads to extremely memory-intensive operations, forming a von Neumann bottleneck [92, 93]. Especially when the eligibility trace calculation is per synapse (instead of per neuron), the memory overhead quickly becomes overwhelming, as the number of traces scales quadratically with the number of neurons in the network. Unlike convolutional architectures on digital neuromorphic processors, where weight sharing reduces the memory bandwidth, eligibility traces cannot be shared due to their activity-dependent nature.
On the other hand, mixed-signal neuromorphic processors that perform in-memory computation can emulate the desired neural and synaptic dynamics using the physics of the analog substrate [201, 202]. Differential-Pair Integrator (DPI) based circuits [202, 203], which rely on
Table 3.3: Eligibility Traces in Gradient-Estimating Learning Rules

| Learning rule | Pre-synaptic terms | Post-synaptic terms |
|----------------------------|----------------------|-----------------------|
| e-prop (LIF) [82] | $x_j$ | - |
| e-prop (ALIF) [82] | $x_j$ | $\psi_i$ |
| Sparse RTRL [83] | $x_j$ | - |
| BDSP [39] | $x_j$ | - |
| SuperSpike [85] | $\epsilon * x_j$ | $\psi_i$ |
| Sparse Spiking G.D. [86] | $x_j$ | - |
accumulating volatile information on capacitors, can in principle be used to implement eligibility traces. Recently, substantial progress has been made [94] in implementing slow-dynamics DPI synapse circuits using advanced Fully Depleted Silicon-On-Insulator (FDSOI) technologies. By combining reverse body biasing and self-cascoding techniques [204], these circuits can achieve ∼6 s long synaptic traces [94]. However, the area required by the large capacitors and the area-dependent leakage of Alternate Polarity Metal-On-Metal (APMOM) structures hinder the scalability of such hardware implementations.
In this work, we present a novel approach that exploits the drift behavior of PCM devices to intrinsically perform eligibility trace computation over behavioral timescales. We present the PCM-trace building block as a hybrid memristive-CMOS circuit solution that can lead to record-low area requirements per synapse. To the best of our knowledge, this is the first work that uses a memristive device not only to store the weight of a synapse, but also to keep track of synaptic eligibility to interact with a third factor, toward scalable next-generation on-chip learning.
## Eligibility traces in machine learning and neuroscience
An eligibility trace is a decaying synaptic state variable that tracks the recency and frequency of synaptic events, as described in Eq. (3.5). The trace state e_ij of the synapse between pre-synaptic neuron j and post-synaptic neuron i is typically driven by a linear function of the pre-synaptic spiking activity, f_j(x_j), and a non-linear function of the post-synaptic activity, g_i(x_i), such that
$$\frac{d e_{ij}}{d t} = -\frac{e_{ij}}{\tau_e} + \eta \, f_j(x_j) \, g_i(x_i) \quad (3.5)$$

where τ_e is the time constant of the trace and η is a constant scaling factor [197].
The function of the eligibility trace is to keep the temporal correlation history of f_j(x_j) and g_i(x_i) available at the synapse, by accumulating instantaneous correlation events called synaptic tags. From the top-down, gradient-based machine learning perspective, various learning rules require eligibility trace functionality as part of the network architecture, and each specifies its own synaptic tag requirement, f_j · g_i (i.e., what information to accumulate at the synapse). A minimal numerical sketch of these dynamics follows.
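The following sketch (not from the thesis) integrates Eq. (3.5) with a forward Euler step for a single synapse. All parameter values are illustrative assumptions, and the post-synaptic factor g_i(x_i) is held constant for simplicity.

```python
import numpy as np

# Euler integration of the ideal eligibility-trace dynamics of Eq. (3.5).
dt, tau_e, eta = 1e-3, 10.0, 1.0      # step (s), trace time constant (s), scaling; assumed
T = int(30 / dt)                      # simulate 30 s

rng = np.random.default_rng(0)
pre_spikes = rng.random(T) < 2e-3     # sparse random pre-synaptic events f_j(x_j)
post_gate = 1.0                       # stand-in for the post-synaptic factor g_i(x_i)

e, trace = 0.0, np.empty(T)
for t in range(T):
    e -= dt * e / tau_e               # exponential decay with time constant tau_e
    if pre_spikes[t]:
        e += eta * post_gate          # add a synaptic tag on a correlation event
    trace[t] = e                      # the decaying eligibility trace e_ij
```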
Table 3.3 summarizes some of the recently developed biologically-plausible local learning rules employing eligibility traces within the supervised learning framework. Most of the listed learning rules accumulate pre-synaptic events, x_j, occasionally smoothed further with a causal membrane kernel ϵ, whereas some rules additionally require a surrogate partial derivative of the post-synaptic state, ψ_i. By approximating gradient-based optimization for spiking neural networks, these learning rules support competitive performance on standard audio and image classification datasets such as the TIMIT dataset [205], Spiking Heidelberg Digits [206], Fashion-MNIST [207], Neuromorphic-MNIST [208], CIFAR10 [209] and even ImageNet [210].
In RL, the advantages of long-lasting synaptic eligibility traces are more evident. Eligibility traces carry synaptic tag information into the future, allowing a backward view when a sparse reward arrives from the environment [211]. By doing so, eligibility traces help solve the distal reward problem [88] (how the brain assigns credit or blame to neurons if the activity patterns responsible for the reward no longer exist when the reward arrives) by bridging millisecond neuronal timescales and second-long behavioral timescales. Almost any Temporal Difference (TD) method, e.g., Q-learning or SARSA, can use eligibility traces to learn more efficiently [211]. Policy gradient methods employing temporal discounting, e.g., discounted reward [211] or discounted advantage [212], naturally demand synaptic eligibility traces. Moreover, some on-policy and off-policy RL models can explain behavioral and physiological experiments across multiple sensory modalities only if they are equipped with synaptic eligibility traces with > 10 s decay time [213].
Eligibility traces are also deeply rooted in neurobiology. The synaptic machinery that implements the eligibility trace might be calcium-based mechanisms in the spine, e.g., CaMKII [214, 215], or a metastable transient state of molecular dynamics inside the synapse [216]. In the visual and frontal cortex, in-vivo STDP experiments suggest that pre-before-post pairings induce a synaptic tag that decays over ∼ 10 s and results in LTP upon the arrival of the third-factor neuromodulator noradrenaline [217]. In the hippocampus, Brzosko, Schultz & Paulsen [218] found that post-before-pre pairings can result in LTP if the third-factor neuromodulator dopamine arrives within minutes.
In summary, both top-down approaches following machine learning principles and bottom-up approaches built upon in-vivo and in-vitro synaptic plasticity experiments point to the importance of eligibility traces in neural architectures.
## 3.3.2 PCM-trace: Implementing eligibility traces with PCM drift
## 3.3.2.1 PCM Measurements
Temporal evolution of electrical resistivity is a widely-observed phenomenon in PCM due to the rearrangement of atoms in the amorphous phase [219]. This behavior is commonly referred to as structural relaxation or drift. To start the drift, a strong RESET pulse is applied to induce a crystalline-to-amorphous phase transition, where the PCM is melted and quenched. The low-ordered and highly-stressed amorphous state then evolves to a more energetically favorable glass state within tens of seconds [112].
At constant ambient temperature, the resistivity follows
$$R(t) = R(t_0)\left(\frac{t}{t_0}\right)^{\nu} \quad (3.6)$$
where R(t_0) is the resistance measured at time t_0 and ν is the drift coefficient. It has been experimentally verified by many groups that Eq. (3.6) successfully captures the drift dynamics [112, 156, 220], from the microsecond to hour range [221].
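A short sketch of the power-law model in Eq. (3.6): R(t_0) matches one of the initial conditions reported in the measurements below, while the drift coefficient ν is an assumed illustrative value (reported coefficients are device-dependent).

```python
import numpy as np

# Power-law resistance drift, Eq. (3.6).
t0, R_t0, nu = 1.0, 1.77e6, 0.07      # s, ohms, drift coefficient (assumption)
t = np.arange(1.0, 31.0)              # one READ per second for 30 s, as measured
R = R_t0 * (t / t0) ** nu             # resistance drifts upward over time
G = 1.0 / R                           # equivalently, conductance decays
```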
Figure 3.11: Experimental (dots) and simulated (dashed lines) resistance drift characteristics at constant room temperature.
In collaboration with CEA-LETI, we integrated Ge2Sb2Te5-based PCM into state-of-the-art heater-based devices fabricated in the Back-End-Of-Line (BEOL) of a 130 nm CMOS technology. The PCM thickness is 50 nm, with a bottom contact area of 3600 nm². Drift measurements were performed on three devices to monitor the temporal evolution of the resistance in the HRS, confirming the model in Eq. (3.6). The test was conducted by first resetting all the cells by applying a RESET pulse to the heater, with a width of 100 ns, 5 ns rising and falling times, and a peak voltage of 1.85 V. Then, an additional programming pulse was used to bring the devices to different initial conditions, corresponding to R(t = 1 s) = [1.77 MΩ, 2.39 MΩ, 2.89 MΩ]. The low-field device resistances were measured every 1 s for 30 s by applying a READ pulse with the same timing as the RESET pulse but a peak voltage of 0.05 V.

Figure 3.12: Accumulating an eligibility trace using the PCM-trace drift model (Eq. 3.7). After resetting the PCM-trace device at t = 0, five random synaptic tags are applied to the synapse, each implemented by a gradual SET that results in a 50% increase in conductivity. The device can keep the eligibility trace for more than 10 s.
PCM-trace is a novel method to implement seconds-long eligibility traces at the synapse using the drift behavior of PCM. By rewriting Eq. (3.6) as a difference equation of the conductance, we can show that the temporal evolution of the conductance has decay characteristics similar to Eq. (3.5): $G_{ij}^{t+\Delta t} = \left(\frac{t - t_p}{t - t_p + \Delta t}\right)^{\nu} G_{ij}^{t}$, where $G_{ij}^{t_0} = 1/R_{ij}^{t_0}$ and $t_p$ is the last programming time, as drift re-initializes with every gradual SET [81, 222]. The main difference is that the rate of change of the PCM resistivity is itself a function of time; nevertheless, the effective time constant $\tau_{PCM} = -\Delta t / \log\left(\left(t/(t+\Delta t)\right)^{\nu}\right)$ is on the order of tens of seconds, comparable to behavioral timescales [213]. Therefore, the PCM-trace dynamics can emulate the eligibility trace of the synapse as follows:
$$G_{ij}^{t+\Delta t} = \left(\frac{t - t_p}{t - t_p + \Delta t}\right)^{\nu} G_{ij}^{t} + \eta \, f_j(x_j^t) \, g_i(x_i^t) \quad (3.7)$$
In the PCM-trace method (Eq. 3.7), the accumulating term of the eligibility trace is implemented by applying a gradual SET to the PCM device whenever the synapse is tagged. To maximize the number of accumulations a PCM device can handle without getting stuck in the LRS regime, some operational conditions need to be satisfied. We initialize the device to the HRS by applying a strong RESET pulse, and wait for an initialization time t_init of at least 250 ms for the device resistance to increase. If t_init is too short, the device conductance is still too high to accumulate enough tags; if it is too long, the decay becomes weaker (see Eq. 3.6). The initialization time can be modulated to reach the desired drift speed depending on the material choice and the application. After the initialization time, whenever the synapse is tagged, a single gradual SET (with an amplitude of 100 µA and a pulse width of 100 ns with 5 ns rising and falling times) is applied. To make sure that the device stays in the HRS, a read-verify-set scheme can be used. Finally, the value of the eligibility trace can be measured seconds later by reading the conductance of the device (see Fig. 3.12).
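A minimal simulation sketch of the PCM-trace update (Eq. 3.7) follows. The parameter values (ν, η, the post-initialization conductance, and the tag times) are assumptions chosen only to reproduce the qualitative behavior of Fig. 3.12.

```python
import numpy as np

# PCM-trace update of Eq. (3.7): drift-induced decay plus gradual-SET tags.
dt, nu, eta = 1e-3, 0.07, 0.5e-6        # step (s), drift coefficient, tag size (S)
t_init = 0.25                           # wait >= 250 ms after RESET before tagging
tag_steps = {round(s / dt) for s in (0.3, 0.5, 0.7, 0.9, 1.1)}  # five tag times (s)

G, t_p = 0.35e-6, 0.0                   # conductance after RESET (assumed), last SET time
trace = np.empty(int(15 / dt))
for k in range(len(trace)):
    t = k * dt
    elapsed = t - t_p
    if elapsed > 0:
        G *= (elapsed / (elapsed + dt)) ** nu   # drift-induced decay term of Eq. (3.7)
    if k in tag_steps and t > t_init:
        G += eta                                # gradual SET accumulates the tag...
        t_p = t                                 # ...and re-initializes the drift
    trace[k] = G
```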
## 3.3.3 Multi PCM-trace: Increasing the dynamic range of traces
The number of gradual SET pulses that can be applied to a single PCM-trace device is limited, because each pulse partially increases the device conductivity and eventually moves the device toward its LRS (< 2 MΩ), where the drift converges to a higher baseline level. This problem can be solved by storing the synaptic eligibility trace distributed across multiple PCM devices, as in Fig. 3.13.
Figure 3.13: Multi PCM-trace concept. Each synapse has a weight block and a PCM-trace block, where multiple parallel PCM devices keep the eligibility trace of the synapse through their natural drift behavior.
By successively routing the tags to multiple PCM devices, the number of gradual SET pulses applied per device is significantly reduced. The post-synaptic neuron receives the sum of the products of the pre-synaptic activities and the weight block. In parallel, the PCM-trace block computes the eligibility trace as a function of pre- and post-synaptic activities (Eq. 3.7), read out as the sum of the parallel device conductances, e_ij = Σ_n G_n, to be used in the weight update. Fig. 3.14 demonstrates the increase in effective dynamic range (the number of updates to the eligibility trace without getting stuck in the LRS) obtained with multiple PCM devices, and a short routing sketch follows.
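A sketch of the multi PCM-trace idea: tags are routed round-robin across N parallel devices, so each device receives only ~1/N of the gradual SET pulses. Device constants reuse the assumed values of the single-device sketch above.

```python
import numpy as np

N, dt, nu, eta = 3, 1e-3, 0.07, 0.5e-6
G = np.full(N, 0.35e-6)              # per-device conductances after initialization (assumed)
t_p = np.zeros(N)                    # per-device time of the last gradual SET
next_dev = 0                         # round-robin pointer

def drift(t):
    """Apply the drift-decay term of Eq. (3.7) to every device."""
    elapsed = t - t_p
    mask = elapsed > 0
    G[mask] *= (elapsed[mask] / (elapsed[mask] + dt)) ** nu

def apply_tag(t):
    """Route the synaptic tag to the next device in line."""
    global next_dev
    G[next_dev] += eta
    t_p[next_dev] = t
    next_dev = (next_dev + 1) % N

def read_trace():
    """Effective eligibility trace: sum of parallel conductances, e_ij = sum_n G_n."""
    return float(G.sum())

tag_steps = set(range(300, 1300, 67))    # 15 tag events between 0.3 s and 1.3 s
for step in range(15000):                # simulate 15 s with dt = 1 ms
    t = step * dt
    drift(t)
    if step in tag_steps:
        apply_tag(t)
print(read_trace())                      # effective trace, readable seconds later
```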
Figure 3.14: Accumulating an eligibility trace using the multi-PCM configuration. The synapse receives 15 tags between 300 ms and 1300 ms, which are routed to three different devices, shown in the top three plots. The effective eligibility trace is calculated by applying a READ pulse to the parallel PCM devices. The initialization duration and the synaptic activity period are shown with dashed lines in the bottom plot. The synaptic efficacy W_ij is modified depending on the state of the eligibility trace once the third-factor signal arrives.
## 3.3.4 Circuits and Architecture
## 3.3.4.1 PCM-trace Architecture
An example in-memory event-based neuromorphic architecture is shown in Fig. 3.15, where PCM-trace is employed to enable three-factor learning on behavioral timescales.
Synapse: Each synapse includes a weight block W_ij in which two PCM devices are used in a differential configuration to represent positive and negative weights [143]. The effective synaptic weight is calculated as the difference of these two conductance values, i.e., W_ij = W_ij^+ − W_ij^−. Each synapse also has a PCM-trace block e_ij that keeps the eligibility trace. Inside the PCM-trace block, two PCM devices keep track of the positive and negative correlation between the pre- and post-synaptic neurons. At the onset of the pre-synaptic input spike, PRE_j, (i) W_ij is read, and the current is integrated by the post-synaptic neuron i; (ii) based on the UP/DN signal from the learning block (LB), a gradual SET programming current is applied to the positive/negative PCM-trace device.
Neuron with Learning Block (LB): The LB estimates the pre-post neuron correlation using the Spike-Driven Synaptic Plasticity (SDSP) rule [223]. At the time of the pre-synaptic spike, the post-synaptic membrane variable is compared against a threshold, above (below) which an UP (DN) signal is generated, representing the tag type. On the arrival of the third-factor binary reward signal, REW, the state of the Eligibility Trace (ET) devices is read by the V_PROG block (Fig. 3.16b), which generates a gate voltage that modulates the current programming the weight devices W_ij (see Alg. 2, sketched below).
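A hedged reconstruction of Alg. 2 (three-factor learning with PCM-trace) as plain Python. The names gradual_set/read/reset stand in for the device-level PROG/READ/RESET operations, and all thresholds and constants are illustrative assumptions, not the values used on chip.

```python
import random

def gradual_set(dev, amount=1.0): dev["G"] += amount   # partial SET pulse
def read(dev): return dev["G"]                         # low-voltage READ
def reset(dev): dev["G"] = 0.0                         # strong RESET

w_pos, w_neg = {"G": random.random()}, {"G": random.random()}  # differential weight W_ij
e_pos, e_neg = {"G": 0.0}, {"G": 0.0}                          # eligibility trace devices

I_TH_UP, I_TH_DN, SCALE_CONST, T_INIT = 0.7, 0.3, 0.1, 0.25    # assumed constants

def on_pre_spike(t, v_mem, v_th):
    """At each pre-synaptic spike, compare the membrane state against the
    thresholds and tag the positive or negative trace device (UP/DN)."""
    i_x = 1.0 - (v_th - v_mem) / v_th     # normalized membrane variable
    if t > T_INIT:
        if i_x > I_TH_UP:
            gradual_set(e_pos)            # UP: tag the positive trace
        elif i_x < I_TH_DN:
            gradual_set(e_neg)            # DN: tag the negative trace

def on_reward():
    """Third factor: read both traces, scale them, and program the weights."""
    i_pos, i_neg = read(e_pos), read(e_neg)
    gradual_set(w_pos, SCALE_CONST * i_pos)
    gradual_set(w_neg, SCALE_CONST * i_neg)
    reset(e_pos); reset(e_neg)            # traces are re-initialized after use
```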
Figure 3.15: PCM-trace-based neuromorphic architecture for three-factor learning. Only the positive eligibility trace (e_ij^+) and W_ij^+ are shown.
## 3.3.4.2 Circuit Simulation
Fig. 3.16 shows the block diagram of the LB implementing the SDSP rule, which calculates the pre-post neuron correlation. The membrane variable (described here as a current, I_mem, since the circuits operate in current mode) is compared against a threshold value I_th through a Bump circuit [143, 224]. The output of this block is digitized through a current comparator (in our design, a Winner-Take-All (WTA) block [225]), generating an UP (DN) signal if the membrane variable is above (below) the threshold I_th, and a STOP signal, SP, if the two are close, within the dead zone of the bump circuit [224].
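The decision logic just described can be summarized in a toy sketch: a bump-style comparison of I_mem against I_th with a dead zone. The dead-zone width is an assumption for illustration.

```python
def learning_block(i_mem: float, i_th: float, dead_zone: float = 0.1) -> str:
    """Return 'UP', 'DN' or 'SP' depending on where I_mem sits relative to I_th."""
    if abs(i_mem - i_th) <= dead_zone * i_th:
        return "SP"                          # within the bump circuit's dead zone
    return "UP" if i_mem > i_th else "DN"    # above or below the threshold
```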
Fig. 3.16b presents the circuit schematic that reads the PCM-trace device and generates V_PROG. To read the state of the device, a voltage divider is formed between the PCM device and a pseudo-resistor, highlighted in green. As the device resistance changes, the input voltage to the differential pair, highlighted in red, changes. This change is amplified by the gain of the differential pair, and the device current is normalized to its tail current, giving rise to I_PROG, which develops V_PROG through the diode-connected NMOS transistor. V_PROG is connected to the gate of the transistor in series with the weight PCM (see Fig. 3.15).

Figure 3.16: (a) Learning block diagram generating UP/DN signals as a function of the correlation between pre- and post-synaptic activity. (b) V_PROG circuit reading from the eligibility trace device through the voltage divider (green) and generating I_PROG through the differential pair (red) to program the weight device.
Fig. 3.17a plots the PRE events, I_mem, the output of the learning block at the time of the PRE event, and the gradual SET pulse applied to the device. As shown, the UP signal is asserted when the membrane current is higher than the threshold (indicated in red), which causes a gradual SET pulse of 100 µA to be applied across the PCM-trace device upon PRE events. Fig. 3.17b shows the generated I_PROG as a function of the state of the eligibility trace device. The higher the resistance of the ET device, the lower the accumulated correlation, and thus the lower the programming current applied to the weight device. The resistance range on the x-axis of the plot matches the measured resistances of the PCM devices shown in Fig. 3.11.
## 3.3.5 Discussion
Long-lasting ETs enable the construction of powerful learning mechanisms for solving complex tasks by bridging the synaptic and behavioral time-scales. In this work, for the first time, we proposed to use the drift of PCM devices to implement ETs, and analyzed their feasibility for implementation in existing fabrication technologies.
The implementation of three-factor learning rules with per-synapse ETs requires complex memory structures for keeping track of the eligibility trace and the weight. Our proposed approach has clear advantages for scaling. Table 3.4 shows a comparison between our PCM synapse and a CMOS-only implementation in 22 nm FDSOI technology from [226].
PCM is among the most advanced emerging memory technologies integrated into the neuromorphic domain [109]. Our approach of using PCM to store both the synaptic weight and the eligibility trace requires no additional nano-fabrication methods.
| | Area (µm²) | τ (s) | Area/τ (µm² s⁻¹) |
|-----------------|------------|-------|------------------|
| CMOS [226] | 20 × 17 | 6 | 56.6 |
| PCM [This work] | 12 × 12 | > 30 | < 4.8 |

Table 3.4: Area comparison of eligibility trace implementations
Figure 3.17: (a) From the top: PRE events; POST membrane current (I_mem) and learning threshold (I_th); PRE events occurring only when I_mem is higher than I_th; and the corresponding gradual SET current pulse applied to the PCM-trace device. (b) Programming current to be applied to the weight PCM as a function of the eligibility trace state.
## DISCOVERING A SINGLE MATERIAL THAT SWITCHES BETWEEN VOLATILE AND NON-VOLATILE MODES
This chapter's content was published in Nature Communications, was featured as one of the 50 best recent papers in the field, and can be found online in extended form, including all experimental details, which we omit here for clarity. The original publication is authored by Yigit Demirag∗, Rohit Abraham John∗, Yevhen Shynkarenko, Yuliia Berezovska, Natacha Ohannessian, Melika Payvand, Peng Zeng, Maryna I. Bodnarchuk, Frank Krumeich, Gökhan Kara, Ivan Shorubalko, Manu V. Nair, Graham A. Cooke, Thomas Lippert, Giacomo Indiveri and Maksym V. Kovalenko.
∗ These authors contributed equally.
Many in-memory computing frameworks demand electronic devices with specific switching characteristics to achieve the desired level of computational complexity. Existing memristive devices cannot be reconfigured to meet the diverse volatile and non-volatile switching requirements, and hence rely on tailored material designs specific to the targeted application, limiting their universality. "Reconfigurable memristors" that combine both ionic diffusive and drift mechanisms could address these limitations, but they remain elusive. Here we present a reconfigurable halide perovskite nanocrystal memristor that achieves on-demand switching between diffusive/volatile and drift/non-volatile modes via controllable electrochemical reactions. Judicious selection of the perovskite nanocrystals and organic capping ligands enables state-of-the-art endurance performance in both modes: volatile (2 × 10⁶ cycles) and non-volatile (5.6 × 10³ cycles). We demonstrate the relevance of such proof-of-concept perovskite devices on a benchmark reservoir network with a volatile recurrent layer and a non-volatile readout layer, based on 19,900 measurements across 25 dynamically-configured devices.
## 4.1 introduction
The human brain, operating at petaflop scale, consumes less than 20 W, setting a precedent for scientists that real-time, ultralow-power data processing in a small volume is possible. Inspired by the human brain, the field of neuromorphic computing attempts to emulate various computational principles of the biological substrate by engineering unique materials [227-229] and circuits [29, 230, 231]. In the context of hardware implementations of neural networks, the discovery of memristors has been one of the main driving forces for highly efficient in-memory realizations of synaptic operations. Similar to evolution optimizing neurons and synapses by exploiting stable and metastable molecular dynamics [232], memristive devices based on various physical mechanisms [25, 233, 234] have been discovered and developed with different volatile and non-volatile specifications. Since their inception, memristors have been utilized to implement a wide gamut of applications [235] such as stochastic computing [236], hyperdimensional computing [237], spiking [238] and artificial neural networks [95]. However, many of these frameworks demand very different hardware specifications [16] (Fig. 4.1a). To meet these specifications, memristor fabrication processes are often tediously engineered to reflect the requirements of the targeted neural network configuration (e.g., neural encoding, synaptic precision, etc.). For example, the latest state-of-the-art spiking neural network (SNN) models [75, 197] require memory elements operating at multiple timescales, with both volatile and non-volatile properties (from tens of milliseconds to hours) [144]. The current approach of optimizing memristive devices for a single requirement hinders the implementation of multiple computational primitives in neural networks and precludes their monolithic integration on the same hardware substrate.
In this regard, the realization of drift and diffusive memristors has garnered significant attention. Drift memristors portraying non-volatile memory characteristics are typically designed using oxide dielectric materials with a soft-breakdown behaviour. In combination with inert electrodes, the switching mechanism is determined by filaments of oxygen vacancies (valence change memory), whereas implementations with reactive electrodes rely on electrochemical reactions to form conductive bridges (electrochemical metallization memory) [242]. Such drift-based memristors fit well for emulating synaptic weights that stay stable between weight updates. In contrast, diffusive memristors are often built with precisely embedded clusters of metallic ions with low diffusion activation energy within a dielectric matrix [234]. The large availability of such mobile ionic species and their low diffusion activation energy facilitate spontaneous relaxation to the insulating state upon removing power, resulting in volatile threshold switching. Memristive devices with such short-term volatility are better suited to processing temporally-encoded input patterns [243]. Hence, the application determines the required volatility, bit precision and endurance of the memristors, which are then heavily tailored through tedious material design strategies to meet these demands [16]. For example, deep neural network (DNN) inference workloads require a linear conductance response over a wide dynamic range for optimal weight updates and minimum noise for gradient calculation [21, 22, 95], whereas SNNs often demand richer and multiple synaptic dynamics simultaneously, e.g., short-term conductance decay (to implement synaptic cleft phenomena such as Ca2+-dependent short-term plasticity (STP) and CaMKII-related eligibility traces [244]), non-volatile device states (to represent synaptic efficacy) and a probabilistic nature (to mimic synaptic vesicle releases [243]) (Fig. 4.1a). However, optimizing the active memristive material for each of these features limits their feasibility for a wide range of computational frameworks and ultimately increases the system complexity for the most demanding applications. Moreover, these diverse specifications cannot always be implemented by combining different types of memristors on a monolithic circuit, e.g., volatile and non-volatile, binary and analog, due to the incompatibility of the fabrication processes. Therefore, the lack of universality of memristors that realize not only one but diverse computational primitives remains an unsolved challenge today.

Figure 4.1: (a) Different neural network frameworks demand particular switching characteristics from in-memory computing implementations. For example, delay systems [239] (dynamical nonlinear systems with delayed feedback, such as virtual reservoir networks) should exhibit only a fading memory to process inputs from the recent past. Such short-term dynamics are best represented by volatile threshold-switching memristors [240]. SNNs often demand both volatile and non-volatile dynamics simultaneously. Synaptic mechanisms requiring STP and eligibility traces [241] can be implemented using volatile memristors [24, 151], whereas synaptic efficacy requires either efficient binary-switching [20] or analog switching devices. Lastly, ANN performance specifically benefits from non-volatile features such as multi-level bit precision of weights and a linear conductance response during the training phase [21, 22]. (b) A reconfigurable memristor with active control over its diffusive and drift dynamics may be a feasible unifying solution. A schematic of the reconfigurable halide perovskite nanocrystal memristor is shown for reference. We utilize the same active switching material (CsPbBr3 NCs capped with OGB ligands) to implement two distinct types of computation in the RC framework. The volatile diffusive mode exhibiting short-term memory is utilized as the reservoir layer, while the non-volatile drift mode exhibiting long-term memory serves as the readout layer.
A reconfigurable memristive computing substrate that allows active control over its ionic diffusive and drift dynamics can offer a viable unifying solution, but has hitherto been undemonstrated. Although dual-functional memory behaviour has been observed previously, the dominance of one of the mechanisms often results in poor switching performance for one or both modes, limiting the employability of such devices for demanding applications [96, 97]. To the best of our knowledge, there is no report yet of a reconfigurable memristive material that can portray both volatile diffusive and multi-state non-volatile drift kinetics, exhibit facile switching between these two modes, and still retain excellent performance.
Here we report a reconfigurable memristor computing substrate based on halide perovskite nanocrystals that achieves on-demand switching between volatile and non-volatile modes by encompassing both diffusive and drift kinetics (Fig. 4.1b). Halide perovskites are newcomer optoelectronic semiconducting materials that have enabled state-of-the-art solar cells [245], solid-state light emitters [246, 247] and photodetectors [248-250]. Recently, these materials have attracted significant attention as memory elements due to their rich variety of charge transport physics that supports memristive switching, such as modulatable ion migration [251-253], electrochemical metallization reactions with metal electrodes [254] and localized interfacial doping with charge transport layers [255]. While most reports are based on thin films or bulk crystals of halide perovskites [251-253, 256], perovskite nanocrystal (NC)-based formulations have interestingly been much less investigated to date [244, 257]. NCs in general are garnering significant attention for artificial synaptic implementations because they support a wide range of switching physics, such as trapping and release of photogenerated carriers at dangling bonds over a broad spectral region [258], and single-electron tunnelling [259]. They allow low-energy (< fJ), high-speed (MHz) operation, and support scalable, CMOS-compatible fabrication processes. In the case of perovskite NCs, however, existing implementations often utilize NCs only as a charge-trapping medium to modulate the resistance states of another semiconductor, in flash-like configurations, a.k.a. synaptic transistors [260-263]. The memristive switching capabilities and limits of a perovskite NC active matrix remain unaddressed, entailing significant research in this direction. Colloids of perovskite NCs are readily processable into thin-film NC solids, and they offer a modular approach to impart mesoscale structures and electronic interfaces, tunable by adjusting the NC composition, size and surface ligand capping.
Our device comprises all-inorganic cesium lead bromide (CsPbBr3) NCs capped with organic ligands as the active switching matrix and silver (Ag) as the active electrode. The design principle for realizing reconfigurable memristors revolves around two main factors. (i) From a material selection perspective, the low activation energy of migration of Ag⁺ and Br⁻ ions allows easy formation of conductive filaments, and the soft lattice of the halide perovskite NCs facilitates diffusion of the mobile ions. Moreover, the organic capping ligands help regulate the extent of electrochemical reactions, resulting in high endurance and good reconfigurability. (ii) From a peripheral circuit design perspective, active control of the compliance current (Icc) determines the magnitude of the flux of the mobile ionic species and in turn allows facile switching between the volatile diffusive and multi-bit non-volatile drift modes of operation.
The surface capping ligands are observed to play a vital role in determining the switching characteristics and endurance. CsPbBr3 NCs capped with didodecyldimethylammonium bromide (DDAB) ligands display poor switching performance in both the volatile (∼10 cycles) and non-volatile (50 cycles) modes, whereas NCs capped with oleylguanidinium bromide (OGB) ligands exhibit record-high endurance in both the volatile (2 million cycles) and non-volatile (5655 cycles) switching modes [240, 255, 264].
To validate our approach and demonstrate the advantages of such reconfigurable memristive materials, we use a benchmark model of a fully-memristive reservoir computing (RC) framework interfaced to an artificial neural network (ANN) [240]. The reservoir is modelled as a network of recurrently-connected units whose dynamics act as a short-term memory. Any temporal signal entering the reservoir is subject to a high-dimensional non-linear transformation that enhances the separability of its temporal features. A linear read-out ANN layer is then connected to the reservoir units with all-to-all connections and trained to perform classification based on the temporal information maintained in the reservoir. Our RC implementation comprises perovskite memristors configured as diffusion-based volatile dynamic elements to implement the reservoir nodes, and as drift-based non-volatile weights to implement the readout ANN layer. In the diffusive mode, the low activation energy of migration of the mobile ionic species (Ag⁺ and Br⁻) enables volatile threshold switching; the resulting short-term dynamics are essential for capturing temporal correlations within the input data stream. In the drift mode, stable conductive filaments formed by the drift of the ionic species facilitate the programming of non-volatile synaptic weights in the readout layer for both training and inference. Furthermore, the readout layer can be trained online via active regulation of the Icc, which allows precise selection of the drift dynamics and enables multi-bit resolution in the low-resistance state (LRS). Using neural firing patterns, we show via both experiments and simulations that an RC framework based on reconfigurable perovskite memristors can accurately extract features of temporal signals and classify firing patterns.
## 4.2 diffusive mode of the perovskite reconfigurable memristor
We investigate two systems for the diffusive dynamics: didodecyldimethylammonium bromide (DDAB)- and oleylguanidinium bromide (OGB)-capped CsPbBr3 NCs. The device structure comprises indium tin oxide (ITO), poly(3,4-ethylenedioxythiophene) polystyrene sulfonate (PEDOT:PSS), poly[N,N'-bis(4-butylphenyl)-N,N'-bisphenylbenzidine] (polyTPD), CsPbBr3 NCs and Ag (see "Methods" section). With an Icc of 1 µA, both material systems portray volatile threshold switching with diffusive dynamics and spontaneous relaxation back to the initial state, albeit with contrasting endurance. The DDAB-capped perovskite NCs exhibit a poor on-off ratio (volatile memory, a.k.a. VM, Ipower-ON/Ipower-OFF ∼ 10) and a quick transition to a non-volatile state, resulting in an inferior volatile endurance of ∼10 cycles. In contrast, the OGB-capped perovskite NCs depict highly robust threshold switching with sub-1 V set voltages, Ipower-ON/Ipower-OFF ∼ 10³ and a record volatile endurance of 2 × 10⁶ cycles (Fig. 4.3a). The volatile threshold switching can be attributed to the redistribution of Ag⁺ and Br⁻ ions under an applied electric field and their back-diffusion upon removing the power [253, 265, 266]. It is also important to note that both devices exhibit unidirectional DC threshold switching, with no switching occurring under reverse bias (negative voltage on the Ag electrode). This can be correlated to the bipolar electrode effect dominating over thermally driven diffusion, in alignment with the literature [267-269].
## 4.3 drift mode of the perovskite reconfigurable memristor
Upon increasing the Icc to 1 mA, both the DDAB- and OGB-capped CsPbBr3 NC memristors portray typical non-volatile bipolar resistive switching characteristics, once again with contrasting endurance (Fig. 4.2, Fig. 4.3 and Supplementary Fig. A.22). Both systems depict forming-free operation and similar on-off ratios (≥10³). However, the DDAB-capped perovskite NCs quickly transition to a non-erasable non-volatile state, resulting in an inferior non-volatile endurance of 50 cycles (Supplementary Fig. A.23). On the other hand, the OGB-capped perovskite NC-based memristor portrays highly robust switching behaviour with sub-1 V set voltages, and record-high non-volatile endurance and retention of 5655 cycles and 10⁵ s, respectively (Fig. 4.3, Supplementary Fig. A.24).
Figure 4.2: The device structure comprises ITO (100 nm), PEDOT:PSS (30 nm), polyTPD (20 nm), OGB-capped CsPbBr3 NCs (20 nm) and Ag (150 nm). (a) Diffusive mode: illustration of the proposed volatile diffusive switching mechanism. (b) Drift mode: illustration of the proposed non-volatile drift switching mechanism. Note: the thicknesses of the individual layers in the device schematic are not drawn to scale relative to the experimentally measured thicknesses. The perovskite layer is not a bulk semiconductor but 1-2 layers of nanocrystals (NCs). The schematic is drawn for simplicity, to illustrate the formation and rupture of conductive filaments (CFs) of Ag through the device structure.
Similar to the volatile threshold switching, the non-volatile resistive switching can be attributed to the redistribution of ions and electrochemical reactions under an applied electric field [251, 252]. The larger Icc of 1 mA results in permanent, thicker conductive filamentary pathways, and the switching dynamics are now dominated by the drift kinetics of the mobile ionic species Ag⁺ and Br⁻, rather than by diffusion.
In the case of the DDAB-capped CsPbBr3 NCs, the inferior volatile endurance, the quick transition to a non-volatile state and the mediocre non-volatile endurance indicate poor control of the underlying electrochemical processes and the formation of permanent conductive filaments even at low compliance currents. On the other hand, capping the CsPbBr3 NCs with OGB ligands enables better regulation of the electrochemical processes, resulting in a superior on-off ratio, volatile endurance and non-volatile endurance. Scanning Electron Microscopy (SEM) images indicate similar film thicknesses in both devices, ruling out a dependence on the active material thickness (Fig. 4.3). Transmission Electron Microscopy (TEM) and Atomic Force Microscopy (AFM) images reveal similar nanocrystal sizes (∼10 nm) and surface roughness for both films, dismissing variations in crystal size and morphology as possible differentiating factors (Fig. 4.3 and Supplementary Fig. A.25). While the exact mechanism is still unknown, the larger size of the OGB ligands compared to DDAB (2.3 nm vs. 1.7 nm) could intuitively provide better isolation of the CsPbBr3 NCs and prevent excess electrochemical redox reactions of Ag⁺ and Br⁻, modulating the formation and rupture of conductive filaments. This picture is further supported by photoluminescence measurements, which point to a larger drop in luminescence quantum yield in the films of DDAB-capped NCs, arising from stronger excitonic diffusion and trapping.
To probe the mechanism further, devices with Au as the top electrode were fabricated, but these did not show any resistive switching behaviour: they do not reach the compliance current of 1 mA during the set process and do not portray the sudden increase in current typical of filamentary memristors. This indicates that Ag is crucial for the resistive switching and suggests that Br⁻ ions play at most a trivial role in our devices. Control experiments on PEDOT:PSS-only and PEDOT:PSS + polyTPD devices further reiterate the importance of the perovskite NC thin film as an active matrix for reliable and robust Ag filament formation and rupture.
Figure 4.3: The device structure comprises ITO (100 nm), PEDOT:PSS (30 nm), polyTPD (20 nm), OGB-capped CsPbBr3 NCs (20 nm) and Ag (150 nm), as shown in the SEM cross-section. The thicknesses of the individual layers were confirmed by AFM. The TEM image reveals NCs with an average diameter of ∼10 nm. (a) Diffusive mode: evolution of the device conductance upon applying DC sweep voltages (0 V → 2 V → 0 V) with Icc = 1 µA (top); endurance performance (bottom). (b) Drift mode: evolution of the device conductance upon applying DC sweep voltages (0 V → +5 V → 0 V → −7 V → 0 V) with Icc = 1 mA during the SET operation (top); endurance performance (bottom).
Secondary ion mass spectrometry (SIMS) profiling reveals a clear difference in the ¹⁰⁷Ag cross-sectional profile when comparing an ON and an OFF device: an increase in the ¹⁰⁷Ag count is observed at the interface between the halide perovskite and the organic layers for the device in the ON state. Temperature-dependent measurements further confirm the migration of Ag⁺ ions through the perovskite matrix. The conclusions of this study are observed to be independent of the NC layer thickness, NC size and dispersity.
## 4.4 reservoir computing with perovskite memristors
To demonstrate the advantages of the reconfigurability of our perovskite memristors, we model in simulation a fully-memristive RC framework with a dynamically-configured layer of volatile virtual reservoir nodes and a readout ANN layer with non-volatile weights. In particular, we address three distinct computational requirements using the reconfigurability of the proposed device: an accumulating/decaying short-term memory for temporal processing in the reservoir; a stable long-term memory for retaining trained weights in the readout layer; and a circuit methodology for accessing analog states from binary devices to enhance the training performance.
## 4.5 diffusive perovskite memristors as reservoir elements
To implement the reservoir layer with the fabricated memristor devices, we utilize the virtual node concept originally proposed by Appeltant et al. [239]. Instead of conventionally transforming the input signal into a high-dimensional reservoir state by processing it with many non-linear units, the virtual node concept employs delayed feedback on a single physical device exhibiting strong short-term memory effects. Under a sequential input, the dynamical device state undergoes a non-linear transient response, which is recorded at fixed timesteps to create a set of virtual nodes representing the reservoir state. Hence, the transient device non-linearity performs the temporal processing, and the delay system forms the high-dimensional representation of the reservoir.
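To make the virtual-node construction concrete, the minimal sketch below emulates a single volatile device as a first-order leaky integrator and samples its transient into virtual nodes. The decay model, time constants and gain are illustrative placeholders, not fitted device parameters.

```python
import numpy as np

def virtual_node_reservoir(pulses, dt=1e-3, tau=10e-3, gain=0.1,
                           sample_interval=35e-3, n_nodes=30):
    """Sample the transient of one volatile device into virtual nodes.

    `pulses` is a binary array with one entry per `dt`. The device is
    modelled, purely for illustration, as a leaky integrator: each pulse
    increments the conductance, which then relaxes with time constant
    `tau` (the fading-memory property).
    """
    g, trace = 0.0, []
    for p in pulses:
        g += gain * p              # pulse-driven accumulation
        g *= np.exp(-dt / tau)     # spontaneous diffusive relaxation
        trace.append(g)
    step = int(sample_interval / dt)           # read out every 35 ms
    return np.asarray(trace)[step - 1::step][:n_nodes]
```

Feeding a 1050 ms pulse train through such a model yields the 30-dimensional reservoir state consumed by the readout layer described below.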
Elements of a reservoir layer should ideally possess a fading memory (sometimes called short-term memory or the echo state property) and non-linear internal dynamics [270]. The fading memory plays a key role in extracting features in the temporal domain of the input data stream, while the non-linear internal dynamics enable the projection of temporal features to a high-dimensional state with good separability [271]. The response of the OGB-capped CsPbBr3 NC memristors to low-voltage electrical spikes reveals short-term/fading diffusive dynamics with a relaxation time ≥5 ms for an input pulse of 20 ms duration and 1 V amplitude. Non-linear internal dynamics are evident in four forms: (i) in the transient evolution of the device conductance during stimulation; and, in the final device conductance, as a function of the applied pulse (ii) amplitude, (iii) width and (iv) number (Supplementary Fig. A.26). An additional test of the echo state property reveals that the present device state reflects the input temporal features of the recent past (<23 ms) but not of the far past, enabling efficient capture of short-term dependencies in the input data stream (Supplementary Fig. A.27). Stimulation with pulse streams of different temporal features results in distinct temporal dynamics of the memristor states.
## 4.6 drift perovskite memristors as readout elements
Storing the weights of the fully-connected readout layer of the ANN requires non-volatile synaptic devices. To represent synaptic efficacy, we use the drift-based perovskite memristor configuration, which enables stable access to multiple conductance states. Because synaptic efficacy in ANNs can be either positive or negative, we use two memristor devices, G+ and G−, in a differential architecture to represent a single synapse [272]. Synaptic potentiation is obtained by increasing the conductance of G+, and depression by increasing the conductance of G−, using identical pulses. The effective synaptic strength is expressed by the difference between the two conductances (G+ − G−). Arranged in a crossbar array in this differential configuration, synaptic propagation at the readout layer is realized efficiently, governed by Kirchhoff's current law and Ohm's law, at O(1) time complexity [273].
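The differential read-out can be summarized in a few lines. The sketch below emulates the crossbar read as a matrix product; the conductance bounds follow the values quoted in the "Methods", while everything else is an illustrative assumption.

```python
import numpy as np

G_MIN, G_MAX = 0.1e-3, 3.5e-3   # conductance bounds (S), from "Methods"

def readout_currents(x, g_plus, g_minus):
    """Differential crossbar read: each synapse is a (G+, G-) device pair.

    By Ohm's and Kirchhoff's laws the column currents form a dot product
    of the input vector with the conductances, so the effective weight
    matrix is W = beta * (G+ - G-) with beta = 1 / (G_MAX - G_MIN).
    """
    beta = 1.0 / (G_MAX - G_MIN)
    return x @ (beta * (g_plus - g_minus))   # all columns read in parallel
```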
Like most filament-based memristors, our devices display non-volatile switching across only two stable states (binary) and lack access to true analog conductance states for synaptic efficacy. Such low bit resolution during learning has been empirically shown to cause poor network performance [274, 275]. To gain more granular control over the filament formation, we migrate a recently proposed programming approach for oxide memristors to halide perovskites [276]. We achieve multi-level stable conductance states in the device's low-resistance regime by modulating the programming Icc. In contrast to the undesirable non-linear transformations seen in HfO2 devices, the mapping from Icc to conductance is linear for the drift-based CsPbBr3 NC devices, hence providing a linear mapping to the desired conductance values (see below). This enables controlled single-shot weight updates without requiring a write-verify scheme.
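As a rough illustration of the single-shot scheme, the sketch below inverts an assumed linear Icc → G fit and emulates the stochastic write by sampling around the target conductance, in the spirit of the statistical device model; the slope, offset and spread are hypothetical, not the measured coefficients.

```python
import numpy as np

# Assumed linear fit G = A_SLOPE * Icc + B_OFFSET; the real coefficients
# come from the statistical device measurements, not from this sketch.
A_SLOPE, B_OFFSET = 3.3e-6, 0.0   # S per uA, S (hypothetical values)

def program_device(g_target, sigma=0.05e-3, rng=None):
    """Single-shot write: pick Icc from the inverse linear map, then
    emulate device stochasticity by sampling around the target."""
    rng = rng or np.random.default_rng()
    icc = (g_target - B_OFFSET) / A_SLOPE      # required compliance (uA)
    return rng.normal(loc=A_SLOPE * icc + B_OFFSET, scale=sigma)
```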
We use Icc modulation to train the readout layer of the reservoir network (see "Methods") using statistical measurement data from the devices. For every input pattern received from the reservoir nodes, the readout layer produces a classification prediction via a sigmoid activation function. Depending on the classification error, the desired conductance change of each differential memristor pair per synapse is calculated. The memristive weights are then updated with the corresponding Icc, resulting in the desired conductance values.
## 4.7 classification of neural firing patterns
Next, we present a virtual reservoir neural network [239, 277] simulation with the short-term diffusive configuration of perovskite memristors in the reservoir layer and the long-term stable drift configuration in the trainable readout layer (Fig. 4.4a). The network is tested on the classification of four commonly observed neural firing patterns in the human brain: Bursting, Adaptation, Tonic and Irregular [278]. These spike trains (Supplementary Fig. A.28) are applied to a single perovskite memristor in the reservoir layer, whose diffusive dynamics constitute a short-term memory on the 5-20 ms timescale. We exploit the concept of a virtual reservoir, where each reservoir node is uniformly sampled at finite intervals to emulate the rich non-linear temporal processing of reservoir computing. We use a sampling interval of 35 ms, resulting in a population of 30 virtual reservoir nodes representing the temporal features across the 1050 ms-long neural firing patterns. The device responses are derived from electrical measurements of 25 different memristive devices (Fig. 4.4b); both device-to-device and cycle-to-cycle variability are captured with extensive measurements. Stimulation with "Bursting" spikes results in accumulative behaviour within each high-frequency group and an exponential decay in the inter-group interval, reflecting the fading memory and non-linear internal dynamics described above. "Adapting" patterns trigger accumulative behaviour that weakens with the growing pulse interval, "Irregular" patterns result in random accumulation and decay, while "Tonic" patterns generate states with no observable accumulation. As the last stage of computation, these features are projected to a fully-connected readout layer with 4 sigmoid neurons (see "Methods"). The reservoir network achieves a classification accuracy of 85.1% when trained by modulating the programming Icc of the drift-based perovskite weights in the readout layer (Fig. 4.4c). With double-precision floating-point weights trained with the Delta rule [195] on the readout, the test accuracy is 91.8%, confirming the effectiveness of our Icc approach (Supplementary Figs. A.29, A.30, A.31). The training and test accuracies over 5 epochs demonstrate that neither network overfits the training data (Fig. 4.4, Supplementary Table A.1).
## 4.8 discussion
We present robust halide perovskite NC-based memristive switching elements that can be reconfigured to exhibit both volatile diffusive and non-volatile drift dynamics, a significant advance in the experimental realization of memristors. In comparison to purely volatile or non-volatile memristors, our reconfigurable CsPbBr3 NC memristors can implement both neurons and synapses with the same material/device platform and adapt to diverse computational primitives at run-time, without modifications to the device stack. The closest comparison to our devices are dual-functional memristors, i.e., those that exhibit both volatile and non-volatile switching behaviours without additional materials or device engineering. While impressive demonstrations of dual-functional memristors exist, many devices require an electroforming step to initiate resistive switching and, most importantly, their endurance and retention are often limited to <500 cycles in both modes and ≤10⁴ s, respectively. In comparison, we report a record-high endurance of 2 million cycles in the volatile mode, 5655 cycles in the non-volatile mode, and a retention of 10⁵ s, highlighting the significance of our approach. This makes these devices ideal for always-on online learning systems. The forming-free operation and low set/reset voltages would allow low-power vector-matrix multiplication, while the high retention and endurance ensure precise mapping of synaptic weights during training and inference of artificial neural networks. In contrast to most metal oxide-based diffusive memristors, which require high programming voltages or currents to initiate filament formation (≥1 V and/or ≥10 µA), our devices demonstrate forming-free volatile switching at lower voltages and currents (≤1 V and ≤1 µA). This is possibly due to the lower activation energy for Ag⁺ and Br⁻ migration in halide perovskites compared to oxygen vacancies in oxide dielectrics, the softer lattice of the halide perovskite layer and the large availability of mobile ionic species in the halide perovskite matrix. Most importantly, our devices can be switched back to the volatile mode even after programming multiple non-volatile states, demonstrating true "reconfigurability" (Supplementary Fig. A.32). Such behaviour enables neuromorphic implementations of synapses in SNNs that demand both volatile and non-volatile switching properties simultaneously (see Fig. 4.1a). It is important to note that existing dual-functional devices cannot be reconfigured back to the volatile mode once the non-volatile mode is activated, making our device concept and its use case for neuromorphic computing unique.
In operando thermal camera imaging further supports our hypothesis of better management of the electrochemical reactions with the OGB ligands compared to DDAB. While the exact memristive mechanism is still unclear, our results empirically favour NC film implementations over thin films, and the insights derived on the apt choice of capping ligands pave the way for further investigations of nanocrystal-ligand chemistry for the development of high-performance, robust memristors. The ability to reconfigure the switching mode on demand allows easy implementation of multiple computational layers with a single technology, alleviating the hardware system design requirements of new neuromorphic computational frameworks. Our work complements and goes beyond previous model-based implementations [240] by comprehensively characterizing diffusive and drift devices for ∼5000 patterns of different input spike streams, and by collecting statistical data on device-to-device and cycle-to-cycle variability, device degradation, temporal conductance drift and real-time nanoscopic changes in memristor conductance. This statistical data is incorporated into the simulations for highly accurate modelling of the device behaviour on this task. To the best of our knowledge, this is the first systematic analysis of this extent that uses the same device for both diffusive and drift behaviour on a real-world benchmark. Given the excellent performance and record endurance of our reconfigurable halide perovskite memristors, this work opens the way for a completely novel type of memristive substrate, for applications such as time series forecasting and feature classification.
## 4.9 methods
device fabrication Indium tin oxide (ITO, 7 Ω cm⁻²) coated glass substrates were cleaned by sequential sonication in Hellmanex soap, distilled water, acetone and isopropanol. Substrates were dried and exposed to UV for 15 min. PEDOT:PSS films were deposited by spin-coating the precursors (Clevios Al 4083) at 4000 rpm for 25 s, followed by annealing at 130 °C for 20 min. PolyTPD (poly[N,N'-bis(4-butylphenyl)-N,N'-bisphenylbenzidine]) dissolved in chlorobenzene (4 mg/ml) was then spin-coated at 2000 rpm for 25 s, followed by annealing at 130 °C for 20 min. Solutions of CsPbBr3 NCs capped with DDAB and OGB were next deposited via spin-coating (2000 rpm for 25 s). Finally, 150 nm of Ag was thermally evaporated through shadow masks (100 µm × 100 µm) to complete the device fabrication.
electrical measurements For endurance testing in the volatile mode, write and read voltages of +2 V and +0.1 V were used, respectively, with a pulse width of 5 ms. The following methodology was used: 1. read the current level of the device at +0.1 V; 2. apply +2 V for 5 ms as the write pulse and monitor the device's current level; 3. repeat step 1. For the non-volatile mode, a write voltage of +5 V, an erase voltage of −7 V and a read voltage of +0.1 V were used. The following methodology was used: 1. read the current level of the device at +0.1 V; 2. apply +5 V/−7 V for 5 ms as the write/erase pulse; 3. repeat step 1 and extract the on-off ratio by comparing steps 1 and 3. Note: since our VM loses the stored information upon removing power, the ON state (Ipower-ON) is reported as the current during the application of the programming pulse (at 2 V) and the OFF state (Ipower-OFF) as the current during the application of the reading pulse (at 0.1 V), in alignment with the literature [234]. For endurance measurements in the non-volatile memory (NVM) mode, the conventional methodology was used, i.e., the on-off ratios were extracted from the current values corresponding to the same reading pulse (0.1 V).
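For illustration, the volatile-mode endurance protocol above can be expressed as a simple loop; `read` and `write` are hypothetical stand-ins for source-measure-unit calls, not an actual instrument API.

```python
def endurance_cycle_vm(read, write, n_cycles=100):
    """Volatile-mode endurance loop following the protocol above.

    `read(v)` and `write(v, width)` are hypothetical callbacks that
    apply a voltage (and pulse width) and return the measured current.
    """
    ratios = []
    for _ in range(n_cycles):
        i_power_off = read(0.1)        # step 1: read at +0.1 V
        i_power_on = write(2.0, 5e-3)  # step 2: +2 V, 5 ms write pulse
        ratios.append(i_power_on / i_power_off)  # VM on-off ratio
    return ratios                      # step 3 loops back to step 1
```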
neural spike pattern dataset generation The neural spike pattern dataset consists of samples of four classes: Bursting, Adaptation, Tonic and Irregular. "Bursting" firing patterns are defined as groups of high-frequency spikes with a constant inter-group interval; "Adapting" corresponds to spikes with gradually increasing intervals; "Tonic" denotes low-frequency spikes with a constant interval; and "Irregular" corresponds to spikes that fire irregularly. In total, the dataset consists of 4975 patterns (199 cycles applied to 25 devices) for each of the four types. Each pattern is ∼1050 ms long, where spikes are emulated with square voltage pulses (1 V, 25 ms). For Bursting patterns, each spike train consists of 4-5 high-frequency burst groups (4 spikes per group) with an inter-spike interval (ISI) of 5 ms; between bursts there are 75-125 ms intervals. For Adaptation patterns, each spike train starts with high-frequency pulses with an ISI of 5 ms that gradually increases by 50% with each new spike (with 5% standard deviation). For Tonic patterns, a regular spiking pattern with an average ISI of 70 ms is used; a 5% standard deviation is applied to each ISI. For Irregular patterns, spike trains are divided into 60 ms segments, and a spike is assigned randomly with 50% probability to the beginning of each segment.
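A minimal generator for the four classes, following the parameters above, might look as follows; the 1 ms time grid and the exact jitter handling are simplifying assumptions.

```python
import numpy as np

RNG = np.random.default_rng(0)
T_MS = 1050  # pattern length, 1 ms resolution

def bursting():
    """4-5 burst groups of 4 spikes (ISI 5 ms), 75-125 ms between groups."""
    spikes, t = np.zeros(T_MS), 0
    for _ in range(RNG.integers(4, 6)):
        for k in range(4):
            if t + 5 * k < T_MS:
                spikes[t + 5 * k] = 1
        t += 20 + RNG.integers(75, 126)   # burst duration + inter-group gap
    return spikes

def adapting():
    """ISI starts at 5 ms and grows ~50% per spike (5% jitter)."""
    spikes, t, isi = np.zeros(T_MS), 0.0, 5.0
    while t < T_MS:
        spikes[int(t)] = 1
        t += isi
        isi *= RNG.normal(1.5, 0.05)
    return spikes

def tonic():
    """Regular spiking, mean ISI 70 ms with 5% standard deviation."""
    spikes, t = np.zeros(T_MS), 0.0
    while t < T_MS:
        spikes[int(t)] = 1
        t += RNG.normal(70, 3.5)
    return spikes

def irregular():
    """60 ms segments; a spike at each segment start with p = 0.5."""
    spikes = np.zeros(T_MS)
    for t in range(0, T_MS, 60):
        spikes[t] = RNG.random() < 0.5
    return spikes
```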
simulation of neural networks For classifying neural spike patterns, a fully-connected readout layer with 30 inputs and 4 outputs is used, plus one bias unit at the input. The 4 output neurons are sigmoid neurons. For training, 90% of the neural spike pattern dataset is used over 5 epochs; at the end of each epoch, the network performance is tested on the remaining 10% of the dataset. During Icc-modulated training, each synapse comprises two conductance values in a differential configuration. The differential current is scaled such that W = β(G+ − G−), where β = 1/(Gmax − Gmin), with Gmax and Gmin the maximum and minimum allowed conductance values of the memristors. Conductances are initialized randomly from a Normal distribution (µG = 0.5 mS and σG = 0.1 mS). The network prediction is selected deterministically by choosing the output neuron with the maximum activation. After the prediction, the L1 loss is calculated. Then, the weight change corresponding to a gradient step on the loss is calculated as ∆W = (η xi δj)/β, where η is the learning rate, xi is the reservoir node output, δj is the calculated error of output neuron j, and 1/β is the scaling factor between weights and conductances. Target conductances are clipped between 0.1 mS and 3.5 mS. Subsequently, the Icc values corresponding to the target conductances are calculated (Supplementary Fig. A.31). Finally, we sample new conductance values from a Normal distribution whose mean and standard deviation are calculated as linear functions of Icc. For the double-precision floating-point training, the same readout layer size is used; the network loss is calculated via the mean squared error, and weights are adjusted using the Delta rule with an adaptive learning rate [175]. Both networks are trained with a batch size of 1 and suitably tuned hyperparameters.
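The Icc-modulated update described above can be sketched as follows, reusing the differential scaling W = β(G+ − G−); the learning rate and the `program` callback (standing in for the stochastic device write sketched earlier) are hypothetical.

```python
import numpy as np

G_MIN, G_MAX = 0.1e-3, 3.5e-3   # clipping bounds from "Methods"
BETA = 1.0 / (G_MAX - G_MIN)
ETA = 0.1                        # hypothetical learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, target, g_plus, g_minus, program):
    """One Icc-modulated update of the differential readout weights.

    `x` holds the 30 virtual-node states plus a bias; `program` is a
    stand-in for the stochastic device write, applied element-wise to
    the target conductance arrays.
    """
    y = sigmoid(x @ (BETA * (g_plus - g_minus)))
    delta = target - y                     # per-neuron output error
    dw = ETA * np.outer(x, delta) / BETA   # weight step -> conductance step
    # Potentiation raises G+, depression raises G-, with identical pulses.
    g_plus = np.clip(g_plus + np.maximum(dw, 0.0), G_MIN, G_MAX)
    g_minus = np.clip(g_minus + np.maximum(-dw, 0.0), G_MIN, G_MAX)
    return program(g_plus), program(g_minus)
```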
Figure 4.4: (a) An ANN is trained to perform classification using the temporal properties of the reservoir, in response to a series of inputs representing neural firing patterns. Using Icc control, OGB-capped CsPbBr3 NC memristors are configured to the diffusion-based volatile mode to serve as virtual nodes in the reservoir, and to the drift-based non-volatile mode to implement synaptic weights in the ANN readout layer. During a single inference, a neural firing pattern, represented as a short voltage pulse train, is applied to a single diffusive-mode perovskite device. Based on the virtual node concept [239], temporal features of the input signal are intrinsically encoded in the evolving conductance of the device due to its non-linear short-term memory effects. This evolving device state is sampled at equal intervals of 35 ms, yielding 30 virtual nodes that jointly represent the reservoir state. These virtual node states are delayed and fed into the readout layer, whose weights Wji (size 30 × 4) are implemented by the drift-mode non-volatile perovskite memristors placed in a differential configuration [115]. (b-e) Classification of neural firing patterns. (b) Experiments: the memristive reservoir elements are stimulated using four common neural firing input patterns, "Bursting", "Adapting", "Tonic" and "Irregular". During the presentation of inputs, the evolution of the device conductance is monitored. Each spike in the input data stream is realized as a voltage pulse of 1 V amplitude and 20 ms duration, while the device states are read with −0.5 V, 5 ms pulses. (c) Distribution of the programmed perovskite memristor non-volatile conductances with Icc modulation; the inset shows the simulated linear Icc → G relation. (d) Simulations: the normalized confusion matrix shows the classification results with the Icc-controlled training scheme. The RC performs slightly worse on irregular patterns due to the lack of temporal correlations among samples. (e) Training (86.75%) and test (85.14%) accuracies of the fully-memristive RC framework.
## MOSAIC: AN ANALOG SYSTOLIC ARCHITECTURE FOR IN-MEMORY COMPUTING AND ROUTING
This chapter's content was published in Nature Communications, where it was featured as one of the 50 best recently published papers in the field. The original publication is authored by Yigit Demirag∗, Thomas Dalgaty∗, Filippo Moro∗, Alessio De Pra, Giacomo Indiveri, Elisa Vianello and Melika Payvand.
∗ These authors contributed equally.
The brain's connectivity is locally dense and globally sparse, forming a small-world graph, a principle prevalent in the evolution of various species and suggesting a universal solution for efficient information routing. However, current artificial neural network circuit architectures do not fully embrace small-world neural network models. Here, we present the neuromorphic Mosaic: a non-von Neumann systolic architecture employing distributed memristors for in-memory computing and in-memory routing that efficiently implements small-world graph topologies for SNNs. We have designed, fabricated and experimentally demonstrated the Mosaic's building blocks, using memristors integrated in 130 nm CMOS technology. We show that, thanks to enforcing locality in the connectivity, the routing efficiency of the Mosaic is at least one order of magnitude higher than that of other SNN hardware platforms, while the Mosaic achieves competitive accuracy on a variety of edge benchmarks. The Mosaic offers a scalable approach for edge systems based on distributed spike-based computing and in-memory routing.
## 5.1 introduction
Despite millions of years of evolution, the fundamental wiring principle of biological brains has been preserved: dense local and sparse global connectivity through synapses between neurons. This persistence indicates the efficiency of this solution in optimizing both computation and the utilization of the underlying neural substrate [30]. Studies have revealed that this connectivity pattern in neuronal networks increases signal propagation speed [279], enhances echo-state properties [280] and allows for more synchronized global networks [281]. While densely connected neurons in the network are attributed to performing functions such as integration and feature extraction [282], long-range sparse connections may play a significant role in the hierarchical organization of such functions [283]. Such neural connectivity is called small-worldness in graph theory and is widely observed in the cortical connections of the human brain [279, 284, 285] (Fig. 5.1a, b). Small-world connectivity matrices, representing neuronal connections, display a distinctive pattern with a dense diagonal and progressively fewer connections between neurons as their distance from the diagonal increases (see Fig. 5.1c).
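As a toy illustration of such a connectivity matrix, a connection probability that decays with distance from the diagonal reproduces the dense-local/sparse-global pattern; the parameters below are illustrative, not fitted to anatomical data.

```python
import numpy as np

def small_world_adjacency(n=128, p0=0.8, decay=0.1, seed=1):
    """Toy connectivity matrix: connection probability decays with
    distance from the diagonal, giving dense local and sparse global
    links."""
    rng = np.random.default_rng(seed)
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    p = p0 * np.exp(-decay * np.abs(i - j))   # dense diagonal band
    adj = (rng.random((n, n)) < p).astype(np.uint8)
    np.fill_diagonal(adj, 0)                  # no self-connections
    return adj
```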
Crossbar arrays of non-volatile memory technologies, e.g., floating gates [287], RRAM [139, 191, 288-291] and PCM [138, 292-294], have previously been proposed as a means of realizing artificial neural networks in hardware (Fig. 5.1d). These computing architectures perform in-memory vector-matrix multiplication, the core operation of artificial neural networks, reducing data movement, and consequently power consumption, relative to conventional von Neumann architectures [144, 287, 295-299].
However, existing crossbar array architectures are not inherently efficient for realizing small-world neural networks at all scales. Implementing a network with small-world connectivity in a large crossbar array would result in under-utilization of the off-diagonal memory elements (i.e., a ratio of non-allocated to allocated connections >10) (see Fig. 5.1d and Supplementary Note 1). Furthermore, analog hardware non-idealities such as current sneak paths, the parasitic resistance and capacitance of the metal lines, excessively large read currents and diminishing yield limit the maximum practical size of crossbar arrays [99-101].
These constraints also apply to biological networks: since resistance attenuates the spread of the action potential, cytoplasmic resistance sets an upper bound on the length of dendrites [30].
Figure 5.1: Small-world graphs in biological networks and how to build them into a hardware architecture for edge applications. (a) Depiction of the small-world property in the brain, with highly-clustered neighboring regions highlighted in the same color. (b) The network connectivity of the brain is a small-world graph, with highly clustered groups of neurons and sparse connectivity among them. (c) The functional connectivity matrix derived from anatomical data (adapted from Bullmore and Sporns 2009 [286]), with rows and columns representing neural units. The diagonal region of the matrix (darkest colors) contains the strongest connectivity, representing connections between neighboring regions; the off-diagonal elements are not connected. (d) Hardware implementation of the connectivity matrix of (c), with neurons and synapses arranged in a crossbar architecture. The red squares represent the group of memory devices on the diagonal, connecting neighboring neurons. Black squares show the group of memory devices that are never programmed in a small-world network and are thus marked as 'wasted'. (e) The Mosaic architecture breaks the large crossbar into small densely-connected crossbars (green Neuron Tiles) connected through small routing crossbars (blue Routing Tiles). This gives rise to a distributed two-dimensional mesh with highly connected clusters of neurons that communicate through routers. (f) The state of the resistive memory devices in the Neuron Tiles determines how information is processed, while the state of the routing devices determines how it propagates in the mesh. The resistive memory devices are integrated in 130 nm technology. (g) Required memory (number of memristors) as a function of the number of neurons per tile, for different total numbers of neurons in the network. The horizontal dashed line indicates the number of memory bits required by a fully-connected RRAM crossbar array; the cross (X) marks the cross-over point below which the Mosaic approach becomes favorable. (h) The Mosaic can be used for a variety of edge AI applications, benchmarked here on sensory processing and reinforcement learning tasks.
Hence, the intrinsic physical structure of nervous systems necessitates the use of local over global connectivity.
Drawing inspiration from the biological solution to the same problem leads to (i) a similar optimal silicon layout, a small-world graph, and (ii) a similar information transfer mechanism, electrical pulses or spikes. A large crossbar can be divided into an array of smaller, more locally connected crossbars; these correspond to the green squares of Fig. 5.1e. Each green crossbar hosts a cluster of spiking neurons with a high degree of local connectivity. To pass information among these clusters, small routers are placed between them, the blue tiles in Fig. 5.1e. We call this two-dimensional systolic matrix of distributed crossbars, or tiles, the neuromorphic Mosaic architecture. Each green tile serves as an analog computing core, which sends out information in the form of spikes, while each blue tile serves as a routing core that spreads the spikes throughout the mesh to other green tiles. Thus, the Mosaic takes advantage of distributed, decentralized computing and routing to enable not only in-memory computing but also in-memory routing (Fig. 5.1f). Though the Mosaic architecture is independent of the choice of memory technology, here we take advantage of resistive memory for its non-volatility, small footprint, low access time and power, and fast programming [300].
Neighborhood-based computing with resistive memory has previously been explored using cellular neural networks [301, 302], self-organizing maps (SOMs) [303] and the crossnet architecture [304]. Though cellular architectures use local clustering, their lack of global connectivity limits both the speed of information propagation and their configurability; their applications have therefore mostly been limited to low-level image processing [305]. This also applies to SOMs, which exploit neighboring connectivity and are typically trained with unsupervised methods to visualize low-dimensional data [306]. Similarly, the crossnet architecture proposed the use of distributed small tilted crossbars integrated on top of the CMOS substrate to create local connectivity domains for image processing [304]. The tilted crossbars allow the nanowire feature size to be independent of the CMOS technology node [307]. However, this approach requires extra post-processing lithographic steps in the fabrication process, which has so far limited its realization.
Unlike most previous approaches, the Mosaic supports both dense local connectivity and globally sparse long-range connections by introducing re-configurable routing crossbars between the computing tiles. This allows specific small-world network configurations to be flexibly programmed and compiled onto the Mosaic to solve the desired task. Moreover, the Mosaic is fully compatible with standard integrated RRAM/CMOS processes available at foundries, without the need for extra post-processing steps. Specifically, we have designed the Mosaic for small-world SNNs, where the communication between tiles is through electrical pulses, or spikes. In the realm of SNN hardware, the Mosaic goes beyond the Address-Event Representation (AER) [103, 308, 309], the standard spike-based communication scheme, by removing the need to store each neuron's connectivity information in either bulky local or centralized memory units, which draw static power and can consume a large chip area (Supplementary Note 2).
In this Article, we first present the Mosaic architecture. We report electrical circuit measurements of the Neuron and Routing Tiles that we designed and fabricated in 130 nm CMOS technology co-integrated with hafnium dioxide-based RRAM devices. Then, calibrated on these measurements and using a novel method for training small-world neural networks that exploits the intrinsic layout of the Mosaic, we run system-level simulations on a variety of edge computing tasks (Fig. 5.1h). Finally, we compare our approach to other neuromorphic hardware platforms, highlighting a significant reduction in spike-routing energy of between one and four orders of magnitude.
## 5.2 mosaic hardware computing and routing measurements
In the Mosaic (Fig. 5.1e), each tile consists of a small memristor crossbar that can receive spikes from, and transmit spikes to, its neighboring tiles in the North (N), South (S), East (E) and West (W) directions (Supplementary Note 4). The memristive crossbar arrays in the green Neuron Tiles store the synaptic weights of several LIF neurons. These neurons are implemented using analog circuits and are located at the termination of each row, emitting voltage spikes at their outputs [29]. The spikes from a Neuron Tile are copied in the four directions N, S, E and W and communicated between Neuron Tiles through a mesh of blue Routing Tiles, whose crossbar arrays store the connectivity pattern between Neuron Tiles. The Routing Tiles in the different directions decide whether or not a received spike should be communicated further. Together, the two tile types give rise to a continuous mosaic of neuromorphic computation and memory for realizing small-world SNNs.
Small-world topology can be obtained by randomly programming memristors in a computer model of the Mosaic (see Methods and Supplementary Note 3). The resulting graph exhibits an intriguing set of connection patterns that reflect many of the small-world graphical motifs observed in animal nervous systems: central 'hub-like' neurons with connections to numerous nodes, reciprocal connections between pairs of nodes reminiscent of winner-take-all mechanisms, and heavily connected local neural clusters [285]. If desired, these graph properties can be adapted on the fly by re-programming the RRAM states in the two tile types. For example, a set of desired small-world graph properties can be achieved by randomly programming the RRAM devices into their HCS with a certain probability (Supplementary Note 3). Such random programming can be achieved elegantly by simply modulating the RRAM programming voltages [125].
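As a concrete illustration of this probabilistic programming, a minimal NumPy sketch is given below. It builds a binary adjacency matrix in which intra-tile devices are programmed into the HCS with high probability and inter-tile devices with low probability; the probability values, tile sizes, and function name are illustrative assumptions, not the exact procedure of Supplementary Note 3.

```python
import numpy as np

def random_small_world_mask(n_tiles, n_per_tile, p_local=0.6, p_global=0.05,
                            seed=0):
    """Toy model of randomly programming Mosaic RRAMs into their HCS.

    Intra-tile synapses are set with high probability (dense local
    clusters), inter-tile links with low probability (sparse long-range
    connections). Returns a binary adjacency matrix over all neurons.
    """
    rng = np.random.default_rng(seed)
    n = n_tiles * n_per_tile
    tile_of = np.arange(n) // n_per_tile            # tile index of each neuron
    same_tile = tile_of[:, None] == tile_of[None, :]
    p = np.where(same_tile, p_local, p_global)      # per-device HCS probability
    return (rng.random((n, n)) < p).astype(np.int8)

adj = random_small_world_mask(n_tiles=16, n_per_tile=4)
print("mean in-degree:", adj.sum(axis=0).mean())
```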
For Mosaic-based small-world graphs, we estimate the required number of memory devices (synaptic and routing weights) as a function of the total number of neurons in a network, through a mathematical derivation (see Methods). Fig. 5.1g plots the memory footprint as a function of the number of neurons in each tile for different network sizes. Horizontal dashed lines show the number of memory elements when using one large crossbar for each network size, as has previously been used for RNN implementations [310]. The cross-over points, at which the Mosaic memory footprint becomes favorable, are denoted with a cross. While for smaller network sizes (e.g., 128 neurons) no memory reduction is observed compared to a single large array, the memory saving becomes increasingly important as the network is scaled up. For example, given a network of 1024 neurons and 4 neurons per Neuron Tile, the Mosaic requires almost one order of magnitude fewer memory devices than a single crossbar implementing an equivalent network model.
neuron tile circuits: small-worlds Each Neuron Tile in the Mosaic (Fig. 5.2a) is composed of multiple rows, each a circuit that models a LIF neuron and its synapses. The details of one neuron row are shown in Fig. 5.2b. It has N parallel one-transistor-one-resistor (1T1R) RRAM structures at its input. The synaptic weights of each neuron are stored in the conductance levels of the RRAM devices in one row. On the arrival of any of the input events V_in<i>, the amplifier pins node V_x to V_top, and thus a read voltage equivalent to V_top - V_bot is applied across G_i, giving rise to current i_in at M1, and in turn to i_buff. This current pulse is mirrored through I_w to the 'synaptic dynamics' circuit, a Differential Pair Integrator (DPI) [311], which low-pass filters it by charging the capacitor M9 in the presence of the pulse and discharging it through the current I_tau in its absence. The charge/discharge of M9 generates an exponentially decaying current, I_syn, which is injected into the neuron's membrane potential node, V_mem, and charges capacitor M13. The capacitor leaks through M11, at a rate controlled by V_lk at its gate. As soon as the voltage developed on V_mem passes the threshold of the following inverter stage, a pulse is generated at V_out. The refractory period time constant depends on the capacitor M16 and the bias on V_rp. (For a more detailed explanation of the circuit, please see Supplementary Note 5.)
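The row just described can be summarized as a discrete-time behavioral model: an input event reads out the RRAM conductance, the resulting current is low-pass filtered (the DPI stage), and the filtered current is integrated on a leaky membrane with a threshold, reset, and refractory period. The NumPy sketch below is such a behavioral rendition with placeholder time constants, threshold, and gain; it is not a SPICE-level model of the fabricated circuit.

```python
import numpy as np

def simulate_neuron_row(spikes_in, G, dt=1e-4, tau_syn=10e-3,
                        tau_mem=20e-3, v_th=1.0, t_refr=5e-3, gain=50.0):
    """Behavioral model of one Mosaic neuron row.

    spikes_in: (T, N) binary input events on V_in<i>
    G:         (N,)  RRAM conductances (siemens), i.e. the row's weights
    Conductances are normalized so the dynamics are unitless.
    """
    w = gain * np.asarray(G, dtype=float) / np.max(G)   # normalized weights
    i_syn, v_mem, refr = 0.0, 0.0, 0.0
    out = np.zeros(len(spikes_in), dtype=np.int8)
    for t, s in enumerate(spikes_in):
        i_syn += float(s @ w) - dt * i_syn / tau_syn    # DPI-like low-pass
        refr = max(0.0, refr - dt)
        if refr == 0.0:
            v_mem += dt * (i_syn - v_mem / tau_mem)     # leaky integration
            if v_mem > v_th:                            # fire, reset, refract
                out[t], v_mem, refr = 1, 0.0, t_refr
    return out

rng = np.random.default_rng(1)
spk = rng.random((2000, 4)) < 0.01                      # sparse input events
print(simulate_neuron_row(spk, [4e-6, 48e-6, 64e-6, 147e-6]).sum(), "spikes")
```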
We have fabricated and measured the circuits of the Neuron Tile in a 130 nm CMOS technology integrated with RRAM devices [119]. The measurements were done at the wafer level, using the probe station shown in Fig. 5.2c. In the fabricated circuit, we statistically characterized the RRAMs through iterative programming [312] and control of the programming current, resulting in nine stable conductance states, G, shown in Fig. 5.2d. After programming each device, we apply a pulse on V_in<0> and measure the voltage on V_syn, which is the voltage developed on the M9 capacitor. We repeat the experiment for four different conductance levels of 4 µS, 48 µS, 64 µS and 147 µS. The resulting V_syn traces are plotted in Fig. 5.2e. V_syn starts from an initial value close to the power supply, 1.2 V. The amount of discharge depends on the I_w current, which is a linear function of the conductance value of the RRAM, G. The higher the G, the higher the
Figure 5.2: Experimental results from the neuron column circuit. (a) Neuron Tile, a crossbar with feed-forward and recurrent inputs displaying network parameters represented by colored squares. (b) Schematic of a single row of the fabricated crossbars, where RRAMs represent neuron weights. Insets show scanning and transmission electron microscopy images of the 1T1R stack with a hafnium-dioxide layer sandwiched between memristor electrodes. Upon input events V_in<i>, V_top - V_bot is applied across G_i, yielding i_in and subsequently i_buff, which feeds into the synaptic dynamics block, producing the exponentially decaying current I_syn, with a time constant set by MOS capacitor M9 and bias current I_tau. Integration of I_syn into the neuron membrane potential V_mem triggers an output pulse (V_out) upon exceeding the inverter threshold. The refractory period is regulated through MOS capacitor M16 and the V_rp bias. (c) The wafer-level measurement setup utilizes an Arduino for logic circuitry management to program the RRAMs and a B1500 Device Parameter Analyzer to read device conductance. (d) Cumulative distributions of RRAM conductance (G) resulting from iterative programming in a 4096-device RRAM array with varied SET programming currents. (e) V_syn, initially at 1.2 V, decreases as capacitor M9 discharges upon pulse arrival at time 0. The discharge magnitude depends on I_w, set by G. V_syn curves are recorded for four conductance values. (f) An input pulse train (gray pulses) at V_in<0> increases the zeroth neuron's V_mem (purple trace) until it fires (light blue trace) after six pulses, causing feedback influence on neuron 1's V_mem. (g) Statistical measurements of the peak membrane potential in response to a pulse across a 5-neuron array over 10 cycles. (h) Neuron output frequency correlates linearly with G, with error bars reflecting variability across 4096 devices.
I_w, and the larger the decrease in V_syn, resulting in a higher I_syn, which is integrated by the neuron membrane V_mem. The peak value of the membrane potential in response to a pulse is measured across one array of 5 neurons, each with a different conductance level (Fig. 5.2g). Each pulse increases the membrane potential according to the corresponding conductance level, and once it hits a threshold, the neuron generates an output spike (Fig. 5.2f). The peak value of the neuron's membrane potential, and thus its firing rate, is proportional to the conductance G, as shown in Fig. 5.2h. The error bars on the plot show the variability of the devices in the 4 kb array. It is worth noting that this implementation does not account for negative weights, as the design focused on the proof of concept. Negative weights could be implemented using a differential signaling approach, with two RRAMs per synapse [115].
routing tile circuits: connecting small-worlds A Routing Tile circuit is shown in Fig. 5.3a. It acts as a flexible means of configuring how spikes emitted by Neuron Tiles propagate locally between small-worlds. The routed message is a spike, which is either
Figure 5.3: Experimental measurements of the fabricated Routing Tile circuits. (a) The Routing Tile, a crossbar whose memory state steers the input spikes from different directions towards the destination. (b) Detailed schematic of one row of the fabricated routing circuits. On the arrival of a spike at any of the input ports of the Routing Tile, V_in<i>, a current proportional to G_i flows in i_in, similar to the Neuron Tile. A current comparator compares this current against a reference current, I_ref, a bias current generated on chip by providing a DC voltage from the I/O pads to the gate of an on-chip transistor. If i_in > i_ref, the spike is regenerated, thus 'passed', or is 'blocked' otherwise. (c) Wafer-level measurements of the test circuits through the probe station test setup. (d) Measurements from a 4 kb array show the Cumulative Distribution Function (CDF) of the RRAM in its High Conductive State (HCS) and Low Conductive State (LCS). The line separating the two distributions is taken as the 'Threshold' conductance, which is the decision boundary for passing or blocking spikes. Based on this Threshold value, the I_ref bias in panel (b) is determined. (e) Experimental results from the Routing Tile, with continuous and dashed blue traces showing the waveforms applied to the <N> and <S> inputs, while the orange trace shows the response of the output towards the <E> port. The <E> output port follows the <N> input, as the corresponding device is programmed into the HCS, while the input from the <S> port is blocked because the corresponding RRAM device is in its LCS. (f) A binary checkerboard pattern is programmed into the routing array, showing a ratio of 10 between the High Resistive and Low Resistive states, which sets a clear boundary for the binary decision required by the Routing Tile.
blocked by the router if the corresponding RRAM is in its HRS, or passed otherwise. The functional principles of the Routing Tile circuits are similar to those of the Neuron Tiles. The principal difference is the replacement of the synapse and neuron circuit models with a simple current comparator circuit (highlighted with a black box in Fig. 5.3b). The measurements were done at the wafer level, using the probe station shown in Fig. 5.3c. On the arrival of a spike at an input port of the Routing Tile, V_in<i>, 0 < i < N, a current proportional to G_i flows through the device, giving rise to the read current i_buff. A current comparator compares i_buff against i_ref, a bias generated on chip by providing a voltage from the I/O pad to the gate of a transistor (not shown in the figure). The I_ref value is set based on the 'Threshold' conductance boundary in Fig. 5.3d. The Routing Tile regenerates a spike if the resulting i_buff is higher than i_ref, and blocks it otherwise, in which case the output remains at zero. The state of the device therefore serves to either pass or block input spikes arriving from the different input ports (N, S, W, E), sending them to its output ports (Supplementary Note 4). Since the routing array acts as a binary pass/no-pass element, the decision boundary is whether the device is in its HCS or LCS, as shown in Fig. 5.3d [313]. Using a fabricated Routing Tile circuit, we demonstrate its functionality experimentally in Fig. 5.3e. Continuous and dashed blue traces show the waveforms applied to the <N> and <S> inputs of the tile respectively, while the orange trace shows the response of the output towards the <E> port. The <E> output port follows the <N> input because the corresponding RRAM is programmed into its HCS, while the input from the <S> port is blocked because the corresponding RRAM device is in its LCS, and thus the output remains at zero. This output pulse propagates onward to the next tile.
Note that in Fig. 5.3e the output spike does not appear rectangular due to the large capacitive load of the probe station (see Methods). To allow for greater reconfigurability, more channels per direction can be used in the Routing Tiles (see Supplementary Note 6).
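Functionally, each Routing Tile row reduces to a thresholded gating of its input ports. The sketch below captures this pass/block behavior in NumPy; the threshold value and function name are assumptions for illustration, and on chip the decision is made by the analog current comparator rather than in software.

```python
import numpy as np

HCS_THRESHOLD = 75e-6   # conductance separating HCS from LCS (illustrative;
                        # on chip it is set via the I_ref bias, cf. Fig. 5.3d)

def route_spikes(spikes_in, G_row):
    """Binary pass/block behavior of one Routing Tile row.

    spikes_in: (T, 4) boolean events on the <N>, <S>, <W>, <E> input ports.
    G_row:     (4,)   RRAM conductances of the row. A spike is regenerated
    whenever its read current exceeds the reference, i.e. whenever the
    corresponding device is in its HCS.
    """
    passes = np.asarray(G_row) > HCS_THRESHOLD       # comparator decision
    return spikes_in & passes                        # per-port gating

spikes = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0]], dtype=bool)        # an <N> then an <S> event
print(route_spikes(spikes, [120e-6, 10e-6, 5e-6, 8e-6]))
# only the <N> event passes: its device is in the HCS, as in Fig. 5.3e
```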
## 5.3 analog hardware-aware simulations
## Application to real-time sensory-motor processing through hardware-aware simulations
The Mosaic is programmable hardware well suited to running pre-trained small-world RSNNs in energy- and memory-constrained applications at the edge. Through hardware-aware simulations, we assess the suitability of the Mosaic on a series of representative sensory processing tasks: anomaly detection in heartbeats (wearable devices), keyword spotting (voice command), and motor system control (robotics) (Fig. 5.4a,b,c respectively). We apply these tasks to three network cases: (i) a non-constrained RSNN with full-precision weights (32-bit Floating Point, FP32) (Fig. 5.4d), (ii) Mosaic-constrained connectivity with FP32 weights (Fig. 5.4e), and (iii) Mosaic-constrained connectivity with noisy and quantized RRAM weights (Fig. 5.4f). Case (iii) is therefore fully hardware-aware, including the architecture choices (e.g., number of neurons per Neuron Tile), connectivity constraints, and the noise and quantization of weights.
For training case (i), we use BPTT [314] with surrogate gradient approximations of the derivative of the LIF neuron activation function on a vanilla RSNN [165] (see Methods). For training case (ii), we introduce a Mosaic-regularized cost function during training, which leads to a learned weight matrix with small-world connectivity that is mappable onto the Mosaic (see Methods). For case (iii), we quantize the weights using a mixed hardware-software experimental methodology whereby memory elements in a Mosaic software model are assigned conductance values programmed into corresponding memristors in a fabricated array. Programmed conductances are obtained through a closed-loop programming strategy [312, 315-317].
For all networks and tasks, the input is fed as a spike train and the output class is identified as the neuron with the highest firing rate. The RSNN of case (i) includes a standard input layer, a recurrent layer, and an output layer. In the Mosaic cases (ii) and (iii), the inputs are fed directly into the Mosaic Neuron Tiles from the top left, are processed in the small-world RSNN, and the resulting output is taken directly from the opposing side of the Mosaic, by assigning some of the Neuron Tiles as output neurons. As the inputs and outputs are part of the Mosaic fabric, this scheme avoids the explicit input and output readout layers of the RSNN, which may greatly simplify a practical implementation.
ecg anomaly detection We first benchmark our approach on a binary classification task: detecting anomalies in the Electrocardiogram (ECG) recordings of the MIT-BIH Arrhythmia Database [318]. To make the data compatible with the RSNN, we first encode the continuous ECG time series into trains of spikes using a delta-modulation technique, which encodes the relative changes in signal magnitude [319, 320] (see Methods). An example heartbeat and its spike encoding are plotted in Fig. 5.4a.
The accuracy over the test set for five iterations of training, transfer, and test for cases (i) (red), (ii) (green) and (iii) (blue) is plotted in Fig. 5.4g as a boxplot. Although the Mosaic constrains the connectivity to follow a small-world pattern, the median accuracy of case (ii) drops by only 3% compared to the non-constrained RSNN of case (i). Introducing the quantization and noise of the RRAM devices in case (iii) drops the median accuracy by another 2%, resulting in a median accuracy of 92.4%. As often reported, the variation in accuracy in case (iii) also increases, due to the cycle-to-cycle variability of RRAM devices [317].
keyword spotting (kws) We then benchmarked our approach on a 20-class speech task using the Spiking Heidelberg Digits (SHD) dataset [206]. SHD includes the spoken digits between zero and nine in English and German, uttered by 12 speakers. In this dataset, the speech signals have been encoded into spikes using a biologically-inspired cochlea model, which effectively computes a spectrogram with Mel-spaced filter banks and converts it into instantaneous firing rates [206].
The accuracy over the test set for five iterations of training, transfer, and test for cases (i) (red), (ii) (green) and (iii) (blue) is plotted in Fig. 5.4h as a boxplot. The dashed red box is taken directly from the SHD paper [206]. The Mosaic connectivity constraints cause only an approximately 2.5% drop in accuracy, with a further 1% drop when introducing RRAM quantization and noise constraints. Furthermore, we experimented with various numbers of Neuron Tiles and of neurons within each Neuron Tile (Supplementary Note 8, Fig. S10), as well as sparsity constraints (Supplementary Note 8, Fig. S11), as hyperparameters. We found that optimal performance is achieved when an adequate amount of neural resources is allocated to the task.
motor control by reinforcement learning Finally, we also benchmark the Mosaic on a motor system control RL task, the half-cheetah [321]. RL has applications ranging from active sensing via camera control [322] to dexterous robot locomotion [323].
To train the network weights, we employ the evolutionary strategies (ES) of Salimans et al. [113] in reinforcement learning settings [324-326]. ES stochastically perturbs the network parameters, evaluates the population fitness on the task, and updates the parameters using a stochastic gradient estimate, in a way that scales well for RL.
Fig. 5.4i shows the maximum gained reward for five runs in cases (i), (ii), and (iii), indicating that the network learns an effective policy for forward running. Unlike in the ECG and KWS tasks, the network connectivity constraints and parameter quantization have relatively little impact here.
Encouragingly, across three highly distinct tasks, performance was only slightly impacted when passing from an unconstrained neural network topology to a noisy small-world neural network. In particular, for the half-cheetah RL task, this had no impact.
## 5.4 benchmarking routing energy in neuromorphic platforms
In-memory computing greatly reduces the energy consumption inherent to data movement in von Neumann architectures. Although crossbars bring memory and computing together, as neural networks are scaled up, neuromorphic hardware will require an array of distributed crossbars (or cores) due to physical constraints such as IR drop and capacitive charging [101]. Small-world networks may naturally minimize the communication between these crossbars, but a certain energy and latency cost associated with data movement will remain, since the compilation of a small-world network onto a general-purpose routing architecture is not ideal. Hardware specifically designed for small-world networks will ideally minimize these energy and latency costs (Fig. 5.1g). To understand how the spike-routing efficiency of the Mosaic compares to other SNN hardware platforms, optimized for other metrics such as maximizing connectivity, we compare the energy and latency of (i) routing one spike within a core (0-hop), (ii) routing one spike to a neighboring core (1-hop), and (iii) the total routing power consumption required for tasks A and B, i.e., heartbeat anomaly detection and spoken digit classification respectively (Fig. 5.4a,b).
The results are presented in Table 5.1. We report the energy and latency figures both in the original technology in which each system was designed and scaled to the 130 nm technology in which the Mosaic circuits are designed, using general scaling laws [327]. The routing power estimates for tasks A and B are obtained by evaluating the 0- and 1-hop routing energy and the number of spikes required to solve the tasks, neglecting any other circuit overheads. In particular, the optimization of the sparsity of connections between neurons implemented to train the Mosaic ensures that 95% of the spikes are routed with 0-hop operations, while about 4% of the spikes are routed via 1-hop operations. The remaining spikes require k hops to reach the destination Neuron Tile. The routing energy consumption for tasks A and B is estimated by accounting for the total spike count and the routing hop partition.
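This estimate reduces to simple arithmetic, reproduced in the sketch below with the Mosaic energy figures from Table 5.1. The spike rate and the assumption that the residual 1% of spikes travel exactly two hops are placeholders; the actual estimates use the measured task spike counts.

```python
# Back-of-the-envelope routing power following the hop partition above.
E_0HOP = 400e-15                     # J, Mosaic 0-hop routing energy
E_1HOP = 1.6e-12                     # J, Mosaic 1-hop routing energy
HOP_FRACTIONS = {0: 0.95, 1: 0.04, 2: 0.01}   # k-hop share of all spikes

def routing_power(spikes_per_second):
    """Average routing power, with k-hop energy taken as k * 1-hop energy."""
    e_mean = sum(frac * (E_0HOP if k == 0 else k * E_1HOP)
                 for k, frac in HOP_FRACTIONS.items())
    return spikes_per_second * e_mean

print(f"{routing_power(1e3) * 1e12:.0f} pW at 1 kspike/s")   # ~476 pW
```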
The scaled energy figures show that although the Mosaic's design has not been optimized for energy efficiency, its 0- and 1-hop routing energy is reduced relative to other approaches,
Figure 5.4: Benchmarking the Mosaic against three edge tasks: heartbeat (ECG) arrhythmia detection, keyword spotting (KWS), and motor control by reinforcement learning (RL). (a,b,c) A depiction of the three tasks, along with the corresponding input presented to the Mosaic. (a) ECG task, where each of the two-channel waveforms is encoded into up (UP) and down (DN) spiking channels, representing the direction of the signal derivative. (b) KWS task, with the spikes representing the density of information in different input (frequency) channels. (c) Half-cheetah RL task, with input channels representing the state space, consisting of the positional values of different body parts of the cheetah, followed by the velocities of those individual parts. (d,e,f) Depiction of the three network cases applied to each task. (d) Case (i) involves a non-constrained Recurrent Spiking Neural Network (RSNN) with full-precision weights (FP32), encompassing an input layer, a recurrent layer, and an output layer. (e) Case (ii) represents Mosaic-constrained connectivity with FP32 weights, omitting explicit input and output layers. Input directly enters the Mosaic, and output is extracted directly from it. Circular arrows denote local recurrent connections, while straight arrows signify sparse global connections between cores. (f) Case (iii) is similar to case (ii), but with noisy and quantized RRAM weights. (g,h,i) A comparison of task accuracy among the three cases: case (i) (red, leftmost box), case (ii) (green, middle box), and case (iii) (blue, rightmost box). Boxplots display accuracy/maximum reward across five iterations, with boxes spanning the upper and lower quartiles and whiskers extending to the maximum and minimum values. Median accuracy is represented by a solid horizontal line, with the corresponding value indicated on top of each box. The dashed red box for the KWS task with the FP32 RSNN network is included from Cramer et al., 2020 [206], with 1024 neurons, for comparison (with the mean value indicated). This comparison reveals that the decline in accuracy due to Mosaic connectivity, and further due to RRAM weights, is negligible across all tasks. The inset figures depict the resulting Mosaic connectivity after training, which follows a small-world graphical structure.
even when compared with digital approaches in more advanced technology nodes. This efficiency can be attributed to the Mosaic's in-memory routing approach, which results in low-energy routing memory accesses distributed in space. This (i) reduces the size, and thus the energy, of each router compared to the larger centralized routers employed in some platforms, and (ii) avoids the use of CAMs, which consume the majority of the routing energy in some other spike-based routing mechanisms (Supplementary Note 2).
Neuromorphic platforms have each been designed to optimize different objectives [328], and the energy efficiency of the Mosaic in communication stems from its explicit optimization for this very metric, thanks to its small-world connectivity layout. Despite this, as shown in Fig. 5.4, the Mosaic does not suffer a considerable drop in accuracy, at least for problem sizes typical of sensory processing applications at the edge. This implies that for these problems, large connectivity between the cores is not required, a fact that can be exploited to reduce energy.
The Mosaic's latency per router is comparable to the average latency of other platforms. For neural networks with sparse firing activity in particular, this is often a negligible factor. In applications requiring sensory processing of real-world, slowly changing signals, the time constants dictating how quickly the model state evolves are set by the leak bias V_lk in Fig. 5.2, typically on the order of tens or hundreds of milliseconds. Although the routing latency grows linearly with the number of hops in the network, as shown in the final connectivity matrices of Fig. 5.4g,h,i, the number of non-local connections decays exponentially. Therefore, the routing latency is always much less than the time scale of the real-world signals at the edge, which is our target application.
The two final rows of Table 5.1 indicate the power consumption of the neuromorphic platforms in tasks A and B respectively. All the platforms are assumed to use a core (i.e., Neuron Tile) size of 32 neurons, and to have an N-hop energy cost equal to N times the 1-hop value. The potential of the Mosaic is clearly demonstrated: a power consumption of only a few hundred picowatts is required, compared to a few nanowatts to microwatts on the other neuromorphic platforms.
## 5.5 discussion
We have identified small-world graphs as a favorable topology for efficient routing, proposed a hardware architecture that efficiently implements them, designed and fabricated memristor-based building blocks for the architecture in 130 nm technology, and reported measurements and comparisons to other approaches. We empirically quantified the impact of both the small-world neural network topology and low memristor precision on three diverse and challenging tasks representative of edge-AI settings. We also introduced an adapted machine learning strategy that enforces small-worldness and accounts for the low precision of noisy RRAM devices. The results achieved across these tasks were comparable to those achieved by floating-point precision models with unconstrained network connectivity.
Although the connectivity of the Mosaic is sparse, it still requires more routing nodes than computing nodes. However, the Routing Tiles are more compact than the Neuron Tiles, as they only perform a binary decision. This means that their read-out circuitry does not require a large Signal-to-Noise Ratio (SNR), compared to the Neuron Tiles. This relaxed requirement reduces the overhead of the Routing Tile readout in terms of both area and power (Supplementary Note 9).
In this work, we have treated the Mosaic as a standard RSNN, trained it with BPTT using the surrogate gradient approximation, and simply added loss terms that penalize dense connectivity to shape sparse graphs. Therefore, the potential computational advantages of small-world architectures do not necessarily emerge, and the performance of the network is mainly related to its number of parameters. In fact, we found that the Mosaic requires more neurons, but about the same number of parameters, to reach the same accuracy as an RSNN on the same task. This confirms that taking advantage of small-world connectivity requires a novel training procedure, which we hope to develop in the future. Moreover, in this paper we have benchmarked the Mosaic on sensory processing tasks and have proposed to exploit the small-worldness for energy savings thanks to the locality of information processing. From a computational perspective, however, these tasks do not necessarily take advantage of the small-worldness. In future work, one can foresee tasks that exploit the small-world connectivity from a computational standpoint.
Mosaic favors local processing of input data, in contrast with conventional deep learning algorithms such as Convolutional and Recurrent Neural Networks. However, novel approaches in
Table 5.1: Comparison of spike-routing performance across neuromorphic platforms.

| Neuromorphic chip | Technology | Routing | 0-hop energy, original∗ | 0-hop energy, sct.∗∗ 130 nm | 1-hop energy, original⋄ | 1-hop energy, sct. 130 nm | 1-hop latency, original | 1-hop latency, sct. 130 nm | Optimized for small-worldness | Routing power, task A | Routing power, task B |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mosaic | 130 nm (1.2 V) | on-chip | 400 fJ◦ | 400 fJ | 1.6 pJ◦ | 1.6 pJ◦ | 25 ns | 25 ns | Yes | 809 pW◦ | 5.06 nW◦ |
| Loihi [89] | 14 nm (0.75 V) | on-chip | 23.6 pJ | 60.416 pJ | 3.5 pJ | 10.24 pJ | 6.5 ns | 60.35 ns | No | 7.71 nW | 248.41 nW |
| Dynap-SE [103] | 180 nm (1.8 V) | on-chip | 30 pJ | 13.4 pJ | 17 pJ (@1.3 V) | 17 pJ | 40 ns | 28.88 ns | Yes | 10.02 nW | 322.7 nW |
| Neurogrid [331] | 180 nm (3 V) | on/off-chip | 1 nJ | 160 pJ | 14 nJ | 8.35 nJ | 20 ns | 14.4 ns | No | 563.31 nW | 18.14 µW |
| SpiNNaker [330] | 130 nm (1.2 V) | on-chip | 30.3 nJ | 30.3 nJ | 1.11 nJ | 1.11 nJ | 200 ps | 200 ps | No | 9.85 µW | 317.08 µW |
| TrueNorth [329] | 28 nm (0.775 V) | on-chip | 26 pJ | 62.4 pJ | 2.3 pJ | 5.52 pJ | 6.25 ns | 29 ns | No | 8.47 nW | 272.82 nW |

◦ Assuming an average resistance value of 10 kΩ and a read pulse of 10 ns width. ∗ The same as the energy per Synaptic Operation (SOP); numbers are taken from Basu et al., 2022 [332]. ∗∗ sct. = scaled to 130 nm. ⋄ Numbers are taken from Moradi et al., 2018 [103].
deep learning, e.g., Vision Transformers with local attention [333] and MLP-Mixers [334], treat input data in a similar way to the Mosaic, subdividing the input dimensions and processing the resulting patches locally. This is also similar to how most biological systems process information locally, such as the visual system of fruit flies [335].
In the broader context, graph-based computing is currently receiving attention as a promising means of leveraging the capabilities of SNNs [336-338]. The Mosaic is thus a timely, dedicated hardware architecture optimized for a type of graph that is abundant in nature and the real world, and that promises to find application at the extreme edge.
## 5.6 methods
## Design and fabrication of Mosaic circuits
## 5.6.0.1 Neuron and routing column circuits
Both the neuron and routing columns share a common front-end circuit that reads the conductances of the RRAM devices. The RRAM bottom electrode has a constant DC voltage V_bot applied to it, and the common top electrode is pinned to the voltage V_x by a rail-to-rail operational amplifier (OPAMP). The OPAMP output is connected in negative feedback to its non-inverting input (due to the 180-degree phase shift between the gate and drain of transistor M1 in Fig. 5.2) and has the constant DC bias voltage V_top applied to its inverting input. As a result, the OPAMP output modulates the gate voltage of transistor M1 such that the current it sources onto the node V_x maintains its voltage as close as possible to the DC bias V_top. Whenever an input pulse V_in<n> arrives, a current i_in equal to (V_x - V_bot)·G_n flows out of the bottom electrode. The negative feedback of the OPAMP then acts to ensure that V_x = V_top, by sourcing an equal current through transistor M1. By connecting the OPAMP output to the gate of transistor M2, a current equal to i_in is also buffered, as i_buff, into the branch composed of transistors M2 and M3 in series. In the Routing Tile, this current is compared against a reference current, and if it is higher, a pulse is generated and transferred onwards. The current comparator circuit is composed of two current mirrors and an inverter (Fig. 5.3b). In the neuron column, this current is injected into a CMOS differential-pair integrator synapse circuit [188], which generates an exponentially decaying waveform from the onset of the pulse with an amplitude proportional to the injected current. Finally, this exponential current is injected onto the membrane capacitor of a CMOS leaky integrate-and-fire neuron circuit [339], where it is integrated as a voltage (see Fig. 5.2b). Upon exceeding a voltage threshold (the switching voltage of an inverter), a pulse is emitted at the output of the circuit. This pulse in turn feeds back and shunts the capacitor to ground so that it is discharged. Further circuits were required to program the device conductance states; notably, multiplexers were integrated at each end of the column to apply voltages to the top and bottom electrodes of the RRAM devices.
A critical parameter in both Neuron and Routing Tiles is the spike pulse width. Minimizing the width of spikes assures maximal energy efficiency, but this comes at a cost: if the duration of the voltage pulse is too short, the readout current from the 1T1R will be imprecise, and parasitic effects due to the metal lines in the array might even impede the correct propagation of either the voltage pulse or the readout current. For this reason, we thoroughly investigated the minimal pulse width that allows spikes and readout currents to be reliably propagated with a probability of 99.7% (3σ). Extensive Monte Carlo simulations resulted in a spike pulse width of around 100 ns. Based on these SPICE simulations, we also estimated the energy consumption of the Mosaic for the different tasks presented in Fig. 5.4.
## Fabrication/integration
The circuits described in the Results section were taped out in 130 nm technology at CEA-Leti, on a 200 mm production line. The front end of the line, below metal layer 4, was realized by STMicroelectronics, while from the fifth metal layer upwards, including the deposition of the composites for the RRAM devices, the process was completed by CEA-Leti. The RRAM devices are
composed of a 5 nm thick HfO2 layer sandwiched between two 5 nm thick TiN electrodes, forming a TiN/HfO2/Ti/TiN stack. Each device is accessed through a transistor, giving rise to the 1T1R unit cell. The access transistor is 650 nm wide. The 1T1R cells are integrated with CMOS-based circuits by stacking the RRAM cells on the higher metal layers. In the neuron and Routing Tiles, the 1T1R cells are organized in a small matrix (either 2×2 or 2×4) in which the bottom electrodes are shared between devices in the same column and the gates are shared between devices in the same row. Multiplexers operated by simple logic circuits enable the selection of either a single device or a row of devices for programming or reading operations. The circuits integrated on the wafer were accessed by a probe card connected to pads of dimension 50 × 90 µm².
## RRAM characteristics
Resistive switching in the devices used in our work is based on the formation and rupture of a filament under an electric field applied across the device. The change in the geometry of the filament results in a different resistive state of the device. A SET/RESET operation is performed by applying a positive/negative pulse across the device, which forms/disrupts a conductive filament in the memory cell, thus decreasing/increasing its resistance. When the filament is formed, the cell is in the HCS; otherwise, the cell is in the LCS. For a SET operation, the bottom of the 1T1R structure is conventionally left at ground level and a positive voltage is applied to the 1T1R top electrode; the reverse is applied in the RESET operation. Typical programming and reading voltages are given in the measurement setup section below.
## Mosaic circuit measurement setups
The tests involved analyzing and recording the dynamic behavior of the analog CMOS circuits, as well as programming and reading the RRAM devices. Both phases required dedicated instrumentation, all simultaneously connected to the probe card. For programming and reading the RRAM devices, Source Measure Units (SMUs) from a Keithley 4200 SCS machine were used. To maximize the stability and precision of the programming operation, SET and RESET are performed in a quasi-static manner: a slowly rising and falling voltage input is applied to either the top (SET) or bottom (RESET) electrode, while the gate is kept at a fixed value. The applied V_top(t) or V_bot(t) waveform is a triangular pulse with rise and fall times of 1 s, with V_gate set to a chosen value. For a SET operation, the bottom of the 1T1R structure is conventionally left at ground level, while in the RESET case V_top is equal to 0 V and a positive voltage is applied to V_bot. Typical values for the SET operation are V_gate in [0.9-1.3] V, while the V_top peak voltage is normally 2.0 V. Such values allow the RRAM resistance to be modulated in an interval of [5-30] kΩ, corresponding to the HCS of the device. For the RESET operation, the gate voltage is instead in the [2.75-3.25] V range, while the bottom electrode reaches a peak of 3.0 V.
The LCS is less controllable than the HCS due to the inherent stochasticity of the rupture of the conductive filament; the LCS (or HRS) level is thus spread over a wider [80-1000] kΩ interval. The reading operation is performed by limiting the V_top voltage to 0.3 V, a value that avoids read disturbances, while the gate is opened at 4.5 V.
The inputs and outputs are analog dynamic signals. For the inputs, we alternated between two HP 8110 pulse generators and a Tektronix AFG 3011 waveform generator. As a general rule, input pulses had a pulse width of 1 µs and rise/fall times of 50 ns. This type of pulse is taken as the stereotypical spiking event of a Spiking Neural Network. For the outputs, a 1 GHz Teledyne LeCroy oscilloscope was used to record the output signals.
## Mosaic layout-aware training via regularizing the loss function
We introduce a new regularization function, $L_M$, that reflects the realization cost of short- and long-range connections in the Mosaic layout. Assuming the Neuron Tiles are placed in a square layout, $L_M$ is computed from a matrix $H \in \mathbb{R}^{j \times i}$ expressing the minimum number of Routing Tiles needed to connect a source neuron $N_j$ to a target neuron $N_i$, based on their Neuron Tile positions in the Mosaic. From this, a static mask $S \in \mathbb{R}^{j \times i}$ is created to exponentially penalize long-range connections, $S = e^{\beta H} - 1$, where $\beta$ is a positive number that controls the degree of penalization with connection distance. Finally, we calculate $L_M = \sum S \odot W^2$ for the recurrent weight matrix $W \in \mathbb{R}^{j \times i}$. Note that the weights corresponding to intra-Neuron-Tile connections (where $H = 0$) are not penalized, allowing the neurons within a Neuron Tile to be densely connected. During training, the task-related cross-entropy loss term (or the total reward, in the RL case) drives network performance, while the $L_M$ term reduces the strength of weights that create long-range connections in the Mosaic layout. Starting from the 10th epoch, we deterministically prune connections (setting the corresponding weight matrix elements to 0) whose L1-norm is smaller than a fixed threshold of 0.005. This pruning procedure privileges local connections (i.e., those within a Neuron Tile or to a nearby Neuron Tile) and naturally gives rise to a small-world neural network topology. We found that gradient-norm clipping during training, and reducing the learning rate by a factor of ten after the 135th epoch in the classification tasks, help stabilize the optimization against the detrimental effects of pruning.
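A compact NumPy sketch of this regularizer follows. The Manhattan distance between tile coordinates is used as a stand-in for the exact Routing-Tile hop count $H$, and the grid size, $\beta$, and function name are illustrative.

```python
import numpy as np

def mosaic_regularizer(W, n_per_tile, grid_side, beta=0.5):
    """L_M = sum(S * W**2) with S = exp(beta * H) - 1, as defined above.

    W: (n, n) recurrent weight matrix, n = grid_side**2 * n_per_tile.
    H[j, i] approximates the number of Routing Tiles between the tiles of
    source j and target i via Manhattan distance; intra-tile entries have
    H = 0, hence S = 0, so they are not penalized.
    """
    n = W.shape[0]
    tile = np.arange(n) // n_per_tile
    row, col = tile // grid_side, tile % grid_side        # tile coordinates
    H = (np.abs(row[:, None] - row[None, :])
         + np.abs(col[:, None] - col[None, :]))
    S = np.exp(beta * H) - 1.0
    return np.sum(S * W**2)

W = np.random.default_rng(0).normal(size=(64, 64)) * 0.1  # 16 tiles x 4 neurons
print("L_M =", mosaic_regularizer(W, n_per_tile=4, grid_side=4))
```

The pruning step then simply zeroes small entries, e.g. `W[np.abs(W) < 0.005] = 0.0`.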
## RRAM-aware noise-resilient training
Our strategy of choice for endowing the Mosaic with the ability to solve real-world tasks is offline training. This procedure consists of producing an abstraction of the Mosaic architecture on a server computer, formalized as a Spiking Neural Network that is trained to solve a particular task. Once the parameters of the Mosaic have been optimized in a digital 32-bit floating-point (FP32) representation, they are transferred to the physical Mosaic chip. However, the parameters in the Mosaic are stored in RRAM devices, which are not as precise as their FP32 counterparts. Furthermore, RRAMs suffer from other non-idealities such as programming stochasticity, temporal conductance relaxation, and read noise [312, 315-317].
To mitigate these detrimental effects at the weight-transfer stage, we adapted the noise-resilient training method for RRAM devices [18, 340]. Similar to quantization-aware training, at every forward pass the original network weights are quantized and perturbed with additive noise, using a straight-through estimator for the gradients. We used Gaussian noise with zero mean and a standard deviation equal to 5% of the maximum conductance to emulate transfer non-idealities. The profile of this additive noise is based on our characterization of an array of 4096 RRAM devices [312], programmed with a program-and-verify scheme (up to 10 iterations) to various conductance levels and then measured after 60 seconds to model the resulting distribution.
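The training-loop pattern is sketched below in NumPy. The nine-level uniform quantizer over a normalized $[-1, 1]$ weight range is an illustrative stand-in for the measured conductance levels; in an autodiff framework the straight-through estimator corresponds to `W + stop_gradient(W_noisy - W)`.

```python
import numpy as np

rng = np.random.default_rng(0)
NOISE_STD = 0.05          # 5% of the maximum conductance (weights normalized)
N_LEVELS = 9              # illustrative; cf. the nine states of Fig. 5.2d

def quantize(W):
    """Uniform quantization to N_LEVELS levels over [-1, 1]."""
    step = 2.0 / (N_LEVELS - 1)
    return np.clip(np.round(W / step) * step, -1.0, 1.0)

def forward_weights(W):
    """Weights seen by the forward pass: quantized plus transfer noise."""
    return quantize(W) + rng.normal(0.0, NOISE_STD, W.shape)

# Straight-through estimator: the quantize-and-perturb step is treated as
# the identity in the backward pass, so the gradient computed with the
# noisy copy is applied directly to the FP32 master weights.
W = rng.normal(size=(4, 4)) * 0.3
grad = np.ones_like(W)                 # stand-in for dL/dW from the noisy pass
W -= 0.01 * grad
```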
## ECG task description
The Mosaic hardware-aware training procedure is tested on an electrocardiogram arrhythmia detection task. The ECG dataset was downloaded from the MIT-BIH arrhythmia repository [318]. The database is composed of continuous 30-minute recordings measured from multiple subjects. The QRS complex of each heartbeat has been annotated as either healthy or exhibiting one of many possible heart arrhythmias by a team of cardiologists. We selected one patient exhibiting approximately half healthy and half arrhythmic heartbeats. Each heartbeat was isolated from the others in a 700 ms time series centered on the labelled QRS complex. Each of the two 700 ms channel signals was then converted to spikes using a delta-modulation scheme [341]. This consists of recording the initial value of the time series and, going forward in time, recording the time stamp at which the signal changes by a pre-determined positive or negative amount. The value of the signal at this time stamp is then recorded and used in the next comparison forward in time, and the process is repeated. Across the two channels, this results in four event streams, denoting upwards and downwards changes in the signals. During the simulation of the neural network, these four event streams drive the four input neurons of the spiking recurrent neural network implemented by the Mosaic.
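The encoder described above is straightforward to express in code. The sketch below implements one channel of the delta-modulation scheme in NumPy, with a toy waveform and threshold standing in for the real ECG data and the tuned threshold.

```python
import numpy as np

def delta_modulate(x, threshold):
    """Delta-modulation spike encoding of one channel, as described above:
    emit an UP (DN) event whenever the signal rises (falls) by `threshold`
    relative to the value recorded at the previous event."""
    up, dn = [], []
    ref = x[0]                       # initial value of the time series
    for t, v in enumerate(x):
        while v - ref >= threshold:  # upward crossing(s)
            up.append(t); ref += threshold
        while ref - v >= threshold:  # downward crossing(s)
            dn.append(t); ref -= threshold
    return np.array(up), np.array(dn)

t = np.linspace(0.0, 0.7, 700)                    # one 700 ms window
x = np.sin(2 * np.pi * 3 * t) * np.exp(-3 * t)    # toy ECG-like waveform
up, dn = delta_modulate(x, threshold=0.1)
print(len(up), "UP events,", len(dn), "DN events")
```

Applied to both ECG channels, this yields the four event streams that drive the four input neurons.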
Data points were presented to the model in mini-batches of 16. Two populations of neurons in two Neuron Tiles were used to indicate whether the presented ECG signal corresponded to a healthy or an arrhythmic heartbeat. The softmax of the total number of spikes generated by the neurons in each population was used to obtain a classification probability, and the resulting negative log-likelihood (categorical cross-entropy with the signal labels) was minimized.
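The readout then amounts to a softmax over population spike counts and a negative log-likelihood, as in the minimal sketch below (the counts shown are made up).

```python
import numpy as np

def classify_and_loss(spike_counts, label):
    """Softmax over total per-population spike counts gives the class
    probabilities; the loss is the negative log-likelihood of the label."""
    z = spike_counts - np.max(spike_counts)      # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return p, -np.log(p[label])

counts = np.array([9.0, 4.0])   # e.g. 'healthy' vs 'arrhythmic' populations
p, nll = classify_and_loss(counts, label=0)
print("p =", p, " NLL =", nll)
```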
## Keyword Spotting task description
For the keyword spotting task, we used the SHD dataset (20 classes, 8156 training and 2264 test samples). Each input example drawn from the dataset is sampled three times along the channel dimension without overlap to obtain three augmentations of the same data with 256 channels each. This method allows feeding the input stream into fewer Neuron Tiles by reducing the input dimension, and it also triples the sizes of both the training and test sets.
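One possible realization of this sub-sampling, assuming the 700 native SHD channels are zero-padded to 768 so that three non-overlapping stride-3 views of 256 channels each exist (the padding convention is my assumption):

```python
import numpy as np

def channel_augment(x):
    # x: (time, 768) spike raster, zero-padded from SHD's 700 channels.
    # Offsets 0, 1, 2 give three non-overlapping 256-channel views,
    # tripling the dataset as described above.
    return [x[:, o::3] for o in range(3)]
```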
We set the simulation time step to 1 ms. The recurrent neural network consists of 2048 LIF neurons with a 45 ms membrane time constant, distributed into 8 × 8 Neuron Tiles with 32 neurons each. The input spikes are fed only into the neurons of the first row of the Mosaic layout (8 tiles). The network prediction is determined after presenting each speech sample for 100 ms by counting the total number of spikes from 20 neurons (the number of classes) in 2 output Neuron Tiles located at the bottom right of the Mosaic layout. The neurons inside the input and output Neuron Tiles are not recurrently connected. The network is trained with BPTT on the loss $L = L_{CE} + \lambda L_M$, where $L_{CE}$ is the cross-entropy loss between the output logits and the target and $L_M$ is the Mosaic layout-aware regularization term. We use a batch size of 512 and suitably tuned hyperparameters.
## Reinforcement Learning task description
In the RL experiments, we test the versatility of a Mosaic-optimized RSNN on a continuous-action-space motor control task, half-cheetah, implemented using the BRAX physics engine for rigid-body simulation [342]. At every timestep $t$, the environment provides an input observation vector $o_t \in \mathbb{R}^{25}$ and a scalar reward $r_t$. The goal of the agent is to maximize the expected sum of rewards $R = \sum_{t=0}^{1000} r_t$ over an episode of 1000 environment interactions by selecting actions $a_t \in \mathbb{R}^{7}$ computed from the output of the policy network. The policy network of our agent consists of 256 recurrently connected LIF neurons with a membrane decay time constant of 30 ms. The neurons are distributed equally into 16 Neuron Tiles, forming a 7 × 7 Mosaic layout. We note that, for simulation purposes, a small network of 16 Neuron Tiles with 16 neurons each, while not optimal in terms of memory footprint (Eq. 5.2), was preferred in order to fit the large ES population within the memory capacity of a single GPU. At each time step, the observation vector $o_t$ is accumulated into the membrane voltages of the first 25 neurons of the two upper-left input tiles, and the action vector $a_t$ is read from the membrane voltages of the last seven neurons in the bottom-right corner after passing through a tanh non-linearity.
We used Evolutionary Strategies (ES) to optimize the RSNN weights such that, after training, the agent solves the environment with a policy network that has only locally dense and globally sparse connectivity. We found ES a particularly promising approach for hardware-aware training because (i) it is blind to non-differentiable hardware constraints, e.g., the spiking function, quantized weights, and connectivity patterns, and (ii) it is highly parallelizable, since ES does not require spiking variables to be stored over a thousand time steps, unlike BPTT, which calculates the gradient explicitly. In ES, the fitness of an offspring is defined as the combination of the total reward over an episode, $R$, and the realization cost of short- and long-range connections, $L_M$ (same as in the KWS task), such that $F = R - \lambda L_M$. We used a population size of 4096 (with antithetic sampling to reduce variance) and a mutation noise standard deviation of 0.05. At the end of each generation, network weights with $\ell_1$-norm smaller than a fixed threshold are deterministically pruned. The agent is trained for 1000 generations.
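A minimal sketch of one ES generation with these ingredients follows; the population size and mutation noise are taken from the text, while the learning rate and fitness-shaping step are assumptions:

```python
import numpy as np

def es_step(theta, fitness, pop=4096, sigma=0.05, lr=0.01):
    """One ES generation with antithetic sampling. `fitness(theta)` returns
    F = R - lambda * L_M for one offspring's episode rollout."""
    eps = np.random.randn(pop // 2, theta.size)
    eps = np.concatenate([eps, -eps])            # antithetic pairs
    F = np.array([fitness(theta + sigma * e) for e in eps])
    F = (F - F.mean()) / (F.std() + 1e-8)        # fitness shaping (assumed)
    grad = (F[:, None] * eps).mean(0) / sigma    # ES gradient estimate
    return theta + lr * grad
```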
## Calculation of memory footprint
We compare the Memory Footprint (MF) of the Mosaic architecture with that of a single large crossbar array for implementing small-world graphical models.
To evaluate the MF of one large crossbar array, we count the total number of devices required to implement every possible connection between neurons, allowing any RSNN to be mapped onto the system. With $N$ neurons in the system, the total possible number of connections in the graph is $MF_{ref} = N^2$.
For the Mosaic architecture, the number of RRAM cells (i.e., the MF) equals the number of devices in all the Neuron Tiles and Routing Tiles: $MF_{Mosaic} = MF_{NeuronTiles} + MF_{RoutingTiles}$.
With $k$ neurons per Neuron Tile, each Neuron Tile contributes $5k^2$ devices (the factor of 5 accounts for the four directions in which each tile can connect, plus the recurrent connections within the tile). Evenly dividing the $N$ neurons among the Neuron Tiles requires $T = \lceil N/k \rceil$ Neuron Tiles, bringing the total number of devices attributed to Neuron Tiles to $T \cdot 5k^2$.
The number of Routing Tiles needed to connect all the Neuron Tiles depends on the geometry of the Mosaic systolic array. Here, we assume Neuron Tiles assembled in a square, each with a Routing Tile on each side, and let $R$ be the number of Routing Tiles with $(4k)^2$ devices each. The total number of devices in Routing Tiles is therefore $MF_{RoutingTiles} = R \cdot (4k)^2$.

The problem can then be rewritten as a function of the geometry. Considering Fig. 5.1g, let $i$ be an integer and $(2i+1)^2$ the total number of tiles. The number of Neuron Tiles can be written as $T = (i+1)^2$, considering the case where Neuron Tiles form the outer ring of tiles; consequently, the number of Routing Tiles is $R = (2i+1)^2 - (i+1)^2$. Substituting these values into $MF_{NeuronTiles} + MF_{RoutingTiles}$ and remembering that $N = k \cdot T$, we can impose $MF_{Mosaic} = MF_{NeuronTiles} + MF_{RoutingTiles} < MF_{ref}$.
This results in the following expression:
$$MF_{Mosaic} = MF_{NeuronTiles} + MF_{RoutingTiles} < MF_{ref}$$

$$(i+1)^2 \, (5k^2) + \left[(2i+1)^2 - (i+1)^2\right] (4k)^2 < \left(k\,(i+1)^2\right)^2 \quad (5.2)$$
This expression can then be evaluated for $i$, given a network size, giving rise to the relationships plotted in Fig. 5.1g of the main text.
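For concreteness, Eq. 5.2 can be evaluated numerically with a small helper (function names are mine):

```python
def mosaic_footprint(i, k):
    """Devices in a (2i+1) x (2i+1) Mosaic with k neurons per Neuron Tile."""
    T = (i + 1) ** 2                          # Neuron Tiles (outer ring)
    R = (2 * i + 1) ** 2 - T                  # Routing Tiles
    return T * 5 * k ** 2 + R * (4 * k) ** 2  # left-hand side of Eq. 5.2

def reference_footprint(i, k):
    """One large crossbar for the same N = k * (i+1)^2 neurons."""
    return (k * (i + 1) ** 2) ** 2            # N^2

# e.g. i = 7, k = 32: 2,965,504 Mosaic devices vs. 4,194,304 for N^2
```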
## Conclusion

In this thesis, I explored implementing neural computation on mixed-signal hardware with in-memory computing capabilities. Neural computation (i.e., inference, learning and routing) is fundamentally memory-centric, and the von Neumann architecture's separation of memory and computation is a major contributor to the energy consumption and latency of modern neural networks. Based on this observation, we investigated offloading critical neural computations directly onto the raw analog dynamics of resistive memory technologies.
This thesis began by addressing the challenge of online training of feedforward spiking neural networks implemented with intrinsically 1-bit RRAM devices. In Chapter 2, we presented a novel memristor programming technique to precisely control filament formation in RRAM devices, increasing their effective bit resolution for more stable training dynamics and improved performance. The versatility of this technique was further demonstrated by applying it to novel perovskite memristors in Chapter 4, achieving an even more linear device response.
In Chapter 3, we tackled the temporal credit assignment problem in RSNNs on an analog substrate. We developed a simulation framework based on a comprehensive statistical model of a PCM crossbar array, capturing the major non-idealities of the memristors. This framework enabled the simulation of device responses to various weight programming techniques designed to mitigate these non-idealities. Using this framework, we trained an RSNN with the e-prop local learning rule, demonstrating that gradient accumulation is crucial for accurately reflecting weight updates on memristive devices, a finding later validated on a real neuromorphic chip. While our solution, which relies on a digital coprocessor, incurs an energy cost, it motivates the development of future devices with intrinsic accumulation capabilities, potentially through conductance or charge storage. Additionally, we introduced PCM-trace, a scalable implementation of synaptic eligibility traces for local learning using the volatile regime of PCM devices.
In Chapter 4, we presented the discovery of a novel memristor capable of switching between volatile and non-volatile modes. This reconfigurable memristor, based on halide perovskite nanocrystals, offers a significant advancement in emerging memory technologies, enabling the implementation of both static and dynamic neural parameters with the same material and fabrication technique.
Finally, in Chapter 5, we introduced Mosaic, a memristive systolic architecture for in-memory computing and routing. Mosaic efficiently implements small-world graph connectivity and demonstrates superior energy efficiency in spike routing compared to other hardware platforms. Additionally, we introduced a hardware-layout-aware training method that takes the physical layout of the chip into account while optimizing the neural network weights.
## Discussion and Outlook
I conclude by highlighting significant limitations of neural computation on analog substrates, which serve as a foundation for future research.
1. **Discovering efficient materials and peripheral circuits.** Although our experiments demonstrate the potential of learning on analog substrates using mature RRAM and PCM technologies, significant challenges remain in matching the efficiency of digital accelerators. The limited bit resolution inherent to these devices necessitates energy-intensive gradient accumulation on a co-processor, hindering overall efficiency; as analog device sizes shrink, this problem is expected to become more pronounced. While improving bit precision can enhance accuracy, it alone will not guarantee efficient analog learning. Additionally, the high WRITE energy of memristors needs to be lowered to match the low programming energies of SRAM. Finally, it is worth noting that scaling trends are driven primarily by the peripheral circuitry rather than the memory cells, necessitating area optimization at the periphery to achieve cost-effective fabrication. Future research should prioritize these challenges, roughly in this order, to achieve learning performance competitive with digital accelerators.
2. **Local learning in unconventional networks.** The co-design approach is driving neural architectures to be increasingly tailored to hardware for efficiency. For example, quantization is adopted to reduce memory footprint; State Space Models (SSMs) leverage recurrence to improve arithmetic intensity (FLOPs/byte) on digital hardware; and spiking neurons convert the analog membrane voltage to binary spikes, eliminating the need for ADCs in analog hardware. It seems reasonable to expect more such primitives to emerge, potentially leveraging sparsity or stochasticity. This raises the question of how to design local learning rules for unconventional future architectures. Can a single learning rule provide an end-to-end solution, or can credit assignment be modularized for subgraphs? Recent work suggests that exploring such directions may yield valuable insights and unexpected performance tradeoffs [69, 71, 72, 343, 344].
3. **Mismatch between silicon and the brain.** Biological plausibility and silicon efficiency represent distinct optimization goals. While silicon dynamics outpace neuronal dynamics in raw speed¹, silicon falls short of matching the brain's power efficiency and parallelism. I argue that while mimicking the brain's intricacies may be scientifically interesting, the brain is not necessarily an ideal blueprint for either silicon efficiency or intelligence. Instead, focusing on silicon efficiency allows us to leverage the unique strengths of this substrate. This will naturally lead to shared design principles, such as minimizing wiring and localizing computation, but we are not bound by the specific artifacts of biological evolution. It therefore remains an open question which criteria from neuroscience should inform hardware design.
4. **Remembering the Jevons Paradox.** While hardware-algorithm co-design aims to enhance the power efficiency of on-device intelligence, the Jevons Paradox suggests that this may inadvertently increase overall AI energy consumption due to increased demand [345]. This phenomenon has occurred before: increasing the efficiency of coal engines led to more coal consumption. This raises important questions for future investigation about societal and governmental regulations to mitigate the environmental impact, or more exotic technological deployment strategies that keep the carbon footprint of AI on Earth minimal.
¹ Electron transport in transistors is at least 10⁷ times faster than ion transport in neuronal channels.
At the time of writing, learning on mixed-signal hardware with memristive weights has not reached the efficiency of digital accelerators. This will likely change as material design and fabrication technology mature. Nevertheless, it will remain an exciting playground where intelligence meets the raw, unfiltered physics of computation.
We present here additional results that complement the main findings discussed in the previous sections.
## Appendix 1: Online training of spiking recurrent neural networks with phase-change memory synapses

## Supplementary Note 1

We implemented the PCM crossbar array simulation framework in PyTorch [107]; it can be used for both inference and training of ANNs or SNNs. Built on top of the statistical model introduced by Nandakumar et al. [81], our crossbar model supports asynchronous SET, RESET and READ operations over entire crossbar structures and simultaneously keeps track of the temporal evolution of the device conductances.
A single crossbar array consists of $P \times Q$ nodes (each node representing a synapse), where each node contains $2N$ memristors arranged in the differential architecture ($N$ potentiation and $N$ depression devices). Each memristor's state is represented by four variables: $t_p$, storing the last time the device was written (used to calculate the effect of drift); count, tracking how many times it has been written (used later by the arbiter of N-memristor architectures); $P_{mem}$, its programming history (required by the PCM model); and $G$, the conductance of the device $T_0$ seconds after the last programming time. The initial conductances of the PCM devices in the crossbar array are assumed to be iteratively programmed to the HRS, sampled from a Normal distribution $\mathcal{N}(\mu = 0.1, \sigma = 0.01)\,\mu S$.
The PCM crossbar simulation framework supports three major functions: READ, SET and RESET. The READ function takes the time of the applied READ pulse, $t$, and calculates the effect of drift based on the last programming time $t_p$; it then adds conductance-dependent READ noise and returns the conductance values of the whole array. The SET function takes the timing of the applied SET pulse, together with a mask of shape $(2 \times N \times P \times Q)$, and calculates the effect of applying a single SET pulse to the PCM devices selected by the mask. Finally, the RESET function re-initializes all state variables of the devices selected by the mask and re-initializes their conductances from the Normal distribution $\mathcal{N}(\mu = 0.1, \sigma = 0.01)\,\mu S$.
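A skeletal version of this state bookkeeping is sketched below; the drift, noise and SET expressions are simplified stand-ins, not the full statistical model of [81], and all constants are illustrative:

```python
import torch

NU, T0 = 0.05, 27.36   # illustrative drift exponent and reference time (s)

class PCMCrossbar:
    """Sketch of the per-device state described above."""
    def __init__(self, P, Q, N):
        shape = (2, N, P, Q)                          # potentiation/depression
        self.tp = torch.zeros(shape)                  # last programming time
        self.count = torch.zeros(shape)               # number of writes
        self.Pmem = torch.zeros(shape)                # programming history
        self.G = torch.normal(0.1, 0.01, size=shape)  # HRS init (uS)

    def read(self, t):
        # conductance drift since the last write, then proportional read noise
        drift = ((t - self.tp).clamp(min=T0) / T0) ** (-NU)
        G = self.G * drift
        return G + 0.02 * G * torch.randn_like(G)

    def set(self, t, mask):
        # one SET pulse on masked devices; saturating growth toward 12 uS
        dG = 0.8 * (1.0 - self.G / 12.0)
        self.G = torch.where(mask, (self.G + dG).clamp(max=12.0), self.G)
        self.tp = torch.where(mask, torch.full_like(self.tp, float(t)), self.tp)
        self.count += mask.float()

    def reset(self, mask):
        # re-initialize masked devices to the HRS distribution
        hrs = torch.normal(0.1, 0.01, size=self.G.shape)
        self.G = torch.where(mask, hrs, self.G)
        self.tp = torch.where(mask, torch.zeros_like(self.tp), self.tp)
```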
## Supplementary Note 2

READ and WRITE operations on the simulated PCM devices in the crossbar model are stochastic and subject to temporal conductance drift. Additionally, PCM devices offer very limited bit precision. Therefore, to ease the network training procedure, especially hyperparameter tuning, we developed the perf-mode. When the crossbar model operates in perf-mode, all stochasticity sources and the conductance drift are disabled: READ operations directly access the device conductance without 1/f noise or drift, whereas SET operations increase the device conductance as
$$G_N = G_{N-1} + \frac{G_{MAX}}{2^{CB_{RES}}}, \quad (A.1)$$
where $G_{MAX}$ is the maximum PCM conductance, set to 12 µS (the conductance boundaries are determined from the device measurements of [81]), and $CB_{RES}$ is the desired bit resolution of a single PCM device. In a nutshell, perf-mode turns the PCMs into ideal memory cells, corresponding to a digital memory with limited bit precision.
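For example, the perf-mode SET operation reduces to a one-line update (the 4-bit resolution chosen here is arbitrary):

```python
GMAX, CB_RES = 12.0, 4   # uS; 4-bit target resolution is an assumption

def perf_mode_set(G):
    """Ideal SET step of Eq. A.1: fixed conductance increments, no noise."""
    return min(G + GMAX / 2 ** CB_RES, GMAX)

# 16 pulses take an ideal device from 0 to GMAX in equal 0.75 uS steps
```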
## Supplementary Note 3

Here, we demonstrate the impact of using multiple memristor devices per synapse (arranged in a differential configuration) on the precision of targeted programming updates. Specifically, we modeled synapses with $N = 1, 4, 8$ PCM devices and programmed them from initial integer conductance values $G_{source} \in [-10, 10]\,\mu S$ to integer
- (a) Comparison of the full PCM model and its perf-mode equivalent after 8 consecutive SET pulses.
- (b) Comparison of the full PCM model and its perf-mode equivalent after 8 consecutive SET pulses, averaged over 300 measurements showing the effect of drift.
Figure A.1: The PCM crossbar model supports both the full PCM model from [81] and its simplified perf-mode version as an ideal digital memory.
Figure A.2: Multi-memristor configuration with 2 PCM devices (one depression and one potentiation) per synapse.
conductance values $G_{target} \in [-10, 10]\,\mu S$ using the multi-memristor update scheme described in Section 3.1.3.2. The effective conductance of a synapse is calculated as $G_{syn} = \sum_{i=1}^{N} G_i^{+} - \sum_{i=1}^{N} G_i^{-}$; however, we normalized the conductance across the 1-PCM, 4-PCM and 8-PCM architectures for easier comparison, such that $G_{syn} = \frac{1}{N}\left(\sum_{i=1}^{N} G_i^{+} - \sum_{i=1}^{N} G_i^{-}\right)$.
Our empirical results verify the claim of Boybat et al. [109] that the standard deviation and the update resolution of the write process decrease by a factor of $\sqrt{N}$.
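This scaling is straightforward to reproduce with a Monte-Carlo estimate of the normalized differential synapse, assuming i.i.d. Gaussian per-device programming noise:

```python
import numpy as np

def gsyn_std(N, sigma_dev=1.0, trials=100_000):
    """Empirical std of the normalized differential synapse for N devices
    per polarity; sigma_dev is the assumed per-device programming std."""
    Gp = sigma_dev * np.random.randn(trials, N)
    Gm = sigma_dev * np.random.randn(trials, N)
    return ((Gp.sum(1) - Gm.sum(1)) / N).std()

# gsyn_std(1) / gsyn_std(4) ~ 2, i.e. the std shrinks as sqrt(N)
```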
## Supplementary Note 4

In differential architectures, consecutive SET pulses applied to the positive and negative memristors may saturate the synaptic conductance and block further updates. The saturation effect is most apparent when a single synapse receives more than about 10 updates in one direction (potentiation or depression) during training. For example, this effect is clearly visible in Figs. A.2, A.3 and A.4 when the source and target conductances differ by more than 8-10 µS.
We implemented a weight update scheme denoted the update-ready criterion, which aims to prevent conductance saturation when applying single large updates. Before performing an update, we read both the positive and negative pair conductances and check whether the target update is possible. If not, we reset both devices, calculate the new target and apply the corresponding number of pulses. For example, given $G^{+} = 8\,\mu S$, $G^{-} = 4\,\mu S$ and a targeted update of $+6\,\mu S$, the algorithm decides to reset both devices because $G^{+}$ cannot be increased to 14 µS; after both devices are reset, $G^{+}$ can be programmed to 10 µS.
Figure A.3: Multi-memristor configuration with 8 PCM devices (four depression and four potentiation) per synapse.
Figure A.4: Multi-memristor configuration with 16 PCM devices (eight depression and eight potentiation) per synapse.
Although our PCM crossbar array simulation framework supports the update-ready criterion, we did not use it in our simulations because it requires reading the device states during every update.
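A minimal sketch of this check for a single differential pair, reproducing the numerical example above (helper name and structure are mine):

```python
GMAX = 12.0  # uS

def update_ready(Gp, Gm, dG):
    """Apply the signed update dG to a differential pair (Gp, Gm), resetting
    both devices first whenever the direct update would saturate."""
    target = Gp - Gm + dG                     # desired synaptic conductance
    saturates = (dG >= 0 and Gp + dG > GMAX) or (dG < 0 and Gm - dG > GMAX)
    if saturates:
        Gp, Gm = 0.0, 0.0                     # RESET both devices
        return (max(target, 0.0), max(-target, 0.0))
    return (Gp + dG, Gm) if dG >= 0 else (Gp, Gm - dG)

# update_ready(8.0, 4.0, 6.0) -> (10.0, 0.0), matching the example above
```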
Figure A.5: Update-ready criterion tested with N = 1 memristor per synapse.
## Supplementary Note 5

We defined the task success criterion as an MSE loss < 0.1, based on visual inspection. Some example network outputs are shown in Fig. A.6 below.
The six panels compare the e-prop network output against the target signal over 1000 ms; the per-panel MSE losses range from about 0.036 to 0.150.
Figure A.6: Comparison of network performances with six different loss values.
Figure A.7: Mean firing rate of 50 networks with PCM synapses trained using the mixed-precision method.
Figure A.8: MSE loss of 50 networks trained with PCM synapses using the mixed-precision method.
## Supplementary Note 6
## Appendix 2: Mosaic: in-memory computing and routing for small-world spike-based neuromorphic systems
<details>
<summary>Image 47 Details</summary>

### Visual Description
## Heatmaps: Watts Strogatz Graph vs Newman Watts Strogatz Graph
### Overview
The image contains two side-by-side heatmaps comparing metrics across two graph models: the Watts Strogatz Graph (left) and the Newman Watts Strogatz Graph (right). Both heatmaps use a color gradient to represent numerical values, with axes labeled `k` (vertical) and `n` (horizontal). The color scales indicate value ranges, with darker blues representing lower values and lighter greens representing higher values.
---
### Components/Axes
- **Vertical Axis (k)**: Labeled with values `8`, `16`, `32`, `64`.
- **Horizontal Axis (n)**: Labeled with values `128`, `256`, `512`, `1024`.
- **Color Legends**:
- **Left Heatmap (Watts Strogatz)**: Scale ranges from `0` to `120` (dark blue to light green).
- **Right Heatmap (Newman Watts Strogatz)**: Scale ranges from `0` to `80` (dark blue to light green).
- **Textual Labels**:
- Top-left: "Watts Strogatz Graph"
- Top-right: "Newman Watts Strogatz Graph"
---
### Detailed Analysis
#### Watts Strogatz Graph (Left)
| k \ n | 128 | 256 | 512 | 1024 |
|---------|-------|-------|-------|-------|
| 8 | 15.0 | 31.0 | 63.0 | 127.0 |
| 16 | 7.0 | 15.0 | 31.0 | 63.0 |
| 32 | 3.0 | 7.0 | 15.0 | 31.0 |
| 64 | 1.0 | 3.0 | 7.0 | 15.0 |
#### Newman Watts Strogatz Graph (Right)
| k \ n | 128 | 256 | 512 | 1024 |
|---------|-------|-------|-------|-------|
| 8 | 10.0 | 20.8 | 41.8 | 84.5 |
| 16 | 4.4 | 9.7 | 20.4 | 41.6 |
| 32 | 1.7 | 4.3 | 9.6 | 20.4 |
| 64 | 0.3 | 1.7 | 4.3 | 9.7 |
---
### Key Observations
1. **Trends**:
- **Watts Strogatz Graph**: Values increase exponentially as both `k` and `n` increase. The highest value (`127.0`) occurs at `k=8`, `n=1024`.
- **Newman Watts Strogatz Graph**: Values are consistently lower than the Watts Strogatz Graph but follow a similar exponential trend. The highest value (`84.5`) occurs at `k=8`, `n=1024`.
- **Color Correlation**: Darker blues dominate the lower-left corners (small `k`, small `n`), while lighter greens dominate the upper-right corners (large `k`, large `n`).
2. **Anomalies**:
- The Newman Watts Strogatz Graph values are systematically lower than the Watts Strogatz Graph for equivalent `k` and `n` values, suggesting a normalization or adjustment in the Newman model.
---
### Interpretation
The heatmaps demonstrate that both graph models exhibit a strong positive correlation between the parameters `k` (degree of connectivity) and `n` (number of nodes). The Watts Strogatz Graph shows higher absolute values, indicating it may measure a raw property (e.g., path lengths or clustering coefficients), while the Newman Watts Strogatz Graph likely represents a normalized or adjusted metric (e.g., efficiency or robustness). The exponential growth in values with increasing `k` and `n` suggests that network properties scale significantly with these parameters. The Newman model’s lower values imply a focus on relative or efficiency-based metrics rather than absolute quantities.
</details>
Figure A.9: The heatmaps show the ratio of zero to non-zero elements in the connectivity matrix for two examples of recurrently connected small-world graph generators. As n (the number of nodes, e.g., neurons, in the graph) increases and k (the number of neighbouring nodes of each node in a ring topology) decreases, more entries of the connectivity matrix are zero, indicating an increasing proportion of unused memory elements in an n × n crossbar array.
supplementary note 1: Figure A.9 quantifies the under-utilization of conventional crossbar arrays when storing example small-world connectivity patterns generated by two standard random-graph models: Watts-Strogatz small-world graphs [279] and Newman-Watts-Strogatz small-world graphs [346]. The first type of graph is characterized by a high degree of local clustering with short vertex-vertex distances, as observed in neural networks and self-organizing systems, whereas the latter type mostly captures the properties of lattices studied in statistical physics.
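To make the under-utilization of Figure A.9 concrete, the following minimal Python sketch reproduces the zero-to-non-zero ratio for both graph generators using networkx; the rewiring/shortcut probability p is an illustrative assumption, as the value used for the figure is not stated here.

```python
# Sketch: crossbar under-utilization for small-world graphs (cf. Figure A.9).
# The rewiring/shortcut probability p is an illustrative choice.
import networkx as nx
import numpy as np

def zero_to_nonzero_ratio(graph, n):
    """Ratio of zero to non-zero entries in the n x n connectivity matrix."""
    adj = nx.to_numpy_array(graph)
    nonzero = np.count_nonzero(adj)
    return (n * n - nonzero) / nonzero

p = 0.3  # assumed rewiring/shortcut probability
for n in (128, 256, 512, 1024):
    for k in (8, 16, 32, 64):
        ws = nx.watts_strogatz_graph(n, k, p)          # rewired ring lattice
        nws = nx.newman_watts_strogatz_graph(n, k, p)  # ring lattice plus shortcuts
        print(f"n={n:5d} k={k:3d}  WS: {zero_to_nonzero_ratio(ws, n):7.1f}  "
              f"NWS: {zero_to_nonzero_ratio(nws, n):7.1f}")
```

Since a Watts-Strogatz graph has exactly nk/2 edges regardless of rewiring, the WS ratios match the table above (e.g., 15.0 for n=128, k=8); the NWS ratios shift with the assumed shortcut probability.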
supplementary note 2: To communicate events between the computing nodes of neuromorphic chips, the AER communication scheme was developed [308]. In AER, whenever a spiking neuron in a chip (or module) generates a spike, its 'address' (or any given ID) is written on a high-speed digital bus and sent to the receiving neuron(s) in one or more receiver modules. In general, AER processing modules require at least one AER input port and one AER output port. As neuromorphic systems scale up in size, complexity, and functionality, researchers have developed more complex and smarter AER 'variations' to maintain the efficiency, reconfigurability, and reliability of the ever-growing target systems. The scheme used to transport events can be source- or destination-based, depending on whether the source or the destination address is embedded in the transmitted event 'packet'. In the source-based scheme, each receiving neuron has a local CAM that stores the addresses of all the neurons connected to it. In the destination-based scheme, each event hops between nodes, and at each hop its address is compared to the node's address until it matches and the event is delivered. Source-driven routing gives the designer more freedom to balance event traffic and design routes, but the hardware complexity increases the delays. Destination-based routing creates pre-determined routes through the network, and the designer can only change the output ports [347]. In summary, source-based routing requires a CAM per neuron, which increases the area and the memory access read times, while destination-based routing reduces the configurability of the network structure. In the Mosaic, by contrast, the routers are memory crossbars distributed between the computing cores that steer the spiking information through the mesh, so neither local CAMs nor a centralized memory is required for routing.
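As a toy illustration of the trade-off between the two schemes, the sketch below models source-based delivery as a per-node CAM lookup and destination-based delivery as address-matched hops through a mesh; all names and data structures here are hypothetical, not taken from an actual AER implementation.

```python
# Toy contrast of the two AER routing schemes described above.

# Source-based: each receiving node holds a CAM mapping source addresses
# to its local synapses; an event carries only the *source* address.
cam_per_node = {
    "node_B": {7: ["syn0", "syn3"]},   # source neuron 7 targets two synapses on B
    "node_C": {7: ["syn1"]},
}

def route_source_based(src_addr):
    # Broadcast the source address; every node checks its local CAM.
    return {node: cam[src_addr] for node, cam in cam_per_node.items()
            if src_addr in cam}

# Destination-based: the event carries the *destination* address and hops
# through a pre-determined route until the address matches a node.
mesh = ["node_A", "node_B", "node_C"]  # a 1-D chain of routers for simplicity

def route_destination_based(dst_addr):
    hops = []
    for node in mesh:               # event hops node to node
        hops.append(node)
        if node == dst_addr:        # address match: deliver here
            return hops
    raise ValueError("destination not on route")

print(route_source_based(7))              # {'node_B': ['syn0', 'syn3'], 'node_C': ['syn1']}
print(route_destination_based("node_C"))  # ['node_A', 'node_B', 'node_C']
```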
Figure A.10: (top) Different random graphs generated using the Mosaic model, obtained by changing the probability of devices being in their high-conductive state in the neuron tiles (pn) and routing tiles (pr). (bottom) The probability of a device switching is a function of the voltage applied to it during programming.
<details>
<summary>Image 48 Details</summary>

### Visual Description
## Network Diagrams and Electrical Characteristics
### Overview
The image contains two network diagrams and two electrical characteristic graphs. The diagrams illustrate network structures with varying node and edge probabilities, while the graphs show SET probability distributions and high resistive state measurements.
### Components/Axes
**Top Left Diagram:**
- Nodes: Green circles
- Edges: Blue lines
- Labels:
- pn = 0.05 (node probability)
- pr = 0.5 (edge probability)
**Top Right Diagram:**
- Nodes: Green circles
- Edges: Blue lines
- Labels:
- pn = 0.25 (node probability)
- pr = 0.05 (edge probability)
**Bottom Left Graph (SET Probability):**
- X-axis: SET Voltage (V) [0.6–1.4]
- Y-axis: SET Probability [0–1]
- Legend:
- Cyan squares: 100ns Square
- Green pluses: 500ns Square
- Red crosses: 10μs Square
- Purple diamonds: 10μs Ramp
**Bottom Right Graph (High Resistive State):**
- X-axis: Absolute Reset Voltage (V) [1.5–3.0]
- Y-axis: High Resistive State (Ω) [100k–1G]
- Legend:
- Cyan squares: Mean
- Black dashed line: 1σ
- Red dashed line: 2σ
### Detailed Analysis
**Network Diagrams:**
1. Top left shows dense connectivity (pn=0.05, pr=0.5) with 12 nodes and 30 edges
2. Top right shows sparser connectivity (pn=0.25, pr=0.05) with 10 nodes and 15 edges
3. Both diagrams use identical node/edge styling but differ in connection density
**SET Probability Graph:**
- All curves show sigmoidal behavior
- 10μs Ramp (purple) has steepest slope (Δy=0.8 at 1.2V)
- 100ns Square (cyan) has shallowest slope (Δy=0.6 at 1.2V)
- All curves converge at y=1.0 by 1.4V
**High Resistive State Graph:**
- Mean curve (cyan) shows exponential growth
- 1σ band (black) shows ±15% variation
- 2σ band (red) shows ±30% variation
- Resistance increases by 1000x from 1.5V to 3.0V
### Key Observations
1. Network connectivity inversely correlates with pr values (0.5 vs 0.05)
2. SET probability curves demonstrate pulse duration effects:
- Longer pulses (10μs) enable faster probability increase
- Shorter pulses (100ns) require higher voltages for same probability
3. Resistive state measurements show:
- 100kΩ at 1.5V (base state)
- 1GΩ at 3.0V (maximum state)
- 2σ variation exceeds 1σ by 15% at all voltages
### Interpretation
The diagrams and graphs collectively demonstrate:
1. **Network Architecture Tradeoffs**: Higher node probability (pn=0.25) reduces connectivity density despite maintaining same edge probability (pr=0.05)
2. **Electrical Switching Dynamics**:
- Pulse duration directly impacts SET probability curves
- Longer pulses enable lower voltage operation (10μs Ramp achieves 0.8 probability at 1.0V vs 1.2V for 100ns Square)
3. **Resistive State Variability**:
- Standard deviation bands show significant measurement uncertainty
- 2σ variation represents 30% of mean value at 2.5V
4. **Device Performance Implications**:
- Shorter pulses require higher voltages for reliable switching
- Resistive state measurements must account for 30% variability in critical applications
- Network connectivity patterns may correlate with device reliability metrics
</details>
Figure A.11: Mosaic connectivity example, formed by setting the probability of connection within the Neuron Tiles (pNT) and Routing Tiles (pRT). (left) Densely connected Mosaic composed of 2 Neuron Tiles and 1 Routing Tile; the corresponding graph is shown, together with its adjacency matrix. (right) Sparsely connected Mosaic. The graph is programmed to favour intra-Neuron-Tile connectivity and allows two clusters to emerge, penalizing connections between the two clusters.
<details>
<summary>Image 49 Details</summary>

### Visual Description
## Grid-Graph Heatmap System: Neuron Connection Probabilities
### Overview
The image presents a four-panel system visualizing neuron connections through grids, graphs, and heatmaps. Each panel demonstrates different connection probabilities (P_NT and P_RT) with corresponding network structures and heatmap distributions.
### Components/Axes
1. **Grid Panels (Left/Right)**
- **Structure**: 8x8 grid with neuron IDs 1-8 labeled on top/bottom rows
- **Connections**:
- Green lines: Original connections
- Blue lines: Modified connections
- **Directionality**: Arrows indicate flow from top to bottom grids
2. **Graph Panels (Center)**
- **Nodes**: Labeled 1-8 with green circles
- **Edges**: Blue lines connecting nodes
- **Probability Labels**:
- P_NT (orange): Top-right corner
- P_RT (orange): Bottom-right corner
3. **Heatmaps (Bottom)**
- **Axes**: Neuron ID (1-8) on both X and Y axes
- **Color Coding**:
- Blue squares: Active connections
- White squares: No connections
- **Positioning**: Directly below corresponding grid/graph panels
### Detailed Analysis
1. **Panel 1 (High Probability)**
- **Grid**: All neurons connected (8x8 grid)
- **Graph**: Fully connected network (15 edges)
- **Heatmap**: 15 blue squares (dense distribution)
- **Probabilities**: P_NT=0.75, P_RT=0.6
2. **Panel 2 (Medium Probability)**
- **Grid**: Partial connections (5 blue lines)
- **Graph**: Complex network with 7 edges
- **Heatmap**: 7 blue squares (moderate density)
- **Probabilities**: P_NT=0.30, P_RT=0.05
3. **Panel 3 (Low Probability)**
- **Grid**: Sparse connections (3 blue lines)
- **Graph**: Tree structure with 3 edges
- **Heatmap**: 3 blue squares (sparse distribution)
- **Probabilities**: P_NT=0.30, P_RT=0.05
### Key Observations
1. **Probability Correlation**:
- Higher P_NT/P_RT values (Panel 1) correlate with denser connections
- Lower values (Panels 2-3) show progressively sparser networks
2. **Network Topology**:
- Panel 1: Complete graph (K8)
- Panel 2: Scale-free network
- Panel 3: Hierarchical tree structure
3. **Heatmap Patterns**:
- Diagonal dominance in Panel 1
- Clustered connections in Panel 2
- Linear progression in Panel 3
### Interpretation
This visualization demonstrates how connection probabilities (P_NT and P_RT) influence neural network architecture:
1. **Dense Networks**: High probabilities (Panel 1) create fully connected systems with maximum information flow
2. **Sparse Networks**: Low probabilities (Panel 3) produce hierarchical structures with limited pathways
3. **Intermediate States**: Panel 2 shows transitional complexity between dense and sparse configurations
The system suggests a probabilistic model where:
- P_NT governs initial connection formation
- P_RT controls connection retention/pruning
- The heatmaps provide quantitative validation of network topology changes
Notably, the absence of diagonal connections in Panel 3's heatmap implies directional constraints in the pruning process, while Panel 2's clustered connections suggest community formation at intermediate probability levels.
</details>
supplementary note 3: Routing tiles define the connectivity of spiking neural networks implemented on Mosaic. When many of the memristive devices in the routing tiles are in their high-conductive state (HCS), Mosaic resembles a densely connected neural network (Fig. A.10, top left). When most of the memristors in the routing tiles are in the low-conductive state (LCS), Mosaic is sparsely connected (Fig. A.10, top right). Mosaic networks can be sparsified further by setting memristors in the neuron tiles to the LCS. To do so, we can change the probability of memristors being in their HCS in the neuron tiles, pn, and in the routing tiles, pr. RRAM switching is probabilistic, with a switching probability that depends on the voltage applied during the programming operation, as shown in Fig. A.10, bottom.
Fig. A.11 shows the construction of two graph topologies, made of 2 Neuron Tiles and 1 Routing Tile, to clarify how the graphical structure forms in the Mosaic. By controlling the probability of connections within the Neuron and Routing Tiles, we can produce a densely connected graph (left) with pNT = 0.75, pRT = 0.6, and a sparse graph (right) with pNT = 0.30, pRT = 0.05. The corresponding connectivity matrix, which maps directly onto the hardware architecture of the 3 Mosaic tiles, is also shown in the figure.
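A minimal sketch of this construction, assuming a sigmoidal switching-probability curve (its parameters v0 and beta are illustrative) and independent Bernoulli programming of each device:

```python
# Sketch: building Mosaic-style tile connectivity by programming each RRAM
# into its HCS with a tile-dependent probability (cf. Figs. A.10-A.11).
import numpy as np

rng = np.random.default_rng(0)

def p_switch(v_prog, v0=1.0, beta=10.0):
    """Assumed sigmoidal SET probability vs. programming voltage (Fig. A.10)."""
    return 1.0 / (1.0 + np.exp(-beta * (v_prog - v0)))

def program_tile(shape, p_hcs):
    """Each device independently ends up in the HCS with probability p_hcs."""
    return (rng.random(shape) < p_hcs).astype(int)

n = 4                                # neurons per tile
p_nt, p_rt = 0.75, 0.60              # dense example from Fig. A.11 (left)
neuron_tile_1 = program_tile((n, n), p_nt)
neuron_tile_2 = program_tile((n, n), p_nt)
routing_tile = program_tile((n, n), p_rt)

print("HCS fraction, neuron tiles:", neuron_tile_1.mean(), neuron_tile_2.mean())
print("HCS fraction, routing tile:", routing_tile.mean())
print("switching probability at 1.1 V:", p_switch(1.1))
```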
Figure A.12: Neuron tiles (green) transfer information in the form of spikes to each other through routing tiles (blue). Details of the Mosaic architecture are shown with the sizes of the neuron and routing tiles. The neuron tiles receive feed-forward input from the four directions North (N), East (E), West (W) and South (S), and local recurrent input from the neurons in the tile. The neurons integrate this information and, once they spike, send their output in 4 directions. Having 4 neurons in a tile gives rise to 16 outputs (4 outputs copied in 4 directions) and 20 inputs (4 inputs from each of 4 directions (16), plus 4 recurrent inputs). The routing tiles receive 16 inputs (4 inputs from 4 directions) and send out 16 outputs (4 outputs in 4 directions). In the crossbars, the red and black squares represent devices in their high-conductive and low-conductive state, respectively. The connection between the neuron tile and the routing tile is a direct wire: for instance, Vout<3:0> is the same as Vin,W<3:0>, and Vin,E<3:0> is the same as Vout,W<3:0>.
<details>
<summary>Image 50 Details</summary>

### Visual Description
## Block Diagram: Data Processing and Routing Architecture
### Overview
The image depicts a hierarchical block diagram of a data processing and routing system. It consists of three primary regions:
1. **Top-left (Green Box)**: A processing unit with labeled inputs (V_in_N, V_in_E, V_in_W, V_in_S) and outputs (V_out_0 to V_out_3).
2. **Top-right (Blue Box)**: A secondary processing unit with similar inputs but outputs labeled V_out_N, V_out_E, V_out_W, V_out_S.
3. **Bottom Grid**: A modular network of interconnected blocks labeled "20 x 4 (in 4 dir.)" and "16 x 16," with directional arrows indicating data flow.
### Components/Axes
- **Inputs/Outputs**:
- **Green Box**:
- Inputs: `V_in_N`, `V_in_E`, `V_in_W`, `V_in_S` (labeled with arrows pointing into the box).
- Outputs: `V_out_0`, `V_out_1`, `V_out_2`, `V_out_3` (labeled with arrows pointing out of the box).
- **Blue Box**:
- Inputs: `V_in_N`, `V_in_E`, `V_in_W`, `V_in_S` (same as green box).
- Outputs: `V_out_N`, `V_out_E`, `V_out_W`, `V_out_S` (labeled with arrows pointing out of the box).
- **Grid Labels**:
- "20 x 4 (in 4 dir.)": Indicates a block configuration with 20 rows and 4 columns, operating in 4 directions.
- "16 x 16": Indicates a square block configuration with 16 rows and 16 columns.
- **Arrows**:
- Green arrows (→) represent data flow from inputs to outputs.
- Blue arrows (↑/↓) indicate vertical data routing.
- Red squares (■) and black squares (■) in the green box may represent processing units or control logic.
### Detailed Analysis
- **Green Box**:
- Contains 16 red squares (■) and 16 black squares (■) arranged in a grid.
- Arrows connect inputs to outputs via these squares, suggesting conditional or parallel processing.
- Outputs `V_out_0` to `V_out_3` are labeled with directional arrows, implying discrete data streams.
- **Blue Box**:
- Contains 16 red squares (■) and 16 black squares (■) in a similar grid.
- Outputs `V_out_N`, `V_out_E`, `V_out_W`, `V_out_S` suggest directional routing (North, East, West, South).
- **Grid Network**:
- Composed of alternating blue and green blocks.
- Labels "20 x 4" and "16 x 16" indicate varying block sizes, possibly representing different processing capacities or data throughput rates.
- Arrows connect blocks in a grid pattern, suggesting a scalable or modular architecture.
### Key Observations
1. **Modular Design**: The system uses repeatable blocks (e.g., 16x16, 20x4) for scalability.
2. **Directional Routing**: Outputs in the blue box are labeled with cardinal directions, implying spatial or directional data handling.
3. **Conditional Processing**: Red and black squares in the green box may represent logic gates or decision points.
4. **Data Flow**: Arrows indicate a hierarchical flow from inputs to outputs, with potential feedback loops (e.g., `V_out_3` feeding back into the grid).
### Interpretation
This diagram likely represents a **parallel data processing system** with modular components. The green and blue boxes act as processing stages, while the grid below represents a network for routing or distributing processed data. The use of directional labels (N, E, W, S) suggests applications in spatial data handling, such as sensor networks or communication systems. The varying block sizes (16x16 vs. 20x4) may indicate optimization for different data types or computational loads.
**Notable Patterns**:
- The green box processes inputs into four discrete outputs, while the blue box routes outputs directionally.
- The grid’s "20 x 4" and "16 x 16" blocks suggest a balance between granularity and scalability.
**Underlying Functionality**:
The system could be part of a **distributed computing framework**, where inputs are processed in parallel (green/blue boxes) and then aggregated or routed through a network (grid). The directional outputs imply applications in robotics, IoT, or real-time data analytics.
</details>
supplementary note 4: Figure A.12 shows the details of the Mosaic architecture, with a zoomed-in neuron and routing tile pair. The diagram at the top shows how one cluster of neurons/one router sends and receives information to and from the routing/neuron tile. This highlights a strength of this architecture: connectivity is achieved through simple wiring to the neighbour, without suffering from long wires, as the maximum length of a wire is the length of one row/column plus that of the connecting column/row.
Figure A.13: Schematic of the neuron tile, including the CMOS synapse and neuron circuits fabricated for use in this paper. RRAMs are used as the weights of the neurons. On the arrival of any of the input events Vin<i>, the amplifier pins node Vx to Vtop, and thus a read voltage equal to Vtop - Vbot is applied across Gi, giving rise to a current iin at M1. This current is mirrored to M2, giving rise to ibuff, which is in turn mirrored again through the M3-M4 transistor pair. The 'synaptic dynamics' circuit is the Differential Pair Integrator (DPI) [311]. On the arrival of any of the input events Vi, 0 < i < n, a current Iw, equal to ibuff, flows in transistor M5. Depending on the value of Vg, a portion of Iw flows out of the MOS capacitor M6 and discharges it. This current is proportional to Gi, 0 < i < n. As soon as the event is over, MOS capacitor M6 charges back through the M8 path with current Itau, which determines the rate of charging and thus the time constant of the synaptic dynamics. The output current of the DPI synapse, Isyn, is injected into the neuron's membrane potential node, Vmem, and charges MOS capacitor M13. There is also an alternative path with a DC current input through M17, which can charge the neuron's membrane potential. The membrane potential charges with a time constant determined by Vlk at the gate of M11. As soon as the voltage developed on Vmem passes the threshold of the following inverter stage, it generates a pulse. The width of the pulse depends on the delay of the feedback path from Vout to the gate of M12; this delay is determined by the inverter delays and the refractory time constant. The inverter symbols with horizontal dashed lines correspond to starved inverter circuits with longer delays. The refractory period time constant depends on the MOS cap M16 and the bias on Vrp.
<details>
<summary>Image 51 Details</summary>

### Visual Description
## Circuit Diagram: Voltage Buffer and Amplification Stage
### Overview
The diagram depicts a three-stage electronic circuit:
1. **Left Section (Red Box)**: Voltage divider with operational amplifier (op-amp)
2. **Middle Section (Green Box)**: Buffer stage using MOSFETs (M2, M3)
3. **Right Section (Blue Box)**: Amplification chain with multiple MOSFETs and inverters
### Components/Axes
**Left Section (Red Box):**
- **Voltage Divider**: Resistors labeled G₀, G₁, ..., Gₙ with voltage taps V₀, V₁, ..., Vₙ
- **Op-Amp**: Non-inverting configuration with input Vₓ and output V_top
- **MOSFET M1**: Connected between V_top and ground
**Middle Section (Green Box):**
- **Buffer Stage**:
- MOSFET M2: Source connected to V_in, gate to V_top, drain to V_buff
- MOSFET M3: Source grounded, gate to V_buff, drain to V_in
**Right Section (Blue Box):**
- **Amplification Chain**:
- MOSFETs M4-M16 arranged in a cascaded configuration
- Inverter stages (triangle symbols) between M5-M9 and M13-M15
- Output labeled V_out with load resistor Rₚ
### Detailed Analysis
**Left Section:**
- Voltage divider creates reference voltages V₀ to Vₙ
- Op-amp maintains V_top ≈ V_in (non-inverting gain ≈1)
- M1 acts as a current sink/sink for V_top
**Middle Section:**
- M2/M3 form a voltage follower with high input impedance
- V_buff = V_in (ideal buffer behavior)
**Right Section:**
- Cascaded MOSFETs (M4-M16) create multi-stage amplification
- Inverter stages (M5-M9, M13-M15) introduce phase shifts
- Final output V_out drives load Rₚ
### Key Observations
1. **Signal Flow**: V_in → Voltage divider → Op-amp → Buffer → Amplification chain → V_out
2. **Isolation**: Green box buffer isolates input from amplification stages
3. **Phase Shifts**: Inverter stages introduce 180° phase shifts at each stage
4. **Load Matching**: Rₚ terminates the output to prevent reflections
### Interpretation
This circuit implements a buffered, multi-stage amplifier with:
- **Input Conditioning**: Voltage divider and op-amp stabilize reference voltages
- **Isolation**: Buffer stage prevents loading effects on preceding stages
- **Gain Control**: Cascaded MOSFETs provide adjustable gain through gate-source voltages (V_gs)
- **Phase Adjustment**: Inverter stages enable precise timing control in signal processing
The design suggests applications in precision instrumentation or RF signal conditioning, where high input impedance and controlled gain are critical. The use of MOSFETs instead of BJTs indicates a preference for low noise and high input impedance characteristics.
</details>
Figure A.14: Measurements of the fabricated neuron's output frequency as a function of the input DC voltage. The DC voltage is applied at the gate of transistor M17, shown in Fig. A.13 as Vdc. Therefore, as the gate voltage of M17 changes linearly, the current of M17, and thus the output frequency of the neuron, changes non-linearly. Each curve is measured with a different neuron time constant, determined by a different voltage, Vlk, on the gate of transistor M11 in Fig. A.13. As the leak voltage increases, the neuron's time constant decreases, giving rise to a lower output frequency.
<details>
<summary>Image 52 Details</summary>

### Visual Description
## Line Chart: Output Frequency vs. Vdc for Various Vlk Values
### Overview
The chart illustrates the relationship between output frequency (kHz) and Vdc (mV) for eight distinct Vlk voltage settings (260mV to 370mV). Each Vlk value is represented by a unique colored line, showing how output frequency increases with Vdc. All lines originate at zero frequency and exhibit nonlinear growth, with steeper slopes corresponding to higher Vlk values.
### Components/Axes
- **X-axis**: Vdc (mV), ranging from 200 to 400 mV in 25 mV increments.
- **Y-axis**: Output Frequency (kHz), ranging from 0 to 80 kHz in 20 kHz increments.
- **Legend**: Positioned on the right, mapping colors to Vlk values:
- Blue: Vlk=260mV
- Green: Vlk=270mV
- Orange: Vlk=280mV
- Red: Vlk=290mV
- Purple: Vlk=300mV
- Black: Vlk=320mV
- Gray: Vlk=350mV
- Dark Blue: Vlk=370mV
### Detailed Analysis
1. **Vlk=260mV (Blue)**:
- Flatline at 0 kHz until Vdc=375mV, then rises sharply to ~15 kHz at 400mV.
- Slope: ~0.375 kHz/mV (260mV to 400mV).
2. **Vlk=270mV (Green)**:
- Begins rising at Vdc=325mV (~5 kHz), reaching ~30 kHz at 400mV.
- Slope: ~0.65 kHz/mV.
3. **Vlk=280mV (Orange)**:
- Starts at Vdc=300mV (~10 kHz), peaks at ~45 kHz at 400mV.
- Slope: ~0.8 kHz/mV.
4. **Vlk=290mV (Red)**:
- Activates at Vdc=275mV (~15 kHz), reaches ~55 kHz at 400mV.
- Slope: ~1.0 kHz/mV.
5. **Vlk=300mV (Purple)**:
- Begins at Vdc=250mV (~20 kHz), climbs to ~65 kHz at 400mV.
- Slope: ~1.1 kHz/mV.
6. **Vlk=320mV (Black)**:
- Starts at Vdc=225mV (~25 kHz), peaks at ~75 kHz at 400mV.
- Slope: ~1.3 kHz/mV.
7. **Vlk=350mV (Gray)**:
- Activates at Vdc=200mV (~30 kHz), reaches ~80 kHz at 400mV.
- Slope: ~1.5 kHz/mV.
8. **Vlk=370mV (Dark Blue)**:
- Sharpest rise: ~35 kHz at Vdc=350mV, ~85 kHz at 400mV.
- Slope: ~1.75 kHz/mV.
### Key Observations
- **Nonlinear Growth**: All lines exhibit exponential-like growth, with frequency increasing more rapidly at higher Vdc values.
- **Vlk Dependency**: Higher Vlk values produce steeper slopes, indicating a direct proportionality between Vlk and frequency gain.
- **Threshold Behavior**: Lower Vlk values (e.g., 260mV) require higher Vdc to activate, while higher Vlk values (e.g., 370mV) respond at lower Vdc.
- **Convergence**: Lines cluster tightly at low Vdc (200–250mV) but diverge significantly above 300mV.
### Interpretation
The data demonstrates that output frequency is both voltage-dependent (Vdc) and Vlk-dependent. Higher Vlk values enable greater frequency output at the same Vdc, suggesting Vlk acts as a multiplier for voltage-to-frequency conversion efficiency. The nonlinear relationship implies potential saturation effects at extreme Vdc values. This behavior could be critical for tuning oscillators or voltage-controlled frequency synthesizers, where Vlk adjustments allow precise frequency modulation without altering Vdc. The absence of data below Vdc=200mV for higher Vlk values may indicate operational limits or measurement constraints.
</details>
supplementary note 5: Details of the implementation of the neuron row, the circuit that leverages the conductance of a memristor to weight the effect of a spike on a neuron, are shown in Figure A.13. The circuit features multiple inputs connected to a row of memristive devices (left) and a front-end circuit buffering the current read from the devices into a differential-pair-integrator synapse. The synapse is then connected to a leaky integrate-and-fire (LIF) neuron, which eventually emits a spike. Figure A.14 delves deeper into the behaviour of the LIF neuron, analyzing its output spiking frequency against an input DC voltage and its linear behaviour with respect to the RRAM conductance in a neuron row circuit.
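The qualitative behaviour of Figures A.13-A.14 can be approximated with a simple discrete-time LIF model. The sketch below uses illustrative constants rather than measured circuit values; it reproduces the threshold behaviour and the non-linear rise of output frequency with input current seen in the measurements.

```python
# Sketch: discrete-time LIF neuron, mimicking the frequency-vs-input behaviour
# of Fig. A.14. All constants (tau values, threshold, dt) are illustrative.

def lif_rate(i_dc, tau_mem=2e-3, tau_ref=0.2e-3, v_th=1.0, dt=1e-6, t_sim=0.1):
    """Output spike rate (Hz) for a constant input current i_dc."""
    v, spikes, refrac = 0.0, 0, 0.0
    for _ in range(int(t_sim / dt)):
        if refrac > 0:                       # refractory period after a spike
            refrac -= dt
            continue
        v += dt * (-v / tau_mem + i_dc)      # leaky integration of input current
        if v >= v_th:                        # threshold crossing -> spike
            spikes += 1
            v, refrac = 0.0, tau_ref
    return spikes / t_sim

for i_dc in (400.0, 600.0, 800.0, 1000.0):   # arbitrary current units
    print(f"i_dc={i_dc:6.1f} -> {lif_rate(i_dc):7.1f} Hz")
```

Below threshold (here i_dc = 400, where the steady-state membrane voltage stays under v_th) the neuron is silent, matching the flat region of the measured curves; above it, the rate rises non-linearly with the input.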
Figure A.15: Schematic of a routing tile circuit offering two paths per direction. The routing tile receives eight inputs, comprising two pulse channels per direction, labelled <0> or <1>, from the neighbouring tiles to the North (N), South (S), East (E) and West (W), and provides complementary outputs. An example is shown of an input pulse arriving at the common gate of the fourth row of memory. Devices are coloured green or red to denote whether they are in the HCS or LCS. Due to this input pulse, output pulses are produced by the routing columns containing the (green) devices programmed to the HCS.
<details>
<summary>Image 53 Details</summary>

### Visual Description
## Block Diagram: Programmable Logic Array with Voltage-Controlled Switches
### Overview
The diagram depicts a structured array of voltage-controlled logic switches organized in a grid format. It shows input voltage conditions (V_in) mapped to output voltage states (V_out) through a series of interconnected components. The system uses color-coded symbols (red for LCS, green for HCS) to indicate operational states or control conditions.
### Components/Axes
- **Vertical Input Axis (Left Side):**
- Labels: V_in<0,N>, V_in<1,N>, V_in<0,S>, V_in<1,S>, V_in<0,W>, V_in<1,W>, V_in<0,E>, V_in<1,E>
- Symbols: Red (LCS) and green (HCS) switches/icons
- Flow direction: Input signals enter from the left
- **Horizontal Output Axis (Bottom):**
- Labels: V_out<0,N>, V_out<1,N>, V_out<0,S>, V_out<1,S>, V_out<0,W>, V_out<1,W>, V_out<0,E>, V_out<1,E>
- Flow direction: Output signals exit downward
- **Grid Structure:**
- Columns labeled V_bot (repeated 8 times)
- Each column contains 8 identical switch-like components
- Components arranged in 8 rows corresponding to input conditions
- **Legend (Left Side):**
- Red symbol: LCS (low-conductive state)
- Green symbol: HCS (high-conductive state)
### Detailed Analysis
1. **Input-Output Mapping:**
- Each input condition (e.g., V_in<0,N>) connects to a corresponding output (V_out<0,N>)
- Inputs are categorized by directional subscripts (N, S, W, E) and binary states (0,1)
2. **Switch Component Behavior:**
- All switches share identical structure:
- Top: Voltage source (V_bot)
- Middle: Controlled by input conditions
- Bottom: Connected to output nodes
- Red (LCS) and green (HCS) symbols appear in specific input-output pairs:
- Red (LCS): V_in<0,S>, V_in<1,S>, V_in<0,W>, V_in<1,W>, V_in<0,E>, V_in<1,E>
- Green (HCS): V_in<0,S>, V_in<1,S>, V_in<0,W>, V_in<1,W>, V_in<0,E>, V_in<1,E>
3. **Control Logic:**
- The grid suggests a matrix of configurable logic gates
- V_bot appears to be a common control voltage for all switches
- Input conditions determine which switches activate (red/green states)
### Key Observations
1. **Symmetry in Control:**
- All directional subscripts (N,S,W,E) appear equally in both input and output
- Binary states (0,1) are consistently paired across the grid
2. **Color Distribution:**
- Red (LCS) and green (HCS) symbols appear in identical positions across all columns
- No variation in symbol placement between columns
3. **Flow Consistency:**
- All input signals follow identical left-to-right then downward flow
- Outputs maintain uniform downward trajectory
### Interpretation
This diagram represents a configurable logic array where:
1. **Directional Control:** The N/S/W/E subscripts suggest spatial orientation control, possibly for a multi-axis system
2. **Binary State Management:** The 0/1 states indicate digital logic control
3. **Hybrid Operation:** The coexistence of LCS and HCS suggests a system combining pure logic with hybrid control mechanisms
4. **Modular Design:** The repeated V_bot columns imply a scalable architecture where additional control voltages can be added
The system appears designed for complex pattern recognition or spatial data processing, where input conditions (V_in) determine output states (V_out) through programmable logic paths. The consistent symbol placement across columns suggests standardized control protocols for each directional quadrant.
</details>
Figure A.16: The routing column circuit with example waveforms. Input voltage pulses Vin (red, left) draw a current iin proportional to the conductance state Gn of the read 1T1R structures. Two devices are labelled HCS, indicating that they have been programmed to a conductance corresponding to the high-conductance state, and one is labelled LCS in reference to the low-conductance state. The resulting currents are buffered (green, centre) as ibuff into a current comparator circuit, where they are compared with a reference current iref. When the buffered current exceeds the reference current, a voltage pulse is generated at the column output (blue, right).
<details>
<summary>Image 54 Details</summary>

### Visual Description
## Circuit System Response Analysis
### Overview
The image presents a technical analysis of a circuit system's response to input voltage thresholds, featuring three time-domain graphs and a circuit diagram. The graphs depict voltage/current responses over time (0-3ms) for different input conditions, while the circuit diagram illustrates component relationships and signal flow.
### Components/Axes
**Graph 1 (Left):**
- **X-axis**: Time (ms) [0, 1, 2, 3]
- **Y-axis**: Voltage (V) [0.0, 0.25, 0.5, 0.75, 1.0, 1.25]
- **Legend**:
- Orange: V_in <0> (top-left)
- Red: V_in <1> (middle)
- Dark Red: V_in <N> (bottom)
- **Key Elements**: Vertical voltage spikes at specific time points
**Graph 2 (Center):**
- **X-axis**: Time (ms) [0, 1, 2, 3]
- **Y-axis**: Current (μA) [0, 0.5, 1.0, 1.5, 2.0, 2.5]
- **Legend**:
- Green: HCS (right)
- Green Arrow: LCS (pointing to 2ms spike)
- **Key Elements**: Current spikes at 1ms (HCS) and 2ms (LCS)
**Graph 3 (Right):**
- **X-axis**: Time (ms) [0, 1, 2, 3]
- **Y-axis**: Current (μA) [0, 0.25, 0.5, 0.75, 1.0]
- **Legend**:
- Blue: V_out (right)
- **Key Elements**: Current spike at 3ms
**Circuit Diagram (Bottom):**
- **Components**:
- G0, G1, ..., GN (voltage-controlled switches)
- V_bot (input voltage source)
- V_in (input voltage thresholds)
- V_x (intermediate voltage)
- V_top (reference voltage)
- i_in, i_buff, i_ref (current paths)
- V_out (output voltage)
- **Signal Flow**:
- Input voltages (V_in) trigger G0-GN switches
- Current paths split between HCS (i_buff) and LCS (i_ref)
- Output V_out generated through operational amplifier
### Detailed Analysis
**Graph 1 Trends:**
- V_in <0> (orange): 1.2V spike at 1ms
- V_in <1> (red): 1.0V spike at 2ms
- V_in <N> (dark red): 0.8V spike at 3ms
- All spikes show 0V baseline before/after events
**Graph 2 Trends:**
- HCS (green): 2.5μA spike at 1ms (duration ~1ms)
- LCS (green): 1.5μA spike at 2ms (duration ~1ms)
- No overlap between HCS and LCS current paths
**Graph 3 Trends:**
- V_out (blue): 1.0μA spike at 3ms (duration ~1ms)
- No activity before 3ms
**Circuit Diagram Details:**
- Voltage thresholds (V_in <0>, <1>, <N>) control G0-GN switches
- HCS path: V_x → i_buff → V_out
- LCS path: V_top → i_ref → V_out
- Operational amplifier configuration suggests current summation
### Key Observations
1. Voltage response precedes current response by 1ms (Graph 1 spikes at 1ms precede Graph 2 HCS at 1ms)
2. LCS response delayed by 1ms compared to HCS (2ms vs 1ms)
3. Output current (V_out) delayed by 1ms compared to LCS (3ms vs 2ms)
4. Current magnitudes decrease with increasing input threshold (2.5→1.5→1.0μA)
### Interpretation
This system demonstrates a cascaded response mechanism where:
1. Input voltage thresholds (V_in) trigger sequential activation of G0-GN switches
2. HCS (High Current Switch) responds first at 1ms, followed by LCS (Low Current Switch) at 2ms
3. Output current (V_out) integrates both paths with a 1ms delay, suggesting processing time
4. Current magnitude inversely correlates with input threshold sensitivity (higher thresholds = lower currents)
The circuit appears to implement a time-multiplexed current switching system with threshold-dependent response characteristics. The 1ms delays between stages suggest intentional design for sequential processing, while the current magnitude differences indicate varying load capacities for different input conditions.
</details>
supplementary note 6: Details of the implementation of the routing tiles. Figure A.15 shows a full-size schematic of a routing tile with 2 neurons allocated per direction. Figure A.16 expands on the details of the implementation of the routing column, the circuit that uses the state of a memristor to decide whether to block or pass (route) a spike through the Mosaic architecture.
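The routing-column decision of Figure A.16 reduces to a current comparison, sketched below. The conductance levels follow the HCS/LCS values quoted in supplementary note 9; the reference current as an assumed midpoint between the two read currents.

```python
# Sketch of the routing-column decision in Fig. A.16: the read current drawn
# through a device is compared against a reference, and a spike is regenerated
# only if the device is in its HCS.

V_READ = 100e-3          # read voltage across the device (V)
G_HCS = 40e-6            # high-conductance state (S)
G_LCS = 10e-6            # low-conductance state (S)
I_REF = V_READ * (G_HCS + G_LCS) / 2   # assumed reference between the two levels

def route_spike(g_device, v_read=V_READ, i_ref=I_REF):
    """Return True if the column regenerates (routes) the incoming pulse."""
    i_in = v_read * g_device          # current drawn during the input pulse
    return i_in > i_ref               # current comparator decision

print(route_spike(G_HCS))   # True  -> spike propagates
print(route_spike(G_LCS))   # False -> spike blocked
```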
Figure A.17: An example of how a neuron tile can be interfaced to external event-based inputs (i.e., those generated by an event-based sensor). With respect to the neuron tile circuit presented in the paper (permitting connections to adjacent tiles as well as recurrent connections within the tile), this figure shows two additional rows of devices stacked on top of the array. As an arbitrary example, two additional signals can here be integrated in the neuron circuits.
<details>
<summary>Image 55 Details</summary>

### Visual Description
## Circuit Diagram: Dual-Input Voltage-Controlled Buffer Circuit
### Overview
The diagram illustrates a dual-input, dual-output buffer circuit with voltage-controlled switching elements. It features two input voltage sources (V_in<0> and V_in<1>), two output voltage nodes (V_out<0> and V_out<1>), and a matrix of voltage-controlled switches (V_bot) arranged in a 2x2 grid. Buffer currents (i_buff) are shown flowing from the output nodes.
### Components/Axes
- **Voltage-Controlled Switches (V_bot)**:
- Positioned at the top of the diagram, labeled "V_bot" on both sides.
- Represented by switch symbols with wavy lines, connected to input/output nodes.
- **Input Voltages**:
- Labeled "V_in<0>" and "V_in<1>", positioned vertically on the left side.
- Connected to the lower row of switches via vertical lines.
- **Output Voltages**:
- Labeled "V_out<0>" and "V_out<1>", positioned vertically at the bottom.
- Connected to the upper row of switches via vertical lines.
- **Buffer Currents (i_buff)**:
- Indicated by green arrows pointing downward from the output nodes.
- Labeled "i_buff" at the base of the output connections.
### Detailed Analysis
- **Switch Matrix Structure**:
- The 2x2 grid of switches connects input voltages (V_in<0>, V_in<1>) to output voltages (V_out<0>, V_out<1>).
- Each switch column is controlled by a shared V_bot voltage, suggesting multiplexing/demultiplexing functionality.
- **Signal Flow**:
- External inputs enter from the left, passing through switches to reach output nodes.
- V_bot controls switch states, determining which input voltage is routed to which output.
- **Symmetry**:
- Identical switch configurations on both sides of the diagram imply mirrored functionality for the two output channels.
### Key Observations
1. **No Numerical Values**: The diagram lacks explicit voltage/current values, focusing on structural relationships.
2. **Voltage Control**: V_bot acts as a control signal, modulating switch states to route inputs to outputs.
3. **Buffer Functionality**: The output nodes are labeled as buffers, suggesting isolation or amplification of input signals.
### Interpretation
This circuit likely functions as a **voltage-controlled routing network**, where V_bot determines signal paths between inputs and outputs. The dual-input, dual-output
</details>
## supplementary note 7
Figure A.18: SHD keyword-spotting dataset test accuracies for Mosaic architectures with different total numbers of neurons in the network, for a) a 4 x 4 Neuron Tile layout (16 Neuron Tiles in total) and b) an 8 x 8 Neuron Tile layout. The number of neurons per tile equals the total number of recurrent neurons divided by the number of Neuron Tiles. Median and standard deviation are calculated over 3 experiments with varying sparsity constraints.
<details>
<summary>Image 56 Details</summary>

### Visual Description
## Bar Charts: SHD Test Accuracy vs. Total Recurrent Neurons
### Overview
The image contains two bar charts (labeled **a** and **b**) comparing SHD test accuracy against total recurrent neurons for two different neuron tile configurations: **16** (chart **a**) and **64** (chart **b**). Each chart shows four bars representing different total recurrent neuron counts (2048, 1024, 512, 256), with test accuracy values on the y-axis.
### Components/Axes
- **X-axis**: "Total recurrent neurons" with categories: 2048, 1024, 512, 256.
- **Y-axis**: "Test accuracy" with a scale from 0 to 80.
- **Legend**: Located on the right side of each chart, with four colors:
- **Green**: 2048 total recurrent neurons
- **Blue**: 1024 total recurrent neurons
- **Gray**: 512 total recurrent neurons
- **Red**: 256 total recurrent neurons
- **Chart Titles**:
- **a**: "Number of Neuron Tiles: 16"
- **b**: "Number of Neuron Tiles: 64"
### Detailed Analysis
#### Chart **a** (Neuron Tiles: 16)
- **Green (2048 neurons)**: ~70% test accuracy.
- **Blue (1024 neurons)**: ~65% test accuracy.
- **Gray (512 neurons)**: ~60% test accuracy.
- **Red (256 neurons)**: ~50% test accuracy.
#### Chart **b** (Neuron Tiles: 64)
- **Green (2048 neurons)**: ~75% test accuracy.
- **Blue (1024 neurons)**: ~65% test accuracy.
- **Gray (512 neurons)**: ~50% test accuracy.
- **Red (256 neurons)**: ~30% test accuracy.
### Key Observations
1. **Decreasing Accuracy with Fewer Neurons**: In both charts, test accuracy declines as the total recurrent neuron count decreases.
2. **Steeper Drop in Chart **b** (64 Neuron Tiles)**: The reduction in accuracy is more pronounced when neuron tiles increase from 16 to 64. For example:
- At 256 neurons, accuracy drops from ~50% (chart **a**) to ~30% (chart **b**).
3. **Consistent Trend**: The relationship between neuron count and accuracy is linear in both charts, but the magnitude of the drop varies with neuron tile count.
### Interpretation
The data suggests that **increasing the number of neuron tiles** (from 16 to 64) amplifies the sensitivity of SHD test accuracy to reductions in total recurrent neurons. This implies a potential trade-off: models with more neuron tiles may require more recurrent neurons to maintain performance, or the architecture of larger neuron tile configurations is less robust to resource constraints. The steeper decline in chart **b** highlights that neuron tile count could be a critical factor in optimizing neural network efficiency under limited computational resources.
</details>
<details>
<summary>Image 57 Details</summary>

### Visual Description
## Bar Charts: SHD Test Accuracy vs. Sparsity Regularization Constant
### Overview
The image contains two side-by-side bar charts comparing SHD test accuracy across three sparsity regularization constants (λ = 0.1, 0.05, 0.01) for neural networks with 16 and 64 neuron tiles. Test accuracy is measured on the y-axis (0–80%), while λ values are plotted on the x-axis. Error bars indicate variability in test accuracy.
### Components/Axes
- **Chart (a)**:
- Title: "Number of Neuron Tiles: 16"
- X-axis: "Regularization constant, λ" (values: 0.1, 0.05, 0.01)
- Y-axis: "Test accuracy" (0–80%)
- Bars: Purple (λ = 0.1), Maroon (λ = 0.05), Dark Purple (λ = 0.01)
- Error bars: Vertical lines atop each bar.
- **Chart (b)**:
- Title: "Number of Neuron Tiles: 64"
- X-axis: Same λ values as chart (a).
- Y-axis: Same scale as chart (a).
- Bars: Identical color coding as chart (a).
### Detailed Analysis
#### Chart (a): 16 Neuron Tiles
- **λ = 0.1**: Test accuracy ≈ 60% (error range: ~55–65%).
- **λ = 0.05**: Test accuracy ≈ 62% (error range: ~58–66%).
- **λ = 0.01**: Test accuracy ≈ 65% (error range: ~60–70%).
#### Chart (b): 64 Neuron Tiles
- **λ = 0.1**: Test accuracy ≈ 62% (error range: ~57–67%).
- **λ = 0.05**: Test accuracy ≈ 64% (error range: ~60–68%).
- **λ = 0.01**: Test accuracy ≈ 68% (error range: ~63–73%).
### Key Observations
1. **Higher neuron tiles improve accuracy**: For all λ values, test accuracy increases when neuron tiles double from 16 to 64.
2. **Regularization strength impacts accuracy**: Lower λ values (stronger regularization) generally yield higher accuracy, with the largest gains observed at λ = 0.01.
3. **Error variability**: Test accuracy variability (error bars) increases slightly with lower λ values, particularly in chart (b).
### Interpretation
The data suggests that increasing the number of neuron tiles enhances model performance, likely due to greater capacity to capture complex patterns. Stronger regularization (lower λ) further improves accuracy, possibly by reducing overfitting. However, the trade-off between regularization strength and model complexity is evident: while λ = 0.01 achieves the highest accuracy, it may risk underfitting if applied excessively. The error bars highlight the importance of statistical validation, as accuracy estimates vary within ±3–5% ranges.
**Note**: No explicit legend is present, but color coding (purple → maroon → dark purple) consistently maps to λ values across both charts. Spatial grounding confirms charts (a) and (b) are aligned horizontally, with titles and axes labeled identically for direct comparison.
</details>
Figure A.19: SHD keyword-spotting dataset test accuracies for Mosaic architectures trained with different sparsity regularization values. As explained in the Methods section of the main text, the regularization is added to the loss function to exponentially penalize long-range connections. The plot shows the accuracy for strong (default, λ = 0.1), medium (λ = 0.05) and weak (λ = 0.01) sparsity regularization on a) the 4 x 4 neuron tile layout and b) the 8 x 8 neuron tile layout. Median and standard deviation are calculated over 4 experiments with varying numbers of neurons per neuron tile.
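A minimal sketch of a layout-aware regularizer of this kind is shown below. The grid layout, the Manhattan distance metric, and the exact functional form exp(distance) x |w| are illustrative assumptions, not the exact term from the Methods section.

```python
# Sketch: a layout-aware regularizer that exponentially penalizes long-range
# connections, in the spirit of the caption above.
import numpy as np

def tile_distance_matrix(n_tiles_side, neurons_per_tile):
    """Manhattan distance between the tiles hosting each pair of neurons."""
    n = n_tiles_side ** 2 * neurons_per_tile
    tile_of = np.arange(n) // neurons_per_tile
    tx, ty = tile_of % n_tiles_side, tile_of // n_tiles_side
    return np.abs(tx[:, None] - tx[None, :]) + np.abs(ty[:, None] - ty[None, :])

def sparsity_penalty(w, dist, lam=0.1):
    """lam * sum over synapses of exp(distance) * |w| (assumed form)."""
    return lam * np.sum(np.exp(dist) * np.abs(w))

rng = np.random.default_rng(0)
dist = tile_distance_matrix(n_tiles_side=4, neurons_per_tile=4)  # 4x4 layout
w = rng.normal(scale=0.1, size=dist.shape)                       # recurrent weights
print("penalty (lambda=0.1):", sparsity_penalty(w, dist))
```

Because the penalty grows exponentially with inter-tile distance, gradient descent suppresses long-range weights first, driving the trained network towards the small-world connectivity that Mosaic implements cheaply.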
## supplementary note 8
<details>
<summary>Image 58 Details</summary>

### Visual Description
## Circuit Diagram: Voltage-Controlled Current Mirror with Op-Amp Feedback
### Overview
The diagram depicts a voltage-controlled current mirror circuit incorporating an operational amplifier (op-amp) for feedback stabilization. Two MOSFETs (M1 and M2) form the core current mirror, while the op-amp regulates the output current (I_out) by comparing the voltage at V_top with ground. The resistor G_i and voltage V_x introduce a feedback path to modulate the input current (i_in).
---
### Components/Axes
1. **Components**:
- **M1**: N-channel MOSFET acting as the input current source.
- **M2**: N-channel MOSFET mirroring the current from M1.
- **G_i**: Resistor connected between V_x and V_in, introducing feedback.
- **Op-Amp**: High-gain amplifier (gain = A) with non-inverting input (+) connected to V_top and inverting input (-) grounded.
- **V_in**: Input voltage source driving M1's gate.
- **V_bot**: Reference voltage connected to M1's source.
- **V_x**: Voltage node between G_i and V_in.
- **V_out**: Output voltage node connected to M2's drain.
- **I_out**: Output current flowing through M2.
2. **Flow**:
- V_in → M1 gate → M1 drain → V_out → M2 gate → M2 drain → I_out.
- Feedback loop: V_x (via G_i) → V_in → M1 → V_out → Op-amp → Adjusts V_top to stabilize I_out.
---
### Detailed Analysis
- **M1 and M2**: Form a basic current mirror. M1's drain current (i_in) is mirrored to M2's drain current (I_out) under ideal conditions (V_top ≈ V_bot).
- **Op-Amp Role**: The op-amp compares V_top (feedback voltage) with ground. Its high gain (A) ensures V_top is driven to match the voltage at the non-inverting input, stabilizing I_out despite load variations.
- **Resistor G_i**: Introduces a voltage drop (V_x) that modulates the input current i_in, enabling dynamic control of the current mirror ratio.
---
### Key Observations
1. **No Numerical Values**: The diagram lacks explicit values for A, G_i, or voltages (V_in, V_bot, V_top). Assumptions:
- A ≈ 100,000 (typical op-amp gain).
- G_i ≈ 1 kΩ (common biasing resistor).
- V_bot ≈ 0 V (ground reference).
2. **Feedback Mechanism**: The op-amp's negative feedback loop ensures V_top ≈ 0 V, forcing M2 to operate in saturation for stable current mirroring.
3. **Voltage Nodes**: V_x is critical for feedback, linking the op-amp's input to the MOSFET biasing network.
---
### Interpretation
This circuit combines a current mirror with op-amp feedback to achieve **precision current regulation**. The op-amp compensates for mismatches between M1 and M2 (e.g., threshold voltage differences) and external load variations. The resistor G_i allows tuning the current mirror ratio by adjusting V_x. Such a design is common in analog integrated circuits for biasing transistors or driving constant-current loads. The absence of numerical data suggests this is a schematic for conceptual understanding rather than a specific implementation.
</details>
Figure A.20: Analysis of the read-out circuitry. The amplifier with gain A pins voltage Vx to the voltage Vtop. On the arrival of a pulse on Vin<i>, a current equal to iin = (Vtop - Vbot) Gi flows through memristor i, which is then mirrored out as iout.
supplementary note 9: Figure A.20 details the implementation of the read-out circuit used in the Mosaic architecture. Though not optimized for area, we have used this implementation for both the neuron and routing tiles.
The dominant power consumption of the circuit depends on the required bandwidth (BW) of the feedback loop. This BW depends on the maximum conductance of the RRAM, Gmax. For Gi,max, once an input arrives at Vin<i>, the current iin has to settle to (Vtop - Vbot) Gi,max within a settling time ts, a fraction of the pulse width. This timing sets the speed at which the loop should operate, and thus its BW. If the loop does not close in this time, the amplifier will slew and the voltage Vx drops. In both the neuron and routing tiles, this condition should be met for Vx to stay pinned at Vtop while the RRAM is being read. However, the neuron and routing tiles have different BW requirements.
In the neuron tile, the read-out circuitry has to resolve at least 8 levels of current, one for each of the 8 levels that an RRAM device can take. Therefore, the least significant bit (LSB) of the iin current for the neuron tile is iin,LSB,N = Vref (Gmax - Gmin)/N. Based on Fig. 5.2d, this value for the neuron tile is 100 mV x (120 µS - 40 µS)/8 = 1 µA. Note that since the 8 levels to be resolved lie in the low-resistive state (LRS) of the RRAM, Gmin and Gmax are the minimum and maximum of the LRS range, which correspond to 40 µS and 120 µS.
In the routing tile, the read-out circuitry has to resolve between two levels, which either let the spike regenerate and thus propagate, or block it. Therefore, the LSB of the iin current in the routing tile is iin,LSB,R = Vref (Gmax - Gmin)/N. Based on Fig. 5.2d, this value for the routing tile is 100 mV (40 µS - 10 µS)/2 = 15 µA. Note that since the 2 levels to be resolved are the LRS and HRS of the RRAM, Gmin and Gmax correspond to 10 µS and 40 µS.
To be able to distinguish between any two levels in both cases, we consider a maximum error of iin,LSB/2. Therefore, the maximum tolerable error in the neuron tile is 0.5 µA and in the routing tile is 7.5 µA.
This means that if the feedback loop does not close within ts of the pulse width, a drop in Vx is far more tolerable in the routing tile than in the neuron tile. The bandwidth requirement of the neuron tile is therefore 7.5/0.5 = 15 times that of the routing tile. The BW requirements translate directly into the biasing of the amplifier and thus its power consumption, so the static power consumption of the neuron tile is 15 times that of the routing tile. The current requirements also translate into area, since larger currents require wider transistors.
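As a quick numeric check, the neuron-tile LSB and tolerable error quoted above follow directly from the formula:

```python
# Worked check of the neuron-tile LSB figure derived in supplementary note 9,
# plugging the quoted values into i_LSB = Vref * (Gmax - Gmin) / N.

V_REF = 100e-3                            # read voltage (V)

def i_lsb(g_max, g_min, n_levels):
    return V_REF * (g_max - g_min) / n_levels

i_lsb_neuron = i_lsb(120e-6, 40e-6, 8)    # 8 levels within the LRS range
print(f"neuron tile LSB:  {i_lsb_neuron * 1e6:.2f} uA")      # 1.00 uA
print(f"tolerable error:  {i_lsb_neuron / 2 * 1e6:.2f} uA")  # 0.50 uA
```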
## appendix 3: reconfigurable halide perovskite nanocrystal memristors for neuromorphic computing
Figure A.21: Proposed volatile diffusive switching mechanism. (a-d) illustrate the various stages of filament formation and rupture; i-iv indicate the possible reactions happening within the device.
<details>
<summary>Image 59 Details</summary>

### Visual Description
## Diagram: Electrochemical Reaction and Diffusion Dynamics
### Overview
The diagram illustrates four panels (a-d) depicting electrochemical processes involving silver (Ag), bromide (Br⁻), and electron (e⁻) transfer. Panels show spatial arrangements of ions, electron flow, and reaction dynamics under varying potentials (0V and +1V). A legend at the bottom defines four reaction pathways with color-coded symbols.
### Components/Axes
- **Panels**:
- **a**: 0V potential, black dots (Ag), blue circles (Br⁻), and a vertical gradient (light blue to gray).
- **b**: +1V potential, black dots (Ag), blue circles (Br⁻), red arrows (electron flow), and a vertical gradient.
- **c**: Thin filament (purple dashed line), black dots (Ag), blue circles (Br⁻), and a vertical gradient.
- **d**: Diffusive dynamics, black dots (Ag), blue circles (Br⁻), red arrows (electron flow), and a vertical gradient.
- **Legend**:
- **i**: Ag → Ag⁺ + e⁻ (black dot → white circle + red dot).
- **ii**: Ag⁺ + e⁻ → Ag (white circle + red dot → black dot).
- **iii**: Br⁻ → V_Br (blue circle → white circle).
- **iv**: Ag⁺ + e⁻ → Ag⁺ (black dot + red dot → white circle).
- **Spatial Grounding**:
- Legend positioned at the bottom center.
- Panels arranged in a 2x2 grid (top-left: a, top-right: b, bottom-left: c, bottom-right: d).
- Reaction symbols (black, red, blue, purple) match panel elements.
### Detailed Analysis
- **Panel a**: Neutral (0V) conditions. Black dots (Ag) and blue circles (Br⁻) are uniformly distributed. No electron flow indicated.
- **Panel b**: +1V applied. Red arrows show electron flow from Ag to Br⁻. Reaction ii (Ag⁺ + e⁻ → Ag) occurs at the interface.
- **Panel c**: Thin filament separates Ag and Br⁻. Reaction iii (Br⁻ → V_Br) occurs near the filament.
- **Panel d**: Diffusive dynamics. Red arrows indicate electron flow, and blue circles (Br⁻) diffuse toward Ag⁺. Reaction iv (Ag⁺ + e⁻ → Ag⁺) is highlighted.
### Key Observations
1. **Reaction i (Ag → Ag⁺ + e⁻)**: Occurs in panel b, driven by +1V potential.
2. **Reaction ii (Ag⁺ + e⁻ → Ag)**: Dominates in panel b, reversing reaction i.
3. **Reaction iii (Br⁻ → V_Br)**: Localized near the thin filament in panel c.
4. **Reaction iv (Ag⁺ + e⁻ → Ag⁺)**: Occurs in panel d, suggesting electron transfer without net Ag reduction.
5. **Diffusion**: Panel d shows Br⁻ diffusion toward Ag⁺, driven by concentration gradients.
### Interpretation
The diagram models an electrochemical cell where:
- **Potential gradients** (0V to +1V) drive electron transfer between Ag and Br⁻.
- **Thin filaments** act as barriers, localizing specific reactions (e.g., Br⁻ oxidation in panel c).
- **Diffusion dynamics** (panel d) illustrate ion movement under electrochemical gradients, with reaction iv suggesting a catalytic or intermediate step.
- **Reaction reversibility** is evident in panels b and d, where electron flow direction determines Ag oxidation/reduction.
The system likely represents a redox process with competing reactions, where applied voltage and ion distribution govern the dominant pathway. The thin filament in panel c may represent a porous separator or catalyst, modulating ion/electron transport.
</details>
supplementary note 1: Diffusive behaviour: The volatile threshold-switching behaviour can be attributed to the redistribution of Ag+ and Br- ions under an applied electric field, and their back-diffusion upon removal of power. The soft lattice of the halide perovskite matrix has been observed to enable Ag+ migration with an activation energy of ∼0.15 eV. Interestingly, migration of halide ions and their vacancies within the perovskite matrix occurs at similar energies, ∼0.10-0.25 eV [253, 348, 349], making it difficult to pinpoint a single operating mechanism. We hypothesize that during the SET process, Ag atoms are ionized to Ag+ and form a percolation path through the device structure. Electrons from the grounded electrode reduce Ag+ to form weak Ag filaments. In parallel, Br- ions are attracted towards the positively charged electrode, and a weak filament composed of vacancies (VBr) is formed. Both factors increase the device conductance from a high-resistance state (HRS) to a temporary low-resistance state (LRS). Upon removing the electric field, the low activation energy of the ions causes them to diffuse back spontaneously, breaking the percolation path and leading to volatile memory characteristics, i.e., short-term plasticity. The low compliance current (Icc) of 1 µA ensures that the electrochemical reactions are well regulated and that the percolation pathways formed are weak enough to allow these diffusive dynamics.
Figure A.22: Proposed non-volatile drift switching mechanism. (a-e) illustrate the various stages of filament formation and rupture; i-iv indicate the possible reactions happening within the device.
supplementary note 2 Drift behaviour: Upon increasing the Icc to 1 mA (three orders of magnitude higher than that used for volatile threshold switching), permanent and thicker conductive filamentary pathways are likely formed within the device, as illustrated in Fig. A.22. This increases the device conductance from a high resistance state (HRS) to a permanent and much lower resistance state (LRS). Electrochemical reactions are triggered to a greater extent, and the switching dynamics are now dominated by the drift kinetics of the mobile ion species Ag⁺ and Br⁻ rather than by diffusion. Hence, upon removing the electric field, the conductive filaments remain largely unaffected, and the devices retain their LRS, displaying long-term plasticity. Application of voltage sweeps or pulses of opposite polarity ruptures these filaments and resets the devices to their HRS. For DDAB-capped CsPbBr₃ NCs, the devices transition to a non-erasable non-volatile state within ∼50 cycles, indicating the formation of very thick filaments (Fig. A.23). The OGB-capped CsPbBr₃ NCs, on the other hand, display a record-high non-volatile endurance of 5655 cycles and a retention of 10⁵ seconds (Fig. A.24), pointing to better regulation of the filament formation and rupture kinetics.
Figure A.23: Non-volatile drift switching of DDAB-capped CsPbBr₃ NC memristors. (a) Representative IV characteristics. (b) Endurance. (c) Retention.
Figure A.24: Non-volatile drift switching of OGB-capped CsPbBr₃ NC memristors. The figure shows the retention performance.
Figure A.25: Transmission electron microscope (TEM) images of DDAB-capped CsPbBr₃ NCs.
supplementary note 3
Figure A.26: Non-linear variation of the device conductance as a function of the stimulation pulse (a) amplitude, (b) width and (c) number. For (a), the pulse width and number are kept constant at 25 ms and 1, respectively. For (b), the pulse amplitude and number are kept constant at 1 V and 1, respectively. For (c), the pulse amplitude and width are kept constant at 1 V and 25 ms, respectively.
Figure A.27: Echo state properties. Variation in the device conductance of the volatile diffusive perovskite memristor as a function of the inter-group pulse interval. The interval between the two sequences increases from (a) 10 ms and (b) 30 ms to (c) 300 ms. (d-g) Current responses when subjected to 10 identical stimulation pulses (1 V, 5 ms) with different pulse-interval conditions for the final pulse. The interval varies from (d) 10 ms, (e) 23 ms, and (f) 41 ms to (g) 80 ms.
supplementary note 4 The echo state property of a reservoir refers to the impact that previous inputs have on the current reservoir state, and to how that influence fades with time. To test this, four short pulses of 1 V, 5 ms are applied to the device in a paired-pulse format and the device states are recorded. A non-linear accumulative behaviour is observed as a function of the paired-pulse interval. In Fig. A.27a, a short paired-pulse interval of 10 ms results in an echo index (defined as (a₄/a₁) × 100) of 118%. Longer intervals (30 ms and 300 ms in Fig. A.27b,c) result in smaller echo indices (107.5% and 107.2%, respectively), reflecting the short-term memory of the perovskite memristors. To further test the echo state property, pulse trains consisting of 10 identical stimulation pulses (1 V, 5 ms) are applied to the device and the device states are recorded. In all cases, a non-linear accumulative behaviour is observed. As shown in Fig. A.27d-g, short intervals (≤ 23 ms) before the last stimulation pulse result in further accumulation, while long intervals result in depression of the device state. This indicates that the present device state remembers input temporal features from the recent past but not the far past, allowing the diffusive perovskite memristors to act as efficient reservoir elements.
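As a worked example of the echo index, the sketch below computes (a₄/a₁) × 100 from per-pulse response amplitudes generated by a toy facilitate-and-decay model. The unit pulse increment and the 30 ms memory time constant are illustrative assumptions, not fitted values; the point is only the qualitative trend that shorter paired-pulse intervals yield larger echo indices.

```python
import numpy as np

TAU = 30e-3  # assumed short-term-memory time constant (s)

def pulse_train_amplitudes(interval, n_pulses=4):
    """Per-pulse response amplitudes: the state decays between pulses
    with time constant TAU, and each pulse adds a unit response."""
    state, amps = 0.0, []
    for _ in range(n_pulses):
        state = state * np.exp(-interval / TAU) + 1.0
        amps.append(state)
    return amps

def echo_index(amps):
    """Echo index as defined in supplementary note 4: (a4 / a1) * 100."""
    return (amps[3] / amps[0]) * 100.0

for interval in (10e-3, 30e-3, 300e-3):
    a = pulse_train_amplitudes(interval)
    print(f"interval {interval * 1e3:5.0f} ms -> echo index {echo_index(a):6.1f}%")
```

With these toy parameters the index shrinks toward 100% as the interval grows, matching the fading-memory behaviour measured in Fig. A.27a-c.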
Figure A.28: Input waveforms. A representative "Write" (amplitude = 1 V, pulse width = 20 ms) and "Read" (amplitude = −0.5 V, pulse width = 5 ms) spike train applied to the volatile perovskite memristors in the reservoir layer.
Figure A.29: Weight distribution. The synaptic weight distribution after training. (a, b) Conductance distributions of the positive and negative differential perovskite memristors. (c) The effective (unscaled) conductance distribution. (d) Weight distribution of the readout layer with double-precision floating-point (FP64) weights. The effective memristive weights and the FP64 weights follow similar distributions.
Figure A.30: Training the RC with FP64 readout weights using backpropagation. The training metrics of the ANN are shown. (a) Training and testing accuracy over 5 epochs demonstrate that the network solves the classification task with high accuracy and without overfitting. (b) Confusion matrix calculated at the end of training. The correct-response probability is shown on the colour scale to the right. The network performs slightly worse at discriminating irregular patterns.
Table A.1: Comparing training and test accuracies of both approaches. Neural spiking pattern classification performance of the two approaches; all values are accuracy in %. The readout layer with drift-based halide-perovskite memristor weights trained with online compliance-current (Icc) control achieves results comparable to FP64 weights trained with backpropagation.
| Approach | Phase | Epoch 0 | Epoch 1 | Epoch 2 | Epoch 3 | Epoch 4 | Epoch 5 |
|----------|-------|---------|---------|---------|---------|---------|---------|
| FP64 with backpropagation | Training | 24.68 | 87.76 | 91.38 | 92.47 | 92.99 | 93.54 |
| FP64 with backpropagation | Testing | 20.88 | 86.14 | 89.16 | 90.16 | 91.37 | 91.77 |
| Perovskite with Icc control | Training | 10.32 | 69.11 | 89.12 | 83.09 | 85.59 | 86.75 |
| Perovskite with Icc control | Testing | 14.46 | 73.29 | 86.14 | 80.92 | 84.94 | 85.14 |
Figure A.31: Icc-modulated training for the drift-based perovskite configuration.
supplementary note 5 During the inference procedure, the reservoir output vector of length 30 is fed into the readout layer. Memristors in the readout layer are placed in a differential architecture, in which the difference between the conductance values of two differential memristors (G⁺ and G⁻) determines the effective synaptic strength. Scaled with β = 1/(G_max − G_min), where G_max = 0.35 mS and G_min = 0.1 mS, the weight matrix (30 × 4) is calculated as W = β[G⁺ − G⁻]. The network prediction is then determined by choosing the output neuron index with the maximum activation level.
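A minimal sketch of this differential readout, assuming the β, G_max and G_min values quoted above and randomly drawn conductances as stand-ins for trained devices:

```python
import numpy as np

G_MAX, G_MIN = 0.35e-3, 0.10e-3      # conductance bounds (S)
BETA = 1.0 / (G_MAX - G_MIN)         # weight scaling, beta = 1/(Gmax - Gmin)

rng = np.random.default_rng(0)
# Illustrative stand-ins for the trained 30x4 differential conductance pair.
g_pos = rng.uniform(G_MIN, G_MAX, (30, 4))
g_neg = rng.uniform(G_MIN, G_MAX, (30, 4))

def readout_predict(x):
    """Differential readout: W = beta * (G+ - G-); predict by argmax."""
    w = BETA * (g_pos - g_neg)       # effective 30x4 weight matrix
    return int(np.argmax(x @ w))     # winning output neuron index

x = rng.random(30)                   # mock reservoir output vector (length 30)
print("predicted firing-pattern class:", readout_predict(x))
```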
For the training procedure, the network loss is calculated as the difference between the output layer prediction and the one-hot encoded target vector indicating one of the four firing patterns. At this point, one could calculate target weights using the backpropagation algorithm. However, to support fully online learning, we tested an Icc-controlled weight update scheme in which the following stages of the pipeline can be easily implemented with mixed-signal circuits in an event-driven manner. The Icc-controlled weight update is implemented as follows. First, the required weight change is calculated as ΔW_target = η xᵢ δⱼ, where η is a suitably low learning rate, xᵢ is the reservoir layer output and δⱼ is the calculated error. To obtain the target conductance values for both the positive and negative memristors, we first linearly scale the weight change to a conductance change (by multiplying with 1/β). Second, we read both the positive and negative conductance values and, using a push-pull mechanism, calculate the target conductance values; the push-pull mechanism ensures a higher dynamic range in the differential configuration. Third, the target conductances are linearly mapped to the target Icc values, Icc,target = (G_target + 1.249 × 10⁻⁵)/3.338, for the positive and negative memristors. The weights are then updated by applying RESET and SET pulses with the targeted Icc. Using the linear Icc → G relation, we calculate the mean µ_G = 3.338 Icc − 1.294 × 10⁻⁵ and standard deviation σ_G = 7.040 Icc + 3.0585, and sample the achieved conductance from the corresponding Normal distribution.
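The sketch below strings the three stages together (target weight change, push-pull conductance targets, Icc mapping and stochastic programming). The helper names, the even split of the conductance change across the differential pair, the clipping to [G_min, G_max], and the 10⁻⁵ scaling applied to the σ_G offset are my illustrative assumptions; the mapping coefficients are the ones quoted above.

```python
import numpy as np

G_MAX, G_MIN = 0.35e-3, 0.10e-3
BETA = 1.0 / (G_MAX - G_MIN)
ETA = 1e-3                                # assumed (suitably low) learning rate
rng = np.random.default_rng(1)

def g_to_icc(g_target):
    """Stage 3: linear map from target conductance to target Icc."""
    return (g_target + 1.249e-5) / 3.338

def program_conductance(icc):
    """Apply SET/RESET at the target Icc. The achieved conductance is
    stochastic, sampled from the Normal distribution quoted in the text.
    (The 1e-5 scaling of the sigma offset is an illustrative assumption
    to keep the sketch numerically sensible.)"""
    mu = 3.338 * icc - 1.294e-5
    sigma = 7.040 * icc + 3.0585e-5
    return np.clip(rng.normal(mu, sigma), G_MIN, G_MAX)

def icc_update(g_pos, g_neg, x, delta):
    """One online Icc-controlled update of the differential weight pair."""
    dw = ETA * np.outer(x, delta)         # Stage 1: target weight change
    dg = dw / BETA                        # scale weight change to conductance
    # Stage 2: push-pull, splitting the change across the differential pair.
    g_pos_t = np.clip(g_pos + dg / 2.0, G_MIN, G_MAX)
    g_neg_t = np.clip(g_neg - dg / 2.0, G_MIN, G_MAX)
    # Stage 3: map targets to Icc and program (stochastic write).
    return (program_conductance(g_to_icc(g_pos_t)),
            program_conductance(g_to_icc(g_neg_t)))

# Toy usage: one update step for a 30x4 readout layer.
g_pos = rng.uniform(G_MIN, G_MAX, (30, 4))
g_neg = rng.uniform(G_MIN, G_MAX, (30, 4))
x = rng.random(30)                        # reservoir output
delta = np.array([0.0, 1.0, 0.0, 0.0]) - rng.random(4)  # target - prediction
g_pos, g_neg = icc_update(g_pos, g_neg, x, delta)
print("mean effective weight:", (BETA * (g_pos - g_neg)).mean())
```

In hardware, stages 2 and 3 correspond to a read of both devices followed by SET/RESET pulses at the computed Icc; the Normal sampling stands in for cycle-to-cycle programming variability.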
Figure A.32: On-the-fly 'reconfigurability' of OGB-capped CsPbBr₃ NC memristors. The device is switched between its non-volatile and volatile modes on demand.
supplementary note 6 To demonstrate on-the-fly "reconfigurability", our devices are switched between volatile and non-volatile modes through precise compliance current (Icc) control and the selection of activation voltages. Fig. A.32 shows that our devices can act as a volatile memory even after being set to multiple non-volatile states. This demonstrates true 'reconfigurability' of our devices, which has not been shown previously. Such behaviour enables neuromorphic implementations of synapses in Spiking Neural Networks (SNNs) that demand both volatile and non-volatile switching properties simultaneously.
- 1 . McCulloch, W. S. & Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5 , 115 ( 4 1943 ).
- 2 . Turing, A. M. I.-computing machinery and intelligence. Mind LIX , 433 ( 236 1950 ).
- 3 . P., H. & von Neumann, J. The computer and the brain. Mathematical Tables and Other Aids to Computation 13 , 226 ( 67 1959 ).
- 4 . Tagkopoulos, I., Liu, Y.-C. & Tavazoie, S. Predictive behavior within microbial genetic networks. Science 320 , 1313 ( 5881 2008 ).
- 5 . Chiang, W.-L., Zheng, L., Sheng, Y., Angelopoulos, A. N., Li, T., Li, D., Zhang, H., Zhu, B., Jordan, M., Gonzalez, J. E. & Stoica, I. Chatbot Arena: An open platform for evaluating LLMs by human preference. arXiv [cs.AI] ( 2024 ).
- 6 . Marr, D. Vision: A computational investigation into the human representation and processing of visual information 429 pp. (MIT Press, London, England, 2010 ).
- 7 . Richards, B. A., Lillicrap, T. P., Beaudoin, P., Bengio, Y., Bogacz, R., Christensen, A., Clopath, C., Costa, R. P., Berker, A. d., Ganguli, S., Gillon, C. J., Hafner, D., Kepecs, A., Kriegeskorte, N., Latham, P., Lindsay, G. W., Miller, K. D., Naud, R., Pack, C. C., Poirazi, P., Roelfsema, P., Sacramento, J., Saxe, A., Scellier, B., Schapiro, A. C., Senn, W., Wayne, G., Yamins, D., Zenke, F., Zylberberg, J., Therien, D. & Kording, K. P. A deep learning framework for neuroscience. Nature Neuroscience 22 , 1761 ( 2019 ).
- 8 . Braitenberg, V. & Schüz, A. in Cortex: Statistics and Geometry of Neuronal Connectivity 51 (Springer Berlin Heidelberg, Berlin, Heidelberg, 1998 ).
- 9 . OpenAI et al. GPT4 Technical Report. arXiv [cs.CL] ( 2023 ).
- 10 . Henighan, T., Kaplan, J., Katz, M., Chen, M., Hesse, C., Jackson, J., Jun, H., Brown, T. B., Dhariwal, P., Gray, S., Hallacy, C., Mann, B., Radford, A., Ramesh, A., Ryder, N., Ziegler, D. M., Schulman, J., Amodei, D. & McCandlish, S. Scaling laws for autoregressive generative modeling. arXiv [cs.LG] ( 2020 ).
- 11 . Horowitz, M. 1 . 1 Computing's Energy Problem (and what we can do about it) in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC) 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC) ( 2014 ), 10 .
- 12 . Boroumand, A., Ghose, S., Akin, B., Narayanaswami, R., Oliveira, G. F., Ma, X., Shiu, E. & Mutlu, O. Google neural network models for edge devices: Analyzing and mitigating machine learning inference bottlenecks in 2021 30 th International Conference on Parallel Architectures and Compilation Techniques (PACT) 2021 30 th International Conference on Parallel Architectures and Compilation Techniques (PACT)Atlanta, GA, USA (IEEE, 2021 ), 159 .
- 13 . Chua, L. Memristor-The missing circuit element. IEEE Trans. Circuit Theory 18 , 507 ( 5 1971 ).
- 14 . Le Gallo, M., Sebastian, A., Mathis, R., Manica, M., Giefers, H., Tuma, T., Bekas, C., Curioni, A. & Eleftheriou, E. Mixed-precision in-memory computing. Nat. Electron. 1 , 246 ( 4 2018 ).
- 15 . Bartol, T. M., Bromer, C., Kinney, J., Chirillo, M. A., Bourne, J. N., Harris, K. M. & Sejnowski, T. J. Nanoconnectomic upper bound on the variability of synaptic plasticity. Elife 4 , e 10778 ( 2015 ).
- 16 . Burr, G. W., Shelby, R. M., Sebastian, A., Kim, S., Kim, S., Sidler, S., Virwani, K., Ishii, M., Narayanan, P., Fumarola, A., Sanches, L. L., Boybat, I., Le Gallo, M., Moon, K., Woo, J., Hwang, H. & Leblebici, Y. Neuromorphic computing using non-volatile memory. Adv. Phys. X. 2 , 89 ( 1 2017 ).
- 17 . Demirag, Y. Multiphysics modeling of Ge 2 Sb 2 Te 5 based synaptic devices for brain inspired computing MA thesis (Ihsan Dogramaci Bilkent University, Ankara, Turkey, 2018 ).
- 18 . Wan, W., Kubendran, R., Schaefer, C., Eryilmaz, S. B., Zhang, W., Wu, D., Deiss, S., Raina, P., Qian, H., Gao, B., et al. A compute-in-memory chip based on resistive random-access memory. Nature 608 , 504 ( 2022 ).
- 19 . Chanthbouala, A., Garcia, V., Cherifi, R. O., Bouzehouane, K., Fusil, S., Moya, X., Xavier, S., Yamada, H., Deranlot, C., Mathur, N. D., Bibes, M., Barthélémy, A. & Grollier, J. A ferroelectric memristor. Nat. Mater. 11 , 860 ( 10 2012 ).
- 20 . Lu, S. & Sengupta, A. Exploring the connection between binary and Spiking Neural Networks. Front. Neurosci. 14 , 535 ( 2020 ).
- 21 . John, R. A., Acharya, J., Zhu, C., Surendran, A., Bose, S. K., Chaturvedi, A., Tiwari, N., Gao, Y., He, Y., Zhang, K. K., Xu, M., Leong, W. L., Liu, Z., Basu, A. & Mathews, N. Optogenetics inspired transition metal dichalcogenide neuristors for in-memory deep recurrent neural networks. Nat. Commun. 11 , 3211 ( 1 2020 ).
- 22 . Sidler, S., Boybat, I., Shelby, R. M., Narayanan, P., Jang, J., Fumarola, A., Moon, K., Leblebici, Y., Hwang, H. & Burr, G. W. Large-scale neural networks implemented with Non-Volatile Memory as the synaptic weight element: Impact of conductance response in 2016 46 th European Solid-State Device Research Conference (ESSDERC) ESSDERC 2016 -46 th European Solid-State Device Research ConferenceLausanne, Switzerland (IEEE, 2016 ), 440 .
- 23 . Khaddam-Aljameh, R., Stanisavljevic, M., Mas, J. F., Karunaratne, G., Brändli, M., Liu, F., Singh, A., Müller, S. M., Egger, U., Petropoulos, A., Antonakopoulos, T., Brew, K., Choi, S., Ok, I., Lie, F. L., Saulnier, N., Chan, V ., Ahsan, I., Narayanan, V., Nandakumar, S. R., Le Gallo, M., Francese, P. A., Sebastian, A. & Eleftheriou, E. HERMES-Core-A 1 . 59 -TOPS/mm 2 PCM on 14 -nm CMOS In-Memory Compute Core Using 300 -ps/LSB Linearized CCO-Based ADCs. IEEE J. Solid-State Circuits , 1 ( 2022 ).
- 24 . Berdan, R., Vasilaki, E., Khiat, A., Indiveri, G., Serb, A. & Prodromakis, T. Emulating short-term synaptic dynamics with memristive devices. Sci. Rep. 6 , 18639 ( 1 2016 ).
- 25 . Ohno, T., Hasegawa, T., Tsuruoka, T., Terabe, K., Gimzewski, J. K. & Aono, M. Short-term plasticity and long-term potentiation mimicked in single inorganic synapses. Nat. Mater. 10 , 591 ( 8 2011 ).
- 26 . Zhang, X., Wang, W., Liu, Q., Zhao, X., Wei, J., Cao, R., Yao, Z., Zhu, X., Zhang, F., Lv, H., Long, S. & Liu, M. An artificial neuron based on a threshold switching memristor. IEEE Electron Device Lett. 39 , 308 ( 2 2018 ).
- 27 . Huang, H.-M., Yang, R., Tan, Z.-H., He, H.-K., Zhou, W., Xiong, J. & Guo, X. Quasi-HodgkinHuxley neurons with leaky integrate-and-fire functions physically realized with memristive devices. Adv. Mater. 31 , e 1803849 ( 3 2019 ).
- 28 . Dalgaty, T., Moro, F., Demirag, Y., De Pra, A., Indiveri, G., Vianello, E. & Payvand, M. Mosaic: in-memory computing and routing for small-world spike-based neuromorphic systems. Nat. Commun. 15 , 1 ( 1 2024 ).
- 29 . Indiveri, G., Linares-Barranco, B., Hamilton, T., van Schaik, A., Etienne-Cummings, R., Delbruck, T., Liu, S.-C., Dudek, P., Häfliger, P., Renaud, S., Schemmel, J., Cauwenberghs, G., Arthur, J., Hynna, K., Folowosele, F., Saighi, S., Serrano-Gotarredona, T., Wijekoon, J., Wang, Y. & Boahen, K. Neuromorphic silicon neuron circuits. Frontiers in Neuroscience 5 , 1 ( 2011 ).
- 30 . Sterling, P. in Principles of Neural Design 155 (The MIT Press, 2015 ).
- 31 . Liu, S.-C., Van Schaik, A., Minch, B. A. & Delbruck, T. Event-based 64 -channel binaural silicon cochlea with Q enhancement mechanisms in 2010 IEEE International Symposium on Circuits and Systems (ISCAS) ( 2010 ), 2027 .
- 32 . Rueckauer, B. & Delbruck, T. Evaluation of event-based algorithms for optical flow with ground-truth from inertial measurement sensor. Front. Neurosci. 10 , 176 ( 2016 ).
- 33 . Mahowald, M. VLSI analogs of neuronal visual processing: a synthesis of form and function ( 1992 ).
- 34 . Boahen, K. Dendrocentric learning for synthetic intelligence. Nature 612 , 43 ( 7938 2022 ).
- 35 . Kubke, M. F., Massoglia, D. P. & Carr, C. E. Developmental changes underlying the formation of the specialized time coding circuits in barn owls (Tyto alba). J. Neurosci. 22 , 7671 ( 17 2002 ).
- 36 . O'Keefe, J. & Recce, M. L. Phase relationship between hippocampal place units and the EEG theta rhythm. Hippocampus 3 , 317 ( 3 1993 ).
- 37 . MacKay, D. M. & McCulloch, W. S. The limiting information capacity of a neuronal link. Bulletin of Mathematical Biophysics 14 , 127 ( 2 1952 ).
- 38 . Aceituno, P. V., de Haan, S., Loidl, R. & Grewe, B. F. Hierarchical target learning in the mammalian neocortex: A pyramidal neuron perspective. bioRxiv ( 2024 ).
- 39 . Payeur, A., Guerguiev, J., Zenke, F., Richards, B. A. & Naud, R. Burst-dependent synaptic plasticity can coordinate learning in hierarchical circuits. bioRxiv ( 2020 ).
- 40 . Gerstner, W., Kempter, R., van Hemmen, J. L. & Wagner, H. A neuronal learning rule for sub-millisecond temporal coding. Nature 383 , 76 ( 6595 1996 ).
- 41 . Demirag, Y. & Indiveri, G. Network of biologically plausible neuron models can solve motor tasks through heterogeneity in Computational and Systems Neuroscience (COSYNE) (Lisbon, Portugal, 2024 ).
- 42 . Hardtdegen, A., La Torre, C., Cuppers, F., Menzel, S., Waser, R. & Hoffmann-Eifert, S. Improved switching stability and the effect of an internal series resistor in HfO 2 /TiO<italic>x</italic> bilayer ReRAM cells. IEEE Trans. Electron Devices 65 , 3229 ( 8 2018 ).
- 43 . Nandakumar, S. R., Boybat, I., Han, J.-P., Ambrogio, S., Adusumilli, P., Bruce, R. L., BrightSky, M., Rasch, M., Le Gallo, M. & Sebastian, A. Precision of synaptic weights programmed in phase-change memory devices for deep learning inference in 2020 IEEE International Electron Devices Meeting (IEDM) 2020 IEEE International Electron Devices Meeting (IEDM)San Francisco, CA, USA (IEEE, 2020 ), 29 . 4 . 1 .
- 44 . Gong, N., Idé, T., Kim, S., Boybat, I., Sebastian, A., Narayanan, V. & Ando, T. Signal and noise extraction from analog memory elements for neuromorphic computing. Nat. Commun. 9 , 2102 ( 1 2018 ).
- 45 . Gallo, M. L., Kaes, M., Sebastian, A. & Krebs, D. Subthreshold electrical transport in amorphous phase-change materials. New Journal of Physics 17 , 093035 ( 2015 ).
- 46 . Burr, G., Shelby, R., Nolfo, C., Jang, J., Shenoy, R., Narayanan, P., Virwani, K., Giacometti, E., Kurdi, B. & Hwang, H. Experimental Demonstration and Tolerancing of a Large-Scale Neural Network ( 165 , 000 Synapses), using Phase-Change Memory as the Synaptic Weight Element. 2014 IEEE International Electron Devices Meeting , 1 ( 2014 ).
- 47 . Stathopoulos, S., Khiat, A., Trapatseli, M., Cortese, S., Serb, A., Valov, I. & Prodromakis, T. Multibit memory operation of metal-oxide bi-layer memristors. Scientific reports 7 , 17532 ( 2017 ).
- 48 . Prezioso, M., Merrikh-Bayat, F., Hoskins, B., Adam, G. C., Likharev, K. K. & Strukov, D. B. Training and operation of an integrated neuromorphic network based on metal-oxide memristors. Nature 521 , 61 ( 2015 ).
- 49 . Bayat, F. M., Prezioso, M., Chakrabarti, B., Nili, H., Kataeva, I. & Strukov, D. Implementation of multilayer perceptron network with highly uniform passive memristive crossbar circuits. Nature communications 9 , 2331 ( 2018 ).
- 50 . Li, C., Hu, M., Li, Y., Jiang, H., Ge, N., Montgomery, E., Zhang, J., Song, W., Dávila, N., Graves, C. E., et al. Analogue signal and image processing with large memristor crossbars. Nature Electronics 1 , 52 ( 2018 ).
- 51 . Boybat, I., Le Gallo, M., Nandakumar, S., Moraitis, T., Parnell, T., Tuma, T., Rajendran, B., Leblebici, Y., Sebastian, A. & Eleftheriou, E. Neuromorphic computing with multimemristive synapses. Nature communications 9 , 2514 ( 2018 ).
- 52 . Agarwal, S., Jacobs Gedrim, R. B., Hsia, A. H., Hughart, D. R., Fuller, E. J., Alec Talin, A., James, C. D., Plimpton, S. J. & Marinella, M. J. Achieving ideal accuracies in analog neuromorphic computing using periodic carry in 2017 Symposium on VLSI Technology (IEEE, 2017 ), T 174 .
- 53 . Suri, M., Bichler, O., Querlioz, D., Palma, G., Vianello, E., Vuillaume, D., Gamrat, C. & DeSalvo, B. CBRAM devices as binary synapses for low-power stochastic neuromorphic systems: auditory (cochlea) and visual (retina) cognitive processing applications in 2012 International Electron Devices Meeting ( 2012 ), 10 .
- 54 . Payvand, M., Muller, L. K. & Indiveri, G. Event-based circuits for controlling stochastic learning with memristive devices in neuromorphic architectures in 2018 IEEE International Symposium on Circuits and Systems (ISCAS) ( 2018 ), 1 .
- 55 . Ambrogio, S., Narayanan, P., Tsai, H., Shelby, R. M., Boybat, I., di Nolfo, C., Sidler, S., Giordano, M., Bodini, M., Farinha, N. C., et al. Equivalent-accuracy accelerated neuralnetwork training using analogue memory. Nature 558 , 60 ( 2018 ).
- 56 . Lillicrap, T. P. & Santoro, A. Backpropagation through time and the brain. Curr. Opin. Neurobiol. Machine Learning, Big Data, and Neuroscience 55 , 82 ( 2019 ).
- 57 . Attneave, F. & Hebb, D. O. The organization of behavior; A neuropsychological theory. Am. J. Psychol. 63 , 633 ( 4 1950 ).
- 58 . Sjöström, P. J., Turrigiano, G. G. & Nelson, S. B. Rate, timing, and cooperativity jointly determine cortical synaptic plasticity. Neuron 32 , 1149 ( 6 2001 ).
- 59 . Feldman, D. E. The spike-timing dependence of plasticity. Neuron 75 , 556 ( 4 2012 ).
- 60 . Oja, E. A simplified neuron model as a principal component analyzer. J. Math. Biol. 15 , 267 ( 3 1982 ).
- 61 . Frenkel, C., Lefebvre, M., Legat, J.-D. & Bol, D. A 0 . 086 -mm 2 12 . 7 -pJ/SOP 64 k-Synapse 256 -Neuron Online-Learning Digital Spiking Neuromorphic Processor in 28 -nm CMOS. IEEE Trans. Biomed. Circuits Syst. 13 , 145 ( 1 2019 ).
- 62 . Frenkel, C., Legat, J.-D. & Bol, D. MorphIC: A 65 -nm 738 k-synapse/mm 2 quad-core binaryweight digital neuromorphic processor with stochastic spike-driven online learning. arXiv [cs.NE] ( 2019 ).
- 63 . Mayr, C., Partzsch, J., Noack, M., Hänzsche, S., Scholze, S., Höppner, S., Ellguth, G. & Schüffny, R. A biological-realtime neuromorphic system in 28 nm CMOS using low-leakage switched capacitor circuits. IEEE Trans. Biomed. Circuits Syst. 10 , 243 ( 1 2016 ).
- 64 . Brader, J. M., Senn, W. & Fusi, S. Learning real-world stimuli in a neural network with spike-driven synaptic dynamics. Neural computation 19 , 2881 ( 2007 ).
- 65 . Marblestone, A. H., Wayne, G. & Kording, K. P. Toward an integration of deep learning and neuroscience. Front. Comput. Neurosci. 10 , 94 ( 2016 ).
- 66 . Laskin, M., Metz, L., Nabarro, S., Saroufim, M., Noune, B., Luschi, C., Sohl-Dickstein, J. & Abbeel, P. Parallel training of deep networks with local updates. arXiv [cs.LG] ( 2020 ).
- 67 . Kaiser, J., Mostafa, H. & Neftci, E. Synaptic plasticity dynamics for Deep Continuous Local Learning (DECOLLE). Front. Neurosci. 14 , 424 ( 2020 ).
- 68 . Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by backpropagating errors. Nature 323 , 533 ( 6088 1986 ).
- 69 . Lillicrap, T. P., Cownden, D., Tweed, D. B. & Akerman, C. J. Random synaptic feedback weights support error backpropagation for deep learning. Nat. Commun. 7 , 13276 ( 1 2016 ).
- 70 . Pozzi, I., Bohté, S. & Roelfsema, P. A biologically plausible learning rule for deep learning in the brain. arXiv [cs.NE] ( 2018 ).
- 71 . Lee, D.-H., Zhang, S., Fischer, A. & Bengio, Y. Difference Target Propagation. arXiv [cs.LG] ( 2014 ).
- 72 . Millidge, B., Tschantz, A. & Buckley, C. L. Predictive Coding Approximates Backprop along Arbitrary Computation Graphs. arXiv ( 2020 ).
- 73 . Baydin, A. G., Pearlmutter, B. A., Syme, D., Wood, F. & Torr, P. Gradients without Backpropagation. arXiv [cs.LG] ( 2022 ).
- 74 . Liu, Y. H., Ghosh, A., Richards, B. A., Shea-Brown, E. & Lajoie, G. Beyond accuracy: generalization properties of bio-plausible temporal credit assignment rules. arXiv [cs.NE] ( 2022 ).
- 75 . Bellec, G., Scherr, F., Subramoney, A., Hajek, E., Salaj, D., Legenstein, R. & Maass, W. A solution to the learning dilemma for recurrent networks of spiking neurons. Nature Communications 11 , 1 ( 2020 ).
- 76 . Nagel, M., Fournarakis, M., Amjad, R. A., Bondarenko, Y., van Baalen, M. & Blankevoort, T. A white paper on neural network quantization. arXiv [cs.LG] ( 2021 ).
- 77 . Frenkel, C. & Indiveri, G. ReckOn: A 28 nm sub-mm 2 task-agnostic spiking recurrent neural network processor enabling on-chip learning over second-long timescales. arXiv [cs.NE] ( 2022 ).
- 78 . Lee, J., Kim, D. & Ham, B. Network quantization with element-wise gradient scaling. arXiv [cs.CV] ( 2021 ).
- 79 . Fournarakis, M. & Nagel, M. In-Hindsight Quantization Range Estimation for Quantized Training. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) , 3057 ( 2021 ).
- 80 . Bernstein, J., Zhao, J., Meister, M., Liu, M.-Y., Anandkumar, A. & Yue, Y. Learning compositional functions via multiplicative weight updates. arXiv [cs.NE] ( 2020 ).
- 81 . Nandakumar, S., Le Gallo, M., Boybat, I., Rajendran, B., Sebastian, A. & Eleftheriou, E. A phase-change memory model for neuromorphic computing. Journal of Applied Physics 124 , 152135 ( 2018 ).
- 82 . Bellec, G., Salaj, D., Subramoney, A., Legenstein, R. & Maass, W. Long short-term memory and learning-to-learn in networks of spiking neurons in Advances in Neural Information Processing Systems ( 2018 ), 787 .
- 83 . Zenke, F. & Neftci, E. O. Brain-Inspired Learning on Neuromorphic Substrates. Proceedings of the IEEE , 1 ( 2021 ).
- 84 . Bohnstingl, T., Woźniak, S., Maass, W., Pantazi, A. & Eleftheriou, E. Online Spatio-Temporal Learning in Deep Neural Networks. arXiv ( 2020 ).
- 85 . Zenke, F. & Ganguli, S. Superspike: Supervised learning in multilayer spiking neural networks. Neural computation 30 , 1514 ( 2018 ).
- 86 . Perez-Nieves, N. & Goodman, D. F. M. Sparse Spiking Gradient Descent. arXiv ( 2021 ).
- 87 . Singh, S. P. & Sutton, R. S. Reinforcement Learning with Replacing Eligibility Traces. Mach. Learn. 22 , 123 ( 1 / 2 / 3 1996 ).
- 88 . Hull, C. L. Principles of Behavior. The Journal of Nervous and Mental Disease 101 , 396 ( 4 1945 ).
- 89 . Davies, M., Srinivasa, N., Lin, T.-H., Chinya, G., Cao, Y., Choday, S. H., Dimou, G., Joshi, P., Imam, N., Jain, S., Liao, Y., Lin, C.-K., Lines, A., Liu, R., Mathaikutty, D., McCoy, S., Paul, A., Tse, J., Venkataramanan, G., Weng, Y.-H., Wild, A., Yang, Y. & Wang, H. Loihi: A neuromorphic manycore processor with on-chip learning. IEEE Micro 38 , 82 ( 2018 ).
- 90 . Grübl, A., Billaudelle, S., Cramer, B., Karasenko, V. & Schemmel, J. Verification and Design Methods for the BrainScaleS Neuromorphic Hardware System. arXiv preprint arXiv: 2003 . 11455 ( 2020 ).
- 91 . Furber, S., Galluppi, F., Temple, S. & Plana, L. The SpiNNaker Project. Proceedings of the IEEE 102 , 652 ( 2014 ).
- 92 . Backus, J. Can programming be liberated from the von Neumann style?: a functional style and its algebra of programs. Communications of the ACM 21 , 613 ( 1978 ).
- 93 . Indiveri, G. & Liu, S.-C. Memory and information processing in neuromorphic systems. Proceedings of the IEEE 103 , 1379 ( 2015 ).
- 94 . Rubino, A., Livanelioglu, C., Qiao, N., Payvand, M. & Indiveri, G. Ultra-low-power FDSOI neural circuits for extreme-edge neuromorphic intelligence. IEEE Trans. Circuits Syst. I Regul. Pap. 68 , 45 ( 1 2021 ).
- 95 . Fuller, E. J., Keene, S. T., Melianas, A., Wang, Z., Agarwal, S., Li, Y., Tuchman, Y., James, C. D., Marinella, M. J., Yang, J. J., Salleo, A. & Talin, A. A. Parallel programming of an ionic floating-gate memory array for scalable neuromorphic computing. Science 364 , 570 ( 6440 2019 ).
- 96 . Huang, Y.-J., Chao, S.-C., Lien, D., Wen, C., He, J.-H. & Lee, S.-C. Dual-functional memory and threshold resistive switching based on the push-pull mechanism of oxygen ions. Sci. Rep. 6 , 23945 ( 1 2016 ).
- 97 . Abbas, H., Abbas, Y., Hassan, G., Sokolov, A. S., Jeon, Y.-R., Ku, B., Kang, C. J. & Choi, C. The coexistence of threshold and memory switching characteristics of ALD HfO 2 memristor synaptic arrays for energy-efficient neuromorphic computing. Nanoscale 12 , 14120 ( 26 2020 ).
- 98 . Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J. & Amodei, D. Scaling laws for neural language models. arXiv [cs.LG] ( 2020 ).
- 99 . Prezioso, M., Merrikh-Bayat, F., Hoskins, B., Adam, G., Likharev, K. K. & Strukov, D. B. Training and operation of an integrated neuromorphic network based on metal-oxide memristors. Nature 521 , 61 ( 2015 ).
- 100 . Yao, P., Wu, H., Gao, B., Tang, J., Zhang, Q., Zhang, W., Yang, J. J. & Qian, H. Fully hardware-implemented memristor convolutional neural network. Nature 577 , 641 ( 2020 ).
- 101 . Chen, J., Yang, S., Wu, H., Indiveri, G. & Payvand, M. Scaling Limits of Memristor-Based Routers for Asynchronous Neuromorphic Systems. arXiv preprint arXiv: 2307 . 08116 ( 2023 ).
- 102 . Vianello, E. & Payvand, M. Scaling neuromorphic systems with 3 D technologies. Nat. Electron. 7 , 419 ( 6 2024 ).
- 103 . Moradi, S., Qiao, N., Stefanini, F. & Indiveri, G. A Scalable Multicore Architecture With Heterogeneous Memory Structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs). Biomedical Circuits and Systems, IEEE Transactions on 12 , 106 ( 2018 ).
- 104 . Sutton, R. S. The Bitter Lesson ( 2019 ).
- 105 . Ielmini, D. Resistive switching memories based on metal oxides: mechanisms, reliability and scaling. Semiconductor Science and Technology 31 , 063002 ( 2016 ).
- 106 . Widrow, B. & Winter, R. Neural nets for adaptive filtering and adaptive pattern recognition. Computer 21 , 25 ( 1988 ).
- 107 . Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J. & Chintala, S. in Advances in Neural Information Processing Systems 32 (eds Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E. & Garnett, R.) 8024 (Curran Associates, Inc., 2019 ).
- 108 . Bernstein, J., Wang, Y.-X., Azizzadenesheli, K. & Anandkumar, A. signSGD: Compressed Optimisation for Non-Convex Problems. arXiv [cs.LG] ( 2018 ).
- 109 . Boybat, I., Gallo, M. L., Moraitis, T., Parnell, T., Tuma, T., Rajendran, B., Leblebici, Y., Sebastian, A. & Eleftheriou, E. Neuromorphic computing with multi-memristive synapses. Nature Communications 9 , 2514 ( 2018 ).
- 110 . Nandakumar, S. R., Gallo, M. L., Piveteau, C., Joshi, V., Mariani, G., Boybat, I., Karunaratne, G., Khaddam-Aljameh, R., Egger, U., Petropoulos, A., Antonakopoulos, T., Rajendran, B., Sebastian, A. & Eleftheriou, E. Mixed-Precision Deep Learning Based on Computational Memory. Frontiers in Neuroscience 14 , 406 ( 2020 ).
- 111 . Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R. & Bengio, Y. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations. arXiv ( 2016 ).
- 112 . Le Gallo, M., Krebs, D., Zipoli, F., Salinga, M. & Sebastian, A. Collective Structural Relaxation in Phase-Change Memory Devices. Advanced Electronic Materials 4 , 1700627 ( 2018 ).
- 113 . Salimans, T., Ho, J., Chen, X., Sidor, S. & Sutskever, I. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. arXiv ( 2017 ).
- 114 . Indiveri, G., Linares-Barranco, B., Legenstein, R., Deligeorgis, G. & Prodromakis, T. Integration of nanoscale memristor synapses in neuromorphic computing architectures. Nanotechnology 24 , 384010 ( 2013 ).
- 115 . Payvand, M., Nair, M. V., Müller, L. K. & Indiveri, G. A neuromorphic systems approach to in-memory computing with non-ideal memristive devices: From mitigation to exploitation. Faraday Discussions 213 , 487 ( 2019 ).
- 116 . Deiss, S., Douglas, R. & Whatley, A. in Pulsed Neural Networks (eds Maass, W. & Bishop, C.) 157 (MIT Press, 1998 ).
- 117 . Alibart, F., Gao, L., Hoskins, B. D. & Strukov, D. B. High precision tuning of state for memristive devices by adaptable variation-tolerant algorithm. Nanotechnology 23 , 075201 ( 2012 ).
- 118 . Nandakumar, S., Le Gallo, M., Boybat, I., Rajendran, B., Sebastian, A. & Eleftheriou, E. Mixed-precision architecture based on computational memory for training deep neural networks in 2018 IEEE International Symposium on Circuits and Systems (ISCAS) ( 2018 ), 1 .
- 119 . Grossi, A., Nowak, E., Zambelli, C., Pellissier, C., Bernasconi, S., Cibrario, G., Hajjam, K. E., Crochemore, R., Nodin, J. F., Olivo, P. & Perniola, L. Fundamental variability limits of filament-based RRAM in 2016 IEEE International Electron Devices Meeting (IEDM) ( 2016 ), 4 . 7 . 1 .
- 120 . Payvand, M. & Indiveri, G. Spike-Based Plasticity Circuits for Always-on On-Line Learning in Neuromorphic Systems in 2019 IEEE International Symposium on Circuits and Systems (ISCAS) (IEEE, Sapporo, Japan, 2019 ), 1 .
- 121 . Delbruck, T. 'Bump' circuits for computing similarity and dissimilarity of analog voltages in Neural Networks, 1991 ., IJCNN91 -Seattle International Joint Conference on 1 ( 1991 ), 475 .
- 122 . Payvand, M., Fouda, M., Kurdahi, F., Eltawil, A. & Neftci, E. O. Error-triggered Three-Factor Learning Dynamics for Crossbar Arrays. arXiv preprint arXiv: 1910 . 06152 ( 2019 ).
- 123 . Goodman, D. Brian: a simulator for spiking neural networks in Python. Frontiers in Neuroinformatics 2 ( 2008 ).
- 124 . Lecun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 , 2278 ( 1998 ).
- 125 . Dalgaty, T., Payvand, M., Moro, F., Ly, D. R., Pebay-Peyroula, F., Casas, J., Indiveri, G. & Vianello, E. Hybrid neuromorphic circuits exploiting non-conventional properties of RRAM for massively parallel local plasticity mechanisms. APL Materials 7 , 081125 ( 2019 ).
- 126 . Demirag, Y., Frenkel, C., Payvand, M. & Indiveri, G. Online training of spiking recurrent neural networks with Phase-Change Memory synapses ( 2021 ).
- 127 . Demirag, Y., Moro, F., Dalgaty, T., Navarro, G., Frenkel, C., Indiveri, G., Vianello, E. & Payvand, M. PCM-trace: Scalable synaptic eligibility traces with resistivity drift of phase-change materials in 2021 IEEE International Symposium on Circuits and Systems (ISCAS) (IEEE, Daegu, Korea, 2021 ), 1 .
- 128 . Bohnstingl, T., Surina, A., Fabre, M., Demirag, Y., Frenkel, C., Payvand, M., Indiveri, G. & Pantazi, A. Biologically-inspired training of spiking recurrent neural networks with neuromorphic hardware in 2022 IEEE 4 th International Conference on Artificial Intelligence Circuits and Systems (AICAS) (IEEE, Incheon, Korea, 2022 ), 218 .
- 129 . Khrulkov, V., Novikov, A. & Oseledets, I. Expressive power of recurrent neural networks. arXiv ( 2017 ).
- 130 . Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A. & Kavukcuoglu, K. WaveNet: A Generative Model for Raw Audio. arXiv: 1609 . 03499 [cs] ( 2016 ).
- 131 . Teed, Z. & Deng, J. RAFT: Recurrent All-Pairs Field Transforms for Optical Flow. arXiv ( 2020 ).
- 132 . Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I. & Amodei, D. Language Models are Few-Shot Learners. arXiv ( 2020 ).
- 133 . Berner, C., Brockman, G., Chan, B., Cheung, V., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., Hesse, C., Józefowicz, R., Gray, S., Olsson, C., Pachocki, J., Petrov, M., Salimans, T., Schlatter, J., Schneider, J., Sidor, S., Sutskever, I., Tang, J., Wolski, F. & Zhang, S. Dota 2 with Large Scale Deep Reinforcement Learning, 66 ( 2019 ).
- 134 . Ha, D. & Schmidhuber, J. World Models. arXiv: 1803 . 10122 [cs, stat] ( 2018 ).
- 135 . Vinyals, O., Babuschkin, I., Czarnecki, W. M., Mathieu, M., Dudzik, A., Chung, J., Choi, D. H., Powell, R., Ewalds, T., Georgiev, P., Oh, J., Horgan, D., Kroiss, M., Danihelka, I., Huang, A., Sifre, L., Cai, T., Agapiou, J. P., Jaderberg, M., Vezhnevets, A. S., Leblond, R., Pohlen, T., Dalibard, V., Budden, D., Sulsky, Y., Molloy, J., Paine, T. L., Gulcehre, C., Wang, Z., Pfaff, T., Wu, Y., Ring, R., Yogatama, D., Wünsch, D., McKinney, K., Smith, O., Schaul, T., Lillicrap, T., Kavukcuoglu, K., Hassabis, D., Apps, C. & Silver, D. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575 , 350 ( 2019 ).
- 136 . Douglas, R. & Martin, K. in The synaptic organization of the brain (ed Shepherd, G.) 4 th, 459 (Oxford University Press, Oxford, New York, 1998 ).
- 137 . Douglas, R., Mahowald, M. & Mead, C. Neuromorphic analogue VLSI. Annual Review of Neuroscience 18 , 255 ( 1995 ).
- 138 . Ambrogio, S., Narayanan, P., Tsai, H., Shelby, R. M., Boybat, I., di Nolfo, C., Sidler, S., Giordano, M., Bodini, M., Farinha, N. C. P., Killeen, B., Cheng, C., Jaoudi, Y. & Burr, G. W. Equivalent-accuracy accelerated neural-network training using analogue memory. Nature 558 , 60 ( 2018 ).
- 139 . Li, C., Belkin, D., Li, Y., Yan, P., Hu, M., Ge, N., Jiang, H., Montgomery, E., Lin, P., Wang, Z., Song, W., Strachan, J. P., Barnell, M., Wu, Q., Williams, R. S., Yang, J. J. & Xia, Q. Efficient and self-adaptive in-situ learning in multilayer memristor neural network. Nature Communications 9 , 1 ( 2018 ).
- 140 . Dalgaty, T., Castellani, N., Turck, C., Harabi, K.-E., Querlioz, D. & Vianello, E. In situ learning using intrinsic memristor variability via Markov chain Monte Carlo sampling. Nature Electronics 4 , 151 ( 2021 ).
- 141 . Cai, F., Kumar, S., Van Vaerenbergh, T., Sheng, X., Liu, R., Li, C., Liu, Z., Foltin, M., Yu, S., Xia, Q., et al. Power-efficient combinatorial optimization using intrinsic noise in memristor Hopfield neural networks. Nature Electronics 3 , 409 ( 2020 ).
- 142 . Sebastian, A., Gallo, M. L. & Eleftheriou, E. Computational phase-change memory: beyond von Neumann computing. Journal of Physics D: Applied Physics 52 , 443002 ( 2019 ).
- 143 . Payvand, M., Nair, M. V., Müller, L. K. & Indiveri, G. A neuromorphic systems approach to in-memory computing with non-ideal memristive devices: From mitigation to exploitation. Faraday Discussions 213 , 487 ( 2019 ).
- 144 . Chicca, E. & Indiveri, G. A recipe for creating ideal hybrid memristive-CMOS neuromorphic processing systems. Applied Physics Letters 116 , 120501 ( 2020 ).
- 145 . Peng, X., Huang, S., Luo, Y., Sun, X. & Yu, S. DNN+NeuroSim: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators with Versatile Device Technologies. IEEE International Electron Devices Meeting (IEDM) , 32 . 5 . 1 ( 2019 ).
- 146 . Peng, X., Huang, S., Jiang, H., Lu, A. & Yu, S. DNN+NeuroSim V 2 . 0 : An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators for On-chip Training. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems PP , 1 ( 2020 ).
- 147 . Burr, G. W., Brightsky, M. J., Sebastian, A., Cheng, H.-Y., Wu, J.-Y., Kim, S., Sosa, N. E., Papandreou, N., Lung, H.-L., Pozidis, H., Eleftheriou, E. & Lam, C. H. Recent Progress in Phase-Change Memory Technology. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 6 , 146 ( 2016 ).
- 148 . Burr, G. W., Shelby, R. M., Sebastian, A., Kim, S., Kim, S., Sidler, S., Virwani, K., Ishii, M., Narayanan, P., Fumarola, A., Sanches, L. L., Boybat, I., Le Gallo, M., Moon, K., Woo, J., Hwang, H. & Leblebici, Y. Neuromorphic computing using non-volatile memory. Advances in Physics: X 2 , 89 ( 2017 ).
- 149 . Tuma, T., Pantazi, A., Le Gallo, M., Sebastian, A. & Eleftheriou, E. Stochastic phase-change neurons. Nature Nanotechnology 11 , 693 ( 2016 ).
- 150 . Karunaratne, G., Gallo, M. L., Cherubini, G., Benini, L., Rahimi, A. & Sebastian, A. In-memory hyperdimensional computing. Nature Electronics 3 , 327 ( 2020 ).
- 151 . Demirag, Y., Moro, F., Dalgaty, T., Navarro, G., Frenkel, C., Indiveri, G., Vianello, E. & Payvand, M. PCM-trace: Scalable Synaptic Eligibility Traces with Resistivity Drift of Phase-Change Materials. 2021 IEEE International Symposium on Circuits and Systems (ISCAS) , 1 ( 2021 ).
- 152 . Gallo, M. L., Sebastian, A., Cherubini, G., Giefers, H. & Eleftheriou, E. Compressed Sensing With Approximate Message Passing Using In-Memory Computing. IEEE Transactions on Electron Devices 65 , 4304 ( 2018 ).
- 153 . Gallo, M. L. & Sebastian, A. An overview of phase-change memory device physics. Journal of Physics D: Applied Physics 53 , 213002 ( 2020 ).
- 154 . Gallo, M. L., Athmanathan, A., Krebs, D. & Sebastian, A. Evidence for thermally assisted threshold switching behavior in nanoscale phase-change memory cells. Journal of Applied Physics 119 , 025704 ( 2016 ).
- 155 . Ielmini, D., Lavizzari, S., Sharma, D. & Lacaita, A. L. Physical Interpretation, Modeling and Impact on Phase Change Memory (PCM) Reliability of Resistance Drift Due to Chalcogenide Structural Relaxation. 2007 IEEE International Electron Devices Meeting , 939 ( 2007 ).
- 156 . Karpov, I., Mitra, M., Kau, D., Spadini, G., Kryukov, Y. & Karpov, V. Fundamental drift of parameters in chalcogenide phase change memory. Journal of Applied Physics 102 , 124503 ( 2007 ).
- 157 . Redaelli, A., Pirovano, A., Benvenuti, A. & Lacaita, A. L. Threshold switching and phase transition numerical models for phase change memory simulations. Journal of Applied Physics 103 , 111101 ( 2008 ).
- 158 . Salinga, M., Carria, E., Kaldenbach, A., Bornhöfft, M., Benke, J., Mayer, J. & Wuttig, M. Measurement of crystal growth velocity in a melt-quenched phase-change material. Nature Communications 4 , 2371 ( 2013 ).
- 159 . Nardone, M., Kozub, V. I., Karpov, I. V. & Karpov, V. G. Possible mechanisms for 1/f noise in chalcogenide glasses: A theoretical description. Physical Review B 79 , 165206 ( 2009 ).
- 160 . Frémaux, N. & Gerstner, W. Neuromodulated spike-timing-dependent plasticity, and theory of three-factor learning rules. Front. Neur. Circ. 9 , 85 ( 2016 ).
- 161 . Sacramento, J., Costa, R. P., Bengio, Y. & Senn, W. Dendritic cortical microcircuits approximate the backpropagation algorithm in Advances in neural information processing systems ( 2018 ), 8721 .
- 162 . Pozzi, I., Bohté, S. & Roelfsema, P. A Biologically Plausible Learning Rule for Deep Learning in the Brain. arXiv ( 2018 ).
- 163 . Sussillo, D. & Abbott, L. Generating coherent patterns of activity from chaotic neural networks. Neuron 63 , 544 ( 2009 ).
- 164 . Nicola, W. & Clopath, C. Supervised Learning in Spiking Neural Networks with FORCE Training. Nature Communications 8 , 2208 ( 2017 ).
- 165 . Neftci, E. O., Mostafa, H. & Zenke, F. Surrogate gradient learning in spiking neural networks: Bringing the power of gradient-based optimization to spiking neural networks. IEEE Signal Processing Magazine 36 , 51 ( 2019 ).
- 166 . Lee, J. H., Delbruck, T. & Pfeiffer, M. Training Deep Spiking Neural Networks Using Backpropagation. Frontiers in Neuroscience 10 , 508 ( 2016 ).
- 167 . Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural computation 9 , 1735 ( 1997 ).
- 168 . Qiao, N., Mostafa, H., Corradi, F., Osswald, M., Stefanini, F., Sumislawska, D. & Indiveri, G. A reconfigurable on-line learning spiking neuromorphic processor comprising 256 neurons and 128 K synapses. Frontiers in neuroscience 9 , 141 ( 2015 ).
- 169 . Payvand, M., Muller, L. K. & Indiveri, G. Event-based circuits for controlling stochastic learning with memristive devices in neuromorphic architectures in Circuits and Systems (ISCAS), 2018 IEEE International Symposium on ( 2018 ), 1 .
- 170 . Nair, M. V., Mueller, L. K. & Indiveri, G. A differential memristive synapse circuit for on-line learning in neuromorphic computing systems. Nano Futures 1 , 1 ( 2017 ).
- 171 . Balles, L., Pedregosa, F. & Roux, N. L. The Geometry of Sign Gradient Descent. arXiv ( 2020 ).
- 172 . Nair, M. V. & Dudek, P. Gradient-descent-based learning in memristive crossbar arrays in International Joint Conference on Neural Networks (IJCNN) ( 2015 ), 1 .
- 173 . Müller, L., Nair, M. & Indiveri, G. Randomized Unregulated Step Descent for Limited Precision Synaptic Elements in International Symposium on Circuits and Systems, (ISCAS) ( 2017 ).
- 174 . Payvand, M., Fouda, M. E., Kurdahi, F., Eltawil, A. M. & Neftci, E. O. On-Chip Error-Triggered Learning of Multi-Layer Memristive Spiking Neural Networks. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 10 , 522 ( 2020 ).
- 175 . Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv: 1412 . 6980 ( 2014 ).
- 176 . Athmanathan, A., Stanisavljevic, M., Papandreou, N., Pozidis, H. & Eleftheriou, E. Multilevel-Cell Phase-Change Memory: A Viable Technology. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 6 , 87 ( 2016 ).
- 177 . Snoek, J., Larochelle, H. & Adams, R. P. Practical Bayesian Optimization of Machine Learning Algorithms in Proceedings of the 25 th International Conference on Neural Information Processing Systems - Volume 2 (Curran Associates Inc., Lake Tahoe, Nevada, 2012 ), 2951 .
- 178 . Davies, M., Wild, A., Orchard, G., Sandamirskaya, Y., Guerra, G. A. F., Joshi, P., Plank, P. & Risbud, S. R. Advancing neuromorphic computing with Loihi: A survey of results and outlook. Proceedings of the IEEE 109 , 911 ( 2021 ).
- 179 . Frenkel, C., Bol, D. & Indiveri, G. Bottom-Up and Top-Down Neural Processing Systems Design: Neuromorphic Intelligence as the Convergence of Natural and Artificial Intelligence. arXiv preprint arXiv: 2106 . 01288 ( 2021 ).
- 180 . Muller, L. K. & Indiveri, G. Rounding methods for neural networks with low resolution synaptic weights. arXiv preprint arXiv: 1504 . 05767 , 1 ( 2015 ).
- 181 . Frenkel, C., Legat, J.-D. & Bol, D. MorphIC: A 65 -nm 738 k-Synapse/mm 2 quad-core binary-weight digital neuromorphic processor with stochastic spike-driven online learning. IEEE Transactions on Biomedical Circuits and Systems 13 , 999 ( 2019 ).
- 182 . Frenkel, C., Legat, J.-D. & Bol, D. A 28 -nm convolutional neuromorphic processor enabling online learning with spike-based retinas in 2020 IEEE International Symposium on Circuits and Systems (ISCAS) ( 2020 ), 1 .
- 183 . Fusi, S. & Abbott, L. Limits on the memory storage capacity of bounded synapses. Nature Neuroscience 10 , 485 ( 2007 ).
- 184 . Laborieux, A., Ernoult, M., Hirtzlin, T. & Querlioz, D. Synaptic metaplasticity in binarized neural networks. Nature communications 12 , 1 ( 2021 ).
- 185 . Khaddam-Aljameh, R., Stanisavljevic, M., Fornt Mas, J., Karunaratne, G., Brandli, M., Liu, F., Singh, A., Muller, S. M., Egger, U., Petropoulos, A., Antonakopoulos, T., Brew, K., Choi, S., Ok, I., Lie, F. L., Saulnier, N., Chan, V., Ahsan, I., Narayanan, V., Nandakumar, S. R., Le Gallo, M., Francese, P. A., Sebastian, A. & Eleftheriou, E. HERMES-Core: A 1 . 59 -TOPS/mm 2 PCM on 14 -nm CMOS in-memory compute core using 300 -ps/LSB linearized CCO-based ADCs. IEEE J. Solid-State Circuits 57 , 1027 ( 4 2022 ).
- 186 . Le Gallo, M., Sebastian, A., Cherubini, G., Giefers, H. & Eleftheriou, E. Compressed Sensing With Approximate Message Passing Using In-Memory Computing. IEEE Trans. Electron Devices 65 , 4304 ( 2018 ).
- 187 . Mead, C. How we created neuromorphic engineering. Nature Electronics 3 , 434 ( 2020 ).
- 188 . Chicca, E., Stefanini, F., Bartolozzi, C. & Indiveri, G. Neuromorphic electronic circuits for building autonomous cognitive systems. Proceedings of the IEEE 102 , 1367 ( 2014 ).
- 189 . Indiveri, G. & Horiuchi, T. Frontiers in Neuromorphic Engineering. Frontiers in Neuroscience 5 , 1 ( 2011 ).
- 190 . Mead, C. Neuromorphic Electronic Systems. Proceedings of the IEEE 78 , 1629 ( 1990 ).
- 191 . Serb, A., Bill, J., Khiat, A., Berdan, R., Legenstein, R. & Prodromakis, T. Unsupervised learning in probabilistic neural networks with multi-state metal-oxide memristive synapses. Nature Communications 7 , 12611 ( 2016 ).
- 192 . Li, Y., Wang, Z., Midya, R., Xia, Q. & Yang, J. J. Review of memristor devices in neuromorphic computing: materials sciences and device challenges. Journal of Physics D: Applied Physics 51 , 503002 ( 2018 ).
- 193 . Spiga, S., Sebastian, A., Querlioz, D. & Rajendran, B. in Memristive Devices for Brain-Inspired Computing (eds Spiga, S., Sebastian, A., Querlioz, D. & Rajendran, B.) 3 (Woodhead Publishing, 2020 ).
- 194 . Payvand, M. & Indiveri, G. Spike-Based Plasticity Circuits for Always-on On-Line Learning in Neuromorphic Systems in IEEE International Symposium on Circuits and Systems (ISCAS) ( 2019 ), 1 .
- 195 . Widrow, B. & Hoff, M. Adaptive Switching Circuits in 1960 IRE WESCON Convention Record, Part 4 (IRE, New York, 1960 ), 96 .
- 196 . Payvand, M., Fouda, M. E., Kurdahi, F., Eltawil, A. & Neftci, E. O. Error-triggered three-factor learning dynamics for crossbar arrays in International Conference on Artificial Intelligence Circuits and Systems (AICAS) ( 2020 ), 218 .
- 197 . Gerstner, W., Lehmann, M., Liakoni, V., Corneil, D. & Brea, J. Eligibility traces and plasticity on behavioral time scales: experimental support of neohebbian three-factor learning rules. Front. Neur. Circ. 12 , 53 ( 2018 ).
- 198 . Neftci, E. O. Data and Power Efficient Intelligence with Neuromorphic Learning Machines. iScience 5 , 52 ( 2018 ).
- 199 . Sanhueza, M. & Lisman, J. The CaMKII/NMDAR complex as a molecular memory. Molecular brain 6 , 1 ( 2013 ).
- 200 . Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by backpropagating errors. Nature 323 , 533 ( 1986 ).
- 201 . Qiao, N., Bartolozzi, C. & Indiveri, G. An Ultralow Leakage Synaptic Scaling Homeostatic Plasticity Circuit With Configurable Time Scales up to 100 ks. IEEE Transactions on Biomedical Circuits and Systems ( 2017 ).
- 202 . Bartolozzi, C. & Indiveri, G. Synaptic dynamics in analog VLSI. Neural Computation 19 , 2581 ( 2007 ).
- 203 . Bartolozzi, C., Mitra, S. & Indiveri, G. An ultra low power current-mode filter for neuromorphic systems and biomedical signal processing in 2006 IEEE Biomedical Circuits and Systems Conference - Healthcare Technology (BioCAS) (IEEE, London, UK, 2006 ), 130 .
- 204 . Saxena, V. & Baker, R. J. Compensation of CMOS op-amps using split-length transistors in Circuits and Systems (MWSCAS), 2008 IEEE 51 st International Midwest Symposium on ( 2008 ), 109 .
- 205 . Garofolo, J. S., Lamel, L. F., Fisher, W. M., Pallett, D. S., Dahlgren, N. L., Zue, V. & Fiscus, J. G. TIMIT Acoustic-Phonetic Continuous Speech Corpus ( 1993 ).
- 206 . Cramer, B., Stradmann, Y., Schemmel, J. & Zenke, F. The Heidelberg spiking data sets for the systematic evaluation of spiking neural networks. IEEE Transactions on Neural Networks and Learning Systems ( 2020 ).
- 207 . Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv [cs.LG] ( 2017 ).
- 208 . Orchard, G., Jayawant, A., Cohen, G. K. & Thakor, N. Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades. Frontiers in Neuroscience 9 ( 2015 ).
- 209 . Krizhevsky, A. Learning multiple layers of features from tiny images research rep. ( 2009 ).
- 210 . Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K. & Fei-Fei, L. ImageNet: A large-scale hierarchical image database in 2009 IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2009 ).
- 211 . Sutton, R. S. & Barto, A. G. Reinforcement learning: an introduction (MIT Press, Cambridge, Mass, 1998 ).
- 212 . Wawrzyński, P. & Tanwani, A. K. Autonomous reinforcement learning with experience replay. Neural Networks 41 , 156 ( 2013 ).
- 213 . Lehmann, M. P., Xu, H. A., Liakoni, V., Herzog, M. H., Gerstner, W. & Preuschoff, K. One-shot learning and behavioral eligibility traces in sequential decision making. eLife 8 , e47463 ( 2019 ).
- 214 . Lisman, J. A mechanism for the Hebb and the anti-Hebb processes underlying learning and memory. Proc. Natl. Acad. Sci. U. S. A. 86 , 9574 ( 23 1989 ).
- 215 . Shouval, H. Z., Bear, M. F. & Cooper, L. N. A unified model of NMDA receptor-dependent bidirectional synaptic plasticity. Proceedings of the National Academy of Sciences 99 , 10831 ( 16 2002 ).
- 216 . Bosch, M., Castro, J., Saneyoshi, T., Matsuno, H., Sur, M. & Hayashi, Y. Structural and Molecular Remodeling of Dendritic Spine Substructures during Long-Term Potentiation. Neuron 82 , 444 ( 2 2014 ).
- 217 . He, K., Huertas, M., Hong, S. Z., Tie, X., Hell, J. W., Shouval, H. & Kirkwood, A. Distinct Eligibility Traces for LTP and LTD in Cortical Synapses. Neuron 88 , 528 ( 3 2015 ).
- 218 . Brzosko, Z., Schultz, W. & Paulsen, O. Retroactive modulation of spike timing-dependent plasticity by dopamine. eLife 4 , e09685 ( 2015 ).
- 219 . Ielmini, D., Lavizzari, S., Sharma, D. & Lacaita, A. L. Temperature acceleration of structural relaxation in amorphous Ge 2 Sb 2 Te 5 . Applied Physics Letters 92 , 193511 ( 2008 ).
- 220 . Pirovano, A., Lacaita, A. L., Pellizzer, F., Kostylev, S. A., Benvenuti, A. & Bez, R. Low-field amorphous state resistance and threshold voltage drift in chalcogenide materials. IEEE Transactions on Electron Devices 51 , 714 ( 2004 ).
- 221 . Kim, S., Lee, B., Asheghi, M., Hurkx, F., Reifenberg, J. P., Goodson, K. E. & Wong, H.-S. P. Resistance and threshold switching voltage drift behavior in phase-change memory and their temperature dependence at microsecond time scales studied using a micro-thermal stage. IEEE Transactions on Electron Devices 58 , 584 ( 2011 ).
- 222 . Demirag, Y. Multiphysics modeling of Ge 2 Sb 2 Te 5 based synaptic devices for brain inspired computing MA thesis (Ihsan Dogramaci Bilkent University, Ankara, Turkey, 2018 ).
- 223 . Brader, J. M., Senn, W. & Fusi, S. Learning real-world stimuli in a neural network with spike-driven synaptic dynamics. Neural Computation 19 , 2881 ( 2007 ).
- 224 . Delbruck, T. & Mead, C. Bump circuits in Proceedings of International Joint Conference on Neural Networks 1 ( 1993 ), 475 .
- 225 . Liu, S.-C., Kramer, J., Indiveri, G., Delbruck, T. & Douglas, R. Analog VLSI: Circuits and Principles (MIT Press, 2002 ).
- 226 . Rubino, A., Payvand, M. & Indiveri, G. Ultra-Low Power Silicon Neuron Circuit for Extreme-Edge Neuromorphic Intelligence in International Conference on Electronics, Circuits and Systems (ICECS) ( 2019 ), 458 .
- 227 . Strukov, D. B., Snider, G. S., Stewart, D. R. & Williams, R. S. The missing memristor found. Nature 453 , 80 ( 7191 2008 ).
- 228 . Kumar, S., Williams, R. S. & Wang, Z. Third-order nanocircuit elements for neuromorphic engineering. Nature 585 , 518 ( 7826 2020 ).
- 229 . Grollier, J., Querlioz, D., Camsari, K. Y., Everschor-Sitte, K., Fukami, S. & Stiles, M. D. Neuromorphic spintronics. Nat. Electron. 3 , 360 ( 7 2020 ).
- 230 . Chua, L. Memristor-The missing circuit element. IEEE Trans. Circuit Theory 18 , 507 ( 5 1971 ).
- 231 . Chicca, E., Stefanini, F., Bartolozzi, C. & Indiveri, G. Neuromorphic electronic circuits for building autonomous cognitive systems. Proc. IEEE Inst. Electr. Electron. Eng. 102 , 1367 ( 9 2014 ).
- 232 . Cheng, Q., Song, S.-H. & Augustine, G. J. Molecular mechanisms of short-term plasticity: Role of synapsin phosphorylation in augmentation and potentiation of spontaneous glutamate release. Front. Synaptic Neurosci. 10 , 33 ( 2018 ).
- 233 . Boyn, S., Grollier, J., Lecerf, G., Xu, B., Locatelli, N., Fusil, S., Girod, S., Carrétéro, C., Garcia, K., Xavier, S., Tomas, J., Bellaiche, L., Bibes, M., Barthélémy, A., Saïghi, S. & Garcia, V. Learning through ferroelectric domain dynamics in solid-state synapses. Nat. Commun. 8 , 14736 ( 1 2017 ).
- 234 . Wang, Z., Joshi, S., Savel'ev, S. E., Jiang, H., Midya, R., Lin, P., Hu, M., Ge, N., Strachan, J. P., Li, Z., Wu, Q., Barnell, M., Li, G.-L., Xin, H. L., Williams, R. S., Xia, Q. & Yang, J. J. Memristors with diffusive dynamics as synaptic emulators for neuromorphic computing. Nat. Mater. 16 , 101 ( 1 2017 ).
- 235 . Mehonic, A., Sebastian, A., Rajendran, B., Simeone, O., Vasilaki, E. & Kenyon, A. J. Memristors-from in-memory computing, deep learning acceleration, and spiking neural networks to the future of neuromorphic and bio-inspired computing. Adv. Intell. Syst. 2 , 2000085 ( 11 2020 ).
- 236 . Mahmoodi, M. R., Prezioso, M. & Strukov, D. B. Versatile stochastic dot product circuits based on nonvolatile memories for high performance neurocomputing and neurooptimization. Nat. Commun. 10 , 5113 ( 1 2019 ).
- 237 . Karunaratne, G., Le Gallo, M., Cherubini, G., Benini, L., Rahimi, A. & Sebastian, A. In-memory hyperdimensional computing. Nat. Electron. 3 , 327 ( 6 2020 ).
- 238 . Tuma, T., Pantazi, A., Le Gallo, M., Sebastian, A. & Eleftheriou, E. Stochastic phase-change neurons. Nat. Nanotechnol. 11 , 693 ( 8 2016 ).
- 239 . Appeltant, L., Soriano, M. C., Van der Sande, G., Danckaert, J., Massar, S., Dambre, J., Schrauwen, B., Mirasso, C. R. & Fischer, I. Information processing using a single dynamical node as complex system. Nat. Commun. 2 , 468 ( 1 2011 ).
- 240 . Zhu, X., Wang, Q. & Lu, W. D. Memristor networks for real-time neural activity analysis. Nat. Commun. 11 , 2439 ( 1 2020 ).
- 241 . Ninan, I. & Arancio, O. Presynaptic CaMKII is necessary for synaptic plasticity in cultured hippocampal neurons. Neuron 42 , 129 ( 1 2004 ).
- 242 . Yang, J. J., Strukov, D. B. & Stewart, D. R. Memristive devices for computing. Nat. Nanotechnol. 8 , 13 ( 1 2013 ).
- 243 . Midya, R., Wang, Z., Asapu, S., Joshi, S., Li, Y., Zhuo, Y., Song, W., Jiang, H., Upadhay, N., Rao, M., Lin, P., Li, C., Xia, Q. & Yang, J. J. Artificial neural network (ANN) to spiking neural network (SNN) converters based on diffusive memristors. Adv. Electron. Mater. 5 , 1900060 ( 9 2019 ).
- 244 . Yang, K., Li, F., Veeramalai, C. P. & Guo, T. A facile synthesis of CH 3 NH 3 PbBr 3 perovskite quantum dots and their application in flexible nonvolatile memory. Appl. Phys. Lett. 110 , 083102 ( 8 2017 ).
- 245 . Jeong, J., Kim, M., Seo, J., Lu, H., Ahlawat, P., Mishra, A., Yang, Y., Hope, M. A., Eickemeyer, F. T., Kim, M., Yoon, Y. J., Choi, I. W., Darwich, B. P., Choi, S. J., Jo, Y., Lee, J. H., Walker, B., Zakeeruddin, S. M., Emsley, L., Rothlisberger, U., Hagfeldt, A., Kim, D. S., Grätzel, M. & Kim, J. Y. Pseudo-halide anion engineering for α -FAPbI 3 perovskite solar cells. Nature 592 , 381 ( 7854 2021 ).
- 246 . Hassan, Y., Park, J. H., Crawford, M. L., Sadhanala, A., Lee, J., Sadighian, J. C., Mosconi, E., Shivanna, R., Radicchi, E., Jeong, M., Yang, C., Choi, H., Park, S. H., Song, M. H., De Angelis, F., Wong, C. Y., Friend, R. H., Lee, B. R. & Snaith, H. J. Ligand-engineered bandgap stability in mixed-halide perovskite LEDs. Nature 591 , 72 ( 7848 2021 ).
- 247 . Protesescu, L., Yakunin, S., Bodnarchuk, M. I., Krieg, F., Caputo, R., Hendon, C. H., Yang, R. X., Walsh, A. & Kovalenko, M. V. Nanocrystals of cesium lead Halide perovskites ( CsPbX 3, X = Cl, Br, and I): Novel optoelectronic materials showing bright emission with wide color gamut. Nano Lett. 15 , 3692 ( 6 2015 ).
- 248 . Saidaminov, M. I., Adinolfi, V., Comin, R., Abdelhady, A. L., Peng, W., Dursun, I., Yuan, M., Hoogland, S., Sargent, E. H. & Bakr, O. M. Planar-integrated single-crystalline perovskite photodetectors. Nat. Commun. 6 , 8724 ( 1 2015 ).
- 249 . Yakunin, S., Sytnyk, M., Kriegner, D., Shrestha, S., Richter, M., Matt, G. J., Azimi, H., Brabec, C. J., Stangl, J., Kovalenko, M. V. & Heiss, W. Detection of X-ray photons by solution-processed organic-inorganic perovskites. Nat. Photonics 9 , 444 ( 7 2015 ).
- 250 . Wu, W., Han, X., Li, J., Wang, X., Zhang, Y., Huo, Z., Chen, Q., Sun, X., Xu, Z., Tan, Y., Pan, C. & Pan, A. Ultrathin and conformable lead Halide perovskite photodetector arrays for potential application in retina-like vision sensing. Adv. Mater. 33 , e2006006 ( 9 2021 ).
- 251 . Xiao, Z. & Huang, J. Energy-efficient hybrid perovskite memristors and synaptic devices. Adv. Electron. Mater. 2 , 1600100 ( 7 2016 ).
- 252 . Xu, W., Cho, H., Kim, Y.-H., Kim, Y.-T., Wolf, C., Park, C.-G. & Lee, T.-W. Organometal Halide perovskite artificial synapses. Adv. Mater. 28 , 5916 ( 28 2016 ).
- 253 . John, R. A., Yantara, N., Ng, Y. F., Narasimman, G., Mosconi, E., Meggiolaro, D., Kulkarni, M. R., Gopalakrishnan, P. K., Nguyen, C. A., De Angelis, F., Mhaisalkar, S. G., Basu, A. & Mathews, N. Ionotronic Halide perovskite drift-diffusive synapses for low-power neuromorphic computation. Adv. Mater. 30 , e1805454 ( 51 2018 ).
- 254 . Lee, S., Kim, H., Kim, D. H., Kim, W. B., Lee, J. M., Choi, J., Shin, H., Han, G. S., Jang, H. W. & Jung, H. S. Tailored 2 D/ 3 D Halide perovskite heterointerface for substantially enhanced endurance in conducting bridge resistive switching memory. ACS Appl. Mater. Interfaces 12 , 17039 ( 14 2020 ).
- 255 . John, R. A., Yantara, N., Ng, S. E., Patdillah, M. I. B., Kulkarni, M. R., Jamaludin, N. F., Basu, J., Ankit, Mhaisalkar, S. G., Basu, A. & Mathews, N. Diffusive and drift Halide perovskite memristive barristors as nociceptive and synaptic emulators for neuromorphic computing. Adv. Mater. 33 , 2007851 ( 15 2021 ).
- 256 . Tian, H., Zhao, L., Wang, X., Yeh, Y.-W., Yao, N., Rand, B. P. & Ren, T.-L. Extremely low operating current resistive memory based on exfoliated 2 D perovskite single crystals for neuromorphic computing. ACS Nano 11 , 12247 ( 12 2017 ).
- 257 . Wang, Y., Lv, Z., Liao, Q., Shan, H., Chen, J., Zhou, Y., Zhou, L., Chen, X., Roy, V. A. L., Wang, Z., Xu, Z., Zeng, Y.-J. & Han, S.-T. Synergies of electrochemical metallization and valence change in all-inorganic perovskite quantum dots for resistive switching. Adv. Mater. 30 , e1800327 ( 28 2018 ).
- 258 . Tan, H., Ni, Z., Peng, W., Du, S., Liu, X., Zhao, S., Li, W., Ye, Z., Xu, M., Xu, Y., Pi, X. & Yang, D. Broadband optoelectronic synaptic devices based on silicon nanocrystals for neuromorphic computing. Nano Energy 52 , 422 ( 2018 ).
- 259 . Jarschel, P., Kim, J. H., Biadala, L., Berthe, M., Lambert, Y., Osgood III, R. M., Patriarche, G., Grandidier, B. & Xu, J. Single-electron tunneling PbS/InP heterostructure nanoplatelets for synaptic operations. ACS Appl. Mater. Interfaces 13 , 38450 ( 32 2021 ).
- 260 . Wang, Y., Lv, Z., Chen, J., Wang, Z., Zhou, Y., Zhou, L., Chen, X. & Han, S.-T. Photonic synapses based on inorganic perovskite quantum dots for neuromorphic computing. Adv. Mater. 30 , e1802883 ( 38 2018 ).
- 261 . Jiang, T., Shao, Z., Fang, H., Wang, W., Zhang, Q., Wu, D., Zhang, X. & Jie, J. High-performance nanofloating gate memory based on lead Halide perovskite nanocrystals. ACS Appl. Mater. Interfaces 11 , 24367 ( 27 2019 ).
- 262 . Hao, J., Kim, Y.-H., Habisreutinger, S. N., Harvey, S. P., Miller, E. M., Foradori, S. M., Arnold, M. S., Song, Z., Yan, Y., Luther, J. M. & Blackburn, J. L. Low-energy room-temperature optical switching in mixed-dimensionality nanoscale perovskite heterojunctions. Sci. Adv. 7 ( 18 2021 ).
- 263 . Subramanian Periyal, S., Jagadeeswararao, M., Ng, S. E., John, R. A. & Mathews, N. Halide perovskite quantum dots photosensitized-amorphous oxide transistors for multimodal synapses. Adv. Mater. Technol. 5 , 2000514 ( 11 2020 ).
- 264 . Xiao, X., Hu, J., Tang, S., Yan, K., Gao, B., Chen, H. & Zou, D. Recent advances in Halide perovskite memristors: Materials, structures, mechanisms, and applications. Adv. Mater. Technol. 5 , 1900914 ( 6 2020 ).
- 265 . Xiao, Z., Yuan, Y., Shao, Y., Wang, Q., Dong, Q., Bi, C., Sharma, P., Gruverman, A. & Huang, J. Giant switchable photovoltaic effect in organometal trihalide perovskite devices. Nat. Mater. 14 , 193 ( 2 2015 ).
- 266 . Chen, L.-W., Wang, W.-C., Ko, S.-H., Chen, C.-Y., Hsu, C.-T., Chiao, F.-C., Chen, T.-W., Wu, K.-C. & Lin, H.-W. Highly uniform all-vacuum-deposited inorganic perovskite artificial synapses for reservoir computing. Adv. Intell. Syst. 3 , 2000196 ( 1 2021 ).
- 267 . Midya, R., Wang, Z., Zhang, J., Savel'ev, S. E., Li, C., Rao, M., Jang, M. H., Joshi, S., Jiang, H., Lin, P., Norris, K., Ge, N., Wu, Q., Barnell, M., Li, Z., Xin, H. L., Williams, R. S., Xia, Q. & Yang, J. J. Anatomy of Ag/Hafnia-based selectors with 10^10 nonlinearity. Adv. Mater. 29 ( 12 2017 ).
- 268 . Wang, Z., Rao, M., Midya, R., Joshi, S., Jiang, H., Lin, P., Song, W., Asapu, S., Zhuo, Y., Li, C., Wu, H., Xia, Q. & Yang, J. J. Threshold switching of Ag or Cu in dielectrics: Materials, mechanism, and applications. Adv. Funct. Mater. 28 , 1704862 ( 6 2018 ).
- 269 . Guo, M. Q., Chen, Y. C., Lin, C. Y., Chang, Y. F., Fowler, B., Li, Q. Q., Lee, J. & Zhao, Y. G. Unidirectional threshold resistive switching in Au/NiO/Nb:SrTiO 3 devices. Appl. Phys. Lett. 110 , 233504 ( 23 2017 ).
- 270 . Du, C., Cai, F., Zidan, M. A., Ma, W., Lee, S. H. & Lu, W. D. Reservoir computing using dynamic memristors for temporal information processing. Nat. Commun. 8 , 2204 ( 1 2017 ).
- 271 . Gibbons, T. E. Unifying quality metrics for reservoir networks in The 2010 International Joint Conference on Neural Networks (IJCNN) (IEEE, Barcelona, Spain, 2010 ), 1 .
- 272 . Suri, M., Bichler, O., Querlioz, D., Cueto, O., Perniola, L., Sousa, V., Vuillaume, D., Gamrat, C. & DeSalvo, B. Phase change memory as synapse for ultra-dense neuromorphic systems: Application to complex visual pattern extraction in 2011 IEEE International Electron Devices Meeting (IEDM) (IEEE, Washington, DC, USA, 2011 ), 4 . 4 . 1 .
- 273 . Hu, M., Strachan, J. P., Li, Z., Grafals, E. M., Davila, N., Graves, C., Lam, S., Ge, N., Yang, J. J. & Williams, R. S. Dot-product engine for neuromorphic computing: programming 1 T 1 M crossbar to accelerate matrix-vector multiplication in Proceedings of the 53 rd Annual Design Automation Conference (DAC ' 16 ) (ACM, Austin, Texas, 2016 ), 1 .
- 274 . Boybat, I., Le Gallo, M., Nandakumar, S. R., Moraitis, T., Parnell, T., Tuma, T., Rajendran, B., Leblebici, Y., Sebastian, A. & Eleftheriou, E. Neuromorphic computing with multimemristive synapses. Nat. Commun. 9 , 2514 ( 1 2018 ).
- 275 . Sun, X., Wang, N., Chen, C.-Y., Ni, J., Agrawal, A., Cui, X., Venkataramani, S., El Maghraoui, K., Srinivasan, V. & Gopalakrishnan, K. Ultra-Low Precision 4 -bit Training of Deep Neural Networks in Advances in Neural Information Processing Systems (eds Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. F. & Lin, H.) 33 (Curran Associates, Inc., 2020 ), 1796 .
- 276 . Payvand, M., Demirag, Y., Dalgaty, T., Vianello, E. & Indiveri, G. Analog weight updates with compliance current modulation of binary ReRAMs for on-chip learning in 2020 IEEE International Symposium on Circuits and Systems (ISCAS) (IEEE, Seville, Spain, 2020 ), 1 .
- 277 . Tanaka, G., Yamane, T., Héroux, J. B., Nakane, R., Kanazawa, N., Takeda, S., Numata, H., Nakano, D. & Hirose, A. Recent advances in physical reservoir computing: A review. Neural Netw. 115 , 100 ( 2019 ).
- 278 . Gerstner, W., Kistler, W. M., Naud, R. & Paninski, L. Neuronal Dynamics: From Single Neurons to Networks and Models of Cognition 590 pp. (Cambridge University Press, Cambridge, England, 2014 ).
- 279 . Watts, D. J. & Strogatz, S. H. Collective dynamics of 'small-world' networks. Nature 393 , 440 ( 1998 ).
- 280 . Kawai, Y., Park, J. & Asada, M. A small-world topology enhances the echo state property and signal propagation in reservoir computing. Neural Networks 112 , 15 ( 2019 ).
- 281 . Loeffler, A., Zhu, R., Hochstetter, J., Li, M., Fu, K., Diaz-Alvarez, A., Nakayama, T., Shine, J. M. & Kuncic, Z. Topological Properties of Neuromorphic Nanowire Networks. Frontiers in Neuroscience 14 , 184 ( 2020 ).
- 282 . Park, H.-J. & Friston, K. Structural and Functional Brain Networks: From Connections to Cognition. Science 342 ( 2013 ).
- 283 . Gallos, L. K., Makse, H. A. & Sigman, M. A small world of weak ties provides optimal global integration of self-similar modules in functional brain networks. Proceedings of the National Academy of Sciences 109 , 2825 ( 2012 ).
- 284 . Sporns, O. & Zwi, J. D. The Small World of the Cerebral Cortex. Neuroinformatics 2 , 145 ( 2004 ).
- 285 . Bullmore, E. & Sporns, O. Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience 10 , 186 ( 2009 ).
- 286 . Bullmore, E. & Sporns, O. Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience 10 , 186 ( 2009 ).
- 287 . Hasler, J. Large-scale field-programmable analog arrays. Proceedings of the IEEE 108 , 1283 ( 2019 ).
- 288 . Jo, S. H., Chang, T., Ebong, I., Bhadviya, B. B., Mazumder, P. & Lu, W. Nanoscale memristor device as synapse in neuromorphic systems. Nano letters 10 , 1297 ( 2010 ).
- 289 . Ielmini, D. & Waser, R. Resistive Switching: From Fundamentals of Nanoionic Redox Processes to Memristive Device Applications (John Wiley & Sons, 2015 ).
- 290 . Strukov, D., Indiveri, G., Grollier, J. & Fusi, S. Building brain-inspired computing. Nature Communications 10 ( 2019 ).
- 291 . Kingra, S. K., Parmar, V., Chang, C.-C., Hudec, B., Hou, T.-H. & Suri, M. SLIM: Simultaneous Logic-In-Memory computing exploiting bilayer analog OxRAM devices. Scientific Reports 10 , 1 ( 2020 ).
- 292 . Woźniak, S., Pantazi, A., Bohnstingl, T. & Eleftheriou, E. Deep learning incorporating biologically inspired neural dynamics and in-memory computing. Nature Machine Intelligence 2 , 325 ( 2020 ).
- 293 . Ambrogio, S., Narayanan, P., Okazaki, A., Fasoli, A., Mackin, C., Hosokawa, K., Nomura, A., Yasuda, T., Chen, A., Friz, A., et al. An analog-AI chip for energy-efficient speech recognition and transcription. Nature 620 , 768 ( 2023 ).
- 294 . Le Gallo, M., Khaddam-Aljameh, R., Stanisavljevic, M., Vasilopoulos, A., Kersting, B., Dazzi, M., Karunaratne, G., Brändli, M., Singh, A., Müller, S. M., et al. A 64 -core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference. Nature Electronics 6 , 680 ( 2023 ).
- 295 . Sebastian, A., Le Gallo, M., Khaddam-Aljameh, R. & Eleftheriou, E. Memory devices and applications for in-memory computing. Nature Nanotechnology 15 , 529 ( 2020 ).
- 296 . Jouppi, N. P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., Bates, S., Bhatia, S., Boden, N., Borchers, A., et al. In-datacenter performance analysis of a Tensor Processing Unit in Proceedings of the 44 th annual international symposium on computer architecture ( 2017 ), 1 .
- 297 . Yu, S., Sun, X., Peng, X. & Huang, S. Compute-in-memory with emerging nonvolatile-memories: challenges and prospects in 2020 IEEE Custom Integrated Circuits Conference (CICC) ( 2020 ), 1 .
- 298 . Joksas, D., Freitas, P., Chai, Z., Ng, W., Buckwell, M., Li, C., Zhang, W., Xia, Q., Kenyon, A. & Mehonic, A. Committee machines-a universal method to deal with non-idealities in memristor-based neural networks. Nature Communications 11 , 1 ( 2020 ).
- 299 . Zidan, M. A., Strachan, J. P. & Lu, W. D. The future of electronics based on memristive systems. Nature Electronics 1 , 22 ( 2018 ).
- 300 . Mannocci, P., Farronato, M., Lepri, N., Cattaneo, L., Glukhov, A., Sun, Z. & Ielmini, D. In-memory computing with emerging memory devices: Status and outlook. APL Machine Learning 1 ( 2023 ).
- 301 . Duan, S., Hu, X., Dong, Z., Wang, L. & Mazumder, P. Memristor-based cellular nonlinear/neural network: design, analysis, and applications. IEEE Transactions on Neural Networks and Learning Systems 26 , 1202 ( 2014 ).
- 302 . Ascoli, A., Messaris, I., Tetzlaff, R. & Chua, L. O. Theoretical foundations of memristor cellular nonlinear networks: Stability analysis with dynamic memristors. IEEE Transactions on Circuits and Systems I: Regular Papers 67 , 1389 ( 2019 ).
- 303 . Wang, R., Shi, T., Zhang, X., Wei, J., Lu, J., Zhu, J., Wu, Z., Liu, Q. & Liu, M. Implementing insitu self-organizing maps with memristor crossbar arrays for data mining and optimization. Nature Communications 13 , 1 ( 2022 ).
- 304 . Likharev, K., Mayr, A., Muckra, I. & Türel, Ö. CrossNets: High-performance neuromorphic architectures for CMOL circuits. Annals of the New York Academy of Sciences 1006 , 146 ( 2003 ).
- 305 . Betta, G., Graffi, S., Kovacs, Z. M. & Masetti, G. CMOS implementation of an analogically programmable cellular neural network. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing 40 , 206 ( 1993 ).
- 306 . Khacef, L., Rodriguez, L. & Miramond, B. Brain-inspired self-organization with cellular neuromorphic computing for multimodal unsupervised learning. Electronics 9 , 1605 ( 2020 ).
- 307 . Lin, P., Pi, S. & Xia, Q. 3 D integration of planar crossbar memristive devices with CMOS substrate. Nanotechnology 25 , 405202 ( 2014 ).
- 308 . Boahen, K., Nomura, M., Ros Vidal, E. & Van Rullen, R. Address-Event Senders and Receivers: Implementing Direction-Selectivity and Orientation-Tuning (eds Cohen, A., Douglas, R., Horiuchi, T., Indiveri, G., Koch, C., Sejnowski, T. & Shamma, S.) ( 1998 ).
- 309 . Park, J., Yu, T., Joshi, S., Maier, C. & Cauwenberghs, G. Hierarchical address event routing for reconfigurable large-scale neuromorphic systems. IEEE transactions on neural networks and learning systems 28 , 2408 ( 2016 ).
- 310 . Cai, F., Kumar, S., Van Vaerenbergh, T., Sheng, X., Liu, R., Li, C., Liu, Z., Foltin, M., Yu, S., Xia, Q., et al. Power-efficient combinatorial optimization using intrinsic noise in memristor Hopfield neural networks. Nature Electronics 3 , 409 ( 2020 ).
- 311 . Bartolozzi, C. & Indiveri, G. Synaptic dynamics in analog VLSI. Neural computation 19 , 2581 ( 2007 ).
- 312 . Esmanhotto, E., Brunet, L., Castellani, N., Bonnet, D., Dalgaty, T., Grenouillet, L., Ly, D., Cagli, C., Vizioz, C., Allouti, N., et al. High-Density 3 D Monolithically Integrated Multiple 1 T 1 R Multi-Level-Cell for Neural Networks in 2020 IEEE International Electron Devices Meeting (IEDM) ( 2020 ), 36 .
- 313 . Chen, J., Wu, C., Indiveri, G. & Payvand, M. Reliability Analysis of Memristor Crossbar Routers: Collisions and On/off Ratio Requirement in 2022 29 th IEEE International Conference on Electronics, Circuits and Systems (ICECS) ( 2022 ), 1 .
- 314 . Werbos, P. J. Backpropagation through time: What it does and how to do it. Proceedings of the IEEE 78 , 1550 ( 1990 ).
- 315 . Dalgaty, T., Castellani, N., Turck, C., Harabi, K.-E., Querlioz, D. & Vianello, E. In situ learning using intrinsic memristor variability via Markov chain Monte Carlo sampling. Nature Electronics 4 , 151 ( 2021 ).
- 316 . Zhao, M., Wu, H., Gao, B., Zhang, Q., Wu, W., Wang, S., Xi, Y., Wu, D., Deng, N., Yu, S., Chen, H.-Y. & Qian, H. Investigation of statistical retention of filamentary analog RRAM for neuromorphic computing in 2017 IEEE International Electron Devices Meeting (IEDM) ( 2017 ), 39 . 4 . 1 .
- 317 . Moro, F., Esmanhotto, E., Hirtzlin, T., Castellani, N., Trabelsi, A., Dalgaty, T., Molas, G., Andrieu, F., Brivio, S., Spiga, S., et al. Hardware calibrated learning to compensate heterogeneity in analog RRAM-based Spiking Neural Networks. IEEE International Symposium in Circuits and Systems ( 2022 ).
- 318 . Moody, G. B. & Mark, R. G. The impact of the MIT-BIH arrhythmia database. IEEE Engineering in Medicine and Biology Magazine 20 , 45 ( 2001 ).
- 319 . Lee, H.-Y., Hsu, C.-M., Huang, S.-C., Shih, Y.-W. & Luo, C.-H. Designing low power of sigma delta modulator for biomedical application. Biomedical Engineering: Applications, Basis and Communications 17 , 181 ( 2005 ).
- 320 . Corradi, F. & Indiveri, G. A neuromorphic event-based neural recording system for smart brain-machine-interfaces. IEEE Transactions on Biomedical Circuits and Systems 9 , 699 ( 2015 ).
- 321 . Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J. & Zaremba, W. OpenAI Gym ( 2016 ).
- 322 . Luo, W., Sun, P., Zhong, F., Liu, W., Zhang, T. & Wang, Y. End-to-End Active Object Tracking and Its Real-World Deployment via Reinforcement Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 42 , 1317 ( 2020 ).
- 323 . Lee, J., Hwangbo, J., Wellhausen, L., Koltun, V. & Hutter, M. Learning quadrupedal locomotion over challenging terrain. Science Robotics 5 ( 2020 ).
- 324 . Vinyals, O., Babuschkin, I., Czarnecki, W. M., Mathieu, M., Dudzik, A., Chung, J., Choi, D. H., Powell, R., Ewalds, T., Georgiev, P., Oh, J., Horgan, D., Kroiss, M., Danihelka, I., Huang, A., Sifre, L., Cai, T., Agapiou, J. P., Jaderberg, M., Vezhnevets, A. S., Leblond, R., Pohlen, T., Dalibard, V., Budden, D., Sulsky, Y., Molloy, J., Paine, T. L., Gulcehre, C., Wang, Z., Pfaff, T., Wu, Y., Ring, R., Yogatama, D., Wünsch, D., McKinney, K., Smith, O., Schaul, T., Lillicrap, T., Kavukcuoglu, K., Hassabis, D., Apps, C. & Silver, D. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575 , 350 ( 2019 ).
- 325 . OpenAI, Andrychowicz, M., Baker, B., Chociej, M., Jozefowicz, R., McGrew, B., Pachocki, J., Petron, A., Plappert, M., Powell, G., Ray, A., Schneider, J., Sidor, S., Tobin, J., Welinder, P., Weng, L. & Zaremba, W. Learning Dexterous In-Hand Manipulation. arXiv: 1808 . 00177 [cs, stat] ( 2019 ).
- 326 . Jordan, J., Schmidt, M., Senn, W. & Petrovici, M. A. Evolving interpretable plasticity for spiking networks. eLife 10 , e66273 ( 2021 ).
- 327 . Rabaey, J. M., Chandrakasan, A. P. & Nikolić, B. Digital integrated circuits: a design perspective (Pearson Education, Upper Saddle River, NJ, 2003 ).
- 328 . Yik, J., Ahmed, S. H., Ahmed, Z., Anderson, B., Andreou, A. G., Bartolozzi, C., Basu, A., Blanken, D. d., Bogdan, P., Bohte, S., et al. NeuroBench: Advancing neuromorphic computing through collaborative, fair and representative benchmarking. arXiv preprint arXiv: 2304 . 04640 ( 2023 ).
- 329 . Merolla, P., Arthur, J., Alvarez, R., Bussat, J.-M. & Boahen, K. A Multicast Tree Router for Multichip Neuromorphic Systems. Circuits and Systems I: Regular Papers, IEEE Transactions on 61 , 820 ( 2014 ).
- 330 . Painkras, E., Plana, L., Garside, J., Temple, S., Galluppi, F., Patterson, C., Lester, D., Brown, A. & Furber, S. SpiNNaker: A 1 -W 18 -Core System-on-Chip for Massively-Parallel Neural Network Simulation. IEEE Journal of Solid-State Circuits 48 , 1943 ( 2013 ).
- 331 . Benjamin, B. V., Gao, P., McQuinn, E., Choudhary, S., Chandrasekaran, A. R., Bussat, J., Alvarez-Icaza, R., Arthur, J., Merolla, P. & Boahen, K. Neurogrid: A Mixed-Analog-Digital Multichip System for Large-Scale Neural Simulations. Proceedings of the IEEE 102 , 699 ( 2014 ).
- 332 . Basu, A., Deng, L., Frenkel, C. & Zhang, X. Spiking neural network integrated circuits: A review of trends and future directions in 2022 IEEE Custom Integrated Circuits Conference (CICC) ( 2022 ), 1 .
- 333 . Pan, X., Ye, T., Xia, Z., Song, S. & Huang, G. Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition ( 2023 ), 2082 .
- 334 . Yu, T., Li, X., Cai, Y., Sun, M. & Li, P. S 2 -MLP: Spatial-shift MLP architecture for vision in Proceedings of the IEEE/CVF winter conference on applications of computer vision ( 2022 ), 297 .
- 335 . Strother, J. A., Nern, A. & Reiser, M. B. Direct observation of ON and OFF pathways in the Drosophila visual system. Current Biology 24 , 976 ( 2014 ).
- 336 . Davies, M., Wild, A., Orchard, G., Sandamirskaya, Y., Guerra, G. A. F., Joshi, P., Plank, P. & Risbud, S. R. Advancing neuromorphic computing with Loihi: A survey of results and outlook. Proceedings of the IEEE 109 , 911 ( 2021 ).
- 337 . Dalgaty, T., Mesquida, T., Joubert, D., Sironi, A., Vivet, P. & Posch, C. HUGNet: Hemi-Spherical Update Graph Neural Network applied to low-latency event-based optical flow in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition ( 2023 ), 3952 .
- 338 . Aimone, J. B., Date, P., Fonseca-Guerra, G. A., Hamilton, K. E., Henke, K., Kay, B., Kenyon, G. T., Kulkarni, S. R., Mniszewski, S. M., Parsa, M., et al. A review of non-cognitive applications for neuromorphic computing. Neuromorphic Computing and Engineering 2 , 032003 ( 2022 ).
- 339 . Dalgaty, T., Payvand, M., De Salvo, B., et al. Hybrid CMOS-RRAM neurons with intrinsic plasticity in IEEE ISCAS ( 2019 ), 1 .
- 340 . Joshi, V., Gallo, M. L., Haefeli, S., Boybat, I., Nandakumar, S. R., Piveteau, C., Dazzi, M., Rajendran, B., Sebastian, A. & Eleftheriou, E. Accurate deep neural network inference using computational phase-change memory. Nature Communications 11 ( 2020 ).
- 341 . Corradi, F., Bontrager, D. & Indiveri, G. Toward neuromorphic intelligent brain-machine interfaces: An event-based neural recording and processing system in Biomedical Circuits and Systems Conference (BioCAS) ( 2014 ), 584 .
- 342 . Freeman, C. D., Frey, E., Raichuk, A., Girgin, S., Mordatch, I. & Bachem, O. Brax - A Differentiable Physics Engine for Large Scale Rigid Body Simulation version 0 . 0 . 13 . 2021 .
- 343 . Zucchet, N., Meier, R., Schug, S., Mujika, A. & Sacramento, J. Online learning of long-range dependencies. arXiv [cs.LG] ( 2023 ).
- 344 . Scellier, B. & Bengio, Y. Equilibrium Propagation: Bridging the gap between energy-based models and Backpropagation. Front. Comput. Neurosci. 11 , 24 ( 2017 ).
- 345 . Polimeni, J. M., Mayumi, K., Giampietro, M. & Alcott, B. The Jevons paradox and the myth of resource efficiency improvements 200 pp. (Routledge, London, England, 2012 ).
- 346 . Newman, M. & Watts, D. Renormalization group analysis of the small-world network model. Physics Letters A 263 , 341 ( 1999 ).
- 347 . Zamarreño-Ramos, C., Camuñas-Mesa, L., Pérez-Carrasco, J., Masquelier, T., SerranoGotarredona, T. & Linares-Barranco, B. On spike-timing-dependent-plasticity, memristive devices, and building a self-learning visual cortex. Frontiers in Neuroscience 5 , 1 ( 2011 ).
- 348 . Zhu, X., Lee, J. & Lu, W. D. Iodine vacancy redistribution in organic-inorganic Halide perovskite films and resistive switching effects. Adv. Mater. 29 , 1700527 ( 29 2017 ).
- 349 . Nedelcu, G., Protesescu, L., Yakunin, S., Bodnarchuk, M. I., Grotevent, M. J. & Kovalenko, M. V. Fast anion-exchange in highly luminescent nanocrystals of cesium lead Halide perovskites (CsPbX 3 , X = Cl, Br, I). Nano Lett. 15 , 5635 ( 8 2015 ).
## PERSONAL CONTRIBUTIONS

This thesis consists of six selected publications, conducted in collaboration with electrical engineers, computer scientists, material scientists and neuroscientists. In this section, I briefly outline my personal contributions to each project.
## Analog weight updates with compliance current modulation of binary ReRAMs for on-chip learning (Chapter 2)
- collaborated with material scientists for data collection and modeling (e.g., Fig. 1)
- coded training and evaluation of RRAM-based neural network simulations (e.g., Alg. 1)
- assisted circuit design with simulation findings (e.g., Fig. 2)
- contributed to the majority of the manuscript, including figures
## Online training of spiking recurrent neural networks with Phase-Change Memory synapses (Chapter 3)
- performed literature review to identify the problem
- coded e-prop, PCM-based analog simulation framework, and neural network training
- conducted all data analysis and visualization
- contributed to the majority of the manuscript, including figures
## Biologically-inspired training of spiking recurrent neural networks with neuromorphic hardware (Chapter 3)
- collaboratively planned the project with IBM researchers and INI (e.g., work assignments, experiments, deadlines)
- assisted with experiment datasets and architecture design, following prior work [126]
- weekly supervision of Anja Šurina (e.g., debugging, hyperparameter optimization)
- assisted with paper writing and designed several figures
## PCM-trace: scalable synaptic eligibility traces with resistivity drift of Phase-Change Materials (Chapter 3)
- performed literature review to identify the problem
- collaborated with material scientists for data collection and modeling
- coded PCM-trace and multi-PCM-trace experiments and analysis
- assisted circuit design with simulation findings
- contributed to the majority of the manuscript
## Reconfigurable halide perovskite nanocrystal memristors for neuromorphic computing (Chapter 4)
- performed literature review to identify the problem
- collaborated with material scientists for data collection, required device specifications and modeling (e.g., non-volatility time constant, Fig. 4b-c)
- coded training and evaluation of simulations with volatile and non-volatile memristor models (e.g., RC framework)
- designed ICC-modulated training following prior work [276] (e.g., Supplementary Fig. 28)
- contributed to the majority of the manuscript
## Mosaic: in-memory computing and routing for small-world spike-based neuromorphic systems (Chapter 5)
- performed extensive literature review to identify the strengths of the idea
- collaborated with material scientists for data collection and modeling (e.g., Fig. 2d)
- coded layout-aware training and evaluation on the SHD benchmark with backpropagation and on RL benchmarks with evolution strategies (ES) (e.g., Fig. 4)
- contributed to the majority of the manuscript
## Articles in peer-reviewed journals:
- 1. John, R. A., Demirag, Y., Shynkarenko, Y., Berezovska, Y., Ohannessian, N., Payvand, M., Zeng, P., Bodnarchuk, M. I., Krumeich, F., Kara, G., Shorubalko, I., Nair, M. V., Cooke, G. A., Lippert, T., Indiveri, G. & Kovalenko, M. V. Reconfigurable halide perovskite nanocrystal memristors for neuromorphic computing. Nat. Commun. 13, 2074 (2022).
- 2. Dalgaty, T., Moro, F., Demirag, Y., De Pra, A., Indiveri, G., Vianello, E. & Payvand, M. Mosaic: in-memory computing and routing for small-world spike-based neuromorphic systems. Nat. Commun. 15, 1 (2024).
- 3. D'Agostino, S., Moro, F., Torchet, T., Demirag, Y., Grenouillet, L., Castellani, N., Indiveri, G., Vianello, E. & Payvand, M. DenRAM: neuromorphic dendritic architecture with RRAM for efficient temporal processing with delays. Nat. Commun. 15, 1 (2024).
## Preprints:
- 4. Demirag, Y., Frenkel, C., Payvand, M. & Indiveri, G. Online training of spiking recurrent neural networks with Phase-Change Memory synapses (2021).
## Conference contributions:
- 5. Demirag, Y., Moro, F., Dalgaty, T., Navarro, G., Frenkel, C., Indiveri, G., Vianello, E. & Payvand, M. PCM-trace: Scalable synaptic eligibility traces with resistivity drift of phase-change materials in 2021 IEEE International Symposium on Circuits and Systems (ISCAS) (IEEE, Daegu, Korea, 2021), 1.
- 6. Demirag, Y. & Indiveri, G. Network of biologically plausible neuron models can solve motor tasks through heterogeneity in Computational and Systems Neuroscience (COSYNE) (Lisbon, Portugal, 2024).
- 7. Demirag, Y., Dittmann, R., Indiveri, G. & Neftci, E. Overcoming phase-change material nonidealities by meta-learning for adaptation on the edge in Proceedings of Neuromorphic Materials, Devices, Circuits and Systems (NeuMatDeCaS) (2023).
- 8. Payvand, M., Demirag, Y., Dalgaty, T., Vianello, E. & Indiveri, G. Analog weight updates with compliance current modulation of binary ReRAMs for on-chip learning in 2020 IEEE International Symposium on Circuits and Systems (ISCAS) (IEEE, Seville, Spain, 2020), 1.
- 9. Bohnstingl, T., Surina, A., Fabre, M., Demirag, Y., Frenkel, C., Payvand, M., Indiveri, G. & Pantazi, A. Biologically-inspired training of spiking recurrent neural networks with neuromorphic hardware in 2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS) (IEEE, Incheon, Republic of Korea, 2022), 218.
- 10. Payvand, M., D'Agostino, S., Moro, F., Demirag, Y., Indiveri, G. & Vianello, E. Dendritic computation through exploiting resistive memory as both delays and weights in Proceedings of the 2023 International Conference on Neuromorphic Systems (ICONS) (New York, USA, 2023), 1.
- 11. Raghunathan, K. C., Demirag, Y., Neftci, E. & Payvand, M. Hardware-aware Few-shot Learning on a Memristor-based Small-world Architecture in Neuro Inspired Computational Elements Conference (NICE) (IEEE, 2024).
- 12. Raghunathan, K. C., Demirag, Y., Moro, F., Neftci, E. & Payvand, M. Few-shot learning on brain-inspired small-world graphical hardware in International Conference on Neuromorphic, Natural and Physical Computing (NNPC 2023) (Hannover, Germany, 2023).
- 13. Yu, Z., Bégon-Lours, L., Demirag, Y. & Offrein, B. J. BEOL compatible cross-bar array of ferroelectric synapses in Proceedings of the 2021 International Conference on Neuromorphic Systems (ICONS) (2021).
- 14. Yik, J., Ahmed, S. H., Ahmed, Z., Anderson, B., Andreou, A. G., Bartolozzi, C., Basu, A., Blanken, D. d., Bogdan, P., Bohte, S., Bouhadjar, Y., Buckley, S., Cauwenberghs, G., Corradi, F., de Croon, G., Danielescu, A., Daram, A., Davies, M., Demirag, Y., Eshraghian, J., Forest, J., Furber, S., Furlong, M., Gilra, A., Indiveri, G., Joshi, S., Karia, V., Khacef, L., Knight, J. C., Kriener, L., Kubendran, R., Kudithipudi, D., Lenz, G., Manohar, R., Mayr, C., Michmizos, K., Muir, D., Neftci, E., Nowotny, T., Ottati, F., Ozcelikkale, A., Pacik-Nelson, N., Panda, P., Pao-Sheng, S., Payvand, M., Pehle, C., Petrovici, M. A., Posch, C., Renner, A., Sandamirskaya, Y., Schaefer, C. J. S., van Schaik, A., Schemmel, J., Schuman, C., Seo, J.-S., Sheik, S., Shrestha, S. B., Sifalakis, M., Sironi, A., Stewart, K., Stewart, T. C., Stratmann, P., Tang, G., Timcheck, J., Verhelst, M., Vineyard, C. M., Vogginger, B., Yousefzadeh, A., Zhou, B., Zohora, F. T., Frenkel, C. & Reddi, V. J. NeuroBench: Advancing neuromorphic computing through collaborative, fair and representative benchmarking (2023).