## Diagram: ARM Architecture Extensions and Streaming Compatibility
### Overview
This technical block diagram shows how three ARM architecture extensions relate and what each provides: Advanced SIMD Extension (ASE), Scalable Vector Extension 2 (SVE2), and Scalable Matrix Extension (SME). It maps the operating modes selected by the processor state bits `PSTATE.SM` and `PSTATE.ZA`, and marks which blocks are "Streaming-Compatible."
### Components/Axes
The diagram is composed of nested and overlapping colored blocks, each representing a specific extension or mode. There are no traditional chart axes. The key components are:
1. **ASE (Advanced SIMD Extension) Block:**
    * **Color:** Light purple.
    * **Position:** Far left, separate from the main nested structure.
    * **Label:** `ASE (Advanced SIMD Extension)`
    * **Condition:** `(PSTATE.SM=0)`
    * **Content (Bullet Points):**
        * `128 bits`
        * `Legacy`
2. **SVE2 Block:**
    * **Color:** Light orange.
    * **Position:** To the right of ASE, partially overlapped by the Streaming SVE2 block.
    * **Label:** `SVE2`
    * **Condition:** `(PSTATE.SM=0)`
    * **Content (Bullet Points):**
        * `Non-streaming VL`
        * `Full SVE2 ISA`
3. **Streaming SVE2 Block:**
    * **Color:** Light green.
    * **Position:** Nested inside the SME block, overlapping the SVE2 block.
    * **Label:** `Streaming SVE2`
    * **Condition:** `PSTATE.SM=1 and PSTATE.ZA=0`
    * **Content (Bullet Points):**
        * `Non-Streaming VL or Streaming VL`
        * `Subset of SVE2 ISA, unless FA64=1`
        * `No ASE instructions unless FA64=1`
        * `Low performance vector->gpr, predicate->condcode`
4. **SME Block:**
    * **Color:** Light blue.
    * **Position:** The largest, outermost block on the right, encompassing the Streaming SVE2 block.
    * **Label:** `SME`
    * **Condition:** `PSTATE.SM=1 and PSTATE.ZA=1`
    * **Content (Two Columns of Bullet Points):**
        * **Left Column (under "Streaming VL"):**
            * `multi-vector zip`
            * `multi-vector convert`
            * `multi-vector loads and stores to Z`
        * **Right Column (main list):**
            * `Streaming VL`
            * `Outer-products to ZA`
            * `multi-vector mla to ZA`
            * `add/sub to ZA`
            * `move to/from ZA`
            * `Load/store to/from ZA`
            * `LUTI using ZT0`
            * `multi-vector zip`
            * `multi-vector convert`
            * `multi-vector loads and stores to Z`
        * **Bottom List (shared with Streaming SVE2 block):**
            * `Subset of SVE2 ISA, unless FA64=1`
            * `No ASE instructions unless FA64=1`
            * `Low performance vector->gpr, predicate->condcode`
5. **Streaming-Compatible Bracket:**
    * **Position:** Bottom center, spanning the width of the Streaming SVE2 and SME blocks.
    * **Label:** `Streaming-Compatible`
### Detailed Analysis
The diagram defines three operational states based on the `PSTATE.SM` and `PSTATE.ZA` bits, plus a "Streaming-Compatible" grouping:
1. **Legacy/Non-Streaming State (`PSTATE.SM=0`):**
    * **ASE:** Provides 128-bit legacy SIMD instructions.
    * **SVE2:** Provides the full SVE2 Instruction Set Architecture (ISA) with a Non-streaming Vector Length (VL).
2. **Streaming SVE2 State (`PSTATE.SM=1, PSTATE.ZA=0`):**
    * Enables a subset of the SVE2 ISA.
    * Supports either Non-Streaming or Streaming Vector Length (VL).
    * Excludes ASE instructions, and SVE2 instructions outside the streaming subset, unless `FA64=1`.
    * Noted for low performance on specific operations (vector to general-purpose register, predicate to condition code).
3. **Full SME State (`PSTATE.SM=1, PSTATE.ZA=1`):**
    * Encompasses all features of Streaming SVE2.
    * Adds the ZA storage array and associated instructions:
        * Matrix operations: Outer-products, multi-vector multiply-accumulate (mla), add/subtract.
        * Data movement: Move, load/store to/from ZA.
        * Lookup Table Instruction (LUTI) using ZT0.
    * Additional multi-vector operations (zip, convert, loads/stores to Z).
4. **Streaming-Compatible:** The bracket spans both the **Streaming SVE2** and **SME** blocks and labels them "Streaming-Compatible." The diagram does not define the term; in Arm's SME documentation it generally describes code that behaves correctly whether `PSTATE.SM` is 0 or 1.
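The state breakdown above can be sketched as a small lookup. This is an illustrative model of the diagram only, not of the architecture: the function name `diagram_state` and the returned label strings are hypothetical, and because the diagram conditions ASE and SVE2 on `PSTATE.SM=0` alone, the sketch ignores `PSTATE.ZA` in that case.

```python
def diagram_state(sm: int, za: int) -> str:
    """Return the diagram block selected by PSTATE.SM and PSTATE.ZA.

    Hypothetical helper: the label strings mirror the diagram's block
    names and are not architectural identifiers.
    """
    if sm not in (0, 1) or za not in (0, 1):
        raise ValueError("PSTATE.SM and PSTATE.ZA are single bits (0 or 1)")
    if sm == 0:
        # The diagram conditions ASE and SVE2 on PSTATE.SM=0 only.
        return "Non-streaming (ASE + SVE2)"
    # PSTATE.SM=1: PSTATE.ZA selects between the two streaming blocks.
    return "SME" if za == 1 else "Streaming SVE2"


if __name__ == "__main__":
    for sm in (0, 1):
        for za in (0, 1):
            print(f"SM={sm} ZA={za}: {diagram_state(sm, za)}")
```

Keeping the mapping in one function makes the diagram's reading explicit: `PSTATE.SM` is checked first, and `PSTATE.ZA` only matters once streaming mode is on.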
### Key Observations
* **Hierarchical Containment:** The SME block contains the Streaming SVE2 block, which in turn overlaps the SVE2 block. This visually represents that SME mode includes Streaming SVE2 capabilities, and Streaming SVE2 is a modified subset of the full SVE2 ISA.
* **PSTATE as a Gatekeeper:** The `PSTATE.SM` bit acts as the primary switch between non-streaming (0) and streaming (1) modes. As drawn, setting `PSTATE.ZA` to 1 while in streaming mode selects the full SME block.
* **Conditional Feature Availability:** The note `unless FA64=1` appears repeatedly. `FA64` refers to "Full A64" support: when it is 1, the streaming-mode restrictions are lifted, so ASE instructions and the full SVE2 ISA become available in streaming modes.
* **Performance Caveat:** Both streaming modes (Streaming SVE2 and SME) are explicitly noted as "Low performance" for transfers from vector state to scalar state (`vector->gpr, predicate->condcode`), suggesting these transfers are not an optimized path in streaming modes.
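The availability rules in these observations can be expressed as a short predicate. This is a minimal sketch under stated assumptions: the `Mode` class, the instruction-class labels, and the function names are all hypothetical, and the diagram does not say whether `FA64=1` changes the low-performance note, so the sketch assumes it applies whenever `PSTATE.SM=1`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    sm: int        # PSTATE.SM
    fa64: int = 0  # the FA64 control named in the diagram


# Hypothetical instruction-class labels used only in this sketch.
ASE = "ase"
SVE2_SUBSET = "sve2-streaming-subset"
SVE2_REST = "sve2-outside-subset"

# Operations the diagram marks as "Low performance" in streaming blocks.
SLOW_WHEN_STREAMING = ("vector->gpr", "predicate->condcode")


def is_available(instr_class: str, mode: Mode) -> bool:
    """Apply the diagram's 'unless FA64=1' rules to one instruction class."""
    if instr_class not in (ASE, SVE2_SUBSET, SVE2_REST):
        raise ValueError(f"unknown instruction class: {instr_class}")
    if mode.sm == 0:
        return True  # non-streaming: ASE and the full SVE2 ISA
    if instr_class == SVE2_SUBSET:
        return True  # the streaming subset is always usable when SM=1
    return mode.fa64 == 1  # ASE and the rest of SVE2 need FA64=1


def is_slow(operation: str, mode: Mode) -> bool:
    """Assumption: the low-performance note holds whenever SM=1."""
    return mode.sm == 1 and operation in SLOW_WHEN_STREAMING


if __name__ == "__main__":
    streaming = Mode(sm=1)
    print(is_available(ASE, streaming))
    print(is_available(ASE, Mode(sm=1, fa64=1)))
    print(is_slow("vector->gpr", streaming))
```

A code generator could use a check of this shape to reject or reroute instructions before emitting them for a streaming function, though a real tool would consult the architecture reference rather than this simplified table.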
### Interpretation
This diagram serves as a crucial reference for understanding the execution context of ARM's advanced vector and matrix extensions. It clarifies that software cannot assume all SVE2 or ASE instructions are available or performant in all states.
The key takeaway is the existence of two distinct execution paradigms:
1. **Non-Streaming (`PSTATE.SM=0`):** For traditional, full-featured SVE2 and legacy ASE code.
2. **Streaming (`PSTATE.SM=1`):** A specialized mode for SME and Streaming SVE2 workloads. This mode prioritizes matrix (ZA array) and multi-vector operations, but it excludes ASE instructions unless `FA64=1` and is marked as low performance for vector-to-scalar transfers.
The "Streaming-Compatible" label suggests that code written for the Streaming SVE2 subset should, in principle, be compatible with and executable on a processor in the full SME state, as SME is a superset. The diagram is essential for compiler writers, runtime developers, and low-level programmers to correctly generate and manage code that utilizes these extensions, ensuring instructions are emitted for the correct PSTATE and that performance-critical conversions are handled appropriately.
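The superset relationship described here can be illustrated with plain sets. This is a sketch of the diagram's containment, not an architectural feature list: the feature strings are taken from the diagram's bullet points, while the grouping names and the `blocks_supporting` helper are hypothetical.

```python
# Feature labels copied from the diagram; the groupings are illustrative.
STREAMING_SVE2 = frozenset({"streaming SVE2 subset"})

ZA_FEATURES = frozenset({
    "outer-products to ZA",
    "multi-vector mla to ZA",
    "add/sub to ZA",
    "move to/from ZA",
    "load/store to/from ZA",
    "LUTI using ZT0",
})

MULTI_VECTOR = frozenset({
    "multi-vector zip",
    "multi-vector convert",
    "multi-vector loads and stores to Z",
})

# The SME block contains the Streaming SVE2 block and adds to it.
SME = STREAMING_SVE2 | ZA_FEATURES | MULTI_VECTOR

BLOCKS = {"Streaming SVE2": STREAMING_SVE2, "SME": SME}


def blocks_supporting(required):
    """List the diagram blocks whose feature set covers `required`."""
    needed = frozenset(required)
    return [name for name, features in BLOCKS.items() if needed <= features]


if __name__ == "__main__":
    print(blocks_supporting({"streaming SVE2 subset"}))
    print(blocks_supporting({"streaming SVE2 subset", "LUTI using ZT0"}))
```

A routine that needs only the streaming SVE2 subset is accepted by both blocks, while one that touches ZA or ZT0 is accepted by the SME block alone, which matches the superset reading of the diagram.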