## SoK: Hardware Defenses Against Speculative Execution Attacks
Guangyuan Hu Princeton University gh9@princeton.edu
Zecheng He Princeton University zechengh@princeton.edu
Abstract - Speculative execution attacks leverage the speculative and out-of-order execution features in modern computer processors to access secret data or execute code that should not be executed. Secret information can then be leaked through a covert channel. While software patches can be installed for mitigation on existing hardware, these solutions can incur large performance overhead. Hardware mitigation is being studied extensively by the computer architecture community. It has the benefit of preserving software compatibility and the potential for much smaller performance overhead than software solutions.
This paper presents a systematization of the hardware defenses that have been proposed against speculative execution attacks. We show that speculative execution attacks consist of 6 critical attack steps. We propose defense strategies, each of which prevents a critical attack step from happening, thus preventing the attack from succeeding. We then summarize 20 hardware defenses and overhead-reducing features that have been proposed. We show that each proposed defense can be classified under one of our defense strategies, which also explains why it can thwart the attack. We discuss the scope of the defenses, their performance overhead, and the security-performance trade-offs that can be made.
## I. INTRODUCTION
Speculative execution attacks, also known as transient execution attacks, are a serious security problem. They exploit performance enhancement features in hardware to access secret data and leak this secret out through microarchitectural covert channels. This negates the confidentiality and integrity protections provided by software isolation, and also by hardware isolation features such as secure enclaves [1], [2].
In particular, Spectre [3], Meltdown [4] and Foreshadow [5] bypass the isolation across processes and privilege levels. The Spectre attack bypasses the memory protection provided by software bounds checking, while the Meltdown attack breaches the memory isolation between the kernel and a user application. Foreshadow [5], and its variants Foreshadow-OS and Foreshadow-VMM [6], breach the Intel SGX enclave isolation, user-to-kernel memory isolation, and virtual-machine-to-hypervisor isolation, respectively.
The severity of these attacks has resulted in many specific fixes for specific attack variants implemented by the computer industry. These include using instructions to serialize execution [7], [8], to flush hardware prediction states [9], to avoid using untrusted predictions [10], and to restrict accesses to secret information [11]-[13]. However, most of these solutions require changes to the existing software. Furthermore,
Ruby B. Lee Princeton University rblee@princeton.edu
TABLE I: Hardware defenses and overhead-reducing features against speculative execution attacks published in recent computer architecture and security conferences.
| Defense and Overhead-reducing Feature | Conference | Year |
|-------------------------------------------------|-----------------|--------|
| InvisiSpec [14] | MICRO | 2018 |
| DAWG [15] | MICRO | 2018 |
| CondSpec [16] | HPCA | 2019 |
| Context-sensitive fencing (CSF) [17] | ASPLOS | 2019 |
| SpectreGuard [18] | DAC | 2019 |
| SafeSpec [19] | DAC | 2019 |
| EfficientSpec [20] | ISCA | 2019 |
| SpecShield [21] | PACT | 2019 |
| STT [22] | MICRO | 2019 |
| NDA [23] | MICRO | 2019 |
| CleanupSpec [24] | MICRO | 2019 |
| MI6 [25] | MICRO | 2019 |
| IRONHIDE [26] | HPCA | 2020 |
| ConTExT [27] | NDSS | 2020 |
| Predictor state encryption [28] | ISCA | 2020 |
| MuonTrap [29] | ISCA | 2020 |
| Speculative Data-Oblivious Execution (SDO) [30] | ISCA | 2020 |
| Clearing the Shadows [31] | PACT | 2020 |
| InvarSpec [32] | MICRO | 2020 |
| DOLMA [33] | USENIX Security | 2021 |
they also cause significant performance overhead (at least 2X slower [11], sometimes up to 8X). Last but not least, the software countermeasures are usually attack-specific. New patches are required to protect effectively against emerging attacks, which is neither efficient nor sustainable.
In response to these attacks on hardware microarchitecture performance optimization features, there have been proposals of hardware defenses, as well as features that reduce the performance overhead of defenses [14]-[33], which we show in Table I in chronological order. One key advantage is that hardware solutions can monitor the instruction execution status and protect precisely against speculative vulnerabilities. Another advantage of some hardware solutions is their non-intrusive interaction with existing software, while incurring low performance overhead. These hardware defenses can run unmodified programs but delay or change the execution of secret-leaking instructions so that information leakage through hardware states is eliminated. Some microarchitectural defenses also allow security-performance trade-offs and overhead-reducing features [30]-[32].
However, the working mechanisms and scope of different hardware defenses have not been systematically described and compared. Hence, our goal in this paper is to systematize the hardware defenses, to illustrate their key similarities and differences, and to help future researchers more easily understand and reason about how an existing defense works. While the goal of this paper is not to describe all the speculative execution attacks in detail, as there is much past work surveying and summarizing these [34]-[37], we analyze the critical attack steps of 23 speculative execution attacks. We then show how the hardware defenses mitigate the attacks by preventing these steps, connecting the attacks and the defenses.
Our key contributions are:
- Producing attack taxonomies based on secret access or secret leakage, covering 23 variants of speculative execution attacks.
- Defining new defense strategies based on preventing at least one of the critical attack steps.
- Producing a new taxonomy of 4 hardware defense strategies and lower-level categories of defenses.
- Creating a systematized view and description of 20 representative hardware defenses and overhead-reducing features proposed to date.
- Presenting the performance overhead of the defenses, and illustrating security-performance tradeoffs.
## II. MICROARCHITECTURE AND COVERT CHANNEL BACKGROUND
We first describe hardware performance optimization features that can be exploited for speculative execution attacks.
Out-of-Order (OoO) execution. Out-of-Order (OoO) execution is a microarchitectural performance enhancement that boosts processor throughput by allowing instructions later in the program order to execute before previous instructions have completed. For example, an earlier instruction may be waiting for one of its operands, for a functional unit or memory to free up, or for the determination of whether a branch should be taken or where to branch to. In an in-order processor, later instructions that have no dependencies must wait unnecessarily. In contrast, an Out-of-Order processor allows the instructions with no dependencies to execute immediately, as long as they retire in-order.
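The benefit of OoO execution can be sketched with a toy dataflow model (the `Uop` fields, names and latencies here are illustrative assumptions, not from any real processor):

```python
# A minimal dataflow sketch of out-of-order completion: a micro-op executes
# as soon as its source operands are ready, while results still retire in
# program order. This models only data dependencies, not a real pipeline.
from dataclasses import dataclass

@dataclass
class Uop:
    name: str
    srcs: list    # register names this uop reads
    dst: str      # register name this uop writes
    latency: int  # cycles to execute

def completion_times(program):
    """Cycle at which each uop finishes, given only data dependencies."""
    ready_at = {}   # register -> cycle its value becomes available
    done = []
    for u in program:
        start = max([ready_at.get(r, 0) for r in u.srcs], default=0)
        finish = start + u.latency
        ready_at[u.dst] = finish
        done.append(finish)
    return done

program = [
    Uop("load_slow", [],     "r1", latency=100),  # long cache miss
    Uop("use_load",  ["r1"], "r2", latency=1),    # depends on the load
    Uop("indep_add", [],     "r3", latency=1),    # no dependencies
]
times = completion_times(program)
print(times)  # [100, 101, 1]: the independent add need not wait
```

In an in-order machine the same three instructions would finish at cycles 100, 101 and 102; the OoO model lets the independent add complete at cycle 1.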
Fig. 1 shows a generic Out-of-Order (OoO) processor where instructions are fetched in program order but executed out-of-order. Instructions are forced to retire in-order to maintain precise exceptions, i.e., if an instruction results in an exception, the following instructions must be 'squashed' as if they were never executed. We will use this generic OoO model to explain the defenses in a unified way in the rest of the paper.
Instructions are fetched and decoded into microarchitecture-level operations (denoted μop's) in program order, but after the μop's are dispatched to the execution stage, the hardware scheduler can schedule any ready μop's to different functional units for execution. Thus, the execution of later μop's can complete earlier than that of previous instructions, which also allows the results of these μop's to be used earlier. The result from the execution of a μop is forwarded to, and used by, other dependent μop's.
An important microarchitecture structure, that we will refer to in discussing hardware defenses, is the Re-Order Buffer (ROB) shown in Fig. 1. The ROB records the instruction's or μop's information as well as its execution status, such as whether the instruction has finished its execution (Done = '1' in the figure) and whether the instruction should be squashed (Squash = '1'). The ROB guarantees that if an instruction needs to be squashed, all the subsequent instructions are also squashed. The ROB also acts as a FIFO queue to preserve the program order, so an instruction can only retire when it reaches the head of the ROB.
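These ROB rules can be sketched in a few lines (a toy model matching the Done/Squash flags of Fig. 1; the class and μop names are illustrative):

```python
# A toy Re-Order Buffer: entries retire in FIFO order from the head, and a
# squash marks an entry and every younger entry behind it.
from collections import deque

class ROBEntry:
    def __init__(self, name):
        self.name = name
        self.done = False
        self.squash = False

class ReorderBuffer:
    def __init__(self):
        self.entries = deque()

    def dispatch(self, name):
        entry = ROBEntry(name)
        self.entries.append(entry)
        return entry

    def mark_squash(self, entry):
        seen = False
        for e in self.entries:         # everything younger is squashed too
            seen = seen or (e is entry)
            if seen:
                e.squash = True

    def retire(self):
        """Commit (or discard) finished entries from the head, in order."""
        committed = []
        while self.entries and (self.entries[0].done or self.entries[0].squash):
            e = self.entries.popleft()
            if not e.squash:
                committed.append(e.name)
        return committed

rob = ReorderBuffer()
u0 = rob.dispatch("uop0"); u1 = rob.dispatch("uop1"); uN = rob.dispatch("uopN")
u0.done = True
u1.done = True
rob.mark_squash(u1)          # uop1 mispredicted: uop1 and uopN are squashed
committed = rob.retire()
print(committed)  # ['uop0']: only the correct instruction commits
```

This reproduces the sample ROB contents of Fig. 1: μop[0] commits, while μop[1] (done but squashed) and μop[N] (not done, squashed) are discarded.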
Speculative execution. Speculative execution is a further performance enhancement that allows instructions to be tentatively (i.e., speculatively) executed, even when the control flow has not been determined, or the data from memory has not arrived. For instance, when the processor fetches a branch whose operand is not available, e.g., it has to be read from memory, the address of the next instruction is predicted and fetched so that the processor does not have to stall its pipeline. If the prediction is later found to be correct, speculative execution improves performance by having executed code on the correct path in advance. However, if the prediction is found to be incorrect, the processor needs to flush the pipeline so that the results of the speculatively executed instructions are discarded. This is called a squash, where the processor restores the architectural state, e.g., the register values visible to software, as if the mispredicted instructions had never executed.
Hardware predictors. From the microarchitecture perspective, speculative execution happens because hardware predictors are present that allow tentative forward progress even when an instruction has unresolved dependencies. For conditional branch instructions, branch predictors predict whether the branch will be taken or not. For indirect branches, the Branch Target Buffer (BTB) predicts the target address. For return instructions, the Return Stack Buffer (RSB) or Return Address Stack (RAS) predicts the address to return to after a procedure call.
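A minimal conditional branch predictor, and the mistraining an attacker performs in the Setup step of Spectre v1, can be sketched as follows (the 2-bit saturating counter is a classic design; the initial counter value is an illustrative assumption):

```python
# A 2-bit saturating-counter branch predictor and its mistraining:
# repeated in-bounds executions of a bounds check train the predictor
# toward 'not taken', so the later out-of-bounds run executes transiently.
class TwoBitPredictor:
    def __init__(self):
        self.counter = 2             # 0,1: predict not-taken; 2,3: taken

    def predict_taken(self):
        return self.counter >= 2

    def update(self, taken):
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

p = TwoBitPredictor()
for _ in range(4):       # attacker runs the bounds check in-bounds repeatedly,
    p.update(False)      # so the branch to 'outside' is never taken

print(p.predict_taken())  # False: the branch is predicted not-taken, and the
                          # out-of-bounds access will run speculatively
```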
Microarchitectural state and covert channels. Microarchitectural states are the states of hardware units that are not directly accessible to the programmer or software. Although invisible to software, these states can impact the execution time of certain programs, so they can be inferred by executing such programs and measuring their timing. If one program modifies a certain microarchitectural state while another monitors it, the two programs form a microarchitectural covert channel in which the former is the sender and the latter is the receiver. Examples include the addresses of cache lines in various cache levels, which we describe in detail below, and the busy status of different hardware resources.
Cache state and covert channel. One critical microarchitectural state is the cache state. A cache has many cache lines corresponding to different addresses. Since cache hits are fast, and cache misses are slow, cache timing attacks are possible, leaking information through observing the cache access time.
One example exploiting a cache covert channel is the flush-
Fig. 1: A block diagram of a typical out-of-order processor. The contents of the ROB show a sample situation where instruction μop[0] executed correctly, but μop[1] to μop[N] were executed speculatively and incorrectly and had to be squashed.
<details>
<summary>Image 1 Details</summary>

### Visual Description
## Processor Architecture Diagram: In-order Frontend, Out-of-order Execution, and In-order Retire
### Overview
The image is a block diagram illustrating the architecture of a processor, highlighting the flow of instructions through different stages: In-order Frontend, Out-of-order Execution, and In-order Retire. It shows the components within each stage and the connections between them.
### Components/Axes
* **In-order Frontend:** This section includes the initial stages of instruction processing.
* Branch Prediction Unit
* Instruction Cache
* Instruction Fetch
* Decode
* **Out-of-order Execution:** This section shows the components involved in executing instructions out of order.
* Scheduler
* ALU, DIV, ...
* ALU, Vect, ...
* Load Buffer
* Store-to-load Forwarding
* Store Buffer
* Load port
* **In-order Retire:** This section shows the final stage where instructions are retired in order.
* Reorder Buffer (ROB)
* Columns: Inst Info, Done, Squash
* Rows: μop[0], μop[1], ..., μop[N]
* **Caches and Buffers:**
* L2 Cache
* Line Fill buffer
* L1 Data Cache
* **Arrows:** Indicate the flow of data and control signals between the components.
### Detailed Analysis or ### Content Details
**In-order Frontend:**
* The Branch Prediction Unit feeds into the Instruction Cache.
* The Instruction Cache feeds into the Instruction Fetch.
* The Instruction Fetch feeds into the Decode unit.
* The Decode unit outputs μop's (micro-operations) to the Scheduler in the Out-of-order Execution block.
* The L2 Cache feeds into the Instruction Cache.
**Out-of-order Execution:**
* The Scheduler is connected to multiple Arithmetic Logic Units (ALUs), including those for division (DIV) and vector operations (Vect).
* The Scheduler is also connected to the Load Buffer and Store Buffer.
* Store-to-load Forwarding is indicated between the Load Buffer and Store Buffer.
* The Load port connects the Load Buffer to the L1 Data Cache.
* Result Forwarding connects the Out-of-order Execution block to the In-order Retire block.
**In-order Retire:**
* The Reorder Buffer (ROB) contains information about instructions, including whether they are "Done" and whether they are "Squashed".
* The table shows the following data:
* μop[0]: Done = 1, Squash = 0
* μop[1]: Done = 1, Squash = 1
* μop[N]: Done = 0, Squash = 1
**Caches and Buffers:**
* The L2 Cache feeds into the Line Fill buffer.
* The Line Fill buffer feeds into the L1 Data Cache.
* The L1 Data Cache feeds into the Load port.
### Key Observations
* The diagram illustrates a common processor architecture with in-order instruction fetching and decoding, followed by out-of-order execution and in-order retirement.
* The Reorder Buffer (ROB) plays a crucial role in ensuring that instructions are retired in the correct order, even though they may have been executed out of order.
* The "Done" and "Squash" flags in the ROB indicate the status of each micro-operation.
### Interpretation
The diagram provides a high-level overview of a processor's architecture, emphasizing the key stages of instruction processing. The separation of the frontend (in-order), execution (out-of-order), and retirement (in-order) stages allows for efficient instruction processing by overlapping the execution of multiple instructions. The Reorder Buffer (ROB) is essential for maintaining program order and handling exceptions or mispredicted branches. The "Squash" flag indicates that an instruction needs to be discarded, likely due to a branch misprediction or an exception. The data in the ROB table shows examples of instructions that are done and not squashed (μop[0]), done and squashed (μop[1]), and not done but squashed (μop[N]).
</details>
reload technique [38], where the sender is an insider and the receiver an outsider attacker. During the setup phase of the covert channel, certain cache lines are flushed out of the cache. To send a secret out, the sender accesses a secret-dependent address, which brings back one of the flushed lines. The receiver will later measure the time to reload each cache line and infer whether the line was fetched by the sender by observing whether it is a cache hit. The flush-reload cache covert channel is used in most of the published speculative execution attacks.
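The flush-reload steps can be simulated over a toy cache model (the HIT/MISS latencies and the cache model are illustrative assumptions, not measurements):

```python
# A toy flush-reload covert channel: only the sender's secret-dependent
# line is cached, so its reload is the only fast one.
HIT, MISS = 50, 200   # pretend access latencies in cycles

class ToyCache:
    def __init__(self):
        self.lines = set()          # addresses currently cached
    def flush(self, addr):
        self.lines.discard(addr)
    def access(self, addr):
        t = HIT if addr in self.lines else MISS
        self.lines.add(addr)        # a miss brings the line into the cache
        return t

cache = ToyCache()
shared = [i * 4096 for i in range(256)]   # one cache line per byte value

for addr in shared:                 # Setup: flush every shared line
    cache.flush(addr)

secret = 42
cache.access(shared[secret])        # Send: one secret-dependent access

reload_times = [cache.access(a) for a in shared]   # Receive: time reloads
recovered = reload_times.index(min(reload_times))
print(recovered)  # 42: the only cache hit reveals the secret byte
```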
There are other techniques for covert communication through cache state. In a prime-probe [39] covert channel, the receiver first primes the cache, filling it with its own cache lines. The sender then accesses certain addresses, evicting some of the receiver's cache lines. The receiver can learn which cache lines the sender accessed by loading each of its cache lines and observing which ones miss. In a flush-flush [40] covert channel, the receiver keeps evicting certain addresses by executing the flush instruction. If the sender accesses some of these addresses and brings them into the cache, the time to flush will be longer, so the receiver can infer information by timing the second flush.
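The prime-probe mechanism can also be sketched over one cache set (the 4-way set, LRU policy and line names are illustrative assumptions; a real probe would also refill the set and must be ordered carefully):

```python
# A toy prime-probe covert channel over one 4-way cache set with LRU
# replacement: the sender's access evicts a receiver line, and the probe
# observes the resulting miss.
class ToySet:
    def __init__(self, ways=4):
        self.ways = ways
        self.lines = []                 # least-recently-used line at index 0

    def access(self, addr):
        """Touch a line, evicting the LRU line if the set is full."""
        if addr in self.lines:
            self.lines.remove(addr)
        elif len(self.lines) == self.ways:
            self.lines.pop(0)
        self.lines.append(addr)

    def probe(self, addr):
        """Hit test only (simplified: no state change on probe)."""
        return addr in self.lines

s = ToySet(ways=4)
receiver_lines = ["R0", "R1", "R2", "R3"]

for a in receiver_lines:                # Prime: receiver fills the set
    s.access(a)
s.access("SENDER")                      # Send: evicts the receiver's LRU line

# Probe: a miss tells the receiver the sender touched this cache set.
misses = [a for a in receiver_lines if not s.probe(a)]
print(misses)  # ['R0']
```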
Many other types of covert or side channels, using neither caches nor timing, are also possible.
## III. SPECULATIVE EXECUTION ATTACKS
We first present some critical attack steps that we have identified in existing speculative execution attacks.
## A. Critical Attack Steps
Although the exact workflow of an attack may vary, we observe that they all consist of 6 critical steps. These are shown in the right column of Fig. 2 and described below.
Setup. The Setup step sets up the initial hardware state, e.g., the branch predictor state for Spectre v1, so that the processor will enter speculative execution. It also sets up the initial state for the covert channel, e.g., flushing the shared cache lines for a flush-reload channel.
Authorize. The attack starts with the Authorize step. The Authorize operation performs the authorization required for accessing a memory location or a protected register. For speculative attacks, the speculative execution window starts when the authorization is delayed.
Access. When the authorization is delayed, the Access step in a speculative attack can read a secret from the cache, the memory, a protected register or a microarchitectural buffer that is otherwise not allowed.
Use. The Use step uses the secret to generate a secret-dependent operation. Examples are instructions that compute a memory address for a later load operation.
Send. The Send step alters the microarchitectural state of the covert channel in a secret-dependent way. Even though the Access, Use and Send operations will all be squashed after the authorization fails, the microarchitectural state change may remain and can be discovered later by the receiver.
Receive. The Receive step recovers the secret from the covert channel for the attacker.
## B. A Spectre v1 Attack Example
For concreteness, let us first consider a particular speculative attack, the Spectre v1 attack. In Fig. 2, we show the pseudo-code of the Spectre v1 attack and the RISC assembly instructions executed during speculative execution. Lines 1-3 set up the microarchitectural state. The cache lines containing the shared array pointed to by SHAREDPtr are flushed from the caches as preparation for the flush-reload cache covert channel described in Section II. The size of the private array pointed to by arrayPtr is also flushed so that the load on line 4 will take a long time to finish. Also, the branch predictor is mistrained so that the prediction of the conditional branch in line 5 will be 'not taken'. The conditional branch, bge, in line 5 performs the authorization for the later load byte instruction, lbu, which accesses the secret byte in line 7. Since the conditional branch check is delayed by the previous load instruction in line 4, the branch predictor is invoked. Due to the mistraining, the branch is not taken and the secret is illegally accessed by the lbu instruction. In lines 8 and 9, the secret is then used to calculate the memory address of the ld instruction in line 10. This ld instruction is a covert send
Fig. 2: Spectre v1 attack (bypassing array bounds checking). The assembly code and pseudo-code of the attack bypass control-flow authorization by a conditional branch to access a secret. The attack leaks an 8-bit secret through the most commonly used flush-reload cache covert channel by loading a cache line of the shared array into the cache. The code in red is the transient execution that will be squashed. The comments after the arrows show the high-level language equivalents of the assembly code.
<details>
<summary>Image 2 Details</summary>

### Visual Description
## Code Snippet with Critical Steps
### Overview
The image presents a code snippet, likely pseudo-code, alongside a description of the critical steps involved in its execution. The code appears to be related to a Spectre v1 vulnerability demonstration. It is divided into sections for setup, sender, and receiver, with corresponding critical steps labeled.
### Components/Axes
The image is structured as a table with two columns: "Pseudo-code" and "Critical Steps".
**Pseudo-code Column:**
* Contains code snippets and comments.
* Line numbers are present on the left side of the code.
**Critical Steps Column:**
* Describes the purpose of the corresponding code section.
### Detailed Analysis or ### Content Details
Here's a breakdown of the code and critical steps:
**Setup:**
* **Line 1:** `flushArray(SHAREDPtr, 256);` - Flushes an array.
* **Line 2:** `flush(&array_size);` - Flushes the address of `array_size`.
* **Line 3:** `mistrainPredictor();` - Mistrains the predictor.
* **Critical Step:** Setup
**Spectre v1 Sender:**
* **Line 4:** `ld r5, 0(&array_size)` - Loads the value of `array_size` into register `r5`.
* Registers used: r2 (SHAREDPtr), r3 (offset to a secret), r4 (arrayPtr)
* **Line 5:** `bge r3, r5, outside --> if (x < array_size) {` - Branch if `r3` is greater than or equal to `r5` to `outside`. This acts as a conditional check, simulating `if (x < array_size)`.
* Critical Step: (long-latency) Authorize
* **Line 6:** `add r6, r3, r4` - Adds the values in registers `r3` and `r4` and stores the result in `r6`.
* **Line 7:** `lbu r7, 0(r6) --> y = arrayPtr[x];` - Loads a byte from the memory location pointed to by `r6` into register `r7`. This is equivalent to `y = arrayPtr[x]`.
* Critical Step: Access
* **Line 8:** `slli r7, r7, 12` - Shifts the value in `r7` left by 12 bits.
* **Line 9:** `add r8, r2, r7` - Adds the values in registers `r2` and `r7` and stores the result in `r8`.
* **Line 10:** `ld r9, 0(r8) --> z = SHAREDPtr[y*4096];` - Loads the value from the memory location pointed to by `r8` into register `r9`. This is equivalent to `z = SHAREDPtr[y*4096]`.
* Critical Step: Send
* `outside:` - Label indicating the target of the branch instruction.
**Receiver:**
* **Line 11:** `for i from 0 to 255` - A loop that iterates from 0 to 255.
* **Line 12:** `t[i] = TimeToReload(SHAREDPtr[i*4096]);` - Measures the time it takes to reload `SHAREDPtr[i*4096]` and stores it in `t[i]`.
* **Line 13:** `Find the minimal t[i]` - Finds the minimum value in the `t[i]` array.
* **Critical Step:** Receive
### Key Observations
* The code simulates a Spectre v1 attack by accessing memory locations based on a potentially out-of-bounds index (`x`).
* The `mistrainPredictor()` function suggests an attempt to manipulate the branch predictor.
* The receiver code measures the time it takes to access different memory locations, which can be used to infer the value of the secret.
* The red text highlights the core operations related to the vulnerability.
### Interpretation
The code demonstrates a simplified version of the Spectre v1 attack. The sender code attempts to read a value from memory based on a potentially out-of-bounds index `x`. The branch predictor might incorrectly predict the branch outcome, allowing the code to access memory locations that it should not be able to access. The receiver code then measures the time it takes to access different memory locations, which can be used to infer the value of the secret. The `TimeToReload` function is crucial as it exploits the cache timing differences to reveal the accessed memory location. The overall goal is to leak information from a protected memory region by exploiting speculative execution and cache timing.
</details>
```
Pseudo-code                                                   Critical Steps
Setup microarchitectural states:
 1   flushArray(SHAREDPtr, 256);                              -- Setup
 2   flush(&array_size);                                      -- Setup
 3   mistrainPredictor();                                     -- Setup

Spectre v1 Sender: (r2: SHAREDPtr, r3: offset to a secret, r4: arrayPtr)
 4   ld   r5, 0(&array_size)
 5   bge  r3, r5, outside   --> if (x < array_size) {         -- (long-latency) Authorize
 6   add  r6, r3, r4
 7   lbu  r7, 0(r6)         --> y = arrayPtr[x];              -- Access
 8   slli r7, r7, 12                                          -- Use
 9   add  r8, r2, r7                                          -- Use
10   ld   r9, 0(r8)         --> z = SHAREDPtr[y*4096];        -- Send
     outside:               --> }

Receiver:
11   for i from 0 to 255                                      -- Receive
12       t[i] = TimeToReload(SHAREDPtr[i*4096]);              -- Receive
13   Find the minimal t[i]                                    -- Receive
```
instruction that leaks out the secret through the cache covert channel. In lines 11-13, the receiver measures the latency to access the shared array to find out which memory address in the shared array was accessed by the sender. The memory address that hits in the cache reveals the secret.
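The end-to-end attack above can be modeled in a few lines of Python (the memory layout, latencies and names are illustrative assumptions; the misprediction itself is modeled simply by calling the transient gadget):

```python
# A toy end-to-end model of Fig. 2: the transient gadget's register results
# are squashed, but its cache footprint survives and leaks the secret.
HIT, MISS = 50, 200

memory = list(b"public**")   # arrayPtr[0..7]: in-bounds public data
memory += list(b"S")         # arrayPtr[8]: the secret byte, out of bounds
array_size = 8

cached = set()               # cached line addresses of the shared array

def transient_gadget(x):
    """Lines 6-10 of Fig. 2, run under a mispredicted bounds check."""
    y = memory[x]            # Access: out-of-bounds read (x >= array_size)
    cached.add(y * 4096)     # Use + Send: secret-dependent cache fill
    # The pipeline squashes y and all register results here, but the
    # cache state in `cached` is not rolled back.

transient_gadget(8)          # the bounds check was predicted 'not taken'

# Receive: time a reload of every candidate line of the shared array.
times = [HIT if i * 4096 in cached else MISS for i in range(256)]
leaked = chr(times.index(min(times)))
print(leaked)  # 'S'
```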
## C. Other Attacks
Table II lists the speculative attacks published to date [3]-[6], [41]-[59], with their Common Vulnerabilities and Exposures (CVE) numbers, descriptions and publication dates. All the attack variants in Table II, except for the last speculative interference attack, introduce a new way to bypass authorization to access the secret. The speculative interference attack introduces a new way to change the timing of non-speculative instructions, which adds a new dimension to the covert Send operation.
Hardware features for malicious speculative execution. In Fig. 3, we show the hardware features that can be exploited to launch malicious speculative execution attacks, especially to access a secret.
The first major category of exploited features, those causing misprediction, includes conditional branch prediction, branch target address prediction, and memory disambiguation. The Spectre v1 attack [3] mistrains the conditional branch predictor for bounds checking to read an out-of-bounds secret. Spectre v1.1 [41] also uses conditional branch misprediction to bypass bounds checking, but performs an out-of-bounds write during speculative execution. Even if the write to memory never becomes visible, it may change a jump target, e.g., the return address, and execute an Access-Use-Send gadget (i.e.,
TABLE II: The Speculative (Transient) Execution Attack variants. Date is year.month of publication.
| Attack | CVE | Description | Date |
|---------------------------------------|----------------------|-------------------------------------------------------------|---------------|
| Spectre v1 [3] | 2017-5753 | Speculative boundary check bypass for read | 2018.1 |
| Spectre v1.1 [41] | 2018-3693 | Speculative boundary check bypass for write | 2018.7 |
| NetSpectre [42] | 2017-5753 | Remote attack performing a bounds check bypass | 2018.7 |
| Spectre v2 [3] | 2017-5715 | Branch target misprediction | 2018.1 |
| Spectre RSB [43], [44] | 2018-15572 | Return target misprediction | 2018.8 |
| Spectre SSB [45] | 2018-3639 | Speculative store bypass, read stale data in memory | 2018.5 |
| Meltdown-Reg (Spectre v3a) [46] | 2018-3640 | System register value leakage to unprivileged attacker | 2018.5 |
| Lazy FP [47] | 2018-3665 | Leak of FPU state | 2018.6 |
| Meltdown (Spectre v3) [4] | 2017-5754 | Kernel content leakage to unprivileged attacker | 2018.1 |
| Foreshadow (L1 Terminal Fault) [5] | 2018-3615 | SGX enclave memory leakage | 2018.8 |
| Foreshadow-OS [6] | 2018-3620 | OS memory leakage | 2018.8 |
| Foreshadow-VMM [6] | 2018-3646 | VMM memory leakage | 2018.8 |
| Spectre v1.2 [41] | N/A | Speculative write to read-only memory | 2018.7 |
| RIDL/MLPDS [48], [49] | 2018-12127 | MDS leakage from load port | 2019.5 |
| RIDL/ZombieLoad/ MFBDS [48]-[50] | 2018-12130 | MDS leakage from line fill buffer | 2019.5 |
| Fallout/MSBDS [49], [51] | 2018-12126 | MDS leakage from store buffer | 2019.5 |
| TAA [52] | 2019-11135 | TSX Asynchronous Abort | 2019.11 |
| RIDL/MDSUM [48], [49] | 2019-11091 | MDS leakage from uncacheable memory | 2019.5 |
| VRS [53] | 2020-0548 | Vector Register Sampling | 2020.1 |
| CacheOut/L1DES [54], [55] | 2020-0549 | L1D Eviction Sampling | 2020.1 |
| CROSSTALK/SRBDS [56], [57] | 2020-0543 | Special Register Buffer Data Sampling | 2020.6 |
| LVI [58] | 2020-0551 | Load Value Injection causing memory disclosure | 2020.3 |
| Speculative Interference [59] | N/A | Speculative interference on non-speculative instructions | 2020.9 |
a code snippet) as we show in lines 7-10 in Fig. 2 to read and leak a secret. NetSpectre [42] shows that the mistraining of the conditional branch predictor can be performed remotely.
Another control-flow misprediction based attack is the Spectre v2 attack [3], which injects a malicious target into the branch target buffer (BTB) for indirect branches. Similarly, the Spectre RSB attack [43], [44] injects wrong return addresses into the return stack buffer (RSB) for function returns. Both can cause information leakage by directing the control flow to an Access -Use -Send gadget.
Memory disambiguation checks whether the value written by a previous store instruction, which has not yet been written back to the cache-memory system, should be forwarded to a later load instruction that reads from the same address. In the Speculative Store Bypass (Spectre SSB) attack, if the store address has not been computed and the processor predicts that the addresses of the current load and a previous store are different, then stale data, which can be a secret, can be loaded from the memory system to the processor and get leaked out.
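The Spectre SSB mechanism can be sketched as follows (a minimal model; the addresses, values and the "no alias" policy shorthand are illustrative assumptions, not a real memory disambiguator):

```python
# A toy model of memory-disambiguation misprediction (Spectre SSB): a load
# is predicted not to alias an older store whose address is unresolved, so
# it reads stale data from memory instead of the forwarded store value.
memory = {0x100: "stale_secret"}     # old value still in memory

store_buffer = []   # pending stores as (address_or_None, value);
                    # None means the store address is not yet computed

def speculative_load(addr):
    # Forward from the youngest pending store with a known matching address;
    # an unresolved address is optimistically predicted as "no alias".
    for pending_addr, v in reversed(store_buffer):
        if pending_addr == addr:
            return v
    return memory[addr]              # misprediction: stale data is read

store_buffer.append((None, "public_new"))  # store to 0x100, addr unresolved
value = speculative_load(0x100)
print(value)  # 'stale_secret' is forwarded to dependent instructions
```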
The second major category of exploited hardware behavior is an illegal access that reads a secret and forwards it to dependent instructions before it is squashed. We call these 'faulty access and aggressive forwarding' attacks. The first type of attack transiently bypasses permission checks of special registers and delays the exception handling. Meltdown-Reg [46] can read the system parameter stored in a system register, while LazyFP [47] leaks the stale floating-point unit (FPU) state of a previous domain that is not cleared until first used in a new context.
The second type of faulty access attack transiently violates memory access permission checking and reads illegal data
Fig. 3: Taxonomy of secret access (bypassed authorization and secret access steps). The third and fourth rows show the hardware mechanisms used to trigger the transient execution. They correspond to delayed Authorize operations that are temporarily bypassed. The last row shows the attacks that exploit these hardware features, listed in the same order as in Table II, from left to right.
<details>
<summary>Image 3 Details</summary>

Tree diagram rooted at "Speculative Execution Vulnerability", branching into "Misprediction" and "Faulty Access & Aggressive Forwarding":

* **Misprediction:**
  * Conditional Branch: Spectre v1 [3], Spectre v1.1 [41], NetSpectre [42]
  * Branch Target Address: Spectre v2 [3], Spectre RSB [43], [44]
  * Memory Disambiguation: Spectre SSB [45]
* **Faulty Access & Aggressive Forwarding:**
  * Register Permission Bypass: Meltdown-Reg [46], LazyFP [47]
  * Memory Permission Bypass: Meltdown [4], Foreshadow [5], Foreshadow-OS [6], Foreshadow-VMM [6], Spectre v1.2 [41]
  * Illegal Forwarding from Microarchitectural Buffer:
    * Microarchitectural Data Sampling (MDS): MLPDS (RIDL) [48], [49]; MFBDS (RIDL, ZombieLoad) [48]-[50]; MSBDS (Fallout) [49], [51]; TAA [52]; MDSUM (RIDL) [48], [49]; VRS [53]; L1DES (CacheOut) [54], [55]; SRBDS (CROSSTALK) [56], [57]
    * Value Injection: LVI [58]
</details>
with a memory access instruction. Meltdown [4] reads and leaks kernel data before execution is squashed due to the failed supervisor permission check of the secret access. The Foreshadow (L1 terminal fault) attack variants [5], [6] exploit loads that do not have a valid virtual-to-physical address mapping. The address translation aborts prematurely, returning a partially translated address. If data at this incorrect address is present in the L1 cache, it can be speculatively accessed and leaked. The leaked data can be a secret in an SGX enclave (Foreshadow), in the kernel space (Foreshadow-OS) or in the virtual machine monitor space (Foreshadow-VMM). The Spectre v1.2 attack [41] transiently bypasses the read/write permission check and writes to a read-only address. If the overwritten data is a branch target, the illegal write can redirect execution to an Access-Use-Send gadget that leaks a secret.
More recent attacks (from 2019 and 2020) exploit the hardware vulnerability that stale data stored in microarchitectural buffers can be read by a load that causes a fault or invokes a microcode assist [49]. The data can belong to another security domain and can be at a different address from the one the faulting load accesses. This type of attack is called a microarchitectural data sampling (MDS) attack. In an MDS attack, the victim program first executes and accesses a secret, which can be temporarily stored in a microarchitectural buffer while in flight. The stale secret value can then be forwarded to a faulting or microcode-assisted load issued by the MDS attacker, which sends it out through a covert channel.
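The key property, that the forwarded data need not come from the address the attacker's load names, can be sketched in a toy Python model. The entry layout and the 'forward the most recent entry on a fault' rule are illustrative simplifications of the behavior described above:

```python
# Toy model of MDS-style forwarding from a line fill buffer.
# Entry layout and the forwarding rule are simplifications.

fill_buffer = []  # list of (owner_domain, address, value)

def victim_access(addr, value, domain="victim"):
    """In-flight victim data transiently occupies a fill buffer entry."""
    fill_buffer.append((domain, addr, value))

def faulting_load(attacker_addr):
    """A faulting or microcode-assisted load may receive stale buffer
    data instead of the data at its own address -- the addresses and
    the security domains do not have to match."""
    if fill_buffer:
        _, _, stale_value = fill_buffer[-1]
        return stale_value
    return None

victim_access(0x1000, "secret")
# The attacker's load targets a completely different address but still
# samples the victim's in-flight value.
assert faulting_load(0xdead) == "secret"
```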
Microarchitectural buffers that have been shown to hold stale secret values include the load port, the line fill buffer and the store buffer, shown in Fig. 1. The load port temporarily stores data read by a load operation while it is being written into a register. The line fill buffer holds a memory line that missed in the L1 data cache and is being returned from the L2 cache [49]. The store buffer holds the data and addresses of store operations waiting to be written to the L1 data cache. RIDL [48] leaks secrets from the load port, called Microarchitectural Load Port Data Sampling (MLPDS) [49], and from the line fill buffer, called Microarchitectural Fill Buffer Data Sampling (MFBDS) [49].
ZombieLoad [50] demonstrates more variants of line fill buffer leakage (MFBDS), in which the secret access is triggered by a microcode assist. Fallout [51] leaks secrets from the store buffer, called Microarchitectural Store Buffer Data Sampling (MSBDS) [49].
A vulnerability similar to MDS is TSX Asynchronous Abort (TAA) [52] in Intel processors. If an Intel TSX atomic transaction is aborted, uncompleted loads in the transaction may also read a secret from the microarchitectural buffers exploited by MDS and leak it through a covert channel.
The MDS and TAA techniques give rise to more attacks. Uncacheable memory accesses [48], [49] can bring data into the buffers mentioned above, where it can be accessed using MDS or TAA techniques, causing the Microarchitectural Data Sampling Uncacheable Memory (MDSUM) attack. The vector register sampling (VRS) vulnerability [53] allows part of a previously accessed vector register value to be sent to the store buffer and leaked by an MSBDS-type attacker. The CacheOut [54] or L1D eviction sampling (L1DES) vulnerability [55] shows that modified data recently evicted from the L1 data cache can remain in the line fill buffer, giving an MFBDS-type attacker the chance to read and leak it. In the CrossTalk [56] or special register buffer data sampling (SRBDS) attack [57], a secret value read from certain special registers can be stored in shared buffers and later propagated to the line fill buffer. The secret can then be leaked to an MFBDS-type attacker, who can even be on a different core. We refer to all the above MDS-related attacks as MDS attacks in Fig. 3.
The other type of microarchitectural buffer attack, the load value injection (LVI) attack [58], injects values into the victim domain to trigger speculation. The attacker first places malicious data in the microarchitectural buffers and lets the victim read the malicious value through the MDS vulnerabilities. If the victim uses the malicious value as an address to read a secret, or as a jump target into an Access-Use-Send gadget, the secret can be leaked.
## D. Covert Channels for Send Operation
Microarchitectural covert channels are used to transmit the secret that has been illegally accessed. In Fig. 4, we show three
Fig. 4: Different ways to leak a secret through a Send operation. The speculative interference attack [59] achieves the final covert Send through a non-speculative instruction.
<details>
<summary>Image 4 Details</summary>

Hierarchy diagram rooted at "Covert Sending", with three branches:

* **Speculative:** cache, AVX, port contention, ... (list not exhaustive)
* **Hybrid:** Speculative Interference [59]
* **Non-speculative:** side-channel attack
</details>
types of covert Send operations: through speculative instructions only, through both speculative and non-speculative instructions (hybrid), and through non-speculative instructions only. Most existing speculative execution attacks are in the first category, executing a speculative Send operation to cause a secret-dependent state change in the covert channel that can later be recovered by the receiver. The cache covert channel is the most commonly used. Examples of other covert channels include the execution time of AVX instructions [42], port contention [60] and the cache way predictor [61].
The recently discovered speculative interference attack [59] leaks the secret through non-speculative instructions, by using speculatively executed instructions to change the timing of non-speculative ones. In Fig. 4, we characterize it as a hybrid, two-step covert Send. In the first step, speculative execution causes secret-dependent hardware unit usage, affecting the timing of non-speculative instructions. In the second step, the timing of the non-speculative instructions leaks the secret. Examples include using speculative 1) miss status handling register (MSHR) contention or 2) execution unit contention (first step) to change the timing of a non-speculative load (second step). Essentially, the two examples exploit two different covert channels in the first step, rather than the commonly used flush-reload cache channel.
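The MSHR example can be sketched as a toy timing model in Python. The latency values, the single-MSHR configuration and the function name are illustrative choices, not measurements:

```python
# Toy timing model of a speculative interference channel: a speculative,
# secret-dependent miss occupies an MSHR, which delays a later
# non-speculative load. Latencies are arbitrary illustrative numbers.

def nonspec_load_latency(secret_bit, num_mshrs=1):
    busy_mshrs = 0
    if secret_bit == 1:
        busy_mshrs += 1          # speculative miss allocated an MSHR
    if busy_mshrs >= num_mshrs:
        return 300               # must wait for a free MSHR
    return 100                   # miss issues immediately

# The receiver times only the non-speculative load, yet learns the secret.
assert nonspec_load_latency(1) > nonspec_load_latency(0)
```

The point of the model is that the timed instruction itself is non-speculative, so defenses that only constrain speculative Send instructions do not close this channel.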
If the Send operation is purely non-speculative, as in the last case of Fig. 4, the attack becomes a side-channel attack, especially when the Access and Use operations are also non-speculative. This means the program itself has a side-channel vulnerability that allows the secret access and a secret-dependent microarchitectural state change, which is beyond the scope of speculative execution attacks.
Takeaway from attack analysis. The important observation we make is that the critical attack steps in Section III-A hold for all speculative execution attacks, not just for the Spectre v1 attack. Moreover, any valid combination of delayed authorization, speculative secret access and a covert channel can form a new attack variant. Based on this characterization of speculative attacks, we propose four defense strategies that prevent these speculative execution attacks from succeeding.
## IV. DEFENSE STRATEGIES
We propose a taxonomy of defenses depending on the attack step prevented, shown in Fig. 5. We identify four defense strategies, each based on a security policy:
- No Setup (Section IV-A): Setup is prevented so that either the malicious speculative execution cannot start or the covert channel state cannot be initialized.
- No Access without Authorization (Section IV-B): Access cannot execute before the authorization is completed.
- No Use without Authorization (Section IV-C): Access can execute but Use of a secret is blocked before the authorization is completed.
- No Send without Authorization (Section IV-D): Both Access and Use can execute, but no secret can be sent before the secret access is authorized.
The insight behind No Access without Authorization is that while Authorize and Access may not have any data dependency, they have a security dependency [37], since an access should not be allowed until it is authorized. Hence the No Access without Authorization security policy prevents the security breach. Given that Access, Use and Send form a chain of three data-dependent instructions, the No Use without Authorization and No Send without Authorization strategies can be understood as enforcing protection at a later stage to reduce the performance overhead.
We will describe representative defense proposals for each of these defense strategies.
## A. No Setup
There are two ways to prevent the Setup step. A defense can prevent either the preparation of the covert channel state or the trigger for speculative execution. Both can be achieved with an isolation-based method shown in Fig. 5.
The isolation method requires partitioning otherwise-shared hardware resources, or flushing a hardware resource if it is time-multiplexed. DAWG [15] partitions cache lines using domain IDs and guarantees no interference through the cache replacement state. Context-sensitive fencing [17] implements a new micro-op to flush the branch target buffer (BTB) or return stack buffer (RSB) state when entering a different protection domain. MI6 [25] partitions the shared DRAM and last-level cache (LLC) resources between trusted enclaves and untrusted software, and enables clearing any per-core state, such as branch predictors, L1 caches and TLBs, with a new instruction. IRONHIDE [26] implements a similar partitioning of LLC and memory resources, as well as core-level partitioning by reserving certain cores for a security-critical program, to reduce the cost of clearing per-core states.
Encryption can be applied to hardware state to implement an obfuscation-based isolation defense. Predictor state encryption [28] encrypts the BTB or RAS state with a context-specific secret when storing a new target address and decrypts it on use. This prevents an attacker in another process from injecting malicious jump/return targets, without requiring the clearing of microarchitectural state. Such context-specific encryption can also be considered a form of isolation.
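A minimal sketch of the idea, assuming a simple XOR mask as the encryption (the key values, field widths and function names are mine, not those of [28]):

```python
# Sketch of context-keyed predictor state encryption: the stored BTB
# target is XOR-masked with a per-context key, so a target injected
# under one context decodes to garbage in another. XOR masking and the
# key values are illustrative choices only.

def btb_store(btb, pc, target, ctx_key):
    btb[pc] = target ^ ctx_key

def btb_lookup(btb, pc, ctx_key):
    return btb.get(pc, 0) ^ ctx_key

btb = {}
victim_key, attacker_key = 0x5A5A5A, 0xC3C3C3

# The attacker injects a malicious target from its own context.
btb_store(btb, pc=0x400, target=0xBADBEEF, ctx_key=attacker_key)

# The victim decrypts with its own key and does not see the injected target.
assert btb_lookup(btb, 0x400, victim_key) != 0xBADBEEF

# Within one context the round trip is exact, so prediction still works.
btb_store(btb, 0x400, 0x401000, victim_key)
assert btb_lookup(btb, 0x400, victim_key) == 0x401000
```

Note that a cross-context lookup still yields *some* (garbage) target, so the design trades a guaranteed flush for probabilistic protection against targeted injection.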
However, note that these No Setup defenses usually require that the victim and the attacker come from different security domains, as the isolation-based method uses the domain information to allocate resources and enforce access control and
Performance-enhancing features for hardware defenses: SDO [30], Clearing the Shadows [31], InvarSpec [32]
<details>
<summary>Image 5 Details</summary>

Tree diagram titled "Hardware Defenses of Speculative Execution Attacks", with four strategy branches and their defense categories:

* **No Setup** → Isolation: DAWG [15], CSF-clearing [17], MI6 [25], IRONHIDE [26], Predictor state encryption [28]
* **No Access w/o Authorization:** CSF-LFENCE [17]
* **No Use w/o Authorization:**
  * Basic No Use: NDA [23], SpectreGuard [18], ConTExT [27], SpecShieldERP [21]
  * No Sensitive Use: SpecShieldERP+ [21], STT [22]
  * Delay: CondSpec [16], EfficientSpec [20], DOLMA [33]
* **No Send w/o Authorization:**
  * Prevent: CSF-CFENCE [17]
  * Shadow Structure: InvisiSpec [14], SafeSpec [19], MuonTrap [29]
  * Roll-back: CleanupSpec [24]
  * Isolation: DAWG [15], CSF-clearing [17], MI6 [25], IRONHIDE [26]

"Isolation" appears under both No Setup and No Send w/o Authorization, as the same mechanisms can block either step.
</details>
Fig. 5: Taxonomy of hardware defenses. The second row shows the 4 defense strategies. The third row shows the child defense categories under each strategy. The fourth row shows the proposed hardware defenses belonging to each defense category.
<details>
<summary>Image 6 Details</summary>

Block diagram of an out-of-order pipeline: an in-order frontend (Inst Fetch, Decode) feeds a μOp buffer containing a highlighted `fence` micro-op ahead of a load (`ld`); the out-of-order execution core (Scheduler, Load/Store Buffer with `older_access`, `fence` and `ld` entries, ALUs, Cache/Mem) is followed by an in-order retire stage. The highlighted fence entries in both the μOp buffer and the Load/Store Buffer show that the load cannot issue past the fence until older accesses complete.
</details>
Fig. 6: Inserting fences to stall the speculative execution of loads.
the encryption-based method uses the same key for a certain domain. The same-domain attack, e.g., NetSpectre [42], cannot be mitigated with these techniques.
## B. No Access Without Authorization
To prevent a security breach, we should prevent the secret Access before the authorization is completed. Software solutions can insert memory barriers, such as lfence in the x86 ISA, to defeat speculative attacks, but they require re-compilation or post-processing of the binary [62], and these software fences incur significant performance overhead. A hardware defense can instead prevent the secret access by automatically inserting a fence micro-op. Hardware-inserted fences have the advantage of non-intrusive protection and much lower overhead.
The Context-Sensitive Fencing (CSF) defense proposed in [17] is shown in Fig. 6. It uses customizable decoding from software instructions to hardware micro-operations to insert a hardware fence between a conditional branch instruction and a subsequent load instruction. To defeat the Spectre v1 attack, CSF-LFENCE places a fence between these two instructions. Since no secret data is accessed in the first place, the No Access without Authorization defense provides strong protection that is independent of the type of covert channel used to exfiltrate the data.
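The decode-time injection can be sketched as follows; the micro-op names and the simple one-shot fencing rule are illustrative, not the taint-based optimizations of [17]:

```python
# Sketch of CSF-style fence injection at decode time: when a load follows
# a conditional branch, emit a fence micro-op between them so the load
# cannot issue until the branch resolves. Micro-op names are mine.

def decode_with_fences(instructions):
    uops, after_branch = [], False
    for inst in instructions:
        if inst == "load" and after_branch:
            uops.append("fence")       # hardware-inserted, invisible to software
        uops.append(inst)
        if inst == "branch":
            after_branch = True
        elif inst == "load":
            after_branch = False       # the fence already covers later loads
    return uops

assert decode_with_fences(["branch", "load", "add"]) == \
       ["branch", "fence", "load", "add"]
assert decode_with_fences(["add", "load"]) == ["add", "load"]
```

Because the fence is injected at the micro-op level, the binary is unchanged, which is what makes the protection non-intrusive compared with software-inserted lfence.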
## C. No Use without Authorization
Hardware defenses can allow the secret access but prevent its usage in subsequent execution. This improves performance
Fig. 7: Hardware modification to support No Use without Authorization .
<details>
<summary>Image 7 Details</summary>

Block diagram of an out-of-order pipeline augmented for No Use without Authorization: the in-order frontend (Inst Fetch, Decode) feeds a Scheduler; the execution core contains a Load Buffer, Store Buffer, ALUs, a load port and Cache/Memory; the in-order retire stage holds a re-order buffer (ROB) whose entries (μop[0]..μop[N]) carry Inst Info, Done, Squash and a new, highlighted Auth field. The result forwarding paths from the execution units back to the scheduler are gated (marked with a red X) until the producing entry is authorized.
</details>
but still blocks the Use step in a speculative attack. We call it the No Use without Authorization defense strategy.
This strategy requires modifying the feed-forward logic that forwards the result of a producer instruction to dependent instructions, so that forwarding to later operations is allowed only when the producer instruction is completed and authorized, i.e., when both the Done bit and the new Auth bit are set in its ROB entry in Fig. 7.
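The forwarding gate can be sketched in a few lines of Python; the field names follow Fig. 7, while the class and everything else are illustrative:

```python
# Sketch of the forwarding gate for No Use without Authorization: a
# producer's result is forwarded to dependents only when its ROB entry
# has both the Done and Auth bits set.

class ROBEntry:
    def __init__(self):
        self.done = False    # result computed
        self.auth = False    # all authorizations for this entry resolved
        self.value = None

def can_forward(entry):
    return entry.done and entry.auth

e = ROBEntry()
e.done, e.value = True, "secret"
assert not can_forward(e)     # speculative: result exists but is unauthorized

e.auth = True                 # branch/permission checks have resolved
assert can_forward(e)
```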
There are two subclasses of defenses in this category. The 'Basic No Use' defenses simply prevent data forwarding to any dependent instruction. The 'No Sensitive Use' defenses improve performance by preventing data forwarding only to sensitive instruction types, such as memory loads, which can be used to send signals over the cache covert channel or other known covert channels.
Basic no use. An example of the 'Basic No Use' defense strategy is the NDA (Non-speculative Data Access) defense proposal [23]. This has many variants, based on which authorization checks and Access operations are considered. NDA-Permissive checks the resolution of conditional branch conditions and indirect branch addresses (first 2 columns in Fig. 3). NDA-Permissive-BR (Bypass Restriction) checks these and also checks memory address disambiguation (the third column in Fig. 3). These two NDA-Permissive variants protect accesses from the cache and memory and from special registers like control registers.
There are also two NDA-Strict variants: NDA-Strict and NDA-Strict-BR. These are like their NDA-Permissive counterparts, except that they also prevent access to secrets that are already in the general-purpose registers.
The NDA-Load variant further adds hardware to prevent the data forwarding from an Access operation until the instruction is retired, i.e., the instruction is at the head of the ROB queue and has its Authorization completed. This covers the first five columns in Fig. 3. Since NDA was proposed before the attacks in the last two columns of Fig. 3 appeared, it is not known whether it covers attacks that do illegal forwarding from microarchitectural buffers. NDA-Full, the most secure variant, combines NDA-Strict-BR with NDA-Load.
SpectreGuard [18] is another example of a 'Basic No Use' defense. While it only discusses Spectre v1, its key contribution is providing the Linux OS interface to identify sensitive memory pages and mark these as non-speculative. Only data accessed from sensitive pages will not be forwarded during speculative execution, reducing the performance overhead. ConTExT [27] implements similar software support to mark secret data, which should not be used in speculative execution, as non-transient. In addition, ConTExT allows taint propagation in the processor to also taint the values derived from non-transient values. These tainted values cannot be used in speculative execution that happens in the future.
SpecShield [21] also implements a 'Basic No Use' defense, protecting any secret in memory that can be read by load operations. SpecShieldERP prevents data forwarding until the authorization of control flow, memory disambiguation and memory-related permission checks completes with no violation found.
No sensitive use. Another variant in [21], SpecShieldERP+, implements a 'No Sensitive Use' policy: it performs the same control-flow, memory disambiguation and memory permission authorization as SpecShieldERP, but prevents data forwarding only to sensitive instructions such as loads and branches.
Speculative Taint Tracking (STT) [22] is another example of the 'No Sensitive Use' policy. STT further considers the covert channels due to implicit information flows and marks loads, branches, stores and data-dependent arithmetic instructions as sensitive. To improve performance, STT implements an efficient taint tracking mechanism to untaint authorized operations. STT has two variants, STT-Spectre and STT-Future. STT-Spectre considers only the authorization of control flow, while STT-Future also anticipates potential future speculative attacks by deeming a load safe only when it reaches the head of the ROB or can no longer be squashed.
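The taint-tracking idea can be illustrated with a toy model. Real STT tracks taint per physical register in hardware; this hypothetical Python version merely tags values, propagates the tag through arithmetic, and blocks tainted operands at 'sensitive' instructions until an `untaint` call models the producer's authorization:

```python
# Illustrative model in the spirit of STT (not the paper's implementation).

class Val:
    def __init__(self, v, tainted=False):
        self.v, self.tainted = v, tainted

def spec_load(addr):                 # Access under unresolved speculation
    return Val(MEM[addr], tainted=True)

def add(a, b):                       # arithmetic (Use) propagates taint
    return Val(a.v + b.v, a.tainted or b.tainted)

def sensitive_load(addr_val):        # a load whose address could leak (Send)
    if addr_val.tainted:
        raise RuntimeError("blocked: tainted operand at sensitive instruction")
    return Val(MEM[addr_val.v])

def untaint(val):                    # producer authorized: clear the tag
    val.tainted = False

MEM = {0: 7, 7: 99, 8: 123}
secret = spec_load(0)                # tainted value 7
idx = add(secret, Val(1))            # taint propagates: 7 + 1 = 8
try:
    sensitive_load(idx)              # would transmit via the cache channel
except RuntimeError:
    pass                             # delayed instead of executed
untaint(secret); untaint(idx)        # branch resolved: authorization done
assert sensitive_load(idx).v == 123  # now allowed
```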
## D. No Send without Authorization
The No Send without Authorization defenses prevent sending a signal on a covert channel so that the secret cannot be recovered by the attacker, who is the receiver of the covert channel. This signal is sent by changing the microarchitectural state. The defenses under this strategy are usually specific to one or multiple covert channels. Below, we describe five ways to achieve this goal. Although related defense proposals have considered different sets of covert channels, the cache covert channel is the main target that is addressed by all defenses.
Hence, we consider specifically the memory load instructions, which change the cache state, to explain these covert channels.
Delay state change. The processor can delay the execution of a load when it needs to modify the cache state. An example is the Conditional Speculation (CondSpec) defense [16], where an unauthorized memory load that hits in the cache can read the data and complete its execution. However, a load that has a cache miss is held up to be re-issued later.
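A minimal sketch of this 'delay on miss' policy, under illustrative assumptions (a dictionary stands in for the cache, and the authorization status is passed in as a flag):

```python
# Toy model of CondSpec-style delay-on-miss: a speculative load that hits
# in the cache completes without changing visible state; a miss is
# deferred until authorization resolves, then filled normally.

MEMORY = {0x10: 5, 0x20: 9}

def issue_load(addr, cache, authorized):
    if addr in cache:
        return ("complete", cache[addr])   # hit: no new state created
    if not authorized:
        return ("deferred", None)          # miss: re-issue after Authorize
    cache[addr] = MEMORY[addr]             # authorized miss: fill the cache
    return ("complete", cache[addr])

cache = {0x10: 5}
assert issue_load(0x10, cache, authorized=False) == ("complete", 5)
assert issue_load(0x20, cache, authorized=False) == ("deferred", None)
assert issue_load(0x20, cache, authorized=True) == ("complete", 9)
```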
The Efficient Invisible Speculative execution (EfficientSpec) defense [20] also implements this 'delay on miss' mechanism while adding a value predictor to provide a predicted value upon a cache miss. This is compared with the real value after the authorization is completed.
The DOLMA defense [33] addresses a broader scope of covert channels including not only data caches but also TLBs, instruction caches and hardware predictor state covert channels. It delays both explicit state changes and the changes caused by implicit secret-dependent execution flow and by resource contention. DOLMA considers stores as well as loads as the Send operation.
Prevent state change. The hardware can allow a speculative load to read the data but prevent the cache state change by making the load uncacheable.
Context-sensitive fencing [17], some variants of which implement No Access without Authorization (Section IV-B), also provides a new fence type, CFENCE, to implement No Send without Authorization . A load can execute before an earlier CFENCE completes, but it is converted to a non-cacheable load if it misses in the cache. This allows the data to be read while preventing any cache state change. The variant that places a CFENCE before every load is denoted 'CSF-CFENCE' in Fig. 5.
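The effect of the CFENCE conversion can be sketched as follows; the function and structure names are hypothetical, used only to show that the data is returned while the cache fill is suppressed:

```python
# Sketch of a load executing under an unresolved CFENCE: hits read
# normally, but a miss is serviced uncacheably so no cache line is filled.

def load_under_cfence(addr, cache, memory):
    if addr in cache:
        return cache[addr]   # hit: read the cached copy
    return memory[addr]      # miss: non-cacheable access, cache not filled

cache, memory = {1: 10}, {1: 10, 2: 20}
assert load_under_cfence(2, cache, memory) == 20
assert 2 not in cache        # data was read, state change was prevented
```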
Store speculative state in shadow structures. Visible cache state can be changed only on a successful authorization, by adding a shadow structure to hold the speculatively accessed cache lines.
InvisiSpec [14] prevents the modification of the cache state, including the cache coherence state in the multiprocessor system, by extending the processor with a speculative buffer to store the speculatively accessed data. If the authorization is completed and verified, each speculative load will issue a second access to the same address and cause safe cache state change. If the authorization is completed but rejected, the load is squashed, and no modification is made to the cache state. One InvisiSpec variant, InvisiSpec-Spectre, deems a load unauthorized until all the control-flow predictions are verified. The other variant, InvisiSpec-Futuristic, deems a load unauthorized until it reaches the head of the reorder buffer (ROB) or it cannot be squashed.
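The two-phase flow can be modeled roughly as below. This is a simplification under stated assumptions (dictionaries for the cache and speculative buffer, a boolean for the authorization outcome); the real design also handles coherence, which this sketch omits:

```python
# Rough model of the InvisiSpec flow: a speculative load fills a
# speculative buffer instead of the cache; on successful authorization it
# re-issues and updates the cache; on a squash the entry is dropped.

MEMORY = {0x40: 11}
cache, spec_buffer = {}, {}

def spec_load(addr):
    spec_buffer[addr] = MEMORY[addr]   # data usable, cache untouched
    return spec_buffer[addr]

def authorize(addr, ok):
    val = spec_buffer.pop(addr)
    if ok:
        cache[addr] = val              # second, safe access changes state
    # if not ok: the load is squashed; no visible change is made

assert spec_load(0x40) == 11
assert 0x40 not in cache               # no covert Send has happened yet
authorize(0x40, ok=True)
assert cache[0x40] == 11
```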
The SafeSpec defense [19] implements a similar shadow buffer to prevent the modification of both cache and translation lookaside buffer (TLB) states. The cache coherence state is not protected by SafeSpec.
TABLE III: The improvement in performance overhead from applying SDO, ClearShadow and InvarSpec to existing defenses.

| Feature          | Enhanced Defense         | Category  | Benchmark | Overhead Before | Overhead After                     |
|------------------|--------------------------|-----------|-----------|-----------------|------------------------------------|
| SDO [30]         | STT [22]                 | No Use    | SPEC2017  | About 22%       | 10.05%                             |
| ClearShadow [31] | Delay on miss [20]       | No Send   | SPEC2006  | —               | 9% faster than basic delay-on-miss |
| InvarSpec [32]   | Fence [14]               | No Access | SPEC2006  | 199.3%          | 101.9%                             |
| InvarSpec [32]   | Fence [14]               | No Access | SPEC2017  | 195.3%          | 108.2%                             |
| InvarSpec [32]   | Delay on miss [16], [20] | No Send   | SPEC2006  | 46.1%           | 22.3%                              |
| InvarSpec [32]   | Delay on miss [16], [20] | No Send   | SPEC2017  | 39.5%           | 24.4%                              |
| InvarSpec [32]   | InvisiSpec [14]          | No Send   | SPEC2006  | 18.0%           | 9.6%                               |
| InvarSpec [32]   | InvisiSpec [14]          | No Send   | SPEC2017  | 15.4%           | 10.9%                              |

MuonTrap [29] adds filter caches as shadow buffers for the I-cache, D-cache and TLB. Speculatively accessed entries are stored only in these filter caches and are cleared upon security domain switches. A key difference from previous work is that MuonTrap allows non-sensitive modifications to the cache coherence state: in a MESI protocol, a speculative access can only be fetched in Shared state, and any sensitive action that changes another cache line from the M or E state to the S or I state is delayed until the access is authorized.
Restore state change (Roll-back). The hardware can allow the cache state change during speculative execution but restore the old cache state if the authorization fails.
CleanupSpec [24] prevents a speculative execution attack from modifying the cache state by restoring the cache state when the speculation is found to be wrong. Before the authorization is completed, CleanupSpec allows bringing new cache lines into the cache during speculative execution, but extends each memory request with its side-effect fields to track which cache line is fetched into the cache and which cache line is evicted from the L1 data cache, due to this unauthorized request. If a memory request needs to be squashed, a request is sent to invalidate any new cache line fetched during speculative execution, and bring back any cache line evicted speculatively from the L1 data cache. The L2 and last-level caches in CleanupSpec implement address encryption [63] to prevent eviction-based information leakage.
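The undo mechanism can be sketched as a side-effect log replayed in reverse. All structure and function names below are illustrative, standing in for CleanupSpec's per-request side-effect fields:

```python
# Sketch of CleanupSpec-style rollback: each speculative fill records what
# it fetched and what it evicted; a squash replays the log in reverse.

def spec_fill(cache, addr, victim, memory, log):
    evicted = (victim, cache.pop(victim)) if victim in cache else None
    cache[addr] = memory[addr]
    log.append((addr, evicted))          # side-effect record of the request

def squash(cache, log):
    for addr, evicted in reversed(log):
        cache.pop(addr, None)            # invalidate the speculative fill
        if evicted:
            cache[evicted[0]] = evicted[1]   # bring back the evicted line
    log.clear()

memory = {1: "A", 2: "B"}
cache, log = {1: "A"}, []
spec_fill(cache, 2, victim=1, memory=memory, log=log)
assert cache == {2: "B"}
squash(cache, log)
assert cache == {1: "A"}                 # as if the speculation never ran
```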
Isolation of states between security domains. Assuming that the sender and the receiver are from different security domains, some isolation-based defenses that prevent Setup can also prevent the attacker from receiving the covert signaling. For example, the clearing of the branch predictor state can prevent mistraining in the Setup phase and also prevent the leakage through covert sending [64]. Hence, a defense can prevent two steps as a No Setup defense and a No Send without Authorization defense.
## E. Reducing Overhead of Defenses
Techniques have been proposed to reduce the performance overhead of defenses described earlier. Table III shows the performance improvements they achieve.
Speculative Data-Oblivious Execution (SDO) [30] allows an instruction, which may depend on a secret, to execute. For instance, a speculative load can access certain cache levels without making any state changes and the performance is improved if the data is found. SDO can be integrated with STT [22].
TABLE IV: Performance numbers reported by existing work. The numbers may not be directly comparable as they are measured in different configurations. Numbers separated by commas are for different defense variants or benchmarks.

| Strategy           | Defense               | Platform                                  | Performance Overhead (%)               |
|--------------------|-----------------------|-------------------------------------------|----------------------------------------|
| No Setup & No Send | DAWG [15]             | Zsim [65]                                 | 0 ∼ 15                                 |
| No Setup & No Send | MI6 [25]              | RiscyOO [66]                              | 16.4                                   |
| No Setup & No Send | IRONHIDE [26]         | Tilera Tile-Gx72 processor [67]           | -20 (Compared to an SGX-like baseline) |
| No Access          | CSF-LFENCE [17]       | GEM5 [68]                                 | 48                                     |
| No Use             | NDA [23]              | GEM5 [68]                                 | 10.7 ∼ 125                             |
| No Use             | SpectreGuard [18]     | GEM5 [68]                                 | 8, 20                                  |
| No Use             | ConTExT [27]          | Software approximation on Intel processor | 0.1 ∼ 71.1                             |
| No Use             | SpecShieldERP(+) [21] | GEM5 [68]                                 | 10, 21                                 |
| No Use             | STT [22]              | GEM5 [68]                                 | 8.5, 14.5, 24, 27                      |
| No Send            | CondSpec [16]         | GEM5 [68]                                 | 6.8, 12.8, 53.6                        |
| No Send            | EfficientSpec [20]    | GEM5 [68]                                 | 11 (IPC loss)                          |
| No Send            | DOLMA [33]            | GEM5 [68]                                 | 10.2 ∼ 42.2                            |
| No Send            | CSF-CFENCE [17]       | GEM5 [68]                                 | 7.7, 21                                |
| No Send            | InvisiSpec [14], [69] | GEM5 [68]                                 | 5, 17                                  |
| No Send            | SafeSpec [19]         | MARSSx86 [70]                             | -3                                     |
| No Send            | MuonTrap [29]         | GEM5 [68]                                 | -5, 4                                  |
| No Send            | CleanupSpec [24]      | GEM5 [68]                                 | 5.1                                    |

Clearing the Shadows (ClearShadow) [31] improves the performance by accelerating the computation of branch conditions and memory addresses so that Authorize can finish earlier. ClearShadow moves the instructions that Authorize depends on to the front to shorten or remove the speculation window. ClearShadow has been used to improve a 'delay-on-miss' defense [20].
InvarSpec [32] allows some sensitive instructions to execute earlier without protection. The InvarSpec software identifies the safe set (SS) of an instruction I, which contains instructions that are older than I but do not affect I 's input and execution. The InvarSpec hardware extension reads the SS and allows I to be issued even if some SS instructions are not yet resolved. InvarSpec can be applied to the fence-based defense, the delay-on-miss defense and the InvisiSpec defense, as shown in Table III.
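One reading of the safe-set condition, under simplifying assumptions (instructions named as strings, the safe set precomputed by software), can be sketched as:

```python
# Toy interpretation of InvarSpec's safe set (SS): instruction I may issue
# early when every unresolved older instruction is in SS(I), i.e., is
# proven not to affect I's input or execution.

def may_issue(instr, older_unresolved, safe_set):
    return all(i in safe_set for i in older_unresolved)

ss = {"br1", "st1"}   # older ops proven independent of load "ld2"
assert may_issue("ld2", {"br1"}, ss)             # only SS members pending: issue
assert not may_issue("ld2", {"br1", "ld1"}, ss)  # "ld1" may affect ld2: wait
```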
## F. Software-hardware Co-design
Some hardware defenses require software support. One way is changing the application software as described above for ClearShadow [31] and InvarSpec [32]. Another way is modifying the system software. DAWG [15] needs the system software to assign a proper domain ID to the protected program so that the domain ID is not shared with any potential attackers. Context-sensitive fencing [17] has a set of model-specific registers (MSRs) to specify the fence type and the insertion strategy. SpectreGuard [18] and ConTExT [27] enable marking secret data as non-transient by using a bit in the page table entry, which requires both compiler and OS software modifications.
## V. UNDERSTANDING PERFORMANCE OVERHEAD
## A. Performance Overhead Reported by Defense Papers
TABLE IV shows the performance overhead reported by some hardware defenses, listed according to the hardware defense taxonomy we presented in Fig. 5. The same gem5 cycle-accurate processor simulator [68] is used by most of the hardware defense papers. The overhead of isolation-based defenses that prevent cross-domain Setup and Send is mainly due to the clearing of microarchitectural state and the partitioning of hardware resources. CSF-LFENCE [17] inserts an lfence only for kernel loads but already incurs an overhead of 48%. The No Use without Authorization defenses differ widely in performance overhead, as they may cover different types of authorization (NDA), protect only a certain data region (SpectreGuard), be emulated with software (ConTExT), or prevent only certain sensitive Use 's (SpecShield and STT). The No Send without Authorization defenses generally have lower performance overhead, as they address only certain covert channels, especially the cache covert channel.

TABLE V: Security-performance trade-offs of different variants within the same work.

| HW Defense     | NDA-Permissive | NDA-Permissive+BR | NDA-Strict | NDA-Strict+BR | NDA-Load | NDA-Full |
|----------------|----------------|-------------------|------------|---------------|----------|----------|
| Perf. Overhead | 10.7%          | 22.3%             | 36.1%      | 45%           | 100%     | 125%     |

(a) No Use w/o Auth.: performance overhead with different Authorization and Access types

| HW Defense     | InvisiSpec-Spectre | InvisiSpec-Futuristic |
|----------------|--------------------|-----------------------|
| Perf. Overhead | 5%                 | 17%                   |

(b) No Send w/o Auth.: different Authorization types

| HW Defense     | SpecShieldERP | SpecShieldERP+ |
|----------------|---------------|----------------|
| Perf. Overhead | 21%           | 10%            |

(c) No Use w/o Auth.: restricting potential covert channels
We illustrate how some of the defenses trade off security and performance. For increased security, more attacks and vulnerabilities can be covered, and more covert channels mitigated, but at increased performance overhead.
Increased overhead for covering more attacks. Table V(a) shows the increase in performance overhead for the NDA [23] defense variants to prevent more types of attacks. The 'Permissive' variant considers the control flow authorization only (the first two columns in Fig. 3). The 'Permissive+BR (Bypass Restriction)' variant further considers memory disambiguation authorization (the third column in Fig. 3). 'Load' (NDA-Load) considers the first five columns in Fig. 3 by not deeming an Access operation authorized until it is retired. A fair comparison of performance overhead is from 'Permissive' (10.7%), to 'Permissive+BR' (22.3%), then to 'Load' (100%), since they all protect secrets in memory and special registers.
InvisiSpec [14] provides two variants: InvisiSpec-Spectre defends against control-flow misprediction based attacks, while InvisiSpec-Futuristic tries to defend against future attacks, where any speculative load may pose a threat. The latter is more secure but has higher performance overhead (17% vs. InvisiSpec-Spectre's 5% in Table V(b)) [69].
Access type vs. Performance trade-off. TABLE V(a) also shows that as more types of Access are considered, the performance overhead increases. The variant 'Strict+BR' considers the accesses to general-purpose registers (GPRs) in addition to special registers and memory, which are considered by 'Permissive+BR'. The overhead increases from 22.3% (Permissive+BR) to 45% (Strict+BR) due to the GPR access consideration.
Mitigated covert channels vs. Performance trade-off. Table V(c) compares two SpecShield [21] variants: SpecShieldERP disallows forwarding from a speculative load to all instructions, while SpecShieldERP+ disallows forwarding only to sensitive load and branch instructions, which may be covert Send 's. SpecShieldERP+ relaxes some of the security guarantees to reduce the performance impact of SpecShieldERP from 21% to 10%.

Fig. 8: Performance analysis of different speculative execution defense strategies on a fast access (a) and a slow access (b).
## B. Security-Performance Tradeoffs Considering Our Defense Strategies
We now consider the theoretical performance overhead reductions that might be expected as we relax the security policy from No Access without Authorization , to No Use without Authorization , to No Send without Authorization . These correspond to the three main categories of our defense strategies in Fig. 5. Since speculative attacks are rare, our goal is to compare the impact of these defense strategies on normal (benign) speculative execution. We consider an example benign program containing a branch instruction, a first load instruction, an arithmetic instruction that is data dependent on the first load, and a second load instruction that is data dependent on the arithmetic instruction. While there may be an arbitrary number of instructions between these four instructions, we show them as sequential (in Fig. 8) to simplify the discussion. To help correlate this code with a speculative execution attack, e.g., Spectre v1 in Fig. 2, these four instructions correspond to the Authorize , Access , Use and Send operations.
We illustrate the timelines of the Authorize , Access , Use and Send operations in Fig. 8. We consider two scenarios: a fast secret access (Fig. 8(a)), e.g., the first load has a cache hit, and a slow secret access (Fig. 8(b)), e.g., the first load has a cache miss. In each scenario, we present 1) an insecure OoO processor allowing any speculation ( Unconstrained Speculation ); 2) a No Send without Authorization defense; 3) a No Use without Authorization defense; 4) a No Access without Authorization defense; and 5) a processor disabling speculation ( No Speculation ).
The Access , Use and Send operations form a chain of three data-dependent instructions, so they cannot execute in parallel. In Fig. 8(a), the No Send without Authorization , No Use without Authorization and No Access without Authorization strategies delay the Send , Use and Access operations, respectively, until the Authorization is resolved. Hence they have increasing performance overhead, showing the intrinsic performance cost of these defense strategies. The slowest but most rigorous security policy, No Access without Authorization , is as slow as No Speculation if the first load instruction ( Access ) immediately follows the branch instruction and there are no other non-dependent instructions that can be executed in the speculation window.
In Fig. 8(b), the slow Access (cache miss) may not be fully covered by the delay in the Authorize operation. The No Speculation defense still causes the longest delay, same as the No Access without Authorization defense strategy. The No Use without Authorization and the No Send without Authorization defense strategies introduce shorter delay, and can achieve the fastest performance like the Unconstrained Speculation case.
Takeaway: In general, the No Speculation and Unconstrained Speculation cases give the upper and lower bounds, respectively, for the total execution time. The overheads of the three defense strategies decrease from the strict security policy of No Access without Authorization , to the more relaxed but still secure No Use without Authorization , to the No Send without Authorization strategies, with not much difference in overhead between the last two strategies.
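This takeaway can be reproduced with a back-of-the-envelope timing model of Fig. 8. We assume Authorize resolves at time `T_AUTH` and Access, Use and Send form a dependent chain; each strategy delays a different operation until `T_AUTH`. The latencies below are illustrative, not measured:

```python
# Toy timing model: time at which Send completes under each strategy.

def send_done(t_auth, access_lat, use_lat=1, send_lat=1, strategy="none"):
    # "no_access" and "no_spec" cannot even start Access before Authorize.
    start = {"none": 0, "no_send": 0, "no_use": 0,
             "no_access": t_auth, "no_spec": t_auth}[strategy]
    t = start + access_lat           # Access completes
    if strategy == "no_use":
        t = max(t, t_auth)           # Use must wait for Authorize
    t += use_lat                     # Use completes
    if strategy == "no_send":
        t = max(t, t_auth)           # Send must wait for Authorize
    return t + send_lat              # Send completes

T_AUTH = 10
# Fast access (cache hit): overhead grows from No Send to No Use to No Access.
assert send_done(T_AUTH, 2, strategy="none") == 4
assert send_done(T_AUTH, 2, strategy="no_send") == 11
assert send_done(T_AUTH, 2, strategy="no_use") == 12
assert send_done(T_AUTH, 2, strategy="no_access") == 14
# Slow access (cache miss): No Use and No Send hide the delay entirely.
assert send_done(T_AUTH, 20, strategy="no_use") == 22
assert send_done(T_AUTH, 20, strategy="no_send") == 22
assert send_done(T_AUTH, 20, strategy="none") == 22
assert send_done(T_AUTH, 20, strategy="no_access") == 32
```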
## VI. PROBLEMS FOR SOME DEFENSES
Problem scenarios for isolation-based defenses. The isolation-based defenses can be used to prevent either the Setup step (Section IV-A) or the covert Send step (Section IV-D). These defenses can prevent the attack when the victim and the attacker are in different security domains. However, the mis-training can happen within the same domain [35], [42]. The sender and the receiver of the covert-channel communication can also be in the same domain. For instance, in a Meltdown attack demo [71] where the secret is in the kernel space, the sender instructions that read the secret and send it out are in a user-level process, which also executes the receiver's code to reveal the secret. These special cases can make isolation-based defenses ineffective. For instance, the DAWG [15] cache uses the domain ID to partition the cache resources and therefore cannot prevent a same-domain attack where the sender and the receiver are in the same process and share the same domain ID. Other No Send without Authorization defenses may then have to be applied, e.g., defenses following Delay , Prevent , Shadow Structure or Roll-back in Section IV-D.
Problem with covert channel blocking defenses. The issue with the No Send without Authorization defenses is that they protect against only one or a few covert channels. Because they do not restrict how secrets are illegally accessed, they can protect against new speculative execution or other attacks, but only for the specific covert channels considered in the defense.
## VII. RECOMMENDATIONS AND FUTURE WORK
Recommendation of defense strategies. From a security perspective, preventing the Access and Use of a secret is more critical, since all covert channels requiring secret-dependent usage are then eliminated. Between these two strategies, No Use without Authorization has better performance, as some long-latency loads can still be performed under speculative execution, reducing the performance overhead in benign situations.
Mitigating the aggressive forwarding of faulty accesses. The faulty access attacks can read data that the current program does not have permission to access. Some defenses [17], [23] prevent these attacks by blocking the execution or completion of an Access operation until it is at the head of the reorder buffer (ROB), so that any exception is immediately handled. We argue that for a faulty access that violates the permission check of memory or special registers, delaying the forwarding to any dependent instructions until its permission check finishes is enough, i.e., a No Use without Authorization policy.
For the illegal forwarding from microarchitectural buffers, our suggestion is to disallow the forwarding to any faulting memory accesses, or return a dummy value and disallow its usage. Simply returning a dummy value without preventing its usage is not enough. For instance, returning a dummy value of 0 may cause the leakage of the data at address 0 if the dummy value is speculatively used as an address.
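The hazard of an unblocked dummy value can be made concrete with a toy model; the names below are hypothetical, and the set stands in for the attacker-observable cache footprint:

```python
# Why returning a dummy value without blocking its use is unsafe: if the
# dummy is speculatively used as an address, the line for that address
# becomes observable through the cache covert channel.

cache_footprint = set()              # attacker-observable cache state

def spec_use_as_address(val):
    cache_footprint.add(val)         # covert-channel Send

dummy = 0                            # faulting load returns 0 as a dummy
spec_use_as_address(dummy)           # usage is not blocked...
assert 0 in cache_footprint          # ...so address 0 is now observable
```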
## VIII. CONCLUSIONS
In this paper, we first show how speculative execution attacks can be classified according to what hardware features are exploited to bypass security checks, that we call authorizations . We then show a new attack characterization based on the critical attack steps common to all speculative execution attacks, namely, Setup , Authorize , Access , Use , Send and Receive . We observe that the root cause of the attacks succeeding is the bypassing of the Authorize step during speculative execution. This attack characterization enables us to propose the first taxonomy of defense strategies, where each strategy prevents one of the critical attack steps of Setup , Access , Use and Send . We show that the 20 defense proposals considered in this paper can be categorized under at least one of these four defense strategies, or as an overhead-reducing feature. We describe important features in these defenses and some of their key hardware modifications. Security-performance tradeoffs are also discussed for defenses that propose multiple variants. We discuss the scope of these hardware defense strategies and show their relative performance overhead.
Future work can consider new attacks and defenses, using and adding to our taxonomies of attacks and defenses. New defenses can be proposed to reduce the performance overhead and/or cover more attack types. For fair comparisons, new defenses should compare their performance with those that target the same set of exploited vulnerabilities, secret accesses and covert channels.
Acknowledgements. This work was supported in part by NSF SaTC #1814190, SRC Hardware Security #2844 and a Qualcomm Faculty Award for Prof. Lee. We thank Shuwen Deng and Jakub Szefer for help with initial performance numbers.
## REFERENCES
- [1] 'Intel SGX,' https://software.intel.com/content/www/us/en/develop/documentation/sgx-developer-guide/top.html.
- [2] 'Arm TEE,' https://www.arm.com/why-arm/technologies/trustzone-for-cortex-a/tee-reference-documentation.
- [3] P. Kocher, J. Horn, A. Fogh, D. Genkin, D. Gruss, W. Haas, M. Hamburg, M. Lipp, S. Mangard, T. Prescher et al., 'Spectre attacks: Exploiting speculative execution,' in 2019 IEEE Symposium on Security and Privacy (SP), 2019.
- [4] M. Lipp, M. Schwarz, D. Gruss, T. Prescher, W. Haas, A. Fogh, J. Horn, S. Mangard, P. Kocher, D. Genkin, Y. Yarom, and M. Hamburg, 'Meltdown: Reading kernel memory from user space,' in 27th USENIX Security Symposium, 2018. [Online]. Available: https://www.usenix.org/conference/usenixsecurity18/presentation/lipp
- [5] J. Van Bulck, M. Minkin, O. Weisse, D. Genkin, B. Kasikci, F. Piessens, M. Silberstein, T. F. Wenisch, Y. Yarom, and R. Strackx, 'Foreshadow: Extracting the keys to the Intel SGX kingdom with transient out-of-order execution,' in 27th USENIX Security Symposium, 2018.
- [6] O. Weisse, J. Van Bulck, M. Minkin, D. Genkin, B. Kasikci, F. Piessens, M. Silberstein, R. Strackx, T. F. Wenisch, and Y. Yarom, 'Foreshadow-NG: Breaking the virtual memory abstraction with transient out-of-order execution,' Technical report, 2018; see also the USENIX Security paper Foreshadow [5].
- [7] 'Intel analysis of speculative execution side channels,' https://newsroom.intel.com/wp-content/uploads/sites/11/2018/01/Intel-Analysis-of-Speculative-Execution-Side-Channels.pdf, 2018.
- [8] 'Arm cache speculation side-channels,' https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/download-the-whitepaper, 2018.
- [9] 'Deep dive: Indirect branch restricted speculation,' https://software.intel.com/security-software-guidance/deep-dives/deep-dive-indirect-branch-restricted-speculation, 2018.
- [10] 'Retpoline: A branch target injection mitigation,' https://software.intel.com/security-software-guidance/api-app/sites/default/files/Retpoline-A-Branch-Target-Injection-Mitigation.pdf, 2018.
- [11] 'A year with Spectre: a V8 perspective,' https://v8.dev/blog/spectre, 2019.
- [12] 'What Spectre and Meltdown mean for WebKit,' https://webkit.org/blog/8048/what-spectre-and-meltdown-mean-for-webkit/, 2018.
- [13] 'The Chromium Projects: Mitigating side-channel attacks,' https://www.chromium.org/Home/chromium-security/ssca, 2018.
- [14] M. Yan, J. Choi, D. Skarlatos, A. Morrison, C. Fletcher, and J. Torrellas, 'Invisispec: Making speculative execution invisible in the cache hierarchy,' in The 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2018.
- [15] V. Kiriansky, I. Lebedev, S. Amarasinghe, S. Devadas, and J. Emer, 'Dawg: A defense against cache timing attacks in speculative execution processors,' in The 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2018.
- [16] P. Li, L. Zhao, R. Hou, L. Zhang, and D. Meng, 'Conditional speculation: An effective approach to safeguard out-of-order execution against spectre attacks,' in 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2019.
- [17] M. Taram, A. Venkat, and D. Tullsen, 'Context-sensitive fencing: Securing speculative execution via microcode customization,' in The International Conference on Architectural Support for Programming Languages and Operating Systems, 2019. [Online]. Available: https://doi.org/10.1145/3297858.3304060
- [18] J. Fustos, F. Farshchi, and H. Yun, 'Spectreguard: An efficient data-centric defense mechanism against spectre attacks,' in The 56th Design Automation Conference (DAC), 2019. [Online]. Available: https://doi.org/10.1145/3316781.3317914
- [19] K. N. Khasawneh, E. M. Koruyeh, C. Song, D. Evtyushkin, D. Ponomarev, and N. Abu-Ghazaleh, 'Safespec: Banishing the spectre of a meltdown with leakage-free speculation,' in The 56th Design Automation Conference (DAC), 2019.
- [20] C. Sakalis, S. Kaxiras, A. Ros, A. Jimborean, and M. Själander, 'Efficient invisible speculative execution through selective delay and value prediction,' in The 46th International Symposium on Computer Architecture (ISCA), 2019. [Online]. Available: https://doi.org/10.1145/3307650.3322216
- [21] K. Barber, A. Bacha, L. Zhou, Y. Zhang, and R. Teodorescu, 'Specshield: Shielding speculative data from microarchitectural covert channels,' in The 28th International Conference on Parallel Architectures and Compilation Techniques (PACT), 2019.
- [22] J. Yu, M. Yan, A. Khyzha, A. Morrison, J. Torrellas, and C. W. Fletcher, 'Speculative taint tracking (STT): A comprehensive protection for speculatively accessed data,' in The 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2019. [Online]. Available: https://doi.org/10.1145/3352460.3358274
- [23] O. Weisse, I. Neal, K. Loughlin, T. F. Wenisch, and B. Kasikci, 'NDA: Preventing speculative execution attacks at their source,' in The 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2019. [Online]. Available: https://doi.org/10.1145/3352460.3358306
- [24] G. Saileshwar and M. K. Qureshi, 'Cleanupspec: An 'undo' approach to safe speculation,' in The 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2019. [Online]. Available: https://doi.org/10.1145/3352460.3358314
- [25] T. Bourgeat, I. Lebedev, A. Wright, S. Zhang, Arvind, and S. Devadas, 'Mi6: Secure enclaves in a speculative out-of-order processor,' in The 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019. [Online]. Available: https://doi.org/10.1145/3352460.3358310
- [26] H. Omar and O. Khan, 'Ironhide: A secure multicore that efficiently mitigates microarchitecture state attacks for interactive applications,' in 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2020.
- [27] M. Schwarz, M. Lipp, C. Canella, R. Schilling, F. Kargl, and D. Gruss, 'Context: A generic approach for mitigating spectre,' in The 27th Annual Network and Distributed System Security Symposium (NDSS'20), San Diego, CA, USA, 2020.
- [28] B. Grayson, J. Rupley, G. Z. Zuraski, E. Quinnell, D. A. Jiménez, T. Nakra, P. Kitchin, R. Hensley, E. Brekelbaum, V. Sinha, and A. Ghiya, 'Evolution of the Samsung Exynos CPU microarchitecture,' in The ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), 2020.
- [29] S. Ainsworth and T. M. Jones, 'Muontrap: Preventing cross-domain spectre-like attacks by capturing speculative state,' in The ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), 2020.
- [30] J. Yu, N. Mantri, J. Torrellas, A. Morrison, and C. W. Fletcher, 'Speculative data-oblivious execution: Mobilizing safe prediction for safe and efficient speculative execution,' in The ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), 2020.
- [31] K.-A. Tran, C. Sakalis, M. Själander, A. Ros, S. Kaxiras, and A. Jimborean, 'Clearing the shadows: Recovering lost performance for invisible speculative execution through hw/sw co-design,' in The ACM International Conference on Parallel Architectures and Compilation Techniques (PACT), 2020. [Online]. Available: https://doi.org/10.1145/3410463.3414640
- [32] Z. N. Zhao, H. Ji, M. Yan, J. Yu, C. W. Fletcher, A. Morrison, D. Marinov, and J. Torrellas, 'Speculation invariance (invarspec): Faster safe execution through program analysis,' in The 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020.
- [33] K. Loughlin, I. Neal, J. Ma, E. Tsai, O. Weisse, S. Narayanasamy, and B. Kasikci, 'DOLMA: Securing speculation with the principle of transient non-observability,' in 30th USENIX Security Symposium (USENIX Security 21), 2021. [Online]. Available: https://www.usenix.org/conference/usenixsecurity21/presentation/loughlin
- [34] 'Consumption of speculative data barrier,' https://msrc-blog.microsoft.com/2018/03/15/mitigating-speculative-execution-side-channel-hardware-vulnerabilities.
- [35] C. Canella, J. V. Bulck, M. Schwarz, M. Lipp, B. von Berg, P. Ortner, F. Piessens, D. Evtyushkin, and D. Gruss, 'A systematic evaluation of transient execution attacks and defenses,' in 28th USENIX Security Symposium, 2019. [Online]. Available: https://www.usenix.org/conference/usenixsecurity19/presentation/canella
- [36] W. Xiong and J. Szefer, 'Survey of transient execution attacks and their mitigations,' ACM Computing Surveys, 2021.
- [37] Z. He, G. Hu, and R. Lee, 'New models for understanding and reasoning about speculative execution attacks,' in 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021.
- [38] Y. Yarom and K. Falkner, 'Flush+reload: A high resolution, low noise, L3 cache side-channel attack,' in 23rd USENIX Security Symposium, 2014. [Online]. Available: https://www.usenix.org/conference/usenixsecurity14/technical-sessions/presentation/yarom
- [39] D. A. Osvik, A. Shamir, and E. Tromer, 'Cache attacks and countermeasures: The case of AES,' in Topics in Cryptology - CT-RSA 2006, D. Pointcheval, Ed., 2006.
- [40] D. Gruss, C. Maurice, K. Wagner, and S. Mangard, 'Flush+flush: A fast and stealthy cache attack,' in Detection of Intrusions and Malware, and Vulnerability Assessment, J. Caballero, U. Zurutuza, and R. J. Rodríguez, Eds., 2016.
- [41] V. Kiriansky and C. Waldspurger, 'Speculative buffer overflows: Attacks and defenses,' arXiv preprint arXiv:1807.03757, 2018.
- [42] M. Schwarz, M. Schwarzl, M. Lipp, J. Masters, and D. Gruss, 'Netspectre: Read arbitrary memory over network,' in European Symposium on Research in Computer Security, 2019.
- [43] E. M. Koruyeh, K. N. Khasawneh, C. Song, and N. Abu-Ghazaleh, 'Spectre returns! speculation attacks using the return stack buffer,' in 12th USENIX Workshop on Offensive Technologies (WOOT 18), 2018. [Online]. Available: https://www.usenix.org/conference/woot18/presentation/koruyeh
- [44] G. Maisuradze and C. Rossow, 'Ret2spec: Speculative execution using return stack buffers,' in The 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018. [Online]. Available: https://doi.org/10.1145/3243734.3243761
- [45] J. Horn, 'Speculative execution, variant 4: Speculative store bypass,' https://bugs.chromium.org/p/projectzero/issues/detail?id=1528, 2018.
- [46] 'Spectre v3a (RSRE),' https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00115.html, 2018.
- [47] 'Lazy FP,' https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00145.html, 2018.
- [48] S. Van Schaik, A. Milburn, S. Österlund, P. Frigo, G. Maisuradze, K. Razavi, H. Bos, and C. Giuffrida, 'Ridl: Rogue in-flight data load,' in 2019 IEEE Symposium on Security and Privacy (SP), 2019.
- [49] 'Microarchitectural data sampling,' https://software.intel.com/content/www/us/en/develop/articles/software-security-guidance/technical-documentation/intel-analysis-microarchitectural-data-sampling.html, 2019.
- [50] M. Schwarz, M. Lipp, D. Moghimi, J. Van Bulck, J. Stecklina, T. Prescher, and D. Gruss, 'Zombieload: Cross-privilege-boundary data sampling,' in The 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019.
- [51] C. Canella, D. Genkin, L. Giner, D. Gruss, M. Lipp, M. Minkin, D. Moghimi, F. Piessens, M. Schwarz, B. Sunar et al., 'Fallout: Leaking data on meltdown-resistant CPUs,' in The 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019.
- [52] 'TAA,' https://software.intel.com/content/www/us/en/develop/articles/software-security-guidance/advisory-guidance/intel-tsx-asynchronous-abort.html, 2019.
- [53] 'VRS,' https://software.intel.com/content/www/us/en/develop/articles/software-security-guidance/advisory-guidance/vector-register-sampling.html, 2020.
- [54] S. van Schaik, M. Minkin, A. Kwong, D. Genkin, and Y. Yarom, 'Cacheout: Leaking data on Intel CPUs via cache evictions,' in 2021 IEEE Symposium on Security and Privacy (SP), 2021.
- [55] 'L1d eviction sampling,' https://software.intel.com/content/www/us/en/develop/articles/software-security-guidance/advisory-guidance/l1d-eviction-sampling.html, 2020.
- [56] H. Ragab, A. Milburn, K. Razavi, H. Bos, and C. Giuffrida, 'Crosstalk: Speculative data leaks across cores are real,' in 2021 IEEE Symposium on Security and Privacy (SP), 2021.
- [57] 'Special register buffer data sampling,' https://software.intel.com/content/www/us/en/develop/articles/software-security-guidance/technical-documentation/special-register-buffer-data-sampling.html, 2020.
- [58] J. Van Bulck, D. Moghimi, M. Schwarz, M. Lipp, M. Minkin, D. Genkin, Y. Yarom, B. Sunar, D. Gruss, and F. Piessens, 'Lvi: Hijacking transient execution through microarchitectural load value injection,' in 2020 IEEE Symposium on Security and Privacy (SP), 2020.
- [59] M. Behnia, P. Sahu, R. Paccagnella, J. Yu, Z. N. Zhao, X. Zou, T. Unterluggauer, J. Torrellas, C. Rozas, A. Morrison et al., 'Speculative interference attacks: Breaking invisible speculation schemes,' in The 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021.
- [60] A. Bhattacharyya, A. Sandulescu, M. Neugschwandtner, A. Sorniotti, B. Falsafi, M. Payer, and A. Kurmus, 'Smotherspectre: exploiting speculative execution through port contention,' in The 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019.
- [61] M. Lipp, V. Hadžić, M. Schwarz, A. Perais, C. Maurice, and D. Gruss, 'Take a way: Exploring the security implications of AMD's cache way predictors,' in The 15th ACM Asia Conference on Computer and Communications Security, 2020. [Online]. Available: https://doi.org/10.1145/3320269.3384746
- [62] 'Speculative execution side channel mitigations,' https://software.intel.com/content/dam/develop/external/us/en/documents/336996-speculative-execution-side-channel-mitigations.pdf, 2018.
- [63] M. K. Qureshi, 'Ceaser: Mitigating conflict-based cache attacks via encrypted-address and remapping,' in The 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2018.
- [64] D. Evtyushkin, R. Riley, N. Abu-Ghazaleh, and D. Ponomarev, 'Branchscope: A new side-channel attack on directional branch predictor,' in The Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018. [Online]. Available: https://doi.org/10.1145/3173162.3173204
- [65] D. Sanchez and C. Kozyrakis, 'Zsim: Fast and accurate microarchitectural simulation of thousand-core systems,' in The 40th Annual International Symposium on Computer Architecture (ISCA), 2013. [Online]. Available: https://doi.org/10.1145/2485922.2485963
- [66] S. Zhang, A. Wright, T. Bourgeat, and Arvind, 'Composable building blocks to open up processor design,' in The 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2018.
- [67] D. Wentzlaff, P. Griffin, H. Hoffmann, L. Bao, B. Edwards, C. Ramey, M. Mattina, C.-C. Miao, J. F. Brown III, and A. Agarwal, 'On-chip interconnection architecture of the tile processor,' IEEE Micro, vol. 27, no. 5, 2007.
- [68] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu, J. Hestness, D. R. Hower, T. Krishna, S. Sardashti, R. Sen, K. Sewell, M. Shoaib, N. Vaish, M. D. Hill, and D. A. Wood, 'The gem5 simulator,' SIGARCH Comput. Archit. News, vol. 39, no. 2, 2011. [Online]. Available: https://doi.org/10.1145/2024716.2024718
- [69] M. Yan, J. Choi, D. Skarlatos, A. Morrison, C. W. Fletcher, and J. Torrellas, 'Correction: Invisispec: Making speculative execution invisible in the cache hierarchy,' 2019. [Online]. Available: https://iacoma.cs.uiuc.edu/iacoma-papers/corrected_micro18.pdf
- [70] A. Patel, F. Afram, and K. Ghose, 'Marss-x86: A qemu-based microarchitectural and systems simulator for x86 multicore processors,' in 1st International Qemu Users' Forum, 2011.
- [71] 'Meltdown proof-of-concept,' https://github.com/IAIK/meltdown, 2019.