# Technical Analysis: Hierarchical KV Cache Management and Eviction
This document describes a series of nine diagrams (labeled 1 through 9) illustrating the management of a hierarchical Key-Value (KV) cache in a Large Language Model (LLM) context. The diagrams show how system prompts, conversation histories, and branching queries are stored, shared, and evicted.
## Visual Legend and Component Key
* **Grey Rounded Square:** Represents a cached block of text/tokens.
* **Light Blue Rounded Square:** Represents a shared parent node in a branching conversation.
* **Light Green Rounded Square with Letter:** Represents the current active leaf node or "head" of a specific conversation thread.
* **Orange Dashed Square with 'X':** Represents a cache block that has been **evicted** from memory.
* **Solid Black Lines:** Represent the hierarchical relationship (parent-child) between blocks.
* **Orange Dashed Lines:** Represent the link to an evicted block.
---
## Sequence Analysis
### (1) Initial State
* A single empty or root cache block is initialized.
### (2) Linear Conversation Start
* **Root:** System prompt: "You are a helpful assistant."
* **Child Node [a]:** Contains "User: Hello! Assistant: Hi!"
* **Structure:** A simple linear chain.
### (3) Linear Extension
* **Root:** System prompt.
* **Intermediate Node [b]:** "User: Hello! Assistant: Hi!"
* **Leaf Node:** "User: Solve this problem ... Assistant: Sure! ..."
* **Trend:** The conversation grows linearly by appending a new block.
### (4) Branching Conversation
* The cache branches from the first interaction.
* **Shared Parent (Blue):** "User: Hello! Assistant: Hi!"
* **Branch 1 (Left):** Leads to node **[c]** with "User: Solve this question... Assistant: Sure! ..."
* **Branch 2 (Right):** Leads to node **[d]** with "User: What can you do? Assistant: I can ..."
### (5) Cache Eviction (Branch 1)
* **Shared Parent (Blue):** Remains in cache.
* **Branch 1 (Left):** The leaf node is replaced by an orange dashed box marked **"X"** and labeled **"evicted"**.
* **Branch 2 (Right):** Continues to grow. A new leaf node is added: "User: Write a story ... Assistant: Sure! ..."
### (6) Complex Branching (Three Threads)
* The root "You are a helpful assistant" now has two main branches.
* **Left Branch:** Further sub-branches into two conversations (one ending in a green leaf, one in a grey block).
* **Right Branch [e]:** A new thread containing "Question 1: ... Answer 1: ... Question 2: ... Answer 2: ... Question 3: ... Answer 3: ..."
### (7) Multi-Level Branching
* The right-most branch from step (6) now acts as a shared parent (Blue) for three new sub-queries:
* **Sub-branch 1:** "What ... Answer 3: ..."
* **Sub-branch 2:** "When ... Answer 3: ..." (Green leaf)
* **Sub-branch 3:** "How ... Answer 3: ..." (Green leaf)
* Nodes **[f]**, **[g]**, and **[h]** represent the leaf nodes of the left-side branches.
### (8) Deep Eviction and Linear Growth
* **Left Branch:** The middle section of the conversation is **evicted** (Orange 'X'). However, a new leaf node **[i]** is generated further down: "User: Solve this question... Assistant: Sure! ... User: How about ..? Assistant: It is a ...".
* **Right Branch:** The three sub-queries from step (7) are now leaf nodes **[j]**, **[k]**, and **[l]**.
### (9) Massive Eviction and Wide Branching
* **Left Branch:** The entire middle segment is **evicted**.
* **Right Branch:**
* The parent node for the "What/When/How" queries is **evicted**.
* Two subsequent sub-nodes are also **evicted**.
* A new shared parent (Blue) is created at the bottom right, branching into four distinct green leaf nodes: "This is ...", "Let us ...", "We can ...", and "To solve ...".
---
## Summary of Data Trends
1. **Prefix Sharing:** Identical starting strings (like the system prompt) are stored once and shared across all branches to save memory.
2. **Dynamic Branching:** The system supports multiple simultaneous conversation paths originating from any point in the history.
3. **Selective Eviction:** When memory is full, the system evicts specific blocks (marked 'X'). The diagrams suggest that leaf nodes or intermediate nodes can be evicted while keeping the root or other active branches intact.
4. **Recovery/Re-growth:** Even after eviction (as seen in step 8 and 9), the tree can continue to grow from remaining valid nodes.