# Technical Document Extraction: Tree Mask Attention Mechanism
This document describes a technical diagram illustrating a "Tree Mask" mechanism, likely used in Transformer-based architectures for processing hierarchical or branching data structures.
## 1. Component Isolation
The image is divided into three primary functional regions:
* **Left (Tree Structure):** A hierarchical representation of tokens starting from a "Root" node.
* **Top (Key Sequence):** A horizontal sequence of tokens acting as the "Key" in an attention mechanism.
* **Center-Right (Attention Matrix):** An $8 \times 8$ grid representing the mask, where checkmarks indicate permitted attention connections between "Query" tokens (rows) and "Key" tokens (columns).
---
## 2. Tree Structure and Query Mapping (Left Region)
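The diagram shows how tokens branch out from a central root and are organized into "Heads". Each head appears to correspond to one depth level of the tree rather than to a single branch: Head 1 holds both first-level candidates, and Head 2 holds the children of each candidate.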
### Hierarchy Flow:
1. **Root (Grey Node):** The origin point.
2. **Head 1 (Red Background):** Contains two tokens:
   * **It**
   * **I**
3. **Head 2 (Orange Background):** This head branches further into two sub-groups based on the parent token from Head 1.
   * **Sub-group 1 (Yellow Background):** Derived from the token "It". Contains: **is**, **'**, **the**.
   * **Sub-group 2 (Green Background):** Derived from the token "I". Contains: **is**, **'**, **the**.
### Query Sequence (Vertical Axis):
The tokens from the tree, excluding the Root, are flattened level by level into a vertical sequence of 8 rows for the attention matrix:
1. **It** (from Head 1)
2. **I** (from Head 1)
3. **is** (from Head 2, yellow)
4. **'** (from Head 2, yellow)
5. **the** (from Head 2, yellow)
6. **is** (from Head 2, green)
7. **'** (from Head 2, green)
8. **the** (from Head 2, green)
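
The flattening above can be sketched as a breadth-first traversal that also records each token's parent row. This is a minimal illustration, not the method used to produce the diagram: the nested `TREE` structure, the `flatten` function, and the convention of using `-1` for "parent is Root" are all assumptions made for this example.

```python
from collections import deque

# Hypothetical nested representation of the tree in the diagram:
# each node is a (token, children) pair.
TREE = ("Root", [
    ("It", [("is", []), ("'", []), ("the", [])]),
    ("I", [("is", []), ("'", []), ("the", [])]),
])


def flatten(tree):
    """Flatten the tree breadth-first; the root itself is not emitted.

    Returns (tokens, parents), where parents[i] is the 0-based row index
    of token i's parent, or -1 if the parent is the root.
    """
    tokens, parents = [], []
    queue = deque((child, -1) for child in tree[1])
    while queue:
        (token, children), parent = queue.popleft()
        index = len(tokens)
        tokens.append(token)
        parents.append(parent)
        queue.extend((child, index) for child in children)
    return tokens, parents


tokens, parents = flatten(TREE)
print(tokens)   # ['It', 'I', 'is', "'", 'the', 'is', "'", 'the']
print(parents)  # [-1, -1, 0, 0, 0, 1, 1, 1]
```

The `parents` list is enough to recover which branch every row belongs to, which is the information the mask encodes.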
---
## 3. Key Sequence (Top Region)
The horizontal axis lists the **Key** tokens in the same order as the Query rows, grouped by color to match the tree structure:
* **Red Group:** [It], [I]
* **Yellow Group:** [is], ['], [the]
* **Green Group:** [is], ['], [the]
---
## 4. Attention Matrix (Tree Mask Data)
The matrix defines which Query (row) can attend to which Key (column). A purple checkmark ($\checkmark$) indicates an active connection.
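
As a rough illustration of what "can attend" means in practice, a boolean mask row is commonly applied by setting disallowed scores to negative infinity before the softmax. The diagram does not show this step, so the sketch below is an assumption about typical usage; `masked_softmax` is a hypothetical helper and the score values are made up.

```python
import math


def masked_softmax(scores, allowed):
    """Softmax over one query row, giving zero weight to disallowed keys."""
    masked = [s if ok else float("-inf") for s, ok in zip(scores, allowed)]
    peak = max(masked)  # finite as long as at least one key is allowed
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Row 3 of the mask ("is", yellow): may attend to "It" (column 1) and itself (column 3).
allowed = [True, False, True, False, False, False, False, False]
scores = [0.5, 2.0, 0.5, 1.0, -1.0, 0.3, 0.0, 0.7]  # made-up attention scores
weights = masked_softmax(scores, allowed)
print(weights)
```

Only the two permitted keys receive non-zero weight, regardless of how large the scores for the masked keys are.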
### Data Table Reconstruction
| Query \ Key | It (Red) | I (Red) | is (Yel) | ' (Yel) | the (Yel) | is (Grn) | ' (Grn) | the (Grn) |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **It** | $\checkmark$ | | | | | | | |
| **I** | | $\checkmark$ | | | | | | |
| **is (Yel)** | $\checkmark$ | | $\checkmark$ | | | | | |
| **' (Yel)** | $\checkmark$ | | | $\checkmark$ | | | | |
| **the (Yel)** | $\checkmark$ | | | | $\checkmark$ | | | |
| **is (Grn)** | | $\checkmark$ | | | | $\checkmark$ | | |
| **' (Grn)** | | $\checkmark$ | | | | | $\checkmark$ | |
| **the (Grn)** | | $\checkmark$ | | | | | | $\checkmark$ |
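
The table can be reproduced from parent pointers alone: each query attends to itself and to every ancestor reached by following its parent chain. The sketch below assumes 0-based row indices and `-1` for "parent is Root"; `PARENTS`, `LABELS`, and `build_tree_mask` are hypothetical names introduced for illustration.

```python
# Parent row index for each of the 8 tokens; -1 means the parent is Root.
PARENTS = [-1, -1, 0, 0, 0, 1, 1, 1]
LABELS = ["It", "I", "is", "'", "the", "is", "'", "the"]


def build_tree_mask(parents):
    """Return an n x n boolean mask where mask[q][k] is True if key k
    is query q itself or one of q's ancestors in the tree."""
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        k = q
        while k != -1:  # walk up the parent chain until Root is reached
            mask[q][k] = True
            k = parents[k]
    return mask


mask = build_tree_mask(PARENTS)
for label, row in zip(LABELS, mask):
    print(f"{label:>3} " + " ".join("x" if cell else "." for cell in row))
```

Each printed row should have an `x` exactly where the corresponding table row has a checkmark.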
---
## 5. Trend and Logic Verification
* **Identity Attention:** Every token attends to itself, so the main diagonal is fully populated (checkmarks at [1,1], [2,2], [3,3], and so on through [8,8]).
* **Hierarchical Dependency:**
  * Each token in the **Yellow Group** (is, ', the) attends only to itself and to its parent token **"It"** (Red). It cannot see its yellow siblings, the "I" (Red) branch, or the Green branch.
  * Each token in the **Green Group** (is, ', the) attends only to itself and to its parent token **"I"** (Red). It cannot see its green siblings, the "It" (Red) branch, or the Yellow branch.
* **Isolation:** There is no cross-attention between the Yellow and Green branches, despite them containing the same string literals ("is", "'", "the"). This confirms the mask enforces the tree structure where branches are independent.
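
The observations above can be checked mechanically against the transcribed table. The sketch below hard-codes the mask as strings (`x` for a checkmark) and uses 0-based indices; the variable names are chosen for this example only.

```python
# Mask transcribed from the table: one string per query row, "x" = checkmark.
ROWS = [
    "x.......",  # It
    ".x......",  # I
    "x.x.....",  # is  (yellow)
    "x..x....",  # '   (yellow)
    "x...x...",  # the (yellow)
    ".x...x..",  # is  (green)
    ".x....x.",  # '   (green)
    ".x.....x",  # the (green)
]
MASK = [[c == "x" for c in row] for row in ROWS]
YELLOW, GREEN = range(2, 5), range(5, 8)

# Identity attention: the main diagonal is fully set.
identity = all(MASK[i][i] for i in range(8))

# Hierarchical dependency: each group sees its own parent and not the other one.
yellow_parent_ok = all(MASK[q][0] and not MASK[q][1] for q in YELLOW)
green_parent_ok = all(MASK[q][1] and not MASK[q][0] for q in GREEN)

# Isolation: no attention between the yellow and green branches.
isolated = not any(
    MASK[q][k] or MASK[k][q] for q in YELLOW for k in GREEN
)

# Siblings within a group do not attend to each other either.
no_siblings = not any(
    MASK[q][k] for group in (YELLOW, GREEN) for q in group for k in group if q != k
)

print(identity, yellow_parent_ok, green_parent_ok, isolated, no_siblings)
```

All five checks evaluate to `True` for the mask as transcribed in the table.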
## 6. Textual Labels Summary
* **Title:** Tree Mask (accompanied by a small evergreen tree icon 🌲).
* **Labels:** Root, Head 1, Head 2, Query, Key.
* **Tokens:** It, I, is, ', the.