Image 5fb488938d95...

EXPERT: gemini-3-flash-free VERSION 1

RUNTIME: nugit/gemini/gemini-3-flash-preview

# Technical Diagram Analysis: Routing Mechanisms in Mixture-of-Experts (MoE)

This document describes a technical illustration comparing three different routing strategies for neural network architectures: **Token-choice routing**, **Expert-choice routing**, and **Expert-choice MoD (Mixture-of-Depths)**.

---

## 1. Legend and Global Components

The diagram uses specific visual cues to represent data flow and processing status:

*   **Solid Arrow:** Represents a "Routing decision" (active path).
*   **Dotted Arrow / Dotted Outline:** Represents a "Dropped token" (inactive or discarded path).
*   **Colored Rectangles:** Represent individual tokens.
*   **Large Rounded Boxes:** Represent "Experts" or processing units.
*   **"Route" Block:** A decision-making component accompanied by a small bar chart representing probability distributions or scores.
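The bar chart beside each "Route" block can be read as a set of per-expert scores for a token. As a minimal sketch, assuming the router emits raw logits that are normalized with a softmax (the logit values and the names `softmax`, `token_logits`, and `best_expert` are illustrative, not taken from the diagram):

```python
import math

def softmax(logits):
    """Convert raw router logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical router logits for one token over three experts.
token_logits = [2.0, 0.5, -1.0]
probs = softmax(token_logits)

# Index of the highest-scoring expert for this token (0-based).
best_expert = max(range(len(probs)), key=probs.__getitem__)
```

Each bar in the chart would then correspond to one entry of `probs`, and the tallest bar to `best_expert`.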

---

## 2. Component Analysis by Section

### Section A: Token-choice routing
*   **Header:** "Token-choice routing"
*   **Structure:**
    *   **Input:** Six tokens are shown at the bottom (Red, Grey, Orange, Light Purple [dotted], Teal, Yellow).
    *   **Processing Units:** Three experts labeled **Expert 1**, **Expert 2**, and **Expert 3**.
*   **Flow and Logic:**
    *   Each token "chooses" which expert to go to.
    *   **Expert 1** receives the Red and Orange tokens.
    *   **Expert 2** receives the Grey token.
    *   **Expert 3** receives the Teal and Yellow tokens.
    *   **Dropped Token:** The Light Purple token (4th from left) has a dotted outline and a dotted arrow pointing toward Expert 1 or Expert 2. Because tokens make the choice in this scheme, a dropped token is typically one whose chosen expert is already at capacity, so no expert processes it.
    *   **Observation:** Experts have variable workloads (Expert 1 has 2 tokens, Expert 2 has 1, Expert 3 has 2).
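The flow described for this panel can be sketched as follows. This is an illustrative sketch, not the diagram's exact algorithm: it assumes each token picks its single highest-scoring expert and that each expert has a hard capacity of two tokens, so a token arriving at a full expert is dropped. The function name, the capacity value, and the score matrix are all hypothetical, chosen so that the result matches the assignment described above.

```python
def token_choice_route(scores, capacity):
    """Token-choice routing with a per-expert capacity limit.

    scores[t][e] is the router score of token t for expert e.
    Each token picks its highest-scoring expert; a token whose chosen
    expert is already full is dropped.
    """
    num_experts = len(scores[0])
    assignments = {e: [] for e in range(num_experts)}
    dropped = []
    for t, row in enumerate(scores):
        e = max(range(num_experts), key=row.__getitem__)
        if len(assignments[e]) < capacity:
            assignments[e].append(t)
        else:
            dropped.append(t)
    return assignments, dropped

# Hypothetical scores for six tokens (rows) over three experts (columns).
# Token order: Red, Grey, Orange, Light Purple, Teal, Yellow.
scores = [
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.8, 0.1, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
]
assignments, dropped = token_choice_route(scores, capacity=2)
```

With these scores the fourth token prefers the first expert, which is already full, so it ends up in `dropped` while the second expert stays under capacity. This is the load imbalance the panel illustrates.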

### Section B: Expert-choice routing
*   **Header:** "Expert-choice routing"
*   **Structure:**
    *   **Input:** Six token slots at the bottom.
    *   **Processing Units:** Three experts labeled **Expert 1**, **Expert 2**, and **Expert 3**.
*   **Flow and Logic:**
    *   The routing decision originates from the Experts. Each expert selects a fixed number of tokens (top-k).
    *   **Expert 1** selects the Red and Grey tokens.
    *   **Expert 2** selects the Grey and Purple tokens.
    *   **Expert 3** selects the Teal and Yellow tokens.
    *   **Dropped Token:** The 3rd token (Light Orange with dotted outline) is not selected by any expert and is dropped.
    *   **Observation:** Experts have a uniform workload (each processes exactly 2 tokens). Some tokens (like Grey) may be processed by multiple experts.
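A minimal sketch of the selection described above, assuming each expert independently takes its top-k tokens by score with k = 2. The function name and the score values are hypothetical; the scores are chosen so that the selections match the ones listed for this panel.

```python
def expert_choice_route(scores, k):
    """Expert-choice routing: each expert selects its top-k tokens.

    scores[t][e] is the router score of token t for expert e.
    Returns the tokens chosen by each expert and the tokens chosen by none.
    """
    num_tokens = len(scores)
    num_experts = len(scores[0])
    selections = {}
    for e in range(num_experts):
        ranked = sorted(range(num_tokens), key=lambda t: scores[t][e], reverse=True)
        selections[e] = sorted(ranked[:k])
    chosen = {t for toks in selections.values() for t in toks}
    dropped = [t for t in range(num_tokens) if t not in chosen]
    return selections, dropped

# Hypothetical scores for six tokens (rows) over three experts (columns).
scores = [
    [0.9, 0.1, 0.1],
    [0.8, 0.9, 0.1],
    [0.3, 0.2, 0.2],
    [0.2, 0.8, 0.3],
    [0.1, 0.3, 0.9],
    [0.1, 0.1, 0.8],
]
selections, dropped = expert_choice_route(scores, k=2)
```

Every expert ends up with exactly `k` tokens regardless of the scores, which is why the workload is uniform. The cost is that a token can be picked twice (the second token here) or not at all (the third).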

### Section C: Expert-choice MoD (Mixture-of-Depths)
*   **Header:** "Expert-choice MoD"
*   **Structure:**
    *   **Input:** Six token slots at the bottom.
    *   **Processing Unit:** A single block labeled **Self-attention & MLP**.
*   **Flow and Logic:**
    *   The router selects a fixed capacity of tokens to undergo computation.
    *   The label **"top-2 choices"** indicates that only two tokens are selected for the heavy computation block.
    *   **Selected Tokens:** The Red token (1st) and the Yellow token (6th) are routed into the Self-attention & MLP block.
    *   **Dropped Tokens:** The 2nd, 3rd, 4th, and 5th tokens are shown with dotted outlines. They are not discarded: they bypass this computation block and pass through unchanged (in Mixture-of-Depths, via the residual connection).
    *   **Observation:** This mechanism limits the total computation by only allowing a subset of tokens to pass through the layer based on a routing score.
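The behaviour of this panel can be sketched as below, under several assumptions. Tokens are modelled as plain floats, `block` is a hypothetical per-token stand-in for the Self-attention & MLP block (real self-attention mixes information across the selected tokens), and the selected tokens' updates are scaled by their router score before being added back to the input. None of these details is visible in the diagram.

```python
def mod_layer(tokens, router_scores, k, block):
    """Apply `block` to the top-k tokens only; the rest pass through unchanged."""
    ranked = sorted(range(len(tokens)), key=lambda i: router_scores[i], reverse=True)
    selected = set(ranked[:k])
    outputs = []
    for i, x in enumerate(tokens):
        if i in selected:
            # Residual connection plus the block's update, gated by the score.
            outputs.append(x + router_scores[i] * block(x))
        else:
            # Bypass: the token skips the block entirely.
            outputs.append(x)
    return outputs, sorted(selected)

# Six hypothetical scalar "tokens" and router scores; the first and last score highest.
tokens = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
router_scores = [0.9, 0.1, 0.2, 0.3, 0.4, 0.8]
outputs, selected = mod_layer(tokens, router_scores, k=2, block=lambda x: 2.0 * x)
```

Only the two selected positions invoke `block`, so the cost of the layer scales with `k` rather than with the sequence length.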

---

## 3. Summary of Key Differences

| Feature | Token-choice routing | Expert-choice routing | Expert-choice MoD |
| :--- | :--- | :--- | :--- |
| **Decision Maker** | Token selects Expert | Expert selects Tokens | Router selects Tokens for a single block |
| **Expert Workload** | Variable (can be unbalanced) | Fixed (balanced) | Fixed (capacity constrained) |
| **Token Processing** | Tokens can be dropped (one shown) | Tokens can be dropped (one shown) or processed by several experts | Most tokens bypass the block (four of six shown) |
| **Primary Goal** | Dynamic allocation | Load balancing | Computational efficiency/Sparsity |

EXPERT: nemotron-free VERSION 1

RUNTIME: free/nvidia/nemotron-nano-12b-v2-vl:free

# Technical Document Extraction: Token and Expert Choice Routing Mechanisms

## Diagram Overview
The image illustrates three routing mechanisms in a machine learning architecture, focusing on token and expert selection processes. The diagram uses color-coded bars, arrows, and route boxes to represent decision flows and token distribution.

---

### 1. **Token-choice Routing**
#### Components:
- **Experts**: 
  - **Expert 1**: Red and orange bars (two parallel bars per expert).
  - **Expert 2**: Gray bar (single bar).
  - **Expert 3**: Cyan and yellow bars (two parallel bars).
- **Route Box**: 
  - Contains a bar chart (gray bars) representing token distribution.
  - Arrows labeled **"Routing decision"** point to experts.
  - Dashed arrow labeled **"Dropped token"** indicates unrouted tokens.
- **Flow**:
  - Each token is routed to an expert according to the scores shown in the route box's bar chart.
  - The dashed arrow marks a dropped token that no expert processes; it is not a secondary path into Expert 2.

#### Key Observations:
- The route box's bar chart makes the routing scores explicit.
- Expert loads are uneven (two, one, and two tokens), and one token is dropped.

---

### 2. **Expert-choice Routing**
#### Components:
- **Experts**: 
  - **Expert 1**: Red and yellow bars.
  - **Expert 2**: Gray and purple bars.
  - **Expert 3**: Cyan and yellow bars.
- **Route Box**: 
  - Connects to **top-2 choices** via arrows.
  - Arrows labeled **"Routing decision"** connect each expert to the tokens it selects.
- **Flow**:
  - Each expert selects its top 2 tokens by routing score, rather than each token selecting experts.
  - Dashed lines indicate dropped tokens.

#### Key Observations:
- Fixed per-expert load: every expert takes exactly two tokens, as the two bars listed for each expert show.
- Expert 3 receives tokens via a direct path from the route box.

---

### 3. **Expert-choice MoD (Mixture of Depths)**
#### Components:
- **Processing Block**: 
  - A single block labeled **"Self-attention & MLP"**; this panel has no separate experts.
- **Route Box**: 
  - Arrows labeled **"top-2 choices"** mark the two tokens the router selects.
- **Flow**:
  - The two selected tokens (red and yellow) pass through the Self-attention & MLP block.
  - Dashed lines mark the remaining tokens, which skip the block.

#### Key Observations:
- Self-attention and the MLP are the computation that tokens are routed into or around, not part of the routing mechanism.
- Only a fixed number of tokens (two here) receive the block's computation, which caps the cost of the layer.

---

### Common Elements Across All Routing Mechanisms
1. **Color Coding**:
   - Colored bars represent tokens; the bars listed under each expert are the tokens it processes (e.g., red/orange, gray, cyan/yellow).
   - Colors are consistent across sections.
2. **Dropped Tokens**:
   - Represented by dashed arrows or unconnected bars.
3. **Routing Decisions**:
   - Arrows labeled **"Routing decision"** or **"top-2 choices"** dictate token flow.

---

### Summary of Routing Strategies
| Mechanism               | Selection Made By         | Selection Rule                                      | Key Features                                              |
|-------------------------|---------------------------|-----------------------------------------------------|-----------------------------------------------------------|
| **Token-choice**        | Tokens                    | Each token picks an expert from its routing scores  | Uneven expert loads; tokens can be dropped                |
| **Expert-choice**       | Experts                   | Each expert picks its top-2 tokens                  | Balanced expert loads; a token can be picked twice or not at all |
| **Expert-choice MoD**   | Router for a single block | Top-2 tokens enter the Self-attention & MLP block   | Remaining tokens skip the block                           |

---

### Notes for Technical Implementation
1. **Token-choice Routing**:
   - The router produces per-token scores over the experts (the bar chart in the route box).
   - Dashed arrows mark dropped tokens, not a fallback path, so an implementation must decide how to handle tokens that no expert processes.
2. **Expert-choice Routing**:
   - Depends on a scoring system that lets each expert rank tokens (e.g., relevance scores).
   - Top-2 selection fixes each expert's workload at two tokens.
3. **Expert-choice MoD**:
   - Routes tokens into or around a single Self-attention & MLP block instead of choosing among experts.
   - Processing only the top-2 tokens reduces the computation spent in the layer.

---

### Diagram Flow Summary
1. **Input Tokens**: Represented by vertical bars (one color per token).
2. **Route Box**: Processes tokens to determine routing decisions.
3. **Expert Selection**: Tokens are routed to selected experts (solid arrows) or dropped (dashed arrows).
4. **Output**: Processed tokens from selected experts.

This diagram provides a framework for optimizing computational efficiency in large-scale models by dynamically routing tokens to specialized experts.

TECHNICAL ASSET FINGERPRINT

5fb488938d95f09f00769e80
