Image 5fb488938d95...

EXPERT: nemotron-free VERSION 1

RUNTIME: free/nvidia/nemotron-nano-12b-v2-vl:free
INTEL_VERIFIED
# Technical Document Extraction: Token and Expert Choice Routing Mechanisms

## Diagram Overview
The image illustrates three routing mechanisms in a machine learning architecture, focusing on token and expert selection processes. The diagram uses color-coded bars, arrows, and route boxes to represent decision flows and token distribution.

---

### 1. **Token-choice Routing**
#### Components:
- **Experts**: 
  - **Expert 1**: Red and orange bars (two parallel bars per expert).
  - **Expert 2**: Gray bar (single bar).
  - **Expert 3**: Cyan and yellow bars (two parallel bars).
- **Route Box**: 
  - Contains a bar chart (gray bars) representing token distribution.
  - Arrows labeled **"Routing decision"** point to experts.
  - Dashed arrow labeled **"Dropped token"** indicates unrouted tokens.
- **Flow**:
  - Tokens are routed to experts based on the route box's bar chart.
  - Expert 2 receives tokens via a secondary (dashed) path.

#### Key Observations:
- Token distribution is explicit via the route box's bar chart.
- Expert 2 has a fallback routing mechanism (dashed arrow).

---

### 2. **Expert-choice Routing**
#### Components:
- **Experts**: 
  - **Expert 1**: Red and yellow bars.
  - **Expert 2**: Gray and purple bars.
  - **Expert 3**: Cyan and yellow bars.
- **Route Box**: 
  - Connects to **top-2 choices** via arrows.
  - Arrows labeled **"Routing decision"** select experts.
- **Flow**:
  - Tokens are routed to the top 2 experts based on criteria (e.g., relevance, confidence).
  - Dashed lines indicate dropped tokens.

#### Key Observations:
- Hierarchical routing: Only top 2 experts are selected per token.
- Expert 3 receives tokens via a direct path from the route box.

---

### 3. **Expert-choice MoD (Mixture of Depths)**
#### Components:
- **Experts**: 
  - **Expert 1**: Red and yellow bars.
  - **Expert 3**: Cyan and yellow bars.
- **Route Box**: 
  - Labeled **"Self-attention & MLP"**, indicating integrated mechanisms.
  - Arrows labeled **"top-2 choices"** route tokens.
- **Flow**:
  - Tokens are routed to the top 2 experts using a combination of self-attention and MLP.
  - Dashed lines indicate dropped tokens.

#### Key Observations:
- Hybrid routing mechanism combining attention and multilayer perceptron (MLP).
- Simplified expert selection (only Experts 1 and 3 are active).

---

### Common Elements Across All Routing Mechanisms
1. **Color Coding**:
   - Experts are represented by distinct color pairs (e.g., red/orange, gray, cyan/yellow).
   - No explicit legend, but colors are consistent across sections.
2. **Dropped Tokens**:
   - Represented by dashed arrows or unconnected bars.
3. **Routing Decisions**:
   - Arrows labeled **"Routing decision"** or **"top-2 choices"** dictate token flow.

---

### Summary of Routing Strategies
| Mechanism               | Token Selection       | Expert Selection       | Key Features                          |
|-------------------------|-----------------------|------------------------|---------------------------------------|
| **Token-choice**        | Explicit distribution | All experts            | Fallback routing for Expert 2         |
| **Expert-choice**       | Top-2 experts         | Top-2 experts          | Hierarchical selection                |
| **Expert-choice MoD**   | Top-2 experts         | Top-2 experts          | Hybrid self-attention + MLP           |

---

### Notes for Technical Implementation
1. **Token-choice Routing**:
   - Requires a bar chart (route box) to quantify token distribution.
   - Fallback mechanisms (dashed arrows) ensure robustness.
2. **Expert-choice Routing**:
   - Depends on a scoring system to rank experts (e.g., relevance scores).
   - Top-2 selection reduces computational overhead.
3. **Expert-choice MoD**:
   - Combines self-attention (for contextual token relationships) and MLP (for expert-specific processing).
   - Simplifies the expert pool to critical candidates.

---

### Diagram Flow Summary
1. **Input Tokens**: Represented by vertical bars (colors denote experts).
2. **Route Box**: Processes tokens to determine routing decisions.
3. **Expert Selection**: Tokens are routed to selected experts (solid arrows) or dropped (dashed arrows).
4. **Output**: Processed tokens from selected experts.

This diagram provides a framework for optimizing computational efficiency in large-scale models by dynamically routing tokens to specialized experts.
DECODING INTELLIGENCE...
TECHNICAL ASSET FINGERPRINT

5fb488938d95f09f00769e80

FOUND IN PAPERS

EXPERT: nemotron-free VERSION 1