# Technical Data Extraction: Performance Metrics Analysis

This document contains a detailed extraction of data from three performance charts labeled (a), (b), and (c).

---

## Chart (a): Impact of Cache Hit Rate on Batch Size and Throughput

### Metadata
- **X-Axis:** Cache Hit Rate (%) [Range: 0 to 100]
- **Primary Y-Axis (Left, Green):** Batch Size [Range: 20 to 40+]
- **Secondary Y-Axis (Right, Orange):** Throughput (tokens / s) [Range: 0.4k to 1.2k]
- **Legend Location:** Top Left

### Data Series Trends
1.  **Batch Size (Green Line):** Rises monotonically with cache hit rate, from 20 at 0% to ~48 at 100%. The slope steepens at higher hit rates, with the largest single increase (~37 to ~48) between 80% and 100%.
2.  **Throughput (Orange Line):** Rises monotonically and closely tracks the batch size, from ~0.4k to ~1.15k tokens/s. Growth accelerates (the curve is convex) toward the 100% hit rate mark.

### Key Data Points (Approximate)
| Cache Hit Rate (%) | Batch Size (Green) | Throughput in tokens/s (Orange) |
| :--- | :--- | :--- |
| 0 | 20 | ~0.4k |
| 20 | ~23 | ~0.45k |
| 40 | ~26 | ~0.5k |
| 60 | ~32 | ~0.65k |
| 80 | ~37 | ~0.85k |
| 100 | ~48 | ~1.15k |
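
To make the trend in the table above concrete, the following minimal sketch computes the increase between consecutive readings and the overall gain from 0% to 100% hit rate. The values are the approximate readings from the table, not exact measurements, and the variable and function names are illustrative.

```python
# Approximate values read from chart (a); they mirror the table above.
hit_rate = [0, 20, 40, 60, 80, 100]            # cache hit rate, %
batch_size = [20, 23, 26, 32, 37, 48]          # left axis
throughput = [400, 450, 500, 650, 850, 1150]   # right axis, tokens/s


def step_increases(values):
    """Return the difference between each pair of consecutive readings."""
    return [b - a for a, b in zip(values, values[1:])]


batch_steps = step_increases(batch_size)
throughput_steps = step_increases(throughput)

# Overall gain between the 0% and 100% hit rate readings.
batch_gain = batch_size[-1] / batch_size[0]
throughput_gain = throughput[-1] / throughput[0]

print("batch size steps:", batch_steps)
print("throughput steps:", throughput_steps)
print(f"batch size gain 0% -> 100%: {batch_gain:.2f}x")
print(f"throughput gain 0% -> 100%: {throughput_gain:.2f}x")
```

With these readings, throughput rises by roughly 2.9x and batch size by 2.4x, and the largest step in both series falls between 80% and 100%.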

---

## Chart (b): Impact of Cache Hit Rate on Latency

### Metadata
- **X-Axis:** Cache Hit Rate (%) [Range: 0 to 100]
- **Primary Y-Axis (Left, Red):** Total Latency (s) [Range: 200 to 400+]
- **Secondary Y-Axis (Right, Blue):** First Token Latency (s) [Range: 10 to 20+]
- **Legend Location:** Top Right

### Data Series Trends
1.  **Total Latency (Red Line):** Shows a steady, linear downward trend as the cache hit rate increases, dropping from over 400s to approximately 120s.
2.  **First Token Latency (Blue Line):** Shows a sharp initial drop between 0% and 10% hit rate, followed by a gradual, fluctuating decline toward the 100% mark.

### Key Data Points (Approximate)
| Cache Hit Rate (%) | Total Latency (Red) | First Token Latency (Blue) |
| :--- | :--- | :--- |
| 0 | ~430s | ~25s |
| 20 | ~350s | ~8s |
| 40 | ~310s | ~8s |
| 60 | ~230s | ~7s |
| 80 | ~180s | ~5s |
| 100 | ~120s | ~3s |
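
As a rough check on the two latency trends, the sketch below computes the overall reduction factor for each series and the share of the first-token latency drop that occurs by the 20% reading. It uses only the approximate values from the table above; the names are illustrative.

```python
# Approximate values read from chart (b); they mirror the table above.
hit_rate = [0, 20, 40, 60, 80, 100]               # cache hit rate, %
total_latency = [430, 350, 310, 230, 180, 120]    # seconds
first_token_latency = [25, 8, 8, 7, 5, 3]         # seconds

# Reduction factor between the 0% and 100% hit rate readings.
total_reduction = total_latency[0] / total_latency[-1]
first_token_reduction = first_token_latency[0] / first_token_latency[-1]

# Share of the overall first-token drop already realized at the 20% reading.
overall_drop = first_token_latency[0] - first_token_latency[-1]
early_drop = first_token_latency[0] - first_token_latency[1]
early_share = early_drop / overall_drop

print(f"total latency reduction: {total_reduction:.2f}x")
print(f"first token latency reduction: {first_token_reduction:.2f}x")
print(f"share of first-token drop by 20% hit rate: {early_share:.0%}")
```

On these readings, total latency falls by about 3.6x and first token latency by about 8.3x, with roughly three quarters of the first-token improvement already present at the 20% reading.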

---

## Chart (c): Throughput Ablation Study across Workloads

### Metadata
- **Y-Axis:** Throughput (Normalized) [Range: 0.00 to 1.00]
- **X-Axis Categories:** LLM Judge, Tree of Thought, MMLU, Multi-Turn Chat (short)
- **Legend (Top):**
    *   **Light Gray:** No Cache
    *   **Dark Gray:** No Tree Structure
    *   **Dark Green:** FCFS Schedule
    *   **Medium Green:** Random Schedule
    *   **Dark Blue:** No Frontend Parallelism
    *   **Light Blue:** No Frontend Hint
    *   **Orange:** Full Optimization

### Data Table (Normalized Throughput)

| Configuration | LLM Judge | Tree of Thought | MMLU | Multi-Turn Chat |
| :--- | :---: | :---: | :---: | :---: |
| **No Cache** (Light Gray) | ~0.40 | ~0.28 | ~0.18 | ~0.53 |
| **No Tree Structure** (Dark Gray) | ~0.45 | ~0.35 | ~0.60 | ~0.88 |
| **FCFS Schedule** (Dark Green) | ~0.15 | ~0.35 | ~0.98 | ~0.53 |
| **Random Schedule** (Med Green) | ~0.50 | ~0.45 | ~0.98 | ~0.68 |
| **No Frontend Parallelism** (Dark Blue) | ~0.52 | ~0.35 | ~0.98 | ~0.95 |
| **No Frontend Hint** (Light Blue) | ~0.50 | ~0.88 | ~0.98 | ~0.98 |
| **Full Optimization** (Orange) | **1.00** | **1.00** | **1.00** | **1.00** |

### Key Observations
*   **Full Optimization** is the highest-throughput configuration (1.00, the normalization reference) on all four workloads.
*   **Tree of Thought** is most sensitive to "No Cache" (~0.28) and stays below ~0.45 for every ablation except "No Frontend Hint" (~0.88).
*   **MMLU** degrades sharply under "No Cache" (~0.18) and moderately under "No Tree Structure" (~0.60), but stays at ~0.98 for all other configurations.
*   **LLM Judge** is the workload hit hardest by the "FCFS Schedule" (~0.15, versus ~0.35 to ~0.98 for the other workloads).
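
The per-workload comparisons can be reproduced from the data table with a short script. This is a minimal sketch that assumes the approximate normalized values listed above; it reports, for each workload, the ablation or ablations with the lowest throughput and the implied speedup of Full Optimization over that configuration. The helper name `worst_ablations` is illustrative.

```python
# Approximate normalized throughput read from chart (c); see the data table.
workloads = ["LLM Judge", "Tree of Thought", "MMLU", "Multi-Turn Chat"]
ablations = {
    "No Cache":                [0.40, 0.28, 0.18, 0.53],
    "No Tree Structure":       [0.45, 0.35, 0.60, 0.88],
    "FCFS Schedule":           [0.15, 0.35, 0.98, 0.53],
    "Random Schedule":         [0.50, 0.45, 0.98, 0.68],
    "No Frontend Parallelism": [0.52, 0.35, 0.98, 0.95],
    "No Frontend Hint":        [0.50, 0.88, 0.98, 0.98],
}


def worst_ablations(column):
    """Return (lowest throughput, ablations reaching it) for one workload."""
    values = {name: row[column] for name, row in ablations.items()}
    lowest = min(values.values())
    return lowest, [name for name, v in values.items() if v == lowest]


summary = {w: worst_ablations(i) for i, w in enumerate(workloads)}

for workload, (lowest, names) in summary.items():
    # Full Optimization is 1.00, so the implied speedup is 1 / lowest.
    print(f"{workload}: {', '.join(names)} at {lowest:.2f} "
          f"(full optimization ~{1 / lowest:.1f}x faster)")
```

Because the inputs are visual estimates, ties such as "No Cache" and "FCFS Schedule" on Multi-Turn Chat (both ~0.53) should be read as approximately equal rather than exactly equal.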