# Technical Document Extraction: Image Analysis
## Subplot (a)
- **Title**: Not explicitly labeled (implied by axes)
- **X-Axis**: "Cache Hit Rate (%)" (0% to 100%)
- **Y-Axis**:
  - "Batch Size" (green line)
  - "Throughput (tokens/s)" (orange line)
- **Legend**:
  - Green: Batch Size
  - Orange: Throughput (tokens/s)
- **Trends**:
  - Both metrics increase monotonically with cache hit rate.
  - Throughput peaks at ~1.2k tokens/s at 100% cache hit rate.
  - Batch Size peaks at ~40 at 100% cache hit rate.
## Subplot (b)
- **Title**: Not explicitly labeled (implied by axes)
- **X-Axis**: "Cache Hit Rate (%)" (0% to 100%)
- **Y-Axis**:
  - "Total Latency (s)" (red line)
  - "First Token Latency (s)" (blue line)
- **Legend**:
  - Red: Total Latency
  - Blue: First Token Latency
- **Trends**:
  - Both latencies decrease sharply as cache hit rate increases.
  - Total Latency drops from ~400s to ~100s.
  - First Token Latency drops from ~200s to ~10s.
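
The latency drops above are easier to compare as ratios. The following is a minimal sketch, assuming the quoted values are the endpoints of each curve (lowest and highest cache hit rate shown). The numbers are approximate readings from the plot, not measured data, and the names `latency_s` and `reduction_factor` are illustrative.

```python
# Approximate endpoint values read off subplot (b); estimates, not measured data.
# Assumption: the first value is at the lowest hit rate, the second at the highest.
latency_s = {
    "Total Latency": (400.0, 100.0),
    "First Token Latency": (200.0, 10.0),
}


def reduction_factor(start: float, end: float) -> float:
    """Return how many times smaller the end value is than the start value."""
    return start / end


factors = {
    name: reduction_factor(start, end)
    for name, (start, end) in latency_s.items()
}

for name, factor in factors.items():
    print(f"{name}: ~{factor:.0f}x lower")
```

With these readings, total latency is roughly 4x lower and first token latency roughly 20x lower at the high end of the hit-rate range, so the first-token metric is the more sensitive of the two.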
## Subplot (c)
- **Title**: Not explicitly labeled (implied by axes)
- **X-Axis**: Categorical labels:
  - LLM Judge
  - Tree of Thought
  - MMLU
  - Multi-Turn Chat(short)
- **Y-Axis**: "Normalized Throughput" (0.0 to 1.0)
- **Legend**:
  - Gray: No Cache
  - Dark Gray: No Tree Structure
  - Green: FCFS Schedule
  - Blue: No Frontend Hint
  - Orange: Full Optimization
- **Bar Groups**:
  - Each category has 5 bars (one per legend label).
  - Approximate bar heights per category:
    - **LLM Judge**:
      - No Cache: ~0.4
      - No Tree Structure: ~0.5
      - FCFS Schedule: ~0.2
      - No Frontend Hint: ~0.55
      - Full Optimization: ~1.0
    - **Tree of Thought**:
      - No Cache: ~0.3
      - No Tree Structure: ~0.4
      - FCFS Schedule: ~0.35
      - No Frontend Hint: ~0.8
      - Full Optimization: ~1.0
    - **MMLU**:
      - No Cache: ~0.1
      - No Tree Structure: ~0.6
      - FCFS Schedule: ~0.9
      - No Frontend Hint: ~0.95
      - Full Optimization: ~1.0
    - **Multi-Turn Chat(short)**:
      - No Cache: ~0.5
      - No Tree Structure: ~0.7
      - FCFS Schedule: ~0.6
      - No Frontend Hint: ~0.9
      - Full Optimization: ~1.0
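
The bar heights above can be turned into a quick ablation summary. The sketch below is illustrative only: it transcribes the approximate values from this section into a dictionary and reports, for each workload, which disabled component costs the most throughput relative to Full Optimization. The names `normalized_throughput` and `costliest_ablation` are hypothetical helpers, not part of any library.

```python
# Approximate bar heights transcribed from subplot (c); estimates only.
normalized_throughput = {
    "LLM Judge": {
        "No Cache": 0.4, "No Tree Structure": 0.5, "FCFS Schedule": 0.2,
        "No Frontend Hint": 0.55, "Full Optimization": 1.0,
    },
    "Tree of Thought": {
        "No Cache": 0.3, "No Tree Structure": 0.4, "FCFS Schedule": 0.35,
        "No Frontend Hint": 0.8, "Full Optimization": 1.0,
    },
    "MMLU": {
        "No Cache": 0.1, "No Tree Structure": 0.6, "FCFS Schedule": 0.9,
        "No Frontend Hint": 0.95, "Full Optimization": 1.0,
    },
    "Multi-Turn Chat(short)": {
        "No Cache": 0.5, "No Tree Structure": 0.7, "FCFS Schedule": 0.6,
        "No Frontend Hint": 0.9, "Full Optimization": 1.0,
    },
}


def costliest_ablation(bars: dict[str, float]) -> tuple[str, float]:
    """Return the ablation with the lowest throughput and its relative loss."""
    full = bars["Full Optimization"]
    ablations = {k: v for k, v in bars.items() if k != "Full Optimization"}
    name = min(ablations, key=ablations.get)
    return name, 1.0 - ablations[name] / full


summary = {
    workload: costliest_ablation(bars)
    for workload, bars in normalized_throughput.items()
}

for workload, (name, loss) in summary.items():
    print(f"{workload}: {name} loses ~{loss:.0%} of full throughput")
```

With these estimates, removing the cache is the costliest ablation for Tree of Thought, MMLU, and Multi-Turn Chat(short), while FCFS scheduling is the costliest for LLM Judge.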
## Cross-Referenced Observations
1. **Legend Consistency**:
   - Subplot (a): Green/orange lines match legend labels.
   - Subplot (b): Red/blue lines match legend labels.
   - Subplot (c): Bar colors match legend labels (e.g., orange bars = Full Optimization).
2. **Performance Correlation**:
   - Higher cache hit rates correlate with larger batch sizes and higher throughput (subplot a) and with lower total and first token latency (subplot b).
   - Full Optimization reaches a normalized throughput of ~1.0 in every category (subplot c), which is consistent with throughput being normalized to the fully optimized configuration. Every ablation falls below it.
## Notes
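- None of the three subplots has an explicit title; the descriptions above are inferred from axis labels and legends.
- Subplot (c) uses grouped bars to compare optimization strategies across tasks.
- Subplots (a) and (b) report absolute values (batch size, tokens/s, seconds); subplot (c) reports throughput normalized to the range 0.0 to 1.0.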