## Bar Chart: Decode Throughput by Batch Size
### Overview
This is a stacked bar chart comparing the decode throughput (in tokens per second) of three different systems or configurations across two batch sizes (10 and 11). The chart visually demonstrates how total throughput changes with batch size and how the contribution from each system varies.
### Components/Axes
* **Chart Type:** Stacked Bar Chart.
* **X-Axis:** Labeled **"Batch Size"**. It has two discrete categories: **10** and **11**.
* **Y-Axis:** Labeled **"Decode Throughput (tokens/sec)"**. The scale runs from 0 to 1200, with major tick marks at intervals of 200 (0, 200, 400, 600, 800, 1000, 1200).
* **Legend:** Positioned in the top-left corner of the chart area. It contains three entries:
    1. **SGLang non-deterministic** (represented by a light blue color).
    2. **SGLang deterministic** (represented by a reddish-brown color).
    3. **LLM-42** (represented by a green color).
### Detailed Analysis
The chart presents data for two batch sizes. Each bar is a stack of segments corresponding to the systems in the legend.
**Batch Size 10:**
* The bar consists of a single segment.
* **SGLang non-deterministic (Blue):** This segment forms the entire bar. Its height reaches approximately **830 tokens/sec** on the y-axis.
* **Total Throughput for Batch Size 10:** ~830 tokens/sec.
**Batch Size 11:**
* The bar is composed of three stacked segments, from bottom to top:
    1. **SGLang deterministic (Reddish-brown):** This is the base segment. Its height reaches approximately **410 tokens/sec**.
    2. **LLM-42 (Green):** This segment is stacked on top of the reddish-brown one. It starts at ~410 and ends at approximately **910 tokens/sec**, so its individual contribution is approximately 910 - 410 = **500 tokens/sec**.
    3. **SGLang non-deterministic (Blue):** This is a very thin segment stacked on top of the green one. It starts at ~910 and ends at approximately **930 tokens/sec**, so its individual contribution is approximately 930 - 910 = **20 tokens/sec**.
* **Total Throughput for Batch Size 11:** ~930 tokens/sec.
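
The segment arithmetic above can be checked with a minimal sketch. It assumes the values are the approximate cumulative tops read off the y-axis; the variable and function names (`stack_tops`, `segment_heights`) are illustrative and do not come from the chart or from any tool.

```python
# Approximate cumulative tops (tokens/sec) read off the chart for batch size 11,
# listed bottom to top. These are visual estimates, not measured values.
stack_tops = {
    "SGLang deterministic": 410,
    "LLM-42": 910,
    "SGLang non-deterministic": 930,
}


def segment_heights(tops):
    """Convert cumulative tops of a stacked bar into per-segment heights."""
    heights = {}
    previous_top = 0
    for name, top in tops.items():  # dicts preserve insertion order
        heights[name] = top - previous_top
        previous_top = top
    return heights


heights = segment_heights(stack_tops)
total = sum(heights.values())

for name, height in heights.items():
    print(f"{name}: ~{height} tokens/sec")
print(f"Total: ~{total} tokens/sec")
```

Subtracting each segment's lower boundary from its upper boundary is what makes a stacked bar readable as individual contributions; the total is simply the top of the last segment.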
### Key Observations
1. **Throughput Increase with Batch Size:** The total decode throughput increases from ~830 tokens/sec at batch size 10 to ~930 tokens/sec at batch size 11.
2. **System Contribution Shift:** At batch size 10, the entire throughput is attributed to "SGLang non-deterministic". At batch size 11, the composition changes dramatically:
    * "LLM-42" becomes the largest contributor (~500 tokens/sec).
    * "SGLang deterministic" provides the second-largest contribution (~410 tokens/sec).
    * The contribution from "SGLang non-deterministic" shrinks to a very small fraction (~20 tokens/sec).
3. **Dominant System at Batch Size 11:** The "LLM-42" system (green) appears to be the single largest contributor to throughput at batch size 11.
### Interpretation
This chart likely compares the performance of different Large Language Model (LLM) serving or inference systems ("SGLang" in deterministic and non-deterministic modes, and "LLM-42"). The data suggests several technical insights:
* **Batch Size Impact:** Increasing the batch size from 10 to 11 yields a moderate overall throughput improvement (~12% increase). This is consistent with the general principle that larger batches can improve hardware utilization.
* **System Behavior Change:** The most striking finding is the near-complete shift in which system contributes throughput. At batch size 10, only the non-deterministic SGLang mode appears in the bar. At batch size 11, the deterministic mode and the LLM-42 system account for almost all of the bar, while the non-deterministic mode's share falls to about 2% (~20 of ~930 tokens/sec). This could indicate:
    * A system configuration or scheduling policy that activates different backends based on batch size.
    * A performance bottleneck or resource contention that prevents the non-deterministic mode from scaling effectively to batch size 11, while other systems handle the load.
    * An experimental setup where different systems are tested at different, non-overlapping batch sizes.
* **Performance of LLM-42:** The "LLM-42" system demonstrates strong performance at batch size 11, contributing over half of the total throughput. This positions it as a potentially high-throughput option for that specific workload configuration.
* **Deterministic vs. Non-deterministic:** The "SGLang deterministic" mode shows a clear ability to handle a significant portion of the load at batch size 11, whereas the non-deterministic mode does not scale similarly in this test.
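
The percentages quoted in this interpretation follow from the approximate chart readings. The sketch below recomputes them; the inputs are visual estimates, and the variable names are illustrative assumptions.

```python
# Approximate readings from the chart (tokens/sec); visual estimates only.
total_bs10 = 830
segments_bs11 = {
    "SGLang deterministic": 410,
    "LLM-42": 500,
    "SGLang non-deterministic": 20,
}

total_bs11 = sum(segments_bs11.values())

# Relative change in total throughput from batch size 10 to 11.
relative_increase = (total_bs11 - total_bs10) / total_bs10

# Fraction of the batch-size-11 bar taken up by each segment.
shares = {name: value / total_bs11 for name, value in segments_bs11.items()}
largest = max(shares, key=shares.get)

print(f"Throughput increase: {relative_increase:.1%}")
for name, share in shares.items():
    print(f"{name}: {share:.1%} of total")
print(f"Largest contributor: {largest}")
```

With these readings the increase comes out at roughly 12%, and LLM-42 accounts for a little over half of the batch-size-11 bar. Because the inputs are read off the axis by eye, the results should be treated as approximate.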
**In summary, the chart suggests that total throughput depends not only on batch size but also on which system or mode handles the workload. The move from batch size 10 to 11 coincides with a major change in composition, with LLM-42 and deterministic SGLang becoming the primary throughput drivers.**