## CDF Chart: End-to-End Latency Comparison
### Overview
This image displays a Cumulative Distribution Function (CDF) plot comparing the end-to-end (E2E) latency of two systems: "SGLang" (in deterministic and non-deterministic modes) and "LLM-42" at six percentage-based configurations (@2% through @100%). For each latency value on the x-axis, a curve gives the fraction of requests whose latency is less than or equal to that value, so curves further to the left indicate lower latency.
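
As a minimal sketch of how a curve like the ones in this chart is typically built, the snippet below computes an empirical CDF from a list of per-request latencies. The function name `empirical_cdf` and the sample values are hypothetical and chosen only for illustration; they are not data from the chart, and the chart's authors may have used a different tool.

```python
from bisect import bisect_right


def empirical_cdf(latencies_ms):
    """Return a function F where F(t) is the fraction of samples <= t."""
    ordered = sorted(latencies_ms)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one latency sample")

    def cdf(t_ms):
        # bisect_right returns the number of samples that are <= t_ms.
        return bisect_right(ordered, t_ms) / n

    return cdf


# Made-up latencies in milliseconds (illustration only, not chart data).
samples = [1200, 3400, 5100, 6800, 9500, 14000, 21000, 38000]
F = empirical_cdf(samples)
print(F(7000))  # 4 of the 8 samples are <= 7000 ms, so this prints 0.5
```

Plotting `F(t)` against `t` gives a step curve that starts at 0.0 and reaches 1.0 at the largest observed latency, which is the shape each line in the chart follows.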
### Components/Axes
* **Chart Type:** Cumulative Distribution Function (CDF) line chart.
* **X-Axis:** Labeled **"E2E Latency (ms)"**. It represents time in milliseconds, with major tick marks at 0, 20000, 40000, 60000, 80000, and 100000.
* **Y-Axis:** Labeled **"CDF"**. It represents the cumulative probability, ranging from 0.0 to 1.0, with major tick marks at 0.0, 0.2, 0.4, 0.6, 0.8, and 1.0.
* **Legend:** Positioned in the **bottom-right quadrant** of the chart area. It contains 8 entries, each with a colored line sample and a text label:
1. Green line: `SGLang non-deterministic`
2. Red line: `SGLang deterministic`
3. Blue line: `LLM-42 @2%`
4. Orange line: `LLM-42 @5%`
5. Purple line: `LLM-42 @10%`
6. Brown line: `LLM-42 @20%`
7. Pink line: `LLM-42 @50%`
8. Cyan line: `LLM-42 @100%`
* **Grid:** A light gray, dashed grid is present for both major X and Y axis ticks.
### Detailed Analysis
**Trend Verification & Data Points:**
All lines originate at (0 ms, 0.0 CDF) and rise to approach or reach a CDF of 1.0, indicating all measured requests complete within the displayed latency range.
1. **SGLang Lines (Green & Red):**
* **Trend:** Both lines exhibit the steepest initial slope, rising very rapidly.
* **Data Points:** They cross the 0.8 CDF mark at approximately 5,000-7,000 ms. They reach a CDF of ~0.95 by 20,000 ms and converge to 1.0 shortly after 40,000 ms. The green (non-deterministic) and red (deterministic) lines are nearly coincident, with the red line appearing marginally to the left (slightly lower latency) in the 0.6-0.9 CDF range.
2. **LLM-42 Lines (Blue, Orange, Purple, Brown, Pink, Cyan):**
* **General Trend:** These lines show a clear gradient. As the percentage in the label increases, the curve shifts to the right, indicating higher latency for the same cumulative probability.
* **LLM-42 @2% (Blue) & @5% (Orange):** These lines closely follow the SGLang lines, being nearly indistinguishable from them in the lower latency region (CDF < 0.8). They are the best-performing among the LLM-42 variants.
* **LLM-42 @10% (Purple):** Begins to show a slight rightward shift compared to the @2%/@5% lines, especially noticeable above CDF 0.6.
* **LLM-42 @20% (Brown):** Shows a more pronounced rightward shift. It crosses 0.8 CDF at approximately 15,000 ms.
* **LLM-42 @50% (Pink):** Has a significantly more gradual slope. It crosses 0.8 CDF at approximately 25,000 ms.
* **LLM-42 @100% (Cyan):** Exhibits the most gradual slope and highest latency. It crosses 0.8 CDF at approximately 35,000 ms and does not reach a CDF of 1.0 until near the 100,000 ms mark.
### Key Observations
1. **Performance Hierarchy:** The systems can be grouped by performance: SGLang (both modes) and LLM-42 @2%/@5% form a high-performance cluster. Latency increases progressively for LLM-42 @10%, @20%, @50%, and @100%.
2. **SGLang Deterministic vs. Non-deterministic:** The difference between these two modes is minimal. The deterministic curve appears very slightly to the left in the 0.6-0.9 CDF range, but the two lines are nearly coincident throughout.
3. **LLM-42 Parameter Impact:** The curves are ordered monotonically by the percentage in the LLM-42 label: a higher percentage never yields a curve to the left of a lower one. The shift is negligible at @2% and @5%, becomes visible at @10%, and is most pronounced at @50% and @100%.
4. **Tail Latency:** The "tail" of the distribution (CDF > 0.9) shows the most dramatic differences. For example, to serve 95% of requests (CDF=0.95), SGLang requires ~20,000 ms, while LLM-42 @100% requires nearly 60,000 ms.
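
Reading a tail latency off a CDF, as in the CDF=0.95 comparison above, is equivalent to computing a percentile of the raw latency samples. The sketch below uses the nearest-rank definition; the helper name `percentile_latency` and the sample values are hypothetical, and other percentile definitions (for example, linear interpolation) can give slightly different results.

```python
import math


def percentile_latency(latencies_ms, q):
    """Nearest-rank percentile: the smallest sample t such that at least
    a fraction q of all samples are <= t."""
    if not 0 < q <= 1:
        raise ValueError("q must be in the interval (0, 1]")
    ordered = sorted(latencies_ms)
    if not ordered:
        raise ValueError("need at least one latency sample")
    rank = math.ceil(q * len(ordered))  # 1-based rank into the sorted list
    return ordered[rank - 1]


# Made-up latencies in milliseconds (illustration only, not chart data).
samples = [1200, 3400, 5100, 6800, 9500, 14000, 21000, 38000]
p50 = percentile_latency(samples, 0.5)
p95 = percentile_latency(samples, 0.95)
print(p50, p95)
```

On a CDF plot, the same value is found by drawing a horizontal line at the chosen CDF level and reading the x-coordinate where it first meets the curve.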
### Interpretation
This chart shows the latency cost of the LLM-42 percentage setting. The percentage value (e.g., @2%, @100%) likely represents a configuration parameter that trades latency against another resource or quality metric (such as computational cost, memory usage, or output quality/fidelity); the chart itself does not say what that metric is. The data suggests:
* **SGLang** achieves the lowest latency in this comparison in both modes, with about 80% of requests completing within roughly 5,000-7,000 ms.
* **LLM-42** offers a tunable parameter. At low settings (@2%, @5%), it can match SGLang's latency performance. However, increasing this parameter, presumably to gain benefits in another dimension not shown in this chart, comes at a significant and predictable cost to response time.
* The **key takeaway** is that the chart does not just show "which is faster"; it also shows how latency grows with the LLM-42 setting. The consistent, graded spacing between the LLM-42 curves suggests a well-behaved, predictable relationship between the configuration parameter and its latency impact. This allows a user to make an informed engineering trade-off: selecting the highest LLM-42 percentage that still meets their application's latency budget. The near-overlap of SGLang with LLM-42 @2%/@5% suggests that for latency-critical applications, SGLang or LLM-42 at minimal settings are the viable choices.
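
The budget-based selection described above can be sketched as follows. The latencies are the approximate CDF=0.8 crossing points quoted in the Detailed Analysis for @20%, @50%, and @100%; they are eyeballed from the chart rather than measured, and the function name `highest_setting_within_budget` is hypothetical.

```python
# Approximate latency (ms) at which each LLM-42 curve crosses CDF = 0.8,
# as quoted in the description above. Eyeballed values, not measurements.
P80_MS = {20: 15_000, 50: 25_000, 100: 35_000}


def highest_setting_within_budget(p80_by_setting, budget_ms):
    """Return the largest percentage setting whose 80th-percentile latency
    fits the budget, or None if no listed setting fits."""
    feasible = [s for s, latency in p80_by_setting.items() if latency <= budget_ms]
    return max(feasible) if feasible else None


print(highest_setting_within_budget(P80_MS, 30_000))  # 50
print(highest_setting_within_budget(P80_MS, 10_000))  # None
```

Taking the maximum feasible setting relies on the curves being ordered monotonically by percentage, as observed in the chart; a real selection would use measured percentiles at the level the application cares about (for example, 95% or 99% rather than 80%).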