## CDF Plot: Latency Distribution Comparison
### Overview
The chart compares cumulative distribution functions (CDFs) of end-to-end (E2E) latency for two systems: SGLang (non-deterministic and deterministic variants) and LLM-42 at six settings (2%, 5%, 10%, 20%, 50%, 100%). Each setting is drawn as its own curve, so the percentage is a configuration parameter of LLM-42, not a latency percentile; the legend does not say what it denotes. The x-axis shows latency in milliseconds (ms), and the y-axis shows the CDF value (0.0 to 1.0).
### Components/Axes
- **X-axis**: E2E Latency (ms), ranging from 0 to 100,000 ms with grid lines at 20,000 ms intervals.
- **Y-axis**: CDF (0.0 to 1.0), marked in 0.2 increments.
- **Legend**: Positioned in the top-right corner, with color-coded labels:
  - Green: SGLang non-deterministic
  - Red: SGLang deterministic
  - Blue: LLM-42 @2%
  - Orange: LLM-42 @5%
  - Purple: LLM-42 @10%
  - Brown: LLM-42 @20%
  - Pink: LLM-42 @50%
  - Cyan: LLM-42 @100%
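
For reference, the y-axis value at latency x is the fraction of requests that completed within x ms. The following is a minimal sketch of how such a curve could be computed from raw samples. The function names and the latency values are hypothetical and are not taken from the chart.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return (sorted_values, cumulative_fractions) for latency samples."""
    xs = sorted(samples)
    n = len(xs)
    # The i-th smallest sample (1-based) has CDF value i / n.
    fractions = [(i + 1) / n for i in range(n)]
    return xs, fractions


def cdf_at(samples, threshold_ms):
    """Fraction of samples with latency <= threshold_ms."""
    xs = sorted(samples)
    return bisect_right(xs, threshold_ms) / len(xs)


# Hypothetical latencies in ms, for illustration only.
latencies_ms = [4000, 9000, 12000, 15000, 20000]
xs, fractions = empirical_cdf(latencies_ms)

for x, f in zip(xs, fractions):
    print(f"{x:>6} ms -> CDF {f:.1f}")
```

A curve "reaches CDF=1.0" at the largest observed latency, which is why that point is used in this description as a proxy for worst-case latency.
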
### Detailed Analysis
1. **SGLang non-deterministic (green)**:
   - Starts at (0, 0) and rises sharply.
   - Reaches CDF=1.0 at ~20,000 ms.
   - Smooth, steep curve with minimal variance.
2. **SGLang deterministic (red)**:
   - Similar shape to non-deterministic but shifted to higher latency.
   - Reaches CDF=1.0 at ~30,000 ms.
   - Overlaps with LLM-42 @10% (purple) at ~30,000 ms.
3. **LLM-42 settings**:
   - **@2% (blue)**: Reaches CDF=1.0 at ~20,000 ms (matches SGLang non-deterministic).
   - **@5% (orange)**: Reaches CDF=1.0 at ~25,000 ms.
   - **@10% (purple)**: Reaches CDF=1.0 at ~30,000 ms (overlaps SGLang deterministic).
   - **@20% (brown)**: Reaches CDF=1.0 at ~35,000 ms.
   - **@50% (pink)**: Reaches CDF=1.0 at ~45,000 ms.
   - **@100% (cyan)**: Reaches CDF=1.0 at ~60,000 ms.
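
The readings above can be collected into a small lookup and grouped by the point where each curve reaches CDF=1.0, which makes the two overlaps explicit. This is a sketch only: the values are the approximate figures listed above, and the variable names are hypothetical.

```python
from collections import defaultdict

# Approximate latency (ms) at which each curve reaches CDF = 1.0,
# transcribed from the readings in this section (not exact data).
endpoint_ms = {
    "SGLang non-deterministic": 20_000,
    "SGLang deterministic": 30_000,
    "LLM-42 @2%": 20_000,
    "LLM-42 @5%": 25_000,
    "LLM-42 @10%": 30_000,
    "LLM-42 @20%": 35_000,
    "LLM-42 @50%": 45_000,
    "LLM-42 @100%": 60_000,
}

# Group curves that share the same endpoint.
groups = defaultdict(list)
for name, ms in endpoint_ms.items():
    groups[ms].append(name)

for ms in sorted(groups):
    print(f"{ms:>6} ms: {', '.join(groups[ms])}")
```
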
### Key Observations
- **SGLang non-deterministic** reaches CDF=1.0 earliest (~20,000 ms), ahead of every LLM-42 setting except @2%, which matches it.
- **SGLang deterministic** reaches CDF=1.0 at the same point as LLM-42 @10% (~30,000 ms), so the two have similar worst-case latency at that setting.
- **LLM-42 @100%** has the highest latency, reaching CDF=1.0 at ~60,000 ms, the longest tail of any curve.
- The LLM-42 curves rise more gradually than the SGLang curves, implying a broader latency distribution.
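
The link between curve steepness and spread can be made concrete: a steep CDF packs most requests into a narrow latency band, so the gap between a low and a high quantile is small. The sketch below uses a nearest-rank quantile and two hypothetical sample sets; none of the numbers come from the chart.

```python
import math


def quantile(samples, pct):
    """Nearest-rank quantile: smallest sample whose CDF is >= pct percent."""
    xs = sorted(samples)
    rank = max(1, math.ceil(pct * len(xs) / 100))
    return xs[rank - 1]


def spread(samples, lo_pct=10, hi_pct=90):
    """Width of the latency band between two quantiles, in ms."""
    return quantile(samples, hi_pct) - quantile(samples, lo_pct)


# Hypothetical samples: one tightly clustered, one widely dispersed.
steep = [18000, 18500, 19000, 19500, 20000,
         20000, 20500, 21000, 21500, 22000]
gradual = [10000, 15000, 20000, 25000, 30000,
           35000, 40000, 45000, 50000, 60000]

print("steep spread (ms):  ", spread(steep))
print("gradual spread (ms):", spread(gradual))
```
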
### Interpretation
Of the curves shown, **SGLang non-deterministic** has the lowest and most tightly clustered latency, which makes it the best fit for latency-sensitive applications. SGLang deterministic reaches CDF=1.0 about 10,000 ms later than its non-deterministic counterpart (~30,000 ms versus ~20,000 ms), which puts its worst-case latency level with LLM-42 @10%.
LLM-42's latency depends strongly on its setting: at @2% it matches SGLang non-deterministic, while at @100% its worst-case latency is about three times as high (~60,000 ms versus ~20,000 ms). Because each setting is a separate curve, these are differences between configurations, not between percentiles of a single distribution. This suggests LLM-42 may trade tail latency for throughput or flexibility at the higher settings, although the chart alone does not show what is gained. SGLang deterministic appears to offer a middle ground, accepting moderate extra latency in exchange for deterministic behavior.
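
The two headline figures in this interpretation follow directly from the approximate CDF=1.0 readings. A quick check of the arithmetic, assuming those readings (variable names are hypothetical):

```python
# Approximate CDF = 1.0 points in ms, as read from the chart description.
non_det_ms = 20_000      # SGLang non-deterministic
det_ms = 30_000          # SGLang deterministic
llm42_100_ms = 60_000    # LLM-42 @100%

# Extra worst-case latency of the deterministic variant.
determinism_delay_ms = det_ms - non_det_ms

# The same gap as a fraction of the non-deterministic baseline.
determinism_overhead = det_ms / non_det_ms - 1

# Worst-case latency of LLM-42 @100% relative to the baseline.
tail_ratio = llm42_100_ms / non_det_ms

print(f"delay:    {determinism_delay_ms} ms")
print(f"overhead: {determinism_overhead:.0%}")
print(f"ratio:    {tail_ratio:.1f}x")
```

These ratios compare only the endpoints of the curves; they say nothing about how the systems differ at the median or other points of the distribution.
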