## Line Chart: Model Performance vs. Inference-Time Compute
### Overview
This line chart plots score (%) against inference-time compute for two versions of an AI model called "Gemini Deep Think" and marks a single reference point for a third model, "Aletheia." The x-axis uses a base-2 logarithmic scale. The chart shows how scaling inference-time compute affects each model's score on a given task.
### Components/Axes
* **Chart Type:** Line chart with markers.
* **X-Axis:**
    * **Label:** "Inference-Time Compute (Log Scale)"
    * **Scale:** Logarithmic, base 2.
    * **Markers/Ticks:** 2⁰, 2¹, 2², 2³, 2⁴, 2⁵, 2⁶, 2⁷, 2⁸, 2⁹, 2¹⁰, 2¹¹.
* **Y-Axis:**
    * **Label:** "Score (%)"
    * **Scale:** Linear, from 30 to 90+.
    * **Major Gridlines:** At 30, 40, 50, 60, 70, 80, 90.
* **Legend (Position: Bottom-right corner):**
    1. **Blue line with circle markers:** "Gemini Deep Think (advanced version, Jan 2026)"
    2. **Orange line with circle markers:** "Gemini Deep Think (IMO Gold, Jul 2025)"
    3. **Red star marker:** "Aletheia"
* **Other Elements:** A light gray grid is present in the background.
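
To make the base-2 axis concrete, the following minimal sketch converts the tick exponents into raw compute values. The chart does not state the unit of compute, so the values are treated here as relative multiples of the smallest budget; that reading is an assumption.

```python
import math

# Tick exponents 0..11, matching the x-axis labels 2^0 through 2^11.
exponents = list(range(12))

# Raw compute value at each tick (unit unspecified in the chart; assumed relative).
ticks = [2 ** k for k in exponents]

# On a base-2 log axis a value v is drawn at position log2(v),
# so consecutive ticks are evenly spaced even though each one doubles the compute.
positions = [math.log2(v) for v in ticks]

for k, v in zip(exponents, ticks):
    print(f"2^{k} = {v}")
```

Under this reading, the 2⁸ tick corresponds to 256 units and the rightmost tick, 2¹¹, to 2048 units.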
### Detailed Analysis
**Data Series 1: Gemini Deep Think (advanced version, Jan 2026) - Blue Line**
* **Trend:** Rises steeply and consistently from ~40% at 2⁰ to ~90% at 2⁸, then plateaus, fluctuating between roughly 85% and 90% from 2⁸ to 2¹¹.
* **Data Points (Approximate):**
    * 2⁰: ~40%
    * 2³: ~67%
    * 2⁴: ~73%
    * 2⁵: ~78%
    * 2⁶: ~82%
    * 2⁷: ~85%
    * 2⁸: ~90%
    * 2⁹: ~85%
    * 2¹⁰: ~90%
    * 2¹¹: ~88%
**Data Series 2: Gemini Deep Think (IMO Gold, Jul 2025) - Orange Line**
* **Trend:** Dips slightly from 2⁰ to 2³, then rises steeply through 2⁶ and more gradually up to 2¹⁰ before slipping at 2¹¹. At every plotted compute level it scores lower than the advanced version.
* **Data Points (Approximate):**
    * 2⁰: ~33%
    * 2³: ~29% (Local minimum)
    * 2⁴: ~43%
    * 2⁶: ~58%
    * 2⁷: ~59%
    * 2⁸: ~63%
    * 2⁹: ~66%
    * 2¹⁰: ~68%
    * 2¹¹: ~65%
**Data Point 3: Aletheia - Red Star**
* **Position:** Located at x=2⁸, y=~95%.
* **Note:** This is a single data point, not a series. At the same compute level (2⁸), it sits roughly 5 points above the blue line (~90%) and roughly 32 points above the orange line (~63%).
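
The approximate readings above can be collected into a small data structure to check the gaps between the series. This is a minimal sketch: the numbers are the visual estimates listed in this section rather than exact values, and the names `advanced`, `imo_gold`, `aletheia`, and `gap_table` are illustrative.

```python
# Approximate scores read off the chart; keys are exponents k for compute 2**k.
advanced = {0: 40, 3: 67, 4: 73, 5: 78, 6: 82, 7: 85, 8: 90, 9: 85, 10: 90, 11: 88}
imo_gold = {0: 33, 3: 29, 4: 43, 6: 58, 7: 59, 8: 63, 9: 66, 10: 68, 11: 65}
aletheia = {8: 95}  # single point, not a series


def gap_table(a, b):
    """Return a - b at every exponent where both series have a reading."""
    shared = sorted(a.keys() & b.keys())
    return {k: a[k] - b[k] for k in shared}


gaps = gap_table(advanced, imo_gold)
for k, gap in gaps.items():
    print(f"2^{k}: advanced leads IMO Gold by ~{gap} points")

# Aletheia's margin over each Gemini version at the shared compute level 2^8.
print("Aletheia vs advanced at 2^8:", aletheia[8] - advanced[8])
print("Aletheia vs IMO Gold at 2^8:", aletheia[8] - imo_gold[8])
```

Exponent 5 is skipped in the comparison because no reading is listed for the IMO Gold version at 2⁵.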
### Key Observations
1. **Performance Gap:** The "advanced version (Jan 2026)" outperforms the "IMO Gold (Jul 2025)" version at every compute level where both have readings, by roughly 7 points at 2⁰ and by 19 to 38 points from 2³ onward. This indicates significant model improvement over approximately six months.
2. **Compute Scaling:** Both model versions score markedly higher at high compute than at low compute: the advanced version rises from ~40% to ~90%, and the IMO Gold version from ~33% to ~68%. This suggests that additional inference-time compute helps on this task.
3. **Diminishing Returns:** Both lines flatten after 2⁸ (256). The advanced version fluctuates between ~85% and ~90%, and the IMO Gold version gains only about 5 points by 2¹⁰ before dipping at 2¹¹. Beyond this point, each doubling of compute yields smaller performance gains.
4. **Aletheia's Efficiency:** The Aletheia model achieves the highest observed score (~95%) at a moderate compute level (2⁸), outperforming even the advanced Gemini model at that same point. Although this is a single data point, it suggests superior efficiency or capability for this specific task.
5. **Anomaly in Orange Line:** The IMO Gold version drops from ~33% at 2⁰ to ~29% at 2³, a dip that runs against its otherwise upward trend through 2¹⁰.
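
The diminishing-returns observation can be checked with a rough calculation of the average score change per doubling of compute, before and after 2⁸. This is a sketch under the assumption that the approximate readings from the Detailed Analysis are accurate enough for the comparison; `gain_per_doubling` is an illustrative helper, not something taken from the chart.

```python
def gain_per_doubling(series, lo, hi):
    """Average score change per doubling of compute between 2**lo and 2**hi."""
    return (series[hi] - series[lo]) / (hi - lo)


# Endpoint readings (approximate) at exponents 0, 8 and 11.
advanced = {0: 40, 8: 90, 11: 88}
imo_gold = {0: 33, 8: 63, 11: 65}

for name, series in (("advanced", advanced), ("IMO Gold", imo_gold)):
    before = gain_per_doubling(series, 0, 8)   # low-compute regime
    after = gain_per_doubling(series, 8, 11)   # beyond 2^8
    print(f"{name}: {before:+.2f} points/doubling up to 2^8, {after:+.2f} after")
```

With these readings, the advanced version averages 6.25 points per doubling up to 2⁸ and about -0.67 afterwards, while the IMO Gold version averages 3.75 and about 0.67. Both figures are consistent with a plateau beyond 2⁸.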
### Interpretation
The chart illustrates a familiar pattern in inference-time scaling: performance improves as compute increases, but with diminishing returns. It also shows rapid model iteration, visible in the jump from the July 2025 to the January 2026 version of Gemini Deep Think.
The most significant finding is the position of Aletheia, which occupies a different point in the performance-compute trade-off space. While the Gemini models follow a smooth scaling curve, Aletheia reaches ~95% at 2⁸, the same compute level at which the advanced Gemini model first reaches its ~90% peak, and the advanced model does not exceed ~90% even with eight times more compute (2¹¹). This could indicate a more efficient architecture, a different training paradigm, or specialization for the benchmark measured by this "Score." Because Aletheia appears as a single point, the chart supports only a tentative conclusion: it positions Aletheia as a high performer and a potentially more efficient solution, which challenges the notion that more compute is always the primary path to higher scores.