## Line Chart: R1-Llama | GK23EN
### Overview
This is a line chart comparing the performance (Accuracy %) of four different data selection strategies ("Full", "Random", "Bottom", "Top") for a model or system named "R1-Llama" on a task or dataset labeled "GK23EN". The chart plots Accuracy against an increasing Ratio (%).
### Components/Axes
* **Chart Title:** "R1-Llama | GK23EN" (Top center)
* **Y-Axis:** Label: "Accuracy (%)". Scale: Linear, ranging from 62 to 74, with major tick marks at 62, 64, 66, 68, 70, 72, 74.
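* **X-Axis:** Label: "Ratio (%)". Scale: Non-uniform, with discrete points at 2, 4, 6, 10, 20, 30, 40, 50. The step between ticks grows from 2 to 4 to 10, and the values do not follow a strict logarithmic progression, so this is most likely a custom or categorical scale.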
* **Legend:** Positioned in the top-right quadrant of the chart area. It defines four data series:
    * **Full:** Gray line with 'x' markers.
    * **Random:** Green line with upward-pointing triangle markers.
    * **Bottom:** Blue line with square markers.
    * **Top:** Red line with circle markers.
### Detailed Analysis
**Data Series Trends and Approximate Values:**
1. **Full (Gray, 'x'):**
    * **Trend:** Very flat, nearly horizontal line with a slight upward slope. It represents a high, stable baseline.
    * **Data Points (Approximate):**
        * Ratio 2%: ~72.2%
        * Ratio 4%: ~72.3%
        * Ratio 6%: ~72.4%
        * Ratio 10%: ~72.5%
        * Ratio 20%: ~72.7%
        * Ratio 30%: ~72.8%
        * Ratio 40%: ~72.9%
        * Ratio 50%: ~73.0%
2. **Top (Red, circles):**
    * **Trend:** Upward-sloping line through 40%, with a slight dip at 50%. It starts about 1.2 points below the "Full" line, meets it at roughly 10%, and stays above it from 20% onward by about 0.3-0.6 points.
    * **Data Points (Approximate):**
        * Ratio 2%: ~71.0%
        * Ratio 4%: ~71.5%
        * Ratio 6%: ~72.0%
        * Ratio 10%: ~72.5% (Intersects with "Full" line)
        * Ratio 20%: ~73.0%
        * Ratio 30%: ~73.2%
        * Ratio 40%: ~73.5%
        * Ratio 50%: ~73.3% (Slight dip from previous point)
3. **Random (Green, triangles):**
    * **Trend:** Starts low and rises about 3 points between 2% and 20%, then about 5 points between 20% and 50%. The line looks steeper after 20%, but part of that is the uneven tick spacing: on these approximate values, the gain per percentage point of ratio is no higher after 20% than at the low end.
    * **Data Points (Approximate):**
        * Ratio 2%: ~62.0%
        * Ratio 4%: ~62.5%
        * Ratio 6%: ~63.0%
        * Ratio 10%: ~64.0%
        * Ratio 20%: ~65.0%
        * Ratio 30%: ~67.0%
        * Ratio 40%: ~68.5%
        * Ratio 50%: ~70.0%
4. **Bottom (Blue, squares):**
    * **Trend:** Starts at the same low point as "Random" and increases slowly and steadily, falling behind "Random" after the 20% ratio point. It is the lowest-performing strategy at every ratio above 2%.
    * **Data Points (Approximate):**
        * Ratio 2%: ~62.0%
        * Ratio 4%: ~62.2%
        * Ratio 6%: ~62.5%
        * Ratio 10%: ~63.5%
        * Ratio 20%: ~64.5%
        * Ratio 30%: ~65.0%
        * Ratio 40%: ~66.0%
        * Ratio 50%: ~67.0%
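Because the x-axis ticks are unevenly spaced, visual steepness can be misleading. The following is a minimal sketch, assuming the approximate values listed above, that computes the accuracy gain per percentage point of ratio between consecutive ticks. The names `RATIOS`, `ACCURACY`, and `slopes` are illustrative and not part of the original figure.

```python
# Approximate values read off the chart (transcribed from the lists above).
RATIOS = [2, 4, 6, 10, 20, 30, 40, 50]
ACCURACY = {
    "Full":   [72.2, 72.3, 72.4, 72.5, 72.7, 72.8, 72.9, 73.0],
    "Top":    [71.0, 71.5, 72.0, 72.5, 73.0, 73.2, 73.5, 73.3],
    "Random": [62.0, 62.5, 63.0, 64.0, 65.0, 67.0, 68.5, 70.0],
    "Bottom": [62.0, 62.2, 62.5, 63.5, 64.5, 65.0, 66.0, 67.0],
}


def slopes(ratios, values):
    """Accuracy gain per percentage point of ratio between consecutive ticks."""
    points = list(zip(ratios, values))
    return [
        round((v1 - v0) / (r1 - r0), 3)
        for (r0, v0), (r1, v1) in zip(points, points[1:])
    ]


for name, values in ACCURACY.items():
    print(f"{name:>6}: {slopes(RATIOS, values)}")
```

With these transcribed values, "Random" gains about 0.25 accuracy points per percentage point of ratio up to 10% and between 0.1 and 0.2 afterwards, so the steeper appearance after 20% is at least partly a result of the wider ratio steps.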
### Key Observations
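* **Performance Hierarchy:** At low ratios (2-6%), the hierarchy is Full > Top > Random ≈ Bottom, with "Top" trailing "Full" by roughly 0.4-1.2 points and both sitting roughly 9-10 points above "Random" and "Bottom". At high ratios (40-50%), the hierarchy is Top > Full > Random > Bottom.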
* **Crossover Point:** The "Top" strategy's accuracy line meets the "Full" strategy's line at about the 10% ratio mark and is above it from the 20% mark onward.
* **Divergence of Random/Bottom:** The "Random" and "Bottom" strategies start at identical accuracy (~62%) and stay within about 0.5 points of each other through the 20% ratio. They diverge after that, with "Random" ahead by about 2 points at 30% and about 3 points at 50%.
* **Stability vs. Growth:** The "Full" dataset line is remarkably stable, rising by less than 1 point across the whole range. The "Top" strategy gains about 2.5 points up to 40% before dipping slightly at 50%. The "Random" strategy gains about 8 points overall; its "hockey stick" appearance after 20% is partly an effect of the uneven tick spacing on the x-axis.
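The crossover and divergence observations can be checked directly against the approximate data points. The sketch below assumes the values transcribed in the Detailed Analysis section; the variable and function names are illustrative only.

```python
# Approximate values read off the chart; not exact measurements.
RATIOS = [2, 4, 6, 10, 20, 30, 40, 50]
FULL = [72.2, 72.3, 72.4, 72.5, 72.7, 72.8, 72.9, 73.0]
TOP = [71.0, 71.5, 72.0, 72.5, 73.0, 73.2, 73.5, 73.3]
RANDOM = [62.0, 62.5, 63.0, 64.0, 65.0, 67.0, 68.5, 70.0]
BOTTOM = [62.0, 62.2, 62.5, 63.5, 64.5, 65.0, 66.0, 67.0]


def gaps(a, b):
    """Pointwise difference a - b, rounded to the chart's reading precision."""
    return [round(x - y, 1) for x, y in zip(a, b)]


top_minus_full = gaps(TOP, FULL)
random_minus_bottom = gaps(RANDOM, BOTTOM)

# First plotted ratio at which "Top" is strictly above "Full".
first_ahead = next(r for r, g in zip(RATIOS, top_minus_full) if g > 0)

print("Top - Full:     ", dict(zip(RATIOS, top_minus_full)))
print("Random - Bottom:", dict(zip(RATIOS, random_minus_bottom)))
print("Top first exceeds Full at ratio:", first_ahead)
```

Since the inputs are visual estimates, differences of a few tenths of a point (such as the "Top" versus "Full" margin) should be read as indicative rather than exact.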
### Interpretation
This chart demonstrates the impact of data selection quality and quantity on model accuracy for the R1-Llama model on the GK23EN task.
* **Quality over Quantity (Initially):** Using a small ratio (2-10%) of the highest-ranked data ("Top") achieves accuracy close to that of the entire dataset ("Full"): about 1.2 points lower at 2% and level at 10%. This suggests high data efficiency and that a core subset of high-quality data carries most of the useful signal.
* **The Power of Curation:** As the data ratio increases, the curated "Top" subset not only matches but slightly exceeds the performance of the full dataset, by roughly 0.3-0.6 points on these approximate readings. This suggests that the full dataset may contain noisy or lower-quality examples that slightly limit peak performance, although the margin is small enough that it should be interpreted with caution.
* **Random Sampling's Threshold:** The "Random" strategy starts about 10 points below "Full" and improves steadily, reaching about 70% at a 50% ratio. The apparent acceleration after ~20% could reflect a critical mass effect, where random data only begins to cover the important patterns at larger volumes, but much of the visual steepening comes from the wider ratio steps on the x-axis. Within the tested range it never catches up to curated selection, ending about 3 points below "Top".
* **The "Bottom" Penalty:** Consistently selecting the lowest-ranked data ("Bottom") yields the lowest accuracy at nearly every ratio, which is consistent with the ranking metric being meaningful. Because "Bottom" still improves as more data is added, the chart shows low-ranked data to be minimally helpful rather than demonstrably harmful.
**In essence, the data argues for intelligent data curation ("Top" selection) as the most effective strategy, offering both high efficiency at low ratios and the best peak performance at higher ratios.** The full dataset is a robust baseline that "Top" exceeds only by a narrow margin (under 1 point), while random selection remains about 3 points behind both even at a 50% ratio.