## Technical Analysis of Model Performance Charts
This image contains two scatter plots that compare the accuracy of several large language models (LLMs), and of variants compressed with different pruning and merging methods, against total parameter count. A single legend on the right is shared by both charts.
### Legend
The legend defines the coding for methods (lines and colors) and models (markers).
**Methods:**
* **Baseline:** Solid black line (represented as black markers in the plots).
* **Pruning Methods:**
    * **REAP (ours):** Blue dashed line with diamond markers.
    * **EAN:** Pink dashed line with 'x' markers.
    * **Frequency:** Green dashed line with diamond markers.
* **Merging Methods:**
    * **HC-SMoE:** Yellow dashed line with square markers.
    * **M-SMoE:** Light blue dashed line with square markers.
**Models:**
* **ERNIE-4.5-21B-A3B:** Circle (•)
* **Qwen3-30B-A3B:** Square (■)
* **Mixtral-8x7B-Instruct-v0.1:** Downward-pointing triangle (▼)
* **LLaMA-4-Scout-17B-16E-Instruct:** Star (★)
* **GLM-4.5-Air:** Diamond (◆)
* **Qwen3-Coder-480B-A35B-Instruct-FP8:** Plus sign (+)
* **Kimi-K2-Instruct-W4A16:** Cross (x)
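
The two-channel encoding described by the legend (color and line style for the method, marker shape for the model) can be captured as a pair of lookup tables. The sketch below is illustrative only: the table names, the `point_style` helper, the color names, and the matplotlib-style marker codes are assumptions chosen for readability, not values taken from the figure's plotting code.

```python
# Hypothetical lookup tables mirroring the legend. Names, color strings, and
# marker codes are assumptions, not taken from the figure's source.
METHOD_STYLES = {
    "Baseline": {"color": "black", "linestyle": "solid"},
    "REAP": {"color": "blue", "linestyle": "dashed"},
    "EAN": {"color": "pink", "linestyle": "dashed"},
    "Frequency": {"color": "green", "linestyle": "dashed"},
    "HC-SMoE": {"color": "yellow", "linestyle": "dashed"},
    "M-SMoE": {"color": "lightblue", "linestyle": "dashed"},
}

MODEL_MARKERS = {
    "ERNIE-4.5-21B-A3B": "o",                   # circle
    "Qwen3-30B-A3B": "s",                       # square
    "Mixtral-8x7B-Instruct-v0.1": "v",          # downward-pointing triangle
    "LLaMA-4-Scout-17B-16E-Instruct": "*",      # star
    "GLM-4.5-Air": "D",                         # diamond
    "Qwen3-Coder-480B-A35B-Instruct-FP8": "+",  # plus sign
    "Kimi-K2-Instruct-W4A16": "x",              # cross
}


def point_style(method: str, model: str) -> dict:
    """Combine the method's color and line style with the model's marker."""
    style = dict(METHOD_STYLES[method])  # copy so the shared table is not mutated
    style["marker"] = MODEL_MARKERS[model]
    return style
```

Keeping the two tables separate reflects the legend's design: any method can be paired with any model without enumerating every combination.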
---
### Left Chart: Non-Agentic Code Accuracy vs. Total Parameters
This chart plots "Non-Agentic Code Acc. (%)" on the y-axis against "Log Scaled Total Parameters (in billions)" on the x-axis. The x-axis is logarithmic, with major ticks at 10¹, 10², and 10³. The y-axis ranges from 0 to 60% with ticks every 10%.
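
Because the x-axis is logarithmic, a model's horizontal position depends on the base-10 logarithm of its parameter count rather than on the count itself. The following is a minimal sketch, assuming the axis spans exactly the three major ticks (10¹ to 10³); the real plot limits may extend slightly beyond them, and the function name `axis_fraction` is illustrative.

```python
import math


def axis_fraction(params_b: float, lo: float = 10.0, hi: float = 1000.0) -> float:
    """Fraction of the way along a log-scaled axis that runs from lo to hi.

    params_b, lo, and hi are parameter counts in billions.
    """
    return (math.log10(params_b) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))


# Parameter counts in billions, taken from the approximate baseline positions.
for label, params_b in [("~21B", 21), ("~110B", 110), ("~1000B", 1000)]:
    print(f"{label}: {axis_fraction(params_b):.2f} of the axis width")
```

Under these assumptions a ~21B model lands about 16% of the way across the axis, and a ~110B model lands just past the midpoint.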
**Key Trends and Data Points:**
* **General Trend:** For most methods, accuracy tends to increase as the number of parameters increases. The "REAP (ours)" method consistently outperforms other pruning and merging methods, often approaching or exceeding the baseline performance.
* **ERNIE-4.5-21B-A3B (•):** Baseline is at ~21B parameters with ~55% accuracy. All methods show an upward trend from ~10B to ~21B parameters, with REAP reaching the highest accuracy (~58%).
* **Qwen3-30B-A3B (■):** Baseline is at ~30B parameters with ~58% accuracy. Methods show an upward trend from ~20B to ~30B parameters. REAP reaches ~28% accuracy.
* **Mixtral-8x7B-Instruct-v0.1 (▼):** Baseline is at ~47B parameters with ~32% accuracy. Compressed variants lie below the baseline's parameter count, with accuracy rising as parameters increase. REAP reaches ~59% accuracy.
* **LLaMA-4-Scout-17B-16E-Instruct (★):** Baseline is at ~110B parameters with ~55% accuracy. Methods show an upward trend from ~60B to ~80B parameters. REAP reaches ~56% accuracy.
* **GLM-4.5-Air (◆):** Baseline is at ~110B parameters with ~60% accuracy. Compressed variants lie below the baseline's parameter count, with accuracy rising as parameters increase. REAP reaches ~65% accuracy. Frequency pruning drops steeply to ~1% at its smaller size before recovering to ~54% at its larger size.
* **Qwen3-Coder-480B-A35B-Instruct-FP8 (+):** Baseline is at ~500B parameters with ~66% accuracy. Compressed variants lie below the baseline's parameter count, with accuracy rising as parameters increase. REAP reaches ~66% accuracy. Frequency pruning starts very low at ~6% and rises to ~30%.
* **Kimi-K2-Instruct-W4A16 (x):** Only the baseline is shown at ~1000B parameters with ~66% accuracy.
---
### Right Chart: MC Accuracy vs. Total Parameters
This chart plots "MC Accuracy (%)" on the y-axis against "Log Scaled Total Parameters (in billions)" on the x-axis. The x-axis is logarithmic, with major ticks at 10¹, 10², and 10³. The y-axis ranges from 45% to 75% with ticks every 5%.
**Key Trends and Data Points:**
* **General Trend:** As in the left chart, accuracy generally increases with parameter count. The "REAP (ours)" method again performs strongly and is often the best of the compression methods, second only to the baseline.
* **ERNIE-4.5-21B-A3B (•):** Baseline is at ~21B parameters with ~72% accuracy. All methods show an upward trend from ~10B to ~21B parameters, with REAP reaching ~68% accuracy.
* **Qwen3-30B-A3B (■):** Baseline is at ~30B parameters with ~72% accuracy. Methods show an upward trend from ~20B to ~30B parameters. REAP reaches ~68% accuracy.
* **Mixtral-8x7B-Instruct-v0.1 (▼):** Baseline is at ~47B parameters with ~74% accuracy. Compressed variants lie below the baseline's parameter count, with accuracy rising as parameters increase. REAP reaches ~72% accuracy.
* **LLaMA-4-Scout-17B-16E-Instruct (★):** Baseline is at ~110B parameters with ~74% accuracy. Methods show an upward trend from ~60B to ~80B parameters. REAP reaches ~68% accuracy.
* **GLM-4.5-Air (◆):** Baseline is at ~110B parameters with ~75% accuracy. Compressed variants lie below the baseline's parameter count, with accuracy rising as parameters increase. REAP reaches ~75% accuracy.
* **Qwen3-Coder-480B-A35B-Instruct-FP8 (+):** Baseline is at ~500B parameters with ~75% accuracy. Compressed variants lie below the baseline's parameter count, with accuracy rising as parameters increase. REAP reaches ~77% accuracy, surpassing the baseline.
* **Kimi-K2-Instruct-W4A16 (x):** Only the baseline is shown at ~1000B parameters with ~78% accuracy.
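
The per-model gap between REAP and the baseline on MC accuracy can be tabulated directly from the approximate readings listed above. This is a sketch over values estimated by eye from the chart, not measured results, and the variable names are illustrative.

```python
# Approximate (baseline, REAP) MC accuracy readings in percent, as listed above.
# These are visual estimates from the chart, not exact measurements.
READINGS = {
    "ERNIE-4.5-21B-A3B": (72, 68),
    "Qwen3-30B-A3B": (72, 68),
    "Mixtral-8x7B-Instruct-v0.1": (74, 72),
    "LLaMA-4-Scout-17B-16E-Instruct": (74, 68),
    "GLM-4.5-Air": (75, 75),
    "Qwen3-Coder-480B-A35B-Instruct-FP8": (75, 77),
}

# Positive gap: REAP is above the baseline; negative gap: REAP is below it.
gaps = {model: reap - base for model, (base, reap) in READINGS.items()}
mean_gap = sum(gaps.values()) / len(gaps)

for model, gap in gaps.items():
    print(f"{model}: {gap:+d} points")
print(f"mean gap: {mean_gap:+.2f} points")
```

Kimi-K2-Instruct-W4A16 is left out because only its baseline is listed.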