## Line Chart: Accuracy of Four AI Models Across Mathematical Topics
### Overview
This image is a line chart comparing the accuracy of four large language models (LLMs) on a wide range of mathematical topics. It plots an accuracy percentage for each model on each topic; 40 category labels are transcribed below, drawn mainly from middle school and high school curricula. The data is presented as four line series, each with a marker at every data point.
### Components/Axes
* **Chart Type:** Multi-series line chart with markers.
* **Legend:** Positioned at the top center of the chart area. It defines four data series:
    * **Yi-6B:** Blue line with circular markers.
    * **ChatGLM3-6B:** Orange line with circular markers.
    * **LLaMA2-7B:** Green line with circular markers.
    * **DeepSeekMath-7B:** Red line with circular markers.
* **Y-Axis (Vertical):**
    * **Label:** "Accuracy" (written vertically).
    * **Scale:** Linear scale from 0 to approximately 90.
    * **Major Gridlines:** Horizontal dashed lines at intervals of 20 (0, 20, 40, 60, 80).
* **X-Axis (Horizontal):**
    * **Label:** Not explicitly labeled with a title, but contains category labels for mathematical topics.
    * **Category Labels:** A dense series of labels written in Chinese, rotated at a 45-degree angle for readability. These represent the specific mathematical topics tested.
    * **Language:** The primary language of the axis labels is **Chinese**. An English translation is provided below for each label.
### Detailed Analysis
The chart plots accuracy values for each model on each topic. Below is a reconstruction of the data, reading the approximate values from the chart. Values are estimated to the nearest 5% due to visual interpretation.
**X-Axis Categories (Chinese -> English Translation):**
1. 全等三角形 -> Congruent Triangles
2. 等腰三角形 -> Isosceles Triangles
3. 等边三角形 -> Equilateral Triangles
4. 平行四边形 -> Parallelograms
5. 圆周角 -> Inscribed Angle
6. 圆心角 -> Central Angle
7. 弧长和扇形面积 -> Arc Length and Sector Area
8. 点与圆的位置关系 -> Positional Relationship between a Point and a Circle
9. 函数与一元一次方程 -> Function and Linear Equation in One Variable
10. 函数与一元一次不等式 -> Function and Linear Inequality in One Variable
11. 函数与二元一次方程组 -> Function and System of Linear Equations in Two Variables
12. 求一次函数的解析式 -> Finding the Analytic Expression of a Linear Function
13. 二次函数的定义 -> Definition of Quadratic Function
14. 反比例函数的定义 -> Definition of Inverse Proportional Function
15. 反比例函数的性质 -> Properties of Inverse Proportional Function
16. 有理数的乘方 -> Exponentiation of Rational Numbers
17. 点的坐标与象限 -> Coordinates of a Point and Quadrants
18. 同底数幂的乘法 -> Multiplication of Powers with the Same Base
19. 约分与通分 -> Reducing Fractions and Finding a Common Denominator
20. 十字相乘法 -> Cross-Multiplication Method (for Factoring Quadratic Trinomials)
21. 提公因式法 -> Factoring by Common Factor
22. 流水问题 -> Flow/Current Problems (Word Problems)
23. 鸡兔同笼 -> Chicken and Rabbit in the Same Cage (Classic Problem)
24. 整式的乘法与加减 -> Multiplication and Addition/Subtraction of Polynomials
25. 平方差公式 -> Difference of Squares Formula
26. 完全平方公式 -> Perfect Square Formula
27. 二次根式的乘除 -> Multiplication and Division of Quadratic Radicals
28. 二次根式的加减 -> Addition and Subtraction of Quadratic Radicals
29. 二次根式的化简 -> Simplification of Quadratic Radicals
30. 二次根式的运算 -> Operations with Quadratic Radicals
31. 一元二次方程的根 -> Roots of a Quadratic Equation in One Variable
32. 解一元二次方程 -> Solving Quadratic Equations in One Variable
33. 一元二次方程的应用 -> Application of Quadratic Equations in One Variable
34. 一元二次不等式的解法 -> Solving Quadratic Inequalities in One Variable
35. 解一元二次不等式 -> Solving Quadratic Inequalities in One Variable (Repeated/Variant)
36. 分式方程的应用 -> Application of Fractional Equations
37. 数据的波动中的极差 -> Range in Data Fluctuation
38. 数据的波动中的方差 -> Variance in Data Fluctuation
39. 频率的求解概率 -> Solving Probability using Frequency
40. 随机事件与概率 -> Random Events and Probability
**Data Series Trends & Approximate Values:**
* **Trend Verification:**
    * **DeepSeekMath-7B (Red):** Generally the highest-performing series, with frequent peaks above 70% and several above 80%. It shows high volatility, with sharp drops on some topics (e.g., near 0% on the "Chicken and Rabbit" problem).
    * **ChatGLM3-6B (Orange):** Often the second-highest, closely following the red line. It has the single highest peak on the chart (approx. 90% on "Function and Linear Equation"). It also experiences significant drops.
    * **Yi-6B (Blue):** Typically performs in the middle range, between 20% and 60%. It has a notable low point (approx. 5%) on "Solving Quadratic Inequalities".
    * **LLaMA2-7B (Green):** Generally the lowest-performing series, frequently below 40%. It has several very low points (around 10% or below) on topics like "Central Angle" and the "Chicken and Rabbit" problem.
* **Sample Data Points (First 10 Topics):**
| Topic (English) | Yi-6B (Blue) | ChatGLM3-6B (Orange) | LLaMA2-7B (Green) | DeepSeekMath-7B (Red) |
| :--- | :--- | :--- | :--- | :--- |
| Congruent Triangles | ~40% | ~50% | ~40% | ~55% |
| Isosceles Triangles | ~35% | ~20% | ~25% | ~70% |
| Equilateral Triangles | ~35% | ~45% | ~10% | ~65% |
| Parallelograms | ~50% | ~60% | ~20% | ~75% |
| Inscribed Angle | ~30% | ~55% | ~15% | ~70% |
| Central Angle | ~5% | ~0% | ~5% | ~30% |
| Arc Length/Sector Area | ~30% | ~30% | ~15% | ~30% |
| Point & Circle Relation | ~25% | ~40% | ~20% | ~55% |
| Function & Linear Eq. | ~50% | ~90% | ~65% | ~85% |
| Function & Linear Ineq. | ~45% | ~55% | ~35% | ~70% |
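The sample table can be summarized numerically. The following is a minimal sketch, assuming the approximate values above are transcribed as-is; the names `SAMPLE`, `mean_accuracy`, and `ranking` are illustrative and not part of the chart or its source. It computes each model's mean accuracy over the ten sample topics and sorts the models by that mean.

```python
from statistics import mean

# Approximate accuracies (%) transcribed from the sample table, in topic order:
# Congruent Triangles, Isosceles Triangles, Equilateral Triangles, Parallelograms,
# Inscribed Angle, Central Angle, Arc Length/Sector Area, Point & Circle Relation,
# Function & Linear Eq., Function & Linear Ineq.
SAMPLE = {
    "Yi-6B":           [40, 35, 35, 50, 30, 5, 30, 25, 50, 45],
    "ChatGLM3-6B":     [50, 20, 45, 60, 55, 0, 30, 40, 90, 55],
    "LLaMA2-7B":       [40, 25, 10, 20, 15, 5, 15, 20, 65, 35],
    "DeepSeekMath-7B": [55, 70, 65, 75, 70, 30, 30, 55, 85, 70],
}


def mean_accuracy(data):
    """Return the mean accuracy per model over the sample topics."""
    return {model: mean(scores) for model, scores in data.items()}


def ranking(data):
    """Return model names sorted from highest to lowest mean accuracy."""
    means = mean_accuracy(data)
    return sorted(means, key=means.get, reverse=True)


if __name__ == "__main__":
    means = mean_accuracy(SAMPLE)
    for model in ranking(SAMPLE):
        print(f"{model}: {means[model]:.1f}")
```

On these ten topics the means come out to 60.5 (DeepSeekMath-7B), 44.5 (ChatGLM3-6B), 34.5 (Yi-6B), and 25.0 (LLaMA2-7B). Because the inputs are visual estimates rounded to the nearest 5%, the means are only indicative.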
### Key Observations
1. **High Variability:** Performance for all models is extremely topic-dependent. No model is consistently superior across all mathematical domains.
2. **Model Hierarchy:** A rough performance hierarchy is visible: DeepSeekMath-7B (Red) ≥ ChatGLM3-6B (Orange) > Yi-6B (Blue) > LLaMA2-7B (Green). However, this order flips on specific topics.
3. **Notable Outliers:**
    * **High Peaks:** ChatGLM3-6B achieves ~90% on "Function and Linear Equation". DeepSeekMath-7B peaks near 85% on several topics.
    * **Severe Drops:** Multiple models score near 0% on "Chicken and Rabbit in the Same Cage" (Topic 23). LLaMA2-7B scores very low (~5%) on "Central Angle". Yi-6B drops to ~5% on "Solving Quadratic Inequalities".
4. **Topic Difficulty:** Some topics appear universally challenging (e.g., "Chicken and Rabbit", "Central Angle"), where all models score below 40%. Others, like "Function and Linear Equation", see high scores from multiple models.
5. **Specialization:** DeepSeekMath-7B shows particular strength in algebraic and function-related topics (e.g., topics 9-15, 31-35). LLaMA2-7B struggles significantly with geometry topics (e.g., topics 1-8).
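The topic-difficulty and variability observations can be checked against the ten sample topics. This is a rough sketch under stated assumptions: the values are the approximate readings from the sample table, the 40% cutoff mirrors the threshold mentioned in observation 4, and the names `SAMPLE_BY_TOPIC`, `hard_topics`, and `spread` are hypothetical helpers introduced here.

```python
# topic -> (Yi-6B, ChatGLM3-6B, LLaMA2-7B, DeepSeekMath-7B), approximate accuracy in %
SAMPLE_BY_TOPIC = {
    "Congruent Triangles":     (40, 50, 40, 55),
    "Isosceles Triangles":     (35, 20, 25, 70),
    "Equilateral Triangles":   (35, 45, 10, 65),
    "Parallelograms":          (50, 60, 20, 75),
    "Inscribed Angle":         (30, 55, 15, 70),
    "Central Angle":           (5, 0, 5, 30),
    "Arc Length/Sector Area":  (30, 30, 15, 30),
    "Point & Circle Relation": (25, 40, 20, 55),
    "Function & Linear Eq.":   (50, 90, 65, 85),
    "Function & Linear Ineq.": (45, 55, 35, 70),
}


def hard_topics(data, threshold=40):
    """Return topics on which every model scores below `threshold`."""
    return [topic for topic, scores in data.items() if max(scores) < threshold]


def spread(data):
    """Return the gap between the best and worst model for each topic."""
    return {topic: max(scores) - min(scores) for topic, scores in data.items()}


if __name__ == "__main__":
    print("All models below 40%:", hard_topics(SAMPLE_BY_TOPIC))
    for topic, gap in spread(SAMPLE_BY_TOPIC).items():
        print(f"{topic}: spread of {gap} points")
```

Within the sample, the below-40% filter flags "Central Angle" and also "Arc Length/Sector Area", where no model exceeds roughly 30%. The "Chicken and Rabbit" topic is not part of the ten-topic sample, so it does not appear here.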
### Interpretation
This chart provides a granular comparison of mathematical reasoning capabilities across four LLMs. The data suggests that:
* **Specialized Training Matters:** DeepSeekMath-7B, a math-specialized model, leads on most topics, which suggests that domain-specific training can yield significant gains in math problem-solving. The chart alone cannot separate the effect of training data from that of architecture.
* **Mathematical Reasoning is Not Monolithic:** The extreme volatility in scores shows that "math ability" in AI is not a single skill. Proficiency in algebra does not guarantee proficiency in geometry or word problems. Models have distinct strengths and weaknesses.
* **Classic Problems Remain a Challenge:** The near-zero scores on the "Chicken and Rabbit" problem, a classic word problem that reduces to a pair of linear equations, point to a potential weakness in translating structured word problems into equations, even for otherwise strong models.
* **Benchmarking Value:** For developers or researchers, this chart is valuable for identifying which model might be best suited for a specific educational application (e.g., a geometry tutor vs. an algebra solver). It also pinpoints specific areas (like "Central Angle" or "Chicken and Rabbit" problems) where all current models need improvement, guiding future research and fine-tuning efforts.
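For context, the "Chicken and Rabbit" problem asks how many chickens and rabbits share a cage given the total number of heads and legs. The sketch below shows the standard two-equation solution; the function name `chickens_and_rabbits` and the example figures (35 heads, 94 legs, the traditional textbook instance) are illustrative and are not taken from the benchmark behind the chart.

```python
def chickens_and_rabbits(heads, legs):
    """Solve c + r = heads and 2c + 4r = legs for non-negative integers.

    Subtracting 2 * (first equation) from the second gives 2r = legs - 2 * heads.
    Returns (chickens, rabbits), or None when no valid solution exists.
    """
    extra_legs = legs - 2 * heads  # legs beyond two per animal; each rabbit adds 2
    if extra_legs < 0 or extra_legs % 2 != 0:
        return None
    rabbits = extra_legs // 2
    chickens = heads - rabbits
    if chickens < 0:
        return None
    return chickens, rabbits


if __name__ == "__main__":
    print(chickens_and_rabbits(35, 94))  # (23, 12)
```

The arithmetic is short, which suggests that the difficulty for a model lies mainly in setting up the two equations from the wording rather than in the computation itself.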
In essence, the chart moves beyond a single "accuracy" score to reveal the complex, topic-dependent landscape of AI mathematical reasoning, emphasizing that model selection should be guided by the specific task at hand.