## Radar Chart: PKU Undergraduate Exam Performance
### Overview
A radar chart comparing the performance of five AI models (GPT-4, o1-2024-12-17, deepseek-r1, o3-mini-2025-01-31, gemini-2.5-pro) across 10 undergraduate exam topics. The chart uses color-coded lines to represent each model's scores.
### Components/Axes
- **Axes**: 10 topics arranged radially:
  1. Theory of Functions of Complex Variables
  2. Analytic Geometry
  3. ODE (Ordinary Differential Equations)
  4. PDE (Partial Differential Equations)
  5. Mathematical Analysis
  6. Numerical Linear Algebra
  7. Numerical Analysis
  8. Set Theory and Graph Theory
  9. Abstract Algebra
  10. Probability
- **Legend**:
  - GPT-4 (teal)
  - o1-2024-12-17 (orange)
  - deepseek-r1 (blue)
  - o3-mini-2025-01-31 (pink)
  - gemini-2.5-pro (green)
- **Scale**: 0–100 (inner to outer radius)
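The radial layout described above can be sketched as a mapping from (axis, score) to a point in the plane. This is a minimal illustration, not the code that produced the chart: the helper names `radar_point` and `axis_angles_degrees` are hypothetical, and the choice to place the first axis at angle 0 and proceed counter-clockwise is an assumption, since the source does not state the chart's orientation.

```python
import math

# The 10 topic axes, in the order listed above.
TOPICS = [
    "Theory of Functions of Complex Variables",
    "Analytic Geometry",
    "ODE",
    "PDE",
    "Mathematical Analysis",
    "Numerical Linear Algebra",
    "Numerical Analysis",
    "Set Theory and Graph Theory",
    "Abstract Algebra",
    "Probability",
]


def axis_angles_degrees(n_axes):
    """Evenly spaced axis angles in degrees (assumed start at 0, counter-clockwise)."""
    return [360.0 * k / n_axes for k in range(n_axes)]


def radar_point(axis_index, n_axes, score, max_score=100.0):
    """Map a score on one axis to (x, y), with the outer radius normalized to 1."""
    theta = 2.0 * math.pi * axis_index / n_axes
    r = score / max_score  # 0 at the center, 1 at the outer ring
    return (r * math.cos(theta), r * math.sin(theta))
```

With 10 axes, neighboring topics are 36 degrees apart, and a score of 100 lands on the outer ring.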
### Detailed Analysis
- **GPT-4 (teal)**:
  - Highest score: Theory of Functions of Complex Variables (100)
  - Lowest score: Analytic Geometry (60)
  - Intermediate values: e.g., 80 for ODE, 70 for PDE.
- **o1-2024-12-17 (orange)**:
  - Scores range from 80 (Analytic Geometry) to 100 (Theory of Functions of Complex Variables).
  - Strong performance in Probability (95) and Abstract Algebra (90).
- **deepseek-r1 (blue)**:
  - Scores range from 70 (Analytic Geometry) to 95 (Probability).
  - Notable: 90 in Abstract Algebra, 85 in Numerical Analysis.
- **o3-mini-2025-01-31 (pink)**:
  - Scores range from 85 (Analytic Geometry) to 100 (Theory of Functions of Complex Variables).
  - Also reaches 100 in Probability, with 95 in Abstract Algebra.
- **gemini-2.5-pro (green)**:
  - Scores range from 80 (Analytic Geometry) to 100 (Theory of Functions of Complex Variables).
  - Strong in Probability (95) and Abstract Algebra (90).
### Key Observations
- GPT-4 shows the widest spread: 100 in Theory of Functions of Complex Variables but only 60 in Analytic Geometry, the lowest score reported for any model.
- o3-mini-2025-01-31 shows the most consistent high performance (85–100), followed by gemini-2.5-pro and o1-2024-12-17 (both 80–100).
- deepseek-r1's weakest topics are Analytic Geometry (70) and Numerical Linear Algebra (75).
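The consistency comparison can be made concrete by computing each model's score spread (maximum minus minimum). The sketch below uses only the minimum and maximum values reported in the Detailed Analysis; the names `SCORE_RANGES`, `spreads`, and `rank_by_spread` are illustrative, and using the spread as a proxy for consistency is an assumption rather than a metric taken from the chart.

```python
# (min, max) score per model, as reported in the Detailed Analysis.
SCORE_RANGES = {
    "GPT-4": (60, 100),
    "o1-2024-12-17": (80, 100),
    "deepseek-r1": (70, 95),
    "o3-mini-2025-01-31": (85, 100),
    "gemini-2.5-pro": (80, 100),
}


def spreads(ranges):
    """Return max - min for each model; a smaller spread means more uniform scores."""
    return {model: hi - lo for model, (lo, hi) in ranges.items()}


def rank_by_spread(ranges):
    """Models ordered from narrowest to widest spread (ties keep input order)."""
    return sorted(spreads(ranges).items(), key=lambda item: item[1])


if __name__ == "__main__":
    for model, spread in rank_by_spread(SCORE_RANGES):
        print(f"{model}: {spread}")
```

Under this measure o3-mini-2025-01-31 has the narrowest spread (15 points) and GPT-4 the widest (40 points).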
### Interpretation
The models show different strengths across topics. GPT-4 reaches a top score in Theory of Functions of Complex Variables but drops to 60 in Analytic Geometry, which is also the weakest listed topic for every other model. o3-mini-2025-01-31 and gemini-2.5-pro stay at or above 85 and 80, respectively, across all 10 topics, suggesting balanced coverage of the curriculum. The radial layout makes these per-topic gaps visible at a glance.
---
## Bar Chart: PKU Undergraduate Exam Average Scores
### Overview
A bar chart comparing the average scores of five AI models across the PKU Undergraduate Exam. The y-axis ranges from 0 to 100.
### Components/Axes
- **X-axis**: Models (GPT-4, o1-2024-12-17, deepseek-r1, o3-mini-2025-01-31, gemini-2.5-pro).
- **Y-axis**: Average Score (0–100).
- **Bars**: Color-coded to match the radar chart legend.
### Detailed Analysis
- **GPT-4**: 59.6 (lowest average).
- **o1-2024-12-17**: 89.7.
- **deepseek-r1**: 85.0.
- **o3-mini-2025-01-31**: 92.2.
- **gemini-2.5-pro**: 94.2 (highest average).
### Key Observations
- GPT-4 (59.6) trails every other model by more than 25 points.
- o3-mini-2025-01-31 (92.2) and gemini-2.5-pro (94.2) achieve the two highest averages.
- deepseek-r1 (85.0) lags behind o1-2024-12-17 by 4.7 points and behind o3-mini-2025-01-31 by 7.2 points.
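The gaps between averages can be checked with a few lines of arithmetic. The following sketch ranks the models by the averages listed in the Detailed Analysis and reports each model's distance from the top score; `AVERAGES` and `gaps_to_best` are illustrative names, not part of any source tooling.

```python
# Average scores as listed in the bar chart's Detailed Analysis.
AVERAGES = {
    "GPT-4": 59.6,
    "o1-2024-12-17": 89.7,
    "deepseek-r1": 85.0,
    "o3-mini-2025-01-31": 92.2,
    "gemini-2.5-pro": 94.2,
}


def gaps_to_best(averages):
    """Return (model, average, gap to the best average), sorted best first."""
    best = max(averages.values())
    ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    # Round to one decimal to match the precision of the reported averages.
    return [(model, score, round(best - score, 1)) for model, score in ranked]


if __name__ == "__main__":
    for model, score, gap in gaps_to_best(AVERAGES):
        print(f"{model}: {score} (gap {gap})")
```

Ranking by gap makes the separation explicit: the four stronger models sit within about 9 points of each other, while GPT-4 is more than 34 points behind the best average.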
### Interpretation
The bar chart indicates that o3-mini-2025-01-31 and gemini-2.5-pro are the strongest models on the undergraduate exam, while GPT-4 trails every other model by more than 25 points. The gap may reflect differences in training data or architectural focus, though the chart alone does not establish the cause.
---
## Diamond Radar Chart: PKU PhD Qualifying Exam Performance
### Overview
A diamond-shaped radar chart evaluating o3-mini-2025-01-31's performance in four PhD-level topics: Analysis, Geometry & Topology, Probability, and Algebra.
### Components/Axes
- **Axes**:
  1. Analysis
  2. Geometry & Topology
  3. Probability
  4. Algebra
- **Legend**: o3-mini-2025-01-31 (orange).
- **Scale**: 0–100 (inner to outer radius).
### Detailed Analysis
- **Scores**:
  - Analysis: 100
  - Geometry & Topology: 95
  - Probability: 100
  - Algebra: 100
### Key Observations
- o3-mini-2025-01-31 achieves perfect scores in Analysis, Probability, and Algebra, with a minor dip in Geometry & Topology (95).
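As a quick check of these figures, the four scores can be summarized programmatically. The sketch below assumes an unweighted mean across the four topics; the source does not report an overall PhD score or any weighting, and the names `PHD_SCORES` and `summarize` are illustrative.

```python
from statistics import mean

# o3-mini-2025-01-31 scores from the diamond radar chart.
PHD_SCORES = {
    "Analysis": 100,
    "Geometry & Topology": 95,
    "Probability": 100,
    "Algebra": 100,
}


def summarize(scores, full_marks=100):
    """Return (unweighted mean, topics at full marks, topics below full marks)."""
    perfect = [topic for topic, s in scores.items() if s == full_marks]
    below = {topic: s for topic, s in scores.items() if s < full_marks}
    return mean(list(scores.values())), perfect, below


if __name__ == "__main__":
    avg, perfect, below = summarize(PHD_SCORES)
    print(avg, perfect, below)
```

Under the equal-weight assumption the mean is 98.75, with Geometry & Topology the only topic below full marks.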
### Interpretation
The model shows strong command of all four qualifying-exam areas, with only a 5-point gap in Geometry & Topology. This suggests solid preparation across advanced mathematical topics and may indicate suitability for PhD-level work, although exam scores alone do not measure research ability.
---
## Cross-Referenced Insights
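1. **Consistency**: o3-mini-2025-01-31 performs well across both undergraduate and PhD exams, indicating versatility. gemini-2.5-pro matches it at the undergraduate level, but the PhD chart reports scores only for o3-mini-2025-01-31.
2. **Specialization**: GPT-4 excels in specific undergraduate topics (e.g., Theory of Functions of Complex Variables, 100) but lacks breadth, as its 59.6 average shows.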
3. **PhD Readiness**: o3-mini-2025-01-31's near-perfect PhD scores highlight its potential for advanced research applications.