## Bar Chart: Problems not solved by leading AI models
### Overview
The image is a bar chart showing, for seven benchmarks, the percentage of problems that leading AI models do not solve. The y-axis represents the percentage of unsolved problems, ranging from 0% to 100%. The x-axis lists the benchmarks: FrontierMath, Omni-Math, MathVista, AIME, MATH, GSM-8k, and MMLU. The bars are shaded teal, and the y-axis is annotated for saturation: higher bars mark less saturated benchmarks and lower bars mark more saturated ones.
### Components/Axes
* **Title:** Problems not solved by leading AI models
* **Y-axis:** Percentage of problems not solved, ranging from 0% to 100% in increments of 20%. The axis is annotated "Less saturated" at the top, with an upward arrow, and "More saturated" at the bottom, with a downward arrow.
* **X-axis:** Benchmarks: FrontierMath, Omni-Math, MathVista, AIME, MATH, GSM-8k, MMLU.
### Detailed Analysis
The chart displays the following approximate percentages of problems not solved for each category:
* **FrontierMath:** Approximately 98%, colored in teal.
* **Omni-Math:** Approximately 40%, colored in light teal.
* **MathVista:** Approximately 26%, colored in light teal.
* **AIME:** Approximately 26%, colored in light teal.
* **MATH:** Approximately 5%, colored in light teal.
* **GSM-8k:** Approximately 3%, colored in light teal.
* **MMLU:** Approximately 1%, colored in light teal.
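The readings above can be captured in a few lines of Python. This is a minimal sketch, assuming the approximate values listed here are accurate; the names `unsolved_pct` and `solved_pct` are illustrative, and the solved share is simply computed as the complement of each reading rather than taken from the chart.

```python
import math

# Approximate readings from the chart (percent of problems NOT solved).
unsolved_pct = {
    "FrontierMath": 98,
    "Omni-Math": 40,
    "MathVista": 26,
    "AIME": 26,
    "MATH": 5,
    "GSM-8k": 3,
    "MMLU": 1,
}

# Complement of each reading: percent of problems solved (derived, not read off the chart).
solved_pct = {name: 100 - pct for name, pct in unsolved_pct.items()}

# Rough text rendering of the bars: one '#' per 2 percentage points, rounded up.
for name, pct in unsolved_pct.items():
    bar = "#" * math.ceil(pct / 2)
    print(f"{name:<12} {pct:>3}% unsolved  {bar}")
```

Because the inputs are visual estimates, the derived solved percentages carry the same uncertainty of a point or two.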
### Key Observations
* FrontierMath has by far the highest share of unsolved problems (about 98%), more than double that of the next benchmark, Omni-Math (about 40%).
* The percentage of unsolved problems falls from about 40% for Omni-Math to about 1% for MMLU, with MathVista and AIME tied at about 26%.
* MATH, GSM-8k, and MMLU have very low percentages of unsolved problems (about 5% or less).
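These observations can be checked directly against the approximate readings. The sketch below is illustrative only: the values are visual estimates from the chart, and the 10% cutoff used to flag "near-saturated" benchmarks is an assumption introduced here, not something the chart defines.

```python
# Approximate chart readings in x-axis order (percent of problems unsolved).
readings = [
    ("FrontierMath", 98),
    ("Omni-Math", 40),
    ("MathVista", 26),
    ("AIME", 26),
    ("MATH", 5),
    ("GSM-8k", 3),
    ("MMLU", 1),
]
values = [pct for _, pct in readings]

# Bars never increase from left to right (MathVista and AIME tie).
non_increasing = all(a >= b for a, b in zip(values, values[1:]))

# Gap between the tallest bar and the runner-up, in percentage points.
top_gap = values[0] - values[1]

# Hypothetical cutoff for "near-saturated"; an assumption, not from the chart.
SATURATION_CUTOFF = 10
near_saturated = [name for name, pct in readings if pct < SATURATION_CUTOFF]

print(non_increasing, top_gap, near_saturated)
```

Choosing a different cutoff changes which benchmarks count as near-saturated, so the threshold should be treated as a tunable parameter.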
### Interpretation
The chart indicates that leading AI models struggle the most with FrontierMath, where nearly all problems (about 98%) remain unsolved. Omni-Math, MathVista, and AIME also leave a substantial share of problems unsolved, roughly 26% to 40%. In contrast, the models perform much better on MATH, GSM-8k, and MMLU, suggesting these benchmarks are more "saturated", that is, largely solved by current AI capabilities. The data suggests that research and development efforts should focus on improving AI performance on benchmarks like FrontierMath, Omni-Math, and MathVista to address the current gaps in problem-solving capabilities.