# Prompt for GPT-4o to Construct Training Dataset
---
## Introduction
You are an expert in solving multimodal mathematical problems. You will be given:
1. A multimodal mathematical problem and its corresponding image.
2. A multiple-step solution (each step on a new line).
---
## **Task**
Complete the following tasks:
1. Analyze the purpose of each step and the specific actions it takes.
2. Analyze each step's correctness in terms of:
   - **Image alignment**: Whether the information and reasoning used in the step are consistent with the content of the provided image.
   - **Reasoning logic**: Whether the reasoning is logically sound, calculations are correct, and information used matches that from previous steps and the question.
3. For each judgment, output exactly one of "Correct" or "Incorrect".
4. Correct the first incorrect step based on your analysis of its error, and place the corrected step at the end of your output.
---
## **Question**
The multimodal mathematical problem is as follows:
<Question>
---
## **Solution Steps**
The multiple-step solution is as follows:
<Solution Steps>
---
## **Output Format**
You must output your content in the following format:
### #### Step 1 ####
- **Step intent analysis**: [Describe what the step aims to do and the specific actions]
- **Image alignment analysis**: [Analyze the consistency of image alignment]
- **Judgment of image alignment**: [Correct/Incorrect]
- **Reasoning logic analysis**: [Analyze the soundness of the logic, the correctness of calculations, and consistency with prior steps and the question]
- **Judgment of reasoning logic**: [Correct/Incorrect]
- **Final judgment of the current step**: [Correct/Incorrect]
### #### Step 2 ####
...
### Corrected step of the first incorrect step in solution:
- **Step n**: [Assume that the first incorrect step is step n, and fill in the corrected step n in the square bracket]
---
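Because the output format above is fixed, a response can be post-processed mechanically when building the dataset. The following is a minimal sketch of one way to do that, not the authors' pipeline: the function names `parse_response` and `first_incorrect` are hypothetical, and the sketch assumes the model reproduces the step headers and bold field labels exactly as specified.

```python
import re

# Matches the step header from the output format, e.g. "#### Step 3 ####".
STEP_HEADER = re.compile(r"#### Step (\d+) ####")
# Matches a bullet field such as "- **Judgment of image alignment**: Correct".
FIELD = re.compile(r"\*\*(.+?)\*\*:\s*(.*)")
FINAL_KEY = "Final judgment of the current step"


def parse_response(text):
    """Return ({step_number: {field_label: value}}, corrected_step_or_None)."""
    # Split off the trailing "Corrected step ..." section, if present.
    body, _, tail = text.partition("Corrected step of the first incorrect step")

    # re.split with one capture group yields
    # [preamble, "1", block_1, "2", block_2, ...].
    parts = STEP_HEADER.split(body)
    steps = {}
    for number, block in zip(parts[1::2], parts[2::2]):
        fields = {}
        for line in block.splitlines():
            match = FIELD.search(line)
            if match:
                fields[match.group(1).strip()] = match.group(2).strip()
        steps[int(number)] = fields

    corrected = None
    match = re.search(r"\*\*Step (\d+)\*\*:\s*(.*)", tail, re.DOTALL)
    if match:
        corrected = (int(match.group(1)), match.group(2).strip())
    return steps, corrected


def first_incorrect(steps):
    """Return the lowest step number judged "Incorrect", or None."""
    for number in sorted(steps):
        if steps[number].get(FINAL_KEY) == "Incorrect":
            return number
    return None
```

A simple consistency check is to compare `first_incorrect(steps)` with the step number in the corrected-step section and discard responses where the two disagree.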
## Notes
- **Placeholders**: `<Question>` and `<Solution Steps>` are placeholders for user input; `Step n` stands for the number of the first incorrect step and is filled in by the model.
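The placeholder substitution can be sketched as follows. This is an illustrative helper under stated assumptions, not part of the original prompt: `fill_prompt` is a hypothetical name, the template is assumed to be held as a plain string, and the image is assumed to be attached to the request separately, which is out of scope here.

```python
import re

# The two user-supplied slots named in the prompt template.
SLOT_PATTERN = re.compile(r"<Question>|<Solution Steps>")


def fill_prompt(template, question, steps):
    """Insert the question and the solution steps (one per line) into the template."""
    values = {
        "<Question>": question.strip(),
        # The prompt states that each step appears on a new line.
        "<Solution Steps>": "\n".join(step.strip() for step in steps),
    }
    # A single regex pass means slot-like text inside the question or the
    # steps is never substituted a second time.
    return SLOT_PATTERN.sub(lambda match: values[match.group(0)], template)
```

For example, `fill_prompt(template, question, solution.splitlines())` would fill the template from a solution stored as newline-separated text.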