## Technical Document: Prompt Template for GPT-4o Training Dataset Construction
### Overview
The image shows a prompt template that instructs an AI model (GPT-4o) to construct a training dataset. Formatted as a technical specification, it defines a task: evaluate each step of a multi-step solution to a multimodal mathematical problem, then correct the first incorrect step. The text is entirely in English.
### Content Details
The document is organized into the following sections, transcribed verbatim:
**Header:**
`Prompt for GPT-4o to construct training dataset:`
Followed by a dashed line separator.
**Introduction:**
`You are an expert in solving multimodal mathematical problems. You will be given:`
`1. A multimodal mathematical problem and its corresponding image.`
`2. A multiple-step solution (each step on a new line).`
**Task Section (Labeled in Red):**
`**Task**:` (The label "**Task**:" is in red text)
`The tasks you need to do are:`
`1. Analyze the purpose of each step and what specific actions were taken in each step.`
`2. Analyze each step's correctness in terms of image alignment and reasoning logic.`
`- Image alignment: Whether the information and reasoning used in the step are consistent with the content of the provided image.`
`- Reasoning logic: Whether the reasoning is logically sound, calculations are correct, and information used matches that from previous steps and question.`
`When outputting judgements, you must choose one output from "Correct" or "Incorrect".`
`3. For the first incorrect step, correct it based on your analysis of its error, and output the corrected step at the end of your output.`
**Question Section (Labeled in Red):**
`**Question**:` (The label "**Question**:" is in red text)
`The multimodal mathematical problem is as follows:`
`<Question>` (This is a blue placeholder tag)
**Solution Steps Section (Labeled in Red):**
`**Solution Steps**:` (The label "**Solution Steps**:" is in red text)
`The multiple-step solution is as follows:`
`<Solution Steps>` (This is a blue placeholder tag)
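As a rough illustration of how the two placeholder tags might be filled before the prompt is sent, the Python sketch below substitutes both in a single pass. `PROMPT_SKELETON` is an abbreviated stand-in for the full template and `fill_prompt` is a hypothetical helper name; neither appears in the source. How the accompanying image is attached depends on the API in use and is not shown.

```python
import re

# Abbreviated stand-in for the full template (assumption); only the two
# placeholder tags and their surrounding labels are taken from the source.
PROMPT_SKELETON = (
    "**Question**:\n"
    "The multimodal mathematical problem is as follows:\n"
    "<Question>\n"
    "**Solution Steps**:\n"
    "The multiple-step solution is as follows:\n"
    "<Solution Steps>\n"
)


def fill_prompt(template, question, steps):
    """Replace both placeholder tags, writing each solution step on its own line."""
    values = {"<Question>": question, "<Solution Steps>": "\n".join(steps)}
    # A single regex pass avoids re-substituting a tag that happens to
    # appear inside the inserted question or step text.
    return re.sub(
        r"<Question>|<Solution Steps>",
        lambda match: values[match.group(0)],
        template,
    )
```

Joining the steps with newlines follows the template's own wording, "each step on a new line".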
**Output Format Section (Labeled in Red):**
`**Output Format**:` (The label "**Output Format**:" is in red text)
`You must output your content in the following format:`
`### Step 1 ###`
`Step intent analysis:[Describe what the step aims to do and the specific actions]`
`Image alignment analysis:[Analyze the consistency of image alignment]`
`Judgement of image alignment:[Correct/Incorrect]`
`Reasoning logic analysis:[Analyze the rationality of logic, correctness of calculations and consistency with prior step]`
`Judgement of reasoning logic:[Correct/Incorrect]`
`Final judgement of the current step:[Correct/Incorrect]`
`### Step 2 ###`
`...`
`Corrected step of the first incorrect step in solution:`
`Step n:[assume that the first incorrect step is step n, and fill in the corrected step n in the square bracket]`
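Because the output format is line-oriented, a response that follows it can be split back into per-step records. The sketch below is one possible parser, not part of the source: the function names and dictionary keys are hypothetical, it reads only the first line of each field, and it assumes the model reproduces the labels exactly as written in the template.

```python
import re

STEP_HEADER = re.compile(r"^### Step (\d+) ###[ \t]*$", re.MULTILINE)
CORRECTION_LABEL = "Corrected step of the first incorrect step in solution:"

# Field labels are copied from the output format; the keys are hypothetical names.
FIELDS = {
    "intent": "Step intent analysis",
    "image_analysis": "Image alignment analysis",
    "image_judgement": "Judgement of image alignment",
    "logic_analysis": "Reasoning logic analysis",
    "logic_judgement": "Judgement of reasoning logic",
    "final_judgement": "Final judgement of the current step",
}


def _field(block, label):
    """Return the first-line value after `label:`, without enclosing brackets."""
    match = re.search(rf"^{re.escape(label)}:[ \t]*(.*)$", block, re.MULTILINE)
    if match is None:
        return None
    value = match.group(1).strip()
    if value.startswith("[") and value.endswith("]"):
        value = value[1:-1].strip()
    return value


def parse_response(text):
    """Split a formatted response into step records plus the trailing correction."""
    body, _, correction = text.partition(CORRECTION_LABEL)
    # Splitting on a pattern with one capture group yields
    # [preamble, "1", block 1, "2", block 2, ...].
    parts = STEP_HEADER.split(body)
    steps = []
    for number, block in zip(parts[1::2], parts[2::2]):
        record = {"step": int(number)}
        for key, label in FIELDS.items():
            record[key] = _field(block, label)
        steps.append(record)
    return {"steps": steps, "correction": correction.strip() or None}
```

Missing fields come back as `None` rather than raising, so a caller can decide whether to discard or repair an incomplete response.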
### Key Observations
1. **Document Type:** This is a meta-prompt or a system prompt template, not a chart, diagram, or data visualization. It contains instructions and a structured format for another AI to follow.
2. **Formatting Cues:** Key section labels (`**Task**:`, `**Question**:`, `**Solution Steps**:`, `**Output Format**:`) are highlighted in red. Placeholders for dynamic content (`<Question>`, `<Solution Steps>`) are highlighted in blue.
3. **Structured Output:** The required output format is highly structured, mandating a step-by-step analysis with specific fields for intent, image alignment, reasoning logic, and final judgement for each step in a solution.
4. **Error Correction Protocol:** The template includes a specific instruction to identify and correct the *first* incorrect step found in the provided solution.
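The "first incorrect step" rule in observation 4 can be checked mechanically once the per-step final judgements are available. The following Python sketch uses hypothetical helper names and assumes the judgements have already been extracted as plain strings; it locates the first step judged "Incorrect" and reads the step number from the correction line, so the two can be compared.

```python
import re
from typing import Optional, Sequence


def first_incorrect_step(final_judgements: Sequence[str]) -> Optional[int]:
    """Return the 1-based number of the first step judged "Incorrect", or None."""
    for number, judgement in enumerate(final_judgements, start=1):
        if judgement not in ("Correct", "Incorrect"):
            raise ValueError(f"step {number}: unexpected judgement {judgement!r}")
        if judgement == "Incorrect":
            return number
    return None


def corrected_step_number(correction_line: str) -> Optional[int]:
    """Read n from a correction line of the form "Step n:[...]"."""
    match = re.match(r"\s*Step (\d+):", correction_line)
    return int(match.group(1)) if match else None
```

A response would be self-consistent under this check when `first_incorrect_step(...)` equals `corrected_step_number(...)`; what to do with a mismatch is left to the dataset builder.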
### Interpretation
This document defines a step-level evaluation protocol for multimodal mathematical reasoning. Its purpose is to generate training data by having an AI model act as a "grader" or "tutor." The process involves:
* **Deconstruction:** Breaking down a solution into discrete steps.
* **Multi-faceted Evaluation:** Assessing each step on two separately judged axes: fidelity to the provided image (visual grounding) and logical/mathematical soundness, including consistency with previous steps and the question.
* **Structured Judgement:** Requiring a binary (Correct/Incorrect) label for each axis and for the step overall, for clarity and consistency in the generated dataset.
* **Pedagogical Correction:** Requiring a corrected version of the first incorrect step, which adds a constructive, teaching-oriented element to the dataset.
The template is designed to produce data that can train models not just to solve problems, but to *explain* and *critique* solution processes, with reasoning checked against visual evidence. The strict output format is intended to keep the generated data consistent and machine-readable, which suits fine-tuning or evaluation benchmarks.
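A prompt can only request a format, so a dataset builder would probably still validate each response before keeping it. The sketch below shows one such check in Python. The `StepJudgement` class and `validation_errors` function are hypothetical, and the combination rule (a step is "Correct" overall only when both axes are "Correct") is an assumption: the template does not state how the final judgement is derived.

```python
from dataclasses import dataclass
from typing import List

VALID_LABELS = {"Correct", "Incorrect"}


@dataclass
class StepJudgement:
    image_alignment: str
    reasoning_logic: str
    final: str


def validation_errors(step: StepJudgement) -> List[str]:
    """List the problems found in one step's three judgements (empty if none)."""
    errors = []
    for name in ("image_alignment", "reasoning_logic", "final"):
        value = getattr(step, name)
        if value not in VALID_LABELS:
            errors.append(f"{name}: unexpected label {value!r}")
    if errors:
        return errors
    # Assumed rule, not stated in the template: the final judgement is
    # "Correct" only when both axis judgements are "Correct".
    both_correct = step.image_alignment == step.reasoning_logic == "Correct"
    expected = "Correct" if both_correct else "Incorrect"
    if step.final != expected:
        errors.append(f"final: expected {expected!r}, got {step.final!r}")
    return errors
```

Responses with a non-empty error list could be dropped or regenerated, depending on how strict the dataset needs to be.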