## Textual Comparison: Case Study on Identification of Critical Tokens via Paraphrastic Probing
### Overview
The image displays a side-by-side textual comparison used as a case study. It shows an AI system solving the same mathematical word problem twice, where the only manipulated difference is one key term in the first sentence of the response ("meters" vs. "distances"). The layout has two parallel columns under a common header; the questions and final answers are identical, but the reasoning steps are phrased differently.
### Components
1. **Header Bar**: A teal-colored bar at the top spanning the full width. It contains the title text in white.
2. **Left Column**: Contains the original problem and solution path.
3. **Right Column**: Contains the same problem but with a paraphrased term in the response.
4. **Text Styling**: Key terms are highlighted in color within the "Response" lines. A checkmark symbol (✓) appears at the end of each solution.
### Content Details
**Header Text:**
`Case study on identification of critical tokens via Paraphrastic Probing`
**Left Column Content:**
* **Question:** `James decides to run 3 sprints 3 times a week. He runs 60 meters each sprint. How many total meters does he run a week? (ground truth answer: 540)`
* **Response:** `To find the total **meters** James runs in a week, we need to calculate the total meters he runs in one sprint session and then multiply it by the number of sessions he has in a week.` (The word "meters" is highlighted in **red**).
* **Step 1:** `Calculate the total meters James runs in one sprint session. James runs 3 sprints of 60 meters each. Total meters in one sprint session = 3 * 60 = 180 meters.`
* **Step 2:** `Calculate the total meters...` (The text is truncated with an ellipsis).
* **Conclusion:** `Therefore, James runs a total of 540 meters a week`
* **Final Answer:** `The answer is 540. ✓`
**Right Column Content:**
* **Question:** `James decides to run 3 sprints 3 times a week. He runs 60 meters each sprint. How many total meters does he run a week? (ground truth answer: 540)` (Identical to the left column).
* **Response:** `To find the total **distances** James runs in a week, we need to calculate the total distance he runs in one sprint session and then multiply it by the number of sessions he has in a week.` (The word "distances" is highlighted in **purple**).
* **Step 1:** `Calculate the total distance James runs in one sprint session. Distance per sprint = 60 meters. Number of sprints per session = 3. Total distance per session = Distance per sprint * Number of sprints per session. Total distance per session = 60 * 3. Total distance per session = 180 meters.`
* **Step 2:** `Calculate the total distance James runs in a week.`
* **Conclusion:** `Therefore, James runs a total of 540 meters in a week. ✓`
### Key Observations
1. **Identical Problem & Answer:** Both columns present the exact same word problem and arrive at the identical, correct numerical answer of 540 meters.
2. **Paraphrased Core Token:** The critical difference is in the first sentence of the "Response." The left column uses the term "**meters**" (highlighted in red), which directly matches a unit in the problem. The right column paraphrases this to "**distances**" (highlighted in purple), a more general term.
3. **Divergent Reasoning Paths:** While the final calculation is the same (3 sprints * 60 meters * 3 sessions), the explanatory steps are phrased differently. The right column's reasoning is more verbose and explicitly breaks down the formula (`Distance per sprint * Number of sprints per session`).
4. **Layout for Comparison:** The side-by-side presentation is designed to isolate and highlight the effect of substituting a single, semantically related token ("meters" -> "distances") on the model's generated reasoning process.
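
The arithmetic shared by both columns can be checked with a short sketch. The numbers come from the question text in the figure; the variable names are illustrative and do not appear in the image.

```python
# Values taken from the question text shown in the figure.
sprints_per_session = 3
meters_per_sprint = 60
sessions_per_week = 3

# Step 1: distance covered in one sprint session.
meters_per_session = sprints_per_session * meters_per_sprint

# Step 2: distance covered in a week.
meters_per_week = meters_per_session * sessions_per_week

print(meters_per_session, meters_per_week)  # prints: 180 540
```

Both intermediate and final values match the figure: 180 meters per session and 540 meters per week.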
### Interpretation
This case study visually demonstrates the concept of **Paraphrastic Probing** in the context of AI model analysis. The goal is to identify which tokens (words) in the model's response are "critical" for its reasoning process.
* **What it demonstrates:** By showing that the model can correctly solve the problem even when a key term ("meters") is paraphrased to a synonym or related concept ("distances"), the study suggests the model's understanding is not rigidly tied to specific lexical tokens. It can map different surface forms to the same underlying concept needed for calculation.
* **Why it matters:** This technique is used to test the robustness and generalization capabilities of language models. If the model still reaches the correct answer after a critical token is paraphrased, as it does here, that points to a more semantic grasp of the problem structure; a failure under the same substitution would instead point to reliance on surface wording. The highlight colors (red/purple) draw the viewer's attention directly to the manipulated variable in this experiment.
* **Underlying Pattern:** The image is not presenting raw data but a curated example of an experimental method. The "data" here is the qualitative comparison of two text outputs. The key finding is the **invariance of the correct solution** to the paraphrasing of a core problem-solving token, which is the central point of the case study.
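
The invariance claim can be expressed as a small check. This is a minimal sketch, not the method used in the case study: the helper `last_number` is hypothetical, and the two strings are the closing lines of each column as transcribed above.

```python
import re
from typing import Optional


def last_number(text: str) -> Optional[int]:
    """Return the last integer found in `text`, or None if there is none."""
    matches = re.findall(r"\d+", text)
    return int(matches[-1]) if matches else None


GROUND_TRUTH = 540  # ground truth answer stated in the question

# Closing lines of each column, as transcribed from the figure.
left_ending = "Therefore, James runs a total of 540 meters a week The answer is 540."
right_ending = "Therefore, James runs a total of 540 meters in a week."

left_answer = last_number(left_ending)
right_answer = last_number(right_ending)

# The solution is invariant if both variants agree with the ground truth.
invariant = left_answer == right_answer == GROUND_TRUTH
print(invariant)  # prints: True
```

Taking the last integer is a simplifying assumption that works for these two endings; responses that mention other numbers after the answer would need a stricter extraction rule.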