# Technical Document Extraction: GPT-4 Coherency Scores

## 1. Document Metadata
*   **Title:** (a) GPT-4 coherency scores
*   **Image Type:** Box and Whisker Plot
*   **Language:** English

## 2. Component Isolation

### Header
*   **Text:** "(a) GPT-4 coherency scores"
*   **Function:** Defines the subject of the data: a comparison of coherency performance across different prompting methods using the GPT-4 model.

### Main Chart Area (Data Visualization)
The chart is a box plot comparing five distinct categories. The Y-axis represents a numerical score, and the X-axis represents the prompting method. A vertical dashed line separates the first three methods from the two "refined" methods.

#### Axis Information
*   **Y-Axis (Vertical):** Numerical scale ranging from approximately 3 to 10. Major grid lines and labels are present at **4**, **6**, and **8**.
*   **X-Axis (Horizontal):** Categorical labels for prompting methods:
    1.  **IO** (Input-Output)
    2.  **CoT** (Chain of Thought)
    3.  **ToT** (Tree of Thoughts)
    4.  **IO +refine**
    5.  **ToT +refine**

#### Legend/Color Coding
While there is no explicit floating legend box, the colors distinguish the base methods:
*   **Blue:** Associated with "IO" and "IO +refine".
*   **Orange/Brown:** Associated with "CoT".
*   **Green:** Associated with "ToT" and "ToT +refine".

## 3. Data Extraction and Trend Analysis

The following table estimates the values based on the spatial positioning relative to the Y-axis grid lines (4, 6, 8).

| Method | Color | Median Score | Interquartile Range (Box) | Whiskers (Min/Max excluding outliers) | Outliers (Diamonds) |
| :--- | :--- | :--- | :--- | :--- | :--- |
| **IO** | Blue | ~6.3 | ~5.2 to ~7.4 | ~3.0 to ~8.6 | None visible |
| **CoT** | Orange | ~7.2 | ~6.3 to ~7.8 | ~4.4 to ~8.6 | 1 at ~4.0 |
| **ToT** | Green | ~7.8 | ~7.1 to ~8.4 | ~5.4 to ~9.2 | 4 between ~3.6 and ~4.8 |
| **IO +refine** | Blue | ~8.0 | ~7.3 to ~8.4 | ~5.8 to ~9.5 | 6 between ~4.2 and ~5.7 |
| **ToT +refine** | Green | ~8.0 | ~7.4 to ~8.4 | ~6.2 to ~9.2 | 4 between ~4.8 and ~5.8 |
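
The box edges in the table can be converted into the spread statistics that a box plot conventionally encodes. The sketch below assumes the common 1.5 × IQR (Tukey) rule for flagging outliers, which the figure itself does not confirm. The names `ESTIMATES` and `iqr_and_fences` are illustrative, and every number is an approximate reading from the table above, not source data.

```python
# Approximate quartiles read off the plot (see table); not exact data.
ESTIMATES = {
    "IO":          {"q1": 5.2, "median": 6.3, "q3": 7.4},
    "CoT":         {"q1": 6.3, "median": 7.2, "q3": 7.8},
    "ToT":         {"q1": 7.1, "median": 7.8, "q3": 8.4},
    "IO +refine":  {"q1": 7.3, "median": 8.0, "q3": 8.4},
    "ToT +refine": {"q1": 7.4, "median": 8.0, "q3": 8.4},
}


def iqr_and_fences(q1, q3, k=1.5):
    """Return (IQR, lower fence, upper fence) under the assumed k * IQR rule."""
    iqr = q3 - q1
    return iqr, q1 - k * iqr, q3 + k * iqr


for method, stats in ESTIMATES.items():
    iqr, lower, upper = iqr_and_fences(stats["q1"], stats["q3"])
    print(f"{method:12s} IQR={iqr:.1f}  lower fence={lower:.2f}  upper fence={upper:.2f}")
```

Under this assumption, a point below the lower fence would be drawn as an outlier diamond. Because the refined methods have narrower boxes, their lower fences sit higher, so moderately low scores are more readily flagged.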

### Trend Verification
1.  **Performance Progression:** There is a clear upward trend in the median coherency score as the complexity of the prompting method increases (IO < CoT < ToT).
2.  **Impact of Refinement:** The addition of a "+refine" step substantially raises the median score for the "IO" method (from ~6.3 to ~8.0). For "ToT", the median rises only slightly (from ~7.8 to ~8.0), but the lower whisker moves up (from ~5.4 to ~6.2), suggesting a more consistent floor for performance.
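3.  **Variance and Outliers:** Low-end outliers are more numerous for the higher-scoring methods: none for "IO", 1 for "CoT", 4 for "ToT", 6 for "IO +refine", and 4 for "ToT +refine". Part of this is likely a side effect of the narrower boxes, since box plots conventionally flag points relative to the interquartile range. Even so, it indicates that while the median performance is higher, there are still specific instances of markedly lower coherency.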
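
The trend statements above can be checked mechanically against the table estimates. This is a minimal Python sketch. The dictionaries and the helper `refine_gain` are hypothetical names introduced for illustration, and the values are the approximate readings from the table, so the results inherit that uncertainty.

```python
# Approximate values read off the plot (see table); not exact data.
MEDIANS = {"IO": 6.3, "CoT": 7.2, "ToT": 7.8, "IO +refine": 8.0, "ToT +refine": 8.0}
LOWER_WHISKERS = {"IO": 3.0, "CoT": 4.4, "ToT": 5.4, "IO +refine": 5.8, "ToT +refine": 6.2}
OUTLIER_COUNTS = {"IO": 0, "CoT": 1, "ToT": 4, "IO +refine": 6, "ToT +refine": 4}


def refine_gain(values, base):
    """Change in a statistic when '+refine' is added to a base method."""
    return round(values[f"{base} +refine"] - values[base], 2)


# Trend 1: medians increase strictly along IO -> CoT -> ToT.
base_order = ["IO", "CoT", "ToT"]
is_increasing = all(
    MEDIANS[a] < MEDIANS[b] for a, b in zip(base_order, base_order[1:])
)
print("IO < CoT < ToT:", is_increasing)

# Trend 2: effect of refinement on the median and on the lower whisker.
for base in ("IO", "ToT"):
    print(
        f"{base}: median gain {refine_gain(MEDIANS, base):+.1f}, "
        f"lower-whisker gain {refine_gain(LOWER_WHISKERS, base):+.1f}"
    )

# Trend 3: method with the most low-end outliers.
most_outliers = max(OUTLIER_COUNTS, key=OUTLIER_COUNTS.get)
print("Most outliers:", most_outliers, OUTLIER_COUNTS[most_outliers])
```

With these estimates, refinement adds roughly 1.7 points to the "IO" median but only about 0.2 to the "ToT" median, which matches the reading that refinement mainly raises the floor for "ToT".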

## 4. Structural Summary
The plot indicates that **Tree of Thoughts (ToT)** and **Refinement (+refine)** techniques yield higher coherency scores in GPT-4 compared to standard **Input-Output (IO)** or **Chain of Thought (CoT)** methods. The "IO +refine" method shows the largest improvement over its base counterpart (median from ~6.3 to ~8.0), reaching a median performance level comparable to "ToT +refine" (~8.0).