# Technical Document: Data Processing Pipeline for DPO Training

This document describes a technical flowchart illustrating the data preparation pipeline for Direct Preference Optimization (DPO), specifically focusing on the transformation of the "Skywork Dataset" and the integration of "Synthetic Generation" for factual evaluation.

## 1. Component Isolation and Flow Overview

The diagram consists of five primary processing blocks connected by directional arrows. The first four form a sequential main path; the fifth runs in parallel and joins the main path at the final merging step:

1.  **Skywork Dataset (Top Left):** The initial input source.
2.  **DPO Pairs (Top Right):** Structuring the raw data into preference pairs.
3.  **Factual Evaluation (Center Right):** Assessing the factual accuracy of the pairs.
4.  **DPO Transform (Center Left):** Finalizing the data structure for training.
5.  **Synthetic Generation (Bottom Right):** A parallel process that creates factually corrupted data, which joins the output of the main pipeline at the **Merging** and **Balancing** steps.

---

## 2. Detailed Component Analysis

### 2.1 Skywork Dataset (Input)
This block represents a stack of data entries.
*   **Prompt (Purple Label):** "Hi! Can you improve my text?"
*   **Chosen (Green Label):** "Sure, I can help you improve your text. Please provide me with the text and your desired changes."
*   **Rejected (Green Label):** "Sure! I'd be happy to help. What text would you like me to improve?"

### 2.2 DPO Pairs
The Skywork Dataset flows into this block to be structured into a comparison format.
*   **Prompt (Purple Label):** Header for the entry.
*   **Structure:**
    *   `response_0` (Dark Green) is mapped from the **Chosen** label.
    *   `response_1` (Dark Green) is mapped from the **Rejected** label.
    *   `better_response_id` (Light Blue): Value is `"0"`.
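The mapping above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: the function name `to_dpo_pair` and the lowercase input keys `prompt`, `chosen`, and `rejected` are assumptions, while the output field names and the string value `"0"` follow the diagram.

```python
def to_dpo_pair(record: dict) -> dict:
    """Map a chosen/rejected record onto the response_0/response_1 layout."""
    return {
        "prompt": record["prompt"],
        "response_0": record["chosen"],     # Chosen is placed in slot 0
        "response_1": record["rejected"],   # Rejected is placed in slot 1
        "better_response_id": "0",          # string value, as shown in the diagram
    }


# Example row taken from the Skywork Dataset block (key names are assumed).
skywork_row = {
    "prompt": "Hi! Can you improve my text?",
    "chosen": "Sure, I can help you improve your text. "
              "Please provide me with the text and your desired changes.",
    "rejected": "Sure! I'd be happy to help. What text would you like me to improve?",
}
pair = to_dpo_pair(skywork_row)
```

Because Chosen always lands in `response_0` in this sketch, `better_response_id` is constant; a pipeline that randomizes slot order would need to set it per row.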

### 2.3 Factual Evaluation
The DPO Pairs flow downward into this evaluation stage.
*   **Labels:** Prompt (Purple), response_0 (Green), response_1 (Green), better_response_id (Light Blue).
*   **System Prompt (Blue Label):** No prompt text is shown in this block; the label appears to stand in for the evaluation instructions.
*   **Data Fields:**
    | Field | Value |
    | :--- | :--- |
    | `factual_flag_0` | `"0"` |
    | `factual_flag_1` | `"1"` |
    | `h0` | `"0"` |
    | `h1` | `"1"` |
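One way to picture this stage is as a function that attaches the four fields to each pair. The sketch below rests on several assumptions that the diagram does not confirm: the judge is passed in as a hypothetical callable `has_error(prompt, response)`, a flag of `"1"` means a factual error was detected, and `h0`/`h1` simply mirror the factual flags (as they do in the values shown).

```python
from typing import Callable


def add_factual_flags(pair: dict, has_error: Callable[[str, str], bool]) -> dict:
    """Attach factual_flag_i and h_i to a DPO pair using an external judge."""
    out = dict(pair)  # leave the input untouched
    for i in (0, 1):
        flag = "1" if has_error(pair["prompt"], pair[f"response_{i}"]) else "0"
        out[f"factual_flag_{i}"] = flag
        out[f"h{i}"] = flag  # assumption: h_i mirrors the factual flag
    return out


# Stub judge for illustration only; a real judge would be a model call.
def stub_judge(prompt: str, response: str) -> bool:
    return response == "an answer with a factual error"


example_pair = {
    "prompt": "some prompt",
    "response_0": "a correct answer",
    "response_1": "an answer with a factual error",
    "better_response_id": "0",
}
evaluated = add_factual_flags(example_pair, stub_judge)
```

With this stub, the result carries the same flag values as the block above: `"0"` for response 0 and `"1"` for response 1.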

### 2.4 Synthetic Generation (Bottom Right)
This block describes the creation of "Synthetic Inversion" data to improve factual robustness.
*   **System Prompt (Blue Label):** "You are a factual corruption generator. Rewrite a correct answer into a subtly incorrect version with 2-3 factual errors. It must remain fluent, confident, and plausible." (Text color: Red/Pink).
*   **Prompt (Purple Label):** "Expand the abbreviated word given in the sentence. Output the modified sentence directly. My teacher is a PhD in Lit."
*   **Chosen (Green Label):** "My teacher is a Master of Arts in Literary Studies."
*   **Rejected (Green Label):** "My teacher is a Doctor of Philosophy in Literature."
*   **Metadata:**
    | Field | Value |
    | :--- | :--- |
    | `h_w` | `"1"` |
    | `h_l` | `"0"` |
    | `source` (Pink Label) | `"synthetic Inversion"` |
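Note that in this example the Chosen slot holds the corrupted answer and the Rejected slot holds the correct one, which matches `h_w` = `"1"` and `h_l` = `"0"`. A minimal sketch of building such a record follows. The function name `make_synthetic_inversion` and the `corrupt` callable are hypothetical; in practice `corrupt` would wrap a model call driven by the system prompt above, and here it is replaced by a fixed stub.

```python
from typing import Callable


def make_synthetic_inversion(
    prompt: str, correct_answer: str, corrupt: Callable[[str], str]
) -> dict:
    """Build a record whose chosen side is the corrupted answer."""
    return {
        "prompt": prompt,
        "chosen": corrupt(correct_answer),  # subtly incorrect rewrite
        "rejected": correct_answer,         # original correct answer
        "h_w": "1",
        "h_l": "0",
        "source": "synthetic Inversion",    # value copied from the diagram
    }


# Stub standing in for the corruption model (assumption for illustration).
def stub_corrupt(answer: str) -> str:
    return "My teacher is a Master of Arts in Literary Studies."


synthetic_row = make_synthetic_inversion(
    "Expand the abbreviated word given in the sentence. "
    "Output the modified sentence directly. My teacher is a PhD in Lit.",
    "My teacher is a Doctor of Philosophy in Literature.",
    stub_corrupt,
)
```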

### 2.5 DPO Transform (Center Left)
This block receives the evaluated pairs from Factual Evaluation and restructures them into chosen/rejected form, with the fields `h_w`, `h_l`, and `flipped` attached.
*   **Labels:** Prompt (Purple), chosen (Green), rejected (Green).
*   **Data Fields:**
    | Field | Value |
    | :--- | :--- |
    | `h_w` | `"0"` |
    | `h_l` | `"1"` |
    | `flipped` | `false` |
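The restructuring can be sketched as below, assuming that `better_response_id` selects which response becomes chosen and that `h_w`/`h_l` are the `h0`/`h1` values of the chosen and rejected responses respectively. The diagram does not say when `flipped` would be true, so this sketch always sets it to `False`; the function name `dpo_transform` is hypothetical.

```python
def dpo_transform(evaluated: dict) -> dict:
    """Convert an evaluated pair into prompt/chosen/rejected form."""
    winner = int(evaluated["better_response_id"])  # "0" or "1"
    loser = 1 - winner
    return {
        "prompt": evaluated["prompt"],
        "chosen": evaluated[f"response_{winner}"],
        "rejected": evaluated[f"response_{loser}"],
        "h_w": evaluated[f"h{winner}"],  # flag of the chosen response
        "h_l": evaluated[f"h{loser}"],   # flag of the rejected response
        "flipped": False,  # flip rule is not described in the diagram
    }


evaluated_pair = {
    "prompt": "some prompt",
    "response_0": "first response",
    "response_1": "second response",
    "better_response_id": "0",
    "h0": "0",
    "h1": "1",
}
transformed = dpo_transform(evaluated_pair)
```

For the values shown in the Factual Evaluation block (`h0` = `"0"`, `h1` = `"1"`, `better_response_id` = `"0"`), this produces `h_w` = `"0"` and `h_l` = `"1"`, as in the table above.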

---

## 3. Final Processing: Merging and Balancing

The data from **DPO Transform** and **Synthetic Generation** converge at the bottom left of the diagram:

1.  **Merging (Green Box):** Combines the transformed Skywork data with the synthetically generated factual corruption data.
2.  **Balancing (Green Box):** The final step in the pipeline. The diagram does not state the balancing criterion; a plausible reading is that it evens out the distribution of examples across categories (e.g., helpfulness vs. factuality) before training.
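
The two steps could look roughly like the following sketch. Since the diagram does not specify how balancing is done, the criterion used here is purely an assumption: rows are grouped by their (`h_w`, `h_l`) values and each group is downsampled to the size of the smallest one. The function name `merge_and_balance` and the `seed` parameter are likewise hypothetical.

```python
import random
from collections import defaultdict


def merge_and_balance(transformed: list, synthetic: list, seed: int = 0) -> list:
    """Concatenate both sources, then downsample each (h_w, h_l) group equally."""
    merged = list(transformed) + list(synthetic)  # Merging
    if not merged:
        return []

    groups = defaultdict(list)
    for row in merged:
        groups[(row["h_w"], row["h_l"])].append(row)

    # Balancing (assumed criterion): equal count per (h_w, h_l) group.
    target = min(len(rows) for rows in groups.values())
    rng = random.Random(seed)  # seeded for reproducibility
    balanced = []
    for rows in groups.values():
        balanced.extend(rng.sample(rows, target))
    rng.shuffle(balanced)
    return balanced


main_rows = [{"id": i, "h_w": "0", "h_l": "1"} for i in range(3)]
synth_rows = [{"id": 99, "h_w": "1", "h_l": "0", "source": "synthetic Inversion"}]
balanced_rows = merge_and_balance(main_rows, synth_rows)
```

Here three main-pipeline rows and one synthetic row yield two rows after balancing, one from each group. Upsampling the smaller group or weighting during training would be equally valid choices.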

## 4. Language Declaration
The primary language of this document and the source image is **English**. No other languages were detected.