# Technical Document Extraction: RFT-core Architecture Diagram
## 1. Overview
This document describes a system architecture diagram for **RFT-core**. The diagram does not expand the name; it most likely refers to reinforcement fine-tuning or a similar iterative training process for LLMs. The diagram shows how data and model weights move between environment interaction, data storage, and model training.
## 2. Component Segmentation
### A. External Entities (Left & Top-Right)
* **Environment & Human:** Represented by a globe and a person icon. This is the source of external feedback and the arena for agent interaction.
* **Data Pipelines:** Represented by a magnifying glass over a waveform icon. This represents external data processing streams.
### B. Core System Nodes (The Central Triangle - RFT-core)
The core architecture is organized in a triangular relationship, labeled centrally as **RFT-core**.
* **Buffer:** (Top) Represented by a database icon. Acts as the central repository for experiences and training data.
* **Explorer:** (Bottom-Left) Represented by a robot icon. Responsible for interacting with the environment.
* **Trainer:** (Bottom-Right) Represented by a head with gears icon. Responsible for updating the model based on data.
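
The three roles above can be written down as interfaces to make their responsibilities concrete. This is a minimal sketch only: the diagram names the components but not their APIs, so every method name, the `Experience` fields, and the `ListBuffer` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence, runtime_checkable


@dataclass
class Experience:
    """One rollout record. The fields are illustrative, not taken from the diagram."""
    prompt: str
    response: str
    reward: float


@runtime_checkable
class Buffer(Protocol):
    """Central store for experiences and training data."""
    def write(self, experiences: Sequence[Experience]) -> None: ...
    def read(self, batch_size: int) -> list[Experience]: ...


class Explorer(Protocol):
    """Interacts with the environment and produces rollouts."""
    def explore(self) -> list[Experience]: ...
    def load_weights(self, weights: dict[str, float]) -> None: ...


class Trainer(Protocol):
    """Updates the model from data read out of the buffer."""
    def train_step(self, batch: Sequence[Experience]) -> None: ...
    def export_weights(self) -> dict[str, float]: ...


class ListBuffer:
    """Hypothetical in-memory FIFO buffer that satisfies the Buffer protocol."""
    def __init__(self) -> None:
        self._items: list[Experience] = []

    def write(self, experiences: Sequence[Experience]) -> None:
        self._items.extend(experiences)

    def read(self, batch_size: int) -> list[Experience]:
        # Hand out the oldest items first and remove them from the store.
        batch = self._items[:batch_size]
        self._items = self._items[batch_size:]
        return batch
```

Keeping the buffer behind a narrow read/write interface mirrors the diagram, where Explorer and Trainer exchange data only through the Buffer and exchange weights only through the sync arrow.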
### C. Infrastructure Layer (Bottom)
* **LLM Infra:** Represented by an oval at the base, with the sub-text "(Training, Inference, Model Sync, ...)".
### D. Hierarchical Classification (Right Sidebar)
The right sidebar labels three tiers, stacked from top to bottom:
1. **High-Level:** Aligned with Buffer, Environment, and Data Pipelines.
2. **Middle-Level:** Aligned with Explorer and Trainer.
3. **Low-Level:** Aligned with LLM Infra.
---
## 3. Data Flow and Process Logic
The diagram uses directed arrows to indicate the sequence of operations and data movement:
| Source | Destination | Label / Action | Description |
| :--- | :--- | :--- | :--- |
| **Environment & Human** | **Explorer** | Agent-Env Interaction | Bi-directional curved arrows: the agent acts on the environment and receives state/reward in return. |
| **Environment & Human** | **Buffer** | Additional Feedback | Direct input of human/environmental feedback into the storage buffer. |
| **Data Pipelines** | **Buffer** | Clean/Filter/Prioritize/Synthesize/... | External data is processed and ingested into the buffer. |
| **Data Pipelines** | **Trainer** | Process Training Batch | Direct feed of processed data batches to the training component. |
| **Explorer** | **Buffer** | Rollout Experiences | The agent sends its interaction history (rollouts) to the buffer. (Accompanied by a document icon). |
| **Buffer** | **Trainer** | Training Data | The trainer pulls stored data from the buffer for model updates. (Accompanied by a document icon). |
| **Trainer** | **Explorer** | Synchronize Model Weights | A feedback loop (with a sync icon) where the updated model from the trainer is sent back to the explorer. |
| **Explorer** | **LLM Infra** | (unlabeled solid arrow) | Indicates the explorer uses the underlying LLM infrastructure for inference. |
| **Trainer** | **LLM Infra** | (unlabeled solid arrow) | Indicates the trainer uses the underlying LLM infrastructure for training operations. |
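
The arrows in the table can also be read as a small directed graph. The sketch below transcribes the rows as `(source, destination, label)` triples and queries them; the bi-directional interaction arrow is entered as two edges, and the helper names `build_adjacency`, `sources_of`, and `find_cycle` are illustrative choices rather than anything defined by the diagram.

```python
from collections import defaultdict, deque

# (source, destination, label) triples transcribed from the table above.
# A label of None marks an arrow that carries no text in the diagram.
EDGES = [
    ("Environment & Human", "Explorer", "Agent-Env Interaction"),
    ("Explorer", "Environment & Human", "Agent-Env Interaction"),  # reverse half of the bi-directional arrow
    ("Environment & Human", "Buffer", "Additional Feedback"),
    ("Data Pipelines", "Buffer", "Clean/Filter/Prioritize/Synthesize/..."),
    ("Data Pipelines", "Trainer", "Process Training Batch"),
    ("Explorer", "Buffer", "Rollout Experiences"),
    ("Buffer", "Trainer", "Training Data"),
    ("Trainer", "Explorer", "Synchronize Model Weights"),
    ("Explorer", "LLM Infra", None),
    ("Trainer", "LLM Infra", None),
]


def build_adjacency(edges):
    """Map each source node to its list of (destination, label) pairs."""
    adjacency = defaultdict(list)
    for source, destination, label in edges:
        adjacency[source].append((destination, label))
    return dict(adjacency)


def sources_of(edges, node):
    """Return the sorted names of all nodes with an arrow pointing at `node`."""
    return sorted({source for source, destination, _ in edges if destination == node})


def find_cycle(adjacency, start, allowed):
    """Breadth-first search for the shortest cycle through `start` that only visits `allowed` nodes."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        for nxt, _ in adjacency.get(path[-1], []):
            if nxt == start:
                return path + [start]
            if nxt in allowed and nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


adjacency = build_adjacency(EDGES)
core = {"Explorer", "Buffer", "Trainer"}
print(sources_of(EDGES, "Buffer"))
print(find_cycle(adjacency, "Explorer", core))
```

Restricting the search to the three core nodes recovers the training loop Explorer, Buffer, Trainer, and back to Explorer, which is the triangle drawn at the center of the diagram.
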
---
## 4. Technical Summary of Operations
The **RFT-core** system operates as a closed-loop learning system:
1. The **Explorer** interacts with the **Environment & Human** layer to generate **Rollout Experiences**.
2. These experiences, together with **Additional Feedback** from the environment and humans and data prepared by the external **Data Pipelines**, are collected in the **Buffer**.
3. The **Trainer** consumes **Training Data** from the Buffer, along with batches prepared by the Data Pipelines (**Process Training Batch**), to optimize the model.
4. The updated parameters are sent back to the Explorer via **Synchronize Model Weights**, completing the cycle.
5. The whole loop runs on the low-level **LLM Infra** layer, which provides training, inference, and model synchronization.
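
The cycle in steps 1 to 4 can be illustrated with a toy simulation. Everything here is an assumption made for illustration: the "model" is a single float, the environment rewards actions close to a hypothetical `TARGET`, and a hill-climbing rule stands in for a real reinforcement learning update. Additional feedback, data pipelines, and the LLM infrastructure are left out.

```python
from collections import deque
from dataclasses import dataclass

TARGET = 3.0  # hypothetical optimum that the toy environment rewards


def environment_reward(action: float) -> float:
    """Toy environment: the closer the action is to TARGET, the higher the reward."""
    return -(action - TARGET) ** 2


@dataclass
class Experience:
    action: float
    reward: float


class Explorer:
    def __init__(self, weight: float) -> None:
        self.weight = weight  # local copy of the model

    def rollout(self) -> list[Experience]:
        # Try the current weight and two fixed perturbations (deterministic for clarity).
        actions = [self.weight + delta for delta in (-1.0, 0.0, 1.0)]
        return [Experience(a, environment_reward(a)) for a in actions]


class Trainer:
    def __init__(self, weight: float) -> None:
        self.weight = weight

    def train_step(self, batch: list[Experience]) -> None:
        # Stand-in for a real RL update: adopt the best-rewarded action in the batch.
        self.weight = max(batch, key=lambda e: e.reward).action


def run_loop(steps: int) -> tuple[float, float]:
    buffer = deque()
    explorer, trainer = Explorer(0.0), Trainer(0.0)
    for _ in range(steps):
        buffer.extend(explorer.rollout())  # steps 1-2: rollouts are stored in the buffer
        batch = [buffer.popleft() for _ in range(len(buffer))]  # step 3: trainer drains the buffer
        trainer.train_step(batch)
        explorer.weight = trainer.weight  # step 4: synchronize model weights
    return explorer.weight, trainer.weight


print(run_loop(5))  # → (3.0, 3.0)
```

The explorer never updates its own weight; it only improves when the trainer pushes new weights back, which is the dependency the **Synchronize Model Weights** arrow expresses.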