## Diagram: Model Training Flow
### Overview
The image illustrates two training flows, one for Ouro-2.6B and one for Ouro-1.4B. Both start with a "Warmup" phase and proceed through "Stable Training," "CT Annealing," "LongCT," and "Mid-Training" to produce the base model, followed by "Reasoning SFT" to produce the "Thinking" variant. The flows differ at an annotated arrow after the first "Stable Training" stage: "Upcycle 2.6B" in the Ouro-2.6B flow and "Keep 1.4B" in the Ouro-1.4B flow.
### Components/Axes
* **Nodes:** Rectangular boxes representing training stages and resulting models. Each stage node contains a title and, where given, the number of tokens used; the "Warmup," "Reasoning SFT," and model boxes carry no token count.
* **Edges:** Arrows indicating the flow of training from one stage to the next.
* **Colors:** Two colors are used to distinguish the two training flows: light blue and light brown.
* **Text Labels:** Two edges carry annotations: "Upcycle 2.6B" and "Keep 1.4B." The values match the model names, which suggests they refer to model size rather than to an amount of data.
### Detailed Analysis
**Top Flow (Ouro-2.6B):**
1. **Warmup:** Initial stage (light blue box).
2. **Stable Training:** 3T Tokens (light blue box).
3. **Upcycle 2.6B:** An arrow labeled "Upcycle 2.6B" leads from the "Stable Training" (light blue) to the next "Stable Training" stage (light brown).
4. **Stable Training:** 3T Tokens (light brown box).
5. **CT Annealing:** 1.4T Tokens (light brown box).
6. **LongCT:** 20B Tokens (light brown box).
7. **Mid-Training:** 300B Tokens (light brown box).
8. **Ouro-2.6B:** (light brown box).
9. **Reasoning SFT:** (light brown box).
10. **Ouro-2.6B Thinking:** (light brown box).
**Bottom Flow (Ouro-1.4B):**
1. **Warmup:** Initial stage (light blue box).
2. **Stable Training:** 3T Tokens (light blue box).
3. **Keep 1.4B:** An arrow labeled "Keep 1.4B" leads from the "Stable Training" to the next "Stable Training" stage (light blue).
4. **Stable Training:** 3T Tokens (light blue box).
5. **CT Annealing:** 1.4T Tokens (light blue box).
6. **LongCT:** 20B Tokens (light blue box).
7. **Mid-Training:** 300B Tokens (light blue box).
8. **Ouro-1.4B:** (light blue box).
9. **Reasoning SFT:** (light blue box).
10. **Ouro-1.4B Thinking:** (light blue box).
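
The two flows listed above can be written down as plain data, which makes their shared structure easier to compare. The following is a minimal sketch, not the authors' code: the `Stage` class, `build_flow` helper, and `BRANCH_LABEL` mapping are hypothetical names introduced here, and the stage order and token labels are taken only from the lists above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    name: str
    tokens: Optional[str] = None  # token label as printed in the box, if any


def build_flow(model: str) -> List[Stage]:
    """Return the ordered boxes of one flow, as listed in the description."""
    return [
        Stage("Warmup"),
        Stage("Stable Training", "3T"),
        Stage("Stable Training", "3T"),
        Stage("CT Annealing", "1.4T"),
        Stage("LongCT", "20B"),
        Stage("Mid-Training", "300B"),
        Stage(model),                # base model box
        Stage("Reasoning SFT"),
        Stage(f"{model} Thinking"),  # model after Reasoning SFT
    ]


# Arrow label between the two "Stable Training" boxes of each flow.
BRANCH_LABEL = {"Ouro-2.6B": "Upcycle 2.6B", "Ouro-1.4B": "Keep 1.4B"}

flows = {model: build_flow(model) for model in BRANCH_LABEL}

for model, stages in flows.items():
    print(model, "via", BRANCH_LABEL[model])
    for stage in stages:
        suffix = f" ({stage.tokens} tokens)" if stage.tokens else ""
        print(f"  {stage.name}{suffix}")
```

Keeping the arrow label separate from the stage list reflects the description: the boxes are the same in both flows, and only the annotated edge and the colors differ.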
### Key Observations
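* Both flows share the same sequence of stages and differ only in the annotated arrow after the first "Stable Training" stage: "Upcycle 2.6B" for Ouro-2.6B and "Keep 1.4B" for Ouro-1.4B. Because these values match the model names, they most likely denote model size, not token counts.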
* The token counts for "Stable Training," "CT Annealing," "LongCT," and "Mid-Training" are the same for both flows.
* The color change from light blue to light brown in the Ouro-2.6B flow occurs at the "Upcycle 2.6B" arrow, marking the point where that flow departs from the light blue stages.
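
As a quick check on the token counts listed for each flow, the labeled stages can be summed. This is a sketch under stated assumptions: `parse_tokens` and `labeled_stages` are hypothetical names, "B" and "T" are read as billions and trillions, and stages without a token label in the diagram ("Warmup" and "Reasoning SFT") are left out rather than guessed.

```python
from decimal import Decimal

UNITS = {"B": 10**9, "T": 10**12}  # assumed meaning of the suffixes


def parse_tokens(label: str) -> int:
    """Convert a label such as '1.4T' or '20B' to an integer token count."""
    value, unit = label[:-1], label[-1]
    return int(Decimal(value) * UNITS[unit])


# Token labels as listed for either flow (both flows show the same values).
labeled_stages = {
    "Stable Training (first)": "3T",
    "Stable Training (second)": "3T",
    "CT Annealing": "1.4T",
    "LongCT": "20B",
    "Mid-Training": "300B",
}

total = sum(parse_tokens(label) for label in labeled_stages.values())
print(f"{total / 10**12:.2f}T tokens")  # prints "7.72T tokens"
```

The sum covers only the labeled boxes, so it is a lower bound on the tokens each flow consumes as drawn.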
### Interpretation
The diagram illustrates the training pipelines for two models, Ouro-2.6B and Ouro-1.4B. The "Upcycle 2.6B" and "Keep 1.4B" annotations mark where the two flows diverge after the first "Stable Training" stage; since the values match the model names, they most likely describe whether the model is scaled up to 2.6B or kept at 1.4B, not a strategy for reusing data. The identical token counts in the remaining stages imply that both models follow the same training regimen after that point. The color change in the Ouro-2.6B flow is consistent with this reading, as it begins exactly at the upcycling arrow.