# Technical Document Extraction: Distributed Computing Architecture Diagram
This image is a schematic of a hybrid parallelism layout for machine learning or other large-scale data processing. It shows how **Pipeline Parallelism (PP)**, **Tensor Parallelism (TP)**, and **Data Parallelism (DP)** combine across 16 devices.
## 1. Component Isolation and Hierarchy
The diagram is organized into a grid structure defined by three primary axes of parallelism:
### Vertical Axis: Pipeline Parallelism (PP)
The image is divided into two horizontal rows:
* **PP 1 (Top Row):** Contains Devices 1 through 8.
* **PP 2 (Bottom Row):** Contains Devices 9 through 16.
### Horizontal Axis: Data Parallelism (DP)
The image is divided into four main vertical sections representing data shards:
* **DP 1:** Encompasses Devices 1, 2, 9, and 10.
* **DP 2:** Encompasses Devices 3, 4, 11, and 12.
* **DP 3:** Encompasses Devices 5, 6, 13, and 14.
* **DP 4:** Encompasses Devices 7, 8, 15, and 16.
### Sub-Horizontal Axis: Tensor Parallelism (TP)
Within each DP group, devices are paired to handle tensor shards:
* **TP 1:** Devices 1, 3, 5, 7 (Top) and 9, 11, 13, 15 (Bottom).
* **TP 2:** Devices 2, 4, 6, 8 (Top) and 10, 12, 14, 16 (Bottom).
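The three groupings above amount to a coordinate system over the 16 devices. A minimal sketch, assuming devices are numbered left to right and top to bottom as listed; the names `device_coords` and `devices_in` are hypothetical helpers for illustration, not labels from the diagram:

```python
# Layout constants read off the description above: 2 rows of 8 devices.
DEVICES_PER_ROW = 8   # each row is one pipeline stage
TP_DEGREE = 2         # devices per DP section within a row


def device_coords(device: int) -> tuple[int, int, int]:
    """Return 1-based (pp, dp, tp) indices for a 1-based device number."""
    if not 1 <= device <= 16:
        raise ValueError(f"device must be in 1..16, got {device}")
    row, col = divmod(device - 1, DEVICES_PER_ROW)  # row selects the PP stage
    dp, tp = divmod(col, TP_DEGREE)                 # column splits into DP section and TP slot
    return row + 1, dp + 1, tp + 1


def devices_in(axis: str, index: int) -> list[int]:
    """List the devices whose coordinate on `axis` ('pp', 'dp' or 'tp') equals `index`."""
    position = {"pp": 0, "dp": 1, "tp": 2}[axis]
    return [d for d in range(1, 17) if device_coords(d)[position] == index]


if __name__ == "__main__":
    print(device_coords(10))     # (2, 1, 2)
    print(devices_in("dp", 3))   # [5, 6, 13, 14]
```

For example, Device 10 resolves to PP 2, DP 1, TP 2, which matches the three lists above.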
---
## 2. Device and Shard Mapping
Each device contains a block representing memory or processing units, subdivided into four segments. Specific segments are highlighted (darker green) and labeled to show how model weights or data shards are distributed.
### DP Group 1 & 2 (Left Half - Darker Green Theme)
| Device | PP Level | TP Level | Shard Label | Highlighted Segment Position |
| :--- | :--- | :--- | :--- | :--- |
| **Device 1** | PP 1 | TP 1 | **1a** | 1st (Leftmost) - *Red Border* |
| **Device 2** | PP 1 | TP 2 | **1c** | 3rd |
| **Device 3** | PP 1 | TP 1 | **1b** | 2nd |
| **Device 4** | PP 1 | TP 2 | **1d** | 4th (Rightmost) |
| **Device 9** | PP 2 | TP 1 | **2a** | 1st (Leftmost) |
| **Device 10** | PP 2 | TP 2 | **2c** | 3rd |
| **Device 11** | PP 2 | TP 1 | **2b** | 2nd |
| **Device 12** | PP 2 | TP 2 | **2d** | 4th (Rightmost) |
### DP Group 3 & 4 (Right Half - Lighter Green Theme)
*Note: This section mirrors the left half, representing a replication of the model across different data batches.*
| Device | PP Level | TP Level | Shard Label | Highlighted Segment Position |
| :--- | :--- | :--- | :--- | :--- |
| **Device 5** | PP 1 | TP 1 | **1a** | 1st (Leftmost) - *Red Border* |
| **Device 6** | PP 1 | TP 2 | **1c** | 3rd |
| **Device 7** | PP 1 | TP 1 | **1b** | 2nd |
| **Device 8** | PP 1 | TP 2 | **1d** | 4th (Rightmost) |
| **Device 13** | PP 2 | TP 1 | **2a** | 1st (Leftmost) |
| **Device 14** | PP 2 | TP 2 | **2c** | 3rd |
| **Device 15** | PP 2 | TP 1 | **2b** | 2nd |
| **Device 16** | PP 2 | TP 2 | **2d** | 4th (Rightmost) |
---
## 3. Data Flow and Sharding Logic
### Shard Grouping (Arrows)
The diagram uses black arrows to indicate how individual tensor shards (1a, 1b, 1c, 1d) are logically grouped into "Shard 1" and "Shard 2":
* **Shard 1:** Composed of segments from **TP 1** (Device 1/5) and **TP 2** (Device 2/6). Specifically, labels **1a** and **1c** point to "Shard 1".
* **Shard 2:** Composed of segments from **TP 1** (Device 3/7) and **TP 2** (Device 4/8). Specifically, labels **1b** and **1d** point to "Shard 2".
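The arrow grouping can be sketched as two lookups. This covers only the top-row labels (1a to 1d), since those are the only ones the arrows are described for. The dictionary and function names are hypothetical, and the labels for Devices 5 to 8 are taken from the "Device 1/5", "2/6", "3/7", "4/8" pairings in the bullets above:

```python
# Label highlighted on each top-row device, following the device pairings
# given in the bullets (an assumption for Devices 5-8).
LABEL_BY_DEVICE = {1: "1a", 2: "1c", 3: "1b", 4: "1d",
                   5: "1a", 6: "1c", 7: "1b", 8: "1d"}

# Arrow targets as stated in the text: 1a and 1c -> Shard 1, 1b and 1d -> Shard 2.
GROUP_BY_LABEL = {"1a": "Shard 1", "1c": "Shard 1",
                  "1b": "Shard 2", "1d": "Shard 2"}


def shard_group(device: int) -> str:
    """Return the logical shard that a top-row device's highlighted segment belongs to."""
    return GROUP_BY_LABEL[LABEL_BY_DEVICE[device]]


def devices_by_group() -> dict[str, list[int]]:
    """Collect the top-row devices under each logical shard."""
    groups: dict[str, list[int]] = {}
    for device in sorted(LABEL_BY_DEVICE):
        groups.setdefault(shard_group(device), []).append(device)
    return groups


if __name__ == "__main__":
    print(devices_by_group())
    # {'Shard 1': [1, 2, 5, 6], 'Shard 2': [3, 4, 7, 8]}
```

Under this reading, each logical shard corresponds to one TP pair, so "Shard 1" lands on Devices 1 and 2 and again on Devices 5 and 6.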
### Visual Indicators
* **Red Outlines:** Highlighted on the "1a" segments in Device 1 and Device 5. This likely indicates the entry point of a specific data operation or a primary reference point for the documentation.
* **Color Coding:**
    * **DP 1 & 2** use a darker olive green for highlighted shards.
    * **DP 3 & 4** use a brighter lime green for highlighted shards.
    * This distinction emphasizes that while the model structure (PP and TP) is identical, the data being processed (DP) is different.
## 4. Summary of Architecture
* **Total Devices:** 16
* **Pipeline Stages:** 2 (PP 1, PP 2)
* **Tensor Parallelism Degree:** 2 (TP 1, TP 2)
* **Data Parallelism Degree:** 4 (DP 1, DP 2, DP 3, DP 4)
* **Total Sharding:** Each pipeline stage is split into 4 logical shards (a, b, c, d), one per device across the TP pairs of two adjacent DP groups. The other two DP groups appear to hold a mirrored copy.
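The summary figures can be cross-checked with a few lines of arithmetic. The sketch below assumes the label order a, c, b, d repeats every four devices along each row, as read from Sections 2 and 3; `COLUMN_LETTERS` and `shard_label` are hypothetical names used only for this check:

```python
from collections import Counter
from math import prod

DEGREES = {"pp": 2, "tp": 2, "dp": 4}   # degrees listed in the summary
COLUMN_LETTERS = ("a", "c", "b", "d")   # assumed left-to-right label order, repeating every 4 devices


def shard_label(device: int) -> str:
    """Label such as '1a' for a 1-based device number, under the assumed pattern."""
    stage = (device - 1) // 8 + 1
    return f"{stage}{COLUMN_LETTERS[(device - 1) % 4]}"


# The product of the three degrees should equal the device count.
total_devices = prod(DEGREES.values())

# Count how many devices carry each shard label.
copies = Counter(shard_label(d) for d in range(1, total_devices + 1))

if __name__ == "__main__":
    print(total_devices)          # 16
    print(sorted(copies))         # ['1a', '1b', '1c', '1d', '2a', '2b', '2c', '2d']
    print(set(copies.values()))   # {2}
```

The product 2 × 2 × 4 gives the 16 devices, and under the assumed pattern each of the 8 labels appears on exactly two devices, one in each half of the diagram.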