# Technical Document Extraction: Entity Type Classification System
## Diagram Overview
The image depicts a multi-stage entity classification pipeline with color-coded entity types and task generation workflows. Key components include:
1. **Raw Dataset Composition**
2. **Task Generation Process**
3. **Meta-Training/Testing Framework**
4. **Support/Query Set Structure**
---
## 1. Raw Dataset Composition
### Entity Type Distribution
- **Pie Chart**: "All Entity Types in Raw Dataset"
- **Colors & Labels**:
- Red: Entity Type 1
- Orange: Entity Type 2
- Blue: Entity Type 3
- Green: Entity Type 4
- Purple: Entity Type 5
- Yellow: Entity Type 6
- Gray: Out-of-task entities/background
- Light Gray: Paddings
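One way to carry this legend into code is a small enumeration that maps each color to its label and separates real entity types from the two special elements. This is a hypothetical representation: the names `Label`, `COLOR_TO_LABEL`, and `is_entity` are illustrative rather than taken from the diagram, and the sketch covers only the colors listed above.

```python
from enum import Enum


class Label(Enum):
    """Hypothetical label set mirroring the diagram's color legend."""
    TYPE_1 = "red"
    TYPE_2 = "orange"
    TYPE_3 = "blue"
    TYPE_4 = "green"
    TYPE_5 = "purple"
    TYPE_6 = "yellow"
    OUT_OF_TASK = "gray"      # out-of-task entities / background
    PADDING = "light gray"    # padding positions


# Reverse lookup: legend color name -> label.
COLOR_TO_LABEL = {label.value: label for label in Label}


def is_entity(label: Label) -> bool:
    """True for the six entity types, False for the two special elements."""
    return label not in (Label.OUT_OF_TASK, Label.PADDING)
```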
### Base Types vs Novel Types
Segment sizes below are rough visual estimates; the chart itself carries no numeric labels.
- **Base Types** (the larger share of the pie):
  - Entity Type 1 (Red): ~35%
  - Entity Type 2 (Orange): ~25%
  - Entity Type 3 (Blue): ~20%
  - Entity Type 4 (Green): ~15%
  - Entity Type 5 (Purple): ~5%
- **Novel Types** (the smaller share):
  - Entity Type 6 (Yellow): ~10%
  - Entity Type 7 (Light Blue): ~5%
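Few-shot setups of this kind normally keep base and novel types disjoint, so that meta-test performance reflects types never seen during meta-training. A minimal sanity check, assuming integer type IDs and the grouping listed above; `BASE_TYPES`, `NOVEL_TYPES`, and `check_split` are hypothetical names:

```python
from typing import FrozenSet

# Hypothetical split mirroring the chart: base types 1-5, novel types 6-7.
BASE_TYPES: FrozenSet[int] = frozenset({1, 2, 3, 4, 5})
NOVEL_TYPES: FrozenSet[int] = frozenset({6, 7})


def check_split(base: FrozenSet[int], novel: FrozenSet[int]) -> FrozenSet[int]:
    """Raise if the two groups overlap; otherwise return the full type inventory."""
    overlap = base & novel
    if overlap:
        raise ValueError(f"types in both base and novel sets: {sorted(overlap)}")
    return base | novel


ALL_TYPES = check_split(BASE_TYPES, NOVEL_TYPES)
```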
---
## 2. Task Generation Process
### Task Generator Output
- **Meta-Train Tasks** (`T₁` to `Tₙ`):
  - Each task contains 3 entity types (e.g., `T₁`: Red, Orange, Blue)
  - Tasks are color-coded using the same legend
- **Meta-Test Tasks** (`T₁*` to `Tₙ*`):
  - Similar structure to the meta-train tasks
  - Include novel entity types (e.g., `T₁*`: Orange, Purple, Green)
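The diagram does not say how the task generator chooses types. A plausible sketch, assuming uniform random sampling of 3 distinct types per task and, for meta-test tasks, rejecting any draw that contains no novel type; `sample_tasks` and its parameters are hypothetical:

```python
import random
from typing import FrozenSet, Iterable, List


def sample_tasks(pool: Iterable[int], n_tasks: int, n_way: int = 3,
                 must_include: FrozenSet[int] = frozenset(),
                 seed: int = 0) -> List[FrozenSet[int]]:
    """Draw n_tasks tasks, each a set of n_way distinct types from pool.

    If must_include is non-empty, every task contains at least one of those types.
    """
    types = sorted(set(pool))
    if n_way > len(types):
        raise ValueError("n_way exceeds the number of available types")
    if must_include and not (must_include & set(types)):
        raise ValueError("none of the required types are in the pool")
    rng = random.Random(seed)
    tasks: List[FrozenSet[int]] = []
    while len(tasks) < n_tasks:
        task = frozenset(rng.sample(types, n_way))
        # Rejection step: skip draws that miss every required type.
        if must_include and not (task & must_include):
            continue
        tasks.append(task)
    return tasks


meta_train = sample_tasks(range(1, 6), n_tasks=4)
meta_test = sample_tasks(range(1, 8), n_tasks=4,
                         must_include=frozenset({6, 7}), seed=1)
```

Rejection sampling keeps the code short; it terminates as long as the pool contains at least one required type.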
---
## 3. Support/Query Set Structure
### Support (Training) Set
- **Documents** (`Doc 1` to `Doc 5`):
  - Each document contains 3–5 entity blocks
  - Example (`Doc 1`):
    - Red block (Entity Type 1)
    - Orange block (Entity Type 2)
    - Blue block (Entity Type 3)
    - Light gray padding
### Query (Validation) Set
- **Documents** (`Doc 1` to `Doc 6`):
  - Similar structure to the support set
  - Later documents include novel entity types
  - Example (`Doc 6`):
    - Blue block (Entity Type 3)
    - Red block (Entity Type 1)
    - Orange block (Entity Type 2)
    - Light gray padding
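The diagram shows 5 support documents and 6 query documents per task. A sketch of drawing both from one task's document pool, assuming the two sets must not share documents (an assumption; the figure numbers each set from `Doc 1` independently); `split_episode` is a hypothetical helper:

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_episode(docs: Sequence[T], n_support: int = 5, n_query: int = 6,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Draw disjoint support and query document sets for one task."""
    if n_support + n_query > len(docs):
        raise ValueError("not enough documents for this episode")
    rng = random.Random(seed)
    # One sample without replacement guarantees the two sets do not overlap.
    chosen = rng.sample(list(docs), n_support + n_query)
    return chosen[:n_support], chosen[n_support:]
```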
---
## 4. Legend Analysis
- **Color Legend** (right side):
  - **Entity Types**:
    - Red: Entity Type 1
    - Orange: Entity Type 2
    - Blue: Entity Type 3
    - Green: Entity Type 4
    - Purple: Entity Type 5
    - Yellow: Entity Type 6
  - **Special Elements**:
    - Gray: Out-of-task entities/background
    - Light Gray: Paddings
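The two special elements suggest a per-task relabeling step: entity types outside the current task collapse into the gray out-of-task/background class, and light gray padding fills each sequence to a fixed length. A minimal sketch, assuming token-level string labels; the tags `O` and `PAD`, the truncation behavior, and `relabel_and_pad` are all hypothetical:

```python
from typing import Iterable, List, Sequence

OUT_OF_TASK = "O"  # hypothetical tag for gray blocks (out-of-task / background)
PAD = "PAD"        # hypothetical tag for light gray blocks (padding)


def relabel_and_pad(labels: Sequence[str], task_types: Iterable[str],
                    max_len: int) -> List[str]:
    """Keep in-task labels, map all others to OUT_OF_TASK, then pad or truncate."""
    keep = set(task_types)
    out = [lab if lab in keep else OUT_OF_TASK for lab in labels[:max_len]]
    return out + [PAD] * (max_len - len(out))
```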
---
## 5. Spatial Grounding & Color Verification
- **Legend Position**: Right side of diagram
- **Color Consistency Check**:
  - All entity blocks in the support/query sets match the legend colors
  - Example: `Doc 3` in the support set contains Red (Type 1), Orange (Type 2), and Blue (Type 3)
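The color consistency check can be automated. A sketch assuming each document is recorded as a list of block color names (a hypothetical encoding); `LEGEND` mirrors the legend above and `unknown_colors` is a hypothetical helper that reports any color the legend does not define:

```python
from typing import Dict, List, Set

LEGEND: Set[str] = {"red", "orange", "blue", "green", "purple", "yellow",
                    "gray", "light gray"}


def unknown_colors(docs: Dict[str, List[str]], legend: Set[str]) -> Dict[str, Set[str]]:
    """Map each document name to the block colors it uses that the legend lacks."""
    report: Dict[str, Set[str]] = {}
    for name, colors in docs.items():
        missing = set(colors) - legend
        if missing:
            report[name] = missing
    return report
```

For example, a light blue block (the color given for Entity Type 7) would be flagged, since the legend lists no light blue entry.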
---
## 6. Trend Verification
- **Pie Chart Trends**:
  - Base types (Types 1–5) make up most of the raw dataset
  - Novel types (Types 6–7) make up the smaller remainder
- **Task Complexity**:
  - Meta-train tasks draw on the common (base) entity types
  - Meta-test tasks introduce novel type combinations
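Since the figure provides no numeric frequencies for the entity types, the base/novel balance would have to be measured from the annotations themselves. A minimal sketch, assuming one integer type ID per entity mention; `group_shares` is a hypothetical helper, and the input in the usage line is made up for illustration:

```python
from collections import Counter
from typing import Dict, Iterable, Set


def group_shares(mentions: Iterable[int], base: Set[int],
                 novel: Set[int]) -> Dict[str, float]:
    """Return the fraction of entity mentions that fall in base vs novel types."""
    counts: Counter = Counter()
    for type_id in mentions:
        if type_id in base:
            counts["base"] += 1
        elif type_id in novel:
            counts["novel"] += 1
        else:
            raise ValueError(f"type {type_id} is in neither group")
    total = counts["base"] + counts["novel"]
    if total == 0:
        return {"base": 0.0, "novel": 0.0}
    return {"base": counts["base"] / total, "novel": counts["novel"] / total}


shares = group_shares([1, 1, 2, 6], base={1, 2, 3, 4, 5}, novel={6, 7})
```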
---
## 7. Component Isolation
### Header Region
- Title: "Entity Type Classification System"
- Subtitle: "Multi-stage pipeline for entity recognition"
### Main Chart Region
- Left: Raw dataset → Task Generator → Meta-tasks
- Right: Support/Query sets with color-coded entities
### Footer Region
- Legend explaining color coding
- Spatial grounding of all components
---
## 8. Data Table Reconstruction
### Support Set Table
| Document | Entity Type 1 | Entity Type 2 | Entity Type 3 | Paddings |
|----------|---------------|---------------|---------------|----------|
| Doc 1 | Present | Present | Present | Present |
| Doc 2 | Present | Present | Present | Present |
| Doc 3 | Present | Present | Present | Present |
| Doc 4 | Present | Present | Present | Present |
| Doc 5 | Present | Present | Present | Present |
### Query Set Table
| Document | Entity Type 1 | Entity Type 2 | Entity Type 3 | Paddings |
|----------|---------------|---------------|---------------|----------|
| Doc 1 | Present | Present | Present | Present |
| Doc 2 | Present | Present | Present | Present |
| Doc 3 | Present | Present | Present | Present |
| Doc 4 | Present | Present | Present | Present |
| Doc 5 | Present | Present | Present | Present |
| Doc 6 | Present | Present | Present | Present |
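Tables like these can be generated from per-document label lists instead of being filled in by hand. A sketch, assuming each document is given as the collection of labels it contains; `presence_table` is a hypothetical helper that emits Markdown in the same layout as the tables above:

```python
from typing import Dict, Iterable, List


def presence_table(docs: Dict[str, Iterable[str]], columns: List[str]) -> str:
    """Render a Markdown table marking which labels occur in each document."""
    header = "| Document | " + " | ".join(columns) + " |"
    rule = "|" + "---|" * (len(columns) + 1)
    rows = []
    for name, labels in docs.items():
        present = set(labels)
        cells = ["Present" if col in present else "Absent" for col in columns]
        rows.append(f"| {name} | " + " | ".join(cells) + " |")
    return "\n".join([header, rule, *rows])
```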
---
## 9. Critical Observations
1. **Data Imbalance**: Base types (Types 1–5) dominate the training data
2. **Novelty Introduction**: Meta-test tasks include Type 6 (Yellow) and Type 7 (Light Blue)
3. **Padding Strategy**: Light gray blocks pad sequences to a common length for alignment; gray blocks mark out-of-task entities and background
4. **Task Diversity**: Each meta-task combines 3 distinct entity types
---
## 10. Missing Information
- No numerical values provided for entity type frequencies
- No explicit timestamps or version information
- No explanation of the purpose of "out-of-task entities"
---
## Conclusion
This diagram illustrates a comprehensive entity classification system with:
- Color-coded entity type representation
- Multi-stage task generation pipeline
- Support/query (validation) set structure
- Clear spatial organization of components