# Technical Document Extraction: Multimodal Transformer for Document Classification
## Diagram Overview
This diagram illustrates a multimodal transformer architecture for document classification, showing data flow from input sampling through model processing to prediction. Key components include entity distributions, meta-parameters, embedding spaces, and token labeling.
---
## Legend Analysis
The legend sits in the top-left corner and defines four categories:
1. **In-Task Distribution (ITD) Entities (Labeled)**
   - Represented by colored squares (blue, green, red)
   - Spatial coordinates: [x=20, y=20] to [x=120, y=60]
2. **In-Task Distribution (ITD) Entities (Unlabeled)**
   - White squares with colored borders
   - Spatial coordinates: [x=140, y=20] to [x=240, y=60]
3. **Out-of-Task Distribution (OTD) Entities/Background (Labeled)**
   - Yellow circles
   - Spatial coordinates: [x=260, y=20] to [x=360, y=60]
4. **Out-of-Task Distribution (OTD) Entities/Background (Unlabeled)**
   - Yellow triangles
   - Spatial coordinates: [x=380, y=20] to [x=480, y=60]
*Note: All legend elements match their corresponding visual representations in the diagram.*
---
## Component Breakdown
### 1. Sampling Stage (Left Panel)
- **Input Sources**:
  - `Si (train)`: Labeled in-task documents (blue/green/red squares)
  - `Qi (test)`: Unlabeled in-task documents (white squares with borders)
  - `P(T)`: Out-of-task distribution sampling (yellow circles/triangles)
- **Document Representation**:
  - Token sequences shown as colored blocks (0 to L-1 positions)
  - Example: "tan chay yee" document with mixed ITD/OTD tokens
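
The split into a labeled set `Si (train)` and an unlabeled set `Qi (test)` can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline shown in the diagram: the function name `split_episode`, the toy documents other than the "tan chay yee" tokens, and the label names (`NAME`, `ID`, `DATE`, `O` for background) are all hypothetical, and sampling from `P(T)` is not modeled.

```python
import random

def split_episode(documents, n_support, n_query, seed=0):
    """Shuffle one task's documents and split them into a labeled
    support set (Si) and an unlabeled query set (Qi)."""
    rng = random.Random(seed)
    docs = list(documents)
    rng.shuffle(docs)
    support = docs[:n_support]  # Si: (tokens, labels) pairs, labels kept
    # Qi: token sequences only, labels withheld from the model
    query = [tokens for tokens, _labels in docs[n_support:n_support + n_query]]
    return support, query

# Hypothetical toy documents: (tokens, per-token labels); "O" marks background.
documents = [
    (["tan", "chay", "yee", "total"], ["NAME", "NAME", "NAME", "O"]),
    (["invoice", "no", "1234"], ["O", "O", "ID"]),
    (["date", "2020-01-01"], ["O", "DATE"]),
    (["paid", "by", "lee"], ["O", "O", "NAME"]),
]

support, query = split_episode(documents, n_support=2, n_query=2)
print(len(support), "support documents,", len(query), "query documents")
```

Each support entry keeps one label per token position (0 to L-1), while query entries carry tokens only, mirroring the labeled/unlabeled distinction in the legend.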
### 2. Meta-Parameters Processing (Center Panel)
- **Attention Mechanism**:
  - Red arrows indicate cross-document attention between:
    - `Ym` (in-task features)
    - `Ymq` (query features)
  - Positional encoding: 1D/2D position markers
- **Output**: Task-dependent embedding space (see next section)
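
The cross-document attention between `Ym` and `Ymq` can be illustrated with plain scaled dot-product attention. This sketch assumes that query-document tokens attend over in-task (support) tokens; the diagram description does not state the direction, and learned projections, multiple heads, and positional encodings are omitted. The function names and the 2-D feature values are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query vector attends
    over all key vectors and returns a weighted sum of the values."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out = [sum(wj * v[c] for wj, v in zip(w, values))
               for c in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights

# Hypothetical token features: Ym (in-task) and Ymq (query).
ym = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ymq = [[2.0, 0.0], [0.0, 2.0]]

attended, attn = cross_attention(ymq, ym, ym)
for row in attn:
    print([round(w, 3) for w in row])
```

Each row of `attn` sums to 1 and shows how strongly one query token draws on each in-task token.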
### 3. Task-Dependent Embedding Space (Right Panel)
- **Clustering**:
  - **Class-1**: Green squares (OTD background)
  - **Class-2**: Red squares (ITD entities)
  - **Class-3**: Blue squares (ITD entities)
- **Token Labeling**:
  - Final predictions shown as yellow triangles connected to input documents via red lines
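
One simple way to label tokens in a clustered embedding space is nearest-centroid assignment, sketched below. This is an assumption made for illustration: the diagram description does not say how labels are derived from the clusters, and the function names and 2-D embedding values are hypothetical.

```python
import math

def class_centroids(embeddings, labels):
    """Average the embeddings of each class to get one centroid per class."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def label_tokens(query_embeddings, centroids):
    """Assign each query embedding the class of its nearest centroid
    (Euclidean distance)."""
    return [min(centroids, key=lambda c: math.dist(vec, centroids[c]))
            for vec in query_embeddings]

# Hypothetical labeled token embeddings, using the class names from the diagram.
support_emb = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8], [0.0, 1.0]]
support_lab = ["Class-1", "Class-1", "Class-2", "Class-2", "Class-3"]

centroids = class_centroids(support_emb, support_lab)
predictions = label_tokens([[0.1, 0.1], [0.9, 0.9], [0.1, 0.9]], centroids)
print(predictions)  # ['Class-1', 'Class-2', 'Class-3']
```

Nearest-centroid assignment is a deliberately simple stand-in; any classifier operating on the embedding space would fill the same role.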
---
## Data Flow Diagram
1. **Input**: Mixed ITD/OTD documents sampled from distributions
2. **Processing**:
   - Meta-parameters (φ) encode document features
   - Attention mechanism aligns tokens across documents
3. **Output**:
   - Embeddings clustered by class in 2D space
   - Predictions mapped back to original documents
---
## Key Observations
1. **Data Distribution**:
   - ITD entities (labeled/unlabeled) form distinct clusters
   - OTD entities/background show broader distribution
2. **Model Behavior**:
   - Attention mechanism prioritizes cross-document relationships
   - Embedding space separates classes with clear decision boundaries
3. **Prediction Accuracy**:
   - Red/yellow connections show high-confidence predictions for ITD entities
---
## Language Note
All textual content is in English. No non-English text detected.