# Technical Document Extraction: System Architecture Diagram
## Header Section: Text Blocks (Left Side)
Six text blocks with statistical claims about Americans, each paired with a small icon/chart:
1. **"AMERICANS WANT TO BE DISTRACTED FROM REALITY"**
- Icon: Person at desk with computer
- Text: "The desire to escape reality is a growing trend..."
2. **"AMERICANS LOVE VIDEO GAMES"**
- Icon: Pixelated game controller
- Text: "70% of Americans play video games weekly..."
3. **"AMERICANS ARE HYPER-SOCIAL"**
- Icon: Network of interconnected people
- Text: "Social media usage exceeds 10 hours/day..."
4. **"AMERICANS ARE CONNECTED"**
- Icon: Group of people with Wi-Fi symbols
- Text: "80% of Americans own smartphones..."
5. **"AMERICANS LOVE ROUTINE"**
- Icon: Clock with repetitive patterns
- Text: "60% of Americans follow daily routines..."
6. **"AMERICANS ARE INFLUENCED BY OTHERS"**
- Icon: Arrows pointing in multiple directions
- Text: "Peer influence drives 75% of purchasing decisions..."
**Question Prompt**:
_"What percentage of Americans are online?"_
---
## Main Chart: System Architecture Diagram
### Components and Flow
1. **Vision Encoder** (Blue Rectangle)
- Input: The image containing the six text blocks from the Header Section
- Output: Vision Inputs (6 channels)
2. **Align Module** (Dashed Orange Box)
- **Linear Layer**: Processes Vision Encoder output
- **Layer Norm**: Normalization step
- **LM Head (LLM)**: The LLM's language-model head, which projects hidden states onto the vocabulary
- **Layer Norm**: Second normalization
- **Softmax**: Outputs probability distribution
3. **Weighted Average Sum** (Oval Node)
- Combines Vision Inputs and Text Inputs
- Output: Aggregated embeddings
4. **LLM Embedding Matrix** (Pink Rectangle)
- Contains full text embeddings
- Selected Text Embeddings: Subset of matrix
5. **LLM** (Pink Rectangle)
- Final output: _"Response: 90%"_
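The Align Module and Weighted Average Sum steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the diagram's actual implementation. It rests on one assumption that the diagram description does not confirm: that the softmax distribution is used as the weights for averaging the rows of the LLM Embedding Matrix. The dimensions, random weights, and the helper names `linear`, `layer_norm`, `softmax`, `align`, and `rand_matrix` are all hypothetical.

```python
import math
import random

def linear(x, w):
    """Project vector x (length d_in) through weight matrix w (d_in rows, d_out columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def softmax(x):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def align(feature, w_linear, w_lm_head, embedding_matrix):
    """Linear -> Layer Norm -> LM Head -> Layer Norm -> Softmax -> weighted average.

    ASSUMPTION: the softmax output weights the rows of the embedding matrix.
    """
    hidden = layer_norm(linear(feature, w_linear))
    logits = layer_norm(linear(hidden, w_lm_head))
    probs = softmax(logits)  # one probability per vocabulary entry
    dim = len(embedding_matrix[0])
    aligned = [
        sum(p * row[k] for p, row in zip(probs, embedding_matrix))
        for k in range(dim)
    ]
    return probs, aligned

# Hypothetical toy sizes; a real model would be far larger.
D_VISION, D_MODEL, VOCAB = 4, 3, 5
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

feature = [rng.uniform(-1, 1) for _ in range(D_VISION)]  # one Vision Encoder output vector
w_linear = rand_matrix(D_VISION, D_MODEL)
w_lm_head = rand_matrix(D_MODEL, VOCAB)
embedding_matrix = rand_matrix(VOCAB, D_MODEL)           # stand-in for the LLM Embedding Matrix

probs, aligned = align(feature, w_linear, w_lm_head, embedding_matrix)
print("probabilities sum to", round(sum(probs), 6))
print("aligned embedding has", len(aligned), "dimensions")
```

Under this reading, the aggregated embedding is a convex combination of text embeddings, so every component stays within the range spanned by the corresponding column of the embedding matrix.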
### Spatial Grounding
- **Legend**: Not explicitly present in diagram
- **Color Coding**:
- Blue: Vision Encoder components
- Pink: LLM-related components
- Orange: Align Module boundaries
---
## Footer Section: Response
- **Output**: _"Response: 90%"_
- **Interpretation**: Read as the answer to the question prompt, i.e. 90% of Americans are online (inferred from context)
---
## Key Trends and Data Points
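1. **Text Block Statistics**:
- 70% of Americans play video games weekly
- Social media usage exceeds 10 hours/day
- 80% own smartphones
- 60% follow daily routines
- Peer influence drives 75% of purchasing decisions
- 90% are online (model response, not a text block)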
2. **System Flow**:
- Text blocks → Vision Encoder → Align Module → LLM → Final Response
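
One way to make the flow above machine-checkable is to encode it as a dependency graph and derive the stage order from it. The sketch below uses only the five stages named in the flow line; the dictionary name `predecessors` is hypothetical, and no wiring beyond what the flow line states is assumed.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on (taken from the flow line).
predecessors = {
    "Vision Encoder": {"Text blocks"},
    "Align Module": {"Vision Encoder"},
    "LLM": {"Align Module"},
    "Final Response": {"LLM"},
}

# static_order() yields every stage after all of its dependencies.
order = list(TopologicalSorter(predecessors).static_order())
print(" → ".join(order))
```

Because the graph is a simple chain, only one valid order exists; adding further edges (for example, from the LLM Embedding Matrix) would be a matter of extending the dictionary.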
---
## Notes
- The architecture diagram itself contains no numerical data; the percentages come from the text blocks, and the 90% figure comes from the model response.
- Diagram uses color coding to differentiate modules (blue = vision, pink = LLM, orange = Align Module boundary).
- No heatmap or data table present; focus on component interactions.