## Diagram: Word Embedding Transformation Process
### Overview
The diagram illustrates a conceptual framework for transforming word embeddings using concept vectors. It shows how word embeddings are segmented, mapped to concept vectors, and recombined into a new embedding for a specific token (w6). Components are arranged left to right and distinguished by color.
### Components
1. **Word Embeddings Section (Left)**
- **Structure**: 10 horizontal layers (W1-W10) with segmented regions
- **Segments**:
- Segment 1: Blue dashed lines (top)
- Segment 2: Red dashed lines (middle)
- Segment 3: Green dashed lines (bottom)
- **Color Coding**: Each word layer (W1-W10) has distinct pastel colors (blue, purple, red, green, etc.)
2. **Concept Vectors Section (Center)**
- **Circles**: 12 concept vectors (c1-c12) represented as colored circles
- **Sub-components**: Each circle contains 3-4 smaller colored dots (representing sub-concepts)
- **Color Legend**:
- c1: Green
- c2: Red
- c3: Purple
- c4: Blue
- c5: Teal
- c6: Pink
- c7: Light Blue
- c8: Orange
- c9: Dark Green
- c10: Maroon
- c11: Purple
- c12: Dark Green
3. **Embedding Transformation Section (Right)**
- **Old Embedding**: Horizontal bar labeled "Old Input embedding for token w6" (pink)
- **New Embedding**: Horizontal bar labeled "New Input embedding for token w6" (maroon)
- **Arrows**:
- Red arrows from concept vectors to new embedding
- Green dashed arrows from word embeddings to concept vectors
### Detailed Analysis
1. **Word Embedding Segmentation**
- Segment 1 (blue) spans W1-W3
- Segment 2 (red) spans W4-W6
- Segment 3 (green) spans W7-W10
- Each segment shows progressive color saturation changes
2. **Concept Vector Composition**
- c1 (green): Contains 3 sub-concepts (orange, yellow, green)
- c2 (red): Contains 3 sub-concepts (purple, blue, red)
- c8 (orange): Contains 3 sub-concepts (blue, red, yellow)
- c10 (maroon): Contains 3 sub-concepts (purple, blue, green)
3. **Embedding Transformation**
- Old embedding (w6) shows uniform pink coloration
- New embedding (w6) combines:
- c2 (red segment)
- c8 (orange segment)
- c10 (maroon segment)
- Transformation shown through:
- Red arrows from c2/c8/c10 to new embedding
- Green dashed arrows from word embeddings to concept vectors
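The recombination step for w6 can be sketched as follows. This is a minimal illustration, not the method behind the diagram: it assumes that "combines" means concatenating the selected concept vectors in order, and the vector values, their 2-dimensional size, and the helper name `recombine` are hypothetical placeholders.

```python
import numpy as np

# Hypothetical toy concept vectors; values and sizes are placeholders,
# not read from the diagram.
concept_vectors = {
    "c2": np.array([0.1, 0.2]),
    "c8": np.array([0.3, 0.4]),
    "c10": np.array([0.5, 0.6]),
}


def recombine(selected, concepts):
    """Concatenate the selected concept vectors, in order, into one embedding."""
    return np.concatenate([concepts[name] for name in selected])


# Assumption: the new embedding for w6 is c2, c8 and c10 laid end to end.
new_w6 = recombine(["c2", "c8", "c10"], concept_vectors)
```

Under this assumption, each colored segment of the new embedding bar corresponds to one concept vector. Other ways of combining the vectors, such as summing or averaging, would be equally compatible with the description.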
### Key Observations
1. **Color Consistency**:
- All concept vector circles maintain their legend colors
- Sub-concept dots within a circle use distinct colors that do not always match the parent circle's color (for example, c8 is orange but contains blue, red, and yellow dots)
2. **Spatial Relationships**:
- Word embeddings flow left → center (concept vectors) → right (new embeddings)
- Segment 2 (red), which contains W6, appears to be the segment most directly connected to the concept vectors
3. **Transformation Logic**:
- New embedding combines 3 concept vectors (c2, c8, c10)
- Each of these vectors is drawn in its own color, which suggests that they capture different semantic dimensions
### Interpretation
This diagram represents a conceptual model for contextual word representation learning. The segmentation of word embeddings suggests a multi-dimensional approach to capturing semantic features. The use of concept vectors as intermediate components implies a hierarchical processing structure where:
1. Raw word embeddings are decomposed into meaningful segments
2. These segments are mapped to abstract concept vectors (the green dashed arrows in the diagram)
3. Selected concept vectors are recombined to create contextually enriched embeddings
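The three steps above can be sketched in code. The diagram does not specify how segments are formed or how concept vectors are selected, so this sketch rests on assumptions: segments are taken to be equal-width slices of a single embedding vector, each segment is mapped to its nearest concept vector by Euclidean distance, and the result is the concatenation of the chosen vectors. All function names and the random toy data are hypothetical.

```python
import numpy as np


def split_into_segments(embedding, n_segments):
    """Step 1: split one embedding vector into equal-width segments."""
    return np.split(embedding, n_segments)


def nearest_concept(segment, concepts):
    """Step 2: index of the concept vector closest to the segment (Euclidean)."""
    distances = np.linalg.norm(concepts - segment, axis=1)
    return int(np.argmin(distances))


def transform(embedding, concepts_per_segment):
    """Steps 1-3: segment, map each segment to a concept vector, recombine."""
    segments = split_into_segments(embedding, len(concepts_per_segment))
    chosen = [
        concepts[nearest_concept(segment, concepts)]
        for segment, concepts in zip(segments, concepts_per_segment)
    ]
    return np.concatenate(chosen)


# Toy data (assumed): a 6-dimensional embedding, 3 segments, 4 concepts each.
rng = np.random.default_rng(0)
old_embedding = rng.normal(size=6)
concepts_per_segment = [rng.normal(size=(4, 2)) for _ in range(3)]
new_embedding = transform(old_embedding, concepts_per_segment)
```

In this sketch the new embedding has the same length as the old one, but every segment has been replaced by a concept vector from that segment's own pool. Whether the diagram intends one pool per segment or a single pool shared by all segments is not stated.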
The color coding serves as a visual aid, with each color family apparently standing for a different semantic dimension. The transformation of token w6 illustrates the idea: its old embedding is replaced by a combination of three concept vectors (c2, c8, c10), which the diagram presents as a more nuanced representation than the original.
Notable patterns include the consistent use of color families across different components, suggesting a unified semantic space. The diagram implies that effective word representation requires both decomposition into fundamental components and strategic recombination of those components.