# Technical Document Extraction
## (a) Different shapes of GEMMs in LLM
### Table Structure
| Operation | M | N | K |
|--------------------------|------------------|------------------|------------------|
| **Prefill phase** | | | |
| K, Q, V projection | SeqLen*B | HD*3 | HD |
| O projection | SeqLen*B | HD | HD |
| FFN1 | SeqLen*B | FD | HD |
| FFN2 | SeqLen*B | HD | FD |
| **Decode phase** | | | |
| K, Q, V projection | B | HD*3 | HD |
| O projection | B | HD | HD |
| FFN1 | B | FD | HD |
| FFN2 | B | HD | FD |
### Key Notes
- **Color Coding**:
  - Blue: Prefill phase
  - Red: Decode phase
- **Footnotes**:
  - HD: Hidden dimension size
  - FD: Dimension size after first FFN
  - B: Batch size
  - SeqLen: Input sequence length
- **Highlight**: "Only 4 shapes!" (red text)
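
The "Only 4 shapes!" note can be checked with a short sketch that rebuilds table (a) from the four symbols. The function name `gemm_shapes` is hypothetical, `hd=4096` and `fd=11008` are chosen to agree with the [N, K] footnotes in section (c), and the batch size and sequence length are arbitrary illustrative values.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int, int, int]  # (operation, M, N, K)


def gemm_shapes(hd: int, fd: int, batch: int, seq_len: int) -> Dict[str, List[Row]]:
    """Rebuild table (a): one (operation, M, N, K) row per GEMM and phase."""
    # N and K depend only on the model dimensions, not on the phase.
    ops = [
        ("K, Q, V projection", hd * 3, hd),
        ("O projection", hd, hd),
        ("FFN1", fd, hd),
        ("FFN2", hd, fd),
    ]
    return {
        # Prefill: M = SeqLen * B
        "prefill": [(name, seq_len * batch, n, k) for name, n, k in ops],
        # Decode: M = B
        "decode": [(name, batch, n, k) for name, n, k in ops],
    }


# Illustrative values only: batch and seq_len are assumptions.
shapes = gemm_shapes(hd=4096, fd=11008, batch=2, seq_len=1024)
distinct_nk = sorted({(n, k) for rows in shapes.values() for _, _, n, k in rows})
print(distinct_nk)
```

Across both phases the set of [N, K] pairs has four entries; only M changes between prefill and decode.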
## (b) Decision flow
### Flowchart Description
1. **Start**: "For a certain LLM, traverse four [N, K] selections"
2. **First Decision**:
   - `Impl.B > Impl.A?`
   - **Yes**: `M++` (increment M)
   - **No**: Proceed to next decision
3. **Second Decision**:
   - `Impl.C > Impl.B?`
   - **Yes**: `M++` (increment M)
   - **No**: Find `M₂` (final M value)
4. **Termination**: "End"
### Abbreviations
- `ImplA`: FastGEMV
- `ImplB`: Our flat GEMM
- `ImplC`: CUTLASS
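
The decision flow can be sketched as two consecutive loops over M. This is one possible reading, not the figure's stated method: it assumes `X > Y?` means "X has higher measured latency than Y", which the flowchart does not spell out. The function `find_thresholds`, the `max_m` cap, and the synthetic latency models are all hypothetical; in practice the latencies would come from profiling each implementation for a given [N, K].

```python
from typing import Callable, Tuple

Latency = Callable[[int], float]  # maps M to a measured latency (assumed)


def find_thresholds(lat_a: Latency, lat_b: Latency, lat_c: Latency,
                    max_m: int = 1024) -> Tuple[int, int]:
    """Walk M upward through the two decisions for one [N, K] shape."""
    m = 1
    # First decision: while Impl.B is slower than Impl.A, do M++.
    while m < max_m and lat_b(m) > lat_a(m):
        m += 1
    first = m  # hypothetical name for the M at which the first decision exits
    # Second decision: while Impl.C is slower than Impl.B, do M++.
    while m < max_m and lat_c(m) > lat_b(m):
        m += 1
    return first, m  # the second value plays the role of the final M


# Synthetic integer latency models, for illustration only.
def fast_gemv(m: int) -> int:   # Impl.A: cost grows quickly with M
    return 100 * m


def flat_gemm(m: int) -> int:   # Impl.B: fixed overhead, moderate growth
    return 250 + 10 * m


def cutlass(m: int) -> int:     # Impl.C: larger overhead, slow growth
    return 400 + m


thresholds = find_thresholds(fast_gemv, flat_gemm, cutlass)
print(thresholds)
```

The flow would be repeated for each of the four [N, K] shapes, giving one pair of thresholds per shape.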
## (c) Example of heuristic dataflow with hardware resource adaptation
### Table Structure
| M | Pattern Description | Label Description | [N, K] Dimensions |
|---------|-----------------------------------|--------------------------------------------|-------------------------|
| M=17 | Striped (blue) | Using cuBLAS/CUTLASS... | Not specified |
| M=16 | Striped (blue) | Using cuBLAS/CUTLASS... | Not specified |
| M=9 | Striped (blue) | Using cuBLAS/CUTLASS... | Not specified |
| M=8 | Striped (blue) | Using cuBLAS/CUTLASS... | Not specified |
| M=3 | Dotted (red) | Using our flat GEMM optimization | Not specified |
| M=2 | Dotted (red) | Using our flat GEMM optimization | Not specified |
| M=1 | Solid (blue) | Using GEMV on CUDA Core (e.g., FastGEMV) | Not specified |
### Footnotes
- `[N, K] = [12288, 4096]` (M=17)
- `[N, K] = [4096, 4096]` (M=16)
- `[N, K] = [11008, 4096]` (M=9)
- `[N, K] = [4096, 11008]` (M=1)
### Color Legend
- **Blue**: Prefill phase / cuBLAS/CUTLASS usage
- **Red**: Decode phase / Flat GEMM optimization
- **Solid Blue**: GEMV on CUDA Core
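
The mapping in table (c) amounts to a lookup on M against two thresholds. The sketch below assumes hypothetical thresholds `m1=2` and `m2=8`, picked only so that the result agrees with the rows listed above; the table does not say which implementation handles M=4 through M=7, so the exact second threshold is an assumption.

```python
def select_impl(m: int, m1: int, m2: int) -> str:
    """Pick a GEMM implementation for a given M (thresholds are assumed inputs)."""
    if m < m1:
        return "GEMV on CUDA Core (e.g., FastGEMV)"
    if m < m2:
        return "flat GEMM optimization"
    return "cuBLAS/CUTLASS"


# Hypothetical thresholds chosen to reproduce the rows of table (c).
M1, M2 = 2, 8
for m in (1, 2, 3, 8, 9, 16, 17):
    print(m, select_impl(m, M1, M2))
```

In a real deployment the thresholds would be stored per [N, K] shape, since each shape can have its own crossover points.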
## Spatial Grounding & Trend Verification
1. **Table (a)**:
   - All entries follow `[Operation, M, N, K]` format
   - Color coding matches phase labels
   - No numerical trends (categorical data)
2. **Flowchart (b)**:
   - Linear decision tree with two branching points
   - No numerical data, only logical conditions
3. **Table (c)**:
   - M values decrease from 17 to 1
   - Pattern changes from striped → dotted → solid
   - [N, K] dimensions vary non-linearly
## Component Isolation
1. **Header**:
   - Title: "Different shapes of GEMMs in LLM"
   - Subtitle: "Only 4 shapes!" (highlighted)
2. **Main Chart**:
   - Table (a) with phase-specific operations
   - Flowchart (b) with decision logic
3. **Footer**:
   - Table (c) with hardware adaptation examples
   - Footnotes explaining abbreviations
## Critical Observations
1. **Hardware Optimization**:
   - Each GEMM implementation covers a range of M: in table (c), FastGEMV handles M=1, the flat GEMM optimization handles M=2 and M=3, and cuBLAS/CUTLASS handles M=8 and above
   - The example spans the four [N, K] shapes listed in the footnotes of (c)
2. **Phase-Specific Operations**:
   - In the prefill phase, M = SeqLen*B
   - In the decode phase, M = B; N and K are the same as in the prefill phase
3. **Decision Logic**:
   - M is incremented while the comparisons `Impl.B > Impl.A?` and `Impl.C > Impl.B?` answer yes
   - The final M value is found once the second comparison answers no
## Missing Information
- No explicit numerical trends (all data categorical)
- No explicit axis titles beyond table headers
- No explicit legend placement coordinates
## Language Notes
- All text in English
- No non-English content detected