# Technical Document Extraction: Constrained Decoding With Logits Mask
## Diagram Components and Flow
### 1. Regular Expression (Top Left)
- **JSON Structure**:
```json
{
"name": "[\\w\\d\\s]+",
"age": "[0-9]+",
"house": "(Gryffindor|Slytherin|Ravenclaw|Hufflepuff)"
}
```
- **Key Elements**:
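- `name`: Regex pattern `[\\w\\d\\s]+` (one or more word characters or whitespace; `\w` already covers letters, digits, and underscore, so `\d` is redundant)
- `age`: Regex pattern `[0-9]+` (one or more digits)
- `house`: Alternation over four literal values: `Gryffindor`, `Slytherin`, `Ravenclaw`, `Hufflepuff`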
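The three field patterns can be checked independently with Python's standard `re` module. This is a minimal sketch, not the pipeline shown in the diagram: the `FIELD_PATTERNS` dictionary and the `check_fields` helper are hypothetical names introduced here, and the patterns are the JSON-unescaped forms of the ones above (`\\w` becomes `\w`).

```python
import re

# Patterns copied from the diagram, with JSON escaping removed.
FIELD_PATTERNS = {
    "name": r"[\w\d\s]+",
    "age": r"[0-9]+",
    "house": r"(Gryffindor|Slytherin|Ravenclaw|Hufflepuff)",
}


def check_fields(record):
    """Return, per field, whether the whole value matches its pattern."""
    return {
        key: re.fullmatch(pattern, str(record.get(key, ""))) is not None
        for key, pattern in FIELD_PATTERNS.items()
    }


if __name__ == "__main__":
    print(check_fields({"name": "Harry Potter", "age": "11", "house": "Gryffindor"}))
    print(check_fields({"name": "Harry", "age": "eleven", "house": "gryffindor"}))
```

`re.fullmatch` is used rather than `re.search` so that a value such as `11 years` is rejected instead of matching on its numeric prefix.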
### 2. Finite State Machine (Top Right)
- **States**: Eight states, numbered 0 through 7
- **Transitions**:
- Linear progression: `0 → 1 → 2 → 3 → 4 → 5 → 6 → 7`
- Self-loop: state 6 returns to itself on any digit (`[0-9]`)
- **Legend**:
- `0-9`: Numeric input range
- `,`: Comma delimiter
- `[0-9]`: Numeric input (loop condition)
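A self-loop of this kind can be illustrated with a small character-level automaton for the fragment `[0-9]+,` (digits followed by a comma). This is a sketch under stated assumptions: the three-state numbering below is hypothetical and does not correspond to states 0 through 7 in the diagram, and `TRANSITIONS`, `run`, and `accepts` are names introduced here for illustration.

```python
DIGITS = "0123456789"

# TRANSITIONS[state] maps an input character to the next state.
# State 0: expects the first digit.
# State 1: loops on further digits, leaves on a comma.
# State 2: accepting state, no outgoing transitions.
TRANSITIONS = {
    0: {d: 1 for d in DIGITS},
    1: {**{d: 1 for d in DIGITS}, ",": 2},
    2: {},
}
ACCEPTING = {2}


def run(text, state=0):
    """Walk the automaton over text; return the final state or None if stuck."""
    for ch in text:
        state = TRANSITIONS[state].get(ch)
        if state is None:
            return None
    return state


def accepts(text):
    """True if text drives the automaton from state 0 into an accepting state."""
    return run(text) in ACCEPTING


if __name__ == "__main__":
    print(accepts("11,"))  # True: two digits via the self-loop, then the comma
    print(accepts(","))    # False: at least one digit is required
```

The self-loop on state 1 is what lets a single state accept numbers of any length, which is the role the diagram assigns to the `[0-9]` loop.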
### 3. Decoding Status Examples (Bottom)
#### Example 1: Partial Decoding
- **Input Prompt**:
```
Please fill in the following information about Harry Potter.
{
"name": "Harry",
"age": "",
"hou": ""
}
```
- **Decode + FSM Output**:
- `age`: ✅ Allowed next token
- `Age`: ❌ Not allowed next token
- `hou`: ❌ Not allowed next token
#### Example 2: Sequential Decoding
- **Input Prompt**:
```
Please fill in the following information about Harry Potter.
{
"name": "Harry",
"age": "",
"fif": ""
}
```
- **Decode + FSM Output**:
- `0`: ✅ Allowed next token
- `1`: ✅ Allowed next token
- `fif`: ❌ Not allowed next token
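The allowed and rejected tokens in the two examples above can be reproduced by walking each candidate token through an automaton one character at a time. The sketch below is illustrative only: it assumes the decoder in Example 1 sits just after the opening quote of the second key, and in Example 2 just after `"age": `. The helpers `build_dfa`, `walk`, and `classify` are hypothetical, as is the state numbering.

```python
DIGITS = "0123456789"


def build_dfa(literal):
    """Build a DFA that consumes `literal`, then one or more digits."""
    n = len(literal)
    # States 0..n-1 each consume one character of the literal.
    trans = {i: {ch: i + 1} for i, ch in enumerate(literal)}
    # State n requires a first digit; state n+1 loops on further digits.
    trans[n] = {d: n + 1 for d in DIGITS}
    trans[n + 1] = {d: n + 1 for d in DIGITS}
    return trans


def walk(trans, state, token):
    """Feed a multi-character token; return the new state or None if rejected."""
    for ch in token:
        state = trans.get(state, {}).get(ch)
        if state is None:
            return None
    return state


def classify(trans, state, candidates):
    """Map each candidate token to True (allowed) or False (not allowed)."""
    return {tok: walk(trans, state, tok) is not None for tok in candidates}


if __name__ == "__main__":
    dfa = build_dfa('age": ')
    # Example 1: the key itself is being decoded.
    print(classify(dfa, 0, ["age", "Age", "hou"]))
    # Example 2: the key and separator are consumed, digits are expected.
    after_key = walk(dfa, 0, 'age": ')
    print(classify(dfa, after_key, ["0", "1", "fif"]))
```

A token is allowed only if every one of its characters has a valid transition, which is why a multi-character token such as `fif` is rejected as a whole.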
### 4. Legend (Bottom Right)
- **Symbols**:
- ✅: Allowed next token
- ❌: Not allowed next token
## Key Trends and Data Points
1. **Constrained Decoding**:
- The FSM enforces strict token sequencing based on regex patterns.
- Example 1 accepts `age` as the next token and rejects `Age` (matching is case-sensitive) and `hou` (the structure places the `age` key before `house`, so a `house` prefix is out of order at this position).
- Example 2 accepts the digit tokens `0` and `1` for the numeric `age` value and rejects `fif`, which contains no digits and therefore cannot match `[0-9]+`.
2. **Logits Masking**:
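- The FSM determines the logits mask: at each decoding step, tokens with no valid transition from the current state are masked out, so only valid continuations can be sampled.
- The self-loop on state 6 (`[0-9]`) allows multi-digit numeric input.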
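The masking step itself can be sketched in a few lines. This is a minimal illustration, assuming the FSM has already produced one boolean per vocabulary entry and that at least one token is allowed; the toy vocabulary, the logit values, and the function names `mask_logits` and `softmax` are all hypothetical.

```python
import math


def mask_logits(logits, allowed):
    """Set the logit of every disallowed token to negative infinity."""
    return [x if ok else -math.inf for x, ok in zip(logits, allowed)]


def softmax(logits):
    """Numerically stable softmax; assumes at least one finite logit."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    vocab = ["age", "Age", "hou"]
    logits = [1.0, 3.0, 2.0]        # the unconstrained model prefers "Age"
    allowed = [True, False, False]  # the FSM permits only "age"
    probs = softmax(mask_logits(logits, allowed))
    print(dict(zip(vocab, probs)))
```

Because masked tokens receive zero probability after the softmax, the sampler can never pick them, even when the unconstrained model ranked them highest.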
## Cross-Referenced Accuracy
- **Legend Alignment**:
- The ✅ and ❌ symbols in the decoding examples match the legend definitions.
- FSM transitions align with regex constraints (e.g., `[0-9]` for numeric input).
## Conclusion
The diagram illustrates a constrained decoding pipeline in which a Finite State Machine, compiled from the regex patterns, enforces token-level validation. At each step the logits mask allows only tokens that correspond to a valid FSM transition, so the output for the Harry Potter prompt always conforms to the specified JSON structure.