## Text Document: Instructions for Evaluating GPT4 Pattern Inference
### Overview
This document describes a task for checking whether GPT4 correctly infers the relation or function behind a set of mappings from input strings to output strings. It contains instructions, a two-part task (Parts a and b), and three assessment questions (Q1–Q3) that rate how accurately GPT4 recognizes and describes patterns.
---
### Components
1. **Instructions Section**
   - Goal: Verify GPT4's ability to infer relations/functions from demonstrations.
   - Input Format:
     - A list of 30 demonstrations mapping an input string `s` to 5 strings (`t1, t2, t3, t4, t5`).
     - `s` is an input string; `t1–t5` are output strings (typically single words or sub-words).
   - Output: A description of patterns identified across mappings.
2. **Task Section**
   - **Part a**: Analyze input strings and their mappings to identify prominent patterns (semantic, language-related, general, or unnatural).
     - Expectations: Most cases will exhibit one pattern or none.
   - **Part b**: Answer multi-choice questions to assess alignment between GPT4's description and the actual patterns.
3. **Questions (Q1–Q3)**
   - **Q1**: Did GPT4 correctly identify the presence/absence of a pattern?
     - Options:
       1. No pattern (GPT4 said no pattern).
       2. No pattern (GPT4 described a pattern).
       3. Observable pattern (GPT4 said no pattern).
       4. Observable pattern (GPT4 described a pattern).
   - **Q2** (if Q1=4): How precise is GPT4's description?
     - Options:
       - **Correct and accurate**: Accurate description without errors.
       - **Correct but inaccurate**: Correct overall but too general/abstract or overly specific.
       - **Partially correct**: Correct to some degree but includes errors.
       - **Poor**: No description of the pattern.
   - **Q3** (if Q1=3 or 4): Categorize the most prominent pattern.
     - Options: Semantic, Language, General, Unnatural.
---
### Content Details
- **Demonstration Format**: `"s: t1, t2, t3, t4, t5"` (i.e., one input string `s` followed by its 5 output strings).
- **Pattern Types**: Semantic, language-related, general, or unnatural.
- **Assessment Criteria**:
- **Correctness**: Alignment between GPT4's description and actual patterns.
- **Precision**: Specificity and accuracy of the description.
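
The demonstration format listed above could be parsed along the following lines. This is only a minimal sketch: the names `Demonstration` and `parse_demonstration` are hypothetical, the example line is made up for illustration, and the code assumes that the first `:` ends the input string and that the output strings contain no commas.

```python
from typing import NamedTuple, Tuple


class Demonstration(NamedTuple):
    """One demonstration: an input string and its output strings."""
    s: str
    outputs: Tuple[str, ...]


def parse_demonstration(line: str, n_outputs: int = 5) -> Demonstration:
    """Parse one line of the form 's: t1, t2, t3, t4, t5'.

    Assumption: the first ':' ends the input string, and the outputs are
    comma-separated and contain no commas themselves.
    """
    s, sep, rest = line.partition(":")
    if not sep:
        raise ValueError(f"no ':' separator in {line!r}")
    outputs = tuple(t.strip() for t in rest.split(","))
    if len(outputs) != n_outputs:
        raise ValueError(f"expected {n_outputs} outputs, got {len(outputs)}")
    return Demonstration(s.strip(), outputs)


# Made-up example line; not taken from the evaluated data.
demo = parse_demonstration("cat: dog, kitten, pet, feline, mouse")
print(demo.s)        # cat
print(demo.outputs)  # ('dog', 'kitten', 'pet', 'feline', 'mouse')
```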
---
### Key Observations
1. The task emphasizes identifying **prominent patterns** in mappings, with an expectation that most cases will have one pattern or none.
2. Q1 evaluates GPT4's ability to detect patterns, while Q2 and Q3 assess the quality and categorization of those patterns.
3. Q2 and Q3 are conditional on the answer to Q1: Q2 applies only if Q1=4, and Q3 applies only if Q1=3 or Q1=4.
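
The dependency of Q2 and Q3 on Q1 can be expressed as a small consistency check on a set of answers. The sketch below is illustrative only: the enum names and the `answer_problems` helper are hypothetical, and it assumes that Q2 and Q3 must be answered whenever they apply, which the source does not state explicitly.

```python
from enum import Enum, IntEnum
from typing import List, Optional


class Q1(IntEnum):
    NO_PATTERN_NONE_REPORTED = 1  # no pattern; GPT4 said no pattern
    NO_PATTERN_BUT_DESCRIBED = 2  # no pattern; GPT4 described a pattern
    PATTERN_NONE_REPORTED = 3     # observable pattern; GPT4 said no pattern
    PATTERN_DESCRIBED = 4         # observable pattern; GPT4 described a pattern


class Q2(Enum):
    CORRECT_AND_ACCURATE = "Correct and accurate"
    CORRECT_BUT_INACCURATE = "Correct but inaccurate"
    PARTIALLY_CORRECT = "Partially correct"
    POOR = "Poor"


class Q3(Enum):
    SEMANTIC = "Semantic"
    LANGUAGE = "Language"
    GENERAL = "General"
    UNNATURAL = "Unnatural"


def answer_problems(q1: Q1,
                    q2: Optional[Q2] = None,
                    q3: Optional[Q3] = None) -> List[str]:
    """Return a list of inconsistencies between Q1 and the Q2/Q3 answers."""
    problems = []
    q2_applies = q1 == Q1.PATTERN_DESCRIBED
    q3_applies = q1 in (Q1.PATTERN_NONE_REPORTED, Q1.PATTERN_DESCRIBED)
    if q2_applies and q2 is None:
        problems.append("Q2 is required when Q1=4")  # assumed requirement
    if not q2_applies and q2 is not None:
        problems.append("Q2 applies only when Q1=4")
    if q3_applies and q3 is None:
        problems.append("Q3 is required when Q1=3 or 4")  # assumed requirement
    if not q3_applies and q3 is not None:
        problems.append("Q3 applies only when Q1=3 or 4")
    return problems


print(answer_problems(Q1.PATTERN_DESCRIBED, Q2.PARTIALLY_CORRECT, Q3.SEMANTIC))
# []
print(answer_problems(Q1.PATTERN_NONE_REPORTED, q2=Q2.POOR, q3=Q3.GENERAL))
# ['Q2 applies only when Q1=4']
```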
---
### Interpretation
This task is designed to evaluate GPT4's capacity to:
1. **Infer patterns** from structured input-output mappings (Q1).
2. **Describe patterns accurately** (Q2).

Q3 additionally records which category the most prominent pattern falls into. The conditional structure makes the assessment progressive: Q2 is asked only when a pattern is observable and GPT4 described one (Q1=4), while Q3 is asked whenever a pattern is observable (Q1=3 or 4), whether or not GPT4 reported it.

The emphasis on "prominent patterns" and the inclusion of "unnatural" as a category suggest that the evaluation is meant to cover both natural-language patterns and patterns that have no natural interpretation. The fixed demonstration format (`s: t1, t2, t3, t4, t5`) keeps the evaluation consistent across demonstrations.