## Text Document: Evaluation Prompt & Examples
### Overview
The image contains a text document outlining the instructions for an evaluation task. The task involves assessing the quality of predicted answers to questions, comparing them to "gold target" answers, and assigning a grade of "CORRECT", "INCORRECT", or "NOT_ATTEMPTED". The document provides examples of each grade to illustrate the expected criteria. A new question is presented at the end for evaluation.
### Content Details
The document is structured as follows:
1. **Introduction:** States the task – to grade predicted answers.
2. **Examples of CORRECT answers:**
* **Question:** What are the names of Barack Obama's children?
* **Gold target:** Malia Obama and Sasha Obama
* **Predicted answer 1:** sasha and malia obama
* **Predicted answer 2:** most people would say Malia and Sasha, but I'm not sure and would have to double check
3. **Examples of INCORRECT answers:**
* **Question:** What are the names of Barack Obama's children?
* **Gold target:** Malia and Sasha
* **Predicted answer 1:** Malia.
* **Predicted answer 2:** Malia, Sasha, and Susan.
4. **Examples of NOT_ATTEMPTED answers:**
* **Question:** What are the names of Barack Obama's children?
* **Gold target:** Malia and Sasha
* **Predicted answer 1:** I don't know.
* **Predicted answer 2:** I need more context about which Obama you are …
5. **New Question for Evaluation:**
* **Question:** Who received the IEEE Frank Rosenblatt Award in 2010?
* **Gold target:** Michio Sugeno
* **Predicted answer:** The recipient of the 2010 IEEE Frank Rosenblatt Award was “Jürgen Schmidhuber.” He was honored for his significant contributions to the development of machine learning and neural networks, particularly for his work on long-short-term memory (LSTM) networks, which have been highly influential in sequence modeling and various applications in artificial intelligence.
6. **Grading Instructions:** Grade the predicted answer as A: CORRECT, B: INCORRECT, or C: NOT_ATTEMPTED. Only return the letter.
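
Because the grader is told to return only a letter, the reply has to be mapped back to one of the three grades. The following is a minimal sketch of that mapping, not part of the document itself: the names `Grade` and `parse_grade` are hypothetical, and the choice to reject anything other than a lone `A`, `B`, or `C` is an assumption about how strictly "only return the letter" should be enforced.

```python
import re
from enum import Enum
from typing import Optional


class Grade(Enum):
    """Letter-to-grade mapping taken from the grading instructions."""
    CORRECT = "A"
    INCORRECT = "B"
    NOT_ATTEMPTED = "C"


def parse_grade(reply: str) -> Optional[Grade]:
    """Return the Grade for a reply that is exactly one letter A, B, or C.

    Surrounding whitespace is ignored. Any other reply (lowercase letters,
    extra words, an empty string) yields None. This strictness is an
    assumption, not something the document specifies.
    """
    match = re.fullmatch(r"\s*([ABC])\s*", reply)
    return Grade(match.group(1)) if match else None
```

For the test case in the document, `parse_grade("B").name` evaluates to `"INCORRECT"`, while a reply such as `"B: INCORRECT"` returns `None` under this strict reading. A more lenient parser could accept the longer form, at the cost of masking graders that ignore the output instruction.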
### Key Observations
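* The "CORRECT" examples show that surface differences are tolerated: a different name order, lowercase text, or hedged phrasing still counts as "CORRECT" when every name in the gold target is present. Partial answers are not accepted; "Malia." alone is graded "INCORRECT", as is an answer that adds a wrong name.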
* The "NOT_ATTEMPTED" examples indicate that a lack of relevant information or a request for clarification constitutes a non-attempt.
* The new question's predicted answer provides a different name ("Jürgen Schmidhuber") than the gold target ("Michio Sugeno").
### Interpretation
The document establishes a rubric for grading system-generated answers against a gold target. The examples mark the edges of the criteria: differences in order, capitalization, or hedging are tolerated, answers that omit a name or add a wrong one are not, and a refusal or a request for clarification counts as a non-attempt. The final question serves as a test case for applying these criteria. Its predicted answer names Jürgen Schmidhuber, whereas the gold target is Michio Sugeno, so the answer contradicts the gold target. Under the provided instructions, the correct grade is therefore "B: INCORRECT", returned as the single letter "B". The document is designed to assess whether a grader can consistently apply a defined set of rules to the accuracy and completeness of generated responses.