## Computational Metacognition
Michael T. Cox Zahiduddin Mohammad Sravya Kondrakunta
MICHAEL.COX@WRIGHT.EDU
MOHAMMAD.48@WRIGHT.EDU
KONDRAKUNTA.2@WRIGHT.EDU
Ventaksampath Raja Gogineni
GOGINENI.14@WRIGHT.EDU
Computer Science and Engineering, Wright State University, Dayton, OH 45435 USA
Dustin Dannenhauer Othalia Larue
DUSTIN.DANNENHAUER@PARALLAXRESEARCH.ORG
OTHALIA.LARUE@PARALLAXRESEARCH.ORG
Autonomy Research Group, Parallax Advanced Research, Beavercreek, OH 45431 USA
## Abstract
Computational metacognition represents a cognitive systems perspective on high-order reasoning in integrated artificial systems that seeks to leverage ideas from human metacognition and from metareasoning approaches in artificial intelligence. The key characteristic is to declaratively represent and then monitor traces of cognitive activity in an intelligent system in order to manage the performance of cognition itself. Improvements in cognition then lead to improvements in behavior and thus performance. We illustrate these concepts with an agent implementation in a cognitive architecture called MIDCA and show the value of metacognition in problem-solving. The results illustrate how computational metacognition improves performance by changing cognition through meta-level goal operations and learning.
## 1. Introduction
The computational metacognition process is analogous to an action-perception cycle in an intelligent agent (Cox, 2005). But, instead of perceiving the environment and acting in the world, metacognition monitors cognition and acts to control cognitive activity. In humans, such introspective processes can prove beneficial, as in a student's proper regulation of classroom learning (Price, Hertzog, & Dunlosky, 2009) or in a game show contestant's judgment of knowing answers to questions posed by the show's host (Reder & Ritter, 1992). Yet given the significant overhead and complexity it exhibits (Conitzer, 2011), metacognition is not a panacea for all computational tasks nor is it always beneficial in human performance (Norman, 2020; Wilson & Schooler, 1991). In this paper, we will examine the kinds of metacognitive activities that can improve a cognitive system's performance, and we will demonstrate these concepts in an implemented problem-solving domain.
In some sense, metacognition is an add-on to a cognitive system. If a cognitive system has a certain behavior, then a metacognitive system should be able to observe that behavior and improve the performance of that system by changing its behavior given some parameters under its control. But here we are speaking of cognitive behavior (i.e., problem solving) rather than the physical behavior of the system in the world. The impact on the actual behavior of the joint system is indirect
and improves performance by improving thinking. As such, metacognition is one of the characteristics of intelligence that separates humans from mere reinforcement machines. This paper presents a computational approach to metacognition that seeks to clarify the manner in which such indirect levers can exert such influential effects on performance. Its contribution is to specify the mechanism of metacognition in computational terms within the context of an existing cognitive architecture and to evaluate these concepts in planning problems. Previous work has discussed the process of detecting metacognitive expectation failures; here we discuss the processes of meta-level explanation and goal generation as well as meta-level planning and learning. This is our first complete implementation of the metacognitive cycle.
Section 2 describes our concept of computational metacognition consisting of explanatory, immediate and anticipatory metacognition. Subsequently, Section 3 shows how a cognitive architecture instantiates many of these ideas computationally and illustrates the principles with examples. Section 4 then evaluates this approach empirically in a simple, plant-protection domain. Finally, Section 5 enumerates related research, and Section 6 summarizes and concludes.
## 2. Computational Metacognition
Though the concept of computational metacognition has many variations in the literature, having both broad and narrow interpretations, virtually all theories divide it into some kind of introspective monitoring and meta-level control. In the broader sense, any self-directed process can be included under its umbrella; whereas, the stricter definition of meta-x as 'x about x' (see Hayes-Roth, Waterman, & Lenat, 1983) constrains the subject to cognition about cognition. This narrower definition excludes related concepts such as meta-knowledge (i.e., knowledge about knowledge), which is not a cognitive process per se (Cox, 2011). But broadly speaking, in all its interpretations across the literature, metacognition is surely 'the many-headed monster of obscure parentage' of which Brown (1987) speaks; it is not a single monolithic construct. Instead, we claim that metacognition has three fundamental forms (see Table 1).
Table 1. Types of Computational Metacognition
| Explanatory | Immediate | Anticipatory |
|---------------|---------------|----------------|
| Past | Present | Future |
| Hindsight | Insight | Foresight |
| Retrospective | Introspective | Predictive |
Explanatory metacognition is a reflective process triggered by failures in previous cognitive operations and thus represents a process akin to hindsight. Immediate metacognition represents introspective run-time control of cognition, analogous to physical eye-hand coordination. 1 Anticipatory metacognition is a reflective judgment of future cognitive performance and hence represents self-directed foresight. This paper has little room to discuss all three forms. Instead, it will focus on the explanatory category of metacognition and briefly examine the anticipatory one.
1 This is also related to decision-theoretic metareasoning applied to partial computations (Horvitz, 1990) and to bounded rationality decisions for anytime-planning (Zilberstein, 2011). In contrast, our approach is symbolic and non-statistical.
## 2.1 Explanatory Metacognition
Expectations play important functional roles in cognition such as the monitoring of plan execution (Dannenhauer & Munoz-Avila, 2015; Pettersson, 2005; Schank, 1982), managing comprehension, especially natural language understanding (Schank & Owens, 1987), and influencing emotion (Langley, 2017). Expectations are knowledge artifacts that enable cognitive systems to verify their behavior is working as intended. The agent checks if discrepancies exist between its expectations and its observations of the state of the world. When such a discrepancy is detected, an expectation failure is said to occur. Different means of addressing expectation failures have been proposed including plan adaptation (i.e., modifying a plan to be executed) (Munoz-Avila & Cox, 2008), learning (i.e., acquiring a new piece of knowledge) (Munoz-Avila, 2018; Ram & Leake, 1995a), and goal reasoning (i.e., changing the goals pursued or formulating new goals to be achieved) (Aha, 2018; Roberts, Borrajo, Cox, & Yorke-Smith, 2018). Expectation failures are signals that a problem may have arisen. Explanations of the problem then provide the basis for goal formulation and thus facilitate problem-solving. Note that we are referring to a kind of internal explanation or self-diagnosis process rather than an external explanation to another agent.
Similarly, metacognitive expectations (Dannenhauer, Cox, & Munoz-Avila, 2018) play an analogous role to the expectations discussed above. But here the expectation concerns the outcome of cognitive processes rather than events in the world. As such, it relates the mental states immediately before and resulting from a specific mental process or action. An individual mental state $s_i^M = (v_1, \dots, v_n)$ is a vector of variables; whereas, a mental action $\alpha_j^M$ performs reads from or updates to variables in a mental state. The metacognitive expectation then is represented as the triple $(s_i^M, \alpha_i^M, s_{i+1}^M)$ where $\alpha_i^M$ is the current cognitive process and each $s^M$ represents the system's mental state immediately preceding and following from it. 2 That is, for a given cognitive process, the expectation specifies constraints on memory associated with the execution of the process.
For example, one might expect commitment to a current goal before a planning process and a plan to be in memory after planning finishes. That is, the expectation would be represented as the triple $(g_c \in s_i^M, \mathit{plan}, \pi \in s_{i+1}^M)$. If a plan does not result after planning executes, then something went wrong in cognition (Cox & Dannenhauer, 2016). The reasoning from such metacognitive expectation failures is in hindsight, because it happens after a mistake occurs. The function of metacognition is to explain what caused the cognitive-level failure and to formulate a meta-level goal to mitigate the problem. For example, changing the goal minimally to something easier but serving the same end may enable successful planning when physical resources are scarce (see Cox & Veloso, 1998).
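The planning expectation above can be sketched as a small check over mental states. This is a minimal illustration, assuming dict-based mental states and a `check_expectation` helper; none of these names are MIDCA's actual API.

```python
# Hypothetical sketch: a metacognitive expectation as a triple of
# (process name, pre-state constraint, post-state constraint).

def check_expectation(pre_state, action_name, post_state, expectation):
    """Return True if the (pre, action, post) triple satisfies the
    expectation; False signals a metacognitive expectation failure."""
    exp_action, pre_check, post_check = expectation
    if action_name != exp_action:
        return True  # the expectation does not constrain this process
    return pre_check(pre_state) and post_check(post_state)

# Expectation for the Plan phase: a goal is committed beforehand and a
# plan exists in memory afterwards.
plan_expectation = (
    "plan",
    lambda s: s["current_goal"] is not None,
    lambda s: s["current_plan"] is not None,
)

pre = {"current_goal": "preserve(native)", "current_plan": None}
post = {"current_goal": "preserve(native)", "current_plan": None}
failed = not check_expectation(pre, "plan", post, plan_expectation)
# failed is True: planning finished without a plan in memory
```

A failure detected this way is only a signal; explaining its cause and formulating the mitigating meta-level goal are separate steps, discussed below.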
Multiple meta-level goals are available to affect cognition. Currently, we have implemented three different meta-level goal types. The first aims to change cognition directly, while the other two represent indirect change.
1. To change the reasoning method (e.g., change from state-space planning to case-based planning) (Cox & Dannenhauer, 2016);
2. To change cognition by changing the goals of the system (Cox, Dannenhauer, & Kondrakunta, 2017);
3. To change cognition by learning new knowledge structures (Mohammad, 2021; Mohammad, Cox, & Molineaux, 2020).
2 Dannenhauer, et al. (2018) defined metacognitive expectations as a Boolean function taking such a triple as input. But the differences are unimportant for the purposes of this paper.
This paper will focus on the third type, that is, learning-goals (i.e., an explicit meta-level goal to learn a specific piece of knowledge) (Cox, 1997; Cox & Ram, 1999; Ram & Leake, 1995b) as a metacognitive response, but the others are equally important. Plans can be generated at the meta-level composed of 'actions' such as performing learning in pursuit of a learning goal or executing different goal operations (Cox, et al., 2017; Kondrakunta, Gogineni, & Cox, in press) such as goal change in pursuit of the second type above. But the goal delegation operation has a particular role in anticipatory metacognition, which spans both cognition and metacognition.
## 2.2 Anticipatory Metacognition
The concept of anticipatory metacognition is forward looking. Humans demonstrate foresight concerning their cognitive prowess or the lack thereof on a regular basis. Likewise, cognitive systems can benefit from an ability to predict whether or not they can achieve their goals as opposed to exhaustively trying all possible solutions before acquiescing. Unlike anticipatory thinking (Amos-Binks & Dannenhauer, 2019) that predicts plan execution failures and seeks to mitigate the vulnerability of a plan after it is generated, our approach is to anticipate a failure given the goal but before planning is performed. Making such a prediction, an agent can simply delegate some of its goals to another agent willing to help it and thus mitigate the amount of work to be done.
In contrast to explanatory metacognition, which is triggered by metacognitive expectation failures, anticipatory metacognition is triggered by the presence of suspended goals. Suspended goals are ones that were part of an agent's current goal set but were determined to be problematic given, for example, physical resource limitations. This condition suggests that planning or other reasoning will fail in the future. As a result, metacognition can generate a meta-level goal to change the cognitive-level goal. In this case, it is a change from the agent's goal to another agent's goal. Thus, the goal delegation operation is partially performed at the cognitive level and partially at the meta-level. Suspending the goal is cognitive; whereas, reasoning about another agent's knowledge, skills and goals in relation to the agent's own goals is associated with Theory of Mind (Goldman, 2006; Gopnik, 2012; Wellman, 1990) and with metacognition. This allows the system to target a good candidate agent for the delegation. Finally, if the agent decides to delegate one or more of its goals and determines a candidate to achieve them, the agent still needs to make an actual request, explain the need for the request, and negotiate the favor. Such reasoning is once again situated at the cognitive level and in the speech acts (Searle, 1969) executed in the world. Gogineni, Kondrakunta, and Cox (in press) provide a complete example of this hybrid process with experimental evidence supporting the benefits of this type of metacognitive activity.
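The meta-level half of goal delegation can be sketched as matching each suspended goal against models of other agents' skills. The structures below (goal/predicate pairs, a skills dictionary standing in for a Theory-of-Mind model) are illustrative assumptions, not MIDCA code.

```python
# Illustrative sketch: anticipatory metacognition is triggered by
# suspended goals; each is matched to a candidate agent whose modeled
# skills cover the goal's predicate, so the goal can be delegated.

def delegate_suspended(suspended, agent_skills):
    """suspended: list of (goal, predicate) pairs; agent_skills maps an
    agent name to the set of goal predicates it is believed to achieve."""
    delegations = {}
    for goal, predicate in suspended:
        for agent, skills in agent_skills.items():
            if predicate in skills:
                delegations[goal] = agent  # candidate found via the model
                break
    return delegations

suspended = [("cleared(pos2-1)", "cleared")]
others = {"gardener2": {"cleared", "watered"}}
print(delegate_suspended(suspended, others))
# {'cleared(pos2-1)': 'gardener2'}
```

The actual request, explanation, and negotiation with the chosen agent then happen back at the cognitive level, as described above.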
Given space limitations, this paper can do no more than sketch the mechanisms behind anticipatory metacognition. Instead, we will describe the detailed process of explanatory metacognition and show how it is implemented in a particular cognitive architecture. Specifically, we will explain how it manifests as a combination of introspective monitoring and meta-level control. With these details in hand, we will then provide an empirical evaluation that demonstrates the benefit of computational metacognition for cognitive systems.
## 3. Architectural Structure and Implementation
The metacognitive integrated dual-cycle architecture (MIDCA) is a cognitive architecture that models both cognition and metacognition for intelligent agents (Cox et al., 2016; Cox, Oates, & Perlis, 2011; Paisner, Cox, Maynord, & Perlis, 2014) and focuses upon various goal operations including goal change, goal formulation and goal delegation (Cox & Dannenhauer, 2017; Cox, et al., 2017). 3 It consists of 'action-perception' cycles at both the cognitive level and the metacognitive level (see Figure 1). In general, a cycle performs problem-solving to achieve its goals and then comprehends the results of its behavior and interactions with other agents in its environment. The problem-solving portion of each cycle consists of three phases: intention; planning; and the action-execution/control phase. The comprehension portion consists of perception/monitoring, interpretation, and goal evaluation.
The representations in MIDCA and our formal notation borrow much from the AI planning community (e.g., Ghallab, Nau, & Traverso, 2004). We rely on the notion of a state transition system $\Sigma = (S, A, \gamma)$ for representing a planning problem. In $\Sigma$, $S$ is the set of all possible world states, $A$ is the set of actions the agent can take, and $\gamma$ is the successor function $\gamma: S \times A \to S$. A planning problem is represented as $\mathcal{P} = (\Sigma, s_0, g)$ where the initial state $s_0 \in S$ and the goal $g \in G \subseteq S$. A plan $\pi = \langle \alpha_1, \alpha_2, \dots, \alpha_n \rangle$ is a sequence of actions $\alpha_i \in A$. The plan $\pi$ represents a solution to $\mathcal{P}$ by achieving $g$ if and only if its iterative execution from $s_0$ results in a final state $\gamma(s_{n-1}, \alpha_n)$ that entails the goal expression. Note however, that a problem need not always start from some arbitrary initial state but instead may arise during the planning or the plan execution related to some previous problem (Cox, 2020). 4
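The formalism above can be rendered in a few lines of code. This is a minimal sketch under the assumptions that states and goals are sets of ground atoms and actions carry precondition/add/delete sets; the function names are ours, not MIDCA's.

```python
# Successor function gamma: S x A -> S, and the plan-solution check,
# following the state-transition-system definitions in the text.

def gamma(state, action):
    """Apply an action to a state; None if preconditions do not hold."""
    if not action["pre"] <= state:
        return None
    return (state - action["del"]) | action["add"]

def solves(plan, s0, goal):
    """True iff iteratively executing plan from s0 entails the goal."""
    state = s0
    for action in plan:
        state = gamma(state, action)
        if state is None:
            return False
    return goal <= state

# A toy action from the gardening domain used later in the paper.
spray = {"pre": {"invasive(p1)"}, "add": {"dead(p1)"}, "del": {"invasive(p1)"}}
print(solves([spray], {"invasive(p1)"}, {"dead(p1)"}))  # True
```

The entailment check here is simple set containment; richer goal expressions would need a proper entailment test, but the structure of the definition is the same.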
At the cognitive level, comprehension starts with observations in terms of percepts ($\vec{p}$) of the world ($\Psi$) whereby the Perceive phase infers the objects in the environment and the relationships between them. The Interpret phase takes as input the resulting relational state ($s_c$) and the expectations in memory to determine whether it is making sufficient progress. It is here that a model of the world ($M_\Psi$) is inferred from its observations, and new goals ($g_n$) are generated when the model indicates problems or opportunities. Interpret adds any new goals to the goal agenda ($\hat{G} = \{g_1, g_2, \dots, g_n\}$). The Evaluate phase incorporates the concepts inferred from Interpret and checks whether the current goal set ($g_c \subseteq \hat{G}$) is achieved, i.e., whether $s_c$ entails $g_c$. If so, $g_c$ is removed from the agenda and set to the empty set (i.e., $\hat{G} \leftarrow \hat{G}/g_c$ and $g_c \leftarrow \{\}$).
In cognitive-level problem solving, the Intend phase commits to a new goal set $g_c$ from those available in $\hat{G}$ if the old goal set is empty. The Plan phase then generates a sequence of actions (i.e., the current plan, $\pi_c = \langle \alpha_1, \alpha_2, \dots, \alpha_n \rangle$) to perform in pursuit of its goals. The Act phase executes the steps of the plan ($\alpha_i$) one at a time to change the environment through their effects. MIDCA will then use expectations about these actions in subsequent cycles to evaluate the execution of the plan. At the end of each cognitive phase, MIDCA performs an entire metacognitive cycle as explained below.
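The phase ordering of the cognitive cycle can be sketched schematically. This is a toy rendering of the description above, not MIDCA's implementation: goals and states are simplified to sets of atoms, and a sorted set difference stands in for the planner.

```python
from collections import deque

class Agent:
    def __init__(self, agenda):
        self.agenda = deque(agenda)  # goal agenda G-hat
        self.goals = set()           # current goal set g_c
        self.plan = []               # current plan pi_c

def cycle(agent, state):
    # Comprehension: Evaluate checks whether s_c entails g_c.
    if agent.goals and agent.goals <= state:
        agent.goals = set()                      # achieved; clear g_c
    # Problem solving: Intend commits to a new goal when g_c is empty.
    if not agent.goals and agent.agenda:
        agent.goals = {agent.agenda.popleft()}
    agent.plan = sorted(agent.goals - state)     # Plan: stand-in planner
    return agent.plan[:1]                        # Act: execute one step

agent = Agent(["cleared(pos2-1)"])
step = cycle(agent, state=set())
print(step)  # ['cleared(pos2-1)']
```

A subsequent call with a state that already entails the goal clears `g_c`, mirroring the Evaluate behavior described above.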
Like cognition, MIDCA partitions metacognition into introspective monitoring (analogous to cognitive-level comprehension) and meta-level control (analogous to problem-solving). Introspective monitoring detects metacognitive expectation failures and formulates goals to change cognition in response. Meta-level control generates and executes plans to achieve these goals and thus improve cognitive performance. See Dannenhauer, et al. (2018) for a more formal specification of the two-cycle MIDCA mechanism. Here we give enough detail to capture the metacognitive process computationally and allow the reader to follow the examples and understand the evaluation.
3 MIDCA version 1.5 is open-source and runs on python 3.8. The source code and documentation are publicly available at https://github.com/COLAB2/midca. See also www.midca-arch.org .
4 We actually assume a somewhat different problem representation $\mathcal{P}_c = (s_c, s_e, BK, H_c)$ where $s_c$ and $s_e$ represent the currently observed and expected world states, $BK$ is the background knowledge and $H_c$ is the episodic problem-solving history of the agent (see Cox, 2020). But the differences are not important for the purposes of this paper and may be ignored by the reader.
Figure 1. The metacognitive integrated dual-cycle architecture and the flow of knowledge between computational phases. The lower (orange) cycle represents cognition, receiving stimuli in the form of percepts and acting upon the environment, thus changing the world state. The upper (blue) cycle represents metacognition, receiving an introspective trace of cognition and controlling the cognitive level through goal operations and learning. Note that all knowledge structures that are metacognitive are marked with the superscript $M$.
## 3.1 Introspective Monitoring
At the meta-level, introspective monitoring starts with a trace ($\tau_c$) of the mental domain, that is, of the activity at the cognitive level (again see Figure 1). In support of this knowledge structure, Dannenhauer et al. (2018) define an agent's self-model (i.e., a model of the cognitive level) as $\Omega = (S^M, A^M, \mu)$ where each $s_i^M \in S^M$ is a possible mental state, each $\alpha_j^M \in A^M$ is a mental action, and $\mu$ is a cognitive transition function. 5 In MIDCA, a mental state is represented as a vector of length $n = 7$ where $s_i^M = (g_c, \hat{G}, \pi_c, M_\Psi, D, E, \alpha_c)$. That is, it consists of the current goal, the goal agenda, the current plan, the current world state, discrepancies detected, explanations generated, and the last action executed in the world. MIDCA employs the following mental actions: Perceive, Detect Discrepancies, Explanation, Goal Insertion (i.e., updates the set of goals $\hat{G}$ to be achieved), Evaluate, Intend, Plan, and Act. Each of these represents either one of the six phases at the cognitive level or a subprocess within Interpret. The cognitive transition function is defined as $\mu: S^M \times A^M \to S^M$. That is, given a current mental state, $\mu$ provides the successive mental state resulting from the execution of that mental action and hence is a source of metacognitive expectations. Finally, the trace is an interleaved sequence of mental states and mental actions. More specifically, $\tau_c = \langle s_0^M, \alpha_1^M, s_1^M, \alpha_2^M, \dots, \alpha_n^M, s_n^M \rangle$ where $s_0^M, \dots, s_n^M \in S^M$ and $\alpha_1^M, \dots, \alpha_n^M \in A^M$.
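The interleaved trace structure suggests a simple monitoring scan. In this sketch, dicts stand in for the 7-component mental states and strings for mental action names; the function and expectation encoding are our assumptions, not MIDCA's representation.

```python
# Sketch: scan an introspective trace [s0, a1, s1, a2, ..., an, sn] for
# metacognitive expectation failures. expectations maps a mental action
# name to (pre_check, post_check) predicates on the surrounding states.

def scan_trace(trace, expectations):
    """Return the mental actions whose metacognitive expectation fails."""
    failures = []
    for i in range(1, len(trace), 2):        # actions sit at odd indices
        pre, action, post = trace[i - 1], trace[i], trace[i + 1]
        if action in expectations:
            pre_ok, post_ok = expectations[action]
            if pre_ok(pre) and not post_ok(post):
                failures.append(action)
    return failures

expectations = {
    # Given a discrepancy beforehand, an explanation should result.
    "Explain": (lambda s: s["discrepancy"] is not None,
                lambda s: s["explanation"] is not None),
}
trace = [{"discrepancy": "dead(native)", "explanation": None},
         "Explain",
         {"discrepancy": "dead(native)", "explanation": None}]
print(scan_trace(trace, expectations))  # ['Explain']
```

Each reported failure then becomes input to the meta-level Interpret phase described next.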
The Monitor phase takes the recorded trace and places it in memory for the meta-level Interpret phase to examine. As with the cognitive-level phase, meta-level Interpret seeks to detect and explain discrepancies between specific metacognitive expectations and its 'observations' in segments of the cognitive trace. As mentioned in Section 2.1, these expectations consist of the memory states before and after a given cognitive phase or phase component. If a discrepancy exists, then Interpret attempts to explain the discrepancy and formulate a meta-level goal.
Consider an agent that is learning to take care of a garden having goals such as the preservation of native plants and the removal of invasive ones. Its knowledge of plant care, such as the correct application of herbicide, may be flawed, and therefore it may make mistakes in both the execution of solutions (i.e., its behavior in the world) and in the derivation of those solutions (i.e., its reasoning to solve problems). For example, the agent's model of the spray action may lack the knowledge that spraying herbicide will kill plants in locations adjacent to those it targets. When the goal of native plant preservation is violated after spraying invasive ones close to the natives, a normal discrepancy occurs. However, when the Interpret phase fails to explain this discrepancy, a metacognitive expectation fails. That is, the cognitive trace $\tau_c$ does not adhere to the expectation $(\mathit{discrepancy} \in s_i^M, \text{Explain}, \mathit{explanation} \in s_{i+1}^M)$ since the Explain component of the Interpret phase failed to produce the expected explanation. 6 Now the discrepancy at the meta-level must be explained.
At the meta-level Interpret phase, the idea is to detect discrepancies, explain what caused the discrepancy, and then generate a goal to remove the cause. Metacognitive explanation is a case-based process that reuses old explanations from a library of meta-explanation patterns (Meta-XPs) in a similar manner to Cox and Ram (1999; Ram & Cox, 1994). Figure 2 shows a Meta-XP applied to the gardening example. MIDCA retrieves an explanation and then checks for applicability by inspecting the Meta-XP's pre-XP nodes. These sink nodes in the graph of the knowledge structure represent those conditions that need to hold for the XP to be relevant. A distinguished node among them called the Explains node is the concept being explained. As such, the XP provides a causal
5 The self-model Ξ© is similar to the state transition system Ξ£ defined previously but is a meta-level knowledge structure.
6 Note that if another gardener were observed concurrently spraying herbicide on the adjacent native plant, then an explanation as to why the native plant was killed could be produced and a goal formulated to stop the gardener from repeating that in the future. If this were actually the case, then no metacognitive expectation failure would occur.
Figure 2 . Meta-explanation pattern for failed action execution due to a poor action model. An explanation pattern (Schank, 1986) is a causal knowledge structure representing prior experience. It maps the causes of the explains node (i.e., what is being explained) from the antecedents (i.e., the XP-asserted nodes) through the internal nodes to the consequents (i.e., the pre-XP nodes).
chain from the XP-asserted nodes (i.e., the XP's antecedents) to the explains node. In the running example, the Meta-XP of Figure 2 explains why the cognitive level Interpret phase failed to generate an explanation when MIDCA was currently at the Interpret phase and a native-plant was not preserved. If that is the case, then the XP says that this is caused when the spray action was executed, no other agent exists nearby, and knowledge of the spray action is incomplete.
The meta-level Interpret phase then takes this knowledge structure and generates a goal from the set of XP-asserted nodes. Here the goal is to negate the middle node, that is, to make the knowledge of the spray operator not incomplete (i.e., to learn a better action model). The specific form output to the meta-level goal agenda is (learned spray s_{c+1}). See Gogineni, Kondrakunta, Molineaux, and Cox (2018) for further details concerning XP retrieval, selection, and application at the cognitive level. The same techniques are used at the meta-level.
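As a minimal sketch of this goal-formulation step (the data structures and names below are hypothetical illustrations, not MIDCA's actual API):

```python
from dataclasses import dataclass

# Hypothetical sketch of meta-level goal formulation from a Meta-XP's
# antecedent (XP-asserted) nodes; names and structures are illustrative.
@dataclass(frozen=True)
class Node:
    predicate: str
    args: tuple

def formulate_learning_goal(asserted_nodes):
    """Negate the node blaming incomplete knowledge: a goal to learn that operator."""
    for node in asserted_nodes:
        if node.predicate == "knowledge" and "Incomplete" in node.args:
            return ("learned", node.args[0])
    return None

xp_nodes = [
    Node("executed-action", ("self", "spray")),
    Node("knowledge", ("spray", "Incomplete")),
    Node("not-exists", ("other-agent",)),
]
goal = formulate_learning_goal(xp_nodes)  # ("learned", "spray")
```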
## 3.2 Meta-level Control
Like problem-solving at the cognitive level, meta-level control consists of an Intend phase, a meta-level Plan phase, and a Controller. Intend takes the meta-level goal agenda (Ĝ_m) and chooses a subset as the current meta-level goal set (G^c_m). Details of the decision mechanism are available in Kondrakunta and Cox (in press) and Kondrakunta (2017). In our running example, only the single learning goal exists in the agenda, so Intend selects it as the current goal set.
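A minimal sketch of such an Intend selection follows; the priority-based choice rule is an assumption for illustration, not MIDCA's actual decision mechanism:

```python
# Illustrative sketch of a meta-level Intend phase; the priority-based
# selection rule is an assumption, not MIDCA's actual decision mechanism.
def intend(goal_agenda, max_goals=1):
    """Choose a subset of the goal agenda as the current goal set."""
    ranked = sorted(goal_agenda, key=lambda g: g["priority"], reverse=True)
    return [g["goal"] for g in ranked[:max_goals]]

# Running example: the agenda holds only the single learning goal.
agenda = [{"goal": ("learned", "spray"), "priority": 1.0}]
current_goals = intend(agenda)
```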
The meta-level Plan phase then takes the current goals and generates a plan π^c_m to achieve them using the Fast Downward Stone Soup planner (Helmert, 2006; Helmert, Röger, & Karpas, 2011). Unlike normal planners that generate sequences of actions in the world to achieve environmental states, MIDCA uses Fast Downward to achieve meta-level goals. Given the learning goal (learned spray s_{c+1}), Fast Downward must generate a sequence of learning steps that achieves the goal if executed. To perform this task, MIDCA has a number of operators represented in the Planning Domain Definition Language (PDDL) 2.2 (Edelkamp & Hoffmann, 2004), one of which is shown in Table 2. To select the perform-learning action, the agent checks whether it has a
Table 2. Action model of the meta-level operator perform-learning. Specified in PDDL 2.2, this primitive planning operator achieves the learning goal (learned ?op ?current-state).
```
; Sketch reconstructed from the surrounding text; predicate names are
; illustrative, not the authors' exact encoding.
(:action perform-learning
  :parameters (?op ?current-state)
  :precondition (and (discrepancy ?current-state)
                     (outdated ?op)
                     (caused-discrepancy ?op ?current-state))
  :effect (learned ?op ?current-state))
```
discrepancy in the current state, an outdated operator, and whether that same operator caused the discrepancy. If all these preconditions hold, executing the action achieves the learning goal.
The Controller then attempts to execute the plan π^c_m one step at a time. In the example, the plan has a single step: perform learning. MIDCA uses the First-Order Inductive Learner (FOIL) (Quinlan, 1990) for this paper. FOIL induces function-free Horn clauses from a set of positive and negative examples of a concept and some background knowledge represented as first-order logical predicates. It performs hill climbing with an information-theoretic function and generates rules that cover the positive examples while excluding the negative ones.
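FOIL's greedy search scores each candidate literal with an information-gain heuristic (Quinlan, 1990); a short sketch of that computation:

```python
import math

# FOIL's information-gain heuristic for greedy literal selection
# (Quinlan, 1990). p0, n0: positive/negative example tuples covered by the
# clause before adding a candidate literal; p1, n1: covered after; t: the
# number of positive tuples still covered after the literal is added.
def foil_gain(p0, n0, p1, n1, t):
    info_before = -math.log2(p0 / (p0 + n0))
    info_after = -math.log2(p1 / (p1 + n1))
    return t * (info_before - info_after)

# A literal that keeps all 4 positives and excludes 4 of 6 negatives
# yields a positive gain, so hill climbing would prefer it.
gain = foil_gain(p0=4, n0=6, p1=4, n1=2, t=4)
```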
The FOIL algorithm is given positive and negative examples experienced during the execution of failed plans. For example, the spray action may have killed a plant in the cell directly north of the intended cell. If so, FOIL will learn the following general Horn clause.
spray(pos1, time2) :- spray(pos0, time1), adj_time(time2, time1), adj_north(pos0, pos1)
That is, if the agent sprays a location at time t1, the effect is as if it also sprays the adjacent location due north at t2, killing any plants in the cell north of the spray. MIDCA then compares such rules generated by the learning procedure with the agent's knowledge of the operator. As a result, it changes the spray operator by adding conditional effects.
```
; Illustrative sketch of the modified spray operator with a conditional
; effect extending the kill to adjacent cells; predicate names hypothetical.
(:action spray
  :parameters (?c - cell ?t - time)
  :precondition (agent-at ?c)
  :effect (and (not (plant-at ?c))
               (forall (?c2 - cell)
                 (when (adjacent ?c ?c2)
                       (not (plant-at ?c2))))))
```
After the agent's memory is updated with the modified operator, the metacognitive cycle ends. Once metacognition completes, MIDCA continues with the subsequent cognitive phase. Given an incrementally improved operator, as in our example, or some other positive change to cognition, performance will improve. The following section examines this claim empirically and shows specific benefits to cognitive systems that reason about themselves.
## 4. Empirical Evaluation
We evaluate the claim that metacognition can improve the performance of cognitive systems with a relatively simple planning task. The experiments we perform use the plant protection domain developed at the Naval Research Laboratory.
## 4.1 The Plant Protection Domain
The plant protection domain consists of harmful invasive plants, endangered native plants, a human supervisor, and an agent that navigates to a target cell and deploys herbicide (Boggs, Dannenhauer, Floyd, & Aha, 2018). When the agent deploys herbicide, it affects the neighboring cells as well. The world in which agents act is a 10 × 10 map grid, as shown in Figure 3. Each tile contains at most one plant, and plants are static fixtures that cannot be moved or replanted. The garden area where plants exist lies between (2,2) and (7,7), i.e., the white rectangle; the walkway around the planting area exists for ease of agent movement. Two key action models exist in this domain: a move operator and a spray operator.
Figure 3 visually depicts a very simple problem with a gardening agent at location (0,0). The goals of the agent are to preserve all native plants and to remove all invasive ones, i.e., native-at(pos3-3) ∧ ¬invasive-at(pos2-5) ∧ ¬invasive-at(pos3-2). Here, the Perceive phase of the cognitive cycle inputs the requested expression for the goals, Interpret adds these goals to the goal agenda, and Intend makes all goals current. Finally, the Plan phase takes the current goal set and generates a plan to move to location (3,2), spray the invasive plant, move to location (2,5), and spray again. Given that the native plant already exists at location (3,3) in the initial state, nothing needs to be done to achieve the goal of having a native plant at (3,3). However, the agent does not realize that spraying in one cell will kill plants in all adjacent cells. Thus, the native plant in (3,3) is also killed.
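The flaw in the naive plan can be reproduced with a minimal simulation; the code below is illustrative, and the four-neighbor splash is an assumption about the domain's adjacency:

```python
# Minimal simulation of the Figure 3 example; names are illustrative and
# the 4-neighbor splash is an assumption about the domain's adjacency.
def spray(plants, cell):
    """Remove any plant in the sprayed cell and its four neighbors."""
    x, y = cell
    affected = {cell, (x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)}
    return {pos: kind for pos, kind in plants.items() if pos not in affected}

# Initial state: native at (3,3), invasives at (3,2) and (2,5).
plants = {(3, 3): "native", (3, 2): "invasive", (2, 5): "invasive"}
for target in [(3, 2), (2, 5)]:  # the naive plan's two spray actions
    plants = spray(plants, target)

native_survives = (3, 3) in plants  # False: splash from (3,2) kills it
```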
## 4.2 Experimental Design
Two experiments provide a baseline for determining how metacognition with learning affects the performance of an agent in the plant protection domain. Our tests were run in the 10 × 10 world (as discussed above) with a standard planning agent and a metacognitive learning agent. The standard agent performs no goal reasoning or learning; it simply executes plans to remove all invasives. The learning agent learns a more accurate model of its actions over the range of goals given and can reject harmful goals accordingly. We randomly place invasive and native plants in the map grid to form problems of increasing difficulty (i.e., larger numbers of invasive plants and hence more goals to solve per problem).
Figure 3. A simple problem in the plant protection domain. The central rectangular area in white indicates the garden, with native and invasive plants represented by the symbols NAT and INV, respectively. The gardening agent at the origin needs to remove invasives and preserve native plants.
<details>
<summary>Image 3 Details</summary>

A 10 × 10 grid with x and y coordinates from 0 to 9 and an agent icon at the origin (0,0). White cells labeled INV mark invasive plants at approximately (1,5), (3,2), and (4,1); a white cell labeled NAT marks the native plant at approximately (3,3).
</details>
For the first experiment, we set the ratio of native to invasive plants at 75:25; in a second set of trials, we set it at 60:40. We also ensure that no two plants occupy the same grid cell and that all plants are situated between locations (2,2) and (7,7). Problems vary in the number of goals, ranging from 1 to 20. At each fixed number of goals, we generate 100 random trials, leading to a total of 2,000 random trials for each experiment. The density of invasive and native plants on the grid thus changes across sets of trials as the number of goals increases. The initial goals in this domain are to remove all the invasive plants and to preserve all native plants.
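A sketch of how such trials might be generated under the stated constraints; the uniform random placement and this exact interface are assumptions:

```python
import random

# Sketch of the trial generator under the stated constraints; uniform
# random placement and this exact interface are assumptions.
def generate_problem(n_goals, native_ratio=0.75, seed=None):
    """Place n_goals plants on distinct garden cells between (2,2) and (7,7)."""
    rng = random.Random(seed)
    garden = [(x, y) for x in range(2, 8) for y in range(2, 8)]
    n_native = round(n_goals * native_ratio)
    cells = rng.sample(garden, n_goals)  # distinct cells: no two plants collide
    return {cell: ("native" if i < n_native else "invasive")
            for i, cell in enumerate(cells)}

problem = generate_problem(n_goals=8, seed=42)
```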
## 4.3 Empirical Results
In Figure 4, the blue line represents the learning agent, and the green line represents the standard agent. From these results, it is clear that the learning agent performs significantly better than the standard agent, which removes all invasive plants regardless of any plants in adjacent cells. The learning agent outperforms it because, after a given number of trials, it improves its use of herbicide. As the spray operator is gradually fixed, the learning agent generates better plans that remove invasive plants while preserving the native ones.
For two different points along the x-axis of Figure 4, Figures 5 and 6 show further details using box plots. A box plot is a standardized way of displaying the distribution of data in terms of the minimum, first quartile, median, third quartile, and maximum. The box is drawn from the first quartile to the third quartile. The whiskers extend to the most extreme data values within 1.5 times the interquartile range below the first quartile and above the third quartile; points beyond the whiskers are drawn as outliers.
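These statistics can be computed directly; the sketch below uses linear-interpolation quantiles, which is an assumption about the plotting tool's exact method:

```python
# Sketch of the box-plot statistics described above; the linear-
# interpolation quantile method is an assumption about the plotting tool.
def box_stats(data):
    xs = sorted(data)
    def quantile(q):
        i = q * (len(xs) - 1)
        lo = int(i)
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (i - lo) * (xs[hi] - xs[lo])
    q1, median, q3 = quantile(0.25), quantile(0.5), quantile(0.75)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "q1": q1, "median": median, "q3": q3,
        # Whiskers reach the most extreme values inside the fences.
        "whisker_low": min(x for x in xs if x >= lo_fence),
        "whisker_high": max(x for x in xs if x <= hi_fence),
        "outliers": [x for x in xs if x < lo_fence or x > hi_fence],
    }

stats = box_stats([4, 6, 7, 7, 8, 8, 9, 9, 10, 20])
```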
<details>
<summary>Image 4 Details</summary>

A line chart of percentage of goals achieved (y-axis, 0-100) versus goals per problem (x-axis, 0-18). The learning agent (blue) declines from roughly 100% on the smallest problems to about 79% at 18 goals; the standard agent (green) declines from about 88% to about 59%. The learning agent outperforms the standard agent at every problem size, and its decline is less steep.
</details>
Figure 4. Experiment 1 performance as a function of problem complexity. The ratio of native to invasive plants is 75:25 in each of the 2,000 trials. As problems increase in complexity given ever more goals to achieve, performance decreases as a percentage of goals achieved. But the metacognitive agent that learns a better action model outperforms a standard planning agent that uses no metacognition or learning.
Figure 5. Experiment 1 box plots for standard and learning agents in 10-goal problems.
<details>
<summary>Image 5 Details</summary>

Box plots of the number of goals achieved on 10-goal problems. The standard agent has a median of about 8 (box roughly 6 to 9) with several low outliers between 3 and 5; the learning agent has a median of about 9.5 (box roughly 8.5 to 10.5) with no outliers.
</details>
Figure 6. Experiment 1 box plots for standard and learning agents in 20-goal problems.
<details>
<summary>Image 6 Details</summary>

Box plots of the number of goals achieved on 20-goal problems. The standard agent has a median of about 10.5 (box roughly 8 to 12, whiskers from about 6 to 15, one outlier near 20); the learning agent has a median of about 15 (box roughly 14 to 16) with no outliers.
</details>
In Figures 5 and 6, the orange line represents the median of the distribution. Note that the y-axis reports the number of goals achieved rather than the percentage of goals achieved as in Figure 4. The black circles represent outliers in the distribution of the data. In Figure 5, the median of the standard agent is clearly much less than that of the learning agent. In Figure 6, the median of the learning agent is significantly greater than that of the standard agent, and its variance is very low in contrast. By the time 20-goal problems are introduced, the learning agent has learned a complete operator. However, many cases exist where goals cannot be achieved even with a perfect action model, so performance remains below 20 achievements out of 20 goals. For instance, consider again the example in Figure 3. Because herbicide cannot be sprayed outside the garden itself (e.g., at location (3,1)), the invasive plant at (3,2) cannot be killed without also killing the native plant.
In a second experiment with a 60:40 ratio of native to invasive plants, the learning agent still outperforms the standard agent, but with a smaller difference in performance (see Figure 7). The reason for this reduced effect is that fewer examples of native plants adjacent to invasive ones randomly occur within a 60:40 mix. Box plots in Figures 8 and 9 provide further details about the variance and significance of the results.
## 5. Related Research
As mentioned, computational metacognition has many interpretations in the literature. The cognitive psychology literature includes judgments of learning, feelings of knowing, tip-of-the-tongue states, and metamemory phenomena (Dunlosky & Thiede, 2013). But in some work, the definition is so broad as to blur the distinction between cognition and metacognition. For example, Kralik et al. (2018) define metacognition as any decision process that takes input or output from another decision process. As such, they include planning as a metacognitive process, along with supervisory and arbitration processes. However, arguing that planning and simulation are metacognitive stretches the use of the term and provides little apparent value.
Recent work on computational approaches to metacognition is surprisingly thin. Most papers are either older or focus on human psychology rather than artificial intelligence. Metacognition has been applied to neural networks (Babu & Suresh, 2012) and to machine learning (Loeckx, 2017), but neither of these efforts concerns high-level reasoning and cognitive systems.
Similar to introspective monitoring and meta-level control, the Artificial Cognitive Neural Framework of Crowder, Friess, & Ncc (2011) splits computational metacognition into metacognitive experiences and metacognitive regulation. However, they also add a third component they call metacognitive knowledge or what a cognitive system knows about itself as a cognitive processor. The details of this framework are mainly conceptual, however, and no evaluation exists.
A few examples of metareasoning exist in the cognitive systems community, such as the work of Martie, Alam, Zhang, and Anderson (2019). Their work on symbolic mirroring learns associations between neural network image classifiers and symbolic abstractions. They use meta-points to reflect on executed and unexecuted sections of the cognitive system itself by passing them to higher-level processes termed meta-operations. Although, like MIDCA, the meta-operations examine and modify lower-level processes and involve explanation, much of the work is specific to vision tasks and is less general.
<details>
<summary>Image 7 Details</summary>

A line chart of percentage of goals achieved (y-axis, 0-100) versus goals per problem (x-axis, 0-20). The learning agent (blue) declines from roughly 100% to about 62% at 20 goals; the standard agent (green) declines from about 100% to about 52%. The learning agent remains ahead throughout, but by a smaller margin than in Experiment 1.
</details>
Figure 7. Experiment 2 performance as a function of problem complexity. The ratio of native to invasive plants is 60:40 in each of the 2,000 trials. As problems increase in complexity given ever more goals to achieve, performance goes down in terms of the percentage of goals achieved.
Figure 8. Experiment 2 box plots for standard and learning agents in 10-goal problems.
<details>
<summary>Image 8 Details</summary>

Box plots of the number of goals achieved on 10-goal problems in Experiment 2. The standard agent has a median of about 7 (box roughly 5.5 to 8); the learning agent has a median of about 9.5 (box roughly 7 to 10). Neither distribution shows outliers.
</details>
Figure 9. Experiment 2 box plots for standard and learning agents in 20-goal problems.
<details>
<summary>Image 9 Details</summary>

Box plots of the number of goals achieved on 20-goal problems in Experiment 2. The standard agent has a median of about 11 (box roughly 8 to 12, one outlier near 19); the learning agent has a median of about 12 (box roughly 11 to 13) with no outliers.
</details>
Human metacognition has been studied in the field of cognitive architectures: notably in ACT-R (Anderson, 2009; Larue, Hough, & Juvina, 2018), CLARION (Sun, 2016), and LIDA (Franklin et al., 2007). In ACT-R, Anderson and Fincham (2014) explored how reflective functions supported by metacognition can consciously assess what one knows and how to extend it to solve a problem. More specifically, metacognition enabled the architecture to reflect on declarative representations of cognitive procedures, allowing for the modification or replacement of elements in the procedures used in mathematical problem-solving. More recent work explored metacognitive trigger mechanisms, in particular the feeling-of-rightness metacognitive experience (Wang & Thompson, 2019) and how it determines the depth of inner simulation of possible scenarios and hypothesis testing.
Metacognition is a key element of CLARION's hybrid (i.e., symbolic and subsymbolic) architecture. Following Flavell's (1976) original definition of metacognition, it supports the active monitoring and regulation of cognitive processes. Metacognitive mechanisms determine how much symbolic and subsymbolic processing will be involved in decision making and can also dynamically change the ratio of symbolic and subsymbolic processing and how they interact (e.g., which learning method(s) or reasoning mechanism(s) to apply). Internal parameters (e.g., learning rates, thresholds, and utilities) are also modified through metacognition.
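The kind of meta-level regulation just described can be illustrated with a minimal sketch. The class and method names below are ours, not CLARION's actual API: a monitor blends symbolic and subsymbolic outputs according to a ratio it adjusts from observed reliability, and it also tunes an internal parameter (here, a learning rate) under meta-level control.

```python
# Hypothetical sketch (not CLARION's API): a meta-level controller that
# regulates the symbolic/subsymbolic processing ratio and internal parameters.

class MetaController:
    def __init__(self, symbolic_weight=0.5, learning_rate=0.1):
        self.symbolic_weight = symbolic_weight  # fraction of decision weight given to rules
        self.learning_rate = learning_rate      # internal parameter under meta-level control

    def decide(self, rule_score, network_score):
        # Blend the two levels according to the current ratio.
        w = self.symbolic_weight
        return w * rule_score + (1.0 - w) * network_score

    def regulate(self, rule_accuracy, network_accuracy):
        # Shift processing toward whichever level has been more reliable,
        # and damp the learning rate as performance stabilizes.
        total = rule_accuracy + network_accuracy
        if total > 0:
            self.symbolic_weight = rule_accuracy / total
        self.learning_rate *= 0.95


meta = MetaController()
meta.regulate(rule_accuracy=0.9, network_accuracy=0.3)  # rules have been more reliable
blended = meta.decide(rule_score=1.0, network_score=0.0)
```

The point of the sketch is only the division of labor: the object-level components produce scores, while the meta-level observes their performance and changes how they are combined.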
In contrast to CLARION and MIDCA, LIDA has no specific metacognitive module. Higher-level cognitive processes are implemented as collections of behavior streams. Metacognitive processes specifically are a collection of behavior streams (i.e., sequences of actions) that control deliberation through internal actions (e.g., strategy regulation, resource allocation, and the interruption of strategies that have persisted too long in favor of an alternative). We have described metacognition as a kind of add-on to a cognitive system and implemented it as a separate software module in MIDCA. Yet this is not necessarily required computationally. Indeed, metacognition may be a self-reflective aspect of cognition itself. Only further research will clarify the implications of one approach as opposed to the other.
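The interruption of an overlong strategy mentioned above can also be sketched briefly. The names here are illustrative, not LIDA's: a meta-level watchdog gives each deliberation strategy a step budget and cuts it off in favor of an alternative when the budget is exhausted.

```python
# Illustrative sketch (names are ours, not LIDA's): interrupt a strategy
# that persists too long without producing a result, then try the next one.

def deliberate(strategies, problem, max_steps_per_strategy=3):
    """Run each strategy under a step budget enforced by a meta-level watchdog."""
    for strategy in strategies:
        for step in range(max_steps_per_strategy):
            result = strategy(problem, step)
            if result is not None:
                return strategy.__name__, result
        # Budget exhausted: interrupt this strategy and move to an alternative.
    return None, None


def exhaustive_search(problem, step):
    return None  # never finishes within the budget in this toy example

def heuristic_search(problem, step):
    return "solution" if step >= 1 else None

name, result = deliberate([exhaustive_search, heuristic_search], problem={})
```

Here `exhaustive_search` is interrupted after three fruitless steps, and the watchdog hands control to `heuristic_search`, which succeeds.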
## 6. Conclusion
This paper briefly sketches a theory of computational metacognition analogous to a cognitive action-perception cycle and illustrates how it can be implemented in a cognitive architecture. We divide metacognition into three categories: explanatory, immediate, and anticipatory metacognition. We discuss the first category in detail and briefly explain the last; immediate metacognition remains for future research. Finally, we provide an empirical evaluation of explanatory metacognition and show how it improves system performance by progressively learning better action models. Previous publications (Cox & Dannenhauer, 2016; Cox et al., 2017; Dannenhauer et al., 2018) report similar results in alternative domains and therefore support the generality of the approach. However, this work is the first time the full metacognitive cycle in MIDCA has been described and demonstrated. Earlier publications focus on specific portions of the meta-level process, such as the detection of metacognitive expectation failures, and they make large assumptions about other aspects of the meta-level computation.
Yet numerous limitations still exist with respect to the implementation and the theory. As the introduction admits, metacognition is not always the best choice for an agent or for humans. For example, if a system is under severe time constraints, reasoning about reasoning may waste valuable computational resources better spent on cognitive-level problem-solving or action execution. Although future research will provide a greater understanding of the trade-offs involved (see Norman, 2020, for some interesting heuristics that can potentially mitigate this problem), this paper clearly illustrates the positive effect computational metacognition can offer advanced cognitive systems.
Furthermore, some of the assumptions in this work remain implicit and underexplored. Although (in machines) statistical reinforcement learning (Sutton & Barto, 1999) or (in humans) operant conditioning (Skinner, 1938, 1957) may be at play at the lower levels of cognition, we view deliberate, goal-driven learning (Cox & Ram, 1999; Ram & Leake, 1995a) as a metacognitive activity. The agent needs to consider why reasoning fails and make decisions to achieve explicit goals to learn and hence (indirectly) improve performance. This perspective conflicts with standard formulations of the learning problem (e.g., Mitchell, 1997) even within the cognitive systems community (e.g., Langley, 2021; Mohan & Laird, 2014). As such, this makes the comparison to and contrast with other learning approaches difficult. Thus, an empirical comparison between an agent that learns with metacognition and one that learns without metacognition is problematic from our point of view. Again, future research will attempt to elaborate upon and evaluate this position and to clarify other assumptions less obvious in this paper.
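The goal-driven view of learning described above can be made concrete with a small, hypothetical example. All names and the two toy lookup tables below are ours, introduced only for illustration: the agent first explains why reasoning failed, then posts an explicit learning goal naming the cause, and only then selects a learning method to achieve that goal, rather than adjusting behavior directly from reward.

```python
# Hypothetical illustration of goal-driven learning as a metacognitive activity:
# explain the failure, post an explicit learning goal, then choose a repair.

FAILURE_EXPLANATIONS = {
    "plan_step_failed": "incorrect action model",
    "unexpected_state": "incomplete domain knowledge",
}

LEARNING_METHODS = {
    "incorrect action model": "revise preconditions/effects of the failed action",
    "incomplete domain knowledge": "acquire new predicates from observation",
}

def form_learning_goal(failure_type):
    # Explain why reasoning failed and post a goal to remove that cause.
    cause = FAILURE_EXPLANATIONS.get(failure_type, "unknown cause")
    return {"goal": f"remove({cause})"}

def pursue(learning_goal):
    # Select a learning method that achieves the explicit learning goal.
    cause = learning_goal["goal"][len("remove("):-1]
    return LEARNING_METHODS.get(cause, "gather more data")

goal = form_learning_goal("plan_step_failed")
method = pursue(goal)
```

The contrast with statistical reinforcement learning is the explicit intermediate step: the learning goal is a declarative object the agent reasons about, not an implicit gradient of reward.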
Finally, as the examples shown here demonstrate, MIDCA implements relatively simple solutions for the problem of realizing computational metacognition. At this time, MIDCA's meta-level manages only singleton goal sets, and meta-level plans are currently very short and basic. Interactions between multiple meta-level goals and complex planning alternatives also remain for future work. But previous results shown at the cognitive level with MIDCA, together with the robust state of the art in the planning community, suggest that solutions to more complicated problems at the meta-level will be relatively straightforward and within practical reach. Rather than focus on sophisticated problem sets that rarely occur, we have instead put forth a fundamental approach to metacognition that promises to change the way we think about what makes a cognitive system intelligent and learning effective.
## Acknowledgements
This research is supported in part by the National Science Foundation through grant S&AS1849131 and by the Office of Naval Research (ONR) through grant N00014-18-1-2009. We also thank the anonymous reviewers for their comments, insights, and suggestions and Matt Molineaux for suggesting the use of FOIL as a learning mechanism and his overall technical critique. Finally, we gratefully thank Paul Bello (a previous program manager at ONR) for taking a risk and funding the MIDCA project when it was merely a proposed concept ten years ago.
## References
- Aha, D. W. (2018). Goal reasoning: Foundations, emerging applications, and prospects. AI Magazine, 39, 3-24.
- Amos-Binks, A., & Dannenhauer, D. (2019). Anticipatory thinking: A metacognitive capability. In Working Notes of the 7th Goal Reasoning Workshop. Cambridge, MA: Advances in Cognitive Systems 2019.
- Anderson, J. R. (2009). How can the human mind occur in the physical universe? Oxford, UK: Oxford University Press.
- Anderson, J. R., & Fincham, J. M. (2014). Extending problem-solving procedures through reflection. Cognitive Psychology, 74, 1-34.
- Babu, G., & Suresh, S. (2012). Meta-cognitive neural network for classification problems in a sequential learning framework. Neurocomputing, 81, 86-96. doi:10.1016/j.neucom.2011.12.001.
- Boggs, J., Dannenhauer, D., Floyd, M. W., & Aha, D. W. (2018). The ideal rebellion: Maximizing task performance in rebel agents. In Proceedings of the 6th Goal Reasoning Workshop, held at IJCAI/FAIM-2018.
- Brown, A. (1987). Metacognition, executive control, self-regulation, and other more mysterious mechanisms. In F. E. Weinert & R. H. Kluwe (Eds.), Metacognition, motivation, and understanding (pp. 65-116). Hillsdale, NJ: Lawrence Erlbaum Associates.
- Conitzer, V. (2011). Metareasoning as a formal computational problem. In M. T. Cox & A. Raja (Eds.) Metareasoning: Thinking about thinking (pp. 121-127). Cambridge, MA: MIT Press.
- Cox, M. T. (1997). Loose coupling of failure explanation and repair: Using learning goals to sequence learning methods. In D. B. Leake & E. Plaza (Eds.), Case-Based Reasoning Research and Development: Second International Conference on Case-Based Reasoning (pp. 425-434). Berlin: Springer-Verlag.
- Cox, M. T. (2005). Metacognition in computation: A selected research review. Artificial Intelligence 169(2), 104-141.
- Cox, M. T. (2011). Metareasoning, monitoring, and self-explanation. In M. T. Cox & A. Raja (Eds.) Metareasoning: Thinking about thinking (pp. 131-149). Cambridge, MA: MIT Press.
- Cox, M. T. (2020). The problem with problems. In Proceedings of the Eighth Annual Conference on Advances in Cognitive Systems. Palo Alto, CA: Cognitive Systems Foundation.
- Cox, M. T., Alavi, Z., Dannenhauer, D., Eyorokon, V., Munoz-Avila, H., & Perlis, D. (2016). MIDCA: A metacognitive, integrated dual cycle architecture for self-regulated autonomy. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Vol. 5 (pp. 3712-3718). Palo Alto, CA: AAAI Press.
- Cox, M. T., & Dannenhauer, D. (2016). Goal transformation and goal reasoning. In Proceedings of the 4th Workshop on Goal Reasoning. New York: IJCAI-16.
- Cox, M. T., & Dannenhauer, Z. A. (2017). Perceptual goal monitors for cognitive agents in changing environments. In Proceedings of the Fifth Annual Conference on Advances in Cognitive Systems, Poster Collection (pp. 1-16). Palo Alto, CA: Cognitive Systems Foundation.
- Cox, M. T., Dannenhauer, D., & Kondrakunta, S. (2017). Goal operations for cognitive systems. In Proceedings of the Thirty-first AAAI Conference on Artificial Intelligence (pp. 4385-4391). Palo Alto, CA: AAAI Press.
- Cox, M. T., Oates, T., & Perlis, D. (2011). Toward an integrated metacognitive architecture. In P. Langley (Ed.), Advances in Cognitive Systems: Papers from the 2011 AAAI Fall Symposium (pp. 74-81). Technical Report FS-11-01. Menlo Park, CA: AAAI Press.
- Cox, M. T., & Ram, A. (1999). Introspective multistrategy learning: On the construction of learning strategies. Artificial Intelligence, 112, 1-55.
- Cox, M. T., & Veloso, M. M. (1998). Goal transformations in continuous planning. In M. desJardins (Ed.), Proceedings of the 1998 AAAI Fall Symposium on Distributed Continual Planning (pp. 23-30). Menlo Park, CA: AAAI Press / The MIT Press.
- Crowder, J., Friess, S., & Ncc, M. (2011). Metacognition and metamemory concepts for AI systems. In Proceedings of the 2011 World Congress in Computer Science, Computer Engineering, & Applied Computing: The 2011 International Conference on Artificial Intelligence . CSREA Press.
- Dannenhauer, D., Cox, M. T., & Munoz-Avila, H. (2018). Declarative metacognitive expectations for high-level cognition. Advances in Cognitive Systems , 6, 231-250.
- Dannenhauer, D., & Munoz-Avila, H. (2015). Raising expectations in GDA agents acting in dynamic environments. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (pp. 2241-2247). Menlo Park, CA: AAAI Press.
- Dunlosky, J., & Thiede, K. W. (2013). Metamemory. In D. Reisberg (Ed.), The Oxford handbook of cognitive psychology (pp. 283-298). Oxford, UK: Oxford University Press.
- Edelkamp, S., & Hoffmann, J. (2004). PDDL2.2: The language for the classical part of the 4th international planning competition . Technical Report 195, University of Freiburg.
- Flavell, J. H. (1976). Metacognitive aspects of problem solving. In A. Resnick (Ed.), The nature of intelligence (pp. 231-235). Hillsdale, NJ: LEA.
- Franklin, S., Ramamurthy, U., D'Mello, S., McCauley, L., Negatu, A., Silva R., & Datla, V. (2007). LIDA: A computational model of global workspace theory and developmental learning. In AAAI Fall Symposium on AI and Consciousness: Theoretical Foundations and Current Approaches . Menlo Park, CA: AAAI Press.
- Ghallab, M., Nau, D., & Traverso, P. (2004). Automated planning: Theory and practice . San Francisco: Morgan Kaufmann.
- Gogineni, V. R., Kondrakunta, S., & Cox, M. T. (in press). Multi-agent goal delegation. To appear in Proceedings of the 9th Goal Reasoning Workshop. Advances in Cognitive Systems 2021.
- Gogineni, V. R., Kondrakunta, S., Molineaux, M., & Cox, M. T. (2018). Application of case-based explanations to formulate goals in an unpredictable mine clearance domain. In M. Minor (Ed.), 26th International Conference on Case-Based Reasoning: Workshop Proceedings - Case-based Reasoning for the Explanation of Intelligent Systems (pp. 42-51). ICCBR-18.
- Goldman, A. I. (2006). Simulating minds . Oxford, UK: Oxford University Press.
- Gopnik, A. (2012). Reconstructing constructivism: Causal models, Bayesian learning mechanisms, and the theory theory. Psychological Bulletin, 138 (6), 1085-1108.
- Hayes-Roth, F., Waterman, D. A., & Lenat, D. B. (Eds.) (1983). Building Expert Systems . London: Addison-Wesley.
- Helmert, M. (2006). The fast downward planning system. Journal of Artificial Intelligence Research , 26, 191-246.
- Helmert, M., Röger, G., & Karpas, E. (2011). Fast downward stone soup: A baseline for building planner portfolios. In ICAPS 2011 Workshop on Planning and Learning (pp. 28-35).
- Horvitz, E. J. (1990). Rational metareasoning and compilation for optimizing decisions under bounded resources . Knowledge Systems Laboratory, Medical Computer Science, Stanford University.
- Kondrakunta, S. (2017). Implementation and evaluation of goal selection in a cognitive architecture . Master's thesis, Wright State University, College of Engineering and Computer Science, Dayton, OH.
- Kondrakunta, S., & Cox, M. T. (in press). Autonomous goal selection operation for agent based architectures. To appear in Proceedings of the 2021 World Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE'21).
- Kondrakunta, S., Gogineni, V. R., & Cox, M. T. (in press). Agent goal management using goal operations. To appear in Proceedings of the 9th Goal Reasoning Workshop. Advances in Cognitive Systems 2021.
- Kralik, J. D., Lee, J. H., Rosenbloom, P. S., Jackson, P. C., Epstein, S. L., Romero, O. J., Sanz, R., Larue, O., Schmidtke, H. R., Lee, S. W., & McGreggor, K. (2018). Metacognition for a common model of cognition. Procedia Computer Science , 145, 730-739.
- Mohan, S., & Laird J. E. (2014). Learning goal-oriented hierarchical tasks from situated interactive instruction. Proceedings of the 28th AAAI Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press.
- Langley, P. (2017). A cognitive systems analysis of personality and conversational style. Advances in Cognitive Systems , 5, 1-12.
- Langley, P. (2021). The computational gauntlet of human-like learning . Manuscript submitted for publication.
- Larue, O., Hough, A., & Juvina, I. (2018). A cognitive model of switching between reflective and reactive decision making in the Wason task. In I. Juvina, J. Houpt, and C. Myers (Eds.) Proceedings of the Sixteenth International Conference on Cognitive Modeling (pp. 55-60). Madison, WI: University of Wisconsin.
- Loeckx, J. (2017). The case for meta-cognitive machine learning: On model entropy and concept formation in deep learning. arXiv, abs/1711.01431.
- Martie, L., Alam, A., Zhang, G., & Anderson, R. (2019). Reflecting after learning for understanding. Advances in Cognitive Systems, 8, 53-7.
- Mitchell, T. M. (1997). Machine learning . New York: McGraw-Hill.
- Mohammad, Z. (2021). A rebellion framework with learning for goal-driven autonomy. Master's thesis, Wright State University, Department of Computer Science and Engineering, Dayton, OH.
- Mohammad, Z., Cox, M. T., & Molineaux, M. (2020). Rebel agents that learn from plan execution failures. In Proceedings of the Integrated Execution / Goal Reasoning Workshop (held at the Thirtieth International Conference on Automated Planning and Scheduling - ICAPS-20), Nancy, France.
- Munoz-Avila, H. (2018). Adaptive goal driven autonomy. In M. T. Cox, P. Funk, & S. Begum (Eds.), Case-Based Reasoning Research and Development: Proceedings of the 26th International Conference (pp. 3-12). Berlin: Springer.
- Munoz-Avila, H., & Cox, M. T. (2008). Case-based plan adaptation: An analysis and review. IEEE Intelligent Systems, 23(4), 75-81.
- Norman, E. (2020). Metacognition is not always helpful. Frontiers in Psychology, 11, 15-37.
- Paisner, M., Cox, M. T., Maynord, M., & Perlis, D. (2014). Goal-driven autonomy for cognitive systems. In Proceedings of the 36th Annual Conference of the Cognitive Science Society (pp. 2085-2090). Austin, TX: Cognitive Science Society.
- Pettersson, O. (2005). Execution monitoring in robotics: A survey. Robotics and Autonomous Systems , 53, 73-88.
- Price, J., Hertzog, C., & Dunlosky, J. (2009). Self-regulated learning in younger and older adults: Does aging affect metacognitive control? Aging, Neuropsychology, and Cognition, 17, 329-359.
- Quinlan, J. R. (1990). Learning logical definitions from relations. Machine Learning, 5(3), 239-266.
- Ram, A., & Cox, M. T. (1994). Introspective reasoning using meta-explanations for multistrategy learning. In R. S. Michalski & G. Tecuci (Eds.), Machine learning: A multistrategy approach IV (pp. 349-377). San Francisco: Morgan Kaufmann.
- Ram, A., & Leake, D. (1995a). Goal-driven learning . Cambridge, MA: MIT Press.
- Ram, A., & Leake, D. (1995b). Learning, goals, and learning goals. In A. Ram & D. Leake (Eds.), Goal-driven learning (pp. 1-37). Cambridge, MA: MIT Press/Bradford Books.
- Reder, L. M., & Ritter, F. (1992). What determines initial feeling of knowing? Familiarity with question terms, not with the answer. Journal of Experimental Psychology, 18(3), 435-451.
- Roberts, M., Borrajo, D., Cox, M. T., & Yorke-Smith, N. (Eds.) (2018). Special issue on goal reasoning. AI Communications , 31(2), 115-116.
- Schank, R. C. (1982). Dynamic memory: A theory of reminding and learning in computers and people . Cambridge, MA: Cambridge University Press.
- Schank, R. C. (1986). Explanation patterns: Understanding mechanically and creatively . Hillsdale, NJ: Lawrence Erlbaum.
- Schank, R. C., & Owens, C. C. (1987). Understanding by explaining expectation failures. In R. G. Reilly (Ed.), Communication failure in dialogue and discourse . New York: Elsevier Science.
- Searle, J. R. (1969). Speech acts: An essay in the philosophy of language . Cambridge, UK: Cambridge University Press.
- Skinner, B. F. (1938). The behavior of organisms: An experimental analysis. New York: Appleton-Century-Crofts.
- Skinner, B. F. (1957). Verbal behavior . New York: Appleton-Century-Crofts.
- Sun, R. (2016). Anatomy of the mind: Exploring psychological mechanisms and processes with the Clarion cognitive architecture . Oxford, UK: Oxford University Press.
- Sutton, R. S., & Barto, A. G. (1999). Reinforcement learning. Journal of Cognitive Neuroscience , 11(1), 126-134.
- Wang, S., & Thompson, V. (2019). Fluency and feeling of rightness: The effect of anchoring and models. Psychological Topics , 28, 37-72.
- Wellman, H. M. (1990). The child's theory of mind . Cambridge, MA: MIT Press.
- Wilson, T. D., & Schooler, J. W. (1991). Thinking too much: Introspection can reduce the quality of preferences and decisions. Journal of Personality and Social Psychology , 60(2), 181-192.
- Zilberstein, S. (2011). Metareasoning and bounded rationality. In M. T. Cox & A. Raja (Eds.) Metareasoning: Thinking about thinking (pp. 27-40). Cambridge, MA: MIT Press.