# Cognitive Mirrors: Exploring the Diverse Functional Roles of Attention Heads in LLM Reasoning
**Authors**:
- Tongliang Liu, James Bailey (The University of Melbourne, The University of Sydney)
> This work is not related to Amazon.
## Abstract
Large language models (LLMs) have achieved state-of-the-art performance on a variety of tasks, but remain largely opaque in terms of their internal mechanisms. Understanding these mechanisms is crucial to improving their reasoning abilities. Drawing inspiration from the interplay between neural processes and human cognition, we propose a novel interpretability framework to systematically analyze the roles and behaviors of attention heads, which are key components of LLMs. We introduce CogQA, a dataset that decomposes complex questions into step-by-step subquestions with a chain-of-thought design, each associated with a specific cognitive function such as retrieval or logical reasoning. By applying a multi-class probing method, we identify the attention heads responsible for these functions. Our analysis across multiple LLM families reveals that attention heads exhibit functional specialization; we characterize such heads as cognitive heads. These cognitive heads exhibit several key properties: they are universally sparse, vary in number and distribution across different cognitive functions, and display interactive and hierarchical structures. We further show that cognitive heads play a vital role in reasoning tasks: removing them degrades performance, while augmenting them enhances reasoning accuracy. These insights offer a deeper understanding of LLM reasoning and suggest important implications for model design, training, and fine-tuning strategies. The code is available at https://github.com/sihuo-design/CognitiveMirrors.
## 1 Introduction
Large language models (LLMs) achiam2023gpt; grattafiori2024llama; touvron2023llama; yang2024qwen2, built on neural networks that mimic the structure of the human brain, have demonstrated exceptional performance across various natural language processing (NLP) tasks, often exceeding human capabilities. This has sparked growing interest in exploring the potential similarities between the cognitive processes of LLMs and the human brain. Prior studies have demonstrated that LLMs can predict brain responses to natural language caucheteux2022deep; schrimpf2021neural, indicating a functional alignment between artificial models and biological systems. However, to the best of our knowledge, systematic efforts to align reasoning processes between LLMs and human cognitive agents remain scarce. When solving complex reasoning tasks (e.g., a mathematical multiple-choice question; Figure 1), the human brain engages a network of specialized regions: the frontal lobe recalls relevant knowledge wheeler1997toward, language areas (e.g., Wernicke’s and Broca’s) support semantic processing ono2022bidirectional; meyer2005language, and the parietal and prefrontal cortices carry out higher-order reasoning barsalou2014cognitive; hubbard2005interactions.
Analogously, recent research suggests that components within LLMs may also take on specialized roles. For example, multi-head attention mechanisms in transformers vaswani2017attention have been found to handle distinct functions, such as information retrieval wu2404retrieval or maintaining answer consistency truthful, pointing toward a form of architectural division of labor. However, most of these findings are based on relatively simple tasks zheng2409attention, leaving open how such specialization operates under complex, multi-step reasoning scenarios.
<details>
<summary>x1.png Details</summary>

### Visual Description
## Cognitive Process Diagram: Solving a Percentage Problem
### Overview
The image is a technical diagram illustrating the cognitive steps involved in solving a specific math problem: "What percent is 120 of 90?" It breaks down the problem-solving process into four distinct stages, each associated with a different cognitive function and a corresponding brain region. A flowchart at the bottom visualizes the information flow between these stages.
### Components/Axes
The diagram is organized into three main horizontal sections:
1. **Header (Top):** Contains the input problem and multiple-choice options.
* **Input Label:** "Input" (in red, top-left).
* **Question:** "Question: What percent is 120 of 90?"
* **Options:** "Options: (A) 133 1/3 % (B) 134 1/3 % (C) 135 1/3 % (D) 140 1/3 % (E) 143 1/3 %"
2. **Main Chart (Center):** Four vertical columns, each representing a cognitive stage. Each column contains:
* A **Stage Title** (e.g., "Knowledge Recall").
* A **Guiding Question** or instruction.
* A **Brain Diagram** with labeled lobes (FRONTAL LOBE, PARIETAL LOBE, TEMPORAL LOBE, HIPPOCAMPUS). One lobe is highlighted in each diagram.
* A **Resulting Statement or Answer** for that stage.
3. **Footer (Bottom):** A flowchart composed of colored rectangular boxes and directional arrows, illustrating the flow of information or processing between the stages.
### Detailed Analysis
**The Four Cognitive Stages (Left to Right):**
1. **Knowledge Recall (Green Column):**
* **Guiding Question:** "What is the formula to calculate the percentage of one number relative to another?"
* **Brain Highlight:** The **FRONTAL LOBE** is highlighted in green.
* **Result:** "The formula is (part/whole) ×100%"
2. **Semantic Understanding (Orange Column):**
* **Guiding Question:** "What is the part and whole in the context of 'What percent is 120 of 90?'"
* **Brain Highlight:** The **TEMPORAL LOBE** is highlighted in orange.
* **Result:** "The part is 120 and the whole is 90."
3. **Math (Yellow Column):**
* **Instruction:** "Calculate 120 / 90 *100%"
* **Brain Highlight:** The **PARIETAL LOBE** is highlighted in yellow.
* **Result:** "133 1/3"
4. **Decision Making (Teal Column):**
* **Guiding Question:** "Which option corresponds to 133 1/3%?"
* **Brain Highlight:** The **FRONTAL LOBE** is highlighted in teal.
* **Result:** "Option (A)"
**Flowchart Analysis (Bottom Section):**
* The flowchart consists of multiple rows of grey boxes, with specific boxes colored to correspond to the stages above.
* **Green boxes** (bottom-left) originate from the "Knowledge Recall" stage.
* **Orange boxes** (center-left) originate from the "Semantic Understanding" stage.
* **Yellow boxes** (center) originate from the "Math" stage.
* **Teal boxes** (top-right) originate from the "Decision Making" stage.
* **Arrows:** A series of arrows show the flow:
* Green arrows flow from the green boxes upward and rightward.
* An orange arrow flows from an orange box to a central yellow box.
* A yellow arrow flows from the yellow box upward to a teal box.
* A final green arrow points to the final teal box on the far right.
* This visualizes a sequential process where outputs from earlier stages (Recall, Understanding) feed into the Math stage, which then informs the final Decision Making stage.
### Key Observations
* **Process Decomposition:** The complex task of solving a word problem is explicitly decomposed into four linear, dependent sub-tasks.
* **Neuroscience Analogy:** Each cognitive stage is mapped to a simplified model of brain function, suggesting which region might be primarily engaged (e.g., Temporal Lobe for semantic meaning, Parietal Lobe for calculation).
* **Color Consistency:** A strict color code (Green, Orange, Yellow, Teal) is maintained across the stage columns and the corresponding boxes in the flowchart, providing clear visual linkage.
* **Spatial Layout:** The stages are arranged left-to-right in the order they are performed. The flowchart below uses a more complex, network-like layout but maintains the left-to-right overall progression.
### Interpretation
This diagram serves as a pedagogical or analytical model for understanding the multi-step nature of mathematical problem-solving. It demonstrates that answering a seemingly simple question requires distinct cognitive operations: retrieving relevant knowledge (formula), parsing language to assign values (semantic understanding), performing arithmetic (math), and matching a result to given choices (decision-making).
The inclusion of brain regions, while simplified, emphasizes that different neural networks are recruited for different types of thinking. The flowchart reinforces the idea that these processes are not isolated; information flows and builds upon itself. The final output (Option A) is the result of a chain of successful cognitive operations, any failure in which (e.g., recalling the wrong formula or misidentifying the 'part') would lead to an incorrect answer. The diagram effectively makes the "black box" of thinking visible and structured.
</details>
Figure 1: To solve a complex question, the human brain engages multiple regions to perform distinct cognitive functions necessary for generating a response. We explore whether specific attention heads in LLMs play analogous functional roles in producing answers.
In parallel, prompting techniques like chain-of-thought (CoT) cot have been shown to improve LLM performance by decomposing complex problems into intermediate steps, a strategy reminiscent of human problem-solving, like the example in Figure 1. We hypothesize that such prompting may activate and coordinate specialized components within the model. Thus, analyzing the behavior of attention heads under CoT reasoning could offer deeper insight into the internal workings of LLMs and how they process complex tasks.
In this work, we present a novel interpretability framework to systematically analyze the cognitive roles of attention heads during complex reasoning. To facilitate this, we introduce Cognitive Question&Answering (CogQA), a benchmark dataset that decomposes natural language questions into structured subquestions annotated with fine-grained cognitive functions, such as retrieval, logical inference, and knowledge recall. Leveraging CogQA, we develop a multi-class probing method to identify and characterize attention heads responsible for distinct cognitive operations within the transformer architecture.
We conduct extensive experiments on three major LLM families, including LLaMA (touvron2023llama), Qwen (yang2024qwen2), and Yi (young2024yi). Our results reveal the existence of cognitive heads that consistently exhibit universality, sparsity, and layered functional organization across architectures. Further analysis of the correlations among these cognitive heads reveals clear functional clustering, with heads grouping based on cognitive roles, and uncovers a hierarchical structure in which lower-level heads modulate higher-level ones—mirroring the modular and distributed processing observed in the human cortex (barsalou2014cognitive; ono2022bidirectional).
Furthermore, we validate the functional importance of these heads by showing that their removal degrades performance on complex tasks and leads to specific error patterns, while their enhancement improves reasoning capabilities. Our findings shed light on the structured cognitive architecture embedded in LLMs and open avenues for function-aware model design and analysis.
## 2 CogQA
In this section, we present a detailed account of the construction and key characteristics of our benchmark dataset, CogQA. Although existing benchmark collections span a wide array of NLP tasks, to our knowledge no resource explicitly evaluates LLM reasoning across diverse cognitive functions. To address this, we introduce CogQA, a dataset containing 570 main questions and 3,402 subquestions. Each example comprises a question, its answer, and an annotation specifying the cognitive function required for resolution.
### 2.1 Cognitive Function
To systematically capture the cognitive processes involved in complex reasoning tasks, we categorize cognitive functions into two groups: low-level functions and high-order functions, inspired by established frameworks in cognitive science anderson2014rules; diamond2013executive. Low-level functions primarily involve information retrieval and linguistic analysis, while high-order functions engage more abstract reasoning, problem-solving, and decision-making. Detailed descriptions of these cognitive functions are provided in Appendix A.4.
The low-level cognitive functions include:
- Retrieval: locating relevant information from an external source or prior context.
- Knowledge Recall: accessing stored factual or procedural knowledge from memory.
- Semantic Understanding: interpreting the meaning of words, phrases, or concepts.
- Syntactic Understanding: analyzing the grammatical structure of a sentence.
The high-order cognitive functions include:
- Mathematical Calculation: performing arithmetic or numerical operations.
- Logical Reasoning: drawing conclusions based on formal logical relationships.
- Inference: deriving implicit information that is not directly stated.
- Decision-Making: selecting the best outcome among alternatives based on reasoning.
This categorization reflects a natural progression from basic information processing to complex cognitive integration. Both the human brain and LLMs encompass a wide range of functional modules. Our focus in this work is specifically on reasoning-related cognitive functions. By identifying and organizing these eight core reasoning functions, we can more clearly examine how LLMs handle different types of thinking steps, in a way that is both systematic and easy to interpret.
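This two-level taxonomy can be encoded as a simple lookup table. A minimal sketch in Python (function and level names are illustrative, not from the released code):

```python
# Hypothetical encoding of the two-level taxonomy of the eight
# reasoning-related cognitive functions described above.
COGNITIVE_FUNCTIONS = {
    # low-level functions
    "retrieval": "low",
    "knowledge_recall": "low",
    "semantic_understanding": "low",
    "syntactic_understanding": "low",
    # high-order functions
    "math_calculation": "high",
    "logical_reasoning": "high",
    "inference": "high",
    "decision_making": "high",
}

def functions_at(level: str) -> list[str]:
    """Return all cognitive functions at a given level ('low' or 'high')."""
    return [f for f, lvl in COGNITIVE_FUNCTIONS.items() if lvl == level]
```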
### 2.2 Data Collections
Based on our categorization of cognitive functions, we sampled 750 diverse questions from NLP reasoning benchmarks, selecting 150 examples from each of AQuA aqua, CREAK creak, ECQA ecqa, e-SNLI esnli, and GSM8K gsm8k. These datasets cover a range of reasoning types, including logical, mathematical, and commonsense reasoning. Using the CoT paradigm, we prompted GPT-4o hurst2024gpt to decompose each question into subquestions, each targeting a single cognitive function. The prompt encourages structured, step-by-step reasoning, with each subquestion being clear, answerable, and sequentially dependent. This yields a set of subquestion-answer-cognitive function (subQAC) triples for each QA pair: $\operatorname{subQACs}=\left\{\left(q_{i},a_{i},c_{i}\right)\right\}_{i=1}^{k}$ , where each triple contains a subquestion $q_{i}$ , its concise answer $a_{i}$ , and the corresponding cognitive function label $c_{i}$ . The prompt for generating subquestions and example decompositions are provided in Appendix A.4 and Appendix A.6, respectively.
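The subQAC triples admit a minimal data structure; the example below sketches the Figure 1 decomposition (field and label names are illustrative):

```python
from dataclasses import dataclass

# Minimal sketch of the subQAC triple (q_i, a_i, c_i) described above.
@dataclass
class SubQAC:
    question: str  # subquestion q_i
    answer: str    # concise answer a_i
    function: str  # cognitive-function label c_i

# The Figure 1 example ("What percent is 120 of 90?") decomposed into subQACs.
example = [
    SubQAC("What is the formula to calculate the percentage of one number "
           "relative to another?", "(part/whole) x 100%", "knowledge_recall"),
    SubQAC("What is the part and the whole in 'What percent is 120 of 90?'",
           "The part is 120 and the whole is 90.", "semantic_understanding"),
    SubQAC("Calculate 120 / 90 x 100%.", "133 1/3", "math_calculation"),
    SubQAC("Which option corresponds to 133 1/3%?", "Option (A)", "decision_making"),
]
```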
### 2.3 Data Filtering and Annotation
Recent advances have made it increasingly feasible to use LLMs for dataset construction, owing to their strong reasoning abilities and capacity to generate high-quality annotations at scale llm_annotate. Although our dataset is constructed automatically using an LLM to reduce manual effort, we implement a strict two-stage human verification pipeline to ensure data quality and mitigate hallucinations. In the first stage, three expert annotators independently assess whether the subquestions are logically structured and align with natural human reasoning. QA pairs with inconsistent or incoherent decompositions are filtered out. In the second stage, annotators verify and, if necessary, relabel the cognitive function associated with each subquestion to ensure alignment with the intended mental process. Finally, we validate the subanswers by cross-checking them using the GPT-o4-mini model o4mini2024, followed by human adjudication where discrepancies arise. Details of the annotation process and rubric can be found in Appendix A.5. This multi-step filtering ensures that each retained subQAC triple reflects a coherent, interpretable reasoning step grounded in core cognitive functions. After this refinement, our final dataset contains 570 main QA and 3,402 validated subQAC triplets.
## 3 Cognitive Function Detections
Given the CogQA dataset, we aim to identify which attention heads in LLMs are associated with specific cognitive functions. We adopt a probing-based framework, a widely used interpretability technique in which an auxiliary classifier is trained to predict properties from intermediate model representations alain2016understanding; belinkov2022probing; tenney2019bert. We frame this as a multi-class classification task: for each cognitively annotated subquestion, we extract head activations (see Section 3.1), train a classifier, and compute importance scores to identify contributing heads (see Section 3.2). Unlike prior single-class approaches, our method captures many-to-many relationships between heads and functions, enabling a more detailed analysis of functional specialization and overlap.
### 3.1 Head Feature Extraction
Given a large language model $\mathcal{M}$ , we generate an answer $a_{i}^{\mathcal{M}}$ for each subquestion $q_{i}$ derived from a main question $Q_{i}$ . To support coherent multi-step reasoning, we include preceding subquestions and their answers as contextual input, emulating the incremental reasoning process observed in human cognition.
During inference, input tokens are embedded and processed through successive transformer layers. At each layer, attention and feedforward operations update the residual stream, which is ultimately decoded into token predictions. For each generated token $i$ , we extract attention head outputs $X_{i}=\{x_{l}^{m}(i)\mid l=1,\dots,L,\ m=1,\dots,M\}$ across all layers, where $x^{m}_{l}$ denotes the value vector from the $m$ -th head in layer $l$ projected into the residual stream, with $M$ the number of heads per layer and $L$ the total number of layers.
Let $N_{t}$ denote the number of tokens in the generated answer $a_{i}^{\mathcal{M}}$ . To isolate semantically informative content relevant to reasoning, we select the top- $k$ ( $k=5$ ) most important tokens, determined by prompting GPT-o4-mini o4mini2024 (skilled in reasoning), yielding an index set $\mathcal{I}_{k}$ with $|\mathcal{I}_{k}|=k$ (an ablation over alternative token positions is in Appendix A.9; top- $k$ token examples are in Appendix A.10). For each index $j\in\mathcal{I}_{k}$ , we extract the corresponding attention head activations $X_{j}$ , and compute the averaged activation feature for the $m$ -th head in layer $l$ as $\bar{x}_{l}^{m}=\frac{1}{k}\sum_{j\in\mathcal{I}_{k}}x_{l}^{m}(j)$ . This results in a full set of head-level features $\bar{X}=\{\bar{x}_{l}^{m}\mid l={1,\ldots,L},\ m={1,\ldots,M}\}$ .
Given prior findings suggesting that cognitive functions may vary by layer depth zheng2409attention, we incorporate layer-wise information by computing the average activation $\bar{x}_{l}=\frac{1}{M}\sum_{m=1}^{M}\bar{x}_{l}^{m}$ for each layer. We then augment each head-level vector with its corresponding layer summary, resulting in enriched features $\bar{x}^{m^{\prime}}_{l}=[\bar{x}^{m}_{l};\bar{x}_{l}]$ . For each subQA triplet $(q_{i},\ a_{i},\ c_{i})$ , the final input to the probing classifier is given by $\{\bar{x}^{m^{\prime}}_{l}\mid l={1,\ldots,L},\ m={1,\ldots,M}\}$ .
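The feature construction above can be sketched in NumPy, assuming the per-head activations for one answer are already collected into an array of shape (L, M, N_t, d):

```python
import numpy as np

def head_features(acts: np.ndarray, topk_idx: list[int]) -> np.ndarray:
    """Sketch of the Section 3.1 feature construction (shapes assumed).
    acts[l, m, t] is the residual-stream projection of head m in layer l
    at generated token t; topk_idx are the selected top-k token positions.
    Returns the enriched features [x_bar_l^m ; x_bar_l], shape (L, M, 2d)."""
    L, M, _, d = acts.shape
    x_bar = acts[:, :, topk_idx, :].mean(axis=2)       # (L, M, d): mean over top-k tokens
    layer_mean = x_bar.mean(axis=1, keepdims=True)     # (L, 1, d): mean over heads per layer
    layer_mean = np.broadcast_to(layer_mean, (L, M, d))
    return np.concatenate([x_bar, layer_mean], axis=-1)
```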
### 3.2 Heads Importance
For the CogQA dataset with $N$ subQA pairs, we collect all activations to construct the probing dataset:
$$
\mathcal{D}_{\text{probe}}=\left\{(\bar{x}^{m^{\prime}}_{l},\ c)_{i}\right\}_{i=1}^{N},l\in\{1,\ldots,L\},\ m\in\{1,\ldots,M\} \tag{1}
$$
We split the dataset into training and validation sets with a $4{:}1$ ratio. Each attention head feature is first passed through a trainable linear projection for dimensionality reduction, followed by a two-layer MLP that performs multi-class classification over cognitive functions (training details are provided in Appendix A.3). To interpret the contribution of individual heads to each function, we use a gradient-based attribution method. Specifically, for each function class $c$ , we compute the contribution of each head feature via the gradient $\times$ activation technique:
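A forward pass of the probe can be sketched in NumPy (weight shapes are assumed, including a projection shared across heads; the actual training setup is in Appendix A.3):

```python
import numpy as np

def probe_forward(x, Wp, W1, b1, W2, b2):
    """Sketch of the probing classifier: per-head linear projection,
    then a two-layer MLP over cognitive-function classes.
    x: (N, heads, feat_dim) head features for N subquestions.
    Wp: (feat_dim, proj_dim) projection (assumed shared across heads);
    W1, b1, W2, b2: parameters of the two-layer MLP."""
    z = (x @ Wp).reshape(x.shape[0], -1)  # project each head, then flatten
    h = np.maximum(z @ W1 + b1, 0.0)      # hidden layer with ReLU
    return h @ W2 + b2                    # logits over cognitive functions
```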
$$
I^{(c)}_{j}=\mathbb{E}_{(\bar{x},c)\sim\mathcal{D}_{\text{probe}}}\left[\frac{\partial\hat{y}_{c}}{\partial\bar{x}_{j}}\cdot\bar{x}_{j}\right], \tag{2}
$$
where $\bar{x}_{j}$ is the $j$ -th head input feature, and $\hat{y}_{c}$ is the classifier’s predicted logit for class $c$ . This yields an importance score for each attention head with respect to each cognitive function. We aggregate the scores into a matrix $\mathbf{I}\in\mathbb{R}^{C\times(L\cdot M)}$ , where each row corresponds to a function class and each column to a specific head in a specific layer.
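For intuition, Eq. (2) has a closed form when the probe is a single linear layer: with logits $\hat{y}=W\bar{x}$, the gradient $\partial\hat{y}_{c}/\partial\bar{x}_{j}$ is simply $W_{cj}$. A sketch of this special case (the MLP probe used here would compute the same expectation via automatic differentiation):

```python
import numpy as np

def head_importance(W: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Gradient x activation (Eq. 2) for a linear probe.
    W: (C, J) probe weights; X: (N, J) head features over the probing set.
    Returns I: (C, J), the importance of head feature j for class c."""
    # E[(d y_c / d x_j) * x_j] = W[c, j] * E[x_j]
    return W * X.mean(axis=0, keepdims=True)
```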
We hypothesize that attention heads with higher importance scores contribute more significantly to each cognitive function. By ranking heads according to their importance, we can identify which heads and layers are specialized for specific functions. Subsequent targeted interventions on these heads validate the effectiveness of this approach.
## 4 Experiments
We conduct a series of experiments on three LLM families across various model scales, including LLaMA touvron2023llama (Llama3.1-8B-instruct and Llama3.2-3B-instruct), Qwen yang2024qwen2 (Qwen3-8B and Qwen3-4B), and Yi young2024yi (Yi1.5-9B and Yi1.5-6B). Our goal is to identify cognitive attention heads associated with specific reasoning functions and evaluate their roles via targeted interventions. By selectively masking these heads, we assess their functional significance in supporting downstream performance. We evaluate our method in terms of functional alignment, consistency across models, and causal impact on reasoning tasks. Results confirm the existence of sparse, function-specific heads and highlight their critical contribution to structured cognitive processing within LLMs.
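The masking intervention can be sketched as zeroing the residual-stream contribution of the selected heads (an assumed mechanism for illustration; the exact hook implementation is model-specific):

```python
import numpy as np

def mask_heads(head_outputs: np.ndarray,
               heads_to_mask: list[tuple[int, int]]) -> np.ndarray:
    """head_outputs: (L, M, d) per-head residual contributions for one token.
    Returns a copy with the listed (layer, head) contributions zeroed,
    so the masked heads no longer influence the residual stream."""
    out = head_outputs.copy()
    for l, m in heads_to_mask:
        out[l, m] = 0.0
    return out
```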
### 4.1 Properties of Cognitive Heads
<details>
<summary>x2.png Details</summary>

### Visual Description
## Heatmap Grid: AI Model Head Importance by Cognitive Task
### Overview
The image displays a grid of eight heatmaps arranged in two rows and four columns. Each heatmap visualizes the "importance" of different attention heads (across layers) within an AI model for a specific cognitive task. The overall purpose is to show which parts of the model (specific layer-head combinations) are most critical for different types of reasoning and understanding.
### Components/Axes
* **Grid Structure:** 8 individual heatmaps in a 2x4 layout.
* **Subplot Titles (Top Row, Left to Right):** "Knowledge Recall", "Retrieval", "Logical Reasoning", "Decision-making".
* **Subplot Titles (Bottom Row, Left to Right):** "Semantic Understanding", "Syntactic Understanding", "Inference", "Math Calculation".
* **Common Y-Axis (Leftmost plots):** Labeled "Layer". Scale runs from 0 at the top to 30 at the bottom, with major ticks at 0, 6, 12, 18, 24, 30.
* **Common X-Axis (Bottom plots):** Labeled "Head". Scale runs from 0 on the left to 30 on the right, with major ticks at 0, 6, 12, 18, 24, 30.
* **Color Scale/Legend (Far Right):** A vertical color bar titled "Heads Importance". The scale is continuous:
* Dark Purple/Black: 0.0000
* Teal/Green: ~0.0010 - 0.0020
* Yellow: 0.0025 to 0.0030+ (brightest yellow indicates highest importance).
### Detailed Analysis
Each heatmap is a 31x31 grid (Layers 0-30 vs. Heads 0-30). The color of each cell represents the importance value for that specific layer-head pair for the given task.
**General Pattern Across All Heatmaps:**
* The background is predominantly dark purple, indicating most layer-head pairs have very low importance (~0.0000) for any given task.
* Importance is highly localized. Scattered "hotspots" of higher importance (teal to yellow) appear, but they are sparse and do not form large, continuous regions.
* The distribution of hotspots varies significantly between tasks, suggesting functional specialization within the model.
**Task-Specific Observations (Spatial Grounding & Trend Verification):**
1. **Knowledge Recall:** Hotspots are scattered. Notable yellow spots appear around (Layer ~12, Head ~12) and (Layer ~28, Head ~24).
2. **Retrieval:** Shows a cluster of moderate-to-high importance (teal/yellow) in the central region, roughly between Layers 12-18 and Heads 6-18.
3. **Logical Reasoning:** Has several distinct yellow hotspots. One prominent spot is near (Layer 15, Head 18). Another is around (Layer 12, Head 28).
4. **Decision-making:** Exhibits a relatively higher density of teal and yellow spots compared to others, particularly in the upper half (Layers 0-15). A bright yellow spot is visible at approximately (Layer 12, Head 6).
5. **Semantic Understanding:** Hotspots are sparse. A clear yellow spot is located near (Layer 10, Head 12). Another is at (Layer 24, Head 24).
6. **Syntactic Understanding:** Shows a notable concentration of activity in the center-left. A bright yellow spot is at (Layer 15, Head 12).
7. **Inference:** Appears to have the fewest high-importance (yellow) spots. Most activity is low-level (teal), with a slightly denser region around Layers 12-18.
8. **Math Calculation:** Displays a very distinct pattern. High-importance yellow spots are concentrated in the later layers, specifically around (Layer 24, Head 9) and (Layer 27, Head 27). This is a clear outlier in terms of spatial distribution compared to the more distributed patterns of other tasks.
### Key Observations
* **Functional Specialization:** Different cognitive tasks activate distinct, sparse sets of attention heads. There is no single "general reasoning" area.
* **Layer-Head Specificity:** Importance is not uniform across a layer or a head index; it is highly specific to the combination (e.g., Layer 12, Head 12 is important for Knowledge Recall but not necessarily for Math Calculation).
* **Math Calculation Anomaly:** The importance pattern for math is uniquely concentrated toward the bottom of the plot (higher layer indices), whereas other tasks show more mid-layer (layers 10-20) activity.
* **Decision-making Density:** The "Decision-making" task appears to engage a broader set of heads more intensely than tasks like "Inference."
### Interpretation
This visualization maps the model's internal functional organization: the heatmap grid represents the model's architecture, the spatial location of the hotspots points to specific computational units (layer-head pairs), and the subplot titles name the cognitive task each map probes.
The data suggests that the model has developed **modular, distributed expertise**. Rather than a monolithic processor, it uses specialized micro-circuits (specific heads in specific layers) for different cognitive operations. The stark difference in the "Math Calculation" pattern implies that mathematical processing may rely on a fundamentally different or more localized computational pathway within the model compared to linguistic or reasoning tasks.
The sparsity of the hotspots indicates high **efficiency and specialization**; only a tiny fraction of the model's attention capacity is critically important for any single task. This has implications for model interpretability and editing: to influence a specific capability, one might target these identified sparse components rather than the entire model. The variation in patterns across tasks underscores the complexity of artificial cognition and challenges the notion of a single, unified "reasoning engine" within such models.
</details>
Figure 2: The existence of cognitive heads in Llama3.1-8B-instruct responsible for eight distinct functions in complex reasoning tasks. The x-axis represents the head index, while the y-axis indicates the layer index.
Our analysis reveals that cognitive head importance in large language models exhibits three key properties: sparsity, universality, and layered functional organization. To illustrate these characteristics, we present heatmaps of attention head importance scores across eight cognitive functions in Llama3.1-8B-instruct (Figure 2).
Sparsity and Universality: As shown in Figure 2, each cognitive function activates only a small number of high-importance attention heads, revealing a strikingly sparse pattern. In Llama3.1-8B-instruct, fewer than 7% of all heads have importance scores above 0.001 across the eight functions, suggesting that only a compact subset of heads meaningfully contribute to task performance. This sparsity is not uniform: Retrieval contains the highest proportion of salient heads (6.45% exceeding 0.01), while Inference has the fewest (3.42%). These results highlight that LLMs rely on highly specialized, localized components for different cognitive abilities. Importantly, we observe that this sparse functional organization is consistent across different model architectures and sizes. Additional heatmaps for five other models are provided in Appendix A.1, supporting the universality of this phenomenon.
Layered Functional Organization: In addition to sparsity, attention heads show a structured distribution across model layers. Retrieval-related heads cluster primarily in the middle layers, while math-related heads appear more frequently in higher layers. This structured, task-dependent localization points to an emergent modular organization, where different layers support distinct cognitive operations. Further, we identify cognitive heads by selecting those before the elbow point of each function’s descending importance curve (Appendix A.2), and find notable variation in head counts across functions (Appendix A.8). For example, in the LLaMA family, mathematical calculation requires fewer heads (59 in Llama3.1-8B-Instruct, 35 in Llama3.2-3B-Instruct), while inference draws on substantially more (139 and 98, respectively), reflecting differences in representational and computational complexity.
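The elbow-point selection can be approximated with a standard knee heuristic (an assumption for illustration; the paper's exact criterion is described in Appendix A.2): sort the importance scores in descending order and keep heads up to the point of maximum vertical distance from the line joining the curve's endpoints.

```python
import numpy as np

def elbow_cutoff(scores: np.ndarray) -> int:
    """Number of heads to keep: index of the elbow on the descending
    importance curve, using vertical distance to the end-to-end chord.
    (A simple knee heuristic, assumed here for illustration.)"""
    s = np.sort(np.asarray(scores, dtype=float))[::-1]
    n = len(s)
    frac = np.arange(n) / (n - 1)
    chord = s[0] + (s[-1] - s[0]) * frac  # straight line between endpoints
    return int(np.argmax(np.abs(s - chord))) + 1
```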
Table 1: Intervention results (%) of cognitive heads vs. random heads across 8 cognitive functions: Retrieval, Knowledge Recall, Semantic Understanding, Syntax Understanding, Math Calculation, Inference, Logic Reasoning, and Decision Making. Lower values indicate more effective intervention outcomes, suggesting that the corresponding heads play a greater role in the cognitive function.
| Model | Inter_Head | Retrieval (comet) | Retrieval (acc) | Recall (comet) | Recall (acc) | Semantic (comet) | Semantic (acc) | Syntactic (comet) | Syntactic (acc) | Math (comet) | Math (acc) | Inference (comet) | Inference (acc) | Logic (comet) | Logic (acc) | Decision (comet) | Decision (acc) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Llama3.1-8B | random | 90.83 | 84.71 | 87.85 | 83.84 | 91.44 | 97.50 | 87.81 | 66.17 | 94.25 | 83.08 | 91.90 | 70.18 | 91.39 | 54.69 | 97.64 | 90.91 |
| | cognitive | 44.96 | 8.24 | 56.93 | 38.38 | 81.98 | 75.00 | 69.20 | 40.00 | 87.81 | 66.17 | 76.65 | 52.63 | 52.07 | 4.69 | 56.02 | 4.55 |
| Llama3.2-3B | random | 87.89 | 86.47 | 76.35 | 68.69 | 90.54 | 90.00 | 75.82 | 40.00 | 94.98 | 69.65 | 95.66 | 85.96 | 92.75 | 76.56 | 93.30 | 81.82 |
| | cognitive | 49.47 | 17.06 | 49.69 | 13.13 | 52.29 | 10.00 | 43.62 | 0.00 | 92.01 | 80.10 | 53.60 | 7.02 | 46.69 | 0.00 | 49.25 | 0.00 |
| Qwen3-8B | random | 92.81 | 75.29 | 89.90 | 53.54 | 92.73 | 42.50 | 88.60 | 80.00 | 92.69 | 60.20 | 94.45 | 24.56 | 94.15 | 20.31 | 96.52 | 31.82 |
| | cognitive | 59.19 | 38.24 | 64.81 | 30.30 | 85.95 | 47.50 | 46.26 | 0.00 | 89.29 | 53.23 | 72.77 | 35.09 | 87.61 | 21.88 | 83.17 | 54.55 |
| Qwen3-4B | random | 94.17 | 84.71 | 84.61 | 77.78 | 86.91 | 77.50 | 98.15 | 80.00 | 87.15 | 44.78 | 96.89 | 87.72 | 92.00 | 75.00 | 94.79 | 72.73 |
| | cognitive | 80.13 | 64.71 | 63.10 | 35.35 | 65.95 | 60.00 | 46.25 | 0.00 | 82.40 | 46.27 | 84.88 | 64.91 | 82.79 | 39.06 | 45.49 | 13.64 |
| Yi-1.5-9B | random | 86.83 | 79.41 | 82.02 | 54.55 | 77.40 | 35.00 | 81.53 | 60.00 | 76.04 | 36.32 | 89.83 | 36.84 | 87.53 | 42.19 | 86.27 | 63.64 |
| | cognitive | 52.76 | 21.76 | 45.99 | 9.09 | 47.25 | 2.50 | 48.10 | 40.00 | 54.22 | 16.92 | 52.41 | 15.79 | 82.75 | 26.56 | 62.85 | 18.18 |
| Yi-1.5-6B | random | 80.64 | 69.41 | 68.82 | 38.38 | 77.83 | 55.00 | 69.61 | 60.00 | 73.33 | 43.78 | 77.71 | 22.81 | 81.65 | 29.69 | 88.54 | 72.73 |
| | cognitive | 49.90 | 15.29 | 68.23 | 41.41 | 49.54 | 2.50 | 42.92 | 0.00 | 76.64 | 43.78 | 68.53 | 14.04 | 44.94 | 0.00 | 86.28 | 50.00 |

The first four functions (Retrieval through Syntactic Understanding) are information extraction and analysis functions; the last four (Math Calculation through Decision Making) are higher-order processing functions.
Figure 3: Performance of Llama3.1-8B-instruct when masking the top-K cognitive heads vs. K random heads on retrieval, knowledge recall, math calculation, and inference.
### 4.2 Functional Contributions of Cognitive Heads
After identifying the cognitive heads associated with each function, we examine their functional roles by evaluating the model’s behavior on the CogQA test set under targeted interventions. We perform head ablation by scaling the output of a specific attention head by a small factor $\epsilon$ (e.g., 0.001), effectively suppressing its contribution:
$$
x_{i}^{\text{mask}}=\operatorname{Softmax}\left(\frac{W_{q}^{i}W_{k}^{iT}}{\sqrt{d_{k}/n}}\right)\cdot\epsilon W_{v}^{i} \tag{3}
$$
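As a concrete sketch of Eq. (3), the ablation amounts to scaling one head's value pathway by $\epsilon$ while leaving every other head untouched. The snippet below is a minimal single-head illustration (weight names and shapes are ours; Eq. (3) abbreviates the standard query/key/value projections applied to an input sequence $x$):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ablate_head(x, W_q, W_k, W_v, eps=0.001):
    """Suppress one attention head by scaling its value projection by eps;
    eps=1.0 recovers the unablated head output."""
    d_head = W_q.shape[1]
    q, k, v = x @ W_q, x @ W_k, x @ W_v        # per-head projections of the input
    attn = softmax(q @ k.T / np.sqrt(d_head))  # attention weights
    return attn @ (eps * v)                    # head output, scaled by eps
```

Because $\epsilon$ enters linearly, the ablated output is exactly $\epsilon$ times the original head output, so the intervention suppresses the head's contribution without perturbing the rest of the forward pass.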
Specifically, we compare model performance when masking the identified cognitive heads versus masking an equal number of randomly selected heads. To quantify the impact of masking, we compare the model’s outputs before and after intervention using several standard metrics: COMET rei2020comet, BLEU papineni2002bleu, ROUGE chin2004rouge, and semantic similarity. We define an output as unaffected if its BLEU score exceeds 0.8, or if either its ROUGE or semantic similarity score surpasses 0.6, and compute accuracy accordingly.
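The "unaffected" criterion can be written as a small helper (a sketch; the function names are ours, and the BLEU/ROUGE/similarity scores are assumed to be precomputed on a 0–1 scale):

```python
def is_unaffected(bleu: float, rouge: float, sem_sim: float) -> bool:
    """An output counts as unaffected if BLEU > 0.8, or if either
    ROUGE or semantic similarity exceeds 0.6."""
    return bleu > 0.8 or rouge > 0.6 or sem_sim > 0.6

def intervention_accuracy(scores) -> float:
    """Accuracy (%) = fraction of outputs still unaffected after masking.
    `scores` is an iterable of (bleu, rouge, sem_sim) triples."""
    scores = list(scores)
    return 100.0 * sum(is_unaffected(*s) for s in scores) / len(scores)
```

Under this rule, a lower accuracy after masking indicates that the intervention changed more of the model's outputs, i.e., the masked heads mattered more.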
As shown in Table 1, masking cognitive heads leads to a significant decline in performance, whereas masking an equal number of random heads results in only marginal degradation across all LLMs. In some cases, masking the identified cognitive heads causes the accuracy to drop to zero, indicating that the model cannot execute the corresponding function without them. This sharp contrast highlights the essential role cognitive heads play in enabling specific reasoning capabilities. To further validate the functional specialization, we conduct experiments where we mask the retrieval heads during the evaluation of knowledge recall (Recall), and conversely, mask knowledge recall heads during the evaluation of retrieval performance. The results in Table 2 show that masking the corresponding cognitive heads causes a significantly larger performance drop than masking others.
Table 2: Intervention results (%) of different cognitive heads and random heads across Retrieval and Knowledge Recall functions.
| Model | Inter_Head | Retrieval comet | Retrieval acc | Recall comet | Recall acc |
| --- | --- | --- | --- | --- | --- |
| Llama3.1-8B | random | 90.83 | 84.71 | 87.85 | 83.84 |
| Llama3.1-8B | retrieval | 44.96 | 8.24 | 72.05 | 33.33 |
| Llama3.1-8B | recall | 86.79 | 75.29 | 56.93 | 38.38 |
| Qwen3-8B | random | 92.81 | 75.29 | 89.90 | 53.54 |
| Qwen3-8B | retrieval | 59.19 | 38.24 | 79.26 | 57.58 |
| Qwen3-8B | recall | 83.31 | 71.18 | 64.81 | 30.30 |
We further investigate model performance under different numbers of masked attention heads. As shown in Figure 3, increasing the number of randomly masked heads has minimal impact on the overall performance of Llama3.1-8B-instruct. In contrast, masking cognitive heads causes a significant performance drop across functions. Notably, masking heads associated with Retrieval and Knowledge Recall produces a pronounced degradation in the corresponding functions, whereas Math Calculation and Inference are more resilient. This suggests that some cognitive functions depend heavily on a small set of specific, distinguishable attention heads, while others are distributed more broadly across the model.
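The two masking schedules compared in Figure 3 can be sketched as follows (a hypothetical helper; `importance` stands in for the per-head probing scores used to rank cognitive heads):

```python
import numpy as np

def heads_to_mask(importance, k, strategy="topk", seed=0):
    """Select k head indices to mask: either the k highest-scoring
    cognitive heads ('topk') or a uniformly random subset ('randomk')."""
    importance = np.asarray(importance)
    if strategy == "topk":
        return np.argsort(importance)[::-1][:k]  # indices of the k largest scores
    rng = np.random.default_rng(seed)
    return rng.choice(len(importance), size=k, replace=False)
```

Sweeping `k` over {16, 32, 64, 128} with each strategy and re-running the evaluation reproduces the comparison in outline: top-K masking removes the heads the probe deems most functionally important, while random-K serves as a size-matched control.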
### 4.3 Relationship Among Cognitive Heads
While cognitive heads are specialized for distinct functions, understanding their relationships is crucial for revealing how complex reasoning emerges from their cooperation.
Functional Clustering: Inspired by neuroscience findings that related cognitive functions localize in overlapping brain regions (e.g., prefrontal cortex for reasoning and inference barsalou2014cognitive), we investigate whether LLM attention heads show similar patterns. We rank each head’s importance across eight cognitive functions, form ranking vectors, and apply principal component analysis (PCA) to visualize their organization (Figure 4). The results reveal clear clustering: heads linked to reasoning, inference, and decision-making group closely, while those related to mathematical computation form a distinct cluster in Llama and Qwen, and lie adjacent to reasoning heads in Yi. Lower-level functions also show moderate clustering. These patterns suggest a modular functional architecture in LLMs akin to that in the human brain.
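The clustering analysis can be reproduced in outline as follows (a sketch: the random matrix is a stand-in for the real per-function head-importance rankings, and the PCA is a plain SVD projection):

```python
import numpy as np

functions = ["Retrieval", "Recall", "Semantic", "Syntactic",
             "Math", "Inference", "Logic", "Decision"]

# Stand-in for the real data: one ranking vector per cognitive
# function, with one entry per attention head.
rng = np.random.default_rng(0)
rank_vectors = rng.normal(size=(len(functions), 1024))

def pca_2d(X):
    """Project the rows of X onto their first two principal components."""
    Xc = X - X.mean(axis=0)                          # center each column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                             # (n_rows, 2) coordinates

coords = pca_2d(rank_vectors)  # one 2D point per function, as plotted in Figure 4
```

Functions whose ranking vectors are similar (i.e., whose important heads overlap) land close together in this 2D projection, which is what produces the reasoning/inference/decision cluster observed above.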
(a) Llama3.1-8B
(b) Qwen3-4B
(c) Yi-1.5-6B
Figure 4: PCA visualization of the 8 function heads’ clustering in three models.
Table 3: Study on the influence of low-level cognitive heads for high-order function on Llama3.1-8B-instruct. Accuracy is measured based on BLEU, ROUGE, and semantic similarity scores.
| ✗ | ✓ | ✓ | ✓ | $0.00$ (↓100) | $0.00$ (↓100) | $0.00$ (↓100) | $0.00$ (↓100) |
| --- | --- | --- | --- | --- | --- | --- | --- |
}\pgfsys@setlinewidth{\the\pgflinewidth}\pgfsys@invoke{ }\nullfont\hbox to0.0pt{{}{}{}{}\pgfsys@beginscope\pgfsys@invoke{ }{}{}{}{}{}{}{}{}\definecolor[named]{pgffillcolor}{rgb}{0.87890625,0.9609375,1}\pgfsys@color@rgb@fill{0.87890625}{0.9609375}{1}\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}\pgfsys@moveto{0.0pt}{2.84526pt}\pgfsys@lineto{0.0pt}{7.11337pt}\pgfsys@curveto{0.0pt}{8.68478pt}{1.27385pt}{9.95863pt}{2.84526pt}{9.95863pt}\pgfsys@lineto{25.60748pt}{9.95863pt}\pgfsys@curveto{27.1789pt}{9.95863pt}{28.45274pt}{8.68478pt}{28.45274pt}{7.11337pt}\pgfsys@lineto{28.45274pt}{2.84526pt}\pgfsys@curveto{28.45274pt}{1.27385pt}{27.1789pt}{0.0pt}{25.60748pt}{0.0pt}\pgfsys@lineto{2.84526pt}{0.0pt}\pgfsys@curveto{1.27385pt}{0.0pt}{0.0pt}{1.27385pt}{0.0pt}{2.84526pt}\pgfsys@closepath\pgfsys@fill\pgfsys@invoke{ }\pgfsys@invoke{ }\pgfsys@endscope\pgfsys@beginscope\pgfsys@invoke{ }{}{}{}{}{}{}{}{}\definecolor[named]{pgffillcolor}{rgb}{0.87890625,0.9609375,1}\pgfsys@color@rgb@fill{0.87890625}{0.9609375}{1}\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}\pgfsys@moveto{0.0pt}{2.84526pt}\pgfsys@lineto{0.0pt}{7.11337pt}\pgfsys@curveto{0.0pt}{8.68478pt}{1.27385pt}{9.95863pt}{2.84526pt}{9.95863pt}\pgfsys@lineto{25.60748pt}{9.95863pt}\pgfsys@curveto{27.1789pt}{9.95863pt}{28.45274pt}{8.68478pt}{28.45274pt}{7.11337pt}\pgfsys@lineto{28.45274pt}{2.84526pt}\pgfsys@curveto{28.45274pt}{1.27385pt}{27.1789pt}{0.0pt}{25.60748pt}{0.0pt}\pgfsys@lineto{2.84526pt}{0.0pt}\pgfsys@curveto{1.27385pt}{0.0pt}{0.0pt}{1.27385pt}{0.0pt}{2.84526pt}\pgfsys@closepath\pgfsys@fill\pgfsys@invoke{ }\pgfsys@invoke{ }\pgfsys@endscope\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{{}}{{}}{{}}{{}}{{}}{{}}{{}}{{}}\pgfsys@beginscope\pgfsys@invoke{ 
}\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{2.84526pt}{3.72931pt}\pgfsys@invoke{ }\hbox{{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\hbox{\minipage[b]{22.76222pt}\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\ignorespaces\centering\ignorespaces{\text{$\downarrow$ 100}}\@add@centering\endminipage}}}\pgfsys@invoke{ }\pgfsys@endscope}\pgfsys@invoke{ }\pgfsys@endscope{}{}{}\hss}\pgfsys@discardpath\pgfsys@invoke{ }\pgfsys@endscope\hss}}\endpgfpicture}}\par}$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ✓ | ✗ | ✓ | ✓ | $0.00_{\definecolor{tcbcolback}{rgb}{0.87890625,0.9609375,1}\definecolor{tcbcolframe}{rgb}{0.87890625,0.9609375,1}\par\noindent\hbox to28.45pt{\vbox to9.96pt{\pgfpicture\makeatletter\hbox{\thinspace\lower 0.0pt\hbox to0.0pt{\pgfsys@beginscope\pgfsys@invoke{ }\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\pgfsys@setlinewidth{\the\pgflinewidth}\pgfsys@invoke{ }\nullfont\hbox to0.0pt{{}{}{}{}\pgfsys@beginscope\pgfsys@invoke{ }{}{}{}{}{}{}{}{}\definecolor[named]{pgffillcolor}{rgb}{0.87890625,0.9609375,1}\pgfsys@color@rgb@fill{0.87890625}{0.9609375}{1}\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}\pgfsys@moveto{0.0pt}{2.84526pt}\pgfsys@lineto{0.0pt}{7.11337pt}\pgfsys@curveto{0.0pt}{8.68478pt}{1.27385pt}{9.95863pt}{2.84526pt}{9.95863pt}\pgfsys@lineto{25.60748pt}{9.95863pt}\pgfsys@curveto{27.1789pt}{9.95863pt}{28.45274pt}{8.68478pt}{28.45274pt}{7.11337pt}\pgfsys@lineto{28.45274pt}{2.84526pt}\pgfsys@curveto{28.45274pt}{1.27385pt}{27.1789pt}{0.0pt}{25.60748pt}{0.0pt}\pgfsys@lineto{2.84526pt}{0.0pt}\pgfsys@curveto{1.27385pt}{0.0pt}{0.0pt}{1.27385pt}{0.0pt}{2.84526pt}\pgfsys@closepath\pgfsys@fill\pgfsys@invoke{ }\pgfsys@invoke{ }\pgfsys@endscope\pgfsys@beginscope\pgfsys@invoke{ }{}{}{}{}{}{}{}{}\definecolor[named]{pgffillcolor}{rgb}{0.87890625,0.9609375,1}\pgfsys@color@rgb@fill{0.87890625}{0.9609375}{1}\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ 
}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}\pgfsys@moveto{0.0pt}{2.84526pt}\pgfsys@lineto{0.0pt}{7.11337pt}\pgfsys@curveto{0.0pt}{8.68478pt}{1.27385pt}{9.95863pt}{2.84526pt}{9.95863pt}\pgfsys@lineto{25.60748pt}{9.95863pt}\pgfsys@curveto{27.1789pt}{9.95863pt}{28.45274pt}{8.68478pt}{28.45274pt}{7.11337pt}\pgfsys@lineto{28.45274pt}{2.84526pt}\pgfsys@curveto{28.45274pt}{1.27385pt}{27.1789pt}{0.0pt}{25.60748pt}{0.0pt}\pgfsys@lineto{2.84526pt}{0.0pt}\pgfsys@curveto{1.27385pt}{0.0pt}{0.0pt}{1.27385pt}{0.0pt}{2.84526pt}\pgfsys@closepath\pgfsys@fill\pgfsys@invoke{ }\pgfsys@invoke{ }\pgfsys@endscope\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{{}}{{}}{{}}{{}}{{}}{{}}{{}}{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{2.84526pt}{3.72931pt}\pgfsys@invoke{ }\hbox{{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\hbox{\minipage[b]{22.76222pt}\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\ignorespaces\centering\ignorespaces{\text{$\downarrow$ 100}}\@add@centering\endminipage}}}\pgfsys@invoke{ }\pgfsys@endscope}\pgfsys@invoke{ }\pgfsys@endscope{}{}{}\hss}\pgfsys@discardpath\pgfsys@invoke{ }\pgfsys@endscope\hss}}\endpgfpicture}}\par}$ | $0.00_{\definecolor{tcbcolback}{rgb}{0.87890625,0.9609375,1}\definecolor{tcbcolframe}{rgb}{0.87890625,0.9609375,1}\par\noindent\hbox to28.45pt{\vbox to9.96pt{\pgfpicture\makeatletter\hbox{\thinspace\lower 0.0pt\hbox to0.0pt{\pgfsys@beginscope\pgfsys@invoke{ }\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\pgfsys@setlinewidth{\the\pgflinewidth}\pgfsys@invoke{ }\nullfont\hbox to0.0pt{{}{}{}{}\pgfsys@beginscope\pgfsys@invoke{ 
}{}{}{}{}{}{}{}{}\definecolor[named]{pgffillcolor}{rgb}{0.87890625,0.9609375,1}\pgfsys@color@rgb@fill{0.87890625}{0.9609375}{1}\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}\pgfsys@moveto{0.0pt}{2.84526pt}\pgfsys@lineto{0.0pt}{7.11337pt}\pgfsys@curveto{0.0pt}{8.68478pt}{1.27385pt}{9.95863pt}{2.84526pt}{9.95863pt}\pgfsys@lineto{25.60748pt}{9.95863pt}\pgfsys@curveto{27.1789pt}{9.95863pt}{28.45274pt}{8.68478pt}{28.45274pt}{7.11337pt}\pgfsys@lineto{28.45274pt}{2.84526pt}\pgfsys@curveto{28.45274pt}{1.27385pt}{27.1789pt}{0.0pt}{25.60748pt}{0.0pt}\pgfsys@lineto{2.84526pt}{0.0pt}\pgfsys@curveto{1.27385pt}{0.0pt}{0.0pt}{1.27385pt}{0.0pt}{2.84526pt}\pgfsys@closepath\pgfsys@fill\pgfsys@invoke{ }\pgfsys@invoke{ }\pgfsys@endscope\pgfsys@beginscope\pgfsys@invoke{ }{}{}{}{}{}{}{}{}\definecolor[named]{pgffillcolor}{rgb}{0.87890625,0.9609375,1}\pgfsys@color@rgb@fill{0.87890625}{0.9609375}{1}\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}\pgfsys@moveto{0.0pt}{2.84526pt}\pgfsys@lineto{0.0pt}{7.11337pt}\pgfsys@curveto{0.0pt}{8.68478pt}{1.27385pt}{9.95863pt}{2.84526pt}{9.95863pt}\pgfsys@lineto{25.60748pt}{9.95863pt}\pgfsys@curveto{27.1789pt}{9.95863pt}{28.45274pt}{8.68478pt}{28.45274pt}{7.11337pt}\pgfsys@lineto{28.45274pt}{2.84526pt}\pgfsys@curveto{28.45274pt}{1.27385pt}{27.1789pt}{0.0pt}{25.60748pt}{0.0pt}\pgfsys@lineto{2.84526pt}{0.0pt}\pgfsys@curveto{1.27385pt}{0.0pt}{0.0pt}{1.27385pt}{0.0pt}{2.84526pt}\pgfsys@closepath\pgfsys@fill\pgfsys@invoke{ }\pgfsys@invoke{ }\pgfsys@endscope\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{{}}{{}}{{}}{{}}{{}}{{}}{{}}{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{2.84526pt}{3.72931pt}\pgfsys@invoke{ 
}\hbox{{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\hbox{\minipage[b]{22.76222pt}\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\ignorespaces\centering\ignorespaces{\text{$\downarrow$ 100}}\@add@centering\endminipage}}}\pgfsys@invoke{ }\pgfsys@endscope}\pgfsys@invoke{ }\pgfsys@endscope{}{}{}\hss}\pgfsys@discardpath\pgfsys@invoke{ }\pgfsys@endscope\hss}}\endpgfpicture}}\par}$ | $0.00_{\definecolor{tcbcolback}{rgb}{0.87890625,0.9609375,1}\definecolor{tcbcolframe}{rgb}{0.87890625,0.9609375,1}\par\noindent\hbox to28.45pt{\vbox to9.96pt{\pgfpicture\makeatletter\hbox{\thinspace\lower 0.0pt\hbox to0.0pt{\pgfsys@beginscope\pgfsys@invoke{ }\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\pgfsys@setlinewidth{\the\pgflinewidth}\pgfsys@invoke{ }\nullfont\hbox to0.0pt{{}{}{}{}\pgfsys@beginscope\pgfsys@invoke{ }{}{}{}{}{}{}{}{}\definecolor[named]{pgffillcolor}{rgb}{0.87890625,0.9609375,1}\pgfsys@color@rgb@fill{0.87890625}{0.9609375}{1}\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}\pgfsys@moveto{0.0pt}{2.84526pt}\pgfsys@lineto{0.0pt}{7.11337pt}\pgfsys@curveto{0.0pt}{8.68478pt}{1.27385pt}{9.95863pt}{2.84526pt}{9.95863pt}\pgfsys@lineto{25.60748pt}{9.95863pt}\pgfsys@curveto{27.1789pt}{9.95863pt}{28.45274pt}{8.68478pt}{28.45274pt}{7.11337pt}\pgfsys@lineto{28.45274pt}{2.84526pt}\pgfsys@curveto{28.45274pt}{1.27385pt}{27.1789pt}{0.0pt}{25.60748pt}{0.0pt}\pgfsys@lineto{2.84526pt}{0.0pt}\pgfsys@curveto{1.27385pt}{0.0pt}{0.0pt}{1.27385pt}{0.0pt}{2.84526pt}\pgfsys@closepath\pgfsys@fill\pgfsys@invoke{ }\pgfsys@invoke{ }\pgfsys@endscope\pgfsys@beginscope\pgfsys@invoke{ 
}{}{}{}{}{}{}{}{}\definecolor[named]{pgffillcolor}{rgb}{0.87890625,0.9609375,1}\pgfsys@color@rgb@fill{0.87890625}{0.9609375}{1}\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}\pgfsys@moveto{0.0pt}{2.84526pt}\pgfsys@lineto{0.0pt}{7.11337pt}\pgfsys@curveto{0.0pt}{8.68478pt}{1.27385pt}{9.95863pt}{2.84526pt}{9.95863pt}\pgfsys@lineto{25.60748pt}{9.95863pt}\pgfsys@curveto{27.1789pt}{9.95863pt}{28.45274pt}{8.68478pt}{28.45274pt}{7.11337pt}\pgfsys@lineto{28.45274pt}{2.84526pt}\pgfsys@curveto{28.45274pt}{1.27385pt}{27.1789pt}{0.0pt}{25.60748pt}{0.0pt}\pgfsys@lineto{2.84526pt}{0.0pt}\pgfsys@curveto{1.27385pt}{0.0pt}{0.0pt}{1.27385pt}{0.0pt}{2.84526pt}\pgfsys@closepath\pgfsys@fill\pgfsys@invoke{ }\pgfsys@invoke{ }\pgfsys@endscope\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{{}}{{}}{{}}{{}}{{}}{{}}{{}}{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{2.84526pt}{3.72931pt}\pgfsys@invoke{ }\hbox{{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\hbox{\minipage[b]{22.76222pt}\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\ignorespaces\centering\ignorespaces{\text{$\downarrow$ 100}}\@add@centering\endminipage}}}\pgfsys@invoke{ }\pgfsys@endscope}\pgfsys@invoke{ }\pgfsys@endscope{}{}{}\hss}\pgfsys@discardpath\pgfsys@invoke{ }\pgfsys@endscope\hss}}\endpgfpicture}}\par}$ | $0.00_{\definecolor{tcbcolback}{rgb}{0.87890625,0.9609375,1}\definecolor{tcbcolframe}{rgb}{0.87890625,0.9609375,1}\par\noindent\hbox to28.45pt{\vbox to9.96pt{\pgfpicture\makeatletter\hbox{\thinspace\lower 0.0pt\hbox to0.0pt{\pgfsys@beginscope\pgfsys@invoke{ }\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ 
}\pgfsys@setlinewidth{\the\pgflinewidth}\pgfsys@invoke{ }\nullfont\hbox to0.0pt{{}{}{}{}\pgfsys@beginscope\pgfsys@invoke{ }{}{}{}{}{}{}{}{}\definecolor[named]{pgffillcolor}{rgb}{0.87890625,0.9609375,1}\pgfsys@color@rgb@fill{0.87890625}{0.9609375}{1}\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}\pgfsys@moveto{0.0pt}{2.84526pt}\pgfsys@lineto{0.0pt}{7.11337pt}\pgfsys@curveto{0.0pt}{8.68478pt}{1.27385pt}{9.95863pt}{2.84526pt}{9.95863pt}\pgfsys@lineto{25.60748pt}{9.95863pt}\pgfsys@curveto{27.1789pt}{9.95863pt}{28.45274pt}{8.68478pt}{28.45274pt}{7.11337pt}\pgfsys@lineto{28.45274pt}{2.84526pt}\pgfsys@curveto{28.45274pt}{1.27385pt}{27.1789pt}{0.0pt}{25.60748pt}{0.0pt}\pgfsys@lineto{2.84526pt}{0.0pt}\pgfsys@curveto{1.27385pt}{0.0pt}{0.0pt}{1.27385pt}{0.0pt}{2.84526pt}\pgfsys@closepath\pgfsys@fill\pgfsys@invoke{ }\pgfsys@invoke{ }\pgfsys@endscope\pgfsys@beginscope\pgfsys@invoke{ }{}{}{}{}{}{}{}{}\definecolor[named]{pgffillcolor}{rgb}{0.87890625,0.9609375,1}\pgfsys@color@rgb@fill{0.87890625}{0.9609375}{1}\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}\pgfsys@moveto{0.0pt}{2.84526pt}\pgfsys@lineto{0.0pt}{7.11337pt}\pgfsys@curveto{0.0pt}{8.68478pt}{1.27385pt}{9.95863pt}{2.84526pt}{9.95863pt}\pgfsys@lineto{25.60748pt}{9.95863pt}\pgfsys@curveto{27.1789pt}{9.95863pt}{28.45274pt}{8.68478pt}{28.45274pt}{7.11337pt}\pgfsys@lineto{28.45274pt}{2.84526pt}\pgfsys@curveto{28.45274pt}{1.27385pt}{27.1789pt}{0.0pt}{25.60748pt}{0.0pt}\pgfsys@lineto{2.84526pt}{0.0pt}\pgfsys@curveto{1.27385pt}{0.0pt}{0.0pt}{1.27385pt}{0.0pt}{2.84526pt}\pgfsys@closepath\pgfsys@fill\pgfsys@invoke{ }\pgfsys@invoke{ }\pgfsys@endscope\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{{}}{{}}{{}}{{}}{{}}{{}}{{}}{{}}\pgfsys@beginscope\pgfsys@invoke{ 
}\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{2.84526pt}{3.72931pt}\pgfsys@invoke{ }\hbox{{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\hbox{\minipage[b]{22.76222pt}\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\ignorespaces\centering\ignorespaces{\text{$\downarrow$ 100}}\@add@centering\endminipage}}}\pgfsys@invoke{ }\pgfsys@endscope}\pgfsys@invoke{ }\pgfsys@endscope{}{}{}\hss}\pgfsys@discardpath\pgfsys@invoke{ }\pgfsys@endscope\hss}}\endpgfpicture}}\par}$ |
| ✓ | ✓ | ✗ | ✓ | $66.67_{\definecolor{tcbcolback}{rgb}{0.87890625,0.9609375,1}\definecolor{tcbcolframe}{rgb}{0.87890625,0.9609375,1}\par\noindent\hbox to28.45pt{\vbox to9.96pt{\pgfpicture\makeatletter\hbox{\thinspace\lower 0.0pt\hbox to0.0pt{\pgfsys@beginscope\pgfsys@invoke{ }\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\pgfsys@setlinewidth{\the\pgflinewidth}\pgfsys@invoke{ }\nullfont\hbox to0.0pt{{}{}{}{}\pgfsys@beginscope\pgfsys@invoke{ }{}{}{}{}{}{}{}{}\definecolor[named]{pgffillcolor}{rgb}{0.87890625,0.9609375,1}\pgfsys@color@rgb@fill{0.87890625}{0.9609375}{1}\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}\pgfsys@moveto{0.0pt}{2.84526pt}\pgfsys@lineto{0.0pt}{7.11337pt}\pgfsys@curveto{0.0pt}{8.68478pt}{1.27385pt}{9.95863pt}{2.84526pt}{9.95863pt}\pgfsys@lineto{25.60748pt}{9.95863pt}\pgfsys@curveto{27.1789pt}{9.95863pt}{28.45274pt}{8.68478pt}{28.45274pt}{7.11337pt}\pgfsys@lineto{28.45274pt}{2.84526pt}\pgfsys@curveto{28.45274pt}{1.27385pt}{27.1789pt}{0.0pt}{25.60748pt}{0.0pt}\pgfsys@lineto{2.84526pt}{0.0pt}\pgfsys@curveto{1.27385pt}{0.0pt}{0.0pt}{1.27385pt}{0.0pt}{2.84526pt}\pgfsys@closepath\pgfsys@fill\pgfsys@invoke{ }\pgfsys@invoke{ }\pgfsys@endscope\pgfsys@beginscope\pgfsys@invoke{ }{}{}{}{}{}{}{}{}\definecolor[named]{pgffillcolor}{rgb}{0.87890625,0.9609375,1}\pgfsys@color@rgb@fill{0.87890625}{0.9609375}{1}\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ 
}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}\pgfsys@moveto{0.0pt}{2.84526pt}\pgfsys@lineto{0.0pt}{7.11337pt}\pgfsys@curveto{0.0pt}{8.68478pt}{1.27385pt}{9.95863pt}{2.84526pt}{9.95863pt}\pgfsys@lineto{25.60748pt}{9.95863pt}\pgfsys@curveto{27.1789pt}{9.95863pt}{28.45274pt}{8.68478pt}{28.45274pt}{7.11337pt}\pgfsys@lineto{28.45274pt}{2.84526pt}\pgfsys@curveto{28.45274pt}{1.27385pt}{27.1789pt}{0.0pt}{25.60748pt}{0.0pt}\pgfsys@lineto{2.84526pt}{0.0pt}\pgfsys@curveto{1.27385pt}{0.0pt}{0.0pt}{1.27385pt}{0.0pt}{2.84526pt}\pgfsys@closepath\pgfsys@fill\pgfsys@invoke{ }\pgfsys@invoke{ }\pgfsys@endscope\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{{}}{{}}{{}}{{}}{{}}{{}}{{}}{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{2.84526pt}{3.72931pt}\pgfsys@invoke{ }\hbox{{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\hbox{\minipage[b]{22.76222pt}\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\ignorespaces\centering\ignorespaces{\text{$\downarrow$ 33.33}}\@add@centering\endminipage}}}\pgfsys@invoke{ }\pgfsys@endscope}\pgfsys@invoke{ }\pgfsys@endscope{}{}{}\hss}\pgfsys@discardpath\pgfsys@invoke{ }\pgfsys@endscope\hss}}\endpgfpicture}}\par}$ | $88.24_{\definecolor{tcbcolback}{rgb}{0.87890625,0.9609375,1}\definecolor{tcbcolframe}{rgb}{0.87890625,0.9609375,1}\par\noindent\hbox to28.45pt{\vbox to9.96pt{\pgfpicture\makeatletter\hbox{\thinspace\lower 0.0pt\hbox to0.0pt{\pgfsys@beginscope\pgfsys@invoke{ }\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\pgfsys@setlinewidth{\the\pgflinewidth}\pgfsys@invoke{ }\nullfont\hbox to0.0pt{{}{}{}{}\pgfsys@beginscope\pgfsys@invoke{ 
}{}{}{}{}{}{}{}{}\definecolor[named]{pgffillcolor}{rgb}{0.87890625,0.9609375,1}\pgfsys@color@rgb@fill{0.87890625}{0.9609375}{1}\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}\pgfsys@moveto{0.0pt}{2.84526pt}\pgfsys@lineto{0.0pt}{7.11337pt}\pgfsys@curveto{0.0pt}{8.68478pt}{1.27385pt}{9.95863pt}{2.84526pt}{9.95863pt}\pgfsys@lineto{25.60748pt}{9.95863pt}\pgfsys@curveto{27.1789pt}{9.95863pt}{28.45274pt}{8.68478pt}{28.45274pt}{7.11337pt}\pgfsys@lineto{28.45274pt}{2.84526pt}\pgfsys@curveto{28.45274pt}{1.27385pt}{27.1789pt}{0.0pt}{25.60748pt}{0.0pt}\pgfsys@lineto{2.84526pt}{0.0pt}\pgfsys@curveto{1.27385pt}{0.0pt}{0.0pt}{1.27385pt}{0.0pt}{2.84526pt}\pgfsys@closepath\pgfsys@fill\pgfsys@invoke{ }\pgfsys@invoke{ }\pgfsys@endscope\pgfsys@beginscope\pgfsys@invoke{ }{}{}{}{}{}{}{}{}\definecolor[named]{pgffillcolor}{rgb}{0.87890625,0.9609375,1}\pgfsys@color@rgb@fill{0.87890625}{0.9609375}{1}\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}\pgfsys@moveto{0.0pt}{2.84526pt}\pgfsys@lineto{0.0pt}{7.11337pt}\pgfsys@curveto{0.0pt}{8.68478pt}{1.27385pt}{9.95863pt}{2.84526pt}{9.95863pt}\pgfsys@lineto{25.60748pt}{9.95863pt}\pgfsys@curveto{27.1789pt}{9.95863pt}{28.45274pt}{8.68478pt}{28.45274pt}{7.11337pt}\pgfsys@lineto{28.45274pt}{2.84526pt}\pgfsys@curveto{28.45274pt}{1.27385pt}{27.1789pt}{0.0pt}{25.60748pt}{0.0pt}\pgfsys@lineto{2.84526pt}{0.0pt}\pgfsys@curveto{1.27385pt}{0.0pt}{0.0pt}{1.27385pt}{0.0pt}{2.84526pt}\pgfsys@closepath\pgfsys@fill\pgfsys@invoke{ }\pgfsys@invoke{ }\pgfsys@endscope\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{{}}{{}}{{}}{{}}{{}}{{}}{{}}{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{2.84526pt}{3.72931pt}\pgfsys@invoke{ 
}\hbox{{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\hbox{\minipage[b]{22.76222pt}\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\ignorespaces\centering\ignorespaces{\text{$\downarrow$ 11.76}}\@add@centering\endminipage}}}\pgfsys@invoke{ }\pgfsys@endscope}\pgfsys@invoke{ }\pgfsys@endscope{}{}{}\hss}\pgfsys@discardpath\pgfsys@invoke{ }\pgfsys@endscope\hss}}\endpgfpicture}}\par}$ | $93.10_{\definecolor{tcbcolback}{rgb}{0.87890625,0.9609375,1}\definecolor{tcbcolframe}{rgb}{0.87890625,0.9609375,1}\par\noindent\hbox to28.45pt{\vbox to9.96pt{\pgfpicture\makeatletter\hbox{\thinspace\lower 0.0pt\hbox to0.0pt{\pgfsys@beginscope\pgfsys@invoke{ }\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\pgfsys@setlinewidth{\the\pgflinewidth}\pgfsys@invoke{ }\nullfont\hbox to0.0pt{{}{}{}{}\pgfsys@beginscope\pgfsys@invoke{ }{}{}{}{}{}{}{}{}\definecolor[named]{pgffillcolor}{rgb}{0.87890625,0.9609375,1}\pgfsys@color@rgb@fill{0.87890625}{0.9609375}{1}\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}\pgfsys@moveto{0.0pt}{2.84526pt}\pgfsys@lineto{0.0pt}{7.11337pt}\pgfsys@curveto{0.0pt}{8.68478pt}{1.27385pt}{9.95863pt}{2.84526pt}{9.95863pt}\pgfsys@lineto{25.60748pt}{9.95863pt}\pgfsys@curveto{27.1789pt}{9.95863pt}{28.45274pt}{8.68478pt}{28.45274pt}{7.11337pt}\pgfsys@lineto{28.45274pt}{2.84526pt}\pgfsys@curveto{28.45274pt}{1.27385pt}{27.1789pt}{0.0pt}{25.60748pt}{0.0pt}\pgfsys@lineto{2.84526pt}{0.0pt}\pgfsys@curveto{1.27385pt}{0.0pt}{0.0pt}{1.27385pt}{0.0pt}{2.84526pt}\pgfsys@closepath\pgfsys@fill\pgfsys@invoke{ }\pgfsys@invoke{ }\pgfsys@endscope\pgfsys@beginscope\pgfsys@invoke{ 
}{}{}{}{}{}{}{}{}\definecolor[named]{pgffillcolor}{rgb}{0.87890625,0.9609375,1}\pgfsys@color@rgb@fill{0.87890625}{0.9609375}{1}\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}\pgfsys@moveto{0.0pt}{2.84526pt}\pgfsys@lineto{0.0pt}{7.11337pt}\pgfsys@curveto{0.0pt}{8.68478pt}{1.27385pt}{9.95863pt}{2.84526pt}{9.95863pt}\pgfsys@lineto{25.60748pt}{9.95863pt}\pgfsys@curveto{27.1789pt}{9.95863pt}{28.45274pt}{8.68478pt}{28.45274pt}{7.11337pt}\pgfsys@lineto{28.45274pt}{2.84526pt}\pgfsys@curveto{28.45274pt}{1.27385pt}{27.1789pt}{0.0pt}{25.60748pt}{0.0pt}\pgfsys@lineto{2.84526pt}{0.0pt}\pgfsys@curveto{1.27385pt}{0.0pt}{0.0pt}{1.27385pt}{0.0pt}{2.84526pt}\pgfsys@closepath\pgfsys@fill\pgfsys@invoke{ }\pgfsys@invoke{ }\pgfsys@endscope\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{{}}{{}}{{}}{{}}{{}}{{}}{{}}{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{2.84526pt}{3.72931pt}\pgfsys@invoke{ }\hbox{{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\hbox{\minipage[b]{22.76222pt}\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\ignorespaces\centering\ignorespaces{\text{$\downarrow$ 8.90}}\@add@centering\endminipage}}}\pgfsys@invoke{ }\pgfsys@endscope}\pgfsys@invoke{ }\pgfsys@endscope{}{}{}\hss}\pgfsys@discardpath\pgfsys@invoke{ }\pgfsys@endscope\hss}}\endpgfpicture}}\par}$ | $57.14_{\definecolor{tcbcolback}{rgb}{0.87890625,0.9609375,1}\definecolor{tcbcolframe}{rgb}{0.87890625,0.9609375,1}\par\noindent\hbox to28.45pt{\vbox to9.96pt{\pgfpicture\makeatletter\hbox{\thinspace\lower 0.0pt\hbox to0.0pt{\pgfsys@beginscope\pgfsys@invoke{ }\definecolor{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ 
}\pgfsys@setlinewidth{\the\pgflinewidth}\pgfsys@invoke{ }\nullfont\hbox to0.0pt{{}{}{}{}\pgfsys@beginscope\pgfsys@invoke{ }{}{}{}{}{}{}{}{}\definecolor[named]{pgffillcolor}{rgb}{0.87890625,0.9609375,1}\pgfsys@color@rgb@fill{0.87890625}{0.9609375}{1}\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}\pgfsys@moveto{0.0pt}{2.84526pt}\pgfsys@lineto{0.0pt}{7.11337pt}\pgfsys@curveto{0.0pt}{8.68478pt}{1.27385pt}{9.95863pt}{2.84526pt}{9.95863pt}\pgfsys@lineto{25.60748pt}{9.95863pt}\pgfsys@curveto{27.1789pt}{9.95863pt}{28.45274pt}{8.68478pt}{28.45274pt}{7.11337pt}\pgfsys@lineto{28.45274pt}{2.84526pt}\pgfsys@curveto{28.45274pt}{1.27385pt}{27.1789pt}{0.0pt}{25.60748pt}{0.0pt}\pgfsys@lineto{2.84526pt}{0.0pt}\pgfsys@curveto{1.27385pt}{0.0pt}{0.0pt}{1.27385pt}{0.0pt}{2.84526pt}\pgfsys@closepath\pgfsys@fill\pgfsys@invoke{ }\pgfsys@invoke{ }\pgfsys@endscope\pgfsys@beginscope\pgfsys@invoke{ }{}{}{}{}{}{}{}{}\definecolor[named]{pgffillcolor}{rgb}{0.87890625,0.9609375,1}\pgfsys@color@rgb@fill{0.87890625}{0.9609375}{1}\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}{{}{}{{}}}{{}{}{{}}}{}{}\pgfsys@moveto{0.0pt}{2.84526pt}\pgfsys@lineto{0.0pt}{7.11337pt}\pgfsys@curveto{0.0pt}{8.68478pt}{1.27385pt}{9.95863pt}{2.84526pt}{9.95863pt}\pgfsys@lineto{25.60748pt}{9.95863pt}\pgfsys@curveto{27.1789pt}{9.95863pt}{28.45274pt}{8.68478pt}{28.45274pt}{7.11337pt}\pgfsys@lineto{28.45274pt}{2.84526pt}\pgfsys@curveto{28.45274pt}{1.27385pt}{27.1789pt}{0.0pt}{25.60748pt}{0.0pt}\pgfsys@lineto{2.84526pt}{0.0pt}\pgfsys@curveto{1.27385pt}{0.0pt}{0.0pt}{1.27385pt}{0.0pt}{2.84526pt}\pgfsys@closepath\pgfsys@fill\pgfsys@invoke{ }\pgfsys@invoke{ }\pgfsys@endscope\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@fill@opacity{1.0}\pgfsys@invoke{ }{{{}}{{}}{{}}{{}}{{}}{{}}{{}}{{}}\pgfsys@beginscope\pgfsys@invoke{ 
}\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{2.84526pt}{3.72931pt}\pgfsys@invoke{ }\hbox{{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\hbox{\minipage[b]{22.76222pt}\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\ignorespaces\centering\ignorespaces{\text{$\downarrow$ 42.86}}\@add@centering\endminipage}}}\pgfsys@invoke{ }\pgfsys@endscope}\pgfsys@invoke{ }\pgfsys@endscope{}{}{}\hss}\pgfsys@discardpath\pgfsys@invoke{ }\pgfsys@endscope\hss}}\endpgfpicture}}\par}$ |
| ✓ | ✓ | ✓ | ✗ | - | 76.92 (↓ 23.08) | 100 (0.00) | 100 (0.00) |
Hierarchical Structure: Human problem solving often involves hierarchical reasoning, where lower-level functions such as retrieval and comprehension support higher-level inference and decision-making. The CogQA dataset captures this structure through subquestions that progress from simple information extraction to complex reasoning. We test whether LLMs reflect this hierarchy by masking attention heads tied to early-stage functions and measuring the effect on later tasks. For instance, to assess how Retrieval affects Math Calculation, we suppress Retrieval-related heads throughout the subquestions. Answers from the earlier Retrieval steps serve as priors for the later math reasoning, allowing us to observe how disrupting low-level functions propagates along the chain and impairs higher-level reasoning. As Table 3 shows, masking retrieval or knowledge-recall heads causes significant performance drops in subsequent decision-making steps, whereas masking syntactic-understanding heads has minimal impact. This provides evidence for an emergent hierarchical organization in LLMs, in which foundational cognitive functions underpin advanced reasoning.
### 4.4 Influence of Cognitive Heads on Downstream Tasks
In this section, we investigate how cognitive heads influence downstream tasks through both negative interventions (masking cognitive function heads) and positive interventions (shifting head activations toward specific functions). We conduct experiments on two tasks: a math task using 100 GSM8K samples (GSM8K_100) and a retrieval task with 49 samples from an Extractive_QA dataset. The Extractive_QA pairs are generated by GPT-4o, with answers extracted directly from the source paragraph.
Negative Intervention: We perform negative intervention by masking corresponding cognitive heads (Math Calculation heads for GSM8K_100 and Retrieval heads for Extractive_QA), effectively suppressing their activations. As shown in Table 4, this causes significant performance drops across models, confirming these heads’ functional roles. Notably, after masking, performance converges to a similarly low level across different LLMs, regardless of model size or original accuracy. This is expected, as the crucial cognitive heads responsible for specific functions are disabled, making it difficult for the model to arrive at correct answers.
For math, the remaining roughly 30% accuracy likely stems from two factors: (1) memorized answers in the base model, and (2) simple questions not requiring actual computation. For retrieval, masking Retrieval heads almost completely abolishes the model’s retrieval ability across all scales. This indicates that cognitive functions are indeed localized in a subset of heads, and masking them leads to a systematic degradation, irrespective of model capacity. The negative intervention example further shows that masking the Math Calculation heads leads to errors in arithmetic tasks, while retrieval and language functions remain largely unaffected. This confirms that these cognitive heads are crucial for specific functions and highlights the robustness and generalizability of our method.
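The masking step of the negative intervention can be sketched as follows. This is a minimal illustration with our own names (`mask_cognitive_heads`) and a numpy array standing in for a layer's concatenated attention output; the actual implementation operates on live model activations, e.g. via forward hooks:

```python
import numpy as np

# Negative intervention sketch: zero the slice of the concatenated
# attention output that belongs to the targeted cognitive heads.
# Names and the numpy stand-in are ours, not the released code.
def mask_cognitive_heads(attn_out, head_dim, heads_to_mask):
    """attn_out: (seq_len, n_heads * head_dim) concatenated head outputs."""
    out = attn_out.copy()
    for h in heads_to_mask:
        # suppress head h by zeroing its output dimensions
        out[:, h * head_dim:(h + 1) * head_dim] = 0.0
    return out

# toy check: 2 tokens, 2 heads of dimension 3; suppress head 1
x = np.arange(1, 13, dtype=float).reshape(2, 6)
masked = mask_cognitive_heads(x, head_dim=3, heads_to_mask=[1])
```

In practice the same zeroing would be registered as a hook on each layer containing identified cognitive heads, so that suppression applies at every generation step.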
Positive Intervention: We calculate the activation directions of different cognitive functions using the CogQA dataset. For each function, the activation direction of a head at layer $l$ and index $h$ is computed as:
$$
\operatorname{dir}_{l}^{h}=\mathbb{E}_{i\in\mathcal{D}_{\text{correct }}}\left[x_{l}^{h}(i)\right]-\mathbb{E}_{i\in\mathcal{D}_{\text{incorrect }}}\left[x_{l}^{h}(i)\right] \tag{4}
$$
where $x_{l}^{h}(i)$ denotes the activation of the head at layer $l$ and index $h$ for sample $i$, and $\mathcal{D}_{\text{correct}}$ and $\mathcal{D}_{\text{incorrect}}$ represent the sets of samples answered correctly and incorrectly, respectively. We then estimate the standard deviation $\sigma_{l}^{h}$ of activations along the cognitive function direction, and shift the original head activation as $x_{l}^{h}(i)\leftarrow x_{l}^{h}(i)+\alpha\sigma_{l}^{h}\operatorname{dir}_{l}^{h}$, where $\alpha$ is a scaling hyperparameter.
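Equation (4) and the subsequent activation shift can be sketched as below. This is a minimal numpy illustration under our own naming, assuming per-head activations have already been collected into arrays; the authors' released code may differ in detail:

```python
import numpy as np

def activation_direction(acts_correct, acts_incorrect):
    """Eq. (4): dir_l^h = E_correct[x_l^h] - E_incorrect[x_l^h].

    acts_*: (n_samples, d_head) activations of one head."""
    return acts_correct.mean(axis=0) - acts_incorrect.mean(axis=0)

def shift_activation(x, direction, acts_all, alpha=1.0):
    """Positive intervention: x <- x + alpha * sigma * dir, where sigma is
    the std of activations projected onto the (unit-normalized) direction."""
    unit = direction / np.linalg.norm(direction)
    sigma = float((acts_all @ unit).std())
    return x + alpha * sigma * direction

# toy check: two well-separated activation clusters
rng = np.random.default_rng(0)
c = rng.normal(1.0, 0.1, size=(50, 4))   # "correct" samples
w = rng.normal(-1.0, 0.1, size=(50, 4))  # "incorrect" samples
d = activation_direction(c, w)
y = shift_activation(np.zeros(4), d, np.vstack([c, w]), alpha=1.0)
```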
The experimental results in Table 4 show that enhancing the activation of retrieval heads along their corresponding functional directions leads to improved performance on the retrieval task. For the math task, questions often engage multiple cognitive functions. We observe cases where computations are correct but semantic understanding fails. By positively intervening on either Math Calculation or Semantic Understanding heads, performance consistently improves, highlighting the collaborative nature of cognitive functions in complex reasoning and the need for joint interventions—a direction we leave for future work.
The example below illustrates a positive intervention on Math Calculation heads in Qwen3-4B: the model initially produces an incorrect answer (red) but generates the correct one (blue) after intervention. This demonstrates that adjusting cognitive head activations along their functional directions reliably enhances performance on complex reasoning tasks.
Negative Intervention Case
Query: Question: Solve -42*r + 27*c = -1167 and 130*r + 4*c = 372 for r. Original Output: Okay, so I need to solve this system of equations: -42r + 27c = -1167 and 130r + 4c = 372 … First equation: -42*4 + 27*(-37) = -168 - 999 = -1167. Correct. Second equation: 130*4 + 4*(-37) = 520 - 148 = 372. Correct. Yes, that’s right. So everything checks out. So I think that’s the correct answer. Therefore, r is 4. Negative Intervention (Math) Output: … To eliminate $c$, I need the coefficients of $c$ in both equations to be the same (or negatives). The coefficients are 27 and 4. The least common multiple of 27 and 4 is 100? Wait, 27 is $3^{3}$, and 4 is $2^{2}$, so LCM is $2^{2}*3^{3}=\mathbf{4*27=100}$. So if I can make the coefficients of $c$ to be 100 and 100, or 27 and 4. Wait, but 27 and 4. So if I multiply the first equation by 4 and the second equation by 27, then the coefficients of $c$ would be 27*4 = 100 and 4*27 = 100. Then, if I subtract the equations, the $c$ terms would be 100 and 100, but since they are both positive, subtracting the equations would eliminate $c$. …
Positive Intervention Case
Query: On a school trip to the seashore, Alan and his friends collected shells. Alan collected four times as many shells as Ben did. Ben got a late start and only collected a third of what Laurie did. If Laurie collected 36 shells, how many did Alan collect? Original Output: Alan collected four times as many shells as Ben did, and Ben collected a third of what Laurie did. Since Laurie collected 36 shells, Ben collected 36 / 3 = 12 shells. Therefore, Alan collected 4 * 12 = 48 shells. However, the question asks for the number of shells Alan collected, and it is given that Alan collected four times as many shells as Ben did. Therefore, the correct answer is 4 * 12 = 48 * 4 / 3 = 72. Positive Intervention (Math) Output: …Since Ben collected a third of what Laurie did, he collected 36 / 3 = 12 shells. Alan collected four times as many shells as Ben did, so he collected 4 * 12 = 48 shells …The correct answer is 48
Table 4: The LLM performance on GSM8k_100 and Extractive_QA by Negative and Positive Intervention (Inter.). Math accuracy (%) is measured by exact answer match; Extractive_QA accuracy checks if the original paragraph answer appears in the response.
| Dataset | Method | Head | Llama3.1-8B | Llama3.2-3B | Qwen3-8B | Qwen3-4B |
| --- | --- | --- | --- | --- | --- | --- |
| Extractive_QA | Base | - | 57.14 | 36.73 | 57.14 | 51.02 |
| Extractive_QA | Negative Inter. | Retrieval | 0 | 0 | 14.29 | 12.24 |
| Extractive_QA | Positive Inter. | Retrieval | 63.26 | 44.90 | 61.22 | 69.38 |
| GSM8K_100 | Base | - | 82 | 64 | 94 | 91 |
| GSM8K_100 | Negative Inter. | Math | 38 | 34 | 34 | 37 |
| GSM8K_100 | Positive Inter. | Math | 84 | 66 | 94 | 92 |
| GSM8K_100 | Positive Inter. | Semantic | 84 | 65 | 94 | 93 |
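The Extractive_QA metric described in the Table 4 caption (a response counts as correct if the original paragraph answer appears in it) can be sketched as a substring test. The case and whitespace normalization below is our assumption, not stated in the paper:

```python
def extractive_qa_correct(response: str, gold_answer: str) -> bool:
    """Correct if the gold paragraph answer appears verbatim in the response
    (case-insensitive match is our assumption)."""
    return gold_answer.strip().lower() in response.lower()

def extractive_qa_accuracy(responses, gold_answers) -> float:
    """Accuracy (%) over paired responses and gold answers."""
    pairs = list(zip(responses, gold_answers))
    return 100.0 * sum(extractive_qa_correct(r, g) for r, g in pairs) / len(pairs)
```

GSM8K_100 accuracy, by contrast, is reported via exact answer match on the final numeric answer.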
## 5 Related Works
#### Neural Networks and the Brain
Neural networks have long been studied as computational models of the brain, with early work linking artificial neurons to the biological mechanisms of perception and learning McCullochP90. Convolutional neural networks (CNNs), in particular, have been shown to capture representations similar to those in the visual cortex yamins2014performance, and more recent studies suggest that the functional modularity observed in deep networks gives rise to brain-like specialization in vision tasks dobs2022brain. More recently, LLMs have exhibited striking parallels with human brain activity during language processing. In particular, transformer-based models, such as GPT-2, produce internal representations that align with neural responses in language-selective brain regions caucheteux2022deep; schrimpf2021neural. However, prior work mostly focuses on perception and language representations, with limited study of higher-level cognitive functions such as reasoning. We instead analyze LLMs’ behavior in complex reasoning tasks to explore their alignment with human cognitive functions and functional specialization.
#### Functional Specialization of Attention Heads
Recent years have witnessed growing interest in understanding the functional roles of attention heads in Transformer-based models, forming a core component of mechanistic interpretability research. Early work by clark2019does demonstrated that individual heads in BERT capture specific linguistic phenomena such as syntactic dependencies and coreference, indicating a degree of functional specialization. Building on this, voita2019analyzing proposed a pruning-based approach to identify important heads by measuring their contribution to downstream performance, showing that many heads are redundant. Subsequent studies extended this analysis to decoder-only large language models (LLMs). michel2019sixteen explored functional decomposition in such models, leading to the identification of distinct attention heads responsible for tasks such as pattern induction (induction), truthfulness (truthful), information retrieval (wu2404retrieval), and safety alignment (safety). For a broader survey, see zheng2409attention. Despite these advances, most prior work focuses on isolated heads and evaluates them in relatively simple or synthetic tasks. In contrast, we investigate functionally specialized heads under more complex reasoning settings by aligning attention head behavior with human cognitive functions.
## 6 Limitations and Future Work
While our study provides an initial framework for analyzing the cognitive functions of attention heads, several limitations remain. First, we focus on eight predefined cognitive functions, which, though representative, may not capture the full spectrum of LLM capabilities; future work could extend this taxonomy with finer-grained or emergent functions. Second, each subquestion in CogQA is annotated with a single cognitive function, though real reasoning may engage multiple functions. Similarly, we assume one head corresponds to one function, while in practice a head may support multiple functions, vary with context, or reflect hierarchical compositions. These complexities are not fully addressed in our current framework. Excluding subquestions with incorrect subanswers could improve multi-class probing, and further investigation is needed to understand heads serving multiple functions. Finally, our work emphasizes analysis over application, but identifying cognitively relevant heads could inform model design, including dynamic head activation, improved chain-of-thought prompting, targeted fine-tuning, or modular architectures; we leave these directions for future exploration.
## 7 Conclusions
We propose an interpretability framework that connects attention heads in large language models (LLMs) to human cognitive functions involved in reasoning. To support this, we introduce CogQA, a cognitively grounded dataset, along with a multi-class classification approach to identify specialized heads associated with specific reasoning tasks. Our analysis across multiple LLM families and scales demonstrates that attention heads exhibit universality, sparsity, intrinsic roles, and dynamic, hierarchical organization. These findings indicate that LLMs internally organize reasoning processes in a manner akin to human cognition, laying the groundwork for more interpretable and cognitively informed language models.
## Acknowledgements
This work is partially supported by the following Australian Research Council (ARC) projects: FT220100318, DP220102121, LP220100527, LP220200949, DP230101534.
## NeurIPS Paper Checklist
1. Claims
1. Question: Do the main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope?
1. Answer: [Yes]
1. Justification: Yes, the main claims in the abstract and introduction accurately reflect the paper’s contributions and scope.
1. Guidelines:
- The answer NA means that the abstract and introduction do not include the claims made in the paper.
- The abstract and/or introduction should clearly state the claims made, including the contributions made in the paper and important assumptions and limitations. A No or NA answer to this question will not be perceived well by the reviewers.
- The claims made should match theoretical and experimental results, and reflect how much the results can be expected to generalize to other settings.
- It is fine to include aspirational goals as motivation as long as it is clear that these goals are not attained by the paper.
1. Limitations
1. Question: Does the paper discuss the limitations of the work performed by the authors?
1. Answer: [Yes]
1. Justification: We discuss the limitations of our work in the Limitations section (Section 6).
1. Guidelines:
- The answer NA means that the paper has no limitation while the answer No means that the paper has limitations, but those are not discussed in the paper.
- The authors are encouraged to create a separate "Limitations" section in their paper.
- The paper should point out any strong assumptions and how robust the results are to violations of these assumptions (e.g., independence assumptions, noiseless settings, model well-specification, asymptotic approximations only holding locally). The authors should reflect on how these assumptions might be violated in practice and what the implications would be.
- The authors should reflect on the scope of the claims made, e.g., if the approach was only tested on a few datasets or with a few runs. In general, empirical results often depend on implicit assumptions, which should be articulated.
- The authors should reflect on the factors that influence the performance of the approach. For example, a facial recognition algorithm may perform poorly when image resolution is low or images are taken in low lighting. Or a speech-to-text system might not be used reliably to provide closed captions for online lectures because it fails to handle technical jargon.
- The authors should discuss the computational efficiency of the proposed algorithms and how they scale with dataset size.
- If applicable, the authors should discuss possible limitations of their approach to address problems of privacy and fairness.
- While the authors might fear that complete honesty about limitations might be used by reviewers as grounds for rejection, a worse outcome might be that reviewers discover limitations that aren’t acknowledged in the paper. The authors should use their best judgment and recognize that individual actions in favor of transparency play an important role in developing norms that preserve the integrity of the community. Reviewers will be specifically instructed to not penalize honesty concerning limitations.
1. Theory assumptions and proofs
1. Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof?
1. Answer: [N/A]
1. Justification: The paper does not include theoretical results.
1. Guidelines:
- The answer NA means that the paper does not include theoretical results.
- All the theorems, formulas, and proofs in the paper should be numbered and cross-referenced.
- All assumptions should be clearly stated or referenced in the statement of any theorems.
- The proofs can either appear in the main paper or the supplemental material, but if they appear in the supplemental material, the authors are encouraged to provide a short proof sketch to provide intuition.
- Inversely, any informal proof provided in the core of the paper should be complemented by formal proofs provided in appendix or supplemental material.
- Theorems and Lemmas that the proof relies upon should be properly referenced.
1. Experimental result reproducibility
1. Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)?
1. Answer: [Yes]
1. Justification: Yes. The paper provides sufficient information to reproduce the main experimental results. We release the dataset and describe the experimental setup, intervention methods, model training procedures, and evaluation metrics in detail in the main paper, ensuring transparency and reproducibility of the core findings.
1. Guidelines:
- The answer NA means that the paper does not include experiments.
- If the paper includes experiments, a No answer to this question will not be perceived well by the reviewers: Making the paper reproducible is important, regardless of whether the code and data are provided or not.
- If the contribution is a dataset and/or model, the authors should describe the steps taken to make their results reproducible or verifiable.
- Depending on the contribution, reproducibility can be accomplished in various ways. For example, if the contribution is a novel architecture, describing the architecture fully might suffice, or if the contribution is a specific model and empirical evaluation, it may be necessary to either make it possible for others to replicate the model with the same dataset, or provide access to the model. In general. releasing code and data is often one good way to accomplish this, but reproducibility can also be provided via detailed instructions for how to replicate the results, access to a hosted model (e.g., in the case of a large language model), releasing of a model checkpoint, or other means that are appropriate to the research performed.
- While NeurIPS does not require releasing code, the conference does require all submissions to provide some reasonable avenue for reproducibility, which may depend on the nature of the contribution. For example
1. If the contribution is primarily a new algorithm, the paper should make it clear how to reproduce that algorithm.
1. If the contribution is primarily a new model architecture, the paper should describe the architecture clearly and fully.
1. If the contribution is a new model (e.g., a large language model), then there should either be a way to access this model for reproducing the results or a way to reproduce the model (e.g., with an open-source dataset or instructions for how to construct the dataset).
1. We recognize that reproducibility may be tricky in some cases, in which case authors are welcome to describe the particular way they provide for reproducibility. In the case of closed-source models, it may be that access to the model is limited in some way (e.g., to registered users), but it should be possible for other researchers to have some path to reproducing or verifying the results.
1. Open access to data and code
1. Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?
1. Answer: [Yes]
1. Justification: We have released the complete GitHub repository with dataset and code.
1. Guidelines:
- The answer NA means that paper does not include experiments requiring code.
- Please see the NeurIPS code and data submission guidelines (https://nips.cc/public/guides/CodeSubmissionPolicy) for more details.
- While we encourage the release of code and data, we understand that this might not be possible, so “No” is an acceptable answer. Papers cannot be rejected simply for not including code, unless this is central to the contribution (e.g., for a new open-source benchmark).
- The instructions should contain the exact command and environment needed to run to reproduce the results. See the NeurIPS code and data submission guidelines (https://nips.cc/public/guides/CodeSubmissionPolicy) for more details.
- The authors should provide instructions on data access and preparation, including how to access the raw data, preprocessed data, intermediate data, and generated data, etc.
- The authors should provide scripts to reproduce all experimental results for the new proposed method and baselines. If only a subset of experiments are reproducible, they should state which ones are omitted from the script and why.
- At submission time, to preserve anonymity, the authors should release anonymized versions (if applicable).
- Providing as much information as possible in supplemental material (appended to the paper) is recommended, but including URLs to data and code is permitted.
1. Experimental setting/details
1. Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results?
1. Answer: [Yes]
1. Justification: Yes, we detail how the data were selected, how the dataset was constructed, and the training configuration used.
1. Guidelines:
- The answer NA means that the paper does not include experiments.
- The experimental setting should be presented in the core of the paper to a level of detail that is necessary to appreciate the results and make sense of them.
- The full details can be provided either with the code, in appendix, or as supplemental material.
1. Experiment statistical significance
1. Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments?
1. Answer: [No]
1. Justification: We believe our experiments do not require error bars.
1. Guidelines:
- The answer NA means that the paper does not include experiments.
- The authors should answer "Yes" if the results are accompanied by error bars, confidence intervals, or statistical significance tests, at least for the experiments that support the main claims of the paper.
- The factors of variability that the error bars are capturing should be clearly stated (for example, train/test split, initialization, random drawing of some parameter, or overall run with given experimental conditions).
- The method for calculating the error bars should be explained (closed form formula, call to a library function, bootstrap, etc.)
- The assumptions made should be given (e.g., Normally distributed errors).
- It should be clear whether the error bar is the standard deviation or the standard error of the mean.
- It is OK to report 1-sigma error bars, but one should state it. The authors should preferably report a 2-sigma error bar than state that they have a 96% CI, if the hypothesis of Normality of errors is not verified.
- For asymmetric distributions, the authors should be careful not to show in tables or figures symmetric error bars that would yield results that are out of range (e.g. negative error rates).
- If error bars are reported in tables or plots, The authors should explain in the text how they were calculated and reference the corresponding figures or tables in the text.
1. Experiments compute resources
1. Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments?
1. Answer: [No]
1. Justification: Our experiments involve inference only, so compute resources are not a limiting factor for any of our experiments.
1. Guidelines:
- The answer NA means that the paper does not include experiments.
- The paper should indicate the type of compute workers CPU or GPU, internal cluster, or cloud provider, including relevant memory and storage.
- The paper should provide the amount of compute required for each of the individual experimental runs as well as estimate the total compute.
- The paper should disclose whether the full research project required more compute than the experiments reported in the paper (e.g., preliminary or failed experiments that didn’t make it into the paper).
1. Code of ethics
1. Question: Does the research conducted in the paper conform, in every respect, with the NeurIPS Code of Ethics https://neurips.cc/public/EthicsGuidelines?
1. Answer: [Yes]
1. Justification: We do not have any Code of Ethics issues in this paper.
1. Guidelines:
- The answer NA means that the authors have not reviewed the NeurIPS Code of Ethics.
- If the authors answer No, they should explain the special circumstances that require a deviation from the Code of Ethics.
- The authors should make sure to preserve anonymity (e.g., if there is a special consideration due to laws or regulations in their jurisdiction).
1. Broader impacts
1. Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed?
1. Answer: [N/A]
1. Justification: We believe societal impact considerations are not directly related to our work.
1. Guidelines:
- The answer NA means that there is no societal impact of the work performed.
- If the authors answer NA or No, they should explain why their work has no societal impact or why the paper does not address societal impact.
- Examples of negative societal impacts include potential malicious or unintended uses (e.g., disinformation, generating fake profiles, surveillance), fairness considerations (e.g., deployment of technologies that could make decisions that unfairly impact specific groups), privacy considerations, and security considerations.
- The conference expects that many papers will be foundational research and not tied to particular applications, let alone deployments. However, if there is a direct path to any negative applications, the authors should point it out. For example, it is legitimate to point out that an improvement in the quality of generative models could be used to generate deepfakes for disinformation. On the other hand, it is not needed to point out that a generic algorithm for optimizing neural networks could enable people to train models that generate Deepfakes faster.
- The authors should consider possible harms that could arise when the technology is being used as intended and functioning correctly, harms that could arise when the technology is being used as intended but gives incorrect results, and harms following from (intentional or unintentional) misuse of the technology.
- If there are negative societal impacts, the authors could also discuss possible mitigation strategies (e.g., gated release of models, providing defenses in addition to attacks, mechanisms for monitoring misuse, mechanisms to monitor how a system learns from feedback over time, improving the efficiency and accessibility of ML).
1. Safeguards
1. Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)?
1. Answer: [N/A]
1. Justification: All data used in this paper come from published, publicly available datasets.
1. Guidelines:
- The answer NA means that the paper poses no such risks.
- Released models that have a high risk for misuse or dual-use should be released with necessary safeguards to allow for controlled use of the model, for example by requiring that users adhere to usage guidelines or restrictions to access the model or implementing safety filters.
- Datasets that have been scraped from the Internet could pose safety risks. The authors should describe how they avoided releasing unsafe images.
- We recognize that providing effective safeguards is challenging, and many papers do not require this, but we encourage authors to take this into account and make a best faith effort.
1. Licenses for existing assets
1. Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected?
1. Answer: [Yes]
1. Justification: We cite all models and datasets used in this paper.
1. Guidelines:
- The answer NA means that the paper does not use existing assets.
- The authors should cite the original paper that produced the code package or dataset.
- The authors should state which version of the asset is used and, if possible, include a URL.
- The name of the license (e.g., CC-BY 4.0) should be included for each asset.
- For scraped data from a particular source (e.g., website), the copyright and terms of service of that source should be provided.
- If assets are released, the license, copyright information, and terms of use in the package should be provided. For popular datasets, paperswithcode.com/datasets has curated licenses for some datasets. Their licensing guide can help determine the license of a dataset.
- For existing datasets that are re-packaged, both the original license and the license of the derived asset (if it has changed) should be provided.
- If this information is not available online, the authors are encouraged to reach out to the asset’s creators.
1. New assets
1. Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets?
1. Answer: [Yes]
1. Justification: We introduce a new annotated dataset to support cognitive function analysis, which is submitted in the supplementary materials. We also provide detailed documentation describing its construction process, structure, and usage guidelines in the paper.
1. Guidelines:
- The answer NA means that the paper does not release new assets.
- Researchers should communicate the details of the dataset/code/model as part of their submissions via structured templates. This includes details about training, license, limitations, etc.
- The paper should discuss whether and how consent was obtained from people whose asset is used.
- At submission time, remember to anonymize your assets (if applicable). You can either create an anonymized URL or include an anonymized zip file.
1. Crowdsourcing and research with human subjects
1. Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)?
1. Answer: [N/A]
1. Justification: Not related; this paper involves no crowdsourcing or research with human subjects.
1. Guidelines:
- The answer NA means that the paper does not involve crowdsourcing nor research with human subjects.
- Including this information in the supplemental material is fine, but if the main contribution of the paper involves human subjects, then as much detail as possible should be included in the main paper.
- According to the NeurIPS Code of Ethics, workers involved in data collection, curation, or other labor should be paid at least the minimum wage in the country of the data collector.
1. Institutional review board (IRB) approvals or equivalent for research with human subjects
1. Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or institution) were obtained?
1. Answer: [N/A]
1. Justification: Not related; this paper involves no research with human subjects.
1. Guidelines:
- The answer NA means that the paper does not involve crowdsourcing nor research with human subjects.
- Depending on the country in which research is conducted, IRB approval (or equivalent) may be required for any human subjects research. If you obtained IRB approval, you should clearly state this in the paper.
- We recognize that the procedures for this may vary significantly between institutions and locations, and we expect authors to adhere to the NeurIPS Code of Ethics and the guidelines for their institution.
- For initial submissions, do not include any information that would break anonymity (if applicable), such as the institution conducting the review.
1. Declaration of LLM usage
1. Question: Does the paper describe the usage of LLMs if it is an important, original, or non-standard component of the core methods in this research? Note that if the LLM is used only for writing, editing, or formatting purposes and does not impact the core methodology, scientific rigorousness, or originality of the research, declaration is not required.
1. Answer: [Yes]
1. Justification: We used LLMs as an integral part of our dataset construction process. Specifically, LLMs were used to generate intermediate reasoning steps and candidate answers under controlled prompting. All prompts, generation procedures, and filtering steps are clearly documented in the main paper and supplementary materials to ensure transparency and reproducibility.
1. Guidelines:
- The answer NA means that the core method development in this research does not involve LLMs as any important, original, or non-standard components.
- Please refer to our LLM policy (https://neurips.cc/Conferences/2025/LLM) for what should or should not be described.
## Appendix A Appendix
### A.1 The cognitive function distribution of other models
We present the heatmaps for the remaining five models in this section. The results reveal a notable universality in the sparsity patterns of attention heads across different architectures. Moreover, models within the same family tend to exhibit similar sparsity distributions. For instance, Llama3.2-3B (Figure 5) and Llama3.1-8B (Figure 2) share comparable patterns, as do Qwen3-4B (Figure 7) and Qwen3-8B (Figure 6), as well as Yi-1.5-6B (Figure 9) and Yi-1.5-9B (Figure 8). This consistency is likely due to the shared architectural design and similar pretraining data within each model family.
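To make the sparsity pattern discussed above concrete, the sketch below thresholds a mock per-function importance matrix and reports how few heads survive for each cognitive function. The grid size, threshold, and random importance maps are hypothetical stand-ins for the probed scores, not the paper's actual data:

```python
import numpy as np

rng = np.random.default_rng(42)
functions = ["Knowledge Recall", "Retrieval", "Logical Reasoning",
             "Decision-making", "Semantic Understanding",
             "Syntactic Understanding", "Inference", "Math Calculation"]
L, H = 28, 24  # hypothetical (layers, heads) grid

# Mock importance maps: mostly near-zero, with a few strong heads per function.
importance = {
    f: np.where(rng.random((L, H)) < 0.03,          # ~3% of heads are "strong"
                rng.uniform(0.002, 0.005, (L, H)),  # strong-head scores
                rng.uniform(0.0, 0.0005, (L, H)))   # background scores
    for f in functions
}

def cognitive_heads(imp, threshold=0.001):
    """Return (layer, head) index pairs whose importance exceeds the threshold."""
    return np.argwhere(imp > threshold)

for f in functions:
    heads = cognitive_heads(importance[f])
    sparsity = 1 - len(heads) / (L * H)
    print(f"{f}: {len(heads)} cognitive heads ({sparsity:.1%} of heads inactive)")
```

Under these mock scores, each function keeps only a few percent of the grid, mirroring the near-empty heatmaps shown in the figures below.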
<details>
<summary>x7.png Details</summary>

### Visual Description
## Multi-Panel Heatmap Grid: Heads Importance Across Cognitive Tasks
### Overview
The image displays a grid of eight heatmaps, arranged in two rows of four. Each heatmap visualizes the "Heads Importance" for a specific cognitive task within what appears to be a neural network model. The x-axis represents the "Head" index (0-18), and the y-axis represents the "Layer" index (0-24). A color bar on the right provides the scale for "Heads Importance," ranging from 0.0000 (dark purple) to 0.0050+ (bright yellow).
### Components/Axes
* **Main Title/Labels:** None present as a single overarching title. Each subplot has its own title.
* **Y-Axis Label (Common to all):** "Layer" (vertical text on the far left).
* **Y-Axis Scale (Common to all):** Linear scale from 0 to 24, with major ticks at 0, 6, 12, 18, 24.
* **X-Axis Label (Common to all):** "Head" (horizontal text at the bottom center).
* **X-Axis Scale (Common to all):** Linear scale from 0 to 18, with major ticks at 0, 6, 12, 18.
* **Color Bar Legend (Right side):**
* **Title:** "Heads Importance"
* **Scale:** Continuous gradient from dark purple (0.0000) through teal and green to bright yellow (0.0050+).
* **Tick Values:** 0.0000, 0.0010, 0.0020, 0.0030, 0.0040, 0.0050+.
* **Subplot Titles (Top of each panel, left to right, top to bottom):**
1. Knowledge Recall
2. Retrieval
3. Logical Reasoning
4. Decision-making
5. Semantic Understanding
6. Syntactic Understanding
7. Inference
8. Math Calculation
### Detailed Analysis
Each heatmap is a 25 (layers) x 19 (heads) grid of colored cells. The color intensity represents the importance value for a specific head at a specific layer for the given task.
**Panel-by-Panel Description (Approximate Patterns):**
1. **Knowledge Recall:** Shows scattered medium-intensity (teal/green) spots, with a few brighter (yellow-green) points concentrated in the mid-layers (approx. layers 12-18). No single dominant head.
2. **Retrieval:** Similar to Knowledge Recall but with slightly more defined clusters. A notable bright spot (yellow) appears around Head 6, Layer 12.
3. **Logical Reasoning:** Features a few distinct bright spots. One prominent yellow point is near Head 12, Layer 12. Another cluster of medium-high importance is visible in the lower layers (approx. 18-24).
4. **Decision-making:** Displays a more dispersed pattern of medium importance. Several bright yellow spots are present, notably around Head 18, Layer 12 and Head 12, Layer 6.
5. **Semantic Understanding:** Appears relatively uniform with low-to-medium importance (mostly dark purple to teal). A few slightly brighter spots are scattered, with no strong concentration.
6. **Syntactic Understanding:** Shows a distinct cluster of high importance (bright yellow) in the mid-layers, centered around Head 6-12, Layer 12-18. This is one of the most concentrated patterns.
7. **Inference:** Has a moderate, scattered pattern. A few brighter points are visible, such as near Head 0, Layer 12.
8. **Math Calculation:** Exhibits a very distinct pattern. High importance (bright yellow) is concentrated in the lower layers (approx. layers 18-24), particularly around Heads 0-6 and Head 18. The upper layers are predominantly low importance (dark purple).
### Key Observations
* **Task-Specific Specialization:** Different cognitive tasks activate distinct patterns of head importance across the network layers.
* **Layer Specialization:** Some tasks show importance concentrated in specific layer bands:
* **Mid-Layers (12-18):** Syntactic Understanding, Retrieval, Knowledge Recall.
* **Later Layers (18-24, lower on the plot):** Math Calculation, Logical Reasoning.
* **Head Specificity:** Certain heads appear highly important for multiple tasks (e.g., Head 12, Layer 12 is bright in Logical Reasoning and Decision-making), while others are task-unique.
* **Pattern Variance:** "Semantic Understanding" shows the most diffuse, low-contrast pattern, suggesting less localized head importance. "Math Calculation" and "Syntactic Understanding" show the most localized, high-contrast patterns.
* **Outlier:** The "Math Calculation" heatmap is an outlier due to its strong, low-layer concentration, contrasting with the more mid-layer focus of language-oriented tasks.
### Interpretation
This visualization likely represents an analysis of a multi-layer, multi-head transformer model (like an LLM), probing which specific "attention heads" are most important for performing different cognitive functions. The "Heads Importance" metric could be derived from methods like integrated gradients, attention rollout, or ablation studies.
The data suggests a **functional specialization within the network**:
* **Later Layers (18-24):** Appear crucial for **structured, formal reasoning** such as mathematical calculation and logical reasoning, consistent with deeper layers composing earlier representations into higher-order computations.
* **Mid-Layers (12-18):** Are heavily involved in **language-specific processing**, including syntactic understanding, retrieval of knowledge, and general recall. This is the core "semantic processing" zone.
* **Early Layers (0-6):** Show less concentrated importance in these tasks, possibly indicating they handle low-level lexical and positional features shared across tasks rather than function-specific processing.
The stark difference in the "Math Calculation" pattern implies that mathematical reasoning relies on a fundamentally different computational pathway or set of features within the model compared to linguistic tasks. The diffuse pattern for "Semantic Understanding" might indicate that this capability is more distributed across the network rather than being localized to specific heads.
**In summary, the heatmap grid provides evidence for a hierarchical and modular organization of cognitive functions within the neural network, with clear anatomical (layer) and unit (head) specialization for different types of tasks.**
</details>
Figure 5: The existence of cognitive heads in Llama3.2-3B-instruct responsible for eight distinct functions in complex reasoning tasks. The x-axis represents the head index, while the y-axis indicates the layer index.
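The description above speculates that "Heads Importance" could be derived from methods such as ablation studies. A minimal toy sketch of ablation-based importance follows, using a hypothetical linear scorer in place of a real LLM; under this linearity assumption the score drop from removing a single head exactly recovers that head's contribution:

```python
import numpy as np

rng = np.random.default_rng(0)
L, H = 25, 19  # layers x heads, matching the Llama3.2-3B-style grid above

# Hypothetical "model": task score is a weighted sum over per-head contributions,
# with only a sparse subset of heads carrying nonzero weight.
true_weights = np.where(rng.random((L, H)) < 0.05, rng.random((L, H)), 0.0)

def task_score(head_mask):
    # Score when only unmasked heads contribute (1 = active, 0 = ablated).
    return float((true_weights * head_mask).sum())

baseline = task_score(np.ones((L, H)))

# Ablation-based importance: baseline score minus score with one head removed.
importance = np.zeros((L, H))
for l in range(L):
    for h in range(H):
        mask = np.ones((L, H))
        mask[l, h] = 0.0
        importance[l, h] = baseline - task_score(mask)
```

With a real model the scorer would be task accuracy (or log-likelihood) with the head's output zeroed, and the recovered map would only approximate each head's marginal contribution.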
<details>
<summary>x8.png Details</summary>

### Visual Description
## Heatmap Grid: Attention Head Importance Across Cognitive Tasks
### Overview
The image displays a grid of eight heatmaps arranged in two rows and four columns. Each heatmap visualizes the "Heads Importance" (likely a measure of attention head contribution or activation strength) across different layers and heads of a neural network model (presumably a transformer) for eight distinct cognitive tasks. The overall color scheme uses a purple-to-yellow gradient, where darker purple indicates lower importance (near 0.0000) and bright yellow indicates higher importance (0.0040+).
### Components/Axes
* **Grid Structure:** 2 rows x 4 columns of individual heatmaps.
* **Individual Heatmap Axes:**
* **X-axis (Horizontal):** Labeled "Head". Markers are at intervals of 6, ranging from 0 to 30. This represents the index of attention heads within a layer.
* **Y-axis (Vertical):** Labeled "Layer". Markers are at intervals of 6, ranging from 0 to 30. This represents the depth/layer number in the model.
* **Color Scale/Legend:** Located on the far right of the image. It is a vertical color bar titled "Heads Importance".
* **Scale:** Ranges from 0.0000 (dark purple) to 0.0040+ (bright yellow).
* **Key Values:** 0.0000, 0.0005, 0.0010, 0.0015, 0.0020, 0.0025, 0.0030, 0.0035, 0.0040+.
* **Heatmap Titles (Cognitive Tasks):**
* **Top Row (Left to Right):** "Knowledge Recall", "Retrieval", "Logical Reasoning", "Decision-making".
* **Bottom Row (Left to Right):** "Semantic Understanding", "Syntactic Understanding", "Inference", "Math Calculation".
### Detailed Analysis
Each heatmap is a 31x31 grid (Layers 0-30, Heads 0-30). The analysis below describes the visual trend (distribution of brighter, higher-importance cells) for each task.
1. **Knowledge Recall (Top-Left):**
* **Trend:** Scattered, low-to-moderate importance across many layers and heads. No single dominant cluster.
* **Notable Points:** Slightly brighter spots (approx. 0.0020-0.0025) appear sporadically, for example, around (Layer ~24, Head ~18) and (Layer ~30, Head ~24).
2. **Retrieval (Top-Second from Left):**
* **Trend:** Similar to Knowledge Recall but with a few more distinct, brighter points.
* **Notable Points:** A relatively bright point (approx. 0.0030) is visible near (Layer ~22, Head ~2). Another cluster of moderate importance (0.0015-0.0020) appears in the mid-layers (12-18) across various heads.
3. **Logical Reasoning (Top-Third from Left):**
* **Trend:** Very sparse activation. Most of the map is dark purple (0.0000-0.0005).
* **Notable Points:** A single, isolated bright yellow point (0.0040+) is located at approximately (Layer ~18, Head ~28). A few other faint points exist.
4. **Decision-making (Top-Right):**
* **Trend:** Moderate, scattered activation with a slight concentration in the lower-right quadrant (higher layers, higher head indices).
* **Notable Points:** Several points in the range of 0.0020-0.0030 are visible, particularly between Layers 18-30 and Heads 18-30.
5. **Semantic Understanding (Bottom-Left):**
* **Trend:** Diffuse, low-level activation across the entire grid. Very few points exceed moderate importance.
* **Notable Points:** The brightest spots (approx. 0.0015-0.0020) are scattered, with a minor cluster in the lower layers (0-12).
6. **Syntactic Understanding (Bottom-Second from Left):**
* **Trend:** Shows more structure than Semantic Understanding. There is a visible band of slightly elevated importance (0.0010-0.0020) running horizontally across the mid-layers (approximately Layers 12-24).
* **Notable Points:** A few brighter points (approx. 0.0025) are embedded within this band, e.g., near (Layer ~20, Head ~15).
7. **Inference (Bottom-Third from Left):**
* **Trend:** Extremely sparse, similar to Logical Reasoning. The vast majority of cells are at the lowest importance level.
* **Notable Points:** Only a handful of cells show any discernible color above dark purple, with none appearing to reach the high-importance yellow range.
8. **Math Calculation (Bottom-Right):**
* **Trend:** Shows the most distinct and concentrated pattern. There is a clear cluster of high-importance (bright yellow, 0.0040+) cells.
* **Notable Points:** This cluster is located in the **lower layers (approximately 0-12)** and spans **middle heads (approximately 12-24)**. This is the most visually striking pattern in the entire grid.
### Key Observations
* **Task-Specific Activation:** Different cognitive tasks activate distinct patterns of attention heads. "Math Calculation" has a highly localized, strong activation pattern, while "Inference" and "Logical Reasoning" are extremely sparse.
* **Layer Specialization:** For "Math Calculation," important heads are concentrated in early layers. For tasks like "Syntactic Understanding," importance is more distributed across mid-layers.
* **Sparsity:** Many tasks, especially "Inference" and "Logical Reasoning," show that only a very small subset of attention heads (often just one or two) are deemed highly important for that specific function.
* **Color Scale Consistency:** The color bar is applied uniformly across all eight heatmaps, allowing for direct comparison of importance values between tasks.
### Interpretation
This visualization provides a "functional map" of a neural network's attention mechanism. It suggests that the model develops specialized sub-components (specific heads in specific layers) for different types of cognitive processing.
* **Math Calculation's** concentrated early-layer activation implies that numerical processing might be a foundational, low-level operation in this model, handled by a dedicated set of heads soon after input embedding.
* The **sparsity** in "Logical Reasoning" and "Inference" is notable. It could indicate that these complex tasks rely on the precise, coordinated action of just a few critical heads, or that the importance metric used here is not capturing the distributed nature of these processes effectively.
* The contrast between **"Semantic Understanding"** (diffuse) and **"Syntactic Understanding"** (more structured band) suggests the model may process grammatical structure in a more localized, layer-specific manner than broad semantic meaning.
* **Limitations/Uncertainty:** The exact numerical values are approximate, inferred from the color scale. The interpretation assumes "Heads Importance" is a meaningful, comparable metric across tasks. The bright cluster in "Math Calculation" is indeed in the early layers (0-12) and middle heads, as described.
</details>
Figure 6: The existence of cognitive heads in Qwen3-8B responsible for eight distinct functions in complex reasoning tasks. The x-axis represents the head index, while the y-axis indicates the layer index.
<details>
<summary>x9.png Details</summary>

### Visual Description
## Heatmap Grid: Head Importance Across Cognitive Tasks
### Overview
The image displays a grid of eight heatmaps arranged in a 2x4 layout. Each heatmap visualizes the "Heads Importance" (likely attention head importance scores) across different layers and heads of a neural network model for a specific cognitive task. The overall purpose is to show which attention heads in which layers are most important for different types of reasoning and understanding tasks.
### Components/Axes
* **Grid Structure:** 8 individual heatmaps in 2 rows and 4 columns.
* **Task Titles (Top of each heatmap):**
* Top Row (Left to Right): `Knowledge Recall`, `Retrieval`, `Logical Reasoning`, `Decision-making`
* Bottom Row (Left to Right): `Semantic Understanding`, `Syntactic Understanding`, `Inference`, `Math Calculation`
* **Axes (Common to all heatmaps):**
* **Y-axis (Vertical):** Labeled `Layer`. Scale runs from 0 at the top to 30 at the bottom, with major ticks at 0, 6, 12, 18, 24, 30.
* **X-axis (Horizontal):** Labeled `Head`. Scale runs from 0 on the left to 30 on the right, with major ticks at 0, 6, 12, 18, 24, 30.
* **Color Bar (Right side of the grid):**
* **Label:** `Heads Importance`
* **Scale:** A vertical gradient from dark purple (bottom) to bright yellow (top).
* **Tick Values (Approximate):** 0.0000, 0.0005, 0.0010, 0.0015, 0.0020, 0.0025, 0.0030, 0.0035, 0.0040+.
* **Interpretation:** Darker colors (purple/blue) indicate low importance. Brighter colors (green/yellow) indicate higher importance. The highest value is marked as `0.0040+`.
### Detailed Analysis
Each heatmap is a 31x31 grid (Layers 0-30, Heads 0-30). The color of each cell represents the importance score for that specific head in that specific layer for the given task.
**1. Knowledge Recall:**
* **Trend:** Importance is relatively diffuse and low across most of the grid. There is a faint, scattered pattern of slightly higher importance (teal/green) in the middle layers (approx. layers 12-24) across various heads. No single head or layer shows very high (yellow) importance.
**2. Retrieval:**
* **Trend:** Shows more distinct clusters of high importance compared to Knowledge Recall.
* **Key Data Points:** Several bright yellow/green spots are visible, indicating high importance.
* A notable cluster exists around **Layer 18-24, Head 6-12**.
* Another cluster appears around **Layer 24-27, Head 12-18**.
* Scattered high-importance points are also present in lower layers (e.g., near Layer 6, Head 18).
**3. Logical Reasoning:**
* **Trend:** Very sparse and low importance overall. The heatmap is predominantly dark purple/blue. A few isolated, faintly brighter (teal) points are scattered, but no strong clusters or high-importance (yellow) heads are evident.
**4. Decision-making:**
* **Trend:** Similar sparsity to Logical Reasoning, but with a few more noticeable points of medium importance.
* **Key Data Points:** Isolated brighter spots (green) appear, for example, near **Layer 24, Head 6** and **Layer 27, Head 18**. No strong yellow clusters.
**5. Semantic Understanding:**
* **Trend:** Very low and diffuse importance, closely resembling the pattern of Logical Reasoning. The grid is almost uniformly dark with minimal variation.
**6. Syntactic Understanding:**
* **Trend:** This heatmap shows the most pronounced and structured pattern of high importance.
* **Key Data Points:** A clear, dense cluster of high-importance (yellow) heads is located in the **lower-middle to lower layers**.
* The core of the cluster spans approximately **Layers 18-27** and **Heads 6-18**.
* The brightest yellow points (importance >0.0035) are concentrated within this region.
* A few high-importance points also appear at the very bottom layer (Layer 30).
**7. Inference:**
* **Trend:** Extremely sparse and low importance. This is the darkest heatmap, with almost no visible variation from the baseline dark purple. It suggests very few heads are specifically important for this task as measured.
**8. Math Calculation:**
* **Trend:** Shows a scattered, non-clustered pattern of medium to high importance.
* **Key Data Points:** High-importance (yellow/green) points are distributed across various layers and heads without forming a single dense cluster.
* Notable points include: **Layer 24, Head 0-3**; **Layer 27, Head 12**; **Layer 30, Head 6 and 24**.
* The distribution appears more random or distributed compared to the focused cluster in Syntactic Understanding.
### Key Observations
1. **Task-Specific Specialization:** The importance of attention heads is highly task-dependent. Patterns vary dramatically between tasks like Syntactic Understanding (dense cluster) and Inference (almost no signal).
2. **Layer Preference:** For tasks showing clear patterns (Retrieval, Syntactic Understanding, Math Calculation), important heads tend to be located in the **middle to lower layers** (approx. layers 12-30). The top layers (0-6) show very low importance across all tasks.
3. **Cluster vs. Scatter:** Syntactic Understanding and Retrieval show localized clusters of important heads. Math Calculation shows a more scattered distribution. Knowledge Recall, Logical Reasoning, Decision-making, Semantic Understanding, and Inference show diffuse or negligible patterns.
4. **Relative Importance:** The highest absolute importance scores (brightest yellow) are observed in the **Syntactic Understanding** and **Retrieval** heatmaps.
### Interpretation
This visualization provides a map of the model's internal "division of labor." It suggests that different cognitive tasks rely on distinct subsets of the model's attention mechanism, localized in specific layers and heads.
* **Syntactic Understanding** appears to be a highly specialized function, relying on a dedicated, concentrated module in the network's lower-middle layers. This aligns with linguistic theory where syntax is a foundational, rule-based processing stage.
* **Retrieval** also shows specialization, but the cluster is in a slightly different location, indicating a separate but possibly related circuit for accessing stored information.
* The **scattered pattern for Math Calculation** might indicate that mathematical reasoning is a more distributed process, engaging various capabilities across the network rather than a single dedicated "math module."
* The **near-absence of signal for Inference and Semantic Understanding** is a critical anomaly. This could mean: a) these tasks are so complex they don't rely on specific heads but on diffuse, global interactions not captured by this metric; b) the importance metric used is not sensitive to the type of processing these tasks require; or c) these capabilities are not well-developed or are represented differently in this model.
* The **general importance of middle and later layers** across active tasks suggests these layers handle more task-specific, syntactic, and retrieval-oriented processing, while the earliest layers (indices 0-6, at the top of each plot) may handle low-level lexical and positional processing that is common across tasks.
In summary, the image reveals a model with clear, task-dependent specialization in its attention heads, particularly for syntactic and retrieval tasks, while highlighting potential gaps or different representational strategies for higher-order reasoning like inference and deep semantic understanding.
</details>
Figure 7: The existence of cognitive heads in Qwen3-4B responsible for eight distinct functions in complex reasoning tasks. The x-axis represents the head index, while the y-axis indicates the layer index.
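One way to quantify the within-family similarity noted at the start of this appendix (e.g., Qwen3-4B vs. Qwen3-8B) is the Jaccard overlap between two models' top-k head sets. The sketch below uses mock importance matrices, with a perturbed copy standing in for a same-family sibling; all sizes and names are illustrative assumptions:

```python
import numpy as np

def top_k_heads(importance, k=20):
    """Set of flattened (layer * n_heads + head) indices of the k top-scoring heads."""
    return set(np.argsort(importance.ravel())[-k:])

def jaccard(a, b):
    """Overlap of two head sets: |intersection| / |union|."""
    return len(a & b) / len(a | b)

rng = np.random.default_rng(1)
base = rng.random((28, 24))                    # mock importance map, model A
sibling = base + 0.05 * rng.random((28, 24))   # "same family": perturbed copy
unrelated = rng.random((28, 24))               # "different family": independent

same = jaccard(top_k_heads(base), top_k_heads(sibling))
diff = jaccard(top_k_heads(base), top_k_heads(unrelated))
print(f"same-family overlap: {same:.2f}, cross-family overlap: {diff:.2f}")
```

On real importance maps, a markedly higher same-family overlap would support the claim that shared architecture and pretraining data yield similar cognitive-head distributions.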
<details>
<summary>x10.png Details</summary>

### Visual Description
## Heatmap Grid: AI Model Head Importance Across Cognitive Tasks
### Overview
The image displays a grid of eight heatmaps arranged in two rows and four columns. Each heatmap visualizes the "importance" of attention heads (x-axis) across different layers (y-axis) of a neural network model for a specific cognitive task. The overall purpose is to show which parts of the model (specific layer-head combinations) are most active or significant for different types of reasoning and understanding.
### Components/Axes
* **Grid Structure:** 2 rows x 4 columns of individual heatmaps.
* **Individual Heatmap Titles (Top Row, Left to Right):**
1. Knowledge Recall
2. Retrieval
3. Logical Reasoning
4. Decision-making
* **Individual Heatmap Titles (Bottom Row, Left to Right):**
1. Semantic Understanding
2. Syntactic Understanding
3. Inference
4. Math Calculation
* **Y-Axis (Common to all heatmaps):** Labeled "Layer". Scale runs from 0 at the top to 42 at the bottom, with major tick marks at 0, 6, 12, 18, 24, 30, 36, 42.
* **X-Axis (Common to all heatmaps):** Labeled "Head". Scale runs from 0 on the left to 30 on the right, with major tick marks at 0, 6, 12, 18, 24, 30.
* **Color Bar/Legend (Positioned to the right of the grid):**
* **Label:** "Heads Importance"
* **Scale:** A vertical gradient bar.
* **Values (from bottom to top):** 0.0000, 0.0003, 0.0005, 0.0008, 0.0010, 0.0013, 0.0015, 0.0018, 0.0020+.
* **Color Mapping:** Dark purple/blue represents low importance (~0.0000). Colors transition through teal and green to bright yellow, which represents high importance (0.0020+).
### Detailed Analysis
Each heatmap is a 43x31 grid (Layers x Heads) where each cell's color indicates the importance value for that specific layer-head pair.
**Trend Verification & Data Point Analysis (by Heatmap):**
1. **Knowledge Recall:**
* **Trend:** Scattered, low-to-moderate importance. No strong, concentrated clusters.
* **Data Points:** A few isolated yellow/green spots (high importance) appear, notably around Layer ~30, Head ~18 and Layer ~36, Head ~6. Most of the grid is dark blue/purple.
2. **Retrieval:**
* **Trend:** Shows the most distinct and concentrated pattern of high importance.
* **Data Points:** A prominent band of high importance (yellow/green) is visible in the lower-middle layers, roughly between Layers 30-42. Within this band, importance is not uniform; it peaks in specific heads, such as around Head 12-18 and Head 24-30. The upper layers (0-24) are predominantly low importance.
3. **Logical Reasoning:**
* **Trend:** Very sparse high-importance points. Appears to have the lowest overall activation.
* **Data Points:** The grid is almost entirely dark blue. Only a handful of faint green/yellow pixels are visible, for example near Layer 36, Head 24.
4. **Decision-making:**
* **Trend:** Moderate, scattered importance with some clustering in mid-to-lower layers.
* **Data Points:** Several yellow/green spots are distributed, with a slight concentration in the lower half (Layers 24-42). Notable points include Layer ~24, Head ~18 and Layer ~36, Head ~12.
5. **Semantic Understanding:**
* **Trend:** Diffuse, low-level importance across the entire grid.
* **Data Points:** Very few high-importance (yellow) cells. The pattern is a speckled mix of dark blue and teal, indicating generally low but non-zero importance spread widely.
6. **Syntactic Understanding:**
* **Trend:** Shows a clear, structured pattern of moderate-to-high importance.
* **Data Points:** A distinct "grid-like" or "checkerboard" pattern of green/yellow cells is visible, particularly in the lower two-thirds of the layers (Layers 18-42). This suggests specific, regularly spaced heads are important for syntax.
7. **Inference:**
* **Trend:** Similar to Logical Reasoning, with very sparse high-importance signals.
* **Data Points:** The heatmap is predominantly dark. A few isolated green points are present, such as near Layer 30, Head 6.
8. **Math Calculation:**
* **Trend:** Scattered importance with a slight bias towards lower layers.
* **Data Points:** Isolated yellow/green spots appear, mainly in the bottom half (Layers 24-42). Examples include Layer ~36, Head ~0 and Layer ~42, Head ~24.
### Key Observations
* **Task-Specific Activation:** The model utilizes distinctly different patterns of layer-head importance for different cognitive tasks.
* **Retrieval is Unique:** The "Retrieval" task shows the most concentrated and intense activation pattern, localized to a specific band of lower layers.
* **Syntax vs. Semantics:** "Syntactic Understanding" has a more structured, grid-like importance pattern compared to the diffuse pattern of "Semantic Understanding."
* **Low Activation for Logic/Inference:** "Logical Reasoning" and "Inference" show the least activation, suggesting these tasks may rely on more distributed or subtle processing not captured strongly by this importance metric, or on different model components.
* **Layer Gradient:** For several tasks (Retrieval, Decision-making, Syntactic Understanding, Math Calculation), higher importance values are more frequently found in the lower half of the model (Layers 21-42).
### Interpretation
This visualization provides a "cognitive map" of a large language model, revealing how its internal components (attention heads) are differentially recruited for various intellectual tasks.
* **Functional Localization:** The data suggests a degree of functional localization within the model. The strong, localized pattern for **Retrieval** implies that accessing stored knowledge is a distinct process handled by specific circuits in the model's deeper layers. The structured pattern for **Syntactic Understanding** aligns with the idea that grammar processing may involve more regular, patterned computations.
* **Task Complexity & Resource Allocation:** The sparse activation for **Logical Reasoning** and **Inference** is intriguing. It could indicate that these tasks are either: a) performed by a very small, specialized set of heads, b) rely on interactions not captured by this single "importance" metric, or c) are more emergent properties of the entire network's activity rather than localized to specific heads.
* **Architectural Insight:** The concentration of activity in deeper layers (higher layer indices, drawn lower in each heatmap) for many tasks is consistent with interpretability research suggesting that deeper layers in transformer models often handle more task-specific, semantic processing after earlier layers perform more general feature extraction.
* **Limitation:** The metric is labeled "Heads Importance," but the exact definition (e.g., based on attention weight magnitude, gradient saliency, or another probe) is not specified. The interpretation is therefore relative—comparing patterns across tasks—rather than absolute. The "0.0020+" ceiling on the color bar suggests the highest values may be clipped, potentially masking the true peak importance for tasks like Retrieval.
</details>
Figure 8: The existence of cognitive heads in Yi-1.5-9B responsible for eight distinct functions in complex reasoning tasks. The x-axis represents the head index, while the y-axis indicates the layer index.
<details>
<summary>x11.png Details</summary>

### Visual Description
## Heatmap Grid: Head Importance Across Cognitive Tasks
### Overview
The image displays a grid of eight heatmaps arranged in two rows of four. Each heatmap visualizes the "Heads Importance" (likely attention head importance scores) across different layers and heads of a neural network model for a specific cognitive task. The overall purpose is to compare how the model allocates its attention resources for different types of reasoning and understanding.
### Components/Axes
* **Grid Structure:** 2 rows x 4 columns of individual heatmaps.
* **Individual Heatmap Axes:**
* **Y-axis (Vertical):** Labeled "Layer". Scale runs from 0 at the top to 30 at the bottom, with major ticks at 0, 6, 12, 18, 24, 30.
* **X-axis (Horizontal):** Labeled "Head". Scale runs from 0 on the left to 30 on the right, with major ticks at 0, 6, 12, 18, 24, 30.
* **Color Bar (Legend):** Located on the far right of the entire grid.
* **Title:** "Heads Importance".
* **Scale:** A vertical gradient from dark purple (bottom) to bright yellow (top).
* **Values:** Labeled ticks at 0.0000, 0.0005, 0.0010, 0.0015, 0.0020, 0.0025, and "0.0030+" at the top.
* **Heatmap Titles (Top Row, Left to Right):**
1. Knowledge Recall
2. Retrieval
3. Logical Reasoning
4. Decision-making
* **Heatmap Titles (Bottom Row, Left to Right):**
1. Semantic Understanding
2. Syntactic Understanding
3. Inference
4. Math Calculation
### Detailed Analysis
Each heatmap is a 31x31 grid (Layers 0-30, Heads 0-30). The color of each cell represents the importance score for that specific head at that layer for the given task.
**1. Knowledge Recall:**
* **Trend:** Diffuse, low-to-moderate importance across most of the grid. No single, dominant cluster.
* **Notable Patterns:** Slightly elevated importance (teal/green) appears scattered in the middle layers (approx. 12-24) across various heads. A few isolated brighter spots (yellow-green) are visible in the lower layers (24-30), particularly around heads 0-6.
**2. Retrieval:**
* **Trend:** Shows the most distinct and localized clusters of high importance.
* **Notable Patterns:** A prominent, bright yellow cluster (high importance, >0.0025) is located in the middle layers, approximately layers 18-21, spanning heads 6-12. Another significant cluster of high importance (yellow-green) appears in layers 24-27, heads 18-24. The rest of the map is predominantly dark purple (low importance).
**3. Logical Reasoning:**
* **Trend:** Very diffuse and low importance overall. The darkest heatmap of the set.
* **Notable Patterns:** Almost the entire grid is dark purple (0.0000-0.0005). A very faint, scattered pattern of slightly higher importance (dark blue) is barely visible in the middle layers (12-24).
**4. Decision-making:**
* **Trend:** Moderately diffuse, with some concentration in middle-to-higher layers.
* **Notable Patterns:** A band of slightly elevated importance (blue-green) runs horizontally across the middle layers (approx. 15-24). A few brighter spots (green-yellow) are present in the lower layers (24-30), especially around heads 0-6 and 24-30.
**5. Semantic Understanding:**
* **Trend:** Diffuse, similar to Knowledge Recall but with a slightly more defined horizontal band.
* **Notable Patterns:** A consistent, faint horizontal band of moderate importance (blue-green) is visible across most heads in the middle layers (approx. 15-21). The lower layers (24-30) show scattered low-to-moderate importance.
**6. Syntactic Understanding:**
* **Trend:** Shows a clear, concentrated horizontal band of importance.
* **Notable Patterns:** A distinct band of elevated importance (green-yellow) is located in the middle layers, approximately layers 15-21, stretching across most heads (0-30). This is the most defined horizontal structure among all maps.
**7. Inference:**
* **Trend:** Very diffuse and low importance, similar to Logical Reasoning.
* **Notable Patterns:** The grid is almost entirely dark purple. A very sparse scattering of slightly higher importance (dark blue) is present, with no clear concentration.
**8. Math Calculation:**
* **Trend:** Shows a unique pattern with high importance concentrated in the very lowest layers and specific head clusters.
* **Notable Patterns:** The most striking feature is a bright yellow cluster (very high importance) in the bottom-right corner, specifically layers 27-30 and heads 24-30. Another cluster of high importance (yellow-green) is visible in layers 24-27, heads 6-12. The upper and middle layers are mostly dark.
### Key Observations
1. **Task-Specific Allocation:** The model allocates attention head importance very differently depending on the cognitive task. Retrieval and Math Calculation show highly localized, intense clusters, while Logical Reasoning and Inference show almost no concentrated importance.
2. **Layer Specialization:** High-importance heads are most frequently found in the middle-to-lower layers (approx. 15-30). The upper layers (0-12) are consistently low-importance across all tasks.
3. **Horizontal vs. Clustered Patterns:** Syntactic Understanding shows a clear horizontal band (importance consistent across heads at specific layers). In contrast, Retrieval and Math Calculation show tight, localized clusters (importance specific to certain head-layer combinations).
4. **Math Calculation Anomaly:** The pattern for Math Calculation is an outlier, with its highest importance scores located in the very last layers and heads, unlike any other task.
5. **Low Importance for Abstract Reasoning:** Tasks like Logical Reasoning and Inference, which involve abstract step-by-step processing, show the least defined head importance patterns, suggesting a more distributed or less head-specific processing mechanism.
### Interpretation
This visualization provides a "cognitive map" of a neural network's internal processing. It suggests that different capabilities are not uniformly handled but are instead supported by specialized subsystems within the model.
* **Retrieval** relies on a specific set of heads in the middle layers, likely acting as a dedicated "memory access" module.
* **Syntactic Understanding** uses a consistent set of heads across a specific layer range, indicating a stable, dedicated circuit for parsing grammatical structure.
* **Math Calculation** uniquely depends on the final processing stages (the deepest layers, drawn at the bottom of the heatmap), possibly because mathematical operations require the most refined, integrated representations before output.
* The diffuse patterns for **Logical Reasoning** and **Inference** might indicate that these tasks are not solved by dedicated "reasoning heads" but emerge from the complex interaction of many heads across the network, or that the importance metric is less sensitive to the heads involved in these processes.
The stark contrast between tasks implies that improving a model's performance on a specific capability (e.g., math) might require targeted intervention on the specific heads and layers identified here, rather than a uniform approach. The absence of importance in the earliest layers (0-12) across all tasks suggests these layers perform more general, low-level feature extraction common to all processing.
</details>
Figure 9: The existence of cognitive heads in Yi-1.5-6B responsible for eight distinct functions in complex reasoning tasks. The x-axis represents the head index, while the y-axis indicates the layer index.
### A.2 Importance curve
We ranked the importance scores and identified the elbow point, as illustrated in Figure 10.
<details>
<summary>x12.png Details</summary>

### Visual Description
## Line Charts: Importance of Attention Heads Across Cognitive Tasks
### Overview
The image displays a 2x4 grid of eight line charts, each illustrating the relationship between the number of attention "Heads" (x-axis) and their "Importance" (y-axis) for a specific cognitive task. All charts share identical axes scales and a consistent visual pattern: a steeply decaying curve. A single red dot is plotted on each curve, marking a specific point of interest.
### Components/Axes
* **Chart Titles (Top of each subplot):** Decision-making, Inference, Knowledge Recall, Logical Reasoning, Math Calculation, Retrieval, Semantic Understanding, Syntactic Understanding.
* **Y-Axis Label (Left side, shared):** "Importance"
* **Y-Axis Scale (Left side, shared):** Linear scale from `0.00e+00` to `6.00e-03`, with major ticks at `1.50e-03`, `3.00e-03`, `4.50e-03`, and `6.00e-03`.
* **X-Axis Label (Bottom center, shared):** "Heads"
* **X-Axis Scale (Bottom of each subplot):** Non-linear scale with labeled ticks at `1`, `256`, `512`, `768`, and `1024`.
* **Data Series:** A single black line in each chart.
* **Marker:** A solid red circle (dot) on each line.
### Detailed Analysis
Each chart follows the same fundamental trend: a rapid, near-exponential decay in "Importance" as the number of "Heads" increases from 1. The curve flattens significantly after approximately 256 heads, showing diminishing returns.
**Spatial Grounding & Trend Verification:**
The red dot is consistently placed at the x-axis value of **256** in all eight charts. Its vertical position (y-value) varies, indicating the importance score at that specific point for each task.
**Task-Specific Data Points (Approximate):**
* **Decision-making:** Curve starts near `4.50e-03` at 1 head. Red dot at 256 heads is at ~`1.00e-03`.
* **Inference:** Curve starts near `3.00e-03`. Red dot at 256 heads is at ~`0.75e-03`.
* **Knowledge Recall:** Curve starts near `3.00e-03`. Red dot at 256 heads is at ~`0.75e-03`.
* **Logical Reasoning:** Curve starts near `3.00e-03`. Red dot at 256 heads is at ~`0.75e-03`.
* **Math Calculation:** Curve starts highest, near `6.00e-03`. Red dot at 256 heads is at ~`1.00e-03`.
* **Retrieval:** Curve starts near `3.00e-03`. Red dot at 256 heads is at ~`0.75e-03`.
* **Semantic Understanding:** Curve starts near `6.00e-03`. Red dot at 256 heads is at ~`1.00e-03`.
* **Syntactic Understanding:** Curve starts near `4.50e-03`. Red dot at 256 heads is at ~`1.00e-03`.
### Key Observations
1. **Universal Decay Pattern:** All eight cognitive tasks exhibit the same sharp decline in head importance, suggesting a common underlying principle in the model's architecture or training.
2. **Initial Importance Varies:** The starting importance (at 1 head) differs by task. "Math Calculation" and "Semantic Understanding" begin at the highest level (`~6.00e-03`), while "Inference," "Knowledge Recall," "Logical Reasoning," and "Retrieval" begin at a lower level (`~3.00e-03`).
3. **Convergence Point:** The red dot at 256 heads serves as a visual benchmark. For tasks with higher initial importance (Math, Semantic, Decision, Syntactic), the importance at 256 heads is approximately `1.00e-03`. For tasks with lower initial importance, it is approximately `0.75e-03`.
4. **Plateau:** Beyond 256 heads, the curve becomes very flat, indicating that heads numbered 512, 768, and 1024 contribute negligible additional importance.
### Interpretation
This visualization likely analyzes the contribution of individual attention heads in a large language model (with up to 1024 heads) to various cognitive capabilities. The data suggests a **"sparse importance"** or **"lottery ticket"** hypothesis within the model's attention mechanism.
* **What it demonstrates:** A very small subset of attention heads (likely the first few dozen or hundred) carry the vast majority of the functional importance for performing these tasks. The remaining hundreds of heads contribute minimally.
* **Relationship between elements:** The consistent decay across diverse tasks implies this is a structural property of the model, not task-specific. The red dot at 256 heads may represent a design choice, a pruning threshold, or a point of analysis in the source study.
* **Notable implications:** This pattern has significant implications for model efficiency and interpretability. It suggests that model compression techniques (like pruning) could potentially remove a large percentage of attention heads (e.g., those beyond the 256 mark) with minimal impact on performance across these core cognitive tasks. It also indicates that understanding model behavior may be tractable by focusing on this smaller, high-importance subset of components. The variation in starting importance hints that some tasks (like math and semantic understanding) may rely on a more concentrated or potent set of initial heads compared to others.
</details>
Figure 10: Importance curve for eight functions, Llama3.1-8B-instruct.
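Concretely, the elbow on each ranked importance curve can be located with a simple geometric heuristic. The paper does not state which detector it uses, so the max-distance-to-chord rule below (and the synthetic decay curve) are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def find_elbow(scores) -> int:
    """Return the index of the elbow of a ranked importance curve.

    Heuristic: sort scores in descending order and pick the point
    farthest from the straight line (chord) joining the first and
    last points of the curve.
    """
    y = np.sort(np.asarray(scores, dtype=float))[::-1]   # descending rank
    x = np.arange(len(y), dtype=float)
    p0 = np.array([x[0], y[0]])
    p1 = np.array([x[-1], y[-1]])
    chord = (p1 - p0) / np.linalg.norm(p1 - p0)          # unit chord direction
    vecs = np.stack([x, y], axis=1) - p0                 # points relative to start
    proj = np.outer(vecs @ chord, chord)                 # projection onto chord
    dists = np.linalg.norm(vecs - proj, axis=1)          # perpendicular distances
    return int(np.argmax(dists))

# Example on a synthetic decaying curve over 1024 heads
scores = 6e-3 * np.exp(-np.arange(1024) / 120.0)
elbow = find_elbow(scores)
```

Heads ranked above the elbow index are then kept as the cognitive heads for the corresponding function.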
### A.3 MLP
We train a two-layer multi-class MLP for cognitive function classification. The first layer applies a shared linear projection to each multi-head representation vector, reducing each to a 64-dimensional embedding. These embeddings are then flattened and concatenated into a single vector of size $64 \times H$, where $H$ is the number of attention heads. This vector is fed into a hidden layer with 512 units, followed by a ReLU activation and dropout with a rate of 0.3. The final output layer maps the 512-dimensional hidden representation to the set of cognitive function labels.
The model is trained using the Adam optimizer with a learning rate of $10^{-4}$ and a cross-entropy loss. Training proceeds for 100 epochs. The test accuracy of our classification method across all LLMs is summarized in Table 5.
Table 5: The test accuracy (%) of probing method on different LLMs.
| Dataset | Llama3.1-8B-instruct | Llama3.2-3B-instruct | Qwen3-8B | Qwen3-4B | Yi-1.5-9B | Yi-1.5-6B |
| --- | --- | --- | --- | --- | --- | --- |
| CogQA | 83.73 | 79.80 | 84.71 | 80.79 | 77.56 | 75.18 |
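For reference, the probe described above can be sketched in PyTorch as follows. The class name, `head_dim` (the per-head representation size), and the eight-way label set are illustrative assumptions; the layer sizes, activation, dropout rate, optimizer, and learning rate follow the text.

```python
import torch
import torch.nn as nn

class CognitiveProbe(nn.Module):
    """Two-layer multi-class MLP probe over per-head representations."""

    def __init__(self, num_heads: int, head_dim: int, num_labels: int = 8):
        super().__init__()
        self.proj = nn.Linear(head_dim, 64)          # shared projection per head
        self.hidden = nn.Linear(64 * num_heads, 512)
        self.act = nn.ReLU()
        self.drop = nn.Dropout(p=0.3)
        self.out = nn.Linear(512, num_labels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_heads, head_dim) multi-head representation vectors
        z = self.proj(x)                             # (batch, num_heads, 64)
        z = z.flatten(start_dim=1)                   # (batch, 64 * num_heads)
        return self.out(self.drop(self.act(self.hidden(z))))

# e.g., Llama3.1-8B-instruct: 32 layers x 32 heads = 1024 heads, head_dim 128
probe = CognitiveProbe(num_heads=1024, head_dim=128)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
```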
### A.4 Prompt for Generating CogQA
Prompt
Prompt: You are an expert in analytical logical reasoning. You will be given a question along with its chain-of-thought process. Your task is to break the question down into subquestions based on the chain-of-thought process, ensuring that all necessary steps for solving the problem and constructing the logical chain are included to simulate critical thinking. Decompose the Question: Identify and formulate the key subquestions required to solve the main question logically. Fill in Missing Steps: Ensure that all essential reasoning steps are explicitly stated. NOTE: The information of chain-of-thought cannot be used directly if it doesn’t exist in main query. Each subquestion should be derived solely from the main query and the preceding subquestion. Answer the Subquestions: Provide clear, step-by-step solutions for each subquestion. Annotate Cognitive Skills: Identify and label the specific cognitive abilities required to answer each subquestion. If you believe other cognitive skills are relevant, you may also consider incorporating them. You will be given predefined labels along with their descriptions. Your goal is to enhance the logical reasoning process by making it explicit and structured. <cognitive_skills> **Retrieval**: Refers to the process of fetching relevant information from input text, typically involving the extraction of specific words, phrases, or sentences directly from the original text. **Knowledge Recall**: Involves the storage and recall of domain-specific knowledge, such as concepts from math, physics, biology, etc. This is typically the internal knowledge base of a language model. (Corresponding to the memory head) **Semantic Understanding**: Refers to the ability to comprehend and extract meaning from text or symbols by recognizing relationships between words, phrases, and concepts. It goes beyond syntactic understanding by grasping context, intent, and underlying knowledge. 
**Syntactic Understanding**: Involves the ability to analyze and interpret the grammatical structure of sentences, including the roles and relationships of words, phrases, and clauses within the language. **Math Calculation**: Refers to the process of performing arithmetic or mathematical operations to obtain a result. It involves applying mathematical concepts, such as addition, subtraction, multiplication, division, and more complex operations (e.g., algebra, calculus), to solve problems or derive values from given inputs. **Inference**: Involves drawing conclusions based on existing evidence or information. It follows logical rules to deduce new statements or decisions from given information. **Logical Reasoning**: The process of drawing conclusions based on a set of premises, following established rules of logic, used to ensure that decisions of people are coherent, consistent, and based on sound principles. **Decision-making**: The process of making a choice in a selection task based on previous information or analysis. </cognitive_skills> Here is the question: <question> question </question> Here is the chain-of-thought: <chain-of-thought> cot </chain-of-thought> Note - Your task is to break the question down into detailed subquestions, ensuring each subquestion can be answered using only one specific cognitive skill. - You need to create a structured and explicit reasoning process that simulates critical thinking while maintaining clarity and precision. - The subquestion needs to be easy to answer, and the answer needs to be concise. - The information of the chain-of-thought cannot be used directly if it doesn’t exist in the main query. - Each subquestion should be derived solely from the main query and the preceding subquestion. - You CANNOT retrieve information from the chain-of-thought, but you can retrieve from the question. - Your output should be formatted as a list of JSON objects, where each object represents a subquestion, its answer, and the required cognitive skill. 
- You should use the most efficient logic to analyze the problem and minimize the number of subquestions. Output format: [ { "subquestion": "<Subquestion text>", "answer": "<Concise answer>", "cognitive_skill": "<Assigned cognitive skill>" }, { "subquestion": "<Subquestion text>", "answer": "<Concise answer>", "cognitive_skill": "<Assigned cognitive skill>" } ] Your answer:
### A.5 Annotations
To ensure the quality and reliability of the decomposed subQAC triplets in the CogQA dataset, we design a rigorous multi-stage annotation pipeline, combining expert review and model-based verification. The goal is to verify the logical validity of subquestions, the correctness of their associated cognitive function labels, and the accuracy of the answers.
#### Stage 1: Validating Subquestion Decomposition
In the first stage, we evaluate whether the generated subquestions are logically sound and align with natural human reasoning. For each QA pair, three expert annotators (with backgrounds in linguistics or cognitive science) independently assess the validity of each subquestion. A subquestion is marked true if it meaningfully contributes to answering the main question and follows a logical reasoning trajectory. Otherwise, it is marked false.
If a subquestion depends on prior information—such as the question text or the answer—from another subquestion, the subquestion order must reflect this dependency. While some subquestions can be answered in parallel and are order-independent, others have prerequisite relationships that require a specific sequence; the overall reasoning structure therefore often forms a graph in which sequential and parallel dependencies coexist. During LLM inference, we include the previous subquestions and their corresponding subanswers in the prompt as prior information. Thus, the critical factor is not the ordering alone, but whether the prompt provides the necessary context to answer the current subquestion accurately.
We apply the following filtering criteria:
- AI-Human Agreement: If any annotator considers fewer than 60% of the subquestions valid, the entire QA decomposition is discarded.
- Inter-Annotator Agreement: A subquestion is deemed invalid if at least two annotators mark it as false. If over 40% of the subquestions in a QA pair are invalid under this rule, the whole QA pair is removed.
This filtering ensures that the retained QA decompositions follow coherent, cognitively plausible reasoning chains.
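The two filtering rules amount to the following check (a sketch; the per-subquestion vote layout and the function name are illustrative assumptions):

```python
def keep_decomposition(votes):
    """Decide whether a QA decomposition is retained under Stage-1 filtering.

    `votes` holds one (bool, bool, bool) triple per subquestion, giving the
    valid/invalid judgment of the three annotators.
    """
    n = len(votes)
    # Rule 1: discard if any annotator considers fewer than 60%
    # of the subquestions valid.
    for a in range(3):
        if sum(v[a] for v in votes) / n < 0.6:
            return False
    # Rule 2: a subquestion is invalid if at least two annotators mark it
    # false; discard the QA pair if over 40% of its subquestions are invalid.
    invalid = sum(1 for v in votes if sum(v) <= 1)
    return invalid / n <= 0.4
```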
#### Stage 2: Verifying Cognitive Function Labels
In the second stage, annotators evaluate the correctness of the cognitive function label $c_{i}$ assigned to each subQAC triplet $(q_{i},a_{i},c_{i})$ . Three annotators independently mark each label as true or false. When discrepancies occur, annotators collaboratively reassign the correct cognitive label to ensure alignment with the underlying mental operation.
This step ensures that the categorization of subquestions accurately reflects established distinctions between information retrieval, semantic understanding, logical reasoning, and other cognitive processes.
#### Stage 3: Answer Verification via Model and Human Review
In the final stage, we verify the correctness of each answer $a_{i}$ using both automated and manual procedures. We employ the o4-mini model [o4mini2024], known for its logical reasoning capabilities, to re-evaluate GPT-4o-generated answers. If o4-mini disagrees with GPT-4o, it provides an alternative answer. A human annotator then compares both answers and resolves discrepancies by supplying the correct one when necessary. Given the generally objective nature of answers, only one annotator is required for this task.
#### Annotation Outcome
Following this multi-stage process, we retain 570 validated QA pairs, yielding a total of 3,402 high-quality subQAC triplets. Notably, we augment certain cognitive functions to ensure balance across categories. As a result, the original 570 QA pairs were expanded to 720 (including some duplicates), with each duplicated pair potentially associated with distinct subquestions and cognitive functions.
### A.6 CogQA Example
Table 6 presents illustrative examples from the CogQA dataset. The main question and its corresponding answer are taken from the original dataset. Based on an analysis of the main question, a sequence of sub-questions, their answers, and associated cognitive function labels are generated in order.
Table 6: Two examples from the CogQA dataset showing a main question, its final answer, and a breakdown into subquestions with answers and their corresponding cognitive function labels.
| Main Question | A one-year subscription to a newspaper is offered with a 45% discount. How much does the discounted subscription cost if a subscription normally costs $80? |
| --- | --- |
| Answer | We calculate first the discount: 80 $\times$ 45 / 100 = $36. So, the discounted subscription amounts to 80 – 36 = $44. |
| Subquestion | Answer | Cognitive Function |
| --- | --- | --- |
| 1. What is the normal cost of a one-year subscription to the newspaper? | $80 | Retrieval |
| 2. What is the discount percentage offered on the subscription? | 45% | Retrieval |
| 3. How much is the discount amount in dollars for the subscription? | $36 | Math Calculation |
| 4. What is the cost of the subscription after applying the discount? | $44 | Math Calculation |
| Main Question | What does every person talk out of? Options: - name - hide - mother and father - mouth - heart |
| --- | --- |
| Answer | By mouth, talking is done. Every person talk out of mouth. |
| Subquestion | Answer | Cognitive Function |
| --- | --- | --- |
| 1. What is the primary function of talking? | To communicate verbally. | Knowledge Recall |
| 2. Which part of the human body is primarily used for verbal communication? | Mouth | Knowledge Recall |
| 3. Based on the options provided, which option corresponds to the part used for verbal communication? | Mouth | Decision-making |
### A.7 Prompt for Question Asking
Prompt
Prompt: You are an expert in analytical and logical reasoning. You will be given a main question and prior knowledge in chain-of-thought (CoT) format. Your task is to answer a follow-up subquestion using the information provided. Here is the main question: <main_question> question </main_question> Here is the prior knowledge in chain-of-thought (CoT) format: <prior_knowledge> cot </prior_knowledge> Here is the subquestion: <subquestion> subquestion </subquestion> Instructions: - Answer the subquestion carefully. - You can use the information in the prior_knowledge to help you answer the subquestion. - Your response should be clear and concise. - Stick to factual reasoning based on the provided CoT. - Do not include any explanation, commentary, or code. - Do not output anything after the closing square bracket `]`. Only output your final answer using this format: [ { "answer": "<Your answer here>" } ] Your answer:
### A.8 The number of cognitive heads for different LLMs
The number of cognitive heads for each model is shown in Table 7.
Table 7: Count (C) and percentage (%) of attention heads exceeding elbow thresholds for each cognitive function across six models.
| Model | C | % | C | % | C | % | C | % | C | % | C | % | C | % | C | % |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Llama3.1-8B-instruct | 105 | 10.3 | 118 | 11.5 | 142 | 13.9 | 124 | 12.1 | 60 | 5.9 | 81 | 7.9 | 139 | 13.6 | 59 | 5.8 |
| Llama3.2-3B-instruct | 95 | 14.1 | 62 | 9.2 | 95 | 14.1 | 87 | 12.9 | 90 | 13.4 | 63 | 9.4 | 98 | 14.6 | 35 | 5.2 |
| Qwen3-8B | 119 | 10.3 | 115 | 10.0 | 114 | 9.9 | 87 | 7.6 | 68 | 5.9 | 108 | 9.4 | 178 | 15.5 | 61 | 5.3 |
| Qwen3-4B | 115 | 10.0 | 94 | 8.2 | 120 | 10.4 | 170 | 14.8 | 143 | 12.4 | 106 | 9.2 | 109 | 9.5 | 99 | 8.6 |
| Yi-1.5-9B | 200 | 13.0 | 134 | 8.7 | 134 | 8.7 | 174 | 11.3 | 218 | 14.2 | 140 | 9.1 | 173 | 11.3 | 167 | 10.9 |
| Yi-1.5-6B | 118 | 11.5 | 90 | 8.8 | 200 | 19.5 | 93 | 9.1 | 99 | 9.7 | 142 | 13.9 | 146 | 14.3 | 67 | 6.5 |
### A.9 Ablation study - Different position of head activation
In the main experiments, we use the top-k generated tokens and average their multi-head attention vectors. We also explore alternative strategies for extracting representations, including using the first generated token, the last generated token, the first meaningful token, and the average of all generated tokens. The corresponding results are shown in Table 8.
Here, first is the first token, last is the last token, meaning_first is the first meaningful token (excluding formatting), top-k is the top-k most semantically important tokens, and full is all tokens in the answer. We observe that the top-k strategy leads to the most significant performance drop when masking the top-30 identified heads, indicating higher precision in identifying retrieval-relevant heads. Interestingly, last, meaning_first, full, and top-k show similar performance trends. This is because different tokens in the output contribute to answering the question, and as the number of masked cognitive heads increases, the influence of the token-selection strategy diminishes. Additionally, for Retrieval the full answer is usually meaningful, whereas functions like Math Calculation require semantically meaningful tokens. Based on these results, we choose top-k as our final setting.
Table 8: Attention heads associated with cognitive functions are selected based on different token positions. Accuracy and COMET scores are evaluated after intervention; lower values indicate better outcomes.
| Model | Masked heads | Token position | Accuracy | Accuracy | COMET | COMET |
| --- | --- | --- | --- | --- | --- | --- |
| Llama3.1-8B | 30 | first | 90.51 | 83.53 | 91.13 | 73.13 |
| Llama3.1-8B | 30 | last | 86.86 | 81.76 | 90.04 | 68.66 |
| Llama3.1-8B | 30 | meaning_first | 88.13 | 79.41 | 89.72 | 68.66 |
| Llama3.1-8B | 30 | full | 73.93 | 47.06 | 89.92 | 69.15 |
| Llama3.1-8B | 30 | top-k | 70.05 | 46.47 | 89.32 | 67.16 |
| Llama3.1-8B | 50 | first | 93.28 | 89.41 | 94.46 | 89.57 |
| Llama3.1-8B | 50 | last | 64.39 | 41.18 | 92.05 | 70.15 |
| Llama3.1-8B | 50 | meaning_first | 62.90 | 34.12 | 84.60 | 60.69 |
| Llama3.1-8B | 50 | full | 46.20 | 11.76 | 89.01 | 78.11 |
| Llama3.1-8B | 50 | top-k | 65.64 | 47.76 | 89.65 | 70.15 |
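The token-position strategies compared above can be sketched as index-selection rules (an illustrative sketch; the per-token semantic-importance scores used by top-k are taken as given, and the formatting-token test in meaning_first is a simplified stand-in):

```python
import numpy as np

def select_positions(tokens, strategy, importance=None, k=5):
    """Return the indices of generated tokens whose multi-head
    attention vectors are averaged under each extraction strategy."""
    n = len(tokens)
    if strategy == "first":
        return [0]
    if strategy == "last":
        return [n - 1]
    if strategy == "meaning_first":
        # first token containing an alphanumeric character,
        # i.e., skipping pure formatting/punctuation tokens
        for i, tok in enumerate(tokens):
            if any(ch.isalnum() for ch in tok):
                return [i]
        return [0]
    if strategy == "full":
        return list(range(n))
    if strategy == "top-k":
        if n <= k:                  # use all tokens when fewer than k
            return list(range(n))
        top = np.argsort(np.asarray(importance))[::-1][:k]
        return sorted(int(i) for i in top)
    raise ValueError(f"unknown strategy: {strategy}")
```

The representations at the returned positions are then averaged and fed to the probe.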
### A.10 Examples of top- $k$ tokens
The selected tokens are intended to semantically represent the generated answer. Below are examples for different cognitive functions for Llama3.1-8B-instruct:
Table 9: Examples of question decomposition with cognitive functions and token selection.
| Given the sentence "A surfboarder catches the waves." can we conclude that "A surfboarder in the water."? (Options: yes / it is not possible to tell / no) | What is typically required for a surfboarder to catch waves? | Knowledge Recall | The surfboarder needs to be in the water. | [’surfboarder’, ’needs’, ’be’, ’in’, ’water’] |
| --- | --- | --- | --- | --- |
| Is the following a factual statement? "Due to its high density, countries around the world use Palladium to mint coins." (Options: yes / no) | What is the statement in question? | Retrieval | The statement in question is: Due to its high density, countries around the world use Palladium to mint coins. | [’high’, ’density’, ’Palladium’, ’mint’, ’coins’] |
| A one-year subscription to a newspaper is offered with a 45% discount. How much does the discounted subscription cost if a subscription normally costs $80? | How much is the discount amount in dollars for the subscription? | Math Calculation | 36 | [’36’] |
We can see that the selected tokens semantically represent the answer. Note that we use all tokens when the number of tokens is fewer than 5.