2306.08302v3
# Unifying Large Language Models and Knowledge Graphs: A Roadmap
> Shirui Pan is with the School of Information and Communication Technology and Institute for Integrated and Intelligent Systems (IIIS), Griffith University, Queensland, Australia. Linhao Luo and Yufei Wang are with the Department of Data Science and AI, Monash University, Melbourne, Australia. Chen Chen is with Nanyang Technological University, Singapore. Jiapu Wang is with the Faculty of Information Technology, Beijing University of Technology, Beijing, China. Xindong Wu is with the Key Laboratory of Knowledge Engineering with Big Data (the Ministry of Education of China), Hefei University of Technology, Hefei, China, and also with the Research Center for Knowledge Engineering, Zhejiang Lab, Hangzhou, China. Shirui Pan and Linhao Luo contributed equally to this work. Corresponding Author: Xindong Wu.
## Abstract
Large language models (LLMs), such as ChatGPT and GPT-4, are making new waves in the field of natural language processing and artificial intelligence, due to their emergent abilities and generalizability. However, LLMs are black-box models, which often fall short of capturing and accessing factual knowledge. In contrast, Knowledge Graphs (KGs), Wikipedia and Huapu for example, are structured knowledge models that explicitly store rich factual knowledge. KGs can enhance LLMs by providing external knowledge for inference and interpretability. Meanwhile, KGs are difficult to construct and evolve by nature, which challenges existing KG methods in generating new facts and representing unseen knowledge. Therefore, unifying LLMs and KGs is a complementary approach that leverages the advantages of both. In this article, we present a forward-looking roadmap for the unification of LLMs and KGs. Our roadmap consists of three general frameworks, namely, 1) KG-enhanced LLMs, which incorporate KGs during the pre-training and inference phases of LLMs, or for the purpose of enhancing understanding of the knowledge learned by LLMs; 2) LLM-augmented KGs, which leverage LLMs for different KG tasks such as embedding, completion, construction, graph-to-text generation, and question answering; and 3) Synergized LLMs + KGs, in which LLMs and KGs play equal roles and work in a mutually beneficial way to enhance both LLMs and KGs for bidirectional reasoning driven by both data and knowledge. We review and summarize existing efforts within these three frameworks in our roadmap and pinpoint their future research directions.
Index Terms: Natural Language Processing, Large Language Models, Generative Pre-Training, Knowledge Graphs, Roadmap, Bidirectional Reasoning. © 2023 IEEE
## 1 Introduction
Large language models (LLMs), also known as pre-trained language models (PLMs) (e.g., BERT [1], RoBERTa [2], and T5 [3]), are pre-trained on large-scale corpora and have shown great performance in various natural language processing (NLP) tasks, such as question answering [4], machine translation [5], and text generation [6]. Recently, dramatically increasing model sizes have further endowed LLMs with emergent abilities [7], paving the road for applying LLMs as Artificial General Intelligence (AGI). Advanced LLMs like ChatGPT (https://openai.com/blog/chatgpt) and PaLM2 (https://ai.google/discover/palm2), with billions of parameters, exhibit great potential in many complex practical tasks, such as education [8], code generation [9], and recommendation [10].
<details>
<summary>extracted/5367551/figs/LLM_vs_KG.png Details</summary>

### Visual Description
## Diagram: Comparison of Knowledge Graphs (KGs) and Large Language Models (LLMs)
### Overview
The image is a Venn diagram comparing the characteristics of Knowledge Graphs (KGs) and Large Language Models (LLMs). It visually presents the pros and cons of each technology and suggests a complementary relationship between them through directional arrows.
### Components/Axes
The diagram consists of two large, overlapping circles on a light gray background.
* **Top Circle (Light Blue):** Labeled **"Knowledge Graphs (KGs)"** at the top center.
* **Bottom Circle (Yellow):** Labeled **"Large Language Models (LLMs)"** at the bottom center.
* **Overlapping Region:** The intersection of the two circles is a lighter blue shade.
* **Arrows:**
* A **blue curved arrow** originates from the KGs circle (near its "Pros" list) and points towards the LLMs circle (near its "Cons" list).
* A **yellow curved arrow** originates from the LLMs circle (near its "Pros" list) and points towards the KGs circle (near its "Cons" list).
### Detailed Analysis
The textual content is organized into "Pros" and "Cons" lists within each circle.
**1. Knowledge Graphs (KGs) - Blue Circle (Top)**
* **Pros (Listed in the upper-right section of the blue circle):**
* Structural Knowledge
* Accuracy
* Decisiveness
* Interpretability
* Domain-specific Knowledge
* Evolving Knowledge
* **Cons (Listed in the lower-right section of the blue circle, within the overlap):**
* Incompleteness
* Lacking Language Understanding
* Unseen Facts
**2. Large Language Models (LLMs) - Yellow Circle (Bottom)**
* **Pros (Listed in the lower-left section of the yellow circle, within the overlap):**
* General Knowledge
* Language Processing
* Generalizability
* **Cons (Listed in the upper-left section of the yellow circle):**
* Implicit Knowledge
* Hallucination
* Indecisiveness
* Black-box
* Lacking Domain-specific/New Knowledge
### Key Observations
* **Spatial Organization:** The "Cons" for KGs and the "Pros" for LLMs are placed within the overlapping region of the Venn diagram, suggesting these are the areas where the two technologies intersect or where one's characteristics are most relevant to the other's weaknesses.
* **Directional Relationship:** The arrows explicitly map a relationship of potential mitigation. The blue arrow suggests KGs' strengths (e.g., Structural Knowledge, Accuracy) can address LLMs' weaknesses (e.g., Hallucination, Black-box). The yellow arrow suggests LLMs' strengths (e.g., Language Processing, Generalizability) can address KGs' weaknesses (e.g., Incompleteness, Lacking Language Understanding).
* **Contrasting Pairs:** Direct opposites are highlighted, such as KGs' "Accuracy" vs. LLMs' "Hallucination," and KGs' "Interpretability" vs. LLMs' "Black-box."
### Interpretation
This diagram presents a **comparative and synergistic analysis** of two foundational AI technologies. It argues that KGs and LLMs have fundamentally different, often inverse, strengths and weaknesses.
* **Core Argument:** The data suggests that neither technology is a complete solution on its own. KGs excel at providing structured, verifiable, and explainable knowledge but are rigid and lack linguistic nuance. LLMs excel at flexible language understanding and general reasoning but are prone to generating incorrect information (hallucinations) and operate as opaque "black boxes."
* **Proposed Synergy:** The arrows are the most critical interpretive element. They propose a **bidirectional integration** where each technology can compensate for the other's flaws. A hybrid system could use a KG to ground an LLM's responses in factual, structured data, reducing hallucinations and improving interpretability. Conversely, an LLM could be used to populate, query, and natural-language interface with a KG, helping to overcome its incompleteness and lack of language understanding.
* **Underlying Message:** The diagram advocates for moving beyond viewing these models in isolation: their effective application lies in their relationship and combination. The visual layout implies that the future of robust AI systems may reside in the overlapping region, leveraging the decisive, structured knowledge of graphs together with the generalizable, linguistic prowess of large models.
</details>
Figure 1: Summarization of the pros and cons for LLMs and KGs. LLM pros: General Knowledge [11], Language Processing [12], Generalizability [13]; LLM cons: Implicit Knowledge [14], Hallucination [15], Indecisiveness [16], Black-box [17], Lacking Domain-specific/New Knowledge [18]. KG pros: Structural Knowledge [19], Accuracy [20], Decisiveness [21], Interpretability [22], Domain-specific Knowledge [23], Evolving Knowledge [24]; KG cons: Incompleteness [25], Lacking Language Understanding [26], Unseen Facts [27]. Pros and cons are selected based on their representativeness. A detailed discussion can be found in Appendix A.
Despite their success in many applications, LLMs have been criticized for their lack of factual knowledge. Specifically, LLMs memorize facts and knowledge contained in the training corpus [14]. However, further studies reveal that LLMs are not able to recall facts and often experience hallucinations by generating statements that are factually incorrect [28, 15]. For example, LLMs might say “Einstein discovered gravity in 1687” when asked, “When did Einstein discover gravity?”, which contradicts the fact that Isaac Newton formulated the gravitational theory. This issue severely impairs the trustworthiness of LLMs.
As black-box models, LLMs are also criticized for their lack of interpretability. LLMs represent knowledge implicitly in their parameters. It is difficult to interpret or validate the knowledge obtained by LLMs. Moreover, LLMs perform reasoning through probabilistic modeling, which is an indecisive process [16]. The specific patterns and functions LLMs use to arrive at predictions or decisions are not directly accessible or explainable to humans [17]. Even though some LLMs can explain their predictions by applying chain-of-thought [29], their reasoning explanations also suffer from the hallucination issue [30]. This severely impairs the application of LLMs in high-stakes scenarios, such as medical diagnosis and legal judgment. For instance, in a medical diagnosis scenario, LLMs may incorrectly diagnose a disease and provide explanations that contradict medical commonsense. This raises another issue: LLMs trained on a general corpus might not generalize well to specific domains or new knowledge due to the lack of domain-specific knowledge or new training data [18].
To address the above issues, a potential solution is to incorporate knowledge graphs (KGs) into LLMs. Knowledge graphs (KGs), which store enormous facts in the form of triples, i.e., $(\textit{head entity}, \textit{relation}, \textit{tail entity})$, are a structured and decisive manner of knowledge representation (e.g., Wikidata [20], YAGO [31], and NELL [32]). KGs are crucial for various applications as they offer accurate explicit knowledge [19]. Besides, they are renowned for their symbolic reasoning ability [22], which generates interpretable results. KGs can also actively evolve with new knowledge continuously added in [24]. Additionally, experts can construct domain-specific KGs to provide precise and dependable domain-specific knowledge [23].
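To make the triple format concrete, the following is a minimal sketch (not from the paper; entity and relation names are purely illustrative) of a KG as a set of (head entity, relation, tail entity) tuples with a simple one-hop lookup:

```python
# A knowledge graph as a set of (head entity, relation, tail entity) triples.
kg = {
    ("Isaac Newton", "formulated", "gravitational theory"),
    ("Isaac Newton", "born_in", "England"),
    ("Albert Einstein", "formulated", "theory of relativity"),
}

def query(kg, head, relation):
    """Return all tail entities linked to `head` by `relation`."""
    return {t for (h, r, t) in kg if h == head and r == relation}

print(query(kg, "Isaac Newton", "formulated"))
# prints {'gravitational theory'}
```

Grounding an LLM's answer in such explicit triples is exactly what prevents errors like attributing the gravitational theory to Einstein.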
Nevertheless, KGs are difficult to construct [25], and current approaches in KGs [33, 34, 27] are inadequate in handling the incomplete and dynamically changing nature of real-world KGs. These approaches fail to effectively model unseen entities and represent new facts. In addition, they often ignore the abundant textual information in KGs. Moreover, existing methods in KGs are often customized for specific KGs or tasks, which are not generalizable enough. Therefore, it is also necessary to utilize LLMs to address the challenges faced in KGs. We summarize the pros and cons of LLMs and KGs in Fig. 1, respectively.
Recently, the possibility of unifying LLMs with KGs has attracted increasing attention from researchers and practitioners. LLMs and KGs are inherently interconnected and can mutually enhance each other. In KG-enhanced LLMs, KGs can not only be incorporated into the pre-training and inference stages of LLMs to provide external knowledge [35, 36, 37], but also used for analyzing LLMs and providing interpretability [14, 38, 39]. In LLM-augmented KGs, LLMs have been used in various KG-related tasks, e.g., KG embedding [40], KG completion [26], KG construction [41], KG-to-text generation [42], and KGQA [43], to improve the performance and facilitate the application of KGs. In Synergized LLM + KG, researchers marry the merits of LLMs and KGs to mutually enhance performance in knowledge representation [44] and reasoning [45, 46]. Although there are some surveys on knowledge-enhanced LLMs [47, 48, 49], they mainly focus on using KGs as external knowledge to enhance LLMs, and ignore other possibilities of integrating KGs into LLMs as well as the potential role of LLMs in KG applications.
In this article, we present a forward-looking roadmap for unifying both LLMs and KGs, to leverage their respective strengths and overcome the limitations of each approach, for various downstream tasks. We propose detailed categorization, conduct comprehensive reviews, and pinpoint emerging directions in these fast-growing fields. Our main contributions are summarized as follows:
1. Roadmap. We present a forward-looking roadmap for integrating LLMs and KGs. Our roadmap, consisting of three general frameworks to unify LLMs and KGs, namely, KG-enhanced LLMs, LLM-augmented KGs, and Synergized LLMs + KGs, provides guidelines for the unification of these two distinct but complementary technologies.
1. Categorization and review. For each integration framework of our roadmap, we present a detailed categorization and novel taxonomies of research on unifying LLMs and KGs. In each category, we review the research from the perspectives of different integration strategies and tasks, which provides more insights into each framework.
1. Coverage of emerging advances. We cover the advanced techniques in both LLMs and KGs. We include the discussion of state-of-the-art LLMs like ChatGPT and GPT-4 as well as the novel KGs e.g., multi-modal knowledge graphs.
1. Summary of challenges and future directions. We highlight the challenges in existing research and present several promising future research directions.
The rest of this article is organized as follows. Section 2 first explains the background of LLMs and KGs. Section 3 introduces the roadmap and the overall categorization of this article. Section 4 presents the different KGs-enhanced LLM approaches. Section 5 describes the possible LLM-augmented KG methods. Section 6 shows the approaches of synergizing LLMs and KGs. Section 7 discusses the challenges and future research directions. Finally, Section 8 concludes this paper.
## 2 Background
In this section, we first briefly introduce a few representative large language models (LLMs) and discuss prompt engineering, which efficiently applies LLMs to a variety of tasks. Then, we illustrate the concept of knowledge graphs (KGs) and present different categories of KGs.
<details>
<summary>x1.png Details</summary>

### Visual Description
## Diagram: Timeline of Large Language Model Architectures and Evolution (2018-2023)
### Overview
This image is a technical timeline diagram illustrating the evolution and relationships of major Large Language Models (LLMs) from 2018 to 2023. It categorizes models into three primary architectural families: **Decoder-only**, **Encoder-decoder**, and **Encoder-only**. The diagram shows model names, their approximate parameter counts, and their lineage or influence via connecting arrows. A legend distinguishes between Open-Source and Closed-Source models.
### Components/Axes
1. **Main Sections (Vertical Division):**
* **Top Section:** Labeled **"Decoder-only"** on the left. Contains a simplified architecture diagram: `Input Text` -> `Decoder` -> `Output Text`.
* **Middle Section:** Labeled **"Encoder-decoder"** on the left. Contains a simplified architecture diagram: `Input Text` -> `Encoder` -> `Features` -> `Decoder` -> `Output Text`.
* **Bottom Section:** Labeled **"Encoder-only"** on the left. Contains a simplified architecture diagram: `Input Text` -> `Encoder` -> `Features`.
2. **Timeline (Horizontal Axis):**
* Located at the very bottom of the diagram.
* Marked with years in boxes: `2018`, `2019`, `2020`, `2021`, `2022`, `2023`.
* A red arrow extends from left to right, indicating chronological progression.
3. **Legend:**
* Positioned in the **bottom-right corner** of the main chart area.
* **Yellow-filled box:** `Open-Source`
* **White box with yellow border:** `Closed-Source`
4. **Model Nodes & Connections:**
* Models are represented as colored boxes (purple for Decoder-only, green for Encoder-decoder, red for Encoder-only).
* Each box contains the model name and its approximate parameter count (e.g., `GPT-3 175B`).
* Arrows connect models to show derivation, inspiration, or evolutionary path.
### Detailed Analysis
#### **Decoder-only Section (Purple)**
* **Trend:** Shows a clear progression towards models with vastly increasing parameter counts over time.
* **Models & Timeline Placement (Approximate):**
* **2018:** `GPT-1` (110M)
* **2019:** `GPT-2` (117M-1.5B), `XLNet` (110M-340M)
* **2020:** `GPT-3` (175B), `Gopher` (280B)
* **2021:** `GLaM` (1.2T), `LaMDA` (137B), `PaLM` (540B)
* **2022:** `ChatGPT` (175B), `OPT` (175B) -> `OPT-IML` (175B), `Flan PaLM` (540B), `LLaMa` (7B-65B)
* **2023:** `GPT-4` (Unknown), `Bard` (137B), `Vicuna` (7B), `Alpaca` (7B-13B)
* **Key Connections:** `GPT-1` -> `GPT-2` -> `GPT-3` -> `ChatGPT` -> `GPT-4`. `PaLM` leads to `Flan PaLM` and influences `LLaMa`, which in turn leads to `Vicuna` and `Alpaca`.
#### **Encoder-decoder Section (Green)**
* **Trend:** Shows branching evolution from foundational models like T5 and BART.
* **Models & Timeline Placement (Approximate):**
* **2019:** `BART` (140M), `T5` (80M-11B)
* **2020:** `T0` (11B), `mT5` (300M-13B)
* **2021:** `GLM` (110M-10B), `Switch` (1.6T)
* **2022:** `ST-MoE` (4.1B-269B), `UL2` (20B), `Flan-T5` (80M-11B), `GLM-130B` (130B)
* **2023:** `Flan-UL2` (20B)
* **Key Connections:** `T5` is a central node, leading to `T0`, `mT5`, and `Flan-T5`. `GLM` appears as a separate branch.
#### **Encoder-only Section (Red)**
* **Trend:** Focus on feature extraction models, with parameter sizes generally smaller than later decoder-only models.
* **Models & Timeline Placement (Approximate):**
* **2018:** `BERT` (110M-340M)
* **2019:** `DistillBert` (66M), `RoBERTa` (125M-355M)
* **2020:** `ALBERT` (11M-223M), `ERNIE` (114M), `ELECTRA` (14M-110M), `DeBERTa` (44M-304M)
* **Key Connections:** `BERT` is the foundational model, leading to `DistillBert`, `RoBERTa`, `ALBERT`, and `ELECTRA`.
### Key Observations
1. **Architectural Shift:** The diagram visually emphasizes the recent dominance and scale of **Decoder-only** models (e.g., GPT series, PaLM) compared to the other architectures, especially from 2021 onwards.
2. **Parameter Scale Explosion:** There is a clear trend of rapidly increasing model sizes, from millions (M) in 2018-2019 to hundreds of billions (B) and even trillions (T) of parameters by 2021-2022 (e.g., `GLaM` 1.2T, `Switch` 1.6T).
3. **Open vs. Closed Source:** The legend indicates a mix. Foundational models like `BERT`, `T5`, and `LLaMa` are open-source, while many of the largest and most prominent models like `GPT-3/4`, `ChatGPT`, `PaLM`, and `Bard` are closed-source.
4. **Fine-tuning & Derivatives:** The diagram highlights the trend of creating instruction-tuned or derivative versions of base models (e.g., `T5` -> `Flan-T5`, `PaLM` -> `Flan PaLM`, `UL2` -> `Flan-UL2`, `LLaMa` -> `Vicuna`/`Alpaca`).
### Interpretation
This diagram serves as a **genealogy of modern LLMs**, mapping their technical lineage and the rapid pace of development in the field. It demonstrates that the field has largely converged on the decoder-only transformer architecture for generating text, as exemplified by the GPT series and its successors. The explosive growth in parameter count suggests a prevailing research hypothesis that scale is a primary driver of capability. The presence of both open and closed-source models illustrates a dual ecosystem: closed models pushing the absolute frontier of scale and capability, while open models foster reproducibility, accessibility, and community-driven innovation (e.g., the `LLaMa` -> `Alpaca`/`Vicuna` branch). The timeline underscores that significant architectural diversification (Encoder-only, Encoder-decoder) occurred earlier (2018-2020), while the most recent years (2021-2023) are characterized by scaling and refining the decoder-only paradigm.
</details>
Figure 2: Representative large language models (LLMs) in recent years. Open-source models are represented by solid squares, while closed-source models are represented by hollow squares.
<details>
<summary>x2.png Details</summary>

### Visual Description
## Diagram: Transformer Architecture (Encoder-Decoder with Multi-Head Attention)
### Overview
This image is a technical diagram illustrating the high-level architecture of a Transformer model, a foundational neural network architecture for sequence-to-sequence tasks like machine translation. It visually decomposes the model into its primary Encoder and Decoder blocks and provides an expanded view of the internal Self-Attention mechanism.
### Components/Axes
The diagram is organized into three main visual regions from left to right:
1. **Encoder Block (Left):**
* A yellow rounded rectangle labeled **"Encoder"** at the top.
* Contains two internal sub-layer boxes:
* A white box labeled **"Feed Forward"**.
* A light orange box labeled **"Self-Attention"**.
* An upward-pointing arrow emerges from the top of the Encoder block.
* An upward-pointing arrow enters the bottom of the Encoder block.
* A rightward-pointing arrow connects the Encoder block to the Decoder block.
2. **Decoder Block (Center):**
* A yellow rounded rectangle labeled **"Decoder"** at the top.
* Contains three internal sub-layer boxes:
* A white box labeled **"Feed Forward"**.
* A light orange box labeled **"Encoder-Decoder Attention"**.
* A light orange box labeled **"Self-Attention"**.
* An upward-pointing arrow emerges from the top of the Decoder block.
* An upward-pointing arrow enters the bottom of the Decoder block.
* A dashed line connects the Decoder's "Self-Attention" box to the expanded view on the right.
3. **Expanded Self-Attention Mechanism (Right):**
* A large, light-gray rounded rectangle labeled **"Self-Attention"** at the top.
* This block details the components of the Multi-Head Attention sub-layer.
* **Internal Components (from bottom to top):**
* Three small white boxes at the bottom, each labeled **"Linear"**.
* Below these boxes are the input labels: **"V"**, **"Q"**, and **"K"** (from left to right), with upward arrows pointing to their respective Linear boxes.
* Upward arrows from the three Linear boxes point to a large purple box labeled **"Multi-head Dot-Product Attention"**.
* An upward arrow from the purple box points to a white box labeled **"Concat"**.
* An upward arrow from the "Concat" box points to a final white box labeled **"Linear"**.
* An upward arrow emerges from the top of the entire "Self-Attention" block.
### Detailed Analysis
* **Data Flow:** The diagram depicts a clear sequential and hierarchical flow.
1. Input enters the **Encoder** from the bottom, passes through its Self-Attention and Feed Forward layers, and exits from the top.
2. The Encoder's output is fed sideways into the **Decoder**.
3. The Decoder processes its own input (from below) and the Encoder's output through three layers: its own Self-Attention, the Encoder-Decoder Attention (which uses the Encoder's output), and a Feed Forward network.
4. The final output exits the Decoder from the top.
* **Component Relationships:** The dashed line explicitly links the abstract "Self-Attention" box within the Decoder to its detailed implementation on the right, showing that the right-hand block is a "zoom-in" of that component.
* **Attention Mechanism Details:** The expanded view shows that the Multi-Head Attention mechanism consists of:
* Three separate linear projections for the Value (**V**), Query (**Q**), and Key (**K**) vectors.
* The core **Multi-head Dot-Product Attention** operation.
* A **Concat** (concatenation) operation to combine the outputs from multiple attention heads.
* A final **Linear** projection layer.
### Key Observations
* **Structural Symmetry:** The Encoder and Decoder share a similar internal structure with "Self-Attention" and "Feed Forward" layers, highlighting the modular design.
* **Critical Distinction:** The Decoder contains an additional, unique layer: **"Encoder-Decoder Attention"**. This is the component that allows the Decoder to focus on relevant parts of the input sequence (from the Encoder) while generating the output sequence.
* **Visual Coding:** Color is used functionally:
* Yellow: Main architectural blocks (Encoder, Decoder).
* Light Orange: Attention-based sub-layers.
* Purple: The core multi-head attention operation.
* White: Feed-forward and linear transformation layers.
* **Spatial Grounding:** The legend/labels are integrated directly into the components they describe. The expanded Self-Attention view is positioned to the right of the Decoder, connected by a dashed line originating from the corresponding sub-layer.
### Interpretation
This diagram is a canonical representation of the Transformer architecture introduced in the paper "Attention Is All You Need." It demonstrates the model's core innovation: replacing recurrent layers entirely with attention mechanisms.
* **What it demonstrates:** The architecture enables parallel processing of sequences (unlike RNNs) and captures long-range dependencies effectively through self-attention. The Encoder creates a contextual representation of the input, while the Decoder generates the output one element at a time, using both its own previous outputs (via self-attention) and the Encoder's representation (via encoder-decoder attention).
* **Relationships:** The flow shows a clear separation of concerns. The Encoder is responsible for understanding the input. The Decoder is responsible for generating the output, guided by the Encoder's understanding. The Multi-Head Attention is the fundamental computational engine within both, allowing the model to jointly attend to information from different representation subspaces at different positions.
* **Significance:** This specific diagram is foundational for understanding modern large language models (LLMs). It visually explains how the model processes information in parallel and how the decoder "attends to" the encoder's output, which is the basis for tasks like translation, summarization, and text generation. The expanded view of Multi-Head Attention is crucial for understanding the model's ability to capture complex relationships within the data.
</details>
Figure 3: An illustration of the Transformer-based LLMs with self-attention mechanism.
### 2.1 Large Language models (LLMs)
Large language models (LLMs) pre-trained on large-scale corpora have shown great potential in various NLP tasks [13]. As shown in Fig. 3, most LLMs derive from the Transformer design [50], which contains encoder and decoder modules empowered by a self-attention mechanism. Based on their architectures, LLMs can be categorized into three groups: 1) encoder-only LLMs, 2) encoder-decoder LLMs, and 3) decoder-only LLMs. As shown in Fig. 2, we summarize several representative LLMs with different model architectures, model sizes, and open-source availability.
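As a simplified sketch of the self-attention computation in Fig. 3 (a single head with no masking; the projection matrices are random here purely for illustration):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv               # linear projections
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # pairwise attention scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted sum of values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                         # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one contextualized vector per token
```

Each output row mixes information from every input token, which is what lets Transformers capture long-range dependencies in parallel.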
#### 2.1.1 Encoder-only LLMs.
Encoder-only large language models use only the encoder to encode the sentence and understand the relationships between words. The common training paradigm for these models is to predict masked words in an input sentence. This method is unsupervised and can be trained on large-scale corpora. Encoder-only LLMs like BERT [1], ALBERT [51], RoBERTa [2], and ELECTRA [52] require adding an extra prediction head to resolve downstream tasks. These models are most effective for tasks that require understanding the entire sentence, such as text classification [26] and named entity recognition [53].
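The masked-word objective can be sketched as follows (an illustrative toy, not the paper's code: it only generates a BERT-style training pair, masking about 15% of tokens and recording the originals as prediction targets):

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=1):
    """Replace a fraction of tokens with [MASK]; return (input, targets)."""
    rng = random.Random(seed)
    inputs, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            inputs.append("[MASK]")
            targets[i] = tok   # the model must recover the original token
        else:
            inputs.append(tok)
    return inputs, targets

sentence = "knowledge graphs store rich factual knowledge".split()
masked, labels = mask_tokens(sentence)
print(masked, labels)
```

The encoder sees the corrupted input and is trained to predict each entry of `labels`, which forces it to model bidirectional context.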
#### 2.1.2 Encoder-decoder LLMs.
Encoder-decoder large language models adopt both the encoder and decoder modules. The encoder module is responsible for encoding the input sentence into a hidden space, and the decoder is used to generate the target output text. The training strategies for encoder-decoder LLMs can be more flexible. For example, T5 [3] is pre-trained by masking and predicting spans of masked words. UL2 [54] unifies several training targets such as different masking spans and masking frequencies. Encoder-decoder LLMs (e.g., T0 [55], ST-MoE [56], and GLM-130B [57]) are able to directly resolve tasks that generate sentences based on some context, such as summarization, translation, and question answering [58].
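The span-masking objective used by T5 can be sketched as follows (an illustrative toy, not the paper's code: contiguous spans are replaced by sentinel tokens in the encoder input, and the decoder target reconstructs them):

```python
def corrupt_spans(tokens, spans):
    """Replace each (start, end) span with a sentinel; build the target."""
    inputs, target = [], []
    i, sid = 0, 0
    for start, end in sorted(spans):
        inputs.extend(tokens[i:start])
        sentinel = f"<extra_id_{sid}>"
        inputs.append(sentinel)          # encoder sees only the sentinel
        target.append(sentinel)
        target.extend(tokens[start:end]) # decoder must emit the hidden span
        i, sid = end, sid + 1
    inputs.extend(tokens[i:])
    return inputs, target

tokens = "the encoder maps text into a hidden space".split()
inp, tgt = corrupt_spans(tokens, [(1, 2), (5, 7)])
print(inp)  # ['the', '<extra_id_0>', 'maps', 'text', 'into', '<extra_id_1>', 'space']
print(tgt)  # ['<extra_id_0>', 'encoder', '<extra_id_1>', 'a', 'hidden']
```

Because whole spans (not single tokens) are hidden, the decoder must generate variable-length text, matching the generative tasks these models resolve.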
#### 2.1.3 Decoder-only LLMs.
Decoder-only large language models adopt only the decoder module to generate the target output text. The training paradigm for these models is to predict the next word in the sentence. Large-scale decoder-only LLMs can generally perform downstream tasks from a few examples or simple instructions, without adding prediction heads or fine-tuning [59]. Many state-of-the-art LLMs (e.g., ChatGPT [60] and GPT-4, https://openai.com/product/gpt-4) follow the decoder-only architecture. However, since these models are closed-source, it is challenging for academic researchers to conduct further research. Recently, Alpaca (https://github.com/tatsu-lab/stanford_alpaca) and Vicuna (https://lmsys.org/blog/2023-03-30-vicuna/) have been released as open-source decoder-only LLMs. These models are fine-tuned from LLaMA [61] and achieve performance comparable to ChatGPT and GPT-4.
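The next-word objective can be made concrete with a deliberately tiny stand-in (a bigram count model instead of a neural network; the corpus and names are illustrative, not from the paper):

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count which word follows which, as a toy next-word predictor."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent continuation of `word`."""
    return counts[word].most_common(1)[0][0]

corpus = ["large language models generate text",
          "large language models predict the next word"]
model = train_bigram(corpus)
print(predict_next(model, "language"))  # prints "models"
```

Decoder-only LLMs optimize the same "predict what comes next" objective, only with a Transformer conditioned on the entire preceding context rather than a single previous word.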
#### 2.1.4 Prompt Engineering
Prompt engineering is a novel field that focuses on creating and refining prompts to maximize the effectiveness of large language models (LLMs) across various applications and research areas [62]. As shown in Fig. 4, a prompt is a sequence of natural language input for LLMs that is specified for a task, such as sentiment classification. A prompt can contain several elements, i.e., 1) Instruction, 2) Context, and 3) Input Text. Instruction is a short sentence that instructs the model to perform a specific task. Context provides the context for the input text or few-shot examples. Input Text is the text that needs to be processed by the model.
Prompt engineering seeks to improve the capacity of large language models (e.g., ChatGPT) on diverse complex tasks such as question answering, sentiment classification, and commonsense reasoning. Chain-of-thought (CoT) prompting [63] enables complex reasoning capabilities through intermediate reasoning steps. Prompt engineering also enables the integration of structural data like knowledge graphs (KGs) into LLMs. Li et al. [64] simply linearize KGs and use templates to convert them into passages. Mindmap [65] designs a KG prompt that converts graph structure into a mind map on which LLMs can perform reasoning. Prompting offers a simple way to utilize the potential of LLMs without fine-tuning. Proficiency in prompt engineering leads to a better understanding of the strengths and weaknesses of LLMs.
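Assembling the three prompt elements described above into a single input string can be sketched as follows (a minimal sketch; the helper and its format follow the sentiment-classification example of Fig. 4):

```python
def build_prompt(instruction, context_examples, input_text):
    """Concatenate Instruction, few-shot Context, and Input Text."""
    lines = [instruction, ""]
    for text, label in context_examples:          # few-shot demonstrations
        lines += [f"Text: {text}", f"Sentiment: {label}"]
    lines += [f"Text: {input_text}", "Sentiment:"]  # the model completes this
    return "\n".join(lines)

prompt = build_prompt(
    "Classify the text into neutral, negative or positive.",
    [("This is awesome!", "Positive"), ("This is bad!", "Negative")],
    "I think the vacation is okay.",
)
print(prompt)
```

Ending the prompt with the bare `Sentiment:` field steers the LLM to continue with a label, which is the mechanism behind few-shot prompting without any fine-tuning.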
<details>
<summary>x3.png Details</summary>

### Visual Description
## Diagram: LLM Prompt Processing for Sentiment Classification
### Overview
This image is a flowchart or process diagram illustrating how a Large Language Model (LLM) processes a structured prompt to perform a sentiment classification task. The diagram shows the flow from input components, through the LLM, to a final output.
### Components/Axes
The diagram is organized vertically with a clear upward flow indicated by arrows. The components are color-coded and labeled.
**Spatial Layout:**
* **Bottom Region:** Contains the "Input Text" component.
* **Middle Region (within a dashed box):** Contains the "Context" and "Instruction" components, collectively labeled as the "Prompt" on the right side.
* **Upper-Middle Region:** Contains the central "LLMs" processing block.
* **Top Region:** Contains the final "Output" component.
**Component Details (from bottom to top):**
1. **Input Text (Orange Box, Bottom):**
* **Label (Left):** `Input Text` (in orange text).
* **Content (Inside Box):**
* `Text: I think the vacation is okay.`
* `Sentiment:`
2. **Prompt (Dashed Box Enclosure):**
* A dashed gray line encloses the "Instruction" and "Context" boxes.
* A right curly brace `}` on the right side of this dashed box is labeled `Prompt` in black text.
3. **Context (Purple Box, Middle of dashed area):**
* **Label (Left):** `Context` (in purple text).
* **Content (Inside Box):**
* `Text: This is awesome!`
* `Sentiment: Positive`
* `Text: This is bad!`
* `Sentiment: Negative`
4. **Instruction (Red Box, Top of dashed area):**
* **Label (Left):** `Instruction` (in red text).
* **Content (Inside Box):** `Classify the text into neutral, negative or positive.`
5. **LLMs (Yellow Box, Center):**
* A large, central yellow rectangle with rounded corners.
* **Label (Inside Box):** `LLMs` in bold black text.
* An upward-pointing arrow connects the dashed "Prompt" box to the bottom of this "LLMs" box.
6. **Output (White Box, Top):**
* A white rectangle with a black border.
* **Label (Left):** `Output` in black text.
* **Content (Inside Box):** `Positive`
* An upward-pointing arrow connects the top of the "LLMs" box to the bottom of this "Output" box.
### Detailed Analysis
The diagram explicitly defines the structure of a prompt used to instruct an LLM. The prompt is composed of three distinct parts:
1. **Instruction:** A direct command specifying the task (`Classify the text into neutral, negative or positive.`).
2. **Context:** Few-shot examples providing the model with demonstrations of the task format and expected output. It includes two text-sentiment pairs.
3. **Input Text:** The new, unseen data point (`I think the vacation is okay.`) for which the model must generate a prediction.
The flow is linear and unidirectional: The combined Prompt (Instruction + Context + Input Text) is fed into the LLMs block. The LLM processes this structured input and generates a single-word classification, `Positive`, as the Output.
### Key Observations
* **Structured Prompting:** The diagram highlights a "few-shot" or "in-context learning" prompting technique, where examples are provided within the prompt itself.
* **Color-Coding:** Each component type (Input Text, Context, Instruction) is consistently color-coded (orange, purple, red) both in its label and its box border.
* **Task Specificity:** The instruction is explicit and limits the output space to three categories: neutral, negative, or positive.
* **Example Discrepancy:** The "Context" provides examples for "Positive" and "Negative" sentiments but does not include an example for "Neutral," which is listed as a possible classification in the instruction.
* **Output Format:** The final output is a single categorical label, matching the format demonstrated in the context examples.
### Interpretation
This diagram serves as a clear educational or technical schematic for how modern LLMs can be directed to perform specific NLP tasks through carefully constructed prompts. It demonstrates the principle of **in-context learning**, where the model learns the task from the provided examples within the prompt, without requiring weight updates.
The relationship between components is hierarchical and sequential: The **Instruction** defines the goal, the **Context** provides the pattern, and the **Input Text** is the subject of the operation. The **LLMs** block acts as the inference engine that maps the structured input to the correct output category.
A notable point for investigation is the **absence of a "Neutral" example** in the context. While the instruction includes "neutral" as a possible class, the model is only shown examples of "Positive" and "Negative." This could lead to ambiguity or bias in the model's classification of truly neutral statements, as it has not been explicitly shown what a neutral output looks like in this specific format. The chosen input text, "I think the vacation is okay," is itself a potential candidate for a neutral sentiment, making the example particularly relevant for testing the model's generalization beyond its provided context. The diagram implies the model correctly outputs "Positive," suggesting it may interpret "okay" as leaning positive, or that the lack of a neutral example influences its decision boundary.
</details>
Figure 4: An example of sentiment classification prompt.
<details>
<summary>x4.png Details</summary>

### Visual Description
## [Diagram Type]: Knowledge Graph Types (Encyclopedic, Commonsense, Domain-specific, Multi-modal)
### Overview
The image is a vertical diagram divided into four horizontal sections (separated by dashed lines), each illustrating a type of **knowledge graph** (a structured representation of entities and their relationships). Sections are labeled on the left: *Encyclopedic*, *Commonsense*, *Domain-specific*, and *Multi-modal*. Each section includes a title, a graph with nodes (entities) and edges (relationships), and visual elements (icons, images) to contextualize the knowledge type.
### Components/Axes (Sections)
The diagram is segmented into four horizontal regions (top to bottom):
#### 1. Encyclopedic Knowledge Graphs (Top Section)
- **Left Icon**: Wikipedia logo (puzzle globe with “W” and multilingual symbols) labeled *“Wikipedia”*.
- **Graph Nodes** (color-coded):
- Green: *Barack Obama*, *Michelle Obama* (people).
- Purple: *Honolulu*, *Washington D.C.* (places).
- Blue: *USA* (organization).
- **Graph Edges** (relationships):
- *BornIn* (Barack Obama → Honolulu), *PoliticianOf* (Barack Obama → USA), *MarriedTo* (Barack Obama → Michelle Obama), *LiveIn* (Michelle Obama → USA), *LocatedIn* (Honolulu → USA), *CapitalOf* (Washington D.C. → USA).
#### 2. Commonsense Knowledge Graphs (Second Section)
- **Title**: *“Concept: Wake up”* (centered top).
- **Graph Nodes** (color-coded):
- Red: *Wake up* (central concept).
- Blue: *Bed*, *Open eyes*, *Get out of bed*, *Drink coffee*, *Awake*, *Make coffee*, *Coffee*, *Kitchen*, *Sugar*, *Cup*, *Drink* (everyday actions/objects).
- **Graph Edges** (relationships):
- *LocatedAt* (Bed → Wake up), *SubeventOf* (Open eyes → Wake up), *SubeventOf* (Get out of bed → Wake up), *SubeventOf* (Drink coffee → Wake up), *Causes* (Drink coffee → Awake), *SubeventOf* (Make coffee → Drink coffee), *IsFor* (Coffee → Make coffee), *LocatedAt* (Coffee → Kitchen), *Need* (Coffee → Sugar), *Need* (Coffee → Cup), *Is* (Coffee → Drink).
#### 3. Domain-specific Knowledge Graphs (Third Section)
- **Title**: *“Medical Knowledge Graph”* (centered top).
- **Graph Nodes** (color-coded):
- Green: *PINK1* (genetic factor).
- Blue: *Parkinson’s Disease*, *Sleeping Disorder*, *Pervasive Developmental Disorder* (diseases/disorders).
- Purple: *Motor Symptom*, *Tremor*, *Anxiety*, *Language Undevelopment* (symptoms).
- **Graph Edges** (relationships):
- *Cause* (PINK1 → Parkinson’s Disease), *Lead* (Parkinson’s Disease → Motor Symptom), *Lead* (Parkinson’s Disease → Tremor), *Cause* (Parkinson’s Disease → Sleeping Disorder), *Cause* (Sleeping Disorder → Anxiety), *Lead* (Anxiety → Pervasive Developmental Disorder), *Lead* (Pervasive Developmental Disorder → Language Undevelopment).
#### 4. Multi-modal Knowledge Graphs (Bottom Section)
- **Visual Elements**:
- Left: Image of the *Eiffel Tower* (labeled *“Eiffel Tower”*).
- Icons: French flag (left of *France* node), EU flag (right of *European Union* node), photo of *Emmanuel Macron* (right of his node).
- **Graph Nodes** (blue): *Eiffel Tower*, *Paris*, *France*, *European Union*, *Emmanuel Macron*.
- **Graph Edges** (relationships):
- *LocatedIn* (Eiffel Tower → Paris), *CapitalOf* (Paris → France), *MemberOf* (France → European Union), *PoliticianOf* (Emmanuel Macron → France), *LiveIn* (Emmanuel Macron → Paris).
### Key Observations
- **Color Coding**: Distinguishes entity types (e.g., green = people, purple = places, blue = organizations/diseases).
- **Relationship Types**: Edges define spatial (*LocatedIn*), causal (*Cause*), hierarchical (*SubeventOf*), and functional (*Need*, *IsFor*) relationships.
- **Multi-modal Integration**: Combines text nodes with visual elements (images, flags) to represent entities (e.g., Eiffel Tower image, French flag).
### Interpretation
This diagram illustrates how knowledge is structured across domains:
- **Encyclopedic**: Captures factual, general knowledge (e.g., Wikipedia-style entities/relationships).
- **Commonsense**: Encodes everyday reasoning (e.g., “waking up” involves subevents like “opening eyes” or “drinking coffee”).
- **Domain-specific**: Specialized for fields like medicine, capturing causal links (e.g., *PINK1* → *Parkinson’s Disease* → *Tremor*).
- **Multi-modal**: Integrates text and visual information (e.g., Eiffel Tower image + “LocatedIn Paris”), enabling tasks like image-text understanding.
Each type serves a unique purpose: encyclopedic for general knowledge, commonsense for AI reasoning, domain-specific for expert systems, and multi-modal for cross-modal tasks. The diagram emphasizes the diversity of knowledge representation, showing how structure (nodes, edges, visuals) adapts to the information’s nature.
</details>
Figure 5: Examples of different categories’ knowledge graphs, i.e., encyclopedic KGs, commonsense KGs, domain-specific KGs, and multi-modal KGs.
### 2.2 Knowledge Graphs (KGs)
Knowledge graphs (KGs) store structured knowledge as a collection of triples $\mathcal{KG}=\{(h,r,t)\}\subseteq\mathcal{E}\times\mathcal{R}\times\mathcal{E}$, where $\mathcal{E}$ and $\mathcal{R}$ denote the sets of entities and relations, respectively. Existing knowledge graphs (KGs) can be classified into four groups based on the stored information: 1) encyclopedic KGs, 2) commonsense KGs, 3) domain-specific KGs, and 4) multi-modal KGs. We illustrate examples of KGs from these categories in Fig. 5.
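Under this definition, a KG can be represented directly as a set of (h, r, t) triples, with the entity and relation sets induced from them. A toy sketch (entity and relation names follow the encyclopedic example in Fig. 5):

```python
# A toy KG as a set of (head, relation, tail) triples.
kg = {
    ("Barack Obama", "BornIn", "Honolulu"),
    ("Barack Obama", "MarriedTo", "Michelle Obama"),
    ("Honolulu", "LocatedIn", "USA"),
}

# The entity set E and relation set R are induced from the triples.
entities = {h for h, _, _ in kg} | {t for _, _, t in kg}
relations = {r for _, r, _ in kg}

def facts_about(head):
    """Return all triples whose head entity matches `head`."""
    return [(h, r, t) for (h, r, t) in kg if h == head]
```

Real KGs store billions of such triples in dedicated graph databases, but the abstraction is the same: facts are edges between entities, labeled by relations.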
#### 2.2.1 Encyclopedic Knowledge Graphs.
Encyclopedic knowledge graphs are the most ubiquitous KGs, representing general knowledge about the real world. They are often constructed by integrating information from diverse and extensive sources, including human experts, encyclopedias, and databases. Wikidata [20] is one of the most widely used encyclopedic knowledge graphs, incorporating varieties of knowledge extracted from articles on Wikipedia. Other typical encyclopedic knowledge graphs, like Freebase [66], DBpedia [67], and YAGO [31], are also derived from Wikipedia. In addition, NELL [32] is a continuously improving encyclopedic knowledge graph that automatically extracts knowledge from the web and uses that knowledge to improve its performance over time. Several encyclopedic knowledge graphs are available in languages other than English, such as CN-DBpedia [68] and Vikidia [69]. The largest knowledge graph, named Knowledge Occean (KO) (https://ko.zhonghuapu.com/), currently contains 487,843,636 entities and 1,731,158,349 relations in both English and Chinese.
#### 2.2.2 Commonsense Knowledge Graphs.
Commonsense knowledge graphs formulate knowledge about daily concepts, e.g., objects and events, as well as their relationships [70]. Compared with encyclopedic knowledge graphs, commonsense knowledge graphs often model tacit knowledge extracted from text, such as (Car, UsedFor, Drive). ConceptNet [71] contains a wide range of commonsense concepts and relations, which can help computers understand the meanings of the words people use. ATOMIC [72, 73] and ASER [74] focus on the causal effects between events, which can be used for commonsense reasoning. Other commonsense knowledge graphs, such as TransOMCS [75] and CausalBank [76], are automatically constructed to provide commonsense knowledge.
#### 2.2.3 Domain-specific Knowledge Graphs
Domain-specific knowledge graphs are often constructed to represent knowledge in a specific domain, e.g., medical, biology, and finance [23]. Compared with encyclopedic knowledge graphs, domain-specific knowledge graphs are often smaller in size, but more accurate and reliable. For example, UMLS [77] is a domain-specific knowledge graph in the medical domain, which contains biomedical concepts and their relationships. In addition, there are some domain-specific knowledge graphs in other domains, such as finance [78], geology [79], biology [80], chemistry [81] and genealogy [82].
<details>
<summary>x5.png Details</summary>

### Visual Description
## Diagram: Paradigms for Integrating Large Language Models (LLMs) and Knowledge Graphs (KGs)
### Overview
The image is a technical diagram illustrating three distinct paradigms for the interaction and integration between Large Language Models (LLMs) and Knowledge Graphs (KGs). It is divided into three horizontally arranged sections, labeled a, b, and c, each depicting a different architectural relationship and flow of information.
### Components/Axes
The diagram uses consistent visual elements:
* **Boxes:** Two types of rounded rectangles represent the core components.
* **Blue Boxes:** Labeled "KGs" (Knowledge Graphs).
* **Yellow Boxes:** Labeled "LLMs" (Large Language Models).
* **Arrows:** Indicate the direction of information flow or influence.
* **Straight, Downward Arrows:** Show the flow of capabilities or knowledge from one component to another.
* **Straight, Horizontal Arrows:** Indicate input and output for a process.
* **Curved Arrows (in section c):** Represent a bidirectional, synergistic exchange.
* **Text Labels:** Provide titles, component names, and descriptive bullet points.
### Detailed Analysis
The diagram is segmented into three independent models:
**a. KG-enhanced LLMs (Left Section)**
* **Layout:** A blue "KGs" box is positioned at the top. A downward arrow points to a yellow "LLMs" box below it.
* **Flow:** Text enters the LLM box from the left ("Text Input"), and an output exits to the right ("Output").
* **Capabilities from KGs:** The arrow from KGs to LLMs is annotated with a bulleted list describing the knowledge transferred:
* Structural Fact
* Domain-specific Knowledge
* Symbolic-reasoning
* ... (ellipsis indicating additional items)
**b. LLM-augmented KGs (Center Section)**
* **Layout:** A yellow "LLMs" box is positioned at the top. A downward arrow points to a blue "KGs" box below it.
* **Flow:** "KG-related Tasks" enter the KG box from the left, and an "Output" exits to the right.
* **Capabilities from LLMs:** The arrow from LLMs to KGs is annotated with a bulleted list describing the capabilities provided:
* General Knowledge
* Language Processing
* Generalizability
* ... (ellipsis indicating additional items)
**c. Synergized LLMs + KGs (Right Section)**
* **Layout:** A yellow "LLMs" box and a blue "KGs" box are placed side-by-side at the same horizontal level.
* **Flow:** Two curved arrows create a closed loop between them.
* A **blue, curved arrow** flows from the top of the "KGs" box to the top of the "LLMs" box. It is labeled **"Factual Knowledge"**.
* An **orange, curved arrow** flows from the bottom of the "LLMs" box to the bottom of the "KGs" box. It is labeled **"Knowledge Representation"**.
### Key Observations
1. **Directional Dependency:** Sections (a) and (b) depict a unidirectional, hierarchical relationship where one system enhances the other. Section (c) depicts a bidirectional, peer-to-peer relationship.
2. **Role Specialization:** The bullet points explicitly define the complementary strengths each system contributes: KGs provide structured, factual, and symbolic knowledge, while LLMs provide broad, procedural, and linguistic capabilities.
3. **Visual Consistency:** The color coding (blue for KGs, yellow for LLMs) is maintained across all three panels, allowing for easy comparison of the structural differences between paradigms.
4. **Process Context:** Sections (a) and (b) include explicit input/output labels ("Text Input", "KG-related Tasks", "Output"), grounding the abstract models in a practical processing context. Section (c) abstracts this into a continuous cycle of knowledge exchange.
### Interpretation
This diagram serves as a conceptual framework for understanding the evolving field of neuro-symbolic AI, specifically the integration of neural networks (LLMs) and symbolic knowledge bases (KGs).
* **Paradigm (a) - KG-enhanced LLMs:** This represents a "knowledge injection" approach. The goal is to ground the vast but potentially unstructured and hallucination-prone knowledge of an LLM with the precise, structured facts from a KG to improve accuracy and reliability, especially for domain-specific or factual question-answering.
* **Paradigm (b) - LLM-augmented KGs:** This represents an "automation and enrichment" approach. Here, the LLM's powerful language understanding and generation capabilities are used to automate labor-intensive KG tasks like entity linking, relation extraction, schema induction, and query generation, making KG construction and maintenance more scalable.
* **Paradigm (c) - Synergized LLMs + KGs:** This is the most advanced vision, proposing a continuous, co-evolutionary loop. The KG provides verified **Factual Knowledge** to constrain and inform the LLM's outputs. In return, the LLM helps structure unstructured data into formal **Knowledge Representation** (e.g., generating RDF triples or updating ontologies) to expand and refine the KG. This creates a self-improving system where each component addresses the other's weaknesses: the KG provides precision and explainability, while the LLM provides flexibility and scalability.
The progression from (a) to (c) illustrates a shift from using one system as a tool for the other towards a deeply integrated partnership, aiming to combine the reasoning strengths of symbolic AI with the generative and adaptive strengths of modern deep learning.
</details>
Figure 6: The general roadmap of unifying KGs and LLMs. (a.) KG-enhanced LLMs. (b.) LLM-augmented KGs. (c.) Synergized LLMs + KGs.
#### 2.2.4 Multi-modal Knowledge Graphs.
Unlike conventional knowledge graphs that only contain textual information, multi-modal knowledge graphs represent facts in multiple modalities such as images, sounds, and videos [83]. For example, IMGpedia [84], MMKG [85], and Richpedia [86] incorporate both the text and image information into the knowledge graphs. These knowledge graphs can be used for various multi-modal tasks such as image-text matching [87], visual question answering [88], and recommendation [89].
TABLE I: Representative applications of using LLMs and KGs.
| Name | Category | LLMs | KGs | URL |
| --- | --- | --- | --- | --- |
| ChatGPT/GPT-4 | Chat Bot | ✓ | | https://shorturl.at/cmsE0 |
| ERNIE 3.0 | Chat Bot | ✓ | ✓ | https://shorturl.at/sCLV9 |
| Bard | Chat Bot | ✓ | ✓ | https://shorturl.at/pDLY6 |
| Firefly | Photo Editing | ✓ | | https://shorturl.at/fkzJV |
| AutoGPT | AI Assistant | ✓ | | https://shorturl.at/bkoSY |
| Copilot | Coding Assistant | ✓ | | https://shorturl.at/lKLUV |
| New Bing | Web Search | ✓ | | https://shorturl.at/bimps |
| Shop.ai | Recommendation | ✓ | | https://shorturl.at/alCY7 |
| Wikidata | Knowledge Base | | ✓ | https://shorturl.at/lyMY5 |
| KO | Knowledge Base | | ✓ | https://shorturl.at/sx238 |
| OpenBG | Recommendation | | ✓ | https://shorturl.at/pDMV9 |
| Doctor.ai | Health Care Assistant | ✓ | ✓ | https://shorturl.at/dhlK0 |
### 2.3 Applications
LLMs and KGs have been widely applied in various real-world applications. We summarize some representative applications of LLMs and KGs in Table I. ChatGPT/GPT-4 are LLM-based chatbots that can communicate with humans in a natural dialogue format. To improve the knowledge awareness of LLMs, ERNIE 3.0 and Bard incorporate KGs into their chatbot applications. Instead of a chatbot, Firefly develops a photo editing application that allows users to edit photos using natural language descriptions. Copilot, New Bing, and Shop.ai adopt LLMs to empower their applications in the areas of coding assistance, web search, and recommendation, respectively. Wikidata and KO are two representative knowledge graph applications that are used to provide external knowledge. OpenBG [90] is a knowledge graph designed for recommendation. Doctor.ai develops a health care assistant that incorporates LLMs and KGs to provide medical advice.
## 3 Roadmap & Categorization
In this section, we first present a roadmap of explicit frameworks that unify LLMs and KGs. Then, we present a categorization of research on unifying LLMs and KGs.
### 3.1 Roadmap
The roadmap of unifying KGs and LLMs is illustrated in Fig. 6. In the roadmap, we identify three frameworks for the unification of LLMs and KGs, including KG-enhanced LLMs, LLM-augmented KGs, and Synergized LLMs + KGs. The KG-enhanced LLMs and LLM-augmented KGs are two parallel frameworks that aim to enhance the capabilities of LLMs and KGs, respectively. Building upon these frameworks, Synergized LLMs + KGs is a unified framework that aims to synergize LLMs and KGs to mutually enhance each other.
#### 3.1.1 KG-enhanced LLMs
LLMs are renowned for their ability to learn knowledge from large-scale corpora and achieve state-of-the-art performance in various NLP tasks. However, LLMs are often criticized for their hallucination issues [15] and lack of interpretability. To address these issues, researchers have proposed to enhance LLMs with knowledge graphs (KGs).
KGs store enormous knowledge in an explicit and structured way, which can be used to enhance the knowledge awareness of LLMs. Some researchers have proposed incorporating KGs into LLMs during the pre-training stage, which can help LLMs learn knowledge from KGs [35, 91]. Other researchers have proposed incorporating KGs into LLMs during the inference stage; retrieving knowledge from KGs can significantly improve the performance of LLMs in accessing domain-specific knowledge [92]. To improve the interpretability of LLMs, researchers also utilize KGs to interpret the facts [14] and the reasoning process of LLMs [38].
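The inference-stage idea can be made concrete with a hedged sketch: retrieve triples about the entities mentioned in a question and prepend them to the prompt. The toy KG, the substring-matching retrieval rule, and the prompt template below are illustrative assumptions, not a specific published method:

```python
def retrieve_facts(kg, question):
    """Return triples whose head or tail entity is mentioned in the question."""
    q = question.lower()
    return [(h, r, t) for (h, r, t) in kg
            if h.lower() in q or t.lower() in q]

def knowledge_augmented_prompt(kg, question):
    """Prepend retrieved KG facts to the question before querying an LLM."""
    facts = retrieve_facts(kg, question)
    fact_lines = "\n".join(f"({h}, {r}, {t})" for h, r, t in facts)
    return f"Knowledge:\n{fact_lines}\n\nQuestion: {question}\nAnswer:"

# Toy domain-specific KG, following the medical example in Fig. 5.
kg = [("PINK1", "Cause", "Parkinson's Disease"),
      ("Parkinson's Disease", "Lead", "Tremor")]
prompt = knowledge_augmented_prompt(
    kg, "What symptom can Parkinson's Disease lead to?")
```

Production systems replace the substring match with entity linking and graph queries, but the principle is the same: the LLM answers conditioned on explicitly retrieved facts rather than only on its parametric knowledge.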
#### 3.1.2 LLM-augmented KGs
KGs store structured knowledge, which plays an essential role in many real-world applications [19]. Existing methods in KGs fall short in handling incomplete KGs [33] and in processing text corpora to construct KGs [93]. Given the generalizability of LLMs, many researchers are trying to harness the power of LLMs to address KG-related tasks.
The most straightforward way is to apply LLMs as text encoders for KG-related tasks. Researchers take advantage of LLMs to process the textual corpus in KGs and then use the representations of the text to enrich KG representations [94]. Some studies also use LLMs to process the original corpus and extract relations and entities for KG construction [95]. Recent studies try to design a KG prompt that can effectively convert structured KGs into a format that LLMs can comprehend. In this way, LLMs can be directly applied to KG-related tasks, e.g., KG completion [96] and KG reasoning [97].
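The conversion step above, verbalizing structured triples into text an LLM can comprehend, can be sketched with simple relation templates in the spirit of the linearization approach of [64]. The template strings and the fallback pattern are illustrative assumptions:

```python
# Illustrative templates mapping relation names to sentence patterns.
TEMPLATES = {
    "BornIn": "{h} was born in {t}.",
    "CapitalOf": "{h} is the capital of {t}.",
    "LocatedIn": "{h} is located in {t}.",
}

def linearize(triples):
    """Verbalize KG triples into a natural-language passage."""
    sentences = []
    for h, r, t in triples:
        # Fall back to a generic pattern for relations without a template.
        template = TEMPLATES.get(r, "{h} has relation " + r + " with {t}.")
        sentences.append(template.format(h=h, t=t))
    return " ".join(sentences)

passage = linearize([("Paris", "CapitalOf", "France"),
                     ("Eiffel Tower", "LocatedIn", "Paris")])
```

The resulting passage can then be placed in the context element of a prompt, letting a frozen LLM perform completion or reasoning over the KG content as ordinary text.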
<details>
<summary>x6.png Details</summary>

### Visual Description
## System Architecture Diagram: Synergized AI Framework
### Overview
This image is a technical system architecture diagram illustrating a four-layered framework for artificial intelligence systems. It depicts the flow from raw data, through a synergistic core model and various techniques, to final applications. The diagram emphasizes the integration of Large Language Models (LLMs) and Knowledge Graphs (KGs) as the central, synergistic component.
### Components/Axes
The diagram is structured into four horizontal layers, stacked vertically. From bottom to top, they are:
1. **Data Layer**: The foundational input layer.
2. **Synergized Model Layer**: The core processing layer showing the interaction between LLMs and KGs.
3. **Technique Layer**: The methodological layer listing various AI techniques.
4. **Application Layer**: The top layer representing end-user applications.
Upward-pointing arrows connect each layer to the one above it, indicating a flow of information or capability from data to application.
**Detailed Component List:**
* **Data Layer (Bottom):**
* A dashed-line box labeled "Data" on the left.
* Inside the box, four rounded rectangles represent data types: "Structural Fact", "Text Corpus", "Image", "Video".
* An ellipsis ("...") follows "Video", indicating other possible data types.
* **Synergized Model Layer (Center):**
* A large dashed-line box labeled "Synergized Model" on the left.
* **Central Components:** Two colored rectangles.
* A **yellow** rectangle labeled **"LLMs"**.
* A **light blue** rectangle labeled **"KGs"**.
* **Bidirectional Arrows:** Two curved arrows connect the LLMs and KGs boxes.
* A **blue arrow** curves from the top of the KGs box to the top of the LLMs box.
* An **orange arrow** curves from the bottom of the LLMs box to the bottom of the KGs box.
* **Attribute Lists:**
* To the **left of the LLMs box**, a bulleted list: "General Knowledge", "Language Processing", "Generalizability".
* To the **right of the KGs box**, a bulleted list: "Explicit Knowledge", "Domain-specific Knowledge", "Decisiveness", "Interpretability".
* **Technique Layer (Upper Middle):**
* A dashed-line box labeled "Technique" on the left.
* Inside, six rounded rectangles in two rows list techniques:
* Top Row: "Prompt Engineering", "Graph Neural Network", "In-context Learning".
* Bottom Row: "Representation Learning", "Neural-symbolic Reasoning", "Few-shot Learning".
* **Application Layer (Top):**
* A dashed-line box labeled "Application" on the left.
* Inside, four rounded rectangles represent applications: "Search Engine", "Recommender System", "Dialogue System", "AI Assistant".
* An ellipsis ("...") follows "AI Assistant", indicating other possible applications.
### Detailed Analysis
The diagram presents a clear hierarchical and relational structure.
* **Flow Direction:** The primary flow is upward, from the **Data** layer, through the **Synergized Model**, enabled by various **Techniques**, to produce **Applications**.
* **Core Synergy:** The central and most detailed component is the **Synergized Model**. The bidirectional arrows between **LLMs** (yellow) and **KGs** (blue) signify a two-way, mutually reinforcing integration.
* The **blue arrow (KGs → LLMs)** suggests that Knowledge Graphs provide structured, explicit knowledge to enhance LLMs.
* The **orange arrow (LLMs → KGs)** suggests that LLMs provide language processing and generalization capabilities to enhance or populate Knowledge Graphs.
* **Attribute Mapping:** The bulleted lists explicitly define the complementary strengths each component brings to the synergy:
* LLMs contribute broad, implicit knowledge and language fluency.
* KGs contribute precise, structured, and domain-specific knowledge with higher interpretability.
* **Technique Support:** The **Technique** layer lists methods (e.g., Graph Neural Networks, Neural-symbolic Reasoning) that likely facilitate the integration and operation of the synergized LLM+KG model.
* **Extensibility:** The ellipses ("...") in the **Data** and **Application** layers explicitly indicate that the framework is designed to be extensible to other data modalities and application domains.
### Key Observations
1. **Central Synergy:** The diagram's visual focus is the LLM-KGs interaction, highlighted by their central placement, color, and the detailed attribute lists and connecting arrows.
2. **Layered Abstraction:** The architecture follows a classic layered pattern, abstracting complexity from raw data (bottom) to user-facing applications (top).
3. **Complementary Design:** The framework is explicitly designed to combine the strengths of connectionist AI (LLMs) with symbolic AI (KGs), as evidenced by the attribute lists and the "Neural-symbolic Reasoning" technique.
4. **Bidirectional Enhancement:** The relationship between LLMs and KGs is not a simple pipeline but a continuous feedback loop, as shown by the two opposing arrows.
### Interpretation
This diagram illustrates a **modern AI system architecture aimed at overcoming the limitations of standalone LLMs**. The core premise is that while LLMs excel at language tasks and general knowledge, they lack precision, structured reasoning, and domain-specific expertise. Knowledge Graphs provide these missing elements.
The framework suggests that the future of robust AI applications lies in **hybrid systems**. By synergizing LLMs and KGs, the system aims to achieve:
* **More Accurate and Grounded Outputs:** KGs can "ground" LLM responses in verified facts, reducing hallucinations.
* **Enhanced Reasoning:** The combination allows for both statistical pattern recognition (LLMs) and logical, rule-based inference (KGs).
* **Interpretability:** The structured nature of KGs can help explain the reasoning behind an LLM's output.
The **Technique** layer acts as the toolbox for building this synergy. For example, "Graph Neural Networks" can be used to process the KG structure, while "Prompt Engineering" and "In-context Learning" are key to effectively querying and utilizing the LLM within this integrated system.
Ultimately, the diagram maps a path from heterogeneous **Data** to sophisticated **Applications** (like advanced AI Assistants or Search Engines) by leveraging a **synergistic core model** that is more capable than its individual parts. It represents a shift towards composite, neuro-symbolic AI systems.
</details>
Figure 7: The general framework of the Synergized LLMs + KGs, which contains four layers: 1) Data, 2) Synergized Model, 3) Technique, and 4) Application.
#### 3.1.3 Synergized LLMs + KGs
The synergy of LLMs and KGs has attracted increasing attention from researchers in recent years [40, 42]. LLMs and KGs are two inherently complementary techniques, which should be unified into a general framework to mutually enhance each other.
To further explore the unification, we propose a unified framework of the synergized LLMs + KGs in Fig. 7. The unified framework contains four layers: 1) Data, 2) Synergized Model, 3) Technique, and 4) Application. In the Data layer, LLMs and KGs are used to process the textual and structural data, respectively. With the development of multi-modal LLMs [98] and KGs [99], this framework can be extended to process multi-modal data, such as video, audio, and images. In the Synergized Model layer, LLMs and KGs could synergize with each other to improve their capabilities. In the Technique layer, related techniques that have been used in LLMs and KGs can be incorporated into this framework to further enhance the performance. In the Application layer, LLMs and KGs can be integrated to address various real-world applications, such as search engines [100], recommender systems [10], and AI assistants [101].
<details>
<summary>x7.png Details</summary>

### Visual Description
## Diagram: Taxonomy of LLMs Meet KGs
### Overview
This image is a hierarchical tree diagram (mind map) illustrating the taxonomy of research areas at the intersection of Large Language Models (LLMs) and Knowledge Graphs (KGs). The central theme is "LLMs Meet KGs," which branches into three primary categories, each further subdivided into specific research directions and applications. The diagram uses color-coding to distinguish the main branches.
### Components/Axes
* **Central Node (Root):** "LLMs Meet KGs" (Dark blue box, left side).
* **Primary Branches (Level 1):**
1. **KG-enhanced LLMs** (Yellow box, top branch).
2. **LLM-augmented KGs** (Light blue box, middle branch).
3. **Synergized LLMs + KGs** (Teal box, bottom branch).
* **Secondary Branches (Level 2):** Each primary branch splits into 2-5 sub-categories, represented by lighter-colored boxes connected by lines.
* **Tertiary Branches (Level 3):** Some secondary branches further split into specific techniques or tasks, shown in the lightest-colored boxes.
* **Spatial Layout:** The diagram flows from left (root) to right (leaves). The legend is implicit in the color-coding of the boxes and connecting lines, which consistently group related concepts.
### Detailed Analysis
The diagram systematically breaks down the field into three main paradigms:
**1. KG-enhanced LLMs (Yellow Branch):** Focuses on using Knowledge Graphs to improve LLMs.
* **KG-enhanced LLM pre-training:**
* Integrating KGs into training objective
* Integrating KGs into LLM inputs
* KGs Instruction-tuning
* **KG-enhanced LLM inference:**
* Retrieval-augmented knowledge fusion
* KGs Prompting
* **KG-enhanced LLM interpretability:**
* KGs for LLM probing
* KGs for LLM analysis
**2. LLM-augmented KGs (Light Blue Branch):** Focuses on using LLMs to improve Knowledge Graph tasks.
* **LLM-augmented KG embedding** (Note: Corrected from "emebedding" in the source image.):
* LLMs as text encoders
* LLMs for joint text and KG embedding
* **LLM-augmented KG completion:**
* LLMs as encoders
* LLMs as generators
* **LLM-augmented KG construction:**
* Entity discovery
* Relation extraction
* Coreference resolution
* End-to-End KG construction
* Distilling KGs from LLMs
* **LLM-augmented KG to text generation:**
* Leveraging knowledge from LLMs
* LLMs for constructing KG-text aligned Corpus
* **LLM-augmented KG question answering:**
* LLMs as entity/relation extractors
* LLMs as answer reasoners
**3. Synergized LLMs + KGs (Teal Branch):** Focuses on the mutual integration and co-evolution of both technologies.
* **Synergized Knowledge Representation**
* **Synergized Reasoning:**
* LLM-KG fusion reasoning
* LLMs as agents reasoning
### Key Observations
* **Structured Taxonomy:** The diagram presents a clear, three-pronged classification of the research landscape.
* **Directionality of Enhancement:** The first two branches are asymmetric: one uses KGs to help LLMs, the other uses LLMs to help KGs. The third branch proposes a more balanced, synergistic relationship.
* **Granularity:** The "LLM-augmented KGs" branch is the most detailed, suggesting a wide range of established or emerging tasks in this area.
* **Typographical Error:** The term "emebedding" under the light blue branch is a misspelling of "embedding."
### Interpretation
This diagram serves as a conceptual map for understanding how two powerful AI technologies—LLMs (parametric, generative knowledge) and KGs (structured, symbolic knowledge)—can be combined. It moves beyond a simple "A + B" model to show distinct research philosophies:
1. **KG-enhanced LLMs** represents the "knowledge injection" paradigm, aiming to ground LLMs in factual, structured knowledge to improve accuracy, reduce hallucinations, and enhance interpretability.
2. **LLM-augmented KGs** represents the "automation and scaling" paradigm, leveraging the linguistic and reasoning prowess of LLMs to build, complete, and query knowledge graphs more efficiently.
3. **Synergized LLMs + KGs** represents the "unified intelligence" paradigm, envisioning a future where the two forms of knowledge are deeply integrated for more robust reasoning and representation, potentially leading to systems that combine the flexibility of neural networks with the precision of symbolic AI.
The taxonomy highlights that the field is not monolithic but consists of complementary approaches targeting different stages of the AI pipeline (pre-training, inference, construction, reasoning). The detailed breakdown under "LLM-augmented KGs" indicates this is a particularly active area of applied research.
</details>
Figure 8: Fine-grained categorization of research on unifying large language models (LLMs) with knowledge graphs (KGs).
### 3.2 Categorization
To better understand the research on unifying LLMs and KGs, we further provide a fine-grained categorization for each framework in the roadmap. Specifically, we focus on different ways of integrating KGs and LLMs, i.e., KG-enhanced LLMs, LLM-augmented KGs, and Synergized LLMs + KGs. The fine-grained categorization of the research is illustrated in Fig. 8.
KG-enhanced LLMs. Integrating KGs can enhance the performance and interpretability of LLMs in various downstream tasks. We categorize the research on KG-enhanced LLMs into three groups:
1. KG-enhanced LLM pre-training includes works that apply KGs during the pre-training stage and improve the knowledge expression of LLMs.
1. KG-enhanced LLM inference includes research that utilizes KGs during the inference stage of LLMs, which enables LLMs to access the latest knowledge without retraining.
1. KG-enhanced LLM interpretability includes works that use KGs to understand the knowledge learned by LLMs and interpret the reasoning process of LLMs.
LLM-augmented KGs. LLMs can be applied to augment various KG-related tasks. We categorize the research on LLM-augmented KGs into five groups based on the task types:
1. LLM-augmented KG embedding includes studies that apply LLMs to enrich representations of KGs by encoding the textual descriptions of entities and relations.
1. LLM-augmented KG completion includes papers that utilize LLMs to encode text or generate facts for better knowledge graph completion (KGC) performance.
1. LLM-augmented KG construction includes works that apply LLMs to address the entity discovery, coreference resolution, and relation extraction tasks for KG construction.
1. LLM-augmented KG-to-text Generation includes research that utilizes LLMs to generate natural language that describes the facts from KGs.
1. LLM-augmented KG question answering includes studies that apply LLMs to bridge the gap between natural language questions and answers retrieved from KGs.
Synergized LLMs + KGs. The synergy of LLMs and KGs aims to integrate LLMs and KGs into a unified framework to mutually enhance each other. In this categorization, we review the recent attempts of Synergized LLMs + KGs from the perspectives of knowledge representation and reasoning.
In the following sections (Sections 4, 5, and 6), we will provide details on these categorizations.
## 4 KG-enhanced LLMs
Large language models (LLMs) achieve promising results in many natural language processing tasks. However, LLMs have been criticized for their lack of practical knowledge and tendency to generate factual errors during inference. To address this issue, researchers have proposed integrating knowledge graphs (KGs) to enhance LLMs. In this section, we first introduce the KG-enhanced LLM pre-training, which aims to inject knowledge into LLMs during the pre-training stage. Then, we introduce the KG-enhanced LLM inference, which enables LLMs to consider the latest knowledge while generating sentences. Finally, we introduce the KG-enhanced LLM interpretability, which aims to improve the interpretability of LLMs by using KGs. Table II summarizes the typical methods that integrate KGs for LLMs.
TABLE II: Summary of KG-enhanced LLM methods.
| Task | Method | Year | KG | Technique |
|---|---|---|---|---|
| KG-enhanced LLM pre-training | ERNIE [35] | 2019 | E | Integrating KGs into Training Objective |
| | GLM [102] | 2020 | C | Integrating KGs into Training Objective |
| | Ebert [103] | 2020 | D | Integrating KGs into Training Objective |
| | KEPLER [40] | 2021 | E | Integrating KGs into Training Objective |
| | Deterministic LLM [104] | 2022 | E | Integrating KGs into Training Objective |
| | KALA [105] | 2022 | D | Integrating KGs into Training Objective |
| | WKLM [106] | 2020 | E | Integrating KGs into Training Objective |
| | K-BERT [36] | 2020 | E + D | Integrating KGs into Language Model Inputs |
| | CoLAKE [107] | 2020 | E | Integrating KGs into Language Model Inputs |
| | ERNIE3.0 [101] | 2021 | E + D | Integrating KGs into Language Model Inputs |
| | DkLLM [108] | 2022 | E | Integrating KGs into Language Model Inputs |
| | KP-PLM [109] | 2022 | E | KGs Instruction-tuning |
| | OntoPrompt [110] | 2022 | E + D | KGs Instruction-tuning |
| | ChatKBQA [111] | 2023 | E | KGs Instruction-tuning |
| | RoG [112] | 2023 | E | KGs Instruction-tuning |
| KG-enhanced LLM inference | KGLM [113] | 2019 | E | Retrieval-augmented knowledge fusion |
| | REALM [114] | 2020 | E | Retrieval-augmented knowledge fusion |
| | RAG [92] | 2020 | E | Retrieval-augmented knowledge fusion |
| | EMAT [115] | 2022 | E | Retrieval-augmented knowledge fusion |
| | Li et al. [64] | 2023 | C | KGs Prompting |
| | Mindmap [65] | 2023 | E + D | KGs Prompting |
| | ChatRule [116] | 2023 | E + D | KGs Prompting |
| | CoK [117] | 2023 | E + C + D | KGs Prompting |
| KG-enhanced LLM interpretability | LAMA [14] | 2019 | E | KGs for LLM probing |
| | LPAQA [118] | 2020 | E | KGs for LLM probing |
| | Autoprompt [119] | 2020 | E | KGs for LLM probing |
| | MedLAMA [120] | 2022 | D | KGs for LLM probing |
| | LLM-facteval [121] | 2023 | E + D | KGs for LLM probing |
| | KagNet [38] | 2019 | C | KGs for LLM analysis |
| | Interpret-lm [122] | 2021 | E | KGs for LLM analysis |
| | knowledge-neurons [39] | 2021 | E | KGs for LLM analysis |
| | Shaobo et al. [123] | 2022 | E | KGs for LLM analysis |
- E: Encyclopedic Knowledge Graphs, C: Commonsense Knowledge Graphs, D: Domain-Specific Knowledge Graphs.
### 4.1 KG-enhanced LLM Pre-training
Existing large language models mostly rely on unsupervised training on large-scale corpora. While these models may exhibit impressive performance on downstream tasks, they often lack practical knowledge relevant to the real world. Previous works that integrate KGs into large language models can be categorized into three parts: 1) Integrating KGs into the training objective, 2) Integrating KGs into LLM inputs, and 3) KGs Instruction-tuning.
#### 4.1.1 Integrating KGs into Training Objective
The research efforts in this category focus on designing novel knowledge-aware training objectives. An intuitive idea is to expose more knowledge entities in the pre-training objective. GLM [102] leverages the knowledge graph structure to assign a masking probability. Specifically, entities that can be reached within a certain number of hops are considered to be the most important entities for learning, and they are given a higher masking probability during pre-training. E-BERT [103] further controls the balance between the token-level and entity-level training losses. The training loss values are used as indications of the learning progress for tokens and entities, which dynamically determines their ratio for the next training epochs. SKEP [124] follows a similar approach to inject sentiment knowledge during LLM pre-training. SKEP first determines words with positive and negative sentiment by utilizing point-wise mutual information (PMI) along with a predefined set of seed sentiment words. Then, it assigns a higher masking probability to those identified sentiment words in the word masking objective.
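As a concrete illustration, the hop-based masking idea behind GLM can be sketched in a few lines of Python. The graph, token list, and probability values below are toy assumptions for illustration only, not GLM's actual implementation:

```python
def hop_neighbors(kg, seeds, max_hops):
    """Collect entities reachable from the seed entities within max_hops."""
    frontier, reached = set(seeds), set(seeds)
    for _ in range(max_hops):
        frontier = {nbr for e in frontier for nbr in kg.get(e, [])} - reached
        reached |= frontier
    return reached

def masking_probs(tokens, kg, mentioned, max_hops=2,
                  entity_prob=0.3, base_prob=0.15):
    """Assign a higher masking probability to tokens naming entities
    that the KG marks as important (reachable from mentioned entities)."""
    important = hop_neighbors(kg, mentioned, max_hops)
    return [entity_prob if tok in important else base_prob for tok in tokens]

# toy KG: entity -> neighboring entities
kg = {"Bob Dylan": ["Blowin' in the Wind", "Duluth"],
      "Duluth": ["Minnesota"]}
tokens = ["Bob Dylan", "wrote", "Blowin' in the Wind", "in", "1962"]
probs = masking_probs(tokens, kg, {"Bob Dylan"})
```

Tokens linked to KG entities near the mentioned entity receive the higher probability (0.3 here), while ordinary tokens keep the standard masking rate.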
The other line of work explicitly leverages the connections between knowledge and input text. As shown in Fig. 9, ERNIE [35] proposes a novel word-entity alignment training objective as a pre-training objective. Specifically, ERNIE feeds both sentences and the corresponding entities mentioned in the text into LLMs, and then trains the LLMs to predict alignment links between textual tokens and entities in knowledge graphs. Similarly, KALM [91] enhances the input tokens by incorporating entity embeddings and adds an entity prediction pre-training task in addition to the token-only pre-training objective. This approach aims to improve the ability of LLMs to capture knowledge related to entities. Finally, KEPLER [40] directly employs both a knowledge graph embedding training objective and a masked token pre-training objective on a shared transformer-based encoder. Deterministic LLM [104] focuses on pre-training language models to capture deterministic factual knowledge. It masks only spans whose answers are deterministic entities, and introduces additional clue contrastive learning and clue classification objectives. WKLM [106] first replaces entities in the text with other entities of the same type and then feeds them into LLMs. The model is further pre-trained to distinguish whether the entities have been replaced or not.
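At the data level, WKLM's replaced-entity detection objective reduces to a simple corruption-and-label routine. The following sketch is illustrative only; the entity typing, entity pool, and the -100 "ignore" label are assumptions, not WKLM's actual code:

```python
import random

def corrupt_entities(tokens, entity_types, type_pool, replace_prob=0.5, seed=0):
    """WKLM-style corruption: swap some entity mentions for another entity
    of the same type; the model is then trained to predict, per entity,
    whether it was replaced (label 1) or kept (label 0)."""
    rng = random.Random(seed)
    out, labels = [], []
    for tok in tokens:
        etype = entity_types.get(tok)
        if etype and rng.random() < replace_prob:
            candidates = [e for e in type_pool[etype] if e != tok]
            out.append(rng.choice(candidates))
            labels.append(1)                      # entity was replaced
        else:
            out.append(tok)
            labels.append(0 if etype else -100)   # -100: non-entity, ignored in loss
    return out, labels

entity_types = {"Honolulu": "CITY"}
type_pool = {"CITY": ["Honolulu", "Chicago", "Boston"]}
tokens = ["Obama", "was", "born", "in", "Honolulu"]
corrupted, labels = corrupt_entities(tokens, entity_types, type_pool,
                                     replace_prob=1.0)
```

Because the replacement entity has the same type, the corrupted sentence stays grammatical, forcing the model to rely on factual knowledge rather than surface cues to detect the swap.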
<details>
<summary>x8.png Details</summary>

### Visual Description
## Diagram: Text-Knowledge Alignment via LLMs
### Overview
This image is a technical diagram illustrating a process for aligning textual representations from a Large Language Model (LLM) with structured knowledge graph representations. It demonstrates how an input sentence is processed to create parallel representations for both its textual tokens and its constituent named entities, with a mechanism to align them.
### Components/Axes
The diagram is organized into three horizontal layers and two vertical columns.
**Top Layer (Titles):**
* **Left:** "Text Representations"
* **Center:** "Text-knowledge Alignment"
* **Right:** "Knowledge Graph Representations"
**Middle Layer (Processing & Representations):**
* **Central Block:** A large, yellow, horizontal rectangle labeled "**LLMs**". This is the core processing unit.
* **Left Column (Text Path):**
* A series of yellow, rounded rectangles above the LLM block, labeled sequentially: **h₁**, **h₂**, **h₃**, **h₄**, **...**, **h₉**. These represent the hidden state vectors (text representations) output by the LLM for each input token.
* Arrows point upward from the LLM block to each of these `h` boxes.
* **Right Column (Knowledge Graph Path):**
* Two blue, rounded rectangles above the LLM block, labeled: **hₑ₁** and **hₑ₂**. These represent the hidden state vectors for knowledge graph entities.
* Arrows point upward from the LLM block to each of these `hₑ` boxes.
* **Alignment Indicators:** Three dashed, black, curved arrows originate from the text representation boxes (**h₁**, **h₂**, **h₄**) and point towards the knowledge graph representation boxes (**hₑ₁**, **hₑ₂**). This visually depicts the "Text-knowledge Alignment" process.
**Bottom Layer (Inputs):**
* **Left Column (Text Sequence Input):**
* A series of yellow, rounded rectangles below the LLM block, containing the tokenized input text: "**Bob**", "**Dylan**", "**wrote**", "**blowin**", "**...**", "**1962**".
* The label "**Text Sequence**" is centered below this series.
* Arrows point upward from each token box into the LLM block.
* **Right Column (Entity Input):**
* Two blue, rounded rectangles below the LLM block, containing named entities: "**Bob Dylan**" and "**Blowin’ in the Wind**".
* The label "**Entitiy**" (note: likely a typo for "Entity") is centered below this series.
* Arrows point upward from each entity box into the LLM block.
**Footer (Example):**
* A line of text at the very bottom provides the concrete example being processed: "**Input Text: Bob Dylan wrote Blowin’ in the Wind in 1962**". The entities "Bob Dylan" and "Blowin’ in the Wind" are underlined in this sentence, corresponding to the blue entity boxes above.
### Detailed Analysis
The diagram depicts a dual-path processing flow:
1. **Text Processing Path (Left/Yellow):** The input sentence "Bob Dylan wrote Blowin’ in the Wind in 1962" is tokenized into a sequence (`Bob`, `Dylan`, `wrote`, `blowin`, ..., `1962`). Each token is fed into the LLM, which generates a corresponding contextualized text representation vector (`h₁` through `h₉`).
2. **Knowledge Graph Entity Path (Right/Blue):** Pre-identified named entities from the same sentence ("Bob Dylan", "Blowin’ in the Wind") are also processed by the LLM (or a linked component) to produce dedicated entity representation vectors (`hₑ₁`, `hₑ₂`).
3. **Alignment Mechanism (Center/Top):** The core concept is the "Text-knowledge Alignment." Dashed arrows show that specific text representations (e.g., `h₁` for "Bob", `h₂` for "Dylan", `h₄` for "blowin") are being aligned or mapped to their corresponding entity representations (`hₑ₁` for "Bob Dylan", `hₑ₂` for "Blowin’ in the Wind"). This suggests a method to ground the LLM's textual understanding in structured knowledge.
### Key Observations
* **Color Coding:** Yellow is consistently used for text/sequence elements (tokens, text representations). Blue is used for knowledge graph/entity elements (entity names, entity representations).
* **Spatial Grounding:** The "Text-knowledge Alignment" title is centered above the dashed arrows, which themselves span the gap between the left (text) and right (knowledge) columns, clearly indicating the bridging function.
* **Typo:** The label under the blue entity boxes reads "**Entitiy**" instead of "Entity".
* **Abstraction:** The use of "..." in both the text sequence (`...`) and the text representations (`...`) indicates this is a generalized model, not limited to the specific example sentence length.
### Interpretation
This diagram illustrates a method for **knowledge grounding** in Large Language Models. The core problem it addresses is that LLMs learn from text patterns but may not have a robust, structured understanding of real-world entities and facts.
The proposed solution involves:
1. **Dual Representation:** Creating parallel representations for both the raw text tokens and the formal knowledge graph entities mentioned in that text.
2. **Explicit Alignment:** Forcing or encouraging the model to align the hidden states of relevant text tokens (like "Bob" and "Dylan") with the hidden state of the corresponding knowledge entity ("Bob Dylan"). This acts as a bridge between the statistical patterns of language and a structured knowledge base.
The **significance** is that such a system could lead to LLMs that are more factual, less prone to hallucination, and better at tasks requiring precise knowledge (like question answering or information extraction), because their internal representations are explicitly tied to a knowledge graph. The diagram presents this as a modular addition or objective within the LLM framework, highlighting the flow of information from raw input to aligned, knowledge-aware representations.
</details>
Figure 9: Injecting KG information into LLMs training objective via text-knowledge alignment loss, where $h$ denotes the hidden representation generated by LLMs.
#### 4.1.2 Integrating KGs into LLM Inputs
As shown in Fig. 10, this line of research focuses on introducing relevant knowledge sub-graphs into the inputs of LLMs. Given a knowledge graph triple and the corresponding sentences, ERNIE 3.0 [101] represents the triple as a sequence of tokens and directly concatenates them with the sentences. It further randomly masks either the relation token in the triple or tokens in the sentences to better combine knowledge with textual representations. However, such direct concatenation of knowledge triples allows the tokens in the sentence to interact intensively with the tokens in the knowledge sub-graph, which could result in Knowledge Noise [36]. To solve this issue, K-BERT [36] takes the first step to inject the knowledge triple into the sentence via a visible matrix, where only the knowledge entities have access to the knowledge triple information, while the tokens in the sentence can only see each other in the self-attention module. To further reduce Knowledge Noise, CoLAKE [107] proposes a unified word-knowledge graph (shown in Fig. 10), in which the tokens in the input sentence form a fully connected word graph and tokens aligned with knowledge entities are connected with their neighboring entities.
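The visible matrix at the heart of K-BERT can be sketched as a boolean mask over the concatenated sentence and triple tokens. This is a simplified illustration of the masking pattern only (real K-BERT also adjusts soft position embeddings); the indices and token strings are invented:

```python
def visible_matrix(n_sent, triples):
    """K-BERT-style visibility mask over sentence tokens (indices
    0..n_sent-1) followed by the tokens of each injected triple.
    Sentence tokens all see each other; a triple's tokens see each
    other and the one sentence token (anchor entity) they hang off,
    but nothing else -- limiting Knowledge Noise."""
    n = n_sent + sum(len(toks) for _, toks in triples)
    vis = [[False] * n for _ in range(n)]
    for i in range(n_sent):                 # sentence block: fully visible
        for j in range(n_sent):
            vis[i][j] = True
    pos = n_sent
    for anchor, toks in triples:            # (anchor token index, triple tokens)
        idx = range(pos, pos + len(toks))
        for i in idx:
            for j in idx:
                vis[i][j] = True            # triple tokens see each other
            vis[i][anchor] = vis[anchor][i] = True  # and their anchor entity
        pos += len(toks)
    return vis

# 5 sentence tokens; a 2-token triple ("CEO", "Tim Cook") hangs off token 0
vis = visible_matrix(5, [(0, ["CEO", "Tim Cook"])])
```

In self-attention, positions where the mask is `False` are set to a large negative value before the softmax, so the injected triple influences only its anchor entity rather than every token in the sentence.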
The above methods can indeed inject a large amount of knowledge into LLMs. However, they mostly focus on popular entities and overlook the low-frequency and long-tail ones. DkLLM [108] aims to improve the LLM representations of such entities. DkLLM first proposes a novel measurement to determine long-tail entities and then replaces these selected entities in the text with pseudo token embeddings as new input to the large language models. Furthermore, Dict-BERT [125] proposes to leverage external dictionaries to solve this issue. Specifically, Dict-BERT improves the representation quality of rare words by appending their definitions from the dictionary at the end of the input text, and trains the language model to locally align rare word representations in input sentences with their dictionary definitions, as well as to discriminate whether the input text and definition are correctly mapped.
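The input construction behind Dict-BERT's idea can be sketched as follows; the rarity threshold, the `[SEP]`-based layout, and the toy dictionary are assumptions for illustration, not Dict-BERT's exact format:

```python
def append_definitions(tokens, dictionary, vocab_freq, rare_threshold=5):
    """Dict-BERT-style input construction: detect rare words (low corpus
    frequency) that have a dictionary entry, then append their definitions
    after the text so the model can align each rare word with its
    definition during pre-training."""
    rare = [t for t in tokens
            if vocab_freq.get(t, 0) < rare_threshold and t in dictionary]
    augmented = list(tokens)
    for word in rare:
        # layout: ... original text ... [SEP] word : definition tokens
        augmented += ["[SEP]", word, ":"] + dictionary[word].split()
    return augmented, rare

dictionary = {"serendipity": "finding something good without looking for it"}
vocab_freq = {"I": 1000, "love": 800, "serendipity": 2}
aug, rare = append_definitions(["I", "love", "serendipity"],
                               dictionary, vocab_freq)
```

The pre-training objectives then operate on pairs of positions: the rare word in the sentence and its copy in the appended definition (for alignment), plus shuffled word-definition pairs (for the discrimination task).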
<details>
<summary>x9.png Details</summary>

### Visual Description
## Diagram: LLM Integration with Text and Knowledge Graphs for Masked Prediction
### Overview
This diagram illustrates a conceptual framework for how Large Language Models (LLMs) can process and integrate information from both a raw text sequence and a structured knowledge graph to perform two distinct masked prediction tasks. The system uses a sentence from "Pride and Prejudice" as an example input.
### Components/Axes
The diagram is structured in three horizontal layers, flowing from bottom to top:
1. **Input Layer (Bottom):** Contains the source data.
* **Input Text:** A sentence at the very bottom: "Mr. Darcy gives Elizabeth a letter". The words "Mr. Darcy", "gives", "Elizabeth", and "letter" are underlined.
* **Text Graph:** A graph representation of the input sentence, located on the left. Nodes (yellow circles) represent words: "a", "gives", "letter", "Mr. Darcy", "Elizabeth". Lines connect all nodes, indicating a fully connected graph structure for this phrase.
* **Knowledge Graph:** A graph of entity relationships, located on the right. Nodes (blue circles) represent entities and relations: "Mother", "Jane", "Beloved", "Father", "Mr. Bennet". Lines connect "Mother" to "Jane", "Beloved", and "Father"; "Father" is also connected to "Mr. Bennet".
2. **Processing Layer (Middle):** Shows the formatted input sequences for the LLM.
* **Text Sequence:** Derived from the Text Graph. It is a sequence of tokens: "Mr. Darcy", "...", "[MASK]". An upward arrow points from the Text Graph to this sequence.
* **Entity Sequence:** Derived from the Knowledge Graph. It is a sequence of entity tokens: "Mother", "...", "[MASK]". An upward arrow points from the Knowledge Graph to this sequence.
* **LLMs:** A large, central yellow rectangle labeled "LLMs". Arrows point upward from each token in both the Text Sequence and Entity Sequence into this block.
3. **Output Layer (Top):** Shows the model's predictions.
* **Mask Text Prediction:** An arrow points from the LLMs block to a yellow box containing the word "letter". The label "Mask Text Prediction" is to its left.
* **Mask Entity Prediction:** An arrow points from the LLMs block to a blue box containing the name "Mr. Bennet". The label "Mask Entity Prediction" is to its left.
### Detailed Analysis
* **Flow and Relationships:** The diagram depicts a dual-pathway process. The input sentence is simultaneously parsed into a textual graph and used to query or align with a knowledge graph. Both representations are serialized into sequences (Text Sequence and Entity Sequence) that are fed into the LLM.
* **Masked Token Tasks:** The system is set up for two cloze-style (fill-in-the-blank) tasks:
1. **Textual Mask:** The `[MASK]` token in the Text Sequence corresponds to the final word in the input sentence. The LLM correctly predicts the masked word as "letter".
2. **Entity Mask:** The `[MASK]` token in the Entity Sequence corresponds to an entity related to "Elizabeth" via the "Father" relation in the Knowledge Graph. The LLM correctly predicts the masked entity as "Mr. Bennet".
* **Graph Structures:** The Text Graph is fully connected, suggesting a model that considers all pairwise relationships between words in the phrase. The Knowledge Graph shows a specific relational structure centered around family connections (Mother, Father) and a relationship (Beloved).
### Key Observations
* The diagram explicitly links the word "Elizabeth" in the Text Graph to the entity "Elizabeth" (implied, though not shown as a node) in the Knowledge Graph, which is then connected to "Father" and "Mr. Bennet".
* The use of color is consistent: yellow for text/word elements and blue for entity/knowledge elements.
* The "..." in both sequences indicates that there are intermediate tokens or entities not explicitly shown for simplicity.
### Interpretation
This diagram proposes a method for enhancing LLMs with structured knowledge. It suggests that by jointly processing a text and its corresponding knowledge graph, a model can perform more accurate and context-aware predictions. The example demonstrates two capabilities:
1. **Basic Language Modeling:** Predicting the next word ("letter") in a sentence.
2. **Knowledge-Aware Reasoning:** Inferring a missing entity ("Mr. Bennet") based on relational knowledge (that Mr. Bennet is Elizabeth's father), which is not explicitly stated in the input sentence "Mr. Darcy gives Elizabeth a letter."
The framework implies that the LLM uses the knowledge graph to fill gaps in its understanding, moving beyond simple statistical patterns in text to incorporate factual, relational information. This is a visual representation of neuro-symbolic integration, where neural networks (LLMs) are augmented with symbolic knowledge structures (Knowledge Graphs) for more robust reasoning.
</details>
Figure 10: Injecting KG information into LLMs inputs using graph structure.
#### 4.1.3 KGs Instruction-tuning
Instead of injecting factual knowledge into LLMs, the KGs Instruction-tuning aims to fine-tune LLMs to better comprehend the structure of KGs and effectively follow user instructions to conduct complex tasks. KGs Instruction-tuning utilizes both facts and the structure of KGs to create instruction-tuning datasets. LLMs finetuned on these datasets can extract both factual and structural knowledge from KGs, enhancing the reasoning ability of LLMs. KP-PLM [109] first designs several prompt templates to transfer structural graphs into natural language text. Then, two self-supervised tasks are proposed to finetune LLMs to further leverage the knowledge from these prompts. OntoPrompt [110] proposes an ontology-enhanced prompt-tuning that can place knowledge of entities into the context of LLMs, which are further finetuned on several downstream tasks. ChatKBQA [111] finetunes LLMs on KG structure to generate logical queries, which can be executed on KGs to obtain answers. To better reason on graphs, RoG [112] presents a planning-retrieval-reasoning framework. RoG is finetuned on KG structure to generate relation paths grounded by KGs as faithful plans. These plans are then used to retrieve valid reasoning paths from the KGs for LLMs to conduct faithful reasoning and generate interpretable results.
KGs Instruction-tuning can better leverage the knowledge from KGs for downstream tasks. However, it requires retraining the model, which is time-consuming and computationally expensive.
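A minimal sketch of how KG facts can be verbalized into instruction-tuning examples, in the spirit of KP-PLM's prompt templates; the template strings, relation names, and prompt wording are all invented for illustration:

```python
# hand-written templates per relation (KP-PLM-style; invented examples)
TEMPLATES = {
    "BornIn": "{head} was born in {tail}.",
    "LocatedIn": "{head} is located in {tail}.",
}

def verbalize_path(path):
    """Turn a KG relation path (a list of triples) into natural-language
    text via templates; unseen relations fall back to a generic pattern."""
    sentences = []
    for head, rel, tail in path:
        template = TEMPLATES.get(rel,
                                 "{head} has relation " + rel + " with {tail}.")
        sentences.append(template.format(head=head, tail=tail))
    return " ".join(sentences)

def build_instruction(question, path):
    """Package verbalized KG knowledge and a question into one
    instruction-tuning input (the target answer is supplied separately)."""
    return ("Given the facts: " + verbalize_path(path) +
            "\nAnswer the question: " + question)

prompt = build_instruction(
    "Which country is Obama from?",
    [("Obama", "BornIn", "Honolulu"), ("Honolulu", "LocatedIn", "USA")])
```

Fine-tuning on many such (prompt, answer) pairs is what lets the LLM internalize both the facts and the multi-hop structure of the KG.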
### 4.2 KG-enhanced LLM Inference
The above methods can effectively fuse knowledge into LLMs. However, real-world knowledge changes over time, and these approaches do not permit updates to the incorporated knowledge without retraining the model. As a result, they may not generalize well to unseen knowledge during inference [126]. Therefore, considerable research has been devoted to keeping the knowledge space and text space separate and injecting the knowledge during inference. These methods mostly focus on Question Answering (QA) tasks, because QA requires the model to capture both textual semantic meanings and up-to-date real-world knowledge.
#### 4.2.1 Retrieval-Augmented Knowledge Fusion
Retrieval-Augmented Knowledge Fusion is a popular method to inject knowledge into LLMs during inference. The key idea is to retrieve relevant knowledge from a large corpus and then fuse the retrieved knowledge into LLMs. As shown in Fig. 11, RAG [92] proposes to combine non-parametric and parametric modules to handle external knowledge. Given the input text, RAG first searches the non-parametric module via MIPS (maximum inner product search) to obtain several relevant documents. RAG then treats these documents as hidden variables $z$ and feeds them into the output generator, empowered by Seq2Seq LLMs, as additional context information. The research indicates that using different retrieved documents as conditions at different generation steps performs better than using only a single document to guide the whole generation process. The experimental results show that RAG outperforms other parametric-only and non-parametric-only baseline models in open-domain QA. RAG can also generate more specific, diverse, and factual text than other parametric-only baselines. Story-fragments [127] further improves this architecture by adding an additional module to determine salient knowledge entities and fuse them into the generator to improve the quality of generated long stories. EMAT [115] further improves the efficiency of such a system by encoding external knowledge into a key-value memory and exploiting fast maximum inner product search for memory querying. REALM [114] proposes a novel knowledge retriever to help the model retrieve and attend over documents from a large corpus during the pre-training stage, and successfully improves the performance of open-domain question answering. KGLM [113] selects the facts from a knowledge graph using the current context to generate factual sentences. With the help of an external knowledge graph, KGLM can describe facts using out-of-domain words or phrases.
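The retrieve-then-generate loop can be sketched end to end with brute-force MIPS standing in for the dense index, and placeholder encoder/generator functions; everything below (vector values, function names) is a toy assumption, not RAG's actual components:

```python
def mips(query_vec, doc_vecs, k=2):
    """Maximum inner product search over dense document vectors
    (a brute-force stand-in for an approximate dense index)."""
    scores = [(sum(q * d for q, d in zip(query_vec, vec)), i)
              for i, vec in enumerate(doc_vecs)]
    return [i for _, i in sorted(scores, reverse=True)[:k]]

def retrieve_then_generate(question, encode, docs, doc_vecs, generate, k=2):
    """RAG-style inference: retrieve the top-k documents as latent
    context z, then condition the generator on question + documents."""
    top = mips(encode(question), doc_vecs, k)
    return generate(question, [docs[i] for i in top])

# toy corpus and components (encoder and generator are placeholders)
docs = ["Obama was born in Honolulu.",
        "Honolulu is in the USA.",
        "Paris is in France."]
doc_vecs = [[1, 0], [1, 1], [0, 1]]
encode = lambda q: [1, 0.5]              # pretend dense question encoding
generate = lambda q, ctx: " ".join(ctx)  # pretend Seq2Seq generator
out = retrieve_then_generate("Which country is Obama from?",
                             encode, docs, doc_vecs, generate)
```

Because only the index and corpus hold the facts, updating knowledge means re-indexing documents rather than retraining the generator, which is exactly the property this subsection highlights.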
<details>
<summary>x10.png Details</summary>

### Visual Description
## Diagram: Knowledge Retrieval-Augmented Generation (RAG) System Flowchart
### Overview
This image is a technical flowchart illustrating a knowledge retrieval-augmented generation (RAG) system architecture. It demonstrates the process of answering a factual question by retrieving structured data from Knowledge Graphs (KGs) and using a Large Language Model (LLM) to generate a final answer, with a backpropagation mechanism for system improvement.
### Components/Axes
The diagram consists of several labeled components connected by directional arrows, indicating data flow. There are no traditional chart axes. The components are:
1. **Input Question (Q):** "Which country is Obama from?" (Position: Far left)
2. **KGs:** A light blue, rounded rectangle labeled "KGs" (Position: Top-left). This represents Knowledge Graphs, the source of structured data.
3. **Knowledge Retriever:** A white, rounded rectangle with a black border labeled "Knowledge Retriever" (Position: Center-left). It receives input from the question and the KGs.
4. **Retrieved Facts:** A section containing two light blue, rounded rectangles with black text, labeled collectively as "Retrieved Facts" (Position: Center). The facts are presented as triples:
* (Obama, BornIn, Honolulu)
* (Honolulu, LocatedIn, USA)
* An ellipsis "..." indicates additional retrieved facts.
5. **LLMs:** A yellow, rounded rectangle labeled "LLMs" (Position: Center-right). This represents the Large Language Model component.
6. **Output Answer (A):** "A: USA" (Position: Far right).
7. **Backpropagation:** A dashed line with an arrowhead, labeled "Backpropagation" (Position: Bottom, running from the output back to the Knowledge Retriever and LLMs).
### Detailed Analysis
The process flow is as follows:
1. **Input:** The system receives a natural language question: "Which country is Obama from?".
2. **Retrieval Phase:** The question is sent to the **Knowledge Retriever**. The retriever queries the **KGs** (Knowledge Graphs) to find relevant structured information.
3. **Fact Extraction:** The retriever outputs a set of **Retrieved Facts** in the form of subject-predicate-object triples. The explicit examples provided are:
* `(Obama, BornIn, Honolulu)`: States that Barack Obama was born in Honolulu.
* `(Honolulu, LocatedIn, USA)`: States that Honolulu is located in the USA.
The ellipsis implies the retrieval of other related triples (e.g., `(USA, IsA, Country)`).
4. **Generation Phase:** The retrieved facts are fed as context into the **LLMs** (Large Language Model).
5. **Output:** The LLM processes the question and the factual context to generate the final answer: "USA".
6. **Learning Loop:** A dashed line labeled **Backpropagation** connects the output back to both the **Knowledge Retriever** and the **LLMs**. This indicates a feedback mechanism where the system's performance can be used to update and improve both the retrieval and generation components.
### Key Observations
* **Hybrid Architecture:** The system combines symbolic AI (structured Knowledge Graphs) with neural AI (LLMs).
* **Explicit Reasoning Path:** The retrieved facts provide a clear, logical chain (Obama -> BornIn -> Honolulu -> LocatedIn -> USA) that justifies the final answer, enhancing interpretability.
* **Closed-Loop System:** The presence of backpropagation signifies this is a trainable system, not a static pipeline. Errors in the final answer can be used to refine the retriever's search and the LLM's reasoning.
* **Spatial Layout:** The flow is strictly left-to-right for the forward inference pass. The backpropagation loop is visually distinct at the bottom, emphasizing its role as a separate training/optimization phase.
### Interpretation
This diagram illustrates a core concept in modern AI: **Retrieval-Augmented Generation (RAG)**. The system is designed to overcome a key limitation of standalone LLMs—their potential for hallucination and lack of up-to-date or precise factual knowledge.
* **What it demonstrates:** It shows how grounding an LLM's response in externally retrieved, structured facts from a KG can lead to more accurate, verifiable, and trustworthy answers. The LLM's role shifts from being a sole repository of knowledge to being a reasoner that synthesizes provided information.
* **Relationships:** The Knowledge Retriever acts as a bridge between the unstructured query and the structured knowledge base. The LLM acts as the reasoning engine that interprets the retrieved facts in the context of the original question. The backpropagation link is critical, suggesting the entire system can be trained end-to-end, potentially optimizing what facts are retrieved and how they are used.
* **Significance:** This architecture is foundational for building AI assistants that can answer questions accurately about specific domains (using private KGs) or dynamic information (using constantly updated KGs), while maintaining the natural language fluency of LLMs. The example using Barack Obama is a simple, clear demonstration of multi-hop reasoning (connecting two facts) to derive an answer not explicitly stated in a single triple.
</details>
Figure 11: Retrieving external knowledge to enhance the LLM generation.
#### 4.2.2 KGs Prompting
To better feed the KG structure into the LLM during inference, KGs prompting aims to design crafted prompts that convert structured KGs into text sequences, which can be fed as context into LLMs. In this way, LLMs can better take advantage of the structure of KGs to perform reasoning. Li et al. [64] adopt a pre-defined template to convert each triple into a short sentence that LLMs can understand and reason over. MindMap [65] designs a KG prompt that converts graph structure into a mind map, enabling LLMs to perform reasoning by consolidating the facts in KGs with the implicit knowledge in LLMs. ChatRule [116] samples several relation paths from KGs, which are verbalized and fed into LLMs. The LLMs are then prompted to generate meaningful logical rules that can be used for reasoning. CoK [117] proposes chain-of-knowledge prompting, which uses a sequence of triples to elicit the reasoning ability of LLMs and reach the final answer.
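The template-based conversion can be sketched as follows. This is a minimal illustration, not the implementation of any cited system: the CamelCase-splitting rule and the prompt layout are assumptions made for the example.

```python
import re

def verbalize_triple(head: str, relation: str, tail: str) -> str:
    """Convert a (head, relation, tail) triple into a short sentence,
    splitting a CamelCase or snake_case relation into lower-case words."""
    words = re.findall(r"[A-Z][a-z]*|[a-z]+", relation) or [relation]
    return f"{head} {' '.join(w.lower() for w in words)} {tail}."

def build_prompt(triples, question: str) -> str:
    """Assemble verbalized facts and the question into a single LLM context."""
    facts = "\n".join(verbalize_triple(*t) for t in triples)
    return f"Facts:\n{facts}\nQuestion: {question}\nAnswer:"

triples = [("Obama", "BornIn", "Honolulu"), ("Honolulu", "LocatedIn", "USA")]
print(build_prompt(triples, "Which country is Obama from?"))
```

In practice, the resulting string would be passed to an LLM as its input context; only the verbalization step is shown here.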
KGs prompting presents a simple way to synergize LLMs and KGs. Through prompting, we can easily harness the power of LLMs to perform reasoning over KGs without retraining the models. However, prompts are usually designed manually, which requires substantial human effort.
### 4.3 Comparison between KG-enhanced LLM Pre-training and Inference
KG-enhanced LLM Pre-training methods commonly enrich large-scale unlabeled corpora with semantically relevant real-world knowledge. These methods allow knowledge representations to be aligned with the appropriate linguistic context and explicitly train LLMs to leverage that knowledge from scratch. When the resulting LLMs are applied to downstream knowledge-intensive tasks, they should achieve optimal performance. In contrast, KG-enhanced LLM inference methods only present the knowledge to LLMs at the inference stage; the underlying LLMs may not be trained to fully exploit this knowledge when conducting downstream tasks, potentially resulting in sub-optimal model performance.
However, real-world knowledge is dynamic and requires frequent updates. Despite being effective, KG-enhanced LLM Pre-training methods do not permit knowledge updates or editing without model re-training. As a result, they could generalize poorly to recent or unseen knowledge. KG-enhanced LLM inference methods can easily accommodate knowledge updates by changing the inference inputs, which helps improve LLM performance on new knowledge and domains.
In summary, the choice between these methods depends on the application scenario. If one wishes to apply LLMs to time-insensitive knowledge in particular domains (e.g., commonsense and reasoning knowledge), KG-enhanced LLM Pre-training methods should be considered. Otherwise, KG-enhanced LLM inference methods can be used to handle open-domain knowledge with frequent updates.
<details>
<summary>extracted/5367551/figs/LLM_probing.png Details</summary>

### Visual Description
## Diagram: Knowledge Graph (KG) Integration with LLMs for Question Answering
### Overview
The image is a technical diagram illustrating a process flow where a Knowledge Graph (KG) is used to generate factual questions, which are then answered by Large Language Models (LLMs). The process includes a validation step where the LLM's prediction is checked against the original KG fact. The diagram is composed of a central KG subgraph, connected via labeled arrows to three processing blocks: a Question Generator, LLMs, and an Answer/Validation block.
### Components/Axes
The diagram is segmented into four primary regions:
1. **Top-Left (Dashed Box):** A Knowledge Graph (KG) subgraph.
2. **Top-Right:** A "Question Generator" block.
3. **Bottom-Right:** A yellow "LLMs" block.
4. **Bottom-Left:** An "Answer: President" block.
**Labels and Text Elements:**
* **KGs Box Title:** "KGs"
* **KG Nodes (Circles):** "President", "Obama", "Hawaii", "USA", "1776"
* **KG Edges (Arrows with Labels):**
* From "Obama" to "President": "Profession"
* From "Obama" to "Hawaii": "BronIn" (Note: Likely a typo for "BornIn")
* From "Obama" to "USA": "Country"
* From "1776" to "USA": "FoundedIn"
* **Flow Labels:**
* Arrow from KG to Question Generator: "Fact" with the text "(Obama, Profession, President)" above it.
* Arrow from Question Generator to LLMs: "Obama's profession is [MASK]."
* Arrow from LLMs to Answer block: "Prediction"
* Arrow from KG to Answer block: "Validation"
* **Block Labels:**
* "Question Generator"
* "LLMs"
* "Answer: President"
### Detailed Analysis
**1. Knowledge Graph (KG) Subgraph:**
* **Central Entity:** "Obama" is the central node.
* **Relationships:**
* `Obama --(Profession)--> President`: States Obama's profession.
* `Obama --(BronIn)--> Hawaii`: States Obama's birthplace (with a noted typo).
* `Obama --(Country)--> USA`: States Obama's country.
* `1776 --(FoundedIn)--> USA`: States the founding year of the USA.
* **Spatial Layout:** The KG is enclosed in a blue dashed rectangle in the top-left quadrant. "Obama" is centrally located within this box. "President" is top-left of Obama, "Hawaii" is bottom-left, "USA" is top-right, and "1776" is bottom-right.
**2. Process Flow:**
* **Step 1 - Fact Extraction:** A specific fact, represented as the triple `(Obama, Profession, President)`, is extracted from the KG. This is indicated by the arrow labeled "Fact" pointing from the KG box to the "Question Generator".
* **Step 2 - Question Generation:** The "Question Generator" block processes the fact to create a natural language question with a mask: `"Obama's profession is [MASK]."`
* **Step 3 - LLM Prediction:** The masked question is fed into the "LLMs" block (highlighted in yellow). The LLM generates a prediction for the masked token.
* **Step 4 - Validation:** The LLM's "Prediction" is sent to the "Answer" block. Simultaneously, the original fact from the KG is used for "Validation". The final output in the answer block is "President", indicating the correct answer derived from the KG and confirmed as the valid prediction.
### Key Observations
* **Typo in Diagram:** The edge label "BronIn" is almost certainly a typographical error for "BornIn".
* **Closed-Loop System:** The diagram depicts a closed-loop evaluation or training system. The KG provides ground truth, which is transformed into a testable format (masked question) for the LLM, and the result is validated against the original source.
* **Specific Example:** The entire process is demonstrated using a single, concrete example about Barack Obama's profession, making the abstract process easy to follow.
* **Visual Emphasis:** The "LLMs" block is the only one filled with color (yellow), drawing attention to it as the core component being tested or utilized.
### Interpretation
This diagram illustrates a methodology for **evaluating or training LLMs on factual knowledge** using structured Knowledge Graphs. The process demonstrates how unstructured language models can be grounded in and tested against a structured, verifiable knowledge base.
* **Purpose:** The system is designed to generate precise, fact-based questions from a KG to probe an LLM's knowledge. The `[MASK]` token format is typical of cloze-style tasks used in model training and evaluation (e.g., BERT).
* **Relationships:** The KG acts as the authoritative source of truth. The Question Generator acts as a translator, converting structured triples into natural language. The LLM is the system under test, and the validation step closes the loop by comparing the model's output to the KG's fact.
* **Significance:** This approach addresses a key challenge with LLMs: their tendency to generate plausible but factually incorrect information ("hallucination"). By using a KG for validation, the process ensures answer correctness and provides a clear metric for model performance. The specific example shows the system correctly identifying "President" as the profession, validating that the LLM's prediction aligns with the structured fact `(Obama, Profession, President)`.
* **Underlying Concept:** The diagram promotes the idea of **neuro-symbolic AI**, where neural networks (LLMs) are combined with symbolic knowledge representations (KGs) to achieve more reliable and interpretable results.
</details>
Figure 12: The general framework of using knowledge graph for language model probing.
### 4.4 KG-enhanced LLM Interpretability
Although LLMs have achieved remarkable success in many NLP tasks, they are still criticized for their lack of interpretability. LLM interpretability refers to the understanding and explanation of the inner workings and decision-making processes of a large language model [17]. Improving it can increase the trustworthiness of LLMs and facilitate their applications in high-stakes scenarios such as medical diagnosis and legal judgment. Knowledge graphs (KGs) represent knowledge structurally and can provide good interpretability for reasoning results. Therefore, researchers try to utilize KGs to improve the interpretability of LLMs, which can be roughly grouped into two categories: 1) KGs for language model probing, and 2) KGs for language model analysis.
#### 4.4.1 KGs for LLM Probing
Large language model (LLM) probing aims to understand the knowledge stored in LLMs. LLMs, trained on large-scale corpora, are widely known to contain enormous knowledge. However, LLMs store knowledge implicitly in their parameters, making it hard to determine what knowledge is actually stored. Moreover, LLMs suffer from the hallucination problem [15], generating statements that contradict facts. This issue significantly affects the reliability of LLMs. Therefore, it is necessary to probe and verify the knowledge stored in LLMs.
LAMA [14] is the first work to probe the knowledge in LLMs by using KGs. As shown in Fig. 12, LAMA first converts the facts in KGs into cloze statements using a pre-defined prompt template and then uses LLMs to predict the missing entity. The prediction results are used to evaluate the knowledge stored in LLMs. For example, to probe whether LLMs know the fact (Obama, profession, president), we first convert the fact triple into a cloze question "Obama's profession is $\_$." with the object masked. Then, we test whether the LLMs can correctly predict the masked object "president".
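The probing loop can be sketched in a few lines. This is a hedged, self-contained illustration: the relation templates are invented for the example, and `toy_llm` is a stand-in for a real masked language model's fill-in prediction.

```python
# Illustrative per-relation cloze templates (not LAMA's actual templates).
TEMPLATES = {
    "profession": "{subj}'s profession is [MASK].",
    "born_in":    "{subj} was born in [MASK].",
}

def to_cloze(subj: str, relation: str) -> str:
    """Turn a (subject, relation) pair into a masked cloze statement."""
    return TEMPLATES[relation].format(subj=subj)

def probe(llm, fact) -> bool:
    """Return True if the model's fill-in matches the KG object."""
    subj, relation, obj = fact
    prediction = llm(to_cloze(subj, relation))
    return prediction.strip().lower() == obj.lower()

def toy_llm(cloze: str) -> str:
    # Placeholder for a real masked LM's top prediction.
    return "president" if "profession" in cloze else "unknown"

print(probe(toy_llm, ("Obama", "profession", "president")))  # True
```

A real implementation would replace `toy_llm` with a masked language model and aggregate accuracy over many KG facts to score the model's stored knowledge.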
However, LAMA ignores that the prompts themselves may be inappropriate: different phrasings of the same fact can yield different predictions. For example, the prompt "Obama worked as a _" may be more favorable to the language model's prediction of the blank than "Obama is a _ by profession". Thus, LPAQA [118] proposes a mining- and paraphrasing-based method to automatically generate high-quality and diverse prompts for a more accurate assessment of the knowledge contained in the language model. Moreover, Adolphs et al. [128] attempt to use examples to make the language model understand the query, and their experiments obtain substantial improvements for BERT-large on the T-REx data. Unlike methods using manually defined prompt templates, AutoPrompt [119] proposes an automated, gradient-guided search to create prompts. LLM-facteval [121] designs a systematic framework that automatically generates probing questions from KGs; the generated questions are then used to evaluate the factual knowledge stored in LLMs.
Instead of probing general knowledge via encyclopedic and commonsense knowledge graphs, BioLAMA [129] and MedLAMA [120] probe the medical knowledge in LLMs by using medical knowledge graphs. Alex et al. [130] investigate the capacity of LLMs to retain less popular factual knowledge. They select unpopular facts from the Wikidata knowledge graph, i.e., those involving low-frequency clicked entities. These facts are then used for evaluation; the results indicate that LLMs encounter difficulties with such knowledge and that scaling fails to appreciably improve memorization of factual knowledge in the tail.
<details>
<summary>extracted/5367551/figs/LLM_analysis.png Details</summary>

### Visual Description
## Diagram: Knowledge Graph Reasoning for Question Answering
### Overview
This image is a conceptual diagram illustrating how a Knowledge Graph (KG) and Large Language Models (LLMs) can be combined to answer a factual question through multi-hop reasoning. The diagram shows a specific example: answering "Which country is Joe Biden from?" by traversing relationships in a KG.
### Components/Axes
The diagram is segmented into three primary regions:
1. **Left Region (KGs):** A dashed blue box labeled "KGs" (Knowledge Graphs) containing a network of nodes (entities) and directed edges (relationships).
2. **Center-Right Region (LLMs & Answer):** A yellow rounded rectangle labeled "LLMs" and a white rounded rectangle labeled "Answer: USA".
3. **Bottom Region (Question):** A dashed blue box containing the input question.
**Spatial Layout:**
* The **KGs** box occupies the left ~60% of the image.
* The **LLMs** box is positioned to the right of the KGs box, connected by a large white arrow.
* The **Answer** box is directly above the LLMs box, connected by an upward-pointing white arrow.
* The **Question** box spans the bottom of the image, with an upward arrow labeled "Reasoning Path" pointing into the KGs box.
### Detailed Analysis
**1. Knowledge Graph (KGs) Content:**
* **Nodes (Entities):** Represented by light blue circles. The labeled nodes are:
* `Obama`
* `President`
* `Hawaii`
* `Joe Biden`
* `USA`
* `1776`
* **Edges (Relationships):** Represented by black arrows with text labels. The relationships are:
* `Obama` --(Profession)--> `President`
* `Obama` --(bron_in)--> `Hawaii` *(Note: "bron_in" appears to be a typo for "born_in")*
* `Joe Biden` --(Colleagues)--> `Obama`
* `Obama` --(Country)--> `USA`
* `USA` --(FoundedIn)--> `1776`
* **Highlighted Reasoning Path:** A semi-transparent pink shape highlights a specific path through the graph: `Joe Biden` -> `Colleagues` -> `Obama` -> `Country` -> `USA`.
**2. Process Flow:**
* An upward arrow from the **Question** box points into the **KGs** box, labeled "Reasoning Path".
* A large white arrow points from the **KGs** box to the **LLMs** box.
* A white upward arrow points from the **LLMs** box to the **Answer** box.
**3. Text Content:**
* **Question Box:** `Q: Which country is Joe Biden from?`
* The words `country` and `Joe Biden` are highlighted in blue font.
* **LLMs Box:** `LLMs` (in black text on a yellow background).
* **Answer Box:** `Answer: USA` (in black text).
### Key Observations
* The diagram explicitly shows a **multi-hop reasoning** process. The answer "USA" is not directly linked to "Joe Biden" in the KG. The system must infer the connection via the intermediate node "Obama".
* The **pink highlight** visually isolates the critical reasoning chain used to derive the answer.
* The **typo "bron_in"** is present on the edge from Obama to Hawaii.
* The diagram uses **color coding**: light blue for KG entities, yellow for the LLM processing unit, and blue text to highlight key terms in the question.
### Interpretation
This diagram serves as a pedagogical or conceptual model for a **neuro-symbolic AI system**. It demonstrates how structured knowledge (the KG) can be leveraged by a statistical model (the LLM) to perform logical reasoning.
* **What it demonstrates:** The system doesn't just retrieve a stored fact. It performs a graph traversal: 1) Identify the subject (`Joe Biden`). 2) Find a connecting relationship (`Colleagues`) to another entity (`Obama`). 3) From that entity, find the target relationship (`Country`) to arrive at the answer (`USA`). This mimics human-like deductive reasoning.
* **Relationship between elements:** The KG provides the factual backbone and relational structure. The LLM acts as the reasoning engine that interprets the question, navigates the graph, and synthesizes the final answer. The "Reasoning Path" arrow signifies that the question triggers a specific search strategy within the KG.
* **Notable Anomaly:** The inclusion of the node `1776` and the `FoundedIn` relationship, while factually correct for the USA, is **extraneous to answering the specific question**. This highlights a challenge in KG-based QA: distinguishing relevant from irrelevant subgraphs. The system must focus on the path leading to the answer and ignore tangential facts.
* **Underlying Message:** The diagram argues for the value of combining explicit, symbolic knowledge representations (KGs) with the flexible processing power of LLMs to achieve more accurate, explainable, and logically sound question answering compared to using an LLM alone, which might rely solely on parametric knowledge.
</details>
Figure 13: The general framework of using knowledge graph for language model analysis.
#### 4.4.2 KGs for LLM Analysis
Knowledge graphs (KGs) for pre-trained language model (LLM) analysis aims to answer questions such as "how do LLMs generate their results?" and "how do the function and structure of LLMs work?". To analyze the inference process of LLMs, as shown in Fig. 13, KagNet [38] and QA-GNN [131] ground the results generated by LLMs at each reasoning step in knowledge graphs. In this way, the reasoning process of LLMs can be explained by extracting the graph structure from KGs. Shaobo et al. [123] investigate how LLMs generate results correctly. They adopt a causal-inspired analysis of facts extracted from KGs, which quantitatively measures the word patterns LLMs depend on to generate results. The results show that LLMs generate missing facts based more on positionally close words than on knowledge-dependent words; thus, they claim that LLMs inadequately memorize factual knowledge because of this inaccurate dependence. To interpret the training of LLMs, Swamy et al. [122] use the language model during pre-training to generate knowledge graphs; the knowledge acquired by LLMs during training can thus be unveiled explicitly as facts in KGs. To explore how implicit knowledge is stored in the parameters of LLMs, Dai et al. [39] propose the concept of knowledge neurons. Specifically, the activation of identified knowledge neurons is highly correlated with knowledge expression. They then explore the knowledge and facts represented by each neuron by suppressing and amplifying knowledge neurons.
## 5 LLM-augmented KGs
Knowledge graphs are famous for representing knowledge in a structured manner. They have been applied in many downstream tasks such as question answering, recommendation, and web search. However, conventional KGs are often incomplete, and existing methods often ignore textual information. To address these issues, recent research has explored integrating LLMs to augment KGs, exploiting textual information to improve performance on downstream tasks. In this section, we introduce this recent research on LLM-augmented KGs, covering methods that integrate LLMs for KG embedding, KG completion, KG construction, KG-to-text generation, and KG question answering, respectively. Representative works are summarized in Table III.
TABLE III: Summary of representative LLM-augmented KG methods.
| Task | Method | Year | LLM | Technique |
| --- | --- | --- | --- | --- |
| LLM-augmented KG embedding | Pretrain-KGE [94] | 2020 | E | LLMs as Text Encoders |
| | KEPLER [40] | 2020 | E | LLMs as Text Encoders |
| | Nayyeri et al. [132] | 2022 | E | LLMs as Text Encoders |
| | Huang et al. [133] | 2022 | E | LLMs as Text Encoders |
| | CoDEx [134] | 2022 | E | LLMs as Text Encoders |
| | LMKE [135] | 2022 | E | LLMs for Joint Text and KG Embedding |
| | kNN-KGE [136] | 2022 | E | LLMs for Joint Text and KG Embedding |
| | LambdaKG [137] | 2023 | E + D + ED | LLMs for Joint Text and KG Embedding |
| LLM-augmented KG completion | KG-BERT [26] | 2019 | E | Joint Encoding |
| | MTL-KGC [138] | 2020 | E | Joint Encoding |
| | PKGC [139] | 2022 | E | Joint Encoding |
| | LASS [140] | 2022 | E | Joint Encoding |
| | MEM-KGC [141] | 2021 | E | MLM Encoding |
| | OpenWorld KGC [142] | 2023 | E | MLM Encoding |
| | StAR [143] | 2021 | E | Separated Encoding |
| | SimKGC [144] | 2022 | E | Separated Encoding |
| | LP-BERT [145] | 2022 | E | Separated Encoding |
| | GenKGC [96] | 2022 | ED | LLM as decoders |
| | KGT5 [146] | 2022 | ED | LLM as decoders |
| | KG-S2S [147] | 2022 | ED | LLM as decoders |
| | AutoKG [93] | 2023 | D | LLM as decoders |
| LLM-augmented KG construction | ELMO [148] | 2018 | E | Named Entity Recognition |
| | GenerativeNER [149] | 2021 | ED | Named Entity Recognition |
| | LDET [150] | 2019 | E | Entity Typing |
| | BOX4Types [151] | 2021 | E | Entity Typing |
| | ELQ [152] | 2020 | E | Entity Linking |
| | ReFinED [153] | 2022 | E | Entity Linking |
| | BertCR [154] | 2019 | E | CR (Within-document) |
| | Spanbert [155] | 2020 | E | CR (Within-document) |
| | CDLM [156] | 2021 | E | CR (Cross-document) |
| | CrossCR [157] | 2021 | E | CR (Cross-document) |
| | CR-RL [158] | 2021 | E | CR (Cross-document) |
| | SentRE [159] | 2019 | E | RE (Sentence-level) |
| | Curriculum-RE [160] | 2021 | E | RE (Sentence-level) |
| | DREEAM [161] | 2023 | E | RE (Document-level) |
| | Kumar et al. [95] | 2020 | E | End-to-End Construction |
| | Guo et al. [162] | 2021 | E | End-to-End Construction |
| | Grapher [41] | 2021 | ED | End-to-End Construction |
| | PiVE [163] | 2023 | D + ED | End-to-End Construction |
| | COMET [164] | 2019 | D | Distilling KGs from LLMs |
| | BertNet [165] | 2022 | E | Distilling KGs from LLMs |
| | West et al. [166] | 2022 | D | Distilling KGs from LLMs |
| LLM-augmented KG-to-text Generation | Ribeiro et al. [167] | 2021 | ED | Leveraging Knowledge from LLMs |
| | JointGT [42] | 2021 | ED | Leveraging Knowledge from LLMs |
| | FSKG2Text [168] | 2021 | D + ED | Leveraging Knowledge from LLMs |
| | GAP [169] | 2022 | ED | Leveraging Knowledge from LLMs |
| | GenWiki [170] | 2020 | - | Constructing KG-text aligned Corpus |
| | KGPT [171] | 2020 | ED | Constructing KG-text aligned Corpus |
| LLM-augmented KGQA | Lukovnikov et al. [172] | 2019 | E | Entity/Relation Extractor |
| | Luo et al. [173] | 2020 | E | Entity/Relation Extractor |
| | QA-GNN [131] | 2021 | E | Entity/Relation Extractor |
| | Nan et al. [174] | 2023 | E + D + ED | Entity/Relation Extractor |
| | DEKCOR [175] | 2021 | E | Answer Reasoner |
| | DRLK [176] | 2022 | E | Answer Reasoner |
| | OreoLM [177] | 2022 | E | Answer Reasoner |
| | GreaseLM [178] | 2022 | E | Answer Reasoner |
| | ReLMKG [179] | 2022 | E | Answer Reasoner |
| | UniKGQA [43] | 2023 | E | Answer Reasoner |
- E: Encoder-only LLMs, D: Decoder-only LLMs, ED: Encoder-decoder LLMs.
### 5.1 LLM-augmented KG Embedding
Knowledge graph embedding (KGE) aims to map each entity and relation into a low-dimensional vector (embedding) space. These embeddings contain both the semantic and structural information of KGs, which can be utilized for various tasks such as question answering [180], reasoning [38], and recommendation [181]. Conventional knowledge graph embedding methods mainly rely on the structural information of KGs to optimize a scoring function defined on embeddings (e.g., TransE [33] and DistMult [182]). However, these approaches often fall short in representing unseen entities and long-tailed relations due to their limited structural connectivity [183, 184]. To address this issue, as shown in Fig. 14, recent research adopts LLMs to enrich the representations of KGs by encoding the textual descriptions of entities and relations [94, 40].
<details>
<summary>x11.png Details</summary>

### Visual Description
## Diagram: Knowledge Graph Embedding (KGE) Training Pipeline Using LLMs
### Overview
This image is a technical diagram illustrating a pipeline for training Knowledge Graph Embedding (KGE) models. The process uses Large Language Models (LLMs) to generate textual embeddings for entities and relations from a Knowledge Graph (KG), which are then fed into KGE models to produce final vector representations. The diagram uses a specific example triple from a knowledge graph to demonstrate the flow.
### Components and Flow
The diagram is structured vertically, with a bottom-to-top data flow indicated by arrows. It can be segmented into three main regions:
1. **Bottom Region (Input - Knowledge Graph):**
* A central light blue box labeled **"KGs"** (Knowledge Graphs).
* Three arrows point upward from "KGs" to three components representing a knowledge graph triple `(h, r, t)`:
* **Left (h - Head Entity):** A light blue box labeled **"Neil Armstrong"** with the subscript **`h`** below it.
* **Center (r - Relation):** A darker blue box labeled **"BornIn"** with the subscript **`r`** below it.
* **Right (t - Tail Entity):** A light blue box labeled **"Wapakoneta"** with the subscript **`t`** below it.
* The entire triple is enclosed in parentheses: `( Neil Armstrong , BornIn , Wapakoneta )`.
2. **Middle Region (Textual Processing - LLMs):**
* Above the triple, three text descriptions are provided, each with an arrow pointing up to a central yellow box.
* **Left (`Text_h`):** A box containing the text: **"An American astronaut and aeronautical engineer."** An arrow labeled **`Text_h`** points from this box to the LLMs box.
* **Center (`Text_r`):** An arrow labeled **`Text_r`** points directly from the "BornIn" relation box to the LLMs box. No separate descriptive text box is shown for the relation.
* **Right (`Text_t`):** A box containing the text: **"A small city in Ohio, USA."** An arrow labeled **`Text_t`** points from this box to the LLMs box.
* A large, central yellow box labeled **"LLMs"** (Large Language Models) receives the three text inputs (`Text_h`, `Text_r`, `Text_t`).
3. **Top Region (Embedding Generation - KGE Training):**
* This entire section is enclosed in a dashed-line box labeled **"KGE Training"** at the top-left.
* Three arrows point upward from the "LLMs" box to three sets of blue rectangles representing initial embeddings:
* **Left:** A set of four light blue rectangles labeled **`e_h`** (embedding for head).
* **Center:** A set of four darker blue rectangles labeled **`e_r`** (embedding for relation).
* **Right:** A set of four light blue rectangles labeled **`e_t`** (embedding for tail).
* These three embedding sets (`e_h`, `e_r`, `e_t`) are inputs to a central grey box labeled **"KGE Models"**.
* Three arrows point upward from the "KGE Models" box to three final vector representations:
* **Left:** A set of four light blue rectangles labeled **`v_h`** (vector for head).
* **Center:** A set of four darker blue rectangles labeled **`v_r`** (vector for relation).
* **Right:** A set of four light blue rectangles labeled **`v_t`** (vector for tail).
### Detailed Analysis
* **Data Flow:** The process begins with a structured triple `(h, r, t)` from a Knowledge Graph. The head (`h`) and tail (`t`) entities are converted into natural language descriptions (`Text_h`, `Text_t`). The relation (`r`) is also associated with a text label (`Text_r`). These textual representations are processed by LLMs to generate initial dense vector embeddings (`e_h`, `e_r`, `e_t`). These embeddings serve as the input for training specialized KGE models, which then output the final, refined knowledge graph vectors (`v_h`, `v_r`, `v_t`).
* **Color Coding:** A consistent color scheme is used:
* **Light Blue:** Used for head/tail entities (`Neil Armstrong`, `Wapakoneta`), their text descriptions, their initial embeddings (`e_h`, `e_t`), and their final vectors (`v_h`, `v_t`).
* **Darker Blue:** Used exclusively for the relation (`BornIn`), its initial embedding (`e_r`), and its final vector (`v_r`).
* **Yellow:** Highlights the core processing component, the LLMs.
* **Grey:** Represents the KGE Models component.
* **Spatial Grounding:** The "KGE Training" label is in the top-left corner of the dashed box. The "LLMs" box is centrally located, acting as the bridge between textual input and embedding output. The flow is strictly vertical, with no horizontal connections between the parallel processing streams for h, r, and t.
### Key Observations
1. **Hybrid Architecture:** The diagram explicitly shows a pipeline that combines the semantic understanding of LLMs (for processing text) with the structural learning of KGE models (for embedding graph relations).
2. **Asymmetric Treatment:** The relation (`BornIn`) is treated differently from entities. It lacks a descriptive sentence box (`Text_r` is just a label), and its embeddings (`e_r`, `v_r`) are consistently colored differently, suggesting it may be processed or represented in a distinct manner.
3. **Example-Driven:** The use of the specific, well-known triple `(Neil Armstrong, BornIn, Wapakoneta)` serves as a concrete example to ground the abstract technical process.
### Interpretation
This diagram illustrates a method for **enhancing Knowledge Graph Embeddings with semantic knowledge from LLMs**. The core idea is that the rich, pre-trained linguistic knowledge within LLMs can be leveraged to create more meaningful initial embeddings (`e_h`, `e_r`, `e_t`) for graph elements. Instead of starting the KGE training process from random initialization, these LLM-derived embeddings provide a semantically informed starting point.
The pipeline suggests that the textual description of an entity (e.g., "An American astronaut...") contains valuable information that, when encoded by an LLM, can help a KGE model learn better structural representations (`v_h`, `v_r`, `v_t`). This could lead to KGE models that perform better on tasks like link prediction or entity classification, as they benefit from both the graph's structure and the world knowledge embedded in language models. The separation of the "LLMs" and "KGE Models" boxes indicates they are distinct modules, likely pre-trained independently and then integrated in this fashion.
</details>
Figure 14: LLMs as text encoder for knowledge graph embedding (KGE).
#### 5.1.1 LLMs as Text Encoders
Pretrain-KGE [94] is a representative method that follows the framework shown in Fig. 14. Given a triple $(h,r,t)$ from a KG, it first uses an LLM encoder to encode the textual descriptions of the entities $h$, $t$, and the relation $r$ into representations as
$$
e_{h}=\text{LLM}(\text{Text}_{h}),\quad e_{t}=\text{LLM}(\text{Text}_{t}),\quad e_{r}=\text{LLM}(\text{Text}_{r}), \tag{1}
$$
where $e_{h}$, $e_{r}$, and $e_{t}$ denote the initial embeddings of the entities $h$, $t$, and the relation $r$, respectively. Pretrain-KGE uses BERT as the LLM encoder in its experiments. The initial embeddings are then fed into a KGE model to generate the final embeddings $v_{h}$, $v_{r}$, and $v_{t}$. During the KGE training phase, the KGE model is optimized with the standard KGE loss function
$$
\mathcal{L}=[\gamma+f(v_{h},v_{r},v_{t})-f(v^{\prime}_{h},v^{\prime}_{r},v^{\prime}_{t})], \tag{2}
$$
where $f$ is the KGE scoring function, $\gamma$ is a margin hyperparameter, and $v^{\prime}_{h},v^{\prime}_{r}$, and $v^{\prime}_{t}$ are the negative samples. In this way, the KGE model can learn adequate structural information while retaining part of the knowledge from the LLM, enabling better knowledge graph embedding. KEPLER [40] offers a unified model for knowledge embedding and pre-trained language representation. This model not only generates effective text-enhanced knowledge embeddings using powerful LLMs but also seamlessly integrates factual knowledge into LLMs. Nayyeri et al. [132] use LLMs to generate word-level, sentence-level, and document-level representations, which are integrated with graph structure embeddings into a unified vector via Dihedron and Quaternion representations of 4D hypercomplex numbers. Huang et al. [133] combine LLMs with other vision and graph encoders to learn multi-modal knowledge graph embeddings that enhance the performance of downstream tasks. CoDEx [134] presents a novel LLM-empowered loss function that guides KGE models in measuring the plausibility of triples by considering the textual information. The proposed loss function is model-agnostic and can be incorporated into any KGE model.
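As a minimal illustration of Eqs. (1)-(2), the sketch below initializes TransE-style embeddings from textual descriptions and computes the margin loss. The `llm_encode` stub (a hashed random projection) merely stands in for a real LLM encoder such as BERT, and the descriptions and margin are illustrative:

```python
import hashlib
import numpy as np

def llm_encode(text: str, dim: int = 8) -> np.ndarray:
    """Stand-in for the LLM text encoder in Eq. (1): a deterministic
    random vector seeded by the text. A real system would return the
    LLM's representation of the description."""
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    return np.random.default_rng(seed).normal(size=dim)

def transe_score(v_h, v_r, v_t):
    """TransE-style scoring f(h, r, t) = ||v_h + v_r - v_t|| (lower is more plausible)."""
    return float(np.linalg.norm(v_h + v_r - v_t))

def margin_loss(pos, neg, gamma=1.0):
    """Eq. (2): margin-based ranking loss (clipped at zero, as is standard)."""
    return max(0.0, gamma + transe_score(*pos) - transe_score(*neg))

# Eq. (1): initialize embeddings from textual descriptions instead of at random.
e_h = llm_encode("Neil Armstrong: an American astronaut")
e_r = llm_encode("born in")
e_t = llm_encode("Wapakoneta: a city in Ohio")
e_t_neg = llm_encode("Paris: the capital of France")  # corrupted tail (negative sample)

loss = margin_loss((e_h, e_r, e_t), (e_h, e_r, e_t_neg))
print(loss)
```

In a full pipeline these LLM-derived vectors would only initialize the KGE model's parameters, which are then refined by minimizing this loss over the graph.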
#### 5.1.2 LLMs for Joint Text and KG Embedding
Instead of using a KGE model to capture graph structure, another line of methods directly employs LLMs to incorporate both the graph structure and the textual information into the embedding space simultaneously. As shown in Fig. 15, $k$NN-KGE [136] treats entities and relations as special tokens in the LLM. During training, it converts each triple $(h,r,t)$ and the corresponding text descriptions into a sentence $x$ as
$$
x=\texttt{[CLS]}\ h\ \text{Text}_{h}\ \texttt{[SEP]}\ r\ \texttt{[SEP]}\ \texttt{[MASK]}\ \text{Text}_{t}\ \texttt{[SEP]}, \tag{3}
$$
where the tail entity is replaced by [MASK]. The sentence is fed into an LLM, which is then fine-tuned to predict the masked entity, formulated as
$$
P_{LLM}(t|h,r)=P(\texttt{[MASK]=t}|x,\Theta), \tag{4}
$$
where $\Theta$ denotes the parameters of the LLM. The LLM is optimized to maximize the probability of the correct entity $t$ . After training, the corresponding token representations in the LLM are used as embeddings for entities and relations. Similarly, LMKE [135] proposes a contrastive learning method to improve the learning of embeddings generated by LLMs for KGE. Meanwhile, to better capture graph structure, LambdaKG [137] samples 1-hop neighbor entities and concatenates their tokens with the triple into a sentence that is fed into the LLM.
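The masked-entity formulation of Eqs. (3)-(4) can be sketched as follows; the candidate logits stand in for the LLM's scores at the [MASK] position, and the entity names are illustrative:

```python
import math

def build_masked_input(head, text_h, relation, text_t):
    """Eq. (3): serialize a triple (h, r, ?) into a masked sentence."""
    return f"[CLS] {head} {text_h} [SEP] {relation} [SEP] [MASK] {text_t} [SEP]"

def predict_tail(logits):
    """Eq. (4): P([MASK] = t | x) via a softmax over candidate entities.
    `logits` stands in for the LLM's output scores at the [MASK] position."""
    z = max(logits.values())                       # stabilize the softmax
    exp = {e: math.exp(s - z) for e, s in logits.items()}
    total = sum(exp.values())
    return {e: v / total for e, v in exp.items()}

x = build_masked_input("Neil Armstrong", "An American astronaut",
                       "BornIn", "A city in Ohio")
probs = predict_tail({"Wapakoneta": 3.2, "Paris": 0.1, "Ohio": 1.0})
best = max(probs, key=probs.get)
print(x)
print(best)
```

Fine-tuning maximizes the probability assigned to the gold tail entity; at inference, the entity with the highest probability at the [MASK] position is returned.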
<details>
<summary>x12.png Details</summary>

### Visual Description
## Diagram: Mask Entity Prediction Process
### Overview
This image is a technical diagram illustrating a process called "Mask Entity Prediction." It depicts how a Large Language Model (LLM) is used to predict a missing entity (a masked token) within a structured knowledge triple, with the aid of Knowledge Graphs (KGs). The diagram uses a specific example involving Neil Armstrong to demonstrate the workflow.
### Components/Axes
The diagram is structured vertically with a clear top-to-bottom flow, segmented into three main regions:
1. **Header (Top Region):**
* **Title:** "Mask Entity Prediction" (centered, bold, black text).
* **Predicted Output:** A light blue, rounded rectangle containing the text "Wapakoneta" is positioned at the top-right. An upward-pointing arrow connects it to the main processing block below.
2. **Main Processing Block (Central Region):**
* **Core Processor:** A large, horizontal, yellow rectangle labeled "LLMs" (bold, black text). This represents the Large Language Model performing the prediction task.
* **Input Sequence:** A row of nine distinct tokens/elements below the "LLMs" block, each with an upward-pointing arrow indicating they are fed into the model. From left to right:
* A white, rounded rectangle: `[CLS]`
* A light blue, rounded rectangle: `Neil Armstrong`
* A yellow, rounded rectangle: `Textₕ`
* A white, rounded rectangle: `[SEP]`
* A medium blue, rounded rectangle: `BornIn`
* A white, rounded rectangle: `[SEP]`
* A light blue, rounded rectangle: `[MASK]`
* A yellow, rounded rectangle: `Textₜ`
* A white, rounded rectangle: `[SEP]`
* **Knowledge Triple Annotation:** Below the input sequence, the text `( Neil Armstrong, BornIn, Wapakoneta)` is written. Beneath this triple, the variables `h`, `r`, and `t` are placed directly under "Neil Armstrong," "BornIn," and "Wapakoneta," respectively, labeling them as head, relation, and tail.
3. **Footer (Bottom Region):**
* **Knowledge Source:** A light blue, rounded rectangle labeled "KGs" (bold, black text) is centered at the bottom.
* **Data Flow Arrow:** An upward-pointing arrow connects the "KGs" box to the variable `r` (the relation "BornIn") in the knowledge triple annotation above, indicating the source of the structured data.
### Detailed Analysis
The diagram explicitly models a **knowledge graph completion or entity prediction task** using a masked language modeling approach.
* **Input Structure:** The input to the LLM is a formatted sequence resembling a sentence with special tokens. It combines:
* **Special Tokens:** `[CLS]` (start of sequence) and `[SEP]` (separator) tokens, common in models like BERT.
* **Entity Head:** The known head entity "Neil Armstrong" (light blue).
* **Relation:** The known relation "BornIn" (medium blue).
* **Masked Tail:** The tail entity position is masked with `[MASK]` (light blue), which the model must predict.
* **Contextual Text:** Placeholders `Textₕ` and `Textₜ` (yellow) suggest additional textual context about the head and tail entities may be incorporated.
* **Process Flow:**
1. A knowledge triple `(Neil Armstrong, BornIn, ?)` is sourced from **KGs** (Knowledge Graphs).
2. The head (`h=Neil Armstrong`) and relation (`r=BornIn`) are encoded into the input sequence. The tail (`t`) is replaced with a `[MASK]` token.
3. This entire sequence is processed by the **LLMs** block.
4. The LLM outputs a prediction for the masked token, which is the entity "Wapakoneta."
* **Color Coding:**
* **Light Blue:** Used for entities (`Neil Armstrong`, `[MASK]`, `Wapakoneta`) and the `KGs` source.
* **Medium Blue:** Used specifically for the relation `BornIn`.
* **Yellow:** Used for the core `LLMs` processor and the contextual text placeholders (`Textₕ`, `Textₜ`).
* **White:** Used for structural special tokens (`[CLS]`, `[SEP]`).
### Key Observations
1. **Task Specificity:** The diagram is not a general model architecture but a specific instance of a **masked entity prediction task**.
2. **Hybrid Input:** The model integrates both **structured knowledge** (the triple from KGs) and **unstructured text** (implied by `Textₕ` and `Textₜ`) as inputs.
3. **Clear Data Provenance:** The arrow from "KGs" to the relation `r` explicitly shows the origin of the structured fact being completed.
4. **Spatial Grounding:** The predicted output "Wapakoneta" is spatially linked directly to the "LLMs" block, emphasizing it is the model's direct output. The `[MASK]` token in the input sequence is the specific element being resolved.
### Interpretation
This diagram demonstrates a process of **abductive reasoning** within AI. It shows how a system can infer the most plausible missing piece of information (the "tail" entity) given a known context (head and relation).
* **What it Suggests:** The data suggests that LLMs can be effectively leveraged not just for open-ended text generation, but for precise, structured knowledge tasks. By framing a knowledge graph completion problem as a masked language modeling problem, the model's vast pre-trained knowledge about the world can be directed to fill gaps in a formal knowledge base.
* **How Elements Relate:** The **KGs** provide the foundational, structured fact. The **LLMs** act as the reasoning engine that understands the semantic relationship between "Neil Armstrong" and "BornIn" and retrieves the correct completion ("Wapakoneta") from its parametric memory. The input sequence format is the crucial bridge that translates a symbolic knowledge query into a form the neural network can process.
* **Notable Anomalies/Patterns:** The inclusion of `Textₕ` and `Textₜ` is a notable pattern. It implies the process isn't relying solely on the entity names but may enrich them with descriptive text, potentially improving prediction accuracy for less common entities. The diagram presents an ideal, successful case where the model correctly predicts the factual answer. It does not illustrate handling of ambiguity, multiple possible answers, or incorrect predictions.
</details>
Figure 15: LLMs for joint text and knowledge graph embedding.
### 5.2 LLM-augmented KG Completion
Knowledge Graph Completion (KGC) refers to the task of inferring missing facts in a given knowledge graph. Similar to KGE, conventional KGC methods mainly focused on the structure of the KG, without considering the extensive textual information. However, the recent integration of LLMs enables KGC methods to encode text or generate facts for better KGC performance. These methods fall into two distinct categories based on their utilization styles: 1) LLM as Encoders (PaE), and 2) LLM as Generators (PaG).
#### 5.2.1 LLM as Encoders (PaE).
As shown in Fig. 16 (a), (b), and (c), this line of work first uses encoder-only LLMs to encode textual information as well as KG facts. Then, they predict the plausibility of the triples or masked entities by feeding the encoded representation into a prediction head, which could be a simple MLP or conventional KG score function (e.g., TransE [33] and TransR [185]).
Joint Encoding. Since encoder-only LLMs (e.g., BERT [1]) are good at encoding text sequences, KG-BERT [26] represents a triple $(h,r,t)$ as a text sequence and encodes it with an LLM (Fig. 16 (a)):
$$
x=\texttt{[CLS]}\ \text{Text}_{h}\ \texttt{[SEP]}\ \text{Text}_{r}\ \texttt{[SEP]}\ \text{Text}_{t}\ \texttt{[SEP]}, \tag{5}
$$
The final hidden state of the [CLS] token is fed into a classifier to predict the plausibility of the triple, formulated as
$$
s=\sigma(\text{MLP}(e_{\texttt{[CLS]}})), \tag{6}
$$
where $\sigma(\cdot)$ denotes the sigmoid function and $e_{\texttt{[CLS]}}$ denotes the representation encoded by the LLM. To improve the efficacy of KG-BERT, MTL-KGC [138] proposes a multi-task learning framework for KGC that incorporates additional auxiliary tasks into the model's training, namely relation prediction (RP) and relevance ranking (RR). PKGC [139] assesses the validity of a triple $(h,r,t)$ by transforming the triple and its supporting information into natural language sentences with pre-defined templates. These sentences are then processed by an LLM for binary classification. The supporting information of the triple is derived from the attributes of $h$ and $t$ with a verbalizing function. For instance, if the triple is (Lebron James, member of sports team, Lakers), the information regarding Lebron James is verbalized as "Lebron James: American basketball player". LASS [140] observes that language semantics and graph structure are equally vital to KGC. As a result, LASS jointly learns two types of embeddings: semantic embeddings and structure embeddings. In this method, the full text of a triple is forwarded to the LLM, and the mean poolings of the corresponding LLM outputs for $h$ , $r$ , and $t$ are calculated separately. These embeddings are then passed to a graph-based method, i.e., TransE, to reconstruct the KG structure.
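A minimal sketch of KG-BERT-style joint encoding (Eqs. (5)-(6)), with a random vector standing in for the LLM's [CLS] representation and a one-layer head in place of the MLP (all values here are illustrative):

```python
import numpy as np

def triple_to_sequence(text_h, text_r, text_t):
    """Eq. (5): linearize the triple's descriptions into one input sequence."""
    return f"[CLS] {text_h} [SEP] {text_r} [SEP] {text_t} [SEP]"

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def score_triple(e_cls, W, b):
    """Eq. (6): s = sigmoid(MLP(e_[CLS])); a single linear layer for brevity."""
    return float(sigmoid(W @ e_cls + b))

rng = np.random.default_rng(0)
e_cls = rng.normal(size=16)          # stand-in for the LLM's [CLS] representation
W, b = rng.normal(size=16), 0.0      # classifier head parameters
s = score_triple(e_cls, W, b)
print(triple_to_sequence("Lebron James ...", "member of sports team", "Lakers ..."))
print(s)
```

The scalar `s` in (0, 1) is interpreted as the plausibility of the triple; training would fit the head (and fine-tune the encoder) with a binary cross-entropy objective over true and corrupted triples.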
<details>
<summary>x13.png Details</summary>

### Visual Description
## Diagram: Knowledge Graph Triple Encoding Methods Using Large Language Models (LLMs)
### Overview
This image is a technical diagram illustrating three distinct methods for encoding knowledge graph triples, represented as (head, relation, tail) or (h, r, t), using Large Language Models (LLMs). The diagram is divided into three vertically stacked sections, each depicting a different encoding paradigm: (a) Joint Encoding, (b) MLM (Masked Language Model) Encoding, and (c) Separated Encoding. The overall flow shows how a structured triple is transformed into a text sequence and processed by an LLM-based architecture to produce a specific output (classification, entity prediction, or a score).
### Components/Axes
The diagram is composed of the following key components, arranged from top to bottom:
1. **Top Header (Common Input):**
* **Text:** `Triple: (h, r, t)`
* **Visual:** A downward-pointing arrow leads to the next line.
* **Text:** `Text Sequence: [CLS] Text_h [SEP] Text_r [SEP] Text_t [SEP]`
* **Visual:** A dashed horizontal line separates this common input definition from the three specific encoding methods below.
2. **Section (a) - Joint Encoding:**
* **Input Sequence:** `[CLS] Text_h [SEP] Text_r [SEP] Text_t [SEP]` (highlighted with a blue box around `[CLS]`).
* **Processing Block:** A yellow rectangle labeled `LLMs`.
* **Output Path:** An arrow points upward from the LLMs block to a white box labeled `MLP` (Multi-Layer Perceptron). An arrow from the MLP points to a final white box containing `0/1`, indicating a binary classification output.
* **Label:** `(a) Joint Encoding` is centered below this section.
3. **Section (b) - MLM Encoding:**
* **Input Sequence:** `[CLS] Text_h [SEP] Text_r [SEP] [MASK] [SEP]` (highlighted with a blue box around `[MASK]`).
* **Processing Block:** A yellow rectangle labeled `LLMs`.
* **Output Path:** An arrow points upward from the LLMs block to a white box labeled `MLP`. An arrow from the MLP points to a final light blue box labeled `Entity`.
* **Label:** `(b) MLM Encoding` is centered below this section.
4. **Section (c) - Separated Encoding:**
* **Input Sequences (Two Parallel Streams):**
* **Left Stream:** `[CLS] Text_h [SEP] Text_r [SEP]` (highlighted with a blue box around `[CLS]`).
* **Right Stream:** `[CLS] Text_t [SEP]` (highlighted with a blue box around `[CLS]`).
* **Processing Blocks:** Two separate yellow rectangles, each labeled `LLMs`, one for each input stream.
* **Output Path:** Arrows from both LLMs blocks converge upward into a white box labeled `Score Function`. An arrow from the Score Function points to a final white box labeled `Score`.
* **Label:** `(c) Separated Encoding` is centered below this section.
### Detailed Analysis
The diagram details the architectural flow for each method:
* **(a) Joint Encoding:** The entire triple, formatted as a single text sequence with separator tokens (`[SEP]`), is fed into an LLM. The representation of the special `[CLS]` token (typically used for sequence-level tasks) is passed to an MLP, which outputs a binary value (`0/1`). This suggests a task like triple classification (determining if the triple is true or false).
* **(b) MLM Encoding:** The input sequence is similar, but the tail entity text (`Text_t`) is replaced with a `[MASK]` token. The LLM processes this masked sequence. The representation at the position of the `[MASK]` token is fed to an MLP to predict the original `Entity` (i.e., `Text_t`). This is a classic masked language modeling objective applied to knowledge graph completion.
* **(c) Separated Encoding:** The head-relation pair (`Text_h`, `Text_r`) and the tail entity (`Text_t`) are encoded independently by two (potentially shared) LLMs. Each input starts with its own `[CLS]` token. The final hidden states corresponding to these `[CLS]` tokens are then combined by a `Score Function` to produce a scalar `Score`. This score likely measures the validity or plausibility of the triple (h, r, t).
### Key Observations
1. **Input Formatting Consistency:** All methods use the same foundational text sequence format: `[CLS] ... [SEP] ... [SEP] ... [SEP]`. The key variation is whether the tail is present (`Text_t`), masked (`[MASK]`), or processed separately.
2. **Special Token Utilization:** The `[CLS]` token is consistently used as the aggregate sequence representation for downstream tasks (classification in (a), and as input to the score function in (c)). In (b), the `[MASK]` token's position is used for prediction.
3. **Output Divergence:** The three methods are designed for different downstream tasks: binary classification (a), entity prediction/filling (b), and scoring/ranking (c).
4. **Architectural Complexity:** The complexity increases from (a) to (c). Joint Encoding uses a single LLM pass. MLM Encoding also uses a single pass but with a masked objective. Separated Encoding requires two parallel LLM passes and an additional fusion mechanism (Score Function).
### Interpretation
This diagram provides a comparative overview of how pre-trained LLMs can be adapted for knowledge graph reasoning tasks. It demonstrates the flexibility of the transformer architecture and the text-based interface for structured data.
* **What the data suggests:** The diagram suggests that knowledge graph triples can be effectively "linearized" into text sequences that LLMs are pre-trained to understand. The choice of encoding method depends on the specific task: verifying existing triples (Joint Encoding), predicting missing entities (MLM Encoding), or generating a compatibility score for ranking candidate triples (Separated Encoding).
* **How elements relate:** The top section defines the universal input representation. Sections (a), (b), and (c) are variations on a theme, showing different ways to mask, structure, and process that input to achieve different goals. The LLM is the central, reusable component in all three pipelines.
* **Notable patterns/anomalies:** A key pattern is the progressive decoupling of the triple components. Method (a) fully couples them, (b) decouples the tail via masking, and (c) fully decouples the tail from the head-relation pair. This progression likely involves a trade-off between computational cost, model capacity, and task specificity. The use of a separate `Score Function` in (c) is a notable architectural addition not present in the other two methods, indicating a need for a dedicated module to integrate separately encoded representations.
</details>
Figure 16: The general framework of adopting LLMs as encoders (PaE) for KG Completion.
MLM Encoding. Instead of encoding the full text of a triple, many works introduce the Masked Language Model (MLM) objective to encode KG text (Fig. 16 (b)). MEM-KGC [141] uses a Masked Entity Model (MEM) classification mechanism to predict the masked entity of a triple. The input text takes the form
$$
x=\texttt{[CLS]}\ \text{Text}_{h}\ \texttt{[SEP]}\ \text{Text}_{r}\ \texttt{[SEP]}\ \texttt{[MASK]}\ \texttt{[SEP]}, \tag{7}
$$
Similar to Eq. 4, it tries to maximize the probability that the masked entity is the correct entity $t$ . Additionally, to enable the model to learn unseen entities, MEM-KGC integrates multi-task learning for entity and super-class prediction based on the text descriptions of entities:
$$
x=\texttt{[CLS]}\ \texttt{[MASK]}\ \texttt{[SEP]}\ \text{Text}_{h}\ \texttt{[SEP]}. \tag{8}
$$
OpenWorld KGC [142] extends the MEM-KGC model to address the challenges of open-world KGC with a pipeline framework in which two sequential MLM-based modules are defined: Entity Description Prediction (EDP), an auxiliary module that predicts an entity from a given textual description, and Incomplete Triple Prediction (ITP), the target module that predicts a plausible entity for a given incomplete triple $(h,r,?)$ . EDP first encodes the text with Eq. 8 and produces a final hidden state, which is then forwarded to ITP as the embedding of the head entity in Eq. 7 to predict target entities.
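The EDP-to-ITP hand-off can be sketched as below; the `mlm_encode` stub replaces the actual MLM modules, and the dot-product scoring rule is a simplified stand-in for ITP's prediction head:

```python
import numpy as np

def mlm_encode(sequence: str, dim: int = 8) -> np.ndarray:
    """Stand-in for an MLM encoder returning the final hidden state at the
    [MASK] position; a real system would run the fine-tuned LLM here."""
    seed = sum(ord(c) for c in sequence) % (2**32)
    return np.random.default_rng(seed).normal(size=dim)

def edp(description: str) -> np.ndarray:
    """Entity Description Prediction (Eq. 8): encode
    '[CLS] [MASK] [SEP] Text_h [SEP]' to obtain an embedding for an
    unseen head entity from its textual description alone."""
    return mlm_encode(f"[CLS] [MASK] [SEP] {description} [SEP]")

def itp(head_embedding: np.ndarray, relation: str, candidates: dict) -> str:
    """Incomplete Triple Prediction (Eq. 7): score candidate tails against
    the head embedding produced by EDP (dot product as a toy scorer)."""
    rel = mlm_encode(relation)
    return max(candidates, key=lambda e: float((head_embedding + rel) @ candidates[e]))

candidates = {e: mlm_encode(e) for e in ["Wapakoneta", "Paris", "Ohio"]}
head = edp("An American astronaut, first person to walk on the Moon")
tail = itp(head, "BornIn", candidates)
print(tail)
```

The point of the pipeline is that EDP's output embedding lets ITP operate on entities never seen during training, since the head is represented purely from its description.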
Separated Encoding. As shown in Fig. 16 (c), these methods partition a triple $(h,r,t)$ into two distinct parts, i.e., $(h,r)$ and $t$ , which can be expressed as
$$
x_{(h,r)}=\texttt{[CLS]}\ \text{Text}_{h}\ \texttt{[SEP]}\ \text{Text}_{r}\ \texttt{[SEP]}, \tag{9}
$$
$$
x_{t}=\texttt{[CLS]}\ \text{Text}_{t}\ \texttt{[SEP]}. \tag{10}
$$
Then the two parts are encoded separately by LLMs, and the final hidden states of the [CLS] tokens are used as the representations of $(h,r)$ and $t$ , respectively. The representations are then fed into a scoring function to predict the plausibility of the triple, formulated as
$$
s=f_{score}(e_{(h,r)},e_{t}), \tag{11}
$$
where $f_{score}$ denotes a score function such as TransE.
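A minimal sketch of separated encoding (Eqs. (9)-(11)), with a stub encoder in place of the LLM and a TransE-like distance as $f_{score}$ (the sequences and scorer are illustrative):

```python
import numpy as np

def encode(sequence: str, dim: int = 8) -> np.ndarray:
    """Stand-in for an LLM's [CLS] representation of a sequence."""
    seed = sum(ord(c) for c in sequence) % (2**32)
    return np.random.default_rng(seed).normal(size=dim)

def f_score(e_hr: np.ndarray, e_t: np.ndarray) -> float:
    """Eq. (11): a TransE-like score, negative distance between the
    (h, r) representation and the tail representation (higher is better)."""
    return -float(np.linalg.norm(e_hr - e_t))

# Eqs. (9)-(10): the two parts are serialized and encoded independently.
x_hr = "[CLS] Neil Armstrong: an American astronaut [SEP] born in [SEP]"
x_t = "[CLS] Wapakoneta: a city in Ohio [SEP]"
s = f_score(encode(x_hr), encode(x_t))
print(s)
```

Because the tail is encoded independently of the query, tail representations can be pre-computed once and reused across all queries, which is the main efficiency advantage of this design over joint encoding.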
StAR [143] applies Siamese-style textual encoders to the two parts, encoding them into separate contextualized representations. To avoid the combinatorial explosion of textual encoding approaches such as KG-BERT, StAR employs a scoring module that involves both a deterministic classifier and a spatial measurement for representation and structure learning, respectively, which also enhances structured knowledge by exploiting spatial characteristics. SimKGC [144] is another instance of leveraging a Siamese textual encoder to encode textual representations. Following the encoding process, SimKGC applies contrastive learning to these representations: the similarity between the encoded representation of a triple and its positive samples is maximized, while the similarity to its negative samples is minimized. This enables SimKGC to learn a representation space that separates plausible and implausible triples. To avoid overfitting to textual information, CSPromp-KG [186] employs parameter-efficient prompt learning for KGC.
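The contrastive step can be sketched as an InfoNCE-style objective over in-batch negatives (a simplified stand-in for SimKGC's actual loss; the vectors and temperature here are illustrative):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def info_nce(query, positive, negatives, tau=0.05):
    """Contrastive objective: raise the similarity of the (h, r) query to
    the true tail and lower it for negatives, via a softmax over similarities."""
    sims = np.array([cosine(query, positive)] + [cosine(query, n) for n in negatives])
    logits = sims / tau
    logits -= logits.max()  # numerical stability
    return float(-np.log(np.exp(logits[0]) / np.exp(logits).sum()))

rng = np.random.default_rng(0)
q = rng.normal(size=8)                         # (h, r) representation
pos = q + 0.1 * rng.normal(size=8)             # true tail, close to the query
negs = [rng.normal(size=8) for _ in range(4)]  # in-batch negative tails
loss = info_nce(q, pos, negs)
print(loss)
```

Minimizing this loss pushes plausible tails toward their queries and implausible ones away, yielding the separated representation space described above.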
LP-BERT [145] is a hybrid KGC method that combines MLM encoding and separated encoding. It consists of two stages, namely pre-training and fine-tuning. During pre-training, the standard MLM mechanism is used to pre-train an LLM with KGC data. During the fine-tuning stage, the LLM encodes the two parts separately and is optimized using a contrastive learning strategy (similar to SimKGC [144]).
#### 5.2.2 LLM as Generators (PaG).
<details>
<summary>x14.png Details</summary>

### Visual Description
## Diagram: LLM-Based Query Processing Architectures (PaG)
### Overview
The image is a technical diagram illustrating two different architectures for processing a knowledge graph query triple using Large Language Models (LLMs). The architectures are labeled as "Encoder-Decoder PaG" and "Decoder-Only PaG". The diagram shows the flow of data from an input query triple through text sequence formatting and into the LLM components.
### Components/Axes
The diagram is divided into two main sections, separated by a horizontal dashed line.
**Top Section (Input Preparation):**
* **Top Center:** Text label "Query Triple: (h, r, ?)". An arrow points downward from this label.
* **Below Arrow:** Text label "Text Sequence: [CLS] Textₕ [SEP] Textᵣ [SEP]". This represents the formatted input sequence derived from the query triple.
**Middle Section (Encoder-Decoder PaG):**
* **Label:** "(a) Encoder-Decoder PaG" is centered below this section.
* **Components:**
* A yellow rectangular box labeled "LLMs (En.)" (Encoder).
* A second yellow rectangular box to its right labeled "LLMs (De.)" (Decoder).
* A rightward-pointing arrow connects the "LLMs (En.)" box to the "LLMs (De.)" box.
* **Data Flow:**
* An upward arrow points into the bottom of the "LLMs (En.)" box. The associated text is: "[SEP] Textₕ [SEP] Textᵣ [SEP]".
* An upward arrow points out of the top of the "LLMs (De.)" box. The associated text is: "[SEP] Textₜ [SEP]".
**Bottom Section (Decoder-Only PaG):**
* **Label:** "(a) Decoder-Only PaG" is centered below this section. *(Note: The label is identical to the one above, which may be a typographical error in the source image.)*
* **Components:**
* A single yellow rectangular box labeled "LLMs (De.)".
* **Data Flow:**
* An upward arrow points into the bottom of the "LLMs (De.)" box. The associated text is: "[SEP] Textₕ [SEP] Textᵣ [SEP]".
* An upward arrow points out of the top of the "LLMs (De.)" box. The associated text is: "[SEP] Textₜ [SEP]".
### Detailed Analysis
The diagram details the data transformation and model flow for the two "PaG" (LLM as Generators) approaches.
1. **Input:** The process begins with a "Query Triple" in the form `(h, r, ?)`, which is a standard knowledge graph query where `h` is the head entity, `r` is the relation, and `?` denotes the missing tail entity to be predicted.
2. **Textualization:** This triple is converted into a "Text Sequence" using special tokens: `[CLS]` (classification token), `[SEP]` (separator token), `Textₕ` (text representation of the head entity), and `Textᵣ` (text representation of the relation).
3. **Encoder-Decoder PaG Architecture:**
* The formatted sequence `[SEP] Textₕ [SEP] Textᵣ [SEP]` is fed into an **Encoder** LLM (`LLMs (En.)`).
* The encoder's output is passed to a **Decoder** LLM (`LLMs (De.)`).
* The decoder generates the output sequence `[SEP] Textₜ [SEP]`, where `Textₜ` represents the predicted tail entity in text form.
4. **Decoder-Only PaG Architecture:**
* The same formatted sequence `[SEP] Textₕ [SEP] Textᵣ [SEP]` is fed directly into a single **Decoder-only** LLM (`LLMs (De.)`).
* This single model generates the same output sequence `[SEP] Textₜ [SEP]`.
### Key Observations
* **Architectural Contrast:** The core difference is the use of two specialized models (Encoder + Decoder) versus a single, unified Decoder-only model for the same task.
* **Input Consistency:** Both architectures receive an identical input text sequence derived from the query triple.
* **Output Consistency:** Both architectures are designed to produce an identical output format: a text sequence containing the predicted tail entity (`Textₜ`).
* **Labeling Anomaly:** Both architectural diagrams are labeled with "(a)", which is likely an error. Typically, they would be labeled (a) and (b) for distinction.
### Interpretation
This diagram illustrates a method for framing knowledge graph completion (predicting the tail entity `?` in a triple `(h, r, ?)`) as a text-to-text generation problem solvable by LLMs. The "PaG" approach involves converting the structured query into a natural language-like sequence.
The comparison highlights a significant design choice in applying LLMs:
* The **Encoder-Decoder** approach uses a dedicated encoder to understand the input query and a separate decoder to generate the answer, potentially allowing for more specialized processing.
* The **Decoder-Only** approach leverages the generative capabilities of a single model to perform both understanding and generation, which is characteristic of models like GPT.
The diagram suggests that the research or system being documented explores or utilizes both paradigms, possibly to compare their effectiveness or to offer flexibility based on the available LLM infrastructure. The consistent use of special tokens (`[CLS]`, `[SEP]`) indicates the use of a model architecture and tokenization scheme similar to BERT (for the encoder) or standard sequence-to-sequence models.
</details>
Figure 17: The general framework of adopting LLMs as decoders (PaG) for KG Completion. The En. and De. denote the encoder and decoder, respectively.
Recent works use LLMs as sequence-to-sequence generators for KGC. As presented in Fig. 17 (a) and (b), these approaches involve encoder-decoder or decoder-only LLMs. The LLM receives a text sequence of the query triple $(h,r,?)$ and directly generates the text of the tail entity $t$ .
GenKGC [96] uses the large language model BART [5] as the backbone. Inspired by the in-context learning approach used in GPT-3 [59], where the model concatenates relevant samples to learn correct output answers, GenKGC proposes a relation-guided demonstration technique that includes triples with the same relation to facilitate the model's learning process. In addition, during generation, an entity-aware hierarchical decoding method is proposed to reduce the time complexity. KGT5 [146] introduces a novel KGC model that fulfils four key requirements of such models: scalability, quality, versatility, and simplicity. To address these objectives, it employs a straightforward T5-small architecture. The model is distinct from previous KGC methods in that it is randomly initialized rather than using pre-trained weights. KG-S2S [147] is a comprehensive framework that can be applied to various types of KGC tasks, including static KGC, temporal KGC, and few-shot KGC. To achieve this objective, KG-S2S reformulates the standard triple KG fact by introducing an additional element, forming a quadruple $(h,r,t,m)$ , where $m$ represents the additional "condition" element. Although different KGC tasks may refer to different conditions, they typically share a similar textual format, which enables unification across different KGC tasks. KG-S2S incorporates various techniques, such as entity descriptions, soft prompts, and Seq2Seq dropout, to improve the model's performance, and it utilizes constrained decoding to ensure that the generated entities are valid. For closed-source LLMs (e.g., ChatGPT and GPT-4), AutoKG adopts prompt engineering to design customized prompts [93]. As shown in Fig. 18, these prompts contain the task description, few-shot examples, and test input, which instruct LLMs to predict the tail entity for KG completion.
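The few-shot prompt format of Fig. 18 can be assembled as below; the field names and example triples are illustrative rather than AutoKG's exact template:

```python
def build_kgc_prompt(task, examples, query_head, query_relation):
    """AutoKG-style KGC prompt (cf. Fig. 18): a task description, few-shot
    demonstrations, then the test triple with an empty tail to complete."""
    lines = [task, ""]
    for h, r, t in examples:  # solved demonstrations
        lines += [f"Head: {h}", f"Relation: {r}", f"Tail: {t}", ""]
    # the test input: the LLM is expected to continue after "Tail:"
    lines += [f"Head: {query_head}", f"Relation: {query_relation}", "Tail:"]
    return "\n".join(lines)

prompt = build_kgc_prompt(
    "Given head entity and relation, predict the tail entity from the candidates:",
    [("Charlie's Angels", "genre of", "Comedy-GB")],
    "Charlie's Angels", "prequel of",
)
print(prompt)
```

The prompt string is then sent to the closed-source LLM's text-completion interface, and the generated continuation is taken as the predicted tail entity.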
<details>
<summary>x15.png Details</summary>

### Visual Description
## Diagram: LLM-Based Entity Prediction Process
### Overview
This image is a flowchart or process diagram illustrating how Large Language Models (LLMs) are used to perform a knowledge graph completion task. The diagram shows a pipeline where a specific query about a movie franchise is processed by an LLM to generate a predicted answer.
### Components/Axes
The diagram is structured vertically with a clear upward flow indicated by arrows. The components are color-coded and positioned as follows:
1. **Top Output Box (White, Top-Center):** A rounded rectangle containing the text "Charlie's Angels: Full Throttle". This represents the final output or prediction.
2. **LLMs Processing Box (Yellow, Center):** A rounded rectangle labeled "LLMs". An arrow points from this box to the top output box. To its left is the OpenAI logo (a black and white geometric flower-like symbol).
3. **Input Task Container (Dashed Gray Border, Bottom):** A large, dashed-border container holds the input data and examples. An arrow points from this container up to the "LLMs" box.
* **Task Description (Red Box, Top of Container):** "Given head entity and relation, predict the tail entity from the candidates: [ 100 candidates ]".
* **Example 1 (Purple Box, Middle of Container):**
* Head: Charlie's Angels
* Relation: genre of
* Tail: Comedy-GB
* **Example 2 (Orange Box, Bottom of Container):**
* Head: Charlie's Angels
* Relation: prequel of
* Tail: [This field is blank, indicating it is the target for prediction.]
* **Multiplier (Text, Right of Examples):** "×5" is placed to the right of the two example boxes, suggesting that five such examples (or a few-shot prompt with 5 demonstrations) are provided as context.
### Detailed Analysis
The diagram explicitly details a **few-shot learning** setup for an LLM.
* **Task:** The core task is defined in the red box: predicting a tail entity from a set of 100 candidates, given a head entity and a relation. This is a standard knowledge graph link prediction or completion task.
* **Input Structure:** The input consists of:
1. A natural language instruction (the red box).
2. A set of demonstration examples (the purple and orange boxes). The "×5" indicates there are five such example triples provided in the prompt.
3. The final query to be solved is embedded as the last example (the orange box), where the "Tail:" field is empty, prompting the model to fill it.
* **Example Content:** The examples use the head entity "Charlie's Angels". The first example provides a completed triple ("genre of" -> "Comedy-GB"). The second example sets up the actual prediction problem: given "Charlie's Angels" and the relation "prequel of", what is the tail entity?
* **Process Flow:** The entire input package (instruction + examples) is fed into the "LLMs" (represented by the yellow box and the OpenAI logo). The LLM processes this prompt and generates the text in the top white box as its output: "Charlie's Angels: Full Throttle".
### Key Observations
1. **Spatial Grounding:** The legend (color-coding) is consistent: White for final output, Yellow for the model, Red for the task rule, Purple for a solved example, and Orange for the target query. The "×5" is positioned to the right of the example stack, clearly modifying them.
2. **Trend/Flow Verification:** The arrows establish a clear, unidirectional data flow: Input Examples -> LLMs -> Output Prediction. There is no branching or feedback loop shown.
3. **Component Isolation:** The diagram is cleanly segmented into three logical regions: the **Input Region** (dashed box), the **Processing Region** (LLMs box), and the **Output Region** (top box).
4. **Precision in Transcription:** All text is transcribed exactly. Notably, the tail in the orange box is intentionally left blank, which is a critical part of the diagram's meaning.
### Interpretation
This diagram demonstrates a **prompt engineering** technique for leveraging LLMs as knowledge base completion engines. It shows how a complex relational reasoning task can be framed as a text completion problem.
* **What it suggests:** The LLM is being used not just for its language understanding, but as a repository of world knowledge (e.g., knowing that "Charlie's Angels: Full Throttle" is the sequel to "Charlie's Angels"). The "×5" few-shot examples are crucial; they teach the model the desired input-output format and provide in-context learning cues about the type of knowledge required.
* **Relationships:** The diagram highlights the relationship between structured knowledge (head, relation, tail triples) and unstructured language models. It positions the LLM as an intermediary that can translate between these formats.
* **Notable Anomaly/Insight:** The blank "Tail:" in the orange box is the most significant element. It transforms the diagram from a simple flowchart into a **problem statement**. The entire setup exists to fill that blank. The output "Charlie's Angels: Full Throttle" is the model's solution, confirming that the LLM successfully retrieved the correct sequel from its parametric knowledge based on the provided context and examples. This illustrates the potential of LLMs to perform symbolic reasoning tasks when prompted appropriately.
</details>
Figure 18: The framework of prompt-based PaG for KG Completion.
Comparison between PaE and PaG. LLMs as Encoders (PaE) applies an additional prediction head on top of the representation encoded by LLMs. Therefore, the PaE framework is much easier to finetune, since we only need to optimize the prediction heads while freezing the LLMs. Moreover, the output of the prediction can be easily specified and integrated with existing KGC functions for different KGC tasks. However, during the inference stage, PaE requires computing a score for every candidate in KGs, which could be computationally expensive. Besides, it cannot generalize to unseen entities. Furthermore, PaE requires the representation output of the LLMs, whereas some state-of-the-art LLMs (e.g., GPT-4) are closed-source and do not grant access to the representation output.
LLMs as Generators (PaG), on the other hand, does not need a prediction head and can be used without finetuning or access to representations. Therefore, the PaG framework is suitable for all kinds of LLMs. In addition, PaG directly generates the tail entity, making inference efficient without ranking all the candidates, and it easily generalizes to unseen entities. However, the challenge of PaG is that the generated entities may be diverse and may not lie in the KGs. What is more, a single inference takes longer due to the auto-regressive generation. Last, how to design a powerful prompt that feeds KGs into LLMs is still an open question. Consequently, while PaG has demonstrated promising results for KGC tasks, the trade-off between model complexity and computational efficiency must be carefully considered when selecting an appropriate LLM-based KGC framework.
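The few-shot setup in Fig. 18 can be sketched as a simple prompt builder: solved demonstration triples followed by a query whose tail is left blank for the LLM to generate. This is a minimal illustration; the exact template and number of demonstrations used by the cited methods may differ.

```python
# Minimal sketch of prompt-based PaG for KG completion (cf. Fig. 18).
# The instruction wording and demonstration triples are illustrative
# assumptions, not the exact template of any cited method.

def build_pag_prompt(demonstrations, query_head, query_relation):
    """Assemble a few-shot KGC prompt: completed triples followed by
    an open query whose tail the LLM must generate."""
    lines = ["Given head entity and relation, predict the tail entity:"]
    for head, relation, tail in demonstrations:
        lines.append(f"Head: {head}\nRelation: {relation}\nTail: {tail}")
    # The final query leaves the Tail field blank for the model to fill.
    lines.append(f"Head: {query_head}\nRelation: {query_relation}\nTail:")
    return "\n\n".join(lines)

demos = [("Charlie's Angels", "genre of", "Comedy-GB")]
prompt = build_pag_prompt(demos, "Charlie's Angels", "prequel of")
print(prompt)
```

Feeding such a prompt to a generative LLM yields the tail entity directly, avoiding the per-candidate scoring that PaE requires.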
#### 5.2.3 Model Analysis
Justin et al. [187] provide a comprehensive analysis of KGC methods integrated with LLMs. Their research investigates the quality of LLM embeddings and finds that they are suboptimal for effective entity ranking. In response, they propose several techniques for processing embeddings to improve their suitability for candidate retrieval. The study also compares different model selection dimensions, such as Embedding Extraction, Query Entity Extraction, and Language Model Selection. Lastly, the authors propose a framework that effectively adapts LLMs for knowledge graph completion.
### 5.3 LLM-augmented KG Construction
Knowledge graph construction involves creating a structured representation of knowledge within a specific domain. This includes identifying entities and their relationships with each other. The process of knowledge graph construction typically involves multiple stages, including 1) entity discovery, 2) coreference resolution, and 3) relation extraction. Fig. 19 presents the general framework of applying LLMs for each stage in KG construction. More recent approaches have explored 4) end-to-end knowledge graph construction, which involves constructing a complete knowledge graph in one step, or directly 5) distilling knowledge graphs from LLMs.
#### 5.3.1 Entity Discovery
Entity discovery in KG construction refers to the process of identifying and extracting entities from unstructured data sources, such as text documents, web pages, or social media posts, and incorporating them to construct knowledge graphs.
Named Entity Recognition (NER) involves identifying and tagging named entities in text data with their positions and classifications. The named entities include people, organizations, locations, and other types of entities. The state-of-the-art NER methods usually employ LLMs to leverage their contextual understanding and linguistic knowledge for accurate entity recognition and classification. There are three NER sub-tasks based on the types of NER spans identified, i.e., flat NER, nested NER, and discontinuous NER. 1) Flat NER is to identify non-overlapping named entities from input text. It is usually conceptualized as a sequence labelling problem where each token in the text is assigned a unique label based on its position in the sequence [148, 1, 188, 189]. 2) Nested NER considers complex scenarios which allow a token to belong to multiple entities. The span-based method [190, 191, 192, 193, 194] is a popular branch of nested NER which involves enumerating all candidate spans and classifying them into entity types (including a non-entity type). Parsing-based methods [195, 196, 197] reveal similarities between nested NER and constituency parsing tasks (predicting nested and non-overlapping spans), and propose to integrate the insights of constituency parsing into nested NER. 3) Discontinuous NER identifies named entities that may not be contiguous in the text. To address this challenge, [198] uses the LLM output to identify entity fragments and determine whether they are overlapped or in succession.
Unlike the task-specific methods, GenerativeNER [149] uses a sequence-to-sequence LLM with a pointer mechanism to generate an entity sequence, which is capable of solving all three types of NER sub-tasks.
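Flat NER's sequence-labelling formulation can be made concrete with a small decoder that turns per-token BIO labels into entity spans; the token/label pairs below are illustrative, and in practice the labels would come from an LLM-based tagger.

```python
# Sketch of decoding BIO sequence labels into entity spans, the standard
# formulation of flat NER [148, 1, 188, 189]. The example tags are
# illustrative stand-ins for the output of an LLM-based tagger.

def decode_bio(tokens, tags):
    """Convert per-token BIO labels into (entity_text, type) spans."""
    entities, current, ctype = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):           # a new entity begins
            if current:
                entities.append((" ".join(current), ctype))
            current, ctype = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == ctype:
            current.append(token)          # continue the open entity
        else:                              # "O" or inconsistent tag closes it
            if current:
                entities.append((" ".join(current), ctype))
            current, ctype = [], None
    if current:
        entities.append((" ".join(current), ctype))
    return entities

tokens = ["Joe", "Biden", "was", "born", "in", "Pennsylvania", "."]
tags   = ["B-PER", "I-PER", "O", "O", "O", "B-LOC", "O"]
print(decode_bio(tokens, tags))  # [('Joe Biden', 'PER'), ('Pennsylvania', 'LOC')]
```

Note that this span-per-token scheme is exactly what breaks down for nested and discontinuous NER, motivating the span-based and parsing-based methods above.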
<details>
<summary>x16.png Details</summary>

### Visual Description
## Diagram: LLM-based Knowledge Graph Construction
### Overview
This image is a two-part diagram illustrating the process of constructing a knowledge graph from unstructured text using a Large Language Model (LLM). The top section displays the resulting knowledge graph, while the bottom section details the LLM-based extraction and linking pipeline that generates it. The overall flow is indicated by upward-pointing arrows, showing that the source text at the very bottom is processed to create the structured graph at the top.
### Components/Axes
The diagram is segmented into three primary regions:
1. **Top Region (Knowledge Graph):** A network diagram with nodes (blue circles) and labeled, colored edges representing relationships.
2. **Middle Region (LLM-based Knowledge Graph Construction):** A yellow box containing the source sentence with color-coded annotations and a legend explaining the annotation scheme.
3. **Bottom Region (Source Text):** The raw input text.
**Legend (Located in the middle region, bottom):**
* **Light Blue Box:** Named Entity Recognition
* **Light Orange Box:** Entity Typing
* **Image of Joe Biden:** Entity Linking
* **Light Purple Box:** Coreference Resolution
* **Light Red Box:** Relation Extraction
**Nodes and Edges in the Knowledge Graph (Top Region):**
* **Nodes:** Represent entities. Visualized as blue circles, some accompanied by images (Joe Biden's portrait, US flag, Pennsylvania skyline).
* **Edges:** Represent relationships. Color-coded and labeled:
* **Yellow Edges:** Labeled "IsA". Connects an entity to its type (e.g., Joe Biden -> politician, United States -> country, Pennsylvania -> state).
* **Red/Brown Edges:** Labeled with specific relations. Connects entities (e.g., Joe Biden -> BornIn -> Pennsylvania, Joe Biden -> PresidentOf -> United States).
### Detailed Analysis
**Source Text (Bottom Region):**
The input sentence is: "Joe Biden was born in Pennsylvania. He serves as the 46th President of the United States."
**LLM-based Construction Process (Middle Region):**
The source sentence is annotated to show the NLP tasks performed:
1. **Named Entity Recognition (Light Blue):** Identifies "Joe Biden", "Pennsylvania", "United States".
2. **Entity Typing (Light Orange):** Assigns types: "politician" (to Joe Biden), "state" (to Pennsylvania), "country" (to United States).
3. **Entity Linking (Image Icon):** Links the text "Joe Biden" to a specific real-world entity (represented by his portrait). Similarly, "Pennsylvania" is linked to a skyline image, and "United States" to a flag.
4. **Coreference Resolution (Light Purple):** Links the pronoun "He" back to the antecedent "Joe Biden".
5. **Relation Extraction (Light Red):** Extracts the relationships "born in" and "President of the".
**Resulting Knowledge Graph (Top Region):**
The extracted information is structured into a graph:
* **Central Node:** Joe Biden (linked to his portrait).
* **Relationships from Joe Biden:**
* `IsA` -> `politician` (yellow edge).
* `BornIn` -> `Pennsylvania` (red edge).
* `PresidentOf` -> `United States` (red edge).
* **Sub-graphs for Linked Entities:**
* `Pennsylvania` (linked to skyline) `IsA` -> `state` (yellow edge). It has an outgoing red edge to an unspecified node ("...").
* `United States` (linked to flag) `IsA` -> `country` (yellow edge). It has an incoming red edge from an unspecified node ("...").
### Key Observations
* The diagram explicitly maps each NLP sub-task (from the legend) to specific highlights in the source text.
* The knowledge graph uses a consistent visual language: blue circles for nodes, yellow "IsA" edges for typing, and red/brown edges for specific relations.
* The graph is incomplete, indicated by nodes with "..." and edges leading to unspecified nodes, suggesting it is a fragment of a larger knowledge base.
* The use of images (portrait, flag, skyline) alongside text labels serves as a form of entity disambiguation and visual grounding.
### Interpretation
This diagram serves as a pedagogical or technical illustration of how modern LLMs can be used for **structured information extraction**. It demonstrates the pipeline from unstructured natural language to a formal, queryable knowledge representation.
The process highlights several advanced NLP capabilities:
1. **Disambiguation:** Entity Linking ensures "Joe Biden" refers to the specific U.S. President, not another person with the same name.
2. **Contextual Understanding:** Coreference Resolution ("He" -> "Joe Biden") is crucial for connecting facts across sentences.
3. **Schema Induction:** The system automatically identifies entity types ("politician", "state", "country") and relation types ("BornIn", "PresidentOf"), which could be part of a predefined schema or dynamically inferred.
The output knowledge graph transforms a simple biographical sentence into a set of **subject-predicate-object triples** (e.g., `(Joe Biden, BornIn, Pennsylvania)`). This structured format is fundamental for semantic search, question answering systems, and building larger AI knowledge bases. The diagram effectively argues that LLMs can act as the "engine" for this complex extraction and structuring task, bridging the gap between human language and machine-readable knowledge.
</details>
Figure 19: The general framework of LLM-based KG construction.
Entity Typing (ET) aims to provide fine-grained and ultra-fine-grained type information for a given entity mentioned in context. These methods usually utilize LLMs to encode mentions, context, and types. LDET [150] applies pre-trained ELMo embeddings [148] for word representation and adopts LSTM as its sentence and mention encoders. BOX4Types [151] recognizes the importance of type dependency and uses BERT to represent the hidden vector and each type in a hyperrectangular (box) space. LRN [199] considers extrinsic and intrinsic dependencies between labels. It encodes the context and entity with BERT and employs these output embeddings to conduct deductive and inductive reasoning. MLMET [200] uses predefined patterns to construct input samples for the BERT MLM and employs [MASK] to predict context-dependent hypernyms of the mention, which can be viewed as type labels. PL [201] and DFET [202] utilize prompt learning for entity typing. LITE [203] formulates entity typing as textual inference and uses RoBERTa-large-MNLI as the backbone network.
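The textual-inference formulation used by LITE can be sketched as pairing the mention's context (premise) with one hypothesis per candidate type; an NLI model such as RoBERTa-large-MNLI would then score each pair for entailment. The hypothesis template below is an illustrative assumption, not LITE's exact wording.

```python
# Sketch of casting entity typing as textual inference, in the spirit of
# LITE [203]. The "{mention} is a {type}." template is an illustrative
# assumption; a real system scores each pair with an NLI model.

def typing_as_nli(context, mention, candidate_types):
    """Build (premise, hypothesis) pairs, one per candidate type."""
    pairs = []
    for t in candidate_types:
        premise = context                     # sentence containing the mention
        hypothesis = f"{mention} is a {t}."   # type claim to verify
        pairs.append((premise, hypothesis))
    return pairs

pairs = typing_as_nli("Joe Biden was born in Pennsylvania.",
                      "Joe Biden", ["politician", "state"])
print(pairs[0])
```

The predicted type is then the candidate whose hypothesis receives the highest entailment probability.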
Entity Linking (EL), also known as entity disambiguation, involves linking entity mentions appearing in the text to their corresponding entities in a knowledge graph. [204] proposed BERT-based end-to-end EL systems that jointly discover and link entities. ELQ [152] employs a fast bi-encoder architecture to jointly perform mention detection and linking in one pass for downstream question answering systems. Unlike previous models that frame EL as matching in vector space, GENRE [205] formulates it as a sequence-to-sequence problem, autoregressively generating a version of the input markup-annotated with the unique identifiers of an entity expressed in natural language. GENRE is extended to its multilingual version mGENRE [206]. Considering the efficiency challenges of generative EL approaches, [207] parallelizes autoregressive linking across all potential mentions and relies on a shallow and efficient decoder. ReFinED [153] proposes an efficient zero-shot-capable EL approach by taking advantage of fine-grained entity types and entity descriptions which are processed by an LLM-based encoder.
#### 5.3.2 Coreference Resolution (CR)
Coreference resolution aims to find all expressions (i.e., mentions) that refer to the same entity or event in a text.
Within-document CR refers to the CR sub-task where all these mentions are in a single document. Mandar et al. [154] initialize LLM-based coreference resolution by replacing the previous LSTM encoder [208] with BERT. This work is followed by the introduction of SpanBERT [155], which is pre-trained on the BERT architecture with a span-based masked language model (MLM). Inspired by these works, Tuan Manh et al. [209] present a strong baseline by incorporating the SpanBERT encoder into a non-LLM approach e2e-coref [208]. CorefBERT leverages the Mention Reference Prediction (MRP) task, which masks one or several mentions and requires the model to predict the masked mention’s corresponding referents. CorefQA [210] formulates coreference resolution as a question answering task, where contextual queries are generated for each candidate mention and the coreferent spans are extracted from the document using the queries. Tuan Manh et al. [211] introduce a gating mechanism and a noisy training method to extract information from event mentions using the SpanBERT encoder.
In order to reduce the large memory footprint faced by large LLM-based CR models, Yuval et al. [212] and Raghuveer et al. [213] propose start-to-end and approximation models, respectively, both utilizing bilinear functions to calculate mention and antecedent scores with reduced reliance on span-level representations.
Cross-document CR refers to the sub-task where mentions that refer to the same entity or event may be spread across multiple documents. CDML [156] proposes a cross-document language modeling method which pre-trains a Longformer [214] encoder on concatenated related documents and employs an MLP for binary classification to determine whether a pair of mentions is coreferent or not. CrossCR [157] utilizes an end-to-end model for cross-document coreference resolution which pre-trains the mention scorer on gold mention spans and uses a pairwise scorer to compare mentions with all spans across all documents. CR-RL [158] proposes an actor-critic deep reinforcement learning-based coreference resolver for cross-document CR.
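The bilinear mention scoring shared by the lightweight CR models above can be sketched in a few lines: two mention embeddings are compared through a learned weight matrix and thresholded into a coreferent/not-coreferent decision. The vectors and weights below are random illustrative stand-ins for LLM-encoded mentions and trained parameters.

```python
# Toy sketch of pairwise coreference scoring with a bilinear function,
# as used by the memory-efficient models of [212, 213]. Embeddings and
# the weight matrix are random stand-ins for learned quantities.
import numpy as np

rng = np.random.default_rng(0)
dim = 8
W = rng.standard_normal((dim, dim))  # learned bilinear weights (stand-in)

def coreferent_score(m_i, m_j, W):
    """Bilinear compatibility score s(i, j) = m_i^T W m_j."""
    return float(m_i @ W @ m_j)

def is_coreferent(m_i, m_j, W, threshold=0.0):
    # Binary decision, mirroring the pairwise classifiers of CDML/CrossCR.
    return coreferent_score(m_i, m_j, W) > threshold

m1, m2 = rng.standard_normal(dim), rng.standard_normal(dim)
print(is_coreferent(m1, m2, W))
```

Because the score needs only two mention vectors rather than full span representations, all pairwise decisions can be computed cheaply over candidates from one or many documents.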
#### 5.3.3 Relation Extraction (RE)
Relation extraction involves identifying semantic relationships between entities mentioned in natural language text. There are two types of relation extraction methods, i.e. sentence-level RE and document-level RE, according to the scope of the text analyzed.
Sentence-level RE focuses on identifying relations between entities within a single sentence. Peng et al. [159] and TRE [215] introduce LLM to improve the performance of relation extraction models. BERT-MTB [216] learns relation representations based on BERT by performing the matching-the-blanks task and incorporating designed objectives for relation extraction. Curriculum-RE [160] utilizes curriculum learning to improve relation extraction models by gradually increasing the difficulty of the data during training. RECENT [217] introduces SpanBERT and exploits entity type restriction to reduce the noisy candidate relation types. Jiewen [218] extends RECENT by combining both the entity information and the label information into sentence-level embeddings, which enables the embedding to be entity-label aware.
Document-level RE (DocRE) aims to extract relations between entities across multiple sentences within a document. Hong et al. [219] propose a strong baseline for DocRE by replacing the BiLSTM backbone with LLMs. HIN [220] uses LLMs to encode and aggregate entity representations at different levels, including entity, sentence, and document levels. GLRE [221] is a global-to-local network, which uses LLMs to encode the document information in terms of entity global and local representations as well as context relation representations. SIRE [222] uses two LLM-based encoders to extract intra-sentence and inter-sentence relations. LSR [223] and GAIN [224] propose graph-based approaches which induce graph structures on top of LLMs to better extract relations. DocuNet [225] formulates DocRE as a semantic segmentation task and introduces a U-Net [226] on the LLM encoder to capture local and global dependencies between entities. ATLOP [227] focuses on the multi-label problem in DocRE, which is handled with two techniques, i.e., adaptive thresholding for the classifier and localized context pooling for the LLM. DREEAM [161] further extends and improves ATLOP by incorporating evidence information.
End-to-End KG Construction. Currently, researchers are exploring the use of LLMs for end-to-end KG construction. Kumar et al. [95] propose a unified approach to build KGs from raw text, which contains two LLM-powered components. They first finetune an LLM on named entity recognition tasks to make it capable of recognizing entities in raw text. Then, they propose another “2-model BERT” for solving the relation extraction task, which contains two BERT-based classifiers. The first classifier learns the relation class, whereas the second binary classifier learns the direction of the relations between the two entities. The predicted triples and relations are then used to construct the KG. Guo et al. [162] propose an end-to-end knowledge extraction model based on BERT, which can be applied to construct KGs from Classical Chinese text. Grapher [41] presents a novel end-to-end multi-stage system. It first utilizes LLMs to generate KG entities, followed by a simple relation construction head, enabling efficient KG construction from textual descriptions. PiVE [163] proposes a prompting framework with iterative verification that utilizes a smaller LLM like T5 to correct the errors in KGs generated by a larger LLM (e.g., ChatGPT). To further explore advanced LLMs, AutoKG designs several prompts for different KG construction tasks (e.g., entity typing, entity linking, and relation extraction). It then adopts these prompts to perform KG construction using ChatGPT and GPT-4.
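Whatever pipeline produces the triples, the final assembly step is the same: indexing (head, relation, tail) predictions into a graph structure. A minimal sketch, with illustrative entity and relation names:

```python
# Minimal sketch of the assembly step in end-to-end KG construction:
# extracted (head, relation, tail) triples are indexed into an adjacency
# structure. Entity and relation names are illustrative.
from collections import defaultdict

def build_kg(triples):
    """Index triples as head -> [(relation, tail), ...]."""
    kg = defaultdict(list)
    for head, relation, tail in triples:
        kg[head].append((relation, tail))
    return kg

triples = [
    ("Joe Biden", "BornIn", "Pennsylvania"),
    ("Joe Biden", "PresidentOf", "United States"),
    ("Pennsylvania", "IsA", "state"),
]
kg = build_kg(triples)
print(kg["Joe Biden"])  # [('BornIn', 'Pennsylvania'), ('PresidentOf', 'United States')]
```

A production system would add entity resolution (merging "He" and "Joe Biden" after coreference resolution) and schema validation on top of this indexing step.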
<details>
<summary>x17.png Details</summary>

### Visual Description
## Diagram: Knowledge Graph Construction from Cloze Questions via LLMs
### Overview
This image is a flowchart diagram illustrating a three-stage process for constructing a Knowledge Graph (KG). The process begins with natural language cloze-style (fill-in-the-blank) questions, which are processed by Large Language Models (LLMs) to extract structured relational triples. These triples are then used to build a connected knowledge graph. The diagram is presented on a light gray background.
### Components/Axes
The diagram is segmented into four primary components, arranged from left to right, connected by right-pointing block arrows.
1. **Cloze Question (Leftmost Component):**
* A black-bordered, rounded rectangle containing example text.
* **Text Content:**
* `Obama born in [MASK]`
* `Honolulu is located in [MASK]`
* `USA's capital is [MASK]`
* `...` (indicating more questions)
2. **LLMs (Central Processing Component):**
* A yellow, rounded rectangle with a black border.
* **Label:** `LLMs` (centered within the box).
3. **Distilled Triples (Right-Center Component):**
* A light blue, rounded rectangle with a black border.
* **Text Content (Structured Triples):**
* `(Obama, BornIn, Honolulu)`
* `(Honolulu, LocatedIn, USA)`
* `(Washingto D.C., CapitalOf, USA)`
* `...` (indicating more triples)
* **Note:** The text contains a typo: "Washingto" instead of "Washington".
4. **Construct KGs (Rightmost Component):**
* A directed graph (knowledge graph) with labeled nodes and edges.
* **Title:** `Construct KGs` (bold, top-right).
* **Nodes (Entities):** Represented by light blue circles with black outlines.
* `Brarck Obama` (Note: Typo for "Barack")
* `Honolulu`
* `USA`
* `Michelle Obama`
* `Washingto D.C.` (Note: Typo for "Washington")
* **Edges (Relationships):** Represented by black arrows with text labels.
* From `Brarck Obama` to `Honolulu`: `BornIn`
* From `Brarck Obama` to `USA`: `PoliticianOf`
* From `Brarck Obama` to `Michelle Obama`: `MarriedTo`
* From `Michelle Obama` to `USA`: `LiveIn`
* From `Honolulu` to `USA`: `LocatedIn`
* From `Washingto D.C.` to `USA`: `CapitalOf`
### Detailed Analysis
The diagram explicitly maps the transformation of unstructured text into structured data.
* **Stage 1 (Input):** The "Cloze Question" box provides the raw input format. The `[MASK]` token signifies the missing piece of information the LLM is expected to predict or fill.
* **Stage 2 (Processing):** The "LLMs" box acts as the inference engine. It takes the cloze questions as input and performs the task of filling in the masks, effectively performing relation extraction.
* **Stage 3 (Intermediate Output):** The "Distilled Triples" box shows the direct output of the LLM processing. Each line is a structured triple in the format `(Subject, Predicate, Object)`. The ellipsis (`...`) indicates this is a sample from a larger set.
* **Stage 4 (Final Output):** The "Construct KGs" section visualizes how the extracted triples are assembled into a graph database. Each unique entity from the triples becomes a node. Each triple defines a directed, labeled edge between two nodes. The graph shows a small, interconnected network centered around the entity `USA`.
### Key Observations
1. **Data Flow:** The process is linear and unidirectional: Unstructured Text -> LLM Inference -> Structured Triples -> Knowledge Graph.
2. **Entity Consolidation:** The graph demonstrates entity resolution. For example, the subject "Obama" in the triple becomes the node `Brarck Obama`, and the object "Honolulu" becomes the node `Honolulu`.
3. **Relationship Expansion:** The final graph contains relationships (`PoliticianOf`, `MarriedTo`, `LiveIn`) that were not explicitly shown in the sample "Distilled Triples" box, implying the ellipsis (`...`) represents a more comprehensive set of extracted facts.
4. **Typos/Errors:** Two consistent typos are present: "Brarck" for "Barack" and "Washingto" for "Washington". These appear in both the triple and the graph node, suggesting they originate from the source data or the extraction process.
### Interpretation
This diagram serves as a conceptual pipeline for automated knowledge base population. It demonstrates a method to leverage the latent knowledge within Large Language Models to structure information from plain text.
* **What it suggests:** The core idea is that LLMs can be prompted with cloze-style questions to reliably extract factual, relational knowledge. This extracted knowledge is not stored as text but as structured triples, which are the fundamental building blocks of knowledge graphs.
* **How elements relate:** The cloze questions define the *scope* of knowledge to be extracted (e.g., birthplaces, locations, capitals). The LLM is the *transformer* that converts this scope into structured assertions. The triples are the *atomic units* of knowledge. The knowledge graph is the *integrated system* that reveals connections between entities (e.g., linking Barack Obama to the USA via multiple relationships).
* **Notable implications:** The presence of typos highlights a critical challenge in this automated pipeline: data quality and consistency. Errors in the initial extraction (like misspelling "Barack") propagate directly into the final knowledge graph, potentially affecting its reliability for downstream tasks like reasoning or question answering. The diagram effectively argues for the feasibility of this approach while implicitly underscoring the need for robust error-checking and entity normalization steps.
</details>
Figure 20: The general framework of distilling KGs from LLMs.
#### 5.3.4 Distilling Knowledge Graphs from LLMs
LLMs have been shown to implicitly encode massive knowledge [14]. As shown in Fig. 20, some research aims to distill knowledge from LLMs to construct KGs. COMET [164] proposes a commonsense transformer model that constructs commonsense KGs by using existing tuples as a seed set of knowledge on which to train. Using this seed set, an LLM learns to adapt its learned representations to knowledge generation and produces high-quality novel tuples. Experimental results reveal that implicit knowledge from LLMs is transferred to generate explicit knowledge in commonsense KGs. BertNet [165] proposes a novel framework for automatic KG construction empowered by LLMs. It requires only the minimal definition of relations as inputs, automatically generates diverse prompts, and performs an efficient knowledge search within a given LLM for consistent outputs. The constructed KGs show competitive quality, diversity, and novelty with a richer set of new and complex relations, which cannot be extracted by previous methods. West et al. [166] propose a symbolic knowledge distillation framework that distills symbolic knowledge from LLMs. They first finetune a small student LLM by distilling commonsense facts from a large LLM like GPT-3. Then, the student LLM is utilized to generate commonsense KGs.
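The cloze-style distillation loop of Fig. 20 can be sketched as follows. Here `ask_llm` is a hypothetical stand-in for querying a real model, and the per-relation prompt templates are illustrative assumptions rather than those of any cited method.

```python
# Sketch of distilling triples from an LLM with cloze prompts (cf. Fig. 20).
# `ask_llm` is a hypothetical callable standing in for a real model query;
# the templates are illustrative assumptions.

TEMPLATES = {
    "BornIn":    "{head} was born in [MASK]",
    "LocatedIn": "{head} is located in [MASK]",
}

def distill_triples(heads_and_relations, ask_llm):
    """Fill each cloze prompt with the LLM's answer to form a triple."""
    triples = []
    for head, relation in heads_and_relations:
        prompt = TEMPLATES[relation].format(head=head)
        tail = ask_llm(prompt)            # LLM fills the [MASK] slot
        triples.append((head, relation, tail))
    return triples

# Toy oracle standing in for an LLM call:
oracle = {"Obama was born in [MASK]": "Honolulu"}
print(distill_triples([("Obama", "BornIn")], oracle.get))
# [('Obama', 'BornIn', 'Honolulu')]
```

As Fig. 20 illustrates, errors in the LLM's answers propagate directly into the constructed KG, so distillation pipelines typically need downstream verification and entity normalization.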
### 5.4 LLM-augmented KG-to-text Generation
The goal of Knowledge-graph-to-text (KG-to-text) generation is to generate high-quality texts that accurately and consistently describe the input knowledge graph information [228]. KG-to-text generation connects knowledge graphs and texts, significantly improving the applicability of KGs in more realistic NLG scenarios, including storytelling [229] and knowledge-grounded dialogue [230]. However, it is challenging and costly to collect large amounts of graph-text parallel data, resulting in insufficient training and poor generation quality. Thus, many research efforts resort to either 1) leveraging knowledge from LLMs or 2) constructing large-scale weakly-supervised KG-text corpora to solve this issue.
#### 5.4.1 Leveraging Knowledge from LLMs
As pioneering research efforts in using LLMs for KG-to-Text generation, Ribeiro et al. [167] and Kale and Rastogi [231] directly fine-tune various LLMs, including BART and T5, with the goal of transferring LLM knowledge to this task. As shown in Fig. 21, both works simply represent the input graph as a linear traversal and find that such a naive approach successfully outperforms many existing state-of-the-art KG-to-text generation systems. Interestingly, Ribeiro et al. [167] also find that continued pre-training could further improve model performance. However, these methods are unable to explicitly incorporate rich graph semantics in KGs. To enhance LLMs with KG structure information, JointGT [42] proposes to inject KG structure-preserving representations into Seq2Seq large language models. Given input sub-KGs and corresponding text, JointGT first represents the KG entities and their relations as a sequence of tokens, then concatenates them with the textual tokens which are fed into the LLM. After the standard self-attention module, JointGT uses a pooling layer to obtain the contextual semantic representations of knowledge entities and relations. Finally, these pooled KG representations are aggregated in another structure-aware self-attention layer. JointGT also deploys additional pre-training objectives, including KG and text reconstruction tasks given masked inputs, to improve the alignment between text and graph information. Li et al. [168] focus on the few-shot scenario. They first employ a novel breadth-first search (BFS) strategy to better traverse the input KG structure and feed the enhanced linearized graph representations into LLMs for high-quality generated outputs, and then align the GCN-based and LLM-based KG entity representations. Colas et al. [169] first transform the graph into its appropriate representation before linearizing the graph.
Next, each KG node is encoded via a global attention mechanism, followed by a graph-aware attention module, ultimately being decoded into a sequence of tokens. Different from these works, KG-BART [37] keeps the structure of KGs and leverages the graph attention to aggregate the rich concept semantics in the sub-KG, which enhances the model generalization on unseen concept sets.
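The naive linearization used by the pioneering fine-tuning works can be sketched as flattening triples into one separator-delimited token sequence for a Seq2Seq LLM. The `[SEP]` separator follows the convention shown in Fig. 21; treating it as a plain string is an illustrative simplification.

```python
# Sketch of naive graph linearization for KG-to-text generation, in the
# spirit of Ribeiro et al. [167] and Kale and Rastogi [231]. Using the
# literal string "[SEP]" as separator is an illustrative assumption.

def linearize_graph(triples, sep=" [SEP] "):
    """Flatten (head, relation, tail) triples into one token sequence."""
    return sep.join(sep.join(triple) for triple in triples)

triples = [("Barack Obama", "PoliticianOf", "USA"),
           ("Barack Obama", "BornIn", "Honolulu")]
print(linearize_graph(triples))
```

This sequence is then fed to a fine-tuned BART or T5 model, which generates the description text; note that the linear form discards the graph topology, which is exactly the limitation JointGT and KG-BART address.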
<details>
<summary>x18.png Details</summary>

### Visual Description
## Diagram: Knowledge Graph to Text Generation Pipeline
### Overview
The image is a technical diagram illustrating a pipeline for converting structured knowledge from a Knowledge Graph (KG) into natural language text using Large Language Models (LLMs). The flow moves from left to right, starting with a graph representation, proceeding to a linearized text format, passing through an LLM, and resulting in a descriptive sentence.
### Components/Axes
The diagram is segmented into four main regions from left to right:
1. **Left Region (KGs):** A knowledge graph with nodes (entities) and directed edges (relationships).
2. **Center-Left Region (Graph Linearization):** A text-based representation of the graph's triples.
3. **Center Region (LLMs):** A yellow, rounded rectangle representing the Large Language Model processing unit.
4. **Right Region (Description Text):** The final generated natural language output.
**Labels and Text Elements:**
* **Title (Top-Left):** "KGs"
* **Graph Nodes (Entities):**
* "Brarck Obama" (Note: Likely a typo for "Barack Obama")
* "Honolulu"
* "USA"
* "Michelle Obama"
* "Washingto D.C." (Note: Likely a typo for "Washington D.C.")
* **Graph Edges (Relationships):**
* "BornIn" (from Brarck Obama to Honolulu)
* "PoliticianOf" (from Brarck Obama to USA)
* "MarriedTo" (from Brarck Obama to Michelle Obama)
* "LiveIn" (from Michelle Obama to USA)
* "LocatedIn" (from Honolulu to USA)
* "CapitalOf" (from Washingto D.C. to USA)
* **Section Title (Center-Left):** "Graph Linearization"
* **Linearized Text:**
* "Brack Obama [SEP]"
* "PoliticianOf [SEP]"
* "USA [SEP] ....."
* "[SEP] Michelle Obama"
* **Processing Unit Label (Center):** "LLMs"
* **Output Section Title (Top-Right):** "Description Text"
* **Generated Text (Right):** "Brack Obama is a politician of USA. He was born in Honolulu, and married to Michelle Obama." (Entity names "Brack Obama", "USA", "Honolulu", and "Michelle Obama" are underlined in the image).
### Detailed Analysis
The diagram explicitly maps the transformation process:
1. **Knowledge Graph (KG):** A graph structure with 5 nodes and 6 directed, labeled edges defining factual relationships between entities (people, locations, political entities).
2. **Graph Linearization:** The graph is converted into a sequence of tokens. The example shows a partial sequence: `Brack Obama [SEP] PoliticianOf [SEP] USA [SEP] ..... [SEP] Michelle Obama`. The "[SEP]" token is used as a separator between elements. The "....." indicates the sequence is truncated for the diagram.
3. **LLM Processing:** The linearized text sequence is input into a block labeled "LLMs".
4. **Text Generation:** The LLM outputs a coherent, grammatical sentence that verbalizes the relationships from the original graph: "Brack Obama is a politician of USA. He was born in Honolulu, and married to Michelle Obama."
### Key Observations
* **Typos in Source Data:** The knowledge graph contains typos ("Brarck", "Washingto") which are partially propagated to the linearization ("Brack") and the final output ("Brack").
* **Underlining in Output:** In the final "Description Text", the entity names are underlined, visually highlighting them as the key pieces of information extracted from the structured data.
* **Directional Flow:** The process is strictly unidirectional, indicated by large, hollow arrows pointing from left to right between each major component.
* **Abstraction:** The "LLMs" block is a black box; the diagram focuses on the input-output transformation rather than the internal model mechanics.
### Interpretation
This diagram serves as a conceptual model for a **Knowledge Graph-to-Text (KG-to-Text)** generation task. It demonstrates how structured, relational data stored in a knowledge graph can be transformed into human-readable prose.
* **What it demonstrates:** The pipeline shows a method for "verbalizing" a knowledge base. The linearization step is crucial, as it formats the graph data into a sequential input that standard LLMs, which are trained on text, can process.
* **Relationships between elements:** The Knowledge Graph is the source of truth containing discrete facts. The Linearization is a necessary intermediate representation. The LLM acts as the "translator" or "renderer," applying its language understanding and generation capabilities to produce fluent text that accurately reflects the source facts.
* **Notable implications:** The presence of typos in the source graph and their propagation highlights a key challenge in real-world systems: the quality of the output is dependent on the quality of the input data. The underlining in the final text emphasizes that the goal is not just to generate any sentence, but to generate one that explicitly conveys the specific entities and relationships from the original graph. This process is fundamental to applications like automated report generation, conversational AI over databases, and creating accessible summaries of complex data.
</details>
Figure 21: The general framework of KG-to-text generation.
#### 5.4.2 Constructing Large Weakly KG-Text Aligned Corpora
Although LLMs have achieved remarkable empirical success, their unsupervised pre-training objectives are not necessarily well aligned with the task of KG-to-text generation, motivating researchers to develop large-scale KG-text aligned corpora. Jin et al. [170] construct a 1.3M-sample unsupervised KG-to-text training dataset from Wikipedia. Specifically, they first detect the entities appearing in the text via hyperlinks and named entity detectors, and then add only text that shares a common set of entities with the corresponding knowledge graph, similar to the idea of distant supervision in the relation extraction task [232]. They also provide 1,000+ human-annotated KG-to-text test samples to verify the effectiveness of pre-trained KG-to-text models. Similarly, Chen et al. [171] propose a KG-grounded text corpus collected from the English Wikidump. To ensure the connection between KG and text, they only extract sentences with at least two Wikipedia anchor links. Then, they use the entities from those links to query their surrounding neighbors in Wikidata and calculate the lexical overlap between these neighbors and the original sentences. Finally, only highly overlapping pairs are selected. The authors explore both graph-based and sequence-based encoders and identify their advantages in various tasks and settings.
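The entity-overlap filtering used by these corpus-construction pipelines can be sketched as follows. The whitespace tokenization and the overlap threshold are illustrative assumptions, not the exact heuristics of [170] or [171]:

```python
def lexically_aligned(sentence_tokens, kg_entity_names, min_overlap=2):
    """Keep a (KG, sentence) pair only if enough KG entity names appear
    verbatim in the sentence (distant-supervision-style filtering)."""
    sent = {tok.lower() for tok in sentence_tokens}
    hits = sum(
        1 for name in kg_entity_names
        if all(tok.lower() in sent for tok in name.split())
    )
    return hits >= min_overlap

# Two KG entities appear in the sentence -> the pair is kept.
sentence = "Barack Obama was born in Honolulu".split()
print(lexically_aligned(sentence, ["Barack Obama", "Honolulu", "USA"]))  # True
```

Pairs that fail the threshold are discarded, which trades corpus size for a higher chance that the text actually verbalizes the paired subgraph.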
### 5.5 LLM-augmented KG Question Answering
Knowledge graph question answering (KGQA) aims to find answers to natural language questions based on the structured facts stored in knowledge graphs [233, 234]. The inevitable challenge in KGQA is to retrieve related facts and extend the reasoning advantage of KGs to QA. Therefore, recent studies adopt LLMs to bridge the gap between natural language questions and structured knowledge graphs [175, 235, 174]. The general framework of applying LLMs for KGQA is illustrated in Fig. 22, where LLMs can be used as 1) entity/relation extractors, and 2) answer reasoners.
#### 5.5.1 LLMs as Entity/relation Extractors
Entity/relation extractors are designed to identify entities and relationships mentioned in natural language questions and retrieve related facts in KGs. Given their proficiency in language comprehension, LLMs can be effectively utilized for this purpose. Lukovnikov et al. [172] are the first to utilize LLMs as classifiers for relation prediction, resulting in a notable improvement in performance compared to shallow neural networks. Nan et al. [174] introduce two LLM-based KGQA frameworks that adopt LLMs to detect mentioned entities and relations. Then, they query the answer in KGs using the extracted entity-relation pairs. QA-GNN [131] uses LLMs to encode the question and candidate answer pairs, which are adopted to estimate the importance of relative KG entities. The entities are retrieved to form a subgraph, where answer reasoning is conducted by a graph neural network. Luo et al. [173] use LLMs to calculate the similarities between relations and questions to retrieve related facts, formulated as
$$
s(r,q)=\text{LLM}(r)^{\top}\text{LLM}(q), \tag{12}
$$
where $q$ denotes the question, $r$ denotes the relation, and $\text{LLM}(\cdot)$ generates representations of $q$ and $r$ , respectively. Furthermore, Zhang et al. [236] propose an LLM-based path retriever to retrieve question-related relations hop-by-hop and construct several paths. The probability of each path can be calculated as
$$
P(p|q)=\prod_{t=1}^{|p|}s(r_{t},q), \tag{13}
$$
where $p$ denotes the path, and $r_{t}$ denotes the relation at the $t$ -th hop of $p$ . The retrieved relations and paths can be used as context knowledge to improve the performance of answer reasoners as
$$
P(a|q)=\sum_{p\in\mathcal{P}}P(a|p)P(p|q), \tag{14}
$$
where $\mathcal{P}$ denotes retrieved paths and $a$ denotes the answer.
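Eqs. (12)-(14) can be sketched with plain vector arithmetic. The embeddings below stand in for the $\text{LLM}(\cdot)$ representations, and in practice the per-hop scores $s(r_t, q)$ would be normalized into probabilities:

```python
import numpy as np

def similarity(rel_emb, q_emb):
    """Eq. (12): dot product between the (stand-in) LLM embeddings
    of a relation and the question."""
    return float(rel_emb @ q_emb)

def path_score(path_rel_embs, q_emb):
    """Eq. (13): product of per-hop relation-question similarities."""
    score = 1.0
    for rel_emb in path_rel_embs:
        score *= similarity(rel_emb, q_emb)
    return score

def answer_score(paths, q_emb, p_answer_given_path):
    """Eq. (14): marginalize the answer score over retrieved paths."""
    return sum(p_a * path_score(p, q_emb)
               for p_a, p in zip(p_answer_given_path, paths))
```

For example, with a question embedding `q = [1, 0]` and a two-hop path whose relations score 0.5 and 0.8 against it, the path score is 0.4; weighting it by $P(a|p)=1$ yields the same answer score.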
<details>
<summary>x19.png Details</summary>

### Visual Description
## Diagram: Knowledge Graph-Augmented LLM Question Answering System
### Overview
This image is a technical system architecture diagram illustrating a pipeline for answering factual questions using Large Language Models (LLMs) augmented with Knowledge Graphs (KGs). The flow starts with a natural language question at the bottom and progresses upward through extraction, retrieval, and reasoning stages to produce a final score. The diagram uses color-coded boxes and directional arrows to show data flow and component relationships.
### Components/Axes
The diagram is structured into three main horizontal layers or components, from bottom to top:
1. **Bottom Layer: Relation/entity Extractor**
* A large, rounded rectangular box labeled **"Relation/entity Extractor"**.
* Inside this box is a yellow rectangular block labeled **"LLMs"**.
* An upward-pointing arrow enters this box from below, originating from the input question.
2. **Middle Layer: Knowledge Graphs (KGs)**
* A central, light-blue rectangular box labeled **"KGs"**.
* Two smaller boxes feed into the "KGs" box from below:
* A light-blue box labeled **"Neil Armstrong"**, with the label **"Entity"** to its left.
* A darker blue box labeled **"BornIn"**, with the label **"Relation"** to its right.
* Two arrows point upward from the "KGs" box, labeled **"Retrieve in KGs"**, leading to the top layer.
3. **Top Layer: Answer Reasoner**
* A large, rounded rectangular box labeled **"Answer Reasoner"**.
* Inside this box is a yellow rectangular block labeled **"LLMs"**.
* Multiple inputs feed into this "LLMs" block from below, structured as a sequence:
* A white box labeled **"[CLS]"**.
* A yellow box labeled **"Question"**.
* A white box labeled **"[SEP]"**.
* A yellow box labeled **"Related Facts"** (this receives an arrow from the "Retrieve in KGs" process).
* A white box labeled **"[SEP]"**.
* A yellow box labeled **"Candidates"** (this also receives an arrow from the "Retrieve in KGs" process).
* A white box labeled **"[SEP]"**.
* An upward-pointing arrow exits the "Answer Reasoner" box, leading to a final output box.
4. **Output**
* A small, gray box at the very top labeled **"Score"**.
5. **Input**
* Text at the very bottom of the diagram: **"Question: Where was Neil Armstrong born in?"**. An arrow points from this text into the "Relation/entity Extractor".
### Detailed Analysis
The diagram explicitly models the processing of the example question: **"Where was Neil Armstrong born in?"**.
* **Step 1 - Extraction:** The input question is processed by the **Relation/entity Extractor** (powered by LLMs). This component identifies and extracts the key components from the question:
* **Entity:** "Neil Armstrong"
* **Relation:** "BornIn"
* **Step 2 - Retrieval:** The extracted entity and relation are used to query the **KGs** (Knowledge Graphs). The system retrieves relevant information, which is categorized as:
* **Related Facts:** General facts from the KG connected to the entity and relation.
* **Candidates:** Specific potential answer entities (e.g., cities or locations) from the KG.
* **Step 3 - Reasoning:** The retrieved information, along with the original question, is formatted into a specific input sequence for the **Answer Reasoner** (also powered by LLMs). The sequence follows a pattern reminiscent of BERT-style models: `[CLS] Question [SEP] Related Facts [SEP] Candidates [SEP]`. The LLM within the Answer Reasoner processes this combined context.
* **Step 4 - Output:** The final output of the system is a **"Score"**, which likely represents a confidence score or ranking for the candidate answers.
### Key Observations
* **Modular Design:** The system is clearly divided into specialized modules for extraction, retrieval, and reasoning.
* **LLM Core:** LLMs are the core computational engine in both the extraction and reasoning stages.
* **Structured Input:** The Answer Reasoner uses a structured, separator-based input format (`[CLS]`, `[SEP]`) to combine different information sources, suggesting the use of a transformer-based model.
* **Example-Driven:** The entire diagram is annotated with a concrete example ("Neil Armstrong", "BornIn") to illustrate the abstract process.
* **Color Coding:** Yellow is consistently used for LLM components and primary data inputs (Question, Related Facts, Candidates). Blue shades are used for Knowledge Graph elements.
### Interpretation
This diagram illustrates a **Retrieval-Augmented Generation (RAG)** or **Knowledge Graph-Enhanced** approach to question answering. The core idea is to overcome the static knowledge limitations of a base LLM by dynamically retrieving factual information from an external, structured knowledge source (the KG) before generating an answer.
* **What it demonstrates:** The pipeline shows how a complex factual question is decomposed, how relevant knowledge is fetched, and how all information is synthesized by an LLM to produce a reasoned output. It emphasizes a **"retrieve-then-read"** paradigm.
* **Relationships:** The flow is strictly bottom-up and linear: Question → Extraction → Retrieval → Reasoning → Score. The "KGs" component acts as a bridge between the raw question and the final reasoning, providing grounded facts.
* **Notable Design Choices:** The separation of "Related Facts" and "Candidates" suggests the system distinguishes between contextual knowledge and direct answer options. The use of `[CLS]` and `[SEP]` tokens indicates the Answer Reasoner is likely a fine-tuned BERT-like model for a multiple-choice or ranking task, where it scores the plausibility of each "Candidate" given the "Question" and "Related Facts".
* **Underlying Purpose:** This architecture aims to produce more accurate, factual, and verifiable answers compared to an LLM relying solely on its parametric memory. It explicitly incorporates a symbolic knowledge source (KG) to ground the neural model's reasoning.
</details>
Figure 22: The general framework of applying LLMs for knowledge graph question answering (KGQA).
#### 5.5.2 LLMs as Answer Reasoners
Answer reasoners are designed to reason over the retrieved facts and generate answers. LLMs can be used as answer reasoners to generate answers directly. For example, as shown in Fig. 22, DEKCOR [175] concatenates the retrieved facts with questions and candidate answers as
$$
x=\texttt{[CLS]}\ q\ \texttt{[SEP]}\ \text{Related Facts}\ \texttt{[SEP]}\ a\ \texttt{[SEP]}, \tag{15}
$$
where $a$ denotes candidate answers. Then, it feeds them into LLMs to predict answer scores. After utilizing LLMs to generate the representation of $x$ as the QA context, DRLK [176] proposes a Dynamic Hierarchical Reasoner to capture the interactions between the QA context and answers for answer prediction. Yan et al. [235] propose an LLM-based KGQA framework consisting of two stages: (1) retrieving related facts from KGs and (2) generating answers based on the retrieved facts. The first stage is similar to the entity/relation extractors. Given a candidate answer entity $a$ , it extracts a series of paths $p_{1},\ldots,p_{n}$ from KGs. The second stage, however, is an LLM-based answer reasoner, which first verbalizes the paths using the entity names and relation names in KGs, and then concatenates the question $q$ and all paths $p_{1},\ldots,p_{n}$ to form an input sample as
$$
x=\texttt{[CLS]}\ q\ \texttt{[SEP]}\ p_{1}\ \texttt{[SEP]}\ \cdots\ \texttt{[SEP]}\ p_{n}\ \texttt{[SEP]}. \tag{16}
$$
These paths are regarded as the related facts for the candidate answer $a$ . Finally, it uses LLMs to predict whether the hypothesis: “ $a$ is the answer of $q$ ” is supported by those facts, which is formulated as
$$
e_{\texttt{[CLS]}}=\text{LLM}(x),\quad s=\sigma(\text{MLP}(e_{\texttt{[CLS]}})), \tag{17}
$$
where it encodes $x$ using an LLM and feeds the representation corresponding to the [CLS] token into a binary classifier, and $\sigma(\cdot)$ denotes the sigmoid function.
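The input construction of Eqs. (15)-(16) amounts to simple string concatenation over verbalized KG paths. The helper names below are illustrative:

```python
def verbalize_path(path):
    """Render a KG path given as alternating entity/relation names as text."""
    return " ".join(path)

def build_reasoner_input(question, verbalized_paths):
    """Eq. (16)-style input: [CLS] q [SEP] p_1 [SEP] ... [SEP] p_n [SEP]."""
    parts = ["[CLS]", question]
    for path_text in verbalized_paths:
        parts.extend(["[SEP]", path_text])
    parts.append("[SEP]")
    return " ".join(parts)

path = verbalize_path(["Neil Armstrong", "BornIn", "Wapakoneta"])
print(build_reasoner_input("Where was Neil Armstrong born?", [path]))
# [CLS] Where was Neil Armstrong born? [SEP] Neil Armstrong BornIn Wapakoneta [SEP]
```

The resulting string is what the LLM encodes in Eq. (17); the [CLS] representation is then scored by an MLP with a sigmoid to decide whether the facts support the candidate answer.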
To better guide LLMs to reason over KGs, OreoLM [177] proposes a Knowledge Interaction Layer (KIL), which is inserted between LLM layers. KIL interacts with a KG reasoning module to discover different reasoning paths, and the reasoning module then reasons over the paths to generate answers. GreaseLM [178] fuses the representations from LLMs and graph neural networks to effectively reason over KG facts and language context. UniKGQA [43] unifies fact retrieval and reasoning into a single framework. UniKGQA consists of two modules. The first is a semantic matching module that uses an LLM to semantically match questions with their corresponding relations. The second is a matching-information propagation module, which propagates the matching information along directed edges of KGs for answer reasoning. Similarly, ReLMKG [179] performs joint reasoning on a large language model and the associated knowledge graph. The question and verbalized paths are encoded by the language model, and different layers of the language model produce outputs that guide a graph neural network to perform message passing. This process utilizes the explicit knowledge contained in the structured knowledge graph for reasoning. StructGPT [237] adopts a customized interface that allows large language models (e.g., ChatGPT) to reason directly on KGs to perform multi-step question answering.
TABLE IV: Summary of methods that synergize KGs and LLMs.
| Task | Method | Year |
| --- | --- | --- |
| Synergized Knowledge Representation | JointGT [42] | 2021 |
| | KEPLER [40] | 2021 |
| | DRAGON [44] | 2022 |
| | HKLM [238] | 2023 |
| Synergized Reasoning | LARK [45] | 2023 |
| | Siyuan et al. [46] | 2023 |
| | KSL [239] | 2023 |
| | StructGPT [237] | 2023 |
| | Think-on-graph [240] | 2023 |
## 6 Synergized LLMs + KGs
The synergy of LLMs and KGs has attracted increasing attention in recent years; it marries the merits of LLMs and KGs to mutually enhance performance in various downstream applications. For example, LLMs can be used to understand natural language, while KGs serve as a knowledge base that provides factual knowledge. The unification of LLMs and KGs could result in a powerful model for knowledge representation and reasoning.
In this section, we will discuss the state-of-the-art Synergized LLMs + KGs from two perspectives: 1) Synergized Knowledge Representation, and 2) Synergized Reasoning. Representative works are summarized in Table IV.
<details>
<summary>x20.png Details</summary>

### Visual Description
## [Diagram]: Neural Network Architecture for Text and Knowledge Graph Fusion
### Overview
The image displays a technical diagram of a neural network architecture designed to process both textual input and structured knowledge from a knowledge graph. The architecture consists of two primary encoder modules: a **T-encoder** (Text encoder) and a **K-encoder** (Knowledge encoder), which work in parallel before their outputs are fused. The system produces two distinct outputs: **Text Outputs** and **Knowledge Graph Outputs**.
### Components/Axes
The diagram is organized vertically, with data flowing from the bottom (input) to the top (outputs).
**1. Input Layer (Bottom):**
* **Input Text:** A yellow rectangular box at the bottom-left labeled "Input Text". This is the starting point for the textual data stream.
* **Knowledge Graph:** A dashed-line box at the bottom-right containing a graphical representation of a knowledge graph. It consists of seven light-blue circular nodes connected by directed black arrows, indicating relationships. The label "Knowledge Graph" is below this graphic.
**2. T-encoder (Lower Module):**
* A rounded rectangular container labeled "T-encoder" on its left side.
* Contains a sub-label "N Layers" in its top-right corner.
* Inside, a single yellow rectangular box labeled "Self-Attention".
* **Data Flow:** An arrow points upward from "Input Text" into the "Self-Attention" box. Another arrow points upward from the "Self-Attention" box to the K-encoder above.
**3. K-encoder (Upper Module):**
* A larger rounded rectangular container labeled "K-encoder" on its left side.
* Contains a sub-label "M Layers" in its top-right corner.
* **Internal Components:**
* **Left (Text Path):** A yellow rectangular box labeled "Self-Attention". An arrow from the T-encoder's output points into this box.
* **Right (Knowledge Path):** A light-blue rectangular box labeled "Self-Attention". An arrow from the "Knowledge Graph" points into this box.
* **Fusion Module:** A large horizontal rectangle spanning the top of the K-encoder, divided into two colored sections. The left (yellow) section is labeled "Text-Knowledge Fusion Module". The right (light-blue) section is unlabeled but is part of the same module. Arrows from both the left and right "Self-Attention" boxes point upward into this fusion module.
**4. Output Layer (Top):**
* **Text Outputs:** A label at the top-left with an arrow pointing upward from the yellow section of the "Text-Knowledge Fusion Module".
* **Knowledge Graph Outputs:** A label at the top-right with an arrow pointing upward from the blue section of the "Text-Knowledge Fusion Module".
### Detailed Analysis
**Component Isolation & Flow:**
* **Region 1 (Input):** The system takes two parallel inputs: raw text and a structured knowledge graph.
* **Region 2 (T-encoder):** The text undergoes initial processing through `N` layers of self-attention mechanisms, which contextualize the textual tokens.
* **Region 3 (K-encoder):** This is the core fusion engine.
* The pre-processed text from the T-encoder enters a dedicated self-attention layer (yellow).
* Simultaneously, the knowledge graph data enters its own separate self-attention layer (blue), likely to encode the graph structure and node features.
* The outputs from these two parallel self-attention streams are fed into the **Text-Knowledge Fusion Module**. This module is the critical junction where information from the text and the knowledge graph is integrated.
* **Region 4 (Output):** The fused representation is then split to generate two specialized outputs: one optimized for text-based tasks and another for knowledge graph-based tasks.
**Spatial Grounding:**
* The **T-encoder** is positioned in the lower-left quadrant.
* The **Knowledge Graph** diagram is in the lower-right quadrant.
* The **K-encoder** occupies the upper half of the image, centered.
* The **Text-Knowledge Fusion Module** is the topmost internal component of the K-encoder, spanning its full width.
* The **Output labels** are positioned at the very top, aligned with their respective data paths (Text on left, Knowledge Graph on right).
### Key Observations
1. **Dual-Stream Architecture:** The design explicitly maintains separate processing pathways for text and knowledge graph data until the fusion stage, suggesting a need to preserve their distinct structural properties.
2. **Asymmetric Depth:** The T-encoder uses `N` layers, while the K-encoder uses `M` layers. This implies the knowledge fusion and encoding process may require a different (potentially deeper or shallower) level of processing complexity than initial text encoding.
3. **Fusion Before Final Output:** Integration happens within the K-encoder, not at the very end. This allows the fused representation to be further processed by the remaining `M` layers of the K-encoder before generating outputs.
4. **Color Coding:** Yellow consistently represents text-related components ("Input Text", T-encoder's "Self-Attention", left side of fusion module). Light blue represents knowledge-graph-related components (Knowledge Graph nodes, K-encoder's right "Self-Attention", right side of fusion module). This visual scheme reinforces the dual-pathway concept.
### Interpretation
This diagram illustrates a sophisticated **multi-modal neural network architecture** designed for tasks that require reasoning over both unstructured text and structured knowledge (e.g., complex question answering, fact-checking, or enhanced document understanding).
* **What it demonstrates:** The model learns a joint representation of text and knowledge. The T-encoder creates a contextualized understanding of the language. The K-encoder's right branch encodes the relational structure of the knowledge graph. The **Text-Knowledge Fusion Module** is the investigative core—it likely uses mechanisms like cross-attention to allow textual concepts to attend to relevant knowledge graph entities and relations, and vice-versa, creating a unified semantic space.
* **Relationship between elements:** The flow is hierarchical and integrative. Raw inputs are first refined in isolation (T-encoder for text, initial KG encoding), then brought together in a dedicated fusion module within a deeper encoder (K-encoder), and finally projected into task-specific output spaces. This suggests the fused representation is rich enough to support multiple downstream applications.
* **Notable implications:** The separate "Knowledge Graph Outputs" imply the model can perform tasks directly on the graph structure (like link prediction or graph classification) using the infused textual context. The architecture explicitly avoids simply appending knowledge graph embeddings to text embeddings; instead, it promotes deep, layer-wise interaction between the two modalities, which is key for sophisticated reasoning. The variable `N` and `M` layers offer flexibility to tune the model's capacity for each sub-task.
</details>
Figure 23: Synergized knowledge representation by additional KG fusion modules.
### 6.1 Synergized Knowledge Representation
Text corpus and knowledge graphs both contain enormous knowledge. However, the knowledge in text corpus is usually implicit and unstructured, while the knowledge in KGs is explicit and structured. Synergized Knowledge Representation aims to design a synergized model that can effectively represent knowledge from both LLMs and KGs. The synergized model can provide a better understanding of the knowledge from both sources, making it valuable for many downstream tasks.
To jointly represent the knowledge, researchers propose synergized models by introducing additional KG fusion modules, which are jointly trained with LLMs. As shown in Fig. 23, ERNIE [35] proposes a textual-knowledge dual-encoder architecture in which a T-encoder first encodes the input sentences, and a K-encoder then processes knowledge graphs and fuses them with the textual representations from the T-encoder. BERT-MK [241] employs a similar dual-encoder architecture but introduces additional information about neighboring entities in the knowledge encoder during the pre-training of LLMs. However, some of the neighboring entities in KGs may not be relevant to the input text, resulting in extra redundancy and noise. CokeBERT [242] addresses this issue with a GNN-based module that filters out irrelevant KG entities using the input text. JAKET [243] proposes to fuse the entity information in the middle layers of the large language model.
KEPLER [40] presents a unified model for knowledge embedding and pre-trained language representation. KEPLER encodes textual entity descriptions with an LLM as their embeddings, and then jointly optimizes the knowledge embedding and language modeling objectives. JointGT [42] proposes a graph-text joint representation learning model with three pre-training tasks that align the representations of graph and text. DRAGON [44] presents a self-supervised method to pre-train a joint language-knowledge foundation model from text and KGs. It takes text segments and relevant KG subgraphs as input and bidirectionally fuses information from both modalities. DRAGON then utilizes two self-supervised reasoning tasks, i.e., masked language modeling and KG link prediction, to optimize the model parameters. HKLM [238] introduces a unified LLM that incorporates KGs to learn representations of domain-specific knowledge.
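KEPLER's idea of deriving knowledge embeddings from encoded entity descriptions can be sketched as follows. The description encoder is a deterministic hashing stand-in for the actual LLM, and the TransE-style score is one common choice of knowledge-embedding objective; both are illustrative assumptions:

```python
import numpy as np

def encode_description(text, dim=4):
    """Stand-in for the LLM encoder over an entity's textual description
    (a real system would use a Transformer; here a seeded random vector)."""
    rng = np.random.default_rng(sum(text.encode()))
    return rng.normal(size=dim)

def transe_score(h, r, t):
    """TransE-style plausibility score -||h + r - t||: higher values mean
    the triple (h, r, t) better satisfies the translation h + r ≈ t."""
    return -float(np.linalg.norm(h + r - t))

# In KEPLER-style training, h and t come from encoded descriptions, and this
# score enters the KE loss, optimized jointly with the language-modeling loss.
h = encode_description("Barack Obama, 44th president of the USA")
t = encode_description("Honolulu, capital of Hawaii")
```

Because entity embeddings are produced from text, unseen entities with descriptions can still be embedded, which is a key advantage over lookup-table knowledge embeddings.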
### 6.2 Synergized Reasoning
To better utilize the knowledge from text corpus and knowledge graph reasoning, Synergized Reasoning aims to design a synergized model that can effectively conduct reasoning with both LLMs and KGs.
LLM-KG Fusion Reasoning. LLM-KG Fusion Reasoning leverages two separate LLM and KG encoders to process the text and the relevant KG inputs [244]. The two encoders are equally important and jointly fuse the knowledge from both sources for reasoning. To improve the interaction between text and knowledge, KagNet [38] proposes to first encode the input KG and then augment the input textual representation. In contrast, MHGRN [234] uses the final LLM outputs of the input text to guide the reasoning process on the KGs. Yet, both of them design only a single-direction interaction between the text and KGs. To tackle this issue, QA-GNN [131] proposes a GNN-based model to jointly reason over input context and KG information via message passing. Specifically, QA-GNN represents the input textual information as a special node via a pooling operation and connects this node with other entities in the KG. However, the textual inputs are only pooled into a single dense vector, limiting the information fusion performance. JointLK [245] then proposes a framework with fine-grained interaction between any tokens in the textual inputs and any KG entities through an LM-to-KG and KG-to-LM bi-directional attention mechanism. As shown in Fig. 24, pairwise dot-product scores are calculated over all textual tokens and KG entities, and the bi-directional attentive scores are computed separately. In addition, at each JointLK layer, the KGs are also dynamically pruned based on the attention scores to allow later layers to focus on more important sub-KG structures. Despite being effective, in JointLK the fusion process between the input text and KG still uses the final LLM outputs as the input text representations. GreaseLM [178] designs deep and rich interactions between the input text tokens and KG entities at each layer of the LLM.
The architecture and fusion approach are largely similar to those of ERNIE [35], discussed in Section 6.1, except that GreaseLM does not use a text-only T-encoder to handle the input text.
<details>
<summary>x21.png Details</summary>

### Visual Description
Architecture diagram of LLM-KG fusion reasoning. A "Question <SEP> Option" input is processed by an LLM encoder while the relevant KG is processed by a KG encoder; both feed a joint reasoning layer. Within this layer, LM and KG representations attend to each other through LM-to-KG and KG-to-LM attention (pairwise attention operations whose outputs are fused back into the original representations via summation). A dynamic pruning step then drops KG connections that receive little attention before answer inference produces the final answer.
</details>
Figure 24: The framework of LLM-KG Fusion Reasoning.
LLMs as Agents Reasoning. Instead of using two encoders to fuse the knowledge, LLMs can also be treated as agents that interact with KGs to conduct reasoning [246], as illustrated in Fig. 25. KD-CoT [247] iteratively retrieves facts from KGs and produces faithful reasoning traces, which guide LLMs to generate answers. KSL [239] teaches LLMs to search KGs to retrieve relevant facts and then generate answers. StructGPT [237] designs several API interfaces that allow LLMs to access structured data and perform reasoning by traversing the KGs. Think-on-graph [240] provides a flexible plug-and-play framework in which LLM agents iteratively execute beam search on KGs to discover reasoning paths and generate answers. To enhance agent abilities, AgentTuning [248] presents several instruction-tuning datasets that guide LLM agents to perform reasoning on KGs.
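The iterative agent loop can be illustrated with a toy beam search over a dictionary-backed KG, in the spirit of Think-on-graph. Everything here is a hedged sketch: `score_fn` stands in for the LLM call that ranks candidate relations, and the names and data layout are illustrative, not any system's actual API.

```python
def beam_search_kg(question, kg, start, score_fn, beam=2, max_hops=2):
    """Think-on-graph-style loop: expand reasoning paths hop by hop,
    keeping the top-`beam` paths ranked by `score_fn` (an LLM call in
    the real system; a toy scorer in this sketch)."""
    paths = [([start], 0.0)]                       # (path, accumulated score)
    for _ in range(max_hops):
        expanded = []
        for path, s in paths:
            for rel, nxt in kg.get(path[-1], []):  # outgoing KG edges
                expanded.append((path + [rel, nxt], s + score_fn(question, rel)))
        if not expanded:
            break
        expanded.sort(key=lambda p: -p[1])         # agent-ranked beam pruning
        paths = expanded[:beam]
    return paths[0][0]                             # best reasoning path found
```

On the two-hop example of Fig. 25, a scorer that prefers the "born in" and "city of" relations would recover the path Barack Obama → Honolulu → USA while ignoring distractor facts.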
<details>
<summary>x22.png Details</summary>

### Visual Description
Illustration of an LLM agent reasoning over a KG. A small knowledge graph contains the entities Barack Obama, Michelle Obama, Honolulu, Hawaii, USA, and 1776, linked by the relations Marry_to, Born_in, Located_in, City_of, and Founded_in. For the question "Which country is Barack Obama from?", the highlighted two-hop path Barack Obama --Born_in--> Honolulu --City_of--> USA is traversed by the LLM agent (working together with the KG in a "Reasoning-on-Graphs" module) to produce the answer "USA", while irrelevant facts such as Marry_to and Founded_in are ignored.
</details>
Figure 25: Using LLMs as agents for reasoning on KGs.
Comparison and Discussion. LLM-KG Fusion Reasoning combines an LLM encoder and a KG encoder to represent knowledge in a unified manner, and then employs a synergized reasoning module to jointly reason over the results. This framework allows for different encoders and reasoning modules, which are trained end-to-end to effectively utilize the knowledge and reasoning capabilities of LLMs and KGs. However, the additional modules may introduce extra parameters and computational costs while lacking interpretability. LLMs as Agents for KG reasoning provides a flexible framework for reasoning on KGs without additional training cost, which can be generalized to different LLMs and KGs. Meanwhile, the reasoning process is interpretable and can be used to explain the results. Nevertheless, defining the actions and policies for LLM agents remains challenging. The synergy of LLMs and KGs is still an ongoing research topic, with the potential to yield more powerful frameworks in the future.
## 7 Future Directions and Milestones
In this section, we discuss the future directions and several milestones in the research area of unifying KGs and LLMs.
### 7.1 KGs for Hallucination Detection in LLMs
The hallucination problem in LLMs, in which they generate factually incorrect content, significantly hinders the reliability of LLMs. As discussed in Section 4, existing studies try to utilize KGs to obtain more reliable LLMs through pre-training or KG-enhanced inference. Despite these efforts, the issue of hallucination may continue to persist in the realm of LLMs for the foreseeable future. Consequently, in order to gain the public's trust and enable broader applications, it is imperative to detect and assess instances of hallucination within LLMs and other forms of AI-generated content (AIGC). Existing methods strive to detect hallucination by training a neural classifier on a small set of documents [249], which is neither robust nor powerful enough to handle ever-growing LLMs. Recently, researchers have tried to use KGs as an external source to validate LLMs [250]. Further studies combine LLMs and KGs to achieve a generalized fact-checking model that can detect hallucinations across domains [251]. Therefore, this opens a new door to utilizing KGs for hallucination detection.
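As a concrete illustration of KG-based validation, one can compare triples extracted from LLM output against the KG: a claim matching a stored triple is supported, a claim whose subject and relation match an existing fact but whose object differs is contradicted, and anything else is unverifiable. This is a hedged sketch, not the method of [250] or [251]; claim extraction from raw text is assumed to have happened upstream.

```python
def check_claims(claims, kg_triples):
    """Check (subject, relation, object) claims from LLM output against a KG
    given as a set of triples. Minimal fact-checking sketch; it assumes each
    relation is functional (one true object per subject)."""
    verdicts = {}
    for s, r, o in claims:
        if (s, r, o) in kg_triples:
            verdicts[(s, r, o)] = "supported"
        elif any(s == s2 and r == r2 for s2, r2, _ in kg_triples):
            verdicts[(s, r, o)] = "contradicted"   # KG asserts a different object
        else:
            verdicts[(s, r, o)] = "unverifiable"   # KG is silent on this claim
    return verdicts
```

Only the "contradicted" verdicts are firm hallucination signals; "unverifiable" claims reflect KG incompleteness, which is exactly why generalized fact-checking models combining LLMs and KGs are being studied.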
### 7.2 KGs for Editing Knowledge in LLMs
Although LLMs are capable of storing massive real-world knowledge, they cannot quickly update their internal knowledge as real-world situations change. Some research efforts have been proposed for editing knowledge in LLMs [252] without re-training the whole model. Yet, such solutions still suffer from poor performance or computational overhead [253]. Existing studies [254] also reveal that editing a single fact can cause a ripple effect on other related knowledge. Therefore, it is necessary to develop more efficient and effective methods to edit knowledge in LLMs. Recently, researchers have tried to leverage KGs to edit knowledge in LLMs efficiently.
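The ripple effect can be made concrete on a symbolic KG: after replacing one fact, any remaining fact that mentions the old or new object becomes a candidate for re-checking. The sketch below is illustrative only (it edits a triple set, not an LLM's parameters), and its names and heuristic are our own assumptions.

```python
def apply_edit_and_ripple(triples, edit):
    """Apply a single fact edit to a triple set and list the one-hop facts
    that the edit may invalidate (the 'ripple effect' of editing one fact)."""
    s, r, new_o = edit
    old_objs = {o for s2, r2, o in triples if s2 == s and r2 == r}
    updated = {t for t in triples if not (t[0] == s and t[1] == r)}
    updated.add((s, r, new_o))
    # Any remaining fact touching the old or new object may now be stale.
    affected = old_objs | {new_o}
    touched = {t for t in updated if t[0] in affected or t[2] in affected}
    touched.discard((s, r, new_o))
    return updated, touched
```

In an LLM, the analogous facts are entangled in the model parameters, which is why a single edit can silently corrupt related knowledge and why KG-guided editing is attractive.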
### 7.3 KGs for Black-box LLMs Knowledge Injection
Although pre-training and knowledge editing could update LLMs to catch up with the latest knowledge, they still require access to the internal structures and parameters of LLMs. However, many state-of-the-art LLMs (e.g., ChatGPT) only provide APIs for users and developers, making them black boxes to the public. Consequently, it is impossible to follow the conventional KG injection approaches described in [244, 38], which change the LLM structure by adding additional knowledge fusion modules. Converting various types of knowledge into different text prompts seems to be a feasible solution. However, it is unclear whether these prompts can generalize well to new LLMs. Moreover, the prompt-based approach is limited by the input token length of LLMs. Therefore, how to enable effective knowledge injection for black-box LLMs remains an open question to explore [255, 256].
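A minimal sketch of the prompt-based route: verbalize KG triples into sentences and pack them into the prompt until a token budget is reached, reflecting the input-length limitation just mentioned. The whitespace tokenizer and function names are simplifying assumptions, not any particular API.

```python
def verbalize(triple):
    """Turn a (subject, relation, object) triple into a plain sentence."""
    s, r, o = triple
    return f"{s} {r.replace('_', ' ')} {o}."

def build_prompt(question, triples, max_tokens=50):
    """Pack verbalized KG facts into a text prompt for a black-box LLM,
    stopping before a (whitespace-token) budget is exceeded."""
    lines, used = [], len(question.split())
    for t in triples:
        sent = verbalize(t)
        cost = len(sent.split())
        if used + cost > max_tokens:
            break                       # respect the LLM's input-length limit
        lines.append(sent)
        used += cost
    return "Facts:\n" + "\n".join(lines) + f"\nQuestion: {question}"
```

In practice a real tokenizer would replace the whitespace count, and triple selection (which facts to include first) becomes the key design question once the KG is large.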
### 7.4 Multi-Modal LLMs for KGs
Current knowledge graphs typically rely on text and graph structure to handle KG-related applications. However, real-world knowledge graphs are often constructed from data in diverse modalities [257, 258, 99]. Therefore, effectively leveraging representations from multiple modalities will be a significant challenge for future research in KGs [259]. One potential solution is to develop methods that can accurately encode and align entities across different modalities. Recently, with the development of multi-modal LLMs [260, 98], leveraging LLMs for modality alignment holds promise in this regard. However, bridging the gap between multi-modal LLMs and KG structure remains a crucial challenge in this field, demanding further investigation and advancement.
### 7.5 LLMs for Understanding KG Structure
Conventional LLMs trained on plain text data are not designed to understand structured data like knowledge graphs. Thus, LLMs might not fully grasp the information conveyed by the KG structure. A straightforward way is to linearize the structured data into sentences that LLMs can understand. However, the scale of KGs makes it impossible to linearize a whole KG as input. Moreover, the linearization process may lose some of the underlying information in KGs. Therefore, it is necessary to develop LLMs that can directly understand the KG structure and reason over it [237].
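One common workaround for the scale problem is to linearize only a k-hop subgraph around the question entities rather than the whole KG. A minimal sketch follows; the dictionary-backed KG and tuple-style linearization are illustrative assumptions, and richer verbalizations would be used in practice.

```python
def k_hop_subgraph(kg, seeds, k=2):
    """Collect the triples within k hops of the seed entities, so that only
    this subgraph (not the whole KG) is linearized for the LLM."""
    frontier, seen, sub = set(seeds), set(seeds), []
    for _ in range(k):
        nxt = set()
        for s in frontier:
            for r, o in kg.get(s, []):
                sub.append((s, r, o))
                if o not in seen:
                    nxt.add(o)
        seen |= nxt
        frontier = nxt
    return sub

def linearize(triples):
    """Naive linearization of triples into a flat string for LLM input."""
    return " ".join(f"({s}, {r}, {o})" for s, r, o in triples)
```

Even this mitigation illustrates the limitation discussed above: the flat string discards the graph topology, which is why LLMs that natively reason over KG structure remain a research goal.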
<details>
<summary>x23.png Details</summary>

### Visual Description
Three-stage roadmap diagram, read left to right. Stage 1 contains two parallel boxes, "KG-enhanced LLMs" and "LLM-augmented KGs", which converge into the single Stage 2 box "Synergized LLMs + KGs". Stage 2 then branches into three co-equal Stage 3 capabilities: "Graph Structure Understanding", "Multi-modality", and "Knowledge Updating". Each stage header and its boxes share a consistent color, emphasizing that the synergy of Stage 2 is the foundation enabling the Stage 3 capabilities.
</details>
Figure 26: The milestones of unifying KGs and LLMs.
### 7.6 Synergized LLMs and KGs for Bidirectional Reasoning
KGs and LLMs are two complementary technologies that can synergize with each other. However, the synergy of LLMs and KGs is less explored by existing researchers. A desired synergy of LLMs and KGs would involve leveraging the strengths of both technologies to overcome their individual limitations. LLMs, such as ChatGPT, excel at generating human-like text and understanding natural language, while KGs are structured databases that capture and represent knowledge explicitly. By combining their capabilities, we can create a powerful system that benefits from the contextual understanding of LLMs and the structured knowledge representation of KGs. To better unify LLMs and KGs, many advanced techniques need to be incorporated, such as multi-modal learning [261], graph neural networks [262], and continual learning [263]. Finally, the synergy of LLMs and KGs can be applied to many real-world applications, such as search engines [100], recommender systems [10, 89], and drug discovery.
Given an application problem, we can apply a KG to perform a knowledge-driven search for potential goals and unseen data, and simultaneously use LLMs to perform data/text-driven inference to see what new data or goal items can be derived. When knowledge-based search is combined with data/text-driven inference, the two can mutually validate each other, resulting in efficient and effective solutions powered by dual driving wheels. Therefore, we can anticipate increasing attention to unlocking the potential of integrating KGs and LLMs for diverse downstream applications with both generative and reasoning capabilities in the near future.
## 8 Conclusion
Unifying large language models (LLMs) and knowledge graphs (KGs) is an active research direction that has attracted increasing attention from both academia and industry. In this article, we provide a thorough overview of the recent research in this field. We first introduce different ways of integrating KGs to enhance LLMs. Then, we introduce existing methods that apply LLMs to KGs and establish a taxonomy based on the varieties of KG tasks. Finally, we discuss the challenges and future directions in this field. We envision that there will be multiple stages (milestones) in the roadmap of unifying KGs and LLMs, as shown in Fig. 26. In particular, we anticipate increasing research across three stages: Stage 1: KG-enhanced LLMs and LLM-augmented KGs; Stage 2: Synergized LLMs + KGs; and Stage 3: Graph Structure Understanding, Multi-modality, and Knowledge Updating. We hope that this article will provide a guideline to advance future research.
## Acknowledgments
This research was supported by the Australian Research Council (ARC) under grants FT210100097 and DP240101547 and the National Natural Science Foundation of China (NSFC) under grant 62120106008.
## References
- [1] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
- [2] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “Roberta: A robustly optimized bert pretraining approach,” arXiv preprint arXiv:1907.11692, 2019.
- [3] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, “Exploring the limits of transfer learning with a unified text-to-text transformer,” The Journal of Machine Learning Research, vol. 21, no. 1, pp. 5485–5551, 2020.
- [4] D. Su, Y. Xu, G. I. Winata, P. Xu, H. Kim, Z. Liu, and P. Fung, “Generalizing question answering system with pre-trained language model fine-tuning,” in Proceedings of the 2nd Workshop on Machine Reading for Question Answering, 2019, pp. 203–211.
- [5] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer, “Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension,” in ACL, 2020, pp. 7871–7880.
- [6] J. Li, T. Tang, W. X. Zhao, and J.-R. Wen, “Pretrained language models for text generation: A survey,” arXiv preprint arXiv:2105.10311, 2021.
- [7] J. Wei, Y. Tay, R. Bommasani, C. Raffel, B. Zoph, S. Borgeaud, D. Yogatama, M. Bosma, D. Zhou, D. Metzler et al., “Emergent abilities of large language models,” Transactions on Machine Learning Research.
- [8] K. Malinka, M. Perešíni, A. Firc, O. Hujňák, and F. Januš, “On the educational impact of chatgpt: Is artificial intelligence ready to obtain a university degree?” arXiv preprint arXiv:2303.11146, 2023.
- [9] Z. Li, C. Wang, Z. Liu, H. Wang, S. Wang, and C. Gao, “Cctest: Testing and repairing code completion systems,” ICSE, 2023.
- [10] J. Liu, C. Liu, R. Lv, K. Zhou, and Y. Zhang, “Is chatgpt a good recommender? a preliminary study,” arXiv preprint arXiv:2304.10149, 2023.
- [11] W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong et al., “A survey of large language models,” arXiv preprint arXiv:2303.18223, 2023.
- [12] X. Qiu, T. Sun, Y. Xu, Y. Shao, N. Dai, and X. Huang, “Pre-trained models for natural language processing: A survey,” Science China Technological Sciences, vol. 63, no. 10, pp. 1872–1897, 2020.
- [13] J. Yang, H. Jin, R. Tang, X. Han, Q. Feng, H. Jiang, B. Yin, and X. Hu, “Harnessing the power of llms in practice: A survey on chatgpt and beyond,” arXiv preprint arXiv:2304.13712, 2023.
- [14] F. Petroni, T. Rocktäschel, S. Riedel, P. Lewis, A. Bakhtin, Y. Wu, and A. Miller, “Language models as knowledge bases?” in EMNLP-IJCNLP, 2019, pp. 2463–2473.
- [15] Z. Ji, N. Lee, R. Frieske, T. Yu, D. Su, Y. Xu, E. Ishii, Y. J. Bang, A. Madotto, and P. Fung, “Survey of hallucination in natural language generation,” ACM Computing Surveys, vol. 55, no. 12, pp. 1–38, 2023.
- [16] H. Zhang, H. Song, S. Li, M. Zhou, and D. Song, “A survey of controllable text generation using transformer-based pre-trained language models,” arXiv preprint arXiv:2201.05337, 2022.
- [17] M. Danilevsky, K. Qian, R. Aharonov, Y. Katsis, B. Kawas, and P. Sen, “A survey of the state of explainable ai for natural language processing,” arXiv preprint arXiv:2010.00711, 2020.
- [18] J. Wang, X. Hu, W. Hou, H. Chen, R. Zheng, Y. Wang, L. Yang, H. Huang, W. Ye, X. Geng et al., “On the robustness of chatgpt: An adversarial and out-of-distribution perspective,” arXiv preprint arXiv:2302.12095, 2023.
- [19] S. Ji, S. Pan, E. Cambria, P. Marttinen, and S. Y. Philip, “A survey on knowledge graphs: Representation, acquisition, and applications,” IEEE TNNLS, vol. 33, no. 2, pp. 494–514, 2021.
- [20] D. Vrandečić and M. Krötzsch, “Wikidata: a free collaborative knowledgebase,” Communications of the ACM, vol. 57, no. 10, pp. 78–85, 2014.
- [21] S. Hu, L. Zou, and X. Zhang, “A state-transition framework to answer complex questions over knowledge base,” in EMNLP, 2018, pp. 2098–2108.
- [22] J. Zhang, B. Chen, L. Zhang, X. Ke, and H. Ding, “Neural, symbolic and neural-symbolic reasoning on knowledge graphs,” AI Open, vol. 2, pp. 14–35, 2021.
- [23] B. Abu-Salih, “Domain-specific knowledge graphs: A survey,” Journal of Network and Computer Applications, vol. 185, p. 103076, 2021.
- [24] T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, B. Yang, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, K. Jayant, L. Ni, M. Kathryn, M. Thahir, N. Ndapandula, P. Emmanouil, R. Alan, S. Mehdi, S. Burr, W. Derry, G. Abhinav, C. Xi, S. Abulhair, and W. Joel, “Never-ending learning,” Communications of the ACM, vol. 61, no. 5, pp. 103–115, 2018.
- [25] L. Zhong, J. Wu, Q. Li, H. Peng, and X. Wu, “A comprehensive survey on automatic knowledge graph construction,” arXiv preprint arXiv:2302.05019, 2023.
- [26] L. Yao, C. Mao, and Y. Luo, “Kg-bert: Bert for knowledge graph completion,” arXiv preprint arXiv:1909.03193, 2019.
- [27] L. Luo, Y.-F. Li, G. Haffari, and S. Pan, “Normalizing flow-based neural process for few-shot knowledge graph completion,” SIGIR, 2023.
- [28] Y. Bang, S. Cahyawijaya, N. Lee, W. Dai, D. Su, B. Wilie, H. Lovenia, Z. Ji, T. Yu, W. Chung et al., “A multitask, multilingual, multimodal evaluation of chatgpt on reasoning, hallucination, and interactivity,” arXiv preprint arXiv:2302.04023, 2023.
- [29] X. Wang, J. Wei, D. Schuurmans, Q. Le, E. Chi, and D. Zhou, “Self-consistency improves chain of thought reasoning in language models,” arXiv preprint arXiv:2203.11171, 2022.
- [30] O. Golovneva, M. Chen, S. Poff, M. Corredor, L. Zettlemoyer, M. Fazel-Zarandi, and A. Celikyilmaz, “Roscoe: A suite of metrics for scoring step-by-step reasoning,” ICLR, 2023.
- [31] F. M. Suchanek, G. Kasneci, and G. Weikum, “Yago: a core of semantic knowledge,” in WWW, 2007, pp. 697–706.
- [32] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. Hruschka, and T. Mitchell, “Toward an architecture for never-ending language learning,” in Proceedings of the AAAI conference on artificial intelligence, vol. 24, no. 1, 2010, pp. 1306–1313.
- [33] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko, “Translating embeddings for modeling multi-relational data,” NeurIPS, vol. 26, 2013.
- [34] G. Wan, S. Pan, C. Gong, C. Zhou, and G. Haffari, “Reasoning like human: Hierarchical reinforcement learning for knowledge graph reasoning,” in AAAI, 2021, pp. 1926–1932.
- [35] Z. Zhang, X. Han, Z. Liu, X. Jiang, M. Sun, and Q. Liu, “ERNIE: Enhanced language representation with informative entities,” in ACL, 2019, pp. 1441–1451.
- [36] W. Liu, P. Zhou, Z. Zhao, Z. Wang, Q. Ju, H. Deng, and P. Wang, “K-BERT: enabling language representation with knowledge graph,” in AAAI, 2020, pp. 2901–2908.
- [37] Y. Liu, Y. Wan, L. He, H. Peng, and P. S. Yu, “KG-BART: knowledge graph-augmented BART for generative commonsense reasoning,” in AAAI, 2021, pp. 6418–6425.
- [38] B. Y. Lin, X. Chen, J. Chen, and X. Ren, “KagNet: Knowledge-aware graph networks for commonsense reasoning,” in EMNLP-IJCNLP, 2019, pp. 2829–2839.
- [39] D. Dai, L. Dong, Y. Hao, Z. Sui, B. Chang, and F. Wei, “Knowledge neurons in pretrained transformers,” arXiv preprint arXiv:2104.08696, 2021.
- [40] X. Wang, T. Gao, Z. Zhu, Z. Zhang, Z. Liu, J. Li, and J. Tang, “KEPLER: A unified model for knowledge embedding and pre-trained language representation,” Transactions of the Association for Computational Linguistics, vol. 9, pp. 176–194, 2021.
- [41] I. Melnyk, P. Dognin, and P. Das, “Grapher: Multi-stage knowledge graph construction using pretrained language models,” in NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021.
- [42] P. Ke, H. Ji, Y. Ran, X. Cui, L. Wang, L. Song, X. Zhu, and M. Huang, “JointGT: Graph-text joint representation learning for text generation from knowledge graphs,” in ACL Finding, 2021, pp. 2526–2538.
- [43] J. Jiang, K. Zhou, W. X. Zhao, and J.-R. Wen, “Unikgqa: Unified retrieval and reasoning for solving multi-hop question answering over knowledge graph,” ICLR 2023, 2023.
- [44] M. Yasunaga, A. Bosselut, H. Ren, X. Zhang, C. D. Manning, P. S. Liang, and J. Leskovec, “Deep bidirectional language-knowledge graph pretraining,” NeurIPS, vol. 35, pp. 37 309–37 323, 2022.
- [45] N. Choudhary and C. K. Reddy, “Complex logical reasoning over knowledge graphs using large language models,” arXiv preprint arXiv:2305.01157, 2023.
- [46] S. Wang, Z. Wei, J. Xu, and Z. Fan, “Unifying structure reasoning and language model pre-training for complex reasoning,” arXiv preprint arXiv:2301.08913, 2023.
- [47] C. Zhen, Y. Shang, X. Liu, Y. Li, Y. Chen, and D. Zhang, “A survey on knowledge-enhanced pre-trained language models,” arXiv preprint arXiv:2212.13428, 2022.
- [48] X. Wei, S. Wang, D. Zhang, P. Bhatia, and A. Arnold, “Knowledge enhanced pretrained language models: A comprehensive survey,” arXiv preprint arXiv:2110.08455, 2021.
- [49] D. Yin, L. Dong, H. Cheng, X. Liu, K.-W. Chang, F. Wei, and J. Gao, “A survey of knowledge-intensive nlp with pre-trained language models,” arXiv preprint arXiv:2202.08772, 2022.
- [50] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” NeurIPS, vol. 30, 2017.
- [51] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut, “Albert: A lite bert for self-supervised learning of language representations,” in ICLR, 2019.
- [52] K. Clark, M.-T. Luong, Q. V. Le, and C. D. Manning, “Electra: Pre-training text encoders as discriminators rather than generators,” arXiv preprint arXiv:2003.10555, 2020.
- [53] K. Hakala and S. Pyysalo, “Biomedical named entity recognition with multilingual bert,” in Proceedings of the 5th workshop on BioNLP open shared tasks, 2019, pp. 56–61.
- [54] Y. Tay, M. Dehghani, V. Q. Tran, X. Garcia, J. Wei, X. Wang, H. W. Chung, D. Bahri, T. Schuster, S. Zheng et al., “Ul2: Unifying language learning paradigms,” in ICLR, 2022.
- [55] V. Sanh, A. Webson, C. Raffel, S. Bach, L. Sutawika, Z. Alyafeai, A. Chaffin, A. Stiegler, A. Raja, M. Dey et al., “Multitask prompted training enables zero-shot task generalization,” in ICLR, 2022.
- [56] B. Zoph, I. Bello, S. Kumar, N. Du, Y. Huang, J. Dean, N. Shazeer, and W. Fedus, “St-moe: Designing stable and transferable sparse expert models,” arXiv preprint arXiv:2202.08906, 2022.
- [57] A. Zeng, X. Liu, Z. Du, Z. Wang, H. Lai, M. Ding, Z. Yang, Y. Xu, W. Zheng, X. Xia, W. L. Tam, Z. Ma, Y. Xue, J. Zhai, W. Chen, Z. Liu, P. Zhang, Y. Dong, and J. Tang, “GLM-130b: An open bilingual pre-trained model,” in ICLR, 2023.
- [58] L. Xue, N. Constant, A. Roberts, M. Kale, R. Al-Rfou, A. Siddhant, A. Barua, and C. Raffel, “mt5: A massively multilingual pre-trained text-to-text transformer,” in NAACL, 2021, pp. 483–498.
- [59] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” NeurIPS, vol. 33, pp. 1877–1901, 2020.
- [60] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray et al., “Training language models to follow instructions with human feedback,” NeurIPS, vol. 35, pp. 27730–27744, 2022.
- [61] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar et al., “Llama: Open and efficient foundation language models,” arXiv preprint arXiv:2302.13971, 2023.
- [62] E. Saravia, “Prompt Engineering Guide,” https://github.com/dair-ai/Prompt-Engineering-Guide, 2022, accessed: 2022-12.
- [63] J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. H. Chi, Q. V. Le, D. Zhou et al., “Chain-of-thought prompting elicits reasoning in large language models,” in NeurIPS, 2022.
- [64] S. Li, Y. Gao, H. Jiang, Q. Yin, Z. Li, X. Yan, C. Zhang, and B. Yin, “Graph reasoning for question answering with triplet retrieval,” in ACL, 2023.
- [65] Y. Wen, Z. Wang, and J. Sun, “Mindmap: Knowledge graph prompting sparks graph of thoughts in large language models,” arXiv preprint arXiv:2308.09729, 2023.
- [66] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor, “Freebase: A collaboratively created graph database for structuring human knowledge,” in SIGMOD, 2008, pp. 1247–1250.
- [67] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives, “Dbpedia: A nucleus for a web of open data,” in The Semantic Web: 6th International Semantic Web Conference. Springer, 2007, pp. 722–735.
- [68] B. Xu, Y. Xu, J. Liang, C. Xie, B. Liang, W. Cui, and Y. Xiao, “Cn-dbpedia: A never-ending chinese knowledge extraction system,” in IEA/AIE, 2017, pp. 428–438.
- [69] P. Hai-Nyzhnyk, “Vikidia as a universal multilingual online encyclopedia for children,” The Encyclopedia Herald of Ukraine, vol. 14, 2022.
- [70] F. Ilievski, P. Szekely, and B. Zhang, “Cskg: The commonsense knowledge graph,” Extended Semantic Web Conference (ESWC), 2021.
- [71] R. Speer, J. Chin, and C. Havasi, “Conceptnet 5.5: An open multilingual graph of general knowledge,” in AAAI, 2017.
- [72] H. Ji, P. Ke, S. Huang, F. Wei, X. Zhu, and M. Huang, “Language generation with multi-hop reasoning on commonsense knowledge graph,” in EMNLP, 2020, pp. 725–736.
- [73] J. D. Hwang, C. Bhagavatula, R. Le Bras, J. Da, K. Sakaguchi, A. Bosselut, and Y. Choi, “(comet-) atomic 2020: On symbolic and neural commonsense knowledge graphs,” in AAAI, vol. 35, no. 7, 2021, pp. 6384–6392.
- [74] H. Zhang, X. Liu, H. Pan, Y. Song, and C. W.-K. Leung, “Aser: A large-scale eventuality knowledge graph,” in WWW, 2020, pp. 201–211.
- [75] H. Zhang, D. Khashabi, Y. Song, and D. Roth, “Transomcs: from linguistic graphs to commonsense knowledge,” in IJCAI, 2021, pp. 4004–4010.
- [76] Z. Li, X. Ding, T. Liu, J. E. Hu, and B. Van Durme, “Guided generation of cause and effect,” in IJCAI, 2020.
- [77] O. Bodenreider, “The unified medical language system (umls): integrating biomedical terminology,” Nucleic acids research, vol. 32, no. suppl_1, pp. D267–D270, 2004.
- [78] Y. Liu, Q. Zeng, J. Ordieres Meré, and H. Yang, “Anticipating stock market of the renowned companies: a knowledge graph approach,” Complexity, vol. 2019, 2019.
- [79] Y. Zhu, W. Zhou, Y. Xu, J. Liu, Y. Tan et al., “Intelligent learning for knowledge graph towards geological data,” Scientific Programming, vol. 2017, 2017.
- [80] W. Choi and H. Lee, “Inference of biomedical relations among chemicals, genes, diseases, and symptoms using knowledge representation learning,” IEEE Access, vol. 7, pp. 179373–179384, 2019.
- [81] F. Farazi, M. Salamanca, S. Mosbach, J. Akroyd, A. Eibeck, L. K. Aditya, A. Chadzynski, K. Pan, X. Zhou, S. Zhang et al., “Knowledge graph approach to combustion chemistry and interoperability,” ACS omega, vol. 5, no. 29, pp. 18342–18348, 2020.
- [82] X. Wu, T. Jiang, Y. Zhu, and C. Bu, “Knowledge graph for china’s genealogy,” IEEE TKDE, vol. 35, no. 1, pp. 634–646, 2023.
- [83] X. Zhu, Z. Li, X. Wang, X. Jiang, P. Sun, X. Wang, Y. Xiao, and N. J. Yuan, “Multi-modal knowledge graph construction and application: A survey,” IEEE TKDE, 2022.
- [84] S. Ferrada, B. Bustos, and A. Hogan, “Imgpedia: a linked dataset with content-based analysis of wikimedia images,” in ISWC, 2017, pp. 84–93.
- [85] Y. Liu, H. Li, A. Garcia-Duran, M. Niepert, D. Onoro-Rubio, and D. S. Rosenblum, “Mmkg: multi-modal knowledge graphs,” in ESWC, 2019, pp. 459–474.
- [86] M. Wang, H. Wang, G. Qi, and Q. Zheng, “Richpedia: a large-scale, comprehensive multi-modal knowledge graph,” Big Data Research, vol. 22, p. 100159, 2020.
- [87] B. Shi, L. Ji, P. Lu, Z. Niu, and N. Duan, “Knowledge aware semantic concept expansion for image-text matching.” in IJCAI, vol. 1, 2019, p. 2.
- [88] S. Shah, A. Mishra, N. Yadati, and P. P. Talukdar, “Kvqa: Knowledge-aware visual question answering,” in AAAI, vol. 33, no. 01, 2019, pp. 8876–8884.
- [89] R. Sun, X. Cao, Y. Zhao, J. Wan, K. Zhou, F. Zhang, Z. Wang, and K. Zheng, “Multi-modal knowledge graphs for recommender systems,” in CIKM, 2020, pp. 1405–1414.
- [90] S. Deng, C. Wang, Z. Li, N. Zhang, Z. Dai, H. Chen, F. Xiong, M. Yan, Q. Chen, M. Chen, J. Chen, J. Z. Pan, B. Hooi, and H. Chen, “Construction and applications of billion-scale pre-trained multimodal business knowledge graph,” in ICDE, 2023.
- [91] C. Rosset, C. Xiong, M. Phan, X. Song, P. Bennett, and S. Tiwary, “Knowledge-aware language model pretraining,” arXiv preprint arXiv:2007.00655, 2020.
- [92] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, S. Riedel, and D. Kiela, “Retrieval-augmented generation for knowledge-intensive nlp tasks,” in NeurIPS, vol. 33, 2020, pp. 9459–9474.
- [93] Y. Zhu, X. Wang, J. Chen, S. Qiao, Y. Ou, Y. Yao, S. Deng, H. Chen, and N. Zhang, “Llms for knowledge graph construction and reasoning: Recent capabilities and future opportunities,” arXiv preprint arXiv:2305.13168, 2023.
- [94] Z. Zhang, X. Liu, Y. Zhang, Q. Su, X. Sun, and B. He, “Pretrain-kge: learning knowledge representation from pretrained language models,” in Findings of EMNLP, 2020, pp. 259–266.
- [95] A. Kumar, A. Pandey, R. Gadia, and M. Mishra, “Building knowledge graph using pre-trained language model for learning entity-aware relationships,” in 2020 IEEE International Conference on Computing, Power and Communication Technologies (GUCON). IEEE, 2020, pp. 310–315.
- [96] X. Xie, N. Zhang, Z. Li, S. Deng, H. Chen, F. Xiong, M. Chen, and H. Chen, “From discrimination to generation: Knowledge graph completion with generative transformer,” in WWW, 2022, pp. 162–165.
- [97] Z. Chen, C. Xu, F. Su, Z. Huang, and Y. Dou, “Incorporating structured sentences with time-enhanced bert for fully-inductive temporal relation prediction,” SIGIR, 2023.
- [98] D. Zhu, J. Chen, X. Shen, X. Li, and M. Elhoseiny, “Minigpt-4: Enhancing vision-language understanding with advanced large language models,” arXiv preprint arXiv:2304.10592, 2023.
- [99] M. Warren, D. A. Shamma, and P. J. Hayes, “Knowledge engineering with image data in real-world settings,” in AAAI, ser. CEUR Workshop Proceedings, vol. 2846, 2021.
- [100] R. Thoppilan, D. De Freitas, J. Hall, N. Shazeer, A. Kulshreshtha, H.-T. Cheng, A. Jin, T. Bos, L. Baker, Y. Du et al., “Lamda: Language models for dialog applications,” arXiv preprint arXiv:2201.08239, 2022.
- [101] Y. Sun, S. Wang, S. Feng, S. Ding, C. Pang, J. Shang, J. Liu, X. Chen, Y. Zhao, Y. Lu et al., “Ernie 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation,” arXiv preprint arXiv:2107.02137, 2021.
- [102] T. Shen, Y. Mao, P. He, G. Long, A. Trischler, and W. Chen, “Exploiting structured knowledge in text via graph-guided representation learning,” in EMNLP, 2020, pp. 8980–8994.
- [103] D. Zhang, Z. Yuan, Y. Liu, F. Zhuang, H. Chen, and H. Xiong, “E-bert: A phrase and product knowledge enhanced language model for e-commerce,” arXiv preprint arXiv:2009.02835, 2020.
- [104] S. Li, X. Li, L. Shang, C. Sun, B. Liu, Z. Ji, X. Jiang, and Q. Liu, “Pre-training language models with deterministic factual knowledge,” in EMNLP, 2022, pp. 11118–11131.
- [105] M. Kang, J. Baek, and S. J. Hwang, “Kala: Knowledge-augmented language model adaptation,” in NAACL, 2022, pp. 5144–5167.
- [106] W. Xiong, J. Du, W. Y. Wang, and V. Stoyanov, “Pretrained encyclopedia: Weakly supervised knowledge-pretrained language model,” in ICLR, 2020.
- [107] T. Sun, Y. Shao, X. Qiu, Q. Guo, Y. Hu, X. Huang, and Z. Zhang, “CoLAKE: Contextualized language and knowledge embedding,” in COLING, 2020, pp. 3660–3670.
- [108] T. Zhang, C. Wang, N. Hu, M. Qiu, C. Tang, X. He, and J. Huang, “DKPLM: decomposable knowledge-enhanced pre-trained language model for natural language understanding,” in AAAI, 2022, pp. 11703–11711.
- [109] J. Wang, W. Huang, M. Qiu, Q. Shi, H. Wang, X. Li, and M. Gao, “Knowledge prompting in pre-trained language model for natural language understanding,” in EMNLP, 2022, pp. 3164–3177.
- [110] H. Ye, N. Zhang, S. Deng, X. Chen, H. Chen, F. Xiong, X. Chen, and H. Chen, “Ontology-enhanced prompt-tuning for few-shot learning,” in WWW, 2022, pp. 778–787.
- [111] H. Luo, Z. Tang, S. Peng, Y. Guo, W. Zhang, C. Ma, G. Dong, M. Song, W. Lin et al., “Chatkbqa: A generate-then-retrieve framework for knowledge base question answering with fine-tuned large language models,” arXiv preprint arXiv:2310.08975, 2023.
- [112] L. Luo, Y.-F. Li, G. Haffari, and S. Pan, “Reasoning on graphs: Faithful and interpretable large language model reasoning,” arXiv preprint arXiv:2310.01061, 2023.
- [113] R. Logan, N. F. Liu, M. E. Peters, M. Gardner, and S. Singh, “Barack’s wife hillary: Using knowledge graphs for fact-aware language modeling,” in ACL, 2019, pp. 5962–5971.
- [114] K. Guu, K. Lee, Z. Tung, P. Pasupat, and M.-W. Chang, “Realm: Retrieval-augmented language model pre-training,” in ICML, 2020.
- [115] Y. Wu, Y. Zhao, B. Hu, P. Minervini, P. Stenetorp, and S. Riedel, “An efficient memory-augmented transformer for knowledge-intensive NLP tasks,” in EMNLP, 2022, pp. 5184–5196.
- [116] L. Luo, J. Ju, B. Xiong, Y.-F. Li, G. Haffari, and S. Pan, “Chatrule: Mining logical rules with large language models for knowledge graph reasoning,” arXiv preprint arXiv:2309.01538, 2023.
- [117] J. Wang, Q. Sun, N. Chen, X. Li, and M. Gao, “Boosting language models reasoning with chain-of-knowledge prompting,” arXiv preprint arXiv:2306.06427, 2023.
- [118] Z. Jiang, F. F. Xu, J. Araki, and G. Neubig, “How can we know what language models know?” Transactions of the Association for Computational Linguistics, vol. 8, pp. 423–438, 2020.
- [119] T. Shin, Y. Razeghi, R. L. Logan IV, E. Wallace, and S. Singh, “Autoprompt: Eliciting knowledge from language models with automatically generated prompts,” arXiv preprint arXiv:2010.15980, 2020.
- [120] Z. Meng, F. Liu, E. Shareghi, Y. Su, C. Collins, and N. Collier, “Rewire-then-probe: A contrastive recipe for probing biomedical knowledge of pre-trained language models,” arXiv preprint arXiv:2110.08173, 2021.
- [121] L. Luo, T.-T. Vu, D. Phung, and G. Haffari, “Systematic assessment of factual knowledge in large language models,” in EMNLP, 2023.
- [122] V. Swamy, A. Romanou, and M. Jaggi, “Interpreting language models through knowledge graph extraction,” arXiv preprint arXiv:2111.08546, 2021.
- [123] S. Li, X. Li, L. Shang, Z. Dong, C. Sun, B. Liu, Z. Ji, X. Jiang, and Q. Liu, “How pre-trained language models capture factual knowledge? a causal-inspired analysis,” arXiv preprint arXiv:2203.16747, 2022.
- [124] H. Tian, C. Gao, X. Xiao, H. Liu, B. He, H. Wu, H. Wang, and F. Wu, “SKEP: Sentiment knowledge enhanced pre-training for sentiment analysis,” in ACL, 2020, pp. 4067–4076.
- [125] W. Yu, C. Zhu, Y. Fang, D. Yu, S. Wang, Y. Xu, M. Zeng, and M. Jiang, “Dict-BERT: Enhancing language model pre-training with dictionary,” in ACL, 2022, pp. 1907–1918.
- [126] T. McCoy, E. Pavlick, and T. Linzen, “Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference,” in ACL, 2019, pp. 3428–3448.
- [127] D. Wilmot and F. Keller, “Memory and knowledge augmented language models for inferring salience in long-form stories,” in EMNLP, 2021, pp. 851–865.
- [128] L. Adolphs, S. Dhuliawala, and T. Hofmann, “How to query language models?” arXiv preprint arXiv:2108.01928, 2021.
- [129] M. Sung, J. Lee, S. Yi, M. Jeon, S. Kim, and J. Kang, “Can language models be biomedical knowledge bases?” in EMNLP, 2021, pp. 4723–4734.
- [130] A. Mallen, A. Asai, V. Zhong, R. Das, H. Hajishirzi, and D. Khashabi, “When not to trust language models: Investigating effectiveness and limitations of parametric and non-parametric memories,” arXiv preprint arXiv:2212.10511, 2022.
- [131] M. Yasunaga, H. Ren, A. Bosselut, P. Liang, and J. Leskovec, “QA-GNN: Reasoning with language models and knowledge graphs for question answering,” in NAACL, 2021, pp. 535–546.
- [132] M. Nayyeri, Z. Wang, M. Akter, M. M. Alam, M. R. A. H. Rony, J. Lehmann, S. Staab et al., “Integrating knowledge graph embedding and pretrained language models in hypercomplex spaces,” arXiv preprint arXiv:2208.02743, 2022.
- [133] N. Huang, Y. R. Deshpande, Y. Liu, H. Alberts, K. Cho, C. Vania, and I. Calixto, “Endowing language models with multimodal knowledge graph representations,” arXiv preprint arXiv:2206.13163, 2022.
- [134] M. M. Alam, M. R. A. H. Rony, M. Nayyeri, K. Mohiuddin, M. M. Akter, S. Vahdati, and J. Lehmann, “Language model guided knowledge graph embeddings,” IEEE Access, vol. 10, pp. 76008–76020, 2022.
- [135] X. Wang, Q. He, J. Liang, and Y. Xiao, “Language models as knowledge embeddings,” arXiv preprint arXiv:2206.12617, 2022.
- [136] N. Zhang, X. Xie, X. Chen, S. Deng, C. Tan, F. Huang, X. Cheng, and H. Chen, “Reasoning through memorization: Nearest neighbor knowledge graph embeddings,” arXiv preprint arXiv:2201.05575, 2022.
- [137] X. Xie, Z. Li, X. Wang, Y. Zhu, N. Zhang, J. Zhang, S. Cheng, B. Tian, S. Deng, F. Xiong, and H. Chen, “Lambdakg: A library for pre-trained language model-based knowledge graph embeddings,” 2022.
- [138] B. Kim, T. Hong, Y. Ko, and J. Seo, “Multi-task learning for knowledge graph completion with pre-trained language models,” in COLING, 2020, pp. 1737–1743.
- [139] X. Lv, Y. Lin, Y. Cao, L. Hou, J. Li, Z. Liu, P. Li, and J. Zhou, “Do pre-trained models benefit knowledge graph completion? A reliable evaluation and a reasonable approach,” in ACL, 2022, pp. 3570–3581.
- [140] J. Shen, C. Wang, L. Gong, and D. Song, “Joint language semantic and structure embedding for knowledge graph completion,” in COLING, 2022, pp. 1965–1978.
- [141] B. Choi, D. Jang, and Y. Ko, “MEM-KGC: masked entity model for knowledge graph completion with pre-trained language model,” IEEE Access, vol. 9, pp. 132025–132032, 2021.
- [142] B. Choi and Y. Ko, “Knowledge graph extension with a pre-trained language model via unified learning method,” Knowl. Based Syst., vol. 262, p. 110245, 2023.
- [143] B. Wang, T. Shen, G. Long, T. Zhou, Y. Wang, and Y. Chang, “Structure-augmented text representation learning for efficient knowledge graph completion,” in WWW, 2021, pp. 1737–1748.
- [144] L. Wang, W. Zhao, Z. Wei, and J. Liu, “Simkgc: Simple contrastive knowledge graph completion with pre-trained language models,” in ACL, 2022, pp. 4281–4294.
- [145] D. Li, M. Yi, and Y. He, “Lp-bert: Multi-task pre-training knowledge graph bert for link prediction,” arXiv preprint arXiv:2201.04843, 2022.
- [146] A. Saxena, A. Kochsiek, and R. Gemulla, “Sequence-to-sequence knowledge graph completion and question answering,” in ACL, 2022, pp. 2814–2828.
- [147] C. Chen, Y. Wang, B. Li, and K. Lam, “Knowledge is flat: A seq2seq generative framework for various knowledge graph completion,” in COLING, 2022, pp. 4005–4017.
- [148] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, “Deep contextualized word representations,” in NAACL, 2018, pp. 2227–2237.
- [149] H. Yan, T. Gui, J. Dai, Q. Guo, Z. Zhang, and X. Qiu, “A unified generative framework for various NER subtasks,” in ACL, 2021, pp. 5808–5822.
- [150] Y. Onoe and G. Durrett, “Learning to denoise distantly-labeled data for entity typing,” in NAACL, 2019, pp. 2407–2417.
- [151] Y. Onoe, M. Boratko, A. McCallum, and G. Durrett, “Modeling fine-grained entity types with box embeddings,” in ACL, 2021, pp. 2051–2064.
- [152] B. Z. Li, S. Min, S. Iyer, Y. Mehdad, and W. Yih, “Efficient one-pass end-to-end entity linking for questions,” in EMNLP, 2020, pp. 6433–6441.
- [153] T. Ayoola, S. Tyagi, J. Fisher, C. Christodoulopoulos, and A. Pierleoni, “Refined: An efficient zero-shot-capable approach to end-to-end entity linking,” in NAACL, 2022, pp. 209–220.
- [154] M. Joshi, O. Levy, L. Zettlemoyer, and D. S. Weld, “BERT for coreference resolution: Baselines and analysis,” in EMNLP, 2019, pp. 5802–5807.
- [155] M. Joshi, D. Chen, Y. Liu, D. S. Weld, L. Zettlemoyer, and O. Levy, “Spanbert: Improving pre-training by representing and predicting spans,” Trans. Assoc. Comput. Linguistics, vol. 8, pp. 64–77, 2020.
- [156] A. Caciularu, A. Cohan, I. Beltagy, M. E. Peters, A. Cattan, and I. Dagan, “CDLM: cross-document language modeling,” in EMNLP, 2021, pp. 2648–2662.
- [157] A. Cattan, A. Eirew, G. Stanovsky, M. Joshi, and I. Dagan, “Cross-document coreference resolution over predicted mentions,” in ACL, 2021, pp. 5100–5107.
- [158] Y. Wang, Y. Shen, and H. Jin, “An end-to-end actor-critic-based neural coreference resolution system,” in ICASSP, 2021, pp. 7848–7852.
- [159] P. Shi and J. Lin, “Simple BERT models for relation extraction and semantic role labeling,” arXiv preprint arXiv:1904.05255, 2019.
- [160] S. Park and H. Kim, “Improving sentence-level relation extraction through curriculum learning,” arXiv preprint arXiv:2107.09332, 2021.
- [161] Y. Ma, A. Wang, and N. Okazaki, “DREEAM: guiding attention with evidence for improving document-level relation extraction,” in EACL, 2023, pp. 1963–1975.
- [162] Q. Guo, Y. Sun, G. Liu, Z. Wang, Z. Ji, Y. Shen, and X. Wang, “Constructing chinese historical literature knowledge graph based on bert,” in WISA, 2021, pp. 323–334.
- [163] J. Han, N. Collier, W. Buntine, and E. Shareghi, “Pive: Prompting with iterative verification improving graph-based generative capability of llms,” arXiv preprint arXiv:2305.12392, 2023.
- [164] A. Bosselut, H. Rashkin, M. Sap, C. Malaviya, A. Celikyilmaz, and Y. Choi, “Comet: Commonsense transformers for knowledge graph construction,” in ACL, 2019.
- [165] S. Hao, B. Tan, K. Tang, H. Zhang, E. P. Xing, and Z. Hu, “Bertnet: Harvesting knowledge graphs from pretrained language models,” arXiv preprint arXiv:2206.14268, 2022.
- [166] P. West, C. Bhagavatula, J. Hessel, J. Hwang, L. Jiang, R. Le Bras, X. Lu, S. Welleck, and Y. Choi, “Symbolic knowledge distillation: from general language models to commonsense models,” in NAACL, 2022, pp. 4602–4625.
- [167] L. F. R. Ribeiro, M. Schmitt, H. Schütze, and I. Gurevych, “Investigating pretrained language models for graph-to-text generation,” in Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI, 2021, pp. 211–227.
- [168] J. Li, T. Tang, W. X. Zhao, Z. Wei, N. J. Yuan, and J.-R. Wen, “Few-shot knowledge graph-to-text generation with pretrained language models,” in ACL, 2021, pp. 1558–1568.
- [169] A. Colas, M. Alvandipour, and D. Z. Wang, “GAP: A graph-aware language model framework for knowledge graph-to-text generation,” in COLING, 2022, pp. 5755–5769.
- [170] Z. Jin, Q. Guo, X. Qiu, and Z. Zhang, “GenWiki: A dataset of 1.3 million content-sharing text and graphs for unsupervised graph-to-text generation,” in COLING, 2020, pp. 2398–2409.
- [171] W. Chen, Y. Su, X. Yan, and W. Y. Wang, “KGPT: Knowledge-grounded pre-training for data-to-text generation,” in EMNLP, 2020, pp. 8635–8648.
- [172] D. Lukovnikov, A. Fischer, and J. Lehmann, “Pretrained transformers for simple question answering over knowledge graphs,” in ISWC, 2019, pp. 470–486.
- [173] D. Luo, J. Su, and S. Yu, “A bert-based approach with relation-aware attention for knowledge base question answering,” in IJCNN. IEEE, 2020, pp. 1–8.
- [174] N. Hu, Y. Wu, G. Qi, D. Min, J. Chen, J. Z. Pan, and Z. Ali, “An empirical study of pre-trained language models in simple knowledge graph question answering,” arXiv preprint arXiv:2303.10368, 2023.
- [175] Y. Xu, C. Zhu, R. Xu, Y. Liu, M. Zeng, and X. Huang, “Fusing context into knowledge graph for commonsense question answering,” in ACL, 2021, pp. 1201–1207.
- [176] M. Zhang, R. Dai, M. Dong, and T. He, “Drlk: Dynamic hierarchical reasoning with language model and knowledge graph for question answering,” in EMNLP, 2022, pp. 5123–5133.
- [177] Z. Hu, Y. Xu, W. Yu, S. Wang, Z. Yang, C. Zhu, K.-W. Chang, and Y. Sun, “Empowering language models with knowledge graph reasoning for open-domain question answering,” in EMNLP, 2022, pp. 9562–9581.
- [178] X. Zhang, A. Bosselut, M. Yasunaga, H. Ren, P. Liang, C. D. Manning, and J. Leskovec, “Greaselm: Graph reasoning enhanced language models,” in ICLR, 2022.
- [179] X. Cao and Y. Liu, “Relmkg: reasoning with pre-trained language models and knowledge graphs for complex question answering,” Applied Intelligence, pp. 1–15, 2022.
- [180] X. Huang, J. Zhang, D. Li, and P. Li, “Knowledge graph embedding based question answering,” in WSDM, 2019, pp. 105–113.
- [181] H. Wang, F. Zhang, X. Xie, and M. Guo, “Dkn: Deep knowledge-aware network for news recommendation,” in WWW, 2018, pp. 1835–1844.
- [182] B. Yang, W.-t. Yih, X. He, J. Gao, and L. Deng, “Embedding entities and relations for learning and inference in knowledge bases,” in ICLR, 2015.
- [183] W. Xiong, M. Yu, S. Chang, X. Guo, and W. Y. Wang, “One-shot relational learning for knowledge graphs,” in EMNLP, 2018, pp. 1980–1990.
- [184] P. Wang, J. Han, C. Li, and R. Pan, “Logic attention based neighborhood aggregation for inductive knowledge graph embedding,” in AAAI, vol. 33, no. 01, 2019, pp. 7152–7159.
- [185] Y. Lin, Z. Liu, M. Sun, Y. Liu, and X. Zhu, “Learning entity and relation embeddings for knowledge graph completion,” in AAAI, 2015.
- [186] C. Chen, Y. Wang, A. Sun, B. Li, and K.-Y. Lam, “Dipping plms sauce: Bridging structure and text for effective knowledge graph completion via conditional soft prompting,” in ACL, 2023.
- [187] J. Lovelace and C. P. Rosé, “A framework for adapting pre-trained language models to knowledge graph completion,” in EMNLP, 2022, pp. 5937–5955.
- [188] J. Fu, L. Feng, Q. Zhang, X. Huang, and P. Liu, “Larger-context tagging: When and why does it work?” in NAACL, 2021, pp. 1463–1475.
- [189] X. Liu, K. Ji, Y. Fu, Z. Du, Z. Yang, and J. Tang, “P-tuning v2: Prompt tuning can be comparable to fine-tuning universally across scales and tasks,” arXiv preprint arXiv:2110.07602, 2021.
- [190] J. Yu, B. Bohnet, and M. Poesio, “Named entity recognition as dependency parsing,” in ACL, 2020, pp. 6470–6476.
- [191] F. Li, Z. Lin, M. Zhang, and D. Ji, “A span-based model for joint overlapped and discontinuous named entity recognition,” in ACL, 2021, pp. 4814–4828.
- [192] C. Tan, W. Qiu, M. Chen, R. Wang, and F. Huang, “Boundary enhanced neural span classification for nested named entity recognition,” in AAAI, 2020, pp. 9016–9023.
- [193] Y. Xu, H. Huang, C. Feng, and Y. Hu, “A supervised multi-head self-attention network for nested named entity recognition,” in AAAI, 2021, pp. 14185–14193.
- [194] J. Yu, B. Ji, S. Li, J. Ma, H. Liu, and H. Xu, “S-NER: A concise and efficient span-based model for named entity recognition,” Sensors, vol. 22, no. 8, p. 2852, 2022.
- [195] Y. Fu, C. Tan, M. Chen, S. Huang, and F. Huang, “Nested named entity recognition with partially-observed treecrfs,” in AAAI, 2021, pp. 12839–12847.
- [196] C. Lou, S. Yang, and K. Tu, “Nested named entity recognition as latent lexicalized constituency parsing,” in ACL, 2022, pp. 6183–6198.
- [197] S. Yang and K. Tu, “Bottom-up constituency parsing and nested named entity recognition with pointer networks,” in ACL, 2022, pp. 2403–2416.
- [198] F. Li, Z. Lin, M. Zhang, and D. Ji, “A span-based model for joint overlapped and discontinuous named entity recognition,” in ACL, 2021, pp. 4814–4828.
- [199] Q. Liu, H. Lin, X. Xiao, X. Han, L. Sun, and H. Wu, “Fine-grained entity typing via label reasoning,” in EMNLP, 2021, pp. 4611–4622.
- [200] H. Dai, Y. Song, and H. Wang, “Ultra-fine entity typing with weak supervision from a masked language model,” in ACL, 2021, pp. 1790–1799.
- [201] N. Ding, Y. Chen, X. Han, G. Xu, X. Wang, P. Xie, H. Zheng, Z. Liu, J. Li, and H. Kim, “Prompt-learning for fine-grained entity typing,” in Findings of EMNLP, 2022, pp. 6888–6901.
- [202] W. Pan, W. Wei, and F. Zhu, “Automatic noisy label correction for fine-grained entity typing,” in IJCAI, 2022, pp. 4317–4323.
- [203] B. Li, W. Yin, and M. Chen, “Ultra-fine entity typing with indirect supervision from natural language inference,” Trans. Assoc. Comput. Linguistics, vol. 10, pp. 607–622, 2022.
- [204] S. Broscheit, “Investigating entity knowledge in BERT with simple neural end-to-end entity linking,” arXiv preprint arXiv:2003.05473, 2020.
- [205] N. D. Cao, G. Izacard, S. Riedel, and F. Petroni, “Autoregressive entity retrieval,” in ICLR, 2021.
- [206] N. D. Cao, L. Wu, K. Popat, M. Artetxe, N. Goyal, M. Plekhanov, L. Zettlemoyer, N. Cancedda, S. Riedel, and F. Petroni, “Multilingual autoregressive entity linking,” Trans. Assoc. Comput. Linguistics, vol. 10, pp. 274–290, 2022.
- [207] N. D. Cao, W. Aziz, and I. Titov, “Highly parallel autoregressive entity linking with discriminative correction,” in EMNLP, 2021, pp. 7662–7669.
- [208] K. Lee, L. He, and L. Zettlemoyer, “Higher-order coreference resolution with coarse-to-fine inference,” in NAACL, 2018, pp. 687–692.
- [209] T. M. Lai, T. Bui, and D. S. Kim, “End-to-end neural coreference resolution revisited: A simple yet effective baseline,” in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, 2022, pp. 8147–8151.
- [210] W. Wu, F. Wang, A. Yuan, F. Wu, and J. Li, “Corefqa: Coreference resolution as query-based span prediction,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, 2020, pp. 6953–6963.
- [211] T. M. Lai, H. Ji, T. Bui, Q. H. Tran, F. Dernoncourt, and W. Chang, “A context-dependent gated module for incorporating symbolic semantics into event coreference resolution,” in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, June 6-11, 2021, 2021, pp. 3491–3499.
- [212] Y. Kirstain, O. Ram, and O. Levy, “Coreference resolution without span representations,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 2: Short Papers), Virtual Event, August 1-6, 2021, 2021, pp. 14–19.
- [213] R. Thirukovalluru, N. Monath, K. Shridhar, M. Zaheer, M. Sachan, and A. McCallum, “Scaling within document coreference to long texts,” in Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, Online Event, August 1-6, 2021, ser. Findings of ACL, vol. ACL/IJCNLP 2021, 2021, pp. 3921–3931.
- [214] I. Beltagy, M. E. Peters, and A. Cohan, “Longformer: The long-document transformer,” CoRR, vol. abs/2004.05150, 2020.
- [215] C. Alt, M. Hübner, and L. Hennig, “Improving relation extraction by pre-trained language representations,” in 1st Conference on Automated Knowledge Base Construction, AKBC 2019, Amherst, MA, USA, May 20-22, 2019, 2019.
- [216] L. B. Soares, N. FitzGerald, J. Ling, and T. Kwiatkowski, “Matching the blanks: Distributional similarity for relation learning,” in ACL, 2019, pp. 2895–2905.
- [217] S. Lyu and H. Chen, “Relation classification with entity type restriction,” in Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, Online Event, August 1-6, 2021, ser. Findings of ACL, vol. ACL/IJCNLP 2021, 2021, pp. 390–395.
- [218] J. Zheng and Z. Chen, “Sentence-level relation extraction via contrastive learning with descriptive relation prompts,” CoRR, vol. abs/2304.04935, 2023.
- [219] H. Wang, C. Focke, R. Sylvester, N. Mishra, and W. Y. Wang, “Fine-tune bert for docred with two-step process,” CoRR, vol. abs/1909.11898, 2019.
- [220] H. Tang, Y. Cao, Z. Zhang, J. Cao, F. Fang, S. Wang, and P. Yin, “HIN: hierarchical inference network for document-level relation extraction,” in PAKDD, ser. Lecture Notes in Computer Science, vol. 12084, 2020, pp. 197–209.
- [221] D. Wang, W. Hu, E. Cao, and W. Sun, “Global-to-local neural networks for document-level relation extraction,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020, 2020, pp. 3711–3721.
- [222] S. Zeng, Y. Wu, and B. Chang, “SIRE: separate intra- and inter-sentential reasoning for document-level relation extraction,” in Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, Online Event, August 1-6, 2021, ser. Findings of ACL, vol. ACL/IJCNLP 2021, 2021, pp. 524–534.
- [223] G. Nan, Z. Guo, I. Sekulic, and W. Lu, “Reasoning with latent structure refinement for document-level relation extraction,” in ACL, 2020, pp. 1546–1557.
- [224] S. Zeng, R. Xu, B. Chang, and L. Li, “Double graph based reasoning for document-level relation extraction,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020, 2020, pp. 1630–1640.
- [225] N. Zhang, X. Chen, X. Xie, S. Deng, C. Tan, M. Chen, F. Huang, L. Si, and H. Chen, “Document-level relation extraction as semantic segmentation,” in IJCAI, 2021, pp. 3999–4006.
- [226] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III, ser. Lecture Notes in Computer Science, vol. 9351, 2015, pp. 234–241.
- [227] W. Zhou, K. Huang, T. Ma, and J. Huang, “Document-level relation extraction with adaptive thresholding and localized context pooling,” in AAAI, 2021, pp. 14 612–14 620.
- [228] C. Gardent, A. Shimorina, S. Narayan, and L. Perez-Beltrachini, “The WebNLG challenge: Generating text from RDF data,” in Proceedings of the 10th International Conference on Natural Language Generation, 2017, pp. 124–133.
- [229] J. Guan, Y. Wang, and M. Huang, “Story ending generation with incremental encoding and commonsense knowledge,” in AAAI, 2019, pp. 6473–6480.
- [230] H. Zhou, T. Young, M. Huang, H. Zhao, J. Xu, and X. Zhu, “Commonsense knowledge aware conversation generation with graph attention,” in IJCAI, 2018, pp. 4623–4629.
- [231] M. Kale and A. Rastogi, “Text-to-text pre-training for data-to-text tasks,” in Proceedings of the 13th International Conference on Natural Language Generation, 2020, pp. 97–102.
- [232] M. Mintz, S. Bills, R. Snow, and D. Jurafsky, “Distant supervision for relation extraction without labeled data,” in ACL, 2009, pp. 1003–1011.
- [233] A. Saxena, A. Tripathi, and P. Talukdar, “Improving multi-hop question answering over knowledge graphs using knowledge base embeddings,” in ACL, 2020, pp. 4498–4507.
- [234] Y. Feng, X. Chen, B. Y. Lin, P. Wang, J. Yan, and X. Ren, “Scalable multi-hop relational reasoning for knowledge-aware question answering,” in EMNLP, 2020, pp. 1295–1309.
- [235] Y. Yan, R. Li, S. Wang, H. Zhang, Z. Daoguang, F. Zhang, W. Wu, and W. Xu, “Large-scale relation learning for question answering over knowledge bases with pre-trained language models,” in EMNLP, 2021, pp. 3653–3660.
- [236] J. Zhang, X. Zhang, J. Yu, J. Tang, J. Tang, C. Li, and H. Chen, “Subgraph retrieval enhanced model for multi-hop knowledge base question answering,” in ACL (Volume 1: Long Papers), 2022, pp. 5773–5784.
- [237] J. Jiang, K. Zhou, Z. Dong, K. Ye, W. X. Zhao, and J.-R. Wen, “Structgpt: A general framework for large language model to reason over structured data,” arXiv preprint arXiv:2305.09645, 2023.
- [238] H. Zhu, H. Peng, Z. Lyu, L. Hou, J. Li, and J. Xiao, “Pre-training language model incorporating domain-specific heterogeneous knowledge into a unified representation,” Expert Systems with Applications, vol. 215, p. 119369, 2023.
- [239] C. Feng, X. Zhang, and Z. Fei, “Knowledge solver: Teaching llms to search for domain knowledge from knowledge graphs,” arXiv preprint arXiv:2309.03118, 2023.
- [240] J. Sun, C. Xu, L. Tang, S. Wang, C. Lin, Y. Gong, H.-Y. Shum, and J. Guo, “Think-on-graph: Deep and responsible reasoning of large language model with knowledge graph,” arXiv preprint arXiv:2307.07697, 2023.
- [241] B. He, D. Zhou, J. Xiao, X. Jiang, Q. Liu, N. J. Yuan, and T. Xu, “BERT-MK: Integrating graph contextualized knowledge into pre-trained language models,” in EMNLP, 2020, pp. 2281–2290.
- [242] Y. Su, X. Han, Z. Zhang, Y. Lin, P. Li, Z. Liu, J. Zhou, and M. Sun, “Cokebert: Contextual knowledge selection and embedding towards enhanced pre-trained language models,” AI Open, vol. 2, pp. 127–134, 2021.
- [243] D. Yu, C. Zhu, Y. Yang, and M. Zeng, “JAKET: joint pre-training of knowledge graph and language understanding,” in AAAI, 2022, pp. 11 630–11 638.
- [244] X. Wang, P. Kapanipathi, R. Musa, M. Yu, K. Talamadupula, I. Abdelaziz, M. Chang, A. Fokoue, B. Makni, N. Mattei, and M. Witbrock, “Improving natural language inference using external knowledge in the science questions domain,” in AAAI, 2019, pp. 7208–7215.
- [245] Y. Sun, Q. Shi, L. Qi, and Y. Zhang, “JointLK: Joint reasoning with language models and knowledge graphs for commonsense question answering,” in NAACL, 2022, pp. 5049–5060.
- [246] X. Liu, H. Yu, H. Zhang, Y. Xu, X. Lei, H. Lai, Y. Gu, H. Ding, K. Men, K. Yang et al., “Agentbench: Evaluating llms as agents,” arXiv preprint arXiv:2308.03688, 2023.
- [247] Y. Wang, N. Lipka, R. A. Rossi, A. Siu, R. Zhang, and T. Derr, “Knowledge graph prompting for multi-document question answering,” arXiv preprint arXiv:2308.11730, 2023.
- [248] A. Zeng, M. Liu, R. Lu, B. Wang, X. Liu, Y. Dong, and J. Tang, “Agenttuning: Enabling generalized agent abilities for llms,” 2023.
- [249] W. Kryściński, B. McCann, C. Xiong, and R. Socher, “Evaluating the factual consistency of abstractive text summarization,” arXiv preprint arXiv:1910.12840, 2019.
- [250] Z. Ji, Z. Liu, N. Lee, T. Yu, B. Wilie, M. Zeng, and P. Fung, “RHO ($\rho$): Reducing hallucination in open-domain dialogues with knowledge grounding,” arXiv preprint arXiv:2212.01588, 2022.
- [251] S. Feng, V. Balachandran, Y. Bai, and Y. Tsvetkov, “Factkb: Generalizable factuality evaluation using language models enhanced with factual knowledge,” arXiv preprint arXiv:2305.08281, 2023.
- [252] Y. Yao, P. Wang, B. Tian, S. Cheng, Z. Li, S. Deng, H. Chen, and N. Zhang, “Editing large language models: Problems, methods, and opportunities,” arXiv preprint arXiv:2305.13172, 2023.
- [253] Z. Li, N. Zhang, Y. Yao, M. Wang, X. Chen, and H. Chen, “Unveiling the pitfalls of knowledge editing for large language models,” arXiv preprint arXiv:2310.02129, 2023.
- [254] R. Cohen, E. Biran, O. Yoran, A. Globerson, and M. Geva, “Evaluating the ripple effects of knowledge editing in language models,” arXiv preprint arXiv:2307.12976, 2023.
- [255] S. Diao, Z. Huang, R. Xu, X. Li, Y. Lin, X. Zhou, and T. Zhang, “Black-box prompt learning for pre-trained language models,” arXiv preprint arXiv:2201.08531, 2022.
- [256] T. Sun, Y. Shao, H. Qian, X. Huang, and X. Qiu, “Black-box tuning for language-model-as-a-service,” in International Conference on Machine Learning. PMLR, 2022, pp. 20 841–20 855.
- [257] X. Chen, A. Shrivastava, and A. Gupta, “NEIL: extracting visual knowledge from web data,” in IEEE International Conference on Computer Vision, ICCV 2013, Sydney, Australia, December 1-8, 2013, 2013, pp. 1409–1416.
- [258] M. Warren and P. J. Hayes, “Bounding ambiguity: Experiences with an image annotation system,” in Proceedings of the 1st Workshop on Subjectivity, Ambiguity and Disagreement in Crowdsourcing, ser. CEUR Workshop Proceedings, vol. 2276, 2018, pp. 41–54.
- [259] Z. Chen, Y. Huang, J. Chen, Y. Geng, Y. Fang, J. Z. Pan, N. Zhang, and W. Zhang, “Lako: Knowledge-driven visual question answering via late knowledge-to-text injection,” 2022.
- [260] R. Girdhar, A. El-Nouby, Z. Liu, M. Singh, K. V. Alwala, A. Joulin, and I. Misra, “Imagebind: One embedding space to bind them all,” in ICCV, 2023, pp. 15 180–15 190.
- [261] J. Zhang, Z. Yin, P. Chen, and S. Nichele, “Emotion recognition using multi-modal data and machine learning techniques: A tutorial and review,” Information Fusion, vol. 59, pp. 103–126, 2020.
- [262] H. Zhang, B. Wu, X. Yuan, S. Pan, H. Tong, and J. Pei, “Trustworthy graph neural networks: Aspects, methods and trends,” arXiv preprint arXiv:2205.07424, 2022.
- [263] T. Wu, M. Caccia, Z. Li, Y.-F. Li, G. Qi, and G. Haffari, “Pretrained language model in continual learning: A comparative study,” in ICLR, 2022.
- [264] X. L. Li, A. Kuncoro, J. Hoffmann, C. de Masson d’Autume, P. Blunsom, and A. Nematzadeh, “A systematic investigation of commonsense knowledge in large language models,” in Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022, pp. 11 838–11 855.
- [265] Y. Zheng, H. Y. Koh, J. Ju, A. T. Nguyen, L. T. May, G. I. Webb, and S. Pan, “Large language models for scientific synthesis, inference and explanation,” arXiv preprint arXiv:2310.07984, 2023.
- [266] B. Min, H. Ross, E. Sulem, A. P. B. Veyseh, T. H. Nguyen, O. Sainz, E. Agirre, I. Heintz, and D. Roth, “Recent advances in natural language processing via large pre-trained language models: A survey,” ACM Computing Surveys, vol. 56, no. 2, pp. 1–40, 2023.
- [267] J. Wei, M. Bosma, V. Zhao, K. Guu, A. W. Yu, B. Lester, N. Du, A. M. Dai, and Q. V. Le, “Finetuned language models are zero-shot learners,” in International Conference on Learning Representations, 2021.
- [268] Y. Zhang, Y. Li, L. Cui, D. Cai, L. Liu, T. Fu, X. Huang, E. Zhao, Y. Zhang, Y. Chen, L. Wang, A. T. Luu, W. Bi, F. Shi, and S. Shi, “Siren’s song in the ai ocean: A survey on hallucination in large language models,” arXiv preprint arXiv:2309.01219, 2023.
## Appendix A Pros and Cons for LLMs and KGs
In this section, we discuss the pros and cons of LLMs and KGs in detail, which are summarized in Fig. 1.
LLM pros.
- General Knowledge [11]: LLMs are pre-trained on large-scale corpora that contain a large amount of general knowledge, such as commonsense knowledge [264] and factual knowledge [14]. Such knowledge can be distilled from LLMs and applied to downstream tasks [265].
- Language Processing [12]: LLMs have shown great performance in understanding natural language [266]. Therefore, LLMs can be used in many natural language processing tasks, such as question answering [4], machine translation [5], and text generation [6].
- Generalizability [13]: LLMs exhibit strong generalizability and can be applied to various downstream tasks [267]. Given a few-shot examples [59] or after fine-tuning on multi-task data [3], LLMs achieve strong performance on many tasks.
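The few-shot mechanism behind this generalizability can be sketched as follows: the task is specified purely through in-context demonstrations, with no gradient updates to the model. The sentiment task, the example reviews, and the prompt format are illustrative assumptions, not tied to any particular LLM API.

```python
# Minimal sketch of few-shot prompting (assumed format, for illustration):
# labeled demonstrations are concatenated, followed by an unlabeled query,
# and the LLM is expected to continue the pattern.

def build_few_shot_prompt(examples, query):
    """Concatenate labeled demonstrations followed by an unlabeled query."""
    lines = []
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

examples = [
    ("The plot was gripping from start to finish.", "positive"),
    ("I walked out halfway through.", "negative"),
]
prompt = build_few_shot_prompt(examples, "A forgettable, by-the-numbers sequel.")
print(prompt)
```

The resulting string would then be sent to an LLM, whose completion after the final "Sentiment:" serves as the prediction; no task-specific training is involved.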
LLM cons.
- Implicit Knowledge [14]: LLMs represent knowledge implicitly in their parameters. It is difficult to interpret or validate the knowledge obtained by LLMs.
- Hallucination [15]: LLMs often hallucinate, generating content that is seemingly plausible but factually incorrect [268]. This problem greatly reduces the trustworthiness of LLMs in real-world scenarios.
- Indecisiveness [16]: LLMs perform reasoning by generating from a probability model, which is an indecisive process: the generated results are sampled from a probability distribution over tokens, so the same input can yield different outputs, and this sampling is difficult to control.
- Black-box [17]: LLMs are criticized for their lack of interpretability. It is unclear what specific patterns and functions LLMs use to arrive at predictions or decisions.
- Lacking Domain-specific/New Knowledge [18]: LLMs trained on general corpora may fail to generalize to specific domains or new knowledge, due to the lack of domain-specific knowledge or up-to-date training data.
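The indecisiveness point above can be illustrated with a toy next-token distribution; the vocabulary and logit values here are made up for illustration and do not come from any real model.

```python
import math
import random

# Toy illustration of indecisive generation: the next token is sampled from
# a probability distribution, so repeated runs can produce different outputs.

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["Paris", "London", "Berlin"]
logits = [2.0, 1.5, 0.5]          # hypothetical model scores for the next token
probs = softmax(logits)

rng = random.Random(0)
samples = [rng.choices(vocab, weights=probs)[0] for _ in range(20)]

# Greedy decoding (argmax) is deterministic but discards the rest of the
# distribution; sampling is stochastic and hard to control.
greedy = vocab[probs.index(max(probs))]
print(samples, greedy)
```

In contrast, as noted under "Decisiveness" below for KGs, a symbolic lookup over stored triples returns the same answer every time.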
KG pros.
- Structural Knowledge [19]: KGs store facts in a structural format (i.e., triples), which can be understood by both humans and machines.
- Accuracy [20]: Facts in KGs are usually manually curated or validated by experts, making them more accurate and dependable than the knowledge stored in LLMs.
- Decisiveness [21]: The factual knowledge in KGs is stored in a decisive manner. Reasoning algorithms over KGs are also deterministic, so they provide decisive results.
- Interpretability [22]: KGs are renowned for their symbolic reasoning ability, which provides an interpretable reasoning process that can be understood by humans.
- Domain-specific Knowledge [23]: In many domains, experts can construct KGs that provide precise and dependable domain-specific knowledge.
- Evolving Knowledge [24]: The facts in KGs are continuously evolving. The KGs can be updated with new facts by inserting new triples and deleting outdated ones.
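Several of the KG pros above (structural triples, decisive querying, and evolving knowledge through insertion and deletion) can be sketched with a minimal in-memory triple store. The `TripleStore` class and the example facts are hypothetical illustrations, not an existing library.

```python
# Minimal sketch of a KG as a set of (head, relation, tail) triples.

class TripleStore:
    def __init__(self):
        self.triples = set()

    def add(self, head, rel, tail):
        """Evolving knowledge: insert a new fact."""
        self.triples.add((head, rel, tail))

    def remove(self, head, rel, tail):
        """Evolving knowledge: delete an outdated fact."""
        self.triples.discard((head, rel, tail))

    def query(self, head=None, rel=None, tail=None):
        """Deterministic pattern matching, with None as a wildcard.
        The same query over the same triples always returns the same result."""
        return sorted(
            t for t in self.triples
            if (head is None or t[0] == head)
            and (rel is None or t[1] == rel)
            and (tail is None or t[2] == tail)
        )

kg = TripleStore()
kg.add("Einstein", "born_in", "Ulm")
kg.add("Einstein", "field", "physics")
print(kg.query(head="Einstein"))

# Update the KG by replacing an outdated fact with a more specific one.
kg.remove("Einstein", "field", "physics")
kg.add("Einstein", "field", "theoretical_physics")
```

Unlike sampling-based generation in LLMs, the query above is decisive and its results are directly traceable to stored triples, which is the basis of the interpretability claim.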
KG cons.
- Incompleteness [25]: KGs are hard to construct and often incomplete, which limits the ability of KGs to provide comprehensive knowledge.
- Lacking Language Understanding [33]: Most studies on KGs model the structure of knowledge but neglect the textual information in KGs. As a result, textual information is often underused in KG-related tasks, such as KG completion [26] and KGQA [43].
- Unseen Facts [27]: Real-world knowledge changes dynamically, which makes it difficult for KGs to model unseen entities and represent new facts.