2507.03811v1
# Leveraging Large Language Models for Tacit Knowledge Discovery in Organizational Contexts

*This work was funded by the authors' individual grants from Kunumi.*
**Authors**: Gianlucca Zuin (1,3), Saulo Mastelini (2,3), Túlio Loures (1,3), Adriano Veloso
> 1 Universidade Federal de Minas Gerais (UFMG), Brazil; 2 Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo (ICMC-USP), Brazil; 3 Kunumi, Brazil. Email: {gianlucca, saulo, tulio,
## Abstract
Documenting tacit knowledge in organizations can be a challenging task due to incomplete initial information, difficulty in identifying knowledgeable individuals, the interplay of formal hierarchies and informal networks, and the need to ask the right questions. To address this, we propose an agent-based framework leveraging large language models (LLMs) to iteratively reconstruct dataset descriptions through interactions with employees. Modeling knowledge dissemination as a Susceptible-Infectious (SI) process with waning infectivity, we conduct 864 simulations across various synthetic company structures and different dissemination parameters. Our results show that the agent achieves 94.9% full-knowledge recall, with self-critical feedback scores strongly correlating with external literature critic scores. We analyze how each simulation parameter affects the agent's knowledge retrieval process. In particular, we find that our approach is able to recover information without needing direct access to the sole domain specialist. These findings highlight the agent's ability to navigate organizational complexity and capture fragmented knowledge that would otherwise remain inaccessible.
Index Terms: Organizational Knowledge, Agent-based simulations, Large Language Models
©2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
## I Introduction
In today's fast-paced and knowledge-driven organizations, efficient management and sharing of knowledge remain challenging tasks. While explicit knowledge is easily documented, tacit knowledge, rooted in personal experiences and expertise, often proves difficult to systematically capture and disseminate. Sociological theories of organizational structure, including Max Weber's bureaucratic model [1] and Frederick Taylor's scientific management [2], provide valuable insights into how both formal and informal systems influence the flow of information. Weber emphasized the efficiency of rigid hierarchies in structuring authority and information flow, while Taylor focused on the benefits of standardizing knowledge to boost productivity. However, both perspectives highlight inherent challenges: rigid hierarchies can create bottlenecks, while more flexible systems risk fostering knowledge silos.
Organizational structures, whether hierarchical, flat, or matrixed, impact how knowledge is disseminated within a company. These structures define the pathways of interaction among employees and influence the accessibility and retention of knowledge. Research has shown that knowledge dissemination in cooperative learning networks resembles the spread of disease through a crowd [3, 4]. Thus, by drawing on sociology and network theory, we illustrate information flow within organizations using epidemic models. The works of Polanyi [5] and Nonaka and Takeuchi [6] further underline the complexity of capturing and sharing tacit knowledge. Polanyi's assertion that "we can know more than we can tell" underscores the implicit nature of such knowledge, while Nonaka and Takeuchi's SECI model demonstrates how organizations can convert tacit knowledge into explicit forms through socialization, externalization, combination, and internalization.
On the other hand, Large Language Models (LLMs) have demonstrated remarkable capabilities across a variety of tasks, including natural language understanding, translation, summarization, and creative writing [7]. Their proficiency in processing and generating human-like text has opened new possibilities in areas such as automated content creation, human-computer interaction, and educational technologies [8, 9]. In this study, we introduce a novel method that leverages an LLM-based agent to uncover and document tacit knowledge within organizational contexts. To validate our approach, we allow the agent to navigate virtual company hierarchies, engaging with simulated employees, also modeled as LLMs, to iteratively gather and refine details about the structure, purpose, and possible applications of a data table, particularly in the realm of machine learning tasks.
<details>
<summary>extracted/6597023/Figures/fluxo_conversation3.png Details</summary>

### Visual Description
## Diagram: AI Agent Conversation Loop for Data Description
### Overview
This image is a flowchart diagram illustrating a "Conversation Loop" process for an AI agent tasked with building a table description for a database table named `AutomotivePerformance`. The diagram shows a cyclical interaction between the AI agent, a user (Miguel Torres), and internal system processes like parsing, scoring, self-reflection, and decision-making. The primary language is English.
### Components/Axes
The diagram is structured with a main gray container labeled **"Conversation Loop"** on the right, and a feedback loop on the left.
**Main Components (within the gray "Conversation Loop" box):**
1. **Question Asking:** Represented by a robot icon with a magnifying glass.
2. **Employee:** Represented by a simple robot head icon.
3. **Description Parsing:** A dashed box containing parsed data.
4. **Scoring + Critic:** A dashed box containing a critique.
5. **Self-reflection:** A dashed box containing improvement suggestions.
6. **Flow Decision-making:** A dashed box containing a suggested action.
**Flow & Arrows:** Red arrows indicate the direction of the process flow, creating a loop.
**External Elements (Left Side):**
* A stack of documents icon labeled: "Chat history, current table description, critic with score and self-reflection suggestions are fed back to the agent when deciding the next question."
* Two speech bubbles representing the AI agent's communication with the user, Miguel Torres.
### Detailed Analysis
The process flow is as follows:
1. **Initiation (Top Left):** The AI agent initiates contact.
* **Speech Bubble Text:** "Hello Miguel Torres! I'm excited to work with you on building the table description for AutomotivePerformance. Let's get started!"
2. **Question Asking Phase:** The agent asks a detailed question.
* **Speech Bubble Text:** "Miguel, could you explain what each of the columns like 'VCLN', 'PMAX', 'FRF', 'UREV', and 'SYSTM' specifically represents in relation to automotive performance? Additionally, what types of data do you expect in each column (e.g., numeric, categorical)?"
3. **Employee Response:** The user (Miguel) responds, indicating a lack of knowledge.
* **Speech Bubble Text:** "I don't have specific knowledge about these columns. For detailed insights on those variables, I recommend reaching out to Sophia Chen, as she has valuable information regarding that table."
4. **Description Parsing:** The system parses the current, incomplete description of the table.
* **Parsed Content:**
```
AutomotivePerformance Columns:
- VCLN: Unknown data type and purpose (no specific information provided)
- PMAX: Unknown data type and purpose (no specific information provided)
- FRF: Unknown data type and purpose (no specific information provided)
...
```
5. **Scoring + Critic:** The parsed description is evaluated.
* **Critic Text:**
```
Critic: Score: 2
The description of the AutomotivePerformance table is significantly lacking in detail. None of the columns have their data types or purposes specified, which is critical information for understanding the table's structure and content. Without this information, it is impossible to assess how the data can be used or what it represents. The mention of potentially obtaining further insights from Sophia Chen does not compensate for the lack of information provided in the description.
```
6. **Self-reflection:** Based on the critic, the system generates improvement suggestions.
* **Suggestion Text:**
```
To improve the description, the following elements should be added:
1. Clearly define the data types for each column (e.g., integer, float).
2. Provide a brief explanation of the purpose of each column, including what data it is expected to hold.
3. Include any relationships with other tables, if applicable, to give context on how this table fits into a larger database schema.
4. Mention any constraints, such as primary keys, foreign keys, or unique constraints, if they exist.
5. Add example values for each column to illustrate the kind of data that will be present.
```
7. **Flow Decision-making:** The system decides the next action based on all prior context.
* **Decision Text:**
```
Suggested Action: switch
Suggested User: Sophia Chen
Reasoning: The current user, Miguel Torres, has indicated that he does not have specific knowledge about the requested columns in the AutomotivePerformance table and has recommended reaching out to Sophia Chen for detailed insights. This suggests that Sophia is likely to have the relevant information needed.
```
8. **Agent Action & Loop Closure (Bottom Left):** The agent acts on the decision and communicates back to the user, closing the loop.
* **Speech Bubble Text:** "Thank you, Miguel Torres, for your help so far! I appreciate your guidance and will now reach out to Sophia Chen for further insights on the AutomotivePerformance table."
* An arrow from this speech bubble points back to the initial document stack, indicating the chat history is updated and fed back into the system for the next cycle.
### Key Observations
* **Iterative Process:** The diagram explicitly shows a closed-loop, iterative refinement process for generating technical documentation.
* **Multi-Agent Collaboration:** The workflow involves switching between different human experts (Miguel Torres -> Sophia Chen) to gather information.
* **Internal Critique & Self-Improvement:** The system has built-in mechanisms (Scoring + Critic, Self-reflection) to evaluate its own output and generate specific, actionable steps for improvement.
* **Context Preservation:** The "Chat history..." block emphasizes that all prior interactions and system states are retained to inform future decisions.
* **Low Initial Score:** The initial description receives a very low score (2), highlighting the system's ability to recognize poor-quality output.
### Interpretation
This diagram models a sophisticated, self-correcting AI agent designed for collaborative knowledge extraction and documentation. It demonstrates a Peircean investigative approach:
1. **Abduction (Inference to the Best Explanation):** The "Flow Decision-making" component abduces that since Miguel lacks knowledge and suggests Sophia, Sophia is the best source of information. The action "switch" is the inferred best next step.
2. **Deduction:** The "Self-reflection" component deduces the specific requirements (data types, purposes, constraints) that a complete table description must have, based on the general goal of creating useful documentation.
3. **Induction:** The "Scoring + Critic" component induces that the current description is poor based on the observed absence of critical fields, leading to a low score.
The process underscores that effective technical documentation is not a one-shot task but a dialogue. The AI acts as a facilitator, identifying knowledge gaps, routing queries to the correct experts, and iteratively refining the output based on structured feedback. The anomaly is the initial, severely deficient description, which serves as the catalyst for the entire corrective loop. The system's value lies not in having all answers initially, but in its structured process to find them through collaboration and self-assessment.
</details>
Figure 1: The agent iteratively builds knowledge and decides on its next course of action as it interacts with company employees in a conversation loop.
Our simulation-based approach reflects the complexities of real-world organizations. Employees modeled as LLMs interact with the agent, responding to inquiries, redirecting questions, or providing insights on specific aspects of the data. After each interaction, the agent assesses its progress, generating critiques and suggestions for improvement before determining the next steps, as illustrated in Figure 1. This iterative process not only addresses knowledge gaps but also simulates human-like reasoning [10, 11], allowing the agent to adapt to various organizational structures and communication styles. Rather than relying on a human to locate the appropriate domain specialist within an organization, an often difficult and time-consuming task, our method facilitates the crowdsourcing of this process using LLM agents, creating a comprehensive repository of organizational knowledge.
To evaluate the effectiveness of our method, we conducted 864 simulations across diverse synthetic organizational structures, knowledge dissemination strategies, and data configurations. Our LLM agent achieved a success rate of 94.9% in acquiring complete knowledge about the queried tables. These simulations encompassed over 300,000 interactions, including self-reasoning steps, producing a dataset exceeding 45 million words. This dataset serves as a valuable resource for advancing research into LLM-agent interactions and the relationship between organizational structures and knowledge management practices. For access to the full simulation chat log and prompts, see the Code and Data Availability statement.
## II Related Work
Max Weber's seminal work on social and economic organization [1] offers key insights into the functioning of bureaucratic and hierarchical structures. A central focus of Weber's work is his concept of authority, which includes traditional authority, rooted in customs, and rational-legal authority, based on formal rules and laws. Both of these forms of authority are highly relevant to companies and economic organizations. Weber highlights bureaucracy as the primary model of rational-legal authority, emphasizing its efficiency, hierarchy, specialization, and reliance on impersonal procedures. Weber's concept of Verstehen (German for "interpretive understanding") stresses the need to understand social actions through the subjective meanings individuals give to their behaviors. Additionally, his use of "ideal types" of behavior as a methodological tool helps analyze many social and economic phenomena. These theories form the basis for understanding the formal mechanisms that govern information flow within organizations. Many other works in organizational theory build upon Weber's ideas to explore additional aspects of organizational behavior, power dynamics, and decision-making processes. For instance, Peter M. Blau's theories of social exchange [12] emphasize the informal dynamics of interaction, where reciprocity and trust become critical in sharing tacit knowledge. Blau's analysis reveals how interpersonal relationships complement formal structures, resulting in collaboration and knowledge transfer.
Diefenbach and Sillince [13] discuss the interplay between formal and informal hierarchies, which is crucial for understanding how knowledge propagates. They highlight that formal hierarchies are founded on clearly delineated roles and command chains, whereas informal hierarchies emerge from social interactions and shared norms. Hedlund [14] introduced the concept of heterarchies, challenging traditional hierarchical assumptions and providing a framework to analyze knowledge flow in multinational corporations. This perspective aligns with the need to model both structured and unstructured interactions in simulations. Mihm et al. [15] explored hierarchical structures and search processes, highlighting the challenges of navigating complex organizational networks. Their study employs simulations to reveal how hierarchy impacts solution stability, quality, and speed in problem-solving, showcasing the importance of adapting organizational structures to meet search requirements effectively.
Incorporating these social theory perspectives provides a realistic foundation for constructing synthetic company hierarchies that reflect both structured and dynamic relationships. As such, the parameter choices in our simulations are heavily influenced by these foundational works, from Weber's concept of bureaucratic organizations to Blau's and Diefenbach and Sillince's emphasis on informal connections. These theoretical frameworks not only ground our models in established sociological principles, but also complement the emerging computational approaches to simulating human behavior.
Recent work demonstrates LLMs' ability to model complex social systems in realistic simulations. Park et al. [16] proposed generative agents that simulate human behavior through dynamic memory mechanisms, enabling decision-making in a sandbox setting inspired by the game The Sims. These agents autonomously perform tasks, establish social relationships, and adapt behaviors through experiential learning. Dai et al. [17] investigated LLM-driven social evolution under Hobbesian Social Contract Theory, illustrating LLMs' capacity to simulate foundational theories of societal dynamics and demonstrating agents transitioning from conflict in a "state of nature" to cooperative social orders via emergent contractual agreements. Qian et al. [18] introduced ChatDev, a framework where role-based LLM agents collaborate through structured dialogues to complete software development phases. Collectively, these studies establish LLMs as tools for emulating social behaviors across individual, organizational, and societal scales. However, to our knowledge, no previous literature proposes a method for simulating the human behaviors of interest to this work in synthetic organizational structures, namely, knowledge extraction processes.
Regarding LLMs themselves, advances in prompting strategies have been shown to enhance their reasoning capabilities. Prompt-chains [19] iteratively refine solutions through task-specific prompts, while least-to-most prompting [20] decomposes problems into sequential subproblems. Chain-of-thought prompting [21] improves logical inference by generating intermediate reasoning steps. These divide-and-conquer approaches increase precision and reliability, particularly in iterative problem-solving contexts. Further, techniques that introduce reasoning into LLMs have also been shown to improve results. The self-reflection paradigm [11] enables LLMs to evaluate and critique their own reasoning processes. Similarly, self-critique and self-scoring mechanisms [16] have proven effective in assessing the quality of information generated by LLMs, demonstrating the advantages of feedback and iterative refinement. Building on these strategies, our approach develops a robust agent system to extract tacit knowledge from human specialists, consolidate it into documents, and validate its efficacy. These works also show how LLM-based simulations closely mimic human interactions, making them a suitable surrogate for testing and refining such methods.
## III Proposed Approach
The challenge of documenting tacit knowledge within an organization requires a process that accounts for both incomplete initial knowledge and the dynamic nature of organizational networks. In our proposed approach, we introduce an agent-based framework designed to gather and document knowledge about dataset tables through iterative interactions with employees. The agent starts with a limited understanding and progressively refines its knowledge by engaging with the organization. However, real-world case studies can be difficult to obtain. To evaluate the effectiveness of this methodology, we employ simulations of organizational knowledge dissemination, modeling both formal and informal networks. In this section, we describe the key aspects of our agent design and the simulations conducted to assess its performance.
### III-A Prompt-chaining to build an Agent
Our approach employs a prompt-chaining technique to develop an LLM-based agent for documenting tacit knowledge. Specifically, we wish to build a description for a table known to the organization. The agent operates in a partially observable environment. While such an environment holds access to the current table description (the true knowledge $k^*$), each employee's expertise and network connections are uncertain. The agent does, however, know a subset of the network: the company's hierarchical structure. This scenario can be modeled using a framework inspired by a Markov Decision Process (MDP). Briefly, it consists of:
- Knowledge States ($K$): Represent the agent's current understanding of the knowledge domain.
- Actions ($A$): The set of actions (prompts) available to the agent.
- Transition Model ($P(k^\prime|k,a)$): Probabilistic changes in the knowledge state caused by the agent's actions and by changes in the environment.
- Reward Function ($R(k,a)$): Measures the completeness and accuracy of the knowledge state given the actions taken.
Formally, let $K$ represent the set of possible states that define the agent's current knowledge level, and $A$ represent the set of actions or prompts the agent can take. The transition probability, $P(k^\prime|k,a)$, defines the likelihood of transitioning from state $k$ to state $k^\prime$ after taking action $a$. The reward function, $R(k,a)$, quantifies the benefit of the action in a particular state, reflecting the knowledge acquired. The process terminates when the agent reaches a terminal state $k_t$, where all relevant knowledge about the proposed application has been acquired. This is measured by $R(k_t,a) \geq \Delta$, where $\Delta$ represents the minimum score for a suitable table description. The agent's goal is to maximize the accumulated reward by formulating an optimal policy $\pi^*$, which guides its decisions implicitly.
The agent begins with a basic understanding $k_0$ of the table or organizational knowledge, knowing only the table's name and columns. As the agent interacts with employees, it refines its state $k$ by incorporating new information and identifying gaps. The actions ($A$) involve generating questions aimed at filling these gaps. The agent uses its current state to formulate questions that address specific gaps in its understanding. Additionally, the agent's self-critical assessments, which include quality scores, explanations, and suggestions for improvement, guide it in this task. The Transition Model ($P(k^\prime|k,a)$) accounts for the probabilistic nature of state transitions after an action is taken. Due to the variability in LLM outputs and employee responses, these transitions are non-deterministic. The LLM manages these uncertainties by dynamically adjusting itself and its strategies in response to the feedback it receives. The Reward Function ($R(k,a)$) evaluates actions, measuring alignment between the agent's understanding and the expected complete knowledge.
While these components are not explicitly modeled in the agent framework, they describe how the LLM organizes and selects prompts. The key goal is to maximize the accumulated reward, which helps guide the agent toward more effective knowledge acquisition. The agent interacts with employees based on their position in the hierarchy, structured as a stack, and prefers to start with those at lower levels to avoid burdening their superiors. However, if an employee is mentioned during a conversation - for instance, someone who likely holds relevant information about the table - the agent updates its knowledge of the company structure and moves this person to the top of the stack. Once someone with at least partial knowledge is found, the agent can move from one recommended employee to another, severely trimming the search space. The entire process is structured as a sequence of prompts and tasks:
1. Greet the employee to establish rapport;
2. Formulate a question about the data table;
3. Process the employee's response and update the agent's internal knowledge, building a new table description;
4. Critique the newly updated description by:
   1. scoring it to evaluate its quality;
   2. criticizing it to identify gaps or inconsistencies;
   3. suggesting areas for improvement;
5. Decide whether to continue the conversation with the current employee or switch to another.
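The steps above can be sketched as a single loop over a stack of employees. The following is a minimal illustration, not the paper's implementation: `ask` and `critique` are hypothetical stand-ins for the LLM calls (question generation plus employee response, and the self-critic prompt, respectively), and the referral handling mirrors the stack update described in the text.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    description: str = ""   # current table description (the knowledge state k)
    score: int = 0          # latest self-critic score
    history: list = field(default_factory=list)

def conversation_loop(employees, ask, critique, threshold=8, max_turns=20):
    """Prompt-chain sketch: question -> parse -> critique -> decide.

    `employees` acts as a stack with lower-level staff at the top (end of
    the list); `ask(employee, state)` returns (answer, referral_or_None)
    and `critique(description)` returns (score, feedback).
    """
    state = AgentState()
    stack = list(employees)
    for _ in range(max_turns):
        if not stack:
            break
        current = stack[-1]
        answer, referral = ask(current, state)               # steps 1-2: greet + question
        state.description = (state.description + " " + answer).strip()  # step 3
        state.score, feedback = critique(state.description)  # step 4: self-critique
        state.history.append((current, answer, feedback))
        if state.score >= threshold:                         # terminal state reached
            break
        if referral is not None and referral != current:     # step 5: switch
            if referral in stack:
                stack.remove(referral)
            stack.append(referral)                           # referral moves to top
    return state
```

With stubbed responses, a referral from one employee to another (as in Figure 1's Miguel-to-Sophia handoff) redirects the next question without exhausting the rest of the stack.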
A core aspect of this process is the self-critical feedback loop. After each interaction, the agent integrates the updated description, critique, score, suggestions, and chat history as inputs for the next question-generation step. These inputs, combined with the agent's internal knowledge, guide the formulation of the next question, focusing on areas in which the understanding is lacking. This step involves using a meta-prompt to assess the completeness and relevance of the acquired information and evaluate its current knowledge state critically. Figure 2 illustrates the prompt used during this self-critical step. The agent then self-reflects over the generated critique in order to evaluate the next course of action.
Self-critic prompt
You will review the table description and assess how complete it is. Here's how to give feedback:
- Score: Rate the quality of the description from 0 to 10, with 10 being excellent.
- Reasoning: Explain why you gave this score, including any areas for improvement.
- Suggestions: If the description score is below 8, suggest what can be added or refined to improve it.

Aim for a balanced approach: emphasize what's good, but point out areas where we can improve. Focus on understanding the table deeply and capturing its essence in the description. Do not use * or other characters. Use the exact same format as the one above. To reach a score of 5, we should know mostly everything about all columns. This includes column names, their meaning, and data types. If we also know some example values, then we can reach even a 6 or a 7. To reach a score of 8, we should also have tacit knowledge: good variables to use, meaningful interactions, and so on. Guesswork is not allowed. If the description's author mentions that they are not very certain about something, i.e., they are guessing what a column (or columns) mean, the score should be automatically lower. Be critical and do not be lenient. To reach a high score, a description should be really good and complete. Not knowing all the data types of the columns immediately implies a low score.
Task: Critique the completeness of the description.
Current table description
Figure 2: Meta-prompt used in the self-critical stage. The agent critically assesses the current table description while providing suggestions on aspects to improve.
### III-B Simulating company structures
We simulate the dissemination of tacit knowledge across a synthetic company network using epidemic models [22]. Specifically, we employ the Susceptible-Infectious (SI) model with waning infectivity [23, 24]. In this context, knowledge about a dataset table is treated as a collection of "diseases", where each column of the table represents a distinct "fact" (a unit of information). The dynamics of the SI model in this framework are defined as follows:
- S (Susceptible): Individuals who are unaware of a specific fact in the table.
- I (Infectious): Individuals who know a fact and can share it. The likelihood of sharing decreases with time.
The dissemination process begins with a "patient zero", an individual who possesses complete knowledge (i.e., all facts corresponding to all columns of the table). Knowledge spreads through the organizational network, which consists of two components. The Hierarchy Network represents the formal organizational structure, explicitly known to the LLM agent; the Relationship Network is a hidden network of informal connections (e.g., frequent interactions among colleagues) that the agent infers through its interactions. Intuitively, employees who work on the same team know each other, which makes the Hierarchy Network a sub-graph of the more complete Relationship Network.
Before allowing the LLM to traverse the network, we disseminate knowledge using the epidemic model across the complete Relationship Network. The key feature of this model is the waning infectivity, where an individual's ability to share knowledge diminishes over time. To simulate this, the time-dependent transmission rate $\beta(t)$ is modeled as:
$$
\beta(t) = \beta_0 e^{-\gamma t} \tag{1}
$$
where $\beta_0$ is the initial transmission rate, and $\gamma$ determines how quickly the ability to share knowledge wanes. This SI model is used to simulate the uneven distribution of knowledge within the relationship network. Since the ability to transmit knowledge quickly wanes, over time some employees will know specific facts about the dataset table (i.e., become "infected"), while others will remain unaware. As such, we consider both:
- Knowledge Transfer: Susceptible individuals $S$ become informed about a fact at a rate proportional to $\beta(t)SI$, where $\beta(t)$ decays over time.
- Knowledge Loss: Infected individuals gradually lose their ability to spread a specific fact as $\beta(t)$ decreases.
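A minimal sketch of this dissemination process, under illustrative assumptions: each fact (column) spreads independently over an undirected network, and each knower's transmission probability decays per Eq. (1) with the time elapsed since they learned the fact. Parameter names (`beta0`, `gamma`) follow the equation, not necessarily the paper's implementation.

```python
import math
import random

def simulate_si(network, patient_zero, n_facts,
                beta0=0.5, gamma=0.8, steps=20, rng=None):
    """Spread n_facts independent 'diseases' (facts) over a network.

    `network` maps each person to a list of neighbours. A person who
    learned a fact at time t0 transmits it at step t with probability
    beta0 * exp(-gamma * (t - t0)), i.e., waning infectivity.
    """
    rng = rng or random.Random(0)
    # infected_at[fact][person] = step at which the person learned the fact
    infected_at = [{patient_zero: 0} for _ in range(n_facts)]
    for t in range(1, steps + 1):
        for known in infected_at:
            newly = {}
            for person, t0 in known.items():
                beta = beta0 * math.exp(-gamma * (t - t0))  # waning infectivity
                for neighbour in network[person]:
                    if neighbour not in known and neighbour not in newly:
                        if rng.random() < beta:
                            newly[neighbour] = t
            known.update(newly)
    return infected_at
```

After convergence, each person holds only the subset of facts that reached them before infectivity decayed, which is exactly the fragmented state the agent must later reassemble.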
The agentâs task is to traverse this network after convergence using the initially incomplete knowledge about the evaluated table and employee relationships, hold conversations, and reconstruct the full table documentation by piecing together the facts that have spread across different individuals. To simulate diverse organizational structures, we define key parameters that allow us to model companies ranging from startups to large multinational corporations, from highly formal bureaucracies to agile, informal networks.
- Max Hierarchical Depth: Shallow hierarchies represent organizations inspired by Taylor's principles of functional management, emphasizing task specialization, direct oversight, and higher interconnectivity among employees. These structures are often seen in manufacturing setups or flat startups. Deep hierarchies reflect Weber's bureaucratic organization theory, characterized by formal authority, strict adherence to rules, and clear chains of command. These structures reduce interconnectivity, with knowledge flow restricted to well-defined channels. By adjusting the hierarchical depth, we can simulate a spectrum of companies, from flat startups fostering innovation to rigid bureaucracies prioritizing stability and control.
- Number of Employees: This parameter spans small teams to large multinational corporations. Smaller organizations model the dynamics of startups, where knowledge is often centralized or informally shared. Larger organizations allow us to explore the complexities of scale, including silos, multi-layered decision-making, and formalized processes. We assume balanced tree-like structures as the hierarchy. Thus, we infer the branching factor $b_d$ (that is, the number of people working under each boss) as:
$$
b_d \approx \sqrt[\text{hierarchy depth}]{\text{number of employees}} \tag{2}
$$
- Alpha and Decay: These parameters govern how knowledge spreads and wanes in the relationship network. Slow decay mimics organizations with stable knowledge-sharing practices. High decay represents environments where knowledge quickly becomes obsolete, such as fast-paced industries or competitive workplaces. In our simulation, decay is directly tied to how far the knowledge can spread from the initial patient zero, while alpha governs the probability of knowledge being shared.
- Number of Informal Connections: Informal connections mimic agile or networked organizations with significant ad-hoc collaboration, enabling knowledge flow outside formal channels. A low number of informal connections represents traditional, hierarchical organizations, where interactions align strictly with reporting structures. By varying the balance between formal and informal connections, we can simulate environments from rigid bureaucracies to dynamic, innovation-driven teams.
These parameters enable the simulation of various company types, including startups (flat, highly interconnected structures with informal knowledge-sharing dynamics), large multinationals (deep hierarchies, rigid formal connections, and slow diffusion of knowledge), or even agile organizations (high mean degrees, shallow hierarchies, and dynamic knowledge-sharing networks). As per Max Weber's social theory, these simulations capture the dichotomy between traditional bureaucracies (emphasizing stability and control) and more modern, flexible organizations.
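A sketch of how these parameters could combine into one synthetic company: a roughly balanced reporting tree (the formal hierarchy) plus a budget of random informal edges. The function and its details are our illustration under stated assumptions, not the authors' generator; in particular, we assume the informal-connection parameter is a per-employee multiplier.

```python
import random

def build_company(num_employees, max_depth, informal_per_employee, seed=0):
    """Illustrative synthetic company: a balanced tree for the formal
    hierarchy and a set of random undirected informal edges."""
    rng = random.Random(seed)
    b = max(1, round(num_employees ** (1.0 / max_depth)))  # Eq. 2
    parent = {}      # formal hierarchy: employee id -> boss id
    frontier = [0]   # current level of the tree; employee 0 is the root
    next_id = 1
    while next_id < num_employees:
        new_frontier = []
        for boss in frontier:
            for _ in range(b):
                if next_id >= num_employees:
                    break
                parent[next_id] = boss
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier or frontier
    # Informal edges: assumed to scale with headcount.
    n_informal = int(informal_per_employee * num_employees)
    informal = set()
    while len(informal) < n_informal:
        u, v = rng.sample(range(num_employees), 2)
        informal.add((min(u, v), max(u, v)))
    return parent, informal
```

For the Figure 3 setting (25 employees, depth 4, 2.5 informal connections per employee), this yields a 25-node tree and 62 informal edges.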
## IV Experiments
To evaluate the effectiveness of our approach, we conducted experiments with various configurations of company structures, knowledge, and dissemination rates. Figure 3 illustrates an example of a synthetic organization. For each of the parameters discussed, we selected a range of possible values and iterated over all possible combinations. We assess the statistical significance of our measurements through a pairwise t-test with p-value $\leq 0.05$ over 3 repetitions of each simulated environment and agent interaction. In particular, we consider:
- Hierarchical Depth: 2, 5, 10, 20 (from shallow to deep)
- Number of Employees: 20, 75, 200 (small, medium, and large organizations)
- Alpha: 0.1, 0.5
- Decay: 0.5, 0.8
- Number of Informal Connections: 0, 2.5, 5 (from formal to heavily informal)
- Number of Table Columns: 5, 20
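Enumerating this grid recovers the simulation count reported in the abstract: $4 \times 3 \times 2 \times 2 \times 3 \times 2 = 288$ configurations, each run 3 times, for 864 simulations. A sketch:

```python
import itertools

# Parameter grid from the list above (dictionary keys are our labels).
grid = {
    "hierarchical_depth": [2, 5, 10, 20],
    "num_employees": [20, 75, 200],
    "alpha": [0.1, 0.5],
    "decay": [0.5, 0.8],
    "informal_connections": [0, 2.5, 5],
    "num_columns": [5, 20],
}
configs = list(itertools.product(*grid.values()))
print(len(configs), len(configs) * 3)  # 288 configurations, 864 runs
```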
[Figure 3a: a 25-node hierarchy tree rooted at employee 0; node shading from light peach to deep maroon encodes each employee's knowledge level, with the darkest nodes concentrated under node 1.]
(a) Hierarchy and knowledge levels.
[Figure 3b: the corresponding relationship graph over the same 25 employees, arranged in four layers (node 0; nodes 1-3; nodes 4-12; nodes 13-24), with dense top-down connections between adjacent layers.]
(b) Relationships graph.
Figure 3: A synthetic organization with 25 employees, 4 hierarchical steps, and $2.5\times$ informal connections. Red shades indicate the amount of knowledge (e.g., number of columns known) that an employee holds. Decay 0.8, alpha 0.1.
For each repetition, a table subject was randomly selected from a predefined list, which was then used to generate the simulated complete knowledge $k^*$ through an LLM. This includes the overall table description, each column (with its respective type, meaning, and examples), as well as possible variable relations. The possible table subjects for the generated knowledge include: aerospace, agriculture, automotive, business, construction, finances, food service, healthcare, machinery, mining, oil, packaging, energy, retail, sports, transportation, tourism, and tech. These subjects were selected following common industry sectors, as defined both in the literature [25, 26, 27] and by the International Labour Organization (https://www.ilo.org/industries-and-sectors).
Given a parameter configuration, we then build the respective company structure, including the Hierarchy Network and the Relationship Network. Next, the individual facts from the original knowledge $k^*$ are disseminated through the constructed company network using the SI epidemic model. These steps follow the methodology described in Section III-B. We also generate a textual background description for each employee, according to a job role as defined by the hierarchy and a randomized personality archetype, as well as a description of the information relevant to the problem they have access to: what partial knowledge of the table the individual has, their connections in the company's network, and what partial knowledge they are aware their connections have (determined probabilistically). These backgrounds are intended to make responses more human-like and also serve as context to answer the main agent's questions during the simulation. Finally, using the built structures and generated descriptions, we begin the conversation as described in Section III-A by assigning the questioning agent a random starting employee from the bottom of the hierarchy. All experiments were conducted on OpenAI's GPT-4o mini model.
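Section III-B specifies the SI spread with waning infectivity only at the level summarized here, so the sketch below is one plausible reading (each fact passes along an edge with probability alpha scaled down by decay per hop from patient zero), not the authors' exact procedure:

```python
import random
from collections import deque

def disseminate(facts, neighbors, patient_zero, alpha, decay, seed=0):
    """Illustrative SI spread with waning infectivity: each fact starts
    at patient zero and crosses an edge with probability
    alpha * decay**hops, so transmissibility fades with distance."""
    rng = random.Random(seed)
    knowledge = {patient_zero: set(facts)}  # employee -> known facts
    for fact in facts:
        queue = deque([(patient_zero, 0)])  # (employee, hops from p0)
        seen = {patient_zero}
        while queue:
            person, hops = queue.popleft()
            for other in neighbors[person]:
                if other in seen:
                    continue
                if rng.random() < alpha * decay ** hops:
                    knowledge.setdefault(other, set()).add(fact)
                    seen.add(other)
                    queue.append((other, hops + 1))
    return knowledge
```

With alpha near 1 and slow decay, knowledge saturates the network; with small alpha or fast decay it stays fragmented near patient zero, matching the qualitative behavior described above.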
## V Discussion
After conducting repeated conversational experiments with varying fictional company structures, we summarized the conversational statistics by averaging the results for each company structural parameter. We collected the self-critical score values generated during the conversations and other aspects that describe how the agent navigated the companies' internal structures. In Table I, Len(path) represents the average number of non-unique fictitious employees contacted by the agent during inquiries about the data. In some cases, the agent contacted certain employees multiple times, effectively making these simulated individuals act as hubs within the conversational graph. We measured the average number of hubs (# Hubs) observed during the simulations. Additionally, we calculated the minimum distance between the first person contacted by the agent and "patient zero", who possesses all the knowledge about the table's columns. In the table, this metric is labeled Min. dist. and was obtained using the Dijkstra algorithm [28] applied to the relationship graph of each simulated company. We measured the percentage of times the conversational agent reached "patient zero" (% p0) and report the average and final self-critical scores produced.
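The Min. dist. metric relies on Dijkstra's algorithm [28]; a standard heapq-based sketch over an adjacency list (with the relationship graph's edges treated as unit weight this reduces to breadth-first search, but the general weighted form is shown):

```python
import heapq

def min_distance(adj, source, target):
    """Dijkstra's algorithm: adj maps a node to [(neighbor, weight), ...].
    Returns the shortest distance from source to target, or
    float('inf') if the target is unreachable."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, already relaxed
        for nxt, w in adj.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return float("inf")
```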
TABLE I: Overview of statistics grouped and averaged for different simulated companies' parameters. NIC: number of informal connections; MHD: maximum hierarchical depth.
| Parameter | Value | Len(path) | # Hubs | Min. dist. | % p0 | Avg. score | Final score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Alpha | 0.1 | 31.6 | 1.26 | 3.94 | 0.38 | 5.49 | 6.79 |
| | 0.5 | 16.0 | 0.59 | 3.95 | 0.22 | 6.34 | 8.07 |
| NIC | 0.0 | 36.34 | 1.32 | 5.89 | 0.41 | 5.35 | 6.79 |
| | 2.5 | 19.69 | 0.82 | 3.06 | 0.26 | 6.1 | 7.66 |
| | 5.0 | 15.38 | 0.65 | 2.89 | 0.23 | 6.3 | 7.84 |
| Decay | 0.5 | 27.81 | 1.06 | 3.91 | 0.34 | 5.67 | 7.2 |
| | 0.8 | 19.8 | 0.79 | 3.99 | 0.25 | 6.16 | 7.66 |
| MHD | 2 | 13.81 | 0.65 | 2.72 | 0.24 | 6.37 | 7.97 |
| | 5 | 25.0 | 1.01 | 4.01 | 0.32 | 5.8 | 7.36 |
| | 10 | 27.85 | 1.06 | 4.39 | 0.29 | 5.74 | 7.19 |
| | 20 | 28.56 | 0.98 | 4.66 | 0.33 | 5.75 | 7.21 |
| # Employees | 20 | 8.66 | 0.6 | 2.77 | 0.48 | 6.06 | 7.57 |
| | 75 | 26.67 | 1.08 | 4.05 | 0.28 | 5.94 | 7.59 |
| | 200 | 36.09 | 1.1 | 5.02 | 0.14 | 5.74 | 7.13 |
| # Features | 5 | 25.13 | 0.91 | 4.01 | 0.28 | 6.2 | 7.6 |
| | 20 | 22.48 | 0.95 | 3.88 | 0.31 | 5.64 | 7.26 |
Regarding the parameterization of the SI model, we observe that the average conversational path length tends to decrease as the alpha value increases, while the description scores tend to improve. Analogously, the number of non-unique employees contacted by the conversational agent decreases as the decay of knowledge transmission increases. The same decrease is observed for the number of hubs within the organization. These observations suggest that knowledge is more evenly distributed within the companies' structure, which aligns with our expectations. There does not appear to be a clear relationship between changes in the alpha value and the minimum distance to patient zero. However, patient zero becomes less likely to be reached as the alpha value increases and the agent has more options to explore before reaching the employee holding all the information about a table. Additionally, the number of hubs decreases with an increase in alpha.
As expected, the average path length and the distance to "patient zero" increase as the maximum hierarchy depth increases. This indicates that the agent needs to contact more employees to retrieve the desired knowledge. Additionally, the overall description score decreases slightly with an increase in maximum hierarchy depth, hinting at knowledge fragmentation within the company structure. On the other hand, the score values remain relatively stable despite changes in the number of employees, highlighting the robustness of the description extraction procedure regardless of company size.
We likewise observe that a higher Number of Informal Connections (NIC) corresponds to a higher self-critical score obtained at the end of the simulations. Additionally, as the NIC increases in the simulated company structure, the number of conversational hubs and the overall conversation path length decrease. Similarly, the minimum distance between the starting employee and patient zero decreases as the number of connections increases. These results are expected since a higher NIC increases the likelihood of quickly reaching employees who hold specific information about the data at hand. The number of times the employee with all the information is contacted decreases with a higher NIC, suggesting that information is more likely to be distributed among employees when they are more interconnected. However, this observation, constrained by our simulation setup, only accounts for the target knowledge sought by the conversational agent. In real-world scenarios, informal connections are more likely to imply shared informal knowledge rather than business knowledge.
Apart from a slight decrease in score quality as the number of features increases, varying the number of features does not strongly correlate with other experimental parameters. This is expected, as these parameters primarily influence the companiesâ structure rather than the knowledge retrieval task.
After analyzing the simulated conversations through which the agents reconstructed the original knowledge, we also evaluated the quality of the reproduced table descriptions that were generated at the end of each experiment run. This helps us to consider whether the proposed methodology can effectively achieve its desired goal of recovering the disseminated knowledge. For that purpose, we compare the final description $k_t$ generated by the agent with the original knowledge description $k^*$ for that table according to the following metrics:
#### Full-knowledge recall
We first checked if the agent was able to generate a table description that included a mention of every column originally disseminated in the companyâs network. An agent is considered to have achieved full-knowledge recall in an experiment run if it succeeded in at least reproducing all table columns in its final report.
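The check above can be sketched as follows; the substring-matching rule is our simplification, since the paper does not specify how a "mention" of a column is detected:

```python
def full_knowledge_recall(final_description: str, column_names) -> bool:
    """A run counts as full-knowledge recall if every original column
    is at least mentioned in the agent's final report. Here we use a
    case-insensitive substring match as an illustrative rule."""
    text = final_description.lower()
    return all(col.lower() in text for col in column_names)
```

The reported 94.9% figure is then just the fraction of runs for which this predicate holds.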
#### METEOR
We also measure the METEOR score [29] of the descriptions for each column in the table. The score creates word alignments between the candidate and reference texts, using not only exact matches but also stems and WordNet synonyms, and calculates the harmonic combination of precision and recall for those alignments. The METEOR scores for all columns are averaged to arrive at an overall score, cMETEOR, for the generated description $k_t$.
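The nltk library ships a full implementation (nltk.translate.meteor_score). For intuition only, a stripped-down sketch using exact unigram matches and METEOR's recall-weighted harmonic mean, omitting the stem/synonym matching stages and the fragmentation penalty:

```python
def simple_meteor(candidate: str, reference: str) -> float:
    """Simplified METEOR-style score: unigram precision/recall over
    exact matches only, combined with the recall-weighted harmonic
    mean Fmean = 10PR / (R + 9P). The real metric also aligns stems
    and WordNet synonyms and applies a fragmentation penalty."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    matches = 0
    ref_pool = list(ref)
    for tok in cand:
        if tok in ref_pool:
            matches += 1
            ref_pool.remove(tok)  # each reference token matches once
    if matches == 0:
        return 0.0
    p, r = matches / len(cand), matches / len(ref)
    return 10 * p * r / (r + 9 * p)
```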
#### G-Eval
We then used an LLM-as-judge approach to better capture the semantic equivalence between the original and generated descriptions. For that, we apply the G-Eval LLM evaluation framework [30] to score each column description according to prompted definitions of textual quality. The G-Eval framework uses LLMs with Chain-of-Thought to fill out a form and generate a final score between 1 and 5 according to the given definition. The G-Eval scores for all columns are averaged to arrive at an overall score for the generated description $k_t$, which we refer to as cGEvalCoh, for a definition of "coherence", and cGEvalFaith, for a definition of "faithfulness".
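The exact prompts are not reproduced in the paper; an illustrative judge prompt in the G-Eval style (all wording below is ours) might be built like this:

```python
def geval_prompt(definition: str, reference: str, candidate: str) -> str:
    """Illustrative G-Eval-style judge prompt: the evaluating LLM is
    asked to reason step by step and emit a 1-5 score for the given
    quality definition (e.g., coherence or faithfulness)."""
    return (
        "You will rate a generated column description.\n"
        f"Evaluation criterion: {definition}\n\n"
        f"Reference description:\n{reference}\n\n"
        f"Generated description:\n{candidate}\n\n"
        "Think step by step about how well the generated description "
        "meets the criterion, then output only a score from 1 to 5."
    )
```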
#### Self-critical with context
Finally, to directly compare how having access to the official table $k^*$ can change the agent's own evaluation, we altered the self-critic LLM prompts to score the same final description $k_t$, this time with the original description available as context.
From the 864 simulations executed, the LLM agent achieved full-knowledge recall in 820 of them (94.9%). As such, in the vast majority of cases, the agent at a minimum retained some mention of every column present in the original table $k^*$. In the remaining 44 cases, the final description $k_t$ retained $\approx 77\%$ of the expected columns on average.
The agent-generated descriptions measured on average $0.17$ on the column-based METEOR score, $2.65$ for column-based G-Eval Coherence and $4.37$ for G-Eval Faithfulness. In addition, adding the original description to the self-critical evaluation reduced the average score from $7.43$ to $6.75$ . While looking directly at the value of those metrics is insufficient to evaluate the quality of the created texts definitively, we can analyze their values in relation to each other and to additional attributes of the simulation runs.
Table II shows the rank correlation between the final score given by the self-critic agent during the simulation and the evaluation metrics based on the original knowledge. We identify a strong correlation between the self-critical score with and without context, indicating that the agent's own evaluation during the conversation partly maintains its relative ranking once it gains access to the original knowledge, even if the absolute score does get reduced. We also observe that the cMETEOR and cGEvalCoh metrics are strongly correlated, indicating that both the word alignment-based metric and the LLM-as-judge method similarly measure the correspondence between the target and reference texts in this context. cGEvalFaith also showed a moderate correlation with the other evaluation metrics, such as $0.66$ to the score with context.
Lastly, we assess the effectiveness of our proposed autonomous agent-based knowledge retrieval process by evaluating how the quality of the obtained descriptions is affected by interacting with patient zero or not. In our simulated setup, once the agent reaches patient zero, the task becomes trivial. Ideally, the agent should be able to retrieve high-quality descriptions from employees who do not hold full knowledge. We used the company simulation parameters discussed in Table I as inputs for dimensionality reduction with the UMAP algorithm [31] and aggregated points over a grid with colors representing either cGEvalFaith or % p0 (averaged over the three repetitions performed for each configuration). The resulting projections are shown side by side in Figure 4. We identify areas with high-quality descriptions where patient zero was never contacted (e.g., the center region of the projected points). Conversely, we also observe areas where patient zero was contacted and the obtained cGEvalFaith was high. These findings, along with the low correlation ( $-0.06$ ) between the compared metrics, reinforce that the effectiveness of the proposed agent does not depend on reaching the employee $p_0$ who possesses complete knowledge.
TABLE II: Spearman correlation between metric pairs and metrics averaged over the simulations (bottom row). SCS: self-critical score; SCS+C: self-critical score with context.
| | cMETEOR | cGEvalCoh | cGEvalFaith | SCS | SCS+C |
| --- | --- | --- | --- | --- | --- |
| cMETEOR | â | 0.772820 | 0.436230 | 0.263699 | 0.279494 |
| cGEvalCoh | 0.772820 | â | 0.565856 | 0.347426 | 0.447342 |
| cGEvalFaith | 0.436230 | 0.565856 | â | 0.497946 | 0.665396 |
| SCS | 0.263699 | 0.347426 | 0.497946 | â | 0.729141 |
| SCS+C | 0.279494 | 0.447342 | 0.665396 | 0.729141 | â |
| Mean Overall Score | 0.1741 | 2.6531 | 4.3675 | 7.4317 | 6.7465 |
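The correlations above can be reproduced with scipy.stats.spearmanr; a dependency-free sketch (average ranks for ties, then the Pearson correlation of the ranks) for readers without SciPy:

```python
def _ranks(values):
    """Average 1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```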
[Figure 4a: UMAP projection of the simulation configurations, aggregated over a grid; cell color encodes the average cGEvalFaith score (scale roughly 2.5 to 4.5, darker red meaning higher faithfulness).]
(a)
(b)
Figure 4: UMAP comparing the quality of the obtained table descriptions against the dependency on contacting patient zero.
## VI Conclusion
In this paper, we propose a novel approach for tacit knowledge retrieval using LLM-based agents within organizational environments. We map the problem as a task of reconstructing database table descriptions that have been disseminated through a company's employee network. Our approach allows the agent to navigate the company graph, engaging in natural-language conversations with employees to gradually accumulate partial information about the desired table and gain context that helps direct its next course of action.
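The exploration loop described above can be sketched as follows. This is a minimal, illustrative version only: `interview` and `score` are hypothetical stand-ins for the LLM-driven conversation and the reference-free self-critical score, and are not the paper's actual implementation.

```python
from collections import deque


def reconstruct(interview, score, start, max_interviews=20):
    """Toy sketch of the agent's exploration loop over the employee graph.

    interview(person) is assumed to return (fragments, referrals): a dict of
    partial table information and a list of colleagues to contact next.
    score(description) stands in for the self-critical feedback score in [0, 1].
    """
    description, visited = {}, set()
    queue = deque([start])
    while queue and len(visited) < max_interviews:
        person = queue.popleft()
        if person in visited:
            continue
        visited.add(person)
        fragments, referrals = interview(person)
        description.update(fragments)  # accumulate partial knowledge
        queue.extend(r for r in referrals if r not in visited)
        if score(description) >= 1.0:  # agent judges the description complete
            break
    return description, visited
```

The referral mechanism is what lets the agent traverse both formal hierarchy and informal networks without a global view of who knows what.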
To validate our proposed method, we explore a broad simulated setup that encompasses a diverse range of company structures with different generated table descriptions. Our empirical findings show that the proposed approach is robust and effective in retrieving tacit knowledge spread within the hierarchy of the simulated companies. By evaluating the reference-free self-critical scores used by our agent during its exploration process, we observe that these scores exhibit similarities to the reference-based evaluation metrics considered in our setup. We also find that the agent is often able to retrieve the full table description without ever directly contacting the employee who was the source of the disseminated knowledge, achieving a recall rate of 94.9%. These results showcase the robustness of our approach and its ability to reconstruct tacit knowledge through automated conversational interactions. For future work, we are implementing our approach at Kunumi, letting the agent interact with real employees in the company network. While further research is needed, preliminary results indicate improved documentation quality, enhancing workflows and efficiency as knowledge concentrated among managers becomes more accessible.
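The simulated setup spreads knowledge fragments through the employee network as a Susceptible-Infectious process with waning infectivity. A toy sketch of that dissemination step is shown below; the parameter names (`beta0`, `decay`) and the multiplicative waning schedule are illustrative assumptions, not the paper's exact parameterization.

```python
import random


def disseminate(graph, patient_zero, fragments, beta0=0.5, decay=0.8,
                steps=10, seed=0):
    """Toy SI-style spread of knowledge fragments over an employee graph.

    graph maps each employee to a list of neighbors. The patient zero starts
    with all fragments; at step t, each informed employee passes each known
    fragment to each neighbor with probability beta0 * decay**t, so
    infectivity wanes over time and knowledge ends up fragmented.
    """
    rng = random.Random(seed)
    knowledge = {patient_zero: set(fragments)}
    for step in range(steps):
        beta = beta0 * decay ** step  # waning infectivity
        updates = []
        for person, known in knowledge.items():
            for neighbor in graph[person]:
                passed = {f for f in known if rng.random() < beta}
                if passed:
                    updates.append((neighbor, passed))
        for neighbor, passed in updates:  # apply after the sweep
            knowledge.setdefault(neighbor, set()).update(passed)
    return knowledge
```

Because transmission probability decays each step, employees far from patient zero typically hold only partial descriptions, which is precisely the fragmentation the agent must overcome.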
## Code and Data Availability
The code, data, and prompts used for the machine-learning analyses are available for non-commercial use and have been deposited at https://doi.org/10.6084/m9.figshare.28785524 [32].
## References
- [1] M. Weber, The Theory of Social and Economic Organization. Simon and Schuster, 1947.
- [2] F. W. Taylor, Scientific Management. Routledge, 2004.
- [3] I. Z. Kiss, M. Broom, P. G. Craze, and I. Rafols, "Can epidemic models describe the diffusion of topics across disciplines?" Journal of Informetrics, vol. 4, no. 1, pp. 74–82, 2010.
- [4] H.-M. Zhu, S.-T. Zhang, Y.-Y. Zhang, F. Wang et al., "Tacit knowledge spreading based on knowledge spreading model on networks with consideration of intention mechanism," Journal of Digital Information Management, vol. 13, no. 4, pp. 293–300, 2015.
- [5] M. Polanyi, "The tacit dimension," in Knowledge in Organisations. Routledge, 2009, pp. 135–146.
- [6] I. Nonaka and H. Takeuchi, "The knowledge-creating company," Harvard Business Review, vol. 85, no. 7/8, p. 162, 2007.
- [7] M. U. Hadi, Q. Al Tashi, A. Shah, R. Qureshi, A. Muneer, M. Irfan, A. Zafar, M. B. Shaikh, N. Akhtar, J. Wu et al., "Large language models: a comprehensive survey of its applications, challenges, limitations, and future prospects," Authorea Preprints, 2024.
- [8] M. S. Orenstrakh, O. Karnalim, C. A. Suarez, and M. Liut, "Detecting LLM-generated text in computing education: Comparative study for ChatGPT cases," in 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC). IEEE, 2024, pp. 121–126.
- [9] J. Zamfirescu-Pereira, R. Y. Wong, B. Hartmann, and Q. Yang, "Why Johnny can't prompt: how non-AI experts try (and fail) to design LLM prompts," in Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 2023, pp. 1–21.
- [10] T. Webb, K. J. Holyoak, and H. Lu, "Emergent analogical reasoning in large language models," Nature Human Behaviour, vol. 7, no. 9, pp. 1526–1541, 2023.
- [11] N. Miao, Y. W. Teh, and T. Rainforth, "SelfCheck: Using LLMs to zero-shot check their own step-by-step reasoning," arXiv preprint arXiv:2308.00436, 2023.
- [12] P. Blau, Exchange and Power in Social Life. Routledge, 2017.
- [13] T. Diefenbach and J. A. Sillince, "Formal and informal hierarchy in different types of organization," Organization Studies, vol. 32, no. 11, pp. 1515–1537, 2011.
- [14] G. Hedlund, "Assumptions of hierarchy and heterarchy, with applications to the management of the multinational corporation," in Organization Theory and the Multinational Corporation. Springer, 1993, pp. 211–236.
- [15] J. Mihm, C. H. Loch, D. Wilkinson, and B. A. Huberman, "Hierarchical structure and search in complex organizations," Management Science, vol. 56, no. 5, pp. 831–848, 2010.
- [16] J. S. Park, J. O'Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein, "Generative agents: Interactive simulacra of human behavior," in Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 2023, pp. 1–22.
- [17] G. Dai, W. Zhang, J. Li, S. Yang, S. Rao, A. Caetano, M. Sra et al., "Artificial Leviathan: Exploring social evolution of LLM agents through the lens of Hobbesian social contract theory," arXiv preprint arXiv:2406.14373, 2024.
- [18] C. Qian, W. Liu, H. Liu, N. Chen, Y. Dang, J. Li, C. Yang, W. Chen, Y. Su, X. Cong et al., "ChatDev: Communicative agents for software development," in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024, pp. 15 174–15 186.
- [19] T. Wu, M. Terry, and C. J. Cai, "AI chains: Transparent and controllable human-AI interaction by chaining large language model prompts," in Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 2022, pp. 1–22.
- [20] D. Zhou, N. Schärli, L. Hou, J. Wei, N. Scales, X. Wang, D. Schuurmans, C. Cui, O. Bousquet, Q. Le et al., "Least-to-most prompting enables complex reasoning in large language models," arXiv preprint arXiv:2205.10625, 2022.
- [21] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi, Q. V. Le, and D. Zhou, "Chain-of-thought prompting elicits reasoning in large language models," in Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., vol. 35. Curran Associates, Inc., 2022, pp. 24 824–24 837.
- [22] W. O. Kermack and A. G. McKendrick, "A contribution to the mathematical theory of epidemics," Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character, vol. 115, no. 772, pp. 700–721, 1927.
- [23] M. Ehrhardt, J. Gašper, and S. Kilianová, "SIR-based mathematical modeling of infectious diseases with vaccination and waning immunity," Journal of Computational Science, vol. 37, p. 101027, 2019.
- [24] S. Ahmetolan, A. H. Bilge, A. Demirci, and A. P. Dobie, "A susceptible–infectious (SI) model with two infective stages and an endemic equilibrium," Mathematics and Computers in Simulation, vol. 194, pp. 19–35, 2022.
- [25] D. J. Arent, R. S. Tol, E. Faust, J. P. Hella, S. Kumar, K. M. Strzepek, F. L. Tóth, D. Yan, A. Abdulla, H. Kheshgi et al., "Key economic sectors and services," in Climate Change 2014: Impacts, Adaptation and Vulnerability. Part A: Global and Sectoral Aspects. Cambridge University Press, 2015, pp. 659–708.
- [26] P. S. Laumas, "Key sectors in some underdeveloped countries," Kyklos, vol. 28, no. 1, 1975.
- [27] B. Guibert, J. Laganier, and M. Volle, "An essay on industrial classifications," Économie et Statistique, vol. 20, pp. 1–18, 1971.
- [28] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. MIT Press, 2022.
- [29] A. Lavie and A. Agarwal, "METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments," in Proceedings of the Second Workshop on Statistical Machine Translation, ser. StatMT '07. Association for Computational Linguistics, 2007, pp. 228–231.
- [30] Y. Liu, D. Iter, Y. Xu, S. Wang, R. Xu, and C. Zhu, "G-Eval: NLG evaluation using GPT-4 with better human alignment," in Conference on Empirical Methods in Natural Language Processing, 2023, p. 2511.
- [31] L. McInnes, J. Healy, and J. Melville, "UMAP: Uniform manifold approximation and projection for dimension reduction," arXiv preprint arXiv:1802.03426, 2018.
- [32] G. Zuin, "Code and data for Leveraging Large Language Models for Tacit Knowledge Discovery in Organizational Contexts," 2025. [Online]. Available: https://doi.org/10.6084/m9.figshare.28785524