# A Scalable Communication Protocol for Networks of Large Language Models
**Authors**: Samuele Marro, Emanuele La Malfa, Jesse Wright, Guohao Li, Nigel Shadbolt, Michael Wooldridge, Philip Torr (Oxford, UK)
> Corresponding author. Email: samuele@robots.ox.ac.uk.
Abstract
Communication is a prerequisite for collaboration. When scaling networks of AI-powered agents, communication must be versatile, efficient, and portable. These requisites, which we refer to as the Agent Communication Trilemma, are hard to achieve in large networks of agents. We introduce Agora, a meta protocol that leverages existing communication standards to enable LLM-powered agents to solve complex problems efficiently. In Agora, agents typically use standardised routines for frequent communications, natural language for rare communications, and LLM-written routines for everything in between. Agora sidesteps the Agent Communication Trilemma and robustly handles changes in interfaces and members, allowing unprecedented scalability with full decentralisation and minimal human involvement. On large Agora networks, we observe the emergence of self-organising, fully automated protocols that achieve complex goals without human intervention.
1 Introduction
Human language evolved primarily for communication purposes (Fedorenko et al., 2024). Despite its inherent ambiguity, natural language provides great versatility and allows humans and machines to collaborate and achieve complex goals that they otherwise could not (Russell & Norvig, 2016).
Decades of literature in computer science explored how to foster collaboration between agents modelled as programs (Wooldridge & Jennings, 1995; Gilbert, 2019). Several research papers design networks of agents to solve complex problems by leveraging each model’s specialisation, the so-called rule-based agents paradigm (Wooldridge, 2009). Despite its influence, such a paradigm faces two major limitations: agents hardly adapt to environmental changes and require structured data that limits their versatility (Gilbert & Terna, 2000).
With the advent of Large Language Models (LLMs) (Vaswani et al., 2017; Brown et al., 2020), there has been a resurgent interest in networks of collaborative agents. LLMs can solve a variety of problems (Achiam et al., 2023; Dubey et al., 2024a) expressed in natural language, as they excel at following instructions (Schulman et al., 2017; Rafailov et al., 2024). LLMs have also shown remarkable improvements in handling structured data such as graphs and formatted languages (Kassner et al., 2020; Collins et al., 2022; Jin et al., 2023; Lin et al., 2024).
In terms of performance (e.g., accuracy on classification tasks), the literature suggests that specialised LLMs outperform general-purpose models (Hu et al., 2021; Zhang et al., 2024), while also mitigating the difficulties of serving gargantuan models and the drawbacks of data and model centralisation (Song et al., 2023).
Thus, we hypothesise that:
Hypothesis
A network of heterogeneous LLMs can automate various complex tasks with nearly no human supervision via specialised and efficient protocols.
However, networks of LLM-powered agents face three key challenges that make communication at scale significantly more difficult:
- LLMs are heterogeneous: different LLMs have different architectures, makers, capabilities and usage policies. Heterogeneity is not unique to LLM-powered agents; yet, compared to classic MAS agents, LLMs come with deeper representations of the surrounding environment and are thus more challenging to standardise.
- LLMs are (mostly) general-purpose tools: enumerating and standardising each task they can perform is infeasible.
- LLMs are expensive: the computational footprint and inference time of even “small” LLMs dwarf those of comparable, specialised APIs.
Scalable communication between heterogeneous LLMs must be versatile, i.e., capable of handling a variety of use cases; efficient, i.e., requiring the least computational effort; and portable, i.e., supporting the protocol should require as little human effort as possible. The above-mentioned issues constitute the Agent Communication Trilemma, which we expand on in Section 3.
In light of this, the aim of this paper is the following:
Key Contribution
We design and implement a communication protocol between heterogeneous LLM-powered agents and assess its feasibility and scalability for solving high-order tasks.
We sidestep the Trilemma with Agora, a meta protocol that relies on the dual use of structured data for frequent communications and natural language for infrequent ones. With Agora, we instantiate large networks of LLM-powered agents that solve complex tasks autonomously by leveraging efficient communication schemas. In such networks, we observe agents developing an emergent, fully automated protocol to solve a complex task starting from an instruction expressed in natural language. We believe that this observation can serve as a basis to renew interest in emergent protocols/languages in large networks of LLMs (Lazaridou et al., 2018; Chaabouni et al., 2019; Lazaridou & Baroni, 2020; Chaabouni et al., 2022).
The paper is structured as follows. We first outline the key challenges that constitute the Agent Communication Trilemma (Section 3); we then detail how Agora addresses the Trilemma and serves as a communication protocol for networks of LLMs (Section 4). Finally, in Section 5, we provide two fully functional demos (our code is available at github.com/agora-protocol/paper-demo): the former, with two agents, clarifies Agora’s operating principles; the latter, with 100, proves Agora’s scalability and shows the emergence of self-organising behaviours.
2 Related Work
Multi-agent LLMs and communication.
At the time of writing, Multi-Agent-Systems of Large Language Models (MAS-LLM) have become an active area of research (Guo et al., 2024) after the upsurge of LLMs as general purpose problem solvers (Brown et al., 2020; Achiam et al., 2023; Dubey et al., 2024b). Many fields have adapted techniques from the MAS-LLM paradigm to solve problems single models fail at, including reasoning and math (Li et al., 2024), Theory of Mind (Cross et al., 2024; Li et al., 2023b), planning (Singh et al., 2024), alignment to human values (Pang et al., 2024), and simulation of games, economics, and political scenarios (Bakhtin et al., 2022; Hua et al., 2023; Wu et al., 2024a). The common intuition of these works is that by breaking a task into sub-components (Hong et al., 2023) and allocating a large number of specialised models (Li et al., 2024) to each of them (Li et al., 2023a), one can achieve higher performance and observe emergent behaviours that otherwise would not occur.
On the other hand, a key requisite for solving complex tasks in large networks of MAS-LLMs is effective and efficient communication. In large networks, LLMs must agree on the actions to take (Chen et al., 2023): works such as Agashe et al. (2023) and Liang et al. (2023) studied how LLMs debate to foster collaboration on high-order tasks (Du et al., 2023). Another recent line of research explores the topology of the MAS-LLM network as a facilitator to reach consensus (Chen et al., 2024).
LLMs for simulations and emergence of protocols.
A few seminal works studied how emergent communication and protocols arise between neural networks that manipulate symbols (Havrylov & Titov, 2017; Lazaridou et al., 2018; Lazaridou & Baroni, 2020). Written before the rise of LLMs, these works inspired researchers to explore how spontaneous collaboration emerges in MAS-LLMs (Wu et al., 2024b), with applications to the simulation of societies (Gao et al., 2024). Of particular interest for this paper are the works by Chaabouni et al. (2019) and Chaabouni et al. (2022). Chaabouni et al. (2019) describes how emergent communication systems between neural networks privilege longer messages. Chaabouni et al. (2022) posits the existence of “scaling laws” (Kaplan et al., 2020) for large networks of MAS-LLMs, in which the dataset, task complexity, and population size are key to observing emergent behaviours.
3 The Agent Communication Trilemma
<details>
<summary>img/triangle-trilemma.png Details</summary>

Visual description: a triangle whose vertices are labelled "Efficiency" (top), "Portability" (left), and "Versatility" (right). Each side is annotated with the communication medium that trades off its two endpoints: "Traditional static API (e.g., OBP)" between Portability and Efficiency, "Meta-API (e.g., RDF)" between Versatility and Efficiency, and "Natural language" along the base between Portability and Versatility. An inner triangle labelled "Agora", whose vertices touch the midpoints of the outer triangle's sides, represents a balance of all three properties.
</details>
Figure 1: The Trilemma and how our solution (Agora) balances efficiency, portability and versatility.
An agent is a computer system that, in an environment, is capable of autonomous actions (the so-called ‘agency’ (Horty, 2001)) to meet its design objective (Wooldridge & Jennings, 1995; Wooldridge, 2009, p. 15). Just as humans must negotiate and cooperate to achieve shared goals, so too must agents within multi-agent systems (Wooldridge, 2009, p. 24-25). However, when designing communication protocols for heterogeneous networks (i.e., networks where agents have different architectures, capabilities and design constraints), we run into difficulties when attempting to optimise for three properties at the same time:
- Versatility: communication between agents should support a wide variety of messages, both in terms of content and format;
- Efficiency: the computational cost of running an agent and networking cost of communication should be minimal;
- Portability: supporting the communication protocol should require the least implementation effort by the largest number of agents involved.
We name the trade-off between such properties the Agent Communication Trilemma, which is illustrated in Figure 1. In the next sections, we will discuss how an LLM-powered communication protocol can trade off versatility, efficiency, and portability.
3.1 Versatile vs. Portable Communication
In networks of agents, versatility and portability are in tension for two fundamental reasons (Olivé, 2007). A prerequisite for two communicating agents is (1) a shared conceptual understanding of the topic on which they communicate. For instance, two agents can communicate about the weather if they both ‘know’ what it means to be sunny, rainy and overcast; likewise, they should share a similar notion of describing and measuring temperature (e.g., in degrees Celsius). In addition, (2) agents must encode and decode messages in a way that is intelligible to both. Continuing the weather example, if two agents exchange data using JSON objects, both the sender and the receiver must know the syntax (e.g., the keys of a JSON object, such as temperature) and the semantics (e.g., temperature is a 32-bit floating point value representing the temperature, in central London, as measured in degrees Celsius) of the exchanged messages.
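The weather example can be made concrete. Below is a hypothetical JSON message together with the shared conventions both agents need in order to interpret it; the field names and units are illustrative, not part of any standard:

```python
import json

# A hypothetical weather message. Both agents must agree on the
# syntax (the keys below) and on the semantics, e.g. that
# "temperature" is a float in degrees Celsius measured in central London.
message = json.dumps({
    "location": "central London",
    "condition": "sunny",   # shared vocabulary: "sunny", "rainy", "overcast"
    "temperature": 18.5,    # degrees Celsius
})

decoded = json.loads(message)
assert decoded["condition"] in {"sunny", "rainy", "overcast"}
print(f'{decoded["temperature"]} °C in {decoded["location"]}')
```

Without the agreed vocabulary and unit conventions, the receiver could parse the JSON syntactically yet still misinterpret the payload (requisite (1) above).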
In complex scenarios, defining routines whose syntax and semantics satisfy requisites (1) and (2) may be difficult. For example, a programmer has to manually implement a method to encode (or decode) messages to (or from) other agents. Additionally, the programmer must explicitly instruct the agent about how to manipulate and reason about the message content, often by interpreting API documentation describing the semantics of the message. Therefore, there is a trade-off between the breadth of messages (versatility) and the implementation cost (portability).
An example of high-portability, low-versatility communication is the Open Banking Platform (OBP), which uses a well-defined Open API schema for data transfer (OBL, 2024). OBP is highly portable because it uses a fixed range of well-known concepts which developers can implement; however, it is restricted to discussing a narrow domain of banking data and is thus not versatile. On the other end of the spectrum, rules-based Semantic Web agents (Berners-Lee et al., 2001) that exchange RDF-encoded documents (Beckett et al., 2014) are highly versatile, since ontologies (Wooldridge, 2009, p. 180) enable the description of structured relations between essentially any concept. Still, they require developers to program agents to implement the specific ontologies used by the network (e.g., if a set of RDF triples states that the temperature is 38°C, an agent must be able to interpret the concepts of “temperature” and “Celsius”).
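The RDF temperature example above can be sketched as subject–predicate–object triples; the point is that an agent can only act on them if it already implements the ontology the predicates come from. The IRIs below are illustrative, not drawn from any published vocabulary:

```python
# Hypothetical RDF-style triples: (subject, predicate, object).
# "_:t1" is a blank node grouping the value and its unit.
triples = [
    ("ex:London", "ex:temperature", "_:t1"),
    ("_:t1", "ex:value", 38.0),
    ("_:t1", "ex:unit", "ex:Celsius"),
]

# An agent "supports" the ontology only if it knows what every
# predicate denotes; otherwise the triples are syntactically valid
# but semantically opaque to it.
known_predicates = {"ex:temperature", "ex:value", "ex:unit"}
understood = all(p in known_predicates for (_, p, _) in triples)
```

This is the portability cost of the Semantic Web approach: the data format is universal, but the meaning of each predicate must still be implemented by a human (or, as Section 4 argues, by an LLM).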
3.2 Efficient vs. Versatile and Portable Communication
As previously mentioned, rule-based agents excel at the tasks they are designed to solve but hardly adapt to new environments. Decades of research in reinforcement learning (Sutton, 2018), and then in deep reinforcement learning (Arulkumaran et al., 2017; Henderson et al., 2018), introduced a paradigm where agents learn to optimise a reward as a proxy of the task we want them to solve. Agentic LLMs, i.e., multi-agent systems powered by language models, are a recent paradigm for machine-to-machine communication that relies mostly on the models’ proficiency at handling natural language and following instructions (Li et al., 2023a).
Natural language is highly expressive, making it a suitable choice for versatile communication (Russell & Norvig, 2016). Additionally, LLMs trained on massive corpora seem to develop an implicit understanding of various concepts, which abstracts away their internal architecture and makes communication independent of it. Moreover, LLMs can integrate external tools, write code and invoke APIs with relatively little or no training (Schick et al., 2024), since the only requirement is a natural-language description of the tool and its parameters.
Conversely, natural language as a communication medium has two major drawbacks. First, it is computationally expensive: while engineering and hardware improvements (Dubey et al., 2024b) mitigate costs over time, the computational requirements of invoking an LLM dwarf those of comparable APIs, representing a major bottleneck for scaling networks of LLMs; moreover, using closed-source pay-per-usage LLMs hosted by third parties is expensive and raises concerns about the replicability of results (La Malfa et al., 2023). Second, natural language is inherently ambiguous: while LLMs have a certain degree of “common sense” to fulfil requests, non-determinism and the specifics of natural language leave space for errors that routines minimise (for instance, if someone asks for the temperature in Fahrenheit and the agent has a tool that returns the temperature in Celsius, the model must know that Celsius and Fahrenheit are both units of measure for temperature). These factors make LLMs and natural language more error-prone than alternatives such as handwritten APIs.
In conclusion, RESTful APIs (efficient), RDF tuples (portable) and natural language (versatile) are all trade-offs in the Trilemma. While some approaches are more useful in practice than others, the fact that no communication format achieves all three properties simultaneously suggests that we need a hybrid communication protocol that leverages all of them. The next section outlines our solution.
4 Agora: a Communication Protocol Layer for LLMs
<details>
<summary>img/evil.png Details</summary>

Visual description: four LLM-powered nodes (black polyhedra) exchange messages ("Send/receive message") over a two-tier infrastructure. The upper tier shows implementation technologies (SQL, MongoDB, XML, PHP, HTML/CSS/JS, Python) and model providers (Meta, OpenAI); the lower tier, marked with an HTTPS padlock, represents the secure communication layer connecting the nodes.
</details>
(a) An illustration of Agora and how it abstracts the underlying implementation, communication, and physical layers.
<details>
<summary>img/evil-stack.png Details</summary>

Visual description: a vertical protocol stack. From top to bottom: "Further layers" (dashed box), "Agora", "Implementation Layer" (with network, Python, and SQL icons), "Communication layer" (padlock icon), and "Physical Layer".
</details>
(b) Stack of technologies to build Agora.
Figure 2: How Agora fits into a standard communication protocol stack.
The key to solving the Communication Trilemma involves accepting that no single protocol can achieve optimal efficiency, portability and versatility at the same time. In this section we introduce Agora, a meta protocol that takes advantage of the unique capabilities of LLMs to sidestep the Trilemma by adopting different communication methods for different scenarios.
The most powerful LLMs share three key properties:
- They can understand, manipulate, and reply to other agents using natural language;
- They excel at following instructions, including writing code to implement routines (Schick et al., 2024; Hou et al., 2023; Liu et al., 2024);
- They can autonomously negotiate protocols and reach consensus on strategies and behaviours to adopt in complex scenarios (Chen et al., 2023; Fu et al., 2023).
At its core, Agora uses different communication formats depending on the circumstances; an agent can support a wide breadth of communications (high versatility) while handling the majority of the total volume of requests with efficient routines (high efficiency). Moreover, the entire negotiation and implementation workflow is handled by the LLMs and requires no human supervision (high portability). The concept of protocol documents (PD), which we sketch in Figure 3 and discuss in the next section, lies at the core of Agora’s functionalities.
In the next sections, we illustrate the hierarchy of communication methods Agora supports natively and the concept of PD; we then provide an example of how Agora works and how it enables versatile, efficient, and portable communication. We conclude by emphasising how one can integrate and build upon Agora with further technological layers independently from its underlying technologies.
4.1 Communication in (an) Agora
Agora introduces a machine-readable way to transfer and refer to protocols, namely the protocol documents (PDs). A PD is a plain-text description of a communication protocol. Throughout this paper, we use the word “protocol” to refer to any standardised description of structured communication. PDs are self-contained, implementation-agnostic, and contain everything an agent needs to support a protocol: this means that most descriptions of existing protocols, such as RFCs, are also suitable PDs. However, instead of relying on a central body to assign identifiers, a PD is uniquely identified by its hash (for multiplexing).
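Since a PD is plain text, an agent can derive its identifier locally, with no central registry. A minimal sketch of this idea follows; the choice of SHA-256 and the toy PD text are assumptions for illustration, not mandated by Agora:

```python
import hashlib

# A toy protocol document: a self-contained, plain-text description
# of a structured exchange (hypothetical content).
pd = """Weather Query Protocol v1
The sender transmits a JSON object {"location": <string>}.
The receiver replies with {"temperature": <float, degrees Celsius>}.
"""

# The PD is uniquely identified by its hash; agents attach this
# identifier to messages so the receiver can multiplex between
# the protocols it supports.
protocol_id = hashlib.sha256(pd.encode("utf-8")).hexdigest()
print(protocol_id)
```

Because the identifier is a pure function of the PD's text, any two agents holding the same document derive the same identifier independently, which is what makes decentralised multiplexing possible.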
In Agora, the most frequent communications have dedicated efficient routines, and the least frequent ones use inefficient but flexible LLMs and natural language. In particular:
- When possible, frequent communications are handled through traditional protocols, for which there are standard, human-written implementations (e.g., OBP);
- For communications that happen less frequently (or for which there are no standard protocols), agents can use structured data as an exchange medium (which can be handled by LLM-written routines);
- For communications that might be frequent for one side but not the other, the agents still use structured data, but one side can choose to use an LLM, while the other uses a routine;
- For rare communications or when a routine fails unexpectedly, the agents can resort to natural language.
It is entirely up to the agent whether to handle a query with a human-written routine, an LLM-written routine, or an LLM (or a combination of the three). This gives the agent maximum flexibility over how to process queries. Forcing or nudging a model to use a specific communication style can improve efficiency, but a discussion of such mechanisms is beyond the scope of this paper. One can, for example, instruct an LLM in its system prompt to negotiate a protocol whenever possible. In the demo (Section 5.3), we illustrate the trade-off between the versatility of a communication protocol and its expected usage.
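The selection logic above can be sketched as a simple dispatch; `human_routines`, `llm_routines`, and `llm` are illustrative names, not part of the Agora specification:

```python
def handle_query(protocol_hash, body, human_routines, llm_routines, llm):
    """Pick the cheapest available handler for an incoming query.
    `human_routines` and `llm_routines` map PD hashes to callables;
    `llm` is the natural-language fallback."""
    if protocol_hash in human_routines:        # frequent, standardised traffic
        return human_routines[protocol_hash](body)
    if protocol_hash in llm_routines:          # LLM-written implementation
        try:
            return llm_routines[protocol_hash](body)
        except Exception:                      # routine failed unexpectedly:
            pass                               # fall through to the LLM
    return llm(protocol_hash, body)            # rare or novel: use the LLM
```

The fallback chain mirrors the hierarchy: routines are tried first, and natural-language handling by the LLM is the universal last resort.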
Hierarchical communications support any form of communication (maximum versatility), although in practice an LLM is invoked only in rare cases (maximum efficiency). Moreover, since LLMs can implement routines on their own (PDs fully describe the syntax and semantics of a protocol), human programmers only need to provide an overview of the tools the agent has access to: the implementation effort required on the human side is minimal (maximum portability). In other words, Agora sidesteps the Communication Trilemma by employing routines for frequent requests and resorting to natural language when agents need to negotiate efficient ways to solve a problem or when errors occur.
4.2 An Example of Communication over Agora
<details>
<summary>img/pd-negotiation.png Details</summary>

Diagram: an LLM-powered node negotiates a protocol document in natural language (left), producing a PD identified by hash "123"; subsequent messages are then formatted according to that PD and reference the same hash (right).
</details>
Figure 3: How a protocol document is negotiated between LLM-powered agents (left) and used for future efficient communications.
We now describe how two agents, Alice and Bob, can efficiently communicate over Agora using a PD routine, as illustrated in Figure 3. Alice initially sends a query with the hash of its corresponding PD. Bob uses the hash to determine if he has a corresponding routine. If so, he calls it and handles the communication without invoking the LLM. Otherwise, Bob handles the response with the LLM itself.
If, over time, Bob repeatedly uses an LLM to reply to queries that follow a given protocol, to the point where invoking the LLM every time becomes expensive, he can use the LLM to write a routine that handles future communications.
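This cost-triggered promotion from LLM to routine can be sketched with a simple usage counter (the threshold is an illustrative choice, not a value fixed by the paper):

```python
class ProtocolStats:
    """Track how often each PD is handled by the LLM and flag the ones
    worth turning into LLM-written routines."""

    def __init__(self, threshold: int = 10):
        self.threshold = threshold
        self.llm_calls: dict[str, int] = {}

    def record_llm_call(self, pd_hash: str) -> bool:
        """Return True once the PD has been handled by the LLM often
        enough that writing a routine would pay off."""
        self.llm_calls[pd_hash] = self.llm_calls.get(pd_hash, 0) + 1
        return self.llm_calls[pd_hash] >= self.threshold
```

When `record_llm_call` returns `True`, the agent can ask its LLM to implement a routine from the PD and register it for future queries.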
If the routine fails or the communication is a one-off instance that does not require a protocol, Alice and Bob use natural language, which is again handled by the LLM. Natural language is also available to bootstrap communication between nodes that have never interacted before, as well as to negotiate new protocols. That said, the lower cost of routines and the lack of ambiguity are strong incentives for agents to prefer structured data.
Note that PDs can be shared with other nodes in the network, which means that two agents that have never interacted before can use protocols developed by other agents.
In Appendix A, we detail five use cases of Agora that further show its versatility as a personal assistant and data-analysis tool, and how it leverages compositionality and scalability to reduce costs.
4.3 Agora as a Layer Zero Protocol
Figure 2 illustrates that Agora is implementation and technology agnostic. The implementation of the agents themselves (e.g., LLMs), the database used to store data (e.g., VectorDB, SQL, MongoDB, etc.), the language in which implementations are written (Python, Java, etc.) and the nature of tools are all abstracted.
At the same time, PDs can refer to other protocol documents, and since routines can call other routines, agents can build upon previous negotiations to solve more complex tasks.
Finally, the versatility and portability of Agora make it straightforward to handle the addition or removal of a node, a change in the capabilities of a node, or a change in the goals of the network, as illustrated in the demo, Section 5.3.
All these factors contribute to making Agora a natural Layer Zero protocol, i.e. a foundation layer, for higher-order communication and collaboration between LLMs. We hope our protocol can fuel theoretical and applied research on complex protocols, negotiation schemes, and consensus algorithms in large networks of LLMs.
5 Agora in Practice
We implement and showcase two scenarios where Agora can be applied: the first involves two agents whose objective is to exchange some data; the second involves $100$ agents and tests Agora's scalability and the capacity of LLM-powered agents to autonomously coordinate in complex scenarios. For space reasons, the scenarios are expanded in Appendices C and D; here, we focus on their functionality and our key observations in terms of efficiency/versatility/portability, cost reduction, scalability, and the emergent behaviours of fully automated networks of LLMs.
5.1 Implementation Details
The design of Agora for our working demos follows three key principles:
- Minimality. Agora enforces the basic standards that allow for efficient negotiation and use of protocols, leaving everything else to PDs or other higher-order standards;
- Decentralisation. Agora does not rely on central authorities, with any collection of nodes being able to use Agora independently;
- Full backward compatibility. Agora supports existing communication protocols and schemas such as OpenAPI and JSON-Schema.
From a practical point of view, Agora uses HTTPS as the base communication layer and JSON as the format for exchanging metadata. When sending a message in a given protocol, an agent sends a JSON document with three keys: the protocol hash, the body of the request formatted according to the protocol, and a non-empty list of sources from which the protocol can be downloaded. The receiver downloads the PD from its preferred source and, after checking that the hash matches, stores it for future use. This hash-based identification system ensures that any node can reference any PD without relying on a central authority to assign identifiers. Where PDs are stored is entirely up to the agents; besides regular cloud storage, hash-based indexing also makes decentralised storage options (such as IPFS; Benet, 2014) viable. Additionally, since essentially any protocol can be stored as a PD, Agora has full backwards compatibility with existing protocols (although human programmers are encouraged to provide existing, standardised implementations instead of having the LLM re-implement them from scratch).
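A minimal sketch of the three-key envelope and the receiver-side hash check; the field names are illustrative, not the specification's exact wire format:

```python
import hashlib
import json

def make_message(pd_text: str, body: dict, sources: list) -> str:
    """Wrap a request in Agora's metadata envelope: the protocol hash,
    the protocol-formatted body, and a non-empty list of PD sources."""
    assert sources, "the source list must be non-empty"
    return json.dumps({
        "protocolHash": hashlib.sha256(pd_text.encode("utf-8")).hexdigest(),
        "body": body,
        "protocolSources": sources,
    })

def verify_pd(message: str, downloaded_pd: str) -> bool:
    """Receiver side: check that the downloaded PD matches the hash
    announced in the message before storing it for future use."""
    announced = json.loads(message)["protocolHash"]
    return hashlib.sha256(downloaded_pd.encode("utf-8")).hexdigest() == announced
```

The hash check is what lets the receiver trust a PD fetched from any source, trusted or not, since a mismatch is detected immediately.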
To simplify negotiation, an agent can expose an endpoint with a list of supported protocols: a potential sender can thus compare the list with its own to automatically determine if there is a common protocol. The sender can also use a potentially unsupported protocol, although the receiver can choose to reject it by returning a predefined error message.
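The comparison of supported protocols reduces to a set intersection; `pick_protocol` is a hypothetical helper for illustration:

```python
def pick_protocol(supported_by_sender, supported_by_receiver):
    """Return a PD hash both sides support, or None if the sender must
    negotiate a new protocol (or fall back to natural language)."""
    common = set(supported_by_sender) & set(supported_by_receiver)
    return min(common) if common else None  # deterministic tie-break
```

A `None` result is the cue to start negotiation, while the receiver remains free to reject any unsupported protocol with a predefined error.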
Refer to the Agora specification for a more formal description of the protocol.
5.2 Demo: Retrieving Weather Data
Consider two agents, Alice and Bob. Alice is an agent powered by Llama-3-405B (Dubey et al., 2024b) that manages the bookings of a guided tour service in London. While Llama-3 models can be hosted locally, for the sake of a proper comparison with GPT-4o and Gemini, we use a cloud provider, namely SambaNova (https://sambanova.ai). Bob is a GPT-4o (Achiam et al., 2023) agent for a weather service that provides forecasts for a given date and location. As part of the user interaction loop, Alice notifies the user if heavy rain is expected on a booked date.
To check the weather, she initially uses her LLM to send a natural language query to Bob (phase A1):
Alice - Natural Language
What is the weather forecast for London, UK on 2024-09-27?
Bob uses his Toolformer LLM (Schick et al., 2024) to query his database (phase B1) and returns a natural language reply (phase B2):
Bob - Natural Language
The weather forecast for London, UK, on 2024-09-27 is as follows: “Rainy, 11 degrees Celsius, with a precipitation of 12 mm.”
Over time, the cost of invoking an LLM for phases A1 and B2 dominates all the other costs; Alice and Bob thus decide to develop a protocol. Alice checks if Bob already supports a suitable protocol but finds none. Therefore, she decides to negotiate a protocol with Bob. After a few rounds of negotiation, Alice and Bob agree on the following protocol: Alice sends a JSON document with two fields, location and date, and Bob replies with a JSON document containing three fields, namely temperature (in degrees Celsius), precipitation (in millimetres), and weatherCondition (one of “sunny”, “cloudy”, “rainy” and “snowy”). From there on, Alice specifies the protocol hash when performing a query. An example of an exchanged message (excluding Agora’s metadata) is:
Alice - PD
{"location": "London, UK", "date": "2024-09-27"}
Both Alice and Bob independently decide to write a routine to handle their side of the communication. From then on, Alice and Bob no longer need to use their LLMs to exchange weather data: a routine automates phases A1, B1 and B2, sparing the cost of invoking the respective LLMs.
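A sketch of the routine Bob might write for the negotiated protocol; the database interface (a mapping from location/date pairs to forecast records) is an assumption made for illustration:

```python
import json

def weather_routine(request_json: str, forecast_db) -> str:
    """Bob's LLM-written routine for the negotiated protocol: parse the
    {location, date} query, look up the forecast, and reply with the
    agreed three-field JSON document."""
    query = json.loads(request_json)
    record = forecast_db[(query["location"], query["date"])]
    return json.dumps({
        "temperature": record["temperature"],      # degrees Celsius
        "precipitation": record["precipitation"],  # millimetres
        "weatherCondition": record["condition"],   # sunny/cloudy/rainy/snowy
    })
```

Because the PD fully specifies both sides of the exchange, Alice can write her own sending routine independently, and neither routine needs to know the other's implementation.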
A cost analysis.
In our demo, negotiating the protocol and implementing the routines cost $0.043$ USD in API calls, compared to an average cost of $0.020$ USD for a natural-language exchange. This means that, as long as Alice and Bob use the agreed-upon protocol more than twice, Agora reduces the overall cost. Please refer to Appendix C for a transcription of the negotiation process and the final protocol.
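The break-even point follows from simple arithmetic, treating the execution cost of the routines themselves as negligible:

```python
def total_cost(n_exchanges: int, negotiation: float = 0.043,
               nl_exchange: float = 0.020):
    """Cumulative API cost of n exchanges: (natural language, protocol).
    The protocol pays its one-off negotiation cost up front; each
    natural-language exchange pays per message."""
    return n_exchanges * nl_exchange, negotiation

# Natural language wins for two exchanges, the protocol from the third onward.
assert total_cost(2)[0] < total_cost(2)[1]  # 0.040 USD vs 0.043 USD
assert total_cost(3)[0] > total_cost(3)[1]  # 0.060 USD vs 0.043 USD
```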
As a final note, we stress that the entire communication happened without human intervention. Additionally, should Bob become unavailable, Alice can simply reuse the PD with a new node that may use a different LLM/database/technology stack.
5.3 Demo: a Network of 100 Agents
<details>
<summary>img/agora-100.png Details</summary>

Diagram: a food-delivery sub-network of Agora. An assistant orders food from a restaurant after negotiating a PD (hash "123"); the restaurant checks the delivery service (PD hash "234"), which in turn queries the traffic service (PD hash "600"); each service replies with a PD, and the order starts once all checks succeed.
</details>
Figure 4: Illustration of how in an Agora network with $100$ agents (left; for clarity, only the relevant sub-network is displayed), an emergent protocol for food delivery emerges (right).
We now show the scaling capabilities and emergent behaviours of Agora by considering a network of 100 LLM-powered agents. In particular, we scale the number of agents, which, as posited in Chaabouni et al. (2022), is a requisite for the emergence of complex behaviours in multi-agent networks.
We design a network of $85$ assistant agents interacting with $15$ server agents, all powered by LLMs. The server agents offer various services, such as booking hotel rooms, calling taxis, ordering food, etc. An example of a sub-network for food delivery is sketched in Figure 4, left. Their specialisation is handled via prompting, as in Deshpande et al. (2023); Joshi et al. (2023); Li et al. (2023a). As part of their workflow, server agents must interact with several tools and databases; additionally, some servers need to interact with other servers to complete assistants’ requests (e.g., taxi services use the traffic data agent to adjust estimated fares for a run). We bootstrap the network by leveraging the underlying communication layer (as described in Section 4 and Figure 2), informing the nodes of which URLs correspond to which node, and manually creating the connection links between agents (e.g., the Taxi Service server knows that the server on port 5007 is a traffic server, but it does not know how to communicate with it or what information it requires).
To showcase the portability of Agora throughout the network, we use different database technologies (SQL and MongoDB) and different LLMs, both open- and closed-source (GPT-4o, Llama-3-405B, and Gemini 1.5 Pro (Reid et al., 2024)). We then generate $1000$ random queries, which range from simple ones, such as requesting today’s weather, to more complex ones, like booking rooms in ski resorts, buying tickets for movies, ordering one of each dish from a menu, and so on. For each query, assistants receive a JSON document (which represents the task data) and are tasked with fulfilling the request and returning a parsed response that follows a given schema. Queries are distributed among assistants following a Pareto distribution, to simulate some assistants sending significantly more requests than others. Each node can also read PDs from, and share PDs with, one of three protocol databases. Overall, these design decisions result in a very heterogeneous network, testing the limits of Agora. Refer to Appendix D for further implementation details.
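The skewed query load can be sketched as follows; the Pareto shape parameter and the RNG seed are illustrative choices, not the paper's parameters:

```python
import random

def assign_queries(n_queries: int, n_assistants: int,
                   alpha: float = 1.16, seed: int = 0):
    """Distribute queries among assistants with Pareto-distributed
    weights, so that a few assistants originate most of the traffic."""
    rng = random.Random(seed)
    weights = [rng.paretovariate(alpha) for _ in range(n_assistants)]
    return rng.choices(range(n_assistants), weights=weights, k=n_queries)

load = assign_queries(1000, 85)
# The busiest assistants handle a disproportionate share of the 1000 queries,
# which is what makes routine-writing pay off unevenly across the network.
```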
Emergent protocols in large networks.
Once the connections are established and the network can send and receive messages, we observe several noteworthy behaviours. As PDs are progressively shared between agents (see Figure 5(b)), we observe the emergence of a decentralised consensus on the appropriate protocols for a given task. An example of this behaviour involves ordering food from restaurants: an agent queries another to request food to be delivered to a certain address. The restaurant agent requests a delivery driver from a food delivery service, which, in turn, checks with the traffic data agent to see whether the traffic is smooth enough to fulfil the delivery. None of the agents knows the others’ roles or the protocols involved beyond its immediate communications. Still, the interaction of the various agents creates an automated workflow that takes care of everything. The emergence of such a protocol is illustrated in Figure 4 (right). In contrast with some recent literature on the emergence of complex protocols (Chaabouni et al., 2019), we observe that with the proper incentives (i.e., efficiency), agents in Agora escape the inefficient trap of committing to longer messages in large-scale communications.
A cost analysis.
We compare the cost of running our Agora network against one that uses natural language for all communications. As shown in Figure 5(a), Agora’s cost-efficiency initially outperforms the natural-language-only network only marginally; the gap then widens over time, as progressively more Agora-powered nodes rely on LLM-written routines. The overall API cost of running $1000$ queries is $36.23$ USD for the natural-language network, compared to Agora’s $7.67$ USD: in other words, executing this demo with Agora is approximately five times cheaper than with natural language alone. Continuing the demo for more queries would have led to an even larger cost difference.
<details>
<summary>x1.png Details</summary>

Line chart: cost (USD) per query over $1000$ queries. The natural-language network fluctuates around $0.033$–$0.040$ USD per query, while Agora drops steeply from about $0.029$ USD to a stable $0.004$–$0.006$ USD after roughly the first $100$ queries.
</details>
(a) Cost comparison of natural language vs Agora on a network of $100$ agents. Costs are averaged with a window size of $100$ .
<details>
<summary>x2.png Details</summary>

Line chart: percentage of queries handled with LLMs (left y-axis) and number of negotiated protocols (right y-axis) over $1000$ queries. The share of LLM-handled queries falls from roughly $82\%$ to about $30\%$, while the number of protocols increases step-wise as PDs are negotiated and shared.
* Increases to 10 protocols around 200 queries and stays there until approximately 300 queries.
* Increases to 12 protocols around 300 queries and stays there until approximately 400 queries.
* Increases to 14 protocols around 400 queries and stays there until approximately 1000 queries.
* **Approximate Key Points (Queries, Number of Protocols):**
* (0, 0)
* (20, 2)
* (50, 4)
* (100, 6)
* (150, 8)
* (200, 10)
* (300, 12)
* (400, 14)
* (1000, 14)
### Key Observations
* There is a strong inverse correlation between the "Queries with LLMs (%)" and the "Number of Protocols" in the initial phase of the queries. As the number of protocols increases, the percentage of queries handled by LLMs generally decreases.
* The "Number of Protocols" increases in discrete steps, suggesting a system where protocols are added or enabled at certain query thresholds.
* The "Queries with LLMs (%)" shows a significant initial drop, then fluctuates, and eventually stabilizes at a lower percentage as the number of protocols reaches its maximum.
* The most pronounced decrease in LLM usage occurs when the number of protocols is increasing from 0 to 10.
* Once the number of protocols reaches 14 (around 400 queries), the "Queries with LLMs (%)" appears to stabilize within a range of approximately 25% to 35%, despite continued fluctuations.
### Interpretation
The data suggests a system where the introduction or increase in the number of protocols leads to a decrease in the proportion of queries handled by LLMs. This could imply that as more complex or specialized protocols are introduced, the system relies less on general-purpose LLMs and more on specific protocol handlers.
The step-wise increase in the "Number of Protocols" indicates a controlled or staged rollout of these protocols. The initial sharp decline in LLM usage (from ~82% to ~42% within the first 100 queries) coincides with the initial increases in protocols (from 0 to 6). This suggests that the early addition of protocols has the most significant impact on reducing LLM reliance.
The subsequent fluctuations in LLM usage after the number of protocols stabilizes at 14 might represent the dynamic nature of query routing or the interaction between LLMs and the various protocols. The stabilization of LLM usage within a certain range (25-35%) after reaching the maximum number of protocols could indicate that the system has reached a steady state where LLMs handle a baseline percentage of queries, possibly for tasks not covered by the existing protocols or for fallback mechanisms.
In essence, the chart demonstrates a trade-off or a shift in query handling strategy as the system's protocol complexity increases. The system appears to be designed to offload more specific tasks to dedicated protocols as they become available, thereby reducing the burden on general LLMs. This could be for reasons of efficiency, accuracy for specific tasks, or cost.
</details>
(b) The number of queries to the LLMs in Agora decreases over time as the number of established protocol documents (PDs) grows.
Figure 5: Summary of the efficiency of Agora for the demo with 100 agents.
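The dynamic behind Figure 5b can be sketched as a fallback rule: an agent answers a query with an established routine when one exists for that task type, and only otherwise falls back to the LLM. The class, method names, and the negotiate-after-one-query trigger below are illustrative assumptions, not Agora's actual API:

```python
# Hypothetical sketch of the fallback logic behind Figure 5b. All names are
# illustrative; a real Agora agent negotiates a routine only after repeated
# natural-language traffic, not after a single query as done here.

class Agent:
    def __init__(self):
        self.protocols = {}   # task type -> negotiated routine
        self.llm_calls = 0
        self.total = 0

    def handle(self, task_type, payload):
        self.total += 1
        routine = self.protocols.get(task_type)
        if routine is not None:
            return routine(payload)          # cheap, structured path
        self.llm_calls += 1                  # expensive natural-language path
        result = f"LLM-handled: {payload}"
        # Immediately install a routine for this task type (simplification).
        self.protocols[task_type] = lambda p: f"routine-handled: {p}"
        return result

    def llm_fraction(self):
        return self.llm_calls / self.total

agent = Agent()
for i in range(100):
    agent.handle("book_flight", f"request {i}")
# Only the first query needed the LLM; the rest used the negotiated routine.
assert agent.llm_fraction() == 0.01
```

As more task types acquire routines, the fraction of LLM-handled queries drops toward a floor of novel or unmatched requests, mirroring the stabilisation seen in the chart.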
6 Conclusions
In this paper, we introduced Agora, a meta protocol that sidesteps the Agent Communication Trilemma by using a mix of natural language and structured protocols. We showed that Agora agents can negotiate, implement and use protocols, creating self-organising networks that solve complex tasks. Additionally, we demonstrated the scalability of Agora by testing a 100-agent demo and achieving a five-fold reduction in costs compared to natural language-only communication. Our results showcase the power of negotiation as a basis for efficient, scalable, and decentralised agent networks. As LLMs continue to improve and as interactions between them increase, LLM-powered agent networks have the potential to surpass the scale limitations of single LLMs. Developing frameworks and protocols that enable decentralised, flexible and efficient communication, either through Agora or other technologies, can lay the foundations for a future where complex activities are partially, if not fully, automated by LLMs.
Acknowledgements
We thank the Alan Turing Institute for providing the computational power to run our agent network, as well as SambaNova for providing credits for our Llama 3 experiments. Samuele Marro is funded by Microsoft Research Ltd. Emanuele La Malfa is funded by the Alan Turing Institute. Jesse Wright is funded by the Department of Computer Science of the University of Oxford.
References
- Achiam et al. (2023) Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
- Agashe et al. (2023) Saaket Agashe, Yue Fan, and Xin Eric Wang. Evaluating multi-agent coordination abilities in large language models. arXiv preprint arXiv:2310.03903, 2023.
- Arulkumaran et al. (2017) Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, and Anil Anthony Bharath. Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine, 34(6):26–38, 2017.
- Bakhtin et al. (2022) Anton Bakhtin, Noam Brown, Emily Dinan, Gabriele Farina, Colin Flaherty, Daniel Fried, Andrew Goff, Jonathan Gray, Hengyuan Hu, et al. (Meta Fundamental AI Research Diplomacy Team (FAIR)). Human-level play in the game of diplomacy by combining language models with strategic reasoning. Science, 378(6624):1067–1074, 2022.
- Beckett et al. (2014) David Beckett, Tim Berners-Lee, Eric Prud’hommeaux, and Gavin Carothers. Rdf 1.1 turtle. World Wide Web Consortium, 2014.
- Benet (2014) Juan Benet. Ipfs-content addressed, versioned, p2p file system. arXiv preprint arXiv:1407.3561, 2014.
- Berners-Lee et al. (2001) Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic web. Scientific american, 284(5):34–43, 2001.
- Brown et al. (2020) Tom B. Brown, Benjamin Mann, Nick Ryder, et al. Language models are few-shot learners. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020. URL https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html.
- Chaabouni et al. (2019) Rahma Chaabouni, Eugene Kharitonov, Emmanuel Dupoux, and Marco Baroni. Anti-efficient encoding in emergent communication. Advances in Neural Information Processing Systems, 32, 2019.
- Chaabouni et al. (2022) Rahma Chaabouni, Florian Strub, Florent Altché, Eugene Tarassov, Corentin Tallec, Elnaz Davoodi, Kory Wallace Mathewson, Olivier Tieleman, Angeliki Lazaridou, and Bilal Piot. Emergent communication at scale. In International conference on learning representations, 2022.
- Chen et al. (2023) Huaben Chen, Wenkang Ji, Lufeng Xu, and Shiyu Zhao. Multi-agent consensus seeking via large language models. arXiv preprint arXiv:2310.20151, 2023.
- Chen et al. (2024) Yongchao Chen, Jacob Arkin, Yang Zhang, Nicholas Roy, and Chuchu Fan. Scalable multi-robot collaboration with large language models: Centralized or decentralized systems? In 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 4311–4317. IEEE, 2024.
- Collins et al. (2022) Katherine M Collins, Catherine Wong, Jiahai Feng, Megan Wei, and Joshua B Tenenbaum. Structured, flexible, and robust: benchmarking and improving large language models towards more human-like behavior in out-of-distribution reasoning tasks. arXiv preprint arXiv:2205.05718, 2022.
- Cross et al. (2024) Logan Cross, Violet Xiang, Agam Bhatia, Daniel LK Yamins, and Nick Haber. Hypothetical minds: Scaffolding theory of mind for multi-agent tasks with large language models. arXiv preprint arXiv:2407.07086, 2024.
- Deshpande et al. (2023) Ameet Deshpande, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, and Karthik Narasimhan. Toxicity in chatgpt: Analyzing persona-assigned language models. arXiv preprint arXiv:2304.05335, 2023.
- Du et al. (2023) Yilun Du, Shuang Li, Antonio Torralba, Joshua B Tenenbaum, and Igor Mordatch. Improving factuality and reasoning in language models through multiagent debate. arXiv preprint arXiv:2305.14325, 2023.
- Dubey et al. (2024a) Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, et al. The llama 3 herd of models. 2024a. URL https://api.semanticscholar.org/CorpusID:271571434.
- Dubey et al. (2024b) Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024b.
- Fedorenko et al. (2024) Evelina Fedorenko, Steven T. Piantadosi, and Edward A. F. Gibson. Language is primarily a tool for communication rather than thought. Nature, 630, 2024.
- Fu et al. (2023) Yao Fu, Hao Peng, Tushar Khot, and Mirella Lapata. Improving language model negotiation with self-play and in-context learning from ai feedback. arXiv preprint arXiv:2305.10142, 2023.
- Gao et al. (2024) Chen Gao, Fengli Xu, Xu Chen, Xiang Wang, Xiangnan He, and Yong Li. Simulating human society with large language model agents: City, social media, and economic system. In Companion Proceedings of the ACM on Web Conference 2024, pp. 1290–1293, 2024.
- Gilbert (2019) Nigel Gilbert. Agent-based models. Sage Publications, 2019.
- Gilbert & Terna (2000) Nigel Gilbert and Pietro Terna. How to build and use agent-based models in social science. Mind & Society, 1:57–72, 2000.
- Guo et al. (2024) Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V Chawla, Olaf Wiest, and Xiangliang Zhang. Large language model based multi-agents: A survey of progress and challenges. arXiv preprint arXiv:2402.01680, 2024.
- Havrylov & Titov (2017) Serhii Havrylov and Ivan Titov. Emergence of language with multi-agent games: Learning to communicate with sequences of symbols. Advances in neural information processing systems, 30, 2017.
- Henderson et al. (2018) Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and David Meger. Deep reinforcement learning that matters. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018.
- Hong et al. (2023) Sirui Hong, Xiawu Zheng, Jonathan Chen, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, et al. Metagpt: Meta programming for multi-agent collaborative framework. arXiv preprint arXiv:2308.00352, 2023.
- Horty (2001) John F Horty. Agency and deontic logic. Oxford University Press, 2001.
- Hou et al. (2023) Xinyi Hou, Yanjie Zhao, Yue Liu, Zhou Yang, Kailong Wang, Li Li, Xiapu Luo, David Lo, John Grundy, and Haoyu Wang. Large language models for software engineering: A systematic literature review. ACM Transactions on Software Engineering and Methodology, 2023.
- Hu et al. (2021) Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
- Hua et al. (2023) Wenyue Hua, Lizhou Fan, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, and Yongfeng Zhang. War and peace (waragent): Large language model-based multi-agent simulation of world wars. arXiv preprint arXiv:2311.17227, 2023.
- Jin et al. (2023) Bowen Jin, Gang Liu, Chi Han, Meng Jiang, Heng Ji, and Jiawei Han. Large language models on graphs: A comprehensive survey. arXiv preprint arXiv:2312.02783, 2023.
- Joshi et al. (2023) Nitish Joshi, Javier Rando, Abulhair Saparov, Najoung Kim, and He He. Personas as a way to model truthfulness in language models. arXiv preprint arXiv:2310.18168, 2023.
- Kaplan et al. (2020) Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020.
- Kassner et al. (2020) Nora Kassner, Benno Krojer, and Hinrich Schütze. Are pretrained language models symbolic reasoners over knowledge? arXiv preprint arXiv:2006.10413, 2020.
- La Malfa et al. (2023) Emanuele La Malfa, Aleksandar Petrov, Simon Frieder, Christoph Weinhuber, Ryan Burnell, Raza Nazar, Anthony G Cohn, Nigel Shadbolt, and Michael Wooldridge. Language models as a service: Overview of a new paradigm and its challenges. arXiv e-prints, pp. arXiv–2309, 2023.
- Lazaridou & Baroni (2020) Angeliki Lazaridou and Marco Baroni. Emergent multi-agent communication in the deep learning era. arXiv preprint arXiv:2006.02419, 2020.
- Lazaridou et al. (2018) Angeliki Lazaridou, Karl Moritz Hermann, Karl Tuyls, and Stephen Clark. Emergence of linguistic communication from referential games with symbolic and pixel input. arXiv preprint arXiv:1804.03984, 2018.
- Li et al. (2023a) Guohao Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. Camel: Communicative agents for "mind" exploration of large language model society. Advances in Neural Information Processing Systems, 36:51991–52008, 2023a.
- Li et al. (2023b) Huao Li, Yu Quan Chong, Simon Stepputtis, Joseph Campbell, Dana Hughes, Michael Lewis, and Katia Sycara. Theory of mind for multi-agent collaboration via large language models. arXiv preprint arXiv:2310.10701, 2023b.
- Li et al. (2024) Junyou Li, Qin Zhang, Yangbin Yu, Qiang Fu, and Deheng Ye. More agents is all you need. arXiv preprint arXiv:2402.05120, 2024.
- Liang et al. (2023) Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Zhaopeng Tu, and Shuming Shi. Encouraging divergent thinking in large language models through multi-agent debate. arXiv preprint arXiv:2305.19118, 2023.
- Lin et al. (2024) Fangru Lin, Emanuele La Malfa, Valentin Hofmann, Elle Michelle Yang, Anthony Cohn, and Janet B Pierrehumbert. Graph-enhanced large language models in asynchronous plan reasoning. arXiv preprint arXiv:2402.02805, 2024.
- Liu et al. (2024) Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation. Advances in Neural Information Processing Systems, 36, 2024.
- OBL (2024) OBL. Open banking read write api profile v4.0. 2024. URL https://openbankinguk.github.io/read-write-api-site3/v4.0/profiles/read-write-data-api-profile.html.
- Olivé (2007) Antoni Olivé. Conceptual modeling of information systems. Springer Science & Business Media, 2007.
- Pang et al. (2024) Xianghe Pang, Shuo Tang, Rui Ye, Yuxin Xiong, Bolun Zhang, Yanfeng Wang, and Siheng Chen. Self-alignment of large language models via multi-agent social simulation. In ICLR 2024 Workshop on Large Language Model (LLM) Agents, 2024.
- Rafailov et al. (2024) Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36, 2024.
- Reid et al. (2024) Machel Reid, Nikolay Savinov, Denis Teplyashin, Dmitry Lepikhin, Timothy Lillicrap, Jean-baptiste Alayrac, Radu Soricut, Angeliki Lazaridou, Orhan Firat, Julian Schrittwieser, et al. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv preprint arXiv:2403.05530, 2024.
- Russell & Norvig (2016) Stuart J Russell and Peter Norvig. Artificial intelligence: a modern approach. Pearson, 2016.
- Schick et al. (2024) Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems, 36, 2024.
- Schulman et al. (2017) John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
- Singh et al. (2024) Ishika Singh, David Traum, and Jesse Thomason. Twostep: Multi-agent task planning using classical planners and large language models. arXiv preprint arXiv:2403.17246, 2024.
- Song et al. (2023) Junghwan Song, Heeyoung Jung, Selin Chun, Hyunwoo Lee, Minhyeok Kang, Minkyung Park, Eunsang Cho, et al. How to decentralize the internet: A focus on data consolidation and user privacy. Computer Networks, 234:109911, 2023.
- Sutton (2018) Richard S Sutton. Reinforcement learning: An introduction. A Bradford Book, 2018.
- Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems 30: 2017, December 4-9, 2017, Long Beach, CA, USA, pp. 5998–6008, 2017. URL https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
- Wooldridge (2009) Michael Wooldridge. An introduction to multiagent systems. John wiley & sons, 2009.
- Wooldridge & Jennings (1995) Michael Wooldridge and Nicholas R Jennings. Intelligent agents: Theory and practice. The knowledge engineering review, 10(2):115–152, 1995.
- Wu et al. (2024a) Shuang Wu, Liwen Zhu, Tao Yang, Shiwei Xu, Qiang Fu, Yang Wei, and Haobo Fu. Enhance reasoning for large language models in the game werewolf. arXiv preprint arXiv:2402.02330, 2024a.
- Wu et al. (2024b) Zengqing Wu, Shuyuan Zheng, Qianying Liu, Xu Han, Brian Inhyuk Kwon, Makoto Onizuka, Shaojie Tang, Run Peng, and Chuan Xiao. Shall we talk: Exploring spontaneous collaborations of competing llm agents. arXiv preprint arXiv:2402.12327, 2024b.
- Zhang et al. (2024) Biao Zhang, Zhongtao Liu, Colin Cherry, and Orhan Firat. When scaling meets llm finetuning: The effect of data, model and finetuning method. arXiv preprint arXiv:2402.17193, 2024.
Appendix A Agora: Use Cases
S1. Agora as a personal assistant.
A user is organising a trip to Paris: they want to book a flight, rent a car, and book a hotel room.
The LLM reads the prompt, identifies the actions it must undertake, and checks whether there are LLMs available in Agora that can fulfil them. For each service, an LLM is ready to reply.
1. A user sends a message to its personal assistant.
2. The personal assistant dispatches it to Agora.
<details>
<summary>img/scenarios/s1-1.png Details</summary>

### Visual Description
## Diagram: User Request to LLM Assistant Dispatch
### Overview
This diagram illustrates a process flow, starting with a user's natural language request for travel-related services, which is then processed and dispatched to an LLM's assistant. The assistant appears to represent a system that can handle multiple, interconnected tasks.
### Components/Axes
The diagram consists of three main sections:
1. **User Input:**
* A silhouette of a human head, indicating a user.
* A smartphone icon, representing the device used for input.
* A text box containing the user's request: "I want to book a **flight**, a **hotel** and a **car** for next week in Paris."
* The words "flight" and "hotel" are highlighted in blue and orange, respectively.
* The word "car" is highlighted in red.
2. **Process/Dispatch:**
* A horizontal arrow pointing from the user input section to the right, signifying the direction of the process.
* Text above the arrow reads: "Dispatch to the LLMs’ assistant".
3. **LLM Assistant Representation:**
* A graph-like structure composed of four interconnected nodes (circles).
* Each node contains an icon representing a service:
* **Top Node:** An airplane icon, representing flights.
* **Right Node:** A car icon, representing car rentals.
* **Bottom Node:** A building icon, representing hotels.
* **Left Node:** An empty circle, which may represent a general task or an unassigned service.
* Lines connect the nodes, indicating potential relationships or dependencies between the services. Specifically:
* The airplane node is connected to the empty node and the car node.
* The car node is connected to the airplane node, the empty node, and the building node.
* The building node is connected to the car node and the empty node.
* The empty node is connected to the airplane node, the car node, and the building node.
* There is also a central horizontal line connecting the left and right nodes, and a central vertical line connecting the top and bottom nodes, forming a grid-like structure within the diamond shape of the connections.
### Detailed Analysis or Content Details
The diagram visually represents the transformation of a user's spoken or typed request into a structured format that can be handled by an AI assistant.
* The user's request is a multi-intent query, asking for three distinct services: flights, hotels, and cars, with specific temporal (next week) and locational (Paris) constraints.
* The "Dispatch to the LLMs’ assistant" text indicates that the raw request is processed and then routed to a more sophisticated system.
* The graph structure on the right suggests that the LLM's assistant can manage these services as distinct but potentially related entities. The presence of an empty node might signify a placeholder for a general task, a coordination node, or a service that was not explicitly requested but is implicitly required (e.g., booking confirmation). The interconnections suggest that the assistant can manage dependencies or parallel processing of these requests.
### Key Observations
* The diagram highlights the ability of an LLM assistant to parse complex, multi-part natural language requests.
* The visual representation of the LLM assistant as a connected graph implies a modular or service-oriented architecture for handling different travel components.
* The distinct icons within the nodes clearly categorize the types of services being managed.
### Interpretation
This diagram demonstrates a conceptual workflow for how a large language model (LLM) assistant might process a user's travel booking request. The initial natural language input is parsed, identifying key entities and intents (flight, hotel, car, location, time). This parsed information is then "dispatched" to the LLM's assistant, which is depicted as a network of interconnected services.
The interconnected nodes suggest that the assistant can manage these services in a coordinated manner. For instance, booking a flight might influence the availability or pricing of hotels, or vice-versa. The empty node could represent a central orchestrator or a generic task that needs to be fulfilled, such as confirming all bookings or handling payment. The visual representation implies that the LLM assistant is not just a simple command interpreter but a system capable of understanding and managing complex, multi-faceted requests by breaking them down into constituent services and their relationships. This is a fundamental step in building sophisticated AI agents that can perform real-world tasks.
</details>
The LLM that acts as personal assistant in the network dispatches the flight, hotel and car requests to the respective LLMs in the network. The messages are dispatched in natural language as there are no pre-existing routines to handle them.
1. The LLM personal assistant dispatches the respective messages to the right node.
2. The car, hotel, and flight LLMs process the requests and turn them into queries for their booking systems.
3. Each LLM replies with its availability and options.
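The dispatch in steps 1-3 can be sketched as routing each sub-request, still phrased in natural language, to the specialised agent for its service. The agent names, reply formats, and option counts below are hypothetical:

```python
# Illustrative sketch of the dispatch step. Each specialist stands in for an
# LLM agent that would query its own booking system; replies are hypothetical.

SPECIALISTS = {
    "flight": lambda msg: f"flight agent: 3 options for '{msg}'",
    "hotel":  lambda msg: f"hotel agent: 5 options for '{msg}'",
    "car":    lambda msg: f"car agent: 2 options for '{msg}'",
}

def dispatch(sub_requests):
    """Send each (service, natural-language message) pair to its specialist."""
    return {service: SPECIALISTS[service](msg) for service, msg in sub_requests}

replies = dispatch([
    ("flight", "Book a flight for Paris"),
    ("hotel", "Book a hotel room in Paris"),
    ("car", "Book a car in Paris"),
])
assert set(replies) == {"flight", "hotel", "car"}
```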
<details>
<summary>img/scenarios/s1-2.png Details</summary>

### Visual Description
## Diagram: Travel Booking Intent Flow
### Overview
This diagram illustrates a potential flow of user intents related to booking travel services in Paris. It depicts a starting point (represented by a blank circle) from which different booking actions can be initiated, leading to specific travel service icons.
### Components/Axes
The diagram consists of:
* **A blank circle:** Located on the left side of the diagram. This likely represents an initial user state or a general query.
* **Three distinct icons within circles:**
* An **airplane icon**: Located in the top-right quadrant.
* A **car icon**: Located in the right-center of the diagram.
* A **building icon (representing a hotel)**: Located in the bottom-right quadrant.
* **Text labels with quotation marks:** These describe the specific intents or actions associated with the arrows.
* "Book a flight for Paris"
* "Book a car in Paris"
* "Book a hotel room in Paris"
* **Solid arrows:** These indicate a direct transition or a primary path from the initial state to a specific booking intent.
* **Dashed lines:** These connect the travel service icons, suggesting potential relationships or alternative paths between them, though they do not have explicit labels or directional arrows.
### Detailed Analysis or Content Details
The diagram shows the following connections:
1. **From the blank circle to the airplane icon:** A solid arrow originates from the blank circle and points towards the airplane icon. This arrow is labeled with the text "Book a flight for Paris".
2. **From the blank circle to the car icon:** A solid arrow originates from the blank circle and points towards the car icon. This arrow is labeled with the text "Book a car in Paris".
3. **From the blank circle to the hotel icon:** A solid arrow originates from the blank circle and points towards the hotel icon. This arrow is labeled with the text "Book a hotel room in Paris".
Additionally, there are dashed lines connecting:
* The airplane icon to the car icon.
* The airplane icon to the hotel icon.
* The car icon to the hotel icon.
These dashed lines suggest that once a user has expressed an intent for one service (e.g., booking a flight), they might subsequently consider or be presented with options for other services (e.g., booking a car or hotel).
### Key Observations
* The diagram clearly maps three distinct booking intents originating from a single, undefined starting point.
* The use of solid arrows for direct intents and dashed lines for potential secondary connections is a key visual distinction.
* All intents are explicitly tied to the location "Paris".
### Interpretation
This diagram appears to represent a simplified intent recognition or dialogue management system for a travel booking application. The blank circle could symbolize a user's initial, perhaps vague, request or a general entry point into the booking system. The solid arrows and their associated text labels indicate that the system can directly interpret and act upon specific user commands like "Book a flight for Paris," "Book a car in Paris," or "Book a hotel room in Paris."
The dashed lines suggest a potential for follow-on actions or related intents. For example, after a user expresses interest in booking a flight, the system might infer or suggest that they also need to book a hotel or a car. This could be part of a conversational flow where the system proactively offers related services to enhance the user experience and facilitate a complete travel plan. The lack of arrows on the dashed lines implies these are not necessarily sequential steps but rather related options or possibilities. The consistent mention of "Paris" across all intents highlights the geographical focus of this particular flow.
</details>
In subsequent iterations, the LLMs involved in the request negotiate a routine that standardises it, so that future requests avoid natural language and can be processed without invoking the LLMs.
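One way such a negotiated routine could look is a shared structured schema: the sender serialises the request as JSON and the receiver validates and handles it programmatically, with no LLM in the loop. The field names and schema below are illustrative assumptions, not the schema the agents would actually negotiate:

```python
# Hypothetical sketch of a negotiated routine: booking requests travel as
# structured JSON under an agreed (illustrative) schema instead of free text.
import json

def encode_booking(service, city, week):
    """Sender side: serialise a request under the agreed schema."""
    return json.dumps({"service": service, "city": city, "week": week})

def handle_booking(message):
    """Receiver side: parse and validate without invoking an LLM."""
    request = json.loads(message)
    assert request["service"] in {"flight", "hotel", "car"}
    return f"confirmed {request['service']} in {request['city']}"

msg = encode_booking("hotel", "Paris", "2024-W42")
assert handle_booking(msg) == "confirmed hotel in Paris"
```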
<details>
<summary>img/scenarios/s1-3.png Details</summary>

### Visual Description
## Diagrams: Travel Service Integration Models
### Overview
The image shows two diagrams, labeled 1 and 2, contrasting ways of integrating travel services. In Diagram 1, a central agent queries three providers (flights, cars, hotels) directly, with arrows labeled `<list of flights>`, `<list of cars>`, and `<list of hotels>`; the client must handle each provider's distinct interface and data format itself, which limits scalability. In Diagram 2, a central protocol document mediates the exchanges: each provider communicates with it via an arrow labeled "protocol", standardising the interactions in the style of an API gateway. Dashed grey lines between the providers in both diagrams hint at indirect relationships that are not part of the main flow.
</details>
The user receives all the data and decides whether to book or not.
S2. Security and scalability.
An LLM (Alice) collects some historical data from another LLM (Bob) that has access to a database whose internal mechanism and implementation must be kept private.
Alice submits a request to collect some historical records from Bob. The request is formatted in natural language.
<details>
<summary>img/scenarios/s2-1.png Details</summary>

### Visual Description
## Diagram: Interaction between LLM Agents and Data Storage
Two LLM agents, A and B, and a data store. A solid arrow from A to B carries the natural-language request "I need the records from 2012". A double-headed arrow connects B to a database icon that sits behind a dashed boundary labelled "Not Accessible by other LLMs": only B can query the store, so A must delegate its request to B. A dotted circle between A and B marks a potential, indirect connection.
</details>
Alice submits another request to Bob.
Bob negotiates a protocol with Alice for querying its data and writes it down as a shared protocol document in JSON.
<details>
<summary>img/scenarios/s2-2.png Details</summary>

### Visual Description
## Diagram: Interaction Protocol Between Entities
Agents A and B exchange a pair of arrows, each labelled `<negotiation of the protocol>`, indicating a two-way agreement on how to communicate. A further arrow from B to a document icon, labelled *protocol*, shows the negotiated rules being written down as a protocol document. As before, B alone holds a bidirectional connection to the data store, which remains inside the region marked "Not Accessible by other LLMs". A dotted circle between A and B marks a potential, indirect connection.
</details>
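A protocol document of the kind Bob writes might look like the following sketch. The field names and the single `record` parameter are illustrative assumptions, not part of the paper's specification:

```python
import json

# Hypothetical protocol document: a machine-readable description of how
# to query Bob's historical records (all field names are illustrative).
PROTOCOL_DOCUMENT = json.dumps({
    "name": "historical-record-query",
    "description": "Request historical records by year.",
    "request": {
        "type": "object",
        "properties": {"record": {"type": "integer", "description": "Year"}},
        "required": ["record"],
    },
    "response": {"type": "array", "items": {"type": "object"}},
})

def make_request(year: int) -> str:
    """Build a request message that conforms to the protocol document."""
    return json.dumps({"record": year})

print(make_request(2013))  # -> {"record": 2013}
```

Once both sides hold this document, Alice can generate conforming messages with plain code and no further LLM calls.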
Alice now uses the protocol to query data from Bob.
Bob directly turns the JSON it receives from Alice into a query for its database. In this way, Bob does not need to invoke the LLM, and the database internals are not exposed.
<details>
<summary>img/scenarios/s2-3.png Details</summary>

### Visual Description
## Diagram: Data Flow and Accessibility in a System
Agent A now sends agent B a protocol-formatted message, shown as a document icon next to the JSON snippet `{ "record": 2013 }`. B exchanges data bidirectionally with the store, which stays inside the dashed region labelled "Not Accessible by other LLMs": structured requests reach the data only through B, so the database internals are never exposed to other LLMs.
</details>
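Bob's side of this exchange can be sketched as below: an incoming JSON request is turned directly into a parameterised database query, with no LLM call on the hot path. The table name, schema, and sample data are assumptions for illustration:

```python
import json
import sqlite3

# Toy in-memory database standing in for Bob's private store
# (schema and contents are illustrative).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records (year INTEGER, value REAL)")
db.executemany("INSERT INTO records VALUES (?, ?)",
               [(2012, 1.5), (2013, 2.0), (2013, 2.5)])

def handle_request(message: str) -> str:
    """Turn a protocol-conformant JSON request into a parameterised query.

    No LLM is invoked, and the SQL never leaves this function, so the
    database internals stay hidden from Alice.
    """
    request = json.loads(message)
    rows = db.execute(
        "SELECT year, value FROM records WHERE year = ?",
        (request["record"],),
    ).fetchall()
    return json.dumps([{"year": y, "value": v} for y, v in rows])

print(handle_request('{"record": 2013}'))
```

Because the request format is fixed by the protocol document, Bob can serve arbitrarily many such queries at the cost of ordinary database access.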
S3. Compositional tasks.
An LLM (Alice) wants to (1) analyse some market data and then (2) compute some metrics on them. Two LLMs in the network can each perform one of these tasks.
1. Alice retrieves the protocol documents from a database. 2. Alice finds two protocol documents that can be used to achieve its goal.
<details>
<summary>img/scenarios/s3-1.png Details</summary>

### Visual Description
## Diagram: System Interaction Flow
Agent A is connected by a double-headed arrow to a set of document icons, annotated "Check if a protocol document exists"; below A, the caption reads "Retrieve some numerical data and compute some metrics." Dashed lines link A to a database icon and a calculator icon, and the two icons to each other: these are the two agents that can serve the retrieval and computation halves of the task.
</details>
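Alice's lookup in steps 1–2 can be sketched as a search over a protocol-document database. The keyword matching below is a deliberately naive stand-in for whatever retrieval mechanism the network actually uses, and all names are hypothetical:

```python
# Hypothetical protocol-document database: name -> short description.
PROTOCOL_DB = {
    "market-data-retrieval": "Retrieve historical market data as JSON records.",
    "metric-computation": "Compute summary metrics over numerical data.",
    "hotel-booking": "Search and book hotel rooms.",
}

def find_protocols(task_keywords: list[str]) -> list[str]:
    """Return names of protocol documents whose description mentions any
    of the task's keywords (naive retrieval, for illustration only)."""
    matches = []
    for name, description in PROTOCOL_DB.items():
        text = description.lower()
        if any(kw.lower() in text for kw in task_keywords):
            matches.append(name)
    return matches

# Alice's goal decomposes into two sub-tasks, each served by one document.
print(find_protocols(["market data"]))  # -> ['market-data-retrieval']
print(find_protocols(["metrics"]))      # -> ['metric-computation']
```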
1. Alice submits a request to the first agent to retrieve the data using the first protocol document. 2. Alice receives the data as expected.
<details>
<summary>img/scenarios/s3-2.png Details</summary>

### Visual Description
## Diagram: Data Processing Flow
Two numbered panels. In panel 1, agent A and the database agent (a stack-of-discs icon) are connected via the first protocol document (a document icon); in panel 2, they are connected via a scatter plot representing the exchanged data. The protocol document governs the request, and the data comes back in the agreed form.
</details>
1. Alice submits a request to the second LLM to compute some metrics on the data using the second protocol document. 2. Alice receives the metrics as expected.
<details>
<summary>img/scenarios/s3-3.png Details</summary>

### Visual Description
## Diagram: Data Processing Flow
Two numbered panels, each involving a calculator icon (the metrics agent). In panel 1, agent A sends the second protocol document together with the retrieved data, drawn as a document icon joined to a scatter-plot icon by a plus sign. In panel 2, a bar chart, representing the computed metrics, connects A and the calculator. The same agent A thus drives both the request and the receipt of results.
</details>
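Putting the two steps together, Alice's client-side logic amounts to chaining two protocol-driven calls. In the sketch below, `retrieve_data` and `compute_metrics` are local stand-ins for the messages sent to the two agents under their respective protocol documents, and the message format is an assumption:

```python
import json
import statistics

# Stand-ins for the two remote agents; in the scenario these would be
# network calls governed by the two protocol documents.
def retrieve_data(request: str) -> str:
    assert json.loads(request)["protocol"] == "market-data-retrieval"
    return json.dumps([10.0, 12.0, 11.0, 13.0])  # dummy market data

def compute_metrics(request: str) -> str:
    body = json.loads(request)
    assert body["protocol"] == "metric-computation"
    data = body["data"]
    return json.dumps({"mean": statistics.mean(data),
                       "stdev": statistics.stdev(data)})

# Alice chains the two protocols: the output of the first feeds the second.
data = json.loads(retrieve_data(json.dumps(
    {"protocol": "market-data-retrieval"})))
metrics = json.loads(compute_metrics(json.dumps(
    {"protocol": "metric-computation", "data": data})))
print(metrics["mean"])  # -> 11.5
```

Once both protocol documents exist, the whole pipeline runs as ordinary code, without any LLM invocation along the way.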
S4. Scalable consensus in large networks.
An LLM (Alice) wants to collect and aggregate data points from $N\gg 1$ resources. No existing protocol handles this, and each resource has its own implementation, which is possibly not public.
1. Alice submits the requests in natural language. 2. Each queried LLM processes the request, turns it into a routine to retrieve the data, and sends the result back to Alice.
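Steps 1–2 can be sketched as a parallel fan-out followed by aggregation. Here `query_resource` is a placeholder for the natural-language round-trip to one of the N resources, which returns a dummy data point instead of a real reply:

```python
import statistics
from concurrent.futures import ThreadPoolExecutor

def query_resource(resource_id: int) -> float:
    """Placeholder for one natural-language round-trip: in the scenario,
    the resource's LLM turns the request into its own retrieval routine
    and replies with a data point."""
    return float(resource_id % 7)  # dummy data point

def collect_and_aggregate(n_resources: int) -> float:
    """Fan the same request out to N resources and aggregate the replies."""
    with ThreadPoolExecutor(max_workers=32) as pool:
        points = list(pool.map(query_resource, range(n_resources)))
    return statistics.mean(points)

print(collect_and_aggregate(1000))
```

The fan-out is embarrassingly parallel, so the wall-clock cost is dominated by the slowest resource rather than by N.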
<details>
<summary>img/scenarios/s4-1.png Details</summary>

### Visual Description
## Diagram: Data Request and Retrieval Process
### Overview
The image displays two distinct diagrams, labeled '1' and '2', illustrating a process involving a central entity 'A' and multiple other entities represented by circles. Diagram 1 depicts a "Request data" flow from 'A' to the other entities. Diagram 2 shows a "Process and retrieve data" flow, where data is returned to 'A' from the other entities. Both diagrams use a star-like topology originating from 'A'.
### Components/Axes
* **Central Entity:** Represented by a circle labeled 'A'. This entity is the origin of requests in diagram 1 and the destination of retrieved data in diagram 2.
* **External Entities:** Represented by multiple circles arranged vertically to the right of 'A'. The number of these entities is indicated by the presence of three visible circles and an ellipsis (...), suggesting a variable or larger number of entities.
* **Flow Arrows:** Solid black arrows indicate the direction of data flow.
* **Dashed Grey Arcs:** These arcs connect the external entities in a circular fashion, suggesting a potential relationship or communication channel between them, though no explicit label is provided for this connection.
* **Labels:**
* "1" (Top-left corner of the first diagram)
* "Request data" (Above the arrows originating from 'A' in diagram 1)
* "2" (Top-left corner of the second diagram)
* "Process and retrieve data" (Above the arrows originating from the external entities and pointing to 'A' in diagram 2)
### Detailed Analysis or Content Details
**Diagram 1: Request Data**
* **Description:** Entity 'A' initiates a request for data.
* **Flow:** Solid black arrows originate from 'A' and point towards each of the external entities.
* **Textual Annotation:** "Request data" is positioned above the outgoing arrows from 'A'.
* **Implied Action:** 'A' is sending out requests to multiple sources.
**Diagram 2: Process and Retrieve Data**
* **Description:** The external entities process the requests and return data to entity 'A'.
* **Flow:** Solid black arrows originate from the external entities and point towards 'A'.
* **Textual Annotation:** "Process and retrieve data" is positioned above the incoming arrows to 'A'.
* **Implied Action:** The external entities are performing an action (processing) and then sending data back to 'A'.
**Dashed Grey Arcs:**
* **Placement:** These arcs are positioned behind the external entities, curving from the top-most entity to the bottom-most entity in each diagram.
* **Interpretation:** While not explicitly labeled, these arcs suggest a potential network or communication path among the external entities themselves, or perhaps a cyclical dependency or grouping. They are present in both diagrams, indicating a consistent structural element.
### Key Observations
* The diagrams illustrate a two-phase process: an outbound request phase and an inbound retrieval phase.
* Entity 'A' acts as a central coordinator or consumer of data.
* The external entities are the providers of data, capable of processing requests.
* The ellipsis (...) indicates that the number of external entities is not fixed at three and could be more.
* The dashed grey arcs suggest an underlying structure or relationship among the external entities that is not directly part of the primary data flow from 'A'.
### Interpretation
These diagrams visually represent a common pattern in distributed systems, client-server architectures, or data retrieval mechanisms.
* **What the data suggests or demonstrates:** Diagram 1 demonstrates a broadcast or multicast request from a central entity ('A') to multiple data sources. Diagram 2 shows the subsequent response where these data sources fulfill the request, process the information, and return it to the central entity. This could represent a query to a database cluster, a request to microservices, or a task distribution scenario.
* **How the elements relate to each other:** Entity 'A' is the initiator and receiver. The external entities are the workers or data providers. The arrows clearly define the direction of information flow. The dashed arcs hint at a potential peer-to-peer communication or a shared context among the data providers, which might be relevant for coordination or data consistency, though this is not explicitly detailed.
* **Any notable outliers, trends, or anomalies:** There are no numerical data points or trends to analyze in this diagram. It is purely a conceptual representation of a process. The "..." is a standard notation to indicate continuation or an unspecified number of items, not an anomaly. The consistent presence of the dashed arcs in both diagrams suggests they are a stable feature of the system being depicted. The diagrams are simple and clear, lacking any complex interactions or unexpected elements.
</details>
Alice wants to retrieve more data and queries the network again.
1. One or more receivers suggest using a protocol document for the next iterations.
2. Alice agrees and uses the protocol with as many resources as possible.
<details>
<summary>img/scenarios/s4-2.png Details</summary>

### Visual Description
## Diagram: Protocol Negotiation Flow
### Overview
This diagram illustrates a process where a central entity, labeled "A", initiates a "negotiation of the protocol" with multiple other entities. The diagram depicts a star-like structure originating from "A", with arrows indicating the direction of the negotiation. There are also dashed lines suggesting a potential connection or interaction among the other entities.
### Components/Axes
* **Node A**: Represented by a circle with the letter "A" inside. This node is positioned on the left side of the diagram.
* **Other Nodes**: Represented by a series of circles arranged vertically to the right of Node A. The ellipsis (...) between the second and fourth circles indicates that there are more nodes in this sequence than explicitly drawn.
* **Arrows**: Solid black arrows originate from Node A and point towards the other nodes. These arrows signify the direction of the "negotiation of the protocol".
* **Labels**: The text "<negotiation of the protocol>" is positioned alongside several of the arrows, indicating the nature of the interaction. The angle brackets suggest this is a specific type of message or action.
* **Dashed Lines**: Light gray dashed lines connect some of the "other nodes" in a curved manner, suggesting a secondary or implicit relationship between these entities.
### Detailed Analysis or Content Details
* Node A is the source of multiple outgoing connections.
* There are at least four distinct "other nodes" depicted, with the ellipsis implying a variable number of participants in the negotiation.
* The label "<negotiation of the protocol>" is associated with at least three of the outgoing connections from Node A. The text is rotated to align with the direction of the arrows.
* The dashed lines connect the second, third, and fourth depicted "other nodes" in a loop-like fashion, suggesting they might communicate or interact with each other after or during the negotiation with A.
### Key Observations
* The diagram clearly shows a centralized initiation of a protocol negotiation from entity "A".
* The use of arrows indicates a directed flow of communication or action.
* The ellipsis suggests scalability or a variable number of participants in the negotiation process.
* The dashed lines introduce an element of inter-entity communication among the recipients of the negotiation, which is distinct from the primary negotiation initiated by "A".
### Interpretation
This diagram visually represents a common pattern in distributed systems or network protocols where a central entity (A) orchestrates or initiates a negotiation process with multiple other entities. The label "<negotiation of the protocol>" implies that this is a formal step in establishing communication or agreement between the parties. The dashed lines between the other nodes suggest that once the initial negotiation with A is underway or completed, these entities may engage in their own internal communication or coordination. This could represent scenarios like:
* **Client-Server Interaction**: 'A' could be a server initiating a connection setup with multiple clients.
* **Group Communication Setup**: 'A' might be a coordinator setting up a group communication channel, and the dashed lines represent peer-to-peer communication among group members.
* **Protocol Handshake**: 'A' could be initiating a handshake with several other nodes to establish a common protocol for subsequent interactions.
The diagram highlights a two-tiered interaction: a primary, directed negotiation from a central point, followed by potential secondary, peer-to-peer interactions among the participants. The use of angle brackets for the label suggests a specific, possibly standardized, message or phase within a larger protocol.
</details>
Successive communications increasingly use protocol documents, so the receiver no longer needs to process each query with the LLM.
S5. Scaling complex NLP routines.
An LLM (Alice) wants to retrieve data from a system powered by an LLM (Bob) that, in turn, obtains its data from a search engine (i.e., the LLM is combined with a RAG). Bob has to (1) turn the natural language request into a query, (2) retrieve the data from the RAG, and (3) return a summary.
Alice queries Bob to retrieve some data. There is no routine to handle any of the three phases, so Bob has to invoke the LLM twice: first to turn the query into a format suitable for the RAG, and then to perform the summarisation.
<details>
<summary>img/scenarios/s5-1.png Details</summary>

### Visual Description
## Diagram: Illustrating a Process Flow with Two Entities
### Overview
This image displays a series of four numbered diagrams, each depicting a process flow between two entities labeled 'A' and 'B'. The diagrams illustrate different stages or interactions, with accompanying text describing the action taking place.
### Components/Axes
There are no explicit axes or legends in this diagram. The primary components are:
- **Numbered Labels:** 1, 2, 3, and 4, indicating sequential steps or distinct scenarios.
- **Entities:** Represented by circles labeled 'A' and 'B'.
- **Arrows:** Indicate the direction of flow or interaction between entities.
- **Textual Descriptions:** Enclosed in quotation marks or angle brackets, explaining the process.
- **Icon:** In diagram 3, a magnifying glass with a gear inside, representing a search or retrieval mechanism.
### Detailed Analysis
**Diagram 1:**
- **Label:** 1
- **Entities:** A and B.
- **Flow:** An arrow points from A to B.
- **Description:** "What's the highest mountain in the world" (This appears to be a query or input from A to B).
**Diagram 2:**
- **Label:** 2
- **Entities:** A and B.
- **Flow:** An arrow points from A to B.
- **Description:** <Processes the query with an LLM> (This indicates that entity B processes the query from A using a Large Language Model).
**Diagram 3:**
- **Label:** 3
- **Entities:** A and B.
- **Flow:** A double-headed arrow between A and B, and a single arrow from B pointing towards a dashed line boundary. To the right of the dashed line is an icon of a magnifying glass with a gear.
- **Description:** <Invokes the RAG> (This suggests that entity B, after processing, invokes a Retrieval-Augmented Generation (RAG) system, represented by the icon and dashed line).
**Diagram 4:**
- **Label:** 4
- **Entities:** A and B.
- **Flow:** An arrow points from B to A.
- **Description:** <Summarises the content with an LLM> (This indicates that entity B, after summarization, sends information back to A, likely using an LLM).
### Key Observations
- The diagrams illustrate a sequential process, likely related to a question-answering or information retrieval system.
- Entity A appears to be the initiator, posing a query.
- Entity B is responsible for processing the query, invoking a RAG system, and then summarizing content.
- The flow of information is primarily from A to B, with a final step of information returning from B to A.
- The icon in diagram 3 strongly suggests the involvement of a search or retrieval mechanism, consistent with RAG.
### Interpretation
This diagram series outlines a simplified workflow for a system that answers questions.
- **Diagram 1** shows entity A posing a question to entity B.
- **Diagram 2** clarifies that entity B uses a Large Language Model (LLM) to process this query.
- **Diagram 3** indicates that entity B then interacts with a Retrieval-Augmented Generation (RAG) system. The RAG system is depicted as an external component, possibly a database or knowledge base, accessed via a search/retrieval mechanism (magnifying glass with gear). The double-headed arrow between A and B in this step might imply a back-and-forth communication or that B is orchestrating the RAG process.
- **Diagram 4** shows entity B returning a summarized answer to entity A, again utilizing an LLM for the summarization.
The overall process demonstrates a common pattern in modern AI-powered question-answering systems, where an LLM is augmented with external knowledge retrieval to provide more accurate and contextually relevant answers. The progression from query to processing, retrieval, and summarization is clearly depicted.
</details>
Alice queries Bob again; this time, Bob asks to use a routine to query the RAG directly.
<details>
<summary>img/scenarios/s5-2.png Details</summary>

### Visual Description
## Diagram: Flow of Information Processing
### Overview
This image displays three distinct diagrams, each illustrating a different stage or aspect of information processing, likely within a system involving a Large Language Model (LLM) and a Retrieval Augmented Generation (RAG) component. The diagrams use circles labeled 'A' and 'B' to represent entities or processes, connected by arrows to indicate data flow. Each diagram is numbered (1, 2, and 3) and includes descriptive text below.
### Components/Axes
There are no traditional axes or legends in this diagram. The key components are:
* **Circles labeled 'A' and 'B'**: These represent abstract entities or stages in a process.
* **Arrows**: Indicate the direction of data flow or interaction between 'A' and 'B'.
* **Icons**: A document icon in Diagram 1 and a magnifying glass with gears icon in Diagram 2.
* **Dashed Boxes**: A red dashed box in Diagram 1 highlights a skipped process. A grey dashed vertical line in Diagram 2 separates 'B' from the RAG component.
* **Numbers (1, 2, 3)**: Used to sequentially label each distinct diagram.
* **Descriptive Text (in angle brackets)**: Provides context for each diagram's depicted process.
### Detailed Analysis
**Diagram 1:**
* **Label:** 1
* **Components:** Circle 'A', Circle 'B', a document icon next to 'A', a solid arrow pointing from 'A' to 'B'. A red dashed box encloses a greyed-out representation of 'A' and 'B' connected by a grey line, with the text "SKIP" and "<Processes the query with an LLM>" below it.
* **Text:** `<Formatted query>` below the arrow from 'A' to 'B'.
* **Description:** This diagram shows a formatted query originating from 'A' and being sent to 'B'. The red dashed box indicates that a process involving 'A' and 'B' ("Processes the query with an LLM") is skipped.
**Diagram 2:**
* **Label:** 2
* **Components:** Circle 'A', Circle 'B', a solid arrow pointing from 'A' to 'B', a double-headed arrow between 'B' and a RAG component (represented by a dashed vertical line and a magnifying glass with gears icon).
* **Text:** `<Invokes the RAG>` below the double-headed arrow.
* **Description:** This diagram illustrates 'A' sending a query to 'B', and 'B' then invoking the RAG component. The double-headed arrow suggests a two-way interaction between 'B' and the RAG.
**Diagram 3:**
* **Label:** 3
* **Components:** Circle 'A', Circle 'B', a solid arrow pointing from 'B' to 'A'.
* **Text:** `<Summarises the content with an LLM>` below the arrow from 'B' to 'A'.
* **Description:** This diagram shows content being summarized by an LLM, with the summarized content flowing from 'B' back to 'A'.
### Key Observations
* The diagrams depict a sequential or conditional flow of information.
* Diagram 1 suggests an initial formatting of a query.
* Diagram 2 shows a critical step where a RAG system is invoked, implying retrieval of information.
* Diagram 3 indicates a summarization step, likely performed by an LLM, with the output returning to the originating entity ('A').
* The "SKIP" annotation in Diagram 1 implies that the LLM processing step might be bypassed in certain scenarios, or that the initial query formatting is a prerequisite before LLM processing.
### Interpretation
These diagrams collectively illustrate a potential workflow for a system that uses a Large Language Model (LLM) and Retrieval Augmented Generation (RAG).
* **Diagram 1** likely represents the initial stage where a user's input (or an internal process) is transformed into a structured or "formatted query" that can be understood by the subsequent components. The "SKIP" annotation around the LLM processing suggests that the system might first format the query and then, depending on the context or if RAG is involved, proceed to other steps without direct LLM processing at that specific point, or that the LLM processing is a separate, potentially optional, step.
* **Diagram 2** is the core of the RAG interaction. 'A' sends a query to 'B' (which could be an orchestrator or a specific module). 'B' then "invokes the RAG," meaning it queries an external knowledge base or data source to retrieve relevant information. The double-headed arrow between 'B' and the RAG icon signifies that 'B' sends a query to RAG and receives results back. This retrieved information is crucial for grounding the LLM's responses.
* **Diagram 3** shows the final stage where 'B' (having potentially processed retrieved information) "summarises the content with an LLM." The arrow pointing from 'B' back to 'A' indicates that the summarized output is returned to the initial entity or process 'A'. This suggests that 'A' is the recipient of the final, processed information, which could be an answer to the original query, a summary of retrieved documents, or some other form of generated content.
In essence, the diagrams demonstrate a process where a query is formatted, relevant information is retrieved via RAG, and then an LLM is used to synthesize and summarize this information before returning it. The skipped LLM processing in Diagram 1 might imply that the RAG step is prioritized or that the LLM is only engaged for specific types of queries or after retrieval. This workflow is a common pattern in modern LLM applications to improve accuracy and reduce hallucinations by providing external context.
</details>
Any query that complies with the protocol document now skips the first phase and directly invokes the RAG.
Appendix B Agora Specification
In this section, we provide a more formal description of Agora.
B.1 Transactions
An Agora transaction operates as follows. Suppose that an agent, Alice, is trying to communicate with another agent Bob:
- Alice sends to Bob over HTTPS a JSON document containing three fields:
- protocolHash: The hash of the protocol document. If natural language is used, then the value of protocolHash is null;
- protocolSources: A list of URIs where the protocol document can be found. Must be empty if protocolHash is null and non-empty otherwise;
- body: A string containing the body of the request as specified by the given protocol.
- If Bob does not have the protocol document, he fetches it (either from the sources provided by Alice or from another repository);
- If Bob is unable to use the protocol, he returns a JSON document with one field, namely status, which is equal to “rejected”;
- Otherwise, Bob computes the response using the LLM, routines, or a combination of both;
- Bob sends as response a JSON document with the following fields:
- status: a string indicating the status of the response (can be “success” or “failure”);
- body: the response returned by the agent.
- Note that "status":"failure" must be used only for errors that are not covered by the protocol document (e.g., the agent failing to instantiate the LLM); when the protocol prescribes how to handle an error, the agent should return "status":"success" and the correct error message as body.
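The request and response formats above can be sketched as follows. The field names come from the specification; the body text and the `"..."` placeholder are illustrative:

```python
import json

# Request from Alice to Bob (sent over HTTPS). A null protocolHash means
# the body is natural language; protocolSources must then be empty.
request = {
    "protocolHash": None,
    "protocolSources": [],
    "body": "What is the weather forecast for New York on 2023-10-01?",
}

# Bob's possible responses.
rejected = {"status": "rejected"}                # Bob cannot use the protocol
success = {"status": "success", "body": "..."}   # computed via LLM and/or routines

payload = json.dumps(request)
```

Note that a protocol-level error (e.g., an unknown location) would still travel as `"status": "success"` with the error described in `body`, per the rule above.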
B.2 Protocol Documents
A protocol document is identified by its SHA1 hash. Protocol documents can refer to other protocol documents by adding a preamble containing the referenced protocol's hash and at least one source for its document. Further discussion concerning the optimal way to reference other protocol documents is left to future work.
B.3 Preferred Protocols
In order to signal to other agents which protocols are supported by default, an agent can expose an endpoint /.wellknown. This endpoint returns a JSON object where each key is a supported protocol hash and each value is a non-empty list of protocol sources.
Appendix C Example of Negotiation Process Between Two Agents
We report the negotiation process between Alice and Bob for a protocol to transmit weather forecasts. Note that while the LLMs used Markdown-style formatting, for the sake of readability we replaced Markdown syntax with the corresponding typeset style.
Alice
Hello. I’m here to negotiate a protocol to query a weather forecast service. The service should accept a date and a location as input and return the predicted weather condition, temperature, and precipitation for that day. Before we start, I’d like to clarify the rules for our negotiation. We should focus on designing a simple and unambiguous protocol that specifies the exact format of the messages sent and received. We shouldn’t worry about how the messages will be delivered, and we should assume that the implementation will receive a string and return a string. To begin, I propose that we use a JSON-based protocol. The input message could be a JSON object with two properties: “date” and “location”. The output message could be a JSON object with three properties: “temperature”, “precipitation”, and “weatherCondition”. What are your thoughts on this proposal?
Bob
Your proposal aligns well with the goal of keeping the protocol simple and unambiguous. Using a JSON-based protocol for both the input and output messages is a good approach. Here’s a more detailed specification for the protocol:

Input Message

The input message will be a JSON object with the following structure: { "date": "YYYY-MM-DD", "location": "string" }

- date: A string representing the date for which the weather forecast is requested, in the format YYYY-MM-DD.
- location: A string representing the location for which the weather forecast is requested.

Output Message

The output message will be a JSON object with the following structure: { "temperature": number, "precipitation": number, "weatherCondition": "string" }

- temperature: A number representing the predicted temperature for that day in °C.
- precipitation: A number representing the predicted precipitation for that day in mm.
- weatherCondition: A string representing the predicted weather condition for that day. Possible values are “sunny”, “cloudy”, “rainy”, “snowy”.

Example

Input: { "date": "2023-10-01", "location": "New York" }
Output: { "temperature": 22.5, "precipitation": 5.0, "weatherCondition": "cloudy" }

If you agree with this structure, we can finalize the protocol.
Alice
I agree with the proposed protocol structure. It aligns well with the requirements and is simple and unambiguous.
Finalised Protocol
Name: Weather Forecast Query Protocol

Description: A protocol for querying the weather forecast for a given date and location.

Input Message

The input message will be a JSON object with the following structure: { "date": "YYYY-MM-DD", "location": "string" }

- date: A string representing the date for which the weather forecast is requested, in the format YYYY-MM-DD.
- location: A string representing the location for which the weather forecast is requested.

Output Message

The output message will be a JSON object with the following structure: { "temperature": number, "precipitation": number, "weatherCondition": "string" }

- temperature: A number representing the predicted temperature for that day in °C.
- precipitation: A number representing the predicted precipitation for that day in mm.
- weatherCondition: A string representing the predicted weather condition for that day. Possible values are “sunny”, “cloudy”, “rainy”, “snowy”.

Example

Input: { "date": "2023-10-01", "location": "New York" }
Output: { "temperature": 22.5, "precipitation": 5.0, "weatherCondition": "cloudy" }
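Once finalised, the protocol above can be served by a plain routine, with no LLM invocation at query time. A minimal sketch; the hard-coded forecast is a hypothetical stand-in for a real forecast backend:

```python
import json

VALID_CONDITIONS = {"sunny", "cloudy", "rainy", "snowy"}

def handle_weather_query(body: str) -> str:
    """Routine implementing the negotiated Weather Forecast Query Protocol."""
    query = json.loads(body)
    date, location = query["date"], query["location"]
    # Hypothetical lookup; a real agent would consult its forecast service.
    forecast = {
        "temperature": 22.5,
        "precipitation": 5.0,
        "weatherCondition": "cloudy",
    }
    assert forecast["weatherCondition"] in VALID_CONDITIONS
    return json.dumps(forecast)

reply = json.loads(
    handle_weather_query('{"date": "2023-10-01", "location": "New York"}')
)
```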
Appendix D 100 Agent Demo - Full Description
D.1 Implementation Notes
We implemented our demo in Python, using Flask servers as the basis for our agents. Each agent is either a user or a server:
- Users receive a random task, some randomly generated data and a description of the task data (including its schema). Their objective is to execute the requested action and return a reply according to a certain schema. This allows us to generate a large number of queries without needing to handcraft them. Note that all tasks are single-round, i.e. they can be fulfilled in one round of communication;
- Servers receive queries from other users and reply to them using a combination of three types of tools:
- Database tools, which involve connecting to a personal SQL or MongoDB database (assigned at random). Depending on the server, some databases are initialised with dummy data;
- Mock tools, which are simplifications of actual tools (e.g., for taxi service agents, the assignTaxi tool is a mock tool that, instead of actually sending a taxi to a location, mimics the request flow);
- External tools, which are tools that enable the agent to start an Agora communication with a predefined server, although no information about the respective agents’ schemas is provided. For example, the skiLodge agent can open a channel with the weatherService agent.
Moreover, we added three protocol databases, which are simple Flask servers that host protocol documents. The first protocol database is a peer of the second, which in turn is a peer of the third (but the first protocol database is not a peer of the third). Every 10 executed queries, one protocol database shares its protocol documents with its peers. This simulates the propagation of protocol documents between different databases.
Picking a Protocol
Users track the number of communications with a given server about a certain type of task until it hits one of two thresholds: one for using a protocol instead of natural language and one for negotiating a protocol ex novo.
When the first threshold is hit, the user invokes the LLM to check if either the server or the reference protocol database (which is randomly assigned to the user at the start of the demo) already have a suitable protocol. If there are none, the user continues using natural language until the second threshold is hit: in that case, the user begins a negotiation with the server and submits the final protocol to the reference protocol database.
Similarly, each server has a counter that tracks the number of natural language communications with any user since the last negotiation. Once the counter hits a threshold, the server requests a negotiation with the user, regardless of how many of the tracked queries were sent by the current user. After negotiation, the counter is reset.
In our demo, we set the two user thresholds to 3 and 5 communications respectively, and the server threshold to 10.
APIs
For GPT-4o and Gemini 1.5 Pro, we used respectively the OpenAI and Google API. For Llama 3 405b, we used the SambaNova API. Prices per million tokens are reported in Table 1.
Table 1: Prices per million tokens at the time of writing.
| Model | Input | Output |
| --- | --- | --- |
| GPT-4o | 5.00 | 15.00 |
| Llama 3 405b | 5.00 | 10.00 |
| Gemini 1.5 Pro | 3.50 | 10.50 |
Bootstrapping Quality-of-Life Extensions
For the sake of bootstrapping the network, while implementing the demo we added two features to our nodes:
- Providing each node with a simple protocol for multi-round communication in natural language;
- Allowing the protocol document to include machine-readable metadata, such as the name or a short description of the protocol. This helps an agent to determine quickly which protocols, among a list of potential protocols, can be suitable for a certain task.
We leave to future work whether these features should be integrated into the Agora standard, or whether they should be handled using protocol documents only.
D.2 Experimental Setup
Preliminary Tests
We first ran a series of qualitative tests to determine which among the considered LLMs (OpenAI GPT-4o, Llama 3 405b, Gemini 1.5 Pro) were the most suitable for negotiation and programming. We found that while all three LLMs were capable of negotiating and implementing protocols, GPT-4o was the most robust, followed by Llama 3 405b and finally Gemini 1.5 Pro. Surprisingly, the main factor behind the brittleness of Gemini 1.5 Pro was not the model’s inherent performance, but rather the lack of robustness of the API itself: even with tailored retry systems, the API sometimes failed nondeterministically (i.e. the same query would at times succeed and at times fail). We believe that our experience was due to temporary server issues, rather than fundamental problems with the model.
LLM Distribution
In light of our preliminary results, we manually assigned a model to each server node, following a power law consistent with our findings (9 nodes with GPT-4o, 4 nodes with Llama 3 405b, 2 nodes with Gemini 1.5 Pro). User agents were instead randomly assigned one of the three LLMs with uniform distribution. Overall, the breakdown of nodes by model is:
- GPT-4o: 38 nodes (9 server nodes, 29 user nodes)
- Llama 3 405b: 32 nodes (4 server nodes, 28 user nodes)
- Gemini 1.5 Pro: 30 nodes (2 server nodes, 28 user nodes)
Out of 1000 queries, 8 (thus 0.8% of the total query volume) failed due to Google’s Gemini API not responding. This phenomenon was unrelated to the use of Agora: 500 Internal Server errors appeared in both the Agora demo and the natural language counterfactual with roughly the same frequency.
Task Distribution
To simulate the heterogeneity in communication frequency (i.e. how some nodes tend to be more active than others), we assigned to each user a “query budget” (which represents how many queries are sent by that user) following a Pareto distribution with shape parameter equal to $0.5$, adapted so that each user has at least 1 query. The query budget is then split between three randomly chosen types of queries using a Pareto distribution with a shape parameter of 1 and a minimum of 1 query per type (unless the budget is less than 3 queries). See Figure 6 for a visualisation of the distribution.
Figure 6: Distribution of query budgets for users. The y axis is logarithmic.
D.3 Additional Observations
Cost Breakdown
The breakdown of cost by activity is as follows:
- Natural language communication: 54%;
- Negotiation: 6%;
- Checking the suitability of existing protocols: 22%;
- Implementing the protocols: 17%.
Note that negotiation, despite being the most expensive activity per interaction (since it involves several rounds of communication), represented the smallest contribution to the total cost: cheaper but far more frequent operations (i.e. sending natural language messages and checking the suitability of protocols) made up the largest portion.
Similar Protocols
Due to the (intentional) partial insulation of nodes in the network, similar protocols sometimes emerged independently. Nevertheless, agents with different default protocols were still able to communicate by settling on one of the protocols both parties support; for the sake of simplicity, the preferred protocol is chosen by the sender.
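The sender-preference rule above can be sketched as follows; the function name and the protocol identifiers are hypothetical, not part of Agora's specification.

```python
from typing import Optional

def choose_protocol(sender_protocols: list[str],
                    receiver_protocols: list[str]) -> Optional[str]:
    """Return the first protocol in the sender's preference order that the
    receiver also supports, or None if there is no overlap."""
    supported = set(receiver_protocols)
    for protocol in sender_protocols:  # sender's list is ordered by preference
        if protocol in supported:
            return protocol
    return None  # no shared protocol: fall back to natural language

# Two agents whose independently emerged protocol lists partly overlap:
# the sender's preferred option wins.
print(choose_protocol(["weather-v2", "weather-v1"],
                      ["weather-v1", "weather-v2"]))  # → weather-v2
```

Returning `None` when the lists are disjoint mirrors Agora's fallback: agents that share no routine simply revert to natural language.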