# A Scalable Communication Protocol for Networks of Large Language Models
**Authors**: Samuele Marro, Emanuele La Malfa, Jesse Wright, Guohao Li, Nigel Shadbolt, Michael Wooldridge, Philip Torr (University of Oxford, Oxford, UK)
> Corresponding author. Email: samuele@robots.ox.ac.uk.
Abstract
Communication is a prerequisite for collaboration. When scaling networks of AI-powered agents, communication must be versatile, efficient, and portable. These requisites, which we refer to as the Agent Communication Trilemma, are hard to achieve in large networks of agents. We introduce Agora, a meta protocol that leverages existing communication standards to make LLM-powered agents solve complex problems efficiently. In Agora, agents typically use standardised routines for frequent communications, natural language for rare communications, and LLM-written routines for everything in between. Agora sidesteps the Agent Communication Trilemma and robustly handles changes in interfaces and members, allowing unprecedented scalability with full decentralisation and minimal involvement of human beings. On large Agora networks, we observe the emergence of self-organising, fully automated protocols that achieve complex goals without human intervention.
1 Introduction
Human language evolved primarily for communication purposes (Fedorenko et al., 2024). Despite its inherent ambiguity, natural language provides great versatility and allows humans and machines to collaborate and achieve complex goals that they otherwise could not (Russell & Norvig, 2016).
Decades of literature in computer science explored how to foster collaboration between agents modelled as programs (Wooldridge & Jennings, 1995; Gilbert, 2019). Several research papers design networks of agents to solve complex problems by leveraging each model’s specialisation, the so-called rule-based agents paradigm (Wooldridge, 2009). Despite its influence, such a paradigm faces two major limitations: agents hardly adapt to environmental changes and require structured data that limits their versatility (Gilbert & Terna, 2000).
With the advent of Large Language Models (LLMs) (Vaswani et al., 2017; Brown et al., 2020), there has been a resurgence of interest in networks of collaborative agents. LLMs can solve a variety of problems (Achiam et al., 2023; Dubey et al., 2024a) expressed in natural language, as they excel at following instructions (Schulman et al., 2017; Rafailov et al., 2024). LLMs have also shown remarkable improvements in handling structured data such as graphs and formatted languages (Kassner et al., 2020; Collins et al., 2022; Jin et al., 2023; Lin et al., 2024).
In terms of performance (e.g., accuracy on classification tasks), the literature suggests that specialised LLMs outperform general-purpose models (Hu et al., 2021; Zhang et al., 2024), while mitigating both the difficulties of handling gargantuan models and the drawbacks of data and model centralisation (Song et al., 2023).
Thus, we hypothesise that:
Hypothesis
A network of heterogeneous LLMs can automate various complex tasks with nearly no human supervision via specialised and efficient protocols.
However, networks of LLM-powered agents face three key challenges that make communication at scale significantly more difficult:
- LLMs are heterogeneous: different LLMs have different architectures, makers, capabilities, and usage policies. Heterogeneity is not unique to networks of LLMs; yet, compared to classic MAS agents, LLMs carry richer internal representations of the surrounding environment and are thus harder to standardise.
- LLMs are (mostly) general-purpose tools: enumerating and standardising each task they can perform is infeasible.
- LLMs are expensive: the computational footprint and inference time of even “small” LLMs dwarf those of comparable, specialised APIs.
Scalable communication between heterogeneous LLMs must be versatile, i.e., capable of handling a variety of use cases, efficient, i.e., requiring the least computational effort, and portable, i.e., supporting the protocol should require the least human effort possible. The above-mentioned issues constitute the Agent Communication Trilemma, which we expand in Section 3.
In light of this, the aim of this paper is the following:
Key Contribution
We design and implement a communication protocol between heterogeneous LLM-powered agents and assess its feasibility and scalability for solving high-order tasks.
We sidestep the Trilemma with Agora, a meta protocol that relies on the dual use of structured data for frequent communications and natural language for infrequent ones. With Agora, we instantiate large networks of LLM-powered agents that solve complex tasks autonomously by leveraging efficient communication schemas. In such networks, we observe agents developing an emergent, fully automated protocol to solve a complex task starting from an instruction expressed in natural language. We believe that this observation can serve as a basis to renew interest in emergent protocols/languages in large networks of LLMs (Lazaridou et al., 2018; Chaabouni et al., 2019; Lazaridou & Baroni, 2020; Chaabouni et al., 2022).
The paper is structured as follows. We first outline the key challenges that constitute the Agent Communication Trilemma (Section 3); we then detail how Agora addresses the Trilemma and serves as a communication protocol for networks of LLMs (Section 4). Finally, in Section 5, we provide two fully functional demos (our code is available at github.com/agora-protocol/paper-demo): the first, with two agents, clarifies Agora’s operating principles; the second, with 100 agents, proves Agora’s scalability and shows the emergence of self-organising behaviours.
2 Related Work
Multi-agent LLMs and communication.
At the time of writing, Multi-Agent Systems of Large Language Models (MAS-LLMs) have become an active area of research (Guo et al., 2024) after the upsurge of LLMs as general-purpose problem solvers (Brown et al., 2020; Achiam et al., 2023; Dubey et al., 2024b). Many fields have adopted techniques from the MAS-LLM paradigm to solve problems that single models fail at, including reasoning and math (Li et al., 2024), Theory of Mind (Cross et al., 2024; Li et al., 2023b), planning (Singh et al., 2024), alignment to human values (Pang et al., 2024), and the simulation of games, economics, and political scenarios (Bakhtin et al., 2022; Hua et al., 2023; Wu et al., 2024a). The common intuition of these works is that by breaking a task into sub-components (Hong et al., 2023) and allocating a large number of specialised models (Li et al., 2024) to each of them (Li et al., 2023a), one can achieve higher performance and observe emergent behaviours that otherwise would not occur.
On the other hand, a key requisite for solving complex tasks in large networks of MAS-LLMs is effective and efficient communication. In large networks, LLMs must agree on the actions to take (Chen et al., 2023): works such as Agashe et al. (2023) and Liang et al. (2023) studied how LLMs debate to foster collaboration on high-order tasks (Du et al., 2023). Another recent line of research explores the topology of the MAS-LLM network as a facilitator to reach consensus (Chen et al., 2024).
LLMs for simulations and emergence of protocols.
A few seminal works studied how emergent communication and protocols arise between neural networks that manipulate symbols (Havrylov & Titov, 2017; Lazaridou et al., 2018; Lazaridou & Baroni, 2020). Written before the rise of LLMs, these works inspired researchers to explore how spontaneous collaboration emerges in MAS-LLMs (Wu et al., 2024b), with applications to the simulation of societies (Gao et al., 2024). Of particular interest for this paper are the works by Chaabouni et al. (2019) and Chaabouni et al. (2022). Chaabouni et al. (2019) describe how emergent communication systems between neural networks privilege longer messages. Chaabouni et al. (2022) posit the existence of “scaling laws” (Kaplan et al., 2020) for large networks of MAS-LLMs, in which the dataset, task complexity, and population size are key to observing emergent behaviours.
3 The Agent Communication Trilemma
<details>
<summary>img/triangle-trilemma.png Details</summary>

Figure description: an equilateral triangle whose vertices are labelled "Efficiency" (top), "Portability" (bottom-left), and "Versatility" (bottom-right). Each side is labelled with a communication approach that achieves the two properties it connects while sacrificing the opposite vertex: "Traditional static API (e.g., OBP)" on the Efficiency–Portability side, "Meta-API (e.g., RDF)" on the Efficiency–Versatility side, and "Natural language" on the Portability–Versatility side. "Agora" sits at the centre of the triangle, representing a balance of all three properties.
</details>
Figure 1: The Trilemma and how our solution (Agora) balances efficiency, portability and versatility.
An agent is a computer system that, in an environment, is capable of autonomous actions (the so-called ‘agency’ (Horty, 2001)) to meet its design objective (Wooldridge & Jennings, 1995; Wooldridge, 2009, p. 15). Just as humans must negotiate and cooperate to achieve shared goals, so too must agents within multi-agent systems (Wooldridge, 2009, p. 24-25). However, when designing communication protocols for heterogeneous networks (i.e., networks where agents have different architectures, capabilities and design constraints), we run into difficulties when attempting to optimise for three properties at the same time:
- Versatility: communication between agents should support a wide variety of messages, both in terms of content and format;
- Efficiency: the computational cost of running an agent and networking cost of communication should be minimal;
- Portability: supporting the communication protocol should require the least implementation effort by the largest number of agents involved.
We name the trade-off between such properties the Agent Communication Trilemma, which is illustrated in Figure 1. In the next sections, we will discuss how an LLM-powered communication protocol can trade off versatility, efficiency, and portability.
3.1 Versatile vs. Portable Communication
In networks of agents, versatility and portability are in tension for two fundamental reasons (Olivé, 2007). A prerequisite for two communicating agents is (1) a shared conceptual understanding of the topic of their communication. For instance, two agents can communicate about the weather if they both ‘know’ what it means to be sunny, rainy, or overcast; likewise, they should share a similar notion of describing and measuring temperature (e.g., in degrees Celsius). In addition, (2) agents must encode and decode messages in a way that is intelligible to both. Continuing the weather example, if two agents exchange data using JSON objects, both the sender and the receiver must know the syntax (e.g., the keys of a JSON object, such as temperature) and the semantics (e.g., temperature is a 32-bit floating-point value representing the temperature, in central London, as measured in degrees Celsius) of the exchanged messages.
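As a concrete illustration of requisite (2), the JSON exchange above might look as follows. This is a hypothetical sketch: the keys, values, and units are chosen only to mirror the weather example, not taken from any standard.

```python
import json

# A hypothetical weather message; the key names and units are
# assumptions for illustration only.
message = {
    "temperature": 17.5,   # degrees Celsius, measured in central London
    "condition": "overcast",
}

encoded = json.dumps(message)   # sender: serialise with the shared syntax
decoded = json.loads(encoded)   # receiver: parse with the same syntax

# Syntax agreement alone is not enough: both sides must also share the
# semantics, e.g., that "temperature" is Celsius rather than Fahrenheit.
assert decoded["temperature"] == 17.5
```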
In complex scenarios, defining routines whose syntax and semantics satisfy requisites (1) and (2) may be difficult. For example, a programmer has to manually implement a method to encode (or decode) messages to (or from) other agents. Additionally, the programmer must explicitly instruct the agent on how to manipulate and reason about the message content, often by interpreting API documentation that describes the semantics of the message. There is therefore a trade-off between the breadth of supported messages (versatility) and the implementation cost (portability).
An example of high portability and low versatility is the Open Banking Platform (OBP), which uses a well-defined Open API schema for data transfer (OBL, 2024). OBP is highly portable because it uses a fixed range of well-known concepts that developers can implement; however, it is restricted to a narrow domain of banking data and is thus not versatile. On the other end of the spectrum, rule-based Semantic Web agents (Berners-Lee et al., 2001) that exchange RDF-encoded documents (Beckett et al., 2014) are highly versatile, since ontologies (Wooldridge, 2009, p. 180) enable the description of structured relations between essentially any concepts. Still, they require developers to program agents to implement the specific ontologies used by the network (e.g., if a set of RDF triples states that the temperature is 38°C, an agent must be able to interpret the concepts of “temperature” and “Celsius”).
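The ontology problem can be sketched in a few lines. The URIs, predicates, and the `can_interpret` helper below are hypothetical illustrations of the idea, not part of any RDF standard or of Agora:

```python
# RDF-style triples can describe essentially any concept (versatility),
# but an agent only understands the predicates a programmer has
# explicitly implemented (the portability cost). All "ex:" URIs below
# are hypothetical placeholders.
triples = [
    ("ex:London", "ex:hasTemperature", "ex:reading1"),
    ("ex:reading1", "ex:value", 38.0),
    ("ex:reading1", "ex:unit", "ex:degreesCelsius"),
]

# The predicates this particular agent has been programmed to handle.
KNOWN_PREDICATES = {"ex:hasTemperature", "ex:value", "ex:unit"}

def can_interpret(triples):
    """True only if every predicate has a human-written implementation."""
    return all(pred in KNOWN_PREDICATES for _, pred, _ in triples)
```

An agent receiving a triple with an unimplemented predicate (say, a hypothetical `ex:humidity`) cannot act on it, even though the triple is perfectly well-formed RDF.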
3.2 Efficient vs. Versatile and Portable Communication
As previously mentioned, rule-based agents excel at the tasks they are designed to solve but hardly adapt to new environments. Decades of research in reinforcement learning (Sutton, 2018) and then in deep reinforcement learning (Arulkumaran et al., 2017; Henderson et al., 2018) introduced a paradigm where agents learn to optimise a reward as a proxy of the task we want them to solve. Agentic LLMs, i.e., multi-agent systems powered by language models, are a recent paradigm for machine-to-machine communication that relies mostly on LLMs’ proficiency at handling natural language and following instructions (Li et al., 2023a).
Natural language is highly expressive, making it a suitable choice for versatile communication (Russell & Norvig, 2016). Additionally, LLMs trained on massive corpora seem to develop an implicit, shared understanding of various concepts, which makes communication independent of their internal architectures. Moreover, LLMs can integrate external tools, write code, and invoke APIs with little or no additional training (Schick et al., 2024), since the only requirement is a natural-language description of the tool and its parameters.
Conversely, natural language as a communication medium has two major drawbacks. First, while engineering and hardware improvements (Dubey et al., 2024b) mitigate costs over time, the computational requirements of invoking an LLM dwarf those of comparable APIs, representing a major bottleneck for scaling networks of LLMs; relying on closed-source, pay-per-usage LLMs hosted by third parties is also expensive and raises concerns about the replicability of results (La Malfa et al., 2023). Second, natural language is inherently ambiguous: while LLMs have a certain degree of “common sense” to fulfil requests, non-determinism and underspecification leave room for errors that routines minimise (for instance, if a user asks for the temperature in Fahrenheit and the agent has a tool that returns the temperature in Celsius, the model must know that Celsius and Fahrenheit are both units of temperature and convert accordingly). These factors make LLMs and natural language more error-prone than alternatives such as handwritten APIs.
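The Fahrenheit/Celsius case above is exactly the kind of ambiguity a deterministic routine removes. The sketch below is a hypothetical illustration; the tool name `get_temperature_celsius` and the wrapper are ours, not from the paper:

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Deterministic unit conversion: no LLM call, no ambiguity."""
    return celsius * 9 / 5 + 32

def get_temperature(unit: str, get_temperature_celsius) -> float:
    """A routine wrapping a hypothetical Celsius-only tool so that
    Fahrenheit requests are served by explicit conversion rather than
    by an LLM's "common sense"."""
    celsius = get_temperature_celsius()
    if unit == "fahrenheit":
        return celsius_to_fahrenheit(celsius)
    return celsius
```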
In conclusion, RESTful APIs (efficient and portable), RDF tuples (efficient and versatile), and natural language (versatile and portable) each sacrifice one property of the Trilemma. While some approaches are more useful in practice than others, the fact that no communication format achieves all three properties simultaneously suggests the need for a hybrid communication protocol that leverages all of them. The next section outlines our solution.
4 Agora: a Communication Protocol Layer for LLMs
<details>
<summary>img/evil.png Details</summary>

Figure description: an LLM-powered node (light blue box, labelled "LLM-Powered Node") sends and receives messages ("Send/receive message") to and from surrounding agent nodes (dark diamonds). Below the node, a platform of icons represents the heterogeneous implementation technologies an agent may rely on (MongoDB, SQL, XML, PHP, HTML/CSS/JS, Python), with an "HTTPS" icon denoting the underlying communication protocol. The figure conveys that Agora abstracts away these implementation details from the message exchange.
</details>
(a) An illustration of Agora and how it abstracts the underlying implementation, communication, and physical layers.
<details>
<summary>img/evil-stack.png Details</summary>

Figure description: a vertical protocol stack, from bottom to top: "Physical layer", "Communication layer" (network icon), "Implementation layer" (icons for storage technologies such as SQL), and "Agora". A dashed box labelled "Further layers" sits on top, indicating that additional layers can be built on Agora independently of the technologies beneath it.
</details>
(b) Stack of technologies to build Agora.
Figure 2: How Agora fits into a standard communication protocol stack.
The key to solving the Communication Trilemma is accepting that no single protocol can achieve optimal efficiency, portability, and versatility at the same time. In this section we introduce Agora, a meta protocol that takes advantage of the unique capabilities of LLMs to sidestep the Trilemma by adopting different communication methods for different scenarios.
The most powerful LLMs share three key properties:
- They can understand, manipulate, and reply to other agents using natural language;
- They excel at following instructions, including writing code to implement routines (Schick et al., 2024; Hou et al., 2023; Liu et al., 2024);
- They can autonomously negotiate protocols and reach consensus on strategies and behaviours to adopt in complex scenarios (Chen et al., 2023; Fu et al., 2023).
At its core, Agora uses different communication formats depending on the circumstances: an agent can support a wide breadth of communications (high versatility) while handling the majority of the total volume of requests with efficient routines (high efficiency). Moreover, the entire negotiation and implementation workflow is handled by the LLMs and requires no human supervision (high portability). The concept of protocol documents (PDs), which we sketch in Figure 3 and discuss in the next section, lies at the core of Agora’s functionality.
In the next sections, we illustrate the hierarchy of communication methods Agora supports natively and the concept of PD; we then provide an example of how Agora works and how it enables versatile, efficient, and portable communication. We conclude by emphasising how one can integrate and build upon Agora with further technological layers independently from its underlying technologies.
4.1 Communication in (an) Agora
Agora introduces a machine-readable way to transfer and refer to protocols, namely the protocol documents (PDs). A PD is a plain-text description of a communication protocol. Throughout this paper, we use the word “protocol” to refer to any standardised description of structured communication. PDs are self-contained, implementation-agnostic, and contain everything an agent needs to support a protocol: this means that most descriptions of existing protocols, such as RFCs, are also suitable PDs. However, instead of relying on a central body to assign identifiers, a PD is uniquely identified by its hash (for multiplexing).
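As a minimal sketch of hash-based identification: the choice of SHA-256 below is an assumption for illustration, since the text only specifies that a PD is uniquely identified by its hash.

```python
import hashlib

def pd_identifier(protocol_document: str) -> str:
    """Identify a PD by its hash rather than a centrally assigned name.
    SHA-256 is an illustrative assumption; any agreed-upon hash works."""
    return hashlib.sha256(protocol_document.encode("utf-8")).hexdigest()

# A toy PD; real PDs are full plain-text protocol descriptions.
pd = "Messages are JSON objects with a 'temperature' field in Celsius."

assert pd_identifier(pd) == pd_identifier(pd)        # deterministic
assert pd_identifier(pd) != pd_identifier(pd + "!")  # any change -> new id
```

Because the identifier is derived from the document itself, no central registry is needed: any two agents holding the same PD compute the same identifier.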
In Agora, the most frequent communications have dedicated efficient routines, and the least frequent ones use inefficient but flexible LLMs and natural language. In particular:
- When possible, frequent communications are handled through traditional protocols, for which there are standard, human-written implementations (e.g., OBP);
- For communications that happen less frequently (or for which there are no standard protocols), agents can use structured data as an exchange medium (which can be handled by LLM-written routines);
- For communications that might be frequent for one side but not the other, the agents still use structured data, but one side can choose to use an LLM, while the other uses a routine;
- For rare communications or when a routine fails unexpectedly, the agents can resort to natural language.
It is entirely up to the agent whether to handle a query with a human-written routine, an LLM-written routine, or an LLM itself (or a combination of the three). This gives the agent maximum flexibility over how to process queries. Forcing or nudging a model to use a specific communication style can improve efficiency, but a full discussion is beyond the scope of this paper. One can, for example, specify in the system prompt of an LLM that it should negotiate a protocol whenever possible. In the demo (Section 5.3), we illustrate the trade-off between the versatility of a communication protocol and its expected usage.
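A minimal sketch of this per-query choice, using hypothetical handler tables (the exact dispatch policy is left to each agent):

```python
def handle_query(pd_hash, payload, human_routines, llm_routines, call_llm):
    """Dispatch a query through the hierarchy of communication methods.
    `human_routines` and `llm_routines` map PD hashes to handlers;
    `call_llm` is the expensive fallback. All names are illustrative."""
    if pd_hash in human_routines:      # frequent, standardised protocol
        return human_routines[pd_hash](payload)
    if pd_hash in llm_routines:        # protocol with an LLM-written routine
        return llm_routines[pd_hash](payload)
    return call_llm(payload)           # rare case: natural-language fallback
```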
Hierarchical communications support any form of communication (maximum versatility), although in practice an LLM is invoked only in rare cases (maximum efficiency). Moreover, since LLMs can implement routines on their own (PDs fully describe the syntax and semantics of a protocol), human programmers only need to provide an overview of the tools the agent has access to; the implementation effort required on the human side is thus minimal (maximum portability). In other words, Agora sidesteps the Communication Trilemma by employing routines for frequent requests and resorting to natural language when agents need to negotiate efficient ways to solve a problem or when errors occur.
4.2 An Example of Communication over Agora
<details>
<summary>img/pd-negotiation.png Details</summary>

Figure description: two LLM-powered agents first negotiate a protocol document in natural language ("Negotiate PD hash '123'"). Subsequent messages are then formatted according to that protocol and tagged with its hash ("Message formatted as PD hash '123'"), so later exchanges, relayed through intermediate nodes, no longer require natural language; the shared hash identifies the protocol on both sides.
</details>
Figure 3: How a protocol document is negotiated between LLM-powered agents (left) and used for future efficient communications.
We now describe how two agents, Alice and Bob, can efficiently communicate over Agora using a PD routine, as illustrated in Figure 3. Alice initially sends a query together with the hash of its corresponding PD. Bob uses the hash to determine whether he has a corresponding routine. If so, he calls it and handles the communication without invoking the LLM. Otherwise, Bob handles the query with his LLM.
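The receiver-side dispatch described above can be sketched as follows; the function and parameter names are illustrative, not part of the Agora specification:

```python
# Hypothetical sketch of Bob's dispatch step: a known PD hash maps to a
# cached routine; anything else falls back to the LLM.
def handle_query(protocol_hash, routines, body, llm_reply):
    routine = routines.get(protocol_hash)
    if routine is not None:
        return routine(body)   # cheap path: no LLM invocation
    return llm_reply(body)     # rare path: the LLM handles the query directly
```

In this sketch, `routines` is a plain dictionary from PD hashes to callables, so the common case is a single lookup.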
If Bob repeatedly uses an LLM to reply to queries that follow a given protocol, to the point where invoking an LLM every time becomes expensive, he can use the LLM once to write a routine that handles future communications.
If the routine fails or the communication is a one-off instance that does not require a protocol, Alice and Bob use natural language, which is again handled by the LLM. Natural language is also available to bootstrap communication between nodes that have never interacted before, as well as to negotiate new protocols. That said, the lower cost of routines and the lack of ambiguity are strong incentives for agents to prefer structured data.
Note that PDs can be shared with other nodes in the network, which means that two agents that have never interacted before can use protocols developed by other agents.
In Appendix A, we detail five use cases of Agora that further show its versatility as a personal assistant and data analysis tool, and how it leverages compositionality and scalability to reduce costs.
4.3 Agora as a Layer Zero Protocol
Figure 2 illustrates that Agora is implementation and technology agnostic. The implementation of the agents themselves (e.g., LLMs), the database used to store data (e.g., VectorDB, SQL, MongoDB, etc.), the language in which implementations are written (Python, Java, etc.) and the nature of tools are all abstracted.
At the same time, PDs can refer to other protocol documents, and since routines can call other routines, agents can build upon previous negotiations to solve more complex tasks.
Finally, the versatility and portability of Agora make it straightforward to handle the addition or removal of a node, a change in the capabilities of a node, or a change in the goals of the network, as illustrated in the demo, Section 5.3.
All these factors contribute to making Agora a natural Layer Zero protocol, i.e. a foundation layer, for higher-order communication and collaboration between LLMs. We hope our protocol can fuel theoretical and applied research on complex protocols, negotiation schemes, and consensus algorithms in large networks of LLMs.
5 Agora in Practice
We implement and showcase two scenarios where Agora can be applied. In the first, two agents exchange data; in the second, $100$ agents test Agora's scalability and the capacity of LLM-powered agents to coordinate autonomously in complex scenarios. For space reasons, the scenarios are further expanded in Appendices C and D; here, we instead focus on their functionality and our key observations in terms of efficiency/versatility/portability, cost reduction, scalability, and the emergent behaviours of fully automated networks of LLMs.
5.1 Implementation Details
The design of Agora for our working demos follows three key principles:
- Minimality. Agora enforces the basic standards that allow for efficient negotiation and use of protocols, leaving everything else to PDs or other higher-order standards;
- Decentralisation. Agora does not rely on central authorities, with any collection of nodes being able to use Agora independently;
- Full backward compatibility. Agora supports existing communication protocols and schemas such as OpenAPI and JSON-Schema.
From a practical point of view, Agora uses HTTPS as the base communication layer and JSON as the format to exchange metadata. When sending a message in a given protocol, an agent sends a JSON document with three keys: the protocol hash, the body of the request formatted according to the protocol, and a non-empty list of sources from which the protocol can be downloaded. The receiver downloads the PD from its preferred source and, upon checking that the hash matches, stores it for future use. This hash-based identification system ensures that any node can reference any PD without relying on a central authority to assign identifiers. Where PDs are stored is entirely up to the agents; aside from regular cloud storage, hash-based indexing makes decentralised storage options (such as IPFS; Benet, 2014) viable. Additionally, since essentially all protocols can be stored as PDs, Agora has full backward compatibility with existing protocols (although human programmers are encouraged to provide existing, standardised implementations instead of having the LLM re-implement them from scratch).
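The three-key envelope and the hash check can be sketched as below. The JSON key names and the choice of SHA-256 are our assumptions for illustration; the text above does not fix either:

```python
import hashlib

# Hypothetical sketch of Agora's message envelope and PD verification.
def make_envelope(pd_text: str, body: dict, sources: list) -> dict:
    assert sources, "Agora requires a non-empty list of protocol sources"
    return {
        "protocolHash": hashlib.sha256(pd_text.encode("utf-8")).hexdigest(),
        "body": body,            # formatted according to the protocol
        "protocolSources": sources,  # URLs from which the PD can be downloaded
    }

def pd_matches(downloaded_pd: str, expected_hash: str) -> bool:
    # The receiver downloads the PD from its preferred source and only
    # stores it if the hash matches, so no central authority is needed.
    return hashlib.sha256(downloaded_pd.encode("utf-8")).hexdigest() == expected_hash
```

Because the identifier is derived from the PD's content, any node can verify a PD fetched from any source, which is what makes decentralised storage viable.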
To simplify negotiation, an agent can expose an endpoint with a list of supported protocols: a potential sender can thus compare the list with its own to automatically determine if there is a common protocol. The sender can also use a potentially unsupported protocol, although the receiver can choose to reject it by returning a predefined error message.
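The negotiation shortcut above amounts to a set intersection; a minimal sketch (function names are ours, not the specification's):

```python
# Hypothetical sketch: the sender intersects its PD hashes with the list
# the receiver advertises at its endpoint.
def pick_common_protocol(my_hashes, advertised_hashes):
    common = set(my_hashes) & set(advertised_hashes)
    # Deterministic tie-break; None means "fall back to natural language
    # or negotiate a new protocol".
    return min(common) if common else None
```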
A more formal specification of Agora is provided in the appendix.
5.2 Demo: Retrieving Weather Data
Consider two agents, Alice and Bob. Alice is a Llama-3-405B-powered (Dubey et al., 2024b) agent managing the bookings of a guided tour service in London. While Llama-3 models can be hosted locally, for the sake of a proper comparison with GPT-4o and Gemini, we use a cloud provider, namely SambaNova (https://sambanova.ai). Bob is a GPT-4o-powered (Achiam et al., 2023) weather-service agent that provides weather forecasts for a given date and location. As part of the user interaction loop, Alice notifies the user if heavy rain is expected on a booked date.
To check the weather, she initially uses her LLM to send a natural language query to Bob (phase A1):
Alice - Natural Language
What is the weather forecast for London, UK on 2024-09-27?
Bob uses his Toolformer LLM (Schick et al., 2024) to query his database (phase B1) and returns a natural language reply (phase B2):
Bob - Natural Language
The weather forecast for London, UK, on 2024-09-27 is as follows: “Rainy, 11 degrees Celsius, with a precipitation of 12 mm.”
Over time, the cost of invoking an LLM for phases A1 and B2 dominates all the other costs; Alice and Bob thus decide to develop a protocol. Alice checks if Bob already supports a suitable protocol but finds none. Therefore, she decides to negotiate a protocol with Bob. After a few rounds of negotiation, Alice and Bob agree on the following protocol: Alice sends a JSON document with two fields, location and date, and Bob replies with a JSON document containing three fields, namely temperature (in degrees Celsius), precipitation (in millimetres), and weatherCondition (one of “sunny”, “cloudy”, “rainy” and “snowy”). From then on, Alice specifies the protocol hash when performing a query. An example of an exchanged message (excluding Agora’s metadata) is:
Alice - PD
{"location": "London, UK", "date": "2024-09-27"}
Both Alice and Bob independently decide to write a routine to handle their side of the communication. From now on, Alice and Bob do not need to use their LLMs to exchange weather data: a routine automates phases A1, B1 and B2, eliminating the cost of invoking the respective LLMs.
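Bob's side of the negotiated protocol could be implemented by a routine along these lines. `get_forecast` is a hypothetical stand-in for Bob's database tool; the reply fields follow the protocol agreed above:

```python
import json

# A sketch of the routine Bob's LLM might write for the weather protocol.
def weather_routine(message_body: str, get_forecast) -> str:
    query = json.loads(message_body)
    forecast = get_forecast(query["location"], query["date"])
    reply = {
        "temperature": forecast["temperature"],          # degrees Celsius
        "precipitation": forecast["precipitation"],      # millimetres
        "weatherCondition": forecast["weatherCondition"],  # sunny/cloudy/rainy/snowy
    }
    return json.dumps(reply)
```

Once registered under the protocol's hash, this routine replaces the LLM for every future query that follows the protocol.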
A cost analysis.
In our demo, negotiating the protocol and implementing the routines cost $0.043$ USD in API calls, compared to an average cost of $0.020$ USD for a natural-language exchange. This means that, as long as Alice and Bob use the agreed-upon protocol more than twice, Agora reduces the overall cost. Please refer to Appendix C for a transcription of the negotiation process and the final protocol.
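The break-even argument can be made explicit with the costs measured in the demo, assuming (as an approximation) that running the routines themselves is free:

```python
# Worked break-even sketch using the demo's measured costs.
NEGOTIATION_COST = 0.043      # USD, one-off: negotiating the PD + writing routines
NL_COST_PER_EXCHANGE = 0.020  # USD: average natural-language exchange

def agora_is_cheaper(n_exchanges: int, routine_cost: float = 0.0) -> bool:
    # routine_cost is assumed negligible once routines replace the LLM
    return NEGOTIATION_COST + n_exchanges * routine_cost < n_exchanges * NL_COST_PER_EXCHANGE

# 2 exchanges: 0.043 > 0.040, so not yet; 3 exchanges: 0.043 < 0.060, so cheaper.
```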
As a final note, we stress that the entire communication happened without human intervention. Additionally, should Bob become unavailable, Alice can simply reuse the PD with a new node that may use a different LLM/database/technology stack.
5.3 Demo: a Network of 100 Agents
Figure 4: Illustration of how, in an Agora network with $100$ agents (left; for clarity, only the relevant sub-network is displayed), a protocol for food delivery emerges (right).
We now show the scaling capabilities and emergent behaviours of Agora by considering a network of 100 LLM-powered agents. In particular, we scale the number of agents, which, as posited in Chaabouni et al. (2022), is a requisite for the emergence of complex behaviours in multi-agent networks.
We design a network of $85$ assistant agents interacting with $15$ server agents, all powered by LLMs. The server agents offer various services, such as booking hotel rooms, calling taxis, ordering food, etc. An example of a sub-network for food delivery is sketched in Figure 4, left. Their specialisation is handled via prompting, as in Deshpande et al. (2023); Joshi et al. (2023); Li et al. (2023a). As part of their workflow, server agents must interact with several tools and databases; additionally, some servers need to interact with other servers to complete assistants’ requests (e.g., taxi services use the traffic data agent to adjust estimated fares for a run). We bootstrap the network by leveraging the underlying communication layer (as described in Section 4 and Figure 2), informing the nodes of which URLs correspond to which node, and manually creating the connection links between agents (e.g. the Taxi Service server knows that the server on port 5007 is a traffic server, but it does not know how to communicate with it or what information it requires).
To showcase the portability of Agora throughout the network, we use different database technologies (SQL and MongoDB) and different LLMs, both open- and closed-source (GPT-4o, Llama-3-405B, and Gemini 1.5 Pro (Reid et al., 2024)). We then generate $1000$ random queries, ranging from simple ones, such as requesting today’s weather, to more complex ones, like booking rooms in ski resorts, buying tickets for movies, ordering one of each dish from a menu, and so on. For each query, assistants receive a JSON document (which represents the task data) and are tasked with fulfilling the request and returning a parsed response that follows a given schema. Queries are distributed among assistants following a Pareto distribution, to simulate some assistants sending significantly more requests than others. Each node can also read PDs from, and share PDs with, one of three protocol databases. Overall, these design decisions result in a very heterogeneous network, testing the limits of Agora. Refer to Appendix D for further implementation details.
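The heavy-tailed query assignment could be generated as below; the Pareto shape parameter is illustrative, as the text does not report it:

```python
import random

# Hypothetical sketch: spread 1000 queries over 85 assistants with a
# Pareto-distributed load, so a few assistants send most requests.
def assign_queries(n_queries=1000, n_assistants=85, shape=1.16, seed=0):
    rng = random.Random(seed)
    weights = [rng.paretovariate(shape) for _ in range(n_assistants)]
    return rng.choices(range(n_assistants), weights=weights, k=n_queries)
```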
Emergent protocols in large networks.
Once the connections are established and the network can send and receive messages, we observe several noteworthy behaviours. As PDs are progressively shared between agents (see Figure 5(b)), we observe the emergence of a decentralised consensus on the appropriate protocols for a given task. An example of this behaviour involves ordering food from restaurants: an agent queries another to request food to be delivered to a certain address. The restaurant agent requests a delivery driver from a food delivery service, which, in turn, checks with the traffic data agent to see if the traffic is smooth enough to fulfil the delivery. None of the agents know each other’s roles or the protocols involved beyond their immediate communication. Still, the interaction of the various agents creates an automated workflow that takes care of everything. The emergence of such a protocol is illustrated in Figure 4 (right). In contrast with some recent literature on the emergence of complex protocols (Chaabouni et al., 2019), we observe that with the proper incentives (i.e., efficiency), agents in Agora escape the inefficient trap of committing to longer messages in large-scale communications.
A cost analysis.
We compare the cost of running our Agora network against one that uses natural language for all communications. As shown in Figure 5(a), at the beginning Agora is only marginally more cost-efficient than the natural-language-only network; the gap widens over time, as progressively more Agora-powered nodes rely on LLM-written routines. The overall cost in API queries for running $1000$ queries in the natural language network is $36.23$ USD, compared to Agora’s $7.67$ USD: in other words, executing this demo with Agora is approximately five times cheaper than with regular natural language. Continuing the demo for more queries would have led to an even larger cost difference.
(a) Cost comparison of natural language vs Agora on a network of $100$ agents. Costs are averaged with a window size of $100$ .
(b) The number of queries to the LLMs in Agora decreases over time as the number of established PDs grows.
Figure 5: Summary of the efficiency of Agora for the demo with 100 agents.
6 Conclusions
In this paper, we introduced Agora, a meta protocol that sidesteps the Agent Communication Trilemma by using a mix of natural language and structured protocols. We showed that Agora agents can negotiate, implement and use protocols, creating self-organising networks that solve complex tasks. Additionally, we demonstrated the scalability of Agora by testing a $100$ -agent demo and achieving a five-fold reduction in costs compared to natural language-only communication. Our results showcase the power of negotiation as a basis for efficient, scalable, and decentralised agent networks. As LLMs continue to improve and as interactions between them increase, LLM-powered agent networks have the potential to surpass the scale limitations of single LLMs. Developing frameworks and protocols that enable decentralised, flexible and efficient communication, either through Agora or other technologies, can lay the foundations for a future where complex activities are partially, if not fully, automated by LLMs.
Acknowledgements
We thank the Alan Turing Institute for providing the computational power to run our agent network, as well as SambaNova for providing credits for our Llama 3 experiments. Samuele Marro is funded by Microsoft Research Ltd. Emanuele La Malfa is funded by the Alan Turing Institute. Jesse Wright is funded by the Department of Computer Science of the University of Oxford.
References
- Achiam et al. (2023) Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
- Agashe et al. (2023) Saaket Agashe, Yue Fan, and Xin Eric Wang. Evaluating multi-agent coordination abilities in large language models. arXiv preprint arXiv:2310.03903, 2023.
- Arulkumaran et al. (2017) Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, and Anil Anthony Bharath. Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine, 34(6):26–38, 2017.
- Bakhtin et al. (2022) Anton Bakhtin, Noam Brown, Emily Dinan, Gabriele Farina, Colin Flaherty, Daniel Fried, Andrew Goff, Jonathan Gray, Hengyuan Hu, et al. (Meta Fundamental AI Research Diplomacy Team (FAIR)). Human-level play in the game of diplomacy by combining language models with strategic reasoning. Science, 378(6624):1067–1074, 2022.
- Beckett et al. (2014) David Beckett, Tim Berners-Lee, Eric Prud’hommeaux, and Gavin Carothers. RDF 1.1 Turtle. World Wide Web Consortium, 2014.
- Benet (2014) Juan Benet. IPFS - content addressed, versioned, P2P file system. arXiv preprint arXiv:1407.3561, 2014.
- Berners-Lee et al. (2001) Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic web. Scientific american, 284(5):34–43, 2001.
- Brown et al. (2020) Tom B. Brown, Benjamin Mann, Nick Ryder, et al. Language models are few-shot learners. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020. URL https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html.
- Chaabouni et al. (2019) Rahma Chaabouni, Eugene Kharitonov, Emmanuel Dupoux, and Marco Baroni. Anti-efficient encoding in emergent communication. Advances in Neural Information Processing Systems, 32, 2019.
- Chaabouni et al. (2022) Rahma Chaabouni, Florian Strub, Florent Altché, Eugene Tarassov, Corentin Tallec, Elnaz Davoodi, Kory Wallace Mathewson, Olivier Tieleman, Angeliki Lazaridou, and Bilal Piot. Emergent communication at scale. In International conference on learning representations, 2022.
- Chen et al. (2023) Huaben Chen, Wenkang Ji, Lufeng Xu, and Shiyu Zhao. Multi-agent consensus seeking via large language models. arXiv preprint arXiv:2310.20151, 2023.
- Chen et al. (2024) Yongchao Chen, Jacob Arkin, Yang Zhang, Nicholas Roy, and Chuchu Fan. Scalable multi-robot collaboration with large language models: Centralized or decentralized systems? In 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 4311–4317. IEEE, 2024.
- Collins et al. (2022) Katherine M Collins, Catherine Wong, Jiahai Feng, Megan Wei, and Joshua B Tenenbaum. Structured, flexible, and robust: benchmarking and improving large language models towards more human-like behavior in out-of-distribution reasoning tasks. arXiv preprint arXiv:2205.05718, 2022.
- Cross et al. (2024) Logan Cross, Violet Xiang, Agam Bhatia, Daniel LK Yamins, and Nick Haber. Hypothetical minds: Scaffolding theory of mind for multi-agent tasks with large language models. arXiv preprint arXiv:2407.07086, 2024.
- Deshpande et al. (2023) Ameet Deshpande, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, and Karthik Narasimhan. Toxicity in chatgpt: Analyzing persona-assigned language models. arXiv preprint arXiv:2304.05335, 2023.
- Du et al. (2023) Yilun Du, Shuang Li, Antonio Torralba, Joshua B Tenenbaum, and Igor Mordatch. Improving factuality and reasoning in language models through multiagent debate. arXiv preprint arXiv:2305.14325, 2023.
- Dubey et al. (2024a) Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, et al. The llama 3 herd of models. 2024a. URL https://api.semanticscholar.org/CorpusID:271571434.
- Dubey et al. (2024b) Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024b.
- Fedorenko et al. (2024) Evelina Fedorenko, Steven T. Piantadosi, and Edward A. F. Gibson. Language is primarily a tool for communication rather than thought. Nature, 630, 2024.
- Fu et al. (2023) Yao Fu, Hao Peng, Tushar Khot, and Mirella Lapata. Improving language model negotiation with self-play and in-context learning from ai feedback. arXiv preprint arXiv:2305.10142, 2023.
- Gao et al. (2024) Chen Gao, Fengli Xu, Xu Chen, Xiang Wang, Xiangnan He, and Yong Li. Simulating human society with large language model agents: City, social media, and economic system. In Companion Proceedings of the ACM on Web Conference 2024, pp. 1290–1293, 2024.
- Gilbert (2019) Nigel Gilbert. Agent-based models. Sage Publications, 2019.
- Gilbert & Terna (2000) Nigel Gilbert and Pietro Terna. How to build and use agent-based models in social science. Mind & Society, 1:57–72, 2000.
- Guo et al. (2024) Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V Chawla, Olaf Wiest, and Xiangliang Zhang. Large language model based multi-agents: A survey of progress and challenges. arXiv preprint arXiv:2402.01680, 2024.
- Havrylov & Titov (2017) Serhii Havrylov and Ivan Titov. Emergence of language with multi-agent games: Learning to communicate with sequences of symbols. Advances in neural information processing systems, 30, 2017.
- Henderson et al. (2018) Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and David Meger. Deep reinforcement learning that matters. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018.
- Hong et al. (2023) Sirui Hong, Xiawu Zheng, Jonathan Chen, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, et al. Metagpt: Meta programming for multi-agent collaborative framework. arXiv preprint arXiv:2308.00352, 2023.
- Horty (2001) John F Horty. Agency and deontic logic. Oxford University Press, 2001.
- Hou et al. (2023) Xinyi Hou, Yanjie Zhao, Yue Liu, Zhou Yang, Kailong Wang, Li Li, Xiapu Luo, David Lo, John Grundy, and Haoyu Wang. Large language models for software engineering: A systematic literature review. ACM Transactions on Software Engineering and Methodology, 2023.
- Hu et al. (2021) Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
- Hua et al. (2023) Wenyue Hua, Lizhou Fan, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, and Yongfeng Zhang. War and peace (waragent): Large language model-based multi-agent simulation of world wars. arXiv preprint arXiv:2311.17227, 2023.
- Jin et al. (2023) Bowen Jin, Gang Liu, Chi Han, Meng Jiang, Heng Ji, and Jiawei Han. Large language models on graphs: A comprehensive survey. arXiv preprint arXiv:2312.02783, 2023.
- Joshi et al. (2023) Nitish Joshi, Javier Rando, Abulhair Saparov, Najoung Kim, and He He. Personas as a way to model truthfulness in language models. arXiv preprint arXiv:2310.18168, 2023.
- Kaplan et al. (2020) Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020.
- Kassner et al. (2020) Nora Kassner, Benno Krojer, and Hinrich Schütze. Are pretrained language models symbolic reasoners over knowledge? arXiv preprint arXiv:2006.10413, 2020.
- La Malfa et al. (2023) Emanuele La Malfa, Aleksandar Petrov, Simon Frieder, Christoph Weinhuber, Ryan Burnell, Raza Nazar, Anthony G Cohn, Nigel Shadbolt, and Michael Wooldridge. Language models as a service: Overview of a new paradigm and its challenges. arXiv e-prints, pp. arXiv–2309, 2023.
- Lazaridou & Baroni (2020) Angeliki Lazaridou and Marco Baroni. Emergent multi-agent communication in the deep learning era. arXiv preprint arXiv:2006.02419, 2020.
- Lazaridou et al. (2018) Angeliki Lazaridou, Karl Moritz Hermann, Karl Tuyls, and Stephen Clark. Emergence of linguistic communication from referential games with symbolic and pixel input. arXiv preprint arXiv:1804.03984, 2018.
- Li et al. (2023a) Guohao Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. Camel: Communicative agents for "mind" exploration of large language model society. Advances in Neural Information Processing Systems, 36:51991–52008, 2023a.
- Li et al. (2023b) Huao Li, Yu Quan Chong, Simon Stepputtis, Joseph Campbell, Dana Hughes, Michael Lewis, and Katia Sycara. Theory of mind for multi-agent collaboration via large language models. arXiv preprint arXiv:2310.10701, 2023b.
- Li et al. (2024) Junyou Li, Qin Zhang, Yangbin Yu, Qiang Fu, and Deheng Ye. More agents is all you need. arXiv preprint arXiv:2402.05120, 2024.
- Liang et al. (2023) Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Zhaopeng Tu, and Shuming Shi. Encouraging divergent thinking in large language models through multi-agent debate. arXiv preprint arXiv:2305.19118, 2023.
- Lin et al. (2024) Fangru Lin, Emanuele La Malfa, Valentin Hofmann, Elle Michelle Yang, Anthony Cohn, and Janet B Pierrehumbert. Graph-enhanced large language models in asynchronous plan reasoning. arXiv preprint arXiv:2402.02805, 2024.
- Liu et al. (2024) Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation. Advances in Neural Information Processing Systems, 36, 2024.
- OBL (2024) OBL. Open banking read write api profile v4.0. 2024. URL https://openbankinguk.github.io/read-write-api-site3/v4.0/profiles/read-write-data-api-profile.html.
- Olivé (2007) Antoni Olivé. Conceptual modeling of information systems. Springer Science & Business Media, 2007.
- Pang et al. (2024) Xianghe Pang, Shuo Tang, Rui Ye, Yuxin Xiong, Bolun Zhang, Yanfeng Wang, and Siheng Chen. Self-alignment of large language models via multi-agent social simulation. In ICLR 2024 Workshop on Large Language Model (LLM) Agents, 2024.
- Rafailov et al. (2024) Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36, 2024.
- Reid et al. (2024) Machel Reid, Nikolay Savinov, Denis Teplyashin, Dmitry Lepikhin, Timothy Lillicrap, Jean-baptiste Alayrac, Radu Soricut, Angeliki Lazaridou, Orhan Firat, Julian Schrittwieser, et al. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv preprint arXiv:2403.05530, 2024.
- Russell & Norvig (2016) Stuart J Russell and Peter Norvig. Artificial intelligence: a modern approach. Pearson, 2016.
- Schick et al. (2024) Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems, 36, 2024.
- Schulman et al. (2017) John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
- Singh et al. (2024) Ishika Singh, David Traum, and Jesse Thomason. Twostep: Multi-agent task planning using classical planners and large language models. arXiv preprint arXiv:2403.17246, 2024.
- Song et al. (2023) Junghwan Song, Heeyoung Jung, Selin Chun, Hyunwoo Lee, Minhyeok Kang, Minkyung Park, Eunsang Cho, et al. How to decentralize the internet: A focus on data consolidation and user privacy. Computer Networks, 234:109911, 2023.
- Sutton (2018) Richard S Sutton. Reinforcement learning: An introduction. A Bradford Book, 2018.
- Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems 30: 2017, December 4-9, 2017, Long Beach, CA, USA, pp. 5998–6008, 2017. URL https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
- Wooldridge (2009) Michael Wooldridge. An introduction to multiagent systems. John Wiley & Sons, 2009.
- Wooldridge & Jennings (1995) Michael Wooldridge and Nicholas R Jennings. Intelligent agents: Theory and practice. The knowledge engineering review, 10(2):115–152, 1995.
- Wu et al. (2024a) Shuang Wu, Liwen Zhu, Tao Yang, Shiwei Xu, Qiang Fu, Yang Wei, and Haobo Fu. Enhance reasoning for large language models in the game werewolf. arXiv preprint arXiv:2402.02330, 2024a.
- Wu et al. (2024b) Zengqing Wu, Shuyuan Zheng, Qianying Liu, Xu Han, Brian Inhyuk Kwon, Makoto Onizuka, Shaojie Tang, Run Peng, and Chuan Xiao. Shall we talk: Exploring spontaneous collaborations of competing llm agents. arXiv preprint arXiv:2402.12327, 2024b.
- Zhang et al. (2024) Biao Zhang, Zhongtao Liu, Colin Cherry, and Orhan Firat. When scaling meets llm finetuning: The effect of data, model and finetuning method. arXiv preprint arXiv:2402.17193, 2024.
Appendix A Agora: Use Cases
S1. Agora as a personal assistant.
A user is organising a trip to Paris: they want to book a flight, rent a car, and book a hotel room.
The LLM reads the prompt, identifies the actions it has to undertake, and checks whether there are LLMs available in Agora that can fulfil them. For each service, an LLM is ready to reply.
1. A user sends a message to its personal assistant.
2. The personal assistant dispatches it to Agora.
<details>
<summary>img/scenarios/s1-1.png Details</summary>

### Visual Description
## Diagram: LLM Assistant Dispatch Flow
### Overview
This diagram illustrates the flow of a user request to an LLM (Large Language Model) assistant for booking travel arrangements. It depicts a user initiating a request via a smartphone, which is then dispatched to the LLM assistant, resulting in the activation of services related to flights, hotels, and car rentals.
### Components/Axes
The diagram consists of the following components:
* **User:** Represented by a grey profile silhouette on the left.
* **Smartphone:** A rectangular device positioned next to the user.
* **Arrow:** A directional arrow indicating the flow of information from the user/smartphone to the LLM assistant.
* **Text Label:** "Dispatch to the LLMs' assistant" is written along the arrow.
* **LLM Assistant:** Represented by a central circle connected to four smaller circles via lines.
* **Service Icons:** Each of the four smaller circles contains an icon representing a specific service:
* Airplane (Flight)
* Car (Car Rental)
* Building (Hotel)
* Empty Circle (Central Hub)
* **User Query:** Text below the user: "I want to book a flight, a hotel and a car for next week in Paris."
### Detailed Analysis or Content Details
The diagram shows a unidirectional flow of information. The user's request, "I want to book a flight, a hotel and a car for next week in Paris," is sent from the smartphone to the LLM assistant. The LLM assistant then appears to dispatch the request to three different services: flight booking, hotel booking, and car rental. The central circle acts as a hub, connecting the user request to these specific services.
The user query explicitly requests:
* A flight
* A hotel
* A car
* For travel next week
* To Paris
### Key Observations
The diagram simplifies the process of travel booking through an LLM assistant. It highlights the LLM's role as a central dispatcher, connecting the user's request to the necessary services. The diagram does not show any feedback loop or confirmation steps. The diagram is a high-level conceptual illustration, not a detailed technical specification.
### Interpretation
The diagram suggests a streamlined user experience where a single request can trigger multiple booking processes. The LLM assistant acts as an intermediary, understanding the user's intent and routing the request to the appropriate services. This implies the LLM has natural language understanding capabilities to parse the user's request and identify the required services. The absence of a feedback loop suggests that the diagram focuses on the initial request dispatch rather than the entire booking process. The diagram is a conceptual model of how an LLM can facilitate travel arrangements, emphasizing the automation and integration of different services.
</details>
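The dispatch step above can be sketched as a minimal routing loop. The capability table, agent names, and `identify_tasks` helper below are illustrative assumptions, not part of Agora's actual implementation; in the paper, task identification is performed by the LLM itself.

```python
# Minimal sketch of the assistant's dispatch step (hypothetical names):
# each sub-task is routed, as a natural-language message, to the agent
# that advertises the matching capability.

CAPABILITIES = {
    "flight": "flight-agent",
    "hotel": "hotel-agent",
    "car": "car-agent",
}

def identify_tasks(prompt: str) -> list[str]:
    """Naive keyword stand-in for the LLM's task-identification step."""
    return [task for task in CAPABILITIES if task in prompt.lower()]

def dispatch(prompt: str) -> dict[str, str]:
    """Route each identified task to its agent as a natural-language message."""
    return {
        CAPABILITIES[task]: f"Please book a {task} for the user: {prompt!r}"
        for task in identify_tasks(prompt)
    }

requests = dispatch("I want to book a flight, a hotel and a car for next week in Paris.")
```

Because no routine exists yet for these requests, the messages themselves remain plain natural language; only the routing is mechanical.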
The LLM acting as the personal assistant dispatches the flight, hotel, and car requests to the respective LLMs in the network. The messages are sent in natural language, as there are no pre-existing routines to handle them.
1. The LLM personal assistant dispatches each message to the right node.
2. The car, hotel, and flight LLMs process the requests and turn them into queries for their booking systems.
3. Each LLM replies with its availability and options.
<details>
<summary>img/scenarios/s1-2.png Details</summary>

### Visual Description
## Diagram: Travel Booking Workflow
### Overview
The image depicts a diagram illustrating a workflow for booking travel arrangements, specifically for a trip to Paris. It shows three main actions – booking a flight, booking a car, and booking a hotel – and their interrelationships. The diagram uses circles as nodes representing actions, arrows to indicate the flow or dependency between actions, and icons to visually represent each action.
### Components/Axes
The diagram consists of:
* **Three Circular Nodes:** Representing the actions of booking a flight, a car, and a hotel.
* **Arrows:** Indicating the flow between the nodes. These arrows are dashed.
* **Icons:** A plane for "Book a flight", a car for "Book a car", and a building for "Book a hotel".
* **Text Labels:** Associated with each arrow, describing the action or dependency.
### Detailed Analysis or Content Details
1. **Node 1 (Top):** Contains an icon of an airplane and the text "Book a flight for Paris".
2. **Node 2 (Left):** A circle with no icon, connected to Node 1 with a dashed arrow labeled "Book a car in Paris".
3. **Node 3 (Bottom):** Contains an icon of a building and the text "Book a hotel room in Paris".
4. **Node 4 (Right):** Contains an icon of a car.
5. **Arrow 1:** Originates from Node 1 ("Book a flight for Paris") and points to Node 2. The label on this arrow is "Book a car in Paris".
6. **Arrow 2:** Originates from Node 3 ("Book a hotel room in Paris") and points to Node 4. The label on this arrow is "Book a car in Paris".
7. **Arrow 3:** Originates from Node 2 and points to Node 4. The label on this arrow is "Book a car in Paris".
### Key Observations
* The diagram suggests that booking a flight is a prerequisite for booking a car.
* Booking a hotel is also a prerequisite for booking a car.
* The car booking action appears to be dependent on both flight and hotel bookings.
* The diagram does not indicate any specific order between booking a flight and a hotel.
### Interpretation
The diagram illustrates a simplified travel planning process. It highlights the dependencies between different booking actions. The fact that both flight and hotel bookings lead to car booking suggests that a car is often rented *after* the primary travel arrangements (flight and accommodation) are confirmed. The diagram doesn't provide any quantitative data or specific details about the booking process, but it effectively conveys the logical flow of actions. The use of icons makes the diagram easily understandable at a glance. The repetition of the label "Book a car in Paris" on multiple arrows is somewhat redundant and could be improved by using a more descriptive label that clarifies the relationship (e.g., "Arrange Transportation"). The diagram implies a user journey where the user first secures travel and lodging, then arranges local transportation.
</details>
For subsequent interactions, the LLMs involved in the request negotiate a routine that standardises such requests, so that they can be processed without natural language and without invoking the LLMs.
<details>
<summary>img/scenarios/s1-3.png Details</summary>

### Visual Description
## Diagram: Travel/Logistics Flow
### Overview
The image depicts two distinct diagrams (labeled '1' and '2') illustrating a travel or logistics flow. Both diagrams involve transportation modes (airplane, car) and accommodation (hotel), connected by directional arrows labeled "protocol" or representing lists of options. The diagrams appear to represent different approaches or stages within a travel process.
### Components/Axes
The diagrams consist of the following components:
* **Circles:** Represent entities or stages in the process (e.g., starting point, transportation, destination).
* **Icons:** Airplane, car, hotel building.
* **Arrows:** Indicate the flow or connection between entities. Some arrows are solid, others are dashed.
* **Labels:** "<list of flights>", "<list of cars>", "<list of hotels>", "protocol".
* **Boxes:** Numbered '1' and '2' to distinguish the diagrams.
### Detailed Analysis or Content Details
**Diagram 1:**
* A central circle is connected to three other elements:
* An airplane icon, labeled "<list of flights>". The connection is a dashed arrow.
* A car icon, labeled "<list of cars>". The connection is a solid arrow.
* A hotel building icon, labeled "<list of hotels>". The connection is a solid arrow.
* The central circle does not have a label.
**Diagram 2:**
* An airplane icon is connected to a "protocol" box via a dashed arrow.
* The "protocol" box is connected to a car icon via a solid arrow labeled "protocol".
* The "protocol" box is also connected to a hotel building icon via a solid arrow labeled "protocol".
* The airplane and hotel icons are also connected via a dashed arrow.
### Key Observations
* Diagram 1 presents a more direct connection between the starting point and the various travel options (flights, cars, hotels).
* Diagram 2 introduces a "protocol" intermediary step, suggesting a more structured or regulated process.
* The use of dashed vs. solid arrows may indicate different types of relationships or dependencies. Dashed arrows might represent options or possibilities, while solid arrows represent definite steps.
* The "protocol" element in Diagram 2 appears to act as a central control point, directing the flow to both car and hotel options.
### Interpretation
The diagrams likely represent two different approaches to travel arrangement. Diagram 1 could represent a more flexible, user-driven approach where the traveler directly chooses between flights, cars, and hotels. Diagram 2 suggests a more managed or controlled process, where a "protocol" (perhaps a travel agency, corporate policy, or automated system) dictates the flow of travel arrangements, potentially prioritizing certain options or enforcing specific rules.
The "protocol" element could represent a decision-making process, a set of rules, or a service provider that manages the travel logistics. The dashed arrows in both diagrams suggest that certain connections are optional or less direct than the solid arrow connections. The diagrams highlight the contrast between a direct, user-controlled travel process and a more structured, protocol-driven one. The diagrams do not provide any quantitative data, but rather illustrate a conceptual difference in travel arrangement methodologies.
</details>
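A negotiated routine of this kind can be sketched as a pair of plain functions, one per side of the exchange. The field names and the `book_flight` action are assumptions for illustration; the point is that, once the structure is agreed, neither side needs an LLM call to handle the message.

```python
import json

# Hypothetical sketch of a negotiated routine: after the agents agree on a
# structure for booking requests, both sides handle it with ordinary code.

def build_request(city: str, week: str) -> str:
    """Client side: emit the agreed JSON structure instead of natural language."""
    return json.dumps({"action": "book_flight", "city": city, "week": week})

def handle_request(payload: str) -> dict:
    """Server side: process the structured request without invoking an LLM."""
    request = json.loads(payload)
    if request.get("action") != "book_flight":
        raise ValueError("unsupported action")
    # Placeholder availability lookup; a real agent would query its booking system.
    return {"status": "options", "flights": [f"{request['city']}-001", f"{request['city']}-002"]}

reply = handle_request(build_request("Paris", "2024-W45"))
```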
The user receives all the data and decides whether to book or not.
S2. Security and scalability.
An LLM (Alice) collects some historical data from another LLM (Bob) that has access to a database whose internal mechanism and implementation are to be kept private.
Alice submits a request to collect some historical records from Bob. The request is formatted in natural language.
<details>
<summary>img/scenarios/s2-1.png Details</summary>

### Visual Description
## Diagram: Data Access Flow
### Overview
The image depicts a diagram illustrating a data access flow between two entities, labeled 'A' and 'B', and a data storage component. It highlights a request from 'A' to 'B' for specific records and the access of 'B' to a data storage unit. A dotted line indicates a potential, but inaccessible, connection.
### Components/Axes
The diagram consists of three main components:
* **Entity A:** Represented by a filled circle labeled "A".
* **Entity B:** Represented by a filled circle labeled "B".
* **Data Storage:** Represented by a stack of disks.
* **Text:** "I need the records from 2012" is displayed as a speech bubble originating from 'A' and pointing towards 'B'.
* **Text:** "Not Accessible by other LLMs" is displayed next to the data storage.
* **Arrows:** Solid arrows indicate direct data flow, while dashed arrows indicate a potential, but blocked, connection.
### Detailed Analysis or Content Details
* **Entity A to Entity B:** A solid black arrow points from 'A' to 'B', accompanied by the text "I need the records from 2012". This indicates a request for data from 2012.
* **Entity B to Data Storage:** A double-headed solid black arrow connects 'B' to the data storage, indicating bidirectional data access.
* **Entity A to Unknown Entity:** A dashed gray arrow points from 'A' to an empty circle, and then continues to 'B'. This suggests a potential connection that is not fully established or accessible.
* **Data Storage Isolation:** The text "Not Accessible by other LLMs" is positioned to the right of the data storage, indicating a security or access restriction.
### Key Observations
* The diagram emphasizes a specific data request ("records from 2012").
* The data storage is explicitly marked as inaccessible to other LLMs, suggesting a privacy or security concern.
* The dashed line indicates a potential pathway that is currently unavailable or restricted.
### Interpretation
The diagram illustrates a controlled data access scenario. Entity 'A' requests specific data (records from 2012) from Entity 'B'. Entity 'B' has access to the data storage, but this storage is protected from access by other Large Language Models (LLMs). The dashed line suggests a possible alternative route for data access, but it is currently blocked or unavailable.
This could represent a system where sensitive data is held by 'B' and access is carefully controlled. The restriction on other LLMs suggests a concern about data leakage or unauthorized use. The request for records from 2012 implies a historical data retrieval process. The diagram highlights the importance of data governance and security in a data-driven environment. The dotted line could represent a failed attempt to access data directly, or a pathway that requires additional authorization.
</details>
Alice submits another request to Bob.
Bob negotiates a protocol for querying its data and writes a shared protocol document in JSON.
<details>
<summary>img/scenarios/s2-2.png Details</summary>

### Visual Description
## Diagram: System Interaction & Protocol Negotiation
### Overview
The image depicts a diagram illustrating the interaction between two entities, labeled 'A' and 'B', involving protocol negotiation and access to a data source. The diagram uses circles to represent entities, arrows to indicate data flow, and a dashed box to denote a restricted access area.
### Components/Axes
The diagram consists of the following components:
* **Entity A:** Represented by a circle labeled "A".
* **Entity B:** Represented by a circle labeled "B".
* **Data Source:** Represented by a cylinder stack, indicating a database or storage system.
* **Protocol:** Represented by a document icon with lines, labeled "protocol".
* **Negotiation Flow:** A solid arrow labeled "<negotiation of the protocol>" connecting A and B.
* **Data Flow:** A solid arrow connecting B and the Data Source.
* **Restricted Access:** A dashed box surrounding the Data Source, with the text "Not Accessible by other LLMs".
* **Potential Connection:** A dotted arrow originating from A and pointing towards B, with a dotted circle at the end.
### Detailed Analysis or Content Details
The diagram shows a flow of information:
1. Entity A initiates a protocol negotiation with Entity B, indicated by the arrow labeled "<negotiation of the protocol>".
2. Entity B accesses a data source (database) and retrieves information.
3. Entity B utilizes a "protocol" (represented by the document icon) to interact with the data source.
4. The data source is explicitly marked as inaccessible to other LLMs.
5. There is a potential, but not confirmed, connection from A to B, indicated by the dotted arrow.
### Key Observations
* The diagram emphasizes the importance of protocol negotiation between entities A and B.
* The data source is specifically protected from access by other LLMs, suggesting a security or privacy concern.
* The dotted arrow indicates a possible, but not guaranteed, communication path between A and B.
* The diagram does not provide any quantitative data or specific details about the protocol itself.
### Interpretation
The diagram illustrates a secure system architecture where Entity A interacts with a data source through Entity B. The protocol negotiation step suggests a dynamic or adaptable communication process. The restriction on access for other LLMs highlights a focus on data security and control. The dotted line from A to B suggests a potential for direct communication, but it is not a guaranteed or primary pathway. This setup could represent a scenario where Entity A requests data from Entity B, which then retrieves it from a protected database, ensuring that the data remains inaccessible to unauthorized entities. The diagram is conceptual and does not provide specific implementation details. It focuses on the relationships and access control mechanisms within the system.
</details>
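What such a shared protocol document might look like, and how a message is checked against it, can be sketched as follows. The field names (`name`, `request`, `response`) are illustrative assumptions; the scenario only fixes that the agreed request format is JSON of the form `{"record": <year>}`.

```python
import json

# Illustrative sketch of a shared protocol document: it pins down the
# request format so that both sides can validate messages with plain code.

PROTOCOL_DOCUMENT = {
    "name": "historical-records-query",
    "request": {"record": "integer year of the records to retrieve"},
    "response": {"records": "list of matching records"},
}

def validate_request(payload: str) -> dict:
    """Check that an incoming message follows the agreed request format."""
    request = json.loads(payload)
    if set(request) != {"record"} or not isinstance(request["record"], int):
        raise ValueError("request does not follow the protocol document")
    return request

request = validate_request('{"record": 2013}')
```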
Alice now uses the protocol to query data from Bob.
Bob directly turns the JSON it receives from Alice into a query for its database. In this way, Bob does not invoke the LLM, and the database internals are not exposed.
<details>
<summary>img/scenarios/s2-3.png Details</summary>

### Visual Description
## Diagram: Data Flow between Systems A and B
### Overview
The image depicts a diagram illustrating data flow between two systems, labeled "A" and "B". System A sends data to System B, which then interacts with a data storage component. A dotted line indicates a potential connection to an inaccessible component.
### Components/Axes
The diagram consists of the following components:
* **System A:** Represented by a circle labeled "A".
* **System B:** Represented by a circle labeled "B".
* **Data Storage:** Represented by a stack of disks positioned to the right of System B.
* **Data Packet:** A rectangular block with lines representing pages, containing the JSON data `{"record": 2013}`.
* **Arrows:** Indicate the direction of data flow. Solid arrows represent direct data transfer, while dotted arrows represent potential or inaccessible connections.
* **Text Annotation:** "Not Accessible by other LLMs" positioned to the right of System B.
### Detailed Analysis or Content Details
* **Data Flow from A to B:** A solid arrow originates from System A and points towards System B. Embedded within this arrow is a data packet containing the JSON object `{"record": 2013}`. This indicates that System A is sending data to System B, and the data includes a field named "record" with a value of 2013.
* **Data Flow from B to Data Storage:** A double-headed solid arrow connects System B to the stack of disks representing data storage. This indicates a bidirectional data flow between System B and the data storage component.
* **Potential Connection:** A dotted arrow originates from System A and points towards a circle, which is then connected to System B via a dotted arrow. This suggests a potential connection or data flow path that is not directly accessible.
* **Accessibility Restriction:** The text "Not Accessible by other LLMs" is positioned to the right of System B, indicating that the component or data associated with System B is restricted from access by other Large Language Models.
### Key Observations
* The diagram highlights a specific data point being transferred from System A to System B: `{"record": 2013}`.
* System B appears to have a direct and bidirectional relationship with a data storage component.
* There is a potential, but inaccessible, connection originating from System A.
* The annotation explicitly states a restriction on access for other LLMs.
### Interpretation
The diagram illustrates a data pipeline where System A provides data to System B, which then stores and potentially retrieves data from a storage component. The JSON data `{"record": 2013}` suggests that the data being transferred represents a record with a year value of 2013. The dotted line and the "Not Accessible by other LLMs" annotation suggest a security or privacy concern, indicating that certain aspects of the system or data are intentionally shielded from external access, specifically from other LLMs. This could be due to sensitive data, proprietary algorithms, or other security considerations. The bidirectional arrow between System B and the data storage suggests that System B both writes to and reads from the storage. The diagram is a simplified representation of a complex system, focusing on the data flow and access control aspects.
</details>
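Bob's side of this exchange can be sketched with an in-memory database; the table schema is an assumption made for illustration. The protocol-conformant JSON is translated directly into a parameterised query, so no LLM is invoked and nothing about the schema is revealed to Alice.

```python
import json
import sqlite3

# Sketch of Bob's handler (assumed schema): the agreed JSON request is
# mapped straight onto a parameterised query.

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records (year INTEGER, value TEXT)")
db.executemany("INSERT INTO records VALUES (?, ?)",
               [(2012, "alpha"), (2013, "beta"), (2013, "gamma")])

def handle(payload: str) -> list[str]:
    """Translate a protocol-conformant JSON request into a database query."""
    year = json.loads(payload)["record"]
    rows = db.execute("SELECT value FROM records WHERE year = ? ORDER BY value", (year,))
    return [value for (value,) in rows]

results = handle('{"record": 2013}')
```

Using a parameterised query (the `?` placeholder) rather than string interpolation also prevents the structured request from being abused for injection.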
S3. Compositional tasks.
An LLM (Alice) wants to (1) analyse some market data and then (2) compute some metrics. Two LLMs in the network can perform these tasks.
1. Alice retrieves the protocol documents from a database.
2. Alice finds out that there are two protocol documents that can be used to achieve its goal.
<details>
<summary>img/scenarios/s3-1.png Details</summary>

### Visual Description
\n
## Diagram: Data Flow for Protocol and Metric Calculation
### Overview
The image is a diagram illustrating a data flow process. It depicts a central component "A" interacting with document files and a database, ultimately leading to metric computation. The diagram uses simple shapes (circles, rectangles, cylinders) and dashed arrows to represent the flow of information.
### Components/Axes
The diagram consists of the following components:
Diagram: a central component "A" checks whether a protocol document exists (bidirectional dashed arrow to three document icons, labelled "Check if a protocol document exists"), receives data from a database, and sends it to a calculator (dashed arrow labelled "Retrieve some numerical data and compute some metrics.").
</details>
1. Alice submits a request to the first agent to retrieve the data using the first protocol document.
2. Alice receives the data as expected.
<details>
<summary>img/scenarios/s3-2.png Details</summary>

Diagram (two numbered panels): in panel 1, agent "A" accesses a protocol document and a data store; in panel 2, "A" generates a scatter plot and interacts with a second data store, into which the scatter plot's data flows.
</details>
1. Alice submits a request to the second LLM to compute some metrics on the data using the second protocol document.
2. Alice receives the metrics as expected.
<details>
<summary>img/scenarios/s3-3.png Details</summary>

Diagram (two numbered flows from agent "A"): in flow 1, a document is combined ("+") with scatter-plot data and fed to a calculator; in flow 2, "A" sends data to a calculator whose output is shown as a bar chart.
</details>
S4. Scalable consensus in large networks.
An LLM (Alice) wants to collect and aggregate data points from $N\gg 1$ resources. No protocol exists to handle this task, and each resource has its own implementation, which may not be public.
1. Alice submits the requests in natural language.
2. Each queried LLM processes the request, turns it into a routine to retrieve the data, and sends it back to Alice.
<details>
<summary>img/scenarios/s4-1.png Details</summary>

Diagram (two numbered stages): in stage 1, node "A" sends solid arrows labelled "Request data" to many nodes ("..." indicating arbitrarily many); in stage 2, "A" reaches the same nodes via dashed arrows labelled "Process and retrieve data".
</details>
Alice wants to retrieve more data and queries the network a second time.
1. One or more receivers suggest using a protocol document for the next iterations.
2. Alice agrees and uses the protocols with as many resources as possible.
<details>
<summary>img/scenarios/s4-2.png Details</summary>

Diagram: node "A" fans out to several nodes, each arrow labelled "&lt;negotiation of the protocol&gt;"; "..." indicates arbitrarily many further nodes.
</details>
Successive communications increasingly use protocol documents, so the receiver no longer needs to process each query with the LLM.
S5. Scaling complex NLP routines.
An LLM (Alice) wants to retrieve data from a system powered by an LLM (Bob) that, in turn, obtains its data from a search engine (i.e., the LLM is combined with a RAG). Bob has to (1) turn the natural language request into a query, (2) retrieve the data from the RAG, and (3) return a summary.
Alice queries Bob to retrieve some data. There is no routine to handle any of the three phases, so Bob has to invoke the LLM twice: once to turn the query into a format suitable for invoking the RAG, and once to perform the summarisation.
<details>
<summary>img/scenarios/s5-1.png Details</summary>

Diagram (four numbered panels, each a message from A to B): (1) the query "What’s the highest mountain in the world?"; (2) "&lt;Processes the query with an LLM&gt;"; (3) "&lt;Invokes the RAG&gt;", with a magnifying-glass icon on B’s self-loop; (4) "&lt;Summarises the content with an LLM&gt;".
</details>
Alice queries Bob again; this time, Bob asks to use a routine to query the RAG directly.
<details>
<summary>img/scenarios/s5-2.png Details</summary>

Diagram of the same exchange once a protocol is in place: (1) A sends B a "&lt;Formatted query&gt;" (document icon); (2) B "&lt;Invokes the RAG&gt;" (magnifying-glass icon, bidirectional arrow); the "&lt;Processes the query with an LLM&gt;" step is enclosed in a dashed box labelled "SKIP"; finally, B "&lt;Summarises the content with an LLM&gt;".
</details>
Any query that complies with the protocol document now skips the first phase and directly invokes the RAG.
Appendix B Agora Specification
In this section, we provide a more formal description of Agora.
B.1 Transactions
An Agora transaction operates as follows. Suppose that an agent, Alice, is trying to communicate with another agent, Bob:
- Alice sends to Bob over HTTPS a JSON document containing three fields:
- protocolHash: The hash of the protocol document. If natural language is used, then the value of protocolHash is null;
- protocolSources: A list of URIs where the protocol document can be found. Must be empty if protocolHash is null and non-empty otherwise;
- body: A string containing the body of the request as specified by the given protocol.
- If Bob does not have the protocol document, he fetches it (either from the sources provided by Alice or from another repository);
- If Bob is unable to use the protocol, he returns a JSON document with one field, namely status, which is equal to “rejected”;
- Otherwise, Bob computes the response using the LLM, routines, or a combination of both;
- Bob sends as response a JSON document with the following fields:
- status: a string indicating the status of the response (can be “success” or “failure”);
- body: the response returned by the agent.
- Note that "status":"failure" must be used only for errors that are not covered by the protocol document (e.g., the agent failing to instantiate the LLM); when the protocol prescribes how to handle an error, the agent should return "status":"success" and the correct error message as body.
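The request and response documents above can be sketched in Python (the demo's implementation language). This is a minimal illustration; the helper names are ours and are not part of the specification:

```python
import json


def build_agora_request(body, protocol_hash=None, protocol_sources=None):
    """Build the JSON document for a single Agora transaction.

    If protocol_hash is None, the body is natural language and
    protocolSources must be empty, per the specification.
    """
    sources = protocol_sources or []
    if protocol_hash is None and sources:
        raise ValueError("protocolSources must be empty when protocolHash is null")
    if protocol_hash is not None and not sources:
        raise ValueError("protocolSources must be non-empty when a protocol is used")
    return json.dumps({
        "protocolHash": protocol_hash,
        "protocolSources": sources,
        "body": body,
    })


def parse_agora_response(raw):
    """Parse a response document.

    "failure" is reserved for errors not covered by the protocol;
    protocol-level errors still arrive with "status": "success".
    """
    reply = json.loads(raw)
    if reply["status"] in ("rejected", "failure"):
        raise RuntimeError(f"Agora request not fulfilled: {reply['status']}")
    return reply["body"]
```

The validation mirrors the constraint that protocolSources must be empty exactly when protocolHash is null.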
B.2 Protocol Documents
A protocol document is identified by its SHA1 hash. Protocol documents can refer to other protocol documents by adding a preamble containing the protocol hash and at least one source for the protocol document. Further discussion concerning the optimal way to reference other protocol documents is left to future work.
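Identification by hash can be computed directly; as a minimal illustration (the choice of UTF-8 as the canonical encoding of the document is an assumption):

```python
import hashlib


def protocol_hash(protocol_document: str) -> str:
    # A protocol document is identified by the SHA1 hash of its text
    # (hashed here as UTF-8; the canonical encoding is an assumption).
    return hashlib.sha1(protocol_document.encode("utf-8")).hexdigest()
```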
B.3 Preferred Protocols
In order to signal to other agents which protocols are supported by default, an agent can expose an endpoint /.wellknown. This endpoint returns a JSON object where each key is a supported protocol hash and each value is a non-empty list of protocol sources.
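A minimal sketch of such an endpoint, using Flask as in the demo of Appendix D (the example hash and source URL below are placeholders):

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Maps each supported protocol hash to a non-empty list of sources.
SUPPORTED_PROTOCOLS = {
    "1f09d30c707d53f3d16c530dd73d70a6ce7596a9": [
        "https://protocols.example.org/weather-forecast.md",
    ],
}


@app.route("/.wellknown")
def wellknown():
    # Each key is a protocol hash; each value is a non-empty list of sources.
    return jsonify(SUPPORTED_PROTOCOLS)
```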
Appendix C Example of Negotiation Process Between Two Agents
We report the negotiation process between Alice and Bob for a protocol to transmit weather forecasts. Note that while the LLMs used Markdown-style formatting, for the sake of readability we replaced Markdown syntax with the corresponding typeset style.
Alice
Hello. I’m here to negotiate a protocol to query a weather forecast service. The service should accept a date and a location as input and return the predicted weather condition, temperature, and precipitation for that day. Before we start, I’d like to clarify the rules for our negotiation. We should focus on designing a simple and unambiguous protocol that specifies the exact format of the messages sent and received. We shouldn’t worry about how the messages will be delivered, and we should assume that the implementation will receive a string and return a string. To begin, I propose that we use a JSON-based protocol. The input message could be a JSON object with two properties: “date” and “location”. The output message could be a JSON object with three properties: “temperature”, “precipitation”, and “weatherCondition”. What are your thoughts on this proposal?
Bob
Your proposal aligns well with the goal of keeping the protocol simple and unambiguous. Using a JSON-based protocol for both the input and output messages is a good approach. Here’s a more detailed specification for the protocol:

Input Message

The input message will be a JSON object with the following structure:

{ "date": "YYYY-MM-DD", "location": "string" }

- date: A string representing the date for which the weather forecast is requested, in the format YYYY-MM-DD.
- location: A string representing the location for which the weather forecast is requested.

Output Message

The output message will be a JSON object with the following structure:

{ "temperature": number, "precipitation": number, "weatherCondition": "string" }

- temperature: A number representing the predicted temperature for that day in °C.
- precipitation: A number representing the predicted precipitation for that day in mm.
- weatherCondition: A string representing the predicted weather condition for that day. Possible values are “sunny”, “cloudy”, “rainy”, “snowy”.

Example

Input: { "date": "2023-10-01", "location": "New York" }
Output: { "temperature": 22.5, "precipitation": 5.0, "weatherCondition": "cloudy" }

If you agree with this structure, we can finalize the protocol.
Alice
I agree with the proposed protocol structure. It aligns well with the requirements and is simple and unambiguous.
Finalised Protocol
Name: Weather Forecast Query Protocol

Description: A protocol for querying the weather forecast for a given date and location.

Input Message

The input message will be a JSON object with the following structure:

{ "date": "YYYY-MM-DD", "location": "string" }

- date: A string representing the date for which the weather forecast is requested, in the format YYYY-MM-DD.
- location: A string representing the location for which the weather forecast is requested.

Output Message

The output message will be a JSON object with the following structure:

{ "temperature": number, "precipitation": number, "weatherCondition": "string" }

- temperature: A number representing the predicted temperature for that day in °C.
- precipitation: A number representing the predicted precipitation for that day in mm.
- weatherCondition: A string representing the predicted weather condition for that day. Possible values are “sunny”, “cloudy”, “rainy”, “snowy”.

Example

Input: { "date": "2023-10-01", "location": "New York" }
Output: { "temperature": 22.5, "precipitation": 5.0, "weatherCondition": "cloudy" }
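As an illustration, a receiver could validate messages against the finalised protocol with a routine like the following (a sketch performing field and format checks only; the function names are ours, not part of the negotiated protocol):

```python
import json
import re

VALID_CONDITIONS = {"sunny", "cloudy", "rainy", "snowy"}


def parse_weather_query(raw: str) -> dict:
    """Validate an input message of the Weather Forecast Query Protocol."""
    msg = json.loads(raw)
    if set(msg) != {"date", "location"}:
        raise ValueError("expected exactly the fields 'date' and 'location'")
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", msg["date"]):
        raise ValueError("date must be in YYYY-MM-DD format")
    if not isinstance(msg["location"], str):
        raise ValueError("location must be a string")
    return msg


def make_weather_reply(temperature: float, precipitation: float,
                       condition: str) -> str:
    """Build a protocol-compliant output message."""
    if condition not in VALID_CONDITIONS:
        raise ValueError(f"weatherCondition must be one of {sorted(VALID_CONDITIONS)}")
    return json.dumps({
        "temperature": temperature,
        "precipitation": precipitation,
        "weatherCondition": condition,
    })
```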
Appendix D 100 Agent Demo - Full Description
D.1 Implementation Notes
We implemented our demo in Python, using Flask servers as the basis for our agents. Each agent is either a user or a server:
- Users receive a random task, some randomly generated data and a description of the task data (including its schema). Their objective is to execute the requested action and return a reply according to a certain schema. This allows us to generate a large number of queries without needing to handcraft them. Note that all tasks are single-round, i.e. they can be fulfilled in one round of communication;
- Servers receive queries from other users and reply to them using a combination of three types of tools:
- Database tools, which involve connecting to a personal SQL or MongoDB database (assigned at random). Depending on the server, some databases are initialised with dummy data;
- Mock tools, which are simplifications of actual tools (e.g., for taxi service agents, the assignTaxi tool is a mock tool that, instead of actually sending a taxi to a location, mimics the request flow);
- External tools, which are tools that enable the agent to start an Agora communication with a predefined server, although no information about the respective agents’ schema is provided. In other words, the skiLodge agent can open a channel with the weatherService agent.
Moreover, we added three protocol databases, which are simple Flask servers that host protocol documents. The first protocol database is a peer of the second, which is in turn a peer of the third (but the first and the third are not peers of each other). Every 10 executed queries, one protocol database shares its protocol documents with its peers. This simulates the propagation of protocol documents between different databases.
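The sharing step can be sketched as follows (a simplification that models each database as an in-memory dictionary from protocol hashes to documents; the names are illustrative):

```python
# Peer topology from the demo: db1 <-> db2 <-> db3,
# but db1 and db3 are not peers of each other.
PEERS = {"db1": ["db2"], "db2": ["db1", "db3"], "db3": ["db2"]}


def share_with_peers(stores: dict, source: str) -> None:
    """Copy every protocol document held by `source` to its direct peers.

    Documents spread one hop per sharing round, so a protocol submitted
    to db1 reaches db3 only after two rounds (via db2).
    """
    for peer in PEERS[source]:
        stores[peer].update(stores[source])
```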
Picking a Protocol
Users track the number of communications with a given server about a certain type of task until it hits one of two thresholds: one for using a protocol instead of natural language and one for negotiating a protocol ex novo.
When the first threshold is hit, the user invokes the LLM to check if either the server or the reference protocol database (which is randomly assigned to the user at the start of the demo) already have a suitable protocol. If there are none, the user continues using natural language until the second threshold is hit: in that case, the user begins a negotiation with the server and submits the final protocol to the reference protocol database.
Similarly, each server has a counter that tracks the number of natural language communications with any user since the last negotiation. Once the counter hits a threshold, the server requests a negotiation with the user, regardless of how many of the tracked queries were sent by the current user. After negotiation, the counter is reset.
In our demo, we set the thresholds for the user to respectively 3 and 5 communications, and the threshold for the server to 10.
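The user-side logic can be sketched as follows (thresholds of 3 and 5 as in the demo; the two callables stand in for the LLM-driven protocol lookup and negotiation steps and are our assumptions):

```python
PROTOCOL_THRESHOLD = 3     # switch from natural language to an existing protocol
NEGOTIATION_THRESHOLD = 5  # negotiate a protocol ex novo


def choose_communication_mode(counter: int, find_existing_protocol,
                              negotiate_protocol):
    """Decide how to phrase the next query to a server for a given task type.

    `counter` is the number of past communications with this server about
    this type of task.
    """
    if counter >= PROTOCOL_THRESHOLD:
        protocol = find_existing_protocol()
        if protocol is not None:
            return "protocol", protocol
        if counter >= NEGOTIATION_THRESHOLD:
            # No suitable protocol on the server or the reference database:
            # negotiate one and submit it to the reference protocol database.
            return "protocol", negotiate_protocol()
    return "natural-language", None
```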
APIs
For GPT-4o and Gemini 1.5 Pro, we used the OpenAI and Google APIs, respectively. For Llama 3 405b, we used the SambaNova API. Prices per million tokens are reported in Table 1.
Table 1: Prices per million tokens at the time of writing.
| Model | Input ($) | Output ($) |
| --- | --- | --- |
| GPT-4o | 5.00 | 15.00 |
| Llama 3 405b | 5.00 | 10.00 |
| Gemini 1.5 Pro | 3.50 | 10.50 |
Bootstrapping Quality-of-Life Extensions
For the sake of bootstrapping the network, while implementing the demo we added two features to our nodes:
- Providing each node with a simple protocol for multi-round communication in natural language;
- Allowing the protocol document to include machine-readable metadata, such as the name or a short description of the protocol. This helps an agent to determine quickly which protocols, among a list of potential protocols, can be suitable for a certain task.
We leave to future work whether these features should be integrated into the Agora standard or handled using protocol documents only.
D.2 Experimental Setup
Preliminary Tests
We first ran a series of qualitative tests to determine which of the considered LLMs (OpenAI GPT-4o, Llama 3 405b, Gemini 1.5 Pro) were the most suitable for negotiation and programming. We found that while all three LLMs were capable of negotiating and implementing protocols, GPT-4o was the most robust, followed by Llama 3 405b and finally Gemini 1.5 Pro. Surprisingly, the main factor behind the brittleness of Gemini 1.5 Pro was not the model’s inherent performance, but rather the lack of robustness of the API itself: even with tailored retry systems, the API sometimes failed to respond in a nondeterministic manner (i.e. the same query would at times succeed and at times fail). We believe that our experience was due to temporary server issues rather than fundamental problems with the model.
LLM Distribution
In light of our preliminary results, we manually assigned a model to each server node, following a power law consistent with our findings (9 nodes with GPT-4o, 4 nodes with Llama 3 405b, 2 nodes with Gemini 1.5 Pro). User agents were instead randomly assigned one of the three LLMs with uniform distribution. Overall, the breakdown of nodes by model is:
- GPT-4o: 38 nodes (9 server nodes, 29 user nodes)
- Llama 3 405b: 32 nodes (4 server nodes, 28 user nodes)
- Gemini 1.5 Pro: 30 nodes (2 server nodes, 28 user nodes)
Out of 1000 queries, 8 (0.8% of the total query volume) failed due to Google’s Gemini API not responding. This phenomenon was unrelated to the use of Agora: 500 Internal Server errors appeared in both the Agora demo and the natural-language counterfactual with roughly the same frequency.
Task Distribution
To simulate heterogeneity in communication frequency (i.e. the fact that some nodes tend to be more active than others), we assigned each user a “query budget” (the number of queries sent by that user) following a Pareto distribution with shape parameter $0.5$, adapted so that each user has at least 1 query. The query budget is then split between three randomly chosen types of queries using a Pareto law with a shape parameter of 1 and a minimum of 1 query per type (unless the budget is less than 3 queries). See Figure 6 for a visualisation of the distribution.
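The sampling procedure above can be sketched as follows. Where the text leaves details unspecified, we make them explicit as assumptions: the exact adaptation enforcing the 1-query minimum (flooring and rounding), and the handling of budgets below 3 queries (assigning the whole budget to a single type):

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_USERS = 85  # user nodes in the demo

# Query budget per user: Pareto law with shape parameter 0.5, adapted
# (here: floored at 1, then rounded) so each user has at least 1 query.
budgets = np.maximum(rng.pareto(0.5, NUM_USERS), 1).round().astype(int)

def split_budget(budget, rng, types=3):
    """Split a user's budget between query types using Pareto(1) weights,
    with at least 1 query per type when the budget allows (>= `types`)."""
    if budget < types:
        counts = [0] * types                   # budget too small for all types:
        counts[rng.integers(types)] = budget   # give it all to one type (assumption)
        return counts
    weights = rng.pareto(1.0, types) + 1
    raw = weights / weights.sum() * (budget - types)
    counts = np.floor(raw).astype(int) + 1     # guarantees >= 1 per type
    counts[0] += budget - counts.sum()         # absorb the rounding remainder
    return counts.tolist()

per_type = [split_budget(int(b), rng) for b in budgets]
```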
Figure 6: Distribution of query budgets for users. The y axis is logarithmic.
D.3 Additional Observations
Cost Breakdown
The breakdown of cost by activity is as follows:
- Natural language communication: 54%;
- Negotiation: 6%;
- Checking the suitability of existing protocols: 22%;
- Implementing the protocols: 17%.
Note that negotiation, despite being the most expensive activity per interaction (since it involves several rounds of communication), actually represented the smallest contribution to the total cost, with cheaper but more frequent operations (i.e. sending natural language messages and checking the suitability of protocols) making up the largest portion.
Similar Protocols
Due to the (intentional) partial insulation of nodes in the network, sometimes similar protocols emerged independently. Nevertheless, agents using different default protocols were still able to communicate by picking one of the available protocols; for the sake of simplicity, the preferred protocol is chosen by the sender.