# A Scalable Communication Protocol for Networks of Large Language Models
**Authors**: Samuele Marro, Emanuele La Malfa, Jesse Wright, Guohao Li, Nigel Shadbolt, Michael Wooldridge, Philip Torr (University of Oxford, Oxford, UK)
> Corresponding author. Email: samuele@robots.ox.ac.uk.
Abstract
Communication is a prerequisite for collaboration. When scaling networks of AI-powered agents, communication must be versatile, efficient, and portable. These requisites, which we refer to as the Agent Communication Trilemma, are hard to achieve in large networks of agents. We introduce Agora, a meta protocol that leverages existing communication standards to enable LLM-powered agents to solve complex problems efficiently. In Agora, agents typically use standardised routines for frequent communications, natural language for rare communications, and LLM-written routines for everything in between. Agora sidesteps the Agent Communication Trilemma and robustly handles changes in interfaces and members, allowing unprecedented scalability with full decentralisation and minimal human involvement. On large Agora networks, we observe the emergence of self-organising, fully automated protocols that achieve complex goals without human intervention.
1 Introduction
Human language evolved primarily for communication purposes (Fedorenko et al., 2024). Despite its inherent ambiguity, natural language provides great versatility and allows humans and machines to collaborate and achieve complex goals that they otherwise could not (Russell & Norvig, 2016).
Decades of literature in computer science explored how to foster collaboration between agents modelled as programs (Wooldridge & Jennings, 1995; Gilbert, 2019). Several research papers design networks of agents to solve complex problems by leveraging each model’s specialisation, the so-called rule-based agents paradigm (Wooldridge, 2009). Despite its influence, such a paradigm faces two major limitations: agents hardly adapt to environmental changes and require structured data that limits their versatility (Gilbert & Terna, 2000).
With the advent of Large Language Models (LLMs) (Vaswani et al., 2017; Brown et al., 2020), there has been a resurgence of interest in networks of collaborative agents. LLMs can solve a variety of problems (Achiam et al., 2023; Dubey et al., 2024a) expressed in natural language, as they excel at following instructions (Schulman et al., 2017; Rafailov et al., 2024). LLMs have also shown remarkable improvements in handling structured data such as graphs and formatted languages (Kassner et al., 2020; Collins et al., 2022; Jin et al., 2023; Lin et al., 2024).
In terms of performance (e.g., accuracy on classification), the literature suggests that specialised LLMs outperform general-purpose models (Hu et al., 2021; Zhang et al., 2024), while also mitigating the difficulties of handling gargantuan models and the drawbacks of data and model centralisation (Song et al., 2023).
Thus, we hypothesise that:
Hypothesis
A network of heterogeneous LLMs can automate various complex tasks with nearly no human supervision via specialised and efficient protocols.
However, networks of LLM-powered agents face three key challenges that make communication at scale significantly more difficult:
- LLMs are heterogeneous: different LLMs have different architectures, makers, capabilities, and usage policies. Heterogeneity is not unique to LLM-based agents; yet, compared to classic MAS agents, LLMs come with deeper representations of the surrounding environment and are thus more challenging to standardise.
- LLMs are (mostly) general-purpose tools: enumerating and standardising each task they can perform is infeasible.
- LLMs are expensive: the computational footprint and inference time of “small” LLMs dwarf those of comparable, specialised APIs.
Scalable communication between heterogeneous LLMs must be versatile, i.e., capable of handling a variety of use cases, efficient, i.e., requiring the least computational effort, and portable, i.e., supporting the protocol should require the least human effort possible. The above-mentioned issues constitute the Agent Communication Trilemma, which we expand in Section 3.
In light of this, the aim of this paper is the following:
Key Contribution
We design and implement a communication protocol between heterogeneous LLM-powered agents and assess its feasibility and scalability for solving high-order tasks.
We sidestep the Trilemma with Agora, a meta protocol that relies on the dual use of structured data for frequent communications and natural language for infrequent ones. With Agora, we instantiate large networks of LLM-powered agents that solve complex tasks autonomously by leveraging efficient communication schemas. In such networks, we observe agents developing an emergent, fully automated protocol to solve a complex task starting from an instruction expressed in natural language. We believe that this observation can serve as a basis to renew interest in emergent protocols/languages in large networks of LLMs (Lazaridou et al., 2018; Chaabouni et al., 2019; Lazaridou & Baroni, 2020; Chaabouni et al., 2022).
The paper is structured as follows. We first outline the key challenges that constitute the Agent Communication Trilemma (Section 3); we then detail how Agora addresses the Trilemma and serves as a communication protocol for networks of LLMs (Section 4). Finally, in Section 5, we provide two fully functional demos (our code is available at github.com/agora-protocol/paper-demo): the former, with two agents, to clarify Agora’s operating principles; the latter, with 100, to prove Agora’s scalability and show the emergence of self-organising behaviours.
2 Related Work
Multi-agent LLMs and communication.
At the time of writing, Multi-Agent-Systems of Large Language Models (MAS-LLM) have become an active area of research (Guo et al., 2024) after the upsurge of LLMs as general purpose problem solvers (Brown et al., 2020; Achiam et al., 2023; Dubey et al., 2024b). Many fields have adapted techniques from the MAS-LLM paradigm to solve problems single models fail at, including reasoning and math (Li et al., 2024), Theory of Mind (Cross et al., 2024; Li et al., 2023b), planning (Singh et al., 2024), alignment to human values (Pang et al., 2024), and simulation of games, economics, and political scenarios (Bakhtin et al., 2022; Hua et al., 2023; Wu et al., 2024a). The common intuition of these works is that by breaking a task into sub-components (Hong et al., 2023) and allocating a large number of specialised models (Li et al., 2024) to each of them (Li et al., 2023a), one can achieve higher performance and observe emergent behaviours that otherwise would not occur.
On the other hand, a key requisite for solving complex tasks in large networks of MAS-LLMs is effective and efficient communication. In large networks, LLMs must agree on the actions to take (Chen et al., 2023): works such as Agashe et al. (2023) and Liang et al. (2023) studied how LLMs debate to foster collaboration on high-order tasks (Du et al., 2023). Another recent line of research explores the topology of the MAS-LLM network as a facilitator to reach consensus (Chen et al., 2024).
LLMs for simulations and emergence of protocols.
A few seminal works studied how emergent communication and protocols arise between neural networks that manipulate symbols (Havrylov & Titov, 2017; Lazaridou et al., 2018; Lazaridou & Baroni, 2020). Written before the rise of LLMs, these works inspired researchers to explore how spontaneous collaboration emerges in MAS-LLMs (Wu et al., 2024b), with applications to the simulation of societies (Gao et al., 2024). Of particular interest for this paper are the works by Chaabouni et al. (2019) and Chaabouni et al. (2022). Chaabouni et al. (2019) describe how emergent communication systems between neural networks privilege longer messages. Chaabouni et al. (2022) posit the existence of “scaling laws” (Kaplan et al., 2020) for large networks of MAS-LLMs, in which the dataset, task complexity, and population size are key to observing emergent behaviours.
3 The Agent Communication Trilemma
<details>
<summary>img/triangle-trilemma.png Details</summary>

Triangle diagram of the Agent Communication Trilemma. The vertices of the outer (blue) triangle are Efficiency (top), Portability (left), and Versatility (right). Each side is labelled with the communication approach that trades away the opposite vertex: traditional static APIs (e.g., OBP) on the left (portable and efficient, but less versatile), meta-APIs (e.g., RDF) on the right (versatile and efficient, but less portable), and natural language on the bottom (portable and versatile, but less efficient). A smaller inner (red) triangle labelled "Agora" sits at the centre, indicating a balance between all three properties.
</details>
Figure 1: The Trilemma and how our solution (Agora) balances efficiency, portability and versatility.
An agent is a computer system that, in an environment, is capable of autonomous actions (the so-called ‘agency’ (Horty, 2001)) to meet its design objective (Wooldridge & Jennings, 1995; Wooldridge, 2009, p. 15). Just as humans must negotiate and cooperate to achieve shared goals, so too must agents within multi-agent systems (Wooldridge, 2009, p. 24-25). However, when designing communication protocols for heterogeneous networks (i.e., networks where agents have different architectures, capabilities and design constraints), we run into difficulties when attempting to optimise for three properties at the same time:
- Versatility: communication between agents should support a wide variety of messages, both in terms of content and format;
- Efficiency: the computational cost of running an agent and networking cost of communication should be minimal;
- Portability: supporting the communication protocol should require the least implementation effort by the largest number of agents involved.
We name the trade-off between such properties the Agent Communication Trilemma, which is illustrated in Figure 1. In the next sections, we will discuss how an LLM-powered communication protocol can trade off versatility, efficiency, and portability.
3.1 Versatile vs. Portable Communication
In networks of agents, versatility and portability are in tension for two fundamental reasons (Olivé, 2007). A prerequisite for two agents to communicate is (1) a shared conceptual understanding of the topic they communicate about. For instance, two agents can communicate about the weather only if they both ‘know’ what it means to be sunny, rainy, or overcast; likewise, they should share a similar notion of describing and measuring temperature (e.g., in degrees Celsius). In addition, (2) agents must encode and decode messages in a way that is intelligible to both. Continuing the weather example, if two agents exchange data using JSON objects, both the sender and the receiver must know the syntax (e.g., the keys of a JSON object, such as temperature) and the semantics (e.g., temperature is a 32-bit floating point value representing the temperature, in central London, as measured in degrees Celsius) of the exchanged messages.
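As a concrete sketch of requisite (2), the weather exchange above can be written as a pair of encode/decode helpers. The schema below (key names, units, and the helper names themselves) is hypothetical, assuming both agents have agreed on it out of band:

```python
import json

def encode_weather(temperature_celsius: float, condition: str) -> str:
    """Encode a weather report using the syntax both agents share."""
    return json.dumps({
        "temperature": temperature_celsius,  # degrees Celsius, as a float
        "condition": condition,              # e.g. "sunny", "rainy", "overcast"
    })

def decode_weather(message: str) -> tuple[float, str]:
    """Decode a weather message; fails if the shared syntax is violated."""
    data = json.loads(message)
    return float(data["temperature"]), data["condition"]

msg = encode_weather(21.5, "sunny")
temperature, condition = decode_weather(msg)
```

Both helpers embody the shared semantics: an agent that interprets the `temperature` key as Fahrenheit would decode the message syntactically but misunderstand it.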
In complex scenarios, defining routines whose syntax and semantics satisfy requisites (1) and (2) may be difficult. For example, a programmer has to manually implement a method to encode (or decode) messages to (or from) other agents. Additionally, the programmer must explicitly instruct the agent on how to manipulate and reason about the message content, often by interpreting API documentation describing the semantics of the message. Therefore, there is a trade-off between the breadth of messages (versatility) and the implementation cost (portability).
An example of a high-portability, low-versatility approach is the Open Banking Platform (OBP), which uses a well-defined Open API schema for data transfer (OBL, 2024). OBP is highly portable because it uses a fixed range of well-known concepts which developers can implement; however, it is restricted to discussing a narrow domain of banking data and is thus not versatile. On the other end of the spectrum, rules-based Semantic Web agents (Berners-Lee et al., 2001) that exchange RDF-encoded documents (Beckett et al., 2014) are highly versatile, since ontologies (Wooldridge, 2009, p. 180) enable the description of structured relations between essentially any concept. Still, they require developers to program agents to implement the specific ontologies used by the network (e.g., if a set of RDF triples states that the temperature is 38°C, an agent must be able to interpret the concepts of “temperature” and “Celsius”).
3.2 Efficient vs. Versatile and Portable Communication
As previously mentioned, rule-based agents excel at the tasks they are designed to solve but hardly adapt to new environments. Decades of research in reinforcement learning (Sutton, 2018) and then in deep reinforcement learning (Arulkumaran et al., 2017; Henderson et al., 2018) introduced a paradigm where agents learn to optimise a reward that serves as a proxy for the task we want them to solve. Agentic LLMs, i.e., multi-agent systems powered by language models, are a recent paradigm for machine-to-machine communication that relies mostly on the models’ proficiency at handling natural language and following instructions (Li et al., 2023a).
Natural language is highly expressive, making it a suitable choice for versatile communication (Russell & Norvig, 2016). Additionally, LLMs trained on massive corpora seem to develop an implicit understanding of various concepts that abstracts and makes communication independent from their internal architecture. Moreover, LLMs can integrate external tools, write code and invoke APIs with relatively little or no training (Schick et al., 2024), since the only requirement is a natural-language description of the tool and its parameters.
Conversely, natural language as a communication medium has two major drawbacks. While engineering and hardware improvements (Dubey et al., 2024b) mitigate costs over time, the computational requirements of invoking an LLM dwarf those of comparable APIs, representing a major bottleneck for scaling networks of LLMs. Moreover, using closed-source, pay-per-usage LLMs hosted by third parties is expensive and raises concerns in terms of replicability of the results (La Malfa et al., 2023). Additionally, natural language is inherently ambiguous: while LLMs have a certain degree of “common sense” with which to fulfil requests, non-determinism and the underspecification of natural language leave room for errors that routines minimise (for instance, if someone asks for the temperature in Fahrenheit and the agent has a tool that returns the temperature in Celsius, the model must know that Celsius and Fahrenheit are both units of measure for temperature). These factors make LLMs and natural language more error-prone than alternatives such as handwritten APIs.
In conclusion, RESTful APIs (efficient), RDF tuples (portable) and natural language (versatile) are all trade-offs in the Trilemma. While some approaches are more useful in practice than others, the fact that no communication format achieves all three properties simultaneously suggests that we need a hybrid communication protocol that leverages all of them. The next section outlines our solution.
4 Agora: a Communication Protocol Layer for LLMs
<details>
<summary>img/evil.png Details</summary>

Architecture diagram: several LLM-powered nodes send and receive messages with one another. Beneath the nodes, a platform layer abstracts the underlying technologies (databases such as SQL, MongoDB, and XML; languages such as PHP, HTML, CSS, JavaScript, and Python; services such as Spark, Meta, and ChatGPT), which rests in turn on a secure HTTPS communication layer.
</details>
(a) An illustration of Agora and how it abstracts the underlying implementation, communication, and physical layers.
<details>
<summary>img/evil-stack.png Details</summary>

Layer diagram of the Agora stack, from bottom to top: a Physical Layer; a Communication layer (secured, indicated by a padlock); an Implementation Layer (e.g., ChatGPT, Python, SQL); the Agora layer itself; and further layers that can be built on top (shown with a dashed border).
</details>
(b) Stack of technologies to build Agora.
Figure 2: How Agora fits into a standard communication protocol stack.
The key to solving the Communication Trilemma involves accepting that no single protocol can achieve optimal efficiency, portability and versatility at the same time. In this section we introduce Agora, a meta protocol that takes advantage of the unique capabilities of LLMs to sidestep the Trilemma by adapting different communications methods for different scenarios.
The most powerful LLMs share three key properties:
- They can understand, manipulate, and reply to other agents using natural language;
- They excel at following instructions, including writing code to implement routines (Schick et al., 2024; Hou et al., 2023; Liu et al., 2024);
- They can autonomously negotiate protocols and reach consensus on strategies and behaviours to adopt in complex scenarios (Chen et al., 2023; Fu et al., 2023).
At its core, Agora uses different communication formats depending on the circumstances; an agent can support a wide breadth of communications (high versatility) while handling the majority of the total volume of requests with efficient routines (high efficiency). Moreover, the entire negotiation and implementation workflow is handled by the LLMs and requires no human supervision (high portability). The concept of protocol documents (PD), which we sketch in Figure 3 and discuss in the next section, lies at the core of Agora’s functionalities.
In the next sections, we illustrate the hierarchy of communication methods Agora supports natively and the concept of PD; we then provide an example of how Agora works and how it enables versatile, efficient, and portable communication. We conclude by emphasising how one can integrate and build upon Agora with further technological layers independently from its underlying technologies.
4.1 Communication in (an) Agora
Agora introduces a machine-readable way to transfer and refer to protocols, namely the protocol documents (PDs). A PD is a plain-text description of a communication protocol. Throughout this paper, we use the word “protocol” to refer to any standardised description of structured communication. PDs are self-contained, implementation-agnostic, and contain everything an agent needs to support a protocol: this means that most descriptions of existing protocols, such as RFCs, are also suitable PDs. However, instead of relying on a central body to assign identifiers, a PD is uniquely identified by its hash (for multiplexing).
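For instance, the decentralised identifier of a PD can be sketched as a cryptographic hash of its text. The choice of SHA-256 and the function name below are illustrative assumptions; the protocol only requires that all agents derive the same identifier from the same document:

```python
import hashlib

def pd_hash(protocol_document: str) -> str:
    """Derive a decentralised identifier for a protocol document.

    No central body assigns identifiers: any agent holding the same
    plain-text PD computes the same hash, which can then be used for
    multiplexing incoming messages onto the right protocol.
    """
    return hashlib.sha256(protocol_document.encode("utf-8")).hexdigest()

# A toy PD (hypothetical content): any standardised plain-text protocol
# description, e.g. an RFC, would work just as well.
pd = "Weather protocol: the sender transmits {'temperature': <float, Celsius>}."
identifier = pd_hash(pd)
```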
In Agora, the most frequent communications have dedicated efficient routines, and the least frequent ones use inefficient but flexible LLMs and natural language. In particular:
- When possible, frequent communications are handled through traditional protocols, for which there are standard, human-written implementations (e.g., OBP);
- For communications that happen less frequently (or for which there are no standard protocols), agents can use structured data as an exchange medium (which can be handled by LLM-written routines);
- For communications that might be frequent for one side but not the other, the agents still use structured data, but one side can choose to use an LLM, while the other uses a routine;
- For rare communications or when a routine fails unexpectedly, the agents can resort to natural language.
It is entirely up to the agent whether to handle a query using a human-written routine, an LLM-written routine, or an LLM directly (or a combination of the three). This gives the agent maximum flexibility over how to process queries. Forcing or nudging a model to use a specific communication style can improve efficiency, but its discussion is out of the scope of this paper. One can, for example, specify in the system prompt of an LLM that it should negotiate a protocol whenever possible. In the demo (Section 5.3), we illustrate the trade-off between the versatility of a communication protocol and its expected usage.
Hierarchical communications support any form of communication (maximum versatility), although in practice an LLM is invoked only in rare cases (maximum efficiency). Moreover, since LLMs can implement routines on their own (PDs fully describe the syntax and semantics of a protocol), human programmers only need to provide an overview of the tools the agent has access to, which means that the implementation effort required on the human side is minimal (maximum portability). In other words, Agora sidesteps the Communication Trilemma by employing routines for frequent requests and resorting to natural language when agents need to negotiate efficient ways to solve a problem or when errors occur.
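The tiered handling above can be sketched as a simple dispatcher. All names are hypothetical: an incoming query tagged with a PD hash is served by a human-written routine if one exists, then by an LLM-written routine, and only as a last resort by invoking the LLM itself on the natural-language body:

```python
# Routine registries, keyed by protocol-document hash.
human_routines: dict = {}  # pd_hash -> callable, written by programmers
llm_routines: dict = {}    # pd_hash -> callable, written by the agent's LLM

def call_llm(body: str) -> str:
    """Placeholder for a real LLM invocation (the expensive, versatile path)."""
    return f"LLM reply to: {body}"

def handle_query(pd_hash, body: str) -> str:
    """Serve a query with the cheapest handler available for its protocol."""
    if pd_hash in human_routines:
        return human_routines[pd_hash](body)   # most efficient tier
    if pd_hash in llm_routines:
        return llm_routines[pd_hash](body)     # LLM-written, still cheap to run
    return call_llm(body)                      # natural-language fallback
```

Registering a routine for a frequent protocol turns all later queries tagged with that hash into cheap function calls, while unknown or one-off messages still reach the LLM.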
4.2 An Example of Communication over Agora
<details>
<summary>img/pd-negotiation.png Details</summary>

Diagram of two communication scenarios between LLM-powered nodes. Left: two nodes use natural language to negotiate a protocol document (PD), identified by its hash (e.g., "123"). Right: once the PD exists, a node sends a message formatted according to the PD, referencing the same hash, without any further negotiation.
</details>
Figure 3: How a protocol document is negotiated between LLM-powered agents (left) and used for future efficient communications (right).
We now describe how two agents, Alice and Bob, can efficiently communicate over Agora using a PD routine, as illustrated in Figure 3. Alice initially sends a query with the hash of its corresponding PD. Bob uses the hash to determine if he has a corresponding routine. If so, he calls it and handles the communication without invoking the LLM. Otherwise, Bob handles the response with the LLM itself.
If, over time, Bob repeatedly uses an LLM to reply to queries that follow a given protocol, to the point where invoking the LLM every time becomes expensive, he can instead use the LLM once to write a routine that handles such communications in the future.
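A minimal sketch of this amortisation policy follows. The fixed threshold and all names are illustrative assumptions (Agora leaves the decision of when to write a routine entirely to the agent):

```python
from collections import Counter

ROUTINE_THRESHOLD = 5        # hypothetical cut-off; not prescribed by Agora
llm_call_counts = Counter()  # pd_hash -> times the LLM handled that protocol

def record_llm_call(pd_hash, llm_routines, write_routine_with_llm):
    """Count LLM-handled queries per protocol; once a protocol is frequent
    enough, ask the LLM to emit a reusable routine for it."""
    llm_call_counts[pd_hash] += 1
    frequent = llm_call_counts[pd_hash] >= ROUTINE_THRESHOLD
    if frequent and pd_hash not in llm_routines:
        # write_routine_with_llm stands in for prompting the LLM with the PD
        # and asking it to generate code implementing that protocol.
        llm_routines[pd_hash] = write_routine_with_llm(pd_hash)
```

After the threshold is crossed, queries for that protocol hash no longer need the LLM at all.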
If the routine fails or the communication is a one-off instance that does not require a protocol, Alice and Bob use natural language, which is again handled by the LLM. Natural language is also available to bootstrap communication between nodes that have never interacted before, as well as to negotiate new protocols. That said, the lower cost of routines and the lack of ambiguity are strong incentives for agents to prefer structured data.
Note that PDs can be shared with other nodes in the network, which means that two agents that have never interacted before can use protocols developed by other agents.
In Appendix A, we provide details of five use cases of Agora that further show its versatility as a personal assistant and data analysis tool, and how it leverages compositionality and scalability to reduce costs.
4.3 Agora as a Layer Zero Protocol
Figure 2 illustrates that Agora is implementation and technology agnostic. The implementation of the agents themselves (e.g., LLMs), the database used to store data (e.g., VectorDB, SQL, MongoDB, etc.), the language in which implementations are written (Python, Java, etc.) and the nature of tools are all abstracted.
At the same time, PDs can refer to other protocol documents, and since routines can call other routines, agents can build upon previous negotiations to solve more complex tasks.
Finally, the versatility and portability of Agora make it straightforward to handle the addition or removal of a node, a change in the capabilities of a node, or a change in the goals of the network, as illustrated in the demo, Section 5.3.
All these factors contribute to making Agora a natural Layer Zero protocol, i.e. a foundation layer, for higher-order communication and collaboration between LLMs. We hope our protocol can fuel theoretical and applied research on complex protocols, negotiation schemes, and consensus algorithms in large networks of LLMs.
5 Agora in Practice
We implement and showcase two scenarios where Agora can be applied: in the first, two agents exchange data; in the second, a network of $100$ agents tests Agora's scalability and the capacity of LLM-powered agents to coordinate autonomously in complex scenarios. For space reasons, the scenarios are expanded in Appendices C and D; here, we focus on their functionality and our key observations in terms of efficiency, versatility, and portability, cost reduction, scalability, and the emergent behaviours of fully automated networks of LLMs.
5.1 Implementation Details
The design of Agora for our working demos follows three key principles:
- Minimality. Agora enforces the basic standards that allow for efficient negotiation and use of protocols, leaving everything else to PDs or other higher-order standards;
- Decentralisation. Agora does not rely on central authorities, with any collection of nodes being able to use Agora independently;
- Full backward compatibility. Agora supports existing communication protocols and schemas such as OpenAPI and JSON-Schema.
From a practical point of view, Agora uses HTTPS as the base communication layer and JSON as the format for exchanging metadata. When sending a message in a given protocol, an agent sends a JSON document with three keys: the protocol hash, the body of the request formatted according to the protocol, and a non-empty list of sources from which the protocol can be downloaded. The receiver downloads the PD from its preferred source and, upon checking that the hash matches, stores it for future use. This hash-based identification system ensures that any node can reference any PD without relying on a central authority to assign identifiers. Where PDs are stored is entirely up to the agents; aside from regular cloud storage, hash-based indexing makes decentralised storage options (such as IPFS; Benet, 2014) viable. Additionally, since essentially all protocols can be stored as PDs, Agora has full backwards compatibility with existing protocols (although human programmers are encouraged to provide existing, standardised implementations instead of having the LLM re-implement them from scratch).
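A minimal sketch of this three-key envelope and the receiver's hash check, assuming SHA-256 as the hash function and illustrative field names (the text does not fix either):

```python
import hashlib
import json

# Hypothetical PD text; in practice this is the full protocol document.
pd_text = "Query: JSON with 'location' and 'date'; reply with forecast fields."
pd_hash = hashlib.sha256(pd_text.encode()).hexdigest()

# The three-key envelope described above (key names are illustrative).
envelope = {
    "protocolHash": pd_hash,
    "body": json.dumps({"location": "London, UK", "date": "2024-09-27"}),
    "protocolSources": ["https://example.com/pds/weather.txt"],
}


def verify_pd(downloaded_pd, expected_hash):
    """Receiver side: after downloading the PD from a listed source,
    check that its hash matches the one declared in the envelope."""
    return hashlib.sha256(downloaded_pd.encode()).hexdigest() == expected_hash
```

Because the identifier is the content hash itself, any source (cloud storage, IPFS, a peer) can serve the PD and the receiver can still verify it without trusting the source.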
To simplify negotiation, an agent can expose an endpoint with a list of supported protocols: a potential sender can thus compare the list with its own to automatically determine if there is a common protocol. The sender can also use a potentially unsupported protocol, although the receiver can choose to reject it by returning a predefined error message.
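The comparison step reduces to a set intersection over PD hashes; a sketch (function and variable names are illustrative):

```python
def common_protocols(sender_hashes, receiver_hashes):
    """Return the PD hashes supported by both parties. The sender can
    pick any of these, or attempt an unsupported protocol and handle a
    possible rejection from the receiver."""
    return set(sender_hashes) & set(receiver_hashes)
```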
Refer to the Appendix for a more formal specification of Agora.
5.2 Demo: Retrieving Weather Data
Consider two agents, Alice and Bob. Alice is a Llama-3-405B-powered (Dubey et al., 2024b) agent managing the bookings of a guided tour service in London. While Llama-3 models can be hosted locally, for the sake of a fair comparison with GPT-4o and Gemini, we use a cloud provider, namely SambaNova (https://sambanova.ai). Bob is a GPT-4o-powered (Achiam et al., 2023) weather-service agent that provides forecasts for a given date and location. As part of the user interaction loop, Alice notifies the user if heavy rain is expected on a booked date.
To check the weather, she initially uses her LLM to send a natural language query to Bob (phase A1):
Alice - Natural Language
What is the weather forecast for London, UK on 2024-09-27?
Bob uses his Toolformer LLM (Schick et al., 2024) to query his database (phase B1) and returns a natural language reply (phase B2):
Bob - Natural Language
The weather forecast for London, UK, on 2024-09-27 is as follows: “Rainy, 11 degrees Celsius, with a precipitation of 12 mm.”
Over time, the cost of invoking an LLM for phases A1 and B2 dominates all other costs; Alice and Bob thus decide to develop a protocol. Alice checks if Bob already supports a suitable protocol but finds none. Therefore, she decides to negotiate a protocol with Bob. After a few rounds of negotiation, Alice and Bob agree on the following protocol: Alice sends a JSON document with two fields, location and date, and Bob replies with a JSON document containing three fields, namely temperature (in degrees Celsius), precipitation (in millimetres), and weatherCondition (one of “sunny”, “cloudy”, “rainy” and “snowy”). From there on, Alice specifies the protocol hash when performing a query. An example of an exchanged message (excluding Agora’s metadata) is:
Alice - PD
{"location": "London, UK", "date": "2024-09-27"}
Both Alice and Bob independently decide to write a routine to handle their side of the communication. From now on, Alice and Bob do not need to use the LLM to exchange weather data: a routine now automates phases A1, B1 and B2, avoiding the cost of invoking the respective LLMs.
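The kind of routine Bob's LLM might produce for the agreed protocol can be sketched as follows; the in-memory forecast table stands in for Bob's actual database, and all names are illustrative:

```python
import json

# Stub for Bob's weather database: (location, date) -> (condition, °C, mm).
FORECASTS = {("London, UK", "2024-09-27"): ("rainy", 11, 12)}


def weather_routine(message):
    """Handle one query under the negotiated protocol: parse the JSON
    request, look up the forecast, and return the three agreed fields."""
    query = json.loads(message)
    condition, temp, precip = FORECASTS[(query["location"], query["date"])]
    return json.dumps({
        "temperature": temp,           # degrees Celsius
        "precipitation": precip,       # millimetres
        "weatherCondition": condition,  # sunny / cloudy / rainy / snowy
    })
```

Once registered under the protocol's hash, this routine answers Alice's structured queries without any LLM call.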
A cost analysis.
In our demo, negotiating the protocol and implementing the routines cost $0.043$ USD in API calls, compared to an average cost of $0.020$ USD for a natural-language exchange. This means that, as long as Alice and Bob use the agreed-upon protocol more than twice, Agora reduces the overall cost. Please refer to Appendix C for a transcription of the negotiation process and the final protocol.
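The break-even point follows directly from these figures, assuming routine-handled exchanges are effectively free once the routine is written:

```python
# One-time cost of negotiating the protocol and writing the routines (USD).
negotiation_cost = 0.043
# Average cost of one natural-language exchange (USD).
nl_cost_per_exchange = 0.020

# Agora wins once n * nl_cost_per_exchange exceeds the one-time cost,
# i.e. after roughly 2.15 exchanges -- hence "more than twice".
break_even = negotiation_cost / nl_cost_per_exchange
```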
As a final note, we stress that the entire communication happened without human intervention. Additionally, should Bob become unavailable, Alice can simply reuse the PD with a new node that may use a different LLM/database/technology stack.
5.3 Demo: a Network of 100 Agents
<details>
<summary>img/agora-100.png Details</summary>

### Visual Description
A diagram of an Agora sub-network for food delivery. Left: the relevant nodes and their PDs ("Negotiate PD", hash 123; "Delivers food PD", hash 234; "Traffic flow PD", hash 600). Right: the six-step workflow: (1) the customer orders food from a restaurant; (2) the restaurant uses a PD to check the delivery service; (3) the delivery service uses a PD to check the traffic service; (4–6) PD replies confirm that the traffic flows, that a rider is available, and that the food is being delivered. Result: the order starts.
</details>
Figure 4: Illustration of how, in an Agora network with $100$ agents (left; for clarity, only the relevant sub-network is displayed), a protocol for food delivery emerges (right).
We now show the scaling capabilities and emergent behaviours of Agora by considering a network of 100 LLM-powered agents. In particular, we scale the number of agents, which, as posited in Chaabouni et al. (2022), is a requisite for the emergence of complex behaviours in multi-agent networks.
We design a network of $85$ assistant agents interacting with $15$ server agents, all powered by LLMs. The server agents offer various services, such as booking hotel rooms, calling taxis, ordering food, etc. An example of a sub-network for food delivery is sketched in Figure 4, left. Their specialisation is handled via prompting, as in Deshpande et al. (2023); Joshi et al. (2023); Li et al. (2023a). As part of their workflow, server agents must interact with several tools and databases; additionally, some servers need to interact with other servers to complete assistants’ requests (e.g., taxi services use the traffic data agent to adjust estimated fares for a run). We bootstrap the network by leveraging the underlying communication layer (as described in Section 4 and Figure 2) and informing the nodes of which URLs correspond to which node, as well as manually creating the connection links between agents (e.g., the Taxi Service server knows that the server on port 5007 is a traffic server, but it does not know how to communicate with it or what information it requires).
To showcase the portability of Agora throughout the network, we use different database technologies (SQL and MongoDB) and different LLMs, both open- and closed-source (GPT-4o, Llama-3-405B, and Gemini 1.5 Pro (Reid et al., 2024)). We then generate $1000$ random queries, ranging from simple ones, such as requesting today’s weather, to more complex ones, like booking rooms in ski resorts, buying tickets for movies, ordering one of each dish from a menu, and so on. For each query, assistants receive a JSON document (which represents the task data) and are tasked with fulfilling the request and returning a parsed response that follows a given schema. Queries are distributed among assistants following a Pareto distribution, to simulate some assistants sending significantly more requests than others. Each node can also read PDs from and share PDs with one of three protocol databases. Overall, these design decisions result in a very heterogeneous network, testing the limits of Agora. Refer to Appendix D for further implementation details.
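A Pareto-distributed query assignment of this kind can be sketched as follows; the shape parameter and helper names are illustrative assumptions, not the paper's exact setup:

```python
import random

random.seed(0)  # reproducible sketch

NUM_ASSISTANTS = 85
NUM_QUERIES = 1000


def assign_queries(num_queries, num_assistants, alpha=1.5):
    """Assign queries to assistants with Pareto-distributed load:
    a handful of assistants end up sending most of the requests."""
    counts = [0] * num_assistants
    for _ in range(num_queries):
        # paretovariate(alpha) >= 1.0, so the index is always valid.
        idx = min(int(random.paretovariate(alpha)) - 1, num_assistants - 1)
        counts[idx] += 1
    return counts


counts = assign_queries(NUM_QUERIES, NUM_ASSISTANTS)
```

With a heavy-tailed shape like `alpha=1.5`, the first few assistants receive the bulk of the queries while most send only a handful, mirroring the skewed traffic the demo simulates.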
Emergent protocols in large networks.
Once the connections are established and the network can send and receive messages, we observe several noteworthy behaviours. As PDs are progressively shared between agents (see Figure 5(b)), we observe the emergence of a decentralised consensus on the appropriate protocols for a given task. An example of this behaviour involves ordering food from restaurants: an agent queries another to request food to be delivered to a certain address. The restaurant agent requests a delivery driver from a food delivery service, which, in turn, checks with the traffic data agent to see if the traffic is smooth enough to fulfil the delivery. None of the agents knows the others’ roles or the protocols involved beyond its immediate communications. Still, the interaction of the various agents creates an automated workflow that takes care of everything. The emergence of such a protocol is illustrated in Figure 4 (right). In contrast with some recent literature on the emergence of complex protocols (Chaabouni et al., 2019), we observe that with the proper incentives (i.e., efficiency), agents in Agora escape the inefficient trap of committing to longer messages in large-scale communications.
A cost analysis.
We compare the cost of running our Agora network against one that uses natural language for all communications. As shown in Figure 5(a), at the beginning Agora only marginally outperforms the natural-language-only network in cost-efficiency; the gap then widens over time, as progressively more Agora-powered nodes rely on LLM-written routines. The overall cost in API calls for running $1000$ queries in the natural-language network is $36.23$ USD, compared to Agora’s $7.67$ USD: in other words, executing this demo with Agora is approximately five times cheaper than with natural language alone. Continuing the demo for more queries would have led to an even larger cost difference.
<details>
<summary>x1.png Details</summary>

### Visual Description
A line chart of per-query cost (USD, 0.000 to 0.040) against number of queries (0 to 1000) for natural language (teal) and Agora (orange). The natural-language cost stays roughly constant between 0.030 and 0.040 USD, while Agora's cost drops sharply over the first 400 queries and stabilises around 0.005 USD.
</details>
(a) Cost comparison of natural language vs Agora on a network of $100$ agents. Costs are averaged with a window size of $100$ .
<details>
<summary>x2.png Details</summary>

### Visual Description
A dual-axis line chart against number of queries (0 to 1000). The left axis shows the percentage of queries handled with LLMs (blue), falling from roughly 80% to about 30%; the right axis shows the number of established protocols (red), rising stepwise and plateauing at 14 around 550 queries.
</details>
(b) The number of queries to the LLMs in Agora decreases over time as the number of established PDs grows.
Figure 5: Summary of the efficiency of Agora for the demo with 100 agents.
6 Conclusions
In this paper, we introduced Agora, a meta protocol that sidesteps the Agent Communication Trilemma by using a mix of natural language and structured protocols. We showed that Agora agents can negotiate, implement and use protocols, creating self-organising networks that solve complex tasks. Additionally, we demonstrated the scalability of Agora by testing a $100$ -agent demo and achieving a five-fold reduction in costs compared to natural language-only communication. Our results showcase the power of negotiation as a basis for efficient, scalable, and decentralised agent networks. As LLMs continue to improve and as interactions between them increase, LLM-powered agent networks have the potential to surpass the scale limitations of single LLMs. Developing frameworks and protocols that enable decentralised, flexible and efficient communication, either through Agora or other technologies, can lay the foundations for a future where complex activities are partially, if not fully, automated by LLMs.
Acknowledgements
We thank the Alan Turing Institute for providing the computational power to run our agent network, as well as SambaNova for providing credits for our Llama 3 experiments. Samuele Marro is funded by Microsoft Research Ltd. Emanuele La Malfa is funded by the Alan Turing Institute. Jesse Wright is funded by the Department of Computer Science of the University of Oxford.
References
- Achiam et al. (2023) Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
- Agashe et al. (2023) Saaket Agashe, Yue Fan, and Xin Eric Wang. Evaluating multi-agent coordination abilities in large language models. arXiv preprint arXiv:2310.03903, 2023.
- Arulkumaran et al. (2017) Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, and Anil Anthony Bharath. Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine, 34(6):26–38, 2017.
- Bakhtin et al. (2022) Anton Bakhtin, Noam Brown, Emily Dinan, Gabriele Farina, Colin Flaherty, Daniel Fried, Andrew Goff, Jonathan Gray, Hengyuan Hu, et al. (Meta Fundamental AI Research Diplomacy Team (FAIR)). Human-level play in the game of diplomacy by combining language models with strategic reasoning. Science, 378(6624):1067–1074, 2022.
- Beckett et al. (2014) David Beckett, Tim Berners-Lee, Eric Prud’hommeaux, and Gavin Carothers. RDF 1.1 Turtle. World Wide Web Consortium, 2014.
- Benet (2014) Juan Benet. Ipfs-content addressed, versioned, p2p file system. arXiv preprint arXiv:1407.3561, 2014.
- Berners-Lee et al. (2001) Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic web. Scientific american, 284(5):34–43, 2001.
- Brown et al. (2020) Tom B. Brown, Benjamin Mann, Nick Ryder, et al. Language models are few-shot learners. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020. URL https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html.
- Chaabouni et al. (2019) Rahma Chaabouni, Eugene Kharitonov, Emmanuel Dupoux, and Marco Baroni. Anti-efficient encoding in emergent communication. Advances in Neural Information Processing Systems, 32, 2019.
- Chaabouni et al. (2022) Rahma Chaabouni, Florian Strub, Florent Altché, Eugene Tarassov, Corentin Tallec, Elnaz Davoodi, Kory Wallace Mathewson, Olivier Tieleman, Angeliki Lazaridou, and Bilal Piot. Emergent communication at scale. In International conference on learning representations, 2022.
- Chen et al. (2023) Huaben Chen, Wenkang Ji, Lufeng Xu, and Shiyu Zhao. Multi-agent consensus seeking via large language models. arXiv preprint arXiv:2310.20151, 2023.
- Chen et al. (2024) Yongchao Chen, Jacob Arkin, Yang Zhang, Nicholas Roy, and Chuchu Fan. Scalable multi-robot collaboration with large language models: Centralized or decentralized systems? In 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 4311–4317. IEEE, 2024.
- Collins et al. (2022) Katherine M Collins, Catherine Wong, Jiahai Feng, Megan Wei, and Joshua B Tenenbaum. Structured, flexible, and robust: benchmarking and improving large language models towards more human-like behavior in out-of-distribution reasoning tasks. arXiv preprint arXiv:2205.05718, 2022.
- Cross et al. (2024) Logan Cross, Violet Xiang, Agam Bhatia, Daniel LK Yamins, and Nick Haber. Hypothetical minds: Scaffolding theory of mind for multi-agent tasks with large language models. arXiv preprint arXiv:2407.07086, 2024.
- Deshpande et al. (2023) Ameet Deshpande, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, and Karthik Narasimhan. Toxicity in chatgpt: Analyzing persona-assigned language models. arXiv preprint arXiv:2304.05335, 2023.
- Du et al. (2023) Yilun Du, Shuang Li, Antonio Torralba, Joshua B Tenenbaum, and Igor Mordatch. Improving factuality and reasoning in language models through multiagent debate. arXiv preprint arXiv:2305.14325, 2023.
- Dubey et al. (2024a) Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, et al. The llama 3 herd of models. 2024a. URL https://api.semanticscholar.org/CorpusID:271571434.
- Dubey et al. (2024b) Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024b.
- Fedorenko et al. (2024) Evelina Fedorenko, Steven T. Piantadosi, and Edward A. F. Gibson. Language is primarily a tool for communication rather than thought. Nature, 630, 2024.
- Fu et al. (2023) Yao Fu, Hao Peng, Tushar Khot, and Mirella Lapata. Improving language model negotiation with self-play and in-context learning from ai feedback. arXiv preprint arXiv:2305.10142, 2023.
- Gao et al. (2024) Chen Gao, Fengli Xu, Xu Chen, Xiang Wang, Xiangnan He, and Yong Li. Simulating human society with large language model agents: City, social media, and economic system. In Companion Proceedings of the ACM on Web Conference 2024, pp. 1290–1293, 2024.
- Gilbert (2019) Nigel Gilbert. Agent-based models. Sage Publications, 2019.
- Gilbert & Terna (2000) Nigel Gilbert and Pietro Terna. How to build and use agent-based models in social science. Mind & Society, 1:57–72, 2000.
- Guo et al. (2024) Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V Chawla, Olaf Wiest, and Xiangliang Zhang. Large language model based multi-agents: A survey of progress and challenges. arXiv preprint arXiv:2402.01680, 2024.
- Havrylov & Titov (2017) Serhii Havrylov and Ivan Titov. Emergence of language with multi-agent games: Learning to communicate with sequences of symbols. Advances in neural information processing systems, 30, 2017.
- Henderson et al. (2018) Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and David Meger. Deep reinforcement learning that matters. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018.
- Hong et al. (2023) Sirui Hong, Xiawu Zheng, Jonathan Chen, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, et al. Metagpt: Meta programming for multi-agent collaborative framework. arXiv preprint arXiv:2308.00352, 2023.
- Horty (2001) John F Horty. Agency and deontic logic. Oxford University Press, 2001.
- Hou et al. (2023) Xinyi Hou, Yanjie Zhao, Yue Liu, Zhou Yang, Kailong Wang, Li Li, Xiapu Luo, David Lo, John Grundy, and Haoyu Wang. Large language models for software engineering: A systematic literature review. ACM Transactions on Software Engineering and Methodology, 2023.
- Hu et al. (2021) Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
- Hua et al. (2023) Wenyue Hua, Lizhou Fan, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, and Yongfeng Zhang. War and peace (waragent): Large language model-based multi-agent simulation of world wars. arXiv preprint arXiv:2311.17227, 2023.
- Jin et al. (2023) Bowen Jin, Gang Liu, Chi Han, Meng Jiang, Heng Ji, and Jiawei Han. Large language models on graphs: A comprehensive survey. arXiv preprint arXiv:2312.02783, 2023.
- Joshi et al. (2023) Nitish Joshi, Javier Rando, Abulhair Saparov, Najoung Kim, and He He. Personas as a way to model truthfulness in language models. arXiv preprint arXiv:2310.18168, 2023.
- Kaplan et al. (2020) Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020.
- Kassner et al. (2020) Nora Kassner, Benno Krojer, and Hinrich Schütze. Are pretrained language models symbolic reasoners over knowledge? arXiv preprint arXiv:2006.10413, 2020.
- La Malfa et al. (2023) Emanuele La Malfa, Aleksandar Petrov, Simon Frieder, Christoph Weinhuber, Ryan Burnell, Raza Nazar, Anthony G Cohn, Nigel Shadbolt, and Michael Wooldridge. Language models as a service: Overview of a new paradigm and its challenges. arXiv e-prints, pp. arXiv–2309, 2023.
- Lazaridou & Baroni (2020) Angeliki Lazaridou and Marco Baroni. Emergent multi-agent communication in the deep learning era. arXiv preprint arXiv:2006.02419, 2020.
- Lazaridou et al. (2018) Angeliki Lazaridou, Karl Moritz Hermann, Karl Tuyls, and Stephen Clark. Emergence of linguistic communication from referential games with symbolic and pixel input. arXiv preprint arXiv:1804.03984, 2018.
- Li et al. (2023a) Guohao Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. Camel: Communicative agents for “mind” exploration of large language model society. Advances in Neural Information Processing Systems, 36:51991–52008, 2023a.
- Li et al. (2023b) Huao Li, Yu Quan Chong, Simon Stepputtis, Joseph Campbell, Dana Hughes, Michael Lewis, and Katia Sycara. Theory of mind for multi-agent collaboration via large language models. arXiv preprint arXiv:2310.10701, 2023b.
- Li et al. (2024) Junyou Li, Qin Zhang, Yangbin Yu, Qiang Fu, and Deheng Ye. More agents is all you need. arXiv preprint arXiv:2402.05120, 2024.
- Liang et al. (2023) Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Zhaopeng Tu, and Shuming Shi. Encouraging divergent thinking in large language models through multi-agent debate. arXiv preprint arXiv:2305.19118, 2023.
- Lin et al. (2024) Fangru Lin, Emanuele La Malfa, Valentin Hofmann, Elle Michelle Yang, Anthony Cohn, and Janet B Pierrehumbert. Graph-enhanced large language models in asynchronous plan reasoning. arXiv preprint arXiv:2402.02805, 2024.
- Liu et al. (2024) Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation. Advances in Neural Information Processing Systems, 36, 2024.
- OBL (2024) OBL. Open Banking read/write API profile v4.0. 2024. URL https://openbankinguk.github.io/read-write-api-site3/v4.0/profiles/read-write-data-api-profile.html.
- Olivé (2007) Antoni Olivé. Conceptual modeling of information systems. Springer Science & Business Media, 2007.
- Pang et al. (2024) Xianghe Pang, Shuo Tang, Rui Ye, Yuxin Xiong, Bolun Zhang, Yanfeng Wang, and Siheng Chen. Self-alignment of large language models via multi-agent social simulation. In ICLR 2024 Workshop on Large Language Model (LLM) Agents, 2024.
- Rafailov et al. (2024) Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36, 2024.
- Reid et al. (2024) Machel Reid, Nikolay Savinov, Denis Teplyashin, Dmitry Lepikhin, Timothy Lillicrap, Jean-baptiste Alayrac, Radu Soricut, Angeliki Lazaridou, Orhan Firat, Julian Schrittwieser, et al. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv preprint arXiv:2403.05530, 2024.
- Russell & Norvig (2016) Stuart J Russell and Peter Norvig. Artificial intelligence: a modern approach. Pearson, 2016.
- Schick et al. (2024) Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems, 36, 2024.
- Schulman et al. (2017) John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
- Singh et al. (2024) Ishika Singh, David Traum, and Jesse Thomason. Twostep: Multi-agent task planning using classical planners and large language models. arXiv preprint arXiv:2403.17246, 2024.
- Song et al. (2023) Junghwan Song, Heeyoung Jung, Selin Chun, Hyunwoo Lee, Minhyeok Kang, Minkyung Park, Eunsang Cho, et al. How to decentralize the internet: A focus on data consolidation and user privacy. Computer Networks, 234:109911, 2023.
- Sutton (2018) Richard S Sutton. Reinforcement learning: An introduction. A Bradford Book, 2018.
- Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems 30, December 4-9, 2017, Long Beach, CA, USA, pp. 5998–6008, 2017. URL https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
- Wooldridge (2009) Michael Wooldridge. An introduction to multiagent systems. John Wiley & Sons, 2009.
- Wooldridge & Jennings (1995) Michael Wooldridge and Nicholas R Jennings. Intelligent agents: Theory and practice. The knowledge engineering review, 10(2):115–152, 1995.
- Wu et al. (2024a) Shuang Wu, Liwen Zhu, Tao Yang, Shiwei Xu, Qiang Fu, Yang Wei, and Haobo Fu. Enhance reasoning for large language models in the game werewolf. arXiv preprint arXiv:2402.02330, 2024a.
- Wu et al. (2024b) Zengqing Wu, Shuyuan Zheng, Qianying Liu, Xu Han, Brian Inhyuk Kwon, Makoto Onizuka, Shaojie Tang, Run Peng, and Chuan Xiao. Shall we talk: Exploring spontaneous collaborations of competing llm agents. arXiv preprint arXiv:2402.12327, 2024b.
- Zhang et al. (2024) Biao Zhang, Zhongtao Liu, Colin Cherry, and Orhan Firat. When scaling meets llm finetuning: The effect of data, model and finetuning method. arXiv preprint arXiv:2402.17193, 2024.
Appendix A Agora: Use Cases
S1. Agora as a personal assistant.
A user is organising a trip to Paris: they want to book a flight, rent a car, and book a hotel room.
The LLM reads the prompt, identifies the actions it must undertake, and checks whether there are LLMs available in Agora that can fulfil them. For each service, an LLM is ready to reply.
1. A user sends a message to their personal assistant.
2. The personal assistant dispatches it to Agora.
<details>
<summary>img/scenarios/s1-1.png Details</summary>

Figure: the user's natural-language request ("I want to book a flight, a hotel and a car for next week in Paris.") is sent from their phone and dispatched to the LLMs' assistant, shown as a network of four interconnected nodes (flight, car, hotel, and one unspecified service).
</details>
The LLM that acts as personal assistant dispatches the flight, hotel, and car requests to the respective LLMs in the network. The messages are sent in natural language, as there are no pre-existing routines to handle them.
1. The LLM personal assistant dispatches each message to the right node.
2. The car, hotel, and flight LLMs process the requests and turn them into queries for their booking systems.
3. Each LLM replies with its availability and options.
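The dispatch step can be sketched as a simple keyword router. This is an illustrative sketch only: the service names, node identifiers, and message format below are assumptions, not part of Agora's specification.

```python
# Hypothetical mapping from service keywords to the network nodes that
# handle them (illustrative names, not defined by Agora).
SERVICES = {"flight": "flight-llm", "hotel": "hotel-llm", "car": "car-llm"}

def dispatch(user_message: str) -> dict:
    """Route each service mentioned in the request to its handling node."""
    return {
        node: f"Handle the '{keyword}' part of: {user_message}"
        for keyword, node in SERVICES.items()
        if keyword in user_message.lower()
    }

# The assistant forwards one natural-language sub-request per service node.
outbox = dispatch("I want to book a flight, a hotel and a car for next week in Paris.")
```

In practice the assistant's LLM, rather than a keyword match, would identify the sub-tasks; the sketch only shows the fan-out structure.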
<details>
<summary>img/scenarios/s1-2.png Details</summary>

Figure: from the central assistant node, solid arrows dispatch "Book a flight for Paris", "Book a car in Paris", and "Book a hotel room in Paris" to the flight, car, and hotel nodes; dashed lines between the three service nodes indicate their interdependencies.
</details>
In subsequent iterations, the LLMs involved negotiate a routine that standardises these requests, so that they can be processed without natural language and without invoking the LLMs.
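A negotiated routine might look like the sketch below: a minimal schema plus a plain handler that processes structured requests without any LLM call. The field names, schema shape, and reply format are assumptions for illustration.

```python
import json

# Hypothetical schema negotiated for flight-booking requests (illustrative only).
FLIGHT_REQUEST_SCHEMA = {"required": ["destination", "week"]}

def handle_flight_request(raw: str) -> dict:
    """Process a structured booking request without invoking an LLM."""
    request = json.loads(raw)
    missing = [f for f in FLIGHT_REQUEST_SCHEMA["required"] if f not in request]
    if missing:
        # Only on a schema mismatch would the node fall back to natural
        # language, and hence to its LLM.
        return {"status": "error", "missing_fields": missing}
    # A real node would query its booking system here.
    return {"status": "ok", "options": [f"Flight to {request['destination']}"]}

reply = handle_flight_request('{"destination": "Paris", "week": "2024-W23"}')
```

Because conformant requests never reach the LLM, the routine amortises the cost of the initial natural-language exchange over all later bookings.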
<details>
<summary>img/scenarios/s1-3.png Details</summary>

Figure: diagram 1 shows the central node exchanging `<list of flights>`, `<list of cars>`, and `<list of hotels>` with the respective service nodes; diagram 2 shows the flight, car, and hotel nodes linked to a shared protocol document by arrows labelled "protocol".
</details>
The user receives all the data and decides whether to book or not.
S2. Security and scalability.
An LLM (Alice) collects some historical data from another LLM (Bob) that has access to a database whose internal mechanism and implementation must be kept private.
Alice submits a request to collect some historical records from Bob. The request is formatted in natural language.
<details>
<summary>img/scenarios/s2-1.png Details</summary>

Figure: Alice (A) sends Bob (B) the natural-language request "I need the records from 2012"; Bob alone exchanges data with a storage unit marked "Not Accessible by other LLMs".
</details>
Alice submits another request to Bob.
Bob negotiates a protocol to query its data and writes a shared protocol document in JSON.
<details>
<summary>img/scenarios/s2-2.png Details</summary>

Figure: Alice (A) and Bob (B) negotiate the protocol; Bob writes the resulting protocol document and keeps exclusive, bidirectional access to the storage marked "Not Accessible by other LLMs".
</details>
Alice now uses the protocol to query data from Bob.
Bob directly turns the JSON it receives from Alice into a query for its database. In this way, Bob does not invoke the LLM, and the database internals are not exposed.
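This translation step can be sketched as follows, assuming an in-memory SQLite database and a `{"record": <year>}` request shape as in the figure; the table name and columns are illustrative assumptions.

```python
import json
import sqlite3

# Bob's private database; its schema and contents are never exposed to Alice.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records (year INTEGER, value TEXT)")
db.execute("INSERT INTO records VALUES (2013, 'historical entry')")

def answer(request_json: str) -> list:
    """Turn a protocol-conformant JSON request into a parameterised query.

    No LLM call is made, and Alice only sees the rows she asked for.
    """
    request = json.loads(request_json)  # e.g. {"record": 2013}
    rows = db.execute(
        "SELECT value FROM records WHERE year = ?", (request["record"],)
    ).fetchall()
    return [value for (value,) in rows]

result = answer('{"record": 2013}')
```

The parameterised query (`?` placeholder) also shields Bob from injection through the protocol payload.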
<details>
<summary>img/scenarios/s2-3.png Details</summary>

Figure: Alice (A) sends Bob (B) the protocol-conformant request `{ "record": 2013 }`; Bob exchanges data with its private storage, which remains "Not Accessible by other LLMs".
</details>
S3. Compositional tasks.
An LLM (Alice) wants to (1) analyse some market data and then (2) compute some metrics. Two LLMs in the network can do that.
1. Alice retrieves the protocol documents from a database.
2. Alice finds out that there are two protocol documents that can be used to achieve its goal.
<details>
<summary>img/scenarios/s3-1.png Details</summary>

Figure: Alice (A) checks whether a protocol document exists for its goal ("Retrieve some numerical data and compute some metrics"); dashed lines connect A to the database and calculator agents.
</details>
1. Alice submits a request to the first agent to retrieve the data using the first protocol document.
2. Alice receives the data as expected.
<details>
<summary>img/scenarios/s3-2.png Details</summary>

Figure: a two-stage pipeline in which Alice (A) first sends the protocol document to the database agent (stage 1) and then receives the retrieved dataset, shown as a scatter plot (stage 2).
</details>
1. Alice submits a request to the second LLM to compute some metrics on the data using the second protocol document.
2. Alice receives the metrics as expected.
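The composition of the two protocol documents can be sketched as chaining two plain routines; the function names, payloads, and stand-in data below are assumptions for illustration, not part of Agora.

```python
# Illustrative sketch: Alice composes two negotiated protocol routines
# without invoking an LLM for either call.

def retrieve_market_data(request: dict) -> list:
    """First protocol: the data-holding agent returns a numerical series."""
    return [102.0, 98.5, 101.2, 99.8]  # stand-in for a real retrieval

def compute_metrics(data: list) -> dict:
    """Second protocol: the metrics agent aggregates the series."""
    return {"mean": sum(data) / len(data), "max": max(data)}

# Alice chains the two protocol documents: the output of the first
# call is the input of the second.
data = retrieve_market_data({"market": "EU", "window": "1w"})
metrics = compute_metrics(data)
```

The point of the sketch is the typed hand-off: because both protocol documents fix their input and output formats, the second routine can consume the first routine's output directly.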
<details>
<summary>img/scenarios/s3-3.png Details</summary>

Figure: two flows in which Alice (A) passes data (a document with a scatter plot in flow 1, a bar chart in flow 2) to the metrics agent, shown as a calculator icon.
</details>
S4. Scalable consensus in large networks.
An LLM (Alice) wants to collect and aggregate data points from $N\gg 1$ resources. There is no protocol to handle this, and each resource has its own implementation, which may not be public.
1. Alice submits the requests in natural language.
2. Each queried LLM processes the request, turns it into a routine to retrieve the data, and sends the data back to Alice.
<details>
<summary>img/scenarios/s4-1.png Details</summary>

### Visual Description
A two-stage diagram of the data request and retrieval process. In stage 1 ("Request data"), a central node 'A' sends solid arrows to four circular nodes, with an ellipsis indicating that more nodes may exist. In stage 2 ("Process and retrieve data"), the arrows are reversed and the same nodes return data to 'A'. A dashed line connects the queried nodes, suggesting a relationship or sequence between them. Overall, 'A' acts as the central point that requests data from multiple sources and then receives it back.
</details>
Alice wants to retrieve more data and queries the network a second time.
1. One or more receivers suggest using a protocol document for the next iterations.
2. Alice agrees and uses the protocol with as many resources as possible.
<details>
<summary>img/scenarios/s4-2.png Details</summary>

### Visual Description
A star-shaped diagram of the protocol negotiation: a central node 'A' is connected by directed edges, each labeled "<negotiation of the protocol>", to four unlabeled nodes, with an ellipsis indicating that more may exist. A dashed grey line connects the last of these nodes back to the first, suggesting an iterative or cyclical process in which 'A' initiates and refines negotiations with each entity.
</details>
Subsequent communications increasingly use protocol documents, so the receiver no longer needs to process each query with the LLM.
S5. Scaling complex NLP routines.
An LLM (Alice) wants to retrieve data from a system powered by an LLM (Bob) that, in turn, obtains its data from a search engine (i.e., the LLM is combined with a RAG). Bob has to (1) turn the natural language request into a query, (2) retrieve the data from the RAG, and (3) return a summary.
Alice queries Bob to retrieve some data. There is no routine to handle any of the three phases, so Bob has to invoke the LLM twice: once to turn the query into a format suitable for invoking the RAG, and once to perform the summarisation.
<details>
<summary>img/scenarios/s5-1.png Details</summary>

### Visual Description
Four numbered stages, each showing circles 'A' and 'B' connected by an arrow with a caption: (1) the query "What's the highest mountain in the world" is sent from A to B; (2) "<Processes the query with an LLM>"; (3) "<Invokes the RAG>", with a double-headed arrow from B to a magnifying-glass icon behind a dotted line; (4) "<Summarises the content with an LLM>". Read sequentially, the stages show the query being forwarded, processed by the LLM, answered via the RAG, and finally summarised.
</details>
Alice queries Bob again; this time, Bob asks to use a routine to query the RAG directly.
<details>
<summary>img/scenarios/s5-2.png Details</summary>

### Visual Description
Three numbered diagrams, each with nodes 'A' and 'B': (1) A sends a "<Formatted query>" (document icon) to B; (2) "<Invokes the RAG>", with a double-headed arrow from B to a magnifying-glass icon behind a dashed line; (3) "<Processes the query with an LLM>", drawn in grey with a red "SKIP" label and a red dashed border, indicating that this LLM step is now bypassed. Together, the diagrams show that a formatted query can go straight to the RAG, skipping the LLM processing step.
</details>
Any query that complies with the protocol document now skips the first phase and directly invokes the RAG.
Appendix B Agora Specification
In this section, we provide a more formal description of Agora.
B.1 Transactions
An Agora transaction operates as follows. Suppose that an agent, Alice, is trying to communicate with another agent, Bob:
- Alice sends to Bob over HTTPS a JSON document containing three fields:
- protocolHash: The hash of the protocol document. If natural language is used, then the value of protocolHash is null;
- protocolSources: A list of URIs where the protocol document can be found. Must be empty if protocolHash is null and non-empty otherwise;
- body: A string containing the body of the request as specified by the given protocol.
- If Bob does not have the protocol document, he fetches it (either from the sources provided by Alice or from another repository);
- If Bob is unable to use the protocol, he returns a JSON document with one field, namely status, which is equal to “rejected”;
- Otherwise, Bob computes the response using the LLM, routines, or a combination of both;
- Bob sends as response a JSON document with the following fields:
- status: a string indicating the status of the response (can be “success” or “failure”);
- body: the response returned by the agent.
- Note that "status":"failure" must be used only for errors that are not covered by the protocol document (e.g., the agent failing to instantiate the LLM); when the protocol prescribes how to handle an error, the agent should return "status":"success" and the correct error message as body.
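As a concrete illustration, the request and response documents described above can be built as follows. This is a sketch: `make_request` and `make_response` are hypothetical helper names, not part of the Agora specification.

```python
import hashlib
from typing import Optional

def make_request(body: str, protocol_document: Optional[str] = None,
                 sources: Optional[list] = None) -> dict:
    """Build the JSON document that Alice sends to Bob over HTTPS (B.1)."""
    if protocol_document is None:
        # Natural-language request: protocolHash is null, so sources must be empty.
        return {"protocolHash": None, "protocolSources": [], "body": body}
    # Protocol documents are identified by their SHA1 hash (B.2).
    protocol_hash = hashlib.sha1(protocol_document.encode("utf-8")).hexdigest()
    return {"protocolHash": protocol_hash,
            "protocolSources": list(sources or []),
            "body": body}

def make_response(body: Optional[str] = None, *, rejected: bool = False,
                  failed: bool = False) -> dict:
    """Build Bob's JSON response (B.1)."""
    if rejected:
        # Bob cannot use the protocol.
        return {"status": "rejected"}
    # "failure" is reserved for errors not covered by the protocol document.
    return {"status": "failure" if failed else "success", "body": body}
```

Note that a protocol-covered error would still be returned with `"status": "success"`, with the error message encoded in the body as the protocol prescribes.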
B.2 Protocol Documents
A protocol document is identified by its SHA1 hash. Protocol documents can refer to other protocol documents by adding a preamble containing the protocol hash and at least one source for the protocol document. Further discussion concerning the optimal way to reference other protocol documents is left to future work.
B.3 Preferred Protocols
In order to signal to other agents which protocols are supported by default, an agent can expose an endpoint /.wellknown. This endpoint returns a JSON object where each key is a supported protocol hash and each value is a non-empty list of protocol sources.
Appendix C Example of Negotiation Process Between Two Agents
We report the negotiation process between Alice and Bob for a protocol to transmit weather forecasts. Note that while the LLMs used Markdown-style formatting, for the sake of readability we replaced Markdown syntax with the corresponding typeset style.
Alice
Hello. I’m here to negotiate a protocol to query a weather forecast service. The service should accept a date and a location as input and return the predicted weather condition, temperature, and precipitation for that day. Before we start, I’d like to clarify the rules for our negotiation. We should focus on designing a simple and unambiguous protocol that specifies the exact format of the messages sent and received. We shouldn’t worry about how the messages will be delivered, and we should assume that the implementation will receive a string and return a string. To begin, I propose that we use a JSON-based protocol. The input message could be a JSON object with two properties: “date” and “location”. The output message could be a JSON object with three properties: “temperature”, “precipitation”, and “weatherCondition”. What are your thoughts on this proposal?
Bob
Your proposal aligns well with the goal of keeping the protocol simple and unambiguous. Using a JSON-based protocol for both the input and output messages is a good approach. Here’s a more detailed specification for the protocol:

**Input Message.** The input message will be a JSON object with the following structure: `{ "date": "YYYY-MM-DD", "location": "string" }`

- `date`: A string representing the date for which the weather forecast is requested, in the format YYYY-MM-DD.
- `location`: A string representing the location for which the weather forecast is requested.

**Output Message.** The output message will be a JSON object with the following structure: `{ "temperature": number, "precipitation": number, "weatherCondition": "string" }`

- `temperature`: A number representing the predicted temperature for that day in °C.
- `precipitation`: A number representing the predicted precipitation for that day in mm.
- `weatherCondition`: A string representing the predicted weather condition for that day. Possible values are “sunny”, “cloudy”, “rainy”, “snowy”.

**Example.** Input: `{ "date": "2023-10-01", "location": "New York" }` Output: `{ "temperature": 22.5, "precipitation": 5.0, "weatherCondition": "cloudy" }`

If you agree with this structure, we can finalize the protocol.
Alice
I agree with the proposed protocol structure. It aligns well with the requirements and is simple and unambiguous.
Finalised Protocol
Name: Weather Forecast Query Protocol

Description: A protocol for querying the weather forecast for a given date and location.

**Input Message.** The input message will be a JSON object with the following structure: `{ "date": "YYYY-MM-DD", "location": "string" }`

- `date`: A string representing the date for which the weather forecast is requested, in the format YYYY-MM-DD.
- `location`: A string representing the location for which the weather forecast is requested.

**Output Message.** The output message will be a JSON object with the following structure: `{ "temperature": number, "precipitation": number, "weatherCondition": "string" }`

- `temperature`: A number representing the predicted temperature for that day in °C.
- `precipitation`: A number representing the predicted precipitation for that day in mm.
- `weatherCondition`: A string representing the predicted weather condition for that day. Possible values are “sunny”, “cloudy”, “rainy”, “snowy”.

**Example.** Input: `{ "date": "2023-10-01", "location": "New York" }` Output: `{ "temperature": 22.5, "precipitation": 5.0, "weatherCondition": "cloudy" }`
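As an illustration, messages exchanged under the finalised protocol could be checked with a small validator such as the following (the function names are our own; the negotiated protocol does not prescribe any implementation):

```python
import json
import re

# Allowed values for weatherCondition, per the finalised protocol.
WEATHER_CONDITIONS = {"sunny", "cloudy", "rainy", "snowy"}

def validate_input(message: str) -> dict:
    """Check an input message against the finalised protocol."""
    data = json.loads(message)
    if set(data) != {"date", "location"}:
        raise ValueError("input must contain exactly 'date' and 'location'")
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", data["date"]):
        raise ValueError("date must be in YYYY-MM-DD format")
    if not isinstance(data["location"], str):
        raise ValueError("location must be a string")
    return data

def validate_output(message: str) -> dict:
    """Check an output message against the finalised protocol."""
    data = json.loads(message)
    if set(data) != {"temperature", "precipitation", "weatherCondition"}:
        raise ValueError("unexpected output fields")
    for field in ("temperature", "precipitation"):
        if not isinstance(data[field], (int, float)) or isinstance(data[field], bool):
            raise ValueError(f"{field} must be a number")
    if data["weatherCondition"] not in WEATHER_CONDITIONS:
        raise ValueError("unknown weather condition")
    return data
```

Once both agents implement such routines, well-formed queries can bypass the LLM entirely.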
Appendix D 100 Agent Demo - Full Description
D.1 Implementation Notes
We implemented our demo in Python, using Flask servers as the basis for our agents. Each agent is either a user or a server:
- Users receive a random task, some randomly generated data and a description of the task data (including its schema). Their objective is to execute the requested action and return a reply according to a certain schema. This allows us to generate a large number of queries without needing to handcraft them. Note that all tasks are single-round, i.e. they can be fulfilled in one round of communication;
- Servers receive queries from other users and reply to them using a combination of three types of tools:
- Database tools, which involve connecting to a personal SQL or MongoDB database (assigned at random). Depending on the server, some databases are initialised with dummy data;
- Mock tools, which are simplifications of actual tools (e.g., for taxi service agents, the assignTaxi tool is a mock tool that, instead of actually sending a taxi to a location, mimics the request flow);
- External tools, which are tools that enable the agent to start an Agora communication with a predefined server, although no information about the respective agents’ schema is provided. For instance, the skiLodge agent can open a channel with the weatherService agent.
Moreover, we added three protocol databases, which are simple Flask servers that host protocol documents. The first protocol database is a peer of the second, which is in turn a peer of the third (but the first and the third are not peers). Every 10 executed queries, one protocol database shares its protocol documents with its peers. This simulates the propagation of protocol documents between different databases.
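The peer topology and sharing step described above can be sketched as follows (the class and method names are illustrative assumptions, not the demo's actual code):

```python
class ProtocolDatabase:
    """A protocol-document store that periodically shares with its peers."""

    def __init__(self, name: str):
        self.name = name
        self.documents = {}  # protocol hash -> protocol document
        self.peers = []

    def share_with_peers(self):
        # Push all local documents to every direct peer.
        for peer in self.peers:
            peer.documents.update(self.documents)

# Topology from D.1: db1 <-> db2 <-> db3, but db1 and db3 are not peers.
db1, db2, db3 = (ProtocolDatabase(n) for n in ("db1", "db2", "db3"))
db1.peers = [db2]
db2.peers = [db1, db3]
db3.peers = [db2]
```

Because db1 and db3 are not peers, a document originating at db1 only reaches db3 after db2's next sharing step, which is what makes the propagation gradual.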
Picking a Protocol
Users track the number of communications with a given server about a certain type of task until it hits one of two thresholds: one for using a protocol instead of natural language and one for negotiating a protocol ex novo.
When the first threshold is hit, the user invokes the LLM to check if either the server or the reference protocol database (which is randomly assigned to the user at the start of the demo) already have a suitable protocol. If there are none, the user continues using natural language until the second threshold is hit: in that case, the user begins a negotiation with the server and submits the final protocol to the reference protocol database.
Similarly, each server has a counter that tracks the number of natural language communications with any user since the last negotiation. Once the counter hits a threshold, the server requests a negotiation with the user, regardless of how many of the tracked queries were sent by the current user. After negotiation, the counter is reset.
In our demo, we set the thresholds for the user to 3 and 5 communications, respectively, and the threshold for the server to 10.
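The user-side counting logic can be sketched as follows, using the demo's threshold values of 3 and 5; the class and the returned action labels are illustrative assumptions, and the protocol-database lookup between the two thresholds is elided:

```python
from collections import defaultdict

PROTOCOL_THRESHOLD = 3     # check for an existing protocol instead of natural language
NEGOTIATION_THRESHOLD = 5  # negotiate a protocol ex novo

class UserCounters:
    """Tracks communications per (server, task type), as described in D.1."""

    def __init__(self):
        self.counts = defaultdict(int)

    def next_action(self, server: str, task_type: str) -> str:
        """Return what the user should do for this communication."""
        self.counts[(server, task_type)] += 1
        n = self.counts[(server, task_type)]
        if n >= NEGOTIATION_THRESHOLD:
            return "negotiate"       # negotiate, then submit to the protocol database
        if n >= PROTOCOL_THRESHOLD:
            return "check-existing"  # look for a suitable existing protocol
        return "natural-language"
```

The server side is simpler: a single counter of natural-language communications since the last negotiation, reset once a negotiation is triggered at the threshold of 10.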
APIs
For GPT-4o and Gemini 1.5 Pro, we used the OpenAI and Google APIs, respectively. For Llama 3 405b, we used the SambaNova API. Prices per million tokens are reported in Table 1.
Table 1: Prices per million tokens (input/output, in USD) at the time of writing.
| Model | Input | Output |
| --- | --- | --- |
| GPT-4o | 5.00 | 15.00 |
| Llama 3 405b | 5.00 | 10.00 |
| Gemini 1.5 Pro | 3.50 | 10.50 |
Bootstrapping Quality-of-Life Extensions
For the sake of bootstrapping the network, while implementing the demo we added two features to our nodes:
- Providing each node with a simple protocol for multi-round communication in natural language;
- Allowing the protocol document to include machine-readable metadata, such as the name or a short description of the protocol. This helps an agent quickly determine which protocols, among a list of candidates, are suitable for a certain task.
We leave whether these features should be integrated into the Agora standard, or whether they should be handled using protocol documents only, to future work.
D.2 Experimental Setup
Preliminary Tests
We first ran a series of qualitative tests to determine which among the considered LLMs (OpenAI GPT-4o, Llama 3 405b, Gemini 1.5 Pro) were the most suitable for negotiation and programming. We found that while all three LLMs were capable of negotiating and implementing protocols, GPT-4o was the most robust, followed by Llama 3 405b and finally Gemini 1.5 Pro. Surprisingly, the main factor behind the brittleness of Gemini 1.5 Pro was not the model’s inherent performance, but rather the lack of robustness of the API itself: even with tailored retry systems, the API sometimes failed nondeterministically (i.e. the same query would at times succeed and at times fail). We believe that our experience was due to temporary server issues, rather than fundamental problems with the model.
LLM Distribution
In light of our preliminary results, we manually assigned a model to each server node, following a power law consistent with our findings (9 nodes with GPT-4o, 4 nodes with Llama 3 405b, 2 nodes with Gemini 1.5 Pro). User agents were instead randomly assigned one of the three LLMs with a uniform distribution. Overall, the breakdown of nodes by model is:
- GPT-4o: 38 nodes (9 server nodes, 29 user nodes)
- Llama 3 405b: 32 nodes (4 server nodes, 28 user nodes)
- Gemini 1.5 Pro: 30 nodes (2 server nodes, 28 user nodes)
Out of 1000 queries, 8 (thus representing 0.8% of the total query volume) failed due to Google’s Gemini API not responding. This phenomenon was unrelated to the use of Agora, with 500 Internal Server Error responses appearing both in the Agora demo and in the natural language counterfactual with roughly the same frequency.
Task Distribution
To simulate the heterogeneity in communication frequency (i.e. how some nodes tend to be more active than others), we assigned to each user a “query budget” (which represents how many queries are sent by a given user) following a Pareto distribution with shape parameter equal to $0.5$ , adapted so that each user has at least 1 query. The query budget is then split between three randomly chosen types of queries using a Pareto law with a shape parameter of 1 and a minimum of 1 query per type (unless the budget is less than 3 queries). See Figure 6 for a visualisation of the distribution.
<details>
<summary>x3.png Details</summary>

### Visual Description
A line chart of "Query Budget" (y-axis, logarithmic scale) against "User Position (Chosen Randomly)" (x-axis, 0 to 80). A single teal line falls steeply from roughly 150 at position 0 to about 10 at position 10, then decreases in step-like drops (about 6 at position 20, 3 at position 30, and 2 at position 40) before plateauing at the minimum budget of 1 from roughly position 50 onwards, reflecting a heavy-tailed allocation with a floor of one query per user.
</details>
Figure 6: Distribution of query budgets for users. The y axis is logarithmic.
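The budget assignment described above can be reproduced along these lines (a sketch under our own assumptions about seeding and integer rounding; the function names are illustrative):

```python
import random

def query_budgets(n_users: int, shape: float = 0.5, seed: int = 0) -> list:
    """Sample per-user query budgets from a Pareto law with the given
    shape, adapted so that each user gets at least 1 query (see D.2)."""
    rng = random.Random(seed)
    return [max(1, round(rng.paretovariate(shape))) for _ in range(n_users)]

def split_budget(budget: int, n_types: int = 3, seed: int = 0) -> list:
    """Split a budget between task types using a Pareto law with shape 1
    and a minimum of 1 query per type (unless budget < n_types)."""
    rng = random.Random(seed)
    if budget < n_types:
        return [1] * budget
    weights = [rng.paretovariate(1.0) for _ in range(n_types)]
    total = sum(weights)
    # Give each type the minimum, then distribute the remainder by weight.
    split = [1 + int((budget - n_types) * w / total) for w in weights]
    split[0] += budget - sum(split)  # fix rounding so the split sums to budget
    return split
```

Sorting the sampled budgets in decreasing order yields the step-like curve of Figure 6.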
D.3 Additional Observations
Cost Breakdown
The breakdown of cost by activity is as follows:
- Natural language communication: 54%;
- Negotiation: 6%;
- Checking the suitability of existing protocols: 22%;
- Implementing the protocols: 17%.
Note that negotiation, despite being the most expensive activity per instance (since it involves several rounds of communication), represented the smallest contribution to the total cost, with cheaper but more frequent operations (i.e. sending natural language messages and checking the suitability of protocols) making up the largest portion.
Similar Protocols
Due to the (intentional) partial isolation of nodes in the network, similar protocols sometimes emerged independently. Nevertheless, agents using different default protocols were still able to communicate by picking one of the available protocols; for the sake of simplicity, the preferred protocol is chosen by the sender.