2407.01489v2
# Agentless: Demystifying LLM-based Software Engineering Agents
> Contributed equally, with author ordering decided by Nigiri.
Abstract
Recent advances in large language models (LLMs) have significantly improved the automation of software development tasks, including code synthesis, program repair, and test generation. More recently, researchers and industry practitioners have developed various autonomous LLM agents to perform end-to-end software development tasks. These agents are equipped with the ability to use tools, run commands, observe feedback from the environment, and plan for future actions. However, the complexity of these agent-based approaches, together with the limited abilities of current LLMs, raises the following question: Do we really have to employ complex autonomous software agents? To attempt to answer this question, we build Agentless, an agentless approach to automatically resolve software development issues. Compared to the verbose and complex setup of agent-based approaches, Agentless employs a simplistic three-phase process of localization, repair, and patch validation, without letting the LLM decide future actions or operate with complex tools. Our results on the popular SWE-bench Lite benchmark show that, surprisingly, the simplistic Agentless achieves both the highest performance (32.00%, 96 correct fixes) and low cost ($0.70) compared with all existing open-source software agents! In fact, Agentless has already been adopted by OpenAI as the go-to approach to showcase the real-world coding performance of both GPT-4o and the new OpenAI o1 models. Furthermore, we manually classified the problems in SWE-bench Lite and found problems with exact ground truth patches or insufficient/misleading issue descriptions. As such, we construct SWE-bench Lite-$S$ by excluding such problematic issues to perform more rigorous evaluation and comparison. Our work highlights the currently overlooked potential of a simplistic, cost-effective technique in autonomous software development.
We hope Agentless will help reset the baseline, starting point, and horizon for autonomous software agents, and inspire future work along this crucial direction. We have open-sourced Agentless at: https://github.com/OpenAutoCoder/Agentless
1 Introduction
Large language models (LLMs) have become the default choice for code generation [Chen et al., 2021; Austin et al., 2021; Li et al., 2023; Wei et al., 2023]. State-of-the-art LLMs like GPT-4 [OpenAI, 2023] and Claude 3.5 Sonnet Anthropic [2024] have demonstrated their prowess in synthesizing code snippets from a given user description. However, compared to the main evaluation setting of simple, self-contained problems, applying LLMs to repository-level software engineering tasks has been understudied. Software engineering tasks like feature addition, program repair, and test generation require an in-depth understanding of not only information within files containing thousands of lines of code, but also repository-level dependencies across files.
To address this gap and evaluate the ability of tools to automatically solve real-world software engineering problems, the popular SWE-bench Jimenez et al. [2024a] benchmark has been developed. In SWE-bench, each problem consists of a real-world GitHub issue description and the corresponding Python repository. The task is to modify the repository to resolve the issue, either fixing a bug or introducing a new feature. The authors have since published a subset of the benchmark, SWE-bench Lite swe [2024] (300 problems), that performs further filtering and focuses on bug-fixing issues.
To solve the challenging real-world software development problems from SWE-bench, inspired by the Devin AI Software Engineer dev [2024], there has been a significant body of work from both academia and industry focusing on developing agent-based approaches Zhang et al. [2024c]; Gauthier [2024]; Yang et al. [2024a]; Chen et al. [2024]; Ma et al. [2024]; Bouzenia et al. [2024]. While there is not a fixed definition for agent-based approaches, they generally equip LLMs with a set of tools and allow agents to iteratively and autonomously perform actions, observe feedback, and plan future steps Liu et al. [2024c]. Example tools can include the ability to open/write/create files, search for code lines, run tests, and execute shell commands. In each attempt to solve a problem, agent-based approaches will have multiple turns, where each turn consists of performing an action. Subsequent turns depend on previous actions and the feedback information the agent receives from the environment.
At first glance, agent-based approaches appear to be a natural and straightforward way to tackle software development tasks. After all, human developers also perform similar actions and use feedback to plan future steps. However, the disparity between human and current LLM abilities leads to the following limitations of agent-based approaches:
- Complex tool usage/design. To utilize tools, current agent-based approaches apply an abstraction layer between the agent and the environment. Examples are mapping real actions to API calls so that agents can use tools by outputting an API call instruction. However, such abstractions and API call specifications require careful design of input/output formats and can easily lead to incorrect or imprecise tool design/usage, especially for more complex action spaces. Given the iterative nature of agent-based approaches, where current action/plan depends on previous turns, incorrectly or imprecisely defining/using a tool can both reduce performance and incur additional cost in wasted LLM queries.
- Lack of control in decision planning. In addition to using tools, current agent-based approaches also delegate the decision-making process to the agents, allowing them to decide when and what action to perform. The agents decide the current action to take based on previous actions taken and the feedback provided by the environment, often with minimal checks to ensure the action taken makes sense. Due to the large possible action space and feedback response, it can be extremely easy for autonomous agents to become confused and perform sub-optimal explorations. Furthermore, to solve an issue, an agent can take upwards of 30 or 40 turns, which makes it extremely difficult both to understand the decisions made by the agents and to debug the exact turns where an incorrect decision is made.
- Limited ability to self-reflect. Existing agents struggle to perform self-reflection Olausson et al. [2023]; Huang et al. [2024]. That is, they tend to take in all information/feedback and do not know how to filter out or correct irrelevant, incorrect, or misleading information Shi et al. [2023]; Zhang et al. [2023]. The limited ability to self-reflect means that an incorrect step can be easily amplified and negatively affect all future decisions made by the agent.
In this paper, we advocate that instead of rushing to develop increasingly complex LLM agent-based approaches and tools for software development (which can also be non-trivial to use or replicate due to the fully autonomous setup), we should first take a step back and ask the following introspective question: Do we really have to employ complex autonomous software agents?
Our work. We set out to answer this important question by building Agentless, an agentless approach to automatically resolve software development issues. To solve each issue, Agentless follows a simple three-phase process: localization, repair, and patch validation. In the localization phase, Agentless employs a hierarchical process to first localize the fault to specific files, then to relevant classes or functions, and finally to fine-grained edit locations. Agentless's localization process makes use of both LLM-based localization and the classic information-retrieval-based localization idea Zhou et al. [2012]. To perform repair, Agentless takes the localized edit locations and generates multiple candidate patches in a simple diff format. At the same time, Agentless generates reproduction tests that can reproduce the original error and help with candidate patch selection. Finally, Agentless re-ranks all remaining patches and selects one to submit in order to fix the issue. While Agentless leverages LLMs to perform each detailed task, unlike prior complex agent-based tools, Agentless does not allow LLMs to autonomously decide future actions or operate with any complex tools. Our deliberate choice to avoid using agents not only allows Agentless to have a simplistic and straightforward design that is easy to understand, but also helps avoid the above-mentioned limitations of LLM agents in software development. We evaluate Agentless on the popular SWE-bench Lite [swe, 2024] benchmark and demonstrate that Agentless not only achieves the highest performance (32.00%) among all open-source approaches, but also does so at a fraction of the cost!
Furthermore, we performed a fine-grained manual analysis on the SWE-bench Lite dataset and classified all its problems into different categories across dimensions like problem description, ground truth patch, and bug location information. Surprisingly, we observed that SWE-bench Lite contains problems with the exact ground truth patch in the description (4.3%), problems with missing critical information needed to solve the issue (10.0%), and problems that include misleading solutions in the issue description (5.0%). Recognizing these issues, we built SWE-bench Lite-$S$, which removes such problematic issues and serves as a more rigorous benchmark to evaluate the ability to solve real-world software development challenges. Overall, in an era focused on achieving top placements on leaderboards, our work highlights the overlooked potential of a simplistic, cost-effective technique in autonomous software development. We hope Agentless will help reset the baseline, starting point, and horizon for autonomous software agents, and inspire future work along this crucial direction.
Contributions. In this work, we make the following contributions:
- An agentless approach. We propose Agentless, an agentless approach to automatically solve software development problems. Agentless leverages LLM-empowered prompting-based and embedding-based retrieval to perform hierarchical localization. During repair, Agentless samples multiple candidate patches in a simple diff format for efficient patch generation. Agentless then generates reproduction tests to verify that the issue has been resolved. Finally, Agentless leverages both regression tests and generated reproduction tests to select the final submission patch.
- Extensive evaluation. We evaluate Agentless on the popular SWE-bench Lite dataset, comparing against state-of-the-art agent-based approaches. Our results demonstrate that Agentless achieves higher performance (32.00%, 96 correct fixes) than all open-source approaches, at a comparably low cost. This shows the previously overlooked potential of a simplistic technique in autonomous software development. Additionally, Agentless has already been adopted by OpenAI as the go-to approach to showcase the real-world coding performance of GPT-4o Chowdhury et al. [2024] as well as the new o1 model family OpenAI [2024c]. We further perform a rigorous ablation study to understand the effectiveness of different components of Agentless on the final performance.
- SWE-bench Lite-$S$ benchmark. We performed manual classifications on the problems in the popular SWE-bench Lite dataset. We found problems with unclear or misleading issue descriptions, as well as problems that contain exact ground truth patches. To address these issues, we constructed a filtered dataset, SWE-bench Lite-$S$, that excludes such problematic issues, enabling more rigorous evaluation and comparison. This has also been confirmed recently by OpenAI, which acknowledged our benchmark and released SWE-bench Verified Chowdhury et al. [2024] along the same direction.
2 Background and Related Work
2.1 Agent-based Software Engineering
With the emergence and popularity of agent-based frameworks Xi et al. [2023], researchers and industry practitioners have recently begun developing agent-based approaches to solve software engineering tasks. Devin dev [2024] (and OpenDevin ope [2024b], its open-source alternative) is one of the first end-to-end LLM agent-based frameworks. Devin uses agents to first perform planning based on user requirements, then allows the agent to use file editor, terminal, and web search engine tools to iteratively perform the task. SWE-agent Yang et al. [2024a] designs a custom agent-computer interface (ACI) that allows the LLM agent to interact with the repository environment with actions such as reading and editing files and running bash commands. Aider Gauthier [2024] first provides a detailed repository map constructed with static and call graph analysis to the LLM to localize the files that require editing; then it generates a simple diff format as the editing patch and uses regression testing to verify if the patch is plausible. Moatless moa [2024] is another open-source agent tool that obtains relevant code locations by providing the agent with both code search tools and retrieval methods using LLM-constructed queries. Similar to Aider, Moatless also generates a simple diff format as the final submitted patch. AutoCodeRover Zhang et al. [2024c] further provides the LLM agent with specific code search APIs (e.g., searching methods in a certain class) to iteratively retrieve code context and locate the bug locations. SpecRover Ruan et al. [2024] later improves over AutoCodeRover and targets specifications (i.e., inferring the intended program behavior) by generating function summaries and also feedback messages during specific agent steps. Furthermore, SpecRover also attempts to generate reproduction tests that reproduce the original issue and are used to select the final patch.
In addition to these highlighted examples, there has been a plethora of other agent-based approaches developed in both open-source Gauthier [2024]; rep [2024]; app [2024]; Zhang et al. [2024d] and closed-source/commercial products Bouzenia et al. [2024]; Chen et al. [2024]; Ma et al. [2024]; lin [2024]; fac [2024]; ibm [2024]; ope [2024a]; Liu et al. [2024b]; ama [2024]; sup [2024]; gru [2024]; iso [2024]; men [2024]; hon [2024]; cod [2024].
Compared to these agent-based techniques, Agentless offers a simplistic, interpretable, and cost-effective solution to tackle real-world software engineering issues. Different from agent-based tools, Agentless contains well-defined stages of localization, repair, and patch validation without letting the LLM agent decide future actions or use complex tools. Agentless demonstrates for the first time that an agentless approach can achieve very competitive performance, without the additional baggage of having to provide excessive tools or model complex environment behavior/feedback.
2.2 Fault Localization and Program Repair
Fault localization (FL) Wong et al. [2016] techniques aim to identify the suspicious locations (e.g., statements or methods) in source code related to bugs. Dynamic FL techniques mainly include spectrum-based FL (SBFL) Jones and Harrold [2005]; Abreu et al. [2007, 2009] and mutation-based FL (MBFL) Papadakis and Le Traon [2015]; Moon et al. [2014]; Lou et al. [2020]. SBFL typically ranks source code locations primarily covered by failing tests as more suspicious than locations primarily covered by passing tests. MBFL further improves upon that by additionally considering the impact of each source code location on the test outcomes (measured using mutation testing Papadakis et al. [2019]). Besides dynamic techniques, researchers have also proposed to directly leverage information retrieval (IR) techniques Singhal et al. [2001] for static FL. Such IR-based techniques Wang et al. [2015]; Saha et al. [2013]; Wang and Lo [2014] formulate FL as a search problem and compare the textual similarity between code elements and the bug report (i.e., query). Moreover, learning-based techniques have also been proposed to leverage machine learning to combine multiple sources of dynamic/static information, including DeepFL Li et al. [2019], FLUCCS Sohn and Yoo [2017], and TRANSFER Meng et al. [2022]. Recently, researchers have proposed LLM-based FL Yang et al. [2024b]; Wu et al. [2023]; Qin et al. [2024]; Kang et al. [2024], which leverages the powerful code and natural language understanding of modern LLMs to directly localize bugs. However, most such LLM-based techniques either cannot perform repository-level FL due to the limited context window of LLMs Yang et al. [2024b], or rely on complicated/costly agentic design to navigate through the codebase Kang et al. [2024]; Qin et al. [2024]. In contrast, Agentless employs a simplistic hierarchical FL process (based on both LLMs and IR) to efficiently compute the fine-grained edit locations.
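As a concrete illustration of the SBFL intuition above, the following sketch computes a Tarantula-style suspiciousness score, the metric associated with Jones and Harrold [2005]. It is not part of Agentless; the function name `tarantula` and the toy coverage counts are assumptions made only for this example.

```python
def tarantula(failed_cov: int, passed_cov: int,
              total_failed: int, total_passed: int) -> float:
    """Tarantula-style suspiciousness of one code location.

    failed_cov / passed_cov: number of failing / passing tests that execute it.
    """
    fail_ratio = failed_cov / total_failed if total_failed else 0.0
    pass_ratio = passed_cov / total_passed if total_passed else 0.0
    denom = fail_ratio + pass_ratio
    # A location executed by no test gets a score of 0 (a convention chosen here).
    return fail_ratio / denom if denom else 0.0


# Hypothetical coverage data: location -> (failing tests covering it, passing tests covering it)
coverage = {"line 10": (2, 0), "line 11": (1, 4), "line 12": (0, 3)}
scores = {loc: tarantula(f, p, total_failed=2, total_passed=8)
          for loc, (f, p) in coverage.items()}

# Most suspicious locations first.
ranking = sorted(scores, key=scores.get, reverse=True)
```

A location executed only by failing tests scores 1, while one executed only by passing tests scores 0, which matches the ranking behavior described in the paragraph.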
After localizing the bug, the next step is to perform repair. Automated program repair Gazzola et al. [2019] (APR) has been widely studied to automatically generate patches for bugs. Traditional APR techniques can be categorized as template-based Liu et al. [2019]; Ghanbari et al. [2019], heuristic-based Le Goues et al. [2012]; Le et al. [2016], and constraint-based Mechtaev et al. [2016]; Long and Rinard [2015] tools. While effective, traditional APR tools suffer from scalability issues and are limited by their patch variety. As such, researchers have proposed learning-based APR tools either by training NMT (neural machine translation) models Jiang et al. [2021]; Li et al. [2020]; Chen et al. [2019]; Zhu et al. [2021] or using pre-trained LLMs to perform repair Xia and Zhang [2022]; Xia et al. [2023a]; Kolak et al. [2022]; Zhang et al. [2024a]. Specifically, LLM-based APR tools, which sample multiple candidate patches per bug, have been shown to be the state-of-the-art due to the powerful coding capability of modern LLMs Xia et al. [2023a]. More recently, agent-based APR techniques have also been proposed Xia and Zhang [2023]; Bouzenia et al. [2024]; Hidvégi et al. [2024]; Chen [2024]; Zhang et al. [2024b]. Inspired by existing LLM-based APR tools, Agentless samples multiple candidate patches per issue to maximize the chance of generating a correct fix. Different from most LLM-based APR techniques, Agentless generates patches using a simple diff format Gauthier [2024] to avoid generating the complete code and instead focus on producing cost-efficient small edits, increasing the reliability and accuracy of patch generation (fewer chances for hallucination). Furthermore, different from the simplistic bugs studied in most prior work Xia et al. [2023a], Agentless targets complex repository-level issues spanning multiple locations.
2.3 LLM-based Test Generation
In addition to localizing and repairing bugs, another research area that has adopted LLMs is test generation. One area of test generation is fuzz testing Zeller et al. [2019], also known as fuzzing, which generates large amounts of inputs in order to expose bugs in systems. Researchers have applied LLMs to perform fuzzing in domains such as DL libraries Deng et al. [2023, 2024], OS kernels Yang et al. [2023b]; Oliinyk et al. [2024], compilers Xia et al. [2024]; Yang et al. [2023a]; Ou et al. [2024], network protocols Meng et al. [2024], and mobile applications Liu et al. [2024a]. LLM-based fuzzers have demonstrated their impact by detecting many bugs not found by traditional fuzzers as well as unlocking new fuzzing domains. Besides fuzzing, researchers have also proposed to leverage LLMs for unit test generation to test individual software units (e.g., methods/classes) Liu et al. [2024c], such as CodeMosa Lemieux et al. [2023], ChatTester Yuan et al. [2024], TestPilot Schäfer et al. [2023], and CoverUp Pizzorno and Berger [2024].
Bug reproduction is a critical step in investigating bug reports Jin and Orso [2012], and is integrated into many recent software engineering agents ope [2024b]; Ruan et al. [2024]; hon [2024]; men [2024]; fac [2024]; Yang et al. [2024a]; Arora et al. [2024]. For example, SpecRover Ruan et al. [2024] begins by generating a test to reproduce the issue described in the bug report; the test then guides the context retrieval and patching process. Unlike such agent-based approaches, which generate a reproduction test and rely on an LLM agent to decide whether the test is correct, Agentless simply executes multiple sampled tests and verifies if the execution results indicate the issue has been reproduced.
3 Agentless Approach
<details>
<summary>2407.01489v2/x1.png Details</summary>

### Visual Description
Workflow diagram of Agentless. The inputs are the issue description and the project codebase. The diagram is organized into the localization phase (localize to top-N files, localize to classes and functions, localize to edit locations), the repair phase (generate patches), and the patch validation phase (reproduction tests, then patch filtering and ranking).
</details>
Figure 1: Overview of Agentless.
Figure 1 shows an overview of Agentless, which consists of three phases: localization, repair, and patch validation. We first take in the issue description and the existing project codebase as input. Then, we begin our hierarchical localization process by turning the project codebase into a tree-like structure that illustrates the relative location of each file in the project (step 1). Next, using this repository structure along with the original issue description, we prompt the LLM to localize and rank the top N most suspicious files that likely require editing to solve the issue (step 2). Since our repository structure format does not contain detailed source code information, we additionally use embedding-based retrieval to find the files whose code snippets are most relevant to the issue description (step 3). We then combine the retrieved files with the LLM-localized files to obtain the final list of suspicious files. However, not all contents in each file need to be modified. As such, we provide a skeleton for each file (i.e., a list of declaration headers of the classes and functions) and ask the LLM to output a specific list of classes and functions that we should examine more closely to fix the bug (step 4). We then provide the complete code content of the previous locations and ask the LLM to finalize a smaller set of edit locations (i.e., classes, functions, or even specific lines) (step 5). For the repair phase, we provide the code snippets at these edit locations together with the issue description and prompt the LLM to sample multiple patches to solve the issue (step 6). Next, we enter the patch validation phase, where we first ask the LLM to sample multiple reproduction tests that aim to replicate the original issue (step 7), and then select the optimal one based on actual execution results on the original codebase (step 8). Agentless uses the reproduction test along with existing regression tests for patch ranking/selection (step 9). Finally, Agentless selects the top-ranked patch as the final patch for submission (step 10). We now describe the steps in each of Agentless's phases in more detail.
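The fixed control flow described above can be sketched as a plain function with no agent loop. This is a minimal sketch, not the Agentless implementation: `resolve_issue` and its three callback parameters are hypothetical names, and ranking the surviving patches by how often an identical patch was sampled is an assumption made here for illustration, since the overview does not state the ranking criterion.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional


def resolve_issue(
    issue: str,
    repo: Dict[str, str],  # hypothetical: file path -> file content
    localize: Callable[[str, Dict[str, str]], List[str]],
    sample_patches: Callable[[str, List[str]], List[str]],
    passes_tests: Callable[[str], bool],
) -> Optional[str]:
    """Run the three phases once, in a fixed order chosen by the pipeline."""
    # Phase 1: localization (files -> classes/functions -> edit locations).
    edit_locations = localize(issue, repo)

    # Phase 2: repair (sample several candidate patches for those locations).
    candidates = sample_patches(issue, edit_locations)

    # Phase 3: patch validation (keep patches that pass the regression and
    # reproduction tests, both folded into `passes_tests` in this sketch).
    survivors = [patch for patch in candidates if passes_tests(patch)]
    if not survivors:
        return None

    # Assumption: rank by how many times the same patch text was sampled.
    return Counter(survivors).most_common(1)[0][0]
```

The LLM is only ever called inside the callbacks; it never chooses which phase runs next, which is the property that distinguishes this design from an agent loop.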
3.1 Localization
To fix a bug or implement a new feature, the first step is to identify the relevant locations in the source code, since without the correct locations it can be impossible to provide the right edits. The difficulty lies in the fact that a repository can contain hundreds of files with thousands of lines of code each, whereas the correct locations to edit are only a few selected lines or functions. Agentless addresses this by using a simple three-step hierarchical localization process: 1) localize to suspicious files; 2) localize each selected file to relevant classes, functions, and variables; 3) localize to code edit locations.
3.1.1 Localize to suspicious files.
First, Agentless narrows down potential locations to specific suspicious files. Instead of providing the complete code snippet for each file, Agentless constructs a concise representation of the repository's file and directory structure, similar to the Linux tree command. We refer to this as the repository structure format, which begins with the root folder of the repository and organizes code files or folder names. Files and folders at the same directory level are aligned vertically, and files/folders in sub-directories are indented. We recursively traverse the entire repository to obtain the structure, which will be used as input for the LLM. The repository structure format provides the necessary file paths alongside the neighboring file names to maintain organizational information in the original codebase. Agentless then inputs the processed repository structure along with the original issue description to an LLM, and requests it to identify a list of the top N suspicious files that need further inspection or modification to resolve the issue.
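A repository structure of this kind could be produced roughly as follows. This is a minimal sketch under stated assumptions: the function name `repo_structure`, the four-space indentation, the trailing `/` on folders, and the choice to skip hidden entries are all illustrative and are not taken from the Agentless code.

```python
import os


def repo_structure(root: str, skip_hidden: bool = True) -> str:
    """Render a directory as an indented, tree-like listing of file and folder names."""
    lines = [os.path.basename(os.path.abspath(root)) + "/"]

    def walk(path: str, depth: int) -> None:
        # Entries at the same directory level share the same indentation.
        for name in sorted(os.listdir(path)):
            if skip_hidden and name.startswith("."):
                continue
            full = os.path.join(path, name)
            indent = "    " * depth
            if os.path.isdir(full):
                lines.append(f"{indent}{name}/")
                walk(full, depth + 1)  # sub-directory contents are indented further
            else:
                lines.append(f"{indent}{name}")

    walk(root, 1)
    return "\n".join(lines)
```

The resulting string contains only names and nesting, so it stays small enough to include in a prompt together with the issue description even for large repositories.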
To complement the prompting-based localization (using file names only), Agentless also uses a simple embedding-based retrieval method to identify additional suspicious files. However, instead of embedding all files in the repository, Agentless first filters out irrelevant folders. This is done by providing the previously described repository structure and asking the LLM to produce a list of irrelevant folders that do not need to be further inspected or modified to resolve the issue. After removing all files from these irrelevant folders, Agentless divides each remaining file into chunks of code segments and computes the embedding for each chunk using an embedding model. Agentless then embeds the original issue description (i.e., the query) and computes the cosine similarity between the resulting query embedding and each chunk embedding to retrieve a list of relevant files that contain code segments with the highest similarity to the query. Finally, Agentless combines the files obtained via prompting with those retrieved via embedding by selecting the top N most common files localized by both, resulting in a final list of relevant files.
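The retrieval and merging steps can be sketched as below. Several parts are assumptions made for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, files are chunked by a fixed number of lines, each file is scored by its best-matching chunk, and `combine` is only one possible reading of "top N most common files localized by both". All function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chunk(source: str, lines_per_chunk: int = 20) -> List[str]:
    """Split a file into consecutive fixed-size groups of lines."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk]) for i in range(0, len(lines), lines_per_chunk)]


def retrieve_files(issue: str, files: Dict[str, str], top_n: int = 3) -> List[str]:
    """Rank files by the cosine similarity of their best chunk to the issue text."""
    query = embed(issue)
    best = {
        path: max((cosine(query, embed(c)) for c in chunk(source)), default=0.0)
        for path, source in files.items()
    }
    return sorted(best, key=best.get, reverse=True)[:top_n]


def combine(llm_files: List[str], retrieved: List[str], top_n: int = 3) -> List[str]:
    """Assumed merge rule: files found by both methods first, then the rest in rank order."""
    in_both = [f for f in llm_files if f in retrieved]
    merged = in_both + [f for f in llm_files + retrieved if f not in in_both]
    return list(dict.fromkeys(merged))[:top_n]  # de-duplicate while keeping order
```

In practice `files` would contain only the files that remain after the LLM-based folder filtering, and `embed` would be replaced by calls to an actual embedding model.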
<details>
<summary>2407.01489v2/x2.png Details</summary>

### Visual Description
# Technical Document Extraction: Code Snippet Analysis
## Overview
The image contains a Python code snippet defining two custom form field classes: `UUIDField` and `JSONField`, both inheriting from `CharField`. The code includes method definitions, error handling, and widget configurations.
---
## Class Definitions
### `UUIDField` Class
```python
class UUIDField(CharField):
    default_error_messages = {
        'invalid': _("Enter a valid UUID.")
    }

    def prepare_value(self, value):
        # Implementation not shown
        pass

    def to_python(self, value):
        # Implementation not shown
        pass
```
#### Key Components:
- **Inheritance**: Extends `CharField`.
- **Error Messages**:
- `default_error_messages`: Dictionary with key `'invalid'` mapped to a localized error message.
- **Methods**:
- `prepare_value(self, value)`: Prepares the field value (implementation omitted).
- `to_python(self, value)`: Converts the input value to Python type (implementation omitted).
---
### `JSONField` Class
```python
class JSONField(CharField):
    default_error_messages = {
        'invalid': _("Enter a valid JSON.")
    }
    widget = Textarea

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.encoder = None
        self.decoder = None

    def to_python(self, value):
        # Process JSON path from the right-hand side
        pass

    def slugify(self, value, allow_unicode=False):
        # Implementation not shown
        pass
```
#### Key Components:
- **Inheritance**: Extends `CharField`.
- **Error Messages**:
- `default_error_messages`: Dictionary with key `'invalid'` mapped to a localized error message.
- **Attributes**:
- `widget`: Set to `Textarea` for form rendering.
- `encoder`: Defaults to `None`.
- `decoder`: Defaults to `None`.
- **Methods**:
- `__init__(self, *args, **kwargs)`: Initializes parent class and sets `encoder`/`decoder` to `None`.
- `to_python(self, value)`: Processes JSON input (comment hints at right-hand side path processing).
- `slugify(self, value, allow_unicode=False)`: Slugifies the input value (implementation omitted).
---
## Symbols and Icons
- **Skull Icon**: Located in the top-right corner (likely a decorative element).
- **Document Icon**: Blue paper icon below the skull (possibly indicating file-related functionality).
---
## Notes
- All method implementations (e.g., `prepare_value`, `to_python`) are omitted in the snippet.
- The `slugify` method includes a parameter `allow_unicode=False` for Unicode handling.
- Comments in `to_python` suggest JSON path processing logic.
This extraction captures all textual elements, method signatures, and structural details from the code snippet.
</details>
Figure 2: File skeleton format.
3.1.2 Localize to related elements.
After obtaining the list of suspicious files, Agentless proceeds to the second part of the localization process: localizing the related elements within these files. Directly providing the complete content of all files can result in a very large context. As such, Agentless builds a compressed format of each file that contains the list of class, function, or variable declarations. We refer to this format as the skeleton format, with an example shown in Figure 2. In the skeleton format, we provide only the headers of the classes and functions in the file. For classes, we further include any class fields and methods (signatures only). Additionally, we also keep comments at the class and module level to provide further information. Compared to providing the entire file content to the model, the skeleton format is a much more concise representation, especially when the file contains thousands of lines, which makes it impractical or costly to process all at once with existing LLMs. We provide the skeletons of all suspicious files to the LLM at one time in a single prompt, enabling the model to comprehensively analyze the pertinent information and decide on the most relevant elements. Using this input, we prompt the LLM to provide a list of related classes and functions that one should examine to fix the provided issue.
<details>
<summary>2407.01489v2/x3.png Details</summary>

### Visual Description
# Technical Document Extraction: Code Editing Workflow Diagram
## Diagram Overview
The image depicts a **code editing workflow** involving **search/replace operations** to fix an issue in a Python application. The process involves three primary components:
1. **Original File**
2. **Edited File**
3. **Search/Replace Diff Generation**
---
## Component Breakdown
### 1. Original File
- **Code Snippet**:
```python
from flask import Flask
app = Flask(__name__)
@app.route("/")
```
- **Visual Elements**:
- Blue file icon in the bottom-right corner.
- Label: `Original File` (below the code block).
### 2. Edited File
- **Code Snippet**:
```python
+ import math # Added line (highlighted in green)
from flask import Flask
app = Flask(__name__)
@app.route("/")
```
- **Visual Elements**:
- Green `+` symbol indicating an addition.
- Label: `Edited File` (below the code block).
### 3. Search/Replace Diff Generation
- **Robot Icon**:
- Blue robot with red eyes and yellow antenna.
- Label: `Generate Search/Replace Diff to fix issue`.
- **Search/Replace Diff Section**:
- **Search Block** (highlighted in red):
```text
<<<<<<< SEARCH
from flask import Flask
=======
```
- **Replace Block** (highlighted in green):
```text
import math
from flask import Flask
>>>>>>> REPLACE
```
- **Action**:
- Label: `Apply diff to original file to perform edit`.
---
## Flowchart Arrows
1. **Original File → Edited File**:
- Indicates the transition from the original code to the modified version.
2. **Edited File → Search/Replace Diff**:
- Shows the generation of the diff to resolve the issue.
3. **Search/Replace Diff → Original File**:
- Represents applying the diff to the original file.
---
## Key Observations
- **Issue Context**:
- A pink box labeled `ISSUE:` is present but contains unreadable text.
- **Color Coding**:
- **Red**: Search operations (deletions/additions to remove).
- **Green**: Replace operations (additions to retain).
- **Code Consistency**:
- Both files retain `from flask import Flask` and `@app.route("/")`.
- The edited file adds `import math` (highlighted in green).
---
## Workflow Summary
1. Start with the **Original File** containing Flask imports.
2. Modify the file by adding `import math` (resulting in the **Edited File**).
3. Generate a **Search/Replace Diff** to identify changes.
4. Apply the diff to the original file to implement the edit.
---
## Diagram Structure
- **Top-to-Bottom Flow**:
`Original File` → `Edited File` → `Search/Replace Diff` → `Apply diff to original file`.
- **Visual Hierarchy**:
- Code blocks are centrally aligned.
- Labels and icons are positioned adjacent to their respective components.
---
## Missing Information
- The `ISSUE:` label’s text is illegible due to small font size.
- No explicit axis titles or legends are present (the diagram is flowchart-based, not a chart).
---
## Final Notes
This diagram illustrates a **code versioning workflow** using search/replace diffs to manage changes in a Python application. The use of color coding (red/green) aligns with standard diff tools (e.g., Git). The process emphasizes incremental editing and automated diff generation for reproducibility.
</details>
Figure 3: Search/Replace edit format.
3.1.3 Localize to edit locations.
The previous localization step provided us with a list of related code elements; since we localize the top N suspicious files, these related code elements may come from different files. We now directly provide the code content of these elements to the LLM and ask it to localize specific edit locations. Compared to using the entire file, the input context here is much smaller. With this input, we then ask the LLM to identify the final set of edit locations, specified by line numbers, functions, or classes. Our simple hierarchical localization allows Agentless to select a set of relevant code snippets as edit locations for repair.
3.2 Repair
In the repair stage, the goal is to produce the correct patch to solve the issue. Following existing work on LLM-based program repair Xia and Zhang [2022]; Kolak et al. [2022]; Xia et al. [2023b]; Jiang et al. [2023], we first utilize the identified edit locations and construct a context window of code snippets to provide to the LLM for repair. For example, if the identified location was a class from line 40 to 78, we would produce a context window of [40 - x, 78 + x] where x denotes the context window size. The intuition behind adding the additional code before and after the identified location is to provide the LLM with relevant contextual information for better program repair Xia and Zhang [2022]. If multiple edit locations are identified, we concatenate these context windows together, separated by "..." to indicate missing context in the middle.
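The context window construction can be sketched as below. The function name `build_context` is hypothetical, and two choices are assumptions made for illustration: windows are clamped to the file boundaries, and overlapping windows are not merged.

```python
def build_context(lines, locations, x):
    """Build the repair context from edit locations.

    lines: file content as a list of strings.
    locations: 1-indexed, inclusive (start, end) line ranges.
    x: number of extra lines kept before and after each location.
    """
    windows = []
    for start, end in sorted(locations):
        lo = max(1, start - x)           # clamp to the first line of the file
        hi = min(len(lines), end + x)    # clamp to the last line of the file
        windows.append("\n".join(lines[lo - 1:hi]))
    # "..." marks the code that was skipped between two windows.
    return "\n...\n".join(windows)
```

With a location spanning lines 40 to 78 and `x = 10`, the window covers lines 30 to 88 of the file.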
Using the code snippets, we then ask the LLM to generate patches to solve the issue. However, instead of directly producing an entire code snippet to replace the given context, Agentless asks the LLM to generate a Search/Replace edit Gauthier [2024]: a simple diff format to efficiently create each patch. Figure 3 shows an example of the Search/Replace format containing two main parts: 1) search: the original code snippet we want to replace and 2) replace: the new code snippet we want to replace it with. To apply the generated Search/Replace diff to the original file, we can simply match the search code snippet and replace it with the replacement. This simple diff format avoids generating the complete code and instead focuses on producing small edits, which are not only more cost-efficient, but also more reliable and accurate (fewer chances for hallucination). For each issue, Agentless uses the LLM to generate multiple potential patches (starting with greedy decoding and then sampling multiple patches with a higher temperature).
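Applying a Search/Replace edit amounts to an exact string match followed by a substitution. The sketch below assumes the Aider-style block layout, in which a `=======` line separates the search part from the replace part. The function names `parse_edit` and `apply_edit` are hypothetical, and replacing only the first match is an assumption.

```python
def parse_edit(block: str):
    """Split one Search/Replace block into its (search, replace) parts."""
    _, _, rest = block.partition("<<<<<<< SEARCH\n")
    search, _, rest = rest.partition("\n=======\n")
    replace, _, _ = rest.partition("\n>>>>>>> REPLACE")
    return search, replace


def apply_edit(source: str, block: str) -> str:
    """Replace the first exact occurrence of the search snippet in the file content."""
    search, replace = parse_edit(block)
    if search not in source:
        raise ValueError("search snippet not found in the original file")
    return source.replace(search, replace, 1)


original = "from flask import Flask\napp = Flask(__name__)\n"
edit = (
    "<<<<<<< SEARCH\n"
    "from flask import Flask\n"
    "=======\n"
    "import math\n"
    "from flask import Flask\n"
    ">>>>>>> REPLACE"
)
edited = apply_edit(original, edit)
```

Because the LLM only emits the changed region, a failed match (the `ValueError` case) is an inexpensive signal that a sampled patch is malformed.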
3.3 Patch Validation
<details>
<summary>2407.01489v2/x4.png Details</summary>

### Visual Description
# Technical Document Extraction: SQL Code Analysis
## Code Transcription
```python
from sqlfluff import lint

def test_rules_std_L060_raised() -> None:
    try:
        sql = "SELECT IFNULL(NULL, 100), NVL(NULL, 100);"
        result = lint(sql, rules=["L060"])
        assert len(result) == 2
    except:
        print("Other issues")
    try:
        assert result[0]["description"] == "Use 'COALESCE' instead of 'IFNULL'."
        assert result[1]["description"] == "Use 'COALESCE' instead of 'NVL'."
        print("Issue resolved")
    except AssertionError:
        print("Issue reproduced")
    return

test_rules_std_L060_raised()
```
## Key Components
1. **Imports**:
- `from sqlfluff import lint`
*Imports the `lint` function from the `sqlfluff` library for SQL linting.*
2. **Function Definition**:
- `def test_rules_std_L060_raised() -> None:`
*Defines a test function with no return value.*
3. **SQL Query**:
```sql
SELECT IFNULL(NULL, 100), NVL(NULL, 100);
```
*Tests two NULL-handling functions: `IFNULL` (MySQL) and `NVL` (Oracle).*
4. **Linting Execution**:
- `result = lint(sql, rules=["L060"])`
*Applies rule `L060` (likely related to NULL handling) to the SQL query.*
5. **Assertions**:
- `assert len(result) == 2`
*Expects exactly two linting issues to be found.*
- `assert result[0]["description"] == "Use 'COALESCE' instead of 'IFNULL'."`
- `assert result[1]["description"] == "Use 'COALESCE' instead of 'NVL'."`
*Validates that the linting output matches expected error messages.*
6. **Error Handling**:
- `except: print("Other issues")`
*Catches general exceptions and prints a generic error message.*
- `except AssertionError: print("Issue reproduced")`
*Catches assertion failures and prints a specific error message.*
7. **Function Invocation**:
- `test_rules_std_L060_raised()`
*Executes the test function at the end of the script.*
## Observations
- **Color Coding**:
- Red text highlights function definitions, SQL keywords, and error messages.
- Blue text highlights assertions and print statements.
- This may indicate syntax highlighting in the original source.
- **Rule Reference**:
- `L060` is a rule identifier in `sqlfluff` (likely from the [SQLFluff Rule List](https://docs.sqlfluff.com/en/stable/rules/)).
- **Purpose**:
- The script tests whether `sqlfluff` correctly identifies deprecated NULL-handling functions (`IFNULL`, `NVL`) and suggests `COALESCE` as a replacement.
## Notes
- The code uses Python-style exception handling (`try/except`).
- The `lint` function returns a list of dictionaries containing linting results (e.g., `description` keys).
- The final `return` statement exits the function after execution.
</details>
Figure 4: Example reproduction test.
3.3.1 Reproduction test generation.
Since Agentless generates multiple candidate patches per issue, we need a way to select a final patch for submission. Please note that under the realistic SWE-bench setup, the original project codebase can only provide regression tests and not any reproduction tests (i.e., bug-triggering tests). This is because the issue has just been raised and the developers have not added any additional tests to trigger the issue. As such, different from the Generate-and-Validate program repair setup Long and Rinard [2016], we do not have access to bug-triggering tests.
Following prior work Ruan et al. [2024]; Chen et al. [2024], Agentless generates additional reproduction tests to help with patch selection. More specifically, Agentless leverages the LLM to synthesize a complete testing file that attempts to both reproduce the original issue described in the issue description and verify whether the issue has been fixed. Figure 4 shows an example of the reproduction test that we want the model to synthesize. If the issue is reproduced, the test should print Issue reproduced. On the other hand, the test should output Issue resolved if the issue has been fixed. We also include a third output, Other issues, for cases where the test runs into any unexpected issues. To generate the reproduction test, Agentless provides the original issue description along with an example reproduction test to demonstrate the test format. Similar to repair, we also sample multiple candidate reproduction tests and then execute each test on the original repository to filter out any tests that do not output Issue reproduced. Finally, we normalize each test (remove comments, extra spaces, and normalize test names) and then select the test with the highest number of occurrences as the final reproduction test for each issue.
3.3.2 Patch selection.
Using the generated reproduction tests, we start our patch selection process to pick the final submission patch. Agentless first runs all the existing tests in the repository to identify the set of tests that pass on the original codebase. However, not all of those passing tests should be considered regression tests, since solving the issue may require changing some of the existing functionality. Therefore, Agentless provides the list of passing tests to the LLM and asks it to identify any tests that should not be run to check if the issue has been correctly fixed (i.e., the tests that may be updated/patched during issue fixing). After removing the LLM-identified non-regression tests, we obtain a final set of regression tests. We then run the set of regression tests on all the generated patches. Agentless then keeps the patches with the lowest number of regression failures. For those patches, Agentless then runs the selected reproduction test and only keeps patches that output Issue resolved. Meanwhile, because the reproduction test is generated by the LLM and can potentially be incorrect or imprecise, it is possible that no patch passes the reproduction test; in that case, Agentless falls back to using only the regression test results for selection. Agentless then applies a re-ranking approach using majority voting: we first normalize each patch to ignore surface-level differences (e.g., extra spaces, newlines, and comments), and then select the patch with the highest number of occurrences as the final patch for submission. More specifically, to standardize the patch, we begin by parsing both the old and new code (after applying the patch) into abstract syntax trees. Next, we unparse the trees into a canonical source code format with docstrings removed. Finally, we compute the textual diff between the standardized old and new code to get the normalized patch.
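The selection procedure above can be sketched end to end. This is a simplified illustration, not the Agentless code: each candidate is assumed to be a dictionary carrying the old and new file content, its number of regression failures, and whether the reproduction test printed Issue resolved. All function names and dictionary keys are hypothetical.

```python
import ast
import difflib
from collections import Counter


def canonical(source: str) -> str:
    """Parse, drop docstrings, and unparse into a canonical source format."""
    tree = ast.parse(source)
    scopes = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(tree):
        if isinstance(node, scopes) and node.body:
            first = node.body[0]
            if (isinstance(first, ast.Expr) and isinstance(first.value, ast.Constant)
                    and isinstance(first.value.value, str)):
                node.body = node.body[1:] or [ast.Pass()]
    return ast.unparse(tree)


def normalized_patch(old_source: str, new_source: str) -> str:
    """Textual diff between the canonical old and new code."""
    old = canonical(old_source).splitlines()
    new = canonical(new_source).splitlines()
    return "\n".join(difflib.unified_diff(old, new, lineterm=""))


def select_patch(candidates):
    """candidates: dicts with keys 'old', 'new', 'regression_failures', 'resolved'."""
    fewest = min(c["regression_failures"] for c in candidates)
    pool = [c for c in candidates if c["regression_failures"] == fewest]
    resolved = [c for c in pool if c["resolved"]]
    pool = resolved or pool  # fall back to regression results only
    keyed = [(normalized_patch(c["old"], c["new"]), c) for c in pool]
    winner = Counter(key for key, _ in keyed).most_common(1)[0][0]
    return next(c for key, c in keyed if key == winner)
```

Since comments and spacing disappear in the canonical form, patches that are semantically identical at the syntax-tree level are counted as the same vote.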
Agentless solves repository-level issues using a simple step-by-step procedure. We note here that none of the techniques used by Agentless in isolation are revolutionary, but instead Agentless smartly combines existing techniques to construct an easy-to-understand approach. Different from prior autonomous agent-based tools that involve complex interactions with the environment, Agentless uses a simplistic three-phase approach to localize, repair, and validate without relying on any agents for decision-making. By conducting localization in a hierarchical manner, Agentless can efficiently and effectively compute the fine-grained locations for editing. Agentless then performs repair by sampling multiple patches using a simple diff format. Agentless's patch validation approach can further aid the patch selection process by producing reproduction tests that can help verify if the issue is fixed.
4 Experimental Setup
Datasets. We evaluate Agentless and baselines using the popular SWE-bench dataset Jimenez et al. [2024a] to test the ability to solve real-world software engineering issues. Each problem in SWE-bench requires submitting a patch to solve the underlying issue described in the input issue description. In particular, we focus on the widely used SWE-bench Lite version swe [2024], containing 300 self-contained problems with better quality. Furthermore, we also conduct a detailed study (Section 6.1) on the SWE-bench Lite benchmark to not only demonstrate potential issues and biases, but also produce a more rigorous filtered set of problems for better evaluation.
Implementation. We implement Agentless using GPT-4o (gpt-4o-2024-05-13) OpenAI [2024a]. By default, we query the LLM with greedy decoding. During sampling, we use a sampling temperature of 0.8. For the embedding-based retrieval method, we implement our approach using LlamaIndex LlamaIndex [2024]. We use OpenAI's text-embedding-3-small OpenAI [2024b] model to compute the embeddings with a chunk size of 512 and a chunk overlap of 0. For each issue, we first localize to the top three suspicious files, and then localize to an unrestricted number of suspicious classes and functions within these files, all using greedy decoding. Next, to maximize the chances of finding the correct edit locations, we draw four samples of edit locations per issue (i.e., the third step in the localization phase). This gives us 4 separate sets of edit locations per issue. For each set, we adopt a context window of ±10 lines around each edit location, and generate 10 patches (1 greedy and 9 samples). This results in a total of 40 patches per bug. We adopt the same Search/Replace edit format from prior work Gauthier [2024], and use the built-in Python ast library ast [2024] to perform parsing in our normalization step. To generate the reproduction tests, we also generate 40 samples (1 greedy and 39 samples) in total prior to patch selection (described in Section 3.3.1). The regression tests are obtained by first running all the tests to obtain the set of tests that pass on the original repository and then using the LLM to identify any non-regression tests (described in Section 3.3.2). As requested by the SWE-bench maintainers, we do not directly use the provided list of regression tests already identified in the PASS_TO_PASS field of SWE-bench. (Footnote: If we directly use the PASS_TO_PASS tests, the performance on SWE-bench Lite will be 98.) We modify the official SWE-bench evaluation setup to be able to freely execute arbitrary regression and reproduction tests.
Baselines. We compare Agentless against 26 agent-based approaches. These baseline tools represent the state-of-the-art performance on SWE-bench. We include state-of-the-art open-source as well as commercial or closed-source baselines (indicated via a 🔒). We note here that the majority of the closed-source baselines do not provide any trajectories, just the submission patches. Therefore, we cannot verify the steps taken to arrive at the final patches. Moreover, we also include a simple agentless baseline using retrieval-augmented generation (RAG) proposed as part of SWE-bench Jimenez et al. [2024a] for comparison. In this case, the agentless baseline uses the LLM to directly generate a patch file by providing it with the file content of the most relevant files, retrieved using BM25 Robertson et al. [2009].
Metrics. Following prior work Zhang et al. [2024c], we report 1) % Resolved: the percentage of resolved problems in the benchmark, 2) Avg. $ Cost: the average inference cost of running the tool, and 3) Avg. # Tokens: the average number of input and output tokens used to query the LLM. Additionally, we also report % Correct Location: the percentage of problems where the patch produced by the tool covers the edit location(s) of the ground truth developer patch. We compute this metric at three granularities: file, function, and line. We report that a patch contains the correct location if it edits a superset of all locations in the ground truth patch. For baseline tools, we directly use the reported results either from the official leaderboard Jimenez et al. [2024b] or from the tool's official paper/repository.
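The % Correct Location criterion reduces to a superset check per problem. The sketch below is an illustration with hypothetical function names, where locations are represented as hashable identifiers at whichever granularity is being measured (file paths, function names, or line numbers).

```python
def covers_ground_truth(patch_locations, truth_locations) -> bool:
    """True if the patch edits a superset of the ground truth locations."""
    return set(truth_locations) <= set(patch_locations)


def percent_correct_location(results) -> float:
    """results: list of (patch_locations, truth_locations), one pair per problem."""
    hits = sum(covers_ground_truth(p, t) for p, t in results)
    return 100.0 * hits / len(results)
```

Under this criterion, a patch that edits extra locations still counts as correct, while a patch that misses any ground truth location does not.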
5 Evaluation
Table 1: Results on SWE-bench Lite. Note: 🔒 indicates approaches that are closed-source (i.e., source code is not released). 😺 indicates approaches built on top of an earlier version of our Agentless. '-' indicates that the relevant information to compute this has not been released. ❓ indicates that multiple models are used, but some of them are not specified. Claude 3.5 S is short for Claude 3.5 Sonnet.
| Tool | LLM | % Resolved | Avg. $ Cost | Avg. # Tokens | % Correct Location (Line) | % Correct Location (Function) | % Correct Location (File) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CodeStory Aide cod [2024] 🔒 | GPT-4o + Claude 3.5 S | 129 (43.00%) | - | - | 41.7% | 58.7% | 72.0% |
| Bytedance MarsCode Liu et al. [2024b] 🔒 | NA | 118 (39.33%) | - | - | 42.7% | 58.0% | 79.7% |
| Honeycomb hon [2024] 🔒 | NA | 115 (38.33%) | - | - | 44.3% | 57.0% | 69.3% |
| MentatBot men [2024] 🔒 | GPT-4o | 114 (38.00%) | - | - | 37.3% | 53.3% | 69.3% |
| Gru gru [2024] 🔒 | NA | 107 (35.67%) | - | - | 38.3% | 54.3% | 75.0% |
| Isoform iso [2024] 🔒 | NA | 105 (35.00%) | - | 41,963 | 38.7% | 55.3% | 72.0% |
| SuperCoder2.0 sup [2024] 🔒 | NA | 102 (34.00%) | - | - | 41.7% | 63.7% | 65.7% |
| Alibaba Lingma Agent lin [2024] 🔒 | GPT-4o + Claude 3.5 S | 99 (33.00%) | - | - | 40.0% | 58.7% | 75.0% |
| Factory Code Droid fac [2024] 🔒 | NA | 94 (31.33%) | - | - | 36.7% | 55.7% | 72.7% |
| Amazon Q Developer-v2 ama [2024] 🔒 | NA | 89 (29.67%) | - | - | 40.3% | 52.0% | 74.3% |
| SpecRover Ruan et al. [2024] 🔒 | GPT-4o + Claude 3.5 S | 93 (31.00%) | $0.65 | - | - | - | - |
| CodeR Chen et al. [2024] 🔒 | GPT-4 | 85 (28.33%) | $3.34 | 323,802 | 35.7% | 52.3% | 67.0% |
| MASAI Arora et al. [2024] 🔒 | NA | 84 (28.00%) | - | - | 38.7% | 56.3% | 75.0% |
| SIMA sim [2024] 🔒 | GPT-4o | 83 (27.67%) | $0.82 | - | 37.0% | 54.0% | 79.0% |
| IBM Research Agent-101 ibm [2024] 🔒 | NA | 80 (26.67%) | - | - | 39.7% | 56.7% | 73.3% |
| OpenCSG StarShip ope [2024a] 🔒 | GPT-4 | 71 (23.67%) | - | - | 39.0% | 61.7% | 90.7% |
| Amazon Q Developer ama [2024] 🔒 | NA | 61 (20.33%) | - | - | 34.0% | 43.7% | 71.7% |
| RepoUnderstander Ma et al. [2024] 🔒 | GPT-4 | 64 (21.33%) | - | - | - | - | - |
| AutoCodeRover-v2 aut [2024] | GPT-4o | 92 (30.67%) | - | - | 35.0% | 52.3% | 69.3% |
| RepoGraph rep [2024] | GPT-4o | 89 (29.67%) | - | - | 36.7% | 51.3% | 71.0% |
| Moatless moa [2024] | Claude 3.5 S | 80 (26.67%) | $0.17 | - | 38.7% | 54.7% | 78.7% |
| | GPT-4o | 74 (24.67%) | $0.14 | - | 36.0% | 52.0% | 73.0% |
| OpenDevin+CodeAct v1.8 ope [2024b] | Claude 3.5 S | 80 (26.67%) | $1.14 | - | 38.0% | 49.7% | 67.3% |
| Aider Gauthier [2024] | GPT-4o + Claude 3.5 S | 79 (26.33%) | - | - | 35.3% | 50.0% | 69.7% |
| SWE-agent Yang et al. [2024a] | Claude 3.5 S | 69 (23.00%) | $1.62 | 521,208 | 40.7% | 54.3% | 72.0% |
| | GPT-4o | 55 (18.33%) | $2.53 | 498,346 | 29.3% | 42.3% | 58.3% |
| | GPT-4 | 54 (18.00%) | $2.51 | 245,008 | 30.7% | 45.3% | 61.0% |
| AppMap Navie app [2024] | GPT-4o | 65 (21.67%) | - | - | 29.7% | 44.7% | 59.7% |
| AutoCodeRover Zhang et al. [2024c] | GPT-4 | 57 (19.00%) | $0.45 | 38,663 | 29.0% | 42.3% | 62.3% |
| RAG Yang et al. [2024a] |
<details>
<summary>2407.01489v2/resources/anthropic.png Details</summary>

### Visual Description
# Technical Document Extraction: Icon Analysis
## Image Description
- **Background**: Solid beige (#F5F0E6) with rounded corners.
- **Central Elements**:
- **Letter "A"**: Bold black (RGB: 0, 0, 0), sans-serif font, centered.
- **Slash ("/")**: Bold black (RGB: 0, 0, 0), diagonal orientation, positioned to the right of the "A", intersecting its lower half.
## Textual Information
- **Embedded Text**:
- "A" (uppercase, Latin alphabet).
- "/" (forward slash, ASCII character).
## Structural Analysis
- **Design Intent**: Minimalist iconography, likely representing an "AI" (Artificial Intelligence) theme due to the combination of "A" and slash.
- **Color Contrast**: High contrast between black text and beige background for visibility.
- **Typography**: Sans-serif font for modern, clean aesthetics.
## Notes
- No additional labels, legends, or data points present.
- No numerical or categorical data to extract.
- No interactive or dynamic elements observed.
## Conclusion
The image is a static icon with no embedded data, charts, or diagrams. Textual elements are limited to the characters "A" and "/".
</details>
Claude 3 Opus | 13 (4.33%) | $0.25 | - | 22.0% | 30.0% | 57.0% |
|
<details>
<summary>2407.01489v2/resources/openai.png Details</summary>

### Visual Description
# Technical Document: Image Analysis
## Image Description
The image depicts a **geometric logo** composed of **five interlocking loops** arranged in a symmetrical, circular pattern. The design resembles a **knot-like structure** with overlapping curves. Key observations:
1. **Structure**:
- Five loops form a continuous, interconnected pattern.
- Loops overlap to create a **central hexagonal void** (white space).
- Outer boundary is a **single continuous black contour**.
2. **Color Scheme**:
- **Black** lines (bold, uniform thickness).
- **White** background (negative space within loops and central void).
3. **Symmetry**:
- Radial symmetry with **five-fold rotational symmetry**.
- No discernible text, labels, or numerical data.
4. **Design Characteristics**:
- Minimalist, monochromatic.
- No gradients, textures, or additional graphical elements.
## Technical Notes
- **Purpose**: Likely a brand or organizational logo (no contextual clues for specific industry).
- **Vector Compatibility**: Suitable for scalable vector graphics (SVG) due to clean lines and symmetry.
- **File Format**: Not applicable (no embedded metadata or file-specific details).
## Conclusion
The image contains **no textual or numerical information**. The design relies solely on geometric abstraction and symmetry for visual impact.
</details>
GPT-4 | 8 (2.67%) | $0.13 | - | 12.7% | 23.3% | 47.3% | |
|
<details>
<summary>2407.01489v2/resources/anthropic.png Details</summary>

### Visual Description
# Technical Document Extraction: Icon Analysis
## Image Description
- **Background**: Solid beige (#F5F0E6) with rounded corners.
- **Central Elements**:
- **Letter "A"**: Bold black (RGB: 0, 0, 0), sans-serif font, centered.
- **Slash ("/")**: Bold black (RGB: 0, 0, 0), diagonal orientation, positioned to the right of the "A", intersecting its lower half.
## Textual Information
- **Embedded Text**:
- "A" (uppercase, Latin alphabet).
- "/" (forward slash, ASCII character).
## Structural Analysis
- **Design Intent**: Minimalist iconography, likely representing an "AI" (Artificial Intelligence) theme due to the combination of "A" and slash.
- **Color Contrast**: High contrast between black text and beige background for visibility.
- **Typography**: Sans-serif font for modern, clean aesthetics.
## Notes
- No additional labels, legends, or data points present.
- No numerical or categorical data to extract.
- No interactive or dynamic elements observed.
## Conclusion
The image is a static icon with no embedded data, charts, or diagrams. Textual elements are limited to the characters "A" and "/".
</details>
Claude-2 | 9 (3.00%) | - | - | 16.7% | 24.3% | 46.7% | |
|
<details>
<summary>2407.01489v2/resources/openai.png Details</summary>

### Visual Description
# Technical Document: Image Analysis
## Image Description
The image depicts a **geometric logo** composed of **five interlocking loops** arranged in a symmetrical, circular pattern. The design resembles a **knot-like structure** with overlapping curves. Key observations:
1. **Structure**:
- Five loops form a continuous, interconnected pattern.
- Loops overlap to create a **central hexagonal void** (white space).
- Outer boundary is a **single continuous black contour**.
2. **Color Scheme**:
- **Black** lines (bold, uniform thickness).
- **White** background (negative space within loops and central void).
3. **Symmetry**:
- Radial symmetry with **five-fold rotational symmetry**.
- No discernible text, labels, or numerical data.
4. **Design Characteristics**:
- Minimalist, monochromatic.
- No gradients, textures, or additional graphical elements.
## Technical Notes
- **Purpose**: Likely a brand or organizational logo (no contextual clues for specific industry).
- **Vector Compatibility**: Suitable for scalable vector graphics (SVG) due to clean lines and symmetry.
- **File Format**: Not applicable (no embedded metadata or file-specific details).
## Conclusion
The image contains **no textual or numerical information**. The design relies solely on geometric abstraction and symmetry for visual impact.
</details>
GPT-3.5 | 1 (0.33%) | - | - | 6.3% | 11.3% | 27.3% | |
| Agentless |
<details>
<summary>2407.01489v2/resources/openai.png Details</summary>

### Visual Description
# Technical Document: Image Analysis
## Image Description
The image depicts a **geometric logo** composed of **five interlocking loops** arranged in a symmetrical, circular pattern. The design resembles a **knot-like structure** with overlapping curves. Key observations:
1. **Structure**:
- Five loops form a continuous, interconnected pattern.
- Loops overlap to create a **central hexagonal void** (white space).
- Outer boundary is a **single continuous black contour**.
2. **Color Scheme**:
- **Black** lines (bold, uniform thickness).
- **White** background (negative space within loops and central void).
3. **Symmetry**:
- Radial symmetry with **five-fold rotational symmetry**.
- No discernible text, labels, or numerical data.
4. **Design Characteristics**:
- Minimalist, monochromatic.
- No gradients, textures, or additional graphical elements.
## Technical Notes
- **Purpose**: Likely a brand or organizational logo (no contextual clues for specific industry).
- **Vector Compatibility**: Suitable for scalable vector graphics (SVG) due to clean lines and symmetry.
- **File Format**: Not applicable (no embedded metadata or file-specific details).
## Conclusion
The image contains **no textual or numerical information**. The design relies solely on geometric abstraction and symmetry for visual impact.
</details>
GPT-4o | 96 (32.00%) | $0.70 | 78,166 | 35.3% | 52.0% | 69.7% |
5.1 Performance on SWE-bench Lite
Table 1 shows the main evaluation results of Agentless and prior agent-based approaches on SWE-bench Lite. Agentless solves 96 out of 300 problems (32.00%). While this is not the highest percentage of problems solved on SWE-bench Lite, Agentless is extremely competitive with prior agent-based approaches while using a much simpler design and overall technique. It is important to note that many of the top techniques are closed-source/commercial and did not release any source code to reproduce experiments, or even trajectories for further verification. Among all open-source approaches, Agentless achieves the highest performance of 32.00% (96 / 300) on SWE-bench Lite. Additionally, Agentless costs only $0.70 on average, which is less than most prior agent-based approaches. Compared with the RAG agentless baselines, Agentless costs slightly more but fixes substantially more issues.
<details>
<summary>2407.01489v2/x5.png Details</summary>

### Visual Description
# Technical Analysis of Venn Diagrams Comparing Tool Overlaps
## Diagram Structure
The image contains two Venn diagrams comparing tool overlaps in **closed-source** (left) and **open-source** (right) environments. Each diagram uses overlapping circles to represent tool intersections, with numerical values indicating counts in specific regions.
---
## Closed-Source Diagram
### Legend
- **CodeStoryAide**: Blue
- **MarsCode**: Purple
- **Honeycomb**: Light Green
- **Agentless**: Pink
- **Others**: Yellow
### Key Data Points
1. **Central Overlap (All Tools)**: 60
2. **Pairwise Overlaps**:
- CodeStoryAide & MarsCode: 0
- CodeStoryAide & Honeycomb: 11
- CodeStoryAide & Agentless: 16
- MarsCode & Honeycomb: 7
- MarsCode & Agentless: 0
- Honeycomb & Agentless: 5
3. **Individual Sections**:
- CodeStoryAide: 3
- MarsCode: 4
- Honeycomb: 10
- Agentless: 1
- Others: 26
---
## Open-Source Diagram
### Legend
- **AutoCodeRover-v2**: Blue
- **RepoGraph**: Purple
- **Moatless-claude3**: Light Green
- **Agentless**: Pink
- **Others**: Yellow
### Key Data Points
1. **Central Overlap (All Tools)**: 41
2. **Pairwise Overlaps**:
- AutoCodeRover-v2 & RepoGraph: 1
- AutoCodeRover-v2 & Moatless-claude3: 2
- AutoCodeRover-v2 & Agentless: 0
- RepoGraph & Moatless-claude3: 3
- RepoGraph & Agentless: 2
- Moatless-claude3 & Agentless: 12
3. **Individual Sections**:
- AutoCodeRover-v2: 2
- RepoGraph: 7
- Moatless-claude3: 9
- Agentless: 2
- Others: 25
---
## Observations
1. **Agentless** appears in both diagrams but with minimal individual counts (1 in closed-source, 2 in open-source).
2. The largest pairwise overlaps listed involve **Agentless**: 16 with **CodeStoryAide** (closed-source) and 12 with **Moatless-claude3** (open-source).
3. The central overlap is larger in the closed-source diagram (60 vs. 41 in open-source).
4. **Others** categories dominate both diagrams (26 in closed-source, 25 in open-source).
</details>
Figure 5: Venn diagram for issue fixes.
5.1.1 Unique issues fixed.
Figure 5 shows the unique issues solved by Agentless compared with the top-performing closed-source / commercial and open-source approaches ("Others" indicates all other approaches within each category). First, we see that compared to the open-source agent-based tools, Agentless is able to fix 2 issues that no other tool can resolve, showing the success of a simple agentless approach in solving difficult issues. Furthermore, even when compared with high-performing commercial approaches, Agentless is still able to offer a unique fix. This low number of unique fixes can be attributed to the fact that there are already tools built on top of earlier versions of Agentless (e.g., Isoform iso [2024]) or partly inspired by Agentless (e.g., Bytedance MarsCode Liu et al. [2024b]), thereby reducing the number of unique issues resolved by Agentless. Nevertheless, our results show that Agentless can be competitive with and complementary to existing agents.
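The notion of a "unique fix" used above can be sketched as a set difference over per-tool sets of resolved issues. The tool names and issue IDs below are hypothetical placeholders, not data from the evaluation.

```python
from typing import Dict, Set


def unique_fixes(resolved: Dict[str, Set[str]], tool: str) -> Set[str]:
    """Return the issues resolved by `tool` and by no other tool in `resolved`."""
    # Union of everything resolved by the other tools.
    others = set().union(*(ids for name, ids in resolved.items() if name != tool))
    return resolved[tool] - others


# Hypothetical issue IDs, for illustration only (not real results).
resolved = {
    "Agentless": {"issue-1", "issue-2", "issue-3"},
    "ToolA": {"issue-2", "issue-4"},
    "ToolB": {"issue-3", "issue-5"},
}
print(unique_fixes(resolved, "Agentless"))  # {'issue-1'}
```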
5.1.2 Localization performance.
In real-world software development, apart from directly fixing the issue, providing the correct edit location to developers is extremely helpful for debugging. As such, we compare the locations of the patches generated by each technique with those of the ground truth patch. We note that it is possible to fix a bug in a different location than the ground truth; however, comparing against the ground truth patch can still serve as an approximate measure. Table 1 additionally shows the percentage of submitted patches with correct locations for each tool, at the line, function, and file levels. We first observe that the percentage of patches with correct locations correlates heavily with the solve rate. Interestingly, the highest result for file-level location is OpenCSG StarShip at 90.0%, significantly higher than even the best-performing approaches, while at the same time having a relatively low solve rate (23.67%). As OpenCSG StarShip is a commercial product that does not provide source code or detailed trajectories, it is difficult to explain this large gap between localization and repair performance. In terms of localization performance, by using our simple hierarchical approach, Agentless remains very competitive with previous agent-based approaches.
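As a rough illustration of how such a location comparison could be computed, the sketch below treats a patch as correctly located at a given granularity when its edited locations cover all ground truth locations at that granularity. This coverage criterion and the `(file, function, line)` representation are assumptions made for the example; the exact matching rule is not spelled out in this section.

```python
from typing import Set, Tuple

# Assumed representation of one edit location: (file path, function name, line number).
Location = Tuple[str, str, int]


def project(locs: Set[Location], level: str) -> Set[tuple]:
    """Reduce full locations to the requested granularity."""
    if level == "file":
        return {(f,) for f, _, _ in locs}
    if level == "function":
        return {(f, fn) for f, fn, _ in locs}
    if level == "line":
        return set(locs)
    raise ValueError(f"unknown level: {level}")


def has_correct_location(patch: Set[Location], gt: Set[Location], level: str) -> bool:
    """Assumed criterion: the patch edits every ground truth location at this level."""
    return project(gt, level) <= project(patch, level)


# Hypothetical example: right file and function, but a different line.
gt = {("pkg/a.py", "foo", 10)}
patch = {("pkg/a.py", "foo", 12), ("pkg/b.py", "bar", 3)}
for level in ("file", "function", "line"):
    print(level, has_correct_location(patch, gt, level))
```

In this example the patch counts as correctly located at the file and function levels but not at the line level.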
5.1.3 Reproduction test results.
Agentless uses regression tests and generated reproduction tests to filter candidates and select the final submission patch. Therefore, we evaluate the quality of our generated reproduction tests. As described in Section 3.3.1, we only use a generated reproduction test if it successfully reproduces the original issue in the original repository. Out of the 300 problems in SWE-bench Lite, Agentless produces 213 reproduction tests that output the required reproduction message when evaluated on the original repository. However, these tests might still be incorrect, as a correct reproduction test should also be able to verify that the issue has been correctly resolved. To evaluate this, we directly apply the ground truth patch provided in SWE-bench. (Note that the ground truth patch is only applied here to evaluate the test quality, and is not used during the Agentless process for selecting the reproduction tests.) We found that only 94 tests correctly output the Issue resolved message after applying the ground truth patches. This steep drop-off can be partially explained by the fact that the issue description may not contain enough information to generate complete test cases for validating a correct solution. However, this problem is partially mitigated because Agentless takes a conservative approach: it requires all patches to pass the regression test suite first, and removes a patch that fails the regression tests even if it passes the generated reproduction test. This reduces the likelihood of an incorrect reproduction test selecting an incorrect patch. Furthermore, if none of the generated patches pass the reproduction test, Agentless falls back on using only the regression test results for selection. In Section 5.2.3 we closely examine the impact that using both regression and reproduction tests for patch selection has on performance.
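The two checks described above can be sketched as simple predicates over test output. The message strings follow the names used in the text ("Issue reproduced", "Issue resolved"); the function names and the sample outputs are hypothetical.

```python
from typing import List, Tuple

REPRODUCED = "Issue reproduced"
RESOLVED = "Issue resolved"


def is_selected(output_on_original: str) -> bool:
    """A test is usable if it reproduces the issue on the unpatched repository."""
    return REPRODUCED in output_on_original


def is_verified(output_on_original: str, output_after_gt_patch: str) -> bool:
    """A usable test is verified if it reports the issue as resolved
    once the ground truth patch has been applied (evaluation only)."""
    return is_selected(output_on_original) and RESOLVED in output_after_gt_patch


# Hypothetical (output before patch, output after ground truth patch) pairs.
runs: List[Tuple[str, str]] = [
    ("Issue reproduced", "Issue resolved"),    # usable and verified
    ("Issue reproduced", "Issue reproduced"),  # usable, but cannot confirm the fix
    ("Other issues", "Other issues"),          # never reproduces the issue
]
print(sum(is_selected(before) for before, _ in runs))        # 2
print(sum(is_verified(before, after) for before, after in runs))  # 1
```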
Table 2: Performance of different localization steps.
| Method | Contains GT | LoC | Avg. $ |
| --- | --- | --- | --- |
| File level localization | | | |
| Prompting-based | 78.67% | 3,221 | $0.02 |
| Embedding-based (w/o irrelevant filtering) | 67.67% | 3,388 | $0.05 |
| Embedding-based (w/ irrelevant filtering) | 70.33% | 3,622 | $0.04 |
| **Combined** | 81.67% | 3,424 | $0.06 |
| Related element localization | | | |
| Complete file | 53.67% | 778 | $0.15 |
| **Skeleton format** | 58.33% | 698 | $0.02 |
| Edit location localization | | | |
| Greedy | 50.67% | 189 | $0.06 |
| Direct from file-level | 47.00% | 208 | $0.18 |
| Multi-samples merged | 56.33% | 342 | $0.07 |
| **Multi-samples** | 49.67% | 165 | $0.07 |
| | 49.33% | 180 | |
| | 49.33% | 168 | |
| | 48.33% | 213 | |
5.1.4 Adoption of Agentless.
Although released very recently, Agentless has already received widespread adoption. In Table 1, two baseline tools (indicated via 😺) are already built upon earlier versions of Agentless: RepoGraph rep [2024] is an open-source tool combining Agentless with a repository-level graph, while Isoform iso [2024] is a closed-source commercial tool also built upon Agentless. Additionally, Bytedance MarsCode Liu et al. [2024b] is partly inspired by Agentless in patch sampling/selection, and OpenDevin ope [2024b] is in the process of integrating Agentless into their ecosystem. Furthermore, Agentless has also been adopted by OpenAI as the default approach when showcasing the real-world coding performance of GPT-4o on their new SWE-bench Verified benchmark Chowdhury et al. [2024], where Agentless also achieves the best performance compared with all other studied agent-based solutions. At the time of finishing this draft, OpenAI had just released the new o1 model family and also adopted Agentless as the top/default approach to showcase its performance on SWE-bench OpenAI [2024c].
5.2 Ablation study on components of Agentless
Next, we look at how each component in the localization, repair, and patch validation phases contributed to the final Agentless performance. Unless otherwise specified, we vary the configuration of one component while using the default parameters for all other settings.
5.2.1 Localization ablation.
Table 2 shows the performance and cost for each of the 3 steps in Agentless's localization phase. After each localization step, we show the percentage of problems whose ground truth edit locations remain in the location set ("Contains GT"), the average lines of code of each location set ("LoC"), and the average dollar cost of each step ("Avg. $"). The bold method indicates the default setting of Agentless. First, we examine the different configurations of file level localization. To start with, for the retrieval method based on embeddings, we see that without the irrelevant folder filtering that removes irrelevant folders before embedding (described in Section 3.1.1), both the performance and cost become worse. This demonstrates the importance of limiting the number of files to consider during embedding and focusing on essential parts of the repository for more cost-efficient and effective localization. We see that using the prompting-based or the embedding-based retrieval method alone can locate the ground truth file in 78.7% and 67.7% of cases respectively. This can be further improved by combining them to obtain 81.7% correct file localization, showing that prompting-based and embedding-based retrieval methods can complement each other in identifying different sets of relevant files.
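To make the "Contains GT" metric and the benefit of combining the two retrieval methods concrete, here is a minimal sketch. It assumes the combination is an order-preserving union of the two file lists, which is an illustrative choice rather than a description of the actual implementation; all file names and issue IDs are hypothetical.

```python
from typing import Dict, List, Set


def combine(prompt_files: List[str], embed_files: List[str]) -> List[str]:
    """Order-preserving union of two ranked file lists (assumed combination rule)."""
    seen: Set[str] = set()
    combined: List[str] = []
    for path in [*prompt_files, *embed_files]:
        if path not in seen:
            seen.add(path)
            combined.append(path)
    return combined


def contains_gt_rate(located: Dict[str, List[str]], gt: Dict[str, Set[str]]) -> float:
    """Percentage of issues whose ground truth files all appear in the located set."""
    hits = sum(1 for issue, files in gt.items() if files <= set(located.get(issue, [])))
    return 100.0 * hits / len(gt)


# Hypothetical localization results for two issues.
prompting = {"i1": ["a.py", "b.py"], "i2": ["c.py"]}
embedding = {"i1": ["b.py", "x.py"], "i2": ["d.py"]}
ground_truth = {"i1": {"a.py"}, "i2": {"d.py"}}
combined = {i: combine(prompting[i], embedding[i]) for i in ground_truth}

print(contains_gt_rate(prompting, ground_truth))  # 50.0
print(contains_gt_rate(embedding, ground_truth))  # 50.0
print(contains_gt_rate(combined, ground_truth))   # 100.0
```

Each method alone misses one issue here, while the union covers both, mirroring the complementary behavior reported in Table 2.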
Table 3: Performance of different repair setups.
| Method | Performance | Avg. $ |
| --- | --- | --- |
| Greedy location (40 samples) | 88 (29.33%) | $0.22 |
| Multi-samples merged (40 samples) | 85 (28.33%) | $0.24 |
| Multi-samples (4 x 10 samples) | 96 (32.00%) | $0.29 |
Using all of the localized files leads to a large context window ($>$ 3,000 lines of code). As such, in our second localization step, we localize to the relevant classes and functions, which drastically reduces the context window ($<$ 800 lines of code). We compare our input, which uses the skeleton format (described in Section 3.1.2) to provide a more concise representation, with the baseline of using the complete file content. We observe that using the complete file content not only costs much more but also reduces the number of issues whose ground truth locations are localized. The reason is most likely that LLMs cannot handle long context very well, so providing the entire file contents can confuse the model. Conversely, by using a more concise representation as the input, we can effectively localize the correct related locations that are needed for inspection and editing.
Next, Agentless localizes to the exact edit locations to achieve even more context reduction without losing much localization accuracy. We compare the different ways to perform the edit location localization: 1) "Greedy": using greedy decoding to obtain one set of edit locations, 2) "Direct from file-level": going directly from file-level localization to the edit locations (instead of the default of localizing from file-level to related elements and then to edit locations), 3) "Multi-samples merged": sampling multiple sets of edit locations and merging them into one set, and 4) "Multi-samples": sampling multiple sets of edit locations. We first observe that by going directly from file-level to the edit locations, both the cost and performance are worse. The reason is that the model can become confused when given a large context, demonstrating the importance of our hierarchical localization design. We also find that when merging multiple samples together, more ground truth locations are localized, but at the expense of adding more context to the input during the repair phase. Our default setting also samples the edit locations multiple times; however, instead of merging, we perform repair on each set separately, making use of the fact that each sampled location set provides similar localization performance while also limiting the input context. In short, by using hierarchical localization steps, Agentless can successfully minimize the cost while performing effective localization.
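The trade-off between merging sampled location sets and keeping them separate can be illustrated with a small sketch. The location strings and the use of set size as a stand-in for context length are assumptions made for this example.

```python
from typing import List, Set


def merge_location_sets(samples: List[Set[str]]) -> Set[str]:
    """'Multi-samples merged': union all sampled edit-location sets into one."""
    merged: Set[str] = set()
    for sample in samples:
        merged |= sample
    return merged


# Hypothetical edit locations from four samples (the default uses 4 location sets).
samples = [
    {"a.py:10"},
    {"a.py:10", "b.py:5"},
    {"c.py:7"},
    {"a.py:10"},
]
ground_truth = {"a.py:10", "b.py:5"}

merged = merge_location_sets(samples)
# Merging covers the ground truth but yields the largest repair context.
print(ground_truth <= merged, len(merged))               # True 3
# Kept separate, each set stays small; only some of them cover the ground truth.
print([ground_truth <= s for s in samples])              # [False, True, False, False]
print(max(len(s) for s in samples))                      # 2
```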
5.2.2 Repair ablation.
We now look at the impact of our different repair setups on the final performance. Table 3 shows the different settings and inputs for our repair phase with their performance and cost. Starting with the greedy location set generated in the edit location generation stage, we observe that we can already achieve very high performance of 88 issues fixed. Similarly, for "Multi-samples merged", where we merge multiple location sets into one, we achieve comparable performance. The performance can be further improved to 96 fixes by considering each sampled location set separately (generating 10 candidate patches for each) when performing repair. The reason is that different location sets may localize different ground truth locations and provide different context that can be helpful for fixing specific issues. By using different edit locations and combining them with our extensive test filtering and selection stage, Agentless can drastically improve the repair performance.
<details>
<summary>2407.01489v2/x6.png Details</summary>

### Visual Description
# Chart Analysis: Issues Fixed vs. Number of Samples
## Chart Type
Line chart comparing two datasets: "Agentless" and "all sample".
## Axis Labels
- **X-axis**: `# of samples` (ranges from 4 to 44, incrementing by 4)
- **Y-axis**: `# of issues fixed` (ranges from 80 to 120, incrementing by 10)
## Legend
- **Blue circles**: "Agentless" dataset
- **Purple diamonds**: "all sample" dataset
- **Red dashed vertical line**: Threshold at 40 samples
## Data Points
### Agentless (Blue Circles)
- (4, 80)
- (8, 82)
- (12, 87)
- (16, 88)
- (20, 89)
- (24, 90)
- (28, 91)
- (32, 93)
- (36, 94)
- (40, 96)
- (44, 95)
### All Sample (Purple Diamonds)
- (4, 90)
- (8, 100)
- (12, 108)
- (16, 112)
- (20, 115)
- (24, 118)
- (28, 120)
- (32, 122)
- (36, 124)
- (40, 125)
- (44, 125)
## Key Trends
1. **Agentless**:
- Gradual increase in issues fixed as samples increase.
- Peaks at 96 issues fixed at 40 samples, then slightly declines to 95 at 44 samples.
2. **All Sample**:
- Steep upward trend until 40 samples, reaching 125 issues fixed.
- Plateaus at 125 issues fixed from 40 to 44 samples.
3. **Threshold**:
- Red dashed line at 40 samples marks a critical point where "all sample" performance stabilizes.
## Observations
- The "all sample" dataset consistently outperforms "Agentless" across all sample counts.
- The gap between datasets widens significantly as sample count increases (e.g., a 30-issue difference at 44 samples).
- "Agentless" shows minimal growth beyond 40 samples, while "all sample" maintains peak performance.
</details>
Figure 6: Repair performance as the number of patch samples increases.
Next, we examine the impact of using different numbers of sampled candidate patches on the performance of Agentless. Figure 6 shows the number of issues fixed as we increase the number of samples. Note that the sample interval increases by 4 since we use 4 different sets of locations as input. First, we see that by using just 1 greedy sample for each location set, Agentless can already achieve a significant number of correct fixes (80). We can continue to improve repair performance by adding more samples. However, we observe that the performance plateaus at around 40 samples, where adding additional candidate patches does not improve performance. This is because we perform majority voting after test filtering to select the final submission patch, which means that later samples may be ignored since it is difficult for them to outvote the current majority patch. Interestingly, we can also see from the figure that if we consider all patch samples (instead of only selecting one patch) for each issue, the total number of issues that Agentless can potentially solve is 126 (42.0%). This shows a high upper bound for the potential of Agentless, with better patch re-ranking and selection techniques left as future work to further improve the overall performance.
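The gap between the selected-patch result and the "all samples" upper bound can be sketched as follows. The example assumes candidate patches are already normalized strings so that identical fixes compare equal; the patch names and correctness labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def majority_vote(patches: List[str]) -> str:
    """Return the most frequent patch; ties go to the earliest-seen patch."""
    return Counter(patches).most_common(1)[0][0]


def solved_by_selection(cands: Dict[str, List[str]], correct: Dict[str, Set[str]]) -> int:
    """Issues solved when only the majority-voted patch is submitted."""
    return sum(majority_vote(p) in correct[i] for i, p in cands.items())


def solved_by_any(cands: Dict[str, List[str]], correct: Dict[str, Set[str]]) -> int:
    """Upper bound: issues for which at least one sampled patch is correct."""
    return sum(any(p in correct[i] for p in ps) for i, ps in cands.items())


# Hypothetical candidates: in issue i1 the correct patch is outvoted.
candidates = {
    "i1": ["p_bad", "p_bad", "p_good"],
    "i2": ["q_good", "q_good", "q_bad"],
}
correct = {"i1": {"p_good"}, "i2": {"q_good"}}

print(solved_by_selection(candidates, correct))  # 1
print(solved_by_any(candidates, correct))        # 2
```

The difference between the two counts is the headroom that better re-ranking or selection could recover.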
5.2.3 Patch validation ablation.
Table 4: Performance of different patch selection.
| Method | Performance | Avg. $ |
| --- | --- | --- |
| Majority voting | 77 (25.67%) | $0.00 |
| +Regression test | 81 (27.00%) | $0.01 |
| +Reproduction test | 96 (32.00%) | $0.25 |
Finally, we examine the impact that our different test generation and patch selection configurations have on performance. Table 4 shows the result and additional cost of the different approaches. We see that by only using majority voting, we can already achieve 77 correct fixes. By adding the existing regression tests and filtering for the candidate patches with the fewest regression failures, we can improve performance to 81 issues resolved. Furthermore, the most significant performance improvement comes from additional filtering based on the generated reproduction tests, resulting in the final Agentless performance of 96 fixes. This demonstrates the impact of our patch selection approach, specifically our reproduction test generation, which makes use of the high number of candidate patches generated and filters for the correct patch for final submission. However, using reproduction tests also comes with additional cost, as Agentless needs to generate these tests, which are not provided in the original project repository.
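The three selection stages in Table 4 can be combined into one short sketch: keep the candidates with the fewest regression failures, then keep those passing the reproduction test if any do, and finally apply majority voting. The `Candidate` structure and its field names are hypothetical, and the sketch is an approximation of the described procedure rather than the actual implementation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    patch: str                  # normalized patch text (assumed)
    regression_failures: int    # number of failing regression tests
    passes_reproduction: bool   # result on the generated reproduction test


def select_patch(cands: List[Candidate]) -> str:
    # Stage 1: keep candidates with the fewest regression failures.
    fewest = min(c.regression_failures for c in cands)
    pool = [c for c in cands if c.regression_failures == fewest]
    # Stage 2: prefer candidates that pass the reproduction test;
    # if none do, fall back to the regression-filtered pool.
    reproduced = [c for c in pool if c.passes_reproduction]
    if reproduced:
        pool = reproduced
    # Stage 3: majority voting over the remaining candidates.
    return Counter(c.patch for c in pool).most_common(1)[0][0]


# Hypothetical candidates for one issue.
cands = [
    Candidate("A", 0, False),
    Candidate("A", 0, False),
    Candidate("B", 0, True),
    Candidate("C", 2, True),  # removed: more regression failures than the others
]
print(select_patch(cands))  # B
```

Without the reproduction stage, patch A would win the vote in this example; the reproduction test is what steers the selection to B.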
<details>
<summary>2407.01489v2/x7.png Details</summary>

### Visual Description
# Technical Document Analysis of Chart
## Chart Type
Line chart with dual y-axes and a bar chart overlay.
## Axes Labels
- **X-axis**: `# of samples` (categories: 1, 5, 10, 20, 40)
- **Left Y-axis**: `# of issues fixed` (range: 85–110)
- **Right Y-axis**: `# of issues with tests` (range: 75–200)
## Legend
- **Blue line**: `selected reproduction tests`
- **Purple line**: `plausible reproduction tests`
## Data Points
### Selected Reproduction Tests (Blue Line)
- **# of issues fixed**:
- 1 sample: 92
- 5 samples: 102
- 10 samples: 105
- 20 samples: 107
- 40 samples: 108
- **# of issues with tests**:
- 1 sample: 125
- 5 samples: 150
- 10 samples: 175
- 20 samples: 175
- 40 samples: 200
### Plausible Reproduction Tests (Purple Line)
- **# of issues fixed**:
- 1 sample: 86
- 5 samples: 90
- 10 samples: 91
- 20 samples: 91
- 40 samples: 91
- **# of issues with tests**:
- 1 sample: 75
- 5 samples: 100
- 10 samples: 125
- 20 samples: 125
- 40 samples: 125
## Key Trends
1. **Selected Reproduction Tests**:
- Steady increase in issues fixed (92 → 108) as samples grow from 1 to 40.
- Issues with tests rise sharply (125 → 200), peaking at 40 samples.
2. **Plausible Reproduction Tests**:
- Minimal growth in issues fixed (86 → 91), plateauing after 10 samples.
- Issues with tests increase moderately (75 → 125), stabilizing at 20+ samples.
## Observations
- The blue line (`selected`) consistently outperforms the purple line (`plausible`) in both metrics.
- The right y-axis (`# of issues with tests`) shows a direct correlation with the number of samples for selected tests, while plausible tests exhibit a slower growth rate.
</details>
Figure 7: Reproduction test and repair results as number of samples increases.
Next, we look at our reproduction test generation strategy. Figure 7 shows the repair performance (bar, left axis) and the total number of selected and plausible reproduction tests (lines, right axis) as we increase the number of candidate reproduction tests generated per issue. Recall that we only select a reproduction test if it successfully outputs the Issue reproduced message when evaluated on the original repository. We further consider a selected reproduction test plausible if it can successfully verify that the ground truth developer patch has fixed the original issue (i.e., it outputs Issue resolved). To begin with, when we generate only one reproduction test (i.e., using the greedy output) per issue, we produce only 100 selected tests for all 300 issues in SWE-bench Lite, with a repair performance of 90. We can increase the number of selected reproduction tests by increasing the number of candidate reproduction tests we generate. However, it is interesting to note that the number of plausible reproduction tests does not increase drastically (apart from going from 1 to 5 samples). The reason, again, is ambiguity in the issue description, which may not contain sufficient information to reproduce the issue and verify that it has been resolved. Nevertheless, we observe a small improvement in the number of issues resolved, reaching the final performance of 96 correct fixes as we increase the number of candidate reproduction tests to 40 per issue.
6 Additional Analysis on SWE-bench Lite
6.1 Problem Classification
<details>
<summary>2407.01489v2/x8.png Details</summary>

### Visual Description
# Technical Document: Pie Chart Analysis
## Chart Type
- **Pie Chart** with four distinct segments representing categorical data distribution.
## Labels and Data Points
1. **Contains reproducible example**
- **Percentage**: 54.3%
- **Color**: Purple
- **Position**: Largest segment, occupying more than half of the chart.
2. **Info in NL**
- **Percentage**: 27.0%
- **Color**: Green
- **Position**: Second-largest segment, occupying a quarter of the chart.
3. **Contains partial reproducible example**
- **Percentage**: 8.7%
- **Color**: Blue
- **Position**: Smallest segment, occupying the least space.
4. **Not enough info**
- **Percentage**: 10.0%
- **Color**: Red
- **Position**: Third-largest segment, slightly larger than the blue segment.
## Legend
- **Location**: Left side of the chart.
- **Color-Label Mapping**:
- Purple → Contains reproducible example
- Green → Info in NL
- Blue → Contains partial reproducible example
- Red → Not enough info
## Key Trends
1. **Dominance of Reproducible Examples**:
- Over 54% of the data falls under "Contains reproducible example," indicating a strong emphasis on reproducibility.
2. **Significant Non-Reproducible Data**:
- Combined, "Contains partial reproducible example" (8.7%) and "Not enough info" (10.0%) account for 18.7% of the data, highlighting gaps in reproducibility.
3. **Natural-Language Information**:
- "Info in NL" (27.0%) covers issues whose descriptions contain enough information in natural language (NL).
## Structural Notes
- **Total Percentage**: 54.3% + 27.0% + 8.7% + 10.0% = 100% (validates completeness).
- **Segment Order**: Segments are arranged clockwise, starting with the largest (purple) and ending with the smallest (blue).
## Conclusion
The chart shows that most issue descriptions contain either a reproducible example (54.3%) or enough information in natural language (27.0%), while the remaining 18.7% contain only a partial reproducible example or not enough information.
</details>
(a) Description quality
<details>
<summary>2407.01489v2/x9.png Details</summary>

### Visual Description
# Technical Document: Pie Chart Analysis
## Chart Description
The image depicts a pie chart illustrating the distribution of outcomes across five distinct categories. The chart uses color-coded segments to represent proportional data, with numerical percentages explicitly labeled for each category.
## Key Data Points
| Category | Percentage | Color | Position |
|---------------------|------------|--------|---------------------------------|
| No solution | 73.3% | Purple | Dominant segment occupying the majority of the chart. |
| Complete steps in NL| 9.7% | Pink | Second-largest segment after "No solution". |
| Some steps in NL | 7.7% | Blue | Third-largest segment. |
| Misleading | 5.0% | Green | Fourth-largest segment. |
| Exact patch | 4.3% | Red | Smallest segment. |
## Observations
- The chart emphasizes a stark imbalance, with **"No solution"** accounting for over 73% of the total.
- The combined percentage of **"Complete steps in NL"** and **"Some steps in NL"** (17.4%) shows that a notable share of issues describe some or all of the solution steps in natural language (NL).
- The smallest segments (**"Exact patch"** and **"Misleading"**) indicate minimal representation of these outcomes.
## Structural Notes
- No axis titles or legends are explicitly labeled in the image, as the categories and percentages are directly embedded within the chart.
- Color coding is used to differentiate categories but is not referenced in a separate legend.
## Total Validation
All percentages sum to 100.0%, confirming data integrity:
```
73.3% + 9.7% + 7.7% + 5.0% + 4.3% = 100.0%
```
This chart provides a clear visual representation of how often the issue description already contains a solution, highlighting the dominance of issues with no solution provided ("No solution") and the relative scarcity of issues that include the exact patch ("Exact patch").
</details>
(b) Solution in description
<details>
<summary>2407.01489v2/x10.png Details</summary>

### Visual Description
# Stacked Bar Chart Analysis
## Chart Type
- Stacked bar chart
## Axis Labels
- **X-axis**: Categories (`Line`, `Function`, `File`, `Overall`)
- **Y-axis**: `Percentage` (0–100)
## Legend
- **Colors and Labels**:
- `Blue`: `None`
- `Purple`: `Stack trace`
- `Green`: `Natural language`
- `Red`: `Keywords`
## Data Points
### Line
- `None`: 90.3%
- `Stack trace`: 3.0%
- `Natural language`: 6.7%
- `Keywords`: 0%
### Function
- `None`: 68.0%
- `Stack trace`: 6.7%
- `Natural language`: 13.7%
- `Keywords`: 11.7%
### File
- `None`: 50.0%
- `Stack trace`: 13.7%
- `Natural language`: 15.3%
- `Keywords`: 21.0%
### Overall
- `None`: 39.3%
- `Stack trace`: 13.7%
- `Natural language`: 24.7%
- `Keywords`: 22.3%
## Key Trends
1. **Dominance of `None`**:
- Highest in `Line` (90.3%) and decreases progressively to `Overall` (39.3%).
2. **Growth of `Keywords`**:
- Increases from 0% in `Line` to 22.3% in `Overall`.
3. **`Natural language`**:
- Rises from 6.7% (`Line`) to 24.7% (`Overall`), becoming the second-largest segment in `Overall`.
4. **`Stack trace`**:
- Rises from 3.0% (`Line`) to 6.7% (`Function`) and 13.7% (`File` and `Overall`).
## Cross-Reference Validation
- Legend colors match bar segments:
- `Blue` (`None`) consistently occupies the largest portion of each bar.
- `Red` (`Keywords`) grows visibly from `Line` to `Overall`.
- `Green` (`Natural language`) and `Purple` (`Stack trace`) segments align with their labeled percentages.
</details>
(c) Location information
Figure 8: Categorization and corresponding breakdown of the SWE-bench Lite problems.
We now take a closer look at the problems in SWE-bench Lite. We first classify the existing problems to gain a better understanding of, and additional insights into, exactly what types of problems Agentless and prior approaches can solve. Specifically, we perform manual classification based on the issue description and ground truth developer patch of each problem. We describe each classification dimension and its categories in more detail below:
1) Description quality. We first inspect whether each issue description contains sufficient information to perform the desired task. Figure 8(a) shows the distribution of each category: (i) contains enough information in natural language, (ii) contains a reproducible failure example, (iii) contains a partially reproducible example, and (iv) does not contain enough information. We observe that while a majority of the tasks in SWE-bench Lite contain sufficient information, with many having small failure examples to showcase the bug, there is a non-trivial percentage (10.0%) of problems that do not contain enough information. Such problems include those that require implementing a new function with a specific name or adding an error message with a specific string that was not provided in the problem description. In these cases, the test will fail if the function name or error message string does not match exactly, even if the underlying functionality is correctly implemented. These types of problems still exist in the benchmark even though the SWE-bench Lite filtering process claims to have removed them completely. Another example of insufficient information is a problem that admits multiple interpretations of how to solve the issue, only a subset of which can pass the ground truth test. For instance, the issue description may outline two possible solution suggestions, with only one of them aligned with the developer's intention. Implementing the other suggested solution will lead to test failure. This highlights the need to further sanitize and improve SWE-bench Lite by excluding problems with uninformative descriptions.
2) Solution in description. We also examine whether the solution or the steps to solve the problem are already provided in the issue description. Figure 8(b) shows the breakdown of our categories: (i) no solution or steps provided, (ii) partial solution provided (e.g., some steps in natural language), (iii) complete solution provided (e.g., complete steps in natural language), (iv) exact patch provided, and (v) misleading solution or steps. Interestingly, we observe that 4.3% of issues contain the exact ground truth patch in the issue description, while an additional 9.7% of issues describe the exact steps required to come up with the correct solution. This shows that certain problems in SWE-bench Lite can be much easier to solve since they provide the solution either as exact code snippets or in natural language. Furthermore, we also observe that 5.0% of issues contain a proposed solution or steps in the issue description that do not reflect the ground truth patch introduced by the developers. This further highlights potential issues with the benchmark, as these discrepancies can mislead tools into generating incorrect solutions simply by following the issue description.
3) Location information. We further check whether the issue description contains the correct location information. We divide the granularity into line, function, and file level. Our categories are: (i) exact locations in natural language, (ii) exact locations provided in failure stack traces, (iii) related keywords in the issue description that can be used to search for the location, and (iv) not provided. We first observe that only in very few cases ($<$10%) does the issue provide the exact lines needed to fix the bug. However, this number increases as we move to the coarser granularities of functions and files: around half of the issues already provide, in the description, the location of the file that needs to be edited. To repair a bug or introduce a new feature, finding the location to make the edit is extremely important. As such, we leverage this classification and focus our later analysis on the effect the provided location has on the repair performance of Agentless and baseline approaches.
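To give a sense of scale, the percentages reported in Figure 8(a) and 8(b) can be converted into approximate issue counts, using the 300 issues in SWE-bench Lite. The sketch below is illustrative only: the counts are back-computed from rounded percentages and are not figures stated by the authors.

```python
TOTAL_ISSUES = 300  # number of issues in SWE-bench Lite, as stated in the text

# Percentages as reported in Figure 8(a) and Figure 8(b).
description_quality = {
    "Contains reproducible example": 54.3,
    "Info in NL": 27.0,
    "Contains partial reproducible example": 8.7,
    "Not enough info": 10.0,
}
solution_in_description = {
    "No solution": 73.3,
    "Complete steps in NL": 9.7,
    "Some steps in NL": 7.7,
    "Misleading": 5.0,
    "Exact patch": 4.3,
}


def approximate_counts(percentages, total=TOTAL_ISSUES):
    """Convert rounded percentages into approximate issue counts."""
    return {label: round(pct * total / 100) for label, pct in percentages.items()}


quality_counts = approximate_counts(description_quality)
solution_counts = approximate_counts(solution_in_description)

for label, count in {**quality_counts, **solution_counts}.items():
    print(f"{label}: ~{count} issues")
```

Under this conversion, the 10.0% of problems without enough information corresponds to roughly 30 issues, and the 4.3% with the exact patch in the description to roughly 13 issues.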
These classification dimensions and categories reveal potential issues with the SWE-bench Lite problems, such as unsolvable questions, misleading potential solutions, and significant differences in problem difficulty. These issues have not been properly considered by either the benchmark creation process or prior approaches. Furthermore, we hope our classification can provide additional insights into the types of problems that can be solved by existing and future approaches.
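The three classification dimensions can be represented as a small labeled record per problem, which also makes it easy to filter out the problematic categories discussed above (exact patch provided, misleading solution, and not enough information). The sketch below is a hypothetical encoding: the enum names, the `ProblemLabels` record, and the `is_problematic` rule are assumptions for illustration, not the authors' code or data format.

```python
from dataclasses import dataclass
from enum import Enum


class DescriptionQuality(Enum):
    INFO_IN_NL = "enough information in natural language"
    REPRODUCIBLE_EXAMPLE = "contains reproducible example"
    PARTIAL_EXAMPLE = "contains partial reproducible example"
    NOT_ENOUGH_INFO = "not enough information"


class SolutionInDescription(Enum):
    NO_SOLUTION = "no solution or steps"
    SOME_STEPS = "some steps in natural language"
    COMPLETE_STEPS = "complete steps in natural language"
    EXACT_PATCH = "exact patch"
    MISLEADING = "misleading solution or steps"


class LocationInfo(Enum):
    NATURAL_LANGUAGE = "exact location in natural language"
    STACK_TRACE = "exact location in stack trace"
    KEYWORDS = "related keywords"
    NONE = "not provided"


@dataclass(frozen=True)
class ProblemLabels:
    """Manual labels for one benchmark problem (hypothetical record)."""
    instance_id: str
    description: DescriptionQuality
    solution: SolutionInDescription
    line_location: LocationInfo
    function_location: LocationInfo
    file_location: LocationInfo


def is_problematic(labels: ProblemLabels) -> bool:
    # Assumed exclusion rule: drop problems whose description leaks the exact
    # patch, is misleading, or lacks the information needed to solve the task.
    return (
        labels.solution in {SolutionInDescription.EXACT_PATCH,
                            SolutionInDescription.MISLEADING}
        or labels.description is DescriptionQuality.NOT_ENOUGH_INFO
    )


def filter_problems(all_labels):
    """Keep only problems that are not flagged as problematic."""
    return [p for p in all_labels if not is_problematic(p)]


# Made-up example labels; the instance ids are placeholders.
example_labels = [
    ProblemLabels("example-1", DescriptionQuality.REPRODUCIBLE_EXAMPLE,
                  SolutionInDescription.NO_SOLUTION,
                  LocationInfo.NONE, LocationInfo.KEYWORDS, LocationInfo.KEYWORDS),
    ProblemLabels("example-2", DescriptionQuality.INFO_IN_NL,
                  SolutionInDescription.EXACT_PATCH,
                  LocationInfo.NATURAL_LANGUAGE, LocationInfo.NATURAL_LANGUAGE,
                  LocationInfo.NATURAL_LANGUAGE),
    ProblemLabels("example-3", DescriptionQuality.NOT_ENOUGH_INFO,
                  SolutionInDescription.NO_SOLUTION,
                  LocationInfo.NONE, LocationInfo.NONE, LocationInfo.STACK_TRACE),
]
kept = filter_problems(example_labels)
print([p.instance_id for p in kept])
```

Keeping the location label separately for each granularity mirrors the line, function, and file breakdown in Figure 8(c), so the same records could be reused to group repair results by the location information provided.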
6.2 SWE-bench Lite- $S$
Table 5: Performance and ranking on SWE-bench Lite-$S$. * indicates a tie in ranking.
| Tool | LLM | SWE-bench Lite % Resolved | SWE-bench Lite Rank | SWE-bench Lite-$S$ % Resolved | SWE-bench Lite-$S$ Rank |
| --- | --- | --- | --- | --- | --- |
| CodeStory Aide cod [2024] 🔒 | GPT-4o + Claude 3.5 S | 129 (43.00%) | 1 | 114 (45.78%) | 1 |
| Bytedance MarsCode Liu et al. [2024b] 🔒 | NA | 118 (39.33%) | 2 | 106 (42.57%) | 2 |
| Honeycomb hon [2024] 🔒 | NA | 115 (38.33%) | 3 | 98 (39.36%) | 3 |
| MentatBot men [2024] 🔒 | GPT-4o | 114 (38.00%) | 4 | 96 (38.55%) | 4 |
| Gru gru [2024] 🔒 | NA | 107 (35.67%) | 5 | 94 (37.75%) | 5 |
| Isoform iso [2024] 🔒 | NA | 105 (35.00%) | 6 | 91 (36.55%) | 6 |
| SuperCoder2.0 sup [2024] 🔒 | NA | 102 (34.00%) | 7 | 87 (34.94%) | 7 |
| Alibaba Lingma Agent lin [2024] 🔒 | GPT-4o + Claude 3.5 S | 99 (33.00%) | 8 | 86 (34.54%) | 8 |
| Factory Code Droid fac [2024] 🔒 | NA | 94 (31.33%) | 10 | 82 (32.93%) | 10 |
| Amazon Q Developer-v2 ama [2024] 🔒 | NA | 89 (29.67%) | 12* | 76 (30.52%) | 13 |
| CodeR Chen et al. [2024] 🔒 | GPT-4 | 85 (28.33%) | 14 | 71 (28.51%) | 14* |
| MASAI Arora et al. [2024] 🔒 | NA | 84 (28.00%) | 15 | 70 (28.11%) | 16 |
| SIMA sim [2024] 🔒 | GPT-4o | 83 (27.67%) | 16 | 71 (28.51%) | 14* |
| IBM Research Agent-101 ibm [2024] 🔒 | NA | 80 (26.67%) | 17* | 66 (26.51%) | 18* |
| OpenCSG StarShip ope [2024a] 🔒 | GPT-4 | 71 (23.67%) | 22 | 56 (22.49%) | 23* |
| Amazon Q Developer ama [2024] 🔒 | NA | 61 (20.33%) | 26 | 51 (20.48%) | 25* |
| RepoUnderstander Ma et al. [2024] 🔒 | GPT-4 | 64 (21.33%) | 25 | 51 (20.48%) | 25* |
| AutoCodeRover-v2 aut [2024] | GPT-4o | 92 (30.67%) | 11 | 79 (31.73%) | 11 |
| RepoGraph rep [2024] | GPT-4o | 89 (29.67%) | 12* | 77 (30.92%) | 12 |
| Moatless moa [2024] | Claude 3.5 S | 80 (26.67%) | 17* | 67 (26.91%) | 17 |
| | GPT-4o | 74 (24.67%) | 21 | 62 (24.90%) | 21 |
| OpenDevin+CodeAct v1.8 ope [2024b] | Claude 3.5 S | 80 (26.67%) | 17* | 65 (26.10%) | 20 |
| Aider Gauthier [2024] | GPT-4o + Claude 3.5 S | 79 (26.33%) | 20 | 66 (26.51%) | 18* |
| SWE-agent Yang et al. [2024a] | Claude 3.5 S | 69 (23.00%) | 23 | 58 (23.29%) | 22 |
| | GPT-4o | 55 (18.33%) | 28 | 45 (18.07%) | 27* |
| | GPT-4 | 54 (18.00%) | 29 | 42 (16.87%) | 29 |
| AppMap Navie app [2024] | GPT-4o | 65 (21.67%) | 24 | 56 (22.49%) | 23* |
| AutoCodeRover Zhang et al. [2024c] | GPT-4 | 57 (19.00%) | 27 | 45 (18.07%) | 27* |
| RAG Yang et al. [2024a] | Claude 3 Opus | 13 (4.33%) | 30 | 10 (4.02%) | 30 |
| | GPT-4 | 8 (2.67%) | 32 | 5 (2.01%) | 32 |
| | Claude-2 | 9 (3.00%) | 31 | 6 (2.41%) | 31 |
| | GPT-3.5 | 1 (0.33%) | 33 | 0 (0.00%) | 33 |
| Agentless | GPT-4o | 96 (32.00%) | 9 | 84 (33.73%) | 9 |
Building on the above problem classifications, we now compare Agentless and existing work more rigorously. Specifically, we focus on a subset of SWE-bench Lite obtained by removing the problems that contain the exact patch in the problem description, contain misleading solutions, or do not provide enough information in the original issue description. This eliminates the less reasonable problems and normalizes the difficulty level of the benchmark. We refer to our subset of 249 problems as SWE-bench Lite- $S$ . We note that our approach of identifying and excluding problematic issues was later corroborated by OpenAI, who released a similarly filtered benchmark, SWE-bench Verified Chowdhury et al. [2024].
Table 5 shows the results on the SWE-bench Lite- $S$ benchmark and the corresponding ranking of each approach. We also include the results on the original 300 problems in SWE-bench Lite for comparison. While the general ranking of the approaches stays roughly the same, we do observe some small ranking changes. Compared to the original SWE-bench Lite, our filtered SWE-bench Lite- $S$ benchmark provides a more accurate reflection of the true capability of autonomous software development tools.
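The percentages in Table 5 follow directly from the resolved counts and the benchmark sizes (300 problems for SWE-bench Lite, 249 for SWE-bench Lite- $S$ ). The following is a minimal sketch of that arithmetic; the helper name `solve_rate` and the constant names are illustrative assumptions and are not part of the Agentless codebase.

```python
LITE_TOTAL = 300    # problems in SWE-bench Lite
LITE_S_TOTAL = 249  # problems in the filtered SWE-bench Lite-S subset


def solve_rate(resolved: int, total: int) -> str:
    """Format a resolved count the way Table 5 reports it (hypothetical helper)."""
    return f"{resolved} ({100 * resolved / total:.2f}%)"


# Resolved counts for Agentless (GPT-4o), taken from Table 5.
agentless_lite = solve_rate(96, LITE_TOTAL)
agentless_lite_s = solve_rate(84, LITE_S_TOTAL)
print(agentless_lite)
print(agentless_lite_s)
```

Because the denominators differ, a percentage on Lite- $S$ is not directly comparable to the one on the full Lite set without taking the removed problems into account.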
<details>
<summary>2407.01489v2/x11.png Details</summary>

### Visual Description
# Technical Document Extraction: Solve Rate Analysis of Code Generation Tools
## Chart Type
Bar chart comparing solve rates of code generation tools with and without reproducible examples.
## Axes
- **X-axis**: Code generation tools (categories)
- AutoCodeRover-v2
- RepoGraph
- Moatless-Claude-3.5
- CodeStoryAide
- MarsCode
- Honeycomb
- Agentless
- **Y-axis**: Solve rate (0.0 to 0.6 in increments of 0.1)
## Legend
- **Blue bars**: "w/o reproducible examples"
- **Purple bars**: "w/ reproducible examples"
- **Red dashed lines**: Reference thresholds at 0.3, 0.4, and 0.5
## Data Points
| Tool | w/o Examples (Blue) | w/ Examples (Purple) |
|-----------------------|---------------------|----------------------|
| AutoCodeRover-v2 | ~0.40 | ~0.29 |
| RepoGraph | ~0.37 | ~0.29 |
| Moatless-Claude-3.5 | ~0.35 | ~0.23 |
| CodeStoryAide | ~0.57 | ~0.41 |
| MarsCode | ~0.50 | ~0.40 |
| Honeycomb | ~0.52 | ~0.35 |
| Agentless | ~0.34 | ~0.34 |
## Key Observations
1. **General Trend**:
- Every tool except Agentless shows a lower solve rate with reproducible examples (purple) than without (blue); for Agentless both conditions are equal (~0.34).
2. **Highest Performers**:
- **CodeStoryAide**: Highest solve rate overall (~0.57 without examples, ~0.41 with examples).
- **MarsCode**: Strong performance (~0.50 without, ~0.40 with examples).
3. **Lowest Performers**:
- **Moatless-Claude-3.5**: Lowest solve rate with examples (~0.23), below its "without examples" value (~0.35).
4. **Threshold Analysis**:
- **0.3 threshold**: All bars meet or exceed this except the "with examples" bars of AutoCodeRover-v2 (~0.29), RepoGraph (~0.29), and Moatless-Claude-3.5 (~0.23).
- **0.4 threshold**: With examples, only CodeStoryAide (~0.41) and MarsCode (~0.40) reach this.
- **0.5 threshold**: Only CodeStoryAide (~0.57), Honeycomb (~0.52), and MarsCode (~0.50), all without examples, reach this.
## Critical Notes
- **Agentless**: No change observed with reproducible examples (bars identical at ~0.34).
- **CodeStoryAide**: Drops by about 0.16 (from ~0.57 to ~0.41) on problems with reproducible examples.
- **Honeycomb**: Shows the largest absolute drop, about 0.17 (from ~0.52 without examples to ~0.35 with examples).
## Structural Consistency
- All legend labels (blue/purple) align with bar colors.
- Red dashed lines correspond to y-axis thresholds (0.3, 0.4, 0.5).
- No missing or misaligned data points.
</details>
(a) Description quality
<details>
<summary>2407.01489v2/x12.png Details</summary>

### Visual Description
# Technical Document Extraction: Solve Rate Comparison Chart
## Chart Type
Bar chart comparing solve rates with and without natural language (NL) steps across multiple code generation tools.
## Axis Labels
- **X-axis**: Tool names (categories)
- AutoCodeRover-v2
- RepoGraph
- Moatless-Claude-3.5
- CodeStoryAide
- MarsCode
- Honeycomb
- Agentless
- **Y-axis**: Solve rate (0.0 to 0.7 in increments of 0.1)
## Legend
- **Blue bars**: "contains NL steps"
- **Purple bars**: "without steps"
## Data Points
| Tool | Contains NL Steps (Blue) | Without Steps (Purple) |
|-----------------------|--------------------------|------------------------|
| AutoCodeRover-v2 | ~0.55 | ~0.25 |
| RepoGraph | ~0.48 | ~0.27 |
| Moatless-Claude-3.5 | ~0.69 | ~0.40 |
| CodeStoryAide | ~0.65 | ~0.37 |
| MarsCode | ~0.74 | ~0.31 |
| Honeycomb | ~0.52 | ~0.30 |
| Agentless | ~0.53 | ~0.29 |
## Key Trends
1. **NL Steps Impact**: All tools show significantly higher solve rates when NL steps are included (blue bars) compared to when they are excluded (purple bars).
2. **Performance Thresholds**: Red dashed reference lines appear at 0.3, 0.4, 0.5, 0.6, and 0.7. All tools with NL steps except RepoGraph (~0.48) exceed 0.5.
3. **Top Performers**:
- MarsCode achieves the highest solve rate with NL steps (~0.74).
- MarsCode also shows the largest improvement with NL steps (~0.74 vs. ~0.31 without).
4. **Lowest Performers**:
- AutoCodeRover-v2 has the lowest solve rate without NL steps (~0.25).
- RepoGraph has the lowest solve rate with NL steps (~0.48).
## Additional Observations
- **Red Dashed Lines**: Positioned at 0.3, 0.4, 0.5, 0.6, and 0.7 on the y-axis.
- **Consistency**: Every tool has a higher solve rate with NL steps than without.
</details>
(b) Solution in description
<details>
<summary>2407.01489v2/x13.png Details</summary>

### Visual Description
# Technical Document Extraction: Bar Chart Analysis
## Chart Overview
The image is a grouped bar chart comparing **solve rates** across seven software tools, with four categorical groupings represented by color-coded bars. The chart includes a red dashed threshold line at **0.3 solve rate**.
---
## Axis Labels and Markers
- **X-axis**: Tools (categories)
- AutoCodeRover-v2
- RepoGraph
- Moatless-Claude-3.5
- CodeStoryAide
- MarsCode
- Honeycomb
- Agentless
- **Y-axis**: Solve rate (0.0 to 0.7, increments of 0.1)
- **Legend**:
- `in NL` (blue)
- `in stack trace` (purple)
- `keywords` (green)
- `none` (red)
- **Threshold Line**: Red dashed line at **0.3 solve rate**
---
## Key Trends and Data Points
1. **Highest Solve Rate**:
- The `in NL` category (blue) achieves the highest solve rate for every tool, at **~0.5** or above in all cases.
- **Peak**: Honeycomb (`in NL`) reaches **~0.7**, the highest value in the chart.
2. **Second-Highest Solve Rate**:
- The `keywords` category (green) is second-highest for every tool except RepoGraph, with values ranging from **~0.28 to ~0.48**.
- **Peak**: CodeStoryAide (`keywords`) at **~0.48**.
3. **Third-Highest Solve Rate**:
- The `in stack trace` category (purple) ranges from **~0.23 to ~0.45**.
- **Peak**: CodeStoryAide (`in stack trace`) at **~0.45**.
4. **Lowest Solve Rate**:
- The `none` category (red) is consistently the lowest, with values ranging from **~0.14 to ~0.34**.
- **Peak**: CodeStoryAide (`none`) at **~0.34**; only CodeStoryAide and MarsCode (~0.33) are above the 0.3 line in this category.
5. **Threshold Analysis**:
- The red dashed line at **0.3** serves as a benchmark.
- **Observations**:
- All `in NL` bars exceed the threshold.
- `keywords` bars are above the threshold in all cases except Moatless-Claude-3.5 (~0.28); RepoGraph (~0.32) is only marginally above.
- `in stack trace` bars are above the threshold except for AutoCodeRover-v2 (~0.25) and Moatless-Claude-3.5 (~0.23); `none` bars are below it except for CodeStoryAide (~0.34) and MarsCode (~0.33).
---
## Tool-Specific Observations
| Tool | in NL (~Solve Rate) | in stack trace (~Solve Rate) | keywords (~Solve Rate) | none (~Solve Rate) |
|-----------------------|---------------------|------------------------------|------------------------|--------------------|
| AutoCodeRover-v2 | ~0.5 | ~0.25 | ~0.35 | ~0.23 |
| RepoGraph | ~0.55 | ~0.37 | ~0.32 | ~0.16 |
| Moatless-Claude-3.5 | ~0.55 | ~0.23 | ~0.28 | ~0.14 |
| CodeStoryAide | ~0.63 | ~0.45 | ~0.48 | ~0.34 |
| MarsCode | ~0.65 | ~0.40 | ~0.42 | ~0.33 |
| Honeycomb | ~0.7 | ~0.32 | ~0.40 | ~0.26 |
| Agentless | ~0.6 | ~0.32 | ~0.36 | ~0.21 |
---
## Conclusion
The chart demonstrates that **`in NL`** is the category with the highest solve rate across all tools, while **`none`** has the lowest. The `keywords` and **`in stack trace`** categories fall in between, with `keywords` slightly higher for most tools. All `in NL` bars and most `keywords` and `in stack trace` bars meet or exceed the 0.3 line, while most `none` bars fall below it.
</details>
(c) Location information
Figure 9: Solve rate of selected approaches (orange means open-source while indigo means closed-source) on different problem categories in SWE-bench Lite- $S$ . Red dotted line indicates the average solve rate on the entire SWE-bench Lite- $S$ for each approach.
Using the classification results, we further examine the types of problems that are solved by Agentless and prior approaches on SWE-bench Lite- $S$ . Figure 9 shows the solve rate of various top-performing open-source and closed-source approaches across the different categories of problems. We first examine in Figure 9(a) whether having code examples that reproduce the error in the issue description helps the LLM solve the issue. Surprisingly, we found that the solve rate of all prior approaches drops when evaluated on the problems with reproducible code examples. Many agent-based approaches Yang et al. [2024a]; ope [2024b]; Chen et al. [2024] attempt to first reproduce the error; however, this may not improve performance even on problems that already provide reproducible examples. In contrast, the performance of Agentless remains high on the problems with reproducible code examples. This is because Agentless generates reproduction tests from the original issue descriptions and can therefore make better use of the reproducible code examples provided (similarly for MentatBot, which also contains an explicit test generation step). This demonstrates the importance of the test generation stage for patch selection. Next, we look at the effect of a ground truth patch/solution in the issue description. Figure 9(b) shows the expected result: all selected techniques perform better on issues that already provide solution steps in natural language. Furthermore, in Figure 9(c), we examine the solve rate with respect to the location information provided in the issue description. Unsurprisingly, we found that the highest solve rates are on problems where the location is provided in natural language, followed by stack traces. The most difficult problems are those that do not contain any clues about the location of the issue in the description.
We observe that, compared with closed-source approaches, Agentless performs comparably when the location is provided in natural language, stack traces, or keywords. However, the closed-source agent tools perform better than Agentless when no location clue is provided. This highlights an advantage of agent-based tools on these more complex problems, where they are able to use complex code search tools. Targeting and improving performance on these types of problems is potential future work for Agentless.
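The per-category solve rates discussed above can be computed by grouping the classified issues by label and dividing the number resolved by the number in each group. The sketch below illustrates this under stated assumptions: the function name, the issue identifiers, and the toy labels are all hypothetical and are not taken from the benchmark or from the Agentless artifacts.

```python
from collections import defaultdict


def solve_rate_by_category(categories, resolved):
    """Return the fraction of resolved issues per category.

    categories: dict mapping issue id -> category label
    resolved:   set of issue ids that an approach resolved
    """
    totals = defaultdict(int)
    solved = defaultdict(int)
    for issue_id, category in categories.items():
        totals[category] += 1
        if issue_id in resolved:
            solved[category] += 1
    return {category: solved[category] / totals[category] for category in totals}


# Hypothetical toy data, for illustration only.
categories = {
    "issue-1": "in NL",
    "issue-2": "in NL",
    "issue-3": "in stack trace",
    "issue-4": "none",
    "issue-5": "none",
}
resolved = {"issue-1", "issue-3", "issue-4"}

rates = solve_rate_by_category(categories, resolved)
for category, rate in rates.items():
    print(f"{category}: {rate:.2f}")
```

Comparing each category's rate against the approach's overall rate (the red dotted line in Figure 9) shows which kinds of issue descriptions help or hurt that approach.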
6.3 SWE-bench Verified
Table 6: Performance on SWE-bench Verified.
Similar to SWE-bench Lite- $S$ and motivated by concerns similar to those in Section 6.1, OpenAI has produced a newly filtered dataset, SWE-bench Verified Chowdhury et al. [2024], validated by human developers to ensure each issue has sufficient information to be solved. Table 6 shows the performance of Agentless compared with prior agent-based approaches on SWE-bench Verified. Similar to the results on SWE-bench Lite, Agentless maintains its strong performance and solves 194 out of 500 problems (38.80%). Agentless achieves the second highest performance among open-source approaches and performs better than many closed-source / commercial techniques. Furthermore, we note that Agentless performs the best among all techniques that use GPT-4o as the LLM.
7 Threats to Validity
Internal. One threat to validity comes from potential data leakage: the ground truth developer patches in SWE-bench Lite may be part of the training data for GPT-4o. Since GPT-4o is a closed-source model, we do not have access to its training data. Meanwhile, we note that prior work almost exclusively used similar closed-source LLMs (e.g., GPT-4o, GPT-4, and Claude-3.5), and our approach can outperform all existing open-source solutions with the same models. Furthermore, the authors of SWE-bench Jimenez et al. [2024a] compared the resolve rate of issues collected before and after the knowledge cutoff date of GPT-4 and did not find any significant difference. To completely address this threat, we would need to retrain GPT-4o from scratch, which would be infeasible for an academic project.
External. One main external threat comes from our evaluation dataset of SWE-bench Lite. While the performance of Agentless might not generalize to other datasets, SWE-bench Lite is by far the most popular evaluation dataset and contains a diverse range of problems. In addition, OpenAI has independently performed an extensive evaluation of Agentless and other open-source solutions on SWE-bench Lite, SWE-bench, and their new SWE-bench Verified benchmark, further confirming that Agentless outperforms all other open-source agents Chowdhury et al. [2024]. Moreover, on Sept. 12th, 2024, OpenAI released the new o1 model family and adopted Agentless as the top approach to showcase its performance on SWE-bench OpenAI [2024c]. In the future, we plan to further address this threat by evaluating Agentless on other benchmarks.
8 Conclusion
We propose Agentless – an agentless approach to automatically tackle software development problems. Agentless uses a simple three-phase approach of localization, repair, and patch validation. Compared to prior agent-based approaches, Agentless deliberately does not let the LLM use tools autonomously or plan its own actions. Our evaluation on the popular SWE-bench Lite benchmark demonstrates that Agentless achieves the highest performance among open-source techniques while keeping the cost low. Furthermore, we perform a detailed classification of problems in SWE-bench Lite, which not only offers new insights but also lets us construct a more rigorous benchmark, SWE-bench Lite- $S$ , by removing problematic issues.
Acknowledgments
We thank Jiawei Liu and Yuxiang Wei for providing some of the resources used to run the experiments. One of the authors would like to thank Jun Yang for generously gifting his old bike (sadly, the bike is currently broken), which allowed the author to travel faster, thus increasing research speed.
References
- ibm [2024] 2024. Agent-101: A Software Engineering Agent for Code Assistance developed by IBM Research. https://github.com/swe-bench/experiments/blob/main/evaluation/lite/20240612_IBM_Research_Agent101/README.md/.
- cod [2024] 2024. Aide by Codestory. https://github.com/swe-bench/experiments/tree/main/evaluation/lite/20240702_codestory_aide_mixed.
- sim [2024] 2024. Alex SIMA. https://github.com/swe-bench/experiments/tree/main/evaluation/lite/20240706_sima_gpt4o.
- ama [2024] 2024. Amazon Q Developer The most capable generative AI–powered assistant for software development. https://aws.amazon.com/q/developer//.
- app [2024] 2024. AppMap speedruns to the top of the SWE Bench Leaderboard. https://appmap.io/blog/2024/06/20/appmap-navie-swe-bench-leader/.
- aut [2024] 2024. AutoCodeRover Autonomous Software Engineering. https://autocoderover.dev/.
- dev [2024] 2024. Devin, AI software engineer. https://www.cognition.ai/introducing-devin.
- com [2024] 2024. Empower your AI agents with Composio - a platform for managing and integrating tools with LLMs and AI agents using Function Calling. https://docs.composio.dev/introduction/intro/overview.
- fac [2024] 2024. Factory Bringing Autonomy to Software Engineering. https://www.factory.ai/.
- hon [2024] 2024. Honeycomb. https://honeycomb.sh.
- too [2024] 2024. Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku. https://www.anthropic.com/news/3-5-models-and-computer-use.
- iso [2024] 2024. Isoform. https://github.com/swe-bench/experiments/tree/main/evaluation/lite/20240829_Isoform.
- lin [2024] 2024. Lingma Agent. https://github.com/swe-bench/experiments/tree/main/evaluation/lite/20240622_Lingma_Agent.
- men [2024] 2024. MentatBot: New SOTA Coding Agent, Available Now. https://mentat.ai/blog/mentatbot-sota-coding-agent.
- moa [2024] 2024. Moatless Tools. https://github.com/aorwall/moatless-tools.
- ope [2024a] 2024a. OpenCSG StarShip. https://opencsg.com/product?class=StarShip/.
- ope [2024b] 2024b. OpenDevin: Code Less, Make More. https://github.com/OpenDevin/OpenDevin/.
- ast [2024] 2024. Python ast — Abstract Syntax Trees. https://docs.python.org/3/library/ast.html/.
- rep [2024] 2024. RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph. https://github.com/ozyyshr/RepoGraph.
- gru [2024] 2024. The Road to Ultimate Pull Request Machine. https://gru.ai/blog/road-to-ultimate-pull-request-machine/.
- sol [2024] 2024. Solver. https://solverai.com.
- sup [2024] 2024. SuperCoder. https://superagi.com/supercoder/.
- swe [2024] 2024. SWE-bench Lite. https://www.swebench.com/lite.html.
- Abreu et al. [2009] Rui Abreu, Peter Zoeteweij, Rob Golsteijn, and Arjan JC Van Gemund. 2009. A practical evaluation of spectrum-based fault localization. Journal of Systems and Software 82, 11 (2009), 1780–1792.
- Abreu et al. [2007] Rui Abreu, Peter Zoeteweij, and Arjan JC Van Gemund. 2007. On the accuracy of spectrum-based fault localization. In Testing: Academic and industrial conference practice and research techniques-MUTATION (TAICPART-MUTATION 2007). IEEE, 89–98.
- Anthropic [2024] Anthropic. 2024. Introducing Claude 3.5 Sonnet. https://www.anthropic.com/news/claude-3-5-sonnet/.
- Arora et al. [2024] Daman Arora, Atharv Sonwane, Nalin Wadhwa, Abhav Mehrotra, Saiteja Utpala, Ramakrishna Bairi, Aditya Kanade, and Nagarajan Natarajan. 2024. MASAI: Modular Architecture for Software-engineering AI Agents. arXiv preprint arXiv:2406.11638 (2024).
- Austin et al. [2021] Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, and Charles Sutton. 2021. Program Synthesis with Large Language Models. arXiv:2108.07732 [cs.PL]
- Bouzenia et al. [2024] Islem Bouzenia, Premkumar Devanbu, and Michael Pradel. 2024. Repairagent: An autonomous, llm-based agent for program repair. arXiv preprint arXiv:2403.17134 (2024).
- Chen et al. [2024] Dong Chen, Shaoxin Lin, Muhan Zeng, Daoguang Zan, Jian-Gang Wang, Anton Cheshkov, Jun Sun, Hao Yu, Guoliang Dong, Artem Aliev, et al. 2024. CodeR: Issue Resolving with Multi-Agent and Task Graphs. arXiv preprint arXiv:2406.01304 (2024).
- Chen et al. [2021] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. 2021. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 (2021).
- Chen [2024] Yang Chen. 2024. Flakiness Repair in the Era of Large Language Models. In Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings. 441–443.
- Chen et al. [2019] Zimin Chen, Steve Kommrusch, Michele Tufano, Louis-Noël Pouchet, Denys Poshyvanyk, and Martin Monperrus. 2019. SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair. IEEE Transactions on Software Engineering (2019).
- Chowdhury et al. [2024] Neil Chowdhury, James Aung, Chan Jun Shern, Oliver Jaffe, Dane Sherburn, Giulio Starace, Evan Mays, Rachel Dias, Marwan Aljubeh, Mia Glaese, Carlos E. Jimenez, John Yang, Kevin Liu, and Aleksander Madry. 2024. Introducing SWE-bench Verified. OpenAI Blog (2024). https://openai.com/index/introducing-swe-bench-verified/.
- Deng et al. [2023] Yinlin Deng, Chunqiu Steven Xia, Haoran Peng, Chenyuan Yang, and Lingming Zhang. 2023. Large Language Models are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models. In 32nd International Symposium on Software Testing and Analysis (ISSTA).
- Deng et al. [2024] Yinlin Deng, Chunqiu Steven Xia, Chenyuan Yang, Shizhuo Dylan Zhang, Shujing Yang, and Lingming Zhang. 2024. Large Language Models are Edge-Case Fuzzers: Testing Deep Learning Libraries via FuzzGPT. In 46th International Conference on Software Engineering (ICSE).
- Gauthier [2024] Paul Gauthier. 2024. Aider is AI pair programming in your terminal. https://aider.chat/.
- Gazzola et al. [2019] Luca Gazzola, Daniela Micucci, and Leonardo Mariani. 2019. Automatic Software Repair: A Survey. IEEE Transactions on Software Engineering 45, 1 (2019), 34–67.
- Ghanbari et al. [2019] Ali Ghanbari, Samuel Benton, and Lingming Zhang. 2019. Practical Program Repair via Bytecode Mutation. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis (Beijing, China) (ISSTA 2019). ACM, 19–30.
- Hidvégi et al. [2024] Dávid Hidvégi, Khashayar Etemadi, Sofia Bobadilla, and Martin Monperrus. 2024. Cigar: Cost-efficient program repair with llms. arXiv preprint arXiv:2402.06598 (2024).
- Huang et al. [2024] Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, Xinying Song, and Denny Zhou. 2024. Large Language Models Cannot Self-Correct Reasoning Yet. In The Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=IkmD3fKBPQ
- Jiang et al. [2023] Nan Jiang, Kevin Liu, Thibaud Lutellier, and Lin Tan. 2023. Impact of Code Language Models on Automated Program Repair. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). 1430–1442. https://doi.org/10.1109/ICSE48619.2023.00125
- Jiang et al. [2021] Nan Jiang, Thibaud Lutellier, and Lin Tan. 2021. CURE: Code-Aware Neural Machine Translation for Automatic Program Repair. 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE) (May 2021).
- Jimenez et al. [2024a] Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik R Narasimhan. 2024a. SWE-bench: Can Language Models Resolve Real-world GitHub Issues? In The Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=VTF8yNQM66
- Jimenez et al. [2024b] Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik R Narasimhan. 2024b. SWE-bench Leaderboard. https://www.swebench.com/.
- Jin and Orso [2012] Wei Jin and Alessandro Orso. 2012. Bugredux: Reproducing field failures for in-house debugging. In 2012 34th international conference on software engineering (ICSE). IEEE, 474–484.
- Jones and Harrold [2005] James A Jones and Mary Jean Harrold. 2005. Empirical evaluation of the tarantula automatic fault-localization technique. In Proceedings of the 20th IEEE/ACM international Conference on Automated software engineering. 273–282.
- Kang et al. [2024] Sungmin Kang, Gabin An, and Shin Yoo. 2024. A quantitative and qualitative evaluation of LLM-based explainable fault localization. Proceedings of the ACM on Software Engineering 1, FSE (2024), 1424–1446.
- Kolak et al. [2022] Sophia D Kolak, Ruben Martins, Claire Le Goues, and Vincent Josua Hellendoorn. 2022. Patch Generation with Language Models: Feasibility and Scaling Behavior. In Deep Learning for Code Workshop.
- Le et al. [2016] Xuan Bach D. Le, David Lo, and Claire Le Goues. 2016. History Driven Program Repair. In 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), Vol. 1. 213–224.
- Le Goues et al. [2012] Claire Le Goues, ThanhVu Nguyen, Stephanie Forrest, and Westley Weimer. 2012. GenProg: A Generic Method for Automatic Software Repair. IEEE Transactions on Software Engineering 38, 1 (2012), 54–72.
- Lemieux et al. [2023] Caroline Lemieux, Jeevana Priya Inala, Shuvendu K Lahiri, and Siddhartha Sen. 2023. CODAMOSA: Escaping Coverage Plateaus in Test Generation with Pre-trained Large Language Models. In 45th International Conference on Software Engineering (ICSE).
- Li et al. [2023] Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, et al. 2023. Starcoder: may the source be with you!
- Li et al. [2019] Xia Li, Wei Li, Yuqun Zhang, and Lingming Zhang. 2019. Deepfl: Integrating multiple fault diagnosis dimensions for deep fault localization. In Proceedings of the 28th ACM SIGSOFT international symposium on software testing and analysis. 169–180.
- Li et al. [2020] Yi Li, Shaohua Wang, and Tien N. Nguyen. 2020. DLFix: Context-Based Code Transformation Learning for Automated Program Repair. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (Seoul, South Korea) (ICSE '20). Association for Computing Machinery, New York, NY, USA, 602–614.
- Liu et al. [2024c] Junwei Liu, Kaixin Wang, Yixuan Chen, Xin Peng, Zhenpeng Chen, Lingming Zhang, and Yiling Lou. 2024c. Large Language Model-Based Agents for Software Engineering: A Survey. arXiv preprint arXiv:2409.02977 (2024).
- Liu et al. [2019] Kui Liu, Anil Koyuncu, Dongsun Kim, and Tegawendé F. Bissyandé. 2019. TBar: Revisiting Template-Based Automated Program Repair. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2019). ACM, New York, NY, USA, 31–42.
- Liu et al. [2024b] Yizhou Liu, Pengfei Gao, Xinchen Wang, Chao Peng, and Zhao Zhang. 2024b. MarsCode Agent: AI-native Automated Bug Fixing. arXiv preprint arXiv:2409.00899 (2024).
- Liu et al. [2024a] Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Xing Che, Dandan Wang, and Qing Wang. 2024a. Make llm a testing expert: Bringing human-like interaction to mobile gui testing via functionality-aware decisions. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering. 1–13.
- LlamaIndex [2024] LlamaIndex. 2024. LlamaIndex, Data Framework for LLM Applications. https://www.llamaindex.ai/.
- Long and Rinard [2015] Fan Long and Martin Rinard. 2015. Staged Program Repair with Condition Synthesis. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (Bergamo, Italy) (ESEC/FSE 2015). New York, NY, USA, 166–178.
- Long and Rinard [2016] Fan Long and Martin Rinard. 2016. An analysis of the search spaces for generate and validate patch generation systems. In Proceedings of the 38th International Conference on Software Engineering. 702–713.
- Lou et al. [2020] Yiling Lou, Ali Ghanbari, Xia Li, Lingming Zhang, Haotian Zhang, Dan Hao, and Lu Zhang. 2020. Can automated program repair refine fault localization? a unified debugging approach. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis. 75–87.
- Ma et al. [2024] Yingwei Ma, Qingping Yang, Rongyu Cao, Binhua Li, Fei Huang, and Yongbin Li. 2024. How to Understand Whole Software Repository? arXiv preprint arXiv:2406.01422 (2024).
- Mechtaev et al. [2016] Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. 2016. Angelix: Scalable Multiline Program Patch Synthesis via Symbolic Analysis. In Proceedings of the 38th International Conference on Software Engineering (Austin, Texas) (ICSE '16). 691–701.
- Meng et al. [2024] Ruijie Meng, Martin Mirchev, Marcel Böhme, and Abhik Roychoudhury. 2024. Large language model guided protocol fuzzing. In Proceedings of the 31st Annual Network and Distributed System Security Symposium (NDSS).
- Meng et al. [2022] Xiangxin Meng, Xu Wang, Hongyu Zhang, Hailong Sun, and Xudong Liu. 2022. Improving fault localization and program repair with deep semantic features and transferred knowledge. In Proceedings of the 44th International Conference on Software Engineering. 1169–1180.
- Moon et al. [2014] Seokhyeon Moon, Yunho Kim, Moonzoo Kim, and Shin Yoo. 2014. Ask the mutants: Mutating faulty programs for fault localization. In 2014 IEEE Seventh International Conference on Software Testing, Verification and Validation. IEEE, 153–162.
- Olausson et al. [2023] Theo X Olausson, Jeevana Priya Inala, Chenglong Wang, Jianfeng Gao, and Armando Solar-Lezama. 2023. Is Self-Repair a Silver Bullet for Code Generation? In The Twelfth International Conference on Learning Representations.
- Oliinyk et al. [2024] Yaroslav Oliinyk, Michael Scott, Ryan Tsang, Chongzhou Fang, Houman Homayoun, et al. 2024. Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug Unearthing. arXiv preprint arXiv:2403.03897 (2024).
- OpenAI [2023] OpenAI. 2023. GPT-4 Technical Report. ArXiv abs/2303.08774 (2023).
- OpenAI [2024a] OpenAI. 2024a. Hello GPT-4o. https://openai.com/index/hello-gpt-4o/.
- OpenAI [2024b] OpenAI. 2024b. New embedding models and API updates. https://openai.com/index/new-embedding-models-and-api-updates/.
- OpenAI [2024c] OpenAI. 2024c. OpenAI o1 System Card. https://openai.com/index/openai-o1-system-card/.
- Ou et al. [2024] Xianfei Ou, Cong Li, Yanyan Jiang, and Chang Xu. 2024. The Mutators Reloaded: Fuzzing Compilers with Large Language Model Generated Mutation Operators. In ASPLOS.
- Papadakis et al. [2019] Mike Papadakis, Marinos Kintis, Jie Zhang, Yue Jia, Yves Le Traon, and Mark Harman. 2019. Mutation testing advances: an analysis and survey. In Advances in computers. Vol. 112. Elsevier, 275–378.
- Papadakis and Le Traon [2015] Mike Papadakis and Yves Le Traon. 2015. Metallaxis-FL: mutation-based fault localization. Software Testing, Verification and Reliability 25, 5-7 (2015), 605–628.
- Pizzorno and Berger [2024] Juan Altmayer Pizzorno and Emery D Berger. 2024. CoverUp: Coverage-Guided LLM-Based Test Generation. arXiv preprint arXiv:2403.16218 (2024).
- Qin et al. [2024] Yihao Qin, Shangwen Wang, Yiling Lou, Jinhao Dong, Kaixin Wang, Xiaoling Li, and Xiaoguang Mao. 2024. AgentFL: Scaling LLM-based Fault Localization to Project-Level Context. arXiv preprint arXiv:2403.16362 (2024).
- Robertson et al. [2009] Stephen Robertson, Hugo Zaragoza, et al. 2009. The probabilistic relevance framework: BM25 and beyond. Foundations and Trends® in Information Retrieval 3, 4 (2009), 333–389.
- Ruan et al. [2024] Haifeng Ruan, Yuntong Zhang, and Abhik Roychoudhury. 2024. SpecRover: Code Intent Extraction via LLMs. arXiv preprint arXiv:2408.02232 (2024).
- Saha et al. [2013] Ripon K Saha, Matthew Lease, Sarfraz Khurshid, and Dewayne E Perry. 2013. Improving bug localization using structured information retrieval. In 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 345–355.
- Schäfer et al. [2023] Max Schäfer, Sarah Nadi, Aryaz Eghbali, and Frank Tip. 2023. Adaptive Test Generation Using a Large Language Model. arXiv:2302.06527 [cs.SE]
- Shi et al. [2023] Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed H Chi, Nathanael Schärli, and Denny Zhou. 2023. Large language models can be easily distracted by irrelevant context. In International Conference on Machine Learning. PMLR, 31210–31227.
- Singhal et al. [2001] Amit Singhal et al. 2001. Modern information retrieval: A brief overview. IEEE Data Eng. Bull. 24, 4 (2001), 35–43.
- Sohn and Yoo [2017] Jeongju Sohn and Shin Yoo. 2017. Fluccs: Using code and change metrics to improve fault localization. In Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis. 273–283.
- Wang et al. [2015] Qianqian Wang, Chris Parnin, and Alessandro Orso. 2015. Evaluating the usefulness of ir-based fault localization techniques. In Proceedings of the 2015 international symposium on software testing and analysis. 1–11.
- Wang and Lo [2014] Shaowei Wang and David Lo. 2014. Version history, similar report, and structure: Putting them together for improved bug localization. In Proceedings of the 22nd international conference on program comprehension. 53–63.
- Wei et al. [2023] Yuxiang Wei, Zhe Wang, Jiawei Liu, Yifeng Ding, and Lingming Zhang. 2023. Magicoder: Source Code Is All You Need. arXiv preprint arXiv:2312.02120 (2023).
- Wong et al. [2016] W Eric Wong, Ruizhi Gao, Yihao Li, Rui Abreu, and Franz Wotawa. 2016. A survey on software fault localization. IEEE Transactions on Software Engineering 42, 8 (2016), 707–740.
- Wu et al. [2023] Yonghao Wu, Zheng Li, Jie M Zhang, Mike Papadakis, Mark Harman, and Yong Liu. 2023. Large language models in fault localisation. arXiv preprint arXiv:2308.15276 (2023).
- Xi et al. [2023] Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, et al. 2023. The rise and potential of large language model based agents: A survey. arXiv preprint arXiv:2309.07864 (2023).
- Xia et al. [2024] Chunqiu Steven Xia, Matteo Paltenghi, Jia Le Tian, Michael Pradel, and Lingming Zhang. 2024. Universal Fuzzing via Large Language Models. In 46th International Conference on Software Engineering (ICSE).
- Xia et al. [2023a] Chunqiu Steven Xia, Yuxiang Wei, and Lingming Zhang. 2023a. Automated Program Repair in the Era of Large Pre-trained Language Models. In Proceedings of the ACM/IEEE 45th International Conference on Software Engineering (ICSE '23).
- Xia et al. [2023b] Chunqiu Steven Xia, Yuxiang Wei, and Lingming Zhang. 2023b. Automated program repair in the era of large pre-trained language models. In Proceedings of the 45th International Conference on Software Engineering (ICSE 2023). Association for Computing Machinery.
- Xia and Zhang [2022] Chunqiu Steven Xia and Lingming Zhang. 2022. Less training, more repairing please: revisiting automated program repair via zero-shot learning. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 959–971.
- Xia and Zhang [2023] Chunqiu Steven Xia and Lingming Zhang. 2023. Keep the Conversation Going: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT. arXiv preprint arXiv:2304.00385 (2023).
- Yang et al. [2024b] Aidan ZH Yang, Claire Le Goues, Ruben Martins, and Vincent Hellendoorn. 2024b. Large language models for test-free fault localization. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering. 1–12.
- Yang et al. [2023a] Chenyuan Yang, Yinlin Deng, Runyu Lu, Jiayi Yao, Jiawei Liu, Reyhaneh Jabbarvand, and Lingming Zhang. 2023a. White-box Compiler Fuzzing Empowered by Large Language Models. arXiv:2310.15991 [cs.SE]
- Yang et al. [2023b] Chenyuan Yang, Zijie Zhao, and Lingming Zhang. 2023b. Kernelgpt: Enhanced kernel fuzzing via large language models. arXiv preprint arXiv:2401.00563 (2023).
- Yang et al. [2024a] John Yang, Carlos E Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. 2024a. Swe-agent: Agent-computer interfaces enable automated software engineering. arXiv preprint arXiv:2405.15793 (2024).
- Yuan et al. [2024] Zhiqiang Yuan, Mingwei Liu, Shiji Ding, Kaixin Wang, Yixuan Chen, Xin Peng, and Yiling Lou. 2024. Evaluating and improving chatgpt for unit test generation. Proceedings of the ACM on Software Engineering 1, FSE (2024), 1703–1726.
- Zeller et al. [2019] Andreas Zeller, Rahul Gopinath, Marcel Böhme, Gordon Fraser, and Christian Holler. 2019. The fuzzing book.
- Zhang et al. [2024d] Kexun Zhang, Weiran Yao, Zuxin Liu, Yihao Feng, Zhiwei Liu, Rithesh Murthy, Tian Lan, Lei Li, Renze Lou, Jiacheng Xu, et al. 2024d. Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents. arXiv preprint arXiv:2408.07060 (2024).
- Zhang et al. [2024b] Lyuye Zhang, Kaixuan Li, Kairan Sun, Daoyuan Wu, Ye Liu, Haoye Tian, and Yang Liu. 2024b. Acfix: Guiding llms with mined common rbac practices for context-aware repair of access control vulnerabilities in smart contracts. arXiv preprint arXiv:2403.06838 (2024).
- Zhang et al. [2024a] Quanjun Zhang, Chunrong Fang, Yang Xie, YuXiang Ma, Weisong Sun, Yun Yang, and Zhenyu Chen. 2024a. A Systematic Literature Review on Large Language Models for Automated Program Repair. arXiv preprint arXiv:2405.01466 (2024).
- Zhang et al. [2023] Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, et al. 2023. Siren's song in the AI ocean: a survey on hallucination in large language models. arXiv preprint arXiv:2309.01219 (2023).
- Zhang et al. [2024c] Yuntong Zhang, Haifeng Ruan, Zhiyu Fan, and Abhik Roychoudhury. 2024c. AutoCodeRover: Autonomous Program Improvement. arXiv:2404.05427 [cs.SE]
- Zhou et al. [2012] Jian Zhou, Hongyu Zhang, and David Lo. 2012. Where should the bugs be fixed? more accurate information retrieval-based bug localization based on bug reports. In 2012 34th International conference on software engineering (ICSE). IEEE, 14–24.
- Zhu et al. [2021] Qihao Zhu, Zeyu Sun, Yuan-an Xiao, Wenjie Zhang, Kang Yuan, Yingfei Xiong, and Lu Zhang. 2021. A Syntax-Guided Edit Decoder for Neural Program Repair. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, New York, NY, USA, 341–353.