# Technical Diagram: Automated Software Issue Resolution Pipeline
This diagram illustrates a multi-stage workflow for automatically resolving software issues using Large Language Models (LLMs) and embeddings. The process is divided into three main functional phases: Localization, Repair, and Patch Validation.
## Legend and Components
Located at the bottom right [x: 870, y: 550]:
* **Blue Box:** Localization
* **Purple Box:** Repair
* **Green Box:** Patch Validation
* **Robot Icon:** LLM (Large Language Model)
* **Document/Purple Icon:** Embedding
---
## Phase 1: Localization (Blue Region)
This phase focuses on narrowing down the codebase to specific files, classes, and functions relevant to a reported issue.
1. **Input:**
* **Issue:** Title "Lambdify misinterprets some matrix expressions", followed by the truncated body "Using lambdify on an..."
* **Project Codebase:** Represented as a GitHub repository containing various folders and files.
2. **Step 1:** The codebase and issue are processed to extract the **Repo structure**.
* *Example structure shown:* `views/`, `__init__.py`, `csrf.py`, `static.py`, `generic/` (`detail.py`, `__init__.py`, `edit.py`, `list.py`).
3. **Steps 2 & 3 (Localize to Top-N Files):** An LLM and an Embedding tool process the issue and repo structure to identify the files most relevant to the issue.
4. **Step 4 (Localize to Classes & Functions):** Within the top-N files, the LLM narrows the scope to specific classes and functions.
* **class RegexValidator:** `def __call__`, `def __eq__` in `static.py`.
* **class Handler:** `def load_middleware`, `def _matrixify` in `generic/detail.py`.
* **class Truncator:** `def _text_chars`, `def _truncate_html` in `csrf.py`.
5. **Step 5 (Localize to Edit Locations):** The LLM identifies specific code blocks for modification.
* **function: edit_matrix:** `def edit_matrix(): if debug and name:`
* **class: Handler ExtHandler:** `class Handler: def load_middleware ... class ExtHandler:`
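
The first localization steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pipeline's actual implementation: `render_repo_structure`, `toy_embed`, `cosine`, and `top_n_files` are hypothetical names, the bag-of-words vector only stands in for a real embedding model, the LLM-based steps are omitted, and the file contents in the usage lines are invented.

```python
import math
from collections import Counter


def render_repo_structure(paths):
    """Render file paths as an indented tree, one entry per line."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split("/"):
            node = node.setdefault(part, {})

    lines = []

    def walk(node, depth):
        for name in sorted(node):
            children = node[name]
            suffix = "/" if children else ""  # mark directories
            lines.append("    " * depth + name + suffix)
            walk(children, depth + 1)

    walk(tree, 0)
    return "\n".join(lines)


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: word counts."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return Counter(cleaned.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_n_files(issue, file_texts, n):
    """Return the n file paths whose text is most similar to the issue."""
    issue_vec = toy_embed(issue)
    ranked = sorted(
        file_texts,
        key=lambda path: cosine(issue_vec, toy_embed(file_texts[path])),
        reverse=True,
    )
    return ranked[:n]


# Usage with invented file contents (assumption, for illustration only).
files = {
    "views/csrf.py": "def check csrf token",
    "views/static.py": "serve static files",
}
print(render_repo_structure(files))
print(top_n_files("csrf token check fails", files, n=1))
```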
---
## Phase 2: Repair (Purple Region)
This phase involves generating potential code fixes based on the localized information.
6. **Step 6 (Generate Patches):** The LLM takes the localized edit locations and the original issue to produce multiple candidate patches.
* **Input context:** `lines: 120-230` and `class: Handler`.
* **Output:** A stack of "Patch" objects, each showing line additions (+12) and deletions (-20) with a visual diff bar (green/red).
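
The addition and deletion counts shown on each "Patch" object can be derived from a unified diff. The sketch below is one possible way to do this with Python's standard `difflib`; `make_patch` and `patch_stats` are hypothetical helper names, the patch format actually used by the pipeline is not stated in the diagram, and the example handles a single file only.

```python
import difflib


def make_patch(original, patched, filename):
    """Build a unified diff between two versions of one file."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        patched.splitlines(keepends=True),
        fromfile=f"a/{filename}",
        tofile=f"b/{filename}",
    )
    return "".join(diff)


def patch_stats(patch):
    """Count added and deleted lines inside the hunks of a single-file diff."""
    added = deleted = 0
    in_hunk = False
    for line in patch.splitlines():
        if line.startswith("@@"):
            in_hunk = True  # file headers (---/+++) come before the first hunk
        elif in_hunk and line.startswith("+"):
            added += 1
        elif in_hunk and line.startswith("-"):
            deleted += 1
    return added, deleted


# Usage: a candidate fix that adds a guard clause.
before = "def f(x):\n    return x\n"
after = "def f(x):\n    if x is None:\n        return 0\n    return x\n"
patch = make_patch(before, after, "example.py")
print(patch)
print(patch_stats(patch))
```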
---
## Phase 3: Patch Validation (Green Region)
This phase ensures the generated patches actually fix the issue without introducing regressions.
7. **Step 7 (Generate Reprod. Tests):** The LLM generates reproduction test cases (`test.py`) based on the issue description.
* *Code snippet:* `def test(): assert ...`
8. **Step 8:** The generated patches from the Repair phase are applied to the codebase.
9. **Step 9 (Filter & Rank Patches):** The reproduction tests are run against the patched code. The LLM filters out patches that fail the tests and ranks the successful ones.
10. **Step 10 (Submitted Patch):** The highest-ranking valid patch is selected as the final output.
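
Steps 9 and 10 can be sketched as a filter followed by a ranking. The diagram does not state the ranking criterion, so this example assumes a simple majority vote: patches that pass the reproduction tests are grouped by their text, and the most frequently generated one ranks first. `filter_and_rank`, `select_submission`, and the `passes_tests` callback are hypothetical names; in a real setup the callback would apply the patch and run `test.py`.

```python
from collections import Counter
from typing import Callable, List, Optional


def filter_and_rank(
    patches: List[str], passes_tests: Callable[[str], bool]
) -> List[str]:
    """Drop failing patches, then rank survivors by how often they occur."""
    survivors = [p for p in patches if passes_tests(p)]
    # Assumption: identical patches (ignoring surrounding whitespace)
    # count as votes for the same fix.
    votes = Counter(p.strip() for p in survivors)
    # most_common() sorts by count, descending; ties keep first-seen order.
    return [patch for patch, _ in votes.most_common()]


def select_submission(
    patches: List[str], passes_tests: Callable[[str], bool]
) -> Optional[str]:
    """Return the top-ranked valid patch, or None if every patch fails."""
    ranked = filter_and_rank(patches, passes_tests)
    return ranked[0] if ranked else None


# Usage with a stub test runner (assumption, for illustration only).
candidates = ["fix A", "fix B", "fix A ", "broken"]
print(select_submission(candidates, lambda p: "broken" not in p))
```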
---
## Data Flow Summary
* **Primary Flow:** Issue + Codebase $\rightarrow$ File Localization $\rightarrow$ Function Localization $\rightarrow$ Edit Location $\rightarrow$ Patch Generation $\rightarrow$ Patch Filtering $\rightarrow$ Final Submitted Patch.
* **Secondary Flow:** The Issue also triggers the generation of reproduction tests used specifically in the final validation stage.
* **Codebase Link:** A red line connects the "Project Codebase" to the "Patch Validation" and "Submitted Patch" stages, indicating that the codebase supplies the context for testing and is the target to which the final fix is applied.
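
The flows summarized above can be expressed as a small orchestration function. This is only a sketch of the data dependencies: `run_pipeline` and its four stage callbacks are hypothetical, and each stage is injected as a plain function so the wiring can be shown without assuming any particular LLM or embedding API.

```python
from typing import Callable, Dict, List, Optional

Codebase = Dict[str, str]  # assumed representation: file path -> file contents


def run_pipeline(
    issue: str,
    codebase: Codebase,
    localize: Callable[[str, Codebase], List[str]],
    generate_patches: Callable[[str, List[str]], List[str]],
    generate_tests: Callable[[str], List[str]],
    validate: Callable[[Codebase, List[str], List[str]], List[str]],
) -> Optional[str]:
    """Wire the three phases together and return the submitted patch."""
    # Primary flow: issue + codebase -> edit locations -> candidate patches.
    locations = localize(issue, codebase)
    patches = generate_patches(issue, locations)
    # Secondary flow: the issue alone drives reproduction-test generation.
    tests = generate_tests(issue)
    # The codebase is also passed to validation, where patches are applied.
    ranked = validate(codebase, patches, tests)
    return ranked[0] if ranked else None


# Usage with stub stages (assumption, for illustration only).
result = run_pipeline(
    issue="example issue",
    codebase={"views/csrf.py": "..."},
    localize=lambda issue, cb: sorted(cb),
    generate_patches=lambda issue, locs: [f"patch for {loc}" for loc in locs],
    generate_tests=lambda issue: ["def test(): assert ..."],
    validate=lambda cb, patches, tests: patches,
)
print(result)
```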