# Technical Document: LLM Agent Workflow Architecture
This document describes a system architecture diagram in text form. The diagram shows a multi-tool LLM (Large Language Model) agent framework that processes user questions through keyword extraction, external search, evidence gathering, and final answer generation.
## 1. System Overview
The system is built around an **LLM AGENT** that acts as the orchestrator, routing data between four specialized tool modules.
## 2. Component Isolation and Analysis
### A. Central Orchestrator
* **Label:** LLM AGENT
* **Function:** Receives the initial "Input Question" and coordinates the flow of information between the Keyword Extractor, Search and Storage, Gather Evidence, and Answer Generator tools.
### B. Keyword Extractor Tool (Yellow Region)
This module processes the initial query to identify core search terms.
* **Internal Flow:**
    1. **Check Question Complexity:** Evaluates the input.
    2. **Decision Node (Yes/No):**
        * If **Yes** (Complex): Proceeds to **Divide into Sub-questions**, then to **Extract Keywords**.
        * If **No** (Simple): Proceeds directly to **Extract Keywords**.
* **Inputs/Outputs:**
    * **Input:** "User Question" (from LLM Agent).
    * **Output:** "Keywords" (returned to LLM Agent).
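
The branching above can be sketched in a few lines of Python. This is only an illustration: the diagram does not say how complexity is judged or how keywords are extracted (both are probably LLM prompts), so `is_complex`, `divide_into_sub_questions`, `extract_keywords`, and the stop-word list below are hypothetical stand-ins based on simple string heuristics.

```python
import re

# Hypothetical stop-word list; not taken from the diagram.
STOP_WORDS = {"a", "an", "the", "is", "are", "do", "does", "how", "what", "why",
              "i", "in", "to", "of", "and", "it", "can", "when", "with", "for", "on"}


def is_complex(question: str) -> bool:
    """Assumed heuristic: several '?' or an ' and ' conjunction means 'complex'."""
    return question.count("?") > 1 or " and " in question.lower()


def divide_into_sub_questions(question: str) -> list[str]:
    """Split on '?' and on the word 'and' (a crude stand-in for an LLM call)."""
    parts = re.split(r"\?|\band\b", question, flags=re.IGNORECASE)
    return [p.strip() for p in parts if p.strip()]


def extract_keywords(text: str) -> list[str]:
    """Keep lowercase non-stop-word tokens, in order, without duplicates."""
    keywords: list[str] = []
    for token in re.findall(r"[a-z0-9_+#]+", text.lower()):
        if token not in STOP_WORDS and token not in keywords:
            keywords.append(token)
    return keywords


def keyword_extractor_tool(user_question: str) -> list[str]:
    """Yes branch: divide, then extract. No branch: extract directly."""
    if is_complex(user_question):
        parts = divide_into_sub_questions(user_question)
    else:
        parts = [user_question]
    keywords: list[str] = []
    for part in parts:
        for kw in extract_keywords(part):
            if kw not in keywords:
                keywords.append(kw)
    return keywords


print(keyword_extractor_tool("How do I parse JSON in Python?"))
```

Both branches end in the same `extract_keywords` call, which mirrors the diagram: decomposition is an optional pre-step, not a separate pipeline.
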
### C. Search and Storage Tool (Blue Region)
This module interacts with external data sources (specifically Stack Overflow, denoted as "SO").
* **Internal Flow:**
    1. **Search for questions in SO:** Initial query execution.
    2. **Select top-50 questions using BM-25:** Ranking and filtering.
    3. **Search for answers in SO:** Retrieving content for the selected questions.
    4. **Filter answers:** Refining the retrieved data.
* **Data Storage/Artifacts:**
    * **Vector DB:** Receives "Filtered question-answers".
    * **JSON file:** Receives "Stored Question IDs".
* **Inputs/Outputs:**
    * **Input:** "Keywords" (from LLM Agent).
    * **Output:** "Unanswered questions" (from the **Filter answers** step, sent to the Answer Generator Tool).
    * **Feedback Loop:** A line connects the JSON file back to the **Select top-50 questions using BM-25** step.
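
To make the ranking step concrete, here is a minimal, dependency-free sketch of Okapi BM25 scoring followed by top-k selection. Several things in it are assumptions rather than facts from the diagram: the scored text is the question title, tokenization is a plain whitespace split, `k1=1.5` and `b=0.75` are commonly used defaults, the IDF uses the non-negative "+1 inside the log" variant, and the `stored_ids` parameter is only a guess at what the JSON feedback loop is for (skipping questions that were already stored).

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per tokenized document."""
    if not docs:
        return []
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    if avgdl == 0:
        return [0.0] * n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def select_top_k(keywords, questions, k=50, stored_ids=frozenset()):
    """Rank (question_id, title) pairs by BM25 and return the top-k IDs.

    stored_ids is a hypothetical reading of the JSON feedback loop:
    IDs already stored are skipped before ranking.
    """
    candidates = [(qid, title) for qid, title in questions if qid not in stored_ids]
    docs = [title.lower().split() for _, title in candidates]
    scores = bm25_scores([kw.lower() for kw in keywords], docs)
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [qid for _, (qid, _) in ranked[:k]]


questions = [
    (1, "parse json in python"),
    (2, "sort a list in java"),
    (3, "python json decode error"),
]
print(select_top_k(["json", "python"], questions, k=2))
```

In a real deployment the candidate list would come from the Stack Overflow search step, and `k` would be 50 as in the diagram.
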
### D. Gather Evidence Tool (Purple Region)
This module retrieves and validates information from the stored data.
* **Internal Flow:**
    1. **Embed Question:** Vectorization of the query.
    2. **Search in VectorDB with MMR:** Retrieval using Maximal Marginal Relevance.
    3. **Answer LLM scorer:** Scoring the relevance of retrieved answers.
    4. **Check Evidence:** Final validation step.
* **Inputs/Outputs:**
    * **Input:** "User Question" (from LLM Agent).
    * **Output 1:** "Evidence status" (returned to LLM Agent).
    * **Output 2:** "Evidence" (sent to Answer Generator Tool).
    * **Data Source:** Connects to the **Vector DB**.
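
The MMR retrieval step can be illustrated with a small pure-Python sketch. It greedily picks the candidate that maximizes `lambda_mult * relevance - (1 - lambda_mult) * redundancy`, where relevance is cosine similarity to the query embedding and redundancy is the highest cosine similarity to anything already selected. The function names, the toy 2-D vectors, and the `lambda_mult=0.5` default are illustrative assumptions; the diagram does not specify the embedding model, the vector store, or the MMR parameters.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return indices of up to k documents chosen by Maximal Marginal Relevance."""
    selected = []
    remaining = list(range(len(doc_vecs)))
    while remaining and len(selected) < k:

        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Highest similarity to any already-selected document.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 1.0]
docs = [[1.0, 0.9], [1.0, 0.85], [0.2, 1.0]]  # docs 0 and 1 are near-duplicates
print(mmr(query, docs, k=2, lambda_mult=0.5))  # balances relevance and diversity
print(mmr(query, docs, k=2, lambda_mult=1.0))  # pure relevance ranking
```

With `lambda_mult=1.0` the redundancy term vanishes and the procedure reduces to plain similarity ranking, which is why the near-duplicate is picked in that setting and skipped in the balanced one.
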
### E. Answer Generator Tool (Red Region)
This module is the final stage of the pipeline: it produces the answer from the question and the gathered evidence.
* **Internal Flow:**
    1. **Generate Answer:** The core processing block.
* **Inputs/Outputs:**
    * **Input 1:** "Question (If enough evidence)" (from LLM Agent).
    * **Input 2:** "Evidence" (from Gather Evidence Tool).
    * **Input 3:** "Unanswered questions" (from the **Filter answers** step in the Search and Storage Tool).
    * **Final Output:** "Answer" (exits the system).
## 3. Data Flow Summary
| From | To | Data Label |
| :--- | :--- | :--- |
| External | LLM AGENT | Input Question |
| LLM AGENT | Keyword Extractor Tool | User Question |
| Keyword Extractor Tool | LLM AGENT | Keywords |
| LLM AGENT | Search and Storage Tool | Keywords |
| Search and Storage Tool | Vector DB | Filtered question-answers |
| Search and Storage Tool | JSON file | Stored Question IDs |
| LLM AGENT | Gather Evidence Tool | User Question |
| Gather Evidence Tool | LLM AGENT | Evidence status |
| Gather Evidence Tool | Answer Generator Tool | Evidence |
| LLM AGENT | Answer Generator Tool | Question (If enough evidence) |
| Search and Storage Tool | Answer Generator Tool | Unanswered questions |
| Answer Generator Tool | External | Answer |
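
The table can be read as a call sequence. The sketch below wires four stub tools together in that order; it is an assumption-laden illustration, not the system's actual control loop. In particular, `Tools`, `run_agent`, and all callable signatures are hypothetical, "Evidence" and "Unanswered questions" are passed through the orchestrator function for simplicity even though the diagram draws them as direct tool-to-tool edges, and returning `None` when evidence is insufficient is a placeholder because the diagram does not show what happens in that case.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Tools:
    """Hypothetical tool interfaces; each field is a plain callable."""
    extract_keywords: Callable[[str], List[str]]
    search_and_store: Callable[[List[str]], List[str]]          # returns unanswered questions
    gather_evidence: Callable[[str], Tuple[bool, List[str]]]    # (evidence status, evidence)
    generate_answer: Callable[[str, List[str], List[str]], str]


def run_agent(input_question: str, tools: Tools) -> Optional[str]:
    """Follow the rows of the data-flow table from top to bottom."""
    keywords = tools.extract_keywords(input_question)            # User Question -> Keywords
    unanswered = tools.search_and_store(keywords)                # Keywords -> stored data
    enough, evidence = tools.gather_evidence(input_question)     # -> Evidence status, Evidence
    if not enough:
        # Assumption: the diagram only labels the "enough evidence" path.
        return None
    return tools.generate_answer(input_question, evidence, unanswered)


# Usage with stub tools standing in for the real modules.
stub_tools = Tools(
    extract_keywords=lambda q: ["json", "python"],
    search_and_store=lambda kws: ["an unanswered SO question"],
    gather_evidence=lambda q: (True, ["an evidence snippet"]),
    generate_answer=lambda q, ev, un: f"Answer to {q!r} using {len(ev)} evidence item(s)",
)
print(run_agent("How do I parse JSON in Python?", stub_tools))
```

Keeping each tool behind a plain callable makes it easy to swap a stub for a real implementation one module at a time.
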
## 4. Technical Observations
* **Search Algorithm:** The Search and Storage tool uses **BM-25** (the Okapi BM25 ranking function) to rank the questions returned from Stack Overflow and keep the top 50.
* **Retrieval Strategy:** The Gather Evidence tool uses **MMR (Maximal Marginal Relevance)** to ensure diversity in the evidence retrieved from the Vector DB.
* **Conditional Logic:** The system includes a complexity check for questions, allowing for decomposition into sub-questions before keyword extraction.
* **Error Handling/Completeness:** The "Unanswered questions" path from the Search tool to the Answer Generator suggests the system handles cases where direct answers are not found in the primary search.