# Technical Document Extraction: CI Pipeline Root Cause Analysis & Solution Generation System
## Overview
The diagram illustrates a two-stage CI/CD pipeline analysis system combining offline preprocessing with online LLM-driven root cause analysis and solution generation. The system is divided into **Offline** (pink) and **Online** (blue) components, with bidirectional data flow between stages.
---
## Offline Processing (Preprocessing)
### Prep. 1: Success Log Template Deduplication
1. **Input**: Original Failed CI Log
2. **Processing Steps**:
- **Keyword Filtering**: Extracts keyword-matching lines and the log tail from the failed CI log
- **Log Diff Analysis**: Compares the failed log with the latest success logs
- **Token Overflow Pruning**: Applies context window expansion and density-based ranking to keep the selected blocks within the LLM's token limit
3. **Output**: Critical Log Blocks + RCA Prompt Template
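
The three preprocessing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the keyword list, the digit-masking `normalize` helper, the whitespace-based token count, and every function name are hypothetical, since the diagram does not specify them. The sketch also assumes that lines whose normalized form appears in a success log are treated as noise and dropped.

```python
import re
from typing import List, Set, Tuple

# Hypothetical keyword list; the source does not name the real one.
KEYWORDS = ("error", "failed", "exception", "fatal")


def normalize(line: str) -> str:
    """Mask digits so timestamps and counters do not defeat the diff."""
    return re.sub(r"\d+", "<N>", line.strip())


def select_lines(failed: List[str], success: List[str], tail: int = 2) -> Set[int]:
    """Keyword filtering plus log tail, minus lines also seen in a success log."""
    seen_ok = {normalize(line) for line in success}
    tail_start = max(0, len(failed) - tail)
    hits = set()
    for i, line in enumerate(failed):
        candidate = i >= tail_start or any(k in line.lower() for k in KEYWORDS)
        if candidate and normalize(line) not in seen_ok:
            hits.add(i)
    return hits


def expand_to_blocks(hits: Set[int], n_lines: int, radius: int = 1) -> List[Tuple[int, int]]:
    """Grow each hit by `radius` context lines and merge touching ranges."""
    blocks: List[Tuple[int, int]] = []
    for i in sorted(hits):
        start, end = max(0, i - radius), min(n_lines - 1, i + radius)
        if blocks and start <= blocks[-1][1] + 1:
            blocks[-1] = (blocks[-1][0], max(blocks[-1][1], end))
        else:
            blocks.append((start, end))
    return blocks


def prune(failed: List[str], hits: Set[int],
          blocks: List[Tuple[int, int]], budget: int) -> List[str]:
    """Keep the densest blocks (hit lines per line) that fit a token budget."""
    def density(block: Tuple[int, int]) -> float:
        start, end = block
        return sum(1 for i in range(start, end + 1) if i in hits) / (end - start + 1)

    kept, used = [], 0
    for start, end in sorted(blocks, key=density, reverse=True):
        # Tokens are approximated by whitespace-separated words (an assumption).
        cost = sum(len(failed[i].split()) for i in range(start, end + 1))
        if used + cost <= budget:
            kept.append((start, end))
            used += cost
    # Return surviving blocks in their original log order.
    return ["\n".join(failed[s:e + 1]) for s, e in sorted(kept)]
```

Calling `select_lines`, then `expand_to_blocks`, then `prune` yields the text blocks that would be placed into the RCA prompt template. A real deployment would likely replace the word count with the target model's tokenizer.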
### Prep. 2: Offline Knowledge Base Construction
1. **Input**: Enterprise Document
2. **Processing Steps**:
- **Chunking**: Splits documents into embeddable units
- **Embedding Model**: Converts chunks to vector representations
- **Vector Database**: Stores embeddings for retrieval
3. **Output**: Vector Database with enhanced knowledge blocks
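
A minimal sketch of the chunk, embed, and store sequence is shown below. It is illustrative only: the hashed bag-of-words `embed` function stands in for a real embedding model, the in-memory `VectorStore` stands in for a real vector database, and the chunk sizes and the dimension `DIM` are assumed values, none of which come from the source.

```python
import math
import re
import zlib
from typing import List, Tuple

DIM = 256  # assumed vector width for the toy embedding


def chunk(text: str, max_words: int = 40, overlap: int = 10) -> List[str]:
    """Split text into overlapping word windows."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + max_words]) for i in range(0, last_start, step)]


def embed(text: str) -> List[float]:
    """Toy stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is used because it is stable across processes, unlike hash().
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class VectorStore:
    """In-memory stand-in for a vector database with cosine-similarity search."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, List[float]]] = []

    def add(self, text: str) -> None:
        self._rows.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> List[str]:
        q = embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), t) for t, v in self._rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [t for _, t in scored[:k]]
```

In use, each enterprise document would be passed through `chunk`, and every resulting chunk added to the store; `search` then returns the chunks closest to a query string.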
---
## Online Processing (Execution)
### Stage 1: Root Cause Analysis
1. **Input**: Critical Log Blocks + RCA Prompt Template
2. **Processing Steps**:
- **LLM Initial Analysis**: Generates preliminary root cause hypotheses from the critical log blocks
- **Knowledge Retrieval**: Cross-references the hypotheses with the vector database
3. **Output**: Root Cause Analysis Report
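
One way to read Stage 1 is as a two-pass loop: the LLM proposes a hypothesis, the hypothesis is used as a retrieval query, and the retrieved notes are fed back for a final answer. The sketch below assumes that reading. The template wording, the `RcaReport` fields, and the `llm` and `retrieve` callables are all hypothetical placeholders; no specific LLM or vector database API is implied.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical prompt wording; the real RCA Prompt Template is not given.
RCA_TEMPLATE = (
    "You are analysing a failed CI run.\n"
    "Critical log blocks:\n{blocks}\n"
    "Relevant internal notes:\n{notes}\n"
    "State the most likely root cause in one sentence."
)


@dataclass
class RcaReport:
    hypothesis: str       # first-pass guess from the logs alone
    evidence: List[str]   # knowledge-base chunks retrieved for that guess
    root_cause: str       # second-pass answer informed by the evidence


def analyse(blocks: List[str],
            llm: Callable[[str], str],
            retrieve: Callable[[str, int], List[str]],
            k: int = 2) -> RcaReport:
    """Run the hypothesis, retrieval, and refinement passes in order."""
    joined = "\n---\n".join(blocks)
    hypothesis = llm(RCA_TEMPLATE.format(blocks=joined, notes="(none yet)"))
    evidence = retrieve(hypothesis, k)
    notes = "\n".join(f"- {item}" for item in evidence) or "(none found)"
    root_cause = llm(RCA_TEMPLATE.format(blocks=joined, notes=notes))
    return RcaReport(hypothesis, evidence, root_cause)
```

Passing the model and the retriever in as plain callables keeps the stage testable with stubs and independent of any particular provider.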
### Stage 2: Solution Generation & Execution
1. **Input**: Root Cause Analysis Report
2. **Processing Steps**:
- **Solution Generation**: Creates executable solutions using LLM
- **Customized Tools**: Maps solutions to available CI tools
- **Tool Execution**: Executes the mapped tools and reruns the pipeline
3. **Output**: CI Pipeline Fixed status
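
Mapping a generated solution onto a fixed set of tools can be sketched with a small registry. The example assumes the LLM is prompted to answer in JSON of the form `{"tool": ..., "args": {...}}`; that format, the tool names `clear_cache` and `rerun_pipeline`, and their stub bodies are illustrative assumptions and do not correspond to any real CI system's API.

```python
import json
from typing import Callable, Dict

TOOLS: Dict[str, Callable[..., str]] = {}


def tool(name: str):
    """Decorator that registers a function as an allowed CI tool."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register


@tool("clear_cache")
def clear_cache(scope: str = "all") -> str:
    # Stub: a real tool would call the CI system here.
    return f"cache cleared ({scope})"


@tool("rerun_pipeline")
def rerun_pipeline(pipeline_id: str) -> str:
    # Stub: a real tool would trigger a rerun through the CI system.
    return f"rerun requested for {pipeline_id}"


def execute(llm_output: str) -> str:
    """Parse the LLM's JSON plan and dispatch it to a registered tool only."""
    try:
        plan = json.loads(llm_output)
    except json.JSONDecodeError as exc:
        raise ValueError("solution is not valid JSON") from exc
    if not isinstance(plan, dict):
        raise ValueError("solution must be a JSON object")
    name = plan.get("tool")
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    return TOOLS[name](**plan.get("args", {}))
```

Restricting execution to registered names means a hallucinated or unsafe command is rejected rather than run, which is the main reason to route LLM output through a registry instead of a shell.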
---
## Key Components & Flow
1. **Offline-Online Integration**:
- Offline components prepare templates and knowledge bases
- Online components use LLMs for dynamic analysis and solution generation
2. **Data Flow**:
- Failed CI logs → Offline preprocessing → Online analysis → Solution execution
- Knowledge base updates feed into both stages
---
## Technical Architecture
- **Offline System**:
- Success Log Template Deduplication Engine
- Knowledge Base Construction Pipeline
- **Online System**:
- LLM-Driven Root Cause Analysis Module
- Automated Solution Generation Framework
- CI Pipeline Execution Controller
---
## Critical Data Points
1. **Log Processing**:
- Keyword extraction from failed CI logs
- Comparison (diff) against the latest success logs
2. **Knowledge Base**:
- Embedding model converts enterprise documents to vector representations
- Vector database enables semantic search for solution retrieval
3. **LLM Workflow**:
- Initial root cause hypothesis generation
- Solution proposal and tool mapping
- Pipeline rerun execution
---
## Process Flow Diagram
```
[Original Failed CI Log]
→ [Keyword Filtering]
→ [Log Diff Analysis]
→ [Token Overflow Pruning]
→ [Critical Log Blocks]
→ [RCA Prompt Template]
→ [LLM Root Cause Analysis]
→ [Knowledge Retrieval]
→ [Solution Generation]
→ [Customized Tools]
→ [Tool Execution]
→ [CI Pipeline Fixed]
```
---
## Implementation Notes
1. **Offline Preprocessing**:
- Success log template deduplication reduces noise
- Knowledge base construction enables context-aware analysis
2. **Online Analysis**:
- LLM combines log analysis with knowledge base retrieval
- Solution generation uses tool-specific prompt templates
3. **Execution**:
- Customized tools interface with CI/CD pipelines
- Pipeline rerun validates solution effectiveness
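
The idea that a rerun validates a fix can be sketched as a bounded loop over candidate solutions. Everything here is an assumption made for illustration: the source does not say whether several candidates are tried, and `apply_fix`, `rerun`, and `max_attempts` are hypothetical names.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def fix_until_green(candidates: Iterable[str],
                    apply_fix: Callable[[str], None],
                    rerun: Callable[[], bool],
                    max_attempts: int = 3) -> Tuple[Optional[str], List[str]]:
    """Apply fixes in order; return the first one after which the rerun passes.

    Also returns the list of fixes that were tried, for reporting.
    """
    tried: List[str] = []
    for fix in candidates:
        if len(tried) >= max_attempts:
            break
        apply_fix(fix)
        tried.append(fix)
        if rerun():  # True means the pipeline passed
            return fix, tried
    return None, tried
```

Capping the number of attempts keeps a wrong diagnosis from triggering an unbounded series of pipeline runs.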

Together, offline preprocessing and online LLM-driven processing automate root cause analysis and solution implementation for CI/CD failures.