## Diagram: LLM Fine-Tuning vs. DFS Inference for Theorem Proving
### Overview
The image is a technical diagram comparing two processes: **LLM Fine-Tuning** (left panel) and **DFS Inference** (right panel). It illustrates how a Large Language Model (LLM) is trained to generate proofs of mathematical theorems, and how it is then used at inference time with a Depth-First Search (DFS) strategy, integrated with the Lean proof assistant for formal verification. The diagram uses flowcharts, state trees, and annotated steps to explain the workflows.
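To ground the diagram's state/tactic vocabulary, here is a minimal Lean 4 example (not taken from the figure; the theorem name is illustrative): each tactic maps the current proof state to a new one, and an empty goal list corresponds to the diagram's "complete" state.

```lean
-- Hypothetical illustration: one tactic step transforms the proof state.
-- Before: ⊢ a + b = b + a   After: no goals ("complete")
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  rw [Nat.add_comm]
```

A failed tactic (e.g., rewriting with a lemma whose pattern does not match the goal) makes Lean report an error, which is exactly the signal that triggers backtracking in the right panel.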
### Components/Axes
The diagram is divided into two main vertical panels separated by a double line.
**Left Panel: LLM Fine-Tuning**
* **Title:** "LLM Fine-Tuning" (top-left, bold).
* **Process Flow:**
1. Starts with "theorem (i.e., state 0)".
2. An arrow points down to a yellow box labeled "LLM".
3. An arrow points down from the LLM to "predicted proof (e.g., state 0; tactic 1; ...; state 7)".
4. An upward orange arrow labeled "compute loss" points from a yellow box containing a training sample to the "predicted proof".
5. The yellow training sample box contains: "state 0; tactic 6; state 6; tactic 7; state 7."
6. Below this box is a small tree diagram labeled "proof tree of a training sample". The tree has nodes 0, 1, 2, 6, 7. Node 7 has a green checkmark (✓). A blue curved arrow points from node 0 to node 6.
**Right Panel: DFS Inference**
* **Title:** "DFS Inference" (top-center, bold).
* **Legend (Top of Right Panel):**
* `ⓚ` "state k by calling Lean"
* `?` "waiting to call Lean"
* `→` "tactic generated by LLM" (black arrow)
* `→` "backtrack to the last valid parent state" (blue arrow)
* **Main Content:** The inference process is shown in three sequential columns, separated by dashed vertical lines, representing steps in a search.
* **Column 1 (Step 3):**
* **Top:** A state tree showing nodes 0, 1, 2, 3, and several `?` nodes. Node 3 has a red cross (✗). A blue arrow backtracks from node 3 to node 2.
* **Bottom (Gray Box):** A flowchart showing the sequence: "state 0; tactic 1; state 1; tactic 2; state 2" -> "LLM" -> "tactic 3, tactic 4" -> "Lean" -> "state 3: 'error'" (in red). Labeled "step 3" at the bottom.
* **Column 2 (Step 4):**
* **Top:** The state tree expands. Node 4 is added with a red cross (✗). A blue arrow backtracks from node 4 to node 2.
* **Bottom (Gray Box):** Similar flowchart: "state 0; tactic 1; state 1; tactic 2; state 2" -> "LLM" -> "tactic 3, tactic 4" -> "Lean" -> "state 4: 'error'" (in red). An annotation points to the "LLM" box: "no LLM execution at this step". Labeled "step 4" at the bottom.
* **Column 3 (Last Step):**
* **Top:** The state tree shows a successful path. From node 1, a black arrow points to node 6. From node 6, a black arrow points to node 7, which has a green checkmark (✓). Nodes 3, 4, and 5 have red crosses (✗). Blue arrows show backtracking from failed branches.
* **Bottom (Gray Box):** Flowchart: "state 0; tactic 6; state 6" -> "LLM" -> "tactic 7" -> "Lean" -> "state 7: 'complete'" (in green). Labeled "last step" at the bottom.
### Detailed Analysis
**LLM Fine-Tuning Process:**
1. **Input:** A theorem, represented as an initial proof state ("state 0").
2. **Model:** An LLM takes this state as input.
3. **Output:** The LLM generates a "predicted proof," which is a sequence of states and tactics (e.g., "state 0; tactic 1; ...; state 7").
4. **Training Objective:** The loss is computed by comparing the LLM's predicted proof sequence against a ground-truth proof sequence from a training sample (e.g., "state 0; tactic 6; state 6; tactic 7; state 7."). The associated proof tree shows the correct path from state 0 to state 7 via state 6.
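The training objective above can be sketched as a standard next-token likelihood over the proof sequence. This is a toy sketch under assumptions: `model_prob` is a hypothetical stand-in for the LLM's next-token distribution (hard-coded here), and we score only the tactic tokens, since states are returned by Lean and merely conditioned on.

```python
import math

def model_prob(prefix, token):
    """Stand-in for the LLM's next-token distribution p(token | prefix).
    A real model computes this; here it is a hard-coded toy table."""
    table = {
        ("state 0",): {"tactic 6": 0.7, "tactic 1": 0.3},
        ("state 0", "tactic 6", "state 6"): {"tactic 7": 0.9, "tactic 2": 0.1},
    }
    return table.get(tuple(prefix), {}).get(token, 1e-9)

def sequence_loss(sample):
    """Negative log-likelihood of the tactic tokens in a training sample
    (the cross-entropy loss minimized during fine-tuning)."""
    loss, prefix = 0.0, []
    for token in sample:
        if token.startswith("tactic"):
            loss -= math.log(model_prob(prefix, token))
        prefix.append(token)
    return loss

# The training sample shown in the diagram's left panel.
sample = ["state 0", "tactic 6", "state 6", "tactic 7", "state 7"]
print(round(sequence_loss(sample), 3))  # -log(0.7) - log(0.9) ≈ 0.462
```

The key design point the sketch makes explicit is that fine-tuning teaches the model which tactic to emit given the proof history; correctness of the resulting state is delegated to Lean.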
**DFS Inference Process:**
1. **Search Strategy:** Uses Depth-First Search to explore possible proof paths.
2. **State Representation:** Circled numbers (`ⓚ`) represent proof states verified by calling Lean. Circled question marks (`?`) represent states waiting for verification.
3. **Action Flow:**
* The LLM generates one or more candidate tactics for the current state (e.g., "tactic 3, tactic 4" for state 2).
* These tactics are sent to "Lean" for verification.
* Lean returns either a new valid state (`ⓚ`) or an "error."
4. **Backtracking:** If a tactic leads to an error (marked with a red `✗`), the search backtracks (blue arrow) to the last valid parent state to try a different tactic.
5. **Progression:**
    * **Step 3:** From state 2, the LLM generates tactics 3 and 4 in a single call. Lean verifies tactic 3, which produces an error (state 3), so the search backtracks to state 2.
    * **Step 4:** Lean verifies the remaining tactic 4, which also produces an error (state 4). The annotation "no LLM execution at this step" indicates that tactic 4 was already generated in step 3 and is simply dequeued, so no new model call is needed.
* **Last Step:** The search finds a successful path. From state 1, tactic 6 leads to state 6. From state 6, the LLM generates tactic 7, which Lean verifies, leading to state 7 marked "complete" (green text).
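The inference loop above can be sketched as a recursive DFS. This is a hedged sketch, not the paper's implementation: `llm_tactics` and `lean_check` are hypothetical stand-ins for the LLM generator and the Lean verifier, with hard-coded results matching the diagram's example search. Generating all candidate tactics for a state in one call, then iterating over them, reproduces the "no LLM execution at this step" behavior in step 4.

```python
def llm_tactics(state):
    """Stand-in LLM: propose candidate tactics for a state (one call per state)."""
    proposals = {
        "state 0": ["tactic 1"],
        "state 1": ["tactic 2", "tactic 6"],
        "state 2": ["tactic 3", "tactic 4"],
        "state 6": ["tactic 7"],
    }
    return proposals.get(state, [])

def lean_check(state, tactic):
    """Stand-in Lean: apply a tactic and return (new_state, status)."""
    results = {
        ("state 0", "tactic 1"): ("state 1", "open"),
        ("state 1", "tactic 2"): ("state 2", "open"),
        ("state 2", "tactic 3"): ("state 3", "error"),
        ("state 2", "tactic 4"): ("state 4", "error"),
        ("state 1", "tactic 6"): ("state 6", "open"),
        ("state 6", "tactic 7"): ("state 7", "complete"),
    }
    return results[(state, tactic)]

def dfs_prove(state, path=()):
    """DFS proof search: one LLM call per state yields a tactic queue;
    each tactic is verified by Lean; an error falls through to the next
    queued tactic, i.e., backtracking to the last valid parent state."""
    for tactic in llm_tactics(state):
        new_state, status = lean_check(state, tactic)
        step = path + (tactic, new_state)
        if status == "complete":
            return ("state 0",) + step      # proof found
        if status == "open":
            proof = dfs_prove(new_state, step)
            if proof:
                return proof
        # status == "error": try the next queued tactic (backtrack)
    return None

print(dfs_prove("state 0"))
# ('state 0', 'tactic 1', 'state 1', 'tactic 6', 'state 6', 'tactic 7', 'state 7')
```

Note how tactics 3 and 4 both fail without producing a proof from state 2, after which the recursion unwinds to state 1 and tactic 6 succeeds, mirroring the three columns of the right panel.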
### Key Observations
1. **Integration of LLM and Formal Verifier:** The core workflow involves an LLM proposing tactics and a formal system (Lean) verifying them, creating a neuro-symbolic proof search loop.
2. **Error-Driven Search:** The DFS is guided by verification results. Errors trigger backtracking, making the search systematic.
3. **Training vs. Inference:** Fine-tuning trains the LLM to mimic complete proof sequences. Inference uses the LLM as a generator within a search algorithm that can recover from its mistakes via backtracking.
4. **Visual Coding:** Colors and symbols are used consistently: yellow for LLM, blue for Lean, red for errors/failure, green for success/completion, blue arrows for backtracking.
5. **State Tree Evolution:** The state trees at the top of the DFS panel visually map the search history, showing explored nodes, dead ends (✗), and the final successful path (✓).
### Interpretation
This diagram explains a methodology for **neural theorem proving**. It demonstrates how to bridge the gap between the flexible, pattern-matching capabilities of LLMs and the rigorous, step-by-step verification of formal systems.
* **The Fine-Tuning Stage** teaches the LLM the "shape" of valid proofs by training it on sequences of states and tactics from a dataset. The model learns to predict the next step in a proof.
* **The Inference Stage** addresses the LLM's fallibility. Instead of relying on a single, potentially incorrect prediction, it uses the LLM as a heuristic tactic generator within a classical search algorithm (DFS). The formal verifier (Lean) acts as an oracle, providing ground truth on whether a proposed step is valid. This makes the system robust; a wrong guess doesn't fail the entire proof but simply causes the search to backtrack and try an alternative.
The key insight is that **LLMs can accelerate formal proof search by proposing promising tactics**, while **formal verification ensures correctness and guides the search away from dead ends**. The "no LLM execution at this step" note in Step 4 is particularly telling: it suggests the system generates several candidate tactics per LLM call and dequeues them as needed, reducing the number of model invocations. This hybrid approach leverages the strengths of both neural networks (intuition, pattern recognition) and symbolic AI (rigor, reliability) for a complex reasoning task.