## Diagram: LLM Fine-Tuning and TrialMaster Inference Process
### Overview
The image is a technical diagram illustrating two related processes for using Large Language Models (LLMs) in formal theorem proving. The top section depicts a standard "LLM Fine-Tuning" pipeline. The bottom, more detailed section illustrates a multi-step inference process called "TrialMaster Inference," which incorporates backtracking and interaction with a proof assistant (Lean).
### Components/Axes
The diagram is divided into two primary horizontal sections, separated by a double line.
**Top Section: LLM Fine-Tuning**
* **Left Side:** A flow starts with the text `theorem (i.e., state 0)` pointing to a yellow box labeled `LLM`.
* **Middle:** The `LLM` box points to `predicted proof (e.g., state 0; tactic 1; ...; state 4)`.
* **Right Side:** A dashed box contains the `ground truth`. Inside:
* A text sequence: `state 0; tactic 1; state 1; tactic 2; state 2; backtrack to state 0; tactic 3; state 3; tactic 4; state 4.`
* A small state transition graph showing nodes 0, 1, 2, 3, 4. Arrows show transitions: 0→1, 1→2, 2→0 (blue backtrack arrow), 0→3, 3→4. Node 4 has a green checkmark.
* **Connection:** An orange arrow labeled `compute loss` points from the `ground truth` box back to the `predicted proof` text.
**Bottom Section: TrialMaster Inference**
* **Legend (Top-Left):** A gray box defines the symbols: a circled number `(k)` denotes state k, a black arrow denotes a tactic application, and a blue arrow denotes a backtrack.
* **Main Process (Five Sequential Columns):** Each column represents a step in the inference process, showing a state diagram at the top and a detailed flow below.
* **Column 1:**
* *Top Diagram:* Single node `0`.
* *Flow:* `state 0` → `LLM` (yellow box) → `tactic 1` → `Lean` (blue box) → `state 1`. A curved arrow loops from `state 1` back to the `Lean` box input.
* **Column 2:**
* *Top Diagram:* Nodes `0` and `1` with a black arrow from 0 to 1.
* *Flow:* `state 0; tactic 1; state 1` → `LLM` → `tactic 2` → `Lean` → `state 2`. A curved arrow loops from `state 2` back to the `Lean` box input.
* **Column 3:**
* *Top Diagram:* Nodes `0`, `1`, `2` with black arrows 0→1, 1→2.
* *Flow:* `state 0; tactic 1; tactic 2; state 2` → `LLM` → `"backtrack to state 0"` (blue text).
* **Column 4:**
* *Top Diagram:* Nodes `0`, `1`, `2` with black arrows 0→1, 1→2, and a blue arrow from 2 back to 0.
* *Flow:* `state 0; ...; state 2; backtrack to state 0` → `LLM` → `tactic 3` → `Lean` → `state 3`. A curved arrow loops from `state 3` back to the `Lean` box input.
* **Column 5:**
* *Top Diagram:* Nodes `0`, `1`, `2`, `3`, `4`. Black arrows: 0→1, 1→2, 0→3, 3→4. Blue arrow: 2→0. Node 4 has a green checkmark.
* *Flow:* `state 0; ...; state 3` → `LLM` → `tactic 4` → `Lean` → `state 4: "complete"` (green text).
### Detailed Analysis
The diagram contrasts two methodologies:
1. **LLM Fine-Tuning (Top):** This is a standard supervised learning setup. The model is trained to minimize the loss between its entire predicted proof sequence and a fixed ground truth sequence. The ground truth includes a backtrack step, indicating the training data contains non-linear proof attempts.
2. **TrialMaster Inference (Bottom):** This is an interactive, step-by-step generation process.
* **State:** The current goal or context in the proof, represented by a circled number (0, 1, 2, ...).
* **Tactic:** An action proposed by the LLM to advance the proof from the current state.
* **Lean:** The interactive proof assistant that executes the tactic on the current state, producing a new state.
* **Backtrack:** A critical operation where the LLM can decide to revert to a previous state (e.g., from state 2 back to state 0) if it determines the current path is unfruitful. This is shown with a blue arrow in the state diagrams and blue text in the flow.
* **Process Flow:** At each step, the LLM receives the proof history so far and proposes either a tactic or a backtrack command. If it proposes a tactic, Lean executes it against the current state to produce a new state, which is appended to the history. The loop repeats until a "complete" state is reached.
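The inference loop described above can be sketched in Python. The `llm_propose` and `lean_apply` functions below are hypothetical stand-ins, scripted here to replay the trace shown in the figure; a real system would call the fine-tuned model and the Lean proof assistant instead.

```python
def trialmaster_inference(llm_propose, lean_apply, max_steps=16):
    """Iteratively feed the proof history to the LLM, which proposes
    either a tactic (executed by Lean) or a backtrack command."""
    history = ["state 0"]
    current, fresh = 0, 1          # current state id, next unused state id
    for _ in range(max_steps):
        kind, arg = llm_propose("; ".join(history))
        if kind == "backtrack":    # the LLM, not Lean, decides to backtrack
            current = arg
            history.append(f"backtrack to state {arg}")
            continue
        history.append(arg)        # arg is a tactic name
        if lean_apply(current, arg) == "complete":
            history.append(f"state {fresh}: complete")
            break
        current, fresh = fresh, fresh + 1
        history.append(f"state {current}")
    return history

# Scripted stand-ins (hypothetical) that reproduce the figure's trace.
moves = iter([("tactic", "tactic 1"), ("tactic", "tactic 2"),
              ("backtrack", 0), ("tactic", "tactic 3"),
              ("tactic", "tactic 4")])
llm = lambda history: next(moves)
lean = lambda state, tactic: "complete" if tactic == "tactic 4" else "open"

trace = trialmaster_inference(llm, lean)
print("; ".join(trace))
# → state 0; tactic 1; state 1; tactic 2; state 2; backtrack to state 0;
#   tactic 3; state 3; tactic 4; state 4: complete
```

Note that the loop matches the diagram's structure: columns 1, 2, 4, and 5 take the tactic branch through Lean, while column 3 takes the backtrack branch, which never touches Lean.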
### Key Observations
* The **state diagrams** at the top of each inference column provide a visual summary of the proof search tree being built. The final diagram (Column 5) matches the structure shown in the `ground truth` of the fine-tuning section.
* The **curved arrows** in the first, second, and fourth inference columns suggest a feedback loop where the new state is fed back into the process for the next LLM call.
* The **color coding** is consistent: Yellow for LLM, Blue for Lean/Backtrack text, Green for success ("complete" and checkmark).
* The **"backtrack to state 0"** command in Column 3 is generated directly by the LLM, not by Lean, indicating the model learns to make strategic control decisions.
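The column-5 state diagram can be viewed as a small search graph in which tactic edges advance the proof and the backtrack edge marks an abandoned branch. A minimal sketch (the child-to-parent encoding is an assumption, not taken from the figure) shows how the final proof path falls out by following only tactic edges back from the completed state:

```python
# Hypothetical encoding of the column-5 state diagram.
# Tactic edges as child -> parent; the backtrack edge (2 -> 0)
# marks the abandoned branch and is excluded from the final proof.
tactic_edges = {1: 0, 2: 1, 3: 0, 4: 3}
backtrack_edges = [(2, 0)]

def successful_path(goal, parents):
    """Walk parent pointers from the proved state back to the root."""
    path = [goal]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return list(reversed(path))

print(successful_path(4, tactic_edges))  # → [0, 3, 4]
```

Pruning the backtracked branch (states 1 and 2) in this way leaves exactly the linear proof `0 → 3 → 4` that the green checkmark certifies.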
### Interpretation
This diagram explains a sophisticated method for improving LLM-based theorem proving. The core insight is that **proof search is not a single, linear forward pass**. The "TrialMaster Inference" system explicitly models and enables:
1. **Interactive Refinement:** The LLM interacts with a formal verifier (Lean) at each step, grounding its predictions in a rigorous environment.
2. **Strategic Backtracking:** The model can recognize dead ends and retreat to an earlier, promising state, mimicking human proof-search behavior. This is a significant advancement over models that can only generate a single, monolithic proof script.
3. **Learning from Non-Linear Proofs:** The fine-tuning data (top section) includes backtracking steps, teaching the model that proofs can involve exploration and correction. The inference process (bottom section) then operationalizes this ability.
The relationship between the two sections is causal: the fine-tuning process trains the LLM to understand proof states, tactics, *and* backtracking, which is then deployed in the more complex, iterative TrialMaster inference loop. The ultimate goal is to produce a verified, complete proof (state 4) by dynamically navigating a proof tree.
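Concretely, the top section's training setup can be thought of as building (input, target) pairs: the theorem's initial state as input, and the full trial-and-error trace, backtrack included, as the target sequence. A minimal sketch, with the serialization format assumed rather than taken from the figure:

```python
# Hypothetical serialization of one fine-tuning example. The target
# keeps the backtrack step, so the model is trained on the full
# trial-and-error trace rather than only the successful path.
def make_example(trace):
    """trace: list of steps like 'state 0', 'tactic 1', 'backtrack to state 0'."""
    return {"input": trace[0],            # the theorem, i.e. state 0
            "target": "; ".join(trace)}   # the entire ground-truth sequence

trace = ["state 0", "tactic 1", "state 1", "tactic 2", "state 2",
         "backtrack to state 0", "tactic 3", "state 3", "tactic 4", "state 4"]
ex = make_example(trace)
print(ex["input"])                        # → state 0
print("backtrack to state 0" in ex["target"])  # → True
```

The loss arrow in the diagram then corresponds to a standard sequence loss between the model's predicted proof and `ex["target"]`.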