## Code Comparison Diagram: LLM vs. LLM + APOLLO Lean 4 Proofs
### Overview
The image presents a side-by-side comparison of two code snippets written in the Lean 4 theorem proving language. The left panel displays code generated by a baseline "LLM" (Large Language Model), while the right panel displays code generated by an augmented system labeled "LLM + APOLLO". The diagram highlights specific differences in the proof tactics used by each system, using color-coded blocks and numbered tags to draw attention to areas where APOLLO modifies, expands, or deletes the baseline LLM's output.
### Components and Spatial Grounding
The image is divided into two distinct vertical panels:
* **Left Panel (LLM):** Enclosed in a red border with a red header box at the top center containing the text "LLM" in black. The background is a very faint red/pink. It contains Lean 4 code. Two specific areas are highlighted with a solid light red background and tagged with "#1" and "#2" on their right edges.
* **Right Panel (LLM + APOLLO):** Enclosed in a green border with a green header box at the top center containing the text "LLM + APOLLO" in black. The background is a very faint green. It contains Lean 4 code. Two specific areas are highlighted with a solid light green background and tagged with "#1" and "#2" on their right edges. Text within the "#2" highlight features a strikethrough effect.
### Content Details: Transcription and Component Isolation
#### 1. Left Panel: LLM
This panel shows the baseline proof attempt.
**Transcription:**
```lean
import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
theorem induction_pord1p1on2powklt5on2_llm (n k : ℕ) (h₀ : 0 < n) :
(∏ k in Finset.Icc 1 n, 1 + (1 : ℝ) / 2 ^ k) < 5 / 2 := by
induction n with
| zero =>
contradiction
| succ n hn =>
cases n with
| zero =>
norm_num #1
| succ n =>
simp_all [Finset.prod_Icc_succ_top, Nat.cast_succ, div_eq_mul_inv]
norm_num #2
ring_nf
linarith
```
*Spatial Notes for Left Panel:*
* **Highlight #1:** The line `norm_num` under the `| zero =>` case has a red background. The tag `#1` is placed immediately to the right of `norm_num`.
* **Highlight #2:** The block of three lines (`norm_num`, `ring_nf`, `linarith`) at the very end of the code has a red background. The tag `#2` is placed immediately to the right of the `norm_num` line within this block.
#### 2. Right Panel: LLM + APOLLO
This panel shows the modified proof attempt. The setup and initial theorem declaration are identical, except for the theorem name suffix (`_apollo` instead of `_llm`) and slight differences in line wrapping.
**Transcription:**
```lean
import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
theorem induction_pord1p1on2powklt5on2_apollo (n k : ℕ) (h₀ : 0 < n) :
(∏ k in Finset.Icc 1 n, 1 + (1 : ℝ) / 2 ^ k) < 5 / 2 := by
induction n with
| zero =>
contradiction
| succ n hn =>
cases n with
| zero =>
norm_num
induction k with
| zero =>
norm_num [Finset.prod_Icc_succ_top] at hn h₀ <;> linarith
| succ k ih =>
cases k with
| zero =>
norm_num [Finset.prod_Icc_succ_top] at hn h₀ <;> linarith
| succ k =>
simp_all [Finset.prod_Icc_succ_top, pow_succ, mul_comm]
linarith #1
| succ n =>
simp_all [Finset.prod_Icc_succ_top, Nat.cast_succ, div_eq_mul_inv]
~~norm_num~~ #2
~~ring_nf~~
~~linarith~~
```
*Spatial Notes for Right Panel:*
* **Highlight #1:** A large green background block begins at the `norm_num` line under the `| zero =>` case and extends down to the `linarith` line just before `| succ n =>`. The tag `#1` is placed to the far right, aligned with the top line (`norm_num`) of this highlighted block.
* **Highlight #2:** A green background block covers the final three lines of the code. The text of these three lines (`norm_num`, `ring_nf`, `linarith`) is crossed out with a strikethrough line. The tag `#2` is placed immediately to the right of the crossed-out `norm_num`.
### Key Observations
1. **Expansion at #1:** The baseline LLM attempts to solve the `| zero =>` case of the `cases n` block with a single `norm_num` tactic. This is highlighted in red, implying it fails or is insufficient. The APOLLO system expands this single line into a complex, 14-line nested induction on `k` (`induction k with...`), handling multiple sub-cases to properly close the goal.
2. **Deletion at #2:** At the end of the proof, under the `| succ n =>` case, the baseline LLM generates three tactics: `norm_num`, `ring_nf`, and `linarith` after a `simp_all` call. The APOLLO system highlights these exact three lines and applies a strikethrough, indicating that these tactics are either unnecessary (the proof is already closed by `simp_all`), redundant, or cause an error in the Lean environment.
### Interpretation
This image serves as a technical demonstration of the "APOLLO" system's capabilities in the domain of formal theorem proving (specifically using Lean 4). It illustrates that while a standard LLM can generate the broad structure of a mathematical proof, it often fails at specific logical junctions by either hallucinating overly simplistic tactics that don't work (Observation #1) or by appending unnecessary/failing steps at the end of a branch (Observation #2).
The APOLLO system appears to act as an agentic or iterative wrapper around the LLM. By comparing the two panels, we can deduce APOLLO's function:
* **Generative Repair:** When a tactic fails (like the first `norm_num`), APOLLO does not just give up; it generates a deeper, structurally sound sub-proof (the nested induction on `k`) to bridge the logical gap.
* **Pruning:** APOLLO recognizes when the goal state has been achieved or when subsequent tactics are invalid, actively removing them (the struck-through text) to ensure the final proof script compiles cleanly without errors.
Ultimately, the data suggests that combining an LLM with the APOLLO framework transforms an incomplete or broken formal proof into a verified, working proof by dynamically expanding necessary logic and pruning faulty logic.