## Code Comparison Diagram: LLM vs. LLM + APOLLO (Lean 4 Proofs)
### Overview
This image presents a side-by-side comparison of two formal mathematical proofs written in the Lean 4 theorem proving language. The image is divided into two main panels. The left panel, outlined in red and titled "LLM", shows a relatively short, single-column proof containing several highlighted errors or hallucinated steps. The right panel, outlined in green and titled "LLM + APOLLO", shows a much longer, three-column proof where the errors from the left panel have been replaced with extensive, successfully verified tactic blocks. The numbered tags (#1 through #8) map the failed steps in the standalone LLM to the corrected, expanded sub-proofs generated by the APOLLO system.
*Language Declaration:* The text is written in Lean 4, a formal programming and theorem-proving language that utilizes English keywords alongside mathematical notation. It is transcribed exactly as written.
### Components/Layout
* **Left Panel (Red Bounding Box):**
* **Header:** "LLM" (white text on red background).
* **Layout:** A single column of code.
* **Highlights:** Specific lines of code are highlighted in light red. These lines contain inline comments with numbers (e.g., `#1`, `#2`), indicating points of failure or hallucination where the LLM assumed a tactic (like `omega` or `nlinarith`) would solve the goal, but it did not.
* **Right Panel (Green Bounding Box):**
* **Header:** "LLM + APOLLO" (white text on green background).
* **Layout:** Three columns of code, separated by thin vertical gray lines. The second and third columns have `...` at the top, indicating continuous flow from the previous column.
* **Highlights:** Large blocks of code are highlighted in light green. These blocks contain corresponding inline comments (`#1`, `#2`, etc.). These represent the expanded, rigorous proof steps that APOLLO generated to successfully bridge the logical gaps left by the standalone LLM.
---
### Content Details (Transcription)
#### Left Panel: LLM (Red)
```lean
theorem mathd_algebra_289_llm (k t m n : ℕ)
(h₀ : Nat.Prime m ∧ Nat.Prime n) (h₁ : t < k)
(h₂ : (k ^ 2 : ℤ) - m * k + n = 0) (h₃ : (t ^ 2 :
ℤ) - m * t + n = 0) :
m ^ n + n ^ m + k ^ t + t ^ k = 20 := by
have h4 : (k : ℤ) ^ 2 - m * k + n = 0 := by
exact_mod_cast h₂
have h5 : (t : ℤ) ^ 2 - m * t + n = 0 := by
exact_mod_cast h₃
have h6 : (k : ℤ) ^ 2 - m * k + n - ((t : ℤ) ^ 2 - m
* t + n) = 0 := by
linarith
have h7 : (k : ℤ) ^ 2 - (t : ℤ) ^ 2 - m * (k - t) = 0
:= by
linarith
have h8 : (k - t) * ((k : ℤ) + (t : ℤ)) - m * (k - t)
= 0 := by
ring_nf at h7 ⊢
linarith
have h9 : (k - t) * ((k : ℤ) + (t : ℤ) - m) = 0 := by
linarith
have h10 : (k : ℤ) + (t : ℤ) - m = 0 := by
apply (mul_eq_zero.mp h9).resolve_left
omega
have h11 : (k : ℤ) + (t : ℤ) = m := by
linarith
have h12 : k + t = m := by
norm_num at h11 ⊢
exact_mod_cast h11
have h14 : k * t = n := by
nlinarith [h4, h5, h12]
have h15 : t = 1 := by
have h16 : t = 1 := by
have h17 : k * t = n := h14
have h18 : t < k := h₁
have h19 : k > 1 := by
by_contra h
push_neg at h
have : k = 1 := by omega
[RED HIGHLIGHT START] have t : 1 := by nlinarith #1 [RED HIGHLIGHT END]
linarith
have h20 : t = 1 := by
have h21 : k * t = n := h17
have h22 : n > 1 := by
by_contra h
push_neg at h
[RED HIGHLIGHT START] have n : 1 := by omega #2 [RED HIGHLIGHT END]
[RED HIGHLIGHT START] have k : 1 := by nlinarith #3 [RED HIGHLIGHT END]
linarith
have h23 : k > t := by linarith
[RED HIGHLIGHT START] interval_cases k <;> omega #4 [RED HIGHLIGHT END]
exact h20
exact h16
have h21 : k = 2 := by
have h22 : n = k := by
rw [h15] at h14
nlinarith
have h23 : m = k + 1 := by
[RED HIGHLIGHT START] have h24 : (k : ℤ) + (t : ℤ) = m := by #5
exact h12 [RED HIGHLIGHT END]
rw [h15] at h24
have h25 : (k : ℤ) + 1 = m := by linarith
norm_num at h25 ⊢
omega
have h26 : Nat.Prime m := h₀.left
have h27 : Nat.Prime n := h₀.right
rw [h22] at h27
rw [h23] at h26
by_contra h
push_neg at h
have h28 : k > 2 := by omega
have h29 : k < 4 := by
by_contra h30
push_neg at h30
have h31 : m > 4 := by
rw [h23]
omega
have h32 : m % 2 = 0 := by
have h33 : (k : ℤ) % 2 = 0 := by
[RED HIGHLIGHT START] have h34 : k % 2 = 0 := by omega #6 [RED HIGHLIGHT END]
norm_cast at h34 ⊢
have h35 : (m : ℤ) = (k : ℤ) + 1 := by linarith
[RED HIGHLIGHT START] rw [h35] #7
omega [RED HIGHLIGHT END]
have h36 : m > 2 := by
rw [h23]
omega
have h37 : Even m := by
have h38 : (m : ℤ) % 2 = 0 := by
norm_cast
exact ⟨m / 2, by omega⟩
have h39 : m = 2 := by
exact (Nat.Prime.even_iff h26).mp h37
omega
[RED HIGHLIGHT START] have h30 : k = 2 := by #8
interval_cases k <;> omega
exact h30 [RED HIGHLIGHT END]
have h25 : n = 2 := by
rw [h21] at h14
nlinarith
have h26 : m = 3 := by
omega
have h27 : m ^ n + n ^ m + k ^ t + t ^ k = 20 := by
rw [h26, h25, h21, h15]
norm_num
exact h27
```
#### Right Panel: LLM + APOLLO (Green) - Column 1
```lean
theorem mathd_algebra_289_apollo (k t m n : ℕ) (h₀
: Nat.Prime m ∧ Nat.Prime n) (h₁ : t < k)
(h₂ : (k ^ 2 : ℤ) - m * k + n = 0) (h₃ : (t ^ 2 :
ℤ) - m * t + n = 0) :
m ^ n + n ^ m + k ^ t + t ^ k = 20 := by
have h4 : (k : ℤ) ^ 2 - m * k + n = 0 := by
exact_mod_cast h₂
have h5 : (t : ℤ) ^ 2 - m * t + n = 0 := by
exact_mod_cast h₃
have h6 : (k : ℤ) ^ 2 - m * k + n - ((t : ℤ) ^
2 - m * t + n) = 0 := by
linarith
have h7 : (k : ℤ) ^ 2 - (t : ℤ) ^ 2 - m * (k -
t) = 0 := by
linarith
have h8 : (k - t) * ((k : ℤ) + (t : ℤ)) - m *
(k - t) = 0 := by
ring_nf at h7 ⊢
linarith
have h9 : (k - t) * ((k : ℤ) + (t : ℤ) - m) = 0
:= by
linarith
have h10 : (k : ℤ) + (t : ℤ) - m = 0 := by
apply (mul_eq_zero.mp h9).resolve_left
omega
have h11 : (k : ℤ) + (t : ℤ) = m := by
linarith
have h12 : k + t = m := by
norm_num at h11 ⊢
exact_mod_cast h11
have h14 : k * t = n := by
nlinarith [h4, h5, h12]
have h15 : t = 1 := by
have h16 : t = 1 := by
have h17 : k * t = n := by
gcongr
have h18 : t < k := by
gcongr
have h19 : k > 1 := by
by_contra h
push_neg at h
have : k = 1 := by
omega
[GREEN HIGHLIGHT START] have t : 1 := by #1
try norm_cast ; try norm_num
; try simp_all ; try ring_nf at * ; try
native_decide ; try linarith ; try nlinarith
have h2 : Nat.Prime (0 : ℕ)
:= h₀.right
have h3 : ¬Nat.Prime (0 : ℕ)
:= by
simp [Nat.prime_def_lt]
contradiction
linarith
have h20 : t = 1 := by
have h21 : k * t = n := by
gcongr
have h22 : n > 1 := by
by_contra h
push_neg at h
have n : 1 := by #2
simp_all only [sub_self,
mul_zero, gt_iff_lt]
have one_lt_n : 1 < n :=
by
exact (h₀.2).one_lt
have one_le_n : 1 ≤ n :=
le_of_lt one_lt_n
have n_eq_one : n = 1 :=
le_antisymm h one_le_n
have : k * t = 1 := by
rwa [n_eq_one] at h21
have t_eq_one : t = 1 :=
by
cases k
· simp at h19
· linarith
gcongr
have k : 1 := by #3
simp_all only
[Nat.cast_one, sub_self, mul_zero, gt_iff_lt,
mul_eq_one, le_refl]
linarith
have h23 : k > t := by
linarith
simp_all only [sub_self, #4
mul_zero, gt_iff_lt]
have k_dvd_n : k ∣ n := by
use t
gcongr
have k_cases : k = 1 ∨ k = n :=
by
rcases h₀ with ⟨_, hn⟩
exact (dvd_prime hn).mp
k_dvd_n
cases' k_cases with keq1 keqn
· linarith [keq1]
have k_eq_n : k = n := keqn
have nt_eq_n : n * t = n := by rw
[← k_eq_n] at h21; simp_all only [dvd_refl]
have n_pos : 0 < n := by linarith
[h22]
have t_eq_one : t = 1 := by exact
(mul_right_eq_self_iff n_pos).mp nt_eq_n
exact t_eq_one
exact h20
exact h16 [GREEN HIGHLIGHT END]
```
#### Right Panel: LLM + APOLLO (Green) - Column 2
```lean
...
have h21 : k = 2 := by
have h22 : n = k := by
rw [h15] at h14
nlinarith
have h23 : m = k + 1 := by
[GREEN HIGHLIGHT START] have h24 : (k : ℤ) + (t : ℤ) = m := #5
by
gcongr [GREEN HIGHLIGHT END]
rw [h15] at h24
have h25 : (k : ℤ) + 1 = m := by
linarith
norm_num at h25 ⊢
omega
have h26 : Nat.Prime m := by
simp_all only [Nat.cast_add,
Nat.cast_one, one_pow, mul_one,
sub_add_cancel_right, neg_add_cancel, sub_self,
mul_zero]
have h27 : Nat.Prime n := by
simp_all only [Nat.cast_add,
Nat.cast_one, one_pow, mul_one,
sub_add_cancel_right, neg_add_cancel, sub_self,
mul_zero]
rw [h22] at h27
rw [h23] at h26
by_contra h
push_neg at h
have h28 : k > 2 := by
omega
have h29 : k < 4 := by
by_contra h30
push_neg at h30
have h31 : m > 4 := by
rw [h23]
omega
have h32 : m % 2 = 0 := by
have h33 : (k : ℤ) % 2 = 0 := by
[GREEN HIGHLIGHT START] have h34 : k % 2 = 0 := by #6
try norm_cast ; try
norm_num ; try simp_all ; try ring_nf at * ;
try native_decide ; try linarith ;
try nlinarith
have h_odd : k % 2 = 0
:= by
have h1 : k > 2 := by
linarith
have h2 : k % 2 =
0 ∨ k % 2 = 1 := by
omega
rcases h2 with
(h_even | h_odd)
· exact h_even
· have h_odd_k : k
% 2 = 1 := by
gcongr
have h4 : (k + 1)
% 2 = 0 := by
omega
have h5 :
k + 1 > 2 := by
linarith
have h6 :
¬Nat.Prime (k + 1) := by
apply
Nat.not_prime_of_dvd_of_lt (m := 2)
· omega
· omega
· omega
try norm_cast ;
try norm_num ; try simp_all ; try ring_nf at * ;
try native_decide ; try linarith ; try nlinarith
exact h26
exact h_odd
norm_cast at h34 ⊢
have h35 : (m : ℤ) = (k : ℤ) + 1
:= by linarith
[GREEN HIGHLIGHT START] try norm_cast ; try norm_num ; #7
try simp_all ; try ring_nf at * ; try
native_decide ; try linarith ; try nlinarith
have h34 : k % 2 = 1 := by
have h35 : k ≠ 2 := h
have h36 : k > 2 := by linarith
have h37 : k % 2 = 1 := by
by_contra h38
push_neg at h38
have h39 : k > 2 := h36
have h40 : ¬Nat.Prime k :=
by
apply
Nat.not_prime_of_dvd_of_lt (m := 2)
· omega
· linarith
· omega
contradiction
exact h37
calc
(1 + k) % 2 = (1 % 2 + k % 2)
% 2 := by simp [Nat.add_mod]
_ = (1 + 1) % 2 := by rw
[Nat.zero_mod 2, h34]
_ = 2 % 2 := by rfl
_ = 0 := by norm_num [GREEN HIGHLIGHT END]
have h36 : m > 2 := by
rw [h23]
omega
...
```
#### Right Panel: LLM + APOLLO (Green) - Column 3
```lean
...
have h37 : Even m := by
have h38 : (m : ℤ) % 2 = 0 := by
norm_cast
exact even_iff.mpr h32
have h39 : m = 2 := by
try norm_cast ; try norm_num ; try
simp_all ; try ring_nf at * ; try native_decide ;
try linarith ; try nlinarith
have h_odd : k % 2 = 1 := by
have h_even : (1 + k) % 2 = 0 :=
h32
omega
have h1 : (1 + k) % 2 = 0 := h32
have h2 : (1 + k) > 2 := by omega
have h3 : ¬Nat.Prime ((1 + k)) :=
by
have h4 : (1 + k) > 2 := by omega
have h5 : (1 + k) % 2 = 0 := h32
have h6 : ¬Nat.Prime ((1 + k)) :=
by
apply
Nat.not_prime_of_dvd_of_lt (m := 2)
· have h7 : 2 ∣ (1 + k) := by
exact
Nat.dvd_of_mod_eq_zero h5
exact h7
· linarith
· nlinarith
exact h6
have h_contra : Nat.Prime ((1 + k))
:= h26
contradiction
omega
[GREEN HIGHLIGHT START] have h30 : k = 2 := by #8
try norm_cast ; try norm_num ; try
simp_all ; try ring_nf at * ; try native_decide ;
try linarith ; try nlinarith
have h30 : k = 3 := by
linarith
have h31 : ¬Nat.Prime ((1 : ℕ) + k)
:= by
norm_num
contradiction
linarith [GREEN HIGHLIGHT END]
have h25 : n = 2 := by
rw [h21] at h14
nlinarith
have h26 : m = 3 := by
omega
have h27 : m ^ n + n ^ m + k ^ t + t ^ k = 20
:= by
rw [h26, h25, h21, h15]
norm_num
exact h27
```
---
### Key Observations
1. **Visual Trend (Length and Complexity):** The most striking visual trend is the disparity in code volume between the red and green highlights. A single line of code in the "LLM" panel (e.g., `#1`, `#2`, `#4`) expands into massive, multi-nested logical blocks in the "LLM + APOLLO" panel.
2. **Tactic Hallucination:** In the Red panel, the LLM frequently attempts to close goals using terminal tactics like `omega` (a decision procedure for integer/natural number arithmetic) or `nlinarith` (non-linear arithmetic). For example, Red `#1` is `have t : 1 := by nlinarith`.
3. **APOLLO's Search/Expansion:** In the Green panel, APOLLO replaces these hallucinated terminal tactics with extensive search sequences. Green `#1` replaces `nlinarith` with a sequence of attempts: `try norm_cast ; try norm_num ; try simp_all ; try ring_nf at * ; try native_decide ; try linarith ; try nlinarith`, followed by actual logical deductions involving `Nat.Prime` and `contradiction`.
4. **Mapping the Tags:**
* **#1:** Red attempts `nlinarith`. Green expands into a ~10-line proof by contradiction.
* **#2:** Red attempts `omega`. Green expands into a ~15-line proof using `le_antisymm` and case analysis.
* **#3:** Red attempts `nlinarith`. Green uses `simp_all only [...]`.
* **#4:** Red attempts `interval_cases k <;> omega`. Green expands into a ~15-line proof using divisibility (`k ∣ n`) and prime properties (`dvd_prime`).
* **#5:** Red uses `exact h12`. Green uses `gcongr`.
* **#6:** Red attempts `omega`. Green expands into a massive ~25-line sub-proof handling even/odd cases and prime contradictions.
* **#7:** Red attempts `rw [h35]` followed by `omega`. Green expands into a ~20-line calculation block (`calc`) to prove modulo arithmetic.
* **#8:** Red attempts `interval_cases k <;> omega`. Green expands into a proof by contradiction showing `k = 3` violates prime conditions.
### Interpretation
This image serves as a technical demonstration of the limitations of standalone Large Language Models in formal theorem proving, and the value of augmenting them with search/agentic frameworks (like "APOLLO").
* **The Problem:** The base LLM (left) understands the *structure* of a Lean 4 proof. It knows what intermediate steps (`have` statements) might be useful to reach the final goal. However, it suffers from "tactic hallucination"—it assumes powerful automated tactics like `omega` or `nlinarith` can magically bridge the gap between its current state and the intermediate goal, even when the mathematical context doesn't support it. The proof on the left is syntactically plausible but logically broken; it will not compile in Lean.
* **The Solution:** The APOLLO system.