## Diagram: Code Comparison - LLM vs LLM + APOLLO
### Overview
The image displays a side-by-side comparison of two Lean 4 code snippets attempting to formally prove a mathematical theorem (`amc12a_2019_p12`). The left panel, outlined in red and labeled "LLM", shows a short, incomplete, or failing proof attempt with specific sections highlighted in red. The right panel, outlined in green and labeled "LLM + APOLLO", shows a highly detailed, successful proof spanning two columns, with corresponding sections highlighted in green. Numbered tags (`#1`, `#2`, `#3`) are used to map specific logical steps between the two approaches, demonstrating how the "APOLLO" system corrects and expands upon the baseline LLM's output.
### Components and Layout
* **Left Panel (Red Outline):**
* **Header:** A red box containing the text "LLM".
* **Content:** A single column of Lean 4 code.
* **Highlights:** Light red background applied to lines 11-16 and line 19.
* **Tags:** `#1`, `#2`, `#3` placed inline with the code.
* **Right Panel (Green Outline):**
* **Header:** A green box containing the text "LLM + APOLLO".
* **Content:** Lean 4 code split into two columns to accommodate its length. The left column contains the beginning of the proof, and the right column continues it.
* **Highlights:** Light green background applied to lines 12-19 and lines 24-38 in the first column.
* **Tags:** `#1`, `#2`, `#3` placed inline with the code, corresponding to the tags in the left panel.
### Content Details (Transcription)
#### Left Panel: LLM
```lean
import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
theorem amc12a_2019_p12_llm (x y : ℝ) (h₀ : x ≠ 1 ∧ y ≠ 1)
(h₁ : Real.log x / Real.log 2 = Real.log 16 / Real.log y) (h₂ : x * y = 64) :
(Real.log (x / y) / Real.log 2) ^ 2 = 20 := by
have x_nonzero : x ≠ 0 := by #1 -- [HIGHLIGHTED RED START]
rintro rfl
simp at h₁
have y_nonzero : y ≠ 0 := by #2
rintro rfl
simp at h₁ -- [HIGHLIGHTED RED END]
field_simp [Real.log_div, Real.log_mul, Real.log_inv, Real.log_pow] at h₁ ⊢
nlinarith #3 -- [HIGHLIGHTED RED]
```
#### Right Panel: LLM + APOLLO (Column 1)
```lean
import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
theorem amc12a_2019_p12_apollo (x y : ℝ) (h₀ : x ≠ 1 ∧ y ≠ 1)
(h₁ : Real.log x / Real.log 2 = Real.log 16 / Real.log y) (h₂ : x * y = 64) :
(Real.log (x / y) / Real.log 2) ^ 2 = 20
:= by
have x_nonzero : x ≠ 0 := by #1 -- [HIGHLIGHTED GREEN START]
rintro rfl
simp at h₁
linarith
have y_nonzero : y ≠ 0 := by #2
rintro rfl
simp at h₁
linarith -- [HIGHLIGHTED GREEN END]
field_simp [Real.log_div, Real.log_mul, Real.log_inv, Real.log_pow] at h₁ ⊢
ring_nf at h₁ ⊢
field_simp [Real.log_mul, Real.log_rpow, Real.log_pow, mul_comm] at h₁ ⊢
let L := Real.log (2 : ℝ) #3 -- [HIGHLIGHTED GREEN START]
let X := Real.log x
let Y := Real.log y
have hXY : X + Y = Real.log 64 := by
rw [←Real.log_mul x_nonzero y_nonzero]
rw [h₂]
have hXY' : X + Y = 6 * L := by
rw [hXY]
have hL : 6 * Real.log (2 : ℝ) = Real.log (64) := by
have h2_pos : (2 : ℝ) > 0 := by norm_num
... -- [HIGHLIGHTED GREEN END]
```
#### Right Panel: LLM + APOLLO (Column 2)
```lean
...
have H : Real.log (2^6) = 6 * Real.log (2 : ℝ) := by
simp
rw [← H]
norm_num
gcongr
have h16 : Real.log (16 : ℝ) = 4 * L := by
norm_num
have h2_pos : (2 : ℝ) > 0 := by norm_num
have H' : Real.log (2^4) = (4:ℝ) * Real.log (2 : ℝ) := by
simp
rw [←H']
norm_num
rw [h16] at h₁
have hX : X = (4 * L^2) / Y := by
have hL : Real.log (2 : ℝ) = L := rfl
rw [hL] at h₁
have hL' : L * (4 * L) = 4 * L^2 := by linarith
rw [hL'] at h₁
exact h₁
have hXY_prod : X * Y = 4 * L^2 := by
rw [hX]
field_simp [y_nonzero]
ring_nf
have Y_nonzero : Y ≠ 0 := by
have y_log_nonzero : Real.log y ≠ 0 := by
intro H
have hy_exp : y = Real.exp (Real.log y) := by simp_all only [ne_eq, div_zero, log_eq_zero, false_or, log_neg_eq_log, log_one, exp_zero, mul_neg, mul_one, neg_neg, OfNat.one_ne_ofNat]
rw [H, Real.exp_zero] at hy_exp
have y_eq_one : y = 1 := hy_exp
exact (h₀.right) y_eq_one
exact y_log_nonzero
simp_all only [ne_eq, isUnit_iff_ne_zero, not_false_eq_true, IsUnit.mul_inv_cancel_right]
have diff_eq : X^2 + Y^2 - 2 * X * Y = (X + Y)^2 - 4 * (X * Y) := by ring
rw [hXY'] at diff_eq
rw [hXY_prod] at diff_eq
linarith
```
### Key Observations
1. **Error Correction at `#1` and `#2`:** The baseline LLM attempts to prove `x_nonzero` and `y_nonzero` using `rintro rfl` and `simp at h₁`, but fails to close the goal (indicated by the red highlight). The APOLLO system corrects this by adding the `linarith` tactic to both sub-proofs, successfully closing them (indicated by the green highlight).
2. **Strategic Divergence at `#3`:** The baseline LLM attempts to solve the remainder of the complex logarithmic equation with a single, powerful tactic: `nlinarith`. The red highlight indicates this tactic fails or times out.
3. **APOLLO's Step-by-Step Approach:** At tag `#3`, instead of relying on a single tactic, the APOLLO system introduces variable substitutions (`let L`, `let X`, `let Y`) to convert the logarithmic problem into an algebraic one. It then meticulously proves intermediate lemmas (`hXY`, `hXY'`, `h16`, `hX`, `hXY_prod`, `Y_nonzero`, `diff_eq`) over dozens of lines of code before finally concluding the proof with `linarith`.
### Interpretation
This image serves as a visual demonstration of the limitations of standard Large Language Models (LLMs) in formal theorem proving, and how an augmented system (APOLLO) overcomes them.
The baseline LLM exhibits a common failure mode in formal mathematics: it attempts to "leap" to the conclusion using heavy-handed automation tactics (`nlinarith`) without properly setting up the intermediate logical steps. When the automation fails to find a path, the proof fails.
The "APOLLO" system (which likely represents an agentic search framework, a specialized decoding strategy, or a reinforcement learning pipeline built on top of the LLM) demonstrates a much deeper "understanding" of the required proof structure. By breaking the problem down—specifically by recognizing that the logarithmic equations should be mapped to algebraic variables (`X`, `Y`, `L`)—APOLLO guides the prover through a series of manageable, verifiable steps. The contrast between the single failing red line (`nlinarith`) and the massive block of successful green code highlights APOLLO's ability to perform complex, multi-step reasoning that standard LLMs cannot achieve in a single zero-shot generation.