## Diagram: Code Comparison - LLM vs. LLM + APOLLO in Lean 4
### Overview
This image is a side-by-side comparison of two code snippets written in the Lean 4 theorem-proving language. The image contrasts how a standard Large Language Model (LLM) approaches a formal mathematical proof versus how an augmented system ("LLM + APOLLO") approaches the exact same proof. The goal of the code is to solve a system of two linear equations with two complex variables ($f$ and $z$).
### Components/Layout
The image is divided into two distinct panels:
* **Left Panel (Red Border):**
* **Header:** "LLM" (White text on a red rectangular background, top-center).
* **Content:** A relatively short block of Lean 4 code.
* **Annotations:** Three sections of code are highlighted with a red background and tagged with `#1`, `#2`, and `#3`.
* **Right Panel (Green Border):**
* **Header:** "LLM + APOLLO" (White text on a green rectangular background, top-center).
* **Content:** A significantly longer block of Lean 4 code that wraps into two columns due to its length.
* **Annotations:** Three massive blocks of code are highlighted with a light green background and tagged with `#1`, `#2`, and `#3`. These correspond directly to the numbered steps in the left panel.
### Content Details (Transcription & Structural Analysis)
*Language Declaration: The primary language of the text is English, but the content is formal programming code written in **Lean 4**, utilizing specific mathematical unicode symbols (e.g., `ℂ` for Complex numbers, `∧` for logical AND, subscripts like `₀`).*
Both panels start with the exact same imports and environment setup:
```lean
import Mathlib
import Aesop
set_option maxHeartbeats 0
set_option pp.numericTypes true
set_option pp.coercions.types true
open BigOperators Real Nat Topology Rat
```
Both panels attempt to prove the same theorem, though the naming differs slightly (`_llm` vs `_apollo`). The theorem establishes that given two hypotheses ($h_0$ and $h_1$), the values of $f$ and $z$ are $-10$ and $7$, respectively.
**Hypotheses:**
$h_0$: $f + 3z = 11$
$h_1$: $3(f - 1) - 5z = -68$
**Goal:** $f = -10 \land z = 7$
---
#### Left Panel Analysis ("LLM")
The standard LLM attempts to solve the equations using high-level automation tactics, specifically `linarith` (linear arithmetic solver).
**Full Transcription of Left Panel Theorem:**
```lean
theorem
algebra_2varlineareq_fp3zeq11_3tfm1m5zeqn68_feqn10_zeq7
_llm (f z : ℂ)
(h₀ : f + (3 : ℂ) * z = (11 : ℂ))
(h₁ : (3 : ℂ) * (f - 1) - (5 : ℂ) * z = (-68 : ℂ))
:
f = (-10 : ℂ) ∧ z = (7 : ℂ) := by
-- Simplify the second equation using mul_sub
have h₂ : 3 * f - 3 - 5 * z = -68 := by
simpa [mul_sub] using h₁
-- Combine like terms to isolate 3*f - 5*z
have h₃ : 3 * f - 5 * z = -65 := by #1
linarith [h₂]
-- Use the two equations to solve for z
have z_eq : z = 7 := by #2
linarith [h₀, h₃]
-- Substitute z back to solve for f
have f_eq : f = -10 := by #3
linarith [h₀, z_eq]
-- Conclude both equalities
exact And.intro f_eq z_eq
```
*Note: The red highlights on `#1`, `#2`, and `#3` in this context likely indicate that these `linarith` tactic calls fail to compile or evaluate correctly in the Lean environment, representing a hallucination of tactic success by the baseline LLM.*
---
#### Right Panel Analysis ("LLM + APOLLO")
The APOLLO-augmented system replaces the short, automated `linarith` calls with incredibly verbose, step-by-step algebraic manipulations using foundational tactics (`rw`, `exact`, `norm_num`, `ring_nf`).
**Transcription of Right Panel Theorem (Segmented by Annotations):**
**Setup and Step 0:**
```lean
theorem
algebra_2varlineareq_fp3zeq11_3tfm1m5zeqn68_feqn10_
zeq7_apollo (f z : ℂ)
(h₀ : f + (3 : ℂ) * z = (11 : ℂ))
(h₁ : (3 : ℂ) * (f - 1) - (5 : ℂ) * z = (-68 :
ℂ)) :
f = (-10 : ℂ) ∧ z = (7 : ℂ) := by
have h₂ : 3 * f - 3 - 5 * z = -68 := by
simpa [mul_sub] using h₁
```
**Annotation Block #1 (Proving $3f - 5z = -65$):**
Instead of `linarith [h₂]`, APOLLO generates a massive block of intermediate steps.
```lean
have h₃ : 3 * f - 5 * z = -65 := by #1
try norm_cast ; try norm_num ; try simp_all ;
try ring_nf at * ; try native_decide ; try
linarith ; try nlinarith
have h_expanded : (3 : ℂ) * f - (3 : ℂ) -
(5 : ℂ) * z = (-68 : ℂ) := by
gcongr
have h_added : (3 : ℂ) * f - (5 : ℂ) * z =
((3 : ℂ) * f - (3 : ℂ) - (5 : ℂ) * z) + (3 : ℂ)
:= by
try norm_cast ; try norm_num ; try
simp_all ; try ring_nf at * ; try
native_decide ; try linarith ; try nlinarith
have h_add : f * (3 : ℂ) - z * (5 : ℂ) =
(-68 : ℂ) + (3 : ℂ) := by
have lhs : (-3 : ℂ) + (f * (3 : ℂ) - z
* (5 : ℂ)) + (3 : ℂ) = f * (3 : ℂ) - z * (5 : ℂ)
:= by
exact neg_add_cancel_comm (3 : ℂ)
(f * (3 : ℂ) - z * (5 : ℂ))
have rhs : (-68 : ℂ) + (3 : ℂ) = (-68
: ℂ) + (3 : ℂ) := by
exact rfl
rw [← lhs] at h₁
simp_all only
have h_sum : (-68 : ℂ) + (3 : ℂ) = (-65 :
ℂ) := by
try norm_cast ; try norm_num ; try
simp_all ; try ring_nf at * ; try native_decide ;
try linarith ; try nlinarith
rw [h_add, h_sum]
have h_subst : (3 : ℂ) * f - (5 : ℂ) * z =
(-68 : ℂ) + (3 : ℂ) := by
rw [h_expanded] at h_added
exact h_added
have h_simpl : (-68 : ℂ) + (3 : ℂ) = (-65 : ℂ)
:= by
norm_num
exact Eq.trans h_subst h_simpl
```
**Annotation Block #2 (Proving $z = 7$):**
Instead of `linarith [h₀, h₃]`, APOLLO manually isolates $z$ through complex substitution and division.
```lean
have z_eq : z = 7 := by #2
try norm_cast ; try norm_num ; try simp_all ;
try ring_nf at * ; try native_decide ;
try linarith ; try nlinarith
have f_eq_def : f = (11 : ℂ) - (3 : ℂ) * z :=
by
exact eq_sub_of_add_eq h₀
have h3_sub : (3 : ℂ) * ((11 : ℂ) - (3 : ℂ) *
z) - (5 : ℂ) * z = (-65 : ℂ) := by
rw [f_eq_def] at h₃
exact h₃
have h3_simplified : (33 : ℂ) - (14 : ℂ)
* z = (-65 : ℂ) := by
norm_num
try norm_cast ; try norm_num ; try
simp_all ; try ring_nf at * ; try native_decide ;
try linarith ; try nlinarith
gcongr
have isolate_z : (14 : ℂ) * z = 98 := by
try norm_cast ; try norm_num ; try
simp_all ; try ring_nf at * ; try native_decide ;
try linarith ; try nlinarith
have eq1 : (30 : ℂ) = ((-68 : ℂ) + z *
(14 : ℂ)) := by
rw [sub_eq_iff_eq_add] at h₁
gcongr
have eq2 : (30 : ℂ) + (68 : ℂ) =
((-68 : ℂ) + (68 : ℂ)) + z * (14 : ℂ) := by
simp_all only [add_sub_cancel_right,
neg_add_cancel_comm, neg_add_cancel, zero_add]
...
```
*(The code wraps to the second column here, continuing the proof for Block #2)*
```lean
...
have eq3 : (98 : ℂ) = (0 : ℂ) + z * (14 :
ℂ) := by
norm_num at eq2
simp_all only [add_sub_cancel_right,
zero_add]
have eq4 : (0 : ℂ) + z * (14 : ℂ) = z *
(14 : ℂ) := by
simp
have eq_final : (98 : ℂ) = z * (14 : ℂ)
:= by
simp_all only [add_sub_cancel_right,
neg_add_cancel_comm, neg_add_cancel, zero_add]
gcongr
have z_sol : z = (7 : ℂ) := by
norm_num
try norm_cast ; try norm_num ; try
simp_all ; try ring_nf at * ; try native_decide ;
try linarith ; try nlinarith
have h_nonzero : (14 : ℂ) ≠ 0 := by
norm_num
have z_inv : z = 98 * ((14 : ℂ)⁻¹) := by
exact (eq_mul_inv_iff_mul_eq₀
h_nonzero).mpr isolate_z
have z_div : z = 98 / (14 : ℂ) := by
exact z_inv
have z_simpl : 98 / (14 : ℂ) = (7 : ℂ) :=
by
norm_num
rw [z_div, z_simpl]
exact z_sol
```
**Annotation Block #3 (Proving $f = -10$):**
Instead of `linarith [h₀, z_eq]`, APOLLO manually substitutes $z=7$ into the original equation.
```lean
have f_eq : f = -10 := by #3
try norm_cast ; try norm_num ; try simp_all ;
try ring_nf at * ; try native_decide ; try
linarith ; try nlinarith
have h_f : f = (11 : ℂ) - (21 : ℂ) := by
exact eq_sub_of_add_eq' h₀
have h_num : (11 : ℂ) - (21 : ℂ) = (-10 : ℂ)
:= by
norm_num
rw [h_f, h_num]
exact And.intro f_eq z_eq
```
### Key Observations
1. **Tactic Failure vs. Explicit Derivation:** The red highlights on the left strongly imply that the generic LLM generated code that looks logically sound to a human but fails the strict compiler checks of Lean 4. The `linarith` tactic often struggles with complex numbers (`ℂ`) or requires specific preprocessing that the LLM omitted.
2. **Brute Force Automation Strategy:** The APOLLO side inserts a "shotgun" approach string of tactics (`try norm_cast ; try norm_num ; try simp_all ; try ring_nf at * ; try native_decide ; try linarith ; try nlinarith`) repeatedly before explicitly writing out the low-level logical steps (`have eq1... have eq2...`).
3. **Extreme Granularity:** To prove $z=7$ from $33 - 14z = -65$, APOLLO breaks the algebra down into minuscule steps: isolating terms, proving $14 \neq 0$, multiplying by the inverse ($14^{-1}$), converting to division, and finally simplifying $98 / 14 = 7$.
### Interpretation
This diagram serves as a technical demonstration of the "APOLLO" system's capabilities in formal theorem proving compared to a baseline LLM.
When baseline LLMs write code for strict mathematical environments like Lean 4, they suffer from "hallucinated automation"—assuming high-level solver tactics (like `linarith`) will magically bridge the gap between intermediate algebraic steps. In reality, formal verification compilers require exact, typed matches, and these leaps of logic fail (indicated by the red boxes).
The APOLLO system solves this by refusing to rely on "magic" tactics. It forces the generation of highly granular, almost painstakingly verbose, low-level logical steps. By manually defining explicit equalities (`have h_expanded`, `have h_sum`, `have z_div`) and using basic rewrite rules (`rw`) and exact equivalences (`exact`), APOLLO constructs a proof that is practically guaranteed to compile and satisfy the strict rules of the formal verifier. The visual contrast—a 15-line failed script versus an 80+ line successful script—highlights the massive gap between "human-readable mathematical intuition" and "machine-verifiable formal logic."