## Diagram: Code Comparison - LLM vs. LLM + APOLLO
### Overview
This image presents a side-by-side comparison of two code snippets written in Lean 4, a formal theorem proving language. The image contrasts the output of a standard Large Language Model ("LLM") on the left with an enhanced system ("LLM + APOLLO") on the right. The visual design uses color-coding (red for the baseline/errors, green for corrections/improvements) and numbered tags to highlight specific differences where the APOLLO system has fixed errors or optimized the proof generated by the base LLM.
### Components and Layout
The image is divided into two main vertical panels:
1. **Left Panel (Red - Baseline LLM):**
* **Header:** A red rectangular box at the top center containing the text "LLM" in black.
* **Border:** A thick red border surrounds the entire panel.
* **Background:** Very light red/pink.
* **Highlights:** Specific lines of code are highlighted with a darker red background.
* **Tags:** Small red rectangular tags with white text (`#1`, `#2`, `#3`) are placed at the end of the highlighted lines to index the errors.
2. **Right Panel (Green - Enhanced System):**
* **Header:** A green rectangular box at the top center containing the text "LLM + APOLLO" in black.
* **Border:** A thick green border surrounds the entire panel.
* **Background:** Very light green.
* **Highlights:** Specific lines of code are highlighted with a darker green background, corresponding to the fixes for the red highlights on the left.
* **Tags:** Small green rectangular tags with black text (`#1`, `#2`, `#3`) are placed at the end of the highlighted lines to index the corrections.
* **Formatting:** Tag `#3` includes a strikethrough line through the code, indicating deletion.
---
### Content Details: Full Transcription and Comparison
Both panels share the same initial setup, imports, and theorem declaration. The differences occur within the proof body (`:= by`).
#### Shared Initial Code (Lines 1-15)
```lean
import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
theorem mathd_algebra_158_llm (a k : ℕ) (h₀ : Even a)
(h₁ : ↑(Σ k in Finset.range 8, 2 * k + 1) - ↑(Σ k
in Finset.range 5, a + 2 * k) = (4 : ℤ)) :
a = 8 := by
simp_all
ring_nf at *
have eq : (5 : ℤ) * (↑a : ℤ) + (2 : ℤ) * (↑k : ℤ) =
53 := by
linarith
```
*(Note: The theorem name on the right panel is slightly different: `mathd_algebra_158_apollo`, and there are minor line-break differences in the theorem signature due to formatting, but the logical content is identical).*
#### Difference #1: Type Mismatch Correction
* **Left Panel (LLM) - Red Highlight #1:**
```lean
have h_even : ∃ (m : ℕ), a = 2 * m := h₀
```
*Visual:* The entire line is highlighted red. The tag `#1` is at the far right.
* **Right Panel (LLM + APOLLO) - Green Highlight #1:**
```lean
have h_even : ∃ (m : ℕ), a = 2 * m := by
exact Even.exists_two_nsmul a h₀
```
*Visual:* The `:= by` and the subsequent indented line are highlighted green. The tag `#1` is placed after `:= by`.
#### Difference #2: Hallucination vs. Automated Tactic
* **Left Panel (LLM) - Red Highlight #2:**
```lean
have two_divides_53 : ∃ (d : ℤ), 53 = 2 * d := by
rw [← even_sum] at eq_subst
rw [eq_subst] at eq
exact ⟨(↑5) * (↑m : ℤ) + ↑k, eq⟩
```
*Visual:* A four-line block is highlighted red. The tag `#2` is at the end of the first line.
* **Right Panel (LLM + APOLLO) - Green Highlight #2:**
```lean
have two_divides_53 : ∃ (d : ℤ), 53 = 2 * d
:= by omega
```
*Visual:* The `:= by omega` portion is highlighted green. The tag `#2` is placed at the end of the first line.
#### Shared Intermediate Code
```lean
have not_divides : ¬ ∃ (d : ℤ), 53 = 2 * d := by
rintro ⟨d, hd⟩
have mod53 : (53 : ℤ) % 2 = 1 := by
norm_num
have mod_d : (2 * d) % 2 = 0 := by
norm_num
rw [hd] at mod53
```
#### Difference #3: Redundant Code Removal
* **Left Panel (LLM) - Red Highlight #3:**
```lean
rw [mul_mod] at mod53
```
*Visual:* The entire line is highlighted red. The tag `#3` is at the far right.
* **Right Panel (LLM + APOLLO) - Green Highlight #3:**
```lean
rw [mul_mod] at mod53
```
*Visual:* The text `rw [mul_mod] at mod53` has a horizontal strikethrough line through it and is highlighted green. The tag `#3` is at the far right.
#### Shared Final Code
```lean
linarith
have contra : False := by
apply not_divides
exact two_divides_53
exact False.elim contra
```
---
### Key Observations
1. **Syntax/Type Error (Tag #1):** The base LLM attempts to directly assign `h₀` (which is of type `Even a`) to a proposition `∃ (m : ℕ), a = 2 * m`. In Lean, this requires a specific proof term. APOLLO corrects this by invoking the specific lemma `Even.exists_two_nsmul`.
2. **Hallucination (Tag #2):** The base LLM invents variable names (`eq_subst`, `even_sum`) and a variable `m` that is not in scope for this specific block, resulting in a broken proof state. APOLLO completely replaces this flawed multi-line manual proof attempt with a single, powerful automated tactic: `omega` (which handles integer arithmetic).
3. **Redundancy (Tag #3):** The base LLM includes a rewrite tactic (`rw [mul_mod]`) that is either unnecessary or incorrect for the proof path. APOLLO identifies this and removes it (indicated by the strikethrough).
### Interpretation
This image serves as a technical demonstration of the "APOLLO" system's capabilities in the context of formal theorem proving (specifically Lean 4).
The data suggests that while standard LLMs can generate the general structure of a mathematical proof, they frequently fail on the strict syntactic and logical requirements of formal verification languages. They suffer from "hallucinations" (inventing tactics or variables that don't exist, as seen in #2) and type mismatches (as seen in #1).
The APOLLO system acts as an intelligent corrector or agentic wrapper around the LLM. By comparing the left and right panels, we can deduce that APOLLO likely possesses the ability to:
1. Verify code against the Lean compiler.
2. Search for correct lemmas when types mismatch (Fix #1).
3. Recognize when a proof state can be solved by standard automation (replacing hallucinated garbage with `omega` in Fix #2).
4. Prune dead or broken code paths (Fix #3).
Ultimately, the diagram visually communicates that "LLM + APOLLO" transforms broken, uncompilable AI-generated code into a verified, working mathematical proof.