## Code Comparison Diagram: LLM vs. LLM + APOLLO Theorem Proving
### Overview
This image is a side-by-side comparison of two Lean 4 code snippets. It contrasts the output of a standard Large Language Model ("LLM") on the left with the output of an LLM augmented with a system called "APOLLO" on the right. Both snippets attempt to prove a formalization of Problem 6 from the 1983 International Mathematical Olympiad (IMO). Color coding (red for the baseline LLM, green for the APOLLO-augmented LLM) highlights the one place where the two proofs differ: the hint list passed to the final `nlinarith` tactic.
### Components & Layout
The image is divided into two distinct rectangular panels of equal size:
1. **Left Panel (Red Theme):**
   * **Header:** A solid red rectangle at the top center containing the text "LLM" in black.
   * **Border:** A thick red outline surrounding the panel.
   * **Background:** The top portion has a very faint red/pink tint (almost white), while the bottom highlighted code block has a distinct light red background.
   * **Annotation:** A small, darker red square in the top-right corner of the highlighted block containing the text "#1" in white.
2. **Right Panel (Green Theme):**
   * **Header:** A solid green rectangle at the top center containing the text "LLM + APOLLO" in black.
   * **Border:** A thick green outline surrounding the panel.
   * **Background:** The top portion has a very faint green tint (almost white), while the bottom highlighted code block has a distinct light green background.
   * **Annotation:** A small, darker green square in the top-right corner of the highlighted block containing the text "#1" in black.
### Content Details
Both panels share the same setup code and the same theorem statement, apart from the theorem name and minor line-wrapping differences. They diverge only in the list of hint terms passed to the final `nlinarith` tactic.
#### Left Panel Transcription (LLM)
```lean
import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
theorem imo_1983_p6_llm (a b c : ℝ) (h₀ : 0 < a ∧ 0 <
    b ∧ 0 < c) (h₁ : c < a + b) (h₂ : b < a + c)
    (h₃ : a < b + c) : 0 ≤ a ^ 2 * b * (a - b) + b ^ 2
    * c * (b - c) + c ^ 2 * a * (c - a) := by
  nlinarith [sq_nonneg (a - b), sq_nonneg (b - c),
    sq_nonneg (c - a), sq_nonneg (a - b),
    sq_nonneg (b - c), sq_nonneg (c - a),
    mul_pos h₀.left h₀.right.left,
    mul_pos h₀.left h₀.right.right,
    mul_pos h₀.right.left h₀.right.right]
```
*(Note: The `nlinarith` block is highlighted in light red, with `#1` in the top right corner).*
#### Right Panel Transcription (LLM + APOLLO)
```lean
import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
theorem imo_1983_p6_apollo (a b c : ℝ) (h₀ : 0 < a
    ∧ 0 < b ∧ 0 < c) (h₁ : c < a + b) (h₂ : b < a + c)
    (h₃ : a < b + c) : 0 ≤ a ^ 2 * b * (a - b) + b ^ 2
    * c * (b - c) + c ^ 2 * a * (c - a) := by
  nlinarith [sq_nonneg (a - b), sq_nonneg (b - c),
    sq_nonneg (c - a),
    mul_pos (sub_pos.mpr h₁)
    (sub_pos.mpr h₂), mul_pos
    (sub_pos.mpr h₂)
    (sub_pos.mpr h₃), mul_pos (sub_pos.mpr h₃)
    (sub_pos.mpr h₁)]
```
*(Note: The `nlinarith` block is highlighted in light green, with `#1` in the top right corner).*
### Key Observations
* **Shared Context:** Both models correctly set up the environment (`import Mathlib`, `import Aesop`), set options (`maxHeartbeats 0`), and define the theorem signature for IMO 1983 Problem 6. The theorem involves real numbers $a, b, c$ that are strictly positive ($h_0$) and satisfy the triangle inequalities ($h_1, h_2, h_3$).
* **Naming Convention:** The left theorem is named `imo_1983_p6_llm`, while the right is named `imo_1983_p6_apollo`.
* **Divergence in Logic (The Highlighted Blocks):**
    * **The Baseline LLM (Red):** The hint list passed to the `nlinarith` (non-linear arithmetic) tactic is redundant: the triple `sq_nonneg (a - b), sq_nonneg (b - c), sq_nonneg (c - a)` appears twice. The remaining hints apply `mul_pos` to pairs of the basic positivity facts in $h_0$ (e.g., `h₀.left`, which is $0 < a$), producing $0 < ab$, $0 < ac$, and $0 < bc$. None of the hints use $h_1, h_2, h_3$.
    * **The APOLLO-Augmented LLM (Green):** The hint list has no duplicates: each `sq_nonneg` term appears once. Crucially, it applies `mul_pos` to pairs of facts obtained with `sub_pos.mpr` from the triangle inequality hypotheses ($h_1, h_2, h_3$). For example, `sub_pos.mpr h₁` converts the hypothesis $c < a + b$ into the form $0 < a + b - c$.
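
The two kinds of `mul_pos` hint can be checked in isolation. The following is a minimal sketch, not part of the figure: the `example` declarations are illustrative, and they rely only on the Mathlib lemmas `mul_pos` and `sub_pos` (`0 < a - b ↔ b < a`) that already appear in the panels.

```lean
import Mathlib

-- Baseline-style hint: a product of two basic positivity facts from h₀.
example (a b c : ℝ) (h₀ : 0 < a ∧ 0 < b ∧ 0 < c) : 0 < a * b :=
  mul_pos h₀.left h₀.right.left

-- `sub_pos.mpr` moves the right-hand side of a triangle inequality
-- across the `<`, giving a positive difference.
example (a b c : ℝ) (h₁ : c < a + b) : 0 < a + b - c :=
  sub_pos.mpr h₁

-- APOLLO-style hint: a product of two such positive differences.
example (a b c : ℝ) (h₁ : c < a + b) (h₂ : b < a + c) :
    0 < (a + b - c) * (a + c - b) :=
  mul_pos (sub_pos.mpr h₁) (sub_pos.mpr h₂)
```

The first example depends only on $h_0$, while the last depends only on the triangle inequalities, which is the distinction the highlighted blocks draw.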
### Interpretation
This image illustrates how the "APOLLO" system can improve the formal mathematical reasoning of Large Language Models.
The comparison suggests that the standard LLM (left) fails to supply the auxiliary facts that `nlinarith` needs to close the goal. Its hint list repeats the same three `sq_nonneg` terms and otherwise adds only products of the basic positivity facts from `h₀` (for example, $0 < ab$), so none of its hints encode the triangle inequalities $h_1, h_2, h_3$. The image shows no compiler output, but this proof likely fails to check.
Conversely, the APOLLO-augmented system (right) supplies hints tailored to the hypotheses. Applying `sub_pos.mpr` to $h_1, h_2,$ and $h_3$ yields the positive quantities $a + b - c$, $a + c - b$, and $b + c - a$, and `mul_pos` combines them pairwise into strictly positive products. These are degree-2 facts about the triangle inequalities that `nlinarith` can combine with the `sq_nonneg` terms. The use of red (often associated with failure/errors) and green (associated with success/correctness) strongly implies that the left code fails while the right code succeeds.
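
To make the role of the triangle inequalities concrete, the sketch below expands one of the products that the right panel builds with `mul_pos`. It is an illustrative identity only, not taken from the figure, and it does not claim to reproduce the certificate that `nlinarith` finds internally.

```lean
import Mathlib

-- The product of two triangle-inequality differences is a difference of squares,
-- so its positivity says (b - c)^2 < a^2, a quadratic fact in a, b, c.
example (a b c : ℝ) :
    (a + b - c) * (a + c - b) = a ^ 2 - (b - c) ^ 2 := by
  ring
```

A hint of this shape gives the tactic a quadratic relation among $a, b, c$ that cannot be derived from $0 < ab$, $0 < ac$, and $0 < bc$ alone.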