ImProver: Agent-Based Automated Proof Optimization

Automated Proof Optimization is the task of rewriting a formal proof so that it is both correct and optimized for an arbitrary criterion.

We present ImProver, a general LLM agent-based framework that solves this task. Here's an example of ImProver making a lemma from the 2019 IMO Q2 more concise and efficient:

Original (Human-Written)


lemma lemma0 {α : Type} {p : α → α → Prop}
(h1 : ∀ x, ∃! y, p x y) 
(h2 : ∀ x y, p x y ↔ p y x) :
∀ x, Classical.choose 
    (h1 (Classical.choose (h1 x).exists)).exists=x := by    
  -- PROOF START
  intro x
  obtain ⟨y, h1e, h1u⟩ := h1 x
  have h2' : Classical.choose (h1 x).exists = y :=
    h1u _ (Classical.choose_spec (h1 x).exists)
  rw [h2']
  obtain ⟨w, h1e', h1u'⟩ := h1 y
  have h4 := Classical.choose_spec (h1 y).exists
  have hxw : x = w := by
    apply h1u'
    rw [h2]
    exact h1e
  rw [hxw]
  exact h1u' _ h4
          

ImProver (Length-Optimized)


lemma lemma0 {α : Type} {p : α → α → Prop}
  (h1 : ∀ x, ∃! y, p x y) 
  (h2 : ∀ x y, p x y ↔ p y x) :
  ∀ x, Classical.choose 
      (h1 (Classical.choose (h1 x).exists)).exists=x := by
        
  -- PROOF START
  intro x
  obtain ⟨y, h1e, h1u⟩ := h1 x
  rw [h1u _ (Classical.choose_spec _)]
  obtain ⟨w, h1e', h1u'⟩ := h1 y
  rw [h1u' _ ((h2 _ _).mpr h1e)]
  exact h1u' _ (Classical.choose_spec _)
          

We find that naively applying LLMs to proof optimization falls short, and we incorporate various improvements into ImProver, such as the use of symbolic Lean context in a novel Chain-of-States technique, as well as error-correction and retrieval.

We test ImProver on rewriting real-world undergraduate, competition, and research-level mathematics problems, finding that ImProver is capable of rewriting proofs so that they are substantially shorter and more readable compared to the raw LLM baseline.

Metric Model Improvement Nonempty Improvement Accuracy Improved Acc.
Length GPT-4o 3.7 15.15 26.36% 8.31%
Length ImProver 20.96 55.29 100.0% 35.44%
Readability GPT-4o 2.21 8.02 18.75% 6.13%
Readability ImProver 9.34 30.53 100.0% 24.56%

Check out our code and run ImProver yourself here, or keep exploring to learn more about ImProver.

Recent Activity

Compfiles

ImProver is the first AI software to contribute to a major Lean dataset!

Official Release

ImProver has been officially released!

ArXiV

ImProver is officially in preprint

ICLR

ImProver has been submitted for ICLR 2025

MAI @ NeurIPS2025

ImProver has been submitted for the Math for AI workshop at NeurIPS 2025

Github

ImProver is now Open-Source