LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction

1Institute for Interdisciplinary Information Sciences, Tsinghua University
2California Institute of Technology

LeanProgress Visualization

LeanProgress Visualization

The visualization of LeanProgress. LeanProgress is a lightweight framework that collects the number of remaining steps in proof trees and then balances the data distribution to train the language model. Then LeanProgress takes the proof state as input to generate the remaining steps for each state as a signal for search. LeanProgress also integrates the tactic predict steps in LeanCopilot as a user-friendly tool.


Abstract

Mathematical reasoning remains a significant challenge for Large Language Models (LLMs) due to hallucinations. When combined with formal proof assistants like Lean, these hallucinations can be eliminated through rigorous verification, making theorem proving reliable. However, even with formal verification, LLMs still struggle with long proofs and complex mathematical formalizations.


While Lean with LLMs offers valuable assistance with retrieving lemmas, generating tactics, or even complete proofs, it lacks a crucial capability: providing a sense of proof progress. This limitation particularly impacts the overall development efficiency in large formalization projects. LeanProgress addresses this critical gap by introducing a novel proof progress prediction mechanism that predicts the likelihood of proof completion from intermediate proof states, enabling more intelligent search strategies that significantly improve the efficiency and success rate of automated theorem proving systems.

Key Innovation: Proof Progress Prediction

LeanProgress introduces a groundbreaking approach to theorem proving search by predicting the number of remaining steps toward proof completion. Our method achieves 75.1% overall prediction accuracy and demonstrates a 3.8% improvement on Mathlib4 compared to baseline performance of 41.2%.


Progress Estimation

Fine-tuned DeepSeek Coder 1.3B model that evaluates intermediate proof states and predicts the exact number of remaining steps for completion, with Mean Absolute Error (MAE) of 3.29.

Search Guidance

Intelligent best-first search algorithms that combine progress predictions with log probabilities to prioritize promising proof paths and avoid unproductive exploration branches.

Practical Integration

Native integration with LeanCopilot through the predict_steps_with_suggestion tactic, providing both tactic suggestions and remaining step predictions directly in Lean.

Technical Approach & Dataset

LeanProgress implements a sophisticated data generation and model training pipeline:


Balanced Dataset Construction

We constructed a balanced dataset of approximately 80,000 proof trajectories from Lean Workbook Plus and Mathlib4 using best-first search with the Reprover model. The original dataset exhibited a skewed distribution with an average proof length of 2.47 steps. We addressed this imbalance through strategic sampling ratios:

  • 1-5 steps (Basic): 0.01 sampling ratio
  • 6-10 steps (Intermediate): 0.3 sampling ratio
  • 11-15 steps (Moderate): 0.5 sampling ratio
  • 16-20 steps (Advanced): 0.7 sampling ratio
  • 21+ steps (Expert): 1.0 sampling ratio

This resulted in a more balanced dataset with an average proof length of 10.1 steps, ensuring better representation of longer, more complex proofs.

Model Architecture & Training

We fine-tuned a DeepSeek Coder V1 1.3B base model using Mean Squared Error (MSE) loss with the AdamW optimizer. The model takes proof states and optional proof history as input and predicts the exact number of remaining steps. Key training parameters included batch size of 4, learning rate of 1e-5, and weight decay of 0.01.

Proof History Integration

Our experiments demonstrate the crucial importance of incorporating proof history. Models using both current state and proof history achieved significantly better performance across all proof length ranges, with overall accuracy improving from 61.8% to 75.1% and MAE decreasing from 5.217 to 3.290.

Performance Results & Impact

LeanProgress demonstrates significant improvements across multiple evaluation metrics:


Prediction Accuracy

75.1% overall accuracy in predicting remaining proof steps, with particularly strong performance on longer proofs (76.7% accuracy for 21+ step proofs).

Search Improvement

3.8% improvement on Mathlib4 test dataset compared to baseline Reprover performance of 41.2%, achieved by combining step predictions with log probabilities.

Practical Tool

Integrated predict_steps_with_suggestion tactic in LeanCopilot provides immediate feedback on proof progress and concrete guidance for tactic selection.


Real-World Applications

LeanProgress enables more informed decision-making in interactive theorem proving by providing users with concrete estimates of proof complexity. Unlike traditional tactic suggestion tools that only offer local guidance, LeanProgress provides a global view of the proof process, helping users understand whether they're making progress toward completion or exploring unproductive paths.

Future Potential

The continuous reward signal provided by LeanProgress opens new possibilities for reinforcement learning applications in automated theorem proving. By offering meaningful feedback throughout the proving process rather than just binary success/failure signals, LeanProgress could enable more effective RL training for complex mathematical reasoning tasks.

Key Contributions

1. Balanced Data Generation Pipeline: We developed a systematic approach to address the challenge of skewed proof length distributions by constructing a balanced dataset of ~80k proof trajectories with strategic sampling ratios based on proof complexity.


2. Novel Progress Prediction Model: Our fine-tuned DeepSeek Coder model provides a global view of proof progress by predicting exact remaining steps rather than just immediate next tactics, achieving 75.1% accuracy with MAE of 3.29.


3. Progress-Guided Search Integration: We demonstrated how step predictions can be effectively combined with traditional log probabilities in best-first search frameworks, showing measurable improvements in theorem proving success rates.


4. Practical Tool Development: Integration with LeanCopilot provides a user-friendly interface through the predict_steps_with_suggestion tactic, offering both tactic suggestions and progress feedback directly within the standard Lean workflow.


5. Foundation for Future RL Applications: By providing continuous reward signals throughout the proof process, LeanProgress establishes groundwork for more sophisticated reinforcement learning approaches to automated theorem proving.