LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction

1California Institute of Technology
2Princeton University
Published: Transactions on Machine Learning Research (TMLR) 2025

LeanProgress Visualization

LeanProgress Visualization

LeanProgress predicts how many steps remain to complete a proof. We collect proof trees, balance the data distribution, and train a model to estimate remaining steps. The model takes the current proof state as input and outputs the number of remaining steps, which we use to guide search. We also built a predict_steps_with_suggestion tactic for LeanCopilot that shows both tactic suggestions and progress estimates.


Overview

LLMs struggle with mathematical reasoning because they hallucinate. Formal proof assistants like Lean can verify everything, but LLMs still have trouble with long proofs and complex formalizations.


Existing tools help retrieve lemmas, suggest tactics, or generate complete proofs. But they don't tell you how much progress you've made or how many steps are left. This is a problem for large formalization projects where you need to know if you're on the right track.


LeanProgress predicts the number of remaining steps in a proof. When we integrate this into proof search, it helps prioritize promising paths and avoid dead ends. We show a 3.8% improvement on Mathlib4 (from 41.4% to 45.2%).

How It Works

We fine-tune a model to predict remaining steps from proof states. Our DeepSeek Coder 1.3B model achieves 75.8% accuracy with MAE of 3.15. When we use these predictions to guide search, we see a 3.8% improvement on Mathlib4 (from 41.4% to 45.2%).


Step Prediction

We fine-tune DeepSeek Coder 1.3B to predict the exact number of remaining steps from intermediate proof states. The model takes the current state and optional proof history as input.

Progress-Guided Search

We combine step predictions with tactic log probabilities in a best-first search. This helps prioritize paths that are likely to complete faster and avoid unproductive branches.

LeanCopilot Integration

We built predict_steps_with_suggestion into LeanCopilot. It shows tactic suggestions along with how many steps each one would leave, making it easier to choose tactics.

Dataset and Training

Data Collection

We collected about 80,000 proof trajectories from Lean Workbook Plus and Mathlib4 using best-first search with Reprover. The original data was heavily skewed toward short proofs (average 2.47 steps). We balanced it using different sampling ratios based on proof length:

  • 1-5 steps: 0.01 sampling ratio (most proofs are short)
  • 6-10 steps: 0.3 sampling ratio
  • 11-15 steps: 0.5 sampling ratio
  • 16-20 steps: 0.7 sampling ratio
  • 21+ steps: 1.0 sampling ratio (use all long proofs)

After balancing, the average proof length is 10.1 steps, which better represents longer proofs.

Model Training

We fine-tuned DeepSeek Coder 1.3B using MSE loss and AdamW optimizer. The model takes the current proof state and optional proof history, then predicts remaining steps. Training uses batch size 4, learning rate 1e-5, and weight decay 0.01.

Using Proof History

Including proof history helps a lot. Models with history get 75.8% accuracy (MAE 3.15) vs 61.8% accuracy (MAE 5.22) without it. The history tells the model where you've been, which helps predict where you're going.

Results

Prediction Accuracy

Our model gets 75.8% accuracy at predicting remaining steps (MAE 3.15). It works especially well on longer proofs.

Proof Search

When we use step predictions to guide search, we get 45.2% pass rate on Mathlib4 vs 41.4% baseline—a 3.8% improvement. This works by combining step predictions with tactic log probabilities.

Tool Integration

The predict_steps_with_suggestion tactic shows both tactic suggestions and how many steps remain after each one. This makes it easier to pick tactics during interactive proving.


Why This Helps

Existing tools suggest the next tactic, but they don't tell you if you're making progress. LeanProgress gives you a global view: instead of just knowing what to try next, you know how far you are from the goal. This helps you avoid dead ends and focus on promising paths.

Future Work

Step predictions could work as reward signals for reinforcement learning. Instead of only getting feedback at the end (success or failure), you'd get continuous feedback throughout the proof. This might enable better RL training for theorem proving.

Main Contributions

1. Balanced dataset: We built a dataset of ~80k proof trajectories from Lean Workbook Plus and Mathlib4. Since most proofs are short, we used different sampling ratios to balance the data so longer proofs are well represented.


2. Step prediction model: We fine-tuned DeepSeek Coder 1.3B to predict remaining steps from proof states. It achieves 75.8% accuracy (MAE 3.15). Unlike tactic suggestion tools that only look at the next step, this gives a global view of proof progress.


3. Progress-guided search: We combine step predictions with tactic log probabilities in best-first search. This improves Mathlib4 pass rate from 41.4% to 45.2%.


4. Practical tool: We integrated step prediction into LeanCopilot as the predict_steps_with_suggestion tactic. It shows both tactic suggestions and remaining steps, helping users choose better tactics during interactive proving.


5. Foundation for RL: Step predictions provide continuous reward signals throughout the proof. This could enable better reinforcement learning for theorem proving, since you'd get feedback at every step rather than just at the end.