Abstract
Large language models can write plausible programs, but plausible code is not the same as verified code. BRIDGE studies verifiable coding: generating code together with the formal artifacts needed to explain, constrain, and eventually prove what the code does. In Lean, this means keeping executable implementations, specifications, theorem statements, and proof attempts aligned instead of treating them as separate outputs.
BRIDGE decomposes this problem into three connected domains: Code, Specification, and Theorem/Proof. The method uses domain-specific reasoning and cross-artifact checks to reduce semantic drift, where code may pass tests while a specification is too weak, a theorem misses the intended claim, or a proof attempt only establishes a narrow property. The current work focuses on Lean executable correctness, specification usefulness, and theorem/proof diagnostics, while recognizing that full semantic proof completion at scale remains the long-term goal.
Across a 178-problem Lean benchmark and public verification benchmarks including VERINA and CLEVER, BRIDGE improves executable correctness by up to 1.5x and can reach comparable success with roughly 2x fewer Lean evaluations. Specification-guided prompting improves code generation by up to 17.5 percentage points, and functional scaffolds make it easier to generate Lean-elaborating theorem statements and completed proof attempts aligned with the implementation.