BRIDGE: Building Representations in Domain-Guided Verified Program Synthesis

Robert Joseph George 1,2, Carson Eisenach2, Udaya Ghai2, Dominique Perrault-Joncas2, Anima Anandkumar1, Dean Foster2
1California Institute of Technology; 2Amazon

Abstract

Large language models can write plausible programs, but plausible code is not the same as verified code. BRIDGE studies verifiable coding: generating code together with the formal artifacts needed to explain, constrain, and eventually prove what the code does. In Lean, this means keeping executable implementations, specifications, theorem statements, and proof attempts aligned instead of treating them as separate outputs.

BRIDGE decomposes this problem into three connected domains: Code, Specification, and Theorem/Proof. The method uses domain-specific reasoning and cross-artifact checks to reduce semantic drift, where code may pass tests while a specification is too weak, a theorem misses the intended claim, or a proof attempt only establishes a narrow property. The current work focuses on Lean executable correctness, specification usefulness, and theorem/proof diagnostics, while recognizing that full semantic proof completion at scale remains the long-term goal.

Across a 178-problem Lean benchmark and public verification benchmarks including VERINA and CLEVER, BRIDGE improves executable correctness by up to 1.5x and can reach comparable success with roughly 2x fewer Lean evaluations. Specification-guided prompting improves code generation by up to 17.5 percentage points, and functional scaffolds make it easier to generate Lean-elaborating theorem statements and completed proof attempts aligned with the implementation.

Key Results

  • Up to 1.5x higher Lean executable correctness on the 178-problem BRIDGE suite compared with direct prompting.
  • Roughly 2x fewer Lean evaluations to reach comparable pass@k success in sample-efficiency experiments.
  • Transfer gains on public verification settings, including VERINA, CLEVER, and a solver-backed DAFNYBENCH pilot.
  • Specification reasoning improves code generation by up to 17.5 percentage points and helps produce formal constraints with semantic bite.
  • Functional scaffolds improve downstream theorem/proof artifacts, including more Lean-elaborating theorem statements and more completed proof attempts under fixed prover budgets.
  • Code and data are available in the BRIDGE GitHub repository.

What BRIDGE Changes

BRIDGE structures verified program synthesis as a pipeline over three connected domains:

BRIDGE pipeline over code, specifications, and theorem statements

Code

Generate Lean implementations that elaborate, satisfy termination and totality checks, and pass benchmark tests.

Specification

Make assumptions, edge cases, and behavioral constraints explicit so the code can be checked against meaningful formal intent.

Theorem/Proof

Generate correctness claims and bounded proof attempts that test whether the implementation can be connected to formal verification artifacts.