BRIDGE: Building Representations In Domain Guided Program Synthesis

Robert Joseph George 1,2, Carson Eisenach2, Udaya Ghai2, Dominique Perrault-Joncas2, Anima Anandkumar1, Dean Foster2
1California Institute of Technology; 2Amazon
Updated: arXiv v2 (Feb 25, 2026)

Abstract

Large language models (LLMs) are good at generating code, but remain brittle for formal verification in systems like Lean4. A core scalability challenge is that verified synthesis requires consistent outputs across multiple artifacts: executable code, precise specifications, theorem statements, and ultimately proofs. Existing approaches rarely treat these as a unified pipeline. We present BRIDGE, a structured prompting framework that decomposes verification into three interconnected domains: Code (implementations), Specifications (formal intent), and Theorem Statements (constructive correctness claims), and elicits domain-specific intermediate reasoning to connect them. In Lean4, BRIDGE often adopts a code-first workflow, using the generated implementation as a semantic anchor for downstream specification and theorem statement generation. Across 178 algorithmic problems and five LLMs, BRIDGE improves Lean executable correctness by nearly 1.5x (pass at 5) over direct baselines and can be 2x more sample-efficient at inference time, requiring fewer samples per verified solution at comparable generation lengths. We further find that specification-driven prompting improves Python pass rates by up to 17.5 percent. Beyond inference-time prompting, supervised fine-tuning on BRIDGE-style reasoning traces yields nearly 1.5x higher Lean pass success than code-only SFT, indicating that these intermediate representations are learnable. BRIDGE provides a practical foundation for scaling verified synthesis and motivates future work on expert iteration and full proof generation.

Key Results

  • Across 178 algorithmic problems and five LLMs, nearly 1.5× higher Lean executable correctness (pass@5) vs. direct baselines.
  • Up to 2× better sample efficiency at inference time (fewer samples per verified solution at comparable generation lengths).
  • Specification-driven prompting improves Python pass rates by up to 17.5%.
  • Supervised fine-tuning on BRIDGE-style reasoning traces yields nearly 1.5× higher Lean pass success than code-only SFT.

What BRIDGE Changes

BRIDGE structures verified program synthesis as a pipeline over three connected domains:

BRIDGE pipeline over code, specifications, and theorem statements

Code

Generate an implementation that can be executed and checked, serving as a semantic anchor for downstream artifacts.

Specifications

Elicit precise formal intent that matches the implementation and enables automated checking in Lean.

Theorem Statements

Produce constructive correctness claims that connect the specification to the implementation and set up proof generation.