
Technical foundation

The verification protocol behind the platform.

Verifiable Labs is built on three primitives: procedural task generation (contamination structurally impossible), executable or closed-form ground truth (no human-label drift), and conformal-calibrated uncertainty (every reward bounded, not point-estimated). The same protocol powers the SDK, the audit pipeline, and the V-Certified registry. A peer-reviewed reference is below for teams that want the math, alongside the public artefacts and the citation entry.
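To make the first two primitives concrete, here is a minimal Python sketch of a seeded task generator whose ground truth is known by construction. The task family (denoising a sparse signal), the function names `generate_task` and `reward`, and all sizes are assumptions chosen for illustration; they are not the platform's SDK.

```python
import math
import random


def generate_task(seed, n=32, k=3, noise=0.05):
    """Return (observed, truth) for a k-sparse signal of length n.

    Hypothetical task family: the instance is a pure function of the seed,
    so a fresh seed yields a fresh instance that cannot appear in any corpus.
    """
    rng = random.Random(seed)  # private RNG: same seed, same task
    truth = [0.0] * n
    for i in rng.sample(range(n), k):
        truth[i] = rng.choice([-1.0, 1.0]) * rng.uniform(0.5, 1.0)
    observed = [x + rng.gauss(0.0, noise) for x in truth]
    return observed, truth


def reward(estimate, truth):
    """Score an estimate against the constructed truth; result lies in [0, 1]."""
    err = math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)))
    norm = math.sqrt(sum(t * t for t in truth))  # > 0 because k >= 1
    return max(0.0, 1.0 - err / norm)


observed, truth = generate_task(seed=42)
# A trivial baseline solver: zero out small entries of the observation.
baseline_estimate = [x if abs(x) > 0.25 else 0.0 for x in observed]
print(reward(baseline_estimate, truth))
```

Because the truth is produced by the generator rather than by annotators, scoring needs no human labels, and re-running with the same seed reproduces the same task exactly.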

Key finding: 13 / 15 paired held-out evaluations significant across the cross-env × cross-model matrix
Key finding: p < 10⁻¹⁷, the strongest Wilcoxon result (Llama-3.2-1B + code-humaneval, Δ rel +2064%)
Key finding: 0 / 1500 regressions across all paired test seeds in 15 evaluations
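For readers who want to see what a paired evaluation of this kind involves, the sketch below runs a one-sided Wilcoxon signed-rank test and counts regressions on per-seed rewards. It is an illustration only: the reward values are made up, the function names are hypothetical, and the p-value uses the plain normal approximation (no tie or continuity correction), which may differ from the procedure used in the reference evaluation.

```python
import math


def wilcoxon_one_sided(baseline, treated):
    """Return (W+, p) for H1: treated > baseline, via the normal approximation."""
    diffs = [t - b for b, t in zip(baseline, treated) if t != b]  # drop zeros
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank |differences| from 1..n, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return w_plus, p_value


def count_regressions(baseline, treated):
    """Number of paired seeds on which the treated model scored lower."""
    return sum(1 for b, t in zip(baseline, treated) if t < b)


# Made-up rewards for ten paired seeds (illustrative, not the reported data).
baseline = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.15, 0.25]
treated = [0.12, 0.25, 0.31, 0.47, 0.59, 0.64, 0.76, 0.83, 0.23, 0.35]

w_plus, p_value = wilcoxon_one_sided(baseline, treated)
regressions = count_regressions(baseline, treated)
print(w_plus, p_value, regressions)
```

Pairing by seed means each comparison is made on the same task instance, so the test responds to per-instance improvement rather than to differences between task draws.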

Selected results

Calibration coverage, capability gaps, and reward distributions from the reference evaluation. Full results, error bars, and ablations live in the public Zenodo record.

[Figure: Classical vs best-LLM mean reward (per domain). Domains: Sparse Fourier, CT (LoDoPaB), MRI Knee, Phase Retrieval, Super-Res DIV2K. Series: Classical, Best LLM. Reward axis: 0.00 to 1.00.]
[Figure: Empirical conformal coverage over episodes. Horizontal axis: Episode (×100). Coverage axis: 0.70 to 1.00, with target = 0.90.]
[Figure: Difficulty (1 − classical reward) vs LLM gap. Environments: SF-1, SF-2, SF-3, CT-1, CT-2, MRI-1, MRI-2, PR-1, PR-2, SR-1. Series: Difficulty, LLM gap.]
[Figure: Classical-vs-LLM significance (p < 0.05); green = significant. Models: Opus, Sonnet, Haiku, GPT-5, Gemini 2.5. Environments: SF-1, SF-2, SF-3, CT-1, CT-2, MRI-1, MRI-2, PR-1, PR-2, SR-1.]
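The empirical coverage curve can be read as the output of a check like the following: calibrate an interval half-width with split conformal prediction, then measure how often the true reward falls inside the interval, 100 episodes at a time. This is a generic sketch under stated assumptions (absolute-residual scores, synthetic rewards, hypothetical function names), not the calibration code behind the reference evaluation.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based order statistic
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def windowed_coverage(predicted, actual, q, window=100):
    """Per-window fraction of episodes with |actual - predicted| <= q."""
    hits = [abs(a - p) <= q for p, a in zip(predicted, actual)]
    chunks = [hits[i:i + window] for i in range(0, len(hits), window)]
    return [sum(chunk) / len(chunk) for chunk in chunks]


def simulate(rng, n):
    """Synthetic (predicted, true) reward pairs; purely illustrative."""
    predicted = [rng.uniform(0.0, 1.0) for _ in range(n)]
    actual = [p + rng.gauss(0.0, 0.1) for p in predicted]
    return predicted, actual


rng = random.Random(0)
cal_pred, cal_true = simulate(rng, 1000)    # held-out calibration episodes
test_pred, test_true = simulate(rng, 1000)  # episodes to be covered

alpha = 0.10  # miscoverage level, matching the target coverage of 0.90
q = conformal_quantile([abs(t - p) for p, t in zip(cal_pred, cal_true)], alpha)
coverage = windowed_coverage(test_pred, test_true, q)
overall = sum(coverage) / len(coverage)
print(q, overall)
```

Under exchangeability of calibration and test episodes, split conformal prediction guarantees marginal coverage of at least 1 − alpha, so the per-window values should hover near the target rather than drift away from it.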

Cite the protocol

For teams that need a citable reference, the Zenodo record is the canonical entry. The code is licensed under Apache-2.0; the document itself is CC-BY-4.0.

citation.bib
@misc{zacharioudakis2026verifiable,
  title         = {Conformal-Calibrated Rewards for Scientific RL:
                    Procedural Regeneration Against Benchmark Contamination},
  author        = {Zacharioudakis, Stelios},
  year          = {2026},
  month         = {April},
  publisher     = {Zenodo},
  version       = {v1},
  doi           = {10.5281/zenodo.19786415},
  url           = {https://zenodo.org/records/19786415},
  note          = {National and Kapodistrian University of Athens}
}

Related work

Conformal Prediction (Vovk et al.)
Compressed Sensing (Donoho)
fastMRI (Zbontar et al.)
LoDoPaB-CT (Leuschner et al.)
GRPO (Shao et al.)
Procedural environments (Cobbe et al.)