The verification protocol behind the platform.

Name: Verifiable Labs SDK
Author: Stelios Zacharioudakis

Verifiable Labs is built on three primitives: procedural task generation (which makes contamination structurally impossible), executable or closed-form ground truth (which removes human-label drift), and conformal-calibrated uncertainty (every reward is bounded by an interval rather than reported as a point estimate). The same protocol powers the SDK, the audit pipeline, and the V-Certified registry. A peer-reviewed reference is listed below for teams that want the math, alongside the public artefacts and the citation entry.
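The three primitives can be illustrated with a toy sketch. This is not the SDK's API: `generate_task`, `ground_truth`, `noisy_solver`, and `conformal_quantile` are hypothetical names, the linear task family is invented for illustration, and split conformal prediction with absolute-error scores is an assumed calibration method rather than a confirmed detail of the protocol.

```python
import math
import random


def generate_task(seed):
    """Procedurally generate a toy task from a seed (hypothetical task family)."""
    rng = random.Random(seed)
    return {"a": rng.uniform(-5, 5), "b": rng.uniform(-5, 5), "x": rng.uniform(-1, 1)}


def ground_truth(task):
    """Closed-form answer a*x + b: computed, never hand-labelled."""
    return task["a"] * task["x"] + task["b"]


def noisy_solver(task, rng):
    """Stand-in for a model under evaluation: the true answer plus Gaussian noise."""
    return ground_truth(task) + rng.gauss(0.0, 0.1)


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else sorted(scores)[k - 1]


alpha = 0.1
solver_rng = random.Random(0)

# Calibration: nonconformity score = absolute error on procedurally generated tasks.
cal_scores = []
for seed in range(500):
    task = generate_task(seed)
    cal_scores.append(abs(noisy_solver(task, solver_rng) - ground_truth(task)))
q = conformal_quantile(cal_scores, alpha)

# Held-out seeds: the interval [prediction - q, prediction + q] should contain
# the ground truth on roughly 1 - alpha of episodes.
n_test = 1000
covered = 0
for seed in range(500, 500 + n_test):
    task = generate_task(seed)
    pred = noisy_solver(task, solver_rng)
    covered += abs(pred - ground_truth(task)) <= q
coverage = covered / n_test
print(f"q = {q:.3f}, empirical coverage = {coverage:.3f}")
```

Because every task is regenerated from its seed and scored against a computed answer, the calibration and test sets never need to be stored or labelled by hand; the only assumption behind the coverage guarantee in this sketch is that calibration and test episodes are exchangeable.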

Stelios Zacharioudakis
ORCID: 0009-0000-6021-5829
DOI: 10.5281/zenodo.19786415
OpenReview: 4kQ17M7jeg

Key finding: 13 / 15 paired held-out evaluations significant across the cross-env × cross-model matrix

Key finding: p < 10⁻¹⁷, the strongest Wilcoxon result (Llama-3.2-1B + code-humaneval, Δ rel +2064%)

Key finding: 0 / 1500 regressions across all paired test seeds in 15 evaluations
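A paired evaluation of this kind can be sketched as follows. The page names the Wilcoxon test but not the variant, so this assumes a two-sided signed-rank test on per-seed reward pairs, computed with the normal approximation and no tie correction. The data are synthetic, `wilcoxon_signed_rank` is a hypothetical helper rather than the SDK's code, and 100 seeds per evaluation is an assumption (1500 seeds split evenly over 15 evaluations).

```python
import math
import random


def wilcoxon_signed_rank(before, after):
    """Two-sided Wilcoxon signed-rank test via the normal approximation.

    Returns (w_plus, p_value). Zero differences are dropped; the variance
    is not corrected for ties.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, p_value


# Synthetic rewards on the same 100 test seeds, before and after training.
rng = random.Random(0)
before = [rng.uniform(0.1, 0.5) for _ in range(100)]
after = [b + rng.uniform(0.01, 0.3) for b in before]

regressions = sum(a < b for b, a in zip(before, after))
w_plus, p_value = wilcoxon_signed_rank(before, after)
mean_before = sum(before) / len(before)
rel_delta = (sum(after) / len(after) - mean_before) / mean_before

print(f"regressions: {regressions} / {len(before)}")
print(f"W+ = {w_plus:.0f}, p = {p_value:.2e}, relative delta = {rel_delta:+.0%}")
```

Pairing by seed means each difference compares the same procedurally generated task, so task difficulty cancels out and the test only sees the change in reward.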

Selected results

Calibration coverage, capability gaps, and reward distributions from the reference evaluation. Full results, error bars, and ablations live in the public Zenodo record.

[Figure: Classical vs best-LLM mean reward (per domain)]

[Figure: Empirical conformal coverage over episodes]

[Figure: Difficulty (1 − classical reward) vs LLM gap]

[Figure: Classical-vs-LLM significance (p < 0.05); green = significant]

Cite the protocol

For teams that need a citable reference: the Zenodo record is the canonical entry. The code is licensed under Apache-2.0; the document itself is CC-BY-4.0.

citation.bib

@misc{zacharioudakis2026verifiable,
  title         = {Conformal-Calibrated Rewards for Scientific RL:
                    Procedural Regeneration Against Benchmark Contamination},
  author        = {Zacharioudakis, Stelios},
  year          = {2026},
  month         = {April},
  publisher     = {Zenodo},
  version       = {v1},
  doi           = {10.5281/zenodo.19786415},
  url           = {https://zenodo.org/records/19786415},
  note          = {National and Kapodistrian University of Athens}
}

Related work

Conformal Prediction (Vovk et al.)
Compressed Sensing (Donoho)
fastMRI (Zbontar et al.)
LoDoPaB-CT (Leuschner et al.)
GRPO (Shao et al.)
Procedural environments (Cobbe et al.)