Environments

Ten verifiable environments, five scientific domains.

Every environment ships closed-form baselines, calibrated rewards, and frontier-model evals. Filter by domain to find your variant.

Single-turn
Compressed Sensing

Sparse Fourier Recovery

Recover a k-sparse signal from m noisy Fourier measurements. Reference solutions come from classical OMP and L1-minimization baselines.

Classical: 0.812
Best LLM: 0.604 (Claude Haiku 4.5)
+0.208 classical
Prime Intellect
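
To make the OMP baseline mentioned in this card concrete, here is a minimal sketch of orthogonal matching pursuit on a toy partial-Fourier problem. The problem sizes, noise level, and the function name `omp` are assumptions chosen for illustration; this is not the environment's actual baseline or scoring code.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily pick k columns of A to explain y."""
    residual = y.copy()
    support = []
    coef = np.zeros(0, dtype=complex)
    for _ in range(k):
        # Correlate the residual with every column; skip columns already chosen.
        corr = np.abs(A.conj().T @ residual)
        if support:
            corr[support] = 0.0
        support.append(int(np.argmax(corr)))
        # Least-squares refit on the current support, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1], dtype=complex)
    x_hat[support] = coef
    return x_hat, sorted(support)

# Toy instance (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
n, m, k = 64, 32, 3
F = np.fft.fft(np.eye(n)) / np.sqrt(n)           # unitary DFT matrix
rows = rng.choice(n, size=m, replace=False)      # observed frequencies
A = F[rows]
x = np.zeros(n, dtype=complex)
true_support = sorted(rng.choice(n, size=k, replace=False).tolist())
x[true_support] = rng.uniform(1.0, 2.0, size=k)
noise = 0.01 * (rng.standard_normal(m) + 1j * rng.standard_normal(m))
y = A @ x + noise

x_hat, est_support = omp(A, y, k)
print("true support:     ", true_support)
print("estimated support:", est_support)
```

Because each round refits by least squares, the residual never grows from one round to the next, which is what separates OMP from plain matching pursuit.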
Multi-turn
Compressed Sensing

Sparse Fourier Recovery

Multi-turn variant: agent iteratively refines support estimates with feedback per round.

Classical: 0.812
Best LLM: 0.638 (Claude Opus 4)
+0.174 classical
Prime Intellect
Tool-using
Compressed Sensing

Sparse Fourier Recovery

Tool-using variant: agent calls FFT, threshold, and least-squares primitives directly.

Classical: 0.812
Best LLM: 0.671 (Claude Opus 4)
+0.141 classical
Prime Intellect
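
As a rough illustration of how FFT, threshold, and least-squares primitives could be chained, the sketch below back-projects the measurements, keeps the k largest entries, and refits on that support. The tool names (`tool_backproject`, `tool_threshold`, `tool_lstsq`) and the toy problem are hypothetical; the environment's real tool interface is not described here.

```python
import numpy as np

def tool_backproject(y, rows, n):
    """Zero-fill unobserved frequencies and apply the inverse unitary FFT (A^H y)."""
    z = np.zeros(n, dtype=complex)
    z[rows] = y
    return np.fft.ifft(z) * np.sqrt(n)

def tool_threshold(v, k):
    """Return the indices of the k largest-magnitude entries."""
    return sorted(np.argsort(np.abs(v))[-k:].tolist())

def tool_lstsq(A, y, support):
    """Least-squares fit of y using only the columns of A in `support`."""
    coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
    x_hat = np.zeros(A.shape[1], dtype=complex)
    x_hat[support] = coef
    return x_hat

# Toy instance (sizes and noise level are illustrative assumptions).
rng = np.random.default_rng(0)
n, m, k = 64, 32, 3
F = np.fft.fft(np.eye(n)) / np.sqrt(n)
rows = rng.choice(n, size=m, replace=False)
A = F[rows]
x = np.zeros(n, dtype=complex)
x[rng.choice(n, size=k, replace=False)] = rng.uniform(1.0, 2.0, size=k)
y = A @ x + 0.01 * (rng.standard_normal(m) + 1j * rng.standard_normal(m))

proxy = tool_backproject(y, rows, n)      # FFT primitive
support = tool_threshold(proxy, k)        # threshold primitive
x_hat = tool_lstsq(A, y, support)         # least-squares primitive
print("estimated support:", support)
```

A single threshold pass can pick the wrong support when measurements are few, so an agent would plausibly repeat these calls on the residual rather than stop after one round.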
Single-turn
Medical Imaging

CT Reconstruction (LoDoPaB)

Reconstruct low-dose CT slices from sparse-view sinograms. FBP and TV-regularized baselines.

Classical: 0.741
Best LLM: 0.512 (GPT-5)
+0.229 classical
Prime Intellect
Multi-turn
Medical Imaging

CT Reconstruction (LoDoPaB)

Multi-turn LoDoPaB: agent iterates over filter and regularizer choices with PSNR feedback.

Classical: 0.741
Best LLM: 0.534 (GPT-5)
+0.207 classical
Prime Intellect
Single-turn
Medical Imaging

MRI Knee (fastMRI)

Reconstruct knee MRI from undersampled k-space at 4× and 8× acceleration.

Classical: 0.687
Best LLM: 0.488 (Claude Opus 4)
+0.199 classical
Prime Intellect
Multi-turn
Medical Imaging

MRI Knee (fastMRI)

Multi-turn fastMRI: agent refines coil-combine and regularizer choices over rounds.

Classical: 0.687
Best LLM: 0.519 (Claude Opus 4)
+0.168 classical
Prime Intellect
Single-turn
Crystallography

Phase Retrieval

Recover phase from intensity-only measurements. HIO and Fienup-style baselines.

Classical: 0.658
Best LLM: 0.471 (GPT-5)
+0.187 classical
Prime Intellect
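
For readers unfamiliar with HIO, the following is a minimal sketch of Fienup's hybrid input-output iteration on a toy 2-D object with a known support and a non-negativity constraint. The image size, `beta`, iteration count, and the helper `fourier_error` are assumptions for illustration, not the environment's baseline configuration.

```python
import numpy as np

def fourier_error(g, magnitude):
    """Relative mismatch between |FFT(g)| and the measured magnitudes."""
    return np.linalg.norm(np.abs(np.fft.fft2(g)) - magnitude) / np.linalg.norm(magnitude)

def hio(magnitude, support, n_iter=200, beta=0.9, seed=0):
    """Hybrid input-output iteration with support and non-negativity constraints."""
    rng = np.random.default_rng(seed)
    g = rng.random(magnitude.shape) * support
    for _ in range(n_iter):
        G = np.fft.fft2(g)
        # Fourier-domain step: keep the current phase, impose the measured magnitude.
        g_prime = np.real(np.fft.ifft2(magnitude * np.exp(1j * np.angle(G))))
        # Object-domain step: pixels outside the support or negative get the
        # feedback update; all other pixels accept the new estimate.
        violates = (~support) | (g_prime < 0)
        g = np.where(violates, g - beta * g_prime, g_prime)
    return g

# Toy object: a random non-negative patch inside a known square support.
rng = np.random.default_rng(1)
x_true = np.zeros((32, 32))
x_true[12:20, 12:20] = rng.random((8, 8))
support = np.zeros((32, 32), dtype=bool)
support[12:20, 12:20] = True
magnitude = np.abs(np.fft.fft2(x_true))   # intensity-only data (phase discarded)

x_hat = hio(magnitude, support)
print("relative Fourier-magnitude error:", fourier_error(x_hat, magnitude))
```

HIO is not guaranteed to converge, and any solution is only defined up to the usual ambiguities (for example a flip of the object), so the Fourier-magnitude error is a more reliable check than a pixelwise comparison.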
Multi-turn
Crystallography

Phase Retrieval

Multi-turn phase retrieval: agent tunes support and shrinkage parameters across rounds.

Classical: 0.658
Best LLM: 0.498 (Claude Opus 4)
+0.160 classical
Prime Intellect
Single-turn
Image Processing

Super-Resolution (DIV2K ×4)

4× upscaling of natural images. Bicubic, ESPCN, and SRCNN baselines for PSNR/SSIM scoring.

Classical: 0.703
Best LLM: 0.547 (Claude Opus 4)
+0.156 classical
Prime Intellect
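
Since this card scores with PSNR, a short sketch of the metric may help. The example uses nearest-neighbour upsampling as a stand-in (it is not the bicubic baseline), and the `data_range` of 1.0 and the random test image are assumptions. How PSNR maps onto the 0 to 1 reward shown above is not specified here.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(estimate, float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

def upscale_nearest(img, factor=4):
    """Nearest-neighbour upsampling (a stand-in, not the bicubic baseline)."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

# Toy example: downsample a random "image" by 4, upscale it back, and score it.
rng = np.random.default_rng(0)
high_res = rng.random((64, 64))
low_res = high_res[::4, ::4]
restored = upscale_nearest(low_res, factor=4)
print("PSNR (dB):", psnr(high_res, restored))
```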