I2I — Idea to Impact

Mapping How AI Discovers Science.


What is I2I?

I2I (Idea to Impact) is a research initiative examining how far fully autonomous, end-to-end AI systems can carry the scientific process. We evaluate AI scientist platforms across all ten stages of the research lifecycle, from literature review and hypothesis generation through study design, data analysis, manuscript writing, and peer review. Our goal is a rigorous, reproducible benchmark of what today's systems can and cannot do without meaningful human intervention.

Our evaluation is grounded in the AI-READI dataset, a real-world multimodal biomedical dataset collected from 2,280 participants in a diabetes-focused study. We use it to measure whether AI can surface novel findings, produce valid statistical inferences, and generate manuscripts that hold up under expert review.

A framework for the full scientific lifecycle.

An evaluation benchmark across 10 lifecycle stages

We systematically score autonomous AI platforms on each phase of the research process, assigning capability levels that range from assisted to fully autonomous, so that strengths, gaps, and inflection points are clearly visible and comparable across systems.

Real clinical data

By grounding every pipeline run in AI-READI's multimodal patient data — ECG, retinal imaging, continuous glucose monitoring, and more — we ensure our findings reflect performance on the kind of messy, high-dimensional data that defines modern biomedical research.

Human expert validation at the finish line

AI-generated manuscripts and findings are submitted to blind review by domain experts in diabetes research and biomedical AI. This closes the loop between automated output and scientific credibility, giving our benchmark a ground truth that goes beyond automated metrics.

AI can assist science.
It cannot yet do science.

Several AI scientist systems can now generate papers end-to-end. The unsolved problem is whether they can conduct rigorous science on real, heterogeneous, multimodal data — and produce findings that hold up under independent expert review.

End-to-end capability has not been shown to generalize to biomedical data. Existing systems have demonstrated end-to-end execution on curated machine learning tasks with clean, homogeneous inputs. Their performance on real-world multimodal biomedical data, which requires simultaneous reasoning across continuous glucose monitoring, retinal imaging, sleep and activity data, clinical records, and genetic information, remains uncharacterized.

Capability claims outpace independent validation. High-profile demonstrations of AI-generated research are rarely replicated independently on new datasets or validated by domain experts outside the originating lab. I2I addresses this directly by applying systems to an independently curated clinical dataset and subjecting their outputs to structured assessment by domain experts.

No shared standard exists for evaluating AI scientists. The field lacks a common evaluation framework that spans practical performance, technical execution, statistical rigor, and manuscript quality simultaneously. Without a unified benchmark, progress claims are difficult to verify and findings across studies cannot be meaningfully compared.

I2I exists to close this gap with rigorous, transparent measurement.

The Full Scientific Discovery Loop.


Grounded in Real Patient Data.

Multimodal data types included:

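ECG (electrocardiogram), CGM (continuous glucose monitoring), OCT (optical coherence tomography), OCTA (OCT angiography), CFP (color fundus photography), FLIO (fluorescence lifetime imaging ophthalmoscopy), fitness tracker, and clinical labs.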
Visit aireadi.org