Publications — I2I | Idea to Impact

Paper 1

Can AI Do Science? Evaluating Autonomous AI Systems on Real Biomedical Data

We put leading AI scientist systems to the test — on real patient data, evaluated by real experts.

In Preparation

The Question

AI systems can now write research papers from start to finish. But can they actually do science? This paper investigates whether today's autonomous AI scientist systems can work with complex, real-world biomedical data and produce findings that a human expert would consider valid and meaningful.

What We Built

We developed a framework that maps the full arc of scientific research, from forming a hypothesis to publishing findings, and used it to survey and classify the current landscape of AI scientist systems. We then selected three leading systems and ran them on the AI-READI dataset: 2,280 patients and 3.82 TB of data spanning glucose monitoring, retinal imaging, wearables, genetics, and more.

How We Evaluated Them

Each system's output was reviewed at two levels. First, does the paper hold together — is it internally consistent and logically sound? Second, does it contain real scientific value — would a domain expert consider the findings meaningful? We also compared AI-generated reviews against human expert judgment to understand where automated evaluation succeeds and where it falls short.

Publications

More publications are on the way.