Explore Data
Research Datasets

The Data Behind the Science.

I2I grounds all evaluation in the AI-READI dataset, a multimodal collection that currently covers 2,280 participants across four clinical groups, with data types including continuous glucose monitoring (CGM), retinal imaging, wearables, genetics, and clinical records.

Cross-Sectional · Multimodal
AI-READI
Artificial Intelligence Ready and Equitable Atlas for Diabetes Insights

AI-READI is the primary dataset underpinning I2I evaluation. Funded by the NIH Bridge2AI program, it is a flagship, ethically sourced multimodal dataset designed specifically for AI and machine learning research in type 2 diabetes mellitus. The dataset targets 4,000 participants recruited across three sites (the University of Washington, the University of Alabama at Birmingham, and UC San Diego) and is triple-balanced by race/ethnicity, biological sex, and diabetes severity.
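As a rough illustration of what "triple-balanced" means in practice, the sketch below counts participants in every combination of the three balancing variables and reports how even the cells are. The field names (`race_ethnicity`, `sex`, `severity`), the category labels, and the toy records are all hypothetical placeholders, not the dataset's actual schema or values.

```python
from collections import Counter
from itertools import product


def stratum_counts(participants, keys):
    """Count participants in each combination of the balancing variables."""
    return Counter(tuple(p[k] for k in keys) for p in participants)


def balance_ratio(counts, levels):
    """Smallest cell count divided by the largest; 1.0 means perfectly balanced.

    Cells with no participants are counted as zero.
    """
    cells = [counts.get(cell, 0) for cell in product(*levels)]
    largest = max(cells)
    return min(cells) / largest if largest else 0.0


# Hypothetical field names and category labels, for illustration only.
KEYS = ("race_ethnicity", "sex", "severity")
LEVELS = (("A", "B"), ("F", "M"), ("none", "insulin"))

# Toy data: three synthetic records in each of the 2 x 2 x 2 cells.
toy = [dict(zip(KEYS, cell)) for cell in product(*LEVELS) for _ in range(3)]

counts = stratum_counts(toy, KEYS)
print(balance_ratio(counts, LEVELS))
```

A ratio near 1.0 indicates that every race/ethnicity by sex by severity cell holds a similar number of participants; a ratio near 0 flags at least one sparse or empty cell.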

Data modalities include continuous glucose monitoring (Dexcom G6), sleep and activity tracking (Garmin), retinal imaging (FLIO, OCTA, CFP), clinical and demographic records, environmental sensing (LeeLab Anura), and genetic data. All data are formatted according to FAIR principles and organized using the Clinical Dataset Structure (CDS) standard developed by the project.

For I2I, AI-READI represents the core challenge: can an autonomous AI scientist reason simultaneously across heterogeneous, high-dimensional biomedical data modalities to produce valid scientific findings?

Participants: 2,280 (current) / 4,000 (target)
Data Size: 3.82 TB
Design: Cross-sectional
Modalities: CGM, Retinal Imaging, Wearables, Genetics, Clinical, Environmental
Access: Controlled access via FAIRhub
Primary Focus: Type 2 diabetes salutogenesis

One Dataset. One Unified Benchmark.
