I2I grounds all evaluation in the AI-READI dataset, a multimodal collection of 2,280 participants spanning four clinical groups, with data types including continuous glucose monitoring (CGM), retinal imaging, wearables, genetics, and clinical records.
AI-READI is the primary dataset underpinning I2I evaluation. Funded by the NIH Bridge2AI program, it is a flagship, ethically sourced multimodal dataset designed specifically for AI and machine learning research in type 2 diabetes mellitus. The project aims to enroll 4,000 participants across three sites (the University of Washington, the University of Alabama at Birmingham, and UC San Diego), and recruitment is triple-balanced by race/ethnicity, biological sex, and diabetes severity.
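To make "triple-balanced" concrete, the following is a minimal sketch of how one might check balance across the three recruitment axes. It is not part of AI-READI or I2I tooling: the field names (`race_ethnicity`, `sex`, `study_group`), the toy labels, and both helper functions are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical column names; the real dataset's participant table may differ.
KEYS = ("race_ethnicity", "sex", "study_group")

def stratum_counts(participants, keys=KEYS):
    """Count participants in each cell of the cross-tabulation of `keys`."""
    return Counter(tuple(p[k] for k in keys) for p in participants)

def imbalance_ratio(counts, levels):
    """Largest cell count divided by smallest, over every possible cell.

    Returns 1.0 for a perfectly balanced cohort and infinity if any
    cell is empty.
    """
    cells = [counts.get(cell, 0) for cell in product(*levels)]
    smallest = min(cells)
    return float("inf") if smallest == 0 else max(cells) / smallest

# Toy cohort with placeholder labels: 2 x 2 x 2 cells, 3 participants each.
levels = (("A", "B"), ("F", "M"), ("g1", "g2"))
participants = [
    dict(zip(KEYS, cell))
    for cell in product(*levels)
    for _ in range(3)
]

counts = stratum_counts(participants)
ratio = imbalance_ratio(counts, levels)
print(f"{len(counts)} cells, imbalance ratio {ratio:.2f}")
```

A ratio near 1.0 suggests the cohort is evenly spread across every combination of the three axes; larger values flag under-represented cells that an analysis may need to account for.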
Data modalities include continuous glucose monitoring (Dexcom G6), sleep and activity tracking (Garmin), retinal imaging (FLIO, OCTA, CFP), clinical and demographic records, environmental sensing (LeeLab Anura), and genetic data. All data are prepared following the FAIR principles (findable, accessible, interoperable, reusable) and organized using the Clinical Dataset Structure (CDS) standard developed by the project.
For I2I, AI-READI represents the core challenge: can an autonomous AI scientist reason simultaneously across heterogeneous, high-dimensional biomedical data modalities to produce valid scientific findings?