Genomics & AI: From Sequencing the Genome to Personalized Medicine
Sequencing a human genome now takes days and a few hundred dollars. The real bottleneck is interpreting the billions of resulting letters — which variant is benign, which causes disease? AI has moved to the heart of that interpretation problem.
The Human Genome Project was declared complete in 2003, at a cost of billions of dollars and after more than a decade of work. Today, an individual's entire genome can be sequenced in a fraction of that time at a fraction of that cost. But this accessibility created an unexpected bottleneck: producing data became easy, while interpreting it remains hard. A human genome contains, on average, millions of variants (deviations from the reference), and the clinical significance of the vast majority is unclear. This is exactly where AI enters.
Coding Regions: AlphaMissense
"Missense" variants, which swap one amino acid for another in protein-coding regions of the genome, are responsible for a significant share of disease. The problem: of the roughly 4 million missense variants observed in humans, only about 2% have been clinically classified as pathogenic or benign. The rest remain unclassified or sit in the gray zone of "variants of uncertain significance" (VUS), which leaves patients and clinicians alike without a clear answer.
DeepMind's AlphaMissense, published in 2023, adapts AlphaFold's protein-structure knowledge to predict whether a missense variant is pathogenic. The model scores all 71 million possible single-amino-acid substitutions across the human proteome and, according to its authors, assigns 89% of them to either a likely benign or a likely pathogenic class. That coverage could help diagnose rare genetic diseases and, potentially, guide the search for treatments. Independent evaluations report that AlphaMissense correlates better with functional assays of missense effect than prior prediction algorithms.
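To make the classification step concrete, here is a minimal sketch of how a per-variant pathogenicity score between 0 and 1 might be turned into a class label. The two cutoffs (0.34 and 0.564) are assumed here to match the class boundaries reported with the AlphaMissense release and should be checked against the data version in use. The variant names and scores are invented for illustration.

```python
from collections import Counter

# Assumed cutoffs; verify against the release of the scores you actually use.
LIKELY_BENIGN_MAX = 0.34
LIKELY_PATHOGENIC_MIN = 0.564


def classify_score(score: float) -> str:
    """Map a pathogenicity score in [0, 1] to a three-way class label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score < LIKELY_BENIGN_MAX:
        return "likely_benign"
    if score > LIKELY_PATHOGENIC_MIN:
        return "likely_pathogenic"
    return "ambiguous"


# Hypothetical variants and scores, for illustration only.
example_scores = {
    "GENE1:p.Ala10Val": 0.08,
    "GENE1:p.Arg25Trp": 0.91,
    "GENE2:p.Gly77Ser": 0.45,
}

labels = {variant: classify_score(s) for variant, s in example_scores.items()}
summary = Counter(labels.values())  # how many variants fall in each class
```

A label produced this way is still a prediction. It narrows down which variants deserve attention, but it does not replace clinical classification.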
Coding vs. Non-Coding Genome
Only about 2% of the genome codes for protein. The remaining 98% is "non-coding" and was long thought to be "junk DNA." In fact, these regions regulate gene activity, and many disease-associated variants are located precisely here.
The Non-Coding Genome: AlphaGenome
For the vast 98% that AlphaMissense cannot reach, DeepMind released AlphaGenome for academic use in June 2025; the model's details were published in Nature on 28 January 2026. AlphaGenome takes as input a 1-megabase (one-million-letter) DNA sequence and predicts thousands of functional genomic "tracks" at single-base-pair resolution: gene expression, chromatin accessibility, histone modifications, transcription factor binding, chromatin contact maps, and the use of splice sites.
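The input side of such a model can be sketched in a few lines: pick a 1-megabase window around the variant of interest and turn the letters into numbers. This is not AlphaGenome's actual preprocessing. The helper names are hypothetical, and one-hot encoding is assumed only because it is the common convention for sequence-based models.

```python
WINDOW = 1_000_000  # 1 megabase, the input length quoted in the text

# Assumed encoding: one channel per base, in the order A, C, G, T.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def centered_window(variant_pos: int, chrom_length: int, window: int = WINDOW):
    """Return a 0-based half-open (start, end) interval of `window` bases.

    The interval is centered on the variant where possible and shifted
    inward when the variant lies close to either chromosome end.
    """
    if chrom_length < window:
        raise ValueError("chromosome is shorter than the window")
    if not 0 <= variant_pos < chrom_length:
        raise ValueError("variant position is outside the chromosome")
    start = variant_pos - window // 2
    start = max(0, min(start, chrom_length - window))
    return start, start + window


def one_hot(seq: str):
    """Encode a DNA string; unknown bases such as N become all zeros."""
    return [ONE_HOT.get(base, (0, 0, 0, 0)) for base in seq.upper()]


# Hypothetical variant at position 5,000,000 on a 100 Mb chromosome.
start, end = centered_window(5_000_000, 100_000_000)
encoded = one_hot("ACGTN")
```

Centering the window gives the model as much regulatory context as possible on both sides of the variant, which matters because enhancers can sit far from the genes they control.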
The practical meaning: AlphaGenome can predict how a variant in a non-coding region might disrupt gene regulation. AlphaMissense (coding) and AlphaGenome (non-coding) should be thought of as complementary — together they make a much larger fraction of the genome interpretable.
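The usual way to score a variant with a sequence-based model is to predict a track for the reference sequence, predict it again with the variant applied, and compare the two. The sketch below illustrates that pattern with a deliberately trivial stand-in for the model: `toy_track` only marks where a TATA-box-like motif starts. It is a hypothetical placeholder, not anything AlphaGenome computes, and the sequence is invented.

```python
from typing import Callable, List


def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return `seq` with the base at 0-based `pos` changed from ref to alt."""
    if seq[pos] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos]}")
    return seq[:pos] + alt + seq[pos + 1:]


def toy_track(seq: str, motif: str = "TATAAA") -> List[float]:
    """Stand-in for a trained model: 1.0 wherever `motif` starts, else 0.0."""
    return [1.0 if seq.startswith(motif, i) else 0.0 for i in range(len(seq))]


def variant_effect(
    seq: str,
    pos: int,
    ref: str,
    alt: str,
    predict: Callable[[str], List[float]] = toy_track,
) -> float:
    """Sum of per-base differences between the alt and ref predictions."""
    ref_track = predict(seq)
    alt_track = predict(apply_snv(seq, pos, ref, alt))
    return sum(a - r for r, a in zip(ref_track, alt_track))


reference = "GGCCTATAAAGGCGC"  # invented sequence with one motif at index 4

disruptive = variant_effect(reference, 6, "T", "G")  # breaks the motif
neutral = variant_effect(reference, 0, "G", "A")     # leaves the motif intact
```

With a real model, `predict` would return thousands of tracks rather than one, and the comparison would be made per track and per tissue. The reference-versus-alternate structure stays the same.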
Shortening the "Diagnostic Odyssey" for Rare Diseases
For a child with a rare genetic disease, reaching the right diagnosis takes, on average, 5–7 years in many cohorts — a "diagnostic odyssey" of years of uncertainty for the family and delayed treatment. Whole-genome/exome sequencing has the potential to shorten this process, but the resulting flood of data can overwhelm clinicians without ready access to bioinformatics expertise.
This is where AI-based variant prioritization tools come in. Long-established tools such as Exomiser combine a patient's sequencing data with clinical findings encoded as Human Phenotype Ontology (HPO) terms to rank candidate variants. More recent AI systems such as Fabric GEM and DeepRare are reported to outperform Exomiser in certain comparisons, improving the ability to interpret raw genomic files and identify likely disease-causing variants. Especially in neonatal intensive care, rapid exome/genome sequencing combined with AI-assisted analysis can uncover treatable metabolic diseases or epileptic encephalopathies within days.
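The core idea of phenotype-aware prioritization can be sketched as a combined score: how damaging a variant looks, blended with how well the affected gene's known phenotypes match the patient's. This is not Exomiser's algorithm. The Jaccard overlap, the equal weighting, the gene names, and the gene-to-term associations below are all simplifying assumptions. Real tools use richer, ontology-aware similarity measures.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Candidate:
    variant: str
    gene: str
    variant_score: float  # predicted deleteriousness in [0, 1]


def phenotype_score(patient_terms: Set[str], gene_terms: Set[str]) -> float:
    """Jaccard overlap between patient HPO terms and a gene's HPO terms."""
    if not patient_terms or not gene_terms:
        return 0.0
    return len(patient_terms & gene_terms) / len(patient_terms | gene_terms)


def rank(
    candidates: List[Candidate],
    patient_terms: Set[str],
    gene_to_terms: Dict[str, Set[str]],
    weight: float = 0.5,  # assumed blend between variant and phenotype evidence
) -> List[Candidate]:
    """Sort candidates by a weighted blend of the two scores, best first."""
    def combined(c: Candidate) -> float:
        pheno = phenotype_score(patient_terms, gene_to_terms.get(c.gene, set()))
        return weight * c.variant_score + (1 - weight) * pheno

    return sorted(candidates, key=combined, reverse=True)


# Patient findings: seizure and global developmental delay.
patient = {"HP:0001250", "HP:0001263"}

# Hypothetical gene-to-phenotype associations, for illustration only.
gene_to_terms = {
    "GENE_A": {"HP:0001250", "HP:0001263", "HP:0000252"},
    "GENE_B": {"HP:0001252"},
}

candidates = [
    Candidate("GENE_B:c.100A>G", "GENE_B", 0.9),
    Candidate("GENE_A:c.200C>T", "GENE_A", 0.6),
]

ranked = rank(candidates, patient, gene_to_terms)
```

In this example the `GENE_A` variant ranks first even though its variant score is lower, because its gene matches the patient's phenotype. That reordering is the benefit of bringing clinical findings into the analysis.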
Right Drug, Right Dose: Pharmacogenomics
One of the most concrete applications of personalized medicine is pharmacogenomics: selecting drugs and doses based on the individual's genetic makeup. People metabolize drugs differently; the same dose can be ineffective in one patient and toxic in another.
The classic example is the CYP2D6 enzyme, which metabolizes about a quarter of clinically used drugs and whose activity varies widely between individuals. Researchers succeeded in modeling enzyme activity on a continuous scale, rather than in a handful of discrete metabolizer categories, using a neural network trained on full CYP2D6 gene sequences. When applied to patients taking CYP2D6-substrate drugs such as tamoxifen or venlafaxine, the model improved prediction of individual drug response. At a broader scale, models such as DeepDRA (2024) combine transcriptomic and genomic data and report high performance in predicting drug response.
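For contrast, the conventional categorical approach that a continuous model aims to refine can be sketched as follows: each allele gets a fixed activity value, the two values are summed, and the sum is binned into a metabolizer phenotype. The allele values and bin edges are assumed to follow the commonly used CPIC-style scheme and should be checked against current guideline tables. Gene duplications and hybrid alleles are not handled here.

```python
# Assumed per-allele activity values (illustrative subset only).
ALLELE_ACTIVITY = {
    "*1": 1.0,    # normal function
    "*2": 1.0,    # normal function
    "*4": 0.0,    # no function
    "*5": 0.0,    # gene deletion
    "*10": 0.25,  # decreased function
}


def activity_score(diplotype: str) -> float:
    """Sum the allele activity values of a diplotype such as '*1/*4'."""
    alleles = diplotype.split("/")
    if len(alleles) != 2:
        raise ValueError(f"expected two alleles, got {diplotype!r}")
    return sum(ALLELE_ACTIVITY[allele] for allele in alleles)


def phenotype(score: float) -> str:
    """Bin an activity score into a metabolizer category (assumed cutoffs)."""
    if score == 0:
        return "poor metabolizer"
    if score < 1.25:
        return "intermediate metabolizer"
    if score <= 2.25:
        return "normal metabolizer"
    return "ultrarapid metabolizer"


results = {d: phenotype(activity_score(d)) for d in ("*1/*2", "*1/*4", "*4/*5")}
```

The binning step discards information: two patients in the same category can still differ in enzyme activity. Sequence-based models that predict activity on a continuous scale aim to recover that lost detail.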
AI's Roles in Genomic Interpretation
| Task | Notable AI tool | Clinical benefit |
|---|---|---|
| Pathogenicity of coding variants | AlphaMissense (2023) | Reduce VUS, contribute to diagnosis |
| Effect of non-coding variants | AlphaGenome (2025–26) | Predict disruptions of gene regulation |
| Rare-disease variant prioritization | Exomiser, DeepRare, Fabric GEM | Shorten the diagnostic odyssey |
| Drug response / dose | CYP2D6 neural net, DeepDRA | Improve efficacy, reduce side effects |
Responsibility and Limits
The power of these tools is proportional to how responsibly they are used. The outputs of models such as AlphaMissense and AlphaGenome are predictions, not definitive diagnoses; clinical classification of a variant requires expert interpretation within frameworks such as ACMG/AMP, alongside family history and functional evidence. Models may be less reliable in certain populations and in data-poor regions; representation gaps in training data bring corresponding equity risks. Genetic data is also deeply personal; privacy, consent, and data security are inseparable ethical issues of this era.
That said, the picture is clear: AI is becoming the key to turning genome sequencing from a pile of data into clinically actionable information. Personalized medicine is moving from promise to practice, in part because of this interpretive capability.
References
- Cheng J. et al. "Accurate proteome-wide missense variant effect prediction with AlphaMissense." Science (2023).
- "Predicting variant pathogenicity with AlphaMissense." Nature Reviews Genetics.
- Google DeepMind. "AlphaGenome: AI for better understanding the genome." deepmind.google (2025).
- "Google's AlphaGenome predicts the function of a DNA sequence." C&EN / ACS (January 2026).
- "AI-Based Tool Helps Diagnose Rare Diseases (DeepRare)." Inside Precision Medicine.
- "An optimized variant prioritization process for rare disease diagnostics: Exomiser and Genomiser." medRxiv (2025).
- "Toward predicting CYP2D6-mediated variable drug response from CYP2D6 gene sequencing data." Science Translational Medicine.
- "Artificial Intelligence and Multi-Omics in Pharmacogenomics: A New Era of Precision Medicine." PMC12381589.