Evo 2: Open-Source AI Trained on Trillions of DNA Base Pairs Across All Life Domains


Researchers have released Evo 2, an open-source large genome model trained on trillions of base pairs of DNA spanning all three domains of life (bacteria, archaea, and eukaryotes), a major step beyond the bacterial-only Evo model first released in 2024. After training at scale, Evo 2 developed internal representations of complex genomic features in organisms like humans, including regulatory DNA, gene structures, and splice sites, which are difficult even for human experts to identify. The model is freely available for use in genomics research, drug target identification, and protein function prediction.

Key Takeaways

  • Evo 2 was trained on trillions of DNA base pairs from bacteria, archaea, and eukaryotes, making it the first genome foundation model to cover all three domains of life
  • The model learned to identify regulatory sequences, splice sites, and gene structures in complex (eukaryotic) genomes without explicit supervision
  • Open-source release by a team led by the Arc Institute; the predecessor, Evo, was bacterial-only, while Evo 2 extends to human and other complex genomes

Original source: Ars Technica