How dedicated AI systems are solving problems that human researchers could not, from protein folding to new materials to autonomous laboratories.
This guide is written by Yuma Heymans (@yumahey), founder of o-mega.ai and creator of the AI Agent Index tracking 600+ autonomous AI systems. His research into AI agent architectures informs how autonomous systems are reshaping scientific discovery.
In October 2024, two Nobel Prizes were awarded for AI work. The Physics prize went to John Hopfield and Geoffrey Hinton for foundational neural network research. The Chemistry prize went to Demis Hassabis, John Jumper, and David Baker for protein structure prediction and computational protein design - NobelPrize.org. This was the first time in Nobel history that AI-driven work claimed science's most prestigious awards. It signaled something the research community had been feeling for years: AI is no longer a tool that assists scientists. In specific domains, it is the scientist.
This guide is not about using ChatGPT to summarize papers or having an LLM do a literature review. Those are useful but trivial applications. This guide is about a fundamentally different category: dedicated AI systems that solve scientific problems humans could not solve on their own. Systems that discover new molecules, predict protein structures, design materials that never existed, run millions of experiments autonomously, and contribute mathematical proofs that stumped human experts for decades.
The landscape has shifted dramatically in 2025 and 2026. AI-discovered drugs are entering Phase II clinical trials. Autonomous laboratories run 2.2 million experiments per week. A genomic foundation model trained on 9.3 trillion nucleotides can predict the functional impact of genetic mutations without ever being told what a mutation does. An AI system achieved gold-medal performance at the International Mathematical Olympiad. And a startup raised $550 million specifically to build "AI science factories" that run 24/7 with minimal human intervention.
What follows is a deep dive into every major domain where AI is doing real science, not merely assisting it.
Contents
- Protein folding and design: the problem that started it all
- Drug discovery: from target to clinic in months
- Self-driving laboratories: AI that runs the experiments
- Materials science: 800 years of discovery in one model
- Genomics and cell biology: foundation models for life
- Weather and climate: replacing physics with prediction
- Mathematics: AI that proves and discovers
- Chemistry: automated synthesis and reaction prediction
- Fusion energy and physics: controlling plasma with RL
- The AI Scientist: end-to-end autonomous research
- Where AI-driven science actually fails
- The platforms and infrastructure behind it all
- What comes next
1. Protein Folding and Design: The Problem That Started It All
The protein folding problem was one of the grand challenges of biology for fifty years. Proteins are chains of amino acids that fold into complex 3D shapes, and those shapes determine what the protein does. Knowing the structure tells you how a drug might bind to it, how a disease works, and how to engineer new proteins for industrial or therapeutic purposes. But determining a single protein's structure experimentally used to take years of X-ray crystallography or cryo-electron microscopy work, and the results were often incomplete.
AlphaFold 2 solved this in 2020. Not approximately. Not for a subset of cases. For the practical purposes of biology, it solved the problem - Nature. DeepMind's system predicted the 3D structure of virtually all 200 million known proteins with accuracy competitive with experimental methods. What previously took months per protein now takes minutes. Over 2 million researchers across 190 countries have used the AlphaFold Protein Structure Database. The scientific impact is difficult to overstate: it gave structural biologists immediate access to structures they might have spent their entire careers trying to determine.
The approach is not a standard LLM. AlphaFold uses a custom neural architecture called the Evoformer, which processes multiple sequence alignments (evolutionary relationships between proteins) and pairwise residue interactions simultaneously. It combines attention mechanisms with geometric reasoning about 3D coordinates. The training data is the evolutionary history of proteins across billions of years, not text.
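One concrete Evoformer ingredient is the "outer product mean," which converts per-residue features from the multiple sequence alignment into a pairwise residue representation. The sketch below is a minimal NumPy illustration of that idea, with toy dimensions and random features; it is not AlphaFold's actual implementation.

```python
import numpy as np

def outer_product_mean(msa: np.ndarray) -> np.ndarray:
    """Convert an MSA representation (n_seqs, n_res, c) into a pair
    representation (n_res, n_res, c*c) by averaging outer products
    of per-residue features across sequences."""
    n_seqs, n_res, c = msa.shape
    # pair[i, j, c, d] = mean over sequences of msa[:, i, c] * msa[:, j, d]
    pair = np.einsum("sic,sjd->ijcd", msa, msa) / n_seqs
    return pair.reshape(n_res, n_res, c * c)

# Toy MSA: 4 aligned sequences, 5 residues, 3 features per residue
rng = np.random.default_rng(0)
msa = rng.normal(size=(4, 5, 3))
pair = outer_product_mean(msa)
print(pair.shape)  # (5, 5, 9)
```

The resulting tensor captures how residue i and residue j co-vary across evolutionary history, which is the statistical signal that pairwise contact prediction relies on.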
AlphaFold 3 arrived in May 2024, and it expanded the scope dramatically. Where AlphaFold 2 predicted individual protein structures, AlphaFold 3 models entire biomolecular complexes: proteins interacting with DNA, RNA, small molecules, ions, and modified residues - Nature. It replaced the structure module with a diffusion model (the same class of generative model behind image generators like Stable Diffusion, but applied to 3D molecular coordinates). The accuracy improvements over specialized tools are substantial: far better for protein-ligand interactions than state-of-the-art molecular docking, and significantly higher accuracy for antibody-antigen prediction than its predecessor AlphaFold-Multimer v2.3. The source code was released for non-commercial use in November 2024, with public availability following in February 2025.
But the story does not stop at structure prediction. The more consequential development is protein design: using AI to create entirely new proteins that do not exist in nature.
AlphaProteo, announced by DeepMind in September 2024, designs novel protein binders (proteins that attach to specific targets) with 3x to 300x better binding affinities than the best existing methods - DeepMind. On the viral protein BHRF1, 88% of candidate molecules bound successfully. Most remarkably, AlphaProteo was the first system, human or AI, to successfully design a binder for VEGF-A, a protein target relevant to cancer and macular degeneration that had resisted all previous design attempts. Binders for the SARS-CoV-2 spike protein were confirmed to block viral infection in human cell studies conducted at the Francis Crick Institute.
Parallel to DeepMind's work, David Baker's lab at the University of Washington (Baker shared the 2024 Nobel) developed RFdiffusion, a generative model for de novo protein design - Nature. When combined with ProteinMPNN (for sequence design) and AlphaFold 2 (for validation), it forms a complete computational pipeline: imagine a protein that does X, generate its 3D structure, design the amino acid sequence, and validate that the sequence folds correctly. In 2025, RFdiffusion was extended to antibody design, generating antibodies that bind user-specified molecular targets with atomic precision - Nature.
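The generate-design-validate loop described above can be sketched as a filtering pipeline. Every function below is a hypothetical stand-in (the real stages would be RFdiffusion, ProteinMPNN, and AlphaFold); the point is the shape of the loop: generate many candidates, keep only those the validator scores highly.

```python
import random

random.seed(7)

# Hypothetical stand-ins for the three stages: a structure generator
# (RFdiffusion's role), a sequence designer (ProteinMPNN's role), and
# a structure-prediction validator (AlphaFold's role).
def generate_backbone():
    return {"backbone_id": random.randrange(10_000)}

def design_sequence(backbone):
    seq = "".join(random.choice("ACDEFGHIKLMNPQRSTVWY") for _ in range(20))
    return {"seq": seq, **backbone}

def predicted_confidence(design):
    # Stand-in for a pLDDT-style confidence score in [0, 100].
    return random.uniform(40, 95)

def design_campaign(n_candidates=50, cutoff=80.0):
    """Generate candidates and keep only those the validator trusts."""
    accepted = []
    for _ in range(n_candidates):
        design = design_sequence(generate_backbone())
        score = predicted_confidence(design)
        if score >= cutoff:
            accepted.append((design["seq"], round(score, 1)))
    return accepted

hits = design_campaign()
print(f"{len(hits)} of 50 designs passed the confidence filter")
```

In practice the validator is the expensive step, so real pipelines batch and rank candidates before committing any of them to wet-lab synthesis.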
RFdiffusion3, released in 2025, is described by Baker's lab as "the most powerful and versatile protein engineering technology to date." It can now design enzymes with performance nearly on par with those found in nature, something the field has been working toward for decades. Applications already demonstrated include enzymes for breaking down plastic, removing greenhouse gases from air, and neutralizing venomous snake bites - GeekWire.
The open-source ecosystem has grown rapidly. Boltz-2 (MIT license) is the first deep learning model to approach the accuracy of physics-based free energy perturbation methods while running 1000x faster. Chai-1 from Chai Discovery and OpenFold from AQ Laboratory provide additional open alternatives. The democratization of protein structure prediction and design is accelerating faster than nearly any previous scientific capability.
What makes this entire domain different from "using AI to help scientists" is that these systems are doing the science. AlphaProteo does not assist a human designer. It designs proteins that no human could design using existing methods. RFdiffusion generates 3D structures from scratch that have no natural template. The AI is the discoverer.
2. Drug Discovery: From Target to Clinic in Months
The traditional drug discovery pipeline takes 10-15 years and $2-3 billion to bring a single new drug to market. AI is compressing both the timeline and the cost by orders of magnitude, and the results are no longer theoretical. They are in clinical trials.
Insilico Medicine's rentosertib is the most advanced AI-discovered drug in the world. Both the drug target and the molecule were identified using AI. The target was discovered via Insilico's PandaOmics platform (which analyzes multi-omics data to find disease mechanisms), and the molecule was designed via Chemistry42 (a generative AI system that creates novel molecular structures). The entire process from target discovery to Phase I clinical trial took 18 months, compared to the typical 6 years. The estimated cost was roughly $40 million, versus $400 million for traditional approaches.
The Phase IIa trial results, published in Nature Medicine in June 2025, showed that patients with idiopathic pulmonary fibrosis receiving the 60mg dose had a +98.4 mL improvement in forced vital capacity, compared to a -20.3 mL decline in the placebo group - Insilico Medicine. That is a clinically meaningful difference for a devastating lung disease. In March 2026, Eli Lilly signed a deal worth up to $2.75 billion ($115 million upfront) for worldwide rights to develop and commercialize certain Insilico compounds - CNBC.
Isomorphic Labs, DeepMind's drug discovery spinoff, is applying AlphaFold-derived technology to pharmaceutical development. It has partnerships with Eli Lilly and Novartis valued at nearly $3 billion combined. Several lead candidates for oncology and immune-mediated disorders are in IND-enabling studies, with first-in-human trials expected by late 2026.
The scale at which AI drug discovery now operates is staggering. Recursion Pharmaceuticals, which merged with Exscientia in July 2025, runs approximately 2.2 million experiments per week through its BioHive-2 supercomputer (504 NVIDIA H100 GPUs, 2 exaflops of AI performance). The system generates 135 terabytes of multi-timepoint cell images weekly, creating massive phenomics datasets that AI models mine for drug-disease relationships. Five clinical programs are advancing. Over $500 million in upfront and milestone payments have been earned - NVIDIA.
The broader pipeline is enormous and growing. As of 2026, more than 200 AI-originated drugs are in clinical development, up from just 3 in 2016 and 67 in 2023. Phase I success rates for AI-discovered compounds are running at 80-90%, compared to the historical average of approximately 52% - BioMed Nexus. The AI drug discovery market was valued at $1.9 billion in 2025 and is projected to reach $2.6 billion in 2026.
One of the most compelling demonstrations comes from MIT's antibiotic discovery work. In 2020, researchers used AI to screen over 100 million chemical compounds and identified halicin, a molecule that kills drug-resistant bacteria (C. difficile, A. baumannii, M. tuberculosis) through a completely novel mechanism: disrupting the electrochemical gradient across bacterial membranes - MIT News. No human had identified this mechanism or this molecule. In 2023, they found abaucin, an antibiotic specific to A. baumannii that works through a different novel mechanism (lipoprotein trafficking interference). By 2025, the team shifted to generative AI, designing entirely novel antibiotic compounds that do not exist in any chemical library - MIT News. This progression from screening existing molecules to generating new ones mirrors the broader trajectory of the field.
No AI-discovered drug has yet received FDA approval. That milestone is likely within the next 2-3 years, with Takeda's zasocitinib and Insilico's rentosertib as the leading candidates. But the volume, speed, and cost advantages are already reshaping how pharmaceutical companies allocate R&D budgets. We explored how AI agents are transforming business operations in our guide to AI-generated business processes, and drug discovery is one of the most capital-intensive domains where this transformation is measurable in dollars and clinical outcomes.
3. Self-Driving Laboratories: AI That Runs the Experiments
The most radical shift in how science is done is not a better model or a faster algorithm. It is the autonomous laboratory: a facility where AI designs experiments, robotic systems execute them, and the AI analyzes results and decides what to run next, continuously, with minimal human oversight.
Lila Sciences, launched by Flagship Pioneering in March 2025, represents the most ambitious bet in this space. It raised $200 million in seed capital, then followed with a $350 million Series A, bringing total funding to $550 million at a valuation exceeding $1.3 billion. The company is building what it calls "AI Science Factories": automated labs packed with robotics, sensors, and specialized AI models that run experiments nonstop - Flagship Pioneering. The chief scientist is George Church, a CRISPR pioneer and one of the most influential geneticists alive. Lila has already generated novel antibodies, peptides, and binders that outperform commercially available therapeutics.
The concept is not entirely new, but the scale and autonomy are. Emerald Cloud Lab operates one of the world's largest commercial cloud laboratories, where researchers can design experiments remotely and have robotic systems execute them. Carnegie Mellon University partnered with ECL to create the first fully remote, AI-integrated lab accessible to students and researchers - Nature.
The Coscientist project from Carnegie Mellon (2023) was the first demonstration of a non-organic intelligent system that designs, plans, and executes a real chemistry experiment. Using GPT-4 and Claude connected to robotic lab APIs, it went from a plain-language prompt ("synthesize aspirin") to a complete experimental protocol, executed the experiment, and analyzed the results - Nature. This was published in Nature and demonstrated that the gap between "AI that suggests experiments" and "AI that does experiments" had been closed.
North Carolina State University published work in Nature Chemical Engineering (2025) on a self-driving flow chemistry lab that collects 10x more data per experiment than conventional approaches. Nature named self-driving laboratories one of the top technologies to watch in 2025.
The A-Lab at Lawrence Berkeley National Laboratory is an autonomous robotic laboratory that synthesizes new materials based on AI predictions. Over 17 days of continuous automated operation, it attempted 58 synthesis recipes and successfully created 41 new compounds: a 71% success rate with zero human intervention - Lawrence Berkeley. The materials it synthesized were predicted by DeepMind's GNoME system, creating a closed loop: AI predicts new materials, autonomous lab creates them.
This closed-loop paradigm (hypothesize, experiment, analyze, repeat) is what distinguishes self-driving labs from simple automation. Traditional lab automation executes predefined protocols. Self-driving labs decide what to do next based on what they learned from the last experiment. The AI is not just the hands. It is the mind directing the hands. And it operates at speeds and scales that human-directed research cannot match.
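The hypothesize-experiment-analyze-repeat loop can be sketched in a few lines. This is a deliberately simple illustration, not any lab's actual software: the "experiment" is a hidden yield curve, the "model" is a quadratic fit to the observations so far, and each iteration proposes the most promising untried condition.

```python
import numpy as np

def true_yield(temp):
    # Hidden ground truth the lab "measures": yield peaks at 65 °C.
    return -0.02 * (temp - 65.0) ** 2 + 90.0

grid = np.linspace(20, 120, 201)   # candidate temperatures
tried = [30.0, 100.0, 55.0]        # seed experiments
observed = [true_yield(t) for t in tried]

for _ in range(5):  # closed loop: model -> propose -> experiment -> update
    coeffs = np.polyfit(tried, observed, deg=2)    # cheap surrogate model
    surrogate = np.polyval(coeffs, grid)
    surrogate[np.isin(grid, tried)] = -np.inf      # don't repeat experiments
    next_temp = float(grid[np.argmax(surrogate)])  # propose most promising
    tried.append(next_temp)
    observed.append(true_yield(next_temp))         # robot runs the experiment

best = tried[int(np.argmax(observed))]
print(f"best temperature found: {best:.1f} °C")
```

Real self-driving labs replace the quadratic fit with Bayesian optimization or learned surrogates, but the structure is the same: every result immediately reshapes the choice of the next experiment.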
Whether there are systems that can run a thousand experiments in parallel on a specific problem is no longer a hypothetical question. Recursion runs 2.2 million experiments per week. That is over 300,000 per day, far exceeding a thousand in parallel. The constraint is no longer computational or even robotic. It is the physical throughput of the laboratory equipment and the cost of reagents. As these facilities scale, the number of experiments an AI can orchestrate per day will continue to grow.
4. Materials Science: 800 Years of Discovery in One Model
The discovery of new materials has historically been slow, serendipitous, and expensive. Finding a new crystal structure, alloy, or catalyst typically involved years of trial-and-error experimentation guided by chemical intuition. AI has compressed that timeline by orders of magnitude.
GNoME (Graph Networks for Materials Exploration), released by Google DeepMind in 2023, discovered 2.2 million new crystal structures - Nature. To put that number in context, the total number of known inorganic crystals accumulated by all human scientists across all of history was approximately 48,000. GNoME multiplied the known stable materials by a factor of nearly 50. Of these, 380,000 are the most stable and promising for experimental synthesis, and they were contributed to the Materials Project database for any researcher to access. Independent labs have already created 736 of these predicted structures, confirming their physical existence.
The approach is a graph neural network (not an LLM) that processes crystal structures as graphs, where atoms are nodes and bonds are edges. It was trained on existing materials databases and learned to predict which hypothetical compositions would form stable structures. The model does not generate text. It generates crystal structures directly, predicting atomic positions and lattice parameters.
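The atoms-as-nodes, bonds-as-edges idea can be made concrete with a minimal message-passing sketch in NumPy. This is a toy illustration of the general graph-neural-network pattern, not GNoME's architecture: each round, every atom aggregates features from its bonded neighbours, and the graph is pooled into a single score.

```python
import numpy as np

def gnn_stability_score(node_feats, adjacency, w_msg, w_self, n_rounds=2):
    """Minimal message passing: each round, every atom (node) sums
    messages from bonded neighbours (edges); the graph is then
    mean-pooled into a single scalar score."""
    h = node_feats
    for _ in range(n_rounds):
        messages = adjacency @ h @ w_msg   # aggregate neighbour features
        h = np.tanh(h @ w_self + messages) # update node states
    return float(h.mean())                 # graph-level readout

# Toy "crystal": 4 atoms bonded in a ring, 3 features per atom
rng = np.random.default_rng(1)
feats = rng.normal(size=(4, 3))
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
w_msg, w_self = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
score = gnn_stability_score(feats, adj, w_msg, w_self)
print(round(score, 4))
```

In a trained model the weights would be learned so that the readout predicts formation energy or stability; here they are random, so only the information flow is meaningful.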
MatterGen from Microsoft, published in Nature in January 2025, takes the inverse approach. Instead of screening a vast space of possible structures, it directly generates novel materials meeting specified property requirements - Nature. This is the difference between searching a haystack and manufacturing the needle. Generated structures are 2x more likely to be new and stable compared to previous generative models, and 10x closer to the local energy minimum (meaning they are more physically realistic). A novel material predicted by MatterGen (TaCr2O6) was synthesized in a real lab and experimentally confirmed to match the predicted structure.
MatterSim, also from Microsoft (2024), handles atomistic simulation across the periodic table at temperatures from 0 to 5,000 Kelvin and pressures from 1 atmosphere to 10 million atmospheres - Microsoft Research. It achieved 10x improvement in accuracy at finite temperatures and pressures compared to previous state-of-the-art, with 90-97% data savings for complex simulation tasks. This matters because real materials operate at specific temperatures and pressures, not at the idealized zero-Kelvin conditions that many models assume.
The materials science AI pipeline now looks like this: GNoME or similar systems predict stable structures. MatterGen generates materials with specific desired properties. MatterSim simulates their behavior under real-world conditions. And the A-Lab (or similar autonomous facilities) synthesizes and validates the most promising candidates. The entire loop, from prediction to physical material, can now run with minimal human intervention.
5. Genomics and Cell Biology: Foundation Models for Life
The concept of foundation models (large neural networks pre-trained on massive datasets and then fine-tuned for specific tasks) has migrated from language to biology, with models trained not on text but on the raw molecular sequences of life.
Evo 2, published in Nature in March 2026 by the Arc Institute, is the largest AI model in biology. It was trained on 9.3 trillion nucleotides from 128,000+ whole genomes spanning all three domains of life (bacteria, archaea, eukaryotes) - Nature. It has a 1 million token context window with single-nucleotide resolution, meaning it can process an entire human chromosome-scale sequence and reason about how individual base pairs affect function.
What makes Evo 2 remarkable is what it can do without task-specific training. Without being explicitly taught what mutations are harmful, it predicts the functional impact of genetic variation. It correctly identifies noncoding pathogenic mutations (mutations in "junk DNA" that actually cause disease), a category that most existing tools struggle with because these mutations do not change protein sequences. It identifies clinically significant BRCA1 variants (relevant to breast cancer risk). It does this by learning the statistical structure of genomes across all of life, such that deviations from that structure signal functional disruption.
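The underlying trick, scoring a variant by how much it lowers the sequence's likelihood under a model of normal genomes, can be shown with a toy stand-in. The bigram model below is vastly simpler than Evo 2, but the logic is the same: a mutation that breaks the learned statistical structure gets a negative log-likelihood change.

```python
import math
from collections import Counter

def train_bigram(seq):
    """Fit a character-level bigram model with add-one smoothing."""
    pairs = Counter(zip(seq, seq[1:]))
    unig = Counter(seq[:-1])
    return lambda a, b: math.log((pairs[(a, b)] + 1) / (unig[a] + 4))

def log_likelihood(seq, logp):
    return sum(logp(a, b) for a, b in zip(seq, seq[1:]))

# Toy "reference genome" with a strong local pattern
reference = "ATATATATATGCGCGCGCGC" * 50
logp = train_bigram(reference)

window  = "ATATATATAT"
variant = "ATATACATAT"   # single-nucleotide substitution T -> C

delta = log_likelihood(variant, logp) - log_likelihood(window, logp)
print(f"log-likelihood change: {delta:.2f}")  # negative => disruptive
```

No label ever told the model which substitutions are harmful; the variant scores poorly simply because "AC" never occurs in the reference. Evo 2 applies the same likelihood-ratio logic with a vastly richer model and genome-scale context.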
scGPT, published in Nature Methods in 2024, is a foundation model for single-cell analysis trained on 33+ million single-cell RNA-sequencing profiles - Nature Methods. It performs cell type annotation, perturbation analysis, gene network discovery, and can predict the transcriptional consequences of genetic perturbations. It can also work in reverse: given a desired cellular state, it can identify genetic interventions likely to achieve that state. This reverse-engineering capability is directly relevant to cell regeneration and stem cell therapy, where the core challenge is guiding cells to differentiate into specific tissue types.
What about cell regeneration specifically? This intersection is active but still early. Stem cell therapy has evolved into a $28 billion clinical reality as of 2026. AI is being used to predict mesenchymal stem cell differentiation (which cell types they will become), immunomodulatory function (how they interact with the immune system), and therapeutic potential by analyzing multi-omics and imaging data. Deep learning models based on cell morphology can now predict differentiation propensity before conducting experiments, potentially eliminating months of wet-lab work per cell line. Phase II/III trials using iPSC-derived cardiomyocytes for heart failure and iPSC-derived dopaminergic neurons for Parkinson's disease reported promising interim results in 2025.
IonQ and CCRM announced a strategic partnership in 2026 specifically combining quantum computing with AI for regenerative medicine, with initial projects in Canada and Sweden - IonQ. This is speculative but directionally interesting: quantum simulation of molecular interactions combined with AI-driven experimental design could accelerate the identification of factors that trigger specific cell differentiation pathways.
The pattern across genomics is consistent with what we see in protein science and drug discovery: dedicated models trained on biological data (not text) discover patterns that human analysis missed. Our guide to looking inside LLMs explains the foundational technology, but these biological models represent a fundamentally different application of the same architectural principles: instead of compressing human language, they compress the language of life.
6. Weather and Climate: Replacing Physics With Prediction
Weather forecasting was one of the first domains where AI demonstrably outperformed the state of the art, and in December 2025, it became one of the first domains where AI models were deployed operationally by a national agency.
GenCast from Google DeepMind is a probabilistic weather model based on conditional diffusion. On a comprehensive evaluation of 1,320 forecast targets at 1-15 day lead times, GenCast outperformed ECMWF's 51-member ensemble (the previous gold standard) on 97.2% of targets. Beyond 36 hours, it beat the ensemble on 99.8% of targets - DeepMind. This is a probabilistic model, meaning it does not just predict "it will rain." It predicts probability distributions over possible weather outcomes, enabling better risk assessment for extreme events.
Huawei's Pangu-Weather matched or outperformed ECMWF's high-resolution deterministic model (HRES) on 73% of evaluation metrics, with lower tropical cyclone track errors beyond 48 hours - arXiv.
The operational milestone came on December 17, 2025, when NOAA deployed three AI weather models for production use: AIGFS, AIGEFS, and HGEFS - NOAA. A single 16-day AIGFS forecast uses only 0.3% of the computing resources of the traditional GFS model and finishes in approximately 40 minutes. AIGEFS extends forecast skill by an additional 18-24 hours compared to the traditional GEFS ensemble. These AI models complement rather than replace physics-based models, but the efficiency gain is extraordinary: comparable accuracy at 0.3% of the compute cost.
The approach is fundamentally different from traditional weather modeling. Physics-based models (like GFS) solve the Navier-Stokes equations numerically on a grid, simulating the physical dynamics of the atmosphere. AI weather models learn the mapping from current weather state to future weather state directly from historical data, without explicit physics. They treat weather forecasting as a pattern completion problem: given today's atmospheric state, what does the atmosphere typically do next?
This works because weather is a chaotic but statistically regular system. The same fundamental forces (solar heating, Coriolis effect, moisture dynamics) produce recognizable patterns that a neural network can learn from decades of reanalysis data. The AI does not understand atmospheric physics. It has learned the statistical structure of atmospheric evolution, which turns out to be sufficient for operationally useful forecasts.
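The "learn the state-transition map directly from data" idea can be demonstrated on a toy dynamical system. The sketch below stands in for the real thing at cartoon scale: the "atmosphere" is a 2D linear system, the "reanalysis archive" is a set of (state, next state) pairs, and the forecast is an autoregressive rollout of the learned operator with no physics anywhere.

```python
import numpy as np

# Ground-truth "atmosphere": a simple linear dynamical system (a rotation)
theta = 0.1
A_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])

# Historical "reanalysis" data: pairs (state_t, state_{t+1})
rng = np.random.default_rng(3)
X = rng.normal(size=(2, 200))
Y = A_true @ X

# Learn the state-transition map directly from data (no physics)
A_learned = Y @ np.linalg.pinv(X)

# Autoregressive forecast: apply the learned step repeatedly
state = np.array([1.0, 0.0])
for _ in range(10):
    state = A_learned @ state

truth = np.linalg.matrix_power(A_true, 10) @ np.array([1.0, 0.0])
print(np.round(state, 4), np.round(truth, 4))
```

Real AI weather models replace the linear least-squares fit with deep networks and millions of atmospheric states, but the forecasting mode is the same rollout: today's state in, tomorrow's state out, repeated.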
The key limitation: AI weather models underperform on extreme weather events, precisely because extreme events are rare in training data - Nature. A model optimized for compression of typical weather patterns will, by definition, be less accurate on atypical patterns. This is the same compression trade-off we discussed in our guide on what the AI model wants: the model's drive toward parsimony sometimes fails at the extremes.
7. Mathematics: AI That Proves and Discovers
Mathematics has always seemed like a domain where AI would struggle, because mathematical proof requires exact logical reasoning, not the fuzzy pattern matching that neural networks excel at. Recent results have challenged that assumption dramatically.
AlphaProof and AlphaGeometry 2 from DeepMind solved 4 of 6 problems at the 2024 International Mathematical Olympiad, achieving silver-medal level performance. AlphaProof solved the competition's hardest problem (Problem 6, a number theory question), which only 5 out of 609 human contestants solved - Nature. AlphaProof works by combining a language model (that translates natural language math problems into Lean 4 formal statements) with a reinforcement learning system (that searches for proofs).
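To make "translates natural language into Lean 4 formal statements" concrete, here is what a Lean 4 statement and proof look like. This is an illustrative example only (it assumes Mathlib and is far simpler than any IMO problem), not an actual AlphaProof translation; AlphaProof's job is to produce statements of this form and then search for terms that prove them.

```lean
-- Illustrative Lean 4 (with Mathlib): a sum of squares is nonnegative.
-- A formal statement is machine-checkable, so a found proof is
-- guaranteed correct, which is what makes proof search tractable for RL.
theorem sum_sq_nonneg (a b : ℤ) : 0 ≤ a ^ 2 + b ^ 2 :=
  add_nonneg (sq_nonneg a) (sq_nonneg b)
```

The machine-checkability is the key design choice: because Lean verifies every proof step, the reinforcement learning system gets an unambiguous reward signal, unlike natural-language reasoning where correctness is hard to judge automatically.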
Then in July 2025, Gemini Deep Think raised the bar further: it solved 5 of 6 IMO problems, earning 35 out of 42 points and achieving gold-medal standard. Only 67 of 630 human contestants earned gold that year. It operated entirely in natural language within the standard 4.5-hour time limit, with no formal proof system needed - DeepMind.
Beyond competition math, AI is making genuine mathematical discoveries. FunSearch (DeepMind, 2023) used LLMs to find new solutions to the cap set problem, an open question in combinatorial geometry, and discovered more effective bin-packing algorithms. These were the first LLM-based discoveries on open mathematical problems.
AlphaEvolve (DeepMind, May 2025) is an evolutionary coding agent powered by Gemini. It found an algorithm for multiplying 4x4 complex-valued matrices using 48 scalar multiplications, improving upon Strassen's 1969 algorithm. No human had found this improvement in 55 years - DeepMind. AlphaEvolve also solved 35+ open math problems and has been deployed in production at Google, where it recovers 0.7% of worldwide compute resources through better data center orchestration, achieves 23% speedup in kernel tiling, and 32% speedup in FlashAttention for AI training.
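The evolutionary loop behind systems like FunSearch and AlphaEvolve (propose a variant, score it, keep improvements) can be sketched at toy scale. Here the "program" being evolved is just three integer coefficients and the mutations are random single-coefficient tweaks; the real systems mutate actual code with an LLM, but the selection loop has the same shape.

```python
import random

random.seed(42)

# Target behaviour the evolved candidate must reproduce
def target(x):
    return 3 * x * x - 2 * x + 5

xs = range(-5, 6)

def score(coeffs):
    """Lower is better: squared error against the target on test points."""
    a, b, c = coeffs
    return sum((a * x * x + b * x + c - target(x)) ** 2 for x in xs)

def mutate(coeffs):
    """Randomly nudge one coefficient by +/-1 (stand-in for code mutation)."""
    i = random.randrange(3)
    child = list(coeffs)
    child[i] += random.choice([-1, 1])
    return tuple(child)

# Evolutionary loop: mutate the incumbent, keep anything at least as good
best = (0, 0, 0)
for _ in range(2000):
    child = mutate(best)
    if score(child) <= score(best):
        best = child

print(best, score(best))
```

The automated scorer is what makes this work: because every candidate can be evaluated objectively and cheaply, the loop can run thousands of generations without human judgment in the inner loop.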
OpenAI's GPT-5 contributed to solving Erdős Problem 848, a decades-old open problem in combinatorial number theory. Mathematicians Sawhney and Sellke were stuck on the final step of their proof until GPT-5 suggested how a specific odd number breaks the symmetry pattern. GPT-5 also scored 94.6% on the 2025 AIME (American Invitational Mathematics Examination) without using any external tools. Four new mathematical results carefully verified by human authors emerged from GPT-5 interactions - OpenAI.
The distinction between "AI assisting math" and "AI doing math" is blurring. When AlphaEvolve discovers an algorithm that humans could not find in 55 years of trying, and that algorithm is provably correct, the AI has done mathematics. When GPT-5 provides the key insight that unlocks a decades-old conjecture, it has contributed to a mathematical discovery. These are not marginal improvements on existing methods. They are novel results.
8. Chemistry: Automated Synthesis and Reaction Prediction
Beyond drug discovery, AI is transforming fundamental chemistry by automating the prediction and execution of chemical reactions.
IBM RXN for Chemistry predicts chemical reaction outcomes with 90% accuracy. Through a collaboration with Thieme Chemistry, forward prediction accuracy improved by 3x and retrosynthesis accuracy (predicting what starting materials yield a desired product) by 9x - IBM Research. RoboRXN extends this by connecting AI prediction to robotic execution: the system predicts a synthesis route, then directs a robotic lab to actually make the molecule.
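Retrosynthesis is naturally a backward search: start from the target molecule and repeatedly apply disconnection templates until everything in the tree is purchasable. The sketch below uses a tiny hand-written template table (real systems learn these mappings from millions of published reactions) and molecule names instead of actual chemical structures.

```python
from collections import deque

# Toy retrosynthesis: templates map a product to required precursors.
# Real systems learn these disconnections from reaction databases.
templates = {
    "aspirin": ["salicylic_acid", "acetic_anhydride"],
    "salicylic_acid": ["phenol", "co2"],
}
stock = {"phenol", "co2", "acetic_anhydride"}  # purchasable building blocks

def retrosynthesis_route(target):
    """Breadth-first backward search from target to stock compounds."""
    route, frontier = [], deque([target])
    while frontier:
        mol = frontier.popleft()
        if mol in stock:
            continue            # already purchasable, no work needed
        if mol not in templates:
            return None         # dead end: no known disconnection
        precursors = templates[mol]
        route.append((mol, precursors))
        frontier.extend(precursors)
    return route

route = retrosynthesis_route("aspirin")
print(route)
```

Systems like RXN score many competing disconnections at every node rather than following a single fixed table, which turns this search into a ranking problem that neural models are good at.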
Chemify, founded by Lee Cronin at the University of Glasgow, raised $50 million in Series B funding in October 2025 and opened the world's first "Chemifarm" for automated chemical synthesis - C&EN. Chemify's approach is distinctive: it uses Chemputation, a machine learning model built on a programming language called XDL (Chemical Description Language) that translates molecular characteristics into executable instructions for synthesis robots. In April 2026, three papers published simultaneously in PNAS, Nature Communications Chemistry, and Nature Communications Biology validated the approach across different chemical domains - PharmiWeb. Chemify has partnerships with 6 of the 20 biggest pharmaceutical companies.
The Chemify model is worth examining because it shows how AI for chemistry is not just about prediction. It is about closing the loop: predict what to make, generate the synthesis instructions, execute them robotically, analyze the results, and iterate. The "Chemifarm" concept is essentially a self-driving chemistry lab specialized for synthesis at scale.
This closed-loop approach is spreading across chemistry. The same pattern we see in drug discovery (AI predicts, robots execute, AI analyzes, repeat) is appearing in catalyst design, polymer science, battery chemistry, and agricultural chemistry. The throughput advantage over human-directed research is not 2x or 5x. It is measured in orders of magnitude, because autonomous systems can run experiments 24 hours a day, 7 days a week, and each cycle feeds directly into the next without the latency of human interpretation.
9. Fusion Energy and Physics: Controlling Plasma With RL
Fusion energy has been "30 years away" for decades. AI is helping close that gap, not by solving the physics, but by solving the control problem.
The core challenge with fusion reactors (tokamaks) is keeping the plasma stable. Plasma at 100+ million degrees Celsius is held in place by powerful magnetic fields generated by superconducting coils, and the shape and position of the plasma must be controlled in real time with millisecond precision. Traditional control systems use pre-programmed algorithms that cannot adapt to the chaotic dynamics of plasma behavior.
In 2022, DeepMind and EPFL published a landmark paper in Nature demonstrating the first deep reinforcement learning system to autonomously control tokamak coils and maintain plasma stability - Nature. Tested on the Variable Configuration Tokamak (TCV) in Lausanne, the RL agent learned to contain plasma and even sculpt it into different configurations (including a rare "droplet" shape) that would be extremely difficult to achieve with traditional control.
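The control-by-trial-and-error idea can be illustrated at cartoon scale. The sketch below is not the DeepMind controller: the "plasma" is a one-dimensional unstable system, the "coil command" is a single feedback gain, and training is plain random search over gains, scored by how well each keeps the state centered.

```python
import random

random.seed(5)

def episode_cost(k, steps=50):
    """Simulate a toy unstable 'plasma position' x_{t+1} = 1.1*x_t + u_t
    under feedback u_t = -k * x_t; cost penalises drift from centre."""
    x, cost = 1.0, 0.0
    for _ in range(steps):
        u = -k * x           # controller action (stand-in for coil commands)
        x = 1.1 * x + u      # unstable open-loop dynamics
        cost += x * x
    return cost

# Trial-and-error "training": sample controller gains, keep the best.
best_k, best_cost = None, float("inf")
for _ in range(200):
    k = random.uniform(0.0, 2.0)
    c = episode_cost(k)
    if c < best_cost:
        best_k, best_cost = k, c

print(f"learned gain k = {best_k:.3f}, episode cost = {best_cost:.4f}")
```

The real system learns a deep policy over 92 measurement channels and 19 coils in a high-fidelity simulator before ever touching hardware, but the core loop is the same: act, observe the consequence, prefer policies that keep the plasma where it should be.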
TORAX (2024) is DeepMind's open-source plasma simulator for modeling core tokamak plasma dynamics, predicting changes in temperature, density, and electric current. In 2025, DeepMind partnered with Commonwealth Fusion Systems, one of the leading private fusion companies, to apply TORAX to their SPARC tokamak. The simulator will allow CFS to run millions of virtual plasma experiments before SPARC is turned on, and adapt their operational plans once real data arrives - DeepMind.
Beyond fusion, AI is making contributions across physics. At CERN, machine learning algorithms were instrumental in the discovery of the Higgs boson, the observation of the ultra-rare decay of the Bs meson into two muons, and the measurement of single top-quark production - CERN. The ATLAS and CMS collaborations now use AI for anomaly detection in collision data, searching for signals of new particles that deviate from Standard Model predictions.
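The anomaly-detection idea reduces to flagging events that deviate too far from the expected background. Real analyses use learned density models or autoencoders over full detector readouts; this hedged sketch uses a single measured quantity and a plain sigma threshold, with all numbers invented for illustration.

```python
import random

def anomaly_scan(events, background_mean, background_std, threshold=4.0):
    """Flag events whose measured quantity deviates from the expected
    Standard Model background by more than `threshold` standard deviations."""
    return [e for e in events
            if abs(e - background_mean) / background_std > threshold]

rng = random.Random(42)
background = [rng.gauss(100.0, 5.0) for _ in range(1000)]  # SM-like events
signal = [150.0, 148.5]                                    # injected anomalies
flagged = anomaly_scan(background + signal, 100.0, 5.0)
print(sorted(flagged)[-2:])  # the two injected outliers are recovered
```

The learned versions replace the fixed Gaussian background with a model of what "ordinary" collisions look like, but the logic is the same: surprise relative to expectation is the signal.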
Symbolic regression represents another angle: AI that discovers physical laws from data. AI-Feynman (MIT) uses physics-inspired methods to rediscover known laws. More recent work (November 2025, Nature Computational Science) on parallel symbolic enumeration evaluates millions of expressions simultaneously, outperforming previous methods for distilling mathematical relationships from experimental measurements - Nature Computational Science. Rather than fitting data to a known equation, these systems search the space of all possible equations to find the simplest one that fits the data, essentially rediscovering physics from raw observations.
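The search-over-equations idea can be shown with a deliberately tiny sketch: enumerate a fixed library of candidate expressions, keep the ones that fit the data, and return the simplest. The library, complexity scores, and data are illustrative assumptions; real systems compose and evaluate millions of expressions in parallel.

```python
import math

# Candidate building blocks: (name, function, complexity score).
CANDIDATES = [
    ("x",       lambda x: x,           1),
    ("x^2",     lambda x: x * x,       2),
    ("x^3",     lambda x: x ** 3,      3),
    ("sin(x)",  math.sin,              2),
    ("exp(x)",  math.exp,              2),
    ("1/2*x^2", lambda x: 0.5 * x * x, 3),
]

def symbolic_search(xs, ys, tol=1e-9):
    """Return the simplest candidate expression that fits the data exactly
    (within tol). A toy version of symbolic enumeration."""
    fits = []
    for name, f, complexity in CANDIDATES:
        mse = sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        if mse < tol:
            fits.append((complexity, name))
    return min(fits)[1] if fits else None

# "Experimental" data secretly generated by kinetic energy, E = 1/2 * x^2.
xs = [0.5, 1.0, 1.5, 2.0]
ys = [0.5 * x * x for x in xs]
print(symbolic_search(xs, ys))  # -> 1/2*x^2
```

Preferring the simplest fitting expression is the parsimony bias that makes the recovered formula read like a law rather than a curve fit.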
10. The AI Scientist: End-to-End Autonomous Research
Can AI do the entire scientific process? Design the research question, review the literature, formulate hypotheses, run experiments, analyze results, and write up findings?
Sakana AI's "The AI Scientist" is the most ambitious attempt at this. The original version (2024) was a fully automated pipeline for end-to-end paper generation at a cost of approximately $6-$15 per manuscript - Sakana AI. AI Scientist-v2 produced the first fully AI-generated paper to pass rigorous human peer review, receiving an average reviewer score of 6.33, placing it in the top 45% of submissions - Nature.
The system works by chaining LLM calls: one agent reviews the literature, another formulates hypotheses, another designs experiments, another writes and executes code to run those experiments, another analyzes the results, and a final agent writes the paper. The entire pipeline is automated.
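A minimal sketch of that chaining, under loud assumptions: each stage below is a placeholder function where the real system makes LLM calls (and, for the experiment stage, writes and executes code), and the stage names are illustrative, not Sakana's actual module names. What matters is the shape: a shared research state threaded through a fixed sequence of agents.

```python
from typing import Callable

# Each stage would wrap an LLM call in the real system; here they are
# placeholders that transform a shared "research state" dict.
def literature_review(state: dict) -> dict:
    state["related_work"] = f"prior work on {state['topic']}"
    return state

def formulate_hypothesis(state: dict) -> dict:
    state["hypothesis"] = f"method X improves {state['topic']}"
    return state

def run_experiments(state: dict) -> dict:
    state["results"] = {"baseline": 0.71, "method_x": 0.78}  # stub numbers
    return state

def write_paper(state: dict) -> dict:
    r = state["results"]
    state["paper"] = (f"Hypothesis: {state['hypothesis']}. "
                      f"Result: {r['method_x']:.2f} vs {r['baseline']:.2f}.")
    return state

PIPELINE: list[Callable[[dict], dict]] = [
    literature_review, formulate_hypothesis, run_experiments, write_paper,
]

def ai_scientist(topic: str) -> str:
    state = {"topic": topic}
    for stage in PIPELINE:   # each agent's output feeds the next
        state = stage(state)
    return state["paper"]

print(ai_scientist("image denoising"))
```

The brittleness discussed below follows directly from this structure: an error in any stage propagates to every stage after it, and nothing in the pipeline knows how to step back and question an earlier stage's output.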
The limitations are instructive. 42% of experiments failed due to coding errors. Novelty assessments were poor, with the system sometimes misclassifying established concepts as novel discoveries. The papers it produces are competent but not groundbreaking. This is the state of the art in "general-purpose AI scientist" systems: they can execute the mechanical steps of the scientific process, but they lack the deep domain intuition that drives truly novel research.
Google's AI Co-Scientist (February 2025) takes a more targeted approach. Built on Gemini 2.0, it is a multi-agent system that generates hypotheses, experimental protocols, and research overviews, but it does not execute experiments itself. Instead, it reduces hypothesis generation from weeks to days and feeds suggestions to human researchers for validation - Google Research. The results have been validated in wet-lab experiments: it predicted drug repurposing candidates for acute myeloid leukemia and discovered epigenetic targets for liver fibrosis. Of 15 top-ranked proposals for liver fibrosis, two significantly inhibited fibrosis without cell toxicity. Researchers at Imperial College London reported that the system produced in days the same hypothesis their team took years to develop independently.
The distinction between these two approaches mirrors a broader tension in AI-driven science. The fully autonomous approach (Sakana) is technically impressive but produces mediocre science. The collaborative approach (Google AI Co-Scientist, and the general pattern of AI systems feeding into human-directed research) produces better science but requires human judgment at key decision points. The most productive paradigm today is human-AI collaboration, where AI handles the computationally intensive search and pattern recognition while humans provide the creative direction and critical evaluation.
Our analysis of self-improving AI agents explores a related question: can AI systems improve their own capabilities autonomously? The answer in the scientific context is nuanced. AI can improve its experimental designs based on results (the self-driving lab loop). But the meta-scientific decisions (which questions are worth asking, which results are surprising, what constitutes a breakthrough) still require human judgment. At least for now.
11. Where AI-Driven Science Actually Fails
The hype around AI in science is intense, and it is worth being honest about the failure modes.
No AI-discovered drug has been FDA-approved. The pipeline is large and growing, but the final hurdle (Phase III trials, regulatory approval, and real-world patient outcomes) has not yet been cleared. Drug development has a long history of promising candidates failing in late-stage trials, and there is no guarantee that AI-discovered drugs will escape this pattern.
AI weather models fail on extreme events. The same learned compression that makes them efficient also makes them underperform on rare, extreme patterns, which are sparse in the training data. For the events that matter most (hurricanes, heat waves, flash floods), traditional physics-based models remain essential.
The AI Scientist's 42% experiment failure rate reveals that autonomous research systems are brittle. They can follow the mechanical steps of science but lack the intuition to debug unexpected failures, recognize when an experiment's setup is fundamentally flawed, or adjust research direction based on surprising results.
Galactica (Meta, 2022) was a foundation model trained on 106 billion tokens from scientific papers. It was pulled after 3 days because it generated plausible-sounding but factually incorrect scientific text - MIT Technology Review. The lesson: LLMs trained on scientific text are not the same as AI systems that do science. Generating text that sounds scientific is fundamentally different from generating scientific knowledge. The successful AI-science systems in this guide (AlphaFold, GNoME, RFdiffusion, Recursion's phenomics platform) are not text generators. They are specialized architectures trained on scientific data to produce scientific outputs (structures, molecules, predictions), not prose.
Hallucination in scientific contexts is dangerous. When an LLM hallucinates a plausible-sounding citation, a reader might waste hours looking for a paper that does not exist. When an AI system hallucinates a molecular property, a pharmaceutical company might invest millions in a dead-end compound. The distinction between text-based AI (which hallucinates frequently) and structure-based scientific AI (which produces verifiable predictions) is critical. AlphaFold's predictions can be experimentally validated. A chatbot's claims about drug interactions cannot.
Reproducibility concerns are emerging. AI models are complex, and the training data and hyperparameters that produce a specific result may not be fully documented or reproducible. The scientific community is still developing standards for AI reproducibility in research contexts.
The honest assessment is that AI-driven science is extraordinarily powerful in domains where the problem can be formulated as pattern recognition or optimization over large structured datasets (protein folding, materials prediction, drug screening, weather forecasting). It is much less effective in domains that require novel conceptual frameworks, philosophical insight, or judgment about what questions are worth asking. The most productive paradigm remains hybrid: AI for search and pattern recognition, humans for direction and evaluation.
12. The Platforms and Infrastructure Behind It All
Building AI systems for scientific research requires specialized infrastructure that differs significantly from general-purpose AI deployment. Several platforms have emerged specifically for this purpose.
Recursion (NASDAQ: RXRX) operates the largest AI drug discovery platform by experimental throughput. Its BioHive-2 supercomputer combines 504 NVIDIA H100 GPUs with proprietary phenomics imaging pipelines. The platform generates and analyzes biological data at a scale no human team could process: 2.2 million experiments per week, 135 TB of images weekly.
Benchling provides AI-powered lab informatics, starting at approximately $15,000-20,000 per year for academic institutions. It recently integrated free access to AlphaFold, Chai-1, and Boltz-2 for structure prediction directly within its platform, lowering the barrier for labs that want to incorporate AI-driven structural biology without building their own infrastructure.
Schrodinger (NASDAQ: SDGR) offers computational chemistry and drug design tools with academic pricing starting from $7,500 per year. Its platform has been integrated with Eli Lilly's TuneLab for collaborative drug design.
Emerald Cloud Lab provides cloud-accessible robotic lab infrastructure where researchers can design and execute experiments remotely without owning physical lab equipment. This model is particularly relevant for AI-driven research because it provides the API layer that allows AI systems to directly interact with physical laboratory equipment.
For organizations deploying AI agents across business operations, platforms like o-mega.ai handle the orchestration of multiple specialized AI agents working toward complex goals. The same architectural pattern (multi-agent systems in which each agent handles a specialized sub-task) appears in both scientific AI, where separate agents handle hypothesis generation, experiment design, data analysis, and literature review, and business AI, where separate agents handle research, communication, data processing, and workflow automation. The multi-agent orchestration that powers business automation is architecturally similar to Google's AI Co-Scientist, which assigns specialized agents to different stages of the research process.
{
"title": "AI Drug Discovery Pipeline Growth",
"subtitle": "Number of AI-originated drugs in clinical development by year",
"type": "line",
"xKey": "year",
"yKeys": [{"key": "drugs", "label": "Drugs in Clinical Development"}],
"data": [
{"year": "2016", "drugs": 3},
{"year": "2018", "drugs": 12},
{"year": "2020", "drugs": 31},
{"year": "2022", "drugs": 52},
{"year": "2023", "drugs": 67},
{"year": "2024", "drugs": 130},
{"year": "2025", "drugs": 170},
{"year": "2026", "drugs": 200}
],
"source": "BioMed Nexus / Pharma AI Tracker",
"sourceUrl": "https://biomednexus.com/ai-drug-discovery-companies-clinical-candidates-2026/"
}
{
"title": "AI vs Traditional Drug Discovery Metrics",
"subtitle": "Comparing cost, time, and Phase I success rates",
"type": "grouped-bar",
"xKey": "metric",
"yKeys": [
{"key": "traditional", "label": "Traditional"},
{"key": "ai_driven", "label": "AI-Driven"}
],
"data": [
{"metric": "Phase I Success Rate (%)", "traditional": 52, "ai_driven": 85},
{"metric": "Target-to-Phase I (months)", "traditional": 72, "ai_driven": 18},
{"metric": "Estimated Cost ($M)", "traditional": 400, "ai_driven": 40}
],
"source": "Insilico Medicine / BioMed Nexus",
"sourceUrl": "https://insilico.com/blog/first_phase2"
}
%%title: The AI-Driven Scientific Discovery Loop
%%subtitle: How autonomous systems close the hypothesis-experiment-analysis cycle
graph TD
A["AI Hypothesis Generation<br/><i>AI Co-Scientist, Foundation Models</i>"] --> B["Experiment Design<br/><i>Automated protocol generation</i>"]
B --> C["Autonomous Execution<br/><i>Self-driving labs, Recursion, A-Lab</i>"]
C --> D["Data Collection<br/><i>135 TB/week imaging, sensor data</i>"]
D --> E["AI Analysis<br/><i>Pattern recognition, anomaly detection</i>"]
E --> F["Result Evaluation<br/><i>Statistical validation, comparison</i>"]
F --> A
13. What Comes Next
The trajectory is clear even if the timeline is uncertain. Several trends are converging that will accelerate AI-driven science over the next 2-3 years.
Foundation models for every scientific domain are being built. Evo 2 covers genomics. AlphaFold covers protein structure. GNoME covers materials. GenCast covers weather. The pattern of training large models on domain-specific scientific data (not text) and using them to make predictions that generalize across the domain will expand to domains not yet covered: ecology, geology, pharmacology, neuroscience.
The self-driving lab will become standard infrastructure. Lila Sciences' $550 million raise signals that investors believe autonomous laboratories will be as fundamental to research as computational clusters are today. The cost of robotic lab equipment is falling. The AI capable of directing experiments is improving. The closed loop of hypothesize-experiment-analyze-iterate will become the default mode of empirical research, not an exotic capability.
Multi-modal scientific AI will merge structure prediction, simulation, experimental design, and literature analysis into single systems. Google's AI Co-Scientist is an early step. The end state is a system that reads the literature, identifies gaps, proposes experiments, predicts outcomes, directs robots to run the experiments, analyzes the results, and writes up findings, with human scientists providing oversight, direction, and the creative judgment that AI systems still lack.
The convergence of quantum computing and AI for scientific applications is early but real. IonQ's partnership with CCRM for regenerative medicine, and DeepMind's partnership with CFS for fusion, both represent bets that hybrid quantum-AI systems will be able to simulate molecular and plasma dynamics at scales that purely classical AI cannot reach. If quantum hardware continues to improve, this could unlock capabilities in drug design, materials science, and energy research that are currently computationally intractable.
The economics are shifting. When an AI weather forecast costs 0.3% of the compute of a traditional model, when AI drug discovery costs one-tenth of traditional approaches, and when a self-driving lab produces 10x more data per experiment, the cost-benefit calculus for AI-driven research becomes overwhelming. Organizations that do not adopt these tools will be at a structural disadvantage in the speed and cost of their research outputs.
The most profound shift is conceptual. Science has historically been a human cognitive activity supported by instruments. What these developments suggest is a future where science is increasingly an AI cognitive activity supported by human judgment. The instruments (telescopes, microscopes, particle accelerators) are being joined by AI systems that do not just observe and measure, but hypothesize, design, discover, and prove. The human role shifts from doing the science to directing the science: deciding which problems matter, evaluating which AI-generated results are significant, and providing the creative intuition that connects otherwise disconnected domains.
We are still in the early chapters of this transition. No AI has made a discovery equivalent to general relativity or the double helix. But the pace of AI-driven scientific achievement in 2024-2026, from Nobel-winning protein prediction to gold-medal mathematics to drugs entering Phase II trials, suggests that the ceiling is much higher than most people imagine.
%%title: AI Scientific Discovery Landscape (2026)
%%subtitle: Major domains and their maturity levels
graph LR
subgraph OPERATIONAL ["Deployed in Production"]
W["Weather Forecasting<br/><i>NOAA operational</i>"]
P["Protein Prediction<br/><i>2M+ researchers</i>"]
end
subgraph CLINICAL ["In Clinical Trials"]
D["Drug Discovery<br/><i>200+ candidates</i>"]
R["Regenerative Med<br/><i>Phase II/III</i>"]
end
subgraph SCALING ["Rapidly Scaling"]
M["Materials Science<br/><i>2.2M crystals</i>"]
L["Self-Driving Labs<br/><i>$550M invested</i>"]
G["Genomics<br/><i>9.3T nucleotides</i>"]
end
subgraph EMERGING ["Early But Real"]
F["Fusion Control<br/><i>RL for tokamaks</i>"]
MA["Mathematics<br/><i>IMO gold</i>"]
Q["Quantum + AI<br/><i>Partnerships forming</i>"]
end
This guide reflects the state of AI-driven scientific research as of April 2026. This field is evolving faster than any other domain of AI application. Verify current details, especially clinical trial statuses and funding figures, against the latest sources before making decisions.