AlphaFold: How AI Changed Protein Structure Prediction

AlphaFold showed that a computer system can predict many protein 3D structures with near-experimental accuracy, and it changed how biologists get structural information. DeepMind’s AlphaFold2 placed first in the CASP14 assessment in 2020, and in 2021 the company and its partner EMBL-EBI released a large public database of predicted structures. This matters because protein shape strongly affects function, and lab methods such as X-ray crystallography and cryo-electron microscopy can be slow, expensive, or difficult for certain proteins.

The scientific problem AlphaFold targeted is the protein folding problem: given an amino-acid sequence, predict the final 3D arrangement of atoms. A protein is a chain built from 20 common amino acids, and its sequence is encoded in DNA. The chain folds because of physical forces such as hydrophobic interactions and hydrogen bonds, and because of constraints in the backbone geometry. Predicting the fold is hard because the number of possible conformations grows exponentially with chain length, and because proteins often have flexible parts. Reliable structure prediction helps researchers infer active sites, binding pockets, and how mutations change stability.
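To make "many possible conformations" concrete, here is a back-of-the-envelope calculation in the spirit of Levinthal's paradox. The numbers are illustrative assumptions, not figures from this article: three backbone states per residue, a 100-residue chain, and a sampling rate of 10^13 conformations per second.

```python
# Rough estimate of how large a protein's conformational space is.
# All constants below are illustrative assumptions.
STATES_PER_RESIDUE = 3        # assumed number of backbone states per residue
CHAIN_LENGTH = 100            # assumed length of a small protein
SAMPLES_PER_SECOND = 1e13     # assumed rate of trying conformations
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def conformation_count(n_residues, states=STATES_PER_RESIDUE):
    """Number of distinct chain conformations if residues are independent."""
    return states ** n_residues


def exhaustive_search_years(n_residues, rate=SAMPLES_PER_SECOND):
    """Years needed to try every conformation once at the given rate."""
    return conformation_count(n_residues) / rate / SECONDS_PER_YEAR


count = conformation_count(CHAIN_LENGTH)
years = exhaustive_search_years(CHAIN_LENGTH)
print(f"{count:.2e} conformations, about {years:.1e} years to enumerate")
```

Even under these generous assumptions, trying every conformation is not feasible, which is why prediction methods rely on learned patterns instead of brute-force search.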

A key reason AlphaFold worked is that it combined two strong sources of information: evolutionary patterns and structural geometry. The system uses multiple sequence alignments, which are collections of related protein sequences found in databases. When two residues mutate together across evolution, it often indicates that they are close in 3D space, because a change in one residue is frequently compensated by a change in its contact partner so that the protein keeps working. Earlier methods used this “coevolution” idea, but AlphaFold connected it tightly to a neural network that reasons directly about distances and angles, turning statistical signals into a consistent 3D model.
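The coevolution signal can be illustrated with mutual information between alignment columns, a classical statistic used by earlier contact-prediction methods. This is a minimal sketch and not how AlphaFold itself processes alignments. The toy alignment and the function names are made up for illustration.

```python
from collections import Counter
from math import log2


def column(msa, i):
    """Return the residues found at position i across all aligned sequences."""
    return [seq[i] for seq in msa]


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    counts_i = Counter(column(msa, i))
    counts_j = Counter(column(msa, j))
    counts_ij = Counter(zip(column(msa, i), column(msa, j)))
    mi = 0.0
    for (a, b), n_ab in counts_ij.items():
        p_ab = n_ab / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: positions 1 and 3 always change together
# (K pairs with E, E pairs with K), while position 0 varies independently.
toy_msa = [
    "AKLE",
    "AKLE",
    "AELK",
    "AELK",
    "GKVE",
    "GELK",
]

coupled = mutual_information(toy_msa, 1, 3)
uncoupled = mutual_information(toy_msa, 0, 1)
print(f"columns 1 and 3: {coupled:.2f} bits")
print(f"columns 0 and 1: {uncoupled:.2f} bits")
```

A high score for a column pair suggests the two positions may be in contact. Real alignments need corrections for phylogenetic bias and small sample sizes, which this sketch ignores.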

AlphaFold2’s architecture introduced a way to update both sequence-level features and pairwise residue relationships repeatedly, so the model can refine its own understanding. DeepMind described an “Evoformer” module that passes information between a representation of the alignment and a representation of residue-residue pairs. After that, a structure module generates 3D coordinates and then recycles its output back into the network for further improvement. This recycling is important because structure prediction is a global constraint problem: a local choice, like a helix position, affects long-range contacts across the entire protein.

Training also mattered, because the model learned from decades of experimental structures. Public resources like the Protein Data Bank (PDB), founded in 1971, contain experimentally determined structures from techniques such as X-ray crystallography, NMR spectroscopy, and cryo-EM. AlphaFold learned patterns that connect sequence to geometry, such as typical bond lengths, angles, and common secondary structures like alpha helices and beta sheets. DeepMind reported that the 2020 system reached accuracy levels in CASP14 that were close to experimental results for many targets, measured by established metrics used in the competition.

AlphaFold’s impact on biomedical research came quickly because access was broad and the outputs included confidence estimates. The AlphaFold Protein Structure Database, released in 2021 by DeepMind and EMBL-EBI, provided predicted structures for large parts of the human proteome and many model organisms. Researchers could download structures, visualize likely domains, and focus experiments on the most uncertain regions. The model’s per-residue confidence score, often discussed as pLDDT, helps users judge where the prediction is strong and where flexible loops or disordered regions make any single structure less reliable.
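In PDB-format files from the AlphaFold database, the per-residue pLDDT is stored in the B-factor column of each atom record. The sketch below reads it from the alpha-carbon atoms and sorts residues into confidence bands. The function names, the demo records, and the treatment of exact boundary values (90, 70, 50) are assumptions made for illustration.

```python
def make_atom_line(serial, atom, resname, chain, resnum, bfactor):
    """Hypothetical helper: build one fixed-width PDB ATOM record for the demo."""
    return (
        f"ATOM  {serial:>5} {atom:<4} {resname:>3} {chain}{resnum:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}"
    )


def plddt_per_residue(pdb_text):
    """Map (chain, residue number) to pLDDT, read from CA atoms only."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, chain 22,
        # residue number 23-26, B-factor 61-66 (1-based).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            scores[(chain, resnum)] = float(line[60:66])
    return scores


def confidence_band(plddt):
    """Label a pLDDT value using the bands shown in the AlphaFold database."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


# Made-up records for three residues; the N atom is skipped by the parser.
demo_pdb = "\n".join([
    make_atom_line(1, "N", "MET", "A", 1, 95.2),
    make_atom_line(2, "CA", "MET", "A", 1, 95.2),
    make_atom_line(3, "CA", "GLY", "A", 2, 72.5),
    make_atom_line(4, "CA", "SER", "A", 3, 38.1),
])

scores = plddt_per_residue(demo_pdb)
bands = {key: confidence_band(value) for key, value in scores.items()}
for (chain, resnum), band in bands.items():
    print(chain, resnum, scores[(chain, resnum)], band)
```

Residues in the lowest band often correspond to flexible loops or disordered regions, so they are natural candidates for experimental follow-up.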

The technology also changed how labs plan experiments and interpret disease mutations. Structural predictions can guide mutagenesis experiments by suggesting which residues form an active site or a binding interface. In medical genetics, a mutation that disrupts a tightly packed core or a catalytic residue is more likely to affect function than one on a flexible surface loop, and predicted structures help make that reasoning more concrete. AlphaFold models also help in drug discovery when a protein structure is missing, although real drug design still depends on details like ligand-induced conformational changes and water molecules that can be hard to capture without experiments.

AlphaFold is not a complete solution to structural biology, and its limits are important for responsible use. Many proteins act as complexes, bind DNA or RNA, or change shape when they bind partners, and single-chain predictions do not automatically provide those states. Some regions are intrinsically disordered and do not have one stable structure. Post-translational modifications, cofactors, and membranes can also influence shape. These limits matter because they define where experiments remain essential and where computational prediction should be treated as one strong piece of evidence, not the final answer.

AlphaFold’s broader significance is that it compressed the time from gene sequence to structural hypothesis from months or years to hours or days, which supports faster cycles of research. It also made protein structure a more routine input for fields like microbiology, enzyme engineering, and human disease biology. As newer methods expand to protein complexes and dynamics, the main lesson remains: combining large biological datasets with geometry-aware machine learning can produce tools that change what counts as “available knowledge” for everyday science.