Has designing protein drugs ever been easier? How far are we from the “Hand of God”?

▎WuXi AppTec Content Team Editor

In a comfortable infusion room, a patient lies on a leather sofa and watches the fluid in an infusion bag drip into his body through a clear plastic tube. He is receiving a new cancer drug that has shrunk his tumor by 90 percent. The doctor told him it is a protein molecule that does not exist in nature, tailored to his disease, which is why it works so well against the tumor.

Among the thousands of thoughts that have flashed through David Baker’s mind, there was perhaps a scene like this. With his shaggy hair, the well-known University of Washington scientist looks more like an artist. But his team is on the verge of solving a huge scientific problem: using human ingenuity to design proteins with special functions that do not exist in nature. Such proteins have the potential to diagnose, treat, and even cure disease.

Image source: David Baker Lab website


Proteins have long fascinated scientists. These molecules are only nanometers in size, yet their complexity can exceed that of any man-made machine, a testament to the subtlety of nature.

These complex proteins are assembled from simple amino acids, each containing only 19 atoms on average. Because of subtle differences in the chemical properties and structures of these amino acids, a protein spontaneously folds into a particular shape and performs a specific function in the cell: some bind DNA and switch genes on or off; others recognize pathogens and initiate an immune response.

Protein folding sounds simple, but scientists have never fully understood this mystery of nature. If each amino acid can adopt at least three different conformations, then a protein of only 100 amino acids has as many as 3 to the 100th power possible structures. Checking them one by one is beyond any computer, let alone a human.
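To get a feel for the size of that number, here is a short back-of-the-envelope calculation. The three conformations per amino acid and the chain length of 100 come from the paragraph above; the sampling rate of one trillion structures per second is purely an assumption chosen for illustration.

```python
# Count the structures in the article's back-of-the-envelope estimate.
conformations_per_residue = 3   # "at least three" conformations per amino acid
chain_length = 100              # amino acids in the example protein

total_structures = conformations_per_residue ** chain_length

# Hypothetical sampling rate, assumed only for illustration.
samples_per_second = 10 ** 12
seconds_per_year = 60 * 60 * 24 * 365
years_needed = total_structures // (samples_per_second * seconds_per_year)

print(f"possible structures: {total_structures:.3e}")
print(f"years to enumerate them all: {years_needed:.3e}")
```

The total is roughly 5 × 10^47 structures, so even at this imagined speed a brute-force search would take vastly longer than the age of the universe.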

▲Proteins have extremely complex structures and perform extremely sophisticated functions (Image credit: Thomas Splettstoesser, CC BY-SA 3.0, via Wikimedia Commons)

Decades ago, scientists struggled to predict the structure of even the simplest proteins. When David, still a student, wanted to take on this conundrum, his mentor advised him against it, because “nobody knows what’s going on.”

Following his advisor’s advice, David put the idea on hold for a while and earned his doctorate in the research group of Professor Randy Schekman, a future Nobel Prize winner, focusing on cell biology. But he had not forgotten his earlier dream. After securing a faculty position, David decided to use the power of computers to tackle the protein folding problem. In 1996, he and his graduate students began writing a program called Rosetta, which was meant to pick out a protein’s structure from an astronomical number of possibilities based on its amino acid sequence.

In nature, proteins tend to fold into the shape with the “lowest free energy” in order to remain stable, much as water flows downhill and settles at the lowest point. Rather than searching through the astronomical number of shapes for the one with the lowest free energy, the Rosetta program first analyzes the biophysical properties of the protein, simulates a rough shape, and then fine-tunes it, keeping only the shapes with lower free energy. As a result, researchers can predict protein structures faster.
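The “rough shape first, then fine-tune” idea can be illustrated with a toy example. This is only a sketch, not Rosetta’s actual algorithm or energy function: the one-dimensional `toy_energy` landscape, the grid search, and the greedy refinement step are all assumptions made up for illustration.

```python
import math
import random


def toy_energy(x: float) -> float:
    """Hypothetical rugged energy landscape; NOT a real protein force field."""
    return 0.1 * (x - 2.0) ** 2 + math.sin(5.0 * x)


def coarse_search(lo: float, hi: float, n_points: int) -> float:
    """Stage 1: evaluate a sparse grid and keep the lowest-energy point."""
    step = (hi - lo) / (n_points - 1)
    candidates = [lo + i * step for i in range(n_points)]
    return min(candidates, key=toy_energy)


def refine(x: float, n_steps: int, step_size: float, rng: random.Random) -> float:
    """Stage 2: try small random moves, keeping only those that lower the energy."""
    best = x
    for _ in range(n_steps):
        trial = best + rng.uniform(-step_size, step_size)
        if toy_energy(trial) < toy_energy(best):
            best = trial
    return best


rng = random.Random(0)  # fixed seed so the run is repeatable
rough = coarse_search(-10.0, 10.0, n_points=21)
refined = refine(rough, n_steps=2000, step_size=0.05, rng=rng)

print(f"rough guess:   x = {rough:.3f}, energy = {toy_energy(rough):.4f}")
print(f"after refine:  x = {refined:.3f}, energy = {toy_energy(refined):.4f}")
```

The coarse stage looks at only 21 points instead of the whole landscape, and the refinement stage never accepts a worse shape, which mirrors the “keep only the lower-energy result” description above.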

Since 1994, biologists who, like David, want to unravel the mysteries of protein folding have gathered regularly to test their methods. As in an exam, they are given the sequence of a protein and asked to predict its structure. The predictions are then compared with structures that have been solved experimentally but not yet published, to see which prediction comes closest.
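One simple way to say how “close” a predicted structure is to the experimental one is the root-mean-square deviation (RMSD) between matching atoms. The sketch below uses RMSD as a stand-in; the competition’s own assessors use more elaborate measures, and the coordinates here are invented and assumed to be already superimposed.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally long, pre-aligned coordinate lists."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("structures must be non-empty and the same length")
    squared = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


# Invented three-atom "structures", for illustration only.
experimental = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
predictions = {
    "A": [(0.0, 0.0, 0.1), (1.5, 0.1, 0.0), (3.0, 0.0, 0.1)],
    "B": [(0.0, 1.0, 0.0), (1.5, 1.0, 1.0), (3.0, 2.0, 0.0)],
}

scores = {name: rmsd(coords, experimental) for name, coords in predictions.items()}
closest = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"prediction {name}: RMSD = {score:.3f}")
print(f"closest prediction: {closest}")
```

A lower RMSD means the predicted atoms sit closer to their experimentally determined positions, so prediction A wins in this made-up example.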

In what has been dubbed the “Olympics” of protein structure prediction, the Rosetta program was long the top contender, with a dominant advantage. In 2018, however, that advantage came to an end.


“We’re getting started with machine learning.” David said this abruptly to the members of his laboratory at the end of 2018, after attending a meeting.

While the Rosetta program had demonstrated an extraordinary ability to predict protein structures, experts knew its limitations. Tools that predict protein structure from biophysical properties rely on basic physical rules, such as the ideal distance between two atoms, or how to balance electrostatic interactions against hydrogen bonding. But this is a simulation, not real physics, which involves the far more complex quantum realm, too complex even for computers.

But machine learning allows scientists to go deeper. Algorithms called neural networks loosely imitate the way the brain learns, enabling an AI to quickly become an expert in a given field. With training, for example, advanced algorithms can find signs of cancer in tissue slices, or, as DeepMind, a well-known machine learning company, has shown, learn to play Go or predict protein structures.

At the 2018 conference, Rosetta from David’s team was still the best-performing program, but a machine learning algorithm called AlphaFold, from DeepMind, made its debut and came in second. David, keenly sensing the change in the wind, put his team to work on machine learning to keep up. His hunch was right. At the conference two years later, the second-generation AlphaFold beat Rosetta and became famous overnight.

▲AlphaFold2 caused a sensation in 2020 (Image credit: DeepMind Blog)

On July 15, 2021, DeepMind published a paper in the journal “Nature” that released the source code of AlphaFold2 and described its design framework and training method in detail.

And David’s team introduced the RoseTTAFold algorithm it developed in the journal Science on the same day. This neural network can simultaneously consider the pattern of protein sequence, the interaction between different amino acids in the protein, and the possible 3D structure of the protein. In this system, one-, two-, and three-dimensional information can communicate with each other, allowing the neural network to synthesize all the information to determine the relationship between the chemical components of a protein and the structure it folds into.

The researchers said that the RoseTTAFold system’s performance in resolving the 3D structure of proteins is almost equal to that of AlphaFold2, and even better than AlphaFold2 for some proteins. Using public information from AlphaFold, and thanks to the accumulation of machine learning over the years, the development of this algorithm took only a few months.

Since then, the two algorithms have been used by thousands of laboratories worldwide to predict the structures of proteins of interest. They are not perfect, but they can quickly point researchers in the right direction. In contrast, traditional laboratory techniques can take several years.

But this is just a starting point. Even before the breakthroughs in protein structure prediction, David had already turned to the diametrically opposite problem: de novo protein design. In theory, if you truly understand how proteins fold, you can design from scratch new proteins that do not exist in nature. In other words, scientists can work backward from a desired shape to the DNA sequence of a protein that folds into it.


In a sense, designing proteins from scratch is orders of magnitude more difficult than predicting protein structures. Suppose a protein of 100 amino acids is to be designed. Each position can be any of 20 amino acids, so the total number of possible sequences is 20 to the 100th power.

This number is more than the total number of atoms in the entire universe.
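The comparison is easy to check with exact integer arithmetic. The figure of 10^80 atoms is a commonly quoted rough estimate for the observable universe and is an assumption here, not a number from the article.

```python
# Size of the sequence space for a 100-residue protein built from 20 amino acids.
amino_acid_types = 20
chain_length = 100
sequence_space = amino_acid_types ** chain_length

# Commonly quoted rough estimate of atoms in the observable universe (assumption).
atoms_in_universe = 10 ** 80

ratio = sequence_space // atoms_in_universe

print(f"possible sequences: {sequence_space:.3e}")
print(f"sequences per atom in the universe: {ratio:.3e}")
```

The sequence space comes out at about 1.3 × 10^130, which exceeds the atom estimate by roughly 50 orders of magnitude.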

Image credit: ESA/Hubble, CC BY 4.0, via Wikimedia Commons

Rosetta has had some success here. Going from DNA sequence to protein structure, it can find the shape with the lowest energy; run in reverse, it can work out the amino acids needed to build a given shape. The researchers also learned how to break proteins down into small helices or barrels, like taking apart Lego bricks.

In 2003, David’s team designed the first protein that could not be found in nature, calling it Top7.

This is certainly an important breakthrough, but it does not usher in a new era. Members of David’s lab joke that Top7 is just a “rock” that is thermodynamically stable. Yes, this is the first protein they’ve designed from scratch to fold the way the researchers want it to fold, but it doesn’t have any function.

Seven years later, one of David’s postdoctoral fellows improved on this. He attached part of an antibody to an artificial protein, giving a designed protein a function for the first time: the newly synthesized protein could recognize the influenza virus and was expected to become a new drug. But this was something of a “cheat”, since the most important part was derived from a natural antibody.

Over the next few years, the team refined Rosetta even further. Today, David’s lab and its collaborators are able to design many different proteins. For example, Professor Neil King, who also works at the University of Washington, is trying to make proteins self-assemble into nanoparticles for delivering vaccines or gene therapies.

But designing proteins from scratch is still a trial-and-error endeavor that requires a significant investment of resources. Taking the design of binding proteins as an example, from a process point of view, scientists will first use Rosetta to simulate a “pocket” on the surface of the protein of interest, and then design a large number of different helical structures to form a stable backbone. These backbones contain certain amino acids that may fit perfectly into the “pockets”.

This job is like grinding on a key until it fits perfectly into a lock.

The researchers then synthesize the DNA sequence that encodes the design, introduce it into bacterial cells, and have the cells produce the protein. Once they have the protein, they run two tests: whether it folds as expected, and whether the folded protein binds the protein of interest as expected.

In general, artificially engineered proteins rarely satisfy both conditions.

And those proteins that stand out will become the starting point for a new round of design and screening until the best conformation is obtained.
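The design, test, and repeat cycle described above can be sketched as a simple loop. Everything in this sketch is hypothetical: the `assay` function returns random scores as a placeholder for the real folding and binding experiments, and the threshold, sequence length, and round count are arbitrary.

```python
import random
from dataclasses import dataclass
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids


@dataclass(frozen=True)
class Design:
    sequence: str
    fold_score: float     # hypothetical stand-in for "folds as designed"
    binding_score: float  # hypothetical stand-in for "binds the target"


def mutate(sequence: str, rng: random.Random) -> str:
    """Swap one randomly chosen position for a random amino acid."""
    pos = rng.randrange(len(sequence))
    return sequence[:pos] + rng.choice(AMINO_ACIDS) + sequence[pos + 1:]


def assay(sequence: str, rng: random.Random) -> Design:
    """Placeholder for the lab tests: returns random scores, not real measurements."""
    return Design(sequence, rng.random(), rng.random())


def screen(designs: List[Design], threshold: float) -> List[Design]:
    """Keep only designs that pass BOTH tests."""
    return [d for d in designs
            if d.fold_score >= threshold and d.binding_score >= threshold]


rng = random.Random(42)
parents = ["A" * 30]  # arbitrary starting sequence
hits: List[Design] = []

for round_number in range(1, 4):
    variants = [mutate(rng.choice(parents), rng) for _ in range(200)]
    hits = screen([assay(s, rng) for s in variants], threshold=0.8)
    print(f"round {round_number}: {len(hits)} of {len(variants)} designs passed both tests")
    if hits:
        parents = [d.sequence for d in hits]  # winners seed the next round
```

Because a design must clear both thresholds at once, only a small fraction survives each round, which is the point the article makes about how rarely engineered proteins satisfy both conditions.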


This is not the only problem with artificially engineered proteins. An obvious question is: what advantages do these proteins have over traditional antibody drugs? Over the decades, antibody drugs have been proven safe and effective, and pharmaceutical companies know how to develop them. With engineered proteins, no one yet knows how safe they are. What if they elicit a strong immune response?

This may lead to another, more philosophical question: what is a protein? Yes, scientists can use computer programs to design the amino acid chains they want and have them fold into the shapes they want, but these new molecules are unlike natural proteins and bear obvious traces of human design. Making artificially designed proteins more like natural ones is a direction for future work.

Or maybe we don’t have to go all the way. Recently, a new company called Monod Bio announced that it has raised $25 million in seed funding to design artificial proteins. Its chief executive officer and chief scientific officer both come from David’s research group. The company is developing biosensors, not drugs.

Traditional sensors are often made using electronic chips. The company uses protein sensors that can be designed on demand to detect disease. If certain disease-related molecules are present in the sample, the protein sensor can emit specific light, and the intensity of the light represents the concentration of these disease-related molecules.

And because of the convenience of biosensors, after collecting a sample, researchers can expect results within minutes without the need to travel to the lab for complex analysis.

If successful, this will undoubtedly open up even broader horizons for artificially engineered proteins. Looking further ahead, David went further in a TED talk: by using new kinds of amino acids, we may be able to go beyond the current limit of 20 amino acids and produce more diverse and more functional proteins.

In addition, these artificial proteins can be used for more than diagnosing or treating disease. Precisely engineered proteins can, for example, target specific cell populations and enable precise drug delivery. Emerging biomaterials based on these artificial proteins may also help address increasingly serious energy and ecological problems.

These goals may seem distant, but they are not out of reach. A recent paper in “Science” reported that de novo protein design can generate a series of ligands that bind to the erythropoietin receptor, which may affect the replication and survival of red blood cells and help repair nerve damage. Another paper in “Nature” reported that scientists have designed a new protein that resembles natural IL-2 but does not produce the corresponding toxicity, and that it showed activity against melanoma and colorectal cancer in mouse models.

Despite the skepticism of many, David is optimistic. He expects new breakthroughs in this field in 5-10 years. “Such exciting times are not common in a scientific career,” he said.
