AlphaFold2

And a new, exciting era of Biology

Jan 16, 2021

Cynicism can be seductive. I veer too much into it at times. In fact, this very newsletter is rooted in a mildly cynical view of things, which is my comfort zone. But while cynicism may make for fun reading, too much of it in real life is not productive.
Personally, I channel little of the cynicism that oozes out of these newsletters into real life. In fact, I’ve been guilty of approaching real life with a naïve, idealistic view of the world. Trusting people is my default state. Perhaps, as George Carlin said, “scratch every cynic and you’ll find a disappointed idealist.”
So this one is about hope. And for me, nothing generates hope as much as science and tech.

November, 2020

November 2020 was a month that seemed to both belong and not belong to 2020. Surreal, world-impacting things happened and that made it a poster-boy month of 2020. But there were many positives which may make this month the only good offspring of the 2020 family.

The orange man-thing lost the election. In true 2020 style, it was a reality show of chaos and uncertainty. This was followed by a refusal to concede from the accursed cretin, but we take the win where we can.
Vaccines were announced with greater than 92% effectiveness. For a brief second, a collective sigh of relief swept around the world.
NASA and SpaceX launched Crew-1 - the first operational crewed flight of the SpaceX Dragon capsule, heralding a new era in space travel.

Amidst this strange positive turn of 2020, one piece of news squeezed in right at the end of November. It did not get as much attention as everything else (because 2020), but it was a groundbreaking milestone for science and technology. This was the news that Google's DeepMind (the rising AI sentience of our times, slowly getting powerful) may have ‘solved’ one of biology’s most vexing challenges. AlphaFold2, a model from DeepMind, showed that it could predict the shape of a protein from its amino acid sequence with around 90% accuracy. This is the best thing to happen since sliced DNA (I mean CRISPR).

The protein folding problem

Proteins are the basis of life. They are in our bones, flesh, blood and enzymes. About 15% of our body weight is protein. At our deepest cores, the structure of our proteins reveals the scars of our origin as individuals and our lineage - like a living fossil.

Let’s lego-block a protein. First there are the amino acids - 20 standard ones (21 if you count selenocysteine). When two or more amino acids hook up, they form a peptide. Longer chains of amino acids are called polypeptides. A protein is one or more polypeptide chains folded together. Are you with me so far?

By 2003, when the Human Genome Project was completed, we had finally read the blueprint at the core of all of us - the DNA. This told us which amino acid sequences our DNA codes for, including the ones unique to each of us. This is great, because now we know what proteins are being formed and how that impacts us, right? Not really.

Proteins are not just the sequence but also the three-dimensional shape. And given a sequence of amino acids, it is not straightforward to predict the shape they will fold into (a process called protein folding).

Now, there are a lot of words that I don’t fully understand, like hydrophobic interactions and van der Waals forces, that can leave us all yawning a bit. The point is that just knowing the sequence of the chain of amino acids does not give a clear idea of how it is going to fold.

(Amino acids have side chains. Once the amino acids have joined up in a sequence, these side chains interact with other amino acids in the same chain and bend the snake-like sequence into a squiggly three-dimensional shape.)

But here’s the math: a given sequence could fold into any of roughly 10^300 possible shapes. For comparison, the number of stars in the Milky Way is in the range of 10^11. Are you now starting to see why DeepMind comes into the picture?
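
To get a feel for that number, here is a back-of-the-envelope sketch in Python. The count of possible folds is the figure quoted above; the checking speed (10 trillion folds a second) and the age of the universe (about 13.8 billion years) are my own assumptions, picked only for illustration.

```python
# Figure from the text: roughly 10^300 possible folds for one sequence.
POSSIBLE_FOLDS = 10 ** 300

# Assumptions (not from the text): a hypothetical checker that tests
# 10^13 folds per second, and a universe about 13.8 billion years old.
FOLDS_PER_SECOND = 10 ** 13
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
UNIVERSE_AGE_YEARS = 13_800_000_000


def brute_force_years(folds: int, rate_per_second: int) -> int:
    """Whole years needed to test every fold once (exact integer math)."""
    return folds // (rate_per_second * SECONDS_PER_YEAR)


years = brute_force_years(POSSIBLE_FOLDS, FOLDS_PER_SECOND)
universe_lifetimes = years // UNIVERSE_AGE_YEARS

# For a positive integer n, len(str(n)) - 1 is its power of ten.
print(f"Years to try every fold: about 10^{len(str(years)) - 1}")
print(f"Universe lifetimes needed: about 10^{len(str(universe_lifetimes)) - 1}")
```

Under those assumptions it works out to around 10^279 years, or about 10^269 lifetimes of the universe. Trying every shape one by one is not an option, which is why a model that learns to predict the shape directly matters so much.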

AlphaFold 2

We’ve been trying to solve this problem for at least 50 years. And since 1994, there has been a worldwide experiment called CASP (Critical Assessment of protein Structure Prediction) - a science world cup that runs every two years to see who can predict protein structures best. From 2006 to 2016, the accuracy of the best predictions hovered between 30% and 40%. Then AlphaFold from DeepMind entered the fray and predicted with 60% accuracy in 2018.

This November, in CASP 2020, AlphaFold2 crossed 90% median accuracy in predicting protein structures. Even in the most challenging category of proteins, the model, learning from roughly 170,000 known protein structures, predicted with an accuracy of 87%. It outperformed nearly 100 other teams.
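
A note on what “accuracy” means here. As far as I understand it, CASP scores predictions with a measure called GDT (Global Distance Test), which runs from 0 to 100 and roughly captures what share of a protein's building blocks ended up within a small distance of their true positions. Below is a simplified sketch of the idea in Python. It assumes the two structures are already lined up (the real measure also searches for the best alignment), and the coordinates are made up for illustration.

```python
import math


def gdt_ts(predicted, actual, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-style score (0-100) for two already-aligned structures.

    predicted, actual: lists of (x, y, z) positions in angstroms,
    one per residue, in the same order.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("structures must be non-empty and the same length")

    # Distance between each predicted position and its true position.
    distances = [math.dist(p, a) for p, a in zip(predicted, actual)]

    # Share of residues within each cutoff, then the average of those shares.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Made-up toy coordinates: four residues, off by 0.5, 1.5, 3.0 and 9.0 angstroms.
actual = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 9.0, 0.0)]

score = gdt_ts(predicted, actual)
print(score)
```

With these toy coordinates the score comes out to 56.25. A perfect prediction scores 100.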

You could imagine the collective gasp from the scientists - a moment of shocked silence and then the clamor of excited voices and applause. We expected to reach this level of accuracy in another few decades. 2050 is here.

What does this mean?

Talking about exponential technology, here’s an example. For nearly a decade, Andrei Lupas's lab had been trying to work out the structure of a bacterial protein. With AlphaFold2, they got it in 30 minutes. The field of protein structure prediction has been upended. Perhaps many careers dedicated to researching this for decades were ended that day.

This could open up thousands of proteins in the human genome and enable scientists to better understand the sources of diseases and genetic variations, understand how proteins bind, and then handle them as they see fit. For example, the universal protein database (UniProt) has around 180 million protein sequences, but the structures of only 170,000 proteins have been published. There’s an ocean of information about proteins that’s still not figured out. And having the structures figured out will improve the prediction of their function even more drastically.

Imagine 2030. Fewer and fewer drugs are found through trial and error, which takes enormous amounts of time and effort. In 2020, discovering a drug costs $2.5 billion and takes 12 years on average. Measured against that, in 2030 we could see drug development at warp speed, thanks to our ability to quickly sequence the sources of problems and understand the structure of the proteins involved, among other advancements.

Drugs could be designed specifically to fit the proteins involved in a disease, or the ones unique to our bodies. This is the beginning of the science-fiction part of biotech.

Protein folding is just one of the potential hurdles. But AI is coming for the other hurdles as well, including the binding problem. AI is currently being used to mix and match existing drugs to treat new conditions, to find new molecules to treat diseases, and even to parse and analyze the tons of data generated in the drug discovery process to identify promising molecules for other uses.

When all of these improve exponentially with greater computing power, more sophisticated models and more centralized data, we are going to see some mind-blowing step changes in healthcare over the next few decades. DNA-based circuitry? Programmable protein structures? Why not?

Couldn’t be more exciting,

Tyag

