I just came back from CASP13, the biennial assessment of protein structure prediction methods (I previously blogged about CASP10). I participated both as a panelist on deep learning methods in protein structure prediction and as a predictor (more on that later). If you keep tabs on science news, you may have heard that DeepMind’s debut went rather well. So well, in fact, that not only did they take first place, but they also put a comfortable distance between themselves and the second-place predictor (the Zhang group) in the free modeling (FM) category, which focuses on modeling novel protein folds. Is the news real or overhyped? What is AlphaFold’s key methodological advance, and does it represent a fundamentally new approach? Is DeepMind forthcoming in sharing the details? And what was the community’s reaction? I will summarize my thoughts on these questions and more below. At the end I will also briefly discuss how RGNs, my end-to-end differentiable model for structure prediction, did on CASP13.
“What just happened?” was a question put to me in exactly these words by at least one researcher at CASP, and a sentiment expressed by most academics I spoke with. As one myself, I shared it going in and throughout the meeting. In fact I went into CASP13 feeling melancholy (the raw results were out two days prior), although my mood lifted during the meeting due to the general excitement and quality of discussions, and as my tribal reflexes gave way to a cooler and more rational assessment of the value of scientific progress.
This will be a long post. I will start with the science: the significance of DeepMind’s result, their methodology, and how it relates to existing methods. Then I will discuss the sociology: how people reacted, why we did so, what this means for the academic discipline of protein structure prediction (and life science companies), and how I think we ought to move forward. After what I hope is an exposition of general interest, I will briefly discuss how RGNs performed at CASP13. Spoiler alert: not very well, partly because the value of co-evolutionary information increased substantially in this CASP relative to prior ones, and partly because I could not submit the original submissions unaltered owing to technical problems.
For the sake of making this post easier to navigate, below is a table of contents.
Table of contents
- The science
- The sociology
- Post-mortem: RGN @ CASP13
Let me get the most important question out of the way: is AlphaFold’s advance really significant, or is it more of the same? I would characterize their advance as roughly two CASPs in one (really ~1.8x). Historically progress in CASP has ebbed and flowed, with a ten year period of almost absolute stagnation, finally broken by the advances seen at CASP11 and 12, which were substantial. What we’ve seen this year is roughly twice as much as the recent average rate of advance (measured in mean ΔGDT_TS from CASP10 to CASP12—GDT_TS is a measure of prediction accuracy ranging from 0 to 100, with 100 being perfect.) As I will explain later, there may actually be a good reason for this “two CASPs” effect, in terms of the underlying methodological breakdown. This can be seen not only in the CASP-over-CASP improvement, but also in terms of the size of the gap between AlphaFold and the second best performer, which is unusually large by CASP standards. Below is a plot that depicts this.
Prior to CASP10, for roughly ten years, the curve was basically flat. CASP11 began to show life because of the introduction of co-evolutionary methods, but just barely, because most FM targets had only shallow multiple sequence alignments (MSAs), and co-evolutionary methods need deep ones. CASP12 was when the power of these methods was finally demonstrated, and CASP13, even excluding AlphaFold, showed further progress due to the widespread adoption of deep learning in co-evolutionary methods. We see that the second best method (Zhang server) improved by almost exactly “one expected CASP”, reflective of the field-wide improvement, and AlphaFold added yet another “one CASP” worth of improvement on top of this. Note that these “one CASP” units depend heavily on very recent history, really just the past few CASPs (10-12), so please take them with a mountainful of salt. Note also that my method of using mean GDT_TS is problematic because the difficulty of FM prediction targets varies from one CASP to another, although it has supposedly been stable recently.
Taken together the above suggests substantial progress, more so than usual, and hence not only did AlphaFold “win” CASP13, but did so by an unusual margin. Great! Does this mean the problem is solved, or nearly so? The answer, right now, is no. We are not there yet. However, if the (AlphaFold-adjusted) trend in the above figure were to continue, then perhaps in two CASPs, i.e. four years, we’ll actually get to a point where the problem can be called solved, in terms of gross topology (mean GDT_TS ~ 85% or so). Of course, this presupposes that the trendline will continue, and we have no real reason to believe that it will, at least not without new conceptual breakthroughs. Keep in mind that unlike other areas of machine learning, new protein structures are not appearing at an increasing rate, and so waiting things out will not help.
The above graph is misleading in one way though because it is dependent on a specific metric, GDT_TS, which only measures gross topology. If we care about high resolution topology, which we certainly do for most practical applications, then a more appropriate metric is GDT_HA, and using it the picture looks a bit different:
Still a good trendline, but much further down from a “solution”.
Another caveat is that both of these metrics measure global goodness of fit, which is important in terms of the basic scientific problem, but is often not indicative of functional utility. Local accuracy, for example the coordination of atoms in an active site or the localized change of conformation due to a mutation, is what is often sought when answering broader biological questions. Global metrics hide local discrepancy by diluting it in the sea of generally good agreement between experimental and predicted structures.
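To make the dilution effect concrete, here is a minimal sketch of a GDT-style score. GDT_TS averages the fraction of residues within 1, 2, 4 and 8 Å of their experimental positions, and GDT_HA does the same with the tighter cutoffs 0.5, 1, 2 and 4 Å. The sketch assumes the two structures are already superimposed (real GDT searches over superpositions), and the 100-residue "protein" with a misplaced five-residue site is entirely made up for illustration.

```python
import math

# Distance cutoffs in angstroms: GDT_TS uses the coarser set, GDT_HA the finer one.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)
GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)


def gdt(predicted, experimental, cutoffs):
    """GDT-style score (0-100) for two already-superimposed C-alpha traces.

    Simplification: real GDT searches over many superpositions per cutoff;
    here the two coordinate lists are assumed to be aligned in advance.
    """
    deviations = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [
        sum(d <= c for d in deviations) / len(deviations) for c in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical example: 95 residues are off by 0.8 A everywhere,
# while a 5-residue "active site" (residues 40-44) is off by 10 A.
experimental = [(float(i), 0.0, 0.0) for i in range(100)]
predicted = [
    (x, 10.0 if 40 <= i < 45 else 0.8, z)
    for i, (x, _, z) in enumerate(experimental)
]

gdt_ts = gdt(predicted, experimental, GDT_TS_CUTOFFS)
gdt_ha = gdt(predicted, experimental, GDT_HA_CUTOFFS)
print(f"GDT_TS ~ {gdt_ts:.2f}, GDT_HA ~ {gdt_ha:.2f}")
```

In this toy case GDT_TS comes out at about 95 even though the one site you might care about is 10 Å off, while GDT_HA drops to about 71 because the bulk of the chain misses the 0.5 Å cutoff.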
Another way of thinking about this is asking whether the same headlines would have been generated had an academic group achieved the same increase in accuracy that DeepMind has. The answer is certainly not, and we have the CASP11 → CASP12 advance to confirm that, as it was about equal in absolute magnitude (and thus arguably harder coming from a lower starting point) but generated few if any headlines. DeepMind’s publicity machine certainly helped shine a bright light on their advance, which is frankly also good for the field as a whole.
None of this is to detract from the AlphaFold advance. It is an anomalous leap, on the order of a doubling of the usual rate of improvement, and portends very favorably for the future. But that future has yet to be realized. (I actually think people may have walked away a bit too optimistic from this CASP—a DeepMind joins the field for the first time only once, and the value added of their excellent engineering may not get repeatedly re-realized, but we’ll see.)
Let me now switch gears and talk a bit about the landscape of protein structure methodology before AlphaFold’s arrival. I won’t talk much about RGNs here because in some ways they are much more unusual methodologically than AlphaFold is, and so the two are well separated in algorithm space.
AlphaFold is a co-evolution-based method, building on the groundwork that has been laid in the past ~7 years by several academic groups. The basic idea is to extract so-called evolutionary couplings from protein MSAs by detecting residues that co-evolve, i.e. that have mutated over evolutionary timeframes in response to other mutations, thereby suggesting physical proximity in 3D space. The first batch of such approaches [2, 3, 5] predicted binary contact matrices from MSAs, i.e. whether two residues are “in contact” or not (typically defined as being within 8Å), and fed that information to simple geometric constraint satisfaction methods to fold the protein and return its 3D coordinates. (There is a pre-history to this field when overly simple statistical models were used to predict such contacts, dating back to the 90s, but I will not cover it as that generation of approaches was not successful and I am by no means trying to be comprehensive here.) This first generation of methods was a substantial breakthrough, and ushered in the new era of protein structure prediction that finally showed promise of working.
An important if expected development was the coupling of such binary contacts with more advanced folding pipelines such as Rosetta and I-Tasser. This resulted in better accuracy, and these combined pipelines were the state of the art until around the middle of 2016, or just before CASP12. The next major advance came from applying convolutional networks and deep architectures (residual networks) to integrate information globally across the entire matrix of raw couplings to turn them into more accurate contacts. Jinbo Xu’s group developed the first major (and experimentally serious) version of this approach, among others [1, 4, 6].
Which brings us to the present and AlphaFold. Only a few weeks before the CASP13 results became public, Xu published a preprint on bioRxiv that predicted inter-residue distances instead of binary contacts. It used the same input (MSAs), and largely the same architecture as his CASP12 approach, but predicted probabilities over a discretized spatial range and then picked the highest-probability bin for feeding into CNS to fold the protein. Xu’s preprint showed significant promise on a subset of CASP13 targets, and the buzz among some of us was that Xu’s approach would win the competition. As it turns out, this seemingly simple change had a surprisingly profound impact, and forms one of the key ingredients of AlphaFold’s recipe.
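For readers who have not worked with contact maps, the sketch below shows what the binary target looks like when computed from a known structure: a symmetric 0/1 matrix marking residue pairs closer than 8 Å. It does not do any co-evolutionary inference; it only illustrates the quantity those methods try to predict. The `min_separation` parameter and the toy hairpin coordinates are my own illustrative assumptions.

```python
import math

CONTACT_CUTOFF = 8.0  # angstroms, the threshold mentioned above


def contact_map(coords, cutoff=CONTACT_CUTOFF, min_separation=3):
    """Binary contact matrix from per-residue coordinates.

    min_separation is a hypothetical knob that skips residue pairs close in
    sequence, since those are trivially close in space.
    """
    n = len(coords)
    contacts = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts[i][j] = contacts[j][i] = 1
    return contacts


# Toy 8-residue "hairpin": two antiparallel strands 5 A apart,
# with 3.8 A between consecutive residues along each strand.
coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (11.4, 5.0, 0.0), (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0),
]

for row in contact_map(coords):
    print("".join(str(v) for v in row))
```

The anti-diagonal band of ones that appears for the two facing strands is the kind of pattern a downstream folding method uses as a geometric constraint.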
DeepMind has promised to publish a paper on AlphaFold, so the final and definitive description will have to wait for their paper, which I hope will be thorough. They have no plans to release the source code, and are unlikely to put up a public prediction server in the near term, although they appear open to considering it at some point. Having said that, they were generally forthcoming in discussing their method during CASP13, and appeared genuinely interested in sharing the approach with the community and ensuring that people can build on it. The sense I got was that they are in it for the science.
Just like Xu’s approach, AlphaFold uses a softmax over discretized spatial ranges as its output, predicting a probability distribution over distances (the details of the convolutional ResNet architecture are different, but it remains unclear how large a contribution these details made). Unlike Xu’s approach, which tosses away these probabilities and only uses the most likely distance bin as input to CNS, AlphaFold uses the entire distribution as a (protein-specific) statistical potential function that is directly minimized to generate the protein fold. The key idea of AlphaFold’s approach is that a distribution over pairwise distances between residues corresponds to a potential that can be minimized after being turned into a continuous function. They initially experimented with more complex approaches, including fragment assembly using a generative variational autoencoder. Remarkably, however, halfway through CASP13 they discovered that simple and direct minimization of their predicted energy function, using gradient-based optimization (L-BFGS), is sufficient to yield a high-accuracy fold. And so they essentially switched to this approach halfway through, and it represents the essence of their final model.
This idea looks deceptively simple but has rather profound implications. I think its simplicity may somewhat mask the difficulty of arriving at it. More often than not in science, particularly physical sciences, a simple change in perspective can lead to surprising changes in outcomes. The paradigm of predict contacts → feed into complex folding algorithm was so entrenched in the field that it was difficult for most to see it as unnecessary (including for DeepMind’s team, which tried more conventional folding approaches before discovering that a simpler approach works just as well). Much of the pushback I received toward my end-to-end differentiable approach was because it eschewed any sampling and directly folded the protein.
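Here is a toy sketch of the difference between the two uses of a predicted distance distribution. It is not DeepMind’s code, and several pieces are my own stand-ins: the “network output” is a discretized Gaussian around a known distance, the continuous potential is a piecewise-linear interpolation of −log p, the “protein” is three points in 2D, and plain gradient descent replaces L-BFGS so that the example needs only the standard library.

```python
import math

BIN_CENTERS = [float(c) for c in range(1, 13)]  # hypothetical 1 A bins, 1..12 A


def toy_distogram(true_distance, sigma=1.0):
    """Stand-in for a network's softmax output: a discretized Gaussian."""
    weights = [math.exp(-0.5 * ((c - true_distance) / sigma) ** 2)
               for c in BIN_CENTERS]
    total = sum(weights)
    return [w / total for w in weights]


def potential_and_slope(probs, d):
    """Piecewise-linear interpolation of -log p(d), plus its derivative in d."""
    energies = [-math.log(p) for p in probs]
    # Segment containing d; clamped so out-of-range d extrapolates linearly.
    k = min(max(int(math.floor(d - BIN_CENTERS[0])), 0), len(BIN_CENTERS) - 2)
    slope = (energies[k + 1] - energies[k]) / (BIN_CENTERS[k + 1] - BIN_CENTERS[k])
    return energies[k] + slope * (d - BIN_CENTERS[k]), slope


def fold(distograms, coords, steps=500, step_size=0.05):
    """Minimize the summed pairwise potential by plain gradient descent."""
    coords = [list(p) for p in coords]
    for _ in range(steps):
        grads = [[0.0] * len(coords[0]) for _ in coords]
        for (i, j), probs in distograms.items():
            delta = [a - b for a, b in zip(coords[i], coords[j])]
            d = math.sqrt(sum(x * x for x in delta))
            _, slope = potential_and_slope(probs, d)
            for axis, x in enumerate(delta):
                grads[i][axis] += slope * x / d  # chain rule: dU/dx_i
                grads[j][axis] -= slope * x / d
        for p, g in zip(coords, grads):
            for axis in range(len(p)):
                p[axis] -= step_size * g[axis]
    return coords


targets = {(0, 1): 4.0, (0, 2): 5.0, (1, 2): 6.0}
distograms = {pair: toy_distogram(d) for pair, d in targets.items()}

# Argmax use of the output: keep only the most likely bin per pair.
most_likely = {
    pair: BIN_CENTERS[max(range(len(p)), key=p.__getitem__)]
    for pair, p in distograms.items()
}

# Potential use of the output: minimize over coordinates directly.
folded = fold(distograms, [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
recovered = {(i, j): math.dist(folded[i], folded[j]) for (i, j) in targets}
print(most_likely)
print({pair: round(d, 2) for pair, d in recovered.items()})
```

The point of the sketch is only that once every pair has a continuous energy curve, folding reduces to ordinary minimization over coordinates, with no separate constraint-satisfaction engine in the loop.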
There are some important technical details. The potential is not used as is, but is normalized using a learned “reference state”, harking back to the old days of knowledge-based potentials like DFIRE and the Quasichemical potential (parenthetically, I wrote a couple of papers on the topic, developing what I think was the first ML-based potential for protein-DNA interactions.) This normalization evidently had a large impact. Furthermore, their potential is coupled with a more traditional physics-based potential and the combined energy function is what is actually minimized.
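To illustrate why a reference state matters, here is a small sketch using the classic inverse-Boltzmann form from knowledge-based potentials, where the energy of a distance bin is −log(p_predicted / p_reference). I do not know the exact form of DeepMind’s learned reference state, so treat this as the textbook version rather than theirs; the four-bin probabilities are invented for the example.

```python
import math


def reference_corrected_potential(predicted, reference, floor=1e-8):
    """Per-bin energies -log(p_predicted / p_reference).

    floor is a hypothetical safeguard against taking log(0).
    """
    return [
        -math.log(max(p, floor) / max(q, floor))
        for p, q in zip(predicted, reference)
    ]


def argmin(values):
    return min(range(len(values)), key=values.__getitem__)


# Invented 4-bin example, ordered from short to long distances.
reference = [0.05, 0.15, 0.30, 0.50]  # background: long distances dominate
predicted = [0.10, 0.20, 0.30, 0.40]  # prediction for one residue pair

raw = [-math.log(p) for p in predicted]
corrected = reference_corrected_potential(predicted, reference)

print("raw minimum at bin", argmin(raw))
print("corrected minimum at bin", argmin(corrected))
```

In this example the uncorrected potential favors the longest bin simply because long distances are common in general, while the corrected potential favors the shortest bin, where the prediction is most enriched relative to the background.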
This idea of predicting a protein-specific energy potential brings AlphaFold’s approach into proximity to another approach, called NEMO, which is currently in open review at ICLR. While the submission is anonymous, it is fair to conclude that, given this talk, it’s been developed by John Ingraham, Adam Riesselman, Chris Sander, and Debora Marks. NEMO too generates a protein-specific energy potential that is then minimized to yield the final protein, but the similarities end there. AlphaFold generates the potential using a neural network, but once done, turns it over to a minimizer that operates independently and is not optimized jointly with the neural network. NEMO on the other hand turns the entire folding process into a differentiable Langevin dynamics simulator, and backpropagates from the final predicted structure through a few hundred steps of the simulator into the neural network variables. Additionally NEMO, like RGNs, only uses raw sequence information and PSSMs.
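The following toy is meant only to convey what differentiating through an unrolled simulator means; it is in no way NEMO’s model. The energy is a one-dimensional quadratic U(x) = ½(x − θ)² with a single learnable parameter θ, the step size, temperature and step counts are arbitrary, and the derivative of the final position with respect to θ is carried forward by hand with the chain rule. For this scalar case an autodiff framework’s backpropagation would return the same value.

```python
import math
import random


def rollout(theta, x0=0.0, steps=200, eps=0.1, temperature=0.01, seed=0):
    """Unrolled Langevin dynamics on U(x) = 0.5 * (x - theta)**2.

    Returns the final position and d(final position)/d(theta), accumulated
    step by step via the chain rule.
    """
    rng = random.Random(seed)
    x, dx_dtheta = x0, 0.0
    noise_scale = math.sqrt(2.0 * eps * temperature)
    for _ in range(steps):
        grad_u = x - theta  # dU/dx at the current position
        x = x - eps * grad_u + noise_scale * rng.gauss(0.0, 1.0)
        # Differentiate the update above with respect to theta
        # (the noise term does not depend on theta).
        dx_dtheta = (1.0 - eps) * dx_dtheta + eps
    return x, dx_dtheta


def train(target, theta=0.0, iterations=100, learning_rate=0.1):
    """Fit theta so that the simulator's end point lands on the target."""
    for it in range(iterations):
        x_final, dx_dtheta = rollout(theta, seed=it)
        # Gradient of the loss (x_final - target)**2 with respect to theta.
        loss_grad = 2.0 * (x_final - target) * dx_dtheta
        theta -= learning_rate * loss_grad
    return theta


theta = train(target=3.0)
print(f"learned theta ~ {theta:.2f}")
```

The contrast with the AlphaFold setup described above is that here the loss on the final structure reaches the energy’s parameters through the simulator itself, whereas a separate minimizer sees a potential that was trained without it.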
While the AlphaFold and NEMO approaches do harken back to knowledge-based potentials, they are different in a fundamental way. The knowledge-based potentials of yore (and current physics-based potentials like Rosetta) are universal, in the sense that they at least pretend to be applicable to any protein, and would yield the right result if enough sampling was done to find their minimum. Whether this is true or not in practice is a different matter. The protein-specific potentials of AlphaFold and NEMO are quite different beasts. They are entirely a consequence of the MSA (or sequence + PSSM) that they depend on. What they do is construct a potential surface, particularly in the case of AlphaFold, that is very smooth for the given protein family, and whose minimum closely matches that of the native protein (-family average) fold. It is fantastic (and surprising to some, but I would argue RGNs already showed it is possible by doing so implicitly in the RGN latent space) and extremely useful, in that it allows one to make accurate predictions given an MSA. But it is not an energy potential in the conventional sense.
I should say that this is my characterization and not DeepMind’s. In general I have fairly strong feelings about protein-specific energy potentials, and was planning on writing a more detailed blog post about the topic in connection with the NEMO paper, but have not gotten around to it yet (and unfortunately probably never will.)
Below is a table that summarizes my view of how all the approaches I have discussed so far relate. Adjacent columns in the table indicate methods that in some sense are most similar, but because this is a multi-dimensional space, the relationships are more complex than that. For example, Xu’s approach is similar to AlphaFold because of their prediction of distances, while NEMO is similar to AlphaFold because of their use of protein-specific energy potentials, while NEMO and RGNs are similar because they are end-to-end differentiable and don’t use MSA data, which puts them in a different category altogether. I should point out that NEMO did not participate in CASP13, and neither NEMO nor RGNs are broadly competitive with the other methods (particularly on template-based modeling (TBM) for RGNs), at least in part because they are using a lot less information.
| | Zhang | Xu | AlphaFold | NEMO | RGN |
|---|---|---|---|---|---|
| Inputs | MSA | MSA | MSA | Sequence or PSSM | PSSM |
| Outputs (pre-folding) | Contacts | Distances | Distributions over distances | Cartesian coordinates (folding internal) | Cartesian coordinates (folding internal) |
| Folding | I-Tasser | CNS | L-BFGS | Differentiable Langevin dynamics | Implicit |
| Energy function | Explicit, fixed, and universal | None | Explicit, learned, and MSA-specific | Explicit, learned, and sequence- or PSSM-specific | Implicit, learned, and PSSM-specific |
The careful reader will note that one column in the above table covers the Zhang group method, which I have not talked about much. Zhang’s approach is interesting for several reasons. First, it came in second during CASP13, and when looking at the overall results (not just FM but also TBM), it is not that far behind AlphaFold’s method. Remarkably, Zhang’s approach does not use predicted distances, but relies on the old style binary contacts. This raises the question of where their improvement is coming from. There are several things going on. While Xu’s approach uses the more informative distances, its folding pipeline is rather simplistic. Zhang’s approach, while using the less informative binary contacts, folds via the sophisticated I-Tasser engine. Since the groups were working independently (and largely in secrecy and competitively), they did not add up their relative contributions. If it were not for AlphaFold, this combined “double” effect may not have been seen until CASP14, but AlphaFold effectively did both at once. Of course, the way AlphaFold achieves this is not via a better folding engine, as theirs is very simple too (L-BFGS). Rather, they get around the problem by building a better energy potential using distributional information. But the advantages of having such an energy potential may be compensated by using a stronger folding engine. I-Tasser also uses templates from the PDB which can substantially help its performance on TBM targets. And perhaps there is further gain to be had by combining AlphaFold’s approach with something like I-Tasser or Rosetta, but AlphaFold’s preliminary results seem to suggest that they’ve already squeezed out what can be had from a better folding engine.
This sheds some light on AlphaFold’s novelty (more on this next.) If it weren’t for AlphaFold, what the field may have moved towards is combining Xu’s approach with Zhang’s, which would have arguably been less elegant than AlphaFold. But this is highly speculative, and it is likely there’s a “half CASP” waiting to be squeezed out by leveraging these partially complementary approaches.
Fundamental scientific insight or superb engineering?
A question that arose over many conversations at CASP13 is whether AlphaFold represents a triumph of insightful science or superb engineering. Such questions can often be silly and divisive (with science somehow occupying a higher ethereal realm than engineering), but at the heart of the question is whether AlphaFold “only” won because it has a large and well-funded team with inexhaustible compute resources, and therefore the academic community has nothing to feel bad about and need not engage in uneasy introspection (you can tell I’m gearing up to shift to the sociology), or whether they have done good science that the academic community missed out on. Insofar as this question merits answering, my own take is that it’s a mixture of the two.
On the fundamental insight front, AlphaFold had a number of good ideas. First, don’t just predict contacts, but also distances. Xu does this as well, but all indications point to the two groups having developed the idea independently. Critically, AlphaFold takes this a step further by predicting a distribution over distances, and then uses that to construct a smooth potential that is minimizable. A second good idea is the use of a reference state, which debiases the predicted potential and demonstrates a solid understanding of knowledge-based potentials that reflects positively on the DeepMind team. The fact that these ideas are “simple”, in the sense that they are unsurprising, does not detract from them in the least (personally I was actually surprised by the impact the reference potential made, but others appeared less surprised). The best science is one in which simple ideas have profound consequences, and that very much appears to be the case here. DeepMind is of course also leveraging their deep (no pun intended) expertise in machine learning. For example, the distributional prediction idea seems somewhat similar in spirit to their paper from about a year ago on distributional RL. Whether that insight had any impact on AlphaFold I don’t know, but I think it’s fair to say that the confluence of strong expertise in ML and proteins helped to bring about these advances.
On the engineering front, it’s also clear that the apparently elegant solution we see now is a result of much trial and error, and that much more complex components involving fragment assembly and so on were tried and disposed of. The ability to explore model space rapidly depends heavily on both computational and human resources. So while the final ideas are simple and elegant, they are unlikely to have been discovered if the AlphaFold team wasn’t able to sweep through idea space as rapidly as they did.
If I were to pick, I think about half of the performance improvement we see in AlphaFold comes from the simple ideas above, and about half from the sophisticated engineering of the distance-predicting neural network. If this is true, then academic groups should be able to see substantial improvements in fairly short order.
“What just happened?”
Now that the serious and respectable matters are out of the way, I can finally engage in some gossip. This part will be quite the rant. As I alluded to at the very beginning of this post, there was, in many ways, a broad sense of existential angst felt by most academic researchers at CASP13, including myself. In a delicious twist of irony, we, the people who have bet our careers on trying to obsolete crystallographers, are now worried about getting obsoleted ourselves.
I think many of us went through the following phases: (i) fearing that the DeepMind team outsmarted us all by some brilliant fundamental insight, combined with virtuoso engineering; (ii) breathing a sigh of relief that the insights were not radically different from what most of the field was thinking; (iii) (slightly) belittling DeepMind’s contribution by noting its seeming incrementality and crediting their success to Alphabet’s resources.
Setting aside the validity of the above sentiments, the underlying concern behind them is whether protein structure prediction as an academic field has a future, or whether like many parts of machine learning, the best research will from here on out get done in industrial labs, with mere breadcrumbs left for academic groups. Truth be told, I don’t know the answer, and I think it’s possible that some version of this will come to pass. What is clear is that the protein structure field has a new, and formidable, research group. For academic scientists, especially the more junior among us, we will have to contend with whether it’s strategically sound for our careers to continue working on structure prediction. Despite the size of the Baker and Zhang groups for example, I never felt intimidated by them, because on the novelty front I always felt I was several steps ahead. But with DeepMind’s entry I will have to reconsider, and from conversations with others this appears to be a nearly universal concern. Just like in machine learning, for some of us it will make sense to go into industrial labs, while for others it will mean staying in academia but shifting to entirely new problems or structure-proximal problems that avoid head-on competition with DeepMind.
So that’s what just happened. What I’d like to turn my attention to now is what this episode says about academic science, particularly as it pertains to protein structure prediction, and the scientific health of pharmaceutical companies (prepare to be roasted!)
An indictment of academic science
I don’t think we would do ourselves a service by not recognizing that what just happened presents a serious indictment of academic science. There are dozens of academic groups, with researchers likely numbering in the (low) hundreds, working on protein structure prediction. We have been working on this problem for decades, with vast expertise built up on both sides of the Atlantic and Pacific, and not insignificant computational resources when measured collectively. For DeepMind’s group of ~10 researchers, with primarily (but certainly not exclusively) ML expertise, to so thoroughly rout everyone surely demonstrates the structural inefficiency of academic science. This is not Go, which had a handful of researchers working on the problem, and which had no direct applications beyond the core problem itself. Protein folding is a central problem of biochemistry, with profound implications for the biological and chemical sciences. How can a problem of such vital importance be so badly neglected?
“I believe that science, at its most creative, is more akin to a hunter-gatherer society than it is to a highly regimented industrial activity, more like a play group than a corporation.” – Marc Kirschner
I wholeheartedly agree with this, and think it is a good thing. The problem occurs when we take this analogy to mean that each small unit of hunter-gatherers must defend its turf at all costs, as if the acquisition of scientific knowledge were akin to the hoarding of food. Science is, in the final analysis, a collective enterprise, and we all gain the greatest benefit when we cooperate and share our knowledge. An element of competitiveness is unavoidable given the human nature of this activity, but it should not rise to the toxicity that currently characterizes much of academia.
More important, and this is where the protein structure field has a very serious problem, the sharing of information must occur with frequent regularity. Even if individual groups are secretive while carrying out their research, if the frequency of sharing is on the order of months, as is typically the case in machine learning, the field can still progress at a rapid pace. But in part due to the canonicalization of CASP, protein structure prediction effectively has a two-year clock cycle, where separate research groups guard their discoveries until after CASP results are announced. As I discussed earlier, it is clear that between the Xu and Zhang groups enough was known to develop a system that would have perhaps rivaled AlphaFold. But because of the siloed nature of the field, it only gets a “gradient update” once every two years. Academic groups are thus forced to independently rediscover the wheel over and over. In DeepMind’s case, even though the team was small in comparison to the total headcount of academic groups, they were presumably able to share information on a very regular basis, and this surely contributed to their success.
The reliance on CASP dates back to an era when structure prediction did not work at all, and when best practices about data separation and prevention of information leakage were not broadly understood. We exist in a very different climate today. Most researchers understand the issues, and are perfectly capable of constructing training and test sets that properly assess the performance of their methods. My own work in this effort, the ProteinNet dataset, is one concrete contribution I have made to democratize and speed up progress in the field. There will invariably be papers with poor controls and exaggerated claims, but the paranoia of cheating and ineptitude must be balanced with encouraging rigorous but rapidly evolving method development.
CASP serves a crucial purpose, and must continue to do so. DeepMind’s results would not have been nearly as convincing had they not taken place as part of CASP. But we must have a middle ground between the gold standard and a more iterative approach to publication and information exchange. CAMEO helps in this regard, but its targets are often not difficult enough. ProteinNet or something like it, such as the NEMO authors’ approach of using CATH-based purging, should be encouraged as a means to provide acceptable assessment of model quality, especially when it is coupled with release of source code that enables transparent reproduction of the training process.
To be sure, the above will not close the gap between academic and industrial research. There are other, more fundamental problems. For example, competitively-compensated research engineers with software and computer science expertise are almost entirely absent from academic labs, despite the critical role they play in industrial research labs. Much of AlphaFold’s success likely stems from the team’s ability to scale up model training to large systems, which in many ways is primarily a software engineering challenge. While academic labs do not need to perform at the level of Google, they must perform at an adequate enough level to support the core scientific mission of their institutions, and this is not currently happening in my opinion.
An indictment of pharma
What is worse than academic groups getting scooped by DeepMind? The fact that the collective powers of Novartis, Pfizer, etc., with their hundreds of thousands (~million?) of employees, let an industrial lab that is a complete outsider to the field, with virtually no prior molecular sciences experience, come in and thoroughly beat them on a problem that is, quite frankly, of far greater importance to pharmaceuticals than it is to Alphabet. It is an indictment of the laughable “basic research” groups of these companies, which pay lip service to fundamental science but focus so myopically on target-driven research that they managed to badly embarrass themselves in this episode.
If you think I’m being overly dramatic, consider this counterfactual scenario. Take a problem proximal to tech companies’ bottom line, e.g. image recognition or speech, and imagine that no tech company was investing research money into the problem. (IBM alone has been working on speech for decades.) Then imagine that a pharmaceutical company suddenly enters ImageNet and blows the competition out of the water, leaving the academics scratching their heads at what just happened and the tech companies almost unaware it even happened. Does this seem like a realistic scenario? Of course not. It would be absurd. That’s because tech companies have broad research agendas spanning the basic to the applied, while pharmas maintain anemic research groups on their seemingly ever continuing mission to downsize internal research labs while building up sales armies numbering in the tens of thousands of employees.
If you think that image recognition is closer to tech’s bottom line than protein structure is to pharma’s, consider the fact that some pharmaceuticals have internal crystallographic databases that rival or exceed the PDB in size for some protein families.
And if you counter with the argument that machine learning is not pharma’s core expertise, then you only prove my point: why isn’t it? While drug companies wrangle over self-titillating questions like “is AI real?” and “how is deep learning any different than the QSAR we did in the 80s”, Alphabet swoops in and sets up camp right in their backyard. As a result the smartest and most ambitious researchers wanting to work on protein structure will look to DeepMind for opportunities instead of Roche or GSK. This fact should send chills down the spines of pharma executives, but it won’t, because they’re clueless, rudderless, and asleep at the helm.
I am being harsh because this has long been a pet peeve of mine. While companies like Alphabet, Facebook, Microsoft, Intel, and IBM have real research groups with billions of dollars spent on fundamental R&D that has led to Nobel- or Turing-grade research, pharmaceuticals engage in “research” so narrowly defined that it rarely contributes to our understanding of basic biology. There is perhaps no better example of this than protein structure prediction, a problem that is very close to these companies’ core interest (along with docking), but on which they have spent virtually no resources. The little research on these problems done at pharmas is almost never methodological in nature, instead being narrowly focused on individual drug discovery programs. While the latter is important and obviously contributes to their bottom line, much like similar research done at tech companies, the lack of broadly minded basic research may have robbed biology of decades of progress, and contributed to the ossification of these companies’ software and machine learning expertise (there is a reason most newly minted ML PhDs run from pharmas like they’re the plague: they have not cultivated a culture that attracts the world’s best ML talent, in part because of their lack of engagement in basic science.) The AlphaFold episode is only one example among several problems that have been similarly neglected. It is of course possible that these companies have some newfangled protein structure prediction technology internally, but I’m well networked in these circles and I have seen no indication whatsoever that this is the case.
Smaller and newer companies like AtomWise have done better, focusing more seriously on methodological research, and it will likely take a Silicon Valley-like disruptor to finally turn things around.
The way forward
So what now? Should academics fold up their protein structure research programs and move on to greener and less competitive pastures? And will the space see new entrants from other companies, possibly life science ones? I am still digesting CASP13 and by no means have a definitive recipe, but here are my thoughts so far.
First and foremost, we should recognize that what just happened is an unqualified good. We, meaning the entire scientific community, have made a major advance on one of the most important problems in biochemistry. Who made the advance is less important than the fact the advance was made, and we should unselfishly rejoice in this fact. I say this cognizant of the fact that my own emotions do not entirely coincide with the sentiment I just espoused, but also cognizant of the fact that we are all adults, and that we must and ought to assess this rationally without letting our tribal affiliations cloud our judgement.
DeepMind’s entry also brings several unintended benefits. We have a new, world class research team in the field, competitive with the very best existing teams. This has happened maybe once a decade if that. We should welcome them with open arms as, first and foremost, new colleagues with shared purpose. We should encourage them to be as open as the academic teams have been in sharing their research, which they appear to be, and learn from them how to improve our engineering practices, and perhaps more importantly, use their lesson to cultivate a better and more open culture of exchange of ideas, instead of the secretive and siloed behavior that characterizes the field.
DeepMind’s entry also raises the profile of the protein structure problem, likely motivating new students and researchers to work on it, inside of academia and outside. Perhaps DeepMind’s entry will also wake pharmas from their deep slumber, so that they too begin to stir with new ideas and resources.
Second, regarding the question of how academic groups should respond scientifically to DeepMind’s entry, I suspect the right answer comes from evolution: adapt. Focus on problems that are less resource intensive, and that require key conceptual breakthroughs and less engineering. Solving protein structure is really multiple problems in one. There is what I would characterize as the canonical problem, the prediction of the overall fold of the native state, and it is the one that most people have focused on including DeepMind. This problem remains unsolved, but it’s clear that for perhaps ~30% of such predictions, we can do very well, and for another ~20% reasonably well. If the trend continues, and there are compelling reasons to argue either way, then something like a solution to this problem is conceivable within ~5 years. That solution may come primarily from better engineering, and so perhaps it represents a less favorable strategic landscape for competition.
Most approaches to the above problem have come from co-evolution based methods and so they are by construction “family-level”. They are able to say a lot less about an individual protein sequence, such as a mutated or de novo designed protein. This is the reason why I have focused on this problem for RGNs, as it is a new frontier. It is unclear if we are even marginally closer to solving this after CASP13—I think there is no indication of any real progress here. And so we could just as well be 20 years out.
Even for MSA / family-level predictions, there is the question of desired accuracy, which hinges on the biological application. If one is predicting protein structures to ascertain their general fold for function classification, then high accuracy is unnecessary. If on the other hand the objective is to design small molecule drugs that bind proteins, which requires ~1Å accuracy in the local pocket, it is unclear if we have made any detectable progress.
Finally there is the full realization of the protein folding problem, which concerns not only the final native state but the dynamical trajectory the protein takes to get there, as well as the relative energetics of the near native state ensembles. This is arguably the most important problem for protein function prediction, and it remains very far from being solved.
So let us find and learn the important lessons of CASP13; use it to improve our models and our culture; and, recognizing that we are imperfect, competitive humans, rise above our pettiness to celebrate an important milestone for science.
Post-mortem: RGN @ CASP13
While not the primary subject of this blog post, I would be remiss to not comment on my own participation in CASP13 as a predictor for the very first time! The experience was interesting and informative, but in the end RGNs did not perform well, for reasons I will explain briefly. I should say that it is not possible to know definitively without a thorough analysis, so these are only my best guesses for the moment.
First, given the overall improvement of all co-evolution based methods in CASP13 (even ones that have not changed since CASP12), it appears that the increased availability of protein sequences has widened the gap in information advantage between methods that use co-evolution and those that do not (like RGNs.) I suspect this was the biggest factor in lowering the RGN’s relative ranking. Parenthetically, this suggests a new ultra-hard FM category, single sequence targets without detectable homologs, an idea brought up during CASP13.
Another problem, which I only discovered at the beginning of the CASP13 prediction season, is that all my raw predictions got immediately rejected by CASP’s automatic processing pipeline. This is due to RGN-predicted structures having non-physical torsion angles. In some ways it is unsurprising since, as a machine learning model, RGNs only optimize for what they are trained for, in this case dRMSD. So while the overall global topology of the structures can be quite good, locally the structures are often poor (a point I mention in the paper), and this prevented submissions from going through.
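To make the dRMSD point concrete, here is a minimal sketch of the metric in plain Python. It assumes the common definition (root mean square of the differences between all pairwise intra-structure distances) and uses a made-up five-atom trace; the function names and coordinates are hypothetical and are not taken from the RGN code.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Distances for all unordered pairs (i < j) of 3D points."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def drmsd(pred, ref):
    """Distance-based RMSD between two equally long lists of (x, y, z) points.

    Assumed definition: sqrt of the mean squared difference between
    corresponding pairwise distances in the two structures.
    """
    if len(pred) != len(ref):
        raise ValueError("structures must have the same number of atoms")
    if len(pred) < 2:
        raise ValueError("need at least two atoms")
    d_pred = pairwise_distances(pred)
    d_ref = pairwise_distances(ref)
    squared = sum((p - r) ** 2 for p, r in zip(d_pred, d_ref))
    return math.sqrt(squared / len(d_ref))


# Hypothetical toy trace, for illustration only (units: angstroms).
ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0),
       (8.0, 4.0, 2.0), (9.0, 7.5, 3.0)]

# Mirror image of the same trace: every pairwise distance is unchanged.
mirrored = [(-x, y, z) for x, y, z in ref]

# Same trace with only the middle atom displaced by 1 angstrom.
local = list(ref)
local[2] = (5.0, 3.6, 1.0)

print(drmsd(ref, ref))       # identical structures
print(drmsd(mirrored, ref))  # reflection is invisible to the metric
print(drmsd(local, ref))     # a local distortion, averaged over all pairs
```

Because the metric only sees inter-atomic distances, it contains no term for torsion angles or chirality, and a distortion at one position is averaged over every pair in the chain. That is consistent with a model scoring well on global topology while still producing locally poor geometry.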
To get around this problem I fed my predicted structures through the Rosetta FastRelax pipeline, which partly defeats the purpose of my method, but my aim was to get structures with sufficiently acceptable local structure to get them past the CASP processing pipeline. This worked most of the time, in the sense that I was able to submit structures, but had the effect of altering them in a way that impacted their accuracy.
It is hard to say yet how much of a contribution this made to reducing RGN performance, and there are other factors that also contributed. For example I used an old RGN model trained on the ProteinNet12 dataset, i.e. a couple of years out of date, because I did not have time to retrain for CASP13. I doubt this made a major difference, but it was likely a contributor.
All in all it was a good learning experience, and will give me much to think about over winter break.
Thanks to David Baker, Alex Bridgland, Jianlin Cheng, Richard Evans, Tim Green, John Jumper, Daisuke Kihara, John Moult, Sergey Ovchinnikov, Andrew Senior, Jinbo Xu, and Augustin Zidek for lively discussions during CASP13 that formed the basis for much of the content of this post.
- Golkov, V. et al. Protein Contact Prediction from Amino Acid Co-Evolution Using Convolutional Networks for Graph-Valued Images. in Annual Conference on Neural Information Processing Systems (NIPS) (2016).
- Jones, D. T., Buchan, D. W. A., Cozzetto, D. & Pontil, M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics 28, 184–190 (2012).
- Kamisetty, H., Ovchinnikov, S. & Baker, D. Assessing the utility of coevolution-based residue-residue contact predictions in a sequence- and structure-rich era. Proc. Natl. Acad. Sci. U.S.A. (2013). doi:10.1073/pnas.1314045110
- Liu, Y., Palmedo, P., Ye, Q., Berger, B. & Peng, J. Enhancing Evolutionary Couplings with Deep Convolutional Neural Networks. Cell Systems 6, 65-74.e3 (2018).
- Marks, D. S. et al. Protein 3D structure computed from evolutionary sequence variation. PLoS ONE 6, e28766 (2011).
- Wang, S., Sun, S., Li, Z., Zhang, R. & Xu, J. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model. PLOS Computational Biology 13, e1005324 (2017).
- Xu, J. Distance-based Protein Folding Powered by Deep Learning. bioRxiv 465955 (2018). doi:10.1101/465955