I recently had the good fortune of attending the 10th Critical Assessment of protein Structure Prediction, or CASP, as it is affectionately known. CASP is a competition of sorts that takes place once every two years to assess the progress made in computationally predicting protein structure. It is a blind experiment: the structures to be predicted are not known to the participants beforehand, so it serves as an unbiased test of the predictive power of current computational methods. It is in many ways a model that the rest of computational biology ought to follow, and is starting to.
I went to CASP in part because it is very relevant to my research; I develop computational methods that use the structures of molecules to predict their binding affinity to other molecules. But I also went to gain a better understanding of the state of structural biology, computational and otherwise. It is no secret that in the past decade, as the genomics revolution kicked into high gear, (computational) structural biology has made only slow progress. Science follows fashions, and structural biology is currently out of fashion. There are many possible reasons for this. Two obvious ones are the explosion of available sequence data, which has left sequence-based analysis rich in low-hanging fruit, and the slow progress in computational structural biology itself. Our ability to predict structures has been stagnant, with most of the improvement coming from the availability of more structural data rather than from fundamental theoretical advances. This stagnation has in fact been a topic of discussion for at least the past two CASPs.
The expectations were somewhat different this year, however. A set of publications, coming primarily from two groups [2,3], has demonstrated surprising accuracy in predicting protein structure. Furthermore, these methods attack the problem in an entirely different way from the so-called fragment-assembly methods that are the current standard in the field. They do so by exploiting the evolutionary signal in protein sequences. In particular, they search for residues that co-evolve and treat that co-evolution as an indication that the residues may be in physical contact. The idea has been around for some time, but recent progress has come from employing more sophisticated mathematical machinery, and these advances appear to make a significant difference. My expectation, and I think that of a few others, was that these co-evolution methods were going to make a big splash at CASP10.
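To make the co-evolution idea concrete, here is a minimal sketch that scores pairs of alignment columns by their mutual information. This is a deliberately simple, older covariation measure and not the more sophisticated machinery used in [2,3]; the function names and the toy alignment are made up for illustration only.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    counts_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


def rank_column_pairs(msa):
    """Return ((i, j), score) for every column pair, highest score first."""
    length = len(msa[0])
    scored = [
        ((i, j), column_mutual_information(msa, i, j))
        for i, j in combinations(range(length), 2)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy alignment: columns 1 and 3 always change together
# (K pairs with E, E pairs with K); column 2 is fully conserved.
toy_msa = [
    "AKLE",
    "AKLE",
    "AELK",
    "AELK",
    "GKLE",
    "GELK",
]

ranked = rank_column_pairs(toy_msa)
top_pair, top_score = ranked[0]
print(top_pair, round(top_score, 3))  # prints (1, 3) 1.0
```

In this toy case the pair of columns that always change together gets the top score, and a real predictor would read such high-scoring pairs as candidate contacts. With only six sequences the estimate is trivially clean; on real protein families the frequency estimates are only reliable when many sequences are available.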
Unfortunately, the co-evolution methods did not make that splash. That’s the bad news. The good news is that it is not because they made inaccurate predictions. Instead, these methods require a very large number of sequences to work, on the order of thousands to tens of thousands, and the targets used in the relevant CASP category (so-called free modeling) simply did not have that many sequences available. This leaves open the possibility that in the next CASP we will see a significant breakthrough made by co-evolution methods.
So that was a letdown. But what of the bigger question, the health and long-term outlook of the field? Coming as an outsider, I repeatedly probed the conference attendees on this question, and what I received in reply was somewhat interesting. Almost everyone I talked to readily admitted that the field has been stagnant, and that it is getting increasingly difficult to do the things that are the lifeblood of a scientific discipline: raise money, attract students, and publish in high-impact journals. Yet, somewhat inexplicably, there was also a budding sense of optimism that has been lacking for years. In part it may have been sparked by the recent developments in co-evolution methods, which, although not making the splash some had hoped for, still hold a lot of promise. But my sense is that it is broader than that.
The following is pure speculation on my part, but I am betting that structure will be making a comeback soon, possibly in a year or two, possibly a little after that. I think new computational methods will play a crucial role in this process, as we get better, possibly much better, at predicting structures and their interactions. But another potential driver is the need to interpret genomic data. While the explosion in ultra-high-throughput sequencing has been nothing short of revolutionary, that field is not without its own set of problems. Chief among them is our inability to make sense of the growing mound of genomic data. Perhaps the highest-profile example of this is the repeated failure of genome-wide association studies (GWAS) to yield highly predictive loci for diseases or traits. It appears increasingly likely that the mere availability of enormous amounts of sequence data will not by itself yield scientific insights, particularly if the attempted leap is directly from sequence to final organismal-level phenotype. The basic problem here is that the mapping function is far too complex. To go from sequence to something like a person’s height involves a highly non-linear set of mappings, and exploring the full space of these mappings without utilizing prior knowledge is hopeless, no matter how many sequences we throw at the problem. The field of genetics so far has largely concerned itself with the tiny fraction of genetic traits that can be traced to one or a handful of loci. But the vast sea that is the human phenotypic landscape will involve much more complex mappings.
So how does structure fit into all this? Structure represents in many ways the first step forward. In going from sequence to phenotype, the first question we should ask is “what are the molecular consequences of these sequences?” Our ability to reliably move between sequence and structure, which is what the structure prediction problem is ultimately about, is central to answering this question, which in turn is central to the interpretation of genomic data. This realization has already started permeating genomics. As it continues to do so, and as potentially significant breakthroughs are made in protein structure prediction using co-evolution methods and possibly other, entirely new approaches, structural biology may suddenly find itself in the spotlight.