I just came back from CASP13, the biennial assessment of protein structure prediction methods (I previously blogged about CASP10). I participated in a panel on deep learning methods in protein structure prediction and also competed as a predictor (more on that later). If you keep tabs on science news, you may have heard that DeepMind’s debut went rather well. So well, in fact, that they not only took first place but also put a comfortable distance between themselves and the second-place predictor (the Zhang group) in the free modeling (FM) category, which focuses on modeling novel protein folds. Is the news real or overhyped? What is AlphaFold’s key methodological advance, and does it represent a fundamentally new approach? Is DeepMind forthcoming in sharing the details? And what was the community’s reaction? I will summarize my thoughts on these questions and more below. At the end I will also briefly discuss how RGNs, my end-to-end differentiable model for structure prediction, did on CASP13.
For over a decade now I have been working, essentially off the grid, on protein folding. I started thinking about the problem during my undergraduate years and actively working on it from the very beginning of grad school. For about four years, during the late 2000s, I pursued a radically different approach (to what was current then and now) based on ideas from Bayesian nonparametrics. Despite spending a significant fraction of my Ph.D. time on the problem, I made no publishable progress, and ultimately abandoned the approach. When deep learning began to make noise in the machine learning community around 2010, I started thinking about reformulating the core hypothesis underlying my Bayesian nonparametrics approach in a manner that can be cast as end-to-end differentiable, to utilize the emerging machinery of deep learning. Today I am finally ready to start talking about this long journey, beginning with a preprint that went live on bioRxiv yesterday.
I previously blogged about my adventures in self-quantification (QS). In that post I wrote about the general system but did not delve into specific projects. Ultimately, however, the utility of self-quantification lies in the detailed insights it gives, so I’m going to dive deeper into a project that passed a major milestone earlier today: publication of a paper. If you’re interested in the science behind this project, see my other post, A New Way to Read the Genome. Here I will focus on the application and utility of QS as applied to individual projects.
I am pleased to announce that earlier today the embargo was lifted on our most recent paper. This work represents the culmination of over two years of effort by my collaborators and me. You can find the official version on the Nature Genetics website here, and the freely available ReadCube version here. In this post, I will focus on making the science accessible to the lay reader. I have also written another post, The Quantified Anatomy of a Paper, which delves into the quantified-self analytics of this project.
There has been a lot of renewed interest lately in neural networks (NNs) due to their popularity as a model for deep learning architectures (there are also non-NN-based deep learning approaches built on sum-product networks and support vector machines with deep kernels, among others). Perhaps due to their loose analogy with biological brains, the behavior of neural networks has acquired an almost mystical status. This is compounded by the fact that theoretical analysis of multilayer perceptrons (one of the most common architectures) remains very limited, although the situation is gradually improving. To gain an intuitive understanding of what a learning algorithm does, I usually like to think about its representational power, as this provides insight into what can, if not necessarily what does, happen inside the algorithm to solve a given problem. I will do this here for the case of multilayer perceptrons. By the end of this informal discussion I hope to provide an intuitive picture of the surprisingly simple representations that NNs encode.
It is tempting to assume that with the appropriate choice of weights for the edges connecting the second and third layers of the NN discussed in this post, it would be possible to create classifiers that output over any composite region defined by unions and intersections of the 7 regions shown below.
There has been much haranguing about the apparent uselessness of the federal government. While I am no political pundit, I can speak about my little corner of the universe. The US federal government includes something called the National Institutes of Health or NIH, which happens to be the largest scientific research organization in the world. With a budget of over $30 billion, it spends more on research than Microsoft, IBM, Intel, Google, and Apple combined, supporting over 300,000 researchers nationwide. It also employs 6,000 scientists internally, who collectively produce more biomedical research than any other organization in the United States. What does it mean for the NIH staff to be furloughed? It means that every single day, 16.4 research years are wasted (6,000 scientists each losing one day, or 6,000/365 ≈ 16.4 years), or about three Ph.D. theses. This is likely to be an underestimate because the scientists employed by the NIH are professionals whose scientific output exceeds that of graduate students, and the quality of NIH-produced research backs this up. What kind of research will be delayed every day? You can read the list yourself, but it includes things like deciphering the genetic code, inventing MRI, and sequencing the human genome. This is not hyperbole; all these discoveries were made by NIH-supported researchers, who have received 83 Nobel prizes in total.
The US is the world’s preeminent scientific superpower, “a player without peer” as Nature recently put it. Only through profound and self-inflicted displays of stupidity such as we have witnessed during the past 24 hours will this cease to be the case.
I just came back from ICSB 2013, the leading international conference on systems biology (short write-up here). During the conference Bernhard Palsson gave a great talk, which he ended by promoting a view that (I suspect) is widely held among computational and theoretical biologists but rarely voiced: most high-impact journals require that novel predictions be experimentally validated before they are deemed worthy of publication, by which point they cease to be novel predictions. Why not allow scientists to publish predictions by themselves?
I recently had the pleasure of attending the 14th International Conference on Systems Biology in Copenhagen. It was a five-day, multi-track bonanza, a strong sign of the field’s continued vibrancy. The keynotes were generally excellent, and while I cannot help but feel a little dismayed by the incrementalism that is inherent to scientific research and that is on display in conferences, the forest view was encouraging and hopeful. This is one of the most exciting fields of science today.