Protein Linguistics

For over a decade now I have been working, essentially off the grid, on protein folding. I started thinking about the problem during my undergraduate years and began actively working on it at the very beginning of grad school. For about four years in the late 2000s, I pursued an approach radically different from what was current then (and now), based on ideas from Bayesian nonparametrics. Despite spending a significant fraction of my Ph.D. on the problem, I made no publishable progress and ultimately abandoned the approach. When deep learning began to make noise in the machine learning community around 2010, I started thinking about reformulating the core hypothesis underlying my Bayesian nonparametric approach in an end-to-end differentiable form, so that it could use the emerging machinery of deep learning. Today I am finally ready to start talking about this long journey, beginning with a preprint that went live on bioRxiv yesterday.

A Conservation Law for Empathy?

Earlier this week I found myself in Rome in the morning with about 20 minutes to spare. Walking around the neighborhood I was staying in (Trastevere), I came across an elderly nun walking along one of Rome’s bigger and more crowded streets. As I waited for her to pass through a narrow gap in the sea of people, a young woman pushing a stroller nudged her out of the way, using the stroller to claim the space in front of and beside the older woman as she overtook her. The nun grimaced but seemed resigned to what had happened. I saw this unfold after having been out for only about ten minutes. In contrast, in over twenty years of walking the streets of San Francisco, Boston, and New York, I don’t recall seeing a similar situation even once. Frequentist estimates of such occurrences, one sighting in ten minutes in Rome versus none in two decades in American cities, thus suggest very different underlying distributions.

The State of Probabilistic Programming

For two weeks last July, I cocooned myself in a hotel in Portland, OR, living and breathing probabilistic programming as a “student” in the probabilistic programming summer school run by DARPA. The school is part of DARPA’s broader Probabilistic Programming for Advanced Machine Learning (PPAML) program, which has brought a great infusion of energy (and funding) into the probabilistic programming space. Last year was the school’s inaugural year; its purpose is to introduce the languages and tools being developed and disseminate them to the broader scientific and technology communities. The school was graciously hosted by Galois Inc., which did a terrific job of organizing the event. Thankfully, they’re hosting the summer school again this year (there’s still time to apply!), which made me think that now is a good time to reflect on last year’s program and provide a snapshot of the state of the field. I will also take some liberties in prognosticating on the future of this space. Note that I am by no means a probabilistic programming expert, merely a curious outsider with a problem or two to solve.

Je Suis Charlie

Yesterday’s news about the horrific massacre in Paris shook me hard. I spent the day very upset, and the night puzzled by my extreme reaction. Terrorist attacks have become virtual fixtures of the daily news; over a dozen people were killed in Iraq yesterday alone. Why did this one bother me so much?

The Quantified Anatomy of a Paper

I previously blogged about my adventures in self-quantification (QS). In that post I wrote about the general system but did not delve into specific projects. Ultimately, however, the utility of self-quantification lies in the detailed insights it gives, so I’m going to dive deeper into a project that passed a major milestone earlier today: the publication of a paper. If you’re interested in the science behind this project, see my other post, A New Way to Read the Genome. Here I will focus on the application and utility of QS for individual projects.

A New Way to Read the Genome

I am pleased to announce that earlier today the embargo was lifted on our most recent paper. This work represents the culmination of over two years of effort by my collaborators and me. You can find the official version on the Nature Genetics website here, and the freely available ReadCube version here. In this post, I will focus on making the science accessible to the lay reader. I have also written another post, The Quantified Anatomy of a Paper, which delves into the quantified-self analytics of this project.

What Does a Neural Network Actually Do?

There has been a lot of renewed interest lately in neural networks (NNs) due to their popularity as a model for deep learning architectures (there are also non-NN-based deep learning approaches, built on sum-product networks and support vector machines with deep kernels, among others). Perhaps due to their loose analogy with biological brains, neural networks have acquired an almost mystical status. This is compounded by the fact that theoretical analysis of multilayer perceptrons (one of the most common architectures) remains very limited, although the situation is gradually improving. To gain an intuitive understanding of what a learning algorithm does, I like to think about its representational power, as this provides insight into what can, if not necessarily what does, happen inside the algorithm to solve a given problem. I will do this here for multilayer perceptrons. By the end of this informal discussion I hope to provide an intuitive picture of the surprisingly simple representations that NNs encode.
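To make the idea of simple representations concrete, here is a minimal sketch (not the post's own construction) that assumes a two-input perceptron with hard step activations. Each hidden unit tests which side of a line a point lies on, and the output unit thresholds the resulting binary code. The function names and the weights `W1`, `b1`, `w2`, `b2` are hypothetical, chosen so that the output is the intersection of three half-planes, i.e. a triangle.

```python
import numpy as np

def step(z):
    """Heaviside threshold: 1.0 where z > 0, else 0.0."""
    return (z > 0).astype(float)

def mlp_step(x, W1, b1, w2, b2):
    """Two-layer perceptron with step activations (illustrative only).

    x: (n, 2) points; W1: (k, 2); b1: (k,); w2: (k,); b2: scalar.
    Hidden unit i reports which side of the line W1[i]·x + b1[i] = 0 a point is on.
    """
    h = step(x @ W1.T + b1)   # (n, k) binary code, one bit per half-plane
    return step(h @ w2 + b2)  # (n,) output: a linear threshold on that code

# Hypothetical first layer: half-planes x > 0, y > 0, x + y < 1.
W1 = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b1 = np.array([0.0, 0.0, 1.0])
# Output fires only when all three hidden units are on (AND = intersection),
# which carves out the triangle with vertices (0,0), (1,0), (0,1).
w2 = np.ones(3)
b2 = -2.5

pts = np.array([[0.2, 0.2], [0.8, 0.8], [-0.1, 0.5], [0.4, 0.1]])
out = mlp_step(pts, W1, b1, w2, b2)
print(out)
```

With `b2 = -0.5` instead, the same output unit fires whenever any hidden unit is on, giving the union of the half-planes rather than their intersection. Trained networks typically use smooth activations such as sigmoids, which blur these boundaries, but the hyperplane-then-combine picture is the same.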


It is tempting to assume that with the appropriate choice of weights for the edges connecting the second and third layers of the NN discussed in this post, it would be possible to create classifiers that output 1 over any composite region defined by unions and intersections of the 7 regions shown below.
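One way to probe this assumption is a small experiment. The sketch below rests on assumptions that are not from the post: step activations, layers counted from the input (so the second-to-third-layer edges feed a single output unit), and three hypothetical lines, x = 0, y = 0, and x + y = 1, which cut the plane into 7 regions, each with its own 3-bit hidden code. A union of regions is realizable only if some single threshold unit on those codes outputs 1 on exactly the chosen codes. The search tries integer weights in [-3, 3], which covers every threshold function of three binary inputs.

```python
import itertools
import numpy as np

def step(z):
    """Heaviside threshold returning integer 0/1."""
    return (z > 0).astype(int)

# Hypothetical first layer: lines x = 0, y = 0, x + y = 1.
W1 = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b1 = np.array([0.0, 0.0, 1.0])

def region_codes(W1, b1, lim=3.0, n=121):
    """Distinct hidden-layer codes seen on a grid; each code labels one region."""
    g = np.linspace(-lim, lim, n)
    xx, yy = np.meshgrid(g, g)
    pts = np.column_stack([xx.ravel(), yy.ravel()])
    h = step(pts @ W1.T + b1)
    return sorted({tuple(int(v) for v in row) for row in h})

def realizable(target, codes, wmax=3):
    """True if one output unit step(w·h + b) is 1 exactly on the codes in `target`."""
    codes_arr = np.array(codes)
    labels = np.array([c in target for c in codes])
    bound = codes_arr.shape[1] * wmax
    for w in itertools.product(range(-wmax, wmax + 1), repeat=codes_arr.shape[1]):
        s = codes_arr @ np.array(w)
        # Integer weights give integer sums, so half-integer biases suffice.
        for b in np.arange(-bound - 0.5, bound + 1.0, 1.0):
            if np.array_equal(s + b > 0, labels):
                return True
    return False

codes = region_codes(W1, b1)
triangle = {(1, 1, 1)}                                # intersection of all three
either = {c for c in codes if c[0] or c[1]}           # union: x > 0 or y > 0
odd = {c for c in codes if sum(c) % 2 == 1}           # XOR-like union of 4 regions
print(len(codes), realizable(triangle, codes), realizable(either, codes), realizable(odd, codes))
```

For these lines, the central triangle and the non-convex union "x > 0 or y > 0" are both realizable, but the union of the four regions with an odd number of active hidden units is not, because that labeling is not linearly separable in the 3-bit code space.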

Question: Would You Exit?

This post is a question to you, dear reader. Consider the following scenario: death is no longer at everyone’s doorstep. Anyone can choose to live healthily for as long as they wish, with the caveat that no new person can be born unless someone already living decides to “exit”, a euphemism for a completely painless death, as easy as walking through a door. Thus, while one can go on living forever, doing so would hypothetically deprive some other person of the joys (and pains) of life and growth.

If you lived in such a world, would you ever exit? If so, what would you want to do or accomplish first before freeing up your spot for someone else? Feel free to comment below.