# The Quantified Anatomy of a Paper

I previously blogged on my adventures in self quantification (QS). In that post I wrote about the general system but did not delve into specific projects. Ultimately, however, the utility of self quantification is in the detailed insights it gives, and so I’m going to dive deeper into a project that passed a major milestone earlier today: publication of a paper. If you’re interested in the science behind this project, see my other post, A New Way to Read the Genome. Here I will focus on the application and utility of QS as applied to individual projects.

# A New Way to Read the Genome

I am pleased to announce that earlier today the embargo was lifted on our most recent paper. This work represents the culmination of over two years of effort by my collaborators and me. You can find the official version on the Nature Genetics website here, and the freely available ReadCube version here. In this post, I will focus on making the science accessible to the lay reader. I have also written another post, The Quantified Anatomy of a Paper, which delves into the quantified-self analytics of this project.

# NIH Spending Versus Diseases That Kill Us

An infographic has been making the rounds lately, purporting to depict the amount of money donated to help fight various diseases versus the number of actual deaths caused by each disease. This is the original infographic:

# What Does a Neural Network Actually Do?

There has been a lot of renewed interest lately in neural networks (NNs) due to their popularity as a model for deep learning architectures (there are non-NN based deep learning approaches based on sum-product networks and support vector machines with deep kernels, among others). Perhaps due to their loose analogy with biological brains, the behavior of neural networks has acquired an almost mystical status. This is compounded by the fact that theoretical analysis of multilayer perceptrons (one of the most common architectures) remains very limited, although the situation is gradually improving. To gain an intuitive understanding of what a learning algorithm does, I usually like to think about its representational power, as this provides insight into what can, if not necessarily what does, happen inside the algorithm to solve a given problem. I will do this here for the case of multilayer perceptrons. By the end of this informal discussion I hope to provide an intuitive picture of the surprisingly simple representations that NNs encode.

Aside

It is tempting to assume that with the appropriate choice of weights for the edges connecting the second and third layers of the NN discussed in this post, it would be possible to create classifiers that output $1$ over any composite region defined by unions and intersections of the 7 regions shown below.

# The Federal Government is Not Useless

There has been much haranguing about the apparent uselessness of the federal government. While I am no political pundit, I can speak about my little corner of the universe. The US federal government includes something called the National Institutes of Health or NIH, which happens to be the largest scientific research organization in the world. With a budget of over \$30 billion, it spends more on research than Microsoft, IBM, Intel, Google, and Apple combined, supporting over 300,000 researchers nationwide. It also employs 6,000 scientists internally, who collectively produce more biomedical research than any other organization in the United States. What does it mean for the NIH staff to be furloughed? It means that every single day, 16.4 research years are wasted, or about three Ph.D. theses. This is likely to be an underestimate because the scientists employed by the NIH are professionals whose scientific output exceeds that of graduate students, and the quality of NIH-produced research backs this up. What kind of research will be delayed every day? You can read the list yourself, but it includes things like deciphering the genetic code, inventing MRI, and sequencing the human genome. This is not hyperbole; all these discoveries were made by NIH-supported researchers, who have received 83 Nobel prizes in total.
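One way to arrive at the 16.4 figure above, assuming that each of the roughly 6,000 internal scientists loses one day of research per day of furlough and that a research year is counted as 365 days (both are assumptions made here for illustration, not stated in the figures above):

$$
\frac{6{,}000 \ \text{scientist-days}}{365 \ \text{days per year}} \approx 16.4 \ \text{research years per day}
$$

Equating that to about three Ph.D. theses implies roughly $16.4 / 3 \approx 5.5$ years of research per thesis.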

The US is the world’s preeminent scientific superpower, “a player without peer” as Nature recently put it. Only through profound and self-inflicted displays of stupidity such as we have witnessed during the past 24 hours will this cease to be the case.

# Predictions Are Cheap in Biology

I just came back from ICSB 2013, the leading international conference on systems biology (short write-up here). During the conference Bernhard Palsson gave a great talk, which he ended by promoting a view that (I suspect) is widely held among computational and theoretical biologists but rarely vocalized: most high-impact journals require that novel predictions be experimentally validated before they are deemed worthy of publication, by which point they cease to be novel predictions. Why not allow scientists to publish predictions by themselves?

# ICSB 2013

I recently had the pleasure of attending the 14th International Conference on Systems Biology in Copenhagen. It was a five-day, multi-track bonanza, a strong sign of the field’s continued vibrancy. The keynotes were generally excellent, and while I cannot help but feel a little dismayed by the incrementalism that is inherent to scientific research and that is on display in conferences, the forest view was encouraging and hopeful. This is one of the most exciting fields of science today.

# 10 Months at Harvard, Quantified

I will soon reach the one-year mark of my fellowship at HMS, which seems like a fitting time to examine how effectively I have spent my time here so far. I have been a practitioner of self quantification since long before the movement acquired its name, having tracked some aspect of my life since I was 16. Given the movement’s growing popularity, I thought it appropriate to share some of my life hacking experiments. My approach has cyclically waxed and waned in sophistication, something that I will expound upon later in the post, but I believe that the overall trajectory of my effort has been that of increasing usefulness. Any lifestyle change, particularly one that involves compulsive tracking of one’s behavior, ought to result in actionable information that is demonstrably useful and not merely be a quantitative exercise in vanity. In this post I hope to show that this can in fact be the case for self quantification.

# Is Terrestrial Life of Extraterrestrial Origin?

A few weeks ago a paper titled Life Before Earth was posted on the arXiv preprint repository. It came to my attention by way of this MIT Technology Review article and this blog post. The paper, using a rather simple extrapolation, argues that the apparent rate at which the complexity of terrestrial life increases suggests that its birth occurred approximately 9.7 billion years ago. Earth, in contrast, is around 4.5 billion years old. If their extrapolation is to be believed, then this discrepancy can only be resolved if terrestrial life is in fact of extraterrestrial origin. I will briefly summarize their argument, but I will not attempt to justify its validity. The original paper can be read here and is fairly accessible. The paper’s conclusions are consistent with a fact that has always puzzled me: the surprising complexity and maturity of what is known as the Last Universal Common Ancestor. It is this topic that I wish to focus on in this post.