The Quantified Anatomy of a Paper

I previously blogged on my adventures in self quantification (QS). In that post I wrote about the general system but did not delve into specific projects. Ultimately, however, the utility of self quantification lies in the detailed insights it gives, and so I’m going to dive deeper into a project that passed a major milestone earlier today: publication of a paper. If you’re interested in the science behind this project, see my other post, A New Way to Read the Genome. Here I will focus on the application and utility of QS as applied to individual projects.

I have been working on this project in some capacity for about two years. The first thing I wondered about is how much time in total it has consumed. The answer is 1,363 hours and 45 minutes, or about 34 standard work weeks. These numbers reflect actual worked time; things like goofing off, taking bathroom breaks, and chatting with coworkers are excluded. All supporting activities that are not specific to this project, for example attending group meetings and scientific lectures, are also excluded, and so in reality the total time consumed was much more. In addition, the time and effort put in by my collaborators is not accounted for. Given the final results, I’m fairly happy with how my time was spent. The ability to make such an assessment is one of the biggest advantages of self quantification. I don’t have to wonder whether I’m spending my time right, nor do I have to fret about all the wasted hours. I can make more informed judgements. How long did the “SH2 project” take? 1,363 hours.
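The conversion from logged time to work weeks can be checked with a few lines of Python. This is a minimal sketch that assumes a "standard work week" means 40 hours, which the post does not state explicitly; the function name is made up for illustration.

```python
# Assumption: a "standard work week" is 40 hours (not defined in the post).
HOURS_PER_WORK_WEEK = 40


def to_work_weeks(hours: int, minutes: int = 0) -> float:
    """Convert a logged duration into 40-hour work weeks."""
    total_hours = hours + minutes / 60
    return total_hours / HOURS_PER_WORK_WEEK


weeks = to_work_weeks(1363, 45)
print(f"{weeks:.1f} work weeks")  # 34.1 work weeks
```

Under that assumption the total comes out just over 34 weeks, consistent with the figure quoted above.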

Of course it gets more interesting when we dig deeper. The next thing I wondered about is the task breakdown. What did I actually spend my time on? Here it is:

SH2 Breakdown

Not surprisingly, and somewhat reassuringly, most of my time was spent on research. This means either writing code, working out a model, or just thinking. It excludes the consumption of other people’s work and so in some sense captures my net useful output. The fact that it is my top activity for this project is good news, but it barely crosses the 50% threshold. What came as an absolute shock is that almost 30% of the time was spent “disseminating”! That mostly refers to writing the actual paper, but also includes giving (and preparing) talks, and all sorts of outreach. The fact that it consumed so much is a little disconcerting, and suggests, for me at least, that writing a paper is a major commitment. It is a significant part of the project and should not be taken lightly. In particular, this argues against publishing “me too” papers (to be sure, the fact that this paper landed in a good journal raised the bar for quality and polish).

If I were to tease out the one major insight that’s come from this QS analysis, it would be the time spent on writing; it is a very non-trivial piece of information. But there are others. Somewhat surprisingly, this paper didn’t require all that much literature reading. This is in part because it was in a space that I’m intimately familiar with and so I didn’t need to learn a lot for it, in contrast to another project I’m working on (see below). It also didn’t require much in the way of planning and strategizing, largely because I walked in knowing what needed to be done (again see below). On the other hand, the administrative burden was quite high. Over 10% of my time was spent on logistics, mostly meetings. This is to be expected given the collaborative nature of the project, and all in all is actually much lower than I feared. The 10% was well worth it, and knowing that it was “only” 10% has reassured me of the worth of such collaborative projects.
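A breakdown like the one discussed above boils down to summing logged hours per category and dividing by the total. The sketch below shows one way to do that. The log format, the function name, and the toy entries are all hypothetical; they are not the actual tracking system or data behind the chart.

```python
from collections import defaultdict


def category_shares(entries):
    """Map each category to its fraction of the total logged hours.

    `entries` is an iterable of (category, hours) pairs.
    """
    totals = defaultdict(float)
    for category, hours in entries:
        totals[category] += hours
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: hours / grand_total for category, hours in totals.items()}


# Hypothetical toy log for illustration only, not the real project data.
log = [
    ("research", 3.0),
    ("disseminating", 2.0),
    ("logistics", 1.0),
    ("research", 2.0),
    ("reading", 2.0),
]

shares = category_shares(log)
for category, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{category:>14}: {share:.0%}")
```

Working with fractions rather than raw hours is what makes two projects of very different sizes directly comparable.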

Just for fun, by breaking things down over time, I get this:

SH2 Time Series

The massive spike corresponds to the major push for writing the paper. The colors reflect that too, where research activity plunged and writing soared. I should note that I did not magically triple my output during that period. The spike mostly represents a shift in priorities, when all other projects were put on the back burner. There was a phase during which I increased my total output over baseline by 42%, but only for about 4 weeks or so. Needless to say I had a very “minimal” lifestyle during that period. In fact I even stopped going to work to save on commuting time.
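For readers who want to build a similar time series from their own logs, here is a rough sketch of the two calculations involved: bucketing hours by week, and expressing a busy period as a percentage over baseline. The data layout, the function names, and the numbers are assumptions made for illustration; the post does not say how the real series was computed or what the baseline was.

```python
from collections import defaultdict
from datetime import date


def weekly_hours(entries):
    """Sum hours per ISO (year, week) from (date, hours) pairs."""
    totals = defaultdict(float)
    for day, hours in entries:
        iso = day.isocalendar()
        totals[(iso[0], iso[1])] += hours
    return dict(totals)


def percent_over_baseline(value, baseline):
    """Return how far `value` exceeds `baseline`, as a percentage."""
    return (value - baseline) / baseline * 100


# Hypothetical entries for illustration only.
entries = [
    (date(2014, 3, 3), 6.0),
    (date(2014, 3, 4), 7.5),
    (date(2014, 3, 10), 9.0),
]
print(weekly_hours(entries))

# Hypothetical numbers: a 71-hour week against a 50-hour baseline.
print(f"{percent_over_baseline(71.0, 50.0):.0f}% over baseline")
```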

The time series analysis doesn’t provide any insights that I didn’t already know, but it is gratifying to see it all in one place. What is more interesting is comparing this project to another, riskier project, on which I have nothing published yet. The second project is very different, taking on a virtually impossible problem using a completely untested approach. While I didn’t quite know it beforehand, these differences are starkly reflected in the activity breakdown:

ProtLing Breakdown

Research still dominates, but less so than before. What is completely different, however, is everything else. Relatively little time is spent on logistics or writing, reflecting the fact that I’m working on the project alone. Moreover, an enormous amount of time is spent on reading the literature and on planning and strategizing, a full 50% in fact. This is a vastly different situation from the first project, and it reflects the first project’s emphasis on execution versus the second’s emphasis on fundamental research. In some ways, however, it also suggests that the second project is not proceeding so well, and that my time allocation is off.

The time series here is actually more interesting:

ProtLing Time Series

The yellow spikes refer to major learning efforts, followed by major bursts of research, bust, and then repeat. The lull in the middle largely coincides with the surge in the other project. Nonetheless, knowing what I know about the project, it is clear to me that I am spending far more time than I should be learning and strategizing, and this too is very actionable information. The temporal dynamics are useful because they point to a pattern in my approach that is suboptimal, where I alternate between exclusive modes of learning and doing. More recently I have tried a more integrated approach, and that appears to be reflected in recent months.

On the whole I hope this analysis provides a window into how QS can be practically useful. After my original post I received many comments questioning the value of such a system, which was puzzling to me as I tried to emphasize its utility by highlighting actionable insights. Hopefully I made a more convincing case this time around. If you have any questions feel free to leave them in the comments.

Update (11/14/14): A reader (mokestrel) asked in the comments below about the distribution of contiguous time blocks spent on writing vs. other tasks. This is an excellent question as writing does often require long blocks of uninterrupted attention, and one would expect to see this in the quantitative data. The first thing I did was just to plot a histogram of the lengths of uninterrupted writing blocks vs. all other tasks for the same project (x-axis is hours).

Writing Breakdown

The distribution is shifted to the right, although it’s not a dramatic effect. I realized however that very short breaks often divide what are virtually contiguous blocks, and so it makes sense to fuse blocks that are not separated by anything other than a break. Here’s the adjusted distribution:

Writing Breakdown Contiguous

Now the effect is much more dramatic, and there are in fact chunks of time that are longer than 12 hours, corresponding to days where all I did was write! I recall a period of about one week where that was more or less true, and the above outliers likely reflect it. Now I know what to strive for next time I write…
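The fusing step described in this update can be sketched as a single pass over a chronologically ordered log. This is only an illustration under stated assumptions: the log is a list of (start, end, label) tuples with times in hours, breaks are logged with the label "break", there are no untracked gaps, and break time is not counted toward the fused block. None of these details are given in the post, and the function name is hypothetical.

```python
def fuse_blocks(blocks, task, break_label="break"):
    """Return durations (in hours) of `task` blocks after fusing across breaks.

    `blocks` is a chronologically sorted list of (start, end, label) tuples.
    Consecutive `task` blocks are fused when only breaks lie between them.
    Assumption: break time itself is not added to the fused duration.
    """
    fused = []
    current = None  # duration of the currently open fused block, if any
    for start, end, label in blocks:
        if label == task:
            current = (current or 0.0) + (end - start)
        elif label == break_label:
            continue  # a break does not close the open block
        elif current is not None:
            # Any other task ends the contiguous run.
            fused.append(current)
            current = None
    if current is not None:
        fused.append(current)
    return fused


# Hypothetical day for illustration only.
day = [
    (9.0, 10.5, "writing"),
    (10.5, 10.75, "break"),
    (10.75, 12.0, "writing"),
    (12.0, 13.0, "logistics"),
    (13.0, 14.0, "writing"),
]
print(fuse_blocks(day, "writing"))  # [2.75, 1.0]
```

Without fusing, the same day would yield three writing blocks of 1.5, 1.25, and 1.0 hours, which is why the unadjusted histogram understates how long the writing sessions really were.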


  2. Very nice report! I would agree that 34 weeks is about right, and your task breakdown is nice. As I guess many would find, writing takes up quite a bit of time, more than we think, let alone answering the reviewers later on. I find that the hard part about writing is the need for long bouts of time. I don’t write very well if I have only short periods of time; I need 2-4 hours straight. Any findings along those lines, that your writing part may require longer bouts of time?

      • That’s great Mohammed, you’ve made it very clear in the second graph. Now I can show that to my PhD students! It’s great to have real “quantified” data on this, how much the whole paper writing takes and how we need those long bouts. I think it was Gerald Edelman who said that he worked best in airports. I do too, while travelling, or on trains, and I think it’s because I know when a long bout is coming and can dig in. Thanks again.
