Thursday, May 7, 2009

Comparing citations and downloads

The April issue of the Journal of Vision has an interesting editorial on comparing citations and downloads for individual articles. Beau concludes that download statistics provide a useful indicator, two years in advance, of eventual citations. Downloads are also a useful measure in their own right of the interest and significance of individual articles.

For me, these download statistics have always been spooky. In a post of 5 March 2007, I was puzzled by the interest in an old tech. report on a Professional Portrait Studio for Amateur Digital Photography, which is totally outdated. It was written when amateur cameras had no more than 6 effective bits per pixel per channel, low sensitivity, and, most of all, an amazing shutter lag of 4 to 6 seconds.

The download wave started in January 2006, and since then the report has been in the top 10 downloads almost every month. How can 400 people a month be interested in it?

Then, in October 2008, an equally mysterious wave of downloads started for Spectrophotometer Calibration and Certification. That is quite an arcane subject, and not many people are interested in it. Yet it remains in the top 10 downloads, with 586 downloads in April alone. What is going on?

In the old days, when publication was on paper, things were easier to explain. For example, in 1996 I cooked up, in a few days and under pressure, a tech. report with the suggestive title W3 + Structure = Knowledge. More than 300 people sent cards to HP ordering a copy of the report. At the time, a typical report averaged fewer than 10 requests, and only the report on the C++ Standard Template Library had received more requests.

But it was understandable. In June 1996 the dot-com boom was in full swing, and companies were running big advertisement campaigns claiming to be the dot in .com or describing the Internet as a tsunami. HP, by contrast, was totally mum about the Internet. So it is no surprise that when the first HP publication with Internet as a keyword appeared, people jumped on it.

When, a few months later, I rewrote it under the title Internet's Impact on Publishing, it was just as popular. I rationalized that, since the previous report was mostly hot air, people just wanted to see whether HP really had nothing up its sleeve. Indeed, when I rewrote it once more under the title Structure and Navigation for Electronic Publishing, it went totally unnoticed.

In summary, I can understand the logic of the old paper orders, but not the logic of modern downloads. Do you have any insights you can share?

