Your Analytics Data Is Very Wrong

Apr 15, 2008

I’ve written about this in the past so I expect this is nothing new for you, my dear reader. The title summarizes everything I am going to say.

My buddy Fred Wilson had a comScore chart about Delicious’ growth (or lack of growth) in his hugely popular We Need A New Path To Liquidity post. He used this data to make the point that web companies are languishing under the ownership of their acquirers when they get bought relatively early in their life.

The founder of Delicious – Joshua Schachter – disagreed with Fred’s conclusion on Delicious and Fred wrote an updated post titled Delicious where he corrects his assertion and asks the (probably rhetorical) question "I wonder how many other web apps are accessed via third party services (twitter’s traffic is largely through its api)? And if that’s a growing trend, then what does that mean for our ability to measure audiences, traffic, and growth from a distance?"

I’ve been a web analytics junkie since my first ever angel investment – $25k in NetGenesis (it was net.Genesis at the time) – back when a "web log" was an uncomfortable thing to ponder. I’ve watched, used, and invested in several generations of web analytics companies. I am comfortable making the statement that "whenever one becomes a dominant analytics platform, it immediately starts to decline in accuracy."

While the graphs and tables might be pretty and are almost always used by the "leaders" to assert their "leadership", they distort and misrepresent what’s really going on. When comScore first published their Widget Metrix in 2007, Om Malik correctly compared it to a Jellybean Contest. I’ve yet to meet a widget report that is remotely accurate based on my inductive reasoning (e.g., so far I’ve been able to come up with at least two widget providers that belong in the top 10 of any list but are missing from every list that I’ve seen).

Now, I don’t mean to pick on comScore. I’ll pick on a friend. My FeedBurner reader data shows that I have 117k readers (or subscribers) to my Feld Thoughts blog. While I’m flattered, this is bullshit. When I dig into the actual user agent data, I find that 98,966 come from a feed reader called BlogRovR. I happen to know that BlogRovR is what used to be called Activeweave Stickies, which is a company I looked at 18 months ago. They "autosubscribe my feed" whenever someone installs BlogRovR, which means my subscriber count is inflated by around 99k. I imagine some of the BlogRovR people look at my feed, but certainly not 99k of them. Do the math. Oh – everyone else who is autosubscribed by BlogRovR (A VC, TechCrunch, …) has the same subscriber count inflation.
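
A minimal sketch of that math in Python, using only the two figures quoted above. The function name and the dictionary layout are hypothetical, chosen for illustration, and 117k is a rounded total, so the results are approximate. It also assumes that none of the autosubscribed users actually read the feed, which makes the remainder a floor rather than a true count.

```python
def adjusted_subscribers(reported_total, by_user_agent, autosubscribers):
    """Subtract autosubscribed user agents from a reported subscriber total.

    Returns (remaining_subscribers, inflated_share_of_total).
    Assumes every autosubscribed user is a non-reader, so the
    remaining count is a lower bound on real readership.
    """
    inflated = sum(
        count
        for agent, count in by_user_agent.items()
        if agent in autosubscribers
    )
    return reported_total - inflated, inflated / reported_total


# Figures quoted in the post; 117k is rounded, so treat results as approximate.
reported_total = 117_000
by_user_agent = {"BlogRovR": 98_966}

remaining, share = adjusted_subscribers(
    reported_total, by_user_agent, {"BlogRovR"}
)
print(remaining, f"{share:.1%}")  # prints: 18034 84.6%
```

Under those assumptions, roughly 85% of the reported subscribers come from a single autosubscribing client, leaving about 18k that might be people who chose to subscribe.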

While it makes me feel all warm inside that I have the number 117k visibly displayed on my blog and I show up as #9 on Rating Burner, this is just a very personal example of why "your analytics data is very wrong."

At some level, there isn’t anything wrong with the analytics data being wrong (or inaccurate) – that’s the nature of the beast and why anyone who uses analytics data to figure stuff out should use multiple sources to generate their own analysis. However, I’m regularly amazed by how many conclusions are derived from data sets that have known, fundamental flaws.

As always, check your assumptions.