Revisiting Median vs. Average With Accelerator Data

US Median Income – 1947 to 2007 – 20th, 40th, 60th, 80th, 95th Percentiles (Photo credit: Wikipedia)

I was in a board meeting yesterday at BigDoor where we were benchmarking our current numbers against a couple of recent studies on SaaS-based companies including the 2011 Pacific Crest Private SaaS Company Survey. Several of the things we looked at were averages; the Pacific Crest data was presented as medians. I subsequently had a short conversation in the evening where someone asked me about what I thought the mean was across our investments on a particular metric; I responded that the mean was meaningless – we should be using the median (which I then gave the person asking the question.)

I see people use average all the time when they should be using median. I also find people constantly confusing average, mean, and median. Most of the time when people say “mean”, they mean (oops – I couldn’t help myself) “arithmetic mean” which is the same as “average.”

The Accelerator Data presented on Seed-DB is a great example that entrepreneurs should be able to quickly relate to (unlike the image I included in this post, which I find completely impenetrable.) Seed-DB presents both Average and Median. If you sort by Average $ raised per company, you get one picture. If you sort by Median $ raised per company, you get a very different picture. Now, there’s a lot of missing or estimated data for many of the accelerators, so that impacts the validity / accuracy of the data set, but it’s a great example of how average vs. median changes what you see.
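As a rough illustration of how the sort order can flip, here is a minimal sketch using made-up numbers. The accelerator names and dollar amounts are hypothetical and are not taken from Seed-DB.

```python
from statistics import mean, median

# Hypothetical dollars raised per company, in $K (illustrative only, not Seed-DB data).
raised = {
    "Accelerator A": [0, 0, 0, 50, 100, 20000],      # one huge outlier
    "Accelerator B": [400, 500, 600, 700, 800, 900],  # consistent, no outlier
    "Accelerator C": [0, 100, 200, 300, 400, 5000],   # one moderate outlier
}


def rank_by(stat):
    """Return accelerator names sorted from highest to lowest value of stat."""
    return sorted(raised, key=lambda name: stat(raised[name]), reverse=True)


by_mean = rank_by(mean)      # A first: its single outlier dominates the average
by_median = rank_by(median)  # B first: its typical company raises the most

print("Ranked by average:", by_mean)
print("Ranked by median: ", by_median)
```

In this toy data set the accelerator that tops the average ranking lands last in the median ranking, because one company accounts for nearly all of its dollars raised.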

As an entrepreneur, I encourage you to think hard about whether the right thing to compare a particular metric to is median vs. average. While average can be useful, I generally find median to be a much more enlightening number.

  • Regarding accelerator and investment data, my conjecture is that median is more relevant to the company/founder and mean is more relevant to the investor. The thought being that in both cases what you want is some indication of your expected result, and the investor has a portfolio whereas the founder does not.

    • Maybe, but the mean as an indicator of expected result probably changes more dramatically over time, so you have to be very careful if you are just looking at it at one point in time as an investor. But I totally agree with median on the company/founder side.

      • Right, mean isn’t a great metric for an investor other than in hindsight after all the data is in.

        Maybe the alignment is that a long term oriented investor just always thinks from the perspective of the founder so median is a better short term stick anyway. Stated explicitly, you should believe any of your investments could be the outlier or wtf are you doing with them 😉

  • I think it’s great to talk about statistics as they pertain to venture funding and entrepreneurship. But let’s not forget that the median startup, while perhaps being a wonderful learning experience for the founders, is financially a failure. If we all thought in terms of median then we’d say “boy this startup stuff is impossible, guess I shouldn’t even try”. The statistical definition of “expected value”, which is obviously based on the “value” you “expect” from something, is heavily dependent on the mean outcome.

    In fact, of all the measures of central tendency typically used to infer future gains from past performance (a topic for another day), the mean is the only one that justifies attempting to start a company. The mode, or the most commonly occurring outcome for those who don’t speak statistics, is definitely not something you want to use as inspiration for your venture.

    So, to recap, as it pertains to entrepreneurs who need that inspiration and that unyielding will to succeed, looking at past data about the financial success of startups:
    mode – you will fail
    median – you will fail
    even 3rd quartile – you will fail
    mean – hey, this might be worth a shot

    So yeah, the mean does matter, if for no other reason than to keep telling us starry-eyed entrepreneurs to keep going.



    • I don’t think an entrepreneur should use any of these to justify starting a company. Outcome is one measure. There are lots of other qualitative and quantitative reasons to start a company. Just focusing on outcome is a mistake.

      Also, there are lots of different measures at play. I agree that “median outcome” of all startups is likely $0. But if you segment them differently, you see lots of different things going on. For example, I don’t expect that the median outcome of a company in our 2007 fund is going to be $0. While we have some $0’s, I expect the median will be > 0.

      • Jeff

        I wouldn’t say that I’m only focusing on the financial outcome, at least not any more than I should be in a comment that’s related to a post about startup financing. Rather, I’d say that a reasonable expectation of potential financial success is necessary, but not sufficient, for starting a company.

        For your fund, I’d hope your median investment will have a non-zero financial result. But, I’d guess you are already carving out a pretty selective slice of the universe of companies who are raising money.

        100% of companies who are successful are successful; welcome to the tautology club.


      • For startups in general, the modal outcome is zero (most fail), the median is also certainly 0 (over half fail), and the mean is greater than 0. (There is a return, however small, because all market wealth held by corporations is part of the positive balance.)

        • Sure, but I’m not sure “Startups in general” matters very much.

  • GeorgeU

    “I see people use average all the time when they should be using mean” – if you are not referring to “arithmetic mean”, which “mean” are you referring to?
    Traditionally in statistics lingo: Mean = Arithmetic Mean = Average
    Any other “mean” is usually preceded by something else, like “geometric mean”.

    • Wow – that was a painfully self-referential mistake on my part. I meant to write “I see people using average all the time when they should be using median.” Oops. Fixed.

  • It’s rare for a single metric to convey the whole story. Doing Torbit has given me a renewed appreciation for histograms. I wish people used them more.

    • I have always loved histograms.

  • I missed you at the event last night Brad, it was a good one. I actually wrote a blog post on this exact issue last week, where I talked about how looking at the wrong “average” screwed up how we thought about a crucial piece of our business. In fact, the mode ended up being the most relevant average. Thank you for validating a recent rant of mine.

    • Glad you enjoyed the event. Glad to validate a rant anytime.

  • Thanks for this post Brad. I like to look both at median and mean (and the difference between them). Looking at the mean vs median for something like CLV can kill a company that is trying to establish a customer acquisition budget. Looking at both, one can help judge the effect of outliers and can potentially be the difference for a company.

  • F*** the median, means and averages.
    Go for the best-in-class benchmarks instead. Aim to be in the top 5 percent.

    • How about the 1%? (Bad inside American political joke).

      • But Pacific Crest surveyed only 70 companies. 1% wouldn’t even amount to a full company 🙂

        Is the joke that bad?

  • Peter LePiane

    Although I agree that the median is usually more informative as a measure of the centre (excuse the Canadian spelling) than the mean, comparing to both measures of centre (i.e. median and mean) is most informative. As long as it is understood that the mean is heavily affected by outliers and the median is more representative of the “typical” value in the population, no great harm should come from the comparison to either or both. That, I assume, is Brad’s point in saying that one should “think hard” about which to compare to.

    In my humble Lean Startup opinion, I believe the most efficient metrics evaluation should consider where the metric falls in the “5 Number Summary” of the data collected:

    1. The minimum value
    2. The 25th percentile
    3. The median
    4. The 75th percentile
    5. The maximum value

    This summary can even be displayed as a “boxplot” ( which is a nice one-stop-shop visual.
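The five-number summary described above can be sketched with Python's standard library. The metric name and values are hypothetical, and the "inclusive" quartile method is only one of several common percentile conventions, so other tools may report slightly different quartiles.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, 25th percentile, median, 75th percentile, maximum)."""
    data = sorted(values)
    # n=4 yields the three quartile cut points; "inclusive" treats the data
    # as the whole population rather than a sample from a larger one.
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return (data[0], q1, q2, q3, data[-1])


# Hypothetical monthly churn percentages for nine companies (illustrative only).
monthly_churn_pct = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0, 8.0, 20.0]
summary = five_number_summary(monthly_churn_pct)
print(summary)  # (1.0, 2.5, 4.0, 6.0, 20.0)
```

A company with 7% churn in this made-up set would sit between the 75th percentile and the maximum, which says more than comparing it to the mean alone.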

    Looking forward to discussing the specifics with @BFeld at INcubes Demo Day in Toronto next Weds :).

    • RBC

      Hi Peter – I like the idea of looking at all five figures to try to suss out meaning and avoid soft logic. Can you check that link though – it doesn’t work for me.

  • Scott

    I had a statistics professor in college remind us that just because your head is in the oven and your feet are in the freezer, that you are not necessarily comfortable.

  • Pierre Powell

    Funny, I have seen no discussion of the mode. I recently took a class in Oil & Gas Reserve Analysis, and realized that most Oil & Gas models are presented as a skewed distribution or an abnormal distribution. I believe that private equity funding is similar. Therefore, when drilling (funding a company) the knowledge is I will drill some dry holes, many with small returns, and a few “gushers”. Therefore, the “mode” represents the most likely return on any one well (investment). This number is skewed away from the median and mean. If I drilled 1000 wells, my return would be much higher, because my return would reflect the mean. The problem is I can’t afford to drill 1000 wells, so to spread risk and approach the mean, I need to make smaller investments in a larger number of wells (companies). Obviously this spreads risk.

    Understanding the most likely result (mode) is a much better number to understand the expected return of an investment in a skewed environment. Just a thought!
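The drilling analogy can be sketched as a small simulation. The outcome mix below (60% dry holes, 35% small returns, 5% gushers) is an assumption made up for illustration, not real oil and gas or venture data, and `loss_rate` is a hypothetical helper name.

```python
import random
from statistics import mean, median, mode

# Hypothetical return multiples per well (or investment): 0.0 is a dry hole,
# 1.5 a small return, 30.0 a gusher. The proportions are assumptions.
outcomes = [0.0] * 600 + [1.5] * 350 + [30.0] * 50

typical = {
    "mode": mode(outcomes),      # most likely single result
    "median": median(outcomes),  # middle result
    "mean": mean(outcomes),      # long-run average over many wells
}


def loss_rate(portfolio_size, trials=2000, seed=0):
    """Estimate how often an equal-weighted portfolio returns less than 1x."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    losses = 0
    for _ in range(trials):
        picks = rng.choices(outcomes, k=portfolio_size)  # sample with replacement
        if sum(picks) / portfolio_size < 1.0:
            losses += 1
    return losses / trials


print(typical)
print("Chance of losing money with 1 well:   ", loss_rate(1))
print("Chance of losing money with 100 wells:", loss_rate(100))
```

Under these assumptions the mode and median are both zero while the mean is about 2x, and the chance of ending below 1x drops sharply as the number of wells grows, which is the risk-spreading effect described above.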

  • I find this post somewhat bizarre – there can be no best without context. There is a choice most suited to convey some idea about data.

    I want to buy 100 apples regardless of their weight because I have 100 guests to feed. I know the price per lb – I know my “average” budget if I know the mean weight of an apple (from the market in question).

    I intend to sell low-margin, high-volume men’s hats – I will stock only one size. I had better know the modal value of head size to centre my market where I will have the most clients (and the most competition).

    I wish to sell men’s and women’s hats, one size each. I only have research on human head sizes, not broken down by sex. A k-means clustering (with k = 2) of head sizes across the entire population will tell me what I need to know.

    I wish to score 5 companies using three performance indicators that are orthogonal – I need a geometric mean.

    I want to divide a group into two numerically equal groups by height – the median is my filter.

    All of the above are legitimate mathematical techniques to represent a population, there are many many more.

    Now, Brad, say I want the number of soldiers killed by horse kicks each year in each of 14 cavalry corps over a 20-year period (perhaps this could be critical to my business); then I would be interested in the Poisson distribution.

    So Brad no offence but…

    >>While average can be useful, I generally find median to be a much more enlightening number.

    This expresses measures that supposedly interest YOU – but has absolutely no relevance to anyone else. It means no more than me telling you my favourite number. (Hint: it is not irrational.)

    The best measure depends ONLY on the idea someone needs to convey in context, so if they are wise they will not be influenced by your preference (which is arbitrary).

    However, if you *mean* – understand what you are talking about – this is indeed sage advice on average!

    • I meant it in the context of benchmarking company performance on a variety of metrics.

      • Then maybe go with a weighted geometric mean of the %tile rank of each metric

        i.e. multiply all the %tile (or 1-%tile) values together so that the product of the factors of importance counts – eg multiply product
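One way to read the suggestion above is as a weighted geometric mean of percentile ranks, sketched below. The metric names, ranks, and weights are hypothetical, and `weighted_geometric_mean` is an illustrative helper, not an established benchmarking formula.

```python
import math


def weighted_geometric_mean(scores, weights):
    """Weighted geometric mean of positive scores: exp(sum(w * ln s) / sum(w))."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if any(s <= 0 for s in scores):
        raise ValueError("scores must be positive")
    total_weight = sum(weights)
    log_sum = sum(w * math.log(s) for s, w in zip(scores, weights))
    return math.exp(log_sum / total_weight)


# Hypothetical percentile ranks as fractions, where higher is always better.
growth_rank = 0.90        # 90th percentile on growth
churn_rank = 1 - 0.20     # 20th percentile on churn, flipped because lower churn is better
margin_rank = 0.50        # 50th percentile on gross margin

# Growth is weighted twice as heavily as the other two metrics (an assumption).
score = weighted_geometric_mean(
    [growth_rank, churn_rank, margin_rank],
    [2, 1, 1],
)
print(round(score, 3))
```

Because the ranks are multiplied rather than added, one very weak metric drags the combined score down more than it would in an arithmetic average.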

        • I think the point is that measuring the median is better than measuring the average because it eliminates the outliers and thus, the amount they skew the outcome.

          • Traditionally, outliers are eliminated before class statistics are generated. If there are outliers that should be ignored, their effect should not be attributed to any average calculation. This can require two passes over a dataset, one to eliminate outliers and one to process results, but simplicity is no excuse for inaccuracy.
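A minimal sketch of the two-pass approach follows. The comment does not say which outlier rule to use, so this assumes the common 1.5 × IQR fence (Tukey's rule); the deal sizes are made up.

```python
from statistics import mean, quantiles


def drop_outliers(values, k=1.5):
    """Pass 1: keep only values within k * IQR of the quartiles (assumed rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if low <= v <= high]


# Hypothetical deal sizes in $K, with one extreme value.
deal_sizes = [10, 12, 11, 13, 12, 14, 11, 250]

kept = drop_outliers(deal_sizes)  # pass 1: filter
raw_mean = mean(deal_sizes)       # distorted by the 250
clean_mean = mean(kept)           # pass 2: statistics on what remains

print("Kept:", kept)
print("Mean before:", raw_mean, "Mean after:", round(clean_mean, 2))
```

Whether a value like 250 should be dropped at all is a judgment call; in startup data the outlier is often the most important point.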

  • For accelerators, there is a more nuanced number: give the % that had raised capital (or made meaningful revenue) within 9 months. Then give the mean and median for that set, only. YC’s median is pulled down by a lot of companies that fold at the end of the program (or soon thereafter). Their mean is pulled up by Dropbox and AirBnB. So, you might want to drop the top and bottom outliers and take the mean of the remainder.

    At which point, of course, you’d wish you’d taken Josh’s advice and made a histogram.
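The accelerator summary proposed in this comment could be sketched as follows. The cohort numbers are invented, `funded_summary` is a hypothetical helper, and trimming exactly one company from each end is an assumption about what "drop the top and bottom outliers" means.

```python
from statistics import mean, median


def funded_summary(raised, trim=1):
    """Summarize a cohort: share that raised money, then stats on that set only."""
    funded = sorted(r for r in raised if r > 0)
    if not funded:
        raise ValueError("no company in the cohort raised capital")
    # Drop `trim` companies from each end, unless that would leave nothing.
    if len(funded) > 2 * trim:
        trimmed = funded[trim:len(funded) - trim]
    else:
        trimmed = funded
    return {
        "share_funded": len(funded) / len(raised),
        "mean": mean(funded),
        "median": median(funded),
        "trimmed_mean": mean(trimmed),
    }


# Hypothetical dollars raised ($K) within 9 months by a ten-company cohort.
cohort = [0, 0, 0, 0, 100, 250, 500, 750, 1500, 60000]
stats = funded_summary(cohort)
print(stats)
```

In this made-up cohort the mean of the funded set is over $10M while the median is $625K and the trimmed mean is $750K, so a single breakout company accounts for most of the average.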

  • Art

    I was coincidentally reading a statistics book today (for a work project) and the author states that ‘average’ is a general term that includes many specific types of functions, including mean, median, mode, geometric mean, etc.

    If you intend to compare a single point to the average of a set of others, you will also need to know about the dispersion of those points about the average for the comparison to have much meaning.
