I’m fascinated with Google Maps mashups (I prefer “mashup” to “hack”). Having recently read 109 East Palace (yet another very good history of the Manhattan Project), pondered nuclear (and other) holocaust in the brilliant Extremely Loud and Incredibly Close, and listened to Amy talk about “the great concavity” (the result of a nuclear explosion that is a central part of the geography of the eastern United States in Infinite Jest), I’ve had explosions on my mind recently. I can’t say the Google Maps High-Yield Detonation Effects Simulator is particularly comforting, but it is enlightening.
A few days ago, I wrote some thoughts on the war on spam. Today Matt Blumberg has an insightful post up on what’s going on in several state legislatures concerning so-called “Child Protection Acts,” which are actually just children of the mess that almost occurred at the state level at the end of 2003 and resulted in the federal CAN-SPAM law.
I was on a board call for a company today where we talked about “acceptable downtime” for their web-based service. This company has commercial customers that depend on its software to run their businesses and the software in question is delivered “as a service.” I’ve got a number of companies using this approach (vs. a straight software license approach) and I have a lot of experience with this issue dating back to investments in the 1990s in Critical Path, Raindance, and Service Metrics.
While it’s easy to talk about “5 nines” (99.999% uptime), there are plenty of people who think this metric doesn’t make sense, especially when you are building an emerging company and have a difficult time predicting user adoption (if you are growing > 20% per month, you understand what I mean). While most companies have SLAs, these also don’t really take into consideration the actual dynamics associated with uptime for a mission-critical app.
For example, when I was on the board of Raindance, the CEO (Paul Berberian) described Raindance’s system as similar to an incredibly finely tuned and awesomely powerful fighter plane (one that even Jack Bauer would be proud to fly in). There was no question that it was by far the best service delivery platform for reservationless audio conferencing in the late 1990s. However, in Paul’s words, “when the plane has problems, it simply explodes in the air.” Basically, there was no way to guarantee that there would never be a catastrophic system failure and – in this situation – while there was plenty of failover capacity, it was too expensive to create 100% redundancy, so it took some time (usually in Raindance’s case an hour or two, although it once took two days) to get the system back up. The capex investment in Raindance’s core infrastructure (at the time) was around $40m – we simply couldn’t afford another $40m for a fully redundant system, even if we could configure something so it was – in fact – fully redundant.
There have been many high-profile services that have had catastrophic multi-day outages. eBay had a number of multi-day outages in 1999; Critical Path had a two-day outage in 1999 (I remember it not so fondly because I was without email for two days in the middle of an IPO I was involved in); Amazon had some issues as recently as last year’s holiday season; my website was down for an hour last night because of a firewall configuration issue; the list goes on.
Interestingly, as an online service (consumer or enterprise) becomes more popular, the importance of it being up and operational all the time increases. While this is a logical idea, it’s a feedback loop that creates some pain at some point for a young, but rapidly growing company. As the importance of the service increases, expectations increase and – when there is the inevitable failure (whether it’s for a minute, an hour, or a day) – more people notice (since you have more users).
After watching this play out many times, I think every company gets a couple of free passes. However, once you’ve used them up, the tide turns and users become much more impatient with you, even if your overall performance improves. Ironically, I can’t seem to find any correlation with price – the behavior that I’ve witnessed seems to be comparable between free services, services that you make money from (e.g. eBay), and services that you pay for.
My advice on the board call today was that – based on our rate of growth (rapid but not completely out of control yet) – we should get ahead of the issue and invest in a much more redundant infrastructure today. We haven’t yet used up our first free pass (we’ve had several small downtime incidents, but nothing that wasn’t quickly recovered from), but we had a scare recently (fortunately it was in the middle of the night on a weekend and – given that we are a commercial service – didn’t affect many users). The debate that ensued balanced cost and redundancy (do we spend $10k, $100k, $500k, or $1m?) and we concluded that spending up to roughly 50% of our current capex cost was a reasonable ceiling that should give us plenty of redundant capacity in case of a major outage. Of course, the network architecture and failover plan are probably as important as (or more important than) the equipment.
I’m searching for a way to describe “acceptable downtime” for an early stage company on a steep adoption curve. I’m still looking (and I’m sure I’ll feel pain during my search – both as a user and an investor), but there must be a better way than simply saying “5 nines.”
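For a sense of scale, here is a back-of-the-envelope sketch in Python of what the “nines” actually allow, and what a single outage of the sizes mentioned above does to a year’s availability. The function names are mine and purely illustrative, and I’m assuming a simple 365-day year with no credit for scheduled maintenance.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a plain 365-day year


def allowed_downtime_minutes(nines):
    """Minutes of downtime per year permitted by an uptime of `nines` nines."""
    unavailable_fraction = 10 ** -nines  # e.g. 5 nines -> 0.00001
    return HOURS_PER_YEAR * 60 * unavailable_fraction


def availability_after_outage(outage_hours):
    """Yearly availability (in percent) if the only downtime is one outage."""
    return 100 * (1 - outage_hours / HOURS_PER_YEAR)


if __name__ == "__main__":
    for nines in range(2, 6):
        minutes = allowed_downtime_minutes(nines)
        print(f"{nines} nines allows about {minutes:,.1f} minutes down per year")

    # The outage sizes mentioned in this post: two hours and two days.
    for hours in (2, 48):
        pct = availability_after_outage(hours)
        print(f"One {hours}-hour outage leaves {pct:.3f}% availability")
```

Five nines works out to a little over five minutes a year, so a single outage of an hour or two already blows the budget many times over, which is part of why the metric feels disconnected from how these failures actually play out.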
I was sitting in a meeting with Ryan McIntyre and his cell phone went off. His ringtone sounded vaguely familiar but I couldn’t quite place it. I told him. He laughed and reminded me about the story behind Hell Sink Ye. If you are looking for a new ringtone, try this. If you want a ringtone symphony, try this.
I’m a Firefox fiend (extension count going up daily; Greasemonkey security fears acknowledged) and I just found out that if you have multiple tabs, you can go to tab N by hitting Ctrl-N. If you are like me and can’t visually scan horizontally for numbers (e.g. I have to count from left to right to figure out which tab I’m at – minor dyslexia-like thing) there’s an extension that’ll number the tabs for you.
Joshua from del.icio.us just commented on my for:bradfeld post about tagging things in del.icio.us for me. Joshua has changed how the “for:” prefix works and the right way to tag something for me in del.icio.us is to use for:bfeld as the tag.
I’ve gotten some good stuff from folks over the past week – tag away and send me stuff you want me to look at. If you don’t know what I like or am interested in, er, um, read my blog? Or – read Amy’s – she gives plenty of hints also (plus she talks about how robots entertain her – what more could a nerd like me want in a woman??!)
I woke up one day (yesterday) and all of a sudden I’m using Skype all the time (including SkypeOut). What changed? First, I’m at my house in Alaska and my habits got shaken up. Second, I installed my Vonage phone so I could have a 303 area code while I’m up here (and just to play around with Vonage) so I decided that while I was at it, I should try Skype as a phone and see how it compared. But – most importantly – the Skype for Outlook plug-in hit beta.
Holy shit – this is cool. It works exactly as you’d want it to. I hate talking on the phone, but it’s a side effect of being a VC. I live in two apps – Outlook (on the left monitor) and Firefox (on the right monitor). Suddenly I’ve got a little magic toolbar that lets me call people via Skype by looking up their info in Outlook. Perfect. And – it works. $12 from PayPal (10 euros) and I’m set with long distance for a long time.
Now – all I need is a headset with a microphone that integrates with my fancy speakers.
Fred Wilson clued me into a fun use of del.icio.us. He calls it the “for:” tag. He set one up for his wife Joanne (Gotham Gal). Whenever he wants her to see something on the web, he simply tags it “for:gothamgal”.
I just set up a for:bradfeld tag in del.icio.us. Then – I subscribed to it in NewsGator. If you run across something on the web you want me to see, just tag it.
I stumbled over Edward Tufte’s Sparklines this morning. Tufte’s The Visual Display of Quantitative Information is one of my favorite books and his essay The Cognitive Style of PowerPoint helped shape my point of view that PowerPoint is contributing to the intellectual degradation of contemporary society.
Sparklines are an awesome new way to display high densities of quantitative information. Tufte nailed it once again – hopefully we’ll start seeing these pop up all over the place. Fortunately there’s already a PHP Graphing Library up at SourceForge and O’Reilly’s xml.com site has a good article with Python code up.
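To give a rough feel for the idea without a graphics library, here is a small Python sketch that draws a text-only sparkline out of Unicode block characters. This is my own crude approximation, not Tufte’s design and not the code from either of the libraries mentioned above; the `sparkline` function name and the sample numbers are made up for illustration.

```python
# Eight block characters of increasing height (U+2581 through U+2588).
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Return a one-line, word-sized text chart of `values`."""
    values = list(values)
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series: draw every point at the lowest level.
        return BLOCKS[0] * len(values)
    top = len(BLOCKS) - 1
    # Scale each value into an index from 0 (lowest) to 7 (highest).
    return "".join(BLOCKS[int((v - lo) / span * top)] for v in values)


if __name__ == "__main__":
    # Hypothetical data, e.g. daily page views over two weeks.
    views = [120, 132, 101, 134, 190, 230, 210, 182, 191, 234, 290, 330, 310, 280]
    print("page views", sparkline(views))
```

The point is the same as with the real thing: the chart is small enough to sit inline next to the words or numbers it describes.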