My post on How to Fix Obamacare generated plenty of feedback – some public and some via email. One of the emails reinforced the challenge of “traditional software development” vs. the new generation of “Agile software development.” I started experiencing, and understanding, agile in 2004 when I made an investment in Rally Software. At the time it was an idea in Ryan Martens’ brain; today it is a public company valued around $600 million, employing around 400 people, and pacing the world of agile software development.
The email I received described the challenge of a large organization when confronted with the kind of legacy systems – and traditional software development processes – that Obamacare is saddled with. The solution – an agile one – just reinforces the power of “throw it away and start over” as an approach in these situations. Enjoy the story and contemplate whether it applies to your organization.
I just read your post on Fixing the Obamacare site.
It reminds me of my current project at my day job. The backend infrastructure that handles all the Internet connectivity and services for a worldwide distributed technology was built by a team of 150 engineers overseas. The infrastructure is extremely unreliable, and since there’s no good auditability of the services, no one can say for sure, but estimates vary from a 5% to 25% failure rate of all jobs through the system. For three years management has been trying to fix the problem, and the fix is always “just around the corner”. It’s broken at every level, from the week-long deployment processes to the 50% failure rate for deploys to the inability to scale the service.
I’ve been arguing for years to rebuild it from scratch using modern processes (agile), modern architecture (decoupled web services), and modern technology (Rails), and everyone has said “it’s impossible and it’ll cost too much.”
I finally convinced my manager to give me and one other engineer two months to work on a rearchitecture effort in secret, even though our group has nothing to do with the actual web services.
Starting from basic use cases, we architected a new, decoupled system from scratch, and chose one component to implement. It corresponds roughly to 1/6 of the existing system.
In two months we were able to build a new service that:
- scales to 3x the load with 1/4 the servers
- operates at seven 9s reliability
- deploys in 30 seconds
- was implemented by 2 engineers, compared to an estimated 25 for the old system
Suddenly the impossible is not just possible, it’s the best path forward. We have management buy-in, and they want to do the same for the rest of the services.
But no amount of talking would have convinced them after three years of being entrenched in the same old ways of doing things. We just had to go build it to prove our point.
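For a sense of scale, two of the figures in the email can be worked through with some back-of-the-envelope arithmetic. The sketch below is only an illustration, not anything from the email itself: it assumes “seven 9s” means 99.99999% (read either as uptime or as job success rate), and it assumes “3x the load with 1/4 the servers” can be read as a simple ratio of load per server. The function names are made up for this example.

```python
from fractions import Fraction

# Assumption: a non-leap year, used only to turn an availability
# percentage into an allowed-downtime figure.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def failure_fraction(nines: int) -> Fraction:
    """Allowed failure fraction at `nines` nines (e.g. 3 -> 1/1000)."""
    return Fraction(1, 10 ** nines)


def downtime_seconds_per_year(nines: int) -> float:
    """Allowed downtime per year if the nines are read as uptime."""
    return float(failure_fraction(nines) * SECONDS_PER_YEAR)


def per_server_gain(load_factor: Fraction, server_factor: Fraction) -> Fraction:
    """How much more load each server carries, as a simple ratio."""
    return load_factor / server_factor


if __name__ == "__main__":
    # Seven nines: about one failed job in ten million, or roughly
    # three seconds of downtime per year.
    print(failure_fraction(7))
    print(downtime_seconds_per_year(7))
    # 3x the load on 1/4 the servers.
    print(per_server_gain(Fraction(3), Fraction(1, 4)))
```

Under those assumptions, seven 9s allows about one failure in ten million jobs (or roughly three seconds of downtime a year), and each new server carries about twelve times the load of an old one. Against an estimated 5% to 25% failure rate, that gap is the whole argument for starting over.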