Why hasn't Amazon fixed things overnight?

Well, if you have read Amazon CTO Werner Vogels's blog or seen any of his recent presentations, you'll definitely be (like me) inclined towards "cock-up over conspiracy" as the explanation for the current shambles.

So why have things gone this desperately wrong this quickly?

The simple answer is Amazon's architecture. It's highly distributed, and there's no operations team. Each component (and over 200 go into a single page) is run by its development team, of four to five people. They are responsible for its features, its development - and for making sure it runs effectively. The result should be a company that can move quickly in response to outside events.

At least that's the theory.

I'm afraid the real world doesn't work like that. I've been a developer and I've managed developers and I can tell you that what really happens is something like this:

Someone comes up with a neat idea that they evangelise among the other developers, and it gets added to the platform. The developers become wedded to their idea, and they keep adding features. Something from the outside occurs that affects the data managed by the service, and they don't notice. After all, it's their design and it's perfect. The problem gets worse, and a few external symptoms are noted and passed on to the developers. They're too busy to pay much attention to them, and so they ignore them. Then suddenly, BANG, and everything breaks.

Oh, and it's a holiday weekend and there's no one there to actually handle the problem as the whole team's gone off on a skiing trip.

Now I can't guarantee that's what has happened with the deletion of GLBT content from the Amazon ratings system, but I suspect it's more likely than not.

So here's where my conjecture comes in:

Someone probably had the idea of reducing Amazon's exposure to bad publicity without increasing the site's legal liability. Manual censorship of the rankings would certainly make the service more liable, so the idea was probably a tool that would let the site's users do the work for it. After all, if the community doesn't like it, then, well, US community standards laws apply and you're safe. A group of developers coded it up, and it worked well - for a while.

Either a parameter wasn't quite right, or someone released a new version of a keyword file without testing - and, well, suddenly the GLBT books were off the list. Maybe someone gamed the system, too - it's impossible to tell from outside.

A separate test and operations team would likely have spotted the underlying flaw before it got released - or at least spotted the first wave of complaints and started to triage them effectively, with a more productive response than "It's a glitch".

So now Amazon has to unwind data that's spread across its distributed application platform, which may be stored in any or all of three different kinds of database, and in at least three different geographies and many more data centres.

Oops.

That's going to take a while to deal with.

Meanwhile their Seattle-based PR team is just about to start a very long day - and a group of developers are going to be desperately trying to explain just what went wrong.

[ETA 23/4/2012. After three years of this post being targeted heavily by spammers, I have locked commenting.]

Comments

damsel_ophelia
Apr. 13th, 2009 04:22 pm (UTC)
Re: Disability books affected also
Rebroadcasted. Not getting as much press, but just as important - and sickening.