Five Exabytes Redux

A while back I posted a link to the How Much Information 2003 study, and used it as a hook for a Guardian article. One of the comments in the study that I found interesting was a statement that 5 exabytes was the sum total of every word spoken by every human being.

Well, that may not be the case. This blog posting, in Language Log, suggests that we're really dealing with a zettascale problem, and that 5 exabytes is about 8,000 times too small...

Interesting math. I think there may be a slight flaw in the assumptions on both sides of the argument. For one thing, there doesn't appear to be a source for the original 5 exabytes statement beyond this "Data Power Of Ten" fact sheet, and tracking that source down is something I should have tried to do at the time. For another, the author of Language Log doesn't seem to be aware of the techniques used to encode speech for telephony. I'd be amazed if you couldn't shave more than a couple of orders of magnitude off his calculations if you used an 8-bit μ-law encoder at 8 kHz.

Certainly his 16 kHz, 16-bit linear single-channel audio, at 32 KB per second, is overkill for voice storage. My digital voice recorder can put more than 2846 minutes of voice on a 128 MB memory stick, and that's using an LPEC codec at 16 kHz. Telephony codecs are much more efficient, especially those used over satellite circuits...
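
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the data rates mentioned above. It assumes that "128MB" means 128 × 2^20 bytes, that the whole stick is available for audio, and that 2846 minutes is the exact capacity (the recorder actually holds "more than" that, so the implied rate is an upper bound). The function names are made up for illustration.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels=1):
    """Data rate of uncompressed (or companded) fixed-width samples."""
    return sample_rate_hz * bits_per_sample * channels // 8


def implied_bytes_per_second(capacity_bytes, minutes):
    """Average data rate implied by fitting `minutes` of audio in `capacity_bytes`."""
    return capacity_bytes / (minutes * 60)


# 16 kHz, 16-bit, mono linear audio: the Language Log assumption.
linear_16k = pcm_bytes_per_second(16_000, 16)

# 8 kHz, 8-bit mu-law: classic telephony encoding.
mulaw_8k = pcm_bytes_per_second(8_000, 8)

# Voice recorder: 2846 minutes on a 128 MB stick (assumed 128 * 2**20 bytes).
recorder = implied_bytes_per_second(128 * 2**20, 2846)

print(f"linear 16 kHz/16-bit: {linear_16k} bytes/s")
print(f"mu-law 8 kHz/8-bit:   {mulaw_8k} bytes/s")
print(f"voice recorder:       {recorder:.0f} bytes/s "
      f"(about {linear_16k / recorder:.0f}x smaller than linear)")
```

Under those assumptions the recorder works out to a little under 800 bytes per second, or roughly 6.3 kbit/s.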

And then there are the more heavily compressed lossy codecs, like GSM, running at 13 kbit/s...
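
To see what a different codec does to the total, here is a rough Python sketch that rescales the Language Log figure by bit rate. It takes the "about 8,000 times" factor at face value (so roughly 40,000 exabytes at 16 kHz, 16-bit linear), and it assumes the total scales linearly with bit rate while the amount of speech stays the same. The names are hypothetical and the outputs are back-of-the-envelope numbers, nothing more.

```python
# Baseline: 16 kHz * 16 bits, single channel, in kbit/s.
BASELINE_KBITS = 16_000 * 16 / 1000   # 256 kbit/s

# "5 exabytes is about 8,000 times too small" -> roughly 40,000 EB.
BASELINE_TOTAL_EB = 5 * 8000


def scaled_total_eb(codec_kbits):
    """Total storage in exabytes if all speech were stored at `codec_kbits` kbit/s."""
    return BASELINE_TOTAL_EB * codec_kbits / BASELINE_KBITS


codecs = {
    "16 kHz 16-bit linear": 256,
    "8 kHz 8-bit mu-law": 64,
    "GSM": 13,
}

for name, kbits in codecs.items():
    print(f"{name:22s} {kbits:4d} kbit/s -> {scaled_total_eb(kbits):10,.0f} EB")
```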

(link originally found on Boing Boing)