
The power of a word

[Image: a library of old books, with a laptop]

Sometimes it can be hard to see how small things can be powerful. How can one tiny act actually get anything done? If you ever feel like small things can’t make a difference, think about reCAPTCHA. It’s an extraordinarily clever idea that is helping to digitise old books, one tiny word at a time.

The computer fooler

Chances are you know what CAPTCHAs are, even if you've never seen the word itself, which stands for "Completely Automated Public Turing test to tell Computers and Humans Apart". You know how, when you sign up for an email address or buy something online, you often have to decipher a squiggly bunch of letters as a security measure? That's because humans are much better than computers at reading distorted letters, and some sites want to make sure that only humans can use their services. For example, without the squiggly CAPTCHAs on email signup pages, spammers could write code that automatically signs up millions of addresses to send spam from. Likewise, touts could flood ticket websites and buy up all the tickets for an event in seconds.

With all the signups and purchases going on all over the web, humans spend a lot of time proving they're human. More than 100 million CAPTCHAs get typed every day. Put all that together and it adds up to hundreds of thousands of hours of work, each day, just typing random rippling letters into web forms. Suddenly a tiny bit of effort seems like a lot, and most of it was going to waste. OK, it's a useful security measure, but there wasn't any lasting benefit from all that typing. That's where reCAPTCHA comes in.
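Where does "hundreds of thousands of hours" come from? Here's a back-of-the-envelope sum in Python. It takes the article's figure of 100 million CAPTCHAs a day. It also assumes that each one takes roughly ten seconds to read and type. That timing is our own illustrative guess, not a figure from the article.

```python
# Back-of-the-envelope estimate of the daily human effort spent on CAPTCHAs.
CAPTCHAS_PER_DAY = 100_000_000   # "more than 100 million" a day (from the article)
SECONDS_PER_CAPTCHA = 10         # assumption: roughly ten seconds to read and type one


def hours_per_day(captchas: int, seconds_each: float) -> float:
    """Total hours of typing per day for a given volume and time per CAPTCHA."""
    return captchas * seconds_each / 3600  # 3600 seconds in an hour


if __name__ == "__main__":
    hours = hours_per_day(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
    print(f"about {hours:,.0f} hours per day")  # prints "about 277,778 hours per day"
```

Even if people were twice as quick as assumed, the answer would still be well over a hundred thousand hours every single day.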

An army of typists

The idea came from Luis von Ahn, a computer scientist at Carnegie Mellon University in Pittsburgh. He was one of the original creators of CAPTCHAs, and when he saw how much brainpower goes into solving his puzzles, he thought that there might be a way to channel all that effort into something good. What do you do with an army of hundreds of millions of typists? It turns out you can assign them to typing up old books. Lots of texts written before the computer age need to be digitised to make them more accessible and to preserve them for the future. Technology to digitise text already exists: it's called Optical Character Recognition, or OCR. The problem is that letters might be faded or distorted, and computers find distorted letters difficult to recognise. That, as you'll recall, is exactly why CAPTCHAs work in the first place.

[Image: an example of a reCAPTCHA]

Ready to solve?

The idea of reCAPTCHA is that whenever a word comes up that the OCR can't read, it gets made into a CAPTCHA for humans to solve. The CAPTCHA is actually two words: one that's already solved and one that's unknown. The solved word works as a check. If the person types it correctly, the system assumes they are human, because we already know it's very unlikely that a computer could solve a CAPTCHA. The system then remembers the person's answer for the unknown word and compares it with other people's answers for the same word. If three humans agree on the spelling, the system decides it must be right, and the word gets added to the digitised book. It also becomes a control word, meaning the word on the other side of a CAPTCHA that determines whether someone's human. Clever, eh?
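To make those steps concrete, here's a toy model in Python of the two-word scheme as described above. The class name `ReCaptchaSketch`, the image ids and the method names are all made up for illustration. This is a sketch of the idea, not how the real reCAPTCHA service was built. It follows the article's "three humans agree" rule, while the real system's exact checks may well have differed.

```python
import random
from collections import Counter, defaultdict

AGREEMENT_NEEDED = 3  # the article's rule: three matching human answers settle a word


class ReCaptchaSketch:
    """Toy model of the two-word scheme (hypothetical; not the real reCAPTCHA service)."""

    def __init__(self, control: dict[str, str], pending: set[str]):
        self.control = dict(control)   # image id -> known spelling (used as the check word)
        self.pending = set(pending)    # image ids of words the OCR could not read
        self.votes: dict[str, Counter] = defaultdict(Counter)  # image id -> answer tallies
        self.digitised: dict[str, str] = {}  # image id -> spelling humans agreed on

    def make_challenge(self, rng: random.Random) -> tuple[str, str]:
        """Pair one already-solved word with one unknown word (needs at least one of each)."""
        return rng.choice(sorted(self.control)), rng.choice(sorted(self.pending))

    def submit(self, control_id: str, control_answer: str,
               unknown_id: str, unknown_answer: str) -> bool:
        """Return True if the user passes the check; if so, count their reading of the unknown word."""
        if control_answer.strip().lower() != self.control[control_id].lower():
            return False  # failed the check word: possibly a bot, so record nothing
        if unknown_id in self.pending:
            answer = unknown_answer.strip().lower()
            self.votes[unknown_id][answer] += 1
            if self.votes[unknown_id][answer] >= AGREEMENT_NEEDED:
                # Enough people agree: add the word to the book and reuse it as a control word.
                self.pending.discard(unknown_id)
                self.digitised[unknown_id] = answer
                self.control[unknown_id] = answer
        return True


if __name__ == "__main__":
    rc = ReCaptchaSketch(control={"img1": "morning"}, pending={"img2"})
    control_id, unknown_id = rc.make_challenge(random.Random(0))  # ("img1", "img2")
    rc.submit(control_id, "morning", unknown_id, "upon")   # passes: first vote for "upon"
    rc.submit(control_id, "mourning", unknown_id, "span")  # fails the check: ignored
    rc.submit(control_id, "Morning", unknown_id, "upon")   # passes: second vote
    rc.submit(control_id, "morning", unknown_id, "upon")   # passes: third vote, word accepted
    print(rc.digitised)            # {'img2': 'upon'}
    print("img2" in rc.control)    # True
```

Notice that the answer from someone who fails the check word never counts towards the unknown word. That stops bots from voting junk into the digitised book.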

That’s the technology, anyway. What’s really amazing about reCAPTCHA is the human element. Harnessing all those tiny moments spent by millions of people around the world has helped preserve a staggering amount of text. All in all, it’s the equivalent of 160 books a day. The folks at reCAPTCHA are currently working on digitising old editions of the New York Times. That amounts to over 130 years of newspapers, which they expect to zip through by 2010. It’s nice to know that just booking some tickets online can help a tiny bit in something really powerful, like keeping old knowledge alive.