Wednesday, September 16, 2009

Google Buys reCAPTCHA

reCAPTCHA seems like a perfect match for Google: it's a project that generates CAPTCHAs and uses the results to digitize books. "reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. (...) Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one."
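
The paired-word check described in the quote can be sketched in a few lines of Python. This is only an illustration of the idea, not reCAPTCHA's actual code: the names `Challenge`, `WordPairChecker`, `submit` and `best_guess` are made up for this example, and the rule that an unknown word needs at least two matching answers before it is accepted is an assumption added here, not something the quote states.

```python
from collections import Counter
from dataclasses import dataclass, field


def normalize(text: str) -> str:
    """Compare answers case-insensitively and ignore surrounding whitespace."""
    return text.strip().lower()


@dataclass
class Challenge:
    control_answer: str  # word whose correct reading is already known
    unknown_id: str      # identifier of the word OCR could not read


@dataclass
class WordPairChecker:
    # unknown_id -> tally of readings submitted by users who passed the control word
    votes: dict = field(default_factory=dict)

    def submit(self, challenge: Challenge, control_guess: str, unknown_guess: str) -> bool:
        """Return True if the user passes the CAPTCHA.

        Only the control word is verified. The reading of the unknown word
        is recorded when the control word was typed correctly.
        """
        if normalize(control_guess) != normalize(challenge.control_answer):
            return False
        tally = self.votes.setdefault(challenge.unknown_id, Counter())
        tally[normalize(unknown_guess)] += 1
        return True

    def best_guess(self, unknown_id: str, min_votes: int = 2):
        """Return the most common reading once it has enough votes, else None.

        The min_votes threshold is an assumption made for this sketch.
        """
        tally = self.votes.get(unknown_id)
        if not tally:
            return None
        word, count = tally.most_common(1)[0]
        return word if count >= min_votes else None
```

A user who mistypes the control word fails the test and their reading of the unknown word is discarded, so only answers from people who demonstrably read the known word contribute to the digitized text.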


It's no wonder that Google decided to acquire reCAPTCHA and use the service to improve the accuracy of Google Book Search's digitization.

"reCAPTCHA's unique technology improves the process that converts scanned images into plain text, known as Optical Character Recognition (OCR). This technology also powers large scale text scanning projects like Google Books and Google News Archive Search. Having the text version of documents is important because plain text can be searched, easily rendered on mobile devices and displayed to visually impaired users. So we'll be applying the technology within Google not only to increase fraud and spam protection for Google products but also to improve our books and newspaper scanning process."

The service offers a simple JavaScript API that lets you embed CAPTCHAs in any web page, and many popular sites use it, including Facebook, Twitter and Ticketmaster.
