Tuesday, September 29, 2009

Google Docs OCR

The Google Docs API is testing a new feature that lets you perform OCR (optical character recognition) on an image. There's a live demo that illustrates the feature: you upload a high-resolution JPG, GIF, or PNG image smaller than 10 MB, and Google Docs extracts the text and converts it into a new document. Google mentions that "the operation can currently take up to 40 seconds", and a small test showed that the service is not yet reliable: it's slow and it frequently returns errors.
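Since the service is slow and error-prone, it may be worth checking a file against the stated limits before uploading it. Here's a minimal sketch of such a check in Python. It doesn't call the Google Docs API at all; the function name `ocr_upload_problems` is made up for illustration, and reading "10 MB" as 10 × 1024 × 1024 bytes and accepting the `.jpeg` extension alongside `.jpg` are assumptions on my part.

```python
import os

# Limits taken from the post. Reading "10 MB" as 10 * 1024 * 1024 bytes
# is an assumption; the service may count megabytes differently.
MAX_BYTES = 10 * 1024 * 1024

# JPG, GIF and PNG come from the post; ".jpeg" is an assumed alias for JPG.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".gif", ".png"}

# Google says the operation "can currently take up to 40 seconds",
# so a client would probably want to wait at least this long.
SUGGESTED_TIMEOUT_SECONDS = 40


def ocr_upload_problems(filename, size_bytes):
    """Return a list of reasons the file would likely be rejected.

    An empty list means the file passes the checks described in the post.
    This is a hypothetical helper, not part of any Google library.
    """
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append("unsupported format: %r" % extension)
    if size_bytes >= MAX_BYTES:
        problems.append(
            "file is %d bytes; it must be smaller than %d" % (size_bytes, MAX_BYTES)
        )
    return problems


if __name__ == "__main__":
    for name, size in [("scan.png", 2000000), ("scan.tiff", 20 * 1024 * 1024)]:
        print(name, ocr_upload_problems(name, size) or "looks fine")
```

A check like this only catches the obvious rejections. Image resolution, which matters a lot for OCR quality, isn't covered.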


The results are far from perfect and you'll find many errors, but the service is free and it's constantly improving. Here's the result of the OCR for this scanned document:


There aren't many free OCR services available, so an OCR service provided by Google would be very popular. ABBYY FineReader Online is one of the best online OCR services, but the free version is limited to 10 pages a day.

Google sponsors the development of an open-source OCR system called OCRopus, but it's not clear whether the online service provided by Google Docs uses OCRopus.
