As full-text indexing and search mechanisms become more sophisticated, organizations are finding that not all of their content is accessible to these technologies, even though there is a real and immediate need for it to be.
Information stored in image formats such as TIFF or image-only PDF (scanned documents, faxes, etc.) cannot be included in a full-text search: although the human eye can read the text in the image, to the computer it is merely a grid of pixels.
Processing these documents through an OCR engine solves this problem: the engine ‘reads’ the characters found in the image and either extracts the text to an external file or adds a “text layer” to the document, making its content accessible to search engines.
This has the added benefit of letting users who are reading the document search it for key words or phrases.
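The effect on search can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Document` type, the `add_text_layer`, `build_index` and `search` helpers, and especially `fake_ocr`, which only stands in for a real OCR engine by decoding the "image" bytes as text. It shows that an indexer has nothing to read in an image-only document and finds it once a text layer is attached.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Document:
    name: str
    image: bytes                # raw image data, e.g. a scanned page
    text: Optional[str] = None  # the "text layer"; None for image-only files


def add_text_layer(doc: Document, ocr_engine: Callable[[bytes], str]) -> Document:
    """Return a copy of doc whose text layer is the OCR engine's output."""
    return Document(doc.name, doc.image, ocr_engine(doc.image))


def build_index(docs: List[Document]) -> Dict[str, Set[str]]:
    """Map each lowercased word to the names of the documents containing it."""
    index: Dict[str, Set[str]] = {}
    for doc in docs:
        if doc.text is None:
            continue  # image-only: the indexer sees pixels, not words
        for word in re.findall(r"\w+", doc.text.lower()):
            index.setdefault(word, set()).add(doc.name)
    return index


def search(index: Dict[str, Set[str]], term: str) -> List[str]:
    """Return the sorted names of documents that contain the term."""
    return sorted(index.get(term.lower(), set()))


def fake_ocr(image: bytes) -> str:
    """Hypothetical stand-in for an OCR engine: decodes the bytes as ASCII."""
    return image.decode("ascii")


if __name__ == "__main__":
    scan = Document("fax-001.tif", b"Invoice 4471 overdue")
    print(search(build_index([scan]), "invoice"))
    print(search(build_index([add_text_layer(scan, fake_ocr)]), "invoice"))
```

In a real deployment the `ocr_engine` callable would wrap an actual OCR product, and the text layer would be written back into the file or to an external text file rather than held in memory.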