Have you ever noticed that some PDFs don’t actually have “real” text? You can’t highlight or search the text, and each page behaves like a single image.
Optical Character Recognition (OCR) is a way to automatically convert an image of text into actual machine-encoded text that computers and apps can read.
This conversion is important because it allows:
- Greater searchability
- Better legibility
- Use of text-to-speech software
- Use of other assistive technologies
In many ways, a PDF that has gone through an OCR process is better and more useful to everyone. Students can search for a term they remember reading about, and you don’t need to worry about students squinting over blurry text.
Many modern scanners will automatically apply OCR to a scanned document. Using the scanner in MCAD’s library or the mobile app Adobe Scan will produce pretty reliable results.
So, you have an older PDF that hasn’t gone through the OCR process. What should you do?
If you can find a PDF where the image quality is pretty good, and you have access to Adobe Acrobat Pro, you might be able to run OCR on it automatically:
- Open the PDF file in Acrobat.
- Click on the Edit PDF tool in the right pane. Acrobat automatically applies OCR to your document and converts it to a fully editable copy of your PDF.
- Choose File > Save As and type a new name for the new and improved PDF.
If the image quality of the PDF isn’t good enough for OCR to work properly, or you don’t have access to Adobe Acrobat Pro, contact the Learning Center and Disability Services for help.