You receive a scanned contract. You open it, try to select some text, and nothing happens. You hit Ctrl+F to search for a clause: no results. The document looks fine, but as far as your computer is concerned, it's just a photograph.
This is one of the most frustrating things about working with scanned PDFs. OCR is the solution.
What is OCR?
OCR stands for Optical Character Recognition. It's a technology that looks at an image of text, such as a scanned page, and works out what the characters actually are, converting the visual image into real, machine-readable text.
Think of it like this: a scanned PDF is a photograph of a document. OCR reads that photograph and says, "That squiggle is an 'A', that one is a 'B'...", reconstructing the text character by character.
Once OCR has run, the output PDF looks identical to the original scan, but it now carries an invisible text layer positioned over the page image. That text layer is what makes search and copy-paste work.
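To make the idea of a text layer concrete, here is a minimal sketch of how one could be modelled: a list of recognised words, each with the box it occupies on the page image. The `OcrWord` and `TextLayer` types and both helper functions are hypothetical names chosen for illustration, not PDForge's internals.

```typescript
// One recognised word plus where it sits on the page image (pixel coordinates).
interface OcrWord {
  text: string;
  x: number; // left edge
  y: number; // top edge
  width: number;
  height: number;
}

// Hypothetical text layer for one page: the words in reading order.
type TextLayer = OcrWord[];

// Return every word that contains the query, ignoring case.
// A viewer could draw search highlights over the returned boxes.
function findInLayer(layer: TextLayer, query: string): OcrWord[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return [];
  return layer.filter((word) => word.text.toLowerCase().includes(needle));
}

// Copy-paste draws on the same data: join the words back into a string.
function layerToText(layer: TextLayer): string {
  return layer.map((word) => word.text).join(" ");
}

const samplePage: TextLayer = [
  { text: "Termination", x: 72, y: 140, width: 180, height: 24 },
  { text: "clause", x: 260, y: 140, width: 90, height: 24 },
  { text: "applies", x: 358, y: 140, width: 100, height: 24 },
];

console.log(findInLayer(samplePage, "clause").length); // 1
console.log(layerToText(samplePage)); // "Termination clause applies"
```

Without the layer there is nothing for `findInLayer` to look through, which is why Ctrl+F returns no results on a raw scan.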
When do you need OCR?
You need OCR when your PDF was created by scanning a physical document rather than being exported from software. Common cases include:
- Scanned contracts or agreements: signed documents that were scanned back in
- Old records: archived documents, invoices, or receipts scanned from paper
- Photographed documents: pages captured on a phone camera or flatbed scanner
- Faxed documents: faxes are often saved as image-based PDFs
- Books and academic papers: older published material scanned from print
If you can already highlight and copy text in your PDF, it doesn't need OCR, because a text layer already exists.
How to make a scanned PDF searchable for free
PDForge's OCR tool runs entirely in your browser. Your file never leaves your device; the entire recognition process happens locally.
- Go to the OCR PDF tool
- Drop your scanned PDF onto the upload area
- Select the language of the document
- Choose your output format: Searchable PDF or Plain Text
- Click Run OCR
- Download your result
The searchable PDF output preserves your original scan exactly as it looks, with an invisible text layer added on top. Open it in any PDF viewer and Ctrl+F works immediately.
Turn your scanned PDF into a searchable document: free, private, no upload needed.
Searchable PDF vs Plain Text: which should you choose?
- Searchable PDF: the document looks exactly the same as your original scan, but text is now selectable, searchable, and copy-pasteable. Best when you want to keep the original layout and appearance.
- Plain Text (.txt): just the extracted text, with no images or formatting. Best when you only need the words, for example when feeding content into another tool, searching a large batch of documents, or copy-pasting into a word processor.
For most people, Searchable PDF is the right choice.
What languages does OCR support?
The OCR tool supports 12 languages out of the box: English, Spanish, French, German, Portuguese, Italian, Chinese (Simplified), Japanese, Korean, Arabic, Hindi, and Russian.
Always select the correct language for your document. Choosing the wrong language dramatically reduces accuracy, because OCR relies on language-specific character patterns and dictionaries to improve recognition.
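The privacy section of this article says the tool is built on Tesseract.js. Tesseract identifies languages by short codes rather than display names, so a language menu like the one above presumably maps each label to a code. A minimal sketch of such a mapping follows. The codes are Tesseract's standard language-data names; the `TESSERACT_CODES` table and the `languageCode` helper are hypothetical and not taken from PDForge.

```typescript
// Tesseract language codes for the twelve languages listed above.
// The table itself is an illustrative assumption, not PDForge's code.
const TESSERACT_CODES: Record<string, string> = {
  English: "eng",
  Spanish: "spa",
  French: "fra",
  German: "deu",
  Portuguese: "por",
  Italian: "ita",
  "Chinese (Simplified)": "chi_sim",
  Japanese: "jpn",
  Korean: "kor",
  Arabic: "ara",
  Hindi: "hin",
  Russian: "rus",
};

// Hypothetical helper: resolve a menu label to a code and fail loudly
// on anything unsupported, rather than silently falling back to English.
function languageCode(label: string): string {
  const code = TESSERACT_CODES[label];
  if (code === undefined) {
    throw new Error(`Unsupported OCR language: ${label}`);
  }
  return code;
}

console.log(languageCode("German")); // "deu"
```

Failing on an unknown label, instead of defaulting to English, is a deliberate choice here: a silent fallback would produce exactly the wrong-language accuracy drop described above.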
Tips for better OCR accuracy
OCR is not magic: the quality of the output depends heavily on the quality of the input. A few things that make a big difference:
- Scan at 150 DPI or higher: low-resolution scans produce blurry text that OCR struggles with. 300 DPI is ideal for most documents.
- Straight pages: if the scan is rotated or skewed, accuracy drops. Most scanners correct this automatically, but handheld phone photos are often left skewed.
- Clean originals: coffee stains, heavy shadows, and crinkled pages all hurt recognition quality.
- Good contrast: black text on white paper works best. Very light prints or faded documents are harder to read.
- Standard fonts: printed text is recognised much more reliably than handwriting.
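The resolution tip can be checked with simple arithmetic: DPI is the pixel width of the page image divided by the physical width of the paper in inches. The sketch below applies the 150 and 300 DPI thresholds from the list above. The function names and the `ScanQuality` labels are illustrative assumptions.

```typescript
// Effective resolution of a page image, given its pixel width and the
// physical width of the paper in inches.
function effectiveDpi(pixelWidth: number, paperWidthInches: number): number {
  if (pixelWidth <= 0 || paperWidthInches <= 0) {
    throw new RangeError("pixel width and paper width must be positive");
  }
  return pixelWidth / paperWidthInches;
}

type ScanQuality = "too low" | "usable" | "ideal";

// Thresholds taken from the tips above: 150 DPI minimum, 300 DPI ideal.
function rateScan(dpi: number): ScanQuality {
  if (dpi < 150) return "too low";
  if (dpi < 300) return "usable";
  return "ideal";
}

const LETTER_WIDTH_IN = 8.5; // US Letter paper is 8.5 x 11 inches

console.log(rateScan(effectiveDpi(2550, LETTER_WIDTH_IN))); // 300 DPI: "ideal"
console.log(rateScan(effectiveDpi(1700, LETTER_WIDTH_IN))); // 200 DPI: "usable"
console.log(rateScan(effectiveDpi(1000, LETTER_WIDTH_IN))); // under 150 DPI: "too low"
```

For a phone photo the calculation is only a rough guide, since the page rarely fills the whole frame and the usable pixel width is smaller than the image width.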
OCR accuracy on a clean 300 DPI scan of printed text is typically 98 to 99%. On a blurry phone photo, it can drop below 80%.
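The article does not say how these percentages are measured. One common definition is character accuracy: one minus the edit distance between the OCR output and the true text, divided by the length of the true text. The sketch below assumes that definition, and the sample strings are made up for illustration.

```typescript
// Levenshtein edit distance: the minimum number of single-character
// insertions, deletions, and substitutions needed to turn `a` into `b`.
function editDistance(a: string, b: string): number {
  // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Character accuracy in the range 0..1, relative to the true text.
function characterAccuracy(truth: string, ocrOutput: string): number {
  if (truth.length === 0) return ocrOutput.length === 0 ? 1 : 0;
  return Math.max(0, 1 - editDistance(truth, ocrOutput) / truth.length);
}

// Two classic OCR confusions: "m" read as "rn", and "0" read as "O".
const truthLine = "Payment due in 30 days";
const ocrLine = "Payrnent due in 3O days";

console.log(editDistance(truthLine, ocrLine)); // 3
console.log(characterAccuracy(truthLine, ocrLine)); // 19/22, about 0.86
```

Even three wrong characters in a 22-character line pull accuracy down to roughly 86%, which shows how quickly a few misreads add up on a poor scan.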
Is OCR the same as editing a PDF?
Not quite. OCR makes text readable by machines (searchable and selectable), but the text layer is invisible and sits on top of the image. You're not editing the original document; you're adding a layer that enables search and copy.
If you want to actually edit the text content, you'd need to convert the OCR output to a Word document first.
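In the PDF format, text becomes invisible when it is drawn with text rendering mode 3, which neither fills nor strokes the glyphs. The sketch below builds the content-stream operators for one invisible word. It illustrates the general technique rather than PDForge's actual output; the function names and the `F1` font resource name are assumptions.

```typescript
// Escape the characters that are special inside a PDF literal string:
// backslash and both parentheses.
function escapePdfString(text: string): string {
  return text.replace(/([\\()])/g, "\\$1");
}

// Build content-stream operators that place one word invisibly.
// x and y are in PDF points, measured from the bottom-left of the page.
// `fontName` is a hypothetical resource name that the page must define.
function invisibleWordOps(
  text: string,
  x: number,
  y: number,
  fontSize: number,
  fontName: string = "F1",
): string {
  return [
    "BT", // begin text object
    `/${fontName} ${fontSize} Tf`, // select font and size
    "3 Tr", // rendering mode 3: neither fill nor stroke (invisible)
    `1 0 0 1 ${x} ${y} Tm`, // text matrix: translate to (x, y)
    `(${escapePdfString(text)}) Tj`, // show the string
    "ET", // end text object
  ].join("\n");
}

console.log(invisibleWordOps("Clause (a)", 72, 700, 12));
```

Because the glyphs are never painted, the scan underneath remains the only thing you see, and editing the page image is out of reach. Real tools also stretch each word to fit its box on the scan, and non-Latin scripts need composite fonts, both of which this sketch leaves out.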
Is your file private?
Yes. PDForge processes everything locally in your browser using Tesseract.js, a JavaScript library built on the open-source Tesseract OCR engine. Your PDF bytes never leave your device: no server receives your file, and nothing is stored anywhere.
Other tools you might find useful
- Compress PDF: reduce the file size of your scanned PDF after OCR
- Merge PDF: combine multiple scanned documents into one searchable file
- Split PDF: extract specific pages from a large scanned document
- PDF to Word: export the text as an editable Word document
OCR is one of those tools you don't think about until you desperately need it. When that moment comes, the process should be fast, free, and private.