What is OCR? How to Make a Scanned PDF Searchable

You receive a scanned contract. You open it, try to select some text — and nothing happens. You hit Ctrl+F to search for a clause — no results. The document looks fine, but as far as your computer is concerned, it’s just a photograph.

This is one of the most frustrating things about working with scanned PDFs. OCR is the solution.

What is OCR?

OCR stands for Optical Character Recognition. It’s a technology that looks at an image of text — like a scanned page — and figures out what the characters actually are, converting the visual image into real, machine-readable text.

Think of it like this: a scanned PDF is a photograph of a document. OCR reads that photograph and says, “That squiggle is an ‘A’, that one is a ‘B’…” — reconstructing the text character by character.
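To make the "squiggle to letter" idea concrete, here is a toy Python sketch. It compares a tiny black-and-white glyph against stored letter templates and picks the closest one. It only illustrates the principle. The template set and the names `TEMPLATES`, `hamming`, and `recognize_glyph` are made up for this example. Modern engines such as Tesseract (version 4 and later) recognise whole text lines with an LSTM neural network rather than comparing pixel templates.

```python
# Toy illustration only: classify a tiny glyph by finding the stored
# template with the fewest differing pixels. Real engines such as
# Tesseract 4+ use an LSTM neural network instead of templates.

# Each glyph is a 5x3 bitmap written as strings ("#" = ink, "." = paper).
TEMPLATES = {
    "I": ["###",
          ".#.",
          ".#.",
          ".#.",
          "###"],
    "L": ["#..",
          "#..",
          "#..",
          "#..",
          "###"],
    "T": ["###",
          ".#.",
          ".#.",
          ".#.",
          ".#."],
}


def hamming(a: list[str], b: list[str]) -> int:
    """Count pixels that differ between two bitmaps of the same size."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize_glyph(glyph: list[str]) -> str:
    """Return the character whose template differs least from the glyph."""
    return min(TEMPLATES, key=lambda ch: hamming(TEMPLATES[ch], glyph))


# A slightly noisy "L": one stray ink pixel in the top row.
noisy = ["##.",
         "#..",
         "#..",
         "#..",
         "###"]
print(recognize_glyph(noisy))  # → L
```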

Once OCR has run, the output PDF looks identical to the original scan, but it now carries an invisible text layer aligned with the image. That text layer is what makes search and copy-paste work.
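One way to picture the text layer is as a list of recognised words, each paired with the box it occupies on the page image. The sketch below assumes that structure. The `Word` class, the sample coordinates, and `find` are hypothetical. It shows why search works: the viewer matches the query against the hidden text, then highlights the matching box over the unchanged image. Real viewers also match phrases that span several words, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    # Bounding box on the page image in pixels: left, top, width, height.
    box: tuple[int, int, int, int]


# Hypothetical OCR result for one page. The image itself is untouched;
# these words form the invisible text layer.
text_layer = [
    Word("This", (100, 200, 60, 24)),
    Word("Agreement", (170, 200, 150, 24)),
    Word("terminates", (100, 240, 140, 24)),
    Word("on", (250, 240, 30, 24)),
    Word("notice.", (290, 240, 95, 24)),
]


def find(layer: list[Word], query: str) -> list[tuple[int, int, int, int]]:
    """Return the boxes of words equal to query (case-insensitive, ignoring
    trailing punctuation): the regions a viewer would highlight."""
    q = query.lower()
    return [w.box for w in layer if w.text.lower().strip(".,;:") == q]


print(find(text_layer, "agreement"))         # → [(170, 200, 150, 24)]
print(" ".join(w.text for w in text_layer))  # what copy-paste would yield
```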

When do you need OCR?

You need OCR when your PDF was created by scanning a physical document rather than being exported from software. Common cases include:

Scanned contracts or agreements — signed documents that were scanned back in
Old records — archived documents, invoices, or receipts scanned from paper
Photographed documents — pages captured with a phone camera rather than a scanner
Faxed documents — faxes are often saved as image-based PDFs
Books and academic papers — older published material scanned from print

If you can already highlight and copy text in your PDF, it doesn’t need OCR — a text layer already exists.
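If you have many files, you can automate the same check. Extract whatever text each page already contains, then flag pages that come back empty or nearly empty. This is a heuristic sketch, not a PDForge feature. The function name and the 20-character threshold are arbitrary assumptions. It can also miss pages that carry a little real text, such as a typed header, above a scanned body. The commented lines show one way to get per-page text with the pypdf package.

```python
def pages_needing_ocr(page_texts: list[str], min_chars: int = 20) -> list[int]:
    """Return the 1-based numbers of pages with little or no extractable text.

    page_texts holds what a PDF text extractor returned for each page.
    Whitespace is ignored; min_chars is an arbitrary illustrative threshold
    (a scanned page often yields nothing, or only a stray page number).
    """
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len("".join(text.split())) < min_chars
    ]


# One way to get page_texts, using the pypdf package ("contract.pdf" is a
# hypothetical file name):
#   from pypdf import PdfReader
#   page_texts = [page.extract_text() for page in PdfReader("contract.pdf").pages]

print(pages_needing_ocr(["", "  2  ", "This Agreement is made on 1 March between the parties."]))
# → [1, 2]
```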

How to make a scanned PDF searchable for free

PDForge’s OCR tool runs entirely in your browser. Your file never leaves your device — the entire recognition process happens locally.

Go to the OCR PDF tool
Drop your scanned PDF onto the upload area
Select the language of the document
Choose your output format — Searchable PDF or Plain Text
Click Run OCR
Download your result
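Behind those steps, an OCR run is essentially a loop. Each page becomes an image, the image is recognised in the chosen language, and the results are assembled into the chosen output. The Python sketch below covers only the Plain Text path. It takes the recogniser as a parameter, so it runs without any OCR engine installed. The names `run_ocr_to_text`, `Recognizer`, and `fake_recognize` are hypothetical, and the form-feed page separator is a choice made for this example. This is not PDForge's code, which runs Tesseract.js in the browser.

```python
from typing import Callable, Iterable

# A recogniser takes one page image and a language code and returns text.
# It stands in for a real OCR engine call (hypothetical interface).
Recognizer = Callable[[object, str], str]


def run_ocr_to_text(page_images: Iterable[object], lang: str,
                    recognize: Recognizer) -> str:
    """Recognise every page image in order and join the results into one
    plain-text document, separating pages with a form feed character."""
    pages = [recognize(image, lang).strip() for image in page_images]
    return "\f".join(pages)


# Usage with a fake recogniser so the sketch runs without an OCR engine.
def fake_recognize(image: object, lang: str) -> str:
    return f"text of {image} ({lang})\n"


print(repr(run_ocr_to_text(["page-1.png", "page-2.png"], "eng", fake_recognize)))
# → 'text of page-1.png (eng)\x0ctext of page-2.png (eng)'
```

With the pytesseract package and the Tesseract program installed, a real recogniser for PIL images could be `lambda image, lang: pytesseract.image_to_string(image, lang=lang)`. The Searchable PDF path also needs each word's position, which pytesseract exposes through `image_to_data`.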

The searchable PDF output preserves your original scan exactly as it looks, with an invisible text layer added on top. Open it in any PDF viewer and Ctrl+F works immediately.
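A common way to build such an invisible layer uses a standard PDF feature: text rendering mode 3 (`3 Tr`). In that mode, text is positioned without being filled or stroked, so it can be selected and searched but is never painted. The sketch below builds the content-stream operators for a single word. It converts the word's pixel box (origin at the top-left) into PDF points (1/72 inch, origin at the bottom-left). The function name, the `/F1` font resource, and the sample values are assumptions. A complete writer would also draw the page image first, embed the font, and stretch each word to its box width.

```python
def invisible_word_ops(text: str, left_px: int, top_px: int, height_px: int,
                       dpi: int, page_height_pt: float,
                       font: str = "/F1") -> str:
    """Build PDF content-stream operators that place one OCR word as
    invisible text over the scanned image (illustrative sketch)."""
    scale = 72 / dpi                                    # pixels -> points
    x = left_px * scale
    # Flip vertically: PDF y grows upward from the bottom of the page.
    # The bottom of the word box is used as an approximate baseline.
    y = page_height_pt - (top_px + height_px) * scale
    size = height_px * scale                            # font size ~ box height
    # Escape characters that are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return (
        "BT\n"
        f"{font} {size:.2f} Tf\n"
        "3 Tr\n"                                        # mode 3: neither fill nor stroke
        f"1 0 0 1 {x:.2f} {y:.2f} Tm\n"                 # move to the word's position
        f"({escaped}) Tj\n"
        "ET"
    )


# A word scanned at 300 DPI on a US Letter page (792 pt tall).
print(invisible_word_ops("Agreement", left_px=300, top_px=600, height_px=50,
                         dpi=300, page_height_pt=792))
```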


Searchable PDF vs Plain Text — which should you choose?

Searchable PDF — the document looks exactly the same as your original scan, but text is now selectable, searchable, and copy-pasteable. Best when you want to keep the original layout and appearance.
Plain Text (.txt) — just the extracted text, with no images or formatting. Best when you only need the words — for example, feeding content into another tool, searching a large batch of documents, or copy-pasting into a word processor.

For most people, Searchable PDF is the right choice.
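The Plain Text option pairs well with simple batch tooling. As one example of searching a large batch of documents, this sketch scans a folder of `.txt` files exported from OCR and reports every matching line. The function name and the folder name in the usage comment are hypothetical.

```python
from pathlib import Path


def search_text_files(folder: str, term: str) -> list[tuple[str, int, str]]:
    """Find term (case-insensitively) in every .txt file under folder.

    Returns (relative path, line number, line) for each matching line.
    """
    root = Path(folder)
    needle = term.lower()
    hits = []
    for path in sorted(root.rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((path.relative_to(root).as_posix(), number, line.strip()))
    return hits


# Usage (the folder name is hypothetical):
# for name, line_no, line in search_text_files("ocr-output", "termination"):
#     print(f"{name}:{line_no}: {line}")
```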

What languages does OCR support?

The OCR tool supports 12 languages out of the box: English, Spanish, French, German, Portuguese, Italian, Chinese (Simplified), Japanese, Korean, Arabic, Hindi, and Russian.

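Always select the correct language for your document. Choosing the wrong one can sharply reduce accuracy, because OCR relies on language-specific trained models, character sets, and word lists to decide between similar-looking shapes. Picking a language with a different script, such as English for a Japanese document, gives especially poor results.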
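Tesseract-based tools identify each language by a short trained-data code. Several languages can be combined with `+` for a mixed-language document. The mapping below uses Tesseract's standard codes for the 12 languages listed above. How PDForge maps its language menu internally, and whether it offers multi-language selection, are assumptions not covered here. `TESSERACT_CODES` and `lang_param` are illustrative names.

```python
# Tesseract's standard trained-data codes for the 12 languages listed above.
# The mapping used inside PDForge is not documented here (assumption).
TESSERACT_CODES = {
    "English": "eng",
    "Spanish": "spa",
    "French": "fra",
    "German": "deu",
    "Portuguese": "por",
    "Italian": "ita",
    "Chinese (Simplified)": "chi_sim",
    "Japanese": "jpn",
    "Korean": "kor",
    "Arabic": "ara",
    "Hindi": "hin",
    "Russian": "rus",
}


def lang_param(*languages: str) -> str:
    """Build a Tesseract language string; several languages are joined
    with '+', e.g. for a bilingual document."""
    if not languages:
        raise ValueError("select at least one language")
    return "+".join(TESSERACT_CODES[name] for name in languages)


print(lang_param("English"))            # → eng
print(lang_param("French", "English"))  # → fra+eng
```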

Tips for better OCR accuracy

OCR is not magic — the quality of the output depends heavily on the quality of the input. A few things that make a big difference:

Scan at 150 DPI or higher — low-resolution scans produce blurry text that OCR struggles with. 300 DPI is ideal for most documents.
Straight pages — if the scan is rotated or skewed, accuracy drops. Most scanners auto-correct this, but photos taken by hand often don’t.
Clean originals — coffee stains, heavy shadows, and crinkled pages all hurt recognition quality.
Good contrast — black text on white paper works best. Very light prints or faded documents are harder to read.
Standard fonts — printed text is recognised much more reliably than handwriting.
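A little arithmetic shows why resolution matters. A point is 1/72 inch, so text set at 10 pt spans 10/72 inch, and the scan resolution decides how many pixels that becomes. Note that this is the font's nominal size. Lowercase letters are noticeably smaller, so they get even fewer pixels. The helper names below are illustrative.

```python
def pixel_size(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def text_height_px(font_size_pt: float, dpi: int) -> float:
    """Nominal height in pixels of text set at font_size_pt.
    One point is 1/72 inch, so height = size / 72 * dpi."""
    return font_size_pt / 72 * dpi


for dpi in (150, 300):
    w, h = pixel_size(8.5, 11, dpi)  # US Letter page
    print(f"{dpi} DPI: page {w}x{h} px, 10 pt text ~ {text_height_px(10, dpi):.1f} px")
# → 150 DPI: page 1275x1650 px, 10 pt text ~ 20.8 px
# → 300 DPI: page 2550x3300 px, 10 pt text ~ 41.7 px
```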

Character-level accuracy on a clean 300 DPI scan of printed text is typically 98–99%. On a blurry phone photo, it can drop below 80%.
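One common way to measure OCR accuracy compares the output with a correct transcript. Accuracy is 1 minus the edit distance between them, divided by the transcript's length. This method is an assumption, since the figures above do not state how they were measured. The sketch implements it with a standard Levenshtein distance; the function names are illustrative. The percentages matter more than they look. On a page of about 2,000 characters, 98% accuracy still leaves roughly 40 wrong characters, while 80% leaves about 400.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of insertions, deletions
    and substitutions that turn a into b (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, ocr_output: str) -> float:
    """Character accuracy = 1 - edit distance / reference length (floored at 0)."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    return max(0.0, 1 - edit_distance(reference, ocr_output) / len(reference))


# A classic OCR confusion: "m" read as "rn" costs two edits.
print(f"{char_accuracy('Termination', 'Terrnination'):.3f}")  # → 0.818
```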

Is OCR the same as editing a PDF?

Not quite. OCR makes text readable by machines — searchable and selectable — but the text layer is invisible and sits on top of the image. You’re not editing the original document; you’re adding a layer that enables search and copy.

If you want to actually edit the text content, you’d need to convert the OCR output to a Word document first.

Is your file private?

Yes. PDForge processes everything locally in your browser using Tesseract.js, a JavaScript and WebAssembly port of the open-source Tesseract OCR engine. Your PDF bytes never leave your device: no server receives your file, and nothing is stored anywhere.

Other tools you might find useful

Compress PDF — reduce the file size of your scanned PDF after OCR
Merge PDF — combine multiple scanned documents into one searchable file
Split PDF — extract specific pages from a large scanned document
PDF to Word — export the text as an editable Word document

OCR is one of those tools you don’t think about until you desperately need it. When that moment comes, the process should be fast, free, and private.