Google Announces Optical Character Recognition for PDFs

Google has been indexing PDF documents for some time, but a recent post on the official Google Blog announces that it can now read scanned documents as well, converting the text in their images into crawlable text so that it can better understand and rank their content.

In the past, scanned documents were rarely included in search results as we couldn’t be sure of their content. We had occasional clues from references to the document– so you might get a search result with a title but no snippet highlighting your query. Today, that changes. We are now able to perform OCR on any scanned documents that we find stored in Adobe’s PDF format. This Optical Character Recognition (OCR) technology lets us convert a picture (of a thousand words) into a thousand words — words that can be searched and indexed, so that these valuable documents are more easily found. This is a small but important step forward in our mission of making all the world’s information accessible and useful.

Read some more about it at the Google Blog.

E-Web Marketing

We’ve been in the digital marketing field for over 18 years and worked with hundreds of Australian (and international) businesses to grow their web presence. Specialising in SEO, search ads (PPC), social media, content marketing, email marketing and conversion rate optimisation.