PDF Store


"OCR and Image-Only PDFs"

By Rowan Hanna
PDF Store Support Team
Issue 23 for 2005


When it comes to PDF extraction, it's crucial to know what kind of source PDF you will be using. PDF is a broad format, and it comes in three main "flavors": text only, image plus text, and image only.
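
As a rough illustration of the three flavors, the sketch below guesses a file's flavor from its raw bytes. The function name `guess_pdf_flavor` is hypothetical, and the heuristic is an assumption: it only looks for font and image dictionary markers, and it would miss any that sit inside compressed object streams, so treat it as a first pass rather than a dependable test.

```python
import re

# Dictionary markers as they appear in uncompressed PDF objects.
FONT_MARKER = re.compile(rb"/Type\s*/Font\b")
IMAGE_MARKER = re.compile(rb"/Subtype\s*/Image\b")


def guess_pdf_flavor(data: bytes) -> str:
    """Guess the flavor of a PDF from its raw bytes (heuristic only)."""
    has_font = FONT_MARKER.search(data) is not None
    has_image = IMAGE_MARKER.search(data) is not None
    if has_font and has_image:
        return "image plus text"
    if has_font:
        return "text only"
    if has_image:
        return "image only"
    return "unknown"
```

To try it on a file, read the file in binary mode and pass the bytes to `guess_pdf_flavor`. A result of "image only" suggests the document will need OCR before any text can be extracted.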

Image-only PDFs can be created by imaging applications or by scanning. In such files the text is not stored as text: while the resulting PDFs look like the printed originals, they are in fact flat images. This can be sufficient for some purposes, but if you want to select, search or extract text, you will need to perform Optical Character Recognition (OCR). OCR is the process of comparing the shapes in an image with a database of known characters to determine which shapes represent text.
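
To make the idea of comparing shapes with a character database concrete, here is a toy sketch of template matching. Everything in it is an assumption for illustration: the 3x5 glyph templates, the four-letter "database" and the function names are invented, and real OCR engines use far more sophisticated techniques than counting matching pixels.

```python
# A tiny "database" of 3x5 glyph templates; '#' is ink, '.' is background.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def match_score(glyph, template):
    """Count the pixels on which the glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the template character that agrees with the glyph on most pixels."""
    return max(TEMPLATES, key=lambda ch: match_score(glyph, TEMPLATES[ch]))


# A scanned "T" with one pixel of the stem missing.
noisy_t = ["###", ".#.", ".#.", "...", ".#."]
print(recognize(noisy_t))
```

The noisy glyph still agrees with the "T" template on 14 of its 15 pixels, more than with any other template, so it is recognized despite the damage. That tolerance for imperfect scans is the point of scoring candidates rather than demanding an exact match.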

With Acrobat's Paper Capture plug-in, it's possible to perform OCR and add an invisible layer of text (known as "hidden text") to the image PDF. In effect, this turns it into an image plus text PDF. The AdLib OCR Add On for the AdLib eXpress range takes this approach, handling large volumes of such documents. Gemini, on the other hand, boasts a character mapping facility that can be used to convert image-only PDFs into a variety of editable formats such as HTML and RTF.
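
One common way to store hidden text in a PDF is to paint the scanned image and then draw the recognized words over it in text rendering mode 3, which neither fills nor strokes the glyphs. The sketch below builds such a page content stream as a string. It is not taken from any of the products above: the function names and the resource names `/Im0` and `/F1` are assumptions, and a complete PDF would also need the page, image and font objects that the stream refers to.

```python
def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def hidden_text_page(width, height, words):
    """Build a page content stream: the scanned image, then invisible text.

    `words` is a list of (x, y, font_size, text) tuples in page units.
    """
    lines = [
        "q",
        f"{width} 0 0 {height} 0 0 cm",  # scale the unit image to the page
        "/Im0 Do",  # paint the scan (resource name is an assumption)
        "Q",
        "BT",
        "3 Tr",  # text rendering mode 3: neither fill nor stroke
    ]
    for x, y, size, text in words:
        lines.append(f"/F1 {size} Tf")  # font resource name is an assumption
        lines.append(f"1 0 0 1 {x} {y} Tm")  # absolute position of the word
        lines.append(f"({escape_pdf_string(text)}) Tj")
    lines.append("ET")
    return "\n".join(lines)


stream = hidden_text_page(612, 792, [(72, 700, 12, "Invoice (copy)")])
```

Because the words sit at the same positions as their printed counterparts in the scan, a viewer can select and search them while the reader still sees only the image. Text outside plain ASCII would need extra handling for font encoding, which this sketch leaves out.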

These are just some of the many tools available from PDF Store's range of PDF Extraction products.

Create Web Page PDFs in Acrobat
By Dan Shea
Often, you can find yourself wanting to keep an offline record of a web page: you may be traveling, and won't have internet access, or perhaps you need a static copy of the page for your archives. Either way, Acrobat's web capture feature can be an invaluable addition to your bag of tricks. This PDF tip explains.

Featured Products

PDFlib:
A widely used programming library that lets programmers generate PDF files and integrate this ability into any application or server environment. PDFlib is available for all major operating environments and development environments.
$450.00 - Buy Now


Locklizard Safeguard
$2495.00 - PDF Security

ARTS PDF Aerialist
$379.00 - Edit & Prepare PDF

PDF Store Top 5

1 Nitro PDF Professional
2 Pitstop Professional
3 XpdfViewer
4 Quite Imposing Plus
5 ARTS PDF Tools

Address: Planet PDF, Level 3 370 Little Bourke Street, 3000, Victoria, Australia
Copyright: © Planet PDF 2005. All rights reserved. Planet PDF, PDF Store, ARTS PDF and ARTS PDF Global Services are all divisions of BinaryThing.