
PDF Store


"Working with image-only PDFs"

By Leah Lothringer
PDF Store Support Team
Issue 17 for 2006


When it comes to PDF extraction, there is some confusion over what an image-only PDF is. When attempting to extract data, it is crucial to know what kind of source PDF you are working with. PDF is a broad format that comes in several different "flavors": text only, image plus text, and image only.

Image-only PDFs can be created by an imaging application or by scanning a document directly into a new PDF. In such files, the text is not stored as individual letters but as part of a single flat image. This can be sufficient for some purposes, but if you want to select, search or extract text, you will need to perform Optical Character Recognition (OCR). OCR is the process of comparing shapes in the image with characters in a database to determine which shapes represent text. Over at Planet PDF, Ernest Svenson expands on the benefits and complexities of OCR technology in "OCR, PDFs, and Bates-numbered documents".
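To make the idea of comparing shapes with a character database more concrete, here is a toy sketch in Python. It is only an illustration of template matching under simplified assumptions: the 3x5 bitmaps, the three-letter "database" and the function names are all invented for this example, and real OCR engines are far more sophisticated.

```python
# Toy illustration of OCR by template matching (not a production OCR engine).
# Each glyph is a 3x5 bitmap written as strings: '#' is ink, '.' is background.
# The templates below are a made-up "character database" for this sketch.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def distance(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(a, b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def recognise(glyph):
    """Return the template character whose bitmap is closest to the glyph."""
    return min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch], glyph))


# A scanned "T" with one stray pixel of noise in the fourth row.
noisy_t = ["###", ".#.", ".#.", "##.", ".#."]
print(recognise(noisy_t))
```

The noisy glyph differs from the "T" template by a single pixel, so "T" is the closest match. This also hints at why OCR results on poor scans need checking: enough noise can push a shape closer to the wrong character.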

With Acrobat's Paper Capture plug-in, it's possible to perform OCR and add an invisible layer of text (known as "hidden text") to the image PDF. In effect, this makes it an image plus text PDF document. Alternatively, Gemini offers a character mapping facility that can be used to convert image-only PDFs into a variety of editable formats such as HTML and RTF.
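If you are not sure which flavor a file is, a short script can give a rough answer. The sketch below is not part of any product mentioned here. It assumes the third-party Python package pypdf is installed, and the function names are our own. It treats a page with extractable text and at least one image as "image plus text", so a text page that merely contains a logo will be reported the same way as a scanned page with hidden text.

```python
def classify_page(text, image_count):
    """Classify one page as one of the three PDF flavors (or 'empty')."""
    has_text = bool(text.strip())
    if has_text and image_count:
        return "image plus text"
    if has_text:
        return "text only"
    if image_count:
        return "image only"
    return "empty"


def classify_pdf(path):
    """Classify every page of a PDF file.

    Assumption: the pypdf package is available. It is imported here so the
    rest of the sketch can be used without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(path)
    return [
        classify_page(page.extract_text() or "", len(page.images))
        for page in reader.pages
    ]
```

Calling classify_pdf("scan.pdf"), with a file name of your own, returns one label per page. Pages reported as "image only" are the ones that need OCR before text can be selected, searched or extracted.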

For automating OCR on an unlimited number of PDF files, take a look at AutoCaptureX4. Hot folder support is included.
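For readers unfamiliar with the term, a hot folder is simply a directory that is watched for incoming files. The Python sketch below shows the general idea only and says nothing about how AutoCaptureX4 itself works. The folder names and the handle callback, which stands in for whatever OCR step you use, are hypothetical.

```python
import shutil
import time
from pathlib import Path


def process_hot_folder(inbox, done, handle):
    """Run handle() on each PDF in inbox, then move the file to done.

    Returns the names of the files processed in this pass.
    """
    inbox, done = Path(inbox), Path(done)
    done.mkdir(parents=True, exist_ok=True)
    processed = []
    for pdf in sorted(inbox.glob("*.pdf")):
        handle(pdf)  # hypothetical OCR step supplied by the caller
        shutil.move(str(pdf), str(done / pdf.name))
        processed.append(pdf.name)
    return processed


def watch(inbox, done, handle, interval=5.0):
    """Poll the inbox forever, processing new PDFs every `interval` seconds."""
    while True:
        process_hot_folder(inbox, done, handle)
        time.sleep(interval)
```

Moving each file out of the inbox after processing is what prevents the same PDF from being handled twice on the next pass.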

These are just some of the many tools available from PDF Store's range of PDF Create/Convert and Edit/Prepare products.

Copy images in Acrobat 7
By Dan Shea
Acrobat allows users to copy and re-use images from PDF documents when this is not specifically prohibited by the file's security settings.

Featured Products

Acrobotics. Automation for PDF.

PDF Multi Print Server:
Batch print hundreds or even thousands of PDF files without interruption (no pop-up or dialog boxes). PDF Multi Print Server includes both a graphical user interface and a command-line interface. FDF and XFDF printing is supported. A desktop version is also available.
$495.00 - Buy Now


IntelliPDF Print Bookmarks
$99.00 - Edit & Prepare PDF - Buy Now

Pitstop Professional
$599.00 - PDF Prepress & PDF Print - Buy Now

PDF Store Top 5

1. Nitro PDF Professional
2. ARTS PDF Aerialist
3. PDFlib
4. ARTS PDF Split & Merge
5. ARTS PDF Crackerjack

Sponsoring this issue

ADVERTISEMENT
PDFlib TET


Address: Planet PDF, Level 3 370 Little Bourke Street, 3000, Victoria, Australia
Copyright: © Planet PDF 2006. All rights reserved. Planet PDF, PDF Store, ARTS PDF and ARTS PDF Global Services are all divisions of BinaryThing.