
"Advanced PDF Content Extraction"

By Leah Lothringer
PDF Store Support Team
Issue 8 for 2006

In previous weeks, we have discussed PDF extraction and explored some of the different ways to
obtain simple text and images from a PDF document. There is a wide range of PDF extraction tools
available and, luckily, some of these are geared towards complex extraction tasks. For instance,
you may wish to extract sections of text from a PDF document so that the layout is preserved
when reused in a word processor. Authors and publishing professionals might need to extract
images to a vector file format and control the output resolution. It is possible to exercise
precise control when extracting content from PDF.
For precision text extraction, have a look at BCL Jade. This plug-in for Adobe Acrobat can
extract tabular data for use in MS Word, MS Excel and XML templates, and it maintains all character
and paragraph formatting (colors, fonts, superscripts, breaks, margins, indentations and more).
Consider PDF FLY if you wish to maintain visual quality and data integrity when extracting
images from PDF. This tool batch converts PDF, EPS and PostScript into both vector and raster
formats at any resolution and allows for optimization and customization during conversion.
Finally, pdf2cad is an excellent option for users working with PDF and AutoCAD. This
standalone, desktop tool converts PDF into the DXF file format, accurately reproducing lines,
shapes and text strings. It features a multitude of configuration settings, including cropping,
rotation, font name mappings and rendering.
These are just three of the many PDF extraction tools available from PDF Store - browse the
Create/Convert section of the website for a complete list of products.