PDF Store Planet PDF
PDF news, in-depth articles and tips.

PDF Spotlights

#1: Archiving with PDF

PDF Spotlights are written to give you an idea of the sorts of things that are possible with PDF; each spotlight covers a major practical area of real-world PDF use. Check back in the coming months as we add more spotlights.

- The PDF Store Support Team.

For the past few years, businesses around the world have been using PDF as an archival format, not necessarily because they want a paperless office, but because they recognize the benefits of an electronic document archive, namely convenient access for anyone with permission. The PDF format is particularly popular because it preserves the exact look and feel of the original physical document.

Strengths

Longevity: Paper documents age with time and use. Electronic documents don't. Even though programs written to view electronic documents do age, the basic specification of the PDF file format is publicly available, and so the health of your electronic archive is not tied to the health of the company that developed the PDF viewer.

Preserves look and feel: PDF was originally created to duplicate paper documents, and as such is second to none for preserving the physical character of the original source. PDF's support for multi-page documents is another advantage.

Metadata: Metadata (information that describes a document, such as author, date created, keywords and so on) is one of the big advantages electronic documents have over paper. The latest version of PDF (1.5) includes support for XMP (the Extensible Metadata Platform), so while you still need to develop processes to identify the metadata you want and to ensure that it is preserved, the format itself supports it well.
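To make the metadata idea concrete, here is a minimal sketch in Python of what an XMP description of a document might look like, using the Dublin Core fields for title, authors and keywords. The function name `build_xmp` and the helper `q` are hypothetical, chosen for illustration. The sketch only builds the XML; it omits the packet wrapper and does not embed anything in a PDF file, which would need a PDF library.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used here: the XMP wrapper, RDF, and Dublin Core.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, local):
    """Return a namespace-qualified name in ElementTree's {uri}local form."""
    return "{%s}%s" % (NS[prefix], local)


def build_xmp(title, authors, keywords):
    """Build a minimal XMP body holding a title, authors and keywords.

    Hypothetical helper for illustration only; not part of any PDF product.
    """
    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # dc:title is a language alternative: one entry per language.
    alt = ET.SubElement(ET.SubElement(desc, q("dc", "title")), q("rdf", "Alt"))
    li = ET.SubElement(
        alt,
        q("rdf", "li"),
        {"{http://www.w3.org/XML/1998/namespace}lang": "x-default"},
    )
    li.text = title

    # dc:creator is an ordered list; dc:subject is an unordered bag of keywords.
    for tag, container, values in (
        ("creator", "Seq", authors),
        ("subject", "Bag", keywords),
    ):
        holder = ET.SubElement(ET.SubElement(desc, q("dc", tag)), q("rdf", container))
        for value in values:
            ET.SubElement(holder, q("rdf", "li")).text = value

    return ET.tostring(xmpmeta, encoding="unicode")
```

Calling `build_xmp("Annual Report", ["A. Writer"], ["archive", "pdf"])` returns the XML as a string. Deciding which fields are mandatory for your archive remains a process question rather than a coding one.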

Searchable: Properly created PDF documents contain all their content in machine-readable form. This, combined with PDF's metadata capabilities, makes it possible to search an indexed PDF-based archive in ways that are simply impossible with a conventional paper-based archive.

Weaknesses

Bells and whistles: While the basic features of PDF are appropriate for archival applications, some of the extended features are not, specifically functionality that depends on extensions to the published specification or on external helper applications. For example, embedded video or audio files can be encoded using standards that are not part of the published PDF specification, which undermines the longevity of the document in which they are included. For this reason, care needs to be taken to ensure that PDFs created for archival purposes do not include unnecessary frills.
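As a rough illustration of how such frills might be flagged before a file enters the archive, the Python sketch below scans the raw bytes of a PDF for a few name tokens. The token list and the function name `find_risky_features` are assumptions made for this example, not a complete inventory. It is only a heuristic: names stored inside compressed streams, or written with escaped characters, will not be found, so a clean result does not prove the file is suitable for archiving.

```python
# Name tokens that often signal non-archival features. The list is an
# illustrative assumption, not a complete or authoritative inventory.
RISKY_TOKENS = {
    b"/JavaScript": "embedded script",
    b"/Movie": "movie annotation",
    b"/Sound": "sound annotation",
    b"/RichMedia": "rich media annotation",
    b"/Launch": "action that launches an external application",
}


def find_risky_features(pdf_bytes):
    """Return sorted descriptions of the risky tokens present in the raw bytes.

    A plain substring search: it cannot see inside compressed streams.
    """
    return sorted(
        label for token, label in RISKY_TOKENS.items() if token in pdf_bytes
    )
```

To check a file on disk, read it with `pathlib.Path(path).read_bytes()` and pass the result to `find_risky_features`. Any file that produces a non-empty list is a candidate for manual review.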

Getting down to business

Creation: The first step in establishing a PDF-based archive is acquiring the PDF documents themselves. If you are creating your PDFs from other document types then you'll need some form of creation or conversion application. Creation applications appear in the source program as an additional printer; to create a PDF document, the user simply prints the file using the PDF printer, and a PDF file is created. Metadata can then be added via document properties. Popular examples of PDF creation products are DocuCom PDF Driver, Amyuni PDF Converter and Adobe Acrobat.

Scan and OCR: The situation is a little different if you are converting physical paper to PDF. You will need to scan and OCR the documents, a process that involves scanning your physical documents to the TIFF image format, converting the TIFF images to PDF and then running OCR on the PDF documents. This process is somewhat labour-intensive if done manually, but if automated the results are surprisingly good, if not quite perfect. Note: Image-based PDFs must be OCR'd if you intend to index them for searching. Applications suited to this process are AdLib eXpress and AdLib OCR.

Batch conversion: While simple creation tools are useful, they tend to become unwieldy if you have large numbers of documents to convert. An alternative is one of the tools that allow batch processing of documents. With one of these tools, all you need to do is specify the PDF creation settings you want and click OK, and the tool starts crunching away while you go for lunch. Typically, products that support batch processing also offer 'hot' folder functionality: any file copied to a hot folder is automatically converted to PDF using whatever settings you specified when you set up the folder. A popular application that falls into this category is AdLib eXpress.
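The hot folder idea can be sketched in a few lines of Python. This is not how any of the products named above work internally; it is a minimal illustration that assumes a caller-supplied `convert(src, dest)` function (hypothetical) which does the actual PDF creation. The function name `process_hot_folder` and the folder layout are also assumptions.

```python
import shutil
from pathlib import Path


def process_hot_folder(hot_dir, out_dir, done_dir, convert):
    """Convert every file waiting in hot_dir, then move the source to done_dir.

    `convert(src, dest)` is a caller-supplied, hypothetical function that
    writes a PDF for `src` at the path `dest`.
    Returns the list of PDF paths created during this pass.
    """
    hot_dir, out_dir, done_dir = Path(hot_dir), Path(out_dir), Path(done_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    done_dir.mkdir(parents=True, exist_ok=True)

    created = []
    for src in sorted(p for p in hot_dir.iterdir() if p.is_file()):
        dest = out_dir / (src.stem + ".pdf")
        convert(src, dest)
        # Moving the source out of the hot folder stops it being converted twice.
        shutil.move(str(src), str(done_dir / src.name))
        created.append(dest)
    return created
```

Running this function on a timer, for example once a minute, gives the "copy a file in and walk away" behaviour described above. A production tool would also need to handle files that are still being copied and conversions that fail.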

Using your archive

Searching: PDFs and PDF-based archives are searchable. If you open a PDF document you can search for text within that document, but if you create an index of your PDF archive you can search the entire archive without having to open each PDF individually. An indexed PDF archive also gives you the option of making the archive available for searching on networks, websites, CDs or DVDs. A popular application that falls into this category is ARTS PDF Search.
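To show why an index makes archive-wide searching practical, here is a minimal sketch of an inverted index in Python. It assumes the text of each PDF has already been extracted by some other tool; the function names `build_index` and `search` are hypothetical, and commercial indexing products are considerably more sophisticated.

```python
import re
from collections import defaultdict


def build_index(texts):
    """Map each lowercase word to the set of document names containing it.

    `texts` maps a document name to text already extracted from that PDF.
    """
    index = defaultdict(set)
    for name, text in texts.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of documents that contain every query word."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)
```

The index is built once and each search is then a lookup, so no PDF needs to be opened at query time. For example, with an index built from `{"a.pdf": "Invoice for March", "b.pdf": "March meeting minutes"}`, `search(index, "march invoice")` returns `["a.pdf"]`.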

Conclusions

The PDF file format has a number of features that make it a good choice for archival applications, even in situations that don't involve setting up a full-blown indexed, searchable document repository. Hopefully this discussion has given you a good first impression of the possibilities associated with PDF archives; if you have any questions please feel free to get in touch.
