Overcoming External Hyperlink and Bookmark Restrictions in the ISO PDF/A Standard for Archived Documents

By Jean Ouellette | March 25, 2009

[On May 14, 2010, the following edits were made to this post, originally published on March 24, 2009]

Many organizations need to preserve documents for decades or longer. The challenge is that hardware and software technologies continually evolve and may become incompatible with the archived documents. Fortunately, a number of years ago the International Organization for Standardization (ISO) defined a standard for long-term document archival based on the PDF format, called PDF/A or PDF/A-1 (ISO 19005-1:2005). The ISO worked with representatives from government, industry, academia and Adobe to define this standard.

A key feature of the PDF/A standard is that documents must be 100% self-contained. That is, all of the information necessary for displaying the document must be contained within the single file. [Added] This includes embedding all fonts used for displaying content and using XMP metadata to embed information about the file’s attributes. This restriction is significant because, today, it’s very common for a document to reference other documents using hypertext links. I understand that the restriction is logical and necessary, but it makes the PDF/A requirements difficult to satisfy for documents that link to other files. [Added] There are also a number of things that PDF/A files may not include, among them encryption, multimedia, JavaScript and LZW compression.
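As a rough illustration of the prohibited items listed above, the sketch below scans the raw bytes of a PDF for name tokens that usually signal those features. The function name and the marker table are my own assumptions for illustration, and this is only a heuristic pre-check: it can miss features hidden inside compressed streams and can report false positives, so it does not replace a real PDF/A validator.

```python
# Heuristic sketch only: looks for PDF name tokens that commonly indicate
# features PDF/A-1 does not allow. The table below is an illustrative
# assumption, not a complete list of PDF/A-1 restrictions.
PROHIBITED_MARKERS = {
    "encryption": [b"/Encrypt"],          # encryption dictionary in the trailer
    "JavaScript": [b"/JavaScript"],       # JavaScript actions
    "LZW compression": [b"/LZWDecode"],   # LZW stream filter
    "multimedia": [b"/Movie", b"/Sound"], # movie and sound annotations/actions
}


def find_prohibited_features(pdf_bytes: bytes) -> list:
    """Return the names of prohibited features whose markers appear in the bytes."""
    found = []
    for feature, markers in PROHIBITED_MARKERS.items():
        if any(marker in pdf_bytes for marker in markers):
            found.append(feature)
    return found


if __name__ == "__main__":
    # Tiny fake PDF fragment, just enough to exercise the scan.
    sample = (
        b"%PDF-1.4\n"
        b"1 0 obj << /Filter /LZWDecode >> endobj\n"
        b"trailer << /Encrypt 5 0 R >>\n"
    )
    print(find_prohibited_features(sample))
```

A file that passes this scan is not necessarily compliant. The scan only flags candidates worth a closer look before running a proper conversion or verification step.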

Adlib Express now includes the following capabilities:

  • Conversion of any supported file type to PDF/A-1b
  • Conversion of existing PDF files to PDF/A-1b
  • Verification of PDF/A-1b compliance

One approach to meeting the standard is to remove all external hyperlinks and bookmarks from the documents. This, in my opinion, would not be viable in most cases since it may remove important content. An alternative is to directly embed all referenced documents within the main document. This is the approach taken by Adlib Express: it can automatically merge documents and convert the external hyperlinks and bookmarks to internal links, since their targets now reside within the same document. This approach achieves ISO’s “100% self-contained” requirement, enabling the resulting PDF/A to be compliant. Check out this link for more details on how Express can help support your document archival processes.
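To make the merge-and-relink idea concrete, here is a minimal sketch using a toy in-memory model rather than real PDF files. It is not how Adlib Express is implemented. The `Document` and `Link` classes and the `merge_with_internal_links` function are hypothetical names invented for this example. The point is only the bookkeeping: once every referenced document is appended to the main one, each external link can be rewritten as a page number inside the merged file.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Link:
    page: int                   # zero-based page the link sits on
    target_file: Optional[str]  # None means the target is in the same document
    target_page: int            # zero-based page within the target document


@dataclass
class Document:
    name: str
    page_count: int
    links: List[Link] = field(default_factory=list)


def merge_with_internal_links(
    main: Document, referenced: List[Document]
) -> Tuple[Document, List[Link]]:
    """Append referenced documents to main and rewrite links as internal ones.

    Returns the merged document plus any links whose target was not supplied,
    so they can be reported instead of silently dropped.
    """
    docs = [main] + referenced

    # Record the page at which each source document starts in the merged file.
    offsets = {}
    total_pages = 0
    for doc in docs:
        offsets[doc.name] = total_pages
        total_pages += doc.page_count

    merged_links = []
    unresolved = []
    for doc in docs:
        for link in doc.links:
            target_name = link.target_file or doc.name
            if target_name not in offsets:
                unresolved.append(link)  # target document was not embedded
                continue
            merged_links.append(
                Link(
                    page=offsets[doc.name] + link.page,
                    target_file=None,  # now internal to the merged file
                    target_page=offsets[target_name] + link.target_page,
                )
            )

    return Document(main.name, total_pages, merged_links), unresolved


if __name__ == "__main__":
    main_doc = Document("main.pdf", 3, [Link(0, "appendix.pdf", 1)])
    appendix = Document("appendix.pdf", 2)
    merged, missing = merge_with_internal_links(main_doc, [appendix])
    print(merged.page_count, merged.links, missing)
```

Returning the unresolved links separately reflects the concern raised above: a link whose target cannot be embedded is still content, so a tool would more likely flag it for review than discard it.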

I invite you to comment on PDF/A as well as on what I proposed above. Let’s talk.
