Linking Complex Page References
Introduction
This tutorial shows how to automatically link complex page references in a PDF document using the AutoBookmark™ plug-in for Adobe® Acrobat®. It explains how to add links to pages in external PDF documents based on a text pattern search, and it focuses specifically on linking a comma-separated list of page references that follows a certain “signature” keyword or text pattern.
Here is a sample text and desired outcome:
Linking text and desired outcome
This is an advanced tutorial that requires the following knowledge:
  • Basic understanding of regular expressions and their application to text search
  • Basic knowledge of linking by text search with the AutoBookmark plug-in
Prerequisites
You need a copy of Adobe® Acrobat® with the AutoBookmark™ plug-in installed on your computer in order to use this tutorial. You can download trial versions of both Adobe® Acrobat® and the AutoBookmark™ plug-in.
This tutorial uses the "Plug-ins > Links > Generate Links > Generate Links By Text Search (multiple rules)" menu to perform linking by text search. This menu is supplied by the AutoBookmark™ plug-in.
Linking a Single Page Reference
Adding links to page numbers that follow a specific keyword or text pattern is a simple task. For example, if we need to add a link to the page reference that follows the “CRF Page” text (for example: "CRF Page 3" or "CRF Page 25"), then the linking expression is straightforward:
Find text pattern: (?<=CRF Page )(\d+)
Link action: \1,file://CRFDocument.pdf
Explanation of the text pattern:
(?<=CRF Page ) - this is a positive-lookbehind regular expression that matches "CRF Page " text, but excludes it from the matching string. This part of the expression is not going to be linked. It is used for the search only.
(\d+) - this regular expression matches one or more digits and is used to match a page number. Placing part of a regular expression inside parentheses groups that part of the expression together. This allows you to refer to the matching text from the link action string. We can now refer to the matching page number as \1, since it is the first group in this regular expression.
Explanation of the link action:
If we want to link to a specific page in an external PDF document, then the link action needs to start with a page number that is followed by a file reference. We are using \1 to refer to the actual page number matched by the text search pattern. For example, for the "CRF Page 3" text, the linking action will be 3,file://CRFDocument.pdf.
If you need to link to a page in the same document, then simply omit the file:// keyword:
Find text pattern: (?<=CRF Page )(\d+)
Link action: \1
Since there is only one linking rule involved, it is possible to use the "Plug-ins > Links > Generate Links > Generate Links By Text Search (single rule)" menu to perform the linking.
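The rule above can be tried outside Acrobat. The following is a minimal sketch that uses Python's re module as a stand-in for the plug-in's search: the pattern and link action are copied from this tutorial, while the sample sentence and the plan_links helper are made up for illustration. It only shows which text would be linked and how \1 is expanded; it does not create any links in a PDF.

```python
import re

# Pattern and link action copied from the tutorial.
FIND_PATTERN = re.compile(r"(?<=CRF Page )(\d+)")
LINK_ACTION = r"\1,file://CRFDocument.pdf"


def plan_links(text, action=LINK_ACTION):
    """Return (linked_text, expanded_action) pairs for every match.

    Hypothetical helper: it mimics the rule, it is not the plug-in's API.
    """
    return [(m.group(0), m.expand(action)) for m in FIND_PATTERN.finditer(text)]


# Made-up sample text with two single page references.
sample = "See CRF Page 3 and CRF Page 25 for details."
for linked_text, target in plan_links(sample):
    print(f"{linked_text!r} -> {target}")
```

Passing action=r"\1" to the helper corresponds to the same-document variant of the rule, where the file reference is omitted.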
Linking a Page List
The situation becomes much more complex when we need to find and link a list of page references that follows a “signature” pattern. For example, we cannot use the same approach to link the “CRF Page 5, 15, 23, 101” text. The single rule will only find and link page 5 and will miss the rest. We need to use the multiple-rule method to link pages 15, 23 and 101 to the corresponding pages. The reason for using multiple linking expressions lies in a limitation of the regular expression syntax: a look-behind text pattern must have a fixed length. Since the page list can have a variable length, due to a different number of references and a different number of digits in the page numbers, it is not possible to describe all possible combinations with a single fixed-length expression. One approach is to provide a separate search rule for each case.
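The limitation can be observed with Python's re module, which also requires fixed-length look-behind patterns. This is only an illustration in a different regular expression engine; it is assumed, based on the description above, that the plug-in's engine behaves the same way. The compiles helper is hypothetical.

```python
import re

text = "CRF Page 5, 15, 23, 101"

# The single-rule pattern only reaches the first page number in the list.
single_rule_hits = re.findall(r"(?<=CRF Page )(\d+)", text)
print(single_rule_hits)


def compiles(pattern):
    """Report whether Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# A look-behind containing "+" has no fixed length, so re rejects it.
print(compiles(r"(?<=CRF Page [\d, ]+)(\d+)"))
# Pinning the length with {2} makes the look-behind acceptable.
print(compiles(r"(?<=CRF Page [\d, ]{2}, )(\d+)"))
```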
Here is a list of search expressions that cover various cases.
"x" is used to designate a single digit,
"Y" is used to designate a number with any number of digits that is being linked,
"z{m}" is used to designate any combination of digits, spaces or commas that has m characters.
In other words, "m" is the length of the text that is allowed between "CRF Page " and the page number being linked.
(?<=CRF Page )(\d+) - to cover "CRF Page Y" case.
(?<=CRF Page \d{1}, )(\d+) - to cover "CRF Page x, Y" case.
(?<=CRF Page [\d, ]{2}, )(\d+) - to cover "CRF Page z{2}, Y" case.
(?<=CRF Page [\d, ]{3}, )(\d+) - to cover "CRF Page z{3}, Y" case.
(?<=CRF Page [\d, ]{4}, )(\d+) - to cover "CRF Page z{4}, Y" case.
....
(?<=CRF Page [\d, ]{20}, )(\d+) - to cover "CRF Page z{20}, Y" case.
Note that all these search patterns use the same linking action: \1,file://CRFDocument.pdf to perform the linking.
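Because the rules differ only in the repeat count, the list can be generated rather than typed by hand. The sketch below builds the same family of patterns in Python and applies them to a sample string to confirm which numbers they reach. The function names and the max_gap parameter are illustrative assumptions; the resulting pattern strings would still need to be entered as rules in the plug-in.

```python
import re


def build_rules(max_gap=20):
    """Build the look-behind rules for gaps of 0 to max_gap characters.

    max_gap is assumed to be at least 1; 20 matches the last rule listed.
    """
    rules = [r"(?<=CRF Page )(\d+)", r"(?<=CRF Page \d{1}, )(\d+)"]
    for m in range(2, max_gap + 1):
        rules.append(r"(?<=CRF Page [\d, ]{%d}, )(\d+)" % m)
    return [re.compile(rule) for rule in rules]


def find_pages(text, rules):
    """Apply every rule and return the matched page numbers in text order."""
    hits = {}
    for rule in rules:
        for match in rule.finditer(text):
            hits[match.start()] = match.group(1)
    return [hits[pos] for pos in sorted(hits)]


rules = build_rules()
print(find_pages("CRF Page 5, 15, 23, 101", rules))
```

In this sketch, a page number is missed once more than max_gap characters separate it from "CRF Page " (not counting the final comma and space), which is the practical weakness of the multiple-rule approach.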
The multiple-rule scenario illustrated above is not really practical for long lists. We have to write multiple rules that get more and more complex with each element. Fortunately, AutoBookmark offers a special feature called “search context” that greatly simplifies adding links in these kinds of cases. It allows using just a single rule that is easy to understand:
Find text pattern: (\d+)
Link action: \1,file://CRFDocument.pdf
Search context: CRF Page [\d, \-]+
The above settings will search for a text string "CRF Page" that is followed by a list of numbers. Numbers can be separated by a comma or a dash. For example: CRF Page 2, 10, 12-15. The search conditions are defined in the “search context” string. Next, the "find text" pattern is going to be applied only to the text string that matches the context. In our example, the "find text" pattern will look just for the numbers (\d+). The linking action will be added for every number and it will point to a corresponding page in the CRFDocument.pdf file. If this rule is applied to CRF Page 2, 10, 12-15 text string, then the following links will be created:
  • "2" will get a link that points to page 2 in CRFDocument.pdf
  • "10" will get a link that points to page 10 in CRFDocument.pdf
  • "12" will get a link that points to page 12 in CRFDocument.pdf
  • "15" will get a link that points to page 15 in CRFDocument.pdf
The complexity of multiple rules is gone. The single rule can handle a page list without any limitations. Here is a screenshot of the "Generate Links By Text Search" dialog that shows the rule definition:
Search rule settings
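The search context behaviour described above can be approximated as a two-pass search: first find the text that matches the context, then apply the "find text" pattern only inside each context match. The following Python sketch emulates that idea. It is an assumption about the observable behaviour, not a description of how the plug-in is implemented, and the plan_links helper and sample sentence are made up.

```python
import re

# Settings copied from the rule above.
CONTEXT = re.compile(r"CRF Page [\d, \-]+")
FIND_PATTERN = re.compile(r"(\d+)")
LINK_ACTION = r"\1,file://CRFDocument.pdf"


def plan_links(text):
    """Return (linked_text, expanded_action) pairs found inside the context."""
    links = []
    # Pass 1: locate every stretch of text that matches the search context.
    for context in CONTEXT.finditer(text):
        # Pass 2: apply the find pattern only to that stretch.
        for match in FIND_PATTERN.finditer(context.group(0)):
            links.append((match.group(1), match.expand(LINK_ACTION)))
    return links


# "Table 7" lies outside the context, so 7 is not linked.
sample = "Table 7 lists CRF Page 2, 10, 12-15 entries."
for linked_text, target in plan_links(sample):
    print(f"{linked_text!r} -> {target}")
```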