Linking Complex Page References
Introduction
This tutorial shows how to automatically link complex page references in a PDF document using the AutoBookmark™ plug-in for Adobe® Acrobat®. It explains how to add links to pages in external PDF documents based on a text pattern search, and focuses specifically on linking a comma-separated list of page references that follows a certain “signature” keyword or text pattern.
Here is a sample text and desired outcome:
[Figure: linking text and desired outcome]
This is an advanced tutorial that requires the following knowledge:
  • Basic understanding of regular expressions and their application to text search
  • Basic knowledge of linking by text search with the AutoBookmark plug-in
Prerequisites
You need a copy of Adobe® Acrobat® with the AutoBookmark™ plug-in installed on your computer in order to use this tutorial. You can download trial versions of both Adobe® Acrobat® and the AutoBookmark™ plug-in.
This tutorial uses the "Plug-ins > Links > Generate Links > Generate Links By Text Search (multiple rules)" menu to perform linking by text search. This menu is supplied by the AutoBookmark™ plug-in.
Linking a Single Page Reference
It is a trivial task to add links to page numbers that follow a specific keyword or text pattern. For example, if we need to add a link to the page reference that follows the “CRF Page” text (for example: "CRF Page 3" or "CRF Page 25"), then the linking expression is straightforward:
Find text pattern: (?<=CRF Page )(\d+)
Link action: \1,file://CRFDocument.pdf,
Explanation of the text pattern:
(?<=CRF Page ) - this is a positive look-behind expression that matches the "CRF Page " text, but excludes it from the matched string. This part of the expression is not going to be linked. It is used for the search only.
(\d+) - this regular expression matches one or more digits and is used to match a page number. Placing part of a regular expression inside parentheses groups that part of the expression together. This makes it possible to refer to the matched text from the link action string. We can now refer to the matched page number as \1, since it is the first group in this regular expression.
Explanation of the link action:
To link to a specific page in an external PDF document, the link action starts with a page number that is followed by a file reference. We use \1 to refer to the actual page number matched by the text search pattern. For example, for the "CRF Page 3" text, the link action will be 3,file://CRFDocument.pdf.
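The way the search pattern and the link action work together can be illustrated outside of Acrobat. The following is a minimal sketch in Python, which is an assumption made for illustration only: the plug-in uses its own regular expression engine, and the expand_actions helper and the sample sentence are hypothetical. It only shows how \1 is replaced by the matched page number.

```python
import re

# Search pattern from the tutorial: a page number preceded by "CRF Page ".
PATTERN = re.compile(r"(?<=CRF Page )(\d+)")

# Link action template from the tutorial; \1 stands for the first group.
ACTION_TEMPLATE = r"\1,file://CRFDocument.pdf"


def expand_actions(text: str) -> list[str]:
    """Return the expanded link action for every page reference in text.

    Hypothetical helper: it mimics the substitution of \\1 only and does
    not create any links.
    """
    return [match.expand(ACTION_TEMPLATE) for match in PATTERN.finditer(text)]


if __name__ == "__main__":
    for action in expand_actions("See CRF Page 3 and CRF Page 25."):
        print(action)
```

Only the digits are matched, so only the digits would become the clickable area; the "CRF Page " text is consumed by the look-behind and stays outside the match.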
If you need to link to a page in the same document, then simply omit the file:// keyword:
Find text pattern: (?<=CRF Page )(\d+)
Link action: \1
Since there is only one linking rule involved, it is possible to use the "Plug-ins > Links > Generate Links > Generate Links By Text Search (single rule)" menu to perform the linking.
Linking a Page List
The situation becomes much more complex when we need to find and link a list of page references that follows a “signature” pattern. For example, we cannot use the same approach to link the “CRF Page 5, 15, 23, 101” text. The single rule will only find and link page 5 and will miss the rest. We need to use the multiple-rule method to link pages 15, 23 and 101 to the corresponding pages. The reason for using multiple linking expressions lies in a limitation of the regular expression syntax: a look-behind pattern must have a fixed length. Since the page list can vary in length, both in the number of references and in the number of digits in each page number, it is not possible to describe all possible combinations with a single fixed-length expression. The approach is to provide a separate search rule for each case.
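Both points made above can be reproduced in other regular expression engines that share the fixed-length look-behind restriction. The sketch below uses Python's re module as a stand-in (an assumption for illustration; it is not the engine used by the plug-in). The compiles helper and the variable-length pattern are hypothetical and exist only for this demonstration.

```python
import re

TEXT = "CRF Page 5, 15, 23, 101"

# The single rule finds only the page number that directly follows "CRF Page ".
single_rule = re.compile(r"(?<=CRF Page )(\d+)")
first_only = single_rule.findall(TEXT)


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# A look-behind that tries to skip a variable-length list of earlier pages.
# Python rejects it because the look-behind does not have a fixed width.
VARIABLE_LOOKBEHIND = r"(?<=CRF Page (?:\d+, )*)(\d+)"

if __name__ == "__main__":
    print(first_only)
    print(compiles(VARIABLE_LOOKBEHIND))
```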
Here is a list of search expressions that cover various cases.
"x" is used to designate a single digit,
"Y" is used to designate a number with any number of digits that is being linked,
"z{m}" is used to designate any combination of digits, spaces or commas that has m characters.
In other words, "m" is the length of the text that is allowed between "CRF Page " and the ", " that precedes the page number being linked.
(?<=CRF Page )(\d+) - to cover "CRF Page Y" case.
(?<=CRF Page \d{1}, )(\d+) - to cover "CRF Page x, Y" case.
(?<=CRF Page [\d, ]{2}, )(\d+) - to cover "CRF Page z{2}, Y" case.
(?<=CRF Page [\d, ]{3}, )(\d+) - to cover "CRF Page z{3}, Y" case.
(?<=CRF Page [\d, ]{4}, )(\d+) - to cover "CRF Page z{4}, Y" case.
....
(?<=CRF Page [\d, ]{20}, )(\d+) - to cover "CRF Page z{20}, Y" case.
Note that all these search patterns use the same linking action: \1,file://CRFDocument.pdf to perform the linking.
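Because the rules differ only in the repetition count, the full list can be generated rather than typed by hand. The following Python sketch is an illustration under stated assumptions: build_rules and find_pages are hypothetical helpers, Python's re module stands in for the plug-in's engine, and the upper limit of 20 simply follows the list above. It builds the rules and checks them against the sample text.

```python
import re

PREFIX = "CRF Page "


def build_rules(max_gap: int = 20) -> list[re.Pattern]:
    """Build one fixed-length look-behind rule per gap length 0..max_gap."""
    patterns = [
        r"(?<=" + PREFIX + r")(\d+)",          # "CRF Page Y"
        r"(?<=" + PREFIX + r"\d{1}, )(\d+)",   # "CRF Page x, Y"
    ]
    for m in range(2, max_gap + 1):
        # "CRF Page z{m}, Y": exactly m digits, spaces or commas in between.
        patterns.append(r"(?<=" + PREFIX + r"[\d, ]{" + str(m) + r"}, )(\d+)")
    return [re.compile(p) for p in patterns]


def find_pages(text: str, rules: list[re.Pattern]) -> list[str]:
    """Collect page numbers matched by any rule, in order of appearance."""
    hits = {}
    for rule in rules:
        for match in rule.finditer(text):
            hits[match.start(1)] = match.group(1)
    return [hits[pos] for pos in sorted(hits)]


if __name__ == "__main__":
    rules = build_rules()
    print(find_pages("CRF Page 5, 15, 23, 101", rules))
```

Each page number in the sample is picked up by a different rule: 5 by the rule with no gap, 15 by the single-digit rule, 23 by the rule with a gap of 5 characters ("5, 15"), and 101 by the rule with a gap of 9 characters ("5, 15, 23").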
Let's look at the specific text patterns covered by one of the rules, for example (?<=CRF Page [\d, ]{4}, )(\d+):
The rule allows exactly 4 characters between "CRF Page " and the comma that is followed by a space and a page number. The following sample text is therefore covered by this rule (in each line, the 4 characters are the ones between "CRF Page " and ", 75"):
  • CRF Page 2, 1, 75
  • CRF Page 22,1, 75
  • CRF Page 2,21, 75
  • CRF Page 2345, 75
In all cases listed above, the link will be added only to page "75".
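The four samples can be checked mechanically. This is a minimal sketch in Python, used here as an assumed stand-in for the plug-in's regular expression engine; the linked_pages helper is hypothetical.

```python
import re

# The rule that allows exactly 4 characters between "CRF Page " and ", Y".
RULE_4 = re.compile(r"(?<=CRF Page [\d, ]{4}, )(\d+)")

SAMPLES = [
    "CRF Page 2, 1, 75",
    "CRF Page 22,1, 75",
    "CRF Page 2,21, 75",
    "CRF Page 2345, 75",
]


def linked_pages(text: str) -> list[str]:
    """Return the page numbers that this one rule would link in text."""
    return RULE_4.findall(text)


if __name__ == "__main__":
    for sample in SAMPLES:
        print(sample, "->", linked_pages(sample))
```

A text such as "CRF Page 5, 75" is not matched by this rule, because only one character separates "CRF Page " from ", 75"; that case is handled by the single-digit rule instead.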
Use the "Plug-ins > Links > Generate Links > Generate Links By Text Search (multiple rules)" menu to perform the linking with multiple rules. You can download the settings file with the sample rules shown in this tutorial and load it via the "Load Settings" button located on the "Generate Links By Text Search" dialog. Rename the downloaded file by removing the *.bin file extension (it is used for proper downloading only).