Splitting PDF Documents By Keywords

AutoSplit Pro plug-in for Adobe® Acrobat®

Introduction
It is often necessary to split a PDF document at pages that contain specific keywords. The AutoSplit™ software can search a PDF document and check every page for the presence of user-specified keywords. If at least one keyword appears on a page, that page is marked as a splitting page. The document is then split at these pages, producing multiple PDF documents.
Sample Document Description
The sample PDF document used in this tutorial contains 20 pages with Bates numbers from ABC-200001 to ABC-200020 in the lower right corner of each page. The goal is to split the PDF document at pages with specific Bates numbers from a user-specified list and to name each output PDF document using the corresponding Bates number.
Splitting Approach
We are going to use the "Page with Keywords From List" option to split the PDF document at pages that contain any of the following Bates numbers (keywords): ABC-200001, ABC-200004, ABC-200005, ABC-200008, ABC-200012. Each page that contains one of these Bates numbers becomes the first page of a new output document.
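The splitting rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not AutoSplit's implementation: `split_at_keyword_pages` is a hypothetical helper, the page texts are simulated strings rather than text extracted from a real PDF, and keyword matching is reduced to a plain substring test.

```python
def split_at_keyword_pages(page_texts, keywords):
    """Group 1-based page numbers into documents; a new document starts
    at every page whose text contains at least one keyword."""
    documents = []
    for page_number, text in enumerate(page_texts, start=1):
        is_split_page = any(keyword in text for keyword in keywords)
        # Start a new document at a splitting page (or at the very first page).
        if is_split_page or not documents:
            documents.append([])
        documents[-1].append(page_number)
    return documents


# Simulated 20-page document: each page carries one Bates number.
page_texts = [f"Sample page text ABC-{200000 + n}" for n in range(1, 21)]
keywords = ["ABC-200001", "ABC-200004", "ABC-200005", "ABC-200008", "ABC-200012"]

for pages in split_at_keyword_pages(page_texts, keywords):
    print(f"pages {pages[0]}-{pages[-1]}")
```

For the simulated sample document this groups the pages as 1-3, 4-4, 5-7, 8-11 and 12-20, one group per keyword in the list.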
Prerequisites
You need a copy of Adobe® Acrobat® with the AutoSplit™ Pro plug-in installed on your computer in order to follow this tutorial. Trial versions of both Adobe® Acrobat® and the AutoSplit™ Pro plug-in are available for download.

Step-by-Step Tutorial

Step 1 - Open a PDF File
Start the Adobe® Acrobat® application and use the “File > Open…” menu to open the PDF document that needs to be processed.
Step 2 - Open “Split Document Settings” Menu
Select “Plug-ins > Split Document…” from the main Acrobat® menu to open the “Split Document Settings” dialog.
Step 3 - Select Splitting Method
Check the "Use separator:" box and select "Page With Keywords From List" from the menu.
Click "Options..." to open the "Specify List of Keywords" dialog.
Select Page with Keywords from List option from Use Separator menu
Step 4 - Specify List of Keywords
Enter the split keywords in the text field, one keyword per line. In this example, the following Bates numbers are entered into the keyword list: ABC-200001, ABC-200004, ABC-200005, ABC-200008, ABC-200012.
Optionally, check the "Match text case" option to perform a case-sensitive search. Check the "Match whole words" option to match only whole words and ignore partial matches where a keyword appears as part of another word. Click "OK" once done.
Enter keywords one on each line
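The effect of the two matching options can be illustrated with a small Python sketch. It is not how AutoSplit performs the search: `page_matches` is a hypothetical function, and treating letters, digits and underscores as word characters is an assumption about what "whole word" means.

```python
import re


def page_matches(text, keyword, match_case=False, whole_words=False):
    """Return True if keyword occurs in text under the given options."""
    pattern = re.escape(keyword)
    if whole_words:
        # Lookarounds reject matches that touch a letter, digit or underscore.
        pattern = r"(?<!\w)" + pattern + r"(?!\w)"
    flags = 0 if match_case else re.IGNORECASE
    return re.search(pattern, text, flags) is not None


print(page_matches("Ref: abc-200004", "ABC-200004"))                   # True
print(page_matches("Ref: abc-200004", "ABC-200004", match_case=True))  # False
print(page_matches("ID ABC-2000041", "ABC-200004"))                    # True
print(page_matches("ID ABC-2000041", "ABC-200004", whole_words=True))  # False
```

The last two calls show why whole-word matching matters for Bates numbers: without it, the keyword ABC-200004 would also be found inside a longer number such as ABC-2000041.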
Step 5 - Specify Output File Naming
Next, we are going to configure AutoSplit to use Bates numbers as output file names. Text matching the Bates number pattern is extracted from the text of each output document and used as part of its file name.
Press the "Add" button in the "Output Naming and Destination" section.
Press Add button
Select "Text By Search" option. Click "Next>>".
Select Text by Search option
Enter the following search expression into the "Find what" box: ABC-\d{6}\b. This expression matches ABC- followed by exactly 6 digits; the trailing \b requires a word boundary, so a longer run of digits does not match. Press the "?..." button if you want to learn more about search expression syntax.
Press the "OK" button once done.
Enter a search pattern into Find What box
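To see what the search expression matches, it can be tried outside Acrobat. The sketch below uses Python's `re` module, whose syntax agrees with the tutorial's expression for `\d`, `{6}` and `\b`; whether AutoSplit's engine behaves identically in every case is an assumption. The `output_name` helper, the use of the first match, the `.pdf` suffix and the `unnamed` fallback are all hypothetical and chosen only for illustration.

```python
import re

# Same expression as entered in the "Find what" box.
BATES_PATTERN = re.compile(r"ABC-\d{6}\b")


def output_name(document_text, fallback="unnamed"):
    """Build a file name from the first Bates number found in the text."""
    match = BATES_PATTERN.search(document_text)
    return f"{match.group(0) if match else fallback}.pdf"


print(output_name("Exhibit A, page ABC-200004"))  # ABC-200004.pdf
print(output_name("ABC-2000041 only"))            # unnamed.pdf
```

The second call returns the fallback because seven digits follow ABC-, so there is no word boundary after the sixth digit.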
Step 6 - Specify An Output Folder
Specify an output folder via the "Browse..." button. Click "OK" to proceed.
Press Browse button to select an output folder
Step 7 - Start Splitting Process
Click "OK" in the dialog box to start the splitting process.
Start processing
Step 8 - Examine The Results
The "AutoSplit Results" dialog appears on screen once the processing is completed. It shows a list of files that have been created. Click "Open Output Folder" to inspect the results.
Examine output files
Step 9 - Inspect Output Files
The AutoSplit™ plug-in has split the input PDF document at pages with specific Bates numbers and created 5 output PDF documents:
  • The first document with pages from ABC-200001 to ABC-200003.
  • The second document with page ABC-200004 only.
  • The third document with pages from ABC-200005 to ABC-200007.
  • The fourth document with pages from ABC-200008 to ABC-200011.
  • The fifth document with pages from ABC-200012 to the last page of the input PDF document.
Files in the output folder