Splitting PDF Invoices Into Multiple Folders

AutoSplit Pro plug-in for Adobe® Acrobat®

Introduction
Splitting a PDF document that contains multiple invoices or statements is a very common problem. The AutoSplit™ Pro software can split PDF documents containing variable-length invoices, account statements, or reports into separate PDF files. This tutorial shows how to take a PDF file with multiple invoices and split it into multiple output folders based on the invoice "billing type". Each output PDF file will contain a single invoice.
Splitting invoices into multiple folders
Input Document Description
The input PDF document contains multiple invoices of variable length. The goal is to split it so that each invoice becomes a separate file, named with the invoice number taken from the first page of that invoice. Each invoice should also be saved into the proper sub-folder according to its "billing type". Every invoice carries one of 3 labels, COD, CORPCOD or CORPORATE, and the split invoices should be placed into the sub-folder matching that label (COD, CORPCOD or CORPORATE).
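Note that the label CORPCOD contains COD as a substring, so a simple "contains" test can send a CORPCOD invoice to the COD folder. Using the whole extracted label as the folder name, as the text-from-location steps in this tutorial do, avoids that. The Python sketch below contrasts the two ideas; both helper functions are hypothetical illustrations, not part of AutoSplit.

```python
# The three billing types used by the example invoices.
BILLING_TYPES = ("COD", "CORPCOD", "CORPORATE")

def folder_by_substring(text):
    """Naive approach: return the first billing type found anywhere in the text."""
    for label in BILLING_TYPES:
        if label in text:
            return label
    return None

def folder_by_exact_label(label_text):
    """Use the whole extracted label (trimmed) and accept only a known billing type."""
    label = label_text.strip()
    return label if label in BILLING_TYPES else None

print(folder_by_substring("CORPCOD"))      # COD  (wrong folder)
print(folder_by_exact_label(" CORPCOD "))  # CORPCOD
```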
Splitting Approach
Each invoice commonly has its own page numbering, typically in the form of a "Page N of M" text pattern. The easiest way to split such documents is to use the "Page 1 of " or “1 of ” text as a "separator". Since this text occurs on the first page of each invoice, it is a natural and reliable separator, and using “Page 1 of” is a very common approach for splitting invoices or statements.
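The separator idea can be sketched in Python as follows, assuming the text of each page has already been extracted into a list (the page texts below are hypothetical). Every page containing "Page 1 of <number>" starts a new invoice, and the pages after it are appended to that invoice until the next separator page. This only illustrates the splitting logic; it is not how AutoSplit is implemented.

```python
import re

# A page starts a new invoice when it contains "Page 1 of <number>".
FIRST_PAGE = re.compile(r"\bPage 1 of \d+\b")

def split_into_invoices(page_texts):
    """Group 0-based page indices into invoices, starting a new group at each separator page."""
    invoices = []
    for index, text in enumerate(page_texts):
        if FIRST_PAGE.search(text) or not invoices:
            invoices.append([index])  # separator page (or a leading page before any separator)
        else:
            invoices[-1].append(index)  # continuation page of the current invoice
    return invoices

pages = [
    "Invoice 1001 ... Page 1 of 2",
    "... Page 2 of 2",
    "Invoice 1002 ... Page 1 of 1",
    "Invoice 1003 ... Page 1 of 3",
    "... Page 2 of 3",
    "... Page 3 of 3",
]
print(split_into_invoices(pages))  # [[0, 1], [2], [3, 4, 5]]
```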
Prerequisites
To follow this tutorial, you need Adobe Acrobat Standard or Professional with the AutoSplit Pro plug-in installed on your computer. Trial versions of both Adobe Acrobat and AutoSplit Pro are available for download.

Step-by-Step Tutorial

Step 1 - Open a PDF File
Start Adobe® Acrobat® and use the “File > Open…” menu to open the PDF document that needs to be processed.
Step 2 - Open “Split Document Settings” Menu
Select “Plug-ins > Split Document…” from the main Acrobat® menu to open the “Split Document Settings” dialog.
Step 3 - Select Splitting Method
Check the “Use separator” box and select “Page With Matching Text” from the list of available options. Next, press the “Options…” button.
Select Page with Matching Text splitting option
Step 4 - Configure Splitting Parameters
Enter “1 of” into the “Find what” box. The document will be split at pages that contain the “1 of …” text. This is a common way to detect the first page of each invoice, since it often carries a "Page 1 of X" label. Check the “Search for text only inside a specified area on the page” box to limit the text search to a specific page area.
Press the “Edit Area…” button to define the page area in which to look for the text.
Enter search text into Find what box and select page area
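Because “1 of” is short, a plain substring test for it also matches later pages such as "Page 11 of 12" or "Page 21 of 30", while the longer "Page 1 of" does not. The Python sketch below compares the two plain substring tests on a few hypothetical page labels. It does not describe how AutoSplit's “Find what” matching works internally; it only shows why the longer text is the safer separator when the invoices contain it.

```python
samples = ["Page 1 of 12", "Page 11 of 12", "Page 21 of 30"]

# Plain substring test for the short separator text.
short_hits = ["1 of" in s for s in samples]
# Plain substring test for the longer separator text.
long_hits = ["Page 1 of" in s for s in samples]

print(short_hits)  # [True, True, True]
print(long_hits)   # [True, False, False]
```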
Step 5 - Define Text Location On the Page
Define the page area where the separator text (1 of ...) is located by drawing a box around it. Try to select an area that does not include any other text. Use the "Zoom" tool to enlarge part of the page for a more precise selection.
Click “OK” once done.
Select page area where to extract text
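Restricting the search to an area keeps matching text elsewhere on the page from triggering a split. The Python sketch below models this under stated assumptions: each word has a hypothetical bounding box (origin at the top-left, in points), and only words lying fully inside the drawn box are searched. The Word class and helper functions are illustrative, not an AutoSplit or Acrobat API.

```python
from dataclasses import dataclass

@dataclass
class Word:
    """One word of page text with a hypothetical bounding box (origin at top-left)."""
    text: str
    x0: float  # left
    y0: float  # top
    x1: float  # right
    y1: float  # bottom

def words_in_area(words, area):
    """Keep the words whose bounding box lies entirely inside area = (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = area
    return [w for w in words
            if w.x0 >= ax0 and w.y0 >= ay0 and w.x1 <= ax1 and w.y1 <= ay1]

def area_contains(words, area, needle):
    """True if the text found inside the area contains the search text."""
    return needle in " ".join(w.text for w in words_in_area(words, area))

WHOLE_PAGE = (0, 0, 612, 792)     # a US Letter page in points
HEADER_AREA = (440, 10, 560, 40)  # box drawn around the "Page N of M" label

# First page of an invoice: the header says "Page 1 of 2".
first_page = [Word("Page", 450, 20, 478, 30), Word("1", 481, 20, 487, 30),
              Word("of", 490, 20, 502, 30), Word("2", 505, 20, 511, 30)]

# Second page: the header says "Page 2 of 2", but the body mentions "Box 1 of 3".
second_page = [Word("Page", 450, 20, 478, 30), Word("2", 481, 20, 487, 30),
               Word("of", 490, 20, 502, 30), Word("2", 505, 20, 511, 30),
               Word("Box", 50, 400, 75, 410), Word("1", 78, 400, 84, 410),
               Word("of", 87, 400, 99, 410), Word("3", 102, 400, 108, 410)]

print(area_contains(first_page, HEADER_AREA, "1 of"))   # True
print(area_contains(second_page, HEADER_AREA, "1 of"))  # False
print(area_contains(second_page, WHOLE_PAGE, "1 of"))   # True
```

Searching the whole page would wrongly treat the second page as the start of a new invoice; the header area does not.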
Step 6 - Confirm Split Method Settings
Press the “OK” button in the “Separator Page by Text Search” dialog to save the settings.
Step 7 - Specify Output File Naming
Set "Name prefix:" to the "No Prefix" option. Make sure the "Base filename" box is empty and that there are no entries in the "Append to name:" list.
Press “Add…” to start defining an output naming scheme.
Define output file naming scheme
Step 8 - Start Defining an Output Naming Scheme
What you do next depends on where exactly you want to save the files extracted from the input document. The goal is to separate the output documents into multiple sub-folders based on text present on the first page of each document, so the output file path will be constructed entirely from the document content. This method lets you "sort" the output files into different folders by "billing type".
Select “Custom Text”, and click “Next”.
Select Custom Text option and press Next button
Step 9 - Type an Output Folder Path
Type C:\Invoices\ into the "Custom Text:" entry box. This will be the root folder that contains all of the sub-folders. You can type any other output folder path according to your project requirements. Make sure it ends with “\”. Click the “OK” button.
Type root folder location terminated with a backslash
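The trailing backslash matters because the "Append to name" entries appear to be joined as plain text, which the explicit backslash entries in this tutorial suggest. The Python sketch below assumes exactly that (no separator is added between entries); build_path is a hypothetical helper for illustration only.

```python
def build_path(parts):
    """Join naming-scheme entries as plain text; no separator is inserted between them."""
    return "".join(parts)

# Python note: a raw string cannot end with a backslash, so "\\" is used instead.
without_slash = build_path(["C:\\Invoices", "COD"])
with_slash = build_path(["C:\\Invoices\\", "COD"])

print(without_slash)  # C:\InvoicesCOD  (root and sub-folder names run together)
print(with_slash)     # C:\Invoices\COD
```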
Step 11 - Define a Naming Scheme for Sub-Folders
Now you should see a new entry in the "Append to name:" list. Press the “Add…” button one more time.
Press Add button one more time to specify naming for sub-folders
Step 12 - Select Text From Location Option
What you do next depends on how exactly you want to extract the sub-folder name from the document. This tutorial assumes that the text is extracted from a fixed location on the first page of each output document.
Select the "Text From Location" option and press the "Next" button to advance to the next screen.
Select Text From Location option
Step 13 - Define Page Area Where To Extract Text
Position the mouse over the page area from which you want to extract the sub-folder name, then press and hold the left mouse button.
Draw a box around the area where the invoice type is located and release the mouse button. This defines the area for text extraction. If you make a mistake, simply draw the box again.
Press the “OK” button once done.
Select area on the page where to extract text
Step 14 - Add Backslash After Folder Name
Now you should see a second entry in the "Append to name:" list. Press the "Add..." button again to add a backslash character after the folder name.
Press Add button
Step 15 - Select Custom Text Option
Select “Custom Text” and click the “Next” button.
Select Custom Text option
Step 16 - Type a Backslash
Type \ (backslash) into the "Custom Text" entry box. It closes the sub-folder name in the output path. Press the “OK” button.
Type a backslash into Custom Text box
Step 17 - Define a Naming Scheme for Files
Now you should see a third entry in the "Append to name:" list. Next, define the file name for each output document: press the "Add..." button again.
Press Add button one more time
Step 18 - Select Text From Location Option
We will assume that the file name (without extension) is also extracted from the first page of each output document, as is often the case.
Select the "Text From Location" option and press the "Next" button.
Select Text From Location option
Step 19 - Define Page Area For Extracting File Name
Now position the mouse over the page area from which you want to extract the file name (where the invoice number is located), then press and hold the left mouse button.
Draw a box around the area and release the mouse button. This defines the area for text extraction. If you make a mistake, simply draw the box again.
Press the “OK” button once done.
Draw a box around invoice number
Step 20 - Add File Extension
Now you should see a fourth entry in the "Append to name:" list. The next step is to add a file extension to the file name.
Press the "Add..." button one more time.
Press Add button one more time to add file extension
Step 21 - Select Custom Text Option
Select “Custom Text”, and click “Next”.
Select Custom Text Option
Step 22 - Type “.pdf” File Extension
Type .pdf into the "Custom Text" entry box and press the “OK” button.
Type .pdf into Custom Text box
Step 23 - Save Profile (Optionally)
Now you should see a fifth entry in the "Append to name:" list. This completes the file naming scheme for saving files into sub-folders. Both the sub-folder name and the file name will be extracted from the specified locations on the first page of each output document. Any folders in the path that do not exist will be created automatically. Design the file naming scheme carefully: if the settings are incorrect, a large number of unwanted folders may be created.
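The complete scheme can be sketched in Python by concatenating the five "Append to name" entries in order, assuming they are joined as plain text. The billing type and invoice number below are hypothetical stand-ins for the text the plug-in extracts from the first page of one invoice. ntpath is Python's Windows path module and works on any operating system, so the folder and file name can be separated for inspection.

```python
import ntpath

# Hypothetical values standing in for text extracted from one invoice's first page.
billing_type = "CORPCOD"
invoice_number = "INV-10045"

# The five "Append to name" entries, in the order configured in this tutorial.
parts = [
    "C:\\Invoices\\",  # Step 9: root folder, ending with a backslash
    billing_type,      # Step 13: text from location (sub-folder name)
    "\\",              # Step 16: closes the sub-folder name
    invoice_number,    # Step 19: text from location (file name)
    ".pdf",            # Step 22: file extension
]
output_path = "".join(parts)
print(output_path)  # C:\Invoices\CORPCOD\INV-10045.pdf

# Split off the folder part; in a script running on Windows,
# os.makedirs(folder, exist_ok=True) would create it if it were missing.
folder, filename = ntpath.split(output_path)
print(folder)    # C:\Invoices\CORPCOD
print(filename)  # INV-10045.pdf
```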
IMPORTANT: When using this method, make sure that the text used to create folder and file names conforms to the Windows file naming rules and restrictions. The text used to make up folder names should not contain any of these characters:
  • " * ? < > : | .
One of the most common problems is a period (.) or colon (:) ending up in a folder name. Colons are not allowed by Windows file naming rules, and periods should be kept out of folder names built this way as well.
The text used to create file names should not contain any of the following characters:
  • \ " / * ? < > : |
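The character rules from the two lists above can be expressed as a small Python check, for example to test sample label or invoice-number values before drawing the extraction areas. This is a sketch under that assumption: the helper names are hypothetical, and the check is not the plug-in's own validation logic.

```python
# Forbidden characters copied from the two lists above.
FOLDER_FORBIDDEN = set('"*?<>:|.')
FILE_FORBIDDEN = set('\\"/*?<>:|')

def invalid_chars(text, forbidden):
    """Return the forbidden characters found in text, sorted for stable output."""
    return sorted(set(text) & forbidden)

def is_valid_name(text, forbidden):
    """True if text contains none of the forbidden characters."""
    return not invalid_chars(text, forbidden)

print(invalid_chars("CORP.COD", FOLDER_FORBIDDEN))  # ['.']
print(invalid_chars("INV:10045", FILE_FORBIDDEN))   # [':']
print(is_valid_name("CORPCOD", FOLDER_FORBIDDEN))   # True
print(is_valid_name("10/45", FILE_FORBIDDEN))       # False
```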
If the path of any output file contains an invalid character, you will get the following error message. Review the file naming scheme and the text extraction areas to make sure only valid text is used. Consider using the "extract text by search" method instead of "text from location" to keep invalid characters out of the names.
Error message displayed if file path contains incorrect symbols
Press the "Save Profile..." button to save the splitting configuration into an APR file for future reuse. You can later restore this exact configuration with the “Load Profile…” button by selecting the previously saved file.
Optionally, press Save Profile button to save settings into a file
Step 24 - Confirm Settings
There is no need to specify an output folder via the "Browse..." button, because the output path is assembled from the naming parts in the “Append to name” list.
Click the “OK” button to proceed.
Press OK button to proceed to splitting
Step 25 - Start Splitting Process
Click “OK” in the dialog box to start the process.
Press OK button to confirm
Step 26 - Examine Output Files
The “AutoSplit Results” dialog appears on screen once the processing is completed.
The “Results” dialog shows the complete list of output files (without full paths) created during processing. The AutoSplit plug-in has split each invoice into a separate PDF file, named it by its invoice number, and saved it into the proper sub-folder according to its "billing type".
Inspect output files
All sub-folders (c:\Invoices\COD, …\CORPCOD, …\CORPORATE) are created automatically during processing.
Output folders created