AutoSplit™ / AutoSplit Pro™
PDF Document Splitting And Merging Plug-In for Adobe® Acrobat®
Users Guide
version 2.8
Copyright © 2003-2008 EverMap Company LLC.
http://www.evermap.com
Revision Date: January 24, 2008
Getting Started
1. 1. System Requirements and Compatibility
1. 2. What is AutoSplit plug-in?
1. 3. Commercial and Evaluation Versions
1. 4. Installation Details
Functionality
2. 1. Software Functionality
2. 2. Splitting Functionality Overview
2. 3. Merging Functionality Overview
2. 4. Extracting PDF Packages Overview
2. 5. Batch Processing Support Overview
Splitting Documents: Step-by-Step Procedures
3. 1. Splitting the document
3. 2. Output Documents Options
3. 3. Output Documents Naming
3. 4. Security Settings
3. 5. Configuring Splitting Modes
3. 5. 1. Configuring "Equal size documents of N Pages" mode
3. 5. 2. Configuring "Use Bookmark Tree" mode
3. 5. 3. Configuring "Use Separator" mode
3. 5. 4. Configuring "Use Manually Defined Page Ranges"
3. 5. 4. 1. Defining Pages Ranges
3. 5. 4. 2. Splitting by Text
3. 5. 4. 3. Splitting by Element Content
3. 5. 4. 4. Email Output Documents
3. 6. Split by Text: Extracting Pages Using Keywords and Text Patterns
3. 7. Batch Processing
3. 7. 1. Creating Document Splitting Batch Procedure
3. 7. 2. Running Document Splitting Sequence
3. 7. 3. Starting Document Splitting Sequence From Command Line Prompt
3. 8. Executing
Merging Documents: Step-by-Step Procedures
4. 1. Merging Multiple Documents
4. 2. Merging Documents by Appending Pages
4. 3. Merging Documents by Interleaving Pages
4. 4. Merging Content of the Folder
4. 5. Batch Processing
Technical Support
5. 1. Why plug-in might not load?
5. 2. Contacting technical support
1. 1. System Requirements and Compatibility
- Microsoft Windows NT/ME/XP/Vista
- Full commercial version of Adobe Acrobat 5.0 / 6.0 / 7.0 / 8.0.
- This plug-in will not work with the free Adobe® Acrobat® Reader.
1. 2. What is AutoSplit plug-in?
The AutoSplit plug-in combines industry-standard document splitting and merging methods with unique features for content-based splitting and page extraction. The plug-in splits a document into multiple files based on text or element content, page count, bookmarks, and page ranges. Users can extract pages from PDF documents that contain specific keywords or match text patterns. This functionality uses the PCRE regular expression engine (Copyright University of Cambridge), which implements the regular expression syntax of the popular Perl language. Another powerful option is the ability to extract pages that contain specific elements such as forms, images, various types of links, text notes, rubber stamps, highlighting, etc. PDF documents can be merged by appending or interleaving pages from multiple input files. AutoSplit includes document security and watermarking functionality for automatic processing of output files. The plug-in features a powerful and intuitive user interface that is easy to use for power users and beginners alike.
The AutoSplit plug-in is available in two versions: AutoSplit™ and AutoSplit Pro™. AutoSplit Pro includes all functionality of AutoSplit plus batch processing and splitting by separator pages (blank pages or pages containing matching text).
Summary:
- Split by page count.
- Split by bookmark hierarchy.
- Split documents at blank pages (AutoSplit Pro).
- Split documents at pages with matching text (AutoSplit Pro).
- Split by custom page ranges.
- Split by text content.
- Split by element content (forms, images, notes, rubber stamps, etc.).
- Options to extract odd/even pages, reverse page order.
- Protect output files with password, restrict user access to certain features.
- Add watermark to all pages in the output document.
- Options to automatically name output files.
- Automatic generation of HTML index file.
- Batch processing support for Acrobat Professional (AutoSplit Pro).
- Merge documents by appending pages.
- Merge documents by interleaving pages.
- Extract all embedded files from PDF files and packages.
- Export embedded file metadata (into CSV, HTML, Excel XML formats) from PDF files or packages.
1. 3. Commercial and Evaluation Versions
The plug-in is available in two flavors: a full commercial version (requires purchase) and a free evaluation version (can be downloaded from www.evermap.com).
IMPORTANT: Limitations of the evaluation version:
- "DEMO" watermark is added to all output pages.
- 30-day evaluation period.
1. 4. Installation Details
The installation program automatically determines the path to the Adobe Acrobat software that is installed on your computer. The plug-in is installed into the "plug_ins" folder of the Adobe Acrobat application (the standard location of all Acrobat plug-ins). If there are multiple versions of Adobe Acrobat on your computer, the plug-in is installed into the most recent version.
The full path to the plug-in is: <Adobe Acrobat Installation Path>/plug_ins/EverMap/AutoSplit.api
IMPORTANT: Make sure that Adobe Acrobat is not running while you are installing the plug-in.
If you can't find the AutoSplit menus after installing the plug-in, please refer to the "Why plug-in might not load?" section.
Once the plug-in is installed, it adds the following menu entries to Adobe Acrobat:
"Plug-ins / Split Document...",
"Plug-ins / Merge Documents...",
"Plug-ins / PDF Package / Extract Package(s)...",
"Plug-ins / PDF Package / Export Content Metadata..."
IMPORTANT: Please note that you have to open a document first in order to use the "Plug-ins / Split Document..." menu. If there is no active document open, this menu appears disabled (grayed out).
2. 1. Software Functionality
Split Functionality:
- Split document into files containing N pages per file.
- Split document based on specified level of bookmark hierarchy.
- Split document into multiple files at blank pages (including scanned images of blank pages). This feature is available in the AutoSplit Pro package only.
- Split document into multiple files at pages with a specified text pattern. This feature is available in the AutoSplit Pro package only.
- Split document based on manually defined page ranges with a multitude of available options such as odd/even pages, reverse order, page ranges from individual bookmarks, etc.
- Split document based on text content: extract pages that contain specified keywords and text patterns.
- Split document based on element content: extract pages that contain specific elements such as forms, images, notes, rubber stamps (18 different attributes are available).
- Use "Split Document" command in Adobe Acrobat batch framework to automate document processing.
Merge Functionality:
- Merge unlimited number of PDF documents into one by appending pages.
- Merge unlimited number of PDF documents into one by interleaving pages.
- Merge all PDF documents in the folder and optionally include all subfolders.
- Pad input documents with blank page if number of pages in the document is odd.
- Use "Merge Documents" command in Adobe Acrobat batch framework to automate document processing.
- Specify page ranges or subsets (odd/even pages) for each document.
- Reverse page order for any input document.
- Automatically bookmark each document in the output and transfer bookmarks.
- Import merging instructions from CSV files.
- Set output document properties such as "Title", "Author", "Subject" and "Keywords".
- Change document merging order by sorting input documents alphabetically.
Extracting PDF Packages:
- Extract all embedded files into a specified output folder.
- Automatically create a detailed HTML report with links to extracted files.
- Output all metadata (file name, description, size, etc.) to a spreadsheet-ready CSV report file.
- Extract all metadata to spreadsheet-ready XML / CSV file without extracting embedded files.
- Batch processing support for extracting embedded files.
Security Options:
- Secure output documents with an "open" password.
- Restrict user access for editing, printing, copying to clipboard and editing text notes.
Document Options (Splitting):
- Transfer bookmarks to output file.
- Update links and bookmarks in output documents to point to correct locations.
- Remove page labels in output documents.
- Linearize output files for fast internet access.
- Add watermark (text string) to every output page.
Document Naming (Splitting):
- Set output folder for all documents.
- Specify name prefix (can be input document name, date, time, etc.).
- Specify base filename to be used for all output files.
- Specify what to add to the base filename: number, letter, page range, page label or text from page.
- Specify custom file names (for manual page ranges method).
- Create folders to mirror bookmark tree structure (for splitting by bookmarks only).
Batch processing support is available when running the AutoSplit Pro plug-in with Acrobat Professional.
2. 2. Splitting Functionality Overview
There are 4 splitting modes available:
2.2.1. "Equal Size Documents of N Pages"
This is the simplest splitting method. It creates a number of output documents from the current one, N pages each (the user specifies N). The last document may have fewer pages than the others if the number of pages in the input is not divisible by N.
Number of documents created = (Number of Pages In Input Document + (N-1)) / N, using integer division (the remainder is discarded).
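The formula above is an integer (floor) division that rounds the page count up to the next multiple of N. A minimal Python sketch of the arithmetic follows; the function names are illustrative only and are not part of the plug-in.

```python
def output_document_count(total_pages: int, pages_per_document: int) -> int:
    """Number of output documents: (pages + (N - 1)) / N with integer division."""
    if total_pages < 0 or pages_per_document <= 0:
        raise ValueError("total_pages must be >= 0 and pages_per_document > 0")
    return (total_pages + pages_per_document - 1) // pages_per_document


def equal_size_page_ranges(total_pages: int, pages_per_document: int):
    """1-based (first, last) page pairs; the last range may be shorter."""
    return [
        (first, min(first + pages_per_document - 1, total_pages))
        for first in range(1, total_pages + 1, pages_per_document)
    ]


# A 25-page input with N = 10 gives three documents: 1-10, 11-20 and 21-25.
print(output_document_count(25, 10))
print(equal_size_page_ranges(25, 10))
```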
2.2.2. "Use Bookmark Tree"
This method is available only if the input document has bookmarks defined. It creates an output document for every bookmark in the document. The user can control the level of bookmarks used in the splitting process by specifying starting and ending levels (the topmost bookmark level is 1). For example, if you want to create documents that contain pages defined by the top level of the bookmark hierarchy (one document per bookmark section), specify levels from "1 to 1".
AutoSplit can process simple JavaScript actions associated with bookmarks. Typically, a bookmark action implemented with JavaScript contains the following line:
this.pageNum = X;
Where X is the page number that is displayed when the user clicks on the bookmark. Page numbers in Acrobat JavaScript are zero-based, so the code "this.pageNum = 5;" displays the sixth page in the document. AutoSplit scans all JavaScript actions associated with bookmarks and retrieves all corresponding page numbers.
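As an illustration only, the kind of scan described above can be sketched in Python with a regular expression. This is not AutoSplit's implementation: the pattern and the function name are assumptions, and the sketch simply returns the value X as it is written in the script.

```python
import re
from typing import Optional

# Hypothetical pattern for the simple "this.pageNum = X;" form shown above.
PAGE_NUM_PATTERN = re.compile(r"this\.pageNum\s*=\s*(\d+)")


def bookmark_target_page(script: str) -> Optional[int]:
    """Return X from a 'this.pageNum = X;' action, or None if it is absent."""
    match = PAGE_NUM_PATTERN.search(script)
    return int(match.group(1)) if match else None


print(bookmark_target_page("this.pageNum = 5;"))
print(bookmark_target_page("app.alert('no page jump here');"))
```

Actions that compute the page number dynamically would not be recognized by a simple pattern like this one.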
AutoSplit can create folders on disk that mirror the bookmark tree structure of the input document. This option is useful for documents with a complex bookmark structure and when better file organization is desired. For example, suppose the input document contains the following bookmark tree:
--Section A
    | Chapter 1
    | Chapter 2
--Section B
    | Chapter 3
    | Appendix A
        | Table 1
        | Table 2
Assuming that the "Create Folders" option is turned on, the following folder structure will be created in the output folder on disk:
Folder "Section A" contains 2 documents: "Chapter 1.pdf" and "Chapter 2.pdf".
Folder "Section B" contains one document, "Chapter 3.pdf", and one folder, "Appendix A".
Folder "Appendix A" contains 2 documents: "Table 1.pdf" and "Table 2.pdf".
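The mapping from bookmark tree to folders can be sketched as a small recursive walk. The Python below is an illustration under stated assumptions, not the plug-in's code: the nested-dictionary representation is hypothetical, and it assumes (as in the example above) that a bookmark with children becomes a folder while a bookmark without children becomes a PDF file.

```python
from pathlib import PurePosixPath


def output_paths(tree: dict, parent: PurePosixPath = PurePosixPath(".")) -> list:
    """Map a nested {title: children} bookmark tree to relative output paths."""
    paths = []
    for title, children in tree.items():
        if children:
            # Assumption: a bookmark with children becomes a folder.
            paths.extend(output_paths(children, parent / title))
        else:
            # Assumption: a bookmark without children becomes one PDF file.
            paths.append(parent / f"{title}.pdf")
    return paths


bookmarks = {
    "Section A": {"Chapter 1": {}, "Chapter 2": {}},
    "Section B": {"Chapter 3": {}, "Appendix A": {"Table 1": {}, "Table 2": {}}},
}

for path in output_paths(bookmarks):
    print(path)
```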
2.2.3. "Use Separator" (available in AutoSplit Pro only)
This mode splits the input document into multiple files at pages that are treated as separators. There are three options available in this mode: "Page with matching text", "Blank page" and "Blank image page". The input PDF document is scanned for separator pages, and output documents are created from the groups of pages located between them.
- "Page with matching text" option:
The input document is scanned for pages that contain specified text (or a regular expression pattern). The document is split into multiple files at the location of those pages. Pages that contain matching text become the first pages of the corresponding output documents. The plug-in searches page text and the content of all annotations such as "Typewriter", "Note", etc.
- "Blank page" option:
The input document is expected to have pages with content separated by completely blank PDF pages. Blank pages should not have any elements, either visible or hidden. A new output document is created for every continuous page range between blank pages. Blank pages are discarded and are not included in any output document.
- "Blank image page" option:
The input document is expected to have pages with content separated by pages that contain a single scanned (black and white) image of a blank page. Such pages may appear completely blank or contain a little bit of noise (similar to a blank page that came out of a fax machine). A new output document is created for every continuous page range between blank pages. Blank pages are discarded and are not included in any output document. It is possible to adjust the sensitivity of the algorithm. Click on the small "…" button to the right of the option menu to open the "Blank image page settings" dialog. This dialog lets you adjust the amount of noise that a scanned image of a blank page might have. Remember that setting this parameter too high or too low may result in misclassification of blank pages and a missed splitting boundary. Typically, this mode is used to split big PDF documents that contain many multi-page fax transmissions separated by a blank page.
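The two separator behaviors described above differ in what happens to the separator page itself: a blank separator is dropped, while a page with matching text starts the next output document. The following Python sketch illustrates the grouping logic only. Page detection is replaced by precomputed boolean flags, the function names are hypothetical, and the handling of pages that precede the first matching page is an assumption.

```python
def split_at_blank_pages(is_blank: list) -> list:
    """Group 1-based page numbers between blank pages; blank pages are dropped."""
    groups, current = [], []
    for page_number, blank in enumerate(is_blank, start=1):
        if blank:
            if current:
                groups.append(current)
                current = []
        else:
            current.append(page_number)
    if current:
        groups.append(current)
    return groups


def split_at_matching_pages(is_match: list) -> list:
    """Start a new group at every matching page; the matching page is kept."""
    groups = []
    for page_number, match in enumerate(is_match, start=1):
        # Assumption: pages before the first match form their own document.
        if match or not groups:
            groups.append([])
        groups[-1].append(page_number)
    return groups


print(split_at_blank_pages([False, False, True, False, True, True, False]))
print(split_at_matching_pages([True, False, False, True, False]))
```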
2.2.4. "Use Manually Defined Page Ranges"
This is the most flexible splitting mode. It allows the user to specify how many output documents will be created and which page ranges (or page sets) from the input document are copied to each output. Each output document may consist of an unlimited number of page ranges. A variety of options can be applied to every page range, such as selection of odd/even pages, reverse order and an optional watermark. Page ranges or starting/ending page numbers can also be specified by selecting a bookmark (every valid bookmark points to a page in the document) from a bookmark tree view. Extracting pages based on text or element content is another powerful option in this mode. Every output document then has a list of associated page ranges or queries (to extract based on text or element content) that completely defines its content.
2.2.5. Split By Text Functionality
Page ranges can also be defined based on actual text content. Pages that match specified keywords or search patterns can be dynamically extracted from the document. AutoSplit automatically searches the input document (at run-time) and builds a list of pages that match the user's search query. Regular expressions can be used to define matching text patterns. Regular expression pattern matching has the same syntax and semantics as in Perl, a widely popular programming language for processing textual information. The user can choose to extract just a single matching page from the document, the set of all matching pages, or a page range. Separate matching queries can be used to define the first and last pages of the page range. See the "Split by Text: Extracting Pages Using Keywords and Text Patterns" section for detailed information on this topic.
2.2.6. Split By Content Functionality
Pages that contain specific elements (such as forms, images, notes, rubber stamps, etc.) can be dynamically extracted without specifying actual page numbers. AutoSplit automatically searches the input document (at run-time) and builds a list of pages that match the user's request. The selected pages are then added to the corresponding output document. This functionality is available as one of the many ways to define output pages in the "Use Manually Defined Page Ranges" mode. There are 18 different options available for specifying the element content of a page. See "Split by Element Content" for detailed information on available options.
2. 3. Merging Functionality Overview
There are two document merging methods available:
"Append pages"
Pages from multiple input documents are sequentially copied (appended) to a new output document. The user can optionally specify a range of pages to use from each input document. The software proceeds as follows: first, the selected pages from the first document are copied to the output, then pages from the second document are transferred, and so on. The resulting PDF document is opened in Adobe Acrobat as a "MergeResults" document that needs to be saved to a file by the user.
"Interleave Pages"
Pages from multiple input documents are sequentially interleaved and copied to a new output document. The user can optionally specify a range of pages to use from each input document. The software proceeds as follows: first, N pages from the first document are copied to the output, then N pages from the second document are copied, and so on. This process is repeated as long as there are enough pages in each document to proceed. If the input documents have an uneven number of pages, the pages that can't be interleaved are appended to the end of the output document. The user can specify N, the number of pages to interleave from each document. Specify the correct number of pages to use from each input document to achieve correct results. The resulting PDF document is opened in Adobe Acrobat as a "MergeResults" document that needs to be saved to a file by the user.
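The interleaving procedure can be sketched as follows. This is one reading of the description above, not the plug-in's code: documents are modeled as plain Python lists of page labels, the function name is hypothetical, and "enough pages to proceed" is interpreted as every document still having at least N unused pages.

```python
def interleave_pages(documents: list, n: int = 1) -> list:
    """Interleave n pages at a time from each document, then append leftovers."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not documents:
        return []
    positions = [0] * len(documents)
    output = []
    # Interleave while every document still has at least n unused pages.
    while all(len(doc) - pos >= n for doc, pos in zip(documents, positions)):
        for index, doc in enumerate(documents):
            output.extend(doc[positions[index]:positions[index] + n])
            positions[index] += n
    # Pages that can't be interleaved are appended at the end, document by document.
    for index, doc in enumerate(documents):
        output.extend(doc[positions[index]:])
    return output


fronts = ["A1", "A2", "A3"]
backs = ["B1", "B2"]
print(interleave_pages([fronts, backs], n=1))
```

A typical use is recombining two scans of a double-sided original, one holding the front sides and one holding the back sides.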
2. 4. Extracting PDF Packages Overview
The plug-in provides functionality for automatic extraction of all embedded files from any PDF document or package. The user is prompted to select one or more input PDF files and an output folder for the extracted files. The software creates both HTML and CSV report files that contain metadata about the extracted files, such as name, description, size, creation date, etc.
Use the "Plug-ins > PDF Package > Extract Package(s)" menu to extract all embedded files from one or more input PDF files.
Use the "Plug-ins > PDF Package > Export Content Metadata" menu to create a spreadsheet-ready CSV or XML report file that lists all embedded files for the selected PDF files.
The plug-in also adds the "Extract Embedded Files" command to the batch processing framework (available in Adobe Acrobat Professional only). Use this command in batch sequences where it is necessary to extract all embedded files.
2. 5. Batch Processing Support Overview
The plug-in adds the following commands to the batch processing framework available in Adobe Acrobat Professional ("Advanced > Document Processing > Batch Processing" menu):
- "Document / Split Document"
- "Document / Merge Documents"
- "Document / Extract Embedded Files"
Use these commands in Acrobat batch sequences to automate repetitive tasks or to process many documents at once.
3. 1. Splitting the document
The document splitting dialog can be started from the Adobe Acrobat menu by selecting "Plug-ins / Split Document...". The "Split Document Settings" dialog will pop up on the screen. Select the desired splitting method and output options and click the "OK" button to execute the operation. The operation is always applied to the current active document (window) of Adobe Acrobat. You need to have at least one document window open; otherwise the menu selection is disabled.
3. 2. Output Documents Options
Output options are grouped together in the "Output Document Processing Options" section. The following options are available. They are applied to every output document.
"Linearize document for fast network access"
All output files will be linearized for faster page-by-page remote (network) access. Select this option if you want to use output documents on the web.
"Update links in output documents"
When pages are extracted from the original document, many links within the document may become broken, either because the link and its destination are no longer located in the same PDF file or because pages have moved to a different location within the document. The plug-in can update all navigational elements in the output documents (bookmarks and named destinations) so they still point to the right location:
- If the destination page of a link is located in the same output file as the link, the link is updated to point to the correct page number within that output document.
- If the destination page is located in a different output document, the link is converted to a cross-document link that points to the correct page in the other document.
- If the destination page is not extracted to any of the output files, the link is adjusted to point back to the original document.
The plug-in processes all link annotations as well as interactive form elements such as buttons. All "View > Go To" actions (such as "Next Page", "First Page", "Previous Page" and "Last Page") are converted into regular "go to a page view" actions.
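The three cases for a link destination (same output file, another output file, or a page that was not extracted) can be sketched as a lookup. The Python below is an illustration only: the page map, the function name, the return shape and the file name "input.pdf" are all hypothetical, and real PDF link handling involves considerably more detail.

```python
def retarget_link(source_file: str, dest_page: int, page_map: dict,
                  original_file: str = "input.pdf") -> tuple:
    """Return (target file or None for same-file, target page) for one link.

    page_map maps an original page number to (output file, page in that file).
    """
    if dest_page not in page_map:
        # Destination was not extracted: point back to the original document.
        return (original_file, dest_page)
    target_file, new_page = page_map[dest_page]
    if target_file == source_file:
        # Destination is in the same output file: only the page number changes.
        return (None, new_page)
    # Destination is in a different output file: becomes a cross-document link.
    return (target_file, new_page)


# Hypothetical split: pages 1-2 go to Book1.pdf, page 3 goes to Book2.pdf.
page_map = {1: ("Book1.pdf", 1), 2: ("Book1.pdf", 2), 3: ("Book2.pdf", 1)}
print(retarget_link("Book1.pdf", 2, page_map))
print(retarget_link("Book1.pdf", 3, page_map))
print(retarget_link("Book1.pdf", 9, page_map))
```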
WARNING: Turning the "Update links in output documents" option "ON" might significantly increase processing time, depending on the number of links and bookmarks contained in the document.
"Remove all page labels"
This option removes all page labels from output documents. When a document is split into several documents, the page order changes and the original page labels are no longer valid, so they have to be redone.
"Transfer bookmarks to the output documents"
This option transfers all bookmarks from the original document to every output document. If the input document has a lot of bookmarks, they are duplicated in every output document, and the total size of the output might be significantly bigger than the size of the input document. If you check the "Update links in output documents" option, all bookmarks are updated and still point to the right location, either in the original or in one of the output documents.
WARNING: Turning this option "ON" might increase the size of every output document, depending on the number of bookmarks.
"Create HTML index file"
The AutoSplit plug-in automatically generates an HTML index file that lists all output documents and links to their locations on disk. A document index is useful when splitting a large PDF file into smaller ones for faster web access. The index file is placed into the output documents folder. The name of the file is generated by appending "_index" to the name of the input PDF document. For example, if the input document is "manual.pdf", then the index file name is "manual_index.htm".
3. 3. Output Documents Naming
The output document name is constructed from 4 components that can be customized by the user:
Output document name = Output folder + Name Prefix (optional) + Base Filename (any custom string) + Suffix (auto-incrementing)
The output naming scheme consists of 4 parts:
- Output folder, where all documents will be placed. Click the "Browse..." button to browse for a folder.
- Name prefix, an option that automatically adds text to the beginning of each output file name. There are 6 options available:
"No prefix" - no prefix is added.
"Use input name" - the input file name is used as a prefix for output file names.
"Use today's date" - today's date is used as a prefix.
"Use document title" - the document's title is used as a prefix. You can view or change the document's title by selecting "File > Document Properties..." from the menu.
"Use document author" - the name of the author of the document is used. You can view or change the author's name by selecting "File > Document Properties..." from the menu.
"Use document creation date" - the document's creation date is used. You can view the document's creation date by selecting "File > Document Properties..." from the menu.
- Base filename, a text string that is used as a base for every output document name. You can type any text string that contains letters, digits and space characters. All characters that are prohibited in file names are replaced by the underscore symbol _. White space characters are trimmed at the end of the base filename. You can enter {space} to add a white space character at the end of the base filename.
- Suffix, text that is appended to the end of the base filename to form a complete document name. There are 5 automatic naming schemes available: "Add Number", "Add Letter", "Add Page Ranges", "Add Page Label" and "Add Text From Page".
"Add Number" - the software automatically adds a number to the base filename. This number is incremented by one for each file. For example, if you defined the base filename as "Book", then output documents will be named "Book1.pdf", "Book2.pdf", "Book3.pdf" and so on. The PDF extension is automatically appended to the filename. You can control the following parameters in this mode: starting number, number to increment by and optional padding with zeroes. Click the "Options..." button to open the "File Numbering" dialog that lets you change file numbering settings.
"Add Letter" - the software automatically adds a letter to the base filename. This letter is incremented in alphabetical order for each output document. For example, if you defined the base filename as "Book", then output documents will be named "BookA.pdf", "BookB.pdf", "BookC.pdf" and so on. The PDF extension is automatically appended to the filename.
"Add Page Ranges" - the software automatically adds page ranges to the base filename. For example, if you split a document by an equal page count of 10 pages per document, then output documents will be named "Book1-10.pdf", "Book11-20.pdf" and so on. The PDF extension is automatically appended to the filename.
"Add Page Label" - the software automatically adds the page label from the first page of the corresponding output document. Make sure that the input document has page labels defined before using this option.
"Add Text From Page" - the software automatically extracts text from the first page of each output document and appends it to the output file name. The text location on the page is defined by a bounding box. Press the "Options" button located to the right of this menu to open the "Extract Text From Page" dialog. Enter the text location according to the instructions displayed in this dialog. The plug-in searches page text and the content of all annotations such as "Typewriter", "Note", etc.
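The base filename cleanup and the first three suffix schemes can be illustrated with a short Python sketch. It is not the plug-in's code: the function names are hypothetical, the prohibited-character set is the standard Windows list (an assumption about what the plug-in replaces), and the behavior of "Add Letter" after the letter Z is not covered by the guide and is not modeled here.

```python
import string

# Characters Windows does not allow in file names (assumed set).
PROHIBITED = '\\/:*?"<>|'


def clean_base_name(text: str) -> str:
    """Replace prohibited characters with '_' and trim trailing white space."""
    cleaned = "".join("_" if ch in PROHIBITED else ch for ch in text).rstrip()
    # A trailing {space} token stands for one deliberate trailing space.
    if cleaned.endswith("{space}"):
        cleaned = cleaned[: -len("{space}")] + " "
    return cleaned


def numbered_name(base: str, index: int, start: int = 1, step: int = 1, width: int = 0) -> str:
    """'Add Number': start, increment and optional zero padding."""
    return f"{base}{str(start + index * step).zfill(width)}.pdf"


def lettered_name(base: str, index: int) -> str:
    """'Add Letter': A, B, C, ... (only the first 26 documents are modeled)."""
    return f"{base}{string.ascii_uppercase[index]}.pdf"


def page_range_name(base: str, first_page: int, last_page: int) -> str:
    """'Add Page Ranges': first and last page of the output document."""
    return f"{base}{first_page}-{last_page}.pdf"


print(numbered_name("Book", 0), numbered_name("Book", 2, width=3))
print(lettered_name("Book", 1))
print(page_range_name("Book", 11, 20))
```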
Automatic naming is skipped in only two cases: when splitting by bookmarks, or when an output file name is directly specified for an output document while splitting by manual page ranges. Double-click on a document title in the "Define Output Documents and Page Ranges" view to specify a custom document name. You can also specify custom document properties such as the document's "Title", "Author", "Subject" and "Keywords". If no custom properties are specified, these values are copied from the input document.
Turn on the "Create Folders" option to generate disk folders that mirror the bookmark tree structure of the input document (see section 2. 2 for details). This option is only available when splitting by bookmarks.
IMPORTANT: When splitting a document using the bookmark tree, documents are named after the corresponding bookmark titles (see section 3. 5. 2 for details).
3. 4. Security Settings
You can limit access to an Adobe PDF document by setting passwords and restricting certain features such as printing and editing. When a document has restricted features, any tools and menu items related to those features are dimmed.
"Password protect output documents": Check this option to restrict access to output files or to password-protect document permissions. If a document password is specified, users are prompted to enter the password every time they try to open an output document. Leave the document password blank if you do not want to restrict access to the document. If a permissions password is set, others cannot change the document editing and printing permissions unless they enter this password. If you are restricting printing or editing, you should also set a password to enhance security. Please note that some third-party PDF reading software might not support these security settings.
You can set the following document access permissions:
- "Allow document editing": Check this option to allow others to edit output documents.
- "Allow document printing": Check this option to allow others to print output documents.
- "Allow copying to clipboard": Check this option to allow others to copy selected text or images to the clipboard.
- "Allow editing of text notes": Check this option to allow others to edit text notes in output documents.
3. 5. Configuring Splitting Modes
3. 5. 1. Configuring "Equal size documents of N Pages" mode
The only available option for this splitting method is the number of pages per document. Specify any number greater than 0. The input document will be split into multiple output documents of N pages each. The last document might have fewer pages than the others if the number of pages in the input is not divisible by N.
Number of documents created = (Number of Pages In Input Document + (N-1)) / N, using integer division (the remainder is discarded).
The "Define Output Documents and Page Ranges" section of the settings dialog is not available in this mode.
3. 5. 2. Configuring "Use Bookmark Tree" mode
This method is available only if the input document has bookmarks defined. It creates one output document per bookmark. The user can control the level of bookmarks used in the splitting process by specifying starting and ending levels (the topmost bookmark level is 1). For example, if you want to create documents that contain pages defined by the top level of the bookmark hierarchy (one document per bookmark section), specify levels from "1 to 1".
The "Define Output Documents and Page Ranges" section of the settings dialog is not available in this mode.
The "Output naming and destination" section of the settings dialog is not fully available in this mode. The user can only select the output folder, but can't control the naming of the documents. Output documents are named using bookmark titles. White spaces are replaced with the underscore symbol. For example, if the input document has 3 top-level bookmarks, "Documentation Roadmap", "API Changes" and "Index of Methods", then 3 output documents will be created (assuming the user selected bookmark levels from "1 to 1" for splitting) with the names DOCUMENTATION_ROADMAP.PDF, API_CHANGES.PDF and INDEX_OF_METHODS.PDF.
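The example names above can be reproduced with a one-line transformation. In the Python sketch below, the upper-casing is inferred from the three example names only (the text itself mentions only the white space replacement), and the function name is hypothetical.

```python
import re


def bookmark_file_name(title: str) -> str:
    """Replace runs of white space with '_' and upper-case, as in the examples."""
    return re.sub(r"\s+", "_", title.strip()).upper() + ".PDF"


for title in ("Documentation Roadmap", "API Changes", "Index of Methods"):
    print(bookmark_file_name(title))
```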
3. 5. 3. Configuring "Use Separator" mode
This method is available in the AutoSplit Pro plug-in only. The idea behind this method is to define special "separator" pages that designate locations in the document where it needs to be split. For example, inserting a blank page is a very popular way of separating continuous groups of pages in a PDF document. There are three different separator types available in this mode:
- Page with matching text
- Blank page
- Blank image page
3. 5. 3. 1. Splitting by page with matching text
The input document is scanned for pages that contain specified text (or a regular expression pattern). Typically, this text should occur on the first page of the group of pages that should be extracted into a separate document. The document is split into multiple files at those locations. Pages that contain matching text become the first pages of the corresponding output documents. Click on the small button ("…") located to the right of the "Use separator" menu to open the "Separator Page By Text Search" dialog, where you can specify the text to search for along with text matching options.
For example, an input document contains multiple reports of variable length, from 1 to 5 pages each. The goal is to split it into multiple files with one report per document. We know that the first page of each report contains a text string of the following format: "Report 2345", "Report 1123" and so on. The word "Report" is followed by exactly 4 digits. This can be accomplished by specifying a separator page with matching text: "Report \d{4}". Make sure that the "Use regular expressions" option is turned on. This regular expression pattern matches only text that contains the word "Report" followed by a space and 4 digits.
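To see how the pattern from this example behaves, it can be tried outside the plug-in. The sketch below uses Python's re module rather than PCRE; for this simple pattern the two engines behave the same way. The page texts and the function name are made up for illustration.

```python
import re

SEPARATOR = re.compile(r"Report \d{4}")


def separator_pages(page_texts: list) -> list:
    """Return 1-based numbers of pages whose text matches the separator pattern."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if SEPARATOR.search(text)
    ]


pages = [
    "Report 2345 - summary",   # first page of the first report
    "details, continued",
    "details, continued",
    "Report 1123 - summary",   # first page of the second report
    "Report 77 draft notes",   # only 2 digits: not a separator
]
print(separator_pages(pages))
```

Note that the pattern also matches the first four digits of a longer number such as "Report 23456". If that matters, a word boundary can be added: "Report \d{4}\b".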
3. 5. 3. 2. Splitting at blank pages
"Blank page" option: the input document is expected to have pages with content separated by completely blank PDF pages. Blank pages should not have any elements, either visible or hidden. A new output document is created for every continuous page range between blank pages. Blank pages are discarded and are not included in any output document.
3. 5. 3. 3. Splitting at blank image pages
PDF documents can be created from the output of a fax machine or a scanner. In this case, the PDF document contains no elements other than a single image for every page. This image represents a page of the original document. Blank pages can easily be inserted into such documents by scanning a blank page. The "Blank image page" option automatically detects such pages and splits the input document at those locations. The software provides a way to adjust for the amount of noise contained in the scanned image of a blank page. Click on the small button ("…") located to the right of the "Use separator" menu to open the "Blank Image Page Settings" dialog. Move the slider to adjust the sensitivity of the algorithm, or select a sample page that represents the worst case of a blank page (most noise, black edges, etc.). Remember that setting this parameter too high or too low may result in misclassification of blank pages and a missed splitting boundary.
3. 5. 4. Configuring "Use Manually Defined Page Ranges" mode
3. 5. 4. 1. Defining Pages Ranges
In this method the user can specify exactly how many output documents
need to be created and which pages go to which output document. Every
output document is described by the set of page ranges that it is
supposed to contain. Each page range can have additional options such
as odd/even page selection, reversing the page order or adding a
watermark to each page. Page ranges can be defined either by
specifying starting and ending page numbers or by a starting page
number and the number of pages in the range. Page ranges can also be
created automatically from individual bookmarks.
- To create a new output document definition: click the "Add Output Document" button in the "Define Output Documents And Page Ranges" section. The "Specify Page Ranges" dialog will appear on the screen. Specify the individual page numbers and page ranges that should be extracted into the new document. You can enter all page numbers as a comma-separated list, e.g., 1, 4-8, 10-20. Swap the first and last page numbers of a page range to specify that these pages need to be processed in reverse order, e.g., 5-1. You can also specify that only even or odd pages should be processed from a specified page range. Add the letter 'E' at the end of the page range to indicate only even pages, e.g., 1-5E, or add the letter 'D' to indicate only odd pages, e.g., 1-5D. Click the "OK" button once you have entered all necessary page numbers. A new document entry will be created in the output document list with the name "Document N" (where N is an integer starting from 1, for example "Document 1"). This name has nothing to do with the actual output document name and merely serves as the document's design name. The new entry is automatically selected in the view and several more buttons become active, allowing the user to add individual page range definitions to this output document. Double click on the document name to open the "Output Document Options" dialog if you want to enter a specific filename for the output document.
- Right click on the document name to activate a document menu that contains a variety of operations you can perform on the output document entry.
- To delete a document from design view: select document in the list and click "Remove Document" button.
- To delete all documents from the design view: click the "Remove All Documents" button. All document definitions will be removed and the design view will be cleared. No documents will be defined for the output.
- To manually add a page range to the output document: select the desired document in the list and click the "Add Range …" button. The "Range of Pages" dialog will appear on the screen. Specify starting/ending page numbers or a starting page and number of pages. The user can also retrieve page numbers from bookmarks (if any bookmarks are defined for the input document). Click the "From Bookmark" button to assign a page number from a bookmark. There are 4 additional options that the user can define for each individual page range: "Include Even Pages From The Range", "Include Odd Pages From The Range", "Reverse Page Order In The Output Document" and "Add Watermark To Every Page In Output".
- The transparency level of a watermark can be adjusted between 0 and 100%. A watermark can be placed either in front of or behind the page content. Placing a watermark behind the page content might sometimes produce unexpected results. Some documents may contain page elements that are indistinguishable from the white page background, but can obscure the watermark. Use the "View / Navigation Tabs / Content" tool to inspect the actual page content before deciding to place watermarks behind text content.
- To quickly add a page range from the bookmark: select desired document in the list and click "Range From Bookmark..." button. "Select Bookmark To Define Page Range" dialog will appear on the screen. Select a bookmark from the tree view and click "OK" button.
- To edit existing page range definition: select a page range in the list and press "Edit Page Range …" button. You can also double click on the page range entry in the list and "Edit " dialog will pop up on the screen.
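The page range notation accepted by the "Specify Page Ranges" dialog (comma-separated items, reversed ranges, and the 'E' and 'D' suffixes) can be illustrated with a short Python sketch. `expand_page_spec` is a hypothetical helper written for this guide, not part of the plug-in, and it assumes that "even" and "odd" refer to the page numbers themselves rather than to positions within the range.

```python
def expand_page_spec(spec):
    """Expand a spec such as "1, 4-8, 5-1, 1-5E" into a list of page numbers."""
    pages = []
    for item in spec.split(","):
        item = item.strip().upper()
        if not item:
            continue
        # Optional trailing letter: E keeps even pages, D keeps odd pages.
        parity = None
        if item[-1] in "ED":
            parity, item = item[-1], item[:-1]
        first, _, last = item.partition("-")
        start = int(first)
        end = int(last) if last else start
        # A range whose first page is larger than its last is processed in reverse.
        step = 1 if start <= end else -1
        for page in range(start, end + step, step):
            if parity == "E" and page % 2 != 0:
                continue
            if parity == "D" and page % 2 == 0:
                continue
            pages.append(page)
    return pages

print(expand_page_spec("1, 4-8, 10-12"))
print(expand_page_spec("5-1"))
print(expand_page_spec("1-5E"))
print(expand_page_spec("1-5D"))
```

Expanding a specification this way before running the split is a convenient check that a reversed or filtered range selects the pages you expect.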
3. 5. 4. 2. Splitting by Text
To define pages from document text: select the desired output document
in the list and click the "Pages From Text…" button. The "Extract
Pages By Text Search" dialog will appear on the screen. The dialog
provides three options for defining the output of the operation: a
single page, the set of all matching pages or a page range. A single
query is added to the list of page ranges as a result of this
operation. This query is executed at run time and the result of the
text matching operation is added to the output document. It is
possible that a query yields no matching pages. Only one matching
pattern needs to be specified for finding a single page or the set of
all matching pages in the document. If you want to define a page
range, separate matching patterns need to be specified for the
starting and ending pages. When executing such a query, the plug-in
tries to find a page in the document that matches the first page
pattern. After this page is found, the plug-in searches for the page
that matches the last page pattern. This process is repeated,
resulting in one or more page ranges depending on the document
content. In many cases, it is hard to come up with a matching pattern
that defines the last page of the range. For example, chapters in a
book do not have any special keywords or text patterns that designate
the end. The solution is to define the same matching pattern for the
last page as for the first page and check the "Exclude last page from
the output page range" checkbox.
For more information on how to use this functionality see the "Split
by Text: extracting pages using keywords and text patterns" section.
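The first-page/last-page search described in this section can be sketched as follows. This is a simplified model and not the plug-in's code: pages are plain strings, page indexes are zero-based, `find_page_ranges` is a hypothetical helper, and Python's `re` module stands in for PCRE. Two behaviors are assumptions, since the guide does not specify them: the search for the last page starts on the page after the first page, and a range with no closing page runs to the end of the document.

```python
import re

def find_page_ranges(pages, first_pattern, last_pattern, exclude_last=False):
    """Return (start, end) page-index pairs found by first/last page patterns."""
    first_re, last_re = re.compile(first_pattern), re.compile(last_pattern)
    ranges = []
    i = 0
    while i < len(pages):
        if not first_re.search(pages[i]):
            i += 1
            continue
        start = i
        # Look for the closing page after the starting page.
        end = next((j for j in range(start + 1, len(pages))
                    if last_re.search(pages[j])), None)
        if end is None:
            # Assumption: an unclosed range runs to the end of the document.
            ranges.append((start, len(pages) - 1))
            break
        if exclude_last:
            ranges.append((start, end - 1))
            i = end      # the excluded page may start the next range
        else:
            ranges.append((start, end))
            i = end + 1
    return ranges

pages = ["Chapter 1", "text", "text", "Chapter 2", "text", "Chapter 3"]
print(find_page_ranges(pages, r"Chapter \d+", r"Chapter \d+", exclude_last=True))
```

With the same pattern used for both ends and the last page excluded, each chapter heading closes the previous range and opens the next one, which mirrors the book-chapter case above.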
3. 5. 4. 3. Splitting by Element Content
To define pages by element content: select the desired output document
in the list and click the "Pages From Content…" button. The "Extract
Pages By Element Content" dialog will appear on the screen. The dialog
provides 18 different options that can be selected to identify the
pages that need to be extracted. A page is included in the output
document only if it contains ALL specified elements or attributes.
Available options are:
- "Image" – page contains embedded image element.
- "Form elements" – page contains a form or any interactive elements such as push buttons, check boxes, radio buttons, text entries, etc.
- "Postscript object" – page contains embedded postscript data.
- "Shading" – page contains shading element.
- "Transparency" – page contains transparency effects (any PDF element that has transparency such as highlighting or watermark).
- "Page is rotated" – page orientation is different from vertical: it is rotated 90, 180 or 270 degrees.
- "Page transition" – page has an associated page transition effect.
- "Submit form action" – page contains an element that has "Submit Form" action associated with it.
- "Reset form action" – page contains an element that has "Reset Form" action associated with it.
- "Link to a page in the same document" – page has a link to another page in the same document.
- "Link to a page in another document" – page has a link to a page that is located in another PDF document.
- "Link to an internet address (URI)" – page has a link to a resource that has web/internet address.
- "Text comment (annotation)" – page contains text box comment or any other element containing text annotation element.
- "Note comment (popup)" – page contains a popup note comment that can be either hidden or visible.
- "Launches an application or prints" – page contains an element that launches another program, opens PDF documents or performs printing.
- "Executes JavaScript code" – page contains an element that has associated action that executes JavaScript code.
- "Highlighting" – page contains highlighting elements (bright transparent markings created with "Highlight" tool).
- "Rubber stamp" – page contains rubber stamp annotation element.
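The ALL-elements rule can be expressed as a simple subset test. The sketch below is only a model of that rule: the element names are the option labels from the list above, the per-page element sets are made-up sample data, and `pages_with_all` is a hypothetical helper rather than a plug-in function.

```python
def pages_with_all(page_elements, required):
    """Return page numbers whose element set contains every required element."""
    required = set(required)
    return [page for page, found in sorted(page_elements.items())
            if required <= set(found)]

# Sample data: page number -> element types found on that page.
page_elements = {
    1: {"Image", "Highlighting"},
    2: {"Image"},
    3: {"Image", "Highlighting", "Rubber stamp"},
}
print(pages_with_all(page_elements, ["Image", "Highlighting"]))
```

Page 2 is left out because it has an image but no highlighting; selecting more options can only narrow the result.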
3. 5. 4. 4. Email Output Documents
You can email output documents created in the "Use manually defined page ranges"
splitting mode. This is the only splitting mode that defines an exact number of
output documents as well as their page content. The software allows the user to
specify a different email recipient for each output document. Email options such
as recipients, message subject and text are specified on a per-document basis.
Double-click on a document name in the output document list and select the "Email
Options" tab in the "Document Properties" dialog. Check "Send this document via
email" if you want to send this document via email. Enter email addresses into the
"To", "CC" (carbon copy) and "BCC" (blind carbon copy) fields. Type in the email
subject and message text. The document itself will be emailed as a binary file
attachment. Check the "Show email confirmation dialog before sending each file"
option if you want to verify or edit each outgoing email message in your default
email application. Depending on the operating system, user account type and account
security settings, you may be prompted by Windows to confirm each outgoing message.
The Windows security dialog typically has an option to stop asking for confirmation.
Please note that this option only works until you restart Windows. This is a
security feature that is built into the Windows operating system and cannot be
easily disabled. All email messages will be sent via your default email application
(such as MS Outlook, MS Outlook Express, Thunderbird, etc.). Your email client must
be configured to support the MAPI protocol (most email clients support MAPI by
default, so most likely no action is required to turn it on). MAPI (Messaging
Application Programming Interface) is an application programming interface (API)
designed by Microsoft. Any software that supports MAPI can communicate with any
mail server and send and receive data via this interface regardless of the server
type and software provider.
3. 6. Split by Text: extracting pages using keywords and text patterns
Splitting a document based on its actual text content is a very
powerful feature of the plug-in. To get the most out of it, it is
sometimes necessary to use regular expressions to define matching
text patterns. A regular expression is a pattern that is matched
against a subject string from left to right. Most characters stand
for themselves in a pattern, and match the corresponding characters
in the subject. Regular expressions are also described in the Perl
documentation and in a number of other books and online resources,
some of which have copious examples. There are many web sites that
serve as online repositories of useful regular expressions. The
description here is intended as introductory documentation only.
The most trivial example is to use plain text keywords as the
matching pattern. A page that contains those words counts as a
successful match.
For example: using "Johnson" as a matching pattern will result in
extracting all pages that contain this text (assuming that the output
option is the set of all matching pages). Although this simple query
is useful, we might want to define more complex patterns. The power
of regular expressions comes from the ability to include alternatives
and repetitions in the pattern. These are encoded in the pattern by
the use of meta-characters, which do not stand for themselves but
instead are interpreted in some special way. The special
meta-characters are listed below:
| Character | Description |
| \ | general escape character with several uses |
| $ | assert end of string (or line, in multi-line mode) |
| . | match any character |
| [ | start character class definition |
| | | start of alternative branch |
| ( | start sub-pattern |
| ) | end sub-pattern |
| ? | extends the meaning of ( |
| * | 0 or more quantifier |
| + | 1 or more quantifier |
| { | start min/max quantifier |
| ! | negate the result of regular expression |
It's impossible to cover all details of regular expression syntax in
this brief introduction. See www.regular-expressions.info for
detailed information.
Here are a few useful examples that will help you utilize some
powerful features of regular expressions:
Matching alternatives: Vertical bar characters are used to separate alternative patterns. For example, the pattern Johnson|Peterson matches either "Johnson" or "Peterson". Any number of alternatives may appear, and an empty alternative is permitted (matching the empty string). The matching process tries each alternative in turn, from left to right, and the first one that succeeds is used.
Sub-Patterns: Sub-patterns are delimited by parentheses (round brackets), which can be nested. For example, the pattern ((red|white) (BMW|Volvo)) matches all combinations of "red" and "white" with words "BMW" and "Volvo" (i.e. "red BMW" or "white Volvo"). Another example: (sens|respons)e and \1ibility matches "sense and sensibility" and "response and responsibility", but not "sense and responsibility". If instead the pattern (sens|respons)e and (?1)ibility is used, it does match "sense and responsibility" as well as the other two strings. The meta-character \1 here serves as a back reference to the first matching sub-pattern. Such references must, however, follow the sub-pattern to which they refer.
Matching two patterns simultaneously: Sometimes it's necessary to
match two or more patterns simultaneously (logical AND). Regular
expressions do not provide an AND operation directly. However, it's
possible to write a regular expression that produces the desired
behavior. For example, the pattern (John Smith).*(Expense Report)
matches all pages that contain both the "John Smith" and "Expense
Report" strings, with "John Smith" appearing first. This expression
consists of 3 parts:
1. (John Smith) - matches the "John Smith" string.
2. .* - matches 0 or more occurrences of any character.
3. (Expense Report) - matches the "Expense Report" string.
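The order sensitivity of this pattern is easy to check with a short Python sketch. Python's `re` module is used here as a stand-in for PCRE. The second pattern uses lookahead assertions, which are standard PCRE syntax, to accept the two strings in either order; whether the plug-in's dialog accepts that pattern has not been verified here, so treat it as an assumption to test on your own documents.

```python
import re

ordered = re.compile(r"(John Smith).*(Expense Report)")
# Lookahead-based variant: each (?=...) checks for one string without consuming text.
either_order = re.compile(r"(?=.*John Smith)(?=.*Expense Report)")

page_a = "John Smith - Expense Report for March"
page_b = "Expense Report submitted by John Smith"

print(bool(ordered.search(page_a)), bool(ordered.search(page_b)))
print(bool(either_order.search(page_a)), bool(either_order.search(page_b)))
```

The ordered pattern finds only the first page, while the lookahead variant finds both.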
Matching whole words: Simple text patterns such as Alert will also match the words "Alerts", "Alerted", etc. If you want your pattern to match only whole words, surround it with \b meta-characters. For example, use \bAlert\b to match only the word Alert and exclude all other words that might contain it as a sub-string.
Matching sub-string: If text that you want to match should appear only inside bigger word, use \B meta-character. For example, the pattern \Bword\B will match word "swordfish", but will ignore words "word", "words" and "password".
Repetitions: The general repetition quantifier specifies a minimum and maximum number of permitted matches, by giving the two numbers in curly brackets (braces), separated by a comma. The numbers must be less than 65536, and the first must be less than or equal to the second. For example: z{2,4} matches "zz", "zzz", or "zzzz". A closing brace on its own is not a special character. If the second number is omitted, but the comma is present, there is no upper limit; if the second number and the comma are both omitted, the quantifier specifies an exact number of required matches.
Negating the result: Sometimes it's necessary to match text that does not contain certain words or patterns. This can be done by adding ! as the first character of the regular expression. For example, !(Approved) matches any string that does not contain the word "Approved".
Character types: a backslash can be used to specify generic character types:
| \d | any decimal digit |
| \D | any character that is not a decimal digit |
| \s | any whitespace character |
| \S | any character that is not a whitespace character |
| \w | any "word" character (a "word" character is any letter or digit or the underscore character) |
| \W | any "non-word" character |
For example: \d{8} matches exactly 8 digits.
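Several of the patterns from this section can be tried out with a few lines of Python before using them in the plug-in. Python's `re` module is a stand-in for PCRE here and the two engines differ in some details (for example, `re` does not support the (?1) sub-pattern call). The leading "!" is a plug-in feature, so the hypothetical `negated` helper below emulates it by inverting the search result.

```python
import re

def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

def negated(pattern, text):
    """Emulate the plug-in's leading "!" by inverting the search result."""
    return not matches(pattern, text)

print(matches(r"Johnson|Peterson", "Invoice for Peterson"))   # alternatives
print(matches(r"\bAlert\b", "Alerts issued"))                 # whole word only
print(matches(r"\Bword\B", "swordfish"), matches(r"\Bword\B", "password"))
print(matches(r"(sens|respons)e and \1ibility", "sense and responsibility"))
print(re.fullmatch(r"z{2,4}", "zzz") is not None)             # repetition
print(negated(r"Approved", "Status: Pending"))                # emulated "!"
print(matches(r"\d{8}", "ID 20080124"))                       # character type
```

Each call returns a simple True or False, which makes it easy to test a pattern against sample text copied from a page.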
3. 7. Batch Processing (available in AutoSplit Pro and AutoSplit eCTD)
Acrobat software provides a batch processing facility that allows you
to create batch sequences and apply them to multiple documents at
once. The AutoSplit plug-in installs "Split Document" and "Merge
Documents" commands that can be accessed and executed via the "Batch
Sequences" dialog. These commands may be used alone or inside another
batch sequence, allowing you to incorporate the document splitting
procedure into your existing workflow. You need Acrobat Professional
in order to have batch processing available on your system. The
section below outlines the procedure to use when creating a batch
sequence for splitting documents. You can use the same procedure to
create and configure batch sequences that use the "Merge Documents"
command. However, there is one fundamental difference in the way
these two commands handle the input document. The "Split Document"
command takes the input document and splits it into multiple output
documents. The "Merge Documents" command uses its own list of input
folders and documents and essentially ignores the input document that
is passed to it by the Acrobat batch processing framework.
3.7.1 Creating a document splitting batch procedure
1. Select "Advanced / Batch Processing…" from Acrobat's application menu. The "Batch sequence" dialog will pop up on the screen.
2. Click the "New Sequence…" button and type in a name for the new sequence. For example, type "SplitByBookmarks" and click the "OK" button.
3. Click the "Select Commands…" button in the "Batch Edit Sequence" dialog.
4. Select "Document / Split Document" from the list of available commands in the "Edit Sequence" dialog and click the "Add>>" button.
5. You should now see a new "Split Document" entry in the sequence list on the right side of the dialog.
6. Double click on the "Split Document" entry (on the right side of the dialog).
7. The "Split Document Settings" dialog pops up on the screen. Select the desired splitting method and processing options. These settings will be saved as part of your new batch processing sequence. Click the "OK" button to save the settings and continue.
8. Click the "OK" button in the "Edit Sequence" dialog.
9. Set the "Select output location" option to "Don't save changes". Click the "OK" button in the "Batch Edit Sequence" dialog.
10. You have now created a new batch processing sequence that can be used on single or multiple documents. This new command will always be available as a custom processing sequence in future sessions.
3.7.2 Running a document splitting sequence
1. Open the "Batch Sequences" dialog by selecting "Advanced / Batch Processing" from Acrobat's main menu.
2. Select the desired sequence from the list and click the "Run Sequence" button.
3. Click "OK" in the confirmation dialog and Acrobat will prompt you to select input documents. Make sure that the documents you want to process are not currently open in the Acrobat viewer.
4. Select one or more input documents and click the "OK" button to start processing.
3.7.3 Starting a document splitting sequence from a command line prompt
The AutoSplit plug-in can be launched from the operating system
command line prompt or from a batch file. Use any text editor (for
example, Notepad.exe) to create a batch file that will be executed
from the command line. The following is a sample batch file, assuming
Acrobat 8.0 is located in the default installation folder:
SET AUTOSPLIT=C:\Documents and Settings\YourLogonName\Application Data\Adobe\Acrobat\8.0\Sequences\SplitSingle.sequ
"C:\program files\Adobe Acrobat 8.0\Acrobat\Acrobat.EXE"
Note that this batch file contains two lines of text, not one. Make
sure you specify the correct location and name of the batch sequence
file you want to execute. Replace YourLogonName in the path with your
real login name. It's generally a good idea to verify that the
sequence file you specify actually exists. If you are not using Adobe
Acrobat 8.0, then you have to make the necessary changes to the
Acrobat.exe and batch sequences folder paths. Save this batch file to
a disk and make sure it has the BAT extension. For example, save it
as "Autosplit.bat".
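If you maintain several such batch files, a small script can generate the two lines and report any path that does not exist. The Python sketch below is an optional convenience and not something the plug-in provides; `build_batch_file` and `missing_files` are hypothetical helper names, and the paths are the sample paths from this section, which you need to adapt to your system.

```python
import os

def build_batch_file(sequence_file, acrobat_exe):
    """Return the text of the two-line batch file for the given paths."""
    return f'SET AUTOSPLIT={sequence_file}\n"{acrobat_exe}"\n'

def missing_files(*paths):
    """Return the paths that do not exist on this machine."""
    return [path for path in paths if not os.path.isfile(path)]

# Sample paths from this guide; replace them with the real locations.
sequence = r"C:\Documents and Settings\YourLogonName\Application Data\Adobe\Acrobat\8.0\Sequences\SplitSingle.sequ"
acrobat = r"C:\program files\Adobe Acrobat 8.0\Acrobat\Acrobat.EXE"

text = build_batch_file(sequence, acrobat)
print(text)
print("Missing:", missing_files(sequence, acrobat))
```

Write the returned text to a file with the BAT extension only after the list of missing files is empty.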
If you need a better solution for
creating and executing Acrobat batch sequences then take a look at
AutoBatch plug-in
that is specifically designed for this task.
Executing the batch file from the command line:
- Make sure Adobe Acrobat is not running on your system.
- Open a folder where your command line batch file is located.
- Double click on the batch file. Adobe Acrobat should launch and immediately start executing a batch sequence.
- Acrobat will automatically exit once processing is completed.
3.8 Executing
Click the "OK" button on the "Split Document Settings" dialog to execute the document splitting operation. If the "OK" button is disabled, the selected splitting method is either not applicable to the input document or requires additional parameters to be selected.
4. 1. Merging Multiple Documents
The document merging dialog can be started from the Adobe Acrobat
menu by selecting "Plug-ins / Merge Documents...". The "Merge
Document Settings" dialog will pop up on the screen. Select the
desired document merging operation and input documents and click the
"OK" button to execute the operation. A new document will be created
by combining the input documents according to the selected settings
and opened in the viewer. This document will be temporarily named
"MergeResults" and will need to be saved to a permanent file by the
user unless the "Save document as" option is selected and an output
filename is specified.
Check the "Pad input document with a blank page..." option to
automatically add a blank page to the end of each input document that
has an odd number of pages. This option helps to ensure that the
first page of each input document starts on an odd page (the front
side of the sheet in double-sided printouts) in the output document.
4. 2. Merging Documents by Appending Pages
- Select the input documents that need to be merged by clicking on the "Add Document..." or "Add Folder..." button. A file selection dialog should appear on the screen. Select the desired documents and click the "OK" button. The selected documents will be added to the list of input documents. The order of the files in the list determines their processing order.
- Double-click on a file in the list, or select a file and click the "Set Page Range..." button, to optionally specify the range of pages to use.
- Select the processing mode by picking "Append Pages" from the "Merge Operation" menu.
- Click the "Save..." button to optionally save the processing settings (including the selected input files) to a settings file. This file can later be loaded by clicking on the "Load..." button.
- Click the "OK" button to start merging documents.
Pages from multiple input documents are sequentially copied (appended) to a new output document. The user can optionally specify a range of pages to use from each input document. The software proceeds as follows: first the selected pages from the first document are copied to the output, then the pages from the second document are transferred, and so on. The resulting PDF document is opened in Adobe Acrobat as a "MergeResults" document that needs to be saved to a file by the user.
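The append operation, together with the blank-page padding option from section 4.1, can be modeled with lists of page labels. The following Python sketch is a simplified illustration and not the plug-in's code: `append_pages` is a hypothetical helper, each document is a plain list, and "<blank>" is a placeholder for an inserted blank page.

```python
def append_pages(documents, pad_odd=False):
    """Concatenate page lists; optionally pad odd-length inputs with a blank page."""
    output = []
    for pages in documents:
        output.extend(pages)
        # Padding keeps the next document's first page on an odd output page.
        if pad_odd and len(pages) % 2 == 1:
            output.append("<blank>")
    return output

doc_a = ["A1", "A2", "A3"]
doc_b = ["B1"]
print(append_pages([doc_a, doc_b]))
print(append_pages([doc_a, doc_b], pad_odd=True))
```

With padding enabled, "B1" lands on output page 5 instead of page 4, so it prints on the front side of a sheet.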
4. 3. Merging Documents by Interleaving Pages
- Select the input documents that need to be merged by clicking on the "Add Document..." or "Add Folder..." button. A file selection dialog should appear on the screen. Select the desired documents and click the "OK" button. The selected documents will be added to the list of inputs. The order of the files in the list determines their processing order.
- Double-click on a file in the list, or select a file and click the "Set Page Range..." button, to optionally specify the range of pages to use.
- Select the processing mode by picking "Interleave Pages" from the "Merge Operation" menu.
- Specify the number of pages from each document to interleave in the "Number of pages to interleave" entry box. This number must always be greater than zero.
- Click the "Save..." button to optionally save the processing settings (including the selected input files) to a settings file. This file can later be loaded by clicking on the "Load..." button.
- Click the "OK" button to start merging documents.
Pages from multiple input documents are sequentially interleaved and
copied to a new output document. The user can optionally specify a
range of pages to use from each input document. The software proceeds
as follows: the first N pages from the first document are copied to
the output, then N pages from the second document are copied, and so
on. This process is repeated as long as there are enough pages in
each document to proceed. If there are not enough pages in one
document, the plug-in will start looping its pages from the
beginning. For example, if you have document A (200 pages long) and
want to insert document B (2 pages long) after every 2 pages of
document A, then the software will repeat the pages from document B
as many times as necessary to complete this task.
You need to specify the correct number of pages to use from each
input document to achieve correct results. For example, if you
selected two files A and B as input documents and specified "1" as
the number of pages to interleave, then the output document will look
as follows: page 1 from A, page 1 from B, page 2 from A, page 2 from
B, page 3 from A, page 3 from B and so on. The resulting PDF document
is opened in Adobe Acrobat as a "MergeResults" document that needs to
be saved to a file by the user.
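The interleaving order, including the looping of a shorter document, can be modeled with lists of page labels. This Python sketch is an approximation written for illustration, with `interleave_pages` as a hypothetical helper. It assumes that every input document has at least one page, that processing stops once the longest document has been used up, and that a shorter document continues cyclically; the guide does not spell out these details.

```python
def interleave_pages(documents, n):
    """Interleave n pages at a time, looping shorter documents from their start."""
    if n <= 0:
        raise ValueError("number of pages to interleave must be greater than zero")
    longest = max(len(pages) for pages in documents)
    output = []
    # Assumption: processing stops once the longest document has been used up.
    for offset in range(0, longest, n):
        for pages in documents:
            for k in range(offset, offset + n):
                if len(pages) == longest and k >= longest:
                    break  # never wrap around in the longest document
                output.append(pages[k % len(pages)])
    return output

doc_a = ["A1", "A2", "A3", "A4"]
doc_b = ["B1", "B2"]
print(interleave_pages([doc_a, doc_b], 1))
print(interleave_pages([doc_a, doc_b], 2))
```

The second call corresponds to the example above on a small scale: both pages of document B are repeated after every 2 pages of document A.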
4. 4. Merging Content of the Folder
All PDF documents in a folder can be merged by adding the folder to
the list of input documents. The software will automatically
determine the list of PDF documents in the folder at run time. Click
the "Add Folder..." button to select a folder. You will be prompted
to specify folder processing options:
- Check the "Include documents from all subfolders" option to add PDF documents from subfolders to the list of input documents.
- Check the "Sort documents alphabetically before merging" option to sort documents by their full paths before merging. This option affects the order of the pages in the output document. There are two possible sorting orders: "Ascending" and "Descending". Subfolder names are also used when determining the sort order of the input documents. For example, suppose you have 3 documents in the input folder: C.pdf, A.pdf, and B.pdf. If the "Ascending" sort order is specified, then the documents will be merged in the following order: A.pdf, B.pdf, and C.pdf. If the "Descending" sort order is selected, then the files will be merged in the opposite order: C.pdf, B.pdf, A.pdf.
You can always modify folder processing settings by double-clicking
on the folder entry in the "Documents To Merge" list.
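Because sorting uses full paths, a file in a subfolder is ordered by its folder name as well as its file name. The Python sketch below shows one way this ordering could work; `merge_order` is a hypothetical helper, the paths are made-up samples, and the case-insensitive comparison is an assumption, since the guide does not state how letter case is handled.

```python
def merge_order(paths, descending=False):
    """Order PDF paths by their full path, as a stand-in for the sort option."""
    # Assumption: the comparison ignores letter case.
    return sorted(paths, key=str.lower, reverse=descending)

files = [
    r"C:\in\C.pdf",
    r"C:\in\A.pdf",
    r"C:\in\B.pdf",
    r"C:\in\sub\A.pdf",
]
print(merge_order(files))
print(merge_order(files, descending=True))
```

In this model the subfolder file sorts after C.pdf in ascending order because the comparison reaches the folder name "sub" before the file name.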
4. 5. Batch Processing (available in AutoSplit Pro and AutoSplit eCTD)
Acrobat software provides a
batch processing facility that allows you to create batch sequences
and apply them to multiple documents at once. AutoSplit plug-in
installs "Split Document" and "Merge Documents"
commands that can be accessed and executed via "Batch Sequences"
dialog. These commands may be used alone or inside another batch
sequence allowing you to incorporate document splitting procedure
into your existing workflow. You need Acrobat Professional in order
to have batch processing available on your system. The section below
outlines the procedure to use when creating a batch sequence for
merging documents. "Merge Documents" command uses its own
list of input folders and documents and essentially ignores input
document that is passed to it by Acrobat batch processing
framework.
4.5.1 Creating a document merging batch procedure
1. Create a new batch sequence (for example "Merge") using the "Advanced / Document Processing / Batch Processing" menu.
2. Set "Run commands on" to the "Existing file" option.
3. Specify any existing file as a dummy input for this sequence. This file is ignored during processing. Acrobat batch processing is designed for pipeline-style processing of a single document. It is not well suited to complex multi-file operations such as file merging.
4. Select "Don't save changes" for the "Select Output location" option. Input and output files are selected inside the "Merge Documents" command.
5. Click the "Select Commands" button.
6. Add the "Document > Merge Documents" command to the sequence.
7. Click the "Edit..." button or double-click on the command you have just added.
8. This will open the "Merge Documents Settings" dialog.
9. Specify all necessary input files and folders.
10. Specify the "Output Document" for the merging results.
11. Click "OK" to save these changes to the command.
12. Close all batch sequence dialogs by clicking on their "OK" buttons.
4.5.2 Running a document merging sequence
1. Open the "Batch Sequences" dialog by selecting "Advanced / Batch Processing" from Acrobat's main menu.
2. Select the desired sequence from the list and click the "Run Sequence" button.
3. Click "OK" in the confirmation dialog to start processing.
5. 1. Why plug-in might not load?
If you have successfully downloaded and installed a plug-in but can't find it in the "Plug-ins" menu, then make sure that "Use only certified plug-ins" option in Acrobat application preferences is not set:
1. Start Adobe Acrobat and select "Edit > Preferences..." from the main application menu.
2. Select the "General" group of settings (Acrobat version 8) or the "Startup" group (Acrobat versions 6 and 7).
3. Make sure that the "Use only certified plug-ins" option is OFF.
4. Save the settings and restart Adobe Acrobat.
Sometimes (if Adobe Acrobat is being run silently by a web browser)
it might be necessary to restart your computer in order to let
Acrobat recognize a newly installed plug-in.
One of the other possible
reasons (but extremely rare) is that there are too many plug-ins
already loaded by Adobe Acrobat viewer. There is a limit to the
number of plug-ins that can be loaded by the viewer at any one time.
The number is variable and dependent on the code generation settings
of all loaded plug-ins. Solution: Remove some of the Adobe Acrobat
plug-ins to reduce number of plug-ins simultaneously used.
5. 2. Contacting technical support
Technical support can be contacted via e-mail at tech@evermap.com or using special form on our web site www.evermap.com. EverMap LLC. doesn’t provide technical support by phone. Technical support is free of charge for all users of our software, including users of the free evaluation version.
Credits: This software contains PCRE library that is Copyright (c) 1997-2003 University of Cambridge.