Optimize the PDF
By default, Asciidoctor PDF does not optimize the PDF it generates or compress its streams. This page covers several approaches you can take to optimize your PDF.
If you’re creating a PDF for Amazon’s Kindle Direct Publishing (KDP), GitLab repository preview, or other online publishers, you’ll likely need to optimize the file before uploading.
In their words, you must tidy up the reference tree and flatten all transparencies (most likely referring to images).
If you don’t do this step, the platform may reject your upload or fail to display it properly.
A simple way to reduce the size of the PDF file is to enable stream compression (using the FlateDecode method).
You can enable this feature by setting the compress attribute on the document:
$ asciidoctor-pdf -a compress filename.adoc
Asciidoctor PDF also provides a flag (and bin script) that uses Ghostscript (via rghost) to optimize and compress the generated PDF with minimal impact on its quality.
You must have Ghostscript (command: gs) and the rghost gem installed to use it.
To install the rghost gem, open a terminal and type the following command.
$ gem install rghost
Here’s an example usage that converts your document and optimizes it:
$ asciidoctor-pdf -a optimize filename.adoc
The command will generate an optimized PDF file that is compliant with the PDF 1.4 specification.
If this command fails because the gs command cannot be found, you’ll need to set it using the GS environment variable.
On Windows, this step is almost always required since the Ghostscript installer does not install the gs command into a standard location.
Here’s an example that shows how you can override the gs command path:
$ GS=/path/to/gs asciidoctor-pdf -a optimize filename.adoc
You’ll need to use the technique for assigning an environment variable that’s relevant for your system.
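If you're not sure what value to assign to GS, a short Ruby sketch like the following can search the directories on your PATH for a Ghostscript executable. The helper name find_ghostscript is hypothetical (it is not part of Asciidoctor PDF), and the candidate names gswin64c and gswin32c are an assumption about what the Windows installer typically provides; adjust them to match your installation.

```ruby
# Hypothetical helper (not part of Asciidoctor PDF): search the directories
# in a PATH-style string for the first Ghostscript executable.
# The candidate names are assumptions; adjust them for your installation.
def find_ghostscript path_env = ENV['PATH'].to_s, names = %w(gs gswin64c gswin32c)
  # On Windows, PATHEXT lists executable extensions (e.g., .EXE); elsewhere use none.
  exts = ENV['PATHEXT'] ? ENV['PATHEXT'].split(';') : ['']
  path_env.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    names.each do |name|
      exts.each do |ext|
        candidate = File.join(dir, %(#{name}#{ext}))
        return candidate if File.file?(candidate) && File.executable?(candidate)
      end
    end
  end
  nil
end

if (gs = find_ghostscript)
  puts %(Ghostscript found; you could set GS=#{gs})
else
  puts 'Ghostscript not found on PATH; set GS to its full path manually'
end
```

If the script reports a path, that path is a reasonable candidate for the GS environment variable. If Ghostscript is installed outside of PATH, you'll still need to locate it yourself.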
In addition to optimizing the PDF file, you can also configure the optimizer to convert the document from standard PDF to PDF/A or PDF/X.
To do so, pass a compliance keyword, such as PDF/A, in the value of the optimize attribute:
$ asciidoctor-pdf -a optimize=PDF/A filename.adoc
The one limitation of generating an optimized file is that it does not allow non-ASCII characters in the document metadata fields (e.g., title, author, subject).
To work around this limitation, you can force Ghostscript to generate a PDF 1.3 file using the pdf-version attribute (or you can generate a PDF/X document):
$ asciidoctor-pdf -a optimize -a pdf-version=1.3 filename.adoc
Downgrading the PDF version may break the PDF if it contains an image that uses color blending or transparency. Specifically, the text on the page can become rasterized, which causes links to stop working and prevents text from being selected. If you’re in this situation, it might be best to try hexapdf instead.
If you’re looking for a smaller file size, you can try reducing the quality of the output file by passing a quality keyword to the optimize attribute.
The optimize attribute accepts quality keywords, including default (the default, also used if the value is empty) and prepress.
Refer to the Ghostscript documentation to learn what settings these presets affect.
$ asciidoctor-pdf -a optimize=prepress filename.adoc
To combine quality and compliance, separate the two keywords with a comma, putting the quality keyword first:
$ asciidoctor-pdf -a optimize=prepress,PDF/A filename.adoc
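If you assemble the command in a script, the ordering rule can be captured in a small helper. The following is only a sketch; the method name optimize_attribute is hypothetical and it does not validate the keywords you pass in.

```ruby
# Hypothetical helper: build the -a optimize argument from an optional
# quality keyword and an optional compliance keyword.
# The quality keyword always goes first, followed by a comma.
def optimize_attribute quality: nil, compliance: nil
  value = [quality, compliance].compact.join(',')
  value.empty? ? '-a optimize' : %(-a optimize=#{value})
end

puts optimize_attribute                                             # -a optimize
puts optimize_attribute(quality: 'prepress')                        # -a optimize=prepress
puts optimize_attribute(compliance: 'PDF/A')                        # -a optimize=PDF/A
puts optimize_attribute(quality: 'prepress', compliance: 'PDF/A')   # -a optimize=prepress,PDF/A
```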
If you’ve already generated the PDF and want to optimize it directly, you can use the bin script:
$ asciidoctor-pdf-optimize filename.pdf
The command will overwrite the PDF file with an optimized version.
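Since the file is overwritten in place, you may want to keep a copy of the original. The following Ruby sketch shows one way to do that. The helper name optimize_with_backup and the .orig suffix are hypothetical choices, and the usage in the comment assumes that asciidoctor-pdf-optimize is available on your PATH.

```ruby
require 'fileutils'

# Hypothetical helper: copy the PDF to a backup file, then yield the original
# path to a block that optimizes it in place. Returns the pair
# [size_before, size_after] in bytes so the savings can be reported.
def optimize_with_backup path, backup_suffix = '.orig'
  FileUtils.cp(path, path + backup_suffix)
  size_before = File.size(path)
  yield path
  [size_before, File.size(path)]
end

# Usage (assumes asciidoctor-pdf-optimize is on your PATH):
#
#   before, after = optimize_with_backup('filename.pdf') do |pdf|
#     system('asciidoctor-pdf-optimize', pdf) or raise 'optimization failed'
#   end
#   puts %(#{before} bytes -> #{after} bytes)
```

Passing the optimization step as a block keeps the helper independent of any particular optimizer, so the same helper could wrap a different command.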
You can also try reducing the quality of the output file using the --quality flag, which accepts the same quality keywords as the optimize attribute.
In both cases, if a file is found with the extension .pdfmark and the same rootname as the input file, it will be used to add metadata to the generated PDF document.
This file is necessary when using versions of Ghostscript earlier than 8.54, which did not automatically preserve this metadata.
You can instruct the converter to automatically generate a pdfmark file by setting the pdfmark attribute.
When using a more recent version of Ghostscript, you do not need to generate a .pdfmark file for this purpose.
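To check whether such a file would be picked up for a given PDF, you can apply the naming rule described above (same rootname, .pdfmark extension). The following Ruby sketch does only that; the helper name sibling_pdfmark is hypothetical and the sketch is not taken from the Asciidoctor PDF source.

```ruby
# Hypothetical helper: return the path of a .pdfmark file that has the same
# rootname as the given PDF and sits in the same directory, or nil if none exists.
def sibling_pdfmark pdf_path
  rootname = File.basename(pdf_path, '.*')
  candidate = File.join(File.dirname(pdf_path), %(#{rootname}.pdfmark))
  File.file?(candidate) ? candidate : nil
end

puts(sibling_pdfmark('filename.pdf') || 'No pdfmark file found next to filename.pdf')
```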
If you have difficulty getting the rghost gem installed, or you aren’t getting the results you expect, you can try the optimizer provided by hexapdf instead.
Another option to optimize the PDF is hexapdf (gem: hexapdf, command: hexapdf). Before introducing it, though, it’s important to point out that its license is AGPL. If that’s okay with you, read on to learn how to use it.
Once the hexapdf gem is installed, you can use it to optimize your PDF as follows:
$ hexapdf optimize --compress-pages --force filename.pdf filename.pdf
This command does not manipulate the images in any way. It merely compresses the objects in the PDF and prunes any unreachable references. But given how much waste Prawn leaves behind, this turns out to reduce the file size substantially.
You can hook this command directly into the converter by providing your own implementation of the Asciidoctor::PDF::Optimizer class.
Start by creating a Ruby file named optimizer-hexapdf.rb, then populate it with the following code:
require 'hexapdf/cli'

class Asciidoctor::PDF::Optimizer
  def initialize(*)
    app = HexaPDF::CLI::Application.new
    app.instance_variable_set :@force, true
    @optimize = app.main_command.commands['optimize']
  end

  def optimize_file path
    options = @optimize.instance_variable_get :@out_options
    options.compress_pages = true
    #options.object_streams = :preserve
    #options.xref_streams = :preserve
    #options.streams = :preserve # or :uncompress
    @optimize.execute path, path
    nil
  rescue
    # retry without page compression, which can sometimes fail
    options.compress_pages = false
    @optimize.execute path, path
    nil
  end
end
To activate your custom optimizer, load this file when invoking the asciidoctor-pdf command using the -r flag, and set the optimize attribute as well using the -a flag:
$ asciidoctor-pdf -r ./optimizer-hexapdf.rb -a optimize filename.adoc
Now you can convert and optimize all in one go.
To see more options that hexapdf optimize offers, run:
$ hexapdf help optimize
For example, to make the source of the PDF a bit more readable (though less optimized), set the stream-related options to preserve (e.g., --streams preserve from the CLI or options.streams = :preserve from the API).
You can also disable page compression (e.g., --no-compress-pages from the CLI or options.compress_pages = false from the API).
hexapdf also allows you to add password protection to your PDF, if that’s something you’re interested in doing.
Instead of optimizing the objects in the vector PDF, you may want to rasterize the PDF. Rasterizing the PDF prevents any of the text or other objects from being selected, similar to a scanned document.
Asciidoctor PDF doesn’t provide built-in support for rasterizing the generated PDF. However, you can use Ghostscript to flatten all the text in the PDF, thus preventing it from being selected.
$ gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dNoOutputFonts -r300 -o output.pdf input.pdf
You can adjust the value of the -r option (the density) to get a higher or lower quality result.
Alternately, you can use the convert command from ImageMagick to convert each page in the PDF to an image.
$ convert -density 300 -quality 100 input.pdf output.pdf
Yet another option is to combine Ghostscript and ImageMagick to produce a PDF with pages converted to images.
$ gs -dBATCH -dNOPAUSE -sDEVICE=png16m -o /tmp/tmp-%02d.png -r300 input.pdf
$ convert /tmp/tmp-*.png output.pdf
$ rm -f /tmp/tmp-*.png
Using Ghostscript to handle the rasterization produces a much smaller output file. The drawback of using Ghostscript in this way is that it has to use intermediate files.
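If you'd rather not manage the intermediate files by hand, you could wrap the two steps in a script that writes them to a temporary directory. The following Ruby sketch is one possible approach, not something provided by Asciidoctor PDF. The helper names gs_png_command and rasterize, as well as the page-%04d.png file pattern, are hypothetical, and the sketch assumes the gs and convert commands are on your PATH.

```ruby
require 'tmpdir'

# Hypothetical helper: build the Ghostscript command that renders each page
# of the input PDF to a PNG file inside the given working directory.
def gs_png_command input, workdir, density = 300
  ['gs', '-dBATCH', '-dNOPAUSE', '-sDEVICE=png16m', %(-r#{density}),
    '-o', File.join(workdir, 'page-%04d.png'), input]
end

# Hypothetical helper: rasterize input to output using gs and convert.
# The temporary directory and the PNG files inside it are removed
# automatically when the block exits, even if one of the steps fails.
def rasterize input, output, density = 300
  Dir.mktmpdir('rasterize-') do |workdir|
    system(*gs_png_command(input, workdir, density)) or raise 'gs failed'
    # Zero-padded page numbers keep the pages in order when sorted by name.
    pages = Dir[File.join(workdir, 'page-*.png')].sort
    system('convert', *pages, output) or raise 'convert failed'
  end
  output
end

# Usage (assumes gs and convert are installed):
#
#   rasterize 'input.pdf', 'output.pdf'
```

Passing the arguments to system as separate strings bypasses the shell, so the page list is expanded in Ruby instead of relying on a shell glob.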