Optimize the PDF

By default, Asciidoctor PDF does not optimize the PDF it generates and it does not compress the streams within it. This page covers several approaches you can take to optimize and compress the PDF output.

If you’re creating a PDF for Amazon’s Kindle Direct Publishing (KDP), GitLab repository preview, or other online publishers, you’ll likely need to optimize the file before uploading. In their words, you must tidy up the reference tree and flatten all transparencies (mostly likely referring to images). If you don’t do this step, the platform may reject your upload or fail to display it properly. (For KDP, -a optimize works best. For GitLab repository preview, either -a optimize or hexapdf optimize will do the trick.)

Enable stream compression

A simple way to reduce the size of the PDF file is to enable stream compression (using the FlateDecode method). You can enable this feature by setting the compress attribute on the document:

$ asciidoctor-pdf -a compress filename.adoc

For a more thorough optimization, you can use the integrated optimizer or HexaPDF.

RGhost

Asciidoctor PDF also provides a flag (and bin script) that uses Ghostscript (via Ruby Ghostscript aka RGhost) to optimize and compress the generated PDF with minimal impact on its quality. You must have Ghostscript (command: gs) and RGhost (gem: rghost) installed to use it.

Install RGhost

First, install the rghost gem. Refer to the table below for instructions on how to install RGhost. If you use Bundler to manage your gems in a Gemfile, add the entry listed in the Bundler column. Otherwise, run the command in the gem command column.

Library gem name Bundler gem command

RGhost

rghost

gem 'rghost'
gem install rghost

Convert and optimize

Here’s an example usage that converts your document and optimizes it:

$ asciidoctor-pdf -a optimize filename.adoc

The command will generate an optimized PDF file that’s compliant with the PDF 1.4 specification.

If this command fails because the gs command cannot be found, you’ll need to specify the path using the GS environment variable. On Windows, this step is almost always required since the Ghostscript installer does not install the gs command into a standard location that RGhost can locate. Here’s an example that shows how you can override the gs command path:

$ GS=/path/to/gs asciidoctor-pdf -a optimize filename.adoc

You’ll need to use the technique for assigning an environment variable that’s appropriate for your system. Here’s how we set the GS environment variable using PowerShell in CI:

$ echo "$(& where.exe /R 'C:\Program Files\gs' gswin64c.exe)" | Out-File -FilePath $env:GS -Encoding utf8 -Append

In addition to optimizing the PDF file, you can also configure the optimizer to convert the document from standard PDF to PDF/A or PDF/X. To do so, you can pass one of the following compliance keywords in the value of the optimize attribute: PDF/A, PDF/A-1, PDF/A-2, PDF/A-3, PDF/X, PDF/X-1, or PDF/X-3.

$ asciidoctor-pdf -a optimize=PDF/A filename.adoc

The one limitation of generating an optimized file is that it does not allow non-ASCII characters in the document metadata fields (i.e., title, author, subject, etc.). To work around this limitation, you can force Ghostscript to generate a PDF 1.3 file using the pdf-version attribute (or you can generate a PDF/X document):

$ asciidoctor-pdf -a optimize -a pdf-version=1.3 filename.adoc
Downgrading the PDF version may break the PDF if it contains an image that uses color blending or transparency. Specifically, the text on the page can become rasterized, which causes links to stop working and prevents text from being selected. If you’re in this situation, it might be best to try HexaPDF instead.

If you’re looking for a smaller file size, you can try reducing the quality of the output file by passing a quality keyword to the optimize attribute (e.g., --optimize=screen). The optimize attribute accepts the following keywords: default (default, same if value is empty), screen, ebook, printer, and prepress. Refer to the Ghostscript documentation to learn what settings these presets affect.

$ asciidoctor-pdf -a optimize=prepress filename.adoc

To combine the quality and compliance, you separate the keywords using a comma, with the quality keyword first:

$ asciidoctor-pdf -a optimize=prepress,PDF/A filename.adoc

If you’ve already generated the PDF, and want to optimize it directly, you can use the bin script:

$ asciidoctor-pdf-optimize filename.pdf

The command will overwrite the PDF file with an optimized version. You can also try reducing the quality of the output file using the --quality flag (e.g., --quality screen). The --quality flag accepts the following keywords: default (default), screen, ebook, printer, and prepress.

In both cases, if a file is found with the extension .pdfmark and the same rootname as the input file, it will be used to add metadata to the generated PDF document. This file is necessary when using versions of Ghostscript < 8.54, which did not automatically preserve this metadata. You can instruct the converter to automatically generate a pdfmark file by setting the pdfmark attribute (i.e., -a pdfmark) When using a more recent version of Ghostscript, you do not need to generate a .pdfmark file for this purpose.

The asciidoctor-pdf-optimize command is not guaranteed to reduce the size of the PDF file. It may actually make the PDF larger. You should probably only consider using it if the file size of the original PDF is several megabytes.

If you have difficulty getting the rghost gem installed, or you aren’t getting the results you expect, you can try the optimizer provided by HexaPDF instead.

HexaPDF

Another option to optimize the PDF is HexaPDF (gem: hexapdf, command: hexapdf). Before introducing it into your stack, it’s important to emphasize that its license is AGPL. If that’s okay with you, read on to learn how to use it.

Install HexaPDF

First, install the hexapdf gem. Refer to the table below for instructions on how to install HexaPDF. If you use Bundler to manage your gems in a Gemfile, add the entry listed in the Bundler column. Otherwise, run the command in the gem command column.

Library gem name Bundler gem command

HexaPDF

hexapdf

gem 'hexapdf'
gem install hexapdf

Compress a PDF

You can then use it to optimize your PDF as follows:

$ hexapdf optimize --compress-pages --force filename.pdf filename.pdf

This command does not manipulate the images in any way. It merely compresses the objects in the PDF and prunes any unreachable references. But given how much waste Prawn leaves behind, this turns out to reduce the file size substantially.

You can hook this command directly into the converter by providing your own implementation of the Optimizer class. Start by creating a Ruby file named optimizer-hexapdf.rb, then populate it with the following code:

optimizer-hexapdf.rb
require 'hexapdf/cli'

class Asciidoctor::PDF::Optimizer
  def initialize(*)
    app = HexaPDF::CLI::Application.new
    app.instance_variable_set :@force, true
    @optimize = app.main_command.commands['optimize']
  end

  def optimize_file path
    options = @optimize.instance_variable_get :@out_options
    options.compress_pages = true
    #options.object_streams = :preserve
    #options.xref_streams = :preserve
    #options.streams = :preserve # or :uncompress
    @optimize.execute path, path
    nil
  rescue
    # retry without page compression, which can sometimes fail
    options.compress_pages = false
    @optimize.execute path, path
    nil
  end
end

To activate your custom optimizer, load this file when invoking the asciidoctor-pdf using the -r flag and set the optimize attribute as well using the -a flag.

$ asciidoctor-pdf -r ./optimizer-hexapdf.rb -a optimize filename.adoc

Now you can convert and optimize all in one go.

To see more options that hexapdf optimize offers, run:

$ hexapdf help optimize

For example, to make the source of the PDF a bit more readable (though less optimized), set the stream-related options to preserve (e.g., --streams preserve from the CLI or options.streams = :preserve from the API). You can also disable page compression (e.g., --no-compress-pages from the CLI or options.compress_pages = false from the API).

hexapdf also allows you to add password protection to your PDF, if that’s something you’re interested in doing.

Rasterizing the PDF

Instead of optimizing the objects in the vector PDF, you may want to rasterize the PDF instead. Rasterizing the PDF prevents any of the text or other objects from being selected, similar to a scanned document.

Asciidoctor PDF doesn’t provide built-in support for rasterizing the generated PDF. However, you can use Ghostscript to flatten all the text in the PDF, thus preventing it from being selected.

$ gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dNoOutputFonts -r300 -o output.pdf input.pdf

You can adjust the value of the -r option (the density) to get a higher or lower quality result.

Alternately, you can use the convert command from ImageMagick to convert each page in the PDF to an image.

$ convert -density 300 -quality 100 input.pdf output.pdf

Yet another option is to combine Ghostscript and ImageMagick to produce a PDF with pages converted to images.

$ gs -dBATCH -dNOPAUSE -sDEVICE=png16m -o /tmp/tmp-%02d.png -r300 input.pdf
  convert /tmp/tmp-*.png output.pdf
  rm -f /tmp/tmp-*.png

Using Ghostscript to handle the rasterization produces a much smaller output file. The drawback of using Ghostscript in this way is that it has to use intermediate files.