Syntax Highlighting

AsciiDoc defines a style of listing block known as a source block for adding source code snippets intended to be colorized by a syntax highlighter to a document. Here’s a simple example of a source block:

[source,ruby]
----
puts "Hello, World!"
----

If a language is specified on a listing block, the source style is implied. Therefore, it can be excluded in this case.

[,ruby]
----
puts "Hello, World!"
----

Here’s how we expect it to appear:

puts "Hello, World!"

It’s up to an AsciiDoc processor such as Asciidoctor to apply the syntax highlighting to this source block, a process referred to as source highlighting. Asciidoctor provides adapters that integrate with several popular syntax highlighter libraries to perform this task. It also provides an interface for implementing a custom syntax highlighter adapter.

Syntax highlighter types

Asciidoctor supports two types of syntax highlighters: client-side and build-time. Let’s explore each type and how they work.

A client-side syntax highlighter performs syntax highlighting in the browser as the page is loading. Asciidoctor does not invoke the syntax highlighter itself. Instead, it focuses on adding the assets to the generated HTML so the browser can load the syntax highlighter and run it. For this type of syntax highlighter, Asciidoctor passes the contents of the source block through to the output as is. It also adds metadata to the element so that the syntax highlighter knows to highlight it and which language it is. Unfortunately, Asciidoctor does not process callout numbers in the source block in this case, so they may cause the syntax highlighter to get tripped up.

A build-time syntax highlighter performs syntax highlighting during AsciiDoc conversion. Asciidoctor does invoke the syntax highlighter in this case. It also takes care of hiding the callout numbers from the syntax highlighter, ensuring they are put back in the proper place afterwards. What Asciidoctor emits into the output is the result produced by the syntax highlighter, which are the tokens enclosed in <span> elements to apply color and other formatting (either inline or via CSS classes). These syntax highlighters tend to support more features because Asciidoctor has greater control over the process.

Client-side vs build-time

There are benefits and drawbacks of each type. The benefit of a client-side syntax highlighter is that does not require installing any additional libraries. It also makes conversion faster and makes it produce smaller output since the syntax highlighting is deferred until page load. The main drawback is that callouts in the source block can be mangled by the syntax highlighter or confuse it. The benefits of a build-time syntax highlighter are that you have more control over syntax highlighting and can enable additional features such as line numbers and line highlighting. The main drawback is that it requires installing an extra library, it slows down conversion, and causes the output to be larger.

You should try each syntax highlighter and find the one that works best for you.

Built-in syntax highlighter adapters

Type Syntax Highlighter Required Gem Compatible Converters

Client-side

Highlight.js

n/a

HTML, Reveal.js

Build-time

CodeRay

coderay (>= 1.1)

HTML, PDF, EPUB3, Reveal.js

Pygments

pygments.rb (>= 1.2)

HTML, PDF, EPUB3, Reveal.js

Rouge

rouge (>= 2)

HTML, PDF, EPUB3, Reveal.js

Asciidoctor does not apply syntax highlighting when generating DocBook. The assumption is that this is a task the DocBook toolchain can handle. The DocBook converter generates a <programlisting> element for the source block and passes through the source language as specified in the AsciiDoc. It’s then up to the DocBook toolchain to apply syntax highlighting to the contents of that tag.

You can explore these integrations in depth on the Highlight.js, Rouge, Pygments, and CodeRay pages.

You can also create your own integration by making a Custom Syntax Highlighter Adapter.

Custom subs on source blocks

You should not mix syntax highlighting with AsciiDoc text formatting (i.e., the quotes and macros substitutions). In many converters, these two operations are mutually exclusive.

Example 1. Improper use of custom substitutions on a source block
[,java,subs=+quotes]
----
interface OrderRepository extends CrudRepository<Order,Long> {

  *List<Order>* findByCategory(String category);

  Order findById(long id);
}
----

The additional markup introduced by AsciiDoc text formatting may confuse the syntax highlighter and lead to unexpected results. While it may work in some syntax highlighters in the HTML backend (which perhaps know how to work around the formatting tags), it will most certainly fail when converting to other formats such as PDF.

If you’re going to customize the substitutions on a source block using the subs attribute, you should limit those substitutions to attribute replacements (attributes).

Example 2. Valid use of custom substitutions on a source block
[,java,subs=attributes+]
----
interface {model}Repository extends CrudRepository<{model},Long> {

  {model} findById(long id);
}
----

If you still need to emphasize certain tokens in the block of code, you should do so by creating a custom lexer or formatter for the syntax highlighting library that understands these additional semantics and perhaps even hints. In other words, work through the syntax highlighter so it’s added during the highlighting process, giving the syntax highlighter full knowledge as to what’s going on. Another option is to use a custom syntax highlighter adapter.

Before trying to get the syntax highlighter to recognize new tokens, make sure it doesn’t already recognize them. If it does, it may just be a matter of customizing the syntax highlighter theme to apply different formatting to those tokens.