Character Replacement Substitutions

The character replacement substitution step processes textual characters such as marks, arrows and dashes and replaces them with the decimal format of their Unicode code point, i.e., their numeric character reference. The replacements step depends on the substitutions completed by the special characters step.

Table 1. Textual symbol replacements

| Name | Syntax | Unicode Replacement | Rendered | Notes |
|---|---|---|---|---|
| Copyright | (C) | &#169; | © | |
| Registered | (R) | &#174; | ® | |
| Trademark | (TM) | &#8482; | ™ | |
| Em dash | -- | &#8212; | — | Only replaced if between two word characters, between a word character and a line boundary, or flanked by spaces. When flanked by space characters (e.g., a -- b), the normal spaces are replaced by thin spaces (&#8201;). Otherwise, the em dash is followed by a zero-width space (&#8203;) to provide a break opportunity. |
| Ellipsis | ... | &#8230; | … | The ellipsis is followed by a zero-width space (&#8203;) to provide a break opportunity. |
| Single right arrow | -> | &#8594; | → | |
| Double right arrow | => | &#8658; | ⇒ | |
| Single left arrow | <- | &#8592; | ← | |
| Double left arrow | <= | &#8656; | ⇐ | |
| Typographic apostrophe | Sam's | Sam&#8217;s | Sam’s | The typewriter apostrophe is replaced with the typographic (aka curly or smart) apostrophe. |
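The rules in Table 1 can be approximated in a few lines of Python. This is a minimal sketch, not the processor's implementation: the function name `replace_symbols` and the patterns are assumptions made for illustration, the arrows are left out to keep it short, and the "word character next to a line boundary" case for the em dash is not handled.

```python
import re

# Illustrative rules only; a real processor uses more precise patterns.
SIMPLE_REPLACEMENTS = [
    ("(C)", "&#169;"),
    ("(R)", "&#174;"),
    ("(TM)", "&#8482;"),
    ("...", "&#8230;&#8203;"),  # ellipsis followed by a zero-width space
]


def replace_symbols(text):
    """Replace a subset of the textual symbols with numeric character references."""
    for syntax, reference in SIMPLE_REPLACEMENTS:
        text = text.replace(syntax, reference)
    # Em dash flanked by spaces: the normal spaces become thin spaces.
    text = text.replace(" -- ", "&#8201;&#8212;&#8201;")
    # Em dash between two word characters: followed by a zero-width space.
    text = re.sub(r"(?<=\w)--(?=\w)", "&#8212;&#8203;", text)
    # Typewriter apostrophe between word characters becomes typographic.
    text = re.sub(r"(?<=\w)'(?=\w)", "&#8217;", text)
    return text
```

For instance, `replace_symbols("a -- b")` yields the em dash reference wrapped in two thin-space references, while `replace_symbols("yes--no")` yields the em dash reference followed by a zero-width space reference.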

This substitution step also recognizes HTML and XML character references, as well as decimal and hexadecimal Unicode code points, and replaces each of them with its corresponding decimal Unicode code point.

For example, to produce the § symbol you could write &sect;, &#x00A7;, or &#167;. When the document is processed, replacements will replace the section symbol reference, regardless of whether it is a named character reference or a numeric character reference, with &#167;. In turn, &#167; will display as §.
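The normalization described above can be sketched in Python. This is an illustration under stated assumptions, not the processor's code: the function name `to_decimal_references` is hypothetical, and named references are resolved with the `name2codepoint` table from Python's standard `html.entities` module, which may not match the lookup table a given AsciiDoc processor uses.

```python
import re
from html.entities import name2codepoint

# Matches &#xHEX;, &#DEC;, or &name; (hypothetical pattern for this sketch).
REFERENCE = re.compile(r"&(?:#x([0-9A-Fa-f]+)|#([0-9]+)|([A-Za-z][A-Za-z0-9]*));")


def to_decimal_references(text):
    """Rewrite every recognized character reference in decimal form."""

    def convert(match):
        hex_digits, dec_digits, name = match.groups()
        if hex_digits:
            code_point = int(hex_digits, 16)
        elif dec_digits:
            code_point = int(dec_digits)
        elif name in name2codepoint:
            code_point = name2codepoint[name]
        else:
            return match.group(0)  # unknown name: leave it untouched
        return f"&#{code_point};"

    return REFERENCE.sub(convert, text)
```

With this sketch, the three spellings of the section symbol reference all come out as `&#167;`, and a name that is not in the lookup table is passed through unchanged.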

An AsciiDoc processor allows you to use any of the named character references (aka named entities) defined in HTML (e.g., &euro; resolves to €). However, using named character references can cause problems when generating non-HTML output such as PDF because the lookup table needed to resolve these names may not be defined. The recommendation is to avoid using named character references, with the exception of the well-known ones defined in XML (i.e., lt, gt, amp, quot, apos). Instead, use numeric character references (e.g., &#8364;).

Anatomy of a character reference

A character reference is a standard sequence of characters that is substituted for a single character by an AsciiDoc processor. There are two types of character references: named character references and numeric character references.

A named character reference (often called a character entity reference) is a short name that refers to a character (i.e., glyph). To make the reference, the name must be prefixed with an ampersand (&) and end with a semicolon (;).

For example:

  • &dagger; displays as †

  • &euro; displays as €

  • &loz; displays as ◊

Numeric character references are the decimal or hexadecimal Universal Character Set/Unicode code points which refer to a character.

  • The decimal code point references are prefixed with an ampersand (&), followed by a hash (#), and end with a semicolon (;).

  • Hexadecimal code point references are prefixed with an ampersand (&), followed by a hash (#), followed by a lowercase x, and end with a semicolon (;).

For example:

  • &#x2020; or &#8224; displays as †

  • &#x20AC; or &#8364; displays as €

  • &#x25CA; or &#9674; displays as ◊
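To see that the named, decimal, and hexadecimal forms all refer to the same character, the references above can be checked with Python's standard library. The helper `references_for` is a hypothetical name used only for this sketch; `html.unescape` is Python's own resolver, not an AsciiDoc processor.

```python
import html


def references_for(character):
    """Return the decimal and hexadecimal numeric references for one character."""
    code_point = ord(character)
    return f"&#{code_point};", f"&#x{code_point:X};"


# Each group of references resolves to a single character.
dagger = {html.unescape(ref) for ref in ("&dagger;", "&#x2020;", "&#8224;")}
euro = {html.unescape(ref) for ref in ("&euro;", "&#x20AC;", "&#8364;")}
lozenge = {html.unescape(ref) for ref in ("&loz;", "&#x25CA;", "&#9674;")}
```

Going the other way, `references_for("€")` returns `("&#8364;", "&#x20AC;")`, which is a quick way to find the numeric reference for a character you can type or paste.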

Developers may be more familiar with using Unicode escape sequences to perform text substitutions. For example, to produce an @ sign using a Unicode escape sequence, you would prefix the hexadecimal Unicode code point with a backslash (\) and an uppercase or lowercase u, i.e., \u0040. However, the AsciiDoc syntax doesn’t recognize Unicode escape sequences at this time.
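If existing text already contains Unicode escape sequences, one workaround is to convert them to numeric character references before the text reaches the AsciiDoc processor. The following is a minimal sketch of such a preprocessing step; the function name `escapes_to_references` is hypothetical, and it only handles the four-digit form with a lowercase u.

```python
import re

# A backslash, a lowercase u, and exactly four hexadecimal digits.
UNICODE_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def escapes_to_references(text):
    """Rewrite \\uXXXX escape sequences as decimal numeric character references."""
    return UNICODE_ESCAPE.sub(lambda m: f"&#{int(m.group(1), 16)};", text)
```

Applied to the text `user\u0040example.org`, this produces `user&#64;example.org`, which the replacements step can then handle like any other numeric character reference.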

AsciiDoc also provides built-in attributes for representing some common symbols. These attributes and their corresponding output are listed in Character Replacement Attributes Reference.

Default replacements substitution

Table 2 lists the specific blocks and inline elements the replacements substitution step applies to automatically.

Table 2. Blocks and inline elements subject to the replacements substitution

| Blocks and elements | Substitution step applied by default |
|---|---|
| Attribute entry values | No |
| Comments | No |
| Examples | Yes |
| Headers | No |
| Literal, listings, and source | No |
| Macros | Yes (except passthrough macros) |
| Open | Yes |
| Paragraphs | Yes |
| Passthrough blocks | No |
| Quotes and verses | Yes |
| Sidebars | Yes |
| Tables | Varies |
| Titles | Yes |

replacements substitution value

The replacements substitution step can be modified on blocks and inline elements. For blocks, the step’s name, replacements, can be assigned to the subs attribute. For inline elements, the built-in values r or replacements can be applied to inline text to add the replacements substitution step.

The replacements step depends on the substitutions completed by the special characters step. This is important to keep in mind when applying the replacements value to blocks and inline elements.
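One way to picture this dependency is with the arrow replacements. The Python sketch below assumes that the arrow patterns are written against text in which the special characters step has already turned `<` and `>` into `&lt;` and `&gt;`; the function names are hypothetical, and `html.escape` stands in for the special characters step. It illustrates the ordering only and is not the processor's code.

```python
import html

# Arrow patterns expressed against already-escaped text (an assumption of this sketch).
ARROWS = [
    ("-&gt;", "&#8594;"),
    ("=&gt;", "&#8658;"),
    ("&lt;-", "&#8592;"),
    ("&lt;=", "&#8656;"),
]


def replace_arrows(text):
    """Replace escaped arrow sequences with numeric character references."""
    for escaped, reference in ARROWS:
        text = text.replace(escaped, reference)
    return text


def special_characters_then_replacements(text):
    """Run the stand-in special characters step, then the arrow replacements."""
    escaped = html.escape(text, quote=False)  # <, >, & become &lt;, &gt;, &amp;
    return replace_arrows(escaped)
```

Under that assumption, `replace_arrows("a -> b")` leaves the text unchanged because the raw `>` never matches, whereas `special_characters_then_replacements("a -> b")` produces the right arrow reference. This is why enabling replacements without special characters on a block or inline element may not give the result you expect.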