Migrate from Confluence XHTML to Asciidoctor
You can convert Atlassian Confluence XHTML pages to Asciidoctor using this Groovy script.
The script calls Pandoc to convert one or more HTML files exported from Confluence to AsciiDoc files, so Pandoc must be installed before you run it. If you have trouble running the script, you can run the Pandoc command it contains to convert XHTML files to AsciiDoc manually.
// This script is provided by melix.
// The source can be found at https://gist.github.com/melix/6020336
@Grab('net.sourceforge.htmlcleaner:htmlcleaner:2.4')
import org.htmlcleaner.*

// Input and output directories, relative to the working directory
def src = new File('html').toPath()
def dst = new File('asciidoc').toPath()

def cleaner = new HtmlCleaner()
def props = cleaner.properties
props.translateSpecialEntities = false
def serializer = new SimpleHtmlSerializer(props)

src.toFile().eachFileRecurse { f ->
    def relative = src.relativize(f.toPath())
    def target = dst.resolve(relative)
    if (f.isDirectory()) {
        // Mirror the input directory structure in the output directory
        target.toFile().mkdir()
    } else if (f.name.endsWith('.html')) {
        def tmpHtml = File.createTempFile('clean', 'html')
        println "Converting $relative"
        def result = cleaner.clean(f)
        result.traverse({ tagNode, htmlNode ->
            // Drop class attributes and reduce table cells to plain text
            tagNode?.attributes?.remove 'class'
            if ('td' == tagNode?.name || 'th'==tagNode?.name) {
                tagNode.name='td'
                String txt = tagNode.text
                tagNode.removeAllChildren()
                tagNode.insertChild(0, new ContentNode(txt))
            }
            true
        } as TagNodeVisitor)
        // Write the cleaned HTML to a temporary file for Pandoc to read
        serializer.writeToFile(
            result, tmpHtml.absolutePath, "utf-8"
        )
        "pandoc -f html-native_divs -t asciidoctor $tmpHtml --wrap=none -o ${target}.adoc".execute().waitFor()
        tmpHtml.delete()
    }/* else {
        "cp html/$relative $target".execute()
    }*/
}
This script was created by Cédric Champeau (melix). The source is hosted at https://gist.github.com/melix/6020336.
The script is designed to be run locally on HTML files or directories containing HTML files exported from Confluence.
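If the Groovy script does not run in your environment, the Pandoc command it invokes can be run by hand. The following is a minimal sketch that reuses the same Pandoc options as the script. The function name convert_one is hypothetical, and unlike the script, this sketch does not clean the HTML first (the script removes class attributes and reduces table cells to plain text before calling Pandoc). It also names the output page.adoc, whereas the script appends .adoc to the full input file name.

```shell
# Convert one exported HTML file to AsciiDoc with the Pandoc options used by the script.
# convert_one is a hypothetical helper name, not part of the original script.
convert_one() {
  input="$1"
  # Assumption for this sketch: replace the .html suffix with .adoc
  output="${input%.html}.adoc"
  pandoc -f html-native_divs -t asciidoctor "$input" --wrap=none -o "$output"
}

# Example call (requires Pandoc on the PATH):
#   convert_one html/page.html
```

Because the HTML is passed to Pandoc without cleaning, tables and styled elements may need more manual touch-up than output produced by the script.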
Usage
- Save the script contents to a convert.groovy file in a working directory.
- Make the file executable according to your specific OS requirements.
- Create an html directory for input files and an asciidoc directory for output files, both inside the working directory.
- Place individual files, or a directory containing files, into the aforementioned html directory.
- Run groovy convert to convert the files contained inside the html directory.
- Look for the generated output file in the asciidoc directory and confirm it meets your requirements.
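The steps above can be sketched as shell helpers for a Unix-like system. This is an illustration under assumptions, not part of the original instructions: the function names prepare_workdir and run_conversion are hypothetical, convert.groovy is assumed to be saved in the working directory already, and both groovy and pandoc are assumed to be on the PATH.

```shell
# Create the html/ input and asciidoc/ output directories inside a working directory.
# prepare_workdir is a hypothetical helper name used only for this sketch.
prepare_workdir() {
  mkdir -p "$1/html" "$1/asciidoc"
}

# Run the saved script from the working directory, then list the generated files.
# Assumes convert.groovy already exists in that directory.
run_conversion() {
  (
    cd "$1" || exit 1
    groovy convert && find asciidoc -name '*.adoc'
  )
}

# Example calls:
#   prepare_workdir ~/confluence-migration
#   cp -R /path/to/exported/pages/. ~/confluence-migration/html/
#   run_conversion ~/confluence-migration
```

Running the conversion in a subshell keeps the caller's current directory unchanged, which matters because the script resolves html and asciidoc relative to the directory it is started from.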