|
|
|
|
==========================
|
|
|
|
|
Docutils_ Hacker's Guide
|
|
|
|
|
==========================
|
|
|
|
|
|
|
|
|
|
:Author: Lea Wiemann
|
|
|
|
|
:Contact: docutils-develop@lists.sourceforge.net
|
|
|
|
|
:Revision: $Revision: 7302 $
|
|
|
|
|
:Date: $Date: 2012-01-03 20:23:53 +0100 (Di, 03 Jan 2012) $
|
|
|
|
|
:Copyright: This document has been placed in the public domain.
|
|
|
|
|
|
|
|
|
|
:Abstract: This is the introduction to Docutils for all persons who
|
|
|
|
|
want to extend Docutils in some way.
|
|
|
|
|
:Prerequisites: You have used reStructuredText_ and played around with
|
|
|
|
|
the `Docutils front-end tools`_ before. Some (basic) Python
|
|
|
|
|
knowledge is certainly helpful (though not necessary, strictly
|
|
|
|
|
speaking).
|
|
|
|
|
|
|
|
|
|
.. _Docutils: http://docutils.sourceforge.net/
|
|
|
|
|
.. _reStructuredText: http://docutils.sourceforge.net/rst.html
|
|
|
|
|
.. _Docutils front-end tools: ../user/tools.html
|
|
|
|
|
|
|
|
|
|
.. contents::
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Overview of the Docutils Architecture
|
|
|
|
|
=====================================
|
|
|
|
|
|
|
|
|
|
To give you an understanding of the Docutils architecture, we'll dive
|
|
|
|
|
right into the internals using a practical example.
|
|
|
|
|
|
|
|
|
|
Consider the following reStructuredText file::
|
|
|
|
|
|
|
|
|
|
My *favorite* language is Python_.
|
|
|
|
|
|
|
|
|
|
.. _Python: http://www.python.org/
|
|
|
|
|
|
|
|
|
|
Using the ``rst2html.py`` front-end tool, you would get an HTML output
|
|
|
|
|
which looks like this::
|
|
|
|
|
|
|
|
|
|
[uninteresting HTML code removed]
|
|
|
|
|
<body>
|
|
|
|
|
<div class="document">
|
|
|
|
|
<p>My <em>favorite</em> language is <a class="reference" href="http://www.python.org/">Python</a>.</p>
|
|
|
|
|
</div>
|
|
|
|
|
</body>
|
|
|
|
|
</html>
|
|
|
|
|
|
|
|
|
|
While this looks very simple, it's enough to illustrate all internal
|
|
|
|
|
processing stages of Docutils. Let's see how this document is
|
|
|
|
|
processed from the reStructuredText source to the final HTML output:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Reading the Document
|
|
|
|
|
--------------------
|
|
|
|
|
|
|
|
|
|
The **Reader** reads the document from the source file and passes it
|
|
|
|
|
to the parser (see below). The default reader is the standalone
|
|
|
|
|
reader (``docutils/readers/standalone.py``) which just reads the input
|
|
|
|
|
data from a single text file. Unless you want to do really fancy
|
|
|
|
|
things, there is no need to change that.
|
|
|
|
|
|
|
|
|
|
Since you probably won't need to touch readers, we will just move on
|
|
|
|
|
to the next stage:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Parsing the Document
|
|
|
|
|
--------------------
|
|
|
|
|
|
|
|
|
|
The **Parser** analyzes the the input document and creates a **node
|
|
|
|
|
tree** representation. In this case we are using the
|
|
|
|
|
**reStructuredText parser** (``docutils/parsers/rst/__init__.py``).
|
|
|
|
|
To see what that node tree looks like, we call ``quicktest.py`` (which
|
|
|
|
|
can be found in the ``tools/`` directory of the Docutils distribution)
|
|
|
|
|
with our example file (``test.txt``) as first parameter (Windows users
|
|
|
|
|
might need to type ``python quicktest.py test.txt``)::
|
|
|
|
|
|
|
|
|
|
$ quicktest.py test.txt
|
|
|
|
|
<document source="test.txt">
|
|
|
|
|
<paragraph>
|
|
|
|
|
My
|
|
|
|
|
<emphasis>
|
|
|
|
|
favorite
|
|
|
|
|
language is
|
|
|
|
|
<reference name="Python" refname="python">
|
|
|
|
|
Python
|
|
|
|
|
.
|
|
|
|
|
<target ids="python" names="python" refuri="http://www.python.org/">
|
|
|
|
|
|
|
|
|
|
Let us now examine the node tree:
|
|
|
|
|
|
|
|
|
|
The top-level node is ``document``. It has a ``source`` attribute
|
|
|
|
|
whose value is ``text.txt``. There are two children: A ``paragraph``
|
|
|
|
|
node and a ``target`` node. The ``paragraph`` in turn has children: A
|
|
|
|
|
text node ("My "), an ``emphasis`` node, a text node (" language is "),
|
|
|
|
|
a ``reference`` node, and again a ``Text`` node (".").
|
|
|
|
|
|
|
|
|
|
These node types (``document``, ``paragraph``, ``emphasis``, etc.) are
|
|
|
|
|
all defined in ``docutils/nodes.py``. The node types are internally
|
|
|
|
|
arranged as a class hierarchy (for example, both ``emphasis`` and
|
|
|
|
|
``reference`` have the common superclass ``Inline``). To get an
|
|
|
|
|
overview of the node class hierarchy, use epydoc (type ``epydoc
|
|
|
|
|
nodes.py``) and look at the class hierarchy tree.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Transforming the Document
|
|
|
|
|
-------------------------
|
|
|
|
|
|
|
|
|
|
In the node tree above, the ``reference`` node does not contain the
|
|
|
|
|
target URI (``http://www.python.org/``) yet.
|
|
|
|
|
|
|
|
|
|
Assigning the target URI (from the ``target`` node) to the
|
|
|
|
|
``reference`` node is *not* done by the parser (the parser only
|
|
|
|
|
translates the input document into a node tree).
|
|
|
|
|
|
|
|
|
|
Instead, it's done by a **Transform**. In this case (resolving a
|
|
|
|
|
reference), it's done by the ``ExternalTargets`` transform in
|
|
|
|
|
``docutils/transforms/references.py``.
|
|
|
|
|
|
|
|
|
|
In fact, there are quite a lot of Transforms, which do various useful
|
|
|
|
|
things like creating the table of contents, applying substitution
|
|
|
|
|
references or resolving auto-numbered footnotes.
|
|
|
|
|
|
|
|
|
|
The Transforms are applied after parsing. To see how the node tree
|
|
|
|
|
has changed after applying the Transforms, we use the
|
|
|
|
|
``rst2pseudoxml.py`` tool:
|
|
|
|
|
|
|
|
|
|
.. parsed-literal::
|
|
|
|
|
|
|
|
|
|
$ rst2pseudoxml.py test.txt
|
|
|
|
|
<document source="test.txt">
|
|
|
|
|
<paragraph>
|
|
|
|
|
My
|
|
|
|
|
<emphasis>
|
|
|
|
|
favorite
|
|
|
|
|
language is
|
|
|
|
|
<reference name="Python" **refuri="http://www.python.org/"**>
|
|
|
|
|
Python
|
|
|
|
|
.
|
|
|
|
|
<target ids="python" names="python" ``refuri="http://www.python.org/"``>
|
|
|
|
|
|
|
|
|
|
For our small test document, the only change is that the ``refname``
|
|
|
|
|
attribute of the reference has been replaced by a ``refuri``
|
|
|
|
|
attribute |---| the reference has been resolved.
|
|
|
|
|
|
|
|
|
|
While this does not look very exciting, transforms are a powerful tool
|
|
|
|
|
to apply any kind of transformation on the node tree.
|
|
|
|
|
|
|
|
|
|
By the way, you can also get a "real" XML representation of the node
|
|
|
|
|
tree by using ``rst2xml.py`` instead of ``rst2pseudoxml.py``.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Writing the Document
|
|
|
|
|
--------------------
|
|
|
|
|
|
|
|
|
|
To get an HTML document out of the node tree, we use a **Writer**, the
|
|
|
|
|
HTML writer in this case (``docutils/writers/html4css1.py``).
|
|
|
|
|
|
|
|
|
|
The writer receives the node tree and returns the output document.
|
|
|
|
|
For HTML output, we can test this using the ``rst2html.py`` tool::
|
|
|
|
|
|
|
|
|
|
$ rst2html.py --link-stylesheet test.txt
|
|
|
|
|
<?xml version="1.0" encoding="utf-8" ?>
|
|
|
|
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
|
|
|
|
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
|
|
|
|
|
<head>
|
|
|
|
|
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
|
|
|
|
|
<meta name="generator" content="Docutils 0.3.10: http://docutils.sourceforge.net/" />
|
|
|
|
|
<title></title>
|
|
|
|
|
<link rel="stylesheet" href="../docutils/writers/html4css1/html4css1.css" type="text/css" />
|
|
|
|
|
</head>
|
|
|
|
|
<body>
|
|
|
|
|
<div class="document">
|
|
|
|
|
<p>My <em>favorite</em> language is <a class="reference" href="http://www.python.org/">Python</a>.</p>
|
|
|
|
|
</div>
|
|
|
|
|
</body>
|
|
|
|
|
</html>
|
|
|
|
|
|
|
|
|
|
So here we finally have our HTML output. The actual document contents
|
|
|
|
|
are in the fourth-last line. Note, by the way, that the HTML writer
|
|
|
|
|
did not render the (invisible) ``target`` node |---| only the
|
|
|
|
|
``paragraph`` node and its children appear in the HTML output.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Extending Docutils
|
|
|
|
|
==================
|
|
|
|
|
|
|
|
|
|
Now you'll ask, "how do I actually extend Docutils?"
|
|
|
|
|
|
|
|
|
|
First of all, once you are clear about *what* you want to achieve, you
|
|
|
|
|
have to decide *where* to implement it |---| in the Parser (e.g. by
|
|
|
|
|
adding a directive or role to the reStructuredText parser), as a
|
|
|
|
|
Transform, or in the Writer. There is often one obvious choice among
|
|
|
|
|
those three (Parser, Transform, Writer). If you are unsure, ask on
|
|
|
|
|
the Docutils-develop_ mailing list.
|
|
|
|
|
|
|
|
|
|
In order to find out how to start, it is often helpful to look at
|
|
|
|
|
similar features which are already implemented. For example, if you
|
|
|
|
|
want to add a new directive to the reStructuredText parser, look at
|
|
|
|
|
the implementation of a similar directive in
|
|
|
|
|
``docutils/parsers/rst/directives/``.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Modifying the Document Tree Before It Is Written
|
|
|
|
|
------------------------------------------------
|
|
|
|
|
|
|
|
|
|
You can modify the document tree right before the writer is called.
|
|
|
|
|
One possibility is to use the publish_doctree_ and
|
|
|
|
|
publish_from_doctree_ functions.
|
|
|
|
|
|
|
|
|
|
To retrieve the document tree, call::
|
|
|
|
|
|
|
|
|
|
document = docutils.core.publish_doctree(...)
|
|
|
|
|
|
|
|
|
|
Please see the docstring of publish_doctree for a list of parameters.
|
|
|
|
|
|
|
|
|
|
.. XXX Need to write a well-readable list of (commonly used) options
|
|
|
|
|
of the publish_* functions. Probably in api/publisher.txt.
|
|
|
|
|
|
|
|
|
|
``document`` is the root node of the document tree. You can now
|
|
|
|
|
change the document by accessing the ``document`` node and its
|
|
|
|
|
children |---| see `The Node Interface`_ below.
|
|
|
|
|
|
|
|
|
|
When you're done with modifying the document tree, you can write it
|
|
|
|
|
out by calling::
|
|
|
|
|
|
|
|
|
|
output = docutils.core.publish_from_doctree(document, ...)
|
|
|
|
|
|
|
|
|
|
.. _publish_doctree: ../api/publisher.html#publish_doctree
|
|
|
|
|
.. _publish_from_doctree: ../api/publisher.html#publish_from_doctree
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The Node Interface
|
|
|
|
|
------------------
|
|
|
|
|
|
|
|
|
|
As described in the overview above, Docutils' internal representation
|
|
|
|
|
of a document is a tree of nodes. We'll now have a look at the
|
|
|
|
|
interface of these nodes.
|
|
|
|
|
|
|
|
|
|
(To be completed.)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
What Now?
|
|
|
|
|
=========
|
|
|
|
|
|
|
|
|
|
This document is not complete. Many topics could (and should) be
|
|
|
|
|
covered here. To find out with which topics we should write about
|
|
|
|
|
first, we are awaiting *your* feedback. So please ask your questions
|
|
|
|
|
on the Docutils-develop_ mailing list.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _Docutils-develop: ../user/mailing-lists.html#docutils-develop
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. |---| unicode:: 8212 .. em-dash
|
|
|
|
|
:trim:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
..
|
|
|
|
|
Local Variables:
|
|
|
|
|
mode: indented-text
|
|
|
|
|
indent-tabs-mode: nil
|
|
|
|
|
sentence-end-double-space: t
|
|
|
|
|
fill-column: 70
|
|
|
|
|
End:
|