==================================================
A Record of reStructuredText Syntax Alternatives
==================================================

:Author: David Goodger
:Contact: docutils-develop@lists.sourceforge.net
:Revision: $Revision: 7383 $
:Date: $Date: 2012-03-19 18:04:49 +0100 (Mo, 19 Mär 2012) $
:Copyright: This document has been placed in the public domain.

The following are ideas, alternatives, and justifications that were
considered for reStructuredText syntax, which did not originate with
Setext_ or StructuredText_. For an analysis of constructs which *did*
originate with StructuredText or Setext, please see `Problems With
StructuredText`_. See the `reStructuredText Markup Specification`_
for full details of the established syntax.

The ideas are divided into sections:

* Implemented_: already done. The issues and alternatives are
  recorded here for posterity.

* `Not Implemented`_: these ideas won't be implemented.

* Tabled_: these ideas should be revisited in the future.

* `To Do`_: these ideas should be implemented. They're just waiting
  for a champion to resolve issues and get them done.

* `... Or Not To Do?`_: possible but questionable. These probably
  won't be implemented, but you never know.

.. _Setext: http://docutils.sourceforge.net/mirror/setext.html
.. _StructuredText:
   http://www.zope.org/DevHome/Members/jim/StructuredTextWiki/FrontPage
.. _Problems with StructuredText: problems.html
.. _reStructuredText Markup Specification:
   ../../ref/rst/restructuredtext.html


.. contents::


-------------
Implemented
-------------

Field Lists
===========

Prior to the syntax for field lists being finalized, several
alternatives were proposed.

1. Unadorned RFC822_ everywhere::

       Author: Me
       Version: 1

   Advantages: clean, precedent (RFC822-compliant). Disadvantage:
   ambiguous (these paragraphs are a prime example).

   Conclusion: rejected.

2. Special case: use unadorned RFC822_ for the very first or very last
   text block of a document::

       """
       Author: Me
       Version: 1

       The rest of the document...
       """

   Advantages: clean, precedent (RFC822-compliant). Disadvantages:
   special case, flat (unnested) field lists only, still ambiguous::

       """
       Usage: cmdname [options] arg1 arg2 ...

       We obviously *don't* want the line above to be interpreted as a
       field list item. Or do we?
       """

   Conclusion: rejected for the general case, accepted for specific
   contexts (PEPs, email).

3. Use a directive::

       .. fields::

          Author: Me
          Version: 1

   Advantages: explicit and unambiguous, RFC822-compliant.
   Disadvantage: cumbersome.

   Conclusion: rejected for the general case (but such a directive
   could certainly be written).

4. Use Javadoc-style::

       @Author: Me
       @Version: 1
       @param a: integer

   Advantages: unambiguous, precedent, flexible. Disadvantages:
   non-intuitive, ugly, not RFC822-compliant.

   Conclusion: rejected.

5. Use leading colons::

       :Author: Me
       :Version: 1

   Advantages: unambiguous, obvious (*almost* RFC822-compliant),
   flexible, perhaps even elegant. Disadvantages: no precedent, not
   quite RFC822-compliant.

   Conclusion: accepted!

6. Use double colons::

       Author:: Me
       Version:: 1

   Advantages: unambiguous, obvious? (*almost* RFC822-compliant),
   flexible, similar to syntax already used for literal blocks and
   directives. Disadvantages: no precedent, not quite
   RFC822-compliant, similar to syntax already used for literal blocks
   and directives.

   Conclusion: rejected because of the syntax similarity & conflicts.

Why is RFC822 compliance important? It's a universal Internet
standard, and super obvious. Also, I'd like to support the PEP format
(ulterior motive: get PEPs to use reStructuredText as their standard).
But it *would* be easy to get used to an alternative (easy even to
convert PEPs; probably harder to convert python-deviants ;-).

Unfortunately, without well-defined context (such as in email headers:
RFC822 only applies before any blank lines), the RFC822 format is
ambiguous. It is very common in ordinary text. To implement field
lists unambiguously, we need explicit syntax.

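
To make the ambiguity concrete, here is a small sketch using Python's
standard ``re`` module. Both patterns are hypothetical simplifications
written for this illustration (they are not the field-marker regex
docutils actually uses): one approximates an unadorned RFC822 header
line, the other the leading-colon field marker that was accepted.

```python
import re

# Hypothetical, simplified patterns for illustration only; docutils'
# real field-marker recognition is more elaborate.
RFC822_LINE = re.compile(r"^([A-Za-z][\w-]*): +(.*)$")      # Author: Me
FIELD_MARKER = re.compile(r"^:([^:\s][^:]*):(?: +(.*))?$")  # :Author: Me


def classify(line):
    """Report which of the two candidate syntaxes a line would match."""
    return {
        "rfc822": bool(RFC822_LINE.match(line)),
        "field": bool(FIELD_MARKER.match(line)),
    }


samples = [
    "Author: Me",
    ":Author: Me",
    "Usage: cmdname [options] arg1 arg2 ...",
    "Note: this sentence is ordinary prose.",
]
for line in samples:
    print(f"{line!r:45} {classify(line)}")
```

The unadorned pattern fires on the usage line and on ordinary prose
alike, which is the ambiguity described above; the leading-colon form
matches only the line that was meant as a field.
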
The following question was posed in a footnote:

   Should "bibliographic field lists" be defined at the parser level,
   or at the DPS transformation level? In other words, are they
   reStructuredText-specific, or would they also be applicable to
   another (many/every other?) syntax?

The answer is that bibliographic fields are a
reStructuredText-specific markup convention. Other syntaxes may
implement the bibliographic elements explicitly. For example, there
would be no need for such a transformation for an XML-based markup
syntax.

.. _RFC822: http://www.rfc-editor.org/rfc/rfc822.txt

Interpreted Text "Roles"
========================

The original purpose of interpreted text was as a mechanism for
descriptive markup, to describe the nature or role of a word or
phrase. For example, in XML we could say "<function>len</function>"
to mark up "len" as a function. It is envisaged that within Python
docstrings (inline documentation in Python module source files, the
primary market for reStructuredText) the role of a piece of
interpreted text can be inferred implicitly from the context of the
docstring within the program source. For other applications, however,
the role may have to be indicated explicitly.

Interpreted text is enclosed in single backquotes (`).

1. Initially, it was proposed that an explicit role could be indicated
   as a word or phrase within the enclosing backquotes:

   - As a prefix, separated by a colon and whitespace::

         `role: interpreted text`

   - As a suffix, separated by whitespace and a colon::

         `interpreted text :role`

   There are problems with the initial approach:

   - There could be ambiguity with interpreted text containing colons.
     For example, an index entry of "Mission: Impossible" would
     require a backslash-escaped colon.

   - The explicit role is descriptive markup, not content, and will
     not be visible in the processed output. Putting it inside the
     backquotes doesn't feel right; the *role* isn't being quoted.

2. Tony Ibbs suggested that the role be placed outside the
   backquotes::

       role:`prefix` or `suffix`:role

   This removes the embedded-colons ambiguity, but limits the role
   identifier to a single word (whitespace would be illegal).
   Since roles are not meant to be visible after processing, the lack
   of whitespace support is not important.

   The suggested syntax remains ambiguous with respect to ratios and
   some writing styles. For example, suppose there is a "signal"
   identifier, and we write::

       ...calculate the `signal`:noise ratio.

   "noise" looks like a role.

3. As an improvement on #2, we can bracket the role with colons::

       :role:`prefix` or `suffix`:role:

   This syntax is similar to that of field lists, which is fine since
   both are doing similar things: describing.

   This is the syntax chosen for reStructuredText.

4. Another alternative is two colons instead of one::

       role::`prefix` or `suffix`::role

   But this is used for analogies ("A:B::C:D": "A is to B as C is to
   D").

   Both alternatives #2 and #4 lack delimiters on both sides of the
   role, making the role difficult for the reader to parse.

5. Some kind of bracketing could be used:

   - Parentheses::

         (role)`prefix` or `suffix`(role)

   - Braces::

         {role}`prefix` or `suffix`{role}

   - Square brackets::

         [role]`prefix` or `suffix`[role]

   - Angle brackets::

         <role>`prefix` or `suffix`<role>

     (The overlap of \*ML tags with angle brackets would be too
     confusing and precludes their use.)

Syntax #3 was chosen for reStructuredText.

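
The parsing difference between proposals #2 and #3 can be shown with a
rough sketch. The three regular expressions below are hypothetical
simplifications (they ignore escaping and the inline-markup
recognition rules) and are not the patterns docutils uses; they only
contrast an open-ended suffix role with a colon-bracketed one.

```python
import re

# Hypothetical, simplified patterns for the role proposals discussed above.
SUFFIX_ALT2 = re.compile(r"`(?P<text>[^`]+)`:(?P<role>\w+)")          # `text`:role
SUFFIX_ALT3 = re.compile(r"`(?P<text>[^`]+)`:(?P<role>\w[\w.+-]*):")  # `text`:role:
PREFIX_ALT3 = re.compile(r":(?P<role>\w[\w.+-]*):`(?P<text>[^`]+)`")  # :role:`text`


def find_roles(pattern, text):
    """Return (role, interpreted text) pairs found by one pattern."""
    return [(m["role"], m["text"]) for m in pattern.finditer(text)]


ratio = "...calculate the `signal`:noise ratio."
print(find_roles(SUFFIX_ALT2, ratio))  # "noise" is mistaken for a role
print(find_roles(SUFFIX_ALT3, ratio))  # no closing colon, so no role
print(find_roles(PREFIX_ALT3, "Call :func:`len` here."))
```

Requiring the closing colon is what keeps a "signal:noise" style ratio
from being read as a role, while the prefix form stays unambiguous.
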
Comments
========

A problem with comments (actually, with all indented constructs) is
that they cannot be followed by an indented block -- a block quote --
without swallowing it up.

I thought that perhaps comments should be one-liners only. But would
this mean that footnotes, hyperlink targets, and directives must then
also be one-liners? Not a good solution.

Tony Ibbs suggested a "comment" directive. I added that we could
limit a comment to a single text block, and that a "multi-block
comment" could use "comment-start" and "comment-end" directives. This
would remove the indentation incompatibility. A "comment" directive
automatically suggests "footnote" and (hyperlink) "target" directives
as well. This could go on forever! Bad choice.

Garth Kidd suggested that an "empty comment", a ".." explicit markup
start with nothing on the first line (except possibly whitespace) and
a blank line immediately following, could serve as an "unindent". An
empty comment does **not** swallow up indented blocks following it,
so block quotes are safe. "A tiny but practical wart." Accepted.

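
The effect of the empty-comment rule can be sketched with a
deliberately simplified model in Python. The function
``parse_comment`` is hypothetical and is not how the docutils parser
is structured: it lets an ordinary comment absorb every following
blank or indented line, but when the comment is empty (".." alone,
followed by a blank line) it absorbs nothing, leaving the indented
block for the caller to treat as a block quote.

```python
def parse_comment(lines):
    """Split lines starting with an explicit markup start ("..") into
    (comment_body, remaining_lines) under a simplified model."""
    first = lines[0]
    if not first.startswith(".."):
        raise ValueError("not an explicit markup start")
    if first.strip() == ".." and (len(lines) == 1 or not lines[1].strip()):
        # Empty comment: absorb nothing, so a following indented block
        # survives as a block quote.
        return [], lines[1:]
    body = [first[2:].strip()]
    i = 1
    # An ordinary comment swallows every following blank or indented line.
    while i < len(lines) and (not lines[i].strip() or lines[i].startswith(" ")):
        body.append(lines[i].strip())
        i += 1
    while body and not body[-1]:
        body.pop()  # drop trailing blank lines from the body
    return body, lines[i:]


quoted = ["", "   This should be a block quote.", "", "Next paragraph."]
print(parse_comment([".. a comment"] + quoted))  # the quote is swallowed
print(parse_comment([".."] + quoted))            # the quote is left alone
```
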
Anonymous Hyperlinks
====================

Alan Jaffray came up with this idea, along with the following syntax::

    Search the `Python DOC-SIG mailing list archives`{}_.

    .. _: http://mail.python.org/pipermail/doc-sig/

The idea is sound and useful. I suggested a "double underscore"
syntax::

    Search the `Python DOC-SIG mailing list archives`__.

    .. __: http://mail.python.org/pipermail/doc-sig/

But perhaps single underscores are okay? The syntax looks better, but
the hyperlink itself doesn't explicitly say "anonymous"::

    Search the `Python DOC-SIG mailing list archives`_.

    .. _: http://mail.python.org/pipermail/doc-sig/

Mixing anonymous and named hyperlinks becomes confusing. The order of
targets is not significant for named hyperlinks, but it is for
anonymous hyperlinks::

    Hyperlinks: anonymous_, named_, and another anonymous_.

    .. _named: named
    .. _: anonymous1
    .. _: anonymous2

Without the extra syntax of double underscores, determining which
hyperlink references are anonymous may be difficult. We'd have to
check which references don't have corresponding targets, and match
those up with anonymous targets. Keeping to a simple consistent
ordering (as with auto-numbered footnotes) seems simplest.

reStructuredText will use the explicit double-underscore syntax for
anonymous hyperlinks. An alternative (see `Reworking Explicit Markup
(Round 1)`_ below) for the somewhat awkward ".. __:" syntax is "__"::

    An anonymous__ reference.

    __ http://anonymous

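
Because anonymous references carry no name, they can only be resolved
by position: the n-th anonymous reference goes to the n-th anonymous
target. The sketch below illustrates that ordering rule, accepting
both the ``.. __:`` form and the ``__`` shorthand for targets. The
patterns and the ``resolve_anonymous`` function are hypothetical
simplifications, not the docutils implementation.

```python
import re

# Hypothetical, simplified patterns: `phrase`__ or word__ references, and
# anonymous targets written as ".. __: URI" or with the "__ URI" shorthand.
ANON_REF = re.compile(r"`([^`]+)`__|\b(\w+)__(?!\w)")
ANON_TARGET = re.compile(r"^(?:\.\. __:|__) +(\S+)\s*$")


def resolve_anonymous(text):
    """Pair anonymous references with anonymous targets in document order."""
    refs, targets = [], []
    for line in text.splitlines():
        target = ANON_TARGET.match(line)
        if target:
            targets.append(target.group(1))
        else:
            refs.extend(m.group(1) or m.group(2) for m in ANON_REF.finditer(line))
    if len(refs) != len(targets):
        raise ValueError(f"{len(refs)} anonymous references, {len(targets)} targets")
    return list(zip(refs, targets))


example = """\
Search the `Python DOC-SIG mailing list archives`__, or follow
an anonymous__ reference.

.. __: http://mail.python.org/pipermail/doc-sig/
__ http://anonymous
"""
for ref, uri in resolve_anonymous(example):
    print(f"{ref!r} -> {uri}")
```

A count mismatch is reported rather than guessed at, since with purely
positional matching one missing target would shift every later link.
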
Reworking Explicit Markup (Round 1)
===================================

Alan Jaffray came up with the idea of `anonymous hyperlinks`_, added
to reStructuredText. Subsequently it was asserted that hyperlinks
(especially anonymous hyperlinks) would play an increasingly important
role in reStructuredText documents, and therefore they require a
simpler and more concise syntax. This prompted a review of the
current and proposed explicit markup syntaxes with regard to
improving usability.

1. Original syntax::

       .. _blah:                     internal hyperlink target
       .. _blah: http://somewhere    external hyperlink target
       .. _blah: blahblah_           indirect hyperlink target
       .. __:                        anonymous internal target
       .. __: http://somewhere       anonymous external target
       .. __: blahblah_              anonymous indirect target
       .. [blah] http://somewhere    footnote
       .. blah:: http://somewhere    directive
       .. blah: http://somewhere     comment

   .. Note::

      The comment text was intentionally made to look like a hyperlink
      target.

   Origins:

   * Except for the colon (a delimiter necessary to allow for
     phrase-links), hyperlink target ``.. _blah:`` comes from Setext.
   * Comment syntax from Setext.
   * Footnote syntax from StructuredText ("named links").
   * Directives and anonymous hyperlinks original to reStructuredText.

   Advantages:

   + Consistent explicit markup indicator: "..".
   + Consistent hyperlink syntax: ".. _" & ":".

   Disadvantages:

   - Anonymous target markup is awkward: ".. __:".
   - The explicit markup indicator ("..") is excessively overloaded?
   - Comment text is limited (can't look like a footnote, hyperlink,
     or directive). But this is probably not important.

2. Alan Jaffray's proposed syntax #1::

       __ _blah                      internal hyperlink target
       __ blah: http://somewhere     external hyperlink target
       __ blah: blahblah_            indirect hyperlink target
       __                            anonymous internal target
       __ http://somewhere           anonymous external target
       __ blahblah_                  anonymous indirect target
       __ [blah] http://somewhere    footnote
       .. blah:: http://somewhere    directive
       .. blah: http://somewhere     comment

   The hyperlink-connoted underscores have become first-level syntax.

   Advantages:

   + Anonymous targets are simpler.
   + All hyperlink targets are one character shorter.

   Disadvantages:

   - Inconsistent internal hyperlink targets. Unlike all other named
     hyperlink targets, there's no colon. There's an extra leading
     underscore, but we can't drop it because without it, "blah" looks
     like a relative URI. Unless we restore the colon::

         __ blah:                      internal hyperlink target

   - Obtrusive markup?

3. Alan Jaffray's proposed syntax #2::

       .. _blah                      internal hyperlink target
       .. blah: http://somewhere     external hyperlink target
       .. blah: blahblah_            indirect hyperlink target
       ..                            anonymous internal target
       .. http://somewhere           anonymous external target
       .. blahblah_                  anonymous indirect target
       .. [blah] http://somewhere    footnote
       !! blah: http://somewhere     directive
       ## blah: http://somewhere     comment

   Leading underscores have been (almost) replaced by "..", while
   comments and directives have gained their own syntax.

   Advantages:

   + Anonymous hyperlinks are simpler.
   + Unique syntax for comments. Connotation of "comment" from
     some programming languages (including our favorite).
   + Unique syntax for directives. Connotation of "action!".

   Disadvantages:

   - Inconsistent internal hyperlink targets. Again, unlike all other
     named hyperlink targets, there's no colon. There's a leading
     underscore, matching the trailing underscores of references,
     which no other hyperlink targets have. We can't drop that one
     leading underscore though: without it, "blah" looks like a
     relative URI. Again, unless we restore the colon::

         .. blah:                      internal hyperlink target

   - All (except for internal) hyperlink targets lack their leading
     underscores, losing the "hyperlink" connotation.

   - Obtrusive syntax for comments. Alternatives::

         ;; blah: http://somewhere
            (also comment syntax in Lisp & others)
         ,, blah: http://somewhere
            ("comma comma": sounds like "comment"!)

   - Iffy syntax for directives. Alternatives?

4. Tony Ibbs' proposed syntax::

       .. _blah:                     internal hyperlink target
       .. _blah: http://somewhere    external hyperlink target
       .. _blah: blahblah_           indirect hyperlink target
       ..                            anonymous internal target
       .. http://somewhere           anonymous external target
       .. blahblah_                  anonymous indirect target
       .. [blah] http://somewhere    footnote
       .. blah:: http://somewhere    directive
       .. blah: http://somewhere     comment

   This is the same as the current syntax, except for anonymous
   targets which drop their "__: ".

   Advantage:

   + Anonymous targets are simpler.

   Disadvantages:

   - Anonymous targets lack their leading underscores, losing the
     "hyperlink" connotation.
   - Anonymous targets are almost indistinguishable from comments.
     (Better to know "up front".)

5. David Goodger's proposed syntax: Perhaps going back to one of
   Alan's earlier suggestions might be the best solution. How about
   simply adding "__ " as a synonym for ".. __: " in the original
   syntax? These would become equivalent::

       .. __:                        anonymous internal target
       .. __: http://somewhere       anonymous external target
       .. __: blahblah_              anonymous indirect target

       __                            anonymous internal target
       __ http://somewhere           anonymous external target
       __ blahblah_                  anonymous indirect target

Alternative 5 has been adopted.

Backquotes in Phrase-Links
==========================

[From a 2001-06-05 Doc-SIG post in reply to questions from Doug
Hellmann.]

The first draft of the spec, posted to the Doc-SIG in November 2000,
used square brackets for phrase-links. I changed my mind because:

1. In the first draft, I had already decided on single-backquotes for
   inline literal text.

2. However, I wanted to minimize the necessity for backslash escapes,
   for example when quoting Python repr-equivalent syntax that uses
   backquotes.

3. The processing of identifiers (function/method/attribute/module
   etc. names) into hyperlinks is a useful feature. PyDoc recognizes
   identifiers heuristically, but it doesn't take much imagination to
   come up with counter-examples where PyDoc's heuristics would result
   in embarrassing failure. I wanted to do it deterministically, and
   that called for syntax. I called this construct "interpreted
   text".

4. Leveraging the ``*emphasis*/**strong**`` syntax led to the
   idea of using double-backquotes as syntax.

5. I worked out some rules for inline markup recognition.

6. In combination with #5, double backquotes lent themselves to inline
   literals, neatly satisfying #2, minimizing backslash escapes. In
   fact, the spec says that no interpretation of any kind is done
   within double-backquote inline literal text; backslashes do *no*
   escaping within literal text.

7. Single backquotes are then freed up for interpreted text.

8. I already had square brackets required for footnote references.

9. Since interpreted text will typically turn into hyperlinks, it was
   a natural fit to use backquotes as the phrase-quoting syntax for
   trailing-underscore hyperlinks.

The original inspiration for the trailing underscore hyperlink syntax
was Setext. But for phrases Setext used a very cumbersome
``underscores_between_words_like_this_`` syntax.

The underscores can be viewed as if they were right-pointing arrows:
``-->``. So ``hyperlink_`` points away from the reference, and
``.. _hyperlink:`` points toward the target.

Substitution Mechanism
======================

Substitutions arose out of a Doc-SIG thread begun on 2001-10-28 by
Alan Jaffray, "reStructuredText inline markup". It reminded me of a
missing piece of the reStructuredText puzzle, first referred to in my
contribution to "Documentation markup & processing / PEPs" (Doc-SIG
2001-06-21).

Substitutions allow the power and flexibility of directives to be
shared by inline text. They are a way to allow arbitrarily complex
inline objects, while keeping the details out of the flow of text.
They are the equivalent of SGML/XML's named entities. For example,
here is an inline image (using reference syntax alternative 4d
(vertical bars) and definition alternative 3, the alternatives chosen
for inclusion in the spec)::

    The |biohazard| symbol must be used on containers used to dispose
    of medical waste.

    .. |biohazard| image:: biohazard.png
       [height=20 width=20]

The ``|biohazard|`` substitution reference will be replaced in-line by
whatever the ``.. |biohazard|`` substitution definition generates (in
this case, an image). A substitution definition contains the
substitution text bracketed with vertical bars, followed by an
embedded inline-compatible directive, such as "image". A transform is
required to complete the substitution.

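
The two-step process (collect the definitions, then let a transform
replace every reference) can be sketched in Python. This is a
hypothetical model, not the docutils transform: the ``render``
function merely stands in for running the embedded directive, and
directive option lines such as ``[height=20 width=20]`` are not
handled.

```python
import re

# Hypothetical, simplified syntax: ".. |name| directive:: argument".
DEFINITION = re.compile(r"^\.\. \|([^|]+)\| (\w+):: *(.*)$")
REFERENCE = re.compile(r"\|([^|\s][^|]*)\|")


def render(directive, argument):
    """Stand-in for running an inline-compatible directive."""
    return f"<{directive} {argument}>"


def substitute(text):
    """Collect substitution definitions, then replace every reference."""
    definitions, body = {}, []
    for line in text.splitlines():
        m = DEFINITION.match(line)
        if m:
            name, directive, argument = m.groups()
            definitions[name] = render(directive, argument)
        else:
            body.append(line)

    def replace(match):
        name = match.group(1)
        if name not in definitions:
            raise KeyError(f"undefined substitution: |{name}|")
        return definitions[name]

    # Running after all definitions are collected lets a reference
    # appear before its definition, as in the biohazard example.
    return REFERENCE.sub(replace, "\n".join(body)).strip()


source = """\
The |biohazard| symbol must be used on containers used to dispose
of medical waste.

.. |biohazard| image:: biohazard.png
"""
print(substitute(source))
```
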
Syntax alternatives for the reference:

1. Use the existing interpreted text syntax, with a predefined role
   such as "sub"::

       The `biohazard`:sub: symbol...

   Advantages: existing syntax, explicit. Disadvantages: verbose,
   obtrusive.

2. Use a variant of the interpreted text syntax, with a new suffix
   akin to the underscore in phrase-link references::

       (a) `name`@
       (b) `name`#
       (c) `name`&
       (d) `name`/
       (e) `name`<
       (f) `name`::
       (g) `name`:

   Due to incompatibility with other constructs and ordinary text
   usage, (f) and (g) are not possible.

3. Use interpreted text syntax with a fixed internal format::

       (a) `:name:`
       (b) `name:`
       (c) `name::`
       (d) `::name::`
       (e) `%name%`
       (f) `#name#`
       (g) `/name/`
       (h) `&name&`
       (i) `|name|`
       (j) `[name]`
       (k) `<name>`
       (l) `&name;`
       (m) `'name'`

   To avoid ML confusion, (k) and (l) are definitely out. Square
   brackets (j) won't work in the target (the substitution definition
   would be indistinguishable from a footnote).

   The ```/name/``` syntax (g) is reminiscent of "s/find/sub"
   substitution syntax in ed-like languages. However, it may have a
   misleading association with regexps, and looks like an absolute
   POSIX path. (i) is visually equivalent and lacking the
   connotations.

   A disadvantage of all of these is that they limit interpreted text,
   albeit only slightly.

4. Use specialized syntax, something new::

       (a) #name#
       (b) @name@
       (c) /name/
       (d) |name|
       (e) <<name>>
       (f) //name//
       (g) ||name||
       (h) ^name^
       (i) [[name]]
       (j) ~name~
       (k) !name!
       (l) =name=
       (m) ?name?
       (n) >name<

   "#" (a) and "@" (b) are obtrusive. "/" (c) without backquotes
   looks just like a POSIX path; it is likely for such usage to appear
   in text.

   "|" (d) and "^" (h) are feasible.

5. Redefine the trailing underscore syntax. See definition syntax
   alternative 4, below.

Syntax alternatives for the definition:

1. Use the existing directive syntax, with a predefined directive such
   as "sub". It contains a further embedded directive resolving to an
   inline-compatible object::

       .. sub:: biohazard
          .. image:: biohazard.png
             [height=20 width=20]

       .. sub:: parrot
          That bird wouldn't *voom* if you put 10,000,000 volts
          through it!

   The advantages and disadvantages are the same as for reference
   alternative 1.

2. Use syntax as in #1, but with the embedded directive compressed::

       .. sub:: biohazard image:: biohazard.png
                [height=20 width=20]

   This is a bit better than alternative 1, but still too much.

3. Use a variant of directive syntax, incorporating the substitution
   text, obviating the need for a special "sub" directive name. If we
   assume reference alternative 4d (vertical bars), the matching
   definition would look like this::

       .. |biohazard| image:: biohazard.png
          [height=20 width=20]

4. (Suggested by Alan Jaffray on Doc-SIG, 2001-11-06.)

   Instead of adding new syntax, redefine the trailing underscore
   syntax to mean "substitution reference" instead of "hyperlink
   reference". Alan's example::

       I had lunch with Jonathan_ today. We talked about Zope_.

       .. _Jonathan: lj [user=jhl]
       .. _Zope: http://www.zope.org/

   A problem with the proposed syntax is that URIs which look like
   simple reference names (alphanum plus ".", "-", "_") would be
   indistinguishable from substitution directive names. A more
   consistent syntax would be::

       I had lunch with Jonathan_ today. We talked about Zope_.

       .. _Jonathan: lj:: user=jhl
       .. _Zope: http://www.zope.org/

   (``::`` after ``.. _Jonathan: lj``.)

   The "Zope" target is a simple external hyperlink, but the
   "Jonathan" target contains a directive. Alan proposed that the
   reference text be replaced by whatever the referenced directive
   (the "directive target") produces. A directive reference becomes a
   hyperlink reference if the contents of the directive target resolve
   to a hyperlink. If the directive target resolves to an icon, the
   reference is replaced by an inline icon.

   This seems too indirect and complicated for easy comprehension.

   The reference in the text will sometimes become a link, sometimes
   not. Sometimes the reference text will remain, sometimes not. We
   don't know *at the reference*::

       This is a `hyperlink reference`_; its text will remain.
       This is an `inline icon`_; its text will disappear.

   That's a problem.

The syntax that has been incorporated into the spec and parser is
reference alternative 4d with definition alternative 3::

    The |biohazard| symbol...

    .. |biohazard| image:: biohazard.png
       [height=20 width=20]

We can also combine substitution references with hyperlink references,
by appending a "_" (named hyperlink reference) or "__" (anonymous
hyperlink reference) suffix to the substitution reference. This
allows us to click on an image-link::

    The |biohazard|_ symbol...

    .. |biohazard| image:: biohazard.png
       [height=20 width=20]
    .. _biohazard: http://www.cdc.gov/

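
How the three reference forms can be told apart is shown by the
following sketch. The regular expression is a hypothetical
simplification, not the pattern docutils uses: an optional ``_`` or
``__`` after the closing bar selects a named or an anonymous
hyperlink, and its absence means a plain substitution.

```python
import re

# Hypothetical, simplified pattern: |name|, |name|_ or |name|__.
# "__" is listed before "_" so the longer suffix wins.
SUB_REF = re.compile(r"\|(?P<name>[^|\s][^|]*)\|(?P<suffix>__|_)?(?!\w)")

KIND = {
    None: "substitution only",
    "_": "substitution + named hyperlink",
    "__": "substitution + anonymous hyperlink",
}


def classify_refs(text):
    """Return (name, kind) for each substitution reference in text."""
    return [(m["name"], KIND[m["suffix"]]) for m in SUB_REF.finditer(text)]


for name, kind in classify_refs("The |biohazard|_ symbol, the |parrot| and |logo|__."):
    print(f"{name}: {kind}")
```
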
There have been several suggestions for the naming of these
constructs, originally called "substitution references" and
"substitutions".

1. Candidate names for the reference construct:

   (a) substitution reference
   (b) tagging reference
   (c) inline directive reference
   (d) directive reference
   (e) indirect inline directive reference
   (f) inline directive placeholder
   (g) inline directive insertion reference
   (h) directive insertion reference
   (i) insertion reference
   (j) directive macro reference
   (k) macro reference
   (l) substitution directive reference

2. Candidate names for the definition construct:

   (a) substitution
   (b) substitution directive
   (c) tag
   (d) tagged directive
   (e) directive target
   (f) inline directive
   (g) inline directive definition
   (h) referenced directive
   (i) indirect directive
   (j) indirect directive definition
   (k) directive definition
   (l) indirect inline directive
   (m) named directive definition
   (n) inline directive insertion definition
   (o) directive insertion definition
   (p) insertion definition
   (q) insertion directive
   (r) substitution definition
   (s) directive macro definition
   (t) macro definition
   (u) substitution directive definition
   (v) substitution definition

"Inline directive reference" (1c) seems to be an appropriate term at
first, but the term "inline" is redundant in the case of the
reference. Its counterpart "inline directive definition" (2g) is
awkward, because the directive definition itself is not inline.

"Directive reference" (1d) and "directive definition" (2k) are too
vague. "Directive definition" could be used to refer to any
directive, not just those used for inline substitutions.

One meaning of the term "macro" (1k, 2s, 2t) is too
programming-language-specific. Also, macros are typically simple text
substitution mechanisms: the text is substituted first and evaluated
later. reStructuredText substitution definitions are evaluated in
place at parse time and substituted afterwards.

"Insertion" (1h, 1i, 2n-2q) is almost right, but it implies that
something new is getting added rather than one construct being
replaced by another.

Which brings us back to "substitution". The overall best names are
"substitution reference" (1a) and "substitution definition" (2v). A
long way to go to add one word!



Inline External Targets
=======================

Currently reStructuredText has two hyperlink syntax variations:

* Named hyperlinks::

      This is a named reference_ of one word ("reference"). Here is
      a `phrase reference`_. Phrase references may even cross `line
      boundaries`_.

      .. _reference: http://www.example.org/reference/
      .. _phrase reference: http://www.example.org/phrase_reference/
      .. _line boundaries: http://www.example.org/line_boundaries/

  + Advantages:

    - The plaintext is readable.
    - Each target may be reused multiple times (e.g., just write
      ``"reference_"`` again).
    - No synchronized ordering of references and targets is necessary.

  + Disadvantages:

    - The reference text must be repeated as target names, which
      could lead to mistakes.
    - The target URLs may be located far from the references, and hard
      to find in the plaintext.

* Anonymous hyperlinks (in current reStructuredText)::

      This is an anonymous reference__. Here is an anonymous
      `phrase reference`__. Phrase references may even cross `line
      boundaries`__.

      __ http://www.example.org/reference/
      __ http://www.example.org/phrase_reference/
      __ http://www.example.org/line_boundaries/

  + Advantages:

    - The plaintext is readable.
    - The reference text does not have to be repeated.

  + Disadvantages:

    - References and targets must be kept in sync.
    - Targets cannot be reused.
    - The target URLs may be located far from the references.
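
The maintenance trade-off between the two forms comes from how targets are matched. The sketch below is a simplified model, not Docutils' resolver: named references are looked up by name, normalized the way the specification describes reference names (case-insensitive and whitespace-neutral), while anonymous references are simply paired with anonymous targets in order. The function names are hypothetical.

```python
def normalize_name(name):
    """Reference names are case-insensitive and whitespace-neutral."""
    return " ".join(name.split()).lower()


def resolve_named(references, targets):
    """Look up each named reference in a name -> URL table.

    Order does not matter, and one target can serve many references.
    """
    table = {normalize_name(name): url for name, url in targets}
    return [table[normalize_name(ref)] for ref in references]


def resolve_anonymous(references, urls):
    """Pair anonymous references with anonymous targets by position.

    Each target is consumed exactly once, so both sequences must stay
    in sync; a missing or extra target is an error.
    """
    if len(references) != len(urls):
        raise ValueError(
            f"{len(references)} anonymous references but {len(urls)} targets")
    return list(zip(references, urls))
```

A reference that crosses a line boundary (``"line\nboundaries"``) still finds the ``line boundaries`` target, whereas deleting one anonymous target shifts or breaks every pairing after it.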

For comparison and historical background, StructuredText also has two
syntaxes for hyperlinks:

* First, ``"reference text":URL``::

      This is a "reference":http://www.example.org/reference/
      of one word ("reference"). Here is a "phrase
      reference":http://www.example.org/phrase_reference/.

* Second, ``"reference text", http://example.com/absolute_URL``::

      This is a "reference", http://www.example.org/reference/
      of one word ("reference"). Here is a "phrase reference",
      http://www.example.org/phrase_reference/.

Both syntaxes share advantages and disadvantages:

+ Advantages:

  - The target is specified immediately adjacent to the reference.

+ Disadvantages:

  - Poor plaintext readability.
  - Targets cannot be reused.
  - Both syntaxes use double quotes, common in ordinary text.
  - In the first syntax, the URL and the last word are stuck
    together, exacerbating the line wrap problem.
  - The second syntax is too magical; text could easily be written
    that way by accident (although only absolute URLs are recognized
    here, perhaps because of the potential for ambiguity).
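
To see why the second form is "too magical", consider a rough approximation of it as a regular expression. The pattern and the ``find_comma_links`` helper below are assumptions for illustration, not StructuredText's actual grammar; the point is that an ordinary sentence that quotes a phrase and then cites a URL is silently turned into a link.

```python
import re

# Rough approximation of the second StructuredText form: a double-quoted
# phrase, a comma, whitespace, then an absolute URL (scheme://...).
COMMA_LINK = re.compile(
    r'"(?P<text>[^"]+)",\s+(?P<url>[A-Za-z][A-Za-z0-9+.-]*://\S+)')


def find_comma_links(text):
    """Return (link text, URL) pairs; trailing sentence punctuation is dropped."""
    return [(m.group("text"), m.group("url").rstrip(".,;:"))
            for m in COMMA_LINK.finditer(text)]


# Intended link:
print(find_comma_links('This is a "reference", http://www.example.org/reference/'))
# Accidental link in ordinary prose:
print(find_comma_links('Critics called it "a mess", https://example.com/review/ explains why.'))
```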

A new type of "inline external hyperlink" has been proposed.

1. On 2002-06-28, Simon Budig proposed__ a new syntax for
   reStructuredText hyperlinks::

       This is a reference_(http://www.example.org/reference/) of one
       word ("reference"). Here is a `phrase
       reference`_(http://www.example.org/phrase_reference/). Are
       these examples, (single-underscore), named? If so, `anonymous
       references`__(http://www.example.org/anonymous/) using two
       underscores would probably be preferable.

   __ http://mail.python.org/pipermail/doc-sig/2002-June/002648.html

   The syntax, advantages, and disadvantages are similar to those of
   StructuredText.

   + Advantages:

     - The target is specified immediately adjacent to the reference.

   + Disadvantages:

     - Poor plaintext readability.
     - Targets cannot be reused (unless named, but the semantics are
       unclear).

   + Problems:

     - The ``"`ref`_(URL)"`` syntax forces the last word of the
       reference text to be joined to the URL, making a potentially
       very long word that can't be wrapped (URLs can be very long).
       The reference and the URL should be separate. This is a
       symptom of the following point:

     - The syntax produces a single compound construct made up of two
       equally important parts, *with syntax in the middle*, *between*
       the reference and the target. This is unprecedented in
       reStructuredText.

     - The "inline hyperlink" text is *not* a named reference (there's
       no lookup by name), so it shouldn't look like one.

     - According to the IETF standards RFC 2396 and RFC 2732,
       parentheses are legal URI characters and curly braces are legal
       email characters, making their use prohibitively difficult.

     - The named/anonymous semantics are unclear.

2. After an analysis__ of the syntax of (1) above, we came up with the
   following compromise syntax::

       This is an anonymous reference__
       __<http://www.example.org/reference/> of one word
       ("reference"). Here is a `phrase reference`__
       __<http://www.example.org/phrase_reference/>. `Named
       references`_ _<http://www.example.org/anonymous/> use single
       underscores.

   __ http://mail.python.org/pipermail/doc-sig/2002-July/002670.html

   The syntax builds on that of the existing "inline internal
   targets": ``an _`inline internal target`.``

   + Advantages:

     - The target is specified immediately adjacent to the reference,
       improving maintainability:

       - References and targets are easily kept in sync.
       - The reference text does not have to be repeated.

     - The construct is executed in two parts: references identical to
       existing references, and targets that are new but not too big a
       stretch from current syntax.

     - There's overwhelming precedent for quoting URLs with angle
       brackets [#]_.

   + Disadvantages:

     - Poor plaintext readability.
     - Lots of "line noise".
     - Targets cannot be reused (unless named; see below).

   To alleviate the readability issue slightly, we could allow the
   target to appear later, such as after the end of the sentence::

       This is an anonymous reference__ of one word ("reference").
       __<http://www.example.org/reference/> Here is a `phrase
       reference`__. __<http://www.example.org/phrase_reference/>

   Problem: this could only work for one reference at a time
   (reference/target pairs must be proximate [refA trgA refB trgB],
   not interleaved [refA refB trgA trgB] or nested [refA refB trgB
   trgA]). This variation is too problematic; references and inline
   external targets will have to be kept immediately adjacent (see (3)
   below).

   The ``"reference__ __<target>"`` syntax is actually for "anonymous
   inline external targets", emphasized by the double underscores. It
   follows that single trailing and leading underscores would lead to
   *implicitly named* inline external targets. This would allow the
   reuse of targets by name. So after ``"reference_ _<target>"``,
   another ``"reference_"`` would point to the same target.

   .. [#]
      From RFC 2396 (URI syntax):

          The angle-bracket "<" and ">" and double-quote (")
          characters are excluded [from URIs] because they are often
          used as the delimiters around URI in text documents and
          protocol fields.

          Using <> angle brackets around each URI is especially
          recommended as a delimiting style for URI that contain
          whitespace.

      From RFC 822 (email headers):

          Angle brackets ("<" and ">") are generally used to indicate
          the presence of a one machine-usable reference (e.g.,
          delimiting mailboxes), possibly including source-routing to
          the machine.

3. If it is best for references and inline external targets to be
   immediately adjacent, then they might as well be integrated.
   Here's an alternative syntax embedding the target URL in the
   reference::

       This is an anonymous `reference <http://www.example.org
       /reference/>`__ of one word ("reference"). Here is a `phrase
       reference <http://www.example.org/phrase_reference/>`__.

   Advantages and disadvantages are similar to those in (2).
   Readability is still an issue, but the syntax is a bit less
   heavyweight (reduced line noise). Backquotes are required, even
   for one-word references; the target URL is included within the
   reference text, forcing a phrase context.

   We'll call this variant "embedded URIs".

   Problem: how to refer to a title like "HTML Anchors: <a>" (which
   ends with an HTML/SGML/XML tag)? We could either require more
   syntax on the target (like ``"`reference text
   __<http://example.com/>`__"``), or require the odd conflicting
   title to be escaped (like ``"`HTML Anchors: \<a>`__"``). The
   latter seems preferable, and not too onerous.

   Similarly to (2) above, a single trailing underscore would convert
   the reference & inline external target from anonymous to implicitly
   named, allowing reuse of targets by name.

   I think this is the least objectionable of the syntax alternatives.
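
A minimal sketch of how embedded URIs (alternative 3) could be split from the reference text, including the backslash escape for titles that end in something that looks like a tag. The patterns and the ``parse_phrase_references`` helper are hypothetical simplifications: in this sketch, whitespace inside the angle brackets (such as a line break) is removed, and a trailing ``_`` versus ``__`` selects an implicitly named versus an anonymous reference.

```python
import re

# A backquoted phrase followed by "_" (named) or "__" (anonymous).
PHRASE_REF = re.compile(r"`(?P<body>[^`]+)`(?P<suffix>__?)")
# Reference text, optional whitespace, then "<uri>" at the very end;
# a "<" preceded by a backslash is escaped and does not start a URI.
EMBEDDED_URI = re.compile(r"(?P<text>.*?)\s*(?<!\\)<(?P<uri>[^<>]+)>", re.DOTALL)


def parse_phrase_references(source):
    """Return (text, uri or None, 'named' | 'anonymous') for each phrase reference."""
    refs = []
    for m in PHRASE_REF.finditer(source):
        kind = "anonymous" if m.group("suffix") == "__" else "named"
        embedded = EMBEDDED_URI.fullmatch(m.group("body"))
        if embedded:
            text = embedded.group("text")
            uri = "".join(embedded.group("uri").split())  # drop whitespace/line breaks
        else:
            text, uri = m.group("body").replace("\\<", "<"), None
        refs.append((text, uri, kind))
    return refs
```

An unescaped title such as ``HTML Anchors: <a>`` would be read as text ``HTML Anchors:`` with the URI ``a``, which is exactly the conflict that the escape resolves.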

Other syntax variations have been proposed (by Brett Cannon and Benja
Fallenstein)::

    `phrase reference`->http://www.example.com

    `phrase reference`@http://www.example.com

    `phrase reference`__ ->http://www.example.com

    `phrase reference` [-> http://www.example.com]

    `phrase reference`__ [-> http://www.example.com]

    `phrase reference` <http://www.example.com>_

None of these variations are clearly superior to #3 above. Some have
problems that exclude their use.

With any kind of inline external target syntax it comes down to the
conflict between maintainability and plaintext readability. I don't
see a major problem with reStructuredText's maintainability, and I
don't want to sacrifice plaintext readability to "improve" it.

The proponents of inline external targets want them for easily
maintainable web pages. The arguments go something like this:

- Named hyperlinks are difficult to maintain because the reference
  text is duplicated as the target name.

  To which I said, "So use anonymous hyperlinks."

- Anonymous hyperlinks are difficult to maintain because the
  references and targets have to be kept in sync.

  "So keep the targets close to the references, grouped after each
  paragraph. Maintenance is trivial."

- But targets grouped after paragraphs break the flow of text.

  "Surely less than URLs embedded in the text! And if the intent is
  to produce web pages, not readable plaintext, then who cares about
  the flow of text?"

Many participants have voiced their objections to the proposed syntax:

    Garth Kidd: "I strongly prefer the current way of doing it.
    Inline is spectacularly messy, IMHO."

    Tony Ibbs: "I vehemently agree... that the inline alternatives
    being suggested look messy - there are/were good reasons they've
    been taken out... I don't believe I would gain from the new
    syntaxes."

    Paul Moore: "I agree as well. The proposed syntax is far too
    punctuation-heavy, and any of the alternatives discussed are
    ambiguous or too subtle."

Others have voiced their support:

    fantasai: "I agree with Simon. In many cases, though certainly
    not in all, I find parenthesizing the url in plain text flows
    better than relegating it to a footnote."

    Ken Manheimer: "I'd like to weigh in requesting some kind of easy,
    direct inline reference link."

(It is interesting that those *against* the proposal have been using
reStructuredText for a while, and those *for* the proposal are either
new to the list ["fantasai", background unknown] or longtime
StructuredText users [Ken Manheimer].)

I was initially ambivalent/against the proposed "inline external
targets". I value reStructuredText's readability very highly, and
although the proposed syntax offers convenience, I don't know if the
convenience is worth the cost in ugliness. Does the proposed syntax
compromise readability too much, or should the choice be left up to
the author? Perhaps if the syntax is *allowed* but its use strongly
*discouraged*, for aesthetic/readability reasons?

After a great deal of thought and much input from users, I've decided
that there are reasonable use cases for this construct. The
documentation should strongly caution against its use in most
situations, recommending independent block-level targets instead.
Syntax #3 above ("embedded URIs") will be used.


Doctree Representation of Transitions
=====================================

(Although not reStructuredText-specific, this section fits best in
this document.)

Having added the "horizontal rule" construct to the `reStructuredText
Markup Specification`_, a decision had to be made as to how to reflect
the construct in the implementation of the document tree. Given this
source::

    Document
    ========

    Paragraph 1

    --------

    Paragraph 2

The horizontal rule indicates a "transition" (in prose terms) or the
start of a new "division". Before implementation, the parsed document
tree would be::

    <document>
        <section names="document">
            <title>
                Document
            <paragraph>
                Paragraph 1
            --------               <--- error here
            <paragraph>
                Paragraph 2

There are several possibilities for the implementation:

1. Implement horizontal rules as "divisions" or segments. A
   "division" is a title-less, non-hierarchical section. The first
   try at an implementation looked like this::

       <document>
           <section names="document">
               <title>
                   Document
               <paragraph>
                   Paragraph 1
               <division>
                   <paragraph>
                       Paragraph 2

   But the two paragraphs are really at the same level; they shouldn't
   appear to be at different levels. There's really an invisible
   "first division". The horizontal rule splits the document body
   into two segments, which should be treated uniformly.

2. Treating "divisions" uniformly brings us to the second
   possibility::

       <document>
           <section names="document">
               <title>
                   Document
               <division>
                   <paragraph>
                       Paragraph 1
               <division>
                   <paragraph>
                       Paragraph 2

   With this change, documents and sections will directly contain
   divisions and sections, but not body elements. Only divisions will
   directly contain body elements. Even without a horizontal rule
   anywhere, the body elements of a document or section would be
   contained within a division element. This makes the document tree
   deeper. This is similar to the way HTML_ treats document contents:
   grouped within a ``<body>`` element.

3. Implement them as "transitions", empty elements::

       <document>
           <section names="document">
               <title>
                   Document
               <paragraph>
                   Paragraph 1
               <transition>
               <paragraph>
                   Paragraph 2

   A transition would be a "point element", not containing anything,
   only identifying a point within the document structure. This keeps
   the document tree flatter, but the idea of a "point element" like
   "transition" smells bad. A transition isn't a thing itself, it's
   the space between two divisions. However, transitions are a
   practical solution.

Solution 3 was chosen for incorporation into the document tree model.

.. _HTML: http://www.w3.org/MarkUp/
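
The difference between options 2 and 3 is one of tree shape rather than information. This sketch uses plain Python lists and strings as stand-ins for doctree nodes (it does not use the ``docutils.nodes`` API); the helper names are hypothetical. It shows that a flat child list containing transitions and a list of divisions can be converted into each other.

```python
def to_divisions(children):
    """Option 2: regroup a flat child list into divisions.

    Each "transition" marker starts a new division; every body element
    ends up inside some division, even when there is no transition.
    """
    divisions = [[]]
    for child in children:
        if child == "transition":
            divisions.append([])
        else:
            divisions[-1].append(child)
    return divisions


def to_transitions(divisions):
    """Option 3: flatten divisions into siblings separated by transitions."""
    flat = []
    for index, division in enumerate(divisions):
        if index:  # no transition before the first division
            flat.append("transition")
        flat.extend(division)
    return flat


tree = ["paragraph 1", "transition", "paragraph 2"]
assert to_transitions(to_divisions(tree)) == tree  # round trip is lossless
```

Even a section with no horizontal rule yields one wrapping division under option 2 (``to_divisions(["p1", "p2"])`` gives ``[["p1", "p2"]]``), which is the extra depth described above.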


Syntax for Line Blocks
======================

* An early idea: How about a literal-block-like prefix, perhaps
  "``;;``"? (It is, after all, a *semi-literal* literal block, no?)
  Example::

      Take it away, Eric the Orchestra Leader! ;;

          A one, two, a one two three four

          Half a bee, philosophically,
              must, *ipso facto*, half not be.
          But half the bee has got to be,
              *vis a vis* its entity. D'you see?

          But can a bee be said to be
              or not to be an entire bee,
                  when half the bee is not a bee,
                      due to some ancient injury?

          Singing...

  Kinda lame.

* Another idea: in an ordinary paragraph, if the first line ends with
  a backslash (escaping the newline), interpret the entire paragraph
  as a verse block? For example::

      Add just one backslash\
      And this paragraph becomes
      An awful haiku

  (Awful, and arguably invalid, since in Japanese the word "haiku"
  contains three syllables, not two.)

  This idea was superseded by the rules for escaped whitespace, useful
  for `character-level inline markup`_.

* In a `2004-02-22 docutils-develop message`__, Jarno Elonen proposed
  a "plain list" syntax (and also provided a patch)::

      | John Doe
      | President, SuperDuper Corp.
      | jdoe@example.org

  __ http://thread.gmane.org/gmane.text.docutils.devel/1187

  This syntax is very natural. However, these "plain lists" seem very
  similar to line blocks, and I see so little intrinsic "list-ness"
  that I'm loath to add a new object. I used the term "blurbs" to
  remove the "list" connotation from the originally proposed name.
  Perhaps line blocks could be refined to add the two properties they
  currently lack:

  A) long lines wrap nicely
  B) HTML output doesn't look like program code in non-CSS web
     browsers

  (A) is an issue of all three aspects of Docutils: syntax (construct
  behaviour), internal representation, and output. (B) is partly an
  issue of internal representation but mostly of output.

reStructuredText will redefine line blocks with the "|"-quoting
syntax. The following is my current thinking.


Syntax
------

Perhaps line block syntax like this would do::

    | MI6: James Bond
    | MIB: Mr. J.
    | IMF: not decided yet, but probably one of the following:
    |   Ethan Hunt
    |   Jim Phelps
    |   Claire Phelps
    | CIA: Lea Leiter

Note that the "nested" list does not have nested syntax (the "|" are
not further indented); the leading whitespace would still be
significant somehow (more below). As for long lines in the input,
this could suffice::

    | John Doe
    | Founder, President, Chief Executive Officer, Cook, Bottle
      Washer, and All-Round Great Guy
    | SuperDuper Corp.
    | jdoe@example.org

The lack of "|" on the third line indicates that it's a continuation
of the second line, wrapped.

I don't see much point in allowing arbitrary nested content. Multiple
paragraphs or bullet lists inside a "blurb" don't make sense to me.
Simple nested line blocks should suffice.
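
A minimal sketch of how the proposed "|" syntax could be read into (indentation, text) pairs, treating a line without a leading "|" as the wrapped continuation of the previous line. The ``parse_line_block`` name is hypothetical, and the choice that exactly one space after "|" belongs to the markup is an assumption of this sketch.

```python
def parse_line_block(source):
    """Split "|"-quoted source into (indentation, text) pairs.

    A line without a leading "|" is treated as a continuation of the
    previous line (as in the John Doe example) and joined with a space.
    """
    lines = []
    for raw in source.splitlines():
        body = raw.strip()
        if not body:
            continue  # blank source lines are skipped in this sketch
        if body.startswith("|"):
            content = body[1:]
            if content.startswith(" "):
                content = content[1:]  # one space after "|" is markup
            text = content.lstrip(" ")
            lines.append((len(content) - len(text), text))
        elif lines:
            indent, text = lines[-1]
            lines[-1] = (indent, text + " " + body)  # wrapped continuation
        else:
            raise ValueError("a line block must start with '|'")
    return lines
```

The remaining leading whitespace is kept as a number, so "|   Ethan Hunt" becomes ``(2, "Ethan Hunt")``; how those numbers turn into nesting is a separate question.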


Internal Representation
-----------------------

Line blocks are currently represented as text blobs as follows::

    <!ELEMENT line_block %text.model;>
    <!ATTLIST line_block
        %basic.atts;
        %fixedspace.att;>

Instead, we could represent each line by a separate element::

    <!ELEMENT line_block (line+)>
    <!ATTLIST line_block %basic.atts;>

    <!ELEMENT line %text.model;>
    <!ATTLIST line %basic.atts;>

We'd keep the significance of the leading whitespace of each line
either by converting it to non-breaking spaces at output, or with a
per-line margin. Non-breaking spaces are simpler (for HTML, anyway)
but kludgey, and wouldn't support indented long lines that wrap. But
should inter-word whitespace (i.e., not leading whitespace) be
preserved? Currently it is preserved in line blocks.

Representing a more complex line block may be tricky::

    | But can a bee be said to be
    |     or not to be an entire bee,
    |         when half the bee is not a bee,
    |             due to some ancient injury?

Perhaps the representation could allow for nested line blocks::

    <!ELEMENT line_block (line | line_block)+>

With this model, leading whitespace would no longer be significant.
Instead, left margins are implied by the nesting. The example above
could be represented as follows::

    <line_block>
        <line>
            But can a bee be said to be
        <line_block>
            <line>
                or not to be an entire bee,
            <line_block>
                <line>
                    when half the bee is not a bee,
                <line_block>
                    <line>
                        due to some ancient injury?

I wasn't sure what to do about even more complex line blocks::

    |     Indented
    | Not indented
    |   Indented a bit
    |     A bit more
    |  Only one space

How should that be parsed and nested? Should the first line have
the same nesting level (== indentation in the output) as the fourth
line, or the same as the last line? Mark Nodine suggested that such
line blocks be parsed similarly to complexly-nested block quotes,
which seems reasonable. In the example above, this would result in
the nesting of first line matching the last line's nesting. In
other words, the nesting would be relative to neighboring lines
only.
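
Nesting relative to neighboring lines, as with block quotes, can be sketched as a short recursive grouping: lines at the minimum indentation of a run stay at that level, and each run of deeper lines between them becomes a nested line block. This is an illustrative assumption, not a description of the Docutils parser; ``nest`` and ``depths`` are hypothetical helpers that take (indentation, text) pairs and return nested Python lists standing in for ``<line_block>`` elements.

```python
def nest(lines):
    """Group (indent, text) pairs by relative indentation.

    Lines at this run's minimum indent stay at this level; each run of
    more-indented lines between them becomes a nested list.
    """
    if not lines:
        return []
    base = min(indent for indent, _ in lines)
    result, run = [], []
    for indent, text in lines:
        if indent > base:
            run.append((indent, text))
        else:
            if run:
                result.append(nest(run))  # close the deeper run first
                run = []
            result.append(text)
    if run:
        result.append(nest(run))
    return result


def depths(tree, level=0):
    """Yield (text, nesting depth) for every line in a nested tree."""
    for item in tree:
        if isinstance(item, list):
            yield from depths(item, level + 1)
        else:
            yield item, level


example = [(4, "Indented"), (0, "Not indented"), (2, "Indented a bit"),
           (4, "A bit more"), (1, "Only one space")]
print(dict(depths(nest(example))))
```

For the five-line example above, this places "Indented" and "Only one space" at the same depth (1), matching the behaviour described, while the bee verse becomes the chain of nested blocks shown in the representation.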


Output
------

In HTML, line blocks are currently output as "<pre>" blocks, which
gives us significant whitespace and line breaks, but doesn't allow
long lines to wrap and causes monospaced output without stylesheets.
Instead, we could output "<div>" elements parallelling the
representation above, where each nested <div class="line_block"> would
have an increased left margin (specified in the stylesheet).

Jarno suggested the following HTML output::

    <div class="line_block">
       <span class="line">First, top level line</span><br class="hidden"/>
       <div class="line_block"><span class="hidden">&nbsp;</span>
          <span class="line">Second, once nested</span><br class="hidden"/>
          <span class="line">Third, once nested</span><br class="hidden"/>
          ...
       </div>
       ...
    </div>

The ``<br class="hidden" />`` and ``<span
class="hidden">&nbsp;</span>`` are meant to support non-CSS and
non-graphical browsers. I understand the case for "br", but I'm not
so sure about hidden "&nbsp;". I question how much effort should be
put toward supporting non-graphical and especially non-CSS browsers,
at least for html4css1.py output.

Should the lines themselves be ``<span>`` or ``<div>``? I don't like
mixing inline and block-level elements.
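
One possible rendering of the nested representation, sketched in Python under the assumption that each line is also a ``<div>`` (avoiding the inline/block mix). The ``render_line_block`` helper is hypothetical; the indentation of nested blocks would come from a stylesheet rule such as ``div.line_block div.line_block { margin-left: 1.5em }``, which is likewise an assumption.

```python
from html import escape


def render_line_block(tree, indent=0):
    """Render nested lists (as produced by grouping lines) to nested <div>s."""
    pad = "  " * indent
    out = [f'{pad}<div class="line_block">']
    for item in tree:
        if isinstance(item, list):
            out.append(render_line_block(item, indent + 1))  # nested block
        else:
            out.append(f'{pad}  <div class="line">{escape(item)}</div>')
    out.append(f"{pad}</div>")
    return "\n".join(out)


print(render_line_block(["But can a bee be said to be",
                         ["or not to be an entire bee,"]]))
```

Because every line is a block element, the output keeps one line per ``<div>`` even without CSS, and long lines can wrap.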


Implementation Plan
-------------------

We'll leave the old implementation in place (via the "line-block"
directive only) until all Writers have been updated to support the new
syntax & implementation. The "line-block" directive can then be
updated to use the new internal representation, and its documentation
will be updated to recommend the new syntax.


List-Driven Tables
==================

The original idea came from Dylan Jay:

    ... to use a two level bulleted list with something to
    indicate it should be rendered as a table ...

It's an interesting idea. It could be implemented as a directive
which transforms a uniform two-level list into a table. Using a
directive would allow the author to explicitly set the table's
orientation (by column or by row), the presence of row headers, etc.

Alternatives:

1. (Implemented in Docutils 0.3.8).

   Bullet-list-tables might look like this::

       .. list-table::

          * - Treat
            - Quantity
            - Description
          * - Albatross!
            - 299
            - On a stick!
          * - Crunchy Frog!
            - 1499
            - If we took the bones out, it wouldn't be crunchy,
              now would it?
          * - Gannet Ripple!
            - 199
            - On a stick!

   This list must be written in two levels. This wouldn't work::

       .. list-table::

          * Treat
          * Albatross!
          * Gannet Ripple!
          * Crunchy Frog!

          * Quantity
          * 299
          * 199
          * 1499

          * Description
          * On a stick!
          * On a stick!
          * If we took the bones out...

   The above is a single list of 12 items. The blank lines are not
   significant to the markup. We'd have to explicitly specify how
   many columns or rows to use, which isn't a good idea.

2. Beni Cherniavsky suggested a field list alternative. It could look
   like this::

       .. field-list-table::
          :headrows: 1

          - :treat: Treat
            :quantity: Quantity
            :descr: Description

          - :treat: Albatross!
            :quantity: 299
            :descr: On a stick!

          - :treat: Crunchy Frog!
            :quantity: 1499
            :descr: If we took the bones out, it wouldn't be
                    crunchy, now would it?

   Column order is determined from the order of fields in the first
   row. Field order in all other rows is ignored. As a side-effect,
   this allows trivial re-arrangement of columns. By using named
   fields, it becomes possible to omit fields in some rows without
   losing track of things, which is important for spans.

3. An alternative to two-level bullet lists would be to use enumerated
   lists for the table cells::

       .. list-table::

          * 1. Treat
            2. Quantity
            3. Description
          * 1. Albatross!
            2. 299
            3. On a stick!
          * 1. Crunchy Frog!
            2. 1499
            3. If we took the bones out, it wouldn't be crunchy,
               now would it?

   That provides better correspondence between cells in the same
   column than does bullet-list syntax, but not as good as field list
   syntax. I think that were only field-list-tables available, a lot
   of users would use the equivalent degenerate case::

       .. field-list-table::
          - :1: Treat
            :2: Quantity
            :3: Description
          ...

4. Another natural variant is to allow a description list with field
   lists as descriptions::

       .. list-table::
          :headrows: 1

          Treat
              :quantity: Quantity
              :descr: Description
          Albatross!
              :quantity: 299
              :descr: On a stick!
          Crunchy Frog!
              :quantity: 1499
              :descr: If we took the bones out, it wouldn't be
                      crunchy, now would it?

   This would make the whole first column a header column ("stub").
   It's limited to a single column and a single paragraph fitting on
   one source line. Also it wouldn't allow for empty cells or row
   spans in the first column. But these are limitations that we could
   live with, like those of simple tables.
|
|
|
|
|
|
|
|
|
|
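
The column-ordering rule of the field-list alternative (item 2 above)
can be sketched in a few lines of Python. This is a hypothetical
illustration, not Docutils code. It assumes each row has already been
parsed into a ``dict`` that maps field names to cell text, and
``build_rows`` is an invented helper. Column order comes from the
first row, and later rows may list their fields in any order or omit
them::

    def build_rows(rows):
        """Return (columns, table) for a list of parsed field-list rows.

        Hypothetical helper: each row is a dict mapping field names to text.
        Column order follows the first row; field order in later rows is
        ignored, and a field missing from a row becomes None.
        """
        if not rows:
            return [], []
        columns = list(rows[0])  # dicts keep insertion order (Python 3.7+)
        table = []
        for number, row in enumerate(rows, start=1):
            unknown = set(row) - set(columns)
            if unknown:
                raise ValueError(f"row {number}: unknown fields {sorted(unknown)}")
            table.append([row.get(name) for name in columns])
        return columns, table


    header = {"treat": "Treat", "quantity": "Quantity", "descr": "Description"}
    albatross = {"treat": "Albatross!", "quantity": "299"}  # "descr" omitted
    frog = {"descr": "If we took the bones out...",  # fields reordered
            "quantity": "1499", "treat": "Crunchy Frog!"}

    columns, table = build_rows([header, albatross, frog])
    print(columns)   # ['treat', 'quantity', 'descr']
    print(table[1])  # ['Albatross!', '299', None]
    print(table[2])  # ['Crunchy Frog!', '1499', 'If we took the bones out...']

Omitted fields are returned as ``None`` rather than as empty cells.
This leaves room for a later pass to interpret them as spans, which is
where named fields pay off.
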
The List-driven table feature could be done in many ways. Each user
will have their preferred usage. Perhaps a single "list-table"
directive could handle them all, depending on which options and
content are present.

Issues:

* How to indicate that there's 1 header row? Perhaps two lists? ::

      .. list-table::

         + - Treat
           - Quantity
           - Description

         * - Albatross!
           - 299
           - On a stick!

  This is probably too subtle though. Better would be a directive
  option, like ``:headrows: 1``. An early suggestion for the header
  row(s) was to use a directive option::

      .. field-list-table::
         :header:
             - :treat: Treat
               :quantity: Quantity
               :descr: Description
         - :treat: Albatross!
           :quantity: 299
           :descr: On a stick!

  But the table data is at two levels and looks inconsistent.

  In general, we cannot extract the header row from field lists' field
  names because field names cannot contain everything one might put in
  a table cell. A separate header row also allows shorter field names
  and doesn't force one to rewrite the whole table when the header
  text changes. But for simpler cases, we can offer a ":header:
  fields" option, which does extract header cells from field names::

      .. field-list-table::
         :header: fields

         - :Treat: Albatross!
           :Quantity: 299
           :Description: On a stick!

* How to indicate the column widths? A directive option? ::

      .. list-table::
         :widths: 15 10 35

  Automatic defaults from the text used?

* How to handle row and/or column spans?

  In a field list, column-spans can be indicated by specifying the
  first and last fields, separated by space-dash-space or ellipsis::

      - :foo - baz: quuux
      - :foo ... baz: quuux

  Commas were proposed for column spans::

      - :foo, bar: quux

  But non-adjacent columns become problematic. Should we report an
  error, or duplicate the value into each span of adjacent columns (as
  was suggested)? The latter suggestion is appealing but may be too
  clever. Best perhaps to simply specify the two ends.

  It was suggested that comma syntax should be allowed, too, in order
  to allow the user to avoid trouble when changing the column order.
  But changing the column order of a table with spans is not trivial;
  we shouldn't make it easier to mess up.

  One possible syntax for row-spans is to simply treat any row where a
  field is missing as a row-span from the last row where it appeared.
  Leaving a field empty would still be possible by writing a field
  with empty content. But this is too implicit.

  Another way would be to require an explicit continuation marker
  (``...``/``-"-``/``"``?) in all but the first row of a spanned
  field. Empty comments could work (".."). If implemented, the same
  marker could also be supported in simple tables, which lack
  row-spanning abilities.

  Explicit markup like ":rowspan:" and ":colspan:" was also suggested.

  Sometimes in a table, the first header row contains spans. It may
  be necessary to provide a way to specify the column field names
  independently of data rows. A directive option would do it.

* We could specify "column-wise" or "row-wise" ordering, with the same
  markup structure. For example, with definition data::

      .. list-table::
         :column-wise:

         Treat
             - Albatross!
             - Crunchy Frog!
         Quantity
             - 299
             - 1499
         Description
             - On a stick!
             - If we took the bones out, it wouldn't be
               crunchy, now would it?

* A syntax for _`stubs in grid tables` is easy to imagine::

      +------------------------++------------+----------+
      | Header row, column 1   || Header 2   | Header 3 |
      +========================++============+==========+
      | body row 1, column 1   || column 2   | column 3 |
      +------------------------++------------+----------+

  Or this idea from Nick Moffitt::

      +-----+---+---+
      | XOR # T | F |
      +=====+===+===+
      |  T  # F | T |
      +-----+---+---+
      |  F  # T | F |
      +-----+---+---+


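
The space-dash-space and ellipsis notations for column spans could be
resolved against the known column order roughly as follows. This is a
minimal Python sketch. ``parse_span`` and its error policy are
assumptions, and comma-separated names are deliberately not accepted,
in line with the preference above for simply specifying the two ends::

    import re

    # Hypothetical pattern: "first - last" or "first ... last" names a span.
    SPAN_RE = re.compile(r"^(?P<first>.+?)\s+(?:-|\.\.\.)\s+(?P<last>.+)$")


    def parse_span(field_name, columns):
        """Return the inclusive (start, end) column indexes for a field name."""
        name = field_name.strip()
        match = SPAN_RE.match(name)
        if match:
            first, last = match.group("first"), match.group("last")
        else:
            first = last = name  # a plain field name covers one column
        if first not in columns or last not in columns:
            raise ValueError(f"unknown column in {field_name!r}")
        start, end = columns.index(first), columns.index(last)
        if start > end:
            raise ValueError(f"span ends are out of order in {field_name!r}")
        return start, end


    columns = ["foo", "bar", "baz", "quux"]
    print(parse_span("foo - baz", columns))    # (0, 2)
    print(parse_span("foo ... baz", columns))  # (0, 2)
    print(parse_span("bar", columns))          # (1, 1)

Because only the two ends are named, reordering columns can silently
change what a span covers. The sketch at least rejects reversed ends
instead of guessing.
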
Auto-Enumerated Lists
=====================

Implemented 2005-03-24: combination of variation 1 & 2.

The advantage of auto-numbered enumerated lists would be similar to
that of auto-numbered footnotes: lists could be written and rearranged
without having to manually renumber them. The disadvantages are also
the same: input and output wouldn't match exactly; the markup may be
ugly or confusing (depending on which alternative is chosen).

1. Use the "#" symbol. Example::

       #. Item 1.
       #. Item 2.
       #. Item 3.

   Advantages: simple, explicit. Disadvantage: enumeration sequence
   cannot be specified (limited to arabic numerals); ugly.

2. As a variation on #1, first initialize the enumeration sequence?
   For example::

       a) Item a.
       #) Item b.
       #) Item c.

   Advantages: simple, explicit, any enumeration sequence possible.
   Disadvantages: ugly; perhaps confusing with mixed concrete/abstract
   enumerators.

3. Alternative suggested by Fred Bremmer, from experience with MoinMoin::

       1. Item 1.
       1. Item 2.
       1. Item 3.

   Advantages: enumeration sequence is explicit (could be multiple
   "a." or "(I)" tokens). Disadvantages: perhaps confusing; otherwise
   erroneous input (e.g., a duplicate item "1.") would pass silently,
   either causing a problem later in the list (if no blank lines
   between items) or creating two lists (with blanks).

   Take this input for example::

       1. Item 1.

       1. Unintentional duplicate of item 1.

       2. Item 2.

   Currently the parser will produce two lists, "1" and "1,2" (no
   warnings, because of the presence of blank lines). Using Fred's
   notation, the current behavior is "1,1,2 -> 1 1,2" (without blank
   lines between items, it would be "1,1,2 -> 1 [WARNING] 1,2"). What
   should the behavior be with auto-numbering?

   Fred has produced a patch__, whose initial behavior is as follows::

       1,1,1   -> 1,2,3
       1,2,2   -> 1,2,3
       3,3,3   -> 3,4,5
       1,2,2,3 -> 1,2,3 [WARNING] 3
       1,1,2   -> 1,2 [WARNING] 2

   (After the "[WARNING]", the "3" would begin a new list.)

   I have mixed feelings about adding this functionality to the spec &
   parser. It would certainly be useful to some users (myself
   included; I often have to renumber lists). Perhaps it's too
   clever, asking the parser to guess too much. What if you *do* want
   three one-item lists in a row, each beginning with "1."? You'd
   have to use empty comments to force breaks. Also, I question
   whether "1,2,2 -> 1,2,3" is optimal behavior.

   In response, Fred came up with "a stricter and more explicit rule
   [which] would be to only auto-number silently if *all* the
   enumerators of a list were identical". In that case::

       1,1,1   -> 1,2,3
       1,2,2   -> 1,2 [WARNING] 2
       3,3,3   -> 3,4,5
       1,2,2,3 -> 1,2 [WARNING] 2,3
       1,1,2   -> 1,2 [WARNING] 2

   Should any start-value be allowed ("3,3,3"), or should
   auto-numbered lists be limited to begin with ordinal-1 ("1", "A",
   "a", "I", or "i")?

   __ http://sourceforge.net/tracker/index.php?func=detail&aid=548802
      &group_id=38414&atid=422032

4. Alternative proposed by Tony Ibbs::

       #1. First item.
       #3. Aha - I edited this in later.
       #2. Second item.

   The initial proposal required unique enumerators within a list, but
   this limits the convenience of a feature of already limited
   applicability and convenience. Not a useful requirement; dropped.

   Instead, simply prepend a "#" to a standard list enumerator to
   indicate auto-enumeration. The numbers (or letters) of the
   enumerators themselves are not significant, except:

   - as a sequence indicator (arabic, roman, alphabetic; upper/lower),

   - and perhaps as a start value (first list item).

   Advantages: explicit, any enumeration sequence possible.
   Disadvantages: a bit ugly.


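
The two rule sets discussed under item 3 can be compared mechanically.
The Python sketch below is a hypothetical model, not Fred's patch or
the Docutils parser. Enumerators are reduced to integers and blank
lines are ignored. With ``strict=False`` it reproduces the table for
the initial patch, and with ``strict=True`` the table for the stricter
rule::

    def split_lists(enums, strict=True):
        """Group written enumerators into lists of rendered ordinals.

        Hypothetical model: integers stand in for enumerators of any
        sequence, and each break starts a new list (where the parser
        would also emit a warning).
        """
        lists = []
        i = 0
        while i < len(enums):
            first = enums[i]
            current = [first]
            # Stricter rule: auto-number only if the second enumerator repeats
            # the first; every later enumerator must then repeat it as well.
            auto = strict and i + 1 < len(enums) and enums[i + 1] == first
            i += 1
            while i < len(enums):
                written, expected = enums[i], current[-1] + 1
                if strict:
                    fits = written == (first if auto else expected)
                else:
                    # Initial patch: accept the next ordinal or a repeat of
                    # the previous item's written enumerator.
                    fits = written in (expected, enums[i - 1])
                if not fits:
                    break
                current.append(expected)
                i += 1
            lists.append(current)
        return lists


    def show(lists):
        """Format lists in the "1,2 [WARNING] 2" notation used above."""
        return " [WARNING] ".join(",".join(map(str, lst)) for lst in lists)


    for source in ["1,1,1", "1,2,2", "3,3,3", "1,2,2,3", "1,1,2"]:
        enums = [int(n) for n in source.split(",")]
        print(f"{source:8}-> initial: {show(split_lists(enums, strict=False))};"
              f" strict: {show(split_lists(enums))}")

Under the stricter rule, a list whose first two enumerators differ is
checked exactly like an ordinary enumerated list. This is what makes
"1,2,2" produce a warning instead of being silently renumbered.
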
Adjacent citation references
============================

A special case for inline markup was proposed and implemented:
multiple citation references could be joined into one::

  [cite1]_[cite2]_ instead of requiring [cite1]_ [cite2]_

However, this was later rejected as an unwarranted exception to the
rules for inline markup. (The main motivation for the proposal,
grouping citations in the LaTeX writer, was implemented by recognising
the second form in the example above, citation references separated by
whitespace, and transforming it into ``\cite{cite1,cite2}``.)


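
The LaTeX writer's grouping of whitespace-separated citation
references can be illustrated on plain text. This Python sketch is
only an illustration under stated assumptions. Docutils writers work
on the document tree rather than on source text, and the
citation-label pattern below (a letter followed by word characters,
dots or hyphens) is a simplification::

    import re

    # Assumed citation label syntax: a letter, then word characters, "." or "-".
    CITE_REF = r"\[([A-Za-z][\w.-]*)\]_"
    # A run of citation references separated only by whitespace.
    CITE_RUN = re.compile(rf"{CITE_REF}(?:\s+{CITE_REF})*")


    def group_citations(text):
        """Replace each run of adjacent citation references with one \\cite."""
        def to_cite(match):
            labels = re.findall(CITE_REF, match.group(0))
            return r"\cite{" + ",".join(labels) + "}"
        return CITE_RUN.sub(to_cite, text)


    print(group_citations("As shown in [cite1]_ [cite2]_, ..."))
    # As shown in \cite{cite1,cite2}, ...

Numeric footnote references such as ``[1]_`` do not match the assumed
label pattern and are left alone.
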
Inline markup recognition
=========================

Implemented 2011-12-05 (version 0.9):
Extended `inline markup recognition rules`_.

Non-ASCII whitespace, punctuation characters and "international" quotes are
allowed around inline markup (based on `Unicode categories`_). The rules for
ASCII characters were not changed.

Rejected alternatives:

a) Use `Unicode categories`_ for all chars (ASCII or not)

   +1 comprehensible, standards based,
   -1 many "false positives" need escaping,
   -1 not backwards compatible.

b) full backwards compatibility

   :Pi: only before start-string
   :Pf: only behind end-string
   :Po: "conservative" sorting of other punctuation:

        :``.,;!?\\``: Close
        :``¡¿``: Open

   +1 backwards compatible,
   +1 logical extension of the existing rules,
   -1 exception list for "other" punctuation needed,
   -1 rules even more complicated,
   -1 not clear how to sort "other" punctuation that is currently not
   recognized,
   -2 international quoting convention like
   »German ›angular‹ quotes« not recognized.

.. _Inline markup recognition rules:
   ../../ref/rst/restructuredtext.html#inline-markup-recognition-rules
.. _Unicode categories:
   http://www.unicode.org/Public/5.1.0/ucd/UCD.html#General_Category_Values


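
A category-based test of the kind described above can be sketched with
Python's ``unicodedata`` module. The exact character sets are
assumptions for illustration: the ASCII set restates the unchanged
ASCII rules as we understand them, and the accepted non-ASCII
categories are a plausible choice, not a copy of the Docutils
implementation::

    import unicodedata

    # Assumption: ASCII characters allowed directly before a start-string
    # (whitespace plus - : / ' " < ( [ {), i.e. the unchanged ASCII rules.
    ASCII_BEFORE_START = set(" \t\n-:/'\"<([{")

    # Assumption: non-ASCII characters are accepted by Unicode general category:
    # separators (Zs, Zl, Zp), open/initial/final punctuation (Ps, Pi, Pf),
    # dashes (Pd) and other punctuation (Po).
    CATEGORIES_BEFORE_START = {"Zs", "Zl", "Zp", "Ps", "Pi", "Pf", "Pd", "Po"}


    def may_precede_start_string(char):
        """Return True if ``char`` may appear directly before a start-string."""
        if ord(char) < 128:
            return char in ASCII_BEFORE_START
        return unicodedata.category(char) in CATEGORIES_BEFORE_START


    for char in ["(", "a", "\u00a0", "\u00bb", "\u00ab"]:
        print(repr(char), unicodedata.category(char), may_precede_start_string(char))

The German opening quote ``»`` (U+00BB) has category ``Pf``. Rejected
alternative b) allowed only ``Pi`` before a start-string, so it could
not recognize markup inside »German quotes«.
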
-----------------
Not Implemented
-----------------

Reworking Footnotes
===================

As a further wrinkle (see `Reworking Explicit Markup (Round 1)`_
above), in the wee hours of 2002-02-28 I posted several ideas for
changes to footnote syntax:

- Change footnote syntax from ``.. [1]`` to ``_[1]``? ...
- Differentiate (with new DTD elements) author-date "citations"
  (``[GVR2002]``) from numbered footnotes? ...
- Render footnote references as superscripts without "[]"? ...

These ideas are all related, and suggest changes in the
reStructuredText syntax as well as the Docutils tree model.

The footnote has been used for both true footnotes (asides expanding
on points or defining terms) and for citations (references to external
works). Rather than dealing with one amalgam construct, we could
separate the current footnote concept into strict footnotes and
citations. Citations could be interpreted and treated differently
from footnotes. Footnotes would be limited to numerical labels:
manual ("1") and auto-numbered (anonymous "#", named "#label").

The footnote is the only explicit markup construct (starts with ".. ")
that directly translates to a visible body element. I've always been
a little bit uncomfortable with the ".. " marker for footnotes because
of this; ".. " has a connotation of "special", but footnotes aren't
especially "special". Printed texts often put footnotes at the bottom
of the page where the reference occurs (thus "foot note"). Some HTML
designs would leave footnotes to be rendered in the same positions
where they're defined. Other online and printed designs will gather
footnotes into a section near the end of the document, converting them
to "endnotes" (perhaps using a directive in our case); but this
"special processing" is not an intrinsic property of the footnote
itself, but a decision made by the document author or processing
system.

Citations are almost invariably collected in a section at the end of a
document or section. Citations "disappear" from where they are
defined and are magically reinserted at some well-defined point.
There's more of a connection to the "special" connotation of the ".. "
syntax. The point at which the list of citations is inserted could be
defined manually by a directive (e.g., ".. citations::"), and/or have
default behavior (e.g., a section automatically inserted at the end of
the document) that might be influenced by options to the Writer.

Syntax proposals:

+ Footnotes:

  - Current syntax::

        .. [1] Footnote 1
        .. [#] Auto-numbered footnote.
        .. [#label] Auto-labeled footnote.

  - The syntax proposed in the original 2002-02-28 Doc-SIG post:
    remove the ".. ", prefix a "_"::

        _[1] Footnote 1
        _[#] Auto-numbered footnote.
        _[#label] Auto-labeled footnote.

    The leading underscore syntax (earlier dropped because
    ``.. _[1]:`` was too verbose) is a useful reminder that footnotes
    are hyperlink targets.

  - Minimal syntax: remove the ".. [" and "]", prefix a "_", and
    suffix a "."::

        _1. Footnote 1.
        _#. Auto-numbered footnote.
        _#label. Auto-labeled footnote.

    ``_1.``, ``_#.``, and ``_#label.`` are markers, like list markers.

    Footnotes could be rendered something like this in HTML:

        | 1. This is a footnote. The brackets could be dropped
        |    from the label, and a vertical bar could set them
        |    off from the rest of the document in the HTML.

    Two-way hyperlinks on the footnote marker ("1." above) would also
    help to differentiate footnotes from enumerated lists.

    If converted to endnotes (by a directive/transform), a horizontal
    half-line might be used instead. Page-oriented output formats
    would typically use the horizontal line for true footnotes.

+ Footnote references:

  - Current syntax::

        [1]_, [#]_, [#label]_

  - Minimal syntax to match the minimal footnote syntax above::

        1_, #_, #label_

    As a consequence, pure-numeric hyperlink references would not be
    possible; they'd be interpreted as footnote references.

+ Citation references: no change is proposed from the current footnote
  reference syntax::

      [GVR2001]_

+ Citations:

  - Current syntax (footnote syntax)::

        .. [GVR2001] Python Documentation; van Rossum, Drake, et al.;
           http://www.python.org/doc/

  - Possible new syntax::

        _[GVR2001] Python Documentation; van Rossum, Drake, et al.;
                   http://www.python.org/doc/

        _[DJG2002]
            Docutils: Python Documentation Utilities project; Goodger
            et al.; http://docutils.sourceforge.net/

    Without the ".. " marker, subsequent lines would either have to
    align as in one of the above, or we'd have to allow loose
    alignment (I'd rather not)::

        _[GVR2001] Python Documentation; van Rossum, Drake, et al.;
            http://www.python.org/doc/

I proposed adopting the "minimal" syntax for footnotes and footnote
references, and adding citations and citation references to
reStructuredText's repertoire. The current footnote syntax for
citations is better than the alternatives given.

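
The cost of the minimal reference syntax (pure-numeric hyperlink
references would be read as footnote references) can be made concrete
with a hypothetical Python classifier for isolated tokens. The name
patterns are simplifying assumptions, not the reStructuredText
reference-name rules::

    import re

    # Hypothetical classifier for the proposed "minimal" reference syntax.
    # Footnote references: 1_, #_, #label_ (digits, "#", or "#" plus a name).
    FOOTNOTE_REF = re.compile(r"(?:\d+|#|#[A-Za-z][\w-]*)_")
    # Simple hyperlink references: a name followed by "_".
    HYPERLINK_REF = re.compile(r"[A-Za-z0-9][\w.-]*_")


    def classify(token):
        """Classify one whitespace-delimited token under the minimal syntax."""
        if FOOTNOTE_REF.fullmatch(token):
            return "footnote reference"
        if HYPERLINK_REF.fullmatch(token):
            return "hyperlink reference"
        return "text"


    for token in ["1_", "#_", "#label_", "docutils_", "2002_", "plain"]:
        print(f"{token:10} {classify(token)}")

Here ``2002_`` is classified as a footnote reference. A hyperlink
target named "2002" could therefore no longer be referenced with the
simple form.
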
From a reply by Tony Ibbs on 2002-03-01:

    However, I think easier with examples, so let's create one::

        Fans of Terry Pratchett are perhaps more likely to use
        footnotes [1]_ in their own writings than other people
        [2]_. Of course, in *general*, one only sees footnotes
        in academic or technical writing - it's use in fiction
        and letter writing is not normally considered good
        style [4]_, particularly in emails (not a medium that
        lends itself to footnotes).

        .. [1] That is, little bits of referenced text at the
           bottom of the page.
        .. [2] Because Terry himself does, of course [3]_.
        .. [3] Although he has the distinction of being
           *funny* when he does it, and his fans don't always
           achieve that aim.
        .. [4] Presumably because it detracts from linear
           reading of the text - this is, of course, the point.

    and look at it with the second syntax proposal::

        Fans of Terry Pratchett are perhaps more likely to use
        footnotes [1]_ in their own writings than other people
        [2]_. Of course, in *general*, one only sees footnotes
        in academic or technical writing - it's use in fiction
        and letter writing is not normally considered good
        style [4]_, particularly in emails (not a medium that
        lends itself to footnotes).

        _[1] That is, little bits of referenced text at the
             bottom of the page.
        _[2] Because Terry himself does, of course [3]_.
        _[3] Although he has the distinction of being
             *funny* when he does it, and his fans don't always
             achieve that aim.
        _[4] Presumably because it detracts from linear
             reading of the text - this is, of course, the point.

    (I note here that if I have gotten the indentation of the
    footnotes themselves correct, this is clearly not as nice. And if
    the indentation should be to the left margin instead, I like that
    even less).

    and the third (new) proposal::

        Fans of Terry Pratchett are perhaps more likely to use
        footnotes 1_ in their own writings than other people
        2_. Of course, in *general*, one only sees footnotes
        in academic or technical writing - it's use in fiction
        and letter writing is not normally considered good
        style 4_, particularly in emails (not a medium that
        lends itself to footnotes).

        _1. That is, little bits of referenced text at the
            bottom of the page.
        _2. Because Terry himself does, of course 3_.
        _3. Although he has the distinction of being
            *funny* when he does it, and his fans don't always
            achieve that aim.
        _4. Presumably because it detracts from linear
            reading of the text - this is, of course, the point.

    I think I don't, in practice, mind the targets too much (the use
    of a dot after the number helps a lot here), but I do have a
    problem with the body text, in that I don't naturally separate out
    the footnotes as different than the rest of the text - instead I
    keep wondering why there are numbers interspersed in the text. The
    use of brackets around the numbers ([ and ]) made me somehow parse
    the footnote references as "odd" - i.e., not part of the body
    text - and thus both easier to skip, and also (paradoxically)
    easier to pick out so that I could follow them.

    Thus, for the moment (and as always susceptible to argument), I'd
    say -1 on the new form of footnote reference (i.e., I much prefer
    the existing ``[1]_`` over the proposed ``1_``), and ambivalent
    over the proposed target change.

    That leaves David's problem of wanting to distinguish footnotes
    and citations - and the only thing I can propose there is that
    footnotes are numeric or # and citations are not (which, as a
    human being, I can probably cope with!).

From a reply by Paul Moore on 2002-03-01:

    I think the current footnote syntax ``[1]_`` is *exactly* the
    right balance of distinctness vs unobtrusiveness. I very
    definitely don't think this should change.

    On the target change, it doesn't matter much to me.

From a further reply by Tony Ibbs on 2002-03-01, referring to the
"[1]" form and actual usage in email:

    Clearly this is a form people are used to, and thus we should
    consider it strongly (in the same way that the usage of ``*..*``
    to mean emphasis was taken partly from email practise).

    Equally clearly, there is something "magical" for people in the
    use of a similar form (i.e., ``[1]``) for both footnote reference
    and footnote target - it seems natural to keep them similar.

    ...

    I think that this established plaintext usage leads me to strongly
    believe we should retain square brackets at both ends of a
    footnote. The markup of the reference end (a single trailing
    underscore) seems about as minimal as we can get away with. The
    markup of the target end depends on how one envisages the thing -
    if ".." means "I am a target" (as I tend to see it), then that's
    good, but one can also argue that the "_[1]" syntax has a neat
    symmetry with the footnote reference itself, if one wishes (in
    which case ".." presumably means "hidden/special" as David seems
    to think, which is why one needs a ".." *and* a leading underline
    for hyperlink targets).

Given the persuasive arguments voiced, we'll leave footnote & footnote
reference syntax alone. However, these discussions gave rise to the
"auto-symbol footnote" concept, which has been added. Citations and
citation references have also been added.


Syntax for Questions & Answers
==============================

Implement as a generic two-column marked list? As a standalone
(non-directive) construct? (Is the markup ambiguous?) Add support to
parts.contents?

New elements would be required. Perhaps::

    <!ELEMENT question_list (question_list_item+)>
    <!ATTLIST question_list
        numbering  (none | local | global)
                   #IMPLIED
        start      NUMBER #IMPLIED>
    <!ELEMENT question_list_item (question, answer*)>
    <!ELEMENT question %text.model;>
    <!ELEMENT answer (%body.elements;)+>

Originally I thought of implementing a Q&A list with special syntax::

    Q: What am I?

    A: You are a question-and-answer
       list.

    Q: What are you?

    A: I am the omniscient "we".

Each "Q" and "A" could also be numbered (e.g., "Q1"). However, a
simple enumerated or bulleted list will do just fine for syntax. A
directive could treat the list specially; e.g. the first paragraph
could be treated as a question, the remainder as the answer (multiple
answers could be represented by nested lists). Without special
syntax, this directive becomes low priority.

As described in the FAQ__, no special syntax or directive is needed
for this application.

__ http://docutils.sf.net/FAQ.html
   #how-can-i-mark-up-a-faq-or-other-list-of-questions-answers


--------
Tabled
--------

Reworking Explicit Markup (Round 2)
===================================

See `Reworking Explicit Markup (Round 1)`_ for an earlier discussion.

In April 2004, a new thread began on docutils-develop: `Inconsistency
in RST markup`__. Several arguments were made; the first argument
begat later arguments. Below, the arguments are paraphrased "in
quotes", with responses.

__ http://thread.gmane.org/gmane.text.docutils.devel/1386

1. References and targets take this form::

       targetname_

       .. _targetname: stuff

   But footnotes, "which generate links just like targets do", are
   written as::

       [1]_

       .. [1] stuff

   "Footnotes should be written as"::

       [1]_

       .. _[1]: stuff

   But they're not the same type of animal. That's not a "footnote
   target", it's a *footnote*. Being a target is not a footnote's
   primary purpose (an arguable point). It just happens to grow a
   target automatically, for convenience. Just as a section title::

       Title
       =====

   isn't a "title target", it's a *title*, which happens to grow a
   target automatically. The consistency is there, it's just deeper
   than at first glance.

   Also, ".. [1]" was chosen for footnote syntax because it closely
   resembles one form of actual footnote rendering. ".. _[1]:" is too
   verbose; excessive punctuation is required to get the job done.

   For more of the reasoning behind the syntax, see `Problems With
   StructuredText (Hyperlinks) <problems.html#hyperlinks>`__ and
   `Reworking Footnotes`_.

2. "I expect directives to also look like ``.. this:`` [one colon]
   because that also closely parallels the link and footnote target
   markup."

   There are good reasons for the two-colon syntax:

       Two colons are used after the directive type for these reasons:

       - Two colons are distinctive, and unlikely to be used in common
         text.

       - Two colons avoids clashes with common comment text like::

             .. Danger: modify at your own risk!

       - If an implementation of reStructuredText does not recognize a
         directive (i.e., the directive-handler is not installed), a
         level-3 (error) system message is generated, and the entire
         directive block (including the directive itself) will be
         included as a literal block. Thus "::" is a natural choice.

       -- `restructuredtext.html#directives
          <../../ref/rst/restructuredtext.html#directives>`__

   The last reason is not particularly compelling; it's more of a
   convenient coincidence or mnemonic.

3. "Comments always seemed too easy. I almost never write comments.
   I'd have no problem writing '.. comment:' in front of my comments.
   In fact, it would probably be more readable, as comments *should*
   be set off strongly, because they are very different from normal
   text."

   Many people do use comments though, and some applications of
   reStructuredText require it. For example, all reStructuredText
   PEPs (and this document!) have an Emacs stanza at the bottom, in a
   comment. Having to write ".. comment::" would be very obtrusive.

   Comments *should* be dirt-easy to do. It should be easy to
   "comment out" a block of text. Comments in programming languages
   and other markup languages are invariably easy.

   Any author is welcome to preface their comments with "Comment:" or
   "Do Not Print" or "Note to Editor" or anything they like. A
   "comment" directive could easily be implemented. It might be
   confused with admonition directives, like "note" and "caution"
   though. In unrelated (and unpublished and unfinished) work, adding
   a "comment" directive as a true document element was considered::

       If structure is necessary, we could use a "comment" directive
       (to avoid nonsensical DTD changes, the "comment" directive
       could produce an untitled topic element).

4. "One of the goals of reStructuredText is to be *readable* by people
   who don't know it. This construction violates that: it is not at
   all obvious to the uninitiated that text marked by '..' is a
   comment. On the other hand, '.. comment:' would be totally
   transparent."

   Totally transparent, perhaps, but also very obtrusive. Another of
   `reStructuredText's goals`_ is to be unobtrusive, and
   ".. comment::" would violate that. The goals of reStructuredText
   are many, and they conflict. Determining the right set of goals
   and finding solutions that best fit is done on a case-by-case
   basis.

   Even readability has two aspects. Being readable without any
   prior knowledge is one. Being as easily read in raw form as in
   processed form is the other. ".." may not contribute to the former
   aspect, but ".. comment::" would certainly detract from the latter.

   .. _author's note:
   .. _reStructuredText's goals: ../../ref/rst/introduction.html#goals

5. "Recently I sent someone an rst document, and they got confused; I
|
|
|
|
|
had to explain to them that '..' marks comments, *unless* it's a
|
|
|
|
|
directive, etc..."
|
|
|
|
|
|
|
|
|
|
The explanation of directives *is* roundabout, defining comments in
|
|
|
|
|
terms of not being other things. That's definitely a wart.
|
|
|
|
|
|
|
|
|
|
6. "Under the current system, a mistyped directive (with ':' instead
|
|
|
|
|
of '::') will be silently ignored. This is an error that could
|
|
|
|
|
easily go unnoticed."
|
|
|
|
|
|
|
|
|
|
A parser option/setting like "--comments-on-stderr" would help.
|
|
|
|
|
|
|
|
|
|
7. "I'd prefer to see double-dot-space / command / double-colon as the
|
|
|
|
|
standard Docutils markup-marker. It's unusual enough to avoid
|
|
|
|
|
being accidently used. Everything that starts with a double-dot
|
|
|
|
|
should end with a double-colon."
|
|
|
|
|
|
|
|
|
|
That would increase the punctuation verbosity of some constructs
|
|
|
|
|
considerably.
|
|
|
|
|
|
|
|
|
|
8. Edward Loper proposed the following plan for backwards
|
|
|
|
|
compatibility:
|
|
|
|
|
|
|
|
|
|
1. ".. foo" will generate a deprecation warning to stderr, and
|
|
|
|
|
nothing in the output (no system messages).
|
|
|
|
|
2. ".. foo: bar" will be treated as a directive foo. If there
|
|
|
|
|
is no foo directive, then do the normal error output.
|
|
|
|
|
3. ".. foo:: bar" will generate a deprecation warning to
|
|
|
|
|
stderr, and be treated as a directive. Or leave it valid?
|
|
|
|
|
|
|
|
|
|
So some existing documents might start printing deprecation
|
|
|
|
|
warnings, but the only existing documents that would *break*
|
|
|
|
|
would be ones that say something like::
|
|
|
|
|
|
|
|
|
|
.. warning: this should be a comment
|
|
|
|
|
|
|
|
|
|
instead of::
|
|
|
|
|
|
|
|
|
|
.. warning:: this should be a comment
|
|
|
|
|
|
|
|
|
|
Here, we're trading fairly common a silent error (directive
|
|
|
|
|
falsely treated as a comment) for a fairly uncommon explicitly
|
|
|
|
|
flagged error (comment falsely treated as directive). To make
|
|
|
|
|
things even easier, we could add a sentence to the
|
|
|
|
|
unknown-directive error. Something like "If you intended to
|
|
|
|
|
create a comment, please use '.. comment:' instead".
|
|
|
|
|
|
|
|
|
|
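The roundabout definition noted in item 5 and the silent failure in
item 6 both show up in a small Python sketch. It classifies the first
line of an explicit markup block in the spirit of the
reStructuredText spec. The categories are footnote, citation,
hyperlink target, substitution definition and directive, and whatever
is left over is a comment. It then echoes every comment to stderr and
warns about comments that look like a directive typed with a single
colon. That is roughly what a hypothetical "--comments-on-stderr"
setting might report.

The regular expressions are simplified assumptions, not the patterns
used by the Docutils parser. The heuristic can also misfire: for
example, ``.. Note: text`` would be flagged. ::

    import re
    import sys

    # Simplified first-line patterns, checked in order (assumptions only;
    # the real parser is stricter).
    PATTERNS = [
        ("footnote", re.compile(r"\.\. \[(#[^\]]*|\*|\d+)\]( |$)")),
        ("citation", re.compile(r"\.\. \[[A-Za-z][\w.-]*\]( |$)")),
        ("hyperlink target", re.compile(r"\.\. _")),
        ("substitution definition", re.compile(r"\.\. \|[^|]+\| ")),
        ("directive", re.compile(r"\.\. [\w.+-]+::( |$)")),
    ]

    # ".. name: ..." is a comment, but may be a mistyped ".. name::".
    SUSPECT = re.compile(r"\.\. ([\w.+-]+):( |$)")


    def classify(line):
        """Return the construct an explicit markup line starts, or None."""
        if not (line.startswith(".. ") or line.rstrip() == ".."):
            return None
        for kind, pattern in PATTERNS:
            if pattern.match(line):
                return kind
        # Comments are defined by elimination: anything not matched above.
        return "comment"


    def report_comments(text, stream=sys.stderr):
        """Echo each comment line to `stream`; return typo warnings."""
        warnings = []
        for lineno, line in enumerate(text.splitlines(), start=1):
            if classify(line) != "comment":
                continue
            print(f"line {lineno}: comment: {line}", file=stream)
            match = SUSPECT.match(line)
            if match:
                warnings.append(
                    f"line {lineno}: did you mean '.. {match.group(1)}::'?")
        return warnings
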
On one hand, I understand and sympathize with the points raised. On
the other hand, I think the current syntax strikes the right balance
(but I acknowledge a possible lack of objectivity). On the gripping
hand, the comment and directive syntax has become well established, so
even if it's a wart, it may be a wart we have to live with.

Making any of these changes would cause a lot of breakage or at least
deprecation warnings. I'm not sure the benefit is worth the cost.

For now, we'll treat this as an unresolved legacy issue.


-------
 To Do
-------

Nested Inline Markup
====================

These are collected notes on a long-discussed issue. The original
mailing list messages should be referred to for details.

* In a 2001-10-31 discussion I wrote:

      Try, for example, `Ed Loper's 2001-03-21 post`_, which details
      some rules for nested inline markup. I think the complexity is
      prohibitive for the marginal benefit. (And if you can understand
      that tree without going mad, you're a better man than I. ;-)

      Inline markup is already fragile. Allowing nested inline markup
      would only be asking for trouble IMHO. If it proves absolutely
      necessary, it can be added later. The rules for what can appear
      inside what must be well thought out first though.

      .. _Ed Loper's 2001-03-21 post:
         http://mail.python.org/pipermail/doc-sig/2001-March/001487.html

      -- http://mail.python.org/pipermail/doc-sig/2001-October/002354.html

* In a 2001-11-09 Doc-SIG post, I wrote:

      The problem is that in the
      what-you-see-is-more-or-less-what-you-get markup language that
      is reStructuredText, the symbols used for inline markup ("*",
      "**", "`", "``", etc.) may preclude nesting.

  I've rethought this position. Nested markup is not precluded, just
  tricky. People and software parse "double and 'single' quotes" all
  the time. Continuing,

      I've thought over how we might implement nested inline
      markup. The first algorithm ("first identify the outer inline
      markup as we do now, then recursively scan for nested inline
      markup") won't work; counterexamples were given in my `last post
      <http://mail.python.org/pipermail/doc-sig/2001-November/002363.html>`__.

      The second algorithm makes my head hurt::

          while 1:
              scan for start-string
              if found:
                  push on stack
                  scan for start or end string
                  if new start string found:
                      recurse
                  elif matching end string found:
                      pop stack
                  elif non-matching end string found:
                      if it's a markup error:
                          generate warning
                      elif the initial start-string was misinterpreted:
                          # e.g. in this case: ***strong** in emphasis*
                          restart with the other interpretation
                          # but it might be several layers back ...
              ...

      This is similar to how the parser does section title
      recognition, but sections are much more regular and
      deterministic.

      Bottom line is, I don't think the benefits are worth the effort,
      even if it is possible. I'm not going to try to write the code,
      at least not now. If somebody codes up a consistent, working,
      general solution, I'll be happy to consider it.

      -- http://mail.python.org/pipermail/doc-sig/2001-November/002388.html

* In a `2003-05-06 Docutils-Users post`__ Paul Tremblay proposed a new
  syntax to allow for easier nesting. It eventually evolved into
  this::

      :role:[inline text]

  The duplication with the existing interpreted text syntax is
  problematic though.

  __ http://article.gmane.org/gmane.text.docutils.user/317

* Could the parser be extended to parse nested interpreted text? ::

      :emphasis:`Some emphasized text with :strong:`some more
      emphasized text` in it and **perhaps** :reference:`a link``

* In a `2003-06-18 Docutils-Develop post`__, Mark Nodine reported on
  his implementation of a form of nested inline markup in his
  Perl-based parser (unpublished). He brought up some interesting
  ideas. The implementation was flawed, however, by the change in
  semantics required for backslash escapes.

  __ http://article.gmane.org/gmane.text.docutils.devel/795

* Docutils-develop threads between David Abrahams, David Goodger, and
  Mark Nodine (beginning 2004-01-16__ and 2004-01-19__) hashed out
  many of the details of a potentially successful implementation, as
  described below. David Abrahams checked in code to the "nesting"
  branch of CVS, awaiting thorough review.

  __ http://thread.gmane.org/gmane.text.docutils.devel/1102
  __ http://thread.gmane.org/gmane.text.docutils.devel/1125

It may be possible to accomplish nested inline markup in general with
a more powerful inline markup parser. There may be some issues, but
I'm not averse to the idea of nested inline markup in general. I just
don't have the time or inclination to write a new parser now. Of
course, a good patch would be welcome!

I envisage something like this. Explicit-role interpreted text must
be nestable. Prefix-based is probably preferred, since suffix-based
will look like inline literals::

    ``text`:role1:`:role2:

But it can be disambiguated, so it ought to be left up to the author::

    `\ `text`:role1:`:role2:

In addition, other forms of inline markup may be nested if
unambiguous::

    *emphasized ``literal`` and |substitution ref| and link_*

IOW, the parser ought to be as permissive as possible.


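The stack-based "second algorithm" quoted above can be sketched in
Python for a small, hypothetical subset of inline markup. The subset
is emphasis, strong emphasis, and double-backquoted inline literals;
the contents of a literal are never scanned for nested markup.

Each start-string pushes a frame on a stack, and a matching end-string
pops it. Unlike the pseudocode, a non-matching end-string is simply
reported as an error rather than retried with another interpretation,
so input like ``***strong** in emphasis*`` is rejected. The sketch
also ignores the whitespace and punctuation rules that the real parser
applies around start- and end-strings::

    import re

    # Delimiters recognized by this sketch; "``" is tried before "**" and "*".
    TOKEN = re.compile(r"``|\*\*|\*")
    KINDS = {"**": "strong", "*": "emphasis"}


    def parse_nested(text):
        """Parse nested inline markup into strings and (kind, children) tuples.

        Raises ValueError for unterminated or improperly nested markup.
        """
        root = []
        stack = [("root", root)]  # each frame: (kind, list of children)
        pos = 0
        while pos < len(text):
            match = TOKEN.search(text, pos)
            if match is None:
                stack[-1][1].append(text[pos:])
                break
            if match.start() > pos:
                stack[-1][1].append(text[pos:match.start()])
            delim, pos = match.group(), match.end()
            if delim == "``":
                # Inline literals cannot contain nested markup: jump to the end.
                end = text.find("``", pos)
                if end == -1:
                    raise ValueError("unterminated inline literal")
                stack[-1][1].append(("literal", [text[pos:end]]))
                pos = end + 2
            elif stack[-1][0] == KINDS[delim]:
                # Matching end-string: pop the frame and attach it to its parent.
                kind, children = stack.pop()
                stack[-1][1].append((kind, children))
            elif any(kind == KINDS[delim] for kind, _ in stack):
                # Non-matching end-string for an outer element that is still open.
                raise ValueError(f"{delim!r} closes {KINDS[delim]} out of order")
            else:
                # New start-string: open a nested element.
                stack.append((KINDS[delim], []))
        if len(stack) > 1:
            raise ValueError(f"unterminated {stack[-1][0]}")
        return root
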
Index Entries & Indexes
=======================

Were I writing a book with an index, I guess I'd need two
different kinds of index targets: inline/implicit and
out-of-line/explicit. For example::

    In this `paragraph`:index:, several words are being
    `marked`:index: inline as implicit `index`:index:
    entries.

    .. index:: markup
    .. index:: syntax

    The explicit index directives above would refer to
    this paragraph. It might also make sense to allow multiple
    entries in an ``index`` directive:

    .. index::
       markup
       syntax

The words "paragraph", "marked", and "index" would become index
entries pointing at the words in the first paragraph. The index
entry words appear verbatim in the text. (Don't worry about the
ugly ":index:" part; if indexing is the only/main application of
interpreted text in your documents, it can be implicit and
omitted.) The two directives provide manual indexing, where the
index entry words ("markup" and "syntax") do not appear in the
main text. We could combine the two directives into one::

    .. index:: markup; syntax

Semicolons are used instead of commas because commas could *be* part
of the index target, like::

    .. index:: van Rossum, Guido

Another reason for index directives is that other inline markup
wouldn't be possible within inline index targets.

Sometimes index entries have multiple levels. Given::

    .. index:: statement syntax: expression statements

In a hypothetical index, combined with other entries, it might
look like this::

    statement syntax
        expression statements ..... 56
        assignment ................ 57
        simple statements ......... 58
        compound statements ....... 60

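A few lines of Python can sketch how the argument of an ``index``
directive might be split under the conventions floated above, and how
the entries could then be rendered like the hypothetical index. The
assumed conventions are:

* Semicolons separate entries, so a comma can stay inside a name such
  as "van Rossum, Guido".

* A colon separates the levels of a multi-level entry.

The function names and page numbers are invented for illustration.
This is not an existing Docutils feature::

    from collections import defaultdict


    def parse_index_argument(argument):
        """Split an index argument into entries, each a list of levels.

        Assumed convention: ";" separates entries, ":" separates levels.
        """
        entries = []
        for entry in argument.split(";"):
            levels = [level.strip() for level in entry.split(":")]
            levels = [level for level in levels if level]
            if levels:
                entries.append(levels)
        return entries


    def render_index(located_entries, width=34):
        """Render (levels, page) pairs as a two-level index.

        Main entries are sorted alphabetically and subentries by page,
        with dot leaders padding each subentry to `width` columns.
        Levels beyond the second are ignored in this sketch.
        """
        tree = defaultdict(list)
        for levels, page in located_entries:
            sub = levels[1] if len(levels) > 1 else None
            tree[levels[0]].append((sub, page))
        lines = []
        for main in sorted(tree, key=str.lower):
            subs = sorted(tree[main], key=lambda item: item[1])
            own_pages = [str(page) for sub, page in subs if sub is None]
            lines.append(" ".join([main] + own_pages))
            for sub, page in subs:
                if sub is not None:
                    lines.append(f"    {sub} ".ljust(width, ".") + f" {page}")
        return "\n".join(lines)
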
Inline multi-level index targets could be done too. Perhaps
something like::

    When dealing with `expression statements <statement syntax:>`,
    we must remember ...

The opposite sense could also be possible::

    When dealing with `index entries <:multi-level>`, there are
    many permutations to consider.

Also "see / see also" index entries.

Given::

    Here's a paragraph.

    .. index:: paragraph

(The "index" directive above actually targets the *preceding*
object.) The directive should produce something like this XML::

    <paragraph>
      <index_entry text="paragraph"/>
      Here's a paragraph.
    </paragraph>

This kind of content model would also allow true inline
index-entries::

    Here's a `paragraph`:index:.

If the "index" role were the default for the application, it could be
dropped::

    Here's a `paragraph`.

Both of these would result in this XML::

    <paragraph>
      Here's a <index_entry>paragraph</index_entry>.
    </paragraph>


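The inline content model can be illustrated with Python's standard
``xml.etree.ElementTree``. Text before an ``index_entry`` element
becomes the paragraph's text, and text after it becomes the element's
tail. This sketch handles only the explicit ``:index:`` role form,
not the directive form or a default role. The element names simply
follow the XML shown above::

    import re
    import xml.etree.ElementTree as ET

    # Matches `text`:index: (explicit-role interpreted text, suffix form).
    INDEX_ROLE = re.compile(r"`([^`]+)`:index:")


    def paragraph_to_xml(text):
        """Build <paragraph> with inline <index_entry> children (sketch only)."""
        paragraph = ET.Element("paragraph")
        previous = None  # element whose .tail receives the following text
        pos = 0
        for match in INDEX_ROLE.finditer(text):
            before = text[pos:match.start()]
            if previous is None:
                paragraph.text = before
            else:
                previous.tail = before
            previous = ET.SubElement(paragraph, "index_entry")
            previous.text = match.group(1)
            pos = match.end()
        rest = text[pos:]
        if previous is None:
            paragraph.text = rest
        else:
            previous.tail = rest
        return ET.tostring(paragraph, encoding="unicode")
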
from 2002-06-24 docutils-develop posts
--------------------------------------

If all of your index entries will appear verbatim in the text,
this should be sufficient. If not (e.g., if you want "Van Rossum,
Guido" in the index but "Guido van Rossum" in the text), we'll
have to figure out a supplemental mechanism, perhaps using
substitutions.

I've thought a bit more on this, and I came up with two possibilities:

1. Using interpreted text, embed the index entry text within the
   interpreted text::

       ... by `Guido van Rossum [Van Rossum, Guido]` ...

   The problem with this is obvious: the text becomes cluttered and
   hard to read. The processed output would drop the text in
   brackets, which goes against the spirit of interpreted text.

2. Use substitutions::

       ... by |Guido van Rossum| ...

       .. |Guido van Rossum| index:: Van Rossum, Guido

   A problem with this is that each substitution definition must have
   a unique name. A subsequent ``.. |Guido van Rossum| index:: BDFL``
   would be illegal. Some kind of anonymous substitution definition
   mechanism would be required, but I think that's going too far.

Both of these alternatives are flawed. Any other ideas?


-------------------
 ... Or Not To Do?
-------------------

This is the realm of the possible but questionably probable. These
ideas are kept here as a record of what has been proposed, for
posterity and in case any of them prove to be useful.


Compound Enumerated Lists
=========================

Allow for compound enumerators, such as "1.1." or "1.a." or "1(a)", to
allow for nested enumerated lists without indentation?


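As a rough illustration of what the parser would have to recognize, a
compound enumerator such as "1.1.", "1.a.", or "1(a)" can be split
into its components with a regular expression. The number of
components would then stand in for the nesting depth that indentation
provides today. This Python sketch is hypothetical and deliberately
small: it accepts arabic numerals and single letters only, with no
Roman numerals::

    import re

    # Components like "1", "1.1", "1.a", ended by "." or by "(x)".
    ENUMERATOR = re.compile(
        r"(?P<body>(?:\d+|[A-Za-z])(?:\.(?:\d+|[A-Za-z]))*)"
        r"(?:\.|\((?P<last>\d+|[A-Za-z])\))"
        r"(?:\s|$)")


    def compound_parts(line):
        """Return the enumerator components of a list item line, or None."""
        match = ENUMERATOR.match(line)
        if match is None:
            return None
        parts = match.group("body").split(".")
        if match.group("last"):
            parts.append(match.group("last"))
        return parts

Ordinary prose can match as well: ``compound_parts("e.g. this")``
returns ``["e", "g"]``. That illustrates the kind of ambiguity such a
syntax would introduce.
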
Indented Lists
==============

Allow for variant styles by interpreting indented lists as if they
weren't indented? For example, currently the list below will be
parsed as a list within a block quote::

    paragraph

      * list item 1
      * list item 2

But a lot of people seem to write that way, and HTML browsers make it
look as if that's the way it should be. The parser could check the
contents of block quotes, and if they contain only a single list,
remove the block quote wrapper. There would be two problems:

1. What if we actually *do* want a list inside a block quote?

2. What if such a list comes immediately after an indented construct,
   such as a literal block?

Both could be solved using empty comments (problem 2 already exists
for a block quote after a literal block). But that's a hack.

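The check described above ("if they contain only a single list,
remove the block quote wrapper") amounts to a small tree transform.
Here is a Python sketch over a hypothetical minimal node class rather
than the real Docutils document tree. As written, it cannot
distinguish the first problem case: a deliberate list inside a block
quote would be unwrapped too::

    from dataclasses import dataclass, field

    LIST_TYPES = {"bullet_list", "enumerated_list"}


    @dataclass
    class Node:
        """Hypothetical minimal document-tree node for this sketch."""
        tagname: str
        children: list = field(default_factory=list)


    def unwrap_list_quotes(node):
        """Replace block_quote children holding exactly one list by that list."""
        new_children = []
        for child in node.children:
            unwrap_list_quotes(child)  # bottom-up, so nested quotes collapse too
            if (child.tagname == "block_quote"
                    and len(child.children) == 1
                    and child.children[0].tagname in LIST_TYPES):
                new_children.append(child.children[0])
            else:
                new_children.append(child)
        node.children = new_children
        return node
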
Perhaps a runtime setting, allowing or disabling this convenience,
would be appropriate. But that raises issues too:

User A, who writes lists indented (and their config file is set up
to allow it), sends a file to user B, who doesn't (and their
config file disables indented lists). The result of processing by
the two users will be different.

It may seem minor, but it adds ambiguity to the parser, which is bad.

See the `Doc-SIG discussion starting 2001-04-18`__ with Ed Loper's
"Structuring: a summary; and an attempt at EBNF", item 4 (and
follow-ups, here__ and here__). Also `docutils-users, 2003-02-17`__
and `beginning 2003-08-04`__.

__ http://mail.python.org/pipermail/doc-sig/2001-April/001776.html
__ http://mail.python.org/pipermail/doc-sig/2001-April/001789.html
__ http://mail.python.org/pipermail/doc-sig/2001-April/001793.html
__ http://sourceforge.net/mailarchive/message.php?msg_id=3838913
__ http://sf.net/mailarchive/forum.php?thread_id=2957175&forum_id=11444


Sloppy Indentation of List Items
================================

Perhaps the indentation shouldn't be so strict. Currently, this is
required::

    1. First line,
       second line.

Anything wrong with this? ::

    1. First line,
     second line.

Problem? ::

    1. First para.

         Block quote. (no good: requires some indent relative to first
         para)

       Second Para.

    2. Have to carefully define where the literal block ends::

         Literal block

       Literal block?

Hmm... Non-strict indentation isn't such a good idea.


Lazy Indentation of List Items
==============================

Another approach: Going back to the first draft of reStructuredText
(2000-11-27 post to Doc-SIG)::

    - This is the fourth item of the main list (no blank line above).
    The second line of this item is not indented relative to the
    bullet, which precludes it from having a second paragraph.

Change that to *require* a blank line above and below, to reduce
ambiguity. This "loosening" may be added later, once the parser's
been nailed down. However, a serious drawback of this approach is that
it limits the content of each list item to a single paragraph.


David's Idea for Lazy Indentation
---------------------------------

Consider a paragraph in a word processor. It is a single logical line
of text which ends with a newline, soft-wrapped arbitrarily at the
right edge of the page or screen. We can think of a plaintext
paragraph in the same way, as a single logical line of text, ending
with two newlines (a blank line) instead of one, and which may contain
arbitrary line breaks (newlines) where it was accidentally
hard-wrapped by an application. We can compensate for the accidental
hard-wrapping by "unwrapping" every unindented second and subsequent
line. The indentation of the first line of a paragraph or list item
would determine the indentation for the entire element. Blank lines
would be required between list items when using lazy indentation.

The following example shows the lazy indentation of multiple body
elements::

    - This is the first paragraph
    of the first list item.

      Here is the second paragraph
    of the first list item.

    - This is the first paragraph
    of the second list item.

      Here is the second paragraph
    of the second list item.

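The unwrapping rule can be sketched as a Python preprocessing pass.
In each block of lines separated by blank lines, the first line fixes
the indentation; for a bullet item, that is the column where its text
starts. Every later line in the block is re-indented to that column.

The helper name and its restriction to ``-``, ``*`` and ``+`` bullets
are assumptions. Because blank lines are required between list items,
a bullet in the middle of a block is treated as continuation text. The
sketch also does not handle the definition-list case in the next
example, where a definition's first line is indented further than its
term::

    import re

    # Bullet at the start of a block: leading indentation + bullet + one space.
    BULLET = re.compile(r"(\s*)([-*+] )")


    def unlazy(text):
        """Re-indent lazily wrapped lines to the indentation of their block."""
        output = []
        indent = ""
        start_of_block = True
        for line in text.splitlines():
            if not line.strip():
                output.append("")
                start_of_block = True
                continue
            if start_of_block:
                bullet = BULLET.match(line)
                if bullet:
                    # Continuation lines align with the text after the bullet.
                    indent = bullet.group(1) + " " * len(bullet.group(2))
                else:
                    indent = line[:len(line) - len(line.lstrip())]
                output.append(line)
                start_of_block = False
            else:
                output.append(indent + line.lstrip())
        return "\n".join(output)
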
A more complex example shows the limitations of lazy indentation::

    - This is the first paragraph
    of the first list item.

      Next is a definition list item:

      Term
          Definition. The indentation of the term is
      required, as is the indentation of the definition's
      first line.

          When the definition extends to more than
      one line, lazy indentation may occur. (This is the second
      paragraph of the definition.)

    - This is the first paragraph
    of the second list item.

      - Here is the first paragraph of
      the first item of a nested list.

      So this paragraph would be outside of the nested list,
      but inside the second list item of the outer list.

    But this paragraph is not part of the list at all.

And the ambiguity remains::

    - Look at the hyphen at the beginning of the next line
    - is it a second list item marker, or a dash in the text?

    Similarly, we may want to refer to numbers inside enumerated
    lists:

    1. How many socks in a pair? There are
    2. How many pants in a pair? Exactly
    1. Go figure.

Literal blocks and block quotes would still require consistent
indentation for all their lines. For block quotes, we might be able
to get away with only requiring that the first line of each contained
element be indented. For example::

    Here's a paragraph.

        This is a paragraph inside a block quote.
    Second and subsequent lines need not be indented at all.

        - A bullet list inside
        the block quote.

          Second paragraph of the
        bullet list inside the block quote.

Although feasible, this form of lazy indentation has problems. The
document structure and hierarchy are not obvious from the indentation,
making the source plaintext difficult to read. It also makes keeping
track of the indentation while writing difficult and error-prone.
However, these problems may be acceptable for Wikis and email mode,
where we may be able to rely on less complex structure (few nested
lists, for example).


Multiple Roles in Interpreted Text
==================================

In reStructuredText, inline markup cannot be nested (yet; `see
above`__). This also applies to interpreted text. In order to
simultaneously combine multiple roles for a single piece of text, a
syntax extension would be necessary. Ideas:

1. Initial idea::

       `interpreted text`:role1,role2:

2. Suggested by Jason Diamond::

       `interpreted text`:role1:role2:

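For comparison, both proposed spellings are straightforward to
recognize mechanically. Here is a hypothetical Python sketch that
returns the text and the list of roles for either form::

    import re

    # `text`:role1,role2:  (idea 1)  or  `text`:role1:role2:  (idea 2)
    MULTI_ROLE = re.compile(
        r"`(?P<text>[^`]+)`:(?P<roles>[\w.-]+(?:[,:][\w.-]+)*):")


    def split_roles(markup):
        """Return (text, [roles]) for suffix-style multi-role interpreted text."""
        match = MULTI_ROLE.fullmatch(markup)
        if match is None:
            return None
        roles = re.split(r"[,:]", match.group("roles"))
        return match.group("text"), roles
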
If a document is so complex as to require nested inline markup,
perhaps another markup system should be considered. By design,
reStructuredText does not have the flexibility of XML.

__ `Nested Inline Markup`_


Parameterized Interpreted Text
==============================

In some cases it may be expedient to pass parameters to interpreted
text, analogous to function calls. Ideas:

1. Parameterize the interpreted text role itself (suggested by Jason
   Diamond)::

       `interpreted text`:role1(foo=bar):

   Positional parameters could also be supported::

       `CSS`:acronym(Cascading Style Sheets): is used for HTML, and
       `CSS`:acronym(Content Scrambling System): is used for DVDs.

   Technical problem: current interpreted text syntax does not
   recognize roles containing whitespace. Design problem: this smells
   like programming language syntax, but reStructuredText is not a
   programming language.

2. Put the parameters inside the interpreted text::

       `CSS (Cascading Style Sheets)`:acronym: is used for HTML, and
       `CSS (Content Scrambling System)`:acronym: is used for DVDs.

   Although this could be defined on an individual basis (per role),
   we ought to have a standard. Hyperlinks with embedded URIs already
   use angle brackets; perhaps they could be used here too::

       `CSS <Cascading Style Sheets>`:acronym: is used for HTML, and
       `CSS <Content Scrambling System>`:acronym: is used for DVDs.

   Do angle brackets connote URLs too much for this to be acceptable?
   How about the "tag" connotation -- does it save them or doom them?

3. `Nested inline markup`_ could prove useful here::

       `CSS :def:`Cascading Style Sheets``:acronym: is used for HTML,
       and `CSS :def:`Content Scrambling System``:acronym: is used for
       DVDs.

   Inline markup roles could even define the default roles of nested
   inline markup, allowing this cleaner syntax::

       `CSS `Cascading Style Sheets``:acronym: is used for HTML, and
       `CSS `Content Scrambling System``:acronym: is used for DVDs.

Does this push inline markup too far? Readability becomes a serious
issue. Substitutions may provide a better alternative (at the expense
of verbosity and duplication) by pulling the details out of the text
flow::

    |CSS| is used for HTML, and |CSS-DVD| is used for DVDs.

    .. |CSS| acronym:: Cascading Style Sheets
    .. |CSS-DVD| acronym:: Content Scrambling System
       :text: CSS

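To make the angle-bracket variant of idea 2 above concrete, this
Python sketch splits ``text <parameter>`` the same way hyperlink
references with embedded URIs are split. It then renders a
hypothetical ``acronym`` role as an HTML ``acronym`` element with a
``title`` attribute, and leaves other roles untouched. The role name
and the HTML output are illustrative assumptions::

    import html
    import re

    # `text <parameter>`:role:  (angle-bracket form from idea 2)
    PARAM_ROLE = re.compile(
        r"`(?P<text>[^`<]*?)\s*<(?P<param>[^`>]+)>`:(?P<role>[\w.-]+):")


    def render_acronyms(source):
        """Replace `ABBR <expansion>`:acronym: with an HTML <acronym>."""
        def replace(match):
            if match.group("role") != "acronym":
                return match.group(0)  # not ours: leave the markup as-is
            return '<acronym title="{}">{}</acronym>'.format(
                html.escape(match.group("param")),
                html.escape(match.group("text")))
        return PARAM_ROLE.sub(replace, source)
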
----------------------------------------------------------------------

This whole idea may be going beyond the scope of reStructuredText.
Documents requiring this functionality may be better off using XML or
another markup system.

This argument comes up regularly when pushing the envelope of
reStructuredText syntax. I think it's a useful argument in that it
provides a check on creeping featurism. In many cases, the resulting
verbosity produces such unreadable plaintext that there's a natural
desire *not* to use it unless absolutely necessary. It's a matter of
finding the right balance.


Syntax for Interpreted Text Role Bindings
=========================================

The following syntax (idea from Jeffrey C. Jacobs) could be used to
associate directives with roles::

    .. :rewrite: class:: rewrite

    `She wore ribbons in her hair and it lay with streaks of
    grey`:rewrite:

The syntax is similar to that of substitution declarations, and the
directive/role association may resolve implementation issues. The
semantics, ramifications, and implementation details would need to be
worked out.

The example above would implement the "rewrite" role as adding a
``class="rewrite"`` attribute to the interpreted text ("inline"
element). The stylesheet would then pick up on the "class" attribute
to do the actual formatting.

The advantage of the new syntax would be flexibility. Uses other than
"class" may present themselves. The disadvantage is complexity:
having to implement new syntax for a relatively specialized operation,
and having new semantics in existing directives ("class::" would do
something different).

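The intended effect of such a binding can be sketched in Python in
two steps. First, read declarations of the proposed form
``.. :rewrite: class:: rewrite`` into a role-to-class mapping. Then
turn each use of a bound role into an ``inline`` element carrying that
class.

The declaration syntax is only a proposal and the helper names are
hypothetical. The ``class="rewrite"`` result is the one described
above::

    import re
    import xml.etree.ElementTree as ET

    # Proposed (not implemented) declaration: ".. :role: class:: classname"
    BINDING = re.compile(r"\.\. :(?P<role>[\w.-]+): class:: (?P<cls>[\w-]+)\s*$")
    ROLE_USE = re.compile(r"`(?P<text>[^`]+)`:(?P<role>[\w.-]+):")


    def read_bindings(lines):
        """Collect role -> class mappings from binding declarations."""
        bindings = {}
        for line in lines:
            match = BINDING.match(line)
            if match:
                bindings[match.group("role")] = match.group("cls")
        return bindings


    def apply_binding(markup, bindings):
        """Render `text`:role: as <inline class="..."> if the role is bound."""
        match = ROLE_USE.fullmatch(markup)
        if match is None or match.group("role") not in bindings:
            return None
        element = ET.Element("inline", {"class": bindings[match.group("role")]})
        # Collapse the line break(s) of wrapped interpreted text to spaces.
        element.text = " ".join(match.group("text").split())
        return ET.tostring(element, encoding="unicode")
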
The `"role" directive`__ has been implemented.

__ ../../ref/rst/directives.html#role


Character Processing
====================

Several people have suggested adding some form of character processing
to reStructuredText:

* Some sort of automated replacement of ASCII sequences:

  - ``--`` to em-dash (or ``--`` to en-dash, and ``---`` to em-dash).
  - Convert quotes to curly quote entities. (Essentially impossible
    for HTML? Unnecessary for TeX.)
  - Various forms of ``:-)`` to smiley icons.
  - ``"\ "`` to &nbsp;. Problem with line-wrapping though: it could
    end up escaping the newline.
  - Escaped newlines to <BR>.
  - Escaped period or quote or dash as a disappearing catalyst to
    allow character-level inline markup?

* XML-style character entities, such as "&copy;" for the copyright
  symbol.

  Docutils has no need of a character entity subsystem. Since Docutils
  supports Unicode and text encodings, characters should be
  represented directly in the text: a copyright symbol should be
  represented by the copyright symbol character. If this is not
  possible in an authoring environment, a pre-processing stage can be
  added, or a table of substitution definitions can be devised.

  A "unicode" directive has been implemented to allow direct
  specification of esoteric characters. In combination with the
  substitution construct, "include" files defining common sets of
  character entities can be defined and used. `A set of character
  entity definition files has been defined`__ (`tarball`__).
  There's also `a description and instructions for use`__.

  __ http://docutils.sf.net/tmp/charents/
  __ http://docutils.sf.net/tmp/charents.tgz
  __ http://docutils.sf.net/tmp/charents/README.html

  To allow for `character-level inline markup`_, a limited form of
  character processing has been added to the spec and parser: escaped
  whitespace characters are removed from the processed document. Any
  further character processing will be of this functional type, rather
  than of the character-encoding type.

  .. _character-level inline markup:
     ../../ref/rst/restructuredtext.html#character-level-inline-markup

* Directive idea::

      .. text-replace:: "pattern" "replacement"

  - Support Unicode "U+XXXX" codes.
  - Support regexps, perhaps with alternative "regexp-replace"
    directive.
  - Flags for regexps; ":flags:" option, or individuals.
  - Specifically, should the default be case-sensitive or
    -insensitive?


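Here is a minimal Python sketch of what the proposed ``text-replace``
directive might do once its two arguments are parsed:

* "U+XXXX" codes in the replacement are expanded to characters.

* The pattern is treated literally unless a regexp mode is requested.

* Matching is case-sensitive unless a flag says otherwise. Which
  default to choose is the open question above.

The function and parameter names are assumptions. Applying "---"
before "--" gives the em-dash/en-dash convention from the first
bullet::

    import re

    UNICODE_CODE = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


    def expand_unicode(text):
        """Replace "U+XXXX" codes with the characters they name."""
        return UNICODE_CODE.sub(lambda m: chr(int(m.group(1), 16)), text)


    def text_replace(text, pattern, replacement, regexp=False, ignore_case=False):
        """Apply one hypothetical text-replace rule to `text`."""
        replacement = expand_unicode(replacement)
        flags = re.IGNORECASE if ignore_case else 0
        if regexp:
            # Regexp mode: the replacement may use group references like \1.
            return re.sub(pattern, replacement, text, flags=flags)
        # Literal mode: escape the pattern, insert the replacement verbatim.
        return re.sub(re.escape(pattern), lambda _m: replacement, text, flags=flags)


    text = "A pause---like this; pages 1--5."
    text = text_replace(text, "---", "U+2014")  # em-dash
    text = text_replace(text, "--", "U+2013")   # en-dash
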
Page Or Line Breaks
|
|
|
|
|
===================
|
|
|
|
|
|
|
|
|
|
* Should ^L (or something else in reST) be defined to mean
  force/suggest page breaks in whatever output we have?

  A "break" or "page-break" directive would be easy to add. A new
  doctree element would be required though (perhaps "break"). The
  final behavior would be up to the Writer. The directive argument
  could be one of page/column/recto/verso for added flexibility.

  Currently ^L (Python's ``\f``) characters are treated as whitespace.
  They're converted to single spaces, actually, as are vertical tabs
  (^K, Python's ``\v``). It would be possible to recognize form feeds
  as markup, but it requires some thought and discussion first. Are
  there any downsides? Many editing environments do not allow the
  insertion of control characters. Will it cause any harm? It would
  be useful as a shorthand for the directive.

  It's common practice to use ^L before Emacs "Local Variables"
  lists::

      ^L
      ..
         Local Variables:
         mode: indented-text
         indent-tabs-mode: nil
         sentence-end-double-space: t
         fill-column: 70
         End:

  These are already present in many PEPs and Docutils project
  documents. From the Emacs manual (info):

      A "local variables list" goes near the end of the file, in the
      last page. (It is often best to put it on a page by itself.)

  It would be unfortunate if this construct caused a final blank page
  to be generated (for those Writers that recognize the page breaks).
  We'll have to add a transform that looks for a "break" plus zero or
  more comments at the end of a document, and removes them.

  Probably a bad idea because there is no such thing as a page in a
  generic document format.

* Could the "break" concept above be extended to inline forms?
  E.g. "^L" in the middle of a sentence could cause a line break.
  Only recognize it at the end of a line (i.e., ``\f\n``)?

  Or is a form feed inappropriate? Perhaps a vertical tab (``\v``),
  but even that's a stretch. Carriage returns can't be used, since
  they're commonly used for line endings.

  Probably a bad idea as well, because we do not want to use control
  characters in markup that should be easy to read and write; after
  all, we have the line block syntax for line breaks.


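The current whitespace handling, and the clean-up transform proposed in
the first item above, can be sketched in Python as follows.
``split_input`` approximates the behavior described in the text;
``strip_trailing_break`` and its flat list of (kind, text) tuples are
hypothetical stand-ins, not Docutils APIs::

    import re

    # Form feed (^L, "\f") and vertical tab (^K, "\v").
    _FF_VT = re.compile(r'[\f\v]')


    def split_input(text):
        """Current behavior: ^L and ^K become single spaces.

        The substitution has to happen first, because str.splitlines()
        would otherwise treat "\\f" and "\\v" as line boundaries.
        """
        return _FF_VT.sub(' ', text).splitlines()


    def strip_trailing_break(nodes):
        """Hypothetical transform: drop a final "break" plus trailing comments.

        ``nodes`` is a flat list of (kind, text) tuples standing in for the
        document's top-level elements; it is not a Docutils data structure.
        """
        end = len(nodes)
        # Skip over any comments at the very end of the document.
        while end and nodes[end - 1][0] == 'comment':
            end -= 1
        # Only remove them if they are preceded by a break.
        if end and nodes[end - 1][0] == 'break':
            return nodes[:end - 1]
        return nodes


    print(split_input('one\ftwo\vthree\nfour'))
    # prints: ['one two three', 'four']
    print(strip_trailing_break([('paragraph', 'Text.'),
                                ('break', 'page'),
                                ('comment', 'Local Variables: ...')]))
    # prints: [('paragraph', 'Text.')]

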
Superscript Markup
==================

Add ``^superscript^`` inline markup? The only common non-markup uses
of "^" I can think of are as shorthand for "superscript" itself and
for describing control characters ("^C to cancel"). The former
supports the proposed syntax, and it could be argued that the latter
ought to be literal text anyhow (e.g. "``^C`` to cancel").

However, superscripts are seldom needed, and new syntax would break
existing documents. When they are needed, the ``:superscript:``
(``:sup:``) role can be used instead.


Code Execution
==============

Add the following directives?

- "exec": Execute Python code & insert the results. Or call it
  "python", to allow for other languages?

- "system": Execute an ``os.system()`` call, and insert the results
  (possibly as a literal block). Definitely dangerous! How to make
  it safe? Perhaps such processing should be left outside of the
  document, in the user's production system (a makefile or a script or
  whatever). Or, the directive could be disabled by default and only
  enabled with an explicit command-line option or config file setting.
  Even then, an interactive prompt may be useful, such as:

      The file.txt document you are processing contains a "system"
      directive requesting that the ``sudo rm -rf /`` command be
      executed. Allow it to execute? (y/N)

- "eval": Evaluate an expression & insert the text. At parse
  time or at substitution time? Dangerous? Perhaps limit to canned
  macros; see text.date_.

.. _text.date: ../todo.html#text-date

It's too dangerous (or too complicated in the case of "eval"). We do
not want to have such things in the core.


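If such a directive were ever added, the safeguards suggested for
"system" (disabled by default, explicitly enabled, then confirmed at a
y/N prompt) might look like this Python sketch. Docutils has no such
directive; ``run_system_directive`` and its ``enabled`` and ``ask``
parameters are hypothetical::

    import subprocess


    def run_system_directive(command, source_name, enabled=False, ask=input):
        """Hypothetical guarded "system" directive: off by default, then ask.

        ``enabled`` stands in for an explicit command-line option or config
        setting; ``ask`` is injectable so the prompt can be tested.
        """
        if not enabled:
            raise PermissionError('the "system" directive is disabled')
        answer = ask('The %s document you are processing contains a "system"\n'
                     'directive requesting that the %r command be\n'
                     'executed. Allow it to execute? (y/N) '
                     % (source_name, command))
        if answer.strip().lower() not in ('y', 'yes'):
            return None  # The default answer is "no".
        # os.system() only returns the exit status; capturing the output so
        # it can be inserted (e.g. as a literal block) needs subprocess.
        result = subprocess.run(command, shell=True, capture_output=True,
                                text=True, check=True)
        return result.stdout

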
``encoding`` Directive
======================

Add an "encoding" directive to specify the character encoding of the
input data? Not a good idea for the following reasons:

- When it sees the directive, the parser will already have read the
  input data, and encoding determination will already have been done.

- If a file with an "encoding" directive is edited and saved with
  a different encoding, the directive may cause data corruption.


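Both objections can be made concrete with a short Python sketch. The
document content and its stale ``latin-1`` declaration are
hypothetical; the point is that the bytes must be decoded before any
directive can be seen, and that obeying an outdated declaration
afterwards silently corrupts non-ASCII text::

    # Hypothetical document that declares Latin-1 but was later re-saved as
    # UTF-8 by an editor that did not update the directive.
    source = '.. encoding:: latin-1\n\ncaf\u00e9\n'
    on_disk = source.encode('utf-8')

    # The directive can only be found after the bytes have been decoded,
    # i.e. after encoding determination has already happened.
    first_line = on_disk.decode('utf-8').splitlines()[0]
    print(first_line)                  # prints: .. encoding:: latin-1

    # Obeying the stale declaration now silently corrupts non-ASCII text.
    corrupted = on_disk.decode(first_line.split('::')[1].strip())
    print(corrupted.splitlines()[-1])  # prints: cafÃ©

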
Support for Annotations
=======================

Add an "annotation" role, as the equivalent of the HTML "title"
attribute? This is secondary information that may "pop up" when the
pointer hovers over the main text. A corresponding directive would be
required to associate annotations with the original text (by name, or
positionally as in anonymous targets?).

There have not been many requests for such a feature, though. Also,
cluttering WYSIWYG plaintext with annotations may not seem like a good
idea, and most output formats other than HTML have no equivalent of a
"tool tip".


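Positional association, as with anonymous hyperlink targets, could work
roughly like this Python sketch for HTML output. The function and its
inputs are hypothetical; each annotated phrase is paired with the
annotation in the same position and rendered with a "title"
attribute::

    import html


    def render_annotations(phrases, notes):
        """Hypothetical: pair annotated phrases with annotations by position.

        ``phrases`` are the annotated pieces of text in document order and
        ``notes`` the annotation texts in the same order, much as anonymous
        hyperlink references are matched to anonymous targets.
        """
        if len(phrases) != len(notes):
            raise ValueError('%d annotated phrases but %d annotations'
                             % (len(phrases), len(notes)))
        # html.escape() also escapes quotes, which matters inside an attribute.
        return ['<span title="%s">%s</span>'
                % (html.escape(note), html.escape(text))
                for text, note in zip(phrases, notes)]


    print(render_annotations(['reST'], ['reStructuredText']))

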
``term`` Role
=============

Add a "term" role for unfamiliar or specialized terminology? Probably
not; there is no real use case, and emphasis is enough for most cases.


Object references
=================

We need syntax for `object references`_.

- Parameterized substitutions? For example::

      See |figure (figure name)| on |page (figure name)|.

      .. |figure (name)| figure-ref:: (name)
      .. |page (name)| page-ref:: (name)

  The result would be::

      See figure 3.11 on page 157.

  But this would require substitution directives to be processed at
  reference-time, not at definition-time as they are now. Or,
  perhaps the directives could just leave ``pending`` elements
  behind, and the transforms do the work? How to pass the data
  through? Too complicated. Use interpreted text roles.

.. _object references:
   ../todo.html#object-numbering-and-object-references


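The difficulty can be seen in a small Python sketch of
reference-time expansion: the parameterized references can only be
replaced once every referenced object's number and page are already
known. All names and data structures here are hypothetical, and the
values are the illustrative ones from the example above::

    import re

    # A substitution reference carrying one parameter: "|kind (name)|".
    _PARAM_SUB = re.compile(r'\|(\w+) \(([^)|]*)\)\|')


    def expand(text, definitions, objects):
        """Hypothetical reference-time expansion of parameterized substitutions.

        ``definitions`` maps a substitution kind to a function of the
        referenced object; ``objects`` maps object names to data (number,
        page) that must already be known, i.e. collected by an earlier pass
        or, for page numbers, only available after layout.
        """
        def resolve(match):
            kind, name = match.groups()
            return definitions[kind](objects[name])
        return _PARAM_SUB.sub(resolve, text)


    objects = {'figure name': {'number': '3.11', 'page': 157}}
    definitions = {
        'figure': lambda obj: 'figure %s' % obj['number'],
        'page': lambda obj: 'page %d' % obj['page'],
    }
    print(expand('See |figure (figure name)| on |page (figure name)|.',
                 definitions, objects))
    # prints: See figure 3.11 on page 157.

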
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: