================================
 Docutils_ Internationalization
================================

:Author: David Goodger
:Contact: docutils-develop@lists.sourceforge.net
:Date: $Date: 2014-07-07 05:12:02 +0200 (Mo, 07 Jul 2014) $
:Revision: $Revision: 7769 $
:Copyright: This document has been placed in the public domain.


.. contents::


This document describes the internationalization facilities of the
Docutils_ project. `Introduction to i18n`_ by Tomohiro KUBOTA is a
good general reference. "Internationalization" is often abbreviated
as "i18n": "i" + 18 letters + "n".

.. Note::

   The i18n facilities of Docutils should be considered a "first
   draft". They work so far, but improvements are welcome.
   Specifically, standard i18n facilities like "gettext" have yet to
   be explored.

Docutils is designed to work flexibly with text in multiple languages
(one language at a time). Language-specific features are (or should
be [#]_) fully parameterized. To enable a new language, two modules
have to be added to the project: one for Docutils itself (the
`Docutils Language Module`_) and one for the reStructuredText parser
(the `reStructuredText Language Module`_).

.. [#] If anything in Docutils is insufficiently parameterized, it
   should be considered a bug. Please report bugs to the Docutils
   project bug tracker on SourceForge at
   http://sourceforge.net/p/docutils/bugs/

.. _Docutils: http://docutils.sourceforge.net/
.. _Introduction to i18n:
   http://www.debian.org/doc/manuals/intro-i18n/


Language Module Names
=====================

Language modules are named using `language tags`_ as defined in
`BCP 47`_ [#]_, in lowercase, with hyphens converted to
underscores [#]_.

A typical language identifier consists of a 2-letter language code
from `ISO 639`_ (3-letter codes can be used if no 2-letter code
exists). The language identifier can have an optional subtag,
typically for variations based on country (from `ISO 3166`_ 2-letter
country codes). If no language identifier is specified, the default
is "en" for English. Examples of module names include ``en.py``,
``fr.py``, ``ja.py``, and ``pt_br.py``.

.. [#] BCP stands for 'Best Current Practice', and is a persistent
   name for a series of RFCs whose numbers change as they are updated.
   The latest RFC describing language tag syntax is RFC 5646, Tags for
   the Identification of Languages, and it obsoletes the older RFCs
   4646, 3066 and 1766.

.. [#] Subtags are separated from primary tags by underscores instead
   of hyphens, to conform to Python naming rules.

.. _language tags: http://www.w3.org/International/articles/language-tags/
.. _BCP 47: http://www.rfc-editor.org/rfc/bcp/bcp47.txt
.. _ISO 639: http://www.loc.gov/standards/iso639-2/php/English_list.php
.. _ISO 3166: http://www.iso.ch/iso/en/prods-services/iso3166ma/
   02iso-3166-code-lists/index.html


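
The naming rule for language modules (lowercase, hyphens converted to
underscores) can be sketched in a few lines of Python. The helper name
``module_name`` is hypothetical and not part of Docutils; it only
illustrates the conversion from a language tag to a module file name::

    def module_name(tag="en"):
        """Return the language module file name for a language tag.

        Illustrative only: lowercases the tag and replaces hyphens
        with underscores. "en" is the default language.
        """
        return tag.lower().replace("-", "_") + ".py"

    print(module_name("pt-BR"))   # pt_br.py
    print(module_name("ja"))      # ja.py
    print(module_name())          # en.py

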
Python Code
===========

Generally, Python code in Docutils is ASCII-only. In language modules,
Unicode escapes can be used for non-ASCII characters.

`PEP 263`_ introduces source code encodings to Python modules,
implemented beginning in Python 2.3. Especially for languages with
non-Latin scripts, using UTF-8 encoded literal Unicode strings improves
readability. Start the source code file with the magic comment::

    # -*- coding: utf-8 -*-

As mentioned in the note above, developers are invited to explore
"gettext" and other i18n technologies.

.. _PEP 263: http://www.python.org/peps/pep-0263.html


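
As a minimal sketch of the two spellings for non-ASCII text in a
language module, the following file writes the same string once with
Unicode escapes and once as a UTF-8 encoded literal. The variable names
are illustrative only::

    # -*- coding: utf-8 -*-

    # ASCII-only source: non-ASCII characters as Unicode escapes.
    escaped = u"R\u00e9sum\u00e9"

    # UTF-8 encoded source: the same text as a readable literal.
    literal = u"Résumé"

    # Both spellings produce the same Unicode string.
    print(escaped == literal)   # True

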
Docutils Language Module
========================

Modules in ``docutils/languages`` contain language mappings for
markup-independent language-specific features of Docutils. To make a
new language module, just copy the ``en.py`` file, rename it with the
code for your language (see `Language Module Names`_ above), and
translate the terms as described below.

Each Docutils language module contains three module attributes:

``labels``
    This is a mapping of node class names to language-dependent
    boilerplate label text. The label text is used by Writer
    components when they encounter document tree elements whose class
    names are the mapping keys.

    The entry values (*not* the keys) should be translated to the
    target language.

``bibliographic_fields``
    This is a mapping of language-dependent field names (converted to
    lower case) to canonical field names (keys of
    ``DocInfo.biblio_nodes`` in ``docutils.transforms.frontmatter``).
    It is used when transforming bibliographic fields.

    The keys should be translated to the target language.

``author_separators``
    This is a list of strings used to parse the 'Authors'
    bibliographic field. They separate individual authors' names, and
    are tried in order (i.e., earlier items take priority, and the
    first item that matches wins). The English-language module
    defines them as ``[';', ',']``; semicolons can be used to
    separate names like "Arthur Pewtie, Esq.".

    Most languages won't have to "translate" this list.


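
The three attributes might look as follows in an abbreviated,
hypothetical German module. The entries are a small illustrative
subset, not a copy of the module shipped with Docutils, and the helper
``split_authors`` is likewise hypothetical: it only mimics the "first
separator that matches wins" rule described for ``author_separators``::

    # Node class names -> label text (the *values* are translated).
    labels = {
        "author": "Autor",
        "authors": "Autoren",
        "date": "Datum",
        "contents": "Inhalt",
    }

    # Lowercase translated field names -> canonical field names
    # (the *keys* are translated).
    bibliographic_fields = {
        "autor": "author",
        "autoren": "authors",
        "datum": "date",
    }

    # Tried in order; the first separator that splits the text wins.
    author_separators = [";", ","]


    def split_authors(text, separators=author_separators):
        """Split an 'Authors' field using the first matching separator."""
        for separator in separators:
            names = text.split(separator)
            if len(names) > 1:
                return [name.strip() for name in names]
        return [text.strip()]


    print(split_authors("Arthur Pewtie, Esq.; Jane Doe"))
    # ['Arthur Pewtie, Esq.', 'Jane Doe']
    print(split_authors("Jane Doe, John Smith"))
    # ['Jane Doe', 'John Smith']

Because ``';'`` comes first in the list, a comma inside a name survives
whenever the authors are separated by semicolons.

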
reStructuredText Language Module
================================

Modules in ``docutils/parsers/rst/languages`` contain language
mappings for language-specific features of the reStructuredText
parser. To make a new language module, just copy the ``en.py`` file,
rename it with the code for your language (see `Language Module
Names`_ above), and translate the terms as described below.

Each reStructuredText language module contains two module attributes:

``directives``
    This is a mapping from language-dependent directive names to
    canonical directive names. The canonical directive names are
    registered in ``docutils/parsers/rst/directives/__init__.py``, in
    ``_directive_registry``.

    The keys should be translated to the target language. Synonyms
    (multiple keys with the same values) are allowed; this is useful
    for abbreviations.

``roles``
    This is a mapping from language-dependent role names to canonical
    role names for interpreted text. The canonical role names are
    registered in ``docutils/parsers/rst/states.py``, in
    ``Inliner._interpreted_roles`` (this may change).

    The keys should be translated to the target language. Synonyms
    (multiple keys with the same values) are allowed; this is useful
    for abbreviations.


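
A hypothetical excerpt of a reStructuredText language module could look
like this. The translated keys are illustrative, not a copy of any
module shipped with Docutils; the values are assumed to be canonical
names as they appear in the English module::

    # Translated directive names -> canonical directive names.
    directives = {
        "achtung": "attention",
        "hinweis": "note",
        "notiz": "note",        # synonym: a second key, same value
        "bild": "image",
    }

    # Translated role names -> canonical role names.
    roles = {
        "abkuerzung": "abbreviation",
        "ab": "abbreviation",   # abbreviation as a synonym
        "betonung": "emphasis",
    }

    # Synonyms are simply keys that share a canonical value.
    print(sorted(k for k, v in directives.items() if v == "note"))
    # ['hinweis', 'notiz']

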
Testing the Language Modules
============================

Whenever a new language module is added or an existing one modified,
the unit tests should be run. The test modules can be found in the
``docutils/test`` directory of the code_ repository or of the
`latest snapshot`_.

The ``test_language.py`` module can be run as a script. With no
arguments, it will test all language modules. With one or more
language codes, it will test just those languages. For example::

    $ python test_language.py en
    ..
    ----------------------------------------
    Ran 2 tests in 0.095s

    OK

Use the ``alltests.py`` script to run all test modules, exhaustively
testing the parser and other parts of the Docutils system.

.. _code: https://sourceforge.net/p/docutils/code/HEAD/tree/trunk/
.. _latest snapshot: https://sourceforge.net/p/docutils/code/HEAD/tarball


Submitting the Language Modules
===============================

If you do not have repository write access and want to contribute your
language modules, feel free to submit them via the `SourceForge patch
tracker`__.

__ http://sourceforge.net/p/docutils/patches/