======================
 Python Source Reader
======================
:Author: David Goodger
:Contact: docutils-develop@lists.sourceforge.net
:Revision: $Revision: 7302 $
:Date: $Date: 2012-01-03 20:23:53 +0100 (Di, 03 Jan 2012) $
:Copyright: This document has been placed in the public domain.

This document explores issues around extracting and processing
docstrings from Python modules.

For definitive element hierarchy details, see the "Python Plaintext
Document Interface DTD" XML document type definition, pysource.dtd_
(which modifies the generic docutils.dtd_). Descriptions below list
'DTD elements' (XML 'generic identifiers' or tag names) corresponding
to syntax constructs.


.. contents::


Model
=====

The Python Source Reader ("PySource") model that's evolving in my mind
goes something like this:

1. Extract the docstring/namespace [#]_ tree from the module(s) and/or
   package(s).

   .. [#] See `Docstring Extractor`_ below.

2. Run the parser on each docstring in turn, producing a forest of
   doctrees (per nodes.py).

3. Join the docstring trees together into a single tree, running
   transforms:

   - merge hyperlinks
   - merge namespaces
   - create various sections like "Module Attributes", "Functions",
     "Classes", "Class Attributes", etc.; see pysource.dtd_
   - convert the above special sections to ordinary doctree nodes

4. Run transforms on the combined doctree. Examples: resolving
   cross-references/hyperlinks (including interpreted text on Python
   identifiers); footnote auto-numbering; first field list ->
   bibliographic elements.

   (Or should step 4's transforms come before step 3?)

5. Pass the resulting unified tree to the writer/builder.

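
As a rough illustration only, the five steps above might be sketched
with the standard library as follows. Everything here is a stand-in:
``Node``, ``extract``, ``parse``, ``join``, ``drop_empty_sections``,
``write`` and ``process`` are hypothetical names chosen for this
sketch, not Docutils APIs, and the "parser" merely splits a docstring
into paragraphs.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a doctree node; not the docutils ``nodes`` API."""
    tag: str
    text: str = ""
    children: list = field(default_factory=list)


def extract(source):
    """Step 1: collect (name, docstring) pairs in source order."""
    module = ast.parse(source)
    pairs = [("<module>", ast.get_docstring(module) or "")]
    for item in module.body:
        if isinstance(item, (ast.FunctionDef, ast.ClassDef)):
            pairs.append((item.name, ast.get_docstring(item) or ""))
    return pairs


def parse(name, docstring):
    """Step 2: placeholder parser, one paragraph per blank-line block."""
    blocks = [b for b in docstring.split("\n\n") if b.strip()]
    paragraphs = [Node("paragraph", " ".join(b.split())) for b in blocks]
    return Node("section", name, paragraphs)


def join(trees):
    """Step 3: hang the per-docstring trees under one root."""
    return Node("document", children=list(trees))


def drop_empty_sections(document):
    """Step 4: an example transform applied to the combined tree."""
    document.children = [c for c in document.children if c.children]
    return document


def write(node, depth=0):
    """Step 5: render the unified tree as an indented outline."""
    lines = ["  " * depth + f"{node.tag}: {node.text}".rstrip()]
    for child in node.children:
        lines.extend(write(child, depth + 1).splitlines())
    return "\n".join(lines)


def process(source):
    """Run steps 1 to 4 and return the unified tree."""
    trees = [parse(name, doc) for name, doc in extract(source)]
    return drop_empty_sections(join(trees))
```

Calling ``write(process(source))`` on a module's source text yields a
plain-text outline. Whether the step 4 transform belongs before or
after the join is exactly the open question raised in the list; the
sketch simply picks one order.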

I've had trouble reconciling the roles of input parser and output
writer with the idea of modes ("readers" or "directors"). Does the
mode govern the transformation of the input, the output, or both?
Perhaps the mode should be split into two.

For example, say the source of our input is a Python module. Our
"input mode" should be the "Python Source Reader". It discovers (from
``__docformat__``) that the input parser is "reStructuredText". If we
want HTML, we'll specify the "HTML" output formatter. But there's a
piece missing. What *kind* or *style* of HTML output do we want?
PyDoc-style, LibRefMan-style, etc. (many people will want to specify
and control their own style). Is the output style specific to a
particular output format (XML, HTML, etc.)? Is the style specific to
the input mode? Or can/should they be independent?

I envision interaction between the input parser, an "input mode", and
the output formatter. The same intermediate data format would be used
between each of these, being transformed as it progresses.

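
The ``__docformat__`` discovery step mentioned above could look
something like the following sketch. The function name
``find_docformat`` and its ``default`` argument are assumptions made
for illustration; the sketch only handles a plain string assignment at
module level and keeps just the first whitespace-separated field of
the value.

```python
import ast


def find_docformat(source, default="plaintext"):
    """Return the module-level ``__docformat__`` name, lowercased.

    ``default`` is what this sketch falls back to when the module
    does not assign ``__docformat__`` to a string literal.
    """
    for stmt in ast.parse(source).body:
        if not isinstance(stmt, ast.Assign):
            continue
        for target in stmt.targets:
            if isinstance(target, ast.Name) and target.id == "__docformat__":
                value = stmt.value
                if isinstance(value, ast.Constant) and isinstance(value.value, str):
                    # Keep only the first field, e.g. "reStructuredText en"
                    # yields "restructuredtext".
                    fields = value.value.split()
                    if fields:
                        return fields[0].lower()
    return default
```

The returned name would select the input parser; the choice of output
format and output style would then be separate settings, which is one
way of reading the "split the mode into two" idea.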


Docstring Extractor
===================

We need code that scans a parsed Python module, and returns an ordered
tree containing the names, docstrings (including attribute and
additional docstrings), and additional info (in parentheses below) of
all of the following objects:

- packages
- modules
- module attributes (+ values)
- classes (+ inheritance)
- class attributes (+ values)
- instance attributes (+ values)
- methods (+ formal parameters & defaults)
- functions (+ formal parameters & defaults)


(Extract comments too? For example, comments at the start of a module
would be a good place for bibliographic field lists.)

In order to evaluate interpreted text cross-references, namespaces for
each of the above will also be required.

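
One possible shape for such namespaces is a flat mapping from dotted
names to object kinds, searched from the current scope outward. This
is only a sketch: ``build_namespace`` and ``resolve`` are hypothetical
names, and the outward lookup order is an assumption about how
cross-references might be resolved, not Python's own scoping rules.

```python
import ast


def build_namespace(source):
    """Map dotted names of classes, methods and functions to their kind."""
    names = {}

    def visit(node, prefix, in_class):
        for item in node.body:
            if isinstance(item, ast.ClassDef):
                names[prefix + item.name] = "class"
                visit(item, prefix + item.name + ".", True)
            elif isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                names[prefix + item.name] = "method" if in_class else "function"

    visit(ast.parse(source), "", False)
    return names


def resolve(identifier, scope, names):
    """Look up ``identifier`` in ``scope``, then in each enclosing scope.

    Returns ``(dotted_name, kind)`` or ``None`` if nothing matches.
    """
    parts = scope.split(".") if scope else []
    while True:
        candidate = ".".join(parts + [identifier])
        if candidate in names:
            return candidate, names[candidate]
        if not parts:
            return None
        parts.pop()
```

With this, a reference to ``storedata`` written inside the docstring
of a class ``Keeper`` would resolve to ``Keeper.storedata``, while a
name not found there falls back to the module level.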

See python-dev/docstring-develop thread "AST mining", started on
2001-08-14.


Interpreted Text
================

DTD elements: package, module, class, method, function,
module_attribute, class_attribute, instance_attribute, variable,
parameter, type, exception_class, warning_class.

To classify identifiers explicitly, the role is given along with the
identifier in either prefix or suffix form::

    Use :method:`Keeper.storedata` to store the object's data in
    `Keeper.data`:instance_attribute:.

The role may be one of 'package', 'module', 'class', 'method',
'function', 'module_attribute', 'class_attribute',
'instance_attribute', 'variable', 'parameter', 'type',
'exception_class', 'exception', 'warning_class', or 'warning'. Other
roles may be defined.

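
To make the prefix and suffix forms concrete, here is a small
regular-expression sketch that pulls identifiers and roles out of a
line of text. It is not the reStructuredText parser's own
implementation; ``ROLES``, ``INTERPRETED`` and ``find_interpreted``
are names invented for this example, and the pattern deliberately
ignores double-backquoted inline literals.

```python
import re

# The roles listed above; other roles may be defined, so this set is
# only used to flag whether a role is one of the listed ones.
ROLES = frozenset({
    "package", "module", "class", "method", "function",
    "module_attribute", "class_attribute", "instance_attribute",
    "variable", "parameter", "type", "exception_class", "exception",
    "warning_class", "warning",
})

INTERPRETED = re.compile(
    r"(?::(?P<prefix>\w+):)?"         # optional :role: before the text
    r"(?<!`)`(?P<text>[^`]+)`(?!`)"   # text in single backquotes only
    r"(?::(?P<suffix>\w+):)?"         # optional :role: after the text
)


def find_interpreted(text):
    """Return (identifier, role, listed) triples found in ``text``.

    ``role`` is ``None`` when no explicit role is given; ``listed``
    tells whether the role is one of the roles in ``ROLES``.
    """
    results = []
    for match in INTERPRETED.finditer(text):
        role = match.group("prefix") or match.group("suffix")
        results.append((match.group("text"), role, role in ROLES))
    return results
```

Applied to the example sentence above, this would report
``Keeper.storedata`` with role ``method`` and ``Keeper.data`` with
role ``instance_attribute``.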

.. _pysource.dtd: pysource.dtd
.. _docutils.dtd: ../ref/docutils.dtd


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   fill-column: 70
   End: