### Overview
Python-Markdown includes an API for extension writers to plug their own
custom functionality and/or syntax into the parser. There are preprocessors
which allow you to alter the source before it is passed to the parser,
inline patterns which allow you to add, remove or override the syntax of
any inline elements, and postprocessors which allow munging of the
output of the parser before it is returned.
As the parser builds an [ElementTree][] object which is later rendered
as Unicode text, there are also some helpers provided to make manipulation of
the tree easier. Each part of the API is discussed in its respective
section below. You may find reading the source of some [[Available Extensions]]
helpful as well. For example, the [[Footnotes]] extension uses most of the
features documented here.

* [Preprocessors][]
    * [TextPreprocessors][]
    * [Line Preprocessors][]
* [InlinePatterns][]
* [Postprocessors][]
    * [ElementTree Postprocessors][]
    * [TextPostprocessors][]
* [Working with the ElementTree][]
* [Integrating your code into Markdown][]
    * [extendMarkdown][]
    * [registerExtension][]
    * [Config Settings][]
    * [makeExtension][]

### Preprocessors
Preprocessors munge the source text before it is passed into the Markdown
core. This is an excellent place to clean up bad syntax, extract things the
parser may otherwise choke on, and perhaps even store them for later retrieval.
There are two types of preprocessors: [TextPreprocessors][] and
[Line Preprocessors][].
#### TextPreprocessors
TextPreprocessors should inherit from `markdown.TextPreprocessor` and implement
a `run` method with one argument `text`. The `run` method of each
TextPreprocessor will be passed the entire source text as a single Unicode
string and should either return that single Unicode string, or an altered
version of it.
For example, a simple TextPreprocessor that normalizes newlines[^1] might look
like this:

    class NormalizePreprocessor(markdown.TextPreprocessor):
        def run(self, text):
            return text.replace("\r\n", "\n").replace("\r", "\n")

[^1]: It should be noted that Markdown already normalizes newlines. This
    example is for illustrative purposes only.

#### Line Preprocessors
Line Preprocessors should inherit from `markdown.Preprocessor` and implement
a `run` method with one argument `lines`. The `run` method of each Line
Preprocessor will be passed the entire source text as a list of Unicode strings.
Each string will contain one line of text. The `run` method should return
either that list, or an altered list of Unicode strings.
A pseudo example:

    class MyPreprocessor(markdown.Preprocessor):
        def run(self, lines):
            new_lines = []
            for line in lines:
                m = MYREGEX.match(line)
                if m:
                    # do stuff (this placeholder drops the matching line)
                    pass
                else:
                    new_lines.append(line)
            return new_lines
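
To make the pseudo example concrete, here is a minimal sketch of a Line
Preprocessor that extracts definition lines and stores them for later
retrieval. The `Preprocessor` base class below is a hypothetical stand-in for
`markdown.Preprocessor` so that the snippet runs on its own, and the
`*[ABBR]: title` syntax and the `AbbrPreprocessor` name are assumptions made
for illustration only.

```python
import re


class Preprocessor:
    """Hypothetical stand-in for markdown.Preprocessor."""

    def run(self, lines):
        return lines


# Assumed syntax, for illustration: "*[HTML]: Hyper Text Markup Language"
ABBR_RE = re.compile(r'^\*\[(?P<abbr>[^\]]+)\]:\s*(?P<title>.+)$')


class AbbrPreprocessor(Preprocessor):
    """Remove definition lines from the source and remember them."""

    def __init__(self):
        self.definitions = {}

    def run(self, lines):
        new_lines = []
        for line in lines:
            m = ABBR_RE.match(line)
            if m:
                # Store the definition for later use and drop the line.
                self.definitions[m.group('abbr')] = m.group('title')
            else:
                new_lines.append(line)
        return new_lines


pre = AbbrPreprocessor()
result = pre.run(["Some HTML text.", "*[HTML]: Hyper Text Markup Language"])
```

The stored `definitions` could then be picked up by a later stage, such as a
postprocessor, once the parser has finished with the remaining lines.
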
### Inline Patterns
Inline Patterns implement the inline HTML element syntax for Markdown such as
`*emphasis*` or `[links](http://example.com)`. Pattern objects should be
instances of classes that inherit from `markdown.Pattern` or one of its
children. Each pattern object uses a single regular expression and must have
the following methods:

* `getCompiledRegExp()`: Returns a compiled regular expression.
* `handleMatch(m)`: Accepts a match object and returns an ElementTree
  element or a plain Unicode string.

Note that any regular expression returned by `getCompiledRegExp` must capture
the whole block. Therefore, they should all start with `r'^(.*?)'` and end
with `r'(.*?)$'`. When using the default `getCompiledRegExp()` method provided
by the `Pattern` class, you can pass in a regular expression without those
parts and `getCompiledRegExp` will wrap your expression for you. This means
that the first group of your match will be `m.group(2)`, as `m.group(1)` will
match everything before the pattern.
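
The effect of the wrapping on group numbers can be seen with the standard `re`
module alone. This is only a sketch: `wrap_pattern` is a hypothetical helper
that assumes the wrapper has the form `^(.*?)` ... `(.*?)$`, and it is not
Markdown's own code.

```python
import re


def wrap_pattern(pattern):
    # Assumed wrapper: capture everything before and after the pattern.
    return re.compile(r'^(.*?)%s(.*?)$' % pattern, re.DOTALL)


regex = wrap_pattern(r'\*([^*]+)\*')
m = regex.match("some *emphasized* text")

before = m.group(1)   # text preceding the pattern
content = m.group(2)  # first group of the unwrapped pattern
after = m.group(3)    # text following the pattern
```

Backreferences are shifted in the same way: a pattern that wants to refer to
its own first group has to use `\2` once it has been wrapped.
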
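For an example, consider this simplified emphasis pattern:

    class EmphasisPattern(markdown.Pattern):
        def handleMatch(self, m):
            el = markdown.etree.Element('em')
            # group(1) is the text before the match, so the first group of
            # a single-group pattern is group(2)
            el.text = m.group(2)
            return el
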
As discussed in [Integrating Your Code Into Markdown][], an instance of this
class will need to be provided to Markdown. That instance would be created
like so:

    # an oversimplified regex
    MYPATTERN = r'\*([^*]+)\*'
    # pass in pattern and create instance
    emphasis = EmphasisPattern(MYPATTERN)

Actually, it would not be necessary to create that pattern (and not just
because a more sophisticated emphasis pattern already exists in Markdown). The
fact is, that example pattern is not very DRY. A pattern for `**strong**` text
would be almost identical, with the exception that it would create a 'strong'
element. Therefore, Markdown provides a number of generic pattern classes that
can provide some common functionality. For example, both emphasis and strong
are implemented with separate instances of the `SimpleTagPattern` listed below.
Feel free to use or extend any of these Pattern classes.

**Generic Pattern Classes**

* `SimpleTextPattern(pattern)`:
    Returns simple text of `group(2)` of a `pattern`.
* `SimpleTagPattern(pattern, tag)`:
    Returns an element of type "`tag`" with a text attribute of `group(3)`
    of a `pattern`. `tag` should be a string naming an HTML element
    (e.g. 'em').
* `SubstituteTagPattern(pattern, tag)`:
    Returns an element of type "`tag`" with no children or text (e.g. 'br').

There may be other Pattern classes in the Markdown source that you could extend
or use as well. Read through the source and see if there is anything you can
use. You might even get a few ideas for different approaches to your specific
situation.
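
As a rough illustration of how one generic class can serve both emphasis and
strong, the sketch below reimplements the idea with the standard library only.
The `Pattern` and `SimpleTagPattern` classes here are hypothetical stand-ins
written for this example, not Markdown's own code, and both regular
expressions are assumptions.

```python
import re
import xml.etree.ElementTree as etree


class Pattern:
    """Hypothetical stand-in for markdown.Pattern."""

    def __init__(self, pattern):
        # Assumed wrapping: group(1) is the text before the match.
        self.compiled_re = re.compile(r'^(.*?)%s(.*?)$' % pattern, re.DOTALL)

    def getCompiledRegExp(self):
        return self.compiled_re


class SimpleTagPattern(Pattern):
    """Stand-in: wrap group(3) of the match in an element named `tag`."""

    def __init__(self, pattern, tag):
        Pattern.__init__(self, pattern)
        self.tag = tag

    def handleMatch(self, m):
        el = etree.Element(self.tag)
        el.text = m.group(3)
        return el


# group(2) captures the delimiter and group(3) the enclosed text; the
# backreference is \2 because the wrapper adds a group in front.
EMPHASIS_RE = r'(\*)([^*]+)\2'
STRONG_RE = r'(\*{2})(.+?)\2'

emphasis = SimpleTagPattern(EMPHASIS_RE, 'em')
strong = SimpleTagPattern(STRONG_RE, 'strong')

em_el = emphasis.handleMatch(
    emphasis.getCompiledRegExp().match("an *important* word"))
strong_el = strong.handleMatch(
    strong.getCompiledRegExp().match("a **bold** move"))
```

Only the regular expression and the tag name differ between the two instances,
which is the duplication the generic classes are meant to remove.
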
### Postprocessors
Postprocessors manipulate a document after it has passed through the Markdown
core. This is where stored text gets added back in, such as a list of
footnotes, a table of contents or raw HTML.
There are two types of postprocessors: [ElementTree Postprocessors][] and
[TextPostprocessors][].
#### ElementTree Postprocessors
An ElementTree Postprocessor should inherit from `markdown.Postprocessor` and
override the `run` method, which takes one argument `root` and returns either
that root element or a modified root element.
A pseudo example:

    class MyPostprocessor(markdown.Postprocessor):
        def run(self, root):
            # do stuff to root
            return root

For specifics on manipulating the ElementTree, see
[Working with the ElementTree][] below.
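
A slightly fuller sketch of an ElementTree Postprocessor follows. It appends a
list of stored notes to the end of the tree. The `Postprocessor` base class is
a hypothetical stand-in for `markdown.Postprocessor` so the snippet runs on
its own; `FootnoteListPostprocessor`, its `notes` argument and the resulting
markup are assumptions for illustration and do not describe how the Footnotes
extension actually works.

```python
import xml.etree.ElementTree as etree


class Postprocessor:
    """Hypothetical stand-in for markdown.Postprocessor."""

    def run(self, root):
        return root


class FootnoteListPostprocessor(Postprocessor):
    """Append stored notes to the end of the tree as an ordered list."""

    def __init__(self, notes):
        self.notes = notes  # assumed: text collected by an earlier stage

    def run(self, root):
        if not self.notes:
            return root  # nothing stored, leave the tree untouched
        div = etree.SubElement(root, "div")
        div.set("class", "footnote")
        ol = etree.SubElement(div, "ol")
        for note in self.notes:
            li = etree.SubElement(ol, "li")
            li.text = note
        return root


root = etree.Element("div")
etree.SubElement(root, "p").text = "Body text."
root = FootnoteListPostprocessor(["First note."]).run(root)
```
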
#### TextPostprocessors
A TextPostprocessor should inherit from `markdown.TextPostprocessor` and
override the `run` method, which takes one argument `text` and returns a
Unicode string.

TextPostprocessors are run after the ElementTree has been serialized back into
Unicode text. For example, this may be an appropriate place to add a table of
contents to a document:

    class TocTextPostprocessor(markdown.TextPostprocessor):
        def run(self, text):
            return MYMARKERRE.sub(MyToc, text)
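
The names `MYMARKERRE` and `MyToc` above are placeholders. One way they might
be filled in is sketched here, using only the `re` module. The
`TextPostprocessor` base class is a hypothetical stand-in for
`markdown.TextPostprocessor`, and the `[TOC]` marker and the `toc_html`
argument are assumptions for illustration.

```python
import re

# Assumed marker: a paragraph that contains only "[TOC]".
TOC_MARKER_RE = re.compile(r'<p>\[TOC\]</p>')


class TextPostprocessor:
    """Hypothetical stand-in for markdown.TextPostprocessor."""

    def run(self, text):
        return text


class TocTextPostprocessor(TextPostprocessor):
    def __init__(self, toc_html):
        self.toc_html = toc_html

    def run(self, text):
        # A function is used as the replacement so that backslashes in the
        # table of contents are not treated as regex escapes.
        return TOC_MARKER_RE.sub(lambda m: self.toc_html, text)


toc = '<ul><li><a href="#intro">Intro</a></li></ul>'
html = '<p>[TOC]</p>\n<h1 id="intro">Intro</h1>'
result = TocTextPostprocessor(toc).run(html)
```
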
### Working with the ElementTree
As mentioned, the Markdown parser converts a source document to an
[ElementTree][] object before serializing that back to Unicode text.
Markdown has provided some helpers to ease that manipulation within the context
of the Markdown module.
First, to get access to the ElementTree module, import it from ``markdown``
rather than importing it directly. This will ensure you are using the same
version of ElementTree as Markdown. The module is named ``etree`` within
Markdown.

    from markdown import etree

``markdown.etree`` tries to import ElementTree from any known location, first
as a standard library module (from ``xml.etree`` in Python 2.5), then as a third
party package (``elementtree``). In each instance, ``cElementTree`` is tried
first, then ``ElementTree`` if the faster C implementation is not available on
your system.
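
The lookup order described above could be sketched roughly as follows. This is
not Markdown's own import code; `import_etree` and the `CANDIDATES` tuple are
hypothetical, and the third-party module names are assumptions about how those
packages are installed.

```python
import importlib

# Candidate module names, in the order described above.
CANDIDATES = (
    "xml.etree.cElementTree",   # standard library, C implementation
    "xml.etree.ElementTree",    # standard library
    "cElementTree",             # third-party, C implementation
    "elementtree.ElementTree",  # third-party
)


def import_etree():
    """Return the first ElementTree implementation that can be imported."""
    for name in CANDIDATES:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # not available here, try the next candidate
    raise ImportError("No ElementTree implementation found")


etree = import_etree()
```

Whichever module is found, the rest of the code can use the same `etree` name,
which is the reason for importing it from one place.
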
Sometimes you may want text inserted into an element to be parsed by
[InlinePatterns][]. In such a situation, simply insert the text into an
`inline` tag and the text will be automatically run through the InlinePatterns.

Here's a basic example which creates an HTML table (note that the contents of
the second cell (``td2``) will be run through InlinePatterns later):

    table = etree.Element("table")
    table.set("cellpadding", "2")             # Set cellpadding to 2
    tr = etree.SubElement(table, "tr")        # Add child tr to table
    td1 = etree.SubElement(tr, "td")          # Add child td1 to tr
    td1.text = "Cell content"                 # Add plain text content to td1
    td2 = etree.SubElement(tr, "td")          # Add second td to tr
    inline = etree.SubElement(td2, "inline")  # Add an inline element to td2
    inline.text = "Some *text* with **inline** formatting."  # Add markup text
    table.tail = "Text after table"           # Add text after the table element

You can also manipulate an existing tree. Consider the following example which
adds a ``class`` attribute to all ``a`` elements:

    def set_link_class(element):
        for child in element:
            if child.tag == "a":
                child.set("class", "myclass")  # set the class attribute
            set_link_class(child)  # run recursively on children

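
To see the recursion at work, the following sketch builds a small tree with
the standard library's `xml.etree.ElementTree` and applies the function to it.
The tree contents and the optional `css_class` parameter are assumptions added
for illustration.

```python
import xml.etree.ElementTree as etree


def set_link_class(element, css_class="myclass"):
    for child in element:
        if child.tag == "a":
            child.set("class", css_class)  # set the class attribute
        set_link_class(child, css_class)   # run recursively on children


# Build <div><p>See <a>the site</a> for details.<em><a>nested</a></em></p></div>
root = etree.Element("div")
p = etree.SubElement(root, "p")
p.text = "See "
link = etree.SubElement(p, "a")
link.set("href", "http://example.com")
link.text = "the site"
link.tail = " for details."
em = etree.SubElement(p, "em")
nested = etree.SubElement(em, "a")
nested.text = "nested"

set_link_class(root)
classes = [a.get("class") for a in root.iter("a")]
```

Both the direct link and the one nested inside `em` receive the attribute.
With the standard library's ElementTree, `root.iter("a")` visits the same
descendant elements without explicit recursion.
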
For more information about working with ElementTree see the ElementTree
[Documentation](http://effbot.org/zone/element-index.htm)
([Python Docs](http://docs.python.org/lib/module-xml.etree.ElementTree.html)).