From bbdbde59f7edde5df5630308fbffd657b3c26f60 Mon Sep 17 00:00:00 2001 From: Waylan Limberg Date: Wed, 7 Mar 2012 06:25:50 -0500 Subject: Renamed a few docswith better file names. --- docs/cli.md | 108 ++++++++ docs/command_line.md | 108 -------- docs/extensions/api.md | 621 +++++++++++++++++++++++++++++++++++++++++++ docs/extensions/extra.md | 2 +- docs/extensions/index.md | 14 +- docs/extensions/wikilinks.md | 2 +- docs/index.md | 8 +- docs/install.md | 2 +- docs/reference.md | 238 +++++++++++++++++ docs/test_suite.md | 2 +- docs/using_as_module.md | 237 ----------------- docs/writing_extensions.md | 621 ------------------------------------------- 12 files changed, 981 insertions(+), 982 deletions(-) create mode 100644 docs/cli.md delete mode 100644 docs/command_line.md create mode 100644 docs/extensions/api.md create mode 100644 docs/reference.md delete mode 100644 docs/using_as_module.md delete mode 100644 docs/writing_extensions.md diff --git a/docs/cli.md b/docs/cli.md new file mode 100644 index 0000000..6048ecc --- /dev/null +++ b/docs/cli.md @@ -0,0 +1,108 @@ +title: Command Line +prev_title: Library Reference +prev_url: reference.html +next_title: Extensions +next_url: extensions/index.html + + +Using Python-Markdown on the Command Line +========================================= + +While Python-Markdown is primarily a python library, a command line script is +included as well. While there are many other command line implementations +of Markdown, you may not have them installed, or you may prefer to use +Python-Markdown's various extensions. + +Generally, you will want to have the Markdown library fully installed on your +system (``setup.py install`` or ``easy_install markdown``) to run the command +line script. + +Assuming the `python` executable is on your system path, just run the following: + + python -m markdown [options] [args] + +That will run the module as a script. 
Note that on older python versions (2.5 +and 2.6), you may need to specify the appropriate module: + + python -m markdown.__main__ [options] [args] + +Use the `--help` option for available options: + + python -m markdown --help + +If you are using Python 2.4 or you don't want to have to call the python +executable directly, follow the instructions below: + +Setup +----- + +Upon installation, the ``markdown_py`` script will have been copied to +your Python "Scripts" directory. Different systems require different methods to +ensure that any files in the Python "Scripts" directory are on your system +path. + +* **Windows**: + + Assuming a default install of Python on Windows, your "Scripts" directory + is most likely something like ``C:\\Python26\Scripts``. Verify the location + of your "Scripts" directory and add it to you system path. + + Calling ``markdown_py`` from the command line will call the wrapper batch + file ``markdown_py.bat`` in the "Scripts" directory created during install. + +* __*nix__ (Linux, OSX, BSD, Unix, etc.): + + As each *nix distribution is different and we can't possibly document all + of them here, we'll provide a few helpful pointers: + + * Some systems will automatically install the script on your path. Try it + and see if it works. Just run ``markdown_py`` from the command line. + + * Other systems may maintain a separate "Scripts" directory which you + need to add to your path. Find it (check with your distribution) and + either add it to your path or make a symbolic link to it from your path. + + * If you are sure ``markdown_py`` is on your path, but it still isn't being + found, check the permissions of the file and make sure it is executable. + + As an alternative, you could just ``cd`` into the directory which contains + the source distribution, and run it from there. However, remember that your + markdown text files will not likely be in that directory, so it is much + more convenient to have ``markdown_py`` on your path. 
+ +__Note:__ Python-Markdown uses "markdown_py" as a script name because +the Perl implementation has already taken the more obvious name "markdown". +Additionally, the default Python configuration on some systems would cause a +script named "markdown.py" to fail by importing itself rather than the markdown +library. Therefore, the script has been named "markdown_py" as a compromise. If +you prefer a different name for the script on your system, it is suggested that +you create a symbolic link to `markdown_py` with your preferred name. + +Usage +----- + +To use ``markdown_py`` from the command line, run it as + + $ markdown_py input_file.txt + +or + + $ markdown_py input_file.txt > output_file.html + +For a complete list of options, run + + $ markdown_py --help + +Using Extensions +---------------- + +For an extension to be run from the command line it must be provided in a module +on your python path (see the [Extension API](extensions/api.html) for details). +It can then be invoked by the name of that module: + + $ markdown_py -x footnotes text_with_footnotes.txt > output.html + +If the extension supports config options, you can pass them in as well: + + $ markdown_py -x "footnotes(PLACE_MARKER=~~~~~~~~)" input.txt + diff --git a/docs/command_line.md b/docs/command_line.md deleted file mode 100644 index c597c52..0000000 --- a/docs/command_line.md +++ /dev/null @@ -1,108 +0,0 @@ -title: Command Line -prev_title: Library Reference -prev_url: using_as_module.html -next_title: Extensions -next_url: extensions/index.html - - -Using Python-Markdown on the Command Line -========================================= - -While Python-Markdown is primarily a python library, a command line script is -included as well. While there are many other command line implementations -of Markdown, you may not have them installed, or you may prefer to use -Python-Markdown's various extensions. 
- -Generally, you will want to have the Markdown library fully installed on your -system (``setup.py install`` or ``easy_install markdown``) to run the command -line script. - -Assuming the `python` executable is on your system path, just run the following: - - python -m markdown [options] [args] - -That will run the module as a script. Note that on older python versions (2.5 -and 2.6), you may need to specify the appropriate module: - - python -m markdown.__main__ [options] [args] - -Use the `--help` option for available options: - - python -m markdown --help - -If you are using Python 2.4 or you don't want to have to call the python -executable directly, follow the instructions below: - -Setup ------ - -Upon installation, the ``markdown_py`` script will have been copied to -your Python "Scripts" directory. Different systems require different methods to -ensure that any files in the Python "Scripts" directory are on your system -path. - -* **Windows**: - - Assuming a default install of Python on Windows, your "Scripts" directory - is most likely something like ``C:\\Python26\Scripts``. Verify the location - of your "Scripts" directory and add it to you system path. - - Calling ``markdown_py`` from the command line will call the wrapper batch - file ``markdown_py.bat`` in the "Scripts" directory created during install. - -* __*nix__ (Linux, OSX, BSD, Unix, etc.): - - As each *nix distribution is different and we can't possibly document all - of them here, we'll provide a few helpful pointers: - - * Some systems will automatically install the script on your path. Try it - and see if it works. Just run ``markdown_py`` from the command line. - - * Other systems may maintain a separate "Scripts" directory which you - need to add to your path. Find it (check with your distribution) and - either add it to your path or make a symbolic link to it from your path. 
- - * If you are sure ``markdown_py`` is on your path, but it still isn't being - found, check the permissions of the file and make sure it is executable. - - As an alternative, you could just ``cd`` into the directory which contains - the source distribution, and run it from there. However, remember that your - markdown text files will not likely be in that directory, so it is much - more convenient to have ``markdown_py`` on your path. - -__Note:__ Python-Markdown uses "markdown_py" as a script name because -the Perl implementation has already taken the more obvious name "markdown". -Additionally, the default Python configuration on some systems would cause a -script named "markdown.py" to fail by importing itself rather than the markdown -library. Therefore, the script has been named "markdown_py" as a compromise. If -you prefer a different name for the script on your system, it is suggested that -you create a symbolic link to `markdown_py` with your preferred name. - -Usage ------ - -To use ``markdown_py`` from the command line, run it as - - $ markdown_py input_file.txt - -or - - $ markdown_py input_file.txt > output_file.html - -For a complete list of options, run - - $ markdown_py --help - -Using Extensions ----------------- - -For an extension to be run from the command line it must be provided in a module -which should be in your python path (see [writing_extensions](writing_extensions.html) -for details). 
It can then be invoked by the name of that module: - - $ markdown_py -x footnotes text_with_footnotes.txt > output.html - -If the extension supports config options, you can pass them in as well: - - $ markdown_py -x "footnotes(PLACE_MARKER=~~~~~~~~)" input.txt - diff --git a/docs/extensions/api.md b/docs/extensions/api.md new file mode 100644 index 0000000..c27abf0 --- /dev/null +++ b/docs/extensions/api.md @@ -0,0 +1,621 @@ +title: Extensions API +prev_title: Wikilinks Extension +prev_url: wikilinks.html +next_title: Test Suite +next_url: ../test_suite.html + +Writing Extensions for Python-Markdown +====================================== + +Overview +-------- + +Python-Markdown includes an API for extension writers to plug their own +custom functionality and/or syntax into the parser. There are preprocessors +which allow you to alter the source before it is passed to the parser, +inline patterns which allow you to add, remove or override the syntax of +any inline elements, and postprocessors which allow munging of the +output of the parser before it is returned. If you really want to dive in, +there are also blockprocessors which are part of the core BlockParser. + +As the parser builds an [ElementTree][] object which is later rendered +as Unicode text, there are also some helpers provided to ease manipulation of +the tree. Each part of the API is discussed in its respective section below. +Additionally, reading the source of some [Available Extensions][] may be +helpful. For example, the [Footnotes][] extension uses most of the features +documented here. + +* [Preprocessors][] +* [InlinePatterns][] +* [Treeprocessors][] +* [Postprocessors][] +* [BlockParser][] +* [Working with the ElementTree][] +* [Integrating your code into Markdown][] + * [extendMarkdown][] + * [OrderedDict][] + * [registerExtension][] + * [Config Settings][] + * [makeExtension][] + +

Preprocessors

+ +Preprocessors munge the source text before it is passed into the Markdown +core. This is an excellent place to clean up bad syntax, extract things the +parser may otherwise choke on and perhaps even store it for later retrieval. + +Preprocessors should inherit from ``markdown.preprocessors.Preprocessor`` and +implement a ``run`` method with one argument ``lines``. The ``run`` method of +each Preprocessor will be passed the entire source text as a list of Unicode +strings. Each string will contain one line of text. The ``run`` method should +return either that list, or an altered list of Unicode strings. + +A pseudo example: + + from markdown.preprocessors import Preprocessor + + class MyPreprocessor(Preprocessor): + def run(self, lines): + new_lines = [] + for line in lines: + m = MYREGEX.match(line) + if m: + # do stuff + else: + new_lines.append(line) + return new_lines + +
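The ``run`` contract described above (a list of lines in, a list of lines out) can be tried without importing Markdown at all. The following is a minimal sketch, not library code: ``CommentStripper``, ``COMMENT_RE`` and the ``stash`` attribute are hypothetical names, and in a real extension the class would inherit from ``markdown.preprocessors.Preprocessor``.

```python
import re

# Hypothetical syntax: lines starting with "%%" are treated as comments.
COMMENT_RE = re.compile(r'^%%\s?(.*)$')


class CommentStripper(object):
    """Stand-in for a Preprocessor subclass; only ``run`` matters here."""

    def __init__(self):
        self.stash = []  # removed text, stored for later retrieval

    def run(self, lines):
        new_lines = []
        for line in lines:
            m = COMMENT_RE.match(line)
            if m:
                # Keep the comment text out of the source, but remember it.
                self.stash.append(m.group(1))
            else:
                new_lines.append(line)
        return new_lines


pre = CommentStripper()
result = pre.run(["Title", "%% draft note", "Body text"])
```

Here ``result`` holds only the lines the parser should see, while the stripped text remains available in ``pre.stash``.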

Inline Patterns

+ +Inline Patterns implement the inline HTML element syntax for Markdown such as +``*emphasis*`` or ``[links](http://example.com)``. Pattern objects should be +instances of classes that inherit from ``markdown.inlinepatterns.Pattern`` or +one of its children. Each pattern object uses a single regular expression and +must have the following methods: + +* **``getCompiledRegExp()``**: + + Returns a compiled regular expression. + +* **``handleMatch(m)``**: + + Accepts a match object and returns an ElementTree element or a plain + Unicode string. + +Note that any regular expression returned by ``getCompiledRegExp`` must capture +the whole block. Therefore, they should all start with ``r'^(.*?)'`` and end +with ``r'(.*?)$'``. When using the default ``getCompiledRegExp()`` method +provided in the ``Pattern`` class you can pass in a regular expression without +that and ``getCompiledRegExp`` will wrap your expression for you and set the +`re.DOTALL` and `re.UNICODE` flags. This means that the first group of your +match will be ``m.group(2)`` as ``m.group(1)`` will match everything before the +pattern. + +For an example, consider this simplified emphasis pattern: + + from markdown.inlinepatterns import Pattern + from markdown.util import etree + + class EmphasisPattern(Pattern): + def handleMatch(self, m): + el = etree.Element('em') + # group(1) is the text before the match; group(2) is the first group of MYPATTERN + el.text = m.group(2) + return el + +As discussed in [Integrating Your Code Into Markdown][], an instance of this +class will need to be provided to Markdown. That instance would be created +like so: + + # an oversimplified regex + MYPATTERN = r'\*([^*]+)\*' + # pass in pattern and create instance + emphasis = EmphasisPattern(MYPATTERN) + +Actually, it would not be necessary to create that pattern (and not just because +a more sophisticated emphasis pattern already exists in Markdown). The fact is, +that example pattern is not very DRY. 
A pattern for `**strong**` text would +be almost identical, with the exception that it would create a 'strong' element. +Therefore, Markdown provides a number of generic pattern classes that can +provide some common functionality. For example, both emphasis and strong are +implemented with separate instances of the ``SimpleTagPattern`` listed below. +Feel free to use or extend any of the Pattern classes found at `markdown.inlinepatterns`. + +**Generic Pattern Classes** + +* **``SimpleTextPattern(pattern)``**: + + Returns simple text of ``group(2)`` of a ``pattern``. + +* **``SimpleTagPattern(pattern, tag)``**: + + Returns an element of type "`tag`" with a text attribute of ``group(3)`` + of a ``pattern``. ``tag`` should be a string of an HTML element (e.g.: 'em'). + +* **``SubstituteTagPattern(pattern, tag)``**: + + Returns an element of type "`tag`" with no children or text (e.g.: 'br'). + +There may be other Pattern classes in the Markdown source that you could extend +or use as well. Read through the source and see if there is anything you can +use. You might even get a few ideas for different approaches to your specific +situation. + +
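The group numbering discussed in this section can be checked with the standard library alone. This sketch assumes the default wrapping is ``^(.*?)`` before the pattern and ``(.*?)$`` after it, with the `re.DOTALL` and `re.UNICODE` flags set; it uses ``xml.etree.ElementTree`` in place of ``markdown.util.etree`` so that it runs without Markdown installed.

```python
import re
import xml.etree.ElementTree as etree

# The oversimplified emphasis regex from the example above.
MYPATTERN = r'\*([^*]+)\*'

# Assumed wrapping: everything before the match lands in group(1),
# everything after it in the last group.
compiled = re.compile("^(.*?)%s(.*?)$" % MYPATTERN, re.DOTALL | re.UNICODE)

m = compiled.match("some *emphasized* text")
before = m.group(1)   # text preceding the pattern
content = m.group(2)  # first group of MYPATTERN itself
after = m.group(3)    # text following the pattern

# What a handleMatch method would build from this match.
el = etree.Element('em')
el.text = content
```

Because the wrapper adds one group in front, every group defined in your own expression is shifted up by one.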

Treeprocessors

+ +Treeprocessors manipulate an ElementTree object after it has passed through the +core BlockParser. This is where additional manipulation of the tree takes +place. Additionally, the InlineProcessor is a Treeprocessor which steps through +the tree and runs the InlinePatterns on the text of each Element in the tree. + +A Treeprocessor should inherit from ``markdown.treeprocessors.Treeprocessor``, +over-ride the ``run`` method which takes one argument ``root`` (an ElementTree +object) and returns either that root element or a modified root element. + +A pseudo example: + + from markdown.treeprocessors import Treeprocessor + + class MyTreeprocessor(Treeprocessor): + def run(self, root): + #do stuff + return my_modified_root + +For specifics on manipulating the ElementTree, see +[Working with the ElementTree][] below. + +

Postprocessors

+ +Postprocessors manipulate the document after the ElementTree has been +serialized into a string. Postprocessors should be used to work with the +text just before output. + +A Postprocessor should inherit from ``markdown.postprocessors.Postprocessor`` +and over-ride the ``run`` method which takes one argument ``text`` and returns +a Unicode string. + +Postprocessors are run after the ElementTree has been serialized back into +Unicode text. For example, this may be an appropriate place to add a table of +contents to a document: + + from markdown.postprocessors import Postprocessor + + class TocPostprocessor(Postprocessor): + def run(self, text): + return MYMARKERRE.sub(MyToc, text) + +

BlockParser

+ +Sometimes, pre/tree/postprocessors and Inline Patterns aren't going to do what +you need. Perhaps you want a new type of block that needs to be integrated +into the core parsing. In such a situation, you can add/change/remove +functionality of the core ``BlockParser``. The BlockParser is composed of a +number of Blockprocessors. The BlockParser steps through each block of text +(split by blank lines) and passes each block to the appropriate Blockprocessor. +That Blockprocessor parses the block and adds it to the ElementTree. The +[Definition Lists][] extension would be a good example of an extension that +adds/modifies Blockprocessors. + +A Blockprocessor should inherit from ``markdown.blockprocessors.BlockProcessor`` +and implement both the ``test`` and ``run`` methods. + +The ``test`` method is used by BlockParser to identify the type of block. +Therefore the ``test`` method must return a boolean value. If the test returns +``True``, then the BlockParser will call that Blockprocessor's ``run`` method. +If it returns ``False``, the BlockParser will move on to the next +BlockProcessor. + +The **``test``** method takes two arguments: + +* **``parent``**: The parent etree Element of the block. This can be useful as + the block may need to be treated differently if it is inside a list, for + example. + +* **``block``**: A string of the current block of text. The test may be a + simple string method (such as ``block.startswith(some_text)``) or a complex + regular expression. + +The **``run``** method takes two arguments: + +* **``parent``**: A pointer to the parent etree Element of the block. The run + method will most likely attach additional nodes to this parent. Note that + nothing is returned by the method. The ElementTree object is altered in place. + +* **``blocks``**: A list of all remaining blocks of the document. Your run + method must remove (pop) the first block from the list (which is altered in + place - not returned) and parse that block. 
You may find that a block of text + legitimately contains multiple block types. Therefore, after processing the + first type, your processor can insert the remaining text into the beginning + of the ``blocks`` list for future parsing. + +Please be aware that a single block can span multiple text blocks. For example, +the official Markdown syntax rules state that a blank line does not end a +Code Block. If the next block of text is also indented, then it is part of +the previous block. Therefore, the BlockParser was specifically designed to +address these types of situations. If you look at the ``CodeBlockProcessor`` +in the core, you will note that it checks the last child of the ``parent``. +If the last child is a code block (``<pre><code>...</code></pre>``), then it +appends that block to the previous code block rather than creating a new +code block. + +Each BlockProcessor has the following utility methods available: + +* **``lastChild(parent)``**: + + Returns the last child of the given etree Element or ``None`` if it had no + children. + +* **``detab(text)``**: + + Removes one level of indent (four spaces by default) from the front of each + line of the given text string. + +* **``looseDetab(text, level)``**: + + Removes "level" levels of indent (defaults to 1) from the front of each line + of the given text string. However, this method allows secondary lines to + not be indented, as some parts of the Markdown syntax do. + +Each BlockProcessor also has a pointer to the containing BlockParser instance at +``self.parser``, which can be used to check or alter the state of the parser. +The BlockParser tracks its state in a stack at ``parser.state``. The state +stack is an instance of the ``State`` class. + +**``State``** is a subclass of ``list`` and has the additional methods: + +* **``set(state)``**: + + Set a new state to string ``state``. The new state is appended to the end + of the stack. + +* **``reset()``**: + + Step back one step in the stack. The last state at the end is removed from + the stack. + +* **``isstate(state)``**: + + Test that the top (current) level of the stack is of the given string + ``state``. + +Note that to ensure that the state stack doesn't become corrupted, each time a +state is set for a block, that state *must* be reset when the parser finishes +parsing that block. + +An instance of the **``BlockParser``** is found at ``Markdown.parser``. +``BlockParser`` has the following methods: + +* **``parseDocument(lines)``**: + + Given a list of lines, an ElementTree object is returned. This should be + passed an entire document and is the only method the ``Markdown`` class + calls directly. 
+ +* **``parseChunk(parent, text)``**: + + Parses a chunk of markdown text composed of multiple blocks and attaches + those blocks to the ``parent`` Element. The ``parent`` is altered in place + and nothing is returned. Extensions would most likely use this method for + block parsing. + +* **``parseBlocks(parent, blocks)``**: + + Parses a list of blocks of text and attaches those blocks to the ``parent`` + Element. The ``parent`` is altered in place and nothing is returned. This + method will generally only be used internally to recursively parse nested + blocks of text. + +While it is not recommended, an extension could subclass or completely replace +the ``BlockParser``. The new class would have to provide the same public API. +However, be aware that other extensions may expect the core parser provided +and will not work with such a drastically different parser. + +
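The state stack described in this section is small enough to sketch in full. What follows is a re-implementation written from the description above, not the library's own code, and ``parse_list_block`` is a hypothetical helper that stands in for a Blockprocessor's ``run`` method. It shows one way to guarantee that a state which was set is always reset, even if parsing raises an error.

```python
class State(list):
    """A list used as a stack of parser states (sketch of the documented API)."""

    def set(self, state):
        # Push a new state onto the end of the stack.
        self.append(state)

    def reset(self):
        # Step back one level by dropping the most recent state.
        self.pop()

    def isstate(self, state):
        # Only the top of the stack counts as the current state.
        if len(self):
            return self[-1] == state
        return False


def parse_list_block(state, block):
    """Hypothetical block handler: set a state, do work, always reset."""
    state.set('list')
    try:
        # Nested parsing would happen here and could consult the state.
        return state.isstate('list')
    finally:
        state.reset()


state = State()
inside = parse_list_block(state, "* item")
```

Wrapping the work in ``try``/``finally`` is a design choice of this sketch; the documented requirement is only that the state be reset once the block has been parsed.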

Working with the ElementTree

+ +As mentioned, the Markdown parser converts a source document to an +[ElementTree][] object before serializing that back to Unicode text. +Markdown has provided some helpers to ease that manipulation within the context +of the Markdown module. + +First, to get access to the ElementTree module, import ElementTree from +``markdown`` rather than importing it directly. This will ensure you are using +the same version of ElementTree as markdown. The module is found at +``markdown.util.etree`` within Markdown. + + from markdown.util import etree + +``markdown.util.etree`` tries to import ElementTree from any known location, +first as a standard library module (from ``xml.etree`` in Python 2.5), then as +a third party package (``ElementTree``). In each instance, ``cElementTree`` is +tried first, then ``ElementTree`` if the faster C implementation is not +available on your system. + +Sometimes you may want text inserted into an element to be parsed by +[InlinePatterns][]. In such a situation, simply insert the text as you normally +would and the text will be automatically run through the InlinePatterns. +However, if you do *not* want some text to be parsed by InlinePatterns, +then insert the text as an ``AtomicString``. + + from markdown.util import AtomicString + some_element.text = AtomicString(some_text) + +Here's a basic example which creates an HTML table (note that the contents of +the second cell (``td2``) will be run through InlinePatterns later): + + table = etree.Element("table") + table.set("cellpadding", "2") # Set cellpadding to 2 + tr = etree.SubElement(table, "tr") # Add child tr to table + td1 = etree.SubElement(tr, "td") # Add child td1 to tr + td1.text = AtomicString("Cell content") # Add plain text content + td2 = etree.SubElement(tr, "td") # Add second td to tr + td2.text = "*text* with **inline** formatting." # Add markup text + table.tail = "Text after table" # Add text after table + +You can also manipulate an existing tree. 
Consider the following example which +adds a ``class`` attribute to ``<a>`` elements: + + def set_link_class(self, element): + for child in element: + if child.tag == "a": + child.set("class", "myclass") #set the class attribute + self.set_link_class(child) # run recursively on children + +For more information about working with ElementTree see the ElementTree +[Documentation](http://effbot.org/zone/element-index.htm) +([Python Docs](http://docs.python.org/lib/module-xml.etree.ElementTree.html)). + +
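The recursive walk over an existing tree can be tried as a plain function. This sketch imports ``xml.etree.ElementTree`` directly so that it runs without Markdown installed (inside an extension you would use ``markdown.util.etree`` as described above); the ``css_class`` parameter and the small sample tree are illustrative assumptions.

```python
import xml.etree.ElementTree as etree


def set_link_class(element, css_class="myclass"):
    """Set a class attribute on every <a> element below ``element``."""
    for child in element:
        if child.tag == "a":
            child.set("class", css_class)
        # Recurse so that links nested at any depth are reached.
        set_link_class(child, css_class)


# Build a tiny tree: <div><p><a href="...">link</a></p></div>
root = etree.Element("div")
p = etree.SubElement(root, "p")
a = etree.SubElement(p, "a")
a.set("href", "http://example.com")
a.text = "link"

set_link_class(root)
html = etree.tostring(root, encoding="unicode")
```

The tree is altered in place, so nothing needs to be returned from ``set_link_class``.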

Integrating Your Code Into Markdown

+ +Once you have the various pieces of your extension built, you need to tell +Markdown about them and ensure that they are run in the proper sequence. +Markdown accepts an ``Extension`` instance for each extension. Therefore, you +will need to define a class that extends ``markdown.extensions.Extension`` and +over-rides the ``extendMarkdown`` method. Within this class you will manage +configuration options for your extension and attach the various processors and +patterns to the Markdown instance. + +It is important to note that the order of the various processors and patterns +matters. For example, if we replace ``http://...`` links with ``<a>`` elements, +and *then* try to deal with inline html, we will end up with a mess. +Therefore, the various types of processors and patterns are stored within an +instance of the Markdown class in [OrderedDict][]s. Your ``Extension`` class +will need to manipulate those OrderedDicts appropriately. You may insert +instances of your processors and patterns into the appropriate location in an +OrderedDict, remove a built-in instance, or replace a built-in instance with +your own. + +

extendMarkdown

+ +The ``extendMarkdown`` method of a ``markdown.extensions.Extension`` class +accepts two arguments: + +* **``md``**: + + A pointer to the instance of the Markdown class. You should use this to + access the [OrderedDict][]s of processors and patterns. They are found + under the following attributes: + + * ``md.preprocessors`` + * ``md.inlinePatterns`` + * ``md.parser.blockprocessors`` + * ``md.treeprocessors`` + * ``md.postprocessors`` + + Some other things you may want to access in the markdown instance are: + + * ``md.htmlStash`` + * ``md.output_formats`` + * ``md.set_output_format()`` + * ``md.registerExtension()`` + * ``md.html_replacement_text`` + * ``md.tab_length`` + * ``md.enable_attributes`` + * ``md.smart_emphasis`` + +* **``md_globals``**: + + Contains all the various global variables within the markdown module. + +Of course, with access to those items, theoretically you have the option of +changing anything through various [monkey_patching][] techniques. However, you +should be aware that the various undocumented or private parts of markdown +may change without notice and your monkey_patches may break with a new release. +Therefore, what you really should be doing is inserting processors and patterns +into the markdown pipeline. Consider yourself warned. + +[monkey_patching]: http://en.wikipedia.org/wiki/Monkey_patch + +A simple example: + + from markdown.extensions import Extension + + class MyExtension(Extension): + def extendMarkdown(self, md, md_globals): + # Insert instance of 'mypattern' before 'references' pattern + md.inlinePatterns.add('mypattern', MyPattern(md), '<references') + +

OrderedDict

+ +An OrderedDict is a dictionary like object that retains the order of its +items. The items are ordered in the order in which they were appended to +the OrderedDict. However, an item can also be inserted into the OrderedDict +in a specific location in relation to the existing items. 
+ +Think of OrderedDict as a combination of a list and a dictionary as it has +methods common to both. For example, you can get and set items using the +``od[key] = value`` syntax and the methods ``keys()``, ``values()``, and +``items()`` work as expected with the keys, values and items returned in the +proper order. At the same time, you can use ``insert()``, ``append()``, and +``index()`` as you would with a list. + +Generally speaking, within Markdown extensions you will be using the special +helper method ``add()`` to add additional items to an existing OrderedDict. + +The ``add()`` method accepts three arguments: + +* **``key``**: A string. The key is used for later reference to the item. + +* **``value``**: The object instance stored in this item. + +* **``location``**: Optional. The item's location in relation to other items. + + Note that the location can consist of a few different values: + + * The special strings ``"_begin"`` and ``"_end"`` insert that item at the + beginning or end of the OrderedDict respectively. + + * A less-than sign (``<``) followed by an existing key (i.e.: + ``"<somekey"``) inserts that item before the existing key. + + * A greater-than sign (``>``) followed by an existing key (i.e.: + ``">somekey"``) inserts that item after the existing key. + +Consider the following example: + + >>> from markdown.odict import OrderedDict + >>> od = OrderedDict() + >>> od['one'] = 1 # The same as: od.add('one', 1, '_begin') + >>> od['three'] = 3 # The same as: od.add('three', 3, '>one') + >>> od['four'] = 4 # The same as: od.add('four', 4, '_end') + >>> od.items() + [("one", 1), ("three", 3), ("four", 4)] + +Note that when building an OrderedDict in order, the extra features of the +``add`` method offer no real value and are not necessary. However, when +manipulating an existing OrderedDict, ``add`` can be very helpful. So let's +insert another item into the OrderedDict. + + >>> od.add('two', 2, '>one') # Insert after 'one' + >>> od.values() + [1, 2, 3, 4] + +Now let's insert another item. 
+ + >>> od.add('twohalf', 2.5, '<three') # Insert before 'three' + >>> od.keys() + ["one", "two", "twohalf", "three", "four"] + +Note that we also could have set the location of "twohalf" to be 'after two' +(i.e.: ``'>two'``). However, it's unlikely that you will have control over the +order in which extensions will be loaded, and this could affect the final +sorted order of an OrderedDict. For example, suppose an extension adding +'twohalf' in the above examples was loaded before a separate extension which +adds 'two'. You may need to take this into consideration when adding your +extension components to the various markdown OrderedDicts. + +Once an OrderedDict is created, the items are available via key: + + MyNode = od['somekey'] + +Therefore, to delete an existing item: + + del od['somekey'] + +To change the value of an existing item (leaving location unchanged): + + od['somekey'] = MyNewObject() + +To change the location of an existing item: + + t.link('somekey', '

registerExtension

+ +Some extensions may need to have their state reset between multiple runs of the +Markdown class. For example, consider the following use of the [Footnotes][] +extension: + + md = markdown.Markdown(extensions=['footnotes']) + html1 = md.convert(text_with_footnote) + md.reset() + html2 = md.convert(text_without_footnote) + +Without calling ``reset``, the footnote definitions from the first document will +be inserted into the second document as they are still stored within the class +instance. Therefore the ``Extension`` class needs to define a ``reset`` method +that will reset the state of the extension (i.e.: ``self.footnotes = {}``). +However, as many extensions do not have a need for ``reset``, ``reset`` is only +called on extensions that are registered. 
+ +To register an extension, call ``md.registerExtension`` from within your +``extendMarkdown`` method: + + + def extendMarkdown(self, md, md_globals): + md.registerExtension(self) + # insert processors and patterns here + +Then, each time ``reset`` is called on the Markdown instance, the ``reset`` +method of each registered extension will be called as well. You should also +note that ``reset`` will be called on each registered extension after it is +initialized the first time. Keep that in mind when over-riding the extension's +``reset`` method. + +
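The interplay between ``registerExtension`` and ``reset`` can be illustrated with two small stand-in classes. Neither is library code: ``MarkdownStandIn`` only mimics the registration behaviour described in this section (including the initial ``reset`` after setup), and ``FootnoteLikeExtension`` is a hypothetical extension that keeps per-document state.

```python
class FootnoteLikeExtension(object):
    """Hypothetical extension holding state that must not leak between runs."""

    def __init__(self):
        self.footnotes = {}

    def extendMarkdown(self, md, md_globals):
        # Without this call, reset() below would never be invoked.
        md.registerExtension(self)

    def reset(self):
        self.footnotes = {}


class MarkdownStandIn(object):
    """Mimics only the registration and reset behaviour described above."""

    def __init__(self, extensions):
        self.registeredExtensions = []
        for ext in extensions:
            ext.extendMarkdown(self, {})
        self.reset()  # registered extensions are reset once after setup

    def registerExtension(self, extension):
        self.registeredExtensions.append(extension)

    def reset(self):
        for extension in self.registeredExtensions:
            extension.reset()


ext = FootnoteLikeExtension()
md = MarkdownStandIn([ext])
ext.footnotes['1'] = 'note from the first document'
md.reset()
```

After ``md.reset()`` the note collected for the first document is gone, which is the behaviour the Footnotes example above relies on.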

Config Settings

+ +If an extension uses any parameters that the user may want to change, +those parameters should be stored in ``self.config`` of your +``markdown.extensions.Extension`` class in the following format: + + self.config = {parameter_1_name : [value1, description1], + parameter_2_name : [value2, description2] } + +When stored this way the config parameters can be over-ridden from the +command line or at the time Markdown is initialized: + + markdown_py -x "myextension(SOME_PARAM=2)" inputfile.txt > output.txt + +Note that parameters should always be assumed to be set to string +values, and should be converted at run time. For example: + + i = int(self.getConfig("SOME_PARAM")) + +
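The ``[value, description]`` layout and the need to convert values at run time can be seen in a self-contained sketch. ``ConfigDemo`` is a hypothetical stand-in, not the real ``Extension`` base class; its ``getConfig`` and ``setConfig`` methods are simplified assumptions about how over-rides reach ``self.config``.

```python
class ConfigDemo(object):
    """Stand-in showing the documented self.config layout."""

    def __init__(self, configs=None):
        # Each entry is [value, description]; values are kept as strings.
        self.config = {
            'SOME_PARAM': ['1', 'An illustrative numeric option'],
        }
        for key, value in (configs or {}).items():
            self.setConfig(key, value)

    def getConfig(self, key, default=''):
        if key in self.config:
            return self.config[key][0]
        return default

    def setConfig(self, key, value):
        # Only the value is replaced; the description is left alone.
        self.config[key][0] = value


# An over-ride arriving from the command line would be a string.
ext = ConfigDemo(configs={'SOME_PARAM': '2'})
i = int(ext.getConfig('SOME_PARAM'))
```

Converting with ``int()`` at the point of use means the same code works whether the value came from the default or from a user over-ride.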

makeExtension

+ +Each extension should ideally be placed in its own module starting +with the ``mdx_`` prefix (e.g. ``mdx_footnotes.py``). The module must +provide a module-level function called ``makeExtension`` that takes +an optional parameter consisting of a dictionary of configuration overrides +and returns an instance of the extension. An example from the footnote +extension: + + def makeExtension(configs=None): + return FootnoteExtension(configs=configs) + +By following the above example, when Markdown is passed the name of your +extension as a string (e.g. ``'footnotes'``), it will automatically import +the module and call the ``makeExtension`` function to instantiate your extension. + +You may have noted that the extensions packaged with Python-Markdown do not +use the ``mdx_`` prefix in their module names. This is because they are all +part of the ``markdown.extensions`` package. Markdown will first try to import +from ``markdown.extensions.extname`` and upon failure, ``mdx_extname``. If both +fail, Markdown will continue without the extension. + +However, Markdown will also accept an already existing instance of an extension. +For example: + + import markdown + import myextension + configs = {...} + myext = myextension.MyExtension(configs=configs) + md = markdown.Markdown(extensions=[myext]) + +This is useful if you need to implement a large number of extensions with more +than one residing in a module. 
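The lookup order described above (``markdown.extensions.extname`` first, then ``mdx_extname``, then carrying on without the extension) can be sketched as follows. The helpers `candidate_modules` and `load_extension` are hypothetical names invented for this illustration, and the ``makeExtension(configs=...)`` call follows the signature shown in the text; this is not how the library itself is written.

```python
import importlib


def candidate_modules(ext_name):
    """Module names to try, in the order described in the text."""
    return ["markdown.extensions." + ext_name, "mdx_" + ext_name]


def load_extension(ext_name, configs=None):
    """Return an extension instance, or None if no candidate module imports."""
    for module_name in candidate_modules(ext_name):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Fall through to the next candidate name.
            continue
        return module.makeExtension(configs=configs)
    # Both imports failed: the caller continues without the extension.
    return None


names = candidate_modules("footnotes")
missing = load_extension("no_such_extension_xyz")
```

Returning ``None`` here mirrors the "continue without the extension" behavior; a caller that prefers a hard failure could raise instead.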
+ +[Preprocessors]: #preprocessors +[InlinePatterns]: #inlinepatterns +[Treeprocessors]: #treeprocessors +[Postprocessors]: #postprocessors +[BlockParser]: #blockparser +[Working with the ElementTree]: #working_with_et +[Integrating your code into Markdown]: #integrating_into_markdown +[extendMarkdown]: #extendmarkdown +[OrderedDict]: #ordereddict +[registerExtension]: #registerextension +[Config Settings]: #configsettings +[makeExtension]: #makeextension +[ElementTree]: http://effbot.org/zone/element-index.htm +[Available Extensions]: index.html +[Footnotes]: footnotes.html +[Definition Lists]: definition_lists.html diff --git a/docs/extensions/extra.md b/docs/extensions/extra.md index 998a200..adafe07 100644 --- a/docs/extensions/extra.md +++ b/docs/extensions/extra.md @@ -45,4 +45,4 @@ of those extensions are not part of PHP Markdown Extra, and therefore, not part of Python-Markdown Extra. If you really would like Extra to include additional extensions, we suggest creating your own clone of Extra under a different name -(see [Writing Extensions](../writing_extensions.html)). +(see the [Extension API](api.html)). diff --git a/docs/extensions/index.md b/docs/extensions/index.md index b09b2c2..293d1b0 100644 --- a/docs/extensions/index.md +++ b/docs/extensions/index.md @@ -1,6 +1,6 @@ title: Extensions prev_title: Command Line -prev_url: ../command_line.html +prev_url: ../cli.html next_title: Extra Extension next_url: extra.html @@ -13,14 +13,12 @@ to change and/or extend the behavior of the parser without having to edit the actual source files. To use an extension, pass it's name to markdown with the `extensions` keyword. -See [Using Markdown as a Python Library](../using_as_module.html) for more -details. +See the [Library Reference](../reference.html) for more details. markdown.markdown(some_text, extensions=['extra', 'nl2br']) -From the command line, specify an extension with the `-x` option. 
-See [Using Python-Markdown on the Command Line](../command_line.html) or use the -`--help` option for more details. +From the command line, specify an extension with the `-x` option. See the +[Command Line docs](../cli.html) or use the `--help` option for more details. python -m markdown -x extra input.txt > output.html @@ -56,10 +54,10 @@ Third Party Extensions Various individuals and/or organizations have developed extensions which they have made available to the public. A [list of third party -extensions](http://freewisdom.org/projects/python-markdown/Available_Extensions) +extensions](https://github.com/waylan/Python-Markdown/wiki/Third-Party-Extensions) is maintained on the wiki for your convenience. The Python-Markdown team offers no official support for these extensions. Please see the developer of each extension for support. If you would like to write your own extensions, see the -[Extensions API](../writing_extensions.html) for details. +[Extensions API](api.html) for details. diff --git a/docs/extensions/wikilinks.md b/docs/extensions/wikilinks.md index 22baf9f..522e7c5 100644 --- a/docs/extensions/wikilinks.md +++ b/docs/extensions/wikilinks.md @@ -2,7 +2,7 @@ title: Wikilinks Extension prev_title: Table of Contents Extension prev_url: toc.html next_title: Extension API -next_url: ../writing_extensions.html +next_url: api.html WikiLinks ========= diff --git a/docs/index.md b/docs/index.md index 1c6ca5a..fb48399 100644 --- a/docs/index.md +++ b/docs/index.md @@ -30,13 +30,13 @@ features: Python-Markdown defaults to ignoring middle-word emphasis. In other words, `some_long_filename.txt` will not become `somelongfilename.txt`. This can be switched off if desired. See the - [Library Reference](using_as_module.html) for details. + [Library Reference](reference.html) for details. 
* __Extensions__ - Various [extensions](extensions/) are provided (including + Various [extensions](extensions/index.html) are provided (including [extra](extensions/extra.html)) to expand the base syntax. Additionally, - a public [Extension API](writing_extensions.html) is available to write + a public [Extension API](extensions/api.html) is available to write your own extensions. * __Output Formats__ @@ -52,7 +52,7 @@ features: * __Command Line Interface__ In addition to being a Python Library, a - [command line script](command_line.html) is available for your convenience. + [command line script](cli.html) is available for your convenience. Support ------- diff --git a/docs/install.md b/docs/install.md index cc2a10d..a7e7ff0 100644 --- a/docs/install.md +++ b/docs/install.md @@ -2,7 +2,7 @@ title: Installation prev_title: Summary prev_url: index.html next_title: Library Reference -next_url: using_as_module.html +next_url: reference.html Installing Python-Markdown ========================== diff --git a/docs/reference.md b/docs/reference.md new file mode 100644 index 0000000..16735c7 --- /dev/null +++ b/docs/reference.md @@ -0,0 +1,238 @@ +title: Library Reference +prev_title: Installation +prev_url: install.html +next_title: Command Line +next_url: cli.html + + +Using Markdown as a Python Library +================================== + +First and foremost, Python-Markdown is intended to be a python library module +used by various projects to convert Markdown syntax into HTML. + +The Basics +---------- + +To use markdown as a module: + + import markdown + html = markdown.markdown(your_text_string) + +The Details +----------- + +Python-Markdown provides two public functions (`markdown.markdown` and +`markdown.markdownFromFile`) both of which wrap the public class +`markdown.Markdown`. If you're processing one document at a time, the +functions will serve your needs. 
However, if you need to process +multiple documents, it may be advantageous to create a single instance +of the `markdown.Markdown` class and pass multiple documents through it. + +### `markdown.markdown(text [, **kwargs])` + +The following options are available on the `markdown.markdown` function: + +* __`text`__ (required): The source text string. + + Note that Python-Markdown expects **Unicode** as input (although + a simple ASCII string may work) and returns output as Unicode. + Do not pass encoded strings to it! If your input is encoded (e.g. as + UTF-8), it is your responsibility to decode it. For example: + + input_file = codecs.open("some_file.txt", mode="r", encoding="utf-8") + text = input_file.read() + html = markdown.markdown(text) + + If you want to write the output to disk, you must encode it yourself: + + output_file = codecs.open("some_file.html", "w", + encoding="utf-8", + errors="xmlcharrefreplace" + ) + output_file.write(html) + +* __`extensions`__: A list of extensions. + + Python-Markdown provides an API for third parties to write extensions to + the parser adding their own additions or changes to the syntax. A few + commonly used extensions are shipped with the markdown library. See + the [extension documentation](extensions/index.html) for a list of + available extensions. + + The list of extensions may contain instances of extensions or strings of + extension names. If an extension name is provided as a string, the + extension must be importable as a python module either within the + `markdown.extensions` package or on your PYTHONPATH with a name starting + with `mdx_`, followed by the name of the extension. Thus, + `extensions=['extra']` will first look for the module + `markdown.extensions.extra`, then a module named `mdx_extra`. + +* __`extension_configs`__: A dictionary of configuration settings for extensions. 
+ + The dictionary must be of the following format: + + extension_configs = {'extension_name_1': + [ + ('option_1', 'value_1'), + ('option_2', 'value_2') + ], + 'extension_name_2': + [ + ('option_1', 'value_1') + ] + } + See the documentation specific to the extension you are using for help in + specifying configuration settings for that extension. + +* __`output_format`__: Format of output. + + Supported formats are: + + * `"xhtml1"`: Outputs XHTML 1.x. **Default**. + * `"xhtml5"`: Outputs XHTML style tags of HTML 5 + * `"xhtml"`: Outputs latest supported version of XHTML (currently XHTML 1.1). + * `"html4"`: Outputs HTML 4 + * `"html5"`: Outputs HTML style tags of HTML 5 + * `"html"`: Outputs latest supported version of HTML (currently HTML 4). + + Note that it is suggested that the more specific formats ("xhtml1", + "html5", & "html4") be used, as "xhtml" or "html" may change in the future + if it makes sense at that time. The values can either be lowercase or + uppercase. + +* __`safe_mode`__: Disallow raw html. + + If you are using Markdown on a web system which will transform text + provided by untrusted users, you may want to use the "safe_mode" + option which ensures that the user's HTML tags are either replaced, + removed or escaped. (They can still create links using Markdown syntax.) + + The following values are accepted: + + * `False` (Default): Raw HTML is passed through unaltered. + + * `replace`: Replace all HTML blocks with the text assigned to + `html_replacement_text`. To maintain backward compatibility, setting + `safe_mode=True` will have the same effect as `safe_mode='replace'`. + + To replace raw HTML with something other than the default, do: + + md = markdown.Markdown(safe_mode='replace', + html_replacement_text='--RAW HTML NOT ALLOWED--') + + * `remove`: All raw HTML will be completely stripped from the text with + no warning to the author. + + * `escape`: All raw HTML will be escaped and included in the document. 
+ + For example, the following source: + + Foo <b>bar</b>. + + Will result in the following HTML: + + <p>Foo &lt;b&gt;bar&lt;/b&gt;.</p>
+ + Note that "safe_mode" does not alter the `enable_attributes` option, which + could allow someone to inject javascript (e.g., `{@onclick=alert(1)}`). You + may also want to set `enable_attributes=False` when using "safe_mode". + +* __`html_replacement_text`__: Text used when safe_mode is set to `replace`. + Defaults to `[HTML_REMOVED]`. + +* __`tab_length`__: Length of tabs in the source. Default: 4 + +* __`enable_attributes`__: Enable the conversion of attributes. Default: True + +* __`smart_emphasis`__: Treat `_connected_words_` intelligently. Default: True + +* __`lazy_ol`__: Ignore number of first item of ordered lists. Default: True + + Given the following list: + + 4. Apples + 5. Oranges + 6. Pears + + By default markdown will ignore the fact that the first line started + with item number "4" and the HTML list will start with a number "1". + If `lazy_ol` is set to `False`, then markdown will output the following + HTML: + + <ol start="4"> + <li>Apples</li> + <li>Oranges</li> + <li>Pears</li> + </ol>
+ + +### `markdown.markdownFromFile(**kwargs)` + +With a few exceptions, `markdown.markdownFromFile` accepts the same options as +`markdown.markdown`. It does **not** accept a `text` (or Unicode) string. +Instead, it accepts the following required options: + +* __`input`__ (required): The source text file. + + `input` may be set to one of three options: + + * a string which contains a path to a readable file on the file system, + * a readable file-like object, + * or `None` (default) which will read from `stdin`. + +* __`output`__: The target which output is written to. + + `output` may be set to one of three options: + + * a string which contains a path to a writable file on the file system, + * a writable file-like object, + * or `None` (default) which will write to `stdout`. + +* __`encoding`__: The encoding of the source text file. Defaults to + "utf-8". The same encoding will always be used for input and output. + The 'xmlcharrefreplace' error handler is used when encoding the output. + + **Note:** This is the only place that decoding and encoding of unicode + takes place in Python-Markdown. If this rather naive solution does not + meet your specific needs, it is suggested that you write your own code + to handle your encoding/decoding needs. + +### `markdown.Markdown([**kwargs])` + +The same options are available when initializing the `markdown.Markdown` class +as on the `markdown.markdown` function, except that the class does **not** +accept a source text string on initialization. Rather, the source text string +must be passed to one of two instance methods: + +* `Markdown.convert(source)` + + The `source` text must meet the same requirements as the `text` argument + of the `markdown.markdown` function. + + You should also use this method if you want to process multiple strings + without creating a new instance of the class for each string. 
+ + md = markdown.Markdown() + html1 = md.convert(text1) + html2 = md.convert(text2) + + Note that depending on which options and/or extensions are being used, + the parser may need its state reset between each call to `convert`. + + html1 = md.convert(text1) + md.reset() + html2 = md.convert(text2) + + You can also chain calls to `reset` together: + + html3 = md.reset().convert(text3) + +* `Markdown.convertFile(**kwargs)` + + The arguments of this method are identical to the arguments of the same + name on the `markdown.markdownFromFile` function (`input`, `output`, and + `encoding`). As with the `convert` method, this method should be used to + process multiple files without creating a new instance of the class for + each document. State may need to be `reset` between each call to + `convertFile` as is the case with `convert`. diff --git a/docs/test_suite.md b/docs/test_suite.md index c1034b6..cf9353c 100644 --- a/docs/test_suite.md +++ b/docs/test_suite.md @@ -1,6 +1,6 @@ title: Test Suite prev_title: Extension API -prev_url: writing_extensions.html +prev_url: extensions/api.html # Test Suite diff --git a/docs/using_as_module.md b/docs/using_as_module.md deleted file mode 100644 index 72c4965..0000000 --- a/docs/using_as_module.md +++ /dev/null @@ -1,237 +0,0 @@ -title: Library Reference -prev_title: Installation -prev_url: install.html -next_title: Command Line -next_url: command_line.html - - -Using Markdown as a Python Library -================================== - -First and foremost, Python-Markdown is intended to be a python library module -used by various projects to convert Markdown syntax into HTML. - -The Basics ----------- - -To use markdown as a module: - - import markdown - html = markdown.markdown(your_text_string) - -The Details ------------ - -Python-Markdown provides two public functions (`markdown.markdown` and -`markdown.markdownFromFile`) both of which wrap the public class -`markdown.Markdown`. 
If you're processing one document at a time, the -functions will serve your needs. However, if you need to process -multiple documents, it may be advantageous to create a single instance -of the `markdown.Markdown` class and pass multiple documents through it. - -### `markdown.markdown(text [, **kwargs])` - -The following options are available on the `markdown.markdown` function: - -* __`text`__ (required): The source text string. - - Note that Python-Markdown expects **Unicode** as input (although - a simple ASCII string may work) and returns output as Unicode. - Do not pass encoded strings to it! If your input is encoded, (e.g. as - UTF-8), it is your responsibility to decode it. For example: - - input_file = codecs.open("some_file.txt", mode="r", encoding="utf-8") - text = input_file.read() - html = markdown.markdown(text) - - If you want to write the output to disk, you must encode it yourself: - - output_file = codecs.open("some_file.html", "w", - encoding="utf-8", - errors="xmlcharrefreplace" - ) - output_file.write(html) - -* __`extensions`__: A list of extensions. - - Python-Markdown provides an API for third parties to write extensions to - the parser adding their own additions or changes to the syntax. A few - commonly used extensions are shipped with the markdown library. See - the [extension documentation](extensions) for a list of available extensions. - - The list of extensions may contain instances of extensions or stings of - extension names. If an extension name is provided as a string, the - extension must be importable as a python module either within the - `markdown.extensions` package or on your PYTHONPATH with a name starting - with `mdx_`, followed by the name of the extension. Thus, - `extensions=['extra']` will first look for the module - `markdown.extensions.extra`, then a module named `mdx_extra`. - -* __`extension-configs`__: A dictionary of configuration settings for extensions. 
- - The dictionary must be of the following format: - - extension-configs = {'extension_name_1': - [ - ('option_1', 'value_1'), - ('option_2', 'value_2') - ], - 'extension_name_2': - [ - ('option_1', 'value_1') - ] - } - See the documentation specific to the extension you are using for help in - specifying configuration settings for that extension. - -* __`output_format`__: Format of output. - - Supported formats are: - - * `"xhtml1"`: Outputs XHTML 1.x. **Default**. - * `"xhtml5"`: Outputs XHTML style tags of HTML 5 - * `"xhtml"`: Outputs latest supported version of XHTML (currently XHTML 1.1). - * `"html4"`: Outputs HTML 4 - * `"html5"`: Outputs HTML style tags of HTML 5 - * `"html"`: Outputs latest supported version of HTML (currently HTML 4). - - Note that it is suggested that the more specific formats ("xhtml1", - "html5", & "html4") be used as "xhtml" or "html" may change in the future - if it makes sense at that time. The values can either be lowercase or - uppercase. - -* __`safe_mode`__: Disallow raw html. - - If you are using Markdown on a web system which will transform text - provided by untrusted users, you may want to use the "safe_mode" - option which ensures that the user's HTML tags are either replaced, - removed or escaped. (They can still create links using Markdown syntax.) - - The following values are accepted: - - * `False` (Default): Raw HTML is passed through unaltered. - - * `replace`: Replace all HTML blocks with the text assigned to - `html_replacement_text` To maintain backward compatibility, setting - `safe_mode=True` will have the same effect as `safe_mode='replace'`. - - To replace raw HTML with something other than the default, do: - - md = markdown.Markdown(safe_mode='replace', - html_replacement_text='--RAW HTML NOT ALLOWED--') - - * `remove`: All raw HTML will be completely stripped from the text with - no warning to the author. - - * `escape`: All raw HTML will be escaped and included in the document. 
- - For example, the following source: - - Foo bar. - - Will result in the following HTML: - -

Foo <b>bar</b>.

- - Note that "safe_mode" does not alter the `enable_attributes` option, which - could allow someone to inject javascript (i.e., `{@onclick=alert(1)}`). You - may also want to set `enable_attributes=False` when using "safe_mode". - -* __`html_replacement_text`__: Text used when safe_mode is set to `replace`. - Defaults to `[HTML_REMOVED]`. - -* __`tab_length`__: Length of tabs in the source. Default: 4 - -* __`enable_attributes`__: Enable the conversion of attributes. Default: True - -* __`smart_emphasis`__: Treat `_connected_words_` intelligently Default: True - -* __`lazy_ol`__: Ignore number of first item of ordered lists. Default: True - - Given the following list: - - 4. Apples - 5. Oranges - 6. Pears - - By default markdown will ignore the fact the the first line started - with item number "4" and the HTML list will start with a number "1". - If `lazy_ol` is set to `True`, then markdown will output the following - HTML: - -
    -
  1. Apples
  2. -
  3. Oranges
  4. -
  5. Pears
  6. -
- - -### `markdown.markdownFromFile(**kwargs)` - -With a few exceptions, `markdown.markdownFromFile` accepts the same options as -`markdown.markdown`. It does **not** accept a `text` (or Unicode) string. -Instead, it accepts the following required options: - -* __`input`__ (required): The source text file. - - `input` may be set to one of three options: - - * a string which contains a path to a readable file on the file system, - * a readable file-like object, - * or `None` (default) which will read from `stdin`. - -* __`output`__: The target which output is written to. - - `output` may be set to one of three options: - - * a string which contains a path to a writable file on the file system, - * a writable file-like object, - * or `None` (default) which will write to `stdout`. - -* __`encoding`__: The encoding of the source text file. Defaults to - "utf-8". The same encoding will always be used for input and output. - The 'xmlcharrefreplace' error handler is used when encoding the output. - - **Note:** This is the only place that decoding and encoding of unicode - takes place in Python-Markdown. If this rather naive solution does not - meet your specific needs, it is suggested that you write your own code - to handle your encoding/decoding needs. - -### `markdown.Markdown([**kwargs])` - -The same options are available when initializing the `markdown.Markdown` class -as on the `markdown.markdown` function, except that the class does **not** -accept a source text string on initialization. Rather, the source text string -must be passed to one of two instance methods: - -* `Markdown.convert(source)` - - The `source` text must meet the same requirements as the `text` argument - of the `markdown.markdown` function. - - You should also use this method if you want to process multiple strings - without creating a new instance of the class for each string. 
- - md = markdown.Markdown() - html1 = md.convert(text1) - html2 = md.convert(text2) - - Note that depending on which options and/or extensions are being used, - the parser may need its state reset between each call to `convert`. - - html1 = md.convert(text1) - md.reset() - html2 = md.convert(text2) - - You can also change calls to `reset` togeather: - - html3 = md.reset().convert(text3) - -* `Markdown.convertFile(**kwargs)` - - The arguments of this method are identical to the arguments of the same - name on the `markdown.markdownFromFile` function (`input`, `output`, and - `encoding`). As with the `convert` method, this method should be used to - process multiple files without creating a new instance of the class for - each document. State may need to be `reset` between each call to - `convertFile` as is the case with `convert`. diff --git a/docs/writing_extensions.md b/docs/writing_extensions.md deleted file mode 100644 index 1e40019..0000000 --- a/docs/writing_extensions.md +++ /dev/null @@ -1,621 +0,0 @@ -title: Extensions API -prev_title: Wikilinks Extension -prev_url: extensions/wikilinks.html -next_title: Test Suite -next_url: test_suite.html - -Writing Extensions for Python-Markdown -====================================== - -Overview --------- - -Python-Markdown includes an API for extension writers to plug their own -custom functionality and/or syntax into the parser. There are preprocessors -which allow you to alter the source before it is passed to the parser, -inline patterns which allow you to add, remove or override the syntax of -any inline elements, and postprocessors which allow munging of the -output of the parser before it is returned. If you really want to dive in, -there are also blockprocessors which are part of the core BlockParser. - -As the parser builds an [ElementTree][] object which is later rendered -as Unicode text, there are also some helpers provided to ease manipulation of -the tree. 
Each part of the API is discussed in its respective section below. -Additionally, reading the source of some [Available Extensions][] may be -helpful. For example, the [Footnotes][] extension uses most of the features -documented here. - -* [Preprocessors][] -* [InlinePatterns][] -* [Treeprocessors][] -* [Postprocessors][] -* [BlockParser][] -* [Working with the ElementTree][] -* [Integrating your code into Markdown][] - * [extendMarkdown][] - * [OrderedDict][] - * [registerExtension][] - * [Config Settings][] - * [makeExtension][] - -

Preprocessors

- -Preprocessors munge the source text before it is passed into the Markdown -core. This is an excellent place to clean up bad syntax, extract things the -parser may otherwise choke on and perhaps even store it for later retrieval. - -Preprocessors should inherit from ``markdown.preprocessors.Preprocessor`` and -implement a ``run`` method with one argument ``lines``. The ``run`` method of -each Preprocessor will be passed the entire source text as a list of Unicode -strings. Each string will contain one line of text. The ``run`` method should -return either that list, or an altered list of Unicode strings. - -A pseudo example: - - from markdown.preprocessors import Preprocessor - - class MyPreprocessor(Preprocessor): - def run(self, lines): - new_lines = [] - for line in lines: - m = MYREGEX.match(line) - if m: - # do stuff - else: - new_lines.append(line) - return new_lines - -

Inline Patterns

- -Inline Patterns implement the inline HTML element syntax for Markdown such as -``*emphasis*`` or ``[links](http://example.com)``. Pattern objects should be -instances of classes that inherit from ``markdown.inlinepatterns.Pattern`` or -one of its children. Each pattern object uses a single regular expression and -must have the following methods: - -* **``getCompiledRegExp()``**: - - Returns a compiled regular expression. - -* **``handleMatch(m)``**: - - Accepts a match object and returns an ElementTree element of a plain - Unicode string. - -Note that any regular expression returned by ``getCompiledRegExp`` must capture -the whole block. Therefore, they should all start with ``r'^(.*?)'`` and end -with ``r'(.*?)!'``. When using the default ``getCompiledRegExp()`` method -provided in the ``Pattern`` you can pass in a regular expression without that -and ``getCompiledRegExp`` will wrap your expression for you and set the -`re.DOTALL` and `re.UNICODE` flags. This means that the first group of your -match will be ``m.group(2)`` as ``m.group(1)`` will match everything before the -pattern. - -For an example, consider this simplified emphasis pattern: - - from markdown.inlinepatterns import Pattern - from markdown.util import etree - - class EmphasisPattern(Pattern): - def handleMatch(self, m): - el = etree.Element('em') - el.text = m.group(3) - return el - -As discussed in [Integrating Your Code Into Markdown][], an instance of this -class will need to be provided to Markdown. That instance would be created -like so: - - # an oversimplified regex - MYPATTERN = r'\*([^*]+)\*' - # pass in pattern and create instance - emphasis = EmphasisPattern(MYPATTERN) - -Actually it would not be necessary to create that pattern (and not just because -a more sophisticated emphasis pattern already exists in Markdown). The fact is, -that example pattern is not very DRY. 
A pattern for `**strong**` text would -be almost identical, with the exception that it would create a 'strong' element. -Therefore, Markdown provides a number of generic pattern classes that can -provide some common functionality. For example, both emphasis and strong are -implemented with separate instances of the ``SimpleTagPettern`` listed below. -Feel free to use or extend any of the Pattern classes found at `markdown.inlinepatterns`. - -**Generic Pattern Classes** - -* **``SimpleTextPattern(pattern)``**: - - Returns simple text of ``group(2)`` of a ``pattern``. - -* **``SimpleTagPattern(pattern, tag)``**: - - Returns an element of type "`tag`" with a text attribute of ``group(3)`` - of a ``pattern``. ``tag`` should be a string of a HTML element (i.e.: 'em'). - -* **``SubstituteTagPattern(pattern, tag)``**: - - Returns an element of type "`tag`" with no children or text (i.e.: 'br'). - -There may be other Pattern classes in the Markdown source that you could extend -or use as well. Read through the source and see if there is anything you can -use. You might even get a few ideas for different approaches to your specific -situation. - -

Treeprocessors

- -Treeprocessors manipulate an ElemenTree object after it has passed through the -core BlockParser. This is where additional manipulation of the tree takes -place. Additionally, the InlineProcessor is a Treeprocessor which steps through -the tree and runs the InlinePatterns on the text of each Element in the tree. - -A Treeprocessor should inherit from ``markdown.treeprocessors.Treeprocessor``, -over-ride the ``run`` method which takes one argument ``root`` (an Elementree -object) and returns either that root element or a modified root element. - -A pseudo example: - - from markdown.treprocessors import Treeprocessor - - class MyTreeprocessor(Treeprocessor): - def run(self, root): - #do stuff - return my_modified_root - -For specifics on manipulating the ElementTree, see -[Working with the ElementTree][] below. - -

Postprocessors

- -Postprocessors manipulate the document after the ElementTree has been -serialized into a string. Postprocessors should be used to work with the -text just before output. - -A Postprocessor should inherit from ``markdown.postprocessors.Postprocessor`` -and over-ride the ``run`` method which takes one argument ``text`` and returns -a Unicode string. - -Postprocessors are run after the ElementTree has been serialized back into -Unicode text. For example, this may be an appropriate place to add a table of -contents to a document: - - from markdown.postprocessors import Postprocessor - - class TocPostprocessor(Postprocessor): - def run(self, text): - return MYMARKERRE.sub(MyToc, text) - -

BlockParser

- -Sometimes, pre/tree/postprocessors and Inline Patterns aren't going to do what -you need. Perhaps you want a new type of block type that needs to be integrated -into the core parsing. In such a situation, you can add/change/remove -functionality of the core ``BlockParser``. The BlockParser is composed of a -number of Blockproccessors. The BlockParser steps through each block of text -(split by blank lines) and passes each block to the appropriate Blockprocessor. -That Blockprocessor parses the block and adds it to the ElementTree. The -[Definition Lists][] extension would be a good example of an extension that -adds/modifies Blockprocessors. - -A Blockprocessor should inherit from ``markdown.blockprocessors.BlockProcessor`` -and implement both the ``test`` and ``run`` methods. - -The ``test`` method is used by BlockParser to identify the type of block. -Therefore the ``test`` method must return a boolean value. If the test returns -``True``, then the BlockParser will call that Blockprocessor's ``run`` method. -If it returns ``False``, the BlockParser will move on to the next -BlockProcessor. - -The **``test``** method takes two arguments: - -* **``parent``**: The parent etree Element of the block. This can be useful as - the block may need to be treated differently if it is inside a list, for - example. - -* **``block``**: A string of the current block of text. The test may be a - simple string method (such as ``block.startswith(some_text)``) or a complex - regular expression. - -The **``run``** method takes two arguments: - -* **``parent``**: A pointer to the parent etree Element of the block. The run - method will most likely attach additional nodes to this parent. Note that - nothing is returned by the method. The Elementree object is altered in place. - -* **``blocks``**: A list of all remaining blocks of the document. Your run - method must remove (pop) the first block from the list (which it altered in - place - not returned) and parse that block. 
You may find that a block of text - legitimately contains multiple block types. Therefore, after processing the - first type, your processor can insert the remaining text into the beginning - of the ``blocks`` list for future parsing. - -Please be aware that a single block can span multiple text blocks. For example, -the official Markdown syntax rules state that a blank line does not end a -Code Block. If the next block of text is also indented, then it is part of -the previous block. Therefore, the BlockParser was specifically designed to -address these types of situations. If you look at the ``CodeBlockProcessor`` -in the core, you will see that it checks the last child of the ``parent``. -If the last child is a code block (``<pre><code>...</code></pre>
``), then it -appends that block to the previous code block rather than creating a new -code block. - -Each BlockProcessor has the following utility methods available: - -* **``lastChild(parent)``**: - - Returns the last child of the given etree Element or ``None`` if it has no - children. - -* **``detab(text)``**: - - Removes one level of indent (four spaces by default) from the front of each - line of the given text string. - -* **``looseDetab(text, level)``**: - - Removes "level" levels of indent (defaults to 1) from the front of each line - of the given text string. However, this method allows secondary lines to - not be indented, as some parts of the Markdown syntax permit. - -Each BlockProcessor also has a pointer to the containing BlockParser instance at -``self.parser``, which can be used to check or alter the state of the parser. -The BlockParser tracks its state in a stack at ``parser.state``. The state -stack is an instance of the ``State`` class. - -**``State``** is a subclass of ``list`` and has the additional methods: - -* **``set(state)``**: - - Set a new state to string ``state``. The new state is appended to the end - of the stack. - -* **``reset()``**: - - Step back one step in the stack. The last state at the end is removed from - the stack. - -* **``isstate(state)``**: - - Test that the top (current) level of the stack is of the given string - ``state``. - -Note that to ensure that the state stack doesn't become corrupted, each time a -state is set for a block, that state *must* be reset when the parser finishes -parsing that block. - -An instance of the **``BlockParser``** is found at ``Markdown.parser``. -``BlockParser`` has the following methods: - -* **``parseDocument(lines)``**: - - Given a list of lines, an ElementTree object is returned. This should be - passed an entire document and is the only method the ``Markdown`` class - calls directly. 
- -* **``parseChunk(parent, text)``**: - - Parses a chunk of markdown text composed of multiple blocks and attaches - those blocks to the ``parent`` Element. The ``parent`` is altered in place - and nothing is returned. Extensions would most likely use this method for - block parsing. - -* **``parseBlocks(parent, blocks)``**: - - Parses a list of blocks of text and attaches those blocks to the ``parent`` - Element. The ``parent`` is altered in place and nothing is returned. This - method will generally only be used internally to recursively parse nested - blocks of text. - -While it is not recommended, an extension could subclass or completely replace -the ``BlockParser``. The new class would have to provide the same public API. -However, be aware that other extensions may expect the core parser provided -and will not work with such a drastically different parser. - -
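The ``test``/``run`` contract described in this section can be sketched without Markdown installed. The class below is a stand-in, not a subclass of the real ``markdown.blockprocessors.BlockProcessor``; the ``!note `` marker and the class name are invented for illustration, and the standard library's ``xml.etree.ElementTree`` is assumed in place of ``markdown.util.etree``.

```python
import xml.etree.ElementTree as etree


class NoteBlockSketch:
    """Stand-in illustrating the test/run contract; NOT the real base class."""

    MARKER = "!note "  # hypothetical block marker, invented for this sketch

    def test(self, parent, block):
        # Must return a boolean: does this processor handle the block?
        return block.startswith(self.MARKER)

    def run(self, parent, blocks):
        # Pop the first block; the list is altered in place.
        block = blocks.pop(0)
        first_line, _, rest = block.partition("\n")
        # Attach a new node to the parent; nothing is returned.
        note = etree.SubElement(parent, "div")
        note.set("class", "note")
        note.text = first_line[len(self.MARKER):]
        if rest:
            # Hand any remaining text back for future parsing.
            blocks.insert(0, rest)


root = etree.Element("div")
blocks = ["!note Remember this\nA trailing paragraph", "Another block"]
processor = NoteBlockSketch()
if processor.test(root, blocks[0]):
    processor.run(root, blocks)
```

After the call, ``root`` has gained one child and the unparsed remainder sits at the front of ``blocks``, which is the pattern the ``blocks`` argument description asks for.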

Working with the ElementTree

- -As mentioned, the Markdown parser converts a source document to an -[ElementTree][] object before serializing that back to Unicode text. -Markdown has provided some helpers to ease that manipulation within the context -of the Markdown module. - -First, to get access to the ElementTree module, import ElementTree from -``markdown`` rather than importing it directly. This will ensure you are using -the same version of ElementTree as markdown. The module is found at -``markdown.util.etree`` within Markdown. - - from markdown.util import etree - -``markdown.util.etree`` tries to import ElementTree from any known location, -first as a standard library module (from ``xml.etree`` in Python 2.5), then as -a third party package (``elementtree``). In each instance, ``cElementTree`` is -tried first, then ``ElementTree`` if the faster C implementation is not -available on your system. - -Sometimes you may want text inserted into an element to be parsed by -[InlinePatterns][]. In such a situation, simply insert the text as you normally -would and the text will be automatically run through the InlinePatterns. -However, if you do *not* want some text to be parsed by InlinePatterns, -then insert the text as an ``AtomicString``. - - from markdown.util import AtomicString - some_element.text = AtomicString(some_text) - -Here's a basic example which creates an HTML table (note that the contents of -the second cell (``td2``) will be run through InlinePatterns later): - - table = etree.Element("table") - table.set("cellpadding", "2") # Set cellpadding to 2 - tr = etree.SubElement(table, "tr") # Add child tr to table - td1 = etree.SubElement(tr, "td") # Add child td1 to tr - td1.text = AtomicString("Cell content") # Add plain text content - td2 = etree.SubElement(tr, "td") # Add second td to tr - td2.text = "*text* with **inline** formatting." # Add markup text - table.tail = "Text after table" # Add text after table - -You can also manipulate an existing tree. 
Consider the following example which -adds a ``class`` attribute to ``<a>
`` elements: - - def set_link_class(self, element): - for child in element: - if child.tag == "a": - child.set("class", "myclass") # set the class attribute - self.set_link_class(child) # run recursively on children - -For more information about working with ElementTree, see the ElementTree -[Documentation](http://effbot.org/zone/element-index.htm) -([Python Docs](http://docs.python.org/lib/module-xml.etree.ElementTree.html)). - -
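The recursive walk above can be tried on its own. This is a minimal sketch that assumes the standard library's ``xml.etree.ElementTree`` rather than ``markdown.util.etree`` and uses a plain function instead of a method; the sample document and the ``css_class`` parameter are made up for illustration.

```python
import xml.etree.ElementTree as etree


def set_link_class(element, css_class="myclass"):
    """Recursively add a class attribute to every <a> below `element`."""
    for child in element:
        if child.tag == "a":
            child.set("class", css_class)
        set_link_class(child, css_class)  # descend into the child's children


root = etree.fromstring(
    '<div><p>See <a href="#one">one</a> and '
    '<em><a href="#two">two</a></em>.</p></div>'
)
set_link_class(root)
links = root.findall(".//a")
```

Both links receive the attribute, including the one nested inside ``em``, because the function recurses into every child regardless of its tag.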

Integrating Your Code Into Markdown

- -Once you have the various pieces of your extension built, you need to tell -Markdown about them and ensure that they are run in the proper sequence. -Markdown accepts an ``Extension`` instance for each extension. Therefore, you -will need to define a class that extends ``markdown.extensions.Extension`` and -over-rides the ``extendMarkdown`` method. Within this class you will manage -configuration options for your extension and attach the various processors and -patterns to the Markdown instance. - -It is important to note that the order of the various processors and patterns -matters. For example, if we replace ``http://...`` links with ``<a>
`` elements, -and *then* try to deal with inline html, we will end up with a mess. -Therefore, the various types of processors and patterns are stored within an -instance of the Markdown class in [OrderedDict][]s. Your ``Extension`` class -will need to manipulate those OrderedDicts appropriately. You may insert -instances of your processors and patterns into the appropriate location in an -OrderedDict, remove a built-in instance, or replace a built-in instance with -your own. - -

extendMarkdown

- -The ``extendMarkdown`` method of a ``markdown.extensions.Extension`` class -accepts two arguments: - -* **``md``**: - - A pointer to the instance of the Markdown class. You should use this to - access the [OrderedDict][]s of processors and patterns. They are found - under the following attributes: - - * ``md.preprocessors`` - * ``md.inlinePatterns`` - * ``md.parser.blockprocessors`` - * ``md.treeprocessors`` - * ``md.postprocessors`` - - Some other things you may want to access in the markdown instance are: - - * ``md.htmlStash`` - * ``md.output_formats`` - * ``md.set_output_format()`` - * ``md.registerExtension()`` - * ``md.html_replacement_text`` - * ``md.tab_length`` - * ``md.enable_attributes`` - * ``md.smart_emphasis`` - -* **``md_globals``**: - - Contains all the various global variables within the markdown module. - -Of course, with access to those items, theoretically you have the option of -changing anything through various [monkey_patching][] techniques. However, you -should be aware that the various undocumented or private parts of markdown -may change without notice and your monkey_patches may break with a new release. -Therefore, what you really should be doing is inserting processors and patterns -into the markdown pipeline. Consider yourself warned. - -[monkey_patching]: http://en.wikipedia.org/wiki/Monkey_patch - -A simple example: - - from markdown.extensions import Extension - - class MyExtension(Extension): - def extendMarkdown(self, md, md_globals): - # Insert instance of 'mypattern' before 'references' pattern - md.inlinePatterns.add('mypattern', MyPattern(md), '<references') - -

OrderedDict

- -An OrderedDict is a dictionary-like object that retains the order of its -items. The items are ordered in the order in which they were appended to -the OrderedDict. However, an item can also be inserted into the OrderedDict -in a specific location in relation to the existing items. 
- -Think of OrderedDict as a combination of a list and a dictionary as it has -methods common to both. For example, you can get and set items using the -``od[key] = value`` syntax and the methods ``keys()``, ``values()``, and -``items()`` work as expected with the keys, values and items returned in the -proper order. At the same time, you can use ``insert()``, ``append()``, and -``index()`` as you would with a list. - -Generally speaking, within Markdown extensions you will be using the special -helper method ``add()`` to add additional items to an existing OrderedDict. - -The ``add()`` method accepts three arguments: - -* **``key``**: A string. The key is used for later reference to the item. - -* **``value``**: The object instance stored in this item. - -* **``location``**: Optional. The item's location in relation to other items. - - Note that the location can consist of a few different values: - - * The special strings ``"_begin"`` and ``"_end"`` insert that item at the - beginning or end of the OrderedDict respectively. - - * A less-than sign (``<``) followed by an existing key (i.e.: - ``"<somekey"``) inserts that item before the existing key. - - * A greater-than sign (``>``) followed by an existing key (i.e.: - ``">somekey"``) inserts that item after the existing key. - -Consider the following example: - - >>> from markdown.odict import OrderedDict - >>> od = OrderedDict() - >>> od['one'] = 1 # The same as: od.add('one', 1, '_begin') - >>> od['three'] = 3 # The same as: od.add('three', 3, '>one') - >>> od['four'] = 4 # The same as: od.add('four', 4, '_end') - >>> od.items() - [("one", 1), ("three", 3), ("four", 4)] - -Note that when building an OrderedDict in order, the extra features of the -``add`` method offer no real value and are not necessary. However, when -manipulating an existing OrderedDict, ``add`` can be very helpful. So let's -insert another item into the OrderedDict. - - >>> od.add('two', 2, '>one') # Insert after 'one' - >>> od.values() - [1, 2, 3, 4] - -Now let's insert another item. 
- - >>> od.add('twohalf', 2.5, '<three') # Insert before 'three' - >>> od.keys() - ["one", "two", "twohalf", "three", "four"] - -Note that we also could have set the location of "twohalf" to be 'after two' -(i.e.: ``'>two'``). However, it's unlikely that you will have control over the -order in which extensions will be loaded, and this could affect the final -sorted order of an OrderedDict. For example, suppose an extension adding -'twohalf' in the above examples was loaded before a separate extension which -adds 'two'. You may need to take this into consideration when adding your -extension components to the various markdown OrderedDicts. - -Once an OrderedDict is created, the items are available via key: - - MyNode = od['somekey'] - -Therefore, to delete an existing item: - - del od['somekey'] - -To change the value of an existing item (leaving location unchanged): - - od['somekey'] = MyNewObject() - -To change the location of an existing item: - - od.link('somekey', '<otherkey') - -

registerExtension

- -Some extensions may need to have their state reset between multiple runs of the -Markdown class. For example, consider the following use of the [Footnotes][] -extension: - - md = markdown.Markdown(extensions=['footnotes']) - html1 = md.convert(text_with_footnote) - md.reset() - html2 = md.convert(text_without_footnote) - -Without calling ``reset``, the footnote definitions from the first document will -be inserted into the second document as they are still stored within the class -instance. Therefore the ``Extension`` class needs to define a ``reset`` method -that will reset the state of the extension (i.e.: ``self.footnotes = {}``). -However, as many extensions do not have a need for ``reset``, ``reset`` is only -called on extensions that are registered. 
- -To register an extension, call ``md.registerExtension`` from within your -``extendMarkdown`` method: - - - def extendMarkdown(self, md, md_globals): - md.registerExtension(self) - # insert processors and patterns here - -Then, each time ``reset`` is called on the Markdown instance, the ``reset`` -method of each registered extension will be called as well. You should also -note that ``reset`` will be called on each registered extension after it is -initialized the first time. Keep that in mind when over-riding the extension's -``reset`` method. - -
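The register-then-reset flow can be modelled with two small stand-in classes. Neither ``ToyMarkdown`` nor ``ToyFootnotes`` is part of Markdown; they are hypothetical names that only mimic the behaviour described above, and the initial ``reset`` call made after first initialization is deliberately left out of this sketch.

```python
class ToyMarkdown:
    """Hypothetical stand-in for the Markdown class; only models reset()."""

    def __init__(self):
        self.registered = []

    def registerExtension(self, extension):
        self.registered.append(extension)

    def reset(self):
        # Forward reset() to every registered extension.
        for extension in self.registered:
            extension.reset()


class ToyFootnotes:
    """Hypothetical extension holding per-document state."""

    def __init__(self):
        self.footnotes = {}

    def extendMarkdown(self, md, md_globals=None):
        md.registerExtension(self)
        # processors and patterns would be inserted here

    def reset(self):
        self.footnotes = {}


md = ToyMarkdown()
ext = ToyFootnotes()
ext.extendMarkdown(md)
ext.footnotes["1"] = "left over from the first document"
md.reset()
```

An extension that never calls ``registerExtension`` would keep its leftover state here, which is the situation the ``reset`` mechanism is meant to avoid.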

Config Settings

- -If an extension uses any parameters that the user may want to change, -those parameters should be stored in ``self.config`` of your -``markdown.Extension`` class in the following format: - - self.config = {parameter_1_name : [value1, description1], - parameter_2_name : [value2, description2] } - -When stored this way the config parameters can be over-ridden from the -command line or at the time Markdown is initiated: - - markdown.py -x myextension(SOME_PARAM=2) inputfile.txt > output.txt - -Note that parameters should always be assumed to be set to string -values, and should be converted at run time. For example: - - i = int(self.getConfig("SOME_PARAM")) - -
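The string-to-value conversion can be illustrated with a stand-in class. ``ToyExtension`` is a hypothetical name and its ``getConfig`` is a simplified imitation written for this sketch, not the library's implementation; only the ``[value, description]`` layout and the ``int(...)`` conversion come from the text above.

```python
class ToyExtension:
    """Hypothetical stand-in for an Extension; only models self.config."""

    def __init__(self, configs=None):
        # Each entry is [value, description], as described above.
        self.config = {"SOME_PARAM": ["1", "An example numeric option"]}
        for key, value in (configs or {}).items():
            self.config[key][0] = value

    def getConfig(self, key):
        return self.config[key][0]


# An override arriving from the command line is a string.
ext = ToyExtension(configs={"SOME_PARAM": "2"})
some_param = int(ext.getConfig("SOME_PARAM"))  # convert at run time
```

Converting at the point of use keeps the stored value in whatever form the user supplied, so the same code path works for defaults and for command line overrides.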

makeExtension

- -Each extension should ideally be placed in its own module starting -with the ``mdx_`` prefix (e.g. ``mdx_footnotes.py``). The module must -provide a module-level function called ``makeExtension`` that takes -an optional parameter consisting of a dictionary of configuration over-rides -and returns an instance of the extension. An example from the footnote -extension: - - def makeExtension(configs=None) : - return FootnoteExtension(configs=configs) - -By following the above example, when Markdown is passed the name of your -extension as a string (i.e.: ``'footnotes'``), it will automatically import -the module and call the ``makeExtension`` function initiating your extension. - -You may have noted that the extensions packaged with Python-Markdown do not -use the ``mdx_`` prefix in their module names. This is because they are all -part of the ``markdown.extensions`` package. Markdown will first try to import -from ``markdown.extensions.extname`` and upon failure, ``mdx_extname``. If both -fail, Markdown will continue without the extension. - -However, Markdown will also accept an already existing instance of an extension. -For example: - - import markdown - import myextension - configs = {...} - myext = myextension.MyExtension(configs=configs) - md = markdown.Markdown(extensions=[myext]) - -This is useful if you need to implement a large number of extensions with more -than one residing in a module. 
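The lookup order for string extension names can be sketched with ``importlib``. The function ``find_extension_module`` is a hypothetical helper written for this illustration and is not how Markdown itself is implemented; it only mirrors the described order of trying ``markdown.extensions.extname`` first and ``mdx_extname`` second.

```python
import importlib


def find_extension_module(ext_name,
                          prefixes=("markdown.extensions.", "mdx_")):
    """Return the first importable module for ext_name, or None."""
    for prefix in prefixes:
        try:
            return importlib.import_module(prefix + ext_name)
        except ImportError:
            continue  # try the next candidate name
    return None


module = find_extension_module("no_such_extension_xyz")
if module is None:
    # Mirrors the documented behaviour of continuing without the extension.
    print("extension not found; continuing without it")
else:
    extension = module.makeExtension(configs={})
```

Returning ``None`` instead of raising matches the statement that Markdown continues without an extension it cannot import.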
- -[Preprocessors]: #preprocessors -[InlinePatterns]: #inlinepatterns -[Treeprocessors]: #treeprocessors -[Postprocessors]: #postprocessors -[BlockParser]: #blockparser -[Working with the ElementTree]: #working_with_et -[Integrating your code into Markdown]: #integrating_into_markdown -[extendMarkdown]: #extendmarkdown -[OrderedDict]: #ordereddict -[registerExtension]: #registerextension -[Config Settings]: #configsettings -[makeExtension]: #makeextension -[ElementTree]: http://effbot.org/zone/element-index.htm -[Available Extensions]: extensions/ -[Footnotes]: extensions/footnotes.html -[Definition Lists]: extensions/definition_lists.html -- cgit v1.2.3