From b941beb7973025d359f3e7839a5b99ea7f62c534 Mon Sep 17 00:00:00 2001 From: Waylan Limberg Date: Sat, 9 Aug 2008 22:28:45 -0400 Subject: Reorganized docs. Added an AUTHORS and INSTALL files. INSTALL is incomplete. --- docs/writing_extensions.txt | 298 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 298 insertions(+) create mode 100644 docs/writing_extensions.txt (limited to 'docs/writing_extensions.txt') diff --git a/docs/writing_extensions.txt b/docs/writing_extensions.txt new file mode 100644 index 0000000..2ae279c --- /dev/null +++ b/docs/writing_extensions.txt @@ -0,0 +1,298 @@ +### Overview + +Python-Markdown includes an API for extension writers to plug their own +custom functionality and/or syntax into the parser. There are preprocessors +which allow you to alter the source before it is passed to the parser, +inline patterns which allow you to add, remove or override the syntax of +any inline elements, and postprocessors which allow munging of the +output of the parser before it is returned. + +As the parser builds an [ElementTree][] DOM object which is later rendered +as Unicode text, there are also some helpers provided to make manipulation of +the DOM tree easier. Each part of the API is discussed in its respective +section below. You may find reading the source of some [[existing extensions]] +helpful as well. For example, the [[footnote]] extension uses most of the +features documented here. + +* [Preprocessors][] + * [TextPreprocessors][] + * [Line Preprocessors][] +* [InlinePatterns][] +* [Postprocessors][] + * [DOM Postprocessors][] + * [TextProstprocessors][] +* [Working with the DOM][] +* [Integrating your code into Markdown][] + * [extendMarkdown][] + * [Config Settings][] + * [makeExtension][] + +

Preprocessors

+ +Preprocessors munge the source text before it is passed into the Markdown +core. This is an excellent place to clean up bad syntax, extract things the +parser may otherwise choke on and perhaps even store it for later retrieval. + +There are two types of preprocessors: [TextPreprocessors][] and +[Line Preprocessors][]. + +

TextPreprocessors

+ +TextPreprocessors should inherit from `markdown.TextPreprocessor` and implement +a `run` method with one argument `text`. The `run` method of each +TextPreprocessor will be passed the entire source text as a single Unicode +string and should either return that single Unicode string, or an altered +version of it. + +For example, a simple TextPreprocessor that normalizes newlines [^1] might look +like this: + + class NormalizePreprocessor(markdown.TextPreprocessor): + def run(self, text): + return text.replace("\r\n", "\n").replace("\r", "\n") + +[^1]: It should be noted that Markdown already normalizes newlines. This +example is for illustrative purposes only. + +
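As a slightly larger illustration of extracting text and storing it for later retrieval, the following is a minimal sketch only. The `TextPreprocessor` base class here is a stand-in so the example runs without Markdown installed, and `CommentStashPreprocessor`, its `stash` list and the `STASH-n` placeholder format are hypothetical names, not part of the Markdown API.

```python
import re


class TextPreprocessor:
    """Stand-in for markdown.TextPreprocessor so the sketch runs on its own."""

    def run(self, text):
        return text


class CommentStashPreprocessor(TextPreprocessor):
    """Hypothetical preprocessor: swap HTML comments for placeholders."""

    COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

    def __init__(self):
        self.stash = []  # removed comments, in order of appearance

    def run(self, text):
        def store(match):
            # Keep the original text and leave a numbered placeholder behind.
            self.stash.append(match.group(0))
            return "STASH-%d" % (len(self.stash) - 1)

        return self.COMMENT_RE.sub(store, text)


pre = CommentStashPreprocessor()
result = pre.run("before <!-- a *comment* --> after")
```

A matching postprocessor could later swap each placeholder back for the stored text.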

Line Preprocessors

+ +Line Preprocessors should inherit from `markdown.Preprocessor` and implement +a `run` method with one argument `lines`. The `run` method of each Line +Preprocessor will be passed the entire source text as a list of Unicode strings. +Each string will contain one line of text. The `run` method should return +either that list, or an altered list of Unicode strings. + +A pseudo example: + + class MyPreprocessor(markdown.Preprocessor): + def run(self, lines): + new_lines = [] + for line in lines: + m = MYREGEX.match(line) + if m: + pass # do stuff with the match + else: + new_lines.append(line) + return new_lines + +
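To make the pseudo example concrete, here is one possible sketch. It assumes a made-up `%% key: value` metadata syntax. The `Preprocessor` base class is a stand-in so the code runs without Markdown installed, and `META_RE`, `MetaPreprocessor` and its `meta` dictionary are illustrative names only.

```python
import re


class Preprocessor:
    """Stand-in for markdown.Preprocessor so the sketch runs on its own."""

    def run(self, lines):
        return lines


# Hypothetical syntax: a line such as "%% key: value" carries metadata.
META_RE = re.compile(r"^%%\s*(\w+)\s*:\s*(.*)$")


class MetaPreprocessor(Preprocessor):
    def __init__(self):
        self.meta = {}

    def run(self, lines):
        new_lines = []
        for line in lines:
            m = META_RE.match(line)
            if m:
                # Remember the pair and drop the line from the source.
                self.meta[m.group(1)] = m.group(2).strip()
            else:
                new_lines.append(line)
        return new_lines


pre = MetaPreprocessor()
kept = pre.run(["%% title: Demo", "Some *text*.", "%% author: me", "More text."])
```

Only the lines that do not match reach the parser, while the extracted values stay available on the preprocessor instance.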

Inline Patterns

+ +Inline Patterns implement the inline HTML element syntax for Markdown such as +`*emphasis*` or `[links](http://example.com)`. Pattern objects should be +instances of classes that inherit from `markdown.Pattern` or one of its +children. Each pattern object uses a single regular expression and must have +the following methods: + +* `getCompiledRegExp()`: Returns a compiled regular expression. +* `handleMatch(m)`: Accepts a match object and returns an ElementTree +element or a plain Unicode string. + +Note that any regular expression returned by `getCompiledRegExp` must capture +the whole block. Therefore, they should all start with `r'^(.*?)'` and end +with `r'(.*?)$'`. When using the default `getCompiledRegExp()` method provided +in the `Pattern` class, you can pass in a regular expression without that and +`getCompiledRegExp` will wrap your expression for you. This means that the first +group of your match will be `m.group(2)` as `m.group(1)` will match everything +before the pattern. + +For an example, consider this simplified emphasis pattern: + + class EmphasisPattern(markdown.Pattern): + def handleMatch(self, m): + el = markdown.etree.Element('em') + el.text = m.group(2) + return el + +As discussed in [Integrating Your Code Into Markdown][], an instance of this +class will need to be provided to Markdown. That instance would be created +like so: + + # an oversimplified regex + MYPATTERN = r'\*([^*]+)\*' + # pass in pattern and create instance + emphasis = EmphasisPattern(MYPATTERN) + +Actually it would not be necessary to create that pattern (and not just because +a more sophisticated emphasis pattern already exists in Markdown). The fact is, +that example pattern is not very DRY. A pattern for `**strong**` text would +be almost identical, with the exception that it would create a 'strong' element. +Therefore, Markdown provides a number of generic pattern classes that can +provide some common functionality. 
For example, both emphasis and strong are +implemented with separate instances of the `SimpleTagPattern` listed below. +Feel free to use or extend any of these Pattern classes. + +**Generic Pattern Classes** + +* `SimpleTextPattern(pattern)`: + + Returns simple text of `group(2)` of a `pattern`. + +* `SimpleTagPattern(pattern, tag)`: + + Returns an element of type "`tag`" with a text attribute of `group(3)` + of a `pattern`. `tag` should be a string naming an HTML element (e.g. 'em'). + +* `SubstituteTagPattern(pattern, tag)`: + + Returns an element of type "`tag`" with no children or text (e.g. 'br'). + +There may be other Pattern classes in the Markdown source that you could extend +or use as well. Read through the source and see if there is anything you can +use. You might even get a few ideas for different approaches to your specific +situation. + +
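The group numbering that results from wrapping a pattern can be checked with the standard library alone. The sketch below assumes the wrapper has the form `^(.*?)PATTERN(.*?)$`, as described for `getCompiledRegExp` above. It uses `xml.etree.ElementTree` in place of `markdown.etree`, and the variable names are illustrative.

```python
import re
import xml.etree.ElementTree as etree

MYPATTERN = r"\*([^*]+)\*"  # the oversimplified emphasis regex

# Wrap the pattern with a lazy group on each side so it spans the whole block.
compiled = re.compile(r"^(.*?)%s(.*?)$" % MYPATTERN, re.DOTALL)

m = compiled.match("some *emphasized* text")

el = etree.Element("em")
el.text = m.group(2)  # group(1) holds everything before the pattern
serialized = etree.tostring(el, encoding="unicode")
```

Because the wrapper contributes `group(1)`, every group in your own expression is shifted up by one. A pattern whose own first group captures the delimiter therefore finds its text in `group(3)`.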

Postprocessors

+ +Postprocessors manipulate a document after it has passed through the Markdown +core. This is where stored text gets added back in, such as a list of footnotes, +a table of contents or raw HTML. + +There are two types of postprocessors: [DOM Postprocessors][] and +[TextPostprocessors][]. + +

DOM Postprocessors

+ +A DOM Postprocessor should inherit from `markdown.Postprocessor` and over-ride +the `run` method which takes one argument `root` and should return either +that root element or a modified root element. + +A pseudo example: + + class MyPostprocessor(markdown.Postprocessor): + def run(self, root): + # do stuff + return my_modified_root + +For specifics on manipulating the DOM, see [Working with the DOM][] below. + +

TextPostprocessors

+ +A TextPostprocessor should inherit from `markdown.TextPostprocessor` and +over-ride the `run` method which takes one argument `text` and returns a +Unicode string. + +TextPostprocessors are run after the DOM has been serialized back into Unicode +text. For example, this may be an appropriate place to add a table of contents +to a document: + + class TocTextPostprocessor(markdown.TextPostprocessor): + def run(self, text): + return MYMARKERRE.sub(MyToc, text) + +
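The table of contents example leaves `MYMARKERRE` and `MyToc` undefined. One way they might be filled in is sketched here. The `TextPostprocessor` base class is a stand-in so the code runs without Markdown installed, and the `[TOC]` marker, `HEADING_RE` and the generated list markup are assumptions made for illustration.

```python
import re


class TextPostprocessor:
    """Stand-in for markdown.TextPostprocessor so the sketch runs on its own."""

    def run(self, text):
        return text


# Hypothetical marker that an author places where the contents should go.
MYMARKERRE = re.compile(r"<p>\[TOC\]</p>")
HEADING_RE = re.compile(r"<h2>(.*?)</h2>")


class TocTextPostprocessor(TextPostprocessor):
    def run(self, text):
        # Build a flat list from every second-level heading in the output.
        items = "".join("<li>%s</li>" % title for title in HEADING_RE.findall(text))
        toc = "<ul>%s</ul>" % items
        # A function replacement avoids backslash-escape handling in `toc`.
        return MYMARKERRE.sub(lambda match: toc, text)


html = "<p>[TOC]</p><h2>One</h2><p>text</p><h2>Two</h2>"
result = TocTextPostprocessor().run(html)
```

Working on the serialized text keeps the example short. A real extension may prefer to collect headings from the tree before serialization.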

Working with the DOM

+ +As mentioned, the Markdown parser converts a source document to an +[ElementTree][] DOM object before serializing that back to Unicode text. +Markdown has provided some helpers to ease that manipulation within the context +of the Markdown module... + +

Integrating Your Code Into Markdown + +Once you have the various pieces of your extension built, you need to tell +Markdown about them and ensure that they are run in the proper sequence. +Markdown accepts an `Extension` instance for each extension. Therefore, you +will need to define a class that extends `markdown.Extension` and over-rides +the `extendMarkdown` method. Within this class you will manage configuration +options for your extension and attach the various processors and patterns to +the Markdown instance. + +It is important to note that the order of the various processors and patterns +matters. For example, if we replace `http://...` links with elements, and +*then* try to deal with inline html, we will end up with a mess. Therefore, +the various types of processors and patterns are stored within an instance of +the Markdown class within lists. Your `Extension` class will need to manipulate +those lists appropriately. You may insert instances of your processors and +patterns into the appropriate location in a list, remove a built-in instance, +or replace a built-in instance with your own. + +

`extendMarkdown`

+ +The `extendMarkdown` method of a `markdown.Extension` class accepts two +arguments: + +* `md`: + + A pointer to the instance of the Markdown class. You should use this to + access the lists of processors and patterns. They are found under the + following attributes: + + * `md.textPreprocessors` + * `md.preprocessors` + * `md.inlinePatterns` + * `md.postprocessors` + * `md.textPostprocessors` + + Some other things you may want to access in the markdown instance are: + + * `md.inlineStash` + * `md.htmlStash` + * `md.registerExtension()` + +* `md_globals` + + Contains all the various global variables within the markdown module. + +Of course, with access to those items, theoretically you have the option of +changing anything through various monkeypatching techniques. However, you should +be aware that the various undocumented or private parts of markdown may change +without notice and your monkeypatches may no longer work. Therefore, what you +really should be doing is inserting processors and patterns into the markdown +pipeline. + +

Config Settings

+ +If an extension uses any parameters that the user may want to change, +those parameters should be stored in `self.config` of your `markdown.Extension` +class in the following format: + + self.config = {parameter_1_name : [value1, description1], + parameter_2_name : [value2, description2] } + +When stored this way the config parameters can be over-ridden from the +command line or at the time Markdown is initiated: + + markdown.py -x myextension(SOME_PARAM=2) inputfile.txt > output.txt + +Note that parameters should always be assumed to be set to string +values, and should be converted at run time. For example: + + i = int(self.getConfig("SOME_PARAM")) + +
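The `[value, description]` layout and the string-to-integer conversion can be sketched as follows. The `Extension` base class is a stand-in reduced to config handling so the code runs without Markdown installed. `MyExtension`, the `SOME_PARAM` default and the choice of a dictionary for `configs` are assumptions for illustration.

```python
class Extension:
    """Stand-in for markdown.Extension, reduced to config handling."""

    config = {}

    def getConfig(self, key):
        return self.config[key][0]

    def setConfig(self, key, value):
        self.config[key][0] = value


class MyExtension(Extension):
    def __init__(self, configs=None):
        # name: [default value, description]
        self.config = {
            "SOME_PARAM": ["1", "How many widgets to emit"],
        }
        # Apply any user-supplied over-rides on top of the defaults.
        for key, value in (configs or {}).items():
            self.setConfig(key, value)


ext = MyExtension(configs={"SOME_PARAM": "2"})  # values arrive as strings
count = int(ext.getConfig("SOME_PARAM"))
```

Converting at the point of use means the same code path works whether the value came from the default or from the command line.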

`makeExtension`

+ +Each extension should ideally be placed in its own module starting +with the ``mdx_`` prefix (e.g. ``mdx_footnotes.py``). The module must +provide a module-level function called ``makeExtension`` that takes +an optional parameter consisting of a dictionary of configuration over-rides +and returns an instance of the extension. An example from the footnote extension: + + def makeExtension(configs=None) : + return FootnoteExtension(configs=configs) + +By following the above example, when Markdown is passed the name of your +extension as a string (i.e.: ``'footnotes'``), it will automatically import +the module and call the ``makeExtension`` function initiating your extension. + +However, Markdown will also accept an already existing instance of an extension. For example: + + import markdown, mdx_myextension + configs = {...} + myext = mdx_myextension.MyExtension(configs=configs) + md = markdown.Markdown(extensions=[myext]) + +This is useful if you need to implement a large number of extensions with more +than one residing in a module. + +[Preprocessors]: #preprocessors +[TextPreprocessors]: #textpreprocessors +[Line Preprocessors]: #linepreprocessors +[InlinePatterns]: #inlinepatterns +[Postprocessors]: #postprocessors +[DOM Postprocessors]: #dompostprocessors +[TextProstprocessors]: #textpostprocessors +[Working with the DOM]: #working_with_dom +[Integrating your code into Markdown]: #integrating_into_markdown +[extendMarkdown]: #extendmarkdown +[Config Settings]: #configsettings +[makeExtension]: #makeextension + -- cgit v1.2.3 From bfe67ee6a17f600ecac2df5a02511136d9da62ae Mon Sep 17 00:00:00 2001 From: Waylan Limberg Date: Tue, 12 Aug 2008 15:32:24 -0400 Subject: Cleaned up writing_extensions docs. 
--- docs/writing_extensions.txt | 97 +++++++++++++++++++++------------------------ 1 file changed, 45 insertions(+), 52 deletions(-) (limited to 'docs/writing_extensions.txt') diff --git a/docs/writing_extensions.txt b/docs/writing_extensions.txt index dbb4dd4..5f0ff01 100755 --- a/docs/writing_extensions.txt +++ b/docs/writing_extensions.txt @@ -10,8 +10,8 @@ output of the parser before it is returned. As the parser builds an [ElementTree][] object which is later rendered as Unicode text, there are also some helpers provided to make manipulation of the tree easier. Each part of the API is discussed in its respective -section below. You may find reading the source of some [[existing extensions]] -helpful as well. For example, the [[footnote]] extension uses most of the +section below. You may find reading the source of some [[Available Extensions]] +helpful as well. For example, the [[Footnotes]] extension uses most of the features documented here. * [Preprocessors][] @@ -152,8 +152,8 @@ There are two types of postprocessors: [ElementTree Postprocessors][] and

ElementTree Postprocessors

-A ElementTree Postprocessor should inherit from `markdown.Postprocessor` and over-ride -the `run` method which takes one argument `root` and should return either +An ElementTree Postprocessor should inherit from `markdown.Postprocessor`, +over-ride the `run` method which takes one argument `root` and return either that root element or a modified root element. A pseudo example: @@ -163,7 +163,8 @@ A pseudo example: # do stuff return my_modified_root -For specifics on manipulating the ElementTree, see [Working with the ElementTree][] below. +For specifics on manipulating the ElementTree, see +[Working with the ElementTree][] below.

TextPostprocessors

@@ -171,9 +172,9 @@ A TextPostprocessor should inherit from `markdown.TextPostprocessor` and over-ride the `run` method which takes one argument `text` and returns a Unicode string. -TextPostprocessors are run after the ElementTree has been serialized back into Unicode -text. For example, this may be an appropriate place to add a table of contents -to a document: +TextPostprocessors are run after the ElementTree has been serialized back into +Unicode text. For example, this may be an appropriate place to add a table of +contents to a document: class TocTextPostprocessor(markdown.TextPostprocessor): def run(self, text): @@ -182,60 +183,52 @@ to a document:

Working with the ElementTree

As mentioned, the Markdown parser converts a source document to an -[ElementTree][] ElementTree object before serializing that back to Unicode text. +[ElementTree][] object before serializing that back to Unicode text. Markdown has provided some helpers to ease that manipulation within the context of the Markdown module. -First of all, to get access to the ElementTree module object you should use: - import markdown - etree = markdown.etree +First, to get access to the ElementTree module, import ElementTree from +``markdown`` rather than importing it directly. This will ensure you are using +the same version of ElementTree as markdown. The module is named ``etree`` +within Markdown. + + from markdown import etree -It's try to import ElementTree from any known places, first as standard Python -cElementTree/ElementTree module, then as separately installed cElementTree/ElementTree. -Another thing you need to know is that all text data, included in tag will be -processed later with [InlinePatterns][]. +``markdown.etree`` tries to import ElementTree from any known location, first +as a standard library module (from ``xml.etree`` in Python 2.5), then as a third +party package (``elementtree``). In each instance, ``cElementTree`` is tried +first, then ``ElementTree`` if the faster C implementation is not available on +your system. + +Sometimes you may want text inserted into an element to be parsed by +[InlinePatterns][]. In such a situation, simply insert the text into an +`inline` tag and the text will be automatically run through the InlinePatterns. 
-Example below show basic ElementTree functionality: +Here's a basic example which creates an HTML table (note that the contents of +the second cell (``td2``) will be run through InlinePatterns later): table = etree.Element("table") table.set("cellpadding", "2") # Set cellpadding to 2 - tr = etree.SubElement(table, "tr") # Added child tr to table - td = etree.SubElement(tr, "td") # Added child td to tr - td.text = "Cell content" # Added text content to td element + tr = etree.SubElement(table, "tr") # Add child tr to table + td1 = etree.SubElement(tr, "td") # Add child td1 to tr + td1.text = "Cell content" # Add plain text content to td1 element + td2 = etree.SubElement(tr, "td") # Add second td to tr + inline = etree.SubElement(td2, "inline") # Add an inline element to td2 + inline.text = "Some *text* with **inline** formatting." # Add markup text table.tail = "Text after table" # Added text after table Element - print etree.tostring(table) # Serialized our table - -Now let's write a simple ElementTree Postprocessor, that will add "class" attribute -to all "a" elements: - - class AttrPostprocessor(markdown.Postprocessor): - - def _findElement(self, element, name): - """ - find elements with @name and return list - - Keyword arguments: - - * element: ElementTree Element - * name: tag name to search - - """ - result = [] - for child in element: - if child.tag == name: - result.append(child) - result += self._findElement(child, name) - return result - - def run(self, root): - for element in self._findElement(root, "a"): - element.set("class", "MyClass") # Set "class" atribute - - retrun root +You can also manipulate an existing tree. Consider the following example which +adds a ``class`` attribute to all ``a`` elements: -For more information about working with ElementTree visit -ElementTree [official site](http://effbot.org/zone/element-index.htm). 
+ def set_link_class(self, element): + for child in element: + if child.tag == "a": + child.set("class", "myclass") # set the class attribute + self.set_link_class(child) # run recursively on children + +For more information about working with ElementTree see the ElementTree +[Documentation](http://effbot.org/zone/element-index.htm) +([Python Docs](http://docs.python.org/lib/module-xml.etree.ElementTree.html)).
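The recursive walk can be tried outside of Markdown with the standard library's `xml.etree.ElementTree`. The sketch below is a plain function rather than a method, and the sample document and the `css_class` parameter are made up for illustration.

```python
import xml.etree.ElementTree as etree


def set_link_class(element, css_class="myclass"):
    """Set a class attribute on every descendant <a> element."""
    for child in element:
        if child.tag == "a":
            child.set("class", css_class)
        set_link_class(child, css_class)  # descend into grandchildren


root = etree.fromstring(
    '<div><p>See <a href="#one">one</a> and <em><a href="#two">two</a></em>.</p></div>'
)
set_link_class(root)
classes = [a.get("class") for a in root.iter("a")]
```

On current Python versions `root.iter("a")` visits all matching descendants directly, so the explicit recursion is mainly useful when each level needs its own handling.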

Integrating Your Code Into Markdown @@ -346,4 +339,4 @@ than one residing in a module. [extendMarkdown]: #extendmarkdown [Config Settings]: #configsettings [makeExtension]: #makeextension - +[ElementTree]: http://effbot.org/zone/element-index.htm -- cgit v1.2.3 From 88c72d709be87321b736778d19d9277db02422a8 Mon Sep 17 00:00:00 2001 From: Waylan Limberg Date: Tue, 12 Aug 2008 23:03:26 -0400 Subject: Added notes on registerExtension and reset as well as the new markdown_extensions package among a few other minor edits. --- docs/writing_extensions.txt | 64 ++++++++++++++++++++++++++++++++++++++------- 1 file changed, 55 insertions(+), 9 deletions(-) (limited to 'docs/writing_extensions.txt') diff --git a/docs/writing_extensions.txt b/docs/writing_extensions.txt index 5f0ff01..df17698 100755 --- a/docs/writing_extensions.txt +++ b/docs/writing_extensions.txt @@ -24,6 +24,7 @@ features documented here. * [Working with the ElementTree][] * [Integrating your code into Markdown][] * [extendMarkdown][] + * [registerExtension][] * [Config Settings][] * [makeExtension][] @@ -277,11 +278,46 @@ arguments: Contains all the various global variables within the markdown module. Of course, with access to those items, theoretically you have the option to -changing anything through various monkeypatching techniques. However, you should -be aware that the various undocumented or private parts of markdown may change -without notice and your monkeypatches may no longer work. Therefore, what you -really should be doing is inserting processors and patterns into the markdown -pipeline. +changing anything through various [monkey_patching][] techniques. In fact, this +is how both the [[HeaderId]] and [[CodeHilite]] extensions work. However, you +should be aware that the various undocumented or private parts of markdown may +change without notice and your monkey_patches may break with a new release. 
+Therefore, what you really should be doing is inserting processors and patterns +into the markdown pipeline. Consider yourself warned. + +[monkey_patching]: http://en.wikipedia.org/wiki/Monkey_patch + +

registerExtension

+ +Some extensions may need to have their state reset between multiple runs of the +Markdown class. For example, consider the following use of the [[Footnotes]] +extension: + + md = markdown.Markdown(extensions=['footnotes']) + html1 = md.convert(text_with_footnote) + md.reset() + html2 = md.convert(text_without_footnote) + +Without calling ``reset``, the footnote definitions from the first document will +be inserted into the second document as they are still stored within the class +instance. Therefore the ``Extension`` class needs to define a ``reset`` method +that will reset the state of the extension (i.e.: ``self.footnotes = {}``). +However, as many extensions do not have a need for ``reset``, ``reset`` is only +called on extensions that are registered. + +To register an extension, call ``md.registerExtension`` from within your +``extendMarkdown`` method: + + + def extendMarkdown(self, md, md_globals): + md.registerExtension(self) + # insert processors and patterns here + +Then, each time ``reset`` is called on the Markdown instance, the ``reset`` +method of each registered extension will be called as well. You should also +note that ``reset`` will be called on each registered extension after it is +initialized the first time. Keep that in mind when over-riding the extension's +``reset`` method.
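The interplay between `registerExtension` and `reset` can be modelled in a few lines. This is a sketch under stated assumptions. The `Markdown` class below is a stand-in reduced to registration so the code runs without Markdown installed. `NotesExtension` and the `registeredExtensions` attribute are illustrative names.

```python
class Markdown:
    """Stand-in for markdown.Markdown, reduced to extension registration."""

    def __init__(self, extensions=()):
        self.registeredExtensions = []
        for ext in extensions:
            ext.extendMarkdown(self, {})
        self.reset()  # registered extensions are reset once after setup

    def registerExtension(self, extension):
        self.registeredExtensions.append(extension)

    def reset(self):
        for extension in self.registeredExtensions:
            extension.reset()


class NotesExtension:
    """Hypothetical extension that keeps per-document state."""

    def extendMarkdown(self, md, md_globals):
        md.registerExtension(self)

    def reset(self):
        self.footnotes = {}


ext = NotesExtension()
md = Markdown(extensions=[ext])
ext.footnotes["1"] = "left over from the first document"
md.reset()
```

Since `reset` runs once right after setup, it is a convenient single place to create the state in the first place, as `NotesExtension` does here.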

Config Settings

@@ -308,7 +344,8 @@ Each extension should ideally be placed in its own module starting with the ``mdx_`` prefix (e.g. ``mdx_footnotes.py``). The module must provide a module-level function called ``makeExtension`` that takes an optional parameter consisting of a dictionary of configuration over-rides -and returns an instance of the extension. An example from the footnote extension: +and returns an instance of the extension. An example from the footnote +extension: def makeExtension(configs=None) : return FootnoteExtension(configs=configs) @@ -317,11 +354,19 @@ By following the above example, when Markdown is passed the name of your extension as a string (i.e.: ``'footnotes'``), it will automatically import the module and call the ``makeExtension`` function initiating your extension. -However, Markdown will also accept an already existing instance of an extension.For example: +You may have noted that the extensions packaged with Python-Markdown do not +use the ``mdx_`` prefix in their module names. This is because they are all +part of the ``markdown_extensions`` package. Markdown will first try to import +from ``markdown_extensions.extname`` and upon failure, ``mdx_extname``. If both +fail, Markdown will continue without the extension. + +However, Markdown will also accept an already existing instance of an extension. +For example: - import markdown, mdx_myextension + import markdown + import myextension configs = {...} - myext = mdx_myextension.MyExtension(configs=configs) + myext = myextension.MyExtension(configs=configs) md = markdown.Markdown(extensions=[myext]) This is useful if you need to implement a large number of extensions with more @@ -337,6 +382,7 @@ than one residing in a module. 
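The lookup order for named extensions (``markdown_extensions.extname`` first, then ``mdx_extname``, then giving up) might be sketched as below. This is an illustration of the described behaviour using `importlib`, not the code Markdown itself uses, and `load_extension_module` is a hypothetical helper name.

```python
import importlib


def load_extension_module(ext_name):
    """Return the first importable module for ext_name, or None."""
    for module_name in ("markdown_extensions." + ext_name, "mdx_" + ext_name):
        try:
            return importlib.import_module(module_name)
        except ImportError:
            continue  # try the next naming convention
    return None  # the caller carries on without the extension


missing = load_extension_module("no_such_extension_xyz")
```

Returning `None` instead of raising mirrors the statement that Markdown continues without the extension when both imports fail.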
[Working with the ElementTree]: #working_with_et [Integrating your code into Markdown]: #integrating_into_markdown [extendMarkdown]: #extendmarkdown +[registerExtension]: #registerextension [Config Settings]: #configsettings [makeExtension]: #makeextension [ElementTree]: http://effbot.org/zone/element-index.htm -- cgit v1.2.3