From b62ddeda02fadcd09def9354eb2ef46a7562a106 Mon Sep 17 00:00:00 2001 From: Waylan Limberg Date: Wed, 6 Dec 2017 23:18:29 -0500 Subject: Switch docs to MKDocs (#602) Fixes #601. Merged in 6f87b32 from the md3 branch and did a lot of cleanup. Changes include: * Removed old docs build tool, templates, etc. * Added MkDocs config file, etc. * filename.txt => filename.md * pythonhost.org/Markdown => Python-Markdown.github.io * Markdown lint and other cleanup. * Automate pages deployment in makefile with `mkdocs gh-deploy` Assumes a git remote is set up named "pages". Do git remote add pages https://github.com/Python-Markdown/Python-Markdown.github.io.git ... before running `make deploy` the first time. --- docs/extensions/api.txt | 682 ------------------------------------------------ 1 file changed, 682 deletions(-) delete mode 100644 docs/extensions/api.txt (limited to 'docs/extensions/api.txt') diff --git a/docs/extensions/api.txt b/docs/extensions/api.txt deleted file mode 100644 index 246bb27..0000000 --- a/docs/extensions/api.txt +++ /dev/null @@ -1,682 +0,0 @@ -title: Extensions API -prev_title: WikiLinks Extension -prev_url: wikilinks.html -next_title: Test Suite -next_url: ../test_suite.html - -Writing Extensions for Python-Markdown -====================================== - -Python-Markdown includes an API for extension writers to plug their own -custom functionality and/or syntax into the parser. There are Preprocessors -which allow you to alter the source before it is passed to the parser, -inline patterns which allow you to add, remove or override the syntax of -any inline elements, and Postprocessors which allow munging of the -output of the parser before it is returned. If you really want to dive in, -there are also Blockprocessors which are part of the core BlockParser. - -As the parser builds an [ElementTree][ElementTree] object which is later rendered -as Unicode text, there are also some helpers provided to ease manipulation of -the tree. Each part of the API is discussed in its respective section below. -Additionally, reading the source of some [Available Extensions][] may be -helpful. For example, the [Footnotes][] extension uses most of the features -documented here. - -Preprocessors {: #preprocessors } ---------------------------------- - -Preprocessors munge the source text before it is passed into the Markdown -core. This is an excellent place to clean up bad syntax, extract things the -parser may otherwise choke on and perhaps even store it for later retrieval. - -Preprocessors should inherit from `markdown.preprocessors.Preprocessor` and -implement a `run` method with one argument `lines`. The `run` method of -each Preprocessor will be passed the entire source text as a list of Unicode -strings. Each string will contain one line of text. The `run` method should -return either that list, or an altered list of Unicode strings. - -A pseudo example: - - from markdown.preprocessors import Preprocessor - - class MyPreprocessor(Preprocessor): - def run(self, lines): - new_lines = [] - for line in lines: - m = MYREGEX.match(line) - if m: - # do stuff - else: - new_lines.append(line) - return new_lines - -Inline Patterns {: #inlinepatterns } ------------------------------------- - -Inline Patterns implement the inline HTML element syntax for Markdown such as -`*emphasis*` or `[links](http://example.com)`. Pattern objects should be -instances of classes that inherit from `markdown.inlinepatterns.Pattern` or -one of its children. Each pattern object uses a single regular expression and -must have the following methods: - -* **`getCompiledRegExp()`**: - - Returns a compiled regular expression. - -* **`handleMatch(m)`**: - - Accepts a match object and returns an ElementTree element of a plain - Unicode string. - -Also, Inline Patterns can define the property `ANCESTOR_EXCLUDES` with either -a list or tuple of undesirable ancestors. The pattern should not match if it -would cause the content to be a descendant of one of the defined tag names. - -Note that any regular expression returned by `getCompiledRegExp` must capture -the whole block. Therefore, they should all start with `r'^(.*?)'` and end -with `r'(.*?)!'`. When using the default `getCompiledRegExp()` method -provided in the `Pattern` you can pass in a regular expression without that -and `getCompiledRegExp` will wrap your expression for you and set the -`re.DOTALL` and `re.UNICODE` flags. This means that the first group of your -match will be `m.group(2)` as `m.group(1)` will match everything before the -pattern. - -For an example, consider this simplified emphasis pattern: - - from markdown.inlinepatterns import Pattern - from markdown.util import etree - - class EmphasisPattern(Pattern): - def handleMatch(self, m): - el = etree.Element('em') - el.text = m.group(3) - return el - -As discussed in [Integrating Your Code Into Markdown][], an instance of this -class will need to be provided to Markdown. That instance would be created -like so: - - # an oversimplified regex - MYPATTERN = r'\*([^*]+)\*' - # pass in pattern and create instance - emphasis = EmphasisPattern(MYPATTERN) - -Actually it would not be necessary to create that pattern (and not just because -a more sophisticated emphasis pattern already exists in Markdown). The fact is, -that example pattern is not very DRY. A pattern for `**strong**` text would -be almost identical, with the exception that it would create a 'strong' element. -Therefore, Markdown provides a number of generic pattern classes that can -provide some common functionality. For example, both emphasis and strong are -implemented with separate instances of the `SimpleTagPattern` listed below. -Feel free to use or extend any of the Pattern classes found at -`markdown.inlinepatterns`. - -**Generic Pattern Classes** - -* **`SimpleTextPattern(pattern)`**: - - Returns simple text of `group(2)` of a `pattern`. - -* **`SimpleTagPattern(pattern, tag)`**: - - Returns an element of type "`tag`" with a text attribute of `group(3)` - of a `pattern`. `tag` should be a string of a HTML element (i.e.: 'em'). - -* **`SubstituteTagPattern(pattern, tag)`**: - - Returns an element of type "`tag`" with no children or text (i.e.: `br`). - -There may be other Pattern classes in the Markdown source that you could extend -or use as well. Read through the source and see if there is anything you can -use. You might even get a few ideas for different approaches to your specific -situation. - -Treeprocessors {: #treeprocessors } ------------------------------------ - -Treeprocessors manipulate an ElementTree object after it has passed through the -core BlockParser. This is where additional manipulation of the tree takes -place. Additionally, the InlineProcessor is a Treeprocessor which steps through -the tree and runs the Inline Patterns on the text of each Element in the tree. - -A Treeprocessor should inherit from `markdown.treeprocessors.Treeprocessor`, -over-ride the `run` method which takes one argument `root` (an ElementTree -object) and either modifies that root element and returns `None` or returns a -new ElementTree object. - -A pseudo example: - - from markdown.treeprocessors import Treeprocessor - - class MyTreeprocessor(Treeprocessor): - def run(self, root): - root.text = 'modified content' - -Note that Python class methods return `None` by default when no `return` -statement is defined. Additionally all Python variables refer to objects by -reference. Therefore, the above `run` method modifies the `root` element -in place and returns `None`. The changes made to the `root` element and its -children are retained. - -Some may be inclined to return the modified `root` element. While that would -work, it would cause a copy of the entire ElementTree to be generated each -time the Treeprocessor is run. Therefore, it is generally expected that -the `run` method would only return `None` or a new ElementTree object. - -For specifics on manipulating the ElementTree, see -[Working with the ElementTree][workingwithetree] below. - -Postprocessors {: #postprocessors } ------------------------------------ - -Postprocessors manipulate the document after the ElementTree has been -serialized into a string. Postprocessors should be used to work with the -text just before output. - -A Postprocessor should inherit from `markdown.postprocessors.Postprocessor` -and over-ride the `run` method which takes one argument `text` and returns -a Unicode string. - -Postprocessors are run after the ElementTree has been serialized back into -Unicode text. For example, this may be an appropriate place to add a table of -contents to a document: - - from markdown.postprocessors import Postprocessor - - class TocPostprocessor(Postprocessor): - def run(self, text): - return MYMARKERRE.sub(MyToc, text) - -BlockParser {: #blockparser } ------------------------------ - -Sometimes, Preprocessors, Treeprocessors, Postprocessors, and Inline Patterns -are not going to do what you need. Perhaps you want a new type of block type -that needs to be integrated into the core parsing. In such a situation, you can -add/change/remove functionality of the core `BlockParser`. The BlockParser is -composed of a number of Blockprocessors. The BlockParser steps through each -block of text (split by blank lines) and passes each block to the appropriate -Blockprocessor. That Blockprocessor parses the block and adds it to the -ElementTree. The -[Definition Lists][] extension would be a good example of an extension that -adds/modifies Blockprocessors. - -A Blockprocessor should inherit from `markdown.blockprocessors.BlockProcessor` -and implement both the `test` and `run` methods. - -The `test` method is used by BlockParser to identify the type of block. -Therefore the `test` method must return a Boolean value. If the test returns -`True`, then the BlockParser will call that Blockprocessor's `run` method. -If it returns `False`, the BlockParser will move on to the next -Blockprocessor. - -The **`test`** method takes two arguments: - -* **`parent`**: The parent ElementTree Element of the block. This can be useful - as the block may need to be treated differently if it is inside a list, for - example. - -* **`block`**: A string of the current block of text. The test may be a - simple string method (such as `block.startswith(some_text)`) or a complex - regular expression. - -The **`run`** method takes two arguments: - -* **`parent`**: A pointer to the parent ElementTree Element of the block. The run - method will most likely attach additional nodes to this parent. Note that - nothing is returned by the method. The ElementTree object is altered in place. - -* **`blocks`**: A list of all remaining blocks of the document. Your run - method must remove (pop) the first block from the list (which it altered in - place - not returned) and parse that block. You may find that a block of text - legitimately contains multiple block types. Therefore, after processing the - first type, your processor can insert the remaining text into the beginning - of the `blocks` list for future parsing. - -Please be aware that a single block can span multiple text blocks. For example, -The official Markdown syntax rules state that a blank line does not end a -Code Block. If the next block of text is also indented, then it is part of -the previous block. Therefore, the BlockParser was specifically designed to -address these types of situations. If you notice the `CodeBlockProcessor`, -in the core, you will note that it checks the last child of the `parent`. -If the last child is a code block (`
...
`), then it -appends that block to the previous code block rather than creating a new -code block. - -Each Blockprocessor has the following utility methods available: - -* **`lastChild(parent)`**: - - Returns the last child of the given ElementTree Element or `None` if it - had no children. - -* **`detab(text)`**: - - Removes one level of indent (four spaces by default) from the front of each - line of the given text string. - -* **`looseDetab(text, level)`**: - - Removes "level" levels of indent (defaults to 1) from the front of each line - of the given text string. However, this methods allows secondary lines to - not be indented as does some parts of the Markdown syntax. - -Each Blockprocessor also has a pointer to the containing BlockParser instance at -`self.parser`, which can be used to check or alter the state of the parser. -The BlockParser tracks it's state in a stack at `parser.state`. The state -stack is an instance of the `State` class. - -**`State`** is a subclass of `list` and has the additional methods: - -* **`set(state)`**: - - Set a new state to string `state`. The new state is appended to the end - of the stack. - -* **`reset()`**: - - Step back one step in the stack. The last state at the end is removed from - the stack. - -* **`isstate(state)`**: - - Test that the top (current) level of the stack is of the given string - `state`. - -Note that to ensure that the state stack does not become corrupted, each time a -state is set for a block, that state *must* be reset when the parser finishes -parsing that block. - -An instance of the **`BlockParser`** is found at `Markdown.parser`. -`BlockParser` has the following methods: - -* **`parseDocument(lines)`**: - - Given a list of lines, an ElementTree object is returned. This should be - passed an entire document and is the only method the `Markdown` class - calls directly. - -* **`parseChunk(parent, text)`**: - - Parses a chunk of markdown text composed of multiple blocks and attaches - those blocks to the `parent` Element. The `parent` is altered in place - and nothing is returned. Extensions would most likely use this method for - block parsing. - -* **`parseBlocks(parent, blocks)`**: - - Parses a list of blocks of text and attaches those blocks to the `parent` - Element. The `parent` is altered in place and nothing is returned. This - method will generally only be used internally to recursively parse nested - blocks of text. - -While is is not recommended, an extension could subclass or completely replace -the `BlockParser`. The new class would have to provide the same public API. -However, be aware that other extensions may expect the core parser provided -and will not work with such a drastically different parser. - -Working with the ElementTree {: #working_with_et } --------------------------------------------------- - -As mentioned, the Markdown parser converts a source document to an -[ElementTree][ElementTree] object before serializing that back to Unicode text. -Markdown has provided some helpers to ease that manipulation within the context -of the Markdown module. - -First, to get access to the ElementTree module import ElementTree from -`markdown` rather than importing it directly. This will ensure you are using -the same version of ElementTree as markdown. The module is found at -`markdown.util.etree` within Markdown. - - from markdown.util import etree - -`markdown.util.etree` tries to import ElementTree from any known location, -first as a standard library module (from `xml.etree` in Python 2.5), then as -a third party package (ElementTree). In each instance, `cElementTree` is -tried first, then ElementTree if the faster C implementation is not -available on your system. - -Sometimes you may want text inserted into an element to be parsed by -[Inline Patterns][]. In such a situation, simply insert the text as you normally -would and the text will be automatically run through the Inline Patterns. -However, if you do *not* want some text to be parsed by Inline Patterns, -then insert the text as an `AtomicString`. - - from markdown.util import AtomicString - some_element.text = AtomicString(some_text) - -Here's a basic example which creates an HTML table (note that the contents of -the second cell (`td2`) will be run through Inline Patterns latter): - - table = etree.Element("table") - table.set("cellpadding", "2") # Set cellpadding to 2 - tr = etree.SubElement(table, "tr") # Add child tr to table - td1 = etree.SubElement(tr, "td") # Add child td1 to tr - td1.text = markdown.util.AtomicString("Cell content") # Add plain text content - td2 = etree.SubElement(tr, "td") # Add second td to tr - td2.text = "*text* with **inline** formatting." # Add markup text - table.tail = "Text after table" # Add text after table - -You can also manipulate an existing tree. Consider the following example which -adds a `class` attribute to `` elements: - - def set_link_class(self, element): - for child in element: - if child.tag == "a": - child.set("class", "myclass") #set the class attribute - set_link_class(child) # run recursively on children - -For more information about working with ElementTree see the ElementTree -[Documentation](http://effbot.org/zone/element-index.htm) -([Python Docs](http://docs.python.org/lib/module-xml.etree.ElementTree.html)). - -Integrating Your Code Into Markdown {: #integrating_into_markdown } -------------------------------------------------------------------- - -Once you have the various pieces of your extension built, you need to tell -Markdown about them and ensure that they are run in the proper sequence. -Markdown accepts an `Extension` instance for each extension. Therefore, you -will need to define a class that extends `markdown.extensions.Extension` and -over-rides the `extendMarkdown` method. Within this class you will manage -configuration options for your extension and attach the various processors and -patterns to the Markdown instance. - -It is important to note that the order of the various processors and patterns -matters. For example, if we replace `http://...` links with `` elements, -and *then* try to deal with inline HTML, we will end up with a mess. -Therefore, the various types of processors and patterns are stored within an -instance of the Markdown class in [OrderedDict][]s. Your `Extension` class -will need to manipulate those OrderedDicts appropriately. You may insert -instances of your processors and patterns into the appropriate location in an -OrderedDict, remove a built-in instance, or replace a built-in instance with -your own. - -### `extendMarkdown` {: #extendmarkdown } - -The `extendMarkdown` method of a `markdown.extensions.Extension` class -accepts two arguments: - -* **`md`**: - - A pointer to the instance of the Markdown class. You should use this to - access the [OrderedDict][]s of processors and patterns. They are found - under the following attributes: - - * `md.preprocessors` - * `md.inlinePatterns` - * `md.parser.blockprocessors` - * `md.treeprocessors` - * `md.postprocessors` - - Some other things you may want to access in the markdown instance are: - - * `md.htmlStash` - * `md.output_formats` - * `md.set_output_format()` - * `md.output_format` - * `md.serializer` - * `md.registerExtension()` - * `md.html_replacement_text` - * `md.tab_length` - * `md.enable_attributes` - * `md.smart_emphasis` - -* **`md_globals`**: - - Contains all the various global variables within the markdown module. - -!!! Warning - With access to the above items, theoretically you have the option to - change anything through various [monkey_patching][] techniques. However, - you should be aware that the various undocumented parts of markdown may - change without notice and your monkey_patches may break with a new release. - Therefore, what you really should be doing is inserting processors and - patterns into the markdown pipeline. Consider yourself warned! - -[monkey_patching]: http://en.wikipedia.org/wiki/Monkey_patch - -A simple example: - - from markdown.extensions import Extension - - class MyExtension(Extension): - def extendMarkdown(self, md, md_globals): - # Insert instance of 'mypattern' before 'references' pattern - md.inlinePatterns.add('mypattern', MyPattern(md), '`) followed by an existing key (i.e.: - `">somekey"`) inserts that item after the existing key. - -Consider the following example: - - >>> from markdown.odict import OrderedDict - >>> od = OrderedDict() - >>> od['one'] = 1 # The same as: od.add('one', 1, '_begin') - >>> od['three'] = 3 # The same as: od.add('three', 3, '>one') - >>> od['four'] = 4 # The same as: od.add('four', 4, '_end') - >>> od.items() - [("one", 1), ("three", 3), ("four", 4)] - -Note that when building an OrderedDict in order, the extra features of the -`add` method offer no real value and are not necessary. However, when -manipulating an existing OrderedDict, `add` can be very helpful. So let's -insert another item into the OrderedDict. - - >>> od.add('two', 2, '>one') # Insert after 'one' - >>> od.values() - [1, 2, 3, 4] - -Now let's insert another item. - - >>> od.add('two-point-five', 2.5, '>> od.keys() - ["one", "two", "two-point-five", "three", "four"] - -Note that we also could have set the location of "two-point-five" to be 'after two' -(i.e.: `'>two'`). However, it's unlikely that you will have control over the -order in which extensions will be loaded, and this could affect the final -sorted order of an OrderedDict. For example, suppose an extension adding -"two-point-five" in the above examples was loaded before a separate extension -which adds 'two'. You may need to take this into consideration when adding your -extension components to the various markdown OrderedDicts. - -Once an OrderedDict is created, the items are available via key: - - MyNode = od['somekey'] - -Therefore, to delete an existing item: - - del od['somekey'] - -To change the value of an existing item (leaving location unchanged): - - od['somekey'] = MyNewObject() - -To change the location of an existing item: - - t.link('somekey', '