31 files changed, 489 insertions, 223 deletions
diff --git a/docs/AUTHORS b/docs/AUTHORS index cfe2b34..2843b56 100644 --- a/docs/AUTHORS +++ b/docs/AUTHORS @@ -37,6 +37,7 @@ Daniel Krech Steward Midwinter Jack Miller Neale Pickett +Paul Stansifer John Szakmeister Malcolm Tredinnick Ben Wilson diff --git a/docs/CHANGE_LOG b/docs/CHANGE_LOG index 1b4af45..e005ff8 100644 --- a/docs/CHANGE_LOG +++ b/docs/CHANGE_LOG @@ -1,6 +1,12 @@ PYTHON MARKDOWN CHANGELOG ========================= +Sept 28, 2009: Released version 2.0.2-Final. + +May 20, 2009: Released version 2.0.1-Final. + +Mar 30, 2009: Released version 2.0-Final. + Mar 8, 2009: Release Candidate 2.0-rc-1. Feb 2009: Added support for multi-level lists to new Blockprocessors. diff --git a/docs/release-2.0.2.txt b/docs/release-2.0.2.txt new file mode 100644 index 0000000..8ae9a3d --- /dev/null +++ b/docs/release-2.0.2.txt @@ -0,0 +1,9 @@ +Python-Markdown 2.0.2 Release Notes +=================================== + +Python-Markdown 2.0.2 is a bug-fix release. No new features have been added. +Most notably, the setup script has been updated to include a dependency on +ElementTree on older versions of Python (< 2.5). There have also been a few +fixes for minor parsing bugs in some edge cases. For a full list of changes, +see the git log. + diff --git a/docs/using_as_module.txt b/docs/using_as_module.txt index cfeb88d..130d0a7 100644 --- a/docs/using_as_module.txt +++ b/docs/using_as_module.txt @@ -20,13 +20,13 @@ string should work) and returns output as Unicode. Do not pass encoded strings If your input is encoded, e.g. as UTF-8, it is your responsibility to decode it. 
E.g.: - input_file = codecs.open("some_file.txt", mode="r", encoding="utf8") + input_file = codecs.open("some_file.txt", mode="r", encoding="utf-8") text = input_file.read() html = markdown.markdown(text, extensions) If you later want to write it to disk, you should encode it yourself: - output_file = codecs.open("some_file.html", "w", encoding="utf8") + output_file = codecs.open("some_file.html", "w", encoding="utf-8") output_file.write(html) More Options @@ -61,7 +61,7 @@ The ``Markdown`` class has the method ``convertFile`` which reads in a file and writes out to a file-like-object: md = markdown.Markdown() - md.convertFile(input="in.txt", output="out.html", encoding="utf8") + md.convertFile(input="in.txt", output="out.html", encoding="utf-8") The markdown module also includes a shortcut function ``markdownFromFile`` that wraps the above method. @@ -69,7 +69,7 @@ wraps the above method. markdown.markdownFromFile(input="in.txt", output="out.html", extensions=[], - encoding="utf8", + encoding="utf-8", safe=False) In either case, if the ``output`` keyword is passed a file name (i.e.: diff --git a/docs/writing_extensions.txt b/docs/writing_extensions.txt index 860c2ec..3aad74a 100644 --- a/docs/writing_extensions.txt +++ b/docs/writing_extensions.txt @@ -38,15 +38,15 @@ Preprocessors munge the source text before it is passed into the Markdown core. This is an excellent place to clean up bad syntax, extract things the parser may otherwise choke on and perhaps even store it for later retrieval. -Preprocessors should inherit from ``markdown.Preprocessor`` and implement -a ``run`` method with one argument ``lines``. The ``run`` method of each -Preprocessor will be passed the entire source text as a list of Unicode strings. -Each string will contain one line of text. The ``run`` method should return -either that list, or an altered list of Unicode strings. 
+Preprocessors should inherit from ``markdown.preprocessors.Preprocessor`` and +implement a ``run`` method with one argument ``lines``. The ``run`` method of +each Preprocessor will be passed the entire source text as a list of Unicode +strings. Each string will contain one line of text. The ``run`` method should +return either that list, or an altered list of Unicode strings. A pseudo example: - class MyPreprocessor(markdown.Preprocessor): + class MyPreprocessor(markdown.preprocessors.Preprocessor): def run(self, lines): new_lines = [] for line in lines: @@ -61,9 +61,9 @@ A pseudo example: Inline Patterns implement the inline HTML element syntax for Markdown such as ``*emphasis*`` or ``[links](http://example.com)``. Pattern objects should be -instances of classes that inherit from ``markdown.Pattern`` or one of its -children. Each pattern object uses a single regular expression and must have -the following methods: +instances of classes that inherit from ``markdown.inlinepatterns.Pattern`` or +one of its children. Each pattern object uses a single regular expression and +must have the following methods: * **``getCompiledRegExp()``**: @@ -84,7 +84,7 @@ match everything before the pattern. For an example, consider this simplified emphasis pattern: - class EmphasisPattern(markdown.Pattern): + class EmphasisPattern(markdown.inlinepatterns.Pattern): def handleMatch(self, m): el = markdown.etree.Element('em') el.text = m.group(3) @@ -135,13 +135,13 @@ core BlockParser. This is where additional manipulation of the tree takes place. Additionally, the InlineProcessor is a Treeprocessor which steps through the tree and runs the InlinePatterns on the text of each Element in the tree. -A Treeprocessor should inherit from ``markdown.Treeprocessor``, +A Treeprocessor should inherit from ``markdown.treeprocessors.Treeprocessor``, over-ride the ``run`` method which takes one argument ``root`` (an Elementree object) and returns either that root element or a modified root element. 
A pseudo example: - class MyTreeprocessor(markdown.Treeprocessor): + class MyTreeprocessor(markdown.treeprocessors.Treeprocessor): def run(self, root): #do stuff return my_modified_root @@ -155,15 +155,15 @@ Postprocessors manipulate the document after the ElementTree has been serialized into a string. Postprocessors should be used to work with the text just before output. -A Postprocessor should inherit from ``markdown.Postprocessor`` and -over-ride the ``run`` method which takes one argument ``text`` and returns a -Unicode string. +A Postprocessor should inherit from ``markdown.postprocessors.Postprocessor`` +and over-ride the ``run`` method which takes one argument ``text`` and returns +a Unicode string. Postprocessors are run after the ElementTree has been serialized back into Unicode text. For example, this may be an appropriate place to add a table of contents to a document: - class TocPostprocessor(markdown.Postprocessor): + class TocPostprocessor(markdown.postprocessors.Postprocessor): def run(self, text): return MYMARKERRE.sub(MyToc, text) @@ -179,8 +179,8 @@ That Blockprocessor parses the block and adds it to the ElementTree. The [[Definition Lists]] extension would be a good example of an extension that adds/modifies Blockprocessors. -A Blockprocessor should inherit from ``markdown.BlockProcessor`` and implement -both the ``test`` and ``run`` methods. +A Blockprocessor should inherit from ``markdown.blockprocessors.BlockProcessor`` +and implement both the ``test`` and ``run`` methods. The ``test`` method is used by BlockParser to identify the type of block. Therefore the ``test`` method must return a boolean value. If the test returns diff --git a/markdown/__init__.py b/markdown/__init__.py index 086fde9..26314f6 100644 --- a/markdown/__init__.py +++ b/markdown/__init__.py @@ -39,8 +39,8 @@ Copyright 2004 Manfred Stienstra (the original version) License: BSD (see docs/LICENSE for details). 
""" -version = "2.0.1" -version_info = (2,0,1, "Final") +version = "2.0.3" +version_info = (2,0,3, "Final") import re import codecs @@ -182,7 +182,7 @@ class Markdown: def __init__(self, extensions=[], extension_configs={}, - safe_mode = False, + safe_mode = False, output_format=DEFAULT_OUTPUT_FORMAT): """ Creates a new Markdown instance. @@ -200,12 +200,12 @@ class Markdown: * "xhtml": Outputs latest supported version of XHTML (currently XHTML 1.1). * "html4": Outputs HTML 4 * "html": Outputs latest supported version of HTML (currently HTML 4). - Note that it is suggested that the more specific formats ("xhtml1" + Note that it is suggested that the more specific formats ("xhtml1" and "html4") be used as "xhtml" or "html" may change in the future - if it makes sense at that time. + if it makes sense at that time. """ - + self.safeMode = safe_mode self.registeredExtensions = [] self.docType = "" @@ -300,9 +300,9 @@ class Markdown: # Map format keys to serializers self.output_formats = { - 'html' : html4.to_html_string, + 'html' : html4.to_html_string, 'html4' : html4.to_html_string, - 'xhtml' : etree.tostring, + 'xhtml' : etree.tostring, 'xhtml1': etree.tostring, } @@ -327,12 +327,14 @@ class Markdown: for ext in extensions: if isinstance(ext, basestring): ext = load_extension(ext, configs.get(ext, [])) - try: - ext.extendMarkdown(self, globals()) - except AttributeError: - message(ERROR, "Incorrect type! Extension '%s' is " - "neither a string or an Extension." %(repr(ext))) - + if isinstance(ext, Extension): + try: + ext.extendMarkdown(self, globals()) + except NotImplementedError, e: + message(ERROR, e) + else: + message(ERROR, 'Extension "%s.%s" must be of type: "markdown.Extension".' 
\ + % (ext.__class__.__module__, ext.__class__.__name__)) def registerExtension(self, extension): """ This gets called by the extension """ @@ -346,7 +348,8 @@ class Markdown: self.references.clear() for extension in self.registeredExtensions: - extension.reset() + if hasattr(extension, 'reset'): + extension.reset() def set_output_format(self, format): """ Set the output format for the class instance. """ @@ -395,7 +398,7 @@ class Markdown: root = newRoot # Serialize _properly_. Strip top-level tags. - output, length = codecs.utf_8_decode(self.serializer(root, encoding="utf8")) + output, length = codecs.utf_8_decode(self.serializer(root, encoding="utf-8")) if self.stripTopLevelTags: try: start = output.index('<%s>'%DOC_TAG)+len(DOC_TAG)+2 @@ -429,7 +432,7 @@ class Markdown: Keyword arguments: - * input: Name of source text file. + * input: File object or path of file as string. * output: Name of output file. Writes to stdout if `None`. * encoding: Encoding of input and output files. Defaults to utf-8. @@ -438,7 +441,10 @@ class Markdown: encoding = encoding or "utf-8" # Read the source - input_file = codecs.open(input, mode="r", encoding=encoding) + if isinstance(input, basestring): + input_file = codecs.open(input, mode="r", encoding=encoding) + else: + input_file = input text = input_file.read() input_file.close() text = text.lstrip(u'\ufeff') # remove the byte-order mark @@ -447,7 +453,7 @@ class Markdown: html = self.convert(text) # Write to file or stdout - if isinstance(output, (str, unicode)): + if isinstance(output, basestring): output_file = codecs.open(output, "w", encoding=encoding) output_file.write(html) output_file.close() @@ -499,7 +505,8 @@ class Extension: * md_globals: Global variables in the markdown module namespace. """ - pass + raise NotImplementedError, 'Extension "%s.%s" must define an "extendMarkdown"' \ + 'method.' 
% (self.__class__.__module__, self.__class__.__name__) def load_extension(ext_name, configs = []): @@ -540,8 +547,8 @@ def load_extension(ext_name, configs = []): # function called makeExtension() try: return module.makeExtension(configs.items()) - except AttributeError: - message(CRITICAL, "Failed to initiate extension '%s'" % ext_name) + except AttributeError, e: + message(CRITICAL, "Failed to initiate extension '%s': %s" % (ext_name, e)) def load_extensions(ext_names): @@ -582,15 +589,15 @@ def markdown(text, * "xhtml": Outputs latest supported version of XHTML (currently XHTML 1.1). * "html4": Outputs HTML 4 * "html": Outputs latest supported version of HTML (currently HTML 4). - Note that it is suggested that the more specific formats ("xhtml1" + Note that it is suggested that the more specific formats ("xhtml1" and "html4") be used as "xhtml" or "html" may change in the future - if it makes sense at that time. + if it makes sense at that time. Returns: An HTML document as a string. """ md = Markdown(extensions=load_extensions(extensions), - safe_mode=safe_mode, + safe_mode=safe_mode, output_format=output_format) return md.convert(text) @@ -602,7 +609,7 @@ def markdownFromFile(input = None, safe_mode = False, output_format = DEFAULT_OUTPUT_FORMAT): """Read markdown code from a file and write it to a file or a stream.""" - md = Markdown(extensions=load_extensions(extensions), + md = Markdown(extensions=load_extensions(extensions), safe_mode=safe_mode, output_format=output_format) md.convertFile(input, output, encoding) diff --git a/markdown/commandline.py b/markdown/commandline.py index 1eedc6d..dce2b8a 100644 --- a/markdown/commandline.py +++ b/markdown/commandline.py @@ -2,47 +2,25 @@ COMMAND-LINE SPECIFIC STUFF ============================================================================= -The rest of the code is specifically for handling the case where Python -Markdown is called from the command line. 
""" import markdown import sys +import optparse import logging from logging import DEBUG, INFO, WARN, ERROR, CRITICAL -EXECUTABLE_NAME_FOR_USAGE = "python markdown.py" -""" The name used in the usage statement displayed for python versions < 2.3. -(With python 2.3 and higher the usage statement is generated by optparse -and uses the actual name of the executable called.) """ - -OPTPARSE_WARNING = """ -Python 2.3 or higher required for advanced command line options. -For lower versions of Python use: - - %s INPUT_FILE > OUTPUT_FILE - -""" % EXECUTABLE_NAME_FOR_USAGE - def parse_options(): """ Define and parse `optparse` options for command-line usage. """ - - try: - optparse = __import__("optparse") - except: - if len(sys.argv) == 2: - return {'input': sys.argv[1], - 'output': None, - 'safe': False, - 'extensions': [], - 'encoding': None }, CRITICAL - else: - print OPTPARSE_WARNING - return None, None - - parser = optparse.OptionParser(usage="%prog INPUTFILE [options]") + usage = """%prog [options] [INPUTFILE] + (STDIN is assumed if no INPUTFILE is given)""" + desc = "A Python implementation of John Gruber's Markdown. " \ + "http://www.freewisdom.org/projects/python-markdown/" + ver = "%%prog %s" % markdown.version + + parser = optparse.OptionParser(usage=usage, description=desc, version=ver) parser.add_option("-f", "--file", dest="filename", default=sys.stdout, help="write output to OUTPUT_FILE", metavar="OUTPUT_FILE") @@ -56,10 +34,10 @@ def parse_options(): help="print info messages") parser.add_option("-s", "--safe", dest="safe", default=False, metavar="SAFE_MODE", - help="safe mode ('replace', 'remove' or 'escape' user's HTML tag)") + help="'replace', 'remove' or 'escape' HTML tags in input") parser.add_option("-o", "--output_format", dest="output_format", default='xhtml1', metavar="OUTPUT_FORMAT", - help="Format of output. 
One of 'xhtml1' (default) or 'html4'.") + help="'xhtml1' (default) or 'html4'.") parser.add_option("--noisy", action="store_const", const=DEBUG, dest="verbose", help="print debug messages") @@ -68,9 +46,8 @@ def parse_options(): (options, args) = parser.parse_args() - if not len(args) == 1: - parser.print_help() - return None, None + if len(args) == 0: + input_file = sys.stdin else: input_file = args[0] diff --git a/markdown/extensions/codehilite.py b/markdown/extensions/codehilite.py index c5d496b..b9e1760 100644 --- a/markdown/extensions/codehilite.py +++ b/markdown/extensions/codehilite.py @@ -10,9 +10,9 @@ Copyright 2006-2008 [Waylan Limberg](http://achinghead.com/). Project website: <http://www.freewisdom.org/project/python-markdown/CodeHilite> Contact: markdown@freewisdom.org - + License: BSD (see ../docs/LICENSE for details) - + Dependencies: * [Python 2.3+](http://python.org/) * [Markdown 2.0+](http://www.freewisdom.org/projects/python-markdown/) @@ -38,41 +38,45 @@ class CodeHilite: Basic Usage: >>> code = CodeHilite(src = 'some text') >>> html = code.hilite() - + * src: Source string or any object with a .readline attribute. - + * linenos: (Boolen) Turn line numbering 'on' or 'off' (off by default). * css_class: Set class name of wrapper div ('codehilite' by default). - + Low Level Usage: >>> code = CodeHilite() >>> code.src = 'some text' # String or anything with a .readline attr. >>> code.linenos = True # True or False; Turns line numbering on or of. >>> html = code.hilite() - + """ - def __init__(self, src=None, linenos=False, css_class="codehilite"): + def __init__(self, src=None, linenos=False, css_class="codehilite", + lang=None, style='default', noclasses=False): self.src = src - self.lang = None + self.lang = lang self.linenos = linenos self.css_class = css_class + self.style = style + self.noclasses = noclasses def hilite(self): """ - Pass code to the [Pygments](http://pygments.pocoo.org/) highliter with - optional line numbers. 
The output should then be styled with css to - your liking. No styles are applied by default - only styling hooks - (i.e.: <span class="k">). + Pass code to the [Pygments](http://pygments.pocoo.org/) highliter with + optional line numbers. The output should then be styled with css to + your liking. No styles are applied by default - only styling hooks + (i.e.: <span class="k">). returns : A string of html. - + """ self.src = self.src.strip('\n') - - self._getLang() + + if self.lang == None: + self._getLang() try: from pygments import highlight @@ -96,8 +100,10 @@ class CodeHilite: lexer = guess_lexer(self.src) except ValueError: lexer = TextLexer() - formatter = HtmlFormatter(linenos=self.linenos, - cssclass=self.css_class) + formatter = HtmlFormatter(linenos=self.linenos, + cssclass=self.css_class, + style=self.style, + noclasses=self.noclasses) return highlight(self.src, lexer, formatter) def _escape(self, txt): @@ -114,8 +120,8 @@ class CodeHilite: txt = txt.replace('\t', ' '*TAB_LENGTH) txt = txt.replace(" "*4, " ") txt = txt.replace(" "*3, " ") - txt = txt.replace(" "*2, " ") - + txt = txt.replace(" "*2, " ") + # Add line numbers lines = txt.splitlines() txt = '<div class="codehilite"><pre><ol>\n' @@ -126,31 +132,31 @@ class CodeHilite: def _getLang(self): - """ + """ Determines language of a code block from shebang lines and whether said line should be removed or left in place. If the sheband line contains a path (even a single /) then it is assumed to be a real shebang lines and - left alone. However, if no path is given (e.i.: #!python or :::python) + left alone. However, if no path is given (e.i.: #!python or :::python) then it is assumed to be a mock shebang for language identifitation of a - code fragment and removed from the code block prior to processing for - code highlighting. When a mock shebang (e.i: #!python) is found, line - numbering is turned on. 
When colons are found in place of a shebang - (e.i.: :::python), line numbering is left in the current state - off + code fragment and removed from the code block prior to processing for + code highlighting. When a mock shebang (e.i: #!python) is found, line + numbering is turned on. When colons are found in place of a shebang + (e.i.: :::python), line numbering is left in the current state - off by default. - + """ import re - + #split text into lines lines = self.src.split("\n") #pull first line to examine fl = lines.pop(0) - + c = re.compile(r''' (?:(?:::+)|(?P<shebang>[#]!)) # Shebang or 2 or more colons. - (?P<path>(?:/\w+)*[/ ])? # Zero or 1 path - (?P<lang>[\w+-]*) # The language + (?P<path>(?:/\w+)*[/ ])? # Zero or 1 path + (?P<lang>[\w+-]*) # The language ''', re.VERBOSE) # search first line for shebang m = c.search(fl) @@ -169,7 +175,7 @@ class CodeHilite: else: # No match lines.insert(0, fl) - + self.src = "\n".join(lines).strip("\n") @@ -184,14 +190,16 @@ class HiliteTreeprocessor(markdown.treeprocessors.Treeprocessor): for block in blocks: children = block.getchildren() if len(children) == 1 and children[0].tag == 'code': - code = CodeHilite(children[0].text, + code = CodeHilite(children[0].text, linenos=self.config['force_linenos'][0], - css_class=self.config['css_class'][0]) - placeholder = self.markdown.htmlStash.store(code.hilite(), + css_class=self.config['css_class'][0], + style=self.config['pygments_style'][0], + noclasses=self.config['noclasses'][0]) + placeholder = self.markdown.htmlStash.store(code.hilite(), safe=True) # Clear codeblock in etree instance block.clear() - # Change to p element which will later + # Change to p element which will later # be removed when inserting raw html block.tag = 'p' block.text = placeholder @@ -204,19 +212,23 @@ class CodeHiliteExtension(markdown.Extension): # define default configs self.config = { 'force_linenos' : [False, "Force line numbers - Default: False"], - 'css_class' : ["codehilite", + 'css_class' 
: ["codehilite", "Set class name for wrapper <div> - Default: codehilite"], + 'pygments_style' : ['tango', 'Pygments HTML Formatter Style (Colorscheme) - Default: tango'], + 'noclasses': [False, 'Use inline styles instead of CSS classes - Default false'] } - + # Override defaults with user settings for key, value in configs: - self.setConfig(key, value) + self.setConfig(key, value) def extendMarkdown(self, md, md_globals): """ Add HilitePostprocessor to Markdown instance. """ hiliter = HiliteTreeprocessor(md) hiliter.config = self.config - md.treeprocessors.add("hilite", hiliter, "_begin") + md.treeprocessors.add("hilite", hiliter, "_begin") + + md.registerExtension(self) def makeExtension(configs={}): diff --git a/markdown/extensions/extra.py b/markdown/extensions/extra.py index 4a2ffbf..e569029 100644 --- a/markdown/extensions/extra.py +++ b/markdown/extensions/extra.py @@ -44,6 +44,8 @@ class ExtraExtension(markdown.Extension): def extendMarkdown(self, md, md_globals): """ Register extension instances. """ md.registerExtensions(extensions, self.config) + # Turn on processing of markdown text within raw html + md.preprocessors['html_block'].markdown_in_raw = True def makeExtension(configs={}): return ExtraExtension(configs=dict(configs)) diff --git a/markdown/extensions/fenced_code.py b/markdown/extensions/fenced_code.py index 307b1dc..2b03bbc 100644 --- a/markdown/extensions/fenced_code.py +++ b/markdown/extensions/fenced_code.py @@ -9,7 +9,7 @@ This extension adds Fenced Code Blocks to Python-Markdown. >>> import markdown >>> text = ''' ... A paragraph before a fenced code block: - ... + ... ... ~~~ ... Fenced code block ... 
~~~ @@ -22,14 +22,14 @@ Works with safe_mode also (we check this because we are using the HtmlStash): >>> markdown.markdown(text, extensions=['fenced_code'], safe_mode='replace') u'<p>A paragraph before a fenced code block:</p>\\n<pre><code>Fenced code block\\n</code></pre>' - + Include tilde's in a code block and wrap with blank lines: >>> text = ''' ... ~~~~~~~~ - ... + ... ... ~~~~ - ... + ... ... ~~~~~~~~''' >>> markdown.markdown(text, extensions=['fenced_code']) u'<pre><code>\\n~~~~\\n\\n</code></pre>' @@ -40,7 +40,7 @@ Multiple blocks and language tags: ... ~~~~{.python} ... block one ... ~~~~ - ... + ... ... ~~~~.html ... <p>block two</p> ... ~~~~''' @@ -52,39 +52,63 @@ Copyright 2007-2008 [Waylan Limberg](http://achinghead.com/). Project website: <http://www.freewisdom.org/project/python-markdown/Fenced__Code__Blocks> Contact: markdown@freewisdom.org -License: BSD (see ../docs/LICENSE for details) +License: BSD (see ../docs/LICENSE for details) Dependencies: -* [Python 2.3+](http://python.org) +* [Python 2.4+](http://python.org) * [Markdown 2.0+](http://www.freewisdom.org/projects/python-markdown/) +* [Pygments (optional)](http://pygments.org) """ import markdown, re +from markdown.extensions.codehilite import CodeHilite, CodeHiliteExtension # Global vars FENCED_BLOCK_RE = re.compile( \ - r'(?P<fence>^~{3,})[ ]*(\{?\.(?P<lang>[a-zA-Z0-9_-]*)\}?)?[ ]*\n(?P<code>.*?)(?P=fence)[ ]*$', + r'(?P<fence>^~{3,})[ ]*(\{?\.(?P<lang>[a-zA-Z0-9_-]*)\}?)?[ ]*\n(?P<code>.*?)(?P=fence)[ ]*$', re.MULTILINE|re.DOTALL ) CODE_WRAP = '<pre><code%s>%s</code></pre>' LANG_TAG = ' class="%s"' - class FencedCodeExtension(markdown.Extension): def extendMarkdown(self, md, md_globals): """ Add FencedBlockPreprocessor to the Markdown instance. 
""" + md.registerExtension(self) - md.preprocessors.add('fenced_code_block', - FencedBlockPreprocessor(md), + md.preprocessors.add('fenced_code_block', + FencedBlockPreprocessor(md), "_begin") class FencedBlockPreprocessor(markdown.preprocessors.Preprocessor): - + + def __init__(self, md): + markdown.preprocessors.Preprocessor.__init__(self, md) + + self.checked_for_codehilite = False + self.codehilite_conf = {} + + def getConfig(self, key): + if key in self.config: + return self.config[key][0] + else: + return None + def run(self, lines): """ Match and store Fenced Code Blocks in the HtmlStash. """ + + # Check for code hilite extension + if not self.checked_for_codehilite: + for ext in self.markdown.registeredExtensions: + if isinstance(ext, CodeHiliteExtension): + self.codehilite_conf = ext.config + break + + self.checked_for_codehilite = True + text = "\n".join(lines) while 1: m = FENCED_BLOCK_RE.search(text) @@ -92,7 +116,21 @@ class FencedBlockPreprocessor(markdown.preprocessors.Preprocessor): lang = '' if m.group('lang'): lang = LANG_TAG % m.group('lang') - code = CODE_WRAP % (lang, self._escape(m.group('code'))) + + # If config is not empty, then the codehighlite extension + # is enabled, so we call it to highlite the code + if self.codehilite_conf: + highliter = CodeHilite(m.group('code'), + linenos=self.codehilite_conf['force_linenos'][0], + css_class=self.codehilite_conf['css_class'][0], + style=self.codehilite_conf['pygments_style'][0], + lang=(m.group('lang') if m.group('lang') else None), + noclasses=self.codehilite_conf['noclasses'][0]) + + code = highliter.hilite() + else: + code = CODE_WRAP % (lang, self._escape(m.group('code'))) + placeholder = self.markdown.htmlStash.store(code, safe=True) text = '%s\n%s\n%s'% (text[:m.start()], placeholder, text[m.end():]) else: @@ -109,7 +147,7 @@ class FencedBlockPreprocessor(markdown.preprocessors.Preprocessor): def makeExtension(configs=None): - return FencedCodeExtension() + return 
FencedCodeExtension(configs=configs) if __name__ == "__main__": diff --git a/markdown/extensions/footnotes.py b/markdown/extensions/footnotes.py index 6dacab7..e1a9cda 100644 --- a/markdown/extensions/footnotes.py +++ b/markdown/extensions/footnotes.py @@ -38,11 +38,18 @@ class FootnoteExtension(markdown.Extension): """ Setup configs. """ self.config = {'PLACE_MARKER': ["///Footnotes Go Here///", - "The text string that marks where the footnotes go"]} + "The text string that marks where the footnotes go"], + 'UNIQUE_IDS': + [False, + "Avoid name collisions across " + "multiple calls to reset()."]} for key, value in configs: self.config[key][0] = value - + + # In multiple invocations, emit links that don't get tangled. + self.unique_prefix = 0 + self.reset() def extendMarkdown(self, md, md_globals): @@ -66,8 +73,9 @@ class FootnoteExtension(markdown.Extension): ">amp_substitute") def reset(self): - """ Clear the footnotes on reset. """ + """ Clear the footnotes on reset, and prepare for a distinct document. """ self.footnotes = markdown.odict.OrderedDict() + self.unique_prefix += 1 def findFootnotesPlaceholder(self, root): """ Return ElementTree Element that contains Footnote placeholder. """ @@ -91,11 +99,17 @@ class FootnoteExtension(markdown.Extension): def makeFootnoteId(self, id): """ Return footnote link id. """ - return 'fn:%s' % id + if self.getConfig("UNIQUE_IDS"): + return 'fn:%d-%s' % (self.unique_prefix, id) + else: + return 'fn:%s' % id def makeFootnoteRefId(self, id): """ Return footnote back-link id. """ - return 'fnref:%s' % id + if self.getConfig("UNIQUE_IDS"): + return 'fnref:%d-%s' % (self.unique_prefix, id) + else: + return 'fnref:%s' % id def makeFootnotesDiv(self, root): """ Return div of footnotes as et Element. 
""" diff --git a/markdown/extensions/toc.py b/markdown/extensions/toc.py index 1624ccf..fd2a86a 100644 --- a/markdown/extensions/toc.py +++ b/markdown/extensions/toc.py @@ -60,13 +60,9 @@ class TocTreeprocessor(markdown.treeprocessors.Treeprocessor): if header_rgx.match(c.tag): tag_level = int(c.tag[-1]) - # Regardless of how many levels we jumped - # only one list should be created, since - # empty lists containing lists are illegal. - - if tag_level < level: + while tag_level < level: list_stack.pop() - level = tag_level + level -= 1 if tag_level > level: newlist = etree.Element("ul") @@ -75,7 +71,10 @@ class TocTreeprocessor(markdown.treeprocessors.Treeprocessor): else: list_stack[-1].append(newlist) list_stack.append(newlist) - level = tag_level + if level == 0: + level = tag_level + else: + level += 1 # Do not override pre-existing ids if not "id" in c.attrib: diff --git a/markdown/inlinepatterns.py b/markdown/inlinepatterns.py index 331bead..917a9d3 100644 --- a/markdown/inlinepatterns.py +++ b/markdown/inlinepatterns.py @@ -69,7 +69,7 @@ STRONG_RE = r'(\*{2}|_{2})(.+?)\2' # **strong** STRONG_EM_RE = r'(\*{3}|_{3})(.+?)\2' # ***strong*** if markdown.SMART_EMPHASIS: - EMPHASIS_2_RE = r'(?<!\S)(_)(\S.+?)\2' # _emphasis_ + EMPHASIS_2_RE = r'(?<!\w)(_)(\S.+?)\2(?!\w)' # _emphasis_ else: EMPHASIS_2_RE = r'(_)(.+?)\2' # _emphasis_ diff --git a/markdown/preprocessors.py b/markdown/preprocessors.py index ef04cab..b199f0a 100644 --- a/markdown/preprocessors.py +++ b/markdown/preprocessors.py @@ -77,20 +77,53 @@ class HtmlBlockPreprocessor(Preprocessor): """Remove html blocks from the text and store them for later retrieval.""" right_tag_patterns = ["</%s>", "%s>"] + attrs_pattern = r""" + \s+(?P<attr>[^>"'/= ]+)=(?P<q>['"])(?P<value>.*?)(?P=q) # attr="value" + | # OR + \s+(?P<attr1>[^>"'/= ]+)=(?P<value1>[^> ]+) # attr=value + | # OR + \s+(?P<attr2>[^>"'/= ]+) # attr + """ + left_tag_pattern = r'^\<(?P<tag>[^> ]+)(?P<attrs>(%s)*)\s*\/?\>?' 
% attrs_pattern + attrs_re = re.compile(attrs_pattern, re.VERBOSE) + left_tag_re = re.compile(left_tag_pattern, re.VERBOSE) + markdown_in_raw = False def _get_left_tag(self, block): - return block[1:].replace(">", " ", 1).split()[0].lower() + m = self.left_tag_re.match(block) + if m: + tag = m.group('tag') + raw_attrs = m.group('attrs') + attrs = {} + if raw_attrs: + for ma in self.attrs_re.finditer(raw_attrs): + if ma.group('attr'): + if ma.group('value'): + attrs[ma.group('attr').strip()] = ma.group('value') + else: + attrs[ma.group('attr').strip()] = "" + elif ma.group('attr1'): + if ma.group('value1'): + attrs[ma.group('attr1').strip()] = ma.group('value1') + else: + attrs[ma.group('attr1').strip()] = "" + elif ma.group('attr2'): + attrs[ma.group('attr2').strip()] = "" + return tag, len(m.group(0)), attrs + else: + tag = block[1:].replace(">", " ", 1).split()[0].lower() + return tag, len(tag+2), {} - def _get_right_tag(self, left_tag, block): + def _get_right_tag(self, left_tag, left_index, block): for p in self.right_tag_patterns: tag = p % left_tag i = block.rfind(tag) if i > 2: - return tag.lstrip("<").rstrip(">"), i + len(p)-2 + len(left_tag) - return block.rstrip()[-len(left_tag)-2:-1].lower(), len(block) + return tag.lstrip("<").rstrip(">"), i + len(p)-2 + left_index-2 + return block.rstrip()[-left_index:-1].lower(), len(block) def _equal_tags(self, left_tag, right_tag): - if left_tag == 'div' or left_tag[0] in ['?', '@', '%']: # handle PHP, etc. + if left_tag[0] in ['?', '@', '%']: # handle PHP, etc. 
return True if ("/" + left_tag) == right_tag: return True @@ -113,7 +146,7 @@ class HtmlBlockPreprocessor(Preprocessor): left_tag = '' right_tag = '' in_tag = False # flag - + while text: block = text[0] if block.startswith("\n"): @@ -125,13 +158,17 @@ class HtmlBlockPreprocessor(Preprocessor): if not in_tag: if block.startswith("<"): - left_tag = self._get_left_tag(block) - right_tag, data_index = self._get_right_tag(left_tag, block) + left_tag, left_index, attrs = self._get_left_tag(block) + right_tag, data_index = self._get_right_tag(left_tag, + left_index, + block) if block[1] == "!": # is a comment block left_tag = "--" - right_tag, data_index = self._get_right_tag(left_tag, block) + right_tag, data_index = self._get_right_tag(left_tag, + left_index, + block) # keep checking conditions below and maybe just append if data_index < len(block) \ @@ -147,13 +184,24 @@ class HtmlBlockPreprocessor(Preprocessor): if self._is_oneliner(left_tag): new_blocks.append(block.strip()) continue - + if block.rstrip().endswith(">") \ and self._equal_tags(left_tag, right_tag): - new_blocks.append( - self.markdown.htmlStash.store(block.strip())) + if self.markdown_in_raw and 'markdown' in attrs.keys(): + start = re.sub(r'\smarkdown(=[\'"]?[^> ]*[\'"]?)?', + '', block[:left_index]) + end = block[-len(right_tag)-2:] + block = block[left_index:-len(right_tag)-2] + new_blocks.append( + self.markdown.htmlStash.store(start)) + new_blocks.append(block) + new_blocks.append( + self.markdown.htmlStash.store(end)) + else: + new_blocks.append( + self.markdown.htmlStash.store(block.strip())) continue - else: #if not block[1] == "!": + else: # if is block level tag and is not complete if markdown.isBlockLevel(left_tag) or left_tag == "--" \ @@ -169,19 +217,47 @@ class HtmlBlockPreprocessor(Preprocessor): new_blocks.append(block) else: - items.append(block.strip()) + items.append(block) - right_tag, data_index = self._get_right_tag(left_tag, block) + right_tag, data_index = 
self._get_right_tag(left_tag, + left_index, + block) if self._equal_tags(left_tag, right_tag): # if find closing tag in_tag = False - new_blocks.append( - self.markdown.htmlStash.store('\n\n'.join(items))) + if self.markdown_in_raw and 'markdown' in attrs.keys(): + start = re.sub(r'\smarkdown(=[\'"]?[^> ]*[\'"]?)?', + '', items[0][:left_index]) + items[0] = items[0][left_index:] + end = items[-1][-len(right_tag)-2:] + items[-1] = items[-1][:-len(right_tag)-2] + new_blocks.append( + self.markdown.htmlStash.store(start)) + new_blocks.extend(items) + new_blocks.append( + self.markdown.htmlStash.store(end)) + else: + new_blocks.append( + self.markdown.htmlStash.store('\n\n'.join(items))) items = [] if items: - new_blocks.append(self.markdown.htmlStash.store('\n\n'.join(items))) + if self.markdown_in_raw and 'markdown' in attrs.keys(): + start = re.sub(r'\smarkdown(=[\'"]?[^> ]*[\'"]?)?', + '', items[0][:left_index]) + items[0] = items[0][left_index:] + end = items[-1][-len(right_tag)-2:] + items[-1] = items[-1][:-len(right_tag)-2] + new_blocks.append( + self.markdown.htmlStash.store(start)) + new_blocks.extend(items) + new_blocks.append( + self.markdown.htmlStash.store(end)) + else: + new_blocks.append( + self.markdown.htmlStash.store('\n\n'.join(items))) + #new_blocks.append(self.markdown.htmlStash.store('\n\n'.join(items))) new_blocks.append('\n') new_text = "\n\n".join(new_blocks) diff --git a/markdown/tests/misc/div.html b/markdown/tests/misc/div.html index 7cd0d6d..7b68854 100644 --- a/markdown/tests/misc/div.html +++ b/markdown/tests/misc/div.html @@ -1,4 +1,5 @@ <div id="sidebar"> -<p><em>foo</em></p> + _foo_ + </div>
\ No newline at end of file diff --git a/markdown/tests/misc/html.html b/markdown/tests/misc/html.html index 81ac5ee..cd6d4af 100644 --- a/markdown/tests/misc/html.html +++ b/markdown/tests/misc/html.html @@ -5,5 +5,9 @@ <p>Now some <arbitrary>arbitrary tags</arbitrary>.</p> <div>More block level html.</div> +<div class="foo bar" title="with 'quoted' text." valueless_attr weirdness="<i>foo</i>"> +Html with various attributes. +</div> + <p>And of course <script>blah</script>.</p> <p><a href="script>stuff</script">this <script>link</a></p>
\ No newline at end of file diff --git a/markdown/tests/misc/html.txt b/markdown/tests/misc/html.txt index 3ac3ae0..c08fe1d 100644 --- a/markdown/tests/misc/html.txt +++ b/markdown/tests/misc/html.txt @@ -7,6 +7,10 @@ Now some <arbitrary>arbitrary tags</arbitrary>. <div>More block level html.</div> +<div class="foo bar" title="with 'quoted' text." valueless_attr weirdness="<i>foo</i>"> +Html with various attributes. +</div> + And of course <script>blah</script>. [this <script>link](<script>stuff</script>) diff --git a/markdown/tests/misc/multi-line-tags.html b/markdown/tests/misc/multi-line-tags.html index 763a050..784c1dd 100644 --- a/markdown/tests/misc/multi-line-tags.html +++ b/markdown/tests/misc/multi-line-tags.html @@ -1,4 +1,5 @@ <div> -<p>asdf asdfasd</p> +asdf asdfasd + </div>
\ No newline at end of file diff --git a/markdown/tests/misc/multiline-comments.html b/markdown/tests/misc/multiline-comments.html index 547ba0b..12f8cb5 100644 --- a/markdown/tests/misc/multiline-comments.html +++ b/markdown/tests/misc/multiline-comments.html @@ -2,7 +2,7 @@ foo ---> +--> <p> @@ -12,5 +12,6 @@ foo <div> -<p>foo</p> +foo + </div>
\ No newline at end of file diff --git a/markdown/treeprocessors.py b/markdown/treeprocessors.py index 1dc612a..ca3b02b 100644 --- a/markdown/treeprocessors.py +++ b/markdown/treeprocessors.py @@ -275,24 +275,25 @@ class InlineProcessor(Treeprocessor): if child.getchildren(): stack.append(child) - for element, lst in insertQueue: - if element.text: - element.text = \ - markdown.inlinepatterns.handleAttributes(element.text, - element) - i = 0 - for newChild in lst: - # Processing attributes - if newChild.tail: - newChild.tail = \ - markdown.inlinepatterns.handleAttributes(newChild.tail, + if markdown.ENABLE_ATTRIBUTES: + for element, lst in insertQueue: + if element.text: + element.text = \ + markdown.inlinepatterns.handleAttributes(element.text, element) - if newChild.text: - newChild.text = \ - markdown.inlinepatterns.handleAttributes(newChild.text, - newChild) - element.insert(i, newChild) - i += 1 + i = 0 + for newChild in lst: + # Processing attributes + if newChild.tail: + newChild.tail = \ + markdown.inlinepatterns.handleAttributes(newChild.tail, + element) + if newChild.text: + newChild.text = \ + markdown.inlinepatterns.handleAttributes(newChild.text, + newChild) + element.insert(i, newChild) + i += 1 return tree @@ -3,7 +3,8 @@ import sys, os from distutils.core import setup from distutils.command.install_scripts import install_scripts -from markdown import version + +version = '2.0.3' class md_install_scripts(install_scripts): """ Customized install_scripts. Create markdown.bat for win32. 
""" @@ -23,39 +24,44 @@ class md_install_scripts(install_scripts): except Exception, e: print 'ERROR: Unable to create %s: %s' % (bat_path, e) -setup( - name = 'Markdown', - version = version, - url = 'http://www.freewisdom.org/projects/python-markdown', - download_url = 'http://pypi.python.org/packages/source/M/Markdown/Markdown-%s.tar.gz'%version, - description = "Python implementation of Markdown.", - author = "Manfred Stienstra and Yuri takhteyev", - author_email = "yuri [at] freewisdom.org", - maintainer = "Waylan Limberg", - maintainer_email = "waylan [at] gmail.com", - license = "BSD License", - packages = ['markdown', 'markdown.extensions', 'markdown.tests'], - scripts = ['bin/markdown'], - package_data = {'': ['tests/*/*.txt', 'tests/*/*.html', 'tests/*/*.cfg', +data = dict( + name = 'Markdown', + version = version, + url = 'http://www.freewisdom.org/projects/python-markdown', + download_url = 'http://pypi.python.org/packages/source/M/Markdown/Markdown-%s.tar.gz' % version, + description = 'Python implementation of Markdown.', + author = 'Manfred Stienstra and Yuri takhteyev', + author_email = 'yuri [at] freewisdom.org', + maintainer = 'Waylan Limberg', + maintainer_email = 'waylan [at] gmail.com', + license = 'BSD License', + packages = ['markdown', 'markdown.extensions', 'markdown.tests'], + package_data = {'': ['tests/*/*.txt', 'tests/*/*.html', 'tests/*/*.cfg', 'tests/*/*/*.txt', 'tests/*/*/*.html', 'tests/*/*/*.cfg']}, - cmdclass = {'install_scripts': md_install_scripts}, - classifiers = ['Development Status :: 5 - Production/Stable', - 'License :: OSI Approved :: BSD License', - 'Operating System :: OS Independent', - 'Programming Language :: Python', - 'Programming Language :: Python :: 2', - 'Programming Language :: Python :: 2.3', - 'Programming Language :: Python :: 2.4', - 'Programming Language :: Python :: 2.5', - 'Programming Language :: Python :: 2.6', - 'Programming Language :: Python :: 3', - 'Programming Language :: Python :: 3.0', - 
'Topic :: Communications :: Email :: Filters', - 'Topic :: Internet :: WWW/HTTP :: Dynamic Content :: CGI Tools/Libraries', - 'Topic :: Internet :: WWW/HTTP :: Site Management', - 'Topic :: Software Development :: Documentation', - 'Topic :: Software Development :: Libraries :: Python Modules', - 'Topic :: Text Processing :: Filters', - 'Topic :: Text Processing :: Markup :: HTML', - ], + scripts = ['bin/markdown'], + cmdclass = {'install_scripts': md_install_scripts}, + classifiers = ['Development Status :: 5 - Production/Stable', + 'License :: OSI Approved :: BSD License', + 'Operating System :: OS Independent', + 'Programming Language :: Python', + 'Programming Language :: Python :: 2', + 'Programming Language :: Python :: 2.3', + 'Programming Language :: Python :: 2.4', + 'Programming Language :: Python :: 2.5', + 'Programming Language :: Python :: 2.6', + 'Programming Language :: Python :: 3', + 'Programming Language :: Python :: 3.0', + 'Topic :: Communications :: Email :: Filters', + 'Topic :: Internet :: WWW/HTTP :: Dynamic Content :: CGI Tools/Libraries', + 'Topic :: Internet :: WWW/HTTP :: Site Management', + 'Topic :: Software Development :: Documentation', + 'Topic :: Software Development :: Libraries :: Python Modules', + 'Topic :: Text Processing :: Filters', + 'Topic :: Text Processing :: Markup :: HTML', + ], ) + +if sys.version[:3] < '2.5': + data['install_requires'] = ['elementtree'] + +setup(**data) diff --git a/tests/extensions-x-extra/raw-html.html b/tests/extensions-x-extra/raw-html.html new file mode 100644 index 0000000..b2a7c4d --- /dev/null +++ b/tests/extensions-x-extra/raw-html.html @@ -0,0 +1,14 @@ +<div> + +<p><em>foo</em></p> +</div> + +<div class="baz"> + +<p><em>bar</em></p> +</div> + +<div> + +<p><em>blah</em></p> +</div>
\ No newline at end of file diff --git a/tests/extensions-x-extra/raw-html.txt b/tests/extensions-x-extra/raw-html.txt new file mode 100644 index 0000000..284fe0c --- /dev/null +++ b/tests/extensions-x-extra/raw-html.txt @@ -0,0 +1,12 @@ +<div markdown="1">_foo_</div> + +<div markdown=1 class="baz"> +_bar_ +</div> + +<div markdown> + +_blah_ + +</div> + diff --git a/tests/extensions-x-toc/nested.html b/tests/extensions-x-toc/nested.html new file mode 100644 index 0000000..a8a1583 --- /dev/null +++ b/tests/extensions-x-toc/nested.html @@ -0,0 +1,16 @@ +<h1 id="header-a">Header A</h1> +<h2 id="header-1">Header 1</h2> +<h3 id="header-i">Header i</h3> +<h1 id="header-b">Header B</h1> +<div class="toc"> +<ul> +<li><a href="#header-a">Header A</a><ul> +<li><a href="#header-1">Header 1</a><ul> +<li><a href="#header-i">Header i</a></li> +</ul> +</li> +</ul> +</li> +<li><a href="#header-b">Header B</a></li> +</ul> +</div>
\ No newline at end of file diff --git a/tests/extensions-x-toc/nested.txt b/tests/extensions-x-toc/nested.txt new file mode 100644 index 0000000..9b515f9 --- /dev/null +++ b/tests/extensions-x-toc/nested.txt @@ -0,0 +1,9 @@ +# Header A + +## Header 1 + +### Header i + +# Header B + +[TOC] diff --git a/tests/extensions-x-toc/nested2.html b/tests/extensions-x-toc/nested2.html new file mode 100644 index 0000000..bf87716 --- /dev/null +++ b/tests/extensions-x-toc/nested2.html @@ -0,0 +1,14 @@ +<div class="toc"> +<ul> +<li><a href="#start-with-header-other-than-one">Start with header other than one.</a></li> +<li><a href="#header-3">Header 3</a><ul> +<li><a href="#header-4">Header 4</a></li> +</ul> +</li> +<li><a href="#header-3_1">Header 3</a></li> +</ul> +</div> +<h3 id="start-with-header-other-than-one">Start with header other than one.</h3> +<h3 id="header-3">Header 3</h3> +<h4 id="header-4">Header 4</h4> +<h3 id="header-3_1">Header 3</h3>
\ No newline at end of file diff --git a/tests/extensions-x-toc/nested2.txt b/tests/extensions-x-toc/nested2.txt new file mode 100644 index 0000000..9db4d8c --- /dev/null +++ b/tests/extensions-x-toc/nested2.txt @@ -0,0 +1,10 @@ +[TOC] + +### Start with header other than one. + +### Header 3 + +#### Header 4 + +### Header 3 + diff --git a/tests/misc/raw_whitespace.html b/tests/misc/raw_whitespace.html new file mode 100644 index 0000000..7a6f131 --- /dev/null +++ b/tests/misc/raw_whitespace.html @@ -0,0 +1,8 @@ +<p>Preserve whitespace in raw html</p> +<pre> +class Foo(): + bar = 'bar' + + def baz(self): + print self.bar +</pre>
\ No newline at end of file diff --git a/tests/misc/raw_whitespace.txt b/tests/misc/raw_whitespace.txt new file mode 100644 index 0000000..bbc7cec --- /dev/null +++ b/tests/misc/raw_whitespace.txt @@ -0,0 +1,10 @@ +Preserve whitespace in raw html + +<pre> +class Foo(): + bar = 'bar' + + def baz(self): + print self.bar +</pre> + diff --git a/tests/misc/smart_em.html b/tests/misc/smart_em.html new file mode 100644 index 0000000..5683b25 --- /dev/null +++ b/tests/misc/smart_em.html @@ -0,0 +1,5 @@ +<p><em>emphasis</em></p> +<p>this_is_not_emphasis</p> +<p>[<em>punctuation with emphasis</em>]</p> +<p>[<em>punctuation_with_emphasis</em>]</p> +<p>[punctuation_without_emphasis]</p>
\ No newline at end of file diff --git a/tests/misc/smart_em.txt b/tests/misc/smart_em.txt new file mode 100644 index 0000000..3c56842 --- /dev/null +++ b/tests/misc/smart_em.txt @@ -0,0 +1,9 @@ +_emphasis_ + +this_is_not_emphasis + +[_punctuation with emphasis_] + +[_punctuation_with_emphasis_] + +[punctuation_without_emphasis]
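
A minimal sketch of how the `markdown` attribute is stripped from an opening tag in the `HtmlBlockPreprocessor` changes in this diff. The regular expression is copied from the diff. The names `MARKDOWN_ATTR_RE` and `strip_markdown_attr` are hypothetical, introduced only for illustration, and the sample tags are the opening tags from `tests/extensions-x-extra/raw-html.txt`.

```python
import re

# Pattern copied from the diff: one whitespace character, the word "markdown",
# and an optional =value part that may be quoted or unquoted.
MARKDOWN_ATTR_RE = re.compile(r'\smarkdown(=[\'"]?[^> ]*[\'"]?)?')


def strip_markdown_attr(opening_tag):
    """Hypothetical helper: remove every markdown attribute from a tag string."""
    return MARKDOWN_ATTR_RE.sub('', opening_tag)


# Opening tags taken from tests/extensions-x-extra/raw-html.txt.
samples = ['<div markdown="1">', '<div markdown=1 class="baz">', '<div markdown>']
results = [strip_markdown_attr(tag) for tag in samples]
```

All three sample tags lose the attribute and keep the rest of the tag intact. The pattern does not require a word boundary after `markdown`, so an attribute whose name merely begins with `markdown` would also be clipped; whether that matters depends on the input.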