path: root/markdown/preprocessors.py
Commit message | Author | Age | Files | Lines
* Move isBlockLevel to class. (#693)Waylan Limberg2018-07-311-3/+3
| | | | | Allows users and/or extensions to alter the list of block level elements. The old implementation remains with a DeprecationWarning. Fixes #575.
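A minimal sketch of the idea, using a stand-in class rather than the library's own code: the list of block-level tags lives on the instance, so a user or extension can alter it before a tag is checked. The names `MiniMarkdown`, `block_level_elements`, and `is_block_level` are assumptions chosen for illustration.

```python
class MiniMarkdown:
    """Stand-in for a Markdown-like class (hypothetical, for illustration)."""

    def __init__(self):
        # Per-instance list, so changes do not leak into other instances.
        self.block_level_elements = ["p", "div", "blockquote", "pre", "table"]

    def is_block_level(self, tag):
        """Return True if `tag` names a block-level element."""
        if isinstance(tag, str):
            return tag.lower().rstrip("/") in self.block_level_elements
        # Non-string tags are never treated as block level in this sketch.
        return False


md = MiniMarkdown()
before = md.is_block_level("custom-block")
md.block_level_elements.append("custom-block")  # user or extension alters the list
after = md.is_block_level("custom-block")
```

Because the list is an instance attribute in this sketch, a second `MiniMarkdown()` is unaffected by the change.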
* Consistent copyright headers.Waylan Limberg2018-07-271-0/+20
| | | | Fixes #435.
* All Markdown instances are now 'md'. (#691)Waylan Limberg2018-07-271-21/+21
| | | | | | | | | | | | Previously, instances of the Markdown class were represented as any one of 'md', 'md_instance', or 'markdown'. This inconsistency made it difficult when developing extensions, or just maintaining the existing code. Now, all instances are consistently represented as 'md'. The old attributes on class instances still exist, but raise a DeprecationWarning when accessed. Also on classes where the instance was optional, the attribute always exists now and is simply None if no instance was provided (previously the attribute wouldn't exist).
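One common way to keep an old attribute name alive while steering callers to the new one is a property that warns on access. The sketch below is an assumption about the general technique, not the library's actual code; `MiniProcessor` and the old name `markdown` are illustrative.

```python
import warnings


class MiniProcessor:
    """Hypothetical processor that holds an optional Markdown instance."""

    def __init__(self, md=None):
        # The attribute always exists; it is None when no instance is given.
        self.md = md

    @property
    def markdown(self):
        """Old spelling, kept only for backward compatibility."""
        warnings.warn(
            "'markdown' is deprecated; use 'md' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.md


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = MiniProcessor().markdown  # old name still works, but warns
```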
* Replace homegrown OrderedDict with purpose-built Registry. (#688)Waylan Limberg2018-07-271-5/+4
| | | | | | | | | | | | | | | | | | | All processors and patterns now get "registered" to a Registry. Each item is given a name (string) and a priority. The name is for later reference and the priority can be either an integer or float and is used to sort. Priority is sorted from highest to lowest. A Registry instance is a list-like iterable with the items auto-sorted by priority. If two items have the same priority, then they are listed in the order they were "registered". Registering a new item with the same name as an already registered item replaces the old item with the new item (however, the new item is sorted by its newly assigned priority). To remove an item, "deregister" it by name or index. A backwards compatible shim is included so that existing simple extensions should continue to work. DeprecationWarnings will be raised for any code which calls the old API. Fixes #418.
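The registration rules described in this commit can be sketched with a toy class. This is an illustrative stand-in under stated assumptions, not the library's Registry: the class name `MiniRegistry` and its internal tuple layout are hypothetical.

```python
class MiniRegistry:
    """Toy priority registry (illustrative only, not the library's class)."""

    def __init__(self):
        self._items = []   # entries are (priority, sequence, name, item)
        self._counter = 0  # registration order, used to break priority ties

    def register(self, item, name, priority):
        # Registering an existing name replaces the old entry.
        self._items = [e for e in self._items if e[2] != name]
        self._items.append((priority, self._counter, name, item))
        self._counter += 1
        # Highest priority first; ties keep registration order.
        self._items.sort(key=lambda e: (-e[0], e[1]))

    def deregister(self, name_or_index):
        if isinstance(name_or_index, int):
            del self._items[name_or_index]
            return
        remaining = [e for e in self._items if e[2] != name_or_index]
        if len(remaining) == len(self._items):
            raise KeyError(name_or_index)
        self._items = remaining

    def __iter__(self):
        return (e[3] for e in self._items)

    def __len__(self):
        return len(self._items)


reg = MiniRegistry()
reg.register("A", "a", 20)
reg.register("B", "b", 30)
reg.register("C", "c", 20)
order_initial = list(reg)        # highest priority first, ties in order
reg.register("A2", "a", 5)       # same name: replaces "A", re-sorted
order_replaced = list(reg)
reg.deregister("b")              # remove by name
reg.deregister(0)                # remove by index
order_final = list(reg)
```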
* Correct spelling mistakes.Edward Betts2018-01-131-1/+1
|
* Removed deprecated safe_mode.Waylan Limberg2018-01-111-2/+1
|
* Fix raw html reference issue (#585)Isaac Muse2018-01-041-0/+3
| | | | | | | | | | | | | | Preserve the line which a reference was on to prevent raw HTML indexing issue. Fixes #584. Prevent raw HTML parsing issue in abbr and footnotes. Preserve abbreviation line when stripping and preserve a line for each footnote block. Footnotes should also accumulate the extraneous padding. Test extra lines at the end of references. Strip the gathered extraneous whitespace. When processing footnotes, we don't actually care to process the extra whitespace at the end of a footnote, but we want it to calculate lines to preserve.
* Fix HTML parse with empty lines (#537)Isaac Muse2017-01-241-1/+7
| | | | | | | If both open and close was not found in first block, additional blocks were evaluated without context of previous blocks. The algorithm needs to evaluate a buffer with the left bracket present. So feed in all items and get the right bracket, then adjust the data_index to be relative to the last block. Fixes #452.
* Fix infinite loop #430facelessuser2015-09-041-3/+4
| | | | | | | | | This should fix the remaining corner cases that can cause infinite loops. Previous iterations did not account for scenarios where the “end” index was less than the “start” index. If the “end” index is ever less than or equal to the “start” index, the “end” will be adjusted to be “start” + 1 to allow the full range to be extracted and replaced.
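The index adjustment can be sketched as follows. This is one reading of the fix, not the actual patch; `clamp_end` and the sample `blocks` list are hypothetical. An "end" at or before "start" would give an empty slice, so nothing would be extracted and a loop over the blocks could keep revisiting the same position.

```python
def clamp_end(start, end):
    """Return an end index that is always greater than `start`."""
    if end <= start:
        end = start + 1
    return end


blocks = ["<div>", "text", "</div>", "after"]
start, end = 2, 1            # a bad "end" that would yield an empty slice
end = clamp_end(start, end)
extracted = blocks[start:end]
```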
* No binary operators at beginning of line.Waylan Limberg2015-02-181-5/+2
| | | | | | | Apparently this is a new requirement of flake8. That's the thing about using tox. Every test run reinstalls all dependencies so an updated dependency might introduce new errors. I could specify a specific version, but I like staying current.
* Flake8 cleanup (mostly whitespace).Waylan Limberg2014-11-201-26/+32
| | | | | | Got all but a couple files in the tests (ran out of time today). Apparently I have been using some bad form for years (although a few things seemed to look better before the update). Anyway, conformant now.
* Issue #368: Fix Markdown in raw HTML stops workingfacelessuser2014-11-191-2/+3
| | | | | | | Originally there was an infinite loop issue that was patched in issue #308. Unfortunately, it was fixed all the way. This fix patches the infinite loop fix to only add an offset to the `right_listindex` when it is in an infinite loop scenario.
* Marked a bunch of lines as 'no cover'. Coverage at 91%Waylan Limberg2014-07-111-1/+1
|
* Fix issue308 and fix (unrelated) failure to break out of nested loop.ryneeverett2014-05-211-3/+3
|
* Improved multiline comment parsing.Waylan Limberg2014-01-121-4/+3
| | | | | | | | | | | | | | | Fixes #257 and slightly alters comment parsing behavior. Unlike self-closing tags, a comment can contain angle brackets between the opening and closing tags. The greater-than angle bracket at the end of the first block should not be mistaken for closing the comment. Need to actually check for a comment closing tag (`-->`). If one is not found, then the comment keeps going (to the end of the document if necessary) just like in HTML. That last bit is a slight change from previous behavior, but should be unsurprising as that's how browsers parse html comments. And as far as I can tell, more implementations follow this behavior than any other. The ones that don't seem to be all over the place.
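The rule described here (search for `-->`, otherwise run to the end of the document) can be sketched in a few lines. The function `comment_end` is a hypothetical helper written for illustration and is not taken from the preprocessor itself.

```python
def comment_end(text, start=0):
    """Return the index just past the comment that opens at `start`.

    A lone '>' inside the comment is ignored; only '-->' closes it.
    If no '-->' is found, the comment runs to the end of the text.
    """
    close = text.find("-->", start + len("<!--"))
    if close == -1:
        return len(text)
    return close + len("-->")


doc = "<!-- a > b\n\nstill comment --> tail"
end = comment_end(doc)
comment, rest = doc[:end], doc[end:]
```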
* Miscellaneous improvements and bug fixes.ryneeverett2013-11-191-8/+14
|
* Issue #52ryneeverett2013-10-141-40/+78
|
* Future imports go after the docstringsAdam Dinwoodie2013-03-181-1/+1
| | | | | | | | | A `from __future__ import ...` statement must go after any docstrings; since putting them before the docstring means the docstring loses its magic and just becomes a string literal. That then causes a syntax error if there are further future statements after the false docstring. This fixes issue #203, using the patch provided by @Arfrever.
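The effect can be checked with the standard library alone. In this sketch the source strings are made up for illustration; `ast.get_docstring` shows whether the string is still treated as the module docstring, and `compile` shows the syntax error that a later future statement triggers.

```python
import ast

# Docstring first, then the future statement: the docstring is recognised.
GOOD = '"""Module docstring."""\nfrom __future__ import division\n'
# Future statement first: the string is now just an expression statement.
BAD = 'from __future__ import division\n"""Just a string now."""\n'

good_doc = ast.get_docstring(ast.parse(GOOD))
bad_doc = ast.get_docstring(ast.parse(BAD))

# A further future statement after the stray string is rejected outright.
WORSE = BAD + "from __future__ import absolute_import\n"
try:
    compile(WORSE, "<example>", "exec")
    rejected = False
except SyntaxError:
    rejected = True
```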
* Now using universal code for Python 2 & 3.Waylan Limberg2013-02-271-2/+4
| | | | | | | | | | The most notable changes are the use of unicode_literals and absolute_import. Actually, absolute_import was the biggest deal as it gives us relative imports. For the first time extensions import markdown relative to themselves. This allows other packages to embed the markdown lib in a subdir of their project and still be able to use our extensions.
* Preserve all blank lines in code blocks.Waylan Limberg2013-02-141-1/+1
| | | | | | | | | | | | Fixes #183. Finally got this working properly. The key was using a regex substitution with non-overlapping matches that removed all whitespace from the beginning of *all* blank lines when normalizing whitespace. Once I got that, I could simplify the EmptyBlockProcessor and easily output one or two blank lines appropriately. A blank block gets two new lines (`'\n\n'`), while a block which starts with a newline gets one.
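A rough sketch of that kind of substitution follows. The pattern below is an assumption written for illustration and may differ from the regex actually committed; it empties every whitespace-only line while leaving indented content lines alone.

```python
import re

# Matches lines that consist only of spaces and/or tabs.
BLANK_LINE_WS = re.compile(r"^[ \t]+$", re.MULTILINE)


def strip_blank_line_whitespace(text):
    """Remove the whitespace from every blank line, keeping the newlines."""
    return BLANK_LINE_WS.sub("", text)


source = "    code line 1\n    \n\t\n    code line 2\n"
cleaned = strip_blank_line_whitespace(source)
```

The line count is unchanged, so blank lines inside an indented code block survive as truly empty lines.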
* Moved whitespace normalization to a preprocessor.Waylan Limberg2013-02-081-0/+13
| | | | | | | | | | | | | | Fixes #150 - at least as much as I'm willing to. This allows whitespace normalization to be overridable by the extension API. Yes, I realize that most other processors will also probably need to be overridden to work with any different whitespace normalization - but I'm okay with that. As pointed out in #150, some processors have the tab length hardcoded in regexes. I'm willing to accept a working patch that fixes that - and keeps the regexes easy to override in a subclass (the provided patch moved them inside the __init__ method - which is not so easy to override in a subclass). However, that is about the only additional change I'm willing to consider for this issue.
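To illustrate why a preprocessor makes the behavior overridable, here is a minimal sketch with a stand-in class. `MiniNormalizeWhitespace`, its `run(lines)` signature, and the `tab_length` class attribute are assumptions for illustration, not the library's implementation.

```python
class MiniNormalizeWhitespace:
    """Hypothetical preprocessor: takes a list of lines, returns a list of lines."""

    tab_length = 4  # class attribute, so a subclass can change it easily

    def run(self, lines):
        source = "\n".join(lines)
        # Normalize line endings to "\n".
        source = source.replace("\r\n", "\n").replace("\r", "\n")
        # Expand tabs to spaces using the configured tab length.
        source = source.expandtabs(self.tab_length)
        return source.split("\n")


class EightColumnTabs(MiniNormalizeWhitespace):
    """Override only the setting that differs."""

    tab_length = 8


default_out = MiniNormalizeWhitespace().run(["\tcode\r", "text"])
custom_out = EightColumnTabs().run(["\tcode\r", "text"])
```

Keeping the setting at class level, rather than inside `__init__`, is what makes the subclass a two-line change.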
* HtmlBlockProcessor preserves empty linesWaylan Limberg2013-02-071-1/+1
| | | | | | | | | | | | | Partial fix for #183. This has the same effect on empty lines in code blocks as not using the html processor at all (which was eating some of the missing newlines as reported in issue #183). By doing `rsplit('\n\n')` the third newline (in each set of three) always ends up at the end of a block, rather than the beginning - which is less of an issue for the html processor. Also updated tests to indicate final intended output, although they do not fully pass yet.
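The difference between splitting from the left and from the right shows up as soon as three newlines appear in a row. The sample text below is made up for illustration.

```python
text = "first\n\n\nsecond"

# Splitting from the left leaves the third newline at the start of a block.
from_left = text.split("\n\n")

# Splitting from the right leaves it at the end of the previous block.
from_right = text.rsplit("\n\n")
```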
* Fixed #78. Added support for two line link refs.Waylan Limberg2012-02-021-13/+14
| | | | | Also refactored the reference preprocessor to make this a little easier to implement. Regex does more now.
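As a rough illustration of a single regex handling a reference whose title sits on the following line, consider the simplified pattern below. It is an assumption written for this note, far less complete than a real reference pattern (it only accepts double-quoted titles), and `REF` is a hypothetical name.

```python
import re

REF = re.compile(
    r'^[ ]{0,3}\[([^\]]+)\]:[ \t]*(\S+)'        # [id]: url
    r'(?:[ \t]*\n?[ \t]*"([^"\n]*)")?[ \t]*$',  # optional "title", maybe on the next line
    re.MULTILINE,
)

one_line = '[id]: http://example.com/ "Title"'
two_line = '[id]: http://example.com/\n    "Title"'
no_title = "[a]: http://x\n[b]: http://y"

one = REF.match(one_line).groups()
two = REF.match(two_line).groups()
ids = [m.group(1) for m in REF.finditer(no_title)]
```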
* Fixed #76. HTML attrs are a dict not a tuple. Silly typo.Waylan Limberg2012-01-301-1/+1
|
* Fixed #75. Right tags in raw html are more properly identified.Waylan Limberg2012-01-301-1/+1
|
* When safe mode is 'escape', don't allow bad html to stop further processing.Mike Dirolf2012-01-141-1/+2
| | | | | | | | | | See tests/html4_safe/html_then_blockquote.(txt|html). It looks like having unclosed block-level html elements was causing further processing not to happen, even in the case where we're escaping HTML. Since we're escaping HTML, it seems like it shouldn't affect processing at all. This changes output results in a couple of other tests, but the new output seems reasonable to me.
* Fixed #68. Blank line is not required after html comments.Waylan Limberg2012-01-151-14/+10
| | | | | Interestingly, the change to the misc/mismatched-tags test is inline with PHP Markdown Extra's behavior but not markdown.pl, which produces invalid html.
* Fixed #57. Multiline HTML Blocks no longer require a blank line after them.Waylan Limberg2012-01-151-1/+8
|
* Fixed #70. Empty anglebrackets '<>' are now properly recognized as raw html.Waylan Limberg2012-01-151-1/+1
|
* Fixed the bug exposed in 8761cd1780a7cec60123. We no longer should get empty ↵Waylan Limberg2011-07-211-2/+3
| | | | rawhtml blocks. All tests pass again.
* Refactored fix- created method from local function for search. Fixes ticket 62.Gerry LaMontagne2010-08-311-20/+20
|
* Further improvements to closing tag search in HtmlBlockProcessorsGerry LaMontagne2010-08-301-7/+11
|
* Replaced block.rfind in _get_right_tag with custom search function.Gerry LaMontagne2010-08-291-1/+15
|
* Fixed Ticket 65. Lines with only a lessthan sign (<) no longer crash the ↵Waylan Limberg2010-07-141-2/+2
| | | | raw html parser. Fixed a related bug I found while debugging this as well. Also added tests for both.
* Factored out the building of the various processors and patterns into ↵Waylan Limberg2010-07-071-1/+9
| | | | utility functions called by a build_parser method on the Markdown class. Editing of the processors and patterns now all happen in one file for each type. Additionally, a subclass of Markdown could potentially override the build_parser method and build a parser for a completely different markup language without first building the default and then overriding it.
* Moved HtmlStash and base Processor classes to the new util module.Waylan Limberg2010-07-061-43/+1
|
* Rename misc.py to util.py at the request of upstreamToshio Kuratomi2010-07-051-6/+6
|
* Break cyclic import of markdown. This allows people to embed markdownToshio Kuratomi2010-07-051-8/+9
| | | | if they desire.
* Fixed ticket 59. Reference links now strip angle brackets from the url.Waylan Limberg2010-03-251-2/+3
|
* Added processing of markdown text within raw html to the 'extra' extension. ↵Waylan Limberg2010-01-031-2/+4
| | | | Fixes Ticket 39. NOTE: I did not add a separate extension which only adds this feature - it is only available as part of 'extra'.
* Cleanup and additional work on previous commit. NOTE: removed special ↵Waylan Limberg2010-01-031-17/+61
| | | | treatment of raw <div>s with multiple line breaks - they no longer automagically process their content as markdown. This matches other implementations. Finished rest of code for use by an extension - to be added later.
* Fixed Ticket 48. Quoted attributes in raw html are specifically acknowledged ↵Waylan Limberg2010-01-031-9/+39
| | | | | | now - allowing various arbitrary stuff (like x/html) to be included without breaking the rawhtml parser. Although currently unused, the code also provides the parsed attributes as a dict. Should be useful for adding support for parsing markdown text within rawhtml in an extension.
* Fixed ticket 44. Raw HTML now maintains original whitespace. Important ↵Waylan Limberg2010-01-031-1/+1
| | | | inside raw <pre> tags.
* Fixed ticket 32. When raw html starts a line, the raw html is only broken ↵Waylan Limberg2009-05-131-8/+9
| | | | into a separate block if it is a block tag. Inline tags are left in the block for the inline pattern as they should be.
* Cleaned up recent refactor into a package from a single file.Waylan Limberg2008-11-201-0/+214