path: root/markdown/inlinepatterns.py
Commit message | Author | Age | Files | Lines
* smart_emphasis keyword > legacy_em extension. | Waylan Limberg | 2018-07-31 | 1 | -11/+9
    The smart_strong extension has been removed and its behavior is now the default (smart em and smart strong are the default). The legacy_em extension restores legacy behavior (no smart em or smart strong). This completes the removal of keywords. All parser behavior is now modified by extensions, not by keywords on the Markdown class.
* Fix double escaping of amp in attributes (#670) | Isaac Muse | 2018-07-29 | 1 | -2/+2
    Serializer should only escape & in attributes if it is not already part of an entity such as &amp;. Better regex to avoid Unicode and `_` in amp detection. In general, we don't want to escape already escaped content, but with code content, we want literal representations of escaped content, so have code content explicitly escape its content before placing in AtomicStrings. Closes #669.
* Consistent copyright headers. | Waylan Limberg | 2018-07-27 | 1 | -0/+20
    Fixes #435.
* All Markdown instances are now 'md'. (#691) | Waylan Limberg | 2018-07-27 | 1 | -25/+29
    Previously, instances of the Markdown class were represented as any one of 'md', 'md_instance', or 'markdown'. This inconsistency made it difficult when developing extensions, or just maintaining the existing code. Now, all instances are consistently represented as 'md'. The old attributes on class instances still exist, but raise a DeprecationWarning when accessed. Also on classes where the instance was optional, the attribute always exists now and is simply None if no instance was provided (previously the attribute wouldn't exist).
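
The deprecation pattern described in this commit can be sketched roughly as follows. This is a minimal illustration, not the library's code: the class name `LegacyAwareProcessor` is hypothetical, and only the attribute names 'md' and 'markdown' come from the commit message.

```python
import warnings


class LegacyAwareProcessor:
    """Hypothetical class; shows one way to keep an old attribute name alive."""

    def __init__(self, md=None):
        # The attribute always exists and is None when no instance is given.
        self.md = md

    @property
    def markdown(self):
        # Old spelling still works, but warns on every access.
        warnings.warn(
            "'markdown' is deprecated; use 'md' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.md
```

A property keeps the old name readable without storing the instance twice, so the two names cannot drift apart.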
* Replace homegrown OrderedDict with purpose-built Registry. (#688) | Waylan Limberg | 2018-07-27 | 1 | -23/+22
    All processors and patterns now get "registered" to a Registry. Each item is given a name (string) and a priority. The name is for later reference and the priority can be either an integer or float and is used to sort. Priority is sorted from highest to lowest. A Registry instance is a list-like iterable with the items auto-sorted by priority. If two items have the same priority, then they are listed in the order they were "registered". Registering a new item with the same name as an already registered item replaces the old item with the new item (however, the new item is sorted by its newly assigned priority). To remove an item, "deregister" it by name or index. A backwards compatible shim is included so that existing simple extensions should continue to work. DeprecationWarnings will be raised for any code which calls the old API. Fixes #418.
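
The ordering rules in this commit message can be illustrated with a small stand-in. The class below is a hypothetical sketch written for this note (the name `PriorityRegistry` and its internals are assumptions), not the Registry that ships with the library; it covers registration, replacement by name, and removal by name only.

```python
class PriorityRegistry:
    """Hypothetical sketch of a priority-sorted registry; not the library's code."""

    def __init__(self):
        self._entries = []   # tuples of (name, item, priority, insertion order)
        self._counter = 0

    def register(self, item, name, priority):
        # Registering an existing name replaces the old item.
        self._entries = [e for e in self._entries if e[0] != name]
        self._entries.append((name, item, priority, self._counter))
        self._counter += 1
        # Highest priority first; equal priorities keep registration order.
        self._entries.sort(key=lambda e: (-e[2], e[3]))

    def deregister(self, name):
        remaining = [e for e in self._entries if e[0] != name]
        if len(remaining) == len(self._entries):
            raise ValueError("no item named {!r}".format(name))
        self._entries = remaining

    def __iter__(self):
        return (item for _, item, _, _ in self._entries)
```

Storing an insertion counter next to the priority makes the tie-breaking rule explicit instead of relying on sort stability.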
* Moved enable_attributes keyword to extension: legacy_attrs. | Waylan Limberg | 2018-07-24 | 1 | -21/+1
    If you have existing documents that use the legacy attributes format, then you should enable the legacy_attrs extension for those documents. Everyone is encouraged to use the attr_list extension going forward. Closes #643. Work adapted from 0005d7a of the md3 branch.
* Flexible inline (#629) | Isaac Muse | 2018-01-17 | 1 | -116/+367
    Add new InlineProcessor class that handles inline processing much better and allows for more flexibility. This adds new InlineProcessors that no longer utilize unnecessary pretext and posttext captures. The new class can accept the buffer that is being worked on, manually process the text without regex, and return new replacement bounds. This helps us to handle links in a better way and to handle nested brackets and logic that is too much for regular expressions. The refactor also allows image links to have links/paths with spaces, like links. Ref #551, #613, #590, #161.
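
Scanning a buffer by hand and returning replacement bounds, as this commit describes for nested brackets, can be sketched as below. The function `find_closing_bracket` is a hypothetical helper written for illustration; it is not part of the library, and it ignores backslash escapes for brevity.

```python
def find_closing_bracket(text, start):
    """Return the index just past the ']' matching the '[' at text[start].

    Returns None when text[start] is not '[' or the brackets never balance.
    """
    if start >= len(text) or text[start] != "[":
        return None
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return i + 1   # end bound, exclusive
    return None


text = "see [a [nested] label](url) here"
start = text.index("[")
end = find_closing_bracket(text, start)
label = text[start:end]
```

A depth counter handles arbitrary nesting, which a single regular expression without recursion support cannot express.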
* Removed some Py2.4-2.6 specific code. | Waylan Limberg | 2018-01-11 | 1 | -14/+1
* Removed deprecated safe_mode. | Waylan Limberg | 2018-01-11 | 1 | -58/+5
* Make sure regex patterns are raw strings (#614) | Isaac Muse | 2018-01-02 | 1 | -1/+1
    Python 3.6 is starting to reject invalid escapes. Regular expression patterns should be raw strings to avoid having regex escapes being mistaken for invalid string escapes. Fixes #611.
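
A short illustration of why raw strings matter for patterns. The pattern and sample text are made up for this note; only the general advice comes from the commit.

```python
import re

# In a raw string the backslashes reach the regex engine untouched.
pattern = re.compile(r"\[(\d+)\]")
match = pattern.search("see note [42]")
number = match.group(1)

# Where the escape is also valid for string literals, the two forms differ:
raw_len = len(r"\b")     # backslash + 'b', a regex word boundary
plain_len = len("\b")    # a single backspace character
```

The `\b` case is the dangerous one: it raises no warning at all, yet the pattern silently means something different.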
* Feature ancestry (#598) | Isaac Muse | 2017-11-23 | 1 | -0/+2
    Ancestry exclusion for inline patterns. Adds the ability for an inline pattern to define a list of ancestor tag names that should be avoided. If a pattern would create a descendant of one of the listed tag names, the pattern will not match. Fixes #596.
* Fix new flake8 722 error | facelessuser | 2017-10-26 | 1 | -1/+1
* fix DeprecationWarning: invalid escape sequence | d9pouces | 2017-07-25 | 1 | -3/+3
* Fix typo s/Goggle/Google/ | Tim Chase | 2017-06-03 | 1 | -1/+1
* Better inline code escaping (#533) | Isaac Muse | 2017-01-20 | 1 | -5/+9
    This aims to escape code in a more expected fashion. This handles when backticks are escaped and when the escapes before backticks are escaped.
* Add blank lines after toplevel function definitions. | Dmitry Shachnev | 2016-11-18 | 1 | -0/+1
    This fixes warnings with pycodestyle ≥ 2.1, see PyCQA/pycodestyle#400.
* Fix image titles not following spec | facelessuser | 2016-07-26 | 1 | -1/+1
    Don’t allow spaces in image links. This was also causing an issue where any text following a space was treated as a title. Ref #484.
* Ensure InlinePatterns don't drop newlines. | Waylan Limberg | 2015-11-06 | 1 | -1/+1
    Dropped the non-greedy quantifier from the end of the inlinePatterns as it served no useful purpose and was actually (in very rare edge cases) causing newlines to be dropped. Fixes #439. Thanks to @munificent for the report.
* No binary operators at beginning of line. | Waylan Limberg | 2015-02-18 | 1 | -6/+6
    Apparently this is a new requirement of flake8. That's the thing about using tox. Every test run reinstalls all dependencies so an updated dependency might introduce new errors. I could specify a specific version, but I like staying current.
* Flake8 cleanup (mostly whitespace). | Waylan Limberg | 2014-11-20 | 1 | -49/+95
    Got all but a couple files in the tests (ran out of time today). Apparently I have been using some bad form for years (although a few things seemed to look better before the update). Anyway, conformant now.
* Issue #365 Bold/Italic nesting fix | facelessuser | 2014-11-17 | 1 | -2/+2
    The logic for the current regex for strong/em and em/strong was sound, but the way it was implemented caused some unintended side effects. Whether it is a quirk with regex in general or just with Python’s re engine, I am not sure. Put basically, `(\*|_){3}` causes issues with nested bold/italic. So, allowing the group to be defined, and then using the group number to specify the remaining sequential chars, is a better way that works more reliably: `(\*|_)\2{2}`. Test from issue #365 was also added to check for this case in the future.
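
One observable difference between the two forms is shown below; whether it is the exact side effect the commit refers to is not stated. The backreference is written `\1` here because these standalone patterns have only one group, while the commit's `\2` presumably reflects an extra leading group in the library's compiled pattern (an assumption).

```python
import re

# A repeated group re-evaluates the alternation on every repetition,
# so each of the three characters may be a different delimiter.
loose = re.compile(r"(\*|_){3}")

# A backreference must repeat whatever the group captured the first time.
strict = re.compile(r"(\*|_)\1{2}")

loose_mixed = bool(loose.fullmatch("*_*"))
strict_mixed = bool(strict.fullmatch("*_*"))
strict_same = bool(strict.fullmatch("***"))
```

With the backreference, a run of three delimiters only counts when all three are the same character.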
* Better nested STRONG EM support. | Waylan Limberg | 2014-09-26 | 1 | -2/+6
    Fixes #253. Thanks to @facelessuser for the tests. Although I removed a bunch of weird ones (even some that passed) from his PR (#342). For the most part, there is no definitive way for those to be parsed. So there is no point of testing for them. In most of those situations, authors should be mixing underscores and asterisks so it is clear what is intended.
* Fix the lost tail issue in inlineprocessors. | facelessuser | 2014-09-26 | 1 | -8/+8
    See #253. Prior to this patch, if any inline processors returned an element with a tail, the tail would end up empty. This resolves that issue and will allow for #253 to be fixed. Thanks to @facelessuser for the work on this.
* Removed some old code | Waylan Limberg | 2014-08-25 | 1 | -4/+1
    These couple lines were from an old - no longer used - method of stashing inlines. There is no need for it today. The if statement would never evaluate True.
* Mark a few more lines with 'no cover' - missed them the first time through. The rest should have test cases added. | Waylan Limberg | 2014-07-11 | 1 | -4/+4
* Marked a bunch of lines as 'no cover'. Coverage at 91% | Waylan Limberg | 2014-07-11 | 1 | -6/+6
* No longer percent encode spaces in urls. | Waylan Limberg | 2014-01-09 | 1 | -1/+0
    The current implementation was wrong as it also percent encoded query strings (which should be plus encoded) and calling urllib.quote on the path (and urllib.quote_plus on the query string) assumes the url is not already encoded. What if the document author pasted a url that was already encoded? She probably did not intend for `%20` to become `%2520`. Or did she? It is now clear to me why many implementations do nothing to urls. Just pass them through as-is. Too bad if they are not valid HTML. HTML authors have to encode their own urls, so I guess markdown authors have to as well.
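
The double-encoding problem described here is easy to reproduce. The commit names `urllib.quote` (Python 2); the sketch below uses the Python 3 equivalents in `urllib.parse`, and the sample path is made up.

```python
from urllib.parse import quote, quote_plus

raw_path = "/my file.html"

# Encoding once turns the space into %20.
encoded_once = quote(raw_path)

# Encoding an already encoded URL also escapes the '%' itself.
encoded_twice = quote(encoded_once)

# Query strings conventionally use '+' for spaces instead.
query_value = quote_plus("a b")
```

Because the encoder cannot tell a literal `%` from the start of an existing escape, passing URLs through unchanged avoids guessing the author's intent.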
* Only escape ESCAPED_CHARS. | Waylan Limberg | 2014-01-09 | 1 | -1/+1
    Leave all other chars prefaced by a backslash alone. Fixes #242. Not sure why I thought that I needed to add another backslash. Thanks for the report and the test case @mhubig.
* Fixed parsing of brackets within inline image titles. | Darell Tan | 2014-01-05 | 1 | -1/+1
* Future imports go after the docstrings | Adam Dinwoodie | 2013-03-18 | 1 | -1/+1
    A `from __future__ import ...` statement must go after any docstrings, since putting them before the docstring means the docstring loses its magic and just becomes a string literal. That then causes a syntax error if there are further future statements after the false docstring. This fixes issue #203, using the patch provided by @Arfrever.
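
Both effects mentioned in this commit can be checked from Python itself. The module sources below are made-up strings used only for the demonstration.

```python
import ast

good = '"""Module docstring."""\nfrom __future__ import division\n'
bad = 'from __future__ import division\n"""Module docstring."""\n'

# The string only counts as a docstring when it is the first statement.
good_doc = ast.get_docstring(ast.parse(good))
bad_doc = ast.get_docstring(ast.parse(bad))

# A further future import after the stray string literal is rejected,
# because future imports must come before any other statement.
broken = bad + "from __future__ import print_function\n"
try:
    compile(broken, "<example>", "exec")
    late_future_rejected = False
except SyntaxError:
    late_future_rejected = True
```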
* Now using universal code for Python 2 & 3. | Waylan Limberg | 2013-02-27 | 1 | -14/+16
    The most notable changes are the use of unicode_literals and absolute_imports. Actually, absolute_imports was the biggest deal as it gives us relative imports. For the first time extensions import markdown relative to themselves. This allows other packages to embed the markdown lib in a subdir of their project and still be able to use our extensions.
* Whitelisted known safe url schemes in safe_mode. A better fix for #185. | Waylan Limberg | 2013-02-06 | 1 | -6/+7
* Forbid javascript:// URLs in safe mode | Philipp Hagemeister | 2013-02-05 | 1 | -0/+3
* Enable attributes inside image references | Adam Backstrom | 2013-01-27 | 1 | -0/+4
* Fix all pyflakes unused-import/unused-variable warnings | Dmitry Shachnev | 2012-11-09 | 1 | -1/+0
* Fix silly typo in previous commit. | Waylan Limberg | 2012-11-01 | 1 | -1/+1
* A better fix for #155. Unescaping inline placeholders now returns the text only of an Element - rather than the html which just gets html escaped in the output anyway. | Waylan Limberg | 2012-11-01 | 1 | -6/+19
* Fixed #154. Inline placeholders in img alt text are now unescaped. | Waylan Limberg | 2012-11-01 | 1 | -1/+1
* Fixed #155. Early unescaping of inline placeholders now works when the placeholder is an Elementtree Element. | Waylan Limberg | 2012-11-01 | 1 | -1/+5
* Fixed #153. Two spaces at end of paragraph is not a linebreak. | Waylan Limberg | 2012-10-21 | 1 | -2/+0
* Fixed #152. Spaces in links are now escaped. | Waylan Limberg | 2012-10-21 | 1 | -0/+1
* Misc typos. | chri | 2012-08-28 | 1 | -1/+1
* Always use Markdown's serializers. | Waylan Limberg | 2012-01-20 | 1 | -1/+1
    Not only does this ensure that all output matches the output_format, but it is necessary to run in Python 3.
* Inline html is now escaped by the serializer. | Waylan Limberg | 2012-01-19 | 1 | -1/+2
    Final fix to issue introduced in fix for #59. Weird stuff inside angle brackets now also works in safe_mode='escape'. We just did the same thing as with block html, let the (x)html serializer do the escaping. Tests updated including the standalone test moved to match the non-escape cases.
* Partial fix for issue introduced in fix for #59 | Waylan Limberg | 2012-01-19 | 1 | -2/+7
    Markdown markup inside angle brackets now gets rendered properly in all cases except when safe_mode='escape'. Also added tests.
* Fixed #59. Raw HTML parsing is no longer slow. | Waylan Limberg | 2012-01-18 | 1 | -2/+13
    Replaced the unescape method I carelessly threw in the RawHtmlProcessor. Unfortunately, this reintroduces the bug just fixed in commit 425fde141f17973aea0a3a85e44632fe18737996 Sigh!
* Fix logic bug introduced in 35930e0928e19... | Mike Dirolf | 2012-01-14 | 1 | -1/+1
* Fixed #69. url_sanitize no longer crashes on unparsable urls. | Waylan Limberg | 2012-01-15 | 1 | -9/+18
    Also optimized the code to bypass parsing when not in safe_mode and return immediately upon failure rather than continue parsing when in safe_mode. Note that in Python2.7+ more urls may fail than in older versions because IPv6 support was added to urlparse and it apparently mistakenly identifies some urls as IPv6 when they are not. Seeing this only applies to safe_mode now, I don't really care.
* Allow UPPERCASE urls in auto links. | Waylan Limberg | 2011-10-06 | 1 | -1/+1
* Fixed #39. Refactored escaping so that it only escapes a predefined set of chars (the set defined by JG in the syntax rules). All other backslashes are passed through unaltered by the parser. If extensions want to add to the escapable chars, they can append to the list at markdown.ESCAPED_CHARS. | Waylan Limberg | 2011-08-17 | 1 | -3/+13
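
The rule in this commit (only a fixed set of characters can be backslash-escaped, everything else passes through) can be sketched as follows. The names `ESCAPABLE`, `ESCAPE_RE` and `unescape` are hypothetical and local to this example; the character set is the one listed in the original Markdown syntax rules, and the library's own list at markdown.ESCAPED_CHARS may differ. The library's actual mechanism is not shown in this log, so treat this as an approximation of the behavior only.

```python
import re

# Backslash-escapable characters from the original Markdown syntax rules.
ESCAPABLE = "\\`*_{}[]()#+-.!"

# A backslash followed by exactly one escapable character.
ESCAPE_RE = re.compile("\\\\([" + re.escape(ESCAPABLE) + "])")


def unescape(text):
    """Drop the backslash before escapable chars; leave other backslashes alone."""
    return ESCAPE_RE.sub(r"\1", text)


result = unescape("\\*not emphasis\\* but \\q stays")
```

Building the character class from a single string keeps the set easy to extend, in the same spirit as appending to a list.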