author | Waylan Limberg <waylan@gmail.com> | 2011-07-27 10:54:12 -0400 |
---|---|---|
committer | Waylan Limberg <waylan@gmail.com> | 2011-07-27 10:54:12 -0400 |
commit | 1fbd6ebdcc913e4dae5030d35009d4d3bb803916 (patch) | |
tree | d6bbe4d2bea8ce03e1920de219b83b51b236696e /tests/misc/block_html5.txt | |
parent | 872f49b4a8e71d3e0fbaea972d964ae466eaeafe (diff) | |
Stripped out encoding/decoding in the serializers.
Those extra steps always bothered me as being unnecessary. Additionally, this
should make conversion to Python 3 easier. The 2to3 tool wasn't converting
the serializers properly and we were getting byte strings in the output.
Previously, this wasn't a major problem because the default serializer was
the xml serializer provided in the ElementTree standard lib. However, now
that we are using our own xhtml serializer, it must work smoothly in all
supported versions.
As a side note, I believe the thought was that we needed to do the encoding to
take advantage of the "xmlcharrefreplace" error handling. However, using the
example in the Python [docs](http://docs.python.org/howto/unicode.html#the-unicode-type):
>>> u = unichr(40960) + u'abcd' + unichr(1972)
>>> u.encode('utf-8', 'xmlcharrefreplace').decode('utf-8') == u
True
There's no point in using the "xmlcharrefreplace" error handling if we just
convert back to the original Unicode anyway. Interestingly, the Python 3
standard lib is doing essentially what we are doing here, so I'm convinced this
is the right way to go.
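
For readers on Python 3, the same check can be sketched as follows. This is only an illustration and not part of the commit: the variable names `round_trip` and `ascii_refs` are made up here, and the ASCII case is added purely for contrast, to show when the "xmlcharrefreplace" handler actually does something.

```python
# Python 3 sketch; the sample string mirrors the one in the commit message.
# The names round_trip and ascii_refs are illustrative, not from the codebase.
u = chr(40960) + "abcd" + chr(1972)

# UTF-8 can encode every character in this string, so the error handler
# never fires and the round trip returns the original string unchanged.
round_trip = u.encode("utf-8", "xmlcharrefreplace").decode("utf-8")
print(round_trip == u)

# With a narrower codec such as ASCII, the handler does fire: each character
# that cannot be encoded becomes a numeric character reference.
ascii_refs = u.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_refs)
```

The handler only changes the output when the target codec cannot represent a character, which is never the case for this string under UTF-8.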
Diffstat (limited to 'tests/misc/block_html5.txt')
0 files changed, 0 insertions, 0 deletions