title:      Library Reference
prev_title: Installation
prev_url:   install.html
next_title: Command Line
next_url:   cli.html


Using Markdown as a Python Library
==================================

First and foremost, Python-Markdown is intended to be a python library module
used by various projects to convert Markdown syntax into HTML.

The Basics
----------

To use markdown as a module:

    import markdown
    html = markdown.markdown(your_text_string)

The Details
-----------

Python-Markdown provides two public functions ([`markdown.markdown`](#markdown)
and [`markdown.markdownFromFile`](#markdownFromFile)) both of which wrap the 
public class [`markdown.Markdown`](#Markdown). If you're processing one 
document at a time, the functions will serve your needs. However, if you need 
to process multiple documents, it may be advantageous to create a single 
instance of the `markdown.Markdown` class and pass multiple documents through 
it.

### `markdown.markdown (text [, **kwargs])` {: #markdown }

The following options are available on the `markdown.markdown` function:

* __`text`__{: #text } (required): The source unicode string.

    !!! note "Important"
        Python-Markdown expects **Unicode** as input (although
        some simple ASCII strings *may* work) and returns output as Unicode.
        Do not pass encoded strings to it! If your input is encoded, (e.g. as 
        UTF-8), it is your responsibility to decode it.  For example:

            input_file = codecs.open("some_file.txt", mode="r", encoding="utf-8")
            text = input_file.read()
            html = markdown.markdown(text)

        If you want to write the output to disk, you *must* encode it yourself:

            output_file = codecs.open("some_file.html", "w", 
                                      encoding="utf-8", 
                                      errors="xmlcharrefreplace"
            )
            output_file.write(html)

* __`extensions`__{: #extensions }: A list of extensions.

    Python-Markdown provides an [API](extensions/api.html) for third parties to
    write extensions to the parser adding their own additions or changes to the
    syntax. A few commonly used extensions are shipped with the markdown 
    library. See the [extension documentation](extensions/index.html) for a 
    list of available extensions.

    The list of extensions may contain instances of extensions and/or strings 
    of extension names. 

        extensions=[MyExtension(), 'path.to.my.ext', 'extra']
    
    When passing in extension instances, each class instance must be a subclass
    of `markdown.extensions.Extension` and any configuration options should be 
    defined when initiating the class instance rather than using the 
    [extension_configs](#extension_configs) keyword. For example:

        from markdown.extensions import Extension
        class MyExtension(Extension):
            # define your extension here...
    
        markdown.markdown(text, extensions=[MyExtension(configs={'option': 'value'}))

    If an extension name is provided as a string, the extension must be 
    importable as a python module on your PYTHONPATH. Python's dot notation is 
    supported. Therefore, to import the 'extra' extension, one could do 
    `extensions=['markdown.extensions.extra']` However, if no dots are provided 
    in the string (`extensions=['extra']`) Markdown will first look for the 
    module `markdown.extensions.extra` (the built-in extension), then a module 
    named `mdx_extra` ('mdx_' will be appended to the beginning of the string) 
    at the root of your PYTHONPATH. 

    When loading an extension by name (as a sting), you may either pass in
    configuration settings to the extension using the 
    [extension_configs](#extension_configs) keyword or by appending the 
    settings to the name in the following format:

        extensions=['name(option1=value,option2=value)']
    
    Note that there are no quotes or whitespace in the above format, which 
    severely limits how it can be used. For more complex settings, it is 
    suggested that the [extension_configs](#extension_configs) keyword
    be used or an instance of a class be passed in.

    !!! seealso "See Also"
        See the documentation of the [Extension API](extensions/api.html) for 
        assistance in creating extensions.

* __`extension_configs`__{: #extension_configs }: A dictionary of
  configuration settings for extensions.

    Any configuration settings will only be passed to extensions loaded by name 
    (as a string). When loading extensions as class instances, pass the 
    configuration settings directly to the class when initializing it. 
    
    The dictionary of configuration settings must be in the following format:

        extension_configs = {'extension_name_1':
                               [
                                  ('option_1', 'value_1'),
                                  ('option_2', 'value_2')
                               ],
                             'extension_name_2':
                               [
                                  ('option_1', 'value_1')
                               ]
                            }
    See the documentation specific to the extension you are using for help in 
    specifying configuration settings for that extension.

* __`output_format`__{: #output_format }: Format of output. 

    Supported formats are:

    * `"xhtml1"`: Outputs XHTML 1.x. **Default**.
    * `"xhtml5"`: Outputs XHTML style tags of HTML 5
    * `"xhtml"`: Outputs latest supported version of XHTML (currently XHTML 1.1).
    * `"html4"`: Outputs HTML 4
    * `"html5"`: Outputs HTML style tags of HTML 5
    * `"html"`: Outputs latest supported version of HTML (currently HTML 4).

    The values can be in either lowercase or uppercase.

    !!! warning
        It is suggested that the more specific formats ("xhtml1", "html5", & 
        "html4") be used as the more general formats ("xhtml" or "html") may 
        change in the future if it makes sense at that time. 

* __`safe_mode`__{: #safe_mode }: Disallow raw html.

    If you are using Markdown on a web system which will transform text 
    provided by untrusted users, you may want to use the "safe_mode" 
    option which ensures that the user's HTML tags are either replaced, 
    removed or escaped. (They can still create links using Markdown syntax.)
    
    The following values are accepted:

    * `False` (Default): Raw HTML is passed through unaltered.

    * `replace`: Replace all HTML blocks with the text assigned to 
      `html_replacement_text` To maintain backward compatibility, setting 
      `safe_mode=True` will have the same effect as `safe_mode='replace'`.   

        To replace raw HTML with something other than the default, do:

            md = markdown.Markdown(safe_mode='replace', 
                               html_replacement_text='--RAW HTML NOT ALLOWED--')

    * `remove`: All raw HTML will be completely stripped from the text with
      no warning to the author.

    * `escape`: All raw HTML will be escaped and included in the document.

        For example, the following source:

            Foo <b>bar</b>.

        Will result in the following HTML:

            <p>Foo &lt;b&gt;bar&lt;/b&gt;.</p>

    !!! Note 
        "safe_mode" also alters the default value for the 
        [`enable_attributes`](#enable_attributes) option.
    
    !!! seealso "See Also"
        HTML sanitizers (like [Bleach]) may provide a better solution for 
        dealing with markdown text submitted by untrusted users. That way, 
        both the HTML generated by Markdown and user submited raw HTML are 
        fully sanitized.
    
            import markdown
            import bleach
            html = bleach.clean(markdown.markdown(evil_text))
    
[Bleach]: https://github.com/jsocol/bleach

* __`html_replacement_text`__{: #html_replacement_text }: Text used when 
  safe_mode is set to `replace`. Defaults to `[HTML_REMOVED]`.

* __`tab_length`__{: #tab_length }: Length of tabs in the source. Default: 4

* __`enable_attributes`__{: #enable_attributes}: Enable the conversion of 
  attributes. Defaults to `True`, unless [`safe_mode`](#safe_mode) is enabled, 
  in which case the default is `False`.

    !!! Note 
        `safe_mode` only overrides the default. If `enable_attributes` 
        is explicitly set, the explicit value is used regardless of `safe_mode`.
        However, this could potentially allow an untrusted user to inject
        JavaScript into your documents.

* __`smart_emphasis`__{: #smart_emphasis }: Treat `_connected_words_` 
  intelligently Default: True

* __`lazy_ol`__{: #lazy_ol }: Ignore number of first item of ordered lists. 
  Default: True

    Given the following list:

        4. Apples
        5. Oranges
        6. Pears

    By default markdown will ignore the fact the the first line started 
    with item number "4" and the HTML list will start with a number "1".
    If `lazy_ol` is set to `True`, then markdown will output the following
    HTML:

        <ol>
          <li start="4">Apples</li>
          <li>Oranges</li>
          <li>Pears</li>
        </ol>


### `markdown.markdownFromFile (**kwargs)` {: #markdownFromFile }

With a few exceptions, `markdown.markdownFromFile` accepts the same options as 
`markdown.markdown`. It does **not** accept a `text` (or Unicode) string. 
Instead, it accepts the following required options:

* __`input`__{: #input } (required): The source text file.

    `input` may be set to one of three options:

    * a string which contains a path to a readable file on the file system,
    * a readable file-like object,
    * or `None` (default) which will read from `stdin`.

* __`output`__{: #output }: The target which output is written to.

    `output` may be set to one of three options:

    * a string which contains a path to a writable file on the file system,
    * a writable file-like object,
    * or `None` (default) which will write to `stdout`.

* __`encoding`__{: #encoding }: The encoding of the source text file. Defaults 
  to "utf-8". The same encoding will always be used for input and output. 
  The 'xmlcharrefreplace' error handler is used when encoding the output.

    !!! Note 
        This is the only place that decoding and encoding of unicode
        takes place in Python-Markdown. If this rather naive solution does not
        meet your specific needs, it is suggested that you write your own code
        to handle your encoding/decoding needs.

### `markdown.Markdown ([**kwargs])` {: #Markdown }

The same options are available when initializing the `markdown.Markdown` class
as on the [`markdown.markdown`](#markdown) function, except that the class does
**not** accept a source text string on initialization. Rather, the source text
string must be passed to one of two instance methods:

* `Markdown.convert(source)`{: #convert }

    The `source` text must meet the same requirements as the [`text`](#text) 
    argument of the [`markdown.markdown`](#markdown) function.

    You should also use this method if you want to process multiple strings
    without creating a new instance of the class for each string.

        md = markdown.Markdown()
        html1 = md.convert(text1)
        html2 = md.convert(text2)

    Note that depending on which options and/or extensions are being used,
    the parser may need its state reset between each call to `convert`.

        html1 = md.convert(text1)
        md.reset()
        html2 = md.convert(text2)
    
    You can also change calls to `reset` together:
    
        html3 = md.reset().convert(text3)

* `Markdown.convertFile(**kwargs)`{: #convertFile }

    The arguments of this method are identical to the arguments of the same
    name on the `markdown.markdownFromFile` function ([`input`](#input), 
    [`output`](#output), and [`encoding`](#encoding)). As with the 
    [`convert`](#convert) method, this method should be used to 
    process multiple files without creating a new instance of the class for 
    each document. State may need to be `reset` between each call to 
    `convertFile` as is the case with `convert`.