Tools for working with marked-up text

class clldutils.markup.Table(*cols, **kw)[source]

A context manager to aggregate rows in a table, which will be printed on exit.

>>> with Table('col1', 'col2', tablefmt='simple') as t:
...     t.append(['v1', 'v2'])
...
col1    col2
------  ------
v1      v2

For more control of the table rendering, a Table can be used without a with statement, calling Table.render() instead:

>>> t = Table('col1', 'col2')
>>> t.extend([['z', 1], ['a', 2]])
>>> print(t.render(sortkey=lambda r: r[0], tablefmt='simple'))
col1      col2
------  ------
a            2
z            1
Parameters:

cols (str) – Column names, used as the table header.
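
The context-manager behaviour described above can be pictured with a small standard-library sketch. SketchTable is a hypothetical stand-in, not the clldutils class: it pads columns by hand instead of calling tabulate, and printing only when the with block exits without an exception is an assumption.

```python
class SketchTable(list):
    """Hypothetical stand-in for Table: a list of rows plus column names."""

    def __init__(self, *cols):
        super().__init__()
        self.columns = list(cols)

    def render(self, sortkey=None, reverse=False):
        # In this sketch, rows are only sorted when a sort key is given.
        rows = sorted(self, key=sortkey, reverse=reverse) if sortkey else list(self)
        table = [self.columns] + [[str(v) for v in row] for row in rows]
        # Pad every column to the width of its longest cell.
        widths = [max(len(r[i]) for r in table) for i in range(len(self.columns))]
        return '\n'.join(
            '  '.join(c.ljust(w) for c, w in zip(r, widths)).rstrip() for r in table)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Assumption: print only if the block finished without an error.
        if exc_type is None:
            print(self.render())
        return False


sketch = SketchTable('col1', 'col2')
sketch.extend([['z', 1], ['a', 2]])
rendered = sketch.render(sortkey=lambda r: r[0])
```

Because the sketch subclasses list, append and extend work as in the examples above without any extra code.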

render(sortkey=None, condensed=True, verbose=False, reverse=False, **kw)[source]
Parameters:
  • sortkey – A callable which can be used as key when sorting the rows.

  • condensed – Flag signalling whether whitespace padding should be collapsed.

  • verbose – Flag signalling whether to output additional info.

  • reverse – Flag signalling whether we should sort in reverse order.

  • kw – Additional keyword arguments are passed to the tabulate function.

Returns:

String representation of the table in the chosen format.

clldutils.markup.iter_markdown_tables(text)[source]

Parse tables from a markdown formatted text.

Parameters:

text (str) – markdown formatted text.

Return type:

typing.Generator[typing.Tuple[typing.List[str], typing.List[typing.List[str]]], None, None]

Returns:

generator of (header, rows) pairs, where “header” is a list of column names and “rows” is a list of lists of row values.
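
To make the shape of the yielded (header, rows) pairs concrete, here is a minimal standard-library sketch of pipe-table parsing. sketch_iter_tables is a hypothetical helper, not the clldutils implementation, and it only handles simple tables with a header row followed by a dashed delimiter row.

```python
import re
from typing import Generator, List, Tuple

# Delimiter row such as "| --- | :---: |" or "---|---".
DELIMITER = re.compile(r'\s*\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?\s*')


def _cells(line: str) -> List[str]:
    # Drop the outer pipes, then split on the inner ones.
    return [c.strip() for c in line.strip().strip('|').split('|')]


def sketch_iter_tables(
        text: str) -> Generator[Tuple[List[str], List[List[str]]], None, None]:
    lines = text.splitlines()
    i = 0
    while i < len(lines) - 1:
        # A table starts with a header row followed by a delimiter row.
        if '|' in lines[i] and DELIMITER.fullmatch(lines[i + 1]):
            header, rows = _cells(lines[i]), []
            i += 2
            # Consume body rows until the first line without a pipe.
            while i < len(lines) and '|' in lines[i]:
                rows.append(_cells(lines[i]))
                i += 1
            yield header, rows
        else:
            i += 1


example = "Intro\n\n| col1 | col2 |\n| --- | --- |\n| v1 | v2 |\n| v3 | v4 |\n\nOutro"
tables = list(sketch_iter_tables(example))
```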

clldutils.markup.iter_markdown_sections(text)[source]

Parse sections from a markdown formatted text.

Note

We only recognize the “#” syntax for marking section headings.

Parameters:

text (str) – markdown formatted text.

Return type:

typing.Generator[typing.Tuple[int, str, str], None, None]

Returns:

generator of (level, header, content) triples, where “level” is an int, “header” is the exact section heading (including “#”s and newline) or None, and “content” is the markdown text of the section.
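
The triples described above can be illustrated with a short regex-based sketch. sketch_iter_sections is a hypothetical helper, not the clldutils implementation. Reporting text before the first heading with level 0 and header None is an assumption, and the sketch does not skip "#" lines inside code blocks.

```python
import re
from typing import Generator, Optional, Tuple

# "#"-style heading: one to six "#", a space, the rest of the line.
HEADING = re.compile(r'^(#{1,6}) [^\n]*\n?', re.MULTILINE)


def sketch_iter_sections(
        text: str) -> Generator[Tuple[int, Optional[str], str], None, None]:
    matches = list(HEADING.finditer(text))
    # Assumption: text before the first heading gets level 0 and header None.
    first = matches[0].start() if matches else len(text)
    if text[:first]:
        yield 0, None, text[:first]
    for m, nxt in zip(matches, matches[1:] + [None]):
        # A section runs up to the next heading of any level.
        end = nxt.start() if nxt else len(text)
        yield len(m.group(1)), m.group(0), text[m.end():end]


doc = "preamble\n# Title\nintro\n## Sub\nbody\n"
sections = list(sketch_iter_sections(doc))
```

Since each header keeps its "#"s and trailing newline, concatenating the headers and contents in order reproduces the input text.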

clldutils.markup.add_markdown_text(text, new, section=None)[source]

Append markdown text to a (specific section of a) markdown document.

Parameters:
  • text (str) – markdown formatted text.

  • new (str) – markdown formatted text to be inserted into text.

  • section (typing.Union[typing.Callable, str, None]) – optionally specifies a section to which to append new. section can either be a str and then specifies the first section with a header containing section as substring; or a callable and then specifies the first section for which section returns a truthy value when passed the section header. If None, new will be appended at the end.

Return type:

str

Returns:

markdown formatted text resulting from inserting new in text.

Raises:

ValueError – The specified section was not encountered.
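
The three ways of selecting a section (substring, callable, or None) can be sketched as follows. sketch_add_text is a hypothetical helper, not the clldutils implementation. In particular, inserting new by plain concatenation, with no newline handling, is an assumption.

```python
import re
from typing import Callable, Union

HEADING = re.compile(r'^#{1,6} [^\n]*\n?', re.MULTILINE)


def sketch_add_text(
        text: str, new: str, section: Union[Callable, str, None] = None) -> str:
    if section is None:
        # No section given: append at the end of the document.
        return text + new
    # A str matches as a substring of the header; a callable is used as is.
    matcher = section if callable(section) else (lambda header: section in header)
    matches = list(HEADING.finditer(text))
    for m, nxt in zip(matches, matches[1:] + [None]):
        if matcher(m.group(0)):
            # Insert at the end of the first matching section.
            end = nxt.start() if nxt else len(text)
            return text[:end] + new + text[end:]
    raise ValueError('The specified section was not encountered.')


doc = "# A\none\n# B\ntwo\n"
by_name = sketch_add_text(doc, "extra\n", section="A")
by_callable = sketch_add_text(doc, "extra\n", section=lambda h: h.startswith("# B"))
at_end = sketch_add_text(doc, "extra\n")
```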

Functionality to detect and manipulate links in markdown text.

Note

Link detection is limited to links with no nested square brackets in the label and no nested round brackets in the url. See MarkdownLink.replace() for further limitations.

Usage:

>>> MarkdownLink.replace('[l](http://example.com)', lambda ml: ml.update_url(scheme='https'))
'[l](https://example.com)'
update_url(**comps)[source]

Updates the MarkdownLink.url according to comps.

Parameters:

comps – Recognized keywords are the names of the components of a named tuple as returned by urllib.parse.urlparse. Values should be str or dict for the keyword query.

Returns:

Updated MarkdownLink instance.
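
The component names accepted as keywords are the fields of the named tuple returned by urllib.parse.urlparse. The following sketch shows how such an update can be done with the standard library alone. sketch_update_url is a hypothetical helper operating on a plain URL string, not the MarkdownLink method itself, and encoding a dict query with urlencode is an assumption about how dict values are handled.

```python
from urllib.parse import urlencode, urlparse, urlunparse


def sketch_update_url(url: str, **comps) -> str:
    parsed = urlparse(url)
    for name, value in comps.items():
        # Valid names: scheme, netloc, path, params, query, fragment.
        if name not in parsed._fields:
            raise ValueError('unknown URL component: {}'.format(name))
        if name == 'query' and isinstance(value, dict):
            # Assumption: a dict is serialized as a query string.
            value = urlencode(value)
        parsed = parsed._replace(**{name: value})
    return urlunparse(parsed)


https_url = sketch_update_url('http://example.com', scheme='https')
with_query = sketch_update_url('http://example.com/a', query={'q': 'x y'})
new_path = sketch_update_url('url', path='xyz')
```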

classmethod replace(md, repl, simple=True, markdown_kw=None)[source]

Replace links in a markdown document.

Parameters:
  • md (str) – Markdown text.

  • repl (typing.Callable) – A callable accepting a MarkdownLink instance as sole argument. Its return value is passed to str to create the replacement content for the link.

  • simple (bool) – Flag signaling whether to use simplistic link detection or not.

  • markdown_kw (typing.Optional[dict]) – dict of keyword arguments to be used with markdown.markdown to fine-tune link detection.

Return type:

str

Returns:

Updated markdown text

Note

The default link detection is rather simplistic and does not ignore markdown links in code blocks, etc. To force more accurate (but computationally expensive) link detection, pass simple=False when calling this method, and possibly use markdown_kw to make link detection aware of particular options of the markdown implementation you want to use the output with. See https://python-markdown.github.io/reference/#markdown for details.

>>> from clldutils.markup import MarkdownLink
>>> md = '''abc
...
...     [label](url)
...
... def'''
>>> print(MarkdownLink.replace(md, lambda ml: ml.update_url(path='xyz')))
abc

    [label](xyz)

def
>>> print(MarkdownLink.replace(
...     md, lambda ml: ml.update_url(path='xyz'), simple=False))
abc

    [label](url)

def
>>> md = '''abc
... ~~~
... [label](url)
... ~~~
... def'''
>>> print(MarkdownLink.replace(
...     md, lambda ml: ml.update_url(path='xyz'), simple=False))
abc
~~~
[label](xyz)
~~~
def
>>> print(MarkdownLink.replace(
...     md,
...     lambda ml: ml.update_url(path='xyz'),
...     simple=False,
...     markdown_kw=dict(extensions=['fenced_code'])))
abc
~~~
[label](url)
~~~
def

Limitations: “Real” links are detected by running markdown.markdown and extracting a stack of links from the resulting HTML tags. Then “candidate” links are matched against these links in order. Thus, if the same link appears in a code block first and in regular text after, we will get it wrong:

>>> md = '''abc
...
...     [label](url)
...
... [label](url)'''
>>> print(MarkdownLink.replace(
...     md, lambda ml: ml.update_url(path='xyz'), simple=False))
abc

    [label](xyz)

[label](url)
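
The in-order matching behind this limitation can be simulated without running markdown.markdown. In the sketch below, sketch_replace is a hypothetical helper and real_links is a hand-written list standing in for the links extracted from the rendered HTML. The regex is a simplification that, like the detection described above, does not handle nested brackets.

```python
import re

# Candidate links: no nested square brackets in the label,
# no nested round brackets in the URL.
LINK = re.compile(r'\[(?P<label>[^\[\]]*)\]\((?P<url>[^()]*)\)')


def sketch_replace(md, repl, real_links):
    # "Real" links in document order; consumed from the front.
    stack = list(real_links)

    def _sub(m):
        candidate = (m.group('label'), m.group('url'))
        if stack and stack[0] == candidate:
            # The first candidate that equals the next real link is replaced,
            # even if it is the copy sitting inside a code block.
            stack.pop(0)
            return repl(*candidate)
        return m.group(0)

    return LINK.sub(_sub, md)


md = 'abc\n\n    [label](url)\n\n[label](url)'
# Only the link in the last paragraph is a real link in the rendered HTML.
out = sketch_replace(
    md, lambda label, url: '[{}](xyz)'.format(label), real_links=[('label', 'url')])
```

Here out has the link in the indented code block rewritten and the regular link left untouched, matching the output shown above.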