Miscellaneous utilities

This module provides miscellaneous utility functions, including backports of newer Python functionality (such as lazyproperty) and formatting helpers.

clldutils.misc.data_url(content, mimetype=None)

Returns content encoded as a base64 data URI. Useful for embedding small media resources directly in HTML pages.

Parameters:
  • content (typing.Union[bytes, str, pathlib.Path]) – bytes or str or Path

  • mimetype (str) – mimetype of the content

Return type:

str

Returns:

a str consisting only of ASCII characters
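
As a rough illustration of what such a data URI looks like, the following sketch builds one with the standard library only. The helper name data_url_sketch, the handling of Path and str input, and the application/octet-stream fallback are assumptions made for this example; it is not the library's implementation.

```python
import base64
import pathlib


def data_url_sketch(content, mimetype=None):
    """Hypothetical stand-in that mimics the documented behaviour."""
    if isinstance(content, pathlib.Path):
        content = content.read_bytes()  # assumption: a Path is read as a file
    elif isinstance(content, str):
        content = content.encode('utf8')  # assumption: str is encoded as UTF-8
    # Assumed fallback mimetype when none is given.
    mimetype = mimetype or 'application/octet-stream'
    payload = base64.b64encode(content).decode('ascii')
    return 'data:{0};base64,{1}'.format(mimetype, payload)


url = data_url_sketch('hello', mimetype='text/plain')
print(url)  # data:text/plain;base64,aGVsbG8=
```

The result can be placed directly into an src or href attribute, which avoids a separate request for small resources.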

clldutils.misc.log_or_raise(msg, log=None, level='warning', exception_cls=ValueError)

Utility for check procedures. If log is None, the first issue raises exception_cls(msg), so the check stops immediately (like pytest -x); otherwise the issue is only logged at the given level.

>>> from clldutils.misc import log_or_raise
>>> log_or_raise("there's a problem")
Traceback (most recent call last):
...
ValueError: there's a problem
>>> import logging
>>> log_or_raise("there's a problem", log=logging.getLogger(__name__))
there's a problem

Parameters:

msg (str) – the message to log or to raise as an exception
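
The two modes can be sketched as follows. This is a hedged re-statement of the behaviour described above, not the library source; log_or_raise_sketch, ListHandler and the logger name are hypothetical names introduced for the example.

```python
import logging


def log_or_raise_sketch(msg, log=None, level='warning', exception_cls=ValueError):
    """Hypothetical re-statement of the documented behaviour."""
    if log is None:
        # No logger: fail fast on the first problem.
        raise exception_cls(msg)
    # Logger given: call the method named by `level`, e.g. log.warning(msg).
    getattr(log, level)(msg)


class ListHandler(logging.Handler):
    """Collects log messages in a list so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logger = logging.getLogger('check_sketch')
logger.addHandler(handler)

# A check procedure that reports every empty value instead of stopping.
for value in ['', 'ok', '']:
    if not value:
        log_or_raise_sketch('empty value', log=logger)

print(handler.messages)  # ['empty value', 'empty value']
```

Passing a logger lets a check run to completion and report all issues; omitting it turns the same check into a strict validation.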

clldutils.misc.nfilter(seq)

Replacement for Python 2's filter(None, seq).

Return type:

list

Returns:

a list filtered from seq containing only truthy items.

Parameters:

seq (typing.Iterable) – the items to filter
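
In Python 3, filter() returns an iterator rather than a list, which is why a replacement is useful. A minimal sketch of the equivalent behaviour, using the hypothetical name nfilter_sketch:

```python
def nfilter_sketch(seq):
    """Return a list with only the truthy items of seq (illustrative only)."""
    return [item for item in seq if item]


result = nfilter_sketch([0, 1, '', 'a', None, [], [0]])
print(result)  # [1, 'a', [0]]
```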

clldutils.misc.to_binary(s, encoding='utf8')

Casts s to bytes, encoding str input with the given encoding.

Parameters:

s (typing.Union[str, bytes]) – object to be converted to bytes.

Return type:

bytes
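
A small sketch of such a cast is shown below. That bytes input is returned unchanged is an assumption of this example, as is the name to_binary_sketch.

```python
def to_binary_sketch(s, encoding='utf8'):
    """Illustrative cast of str or bytes to bytes."""
    if isinstance(s, bytes):
        return s  # assumption: bytes pass through unchanged
    return s.encode(encoding)


print(to_binary_sketch('äb'))  # b'\xc3\xa4b'
```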

clldutils.misc.dict_merged(d, _filter=None, **kw)

Updates dictionary d with those items passed as kw whose values pass _filter, and returns the updated dictionary.

>>> from clldutils.misc import dict_merged
>>> dict_merged({'a': 1}, b=2, c=3, _filter=lambda v: v > 2)
{'a': 1, 'c': 3}

clldutils.misc.NO_DEFAULT = <NoDefault>

A singleton which can be used to distinguish no-argument-passed from None passed as argument in callables with optional arguments.
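
The sentinel pattern this describes can be sketched as follows. The example defines its own stand-in sentinel so that it is self-contained; _NoDefault, get_setting and the settings dictionary are hypothetical names, not part of the library.

```python
class _NoDefault:
    """Stand-in sentinel class, defined here only for illustration."""

    def __repr__(self):
        return '<NoDefault>'


NO_DEFAULT = _NoDefault()


def get_setting(settings, key, default=NO_DEFAULT):
    """Look up key; raise only if the caller passed no default at all."""
    if key in settings:
        return settings[key]
    if default is NO_DEFAULT:
        raise KeyError(key)
    return default  # may legitimately be None


settings = {'debug': True}
print(get_setting(settings, 'timeout', default=None))  # None
```

With None as the default marker, the call above could not be told apart from a call that passed no default.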

clldutils.misc.xmlchars(text)

Not all of UTF-8 is considered valid character data in XML …

Thus, this function can be used to remove illegal characters from text.

Parameters:

text (str) – the text to clean

Return type:

str
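
One way to remove such characters is sketched below. It assumes that the illegal characters in question are the control characters excluded by the Char production of XML 1.0 (everything below U+0020 except tab, line feed and carriage return); xmlchars_sketch is a hypothetical name, and the library may cover a different set.

```python
import re

# Control characters that XML 1.0 does not allow as character data
# (tab \x09, line feed \x0a and carriage return \x0d are allowed).
_ILLEGAL_XML_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')


def xmlchars_sketch(text):
    """Strip the control characters matched above from text."""
    return _ILLEGAL_XML_CHARS.sub('', text)


cleaned = xmlchars_sketch('a\x00b\x0bc\td')
print(repr(cleaned))  # 'abc\td'
```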

clldutils.misc.format_size(num)

Format byte-sizes for human readability.

Cf. the -h option of the du command:

-h, --human-readable    print sizes in human readable format (e.g., 1K 234M 2G)

Parameters:

num (int) – Size given as number of bytes.

Return type:

str
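
The usual approach is to divide by 1024 until the number fits the current unit. The sketch below illustrates this; the unit labels, the one-decimal output format and the name format_size_sketch are assumptions of the example, and the library's exact output may differ.

```python
def format_size_sketch(num):
    """Format a byte count with a binary unit suffix (illustrative only)."""
    for unit in ['bytes', 'KB', 'MB', 'GB']:
        if -1024.0 < num < 1024.0:
            return '{0:.1f}{1}'.format(num, unit)
        num /= 1024.0  # move on to the next larger unit
    return '{0:.1f}{1}'.format(num, 'TB')


print(format_size_sketch(1536))  # 1.5KB
print(format_size_sketch(5 * 1024 ** 3))  # 5.0GB
```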

clldutils.misc.slug(s, remove_whitespace=True, lowercase=True)

Condenses a string to contain only (lowercase) alphanumeric characters.

>>> from clldutils.misc import slug
>>> slug('Some words!')
'somewords'
>>> slug('Some words!', lowercase=False)
'Somewords'
>>> slug('Some words!', remove_whitespace=False)
'some words'

Parameters:
  • s (str) – the string to condense

  • remove_whitespace (bool) – whether to remove whitespace

  • lowercase (bool) – whether to convert the result to lowercase

Return type:

str

clldutils.misc.encoded(string, encoding='utf-8')

Casts string to bytes in a specific encoding, with some guessing about the encoding.

Parameters:
  • encoding – the encoding the returned bytes are forced into

  • string (typing.Union[str, bytes]) – the object to encode

Return type:

bytes
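
One plausible reading of "some guessing" is sketched below: bytes that already decode in the target encoding are kept, and anything else is assumed to be latin1. Both the latin1 fallback and the name encoded_sketch are assumptions of this example rather than documented behaviour.

```python
def encoded_sketch(string, encoding='utf-8'):
    """Force str or bytes into bytes in the given encoding (illustrative)."""
    if isinstance(string, str):
        return string.encode(encoding)
    try:
        # Bytes that decode cleanly are taken to be in the target encoding.
        string.decode(encoding)
        return string
    except UnicodeDecodeError:
        # Assumed fallback: treat the input as latin1 and re-encode it.
        return string.decode('latin1').encode(encoding)


print(encoded_sketch('ä'))      # b'\xc3\xa4'
print(encoded_sketch(b'\xe4'))  # b'\xc3\xa4'
```

latin1 is a convenient fallback because every byte sequence decodes in it, so the function never fails on bytes input.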

class clldutils.misc.lazyproperty(fget)

Non-data descriptor caching the computed result as instance attribute.

>>> from clldutils.misc import lazyproperty
>>> class Spam(object):
...     @lazyproperty
...     def eggs(self):
...         return 'spamspamspam'
>>> spam=Spam(); spam.eggs
'spamspamspam'
>>> spam.eggs='eggseggseggs'; spam.eggs
'eggseggseggs'
>>> Spam().eggs
'spamspamspam'
>>> Spam.eggs
<...lazyproperty object at 0x...>
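
The caching mechanism can be sketched as follows. Because the descriptor defines only __get__, a value stored in the instance's __dict__ under the same name takes precedence on later lookups. lazyproperty_sketch is a hypothetical re-implementation for illustration, not the library source.

```python
class lazyproperty_sketch:
    """Non-data descriptor that caches the result on the instance."""

    def __init__(self, fget):
        self.fget = fget
        self.name = fget.__name__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class, not on an instance
        value = self.fget(instance)
        # Shadow the descriptor: later lookups hit instance.__dict__ directly.
        instance.__dict__[self.name] = value
        return value


class Spam:
    calls = 0

    @lazyproperty_sketch
    def eggs(self):
        Spam.calls += 1
        return 'spamspamspam'


spam = Spam()
first, second = spam.eggs, spam.eggs
print(first, Spam.calls)  # spamspamspam 1
```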

Note

Python 3.8 added the functools.cached_property decorator (see https://docs.python.org/3/library/functools.html#functools.cached_property), so this class will be deprecated once Python 3.7 is no longer supported.
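
For code that only needs to run on Python 3.8 or later, the standard-library decorator gives the same caching behaviour. A short sketch, in which the Spam class and its calls counter are illustrative only:

```python
from functools import cached_property  # available from Python 3.8


class Spam:
    def __init__(self):
        self.calls = 0

    @cached_property
    def eggs(self):
        self.calls += 1  # count how often the value is computed
        return 'spamspamspam'


spam = Spam()
spam.eggs
spam.eggs
print(spam.calls)  # 1

# Like lazyproperty, the cached value can be overwritten per instance.
spam.eggs = 'eggseggseggs'
print(spam.eggs)  # eggseggseggs
```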