Handling SIL Standard Format (SFM) files

SIL’s Standard Format (SFM) files are used natively for Toolbox. Applications which can export in a SFM format include ELAN and Flex. In absence of other export formats, the need to read or write SFM files persists.

The (somewhat simplistic) SFM implementation provided in this module supports

  • multiline values

  • custom entry separator

Usage:

>>> from clldutils.sfm import SFM
>>> sfm = SFM.from_string('''\ex Yax bo’on ta sna Antonio.
... \exEn I’m going to Antonio’s house.
... \ex Ban yax ba’at?
... \exEn Where are you going?
... \exFr Ou allez-vous?''')
>>> sfm[0].markers()
Counter({'ex': 2, 'exEn': 2, 'exFr': 1})
>>> sfm[0].get('exFr')
'Ou allez-vous?'
class clldutils.sfm.Entry(iterable=(), /)[source]

We store entries in SFM files as lists of (marker, value) pairs.

markers()[source]

Map of markers to frequency counts.

Return type:

collections.Counter

get(key, default=None)[source]

Retrieve the first value for a marker or None.

Return type:

str

getall(key)[source]

Retrieve all values for a marker.

Return type:

typing.List[str]

class clldutils.sfm.SFM(iterable=(), /)[source]

A list of Entries

Simple usage to normalize a sfm file:

>>> sfm = SFM.from_file(fname, marker_map={'lexeme': 'lx'})
>>> sfm.write(fname)
classmethod from_file(filename, **kw)[source]

Initialize a SFM object from the contents of a file.

classmethod from_string(text, marker_map=None, entry_impl=<class 'clldutils.sfm.Entry'>, **kw)[source]

Initialize a SFM object from a SFM formatted string.

Parameters:
  • text (str) –

  • marker_map (typing.Optional[typing.Dict[str, str]]) –

read(filename, encoding='utf-8', marker_map=None, entry_impl=<class 'clldutils.sfm.Entry'>, entry_sep='\\n\\n', entry_prefix=None, keep_empty=False)[source]

Extend the entry list by parsing new entries from a file.

Parameters:
  • filename

  • encoding

  • marker_map (typing.Optional[typing.Dict[str, str]]) – A dict used to map marker names.

  • entry_impl – Subclass of Entry or None

  • entry_sep (str) –

  • entry_prefix (typing.Optional[str]) –

  • keep_empty (bool) –

visit(visitor)[source]

Run visitor on each entry.

Parameters:

visitor (typing.Callable) –

write(filename, encoding='utf-8')[source]

Write the list of entries to a file.

Parameters:
  • filename

  • encoding