Text Syllabification

Note

Available as wordseg.syllabification in Python and as wordseg-syll in Bash.

Estimates syllable boundaries on a text using the maximal onset principle.

This algorithm fully syllabifies a text from a list of onsets and a list of vowels. Input text must be in orthographic form (with word separators only) or in phonemized form (with both word and phone separators). Output text has syllable separators added at the estimated syllable boundaries. For examples of vowel and onset files, see the directory wordseg/data/syllabification.
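The maximal onset principle described above can be illustrated with a short, self-contained sketch. This is not the wordseg implementation: the function name syllabify_word, the representation of onsets as tuples of phones, and the toy inventories are assumptions made for illustration only. Each vowel is treated as a syllable nucleus, and the longest valid onset found among the consonants to its left is attached to it; any remaining consonants become the coda of the preceding syllable.

```python
def syllabify_word(phones, onsets, vowels):
    """Split one word (a sequence of phones) into syllables.

    Hypothetical helper, not part of wordseg. `onsets` is a set of
    tuples of phones, `vowels` a set of phones.
    """
    phones = list(phones)
    syllables = []
    end = len(phones)  # exclusive end of the syllable being built
    i = len(phones) - 1
    while i >= 0:
        if phones[i] not in vowels:
            i -= 1
            continue
        # phones[i] is a nucleus: locate the consonant cluster on its left
        cluster_start = i
        while cluster_start > 0 and phones[cluster_start - 1] not in vowels:
            cluster_start -= 1
        # maximal onset: longest suffix of the cluster that is a valid onset
        start = i
        for k in range(cluster_start, i):
            if tuple(phones[k:i]) in onsets:
                start = k
                break
        syllables.append(phones[start:end])
        end = start
        i = start - 1
    if not syllables or end != 0:
        # no vowel at all, or word-initial consonants that form no valid onset
        raise ValueError(f"cannot syllabify {phones!r}")
    return syllables[::-1]


# Toy inventories (assumptions, not taken from the wordseg data files)
vowels = {"a", "i"}
onsets = {("s",), ("t",), ("r",), ("t", "r"), ("s", "t", "r")}

result = syllabify_word("astra", onsets, vowels)
print(".".join("".join(syl) for syl in result))  # a.stra
```

With the onset ("s", "t", "r") removed from the list, the same word would come out as as.tra, which shows how strongly the result depends on the onset list supplied.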

class wordseg.syllabification.Syllabifier(onsets, vowels, separator=<wordseg.separator.Separator object>, filling_vowel=False, log=<RootLogger root (WARNING)>)[source]

Bases: object

Syllabify a text given in phonological or orthographic form

Syllabification errors can occur when the onsets and/or vowels do not match the input text (see the tolerant parameter of syllabify).

Parameters
  • onsets (list) – The list of valid onsets in the text

  • vowels (list) – The list of vowels in the text

  • separator (Separator, optional) – Token separation in the text

  • filling_vowel (bool, optional) – When True, append a silent vowel to the end of words that contain no vowel (the vowel is removed after processing, so the text is unchanged). When False, those words cannot be syllabified.

  • log (logging.Logger, optional) – Where to send log messages

Raises

ValueError – If onsets or vowels are empty or are not lists.
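The filling-vowel behaviour described in the parameter list can be sketched as a thin wrapper around any per-word syllabifier. This is an assumption-laden illustration, not the library's code: FILLER, syllabify_with_filler and the toy one_syllable function are hypothetical names introduced here.

```python
FILLER = "<fill>"  # hypothetical placeholder phone, assumed absent from the text


def syllabify_with_filler(phones, vowels, syllabify_word):
    """Syllabify a word, adding a temporary vowel if it has none.

    `syllabify_word(phones, vowels)` is any function returning a list of
    syllables (lists of phones) and raising ValueError on vowel-less words.
    """
    phones = list(phones)
    if any(p in vowels for p in phones):
        return syllabify_word(phones, vowels)
    # Append the silent vowel so the word has a nucleus
    syllables = syllabify_word(phones + [FILLER], set(vowels) | {FILLER})
    # Remove the silent vowel again so the text itself is unchanged
    return [[p for p in syl if p != FILLER] for syl in syllables]


def one_syllable(phones, vowels):
    """Toy syllabifier: the whole word is a single syllable."""
    if not any(p in vowels for p in phones):
        raise ValueError(f"no vowel in {phones!r}")
    return [list(phones)]


print(syllabify_with_filler(["s", "t"], {"a"}, one_syllable))  # [['s', 't']]
```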

static open_datafile(data_file)[source]

Read a vowel or onsets file as a list

syllabify(text, strip=False, tolerant=False)[source]

Returns the text with syllable boundaries added

Parameters
  • text (sequence) – The input text to be syllabified. Each element of the sequence is assumed to be a single and complete utterance in valid phonological form.

  • strip (bool, optional) – When True, removes the syllable boundary at the end of words.

  • tolerant (bool, optional) – When False (the default), the function raises a ValueError on the first utterance that cannot be correctly syllabified. When True, failed utterances are omitted from the output and a warning is logged instead.

Returns

The text with estimated syllable boundaries added. If tolerant is True, some utterances may be missing from the output.

Raises

ValueError – If an utterance has not been correctly syllabified (and tolerant is False), if separator.syllable is found in the text, or if onsets or vowels are empty.
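The difference between the strict and tolerant modes can be sketched as follows. This is a minimal illustration of the documented behaviour, not the wordseg source: syllabify_text, syllabify_one and the toy function are hypothetical names, and the message format is an assumption.

```python
import logging


def syllabify_text(utterances, syllabify_one, tolerant=False,
                   log=logging.getLogger(__name__)):
    """Apply `syllabify_one` to each utterance.

    In strict mode the first failure raises ValueError. In tolerant mode
    the failed utterance is dropped from the output and a warning is logged.
    """
    output = []
    for n, utterance in enumerate(utterances, start=1):
        try:
            output.append(syllabify_one(utterance))
        except ValueError as err:
            if not tolerant:
                raise ValueError(f"utterance {n}: {err}") from err
            log.warning("utterance %d skipped: %s", n, err)
    return output


def toy(utterance):
    """Toy stand-in for a real syllabifier: fails on vowel-less input."""
    if not any(c in "aeiou" for c in utterance):
        raise ValueError(f"no vowel in {utterance!r}")
    return utterance + "|"


print(syllabify_text(["ba", "st", "ki"], toy, tolerant=True))  # ['ba|', 'ki|']
```

In tolerant mode the output can be shorter than the input, so callers that need to align input and output utterances should not rely on matching line counts.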