Transitional Probabilities

Note

wordseg.algos.tp in Python, wordseg-tp in Bash.

Transitional Probabilities word segmentation

wordseg.algos.tp.segment(text, train_text=None, threshold='relative', dependency='ftp', log=<RootLogger root (WARNING)>)

Returns a word-segmented version of text using the TP algorithm.

The parameters text and train_text must be formatted as follows: a sequence of lines with syllable (or phoneme) boundaries marked by spaces and no word boundaries. Each line in the sequence corresponds to a single and complete utterance.
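As an illustration of this input format, the sketch below builds such a sequence from a syllabified corpus. The helper to_unit_line and the sample data are hypothetical and are not part of wordseg; they only show what a correctly formatted text looks like.

```python
def to_unit_line(words):
    """Flatten one utterance into a space-separated line of syllables.

    `words` is a list of words, each given as a list of syllables.
    Word boundaries are deliberately dropped: only syllable
    boundaries (spaces) remain in the result.
    """
    return " ".join(syllable for word in words for syllable in word)


# Hypothetical sample data: two utterances with known word boundaries.
gold = [
    [["the"], ["ba", "by"], ["is"], ["hap", "py"]],
    [["a"], ["pre", "tty"], ["ba", "by"]],
]

# One line per utterance, syllables separated by spaces.
text = [to_unit_line(utterance) for utterance in gold]
print(text)  # ['the ba by is hap py', 'a pre tty ba by']
```

A list like text is the kind of sequence expected for both the text and train_text parameters.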

Parameters
  • text (sequence) – The text to segment into words

  • train_text (sequence, optional) – The text used to train the model (estimation of transitional probabilities). If not specified, text is used.

  • threshold (str, optional) – Type of threshold to use, must be ‘relative’ or ‘absolute’.

  • dependency (str, optional) – Type of dependency measure to compute, must be ‘ftp’ for forward transitional probability, ‘btp’ for backward transitional probability or ‘mi’ for mutual information.

  • log (logging.Logger, optional) – The logger instance to which messages are sent.
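To make the three dependency measures concrete, here is a minimal sketch that computes them from unigram and bigram counts using the textbook definitions. The function dependency_scores is hypothetical and is not wordseg's implementation. In particular, it counts bigrams within utterances only, and the library may handle utterance boundaries or normalize mutual information differently.

```python
import math
from collections import Counter


def dependency_scores(utterances, dependency="ftp"):
    """Return a dict mapping each observed bigram (a, b) to its score.

    Assumed (textbook) definitions, not taken from wordseg:
      ftp: count(a b) / count(a)
      btp: count(a b) / count(b)
      mi:  log2(P(a b) / (P(a) * P(b)))
    """
    if dependency not in ("ftp", "btp", "mi"):
        raise ValueError("dependency must be 'ftp', 'btp' or 'mi'")

    unigrams = Counter()
    bigrams = Counter()
    for line in utterances:
        units = line.split()
        unigrams.update(units)
        # Pairs of adjacent units inside a single utterance.
        bigrams.update(zip(units, units[1:]))

    n_unigrams = sum(unigrams.values())
    n_bigrams = sum(bigrams.values())

    scores = {}
    for (a, b), count in bigrams.items():
        if dependency == "ftp":
            scores[a, b] = count / unigrams[a]
        elif dependency == "btp":
            scores[a, b] = count / unigrams[b]
        else:
            p_ab = count / n_bigrams
            p_a = unigrams[a] / n_unigrams
            p_b = unigrams[b] / n_unigrams
            scores[a, b] = math.log2(p_ab / (p_a * p_b))
    return scores


# Hypothetical toy corpus in the input format described above.
corpus = ["ba by ba by", "ba by go"]
for measure in ("ftp", "btp", "mi"):
    print(measure, dependency_scores(corpus, measure))
```

In this toy corpus "ba" is always followed by "by", so the forward transitional probability of that pair is 1, while "by" is followed by three different continuations and its outgoing transitions score lower.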

Returns

The utterances from text with estimated word boundaries.

Return type

list

Raises

ValueError – If threshold is not ‘relative’ or ‘absolute’, or if dependency is not ‘ftp’, ‘btp’ or ‘mi’.
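The following sketch illustrates how a TP-based segmenter of this shape might behave end to end, including the ValueError on an invalid threshold. It is an illustration under stated assumptions, not wordseg's code. The function tp_segment is hypothetical and uses forward transitional probabilities only. It assumes that 'relative' places a boundary at local minima of the TP and that 'absolute' places a boundary where the TP falls below the corpus mean. It also assumes that the output joins the units of each word and separates words with spaces.

```python
from collections import Counter


def tp_segment(text, threshold="relative"):
    """Segment utterances at low forward transitional probabilities."""
    if threshold not in ("relative", "absolute"):
        raise ValueError("threshold must be 'relative' or 'absolute'")

    lines = [line.split() for line in text]
    unigrams = Counter(u for units in lines for u in units)
    bigrams = Counter(p for units in lines for p in zip(units, units[1:]))
    ftp = {(a, b): n / unigrams[a] for (a, b), n in bigrams.items()}
    mean_tp = sum(ftp.values()) / len(ftp) if ftp else 0.0

    segmented = []
    for units in lines:
        # tps[i] is the TP between units[i] and units[i + 1].
        tps = [ftp[pair] for pair in zip(units, units[1:])]
        out = list(units[:1])
        for i, unit in enumerate(units[1:]):
            if threshold == "absolute":
                # Assumption: boundary where the TP is below the mean.
                boundary = tps[i] < mean_tp
            else:
                # Assumption: boundary at a local minimum. A missing
                # neighbour at an utterance edge counts as +infinity.
                left = tps[i - 1] if i > 0 else float("inf")
                right = tps[i + 1] if i + 1 < len(tps) else float("inf")
                boundary = tps[i] < left and tps[i] < right
            out.append(" " + unit if boundary else unit)
        segmented.append("".join(out))
    return segmented


# Hypothetical toy corpus: "ba by" always co-occurs, "go" varies.
corpus = ["ba by go", "ba by ba by", "go ba by"]
print(tp_segment(corpus))  # ['baby go', 'baby baby', 'go baby']

try:
    tp_segment(corpus, threshold="median")
except ValueError as error:
    print("rejected:", error)
```

The result is a list with one string per input utterance, matching the documented return type.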