Transitional Probabilities

Note

wordseg.algos.tp in Python, wordseg-tp in Bash.

Transitional Probabilities word segmentation

wordseg.algos.tp.segment(text, train_text=None, threshold='relative', dependency='ftp', log=<RootLogger root (WARNING)>)

Returns a word-segmented version of text using the TP algorithm.

The parameters text and train_text must both be formatted as a sequence of lines in which syllable (or phoneme) boundaries are marked by spaces and word boundaries are not marked at all. Each line in the sequence corresponds to a single, complete utterance.
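To make the expected input format concrete, here is a minimal sketch. The sample utterances and the helper to_units are hypothetical and are not part of wordseg; they only show that each line is one utterance whose units are separated by spaces.

```python
# Hypothetical sample data: each string is one complete utterance.
# Syllables are separated by single spaces; word boundaries are absent.
text = [
    "hi the re",
    "the ki tty",
]


def to_units(utterances):
    """Split each utterance into its list of syllable (or phoneme) units.

    Illustrative helper only, not a wordseg function.
    """
    return [utterance.split() for utterance in utterances]


units = to_units(text)
```

A list of such strings (or any other sequence of lines) is the kind of value the description above calls for in both text and train_text.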

Parameters

text (sequence) – The text to segment into words
train_text (sequence, optional) – The text used to train the model, that is, to estimate the transitional probabilities. If not specified, text itself is used.
threshold (str, optional) – Type of threshold to use; must be ‘relative’ or ‘absolute’.
dependency (str, optional) – Type of dependency measure to compute; must be ‘ftp’ for forward transitional probability, ‘btp’ for backward transitional probability, or ‘mi’ for mutual information.
log (logging.Logger, optional) – The logger instance to which messages are sent.
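The three dependency measures can be illustrated with a short, self-contained sketch. It assumes the usual textbook definitions estimated from raw counts: forward TP as count(xy) / count(x), backward TP as count(xy) / count(y), and mutual information as log2(P(xy) / (P(x) P(y))). The function name dependency_scores and the exact estimators are assumptions made for illustration; the wordseg implementation may differ in its details.

```python
import math
from collections import Counter


def dependency_scores(utterances, dependency="ftp"):
    """Estimate a dependency score for every adjacent pair of units.

    `utterances` is a list of lists of units (syllables or phonemes).
    Illustrative estimator only, not the wordseg implementation.
    """
    if dependency not in ("ftp", "btp", "mi"):
        raise ValueError("dependency must be 'ftp', 'btp' or 'mi'")

    # Count single units and adjacent pairs within each utterance.
    unigrams = Counter(u for utt in utterances for u in utt)
    bigrams = Counter(p for utt in utterances for p in zip(utt, utt[1:]))
    n_unigrams = sum(unigrams.values())
    n_bigrams = sum(bigrams.values())

    scores = {}
    for (x, y), count in bigrams.items():
        if dependency == "ftp":
            # How predictable y is, given that x was just seen.
            scores[(x, y)] = count / unigrams[x]
        elif dependency == "btp":
            # How predictable x is, given that y follows.
            scores[(x, y)] = count / unigrams[y]
        else:
            # Mutual information between x and y, in bits.
            p_xy = count / n_bigrams
            p_x = unigrams[x] / n_unigrams
            p_y = unigrams[y] / n_unigrams
            scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores
```

In this sketch pairs are never counted across utterance boundaries, which matches the statement that each line is a complete utterance.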

Returns

The utterances from text with estimated word boundaries.

Return type

list
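As a rough illustration of how a threshold might turn dependency scores into the returned word boundaries, consider the sketch below. It rests on assumptions that the text above does not confirm: that ‘relative’ places a boundary at a local minimum of the score, and that ‘absolute’ places one wherever the score falls below a cutoff such as the mean score. The function place_boundaries, its cutoff parameter, and the handling of utterance edges are all hypothetical.

```python
def place_boundaries(units, scores, threshold="relative", cutoff=None):
    """Join `units` into words, cutting where the dependency score dips.

    `scores[i]` is the dependency between units[i] and units[i + 1].
    Illustrative only: the thresholding rules are assumptions.
    """
    if threshold not in ("relative", "absolute"):
        raise ValueError("threshold must be 'relative' or 'absolute'")
    if len(units) < 2:
        return "".join(units)
    if threshold == "absolute" and cutoff is None:
        # Assumption: fall back to the mean of the scores given here.
        cutoff = sum(scores) / len(scores)

    words, current = [], [units[0]]
    for i, score in enumerate(scores):
        if threshold == "absolute":
            cut = score < cutoff
        else:
            # Local minimum; in this sketch the first and last
            # transitions lack a neighbour and are never cut.
            has_both = 0 < i < len(scores) - 1
            cut = has_both and score < scores[i - 1] and score < scores[i + 1]
        if cut:
            words.append("".join(current))
            current = []
        current.append(units[i + 1])
    words.append("".join(current))
    return " ".join(words)
```

Applying such a function to every utterance would yield a list with one segmented string per input line, which is the shape of result described above.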

Raises

ValueError – If threshold is not ‘relative’ or ‘absolute’, or if dependency is not ‘ftp’, ‘btp’ or ‘mi’.