Available as wordseg.algo.tp in Python and as wordseg-tp in bash.
Transitional Probabilities word segmentation
segment(text, train_text=None, threshold='relative', dependency='ftp', log=<RootLogger root (WARNING)>)
Returns a word-segmented version of text using the TP algorithm.
The parameters text and train_text must be formatted as follows: a sequence of lines with syllable (or phoneme) boundaries marked by spaces and no word boundaries. Each line in the sequence corresponds to a single, complete utterance.
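As an illustration of this input format, the sketch below flattens syllabified utterances into lines of space-separated syllables with no word boundaries. The helper name to_tp_input and the nested-list input shape are assumptions made for this example; they are not part of wordseg.

```python
def to_tp_input(utterances):
    """Flatten syllabified utterances into space-separated syllable lines.

    Hypothetical helper, not part of wordseg. `utterances` is a list of
    utterances; each utterance is a list of words, and each word is a
    list of syllable strings.
    """
    lines = []
    for words in utterances:
        # Drop the word level: keep only the syllables, in order.
        syllables = [syll for word in words for syll in word]
        lines.append(' '.join(syllables))
    return lines


corpus = [
    [['pre', 'tty'], ['ba', 'by']],
    [['the'], ['ba', 'by']],
]
tp_input = to_tp_input(corpus)
# tp_input == ['pre tty ba by', 'the ba by']
```

A list such as tp_input has the shape described for the text and train_text parameters: one string per utterance, with spaces between syllables only.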
- Parameters
text (sequence) – The text to segment into words.
train_text (sequence, optional) – The text used to train the model (estimation of the transitional probabilities). If not specified, text is used.
threshold (str, optional) – Type of threshold to use, must be ‘relative’ or ‘absolute’.
dependency (str, optional) – Type of dependency measure to compute, must be ‘ftp’ for forward transitional probability, ‘btp’ for backward transitional probability or ‘mi’ for mutual information.
log (logging.Logger, optional) – The logging instance to which messages are sent.
- Returns
The utterances from text with estimated word boundaries.
- Return type
- Raises
ValueError – If threshold is not ‘relative’ or ‘absolute’, or if dependency is not ‘ftp’, ‘btp’ or ‘mi’.
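To make the threshold and dependency options concrete, here is a minimal, self-contained sketch of a TP-style segmenter. It is not wordseg's implementation. The following are assumptions made for illustration: the function name tp_segment_sketch, the reading of ‘relative’ as "cut at a local minimum of the score", the reading of ‘absolute’ as "cut where the score is below the mean over distinct bigrams", and the handling of utterance edges. The sketch estimates counts from text itself and omits train_text and log.

```python
from collections import Counter
from math import inf, log2


def tp_segment_sketch(text, threshold='relative', dependency='ftp'):
    """Toy TP segmenter (illustrative only, not wordseg's code)."""
    if threshold not in ('relative', 'absolute'):
        raise ValueError(f'invalid threshold: {threshold!r}')
    if dependency not in ('ftp', 'btp', 'mi'):
        raise ValueError(f'invalid dependency: {dependency!r}')

    utterances = [line.split() for line in text]
    unigrams = Counter(u for utt in utterances for u in utt)
    bigrams = Counter(p for utt in utterances for p in zip(utt, utt[1:]))

    def score(x, y):
        pair = bigrams[(x, y)]
        if dependency == 'ftp':   # forward TP: count(xy) / count(x)
            return pair / unigrams[x]
        if dependency == 'btp':   # backward TP: count(xy) / count(y)
            return pair / unigrams[y]
        # 'mi': mutual information computed on raw counts, which differs
        # from the probability-based value only by an additive constant.
        return log2(pair / (unigrams[x] * unigrams[y]))

    # Assumption: the 'absolute' cutoff is the mean over distinct bigrams.
    mean = sum(score(x, y) for x, y in bigrams) / len(bigrams) if bigrams else 0.0

    def boundaries(scores):
        if threshold == 'absolute':
            return [s < mean for s in scores]
        # Assumption: 'relative' cuts at strict local minima, treating the
        # missing neighbours at utterance edges as +infinity.
        padded = [inf] + scores + [inf]
        return [padded[i - 1] > padded[i] < padded[i + 1]
                for i in range(1, len(padded) - 1)]

    segmented = []
    for utt in utterances:
        cuts = boundaries([score(x, y) for x, y in zip(utt, utt[1:])])
        words, current = [], utt[:1]
        for unit, cut in zip(utt[1:], cuts):
            if cut:  # close the current word before this unit
                words.append(''.join(current))
                current = []
            current.append(unit)
        words.append(''.join(current))
        segmented.append(' '.join(words))
    return segmented


corpus = ['pre tty ba by', 'the ba by', 'pre tty do ggy', 'the do ggy']
result = tp_segment_sketch(corpus)
# result == ['pretty baby', 'the baby', 'pretty doggy', 'the doggy']
```

On this toy corpus the within-word pairs (for example pre tty) always co-occur, while the cross-word pairs (for example tty ba) do not, so every combination of threshold and dependency places the same boundaries. On real data the choices generally diverge.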