Segmentation Evaluation
Note
wordseg.evaluate in Python, wordseg-eval in bash.
Word segmentation evaluation
Evaluates a segmented text against its gold version and outputs the precision, recall and f-score at type, token and boundary levels. Boundary scores are reported in two variants, depending on whether utterance edges (the beginning and end of each utterance) are counted.
The evaluation optionally computes the adjusted Rand index (requires the prepared text to be provided) and a summary of which words are segmented correctly or incorrectly (requires an output JSON file to be specified).
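To make the token and type levels concrete, here is a minimal pure-Python sketch. It is not the wordseg implementation: the helper names (word_spans, precision_recall_fscore, token_scores, type_scores) are hypothetical, and it assumes that a token counts as correct when both of its edges match the gold word, and that types are compared as plain sets of word forms.

```python
def word_spans(utterance):
    """Return the (start, stop) spans of each word, indexed with spaces removed."""
    spans, start = set(), 0
    for word in utterance.split():
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def precision_recall_fscore(n_correct, n_predicted, n_gold):
    """Standard precision, recall and harmonic-mean f-score from raw counts."""
    precision = n_correct / n_predicted if n_predicted else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    total = precision + recall
    fscore = 2 * precision * recall / total if total else 0.0
    return precision, recall, fscore


def token_scores(text, gold):
    """Token level: a word is correct only if its span matches a gold span."""
    correct = predicted = expected = 0
    for text_utt, gold_utt in zip(text, gold):
        text_spans, gold_spans = word_spans(text_utt), word_spans(gold_utt)
        correct += len(text_spans & gold_spans)
        predicted += len(text_spans)
        expected += len(gold_spans)
    return precision_recall_fscore(correct, predicted, expected)


def type_scores(text, gold):
    """Type level: compare the sets of distinct word forms (assumption)."""
    text_types = {word for utt in text for word in utt.split()}
    gold_types = {word for utt in gold for word in utt.split()}
    return precision_recall_fscore(
        len(text_types & gold_types), len(text_types), len(gold_types))


text = ['the dog', 'thedog barks']
gold = ['the dog', 'the dog barks']

print(token_scores(text, gold))  # precision 0.75, recall 0.6
print(type_scores(text, gold))   # precision 0.75, recall 1.0
```

In this toy input the undersegmented word "thedog" lowers token recall (two gold tokens are missed) but only type precision (every gold type still appears somewhere in the segmented text).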
-
class wordseg.evaluate.BoundaryEvaluation
Bases: wordseg.evaluate.TokenEvaluation
Evaluation of boundary f-score, precision and recall
Includes first and last boundary of an utterance
-
class wordseg.evaluate.BoundaryNoEdgeEvaluation
Bases: wordseg.evaluate.BoundaryEvaluation
Evaluation of boundary f-score, precision and recall
Excludes first and last boundary of an utterance
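The difference between the two boundary variants can be sketched as follows. This is an illustration only, not the library code: the function names and the edges flag are hypothetical, and boundaries are assumed to be character offsets in the utterance once spaces are removed.

```python
def boundaries(utterance, edges=True):
    """Return the set of boundary offsets, indexed with spaces removed."""
    positions, offset = {0}, 0
    for word in utterance.split():
        offset += len(word)
        positions.add(offset)  # boundary after each word
    if not edges:
        # Drop the utterance-initial and utterance-final boundaries.
        positions -= {0, offset}
    return positions


def boundary_scores(text, gold, edges=True):
    """Precision, recall and f-score over boundaries of all utterances."""
    correct = predicted = expected = 0
    for text_utt, gold_utt in zip(text, gold):
        text_b = boundaries(text_utt, edges)
        gold_b = boundaries(gold_utt, edges)
        correct += len(text_b & gold_b)
        predicted += len(text_b)
        expected += len(gold_b)
    precision = correct / predicted if predicted else 0.0
    recall = correct / expected if expected else 0.0
    total = precision + recall
    fscore = 2 * precision * recall / total if total else 0.0
    return precision, recall, fscore


text = ['thedog barks']
gold = ['the dog barks']

print(boundary_scores(text, gold, edges=True))   # precision 1.0, recall 0.75
print(boundary_scores(text, gold, edges=False))  # precision 1.0, recall 0.5
```

Utterance edges are always matched when text and gold contain the same letters, so counting them raises the scores; excluding them isolates the boundaries the segmenter actually had to find.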
-
class wordseg.evaluate.SegmentationSummary
Bases: object
Computes a summary of the segmentation errors
The errors can be oversegmentations, undersegmentations or missegmentations. Correct segmentations are also reported.
-
summarize(text, gold)
Computes segmentation errors on a whole text
Calls summarize_utterance() on each utterance of gold and text.
- Parameters
text (list of str) – The list of utterances for the segmented text (to be evaluated)
gold (list of str) – The list of utterances for the gold text
- Raises
ValueError – If text and gold do not have the same number of utterances, or if summarize_utterance() raises a ValueError.
-
summarize_utterance(text, gold)
Computes segmentation errors on a single utterance
This method returns no result but updates the internal summary, accessible using to_dict().
- Parameters
text (str) – A segmented utterance
gold (str) – A gold utterance
- Raises
ValueError – If text and gold are mismatched, i.e. they do not contain the same sequence of letters once all spaces are removed.
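The mismatch condition described above amounts to comparing the two utterances with their spaces stripped. A minimal sketch, assuming words are separated by plain spaces (check_same_letters is a hypothetical helper, not part of wordseg):

```python
def check_same_letters(text, gold):
    """Raise ValueError if the utterances differ once spaces are removed."""
    if text.replace(' ', '') != gold.replace(' ', ''):
        raise ValueError(
            'mismatch in letters: "{}" != "{}"'.format(text, gold))


# Same letters, different segmentation: accepted silently.
check_same_letters('thedog barks', 'the dog barks')

# Different letters: rejected.
try:
    check_same_letters('the cat', 'the dog')
except ValueError as err:
    print('rejected:', err)
```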
-
-
class wordseg.evaluate.TokenEvaluation
Bases: object
Evaluation of token f-score, precision and recall
-
class wordseg.evaluate.TypeEvaluation
Bases: wordseg.evaluate.TokenEvaluation
Evaluation of type f-score, precision and recall
-
wordseg.evaluate.compute_class_labels(words, units)
Compute class labels to be used for cluster similarity measures
Each word is considered a class, and each unit is mapped to the word it belongs to. This function is used as a preprocessing step for the Adjusted Rand Index.
- Parameters
words (list of str) – Utterances made of space separated words.
units (list of str) – Utterances made of space separated atomic units (phonemes or syllables).
- Returns
class_labels – Each unit mapped to the word it belongs to (with words coded as integers)
- Return type
numpy array of int
- Raises
ValueError – If words and units do not match
Examples
>>> from wordseg.evaluate import compute_class_labels
>>> words = ['hello world', 'python']
>>> units = ['h el lo wo r ld', 'py th on']
>>> compute_class_labels(words, units)
array([0, 0, 0, 1, 1, 1, 2, 2, 2])
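For readers who want to see the mapping spelled out, the following pure-Python sketch reproduces the example above. It is not the library's implementation: class_labels is a hypothetical name, it returns a plain list rather than a numpy array, and it assumes that every word occurrence gets a fresh integer label (the documented example does not show whether repeated words share a label).

```python
def class_labels(words, units):
    """Map each unit to the index of the word occurrence it belongs to."""
    if len(words) != len(units):
        raise ValueError('words and units have different numbers of utterances')

    labels, word_id = [], 0
    for word_utt, unit_utt in zip(words, units):
        unit_iter = iter(unit_utt.split())
        for word in word_utt.split():
            consumed = ''
            # Consume units until they spell out the current word.
            while consumed != word:
                unit = next(unit_iter, None)
                if unit is None or not word.startswith(consumed + unit):
                    raise ValueError(
                        'units do not match word "{}"'.format(word))
                consumed += unit
                labels.append(word_id)
            word_id += 1
        if next(unit_iter, None) is not None:
            raise ValueError('extra units in "{}"'.format(unit_utt))
    return labels


words = ['hello world', 'python']
units = ['h el lo wo r ld', 'py th on']
print(class_labels(words, units))  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```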
-
wordseg.evaluate.evaluate(text, gold, units=None)
Scores a segmented text against its gold version
- Parameters
text (sequence of str) – A sequence of utterances made of space separated words.
gold (sequence of str) – A sequence of utterances made of space separated words.
units (sequence of str, optional) – A sequence of utterances made of space separated atomic units (phonemes or syllables). When specified, the function also computes the adjusted Rand index.
- Returns
scores – A dictionary with the following entries in that fixed order:
'type_fscore'
'type_precision'
'type_recall'
'token_fscore'
'token_precision'
'token_recall'
'boundary_all_fscore'
'boundary_all_precision'
'boundary_all_recall'
'boundary_noedge_fscore'
'boundary_noedge_precision'
'boundary_noedge_recall'
If units is specified in arguments, this additional entry is added:
'adjusted_rand_index'
- Return type
ordered dict
- Raises
ValueError – If gold and text have different sizes or differ in tokens
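The adjusted Rand index itself is a standard cluster similarity measure between two labelings of the same items. The sketch below implements the usual pair-counting formula in pure Python. How wordseg computes it internally is not stated here, so treat this as an illustration of the measure applied to two lists of per-unit class labels, with adjusted_rand_index being a hypothetical helper name.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError('labelings must have the same length')
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Pairs of items grouped together in both, in a only, in b only.
    pairs_both = sum(
        comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (pairs_both - expected) / (maximum - expected)


gold_labels = [0, 0, 0, 1, 1, 1]
print(adjusted_rand_index(gold_labels, gold_labels))         # 1.0
print(adjusted_rand_index(gold_labels, [0, 0, 0, 0, 0, 0]))  # 0.0
```

The second call models a fully undersegmented utterance, where every unit falls into a single word: the score drops to the chance level of 0.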
-
wordseg.evaluate.read_data(text, separator=<wordseg.separator.Separator object>)
Load text data for evaluation
- Parameters
text (list of str) – The list of utterances to read for the evaluation.
separator (Separator, optional) – Separators to tokenize the text with, defaults to space-separated words.
- Returns
(words, positions, lexicon) – where words are the input utterances with word separators removed, positions stores the start/stop index of each word for each utterance, and lexicon is the list of words.
- Return type
three lists
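As an illustration of the three returned lists, here is a pure-Python sketch for the default case of space-separated words. The function name read_words is hypothetical, and the exact container types are assumptions: positions are modelled as lists of (start, stop) tuples and the lexicon as a sorted list of distinct words, which may differ from what the library returns.

```python
def read_words(text):
    """Split utterances into (words, positions, lexicon) for evaluation."""
    words, positions, lexicon = [], [], set()
    for utterance in text:
        tokens = utterance.split()
        # The utterance with word separators removed.
        words.append(''.join(tokens))

        # Start/stop index of each word in the unseparated utterance.
        spans, start = [], 0
        for token in tokens:
            spans.append((start, start + len(token)))
            start += len(token)
        positions.append(spans)

        lexicon.update(tokens)
    return words, positions, sorted(lexicon)


words, positions, lexicon = read_words(['the dog', 'the dog barks'])
print(words)      # ['thedog', 'thedogbarks']
print(positions)  # [[(0, 3), (3, 6)], [(0, 3), (3, 6), (6, 11)]]
print(lexicon)    # ['barks', 'dog', 'the']
```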
-
wordseg.evaluate.summary(text, gold)
Computes the summary of segmentation errors
This function is a simple wrapper around SegmentationSummary.
- Parameters
text (list of str) – The list of utterances for the segmented text (to be evaluated)
gold (list of str) – The list of utterances for the gold text
- Returns
summary – A dictionary with the complete summary in the following entries: ‘over’, ‘under’, ‘mis’, ‘correct’.
- Return type
dict
- Raises
ValueError – If text and gold do not match, or something went wrong during the summary computation.