Grouping Measure

class tde.measures.grouping.Grouping(disc, output_folder=None, njobs=1)[source]

Bases: tde.measures.measures.Measure

Grouping measure

The grouping measures how pure the found clusters are, and is close to the ‘purity’ measure in clustering. See for a summary of all measures.

Input
:param disc: Discovered object, contains the discovered elements
:param output_folder: string, path to the output folder
:param njobs: number of CPUs to be used

Output
:param precision: Grouping Precision
:param recall: Grouping Recall

property precision
property recall

Get all the gold pairs that can be created using the discovered intervals. The pairs are ordered by filename and onset.

Input
:param intervals: a list of all the discovered intervals, with their transcription

Output
:param gold_pairs: a set of all the gold pairs created from the discovered intervals
:param gold_types: all the types (n-grams) that occur in gold_pairs


Get all the pairs that were found. The pairs are ordered by filename and onset.

Input
:param clusters: a dict of all the clusters found. The keys are the cluster names, the values are lists of the intervals in each cluster

Output :param found_pairs: a set of all the discovered pairs
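The pairing step described above can be sketched as follows. This is a minimal illustration, not the library's code: the function name `pairs_from_clusters` is hypothetical, and it assumes each interval is a tuple starting with `(filename, onset, ...)` and that every pair of intervals inside a cluster counts as a found pair.

```python
from itertools import combinations


def pairs_from_clusters(clusters):
    """Hypothetical helper: build the set of found pairs from a clusters dict.

    Assumes each interval is a tuple whose first two fields are
    (filename, onset); this layout is an assumption of the sketch.
    """
    found_pairs = set()
    for intervals in clusters.values():
        # Order the intervals by filename, then onset, so that each pair
        # is always stored in the same orientation.
        ordered = sorted(intervals, key=lambda iv: (iv[0], iv[1]))
        # combinations() keeps the sorted order inside each pair.
        for first, second in combinations(ordered, 2):
            found_pairs.add((first, second))
    return found_pairs


clusters = {
    "cluster_1": [("b.wav", 1.0, 1.5), ("a.wav", 2.0, 2.4), ("a.wav", 0.5, 0.9)],
    "cluster_2": [("a.wav", 3.0, 3.2)],  # a single interval yields no pair
}
found_pairs = pairs_from_clusters(clusters)
```

Storing each pair in a fixed orientation matters because the found pairs are later intersected with the gold pairs, and a set intersection only matches tuples that are identical.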

static get_weights(pairs)[source]

For each type get its weight

Input
:param pairs: a set containing pairs of intervals, stored as (filename, onset, offset, token_ngram, ngram), where token_ngram is the ngram with the timestamps of each of its phones, and ngram is just a tuple of all the phones

Output
:return: weights, a dict that for each type (i.e. ngram) gives its weight, which is computed as number_of_tokens(ngram)/total_number_of_seen_tokens
:return: counter, a dict that for each type (i.e. ngram) gives the number of tokens of this ngram in the pairs
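The weight formula above can be illustrated with a short sketch. It is not the library's implementation: the name `type_weights` is hypothetical, and it assumes that each pair is a 2-tuple of intervals and that a token, identified by `(filename, onset, offset)`, is counted once even if it appears in several pairs.

```python
from collections import Counter


def type_weights(pairs):
    """Hypothetical sketch of per-type weights.

    Assumes each pair is (interval_a, interval_b) and each interval is
    (filename, onset, offset, token_ngram, ngram).
    """
    seen_tokens = set()
    counter = Counter()
    for pair in pairs:
        for filename, onset, offset, _token_ngram, ngram in pair:
            token = (filename, onset, offset)
            # Count each distinct token once (an assumption of this sketch).
            if token not in seen_tokens:
                seen_tokens.add(token)
                counter[ngram] += 1
    total = sum(counter.values())
    # weight(type) = number of tokens of that type / total seen tokens
    weights = {ngram: n / total for ngram, n in counter.items()}
    return weights, dict(counter)
```

With three distinct tokens of one type and two of another, the weights under these assumptions would be 3/5 and 2/5, so they always sum to 1 over the types present in the pairs.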


Compute the grouping by counting the number of tokens of each type in three sets: the set of gold pairs, the set of found pairs, and the intersection of the gold pairs and the found pairs.
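One way to read this computation is sketched below. It is an illustration under stated assumptions, not the package's actual code: the helper names are hypothetical, each type's contribution is assumed to be its weight times the fraction of its tokens that also occur in the intersection, and the F-score is assumed to be the usual harmonic mean of precision and recall. The sketch reads only the first three fields of an interval (filename, onset, offset) and its last field (the ngram).

```python
from collections import Counter


def count_tokens(pairs):
    """Count distinct tokens per type (ngram) over a set of interval pairs."""
    seen, counts = set(), Counter()
    for pair in pairs:
        for interval in pair:
            token = interval[:3]  # (filename, onset, offset)
            if token not in seen:
                seen.add(token)
                counts[interval[-1]] += 1  # last field is the ngram
    return counts


def weighted_score(reference_counts, shared_counts):
    """Sum over types of weight(type) * shared tokens / reference tokens."""
    total = sum(reference_counts.values())
    if total == 0:
        return 0.0
    return sum(
        (n / total) * (shared_counts[ngram] / n)
        for ngram, n in reference_counts.items()
    )


def grouping_scores(gold_pairs, found_pairs):
    """Hypothetical sketch: precision, recall and F-score from the three sets."""
    shared = count_tokens(gold_pairs & found_pairs)
    precision = weighted_score(count_tokens(found_pairs), shared)
    recall = weighted_score(count_tokens(gold_pairs), shared)
    denom = precision + recall
    # Assumed: F-score is the harmonic mean of precision and recall.
    fscore = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, fscore
```

Precision normalises by the tokens in the found pairs and recall by the tokens in the gold pairs; the intersection supplies the numerator in both cases, which is why the three sets are enough.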

property fscore