Grouping Measure

class tde.measures.grouping.Grouping(disc, output_folder=None, njobs=1)

Bases: tde.measures.measures.Measure

Grouping measure

The grouping measure evaluates how pure the discovered clusters are, and is closely related to the ‘purity’ measure used in clustering. See https://docs.cognitive-ml.fr/tde/measures/index.html for a summary of all measures.

Input

disc – Discovered object, contains the discovered elements
output_folder – string, path to the output folder
njobs – number of CPUs to be used

Output

precision – Grouping Precision
recall – Grouping Recall

property precision
property recall
get_gold_pairs()

Get all the gold pairs that can be created using the discovered intervals. The pairs are ordered by filename and onset.

Input

intervals – a list of all the discovered intervals, with their transcription

Output

gold_pairs – a set of all the gold pairs created from the discovered intervals

Parameters

gold_types – all the types (n-gram) that occur in gold_pairs

get_found_pairs()

Get all the pairs that were found. The pairs are ordered by filename and onset.

Input

clusters – a dict of all the clusters found. The keys are the cluster names, the values are lists of the intervals in each cluster

Output

found_pairs – a set of all the discovered pairs
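As an illustration of how pairs could be derived from a clusters dict, here is a minimal sketch. It is not the tde implementation. The function name `found_pairs_sketch` and the simplified interval layout `(filename, onset, offset)` are assumptions, and "ordered by filename and onset" is read here as meaning that the earlier interval comes first within each pair.

```python
from itertools import combinations


def found_pairs_sketch(clusters):
    """Hypothetical helper: build every within-cluster pair of intervals.

    clusters maps a cluster name to a list of intervals, here simplified
    to (filename, onset, offset) tuples (an assumption for illustration).
    """
    found = set()
    for intervals in clusters.values():
        # Sort by filename, then onset, so each pair has a canonical order.
        ordered = sorted(set(intervals), key=lambda iv: (iv[0], iv[1]))
        # combinations() keeps the sorted order inside each emitted pair.
        found.update(combinations(ordered, 2))
    return found


example_clusters = {
    "c1": [("b.wav", 1.0, 1.5), ("a.wav", 2.0, 2.5), ("a.wav", 0.5, 0.9)],
    "c2": [("a.wav", 3.0, 3.4)],  # a singleton cluster yields no pair
}
example_found = found_pairs_sketch(example_clusters)
```

Storing each pair in a canonical order means the same two intervals always produce the same tuple, which makes set intersection with another set of pairs well defined.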

static get_weights(pairs)

Compute the weight of each type.

Input

pairs – a set containing pairs of intervals, each stored as (filename, onset, offset, token_ngram, ngram), where token_ngram is the ngram with the timestamps of each of its phones, and ngram is just a tuple of all the phones

Output

weights – a dict that gives, for each type (i.e. ngram), its weight, computed as number_of_tokens(ngram)/total_number_of_seen_tokens
counter – a dict that gives, for each type (i.e. ngram), the number of tokens of this ngram in the pairs
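The weight formula number_of_tokens(ngram)/total_number_of_seen_tokens can be sketched as follows. This is an illustrative reading of the description, not the library's code. The name `get_weights_sketch` is hypothetical, and the sketch assumes that a token is identified by its (filename, onset, offset) and is counted once even if it appears in several pairs.

```python
from collections import Counter


def get_weights_sketch(pairs):
    """Hypothetical re-implementation of the weighting described above.

    pairs: iterable of (interval_a, interval_b); each interval is
    (filename, onset, offset, token_ngram, ngram).
    """
    counter = Counter()
    seen_tokens = set()
    for pair in pairs:
        for filename, onset, offset, _token_ngram, ngram in pair:
            token = (filename, onset, offset)
            if token in seen_tokens:
                continue  # assumption: each token is counted only once
            seen_tokens.add(token)
            counter[ngram] += 1
    total = len(seen_tokens)
    weights = {ngram: n / total for ngram, n in counter.items()}
    return weights, dict(counter)


# Toy data: token_ngram holds (phone, onset, offset) triples.
a = ("f1", 0.0, 0.5, (("k", 0.0, 0.2), ("a", 0.2, 0.5)), ("k", "a"))
b = ("f2", 1.0, 1.5, (("k", 1.0, 1.2), ("a", 1.2, 1.5)), ("k", "a"))
e = ("f3", 3.0, 3.5, (("k", 3.0, 3.3), ("a", 3.3, 3.5)), ("k", "a"))
c = ("f1", 2.0, 2.4, (("t", 2.0, 2.2), ("o", 2.2, 2.4)), ("t", "o"))
d = ("f3", 0.1, 0.5, (("t", 0.1, 0.3), ("o", 0.3, 0.5)), ("t", "o"))
example_weights, example_counter = get_weights_sketch({(a, b), (a, e), (c, d)})
```

In the toy data, interval `a` appears in two pairs but contributes a single token, so the type ("k", "a") has three tokens out of five seen in total.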

compute_grouping()

Compute the grouping by counting the number of tokens of each type in three sets: the set of gold pairs, the set of found pairs, and the intersection of the gold pairs and the found pairs.
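One possible reading of this counting scheme is sketched below. The source does not spell out how the three counts are combined, so the weighted precision and recall formulas here are assumptions, as are the names `type_counts` and `grouping_sketch` and the simplified interval layout `(filename, onset, offset, ngram)`. Treat this as a sketch under those assumptions, not as the tde implementation.

```python
from collections import Counter


def type_counts(pairs):
    """Count distinct tokens per type (ngram) over a set of pairs."""
    seen, counter = set(), Counter()
    for pair in pairs:
        for filename, onset, offset, ngram in pair:
            token = (filename, onset, offset)
            if token not in seen:
                seen.add(token)
                counter[ngram] += 1
    return counter


def grouping_sketch(gold_pairs, found_pairs):
    """Assumed combination: per-type ratios weighted by type frequency."""
    gold = type_counts(gold_pairs)
    found = type_counts(found_pairs)
    both = type_counts(gold_pairs & found_pairs)
    n_gold, n_found = sum(gold.values()), sum(found.values())
    # Weight of a type = its token count / total tokens in that set.
    precision = sum((found[t] / n_found) * (both[t] / found[t]) for t in found)
    recall = sum((gold[t] / n_gold) * (both[t] / gold[t]) for t in gold)
    return precision, recall


a = ("f1", 0.0, 0.5, ("k", "a"))
b = ("f2", 1.0, 1.5, ("k", "a"))
c = ("f1", 2.0, 2.4, ("t", "o"))
d = ("f3", 0.1, 0.5, ("t", "o"))
gold_example = {(a, b), (c, d)}            # same-transcription pairs
found_example = {(a, b), (a, c), (b, c)}   # one cluster holding a, b, c
example_precision, example_recall = grouping_sketch(gold_example, found_example)
```

In this toy case only the pair (a, b) is both gold and found, so the mixed cluster lowers precision while the missed pair (c, d) lowers recall.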

property fscore
write_score()