Grouping(disc, output_folder=None, njobs=1)
The grouping measures how pure the found clusters are, and is close to the ‘purity’ measure in clustering. See https://docs.syntheticlearner.net/tde/measures/index.html for a summary of all measures.
Input :param disc: Discovered object, contains the discovered elements
:param output_folder: string, path to the output folder
:param njobs: number of CPUs to be used
Output :param precision: Grouping precision
:param recall: Grouping recall
Get all the gold pairs that can be created using the discovered intervals. The pairs are ordered by filename and onset.
Input :param intervals: a list of all the discovered intervals, with
Output :param gold_pairs: a set of all the gold pairs created from the
gold_types – all the types (n-grams) that occur in gold_pairs
Get all the pairs that were found. The pairs are ordered by filename and onset.
Input :param clusters: a dict of all the clusters found. The keys are the cluster names; each value is a list of the intervals in that cluster
Output :param found_pairs: a set of all the discovered pairs
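The pairing step described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name `get_found_pairs` and the interval layout `(filename, onset, offset, token_ngram, ngram)` are assumptions made for the example, and a pair is assumed to be a 2-tuple of intervals.

```python
from itertools import combinations


def get_found_pairs(clusters):
    """Return the set of all within-cluster pairs of intervals.

    Hypothetical sketch: each interval is assumed to be a hashable tuple
    (filename, onset, offset, token_ngram, ngram).
    """
    found_pairs = set()
    for intervals in clusters.values():
        # Sort by (filename, onset) so every pair has one canonical order
        ordered = sorted(intervals, key=lambda iv: (iv[0], iv[1]))
        # A cluster with a single interval contributes no pairs
        for first, second in combinations(ordered, 2):
            found_pairs.add((first, second))
    return found_pairs


# Toy data: token_ngram holds (onset, offset, phone) triples
a = ("a.wav", 1.0, 1.2, ((1.0, 1.1, "k"), (1.1, 1.2, "ae")), ("k", "ae"))
b = ("b.wav", 0.5, 0.7, ((0.5, 0.6, "k"), (0.6, 0.7, "ae")), ("k", "ae"))
c = ("a.wav", 3.0, 3.1, ((3.0, 3.1, "t"),), ("t",))
d = ("c.wav", 0.0, 0.1, ((0.0, 0.1, "t"),), ("t",))

clusters = {"cluster_0": [b, c, a], "cluster_1": [d]}
found_pairs = get_found_pairs(clusters)
```

Sorting before pairing means the same two intervals always produce the same tuple, so a later set intersection with the gold pairs compares like with like.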
For each type get its weight
Input :param pairs: a set containing pairs of intervals, stored as (filename, onset, offset, token_ngram, ngram), where token_ngram is the ngram with the timestamps of each of its phones, and ngram is just a tuple of all the phones
Output :return: weights, a dict that gives, for each type (i.e. ngram), its weight, computed as number_of_tokens(ngram) / total_number_of_seen_tokens
:return: counter, a dict that gives, for each type (i.e. ngram), the number of tokens of this ngram in the pairs
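A minimal sketch of this weighting, under stated assumptions: the name `get_weights` is hypothetical, a pair is taken to be a 2-tuple of intervals, and a token (interval) is counted once no matter how many pairs it appears in. The actual implementation may count differently.

```python
from collections import Counter


def get_weights(pairs):
    """Return (weights, counter) for the ngram types seen in `pairs`.

    Assumption: each interval is (filename, onset, offset, token_ngram, ngram)
    and each distinct interval counts as one token.
    """
    # Collect the distinct intervals that occur in any pair
    tokens = set()
    for first, second in pairs:
        tokens.add(first)
        tokens.add(second)

    # Number of tokens per type; the ngram is the fifth field
    counter = Counter(interval[4] for interval in tokens)
    total = sum(counter.values())

    # weight(type) = number_of_tokens(type) / total_number_of_seen_tokens
    weights = {ngram: n / total for ngram, n in counter.items()}
    return weights, counter


# Toy intervals; token_ngram is left empty here for brevity
a1 = ("a.wav", 0.0, 0.2, (), ("k", "ae"))
a2 = ("a.wav", 1.0, 1.2, (), ("k", "ae"))
a3 = ("b.wav", 0.5, 0.7, (), ("k", "ae"))
b1 = ("b.wav", 2.0, 2.1, (), ("t",))
b2 = ("c.wav", 0.0, 0.1, (), ("t",))

weights, counter = get_weights({(a1, a2), (a1, a3), (b1, b2)})
```

In this toy input there are five distinct tokens, three of type `("k", "ae")` and two of type `("t",)`, so the weights sum to 1.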
Compute the grouping by counting the number of tokens of each type in three sets: the set of gold pairs, the set of found pairs, and the intersection of the gold pairs and the found pairs.
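The exact formula is not spelled out here, so the following is only one plausible reading: for each type, take the fraction of its tokens that also appear in the intersection, then average those fractions using the type weights. The names `count_tokens` and `grouping_scores` are hypothetical, and pairs are assumed to be 2-tuples of `(filename, onset, offset, token_ngram, ngram)` intervals.

```python
from collections import Counter


def count_tokens(pairs):
    """Count distinct intervals (tokens) per ngram type in a set of pairs."""
    tokens = {interval for pair in pairs for interval in pair}
    return Counter(interval[4] for interval in tokens)


def grouping_scores(gold_pairs, found_pairs):
    """Return (precision, recall) under the assumed weighted-ratio formula."""
    gold = count_tokens(gold_pairs)
    found = count_tokens(found_pairs)
    both = count_tokens(gold_pairs & found_pairs)

    n_gold = sum(gold.values())
    n_found = sum(found.values())

    # Precision: weight each found type by its share of found tokens
    precision = sum(
        (found[t] / n_found) * (both[t] / found[t]) for t in found
    ) if n_found else 0.0

    # Recall: weight each gold type by its share of gold tokens
    recall = sum(
        (gold[t] / n_gold) * (both[t] / gold[t]) for t in gold
    ) if n_gold else 0.0

    return precision, recall


# Toy intervals; token_ngram is left empty here for brevity
a1 = ("a.wav", 0.0, 0.2, (), ("k", "ae"))
a2 = ("a.wav", 1.0, 1.2, (), ("k", "ae"))
a3 = ("b.wav", 0.5, 0.7, (), ("k", "ae"))
b1 = ("b.wav", 2.0, 2.1, (), ("t",))
b2 = ("c.wav", 0.0, 0.1, (), ("t",))

gold_pairs = {(a1, a2), (a1, a3), (a2, a3), (b1, b2)}
found_pairs = {(a1, a2), (a1, b1)}  # second pair mixes two types

precision, recall = grouping_scores(gold_pairs, found_pairs)
```

With this toy input only the pair `(a1, a2)` is shared, so precision comes out at 2/3 and recall at 2/5 under the assumed formula.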