Discovered Data Readers

Handles reading of discovered elements from the output of a Term Discovery system.

A Discovered object contains a dictionary of clusters with all the intervals (for NED and grouping), and a list of all the found intervals (for coverage, token, type, and boundary).

The class file format is:

Class 1: wav1 on1 off1 wav2 on2 off2

class
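As an illustration of the format above, here is a minimal parsing sketch. It is not the tde implementation. It assumes each class starts with a header line such as "Class 1" (with an optional trailing colon), followed by one "wav onset offset" line per interval, with blank lines as separators. The name parse_class_file is hypothetical.

```python
import re
from collections import defaultdict

# Assumed header shape: "Class 3" or "Class 3:" (the colon is optional here).
HEADER = re.compile(r"^Class\s+(\d+):?$")


def parse_class_file(text):
    """Parse class-file text into {class_number: [(wav, onset, offset), ...]}."""
    clusters = defaultdict(list)
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are treated as separators
        header = HEADER.match(line)
        if header:
            current = int(header.group(1))
            continue
        fields = line.split()
        # An interval line needs exactly three fields and a preceding header.
        if current is None or len(fields) != 3:
            raise ValueError(f"line {lineno}: badly formatted: {raw!r}")
        wav, on, off = fields
        try:
            clusters[current].append((wav, float(on), float(off)))
        except ValueError:
            raise ValueError(f"line {lineno}: badly formatted: {raw!r}") from None
    return dict(clusters)


# Hypothetical file content, for illustration only.
example = """Class 1
wav1 0.10 0.45
wav2 1.20 1.60

Class 2
wav1 2.00 2.30
"""
clusters = parse_class_file(example)
```

A line that is neither a header nor a three-field interval raises ValueError, mirroring the error the reader documents for a file in the wrong format.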

Disc represents all the discovered intervals.

The discovered elements can be represented in 3 ways, depending on the usage:

  • intervals: a list of all the discovered intervals

  • intervals_tree: an interval tree containing all the discovered intervals

  • clusters: a dictionary where all the keys are class numbers, and the values are all the intervals for that class
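The three representations can be sketched with plain Python containers. This is only an illustration under stated assumptions: the sample data is made up, intervals are reduced to (fname, onset, offset) tuples, and a per-file sorted list stands in for the interval tree (the real object may use a dedicated interval tree structure). The names by_file and overlapping are hypothetical.

```python
from collections import defaultdict

# Hypothetical clusters, shaped as {class_number: [(fname, onset, offset), ...]}.
clusters = {
    1: [("wav1", 0.10, 0.45), ("wav2", 1.20, 1.60)],
    2: [("wav1", 2.00, 2.30)],
}

# Flat list of every discovered interval, regardless of class.
intervals = [iv for members in clusters.values() for iv in members]

# Stand-in for an interval tree: intervals grouped by file, sorted by onset.
by_file = defaultdict(list)
for fname, on, off in intervals:
    by_file[fname].append((on, off))
for spans in by_file.values():
    spans.sort()


def overlapping(fname, start, end):
    """Return the intervals of `fname` that overlap the span (start, end)."""
    return [(on, off) for on, off in by_file.get(fname, []) if on < end and off > start]
```

The flat list suits metrics that only need every interval once, the dictionary suits metrics that compare members of the same class, and the overlap lookup suits queries such as "what was discovered in this stretch of this file".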

class tde.readers.disc_reader.Disc(disc_path=None, gold=None)[source]

Bases: object

Read the discovered intervals

Raises
  • AssertionError

    • if an incorrect interval is found (onset greater than offset)

    • if two classes have the same class number

  • ValueError

    • if the discovered file is not found

    • if the discovered file is in the wrong format
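The two assertion conditions can be sketched as a small check. This is a hedged illustration rather than the library's code: it assumes a valid interval has its onset strictly before its offset, and that class numbers are available in file order. The function name check_clusters is hypothetical.

```python
def check_clusters(class_numbers, clusters):
    """Check class numbers (in file order) and interval bounds.

    `class_numbers` lists the class numbers as they appear in the file;
    `clusters` maps each class number to its (fname, onset, offset) tuples.
    """
    seen = set()
    for number in class_numbers:
        # Two classes must not share a class number.
        assert number not in seen, f"class number {number} appears twice"
        seen.add(number)
    for number, members in clusters.items():
        for fname, on, off in members:
            # Assumption: a valid interval starts strictly before it ends.
            assert on < off, f"class {number}: bad interval {fname} {on} {off}"
```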

read_clusters()[source]

Read discovered clusters

Returns a dictionary {class_number: [intervals_found]} that gives the list of intervals for each class_number key. The intervals are represented as a tuple:

  • fname: str, name of the speaker

  • disc_on: float, onset of the interval

  • disc_off: float, offset of the interval

  • token_ngram: tuple, each discovered phone from the interval, with their onsets and offsets

  • ngram: tuple, each discovered phone from the interval

Raises
  • AssertionError

    • if an incorrect interval is found (onset greater than offset)

    • if two classes have the same class number

  • ValueError

    • if a line is badly formatted

read_intervals_tree()[source]

Read discovered intervals as an interval tree

static get_transcription(fname, disc_on, disc_off, gold_phn)[source]

Given an interval, get its phone transcription

Parameters
  • fname (str) – name of the speaker on the interval

  • disc_on (float) – onset of the interval

  • disc_off (float) – offset of the interval

  • gold_phn (intervaltree) – contains the gold phones

Returns

  • token_ngram (list of tuples) – list of all the (onset, offset, phone) covered by the requested interval

  • ngram (list) – list of all the phones covered by the requested interval
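To make the inputs and outputs concrete, here is a minimal sketch of the lookup. It is not the library's get_transcription. It assumes gold_phn is a plain dictionary {fname: [(onset, offset, phone), ...]} instead of an interval tree, and it counts a phone as covered whenever it overlaps the interval at all (the actual coverage rule may be stricter). The name get_transcription_sketch and the sample phones are hypothetical.

```python
def get_transcription_sketch(fname, disc_on, disc_off, gold_phn):
    """Return (token_ngram, ngram) for the interval (disc_on, disc_off) in fname.

    gold_phn is assumed here to be {fname: [(onset, offset, phone), ...]}.
    """
    token_ngram = sorted(
        (on, off, phone)
        for on, off, phone in gold_phn.get(fname, [])
        if on < disc_off and off > disc_on  # any overlap counts (assumption)
    )
    ngram = [phone for _, _, phone in token_ngram]
    return token_ngram, ngram


# Hypothetical gold phone alignment for one file.
gold = {
    "wav1": [
        (0.00, 0.10, "k"),
        (0.10, 0.25, "ae"),
        (0.25, 0.40, "t"),
        (0.40, 0.55, "s"),
    ]
}
tokens, phones = get_transcription_sketch("wav1", 0.10, 0.40, gold)
```

With this sample data, phones that merely touch the interval boundary are excluded because the overlap comparison is strict.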