Discovered Data Readers
Handles reading of discovered elements from the output of a Term Discovery system.
The Discovered object contains a dictionary of clusters with all the intervals (for ned and grouping), and a list of all the found intervals (for cov, token, type, boundary).
The class file format is:
Class 1: wav1 on1 off1 wav2 on2 off2
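As a rough illustration of this format, here is a minimal parsing sketch. It is not the tde implementation: the function name parse_class_file is hypothetical, and the sketch assumes only that each class starts with the token Class followed by its number, and that the intervals follow as whitespace-separated (wav, onset, offset) triples, whether they sit on one line or on several.

```python
from collections import defaultdict


def parse_class_file(text):
    """Parse class-file text into {class_number: [(wav, onset, offset), ...]}.

    Hypothetical helper for illustration only; not part of tde.
    """
    clusters = defaultdict(list)
    tokens = text.split()
    current = None
    i = 0
    while i < len(tokens):
        if tokens[i] == "Class":
            if i + 1 >= len(tokens):
                raise ValueError("'Class' header without a class number")
            # The class number may carry a trailing colon, e.g. "1:".
            current = int(tokens[i + 1].rstrip(":"))
            i += 2
            continue
        if current is None:
            raise ValueError("interval found before any 'Class' header")
        # Unpacking raises ValueError if fewer than three tokens remain.
        wav, onset, offset = tokens[i:i + 3]
        clusters[current].append((wav, float(onset), float(offset)))
        i += 3
    return dict(clusters)


example = "Class 1: wav1 0.10 0.52 wav2 1.30 1.74"
print(parse_class_file(example))
```

Splitting on any whitespace keeps the sketch independent of how the file breaks its lines, at the cost of not reporting line numbers for malformed input.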
The Disc class represents all the discovered intervals.
The discovered elements can be represented in 3 ways, depending on the usage:
    intervals: a list of all the discovered intervals
    intervals_tree: an interval tree containing all the discovered intervals
    clusters: a dictionary whose keys are class numbers and whose values are all the intervals for that class
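The sketch below shows how the three views could be derived from the same records. The sample data and the overlapping helper are hypothetical, and a per-file list sorted by onset stands in for the interval tree so that the example needs only the standard library; the actual class may build its tree differently.

```python
from collections import defaultdict

# Hypothetical discovered intervals: (class_number, fname, onset, offset).
records = [
    (1, "wav1", 0.10, 0.52),
    (1, "wav2", 1.30, 1.74),
    (2, "wav1", 0.60, 0.95),
]

# View 1: flat list of every discovered interval.
intervals = [(fname, on, off) for _, fname, on, off in records]

# View 2: clusters keyed by class number.
clusters = defaultdict(list)
for class_number, fname, on, off in records:
    clusters[class_number].append((fname, on, off))

# View 3 (stand-in for the interval tree): intervals grouped by file
# and sorted by onset, so overlap queries only look at one file.
by_file = defaultdict(list)
for fname, on, off in intervals:
    by_file[fname].append((on, off))
for spans in by_file.values():
    spans.sort()


def overlapping(fname, start, end):
    """Return the intervals of `fname` that overlap the span (start, end)."""
    return [(on, off) for on, off in by_file.get(fname, [])
            if on < end and off > start]


print(overlapping("wav1", 0.5, 0.7))
```

The flat list suits measures that look at every interval, the clusters suit per-class measures, and the per-file index answers "what was discovered around this time in this file" queries. A real interval tree would answer those queries faster than the linear scan used here.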
class tde.readers.disc_reader.Disc(disc_path=None, gold=None)
Bases: object
Read the discovered intervals
- Raises
AssertionError – if an incorrect interval is found (onset greater than offset), or if two classes have the same class number
ValueError – if the discovered file is not found, or if the discovered file is in the wrong format
read_clusters()
Read discovered clusters
Returns a dictionary {class_number: [intervals_found]} that maps each class number to the list of intervals found for that class. Each interval is represented as a tuple:
(fname: str, name of the speaker,
disc_on: float, onset of the interval,
disc_off: float, offset of the interval,
token_ngram: tuple, each discovered phone from the interval, with its onset and offset,
ngram: tuple, each phone covered by the interval)
- Raises
AssertionError – if an incorrect interval is found (onset greater than offset), or if two classes have the same class number
ValueError – if a line is badly formatted
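The error conditions above can be sketched as follows. Both helpers are hypothetical and are not the tde code; the sketch assumes that an interval line holds exactly three fields (file name, onset, offset) and that a valid interval ends after it starts.

```python
def parse_interval_line(line):
    """Parse 'fname onset offset' into a tuple (hypothetical helper)."""
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"badly formatted line: {line!r}")
    fname, on, off = fields
    try:
        disc_on, disc_off = float(on), float(off)
    except ValueError:
        # Re-raise with a message that names the offending line.
        raise ValueError(f"badly formatted line: {line!r}") from None
    # A valid interval must end after it starts.
    assert disc_off > disc_on, f"onset is not before offset: {line!r}"
    return fname, disc_on, disc_off


def add_class(clusters, class_number, intervals):
    """Store a class, refusing duplicate class numbers (hypothetical helper)."""
    assert class_number not in clusters, (
        f"duplicate class number: {class_number}")
    clusters[class_number] = intervals


print(parse_interval_line("wav1 0.10 0.52"))
```

Note that assert statements are skipped when Python runs with the -O flag, so code that must always reject such input would raise an exception explicitly instead.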
static get_transcription(fname, disc_on, disc_off, gold_phn)
Given an interval, get its phone transcription
- Parameters
fname (str) – name of the speaker on the interval
disc_on (float) – onset of the interval
disc_off (float) – offset of the interval
gold_phn (intervaltree) – contains the gold phones
- Returns
token_ngram (list of tuples) – list of all the (onset, offset, phone) tuples covered by the requested interval
ngram (list) – list of all the phones covered by the requested interval
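To make the two return values concrete, here is a simplified sketch of the lookup. It is not the tde implementation: gold_phn is replaced by a hypothetical plain dictionary {fname: [(onset, offset, phone), ...]} rather than an interval tree, and a phone counts as covered as soon as it overlaps the requested interval at all, which may be looser than the rule the real method applies.

```python
def get_transcription_sketch(fname, disc_on, disc_off, gold_phn):
    """Return (token_ngram, ngram) for the interval [disc_on, disc_off].

    gold_phn is a hypothetical dict {fname: [(onset, offset, phone), ...]}
    used here in place of the interval tree of gold phones.
    """
    # Keep every gold phone that overlaps the requested interval.
    token_ngram = [
        (on, off, phone)
        for on, off, phone in gold_phn.get(fname, [])
        if on < disc_off and off > disc_on
    ]
    # Sort by onset so the phones come out in temporal order.
    token_ngram.sort()
    ngram = [phone for _, _, phone in token_ngram]
    return token_ngram, ngram


# Hypothetical gold phones for one file.
gold = {
    "wav1": [
        (0.00, 0.10, "k"),
        (0.10, 0.25, "ae"),
        (0.25, 0.40, "t"),
        (0.40, 0.50, "s"),
    ]
}
tokens, phones = get_transcription_sketch("wav1", 0.10, 0.40, gold)
print(tokens)
print(phones)
```

With these boundaries, phones that merely touch the interval at a single point ("k" ending at 0.10 and "s" starting at 0.40) are excluded, because the comparison is strict.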