Utterances
Provides the Utterance and Utterances classes.
An utterance corresponds to a sentence, or a speech segment, that is processed individually by an extraction pipeline. An utterance is defined in one of the following formats:
- 2-tuple: <utterance-id> <audio-file>
- 3-tuple: <utterance-id> <audio-file> <speaker-id>
- 4-tuple: <utterance-id> <audio-file> <tstart> <tstop>
- 5-tuple: <utterance-id> <audio-file> <speaker-id> <tstart> <tstop>
Note
Most shennong components (processors and post-processors) work
directly on individual audio files. Utterances are used when training a
VtlnProcessor or extracting features from
a shennong.pipeline.
- 
shennong.utterances.VALID_FORMATS = {1: '<utterance-id> <audio-file>', 2: '<utterance-id> <audio-file> <speaker-id>', 3: '<utterance-id> <audio-file> <tstart> <tstop>', 4: '<utterance-id> <audio-file> <speaker-id> <tstart> <tstop>'}
- The valid formats for an utterance, as detailed above 
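The relation between the number of arguments and the format code can be sketched as follows. This is a minimal illustration, not shennong's implementation: `parse_fields` is a hypothetical helper, the dictionary is a local copy of the table above, and the rule "code equals number of fields minus one" is an assumption read off the ordering of that table. Unlike the real class, the sketch does not check that the audio file is readable.

```python
# Local copy of the format table documented above (not imported from shennong)
VALID_FORMATS = {
    1: '<utterance-id> <audio-file>',
    2: '<utterance-id> <audio-file> <speaker-id>',
    3: '<utterance-id> <audio-file> <tstart> <tstop>',
    4: '<utterance-id> <audio-file> <speaker-id> <tstart> <tstop>',
}


def parse_fields(*args):
    """Return (format_code, fields) for 2 to 5 positional arguments.

    Hypothetical helper: the format code is assumed to be the number
    of arguments minus one.
    """
    if len(args) not in (2, 3, 4, 5):
        raise ValueError(
            f'expected 2, 3, 4 or 5 arguments, got {len(args)}')

    code = len(args) - 1

    # Field names are read off the format string,
    # e.g. '<utterance-id>' becomes 'utterance-id'
    names = [name.strip('<>') for name in VALID_FORMATS[code].split()]
    fields = dict(zip(names, args))

    # Onset and offset times are converted to floats, other fields are kept
    for key in ('tstart', 'tstop'):
        if key in fields:
            fields[key] = float(fields[key])

    return code, fields


# Five arguments select the format with both speaker and timestamps
code, fields = parse_fields('utt1', 'audio.wav', 'spk1', '0.5', '2')
print(code, fields['speaker-id'], fields['tstart'], fields['tstop'])
```

Note that three arguments are always read as a speaker format and four as a timestamp format, so the argument count alone is enough to disambiguate.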
- 
class shennong.utterances.Utterance(*args)
- Bases: object
- Manage a single utterance.
- The class Utterance manages individual utterances and gives access to their components: name, speaker and corresponding audio segment. The utterance must be defined in one of the formats listed above.
- Parameters
- *args – The number of arguments must be 2, 3, 4 or 5. It defines the utterance format and the meaning of each positional argument (see VALID_FORMATS).
- Raises
- ValueError – If the number of arguments is not 2, 3, 4 or 5, or if the utterance cannot be created from them (for instance, the audio file is not readable).
 - 
property format
- The utterance format code
 - 
property name
- The utterance name, or <utterance-id>
 - 
property audio_file
- The audio file attached to the utterance
 - 
property speaker
- The utterance speaker, or None if no speaker information
 - 
property tstart
- The utterance onset time in the audio file, or None
 - 
property tstop
- The utterance offset time in the audio file, or None
 - 
property duration
- The utterance duration in seconds 
 
- 
class shennong.utterances.Utterances(utterances)
- Bases: object
- Manages a collection of Utterance.
- The Utterances class manages a collection of utterances and allows iterating over them by name or by speaker, as well as generating sub-utterances fitted to a particular duration.
- The following conditions apply:
- All utterances in the collection must have the same format
- All utterances must have a unique name
- Parameters
- utterances (list of Utterance or list of tuples) – The utterances to be stored
- Raises
- ValueError – If the utterances cannot be created because of the above conditions, or because one of the utterances is not valid
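The two conditions on a collection (a single format, unique names) can be checked on plain tuples as in the sketch below. `check_collection` is a hypothetical helper written for illustration; it assumes the first element of each tuple is the <utterance-id> and uses the tuple length as a proxy for the format. It is not how shennong performs the check internally.

```python
from collections import Counter


def check_collection(utterances):
    """Raise ValueError unless all tuples share a size and have unique ids.

    Hypothetical helper: each utterance is a tuple whose first element
    is the utterance name.
    """
    # Condition 1: a single format, approximated here by the tuple size
    sizes = {len(utt) for utt in utterances}
    if len(sizes) > 1:
        raise ValueError(
            f'mixed utterance formats: tuple sizes {sorted(sizes)}')

    # Condition 2: every utterance name appears exactly once
    counts = Counter(utt[0] for utt in utterances)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f'duplicated utterance names: {duplicates}')


# A valid collection passes silently
check_collection([
    ('utt1', 'a.wav', 'spk1'),
    ('utt2', 'b.wav', 'spk2'),
])
```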
 - 
classmethod load(filename)
- Returns utterances loaded from a file.
- All the lines in the file must conform to the same utterance format.
- Parameters
- filename (str) – The file to load
- Raises
- ValueError – If filename is not found, if the utterances do not all have the same format, if the <utterance-id> are not all unique, or if some utterances are not valid (audio file not found, for instance).
 
 - 
save(filename)
- Writes the utterances to a file
- Parameters
- filename (str) – The filename to write
 
 - 
format(type=<class 'int'>)
- Returns the utterances format
- Parameters
- type (optional, int or str) – When int, returns the format code; when str, returns its string representation
- Raises
- ValueError – If type is not int or str
 
 - 
by_speaker()
- Returns a dictionary of utterances indexed by speaker.
- The returned dictionary has speakers as keys and lists of Utterance as values.
- Raises
- ValueError – If there is no speaker information
 
 - 
by_name()
- Returns a dictionary of utterances indexed by name.
- The returned dictionary has utterance names as keys and Utterance instances as values.
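The shape of the two dictionaries returned by by_speaker() and by_name() can be illustrated with a small stand-alone sketch. `Utt`, `group_by_speaker` and `index_by_name` are hypothetical names introduced here; `Utt` is a reduced stand-in for Utterance with only a name and a speaker, so this shows the indexing logic rather than the library's code.

```python
from collections import defaultdict, namedtuple

# Hypothetical, reduced stand-in for Utterance
Utt = namedtuple('Utt', ['name', 'speaker'])


def group_by_speaker(utterances):
    """Map each speaker to the list of its utterances."""
    if any(utt.speaker is None for utt in utterances):
        raise ValueError('utterances have no speaker information')

    groups = defaultdict(list)
    for utt in utterances:
        groups[utt.speaker].append(utt)
    return dict(groups)


def index_by_name(utterances):
    """Map each utterance name to its utterance (names are unique)."""
    return {utt.name: utt for utt in utterances}


utts = [Utt('utt1', 'spk1'), Utt('utt2', 'spk2'), Utt('utt3', 'spk1')]
print(sorted(group_by_speaker(utts)))    # speakers
print(sorted(index_by_name(utts)))       # utterance names
```

Because names are unique within a collection, the name index holds one utterance per key, whereas the speaker index holds a list per key.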
 - 
fit_to_duration(duration, truncate=False, shuffle=False)
- Returns a subset of utterances, keeping duration seconds per speaker
- Parameters
- duration (float) – The duration to keep per speaker, in seconds
- truncate (bool, optional) – When True, truncate the total duration to the one available if there is not enough data. When False, raise an error if the duration cannot be returned for a speaker. Defaults to False.
- shuffle (bool, optional) – When True, shuffle the utterances before extracting segments. When False, take them in order. Defaults to False.
 
- Returns
- utterances (Utterances) – The utterance segments fitting the given duration for each speaker
- Raises
- ValueError – If the utterances are not defined by speakers, if duration is not strictly positive or, when truncate is False, if a speaker does not have enough data to build segments.
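One way to picture the per-speaker selection is the sketch below, which works on plain named tuples rather than on Utterances. It is an illustration under stated assumptions, not shennong's algorithm: `Segment` and `fit_segments` are hypothetical names, the `seed` parameter is added only to make shuffling reproducible, and trimming the last segment of a speaker so that the total matches the requested duration is an assumed behaviour. Following the description of the truncate parameter, a speaker with too little data raises an error unless truncate is True.

```python
import random
from collections import namedtuple

# Hypothetical stand-in for an utterance with speaker and timestamps
Segment = namedtuple('Segment', ['name', 'speaker', 'tstart', 'tstop'])


def fit_segments(segments, duration, truncate=False, shuffle=False,
                 seed=None):
    """Keep at most `duration` seconds of audio per speaker."""
    if duration <= 0:
        raise ValueError('duration must be strictly positive')
    if any(seg.speaker is None for seg in segments):
        raise ValueError('utterances are not defined by speakers')

    # Group the segments by speaker, preserving their order
    by_speaker = {}
    for seg in segments:
        by_speaker.setdefault(seg.speaker, []).append(seg)

    kept = []
    for speaker, segs in by_speaker.items():
        available = sum(seg.tstop - seg.tstart for seg in segs)
        if available < duration and not truncate:
            raise ValueError(
                f'speaker {speaker}: {available}s available, '
                f'{duration}s requested')

        if shuffle:
            segs = list(segs)
            random.Random(seed).shuffle(segs)

        # Take segments until the requested duration is reached,
        # trimming the last one (assumed behaviour)
        remaining = duration
        for seg in segs:
            if remaining <= 0:
                break
            length = min(seg.tstop - seg.tstart, remaining)
            kept.append(seg._replace(tstop=seg.tstart + length))
            remaining -= length

    return kept


segments = [
    Segment('utt1', 'spk1', 0.0, 2.0),
    Segment('utt2', 'spk1', 0.0, 3.0),
    Segment('utt3', 'spk2', 0.0, 1.0),
]
for seg in fit_segments(segments, 4.0, truncate=True):
    print(seg)
```

In this example spk1 has 5 seconds available and contributes 4, while spk2 only has 1 second, which is kept in full because truncate is True.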