Utterances
Provides the Utterance and Utterances classes
An utterance corresponds to a sentence, or a speech segment, that is processed individually by an extraction pipeline. An utterance is defined by one of the following formats:
2-tuple: <utterance-id> <audio-file>
3-tuple: <utterance-id> <audio-file> <speaker-id>
4-tuple: <utterance-id> <audio-file> <tstart> <tstop>
5-tuple: <utterance-id> <audio-file> <speaker-id> <tstart> <tstop>
Note
Most shennong components (processors and post-processors) work directly on individual audio files. Utterances are used when training a VtlnProcessor or when extracting features with a shennong.pipeline.
shennong.utterances.VALID_FORMATS = {1: '<utterance-id> <audio-file>', 2: '<utterance-id> <audio-file> <speaker-id>', 3: '<utterance-id> <audio-file> <tstart> <tstop>', 4: '<utterance-id> <audio-file> <speaker-id> <tstart> <tstop>'}
The valid formats for an utterance, as detailed above.
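Each format has a distinct number of fields, so the format of a line can be inferred from its field count alone. In VALID_FORMATS, a 2-field line has key 1. The sketch below illustrates this with a hypothetical `parse_utterance_line` helper built on a local copy of VALID_FORMATS. It is not part of shennong. It assumes that fields are separated by whitespace and that onset and offset times are numbers of seconds.

```python
# Local copy of the documented constant: key -> field layout.
VALID_FORMATS = {
    1: '<utterance-id> <audio-file>',
    2: '<utterance-id> <audio-file> <speaker-id>',
    3: '<utterance-id> <audio-file> <tstart> <tstop>',
    4: '<utterance-id> <audio-file> <speaker-id> <tstart> <tstop>',
}


def parse_utterance_line(line):
    """Split one utterance line into a (format_code, fields) pair.

    Hypothetical helper: the format is looked up by field count, since
    every format in VALID_FORMATS has a different number of fields.
    """
    tokens = line.split()
    # Map each field count to its key, e.g. 2 fields -> key 1.
    by_count = {len(fmt.split()): code for code, fmt in VALID_FORMATS.items()}
    if len(tokens) not in by_count:
        raise ValueError(
            f'expected 2, 3, 4 or 5 fields, got {len(tokens)}: {line!r}')
    code = by_count[len(tokens)]
    names = [name.strip('<>') for name in VALID_FORMATS[code].split()]
    fields = dict(zip(names, tokens))
    # Assumption: onset and offset times are given in seconds.
    for key in ('tstart', 'tstop'):
        if key in fields:
            fields[key] = float(fields[key])
    return code, fields
```

For instance, `parse_utterance_line('utt1 /data/a.wav spk1 0.5 2.0')` yields the key 4 along with the speaker and the two times converted to floats.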

class shennong.utterances.Utterance(*args)
Bases: object
Manage a single utterance
The Utterance class manages an individual utterance and gives access to its components: name, speaker and corresponding audio segment. The utterance must be defined by one of the formats described above.
Parameters
*args – There must be 2, 3, 4 or 5 arguments. Their number determines the utterance format and the meaning of each positional argument (see VALID_FORMATS).
Raises
ValueError – If there are not 2, 3, 4 or 5 arguments, or if the utterance cannot be created from them (for instance, if the audio file is not readable).
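The following minimal sketch shows how the number of positional arguments can select the meaning of each one. It uses a hypothetical `SimpleUtterance` class, not shennong's implementation. It stores plain attributes rather than read-only properties and does not check that the audio file is readable. It also adds an assumed sanity check that the onset precedes the offset.

```python
class SimpleUtterance:
    """Hypothetical stand-in for shennong's Utterance (sketch only)."""

    # Field names for each number of positional arguments.
    _LAYOUTS = {
        2: ('name', 'audio_file'),
        3: ('name', 'audio_file', 'speaker'),
        4: ('name', 'audio_file', 'tstart', 'tstop'),
        5: ('name', 'audio_file', 'speaker', 'tstart', 'tstop'),
    }

    def __init__(self, *args):
        if len(args) not in self._LAYOUTS:
            raise ValueError(
                f'an utterance needs 2, 3, 4 or 5 arguments, got {len(args)}')
        values = dict(zip(self._LAYOUTS[len(args)], args))
        # Same keys as VALID_FORMATS: 2 arguments -> 1, ..., 5 -> 4.
        self.format = len(args) - 1
        self.name = values['name']
        self.audio_file = values['audio_file']
        self.speaker = values.get('speaker')  # None when absent
        tstart, tstop = values.get('tstart'), values.get('tstop')
        self.tstart = None if tstart is None else float(tstart)
        self.tstop = None if tstop is None else float(tstop)
        # Assumed check, not taken from the shennong documentation.
        if self.tstart is not None and self.tstart >= self.tstop:
            raise ValueError('tstart must be lower than tstop')
```

A three-argument call is always read as name, audio file and speaker, while a four-argument call is read as name, audio file, onset and offset.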

property format
The utterance format code

property name
The utterance name, or <utterance-id>

property audio_file
The audio file attached to the utterance

property speaker
The utterance speaker, or None if there is no speaker information

property tstart
The utterance onset time in the audio file, or None

property tstop
The utterance offset time in the audio file, or None

property duration
The utterance duration in seconds
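As an illustration of the duration property, the sketch below uses a hypothetical `utterance_duration` helper. It is not how shennong computes durations. When tstart and tstop are set, it returns their difference. Otherwise the utterance is assumed to span the whole file, and the length is read with the standard `wave` module, which restricts this sketch to WAV files.

```python
import wave


def utterance_duration(audio_file, tstart=None, tstop=None):
    """Duration in seconds of an utterance (hypothetical helper).

    Assumes a WAV file readable by the standard wave module. With onset
    and offset times, the duration is tstop - tstart; without them, the
    utterance is taken to cover the whole file.
    """
    if tstart is not None and tstop is not None:
        return tstop - tstart
    with wave.open(audio_file, 'rb') as wav:
        # Number of samples per channel divided by the sampling rate.
        return wav.getnframes() / wav.getframerate()
```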

class shennong.utterances.Utterances(utterances)
Bases: object
Manages a collection of Utterance.
The Utterances class manages a collection of utterances. It allows iterating over them by name or by speaker, and it can generate sub-utterances fitted to a particular duration. The following conditions apply:
All utterances in the collection must have the same format
All utterances must have a unique name
Parameters
utterances (list of Utterance or list of tuples) – The utterances to be stored
Raises
ValueError – If the utterances cannot be created because of the above conditions, or because one of the utterances is not valid.
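The two collection conditions can be checked before any audio is touched. The sketch below does this on plain tuples with a hypothetical `check_collection` helper, not part of shennong. Equal tuple lengths stand in for "same format", since each format has a distinct number of fields.

```python
from collections import Counter


def check_collection(utterances):
    """Validate a list of utterance tuples (hypothetical helper).

    Checks the two documented conditions: all tuples share the same
    number of fields (hence the same format) and all names are unique.
    """
    sizes = {len(utterance) for utterance in utterances}
    if not sizes <= {2, 3, 4, 5}:
        raise ValueError(f'invalid number of fields: {sorted(sizes)}')
    if len(sizes) > 1:
        raise ValueError(f'utterances have mixed formats: {sorted(sizes)}')
    # The name (<utterance-id>) is the first field of every format.
    counts = Counter(utterance[0] for utterance in utterances)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f'utterance names are not unique: {duplicates}')
    return utterances
```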

classmethod load(filename)
Returns utterances loaded from a file
All the lines in the file must conform to the same utterance format.
Parameters
filename (str) – The file to load
Raises
ValueError – If filename is not found, if the utterances do not all have the same format, if the <utterance-id> values are not unique, or if some of the defined utterances are not valid (for instance, an audio file is not found).
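The file-level part of loading can be sketched with plain Python. The hypothetical `load_utterance_tuples` helper below assumes one utterance per line with whitespace-separated fields, and it also assumes that blank lines are skipped. It reproduces only the documented file-level errors and does not open the audio files, unlike the real load().

```python
def load_utterance_tuples(filename):
    """Read an utterances file into a list of tuples (hypothetical helper).

    One utterance per line with whitespace-separated fields; skipping
    blank lines is an assumption. Audio files are not opened here.
    """
    try:
        with open(filename) as stream:
            rows = [tuple(line.split()) for line in stream if line.strip()]
    except FileNotFoundError:
        raise ValueError(f'file not found: {filename}') from None
    if any(len(row) not in (2, 3, 4, 5) for row in rows):
        raise ValueError('each line must have 2, 3, 4 or 5 fields')
    if len({len(row) for row in rows}) > 1:
        raise ValueError('all the utterances must have the same format')
    names = [row[0] for row in rows]
    if len(set(names)) != len(names):
        raise ValueError('all the <utterance-id> must be unique')
    return rows
```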

save(filename)
Writes the utterances to a file
Parameters
filename (str) – The filename to write
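A file in the line format described at the top of this page can be written with the hypothetical `save_utterance_tuples` helper below, which is not shennong's save(). It writes one utterance per line with fields joined by single spaces. Applying str() to every field means onset and offset times stored as numbers are written too.

```python
def save_utterance_tuples(utterances, filename):
    """Write utterance tuples to a file, one per line (hypothetical helper).

    Fields are joined by single spaces; str() converts numeric onset and
    offset times so that they can be parsed back as floats.
    """
    with open(filename, 'w') as stream:
        for utterance in utterances:
            stream.write(' '.join(str(field) for field in utterance) + '\n')
```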

format(type=int)
Returns the utterances format
Parameters
type (optional, int or str) – When int, returns the format code; when str, returns its string representation
Raises
ValueError – If type is not int or str
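The int/str switch can be sketched as follows. The hypothetical `format_as` helper takes a format code and, for str, assumes the string representation is the matching entry of VALID_FORMATS. That correspondence is an assumption made for this illustration. The parameter keeps the name `type`, shadowing the builtin, to match the documented signature.

```python
# Local copy of the documented constant.
VALID_FORMATS = {
    1: '<utterance-id> <audio-file>',
    2: '<utterance-id> <audio-file> <speaker-id>',
    3: '<utterance-id> <audio-file> <tstart> <tstop>',
    4: '<utterance-id> <audio-file> <speaker-id> <tstart> <tstop>',
}


def format_as(code, type=int):
    """Return a format code as an int or as its layout string (hypothetical)."""
    if type is int:
        return code
    if type is str:
        # Assumed: the string representation is the VALID_FORMATS entry.
        return VALID_FORMATS[code]
    raise ValueError(f'type must be int or str, not {type!r}')
```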

by_speaker()
Returns a dictionary of utterances indexed by speaker
The returned dictionary has speakers as keys and lists of Utterance as values.
Raises
ValueError – If there is no speaker information

by_name()
Returns a dictionary of utterances indexed by name
The returned dictionary has utterance names as keys and Utterance instances as values.
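Both indexing methods amount to building a dictionary. The sketch below shows the idea on plain tuples with hypothetical `by_speaker` and `by_name` functions, not the shennong methods. It assumes the speaker is the third field, which holds for the 3- and 5-field formats. Any other layout is treated as lacking speaker information.

```python
from collections import defaultdict


def by_speaker(utterances):
    """Group utterance tuples by speaker (hypothetical sketch).

    Assumes 3- or 5-field tuples, where the speaker is the third field.
    """
    if any(len(utterance) not in (3, 5) for utterance in utterances):
        raise ValueError('utterances have no speaker information')
    groups = defaultdict(list)
    for utterance in utterances:
        groups[utterance[2]].append(utterance)  # keeps input order
    return dict(groups)


def by_name(utterances):
    """Index utterance tuples by their <utterance-id> (hypothetical sketch)."""
    return {utterance[0]: utterance for utterance in utterances}
```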

fit_to_duration(duration, truncate=False, shuffle=False)
Returns a subset of utterances, keeping duration seconds per speaker
Parameters
duration (float) – The duration to keep per speaker, in seconds
truncate (bool, optional) – When True, truncate the total duration to the available one if there is not enough data. When False, raise an error if the duration cannot be returned for a speaker. Defaults to False.
shuffle (bool, optional) – When True, shuffle the utterances before extracting segments. When False, take them in order. Defaults to False.
Returns
utterances (Utterances) – The utterance segments fitting the given duration for each speaker
Raises
ValueError – If the utterances have no speaker information, if duration is not strictly positive, or, when truncate is False, if a speaker does not have enough data to build segments.
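One way to implement the per-speaker selection is a greedy pass, sketched below with a hypothetical `fit_to_duration` function. It is not shennong's algorithm, and it works on plain (name, audio_file, speaker, tstart, tstop) tuples with explicit times instead of Utterance objects. Each speaker's utterances are taken in order, or shuffled, until the quota is met, and the last one is cut short. The `seed` parameter is an extra assumption for reproducible shuffling, and the result is a list rather than an Utterances instance.

```python
import random
from collections import defaultdict

# Float residue below this many seconds is treated as zero.
EPSILON = 1e-9


def fit_to_duration(utterances, duration, truncate=False, shuffle=False,
                    seed=None):
    """Keep `duration` seconds per speaker (hypothetical greedy sketch).

    `utterances` are (name, audio_file, speaker, tstart, tstop) tuples.
    """
    if any(len(utterance) != 5 for utterance in utterances):
        raise ValueError('utterances must have speaker and time information')
    if duration <= 0:
        raise ValueError('duration must be strictly positive')

    per_speaker = defaultdict(list)
    for utterance in utterances:
        per_speaker[utterance[2]].append(utterance)

    rng = random.Random(seed)
    selected = []
    for speaker, utts in per_speaker.items():
        if shuffle:
            utts = rng.sample(utts, len(utts))  # shuffled copy
        available = sum(u[4] - u[3] for u in utts)
        if available < duration and not truncate:
            raise ValueError(
                f'speaker {speaker} has {available} s, {duration} s requested')
        remaining = duration
        for name, audio_file, spk, tstart, tstop in utts:
            if remaining <= EPSILON:
                break
            # Cut the last utterance so the speaker's total hits the quota.
            tstop = min(tstop, tstart + remaining)
            selected.append((name, audio_file, spk, tstart, tstop))
            remaining -= tstop - tstart
    return selected
```

With truncate=True, a speaker with less data than requested simply contributes everything available. This mirrors the documented behaviour of the truncate parameter.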