Utterances
Provides the Utterance and Utterances classes.
An utterance corresponds to a sentence, or a speech segment, that is processed individually by an extraction pipeline. An utterance is defined by one of the following formats:
2-tuple:
<utterance-id> <audio-file>
3-tuple:
<utterance-id> <audio-file> <speaker-id>
4-tuple:
<utterance-id> <audio-file> <tstart> <tstop>
5-tuple:
<utterance-id> <audio-file> <speaker-id> <tstart> <tstop>
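The four formats above can be told apart by their number of fields. The following is a minimal sketch of that idea and is not shennong's implementation: the function name `parse_utterance_line` and the table `FIELDS_BY_COUNT` are hypothetical, and the sketch assumes whitespace-separated fields (so audio paths without spaces) and timestamps expressed in seconds.

```python
# Illustrative sketch only, not shennong's code. Names below are hypothetical.
FIELDS_BY_COUNT = {
    2: ("name", "audio_file"),
    3: ("name", "audio_file", "speaker"),
    4: ("name", "audio_file", "tstart", "tstop"),
    5: ("name", "audio_file", "speaker", "tstart", "tstop"),
}


def parse_utterance_line(line):
    """Split a whitespace-separated line into named utterance fields."""
    fields = line.split()
    if len(fields) not in FIELDS_BY_COUNT:
        raise ValueError(f"expected 2, 3, 4 or 5 fields, got {len(fields)}")

    parsed = dict(zip(FIELDS_BY_COUNT[len(fields)], fields))

    # Assumption: timestamps are numbers of seconds, so convert them to float.
    for key in ("tstart", "tstop"):
        if key in parsed:
            parsed[key] = float(parsed[key])

    # Fields that the format does not define are reported as None.
    for key in ("speaker", "tstart", "tstop"):
        parsed.setdefault(key, None)
    return parsed
```

For instance, `parse_utterance_line("utt1 a.wav")` yields a dictionary whose `speaker`, `tstart` and `tstop` entries are all `None`, while a line with a single field raises `ValueError`.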
Note
Most shennong components (processors and post-processors) work directly on individual audio files. Utterances are used when training a VtlnProcessor or extracting features from a shennong.pipeline.
-
shennong.utterances.VALID_FORMATS = {1: '<utterance-id> <audio-file>', 2: '<utterance-id> <audio-file> <speaker-id>', 3: '<utterance-id> <audio-file> <tstart> <tstop>', 4: '<utterance-id> <audio-file> <speaker-id> <tstart> <tstop>'}
The valid formats for an utterance, as detailed above
-
class shennong.utterances.Utterance(*args)
Bases: object
Manage a single utterance
The class Utterance manages individual utterances and gives access to their components: name, speaker and corresponding audio segment. The utterance must be defined by one of the formats listed above.
- Parameters
*args – There must be 2, 3, 4 or 5 arguments. The number of arguments defines the utterance format and the meaning of each positional argument (see VALID_FORMATS)
- Raises
ValueError – If the number of arguments is not 2, 3, 4 or 5, or if the utterance cannot be created from them (for instance, the audio file is not readable)
-
property format
The utterance format code
-
property name
The utterance name, or <utterance-id>
-
property audio_file
The audio file attached to the utterance
-
property speaker
The utterance speaker, or None if no speaker information
-
property tstart
The utterance onset time in the audio file, or None
-
property tstop
The utterance offset time in the audio file, or None
-
property duration
The utterance duration in seconds
-
class shennong.utterances.Utterances(utterances)
Bases: object
Manages a collection of Utterance.
The Utterances class manages a collection of utterances. It allows iterating over the utterances by name or by speaker, and generating sub-utterances that fit a particular duration.
The following conditions apply:
All utterances in the collection must have the same format
All utterances must have a unique name
- Parameters
utterances (list of Utterance or list of tuples) – The utterances to be stored
- Raises
ValueError – If the utterances cannot be created because of the above conditions, or because one of the utterances is not valid
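The two conditions on a collection (a single format, unique names) can be checked as sketched below. This is an assumption-laden illustration rather than the library's code: the helper `check_collection` is hypothetical, utterances are represented as plain tuples whose first element is the name, and the tuple length stands in for the utterance format.

```python
from collections import Counter


def check_collection(utterances):
    """Raise ValueError unless all tuples share one format and names are unique.

    Hypothetical helper: the tuple length is used as a proxy for the format.
    """
    lengths = sorted({len(utt) for utt in utterances})
    if len(lengths) > 1:
        raise ValueError(f"mixed utterance formats, field counts: {lengths}")

    # The first element of each tuple is the utterance name.
    counts = Counter(utt[0] for utt in utterances)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicated utterance names: {duplicates}")
```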
-
classmethod load(filename)
Returns utterances loaded from a file
All the lines in the file must conform to the same utterance format.
- Parameters
filename (str) – The file to load
- Raises
ValueError – If the filename is not found, if the utterances do not all have the same format, if the <utterance-id> are not all unique, or if some of the defined utterances are not valid (audio file not found, for instance).
-
save(filename)
Writes the utterances to a file
- Parameters
filename (str) – The filename to write
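To make the file layout concrete, here is a hedged round-trip sketch using plain tuples and the standard library only. It assumes one utterance per line with space-separated fields. The functions `save_utterances` and `load_utterances` are hypothetical stand-ins, not the `save` and `load` methods documented here, and they do no validation of the audio files.

```python
import os
import tempfile


def save_utterances(utterances, filename):
    """Write one space-separated utterance per line (assumed layout)."""
    with open(filename, "w") as stream:
        for utt in utterances:
            stream.write(" ".join(str(field) for field in utt) + "\n")


def load_utterances(filename):
    """Read utterances back as tuples of strings, skipping blank lines."""
    with open(filename) as stream:
        return [tuple(line.split()) for line in stream if line.strip()]


# Round trip through a temporary file.
original = [("utt1", "a.wav", "spk1"), ("utt2", "b.wav", "spk2")]
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "utterances.txt")
    save_utterances(original, path)
    reloaded = load_utterances(path)
```

Note that the sketch reads every field back as a string, so timestamps would need an explicit conversion to float after loading.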
-
format(type=<class 'int'>)
Returns the utterances format
- Parameters
type (optional, int or str) – When int, returns the format code; when str, returns its string representation
- Raises
ValueError – If type is not int or str
-
by_speaker()
Returns a dictionary of utterances indexed by speaker
The returned dictionary has speakers as keys and lists of Utterance as values.
- Raises
ValueError – If there is no speaker information
-
by_name()
Returns a dictionary of utterances indexed by name
The returned dictionary has utterance names as keys and Utterance instances as values.
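The shape of the two dictionaries described above can be illustrated with plain tuples. This is only a sketch under stated assumptions: `group_by_speaker` and `index_by_name` are hypothetical functions, and utterances are 3-tuples or 5-tuples laid out as in the formats at the top of this page, so that the speaker is the third field.

```python
from collections import defaultdict


def group_by_speaker(utterances):
    """Return {speaker: [utterance, ...]}, keeping the input order."""
    groups = defaultdict(list)
    for utt in utterances:
        # Only the 3-field and 5-field formats carry a speaker (third field).
        if len(utt) not in (3, 5):
            raise ValueError("no speaker information in utterances")
        groups[utt[2]].append(utt)
    return dict(groups)


def index_by_name(utterances):
    """Return {name: utterance}; names are assumed to be unique."""
    return {utt[0]: utt for utt in utterances}
```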
-
fit_to_duration(duration, truncate=False, shuffle=False)
Returns a subset of utterances, keeping duration seconds per speaker
- Parameters
duration (float) – The duration to keep per speaker, in seconds
truncate (bool, optional) – When True, truncate the total duration to the one available if there is not enough data. When False, raise an error if the duration cannot be returned for a speaker. Defaults to False.
shuffle (bool, optional) – When True, shuffle the utterances before extracting segments. When False, take them in order. Defaults to False.
- Returns
utterances (Utterances) – The utterance segments fitting the given duration for each speaker
- Raises
ValueError – If the utterances are not defined by speakers, if duration is not strictly positive or, when truncate is False, if a speaker does not have enough data to build segments.
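One plausible reading of the behaviour described above is a greedy selection per speaker, trimming the last segment so that the total matches the requested duration. The sketch below implements that reading on plain 5-tuples. It is an assumption, not the library's algorithm: the function is a hypothetical stand-in for the method, and the `seed` parameter is added here only to make shuffling reproducible.

```python
import random
from collections import defaultdict


def fit_to_duration(utterances, duration, truncate=False, shuffle=False,
                    seed=None):
    """Keep `duration` seconds per speaker from 5-tuples
    (name, audio_file, speaker, tstart, tstop). Hypothetical sketch."""
    if duration <= 0:
        raise ValueError("duration must be strictly positive")

    by_speaker = defaultdict(list)
    for utt in utterances:
        by_speaker[utt[2]].append(utt)

    rng = random.Random(seed)  # `seed` is not part of the documented API
    kept = []
    for speaker, utts in by_speaker.items():
        if shuffle:
            utts = rng.sample(utts, len(utts))

        remaining = duration
        for name, audio_file, _, tstart, tstop in utts:
            if remaining <= 0:
                break
            # Trim the segment when it is longer than what is still needed.
            length = min(tstop - tstart, remaining)
            kept.append((name, audio_file, speaker, tstart, tstart + length))
            remaining -= length

        # Not enough data: an error unless truncation is allowed.
        if remaining > 0 and not truncate:
            raise ValueError(f"speaker {speaker}: not enough data")
    return kept
```

With `truncate=True`, a speaker with less data than requested simply keeps everything available, which matches the description of the `truncate` parameter.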