Utterances¶

Provides the Utterance and Utterances classes

An utterance corresponds to a sentence, or a speech segment, that is processed individually by an extraction pipeline. An utterance is defined in one of the following formats:

2-tuple: <utterance-id> <audio-file>
3-tuple: <utterance-id> <audio-file> <speaker-id>
4-tuple: <utterance-id> <audio-file> <tstart> <tstop>
5-tuple: <utterance-id> <audio-file> <speaker-id> <tstart> <tstop>

Note

Most shennong components (processors and post-processors) work directly on individual audio files. Utterances are used when training a VtlnProcessor or extracting features with shennong.pipeline.

shennong.utterances.VALID_FORMATS = {1: '<utterance-id> <audio-file>', 2: '<utterance-id> <audio-file> <speaker-id>', 3: '<utterance-id> <audio-file> <tstart> <tstop>', 4: '<utterance-id> <audio-file> <speaker-id> <tstart> <tstop>'}¶: The valid formats for an utterance, as detailed above
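As a rough illustration of how the number of fields selects a format, the sketch below splits a line of text and looks up the matching format string. The dictionary is copied from the documented value of VALID_FORMATS. The `parse_utterance_line` helper and the rule "format code = number of fields minus one" are assumptions inferred from the formats listed above, not shennong's actual implementation.

```python
# Copied from the documented value of shennong.utterances.VALID_FORMATS
VALID_FORMATS = {
    1: '<utterance-id> <audio-file>',
    2: '<utterance-id> <audio-file> <speaker-id>',
    3: '<utterance-id> <audio-file> <tstart> <tstop>',
    4: '<utterance-id> <audio-file> <speaker-id> <tstart> <tstop>'}


def parse_utterance_line(line):
    """Hypothetical helper: returns (format_code, fields) for one line"""
    fields = line.split()
    # Assumption: a line with N fields has format code N - 1
    code = len(fields) - 1
    if code not in VALID_FORMATS:
        raise ValueError(f'expected 2 to 5 fields, got {len(fields)}')
    return code, fields


code, fields = parse_utterance_line('utt1 audio.wav spk1 0.5 2.0')
print(code, VALID_FORMATS[code])
```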

class shennong.utterances.Utterance(*args)[source]¶

Bases: object

Manage a single utterance

The class Utterance manages individual utterances and gives access to their components: name, speaker, corresponding audio segment. The utterance must be defined by one of the formats listed above.

Parameters: *args – There must be 2, 3, 4 or 5 arguments. The number of arguments defines the utterance format and the meaning of each positional argument (see VALID_FORMATS)
Raises: ValueError – If the number of arguments is not 2, 3, 4 or 5, or if the utterance cannot be created from them (for instance the audio file is not readable)

property format¶: The utterance format code

property name¶: The utterance name, or <utterance-id>

property audio_file¶: The audio file attached to the utterance

property speaker¶: The utterance speaker, or None if no speaker information

property tstart¶: The utterance onset time in the audio file, or None

property tstop¶: The utterance offset time in the audio file, or None

property duration¶: The utterance duration in seconds

load_audio()[source]¶: Returns the utterance’s Audio data

class shennong.utterances.Utterances(utterances)[source]¶

Bases: object

Manages a collection of Utterance instances.

The Utterances class manages a collection of utterances and allows iterating over them by name or by speaker, as well as generating sub-utterances fitted to a particular duration.

The following conditions apply:

All utterances in the collection must have the same format
All utterances must have a unique name

Parameters: utterances (list of Utterance or list of tuples) – The utterances to be stored
Raises: ValueError – If the utterances cannot be created because of the above conditions, or because one of the utterances is not valid

classmethod load(filename)[source]¶

Returns utterances loaded from a file

All the lines in the file must conform to the same utterance format.

Parameters: filename (str) – The file to load
Raises: ValueError – If the filename is not found, if the utterances do not all have the same format, if the <utterance-id> are not all unique or if some of the defined utterances are not valid (audio file not found for instance).
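The two collection-level conditions (same format on every line, unique utterance ids) can be sketched in a few lines of plain Python. The `check_utterances` helper is hypothetical and is not part of shennong. It works on lines of text rather than on a file and does not check that the audio files exist. Skipping blank lines and rejecting an empty input are choices made for this sketch only.

```python
def check_utterances(lines):
    """Hypothetical validator: same field count on every line, unique ids"""
    # Sketch-only choice: blank lines are ignored
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        raise ValueError('no utterance found')

    # All lines must have the same number of fields, hence the same format
    sizes = {len(row) for row in rows}
    if len(sizes) != 1:
        raise ValueError(f'mixed formats: {sorted(sizes)} fields per line')

    # The first field is the <utterance-id> and must be unique
    names = [row[0] for row in rows]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f'duplicated utterance ids: {duplicates}')
    return rows


rows = check_utterances([
    'utt1 a.wav spk1',
    'utt2 b.wav spk2'])
print(len(rows))
```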

save(filename)[source]¶

Writes the utterances to file

Parameters: filename (str) – The filename to write

format(type=<class 'int'>)[source]¶

Returns the utterances format

Parameters: type (optional, int or str) – When int, returns the format code; when str, returns its string representation
Raises: ValueError – If type is not int or str

has_speakers()[source]¶: Returns True if there is speaker information, False otherwise

by_speaker()[source]¶

Returns a dictionary of utterances indexed by speaker

The returned dictionary has speakers as keys and lists of Utterance as values.

Raises: ValueError – If there is no speaker information

by_name()[source]¶

Returns a dictionary of utterances indexed by name

The returned dictionary has utterance names as keys and Utterance instances as values.
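The shape of the two dictionaries described above can be shown with a small stand-in. `Utt` is a hypothetical namedtuple that holds only three of the fields of Utterance. The two functions below are a sketch of the indexing, not shennong's code.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for Utterance, holding only the fields used here
Utt = namedtuple('Utt', ['name', 'audio_file', 'speaker'])


def by_name(utts):
    """Maps each utterance name to its utterance"""
    return {u.name: u for u in utts}


def by_speaker(utts):
    """Maps each speaker to the list of its utterances"""
    if any(u.speaker is None for u in utts):
        raise ValueError('utterances have no speaker information')
    groups = defaultdict(list)
    for u in utts:
        groups[u.speaker].append(u)
    return dict(groups)


utts = [
    Utt('utt1', 'a.wav', 'spk1'),
    Utt('utt2', 'b.wav', 'spk2'),
    Utt('utt3', 'c.wav', 'spk1')]
print(sorted(by_speaker(utts)))          # ['spk1', 'spk2']
print(by_name(utts)['utt3'].audio_file)  # c.wav
```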

duration()[source]¶: Returns the total duration of the utterances in seconds

fit_to_duration(duration, truncate=False, shuffle=False)[source]¶

Returns a subset of utterances, keeping duration seconds per speaker

Parameters

duration (float) – The duration to keep per speaker, in seconds
truncate (bool, optional) – When True, truncate the total duration to the one available if there is not enough data. When False, raise an error if the duration cannot be returned for a speaker. Defaults to False.
shuffle (bool, optional) – When True, shuffle the utterances before extracting segments. When False, take them in order. Defaults to False.

Returns

utterances (Utterances) – The utterances segments fitting the given duration for each speaker

Raises

ValueError – If the utterances are not defined by speakers, if duration is not strictly positive or, when truncate is False, if a speaker does not have enough data to build segments.
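One way to picture the behaviour of the truncate and shuffle parameters is the sketch below. It is written under stated assumptions and does not reproduce shennong's implementation. The input layout (a dictionary mapping each speaker to a list of `(name, tstart, tstop)` segments), the `seed` argument and the strategy of cutting the last kept segment short are all hypothetical.

```python
import random


def fit_to_duration(segments, duration, truncate=False, shuffle=False,
                    seed=None):
    """Sketch: keeps `duration` seconds per speaker

    `segments` maps each speaker to a list of (name, tstart, tstop).
    """
    if duration <= 0:
        raise ValueError('duration must be strictly positive')

    fitted = {}
    for speaker, segs in segments.items():
        segs = list(segs)
        if shuffle:
            random.Random(seed).shuffle(segs)

        remaining = duration
        kept = []
        for name, tstart, tstop in segs:
            if remaining <= 0:
                break
            # Assumption: the last kept segment is cut short to fit
            length = min(tstop - tstart, remaining)
            kept.append((name, tstart, tstart + length))
            remaining -= length

        # Small tolerance against floating point residue
        if remaining > 1e-9 and not truncate:
            raise ValueError(
                f'speaker {speaker}: not enough data for {duration}s')
        fitted[speaker] = kept
    return fitted


segments = {
    'spk1': [('utt1', 0.0, 1.5), ('utt2', 0.0, 1.0)],
    'spk2': [('utt3', 2.0, 5.0)]}
print(fit_to_duration(segments, 2.0))
```

With `duration=3.0`, the first speaker only has 2.5 seconds of data: the sketch raises ValueError unless `truncate=True`, in which case all the available data is kept for that speaker.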