Abstract base classes¶

This section documents the abstract base classes at the top of the inheritance tree in shennong.


Base processor ¶

Base classes for all shennong components

class shennong.base.BaseProcessor[source]¶

Bases: object

Base class for all processors in shennong

Notes

All processors should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).

The methods get_params() and set_params() are adapted from sklearn.base.BaseEstimator

abstract property name¶: Processor name

property log¶: Processor logger

set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')[source]¶

Change level and/or format of the processor’s logger

Parameters

level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.
formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default, displays the level, name and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display the time, level, name and message.
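The effect of the two arguments can be illustrated with the standard logging module alone. This is a minimal sketch, not shennong's implementation: `configure_logger`, `LEVELS` and the logger name `sketch` are hypothetical names introduced for the example.

```python
import io
import logging

# Hypothetical mapping from the documented level names to logging constants.
LEVELS = {
    'debug': logging.DEBUG,
    'info': logging.INFO,
    'warning': logging.WARNING,
    'error': logging.ERROR,
}


def configure_logger(logger, stream, level,
                     formatter='%(levelname)s - %(name)s - %(message)s'):
    """Hypothetical helper mirroring the documented arguments."""
    if level not in LEVELS:
        raise ValueError(f'invalid log level: {level}')
    # Replace any existing handler so that the new format takes effect.
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(formatter))
    logger.addHandler(handler)
    logger.setLevel(LEVELS[level])
    return logger


stream = io.StringIO()
log = configure_logger(logging.getLogger('sketch'), stream, 'warning')
log.info('dropped, below the minimum level')
log.warning('kept')
output = stream.getvalue()
```

With the level set to 'warning', only the second message reaches the stream, formatted as level, name and message.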

get_params(deep=True)[source]¶

Get parameters for this processor.

Parameters: deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True.
Returns: params (mapping of string to any) – Parameter names mapped to their values.

set_params(**params)[source]¶

Set the parameters of this processor.

Returns: self
Raises: ValueError – If any given parameter in params is invalid for the processor.
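The requirement that parameters appear as explicit keyword arguments in __init__ makes them discoverable by introspection. The following is a minimal sketch of that idea, loosely modelled on the sklearn.base.BaseEstimator pattern and not taken from shennong: `ParamsMixin` and `ToyProcessor` are hypothetical classes, and the `deep` option is left out for brevity.

```python
import inspect


class ParamsMixin:
    """Hypothetical mixin, loosely modelled on sklearn.base.BaseEstimator."""

    @classmethod
    def _param_names(cls):
        # Parameter names are read from the explicit keyword arguments of
        # __init__, which is why *args and **kwargs cannot be used there.
        signature = inspect.signature(cls.__init__)
        return sorted(
            name for name, param in signature.parameters.items()
            if name != 'self' and param.kind == param.POSITIONAL_OR_KEYWORD)

    def get_params(self):
        return {name: getattr(self, name) for name in self._param_names()}

    def set_params(self, **params):
        valid = self._param_names()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(
                    f'invalid parameter {name!r} for {type(self).__name__}')
            setattr(self, name, value)
        # Returning self allows calls to be chained.
        return self


class ToyProcessor(ParamsMixin):
    """Hypothetical processor with two explicit parameters."""

    def __init__(self, frame_shift=0.01, frame_length=0.025):
        self.frame_shift = frame_shift
        self.frame_length = frame_length


proc = ToyProcessor().set_params(frame_shift=0.02)
params = proc.get_params()
```

Here `params` maps 'frame_shift' to 0.02 and 'frame_length' to 0.025, and a call such as `proc.set_params(bogus=1)` raises ValueError.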

Base processor ¶

This module implements the speech features extraction models (processors)

A speech features processor takes an audio signal as input and outputs features:

Audio –> FeaturesProcessor –> Features

class shennong.processor.base.FeaturesProcessor[source]¶

Bases: shennong.base.BaseProcessor

Base class of all the features extraction models

abstract property name¶: Name of the processor

abstract property ndims¶: Dimension of the output features frames

get_properties(**kwargs)[source]¶: Return the processor’s properties as a dictionary

abstract process(signal)[source]¶

Returns features processed from an input signal

Parameters: signal (shennong.audio.Audio) – The input audio signal to process features on
Returns: features (Features) – The computed features

process_all(utterances, njobs=None, **kwargs)[source]¶

Returns features processed from several input utterances

This function processes the features in parallel jobs.

Parameters

utterances (shennong.utterances.Utterances) – The utterances to process features on.
njobs (int, optional) – The number of parallel jobs to run in the background. Defaults to the number of CPU cores available on the machine.
**kwargs (dict, optional) – Extra arguments to be forwarded to the process method. Keys must be the same as for utterances.

Returns

features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input utterances.

Raises

ValueError – If the njobs parameter is <= 0 or if an entry is missing in optional kwargs.
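The contract described above (one result per utterance, extra arguments indexed by the same keys, ValueError on bad input) can be sketched with the standard library. This is an illustration under stated assumptions, not shennong's code: the free function `process_all`, the use of a plain dict in place of Utterances, the thread pool and `toy_process` are all hypothetical, and each extra argument is assumed to be a dict keyed by utterance name.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def process_all(process, utterances, njobs=None, **kwargs):
    """Hypothetical stand-in: utterances is a plain dict name -> signal."""
    if njobs is None:
        njobs = os.cpu_count() or 1
    if njobs <= 0:
        raise ValueError(f'njobs must be strictly positive, it is {njobs}')

    # Assumption: each extra argument is a dict indexed like utterances.
    for name, values in kwargs.items():
        missing = set(utterances) - set(values)
        if missing:
            raise ValueError(f'missing entries in {name}: {sorted(missing)}')

    def job(key):
        # Pick the per-utterance value of every extra argument.
        extra = {name: values[key] for name, values in kwargs.items()}
        return key, process(utterances[key], **extra)

    with ThreadPoolExecutor(max_workers=njobs) as pool:
        # The output keys are the keys of the input utterances.
        return dict(pool.map(job, utterances))


def toy_process(signal, gain=1.0):
    """Hypothetical process method: scale every sample."""
    return [gain * sample for sample in signal]


utts = {'utt1': [1.0, 2.0], 'utt2': [3.0]}
feats = process_all(
    toy_process, utts, njobs=2, gain={'utt1': 2.0, 'utt2': 0.5})
```

Each utterance is processed with its own `gain` value, and the result is indexed by 'utt1' and 'utt2'.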

get_params(deep=True)¶

Get parameters for this processor.

Parameters: deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True.
Returns: params (mapping of string to any) – Parameter names mapped to their values.

property log¶: Processor logger

set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')¶

Change level and/or format of the processor’s logger

Parameters

level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.
formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default, displays the level, name and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display the time, level, name and message.

set_params(**params)¶

Set the parameters of this processor.

Returns: self
Raises: ValueError – If any given parameter in params is invalid for the processor.

class shennong.processor.base.FramesProcessor(sample_rate=16000, frame_shift=0.01, frame_length=0.025, dither=1.0, preemph_coeff=0.97, remove_dc_offset=True, window_type='povey', round_to_power_of_two=True, blackman_coeff=0.42, snip_edges=True)[source]¶

Bases: shennong.processor.base.FeaturesProcessor

A base class for frame based features processors.

Wrap the kaldi frames implementation. See [kaldi-frame].

References

kaldi-frame: http://kaldi-asr.org/doc/structkaldi_1_1FrameExtractionOptions.html

property sample_rate¶

Waveform sample frequency in Hertz

Must match the sample rate of the signal specified in process

property frame_shift¶: Frame shift in seconds

property frame_length¶: Frame length in seconds

property dither¶

Amount of dithering

0.0 means no dither

property preemph_coeff¶: Coefficient for use in signal preemphasis
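Pre-emphasis is commonly implemented as a first-order high-pass filter, y[n] = x[n] - c * x[n-1], where c is the coefficient. The sketch below assumes that form; the function name `preemphasize` is hypothetical and the handling of the first sample (filtered against itself) is an assumption about the boundary convention.

```python
def preemphasize(samples, coeff=0.97):
    """Return a pre-emphasized copy of a list of samples."""
    if not samples:
        return []
    # Assumed boundary rule: the first sample is filtered against itself.
    first = samples[0] - coeff * samples[0]
    rest = [samples[i] - coeff * samples[i - 1]
            for i in range(1, len(samples))]
    return [first] + rest


# A constant signal is attenuated by a factor (1 - coeff).
flat = preemphasize([1.0, 1.0, 1.0], coeff=0.5)
```

A coefficient of 0.0 leaves the signal unchanged.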

get_params(deep=True)¶

Get parameters for this processor.

Parameters: deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True.
Returns: params (mapping of string to any) – Parameter names mapped to their values.

get_properties(**kwargs)¶: Return the processor’s properties as a dictionary

property log¶: Processor logger

abstract property name¶: Name of the processor

abstract property ndims¶: Dimension of the output features frames

abstract process(signal)¶

Returns features processed from an input signal

Parameters: signal (shennong.audio.Audio) – The input audio signal to process features on
Returns: features (Features) – The computed features

process_all(utterances, njobs=None, **kwargs)¶

Returns features processed from several input utterances

This function processes the features in parallel jobs.

Parameters

utterances (shennong.utterances.Utterances) – The utterances to process features on.
njobs (int, optional) – The number of parallel jobs to run in the background. Defaults to the number of CPU cores available on the machine.
**kwargs (dict, optional) – Extra arguments to be forwarded to the process method. Keys must be the same as for utterances.

Returns

features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input utterances.

Raises

ValueError – If the njobs parameter is <= 0 or if an entry is missing in optional kwargs.

property remove_dc_offset¶: If True, subtract mean from waveform on each frame

set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')¶

Change level and/or format of the processor’s logger

Parameters

level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.
formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default, displays the level, name and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display the time, level, name and message.

set_params(**params)¶

Set the parameters of this processor.

Returns: self
Raises: ValueError – If any given parameter in params is invalid for the processor.

property window_type¶

Type of window

Must be ‘hamming’, ‘hanning’, ‘povey’, ‘rectangular’ or ‘blackman’

property round_to_power_of_two¶

If true, round window size to power of two

This is done by zero-padding input to FFT
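As a small illustration of the rounding, the sketch below computes the padded window size in samples. The function name `padded_window_size` is hypothetical; the only assumption is that the size is rounded up to the next power of two.

```python
def padded_window_size(window_size, round_to_power_of_two=True):
    """Return the FFT input size for a window of window_size samples."""
    if not round_to_power_of_two:
        return window_size
    size = 1
    while size < window_size:
        size *= 2
    return size


# 25 ms at 16 kHz is 400 samples, zero-padded to 512 for the FFT.
padded = padded_window_size(400)
```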

property blackman_coeff¶

Constant coefficient for generalized Blackman window

Used only if window_type is ‘blackman’

property snip_edges¶

If true, output only frames that completely fit in the file

When True the number of frames depends on the frame_length. If False, the number of frames depends only on the frame_shift, and we reflect the data at the ends.
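The dependence of the frame count on snip_edges can be made concrete. The sketch below assumes the counting rule used by Kaldi's frame extraction, which this class wraps; the function name `num_frames` is hypothetical and durations are converted to samples by rounding.

```python
def num_frames(nsamples, sample_rate=16000, frame_shift=0.01,
               frame_length=0.025, snip_edges=True):
    """Return the number of frames extracted from nsamples samples."""
    shift = round(sample_rate * frame_shift)
    length = round(sample_rate * frame_length)
    if snip_edges:
        # Only frames that fit entirely in the signal are kept.
        if nsamples < length:
            return 0
        return 1 + (nsamples - length) // shift
    # Frame count depends on the shift only; edges are reflected.
    return (nsamples + shift // 2) // shift


one_second = 16000
with_snip = num_frames(one_second, snip_edges=True)
without_snip = num_frames(one_second, snip_edges=False)
```

Under these assumptions, one second of audio at 16 kHz gives 98 frames with snip_edges set to True and 100 frames with it set to False.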

times(nframes)[source]¶: Returns the time labels for the rows returned by process()

class shennong.processor.base.MelFeaturesProcessor(sample_rate=16000, frame_shift=0.01, frame_length=0.025, dither=1.0, preemph_coeff=0.97, remove_dc_offset=True, window_type='povey', round_to_power_of_two=True, blackman_coeff=0.42, snip_edges=True, num_bins=23, low_freq=20, high_freq=0, vtln_low=100, vtln_high=-500)[source]¶

Bases: shennong.processor.base.FramesProcessor

A base class for mel-based features processors

The mel-based features are MFCC, PLP and filterbanks. The class implements common options for processing those features. See [kaldi-mel] and [kaldi-frame-2].

References

kaldi-frame-2: http://kaldi-asr.org/doc/structkaldi_1_1FrameExtractionOptions.html
kaldi-mel: http://kaldi-asr.org/doc/structkaldi_1_1MelBanksOptions.html

property blackman_coeff¶

Constant coefficient for generalized Blackman window

Used only if window_type is ‘blackman’

property dither¶

Amount of dithering

0.0 means no dither

property frame_length¶: Frame length in seconds

property frame_shift¶: Frame shift in seconds

get_params(deep=True)¶

Get parameters for this processor.

Parameters: deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True.
Returns: params (mapping of string to any) – Parameter names mapped to their values.

get_properties(**kwargs)¶: Return the processor’s properties as a dictionary

property log¶: Processor logger

abstract property name¶: Name of the processor

abstract property ndims¶: Dimension of the output features frames

property preemph_coeff¶: Coefficient for use in signal preemphasis

process_all(utterances, njobs=None, **kwargs)¶

Returns features processed from several input utterances

This function processes the features in parallel jobs.

Parameters

utterances (shennong.utterances.Utterances) – The utterances to process features on.
njobs (int, optional) – The number of parallel jobs to run in the background. Defaults to the number of CPU cores available on the machine.
**kwargs (dict, optional) – Extra arguments to be forwarded to the process method. Keys must be the same as for utterances.

Returns

features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input utterances.

Raises

ValueError – If the njobs parameter is <= 0 or if an entry is missing in optional kwargs.

property remove_dc_offset¶: If True, subtract mean from waveform on each frame

property round_to_power_of_two¶

If true, round window size to power of two

This is done by zero-padding input to FFT

property sample_rate¶

Waveform sample frequency in Hertz

Must match the sample rate of the signal specified in process

set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')¶

Change level and/or format of the processor’s logger

Parameters

level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.
formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default, displays the level, name and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display the time, level, name and message.

set_params(**params)¶

Set the parameters of this processor.

Returns: self
Raises: ValueError – If any given parameter in params is invalid for the processor.

property snip_edges¶

If true, output only frames that completely fit in the file

When True the number of frames depends on the frame_length. If False, the number of frames depends only on the frame_shift, and we reflect the data at the ends.

times(nframes)¶: Returns the time labels for the rows returned by process()

property window_type¶

Type of window

Must be ‘hamming’, ‘hanning’, ‘povey’, ‘rectangular’ or ‘blackman’

property num_bins¶

Number of triangular mel-frequency bins

The minimal number of bins is 3

property low_freq¶: Low cutoff frequency for mel bins in Hertz

property high_freq¶

High cutoff frequency for mel bins in Hertz

If high_freq < 0, offset from the Nyquist frequency

property vtln_low¶

Low inflection point in piecewise linear VTLN warping function

In Hertz

property vtln_high¶

High inflection point in piecewise linear VTLN warping function

In Hertz. If vtln_high < 0, offset from high_freq
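The offset conventions for high_freq and vtln_high can be worked through on the default values. This sketch follows the descriptions above and is not the library's code: the function name `resolve_cutoffs` is hypothetical, and treating the default high_freq of 0 as a zero offset from the Nyquist frequency is an assumption.

```python
def resolve_cutoffs(sample_rate=16000, high_freq=0, vtln_high=-500):
    """Return the absolute (high_freq, vtln_high) cutoffs in Hertz."""
    nyquist = sample_rate / 2
    # Assumption: a null high_freq behaves like a zero offset from Nyquist.
    high = high_freq if high_freq > 0 else nyquist + high_freq
    # A negative vtln_high is an offset from the resolved high_freq.
    vtln = vtln_high if vtln_high >= 0 else high + vtln_high
    return high, vtln


defaults = resolve_cutoffs()
shifted = resolve_cutoffs(high_freq=-400)
```

With the default options at 16 kHz this gives a high cutoff of 8000 Hz and a VTLN high inflection point of 7500 Hz; with high_freq set to -400 the two values become 7600 Hz and 7100 Hz.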

process(signal, vtln_warp=1.0)[source]¶

Compute features with the specified options

Do an optional feature-level vocal tract length normalization (VTLN) when vtln_warp != 1.0.

Parameters

signal (Audio, shape = [nsamples, 1]) – The input audio signal to compute the features on, must be mono
vtln_warp (float, optional) – The VTLN warping factor to be applied when computing features. Defaults to 1.0, meaning no warping is done.

Returns

features (Features, shape = [nframes, ndims]) – The computed features, output will have as many rows as there are frames (depends on the specified options frame_shift and frame_length).

Raises

ValueError – If the input signal has more than one channel (i.e. is not mono), or if sample_rate != signal.sample_rate.

Base post-processor ¶

A post-processor takes features as input and outputs new features:

Features –> FeaturesPostProcessor –> Features

class shennong.postprocessor.base.FeaturesPostProcessor[source]¶

Bases: shennong.processor.base.FeaturesProcessor

Base class of all features post-processors

abstract process(features)[source]¶: Returns features post-processed from input features

get_properties(features)[source]¶: Return the processor’s properties as a dictionary

get_params(deep=True)¶

Get parameters for this processor.

Parameters: deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True.
Returns: params (mapping of string to any) – Parameter names mapped to their values.

property log¶: Processor logger

abstract property name¶: Name of the processor

abstract property ndims¶: Dimension of the output features frames

process_all(utterances, njobs=None, **kwargs)¶

Returns features processed from several input utterances

This function processes the features in parallel jobs.

Parameters

utterances (shennong.utterances.Utterances) – The utterances to process features on.
njobs (int, optional) – The number of parallel jobs to run in the background. Defaults to the number of CPU cores available on the machine.
**kwargs (dict, optional) – Extra arguments to be forwarded to the process method. Keys must be the same as for utterances.

Returns

features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input utterances.

Raises

ValueError – If the njobs parameter is <= 0 or if an entry is missing in optional kwargs.

set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')¶

Change level and/or format of the processor’s logger

Parameters

level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.
formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default, displays the level, name and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display the time, level, name and message.

set_params(**params)¶

Set the parameters of this processor.

Returns: self
Raises: ValueError – If any given parameter in params is invalid for the processor.
