Abstract base classes

This section documents the abstract base classes at the top of the inheritance tree in shennong.

Base processor

Base classes for all shennong components

class shennong.base.BaseProcessor[source]

Bases: object

Base class for all processors in shennong

Notes

All processors should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).

The methods get_params() and set_params() are adapted from sklearn.base.BaseEstimator
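The requirement for explicit keyword arguments is what makes parameter introspection possible. The following is a minimal sketch of that pattern, not shennong's implementation: SketchProcessor and _param_names are hypothetical names, and the deep option is left out.

```python
import inspect


class SketchProcessor:
    """Hypothetical stand-in for a processor (not shennong code)."""

    def __init__(self, frame_shift=0.01, frame_length=0.025):
        self.frame_shift = frame_shift
        self.frame_length = frame_length

    @classmethod
    def _param_names(cls):
        # Read the parameter names from the __init__ signature. This only
        # works because there is no *args or **kwargs hiding parameters.
        signature = inspect.signature(cls.__init__)
        return sorted(
            name for name, param in signature.parameters.items()
            if name != 'self'
            and param.kind == inspect.Parameter.POSITIONAL_OR_KEYWORD)

    def get_params(self):
        # Map each declared parameter name to its current value
        return {name: getattr(self, name) for name in self._param_names()}

    def set_params(self, **params):
        valid = self._param_names()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f'invalid parameter {name!r}')
            setattr(self, name, value)
        # Returning self allows chained calls
        return self
```

Returning self from set_params() allows calls such as SketchProcessor().set_params(frame_shift=0.02).get_params().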

abstract property name

Processor name

property log

Processor logger

set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')[source]

Change level and/or format of the processor’s logger

Parameters
  • level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.

  • formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default, displays level, name and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display time, level, name and message.
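The effect of the two parameters can be reproduced with the standard logging module alone. This is only an illustration of the level threshold and of the default format string; the logger name 'sketch-processor' and the in-memory stream are assumptions made for the example, and set_logger() itself is not called.

```python
import io
import logging

# Capture the log output in memory so it can be inspected
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter('%(levelname)s - %(name)s - %(message)s'))

log = logging.getLogger('sketch-processor')  # hypothetical logger name
log.propagate = False  # keep the example isolated from the root logger
log.addHandler(handler)
log.setLevel(logging.WARNING)  # plays the role of level='warning'

log.info('dropped')   # below the threshold: ignored
log.warning('kept')   # at the threshold: emitted

output = stream.getvalue()
print(output, end='')  # WARNING - sketch-processor - kept
```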

get_params(deep=True)[source]

Get parameters for this processor.

Parameters

deep (boolean, optional) – If True, return the parameters of this processor and of contained subobjects that are processors. Defaults to True.

Returns

params (mapping of string to any) – Parameter names mapped to their values.

set_params(**params)[source]

Set the parameters of this processor.

Returns

self

Raises

ValueError – If any given parameter in params is invalid for the processor.

Base features processor

This module implements the speech features extraction models (processors)

A speech features processor takes an audio signal as input and outputs features:

Audio –> FeaturesProcessor –> Features

class shennong.processor.base.FeaturesProcessor[source]

Bases: shennong.base.BaseProcessor

Base class of all the features extraction models

abstract property name

Name of the processor

abstract property ndims

Dimension of the output features frames

get_properties(**kwargs)[source]

Return the processor’s properties as a dictionary

abstract process(signal)[source]

Returns features processed from an input signal

Parameters

signal (Audio) – The input audio signal to process features on

Returns

features (Features) – The computed features
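A concrete processor has to provide the three abstract members name, ndims and process(). The sketch below shows that contract with the standard abc module. FeaturesProcessorSketch and EnergySketch are hypothetical stand-ins, and plain Python lists replace the Audio and Features classes so that the example runs without shennong.

```python
import abc


class FeaturesProcessorSketch(abc.ABC):
    """Hypothetical stand-in mirroring the abstract members of the base."""

    @property
    @abc.abstractmethod
    def name(self):
        """Name of the processor"""

    @property
    @abc.abstractmethod
    def ndims(self):
        """Dimension of the output features frames"""

    @abc.abstractmethod
    def process(self, signal):
        """Returns features processed from an input signal"""


class EnergySketch(FeaturesProcessorSketch):
    """Toy processor: one energy value per block of 4 samples."""

    @property
    def name(self):
        return 'energy-sketch'

    @property
    def ndims(self):
        return 1

    def process(self, signal):
        # `signal` is a plain list of samples here, not a shennong Audio.
        # Incomplete trailing blocks are dropped.
        blocks = [signal[i:i + 4] for i in range(0, len(signal) - 3, 4)]
        return [[sum(x * x for x in block)] for block in blocks]
```

Instantiating FeaturesProcessorSketch directly raises a TypeError, whereas EnergySketch can be instantiated because it implements every abstract member.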

process_all(utterances, njobs=None, **kwargs)[source]

Returns features processed from several input utterances

This function processes the features in parallel jobs.

Parameters
  • utterances (Utterances) – The utterances to process features on.

  • njobs (int, optional) – The number of parallel jobs to run in background. Defaults to the number of CPU cores available on the machine.

  • **kwargs (dict, optional) – Extra arguments to be forwarded to the process method. Keys must be the same as for utterances.

Returns

features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input utterances.

Raises

ValueError – If the njobs parameter is <= 0 or if an entry is missing in the optional kwargs.
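The behaviour described for process_all() can be sketched with the standard library. This is an assumption-laden illustration, not shennong's code: process_all_sketch is a hypothetical name, utterances is a plain dict, a thread pool stands in for the parallel backend, and each extra keyword argument is read as a dict indexed by the same keys as the utterances.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def process_all_sketch(process, utterances, njobs=None, **kwargs):
    """Apply `process` to every item of the `utterances` dict in parallel."""
    if njobs is None:
        njobs = os.cpu_count() or 1
    if njobs <= 0:
        raise ValueError(f'njobs must be strictly positive, it is {njobs}')

    # Each extra argument must provide one value per utterance
    for name, values in kwargs.items():
        missing = set(utterances) - set(values)
        if missing:
            raise ValueError(f'missing entries in {name}: {sorted(missing)}')

    def run(key):
        # Pick the per-utterance value of every extra argument
        extra = {name: values[key] for name, values in kwargs.items()}
        return key, process(utterances[key], **extra)

    with ThreadPoolExecutor(max_workers=njobs) as pool:
        return dict(pool.map(run, utterances))


def scale(signal, gain=1):
    """Toy `process` function used for the demonstration."""
    return [gain * x for x in signal]


signals = {'utt1': [1, 2, 3], 'utt2': [4, 5]}
scaled = process_all_sketch(
    scale, signals, njobs=2, gain={'utt1': 2, 'utt2': 10})
print(scaled)  # {'utt1': [2, 4, 6], 'utt2': [40, 50]}
```

The output keeps the keys of the input, which matches the documented return value.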

get_params(deep=True)

Get parameters for this processor.

Parameters

deep (boolean, optional) – If True, return the parameters of this processor and of contained subobjects that are processors. Defaults to True.

Returns

params (mapping of string to any) – Parameter names mapped to their values.

property log

Processor logger

set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')

Change level and/or format of the processor’s logger

Parameters
  • level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.

  • formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default, displays level, name and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display time, level, name and message.

set_params(**params)

Set the parameters of this processor.

Returns

self

Raises

ValueError – If any given parameter in params is invalid for the processor.

class shennong.processor.base.FramesProcessor(sample_rate=16000, frame_shift=0.01, frame_length=0.025, dither=1.0, preemph_coeff=0.97, remove_dc_offset=True, window_type='povey', round_to_power_of_two=True, blackman_coeff=0.42, snip_edges=True)[source]

Bases: shennong.processor.base.FeaturesProcessor

A base class for frame based features processors.

Wraps the Kaldi frames implementation. See [kaldi-frame].

References

kaldi-frame

http://kaldi-asr.org/doc/structkaldi_1_1FrameExtractionOptions.html

property sample_rate

Waveform sample frequency in Hertz

Must match the sample rate of the signal specified in process

property frame_shift

Frame shift in seconds

property frame_length

Frame length in seconds

property dither

Amount of dithering

0.0 means no dither

property preemph_coeff

Coefficient for use in signal preemphasis
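Pre-emphasis is a first-order high-pass filter applied to each frame. The sketch below shows the usual formulation, y[i] = x[i] - preemph_coeff * x[i-1]. The handling of the first sample (using the sample itself as its predecessor) follows my understanding of Kaldi and should be treated as an assumption; preemphasize is a hypothetical helper name.

```python
def preemphasize(frame, preemph_coeff=0.97):
    """Return a pre-emphasized copy of a frame (a list of samples)."""
    if not frame:
        return []
    # The first sample has no predecessor: the sample itself is used
    output = [frame[0] - preemph_coeff * frame[0]]
    for previous, current in zip(frame, frame[1:]):
        output.append(current - preemph_coeff * previous)
    return output


print(preemphasize([2.0, 4.0], preemph_coeff=0.5))  # [1.0, 3.0]
```

With preemph_coeff=0.0 the frame is returned unchanged.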

get_params(deep=True)

Get parameters for this processor.

Parameters

deep (boolean, optional) – If True, return the parameters of this processor and of contained subobjects that are processors. Defaults to True.

Returns

params (mapping of string to any) – Parameter names mapped to their values.

get_properties(**kwargs)

Return the processor’s properties as a dictionary

property log

Processor logger

abstract property name

Name of the processor

abstract property ndims

Dimension of the output features frames

abstract process(signal)

Returns features processed from an input signal

Parameters

signal (Audio) – The input audio signal to process features on

Returns

features (Features) – The computed features

process_all(utterances, njobs=None, **kwargs)

Returns features processed from several input utterances

This function processes the features in parallel jobs.

Parameters
  • utterances (Utterances) – The utterances to process features on.

  • njobs (int, optional) – The number of parallel jobs to run in background. Defaults to the number of CPU cores available on the machine.

  • **kwargs (dict, optional) – Extra arguments to be forwarded to the process method. Keys must be the same as for utterances.

Returns

features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input utterances.

Raises

ValueError – If the njobs parameter is <= 0 or if an entry is missing in the optional kwargs.

property remove_dc_offset

If True, subtract mean from waveform on each frame

set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')

Change level and/or format of the processor’s logger

Parameters
  • level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.

  • formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default, displays level, name and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display time, level, name and message.

set_params(**params)

Set the parameters of this processor.

Returns

self

Raises

ValueError – If any given parameter in params is invalid for the processor.

property window_type

Type of window

Must be ‘hamming’, ‘hanning’, ‘povey’, ‘rectangular’ or ‘blackman’

property round_to_power_of_two

If true, round window size to power of two

This is done by zero-padding input to FFT

property blackman_coeff

Constant coefficient for generalized Blackman window

Used only if window_type is ‘blackman’
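The five window types can be written out explicitly. The formulas below follow my recollection of Kaldi's window definitions (in particular, the ‘povey’ window is a Hanning window raised to the power 0.85) and are offered as a sketch rather than as the wrapped implementation; the window function name is hypothetical and size must be at least 2.

```python
import math


def window(size, window_type='povey', blackman_coeff=0.42):
    """Return the window coefficients as a list of `size` floats."""
    a = 2.0 * math.pi / (size - 1)
    values = []
    for i in range(size):
        if window_type == 'hanning':
            w = 0.5 - 0.5 * math.cos(a * i)
        elif window_type == 'hamming':
            w = 0.54 - 0.46 * math.cos(a * i)
        elif window_type == 'povey':
            # Hanning-like window that goes to zero at the edges
            w = (0.5 - 0.5 * math.cos(a * i)) ** 0.85
        elif window_type == 'rectangular':
            w = 1.0
        elif window_type == 'blackman':
            # blackman_coeff is only used in this branch
            w = (blackman_coeff - 0.5 * math.cos(a * i)
                 + (0.5 - blackman_coeff) * math.cos(2 * a * i))
        else:
            raise ValueError(f'invalid window type {window_type!r}')
        values.append(w)
    return values
```

For instance, window(400) gives the coefficients for the default 25 ms frame at 16 kHz, and any value outside the five documented names raises a ValueError.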

property snip_edges

If true, output only frames that completely fit in the file

When True the number of frames depends on the frame_length. If False, the number of frames depends only on the frame_shift, and we reflect the data at the ends.
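The interplay between frame_shift, frame_length, round_to_power_of_two and snip_edges can be made concrete with a small computation. The counting rules below follow my understanding of Kaldi's frame extraction and are a sketch under that assumption; frame_geometry is a hypothetical helper, and round() is used to guard against floating-point error when converting seconds to samples.

```python
def frame_geometry(nsamples, sample_rate=16000, frame_shift=0.01,
                   frame_length=0.025, round_to_power_of_two=True,
                   snip_edges=True):
    """Return (window size, padded window size, number of frames)."""
    shift = round(sample_rate * frame_shift)
    length = round(sample_rate * frame_length)

    # The FFT input is zero-padded up to the next power of two
    padded = 1 << (length - 1).bit_length() if round_to_power_of_two \
        else length

    if snip_edges:
        # Only frames that completely fit in the signal
        nframes = 0 if nsamples < length else 1 + (nsamples - length) // shift
    else:
        # Depends on the shift only, the signal is reflected at the ends
        nframes = (nsamples + shift // 2) // shift

    return length, padded, nframes


print(frame_geometry(16000))                    # (400, 512, 98)
print(frame_geometry(16000, snip_edges=False))  # (400, 512, 100)
```

For one second of audio at 16 kHz with the default options, the window has 400 samples, is padded to 512 for the FFT, and yields 98 frames with snip_edges=True or 100 frames with snip_edges=False.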

times(nframes)[source]

Returns the time labels for the rows returned by process()

class shennong.processor.base.MelFeaturesProcessor(sample_rate=16000, frame_shift=0.01, frame_length=0.025, dither=1.0, preemph_coeff=0.97, remove_dc_offset=True, window_type='povey', round_to_power_of_two=True, blackman_coeff=0.42, snip_edges=True, num_bins=23, low_freq=20, high_freq=0, vtln_low=100, vtln_high=-500)[source]

Bases: shennong.processor.base.FramesProcessor

A base class for mel-based features processors

The mel-based features are MFCC, PLP and filterbanks. The class implements common options for processing those features. See [kaldi-mel] and [kaldi-frame-2].

References

kaldi-frame-2

http://kaldi-asr.org/doc/structkaldi_1_1FrameExtractionOptions.html

kaldi-mel

http://kaldi-asr.org/doc/structkaldi_1_1MelBanksOptions.html

property blackman_coeff

Constant coefficient for generalized Blackman window

Used only if window_type is ‘blackman’

property dither

Amount of dithering

0.0 means no dither

property frame_length

Frame length in seconds

property frame_shift

Frame shift in seconds

get_params(deep=True)

Get parameters for this processor.

Parameters

deep (boolean, optional) – If True, return the parameters of this processor and of contained subobjects that are processors. Defaults to True.

Returns

params (mapping of string to any) – Parameter names mapped to their values.

get_properties(**kwargs)

Return the processor’s properties as a dictionary

property log

Processor logger

abstract property name

Name of the processor

abstract property ndims

Dimension of the output features frames

property preemph_coeff

Coefficient for use in signal preemphasis

process_all(utterances, njobs=None, **kwargs)

Returns features processed from several input utterances

This function processes the features in parallel jobs.

Parameters
  • utterances (Utterances) – The utterances to process features on.

  • njobs (int, optional) – The number of parallel jobs to run in background. Defaults to the number of CPU cores available on the machine.

  • **kwargs (dict, optional) – Extra arguments to be forwarded to the process method. Keys must be the same as for utterances.

Returns

features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input utterances.

Raises

ValueError – If the njobs parameter is <= 0 or if an entry is missing in the optional kwargs.

property remove_dc_offset

If True, subtract mean from waveform on each frame

property round_to_power_of_two

If true, round window size to power of two

This is done by zero-padding input to FFT

property sample_rate

Waveform sample frequency in Hertz

Must match the sample rate of the signal specified in process

set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')

Change level and/or format of the processor’s logger

Parameters
  • level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.

  • formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default, displays level, name and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display time, level, name and message.

set_params(**params)

Set the parameters of this processor.

Returns

self

Raises

ValueError – If any given parameter in params is invalid for the processor.

property snip_edges

If true, output only frames that completely fit in the file

When True the number of frames depends on the frame_length. If False, the number of frames depends only on the frame_shift, and we reflect the data at the ends.

times(nframes)

Returns the time labels for the rows returned by process()

property window_type

Type of window

Must be ‘hamming’, ‘hanning’, ‘povey’, ‘rectangular’ or ‘blackman’

property num_bins

Number of triangular mel-frequency bins

The minimal number of bins is 3

property low_freq

Low cutoff frequency for mel bins in Hertz

property high_freq

High cutoff frequency for mel bins in Hertz

If high_freq <= 0, offset from the Nyquist frequency (the default of 0 is the Nyquist frequency itself)
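The options num_bins, low_freq and high_freq together define where the triangular bins sit on the frequency axis. The sketch below resolves the high cutoff and places the bin centers uniformly on the mel scale. It uses the common mel formula 1127 * ln(1 + f / 700) and reads a zero high_freq as the Nyquist frequency itself; both choices reflect my understanding of Kaldi rather than code taken from shennong, and the function names are hypothetical.

```python
import math


def mel(freq):
    """Convert a frequency in Hertz to the mel scale."""
    return 1127.0 * math.log(1.0 + freq / 700.0)


def inverse_mel(value):
    """Convert a mel value back to a frequency in Hertz."""
    return 700.0 * (math.exp(value / 1127.0) - 1.0)


def mel_bin_centers(sample_rate=16000, num_bins=23, low_freq=20,
                    high_freq=0):
    """Return the center frequency in Hertz of each triangular bin."""
    nyquist = 0.5 * sample_rate
    if high_freq <= 0:
        # Zero or negative values are offsets from the Nyquist frequency
        high_freq += nyquist
    if num_bins < 3:
        raise ValueError('the minimal number of bins is 3')
    if not 0 <= low_freq < high_freq <= nyquist:
        raise ValueError('invalid low_freq or high_freq')

    # Bins are evenly spaced on the mel scale, with half-overlapping
    # triangles: num_bins centers need num_bins + 1 intervals
    mel_low, mel_high = mel(low_freq), mel(high_freq)
    delta = (mel_high - mel_low) / (num_bins + 1)
    return [inverse_mel(mel_low + (i + 1) * delta) for i in range(num_bins)]
```

With the default options the 23 centers lie strictly between 20 Hz and 8000 Hz, and they get further apart as the frequency increases.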

property vtln_low

Low inflection point in piecewise linear VTLN warping function

In Hertz

property vtln_high

High inflection point in piecewise linear VTLN warping function

In Hertz. If vtln_high < 0, offset from high_freq

process(signal, vtln_warp=1.0)[source]

Compute features with the specified options

Do an optional feature-level vocal tract length normalization (VTLN) when vtln_warp != 1.0.

Parameters
  • signal (Audio, shape = [nsamples, 1]) – The input audio signal to compute the features on, must be mono

  • vtln_warp (float, optional) – The VTLN warping factor to apply when computing features. Defaults to 1.0, meaning no warping.

Returns

features (Features, shape = [nframes, ndims]) – The computed features, output will have as many rows as there are frames (depends on the specified options frame_shift and frame_length).

Raises

ValueError – If the input signal has more than one channel (i.e. is not mono), or if sample_rate != signal.sample_rate.

Base post-processor

A post-processor takes features as input and outputs new features:

Features –> FeaturesPostProcessor –> Features

class shennong.postprocessor.base.FeaturesPostProcessor[source]

Bases: shennong.processor.base.FeaturesProcessor

Base class of all features post-processors

abstract process(features)[source]

Returns features post-processed from input features
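As an illustration of the features-in, features-out contract, the sketch below subtracts the per-dimension mean from a sequence of frames. It is a hypothetical example and not one of shennong's post-processors: mean_normalize is an invented name and the features are plain lists of frames rather than Features instances.

```python
def mean_normalize(features):
    """Subtract the per-dimension mean from a list of feature frames."""
    if not features:
        return []
    ndims = len(features[0])
    # Mean of each dimension, computed over all the frames
    means = [
        sum(frame[d] for frame in features) / len(features)
        for d in range(ndims)]
    return [[frame[d] - means[d] for d in range(ndims)]
            for frame in features]


print(mean_normalize([[1.0, 10.0], [3.0, 30.0]]))
# [[-1.0, -10.0], [1.0, 10.0]]
```

The output has the same number of frames and dimensions as the input, so it can be fed to a further post-processing step.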

get_properties(features)[source]

Return the processor’s properties as a dictionary

get_params(deep=True)

Get parameters for this processor.

Parameters

deep (boolean, optional) – If True, return the parameters of this processor and of contained subobjects that are processors. Defaults to True.

Returns

params (mapping of string to any) – Parameter names mapped to their values.

property log

Processor logger

abstract property name

Name of the processor

abstract property ndims

Dimension of the output features frames

process_all(utterances, njobs=None, **kwargs)

Returns features processed from several input utterances

This function processes the features in parallel jobs.

Parameters
  • utterances (Utterances) – The utterances to process features on.

  • njobs (int, optional) – The number of parallel jobs to run in background. Defaults to the number of CPU cores available on the machine.

  • **kwargs (dict, optional) – Extra arguments to be forwarded to the process method. Keys must be the same as for utterances.

Returns

features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input utterances.

Raises

ValueError – If the njobs parameter is <= 0 or if an entry is missing in the optional kwargs.

set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')

Change level and/or format of the processor’s logger

Parameters
  • level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.

  • formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default, displays level, name and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display time, level, name and message.

set_params(**params)

Set the parameters of this processor.

Returns

self

Raises

ValueError – If any given parameter in params is invalid for the processor.