Abstract base classes
This section documents the abstract base classes at the top of the inheritance tree in shennong.
Base processor
Base classes for all shennong components
-
class
shennong.base.
BaseProcessor
[source]¶ Bases:
object
Base class for all processors in shennong
Notes
All processors should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).
The methods get_params() and set_params() are adapted from sklearn.base.BaseEstimator.
-
get_params
(deep=True)[source]¶ Get parameters for this processor.
- Parameters
deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True.
- Returns
params (mapping of string to any) – Parameter names mapped to their values.
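As a rough illustration of the convention described in the notes, the sketch below shows how a get_params() in the style of sklearn.base.BaseEstimator can recover parameter names by inspecting the __init__ signature. ToyProcessor and its parameters are hypothetical and are not part of shennong; this is not the library's implementation.

```python
import inspect


class ToyProcessor:
    """Hypothetical stand-in for a processor (not part of shennong)."""

    def __init__(self, sample_rate=16000, frame_shift=0.01):
        # every settable parameter is an explicit keyword argument
        self.sample_rate = sample_rate
        self.frame_shift = frame_shift

    def get_params(self):
        # read the parameter names from the __init__ signature, which is
        # only reliable because there is no *args or **kwargs
        names = list(inspect.signature(self.__init__).parameters)
        return {name: getattr(self, name) for name in names}


params = ToyProcessor(frame_shift=0.02).get_params()
print(params)
```

The introspection is the reason the notes ask for explicit keyword arguments: a catch-all argument would hide the parameter names from the signature.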
-
Base features processor
This module implements the speech features extraction models (processors)
A speech features processor takes an audio signal as input and outputs features.
-
class
shennong.features.processor.base.
FeaturesProcessor
[source]¶ Bases:
shennong.base.BaseProcessor
Base class of all the features extraction models
-
abstract property
name
¶ Name of the processor
-
abstract property
ndims
¶ Dimension of the output features frames
-
abstract
process
(signal)[source]¶ Returns features processed from an input signal
- Parameters
signal (shennong.audio.Audio) – The input audio signal to process features on
- Returns
features (
Features
) – The computed features
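The abstract members listed above (name, ndims and process) can be mirrored with the standard abc module. The sketch below is a hypothetical reconstruction of that interface, not the shennong source; ToyFeaturesProcessor and ToyEnergy are invented names, and signals are plain lists of samples rather than Audio objects.

```python
import abc


class ToyFeaturesProcessor(abc.ABC):
    """Hypothetical mirror of the abstract interface described above."""

    @property
    @abc.abstractmethod
    def name(self):
        """Name of the processor"""

    @property
    @abc.abstractmethod
    def ndims(self):
        """Dimension of the output features frames"""

    @abc.abstractmethod
    def process(self, signal):
        """Returns features processed from an input signal"""


class ToyEnergy(ToyFeaturesProcessor):
    """Hypothetical concrete processor: one energy value per signal."""

    @property
    def name(self):
        return 'energy'

    @property
    def ndims(self):
        return 1

    def process(self, signal):
        # the signal is a plain list of samples in this sketch
        return [sum(x * x for x in signal)]


try:
    ToyFeaturesProcessor()
    instantiated = True
except TypeError:
    # the abstract members are not implemented, so Python refuses
    instantiated = False

energy = ToyEnergy().process([1.0, 2.0])
```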
-
process_all
(signals, njobs=None)[source]¶ Returns features processed from several input signals
This function processes the features in parallel jobs.
- Parameters
signals (dict of shennong.audio.Audio) – A dictionary of input audio signals to process features on, where the keys are item names and the values are audio signals.
njobs (int, optional) – The number of parallel jobs to run in the background. Defaults to the number of CPU cores available on the machine.
- Returns
features (
FeaturesCollection
) – The computed features on each input signal. The keys of the output features are the keys of the input signals.
- Raises
ValueError – If the njobs parameter is <= 0
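The contract of process_all (a dictionary in, a collection with the same keys out, a default of one job per CPU core, and a ValueError for non-positive njobs) can be sketched with the standard library. The function below is a hypothetical stand-in that uses a thread pool and a plain dict; the parallel backend actually used by shennong is not stated in this section.

```python
from concurrent.futures import ThreadPoolExecutor
import os


def process_all(process, signals, njobs=None):
    """Apply `process` to each value of `signals`, keeping the keys."""
    if njobs is None:
        # default to the number of CPU cores available
        njobs = os.cpu_count() or 1
    if njobs <= 0:
        raise ValueError('njobs must be strictly positive')

    names = list(signals)
    with ThreadPoolExecutor(max_workers=njobs) as pool:
        # pool.map preserves the order of its inputs
        results = list(pool.map(process, (signals[n] for n in names)))
    return dict(zip(names, results))


# toy "signals" are lists and the toy "processor" is len()
signals = {'utt1': [1, 2, 3], 'utt2': [4, 5]}
features = process_all(len, signals, njobs=2)
```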
-
get_params
(deep=True)¶ Get parameters for this processor.
- Parameters
deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True.
- Returns
params (mapping of string to any) – Parameter names mapped to their values.
-
set_params
(**params)¶ Set the parameters of this processor.
- Returns
self
- Raises
ValueError – If any given parameter in
params
is invalid for the processor.
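A minimal sketch of the set_params() behaviour documented above: unknown names raise ValueError and the method returns self so that calls can be chained. ToyProcessor is hypothetical, and validating every name before assigning any value is a design choice of this sketch rather than a documented property of shennong.

```python
class ToyProcessor:
    """Hypothetical stand-in for a processor (not part of shennong)."""

    def __init__(self, dither=1.0, snip_edges=True):
        self.dither = dither
        self.snip_edges = snip_edges

    def get_params(self):
        return {'dither': self.dither, 'snip_edges': self.snip_edges}

    def set_params(self, **params):
        # reject unknown names before modifying anything
        unknown = sorted(set(params) - set(self.get_params()))
        if unknown:
            raise ValueError(f'invalid parameters: {unknown}')
        for key, value in params.items():
            setattr(self, key, value)
        # returning self allows chained calls
        return self


proc = ToyProcessor().set_params(dither=0.0)
```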
-
class
shennong.features.processor.base.
FramesProcessor
(sample_rate=16000, frame_shift=0.01, frame_length=0.025, dither=1.0, preemph_coeff=0.97, remove_dc_offset=True, window_type='povey', round_to_power_of_two=True, blackman_coeff=0.42, snip_edges=True)[source]¶ Bases:
shennong.features.processor.base.FeaturesProcessor
A base class for frame based features processors.
Wrap the kaldi frames implementation. See [kaldi-frame].
References
-
property
sample_rate
¶ Waveform sample frequency in Hertz
Must match the sample rate of the signal specified in process
-
property
frame_shift
¶ Frame shift in seconds
-
property
frame_length
¶ Frame length in seconds
-
property
dither
¶ Amount of dithering
0.0 means no dither
-
property
preemph_coeff
¶ Coefficient for use in signal preemphasis
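Preemphasis is commonly implemented as a first-order high-pass filter, y[i] = x[i] - coeff * x[i-1]. The sketch below assumes that form; the handling of the first sample (which has no predecessor and here uses itself) is an assumption, and the function is illustrative rather than the code wrapped by shennong.

```python
def preemphasis(samples, coeff=0.97):
    """Return y[i] = x[i] - coeff * x[i - 1] for a list of samples."""
    if not samples:
        return []
    # assumption: the first sample is filtered against itself
    out = [samples[0] - coeff * samples[0]]
    for i in range(1, len(samples)):
        out.append(samples[i] - coeff * samples[i - 1])
    return out


# a constant signal is attenuated by a factor (1 - coeff)
filtered = preemphasis([1.0, 1.0, 1.0], coeff=0.5)
```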
-
get_params
(deep=True)¶ Get parameters for this processor.
- Parameters
deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True.
- Returns
params (mapping of string to any) – Parameter names mapped to their values.
-
get_properties
()¶ Return the processor's properties as a dictionary
-
abstract property
name
¶ Name of the processor
-
abstract property
ndims
¶ Dimension of the output features frames
-
abstract
process
(signal)¶ Returns features processed from an input signal
- Parameters
signal (shennong.audio.Audio) – The input audio signal to process features on
- Returns
features (
Features
) – The computed features
-
process_all
(signals, njobs=None)¶ Returns features processed from several input signals
This function processes the features in parallel jobs.
- Parameters
signals (dict of shennong.audio.Audio) – A dictionary of input audio signals to process features on, where the keys are item names and the values are audio signals.
njobs (int, optional) – The number of parallel jobs to run in the background. Defaults to the number of CPU cores available on the machine.
- Returns
features (
FeaturesCollection
) – The computed features on each input signal. The keys of the output features are the keys of the input signals.
- Raises
ValueError – If the njobs parameter is <= 0
-
property
remove_dc_offset
¶ If True, subtract mean from waveform on each frame
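Removing the DC offset of a frame amounts to subtracting its mean from every sample. A minimal sketch on a plain list of samples (the function name mirrors the property but is otherwise hypothetical):

```python
def remove_dc_offset(frame):
    """Subtract the mean of the frame from each of its samples."""
    mean = sum(frame) / len(frame)
    return [x - mean for x in frame]


centered = remove_dc_offset([1.0, 2.0, 3.0])
```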
-
set_params
(**params)¶ Set the parameters of this processor.
- Returns
self
- Raises
ValueError – If any given parameter in
params
is invalid for the processor.
-
property
window_type
¶ Type of window
Must be ‘hamming’, ‘hanning’, ‘povey’, ‘rectangular’ or ‘blackman’
-
property
round_to_power_of_two
¶ If true, round window size to power of two
This is done by zero-padding input to FFT
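With the default options, a 25 ms frame at 16 kHz holds 400 samples, which is zero-padded to 512 before the FFT. The sketch below computes that padded size; the helper name and the rounding of the window size to the nearest sample are assumptions.

```python
def padded_window_size(sample_rate=16000, frame_length=0.025,
                       round_to_power_of_two=True):
    """Return the FFT input size for the given frame options."""
    # number of samples in one frame (assumed rounded to nearest)
    window_size = int(round(sample_rate * frame_length))
    if not round_to_power_of_two:
        return window_size
    # smallest power of two that is >= window_size
    return 1 << (window_size - 1).bit_length()


padded = padded_window_size()
unpadded = padded_window_size(round_to_power_of_two=False)
```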
-
property
blackman_coeff
¶ Constant coefficient for generalized Blackman window
Used only if window_type is ‘blackman’
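The five window types can be written out explicitly. The formulas below are the usual definitions as I understand them from the Kaldi implementation that this class wraps ('povey' being a Hanning window raised to the power 0.85); treat them as an assumption and check them against [kaldi-frame] before relying on exact values. The function name is hypothetical.

```python
import math


def window(n, window_type='povey', blackman_coeff=0.42):
    """Return the n coefficients of the requested window (n >= 2)."""
    a = 2.0 * math.pi / (n - 1)
    coeffs = []
    for i in range(n):
        if window_type == 'hanning':
            w = 0.5 - 0.5 * math.cos(a * i)
        elif window_type == 'hamming':
            w = 0.54 - 0.46 * math.cos(a * i)
        elif window_type == 'povey':
            # Hanning raised to the power 0.85
            w = (0.5 - 0.5 * math.cos(a * i)) ** 0.85
        elif window_type == 'rectangular':
            w = 1.0
        elif window_type == 'blackman':
            # blackman_coeff is only used by this branch
            w = (blackman_coeff - 0.5 * math.cos(a * i)
                 + (0.5 - blackman_coeff) * math.cos(2 * a * i))
        else:
            raise ValueError(f'invalid window type {window_type}')
        coeffs.append(w)
    return coeffs


hamming = window(5, 'hamming')
```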
-
property
snip_edges
¶ If true, output only frames that completely fit in the file
When True the number of frames depends on the frame_length. If False, the number of frames depends only on the frame_shift, and we reflect the data at the ends.
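The effect of snip_edges on the number of output frames can be made concrete. The sketch below assumes the frame-count convention used by Kaldi (integer arithmetic on sample counts); the helper is hypothetical and the formula should be checked against [kaldi-frame].

```python
def num_frames(num_samples, sample_rate=16000, frame_shift=0.01,
               frame_length=0.025, snip_edges=True):
    """Return the number of frames extracted from num_samples samples."""
    shift = int(round(sample_rate * frame_shift))
    length = int(round(sample_rate * frame_length))
    if snip_edges:
        # only frames that fit completely in the signal
        if num_samples < length:
            return 0
        return 1 + (num_samples - length) // shift
    # depends only on the frame shift; edges are handled by reflection
    return (num_samples + shift // 2) // shift


# one second of audio at 16 kHz with the default options
snipped = num_frames(16000, snip_edges=True)
full = num_frames(16000, snip_edges=False)
```

With the defaults, one second of audio gives 98 frames when snip_edges is True and 100 when it is False.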
-
class
shennong.features.processor.base.
MelFeaturesProcessor
(sample_rate=16000, frame_shift=0.01, frame_length=0.025, dither=1.0, preemph_coeff=0.97, remove_dc_offset=True, window_type='povey', round_to_power_of_two=True, blackman_coeff=0.42, snip_edges=True, num_bins=23, low_freq=20, high_freq=0, vtln_low=100, vtln_high=-500)[source]¶ Bases:
shennong.features.processor.base.FramesProcessor
A base class for mel-based features processors
The mel-based features are MFCC, PLP and filterbanks. The class implements common options for processing those features. See [kaldi-mel] and [kaldi-frame-2].
References
- kaldi-frame-2
http://kaldi-asr.org/doc/structkaldi_1_1FrameExtractionOptions.html
- kaldi-mel
http://kaldi-asr.org/doc/structkaldi_1_1MelBanksOptions.html
-
property
blackman_coeff
¶ Constant coefficient for generalized Blackman window
Used only if window_type is ‘blackman’
-
property
dither
¶ Amount of dithering
0.0 means no dither
-
property
frame_length
¶ Frame length in seconds
-
property
frame_shift
¶ Frame shift in seconds
-
get_params
(deep=True)¶ Get parameters for this processor.
- Parameters
deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True.
- Returns
params (mapping of string to any) – Parameter names mapped to their values.
-
get_properties
()¶ Return the processor's properties as a dictionary
-
abstract property
name
¶ Name of the processor
-
abstract property
ndims
¶ Dimension of the output features frames
-
property
preemph_coeff
¶ Coefficient for use in signal preemphasis
-
process_all
(signals, njobs=None)¶ Returns features processed from several input signals
This function processes the features in parallel jobs.
- Parameters
signals (dict of shennong.audio.Audio) – A dictionary of input audio signals to process features on, where the keys are item names and the values are audio signals.
njobs (int, optional) – The number of parallel jobs to run in the background. Defaults to the number of CPU cores available on the machine.
- Returns
features (
FeaturesCollection
) – The computed features on each input signal. The keys of the output features are the keys of the input signals.
- Raises
ValueError – If the njobs parameter is <= 0
-
property
remove_dc_offset
¶ If True, subtract mean from waveform on each frame
-
property
round_to_power_of_two
¶ If true, round window size to power of two
This is done by zero-padding input to FFT
-
property
sample_rate
¶ Waveform sample frequency in Hertz
Must match the sample rate of the signal specified in process
-
set_params
(**params)¶ Set the parameters of this processor.
- Returns
self
- Raises
ValueError – If any given parameter in
params
is invalid for the processor.
-
property
snip_edges
¶ If true, output only frames that completely fit in the file
When True the number of frames depends on the frame_length. If False, the number of frames depends only on the frame_shift, and we reflect the data at the ends.
-
property
window_type
¶ Type of window
Must be ‘hamming’, ‘hanning’, ‘povey’, ‘rectangular’ or ‘blackman’
-
property
num_bins
¶ Number of triangular mel-frequency bins
The minimal number of bins is 3
-
property
low_freq
¶ Low cutoff frequency for mel bins in Hertz
-
property
high_freq
¶ High cutoff frequency for mel bins in Hertz
If high_freq < 0, offset from the Nyquist frequency
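To give an idea of how num_bins, low_freq and high_freq interact, the sketch below places the centers of triangular bins at equal spacing on the mel scale between the two cutoffs. It assumes the common mel formula 1127 * ln(1 + f / 700) and takes an absolute high cutoff in Hertz; the function names are hypothetical and the exact filter shapes used by the library are not reproduced here.

```python
import math


def hz_to_mel(freq):
    """Convert a frequency in Hertz to the mel scale."""
    return 1127.0 * math.log(1.0 + freq / 700.0)


def mel_to_hz(mel):
    """Convert a mel value back to Hertz."""
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def bin_centers(num_bins=23, low_freq=20.0, high_freq=8000.0):
    """Return the center frequency in Hertz of each mel bin."""
    if num_bins < 3:
        raise ValueError('the minimal number of bins is 3')
    mel_low, mel_high = hz_to_mel(low_freq), hz_to_mel(high_freq)
    # num_bins overlapping triangles span num_bins + 1 equal intervals
    delta = (mel_high - mel_low) / (num_bins + 1)
    return [mel_to_hz(mel_low + (i + 1) * delta) for i in range(num_bins)]


centers = bin_centers()
```

The centers are dense at low frequencies and sparse at high frequencies, which is the point of the mel scale.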
-
property
vtln_low
¶ Low inflection point in piecewise linear VTLN warping function
In Hertz
-
property
vtln_high
¶ High inflection point in piecewise linear VTLN warping function
In Hertz. If vtln_high < 0, offset from high_freq
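The offset conventions for high_freq and vtln_high can be resolved to absolute frequencies as follows. This sketch follows the property descriptions above literally; treating high_freq = 0 (the default) as a zero offset from the Nyquist frequency is an assumption, and resolve_cutoffs is a hypothetical helper.

```python
def resolve_cutoffs(sample_rate=16000, high_freq=0, vtln_high=-500):
    """Return (high_freq, vtln_high) as absolute frequencies in Hertz."""
    nyquist = sample_rate / 2
    # assumption: 0 behaves like a zero offset from the Nyquist frequency
    if high_freq <= 0:
        high_freq = nyquist + high_freq
    # a negative vtln_high is an offset from the resolved high_freq
    if vtln_high < 0:
        vtln_high = high_freq + vtln_high
    return high_freq, vtln_high


defaults = resolve_cutoffs()
```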
-
process
(signal, vtln_warp=1.0)[source]¶ Compute features with the specified options
Do an optional feature-level vocal tract length normalization (VTLN) when vtln_warp != 1.0.
- Parameters
signal (Audio, shape = [nsamples, 1]) – The input audio signal to compute the features on, must be mono
vtln_warp (float, optional) – The VTLN warping factor to be applied when computing features. Defaults to 1.0, meaning no warping is applied.
- Returns
features (Features, shape = [nframes, ndims]) – The computed features, output will have as many rows as there are frames (depends on the specified options frame_shift and frame_length).
- Raises
ValueError – If the input signal has more than one channel (i.e. is not mono), or if sample_rate != signal.sample_rate.
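The two error conditions listed above can be expressed as a small validation step run before any computation. ToyAudio and check_signal are hypothetical stand-ins for illustration; the attributes of the real Audio class are not documented in this section.

```python
class ToyAudio:
    """Hypothetical audio container: one row per sample."""

    def __init__(self, data, sample_rate):
        self.data = data
        self.sample_rate = sample_rate

    @property
    def nchannels(self):
        # each row holds one value per channel
        return len(self.data[0])


def check_signal(signal, sample_rate=16000):
    """Raise ValueError if the signal cannot be processed."""
    if signal.nchannels != 1:
        raise ValueError('signal must be mono')
    if signal.sample_rate != sample_rate:
        raise ValueError(
            f'mismatch in sample rates: processor expects {sample_rate} '
            f'but signal has {signal.sample_rate}')


mono = ToyAudio([[0.1], [0.2]], 16000)
check_signal(mono)  # passes silently
```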
Base features post-processor
A post-processor takes features as input and outputs new features:
Features –> FeaturesPostProcessor –> Features
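The chain above can be sketched with two plain functions: a toy processor that turns a signal into features, followed by a toy post-processor that maps features to new features of the same shape. Both functions are hypothetical, and mean normalization is only one example of a post-processing step.

```python
def extract(signal):
    """Hypothetical processor: one 2-dimensional frame per sample."""
    return [[x, x * x] for x in signal]


def mean_normalize(features):
    """Hypothetical post-processor: subtract the per-dimension mean."""
    ndims = len(features[0])
    means = [sum(f[d] for f in features) / len(features)
             for d in range(ndims)]
    return [[f[d] - means[d] for d in range(ndims)] for f in features]


features = extract([1.0, 2.0, 3.0])
normalized = mean_normalize(features)
```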
-
class
shennong.features.postprocessor.base.
FeaturesPostProcessor
[source]¶ Bases:
shennong.features.processor.base.FeaturesProcessor
-
get_params
(deep=True)¶ Get parameters for this processor.
- Parameters
deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True.
- Returns
params (mapping of string to any) – Parameter names mapped to their values.
-
abstract property
name
¶ Name of the processor
-
abstract property
ndims
¶ Dimension of the output features frames
-
process_all
(signals, njobs=None)¶ Returns features processed from several input signals
This function processes the features in parallel jobs.
- Parameters
signals (dict of shennong.audio.Audio) – A dictionary of input audio signals to process features on, where the keys are item names and the values are audio signals.
njobs (int, optional) – The number of parallel jobs to run in the background. Defaults to the number of CPU cores available on the machine.
- Returns
features (
FeaturesCollection
) – The computed features on each input signal. The keys of the output features are the keys of the input signals.
- Raises
ValueError – If the njobs parameter is <= 0
-
set_params
(**params)¶ Set the parameters of this processor.
- Returns
self
- Raises
ValueError – If any given parameter in
params
is invalid for the processor.
-