CMVN
Cepstral mean variance normalization (CMVN) on speech features
The CmvnPostProcessor class accumulates CMVN statistics and applies CMVN to features using the accumulated statistics. It uses the Kaldi implementation (see [kaldicmvn]):
Features –> CmvnPostProcessor –> Features
The SlidingWindowCmvnPostProcessor class applies sliding-window CMVN. With this class, each window is normalized independently. It uses the Kaldi implementation:
Features –> SlidingWindowCmvnPostProcessor –> Features
Examples
Compute MFCC features:
>>> import numpy as np
>>> from shennong.audio import Audio
>>> from shennong.features.processor.mfcc import MfccProcessor
>>> from shennong.features.postprocessor.cmvn import CmvnPostProcessor
>>> audio = Audio.load('./test/data/test.wav')
>>> mfcc = MfccProcessor(sample_rate=audio.sample_rate).process(audio)
Accumulate CMVN statistics and normalize the features (in real life you want to accumulate statistics over several features, for example on all features belonging to one speaker, so as to obtain a normalization per speaker):
>>> processor = CmvnPostProcessor(mfcc.ndims)
>>> processor.accumulate(mfcc)
>>> cmvn = processor.process(mfcc)
The normalized features have zero mean and unit variance:
>>> np.all(np.isclose(cmvn.data.mean(axis=0), np.zeros(cmvn.ndims), atol=1e-6))
True
>>> np.all(np.isclose(cmvn.data.var(axis=0), np.ones(cmvn.ndims), atol=1e-6))
True
This module also provides a high-level function for applying CMVN to a whole FeaturesCollection at once:
>>> from shennong.features import FeaturesCollection
>>> from shennong.features.postprocessor.cmvn import apply_cmvn
>>> feats = FeaturesCollection(utt1=mfcc)
>>> cmvns = apply_cmvn(feats)
As above, the features have zero mean and unit variance:
>>> cmvn = cmvns['utt1']
>>> np.all(np.isclose(cmvn.data.mean(axis=0), np.zeros(cmvn.ndims), atol=1e-6))
True
>>> np.all(np.isclose(cmvn.data.var(axis=0), np.ones(cmvn.ndims), atol=1e-6))
True
Apply sliding-window normalization to the features:
>>> from shennong.features.postprocessor.cmvn import SlidingWindowCmvnPostProcessor
>>> processor = SlidingWindowCmvnPostProcessor(normalize_variance=True)
>>> window_size = 40
>>> processor.cmn_window = window_size
>>> processor.min_window = window_size
>>> sliding_cmvn = processor.process(mfcc)
Each frame of the original features has been normalized with statistics computed in the window:
>>> frame = 70
>>> window = mfcc.data[frame - window_size // 2:frame + window_size // 2, :]
>>> norm_mfcc = (mfcc.data[frame, :] - window.mean(axis=0)) / window.std(axis=0)
>>> np.all(np.isclose(sliding_cmvn.data[frame, :], norm_mfcc, atol=1e-6))
True
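The window arithmetic checked above can be sketched in plain NumPy. This is only an illustration, not the shennong or Kaldi code: the function name `sliding_window_cmvn` is hypothetical, the window is assumed to be centered and shifted (not shrunk) at the signal borders, the variance is floored at an arbitrary small value to avoid dividing by zero, and `min_window` is left out.

```python
import numpy as np


def sliding_window_cmvn(frames, cmn_window=600, normalize_variance=False):
    """Normalize each frame with statistics of a centered window (sketch).

    `frames` is an array of shape (nframes, dim). Border handling is an
    assumption: the window is shifted so that it stays inside the signal.
    """
    nframes = frames.shape[0]
    out = np.empty_like(frames, dtype=float)
    for t in range(nframes):
        start = t - cmn_window // 2
        end = start + cmn_window
        if start < 0:  # window overflows on the left: shift it right
            end -= start
            start = 0
        if end > nframes:  # window overflows on the right: shift it left
            start = max(0, start - (end - nframes))
            end = nframes
        window = frames[start:end]
        out[t] = frames[t] - window.mean(axis=0)
        if normalize_variance:
            # floor the variance (assumed value) to avoid division by zero
            out[t] /= np.sqrt(np.maximum(window.var(axis=0), 1e-10))
    return out


frames = np.random.default_rng(0).normal(size=(200, 5))
out = sliding_window_cmvn(frames, cmn_window=40, normalize_variance=True)
```

For a frame far from the borders, such as frame 70 with a window of 40, the window spans frames 50 to 89, which matches the slice used in the example above.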
References

class shennong.features.postprocessor.cmvn.CmvnPostProcessor(dim, stats=None)
Bases: shennong.features.postprocessor.base.FeaturesPostProcessor
Computes CMVN statistics on speech features
Parameters
dim (int) – The features dimension, must be strictly positive.
stats (array, shape = [2, dim+1]) – Pre-accumulated CMVN statistics (see CmvnPostProcessor.stats).
Raises
ValueError – If dim is not a strictly positive integer.

property name
Name of the processor

property dim
The dimension of features on which to compute CMVN

property stats
The accumulated CMVN statistics
Array of shape [2, dim+1] with the following format:
stats[0, :] represents the sum of accumulated feature frames, used to estimate the accumulated mean.
stats[1, :] represents the sum of element-wise squares of accumulated feature frames, used to estimate the accumulated variance.
stats[0, -1] represents the weighted total count of accumulated feature frames.
stats[1, -1] is initialized to zero but otherwise is not used.
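To make the layout concrete, here is a minimal NumPy sketch that builds such a [2, dim+1] array and recovers the mean and variance from it. The helper names are hypothetical and the code is not taken from shennong; it assumes the first dim columns hold the sums and the last column holds the count, as in Kaldi's CMVN statistics.

```python
import numpy as np


def accumulate_stats(frames):
    """Return a [2, dim+1] statistics array for frames of shape (nframes, dim)."""
    nframes, dim = frames.shape
    stats = np.zeros((2, dim + 1))
    stats[0, :dim] = frames.sum(axis=0)         # sum of the frames
    stats[1, :dim] = (frames ** 2).sum(axis=0)  # sum of the squared frames
    stats[0, -1] = nframes                      # frame count (unweighted here)
    return stats                                # stats[1, -1] stays at zero


def mean_and_variance(stats):
    """Recover the per-dimension mean and variance from the statistics."""
    count = stats[0, -1]
    mean = stats[0, :-1] / count
    variance = stats[1, :-1] / count - mean ** 2
    return mean, variance


frames = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
stats = accumulate_stats(frames)
mean, variance = mean_and_variance(stats)
```

Because only sums and a count are stored, statistics from several utterances can be added together before the mean and variance are estimated.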

property count
The weighted total count of accumulated feature frames

property ndims
Dimension of the output feature frames

accumulate(features, weights=None)
Accumulates CMVN statistics
Computes the CMVN statistics for the given features and accumulates them for further processing.
Parameters
features (Features) – The input features on which to accumulate statistics.
weights (array, shape = [features.nframes, 1], optional) – Weights to apply to each frame of the features (possibly zero to ignore silences or non-speech frames). Accumulation is unweighted by default.
Raises
ValueError – If weights has more than one dimension or if the length of weights does not match the number of frames in features.
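The effect of the weights can be illustrated with a short NumPy sketch. It is written under assumptions: `weighted_mean_variance` is a hypothetical helper, not a shennong function, and it takes one weight per frame as a flat array. A zero weight removes a frame from the statistics entirely.

```python
import numpy as np


def weighted_mean_variance(frames, weights=None):
    """Weighted per-dimension mean and variance of frames (nframes, dim)."""
    nframes = frames.shape[0]
    if weights is None:
        weights = np.ones(nframes)  # unweighted accumulation
    weights = np.asarray(weights, dtype=float).ravel()
    if weights.shape[0] != nframes:
        raise ValueError('there must be exactly one weight per frame')
    count = weights.sum()  # weighted total count
    mean = (weights[:, None] * frames).sum(axis=0) / count
    variance = (weights[:, None] * frames ** 2).sum(axis=0) / count - mean ** 2
    return mean, variance


frames = np.array([[1.0, 2.0], [3.0, 4.0], [100.0, 200.0], [-50.0, 7.0]])
# the last two frames are treated as silence and ignored
mean, variance = weighted_mean_variance(frames, weights=[1, 1, 0, 0])
```

With these weights the result is the same as computing the statistics on the first two frames only.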

process(features, norm_vars=True, skip_dims=None, reverse=False)
Applies the accumulated CMVN statistics to the given features
Parameters
features (Features) – The input features on which to apply CMVN statistics.
norm_vars (bool, optional) – If False, do not apply variance normalization (only mean), default to True.
skip_dims (list of positive integers, optional) – Dimensions for which to skip normalization. Default is to not skip any dimension.
reverse (bool, optional) – Whether to apply CMVN in a reverse sense, so as to transform zero-mean, unit-variance features into features with the desired mean and variance.
Returns
cmvn_features (Features) – The normalized features
Raises
ValueError – If no stats have been accumulated
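The three options can be illustrated with a small NumPy sketch. It is not the shennong implementation: the function name `normalize` is hypothetical and, as a simplification, it takes a precomputed mean and variance where the real method uses the accumulated statistics.

```python
import numpy as np


def normalize(frames, mean, variance, norm_vars=True, skip_dims=None,
              reverse=False):
    """Apply (or undo) mean and variance normalization on frames (sketch)."""
    # without variance normalization, the scale is left untouched
    std = np.sqrt(variance) if norm_vars else np.ones_like(mean)
    if reverse:
        out = frames * std + mean    # zero-mean, unit-variance -> original scale
    else:
        out = (frames - mean) / std  # original scale -> zero-mean, unit-variance
    if skip_dims:
        # skipped dimensions are copied over unchanged
        out[:, skip_dims] = frames[:, skip_dims]
    return out


rng = np.random.default_rng(0)
x = rng.normal(5.0, 3.0, size=(200, 4))
mean, variance = x.mean(axis=0), x.var(axis=0)

normed = normalize(x, mean, variance)
mean_only = normalize(x, mean, variance, norm_vars=False)
skipped = normalize(x, mean, variance, skip_dims=[1])
restored = normalize(normed, mean, variance, reverse=True)
```

Here `restored` matches `x` up to floating-point error, which is what applying CMVN "in a reverse sense" means.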

get_params(deep=True)
Get parameters for this processor.
Parameters
deep (boolean, optional) – If True, will return the parameters for this processor and contained sub-objects that are processors. Default to True.
Returns
params (mapping of string to any) – Parameter names mapped to their values.

process_all(signals, njobs=None)
Returns features processed from several input signals
This function processes the features in parallel jobs.
Parameters
signals (dict of Audio) – A dictionary of input audio signals to process features on, where the keys are item names and values are audio signals.
njobs (int, optional) – The number of parallel jobs to run in background. Default to the number of CPU cores available on the machine.
Returns
features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input signals.
Raises
ValueError – If the njobs parameter is <= 0

set_params(**params)
Set the parameters of this processor.
Returns
self
Raises
ValueError – If any given parameter in params is invalid for the processor.

shennong.features.postprocessor.cmvn.apply_cmvn(feats_collection, by_collection=True, norm_vars=True, weights=None, skip_dims=None)
CMVN normalization of a collection of features
This function is a simple wrapper around the CmvnPostProcessor class that accumulates and applies CMVN statistics over a whole collection of features.
Warning
The features in the collection must have the same dimensionality. It is assumed they are all extracted from the same processor. If this is not the case, a ValueError is raised.
Parameters
feats_collection (FeaturesCollection) – The collection of features on which to apply CMVN normalization. All features in the collection are assumed to have consistent dimensions.
by_collection (bool, optional) – When True, accumulate and apply CMVN over the entire collection. When False, do it independently for each set of features in the collection. Default to True.
norm_vars (bool, optional) – If False, do not apply variance normalization (only mean), default to True.
weights (dict of arrays, optional) – For each set of features in the collection, an array of weights to apply to the feature frames. If specified, we must have weights.keys() == feats_collection.keys() (see CmvnPostProcessor.accumulate()). Unweighted by default.
skip_dims (list of integers) – The dimensions for which to skip the normalization (see CmvnPostProcessor.process()). Default is to normalize all dimensions.
Returns
cmvn_feats_collection (FeaturesCollection)
Raises
ValueError – If something goes wrong during CMVN processing.
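The difference between the two by_collection modes can be sketched with a plain dictionary of NumPy arrays standing in for a FeaturesCollection. The function `cmvn_collection` is a hypothetical illustration, not the library code, and it leaves out weights and skipped dimensions.

```python
import numpy as np


def cmvn_collection(collection, by_collection=True):
    """Normalize a dict of (nframes, dim) arrays (sketch)."""
    dims = {feats.shape[1] for feats in collection.values()}
    if len(dims) != 1:
        raise ValueError('all features must have the same dimension')
    if by_collection:
        # one set of statistics pooled over the whole collection
        pooled = np.concatenate(list(collection.values()), axis=0)
        mean, std = pooled.mean(axis=0), pooled.std(axis=0)
        return {k: (v - mean) / std for k, v in collection.items()}
    # independent statistics for each item of the collection
    return {k: (v - v.mean(axis=0)) / v.std(axis=0)
            for k, v in collection.items()}


rng = np.random.default_rng(0)
feats = {'utt1': rng.normal(0.0, 1.0, size=(100, 3)),
         'utt2': rng.normal(5.0, 2.0, size=(150, 3))}
pooled_cmvn = cmvn_collection(feats, by_collection=True)
separate_cmvn = cmvn_collection(feats, by_collection=False)
```

With by_collection=True only the concatenation of all outputs has zero mean and unit variance; with by_collection=False each item does on its own.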

class shennong.features.postprocessor.cmvn.SlidingWindowCmvnPostProcessor(center=True, cmn_window=600, min_window=100, max_warnings=5, normalize_variance=False)
Bases: shennong.features.postprocessor.base.FeaturesPostProcessor
Compute sliding-window normalization on speech features
Parameters
center (bool, optional) – Whether to center the window on the current frame, default to True
cmn_window (int, optional) – Window size for average CMN computation, default to 600
min_window (int, optional) – Minimum CMN window used at start of decoding, default to 100
max_warnings (int, optional) – Maximum number of warnings to report per utterance, default to 5
normalize_variance (bool, optional) – Whether to normalize variance to one, default to False

property name
Name of the processor

property ndims
Dimension of the output feature frames

property center
Whether to center the window on the current frame

get_params(deep=True)
Get parameters for this processor.
Parameters
deep (boolean, optional) – If True, will return the parameters for this processor and contained sub-objects that are processors. Default to True.
Returns
params (mapping of string to any) – Parameter names mapped to their values.

process_all(signals, njobs=None)
Returns features processed from several input signals
This function processes the features in parallel jobs.
Parameters
signals (dict of Audio) – A dictionary of input audio signals to process features on, where the keys are item names and values are audio signals.
njobs (int, optional) – The number of parallel jobs to run in background. Default to the number of CPU cores available on the machine.
Returns
features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input signals.
Raises
ValueError – If the njobs parameter is <= 0

set_params(**params)
Set the parameters of this processor.
Returns
self
Raises
ValueError – If any given parameter in params is invalid for the processor.

property cmn_window
Window size for average CMN computation

property min_window
Minimum CMN window used at start of decoding

property max_warnings
Maximum number of warnings to report per utterance

property normalize_variance
Whether to normalize variance to one