CMVN
Cepstral mean variance normalization (CMVN) on speech features
The CmvnPostProcessor class is used for accumulating CMVN statistics and applying CMVN to features using the accumulated statistics. It uses the Kaldi implementation (see [kaldi-cmvn]):
Features –> CmvnPostProcessor –> Features
The SlidingWindowCmvnPostProcessor class is used to apply sliding-window CMVN. With that class, each window is normalized independently. It uses the Kaldi implementation:
Features –> SlidingWindowCmvnPostProcessor –> Features
Examples
Compute MFCC features:
>>> import numpy as np
>>> from shennong.audio import Audio
>>> from shennong.processor.mfcc import MfccProcessor
>>> from shennong.postprocessor.cmvn import CmvnPostProcessor
>>> audio = Audio.load('./test/data/test.wav')
>>> mfcc = MfccProcessor(sample_rate=audio.sample_rate).process(audio)
Accumulate CMVN statistics and normalize the features (in practice you would accumulate statistics over several features, for example over all features belonging to one speaker, to obtain a per-speaker normalization):
>>> processor = CmvnPostProcessor(mfcc.ndims)
>>> processor.accumulate(mfcc)
>>> cmvn = processor.process(mfcc)
The normalized features have zero mean and unit variance:
>>> np.all(np.isclose(cmvn.data.mean(axis=0), np.zeros(cmvn.ndims), atol=1e-6))
True
>>> np.all(np.isclose(cmvn.data.var(axis=0), np.ones(cmvn.ndims), atol=1e-6))
True
This module also provides a high-level function for applying CMVN to a whole FeaturesCollection at once:
>>> from shennong import FeaturesCollection
>>> from shennong.postprocessor.cmvn import apply_cmvn
>>> feats = FeaturesCollection(utt1=mfcc)
>>> cmvns = apply_cmvn(feats)
As above, the features have zero mean and unit variance:
>>> cmvn = cmvns['utt1']
>>> np.all(np.isclose(cmvn.data.mean(axis=0), np.zeros(cmvn.ndims), atol=1e-6))
True
>>> np.all(np.isclose(cmvn.data.var(axis=0), np.ones(cmvn.ndims), atol=1e-6))
True
Apply sliding-window normalization to the features:
>>> from shennong.postprocessor.cmvn import SlidingWindowCmvnPostProcessor
>>> processor = SlidingWindowCmvnPostProcessor(normalize_variance=True)
>>> window_size = 40
>>> processor.cmn_window = window_size
>>> processor.min_window = window_size
>>> sliding_cmvn = processor.process(mfcc)
Each frame of the original features has been normalized with statistics computed in the window:
>>> frame = 70
>>> window = mfcc.data[frame-window_size//2:frame+window_size//2, :]
>>> norm_mfcc = (mfcc.data[frame,:] - window.mean(axis=0)) / window.std(axis=0)
>>> np.all(np.isclose(sliding_cmvn.data[frame, :], norm_mfcc, atol=1e-6))
True
References
-
class
shennong.postprocessor.cmvn.
CmvnPostProcessor
(dim, stats=None)[source]¶ Bases:
shennong.postprocessor.base.FeaturesPostProcessor
Computes CMVN statistics on speech features
- Parameters
dim (int) – The features dimension, must be strictly positive
stats (array, shape = [2, dim+1]) – Preaccumulated CMVN statistics (see CmvnPostProcessor.stats)
- Raises
ValueError – If
dim
is not a strictly positive integer
-
property
name
¶ Name of the processor
-
property
dim
¶ The dimension of features on which to compute CMVN
-
property
stats
¶ The accumulated CMVN statistics
Array of shape [2, dim+1] with the following format:
- stats[0, :] represents the sum of accumulated feature frames, used to estimate the accumulated mean.
- stats[1, :] represents the sum of element-wise squares of accumulated feature frames, used to estimate the accumulated variance.
- stats[0, -1] represents the weighted total count of accumulated feature frames.
- stats[1, -1] is initialized to zero but otherwise is not used.
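The statistics layout described above can be illustrated with plain NumPy. This is only a sketch of the documented [2, dim+1] format, not the Kaldi or shennong code; the helper names `accumulate_stats` and `mean_and_variance` are hypothetical.

```python
import numpy as np


def accumulate_stats(frames, stats=None):
    """Add frames of shape [nframes, dim] to a [2, dim+1] statistics array.

    Hypothetical helper mimicking the documented layout (unweighted case).
    """
    nframes, dim = frames.shape
    if stats is None:
        stats = np.zeros((2, dim + 1))
    stats[0, :dim] += frames.sum(axis=0)         # per-dimension sum
    stats[1, :dim] += (frames ** 2).sum(axis=0)  # per-dimension sum of squares
    stats[0, -1] += nframes                      # total frame count
    return stats                                 # stats[1, -1] stays at zero


def mean_and_variance(stats):
    """Recover the per-dimension mean and variance from the statistics."""
    count = stats[0, -1]
    mean = stats[0, :-1] / count
    variance = stats[1, :-1] / count - mean ** 2
    return mean, variance


# Statistics accumulated over two chunks equal those of the pooled frames.
rng = np.random.default_rng(0)
frames_a = rng.normal(size=(50, 13))
frames_b = rng.normal(loc=2.0, size=(30, 13))
stats = accumulate_stats(frames_b, accumulate_stats(frames_a))
mean, variance = mean_and_variance(stats)
```

Because only sums are stored, statistics from several utterances (for example all utterances of one speaker) can be added together before normalizing.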
-
property
count
¶ The weighted total count of accumulated features frames
-
property
ndims
¶ Dimension of the output features frames
-
accumulate
(features, weights=None)[source]¶ Accumulates CMVN statistics
Computes the CMVN statistics for the given features and accumulates them for further processing.
- Parameters
features (Features) – The input features on which to accumulate statistics.
weights (array, shape = [features.nframes, 1], optional) – Weights to apply to each frame of the features (possibly zero to ignore silences or non-speech frames). Accumulation is unweighted by default.
- Raises
ValueError – If weights has more than one dimension or if the length of weights does not match the number of frames in features.
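As a rough illustration of weighted accumulation, the NumPy sketch below gives each frame a weight and shows that zero weights remove frames from the statistics. The function `accumulate_weighted` is a hypothetical stand-in, and accepting a flat weights array is an assumption of this sketch, not a statement about the library's behavior.

```python
import numpy as np


def accumulate_weighted(frames, weights=None):
    """Return [2, dim+1] statistics with one weight per frame (hypothetical)."""
    nframes, dim = frames.shape
    if weights is None:
        weights = np.ones(nframes)  # unweighted accumulation by default
    weights = np.asarray(weights, dtype=float).reshape(-1)
    if weights.shape[0] != nframes:
        raise ValueError('there must be exactly one weight per frame')

    stats = np.zeros((2, dim + 1))
    stats[0, :dim] = weights @ frames         # weighted sum
    stats[1, :dim] = weights @ (frames ** 2)  # weighted sum of squares
    stats[0, -1] = weights.sum()              # weighted frame count
    return stats


rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 13))

# Pretend the first 10 frames are silence and ignore them with zero weights.
weights = np.ones(100)
weights[:10] = 0.0
weighted_stats = accumulate_weighted(frames, weights)
speech_only_stats = accumulate_weighted(frames[10:])
```

With binary weights the result matches accumulating on the retained frames only; fractional weights (for example speech probabilities) scale each frame's contribution instead.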
-
process
(features, norm_vars=True, skip_dims=None, reverse=False)[source]¶ Applies the accumulated CMVN statistics to the given
features
- Parameters
features (Features) – The input features on which to apply CMVN statistics.
norm_vars (bool, optional) – If False, do not apply variance normalization (only mean). Defaults to True.
skip_dims (list of positive integers, optional) – Dimensions for which to skip normalization. Default is to not skip any dimension.
reverse (bool, optional) – Whether to apply CMVN in a reverse sense, so as to transform zero-mean, unit-variance features into features with the desired mean and variance.
- Returns
cmvn_features (Features) – The normalized features
- Raises
ValueError – If no stats have been accumulated
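The effect of the norm_vars, skip_dims and reverse options can be sketched in NumPy as follows. This is an illustration of the idea under simple assumptions (mean and variance already estimated, no variance flooring), and `apply_stats` is a hypothetical helper rather than the library's code.

```python
import numpy as np


def apply_stats(frames, mean, variance,
                norm_vars=True, skip_dims=None, reverse=False):
    """Apply (or undo) mean/variance normalization on [nframes, dim] frames."""
    # Without variance normalization only the mean is removed (or restored).
    scale = np.sqrt(variance) if norm_vars else np.ones_like(mean)
    if reverse:
        out = frames * scale + mean    # zero-mean/unit-variance -> original
    else:
        out = (frames - mean) / scale  # original -> zero-mean/unit-variance
    if skip_dims:
        out[:, skip_dims] = frames[:, skip_dims]  # leave these dims untouched
    return out


rng = np.random.default_rng(0)
frames = rng.normal(loc=3.0, scale=2.0, size=(200, 13))
mean, variance = frames.mean(axis=0), frames.var(axis=0)

normalized = apply_stats(frames, mean, variance, skip_dims=[0])
restored = apply_stats(normalized, mean, variance, skip_dims=[0], reverse=True)
```

Applying the reverse transform with the same statistics and the same skipped dimensions recovers the original frames.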
-
get_params
(deep=True)¶ Get parameters for this processor.
- Parameters
deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Default to True.
- Returns
params (mapping of string to any) – Parameter names mapped to their values.
-
property
log
¶ Processor logger
-
process_all
(utterances, njobs=None, **kwargs)¶ Returns features processed from several input utterances
This function processes the features in parallel jobs.
- Parameters
utterances (Utterances) – The utterances on which to process features.
njobs (int, optional) – The number of parallel jobs to run in background. Default to the number of CPU cores available on the machine.
**kwargs (dict, optional) – Extra arguments to be forwarded to the process method. Keys must be the same as for utterances.
- Returns
features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input utterances.
- Raises
ValueError – If the njobs parameter is <= 0 or if an entry is missing in optional kwargs.
-
set_logger
(level, formatter='%(levelname)s - %(name)s - %(message)s')¶ Change level and/or format of the processor’s logger
- Parameters
level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.
formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default display level and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display time, level, name and message.
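To show what the level and formatter strings mean, here is a small sketch using only Python's standard logging module. It does not call set_logger itself; the logger name 'cmvn-example' is made up for the illustration.

```python
import io
import logging

# Capture log output in memory so the formatted text can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter('%(levelname)s - %(name)s - %(message)s'))

logger = logging.getLogger('cmvn-example')  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False
logger.setLevel(logging.WARNING)

logger.info('dropped: below the warning level')
logger.warning('kept')

output = stream.getvalue()  # 'WARNING - cmvn-example - kept\n'
```

With the standard logging module, records below the logger's level are dropped, so only the warning reaches the handler.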
-
set_params
(**params)¶ Set the parameters of this processor.
- Returns
self
- Raises
ValueError – If any given parameter in
params
is invalid for the processor.
-
shennong.postprocessor.cmvn.
apply_cmvn
(feats_collection, by_collection=True, norm_vars=True, weights=None, skip_dims=None)[source]¶ CMVN normalization of a collection of features
This function is a simple wrapper around the CmvnPostProcessor class that accumulates and applies CMVN statistics over a whole collection of features.
Warning
The features in the collection must have the same dimensionality. It is assumed they are all extracted from the same processor. If this is not the case, a ValueError is raised.
- Parameters
feats_collection (FeaturesCollection) – The collection of features on which to apply CMVN normalization. All features in the collection are assumed to have consistent dimensions.
by_collection (bool, optional) – When True, accumulate and apply CMVN over the entire collection. When False, do it independently for each features in the collection. Defaults to True.
norm_vars (bool, optional) – If False, do not apply variance normalization (only mean). Defaults to True.
weights (dict of arrays, optional) – For each features in the collection, an array of weights to apply to the features frames. If specified, we must have weights.keys() == feats_collection.keys() (see CmvnPostProcessor.accumulate()). Unweighted by default.
skip_dims (list of integers) – The dimensions for which to skip the normalization (see CmvnPostProcessor.process()). Default is to normalize all dimensions.
- Returns
cmvn_feats_collection (FeaturesCollection)
- Raises
ValueError – If something goes wrong during CMVN processing.
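The difference between by_collection=True and by_collection=False can be sketched with a dictionary of NumPy arrays standing in for a FeaturesCollection. The helper `cmvn_collection` is hypothetical and ignores weights and skip_dims.

```python
import numpy as np


def normalize(frames, reference):
    """Normalize frames with the mean and std estimated on reference."""
    return (frames - reference.mean(axis=0)) / reference.std(axis=0)


def cmvn_collection(collection, by_collection=True):
    """Normalize a dict of [nframes, dim] arrays (hypothetical helper)."""
    if by_collection:
        # One set of statistics pooled over every item of the collection.
        pooled = np.concatenate(list(collection.values()))
        return {k: normalize(v, pooled) for k, v in collection.items()}
    # Independent statistics for each item.
    return {k: normalize(v, v) for k, v in collection.items()}


rng = np.random.default_rng(0)
collection = {
    'utt1': rng.normal(loc=0.0, size=(80, 13)),
    'utt2': rng.normal(loc=5.0, size=(120, 13)),
}
pooled_norm = cmvn_collection(collection, by_collection=True)
per_utt_norm = cmvn_collection(collection, by_collection=False)
```

With pooled statistics only the concatenation of all items has zero mean and unit variance; with per-item statistics each item does.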
-
class
shennong.postprocessor.cmvn.
SlidingWindowCmvnPostProcessor
(center=True, cmn_window=600, min_window=100, max_warnings=5, normalize_variance=False)[source]¶ Bases:
shennong.postprocessor.base.FeaturesPostProcessor
Compute sliding-window normalization on speech features
- Parameters
center (bool, optional) – Whether to center the window on the current frame, default to True
cmn_window (int, optional) – Window size for average CMN computation, default to 600
min_window (int, optional) – Minimum CMN window used at start of decoding, default to 100
max_warnings (int, optional) – Maximum number of warnings to report per utterance, default to 5
normalize_variance (bool, optional) – Whether to normalize variance to one, default to False
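A simplified NumPy sketch of centered sliding-window normalization is given below. It is not the Kaldi algorithm: the border handling (shifting the window back inside the utterance) is an assumption of this sketch, min_window is not modeled, and `sliding_window_cmvn` is a hypothetical name.

```python
import numpy as np


def sliding_window_cmvn(frames, cmn_window=600, normalize_variance=False):
    """Normalize each frame with statistics from a window centered on it."""
    frames = np.asarray(frames, dtype=float)
    nframes = frames.shape[0]
    size = min(cmn_window, nframes)  # the window cannot exceed the utterance
    out = np.empty_like(frames)
    for t in range(nframes):
        # Center the window on frame t, then shift it to stay in bounds.
        start = t - cmn_window // 2
        start = min(max(start, 0), nframes - size)
        window = frames[start:start + size]
        out[t] = frames[t] - window.mean(axis=0)
        if normalize_variance:
            out[t] /= window.std(axis=0)
    return out


rng = np.random.default_rng(0)
mfcc_like = rng.normal(loc=1.0, scale=3.0, size=(200, 13))
normalized = sliding_window_cmvn(
    mfcc_like, cmn_window=40, normalize_variance=True)
```

Away from the utterance borders, frame t is normalized with the frames in [t - cmn_window // 2, t + cmn_window // 2), which is the same window used in the example at the top of this page.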
-
property
name
¶ Name of the processor
-
property
ndims
¶ Dimension of the output features frames
-
property
center
¶ Whether to center the window on the current frame
-
get_params
(deep=True)¶ Get parameters for this processor.
- Parameters
deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Default to True.
- Returns
params (mapping of string to any) – Parameter names mapped to their values.
-
property
log
¶ Processor logger
-
process_all
(utterances, njobs=None, **kwargs)¶ Returns features processed from several input utterances
This function processes the features in parallel jobs.
- Parameters
utterances (Utterances) – The utterances on which to process features.
njobs (int, optional) – The number of parallel jobs to run in background. Default to the number of CPU cores available on the machine.
**kwargs (dict, optional) – Extra arguments to be forwarded to the process method. Keys must be the same as for utterances.
- Returns
features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input utterances.
- Raises
ValueError – If the njobs parameter is <= 0 or if an entry is missing in optional kwargs.
-
set_logger
(level, formatter='%(levelname)s - %(name)s - %(message)s')¶ Change level and/or format of the processor’s logger
- Parameters
level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.
formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default display level and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display time, level, name and message.
-
set_params
(**params)¶ Set the parameters of this processor.
- Returns
self
- Raises
ValueError – If any given parameter in
params
is invalid for the processor.
-
property
cmn_window
¶ Window size for average CMN computation
-
property
min_window
¶ Minimum CMN window used at start of decoding
-
property
max_warnings
¶ Maximum number of warnings to report per utterance
-
property
normalize_variance
¶ Whether to normalize variance to one