CMVN
Cepstral mean and variance normalization (CMVN) on speech features
- The CmvnPostProcessor class accumulates CMVN statistics and applies CMVN to features using the accumulated statistics. Uses the Kaldi implementation (see [kaldi-cmvn]): Features --> CmvnPostProcessor --> Features
- The SlidingWindowCmvnPostProcessor class applies sliding-window CMVN. With this class, each window is normalized independently. Uses the Kaldi implementation: Features --> SlidingWindowCmvnPostProcessor --> Features
Examples
Compute MFCC features:
>>> import numpy as np
>>> from shennong.audio import Audio
>>> from shennong.processor.mfcc import MfccProcessor
>>> from shennong.postprocessor.cmvn import CmvnPostProcessor
>>> audio = Audio.load('./test/data/test.wav')
>>> mfcc = MfccProcessor(sample_rate=audio.sample_rate).process(audio)
Accumulate CMVN statistics and normalize the features (in practice you usually accumulate statistics over several features, for example all features belonging to one speaker, to obtain a per-speaker normalization):
>>> processor = CmvnPostProcessor(mfcc.ndims)
>>> processor.accumulate(mfcc)
>>> cmvn = processor.process(mfcc)
The normalized features have zero mean and unit variance:
>>> np.all(np.isclose(cmvn.data.mean(axis=0), np.zeros(cmvn.ndims), atol=1e-6))
True
>>> np.all(np.isclose(cmvn.data.var(axis=0), np.ones(cmvn.ndims), atol=1e-6))
True
This module also provides a high-level method for applying CMVN to a
whole FeaturesCollection at once:
>>> from shennong import FeaturesCollection
>>> from shennong.postprocessor.cmvn import apply_cmvn
>>> feats = FeaturesCollection(utt1=mfcc)
>>> cmvns = apply_cmvn(feats)
As above, the features have zero mean and unit variance:
>>> cmvn = cmvns['utt1']
>>> np.all(np.isclose(cmvn.data.mean(axis=0), np.zeros(cmvn.ndims), atol=1e-6))
True
>>> np.all(np.isclose(cmvn.data.var(axis=0), np.ones(cmvn.ndims), atol=1e-6))
True
Apply sliding-window normalization to the features:
>>> from shennong.postprocessor.cmvn import SlidingWindowCmvnPostProcessor
>>> processor = SlidingWindowCmvnPostProcessor(normalize_variance=True)
>>> window_size = 40
>>> processor.cmn_window = window_size
>>> processor.min_window = window_size
>>> sliding_cmvn = processor.process(mfcc)
Each frame of the original features has been normalized with statistics computed over the window centered on it:
>>> frame = 70
>>> window = mfcc.data[frame-window_size//2:frame+window_size//2, :]
>>> norm_mfcc = (mfcc.data[frame,:] - window.mean(axis=0)) / window.std(axis=0)
>>> np.all(np.isclose(sliding_cmvn.data[frame, :], norm_mfcc, atol=1e-6))
True
References
- 
class shennong.postprocessor.cmvn.CmvnPostProcessor(dim, stats=None)
- Bases: shennong.postprocessor.base.FeaturesPostProcessor
- Computes CMVN statistics on speech features
- Parameters
- dim (int) – The features dimension, must be strictly positive
- stats (array, shape = [2, dim+1]) – Preaccumulated CMVN statistics (see CmvnPostProcessor.stats)

- Raises
- ValueError – If dim is not a strictly positive integer
 - 
property name
- Name of the processor
 -
property dim
- The dimension of features on which to compute CMVN 
 - 
property stats
- The accumulated CMVN statistics
- Array of shape [2, dim+1] with the following format:
- stats[0, :] represents the sum of accumulated feature frames, used to estimate the accumulated mean.
- stats[1, :] represents the sum of element-wise squares of accumulated feature frames, used to estimate the accumulated variance.
- stats[0, -1] represents the weighted total count of accumulated feature frames.
- stats[1, -1] is initialized to zero but otherwise is not used.
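The layout above can be illustrated with a minimal sketch in plain Python. The helpers `accumulate_stats` and `mean_and_variance` are hypothetical names introduced for this illustration; they are not part of shennong, and they only mirror the documented layout under the assumption that every frame has unit weight.

```python
def accumulate_stats(frames):
    """Return CMVN statistics laid out as a [2, dim+1] nested list.

    `frames` is a list of equal-length feature vectors. Hypothetical
    helper, not part of shennong: it only mirrors the documented layout.
    """
    dim = len(frames[0])
    stats = [[0.0] * (dim + 1) for _ in range(2)]
    for frame in frames:
        for d, value in enumerate(frame):
            stats[0][d] += value          # sum of the frames
            stats[1][d] += value * value  # sum of the squared frames
        stats[0][-1] += 1.0               # frame count (unit weight assumed)
    return stats


def mean_and_variance(stats):
    """Derive the per-dimension mean and variance from the statistics."""
    count = stats[0][-1]
    mean = [s / count for s in stats[0][:-1]]
    var = [sq / count - m * m for sq, m in zip(stats[1][:-1], mean)]
    return mean, var


frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
stats = accumulate_stats(frames)
mean, var = mean_and_variance(stats)
```

Here the last column of the first row holds the count (3 frames), and the last column of the second row stays at zero, as described above.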
 
 - 
property count
- The weighted total count of accumulated feature frames
 -
property ndims
- Dimension of the output feature frames
 - 
accumulate(features, weights=None)
- Accumulates CMVN statistics
- Computes the CMVN statistics for the given features and accumulates them for further processing.
- Parameters
- features (Features) – The input features on which to accumulate statistics.
- weights (array, shape = [features.nframes, 1], optional) – Weights to apply to each frame of the features (possibly zero to ignore silences or non-speech frames). Accumulation is unweighted by default.

- Raises
- ValueError – If weights has more than one dimension or if the length of weights does not match the number of frames in features.
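How per-frame weights enter the statistics can be sketched in plain Python as follows. The function `accumulate_weighted` is a hypothetical stand-in written for this illustration, assuming each frame contributes to the sums and to the count in proportion to its weight; it is not the shennong or Kaldi code.

```python
def accumulate_weighted(frames, weights=None):
    """Accumulate [2, dim+1] CMVN statistics with optional frame weights.

    Hypothetical helper for illustration only. A weight of zero removes
    a frame from the statistics entirely.
    """
    if weights is None:
        weights = [1.0] * len(frames)  # unweighted by default
    if len(weights) != len(frames):
        raise ValueError('there must be exactly one weight per frame')
    dim = len(frames[0])
    stats = [[0.0] * (dim + 1) for _ in range(2)]
    for frame, weight in zip(frames, weights):
        for d, value in enumerate(frame):
            stats[0][d] += weight * value
            stats[1][d] += weight * value * value
        stats[0][-1] += weight  # weighted frame count
    return stats


# The middle frame stands for a non-speech frame and is ignored.
frames = [[1.0, 2.0], [100.0, 200.0], [3.0, 4.0]]
weighted_stats = accumulate_weighted(frames, weights=[1.0, 0.0, 1.0])
```

With these weights the count is 2 rather than 3, so the outlier frame influences neither the mean nor the variance.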
 
 - 
process(features, norm_vars=True, skip_dims=None, reverse=False)
- Applies the accumulated CMVN statistics to the given features
- Parameters
- features (Features) – The input features on which to apply CMVN statistics.
- norm_vars (bool, optional) – If False, do not apply variance normalization (only mean). Defaults to True.
- skip_dims (list of positive integers, optional) – Dimensions for which to skip normalization. Default is to not skip any dimension.
- reverse (bool, optional) – Whether to apply CMVN in reverse, so as to transform zero-mean, unit-variance features into features with the desired mean and variance.

- Returns
- cmvn_features (Features) – The normalized features
- Raises
- ValueError – If no stats have been accumulated
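The effect of norm_vars, skip_dims and reverse can be sketched in plain Python. The function `normalize` below is hypothetical and takes the mean and variance explicitly instead of reading accumulated statistics; it also omits details a real implementation may include, such as flooring very small variances.

```python
import math


def normalize(frames, mean, var, norm_vars=True, skip_dims=None,
              reverse=False):
    """Apply (or undo) mean and variance normalization frame by frame.

    Hypothetical helper for illustration only, not the shennong API.
    """
    skipped = set(skip_dims or [])
    output = []
    for frame in frames:
        row = []
        for d, value in enumerate(frame):
            if d in skipped:
                row.append(value)  # dimension left untouched
                continue
            std = math.sqrt(var[d]) if norm_vars else 1.0
            if reverse:
                # Map normalized values back to the given mean and variance.
                row.append(value * std + mean[d])
            else:
                row.append((value - mean[d]) / std)
        output.append(row)
    return output


frames = [[0.0, 15.0], [4.0, 25.0]]
mean, var = [2.0, 20.0], [4.0, 25.0]

normalized = normalize(frames, mean, var)
mean_only = normalize(frames, mean, var, norm_vars=False)
partial = normalize(frames, mean, var, skip_dims=[1])
restored = normalize(normalized, mean, var, reverse=True)
```

Applying the reverse transform to the normalized frames recovers the input, which is the round trip the reverse option is meant for.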
 
 - 
get_params(deep=True)
- Get parameters for this processor.
- Parameters
- deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True.
- Returns
- params (mapping of string to any) – Parameter names mapped to their values.
 
 - 
property log
- Processor logger 
 - 
process_all(utterances, njobs=None, **kwargs)
- Returns features processed from several input utterances
- This function processes the features in parallel jobs.
- Parameters
- utterances (Utterances) – The utterances from which to compute features.
- njobs (int, optional) – The number of parallel jobs to run in background. Defaults to the number of CPU cores available on the machine.
- **kwargs (dict, optional) – Extra arguments to be forwarded to the process method. Keys must be the same as for utterances.

- Returns
- features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input utterances.
- Raises
- ValueError – If the njobs parameter is <= 0 or if an entry is missing in optional kwargs.
 
 - 
set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')
- Change level and/or format of the processor’s logger
- Parameters
- level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.
- formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default, displays level, name and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display time, level, name and message.
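To see what the default format string produces, here is a small sketch using only Python's standard logging module. It assumes the processor's logger behaves like a regular `logging.Logger`; the logger name 'cmvn-sketch' is made up for the example. With the standard module, a message is emitted only when its level is at least the logger's level.

```python
import io
import logging

# Capture the log output in memory so that it can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter('%(levelname)s - %(name)s - %(message)s'))

logger = logging.getLogger('cmvn-sketch')  # hypothetical logger name
logger.propagate = False
logger.addHandler(handler)
logger.setLevel(logging.WARNING)

logger.info('this message is below the level and is dropped')
logger.warning('statistics look suspicious')

output = stream.getvalue()
```

After running this, `output` contains a single line made of the level, the logger name and the message, separated by " - ".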
 
 
 - 
set_params(**params)
- Set the parameters of this processor.
- Returns
- self
- Raises
- ValueError – If any given parameter in params is invalid for the processor.
 
 
- 
shennong.postprocessor.cmvn.apply_cmvn(feats_collection, by_collection=True, norm_vars=True, weights=None, skip_dims=None)
- CMVN normalization of a collection of features
- This function is a simple wrapper around the CmvnPostProcessor class that accumulates and applies CMVN statistics over a whole collection of features.
- Warning: The features in the collection must have the same dimensionality. It is assumed they are all extracted from the same processor. If this is not the case, a ValueError is raised.
- Parameters
- feats_collection (FeaturesCollection) – The collection of features on which to apply CMVN normalization. All features in the collection are assumed to have consistent dimensions.
- by_collection (bool, optional) – When True, accumulate and apply CMVN over the entire collection. When False, do it independently for each features in the collection. Defaults to True.
- norm_vars (bool, optional) – If False, do not apply variance normalization (only mean). Defaults to True.
- weights (dict of arrays, optional) – For each features in the collection, an array of weights to apply to the features frames. If specified, we must have weights.keys() == feats_collection.keys() (see CmvnPostProcessor.accumulate()). Unweighted by default.
- skip_dims (list of integers) – The dimensions for which to skip the normalization (see CmvnPostProcessor.process()). Default is to normalize all dimensions.

- Returns
- cmvn_feats_collection (FeaturesCollection)
- Raises
- ValueError – If something goes wrong during CMVN processing.
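The difference between the two by_collection modes can be sketched in plain Python, with a dict of frame lists standing in for a FeaturesCollection. All names below (`cmvn_collection` and its private helpers) are hypothetical and only illustrate the idea of pooled versus per-item statistics.

```python
import math


def _mean_std(frames):
    """Per-dimension mean and (population) standard deviation."""
    count = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / count for d in range(dim)]
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / count)
           for d in range(dim)]
    return mean, std


def _normalize(frames, mean, std):
    return [[(v - m) / s for v, m, s in zip(f, mean, std)] for f in frames]


def cmvn_collection(collection, by_collection=True):
    """Normalize a dict of frame lists, pooled or item by item.

    Hypothetical sketch, not the shennong implementation.
    """
    if by_collection:
        # One set of statistics shared by every item of the collection.
        pooled = [f for frames in collection.values() for f in frames]
        mean, std = _mean_std(pooled)
        return {k: _normalize(v, mean, std) for k, v in collection.items()}
    # Independent statistics for each item.
    return {k: _normalize(v, *_mean_std(v)) for k, v in collection.items()}


collection = {'utt1': [[0.0], [2.0]], 'utt2': [[4.0], [6.0]]}
pooled = cmvn_collection(collection, by_collection=True)
separate = cmvn_collection(collection, by_collection=False)
```

With pooled statistics only the collection as a whole has zero mean, so 'utt1' keeps its offset relative to 'utt2'. With independent statistics each item has zero mean on its own.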
 
- 
class shennong.postprocessor.cmvn.SlidingWindowCmvnPostProcessor(center=True, cmn_window=600, min_window=100, max_warnings=5, normalize_variance=False)
- Bases: shennong.postprocessor.base.FeaturesPostProcessor
- Compute sliding-window normalization on speech features
- Parameters
- center (bool, optional) – Whether to center the window on the current frame. Defaults to True.
- cmn_window (int, optional) – Window size, in frames, for average CMN computation. Defaults to 600.
- min_window (int, optional) – Minimum CMN window used at start of decoding. Defaults to 100.
- max_warnings (int, optional) – Maximum number of warnings to report per utterance. Defaults to 5.
- normalize_variance (bool, optional) – Whether to normalize variance to one. Defaults to False.
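How center, cmn_window and min_window could determine the frames used to normalize a given frame is sketched below in plain Python. The function `window_bounds` is hypothetical and modeled on Kaldi's sliding-window CMN as the author of this sketch understands it; treat the edge handling as an assumption rather than a statement of what shennong does.

```python
def window_bounds(t, num_frames, center=True, cmn_window=600,
                  min_window=100):
    """Return (start, end) of the window used to normalize frame t.

    `end` is exclusive. Hypothetical helper for illustration only.
    """
    if center:
        start = t - cmn_window // 2
        end = start + cmn_window
    else:
        start = t - cmn_window
        end = t + 1
    if start < 0:
        # Shift the window right so that it starts at the first frame.
        end -= start
        start = 0
    if not center and end > t:
        # Causal case: use only past frames, except at the very start
        # where at least min_window frames are used.
        end = max(t + 1, min_window)
    if end > num_frames:
        # Shift the window left so that it ends at the last frame.
        start = max(0, start - (end - num_frames))
        end = num_frames
    return start, end


# Same setting as the sliding-window example: frame 70, 40-frame window.
middle = window_bounds(70, 200, cmn_window=40, min_window=40)  # (50, 90)
first = window_bounds(5, 200, cmn_window=40)
last = window_bounds(195, 200, cmn_window=40)
causal = window_bounds(5, 200, center=False, cmn_window=40, min_window=10)
```

For a frame far from the edges, a centered window covers cmn_window frames around it, which is the slice `frame - window_size // 2` to `frame + window_size // 2` used in the example. Near the edges the window is shifted to stay inside the utterance instead of being shrunk.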
 
 - 
property name
- Name of the processor
 -
property ndims
- Dimension of the output feature frames
 -
property center
- Whether to center the window on the current frame 
 - 
get_params(deep=True)
- Get parameters for this processor.
- Parameters
- deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True.
- Returns
- params (mapping of string to any) – Parameter names mapped to their values.
 
 - 
property log
- Processor logger 
 - 
process_all(utterances, njobs=None, **kwargs)
- Returns features processed from several input utterances
- This function processes the features in parallel jobs.
- Parameters
- utterances (Utterances) – The utterances from which to compute features.
- njobs (int, optional) – The number of parallel jobs to run in background. Defaults to the number of CPU cores available on the machine.
- **kwargs (dict, optional) – Extra arguments to be forwarded to the process method. Keys must be the same as for utterances.

- Returns
- features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input utterances.
- Raises
- ValueError – If the njobs parameter is <= 0 or if an entry is missing in optional kwargs.
 
 - 
set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')
- Change level and/or format of the processor’s logger
- Parameters
- level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.
- formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default, displays level, name and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display time, level, name and message.
 
 
 - 
set_params(**params)
- Set the parameters of this processor.
- Returns
- self
- Raises
- ValueError – If any given parameter in params is invalid for the processor.
 
 - 
property cmn_window
- Window size for average CMN computation
 -
property min_window
- Minimum CMN window used at start of decoding
 -
property max_warnings
- Maximum number of warnings to report per utterance
 -
property normalize_variance
- Whether to normalize variance to one