CMVN

Cepstral mean and variance normalization (CMVN) on speech features

The CmvnPostProcessor class accumulates CMVN statistics and applies CMVN to features using the accumulated statistics. It uses the Kaldi implementation (see [kaldi-cmvn]):

Features –> CmvnPostProcessor –> Features

The SlidingWindowCmvnPostProcessor class applies sliding-window CMVN. With that class, each window is normalized independently. It uses the Kaldi implementation:

Features –> SlidingWindowCmvnPostProcessor –> Features

Examples

Compute MFCC features:

>>> import numpy as np
>>> from shennong.audio import Audio
>>> from shennong.processor.mfcc import MfccProcessor
>>> from shennong.postprocessor.cmvn import CmvnPostProcessor
>>> audio = Audio.load('./test/data/test.wav')
>>> mfcc = MfccProcessor(sample_rate=audio.sample_rate).process(audio)

Accumulate CMVN statistics and normalize the features. In practice, statistics are usually accumulated over several sets of features, for example all the features belonging to one speaker, to obtain a per-speaker normalization:

>>> processor = CmvnPostProcessor(mfcc.ndims)
>>> processor.accumulate(mfcc)
>>> cmvn = processor.process(mfcc)

The normalized features have zero mean and unit variance:

>>> np.all(np.isclose(cmvn.data.mean(axis=0), np.zeros(cmvn.ndims), atol=1e-6))
True
>>> np.all(np.isclose(cmvn.data.var(axis=0), np.ones(cmvn.ndims), atol=1e-6))
True

This module also provides a high-level function for applying CMVN to a whole FeaturesCollection at once:

>>> from shennong import FeaturesCollection
>>> from shennong.postprocessor.cmvn import apply_cmvn
>>> feats = FeaturesCollection(utt1=mfcc)
>>> cmvns = apply_cmvn(feats)

As above, the features have zero mean and unit variance:

>>> cmvn = cmvns['utt1']
>>> np.all(np.isclose(cmvn.data.mean(axis=0), np.zeros(cmvn.ndims), atol=1e-6))
True
>>> np.all(np.isclose(cmvn.data.var(axis=0), np.ones(cmvn.ndims), atol=1e-6))
True

Apply sliding-window normalization to the features:

>>> from shennong.postprocessor.cmvn import SlidingWindowCmvnPostProcessor
>>> processor = SlidingWindowCmvnPostProcessor(normalize_variance=True)
>>> window_size = 40
>>> processor.cmn_window = window_size
>>> processor.min_window = window_size
>>> sliding_cmvn = processor.process(mfcc)

Each frame of the original features has been normalized with statistics computed in the window:

>>> frame = 70
>>> window = mfcc.data[frame-window_size//2:frame+window_size//2, :]
>>> norm_mfcc = (mfcc.data[frame,:] - window.mean(axis=0)) / window.std(axis=0)
>>> np.all(np.isclose(sliding_cmvn.data[frame, :], norm_mfcc, atol=1e-6))
True

References

kaldi-cmvn: https://kaldi-asr.org/doc/transform.html#transform_cmvn

class shennong.postprocessor.cmvn.CmvnPostProcessor(dim, stats=None)

Bases: shennong.postprocessor.base.FeaturesPostProcessor

Computes CMVN statistics on speech features

Parameters

dim (int) – The features dimension, must be strictly positive
stats (array, shape = [2, dim+1]) – Preaccumulated CMVN statistics (see CmvnPostProcessor.stats)

Raises

ValueError – If dim is not a strictly positive integer

property name: Name of the processor

property dim: The dimension of features on which to compute CMVN

property stats

The accumulated CMVN statistics

Array of shape [2, dim+1] with the following format:

stats[0, :-1] represents the sum of accumulated feature frames, used to estimate the accumulated mean.
stats[1, :-1] represents the sum of element-wise squares of accumulated feature frames, used to estimate the accumulated variance.
stats[0, -1] represents the weighted total count of accumulated feature frames.
stats[1, -1] is initialized to zero and is otherwise unused.
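
As a rough illustration of this layout, the NumPy sketch below builds a statistics array by hand from a random matrix (a stand-in for real features) and recovers the per-dimension mean and variance from it. The helpers `build_stats` and `mean_var_from_stats` are hypothetical names introduced here, not part of shennong, and the sketch assumes the Kaldi convention in which the sums occupy the first `dim` columns and the count sits in the last column of the first row.

```python
import numpy as np


def build_stats(frames):
    """Build a [2, dim+1] statistics array from a [nframes, dim] matrix.

    Hypothetical helper, for illustration only.
    """
    nframes, dim = frames.shape
    stats = np.zeros((2, dim + 1))
    stats[0, :dim] = frames.sum(axis=0)         # sum of the frames
    stats[1, :dim] = (frames ** 2).sum(axis=0)  # sum of the squared frames
    stats[0, dim] = nframes                     # frame count (unweighted)
    return stats


def mean_var_from_stats(stats):
    """Recover the per-dimension mean and variance from the statistics."""
    count = stats[0, -1]
    mean = stats[0, :-1] / count
    var = stats[1, :-1] / count - mean ** 2
    return mean, var


rng = np.random.default_rng(0)
frames = rng.normal(loc=3.0, scale=2.0, size=(500, 13))
stats = build_stats(frames)
mean, var = mean_var_from_stats(stats)
```

Because only sums and a count are stored, statistics from several utterances can be pooled by adding their arrays element-wise.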

property count: The weighted total count of accumulated feature frames

property ndims: Dimension of the output feature frames

get_properties(features): Return the processor's properties as a dictionary

accumulate(features, weights=None)

Accumulates CMVN statistics

Computes the CMVN statistics for the given features and accumulates them for further processing.

Parameters

features (Features) – The input features on which to accumulate statistics.
weights (array, shape = [features.nframes, 1], optional) – Weights to apply to each frame of the features (possibly zero to ignore silences or non-speech frames). Accumulation is non-weighted by default.

Raises

ValueError – If weights have more than one dimension or if the length of weights does not match the number of feature frames.
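
To make the effect of the weights concrete, here is a minimal NumPy sketch of a weighted accumulation, assuming each frame contributes its weight to the count and its weighted value (and weighted square) to the sums. `accumulate_weighted` is a hypothetical helper, not the shennong or Kaldi code, and the "silence" frames are simply chosen by hand.

```python
import numpy as np


def accumulate_weighted(stats, frames, weights=None):
    """Add weighted first- and second-order sums of `frames` to `stats`.

    Hypothetical helper using a [2, dim+1] statistics layout.
    """
    nframes, dim = frames.shape
    if weights is None:
        weights = np.ones(nframes)
    weights = np.asarray(weights, dtype=float).reshape(-1)
    if weights.shape[0] != nframes:
        raise ValueError('there must be exactly one weight per frame')
    stats[0, :dim] += weights @ frames         # weighted sum
    stats[1, :dim] += weights @ (frames ** 2)  # weighted sum of squares
    stats[0, dim] += weights.sum()             # weighted frame count
    return stats


rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 13))

# Pretend the first 20 frames are silence and give them a zero weight
weights = np.ones(100)
weights[:20] = 0.0

stats = accumulate_weighted(np.zeros((2, 14)), frames, weights)
mean = stats[0, :-1] / stats[0, -1]
```

With zero weights on the first 20 frames, the resulting mean is the mean of the remaining 80 frames only.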

process(features, norm_vars=True, skip_dims=None, reverse=False)

Applies the accumulated CMVN statistics to the given features

Parameters

features (Features) – The input features on which to apply CMVN statistics.
norm_vars (bool, optional) – If False, do not apply variance normalization (only mean), default to True.
skip_dims (list of positive integers, optional) – Dimensions for which to skip normalization. Default is to not skip any dimension.
reverse (bool, optional) – Whether to apply CMVN in reverse, so as to transform zero-mean, unit-variance features into features with the desired mean and variance, default to False.

Returns

cmvn_features (Features) – The normalized features

Raises

ValueError – If no stats have been accumulated
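
The interplay of norm_vars, skip_dims and reverse can be sketched with plain NumPy. The function below is a hypothetical illustration that takes a precomputed mean and variance, not the actual implementation: the forward direction subtracts the mean and divides by the standard deviation, the reverse direction undoes that, and skipped dimensions pass through unchanged.

```python
import numpy as np


def apply_cmvn_sketch(frames, mean, var, norm_vars=True, skip_dims=None,
                      reverse=False):
    """Normalize a [nframes, dim] matrix from a per-dimension mean and variance."""
    dim = frames.shape[1]
    std = np.sqrt(var) if norm_vars else np.ones(dim)
    if reverse:
        scale, offset = std, mean
    else:
        scale, offset = 1.0 / std, -mean / std
    # Work on copies so the caller's arrays are never modified
    scale = np.array(scale, dtype=float)
    offset = np.array(offset, dtype=float)
    if skip_dims:
        scale[skip_dims] = 1.0   # skipped dimensions are left untouched
        offset[skip_dims] = 0.0
    return frames * scale + offset


rng = np.random.default_rng(0)
frames = rng.normal(loc=5.0, scale=3.0, size=(200, 4))
mean, var = frames.mean(axis=0), frames.var(axis=0)

normalized = apply_cmvn_sketch(frames, mean, var, skip_dims=[0])
restored = apply_cmvn_sketch(
    normalized, mean, var, skip_dims=[0], reverse=True)
```

Applying the reverse transform with the same statistics recovers the original frames, which is a convenient sanity check.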

get_params(deep=True)

Get parameters for this processor.

Parameters: deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Default to True.
Returns: params (mapping of string to any) – Parameter names mapped to their values.

property log: Processor logger

process_all(utterances, njobs=None, **kwargs)

Returns features processed from several input utterances

This function processes the features in parallel jobs.

Parameters

utterances (Utterances) – The utterances on which to process features.
njobs (int, optional) – The number of parallel jobs to run in background. Default to the number of CPU cores available on the machine.
**kwargs (dict, optional) – Extra arguments to be forwarded to the process method. Keys must be the same as for utterances.

Returns

features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input utterances.

Raises

ValueError – If the njobs parameter is <= 0 or if an entry is missing in optional kwargs.

set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')

Change level and/or format of the processor’s logger

Parameters

level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.
formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default display level and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display time, level, name and message.
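
The format string follows the conventions of the standard `logging` module, so its effect can be previewed without any processor. The sketch below uses `logging` directly, with a hypothetical logger name and hypothetical messages; it assumes the processor's logger behaves like a regular `logging.Logger`, where records below the configured level are dropped.

```python
import io
import logging

# Capture the log output in memory instead of printing it
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter('%(levelname)s - %(name)s - %(message)s'))

logger = logging.getLogger('cmvn-example')  # hypothetical logger name
logger.propagate = False
logger.addHandler(handler)
logger.setLevel(logging.WARNING)

logger.info('statistics accumulated')        # below WARNING: dropped
logger.warning('no statistics accumulated')  # at WARNING: emitted

output = stream.getvalue()
# output == 'WARNING - cmvn-example - no statistics accumulated\n'
```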

set_params(**params)

Set the parameters of this processor.

Returns: self
Raises: ValueError – If any given parameter in params is invalid for the processor.

shennong.postprocessor.cmvn.apply_cmvn(feats_collection, by_collection=True, norm_vars=True, weights=None, skip_dims=None)

CMVN normalization of a collection of features

This function is a simple wrapper around the CmvnPostProcessor class that accumulates and applies CMVN statistics over a whole collection of features.

Warning

The features in the collection must have the same dimensionality. It is assumed they are all extracted from the same processor. If this is not the case, a ValueError is raised.

Parameters

feats_collection (FeaturesCollection) – The collection of features on which to apply CMVN normalization. All features in the collection are assumed to have consistent dimensions.
by_collection (bool, optional) – When True, accumulate and apply CMVN over the entire collection. When False, do it independently for each features in the collection. Default to True.
norm_vars (bool, optional) – If False, do not apply variance normalization (only mean), default to True.
weights (dict of arrays, optional) – For each features in the collection, an array of weights to apply to the feature frames (see CmvnPostProcessor.accumulate()). If specified, we must have weights.keys() == feats_collection.keys(). Unweighted by default.
skip_dims (list of integers) – The dimensions for which to skip the normalization (see CmvnPostProcessor.process()). Default is to normalize all dimensions.

Returns

cmvn_feats_collection (FeaturesCollection)

Raises

ValueError – If something goes wrong during CMVN processing.
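
The difference between the two by_collection modes can be illustrated with plain NumPy arrays standing in for the features of a collection. This is only a sketch of the idea under that assumption: the dictionary, the utterance names and the `normalize` helper are hypothetical, and no shennong object is involved.

```python
import numpy as np


def normalize(frames, mean, std):
    """Mean and variance normalization from the given statistics."""
    return (frames - mean) / std


rng = np.random.default_rng(0)
collection = {
    'utt1': rng.normal(loc=0.0, scale=1.0, size=(300, 13)),
    'utt2': rng.normal(loc=4.0, scale=2.0, size=(200, 13)),
}

# by_collection=True: one set of statistics pooled over every utterance
pooled = np.concatenate(list(collection.values()), axis=0)
by_collection = {
    name: normalize(frames, pooled.mean(axis=0), pooled.std(axis=0))
    for name, frames in collection.items()}

# by_collection=False: statistics computed separately for each utterance
by_utterance = {
    name: normalize(frames, frames.mean(axis=0), frames.std(axis=0))
    for name, frames in collection.items()}
```

In the pooled case only the collection as a whole has zero mean and unit variance, while individual utterances keep their relative offsets. In the per-utterance case every utterance is normalized on its own.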

class shennong.postprocessor.cmvn.SlidingWindowCmvnPostProcessor(center=True, cmn_window=600, min_window=100, max_warnings=5, normalize_variance=False)

Bases: shennong.postprocessor.base.FeaturesPostProcessor

Compute sliding-window normalization on speech features

Parameters

center (bool, optional) – Whether to center the window on the current frame, default to True
cmn_window (int, optional) – Window size for average CMN computation, default to 600
min_window (int, optional) – Minimum CMN window used at start of decoding, default to 100
max_warnings (int, optional) – Maximum number of warnings to report per utterance, default to 5
normalize_variance (bool, optional) – Whether to normalize variance to one, default to False
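
For intuition, the centered case can be approximated in a few lines of NumPy. The sketch below is a simplification and not the Kaldi algorithm: it ignores min_window and max_warnings, handles only center=True, shifts windows that would overflow the utterance back inside it, and uses an arbitrary floor on the standard deviation. The function name is hypothetical.

```python
import numpy as np


def sliding_window_cmvn(frames, cmn_window=600, normalize_variance=False):
    """Centered sliding-window normalization of a [nframes, dim] matrix."""
    nframes = frames.shape[0]
    output = np.empty_like(frames, dtype=float)
    for t in range(nframes):
        start = t - cmn_window // 2
        end = start + cmn_window
        if start < 0:      # window overflows on the left: shift it right
            end -= start
            start = 0
        if end > nframes:  # window overflows on the right: shift it left
            start = max(0, start - (end - nframes))
            end = nframes
        window = frames[start:end]
        output[t] = frames[t] - window.mean(axis=0)
        if normalize_variance:
            # Arbitrary floor to avoid dividing by zero on constant windows
            output[t] /= np.maximum(window.std(axis=0), 1e-5)
    return output


rng = np.random.default_rng(0)
frames = rng.normal(loc=2.0, size=(200, 13))
normalized = sliding_window_cmvn(
    frames, cmn_window=40, normalize_variance=True)
```

Away from the utterance boundaries, each output frame depends only on the `cmn_window` frames around it, so slow drifts in the features are removed without using statistics from the whole utterance.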

property name: Name of the processor

property ndims: Dimension of the output features frames

property center: Whether to center the window on the current frame

get_params(deep=True)

Get parameters for this processor.

Parameters: deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Default to True.
Returns: params (mapping of string to any) – Parameter names mapped to their values.

property log: Processor logger

process_all(utterances, njobs=None, **kwargs)

Returns features processed from several input utterances

This function processes the features in parallel jobs.

Parameters

utterances (Utterances) – The utterances on which to process features.
njobs (int, optional) – The number of parallel jobs to run in background. Default to the number of CPU cores available on the machine.
**kwargs (dict, optional) – Extra arguments to be forwarded to the process method. Keys must be the same as for utterances.

Returns

features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input utterances.

Raises

ValueError – If the njobs parameter is <= 0 or if an entry is missing in optional kwargs.

set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')

Change level and/or format of the processor’s logger

Parameters

level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.
formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default display level and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display time, level, name and message.

set_params(**params)

Set the parameters of this processor.

Returns: self
Raises: ValueError – If any given parameter in params is invalid for the processor.

property cmn_window: Window size for average CMN computation

property min_window: Minimum CMN window used at start of decoding

property max_warnings: Maximum number of warnings to report per utterance

property normalize_variance: Whether to normalize variance to one

get_properties(features): Return the processor's properties as a dictionary

process(features)

Applies sliding-window cepstral mean and/or variance normalization on features with the specified options

Parameters: features (Features) – The input features.
Returns: slid_window_cmvn_feats (Features) – The normalized features.