CMVN

Cepstral mean and variance normalization (CMVN) of speech features

  • The CmvnPostProcessor class accumulates CMVN statistics and applies CMVN to features using the accumulated statistics. It uses the Kaldi implementation (see [kaldi-cmvn]):

    Features --> CmvnPostProcessor --> Features

  • The SlidingWindowCmvnPostProcessor class applies sliding-window CMVN. With that class, each window is normalized independently. It uses the Kaldi implementation:

    Features --> SlidingWindowCmvnPostProcessor --> Features
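
As a rough illustration of what CMVN does, independent of shennong and Kaldi, the following NumPy sketch normalizes each feature dimension to zero mean and unit variance. The function name cmvn_sketch and the random input data are assumptions made for illustration only.

```python
import numpy as np


def cmvn_sketch(frames, norm_vars=True):
    """Normalize each column of `frames` (shape [nframes, ndims])."""
    mean = frames.mean(axis=0)
    out = frames - mean
    if norm_vars:
        # Population standard deviation, computed per dimension
        out = out / frames.std(axis=0)
    return out


rng = np.random.default_rng(0)
# Stand-in for real MFCCs: 200 frames of 13 coefficients
frames = rng.normal(loc=3.0, scale=2.0, size=(200, 13))
normalized = cmvn_sketch(frames)
```

With norm_vars=False only the mean is removed, which corresponds to plain cepstral mean normalization.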

Examples

Compute MFCC features:

>>> import numpy as np
>>> from shennong.audio import Audio
>>> from shennong.features.processor.mfcc import MfccProcessor
>>> from shennong.features.postprocessor.cmvn import CmvnPostProcessor
>>> audio = Audio.load('./test/data/test.wav')
>>> mfcc = MfccProcessor(sample_rate=audio.sample_rate).process(audio)

Accumulate CMVN statistics and normalize the features. In practice, statistics are usually accumulated over several features, for example all features belonging to one speaker, to obtain a per-speaker normalization:

>>> processor = CmvnPostProcessor(mfcc.ndims)
>>> processor.accumulate(mfcc)
>>> cmvn = processor.process(mfcc)

The normalized features have zero mean and unit variance:

>>> np.all(np.isclose(cmvn.data.mean(axis=0), np.zeros(cmvn.ndims), atol=1e-6))
True
>>> np.all(np.isclose(cmvn.data.var(axis=0), np.ones(cmvn.ndims), atol=1e-6))
True
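
The per-speaker case mentioned above can be sketched in plain NumPy, without shennong: sums, squared sums and frame counts are accumulated over all utterances of a speaker, and every utterance is then normalized with the shared statistics. The utterance arrays below are random stand-ins, and the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: two utterances of one speaker, each [nframes, ndims]
utterances = [
    rng.normal(loc=1.0, scale=2.0, size=(120, 13)),
    rng.normal(loc=1.5, scale=2.5, size=(80, 13)),
]

# Accumulate first- and second-order sums over all the speaker's frames
count = sum(len(u) for u in utterances)
total = sum(u.sum(axis=0) for u in utterances)
total_sq = sum((u ** 2).sum(axis=0) for u in utterances)

mean = total / count
var = total_sq / count - mean ** 2

# Normalize every utterance with the shared speaker statistics
normalized = [(u - mean) / np.sqrt(var) for u in utterances]
pooled = np.concatenate(normalized)
```

Here the pooled frames have zero mean and unit variance, while an individual utterance generally does not. With CmvnPostProcessor, the equivalent would be one accumulate() call per utterance before calling process().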

This module also provides a high-level method for applying CMVN to a whole FeaturesCollection at once:

>>> from shennong.features import FeaturesCollection
>>> from shennong.features.postprocessor.cmvn import apply_cmvn
>>> feats = FeaturesCollection(utt1=mfcc)
>>> cmvns = apply_cmvn(feats)

As above, the features have zero mean and unit variance:

>>> cmvn = cmvns['utt1']
>>> np.all(np.isclose(cmvn.data.mean(axis=0), np.zeros(cmvn.ndims), atol=1e-6))
True
>>> np.all(np.isclose(cmvn.data.var(axis=0), np.ones(cmvn.ndims), atol=1e-6))
True

Apply sliding-window normalization to the features:

>>> from shennong.features.postprocessor.cmvn import SlidingWindowCmvnPostProcessor
>>> processor = SlidingWindowCmvnPostProcessor(normalize_variance=True)
>>> window_size = 40
>>> processor.cmn_window = window_size
>>> processor.min_window = window_size
>>> sliding_cmvn = processor.process(mfcc)

Each frame of the original features has been normalized with the mean and standard deviation computed over the window centered on it:

>>> frame = 70
>>> window = mfcc.data[frame-window_size//2:frame+window_size//2, :]
>>> norm_mfcc = (mfcc.data[frame,:] - window.mean(axis=0)) / window.std(axis=0)
>>> np.all(np.isclose(sliding_cmvn.data[frame, :], norm_mfcc, atol=1e-6))
True

References

kaldi-cmvn

https://kaldi-asr.org/doc/transform.html#transform_cmvn

class shennong.features.postprocessor.cmvn.CmvnPostProcessor(dim, stats=None)

Bases: shennong.features.postprocessor.base.FeaturesPostProcessor

Computes CMVN statistics on speech features

Parameters
  • dim (int) – The features dimension, must be strictly positive

  • stats (array, shape = [2, dim+1]) – Preaccumulated CMVN statistics (see CmvnPostProcessor.stats)

Raises

ValueError – If dim is not a strictly positive integer

property name

Name of the processor

property dim

The dimension of features on which to compute CMVN

property stats

The accumulated CMVN statistics

Array of shape [2, dim+1] with the following format:

  • stats[0, :-1] represents the sum of accumulated feature frames, used to estimate the accumulated mean.

  • stats[1, :-1] represents the sum of element-wise squares of accumulated feature frames, used to estimate the accumulated variance.

  • stats[0, -1] represents the weighted total count of accumulated feature frames.

  • stats[1, -1] is initialized to zero but otherwise is not used.
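
The layout described above can be reproduced with a few lines of NumPy. This is a sketch of the format only, built from random stand-in frames. It does not call shennong, and the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 4))  # hypothetical features, dim = 4
dim = frames.shape[1]

# Build statistics in the [2, dim+1] layout
stats = np.zeros((2, dim + 1))
stats[0, :dim] = frames.sum(axis=0)         # sum of frames
stats[1, :dim] = (frames ** 2).sum(axis=0)  # sum of squared frames
stats[0, dim] = frames.shape[0]             # frame count (unweighted here)

# Recover the mean and variance from the statistics
count = stats[0, -1]
mean = stats[0, :-1] / count
var = stats[1, :-1] / count - mean ** 2
```

Storing sums rather than means makes accumulation additive: statistics from several features can be combined by adding the arrays.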

property count

The weighted total count of accumulated feature frames

property ndims

Dimension of the output feature frames

get_properties(features)

Return the processor's properties as a dictionary

accumulate(features, weights=None)

Accumulates CMVN statistics

Computes the CMVN statistics for the given features and accumulates them for further processing.

Parameters
  • features (Features) – The input features on which to accumulate statistics.

  • weights (array, shape = [features.nframes, 1], optional) – Weights to apply to each frame of the features (possibly zero to ignore silences or non-speech frames). Accumulation is non-weighted by default.

Raises

ValueError – If weights have more than one dimension or if the length of weights does not match the number of frames in features.
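
A minimal NumPy sketch of weighted accumulation follows, assuming each frame contributes its weight to the count and weight-scaled values to the sums. This is an illustration of the idea with made-up weights, not a call into shennong.

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.normal(size=(6, 3))
# Hypothetical weights: frames 0 and 5 are treated as silence and ignored
weights = np.array([0.0, 1.0, 1.0, 0.5, 1.0, 0.0])

# Weighted count and weighted first- and second-order sums
count = weights.sum()
weighted_sum = (weights[:, None] * frames).sum(axis=0)
weighted_sq = (weights[:, None] * frames ** 2).sum(axis=0)

mean = weighted_sum / count
var = weighted_sq / count - mean ** 2
```

Frames with a zero weight have no influence on the resulting mean and variance, which is how silences or non-speech frames can be excluded.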

process(features, norm_vars=True, skip_dims=None, reverse=False)

Applies the accumulated CMVN statistics to the given features

Parameters
  • features (Features) – The input features on which to apply CMVN statistics.

  • norm_vars (bool, optional) – If False, do not apply variance normalization (only mean), default to True.

  • skip_dims (list of positive integers, optional) – Dimensions for which to skip normalization. Default is to not skip any dimension.

  • reverse (bool, optional) – Whether to apply CMVN in reverse, so as to transform zero-mean, unit-variance features into features with the desired mean and variance. Default to False.

Returns

cmvn_features (Features) – The normalized features

Raises

ValueError – If no stats have been accumulated
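
The effect of the norm_vars, skip_dims and reverse options can be sketched with NumPy as follows. The helper apply_cmvn_sketch is hypothetical and only mirrors the parameter names documented above. It takes a precomputed mean and variance instead of accumulated statistics, and it is not the Kaldi or shennong implementation.

```python
import numpy as np


def apply_cmvn_sketch(frames, mean, var, norm_vars=True,
                      skip_dims=None, reverse=False):
    """Illustrative only: apply or undo CMVN given a mean and variance."""
    scale = np.sqrt(var) if norm_vars else np.ones_like(mean)
    if reverse:
        # Map zero-mean, unit-variance data back to the given statistics
        out = frames * scale + mean
    else:
        out = (frames - mean) / scale
    # Leave the skipped dimensions untouched
    for d in skip_dims or []:
        out[:, d] = frames[:, d]
    return out


rng = np.random.default_rng(0)
frames = rng.normal(loc=2.0, scale=3.0, size=(100, 5))
mean, var = frames.mean(axis=0), frames.var(axis=0)

normalized = apply_cmvn_sketch(frames, mean, var, skip_dims=[0])
restored = apply_cmvn_sketch(
    normalized, mean, var, skip_dims=[0], reverse=True)
```

In this sketch, applying the forward and then the reverse transform with the same statistics recovers the original frames, and dimension 0 is never modified.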

get_params(deep=True)

Get parameters for this processor.

Parameters

deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Default to True.

Returns

params (mapping of string to any) – Parameter names mapped to their values.

process_all(signals, njobs=None)

Returns features processed from several input signals

This function processes the features in parallel jobs.

Parameters
  • signals (dict of Audio) – A dictionary of input audio signals to process features on, where the keys are item names and values are audio signals.

  • njobs (int, optional) – The number of parallel jobs to run in background. Default to the number of CPU cores available on the machine.

Returns

features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input signals.

Raises

ValueError – If the njobs parameter is <= 0
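
The behavior described for process_all can be illustrated with the standard library. This sketch uses a thread pool and a hypothetical helper process_all_sketch. How shennong actually schedules its parallel jobs is not stated here, so the choice of concurrent.futures is an assumption.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def process_all_sketch(process, signals, njobs=None):
    """Apply `process` to every value of `signals` using `njobs` workers."""
    if njobs is None:
        # Default to the number of CPU cores available
        njobs = os.cpu_count() or 1
    if njobs <= 0:
        raise ValueError(f'njobs must be strictly positive, it is {njobs}')
    with ThreadPoolExecutor(max_workers=njobs) as pool:
        results = pool.map(process, signals.values())
        # Output keys are the input keys, in the same order
        return dict(zip(signals.keys(), results))


# Hypothetical "signals" and processing function, for illustration only
signals = {'utt1': [1.0, 2.0, 3.0], 'utt2': [4.0, 5.0]}
lengths = process_all_sketch(len, signals, njobs=2)
```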

set_params(**params)

Set the parameters of this processor.

Returns

self

Raises

ValueError – If any given parameter in params is invalid for the processor.

shennong.features.postprocessor.cmvn.apply_cmvn(feats_collection, by_collection=True, norm_vars=True, weights=None, skip_dims=None)

CMVN normalization of a collection of features

This function is a simple wrapper around the CmvnPostProcessor class that accumulates and applies CMVN statistics over a whole collection of features.

Warning

The features in the collection must have the same dimensionality. It is assumed they are all extracted from the same processor. If this is not the case, a ValueError is raised.

Parameters
  • feats_collection (FeaturesCollection) – The collection of features on which to apply CMVN normalization. All features in the collection are assumed to have consistent dimensions.

  • by_collection (bool, optional) – When True, accumulate and apply CMVN over the entire collection. When False, do it independently for each features object in the collection. Default to True.

  • norm_vars (bool, optional) – If False, do not apply variance normalization (only mean), default to True.

  • weights (dict of arrays, optional) – For each features object in the collection, an array of weights to apply to its frames. If specified, we must have weights.keys() == feats_collection.keys() (see CmvnPostProcessor.accumulate()). Unweighted by default.

  • skip_dims (list of integers) – The dimensions for which to skip the normalization (see CmvnPostProcessor.process()). Default is to normalize all dimensions.

Returns

cmvn_feats_collection (FeaturesCollection)

Raises

ValueError – If something goes wrong during CMVN processing.
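
The difference between by_collection=True and by_collection=False can be sketched with a plain dictionary of NumPy arrays standing in for a FeaturesCollection. The data and the helper normalize are assumptions for illustration, and the sketch does not call apply_cmvn.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical collection: utterance name -> array of shape [nframes, ndims]
collection = {
    'utt1': rng.normal(loc=0.0, scale=1.0, size=(100, 4)),
    'utt2': rng.normal(loc=5.0, scale=2.0, size=(150, 4)),
}


def normalize(frames, reference):
    """Normalize `frames` with the mean and std of `reference`."""
    return (frames - reference.mean(axis=0)) / reference.std(axis=0)


# by_collection=True: one set of statistics shared by all utterances
pooled = np.concatenate(list(collection.values()))
shared = {k: normalize(v, pooled) for k, v in collection.items()}

# by_collection=False: statistics computed separately for each utterance
independent = {k: normalize(v, v) for k, v in collection.items()}
```

With shared statistics only the pooled frames are zero-mean, so differences between utterances are preserved. With independent statistics every utterance is zero-mean on its own.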

class shennong.features.postprocessor.cmvn.SlidingWindowCmvnPostProcessor(center=True, cmn_window=600, min_window=100, max_warnings=5, normalize_variance=False)

Bases: shennong.features.postprocessor.base.FeaturesPostProcessor

Compute sliding-window normalization on speech features

Parameters
  • center (bool, optional) – Whether to center the window on the current frame, default to True

  • cmn_window (int, optional) – Window size for average CMN computation, default to 600

  • min_window (int, optional) – Minimum CMN window used at start of decoding, default to 100

  • max_warnings (int, optional) – Maximum number of warnings to report per utterance, default to 5

  • normalize_variance (bool, optional) – Whether to normalize variance to one, default to False

property name

Name of the processor

property ndims

Dimension of the output feature frames

property center

Whether to center the window on the current frame

get_params(deep=True)

Get parameters for this processor.

Parameters

deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Default to True.

Returns

params (mapping of string to any) – Parameter names mapped to their values.

process_all(signals, njobs=None)

Returns features processed from several input signals

This function processes the features in parallel jobs.

Parameters
  • signals (dict of Audio) – A dictionary of input audio signals to process features on, where the keys are item names and values are audio signals.

  • njobs (int, optional) – The number of parallel jobs to run in background. Default to the number of CPU cores available on the machine.

Returns

features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input signals.

Raises

ValueError – If the njobs parameter is <= 0

set_params(**params)

Set the parameters of this processor.

Returns

self

Raises

ValueError – If any given parameter in params is invalid for the processor.
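
The get_params and set_params contract described above (a mapping of names to values, set_params returning self, ValueError on an unknown name) can be sketched with a small stand-alone class. ProcessorSketch is hypothetical and borrows two parameter names from this class for illustration only.

```python
class ProcessorSketch:
    """Hypothetical processor exposing its parameters as a mapping."""

    def __init__(self, cmn_window=600, normalize_variance=False):
        self.cmn_window = cmn_window
        self.normalize_variance = normalize_variance

    def get_params(self, deep=True):
        # No nested processors here, so `deep` has no effect in this sketch
        return {'cmn_window': self.cmn_window,
                'normalize_variance': self.normalize_variance}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f'invalid parameter {name}')
            setattr(self, name, value)
        return self  # returning self allows chained calls


processor = ProcessorSketch().set_params(cmn_window=300)
params = processor.get_params()
```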

property cmn_window

Window size for average CMN computation

property min_window

Minimum CMN window used at start of decoding

property max_warnings

Maximum number of warnings to report per utterance

property normalize_variance

Whether to normalize variance to one

get_properties(features)

Return the processor's properties as a dictionary

process(features)

Applies sliding-window cepstral mean and/or variance normalization on features with the specified options

Parameters

features (Features) – The input features.

Returns

slid_window_cmvn_feats (Features) – The normalized features.
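
The following NumPy sketch shows one way the sliding window could be positioned around each frame, reusing the parameter names documented for this class. The window placement at the edges of the signal is an assumption modeled on a reading of Kaldi's sliding-window CMN and has not been checked against shennong. The small floor on the standard deviation is a safeguard added for this sketch.

```python
import numpy as np


def sliding_cmvn_sketch(frames, cmn_window=600, min_window=100,
                        center=True, normalize_variance=False):
    """Illustrative sliding-window CMVN over `frames` [nframes, ndims]."""
    nframes = frames.shape[0]
    out = np.empty_like(frames, dtype=float)
    for t in range(nframes):
        if center:
            start = t - cmn_window // 2
            end = start + cmn_window
        else:
            start = t - cmn_window
            end = t + 1
        if start < 0:
            # Shift the window right so it starts at the first frame
            end -= start
            start = 0
        if not center and end > t:
            # Causal mode: use at least `min_window` frames at the start
            end = max(t + 1, min_window)
        if end > nframes:
            # Shift the window left so it ends at the last frame
            start = max(0, start - (end - nframes))
            end = nframes
        window = frames[start:end]
        out[t] = frames[t] - window.mean(axis=0)
        if normalize_variance:
            # Floor the deviation to avoid dividing by zero
            out[t] /= np.maximum(window.std(axis=0), 1e-10)
    return out


rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 13))  # stand-in for real MFCCs
window_size = 40
result = sliding_cmvn_sketch(
    frames, cmn_window=window_size, min_window=window_size,
    normalize_variance=True)

# With a centered window of 40 frames, frame 70 uses frames 50 to 89
frame = 70
window = frames[frame - window_size // 2:frame + window_size // 2]
expected = (frames[frame] - window.mean(axis=0)) / window.std(axis=0)
```

Near the start and end of the signal the window cannot be centered, so in this sketch it is shifted to stay inside the available frames while keeping its length.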