UBM

Provides the DiagUbmProcessor class to train a Universal Background Model - Gaussian Mixture Model (UBM-GMM) with diagonal covariances.
Uses the kaldi implementation of GMM (see [kaldi-gmm]).

The UBM is used as a preprocessing step by VtlnProcessor.

Examples

>>> from shennong import Utterances
>>> from shennong.processor.ubm import DiagUbmProcessor
>>> wav = './test/data/test.wav'
>>> utterances = Utterances(
...     [('utt1', wav, 'spk1', 0, 1), ('utt2', wav, 'spk1', 1, 1.4)])

Initialize the UBM-GMM with a given number of Gaussians. Other options can be set at construction or afterwards:

>>> num_gauss = 4
>>> ubm = DiagUbmProcessor(num_gauss, num_iters_init=10)
>>> ubm.num_iters = 3

Process the utterances to update the model.

>>> ubm.process(utterances)

Each gaussian of the model has as many dimensions as the features.

>>> import kaldi.gmm
>>> isinstance(ubm.gmm, kaldi.gmm.DiagGmm)
True
>>> means = ubm.gmm.get_means()
>>> means.num_rows == num_gauss
True
>>> means.num_cols
39

References

kaldi-gmm: https://kaldi-asr.org/doc/model.html

class shennong.processor.ubm.DiagUbmProcessor(num_gauss, num_iters=4, num_gselect=15, initial_gauss_proportion=0.5, num_iters_init=20, num_frames=500000, subsample=5, min_gaussian_weight=0.0001, remove_low_count_gaussians=False, seed=0, features=None, vad=None)

Bases: shennong.base.BaseProcessor

Universal Background Model with Diagonal GMM

property name: Processor name

property num_gauss: Number of Gaussians in the model

property num_iters: Number of iterations of training.

property num_iters_init: Number of E-M iterations for model initialization.

property num_gselect: Number of Gaussians per frame to limit computation to, for speed.

get_params(deep=True)

Get parameters for this processor.

Parameters: deep (boolean, optional) – If True, return the parameters of this processor and of the contained subobjects that are processors. Defaults to True.
Returns: params (mapping of string to any) – Parameter names mapped to their values.

property initial_gauss_proportion: Proportion of Gaussians to start with in initialization phase (then split)

property log: Processor logger

set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')

Change level and/or format of the processor’s logger

Parameters

level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.
formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default display level and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display time, level, name and message.
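
The effect of the level and formatter arguments can be illustrated with Python's standard logging module alone. This is a minimal sketch, not the processor's own logger setup: the logger name 'example-processor' and the in-memory stream are assumptions made for the illustration.

```python
import io
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger('example-processor')
logger.setLevel(logging.INFO)  # minimum level: anything below INFO is ignored
logger.propagate = False       # keep the example output isolated

# Write the log records to an in-memory stream so they can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter('%(levelname)s - %(name)s - %(message)s'))
logger.addHandler(handler)

logger.debug('not displayed')  # below the minimum level, dropped
logger.warning('displayed')    # at or above the minimum level, kept

output = stream.getvalue()
```

With this format string each record is rendered as level, logger name and message, separated by dashes.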

set_params(**params)

Set the parameters of this processor.

Returns: self
Raises: ValueError – If any given parameter in params is invalid for the processor.

property num_frames: Maximum number of frames to keep in memory for model initialization.

property subsample: In main E-M phase, use every n-th frame (a speedup)

property min_gaussian_weight: Minimum weight below which a Gaussian is not updated

property remove_low_count_gaussians: Remove Gaussians with a weight below min_gaussian_weight

property features: Features extraction configuration

property vad: VAD configuration for the UBM-GMM

property seed: Random seed for initialization from random frames

classmethod load(path): Load the GMM from a binary file

save(path): Save the GMM to a binary file

initialize_gmm(feats_collection, njobs=1)

Initializes a single diagonal GMM

Also does multiple iterations of initial training. Adapted from [kaldi-init].

Parameters

feats_collection (FeaturesCollection) – The collection of features to initialize the GMM with.
njobs (int, optional) – Number of threads to use for computation, default to 1.

Raises

ValueError – If the features have inconsistent dimensions.

References

kaldi-init: https://kaldi-asr.org/doc/gmm-global-init-from-feats_8cc.html

gaussian_selection(feats_collection)

Precompute Gaussian indices for pruning. For each frame, gives a list of the n best Gaussian indices sorted from best to worst.

Adapted from [kaldi-gselect].

Parameters: feats_collection (FeaturesCollection) – The collection of features to select the best Gaussians from.
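
The idea behind Gaussian selection can be sketched in pure Python: score every Gaussian of a diagonal GMM on each frame, then keep the indices of the best ones, best first. This is only an illustration of the principle under simplifying assumptions, not the Kaldi or shennong implementation; the function names and the toy model below are hypothetical.

```python
import math


def diag_gauss_loglike(frame, weight, mean, var):
    """Log of weight * N(frame; mean, diag(var)) for one Gaussian."""
    loglike = math.log(weight)
    for x, m, v in zip(frame, mean, var):
        loglike -= 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return loglike


def select_gaussians(frames, weights, means, variances, num_gselect):
    """For each frame, indices of the num_gselect best Gaussians, best first."""
    selection = []
    for frame in frames:
        scores = [
            diag_gauss_loglike(frame, w, m, v)
            for w, m, v in zip(weights, means, variances)]
        ranked = sorted(
            range(len(scores)), key=lambda i: scores[i], reverse=True)
        selection.append(ranked[:num_gselect])
    return selection


# Toy model (hypothetical values): 3 Gaussians in 2 dimensions.
weights = [0.5, 0.3, 0.2]
means = [[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]

# Two frames: the first is close to Gaussian 0, the second to Gaussian 1.
frames = [[0.1, -0.2], [4.5, 5.2]]
gselect = select_gaussians(frames, weights, means, variances, num_gselect=2)
```

Later computations can then be restricted to the selected indices of each frame, which is where the speedup controlled by num_gselect comes from.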

References

kaldi-gselect: https://kaldi-asr.org/doc/gmm-gselect_8cc.html

gaussian_selection_to_post(feats_collection, min_post=None)

Get per-frame posteriors

Given features and Gaussian-selection (gselect) information for a diagonal-covariance GMM, output per-frame posteriors for the selected indices. Also supports pruning the posteriors if they are below a stated threshold (and renormalizing the rest to sum to one).

Adapted from [kaldi-gselect-to-post]

Parameters

feats_collection (FeaturesCollection) – The collection of features to use to get the posteriors.
min_post (int, optional) – Posteriors below this threshold will be pruned away and the rest will be renormalized.

Returns

posteriors (dict[str, list[list[tuple[int, float]]]]) – For each utterance, the posteriors are a list of size the number of frames of the corresponding features. For each frame, we have a list of tuples corresponding to the gaussians in the gaussian selection for this frame and their log-likelihood (if the log-likelihood is positive).
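
The pruning and renormalization step can be sketched for a single frame as follows. This is a simplified illustration, not the Kaldi code: the function name is hypothetical, and keeping the best Gaussian when every posterior falls below the threshold is an assumption of this sketch.

```python
import math


def gselect_to_post(loglikes, min_post=None):
    """Turn (gaussian index, log-likelihood) pairs into posteriors.

    `loglikes` holds the selected Gaussians of one frame. Posteriors
    below `min_post` are pruned and the remaining ones renormalized.
    """
    # Softmax over the selected Gaussians, shifted for numerical stability.
    max_ll = max(ll for _, ll in loglikes)
    unnorm = [(i, math.exp(ll - max_ll)) for i, ll in loglikes]
    total = sum(p for _, p in unnorm)
    post = [(i, p / total) for i, p in unnorm]

    if min_post is not None:
        kept = [(i, p) for i, p in post if p >= min_post]
        if not kept:
            # Assumption of this sketch: never return an empty frame.
            kept = [max(post, key=lambda pair: pair[1])]
        total = sum(p for _, p in kept)
        post = [(i, p / total) for i, p in kept]
    return post


# One frame with three selected Gaussians (hypothetical values).
frame_loglikes = [(3, math.log(0.6)), (0, math.log(0.39)), (7, math.log(0.01))]
full = gselect_to_post(frame_loglikes)
pruned = gselect_to_post(frame_loglikes, min_post=0.05)
```

Here Gaussian 7 carries a posterior of about 0.01, so it is pruned with min_post=0.05 and the two remaining posteriors are rescaled to sum to one.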

References

kaldi-gselect-to-post: https://kaldi-asr.org/doc/gmm-global-gselect-to-post_8cc.html

accumulate(feats_collection, weights_collection=None, njobs=1)

Accumulate stats for training a diagonal-covariance GMM.

Adapted from [kaldi-acc]

Parameters

feats_collection (FeaturesCollection) – The collection of features to use to accumulate stats.
weights_collection (dict[str, ndarrays], optional) – For each features entry in the collection, an array of weights to apply to its frames. If specified, we must have weights_collection.keys() == feats_collection.keys(). Unweighted by default.
njobs (int, optional) – Number of threads to use for computation, default to 1.

Returns

gmm_accs (kaldi.gmm.AccumDiagGmm) – The accumulated stats.
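
What such accumulated statistics typically contain can be sketched in pure Python: for each Gaussian, an occupancy count plus first and second order sums of the frames, each frame contributing according to its posterior and its optional weight. This is a simplified illustration with hypothetical names, not the layout of kaldi.gmm.AccumDiagGmm.

```python
def accumulate_stats(frames, posteriors, num_gauss, frame_weights=None):
    """Accumulate occupancy, first and second order stats per Gaussian.

    `posteriors` gives, for each frame, a list of (gaussian, posterior)
    pairs. `frame_weights` optionally scales each frame's contribution.
    """
    dim = len(frames[0])
    occupancy = [0.0] * num_gauss
    mean_acc = [[0.0] * dim for _ in range(num_gauss)]
    var_acc = [[0.0] * dim for _ in range(num_gauss)]
    if frame_weights is None:
        frame_weights = [1.0] * len(frames)  # unweighted by default

    for frame, post, weight in zip(frames, posteriors, frame_weights):
        for gauss, p in post:
            gamma = weight * p  # weighted posterior of this Gaussian
            occupancy[gauss] += gamma
            for d, x in enumerate(frame):
                mean_acc[gauss][d] += gamma * x
                var_acc[gauss][d] += gamma * x * x
    return occupancy, mean_acc, var_acc


# Two frames, two Gaussians, second frame down-weighted (toy values).
frames = [[1.0, 2.0], [3.0, 4.0]]
posteriors = [[(0, 1.0)], [(0, 0.5), (1, 0.5)]]
occupancy, mean_acc, var_acc = accumulate_stats(
    frames, posteriors, num_gauss=2, frame_weights=[1.0, 0.5])
```

Because the statistics are plain sums, partial results computed on different subsets of the data can be added together, which is what makes the accumulation easy to parallelize.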

References

kaldi-acc: https://kaldi-asr.org/doc/gmm-global-acc-stats_8cc.html

estimate(gmm_accs, mixup=None, perturb_factor=0.01)

Estimate a diagonal-covariance GMM from the accumulated stats.

Adapted from [kaldi-gmm-est]

Parameters

gmm_accs (kaldi.gmm.AccumDiagGmm) – Accumulated stats
mixup (int, optional) – Increase number of mixture components to this overall target.
perturb_factor (float, optional) – While mixing up, perturb means by standard deviation times this factor.
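
The two ideas involved here, re-estimating the parameters from accumulated statistics and mixing up by splitting a Gaussian, can be sketched as follows. This is a simplified illustration with hypothetical function names: the variance floor is an assumption, and the split below shifts the means deterministically by the standard deviation times perturb_factor, whereas Kaldi's own implementation may differ (for instance by drawing the perturbation at random).

```python
import math


def estimate_from_stats(occupancy, mean_acc, var_acc, var_floor=1e-4):
    """Maximum-likelihood weights, means and diagonal variances."""
    total = sum(occupancy)
    weights, means, variances = [], [], []
    for occ, m_acc, v_acc in zip(occupancy, mean_acc, var_acc):
        mean = [m / occ for m in m_acc]
        # E[x^2] - E[x]^2, floored to avoid degenerate variances.
        var = [max(v / occ - mu * mu, var_floor) for v, mu in zip(v_acc, mean)]
        weights.append(occ / total)
        means.append(mean)
        variances.append(var)
    return weights, means, variances


def split_largest(weights, means, variances, perturb_factor=0.01):
    """Split the heaviest Gaussian in two, adding one mixture component."""
    k = max(range(len(weights)), key=lambda i: weights[i])
    delta = [perturb_factor * math.sqrt(v) for v in variances[k]]
    half = weights[k] / 2.0
    new_mean = [m + d for m, d in zip(means[k], delta)]
    old_mean = [m - d for m, d in zip(means[k], delta)]
    weights = weights[:k] + [half] + weights[k + 1:] + [half]
    means = means[:k] + [old_mean] + means[k + 1:] + [new_mean]
    variances = variances + [list(variances[k])]
    return weights, means, variances


# Toy accumulated stats for 2 Gaussians in 2 dimensions.
occupancy = [1.25, 0.25]
mean_acc = [[1.75, 3.0], [0.75, 1.0]]
var_acc = [[3.25, 8.0], [2.25, 4.0]]

weights, means, variances = estimate_from_stats(occupancy, mean_acc, var_acc)
mixed = split_largest(weights, means, variances, perturb_factor=0.01)
```

In this toy case the second Gaussian saw a single frame, so its variance collapses to zero and is raised to the floor, while the first Gaussian, being the heaviest, is the one that gets split.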

References

kaldi-gmm-est: https://kaldi-asr.org/doc/gmm-global-est_8cc.html

process(utterances, njobs=1)

Initialize the GMM, which sets the means to random data points and then runs some iterations of EM. Then train for a few iterations in parallel.

Parameters

utterances (Utterances) – The list of utterances to train the UBM on.
njobs (int, optional) – Number of threads to use for computation, default to 1.

Raises

ValueError – On errors