UBM

Provides the DiagUbmProcessor class to train a Universal Background Model - Gaussian Mixture Model (UBM-GMM) with diagonal covariances.

Uses the Kaldi implementation of GMM (see [kaldi-gmm]).

The UBM is used as a preprocessing step by VtlnProcessor.

Examples

>>> from shennong.processor.ubm import DiagUbmProcessor
>>> wav = './test/data/test.wav'
>>> utterances = [('utt1', wav, 'spk1', 0, 1), ('utt2', wav, 'spk1', 1, 1.5)]

Initialize the UBM-GMM with a given number of Gaussians. Other options can be set at construction or afterwards:

>>> num_gauss = 4
>>> ubm = DiagUbmProcessor(num_gauss, num_iters_init=10)
>>> ubm.num_iters = 3

Process the utterances to update the model.

>>> ubm.process(utterances)

Each Gaussian of the model has as many dimensions as the features.

>>> import kaldi.gmm
>>> isinstance(ubm.gmm, kaldi.gmm.DiagGmm)
True
>>> means = ubm.gmm.get_means()
>>> means.num_rows == num_gauss
True
>>> means.num_cols
39

References

kaldi-gmm

https://kaldi-asr.org/doc/model.html

class shennong.processor.ubm.DiagUbmProcessor(num_gauss, num_iters=4, num_gselect=15, initial_gauss_proportion=0.5, num_iters_init=20, num_frames=500000, subsample=5, min_gaussian_weight=0.0001, remove_low_count_gaussians=False, seed=0, features=None, vad=None)[source]

Bases: shennong.base.BaseProcessor

Universal Background Model with Diagonal GMM

property name

Processor name

property num_gauss

Number of Gaussians in the model

property num_iters

Number of iterations of training.

property num_iters_init

Number of E-M iterations for model initialization.

property num_gselect

Number of Gaussians per frame to limit computation to, for speed.

get_params(deep=True)

Get parameters for this processor.

Parameters

deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True.

Returns

params (mapping of string to any) – Parameter names mapped to their values.

property initial_gauss_proportion

Proportion of Gaussians to start with in initialization phase (then split)

property log

Processor logger

set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')

Change level and/or format of the processor’s logger

Parameters
  • level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.

  • formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default displays level, name and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display time, level, name and message.
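
The effect of the level and formatter arguments can be illustrated with the standard logging module alone. This is a minimal sketch, not the processor's own implementation: the make_logger helper and the 'ubm-demo' logger name are hypothetical, and it assumes the level string maps directly onto the standard logging levels.

```python
import io
import logging


def make_logger(name, level, formatter):
    """Build a stdlib logger writing to an in-memory stream (illustration only)."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(formatter))

    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.propagate = False
    # assumption: 'debug', 'info', 'warning', 'error' map to the stdlib levels
    logger.setLevel(getattr(logging, level.upper()))
    return logger, stream


logger, stream = make_logger(
    'ubm-demo', 'warning', '%(levelname)s - %(name)s - %(message)s')
logger.info('ignored: below the minimum level')
logger.warning('kept')

output = stream.getvalue()
print(output, end='')  # WARNING - ubm-demo - kept
```

Only the warning is emitted, because the info message falls below the configured minimum level.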

set_params(**params)

Set the parameters of this processor.

Returns

self

Raises

ValueError – If any given parameter in params is invalid for the processor.

property num_frames

Maximum number of frames to keep in memory for model initialization.

property subsample

In the main E-M phase, use only every n-th frame (a speedup)
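
As a rough illustration of frame subsampling, the sketch below keeps one frame out of every n. It assumes the selection starts at the first frame, which may differ from what the processor does internally; subsample_frames is a hypothetical helper, not part of the library.

```python
def subsample_frames(frames, subsample):
    """Keep one frame out of every `subsample` (assumed to start at frame 0)."""
    return frames[::subsample]


frames = list(range(12))  # stand-in for 12 feature frames
kept = subsample_frames(frames, 5)
print(kept)  # [0, 5, 10]
```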

property min_gaussian_weight

Minimum weight below which a Gaussian is not updated

property remove_low_count_gaussians

Remove Gaussians with a weight below min_gaussian_weight

property features

Features extraction configuration

property vad

VAD configuration for the UBM-GMM

property seed

Random seed for initialization from random frames

classmethod load(path)[source]

Load the GMM from a binary file

save(path)[source]

Save the GMM to a binary file

initialize_gmm(feats_collection, njobs=1)[source]

Initializes a single diagonal GMM

Also does multiple iterations of initial training. Adapted from [kaldi-init].

Parameters
  • feats_collection (FeaturesCollection) – The collection of features to initialize the GMM with.

  • njobs (int, optional) – Number of threads to use for computation, default to 1.

Raises

ValueError – If the features have inconsistent dimensions.

References

kaldi-init

https://kaldi-asr.org/doc/gmm-global-init-from-feats_8cc.html

gaussian_selection(feats_collection)[source]

Precompute Gaussian indices for pruning. For each frame, gives a list of the n best Gaussian indices sorted from best to worst.

Adapted from [kaldi-gselect].

Parameters

feats_collection (FeaturesCollection) – The collection of features to select the best Gaussians from.
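
The idea of Gaussian selection can be sketched in pure Python: score every component on each frame and keep the indices of the n best, best first. This is a toy illustration under simple assumptions (a diagonal GMM given as plain lists), not the Kaldi code the method is adapted from; diag_loglike and select_gaussians are hypothetical names.

```python
import math


def diag_loglike(frame, weight, mean, var):
    """Log of weight * N(frame; mean, diag(var)) for one component."""
    loglike = math.log(weight)
    for x, m, v in zip(frame, mean, var):
        loglike -= 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return loglike


def select_gaussians(frames, weights, means, variances, num_gselect):
    """For each frame, return the num_gselect best component indices, best first."""
    selection = []
    for frame in frames:
        loglikes = [
            diag_loglike(frame, w, m, v)
            for w, m, v in zip(weights, means, variances)]
        ranked = sorted(
            range(len(loglikes)), key=lambda i: loglikes[i], reverse=True)
        selection.append(ranked[:num_gselect])
    return selection


# toy 1-dimensional model with 3 components
weights = [0.5, 0.3, 0.2]
means = [[0.0], [5.0], [10.0]]
variances = [[1.0], [1.0], [1.0]]
frames = [[0.2], [9.0]]

gselect = select_gaussians(frames, weights, means, variances, num_gselect=2)
print(gselect)  # [[0, 1], [2, 1]]
```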

References

kaldi-gselect

https://kaldi-asr.org/doc/gmm-gselect_8cc.html

gaussian_selection_to_post(feats_collection, min_post=None)[source]

Get per-frame posteriors

Given features and Gaussian-selection (gselect) information for a diagonal-covariance GMM, output per-frame posteriors for the selected indices. Also supports pruning the posteriors if they are below a stated threshold (and renormalizing the rest to sum to one).

Adapted from [kaldi-gselect-to-post]

Parameters
  • feats_collection (FeaturesCollection) – The collection of features to use to get the posteriors.

  • min_post (float, optional) – Posteriors below this threshold will be pruned away and the rest will be renormalized.

Returns

posteriors (dict[str, list[list[tuple[int, float]]]]) – For each utterance, a list with one entry per frame of the corresponding features. Each entry is a list of tuples pairing a Gaussian from that frame's Gaussian selection with its log-likelihood (if the log-likelihood is positive).
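
The pruning and renormalization step described above can be sketched for a single frame as follows. This is an illustration only: gselect_to_post is a hypothetical helper, and keeping the best component when everything falls below the threshold is an assumption rather than documented behavior.

```python
import math


def gselect_to_post(loglikes, selected, min_post=None):
    """Turn log-likelihoods of the selected Gaussians into posteriors for one frame."""
    # softmax restricted to the selected components (max subtracted for stability)
    top = max(loglikes[i] for i in selected)
    exps = [(i, math.exp(loglikes[i] - top)) for i in selected]
    total = sum(e for _, e in exps)
    post = [(i, e / total) for i, e in exps]

    if min_post is not None:
        kept = [(i, p) for i, p in post if p >= min_post]
        if not kept:
            # assumption: never return an empty frame, keep the best component
            kept = [max(post, key=lambda pair: pair[1])]
        total = sum(p for _, p in kept)
        post = [(i, p / total) for i, p in kept]
    return post


loglikes = [math.log(0.7), math.log(0.25), math.log(0.05)]
post = gselect_to_post(loglikes, selected=[0, 1, 2], min_post=0.1)
print(post)  # component 2 is pruned, the two others sum to one
```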

References

kaldi-gselect-to-post

https://kaldi-asr.org/doc/gmm-global-gselect-to-post_8cc.html

accumulate(feats_collection, weights_collection=None, njobs=1)[source]

Accumulate stats for training a diagonal-covariance GMM.

Adapted from [kaldi-acc]

Parameters
  • feats_collection (FeaturesCollection) – The collection of features to use to accumulate stats.

  • weights_collection (dict[str, ndarray], optional) – For each features in the collection, an array of weights to apply to its frames. If specified, we must have weights_collection.keys() == feats_collection.keys(). Unweighted by default.

  • njobs (int, optional) – Number of threads to use for computation, default to 1.

Returns

gmm_accs (kaldi.gmm.AccumDiagGmm) – The accumulated stats.
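
To give an idea of what accumulated stats contain, the sketch below gathers the usual sufficient statistics of a diagonal GMM (occupancy, weighted sum and weighted sum of squares per component) in pure Python. It is not the kaldi.gmm.AccumDiagGmm implementation; accumulate_stats and its arguments are hypothetical, and per-frame posteriors are assumed to be given as lists of (index, posterior) tuples.

```python
def accumulate_stats(frames, posteriors, num_gauss, frame_weights=None):
    """Accumulate occupancy, sum(x) and sum(x^2) for each Gaussian."""
    dim = len(frames[0])
    occupancy = [0.0] * num_gauss
    sum_x = [[0.0] * dim for _ in range(num_gauss)]
    sum_x2 = [[0.0] * dim for _ in range(num_gauss)]

    if frame_weights is None:
        # unweighted by default: every frame counts for 1
        frame_weights = [1.0] * len(frames)

    for frame, post, weight in zip(frames, posteriors, frame_weights):
        for index, gamma in post:
            g = weight * gamma
            occupancy[index] += g
            for d, x in enumerate(frame):
                sum_x[index][d] += g * x
                sum_x2[index][d] += g * x * x
    return occupancy, sum_x, sum_x2


frames = [[1.0], [3.0], [10.0]]
posteriors = [[(0, 1.0)], [(0, 1.0)], [(1, 1.0)]]
occupancy, sum_x, sum_x2 = accumulate_stats(frames, posteriors, num_gauss=2)
print(occupancy, sum_x, sum_x2)
```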

References

kaldi-acc

https://kaldi-asr.org/doc/gmm-global-acc-stats_8cc.html

estimate(gmm_accs, mixup=None, perturb_factor=0.01)[source]

Estimate a diagonal-covariance GMM from the accumulated stats.

Adapted from [kaldi-gmm-est]

Parameters
  • gmm_accs (kaldi.gmm.AccumDiagGmm) – Accumulated stats

  • mixup (int, optional) – Increase number of mixture components to this overall target.

  • perturb_factor (float, optional) – While mixing up, perturb means by standard deviation times this factor.
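
The estimation step can be pictured as the standard maximum-likelihood update from sufficient statistics. The sketch below is a simplified illustration, not the Kaldi routine: estimate_from_stats and var_floor are hypothetical, every component is assumed to have a non-zero occupancy, and mixing up is not covered.

```python
def estimate_from_stats(occupancy, sum_x, sum_x2, var_floor=1e-4):
    """Re-estimate weights, means and diagonal variances from accumulated stats."""
    total = sum(occupancy)
    weights, means, variances = [], [], []
    for occ, sx, sx2 in zip(occupancy, sum_x, sum_x2):
        # assumption: occ > 0 for every component
        weights.append(occ / total)
        mean = [s / occ for s in sx]
        # E[x^2] - E[x]^2, floored to avoid degenerate variances
        var = [max(s2 / occ - m * m, var_floor) for s2, m in zip(sx2, mean)]
        means.append(mean)
        variances.append(var)
    return weights, means, variances


weights, means, variances = estimate_from_stats(
    occupancy=[2.0, 1.0],
    sum_x=[[4.0], [10.0]],
    sum_x2=[[10.0], [100.0]])
print(weights, means, variances)
```

In this toy case the second component has seen a single frame, so its variance collapses to zero and is replaced by the floor.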

References

kaldi-gmm-est

https://kaldi-asr.org/doc/gmm-global-est_8cc.html

process(utterances, njobs=1)[source]

Initialize the GMM by setting the means to random data points and running some iterations of EM, then train for a few iterations in parallel.

Parameters
  • utterances (list of tuples) – The utterances can be defined in one of the following formats:

      • 1-tuple (or str): <wav-file>
      • 2-tuple: <utterance-id> <wav-file>
      • 3-tuple: <utterance-id> <wav-file> <speaker-id>
      • 4-tuple: <utterance-id> <wav-file> <tstart> <tstop>
      • 5-tuple: <utterance-id> <wav-file> <speaker-id> <tstart> <tstop>

  • njobs (int, optional) – Number of threads to use for computation, default to 1.
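
The accepted utterance formats can be read as shorthand for a full five-field record. The sketch below shows one way to expand them; normalize_utterance is a hypothetical helper, and the generated identifier for the 1-tuple case is an assumption, not the library's actual naming scheme.

```python
def normalize_utterance(utt, index=0):
    """Expand an utterance to (utt_id, wav, speaker, tstart, tstop)."""
    if isinstance(utt, str):
        utt = (utt,)

    size = len(utt)
    if size == 1:
        # assumption: a placeholder id is generated when none is given
        return ('utt_{}'.format(index), utt[0], None, None, None)
    if size == 2:
        return (utt[0], utt[1], None, None, None)
    if size == 3:
        return (utt[0], utt[1], utt[2], None, None)
    if size == 4:
        return (utt[0], utt[1], None, utt[2], utt[3])
    if size == 5:
        return tuple(utt)
    raise ValueError('unsupported utterance format: {!r}'.format(utt))


wav = './test/data/test.wav'
print(normalize_utterance(('utt1', wav, 'spk1', 0, 1)))
print(normalize_utterance(('utt2', wav, 1, 1.5)))
```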

Raises

ValueError – On errors