UBM
Provides the DiagUbmProcessor class to train a Universal Background Model
Gaussian Mixture Model (UBM-GMM) with diagonal covariances.
Uses the Kaldi implementation of GMM (see [kaldigmm]).
The UBM is used as a preprocessing step by VtlnProcessor.
Examples
>>> from shennong.processor.ubm import DiagUbmProcessor
>>> wav = './test/data/test.wav'
>>> utterances = [('utt1', wav, 'spk1', 0, 1), ('utt2', wav, 'spk1', 1, 1.5)]
Initialize the UBM-GMM with a given number of Gaussians. Other options can be specified at construction, or afterwards:
>>> num_gauss = 4
>>> ubm = DiagUbmProcessor(num_gauss, num_iters_init=10)
>>> ubm.num_iters = 3
Process the utterances to update the model.
>>> ubm.process(utterances)
Each Gaussian of the model has as many dimensions as the features:
>>> import kaldi.gmm
>>> isinstance(ubm.gmm, kaldi.gmm.DiagGmm)
True
>>> means = ubm.gmm.get_means()
>>> means.num_rows == num_gauss
True
>>> means.num_cols
39
References

class shennong.processor.ubm.DiagUbmProcessor(num_gauss, num_iters=4, num_gselect=15, initial_gauss_proportion=0.5, num_iters_init=20, num_frames=500000, subsample=5, min_gaussian_weight=0.0001, remove_low_count_gaussians=False, seed=0, features=None, vad=None)
Bases: shennong.base.BaseProcessor
Universal Background Model with Diagonal GMM

property name
Processor name

property num_gauss
Number of Gaussians in the model

property num_iters
Number of iterations of training.

property num_iters_init
Number of EM iterations for model initialization.

property num_gselect
Number of Gaussians per frame to limit computation to, for speed.

get_params(deep=True)
Get parameters for this processor.
 Parameters
deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True.
 Returns
params (mapping of string to any) – Parameter names mapped to their values.

property initial_gauss_proportion
Proportion of Gaussians to start with in the initialization phase (the Gaussians are then split to reach num_gauss)
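As a rough illustration of how such a proportion can translate into a growing number of Gaussians, here is a minimal sketch. The function name `gaussian_schedule` and the linear growth over the first half of the iterations are assumptions made for this example; the processor itself may follow a different schedule.

```python
def gaussian_schedule(num_gauss, initial_gauss_proportion=0.5, num_iters_init=20):
    """Return the number of Gaussians used at each initialization iteration.

    Hypothetical helper: the linear schedule is an assumption, not taken
    from the shennong or Kaldi sources.
    """
    # Start with a fraction of the target number of Gaussians (at least one).
    start = max(1, int(initial_gauss_proportion * num_gauss))
    # Assumption: reach the target over the first half of the iterations.
    steps = max(1, num_iters_init // 2)
    increment = max(1, (num_gauss - start) // steps)

    schedule, current = [], start
    for _ in range(num_iters_init):
        schedule.append(current)
        # Splitting never goes beyond the requested number of Gaussians.
        current = min(num_gauss, current + increment)
    return schedule


schedule = gaussian_schedule(num_gauss=8, initial_gauss_proportion=0.5,
                             num_iters_init=4)
```

With these toy values the model starts with 4 Gaussians and reaches the target of 8 before the last iteration.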

property log
Processor logger

set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')
Change level and/or format of the processor’s logger
 Parameters
level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.
formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default display level, name and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display time, level, name and message.
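Format strings of this kind follow the standard `logging.Formatter` syntax. The sketch below uses only the Python standard library to show what such a string produces; it does not call `set_logger`, and the logger name `demo`, the message and the ` - ` separator are assumptions chosen for illustration.

```python
import logging

# A format string showing level, logger name and message.
# The ' - ' separator is an assumption made for this example.
formatter = logging.Formatter('%(levelname)s - %(name)s - %(message)s')

# Build a record by hand so that nothing is printed and no handler is needed.
record = logging.LogRecord(
    name='demo', level=logging.WARNING, pathname='demo.py', lineno=0,
    msg='%d Gaussians removed', args=(2,), exc_info=None)

line = formatter.format(record)
```

Adding `%(asctime)s` to the format string would prefix each message with its timestamp.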

set_params(**params)
Set the parameters of this processor.
 Returns
self
 Raises
ValueError – If any given parameter in params is invalid for the processor.
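The `get_params` / `set_params` pair makes it possible to copy or reconfigure a processor generically. The class below is a hypothetical stand-in written for this example (it is not shennong code and only has two parameters); it mimics the documented contract: `set_params` returns `self` and raises `ValueError` on an unknown parameter.

```python
class ToyProcessor:
    """Hypothetical stand-in for a processor, for illustration only."""

    def __init__(self, num_gauss, num_iters=4):
        self.num_gauss = num_gauss
        self.num_iters = num_iters

    def get_params(self):
        # Map each parameter name to its current value.
        return {'num_gauss': self.num_gauss, 'num_iters': self.num_iters}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f'invalid parameter {name!r}')
            setattr(self, name, value)
        return self  # returning self allows chained calls


proc = ToyProcessor(num_gauss=4).set_params(num_iters=3)
# A new instance with the same configuration.
clone = ToyProcessor(**proc.get_params())
```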

property num_frames
Maximum number of frames to keep in memory for model initialization.

property subsample
In the main EM phase, use only every n-th frame (a speedup)
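Subsampling can be pictured as a strided selection of frames. A minimal sketch, assuming the first frame is always kept (the helper name `subsample_frames` is hypothetical):

```python
def subsample_frames(frames, subsample=5):
    """Keep one frame out of every `subsample` frames (hypothetical helper)."""
    if subsample < 1:
        raise ValueError('subsample must be a positive integer')
    # Assumption: selection starts at the first frame.
    return frames[::subsample]


frames = list(range(12))  # stand-in for 12 feature frames
kept = subsample_frames(frames, subsample=5)
```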

property min_gaussian_weight
Minimum weight below which a Gaussian is not updated

property remove_low_count_gaussians
Remove Gaussians with a weight below min_gaussian_weight
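To make the effect of removing low-weight Gaussians concrete, here is a minimal sketch operating on a plain list of mixture weights. The helper `prune_low_weight` is hypothetical, and renormalizing the surviving weights so they sum to one is an assumption about what a sensible implementation does.

```python
def prune_low_weight(weights, min_gaussian_weight=0.0001):
    """Drop components whose weight is below the threshold (hypothetical).

    Returns the indices of the kept components and their renormalized weights.
    """
    keep = [i for i, w in enumerate(weights) if w >= min_gaussian_weight]
    if not keep:
        raise ValueError('all Gaussians are below the minimum weight')
    total = sum(weights[i] for i in keep)
    # Renormalize so the remaining weights still sum to one.
    return keep, [weights[i] / total for i in keep]


keep, new_weights = prune_low_weight([0.6, 0.39995, 0.00005])
```

Here the third component falls below the default threshold of 0.0001 and is dropped.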

property features
Features extraction configuration

property vad
VAD configuration for the UBM-GMM

property seed
Random seed for initialization from random frames

initialize_gmm(feats_collection, njobs=1)
Initializes a single diagonal GMM
Also does multiple iterations of initial training. Adapted from [kaldiinit].
 Parameters
feats_collection (FeaturesCollection) – The collection of features to initialize the GMM with.
njobs (int, optional) – Number of threads to use for computation, defaults to 1.
 Raises
ValueError – If the features have inconsistent dimensions.
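The idea of seeding the model from random frames can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: the helper `init_means_from_frames` is hypothetical, frames are plain lists of floats, and the subsequent EM iterations are left out.

```python
import random


def init_means_from_frames(frames, num_gauss, seed=0):
    """Pick `num_gauss` distinct frames as initial means (hypothetical)."""
    dims = {len(frame) for frame in frames}
    if len(dims) != 1:
        raise ValueError('features have inconsistent dimensions')
    # A dedicated generator keeps the draw reproducible for a given seed.
    rng = random.Random(seed)
    chosen = rng.sample(range(len(frames)), num_gauss)
    return [list(frames[i]) for i in chosen]


frames = [[float(i), float(2 * i)] for i in range(10)]
means = init_means_from_frames(frames, num_gauss=4, seed=0)
```

Calling the function twice with the same seed returns the same means, which is the point of exposing a `seed` parameter.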
References

gaussian_selection(feats_collection)
Precompute Gaussian indices for pruning. For each frame, gives a list of the n best Gaussian indices sorted from best to worst.
Adapted from [kaldigselect].
 Parameters
feats_collection (FeaturesCollection) – The collection of features to select the best Gaussians from.
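Gaussian selection can be illustrated with a small pure-Python sketch: score every Gaussian of a diagonal GMM on each frame and keep the indices of the best ones. The function names, the plain-list data layout and the use of weighted log-likelihood as the ranking score are assumptions made for this example, not the Kaldi implementation.

```python
import math


def diag_loglike(frame, mean, var):
    """Log-density of one frame under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (x - m) ** 2 / v
        for x, m, v in zip(frame, mean, var))


def select_gaussians(frames, weights, means, variances, num_gselect):
    """For each frame, indices of the best Gaussians, best first."""
    selection = []
    for frame in frames:
        scores = [math.log(w) + diag_loglike(frame, m, v)
                  for w, m, v in zip(weights, means, variances)]
        ranked = sorted(range(len(scores)), key=lambda k: scores[k],
                        reverse=True)
        selection.append(ranked[:num_gselect])
    return selection


# Toy model: three 1-dimensional Gaussians with equal weights.
weights = [1 / 3, 1 / 3, 1 / 3]
means = [[0.0], [5.0], [10.0]]
variances = [[1.0], [1.0], [1.0]]
selection = select_gaussians([[0.1], [9.0]], weights, means, variances, 2)
```

Later steps then only evaluate the selected Gaussians for each frame, which is where the speedup comes from.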
References

gaussian_selection_to_post(feats_collection, min_post=None)
Get per-frame posteriors
Given features and Gaussian-selection (gselect) information for a diagonal-covariance GMM, output per-frame posteriors for the selected indices. Also supports pruning the posteriors if they are below a stated threshold (and renormalizing the rest to sum to one).
Adapted from [kaldigselecttopost]
 Parameters
feats_collection (FeaturesCollection) – The collection of features to use to get the posteriors.
min_post (int, optional) – Posteriors below this threshold will be pruned away and the rest will be renormalized.
 Returns
posteriors (dict[str, list[list[tuple[int, float]]]]) – For each utterance, the posteriors are a list with one entry per frame of the corresponding features. For each frame, we have a list of tuples giving the Gaussians in the Gaussian selection for this frame and their log-likelihood (if the log-likelihood is positive).
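The pruning and renormalization described above can be sketched for a single frame as follows. This is an illustration under stated assumptions: the helper `selection_to_posteriors` is hypothetical, it takes the log-likelihoods of the selected Gaussians as input, and it always keeps at least the best Gaussian.

```python
import math


def selection_to_posteriors(loglikes, min_post=None):
    """Turn (index, log-likelihood) pairs of one frame into posteriors."""
    # Subtract the maximum before exponentiating, for numerical stability.
    best = max(ll for _, ll in loglikes)
    exps = [(idx, math.exp(ll - best)) for idx, ll in loglikes]
    total = sum(p for _, p in exps)
    posts = [(idx, p / total) for idx, p in exps]

    if min_post is not None:
        kept = [(idx, p) for idx, p in posts if p >= min_post]
        if not kept:  # assumption: never return an empty list
            kept = [max(posts, key=lambda pair: pair[1])]
        # Renormalize the surviving posteriors so they sum to one.
        total = sum(p for _, p in kept)
        posts = [(idx, p / total) for idx, p in kept]
    return posts


frame_loglikes = [(0, -1.0), (3, -1.0), (7, -50.0)]
posts = selection_to_posteriors(frame_loglikes)
pruned = selection_to_posteriors(frame_loglikes, min_post=0.01)
```

In this toy frame, Gaussian 7 has a negligible posterior and is pruned, leaving Gaussians 0 and 3 with equal posteriors.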
References

accumulate(feats_collection, weights_collection=None, njobs=1)
Accumulate stats for training a diagonal-covariance GMM.
Adapted from [kaldiacc]
 Parameters
feats_collection (FeaturesCollection) – The collection of features to use to accumulate stats.
weights_collection (dict[str, ndarrays], optional) – For each features in the collection, an array of weights to apply to the features frames. If specified, we must have weights_collection.keys() == feats_collection.keys(). Unweighted by default.
njobs (int, optional) – Number of threads to use for computation, defaults to 1.
 Returns
gmm_accs (kaldi.gmm.AccumDiagGmm) – The accumulated stats.
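The statistics needed to re-estimate a diagonal GMM are, per Gaussian, an occupancy count and the weighted sums of the frames and of their squares. The sketch below accumulates them in pure Python; the function name `accumulate_stats`, the plain-list layout and the way frame weights multiply the posteriors are assumptions for illustration, not the `AccumDiagGmm` API.

```python
def accumulate_stats(frames, posteriors, num_gauss, weights=None):
    """Accumulate zeroth, first and second order statistics.

    posteriors[t] is a list of (gaussian_index, posterior) for frame t.
    """
    dim = len(frames[0])
    occupancy = [0.0] * num_gauss
    mean_acc = [[0.0] * dim for _ in range(num_gauss)]
    var_acc = [[0.0] * dim for _ in range(num_gauss)]

    for t, (frame, posts) in enumerate(zip(frames, posteriors)):
        # Unweighted by default: every frame counts fully.
        frame_weight = 1.0 if weights is None else weights[t]
        for k, post in posts:
            gamma = frame_weight * post
            occupancy[k] += gamma
            for d, x in enumerate(frame):
                mean_acc[k][d] += gamma * x
                var_acc[k][d] += gamma * x * x
    return occupancy, mean_acc, var_acc


frames = [[1.0], [3.0]]
posteriors = [[(0, 1.0)], [(0, 0.5), (1, 0.5)]]
occupancy, mean_acc, var_acc = accumulate_stats(frames, posteriors, 2)
masked = accumulate_stats(frames, posteriors, 2, weights=[1.0, 0.0])
```

A weight of zero, as in the last call, removes a frame from the statistics entirely, which is how a VAD decision could be applied.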
References

estimate(gmm_accs, mixup=None, perturb_factor=0.01)
Estimate a diagonal-covariance GMM from the accumulated stats.
Adapted from [kaldigmmest]
 Parameters
gmm_accs (kaldi.gmm.AccumDiagGmm) – Accumulated stats
mixup (int, optional) – Increase number of mixture components to this overall target.
perturb_factor (float, optional) – While mixing up, perturb means by standard deviation times this factor.
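The core of the estimation step is the usual maximum-likelihood update from accumulated statistics. A minimal sketch, assuming statistics stored as plain lists (occupancy, sum of frames, sum of squared frames per Gaussian); the function `estimate_from_stats` and the `var_floor` parameter are hypothetical, and mixing up and mean perturbation are not covered.

```python
def estimate_from_stats(occupancy, mean_acc, var_acc, var_floor=1e-3):
    """Maximum-likelihood update of weights, means and diagonal variances."""
    total = sum(occupancy)
    weights, means, variances = [], [], []
    for occ, m_acc, v_acc in zip(occupancy, mean_acc, var_acc):
        if occ <= 0.0:
            continue  # no data assigned to this component: skip it
        mean = [m / occ for m in m_acc]
        # Variance is E[x^2] - E[x]^2, floored to stay strictly positive.
        var = [max(v / occ - mu * mu, var_floor)
               for v, mu in zip(v_acc, mean)]
        weights.append(occ / total)
        means.append(mean)
        variances.append(var)
    return weights, means, variances


weights, means, variances = estimate_from_stats(
    occupancy=[2.0, 2.0], mean_acc=[[2.0], [10.0]], var_acc=[[4.0], [52.0]])
```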
References

process(utterances, njobs=1)
Initialize the GMM, which sets the means to random data points and then does some iterations of EM, then train for a few iterations in parallel.
 Parameters
utterances (list of tuples) – The utterances can be defined in one of the following formats:
* 1-uple (or str): <wavfile>
* 2-uple: <utteranceid> <wavfile>
* 3-uple: <utteranceid> <wavfile> <speakerid>
* 4-uple: <utteranceid> <wavfile> <tstart> <tstop>
* 5-uple: <utteranceid> <wavfile> <speakerid> <tstart> <tstop>
njobs (int, optional) – Number of threads to use for computation, defaults to 1.
 Raises
ValueError – On errors
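The accepted utterance layouts can be summarized with a small parser that maps each tuple length to named fields. The function `parse_utterance` and the field names are hypothetical and only illustrate the formats listed above; the processor does its own parsing.

```python
# Field names for each supported tuple length (names are illustrative).
LAYOUTS = {
    1: ('wav',),
    2: ('id', 'wav'),
    3: ('id', 'wav', 'speaker'),
    4: ('id', 'wav', 'tstart', 'tstop'),
    5: ('id', 'wav', 'speaker', 'tstart', 'tstop'),
}


def parse_utterance(utterance):
    """Return a dict of fields, with None for the fields not provided."""
    if isinstance(utterance, str):
        utterance = (utterance,)  # a bare string is a 1-uple
    if len(utterance) not in LAYOUTS:
        raise ValueError(f'invalid utterance format: {utterance!r}')
    fields = dict.fromkeys(('id', 'wav', 'speaker', 'tstart', 'tstop'))
    fields.update(zip(LAYOUTS[len(utterance)], utterance))
    return fields


parsed = parse_utterance(('utt2', './test/data/test.wav', 'spk1', 1, 1.5))
bare = parse_utterance('./test/data/test.wav')
```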

property