UBM
Provides the DiagUbmProcessor class to train a Universal Background Model
Gaussian Mixture Model (UBM-GMM) with diagonal covariances.
Uses the Kaldi implementation of GMM (see [kaldi-gmm]).
The UBM is used as a preprocessing step by VtlnProcessor.
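To make the model concrete, here is a minimal pure-Python sketch (not Kaldi's or shennong's code) of what a diagonal-covariance GMM computes. Each component's log-likelihood is its log weight plus a sum of independent per-dimension Gaussian terms, and the mixture log-likelihood combines the components with a log-sum-exp. Weights, means and variances are assumed to be plain Python lists.

```python
import math


def component_loglikes(frame, weights, means, variances):
    """Return log(w_k) + log N(frame; mu_k, diag(var_k)) for every component k."""
    loglikes = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for x, m, v in zip(frame, mu, var):
            # independent per-dimension Gaussian term (diagonal covariance)
            ll -= 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        loglikes.append(ll)
    return loglikes


def log_sum_exp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def gmm_loglike(frame, weights, means, variances):
    """Log-likelihood of a single frame under the whole mixture."""
    return log_sum_exp(component_loglikes(frame, weights, means, variances))


# two 2-dimensional components with unit variances
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
print(gmm_loglike([0.0, 0.0], weights, means, variances))
```

Only the diagonal of each covariance matrix is stored, which is why the per-frame cost grows linearly with the feature dimension.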
Examples
>>> from shennong import Utterances
>>> from shennong.processor.ubm import DiagUbmProcessor
>>> wav = './test/data/test.wav'
>>> utterances = Utterances(
... [('utt1', wav, 'spk1', 0, 1), ('utt2', wav, 'spk1', 1, 1.4)])
Initialize the UBM-GMM with a given number of Gaussians. Other options can be specified at construction or set afterwards:
>>> num_gauss = 4
>>> ubm = DiagUbmProcessor(num_gauss, num_iters_init=10)
>>> ubm.num_iters = 3
Process the utterances to update the model.
>>> ubm.process(utterances)
Each Gaussian of the model has as many dimensions as the features.
>>> import kaldi.gmm
>>> isinstance(ubm.gmm, kaldi.gmm.DiagGmm)
True
>>> means = ubm.gmm.get_means()
>>> means.num_rows == num_gauss
True
>>> means.num_cols
39
References
class shennong.processor.ubm.DiagUbmProcessor(num_gauss, num_iters=4, num_gselect=15, initial_gauss_proportion=0.5, num_iters_init=20, num_frames=500000, subsample=5, min_gaussian_weight=0.0001, remove_low_count_gaussians=False, seed=0, features=None, vad=None)
Bases: shennong.base.BaseProcessor
Universal Background Model with Diagonal GMM
property name
Processor name.
property num_gauss
Number of Gaussians in the model.
property num_iters
Number of iterations of training.
property num_iters_init
Number of E-M iterations for model initialization.
property num_gselect
Number of Gaussians per frame to limit computation to, for speed.
get_params(deep=True)
Get parameters for this processor.
- Parameters
deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True.
- Returns
params (mapping of string to any) – Parameter names mapped to their values.
property initial_gauss_proportion
Proportion of Gaussians to start with in the initialization phase (the rest are obtained by splitting).
property log
Processor logger.
set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')
Change the level and/or format of the processor’s logger.
- Parameters
level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.
formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default, display the level, name and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display time, level, name and message.
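The level semantics can be checked with the standard logging module alone. The sketch below does not call set_logger. The helper make_logger is hypothetical and only mirrors the documented level names and default format string. It writes to an in-memory stream so the output can be inspected.

```python
import io
import logging


def make_logger(name, level, formatter='%(levelname)s - %(name)s - %(message)s'):
    """Hypothetical helper: logger writing to an in-memory stream."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(formatter))
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.propagate = False
    # 'debug' -> logging.DEBUG, 'warning' -> logging.WARNING, ...
    logger.setLevel(getattr(logging, level.upper()))
    return logger, stream


logger, stream = make_logger('ubm-demo', 'warning')
logger.info('dropped: below the minimum level')
logger.warning('kept')
logger.error('kept too')
print(stream.getvalue())
# WARNING - ubm-demo - kept
# ERROR - ubm-demo - kept too
```

Only the WARNING and ERROR lines reach the stream. The INFO message is below the minimum level and is dropped.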
set_params(**params)
Set the parameters of this processor.
- Returns
self
- Raises
ValueError – If any given parameter in params is invalid for the processor.
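The get_params/set_params convention can be illustrated with a minimal sketch. It assumes that parameters are the constructor arguments, stored as attributes of the same name, similar to scikit-learn estimators. ToyProcessor is hypothetical and is not shennong.base.BaseProcessor, and the deep option is left out.

```python
import inspect


class ToyProcessor:
    """Hypothetical processor exposing its constructor arguments as parameters."""

    def __init__(self, num_gauss, num_iters=4, seed=0):
        self.num_gauss = num_gauss
        self.num_iters = num_iters
        self.seed = seed

    @classmethod
    def _param_names(cls):
        # every constructor argument except 'self' is a parameter
        sig = inspect.signature(cls.__init__)
        return sorted(name for name in sig.parameters if name != 'self')

    def get_params(self):
        return {name: getattr(self, name) for name in self._param_names()}

    def set_params(self, **params):
        valid = set(self._param_names())
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f'invalid parameter {name} for {type(self).__name__}')
            setattr(self, name, value)
        return self  # returning self allows chaining


proc = ToyProcessor(4).set_params(num_iters=3)
print(proc.get_params())  # {'num_gauss': 4, 'num_iters': 3, 'seed': 0}
```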
property num_frames
Maximum number of frames to keep in memory for model initialization.
property subsample
In the main E-M phase, use only one frame out of every subsample frames (a speedup).
property min_gaussian_weight
Minimum weight below which a Gaussian is not updated.
property remove_low_count_gaussians
Remove Gaussians with a weight below min_gaussian_weight.
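The interplay of min_gaussian_weight and remove_low_count_gaussians can be sketched on the mixture weights alone. The update_weights helper below is simplified and hypothetical, loosely following Kaldi's MLE update. A Gaussian whose new weight does not exceed the threshold keeps its old weight. When removal is enabled, it is dropped instead, though never if it is the last remaining Gaussian. Means and variances are left out, and renormalizing the surviving weights is an assumption of the sketch.

```python
def update_weights(occupancies, old_weights, min_gaussian_weight=1e-4,
                   remove_low_count_gaussians=False):
    """Simplified weight update: return (new_weights, kept_indices)."""
    total = sum(occupancies)
    n = len(occupancies)
    new, kept, removed = [], [], 0
    for g, occ in enumerate(occupancies):
        prob = occ / total if total > 0 else 1.0 / n
        if prob > min_gaussian_weight:
            new.append(prob)  # enough data: update the weight
            kept.append(g)
        elif remove_low_count_gaussians and removed < n - 1:
            removed += 1  # drop this Gaussian (never the last one)
        else:
            new.append(old_weights[g])  # too little data: keep the old weight
            kept.append(g)
    # assumption: renormalize so that the surviving weights sum to one
    norm = sum(new)
    return [w / norm for w in new], kept


weights, kept = update_weights([500.0, 499.99, 0.01], [0.4, 0.4, 0.2],
                               remove_low_count_gaussians=True)
print(kept)  # [0, 1]
```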
property features
Feature extraction configuration.
property vad
VAD configuration for the UBM-GMM.
property seed
Random seed for initialization from random frames.
initialize_gmm(feats_collection, njobs=1)
Initializes a single diagonal GMM, then performs multiple iterations of initial training (num_iters_init).
Adapted from [kaldi-init].
- Parameters
feats_collection (FeaturesCollection) – The collection of features to initialize the GMM with.
njobs (int, optional) – Number of threads to use for computation, defaults to 1.
- Raises
ValueError – If the features have inconsistent dimensions.
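The first initialization step can be sketched as follows, assuming it works like Kaldi's gmm-global-init-from-feats. The sketch keeps at most num_frames frames with reservoir sampling. It starts with initial_gauss_proportion * num_gauss Gaussians whose means are random frames, and gives every Gaussian the global variance. The function init_from_random_frames is hypothetical. The subsequent num_iters_init E-M iterations, during which Gaussians are split up to num_gauss, are not shown.

```python
import random


def init_from_random_frames(frames, num_gauss, initial_gauss_proportion=0.5,
                            num_frames=500000, seed=0):
    """Simplified sketch: return (weights, means, variances) of an initial GMM."""
    rng = random.Random(seed)
    kept, dim = [], None
    for i, frame in enumerate(frames):
        if dim is None:
            dim = len(frame)
        elif len(frame) != dim:
            raise ValueError('features have inconsistent dimensions')
        # reservoir sampling: keep at most num_frames frames in memory
        if i < num_frames:
            kept.append(list(frame))
        else:
            j = rng.randint(0, i)
            if j < num_frames:
                kept[j] = list(frame)
    # global per-dimension mean and variance of the kept frames
    n = len(kept)
    mean = [sum(f[d] for f in kept) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in kept) / n for d in range(dim)]
    # start with a fraction of the Gaussians; later splits add the rest
    num_init = max(1, int(num_gauss * initial_gauss_proportion))
    means = rng.sample(kept, num_init)  # means are random (kept) frames
    weights = [1.0 / num_init] * num_init
    variances = [list(var) for _ in range(num_init)]
    return weights, means, variances
```

Seeding a dedicated random.Random instance with seed makes the choice of initial frames reproducible without touching the global random state.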
References
gaussian_selection(feats_collection)
Precompute Gaussian indices for pruning. For each frame, gives a list of the num_gselect best Gaussian indices, sorted from best to worst.
Adapted from [kaldi-gselect].
- Parameters
feats_collection (FeaturesCollection) – The collection of features to select the best Gaussians from.
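Gaussian selection amounts to ranking the Gaussians for each frame and keeping the top num_gselect. The pure-Python sketch below assumes the ranking uses each Gaussian's weighted log-likelihood, that is, log weight plus diagonal Gaussian log-density. select_gaussians is a hypothetical stand-in operating on lists, not the method documented above.

```python
import heapq
import math


def loglike_diag(frame, weight, mean, var):
    """Weighted log-likelihood of one frame under one diagonal Gaussian."""
    ll = math.log(weight)
    for x, m, v in zip(frame, mean, var):
        ll -= 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return ll


def select_gaussians(frames, weights, means, variances, num_gselect=15):
    """For each frame, indices of the num_gselect best Gaussians, best first."""
    selection = []
    for frame in frames:
        scores = [loglike_diag(frame, w, m, v)
                  for w, m, v in zip(weights, means, variances)]
        n = min(num_gselect, len(scores))
        # nlargest returns the indices sorted from best to worst score
        best = heapq.nlargest(n, range(len(scores)), key=scores.__getitem__)
        selection.append(best)
    return selection


sel = select_gaussians([[0.1], [9.0]], [1 / 3] * 3, [[0.0], [5.0], [10.0]],
                       [[1.0]] * 3, num_gselect=2)
print(sel)  # [[0, 1], [2, 1]]
```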
References
gaussian_selection_to_post(feats_collection, min_post=None)
Get per-frame posteriors.
Given features and Gaussian-selection (gselect) information for a diagonal-covariance GMM, output per-frame posteriors for the selected indices. Also supports pruning the posteriors that are below a stated threshold (and renormalizing the rest to sum to one).
Adapted from [kaldi-gselect-to-post].
- Parameters
feats_collection (FeaturesCollection) – The collection of features to use to get the posteriors.
min_post (float, optional) – If specified, posteriors below this threshold are pruned away and the remaining ones are renormalized.
- Returns
posteriors (dict[str, list[list[tuple[int, float]]]]) – For each utterance, a list with one entry per frame of the corresponding features. For each frame, a list of (index, posterior) tuples, one for each Gaussian in this frame's selection whose posterior is positive (pruned Gaussians are omitted).
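The posterior computation and pruning described above can be sketched in plain Python. The sketch assumes that per-frame log-likelihoods of all Gaussians are already available as lists. It applies a softmax over the selected Gaussians, then, if min_post is given, pruning and renormalization. Keeping the single best Gaussian when everything would be pruned is a choice made for this sketch. selection_to_post is a hypothetical helper.

```python
import math


def selection_to_post(loglikes, selection, min_post=None):
    """Posteriors over each frame's selected Gaussians.

    loglikes[t][k] is the log-likelihood of frame t under Gaussian k and
    selection[t] lists the selected Gaussian indices for frame t.
    """
    posteriors = []
    for frame_ll, indices in zip(loglikes, selection):
        selected = [frame_ll[k] for k in indices]
        top = max(selected)
        exps = [math.exp(ll - top) for ll in selected]  # stable softmax
        total = sum(exps)
        post = [(k, e / total) for k, e in zip(indices, exps)]
        if min_post is not None:
            kept = [(k, p) for k, p in post if p >= min_post]
            if not kept:  # sketch choice: keep the best one, never an empty frame
                kept = [max(post, key=lambda kp: kp[1])]
            norm = sum(p for _, p in kept)
            post = [(k, p / norm) for k, p in kept]
        posteriors.append(post)
    return posteriors


loglikes = [[0.0, math.log(3.0), -50.0]]
print(selection_to_post(loglikes, [[1, 0]]))
```

Here the selected Gaussians 1 and 0 receive posteriors 0.75 and 0.25, while Gaussian 2 is ignored because it is not in the selection.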
References
accumulate(feats_collection, weights_collection=None, njobs=1)
Accumulate stats for training a diagonal-covariance GMM.
Adapted from [kaldi-acc].
- Parameters
feats_collection (FeaturesCollection) – The collection of features to use to accumulate stats.
weights_collection (dict[str, ndarray], optional) – For each utterance in the collection, an array of weights to apply to its feature frames. If specified, we must have weights_collection.keys() == feats_collection.keys(). Unweighted by default.
njobs (int, optional) – Number of threads to use for computation, defaults to 1.
- Returns
gmm_accs (kaldi.gmm.AccumDiagGmm) – The accumulated stats.
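What accumulate collects can be illustrated with plain lists. For every frame, each Gaussian's posterior is multiplied by the frame weight and added to that Gaussian's occupancy, first-order (sum of x) and second-order (sum of x squared) statistics. The sketch assumes features are given as a dict mapping utterance names to lists of frames. It computes full posteriors instead of using a Gaussian selection. accumulate_stats is hypothetical and does not produce a kaldi.gmm.AccumDiagGmm.

```python
import math


def posteriors(frame, gmm_weights, means, variances):
    """Posterior of each diagonal Gaussian given one frame."""
    lls = []
    for w, mu, var in zip(gmm_weights, means, variances):
        ll = math.log(w)
        for x, m, v in zip(frame, mu, var):
            ll -= 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        lls.append(ll)
    top = max(lls)
    exps = [math.exp(ll - top) for ll in lls]
    total = sum(exps)
    return [e / total for e in exps]


def accumulate_stats(feats_collection, gmm_weights, means, variances,
                     weights_collection=None):
    """Occupancies, first- and second-order stats summed over all utterances."""
    if (weights_collection is not None
            and weights_collection.keys() != feats_collection.keys()):
        raise ValueError('weights and features must cover the same utterances')
    num_gauss, dim = len(means), len(means[0])
    occ = [0.0] * num_gauss
    x1 = [[0.0] * dim for _ in range(num_gauss)]
    x2 = [[0.0] * dim for _ in range(num_gauss)]
    for utt, frames in feats_collection.items():
        if weights_collection is not None:
            frame_weights = weights_collection[utt]
        else:
            frame_weights = [1.0] * len(frames)  # unweighted by default
        for frame, fw in zip(frames, frame_weights):
            for k, post in enumerate(posteriors(frame, gmm_weights, means, variances)):
                g = fw * post  # frame weight times Gaussian posterior
                occ[k] += g
                for d, x in enumerate(frame):
                    x1[k][d] += g * x
                    x2[k][d] += g * x * x
    return occ, x1, x2
```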
References
estimate(gmm_accs, mixup=None, perturb_factor=0.01)
Estimate a diagonal-covariance GMM from the accumulated stats.
Adapted from [kaldi-gmm-est].
- Parameters
gmm_accs (kaldi.gmm.AccumDiagGmm) – Accumulated stats.
mixup (int, optional) – Increase the number of mixture components to this overall target.
perturb_factor (float, optional) – While mixing up, perturb the means by the standard deviation times this factor.
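From such statistics, a diagonal GMM is re-estimated with the usual maximum-likelihood formulas: weight = occupancy / total occupancy, mean = first-order stats / occupancy, variance = second-order stats / occupancy - mean squared. The sketch below also mixes up by repeatedly splitting the heaviest Gaussian, halving its weight and shifting the two copies by plus or minus perturb_factor times the standard deviation. This is a deterministic simplification, since the actual split may perturb along a random direction. The variance floor var_floor is a hypothetical addition, and every occupancy is assumed to be non-zero.

```python
import math


def estimate(occ, x1, x2, mixup=None, perturb_factor=0.01, var_floor=1e-3):
    """Re-estimate (weights, means, variances) from accumulated stats."""
    total = sum(occ)
    weights = [o / total for o in occ]
    means = [[s / o for s in row] for o, row in zip(occ, x1)]
    # var_floor is an assumption of this sketch, to avoid zero variances
    variances = [[max(s2 / o - m * m, var_floor) for s2, m in zip(row2, mu)]
                 for o, row2, mu in zip(occ, x2, means)]
    # mixing up: split the heaviest Gaussian until the target count is reached
    while mixup is not None and len(weights) < mixup:
        k = max(range(len(weights)), key=weights.__getitem__)
        shift = [perturb_factor * math.sqrt(v) for v in variances[k]]
        weights[k] /= 2
        weights.append(weights[k])
        means.append([m - d for m, d in zip(means[k], shift)])
        means[k] = [m + d for m, d in zip(means[k], shift)]
        variances.append(list(variances[k]))
    return weights, means, variances


print(estimate([2.0], [[4.0]], [[10.0]], mixup=2))
```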
References
process(utterances, njobs=1)
Train the UBM-GMM on the given utterances. First initialize the GMM, which sets the means to random data points and then does some iterations of E-M. Then train it for num_iters more iterations, using njobs threads in parallel.
- Parameters
utterances (Utterances) – The utterances to train the UBM on.
njobs (int, optional) – Number of threads to use for computation, defaults to 1.
- Raises
ValueError – On errors
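The overall flow of process can be illustrated with a toy, one-dimensional sketch: means start at random data points, then a few E-M iterations are run on every subsample-th frame. This is not shennong's implementation. Gaussian selection, the gradual splitting during initialization and multi-threading are left out, and the 1e-3 variance floor is an arbitrary assumption. train_ubm_1d is hypothetical.

```python
import math
import random


def train_ubm_1d(frames, num_gauss=2, num_iters=4, subsample=5, seed=0):
    """Toy 1-D GMM training: means at random frames, then E-M on subsampled frames."""
    rng = random.Random(seed)
    mean0 = sum(frames) / len(frames)
    var0 = sum((x - mean0) ** 2 for x in frames) / len(frames)
    weights = [1.0 / num_gauss] * num_gauss
    means = rng.sample(frames, num_gauss)  # means start at random data points
    variances = [var0] * num_gauss
    for _ in range(num_iters):
        occ, s1, s2 = [0.0] * num_gauss, [0.0] * num_gauss, [0.0] * num_gauss
        for x in frames[::subsample]:  # E-step on every subsample-th frame only
            lls = [math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
                   for w, m, v in zip(weights, means, variances)]
            top = max(lls)
            exps = [math.exp(ll - top) for ll in lls]
            total = sum(exps)
            for k, e in enumerate(exps):
                post = e / total
                occ[k] += post
                s1[k] += post * x
                s2[k] += post * x * x
        # M-step (assumes every Gaussian kept a non-zero occupancy)
        total_occ = sum(occ)
        weights = [o / total_occ for o in occ]
        means = [a / o for a, o in zip(s1, occ)]
        variances = [max(b / o - m * m, 1e-3) for b, o, m in zip(s2, occ, means)]
    return weights, means, variances


frames = [float(i % 7) for i in range(70)] + [20.0 + (i % 7) for i in range(70)]
print(train_ubm_1d(frames))
```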
-
property