Features extraction pipeline

High-level functions for a complete features extraction pipeline

This module exposes two main functions: get_default_config(), which generates a configuration for the pipeline given some arguments, and extract_features(), which takes a configuration and a list of utterances, extracts the features, applies the postprocessing and returns the extracted features as an instance of FeaturesCollection.


>>> from shennong.features.pipeline import get_default_config, extract_features

Generates a configuration for MFCC extraction (including CMVN normalization by speaker, delta / delta-delta and pitch). The configuration is a dictionary:

>>> config = get_default_config('mfcc')
>>> config.keys()
dict_keys(['mfcc', 'pitch', 'cmvn', 'delta'])

Generates the same configuration, but without CMVN and without delta:

>>> config = get_default_config('mfcc', with_cmvn=False, with_delta=False)
>>> config.keys()
dict_keys(['mfcc', 'pitch'])

The returned configuration is initialized with default parameters. These are suitable for most usages, but you can change them if needed. Here we use a Blackman window for frame extraction and change the min/max F0 frequency for pitch extraction:

>>> config['mfcc']['window_type'] = 'blackman'
>>> config['mfcc']['blackman_coeff'] = 0.42
>>> config['pitch']['min_f0'] = 25
>>> config['pitch']['max_f0'] = 400
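
Assigning to a misspelled key creates a new dictionary entry instead of changing the intended parameter. The following is a minimal sketch of a guard against that, assuming the configuration is a plain two-level dictionary as shown above. `update_config` is a hypothetical helper, not part of shennong, and the `defaults` dictionary is a stand-in with illustrative values rather than the real output of get_default_config().

```python
import copy


def update_config(config, overrides):
    """Return a copy of `config` with `overrides` applied.

    Hypothetical helper (not part of shennong): raises KeyError if a
    step or parameter named in `overrides` is absent from `config`.
    """
    updated = copy.deepcopy(config)
    for step, params in overrides.items():
        if step not in updated:
            raise KeyError(f'unknown pipeline step: {step}')
        for name, value in params.items():
            if name not in updated[step]:
                raise KeyError(f'unknown parameter for {step}: {name}')
            updated[step][name] = value
    return updated


# Stand-in for a configuration; the values are illustrative only
defaults = {
    'mfcc': {'window_type': 'povey', 'blackman_coeff': 0.42},
    'pitch': {'min_f0': 50, 'max_f0': 400}}

config = update_config(
    defaults, {'mfcc': {'window_type': 'blackman'}, 'pitch': {'min_f0': 25}})
```

The input dictionary is left untouched, so the defaults can be reused to build several variants of the same pipeline.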

Generates a list of utterances to extract the features on (here we have 2 utterances from the same speaker and same file):

>>> wav = './test/data/test.wav'
>>> utterances = [('utt1', wav, 'spk1', 0, 1), ('utt2', wav, 'spk1', 1, 1.5)]

Extract the features:

>>> features = extract_features(config, utterances, njobs=1)
>>> features.keys()
dict_keys(['utt1', 'utt2'])
>>> type(features['utt1'])
<class 'shennong.features.features.Features'>
>>> features['utt1'].shape
(98, 16)

The extracted features embed a property dictionary with information on the input audio, pipeline parameters, etc. The field ‘pipeline’ describes, as a list, the processing steps that were executed, as well as the columns of the resulting features matrix (here MFCCs are on columns 0 to 12 and pitch on columns 13 to 15):

>>> p = features['utt1'].properties
>>> p.keys()
dict_keys(['pipeline', 'mfcc', 'speaker', 'audio', 'pitch'])
>>> p['pipeline']
[{'name': 'mfcc', 'columns': [0, 12]}, {'name': 'pitch', 'columns': [13, 15]}]
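
The column ranges can be turned into slices to pick out one step's columns. This is a minimal sketch, assuming each 'columns' entry is an inclusive [first, last] pair as in the output above. `column_slices` is a hypothetical helper, not part of shennong, and a plain list stands in for one row of the features matrix.

```python
def column_slices(pipeline):
    """Map each step name to a slice over the features matrix columns.

    Assumes 'columns' holds an inclusive [first, last] pair, so the
    slice stops at last + 1.
    """
    return {
        step['name']: slice(step['columns'][0], step['columns'][1] + 1)
        for step in pipeline}


pipeline = [
    {'name': 'mfcc', 'columns': [0, 12]},
    {'name': 'pitch', 'columns': [13, 15]}]
slices = column_slices(pipeline)

# A single 16-column frame standing in for one row of the features matrix
frame = list(range(16))
mfcc_part = frame[slices['mfcc']]    # columns 0 to 12
pitch_part = frame[slices['pitch']]  # columns 13 to 15
```

The same slices can index the second axis of a two-dimensional array.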

Returns the list of features that can be extracted by the pipeline.

This list only includes main features extraction algorithms and excludes postprocessing. See also get_default_config().

shennong.features.pipeline.get_default_config(features, to_yaml=False, yaml_commented=True, with_pitch=True, with_cmvn=True, with_sliding_window_cmvn=False, with_delta=True, with_vtln=False)[source]

Returns the default configuration for the specified pipeline

The pipeline is specified with the main features it computes and the postprocessing steps it includes. The returned configuration can be a dictionary or a YAML formatted string.

  • features (str) – The features extracted by the pipeline, must be ‘mfcc’, ‘filterbank’, ‘plp’ or ‘bottleneck’. See also valid_features().

  • to_yaml (bool, optional) – If False the resulting configuration is a dict, if True it is a YAML formatted string ready to be written to a file. Defaults to False.

  • yaml_commented (bool, optional) – If True add the docstring of each parameter as a comment in the YAML string, if False do nothing. This option has an effect only if to_yaml is True. Defaults to True.

  • with_pitch (bool, optional) – Configure the pipeline for pitch extraction, defaults to True.

  • with_cmvn (bool, optional) – Configure the pipeline for CMVN normalization of the features, defaults to True.

  • with_sliding_window_cmvn (bool, optional) – Configure the pipeline for sliding window CMVN normalization of the features, defaults to False.

  • with_delta (bool, optional) – Configure the pipeline for extraction of the features’ deltas, defaults to True.

  • with_vtln

  • with_vad_trimming (bool, optional) – Configure the pipeline for removing silent frames, defaults to False.


config (dict or str) – If to_yaml is True returns a YAML formatted string ready to be written to a file, else returns a dictionary.


ValueError – If features is not in valid_features().

shennong.features.pipeline.extract_features(configuration, utterances_index, njobs=1, log=<RootLogger None (INFO)>)[source]

Speech features extraction pipeline

Given a pipeline configuration and an utterances_index defining a list of utterances on which to extract features, this function applies the whole pipeline and returns the extracted features as an instance of FeaturesCollection. It uses njobs parallel subprocesses.

The utterances in the utterances_index can be defined in one of the following formats (the format must be homogeneous across the index, i.e. only one format can be used):

  • 1-tuple (or str): <wav-file>

  • 2-tuple: <utterance-id> <wav-file>

  • 3-tuple: <utterance-id> <wav-file> <speaker-id>

  • 4-tuple: <utterance-id> <wav-file> <tstart> <tstop>

  • 5-tuple: <utterance-id> <wav-file> <speaker-id> <tstart> <tstop>

  • configuration (dict or str) – The pipeline configuration, can be a dictionary, a path to a YAML file or a string formatted in YAML. To get a configuration example, see get_default_config().

  • utterances_index (sequence of tuples) – The list of utterances to extract the features on.

  • njobs (int, optional) – The number of subprocesses to execute in parallel, use a single process by default.

  • log (logging.Logger) – A logger to display messages during pipeline execution.


features (FeaturesCollection) – The extracted speech features


ValueError – If the configuration or the utterances_index are invalid, or if something goes wrong during features extraction.
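
The utterance formats listed above can be checked before calling extract_features(). The following is a minimal sketch of such a check, not shennong's own validation: `parse_utterance` and `check_index` are hypothetical helpers, and the field names ('utt_id', 'file', 'speaker', 'tstart', 'tstop') are chosen for illustration only.

```python
# Field layout for each accepted entry length (names are illustrative)
FIELDS = {
    1: ('file',),
    2: ('utt_id', 'file'),
    3: ('utt_id', 'file', 'speaker'),
    4: ('utt_id', 'file', 'tstart', 'tstop'),
    5: ('utt_id', 'file', 'speaker', 'tstart', 'tstop')}


def parse_utterance(entry):
    """Return a dict describing one utterance from an index entry."""
    if isinstance(entry, str):
        entry = (entry,)  # a bare string is a 1-tuple holding the wav file
    if len(entry) not in FIELDS:
        raise ValueError(f'invalid utterance entry: {entry!r}')
    return dict(zip(FIELDS[len(entry)], entry))


def check_index(utterances_index):
    """Parse every entry, raising ValueError if formats are mixed."""
    parsed = [parse_utterance(entry) for entry in utterances_index]
    if len({tuple(p.keys()) for p in parsed}) > 1:
        raise ValueError('utterances index is not homogeneous')
    return parsed


wav = './test/data/test.wav'
parsed = check_index(
    [('utt1', wav, 'spk1', 0, 1), ('utt2', wav, 'spk1', 1, 1.5)])
```

Mixing, for example, a 2-tuple and a 5-tuple in the same index makes `check_index` raise ValueError, mirroring the homogeneity requirement stated above.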

shennong.features.pipeline.extract_features_warp(configuration, utterances_index, warp, njobs=1, log=<RootLogger None (INFO)>)[source]

Speech features extraction pipeline when all features are warped by the same factor. Used in the process method of the VtlnProcessor.