MFCC

Extraction of MFCC features from audio signals

Extract MFCC (Mel Frequency Cepstral Coefficients) from an audio signal. Uses the Kaldi implementation (see [kaldi-mfcc]):

Audio → MfccProcessor → Features

Examples

>>> from shennong.audio import Audio
>>> from shennong.processor.mfcc import MfccProcessor
>>> audio = Audio.load('./test/data/test.wav')

Initialize the MFCC processor with some options. Options can be specified at construction or set afterwards:

>>> processor = MfccProcessor(sample_rate=audio.sample_rate)
>>> processor.window_type = 'hanning'
>>> processor.low_freq = 20
>>> processor.high_freq = -100  # Nyquist - 100
>>> processor.use_energy = False  # use C0 instead

Compute the MFCC features with the specified options. The output is an instance of Features:

>>> mfcc = processor.process(audio)
>>> type(mfcc)
<class 'shennong.features.Features'>
>>> mfcc.shape[1] == processor.num_ceps
True

References

kaldi-mfcc

http://kaldi-asr.org/doc/feat.html#feat_mfcc

class shennong.processor.mfcc.MfccProcessor(sample_rate=16000, frame_shift=0.01, frame_length=0.025, dither=1.0, preemph_coeff=0.97, remove_dc_offset=True, window_type='povey', round_to_power_of_two=True, blackman_coeff=0.42, snip_edges=True, num_bins=23, low_freq=20, high_freq=0, vtln_low=100, vtln_high=-500, num_ceps=13, use_energy=True, energy_floor=0.0, raw_energy=True, cepstral_lifter=22.0, htk_compat=False)

Bases: shennong.processor.base.MelFeaturesProcessor

Mel Frequency Cepstral Coefficients

property name

Name of the processor

property blackman_coeff

Constant coefficient for generalized Blackman window

Used only if window_type is ‘blackman’

property dither

Amount of dithering

0.0 means no dither
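Dithering, as done in Kaldi, adds Gaussian noise scaled by dither to every sample so that digitally silent regions do not produce log(0). The function below is a standalone sketch of that idea, not shennong's code; the name dither and its seed argument are hypothetical.

```python
import random


def dither(samples, amount=1.0, seed=None):
    """Add Gaussian noise scaled by `amount` to each sample (sketch).

    An `amount` of 0.0 returns the samples unchanged (no dither).
    """
    if amount == 0.0:
        return list(samples)
    rng = random.Random(seed)  # local generator so results are reproducible
    return [s + amount * rng.gauss(0.0, 1.0) for s in samples]


print(dither([0.0, 0.5, -0.5], amount=0.0))  # [0.0, 0.5, -0.5]
```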

property frame_length

Frame length in seconds

property frame_shift

Frame shift in seconds
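frame_length and frame_shift are given in seconds, while framing operates on samples. A minimal sketch of the conversion, assuming a plain multiplication by sample_rate (the helper name is hypothetical, and rounding to the nearest integer is a choice made for this sketch):

```python
def seconds_to_samples(seconds, sample_rate):
    """Convert a duration in seconds to a whole number of samples (sketch)."""
    # round() protects against a product landing just below an integer
    return int(round(seconds * sample_rate))


frame_shift = seconds_to_samples(0.01, 16000)    # 160 samples
frame_length = seconds_to_samples(0.025, 16000)  # 400 samples
print(frame_shift, frame_length)  # 160 400
```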

get_params(deep=True)

Get parameters for this processor.

Parameters

deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True.

Returns

params (mapping of string to any) – Parameter names mapped to their values.

get_properties(**kwargs)

Return the processor's properties as a dictionary

property high_freq

High cutoff frequency for mel bins in Hertz

If high_freq <= 0, it is an offset from the Nyquist frequency (the default of 0 means the Nyquist frequency itself)
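To make the offset rule concrete, here is a sketch that resolves high_freq to an absolute cutoff. It assumes Kaldi's convention, in which a positive value is absolute and zero or a negative value is added to the Nyquist frequency; the function name is hypothetical.

```python
def resolve_high_freq(high_freq, sample_rate):
    """Return the absolute high cutoff in Hertz (sketch of Kaldi's convention).

    A positive value is used as-is; zero or a negative value is an offset
    from the Nyquist frequency (sample_rate / 2).
    """
    nyquist = 0.5 * sample_rate
    return high_freq if high_freq > 0 else nyquist + high_freq


print(resolve_high_freq(0, 16000))     # 8000.0 (the default: Nyquist itself)
print(resolve_high_freq(-100, 16000))  # 7900.0, as in the example above
print(resolve_high_freq(7000, 16000))  # 7000
```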

property log

Processor logger

property low_freq

Low cutoff frequency for mel bins in Hertz

property num_bins

Number of triangular mel-frequency bins

The minimal number of bins is 3
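The triangular bins are spaced evenly on the mel scale between low_freq and the resolved high cutoff, and neighbouring triangles share edges. The sketch below assumes the natural-log mel formula used by Kaldi (1127 ln(1 + f / 700)); the function names are hypothetical.

```python
import math


def mel_scale(freq):
    """Hertz to mel, using the natural-log formula found in Kaldi."""
    return 1127.0 * math.log(1.0 + freq / 700.0)


def inverse_mel_scale(mel):
    """Mel back to Hertz."""
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def mel_bin_edges(num_bins, low_freq, high_freq):
    """Return (left, center, right) in Hertz for each triangular bin (sketch).

    Bins are equally spaced on the mel scale; neighbouring triangles
    share edges, so num_bins bins need num_bins + 2 boundary points.
    """
    if num_bins < 3:
        raise ValueError("num_bins must be at least 3")
    mel_low, mel_high = mel_scale(low_freq), mel_scale(high_freq)
    delta = (mel_high - mel_low) / (num_bins + 1)
    edges = []
    for b in range(num_bins):
        left, center, right = (mel_low + (b + k) * delta for k in range(3))
        edges.append(tuple(inverse_mel_scale(m) for m in (left, center, right)))
    return edges


bins = mel_bin_edges(23, 20.0, 8000.0)
print(len(bins))  # 23
```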

property num_ceps

Number of cepstra in MFCC computation (including C0)

Must be smaller than or equal to num_bins
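The cepstra are obtained by a discrete cosine transform of the num_bins log mel energies, keeping only the first num_ceps outputs, which is why num_ceps cannot exceed num_bins. A sketch assuming the orthonormal DCT-II that Kaldi uses (function name hypothetical):

```python
import math


def dct_cepstra(log_mel_energies, num_ceps):
    """Keep the first num_ceps coefficients of an orthonormal DCT-II (sketch).

    C0 (k = 0) is included in the count, which is why num_ceps cannot
    exceed the number of mel bins.
    """
    n = len(log_mel_energies)
    if num_ceps > n:
        raise ValueError("num_ceps must be smaller than or equal to num_bins")
    ceps = []
    for k in range(num_ceps):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        ceps.append(scale * sum(
            e * math.cos(math.pi / n * (j + 0.5) * k)
            for j, e in enumerate(log_mel_energies)))
    return ceps


ceps = dct_cepstra([1.0] * 23, 13)
print(len(ceps))  # 13
```

For a flat spectrum, as in this usage example, all the information lands in C0 and the higher coefficients are zero up to rounding error.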

property preemph_coeff

Coefficient for use in signal preemphasis
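Preemphasis is a first-order high-pass filter applied to each frame. The sketch below assumes Kaldi's handling of the first sample, which has no predecessor and is therefore scaled by (1 - coeff); the function name is hypothetical.

```python
def preemphasize(frame, coeff=0.97):
    """Apply y[i] = x[i] - coeff * x[i - 1] to one frame (sketch).

    The first sample has no predecessor, so it is scaled by (1 - coeff).
    """
    if not frame:
        return []
    out = [frame[0] - coeff * frame[0]]
    out.extend(frame[i] - coeff * frame[i - 1] for i in range(1, len(frame)))
    return out


print(preemphasize([1.0, 1.0, 1.0], coeff=0.5))  # [0.5, 0.5, 0.5]
```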

process(signal, vtln_warp=1.0)

Compute features with the specified options

Do an optional feature-level vocal tract length normalization (VTLN) when vtln_warp != 1.0.

Parameters
  • signal (Audio, shape = [nsamples, 1]) – The input audio signal to compute the features on, must be mono

  • vtln_warp (float, optional) – The VTLN warping factor to be applied when computing features. Defaults to 1.0, meaning no warping.

Returns

features (Features, shape = [nframes, ndims]) – The computed features, output will have as many rows as there are frames (depends on the specified options frame_shift and frame_length).

Raises

ValueError – If the input signal has more than one channel (i.e. is not mono), or if sample_rate != signal.sample_rate.

process_all(utterances, njobs=None, **kwargs)

Returns features processed from several input utterances

This function processes the features in parallel jobs.

Parameters
  • utterances (Utterances) – The utterances on which to compute features.

  • njobs (int, optional) – The number of parallel jobs to run in background. Defaults to the number of CPU cores available on the machine.

  • **kwargs (dict, optional) – Extra arguments to be forwarded to the process method. Keys must be the same as for utterances.

Returns

features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input utterances.

Raises

ValueError – If the njobs parameter is <= 0, or if an entry is missing in the optional kwargs.

property remove_dc_offset

If True, subtract mean from waveform on each frame
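Removing the DC offset simply centers each frame around zero. A minimal sketch (the function name is hypothetical):

```python
def remove_dc_offset(frame):
    """Subtract the frame mean from every sample (sketch)."""
    mean = sum(frame) / len(frame)
    return [s - mean for s in frame]


print(remove_dc_offset([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```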

property round_to_power_of_two

If true, round window size to power of two

This is done by zero-padding input to FFT
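With the default 25 ms frames at 16 kHz, a window holds 400 samples, which is zero-padded to a 512-point FFT when rounding is enabled. A sketch of that computation, assuming the window size is frame_length * sample_rate rounded to an integer (function name hypothetical):

```python
def padded_fft_size(frame_length, sample_rate, round_to_power_of_two=True):
    """Number of FFT points for one frame (sketch).

    The window holds frame_length * sample_rate samples; when rounding is
    enabled it is zero-padded up to the next power of two.
    """
    window_size = int(round(frame_length * sample_rate))
    if not round_to_power_of_two:
        return window_size
    size = 1
    while size < window_size:
        size *= 2
    return size


print(padded_fft_size(0.025, 16000))         # 512 (400 samples padded)
print(padded_fft_size(0.025, 16000, False))  # 400
```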

property sample_rate

Waveform sample frequency in Hertz

Must match the sample rate of the signal passed to process()

set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')

Change level and/or format of the processor’s logger

Parameters
  • level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.

  • formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default, displays the level, name and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display time, level, name and message.

set_params(**params)

Set the parameters of this processor.

Returns

self

Raises

ValueError – If any given parameter in params is invalid for the processor.

property snip_edges

If true, output only frames that completely fit in the file

When True, the number of frames depends on frame_length. When False, it depends only on frame_shift, and the data is reflected at the ends.
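Assuming Kaldi's frame-counting rules, the effect of snip_edges can be sketched with integer arithmetic. Note that frame_shift and frame_length are in samples here, not seconds, and the function name is hypothetical.

```python
def num_frames(num_samples, frame_shift, frame_length, snip_edges=True):
    """Number of frames produced for a signal, all sizes in samples (sketch).

    With snip_edges=True only frames fully inside the signal are kept, so
    the count depends on frame_length; otherwise it depends only on
    frame_shift, the missing edges being filled by reflection.
    """
    if snip_edges:
        if num_samples < frame_length:
            return 0
        return 1 + (num_samples - frame_length) // frame_shift
    return (num_samples + frame_shift // 2) // frame_shift


# 1 second at 16 kHz, 10 ms shift (160 samples), 25 ms length (400 samples)
print(num_frames(16000, 160, 400, snip_edges=True))   # 98
print(num_frames(16000, 160, 400, snip_edges=False))  # 100
```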

times(nframes)

Returns the time labels for the rows returned by process()
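This page does not say whether the labels are frame starts, frame centers or (start, stop) pairs. The sketch below is a hypothetical illustration, not shennong's implementation: it assumes snip_edges=True and returns one (start, stop) interval in seconds per frame.

```python
def frame_times(nframes, frame_shift=0.01, frame_length=0.025):
    """Hypothetical (start, stop) times in seconds for each frame.

    Assumes snip_edges=True, where frame i starts at i * frame_shift and
    lasts frame_length; this is an illustration, not shennong's code.
    """
    return [(i * frame_shift, i * frame_shift + frame_length)
            for i in range(nframes)]


for start, stop in frame_times(3):
    print(f"{start:.3f} {stop:.3f}")
```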

property vtln_high

High inflection point in piecewise linear VTLN warping function

In Hertz. If vtln_high < 0, offset from high_freq

property vtln_low

Low inflection point in piecewise linear VTLN warping function

In Hertz
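vtln_low and vtln_high are the two inflection points of a piecewise linear frequency warp driven by the vtln_warp argument of process(). The sketch below is modeled on Kaldi's warping scheme as an assumption. Between the inflection points, frequencies are scaled by 1 / warp; the outer segments are stretched so that low_freq and high_freq stay fixed. The function name is hypothetical, and vtln_high is taken as an absolute frequency. It also assumes low_freq < vtln_low and vtln_high < high_freq.

```python
def vtln_warp_freq(freq, warp, low_freq, high_freq, vtln_low, vtln_high):
    """Piecewise linear VTLN warping of one frequency in Hertz (sketch).

    Frequencies between the two inflection points are scaled by 1 / warp;
    the outer segments are stretched so low_freq and high_freq stay fixed.
    vtln_high is assumed absolute (any negative offset already resolved).
    """
    if freq < low_freq or freq > high_freq:
        return freq
    scale = 1.0 / warp
    low = vtln_low * max(1.0, warp)    # inflection points move with warp
    high = vtln_high * min(1.0, warp)
    if freq < low:
        scale_left = (scale * low - low_freq) / (low - low_freq)
        return low_freq + scale_left * (freq - low_freq)
    if freq < high:
        return scale * freq
    scale_right = (high_freq - scale * high) / (high_freq - high)
    return high_freq + scale_right * (freq - high_freq)


# defaults at 16 kHz: low_freq=20, high_freq=8000, vtln_low=100, vtln_high=8000-500
print(vtln_warp_freq(1000.0, 1.1, 20.0, 8000.0, 100.0, 7500.0))  # about 909.09
```

A warping factor of 1.0 leaves every frequency unchanged, matching the documented default of process().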

property window_type

Type of window

Must be ‘hamming’, ‘hanning’, ‘povey’, ‘rectangular’ or ‘blackman’
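The five window types can be sketched with the formulas Kaldi uses, under the assumption that shennong follows them unchanged. 'povey' (the default) is a Hann window raised to the power 0.85, and blackman_coeff only affects 'blackman'. The function name is hypothetical.

```python
import math


def window(window_type, size, blackman_coeff=0.42):
    """Return `size` window weights following Kaldi's formulas (sketch)."""
    if size < 2:
        raise ValueError("window size must be at least 2")
    a = 2.0 * math.pi / (size - 1)
    weights = []
    for i in range(size):
        if window_type == "hanning":
            w = 0.5 - 0.5 * math.cos(a * i)
        elif window_type == "hamming":
            w = 0.54 - 0.46 * math.cos(a * i)
        elif window_type == "povey":
            # Hann window raised to the power 0.85
            w = (0.5 - 0.5 * math.cos(a * i)) ** 0.85
        elif window_type == "rectangular":
            w = 1.0
        elif window_type == "blackman":
            w = (blackman_coeff - 0.5 * math.cos(a * i)
                 + (0.5 - blackman_coeff) * math.cos(2 * a * i))
        else:
            raise ValueError(f"unknown window type: {window_type}")
        weights.append(w)
    return weights


print(window("rectangular", 4))  # [1.0, 1.0, 1.0, 1.0]
```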

property use_energy

Use energy (instead of C0) in MFCC computation

property energy_floor

Floor on energy (absolute, not relative) in MFCC computation

property raw_energy

If true, compute energy before preemphasis and windowing
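When use_energy is True, the first coefficient is the log energy of the frame instead of C0. A sketch of that value with energy_floor applied, assuming Kaldi's definition (the log of the sum of squared samples). The function name is hypothetical, and the epsilon guard against log(0) uses Python's double-precision epsilon, which may differ from the library's own guard. With raw_energy=True, the frame would be taken before preemphasis and windowing.

```python
import math
import sys


def log_energy(frame, energy_floor=0.0):
    """Log of the frame energy (sum of squared samples), with an optional floor."""
    energy = sum(s * s for s in frame)
    # clamp to a tiny positive value so silent frames do not hit log(0)
    log_e = math.log(max(energy, sys.float_info.epsilon))
    if energy_floor > 0.0:
        # energy_floor is absolute, so it is compared in the log domain
        log_e = max(log_e, math.log(energy_floor))
    return log_e


print(log_energy([0.0, 0.0], energy_floor=1.0))  # 0.0
```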

property cepstral_lifter

Constant that controls scaling of MFCCs
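In Kaldi, the lifter constant Q scales coefficient i by 1 + (Q / 2) sin(πi / Q), which boosts the higher cepstra, and Q = 0 disables liftering. A sketch under the assumption that shennong uses the same rule (function names hypothetical):

```python
import math


def lifter_coeffs(num_ceps, cepstral_lifter=22.0):
    """Per-coefficient scale factors 1 + (Q / 2) * sin(pi * i / Q) (sketch)."""
    q = cepstral_lifter
    return [1.0 + 0.5 * q * math.sin(math.pi * i / q) for i in range(num_ceps)]


def apply_lifter(ceps, cepstral_lifter=22.0):
    """Multiply each cepstral coefficient by its lifter weight; Q = 0 disables it."""
    if cepstral_lifter == 0.0:
        return list(ceps)
    weights = lifter_coeffs(len(ceps), cepstral_lifter)
    return [c * w for c, w in zip(ceps, weights)]


print(lifter_coeffs(13)[0])  # 1.0 (C0 is left unscaled)
```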

property htk_compat

If True, get closer to HTK MFCC features

Put energy or C0 last and use a factor of sqrt(2) on C0.

Warning: Not sufficient to get HTK compatible features (need to change other parameters).
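The reordering described above can be sketched as follows. It assumes the coefficient vector holds energy or C0 at index 0, and that sqrt(2) is applied only when C0 is used (use_energy=False), as in Kaldi. The function name is hypothetical.

```python
import math


def to_htk_order(ceps, use_energy=True):
    """Move coefficient 0 (energy or C0) to the end, HTK-style (sketch).

    When C0 is used instead of energy it is also multiplied by sqrt(2).
    """
    first = ceps[0] if use_energy else ceps[0] * math.sqrt(2.0)
    return list(ceps[1:]) + [first]


print(to_htk_order([5.0, 1.0, 2.0]))  # [1.0, 2.0, 5.0]
```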

property ndims

Dimension of the output feature frames