PLP

Provides the PlpProcessor class to extract PLP features

Extract PLP (Perceptual Linear Predictive analysis of speech) from an audio signal. Uses the Kaldi implementation (see [Hermansky1990] and [kaldi-plp]). Optionally apply RASTA filtering (see [Herm94]).

Audio → PlpProcessor → Features

Examples

>>> from shennong.audio import Audio
>>> from shennong.processor.plp import PlpProcessor
>>> audio = Audio.load('./test/data/test.wav')

Initialize the PLP processor with some options. Options can be specified at construction, or after:

>>> processor = PlpProcessor()
>>> processor.sample_rate = audio.sample_rate

Here we enable RASTA filtering:

>>> processor.rasta = True

Compute the PLP features with the specified options, the output is an instance of Features:

>>> plp = processor.process(audio)
>>> type(plp)
<class 'shennong.features.Features'>
>>> plp.shape[1] == processor.num_ceps
True

References

Hermansky1990: H. Hermansky, “Perceptual linear predictive (PLP) analysis of speech”, Journal of the Acoustical Society of America, vol. 87, no. 4, pp. 1738–1752, 1990.
Herm94: H. Hermansky and N. Morgan, “RASTA processing of speech”, IEEE Trans. on Speech and Audio Proc., vol. 2, no. 4, pp. 578-589, Oct. 1994.
kaldi-plp: http://kaldi-asr.org/doc/feat.html#feat_plp

class shennong.processor.plp.RastaFilter(size)

Bases: object

RASTA filter for the RASTA-PLP implementation

Reimplemented from [labrosa] and [rastapy] to work on a frame-by-frame basis, whereas the original implementations filter the whole signal at once.

Parameters: size (int) – The dimension of the frames to filter

References

labrosa: https://labrosa.ee.columbia.edu/matlab/rastamat/
rastapy: https://github.com/mystlee/rasta_py

reset(): Initializes the filter state

filter(frame, do_log=True)

RASTA filtering of a mel frame

Parameters

frame (numpy array, shape = [size, 1]) – The frame vector to filter.
do_log (bool, optional) – When True, move to the log domain before filtering and take the inverse log (exponential) afterwards. When False, the frame is expected to be in the log domain already. Defaults to True.

Returns

filtered (numpy array, shape = [size, 1]) – The filtered frame.
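The frame-by-frame formulation can be illustrated with a small IIR filter that keeps the last few input frames and the last output frame as its state. This is a sketch, not shennong's RastaFilter: the class name `FrameRasta` is hypothetical, the coefficients (an FIR numerator of [0.2, 0.1, 0, -0.1, -0.2] and a pole at 0.94, as in rastamat as far as we know) are assumptions, and the filter simply starts from a zero state instead of reproducing any special handling of the first frames.

```python
import numpy as np


class FrameRasta:
    """Hypothetical frame-by-frame RASTA filter (a sketch, not shennong's class).

    For every band independently, computes
        y[t] = sum_k numer[k] * x[t - k] + pole * y[t - 1]
    with an assumed numerator and pole.
    """

    def __init__(self, size, pole=0.94):
        self.size = size
        self.pole = pole
        # assumed FIR part: a 5-tap smoothed derivative across frames
        self.numer = np.array([0.2, 0.1, 0.0, -0.1, -0.2])
        self.reset()

    def reset(self):
        # history[0] is the most recent input frame; all state starts at zero
        self.history = np.zeros((len(self.numer), self.size))
        self.last_output = np.zeros(self.size)

    def filter(self, frame, do_log=True):
        shape = np.shape(frame)
        x = np.asarray(frame, dtype=float).reshape(self.size)
        if do_log:
            x = np.log(x)  # move to the log domain before filtering
        # shift the history by one frame and store the new input at index 0
        self.history = np.roll(self.history, 1, axis=0)
        self.history[0] = x
        y = self.numer @ self.history + self.pole * self.last_output
        self.last_output = y
        if do_log:
            y = np.exp(y)  # back from the log domain
        return y.reshape(shape)
```

Because the numerator coefficients sum to zero, a band whose (log) value stays constant is driven towards zero after filtering, which is how RASTA suppresses slowly varying components such as a fixed channel.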

class shennong.processor.plp.PlpProcessor(sample_rate=16000, frame_shift=0.01, frame_length=0.025, rasta=False, dither=1.0, preemph_coeff=0.97, remove_dc_offset=True, window_type='povey', round_to_power_of_two=True, blackman_coeff=0.42, snip_edges=True, num_bins=23, low_freq=20, high_freq=0, vtln_low=100, vtln_high=-500, lpc_order=12, num_ceps=13, use_energy=True, energy_floor=0.0, raw_energy=True, compress_factor=0.3333333333333333, cepstral_lifter=22, cepstral_scale=1.0, htk_compat=False)

Bases: shennong.processor.base.MelFeaturesProcessor

Perceptual linear predictive features

property name: Name of the processor

property rasta: Whether to do RASTA filtering

property lpc_order: Order of LPC analysis in PLP computation

property num_ceps

Number of cepstra in PLP computation (including C0)

Must be positive and less than or equal to lpc_order + 1.
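One way to see where the `lpc_order + 1` bound comes from is the standard LPC-to-cepstrum recursion: from `p = lpc_order` predictor coefficients it yields `p` cepstral coefficients c1..cp, and prepending the zeroth coefficient (energy or C0) gives at most `p + 1` values. The sketch below is modeled on Kaldi's `Lpc2Cepstrum` as we understand it; the sign convention A(z) = 1 + Σ a_k z^-k and the helper names `lpc_to_cepstrum` and `plp_vector` are assumptions for illustration.

```python
def lpc_to_cepstrum(lpc):
    """Cepstrum of the all-pole model 1 / A(z), A(z) = 1 + sum_k lpc[k-1] z^-k.

    Returns len(lpc) coefficients c_1 .. c_p (C0 is not included).
    """
    ceps = []
    for i in range(len(lpc)):
        # c_n = -a_n - (1/n) * sum_{j=1}^{n-1} (n - j) * a_j * c_{n-j}, with n = i + 1
        acc = sum((i - j) * lpc[j] * ceps[i - j - 1] for j in range(i))
        ceps.append(-lpc[i] - acc / (i + 1))
    return ceps


def plp_vector(zeroth, ceps, num_ceps):
    """Hypothetical assembly of one frame: [energy or C0, c_1, ..., c_{num_ceps-1}]."""
    if not 0 < num_ceps <= len(ceps) + 1:
        raise ValueError("num_ceps must be in [1, lpc_order + 1]")
    return [zeroth] + list(ceps[:num_ceps - 1])
```

For the single-pole model 1 / (1 - 0.5 z^-1), written here as `lpc_to_cepstrum([-0.5, 0.0, 0.0])`, the recursion returns c_n = 0.5^n / n, matching the series expansion of log(1 / (1 - r z^-1)).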

property blackman_coeff

Constant coefficient for generalized Blackman window

Used only if window_type is ‘blackman’

property dither

Amount of dithering

0.0 means no dither
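Dithering adds a small amount of random noise to the waveform, typically so that quantities such as the log energy stay finite on stretches of digital silence. A minimal sketch, assuming Gaussian noise whose standard deviation is the `dither` value (which is how Kaldi dithers, as far as we know); the function name `apply_dither` is hypothetical.

```python
import numpy as np


def apply_dither(waveform, dither=1.0, rng=None):
    """Return a copy of `waveform` with Gaussian noise of std `dither` added (sketch)."""
    waveform = np.asarray(waveform, dtype=float)
    if dither == 0.0:
        return waveform.copy()  # 0.0 means no dither
    rng = np.random.default_rng() if rng is None else rng
    return waveform + dither * rng.standard_normal(waveform.shape)
```

Passing an explicit `numpy.random.Generator` makes the added noise reproducible across runs.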

property frame_length: Frame length in seconds

property frame_shift: Frame shift in seconds

get_params(deep=True)

Get parameters for this processor.

Parameters: deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True.
Returns: params (mapping of string to any) – Parameter names mapped to their values.

get_properties(**kwargs): Return the processor's properties as a dictionary

property high_freq

High cutoff frequency for mel bins in Hertz

If high_freq <= 0, it is an offset from the Nyquist frequency (the default of 0 means the Nyquist frequency itself)
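Resolving the effective cutoff can be sketched as follows. This follows Kaldi's convention as we understand it, where any value that is not strictly positive is treated as an offset from the Nyquist frequency, so the default `high_freq=0` selects the Nyquist frequency itself. The helper name is hypothetical, and the range check illustrates the kind of validation one would expect rather than shennong's exact behavior.

```python
def resolve_mel_band(low_freq, high_freq, sample_rate):
    """Return (low, high) cutoffs in Hz, treating high_freq <= 0 as an offset."""
    nyquist = 0.5 * sample_rate
    high = high_freq if high_freq > 0 else nyquist + high_freq
    if not 0.0 <= low_freq < high <= nyquist:
        raise ValueError(
            f"invalid mel band [{low_freq}, {high}] for Nyquist {nyquist}")
    return low_freq, high
```

For example, at 16 kHz `high_freq=-400` resolves to 7600 Hz.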

property log: Processor logger

property low_freq: Low cutoff frequency for mel bins in Hertz

property num_bins

Number of triangular mel-frequency bins

The minimal number of bins is 3
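The triangular bins can be sketched as follows: `num_bins + 2` points equally spaced on the mel scale between `low_freq` and `high_freq` define overlapping triangles, each rising from its left edge to 1 at its center and falling back to 0 at its right edge. The mel formula `1127 ln(1 + f / 700)` and this layout follow Kaldi's `MelBanks` as we understand it; the function names are hypothetical and VTLN warping is ignored.

```python
import numpy as np


def mel_scale(freq):
    """Mel value of a frequency in Hz (Kaldi-style natural-log formula)."""
    return 1127.0 * np.log(1.0 + np.asarray(freq, dtype=float) / 700.0)


def mel_filterbank(num_bins, fft_size, sample_rate, low_freq, high_freq):
    """Triangular mel filters as a (num_bins, fft_size // 2) weight matrix (sketch)."""
    if num_bins < 3:
        raise ValueError("num_bins must be at least 3")
    # center frequency of each FFT bin below Nyquist
    freqs = np.arange(fft_size // 2) * sample_rate / fft_size
    mels = mel_scale(freqs)
    mel_low, mel_high = mel_scale(low_freq), mel_scale(high_freq)
    # num_bins + 2 equally spaced mel points -> num_bins overlapping triangles
    delta = (mel_high - mel_low) / (num_bins + 1)
    bank = np.zeros((num_bins, freqs.size))
    for b in range(num_bins):
        left, center, right = mel_low + delta * np.array([b, b + 1, b + 2])
        rising = (mels > left) & (mels <= center)
        falling = (mels > center) & (mels < right)
        bank[b, rising] = (mels[rising] - left) / (center - left)
        bank[b, falling] = (right - mels[falling]) / (right - center)
    return bank
```

With the default options (23 bins, 20 Hz to 8 kHz, 512-point FFT at 16 kHz) this yields a 23 x 256 weight matrix.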

property preemph_coeff: Coefficient for use in signal preemphasis
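Preemphasis is a first-order high-pass filter, y[n] = x[n] - coeff * x[n-1], applied within each frame. A minimal sketch, assuming Kaldi's handling of the first sample, which has no predecessor and is simply multiplied by 1 - coeff; the function name is hypothetical.

```python
import numpy as np


def preemphasize(frame, coeff=0.97):
    """First-order preemphasis of one frame (sketch)."""
    frame = np.asarray(frame, dtype=float)
    out = frame.copy()
    out[1:] -= coeff * frame[:-1]  # y[n] = x[n] - coeff * x[n - 1]
    out[0] -= coeff * frame[0]     # first sample: y[0] = (1 - coeff) * x[0]
    return out
```

A constant frame of ones therefore becomes a constant frame of 0.03 with the default coefficient, showing how preemphasis attenuates low frequencies.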

process_all(utterances, njobs=None, **kwargs)

Returns features processed from several input utterances

This function processes the features in parallel jobs.

Parameters

utterances (Utterances) – The utterances on which to compute features.
njobs (int, optional) – The number of parallel jobs to run in background. Defaults to the number of CPU cores available on the machine.
**kwargs (dict, optional) – Extra arguments to be forwarded to the process method. Keys must be the same as for utterances.

Returns

features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input utterances.

Raises

ValueError – If the njobs parameter is <= 0 or if an entry is missing in optional kwargs.

property remove_dc_offset: If True, subtract mean from waveform on each frame

property round_to_power_of_two

If True, round window size to power of two

This is done by zero-padding input to FFT
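With the default options, a 25 ms frame at 16 kHz spans 400 samples, which this option pads to a 512-point FFT. The helpers below sketch that arithmetic; their names are hypothetical, and rounding the frame length to the nearest sample is an assumption.

```python
def window_size_in_samples(frame_length, sample_rate):
    """Frame length in seconds -> number of samples (rounded, an assumption)."""
    return int(round(frame_length * sample_rate))


def padded_window_size(window_size, round_to_power_of_two=True):
    """FFT size: the next power of two >= window_size, or window_size itself."""
    if not round_to_power_of_two:
        return window_size
    return 1 << (window_size - 1).bit_length()
```

Using `bit_length` avoids floating-point logarithms, so exact powers of two such as 512 are left unchanged.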

property sample_rate

Waveform sample frequency in Hertz

Must match the sample rate of the signal passed to process()

set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')

Change level and/or format of the processor’s logger

Parameters

level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.
formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default display level and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display time, level, name and message.

set_params(**params)

Set the parameters of this processor.

Returns: self
Raises: ValueError – If any given parameter in params is invalid for the processor.

property snip_edges

If True, output only frames that completely fit in the file

When True the number of frames depends on the frame_length. If False, the number of frames depends only on the frame_shift, and we reflect the data at the ends.
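The two frame-count rules can be written down explicitly. The sketch below mirrors Kaldi's `NumFrames` as we understand it, with window size and shift expressed in samples (400 and 160 for the default 25 ms frame length and 10 ms frame shift at 16 kHz); the function name is hypothetical.

```python
def num_frames(num_samples, window_size, window_shift, snip_edges=True):
    """Number of frames produced for a signal of num_samples samples (sketch)."""
    if snip_edges:
        # only frames that fit entirely inside the signal
        if num_samples < window_size:
            return 0
        return 1 + (num_samples - window_size) // window_shift
    # depends only on the shift; edges are handled by reflecting the signal
    return (num_samples + window_shift // 2) // window_shift
```

For one second of 16 kHz audio this gives 98 frames with `snip_edges=True` and 100 frames otherwise.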

times(nframes): Returns the time labels for the rows returned by process()

property use_energy: Use energy (instead of C0) for zeroth PLP feature

property vtln_high

High inflection point in piecewise linear VTLN warping function

In Hertz. If vtln_high < 0, offset from high_freq

property vtln_low

Low inflection point in piecewise linear VTLN warping function

In Hertz
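The piecewise linear warping function can be sketched as below, modeled on Kaldi's `VtlnWarpFreq` as we understand it. Frequencies between the two (warp-adjusted) inflection points are scaled by `1 / vtln_warp`, and the outer segments are stretched linearly so that `low_freq` and `high_freq` map to themselves. For simplicity, `vtln_high` is assumed to be already resolved to an absolute frequency in Hertz, and the function name is hypothetical.

```python
def vtln_warp_freq(freq, warp, low_freq, high_freq, vtln_low, vtln_high):
    """Piecewise linear VTLN warping of one frequency in Hz (sketch)."""
    if freq < low_freq or freq > high_freq:
        return freq  # outside the mel band: unchanged
    # warp-adjusted inflection points
    l = vtln_low * max(1.0, warp)
    h = vtln_high * min(1.0, warp)
    scale = 1.0 / warp
    fl, fh = scale * l, scale * h
    # slopes of the outer segments, chosen so the function is continuous
    scale_left = (fl - low_freq) / (l - low_freq)
    scale_right = (high_freq - fh) / (high_freq - h)
    if freq < l:
        return low_freq + scale_left * (freq - low_freq)
    if freq < h:
        return scale * freq
    return high_freq + scale_right * (freq - high_freq)
```

With `warp=1.0` all three segments have slope 1, so the function reduces to the identity, consistent with 1.0 meaning no warping.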

property window_type

Type of window

Must be ‘hamming’, ‘hanning’, ‘povey’, ‘rectangular’ or ‘blackman’
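The five window shapes can be sketched with NumPy as below, following the formulas used by Kaldi as we understand them. In particular, the 'povey' window is a Hann window raised to the power 0.85, and the Blackman window is parameterized by `blackman_coeff`. The function name is hypothetical.

```python
import numpy as np


def frame_window(window_type, size, blackman_coeff=0.42):
    """Window of `size` samples for the given window_type (sketch)."""
    i = np.arange(size)
    a = 2.0 * np.pi / (size - 1)
    if window_type == "hanning":
        return 0.5 - 0.5 * np.cos(a * i)
    if window_type == "hamming":
        return 0.54 - 0.46 * np.cos(a * i)
    if window_type == "povey":
        # Hann window raised to the power 0.85: zero at the edges
        return (0.5 - 0.5 * np.cos(a * i)) ** 0.85
    if window_type == "rectangular":
        return np.ones(size)
    if window_type == "blackman":
        # generalized Blackman window parameterized by blackman_coeff
        return (blackman_coeff - 0.5 * np.cos(a * i)
                + (0.5 - blackman_coeff) * np.cos(2.0 * a * i))
    raise ValueError(f"unknown window_type: {window_type!r}")
```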

property energy_floor: Floor on energy (absolute, not relative) in PLP computation

property raw_energy: If True, compute energy before preemphasis and windowing

property compress_factor: Compression factor in PLP computation

property cepstral_lifter: Constant that controls scaling of PLPs

property cepstral_scale: Scaling constant in PLP computation
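Two of these constants have simple closed forms that can be sketched directly. Compression raises the band energies to the power `compress_factor` (a cube root by default), approximating the intensity-loudness power law used in PLP; liftering multiplies the i-th cepstral coefficient by `1 + (Q / 2) sin(pi i / Q)` with `Q = cepstral_lifter`, the sinusoidal lifter used by Kaldi as we understand it. The function names are hypothetical, and `cepstral_scale` is left out of the sketch.

```python
import math


def compress(energies, compress_factor=1.0 / 3.0):
    """Power-law compression of band energies (cube root by default)."""
    return [e ** compress_factor for e in energies]


def lifter_coeffs(num_ceps, cepstral_lifter=22.0):
    """Sinusoidal lifter weights; a lifter of 0 leaves the cepstra unchanged."""
    if cepstral_lifter == 0.0:
        return [1.0] * num_ceps
    q = cepstral_lifter
    return [1.0 + 0.5 * q * math.sin(math.pi * i / q) for i in range(num_ceps)]


def apply_lifter(ceps, cepstral_lifter=22.0):
    """Multiply each cepstral coefficient by its lifter weight."""
    weights = lifter_coeffs(len(ceps), cepstral_lifter)
    return [c * w for c, w in zip(ceps, weights)]
```

With the default `cepstral_lifter=22`, the weight is 1 for C0 and peaks at 12 for the 11th coefficient, boosting the higher-order cepstra relative to the lower ones.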

property htk_compat

If True, get closer to HTK PLP features

Put energy or C0 last.

Warning: Not sufficient to get HTK-compatible features (other parameters must be changed as well)

property ndims: Dimension of the output features frames

process(signal, vtln_warp=1.0)

Compute PLP features (RASTA-filtered if rasta is True) with the specified options

Do an optional feature-level vocal tract length normalization (VTLN) when vtln_warp != 1.0.

Parameters

signal (Audio, shape = [nsamples, 1]) – The input audio signal to compute the features on, must be mono
vtln_warp (float, optional) – The VTLN warping factor to be applied when computing features. Defaults to 1.0, meaning no warping is done.

Returns

features (Features, shape = [nframes, ndims]) – The computed features, output will have as many rows as there are frames (depends on the specified options frame_shift and frame_length).

Raises

ValueError – If the input signal has more than one channel (i.e. is not mono), or if sample_rate != signal.sample_rate.