PLP

Provides the PlpProcessor class to extract PLP features

Extracts PLP (Perceptual Linear Predictive) features from an audio signal, using the Kaldi implementation (see [Hermansky1990] and [kaldi-plp]). RASTA filtering can optionally be applied (see [Herm94]).

Audio -> PlpProcessor -> Features

Examples

>>> from shennong.audio import Audio
>>> from shennong.processor.plp import PlpProcessor
>>> audio = Audio.load('./test/data/test.wav')

Initialize the PLP processor with some options. Options can be specified at construction, or after:

>>> processor = PlpProcessor()
>>> processor.sample_rate = audio.sample_rate

Here we apply RASTA filtering:

>>> processor.rasta = True

Compute the PLP features with the specified options; the output is an instance of Features:

>>> plp = processor.process(audio)
>>> type(plp)
<class 'shennong.features.Features'>
>>> plp.shape[1] == processor.num_ceps
True

References

Hermansky1990

H. Hermansky, “Perceptual linear predictive (PLP) analysis of speech”, Journal of the Acoustical Society of America, vol. 87, no. 4, pp. 1738–1752, 1990.

Herm94

H. Hermansky and N. Morgan, “RASTA processing of speech”, IEEE Transactions on Speech and Audio Processing, vol. 2, no. 4, pp. 578–589, Oct. 1994.

kaldi-plp

http://kaldi-asr.org/doc/feat.html#feat_plp

class shennong.processor.plp.RastaFilter(size)

Bases: object

RASTA filter for the RASTA-PLP implementation

Reimplemented from [labrosa] and [rastapy] to work on a frame-by-frame basis, whereas the original implementations process the whole signal at once.

Parameters

size (int) – The dimension of the frames to filter

References

labrosa

https://labrosa.ee.columbia.edu/matlab/rastamat/

rastapy

https://github.com/mystlee/rasta_py

reset()

Initializes the filter state

filter(frame, do_log=True)

RASTA filtering of a mel frame

Parameters
  • frame (numpy array, shape = [size, 1]) – The frame vector to filter.

  • do_log (bool, optional) – When True, move to the log domain before filtering and apply the inverse log (exponential) afterwards. When False, the frame is expected to be in the log domain already. Defaults to True.

Returns

filtered (numpy array, shape = [size, 1]) – The filtered frame.
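To illustrate what a frame-by-frame RASTA filter must carry between calls, here is a minimal sketch. It is not shennong's code: the class name FrameRastaFilter is hypothetical, and the coefficients (an FIR numerator proportional to [2, 1, 0, -1, -2] and a single pole at 0.94) are the values found in the rastamat MATLAB code [labrosa], taken here as an assumption. Seeding the input history with the first frame is also our own choice, made so that a constant input produces no start-up transient.

```python
import numpy as np


class FrameRastaFilter:
    """Hypothetical frame-by-frame RASTA filter (a sketch, not shennong's code).

    On each of the `size` bands independently, computes
        y[t] = sum_k b[k] * x[t - k] + a * y[t - 1]
    with b = [0.2, 0.1, 0.0, -0.1, -0.2] and a = 0.94 (rastamat values).
    """

    # FIR numerator: -[-2, -1, 0, 1, 2] / 10, a regression-slope kernel
    NUMER = np.array([0.2, 0.1, 0.0, -0.1, -0.2])
    POLE = 0.94

    def __init__(self, size):
        self.size = size
        self.reset()

    def reset(self):
        # past inputs x[t-1] .. x[t-4] (most recent first), set on first call
        self._past_x = None
        # previous output y[t-1]
        self._past_y = np.zeros(self.size)

    def filter(self, frame, do_log=True):
        x = np.asarray(frame, dtype=float).reshape(self.size)
        if do_log:
            x = np.log(x)
        if self._past_x is None:
            # assumption: act as if the signal was constant before frame 0
            self._past_x = np.tile(x, (4, 1))
        # rows are x[t], x[t-1], ..., x[t-4]
        history = np.vstack([x[np.newaxis, :], self._past_x])
        y = self.NUMER @ history + self.POLE * self._past_y
        # shift the input history and keep the output for the IIR part
        self._past_x = history[:4]
        self._past_y = y
        if do_log:
            y = np.exp(y)
        return y.reshape(self.size, 1)
```

Because the numerator coefficients sum to zero, the filter removes any constant offset in each band; in the log domain such an offset corresponds to a fixed channel gain, which is what RASTA is designed to suppress.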

class shennong.processor.plp.PlpProcessor(sample_rate=16000, frame_shift=0.01, frame_length=0.025, rasta=False, dither=1.0, preemph_coeff=0.97, remove_dc_offset=True, window_type='povey', round_to_power_of_two=True, blackman_coeff=0.42, snip_edges=True, num_bins=23, low_freq=20, high_freq=0, vtln_low=100, vtln_high=-500, lpc_order=12, num_ceps=13, use_energy=True, energy_floor=0.0, raw_energy=True, compress_factor=0.3333333333333333, cepstral_lifter=22, cepstral_scale=1.0, htk_compat=False)

Bases: shennong.processor.base.MelFeaturesProcessor

Perceptual linear predictive features

property name

Name of the processor

property rasta

Whether to do RASTA filtering

property lpc_order

Order of LPC analysis in PLP computation

property num_ceps

Number of cepstra in PLP computation (including C0)

Must be positive and smaller than or equal to lpc_order + 1.
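The bound on num_ceps can be understood through the LPC-to-cepstrum recursion, sketched below under our own assumptions: the predictor polynomial is written A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, the recursion yields one cepstral coefficient per LPC coefficient, and C0 is computed separately (from the energy or the model gain). With lpc_order = p this gives at most p + 1 coefficients including C0. Both function names are hypothetical.

```python
import numpy as np


def lpc_to_cepstrum(lpc):
    """Cepstrum c_1 .. c_p of the all-pole model 1 / A(z).

    Assumes A(z) = 1 + sum_k a_k z^-k with lpc = [a_1, ..., a_p].
    """
    lpc = np.asarray(lpc, dtype=float)
    p = len(lpc)
    cep = np.zeros(p)
    for n in range(1, p + 1):
        # c_n = -a_n - (1 / n) * sum_{k=1}^{n-1} k * c_k * a_{n-k}
        acc = sum(k * cep[k - 1] * lpc[n - k - 1] for k in range(1, n))
        cep[n - 1] = -lpc[n - 1] - acc / n
    return cep


def check_num_ceps(num_ceps, lpc_order):
    """Raise ValueError unless 0 < num_ceps <= lpc_order + 1."""
    if not 0 < num_ceps <= lpc_order + 1:
        raise ValueError(
            f"num_ceps must be in [1, {lpc_order + 1}], it is {num_ceps}")
```

For a one-pole model (only a_1 = a non-zero) the result can be checked against the closed form c_n = (-a)^n / n.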

property blackman_coeff

Constant coefficient for generalized Blackman window

Used only if window_type is ‘blackman’

property dither

Amount of dithering

0.0 means no dither
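Dithering adds a small amount of random noise to the waveform, which mainly avoids taking the log of a zero energy on frames of digital silence. A minimal sketch, assuming Gaussian noise with standard deviation dither added to every sample (as Kaldi does); the function name add_dither is hypothetical.

```python
import numpy as np


def add_dither(waveform, dither=1.0, rng=None):
    """Add N(0, dither^2) noise to each sample; dither=0.0 is a no-op."""
    waveform = np.asarray(waveform, dtype=float)
    if dither == 0.0:
        return waveform.copy()
    rng = np.random.default_rng() if rng is None else rng
    return waveform + dither * rng.standard_normal(waveform.shape)
```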

property frame_length

Frame length in seconds

property frame_shift

Frame shift in seconds

get_params(deep=True)

Get parameters for this processor.

Parameters

deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True.

Returns

params (mapping of string to any) – Parameter names mapped to their values.

get_properties(**kwargs)

Return the processor's properties as a dictionary

property high_freq

High cutoff frequency for mel bins in Hertz

If high_freq <= 0, it is interpreted as an offset from the Nyquist frequency (the default of 0 therefore means the Nyquist frequency)

property log

Processor logger

property low_freq

Low cutoff frequency for mel bins in Hertz

property num_bins

Number of triangular mel-frequency bins

The minimal number of bins is 3
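The way low_freq, high_freq and num_bins interact can be sketched as follows: the [low_freq, high_freq] range is mapped to the mel scale and split into num_bins + 2 equally spaced points, each triangular bin spanning three consecutive points. This assumes the Kaldi mel formula 1127 ln(1 + f / 700) and treats any high_freq <= 0 as an offset from the Nyquist frequency; the function names are hypothetical.

```python
import numpy as np


def mel_scale(freq):
    # Kaldi-style mel scale: 1127 * ln(1 + f / 700)
    return 1127.0 * np.log(1.0 + np.asarray(freq, dtype=float) / 700.0)


def mel_bin_edges(num_bins=23, low_freq=20.0, high_freq=0.0, sample_rate=16000):
    """Return an array of shape [num_bins, 3]: (left, center, right) in mels."""
    if num_bins < 3:
        raise ValueError("num_bins must be >= 3")
    nyquist = 0.5 * sample_rate
    # high_freq <= 0 is an offset from the Nyquist frequency
    if high_freq <= 0.0:
        high_freq = nyquist + high_freq
    if not 0.0 <= low_freq < high_freq <= nyquist:
        raise ValueError("invalid low_freq / high_freq")
    # num_bins + 2 equally spaced points on the mel scale
    points = np.linspace(mel_scale(low_freq), mel_scale(high_freq), num_bins + 2)
    return np.stack([points[:-2], points[1:-1], points[2:]], axis=1)
```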

property preemph_coeff

Coefficient for use in signal preemphasis
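Preemphasis is a first-order high-pass filter, y[i] = x[i] - preemph_coeff * x[i - 1]. The sketch below applies it to a single frame, assuming Kaldi's convention for the first sample of the frame (which has no predecessor and is simply scaled by 1 - preemph_coeff); the function name is hypothetical.

```python
import numpy as np


def preemphasize(frame, preemph_coeff=0.97):
    """Return x[i] - preemph_coeff * x[i - 1] for a non-empty frame."""
    frame = np.asarray(frame, dtype=float)
    out = np.empty_like(frame)
    out[1:] = frame[1:] - preemph_coeff * frame[:-1]
    # the first sample has no predecessor within the frame: use itself
    out[0] = frame[0] - preemph_coeff * frame[0]
    return out
```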

process_all(utterances, njobs=None, **kwargs)

Returns features processed from several input utterances

This function processes the features in parallel jobs.

Parameters
  • utterances (Utterances) – The utterances on which to compute features.

  • njobs (int, optional) – The number of parallel jobs to run in background. Defaults to the number of CPU cores available on the machine.

  • **kwargs (dict, optional) – Extra arguments to be forwarded to the process method. Keys must be the same as for utterances.

Returns

features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input utterances.

Raises

ValueError – If the njobs parameter is <= 0 or if an entry is missing in optional kwargs.
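The contract described above (parallel jobs, per-utterance keyword arguments, ValueError on a bad njobs or a missing entry) can be sketched with the standard library. This is not shennong's implementation: utterances are modeled as a plain dict mapping names to signals, a thread pool stands in for whatever parallel backend the library actually uses, and the function name is hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def process_all(process, utterances, njobs=None, **kwargs):
    """Apply process(signal, **options) to every utterance, in parallel.

    utterances maps names to signals; each keyword argument maps the same
    names to a per-utterance option (for instance vtln_warp).
    """
    if njobs is None:
        njobs = os.cpu_count() or 1
    if njobs <= 0:
        raise ValueError(f"njobs must be strictly positive, it is {njobs}")
    for key, values in kwargs.items():
        missing = set(utterances) - set(values)
        if missing:
            raise ValueError(f"missing {key} entries for: {sorted(missing)}")

    def job(name):
        # gather this utterance's options from every keyword argument
        options = {key: values[name] for key, values in kwargs.items()}
        return name, process(utterances[name], **options)

    with ThreadPoolExecutor(max_workers=njobs) as pool:
        return dict(pool.map(job, utterances))
```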

property remove_dc_offset

If True, subtract mean from waveform on each frame

property round_to_power_of_two

If true, round window size to power of two

This is done by zero-padding input to FFT
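For the default 25 ms frames at 16 kHz, a window holds 400 samples, which are zero-padded to a 512-point FFT. A sketch of that size computation; the function name and the rounding of the window size to the nearest sample are assumptions.

```python
def padded_fft_size(frame_length=0.025, sample_rate=16000,
                    round_to_power_of_two=True):
    """Number of FFT points used for one frame (hypothetical helper)."""
    window_size = int(round(frame_length * sample_rate))
    if not round_to_power_of_two:
        return window_size
    # smallest power of two >= window_size
    size = 1
    while size < window_size:
        size *= 2
    return size
```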

property sample_rate

Waveform sample frequency in Hertz

Must match the sample rate of the signal specified in process

set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')

Change level and/or format of the processor’s logger

Parameters
  • level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.

  • formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default, displays the level, the logger name and the message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to also display the time.
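A sketch of what such a level and format change amounts to with Python's standard logging module. The helper name configure_logger and the string-to-level mapping are assumptions, not shennong's code.

```python
import logging

LEVELS = {"debug": logging.DEBUG, "info": logging.INFO,
          "warning": logging.WARNING, "error": logging.ERROR}


def configure_logger(name, level,
                     formatter="%(levelname)s - %(name)s - %(message)s"):
    """Set the level and message format of the logger called `name`."""
    if level not in LEVELS:
        raise ValueError(f"invalid log level: {level}")
    logger = logging.getLogger(name)
    logger.setLevel(LEVELS[level])
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(formatter))
    # replace previous handlers so that messages are not printed twice
    for old in list(logger.handlers):
        logger.removeHandler(old)
    logger.addHandler(handler)
    return logger
```

With level ‘warning’, warnings and errors are emitted while debug and info messages are dropped.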

set_params(**params)

Set the parameters of this processor.

Returns

self

Raises

ValueError – If any given parameter in params is invalid for the processor.

property snip_edges

If true, output only frames that completely fit in the file

When True the number of frames depends on the frame_length. If False, the number of frames depends only on the frame_shift, and we reflect the data at the ends.
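The effect of snip_edges on the frame count can be made concrete with a sketch of Kaldi's frame-counting rule, assumed here to carry over to this processor; the function name is hypothetical. For one second of audio at 16 kHz with the default options, it gives 98 frames with snipping and 100 without.

```python
def num_frames(num_samples, sample_rate=16000, frame_shift=0.01,
               frame_length=0.025, snip_edges=True):
    """Number of frames extracted from num_samples samples (Kaldi-style)."""
    shift = int(round(frame_shift * sample_rate))
    length = int(round(frame_length * sample_rate))
    if snip_edges:
        # only frames lying entirely inside the signal
        if num_samples < length:
            return 0
        return 1 + (num_samples - length) // shift
    # otherwise the count depends only on the frame shift
    return (num_samples + shift // 2) // shift
```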

times(nframes)

Returns the time labels for the rows output by process()

property use_energy

Use energy (instead of C0) for zeroth PLP feature

property vtln_high

High inflection point in piecewise linear VTLN warping function

In Hertz. If vtln_high < 0, offset from high_freq

property vtln_low

Low inflection point in piecewise linear VTLN warping function

In Hertz

property window_type

Type of window

Must be ‘hamming’, ‘hanning’, ‘povey’, ‘rectangular’ or ‘blackman’
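For reference, the five window types can be written as below, with N the window size and a = 2π / (N - 1). These are the definitions used in Kaldi's feature extraction, given here as a sketch; the function name make_window is hypothetical. The povey window is the Hann window raised to the power 0.85, and blackman_coeff only enters the blackman case.

```python
import numpy as np


def make_window(window_type, size, blackman_coeff=0.42):
    """Return a window of `size` samples (hypothetical helper)."""
    i = np.arange(size)
    a = 2.0 * np.pi / (size - 1)
    if window_type == "hanning":
        return 0.5 - 0.5 * np.cos(a * i)
    if window_type == "hamming":
        return 0.54 - 0.46 * np.cos(a * i)
    if window_type == "povey":
        # Hann window raised to the power 0.85
        return (0.5 - 0.5 * np.cos(a * i)) ** 0.85
    if window_type == "rectangular":
        return np.ones(size)
    if window_type == "blackman":
        return (blackman_coeff - 0.5 * np.cos(a * i)
                + (0.5 - blackman_coeff) * np.cos(2 * a * i))
    raise ValueError(f"unknown window type: {window_type}")
```

With blackman_coeff = 0.42, the blackman case reduces to the classic Blackman window.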

property energy_floor

Floor on energy (absolute, not relative) in PLP computation

property raw_energy

If true, compute energy before preemphasis and windowing
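A sketch of how raw_energy and energy_floor could combine, assuming Kaldi's processing order within a frame (DC offset removal, then the raw energy, then preemphasis, then windowing) and a povey window. EPS and both function names are our own assumptions.

```python
import numpy as np

# assumption: a small constant that keeps log() finite on silent frames
EPS = float(np.finfo(np.float32).eps)


def log_energy(frame, energy_floor=0.0):
    """Log of the frame energy, optionally floored at energy_floor."""
    log_e = np.log(max(float(np.dot(frame, frame)), EPS))
    if energy_floor > 0.0:
        log_e = max(log_e, np.log(energy_floor))
    return log_e


def frame_log_energy(frame, raw_energy=True, remove_dc_offset=True,
                     preemph_coeff=0.97, energy_floor=0.0):
    """Hypothetical helper: log energy before or after preemphasis/windowing."""
    frame = np.asarray(frame, dtype=float).copy()
    if remove_dc_offset:
        frame -= frame.mean()
    if raw_energy:
        return log_energy(frame, energy_floor)
    # preemphasis (first sample scaled by 1 - preemph_coeff) ...
    frame[1:] -= preemph_coeff * frame[:-1]
    frame[0] -= preemph_coeff * frame[0]
    # ... followed by a povey window
    n = np.arange(len(frame))
    frame *= (0.5 - 0.5 * np.cos(2.0 * np.pi * n / (len(frame) - 1))) ** 0.85
    return log_energy(frame, energy_floor)
```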

property compress_factor

Compression factor in PLP computation
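The default compress_factor of 1/3 corresponds to the cube-root intensity-to-loudness power law used in [Hermansky1990]. A one-line sketch, assuming the factor is applied as an exponent to the positive mel-band energies; the function name is hypothetical.

```python
import numpy as np


def compress(energies, compress_factor=1.0 / 3.0):
    """Power-law (by default cube-root) compression of band energies."""
    return np.power(np.asarray(energies, dtype=float), compress_factor)
```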

property cepstral_lifter

Constant that controls scaling of PLPs
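Cepstral liftering multiplies coefficient i by 1 + (Q / 2) sin(π i / Q), where Q is cepstral_lifter, which boosts the higher-order cepstra relative to C0 and the first few coefficients. The sketch assumes Kaldi's formula, with Q = 0 disabling liftering; the function name is hypothetical.

```python
import numpy as np


def lifter_coeffs(num_ceps=13, cepstral_lifter=22.0):
    """Per-coefficient weights to multiply the cepstra by."""
    if cepstral_lifter == 0.0:
        return np.ones(num_ceps)
    i = np.arange(num_ceps)
    return 1.0 + 0.5 * cepstral_lifter * np.sin(np.pi * i / cepstral_lifter)
```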

property cepstral_scale

Scaling constant in PLP computation

property htk_compat

If True, get closer to HTK PLP features

Put energy or C0 last.
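Putting energy or C0 last amounts to rotating the feature columns so that the zeroth coefficient moves to the end. A minimal sketch on a [nframes, ndims] array; the function name is hypothetical, and any other HTK-specific scaling is left out.

```python
import numpy as np


def move_c0_last(features):
    """Move column 0 (energy or C0) after all the other columns."""
    features = np.asarray(features)
    return np.concatenate([features[..., 1:], features[..., :1]], axis=-1)
```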

Warning: this alone is not sufficient to get HTK-compatible features (other parameters need to be changed as well)

property ndims

Dimension of the output features frames

process(signal, vtln_warp=1.0)

Compute PLP features, optionally RASTA-filtered, with the specified options

Do an optional feature-level vocal tract length normalization (VTLN) when vtln_warp != 1.0.

Parameters
  • signal (Audio, shape = [nsamples, 1]) – The input audio signal to compute the features on, must be mono

  • vtln_warp (float, optional) – The VTLN warping factor to apply when computing features. Defaults to 1.0, meaning no warping is done.

Returns

features (Features, shape = [nframes, ndims]) – The computed features, output will have as many rows as there are frames (depends on the specified options frame_shift and frame_length).

Raises

ValueError – If the input signal has more than one channel (i.e. is not mono), or if sample_rate != signal.sample_rate.