RASTA-PLP

Extraction of RASTA-PLP features from a speech signal

Audio -> RastaPlpProcessor -> Features

Implementation of the RASTA-PLP feature extraction algorithm (see [labrosa] and [rastapy] for implementations and [Herm94] for the paper).

Examples

Compute RASTA-PLP features on some speech signal:

>>> from shennong.audio import Audio
>>> from shennong.processor.rastaplp import RastaPlpProcessor
>>> audio = Audio.load('./test/data/test.wav')
>>> processor = RastaPlpProcessor(order=8)
>>> features = processor.process(audio)
>>> features.shape
(140, 9)

The output dimension depends on the PLP order parameter:

>>> processor.order = 10
>>> features = processor.process(audio)
>>> features.shape
(140, 11)
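
Both shapes above are consistent with one column per model order plus one extra column. A minimal sketch of that relation, assuming it holds for every order in [1, 12]; `expected_ndims` is a hypothetical helper for illustration and is not part of shennong:

```python
def expected_ndims(order):
    """Guess the number of feature columns for a given PLP order.

    Assumption: the output has `order + 1` columns, as observed in the
    two examples above (order=8 gives 9, order=10 gives 11). The case
    order=0 (no PLP) is deliberately not covered here.
    """
    if not isinstance(order, int) or not 1 <= order <= 12:
        raise ValueError('order must be an integer in [1, 12]')
    return order + 1


print(expected_ndims(8))
print(expected_ndims(10))
```

When in doubt, the `ndims` property of the processor is the value to rely on.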

References

labrosa

https://labrosa.ee.columbia.edu/matlab/rastamat/

rastapy

https://github.com/mystlee/rasta_py

Herm94

H. Hermansky and N. Morgan, “RASTA processing of speech”, IEEE Trans. on Speech and Audio Proc., vol. 2, no. 4, pp. 578-589, Oct. 1994.

class shennong.processor.rastaplp.RastaPlpProcessor(sample_rate=16000, do_rasta=True, order=8, frame_shift=0.01, frame_length=0.025, dither=1.0, preemph_coeff=0.97, remove_dc_offset=True, window_type='povey', round_to_power_of_two=True, blackman_coeff=0.42, snip_edges=True)

Bases: shennong.processor.base.FramesProcessor

property name

Name of the processor

property ndims

Dimension of the output feature frames

property do_rasta

If False, compute PLP features only, without RASTA filtering. Defaults to True.
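
For orientation, RASTA processing band-pass filters the time trajectory of each log-spectral band, which suppresses slowly varying channel effects [Herm94]. The sketch below is a simplified pure-Python version of such a filter. The coefficients (pole at 0.94, where the paper uses 0.98) are the ones found in the rastamat code [labrosa]; that this processor uses the same values is an assumption, and `rasta_filter` is a hypothetical name:

```python
def rasta_filter(trajectory, pole=0.94):
    """Band-pass filter one log-spectral band over time (simplified).

    Transfer function:
        H(z) = (0.2 + 0.1 z^-1 - 0.1 z^-3 - 0.2 z^-4) / (1 - pole z^-1)

    Assumption: the filter starts from a zero state, with no special
    handling of the first frames.
    """
    numerator = [0.2, 0.1, 0.0, -0.1, -0.2]
    output = []
    previous = 0.0
    for n in range(len(trajectory)):
        # FIR part: weighted sum of the current and four past samples
        fir = sum(
            b * trajectory[n - k]
            for k, b in enumerate(numerator) if n - k >= 0)
        # IIR part: a single real pole acting as a leaky integrator
        previous = fir + pole * previous
        output.append(previous)
    return output


# A constant offset (e.g. a fixed channel gain in the log domain)
filtered = rasta_filter([1.0] * 200)
```

Because the numerator coefficients sum to zero, a constant offset in the input decays towards zero in the output after the initial transient.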

property blackman_coeff

Constant coefficient for generalized Blackman window

Used only if window_type is ‘blackman’

property dither

Amount of dithering

0.0 means no dither

property frame_length

Frame length in seconds

property frame_shift

Frame shift in seconds

get_params(deep=True)

Get parameters for this processor.

Parameters

deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True.

Returns

params (mapping of string to any) – Parameter names mapped to their values.

get_properties()

Return the processor's properties as a dictionary

property log

Processor logger

property order

Order of the PLP model

Must be an integer in [0, 12], 0 means no PLP

property preemph_coeff

Coefficient for use in signal preemphasis
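
Pre-emphasis is commonly a first-order high-pass filter applied within each frame. A minimal sketch, assuming the Kaldi convention in which the first sample of a frame is filtered against itself; `preemphasize` is a hypothetical helper and may differ from what the processor does internally:

```python
def preemphasize(frame, coeff=0.97):
    """Return a copy of `frame` with y[n] = x[n] - coeff * x[n-1].

    Assumption: the first sample has no predecessor inside the frame,
    so it is computed as x[0] - coeff * x[0] (Kaldi convention).
    """
    if not 0.0 <= coeff <= 1.0:
        raise ValueError('coeff must be in [0, 1]')
    if len(frame) == 0:
        return []
    first = frame[0] - coeff * frame[0]
    rest = [frame[n] - coeff * frame[n - 1] for n in range(1, len(frame))]
    return [first] + rest


print(preemphasize([1.0, 1.0, 1.0], coeff=0.5))
```

With `coeff=0.0` the frame is returned unchanged, which corresponds to disabling pre-emphasis.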

process_all(signals, njobs=None)

Returns features processed from several input signals

This function processes the features in parallel jobs.

Parameters
  • signals (dict of Audio) – A dictionary of input audio signals to process features on, where the keys are item names and the values are audio signals.

  • njobs (int, optional) – The number of parallel jobs to run in background. Defaults to the number of CPU cores available on the machine.

Returns

features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input signals.

Raises

ValueError – If the njobs parameter is <= 0

property remove_dc_offset

If True, subtract mean from waveform on each frame

property round_to_power_of_two

If True, round the window size up to a power of two

This is done by zero-padding the input to the FFT
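
As an illustration of the rounding, the sketch below computes the FFT input size for one frame. It assumes the window size in samples is `sample_rate * frame_length` rounded to the nearest integer; `padded_window_size` is a hypothetical helper:

```python
def padded_window_size(sample_rate=16000, frame_length=0.025,
                       round_to_power_of_two=True):
    """Return the FFT input size in samples for one frame."""
    # Assumption: the frame length is converted to samples by rounding
    window_size = int(round(sample_rate * frame_length))
    if not round_to_power_of_two:
        return window_size
    # Smallest power of two that is >= window_size
    padded = 1
    while padded < window_size:
        padded *= 2
    return padded


print(padded_window_size())
print(padded_window_size(round_to_power_of_two=False))
```

With the default parameters a frame of 400 samples is zero-padded to 512.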

property sample_rate

Waveform sample frequency in Hertz

Must match the sample rate of the signal passed to process()

set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')

Change level and/or format of the processor’s logger

Parameters
  • level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.

  • formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default, displays level, name and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display time, level, name and message.
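
The two format strings mentioned above can be compared using the standard logging module alone. This sketch does not call shennong; the logger name 'rastaplp' and the message are placeholders chosen for illustration:

```python
import logging

default_format = '%(levelname)s - %(name)s - %(message)s'
timed_format = '%(asctime)s - %(levelname)s - %(name)s - %(message)s'

# A hand-built record; name, path and message are placeholders
record = logging.LogRecord(
    name='rastaplp', level=logging.INFO, pathname='example.py', lineno=0,
    msg='processing %d frames', args=(140,), exc_info=None)

default_line = logging.Formatter(default_format).format(record)
timed_line = logging.Formatter(timed_format).format(record)

print(default_line)  # INFO - rastaplp - processing 140 frames
print(timed_line)    # same line, prefixed with a timestamp
```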

set_params(**params)

Set the parameters of this processor.

Returns

self

Raises

ValueError – If any given parameter in params is invalid for the processor.

property snip_edges

If true, output only frames that completely fit in the file

When True the number of frames depends on the frame_length. If False, the number of frames depends only on the frame_shift, and we reflect the data at the ends.
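
The effect on the frame count can be sketched as follows. The formulas follow the Kaldi framing convention, which these parameter names mirror; whether this processor matches them sample for sample is an assumption, and `num_frames` is a hypothetical helper:

```python
def num_frames(num_samples, sample_rate=16000, frame_shift=0.01,
               frame_length=0.025, snip_edges=True):
    """Estimate the number of frames produced for a signal."""
    shift = int(round(sample_rate * frame_shift))
    length = int(round(sample_rate * frame_length))
    if snip_edges:
        # Only frames that fit entirely in the signal are kept
        if num_samples < length:
            return 0
        return 1 + (num_samples - length) // shift
    # Frames are centered and the signal is reflected at the ends,
    # so the count depends on the frame shift only
    return (num_samples + shift // 2) // shift


print(num_frames(16000, snip_edges=True))
print(num_frames(16000, snip_edges=False))
```

For one second of audio at 16 kHz with the default framing, this gives 98 frames with `snip_edges=True` and 100 with `snip_edges=False`.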

times(nframes)

Returns the time labels for the rows returned by process()

property window_type

Type of window

Must be ‘hamming’, ‘hanning’, ‘povey’, ‘rectangular’ or ‘blackman’
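
The window shapes can be sketched in pure Python. The definitions below are the Kaldi ones, from which these option names appear to be taken; treating them as the exact formulas used by this processor is an assumption, and `make_window` is a hypothetical helper:

```python
import math


def make_window(window_type, size, blackman_coeff=0.42):
    """Return the window coefficients as a list of floats."""
    if size < 2:
        raise ValueError('size must be at least 2')
    step = 2.0 * math.pi / (size - 1)
    window = []
    for n in range(size):
        if window_type == 'hanning':
            value = 0.5 - 0.5 * math.cos(step * n)
        elif window_type == 'hamming':
            value = 0.54 - 0.46 * math.cos(step * n)
        elif window_type == 'povey':
            # Hanning window raised to the power 0.85
            value = (0.5 - 0.5 * math.cos(step * n)) ** 0.85
        elif window_type == 'rectangular':
            value = 1.0
        elif window_type == 'blackman':
            # Generalized Blackman window, see blackman_coeff
            value = (blackman_coeff - 0.5 * math.cos(step * n)
                     + (0.5 - blackman_coeff) * math.cos(2 * step * n))
        else:
            raise ValueError('unknown window type: ' + str(window_type))
        window.append(value)
    return window


for name in ('hamming', 'hanning', 'povey', 'rectangular', 'blackman'):
    print(name, [round(v, 3) for v in make_window(name, 5)])
```

The `blackman_coeff` argument only affects the result when `window_type` is 'blackman'.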

process(signal)

Returns features processed from an input signal

Parameters

signal (Audio) – The input audio signal to process features on

Returns

features (Features) – The computed features