PLP
Provides the PlpProcessor class to extract PLP features
Extract PLP (Perceptual Linear Predictive analysis of speech) from an audio signal. Uses the Kaldi implementation (see [Hermansky1990] and [kaldi-plp]). Optionally apply RASTA filtering (see [Herm94]).
Examples
>>> from shennong.audio import Audio
>>> from shennong.processor.plp import PlpProcessor
>>> audio = Audio.load('./test/data/test.wav')
Initialize the PLP processor with some options. Options can be specified at construction, or after:
>>> processor = PlpProcessor()
>>> processor.sample_rate = audio.sample_rate
Here we enable RASTA filtering:
>>> processor.rasta = True
Compute the PLP features with the specified options. The output is an instance of Features:
>>> plp = processor.process(audio)
>>> type(plp)
<class 'shennong.features.Features'>
>>> plp.shape[1] == processor.num_ceps
True
References
- Hermansky1990
H. Hermansky, “Perceptual linear predictive (PLP) analysis of speech”, Journal of the Acoustical Society of America, vol. 87, no. 4, pp. 1738–1752, 1990.
- Herm94
H. Hermansky and N. Morgan, “RASTA processing of speech”, IEEE Trans. on Speech and Audio Proc., vol. 2, no. 4, pp. 578-589, Oct. 1994.
- kaldi-plp
-
class shennong.processor.plp.RastaFilter(size)
Bases: object
RASTA filter for the RASTA-PLP implementation
Reimplemented after [labrosa] and [rastapy] to operate frame by frame, whereas the original implementations process the whole signal at once.
- Parameters
size (int) – The dimension of the frames to filter
References
-
filter(frame, do_log=True)
RASTA filtering of a mel frame
- Parameters
frame (numpy array, shape = [size, 1]) – The frame vector to filter.
do_log (bool, optional) – When True, move to the log domain before filtering and back to the linear domain afterwards. When False, the frame is expected to be in the log domain already. Defaults to True.
- Returns
filtered (numpy array, shape = [size, 1]) – The filtered frame.
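To illustrate what "frame by frame" means, here is a minimal sketch of a stateful RASTA filter. It assumes the coefficients of the labrosa rastafilt.m reference (FIR numerator 0.2, 0.1, 0, -0.1, -0.2 and one pole at 0.94) and a zero initial state; the class name FrameRastaFilter is hypothetical, frames are plain lists rather than numpy arrays, and shennong's RastaFilter may treat the first frames differently.

```python
import math
from collections import deque


class FrameRastaFilter:
    """Hypothetical frame-by-frame RASTA filter (sketch, not shennong's code).

    Computes y[t] = 0.94 * y[t-1] + sum_k b[k] * x[t-k] on each dimension,
    with b = [0.2, 0.1, 0.0, -0.1, -0.2] (assumed, after labrosa rastafilt.m).
    """

    NUMERATOR = (0.2, 0.1, 0.0, -0.1, -0.2)
    POLE = 0.94

    def __init__(self, size):
        self.size = size
        # Past inputs x[t-1] .. x[t-4], most recent first, initialized to zero
        self._inputs = deque(
            [[0.0] * size for _ in range(len(self.NUMERATOR) - 1)],
            maxlen=len(self.NUMERATOR) - 1)
        # Previous output y[t-1], initialized to zero
        self._output = [0.0] * size

    def filter(self, frame, do_log=True):
        if len(frame) != self.size:
            raise ValueError(f'expected a frame of size {self.size}')
        # Move to the log domain if required (values must then be positive)
        x = [math.log(v) for v in frame] if do_log else list(frame)
        history = [x] + list(self._inputs)  # x[t], x[t-1], ..., x[t-4]
        y = []
        for d in range(self.size):
            acc = self.POLE * self._output[d]
            for b, past in zip(self.NUMERATOR, history):
                acc += b * past[d]
            y.append(acc)
        # Update the filter state for the next frame (oldest input dropped)
        self._inputs.appendleft(x)
        self._output = y
        # Return to the linear domain if we left it
        return [math.exp(v) for v in y] if do_log else y
```

Because the numerator coefficients sum to zero, a constant log-domain input is driven towards zero, which is how RASTA suppresses slowly varying (e.g. channel) components.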
-
class shennong.processor.plp.PlpProcessor(sample_rate=16000, frame_shift=0.01, frame_length=0.025, rasta=False, dither=1.0, preemph_coeff=0.97, remove_dc_offset=True, window_type='povey', round_to_power_of_two=True, blackman_coeff=0.42, snip_edges=True, num_bins=23, low_freq=20, high_freq=0, vtln_low=100, vtln_high=-500, lpc_order=12, num_ceps=13, use_energy=True, energy_floor=0.0, raw_energy=True, compress_factor=0.3333333333333333, cepstral_lifter=22, cepstral_scale=1.0, htk_compat=False)
Bases: shennong.processor.base.MelFeaturesProcessor
Perceptual linear predictive features
-
property name
Name of the processor
-
property rasta
Whether to do RASTA filtering
-
property lpc_order
Order of LPC analysis in PLP computation
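PLP approximates the auditory spectrum with an all-pole (LPC) model whose order is lpc_order [Hermansky1990]. A standard way to get the LPC coefficients from autocorrelation values is the Levinson-Durbin recursion, sketched below. This is a generic textbook version, not shennong's or Kaldi's code; the function name is hypothetical and the polynomial convention is A(z) = 1 + a1 z^-1 + ... + ap z^-p.

```python
def levinson_durbin(r, order):
    """Return (a, err): LPC polynomial [1, a1, ..., ap] and residual energy.

    r holds autocorrelation values r[0] .. r[order]; r[0] must be > 0.
    """
    if len(r) < order + 1 or r[0] <= 0:
        raise ValueError('need r[0] > 0 and at least order + 1 values')
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for the order-i model
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Update a[1..i-1] from the order-(i-1) coefficients
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        # Prediction error shrinks by (1 - k^2) at each order
        err *= 1.0 - k * k
    return a, err
```

For example, the autocorrelation [1.0, 0.5, 0.25] of a first-order process gives a ≈ [1, -0.5, 0] and err ≈ 0.75: the second-order coefficient vanishes, as expected.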
-
property num_ceps
Number of cepstra in PLP computation (including C0)
Must be positive and less than or equal to lpc_order + 1.
-
property blackman_coeff
Constant coefficient for generalized Blackman window
Used only if window_type is ‘blackman’
-
property dither
Amount of dithering
0.0 means no dither
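A minimal sketch of dithering, assuming the Kaldi approach of adding zero-mean Gaussian noise, scaled by the dither amount, to every sample. The function name and the rng argument are hypothetical; passing a seeded random.Random makes the noise reproducible.

```python
import random


def dither(signal, amount=1.0, rng=None):
    """Add Gaussian noise with standard deviation `amount` to each sample.

    amount=0.0 returns an unchanged copy of the signal.
    """
    if amount == 0.0:
        return list(signal)
    rng = rng or random.Random()
    return [s + amount * rng.gauss(0.0, 1.0) for s in signal]
```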
-
property frame_length
Frame length in seconds
-
property frame_shift
Frame shift in seconds
-
get_params(deep=True)
Get parameters for this processor.
- Parameters
deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True.
- Returns
params (mapping of string to any) – Parameter names mapped to their values.
-
get_properties(**kwargs)
Return the processor's properties as a dictionary
-
property high_freq
High cutoff frequency for mel bins in Hertz
If high_freq <= 0, it is an offset from the Nyquist frequency (so the default 0 means the Nyquist frequency).
-
property log
Processor logger
-
property low_freq
Low cutoff frequency for mel bins in Hertz
-
property num_bins
Number of triangular mel-frequency bins
The minimum number of bins is 3.
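The interplay of low_freq, high_freq and num_bins can be sketched as follows. This assumes Kaldi's mel scale, mel(f) = 1127 ln(1 + f / 700), and Kaldi's convention that a non-positive high_freq is an offset from the Nyquist frequency; num_bins + 2 equally spaced mel points define num_bins overlapping triangles. The function names are hypothetical.

```python
import math


def mel_scale(freq):
    """Kaldi-style mel scale (assumed formula)."""
    return 1127.0 * math.log(1.0 + freq / 700.0)


def mel_bin_edges(sample_rate=16000, num_bins=23, low_freq=20.0, high_freq=0.0):
    """Return (left, center, right) mel values for each triangular bin."""
    if num_bins < 3:
        raise ValueError('num_bins must be >= 3')
    nyquist = 0.5 * sample_rate
    # A non-positive high_freq is an offset from the Nyquist frequency
    if high_freq <= 0:
        high_freq += nyquist
    if not 0.0 <= low_freq < high_freq <= nyquist:
        raise ValueError('need 0 <= low_freq < high_freq <= nyquist')
    mel_low, mel_high = mel_scale(low_freq), mel_scale(high_freq)
    # num_bins + 2 equally spaced mel points -> num_bins overlapping triangles
    delta = (mel_high - mel_low) / (num_bins + 1)
    return [(mel_low + b * delta, mel_low + (b + 1) * delta,
             mel_low + (b + 2) * delta) for b in range(num_bins)]
```

With the defaults (16 kHz, high_freq=0), the last bin ends at mel(8000), and each bin's center is the left edge of the next one.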
-
property preemph_coeff
Coefficient for use in signal preemphasis
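Preemphasis is a first-order high-pass filter, y[n] = x[n] - preemph_coeff * x[n-1]. The sketch below assumes Kaldi's handling of the first sample, where x[-1] is taken to be x[0]; the function name is hypothetical.

```python
def preemphasize(frame, coeff=0.97):
    """Return y[n] = x[n] - coeff * x[n-1], with x[-1] taken as x[0]."""
    if not frame:
        return []
    out = [frame[0] - coeff * frame[0]]
    out += [frame[n] - coeff * frame[n - 1] for n in range(1, len(frame))]
    return out
```

A constant frame is attenuated to 1 - coeff of its level (0.03 with the default), which is the intended boost of high frequencies relative to low ones.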
-
process_all(utterances, njobs=None, **kwargs)
Returns features processed from several input utterances
This function processes the features in parallel jobs.
- Parameters
utterances (Utterances) – The utterances on which to compute features.
njobs (int, optional) – The number of parallel jobs to run in background. Defaults to the number of CPU cores available on the machine.
**kwargs (dict, optional) – Extra arguments to be forwarded to the process method. Keys must be the same as for utterances.
- Returns
features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input utterances.
- Raises
ValueError – If the njobs parameter is <= 0 or if an entry is missing in optional kwargs.
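The contract of process_all (same keys in and out, per-utterance extra arguments, validation of njobs and kwargs) can be sketched in plain Python. This is an illustration under assumptions, not shennong's implementation: the function process_all_sketch is hypothetical, utterances are a plain dict instead of an Utterances object, and threads are used for simplicity where the real backend may use processes.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def process_all_sketch(process, utterances, njobs=None, **kwargs):
    """Apply `process` to each utterance in parallel, keeping the input keys.

    utterances: dict mapping name -> signal. Each extra kwarg is a dict with
    the same keys, giving a per-utterance argument forwarded to `process`.
    """
    njobs = njobs if njobs is not None else (os.cpu_count() or 1)
    if njobs <= 0:
        raise ValueError(f'njobs must be strictly positive, it is {njobs}')
    for name, values in kwargs.items():
        missing = set(utterances) - set(values)
        if missing:
            raise ValueError(f'{name}: missing entries {sorted(missing)}')

    def job(key):
        # Gather this utterance's extra arguments, then process it
        extra = {name: values[key] for name, values in kwargs.items()}
        return key, process(utterances[key], **extra)

    with ThreadPoolExecutor(max_workers=njobs) as pool:
        return dict(pool.map(job, utterances))
```

In this sketch, a per-utterance vtln_warp could be passed as vtln_warp={'utt1': 0.9, 'utt2': 1.1}, mirroring how the documented kwargs must share the utterance keys.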
-
property remove_dc_offset
If True, subtract the mean from the waveform on each frame
-
property round_to_power_of_two
If True, round the window size up to a power of two
This is done by zero-padding the input to the FFT.
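A short sketch of how the FFT size follows from frame_length and round_to_power_of_two, assuming the window length in samples is the truncated product sample_rate * frame_length (as in Kaldi). The function name is hypothetical.

```python
def padded_window_size(sample_rate=16000, frame_length=0.025,
                       round_to_power_of_two=True):
    """Number of FFT points used for one frame (window zero-padded if needed)."""
    size = int(sample_rate * frame_length)  # window length in samples
    if size <= 0:
        raise ValueError('frame_length is too short for this sample rate')
    if not round_to_power_of_two:
        return size
    # Smallest power of two greater than or equal to size
    return 1 << (size - 1).bit_length()
```

With the defaults, a 400-sample window (25 ms at 16 kHz) is zero-padded to a 512-point FFT.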
-
property sample_rate
Waveform sample frequency in Hertz
Must match the sample rate of the signal passed to process
-
set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')
Change level and/or format of the processor’s logger
- Parameters
level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.
formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default, display level, name and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display time, level, name and message.
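The level and formatter semantics map onto the standard logging module. The sketch below is a hypothetical helper (make_logger, not shennong's code) built only on stdlib calls; it shows that a minimum level of ‘warning’ keeps warnings and errors and drops info and debug messages.

```python
import logging


def make_logger(name, level='info',
                formatter='%(levelname)s - %(name)s - %(message)s'):
    """Return a logger that ignores messages below `level` (sketch)."""
    levels = {'debug': logging.DEBUG, 'info': logging.INFO,
              'warning': logging.WARNING, 'error': logging.ERROR}
    if level not in levels:
        raise ValueError(f'invalid log level {level!r}')
    logger = logging.getLogger(name)
    logger.setLevel(levels[level])
    # Replace previous handlers so repeated calls do not duplicate output
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(formatter))
    logger.addHandler(handler)
    return logger
```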
-
set_params(**params)
Set the parameters of this processor.
- Returns
self
- Raises
ValueError – If any given parameter in params is invalid for the processor.
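The get_params / set_params pair follows a pattern similar to scikit-learn estimators. The toy class below is a sketch under the assumption that parameters are the constructor arguments; ParamsSketch is hypothetical, is not shennong's base class, and ignores the deep option.

```python
import inspect


class ParamsSketch:
    """Toy processor illustrating the get_params / set_params convention."""

    def __init__(self, lpc_order=12, num_ceps=13, rasta=False):
        self.lpc_order = lpc_order
        self.num_ceps = num_ceps
        self.rasta = rasta

    @classmethod
    def _param_names(cls):
        # Parameters are the constructor arguments, except self
        sig = inspect.signature(cls.__init__)
        return [p for p in sig.parameters if p != 'self']

    def get_params(self):
        return {name: getattr(self, name) for name in self._param_names()}

    def set_params(self, **params):
        valid = set(self._param_names())
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f'invalid parameter {name!r}')
            setattr(self, name, value)
        return self  # allows chaining, as documented above
```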
-
property snip_edges
If True, output only frames that completely fit in the file
When True, the number of frames depends on the frame_length. When False, the number of frames depends only on the frame_shift, and the data are reflected at the ends.
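The effect of snip_edges on the frame count can be sketched with Kaldi's framing rule, which we assume shennong inherits: frame length and shift are converted to samples by truncation, and with snip_edges off, frames are centered every frame_shift samples. The function name is hypothetical.

```python
def num_frames(num_samples, sample_rate=16000, frame_length=0.025,
               frame_shift=0.01, snip_edges=True):
    """Number of frames produced from a signal (Kaldi-style framing)."""
    length = int(sample_rate * frame_length)  # window size in samples
    shift = int(sample_rate * frame_shift)    # hop size in samples
    if snip_edges:
        # Only frames lying entirely inside the signal
        if num_samples < length:
            return 0
        return 1 + (num_samples - length) // shift
    # One frame every `shift` samples; edges are padded by reflection
    return (num_samples + shift // 2) // shift
```

For one second of 16 kHz audio with the default options, this gives 98 frames with snip_edges=True and 100 frames with snip_edges=False.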
-
property use_energy
Use energy (instead of C0) for zeroth PLP feature
-
property vtln_high
High inflection point in piecewise linear VTLN warping function
In Hertz. If vtln_high < 0, it is an offset from high_freq.
-
property vtln_low
Low inflection point in piecewise linear VTLN warping function
In Hertz
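The roles of vtln_low, vtln_high and the vtln_warp factor of process can be sketched with a piecewise linear warping of frequencies. The sketch assumes Kaldi's VTLN warping rule (inflection points scaled by the warp factor, a linear map with slope 1 / warp in between, and segments that keep low_freq and high_freq fixed); the function name is hypothetical and high_freq is taken as already resolved to Hertz.

```python
def vtln_warp_freq(freq, warp, low_freq=20.0, high_freq=8000.0,
                   vtln_low=100.0, vtln_high=-500.0):
    """Piecewise linear VTLN frequency warping (Kaldi-style sketch)."""
    if vtln_high < 0:
        vtln_high += high_freq  # negative value: offset from high_freq
    if freq < low_freq or freq > high_freq:
        return freq  # outside the analysed band: unchanged
    lower = vtln_low * max(1.0, warp)   # warped inflection points
    upper = vtln_high * min(1.0, warp)
    scale = 1.0 / warp
    if freq < lower:
        # Segment from (low_freq, low_freq) to (lower, scale * lower)
        slope = (scale * lower - low_freq) / (lower - low_freq)
        return low_freq + slope * (freq - low_freq)
    if freq < upper:
        return scale * freq
    # Segment from (upper, scale * upper) to (high_freq, high_freq)
    slope = (high_freq - scale * upper) / (high_freq - upper)
    return high_freq + slope * (freq - high_freq)
```

A warp factor of 1.0 leaves every frequency unchanged, matching the documented default of no warping.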
-
property window_type
Type of window
Must be ‘hamming’, ‘hanning’, ‘povey’, ‘rectangular’ or ‘blackman’
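The five window types, and where blackman_coeff enters, can be summarized as below. The formulas follow Kaldi's frame windowing as we understand it (the ‘povey’ window is the Hanning window raised to the power 0.85); the function name is hypothetical.

```python
import math


def window(window_type, size, blackman_coeff=0.42):
    """Return `size` window weights for the given window type (sketch)."""
    if size < 2:
        raise ValueError('size must be >= 2')
    a = 2.0 * math.pi / (size - 1)
    if window_type == 'hanning':
        return [0.5 - 0.5 * math.cos(a * i) for i in range(size)]
    if window_type == 'hamming':
        return [0.54 - 0.46 * math.cos(a * i) for i in range(size)]
    if window_type == 'povey':
        # Hanning window raised to the power 0.85
        return [(0.5 - 0.5 * math.cos(a * i)) ** 0.85 for i in range(size)]
    if window_type == 'rectangular':
        return [1.0] * size
    if window_type == 'blackman':
        # Generalized Blackman window, parameterized by blackman_coeff
        return [blackman_coeff - 0.5 * math.cos(a * i)
                + (0.5 - blackman_coeff) * math.cos(2 * a * i)
                for i in range(size)]
    raise ValueError(f'unknown window type {window_type!r}')
```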
-
property energy_floor
Floor on energy (absolute, not relative) in PLP computation
-
property raw_energy
If True, compute energy before preemphasis and windowing
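When use_energy is True, the zeroth feature is a log energy rather than C0. The sketch below assumes a Kaldi-like rule: the log of the sum of squared samples, guarded against log(0) by a small epsilon, then floored at log(energy_floor) when energy_floor > 0. With raw_energy=True, the frame passed in would be the one taken before preemphasis and windowing. The function name is hypothetical.

```python
import math
import sys


def log_energy(frame, energy_floor=0.0):
    """Log of the frame energy sum(x^2), with an optional absolute floor."""
    energy = sum(x * x for x in frame)
    # Guard against log(0) on silent frames
    value = math.log(max(energy, sys.float_info.epsilon))
    if energy_floor > 0.0:
        value = max(value, math.log(energy_floor))
    return value
```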
-
property compress_factor
Compression factor in PLP computation
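In PLP, the compression step approximates the power law relating sound intensity to perceived loudness [Hermansky1990]; the default of about 1/3 is a cube-root compression of the band energies. A one-line sketch, with a hypothetical function name:

```python
def compress(band_energies, compress_factor=1.0 / 3.0):
    """Intensity-to-loudness power law: raise each band energy to a power."""
    return [e ** compress_factor for e in band_energies]
```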
-
property cepstral_lifter
Constant that controls scaling of PLPs
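Assuming the Kaldi liftering rule, each cepstral coefficient c_i is multiplied by 1 + (Q / 2) sin(π i / Q), where Q is cepstral_lifter; this boosts the higher-order coefficients while leaving C0 unchanged. A sketch of the weights, with a hypothetical function name:

```python
import math


def lifter_coeffs(num_ceps=13, cepstral_lifter=22.0):
    """Weights 1 + Q/2 * sin(pi * i / Q) for cepstra c_0 .. c_{num_ceps-1}."""
    q = cepstral_lifter
    return [1.0 + 0.5 * q * math.sin(math.pi * i / q) for i in range(num_ceps)]
```

With Q = 22, the weight grows from 1.0 for C0 to 12.0 for c_11.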
-
property cepstral_scale
Scaling constant in PLP computation
-
property htk_compat
If True, get closer to HTK PLP features
Energy or C0 is put last.
Warning: this alone is not sufficient to get HTK-compatible features (other parameters need to be changed as well).
-
property ndims
Dimension of the output features frames
-
process(signal, vtln_warp=1.0)
Compute PLP features (with optional RASTA filtering) with the specified options
Do an optional feature-level vocal tract length normalization (VTLN) when vtln_warp != 1.0.
- Parameters
signal (Audio, shape = [nsamples, 1]) – The input audio signal to compute the features on, must be mono
vtln_warp (float, optional) – The VTLN warping factor to be applied when computing features. Defaults to 1.0, meaning no warping.
- Returns
features (Features, shape = [nframes, ndims]) – The computed features, output will have as many rows as there are frames (depends on the specified options frame_shift and frame_length).
- Raises
ValueError – If the input signal has more than one channel (i.e. is not mono), or if sample_rate != signal.sample_rate.
-
property