Energy

Extraction of energy from audio signals

Audio -> EnergyProcessor -> Features

Computes the energy of each windowed frame extracted from an audio signal. The result is identical to the first coefficient computed by MfccProcessor or PlpProcessor.
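To make the frame-wise computation concrete, here is a minimal pure-Python sketch. It is not the EnergyProcessor implementation: it skips dithering, preemphasis, DC removal and windowing, and the helper name `frame_energies`, the sample-based arguments and the `1e-10` floor are assumptions made for illustration only.

```python
import math

def frame_energies(samples, frame_length, frame_shift):
    """Return the log energy of each complete frame of `samples`.

    `frame_length` and `frame_shift` are given in samples (hypothetical
    helper, not part of shennong).
    """
    energies = []
    # Slide a window of `frame_length` samples in steps of `frame_shift`,
    # keeping only the frames that fit entirely in the signal.
    for start in range(0, len(samples) - frame_length + 1, frame_shift):
        frame = samples[start:start + frame_length]
        energy = sum(x * x for x in frame)  # sum of squared samples
        # Assumed floor to avoid log(0) on silent frames.
        energies.append(math.log(max(energy, 1e-10)))
    return energies

# Toy signal: 10 samples, frames of 4 samples shifted by 2 samples.
signal = [1.0, -1.0, 1.0, -1.0, 2.0, -2.0, 2.0, -2.0, 0.5, 0.5]
log_energies = frame_energies(signal, frame_length=4, frame_shift=2)
```

The output has one value per frame, which mirrors the single-column shape of the features returned by the processor.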

Examples

>>> import numpy as np
>>> from shennong.audio import Audio
>>> from shennong.features.processor.energy import EnergyProcessor
>>> audio = Audio.load('./test/data/test.wav')

Computes energy on the audio signal:

>>> proc = EnergyProcessor(sample_rate=audio.sample_rate)
>>> energy1 = proc.process(audio)
>>> energy1.shape
(140, 1)

By default the energy is log-compressed. The available compression options are ‘log’, ‘sqrt’ and ‘off’; set ‘off’ to disable compression:

>>> proc.compression = 'off'
>>> energy2 = proc.process(audio)
>>> np.allclose(np.log(energy2.data), energy1.data, rtol=1)
True

The two energies above are not strictly identical because of dithering.
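The effect of dithering can be illustrated without the library. The sketch below assumes that dithering adds zero-mean Gaussian noise scaled by the `dither` value to each sample; the documentation does not state the noise model, so treat this, and the helper name `log_energy`, as assumptions.

```python
import math
import random

def log_energy(frame, dither=0.0, rng=None):
    """Log energy of `frame` after adding noise scaled by `dither`.

    Assumption: the noise is Gaussian with zero mean and unit variance.
    """
    rng = rng or random.Random()
    noisy = [x + dither * rng.gauss(0.0, 1.0) for x in frame]
    return math.log(sum(x * x for x in noisy))

frame = [100.0, -50.0, 25.0, -75.0]
exact = log_energy(frame, dither=0.0)                        # no noise
run_a = log_energy(frame, dither=1.0, rng=random.Random(1))  # first run
run_b = log_energy(frame, dither=1.0, rng=random.Random(2))  # second run
```

Two dithered runs differ slightly from each other and from the exact value, which is why the comparison in the example uses a tolerance rather than strict equality.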

You can also set the framing and windowing parameters:

>>> proc.frame_shift = 0.02
>>> proc.frame_length = 0.05
>>> proc.window_type = 'hanning'
>>> energy3 = proc.process(audio)
>>> energy3.shape
(69, 1)

class shennong.features.processor.energy.EnergyProcessor(sample_rate=16000, frame_shift=0.01, frame_length=0.025, dither=1.0, preemph_coeff=0.97, remove_dc_offset=True, window_type='povey', round_to_power_of_two=True, blackman_coeff=0.42, snip_edges=True, raw_energy=True, compression='log')

Bases: shennong.features.processor.base.FramesProcessor

property name

Name of the processor

property ndims

Dimension of the output features frames

property blackman_coeff

Constant coefficient for generalized Blackman window

Used only if window_type is ‘blackman’

property compression

Type of energy compression

Must be ‘off’ (disable compression), ‘log’ (natural logarithm) or ‘sqrt’ (square root).
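The three modes can be sketched as a small standalone function. The name `compress` is hypothetical and the function operates on a single raw energy value; it only illustrates what each option means.

```python
import math

def compress(energy, compression='log'):
    """Apply one of the documented compression modes to a raw energy value."""
    if compression == 'off':
        return energy             # raw energy, unchanged
    if compression == 'log':
        return math.log(energy)   # natural logarithm
    if compression == 'sqrt':
        return math.sqrt(energy)  # square root
    raise ValueError(
        f"compression must be 'off', 'log' or 'sqrt', got {compression!r}")

raw = 16.0
compressed = {mode: compress(raw, mode) for mode in ('off', 'log', 'sqrt')}
```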

property dither

Amount of dithering

0.0 means no dither

property frame_length

Frame length in seconds

property frame_shift

Frame shift in seconds

get_params(deep=True)

Get parameters for this processor.

Parameters

deep (boolean, optional) – If True, returns the parameters for this processor and for any contained subobjects that are processors. Defaults to True.

Returns

params (mapping of string to any) – Parameter names mapped to their values.

get_properties()

Returns the processor's properties as a dictionary

property preemph_coeff

Coefficient for use in signal preemphasis
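Preemphasis is commonly implemented as the first-order filter y[n] = x[n] - c * x[n-1]. The sketch below assumes that form, and also assumes the first sample is treated as its own predecessor; neither detail is stated in this page, and `preemphasize` is a hypothetical helper.

```python
def preemphasize(frame, coeff=0.97):
    """Return a copy of `frame` filtered as y[n] = x[n] - coeff * x[n-1]."""
    if not frame:
        return []
    # Assumption: the first sample uses itself as its predecessor.
    out = [frame[0] - coeff * frame[0]]
    out.extend(frame[n] - coeff * frame[n - 1] for n in range(1, len(frame)))
    return out

constant = preemphasize([1.0, 1.0, 1.0, 1.0])       # slowly varying input
alternating = preemphasize([1.0, -1.0, 1.0, -1.0])  # rapidly varying input
```

A constant input is almost cancelled while a rapidly alternating one is amplified, which is the high-frequency boost the coefficient controls.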

process_all(signals, njobs=None)

Returns features processed from several input signals

This function processes the features in parallel jobs.

Parameters
  • signals (dict of Audio) – A dictionary of input audio signals to process features on, where the keys are item names and the values are audio signals.

  • njobs (int, optional) – The number of parallel jobs to run in the background. Defaults to the number of CPU cores available on the machine.

Returns

features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input signals.

Raises

ValueError – If the njobs parameter is <= 0

property remove_dc_offset

If True, subtract mean from waveform on each frame
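As a small illustration of per-frame mean subtraction and its effect on energy, consider the following sketch. The helper names are hypothetical and the code works on a plain list of samples.

```python
def remove_dc_offset(frame):
    """Subtract the mean of `frame` from each of its samples."""
    mean = sum(frame) / len(frame)
    return [x - mean for x in frame]

def energy(frame):
    """Sum of squared samples."""
    return sum(x * x for x in frame)

frame = [3.0, 5.0, 4.0, 4.0]          # small oscillation around an offset of 4
centered = remove_dc_offset(frame)
energy_with_offset = energy(frame)
energy_without_offset = energy(centered)
```

Without the subtraction, the constant offset dominates the energy of the frame.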

property round_to_power_of_two

If true, round window size to power of two

This is done by zero-padding input to FFT
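The padded size can be worked out by hand. The sketch below assumes the window size in samples is `sample_rate * frame_length` rounded to an integer and that rounding goes up to the next power of two; `padded_window_size` is a hypothetical helper.

```python
def padded_window_size(sample_rate, frame_length, round_to_power_of_two=True):
    """Window size in samples, optionally rounded up to a power of two."""
    # Assumption: the window size is the frame length converted to samples.
    window_size = round(sample_rate * frame_length)
    if not round_to_power_of_two:
        return window_size
    size = 1
    while size < window_size:
        size *= 2  # smallest power of two >= window_size
    return size

default_padded = padded_window_size(16000, 0.025)  # 400 samples -> 512
unpadded = padded_window_size(16000, 0.025, round_to_power_of_two=False)
longer_padded = padded_window_size(16000, 0.05)    # 800 samples -> 1024
```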

property sample_rate

Waveform sample frequency in Hertz

Must match the sample rate of the signal specified in process

set_params(**params)

Set the parameters of this processor.

Returns

self

Raises

ValueError – If any given parameter in params is invalid for the processor.

property snip_edges

If true, output only frames that completely fit in the file

When True the number of frames depends on the frame_length. If False, the number of frames depends only on the frame_shift, and we reflect the data at the ends.
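The dependence of the frame count on these parameters can be sketched as follows. The formulas assume Kaldi-style framing conventions, which this page does not confirm, and the sample count used below is hypothetical: it was chosen so that the results agree with the shapes shown in the examples section.

```python
def num_frames(nsamples, sample_rate, frame_shift, frame_length,
               snip_edges=True):
    """Number of frames for a signal of `nsamples` samples.

    Assumption: Kaldi-style framing; not taken from the shennong sources.
    """
    shift = round(sample_rate * frame_shift)    # shift in samples
    length = round(sample_rate * frame_length)  # length in samples
    if snip_edges:
        # Only frames that fit completely in the signal are kept.
        if nsamples < length:
            return 0
        return 1 + (nsamples - length) // shift
    # Frame count depends only on the shift; edges are reflected.
    return (nsamples + shift // 2) // shift

nsamples = 22713  # hypothetical length of a 16 kHz recording
default_frames = num_frames(nsamples, 16000, 0.01, 0.025)
custom_frames = num_frames(nsamples, 16000, 0.02, 0.05)
padded_frames = num_frames(nsamples, 16000, 0.01, 0.025, snip_edges=False)
```

Under these assumptions the default settings give 140 frames and the 0.02/0.05 settings give 69.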

times(nframes)

Returns the time labels for the rows returned by process()

property window_type

Type of window

Must be ‘hamming’, ‘hanning’, ‘povey’, ‘rectangular’ or ‘blackman’
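For reference, the five window shapes can be sketched in pure Python. The formulas below follow the definitions used by Kaldi, including the ‘povey’ window as a Hann window raised to the power 0.85; whether EnergyProcessor uses exactly these is an assumption, and `window` is a hypothetical helper.

```python
import math

def window(window_type, size, blackman_coeff=0.42):
    """Return `size` window coefficients (size >= 2), Kaldi-style formulas."""
    a = 2.0 * math.pi / (size - 1)
    values = []
    for i in range(size):
        c = math.cos(a * i)
        if window_type == 'hanning':
            values.append(0.5 - 0.5 * c)
        elif window_type == 'hamming':
            values.append(0.54 - 0.46 * c)
        elif window_type == 'povey':
            # Hann window raised to the power 0.85.
            values.append((0.5 - 0.5 * c) ** 0.85)
        elif window_type == 'rectangular':
            values.append(1.0)
        elif window_type == 'blackman':
            values.append(blackman_coeff - 0.5 * c
                          + (0.5 - blackman_coeff) * math.cos(2.0 * a * i))
        else:
            raise ValueError(f'unknown window type: {window_type!r}')
    return values

hanning = window('hanning', 5)
povey = window('povey', 5)
```

Both windows are zero at the edges and reach one at the centre; the ‘povey’ window is slightly wider in between.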

property raw_energy

If true, compute energy before preemphasis and windowing

process(signal)

Computes energy on the input signal

Parameters

signal (Audio) – The input audio signal to compute the energy on

Returns

energy (Features) – The computed energy, compressed according to the compression property

Raises

ValueError – If the input signal has more than one channel (i.e. is not mono), or if sample_rate != signal.sample_rate.