Extraction of energy from audio signals

Audio –> EnergyProcessor –> Features

Computes the energy on window frames extracted from an audio signal. This algorithm is identical to the first coefficient of MfccProcessor or PlpProcessor.
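As a rough illustration of what "energy on window frames" means, the sketch below slices a mono signal into overlapping frames and takes the log of the sum of squared samples in each one. It is not the shennong implementation: the function name `frame_energy` and the synthetic tone are hypothetical, and dithering, DC offset removal, preemphasis and windowing are deliberately left out.

```python
import numpy as np

def frame_energy(signal, sample_rate, frame_length=0.025, frame_shift=0.01):
    """Log-energy of each complete frame of a mono signal (illustrative only)."""
    length = int(round(frame_length * sample_rate))  # samples per frame
    shift = int(round(frame_shift * sample_rate))    # samples between frame starts
    if len(signal) < length:
        return np.zeros((0, 1))
    nframes = 1 + (len(signal) - length) // shift
    energy = np.empty((nframes, 1))
    for i in range(nframes):
        frame = np.asarray(signal[i * shift:i * shift + length], dtype=np.float64)
        # the floor avoids log(0) on completely silent frames
        energy[i, 0] = np.log(max(np.sum(frame ** 2), np.finfo(np.float64).eps))
    return energy

# hypothetical input: one second of a 440 Hz tone sampled at 16 kHz
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)
energy = frame_energy(tone, sample_rate)
```

The result has one row per frame and a single column, the same layout as the `(nframes, 1)` shapes shown in the examples that follow.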


>>> from import Audio
>>> from import EnergyProcessor
>>> import numpy as np
>>> audio = Audio.load('./test/data/test.wav')

Computes energy on the audio signal:

>>> proc = EnergyProcessor(sample_rate=audio.sample_rate)
>>> energy1 = proc.process(audio)
>>> energy1.shape
(140, 1)

By default the energy is log-compressed. You can deactivate compression or select another type; the available options are ‘log’, ‘sqrt’ and ‘off’:

>>> proc.compression = 'off'
>>> energy2 = proc.process(audio)
>>> np.allclose(np.log(,, rtol=1)

The two energies above are not strictly identical because of dithering.

You can also fix the framing and windowing parameters:

>>> proc.frame_shift = 0.02
>>> proc.frame_length = 0.05
>>> proc.window_type = 'hanning'
>>> energy3 = proc.process(audio)
>>> energy3.shape
(69, 1)

class EnergyProcessor(sample_rate=…, frame_shift=0.01, frame_length=0.025, dither=1.0, preemph_coeff=0.97, remove_dc_offset=True, window_type='povey', round_to_power_of_two=True, blackman_coeff=0.42, snip_edges=True, raw_energy=True, compression='log')

Bases: shennong.processor.base.FramesProcessor

property name

Name of the processor

property ndims

Dimension of the output features frames

property blackman_coeff

Constant coefficient for generalized Blackman window

Used only if window_type is ‘blackman’

property compression

Type of energy compression

Must be ‘off’ (disable compression), ‘log’ (natural logarithm) or ‘sqrt’ (square root).
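The three compression modes can be pictured as a simple dispatch over raw energy values. This is a minimal sketch under the assumption that compression is applied element-wise to non-negative energies; the helper name `compress` is hypothetical and not part of the library.

```python
import numpy as np

def compress(energy, compression='log'):
    """Apply the named compression to raw, non-negative energy values."""
    if compression == 'off':
        return energy
    if compression == 'log':
        return np.log(energy)
    if compression == 'sqrt':
        return np.sqrt(energy)
    raise ValueError(
        f"compression must be 'off', 'log' or 'sqrt', got {compression!r}")

# hypothetical raw energies, one row per frame
raw = np.array([[1.0], [4.0], [100.0]])
log_energy = compress(raw, 'log')
sqrt_energy = compress(raw, 'sqrt')
```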

property dither

Amount of dithering

0.0 means no dither
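Dithering can be sketched as adding a small amount of random noise to the samples before the energy is computed, which is why two runs on the same audio give slightly different values. The version below assumes zero-mean Gaussian noise scaled by the dither amount; the function name `apply_dither` and the noise model are assumptions, not taken from the library.

```python
import numpy as np

def apply_dither(frame, amount=1.0, rng=None):
    """Add Gaussian noise scaled by `amount`; 0.0 leaves the frame unchanged."""
    frame = np.asarray(frame, dtype=np.float64)
    if amount == 0.0:
        return frame.copy()
    rng = np.random.default_rng() if rng is None else rng
    return frame + amount * rng.standard_normal(frame.shape)

# hypothetical frame of 400 constant samples
frame = np.full(400, 100.0)
e1 = np.sum(apply_dither(frame, 1.0, np.random.default_rng(0)) ** 2)
e2 = np.sum(apply_dither(frame, 1.0, np.random.default_rng(1)) ** 2)
# e1 and e2 are close but not identical because the noise differs
```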

property frame_length

Frame length in seconds

property frame_shift

Frame shift in seconds


Get parameters for this processor.


deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True.


params (mapping of string to any) – Parameter names mapped to their values.


Return the processor’s properties as a dictionary

property log

Processor logger

property preemph_coeff

Coefficient for use in signal preemphasis
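Preemphasis is commonly implemented as a first-order high-pass filter, y[n] = x[n] - c * x[n-1], where c is the preemphasis coefficient. The sketch below follows that common form and treats the first sample as its own predecessor; both the helper name `preemphasize` and the boundary handling are assumptions rather than documented behavior.

```python
import numpy as np

def preemphasize(frame, coeff=0.97):
    """Return y[n] = x[n] - coeff * x[n-1], with x[0] as its own predecessor."""
    frame = np.asarray(frame, dtype=np.float64)
    if frame.size == 0:
        return frame.copy()
    out = np.empty_like(frame)
    out[1:] = frame[1:] - coeff * frame[:-1]
    out[0] = frame[0] - coeff * frame[0]
    return out

# a constant signal is almost entirely removed by the filter
flat = preemphasize([1.0, 1.0, 1.0])
```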

process_all(utterances, njobs=None, **kwargs)

Returns features processed from several input utterances

This function processes the features in parallel jobs.

  • utterances (shennong.utterances.Utterances) – The utterances on which to process features.

  • njobs (int, optional) – The number of parallel jobs to run in background. Defaults to the number of CPU cores available on the machine.

  • **kwargs (dict, optional) – Extra arguments to be forwarded to the process method. Keys must be the same as for utterances.


features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input utterances.


ValueError – If the njobs parameter is <= 0 or if an entry is missing in optional kwargs.
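The contract described above (parallel jobs, per-utterance extra arguments, and the two error conditions) can be sketched with the standard library. This is only an illustration: a plain dict stands in for the Utterances class, a thread pool stands in for whatever job backend the library uses, and the standalone function `process_all` and the `toy_energy` callable are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def process_all(process, utterances, njobs=None, **kwargs):
    """Run `process` on each utterance in parallel.

    `utterances` maps a name to a signal; each value in `kwargs` maps the
    same names to an extra argument forwarded to `process`.
    """
    if njobs is None:
        njobs = os.cpu_count() or 1
    if njobs <= 0:
        raise ValueError(f'njobs must be strictly positive, got {njobs}')
    for arg, values in kwargs.items():
        missing = set(utterances) - set(values)
        if missing:
            raise ValueError(f'missing entries for {arg}: {sorted(missing)}')

    def run(name):
        extra = {arg: values[name] for arg, values in kwargs.items()}
        return name, process(utterances[name], **extra)

    with ThreadPoolExecutor(max_workers=njobs) as pool:
        # output keys are the keys of the input utterances
        return dict(pool.map(run, utterances))

def toy_energy(signal, gain=1.0):
    """Hypothetical stand-in for a processor's process method."""
    return gain * sum(x * x for x in signal)

utts = {'a': [1.0, 2.0], 'b': [3.0]}
plain = process_all(toy_energy, utts, njobs=2)
scaled = process_all(toy_energy, utts, njobs=2, gain={'a': 2.0, 'b': 1.0})
```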

property remove_dc_offset

If True, subtract mean from waveform on each frame

property round_to_power_of_two

If true, round window size to power of two

This is done by zero-padding input to FFT
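To make the rounding concrete, the sketch below converts a frame length in seconds to a number of samples and rounds it up to the next power of two. The helper name `padded_window_size` is hypothetical, and the assumption is that rounding goes upward, which is what zero-padding implies.

```python
def padded_window_size(frame_length, sample_rate, round_to_power_of_two=True):
    """Window size in samples, optionally rounded up to a power of two."""
    size = int(round(frame_length * sample_rate))
    if not round_to_power_of_two or size < 1:
        return size
    # smallest power of two that is >= size
    return 1 << (size - 1).bit_length()

# a 25 ms frame at 16 kHz holds 400 samples and is padded to 512
default_size = padded_window_size(0.025, 16000)
```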

property sample_rate

Waveform sample frequency in Hertz

Must match the sample rate of the signal specified in process

set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')

Change level and/or format of the processor’s logger

  • level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.

  • formatter (str, optional) – A string to format the log messages, see the Python logging documentation. By default, displays level, name and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display time, level, name and message.
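The level and formatter strings map directly onto Python's standard logging module. The sketch below shows how such a logger could be configured and what the default format string produces; `make_logger` is a hypothetical helper written for this illustration and is not the processor's own method.

```python
import io
import logging

def make_logger(name, level='info',
                formatter='%(levelname)s - %(name)s - %(message)s',
                stream=None):
    """Build a logger that writes formatted messages to `stream`."""
    levels = {'debug': logging.DEBUG, 'info': logging.INFO,
              'warning': logging.WARNING, 'error': logging.ERROR}
    if level not in levels:
        raise ValueError(f'invalid log level: {level!r}')
    logger = logging.getLogger(name)
    logger.handlers.clear()   # drop handlers left by a previous call
    logger.propagate = False  # keep messages out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(formatter))
    logger.addHandler(handler)
    logger.setLevel(levels[level])
    return logger

buffer = io.StringIO()
log = make_logger('energy', level='warning', stream=buffer)
log.info('less severe than warning, so dropped')
log.warning('kept')
```

After these calls the buffer holds a single line, `WARNING - energy - kept`.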


Set the parameters of this processor.




ValueError – If any given parameter in params is invalid for the processor.

property snip_edges

If true, output only frames that completely fit in the file

When True the number of frames depends on the frame_length. If False, the number of frames depends only on the frame_shift, and we reflect the data at the ends.
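The effect of snip_edges on the number of output frames can be sketched with integer arithmetic. The counting rule below follows the Kaldi convention, which is an assumption here since the text does not give the formula. The helper name `num_frames` and the 22720-sample signal (1.42 s at 16 kHz) are hypothetical, chosen so that the counts match the shapes shown in the usage example.

```python
def num_frames(nsamples, sample_rate, frame_length=0.025, frame_shift=0.01,
               snip_edges=True):
    """Number of frames produced for a signal of `nsamples` samples."""
    length = int(round(frame_length * sample_rate))
    shift = int(round(frame_shift * sample_rate))
    if snip_edges:
        # only frames that fit entirely inside the signal
        if nsamples < length:
            return 0
        return 1 + (nsamples - length) // shift
    # depends on the shift only; edges are handled by reflecting the data
    return (nsamples + shift // 2) // shift

default_count = num_frames(22720, 16000)                       # 140
longer_frames = num_frames(22720, 16000, 0.05, 0.02)           # 69
no_snip = num_frames(22720, 16000, snip_edges=False)           # 142
```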


Returns the time labels for the rows given by process()

property window_type

Type of window

Must be ‘hamming’, ‘hanning’, ‘povey’, ‘rectangular’ or ‘blackman’
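The window types can be written as closed-form functions of the sample index. The formulas below are the Kaldi-style definitions, assumed here because the window names and the blackman_coeff default match that toolkit; in particular, ‘povey’ is taken to be a Hanning window raised to the power 0.85. The helper name `make_window` is hypothetical.

```python
import numpy as np

def make_window(size, window_type='povey', blackman_coeff=0.42):
    """Return a window of `size` samples (size >= 2) for the given type."""
    n = np.arange(size)
    a = 2.0 * np.pi / (size - 1)
    if window_type == 'rectangular':
        return np.ones(size)
    if window_type == 'hanning':
        return 0.5 - 0.5 * np.cos(a * n)
    if window_type == 'hamming':
        return 0.54 - 0.46 * np.cos(a * n)
    if window_type == 'povey':
        return (0.5 - 0.5 * np.cos(a * n)) ** 0.85
    if window_type == 'blackman':
        return (blackman_coeff - 0.5 * np.cos(a * n)
                + (0.5 - blackman_coeff) * np.cos(2 * a * n))
    raise ValueError(f'invalid window type: {window_type!r}')

window = make_window(400, 'povey')
```

With blackman_coeff=0.42 the ‘blackman’ branch reduces to the classic Blackman window, which is why that value is a natural default.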

property raw_energy

If true, compute energy before preemphasis and windowing

process(signal)


Computes energy on the input signal


signal (audioData) –


energy (Features) – The computed - and compressed - energy


ValueError – If the input signal has more than one channel (i.e. is not mono), or if sample_rate != signal.sample_rate.