Energy

Extraction of energy from audio signals

Audio -> EnergyProcessor -> Features

Computes the energy of windowed frames extracted from an audio signal. The result is identical to the first coefficient computed by MfccProcessor or PlpProcessor.
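As a rough illustration of the computation, the sketch below splits a signal into overlapping frames and sums the squared samples of each frame. It is a minimal pure-Python sketch, not the library's implementation: it assumes energy is the sum of squares per frame, it ignores dithering, preemphasis and windowing, and the `frame_energies` helper is hypothetical.

```python
import math


def frame_energies(samples, frame_length, frame_shift):
    """Sum of squared samples for each complete frame (hypothetical helper).

    frame_length and frame_shift are expressed in samples here,
    not in seconds as in the processor's parameters.
    """
    energies = []
    start = 0
    # keep only the frames that fit entirely in the signal
    while start + frame_length <= len(samples):
        frame = samples[start:start + frame_length]
        energies.append(sum(x * x for x in frame))
        start += frame_shift
    return energies


# toy signal: 10 samples, frames of 4 samples shifted by 2 samples
signal = [1.0, -1.0, 2.0, -2.0, 0.5, 0.5, 1.0, 1.0, 0.0, 3.0]
raw = frame_energies(signal, frame_length=4, frame_shift=2)
log_energy = [math.log(e) for e in raw]  # natural-log compression
```

Each row of the output features corresponds to one frame, which is why the shapes in the examples below have a single column.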

Examples

>>> import numpy as np
>>> from shennong.audio import Audio
>>> from shennong.processor.energy import EnergyProcessor
>>> audio = Audio.load('./test/data/test.wav')

Computes energy on the audio signal:

>>> proc = EnergyProcessor(sample_rate=audio.sample_rate)
>>> energy1 = proc.process(audio)
>>> energy1.shape
(140, 1)

By default the energy is log-compressed. You can deactivate or change the compression; the available options are ‘log’, ‘sqrt’ and ‘off’:

>>> proc.compression = 'off'
>>> energy2 = proc.process(audio)
>>> np.allclose(np.log(energy2.data), energy1.data, rtol=1)
True

The two energies above are not strictly identical because of dithering.

You can also set the framing and windowing parameters:

>>> proc.frame_shift = 0.02
>>> proc.frame_length = 0.05
>>> proc.window_type = 'hanning'
>>> energy3 = proc.process(audio)
>>> energy3.shape
(69, 1)

class shennong.processor.energy.EnergyProcessor(sample_rate=16000, frame_shift=0.01, frame_length=0.025, dither=1.0, preemph_coeff=0.97, remove_dc_offset=True, window_type='povey', round_to_power_of_two=True, blackman_coeff=0.42, snip_edges=True, raw_energy=True, compression='log')

Bases: shennong.processor.base.FramesProcessor

property name: Name of the processor

property ndims: Dimension of the output feature frames

property blackman_coeff

Constant coefficient for generalized Blackman window

Used only if window_type is ‘blackman’

property compression

Type of energy compression

Must be ‘off’ (disable compression), ‘log’ (natural logarithm) or ‘sqrt’ (square root).
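The three modes can be pictured with the following minimal sketch. The `compress` function is a hypothetical helper written for illustration, not part of the library; it only assumes that each mode is applied element-wise to a positive energy value.

```python
import math


def compress(energy, compression='log'):
    """Apply one of the three compression modes to a positive energy value."""
    if compression == 'off':
        return energy
    if compression == 'log':
        return math.log(energy)  # natural logarithm
    if compression == 'sqrt':
        return math.sqrt(energy)
    raise ValueError(
        f"compression must be 'off', 'log' or 'sqrt', got {compression!r}")


# the same energy value under each mode
values = {mode: compress(16.0, mode) for mode in ('off', 'log', 'sqrt')}
```

Under this assumption, taking the natural logarithm of the uncompressed energy gives the log-compressed energy, up to the effect of dithering.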

property dither

Amount of dithering

0.0 means no dither
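Dithering is commonly implemented by adding a small amount of random noise to each sample. The sketch below assumes Gaussian noise scaled by the dither value (a Kaldi-style convention, not confirmed by this page); `add_dither` and its `seed` argument are hypothetical.

```python
import random


def add_dither(samples, dither=1.0, seed=None):
    """Return a copy of samples with Gaussian noise scaled by dither."""
    if dither == 0.0:
        # no dither: the signal is left untouched
        return list(samples)
    rng = random.Random(seed)
    return [x + dither * rng.gauss(0.0, 1.0) for x in samples]


samples = [100.0, -50.0, 25.0]
unchanged = add_dither(samples, dither=0.0)
noisy_a = add_dither(samples, dither=1.0, seed=0)
noisy_b = add_dither(samples, dither=1.0, seed=1)
```

Because the noise is random, two runs on the same signal give slightly different energies unless dither is set to 0.0.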

property frame_length: Frame length in seconds

property frame_shift: Frame shift in seconds

get_params(deep=True)

Get parameters for this processor.

Parameters: deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True.
Returns: params (mapping of string to any) – Parameter names mapped to their values.

get_properties(**kwargs): Return the processor’s properties as a dictionary

property log: Processor logger

property preemph_coeff: Coefficient for use in signal preemphasis
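Preemphasis is usually a first-order high-pass filter, y[n] = x[n] - coeff * x[n-1]. The sketch below is an illustration under that assumption; the handling of the first sample (using itself as its predecessor) follows a Kaldi-style convention and is not confirmed by this page. `preemphasize` is a hypothetical helper.

```python
def preemphasize(frame, coeff=0.97):
    """Apply y[n] = x[n] - coeff * x[n-1] to one frame."""
    if not frame:
        return []
    # assumption: the first sample uses itself as its predecessor
    out = [frame[0] - coeff * frame[0]]
    out.extend(frame[i] - coeff * frame[i - 1] for i in range(1, len(frame)))
    return out


constant = preemphasize([1.0, 1.0, 1.0], coeff=0.5)
identity = preemphasize([2.0, 4.0], coeff=0.0)
```

A constant frame is attenuated uniformly, and a coefficient of 0.0 leaves the frame unchanged.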

process_all(utterances, njobs=None, **kwargs)

Returns features processed from several input utterances

This function processes the features in parallel jobs.

Parameters

utterances (shennong.utterances.Utterances) – The utterances on which to process features.
njobs (int, optional) – The number of parallel jobs to run in background. Defaults to the number of CPU cores available on the machine.
**kwargs (dict, optional) – Extra arguments to be forwarded to the process method. Keys must be the same as for utterances.

Returns

features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input utterances.

Raises

ValueError – If the njobs parameter is <= 0 or if an entry is missing in optional kwargs.

property remove_dc_offset: If True, subtract mean from waveform on each frame
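Subtracting the mean on each frame amounts to the following minimal sketch, where `remove_dc_offset` is a hypothetical stand-alone function operating on a list of samples:

```python
def remove_dc_offset(frame):
    """Subtract the frame's mean from every sample of the frame."""
    if not frame:
        return []
    mean = sum(frame) / len(frame)
    return [x - mean for x in frame]


centered = remove_dc_offset([1.0, 2.0, 3.0])
```

After this step the samples of each frame sum to zero, so a constant offset in the recording does not contribute to the energy.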

property round_to_power_of_two

If true, round window size to power of two

This is done by zero-padding input to FFT
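As an illustration, a 25 ms frame at 16 kHz holds 400 samples, which would be zero-padded to 512. The sketch below shows one way to compute the padded size; `padded_window_size` and `zero_pad` are hypothetical helpers, not library functions.

```python
def padded_window_size(window_size):
    """Smallest power of two greater than or equal to window_size."""
    size = 1
    while size < window_size:
        size *= 2
    return size


def zero_pad(frame):
    """Append zeros so that the frame length is a power of two."""
    target = padded_window_size(len(frame))
    return list(frame) + [0.0] * (target - len(frame))


padded = zero_pad([1.0, 2.0, 3.0])
```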

property sample_rate

Waveform sample frequency in Hertz

Must match the sample rate of the signal specified in process

set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')

Change level and/or format of the processor’s logger

Parameters

level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’.
formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default, displays level, name and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display time, level, name and message.
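The format string follows the conventions of Python's standard logging module. The sketch below uses only the standard library to show what the default format produces and how a minimum level filters messages; the logger name 'energy-demo' is hypothetical, and the way the processor maps its level strings to logging levels is not shown here.

```python
import io
import logging

# capture the log output in memory so it can be inspected
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter('%(levelname)s - %(name)s - %(message)s'))

logger = logging.getLogger('energy-demo')  # hypothetical logger name
logger.propagate = False
logger.addHandler(handler)
logger.setLevel(logging.WARNING)  # minimum level handled

logger.info('ignored: below the minimum level')
logger.warning('kept: at or above the minimum level')

output = stream.getvalue()
```

Only the warning reaches the handler, formatted as level, logger name and message.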

set_params(**params)

Set the parameters of this processor.

Returns: self
Raises: ValueError – If any given parameter in params is invalid for the processor.

property snip_edges

If true, output only frames that completely fit in the file

When True the number of frames depends on the frame_length. If False, the number of frames depends only on the frame_shift, and we reflect the data at the ends.
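The relation between signal length and number of frames can be sketched as follows, assuming the Kaldi-style framing convention (an assumption, not stated on this page). The `num_frames` helper is hypothetical, and the length of 22720 samples at 16 kHz is chosen only because it reproduces the shapes shown in the examples; the actual length of test.wav is not stated.

```python
def num_frames(num_samples, sample_rate, frame_shift, frame_length,
               snip_edges=True):
    """Number of frames for a signal, frame parameters given in seconds."""
    shift = round(sample_rate * frame_shift)    # shift in samples
    length = round(sample_rate * frame_length)  # length in samples
    if snip_edges:
        # only frames that fit completely in the signal
        if num_samples < length:
            return 0
        return 1 + (num_samples - length) // shift
    # depends only on the shift (the signal is reflected at the ends)
    return (num_samples + shift // 2) // shift


n = 22720  # hypothetical signal length in samples
default_frames = num_frames(n, 16000, 0.01, 0.025)
custom_frames = num_frames(n, 16000, 0.02, 0.05)
no_snip_frames = num_frames(n, 16000, 0.01, 0.025, snip_edges=False)
```

With snip_edges set to False, changing frame_length alone leaves the number of frames unchanged in this sketch.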

times(nframes): Returns the time labels for the rows given by process()

property window_type

Type of window

Must be ‘hamming’, ‘hanning’, ‘povey’, ‘rectangular’ or ‘blackman’
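For reference, the window shapes can be sketched with the formulas below. They follow the Kaldi-style definitions, which is an assumption here since this page does not give the formulas; the `window` function is a hypothetical helper and requires a size of at least 2.

```python
import math


def window(window_type, size, blackman_coeff=0.42):
    """Return the window coefficients as a list of floats (size >= 2)."""
    a = 2.0 * math.pi / (size - 1)
    values = []
    for i in range(size):
        if window_type == 'hanning':
            w = 0.5 - 0.5 * math.cos(a * i)
        elif window_type == 'hamming':
            w = 0.54 - 0.46 * math.cos(a * i)
        elif window_type == 'povey':
            # like hanning, raised to the power 0.85
            w = (0.5 - 0.5 * math.cos(a * i)) ** 0.85
        elif window_type == 'rectangular':
            w = 1.0
        elif window_type == 'blackman':
            w = (blackman_coeff - 0.5 * math.cos(a * i)
                 + (0.5 - blackman_coeff) * math.cos(2 * a * i))
        else:
            raise ValueError(f'unknown window type: {window_type!r}')
        values.append(w)
    return values


hanning = window('hanning', 5)
povey = window('povey', 5)
```

In this sketch the ‘hanning’ and ‘povey’ windows go to zero at the frame edges, while ‘hamming’ keeps a small non-zero value there.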

property raw_energy: If true, compute energy before preemphasis and windowing

process(signal)

Computes energy on the input signal

Parameters: signal (Audio) – The input audio signal to process
Returns: energy (Features) – The computed (and compressed) energy
Raises: ValueError – If the input signal has more than one channel (i.e. is not mono), or if sample_rate != signal.sample_rate.