Spectrogram
Extraction of spectrogram from audio signals
Extracts a spectrogram (the log of the power spectrum) from an audio signal, using the Kaldi implementation (see [kaldi-spec]).
Examples
>>> from shennong.audio import Audio
>>> from shennong.processor.spectrogram import SpectrogramProcessor
>>> audio = Audio.load('./test/data/test.wav')
Initialize the spectrogram processor with some options and compute the features:
>>> processor = SpectrogramProcessor(sample_rate=audio.sample_rate)
>>> processor.window_type = 'hanning'
>>> spect = processor.process(audio)
>>> spect.shape
(140, 257)
References
- 
class shennong.processor.spectrogram.SpectrogramProcessor(sample_rate=16000, frame_shift=0.01, frame_length=0.025, dither=1.0, preemph_coeff=0.97, remove_dc_offset=True, window_type='povey', round_to_power_of_two=True, blackman_coeff=0.42, snip_edges=True, energy_floor=0.0, raw_energy=True)
- Bases: shennong.processor.base.FramesProcessor - Spectrogram
property name
- Name of the processor 
 - 
property ndims
- Dimension of the output features frames 
 - 
property blackman_coeff
- Constant coefficient for generalized Blackman window - Used only if window_type is ‘blackman’ 
 - 
property dither
- Amount of dithering - 0.0 means no dither 
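Dithering adds a small amount of random noise to the samples before analysis. The sketch below assumes zero-mean, unit-variance Gaussian noise scaled by the dither value, which is how Kaldi is understood to apply it; the function name `apply_dither` and its `seed` argument are illustrative and not part of shennong.

```python
import random


def apply_dither(frame, dither=1.0, seed=None):
    """Return a copy of `frame` with Gaussian noise scaled by `dither`.

    Illustrative only: the noise model is an assumption, and `seed`
    exists here just to make the sketch reproducible.
    """
    if dither == 0.0:
        # 0.0 means no dither: the samples are returned unchanged
        return list(frame)
    rng = random.Random(seed)
    return [x + dither * rng.gauss(0.0, 1.0) for x in frame]
```

Because the noise is random, two runs on the same signal can give slightly different features unless dither is set to 0.0.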
 - 
property energy_floor
 - 
property frame_length
- Frame length in seconds 
 - 
property frame_shift
- Frame shift in seconds 
 - 
get_params(deep=True)
- Get parameters for this processor. - Parameters
- deep (boolean, optional) – If True, will return the parameters for this processor and contained subobjects that are processors. Defaults to True. 
- Returns
- params (mapping of string to any) – Parameter names mapped to their values. 
 
 - 
get_properties(**kwargs)
- Return the processor’s properties as a dictionary 
 - 
property log
- Processor logger 
 - 
property preemph_coeff
- Coefficient for use in signal preemphasis 
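Pre-emphasis is a first-order high-pass filter applied to each frame. A minimal sketch, assuming the usual form y[i] = x[i] - coeff * x[i-1] and assuming the first sample is filtered against itself (as Kaldi is understood to do); the function name `preemphasize` is illustrative.

```python
def preemphasize(frame, preemph_coeff=0.97):
    """Apply first-order pre-emphasis to a list of samples.

    Assumption: the first sample uses itself as its predecessor.
    """
    if not frame:
        return []
    out = [frame[0] - preemph_coeff * frame[0]]
    for i in range(1, len(frame)):
        out.append(frame[i] - preemph_coeff * frame[i - 1])
    return out
```

A coefficient of 0.0 leaves the frame unchanged, while values close to 1.0 attenuate low frequencies more strongly.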
 - 
process_all(utterances, njobs=None, **kwargs)
- Returns features processed from several input utterances - This function processes the features in parallel jobs. - Parameters
- utterances (Utterances) – The utterances to process features on. 
- njobs (int, optional) – The number of parallel jobs to run in background. Defaults to the number of CPU cores available on the machine. 
- **kwargs (dict, optional) – Extra arguments to be forwarded to the process method. Keys must be the same as for utterances. 
 
- Returns
- features (FeaturesCollection) – The computed features on each input signal. The keys of output features are the keys of the input utterances.
- Raises
- ValueError – If the njobs parameter is <= 0 or if an entry is missing in optional kwargs. 
 
 - 
property remove_dc_offset
- If True, subtract mean from waveform on each frame 
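Removing the DC offset amounts to centering each frame around zero. A minimal sketch of that operation on a single frame (the function name is illustrative and independent of shennong):

```python
def remove_dc_offset(frame):
    """Subtract the mean of the frame from every sample."""
    if not frame:
        return []
    mean = sum(frame) / len(frame)
    return [x - mean for x in frame]
```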
 - 
property round_to_power_of_two
- If true, round window size to power of two - This is done by zero-padding input to FFT 
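The padded window size determines the number of output dimensions. The sketch below assumes that a real FFT of size N yields N/2 + 1 bins, which is consistent with the 257 columns in the example at the top of this page (16000 Hz, 25 ms window, 400 samples padded to 512). The helper names `fft_size` and `num_bins` are illustrative.

```python
def fft_size(sample_rate=16000, frame_length=0.025,
             round_to_power_of_two=True):
    """Return the FFT size in samples for the given window options."""
    window_size = int(round(sample_rate * frame_length))
    if not round_to_power_of_two:
        return window_size
    # smallest power of two greater than or equal to the window size
    padded = 1
    while padded < window_size:
        padded *= 2
    return padded


def num_bins(sample_rate=16000, frame_length=0.025,
             round_to_power_of_two=True):
    """Number of frequency bins of a real FFT of size `fft_size`."""
    size = fft_size(sample_rate, frame_length, round_to_power_of_two)
    return size // 2 + 1
```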
 - 
property sample_rate
- Waveform sample frequency in Hertz - Must match the sample rate of the signal specified in process 
 - 
set_logger(level, formatter='%(levelname)s - %(name)s - %(message)s')
- Change level and/or format of the processor’s logger - Parameters
- level (str) – The minimum log level handled by the logger (any message below this level will be ignored). Must be ‘debug’, ‘info’, ‘warning’ or ‘error’. 
- formatter (str, optional) – A string to format the log messages, see https://docs.python.org/3/library/logging.html#formatter-objects. By default displays level, name and message. Use ‘%(asctime)s - %(levelname)s - %(name)s - %(message)s’ to display time, level, name and message. 
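To preview what a given formatter string produces, it can be tried with the standard logging module alone. This sketch does not use shennong; the helper `format_message` and the logger name 'spectrogram' are illustrative.

```python
import logging


def format_message(fmt, level, name, message):
    """Render one log message with the given format string."""
    record = logging.LogRecord(
        name=name, level=level, pathname='', lineno=0,
        msg=message, args=None, exc_info=None)
    return logging.Formatter(fmt).format(record)


# the default format string from the set_logger signature
line = format_message(
    '%(levelname)s - %(name)s - %(message)s',
    logging.WARNING, 'spectrogram', 'hello')
```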
 
 
 - 
set_params(**params)
- Set the parameters of this processor. - Returns
- self 
- Raises
- ValueError – If any given parameter in params is invalid for the processor.
 
 - 
property snip_edges
- If true, output only frames that completely fit in the file - When True the number of frames depends on the frame_length. If False, the number of frames depends only on the frame_shift, and we reflect the data at the ends. 
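The effect of snip_edges on the number of output frames can be sketched as follows. The formulas are an assumption based on how Kaldi is understood to count frames, and the helper `num_frames` is illustrative, not a shennong function.

```python
def num_frames(num_samples, sample_rate=16000, frame_shift=0.01,
               frame_length=0.025, snip_edges=True):
    """Estimate the number of frames extracted from `num_samples`."""
    shift = int(round(sample_rate * frame_shift))
    length = int(round(sample_rate * frame_length))
    if snip_edges:
        # only frames that fit entirely within the signal are kept
        if num_samples < length:
            return 0
        return 1 + (num_samples - length) // shift
    # frames are centered on multiples of the shift; the signal is
    # reflected at both ends, so the window length plays no role
    return (num_samples + shift // 2) // shift
```

For a hypothetical signal of 22640 samples at 16000 Hz, this gives 140 frames with snip_edges=True and 142 frames with snip_edges=False.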
 - 
property window_type
- Type of window - Must be ‘hamming’, ‘hanning’, ‘povey’, ‘rectangular’ or ‘blackman’ 
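The five window types can be sketched in plain Python. The formulas below follow the definitions Kaldi is understood to use (in particular, 'povey' is taken to be a Hanning window raised to the power 0.85) and should be read as assumptions rather than as the library's code; the function name `window` is illustrative.

```python
import math


def window(window_type, size, blackman_coeff=0.42):
    """Return the window coefficients as a list of `size` floats."""
    if size < 2:
        raise ValueError('size must be at least 2')
    a = 2.0 * math.pi / (size - 1)
    values = []
    for i in range(size):
        if window_type == 'hanning':
            w = 0.5 - 0.5 * math.cos(a * i)
        elif window_type == 'hamming':
            w = 0.54 - 0.46 * math.cos(a * i)
        elif window_type == 'povey':
            # assumed: Hanning-like window raised to the power 0.85
            w = (0.5 - 0.5 * math.cos(a * i)) ** 0.85
        elif window_type == 'rectangular':
            w = 1.0
        elif window_type == 'blackman':
            # blackman_coeff is only used for this window type
            w = (blackman_coeff - 0.5 * math.cos(a * i)
                 + (0.5 - blackman_coeff) * math.cos(2.0 * a * i))
        else:
            raise ValueError('invalid window type: ' + window_type)
        values.append(w)
    return values
```

Under these definitions the 'hanning' and 'povey' windows go to zero at the edges, whereas 'hamming' does not.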
 - 
property raw_energy
 - 
process(signal)
- Compute spectrogram with the specified options - Parameters
- signal (Audio, shape = [nsamples, 1]) – The input audio signal to compute the features on, must be mono 
- Returns
- features (Features, shape = [nframes, ndims]) – The computed features, output will have as many rows as there are frames (depends on the specified options frame_shift and frame_length). 
- Raises
- ValueError – If the input signal has more than one channel (i.e. is not mono), or if sample_rate != signal.sample_rate. 
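For intuition, the core of the computation on one already windowed frame is the log of the power spectrum. The sketch below uses a naive DFT from the standard library and is not the Kaldi implementation: it ignores the energy_floor and raw_energy options, and the `floor` argument is an assumed safeguard against log(0).

```python
import cmath
import math


def log_power_spectrum(frame, floor=1e-10):
    """Log power spectrum of one frame, bins 0 to len(frame) // 2."""
    n = len(frame)
    out = []
    for k in range(n // 2 + 1):
        # naive DFT coefficient for frequency bin k
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(frame))
        power = abs(coeff) ** 2
        out.append(math.log(max(power, floor)))
    return out
```

Stacking the result for every frame gives a matrix of shape [nframes, ndims], matching the shape described for the returned features.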
 
 
- 
property