Audio data

Provides the Audio class that handles audio signals

Note

Supports all audio formats handled by ffmpeg (wav, mp3, flac, etc.). See https://www.ffmpeg.org/general.html#File-Formats for details.

The Audio class lets you load, save and manipulate multichannel audio data. The underlying audio samples can be of one of the following types (with the corresponding min and max):

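==========  ===========  ===========
Type        Min          Max
==========  ===========  ===========
np.int16    -32768       +32767
np.int32    -2147483648  +2147483647
np.float32  -1.0         +1.0
np.float64  -1.0         +1.0
==========  ===========  ===========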

When loading an audio file with Audio.load(), the samples are expected to stay within these bounds. When creating an Audio instance from a raw data array, the validate parameter of the class constructor and the method Audio.is_valid() check that the data type is supported and that the samples stay within these bounds.
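
The check described above can be sketched with NumPy alone. This is a minimal illustration, not shennong's implementation: the helper name samples_are_valid and the SAMPLE_RANGES mapping are hypothetical, the ranges are copied from the table above, and treating an empty signal as valid is an assumption.

```python
import numpy as np

# Ranges copied from the table above; names are hypothetical.
SAMPLE_RANGES = {
    np.dtype(np.int16): (-32768, 32767),
    np.dtype(np.int32): (-2147483648, 2147483647),
    np.dtype(np.float32): (-1.0, 1.0),
    np.dtype(np.float64): (-1.0, 1.0),
}


def samples_are_valid(data):
    """Return True if `data` has a supported dtype and stays within its range."""
    data = np.asarray(data)
    if data.dtype not in SAMPLE_RANGES:
        return False
    if data.size == 0:
        return True  # assumption: an empty signal is treated as valid
    low, high = SAMPLE_RANGES[data.dtype]
    return bool(data.min() >= low and data.max() <= high)
```

For the integer types the range test can never fail, since the bounds are those of the type itself. Only the dtype test and the floating-point range test can reject data.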

Examples

>>> import os
>>> import numpy as np
>>> from shennong.audio import Audio

Create 1000 samples of a stereo signal at 16 kHz:

>>> audio = Audio(np.random.random((1000, 2)), 16000)
>>> audio.data.shape
(1000, 2)
>>> audio.dtype
dtype('float64')
>>> audio.sample_rate
16000
>>> audio.nchannels
2
>>> audio.duration
0.0625

Resample the signal to 8 kHz and convert it to 16-bit integers:

>>> audio2 = audio.resample(8000).astype(np.int16)
>>> audio2.sample_rate
8000
>>> audio2.duration == audio.duration
True
>>> audio2.dtype
dtype('int16')
>>> audio2.is_valid()
True

Save the Audio instance as a wav file, then load that wav file as a new Audio instance:

>>> audio.save('stereo.wav')
>>> audio3 = Audio.load('stereo.wav')
>>> audio == audio3
True
>>> os.remove('stereo.wav')

Extract mono signals from a stereo one (left and right are instances of Audio as well):

>>> left = audio.channel(0)
>>> right = audio.channel(1)
>>> left.duration == right.duration == audio.duration
True
>>> left.nchannels == right.nchannels == 1
True

class shennong.audio.Audio(data, sample_rate, validate=True)

Bases: object

Create an audio signal with the given data and sample_rate

data

The waveform audio signal, must be of one of the supported types (see above)

Type: numpy array, shape = [nsamples, nchannels]

sample_rate

The sample frequency of the data, in Hertz

Type: float

validate

When True, make sure the underlying data is valid (see is_valid()), defaults to True

Type: bool, optional

Raises: ValueError – If validate is True and the audio data is not valid (see is_valid())

property data: The numpy array of audio data

property sample_rate: The sample frequency of the signal in Hertz

property duration: The duration of the signal in seconds

property nchannels: The number of audio channels in the signal

property nsamples: The number of samples in the signal

property shape: The shape of the underlying data

property dtype: The numeric type of samples

property precision: The number of bits per sample
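
How these properties relate to the underlying array can be sketched with plain NumPy. This is an illustration under assumptions, not the class's code: it assumes a two-dimensional array of shape [nsamples, nchannels] and takes the precision to be the byte size of the dtype times 8.

```python
import numpy as np

sample_rate = 16000
data = np.zeros((1000, 2), dtype=np.float64)  # shape = [nsamples, nchannels]

nsamples, nchannels = data.shape
duration = nsamples / sample_rate    # in seconds
precision = data.dtype.itemsize * 8  # bits per sample (assumed definition)

print(nsamples, nchannels, duration, precision)  # 1000 2 0.0625 64
```

The duration matches the value shown in the stereo example above (1000 samples at 16 kHz).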

classmethod scan(filename)

Returns the audio metadata without loading the file

Returns a Python namespace (a named tuple) metadata with the following fields:

metadata.nchannels : int, number of channels

metadata.sample_rate : int, sample frequency in Hz

metadata.nsamples : int, number of audio samples in the file

metadata.duration : float, audio duration in seconds

This method is useful to access the metadata of an audio file without loading it into memory, which is much faster than load().

Parameters: filename (str) – Audio filename on which to retrieve metadata, must be an existing file
Returns: metadata (namespace) – A namespace with fields as described above
Raises: ValueError – If the filename is not a valid audio file that ffmpeg can process.
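
The idea of reading metadata without decoding the samples can be illustrated with Python's standard wave module. This is only an analogy under stated assumptions: it handles WAV files only, whereas scan() relies on ffmpeg, and the names Metadata and scan_wav are hypothetical.

```python
import os
import tempfile
import wave
from collections import namedtuple

# Hypothetical stand-in for the namespace returned by Audio.scan()
Metadata = namedtuple('Metadata', 'nchannels sample_rate nsamples duration')


def scan_wav(filename):
    """Read the WAV header fields only; the samples are never decoded."""
    with wave.open(filename, 'rb') as wav:
        nsamples = wav.getnframes()
        sample_rate = wav.getframerate()
        return Metadata(
            wav.getnchannels(), sample_rate, nsamples, nsamples / sample_rate)


with tempfile.TemporaryDirectory() as tmpdir:
    # Write one second of 16-bit stereo silence at 16 kHz, then scan it
    path = os.path.join(tmpdir, 'silence.wav')
    with wave.open(path, 'wb') as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(16000)
        wav.writeframes(b'\x00' * (16000 * 2 * 2))

    metadata = scan_wav(path)

print(metadata.nchannels, metadata.sample_rate, metadata.nsamples)
```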

classmethod load(filename)

Creates an Audio instance from an audio file

Parameters: filename (str) – Path to the audio file to load, must be an existing file
Returns: audio (Audio) – The Audio instance initialized from the filename
Raises: ValueError – If the filename is not a valid audio file.

save(filename)

Saves the audio data to a file

Parameters: filename (str) – The audio file to create, format is guessed from extension
Raises: ValueError – If the file already exists or is unreachable

channel(index)

Builds a mono signal from a multi-channel one

Parameters: index (int) – The audio channel to extract from the original signal
Returns: mono (Audio) – The extracted single-channel data
Raises: ValueError – If index >= nchannels

resample(sample_rate, backend='sox')

Returns the audio signal resampled at the given sample_rate

Unless backend is ‘scipy’, this method first relies on pysox; if sox is not installed on your system, or if anything goes wrong, it falls back to scipy.signal.resample.

The sox backend is very fast and accurate but relies on an external binary, whereas the scipy backend can be very slow but works in pure Python.

Parameters

sample_rate (int) – The sample frequency used to resample the signal, in Hz
backend (str, optional) – The backend to use for resampling, must be ‘sox’ or ‘scipy’, defaults to ‘sox’

Returns

audio (Audio) – An Audio instance containing the resampled signal

Raises

ValueError – If the backend is not ‘sox’ or ‘scipy’, or if the resampling failed
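
The scipy path can be sketched as follows. This is a minimal illustration of calling scipy.signal.resample on an array of shape [nsamples, nchannels], not the method's actual code: the helper name resample_with_scipy and the rounding of the target number of samples are assumptions.

```python
import numpy as np
from scipy.signal import resample


def resample_with_scipy(data, old_rate, new_rate):
    """Resample `data` (shape [nsamples, nchannels]) from old_rate to new_rate."""
    # Keep the duration constant: nsamples / rate must not change
    # (assumption: the target length is rounded to the nearest integer)
    new_nsamples = int(round(data.shape[0] * new_rate / old_rate))
    # Fourier-based resampling along the time axis
    return resample(data, new_nsamples, axis=0)


signal = np.random.uniform(-0.5, 0.5, size=(1000, 2))
resampled = resample_with_scipy(signal, 16000, 8000)
print(resampled.shape)  # (500, 2)
```

Because the resampled values are recomputed, it may be worth checking afterwards that floating-point samples still lie within [-1.0, +1.0].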

is_valid()

Returns True if the audio data is valid, False otherwise

An Audio instance is valid if the underlying data type is supported (must be np.int16, np.int32, np.float32 or np.float64), and if the samples min/max are within the expected boundaries for the given data type (see above).

astype(dtype)

Returns the audio signal converted to the dtype numeric type

The valid types are np.int16, np.int32, np.float32 and np.float64; see above for the min and max of each type.

Parameters: dtype (numeric type) – Must be one of the integer or floating-point types described above.
Raises: ValueError – If the requested dtype is not supported

segment(segments)

Returns audio chunks segmented from the original signal

Parameters: segments (list of pairs of floats) – A list of pairs (tstart, tstop) giving the start and stop times (in seconds) of the signal chunks to extract. The times tstart and tstop must be floats, with tstart < tstop.
Returns: chunks (list of Audio) – The signal chunks created from the given segments
Raises: ValueError – If segments is not a list, if one of its elements is not a pair of floats, or if tstart >= tstop.
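
Converting (tstart, tstop) pairs into array slices can be sketched as below. This is an illustration on raw NumPy arrays rather than Audio instances: the helper name cut_segments is hypothetical, and converting times to sample indices by truncation is an assumption.

```python
import numpy as np


def cut_segments(data, sample_rate, segments):
    """Return one array per (tstart, tstop) pair, with times in seconds."""
    chunks = []
    for tstart, tstop in segments:
        if tstart >= tstop:
            raise ValueError(
                f'tstart must be lower than tstop: {tstart} >= {tstop}')
        # assumption: times are converted to sample indices by truncation
        istart = int(tstart * sample_rate)
        istop = int(tstop * sample_rate)
        chunks.append(data[istart:istop])
    return chunks


data = np.zeros((16000, 2))  # one second of stereo audio at 16 kHz
chunks = cut_segments(data, 16000, [(0.0, 0.25), (0.5, 1.0)])
print([chunk.shape for chunk in chunks])  # [(4000, 2), (8000, 2)]
```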