Audio data

Provides the Audio class that handles audio signals

Note

Supports all audio formats handled by ffmpeg (wav, mp3, flac, etc.). See https://www.ffmpeg.org/general.html#File-Formats for details.

The Audio class lets you load, save and manipulate multichannel audio data. The underlying audio samples must be of one of the following types (with the corresponding min and max):

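==========  ===========  ===========
Type        Min          Max
==========  ===========  ===========
np.int16    -32768       +32767
np.int32    -2147483648  +2147483647
np.float32  -1.0         +1.0
np.float64  -1.0         +1.0
==========  ===========  ===========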

When loading an audio file with Audio.load(), the samples are expected to respect these bounds. When creating an Audio instance from a raw data array, the validate parameter of the class constructor and the method Audio.is_valid() check that the data type and min/max are respected.
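The validity rule described above can be sketched with plain NumPy. The helper name `samples_are_valid` and the `SAMPLE_RANGES` mapping are hypothetical and only mirror the table of types; this is an illustration of the rule, not the library's implementation of Audio.is_valid().

```python
import numpy as np

# Expected (min, max) per supported dtype, copied from the table of types.
SAMPLE_RANGES = {
    np.dtype(np.int16): (-32768, 32767),
    np.dtype(np.int32): (-2147483648, 2147483647),
    np.dtype(np.float32): (-1.0, 1.0),
    np.dtype(np.float64): (-1.0, 1.0),
}


def samples_are_valid(data):
    """Hypothetical helper: True if dtype is supported and samples are in range."""
    data = np.asarray(data)
    if data.dtype not in SAMPLE_RANGES:
        return False
    if data.size == 0:
        return True  # assumption: an empty signal has nothing out of range
    low, high = SAMPLE_RANGES[data.dtype]
    return bool(data.min() >= low and data.max() <= high)
```

For integer types the range test is always satisfied, since the bounds are those of the type itself; it only matters for floating-point data, where values outside [-1.0, +1.0] are rejected.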

Examples

>>> import os
>>> import numpy as np
>>> from shennong.audio import Audio

Create 1000 samples of a stereo signal at 16 kHz:

>>> audio = Audio(np.random.random((1000, 2)), 16000)
>>> audio.data.shape
(1000, 2)
>>> audio.dtype
dtype('float64')
>>> audio.sample_rate
16000
>>> audio.nchannels
2
>>> audio.duration
0.0625

Resample the signal to 8 kHz and convert it to 16-bit integers:

>>> audio2 = audio.resample(8000).astype(np.int16)
>>> audio2.sample_rate
8000
>>> audio2.duration == audio.duration
True
>>> audio2.dtype
dtype('int16')
>>> audio2.is_valid()
True

Save the Audio instance as a wav file, then load that file back as a new Audio instance:

>>> audio.save('stereo.wav')
>>> audio3 = Audio.load('stereo.wav')
>>> audio == audio3
True
>>> os.remove('stereo.wav')

Extract mono signals from a stereo one (left and right are instances of Audio as well):

>>> left = audio.channel(0)
>>> right = audio.channel(1)
>>> left.duration == right.duration == audio.duration
True
>>> left.nchannels == right.nchannels == 1
True

class shennong.audio.Audio(data, sample_rate, validate=True)

Bases: object

Create an audio signal with the given data and sample_rate

data

The waveform audio signal, must be of one of the supported types (see above)

Type

numpy array, shape = [nsamples, nchannels]

sample_rate

The sample frequency of the data, in Hertz

Type

float

validate

When True, check that the underlying data is valid (see is_valid()). Defaults to True.

Type

bool, optional

Raises

ValueError – If validate is True and the audio data is not valid (see is_valid())

property data

The numpy array of audio data

property sample_rate

The sample frequency of the signal in Hertz

property duration

The duration of the signal in seconds

property nchannels

The number of audio channels in the signal

property nsamples

The number of samples in the signal

property shape

The shape of the underlying data array

property dtype

The numeric type of samples

property precision

The number of bits per sample

classmethod scan(filename)

Returns the audio metadata without loading the file

Returns a Python namespace (a named tuple) metadata with the following fields:

  • metadata.nchannels : int, number of channels

  • metadata.sample_rate : int, sample frequency in Hz

  • metadata.nsamples : int, number of audio samples in the file

  • metadata.duration : float, audio duration in seconds

This method is useful to access the metadata of an audio file without loading it into memory, and is much faster than load().

Parameters

filename (str) – Path to the audio file from which to retrieve metadata, must be an existing file

Returns

metadata (namespace) – A namespace with fields as described above

Raises

ValueError – If the filename is not a valid audio file that ffmpeg can process.
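To illustrate the idea of reading metadata without loading the samples, here is a minimal sketch that uses the `wave` module from the Python standard library instead of ffmpeg. It only handles WAV files, and the `Metadata` tuple and `scan_wav` function are hypothetical names chosen to mirror the fields listed above; it is not how scan() is implemented.

```python
import collections
import os
import tempfile
import wave

# Hypothetical container mirroring the fields listed above.
Metadata = collections.namedtuple(
    'Metadata', ['nchannels', 'sample_rate', 'nsamples', 'duration'])


def scan_wav(filename):
    """Read WAV header fields without reading the audio frames."""
    with wave.open(filename, 'rb') as wav:
        nsamples = wav.getnframes()
        sample_rate = wav.getframerate()
        return Metadata(
            nchannels=wav.getnchannels(),
            sample_rate=sample_rate,
            nsamples=nsamples,
            duration=nsamples / sample_rate)


# Write 1000 silent stereo frames of 16-bit audio at 16 kHz, then scan them.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'stereo.wav')
    with wave.open(path, 'wb') as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)  # 2 bytes per sample, i.e. 16 bits
        wav.setframerate(16000)
        wav.writeframes(b'\x00' * (1000 * 2 * 2))
    metadata = scan_wav(path)
```

Here the duration is derived as nsamples / sample_rate, which gives 0.0625 seconds for 1000 frames at 16 kHz.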

classmethod load(filename)

Creates an Audio instance from an audio file

Parameters

filename (str) – Path to the audio file to load, must be an existing file

Returns

audio (Audio) – The Audio instance initialized from the file

Raises

ValueError – If the filename is not a valid audio file.

save(filename)

Saves the audio data to a file

Parameters

filename (str) – The audio file to create, format is guessed from extension

Raises

ValueError – If the file already exists or is unreachable

channel(index)

Builds a mono signal from a multi-channel one

Parameters

index (int) – The audio channel to extract from the original signal

Returns

mono (Audio) – The extracted single-channel data

Raises

ValueError – If index >= nchannels
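On a raw array of shape [nsamples, nchannels], extracting a channel amounts to selecting one column. The sketch below shows this with NumPy only; `extract_channel` is a hypothetical helper, and the shape of the data held by the Audio instance that channel() returns is not specified here.

```python
import numpy as np


def extract_channel(data, index):
    """Hypothetical helper: return one channel of a (nsamples, nchannels) array."""
    nchannels = data.shape[1]
    if index >= nchannels:
        raise ValueError(
            'channel index {} out of range for {} channels'.format(
                index, nchannels))
    return data[:, index]


stereo = np.random.random((1000, 2))
left = extract_channel(stereo, 0)
right = extract_channel(stereo, 1)
```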

resample(sample_rate, backend='sox')

Returns the audio signal resampled at the given sample_rate

This method first relies on pysox (unless backend is ‘scipy’) and, if sox is not installed on your system or anything goes wrong, falls back to scipy.signal.resample.

The sox backend is very fast and accurate but relies on an external binary, whereas the scipy backend can be very slow but works in pure Python.

Parameters

  • sample_rate (int) – The sample frequency used to resample the signal, in Hz

  • backend (str, optional) – The backend to use for resampling, must be ‘sox’ or ‘scipy’, defaults to ‘sox’

Returns

audio (Audio) – An Audio instance containing the resampled signal

Raises

ValueError – If the backend is not ‘sox’ or ‘scipy’, or if the resampling failed
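As a rough illustration of the scipy fallback, the sketch below calls scipy.signal.resample directly on a raw array. The helper `resample_scipy` is hypothetical, and the way it derives the output length (scaling the number of samples by the ratio of the two rates) is an assumption, not a description of what resample() does internally.

```python
import numpy as np
from scipy.signal import resample


def resample_scipy(data, old_rate, new_rate):
    """Hypothetical helper: resample a (nsamples, nchannels) array along time."""
    # Assumption: the output length is scaled by the ratio of the two rates.
    nsamples = int(round(data.shape[0] * new_rate / old_rate))
    return resample(data, nsamples, axis=0)


stereo = np.random.random((1000, 2))
resampled = resample_scipy(stereo, 16000, 8000)
```

With 1000 samples at 16 kHz resampled to 8 kHz, the result holds 500 samples per channel, so the duration (nsamples / sample_rate) is unchanged.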

is_valid()

Returns True if the audio data is valid, False otherwise

An Audio instance is valid if the underlying data type is supported (must be np.int16, np.int32, np.float32 or np.float64), and if the samples min/max are within the expected boundaries for the given data type (see above).

astype(dtype)

Returns the audio signal converted to the dtype numeric type

The valid types are np.int16, np.int32, np.float32 and np.float64; see above for the min and max of each type.

Parameters

dtype (numeric type) – Must be an integer or a floating-point type in the types described above.

Raises

ValueError – If the requested dtype is not supported

segment(segments)

Returns audio chunks segmented from the original signal

Parameters

segments (list of pairs of floats) – A list of pairs (tstart, tstop) giving the start and stop times (in seconds) of the signal chunks to extract. The times tstart and tstop must be floats, with tstart < tstop.

Returns

chunks (list of Audio) – The signal chunks created from the given segments

Raises

ValueError – If segments is not a list, if one element in segments is not a pair of floats, or if tstart >= tstop.
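The conversion from times in seconds to sample positions can be sketched in pure Python. The function `segment_bounds` is a hypothetical helper that applies the checks listed above; truncating the times to integer sample indices is an assumption, and the library may round differently.

```python
def segment_bounds(segments, sample_rate):
    """Hypothetical helper: map (tstart, tstop) pairs in seconds to indices."""
    if not isinstance(segments, list):
        raise ValueError('segments must be a list')
    bounds = []
    for segment in segments:
        if not (isinstance(segment, (tuple, list)) and len(segment) == 2
                and all(isinstance(t, float) for t in segment)):
            raise ValueError('each segment must be a pair of floats')
        tstart, tstop = segment
        if tstart >= tstop:
            raise ValueError('tstart must be lower than tstop')
        # Assumption: times are truncated to the sample at or before them.
        bounds.append((int(tstart * sample_rate), int(tstop * sample_rate)))
    return bounds


samples = list(range(16000))  # one second of dummy samples at 16 kHz
chunks = [samples[start:stop]
          for start, stop in segment_bounds([(0.0, 0.5), (0.25, 1.0)], 16000)]
```

Segments may overlap, as in this usage example, since each pair is handled independently.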