Audio data
Provides the Audio class that handles audio signals.
Note
Supports all audio formats handled by ffmpeg (wav, mp3, flac, etc.). See https://www.ffmpeg.org/general.html#File-Formats for details.
The Audio class can load, save and manipulate multichannel audio data. The underlying audio samples must be of one of the following types (with the corresponding min and max):
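==========  ===========  ===========
Type        Min          Max
==========  ===========  ===========
np.int16    -32768       +32767
np.int32    -2147483648  +2147483647
np.float32  -1.0         +1.0
np.float64  -1.0         +1.0
==========  ===========  ===========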
When loading an audio file with Audio.load(), those min/max are expected to be respected. When creating an Audio instance from a raw data array, the validate parameter of the class constructor and the method Audio.is_valid() make sure the data type and min/max are respected.
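The validity rule described above can be sketched with plain NumPy. This is only an illustration under stated assumptions, not the library's implementation: the helper name `samples_are_valid` and the `SUPPORTED_RANGES` mapping are hypothetical, and the treatment of an empty array is a guess.

```python
import numpy as np

# Supported sample types and their expected (min, max), as listed above.
SUPPORTED_RANGES = {
    np.dtype(np.int16): (-32768, 32767),
    np.dtype(np.int32): (-2147483648, 2147483647),
    np.dtype(np.float32): (-1.0, 1.0),
    np.dtype(np.float64): (-1.0, 1.0),
}


def samples_are_valid(data):
    """Hypothetical helper: True if `data` has a supported dtype and range."""
    data = np.asarray(data)
    if data.dtype not in SUPPORTED_RANGES:
        return False
    if data.size == 0:
        return True  # assumption: an empty signal has nothing out of range
    low, high = SUPPORTED_RANGES[data.dtype]
    return bool(data.min() >= low and data.max() <= high)
```

With this sketch, `samples_are_valid(np.random.random((1000, 2)))` holds because the samples are float64 values in [0, 1), whereas an int64 array or a float array containing 1.5 is rejected.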
Examples
>>> import os
>>> import numpy as np
>>> from shennong.audio import Audio
Create 1000 samples of a stereo signal at 16 kHz:
>>> audio = Audio(np.random.random((1000, 2)), 16000)
>>> audio.data.shape
(1000, 2)
>>> audio.dtype
dtype('float64')
>>> audio.sample_rate
16000
>>> audio.nchannels
2
>>> audio.duration
0.0625
Resample the signal to 8 kHz and convert it to 16-bit integers:
>>> audio2 = audio.resample(8000).astype(np.int16)
>>> audio2.sample_rate
8000
>>> audio2.duration == audio.duration
True
>>> audio2.dtype
dtype('int16')
>>> audio2.is_valid()
True
Save the Audio instance as a wav file, then load an existing wav file as an Audio instance:
>>> audio.save('stereo.wav')
>>> audio3 = Audio.load('stereo.wav')
>>> audio == audio3
True
>>> os.remove('stereo.wav')
Extract mono signals from a stereo one (left and right are instances of Audio as well):
>>> left = audio.channel(0)
>>> right = audio.channel(1)
>>> left.duration == right.duration == audio.duration
True
>>> left.nchannels == right.nchannels == 1
True
-
class shennong.audio.Audio(data, sample_rate, validate=True)
Bases: object
Create an audio signal with the given data and sample_rate
-
data
The waveform audio signal, must be of one of the supported types (see above)
- Type
numpy array, shape = [nsamples, nchannels]
-
sample_rate
The sample frequency of the data, in Hertz
- Type
float
-
validate
When True, make sure the underlying data is valid (see is_valid()), defaults to True
- Type
bool, optional
- Raises
ValueError – If validate is True and the audio data is not valid (see is_valid())
-
property data
The numpy array of audio data
-
property sample_rate
The sample frequency of the signal in Hertz
-
property duration
The duration of the signal in seconds
-
property nchannels
The number of audio channels in the signal
-
property nsamples
The number of samples in the signal
-
property shape
The shape of the underlying data
-
property dtype
The numeric type of samples
-
property precision
The number of bits per sample
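As a rough illustration of how these properties relate to a raw array of shape [nsamples, nchannels], the sketch below recomputes them with NumPy only. Deriving the precision from the dtype item size is an assumption made here, not something the text above states.

```python
import numpy as np

# Same construction as in the examples above: 1000 stereo samples at 16 kHz.
data = np.random.random((1000, 2))
sample_rate = 16000

nsamples, nchannels = data.shape
duration = nsamples / sample_rate      # in seconds
precision = data.dtype.itemsize * 8    # bits per sample (assumed definition)

print(nsamples, nchannels, duration, precision)
```

The duration matches the 0.0625 seconds reported in the examples, since 1000 samples at 16000 Hz last 1000 / 16000 seconds.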
-
classmethod scan(filename)
Returns the audio metadata without loading the file
Returns a Python namespace (a named tuple) metadata with the following fields:
metadata.nchannels : int, number of channels
metadata.sample_rate : int, sample frequency in Hz
metadata.nsamples : int, number of audio samples in the file
metadata.duration : float, audio duration in seconds
This method is useful to access the metadata of an audio file without loading it into memory, which is much faster than load().
- Parameters
filename (str) – Audio file from which to retrieve metadata, must be an existing file
- Returns
metadata (namespace) – A namespace with fields as described above
- Raises
ValueError – If the filename is not a valid audio file that ffmpeg can process.
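To illustrate the idea of reading metadata from the header only, here is a sketch that swaps in the standard library `wave` module. It handles WAV files only and is not the ffmpeg-based method documented above; `scan_wav` and `Metadata` are hypothetical names, and `nsamples` is taken here to be the number of frames (samples per channel).

```python
import collections
import os
import tempfile
import wave

# Field names follow the metadata description above.
Metadata = collections.namedtuple(
    'Metadata', ['nchannels', 'sample_rate', 'nsamples', 'duration'])


def scan_wav(filename):
    """Hypothetical WAV-only stand-in: reads the header, not the samples."""
    try:
        with wave.open(filename, 'rb') as wav:
            nchannels = wav.getnchannels()
            sample_rate = wav.getframerate()
            nsamples = wav.getnframes()
    except (wave.Error, EOFError, OSError) as err:
        raise ValueError(f'{filename} is not a valid wav file: {err}')
    return Metadata(nchannels, sample_rate, nsamples, nsamples / sample_rate)


# Write one second of 16-bit stereo silence at 16 kHz, then scan it.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'silence.wav')
    with wave.open(path, 'wb') as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)       # 2 bytes per sample, i.e. 16 bits
        wav.setframerate(16000)
        wav.writeframes(b'\x00' * (16000 * 2 * 2))
    metadata = scan_wav(path)

print(metadata)
```

The fields are then available as attributes, for instance `metadata.duration` or `metadata.nchannels`.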
-
classmethod load(filename)
Creates an Audio instance from an audio file
- Parameters
filename (str) – Path to the audio file to load, must be an existing file
- Returns
audio (Audio) – The Audio instance initialized from the filename
- Raises
ValueError – If the filename is not a valid audio file.
-
save(filename)
Saves the audio data to a file
- Parameters
filename (str) – The audio file to create, format is guessed from extension
- Raises
ValueError – If the file already exists or is unreachable
-
channel(index)
Builds a mono signal from a multi-channel one
- Parameters
index (int) – The audio channel to extract from the original signal
- Returns
mono (Audio) – The extracted single-channel data
- Raises
ValueError – If index >= nchannels
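On a raw array of shape [nsamples, nchannels], the extraction described above amounts to selecting one column. The sketch below shows this with NumPy; `extract_channel` is a hypothetical helper that returns a plain array rather than an Audio instance, and it only implements the index check documented above.

```python
import numpy as np


def extract_channel(data, index):
    """Hypothetical helper: column `index` of a (nsamples, nchannels) array."""
    nchannels = data.shape[1]
    if index >= nchannels:
        raise ValueError(
            f'channel index {index} out of range for {nchannels} channels')
    return data[:, index]


stereo = np.random.random((1000, 2))
left = extract_channel(stereo, 0)
right = extract_channel(stereo, 1)
```

Both `left` and `right` keep the 1000 samples of the original signal, and asking for channel 2 of a stereo array raises ValueError.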
-
resample(sample_rate, backend='sox')
Returns the audio signal resampled at the given sample_rate
This method relies on pysox first (unless backend is ‘scipy’) and, if sox is not installed on your system or anything goes wrong, it falls back to scipy.signal.resample.
The sox backend is very fast and accurate but relies on an external binary, whereas the scipy backend can be very slow but works in pure Python.
- Parameters
sample_rate (int) – The sample frequency used to resample the signal, in Hz
backend (str, optional) – The backend to use for resampling, must be ‘sox’ or ‘scipy’, defaults to ‘sox’
- Returns
audio (Audio) – An Audio instance containing the resampled signal
- Raises
ValueError – If the backend is not ‘sox’ or ‘scipy’, or if the resampling failed
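The backend selection and fallback described above can be sketched as plain control flow. Everything here is hypothetical: `resample_with_fallback` is not part of the library, and the two backends are passed in as callables so the example does not depend on sox or scipy being installed.

```python
def resample_with_fallback(samples, sample_rate, backend,
                           sox_resample, scipy_resample):
    """Hypothetical dispatcher mirroring the backend selection above."""
    if backend not in ('sox', 'scipy'):
        raise ValueError(f"backend must be 'sox' or 'scipy', it is {backend!r}")

    if backend == 'sox':
        try:
            return sox_resample(samples, sample_rate)
        except Exception:
            # sox is missing or failed: fall through to the scipy backend
            pass

    try:
        return scipy_resample(samples, sample_rate)
    except Exception as err:
        raise ValueError(f'resampling failed: {err}')


# Stand-in backends, used only to exercise the control flow.
def broken_sox(samples, sample_rate):
    raise RuntimeError('sox is not installed')


def fake_scipy(samples, sample_rate):
    return ('scipy', sample_rate)


result = resample_with_fallback([0.0, 0.1], 8000, 'sox', broken_sox, fake_scipy)
```

Here the sox stand-in fails, so `result` comes from the scipy stand-in. An unknown backend name raises ValueError before any resampling is attempted.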
-
is_valid()
Returns True if the audio data is valid, False otherwise
An Audio instance is valid if the underlying data type is supported (must be np.int16, np.int32, np.float32 or np.float64), and if the samples min/max are within the expected boundaries for the given data type (see above).
-
astype(dtype)
Returns the audio signal converted to the dtype numeric type
The valid types are np.int16, np.int32, np.float32 or np.float64, see above for the types min and max.
- Parameters
dtype (numeric type) – Must be an integer or a floating-point type in the types described above.
- Raises
ValueError – If the requested dtype is not supported
-
segment(segments)
Returns audio chunks segmented from the original signal
- Parameters
segments (list of pairs of floats) – A list of pairs (tstart, tstop) giving the start and stop times (in seconds) of the signal chunks to extract. The times tstart and tstop must be floats, with tstart < tstop.
- Returns
chunks (list of Audio) – The signal chunks created from the given segments
- Raises
ValueError – If segments is not a list, if one element in segments is not a pair of floats, or if tstart >= tstop.
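A minimal sketch of the segmentation on a raw array follows, assuming that times are converted to sample indices by multiplying by the sample rate and truncating. That conversion, the helper name `cut_segments` and the returned plain arrays (instead of Audio instances) are assumptions of this example.

```python
import numpy as np


def cut_segments(data, sample_rate, segments):
    """Hypothetical helper: slice `data` along time for each (tstart, tstop)."""
    if not isinstance(segments, list):
        raise ValueError('segments must be a list')

    chunks = []
    for segment in segments:
        if not (isinstance(segment, (tuple, list)) and len(segment) == 2
                and all(isinstance(t, float) for t in segment)):
            raise ValueError(f'{segment!r} is not a pair of floats')
        tstart, tstop = segment
        if tstart >= tstop:
            raise ValueError(f'tstart must be < tstop, got {segment!r}')
        # Assumption: times map to sample indices by truncation.
        start = int(tstart * sample_rate)
        stop = int(tstop * sample_rate)
        chunks.append(data[start:stop])
    return chunks


data = np.random.random((16000, 2))  # one second of stereo at 16 kHz
chunks = cut_segments(data, 16000, [(0.0, 0.25), (0.5, 1.0)])
```

The two chunks hold 4000 and 8000 samples respectively, which corresponds to 0.25 and 0.5 seconds at 16 kHz.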
-