Features manipulation
Features
Provides the Features class to manipulate speech features
A Features instance is designed to store the features extracted from a single utterance. It is made of three fields:
data
is a numpy array storing the underlying features matrix, with shape (nframes, ndims)
times
is a numpy array containing the timestamps of each frame
properties
is a dictionary containing metadata about the features, such as the processor and parameters used to generate them, the original audio file, etc.
A Features instance alone cannot be saved to or loaded from a file; it must be encapsulated into a FeaturesCollection.
Examples
>>> import numpy as np
>>> from shennong import Features
Build a random Features instance with timestamps
>>> feat = Features(np.random.random((5, 2)), np.linspace(0, 4, num=5))
>>> feat.shape
(5, 2)
>>> feat.nframes
5
>>> feat.ndims
2
>>> feat.properties
{}
Build a second instance from the same data and times, with some properties attached
>>> feat2 = Features(feat.data, feat.times, properties={'str': 'a', 'int': 0})
>>> feat2.properties
{'str': 'a', 'int': 0}
>>> feat == feat2
False
>>> feat.data == feat2.data
array([[ True,  True],
       [ True,  True],
       [ True,  True],
       [ True,  True],
       [ True,  True]])
>>> feat.times == feat2.times
array([ True,  True,  True,  True,  True])
class shennong.features.Features(data, times, properties=None, validate=True)
Bases: object
Handles features data with attached timestamps and properties
property data
The underlying features data as a numpy matrix
property times
The frames timestamps on the vertical axis
property dtype
The type of the features data samples
property shape
The shape of the features data, as (nframes, ndims)
property ndims
The number of dimensions of a features frame (feat.shape[1])
property nframes
The number of features frames (feat.shape[0])
property properties
A dictionary of properties used to build the features
Properties are references to the features extraction pipeline, parameters and source audio file used to generate the features.
is_close(other, rtol=1e-05, atol=1e-08)
Returns True if self is approximately equal to other
Parameters
other (Features) – The Features instance to be compared to this one
rtol (float, optional) – Relative tolerance
atol (float, optional) – Absolute tolerance
Returns
equal (bool) – True if these features are almost equal to the other
See also
FeaturesCollection.is_close(), numpy.allclose()
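To illustrate the rtol/atol semantics, here is a minimal sketch that compares two features given as plain (data, times, properties) triples with numpy.allclose. The helper features_is_close is hypothetical, not shennong's implementation. It also rests on two assumptions: properties must match exactly (only the arrays get a tolerance), and shapes must agree.

```python
import numpy as np


def features_is_close(a, b, rtol=1e-05, atol=1e-08):
    """Hypothetical stand-in for Features.is_close on (data, times, properties) triples."""
    data_a, times_a, props_a = a
    data_b, times_b, props_b = b
    # Assumption: properties must match exactly, only arrays get a tolerance
    if props_a != props_b:
        return False
    # Shapes must agree before an element-wise comparison makes sense
    if data_a.shape != data_b.shape or times_a.shape != times_b.shape:
        return False
    # numpy.allclose checks |a - b| <= atol + rtol * |b| element-wise
    return bool(
        np.allclose(data_a, data_b, rtol=rtol, atol=atol)
        and np.allclose(times_a, times_b, rtol=rtol, atol=atol)
    )


data = np.random.random((5, 2))
times = np.linspace(0, 4, num=5)
noisy = data + 1e-9  # far below the default absolute tolerance
print(features_is_close((data, times, {}), (noisy, times, {})))  # True
print(features_is_close((data, times, {}), (data + 0.1, times, {})))  # False
```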
copy(dtype=None, subsample=None)
Returns a copy of the features
Allocates new arrays for data, times and properties.
Parameters
dtype (type, optional) – When specified, converts the data and times arrays to the requested dtype
subsample (int, optional) – When specified, subsamples the features by keeping one frame every subsample frames. When not specified, no subsampling is done.
Raises
ValueError – If subsample is defined but is not a strictly positive integer.
Returns
features (Features) – A new instance of Features copied from this one.
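Subsampling one frame every subsample frames can be pictured as numpy slicing with a step. The helper copy_arrays below is a hypothetical sketch of that behaviour on bare arrays. It assumes the first frame is always kept (data[::subsample]) and that dtype applies to both arrays, inferring both from the parameter descriptions above rather than from shennong's source.

```python
import numbers

import numpy as np


def copy_arrays(data, times, dtype=None, subsample=None):
    """Hypothetical sketch of Features.copy(dtype, subsample) on bare arrays."""
    if subsample is not None:
        # bool is an Integral subclass, so reject it explicitly
        if (not isinstance(subsample, numbers.Integral)
                or isinstance(subsample, bool) or subsample <= 0):
            raise ValueError(
                f'subsample must be a strictly positive integer, it is {subsample}')
        # Assumption: keep frames 0, subsample, 2 * subsample, ...
        data, times = data[::subsample], times[::subsample]
    # np.array copies by default, so the result never aliases the input
    data = np.array(data, dtype=dtype)
    times = np.array(times, dtype=dtype)
    return data, times


data = np.arange(10.0).reshape(5, 2)
times = np.linspace(0, 4, num=5)
sub_data, sub_times = copy_arrays(data, times, dtype=np.float32, subsample=2)
print(sub_data.shape, sub_times)  # (3, 2) [0. 2. 4.]
```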
is_valid()
Returns True if the features are in a valid state, False otherwise. Consistency is checked for the features’ data, times and properties.
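The exact consistency rules are not listed here. The hypothetical checker looks_valid below sketches the kind of invariants one would expect from the field descriptions at the top of this page: a 2D data matrix, one timestamp entry per frame, and a dict of properties. It does not reproduce the checks shennong actually performs. Accepting 2D times and requiring non-decreasing 1D timestamps are both assumptions.

```python
import numpy as np


def looks_valid(data, times, properties):
    """Hypothetical consistency checks on a (data, times, properties) triple."""
    # data must be a 2D matrix of shape (nframes, ndims)
    if not isinstance(data, np.ndarray) or data.ndim != 2:
        return False
    # Assumption: times is 1D (one timestamp per frame) or 2D (one row per
    # frame), in both cases with exactly nframes entries on the first axis
    if (not isinstance(times, np.ndarray) or times.ndim not in (1, 2)
            or times.shape[0] != data.shape[0]):
        return False
    # Assumption: 1D timestamps never decrease from one frame to the next
    if times.ndim == 1 and np.any(np.diff(times) < 0):
        return False
    # properties must be a plain dictionary of metadata
    return isinstance(properties, dict)


data = np.random.random((5, 2))
print(looks_valid(data, np.linspace(0, 4, num=5), {}))  # True
print(looks_valid(data, np.linspace(0, 4, num=4), {}))  # False: 4 stamps, 5 frames
```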
concatenate(other, tolerance=0, log=<Logger features (INFO)>)
Returns the concatenation of these features with other
Builds a new Features instance made of the concatenation of this instance with the other instance. Their times must be equal.
Parameters
other (Features, shape = [nframes +/- tolerance, ndim2]) – The other features to concatenate at the end of this one
tolerance (int, optional) – If the numbers of frames of the two features differ, trim the longest one as long as the difference does not exceed tolerance frames, otherwise raise a ValueError. This option is useful when concatenating pitch with other ‘standard’ features, because pitch processing includes a downsampling which can alter the resulting number of frames (the same tolerance is applied in Kaldi, e.g. in paste-feats). Defaults to 0.
log (logging.Logger, optional) – Where to send log messages
Returns
features (Features, shape = [nframes +/- tolerance, ndim1 + ndim2])
Raises
ValueError – If other cannot be concatenated because of inconsistencies: a difference in number of frames greater than tolerance, or unequal times values.
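The tolerance rule can be sketched on bare arrays. When the frame counts differ by at most tolerance, the longer input is trimmed to the shorter one, the timestamps are compared, and the columns are stacked. The helper concatenate_arrays is hypothetical. It assumes the trailing frames are the ones dropped and that timestamps must match exactly after trimming. Shennong's own trimming strategy may differ.

```python
import numpy as np


def concatenate_arrays(data1, times1, data2, times2, tolerance=0):
    """Hypothetical sketch of Features.concatenate on bare arrays."""
    diff = abs(data1.shape[0] - data2.shape[0])
    if diff > tolerance:
        raise ValueError(
            f'frames number differs by {diff}, tolerance is {tolerance}')
    # Assumption: trim the trailing frames of the longest input
    nframes = min(data1.shape[0], data2.shape[0])
    data1, times1 = data1[:nframes], times1[:nframes]
    data2, times2 = data2[:nframes], times2[:nframes]
    # Assumption: times must be strictly equal once trimmed
    if not np.array_equal(times1, times2):
        raise ValueError('times are not equal')
    # Result has shape (nframes, ndim1 + ndim2)
    return np.hstack((data1, data2)), times1


mfcc = np.random.random((100, 13))
pitch = np.random.random((98, 3))  # downsampling lost 2 frames
times = np.arange(100) * 0.01
data, new_times = concatenate_arrays(mfcc, times, pitch, times[:98], tolerance=2)
print(data.shape)  # (98, 16)
```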
Features collection
Provides the FeaturesCollection class to manipulate speech features
A FeaturesCollection is basically a dictionary of Features indexed by names. A collection can be saved to and loaded from a file with the save() and load() methods.
Supported file formats
The following table details the supported file formats and compares the obtained file size, writing and reading times on MFCC features computed on the Buckeye Corpus (English, 40 speakers, about 38 hours of speech and 254 files):
File format | Extension | File size | Writing time (h:mm:ss) | Reading time (h:mm:ss)
---|---|---|---|---
pickle | .pkl | 883.7 MB | 0:00:07 | 0:00:05
h5features | .h5f | 873.0 MB | 0:00:21 | 0:00:07
numpy | .npz | 869.1 MB | 0:02:30 | 0:00:22
matlab | .mat | 721.1 MB | 0:00:59 | 0:00:11
kaldi | .ark | 1.3 GB | 0:00:06 | 0:00:07
CSV | folder | 4.8 GB | 0:03:02 | 0:03:11
pickle: standard Python format, fast and efficient for small to medium-sized datasets.
h5features: based on HDF5 and specialized for very big datasets. Supports partial read/write of datasets bigger than RAM. The documentation is available at https://docs.cognitive-ml.fr/h5features.
numpy: standard numpy format.
matlab and kaldi: for compatibility.
csv: each Features instance in the collection is written as plain text in a dedicated file, with an optional JSON file storing the features properties.
Examples
>>> import os
>>> import numpy as np
>>> from shennong import Features, FeaturesCollection
Create a collection of two random features
>>> fc = FeaturesCollection()
>>> fc['feat1'] = Features(np.random.random((5, 2)), np.linspace(0, 4, num=5))
>>> fc['feat2'] = Features(np.random.random((3, 2)), np.linspace(0, 2, num=3))
>>> fc.keys()
dict_keys(['feat1', 'feat2'])
Save the collection to a npz file
>>> fc.save('features.npz')
Load it back to a new collection
>>> fc2 = FeaturesCollection.load('features.npz')
>>> fc2.keys()
dict_keys(['feat1', 'feat2'])
>>> fc == fc2
True
>>> os.remove('features.npz')
class shennong.features_collection.FeaturesCollection
Bases: dict
Handles a collection of Features as a dictionary
classmethod load(filename, serializer=None, log=<Logger serializer (WARNING)>)
Loads a FeaturesCollection from a filename
Parameters
filename (str) – The file to load
serializer (str, optional) – The file serializer to use for loading. If not specified, the serializer is guessed from the filename extension
log (logging.Logger, optional) – Where to send log messages. Defaults to a logger named ‘serializer’ with a ‘warning’ level.
Returns
features (FeaturesCollection) – The features loaded from the filename
Raises
IOError – If the filename cannot be read
ValueError – If the serializer or the file extension is not supported, or if the features loading fails.
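Guessing the serializer from the extension amounts to a lookup in the supported file formats table. The SERIALIZERS mapping and the guess_serializer helper below are hypothetical illustrations built from that table, not shennong's internal API. Serializer names are assumed to be the lowercase "File format" column. Because CSV collections are stored as a folder, the sketch also assumes that a path without an extension means csv.

```python
import os

# Hypothetical mapping built from the supported file formats table
SERIALIZERS = {
    '.pkl': 'pickle',
    '.h5f': 'h5features',
    '.npz': 'numpy',
    '.mat': 'matlab',
    '.ark': 'kaldi',
}
SUPPORTED = set(SERIALIZERS.values()) | {'csv'}


def guess_serializer(filename, serializer=None):
    """Return the serializer to use, guessing it from the extension if needed."""
    if serializer is not None:
        # An explicit serializer wins, but it must be a known one
        if serializer not in SUPPORTED:
            raise ValueError(f'invalid serializer {serializer}')
        return serializer
    ext = os.path.splitext(filename)[1]
    if ext == '':
        # Assumption: a path without extension denotes a CSV folder
        return 'csv'
    try:
        return SERIALIZERS[ext]
    except KeyError:
        raise ValueError(f'invalid extension {ext}') from None


print(guess_serializer('features.npz'))  # numpy
print(guess_serializer('features.ark'))  # kaldi
```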
save(filename, serializer=None, with_properties=True, log=<Logger serializer (WARNING)>, **kwargs)
Saves a FeaturesCollection to a filename
Parameters
filename (str) – The file to write
serializer (str, optional) – The file serializer to use for saving. If not specified, the serializer is guessed from the filename extension
with_properties (bool, optional) – When False, do not save the features properties. Defaults to True.
log (logging.Logger, optional) – Where to send log messages. Defaults to a logger named ‘serializer’ with a ‘warning’ level.
compress (bool, str or int, optional) – Only valid for numpy (.npz), matlab (.mat) and h5features (.h5f) serializers. When True, compresses the file. Defaults to True.
scp (bool, optional) – Only valid for kaldi (.ark) serializer. When True, writes a .scp file along with the .ark file. Defaults to False.
Raises
IOError – If the file filename already exists
ValueError – If the serializer or the file extension is not supported, or if the features saving fails.
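As a rough picture of the numpy case, the sketch below refuses to overwrite an existing file and picks numpy.savez_compressed or numpy.savez depending on compress. The save_npz helper and its archive layout (one "<name>.data" and one "<name>.times" entry per utterance) are hypothetical. Shennong's actual .npz layout is not documented here and may differ, for instance in how it stores properties.

```python
import os
import tempfile

import numpy as np


def save_npz(filename, collection, compress=True):
    """Hypothetical sketch: write {name: (data, times)} to a .npz archive."""
    # Mirrors the documented behaviour: refuse to overwrite an existing file
    if os.path.exists(filename):
        raise IOError(f'file already exists: {filename}')
    arrays = {}
    for name, (data, times) in collection.items():
        # Assumed layout: one entry per array, "<name>.data" and "<name>.times"
        arrays[f'{name}.data'] = data
        arrays[f'{name}.times'] = times
    writer = np.savez_compressed if compress else np.savez
    writer(filename, **arrays)


collection = {
    'feat1': (np.random.random((5, 2)), np.linspace(0, 4, num=5)),
    'feat2': (np.random.random((3, 2)), np.linspace(0, 2, num=3)),
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'features.npz')
    save_npz(path, collection)
    with np.load(path) as archive:
        print(sorted(archive.files))
```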
is_close(other, rtol=1e-05, atol=1e-08)
Returns True if self is approximately equal to other
Parameters
other (FeaturesCollection) – The collection of features to compare to the current one
rtol (float, optional) – Relative tolerance
atol (float, optional) – Absolute tolerance
Returns
equal (bool) – True if this collection is almost equal to the other
See also
Features.is_close(), numpy.allclose()
partition(index)
Returns a partition of the collection as a dict of FeaturesCollection
This method is useful to create sub-collections from an existing one, for instance to make one sub-collection per speaker, per gender, etc.
Parameters
index (dict) – A mapping with, for each item in this collection, the sub-collection it belongs to in the partition. We must have index.keys() == self.keys().
Returns
features (dict of FeaturesCollection) – A dictionary of FeaturesCollection instances, one per sub-collection (e.g. per speaker) defined in index.
Raises
ValueError – If one utterance in the collection is not mapped in index.
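The grouping logic can be sketched with plain dicts. In the hypothetical partition helper below, strings stand in for Features instances and plain dicts stand in for FeaturesCollection. Like the documented method, it raises a ValueError when an utterance is not mapped in index. Silently ignoring extra keys in index is an assumption.

```python
def partition(collection, index):
    """Hypothetical sketch of FeaturesCollection.partition on plain dicts."""
    missing = set(collection) - set(index)
    if missing:
        raise ValueError(f'utterances not mapped in index: {sorted(missing)}')
    parts = {}
    for name, features in collection.items():
        # Group each utterance under the sub-collection it is mapped to
        parts.setdefault(index[name], {})[name] = features
    return parts


collection = {'utt1': 'f1', 'utt2': 'f2', 'utt3': 'f3'}
index = {'utt1': 'spk1', 'utt2': 'spk2', 'utt3': 'spk1'}
print(partition(collection, index))
# {'spk1': {'utt1': 'f1', 'utt3': 'f3'}, 'spk2': {'utt2': 'f2'}}
```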
trim(vad)
Returns a new instance of FeaturesCollection where each Features instance has been trimmed with the corresponding VAD.
Parameters
vad (dict of boolean ndarrays) – A dictionary of arrays indicating which frames to keep.
Returns
features (FeaturesCollection) – A new FeaturesCollection trimmed with the input VAD
Raises
ValueError – If the utterances are not the same in the collection and in vad, or if the VAD arrays are not boolean arrays.
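Trimming with a VAD boils down to boolean indexing of both the data rows and their timestamps. The trim helper below is a hypothetical sketch on a plain {name: (data, times)} dict, not shennong's code. It assumes one VAD decision per frame and raises a ValueError otherwise, on top of the two documented error cases.

```python
import numpy as np


def trim(collection, vad):
    """Hypothetical sketch of FeaturesCollection.trim on {name: (data, times)}."""
    if set(collection) != set(vad):
        raise ValueError('utterances differ between the collection and vad')
    trimmed = {}
    for name, (data, times) in collection.items():
        mask = np.asarray(vad[name])
        if mask.dtype != np.bool_:
            raise ValueError(f'VAD for {name} is not a boolean array')
        # Assumption: one VAD decision per frame
        if mask.shape != (data.shape[0],):
            raise ValueError(f'VAD for {name} does not match its frames number')
        # Boolean indexing keeps only the frames flagged True
        trimmed[name] = (data[mask], times[mask])
    return trimmed


collection = {'utt1': (np.arange(8.0).reshape(4, 2), np.arange(4.0))}
vad = {'utt1': np.array([True, False, False, True])}
data, times = trim(collection, vad)['utt1']
print(data.shape, times)  # (2, 2) [0. 3.]
```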
clear() → None. Remove all items from D.
copy() → a shallow copy of D
fromkeys(iterable, value=None, /)
Create a new dictionary with keys from iterable and values set to value.
get(key, default=None, /)
Return the value for key if key is in the dictionary, else default.
items() → a set-like object providing a view on D’s items
keys() → a set-like object providing a view on D’s keys
pop(k[, d]) → v, remove specified key and return the corresponding value.
If key is not found, d is returned if given, otherwise KeyError is raised.
popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.
setdefault(key, default=None, /)
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
update([E, ]**F) → None. Update D from dict/iterable E and F.
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]
values() → an object providing a view on D’s values
classmethod