Features manipulation

Features classes

Builds, saves, loads and manipulates speech features

class shennong.features.features.Features(data, times, properties={}, validate=True)[source]

Bases: object

property data

The underlying features data as a 2D numpy array of shape (nframes, ndims)

property times

The timestamps of the frames, along the vertical axis

property dtype

The type of the features data samples

property shape

The shape of the features data, as (nframes, ndims)

property ndims

The number of dimensions of a features frame (feat.shape[1])

property nframes

The number of features frames (feat.shape[0])

property properties

A dictionary of properties used to build the features

Properties are references to the features extraction pipeline, parameters and source audio file used to generate the features.

is_close(other, rtol=1e-05, atol=1e-08)[source]

Returns True if self is approximately equal to other

Parameters
  • other (Features) – The Features instance to be compared to this one

  • rtol (float, optional) – Relative tolerance

  • atol (float, optional) – Absolute tolerance

Returns

equal (bool) – True if these features are almost equal to the other

See also

FeaturesCollection.is_close(), numpy.allclose()
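The comparison performed by is_close() can be sketched on raw arrays as below. This is a hypothetical illustration, not shennong's implementation: the function name is made up, and requiring exact equality of times and properties (with tolerance applied only to data) is an assumption.

```python
import numpy as np


def features_are_close(data1, times1, props1, data2, times2, props2,
                       rtol=1e-05, atol=1e-08):
    """Hypothetical sketch of a Features.is_close()-like comparison."""
    # Assumption: properties and timestamps must be exactly equal
    if props1 != props2:
        return False
    if not np.array_equal(times1, times2):
        return False
    # An element-wise comparison only makes sense on equal shapes
    if data1.shape != data2.shape:
        return False
    # Same rule as numpy.allclose: |a - b| <= atol + rtol * |b|
    return bool(np.allclose(data1, data2, rtol=rtol, atol=atol))
```

With the default tolerances, a difference of 1e-9 on every value is accepted while a difference of 1e-3 is not.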

copy(dtype=None, subsample=None)[source]

Returns a copy of the features

Allocates new arrays for data, times and properties

Parameters
  • dtype (type, optional) – When specified, converts the data and times arrays to the requested dtype

  • subsample (int, optional) – When specified, keeps one frame out of every subsample frames. When not specified, no subsampling is done.

Raises

ValueError – If subsample is defined but is not a strictly positive integer.

Returns

features (Features) – A new instance of Features copied from this one.
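The behaviour of copy(dtype, subsample) can be sketched with numpy slicing as below. This is an illustrative sketch under assumptions, not the library code: copy_features() is a hypothetical name, and subsampling is assumed to keep frames 0, subsample, 2 * subsample, and so on.

```python
import copy
import numbers

import numpy as np


def copy_features(data, times, properties, dtype=None, subsample=None):
    """Hypothetical sketch of Features.copy(dtype, subsample) semantics."""
    if subsample is not None:
        # bool is a subclass of int, so it is rejected explicitly
        if (not isinstance(subsample, numbers.Integral)
                or isinstance(subsample, bool) or subsample <= 0):
            raise ValueError(
                f'subsample must be a strictly positive integer, '
                f'got {subsample}')
        # Assumption: keep one frame every `subsample` frames from frame 0
        data = data[::subsample]
        times = times[::subsample]
    # np.array() copies by default, so new arrays are always allocated
    new_data = np.array(data, dtype=dtype)
    new_times = np.array(times, dtype=dtype)
    # Properties are nested dicts: a deep copy avoids shared references
    new_properties = copy.deepcopy(properties)
    return new_data, new_times, new_properties
```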

is_valid()[source]

Returns True if the features are in a valid state

Returns False otherwise. Consistency is checked for the features' data, times and properties.

validate()[source]

Raises a ValueError if the features are not in a valid state
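The pair validate() / is_valid() follows a common pattern: one function raises on the first inconsistency, the other turns that into a boolean. The exact checks below are assumptions (a 2D data array, one timestamp per frame in a 1D times array, a dict of properties) and the function names are hypothetical.

```python
import numpy as np


def validate_features(data, times, properties):
    """Raise ValueError if data, times and properties are inconsistent.

    Hypothetical sketch: assumes one timestamp per frame (1D times).
    """
    if not isinstance(data, np.ndarray) or data.ndim != 2:
        raise ValueError('data must be a 2D numpy array')
    if not isinstance(times, np.ndarray) or times.ndim != 1:
        raise ValueError('times must be a 1D numpy array')
    # Exactly one timestamp per frame, that is per row of data
    if times.shape[0] != data.shape[0]:
        raise ValueError(
            f'mismatch in number of frames: {data.shape[0]} for data '
            f'but {times.shape[0]} for times')
    if not isinstance(properties, dict):
        raise ValueError('properties must be a dictionary')


def is_valid_features(data, times, properties):
    """Return True if validate_features() does not raise, False otherwise."""
    try:
        validate_features(data, times, properties)
    except ValueError:
        return False
    return True
```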

concatenate(other, tolerance=0)[source]

Returns the concatenation of these features with other

Builds a new Features instance made of the concatenation of this instance with the other instance. Their times must be equal.

Parameters
  • other (Features, shape = [nframes +/- tolerance, ndim2]) – The other features, whose dimensions are appended after those of this one

  • tolerance (int, optional) – If the number of frames of the two features differs by at most tolerance, trim the longest one; otherwise raise a ValueError. This option is useful when concatenating pitch with other ‘standard’ features, because pitch processing includes a downsampling which can alter the resulting number of frames (the same tolerance is applied in Kaldi, e.g. in paste-feats). Defaults to 0.

Returns

features (Features, shape = [nframes +/- tolerance, ndim1 + ndim2])

Raises

ValueError – If other cannot be concatenated because of inconsistencies: a difference in number of frames greater than tolerance, or unequal time values.
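The tolerance mechanism of concatenate() can be sketched on raw arrays as follows. It is a hypothetical illustration: trimming the trailing frames of the longest features (as paste-feats does in Kaldi) and comparing times with numpy.allclose are assumptions, and the function name is made up.

```python
import numpy as np


def concatenate_features(data1, times1, data2, times2, tolerance=0):
    """Hypothetical sketch of Features.concatenate(other, tolerance).

    Returns the concatenated data and the common times.
    """
    diff = abs(data1.shape[0] - data2.shape[0])
    if diff > tolerance:
        raise ValueError(
            f'frames difference ({diff}) greater than tolerance ({tolerance})')
    # Assumption: the longest features lose their trailing frames
    nframes = min(data1.shape[0], data2.shape[0])
    data1, times1 = data1[:nframes], times1[:nframes]
    data2, times2 = data2[:nframes], times2[:nframes]
    # Assumption: times compared with a float tolerance
    if not np.allclose(times1, times2):
        raise ValueError('times are not equal')
    # Stack along the dimensions axis: (nframes, ndim1 + ndim2)
    return np.hstack((data1, data2)), times1
```

For instance, 100 frames of 13 MFCCs and 98 frames of 3 pitch coefficients concatenate to a (98, 16) matrix with tolerance=2, but raise a ValueError with tolerance=1.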

class shennong.features.features.FeaturesCollection[source]

Bases: dict

classmethod load(filename, serializer=None)[source]

Loads a FeaturesCollection from a filename

Parameters
  • filename (str) – The file to load

  • serializer (str, optional) – The file serializer to use for loading. If not specified, guess the serializer from the filename extension

Returns

features (FeaturesCollection) – The features loaded from the filename

Raises
  • IOError – If the filename cannot be read

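  • ValueError – If the serializer or the file extension is not supported, or if loading the features fails.

save(filename, serializer=None, **kwargs)[source]

Saves the collection to a filename. If serializer is not specified, guess it from the filename extension. The optional kwargs are specific to each serializer.

is_valid()[source]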

Returns True if all the features in the collection are valid

is_close(other, rtol=1e-05, atol=1e-08)[source]

Returns True if self is approximately equal to other

Parameters
  • other (FeaturesCollection) – The collection of features to compare to the current one

  • rtol (float, optional) – Relative tolerance

  • atol (float, optional) – Absolute tolerance

Returns

equal (bool) – True if this collection is almost equal to the other

See also

Features.is_close(), numpy.allclose()
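At the collection level, the comparison amounts to checking that both collections hold the same utterances and that each pair of features is close. The hypothetical sketch below simplifies each Features instance to its data array, ignoring times and properties.

```python
import numpy as np


def collections_are_close(coll1, coll2, rtol=1e-05, atol=1e-08):
    """Hypothetical sketch: compare two {utterance: data array} dicts."""
    # Both collections must hold exactly the same utterances
    if coll1.keys() != coll2.keys():
        return False
    # ...and each pair of arrays must have equal shapes and close values
    return all(
        coll1[name].shape == coll2[name].shape
        and np.allclose(coll1[name], coll2[name], rtol=rtol, atol=atol)
        for name in coll1)
```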

partition(index)[source]

Returns a partition of the collection as a dict of FeaturesCollection

This method is useful to create sub-collections from an existing one, for instance to make one sub-collection per speaker, or per gender, etc.

Parameters

index (dict) – A mapping from each item of this collection to the sub-collection it belongs to in the partition. index.keys() must be equal to self.keys().

Returns

features (dict of FeaturesCollection) – A dictionary of FeaturesCollection instances, one per sub-collection (e.g. per speaker) defined in index.

Raises

ValueError – If one utterance in the collection is not mapped in index.
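Partitioning is a group-by on the values of index. The sketch below shows the idea on plain dicts; it is a hypothetical illustration, not shennong's code, and returns plain dicts instead of FeaturesCollection instances.

```python
def partition(collection, index):
    """Hypothetical sketch of FeaturesCollection.partition(index)."""
    unmapped = set(collection) - set(index)
    if unmapped:
        raise ValueError(f'utterances not mapped in index: {sorted(unmapped)}')
    parts = {}
    for utterance, features in collection.items():
        # index[utterance] names the sub-collection, e.g. a speaker
        parts.setdefault(index[utterance], {})[utterance] = features
    return parts
```

For example, with index = {'s1_u1': 's1', 's1_u2': 's1', 's2_u1': 's2'}, the two utterances of speaker s1 end up in parts['s1'] and the third one in parts['s2'].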

trim(vad)[source]

Returns a new instance of FeaturesCollection where each features instance has been trimmed with its corresponding VAD.

Parameters

vad (dict of boolean ndarrays) – A dictionary of arrays indicating which frames to keep.

Returns

features (FeaturesCollection) – A new FeaturesCollection trimmed with the input VAD

Raises

ValueError – If the utterances in vad differ from those in the collection, or if the VAD arrays are not boolean arrays.
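Trimming relies on numpy boolean indexing: data[mask] keeps the rows where mask is True. In this hypothetical sketch each features instance is reduced to a (data, times) tuple, and the shape check on the mask is an assumption.

```python
import numpy as np


def trim(collection, vad):
    """Hypothetical sketch of FeaturesCollection.trim(vad).

    collection maps utterance -> (data, times), vad maps utterance -> mask.
    """
    if collection.keys() != vad.keys():
        raise ValueError('vad and features must have the same utterances')
    trimmed = {}
    for utterance, (data, times) in collection.items():
        mask = np.asarray(vad[utterance])
        if mask.dtype != np.bool_:
            raise ValueError(f'vad for {utterance} is not a boolean array')
        # Assumption: one boolean per frame
        if mask.shape != (data.shape[0],):
            raise ValueError(f'vad for {utterance} has a wrong shape')
        # Boolean indexing keeps only the frames flagged True
        trimmed[utterance] = (data[mask], times[mask])
    return trimmed
```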

clear() → None. Remove all items from D.
copy() → a shallow copy of D
fromkeys(iterable, value=None, /)

Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /)

Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D’s items
keys() → a set-like object providing a view on D’s keys
pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised

popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) → None. Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]

If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v

In either case, this is followed by: for k in F: D[k] = F[k]

values() → an object providing a view on D’s values

Save/load features

Saves and loads features collections to/from various file formats

The following table shows the file size, writing time and reading time obtained for MFCC features computed on the Zero Resource Speech Challenge 2019 train database (English, about 26 hours of speech and 10k files):

File format   Extension   File size   Writing time   Reading time
h5features    .h5f        562.9 MB    0:00:20        0:00:08
pickle        .pkl        609.8 MB    0:00:08        0:00:06
numpy         .npz        582.8 MB    0:02:07        0:00:19
matlab        .mat        481.8 MB    0:00:58        0:00:13
kaldi         .ark        927.8 MB    0:00:10        0:00:15
JSON          .json       6.3 GB      0:11:34        1:04:25

Times are given as h:mm:ss.

shennong.features.serializers.supported_extensions()[source]

Returns the file extensions supported to save/load features

Returns

serializers (dict) – File extensions mapped to their related serializer class

shennong.features.serializers.supported_serializers()[source]

Returns the file format serializers supported to save/load features

Returns

serializers (dict) – Serializers names mapped to their related class

shennong.features.serializers.get_serializer(cls, filename, serializer=None)[source]

Returns the file serializer from filename extension or serializer name

Parameters
  • cls (class) – Must be shennong.features.FeaturesCollection, this is a tweak to avoid circular imports

  • filename (str) – The file to be handled (load or save)

  • serializer (str, optional) – If not None, must be one of supported_serializers(). If not specified, the serializer is guessed from the filename extension using supported_extensions().

Returns

serializer (instance of FeaturesSerializer) – An instance of the guessed serializer class, a child class of FeaturesSerializer.

Raises

ValueError – If the serializer class cannot be guessed, or if cls is not FeaturesCollection
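Guessing a serializer from a file extension boils down to a dictionary lookup. The sketch below is hypothetical: the registry pairs the extensions listed in the benchmark table with assumed lowercase serializer names, and it returns names rather than serializer instances.

```python
import os

# Hypothetical registry: file extension -> serializer name (names assumed)
EXTENSIONS = {
    '.h5f': 'h5features',
    '.pkl': 'pickle',
    '.npz': 'numpy',
    '.mat': 'matlab',
    '.ark': 'kaldi',
    '.json': 'json',
}
SERIALIZERS = set(EXTENSIONS.values())


def guess_serializer(filename, serializer=None):
    """Return a serializer name, guessed from the extension if not given."""
    if serializer is not None:
        if serializer not in SERIALIZERS:
            raise ValueError(
                f'invalid serializer {serializer}, '
                f'must be in {sorted(SERIALIZERS)}')
        return serializer
    # os.path.splitext('a/b.feats.npz') returns ('a/b.feats', '.npz')
    extension = os.path.splitext(filename)[1]
    if extension not in EXTENSIONS:
        raise ValueError(
            f'invalid extension {extension}, must be in {sorted(EXTENSIONS)}')
    return EXTENSIONS[extension]
```

An explicit serializer takes precedence, which allows saving to a file with an unusual extension.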

class shennong.features.serializers.FeaturesSerializer(cls, filename)[source]

Bases: object

Base class of a features file serializer

This class must be specialized to handle a given file type.

Parameters
  • cls (class) – Must be shennong.features.FeaturesCollection, this is a tweak to avoid circular imports

  • filename (str) – The file to save/load features to/from

property filename
load(**kwargs)[source]

Returns a collection of features from the filename

Parameters
  • kwargs (optional) – Optional supplementary arguments, specific to each serializer.

Returns

features (FeaturesCollection) – The features stored in the file.

Raises
  • IOError – If the input file does not exist or cannot be read.

  • ValueError – If the features cannot be loaded from the file or are not in a valid state.

save(features, **kwargs)[source]

Saves a collection of features to a file

Parameters
  • features (FeaturesCollection) – The features to store in the file.

  • kwargs (optional) – Optional supplementary arguments, specific to each serializer.

Raises
  • IOError – If the output file already exists.

  • ValueError – If the features cannot be saved to the file, are not in a valid state or are not an instance of FeaturesCollection.
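The contract of FeaturesSerializer (a base class handling the filename and the documented IOError checks, specialized for each file type) can be sketched with the abc module. BaseSerializer, _load(), _save() and the toy ReprSerializer are hypothetical names; the real classes may split the work differently.

```python
import abc
import ast
import os


class BaseSerializer(abc.ABC):
    """Hypothetical sketch of the load/save contract of FeaturesSerializer."""

    def __init__(self, filename):
        self._filename = filename

    @property
    def filename(self):
        return self._filename

    def load(self, **kwargs):
        # Documented behaviour: IOError if the input file does not exist
        if not os.path.isfile(self.filename):
            raise IOError(f'{self.filename} not found')
        return self._load(**kwargs)

    def save(self, features, **kwargs):
        # Documented behaviour: IOError if the output file already exists
        if os.path.exists(self.filename):
            raise IOError(f'{self.filename} already exists')
        self._save(features, **kwargs)

    @abc.abstractmethod
    def _load(self, **kwargs):
        """Read the file, specific to each concrete format."""

    @abc.abstractmethod
    def _save(self, features, **kwargs):
        """Write the file, specific to each concrete format."""


class ReprSerializer(BaseSerializer):
    """Toy format: a dict of plain Python lists written with repr()."""

    def _load(self, **kwargs):
        with open(self.filename) as fin:
            return ast.literal_eval(fin.read())

    def _save(self, features, **kwargs):
        with open(self.filename, 'w') as fout:
            fout.write(repr(features))
```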

class shennong.features.serializers.NumpySerializer(cls, filename)[source]

Bases: shennong.features.serializers.FeaturesSerializer

Saves and loads features to/from the numpy ‘.npz’ format

property filename
load(**kwargs)

Returns a collection of features from the filename

Parameters
  • kwargs (optional) – Optional supplementary arguments, specific to each serializer.

Returns

features (FeaturesCollection) – The features stored in the file.

Raises
  • IOError – If the input file does not exist or cannot be read.

  • ValueError – If the features cannot be loaded from the file or are not in a valid state.

save(features, **kwargs)

Saves a collection of features to a file

Parameters
  • features (FeaturesCollection) – The features to store in the file.

  • kwargs (optional) – Optional supplementary arguments, specific to each serializer.

Raises
  • IOError – If the output file already exists.

  • ValueError – If the features cannot be saved to the file, are not in a valid state or are not an instance of FeaturesCollection.
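A whole collection fits in a single ‘.npz’ archive, since numpy.savez stores any number of named arrays. The file layout below (an array of utterance names plus data_i / times_i arrays indexed by position) is a hypothetical choice that may differ from shennong's, and properties are left out for brevity.

```python
import numpy as np


def save_npz(filename, collection):
    """Save {utterance: (data, times)} to a .npz file (hypothetical layout)."""
    names = sorted(collection)
    # Arrays are indexed by position so utterance names need no escaping
    arrays = {'utterances': np.array(names)}
    for i, name in enumerate(names):
        data, times = collection[name]
        arrays[f'data_{i}'] = data
        arrays[f'times_{i}'] = times
    np.savez(filename, **arrays)


def load_npz(filename):
    """Load a collection written by save_npz()."""
    collection = {}
    # Only numeric and unicode arrays are stored: pickling is not needed
    with np.load(filename, allow_pickle=False) as npz:
        for i, name in enumerate(npz['utterances']):
            collection[str(name)] = (npz[f'data_{i}'], npz[f'times_{i}'])
    return collection
```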

class shennong.features.serializers.MatlabSerializer(cls, filename)[source]

Bases: shennong.features.serializers.FeaturesSerializer

Saves and loads features to/from the matlab ‘.mat’ format

property filename
load(**kwargs)

Returns a collection of features from the filename

Parameters
  • kwargs (optional) – Optional supplementary arguments, specific to each serializer.

Returns

features (FeaturesCollection) – The features stored in the file.

Raises
  • IOError – If the input file does not exist or cannot be read.

  • ValueError – If the features cannot be loaded from the file or are not in a valid state.

save(features, **kwargs)

Saves a collection of features to a file

Parameters
  • features (FeaturesCollection) – The features to store in the file.

  • kwargs (optional) – Optional supplementary arguments, specific to each serializer.

Raises
  • IOError – If the output file already exists.

  • ValueError – If the features cannot be saved to the file, are not in a valid state or are not an instance of FeaturesCollection.

class shennong.features.serializers.JsonSerializer(cls, filename)[source]

Bases: shennong.features.serializers.FeaturesSerializer

Saves and loads features to/from the JSON format

property filename
load(**kwargs)

Returns a collection of features from the filename

Parameters
  • kwargs (optional) – Optional supplementary arguments, specific to each serializer.

Returns

features (FeaturesCollection) – The features stored in the file.

Raises
  • IOError – If the input file does not exist or cannot be read.

  • ValueError – If the features cannot be loaded from the file or are not in a valid state.

save(features, **kwargs)

Saves a collection of features to a file

Parameters
  • features (FeaturesCollection) – The features to store in the file.

  • kwargs (optional) – Optional supplementary arguments, specific to each serializer.

Raises
  • IOError – If the output file already exists.

  • ValueError – If the features cannot be saved to the file, are not in a valid state or are not an instance of FeaturesCollection.
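In JSON, every sample is written as decimal text, typically around 20 characters for a float64 against 8 bytes in binary. That is a plausible reason for the JSON size and timings in the benchmark table. The layout below ({utterance: {"data": ..., "times": ...}}) is a hypothetical sketch, not necessarily the one used by JsonSerializer.

```python
import json

import numpy as np


def collection_to_json(collection):
    """Serialize {utterance: (data, times)} as JSON (hypothetical layout)."""
    # ndarray.tolist() converts arrays into nested lists of Python floats
    return json.dumps({
        name: {'data': data.tolist(), 'times': times.tolist()}
        for name, (data, times) in collection.items()})


def collection_from_json(text):
    """Rebuild the numpy arrays from a collection_to_json() string."""
    return {
        name: (np.asarray(item['data']), np.asarray(item['times']))
        for name, item in json.loads(text).items()}
```

Python writes floats with their shortest round-trip representation, so float64 values survive the round trip exactly, at the cost of a much larger file.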

class shennong.features.serializers.PickleSerializer(cls, filename)[source]

Bases: shennong.features.serializers.FeaturesSerializer

Saves and loads features to/from the Python pickle format

property filename
load(**kwargs)

Returns a collection of features from the filename

Parameters
  • kwargs (optional) – Optional supplementary arguments, specific to each serializer.

Returns

features (FeaturesCollection) – The features stored in the file.

Raises
  • IOError – If the input file does not exist or cannot be read.

  • ValueError – If the features cannot be loaded from the file or are not in a valid state.

save(features, **kwargs)

Saves a collection of features to a file

Parameters
  • features (FeaturesCollection) – The features to store in the file.

  • kwargs (optional) – Optional supplementary arguments, specific to each serializer.

Raises
  • IOError – If the output file already exists.

  • ValueError – If the features cannot be saved to the file, are not in a valid state or are not an instance of FeaturesCollection.
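Pickling a collection takes two standard library calls. In this hypothetical sketch, opening the output file in 'xb' mode reproduces the documented refusal to overwrite an existing file: FileExistsError is a subclass of OSError, of which IOError is an alias in Python 3.

```python
import pickle


def save_pickle(filename, collection):
    # 'xb' fails with FileExistsError if the file already exists
    with open(filename, 'xb') as fout:
        pickle.dump(collection, fout, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(filename):
    # Unpickling can execute arbitrary code: only load trusted files
    with open(filename, 'rb') as fin:
        return pickle.load(fin)
```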

class shennong.features.serializers.H5featuresSerializer(cls, filename)[source]

Bases: shennong.features.serializers.FeaturesSerializer

Saves and loads features to/from the h5features format

property filename
load(**kwargs)

Returns a collection of features from the filename

Parameters
  • kwargs (optional) – Optional supplementary arguments, specific to each serializer.

Returns

features (FeaturesCollection) – The features stored in the file.

Raises
  • IOError – If the input file does not exist or cannot be read.

  • ValueError – If the features cannot be loaded from the file or are not in a valid state.

save(features, **kwargs)

Saves a collection of features to a file

Parameters
  • features (FeaturesCollection) – The features to store in the file.

  • kwargs (optional) – Optional supplementary arguments, specific to each serializer.

Raises
  • IOError – If the output file already exists.

  • ValueError – If the features cannot be saved to the file, are not in a valid state or are not an instance of FeaturesCollection.

class shennong.features.serializers.KaldiSerializer(cls, filename)[source]

Bases: shennong.features.serializers.FeaturesSerializer

Saves and loads features to/from the Kaldi ‘.ark’ format

property filename
load(**kwargs)

Returns a collection of features from the filename

Parameters
  • kwargs (optional) – Optional supplementary arguments, specific to each serializer.

Returns

features (FeaturesCollection) – The features stored in the file.

Raises
  • IOError – If the input file does not exist or cannot be read.

  • ValueError – If the features cannot be loaded from the file or are not in a valid state.

save(features, **kwargs)

Saves a collection of features to a file

Parameters
  • features (FeaturesCollection) – The features to store in the file.

  • kwargs (optional) – Optional supplementary arguments, specific to each serializer.

Raises
  • IOError – If the output file already exists.

  • ValueError – If the features cannot be saved to the file, are not in a valid state or are not an instance of FeaturesCollection.