Features manipulation

Features

Provides the Features class to manipulate speech features

A Features instance is designed to store the features extracted from a single utterance. It is made of three fields:

  • data is a numpy array storing the underlying features matrix with the shape (nframes, ndims)

  • times is a numpy array containing the timestamps for each frame

  • properties is a dictionary containing metadata about the features, such as the processor and parameters used to generate them, the original audio file, etc…

A Features instance alone cannot be saved to or loaded from a file; it must be encapsulated in a FeaturesCollection.

Examples

>>> import numpy as np
>>> from shennong import Features

Build a random Features instance with timestamps

>>> feat = Features(np.random.random((5, 2)), np.linspace(0, 4, num=5))
>>> feat.shape
(5, 2)
>>> feat.nframes
5
>>> feat.ndims
2
>>> feat.properties
{}

Copy the features and add some properties to the copy

>>> feat2 = Features(feat.data, feat.times, properties={'str': 'a', 'int': 0})
>>> feat2.properties
{'str': 'a', 'int': 0}
>>> feat == feat2
False
>>> feat.data == feat2.data
array([[ True,  True],
       [ True,  True],
       [ True,  True],
       [ True,  True],
       [ True,  True]])
>>> feat.times == feat2.times
array([ True,  True,  True,  True,  True])

class shennong.features.Features(data, times, properties=None, validate=True)

Bases: object

Handles features data with attached timestamps and properties

property data

The underlying features data as a numpy matrix

property times

The frames timestamps on the vertical axis

property dtype

The type of the features data samples

property shape

The shape of the features data, as (nframes, ndims)

property ndims

The number of dimensions of a features frame (feat.shape[1])

property nframes

The number of features frames (feat.shape[0])

property properties

A dictionary of properties used to build the features

Properties are references to the features extraction pipeline, parameters and source audio file used to generate the features.

is_close(other, rtol=1e-05, atol=1e-08)

Returns True if self is approximately equal to other

Parameters
  • other (Features) – The Features instance to be compared to this one

  • rtol (float, optional) – Relative tolerance

  • atol (float, optional) – Absolute tolerance

Returns

equal (bool) – True if these features are almost equal to the other

See also

FeaturesCollection.is_close(), numpy.allclose()
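
The rtol and atol parameters carry the same names and defaults as those of numpy.allclose(). As a minimal sketch, assuming (this is not confirmed here) that the data comparison follows the numpy.allclose() criterion |a - b| <= atol + rtol * |b|, the tolerances behave as follows on plain numpy arrays rather than on Features instances:

```python
import numpy as np

# Two small matrices of shape (nframes, ndims) = (2, 2)
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = a + 1e-9  # perturbation below the default tolerances
c = a + 1e-3  # perturbation above the default tolerances

# numpy.allclose tests |a - b| <= atol + rtol * |b| element-wise
close_default = np.allclose(a, b, rtol=1e-05, atol=1e-08)
far_default = np.allclose(a, c, rtol=1e-05, atol=1e-08)

# Loosening the absolute tolerance accepts the larger perturbation
close_loose = np.allclose(a, c, rtol=1e-05, atol=1e-02)
```

Here close_default and close_loose are True while far_default is False.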

copy(dtype=None, subsample=None)

Returns a copy of the features

Allocates new arrays for data, times and properties

Parameters
  • dtype (type, optional) – When specified, converts the data and times arrays to the requested dtype

  • subsample (int, optional) – When specified, subsamples the features every subsample frames. When not specified, no subsampling is done.

Raises

ValueError – If subsample is defined but is not a strictly positive integer.

Returns

features (Features) – A new instance of Features copied from this one.
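
The effect of the subsample parameter can be sketched on plain numpy arrays. The helper subsample_frames below is hypothetical and is not part of shennong; it assumes that subsampling keeps one frame every subsample frames, starting from the first one:

```python
import numpy as np


def subsample_frames(data, times, subsample):
    """Hypothetical helper: keep one frame every `subsample` frames."""
    if not isinstance(subsample, int) or subsample <= 0:
        raise ValueError('subsample must be a strictly positive integer')
    # Slicing returns views, so copy to allocate new arrays
    return data[::subsample].copy(), times[::subsample].copy()


data = np.arange(10, dtype=np.float64).reshape(5, 2)
times = np.linspace(0, 4, num=5)
sub_data, sub_times = subsample_frames(data, times, 2)
```

With 5 input frames and subsample=2, sub_data has the shape (3, 2) and sub_times holds the timestamps 0, 2 and 4.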

is_valid()

Returns True if the features are in a valid state

Returns False otherwise. Consistency is checked for the features' data, times and properties.

validate()

Raises a ValueError if the features are not in a valid state

concatenate(other, tolerance=0, log=<Logger features (INFO)>)

Returns the concatenation of these features with other

Builds a new Features instance made of the concatenation of this instance with the other instance. Their times must be equal.

Parameters
  • other (Features, shape = [nframes +/- tolerance, ndim2]) – The other features to concatenate at the end of this one

  • tolerance (int, optional) – If the number of frames of the two features is different, trim the longest one up to a frame difference of tolerance, otherwise raise a ValueError. This option is useful when concatenating pitch with other ‘standard’ features because pitch processing includes a downsampling which can alter the resulting number of frames (the same tolerance is applied in Kaldi, e.g. in paste-feats). Defaults to 0.

  • log (logging.Logger, optional) – Where to send log messages

Returns

features (Features, shape = [nframes +/- tolerance, ndim1 + ndim2])

Raises

ValueError – If other cannot be concatenated because of inconsistencies: number of frames difference greater than tolerance, unequal times values.
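
The tolerance logic described above can be sketched with plain numpy arrays. The helper concatenate_frames is hypothetical and is not shennong's implementation; it assumes that trimming drops the trailing frames of the longest input and that timestamps are stored as 1D arrays:

```python
import numpy as np


def concatenate_frames(data1, times1, data2, times2, tolerance=0):
    """Hypothetical helper mirroring the documented behaviour."""
    diff = abs(data1.shape[0] - data2.shape[0])
    if diff > tolerance:
        raise ValueError(
            f'number of frames differs by {diff}, tolerance is {tolerance}')
    # Trim the longest input so both have the same number of frames
    nframes = min(data1.shape[0], data2.shape[0])
    data1, times1 = data1[:nframes], times1[:nframes]
    data2, times2 = data2[:nframes], times2[:nframes]
    if not np.array_equal(times1, times2):
        raise ValueError('times are not equal')
    # Stack along the features axis: (nframes, ndim1 + ndim2)
    return np.hstack((data1, data2)), times1


# Placeholder matrices standing for MFCC (5 frames) and pitch (4 frames)
mfcc, mfcc_times = np.zeros((5, 13)), np.arange(5.0)
pitch, pitch_times = np.ones((4, 3)), np.arange(4.0)
data, times = concatenate_frames(
    mfcc, mfcc_times, pitch, pitch_times, tolerance=1)
```

With tolerance=1 the 5-frame input is trimmed to 4 frames and the result has the shape (4, 16). With the default tolerance=0 the same call raises a ValueError.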

Features collection

Provides the FeaturesCollection class to manipulate speech features

  • A FeaturesCollection is basically a dictionary of Features indexed by names.

  • A collection can be saved to and loaded from a file with the save() and load() methods.

Supported file formats

The following table details the supported file formats and compares the obtained file size, writing and reading times on MFCC features computed on the Buckeye Corpus (English, 40 speakers, about 38 hours of speech and 254 files):

File format   Extension   File size   Writing time   Reading time
pickle        .pkl        883.7 MB    0:00:07        0:00:05
h5features    .h5f        873.0 MB    0:00:21        0:00:07
numpy         .npz        869.1 MB    0:02:30        0:00:22
matlab        .mat        721.1 MB    0:00:59        0:00:11
kaldi         .ark        1.3 GB      0:00:06        0:00:07
CSV           folder      4.8 GB      0:03:02        0:03:11

  • pickle: standard Python format, fast and efficient for small to medium datasets.

  • h5features: based on HDF5 and specialized for very big datasets. Supports partial read/write of datasets bigger than RAM. The documentation is available at https://docs.cognitive-ml.fr/h5features.

  • numpy: standard numpy format.

  • matlab and kaldi: for compatibility.

  • csv: each Features instance in the collection is written as plain text in a dedicated file, with an optional JSON file storing the features properties.

Examples

>>> import os
>>> import numpy as np
>>> from shennong import Features, FeaturesCollection

Create a collection of two random features

>>> fc = FeaturesCollection()
>>> fc['feat1'] = Features(np.random.random((5, 2)), np.linspace(0, 4, num=5))
>>> fc['feat2'] = Features(np.random.random((3, 2)), np.linspace(0, 2, num=3))
>>> fc.keys()
dict_keys(['feat1', 'feat2'])

Save the collection to a npz file

>>> fc.save('features.npz')

Load it back to a new collection

>>> fc2 = FeaturesCollection.load('features.npz')
>>> fc2.keys()
dict_keys(['feat1', 'feat2'])
>>> fc == fc2
True
>>> os.remove('features.npz')

class shennong.features_collection.FeaturesCollection

Bases: dict

Handles a collection of Features as a dictionary

classmethod load(filename, serializer=None, log=<Logger serializer (WARNING)>)

Loads a FeaturesCollection from a filename

Parameters
  • filename (str) – The file to load

  • serializer (str, optional) – The file serializer to use for loading. If not specified, the serializer is guessed from the filename extension.

  • log (logging.Logger, optional) – Where to send log messages. Defaults to a logger named ‘serializer’ with a ‘warning’ level.

Returns

features (FeaturesCollection) – The features loaded from the filename

Raises
  • IOError – If the filename cannot be read

  • ValueError – If the serializer or the file extension is not supported, or if loading the features fails.
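
Guessing the serializer from the filename extension can be sketched as a simple lookup. The helper guess_serializer and the EXTENSIONS mapping are hypothetical; they reuse the extensions from the supported file formats table, and the format names are assumed here to double as serializer names. The CSV format is written to a folder and has no extension in the table, so it is left out of this sketch:

```python
import os

# Extensions as listed in the supported file formats table
EXTENSIONS = {
    '.pkl': 'pickle',
    '.h5f': 'h5features',
    '.npz': 'numpy',
    '.mat': 'matlab',
    '.ark': 'kaldi',
}


def guess_serializer(filename):
    """Hypothetical helper: map a filename extension to a format name."""
    extension = os.path.splitext(filename)[1]
    try:
        return EXTENSIONS[extension]
    except KeyError:
        raise ValueError(
            f'unsupported file extension: {extension!r}') from None


guessed = guess_serializer('features.npz')
```

In this sketch guessed is 'numpy', and an unknown extension such as '.txt' raises a ValueError, in line with the error documented above.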

save(filename, serializer=None, with_properties=True, log=<Logger serializer (WARNING)>, **kwargs)

Saves a FeaturesCollection to a filename

Parameters
  • filename (str) – The file to write

  • serializer (str, optional) – The file serializer to use for saving. If not specified, the serializer is guessed from the filename extension.

  • with_properties (bool, optional) – When False, do not save the features properties. Defaults to True.

  • log (logging.Logger, optional) – Where to send log messages. Defaults to a logger named ‘serializer’ with a ‘warning’ level.

  • compress (bool_or_str_or_int, optional) – Only valid for numpy (.npz), matlab (.mat) and h5features (.h5f) serializers. When True, compress the file. Defaults to True.

  • scp (bool, optional) – Only valid for kaldi (.ark) serializer. When True, writes a .scp file along with the .ark file. Defaults to False.

Raises
  • IOError – If the file filename already exists

  • ValueError – If the serializer or the file extension is not supported, or if saving the features fails.

is_valid()

Returns True if all the features in the collection are valid

is_close(other, rtol=1e-05, atol=1e-08)

Returns True if self is approximately equal to other

Parameters
  • other (FeaturesCollection) – The collection of features to compare to the current one

  • rtol (float, optional) – Relative tolerance

  • atol (float, optional) – Absolute tolerance

Returns

equal (bool) – True if this collection is almost equal to the other

See also

Features.is_close(), numpy.allclose()

partition(index)

Returns a partition of the collection as a dict of FeaturesCollection

This method is useful to create sub-collections from an existing one, for instance to make one sub-collection per speaker, or per gender, etc…

Parameters

index (dict) – A mapping giving, for each item in this collection, the sub-collection it belongs to in the partition. We must have index.keys() == self.keys().

Returns

features (dict of FeaturesCollection) – A dictionary of FeaturesCollection instances, one per sub-collection defined in index.

Raises

ValueError – If one utterance in the collection is not mapped in index.
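
The grouping performed by a partition can be sketched with plain dictionaries. The function below is a hypothetical stand-in and is not shennong's implementation; it uses strings in place of Features instances and plain dicts in place of FeaturesCollection:

```python
def partition(collection, index):
    """Hypothetical stand-in: split a dict of items into sub-dicts."""
    missing = set(collection) - set(index)
    if missing:
        raise ValueError(f'items not mapped in index: {sorted(missing)}')
    parts = {}
    for name, item in collection.items():
        # index[name] is the sub-collection this item belongs to
        parts.setdefault(index[name], {})[name] = item
    return parts


collection = {'utt1': 'a', 'utt2': 'b', 'utt3': 'c'}
index = {'utt1': 'spk1', 'utt2': 'spk2', 'utt3': 'spk1'}
by_speaker = partition(collection, index)
```

Here by_speaker is {'spk1': {'utt1': 'a', 'utt3': 'c'}, 'spk2': {'utt2': 'b'}}.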

trim(vad)

Returns a new instance of FeaturesCollection where each Features instance has been trimmed with the corresponding VAD.

Parameters

vad (dict of boolean ndarrays) – A dictionary of arrays indicating which frames to keep.

Returns

features (FeaturesCollection) – A new FeaturesCollection trimmed with the input VAD

Raises

ValueError – If the utterances in the collection and in vad are not the same, or if the VAD arrays are not boolean arrays.
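
Trimming a single utterance with a boolean VAD array amounts to boolean mask indexing. The helper trim_frames below is hypothetical and works on plain numpy arrays; it assumes one VAD value per frame and 1D timestamps:

```python
import numpy as np


def trim_frames(data, times, vad):
    """Hypothetical helper: keep only the frames where vad is True."""
    vad = np.asarray(vad)
    if vad.dtype != np.bool_:
        raise ValueError('vad must be a boolean array')
    if vad.shape != (data.shape[0],):
        raise ValueError('vad must have one value per frame')
    # Boolean indexing selects the rows (frames) flagged True
    return data[vad], times[vad]


data = np.arange(10.0).reshape(5, 2)
times = np.linspace(0, 4, num=5)
vad = np.array([False, True, True, False, True])
kept_data, kept_times = trim_frames(data, times, vad)
```

Three frames are kept here, so kept_data has the shape (3, 2) and kept_times holds the timestamps 1, 2 and 4.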

clear() → None. Remove all items from D.
copy() → a shallow copy of D
fromkeys(iterable, value=None, /)

Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /)

Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D’s items
keys() → a set-like object providing a view on D’s keys
pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised

popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) → None. Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]

values() → an object providing a view on D’s values