Time alignments

Handles time alignments of speech signals

This module provides two classes to operate on time alignments:

  • Alignment is the class representing a time-alignment for a single item.

  • AlignmentCollection is a high-level class to load/save alignment files. It exposes a dictionary of items mapped to Alignment instances.

The time alignments are used as input to the OneHotProcessor and FramedOneHotProcessor feature processors.

A speech signal is time-aligned when, for each pronounced token (phone or word) in the speech, its associated onset and offset times are provided. An alignment can be obtained manually (by annotation), or automatically (using a Kaldi recipe for example).

Alignment files supported by shennong are text files (optionally compressed) in which each line is formatted as follows:

<item> <onset> <offset> <token>

The <item> can be the reference of an utterance, a speaker, or a file. The <onset> and <offset> are the begin and end timestamps (in seconds) of the <token> being pronounced. An example file is located in shennong/test/data/alignment.txt and has been produced by a Kaldi forced-alignment recipe. Here are its first 10 lines:

S01F1522_0001 0.0125 0.1125 e:
S01F1522_0001 0.1125 0.2225 t
S01F1522_0001 0.2225 0.3125 o
S01F1522_0001 0.3125 0.3625 u
S01F1522_0001 0.3625 0.4225 r
S01F1522_0001 0.4225 0.4925 e
S01F1522_0001 0.4925 0.5925 sy
S01F1522_0001 0.5925 0.8925 i
S01F1522_0001 0.8925 1.2025 k
S01F1522_0001 1.2025 1.2825 u
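
The line format above can be parsed with a few lines of plain Python. The following is a minimal sketch, not the shennong implementation: the function name parse_alignment_lines is hypothetical, and it assumes that fields are separated by whitespace and that blank lines can be skipped.

```python
from collections import defaultdict


def parse_alignment_lines(lines):
    """Group '<item> <onset> <offset> <token>' lines by item.

    Hypothetical helper for illustration only; returns a dict mapping
    each item to a list of (onset, offset, token) triplets.
    """
    items = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) != 4:
            raise ValueError(
                f'line {lineno}: expected 4 fields, got {len(fields)}')
        item, onset, offset, token = fields
        # timestamps are expressed in seconds
        items[item].append((float(onset), float(offset), token))
    return dict(items)


sample = [
    'S01F1522_0001 0.0125 0.1125 e:',
    'S01F1522_0001 0.1125 0.2225 t',
    'S01F1522_0001 0.2225 0.3125 o',
]
parsed = parse_alignment_lines(sample)
print(parsed['S01F1522_0001'][0])
```

Each item ends up mapped to its own list of triplets, which mirrors the item-to-alignment mapping described for AlignmentCollection.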


Load a collection of 34 alignments from the provided test file:

>>> from shennong.alignment import AlignmentCollection
>>> alignments = AlignmentCollection.load('./test/data/alignment.txt')
>>> len(alignments.keys())
34

Get the alignment of one item. Each value in an AlignmentCollection is an instance of Alignment:

>>> ali1 = alignments['S01F1522_0033']
>>> type(ali1)
<class 'shennong.alignment.Alignment'>
>>> ali1.duration()
>>> print(ali1)
0.0125 0.0425 m
0.0425 0.1225 a
0.1225 0.1825 s
0.1825 0.2425 o
0.2425 0.3025 r
0.3025 0.3625 e
0.3625 0.4325 k
0.4325 0.4925 a
0.4925 0.5625 r
0.5625 0.6525 a

Extract a sub-part of the alignment, which is also an Alignment instance:

>>> ali2 = ali1[0.4325:0.6525]
>>> print(ali2)
0.4325 0.4925 a
0.4925 0.5625 r
0.5625 0.6525 a
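
To make the time-based slicing concrete, here is a minimal sketch on plain (onset, offset, token) triplets. It is not the library's code: slice_alignment is a hypothetical helper that keeps only the tokens lying entirely inside the requested interval, and how Alignment itself treats tokens that partially overlap the bounds is not specified here.

```python
def slice_alignment(triplets, start, stop):
    """Keep the (onset, offset, token) triplets lying entirely in [start, stop].

    Hypothetical helper; partially overlapping tokens are dropped, which
    is an assumption and may differ from the library's behaviour.
    """
    return [
        (onset, offset, token)
        for onset, offset, token in triplets
        if onset >= start and offset <= stop
    ]


# The alignment of item S01F1522_0033 printed above, as triplets
ali1 = [
    (0.0125, 0.0425, 'm'), (0.0425, 0.1225, 'a'), (0.1225, 0.1825, 's'),
    (0.1825, 0.2425, 'o'), (0.2425, 0.3025, 'r'), (0.3025, 0.3625, 'e'),
    (0.3625, 0.4325, 'k'), (0.4325, 0.4925, 'a'), (0.4925, 0.5625, 'r'),
    (0.5625, 0.6525, 'a'),
]
ali2 = slice_alignment(ali1, 0.4325, 0.6525)
for onset, offset, token in ali2:
    print(onset, offset, token)
```

On this data the sketch selects the same three tokens as the ali1[0.4325:0.6525] example.
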

class shennong.alignment.Alignment(times, tokens, validate=True)[source]

Bases: object

Time alignment of tokens

An Alignment handles a time alignment of tokens, i.e. a sequence of tokens linked with their onset and offset timestamps. See the validate() method for a list of the constraints applying to the data.

  • times (array of float, shape = [ntokens, 2]) – The array of (onset, offset) timestamps for each aligned token

  • tokens (array of str, shape = [ntokens, 1]) – The array of aligned tokens

  • validate (bool, optional) – When True, checks that the alignment is in a valid format; when False, performs no verification. Default is True.


ValueError – When validate is True and the alignment data is not correctly formatted

property times

The (start, stop) timestamps of the aligned tokens in seconds

property onsets

The start timestamps of the aligned tokens in seconds

property offsets

The stop timestamps of the aligned tokens in seconds

property tokens

The aligned tokens associated with timestamps

static from_list(data, validate=True)[source]

Build an Alignment from a list of (tstart, tstop, token) triplets

This method checks that all elements in the data list have 3 fields, converts them to times and tokens arrays, and instantiates an Alignment with them.


data (sequence of (tstart, tstop, token)) – A list or sequence of triplets (tstart, tstop, token) representing a time aligned token. tstart and tstop are the onset and offset of the pronunciation (in seconds). token is a string representation of the token.
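
The conversion between a list of triplets and the (times, tokens) pair expected by the constructor can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: both function names are hypothetical, and nested Python lists stand in for the arrays of shape [ntokens, 2] and [ntokens, 1].

```python
def triplets_to_arrays(data):
    """Split (tstart, tstop, token) triplets into times and tokens.

    Hypothetical helper; raises ValueError when an entry does not have
    exactly 3 fields.
    """
    times, tokens = [], []
    for index, entry in enumerate(data):
        if len(entry) != 3:
            raise ValueError(
                f'entry {index} must have 3 fields, got {len(entry)}')
        tstart, tstop, token = entry
        times.append([float(tstart), float(tstop)])
        tokens.append(str(token))
    return times, tokens


def arrays_to_triplets(times, tokens):
    """Reverse operation: rebuild the list of (onset, offset, token)."""
    return [
        (onset, offset, token)
        for (onset, offset), token in zip(times, tokens)
    ]


data = [(0.0125, 0.1125, 'e:'), (0.1125, 0.2225, 't')]
times, tokens = triplets_to_arrays(data)
print(times, tokens)
```

Applying arrays_to_triplets to the result gives back the original list, which is the round trip that a list-to-alignment and alignment-to-list pair is expected to satisfy.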


Raises a ValueError if the Alignment is not consistent

The following conditions must apply for the alignment to be valid:

  • onsets, offsets and tokens must have the same length

  • onsets and offsets must be sorted in increasing order: data is a temporal sequence

  • onsets[n] must be less than offsets[n]: each token in data has a strictly positive duration

  • offsets[n] must be equal to onsets[n+1]: data has a temporal continuity.
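
The four conditions above translate directly into code. The sketch below is one possible reading, not the library's implementation: check_alignment is a hypothetical name, and it compares timestamps with exact equality, whereas a real implementation may well use a numerical tolerance.

```python
def check_alignment(onsets, offsets, tokens):
    """Raise ValueError if one of the four validity conditions fails.

    Hypothetical helper; timestamps are compared exactly (assumption).
    """
    # 1. same length
    if not len(onsets) == len(offsets) == len(tokens):
        raise ValueError('onsets, offsets and tokens must have the same length')

    # 2. sorted in increasing order
    if list(onsets) != sorted(onsets) or list(offsets) != sorted(offsets):
        raise ValueError('onsets and offsets must be sorted in increasing order')

    # 3. strictly positive duration for each token
    for n, (onset, offset) in enumerate(zip(onsets, offsets)):
        if not onset < offset:
            raise ValueError(f'token {n}: onset must be less than offset')

    # 4. temporal continuity between consecutive tokens
    for n in range(len(onsets) - 1):
        if offsets[n] != onsets[n + 1]:
            raise ValueError(
                f'token {n}: offset differs from the onset of token {n + 1}')


# Consistent data: passes silently
check_alignment([0.0125, 0.1125], [0.1125, 0.2225], ['e:', 't'])
```

A boolean variant can be obtained by wrapping the call in a try/except block and returning False when ValueError is raised.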


Returns True if the Alignment is consistent, False otherwise


Returns the alignment as a list of triplets (onset, offset, token)

This is the reverse operation of from_list().


Returns an array of tokens read at the given sample_rate
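
Reading tokens at a given sample rate amounts to asking which token is being pronounced at regularly spaced instants. The sketch below is one plausible interpretation and not the library's code: tokens_at_sample_rate is a hypothetical helper, it samples at the instants start + k / sample_rate from the first onset up to (but excluding) the last offset, and it assumes the alignment is continuous and sorted.

```python
from bisect import bisect_right


def tokens_at_sample_rate(triplets, sample_rate):
    """Return the token active at each sampling instant.

    Hypothetical helper; `triplets` is a continuous, sorted list of
    (onset, offset, token) and `sample_rate` is expressed in Hz.
    """
    if sample_rate <= 0:
        raise ValueError('sample_rate must be strictly positive')
    if not triplets:
        return []

    offsets = [offset for _, offset, _ in triplets]
    tokens = [token for _, _, token in triplets]
    start, stop = triplets[0][0], offsets[-1]

    sampled = []
    k = 0
    while start + k / sample_rate < stop:
        instant = start + k / sample_rate
        # number of tokens already finished at `instant` is the index
        # of the token whose interval [onset, offset) contains it
        sampled.append(tokens[bisect_right(offsets, instant)])
        k += 1
    return sampled


print(tokens_at_sample_rate([(0.0, 0.5, 'a'), (0.5, 1.0, 'b')], 4))
```

The index k is kept as an integer so that sampling instants do not accumulate floating-point error over long alignments.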


Returns the duration of the alignment in seconds


Returns the different tokens composing the alignment


tokens (set) – Unique tokens present in the alignment

class shennong.alignment.AlignmentCollection(data)[source]

Bases: dict

A dictionary of Alignment indexed by items

An AlignmentCollection is a usual Python dictionary with some additional functions. Keys are strings, values are Alignment instances.


data (sequence of quadruplets) – A list or a sequence of quadruplets (item, onset, offset, token) representing a time aligned token for a given item, where onset is the start timestamp of the pronounced token, offset is the end timestamp of the pronunciation and token is a string representation of the token. onset and offset are expressed in seconds.


ValueError – If one element of data is not a quadruplet, or if the Alignment mapped to an item cannot be instantiated.

static load(filename, compress=False)[source]

Returns an AlignmentCollection loaded from filename

The text file, optionally compressed, is read as utf8. It must be composed of lines with 4 fields <item> <onset> <offset> <token>.


filename (str) – The path to the alignment file to read, must be an existing text file.


alignment (AlignmentCollection) – The AlignmentCollection instance initialized from filename


ValueError – If filename is not a valid alignment file or if the AlignmentCollection cannot be instantiated.

save(filename, sort=False, compress=False)[source]

Save the alignments to a file

  • filename (str) – The text file to write (should have a .txt extension, or .txt.gz if compress is True, but this is not required). The file must not already exist.

  • sort (bool, optional) – When True, the items are sorted in lexicographical order. Defaults to False.

  • compress (bool, optional) – When True, the file is compressed using the gzip algorithm. Defaults to False.


ValueError – If the filename already exists or is not writable.
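
The behaviour documented for save() can be approximated with the standard library only. The sketch below is an illustration, not the shennong implementation: save_alignments is a hypothetical function operating on a plain dict that maps items to lists of (onset, offset, token) triplets, and the exact number formatting used by the library is an assumption.

```python
import gzip
import os


def save_alignments(collection, filename, sort=False, compress=False):
    """Write one '<item> <onset> <offset> <token>' line per token.

    Hypothetical helper; `collection` is a dict mapping each item to a
    list of (onset, offset, token) triplets.
    """
    # refuse to overwrite an existing file, as documented for save()
    if os.path.exists(filename):
        raise ValueError(f'{filename} already exists')

    # lexicographical order on request, insertion order otherwise
    items = sorted(collection) if sort else list(collection)

    # gzip.open and open accept the same text-mode arguments
    opener = gzip.open if compress else open
    with opener(filename, 'wt', encoding='utf-8') as stream:
        for item in items:
            for onset, offset, token in collection[item]:
                stream.write(f'{item} {onset} {offset} {token}\n')
```

A file written with compress=True can be read back with gzip.open(filename, 'rt', encoding='utf-8').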


Returns the different tokens composing the collection


tokens (set) – Unique tokens present in the collection’s alignments

clear() → None. Remove all items from D.
copy() → a shallow copy of D
fromkeys(iterable, value=None, /)

Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /)

Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D’s items
keys() → a set-like object providing a view on D’s keys
pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised

popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) → None. Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].

values() → an object providing a view on D’s values