Time alignments
Handles time alignments of speech signals
This module provides two classes to operate on time alignments:
- Alignment is the class representing a time alignment for a single item.
- AlignmentCollection is a high-level class to load/save alignment files. It exposes a dictionary of items mapped to Alignment instances.
The time alignments are used as input to the OneHotProcessor and FramedOneHotProcessor features processors.
A speech signal is time-aligned when, for each pronounced token (phone or word) in the speech, its onset and offset times are provided. An alignment can be obtained manually (by annotation), or automatically (using a Kaldi recipe for example).
Alignment files supported by shennong are text files (optionally compressed) in which each line is formatted as follows:
<item> <onset> <offset> <token>
The <item> can be the reference of an utterance, a speaker, or a file. The <onset> and <offset> are the begin and end timestamps (in seconds) of the <token> being pronounced. An example file is located in shennong/test/data/alignment.txt and has been produced by a Kaldi forced-alignment recipe. Here are its first 10 lines:
S01F1522_0001 0.0125 0.1125 e:
S01F1522_0001 0.1125 0.2225 t
S01F1522_0001 0.2225 0.3125 o
S01F1522_0001 0.3125 0.3625 u
S01F1522_0001 0.3625 0.4225 r
S01F1522_0001 0.4225 0.4925 e
S01F1522_0001 0.4925 0.5925 sy
S01F1522_0001 0.5925 0.8925 i
S01F1522_0001 0.8925 1.2025 k
S01F1522_0001 1.2025 1.2825 u
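The four-field line format can be illustrated with a minimal parsing sketch. It uses only the standard library and does not reproduce shennong's own loader: the helper name parse_alignment_lines and the choice to skip blank lines are assumptions made for this example.

```python
from collections import defaultdict


def parse_alignment_lines(lines):
    """Group '<item> <onset> <offset> <token>' lines by item.

    Hypothetical helper for illustration, not part of shennong.
    """
    items = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) != 4:
            raise ValueError(
                f'line {lineno}: expected 4 fields, got {len(fields)}')
        item, onset, offset, token = fields
        # timestamps are expressed in seconds
        items[item].append((float(onset), float(offset), token))
    return dict(items)


sample = [
    'S01F1522_0001 0.0125 0.1125 e:',
    'S01F1522_0001 0.1125 0.2225 t',
    'S01F1522_0001 0.2225 0.3125 o',
]
parsed = parse_alignment_lines(sample)
print(parsed['S01F1522_0001'][0])  # (0.0125, 0.1125, 'e:')
```

Each item maps to a list of (onset, offset, token) triplets, which is the same shape as the data accepted by Alignment.from_list().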
Examples
Load a collection of 34 alignments from the provided test file:
>>> from shennong.alignment import AlignmentCollection
>>> alignments = AlignmentCollection.load('./test/data/alignment.txt')
>>> len(alignments.keys())
34
Get the alignment of one item. An item from an AlignmentCollection is an instance of Alignment:
>>> ali1 = alignments['S01F1522_0033']
>>> type(ali1)
<class 'shennong.alignment.Alignment'>
>>> ali1.duration()
0.64
>>> print(ali1)
0.0125 0.0425 m
0.0425 0.1225 a
0.1225 0.1825 s
0.1825 0.2425 o
0.2425 0.3025 r
0.3025 0.3625 e
0.3625 0.4325 k
0.4325 0.4925 a
0.4925 0.5625 r
0.5625 0.6525 a
Extract a subpart of the alignment, which is also an Alignment instance:
>>> ali2 = ali1[0.4325:0.6525]
>>> print(ali2)
0.4325 0.4925 a
0.4925 0.5625 r
0.5625 0.6525 a
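The time-based slicing shown above can be approximated on plain triplets. The sketch below is not the library's implementation: select_window is a hypothetical helper, and it assumes that only tokens lying entirely within the window are kept. How Alignment handles a token that straddles a boundary is not stated here.

```python
def select_window(triplets, start, stop, eps=1e-9):
    """Keep tokens whose [onset, offset] interval lies within [start, stop].

    Hypothetical helper. The small eps tolerance guards against
    floating-point rounding at the window boundaries.
    """
    return [
        (onset, offset, token)
        for onset, offset, token in triplets
        if onset >= start - eps and offset <= stop + eps
    ]


# last four tokens of the S01F1522_0033 alignment printed above
ali1 = [
    (0.3625, 0.4325, 'k'),
    (0.4325, 0.4925, 'a'),
    (0.4925, 0.5625, 'r'),
    (0.5625, 0.6525, 'a'),
]
ali2 = select_window(ali1, 0.4325, 0.6525)
print([token for _, _, token in ali2])  # ['a', 'r', 'a']
```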
-
class shennong.alignment.Alignment(times, tokens, validate=True)
Bases: object
Time alignment of tokens
An Alignment handles a time alignment of tokens, i.e. a sequence of tokens linked with their onset and offset timestamps. See the validate() method for the list of constraints applying to the data.
- Parameters
times (array of float, shape = [ntokens, 2]) – The array of (onset, offset) timestamps for each aligned token
tokens (array of str, shape = [ntokens, 1]) – The array of aligned tokens
validate (bool, optional) – When True, checks that the alignment is in a valid format; when False, no verification is performed. Default is True.
- Raises
ValueError – When validate is True and the alignment data is not correctly formatted
-
property times
The (start, stop) timestamps of the aligned tokens in seconds
-
property onsets
The start timestamps of the aligned tokens in seconds
-
property offsets
The stop timestamps of the aligned tokens in seconds
-
property tokens
The aligned tokens associated with timestamps
-
static from_list(data, validate=True)
Build an Alignment from a list of (tstart, tstop, token) triplets
This method checks that all elements in the data list have 3 fields, converts them to times and tokens arrays, and instantiates an Alignment with them.
- Parameters
data (sequence of (tstart, tstop, token)) – A list or sequence of triplets (tstart, tstop, token) representing a time aligned token. tstart and tstop are the onset and offset of the pronunciation (in seconds). token is a string representation of the token.
-
validate()
Raises a ValueError if the Alignment is not consistent
The following conditions must apply for the alignment to be valid:
- onsets, offsets and tokens must have the same length
- onsets and offsets must be sorted in increasing order: data is a temporal sequence
- onsets[n] must be less than offsets[n]: each token in data has a strictly positive duration
- offsets[n] must be equal to onsets[n+1]: data has a temporal continuity.
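The four conditions listed above can be sketched as a standalone check. This is an illustration under stated assumptions, not the body of validate(): the function name check_alignment is hypothetical, and comparing offsets[n] to onsets[n+1] with a small tolerance (rather than exact equality) is a choice made here to cope with floating-point rounding.

```python
import math


def check_alignment(onsets, offsets, tokens, tol=1e-9):
    """Raise ValueError if one of the four listed conditions is violated.

    Hypothetical helper; tol is an assumption of this sketch.
    """
    # 1. same length
    if not len(onsets) == len(offsets) == len(tokens):
        raise ValueError('onsets, offsets and tokens must have the same length')
    # 2. sorted in increasing order
    if list(onsets) != sorted(onsets) or list(offsets) != sorted(offsets):
        raise ValueError('onsets and offsets must be sorted')
    # 3. strictly positive duration
    for n, (onset, offset) in enumerate(zip(onsets, offsets)):
        if not onset < offset:
            raise ValueError(f'token {n} has a non-positive duration')
    # 4. temporal continuity
    for n in range(len(onsets) - 1):
        if not math.isclose(offsets[n], onsets[n + 1], abs_tol=tol):
            raise ValueError(f'gap or overlap between tokens {n} and {n + 1}')


# a consistent alignment passes silently
check_alignment([0.0125, 0.0425], [0.0425, 0.1225], ['m', 'a'])

# a gap between two tokens is rejected
try:
    check_alignment([0.0, 0.5], [0.4, 0.9], ['m', 'a'])
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)  # gap or overlap between tokens 0 and 1
```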
-
to_list()
Returns the alignment as a list of triplets (onset, offset, token)
This is the reverse operation of from_list().
-
class shennong.alignment.AlignmentCollection(data)
Bases: dict
A dictionary of Alignment instances indexed by items
An AlignmentCollection is a usual Python dictionary with some additional functions. Keys are strings, values are Alignment instances.
- Parameters
data (sequence of quadruplets) – A list or a sequence of quadruplets (item, onset, offset, token) representing a time aligned token for a given item, where onset is the start timestamp of the pronounced token, offset is the end timestamp of the pronunciation and token is a string representation of the token. onset and offset are expressed in seconds.
- Raises
ValueError – If one element of data is not a quadruplet, or if the Alignment mapped to an item cannot be instantiated.
-
static load(filename, compress=False)
Returns an AlignmentCollection loaded from the alignment file
The text file, optionally compressed, is read as utf8. It must be composed of lines with 4 fields: <item> <onset> <offset> <token>.
- Parameters
filename (str) – The path to the alignment file to read, must be an existing text file.
- Returns
alignment (AlignmentCollection) – The AlignmentCollection instance initialized from the alignment file
- Raises
ValueError – If the alignment file is not a valid alignment or if the AlignmentCollection cannot be instantiated.
-
save(filename, sort=False, compress=False)
Save the alignments to a file
- Parameters
filename (str) – The text file to write (should have a .txt extension, or .txt.gz if compress is True, but this is not required). Must be a non-existing file.
sort (bool, optional) – When True, the items are sorted in lexicographical order. Defaults to False.
compress (bool, optional) – When True, the file is compressed using the gzip algorithm. Defaults to False.
- Raises
ValueError – If the filename already exists or is not writable.
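The behaviour described for save() (refuse to overwrite, optional sorting, optional gzip compression) can be sketched with the standard library alone. This is not shennong's code: write_alignments is a hypothetical helper operating on a plain dictionary of triplets, and the way numbers are formatted in the output is an assumption.

```python
import gzip
import os
import tempfile


def write_alignments(collection, filename, sort=False, compress=False):
    """Write {item: [(onset, offset, token), ...]} as 4-field text lines.

    Hypothetical helper mirroring the documented parameters of save().
    """
    if os.path.exists(filename):
        raise ValueError(f'{filename} already exists')
    items = sorted(collection) if sort else list(collection)
    # gzip.open and open share the same text-mode interface
    opener = gzip.open if compress else open
    with opener(filename, 'wt', encoding='utf8') as stream:
        for item in items:
            for onset, offset, token in collection[item]:
                stream.write(f'{item} {onset} {offset} {token}\n')


collection = {'b': [(0.0, 0.1, 'x')], 'a': [(0.0, 0.2, 'y')]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'alignment.txt.gz')
    write_alignments(collection, path, sort=True, compress=True)
    # read the compressed file back to inspect what was written
    with gzip.open(path, 'rt', encoding='utf8') as stream:
        written = stream.read().splitlines()
print(written)  # ['a 0.0 0.2 y', 'b 0.0 0.1 x']
```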
-
get_tokens_inventory()
Returns the different tokens composing the collection
- Returns
tokens (set) – Unique tokens present in the collection’s alignments
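Conceptually, the inventory is the set of distinct tokens across all items. A minimal sketch on a plain dictionary of triplets follows; tokens_inventory is a hypothetical stand-in and does not call the library.

```python
def tokens_inventory(collection):
    """Return the set of unique tokens found in all alignments.

    Hypothetical helper: collection maps items to (onset, offset, token)
    triplets, not to Alignment instances.
    """
    return {
        token
        for triplets in collection.values()
        for _, _, token in triplets
    }


collection = {
    'utt1': [(0.0, 0.1, 'm'), (0.1, 0.2, 'a')],
    'utt2': [(0.0, 0.1, 'a'), (0.1, 0.3, 'r')],
}
inventory = tokens_inventory(collection)
print(sorted(inventory))  # ['a', 'm', 'r']
```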
-
clear() → None. Remove all items from D.
-
copy() → a shallow copy of D
-
fromkeys(iterable, value=None, /)
Create a new dictionary with keys from iterable and values set to value.
-
get(key, default=None, /)
Return the value for key if key is in the dictionary, else default.
-
items() → a set-like object providing a view on D’s items
-
keys() → a set-like object providing a view on D’s keys
-
pop(k[, d]) → v, remove specified key and return the corresponding value.
If key is not found, d is returned if given, otherwise KeyError is raised
-
popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.
-
setdefault(key, default=None, /)
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
-
update([E, ]**F) → None. Update D from dict/iterable E and F.
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]
-
values() → an object providing a view on D’s values