API reference¶
- fastabx.zerospeech_abx(item, root, *, max_size_group, max_x_across, speaker='within', context='within', distance='angular', frequency=50, feature_maker=<function load>, extension='.pt', seed=0)[source]¶
Compute the ABX score, similarly to the ZeroSpeech 2021 challenge.
On triphones or phonemes, described by an item file. Within or across speakers, and within context or ignoring context.
- Parameters:
item (str | Path) – Path to the item file.
root (str | Path) – Path to the root directory containing either the features or the audio files.
max_size_group (int | None) – Maximum number of instances of A, B, or X in each Cell. Passed to the Subsampler of the Task. Set to 10 in the original ZeroSpeech ABX code. Disabled if set to None.
max_x_across (int | None) – In the “across” speaker mode, maximum number of X considered for given values of A and B. Passed to the Subsampler of the Task. Set to 5 in the original ZeroSpeech ABX code. Disabled if set to None.
speaker (Literal['within', 'across']) – The speaker mode, either “within” or “across”. Defaults to “within”.
context (Literal['within', 'any']) – The context mode, either “within” or “any”. Always use “within” with representations of triphones. Defaults to “within”.
distance (DistanceName) – The distance metric, “angular” (same as “cosine”), “euclidean”, “kl_symmetric” or “identical”. Defaults to “angular”.
frequency (int) – The frequency of the features (the output of the feature maker), in Hz. Defaults to 50 Hz.
feature_maker (Callable[[str | Path], Tensor]) – Function that takes a path and returns a torch.Tensor. Defaults to torch.load.
extension (str) – The filename extension of the files to process in root, default is “.pt”.
seed (int) – The random seed for the subsampling, default is 0.
- Return type:
float
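The returned float is the ABX error rate averaged over cells. As an illustration only (this is not the fastabx implementation), the per-cell error can be sketched in pure Python: for each triplet (A, B, X), with X drawn from the same category as A, an error is counted when X is closer to B than to A, and half an error on ties.

```python
# Illustrative sketch, NOT the fastabx implementation: ABX error rate of
# one cell given a scalar distance function. (In the real task, X ranges
# over tokens of A's category distinct from A, and the distance between
# two sequences of frames is itself computed with DTW.)
from itertools import product


def abx_error_rate(a_items, b_items, x_items, dist):
    """Mean error over all (a, b, x) triplets in one cell."""
    errors, total = 0.0, 0
    for a, b, x in product(a_items, b_items, x_items):
        d_ax, d_bx = dist(a, x), dist(b, x)
        if d_bx < d_ax:      # x wrongly closer to the other category
            errors += 1.0
        elif d_bx == d_ax:   # tie counts as half an error
            errors += 0.5
        total += 1
    return errors / total


# Toy 1D example: category A around 0.0, category B around 1.0.
dist = lambda u, v: abs(u - v)
print(abx_error_rate([0.0, 0.1], [1.0, 0.9], [0.05, 0.2], dist))  # 0.0
```

Lower is better: 0.0 means every X was correctly closer to A, 0.5 is chance level.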
Standard classes and functions¶
Dataset¶
- class fastabx.Dataset(labels, accessor)[source]¶
Simple interface to a dataset.
- Parameters:
labels (DataFrame) – pl.DataFrame containing the labels of the datapoints.
accessor (InMemoryAccessor) – InMemoryAccessor to access the data.
- classmethod from_csv(path, feature_columns, *, separator=',')[source]¶
Create a dataset from a CSV file.
- Parameters:
path (str | Path) – Path to the CSV file containing both the labels and the features.
feature_columns (str | Collection[str]) – Column name or list of column names containing the features.
separator (str) – Separator used in the CSV file.
- Return type:
Dataset
- classmethod from_dataframe(df, feature_columns)[source]¶
Create a dataset from a DataFrame (polars or pandas).
- Parameters:
df (SupportsInterchange) – DataFrame containing both the labels and the features.
feature_columns (str | Collection[str]) – Column name or list of column names containing the features.
- Return type:
Dataset
- classmethod from_item(item, root, frequency, *, feature_maker=<function load>, extension='.pt', file_col='#file', onset_col='onset', offset_col='offset')[source]¶
Create a dataset from an item file.
If you want to keep the Libri-Light bug to reproduce previous results, set the environment variable FASTABX_WITH_LIBRILIGHT_BUG=1.
- Parameters:
item (str | Path) – Path to the item file.
root (str | Path) – Path to the root directory containing either the features or the audio files.
frequency (int) – The feature frequency of the features / the output of the feature maker, in Hz.
feature_maker (Callable[[str | Path], Tensor]) – Function that takes a path and returns a torch.Tensor. Defaults to torch.load.
extension (str) – The filename extension of the files to process in root, default is “.pt”.
file_col (str) – Column in the item file that contains the audio file names, default is “#file”.
onset_col (str) – Column in the item file that contains the onset times, default is “onset”.
offset_col (str) – Column in the item file that contains the offset times, default is “offset”.
- Return type:
Dataset
- classmethod from_item_and_units(item, units, frequency, *, audio_key='audio', units_key='units', separator=' ', file_col='#file', onset_col='onset', offset_col='offset')[source]¶
Create a dataset from an item file with the units all described in a single JSONL file.
- Parameters:
item (str | Path) – Path to the item file.
units (str | Path) – Path to the JSONL file containing the units.
frequency (int) – The feature frequency, in Hz.
audio_key (str) – Key in the JSONL file that contains the audio file names, default is “audio”.
units_key (str) – Key in the JSONL file that contains the units, default is “units”.
separator (str) – Separator used in the units field, default is a single whitespace “ ”.
file_col (str) – Column in the item file that contains the audio file names, default is “#file”.
onset_col (str) – Column in the item file that contains the onset times, default is “onset”.
offset_col (str) – Column in the item file that contains the offset times, default is “offset”.
- Return type:
Dataset
- classmethod from_item_with_times(item, features, times, *, file_col='#file', onset_col='onset', offset_col='offset')[source]¶
Create a dataset from an item file.
Use arrays containing the times associated with the features instead of a fixed frequency.
- Parameters:
item (str | Path) – Path to the item file.
features (str | Path) – Path to the root directory containing the features.
times (str | Path) – Path to the root directory containing the times arrays.
file_col (str) – Column in the item file that contains the audio file names, default is “#file”.
onset_col (str) – Column in the item file that contains the onset times, default is “onset”.
offset_col (str) – Column in the item file that contains the offset times, default is “offset”.
- Return type:
Dataset
- classmethod from_numpy(features, labels)[source]¶
Create a dataset from the features (numpy array) and the labels (dictionary of sequences).
- Parameters:
features (ArrayLike) – 2D array-like containing the features.
labels (Mapping[str, Sequence[object]] | SupportsInterchange) – Dictionary of sequences or DataFrame containing the labels.
- Return type:
Dataset
Task¶
- class fastabx.Task(dataset, *, on, by=None, across=None, subsampler=None)[source]¶
The ABX task class.
A Task builds all the Cells given the on, by and across conditions. It can be subsampled to limit the number of cells.
- Parameters:
dataset (Dataset) – The dataset containing the features and the labels.
on (str) – The on condition.
by (list[str] | None) – The list of by conditions.
across (list[str] | None) – The list of across conditions.
subsampler (Subsampler | None) – An optional subsampler to limit the number of cells and their sizes.
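To make the cell construction concrete, here is a hypothetical pure-Python sketch (not the fastabx internals; the across machinery is omitted for brevity, and all names are illustrative): items are grouped by their by values, and within each group every ordered pair of distinct on values yields one cell, where A and X share the on value and B takes the other.

```python
# Hypothetical sketch of how "on" and "by" conditions carve a dataset
# into cells. Not the fastabx internals; "across" handling is omitted.
from collections import defaultdict
from itertools import permutations


def build_cells(items, on, by):
    """items: list of label dicts. Returns {(by_vals, on_a, on_b): cell}."""
    groups = defaultdict(lambda: defaultdict(list))
    for it in items:
        by_key = tuple(it[c] for c in by)
        groups[by_key][it[on]].append(it)
    cells = {}
    for by_key, by_group in groups.items():
        # A and X share the "on" value; B takes a different one.
        for on_a, on_b in permutations(by_group, 2):
            cells[(by_key, on_a, on_b)] = {
                "a": by_group[on_a], "b": by_group[on_b], "x": by_group[on_a],
            }
    return cells


items = [
    {"phone": "a", "speaker": "s1"}, {"phone": "a", "speaker": "s1"},
    {"phone": "e", "speaker": "s1"}, {"phone": "a", "speaker": "s2"},
]
cells = build_cells(items, on="phone", by=["speaker"])
print(sorted(cells))  # two cells for s1 ("a" vs "e", "e" vs "a"); none for s2
```

Note that speaker s2 yields no cell: with a single phone there is no contrasting B, which is why subsampling and data coverage both matter for the number of usable cells.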
Subsample¶
- class fastabx.Subsampler(max_size_group, max_x_across, seed=0)[source]¶
Subsample the ABX Task.
Each cell is limited to max_size_group items for A, B and X independently. When using “across” conditions, each group of (A, B) is limited to max_x_across possible values for X. Subsampling for one or more conditions can be disabled by setting the corresponding argument to None.
- Parameters:
max_size_group (int | None) – Maximum number of instances of A, B, or X in each Cell. Set to 10 in the original ZeroSpeech ABX code. Disabled if set to None.
max_x_across (int | None) – In the “across” speaker mode, maximum number of X considered for given values of A and B. Set to 5 in the original ZeroSpeech ABX code. Disabled if set to None.
seed (int) – The random seed for the subsampling, default is 0.
Score¶
- class fastabx.Score(task, distance_name, *, constraints=None)[source]¶
Compute the score of a Task using the distance specified by distance_name.
Additional Constraints can be provided to restrict the possible triplets in each cell.
- Parameters:
task (Task) – The task to score.
distance_name (DistanceName) – The name of the distance metric to use.
constraints (Constraints | None) – Optional constraints restricting the triplets in each cell.
- collapse(*, levels=None, weighted=False)[source]¶
Collapse the scored cells into the final score.
Use either levels or weighted=True to collapse the scores.
- Parameters:
levels (Sequence[tuple[str, ...] | str] | None) – List of levels to collapse. The order matters a lot.
weighted (bool) – Whether to collapse the scores using a mean weighted by the size of the cells.
- Return type:
float
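As an illustration of the weighted mode (assuming, as the description suggests, a mean of per-cell scores weighted by cell size), a minimal sketch:

```python
# Sketch of weighted collapse (assumption: "weighted" means a mean of
# per-cell scores weighted by the number of triplets in each cell).
def collapse_weighted(cell_scores, cell_sizes):
    total = sum(cell_sizes)
    return sum(s * n for s, n in zip(cell_scores, cell_sizes)) / total


print(collapse_weighted([0.0, 0.5], [3, 1]))  # 0.125
```

Collapsing by levels instead averages scores within each level before averaging across levels, which is why the order of the levels changes the result.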
Pooling¶
- fastabx.pooling(dataset, pooling_name)[source]¶
Pool the Dataset using the pooling method given by pooling_name.
The pooled dataset is a new one, with data stored in memory. For simplicity, we iterate through the original dataset and apply pooling on each element.
- Parameters:
dataset (Dataset) – The dataset to pool.
pooling_name (PoolingName) – The pooling method, either “mean” or “hamming”.
- Return type:
PooledDataset
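For the “mean” method, pooling presumably reduces each element's (T, d) sequence of frames to a single d-dimensional vector, so no DTW over time is needed afterwards. A minimal pure-Python sketch (the “hamming” method is not shown):

```python
# Sketch of mean pooling on one element: a (T, d) sequence of frames is
# reduced to a single (1, d) vector by averaging over time.
def mean_pool(frames):
    """frames: list of T feature vectors (length-d lists) -> (1, d)."""
    t, d = len(frames), len(frames[0])
    return [[sum(f[j] for f in frames) / t for j in range(d)]]


print(mean_pool([[0.0, 2.0], [2.0, 4.0]]))  # [[1.0, 3.0]]
```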
Advanced¶
Cell¶
- class fastabx.cell.Cell(a, b, x, header, description, is_symmetric)[source]¶
Individual cell of the ABX task.
Cells are the unit of work for the ABX Task and Score. They are collections of triplets (A, B, X) that share the same values for the on, by and across conditions.
- Parameters:
a (Batch) – Batch of A samples.
b (Batch) – Batch of B samples.
x (Batch) – Batch of X samples.
header (str) – Short string identifying the cell.
description (str) – Long string describing the cell.
is_symmetric (bool) – Whether or not the cell is symmetric (i.e., A and X are the same set).
Distance¶
- fastabx.distance.distance_on_cell(cell, distance)[source]¶
Compute the distance matrices between all A and X, and all B and X in the cell, for a given distance.
- Parameters:
cell (Cell) – The cell to compute the distances on.
distance (Distance) – The distance function to use. It takes two tensors of shape (n1, s1, d) and (n2, s2, d) and returns a tensor of shape (n1, n2, s1, s2).
- Return type:
tuple[Tensor, Tensor]
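The frame-level metrics listed earlier can be sketched in plain Python. For example, the “angular” distance is presumably the angle between two vectors, i.e. the arccosine of their cosine similarity; whether fastabx scales it to [0, 1] by dividing by π, as done below, is an assumption of this sketch:

```python
# Sketch of a frame-level "angular" distance (assumption: the angle
# between the two vectors, scaled to [0, 1] by dividing by pi).
import math


def angular(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (nu * nv)))  # clamp rounding error
    return math.acos(cos) / math.pi


print(angular([1.0, 0.0], [0.0, 1.0]))  # 0.5 (orthogonal vectors)
```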
- fastabx.distance.abx_on_cell(cell, distance, *, mask=None)[source]¶
Compute the ABX of a cell using the given distance.
- Parameters:
cell (Cell) – The cell to compute the ABX on.
distance (Distance) – The distance function to use. It takes two tensors of shape (n1, s1, d) and (n2, s2, d) and returns a tensor of shape (n1, n2, s1, s2).
mask (Tensor | None) – Optional boolean mask of shape (nx, na, nb) to select which triplets to include in the score.
- Return type:
Tensor
DTW¶
- fastabx.dtw.dtw(distances)[source]¶
Compute the DTW of the given distances 2D tensor.
- Parameters:
distances (Tensor) – A 2D tensor of shape (n, m) representing the pairwise distances between two sequences.
- Return type:
Tensor
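The standard DTW recurrence behind this function can be sketched in pure Python: the accumulated cost at (i, j) is the local distance plus the cheapest of the three predecessors (i-1, j), (i, j-1) and (i-1, j-1). Whether fastabx returns the full accumulated-cost matrix or something else is not specified here; this sketch returns only the final cost.

```python
# Sketch of the standard DTW recurrence over a 2D cost matrix
# (list of lists), returning the final accumulated cost.
def dtw_cost(distances):
    n, m = len(distances), len(distances[0])
    acc = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            best = 0.0
            if i and j:  # interior cell: three possible predecessors
                best = min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
            elif i:      # first column: only vertical moves
                best = acc[i - 1][j]
            elif j:      # first row: only horizontal moves
                best = acc[i][j - 1]
            acc[i][j] = distances[i][j] + best
    return acc[n - 1][m - 1]


print(dtw_cost([[0.0, 1.0], [1.0, 0.0]]))  # 0.0: the diagonal path is free
```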
- fastabx.dtw.dtw_batch(distances, sx, sy, *, symmetric)[source]¶
Compute the batched DTW on the distances 4D tensor.
- Parameters:
distances (Tensor) – A 4D tensor of shape (n1, n2, s1, s2) representing the pairwise distances between two batches of sequences.
sx (Tensor) – A 1D tensor of shape (n1,) representing the lengths of the sequences in the first batch.
sy (Tensor) – A 1D tensor of shape (n2,) representing the lengths of the sequences in the second batch.
symmetric (bool) – Whether or not the DTW is symmetric (i.e., the two batches are the same).
- Return type:
Tensor
Constraints¶
- type fastabx.constraints.Constraints¶
Type alias for Iterable[pl.Expr | pl.Series | str].
Should be a valid input to pl.DataFrame.filter. See With constraints to understand how to use them.
- fastabx.constraints.constraints_all_different(*columns)[source]¶
Return Constraints that ensure that each specified column has different values for A, B and X.
- Parameters:
columns (str) – The columns to apply the constraints on.
- Return type:
Constraints
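To illustrate what such a constraint enforces (a hypothetical pure-Python analogue, not the polars-based implementation): with constraints_all_different("speaker"), a triplet survives only if its A, B and X speakers are pairwise different. This corresponds to the kind of boolean mask of shape (nx, na, nb) accepted by abx_on_cell:

```python
# Hypothetical analogue of constraints_all_different("speaker"): build a
# boolean mask over (x, a, b) triplets that is True only where the three
# speakers are pairwise different. Not the polars implementation.
def all_different_mask(a_spk, b_spk, x_spk):
    """Boolean mask of shape (nx, na, nb)."""
    return [
        [[a != b and a != x and b != x for b in b_spk] for a in a_spk]
        for x in x_spk
    ]


mask = all_different_mask(["s1"], ["s2"], ["s1", "s3"])
print(mask)  # [[[False]], [[True]]]: only the x="s3" triplet survives
```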
Environment variables¶
FASTABX_WITH_LIBRILIGHT_BUG
: If set to 1, changes the behaviour of Dataset.from_item to match Libri-Light. Every feature will be one frame shorter. This should be set only if you want to replicate previous results obtained with Libri-Light / ZeroSpeech 2021. See Slicing features for more details on how features are sliced.