API reference¶
- fastabx.zerospeech_abx(item, root, *, max_size_group, max_x_across, speaker='within', context='within', distance='angular', frequency=50, feature_maker=<function load>, extension='.pt', seed=0)[source]¶
Compute the ABX in the same way as the ZeroSpeech 2021 challenge.
The task is on triphones or phonemes, described by an item file. The speaker mode is either within or across, and the context is either within or ignored.
- Parameters:
item (str | Path) – Path to the item file.
root (str | Path) – Path to the root directory containing either the features or the audio files.
max_size_group (int | None) – Maximum number of instances of A, B, or X in each Cell. Passed to the Subsampler of the Task. Set to 10 in the original ZeroSpeech ABX code. Disabled if set to None.
max_x_across (int | None) – In the “across” speaker mode, maximum number of X considered for given values of A and B. Passed to the Subsampler of the Task. Set to 5 in the original ZeroSpeech ABX code. Disabled if set to None.
speaker (Literal['within', 'across']) – The speaker mode, either “within” or “across”. Defaults to “within”.
context (Literal['within', 'any']) – The context mode, either “within” or “any”. Always use “within” with representations of triphones. Defaults to “within”.
distance (DistanceName) – The distance metric, “angular” (same as “cosine”), “euclidean”, “kl_symmetric” or “identical”. Defaults to “angular”.
frequency (int) – The feature frequency of the features / the output of the feature maker, in Hz. Defaults to 50 Hz.
feature_maker (Callable[[str | Path], Tensor]) – Function that takes a path and returns a torch.Tensor. Defaults to torch.load.
extension (str) – The filename extension of the files to process in root, default is “.pt”.
seed (int) – The random seed for the subsampling, default is 0.
- Return type:
float
Standard classes and functions¶
Dataset¶
- class fastabx.Dataset(labels, accessor)[source]¶
Simple interface to a dataset.
- Parameters:
labels (DataFrame) – pl.DataFrame containing the labels of the datapoints.
accessor (InMemoryAccessor) – InMemoryAccessor to access the data.
- classmethod from_csv(path, feature_columns, *, separator=',')[source]¶
Create a dataset from a CSV file.
- Parameters:
path (str | Path) – Path to the CSV file containing both the labels and the features.
feature_columns (str | Collection[str]) – Column name or list of column names containing the features.
separator (str) – Separator used in the CSV file.
- Return type:
Dataset
- classmethod from_dataframe(df, feature_columns)[source]¶
Create a dataset from a DataFrame (polars or pandas).
- Parameters:
df (SupportsInterchange) – DataFrame containing both the labels and the features.
feature_columns (str | Collection[str]) – Column name or list of column names containing the features.
- Return type:
Dataset
- classmethod from_item(item, root, frequency, *, feature_maker=<function load>, extension='.pt', file_col='#file', onset_col='onset', offset_col='offset')[source]¶
Create a dataset from an item file.
If you want to keep the Libri-Light bug to reproduce previous results, set the environment variable FASTABX_WITH_LIBRILIGHT_BUG=1.
- Parameters:
item (str | Path) – Path to the item file.
root (str | Path) – Path to the root directory containing either the features or the audio files.
frequency (int) – The feature frequency of the features / the output of the feature maker, in Hz.
feature_maker (Callable[[str | Path], Tensor]) – Function that takes a path and returns a torch.Tensor. Defaults to torch.load.
extension (str) – The filename extension of the files to process in root, default is “.pt”.
file_col (str) – Column in the item file that contains the audio file names, default is “#file”.
onset_col (str) – Column in the item file that contains the onset times, default is “onset”.
offset_col (str) – Column in the item file that contains the offset times, default is “offset”.
- Return type:
Dataset
- classmethod from_item_and_units(item, units, frequency, *, audio_key='audio', units_key='units', separator=' ', file_col='#file', onset_col='onset', offset_col='offset')[source]¶
Create a dataset from an item file with the units all described in a single JSONL file.
- Parameters:
item (str | Path) – Path to the item file.
units (str | Path) – Path to the JSONL file containing the units.
frequency (int) – The feature frequency, in Hz.
audio_key (str) – Key in the JSONL file that contains the audio file names, default is “audio”.
units_key (str) – Key in the JSONL file that contains the units, default is “units”.
separator (str) – Separator used in the units field, default is a single space “ ”.
file_col (str) – Column in the item file that contains the audio file names, default is “#file”.
onset_col (str) – Column in the item file that contains the onset times, default is “onset”.
offset_col (str) – Column in the item file that contains the offset times, default is “offset”.
- Return type:
Dataset
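As a rough illustration of how the audio_key, units_key and separator parameters relate to one record of the JSONL file, the sketch below parses a single line with the standard library. The record layout (a units field holding integer ids joined by the separator), the file name, and the helper parse_units_line are assumptions made for illustration; they are not taken from fastabx.

```python
import json


def parse_units_line(line, audio_key="audio", units_key="units", separator=" "):
    """Parse one JSONL record into (audio file name, list of integer units)."""
    record = json.loads(line)
    # Assumption: the units field is a string of integer ids joined by `separator`.
    units = [int(u) for u in record[units_key].split(separator)]
    return record[audio_key], units


# Hypothetical record, using the default keys and separator.
line = '{"audio": "utt_0001.wav", "units": "12 12 7 45"}'
name, units = parse_units_line(line)
```

With a different layout, only the three keyword arguments would need to change.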
- classmethod from_item_with_times(item, features, times, *, file_col='#file', onset_col='onset', offset_col='offset')[source]¶
Create a dataset from an item file.
Use arrays containing the times associated to the features instead of a given frequency.
- Parameters:
item (str | Path) – Path to the item file.
features (str | Path) – Path to the root directory containing either the features or the audio files.
times (str | Path) – Path to the root directory containing the times arrays.
file_col (str) – Column in the item file that contains the audio file names, default is “#file”.
onset_col (str) – Column in the item file that contains the onset times, default is “onset”.
offset_col (str) – Column in the item file that contains the offset times, default is “offset”.
- Return type:
Dataset
- classmethod from_numpy(features, labels)[source]¶
Create a dataset from the features (numpy array) and the labels (dictionary of sequences).
- Parameters:
features (ArrayLike) – 2D array-like containing the features.
labels (Mapping[str, Sequence[object]] | SupportsInterchange) – Dictionary of sequences or DataFrame containing the labels.
- Return type:
Dataset
Task¶
- class fastabx.Task(dataset, *, on, by=None, across=None, subsampler=None)[source]¶
The ABX task class.
A Task builds all the Cell given on, by and across conditions. It can be subsampled to limit the number of cells.
- Parameters:
dataset (Dataset) – The dataset containing the features and the labels.
on (str) – The on condition.
by (list[str] | None) – The list of by conditions.
across (list[str] | None) – The list of across conditions.
subsampler (Subsampler | None) – An optional subsampler to limit the number of cells and their sizes.
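To make the role of the on and by conditions concrete, here is a minimal pure-Python sketch of how cells could be enumerated from a table of labels. It assumes the usual ABX convention: A and X share the on value, B has a different one, and all three share the by values. The across condition is left out for brevity, and build_cells is a hypothetical helper, not the fastabx implementation.

```python
from collections import defaultdict
from itertools import permutations


def build_cells(labels, on, by=()):
    """Group row indices into ABX cells for an `on` condition and optional `by` conditions."""
    groups = defaultdict(list)
    for idx, row in enumerate(labels):
        groups[(row[on], tuple(row[c] for c in by))].append(idx)

    cells = []
    # Ordered pairs of distinct groups: the first provides A and X, the second provides B.
    for (on_ax, by_ax), (on_b, by_b) in permutations(groups, 2):
        if by_ax != by_b:
            continue  # A, B and X must share the `by` values
        cells.append({
            "on_ax": on_ax,
            "on_b": on_b,
            "by": by_ax,
            "a": groups[(on_ax, by_ax)],
            "b": groups[(on_b, by_b)],
            "x": groups[(on_ax, by_ax)],  # without `across`, X is drawn from the same set as A
        })
    return cells


labels = [
    {"phone": "a", "speaker": "s1"},
    {"phone": "a", "speaker": "s1"},
    {"phone": "b", "speaker": "s1"},
    {"phone": "a", "speaker": "s2"},
    {"phone": "b", "speaker": "s2"},
]
cells = build_cells(labels, on="phone", by=["speaker"])
```

Each speaker contributes two cells here (a versus b, and b versus a), which is why the number of cells grows quickly and subsampling becomes useful.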
Subsample¶
- class fastabx.Subsampler(max_size_group, max_x_across, seed=0)[source]¶
Subsample the ABX Task.
Each cell is limited to max_size_group items for A, B and X independently. When using “across” conditions, each group of (A, B) is limited to max_x_across possible values for X. Subsampling for one or more conditions can be disabled by setting the corresponding argument to None.
- Parameters:
max_size_group (int | None) – Maximum number of instances of A, B, or X in each Cell. Set to 10 in the original ZeroSpeech ABX code. Disabled if set to None.
max_x_across (int | None) – In the “across” speaker mode, maximum number of X considered for given values of A and B. Set to 5 in the original ZeroSpeech ABX code. Disabled if set to None.
seed (int) – The random seed for the subsampling, default is 0.
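The max_size_group rule can be pictured as a seeded random draw without replacement applied to each of A, B and X. The sketch below shows that idea with the standard library only; subsample_group is a hypothetical helper, and the sampling procedure used by fastabx itself may differ.

```python
import random


def subsample_group(indices, max_size_group, rng):
    """Keep at most `max_size_group` indices; `None` disables the limit."""
    indices = list(indices)
    if max_size_group is None or len(indices) <= max_size_group:
        return indices
    # Seeded draw without replacement, so the result is reproducible.
    return rng.sample(indices, max_size_group)


rng = random.Random(0)  # plays the role of the `seed` argument
a = subsample_group(range(25), 10, rng)           # truncated to 10 items
b = subsample_group(range(4), 10, rng)            # already small enough, kept as is
unlimited = subsample_group(range(25), None, rng)  # limit disabled
```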
Score¶
- class fastabx.Score(task, distance_name, *, constraints=None)[source]¶
Compute the score of a Task using a given distance specified by distance_name.
Additional Constraints can be provided to restrict the possible triplets in each cell.
- Parameters:
- collapse(*, levels=None, weighted=False)[source]¶
Collapse the scored cells into the final score.
Use either levels or weighted=True to collapse the scores.
- Parameters:
levels (Sequence[tuple[str, ...] | str] | None) – List of levels to collapse. The order matters a lot.
weighted (bool) – Whether to collapse the scores using a mean weighted by the size of the cells.
- Return type:
float
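The two collapsing modes can be illustrated with plain Python. The sketch first contrasts a size-weighted mean with an unweighted one, then shows why the order of levels matters when averaging step by step over unbalanced cells. The helpers weighted_mean and average_out and the toy scores are assumptions for illustration; how fastabx implements levels internally is not shown here.

```python
from collections import defaultdict
from statistics import mean


def weighted_mean(cells):
    """Mean of (score, size) pairs, weighted by cell size."""
    total = sum(size for _, size in cells)
    return sum(score * size for score, size in cells) / total


def average_out(scores, position):
    """Average over the label at `position`, grouping by the remaining labels."""
    groups = defaultdict(list)
    for key, value in scores.items():
        groups[key[:position] + key[position + 1:]].append(value)
    return {key: mean(values) for key, values in groups.items()}


cells = [(0.9, 100), (0.5, 10)]
weighted = weighted_mean(cells)                  # dominated by the large cell
unweighted = mean(score for score, _ in cells)   # both cells count equally

# Keys are (speaker, context); the (s2, c2) cell is missing, so the data is unbalanced.
scores = {("s1", "c1"): 1.0, ("s1", "c2"): 0.0, ("s2", "c1"): 0.5}
context_first = average_out(average_out(scores, 1), 0)[()]
speaker_first = average_out(average_out(scores, 0), 0)[()]
```

Here averaging over context first gives 0.5, while averaging over speaker first gives 0.375, which is the kind of discrepancy the note on level order warns about.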
Pooling¶
- fastabx.pooling(dataset, pooling_name)[source]¶
Pool the Dataset using the pooling method given by pooling_name.
The pooled dataset is a new one, with data stored in memory. For simplicity, we iterate through the original dataset and apply pooling on each element.
- Parameters:
dataset (Dataset) – The dataset to pool.
pooling_name (PoolingName) – The pooling method, either “mean” or “hamming”.
- Return type:
PooledDataset
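For intuition, “mean” pooling reduces a variable-length sequence of frames to a single vector by averaging over time. The sketch below does this on nested lists; mean_pool is a hypothetical helper rather than the fastabx code, and the “hamming” variant is not covered.

```python
def mean_pool(frames):
    """Average a sequence of d-dimensional frames into one d-dimensional vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[j] for frame in frames) / n for j in range(dim)]


frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 frames of dimension 2
pooled = mean_pool(frames)
```

After pooling, every item has the same length, so distances between items no longer need a time alignment step.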
Advanced¶
Cell¶
- class fastabx.cell.Cell(a, b, x, header, description, is_symmetric)[source]¶
Individual cell of the ABX task.
Cells are the unit of work for the ABX Task and Score. They are collections of triplets (A, B, X) that share the same values for the on, by and across conditions.
- Parameters:
a (Batch) – Batch of A samples.
b (Batch) – Batch of B samples.
x (Batch) – Batch of X samples.
header (str) – Short string identifying the cell.
description (str) – Long string describing the cell.
is_symmetric (bool) – Whether or not the cell is symmetric (i.e., A and X are the same set).
Distance¶
- fastabx.distance.distance_on_cell(cell, distance)[source]¶
Compute the distance matrices between all A and X, and all B and X in the cell, for a given distance.
- Parameters:
cell (Cell) – The cell to compute the distances on.
distance (Distance) – The distance function to use. It takes two tensors of shape (n1, s1, d) and (n2, s2, d) and returns a tensor of shape (n1, n2, s1, s2).
- Return type:
tuple[Tensor, Tensor]
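The shape contract of the distance argument, (n1, s1, d) and (n2, s2, d) mapping to (n1, n2, s1, s2), can be checked on a small example. The sketch below uses nested lists and a Euclidean frame distance instead of tensors; euclidean_distance is a hypothetical helper written for illustration only.

```python
import math


def euclidean_distance(x, y):
    """Frame-wise distances: result[i][j][k][l] = dist(x[i][k], y[j][l])."""
    return [
        [
            [[math.dist(fx, fy) for fy in seq_y] for fx in seq_x]
            for seq_y in y
        ]
        for seq_x in x
    ]


x = [[[0.0, 0.0], [3.0, 4.0]]]              # n1 = 1 sequence, s1 = 2 frames, d = 2
y = [[[0.0, 0.0]], [[3.0, 4.0]]]            # n2 = 2 sequences, s2 = 1 frame, d = 2
out = euclidean_distance(x, y)              # nested lists of shape (1, 2, 2, 1)
shape = (len(out), len(out[0]), len(out[0][0]), len(out[0][0][0]))
```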
- fastabx.distance.abx_on_cell(cell, distance, *, mask=None)[source]¶
Compute the ABX of a cell using the given distance.
- Parameters:
cell (Cell) – The cell to compute the ABX on.
distance (Distance) – The distance function to use. It takes two tensors of shape (n1, s1, d) and (n2, s2, d) and returns a tensor of shape (n1, n2, s1, s2).
mask (Tensor | None) – Optional boolean mask of shape (nx, na, nb) to select which triplets to include in the score.
- Return type:
Tensor
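Once the A-to-X and B-to-X distances are available, the ABX decision for each triplet is a comparison between the two. The sketch below counts a triplet as an error when X is closer to B than to A, counts ties as half an error, and applies an optional mask indexed as (x, a, b). Reporting an error rate and the tie handling are assumptions about the convention; abx_error_rate is a hypothetical helper, not the library function.

```python
def abx_error_rate(dist_ax, dist_bx, mask=None):
    """dist_ax[a][x] and dist_bx[b][x] are sequence-level distances; mask[x][a][b] selects triplets."""
    errors, total = 0.0, 0
    for x in range(len(dist_ax[0])):
        for a in range(len(dist_ax)):
            for b in range(len(dist_bx)):
                if mask is not None and not mask[x][a][b]:
                    continue  # triplet excluded by the mask
                total += 1
                if dist_ax[a][x] > dist_bx[b][x]:
                    errors += 1.0  # X is closer to B than to A
                elif dist_ax[a][x] == dist_bx[b][x]:
                    errors += 0.5  # tie
    return errors / total


dist_ax = [[0.1, 0.4]]                # 1 item in A, 2 items in X
dist_bx = [[0.3, 0.2], [0.5, 0.4]]    # 2 items in B, 2 items in X
full = abx_error_rate(dist_ax, dist_bx)

# Drop the triplet (x=1, a=0, b=1), which is the tie.
mask = [[[True, True]], [[True, False]]]
masked = abx_error_rate(dist_ax, dist_bx, mask)
```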
DTW¶
- fastabx.dtw.dtw(distances)[source]¶
Compute the DTW of the given distances 2D tensor.
- Parameters:
distances (Tensor) – A 2D tensor of shape (n, m) representing the pairwise distances between two sequences.
- Return type:
Tensor
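As a reminder of what dynamic time warping computes on an (n, m) matrix of frame distances, here is the textbook recurrence in pure Python. It returns the accumulated cost of the best monotonic alignment. Whether fastabx additionally normalizes this cost (for example by the path length) is not stated above, so the sketch makes no claim about matching its output; dtw_cost is a hypothetical helper.

```python
def dtw_cost(distances):
    """Accumulated cost of the cheapest monotonic alignment path through `distances`."""
    n, m = len(distances), len(distances[0])
    inf = float("inf")
    # acc[i][j]: best cost of aligning the first i frames with the first j frames.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_previous = min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
            acc[i][j] = distances[i - 1][j - 1] + best_previous
    return acc[n][m]


distances = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
]
cost = dtw_cost(distances)
```

For this matrix the cheapest path goes through the two zero entries and then one step of cost 1.0.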
- fastabx.dtw.dtw_batch(distances, sx, sy, *, symmetric)[source]¶
Compute the batched DTW on the distances 4D tensor.
- Parameters:
distances (Tensor) – A 4D tensor of shape (n1, n2, s1, s2) representing the pairwise distances between two batches of sequences.
sx (Tensor) – A 1D tensor of shape (n1,) representing the lengths of the sequences in the first batch.
sy (Tensor) – A 1D tensor of shape (n2,) representing the lengths of the sequences in the second batch.
symmetric (bool) – Whether or not the DTW is symmetric (i.e., the two batches are the same).
- Return type:
Tensor
Constraints¶
- type fastabx.constraints.Constraints¶
Type alias for Iterable[pl.Expr | pl.Series | str].
Should be a valid input to pl.DataFrame.filter. See “With constraints” to understand how to use them.
- fastabx.constraints.constraints_all_different(*columns)[source]¶
Return Constraints that ensure that each specified column has different values for A, B and X.
- Parameters:
columns (str) – The columns to apply the constraints on.
- Return type:
Constraints
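The effect of an “all different” constraint can be pictured as a filter over candidate triplets. The real constraints are polars expressions; the sketch below only mirrors the logic in plain Python on dictionaries of labels. The helper all_different and the label layout are assumptions for illustration.

```python
def all_different(a, b, x, columns):
    """True if, for every column, A, B and X carry three distinct values."""
    return all(len({a[c], b[c], x[c]}) == 3 for c in columns)


a = {"speaker": "s1", "phone": "a"}
b = {"speaker": "s2", "phone": "b"}
x_ok = {"speaker": "s3", "phone": "a"}
x_bad = {"speaker": "s1", "phone": "a"}   # same speaker as A

kept = all_different(a, b, x_ok, ["speaker"])
dropped = not all_different(a, b, x_bad, ["speaker"])
```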
Environment variables¶
FASTABX_WITH_LIBRILIGHT_BUG: If set to 1, changes the behaviour of Dataset.from_item to match Libri-Light. Every feature will now be one frame shorter. This should be set only if you want to replicate previous results obtained with Libri-Light / ZeroSpeech 2021. See Slicing features for more details on how features are sliced.
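A minimal sketch of enabling the variable from Python rather than from the shell. Setting it before fastabx is imported and before Dataset.from_item is called is an assumption about when the library reads it; exporting the variable in the shell before starting Python avoids the question entirely.

```python
import os

# Assumption: set the variable early, before building any dataset with fastabx.
os.environ["FASTABX_WITH_LIBRILIGHT_BUG"] = "1"

bug_enabled = os.environ.get("FASTABX_WITH_LIBRILIGHT_BUG") == "1"
```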