.. _constraints:

================
With constraints
================

.. warning::
    This is an advanced feature that should be used with care.
    In most cases, it is not necessary to use constraints.

In the majority of applications, the ABX evaluation is fully specified using only the ON, BY, and ACROSS conditions.
However, in some cases, because of the specificites of the dataset or the hierarchy of attributes, it can be
necessary to filter the triplets further.

The :py:class:`.Subsampler` can already be used to limit the number of :py:class:`.Cell` in a :py:class:`.Task`, but
it only operates at the cell level, not at the triplet level. It can subsample the cells, and remove some based
on the categories used for the ON, BY, and ACROSS conditions. But there is no constrained filtering inside each cell.

To achieve this finer filtering of triplets, the :py:class:`.Score` class accepts a :py:type:`.Constraints` argument.
:py:type:`.Constraints` are lists of polars expressions that are applied to each triplet before computing the ABX
score of the individual cell. The expressions should involve the labels of the triplets contained in ``Dataset.labels``,
suffixed by ``_a``, ``_b``, and ``_x``. This is a powerful and general mechanism, and it can be used to do any kind of filtering.

For example, let's say we are interested in accent discrimination from sentence embeddings.
The dataset is described by the following labels:

.. csv-table::
    :header: sentence, accent, speaker, path

    Hello world, american, A1, /path/1.pt
    Hello world, british, B1, /path/2.pt
    We went to the park yesterday, french, F1, /path/3.pt
    ...

We want to understand whether our embeddings can discriminate accents when the semantic content is the same.
Therefore, we compute the ABX score ON accent, BY sentence.
However, we want to ensure that the speakers are different in each triplet to avoid the trivial cases where the same
speaker has uttered both stimuli A and X. We achieve this by constraining all speakers in a triplet to be different:

.. code-block:: python

    import polars as pl
    import torch
    from fastabx import Dataset, Task, Score

    labels = pl.read_csv("labels.csv")
    embeddings = torch.stack([torch.load(path) for path in labels["path"]]) # Shape (len(labels), dim)

    constraints =  [
        pl.col("speaker_a").ne(pl.col("speaker_x"))
        & pl.col("speaker_a").ne(pl.col("speaker_b"))
        & pl.col("speaker_x").ne(pl.col("speaker_b"))
    ]

    dataset = Dataset.from_numpy(embeddings, labels)
    task = Task(dataset, on="accent", by=["sentence"])
    score = Score(task, "angular", constraints=constraints)
    print(score.collapse(levels=["sentence"]))


In this particular example, we provide :py:func:`.constraints_all_different` to directly build the constraints.

.. code-block:: python

    from fastabx.constraints import constraints_all_different

    constraints = constraints_all_different("speaker")