==================== ABX discriminability ==================== The ABX discriminability task is inspired by match-to-sample tasks used in human psychophysics, and measures the discriminability between two categories. It is a zero-resource evaluation metric that does not rely on training an additional probe. It measures what is directly extractable from the representations. It is dimensionality-agnostic and works with dense or discrete representations. The task is illustrated in the figure below. It measures how well categories of interest are separated in the representation space by determining whether tokens from the same category are closer to each other than to those from a different category. .. image:: ./static/abx_light.svg :width: 70% :align: center :class: only-light .. image:: ./static/abx_dark.svg :width: 70% :align: center :class: only-dark The A, B, and X in the name ABX refer to the methodology. The discriminability of category :math:`A` from category :math:`B` is the probability that a token :math:`x` of category :math:`A` is closer to another :math:`a \in A` than to a token :math:`b \in B`. For example, to measure the discriminability of the phoneme /a/ from /e/, we construct :math:`A` as the set of all the instances of /a/ in our corpus and :math:`B` as all instances of /e/. Below is an example of the ABX task where the categories of to discriminate are the indices of the underlying Gaussians. The Gaussians follow :math:`\mathcal{N}(\mathbf{0}, I)` and :math:`\mathcal{N}(\mathbf{\mu}, I)`. .. image:: ./static/gaussians_light.svg :width: 100% :align: center :class: only-light .. image:: ./static/gaussians_dark.svg :width: 100% :align: center :class: only-dark In this initial formulation, categories have a single attribute: phoneme in our example. However, in many cases, the input signal is characterized simultaneously by multiple attributes. In speech, for instance, the signal at a given time window can be characterized both by the underlying phoneme being uttered and by additional factors such as the surrounding context (previous and following phonemes), and by speaker’s identity. This additional information can be used to build rich ABX tasks that test the extent to which discriminability remains robust despite variability induced by one or several other categories. We can therefore construct an ABX task specified by three conditions, illustrated in the table below: .. list-table:: Example of valid triples for various ABX tasks. :widths: 70 10 10 10 :header-rows: 1 * - Task - :math:`a` - :math:`b` - :math:`x` * - ON fruit - 🍎 - πŸ‹ - 🍏 * - ON color - πŸ‹β€πŸŸ© - 🍎 - 🍐 * - ON fruit, BY color - 🍎 - πŸ“ - 🍎 * - ON fruit, BY color, ACROSS count - 🍏 - 🍐 - 🍏🍏 We say that we measure the ABX discriminability ON the attribute that is identical between the A and X categories, and that is different for B. We measure BY the attribute that remains the same for A, B and X. Finally, when an attribute is the same for A and B but different for X, we say that the measure is ACROSS this attribute. For example, in the standard ABX task that was used in the ZeroSpeech challenges, we measure the ABX discriminability on phoneme, by context, and by or across speaker, using representations of triphones. With a slight abuse of notations: - ON: :math:`a = x \neq b` - BY: :math:`a = b = x` - ACROSS: :math:`a = b \neq x` We call "cell" the set of triples :math:`\mathcal{C} = A \times B \times X`. One example of a cell in the standard phoneme ABX task is the set of all instances of /bag/ and /beg/ spoken by a given speaker. The comparison between tokens is performed using a distance :math:`d` on the representations of :math:`a`, :math:`b` and :math:`x`. The test is successful if the representations of :math:`a` and :math:`x` are closer than the representations of :math:`b` and :math:`x`. Formally, the ABX discriminability of a cell is .. math:: \mathcal{D}_\mathcal{C} = \frac{1}{|\mathcal{C}|} \sum_{(a, b, x) \in \mathcal{C}} (\mathbf{1}_{d(a, x) < d(b, x)} + \frac{1}{2} \mathbf{1}_{d(a, x) = d(b, x)}). The overall ABX discriminability :math:`\mathcal{D}` is a weighted average across all cells. The weighting function is a way to balance the effects of the asymmetries between cells and the differences in cell size. What was done in the phoneme ABX task was to average first over contexts, then over the speaker identities, and finally over phonemes. The library returns results in terms of ABX error rate :math:`1 βˆ’ \mathcal{D}`.