.. _ned_coverage:

NED and Coverage Measure
~~~~~~~~~~~~~~~~~~~~~~~~

Many spoken term discovery systems incorporate a step whereby
fragments of speech are realigned and compared. Matching quality
measures the accuraty of this process. Here, we use the *NED/Coverage*
metrics for evaluating that.


*NED* and *Coverage* are quick to compute and give a qualitative
estimate of the matching step. *NED* is the Normalised Edit Distance;
it is equal to zero when a pair of fragments have exactly the same
transcription, and 1 when they differ in all phonemes. *Coverage* is
the fraction of corpus that contain matching pairs that has been
discovered.

.. math::

   \textrm{NED} &= \sum_{\langle x, y\rangle \in P_{disc}}
   \frac{\textrm{ned}(x, y)}{|P_{disc}|} \\
   \textrm{Coverage} &= \frac{|\textrm{cover}(P_{disc})|}{|\textrm{cover}(P_{all})|}

where

.. math::

   \textrm{ned}(\langle i, j \rangle, \langle k, l \rangle) &=
   \frac{\textrm{Levenshtein}(T_{i,j}, T_{k,l})}{\textrm{max}(j-i+1,k-l+1)} \\
   \textrm{cover}(P) &= \bigcup_{\langle i, j \rangle \in \textrm{flat}(P)}[i, j] \\
   \textrm{flat}(P) &= \{p|\exists q:\{p,q\}\in P\}


with
- :math:`P_{all}`: the set of all possible non overlapping matching
  fragment pairs. :math:`P_{all}=\{ \{a,b \}\in F_{all} \times F_{all}
  | T_{a} = T_{b}, \neg \textrm{overlap}(a,b)\}`.
- :math:`P_{disc}`: the set of non overlapping discovered pairs,
  :math:`P_{disc} = \{ \{a,b\} | a \in c, b \in c, \neg
  \textrm{overlap}(a,b), c \in C_{disc} \}`

- :math:`P_{disc^*}`: the set of pairwise substring completion of
  :math:`P_{disc}`, which mean that we compute all of the possible
  minimal path realignments of the two strings, and extract all of the
  substrings pairs along the path (e.g., for fragment pair