NED and Coverage Measure
Many spoken term discovery systems incorporate a step in which fragments of speech are realigned and compared. Matching quality measures the accuracy of this process. Here, we use the NED and Coverage metrics to evaluate it.
NED and Coverage are quick to compute and give a qualitative estimate of the matching step. NED is the Normalised Edit Distance; it is equal to 0 when the two fragments of a pair have exactly the same transcription, and to 1 when they differ in all phonemes. Coverage is the fraction of the corpus containing matching pairs that has been discovered.
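As an illustration of the two boundary cases just described, here is a minimal Python sketch of a normalised edit distance between two phoneme transcriptions. It assumes that the edit distance is the standard Levenshtein distance and that it is normalised by the length of the longer transcription; both choices, as well as the function names `edit_distance` and `ned`, are assumptions made for this example and not necessarily what the evaluation toolkit implements.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between the first i-1 symbols of a
    # and the first j symbols of b.
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ned(a: Sequence[str], b: Sequence[str]) -> float:
    """Edit distance divided by the longer length (assumed normalisation)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty transcriptions are treated as identical
    return edit_distance(a, b) / longest


same = ned(["k", "ae", "t"], ["k", "ae", "t"])       # identical transcriptions
different = ned(["k", "ae", "t"], ["d", "ao", "g"])  # no phoneme in common
close = ned(["k", "ae", "t"], ["k", "ae", "p"])      # one substitution out of three
```

With this normalisation, identical transcriptions score 0, transcriptions of equal length that share no phoneme score 1, and a single substitution in a three-phoneme fragment scores one third.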
where
with

- \(P_{all}\): the set of all possible non-overlapping matching fragment pairs, \(P_{all}=\{ \{a,b\} \in F_{all} \times F_{all} \mid T_{a} = T_{b}, \neg \textrm{overlap}(a,b)\}\).
- \(P_{disc}\): the set of non-overlapping discovered pairs, \(P_{disc} = \{ \{a,b\} \mid a \in c, b \in c, \neg \textrm{overlap}(a,b), c \in C_{disc} \}\).
- \(P_{disc^*}\): the set of pairwise substring completions of \(P_{disc}\), which means that we compute all of the possible minimal path realignments of the two strings and extract all of the substring pairs along the path (e.g., for fragment pair