The package takes as input the same format as in the Zero Resource Speech Challenge (zerospeech.com):
    Class 1:
    wav1 on1 off1
    wav2 on2 off2

    Class 2:
    wav1 on3 off3
    wav3 on4 off4
    wav2 on5 off5

The onset and offset times are expressed in seconds.
Note that each class must end with an empty line, including the last class of the file. So the file must be terminated by a blank line.
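
To make the layout concrete, here is a minimal sketch of a parser for this class format. The function name `parse_classes` and the sample times are hypothetical and chosen for illustration only; this is not the package's `read_clusters` method, just one way to read the format under the rules stated above (a header line per class, one interval per line, a blank line closing every class).

```python
def parse_classes(text):
    """Parse class-file text into {class_name: [(wav, onset, offset), ...]}.

    Hypothetical helper for illustration; not part of the package.
    """
    classes = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current class.
            current = None
        elif line.startswith("Class"):
            # Header line, e.g. "Class 1:".
            current = line.rstrip(":")
            classes[current] = []
        else:
            if current is None:
                raise ValueError("interval outside of a class: %r" % line)
            wav, onset, offset = line.split()
            classes[current].append((wav, float(onset), float(offset)))
    if current is not None:
        # The last class was never closed by a blank line.
        raise ValueError("file must end with a blank line")
    return classes


# Made-up sample input; note the blank line after the last class.
example = (
    "Class 1:\n"
    "wav1 0.10 0.50\n"
    "wav2 1.20 1.65\n"
    "\n"
    "Class 2:\n"
    "wav1 2.00 2.40\n"
    "\n"
)
classes = parse_classes(example)
```

With this sketch, input that lacks the final blank line raises a `ValueError`, which mirrors the termination rule described above.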
If you want to use other input formats, you need to edit the
read_clusters method in
The package uses gold phone and word alignments to evaluate the inputs.
The alignments are stored in
The format of the alignments is (without header):
    filename1 on1 off1 symbol1
    filename2 on2 off2 symbol2
    ...
Where filename is the name of the wav file, and symbol is the word or phone.
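
As a rough illustration of the alignment format, the sketch below groups the intervals by wav file. The name `parse_alignment` and the sample rows are hypothetical; the package's own loader may work differently.

```python
from collections import defaultdict


def parse_alignment(text):
    """Return {filename: [(onset, offset, symbol), ...]}, sorted by onset.

    Hypothetical helper for illustration; not part of the package.
    """
    alignment = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        # Four whitespace-separated fields per row, no header.
        filename, onset, offset, symbol = line.split()
        alignment[filename].append((float(onset), float(offset), symbol))
    for intervals in alignment.values():
        intervals.sort()
    return dict(alignment)


# Made-up sample rows in the "filename on off symbol" layout.
sample = (
    "wav1 0.12 0.30 e\n"
    "wav1 0.00 0.12 h\n"
    "wav2 0.05 0.40 hello\n"
)
gold = parse_alignment(sample)
```

Sorting each file's intervals by onset is a convenience of this sketch, so that lookups by time can assume ordered intervals.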
To add your own language to the package, you need to add
tde/share and add the option in