File formats

This package uses several types of files; this section describes them all.

Dataset

Extension: .item

This file indexes the database on which the ABX task is executed. It is a regular text file and should have the following structure:

    #file    onset    offset   #label 1  label 2  label 3
    file 1   start 1  stop 1   value 1   value 1  value 1
    file 2   start 2  stop 2   value 2   value 1  value 1
    file 3   start 3  stop 3   value 3   value 1  value 1

  • The first line must be a header line beginning with the three fields #file, onset and offset.

  • #file: the name of the file, minus the extension. Note that the ‘#’ at the beginning is mandatory.

  • onset: the instant when the sound starts.

  • offset: the instant when the sound ends.

  • The label columns are various regressors relevant to the discrimination task. Note that the first label column must start with a ‘#’.
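As an illustration of the layout described above, the following sketch parses an item file using only the Python standard library. It assumes that fields are separated by whitespace and that onset and offset are numbers; neither point is stated in this section. The function name parse_item_file and the sample content are hypothetical and are not part of this package.

```python
import io


def parse_item_file(stream):
    """Parse an item file into (label_names, items).

    Assumption: fields are whitespace-separated and onset/offset are
    numeric. This is a sketch, not the package's own reader.
    """
    header = stream.readline().split()
    if header[:3] != ["#file", "onset", "offset"]:
        raise ValueError("header must begin with '#file onset offset'")
    labels = header[3:]
    if not labels or not labels[0].startswith("#"):
        raise ValueError("the first label column must start with '#'")
    # Drop the leading '#' from the first label name.
    label_names = [labels[0][1:]] + labels[1:]

    items = []
    for line in stream:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(header):
            raise ValueError("wrong number of fields: %r" % line)
        items.append({
            "file": fields[0],
            "onset": float(fields[1]),
            "offset": float(fields[2]),
            "labels": dict(zip(label_names, fields[3:])),
        })
    return label_names, items


# Hypothetical sample content, for illustration only.
sample = io.StringIO(
    "#file onset offset #phone speaker\n"
    "rec1 0.10 0.25 a s1\n"
    "rec1 0.30 0.42 i s1\n"
)
names, items = parse_item_file(sample)
print(names)
print(items[0]["labels"])
```

The header checks mirror the two rules listed above: the first three fields are fixed, and the first label column carries a ‘#’.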

Features file

Extension: .features or .h5f

This file contains the features and the center time of each window in the h5features format. This is a special HDF5 file with the following attributes:

  • features: a 2D array with the ‘feature’ dimension along the columns and the ‘time’ dimension along the rows.

  • times: a 1D array with the center time of each window.

  • files: the basename of the files from which the features are extracted. It contains neither the absolute nor the relative path of the files, so each file must have a unique name.

Task file

Extension: .abx

This file can be generated by the task module. It is an HDF5 file that contains all the triplets and the resulting pairs. The elements are grouped by their ‘by’ attribute: all the elements with the same ‘by’ attributes belong to the same block.

The structure is as follows:

data.abx

  • triplets

    • by0: (3 x ?)-array referencing all the possible triplets sharing a ‘by’ value of by0

    • by1

    • etc.

  • unique_pairs (All the pairs AX and BX, useful to calculate the distances. Note that a pair is designated by a single number due to a special encoding)

    • by0: 1D-array referencing all the pairs sharing a ‘by’ value of by0. The array is 1D rather than 2D because each pair is encoded as a single number. Let ‘n’ be the number of items in the block, ‘a’ the index of the first item of the pair and ‘b’ the index of the second item: the index of the pair is p = n*a + b.

    • etc.

  • regressors (information from the item file in a computer-efficient format)

  • feat_dbs (information from the item file in a computer-efficient format)
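The pair encoding p = n*a + b used in unique_pairs can be sketched as follows. The function names encode_pair and decode_pair are hypothetical, and the sketch assumes that item indices are zero-based within a block (0 <= a, b < n), which is what makes the encoding reversible.

```python
def encode_pair(a, b, n):
    """Encode the pair (a, b) of item indices as a single integer."""
    if not (0 <= a < n and 0 <= b < n):
        raise ValueError("indices must satisfy 0 <= a, b < n")
    return n * a + b


def decode_pair(p, n):
    """Recover (a, b) from an encoded pair: a = p // n, b = p % n."""
    return divmod(p, n)


n = 5                     # number of items in the block
p = encode_pair(3, 1, n)
print(p)                  # 16
print(decode_pair(p, n))  # (3, 1)
```

Because b is always smaller than n, integer division and remainder by n recover the two indices unambiguously.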

Distance file

Extension: .distance

This file contains the distances between the two members of each unique pair. The distances are stored by ‘by’ block and in the same order as the unique_pairs in the Task file.

  • distances

    • by0: 1D-array containing the distances between the two members of each pair.

    • by1

    • etc.

Score file

Extension: .score

This file contains the score of each triplet. The score is 1 when X is closer to A and -1 when X is closer to B. The scores are stored by ‘by’ block and in the same order as the triplets in the Task file.

  • scores

    • by0: 1D-array of integers containing the score of each triplet.

    • by1

    • etc.
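The relation between the triplets, the unique pairs, the distances and the scores within one ‘by’ block can be sketched as below. This is not the package's implementation. It assumes that each triplet is given as item indices (a, b, x), that the pairs AX and BX are encoded as n*a + x and n*b + x, and that a tie yields a score of 0; none of these points is confirmed by this section. The function name score_triplets is hypothetical.

```python
def score_triplets(triplets, unique_pairs, distances, n):
    """Score each (a, b, x) triplet of one block from the pair distances.

    Assumptions: pairs are encoded as n*first + second, with X as the
    second item, and ties score 0.
    """
    # distances are stored in the same order as unique_pairs.
    dist_of = dict(zip(unique_pairs, distances))
    scores = []
    for a, b, x in triplets:
        d_ax = dist_of[n * a + x]
        d_bx = dist_of[n * b + x]
        if d_ax < d_bx:
            scores.append(1)    # X is closer to A
        elif d_ax > d_bx:
            scores.append(-1)   # X is closer to B
        else:
            scores.append(0)    # tie handling is an assumption
    return scores


n = 3
triplets = [(0, 1, 2), (1, 0, 2)]
unique_pairs = [2, 5]      # (0, 2) -> 3*0 + 2, (1, 2) -> 3*1 + 2
distances = [0.4, 0.9]
scores = score_triplets(triplets, unique_pairs, distances, n)
print(scores)              # [1, -1]
```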

Analyse file

Extension: .csv

The output file of the ABX baseline, in a human-readable format. It contains the average results collapsed over triplets sharing the same on, across and by attributes. It uses a score of 1 when X is closer to A and 0 when X is closer to B.

The suffixes _1 and _2 appended to the label names follow this convention:

    A      B      X
    on_1   on_2   on_1
    ac_1   ac_1   ac_2

Example for a task on ‘on’, across ‘ac’ and by ‘by’:

    on_1   ac_1   ac_2   on_2   by   score   n
    v0     v0     v1     v1     v0   0.2     5
    v1     v1     v0     v0     v0   0.7     3

  • on_1: value of ‘on’ label for A and X

  • on_2: value of ‘on’ label for B

  • ac_1: value of ‘ac’ label for A and B

  • ac_2: value of ‘ac’ label for X

  • by: value of ‘by’ label for A, B and X

  • score: average score for those triplets

  • n: number of triplets
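The collapsing step that produces the score and n columns can be sketched as below. It assumes that each triplet carries a score of 1 or -1, as in the Score file, and that the 0/1 convention used here is obtained with (score + 1) / 2; this conversion is an inference from the two conventions, not something this section states. The function name collapse and the sample rows are hypothetical.

```python
from collections import defaultdict


def collapse(rows):
    """Average scores over triplets sharing (on_1, ac_1, ac_2, on_2, by).

    rows: iterable of (on_1, ac_1, ac_2, on_2, by, score) tuples, with
    score equal to 1 or -1. Returns {key: (mean_score, n)}.
    """
    groups = defaultdict(list)
    for on_1, ac_1, ac_2, on_2, by, score in rows:
        # Map -1 to 0 and 1 to 1 (assumed conversion).
        groups[(on_1, ac_1, ac_2, on_2, by)].append((score + 1) / 2)
    return {key: (sum(vals) / len(vals), len(vals))
            for key, vals in groups.items()}


# Five hypothetical triplets sharing the same attributes, one scored 1.
key = ("v0", "v0", "v1", "v1", "v0")
rows = [key + (s,) for s in (1, -1, -1, -1, -1)]
result = collapse(rows)
print(result[key])   # (0.2, 5)
```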