ABXpy Package

ABX discrimination test in Python

ABX discrimination is a term that is used for three stimuli presented on an ABX trial. The third is the focus. The first two stimuli (A and B) are standard, S1 and S2 in a randomly chosen order, and the subjects’ task is to choose which of the two is matched by the final stimulus (X). (Glottopedia)

This package contains the operations necessary to initialize, calculate and analyse the results of an ABX discrimination task.

Organisation

It is composed of 3 main modules and other submodules.

The features can be calculated in numpy via external tools, and made compatible with this package with the npz2h5features function.

The pipeline

In                                  Module     Out
data.item, parameters               task       data.abx
data.abx, data.features, distance   distance   data.distance
data.abx, data.distance             score      data.score
data.abx, data.score                analyze    data.csv

See Files Format for a description of the files used as input and output.

task Module

This module is used for creating a new task and preprocessing.

This module contains the functions to specify and initialise a new ABX task, compute and display the statistics, and generate the ABX triplets and pairs.

It can also be used from the command line. See task --help for the documentation.

Usage

From the command line:

python task.py my_data.item \
    -o column1 -a column2 column3 -b column4 column5 \
    -f "[attr == 0 for attr in column3_X]"

my_data.item is a special file containing an index of the database and a set of labels or attributes. See input format [here](http://abxpy.readthedocs.io/en/latest/FilesFormat.html#dataset)
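To illustrate what a filter expression like the one passed to -f computes, the sketch below evaluates the same list comprehension in plain Python. The values in column3_X are made up for the example, and the sketch makes no claim about how ABXpy binds column names internally. It only shows that the expression yields one boolean per candidate item.

```python
# Hypothetical values of 'column3' for five candidate X items.
column3_X = [0, 1, 0, 2, 0]

# Same expression as in the -f option above: one boolean per item.
mask = [attr == 0 for attr in column3_X]

# Indices of the items whose mask entry is True, i.e. that pass the filter.
kept = [i for i, keep in enumerate(mask) if keep]

print(mask)
print(kept)
```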

In Python:

import ABXpy.task

# create a new task and compute the statistics
# (my_filters and my_regressors are lists of strings defined beforehand)
myTask = ABXpy.task.Task(
    'data.item', 'on_label', 'across_feature', 'by_label',
    filters=my_filters, regressors=my_regressors)

# display statistics
print(myTask.stats)

# generate an h5db file 'data.abx' containing all the triplets and pairs
myTask.generate_triplets()

Example

An example of ABX triplet:

A       B       X
on_1    on_2    on_1
ac_1    ac_1    ac_2
by      by      by

A and X share the same ‘on’ attribute; A and B share the same ‘across’ attribute; A, B and X share the same ‘by’ attribute.
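The constraints on a triplet can be made concrete with a brute-force sketch. The toy database and the enumerate_triplets helper below are hypothetical and are not part of ABXpy. The code simply checks the three rules (same ‘on’ for A and X but not B, same ‘across’ for A and B but not X, same ‘by’ for all three) over every combination of items, which is only practical for tiny inputs.

```python
from itertools import product

# Hypothetical toy database: each item has an 'on', 'across' and 'by' label.
items = [
    {"id": 0, "on": "on_1", "across": "ac_1", "by": "by"},
    {"id": 1, "on": "on_2", "across": "ac_1", "by": "by"},
    {"id": 2, "on": "on_1", "across": "ac_2", "by": "by"},
    {"id": 3, "on": "on_2", "across": "ac_2", "by": "by"},
]


def enumerate_triplets(items):
    """Return all (A, B, X) id triplets satisfying the ABX constraints."""
    triplets = []
    for a, b, x in product(items, repeat=3):
        # A, B and X share the same 'by' attribute.
        same_by = a["by"] == b["by"] == x["by"]
        # A and X share the same 'on' attribute, B has a different one.
        on_ok = a["on"] == x["on"] and b["on"] != a["on"]
        # A and B share the same 'across' attribute, X has a different one.
        across_ok = a["across"] == b["across"] and x["across"] != a["across"]
        if same_by and on_ok and across_ok:
            triplets.append((a["id"], b["id"], x["id"]))
    return triplets


triplets = enumerate_triplets(items)
print(triplets)
```

With this toy database each item can play the role of A in exactly one triplet, so four triplets are produced.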

class ABXpy.task.Task(db_name, on, across=None, by=None, filters=None, regressors=None, verbose=False)

Bases: object

Define an ABX task for a given database.

Parameters
db_name : str

the filename of the database on which the ABX task is applied.

on : str

the ‘on’ attribute of the ABX task. A and X share the same ‘on’ attribute and B has a different one.

across : list, optional

a list of strings containing the ‘across’ attributes of the ABX task. A and B share the same ‘across’ attributes and X has a different one.

by : list, optional

a list of strings containing the ‘by’ attributes of the ABX task. A, B and X share the same ‘by’ attributes.

filters : list, optional

a list of strings specifying a filter on A, B or X.

regressors : list, optional

a list of strings specifying a regressor on A, B or X.

verbose : bool, optional

display additional information if set to True.

Attributes
stats : dict

Contains several statistics about the task. The 3 main attributes are:

- nb_blocks: the number of blocks of ABX triplets sharing the same ‘on’, ‘across’ and ‘by’ features.
- nb_triplets: the number of triplets considered.
- nb_by_levels: the number of blocks of ABX triplets sharing the same ‘by’ attribute.

compute_nb_levels()

compute_statistics(approximate=False)

Compute the statistics of the task

The number of ABX triplets is exact in most cases if approximate is set to False. The other statistics can only be approximate in the case where there are A, B, X or ABX filters.

Parameters
approximate : bool

approximate the number of triplets

generate_triplets(output=None, threshold=None, tmpdir=None, seed=None)

Generate all possible triplets for the whole task

Generate the triplets and the pairs for an ABXpy.Task and store them in an h5db file.

Parameters
output : filename, optional

The output file. If not specified, a new file with the same name as the input file is created automatically.

threshold : TODO

tmpdir : directory, optional

where to write temporary files

seed : int, optional

seed for initializing the random number generator

on_across_triplets(by, on, across, on_across_block, on_across_by_values, with_regressors=True)

Generate all possible triplets for a given by block.

Given an on_across_block of the database and the parameters of the task, this function will generate the complete set of triplets and the regressors.

Parameters
by : int

The block index

on, across : int

The task attributes

on_across_block : list

the block

on_across_by_values : dict

the actual values

with_regressors : bool, optional

True by default

Returns
triplets : numpy.ndarray

the set of triplets generated

regressors : numpy.ndarray

the regressors generated

print_stats(filename=None, summarized=True)

print_stats_to_stream(stream, summarized)

ABXpy.task.main()

Command-line API for generating ABX tasks

ABXpy.task.on_across_from_key(key)

ABXpy.task.parse_arguments()

Defines and parses input arguments for the command-line API

ABXpy.task.sort_and_threshold(permut, new_index, ind_type, threshold=None, count_only=False)

ABXpy.task.sort_pairs(abx_file, by, memory=1000, tmpdir=None)

Sort pairs in an ABX task file

abx_file: the hdf5 file generated by the ABX task

by:

memory: available RAM in MB

tmpdir: directory to write temporary files

score Module

This module is used for computing the score of a task (see task Module on how to create a task).

This module contains the actual computation of the score. It requires a task and a distance, and writes the output to a score file.

The main function takes a distance file and a task file as input to compute the score of the task on those distances. X closer to A is associated with a score of 1 and X closer to B with a score of -1.

The distances between pairs in the distance file must be ordered the same way as the pairs in the task file, and the triplet scores in the output file will be ordered the same way as the triplets in the task file.
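The scoring rule can be sketched in a few lines of plain Python. The triplet_score helper and the distance values are hypothetical, and returning 0 when the two distances are equal is an assumption of this sketch, since the text above only specifies the two strict cases.

```python
def triplet_score(dist_ax, dist_bx):
    """Return 1 if X is closer to A, -1 if X is closer to B.

    Returning 0 on ties is an assumption of this sketch.
    """
    if dist_ax < dist_bx:
        return 1
    if dist_ax > dist_bx:
        return -1
    return 0


# Hypothetical distances d(A, X) and d(B, X), one entry per triplet,
# listed in the same order as the triplets themselves.
dist_ax = [0.2, 0.9, 0.5]
dist_bx = [0.7, 0.4, 0.5]

scores = [triplet_score(a, b) for a, b in zip(dist_ax, dist_bx)]
print(scores)
```

Because the scores are produced by walking the two distance lists in parallel, the output order matches the triplet order, which is the ordering requirement stated above.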

Usage

From the command line:

python score.py data.abx data.distance data.score

In Python:

import ABXpy.task
import ABXpy.score

# create a new task:
myTask = ABXpy.task.Task('data.item', 'on_feature', 'across_feature', 'by_feature', filters=my_filters, regressors=my_regressors)
myTask.generate_triplets()

# initialise distance
# TODO shouldn't this be available from score
# calculate the scores:
ABXpy.score.score('data.abx', 'myDistance.???', 'data.score')

ABXpy.score.main()

ABXpy.score.score(task_file, distance_file, score_file=None, score_group='scores')

Calculate the score of a task and put the results in a hdf5 file.

Parameters
task_file : string

The hdf5 file containing the task (with the triplets and pairs generated)

distance_file : string

The hdf5 file containing the distances between the pairs

score_file : string, optional

The hdf5 file that will contain the results

analyze Module

This module is used to analyse the results of an ABX discrimination task.

It collapses the results and gives the mean score for each block of triplets sharing the same on, across and by labels. It outputs a tab-separated csv file whose columns are the relevant labels, the average score and the number of triplets in the block. See Files Format for a more in-depth explanation.

It requires a score file and a task file.
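The collapsing step can be illustrated with a small standalone sketch. The records list and the collapse_scores helper are hypothetical and do not read ABXpy files. The code only shows the grouping logic: scores are grouped by their (on, across, by) labels, then each group is reduced to its mean and its triplet count.

```python
from collections import defaultdict

# Hypothetical per-triplet records: (on, across, by, score).
records = [
    ("on_1", "ac_1", "by_1", 1),
    ("on_1", "ac_1", "by_1", -1),
    ("on_1", "ac_1", "by_1", 1),
    ("on_2", "ac_1", "by_1", 1),
]


def collapse_scores(records):
    """Group scores by (on, across, by); return (labels, mean, count) rows."""
    blocks = defaultdict(list)
    for on, across, by, score in records:
        blocks[(on, across, by)].append(score)
    return [
        (labels, sum(scores) / len(scores), len(scores))
        for labels, scores in sorted(blocks.items())
    ]


rows = collapse_scores(records)
for (on, across, by), mean, count in rows:
    # One tab-separated line per block: labels, mean score, triplet count.
    print("\t".join([on, across, by, f"{mean:.3f}", str(count)]))
```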

Usage

From the command line:

python analyze.py data.score data.abx data.csv

In Python:

import ABXpy.analyze

# Prerequisite: calculate a task data.abx, and a score data.score
# Arguments follow the documented signature: task file, score file, result file
ABXpy.analyze.analyze('data.abx', 'data.score', 'data.csv')

ABXpy.analyze.analyze(task_file, score_file, result_file)

Analyse the results of a task

Parameters
task_file : string, hdf5 file

the file containing the triplets and pairs of the task

score_file : string, hdf5 file

the file containing the score of a task

result_file : string, csv file

the file that will contain the analysis results

ABXpy.analyze.collapse(scorefile, taskfile, fid)

Collapses the results for each block of triplets sharing the same on, across and by labels.

ABXpy.analyze.main()

ABXpy.analyze.npdecode(keys, max_ind)

Vectorized implementation of the decoding of the labels: i = (a1*n2 + a2)*n3 + a3 …
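The encoding formula packs several label indices into one integer, in the manner of a mixed-radix number. The sketch below is a scalar, non-vectorized illustration of that formula and of its inverse. The encode and decode helpers and the sizes tuple are hypothetical names, and the sketch does not reproduce the signature or the internals of npdecode.

```python
def encode(labels, sizes):
    """Pack label indices (a1, a2, ...) into one integer key."""
    key = 0
    for a, n in zip(labels, sizes):
        key = key * n + a
    return key


def decode(key, sizes):
    """Recover (a1, a2, ...) from a key built by encode()."""
    labels = []
    # Peel off the last label first: a3 = i % n3, then a2, then a1.
    for n in reversed(sizes):
        key, a = divmod(key, n)
        labels.append(a)
    return tuple(reversed(labels))


sizes = (4, 3, 5)                 # n1, n2, n3: number of levels per label
key = encode((2, 1, 3), sizes)    # (2*3 + 1)*5 + 3
decoded = decode(key, sizes)
print(key)
print(decoded)
```

Decoding works from the last label backwards because the last label occupies the least significant position of the key.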

ABXpy.analyze.parse_args()

ABXpy.analyze.unique(index, scores)

ABXpy.analyze.unique_rows(arr)

NumPy unique applied to the rows only