Installation and configuration¶
Note
First of all, clone the Abkhazia github repository and go to the
created abkhazia
directory:
git clone git@github.com:bootphon/abkhazia.git
cd ./abkhazia
Note
Abkhazia have been succesfully installed on various Unix flavours (Debian, Ubuntu, CentOS) and on Mac OS. It should be possible to install it on a Windows system as well (through CygWin), but this has not been tested.
Note
Abkhazia is for python3 only, if you really want to use python2 please use [abkhazia-0.3.1](https://github.com/bootphon/abkhazia/releases/tag/v0.3.1), the last python2 compatible version.
Use in docker¶
The most simple way of deploy abkhazia is to use it under a docker container. Once you have docker installed on your machine, build the container with:
docker build -t abkhazia .
Then you can run it for instance with:
docker run -it --rm abkhazia bash
The need to mount your corpus data inside the container (using the -v option of docker run and modify the abkhazia configuration (see Configuration files). Read the docker documentation here.
Install dependencies¶
Before deploying Abkahzia on your system, you need to install the following dependencies: Kaldi, sox, shorten and festival.
C++ compilers¶
You need to have both gcc
and clang
installed. On
Debian/Ubuntu just have a:
sudo apt-get install gcc gfortran clang
Flac, sox and festival¶
Abkhazia relies on flac and sox for audio conversion from various file formats to wav.
They should be in repositories of every standard Unix distribution, for exemple in Debian/Ubuntu:
sudo apt-get install flac sox
Abkhazia also needs festival to phonemize the transcriptions of the Childes Brent corpus. Visit this link for installation guidelines, or on Ubuntu/Debian use:
sudo apt-get install festival
Shorten¶
shorten is used for wav conversion from the original shn audio files, it must be installed manually. Follow these steps to download, compile and install it:
wget http://shnutils.freeshell.org/shorten/dist/src/shorten-3.6.1.tar.gz
tar xzf shorten-3.6.1.tar.gz
cd shorten-3.6.1
./configure
make
sudo make install
Kaldi¶
Kaldi is an open-source toolkit for HMM based ASR. It is a collection of low-level C++ programs and high-level bash scripts. See here.
In brief: use install_kaldi.sh
The
./install_kaldi.sh
script will download, configure and compile Kaldi to./kaldi
. This should work on any standard Unix distribution and fail on the first encountered error. If so, install Kaldi manually as detailed in the following steps.If install_kaldi.sh failed
If so you need to install Kaldi manually with the following steps:
Because Kaldi is developed under continuous integration, there is no published release to rely on. To ensure the abkhazia stability, we therefore maintain a Kaldi fork that is guaranteed to work. Clone it in a directory of your choice and ensure the active branch is abkhazia (this should be the default):
git clone git@github.com:bootphon/kaldi.git
Once cloned, you have to install Kaldi (configuration and compilation). Follow those instructions. Basically, you have to do (from the
kaldi
directory):cd tools ./extras/check_dependancies.sh make -j # -j does a parallel build on multiple cores cd ../src ./configure make depend -j make -j
Install Kaldi extras tools (SRILM and IRSTLM librairies) required by abkhazia. From your local kaldi directory, type:
cd ./tools ./extras/install_irstlm.sh ./extras/install_srilm.sh
Install Abkhazia¶
It will check the dependancies for you and will initialize a default configuration file in
share/abkhazia.conf
:./configure
The
install_kaldi.sh
setup theKALDI_PATH
environment variable to point to the installed Kaldi root directory. If you have installed Kaldi manually, or if theconfigure
script complains for a missingKALDI_PATH
, you need to specify it with for exemple:KALDI_PATH=./kaldi ./configure
Rerun this script and correct the prompted configuration errors until it succed. At least you are asked to specify the path to Kaldi (as installed in previous step) in the configuration file.
Finally install abkhazia:
python setup.py build [sudo] python setup.py install
In case you want to modify and test the code inplace, replace the last step by
python setup.py develop
.To build the documentation (the one you are actually reading), install sphinx (with
pip install Sphinx
) and, from the Abkhazia root directory, have a:sphinx-build -b html ./docs/source ./docs/html
Then open the file
./docs/html/index.html
with your favorite browser.
Configuration files¶
Abkhazia relies on two configuration files, abkhazia.conf
and
queue.conf
. Those files are generated at install time (during the
configuration step) and wrote in the share
directory. But those files
are usually copied in the installation directory (e.g. in /usr/bin
)
Note
To know where are located the configuration files on your setup, have a:
abkhazia --help
abkhazia.conf
¶
The abkhazia.conf
configuration file defines a set of general
parameters that are used by Abkhazia.
abkhazia.data-directory is the directory where abkhazia write its data (corpora, recipes and results are stored here). During installation, the data directory is configured to point in a
data
folder of the abkhazia source tree. You can specify a path to another dircetory, maybe on another partition.kaldi.kaldi-directory is the path to an installed kaldi distribution. This path is configured during abkhazia installation.
kaldi.{train, decode, highmem}-cmd setup the parallelization to run the Kaldi recipes. Choose either
run.pl
to run locally orqueue.pl
to use a cluster managed with the Sun GridEngine.raw corpora directories can be specified in the
corpus
section of the configuration file.
Run the tests¶
Note
The tests are actually based on the Buckeye corpus, so you must
provide the path to the raw Buckeye distribution before launching the
tests. Enter this path in the buckeye-directory
entry in the
Abkhazia configuration file.
Abkhazia comes with a test suite, from the abkhazia root directory run it using:
pytest ./test
To install the
pytest
package, simply have a:[sudo] pip install pytest
If you run the tests on a cluster and you configured Abkhazia to use Sun GridEngine, you must specify the temp directory to be in a shared filesystem with
pytest ./test --basetemp=mydir/tmp
.