Installation¶
The wordseg
package is made of a collection of command-line programs and a
Python library that can be installed using the instructions below.
On Linux: native support, tested on continuous integration.
On MacOS: two algorithms are actually unsupported on MacOS,
wordseg-ag
andwordseg-dpseg
. You can still use them with docker (see Installation in docker).On Windows: use docker or the Linux subsytem for Windows, see issue #6 for a discussion.
Note
Before going further, please clone the repository from github and go in its root directory:
git clone https://github.com/bootphon/wordseg.git ./wordseg
cd ./wordseg
Dependencies¶
The package is implemented in Python and C++ and requires extra softwares to work:
Python3 (Python2 is no more supported),
a C++ compiler supporting the C++11 standard,
cmake for configuration and build (see here),
the boost program options C++ library for option parsing (see here),
the joblib Python package for parallel processing (see here).
the scikit-learn Python package for statistical analysis (see here).
Installation of the wordseg package¶
There are three options:
System-wide installation: This is the recommended installation if you want to use
wordseg
on your personal computer (and you do not want to modify/contribute to the code).Installation in a virtual environment: This is the recommended installation if you are not administrator of your machine, if you are working in a multi-user environment (e.g. a computing cluster) or if you are developing with
wordseg
.Installation in docker: This is the recommended installation if you are working on Windows or Mac, or in a cloud infrastructure, or if you want a reproducible environment.
System-wide installation¶
Install the required dependencies:
on Ubuntu/Debian:
sudo apt-get install python3 python3-pip cmake libboost-program-options-dev
on Mac OSX:
brew install python3 boost cmake
Finally compile and install
wordseg
:[sudo] make install
Note
If you planned to modify the wordseg’s code, use make develop
instead of
make install
.
Installation in a virtual environment¶
Note
If you have already followed the instructions under System-wide installation skip this section to go directly to Optional: Run tests to check your installation.
This installation process is based on the conda python package manager and can be performed on any Linux, Mac OS or Windows system supported by conda (but you can use virtualenv as well).
First install conda from here.
Create a new Python 3 virtual environment named
wordseg
and install the required dependencies:conda env create -f environment.yml
Activate your virtual environment:
conda activate wordseg
Install the wordseg package:
make install
Note
Do not forget to activate your virtual environment before using wordseg:
conda activate wordseg
Installation in docker¶
We provide a Dockerfile to build a docker image of wordseg that can be run on Linux, Mac and Windows.
First install docker for you OS:
Build the wordseg image:
[sudo] docker build -t wordseg .
Now you can run wordseg from within a docker container.
For exemple run an interactive bash session in docker, mapping a data directory on your local host to /data in docker:
[sudo] docker run -v $PWD/test/data/:/data -it wordseg /bin/bash # you are now in the docker machine, run wordseg as usual root@1d32398b8c8e:/wordseg# head -5 /data/tagged.txt | wordseg-prep | wordseg-dpseg --nfolds 1 yuw kuhdiytihtwihdhaxspuwn yuw hhaev t axkaht dhaet kaorn tuw aen d baxnaenax guhdchiyz ehmehm teystiy kaorn
Note
On Mac use wordseg-ag and wordseg-dpseg within docker. For exemple, if you already have a wordseg installation on your computer, you can use it for all but ag an dpseg algorithms, and use those two from docker. Here we use the local wordseg-prep along with the docker wordseg-dpseg:
user@host:~/dev/wordseg$ head -5 $PWD/test/data/tagged.txt | wordseg-prep | docker run -i wordseg wordseg-dpseg --nfolds 1
yuw kuhdiytihtwihdhaxspuwn
yuw hhaev t axkaht dhaet kaorn tuw
aen d baxnaenax
guhdchiyz
ehmehm teystiy kaorn
Optional: Run tests to check your installation¶
We recommend you always run this tests suite, because that will allow you to make sure that all dependencies and all subparts of the package have been appropriately installed. Simply have a:
make test
Note
If all your tests passed, then you can skip this section. You have successfully installed wordseg. If some of the tests failed, then the package’s capabilities may be reduced.
The tests are located in
./test
and are executed by pytest. In case of test failure, you may want to rerun the tests with the commandpytest -v ./test
to have a more detailed output.pytest supports a lot of options. For exemple to stop the execution at the first failure, use
pytest -x
. To execute a single test case, usepytest ./test/test_separator.py::test_bad_separators
.
Optional: Build the documentation¶
To build the html documentation (the one you are currently reading), first install some dependencies. On Ubuntu/Debian:
sudo apt-get install texlive textlive-latex-extra dvipng
[sudo] pip install sphinx sphinx_rtd_theme numpydoc
Then have a:
make html
The documentation is built, it’s homepage being build/doc/html/index.html
.