The wordseg package consists of a collection of command-line programs and a
Python library, which can be installed using the instructions below.
On Linux: native support, tested on continuous integration.
On macOS: two algorithms, wordseg-ag and wordseg-dpseg, are currently unsupported.
You can still use them with docker (see Installation in docker).
Before going further, please clone the repository from GitHub and move to its root directory:
git clone https://github.com/bootphon/wordseg.git ./wordseg
cd ./wordseg
The package is implemented in Python and C++ and requires additional software to work:
Python3 (Python2 is no longer supported),
a C++ compiler supporting the C++11 standard,
cmake for configuration and build (see here),
the boost program options C++ library for option parsing (see here),
the joblib Python package for parallel processing (see here),
the scikit-learn Python package for statistical analysis (see here).
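The command-line prerequisites listed above can be checked before building. The following is a minimal sketch and not part of wordseg: the function names `have_cmd` and `check_wordseg_deps` are hypothetical, the compiler is assumed to be available as `c++`, `g++` or `clang++`, and the check only tests that executables are on the PATH. It does not verify C++11 support, nor the presence of boost, joblib or scikit-learn.

```shell
# Hypothetical helpers, not part of wordseg.

# have_cmd NAME: succeed if NAME resolves to a command on this system
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# check_wordseg_deps: print one line per missing tool and
# return non-zero if at least one tool is missing
check_wordseg_deps() {
    missing=0
    for tool in python3 cmake; do
        if ! have_cmd "$tool"; then
            echo "missing: $tool"
            missing=1
        fi
    done
    # Assumption: any of these names counts as a C++ compiler
    if ! have_cmd c++ && ! have_cmd g++ && ! have_cmd clang++; then
        echo "missing: C++ compiler"
        missing=1
    fi
    return "$missing"
}
```

After sourcing these definitions, calling `check_wordseg_deps` prints nothing and returns zero when all three kinds of tools are found.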
Installation of the wordseg package¶
There are three options:
System-wide installation: This is the recommended installation if you want to use wordseg on your personal computer (and you do not want to modify/contribute to the code).
Installation in a virtual environment: This is the recommended installation if you are not an administrator of your machine, if you are working in a multi-user environment (e.g. a computing cluster) or if you are developing with wordseg.
Installation in docker: This is the recommended installation if you are working on Windows or Mac, or in a cloud infrastructure, or if you want a reproducible environment.
System-wide installation¶
Install the required dependencies. On Ubuntu/Debian:
sudo apt-get install python3 python3-pip cmake libboost-program-options-dev
On Mac OSX:
brew install python3 boost cmake
Finally, compile and install wordseg:
[sudo] make install
If you plan to modify the wordseg code, use make develop instead of make install.
Installation in a virtual environment¶
First install conda from here.
Create a new Python 3 virtual environment named wordseg and install the required dependencies:
conda env create -f environment.yml
Activate your virtual environment:
conda activate wordseg
Install the wordseg package:
Do not forget to activate your virtual environment before using wordseg:
conda activate wordseg
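A forgotten activation is easy to miss in scripts. As a minimal sketch, a script can inspect the `CONDA_DEFAULT_ENV` variable, which conda sets to the name of the active environment. The function name `in_wordseg_env` is hypothetical, and the check assumes the environment is named wordseg as above.

```shell
# Hypothetical guard, not part of wordseg.
# Succeeds only if the active conda environment is named "wordseg".
in_wordseg_env() {
    [ "${CONDA_DEFAULT_ENV:-}" = "wordseg" ]
}
```

A script could then start with `in_wordseg_env || echo "run: conda activate wordseg" >&2` to warn the user before calling any wordseg command.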
Installation in docker¶
We provide a Dockerfile to build a docker image of wordseg that can be run on Linux, Mac and Windows.
First install docker for your OS:
Build the wordseg image:
[sudo] docker build -t wordseg .
Now you can run wordseg from within a docker container.
For example, run an interactive bash session in docker, mapping a data directory on your local host to /data in docker:
[sudo] docker run -v $PWD/test/data/:/data -it wordseg /bin/bash
# you are now in the docker machine, run wordseg as usual
root@1d32398b8c8e:/wordseg# head -5 /data/tagged.txt | wordseg-prep | wordseg-dpseg --nfolds 1
yuw kuhdiytihtwihdhaxspuwn yuw hhaev t axkaht dhaet kaorn tuw aen d baxnaenax guhdchiyz ehmehm teystiy kaorn
On Mac, use wordseg-ag and wordseg-dpseg within docker. For example, if you already have a wordseg installation on your computer, you can use it for all algorithms except ag and dpseg, and run those two from docker. Here we use the local wordseg-prep along with the docker wordseg-dpseg:
user@host:~/dev/wordseg$ head -5 $PWD/test/data/tagged.txt | wordseg-prep | docker run -i wordseg wordseg-dpseg --nfolds 1
yuw kuhdiytihtwihdhaxspuwn yuw hhaev t axkaht dhaet kaorn tuw aen d baxnaenax guhdchiyz ehmehm teystiy kaorn
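To avoid retyping the docker invocation, the pattern above can be wrapped in a shell function. This is only a sketch: the name `docker_dpseg` is hypothetical, it assumes the image was built with the tag wordseg as shown earlier, and it assumes your user may run docker without sudo.

```shell
# Hypothetical wrapper, not part of wordseg.
# Runs wordseg-dpseg inside the "wordseg" image, keeping stdin open (-i)
# so that it can sit in a pipeline, and forwards all arguments unchanged.
docker_dpseg() {
    docker run -i wordseg wordseg-dpseg "$@"
}
```

With this definition the pipeline becomes `head -5 $PWD/test/data/tagged.txt | wordseg-prep | docker_dpseg --nfolds 1`.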
Optional: Run tests to check your installation¶
We recommend that you always run the test suite, since it verifies that all dependencies and all subparts of the package have been installed correctly. Simply run:
If all the tests pass, you can skip the rest of this section: you have successfully installed wordseg. If some of the tests fail, the package's capabilities may be reduced.
The tests are located in ./test and are executed by pytest. In case of test failure, you may want to rerun the tests with the command pytest -v ./test to have a more detailed output.
pytest supports a lot of options. For example, to stop the execution at the first failure, use pytest -x. To execute a single test case, use
Optional: Build the documentation¶
To build the html documentation (the one you are currently reading), first install some dependencies. On Ubuntu/Debian:
sudo apt-get install texlive texlive-latex-extra dvipng
[sudo] pip install sphinx sphinx_rtd_theme numpydoc
Then run:
The documentation is then built, its homepage being